Commit Graph

26 Commits

Author SHA1 Message Date
Charlie Marsh c0f6406c76
Migrate to published `astral-tokio-tar` crate (#11260)
We now publish this to `crates.io`:
https://crates.io/crates/astral-tokio-tar
2025-02-05 15:43:33 -05:00
Charlie Marsh 85461c2c90
Avoid setting permissions during tar extraction (#11191)
## Summary

As in our zip operation (and like pip), we want to explicitly avoid
setting permissions during unpacking -- apart from setting the
executable bit.

This depends on https://github.com/astral-sh/tokio-tar/pull/8.

Closes https://github.com/astral-sh/uv/issues/11188.
2025-02-03 19:29:11 +00:00
Charlie Marsh 7b43baf251
Use Astral-maintained `tokio-tar` fork (#11174)
## Summary

I shipped one security fix here along with several significant
performance improvements for large TAR files:

- https://github.com/astral-sh/tokio-tar/pull/2
- https://github.com/astral-sh/tokio-tar/pull/4
- https://github.com/astral-sh/tokio-tar/pull/5

I also PR'd the security fix to `edera-dev`
(https://github.com/edera-dev/tokio-tar/pull/4).
2025-02-03 17:51:35 +00:00
samypr100 4d3809cc6b
Upgrade Rust toolchain to 1.84.0 (#10533)
## Summary
Upgrade the rust toolchain to 1.84.0. This PR does not bump the MSRV.
2025-01-11 22:19:33 -05:00
Charlie Marsh f3264583ac
Sanitize filenames during zip extraction (#8732)
## Summary

Based on the example in `async-zip`:
527bda9d58/examples/file_extraction.rs (L33)

Closes: https://github.com/astral-sh/uv/issues/8731.

## Test Plan

Created https://github.com/astral-sh/sanitize-wheel-test.
2024-10-31 19:12:51 +00:00
Charlie Marsh 14507a1793
Add `uv-` prefix to all internal crates (#7853)
## Summary

Brings more consistency to the repo and ensures that all crates
automatically show up in `--verbose` logging.
2024-10-01 20:15:32 -04:00
Aditya Pratap Singh 3d62154849
Add support for remaining pip-supported file extensions (#7387)
closes #7365 

Summary

This pull request adds support for additional file extension aliases in
the SourceDistExtension and ExtensionError enums. The newly supported
file extensions include .tbz, .tgz, .txz, .tar.lz, .tar.lzma. These
changes align the extensions supported by the SourceDistExtension with
those used in Python packaging tools, enhancing compatibility with a
broader range of source distribution formats.

Test Plan
should be added or updated to verify that the new extensions are
correctly recognized as valid source distributions and that errors are
correctly raised when unsupported extensions are provided.
2024-09-14 19:59:07 +00:00
Charlie Marsh 21408c1f35
Enforce extension validity at parse time (#5888)
## Summary

This PR adds a `DistExtension` field to some of our distribution types,
which requires that we validate that the file type is known and
supported when parsing (rather than when attempting to unzip). It
removes a bunch of extension parsing from the code too, in favor of
doing it once upfront.

Closes https://github.com/astral-sh/uv/issues/5858.
2024-08-08 21:39:47 -04:00
Charlie Marsh 22dbb1741b
Avoid monomorphization of `untar` (#5743)
## Summary

This reduces the LLVM lines of the entire project by about 15%.

Before:

```
  Lines                  Copies               Function name
  -----                  ------               -------------
  1742332                42054                (TOTAL)
    22384 (1.3%,  1.3%)     16 (0.0%,  0.0%)  tokio_tar::entry::EntryFields<R>::unpack::{{closure}}
    19113 (1.1%,  2.4%)     69 (0.2%,  0.2%)  alloc::raw_vec::RawVec<T,A>::grow_amortized
    18352 (1.1%,  3.4%)     80 (0.2%,  0.4%)  tokio_tar::entry::EntryFields<R>::unpack::{{closure}}::{{closure}}
    16871 (1.0%,  4.4%)    613 (1.5%,  1.9%)  core::result::Result<T,E>::map_err
    14747 (0.8%,  5.2%)    594 (1.4%,  3.3%)  <core::result::Result<T,E> as core::ops::try_trait::Try>::branch
    12576 (0.7%,  6.0%)     16 (0.0%,  3.3%)  <tokio_tar::archive::Entries<R> as futures_core::stream::Stream>::poll_next
    12314 (0.7%,  6.7%)    226 (0.5%,  3.8%)  <alloc::boxed::Box<T,A> as core::ops::drop::Drop>::drop
    11895 (0.7%,  7.4%)    296 (0.7%,  4.5%)  std::panicking::try
    11718 (0.7%,  8.0%)     63 (0.1%,  4.7%)  alloc::raw_vec::RawVec<T,A>::try_allocate_in
    11152 (0.6%,  8.7%)     16 (0.0%,  4.7%)  uv_extract::stream::untar_in::{{closure}}
    10977 (0.6%,  9.3%)      1 (0.0%,  4.7%)  uv::run::{{closure}}::{{closure}}
    10859 (0.6%,  9.9%)     77 (0.2%,  4.9%)  <alloc::vec::Vec<T> as alloc::vec::spec_from_iter_nested::SpecFromIterNested<T,I>>::from_iter
    10508 (0.6%, 10.5%)     18 (0.0%,  5.0%)  <core::iter::adapters::flatten::FlattenCompat<I,U> as core::iter::traits::iterator::Iterator>::size_hint
    10260 (0.6%, 11.1%)    138 (0.3%,  5.3%)  core::iter::traits::iterator::Iterator::try_fold
    10196 (0.6%, 11.7%)      8 (0.0%,  5.3%)  uv_extract::stream::unzip::{{closure}}
    10178 (0.6%, 12.3%)      7 (0.0%,  5.3%)  uv_client::cached_client::CachedClient::get_cacheable::{{closure}}::{{closure}}
     9698 (0.6%, 12.8%)    293 (0.7%,  6.0%)  tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut
```

After:

```
  Lines                  Copies               Function name
  -----                  ------               -------------
  1496463                37891                (TOTAL)
    14958 (1.0%,  1.0%)     54 (0.1%,  0.1%)  alloc::raw_vec::RawVec<T,A>::grow_amortized
    13997 (0.9%,  1.9%)    564 (1.5%,  1.6%)  <core::result::Result<T,E> as core::ops::try_trait::Try>::branch
    12776 (0.9%,  2.8%)    463 (1.2%,  2.9%)  core::result::Result<T,E>::map_err
    12381 (0.8%,  3.6%)    227 (0.6%,  3.5%)  <alloc::boxed::Box<T,A> as core::ops::drop::Drop>::drop
    11895 (0.8%,  4.4%)    296 (0.8%,  4.2%)  std::panicking::try
    10977 (0.7%,  5.1%)      1 (0.0%,  4.2%)  uv::run::{{closure}}::{{closure}}
    10859 (0.7%,  5.9%)     77 (0.2%,  4.4%)  <alloc::vec::Vec<T> as alloc::vec::spec_from_iter_nested::SpecFromIterNested<T,I>>::from_iter
    10508 (0.7%,  6.6%)     18 (0.0%,  4.5%)  <core::iter::adapters::flatten::FlattenCompat<I,U> as core::iter::traits::iterator::Iterator>::size_hint
    10196 (0.7%,  7.3%)      8 (0.0%,  4.5%)  uv_extract::stream::unzip::{{closure}}
    10178 (0.7%,  7.9%)      7 (0.0%,  4.5%)  uv_client::cached_client::CachedClient::get_cacheable::{{closure}}::{{closure}}
     9698 (0.6%,  8.6%)    293 (0.8%,  5.3%)  tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut
     9078 (0.6%,  9.2%)      9 (0.0%,  5.3%)  core::slice::sort::partition_in_blocks
     8928 (0.6%,  9.8%)     48 (0.1%,  5.4%)  alloc::raw_vec::RawVec<T,A>::try_allocate_in
     8288 (0.6%, 10.3%)    296 (0.8%,  6.2%)  std::panicking::try::do_catch
     8190 (0.5%, 10.9%)    108 (0.3%,  6.5%)  core::iter::traits::iterator::Iterator::try_fold
     7540 (0.5%, 11.4%)    466 (1.2%,  7.7%)  core::ops::function::FnOnce::call_once
     6612 (0.4%, 11.8%)    296 (0.8%,  8.5%)  std::panicking::try::do_call
     6513 (0.4%, 12.3%)     56 (0.1%,  8.7%)  tokio::runtime::task::core::Cell<T,S>::new
     6438 (0.4%, 12.7%)    269 (0.7%,  9.4%)  alloc::boxed::Box<T>::new
     6360 (0.4%, 13.1%)     20 (0.1%,  9.4%)  <toml_edit:🇩🇪:value::ValueDeserializer as serde:🇩🇪:Deserializer>::deserialize_any
```
2024-08-03 23:29:17 +00:00
Charlie Marsh 750b3a7c8c
Avoid setting executable permissions on files we might not own (#5582)
## Summary

If we just created an entrypoint script, we can of course set the
permissions (we just created it). However, if we're copying from the
cache, we might _not_ own the file. In that case, if we need to change
the permissions (we shouldn't, since the script is likely already
executable -- we set the permissions when we unzip, but I guess they
could _not_ be properly set in the zip itself), we have to copy it.

Closes https://github.com/astral-sh/uv/issues/5581.
2024-07-30 12:32:52 +00:00
Charlie Marsh f70501a22e
Use a consistent buffer size when writing out zip files (#5570) 2024-07-29 14:45:33 -04:00
Charlie Marsh 05b1f51aa1
Use a consistent buffer size for downloads (#5569) 2024-07-29 14:31:39 -04:00
Krishnan Chandra 4b4128446d
Support xz compressed packages (#5513)
## Summary

Closes #2187.

The [xz
backdoor](https://gist.github.com/thesamesam/223949d5a074ebc3dce9ee78baad9e27)
is still fairly recent, but luckily the [Rust `xz2` crate bundles
version 5.2.5 of the C `xz`
package](https://github.com/alexcrichton/xz2-rs/tree/main/lzma-sys),
which is before the backdoor was introduced.

It's worth noting that a security risk still exists if you have a
compromised version of `xz` installed on your system, but that risk is
not introduced by `uv` or the Rust packages in general.

## Test Plan

Tried installing the package mentioned in the linked issue: `python-apt
@
https://launchpad.net/ubuntu/+archive/primary/+sourcefiles/python-apt/2.7.6/python-apt_2.7.6.tar.xz`

(Note that this will only work on Ubuntu - I tried on a Mac and while
the archive was extracted properly, the package did not install because
of some missing files)

---------

Co-authored-by: Charlie Marsh <charlie.r.marsh@gmail.com>
2024-07-28 18:37:48 +00:00
Sergey Kolosov d2551bb2bd
Add support for .tar.bz2 source distributions (#3069)
## Summary

Source distributions in the .tar.bz2 format are still relatively common
within the existing code-bases, namely, the most common examples are the
Twisted source distributions up to the version 20.3.0. As quite so often
the ability to upgrade Twisted to a more recent version is not available
for a given project, we add the support for .tar.bz2 here to still allow
`uv` to be a drop-in replacement for `pip` in these projects.

## Test Plan

The feature was tested both by adding the corresponding test coverage,
and by directly installing a package of interest under a Python version
that doesn't have the corresponding wheel:

```sh
cargo run venv -p python3.8
cargo run pip install Twisted==20.3.0 --no-cache
```

The `--no-cache` argument in the example above serves the purpose of
cleaning the cached information regarding the unsatisfiability of the
requirements, as it may have been cached during some previous attempt to
install this package by `uv` version that didn't implement this feature
yet.
2024-04-16 18:34:55 +00:00
Charlie Marsh 1f3b5bb093
Add hash-checking support to `install` and `sync` (#2945)
## Summary

This PR adds support for hash-checking mode in `pip install` and `pip
sync`. It's a large change, both in terms of the size of the diff and
the modifications in behavior, but it's also one that's hard to merge in
pieces (at least, with any test coverage) since it needs to work
end-to-end to be useful and testable.

Here are some of the most important highlights:

- We store hashes in the cache. Where we previously stored pointers to
unzipped wheels in the `archives` directory, we now store pointers with
a set of known hashes. So every pointer to an unzipped wheel also
includes its known hashes.
- By default, we don't compute any hashes. If the user runs with
`--require-hashes`, and the cache doesn't contain those hashes, we
invalidate the cache, redownload the wheel, and compute the hashes as we
go. For users that don't run with `--require-hashes`, there will be no
change in performance. For users that _do_, the only change will be if
they don't run with `--generate-hashes` -- then they may see some
repeated work between resolution and installation, if they use `pip
compile` then `pip sync`.
- Many of the distribution types now include a `hashes` field, like
`CachedDist` and `LocalWheel`.
- Our behavior is similar to pip, in that we enforce hashes when pulling
any remote distributions, and when pulling from our own cache. Like pip,
though, we _don't_ enforce hashes if a distribution is _already_
installed.
- Hash validity is enforced in a few different places:
1. During resolution, we enforce hash validity based on the hashes
reported by the registry. If we need to access a source distribution,
though, we then enforce hash validity at that point too, prior to
running any untrusted code. (This is enforced in the distribution
database.)
2. In the install plan, we _only_ add cached distributions that have
matching hashes. If a cached distribution is missing any hashes, or the
hashes don't match, we don't return them from the install plan.
3. In the downloader, we _only_ return distributions with matching
hashes.
4. The final combination of "things we install" are: (1) the wheels from
the cache, and (2) the downloaded wheels. So this ensures that we never
install any mismatching distributions.
- Like pip, if `--require-hashes` is provided, we require that _all_
distributions are pinned with either `==` or a direct URL. We also
require that _all_ distributions have hashes.

There are a few notable TODOs:

- We don't support hash-checking mode for unnamed requirements. These
should be _somewhat_ rare, though? Since `pip compile` never outputs
unnamed requirements. I can fix this, it's just some additional work.
- We don't automatically enable `--require-hashes` with a hash exists in
the requirements file. We require `--require-hashes`.

Closes #474.

## Test Plan

I'd like to add some tests for registries that report incorrect hashes,
but otherwise: `cargo test`
2024-04-10 19:09:03 +00:00
Zanie Blue 44e39bdca3
Replace Python bootstrapping script with Rust implementation (#2842)
See https://github.com/astral-sh/uv/issues/2617

Note this also includes:
- #2918 
- #2931 (pending)

A first step towards Python toolchain management in Rust.

First, we add a new crate to manage Python download metadata:

- Adds a new `uv-toolchain` crate
- Adds Rust structs for Python version download metadata
- Duplicates the script which downloads Python version metadata
- Adds a script to generate Rust code from the JSON metadata
- Adds a utility to download and extract the Python version

I explored some alternatives like a build script using things like
`serde` and `uneval` to automatically construct the code from our
structs but deemed it to heavy. Unlike Rye, I don't generate the Rust
directly from the web requests and have an intermediate JSON layer to
speed up iteration on the Rust types.

Next, we add add a `uv-dev` command `fetch-python` to download Python
versions per the bootstrapping script.

- Downloads a requested version or reads from `.python-versions`
- Extracts to `UV_BOOTSTRAP_DIR`
- Links executables for path extension

This command is not really intended to be user facing, but it's a good
PoC for the `uv-toolchain` API. Hash checking (via the sha256) isn't
implemented yet, we can do that in a follow-up.

Finally, we remove the `scripts/bootstrap` directory, update CI to use
the new command, and update the CONTRIBUTING docs.

<img width="1023" alt="Screenshot 2024-04-08 at 17 12 15"
src="https://github.com/astral-sh/uv/assets/2586601/57bd3cf1-7477-4bb8-a8e9-802a00d772cb">
2024-04-10 11:22:41 -05:00
Zanie Blue bdeab55193
Add extract support for zstd (#2861)
We need this to extract toolchain downloads
2024-04-08 15:34:08 -05:00
Charlie Marsh dc2c289dff
Upgrade `rs-async-zip` to support data descriptors (#2809)
## Summary

Upgrading `rs-async-zip` enables us to support data descriptors in
streaming. This both greatly improves performance for indexes that use
data descriptors _and_ ensures that we support them in a few other
places (e.g., zipped source distributions created in Finder).

Closes #2808.
2024-04-04 01:31:40 +00:00
Jonathon Belotti e16140a849
Address #2220 (slow download perf against PyPi mirror) (#2319)
## Summary

Addressing the extremely slow performance detailed in
https://github.com/astral-sh/uv/issues/2220. There are two changes to
increase download performance:

1. setting `accept-encoding: identity`, in the spirit of
https://github.com/pypa/pip/pull/1688
2. increasing buffer from 8KiB to 128KiB. 

### 1. accept-encoding: identity

I think this related `pip` PR has a good explanation of what's going on:
https://github.com/pypa/pip/pull/1688

```
  # We use Accept-Encoding: identity here because requests
  # defaults to accepting compressed responses. This breaks in
  # a variety of ways depending on how the server is configured.
  # - Some servers will notice that the file isn't a compressible
  #   file and will leave the file alone and with an empty
  #   Content-Encoding
  # - Some servers will notice that the file is already
  #   compressed and will leave the file alone and will add a
  #   Content-Encoding: gzip header
  # - Some servers won't notice anything at all and will take
  #   a file that's already been compressed and compress it again
  #   and set the Content-Encoding: gzip header
```

The `files.pythonhosted.org` server is the 1st kind. Example debug log I
added in `uv` when installing against PyPI:

<img width="1459" alt="image"
src="https://github.com/astral-sh/uv/assets/12058921/ef10d758-46aa-4c8e-9dba-47f33437401b">

(there is no `content-encoding` header in this response, the `whl`
hasn't been compressed, and there is a content-length header)

Our internal mirror is the third case. It does seem sensible that our
mirror should be modified to act like the 1st kind. But `uv` should
handle all three cases like `pip` does.

### 2. buffer increase

In https://github.com/astral-sh/uv/issues/2220 I observed that `pip`'s
downloading was causing up-to 128KiB flushes in our mirror.

After fix 1, `uv` was still only causing up-to 8KiB flushes, and was
slower to download than `pip`. Increasing this buffer from the default
8KiB led to a download performance improvement against our mirror and
the expected observed 128KiB flushes.

## Test Plan

Ran benchmarking as instructed by @charliermarsh 

<img width="1447" alt="image"
src="https://github.com/astral-sh/uv/assets/12058921/840d9c8d-4b98-4bfa-89f3-073a2dec1f23">

No performance improvement or regression.
2024-03-09 19:49:29 -05:00
danieleades 8d721830db
Clippy pedantic (#1963)
Address a few pedantic lints

lints are separated into separate commits so they can be reviewed
individually.

I've not added enforcement for any of these lints, but that could be
added if desirable.
2024-02-25 14:04:05 -05:00
Charlie Marsh a1f50418fd
Avoid erroring for source distributions with symlinks in archive (#1944)
## Summary

For context, we have three extraction paths:

- untar (async) - used for any `.tar.gz`, local or remote.
- unzip (async) - used to unzip remote wheels, or local or remote source
distributions.
- unzip (sync) - used to untar locally-available wheels into the cache.

We use three different crates for these:

- [`tokio-tar`](https://github.com/vorot93/tokio-tar)
- [`async-zip`](https://github.com/Majored/rs-async-zip)
- [`zip-rs`](https://github.com/zip-rs/zip)

These all seem to have different support for symlinks:

- `tokio-tar` tries to create a symlink (which works fine on Unix but
errors on Windows, since we typically don't have elevated permissions).
- `async-zip` _seems_ to write the target contents directly to the file
(which is what we want).
- `zip-rs` _apparently_ writes the _name_ of the target to the file
(which isn't what we want).

Thankfully, symlinks are not allowed in wheels
(https://github.com/pypa/pip/issues/5919,
https://discuss.python.org/t/symbolic-links-in-wheels/1945), so we can
ignore `zip-rs`.

For `tokio-tar`, we now _skip_ (and warn) if we see a symlink on
Windows. We could do what pip does, and recursively copy, but it's
difficult because we don't have `Seek` on the file. (Alternatively, we
could use hard links and junctions, though those also might need to
exist already.) Let's see how far this gets us.

(We also no longer attempt to set permissions on symlinks on Unix, which
caused another failure.)

Closes https://github.com/astral-sh/uv/issues/1858.
2024-02-24 03:22:13 +00:00
Charlie Marsh 19890feb77
Preserve executable bit when untarring archives (#1790)
## Summary

Closes https://github.com/astral-sh/uv/issues/1767.
2024-02-21 14:18:44 +00:00
Andrew Gallant d6aaf7be30 uv-extract: call `target.as_ref()` once
This is just a style thing.
2024-02-20 10:57:51 -05:00
konsti 9a720a87c8
Only preserve the executable bit (#1743)
A file in a zip can set arbitrary unix permissions, but we, like pip,
want to preserve only the executable bit and otherwise use the OS
defaults.

This should be faster for wheels with many files since we now avoid the
blocking fs call to set the permissions in most cases.

Fixes #1740.
2024-02-20 16:41:05 +01:00
konsti f7722c02c1
Don't preserve timestamp in streaming unzip (#1749)
## Summary

Don't preserve mtime to work around alexcrichton/tar-rs#349. Same as
#634 except for the streaming unzip.

Fixes #1748.

## Test Plan

Added the tomli source dist as test case.
2024-02-20 14:54:28 +00:00
Zanie Blue 2586f655bb
Rename to `uv` (#1302)
First, replace all usages in files in-place. I used my editor for this.
If someone wants to add a one-liner that'd be fun.

Then, update directory and file names:

```
# Run twice for nested directories
find . -type d -print0 | xargs -0 rename s/puffin/uv/g
find . -type d -print0 | xargs -0 rename s/puffin/uv/g

# Update files
find . -type f -print0 | xargs -0 rename s/puffin/uv/g
```

Then add all the files again

```
# Add all the files again
git add crates
git add python/uv

# This one needs a force-add
git add -f crates/uv-trampoline
```
2024-02-15 11:19:46 -06:00