Venvs should not be in source distributions, and on Unix, we now reject
them for containing a link that points outside the source directory. This
PR adds a hint for that case, since users were confused (#15096).
In the process, we're differentiating IO errors from format errors during
decompression generally.
Fixes #15096
## Summary
uv will now reject ZIP files that meet any of the following conditions:
- Multiple local header entries exist for the same file with different
contents.
- A local header entry exists for a file that isn't included in the
end-of-central directory record.
- An entry exists in the end-of-central directory record that does not
have a corresponding local header.
- The ZIP file contains contents after the first end-of-central
directory record.
- The CRC32 doesn't match between the local file header and the
end-of-central directory record.
- The compressed size doesn't match between the local file header and
the end-of-central directory record.
- The uncompressed size doesn't match between the local file header and
the end-of-central directory record.
- The reported central directory offset (in the end-of-central directory
record) does not match the actual offset.
- The reported ZIP64 end of central directory locator offset does not
match the actual offset.
We also validate the above for files with data descriptors, which we
previously ignored.
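For illustration, here's a minimal Rust sketch of one of these checks: locating the end-of-central directory record and verifying that the central directory offset it reports lines up with where the record actually sits. This is an assumption-laden sketch, not uv's implementation, and it ignores ZIP64 and multi-disk archives:
```rust
/// Illustrative sketch (not uv's implementation): find the end-of-central
/// directory (EOCD) record and check that the central directory offset it
/// reports is consistent with where the EOCD record actually begins.
/// ZIP64 archives (where these fields are sentinel values) are ignored here.
fn check_central_directory_offset(zip: &[u8]) -> Result<(), String> {
    const EOCD_SIGNATURE: [u8; 4] = [0x50, 0x4b, 0x05, 0x06]; // "PK\x05\x06"
    const EOCD_MIN_SIZE: usize = 22;

    if zip.len() < EOCD_MIN_SIZE {
        return Err("file too small to contain an EOCD record".to_string());
    }

    // Scan backwards: the EOCD record may be followed by a trailing comment.
    let eocd_offset = (0..=zip.len() - EOCD_MIN_SIZE)
        .rev()
        .find(|&i| zip[i..i + 4] == EOCD_SIGNATURE)
        .ok_or("missing end-of-central directory record")?;

    let eocd = &zip[eocd_offset..];
    let read_u32 = |at: usize| u32::from_le_bytes(eocd[at..at + 4].try_into().unwrap());
    let cd_size = read_u32(12) as usize; // size of the central directory in bytes
    let cd_offset = read_u32(16) as usize; // reported central directory offset

    // The central directory must end exactly where the EOCD record begins.
    if cd_offset.checked_add(cd_size) != Some(eocd_offset) {
        return Err(format!(
            "central directory offset mismatch: reported {cd_offset}, expected {}",
            eocd_offset.saturating_sub(cd_size)
        ));
    }
    Ok(())
}
```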
Wheels from the most recent releases of the top 15,000 packages on PyPI
have been confirmed to pass these checks, and PyPI will also reject ZIPs
under many of the same conditions (at upload time) in the future.
In rare cases, this validation can be disabled by setting
`UV_INSECURE_NO_ZIP_VALIDATION=1`. Any validation failures should be
reported to the uv issue tracker and to the upstream package maintainer.
Fixes #12618
Instead of succeeding, the user now gets:
```
uvdloc pip install osqp==1.0.2 --reinstall --python-platform=linux
Resolved 7 packages in 171ms
× Failed to download `osqp==1.0.2`
├─▶ Failed to extract archive
╰─▶ a computed CRC32 value did not match the expected value
```
I am not entirely sure if we have infra for testing this kind of thing,
but it would be nice to check in a test or two. I'm also not entirely
clear on whether there are any cases where these checks are overzealous.
## Summary
Work around the `stream_wheel` no-retry issue
[found](https://github.com/astral-sh/uv/issues/3514#issuecomment-2229820667)
in #3514. It's not a perfect solution, but I think it's acceptable
because the error should not occur frequently.
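One generic shape such a workaround can take is to retry the fallible streaming operation a bounded number of times. The sketch below is purely illustrative; the names, backoff, and retry policy are invented, and it is not uv's actual code:
```rust
use std::time::Duration;

/// Illustrative retry wrapper: `fetch` stands in for the fallible streaming
/// download; the real code decides retryability from the underlying error.
fn fetch_with_retries<T, E: std::fmt::Display>(
    mut fetch: impl FnMut() -> Result<T, E>,
    max_retries: u32,
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match fetch() {
            Ok(value) => return Ok(value),
            Err(err) if attempt < max_retries => {
                attempt += 1;
                eprintln!("attempt {attempt} failed ({err}), retrying...");
                // Back off briefly before retrying (a real client would use
                // exponential backoff with jitter).
                std::thread::sleep(Duration::from_millis(100 * u64::from(attempt)));
            }
            Err(err) => return Err(err),
        }
    }
}
```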
## Test Plan
Manually using `iptables -A OUTPUT -p tcp --dport 3128 -j REJECT
--reject-with tcp-reset` to inject connection reset errors into connections
to the HTTP proxy that proxies PyPI requests.
```
error: Failed to prepare distributions
Caused by: Failed to fetch wheel: piqp==0.4.1
Caused by: Request failed after 3 retries
Caused by: error sending request for url (09ade94dfdd3c368ac505b6ca09831/piqp-0.4.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl)
Caused by: client error (Connect)
Caused by: tcp connect error: Connection refused (os error 111)
Caused by: Connection refused (os error 111)
```
## Summary
Some zip files can't be streamed; in particular, `rs-async-zip` doesn't
support data descriptors right now (though it may in the future). This
PR adds a fallback path for such zips that downloads the entire zip file
to disk, then unzips it from disk (which gives us `Seek`).
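A minimal sketch of that control flow, with all names invented for illustration (the real change wires this into uv's download and unzip machinery):
```rust
use std::io;

/// Hypothetical error from a streaming extractor, distinguishing archives
/// that can't be streamed (e.g. entries using data descriptors) from real
/// I/O failures. All names here are invented for illustration.
enum StreamUnzipError {
    DataDescriptorsUnsupported,
    Io(io::Error),
}

/// Sketch of the fallback: prefer streaming extraction, but if the zip can't
/// be streamed, download the whole file to disk and extract it from there,
/// where `Seek` is available.
fn unzip_wheel(
    try_stream: impl FnOnce() -> Result<(), StreamUnzipError>,
    download_then_unzip: impl FnOnce() -> io::Result<()>,
) -> io::Result<()> {
    match try_stream() {
        Ok(()) => Ok(()),
        Err(StreamUnzipError::DataDescriptorsUnsupported) => download_then_unzip(),
        Err(StreamUnzipError::Io(err)) => Err(err),
    }
}
```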
Closes https://github.com/astral-sh/uv/issues/2216.
## Test Plan
`cargo run pip install --extra-index-url https://buf.build/gen/python
hashb_foxglove_protocolbuffers_python==25.3.0.1.20240226043130+465630478360
--force-reinstall -n`
## Summary
It turns out that `hexdump` uses an invalid source distribution format
whereby the contents aren't nested in a top-level directory -- instead,
they're all just flattened at the top level. Looking at pip's source
(51de88ca64/src/pip/_internal/utils/unpacking.py (L62)),
it only strips the top-level directory if all entries share the same
directory prefix (i.e., if that directory is the only thing at the top
level of the archive). This PR accommodates these "invalid" distributions.
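To make the rule concrete, here's a hedged Rust sketch of that pip-style check, assuming the archive's entry paths are already available as a list; it's illustrative only, not the actual implementation:
```rust
use std::path::{Component, Path, PathBuf};

/// Sketch of pip's rule: strip the leading directory from archive entries
/// only if every entry lives under the same top-level directory. Otherwise
/// (as with `hexdump`, where files sit directly at the top level), leave
/// the paths untouched.
fn strip_common_top_level(entries: &[PathBuf]) -> Vec<PathBuf> {
    let first_component = |path: &Path| -> Option<String> {
        match path.components().next() {
            Some(Component::Normal(name)) => Some(name.to_string_lossy().into_owned()),
            _ => None,
        }
    };

    let Some(prefix) = entries.first().and_then(|path| first_component(path)) else {
        return entries.to_vec();
    };

    // Only strip if *all* entries share that first component and actually
    // have something beneath it.
    let all_share_prefix = entries.iter().all(|path| {
        first_component(path).as_deref() == Some(prefix.as_str())
            && path.components().count() > 1
    });

    if all_share_prefix {
        entries
            .iter()
            .map(|path| path.components().skip(1).collect::<PathBuf>())
            .collect()
    } else {
        entries.to_vec()
    }
}
```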
I can't find any history on this method in `pip`. It looks like it dates
back over 15 years, to before `pip` was even called `pip`.
Closes https://github.com/astral-sh/uv/issues/1376.
This PR improves the error message for the problem described in
https://github.com/astral-sh/uv/issues/1376. The original output
duplicates the actual error message and includes lots of noise
(`DirEntry { inner: DirEntry(...) }`).
```
$ uv pip install hexdump==3.3
error: Failed to download and build: hexdump==3.3
Caused by: Failed to extract source distribution: The top level of the archive must only contain a list directory, but it contains: [DirEntry { inner: DirEntry("/home/robin/.cache/uv/.tmpgSvTCk/__main__.py") }, DirEntry { inner: DirEntry("/home/robin/.cache/uv/.tmpgSvTCk/hexdump.py") }, DirEntry { inner: DirEntry("/home/robin/.cache/uv/.tmpgSvTCk/data") }, DirEntry { inner: DirEntry("/home/robin/.cache/uv/.tmpgSvTCk/PKG-INFO") }, DirEntry { inner: DirEntry("/home/robin/.cache/uv/.tmpgSvTCk/setup.py") }, DirEntry { inner: DirEntry("/home/robin/.cache/uv/.tmpgSvTCk/README.txt") }]
Caused by: The top level of the archive must only contain a list directory, but it contains: [DirEntry { inner: DirEntry("/home/robin/.cache/uv/.tmpgSvTCk/__main__.py") }, DirEntry { inner: DirEntry("/home/robin/.cache/uv/.tmpgSvTCk/hexdump.py") }, DirEntry { inner: DirEntry("/home/robin/.cache/uv/.tmpgSvTCk/data") }, DirEntry { inner: DirEntry("/home/robin/.cache/uv/.tmpgSvTCk/PKG-INFO") }, DirEntry { inner: DirEntry("/home/robin/.cache/uv/.tmpgSvTCk/setup.py") }, DirEntry { inner: DirEntry("/home/robin/.cache/uv/.tmpgSvTCk/README.txt") }]
```
This PR removes the duplication and `DirEntry` internals so that the
error message is easier to grasp:
```
$ uv pip install hexdump==3.3
error: Failed to download and build: hexdump==3.3
Caused by: Failed to extract source distribution
Caused by: The top level of the archive must only contain a list directory, but it contains: ["__main__.py", "hexdump.py", "data", "PKG-INFO", "setup.py", "README.txt"]
```
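The gist of the change, as a hedged sketch (function name invented): collect plain file names for the message rather than formatting the `DirEntry` values with `Debug`:
```rust
use std::fs;
use std::io;
use std::path::Path;

/// Sketch: gather just the file names at the top level of the extracted
/// archive, instead of leaking `DirEntry`'s `Debug` output (which repeats
/// the full temporary paths and internal structure).
fn top_level_entries(dir: &Path) -> io::Result<Vec<String>> {
    let mut names = Vec::new();
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        names.push(entry.file_name().to_string_lossy().into_owned());
    }
    Ok(names)
}
```
Formatting the resulting `Vec<String>` with `{:?}` yields the compact `["__main__.py", "hexdump.py", ...]` list shown above.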
First, replace all usages in files in-place. I used my editor for this.
If someone wants to add a one-liner, that'd be fun.
Then, update directory and file names:
```
# Run twice for nested directories
find . -type d -print0 | xargs -0 rename s/puffin/uv/g
find . -type d -print0 | xargs -0 rename s/puffin/uv/g
# Update files
find . -type f -print0 | xargs -0 rename s/puffin/uv/g
```
Then add all the files again:
```
# Add all the files again
git add crates
git add python/uv
# This one needs a force-add
git add -f crates/uv-trampoline
```