This PR is a combination of #12920 and #13754. Prior to these changes,
following a redirect when searching indexes would bypass our
authentication middleware. This PR updates uv to support propagating
credentials through our middleware on same-origin redirects and to
support netrc credentials for both same- and cross-origin redirects. It
does not handle the case described in #11097 where the redirect location
itself includes credentials (e.g.,
`https://user:pass@redirect-location.com`). That will be addressed in
follow-up work.
This includes unit tests for the new redirect logic and integration
tests for credential propagation. The automated external registries test
is also passing for AWS CodeArtifact, Azure Artifacts, GCP Artifact
Registry, JFrog Artifactory, GitLab, Cloudsmith, and Gemfury.
Using a companion change in the middleware
(https://github.com/TrueLayer/reqwest-middleware/pull/235, forked&tagged
pending review), we can check and show retries for HTTP status core
errors, to consistently report retries again.
We fix two cases:
* Show retries for status code errors for cache client requests
* Show retries for status code errors for Python download requests
Not handled:
* Show previous retries when a distribution download fails mid-streaming
* Perform retries when a distribution download fails mid-streaming
* Show previous retries when a Python download fails mid-streaming
* Perform retries when a Python download fails mid-streaming
Prior to this PR, there were numerous places where uv would leak
credentials in logs. We had a way to mask credentials by calling methods
or a recently-added `redact_url` function, but this was not secure by
default. There were a number of other types (like `GitUrl`) that would
leak credentials on display.
This PR adds a `DisplaySafeUrl` newtype to prevent leaking credentials
when logging by default. It takes a maximalist approach, replacing the
use of `Url` almost everywhere. This includes when first parsing config
files, when storing URLs in types like `GitUrl`, and also when storing
URLs in types that in practice will never contain credentials (like
`DirectorySourceUrl`). The idea is to make it easy for developers to do
the right thing and for the compiler to support this (and to minimize
ever having to manually convert back and forth). Displaying credentials
now requires an active step. Note that despite this maximalist approach,
the use of the newtype should be zero cost.
One conspicuous place this PR does not use `DisplaySafeUrl` is in the
`uv-auth` crate. That would require new clones since there are calls to
`request.url()` that return a `&Url`. One option would have been to make
`DisplaySafeUrl` wrap a `Cow`, but this would lead to lifetime
annotations all over the codebase. I've created a separate PR based on
this one (#13576) that updates `uv-auth` to use `DisplaySafeUrl` with
one new clone. We can discuss the tradeoffs there.
Most of this PR just replaces `Url` with `DisplaySafeUrl`. The core is
`uv_redacted/lib.rs`, where the newtype is implemented. To make it
easier to review the rest, here are some points of note:
* `DisplaySafeUrl` has a `Display` implementation that masks
credentials. Currently, it will still display the username when there is
both a username and password. If we think is the wrong choice, it can
now be changed in one place.
* `DisplaySafeUrl` has a `remove_credentials()` method and also a
`.to_string_with_credentials()` method. This allows us to use it in a
variety of scenarios.
* `IndexUrl::redacted()` was renamed to
`IndexUrl::removed_credentials()` to make it clearer that we are not
masking.
* We convert from a `DisplaySafeUrl` to a `Url` when calling `reqwest`
methods like `.get()` and `.head()`.
* We convert from a `DisplaySafeUrl` to a `Url` when creating a
`uv_auth::Index`. That is because, as mentioned above, I will be
updating the `uv_auth` crate to use this newtype in a separate PR.
* A number of tests (e.g., in `pip_install.rs`) that formerly used
filters to mask tokens in the test output no longer need those filters
since tokens in URLs are now masked automatically.
* The one place we are still knowingly writing credentials to
`pyproject.toml` is when a URL with credentials is passed to `uv add`
with `--raw`. Since displaying credentials is no longer automatic, I
have added a `to_string_with_credentials()` method to the `Pep508Url`
trait. This is used when `--raw` is passed. Adding it to that trait is a
bit weird, but it's the simplest way to achieve the goal. I'm open to
suggestions on how to improve this, but note that because of the way
we're using generic bounds, it's not as simple as just creating a
separate trait for that method.
This PR restores #13041 and integrates two PRs from @zanieb:
* #13038
* #13040
It also adds tests for relative URI and fragment handling.
Closes#13037.
---------
Co-authored-by: Zanie Blue <contact@zanie.dev>
uv was failing to authenticate on 302 redirects when credentials were
available. This was because it was relying on `reqwest_middleware`'s
default redirect behavior which bypasses the middleware pipeline when
trying the redirect request (and hence bypasses our authentication
middleware). This PR updates uv to retrigger the middleware pipeline
when handling a 302 redirect, correctly using credentials from the URL,
the keyring, or `.netrc`.
Closes#5595Closes#11097
## Summary
The reqwest middleware doesn't retry errors that occur "after" the
request completes -- but in some cases, these do include spurious errors
that we want to retry. See https://github.com/astral-sh/uv/issues/8144
for examples. This PR adds a second retry layer during the response
_handler_, which should help with some of the spurious failures we see
in the linked issue.
Closes https://github.com/astral-sh/uv/issues/8144.
Recently, rkyv 0.8 was released. Its API is a fair bit simpler now for
higher level uses (like for us in `uv`) and results in us being able to
delete a fair bit of code. This also removes our last dependency on `syn
1.0`, and thus drops that dependency.
Performance (via testing on the `transformers` example) seems to remain
about the same, which is what was expected:
```
$ hyperfine -w5 -r100 'uv lock' 'uv-ag-rkyv-update lock'
Benchmark 1: uv lock
Time (mean ± σ): 55.6 ms ± 6.4 ms [User: 30.4 ms, System: 35.1 ms]
Range (min … max): 43.0 ms … 73.1 ms 100 runs
Benchmark 2: uv-ag-rkyv-update lock
Time (mean ± σ): 56.5 ms ± 7.2 ms [User: 30.5 ms, System: 36.3 ms]
Range (min … max): 39.1 ms … 71.5 ms 100 runs
Summary
uv lock ran
1.02 ± 0.18 times faster than uv-ag-rkyv-update lock
```
Closes#7415
## Summary
This PR revives https://github.com/astral-sh/uv/pull/4944, which I think
was a good start towards adding `--trusted-host`. Last night, I tried to
add `--trusted-host` with a custom verifier, but we had to vendor a lot
of `reqwest` code and I eventually hit some private APIs. I'm not
confident that I can implement it correctly with that mechanism, and
since this is security, correctness is the priority.
So, instead, we now use two clients and multiplex between them.
Closes https://github.com/astral-sh/uv/issues/1339.
## Test Plan
Created self-signed certificate, and ran `python3 -m http.server --bind
127.0.0.1 4443 --directory . --certfile cert.pem --keyfile key.pem` from
the packse index directory.
Verified that `cargo run pip install
transitive-yanked-and-unyanked-dependency-a-0abad3b6 --index-url
https://127.0.0.1:8443/simple-html` failed with:
```
error: Request failed after 3 retries
Caused by: error sending request for url (https://127.0.0.1:8443/simple-html/transitive-yanked-and-unyanked-dependency-a-0abad3b6/)
Caused by: client error (Connect)
Caused by: invalid peer certificate: Other(OtherError(CaUsedAsEndEntity))
```
Verified that `cargo run pip install
transitive-yanked-and-unyanked-dependency-a-0abad3b6 --index-url
'https://127.0.0.1:8443/simple-html' --trusted-host '127.0.0.1:8443'`
failed with the expected error (invalid resolution) and made valid
requests.
Verified that `cargo run pip install
transitive-yanked-and-unyanked-dependency-a-0abad3b6 --index-url
'https://127.0.0.1:8443/simple-html' --trusted-host '127.0.0.2' -n` also
failed.
## Summary
All of the resolver code is run on the main thread, so a lot of the
`Send` bounds and uses of `DashMap` and `Arc` are unnecessary. We could
also switch to using single-threaded versions of `Mutex` and `Notify` in
some places, but there isn't really a crate that provides those I would
be comfortable with using.
The `Arc` in `OnceMap` can't easily be removed because of the uv-auth
code which uses the
[reqwest-middleware](https://docs.rs/reqwest-middleware/latest/reqwest_middleware/trait.Middleware.html)
crate, that seems to adds unnecessary `Send` bounds because of
`async-trait`. We could duplicate the code and create a `OnceMapLocal`
variant, but I don't feel that's worth it.
## Summary
This PR adds support for hash-checking mode in `pip install` and `pip
sync`. It's a large change, both in terms of the size of the diff and
the modifications in behavior, but it's also one that's hard to merge in
pieces (at least, with any test coverage) since it needs to work
end-to-end to be useful and testable.
Here are some of the most important highlights:
- We store hashes in the cache. Where we previously stored pointers to
unzipped wheels in the `archives` directory, we now store pointers with
a set of known hashes. So every pointer to an unzipped wheel also
includes its known hashes.
- By default, we don't compute any hashes. If the user runs with
`--require-hashes`, and the cache doesn't contain those hashes, we
invalidate the cache, redownload the wheel, and compute the hashes as we
go. For users that don't run with `--require-hashes`, there will be no
change in performance. For users that _do_, the only change will be if
they don't run with `--generate-hashes` -- then they may see some
repeated work between resolution and installation, if they use `pip
compile` then `pip sync`.
- Many of the distribution types now include a `hashes` field, like
`CachedDist` and `LocalWheel`.
- Our behavior is similar to pip, in that we enforce hashes when pulling
any remote distributions, and when pulling from our own cache. Like pip,
though, we _don't_ enforce hashes if a distribution is _already_
installed.
- Hash validity is enforced in a few different places:
1. During resolution, we enforce hash validity based on the hashes
reported by the registry. If we need to access a source distribution,
though, we then enforce hash validity at that point too, prior to
running any untrusted code. (This is enforced in the distribution
database.)
2. In the install plan, we _only_ add cached distributions that have
matching hashes. If a cached distribution is missing any hashes, or the
hashes don't match, we don't return them from the install plan.
3. In the downloader, we _only_ return distributions with matching
hashes.
4. The final combination of "things we install" are: (1) the wheels from
the cache, and (2) the downloaded wheels. So this ensures that we never
install any mismatching distributions.
- Like pip, if `--require-hashes` is provided, we require that _all_
distributions are pinned with either `==` or a direct URL. We also
require that _all_ distributions have hashes.
There are a few notable TODOs:
- We don't support hash-checking mode for unnamed requirements. These
should be _somewhat_ rare, though? Since `pip compile` never outputs
unnamed requirements. I can fix this, it's just some additional work.
- We don't automatically enable `--require-hashes` with a hash exists in
the requirements file. We require `--require-hashes`.
Closes#474.
## Test Plan
I'd like to add some tests for registries that report incorrect hashes,
but otherwise: `cargo test`
## Summary
We're seeing reports that Sonatype Nexus isn't working with cached data.
Users are reporting 304 responses that show "Found modified response..."
path in the logs. I can't reproduce this on latest Sonatype Nexus, but
my best guess is that there's a 304 response that is failing our
validators, and we try to use that as if it's a "complete" response?
Closes https://github.com/astral-sh/uv/issues/1754.
Error for `uv pip compile scripts/requirements/jupyter.in` without
internet:
**Before**
```
error: error sending request for url (https://pypi.org/simple/jupyter/): error trying to connect: dns error: failed to lookup address information: No such host is known. (os error 11001)
Caused by: error trying to connect: dns error: failed to lookup address information: No such host is known. (os error 11001)
Caused by: dns error: failed to lookup address information: No such host is known. (os error 11001)
Caused by: failed to lookup address information: No such host is known. (os error 11001)
```
**After**
```
error: Could not connect, are you offline?
Caused by: error sending request for url (https://pypi.org/simple/django/): error trying to connect: dns error: failed to lookup address information: Temporary failure in name resolution
Caused by: error trying to connect: dns error: failed to lookup address information: Temporary failure in name resolution
Caused by: dns error: failed to lookup address information: Temporary failure in name resolution
Caused by: failed to lookup address information: Temporary failure in name resolution
```
On linux, it would be "Temporary failure in name resolution" instead of
"No such host is known. (os error 11001)".
The implementation checks for "dne error" stringly as hyper errors are
opaque. The danger is that this breaks with a hyper update. We still get
the complete error trace since reqwest eagerly inlines errors
(https://github.com/seanmonstar/reqwest/issues/2147).
No test since i wouldn't know how to simulate this in cargo test.
Fixes#1971
Address a few pedantic lints
lints are separated into separate commits so they can be reviewed
individually.
I've not added enforcement for any of these lints, but that could be
added if desirable.
A WARN log was being emitted for a "broken cache entry" in the case
where the cache entry simply doesn't exist. But this is totally fine and
expected. So we detect the kind of error that occurred and emit a TRACE
if the file simply didn't exist.
This PR introduces more robust cache healing when `uv` fails to
deserialize an existing cache entry.
("Cache healing" in this context means that if `uv` fails to
deserialize a cache entry, then it will automatically invalidate that
entry and re-generate the data. Typically by sending an HTTP request.)
Previous to some optimizations I made around deserialization, we were
already doing this. After those optimizations, deserializing a cache
policy and the payload were split into two steps. While deserializing
a cache policy retained its cache healing behavior, deserializing the
payload did not. This became an issue when #1556 landed, which changed
one of our `rkyv` data types. This in turn made our internal types
incompatible with existing cache entries. One could work-around this
by clearing `uv`'s cache with `uv clean`, but we should just do it
automatically on a cache entry by entry basis.
This does technically introduce a new cost by pessimistically cloning
the HTTP request so that we can re-send it if necessary (see the commit
messages for the knot pushing me toward this approach). So I re-ran my
favorite ad-hoc benchmark:
```
$ hyperfine -w10 --runs 50 "uv-main pip compile --cache-dir ~/astral/tmp/cache-main ~/astral/tmp/reqs/home-assistant-reduced.in -o /dev/null" "uv-test pip compile --cache-dir ~/astral/tmp/cache-test ~/astral/tmp/reqs/home-assistant-reduced.in -o /dev/null" ; A bart
Benchmark 1: uv-main pip compile --cache-dir ~/astral/tmp/cache-main ~/astral/tmp/reqs/home-assistant-reduced.in -o /dev/null
Time (mean ± σ): 114.4 ms ± 3.2 ms [User: 149.4 ms, System: 221.5 ms]
Range (min … max): 106.7 ms … 122.0 ms 50 runs
Benchmark 2: uv-test pip compile --cache-dir ~/astral/tmp/cache-test ~/astral/tmp/reqs/home-assistant-reduced.in -o /dev/null
Time (mean ± σ): 114.0 ms ± 3.0 ms [User: 146.0 ms, System: 223.3 ms]
Range (min … max): 105.3 ms … 121.4 ms 50 runs
Summary
uv-test pip compile --cache-dir ~/astral/tmp/cache-test ~/astral/tmp/reqs/home-assistant-reduced.in -o /dev/null ran
1.00 ± 0.04 times faster than uv-main pip compile --cache-dir ~/astral/tmp/cache-main ~/astral/tmp/reqs/home-assistant-reduced.in -o /dev/null
```
Which is about what I expected.
We should endeavor to have a better testing strategy for these kinds of
bugs, but I think it might be a little tricky to do. I created
https://github.com/astral-sh/uv/issues/1699 to track that.
Fixes#1571
First, replace all usages in files in-place. I used my editor for this.
If someone wants to add a one-liner that'd be fun.
Then, update directory and file names:
```
# Run twice for nested directories
find . -type d -print0 | xargs -0 rename s/puffin/uv/g
find . -type d -print0 | xargs -0 rename s/puffin/uv/g
# Update files
find . -type f -print0 | xargs -0 rename s/puffin/uv/g
```
Then add all the files again
```
# Add all the files again
git add crates
git add python/uv
# This one needs a force-add
git add -f crates/uv-trampoline
```