## Summary
* Upgrade the rust toolchain to 1.85.0. This does not increase the MSRV.
* Update windows trampoline to 1.86 nightly beta (previously in 1.85
nightly beta).
## Test Plan
Existing tests
With the parallel simple index fetching, we would only acquire one
download concurrency token, meaning that we could in the worst case make
times the number of indexes more requests than the user requested limit.
We fix this by passing the semaphore down to the simple API method.
## Summary
This lets us drop a dependency entirely. `percent-encoding` is used by
`url` and so is already in the graph, whereas `urlencoding` isn't used
by anything else.
## Summary
I'm not a fan of registries including fragments here that aren't hashes,
but the spec doesn't expressly forbid it. I think it's reasonable to
ignore them.
Specifically, the spec is here:
https://packaging.python.org/en/latest/specifications/simple-repository-api/.
It says that:
> The URL **SHOULD** include a hash in the form of a URL fragment with
the following syntax: `#<hashname>=<hashvalue>`, where `<hashname>`he
lowercase name of the hash function (such as sha256) and `<hashvalue>`
is the hex encoded digest.
But it doesn't mention other fragments.
Closes https://github.com/astral-sh/uv/issues/7257.
## Summary
This PR migrates all of our PyTorch tests to use our own mirror, which
includes upload timestamps that we can use to enforce
`--excludes-newer`, making the tests far more stable over time. (Today,
if you checkout old versions of `uv`, many of the PyTorch tests will
fail, since the index contents drift over time.)
Some snapshots changed in this PR (see, e.g.,
`universal_nested_overlapping_local_requirement`). The underlying reason
is that I used the current timestamp when setting upload times in the
PyTorch mirror, but those tests read from both the PyTorch
`--find-links` index _and_ PyPI. I guess we don't omit `--find-links`
entries based on `--excludes-newer`? That might be a bug. But I had to
_increase_ the `--excludes-newer` to include the PyTorch mirror's
`--find-links`, which meant pulling in some newer packages from PyPI
too. This is fine: it's a one-time churn, and they'll be stable going
forward.
## Summary
On Windows, we have a lot of issues with atomic replacement and such.
There are a bunch of different failure modes, but they generally
involve: trying to persist a fail to a path at which the file already
exists, trying to replace or remove a file while someone else is reading
it, etc.
This PR adds locks to all of the relevant database paths. We already use
these advisory locks when building source distributions; now we use them
when unzipping wheels, storing metadata, etc.
Closes#11002.
## Test Plan
I ran the following script:
```shell
# Define the cache directory path
$cacheDir = "C:\Users\crmar\workspace\uv\cache"
# Clear the cache directory if it exists
if (Test-Path $cacheDir) {
Remove-Item -Recurse -Force $cacheDir
}
# Create the cache directory again
New-Item -ItemType Directory -Force -Path $cacheDir
# Define the command to run with --cache-dir flag
$command = {
param ($venvPath)
# Create a virtual environment in the specified path with --python
uv venv $venvPath
# Run the pip install command with --cache-dir flag
C:\Users\crmar\workspace\uv\target\profiling\uv.exe pip install flask==1.0.4 --no-binary flask --cache-dir C:\Users\crmar\workspace\uv\cache -v --python $venvPath
}
# Define the paths for the different virtual environments
$venv1 = "C:\Users\crmar\workspace\uv\venv1"
$venv2 = "C:\Users\crmar\workspace\uv\venv2"
$venv3 = "C:\Users\crmar\workspace\uv\venv3"
$venv4 = "C:\Users\crmar\workspace\uv\venv4"
$venv5 = "C:\Users\crmar\workspace\uv\venv5"
# Start the command in parallel five times using Start-Job, each with a different venv
$job1 = Start-Job -ScriptBlock $command -ArgumentList $venv1
$job2 = Start-Job -ScriptBlock $command -ArgumentList $venv2
$job3 = Start-Job -ScriptBlock $command -ArgumentList $venv3
$job4 = Start-Job -ScriptBlock $command -ArgumentList $venv4
$job5 = Start-Job -ScriptBlock $command -ArgumentList $venv5
# Wait for all jobs to complete
$jobs = @($job1, $job2, $job3, $job4, $job5)
$jobs | ForEach-Object { Wait-Job $_ }
# Retrieve the results (optional)
$jobs | ForEach-Object { Receive-Job -Job $_ }
# Clean up the jobs
$jobs | ForEach-Object { Remove-Job -Job $_ }
```
And ensured it succeeded in five straight invocations (whereas on
`main`, it consistently fails with a variety of different traces).
## Summary
This should address the comment here:
https://github.com/astral-sh/uv/pull/10179#issuecomment-2569189265. We
don't compute implied markers if the marker is already `TRUE`, and we
set it to `TRUE` as soon as we see a source distribution. So if we visit
the source distribution before the wheels, we'll avoid computing these
for any irrelevant distributions.
## Summary
So the error here is:
```rust
ExtractError("cpython-3.11.11%2B20241206-aarch64-apple-darwin-install_only_stripped.tar.gz", Io(Custom { kind: UnexpectedEof, error: TarError { desc: "failed to unpack `/Users/crmarsh/.local/share/uv/python/.cache/.tmpkqFzqE/python/lib/libpython3.11.dylib`", io: Custom { kind: UnexpectedEof, error: TarError { desc: "failed to unpack `python/lib/libpython3.11.dylib` into `/Users/crmarsh/.local/share/uv/python/.cache/.tmpkqFzqE/python/lib/libpython3.11.dylib`", io: Custom { kind: UnexpectedEof, error: "unexpected end of file" } } } } }))
```
This isn't a Reqwest error, so we miss it in
`is_extended_transient_error`.
We could add `TarError` or `ExtractError` here, but... should we? This
PR just extends it to any error that has an IO source. I don't see much
of a downside.
Closes https://github.com/astral-sh/uv/issues/9747.
## Test Plan
First, ran: `uv run ./scripts/create-python-mirror.py --name cpython
--arch aarch64 --os darwin`.
Then, dropped this into `./scripts/mirror/server.py`:
```python
import os
import random
from http.server import SimpleHTTPRequestHandler, HTTPServer
class GlitchyStaticServer(SimpleHTTPRequestHandler):
def do_GET(self):
"""Handle GET request."""
file_path = self.translate_path(self.path)
if not os.path.exists(file_path):
self.send_error(404, "File not found")
return
try:
with open(file_path, 'rb') as f:
file_content = f.read()
# Introduce an "unexpected end of file" glitch randomly
if random.random() < 0.75: # 75% chance of glitch
glitch_point = random.randint(1, len(file_content) - 1)
file_content = file_content[:glitch_point]
self.send_response(200)
self.send_header("Content-type", self.guess_type(file_path))
self.send_header("Content-Length", len(file_content))
self.end_headers()
self.wfile.write(file_content)
except Exception as e:
self.send_error(500, f"Internal Server Error: {e}")
def run(server_class=HTTPServer, handler_class=GlitchyStaticServer, port=8080):
"""Run the server."""
server_address = ('', port)
httpd = server_class(server_address, handler_class)
print(f"Serving on port {port} with glitchy behavior")
httpd.serve_forever()
if __name__ == "__main__":
run()
```
Then ran `python server.py` from that directory.
From there, ran `UV_PYTHON_INSTALL_MIRROR="http://localhost:8080" cargo
run python install 3.11 --reinstall --verbose` to reliably test retries.
## Summary
A lot of good new lints, and most importantly, error stabilizations. I
tried to find a few usages of the new stabilizations, but I'm sure there
are more.
IIUC, this _does_ require bumping our MSRV.
## Summary
It turns out that `WrappedReqwestError` skips the `reqwest::Error`
itself in order to hack the display. This PR adds it to the list of
variants we check when retrying transient errors.
Closes https://github.com/astral-sh/uv/issues/9246.
## Test Plan
Patched `reqwest` locally to return an error in `bytes()`. Verified that
it was _not_ caught prior to this PR, but was caught afterwards.
## Summary
The reqwest middleware doesn't retry errors that occur "after" the
request completes -- but in some cases, these do include spurious errors
that we want to retry. See https://github.com/astral-sh/uv/issues/8144
for examples. This PR adds a second retry layer during the response
_handler_, which should help with some of the spurious failures we see
in the linked issue.
Closes https://github.com/astral-sh/uv/issues/8144.
## Summary
These were moved as part of a broader refactor to create a single
integration test module. That "single integration test module" did
indeed have a big impact on compile times, which is great! But we aren't
seeing any benefit from moving these tests into their own files (despite
the claim in [this blog
post](https://matklad.github.io/2021/02/27/delete-cargo-integration-tests.html),
I see the same compilation pattern regardless of where the tests are
located). Plus, we don't have many of these, and same-file tests is such
a strong Rust convention.
This PR is a revival of #3502, albeit in a much simpler form.
This would allow for different middlewares like authentication and such,
useful for if you want to deviate from the keychain authentication
methods when using uv as a library.
@zanieb I hope I made the changes as you noted you wanted to see them :)
Happy to add/change anything you need.
This still utilizes the RFC 2822 datetime formatter, but utilizes new
methods [added in jiff 0.1.14] to emit timestamps in a format strictly
compatible with RFC 9110.
It seems like most HTTP servers were pretty flexible and supported RFC
2822 datetime formats, but #8747 shows at least one case where that
isn't true. Given that the [MDN docs prescribe RFC 9110], we defer to
them.
Fixes#8747
[added in jiff 0.1.14]: https://github.com/BurntSushi/jiff/pull/154
[MDN docs prescribe RFC 9110]:
https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/If-Modified-Since
## Summary
This PR adds a first-class API for defining registry indexes, beyond our
existing `--index-url` and `--extra-index-url` setup.
Specifically, you now define indexes like so in a `uv.toml` or
`pyproject.toml` file:
```toml
[[tool.uv.index]]
name = "pytorch"
url = "https://download.pytorch.org/whl/cu121"
```
You can also provide indexes via `--index` and `UV_INDEX`, and override
the default index with `--default-index` and `UV_DEFAULT_INDEX`.
### Index priority
Indexes are prioritized in the order in which they're defined, such that
the first-defined index has highest priority.
Indexes are also inherited from parent configuration (e.g., the
user-level `uv.toml`), but are placed after any indexes in the current
project, matching our semantics for other array-based configuration
values.
You can mix `--index` and `--default-index` with the legacy
`--index-url` and `--extra-index-url` settings; the latter two are
merely treated as unnamed `[[tool.uv.index]]` entries.
### Index pinning
If an index includes a name (which is optional), it can then be
referenced via `tool.uv.sources`:
```toml
[[tool.uv.index]]
name = "pytorch"
url = "https://download.pytorch.org/whl/cu121"
[tool.uv.sources]
torch = { index = "pytorch" }
```
If an index is marked as `explicit = true`, it can _only_ be used via
such references, and will never be searched implicitly:
```toml
[[tool.uv.index]]
name = "pytorch"
url = "https://download.pytorch.org/whl/cu121"
explicit = true
[tool.uv.sources]
torch = { index = "pytorch" }
```
Indexes defined outside of the current project (e.g., in the user-level
`uv.toml`) can _not_ be explicitly selected.
(As of now, we only support using a single index for a given
`tool.uv.sources` definition.)
### Default index
By default, we include PyPI as the default index. This remains true even
if the user defines a `[[tool.uv.index]]` -- PyPI is still used as a
fallback. You can mark an index as `default = true` to (1) disable the
use of PyPI, and (2) bump it to the bottom of the prioritized list, such
that it's used only if a package does not exist on a prior index:
```toml
[[tool.uv.index]]
name = "pytorch"
url = "https://download.pytorch.org/whl/cu121"
default = true
```
### Name reuse
If a name is reused, the higher-priority index with that name is used,
while the lower-priority indexes are ignored entirely.
For example, given:
```toml
[[tool.uv.index]]
name = "pytorch"
url = "https://download.pytorch.org/whl/cu121"
[[tool.uv.index]]
name = "pytorch"
url = "https://test.pypi.org/simple"
```
The `https://test.pypi.org/simple` index would be ignored entirely,
since it's lower-priority than `https://download.pytorch.org/whl/cu121`
but shares the same name.
Closes#171.
## Future work
- Users should be able to provide authentication for named indexes via
environment variables.
- `uv add` should automatically write `--index` entries to the
`pyproject.toml` file.
- Users should be able to provide multiple indexes for a given package,
stratified by platform:
```toml
[tool.uv.sources]
torch = [
{ index = "cpu", markers = "sys_platform == 'darwin'" },
{ index = "gpu", markers = "sys_platform != 'darwin'" },
]
```
- Users should be able to specify a proxy URL for a given index, to
avoid writing user-specific URLs to a lockfile:
```toml
[[tool.uv.index]]
name = "test"
url = "https://private.org/simple"
proxy = "http://<omitted>/pypi/simple"
```
## Summary
This PR declares and documents all environment variables that are used
in one way or another in `uv`, either internally, or externally, or
transitively under a common struct.
I think over time as uv has grown there's been many environment
variables introduced. Its harder to know which ones exists, which ones
are missing, what they're used for, or where are they used across the
code. The docs only documents a handful of them, for others you'd have
to dive into the code and inspect across crates to know which crates
they're used on or where they're relevant.
This PR is a starting attempt to unify them, make it easier to discover
which ones we have, and maybe unlock future posibilities in automating
generating documentation for them.
I think we can split out into multiple structs later to better organize,
but given the high influx of PR's and possibly new environment variables
introduced/re-used, it would be hard to try to organize them all now
into their proper namespaced struct while this is all happening given
merge conflicts and/or keeping up to date.
I don't think this has any impact on performance as they all should
still be inlined, although it may affect local build times on changes to
the environment vars as more crates would likely need a rebuild. Lastly,
some of them are declared but not used in the code, for example those in
`build.rs`. I left them declared because I still think it's useful to at
least have a reference.
Did I miss any? Are their initial docs cohesive?
Note, `uv-static` is a terrible name for a new crate, thoughts? Others
considered `uv-vars`, `uv-consts`.
## Test Plan
Existing tests
As per
https://matklad.github.io/2021/02/27/delete-cargo-integration-tests.html
Before that, there were 91 separate integration tests binary.
(As discussed on Discord — I've done the `uv` crate, there's still a few
more commits coming before this is mergeable, and I want to see how it
performs in CI and locally).
## Summary
Random, but I noticed that we can remove a ton of serialize and
deserialize derives by using `rkyv` for the flat-index caches. (We
already use `rkyv` for these same structs in the registry cache.)
Recently, rkyv 0.8 was released. Its API is a fair bit simpler now for
higher level uses (like for us in `uv`) and results in us being able to
delete a fair bit of code. This also removes our last dependency on `syn
1.0`, and thus drops that dependency.
Performance (via testing on the `transformers` example) seems to remain
about the same, which is what was expected:
```
$ hyperfine -w5 -r100 'uv lock' 'uv-ag-rkyv-update lock'
Benchmark 1: uv lock
Time (mean ± σ): 55.6 ms ± 6.4 ms [User: 30.4 ms, System: 35.1 ms]
Range (min … max): 43.0 ms … 73.1 ms 100 runs
Benchmark 2: uv-ag-rkyv-update lock
Time (mean ± σ): 56.5 ms ± 7.2 ms [User: 30.5 ms, System: 36.3 ms]
Range (min … max): 39.1 ms … 71.5 ms 100 runs
Summary
uv lock ran
1.02 ± 0.18 times faster than uv-ag-rkyv-update lock
```
Closes#7415
This is preparatory work for the upload functionality, which needs to
read the METADATA file and attach its parsed contents to the POST
request: We move finding the `.dist-info` from `install-wheel-rs` and
`uv-client` to a new `uv-metadata` crate, so it can be shared with the
publish crate.
I don't properly know if its the right place since the upload code isn't
ready, but i'm PR-ing it now because it already had merge conflicts.
## Summary
We now track the discovered `IndexCapabilities` for each `IndexUrl`. If
we learn that an index doesn't support range requests, we avoid doing
any batch prefetching.
Closes https://github.com/astral-sh/uv/issues/7221.
## Summary
Http headers are supposed to be case-insensitive (RFC 2616), but there
are some implementations that don't normalize them.
I noticed it while migrating to `uv`, calls to an internal registry
failed. A man in the middle server helped me to find that `pip` uses
Title-Case while `uv pip` uses lowercase.
## Test Plan
I tested `uv` with the same server and now it works fine.
Forward an error for missing temp directories:
```
$ env TMPDIR=.tmp uv-debug pip install httpx
error: No such file or directory (os error 2) at path "/home/konsti/projects/uv/.tmp/.tmpgIBhhh"
```
Fixes#6878
## Summary
This PR revives https://github.com/astral-sh/uv/pull/4944, which I think
was a good start towards adding `--trusted-host`. Last night, I tried to
add `--trusted-host` with a custom verifier, but we had to vendor a lot
of `reqwest` code and I eventually hit some private APIs. I'm not
confident that I can implement it correctly with that mechanism, and
since this is security, correctness is the priority.
So, instead, we now use two clients and multiplex between them.
Closes https://github.com/astral-sh/uv/issues/1339.
## Test Plan
Created self-signed certificate, and ran `python3 -m http.server --bind
127.0.0.1 4443 --directory . --certfile cert.pem --keyfile key.pem` from
the packse index directory.
Verified that `cargo run pip install
transitive-yanked-and-unyanked-dependency-a-0abad3b6 --index-url
https://127.0.0.1:8443/simple-html` failed with:
```
error: Request failed after 3 retries
Caused by: error sending request for url (https://127.0.0.1:8443/simple-html/transitive-yanked-and-unyanked-dependency-a-0abad3b6/)
Caused by: client error (Connect)
Caused by: invalid peer certificate: Other(OtherError(CaUsedAsEndEntity))
```
Verified that `cargo run pip install
transitive-yanked-and-unyanked-dependency-a-0abad3b6 --index-url
'https://127.0.0.1:8443/simple-html' --trusted-host '127.0.0.1:8443'`
failed with the expected error (invalid resolution) and made valid
requests.
Verified that `cargo run pip install
transitive-yanked-and-unyanked-dependency-a-0abad3b6 --index-url
'https://127.0.0.1:8443/simple-html' --trusted-host '127.0.0.2' -n` also
failed.
## Summary
This reverts commit 7d92915f3d.
I thought this would be a net performance improvement, but we've now had
multiple reports that this made locking _extremely_ slow. I also tested
this today with a very large codebase against a registry that does not
support range requests, and the number of downloads was sort of wild to
watch. Reverting the reduced resolution time by over 50%.
Closes#6104.
This PR migrates uv's use of `chrono` to `jiff`.
I did most of this work a while back as one of my tests to ensure Jiff
could actually be used in a real world project. I decided to revive
this because I noticed that `reqwest-retry` dropped its Chrono
dependency,
which is I believe the only other thing requiring Chrono in uv.
(Although, we use a fork of `reqwest-middleware` at present, and that
hasn't been updated to latest upstream yet. I wasn't quite sure of the
process we have for that.)
In course of doing this, I actually made two changes to uv:
First is that the lock file now writes an RFC 3339 timestamp for
`exclude-newer`. Previously, we were using Chrono's `Display`
implementation for this which is a non-standard but "human readable"
format. I think the right thing to do here is an RFC 3339 timestamp.
Second is that, in addition to an RFC 3339 timestamp, `--exclude-newer`
used to accept a "UTC date." But this PR changes it to a "local date."
That is, a date in the user's system configured time zone. I think
this makes more sense than a UTC date, but one alternative is to drop
support for a date and just rely on an RFC 3339 timestamp. The main
motivation here is that automatically assuming UTC is often somewhat
confusing, since just writing an unqualified date like `2024-08-19` is
often assumed to be interpreted relative to the writer's "local" time.
## Summary
The strategy here is: if the user provides supported environments, we
use those as the initial forks when resolving. As a result, we never add
or explore branches that are disjoint with the supported environments.
(If the supported environments change, we ignore the lockfile entirely,
so we don't have to worry about any interactions between supported
environments and the preference forks.)
Closes https://github.com/astral-sh/uv/issues/6184.
Surprisingly, this is a lockfile schema change: We can't store relative
paths in urls, so we have to store a `filename` entry instead of the
whole url.
Fixes#4355
## Summary
This PR adds a `DistExtension` field to some of our distribution types,
which requires that we validate that the file type is known and
supported when parsing (rather than when attempting to unzip). It
removes a bunch of extension parsing from the code too, in favor of
doing it once upfront.
Closes https://github.com/astral-sh/uv/issues/5858.
The comment in the code explains the bulk of this:
```rust
// We previously computed this heuristic freshness lifetime by
// looking at the difference between the last modified header and
// the response's date header. We then asserted that the cached
// response ought to be "fresh" for 10% of that interval.
//
// It turns out that this can result in very long freshness
// lifetimes[1] that lead to uv caching too aggressively.
//
// Since PyPI sets a max-age of 600 seconds and since we're
// principally just interacting with Python package indices here,
// we just assume a freshness lifetime equal to what PyPI has.
//
// Note though that a better solution here is for the index to
// support proper HTTP caching headers (ideally Cache-Control, but
// Expires also works too, as above).
```
We also remove the `heuristic_percent` field on `CacheConfig`. Since
that's actually part of the cache itself, we bump the simple cache
version.
Finally, we add some more `trace!` calls that should hopefully make
diagnosing issues related to the freshness lifetime a bit easier in the
future.
Fixes#5351
## Summary
In my setup, I have a directory of wheels symlinked from different
directories. I can point `--find-links` at it with `pip` and it works
but not `uv`.
Currently, `uv` checks if a candidate file `is_file` which is for
regular files. By also checking `is_symlink` I was able to install a
symlinked wheel. I'm not *exactly* sure where, but some other place is
eventually resolving the absolute path of the wheel. (`uv`? The OS?)
## Test Plan
Manually tested - I didn't see any tests for `FlatIndexClient` in the
`uv-client` crate.
```
mkdir /tmp/a /tmp/b # Create a directory of wheels (/tmp/a) and a directory of symlinked wheels (/tmp/b)
cp test-0.0.1-py3-none-any.whl /tmp/a # Add a wheel to the directory of wheels
ln -s /tmp/a/test-0.0.1-py3-none-any.whl /tmp/b/ # Create a symlink to that wheel
uv pip install test --find-links /tmp/b # Install pointing at the symlinked wheels directory
```
## Summary
When range requests aren't supported, we fall back to streaming the
wheel, stopping as soon as we hit a `METADATA` file. This is a small
optimization, but the downside is that we don't get to cache the
resulting wheel...
We don't know whether `METADATA` will be at the beginning or end of the
wheel, but it _seems_ like a better tradeoff to download and cache the
entire wheel?
Closes: https://github.com/astral-sh/uv/issues/5088.
Sort of a revert of: https://github.com/astral-sh/uv/pull/1792.
## Summary
We currently store wheel URLs in an unparsed state because we don't have
a stable parsed representation to use with rykv. Unfortunately this
means we end up reparsing unnecessarily in a lot of places, especially
when constructing a `Lock`. This PR adds a `UrlString` type that lets us
avoid reparsing without losing the validity of the `Url`.
## Test Plan
Shaves off another ~10 ms from
https://github.com/astral-sh/uv/issues/4860.
```
➜ transformers hyperfine "../../uv/target/profiling/uv lock" "../../uv/target/profiling/baseline lock" --warmup 3
Benchmark 1: ../../uv/target/profiling/uv lock
Time (mean ± σ): 120.9 ms ± 2.5 ms [User: 126.0 ms, System: 80.6 ms]
Range (min … max): 116.8 ms … 125.7 ms 23 runs
Benchmark 2: ../../uv/target/profiling/baseline lock
Time (mean ± σ): 129.9 ms ± 4.2 ms [User: 127.1 ms, System: 86.1 ms]
Range (min … max): 123.4 ms … 141.2 ms 23 runs
Summary
../../uv/target/profiling/uv lock ran
1.07 ± 0.04 times faster than ../../uv/target/profiling/baseline lock
```
<!--
Thank you for contributing to uv! To help us out with reviewing, please
consider the following:
- Does this pull request include a summary of the change? (See below.)
- Does this pull request include a descriptive title?
- Does this pull request include references to any relevant issues?
-->
## Summary
<!-- What's the purpose of the change? What does it do, and why? -->
Addresses https://github.com/astral-sh/uv/issues/4330, to reduce
duplication in the client creation logic.
## Test Plan
<!-- How was it tested? -->
https://github.com/astral-sh/uv/pull/4729#issuecomment-2204681655
In #3514 and #2755, users had intermittent network errors, but it was
not always clear whether we had already retried these requests or not.
Building upon https://github.com/TrueLayer/reqwest-middleware/pull/159,
this PR adds the number of retries to the error message, so we can see
at first glance where we're missing retries and where we might need to
change retry settings.
Example error trace:
```
Could not connect, are you offline?
Caused by: Request failed after 3 retries
Caused by: error sending request for url (https://pypi.org/simple/uv/)
Caused by: client error (Connect)
Caused by: dns error: failed to lookup address information: Name or service not known
Caused by: failed to lookup address information: Name or service not known
```
This code is ugly since i'm missing a better pattern for attaching
context to reqwest middleware errors in
https://github.com/TrueLayer/reqwest-middleware/pull/159.
As in #4360, updates the uv project CLI to respect `.python-version`
files as default Python version requests. Additionally, updates project
interpreter discovery to fetch managed toolchains as in `uv venv
--preview`.
We retry several kinds of network request failures, but it's often
unclear whether a request was retried or not
(https://github.com/astral-sh/uv/issues/3514#issuecomment-2105485773).
This PR adds a small intermediary layer that logs all transient request
failures, adding the `DEBUG Transient request failure` lines:
```
DEBUG Searching for Python interpreter in virtual environments
DEBUG Found CPython 3.12.3 at `/home/konsti/projects/uv/.venv/bin/python3` (active virtual environment)
DEBUG Using Python 3.12.3 environment at .venv/bin/python3
DEBUG Acquired lock for `.venv`
DEBUG At least one requirement is not satisfied: tqdm
DEBUG Using registry request timeout of 30s
DEBUG Solving with target Python version 3.12.3
DEBUG Adding direct dependency: tqdm*
DEBUG No cache entry for: https://pypi.org/simple/tqdm/
DEBUG Transient request failure for https://pypi.org/simple/tqdm/, retrying: Request error: error sending request for url (https://pypi.org/simple/tqdm/)
Caused by: error sending request for url (https://pypi.org/simple/tqdm/)
Caused by: client error (Connect)
Caused by: dns error: failed to lookup address information: Name or service not known
Caused by: failed to lookup address information: Name or service not known
DEBUG Transient request failure for https://pypi.org/simple/tqdm/, retrying: Request error: error sending request for url (https://pypi.org/simple/tqdm/)
Caused by: error sending request for url (https://pypi.org/simple/tqdm/)
Caused by: client error (Connect)
Caused by: dns error: failed to lookup address information: Name or service not known
Caused by: failed to lookup address information: Name or service not known
DEBUG Transient request failure for https://pypi.org/simple/tqdm/, retrying: Request error: error sending request for url (https://pypi.org/simple/tqdm/)
Caused by: error sending request for url (https://pypi.org/simple/tqdm/)
Caused by: client error (Connect)
Caused by: dns error: failed to lookup address information: Name or service not known
Caused by: failed to lookup address information: Name or service not known
DEBUG Transient request failure for https://pypi.org/simple/tqdm/, retrying: Request error: error sending request for url (https://pypi.org/simple/tqdm/)
Caused by: error sending request for url (https://pypi.org/simple/tqdm/)
Caused by: client error (Connect)
Caused by: dns error: failed to lookup address information: Name or service not known
Caused by: failed to lookup address information: Name or service not known
error: Could not connect, are you offline?
Caused by: error sending request for url (https://pypi.org/simple/tqdm/)
Caused by: client error (Connect)
Caused by: dns error: failed to lookup address information: Name or service not known
Caused by: failed to lookup address information: Name or service not known
```
I decided for multi-line logging to show the complete error trace since
only `Transient request failure for https://pypi.org/simple/tqdm/,
retrying: Request error: error sending request for url
(https://pypi.org/simple/tqdm/)` doesn't tell you the actual problem (a
dns error).
Note that running with `-v` will not show messages about retry backoff
timing, but running with `RUST_LOG=debug` now shows a complete picture:
```
DEBUG starting new connection: https://pypi.org/
DEBUG resolving host="pypi.org"
DEBUG Transient request failure for https://pypi.org/simple/tqdm/, retrying: Request error: error sending request for url (https://pypi.org/simple/tqdm/)
Caused by: error sending request for url (https://pypi.org/simple/tqdm/)
Caused by: client error (Connect)
Caused by: dns error: failed to lookup address information: Name or service not known
Caused by: failed to lookup address information: Name or service not known
WARN Retry attempt #2. Sleeping 528.728192ms before the next attempt
```
Fixes#3572
Adds `--offline` support to `uv tool run` and `uv run` because I needed
it on the airplane today.
I think we should move `--offline` to the global settings like
`--native-tls`.
## Summary
Closes https://github.com/astral-sh/uv/issues/3715.
## Test Plan
```
❯ echo "/../test" | cargo run pip compile -
error: Couldn't parse requirement in `-` at position 0
Caused by: path could not be normalized: /../test
/../test
^^^^^^^^
❯ echo "-e /../test" | cargo run pip compile -
error: Invalid URL in `-`: `/../test`
Caused by: path could not be normalized: /../test
Caused by: cannot normalize a relative path beyond the base directory
```
Our current flow of data from "simple registry package" to "final
resolved distribution" goes through a number of types:
* `SimpleMetadata` is the API response from a registry that includes all
published versions for a package. Each version has an assortment of
metadata
associated with it.
* `VersionFiles` is the aforementioned metadata. It is split in two: a
group of files for source distributions and a group of files for wheels.
* `PrioritizedDist` collects a subset of the files from `VersionFiles`
to form a selection of the "best" sdist and the "best" wheel for the
current environment.
* `CompatibleDist` is created from a borrowed `PrioritizedDist` that,
perhaps among other things, encapsulates the decision of whether to pick
an sdist or a wheel. (This decision depends both on compatibility and
the action being performed. e.g., When doing installation, a
`CompatibleDist` will sometimes select an sdist over a wheel.)
* `ResolvedDistRef` is like a `ResolvedDist`, but borrows a `Dist`.
* `ResolvedDist` is the almost-final-form of a distribution in a
resolution and is created from a `ResolvedDistRef`.
* `AnnotatedResolvedDist` is a new data type that is the actual final
form of a distribution that a universal lock file cares about. It
bundles a `ResolvedDist` with some metadata needed to generate a lock
file.
One of the requirements of a universal lock file is that we include all
wheels (and maybe all source distributions? but at least one if it's
present) associated with a distribution. But the above flow of data (in
the step from `VersionFiles` to `PrioritizedDist`) drops all wheels
except for the best one.
To remedy this, in this PR, we rejigger `PrioritizedDist`,
`CompatibleDist` and `ResolvedDistRef` so that all wheel data is
preserved. And when a `ResolvedDistRef` is finally turned into a
`ResolvedDist`, we copy all of the wheel data. And finally, we adjust
the `Lock` constructor to read this new data and include it in the lock
file. To make this work, we also modify `RegistryBuiltDist` so that it
can contain one or more wheels instead of just one.
One shortcoming here (called out in the code as a FIXME) is that if a
source distribution is selected as the "best" thing to use (perhaps
there are no compatible wheels), then the wheels won't end up in the
lock file. I plan to fix this in a follow-up PR.
We also aren't totally consistent on source distribution naming.
Sometimes we use `sdist`. Sometimes `source`. Sometimes `source_dist`.
I think it'd be nice to just use `sdist` everywhere, but I do prefer
the type names to be `SourceDist`. And sometimes you want function
names to match the type names (i.e., `from_source_dist`), which in turn
leads to an appearance of inconsistency. I'm open to ideas.
Closes#3351
<!--
Thank you for contributing to uv! To help us out with reviewing, please
consider the following:
- Does this pull request include a summary of the change? (See below.)
- Does this pull request include a descriptive title?
- Does this pull request include references to any relevant issues?
-->
## Summary
Just fix typos.
While `alpha-numeric` is not really a misspelling:
- it is missing from mainstream curated dictionaries, all of them
suggest `alphanumeric`;
- it is less used than `alphanumeric` (more than ⨉10 less) according to
the Google [Ngram
Viewer](https://books.google.com/ngrams/graph?content=alpha-numeric%2Calphanumeric&year_start=1900&year_end=2019&corpus=en-2019);
- it is [missing from
SCOWL](http://app.aspell.net/lookup?dict=en_US-large;words=alpha-numeric).
## Test Plan
CI jobs.
We now use the getters and setters everywhere.
There were some places where we wanted to build a `MarkerEnvironment`
out of whole cloth, usually in tests. To facilitate those use cases, we
add a `MarkerEnvironmentBuilder` that provides a convenient constructor.
It's basically like a `MarkerEnvironment::new`, but with named
parameters. That's useful here because there are so many fields (and
they many have the same type).
## Summary
All of the resolver code is run on the main thread, so a lot of the
`Send` bounds and uses of `DashMap` and `Arc` are unnecessary. We could
also switch to using single-threaded versions of `Mutex` and `Notify` in
some places, but there isn't really a crate that provides those I would
be comfortable with using.
The `Arc` in `OnceMap` can't easily be removed because of the uv-auth
code which uses the
[reqwest-middleware](https://docs.rs/reqwest-middleware/latest/reqwest_middleware/trait.Middleware.html)
crate, that seems to adds unnecessary `Send` bounds because of
`async-trait`. We could duplicate the code and create a `OnceMapLocal`
variant, but I don't feel that's worth it.
In *some* places in our crates, `serde` (and `rkyv`) are optional
dependencies. I believe this was done out of reasons of "good sense,"
that is, it follows a Rust ecosystem pattern where serde integration
tends to be an opt-in crate feature. (And similarly for `rkyv`.)
However, ultimately, `uv` itself requires `serde` and `rkyv` to
function. Since our crates are strictly internal, there are limited
consumers for our crates without `serde` (and `rkyv`) enabled. I think
one possibility is that optional `serde` (and `rkyv`) integration means
that someone can do this:
cargo test -p pep440_rs
And this will run tests _without_ `serde` or `rkyv` enabled. That in
turn could lead to faster iteration time by reducing compile times. But,
I'm not sure this is worth supporting. The iterative compilation times
of
individual crates are probably fast enough in debug mode, even with
`serde` and `rkyv` enabled. Namely, `serde` and `rkyv` themselves
shouldn't need to be re-compiled in most cases. On `main`:
```
from-scratch: `cargo test -p pep440_rs --lib` 0.685
incremental: `cargo test -p pep440_rs --lib` 0.278s
from-scratch: `cargo test -p pep440_rs --features serde,rkyv --lib` 3.948s
incremental: `cargo test -p pep440_rs --features serde,rkyv --lib` 0.321s
```
So while a from-scratch build does take significantly longer, an
incremental build is about the same.
The benefit of doing this change is two-fold:
1. It brings out crates into alignment with "reality." In particular,
some crates were _implicitly_ relying on `serde` being enabled
without explicitly declaring it. This technically means that our
`Cargo.toml`s were wrong in some cases, but it is hard to observe it
because of feature unification in a Cargo workspace.
2. We no longer need to deal with the cognitive burden of writing
`#[cfg_attr(feature = "serde", ...)]` everywhere.
## Summary
It seems like Azure might return a 401 when you request a package that
doesn't exist (even with valid credentials)? But I admittedly haven't
tested this. (We already skip 403, and this seems similar?)
Closes https://github.com/astral-sh/uv/issues/3291.
## Summary
This index strategy resolves every package to the latest possible
version across indexes. If a version is in multiple indexes, the first
available index is selected.
Implements #3137
This closely matches pip.
## Test Plan
Good question. I'm hesitant to use my certifi example here, since that
would inevitably break when torch removes this package. Please comment!
Previously, uv-auth would fail to compile due to a missing process
feature. I chose to make all tokio features we use top level features,
so we can share the tokio cache between all test invocations.
Since we're now using read timeouts and not total timeouts, we can use a
lower threshold, a single read shouldn't take 5 min (and not even 10s).
The 10s value is somewhat arbitrary.
Like #3144, this is a breaking change in some sense.
In #2976 I made some changes that led to regressions:
- We stopped tracking URLs that we had not seen credentials for in the
cache
- This means the cache no longer returns a value to indicate we've seen
a realm before
- We stopped seeding the cache with URLs
- Combined with the above, this means we no longer had a list of
locations that we would never attempt to fetch credentials for
- We added caching of credentials found on requests
- Previously the cache was only populated from the seed or credentials
found in the netrc or keyring
- This meant that the cache was populated for locations that we
previously did not cache, i.e. GitHub artifacts(?)
Unfortunately this unveiled problems with the granularity of our cache.
We cache credentials per realm (roughly the hostname) but some realms
have mixed authentication modes i.e. different credentials per URL or
URLs that do not require credentials. Applying credentials to a URL that
does not require it can lead to a failed request, as seen in #3123 where
GitHub throws a 401 when receiving credentials.
To resolve this, the cache is expanded to supporting caching at two
levels:
- URL, cached URL must be a prefix of the request URL
- Realm, exact match required
When we don't have URL-level credentials cached, we attempt the request
without authentication first. On failure, we'll search for realm-level
credentials or fetch credentials from external services. This avoids
providing credentials to new URLs unless we know we need them.
Closes https://github.com/astral-sh/uv/issues/3123
## Summary
This leverages the new `read_timeout` property, which ensures that (like
pip) our timeout is not applied to the _entire_ request, but rather, to
each individual read operation.
Closes: #1921.
See: #1912.
<!--
Thank you for contributing to uv! To help us out with reviewing, please
consider the following:
- Does this pull request include a summary of the change? (See below.)
- Does this pull request include a descriptive title?
- Does this pull request include references to any relevant issues?
-->
## Summary
Closes#2564
## Test Plan
1. Changed existing linehaul tests to leverage insta.
2. Ran tests in various linux distros (Debian, Ubuntu, Centos, Fedora,
Alpine) to ensure they also pass locally again.
---------
Co-authored-by: konstin <konstin@mailbox.org>
## Summary
This PR adds support for hash-checking mode in `pip install` and `pip
sync`. It's a large change, both in terms of the size of the diff and
the modifications in behavior, but it's also one that's hard to merge in
pieces (at least, with any test coverage) since it needs to work
end-to-end to be useful and testable.
Here are some of the most important highlights:
- We store hashes in the cache. Where we previously stored pointers to
unzipped wheels in the `archives` directory, we now store pointers with
a set of known hashes. So every pointer to an unzipped wheel also
includes its known hashes.
- By default, we don't compute any hashes. If the user runs with
`--require-hashes`, and the cache doesn't contain those hashes, we
invalidate the cache, redownload the wheel, and compute the hashes as we
go. For users that don't run with `--require-hashes`, there will be no
change in performance. For users that _do_, the only change will be if
they don't run with `--generate-hashes` -- then they may see some
repeated work between resolution and installation, if they use `pip
compile` then `pip sync`.
- Many of the distribution types now include a `hashes` field, like
`CachedDist` and `LocalWheel`.
- Our behavior is similar to pip, in that we enforce hashes when pulling
any remote distributions, and when pulling from our own cache. Like pip,
though, we _don't_ enforce hashes if a distribution is _already_
installed.
- Hash validity is enforced in a few different places:
1. During resolution, we enforce hash validity based on the hashes
reported by the registry. If we need to access a source distribution,
though, we then enforce hash validity at that point too, prior to
running any untrusted code. (This is enforced in the distribution
database.)
2. In the install plan, we _only_ add cached distributions that have
matching hashes. If a cached distribution is missing any hashes, or the
hashes don't match, we don't return them from the install plan.
3. In the downloader, we _only_ return distributions with matching
hashes.
4. The final combination of "things we install" are: (1) the wheels from
the cache, and (2) the downloaded wheels. So this ensures that we never
install any mismatching distributions.
- Like pip, if `--require-hashes` is provided, we require that _all_
distributions are pinned with either `==` or a direct URL. We also
require that _all_ distributions have hashes.
There are a few notable TODOs:
- We don't support hash-checking mode for unnamed requirements. These
should be _somewhat_ rare, though? Since `pip compile` never outputs
unnamed requirements. I can fix this, it's just some additional work.
- We don't automatically enable `--require-hashes` with a hash exists in
the requirements file. We require `--require-hashes`.
Closes#474.
## Test Plan
I'd like to add some tests for registries that report incorrect hashes,
but otherwise: `cargo test`
## Summary
This lets us remove circular dependencies (in the future, e.g., #2945)
that arise from `FlatIndex` needing a bunch of resolver-specific
abstractions (like incompatibilities, required hashes, etc.) that aren't
necessary to _fetch_ the flat index entries.
## Summary
Right now, we have a `Hashes` representation that looks like:
```rust
/// A dictionary mapping a hash name to a hex encoded digest of the file.
///
/// PEP 691 says multiple hashes can be included and the interpretation is left to the client.
#[derive(Debug, Clone, Eq, PartialEq, Default, Deserialize)]
pub struct Hashes {
pub md5: Option<Box<str>>,
pub sha256: Option<Box<str>>,
pub sha384: Option<Box<str>>,
pub sha512: Option<Box<str>>,
}
```
It stems from the PyPI API, which returns a dictionary of hashes.
We tend to pass these around as a vector of `Vec<Hashes>`. But it's a
bit strange because each entry in that vector could contain multiple
hashes. And it makes it difficult to ask questions like "Is
`sha256:ab21378ca980a8` in the set of hashes"?
This PR instead treats `Hashes` as the PyPI-internal type, and uses a
new `Vec<HashDigest>` everywhere in our own APIs.
Needed to prevent circular dependencies in my toolchain work (#2931). I
think this is probably a reasonable change as we move towards persistent
configuration too?
Unfortunately `BuildIsolation` needs to be in `uv-types` to avoid
circular dependencies still. We might be able to resolve that in the
future.
## Summary
In working on `--require-hashes`, I noticed that we're missing some
incompatibility tracking for `--find-links` distributions. Specifically,
we don't respect `--no-build` or `--no-binary`, so if we select a wheel
due to `--find-links`, we then throw a hard error when trying to build
it later (if `--no-binary` is provided), rather than selecting the
source distribution instead.
Closes https://github.com/astral-sh/uv/issues/2827.
## Summary
Upgrading `rs-async-zip` enables us to support data descriptors in
streaming. This both greatly improves performance for indexes that use
data descriptors _and_ ensures that we support them in a few other
places (e.g., zipped source distributions created in Finder).
Closes#2808.
## Summary
This partially revives https://github.com/astral-sh/uv/pull/2135 (with
some modifications) to enable users to opt-in to looking for packages
across multiple indexes.
The behavior is such that, in version selection, we take _any_
compatible version from a "higher-priority" index over the compatible
versions of a "lower-priority" index, even if that means we might accept
an "older" version.
Closes https://github.com/astral-sh/uv/issues/2775.
## Summary
In `pip sync`, we weren't properly handling cases in which a package
_only_ existed in `--find-links` (e.g., the user passed `--offline` or
`--no-index`).
I plan to explore removing `Finder` entirely to avoid these mismatch
bugs between `pip sync` and other commands, but this is fine for now.
Closes https://github.com/astral-sh/uv/issues/2688.
## Test Plan
`cargo test`
This test was introduced in 42973cd9cb. It
looks like it compares some values against some platform specific code
that attempts to find the OS version. But the comparisons made some
assumptions about what kind of data is available. In this commit, we try
to make the test a little more flexible on Linux by not assuming that
`Option` values are `Some`.
## Summary
This PR enables the source distribution database to be used with unnamed
requirements (i.e., URLs without a package name). The (significant)
upside here is that we can now use PEP 517 hooks to resolve unnamed
requirement metadata _and_ reuse any computation in the cache.
The changes to `crates/uv-distribution/src/source/mod.rs` are quite
extensive, but mostly mechanical. The core idea is that we introduce a
new `BuildableSource` abstraction, which can either be a distribution,
or an unnamed URL:
```rust
/// A reference to a source that can be built into a built distribution.
///
/// This can either be a distribution (e.g., a package on a registry) or a direct URL.
///
/// Distributions can _also_ point to URLs in lieu of a registry; however, the primary distinction
/// here is that a distribution will always include a package name, while a URL will not.
#[derive(Debug, Clone, Copy)]
pub enum BuildableSource<'a> {
Dist(&'a SourceDist),
Url(SourceUrl<'a>),
}
```
All the methods on the source distribution database now accept
`BuildableSource`. `BuildableSource` has a `name()` method, but it
returns `Option<&PackageName>`, and everything is required to work with
and without a package name.
The main drawback of this approach (which isn't a terrible one) is that
we can no longer include the package name in the cache. (We do continue
to use the package name for registry-based distributions, since those
always have a name.). The package name was included in the cache route
for two reasons: (1) it's nice for debugging; and (2) we use it to power
`uv cache clean flask`, to identify the entries that are relevant for
Flask.
To solve this, I changed the `uv cache clean` code to look one level
deeper. So, when we want to determine whether to remove the cache entry
for a given URL, we now look into the directory to see if there are any
wheels that match the package name. This isn't as nice, but it does work
(and we have test coverage for it -- all passing).
I also considered removing the package name from the cache routes for
non-registry _wheels_, for consistency... But, it would require a cache
bump, and it didn't feel important enough to merit that.