## Summary
I haven't tested on Windows yet, but the idea here is that we should use
a portable representation when printing paths.
I decided to limit the scope here to paths that we write to output
files.
Closes https://github.com/astral-sh/uv/issues/3800.
When parsing requirements from any source, directly parse the url parts
(and reject unsupported urls) instead of parsing url parts at a later
stage. This removes a bunch of error branches and concludes the work
parsing url parts once and passing them around everywhere.
Many usages of the assembled `VerbatimUrl` remain, but these can be
removed incrementally.
Please review commit-by-commit.
## Summary
This seems to be one of the most consistent benchmark cases we have in
terms of standard deviation:
```
➜ hyperfine "target/profiling/main pip compile scripts/requirements/airflow.in" --runs 200
Benchmark 1: target/profiling/main pip compile scripts/requirements/airflow.in
Time (mean ± σ): 292.6 ms ± 6.6 ms [User: 414.1 ms, System: 194.2 ms]
Range (min … max): 282.7 ms … 320.1 ms 200 runs
```
For smaller benchmarks, scispacy and dtlssocket seem to be a bit more
consistent than our current jupyter benchmark, but it hasn't given us
any problems so I'll leave it for now.
## Summary
We now show yanks as part of the resolution diagnostics, so they now
appear for `sync`, `install`, `compile`, and any other operations.
Further, they'll also appear for cached packages (but not packages that
are _already_ installed).
Closes https://github.com/astral-sh/uv/issues/3768.
Closes#3766.
Updates our Python interpreter discovery to conform to the rules
described in #2386, please see that issue for a full description of the
behavior. Briefly, we now will search for interpreters that satisfy a
requested version without stopping at the first Python executable.
Additionally, if retrieving information about an interpreter fails we
will continue to search for a working interpreter. We also add the
plumbing necessary to request Python implementations other than CPython,
though we do not add support for other implementations at this time.
A major internal goal of this work is to prepare for user-facing managed
toolchains i.e. fetching a requested version during `uv run`. These APIs
are not introduced, but there is some managed toolchain handling as
required for our test suite.
Some noteworthy implementation changes:
- The `uv_interpreter::find_python` module has been removed in favor of
a `uv_interpreter::discovery` module.
- There are new types to help structure interpreter requests and track
sources
- Executable discovery is implemented as a big lazy iterator and is a
central authority for source precedence
- `uv_interpreter::Error` variants were split into scoped types in each
module
- There's much more unit test coverage, but not for Windows yet
Remaining work:
- [x] Write new test cases
- [x] Determine correct behavior around executables in the current
directory
- _Future_: Combine `PythonVersion` and `VersionRequest`
- _Future_: Consider splitting `ManagedToolchain` into local and remote
variants
- _Future_: Add Windows unit test coverage
- _Future_: Explore behavior around implementation precedence (i.e.
CPython over PyPy)
Refactors split into:
- #3329
- #3330
- #3331
- #3332Closes#2386
Add minimal support for workspace discovery, only used for determining
paths in the bluejay commands.
We can now discover the workspace structure, namely that the
`pyproject.toml` of a package belongs to a workspace `pyproject.toml`
with members and exclusion. The globbing logic is inspired by cargo. We
don't resolve `workspace = true` metadata declarations yet.
## Summary
Restore API-compatibility with pre-1.1.0 versions of the `zip` crate,
and pin the dependency to the 0.6 series, due to concerns discussed in
https://github.com/astral-sh/uv/issues/3642.
## Test Plan
```
cargo run -p uv-dev -- fetch-python
cargo test
```
## Summary
This PR introduces parallelism to the resolver. Specifically, we can
perform PubGrub resolution on a separate thread, while keeping all I/O
on the tokio thread. We already have the infrastructure set up for this
with the channel and `OnceMap`, which makes this change relatively
simple. The big change needed to make this possible is removing the
lifetimes on some of the types that need to be shared between the
resolver and pubgrub thread.
A related PR, https://github.com/astral-sh/uv/pull/1163, found that
adding `yield_now` calls improved throughput. With optimal scheduling we
might be able to get away with everything on the same thread here.
However, in the ideal pipeline with perfect prefetching, the resolution
and prefetching can run completely in parallel without depending on one
another. While this would be very difficult to achieve, even with our
current prefetching pattern we see a consistent performance improvement
from parallelism.
This does also require reverting a few of the changes from
https://github.com/astral-sh/uv/pull/3413, but not all of them. The
sharing is isolated to the resolver task.
## Test Plan
On smaller tasks performance is mixed with ~2% improvements/regressions
on both sides. However, on medium-large resolution tasks we see the
benefits of parallelism, with improvements anywhere from 10-50%.
```
./scripts/requirements/jupyter.in
Benchmark 1: ./target/profiling/baseline (resolve-warm)
Time (mean ± σ): 29.2 ms ± 1.8 ms [User: 20.3 ms, System: 29.8 ms]
Range (min … max): 26.4 ms … 36.0 ms 91 runs
Benchmark 2: ./target/profiling/parallel (resolve-warm)
Time (mean ± σ): 25.5 ms ± 1.0 ms [User: 19.5 ms, System: 25.5 ms]
Range (min … max): 23.6 ms … 27.8 ms 99 runs
Summary
./target/profiling/parallel (resolve-warm) ran
1.15 ± 0.08 times faster than ./target/profiling/baseline (resolve-warm)
```
```
./scripts/requirements/boto3.in
Benchmark 1: ./target/profiling/baseline (resolve-warm)
Time (mean ± σ): 487.1 ms ± 6.2 ms [User: 464.6 ms, System: 61.6 ms]
Range (min … max): 480.0 ms … 497.3 ms 10 runs
Benchmark 2: ./target/profiling/parallel (resolve-warm)
Time (mean ± σ): 430.8 ms ± 9.3 ms [User: 529.0 ms, System: 77.2 ms]
Range (min … max): 417.1 ms … 442.5 ms 10 runs
Summary
./target/profiling/parallel (resolve-warm) ran
1.13 ± 0.03 times faster than ./target/profiling/baseline (resolve-warm)
```
```
./scripts/requirements/airflow.in
Benchmark 1: ./target/profiling/baseline (resolve-warm)
Time (mean ± σ): 478.1 ms ± 18.8 ms [User: 482.6 ms, System: 205.0 ms]
Range (min … max): 454.7 ms … 508.9 ms 10 runs
Benchmark 2: ./target/profiling/parallel (resolve-warm)
Time (mean ± σ): 308.7 ms ± 11.7 ms [User: 428.5 ms, System: 209.5 ms]
Range (min … max): 287.8 ms … 323.1 ms 10 runs
Summary
./target/profiling/parallel (resolve-warm) ran
1.55 ± 0.08 times faster than ./target/profiling/baseline (resolve-warm)
```
## Summary
This PR consolidates the concurrency limits used throughout `uv` and
exposes two limits, `UV_CONCURRENT_DOWNLOADS` and
`UV_CONCURRENT_BUILDS`, as environment variables.
Currently, `uv` has a number of concurrent streams that it buffers using
relatively arbitrary limits for backpressure. However, many of these
limits are conflated. We run a relatively small number of tasks overall
and should start most things as soon as possible. What we really want to
limit are three separate operations:
- File I/O. This is managed by tokio's blocking pool and we should not
really have to worry about it.
- Network I/O.
- Python build processes.
Because the current limits span a broad range of tasks, it's possible
that a limit meant for network I/O is occupied by tasks performing
builds, reading from the file system, or even waiting on a `OnceMap`. We
also don't limit build processes that end up being required to perform a
download. While this may not pose a performance problem because our
limits are relatively high, it does mean that the limits do not do what
we want, making it tricky to expose them to users
(https://github.com/astral-sh/uv/issues/1205,
https://github.com/astral-sh/uv/issues/3311).
After this change, the limits on network I/O and build processes are
centralized and managed by semaphores. All other tasks are unbuffered
(note that these tasks are still bounded, so backpressure should not be
a problem).
This only makes hashes optional for wheels/sdists that come from
registires or direct URLs. For wheels/sdists that come from other
sources, a hash should not be present.
For path dependencies, a hash should not be present because the state of
the path dependency is not intended to be tracked in the lock file. This
is consistent with how other tools deal with path dependencies, and if
it were otherwise, the hash would I believe need to be updated for every
change to the path dependency.
For git dependencies (source dists only), a hash should not be present
because the lock will contain the specific commit revision hash. This is
functionally equivalent to a hash, and so a hash is redundant.
As part of this change, we validate the presence or absence of a hash
based on the dependency source. We also add our first regression tests.
## Summary
This is universal environment variable used to determine the mac OS
deployment target. We now respect it in `--python-platform` -- so we
default to 12.0, but users can override it as needed.
Pubgrub got a new feature where all unavailability is a custom, instead
of the reasonless `UnavailableDependencies` and our custom `String` type
previously (https://github.com/pubgrub-rs/pubgrub/pull/208). This PR
introduces a `UnavailableReason` that tracks either an entire version
being unusable, or a specific version. The error messages now also track
this difference properly.
The pubgrub commit is our main rebased onto the merged
https://github.com/pubgrub-rs/pubgrub/pull/208, i'll push
`konsti/main-rebase-generic-reason` to `main` after checking for rebase
problems.