mirror of https://github.com/astral-sh/uv
9 Commits
**76418f5bdf** Arc-wrap `PubGrubPackage` for cheap cloning in pubgrub (#3688)
Pubgrub stores incompatibilities as (package name, version range) tuples, meaning it needs to clone the package name for each incompatibility, and for each non-borrowed operation on incompatibilities. https://github.com/astral-sh/uv/pull/3673 made me realize that `PubGrubPackage` has gotten large (expensive to copy), so, like `Version` and other structs, I've added an `Arc` wrapper around it. It's a pity clippy forbids `.deref()`; it's less opaque than `&**` and has IDE support (clicking on `.deref()` jumps to the right impl).

## Benchmarks

It looks like this matters most for complex resolutions, which, I assume, is because they carry the larger `PubGrubPackageInner::Package` and `PubGrubPackageInner::Extra` variants.

```bash
hyperfine --warmup 5 "./uv-main pip compile -q ./scripts/requirements/jupyter.in" "./uv-branch pip compile -q ./scripts/requirements/jupyter.in"
hyperfine --warmup 5 "./uv-main pip compile -q ./scripts/requirements/airflow.in" "./uv-branch pip compile -q ./scripts/requirements/airflow.in"
hyperfine --warmup 5 "./uv-main pip compile -q ./scripts/requirements/boto3.in" "./uv-branch pip compile -q ./scripts/requirements/boto3.in"
```

```
Benchmark 1: ./uv-main pip compile -q ./scripts/requirements/jupyter.in
  Time (mean ± σ):      18.2 ms ±   1.6 ms    [User: 14.4 ms, System: 26.0 ms]
  Range (min … max):    15.8 ms …  22.5 ms    181 runs

Benchmark 2: ./uv-branch pip compile -q ./scripts/requirements/jupyter.in
  Time (mean ± σ):      17.8 ms ±   1.4 ms    [User: 14.4 ms, System: 25.3 ms]
  Range (min … max):    15.4 ms …  23.1 ms    159 runs

Summary
  ./uv-branch pip compile -q ./scripts/requirements/jupyter.in ran
    1.02 ± 0.12 times faster than ./uv-main pip compile -q ./scripts/requirements/jupyter.in
```

```
Benchmark 1: ./uv-main pip compile -q ./scripts/requirements/airflow.in
  Time (mean ± σ):     153.7 ms ±   3.5 ms    [User: 165.2 ms, System: 157.6 ms]
  Range (min … max):   150.4 ms … 163.0 ms    19 runs

Benchmark 2: ./uv-branch pip compile -q ./scripts/requirements/airflow.in
  Time (mean ± σ):     123.9 ms ±   4.6 ms    [User: 152.4 ms, System: 133.8 ms]
  Range (min … max):   118.4 ms … 138.1 ms    24 runs

Summary
  ./uv-branch pip compile -q ./scripts/requirements/airflow.in ran
    1.24 ± 0.05 times faster than ./uv-main pip compile -q ./scripts/requirements/airflow.in
```

```
Benchmark 1: ./uv-main pip compile -q ./scripts/requirements/boto3.in
  Time (mean ± σ):     327.0 ms ±   3.8 ms    [User: 344.5 ms, System: 71.6 ms]
  Range (min … max):   322.7 ms … 334.6 ms    10 runs

Benchmark 2: ./uv-branch pip compile -q ./scripts/requirements/boto3.in
  Time (mean ± σ):     311.2 ms ±   3.1 ms    [User: 339.3 ms, System: 63.1 ms]
  Range (min … max):   307.8 ms … 317.0 ms    10 runs

Summary
  ./uv-branch pip compile -q ./scripts/requirements/boto3.in ran
    1.05 ± 0.02 times faster than ./uv-main pip compile -q ./scripts/requirements/boto3.in
```
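The `Arc`-wrapping pattern this commit describes can be sketched as a newtype. The type and field names below are illustrative stand-ins, not uv's actual definitions: cloning the wrapper copies one pointer and bumps a reference count instead of deep-copying the enum.

```rust
use std::ops::Deref;
use std::sync::Arc;

// Hypothetical stand-in for uv's inner enum; the real
// `PubGrubPackageInner` carries names, extras, markers, etc.
#[derive(Debug, PartialEq, Eq)]
enum PackageInner {
    Root,
    Package { name: String, extra: Option<String> },
}

// The wrapper: `clone()` copies one pointer and bumps a refcount
// instead of deep-copying the enum's heap data.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Package(Arc<PackageInner>);

impl Deref for Package {
    type Target = PackageInner;
    fn deref(&self) -> &PackageInner {
        &self.0
    }
}

// `Deref` lets callers inspect the inner enum transparently,
// which is why `&**` (or `.deref()`) shows up at use sites.
fn is_root(pkg: &Package) -> bool {
    matches!(**pkg, PackageInner::Root)
}
```

Cloning `Package` is now O(1) no matter how large `PackageInner` grows, which is what makes per-incompatibility clones cheap.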
**776a7e47f3** uv-resolver: add `Option<MarkerTree>` to `PubGrubPackage`
This just adds the field to the type and always sets it to `None`, so there are no semantic changes in this commit. Closes #3359
**eac8221718** uv-resolver: use named fields for some PubGrubPackage variants
I'm planning to add another field here (markers), which puts a lot of stress on the positional approach. So let's just switch over to named fields.
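The motivation for named fields can be sketched with a before/after pair. These are illustrative types, not uv's real definitions; `marker` stands in for the planned `Option<MarkerTree>`.

```rust
// Illustrative only; not uv's real definitions.
// A positional variant makes every constructor and match site
// depend on argument order, which gets brittle as fields grow.
#[allow(dead_code)]
enum PositionalPackage {
    Extra(String, String), // which one is the package, which the extra?
}

// With named fields, adding one more field later (e.g. a marker)
// is a mechanical edit instead of a reordering hazard.
#[derive(Debug, PartialEq)]
enum NamedPackage {
    Extra {
        name: String,
        extra: String,
        marker: Option<String>, // stand-in for `Option<MarkerTree>`
    },
}

fn describe(p: &NamedPackage) -> String {
    match p {
        // `..` ignores fields we don't care about by name, not position.
        NamedPackage::Extra { name, extra, .. } => format!("{name}[{extra}]"),
    }
}
```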
**39af09f09b** Parallelize resolver (#3627)
## Summary

This PR introduces parallelism to the resolver. Specifically, we can perform PubGrub resolution on a separate thread, while keeping all I/O on the tokio thread. We already have the infrastructure set up for this with the channel and `OnceMap`, which makes this change relatively simple. The big change needed to make this possible is removing the lifetimes on some of the types that need to be shared between the resolver and pubgrub thread.

A related PR, https://github.com/astral-sh/uv/pull/1163, found that adding `yield_now` calls improved throughput. With optimal scheduling we might be able to get away with everything on the same thread here. However, in the ideal pipeline with perfect prefetching, the resolution and prefetching can run completely in parallel without depending on one another. While this would be very difficult to achieve, even with our current prefetching pattern we see a consistent performance improvement from parallelism.

This does also require reverting a few of the changes from https://github.com/astral-sh/uv/pull/3413, but not all of them. The sharing is isolated to the resolver task.

## Test Plan

On smaller tasks performance is mixed, with ~2% improvements/regressions on both sides. However, on medium-large resolution tasks we see the benefits of parallelism, with improvements anywhere from 10-50%.

```
./scripts/requirements/jupyter.in
Benchmark 1: ./target/profiling/baseline (resolve-warm)
  Time (mean ± σ):      29.2 ms ±   1.8 ms    [User: 20.3 ms, System: 29.8 ms]
  Range (min … max):    26.4 ms …  36.0 ms    91 runs

Benchmark 2: ./target/profiling/parallel (resolve-warm)
  Time (mean ± σ):      25.5 ms ±   1.0 ms    [User: 19.5 ms, System: 25.5 ms]
  Range (min … max):    23.6 ms …  27.8 ms    99 runs

Summary
  ./target/profiling/parallel (resolve-warm) ran
    1.15 ± 0.08 times faster than ./target/profiling/baseline (resolve-warm)
```

```
./scripts/requirements/boto3.in
Benchmark 1: ./target/profiling/baseline (resolve-warm)
  Time (mean ± σ):     487.1 ms ±   6.2 ms    [User: 464.6 ms, System: 61.6 ms]
  Range (min … max):   480.0 ms … 497.3 ms    10 runs

Benchmark 2: ./target/profiling/parallel (resolve-warm)
  Time (mean ± σ):     430.8 ms ±   9.3 ms    [User: 529.0 ms, System: 77.2 ms]
  Range (min … max):   417.1 ms … 442.5 ms    10 runs

Summary
  ./target/profiling/parallel (resolve-warm) ran
    1.13 ± 0.03 times faster than ./target/profiling/baseline (resolve-warm)
```

```
./scripts/requirements/airflow.in
Benchmark 1: ./target/profiling/baseline (resolve-warm)
  Time (mean ± σ):     478.1 ms ±  18.8 ms    [User: 482.6 ms, System: 205.0 ms]
  Range (min … max):   454.7 ms … 508.9 ms    10 runs

Benchmark 2: ./target/profiling/parallel (resolve-warm)
  Time (mean ± σ):     308.7 ms ±  11.7 ms    [User: 428.5 ms, System: 209.5 ms]
  Range (min … max):   287.8 ms … 323.1 ms    10 runs

Summary
  ./target/profiling/parallel (resolve-warm) ran
    1.55 ± 0.08 times faster than ./target/profiling/baseline (resolve-warm)
```
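The resolver-thread/I/O-thread split can be approximated without tokio. This is a dependency-free sketch using std threads and mpsc channels, where the solver thread sends metadata requests over a channel and the I/O side answers them; all names and the toy "resolution" loop are our illustration, not uv's actual code.

```rust
use std::sync::mpsc;
use std::thread;

// Messages from the solver thread to the I/O side (names illustrative).
enum Request {
    Metadata(String),
    Done(Vec<String>),
}

// A toy "resolution": the solver asks the I/O side for each package's
// metadata over a channel, mirroring the resolver/provider split.
fn solve(packages: Vec<String>, tx: mpsc::Sender<Request>, rx: mpsc::Receiver<String>) {
    let mut resolved = Vec::new();
    for name in packages {
        tx.send(Request::Metadata(name)).unwrap();
        // Block on the response; in uv this is a channel fed by async tasks.
        resolved.push(rx.recv().unwrap());
    }
    tx.send(Request::Done(resolved)).unwrap();
}

fn run(packages: Vec<String>) -> Vec<String> {
    let (req_tx, req_rx) = mpsc::channel();
    let (resp_tx, resp_rx) = mpsc::channel();
    // CPU-bound resolution runs on its own OS thread...
    let solver = thread::spawn(move || solve(packages, req_tx, resp_rx));
    // ...while this side plays the role of the async I/O loop.
    let mut result = Vec::new();
    for req in req_rx {
        match req {
            Request::Metadata(name) => {
                // Pretend we fetched metadata from an index.
                resp_tx.send(format!("{name}==1.0.0")).unwrap();
            }
            Request::Done(resolved) => result = resolved,
        }
    }
    solver.join().unwrap();
    result
}
```

When the solver finishes, it drops its sender, which closes the request channel and lets the I/O loop exit cleanly; the same shutdown-by-drop shape works with async channels.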
**018a7150d6** uv-distribution: include all wheels in distribution types (#3595)
Our current flow of data from "simple registry package" to "final resolved distribution" goes through a number of types:

* `SimpleMetadata` is the API response from a registry that includes all published versions for a package. Each version has an assortment of metadata associated with it.
* `VersionFiles` is the aforementioned metadata. It is split in two: a group of files for source distributions and a group of files for wheels.
* `PrioritizedDist` collects a subset of the files from `VersionFiles` to form a selection of the "best" sdist and the "best" wheel for the current environment.
* `CompatibleDist` is created from a borrowed `PrioritizedDist` that, perhaps among other things, encapsulates the decision of whether to pick an sdist or a wheel. (This decision depends both on compatibility and the action being performed. e.g., when doing installation, a `CompatibleDist` will sometimes select an sdist over a wheel.)
* `ResolvedDistRef` is like a `ResolvedDist`, but borrows a `Dist`.
* `ResolvedDist` is the almost-final form of a distribution in a resolution and is created from a `ResolvedDistRef`.
* `AnnotatedResolvedDist` is a new data type that is the actual final form of a distribution that a universal lock file cares about. It bundles a `ResolvedDist` with some metadata needed to generate a lock file.

One of the requirements of a universal lock file is that we include all wheels (and maybe all source distributions? but at least one if it's present) associated with a distribution. But the above flow of data (in the step from `VersionFiles` to `PrioritizedDist`) drops all wheels except for the best one.

To remedy this, this PR rejiggers `PrioritizedDist`, `CompatibleDist` and `ResolvedDistRef` so that all wheel data is preserved, and when a `ResolvedDistRef` is finally turned into a `ResolvedDist`, we copy all of the wheel data. Finally, we adjust the `Lock` constructor to read this new data and include it in the lock file. To make this work, we also modify `RegistryBuiltDist` so that it can contain one or more wheels instead of just one.

One shortcoming here (called out in the code as a FIXME) is that if a source distribution is selected as the "best" thing to use (perhaps there are no compatible wheels), then the wheels won't end up in the lock file. I plan to fix this in a follow-up PR.

We also aren't totally consistent on source distribution naming. Sometimes we use `sdist`. Sometimes `source`. Sometimes `source_dist`. I think it'd be nice to just use `sdist` everywhere, but I do prefer the type names to be `SourceDist`. And sometimes you want function names to match the type names (i.e., `from_source_dist`), which in turn leads to an appearance of inconsistency. I'm open to ideas.

Closes #3351
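The "one or more wheels" shape can be sketched roughly like this; the field and type names are approximations of what the commit describes, not uv's exact definitions.

```rust
// Rough sketch; field and type names are approximations.
#[derive(Debug, Clone, PartialEq)]
struct Wheel {
    filename: String,
    tag: String, // e.g. a platform tag; "best" selection ranks on this
}

#[derive(Debug)]
struct RegistryBuiltDist {
    // All wheels for this version are kept, not just the best one.
    wheels: Vec<Wheel>,
    // Index of the wheel most compatible with the current environment.
    best_wheel_index: usize,
}

impl RegistryBuiltDist {
    // Installation still wants the single best wheel...
    fn best_wheel(&self) -> &Wheel {
        &self.wheels[self.best_wheel_index]
    }

    // ...while a universal lock file records every wheel.
    fn all_wheel_filenames(&self) -> Vec<&str> {
        self.wheels.iter().map(|w| w.filename.as_str()).collect()
    }
}
```

Keeping a vector plus a best-index means the "best wheel" decision is preserved for installation while no wheel data is dropped before lock-file generation.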
**d1b07a3f49** Log versions tried from batch prefetch (#3090)
This is required for evaluating #3087. This also removes tracking of virtual packages from extras from the batch prefetcher (we only track real packages). Let's look at some stats:

* jupyter: Tried 100 versions: anyio 1, argon2-cffi 1, argon2-cffi-bindings 1, arrow 1, asttokens 1, async-lru 1, attrs 1, babel 1, beautifulsoup4 1, bleach 1, certifi 1, cffi 1, charset-normalizer 1, comm 1, debugpy 1, decorator 1, defusedxml 1, exceptiongroup 1, executing 1, fastjsonschema 1, fqdn 1, h11 1, httpcore 1, httpx 1, idna 1, ipykernel 1, ipython 1, ipywidgets 1, isoduration 1, jedi 1, jinja2 1, json5 1, jsonpointer 1, jsonschema 1, jsonschema-specifications 1, jupyter 1, jupyter-client 1, jupyter-console 1, jupyter-core 1, jupyter-events 1, jupyter-lsp 1, jupyter-server 1, jupyter-server-terminals 1, jupyterlab 1, jupyterlab-pygments 1, jupyterlab-server 1, jupyterlab-widgets 1, markupsafe 1, matplotlib-inline 1, mistune 1, nbclient 1, nbconvert 1, nbformat 1, nest-asyncio 1, notebook 1, notebook-shim 1, overrides 1, packaging 1, pandocfilters 1, parso 1, pexpect 1, platformdirs 1, prometheus-client 1, prompt-toolkit 1, psutil 1, ptyprocess 1, pure-eval 1, pycparser 1, pygments 1, python-dateutil 1, python-json-logger 1, pyyaml 1, pyzmq 1, qtconsole 1, qtpy 1, referencing 1, requests 1, rfc3339-validator 1, rfc3986-validator 1, root 1, rpds-py 1, send2trash 1, six 1, sniffio 1, soupsieve 1, stack-data 1, terminado 1, tinycss2 1, tomli 1, tornado 1, traitlets 1, types-python-dateutil 1, typing-extensions 1, uri-template 1, urllib3 1, wcwidth 1, webcolors 1, webencodings 1, websocket-client 1, widgetsnbextension 1
* boto3: botocore 1697, boto3 849, urllib3 2, jmespath 1, python-dateutil 1, root 1, s3transfer 1, six 1
* transformers-extras: Tried 1191 versions: sagemaker 152, hypothesis 67, tensorflow 21, jsonschema 19, tensorflow-cpu 18, multiprocess 10, pathos 10, tensorflow-text 10, chex 8, tf-keras 8, tf2onnx 8, aiohttp 6, aiosignal 6, alembic 6, annotated-types 6,
apscheduler 6, attrs 6, backoff 6, binaryornot 6, black 6, boto3 6, click 6, coloredlogs 6, colorlog 6, dash 6, dash-bootstrap-components 6, dlinfo 6, exceptiongroup 6, execnet 6, fire 6, frozenlist 6, gitdb 6, google-auth 6, google-auth-oauthlib 6, hjson 6, iniconfig 6, jinja2-time 6, markdown 6, markdown-it-py 6, markupsafe 6, mpmath 6, namex 6, nbformat 6, ninja 6, nvidia-nvjitlink-cu12 6, onnxconverter-common 6, pandas 6, plac 6, platformdirs 6, pluggy 6, portalocker 6, poyo 6, protobuf3-to-dict 6, py-cpuinfo 6, py3nvml 6, pyarrow 6, pyarrow-hotfix 6, pydantic-core 6, pygments 6, pynvml 6, pypng 6, python-slugify 6, responses 6, smdebug-rulesconfig 6, soupsieve 6, sqlalchemy 6, tensorboard-data-server 6, tensorboard-plugin-wit 6, tensorboardx 6, threadpoolctl 6, tomli 6, wasabi 6, wcwidth 6, werkzeug 6, wheel 6, xxhash 6, zipp 6, etils 5, tensorboard 5, beautifulsoup4 4, cffi 4, clldutils 4, codecarbon 4, datasets 4, dill 4, evaluate 4, gitpython 4, hf-doc-builder 4, kenlm 4, librosa 4, llvmlite 4, nest-asyncio 4, nltk 4, optuna 4, parameterized 4, phonemizer 4, psutil 4, pyctcdecode 4, pytest 4, pytest-timeout 4, pytest-xdist 4, ray 4, rjieba 4, rouge-score 4, ruff 4, sacrebleu 4, sacremoses 4, sigopt 4, sortedcontainers 4, tensorstore 4, timeout-decorator 4, toolz 4, torchaudio 4, accelerate 3, audioread 3, cookiecutter 3, decorator 3, deepspeed 3, faiss-cpu 3, flax 3, fugashi 3, ipadic 3, isort 3, jax 3, jaxlib 3, joblib 3, keras-nlp 3, lazy-loader 3, numba 3, optax 3, pooch 3, pydantic 3, pygtrie 3, rhoknp 3, scikit-learn 3, segments 3, soundfile 3, soxr 3, sudachidict-core 3, sudachipy 3, torch 3, unidic 3, unidic-lite 3, urllib3 3, absl-py 2, arrow 2, astunparse 2, async-timeout 2, botocore 2, cachetools 2, certifi 2, chardet 2, charset-normalizer 2, csvw 2, dash-core-components 2, dash-html-components 2, dash-table 2, diffusers 2, dm-tree 2, fastjsonschema 2, flask 2, flatbuffers 2, fsspec 2, ftfy 2, gast 2, google-pasta 2, greenlet 2, grpcio 2, h5py 2, 
humanfriendly 2, idna 2, importlib-metadata 2, importlib-resources 2, jinja2 2, jmespath 2, jupyter-core 2, kagglehub 2, keras 2, keras-core 2, keras-preprocessing 2, libclang 2, mako 2, mdurl 2, ml-dtypes 2, msgpack 2, multidict 2, mypy-extensions 2, networkx 2, nvidia-cublas-cu12 2, nvidia-cuda-cupti-cu12 2, nvidia-cuda-nvrtc-cu12 2, nvidia-cuda-runtime-cu12 2, nvidia-cudnn-cu12 2, nvidia-cufft-cu12 2, nvidia-curand-cu12 2, nvidia-cusolver-cu12 2, nvidia-cusparse-cu12 2, nvidia-nccl-cu12 2, nvidia-nvtx-cu12 2, onnx 2, onnxruntime 2, onnxruntime-tools 2, opencv-python 2, opt-einsum 2, orbax-checkpoint 2, pathspec 2, plotly 2, pox 2, ppft 2, pyasn1-modules 2, pycparser 2, pyrsistent 2, python-dateutil 2, pytz 2, requests-oauthlib 2, retrying 2, rich 2, rsa 2, s3transfer 2, scipy 2, setuptools 2, six 2, smmap 2, sympy 2, tabulate 2, tensorflow-estimator 2, tensorflow-hub 2, tensorflow-io-gcs-filesystem 2, termcolor 2, text-unidecode 2, traitlets 2, triton 2, typing-extensions 2, tzdata 2, tzlocal 2, wrapt 2, xmltodict 2, yarl 2, Python 1, av 1, babel 1, bibtexparser 1, blinker 1, colorama 1, decord 1, filelock 1, huggingface-hub 1, isodate 1, itsdangerous 1, language-tags 1, lxml 1, numpy 1, oauthlib 1, packaging 1, pillow 1, protobuf 1, pyasn1 1, pylatexenc 1, pyparsing 1, pyyaml 1, rdflib 1, regex 1, requests 1, rfc3986 1, root 1, safetensors 1, sentencepiece 1, tenacity 1, timm 1, tokenizers 1, torchvision 1, tqdm 1, transformers 1, types-python-dateutil 1, uritemplate 1

You can reproduce them with Python 3.10 and:

```
RUST_LOG=uv_resolver=debug cargo run pip compile -o /dev/null -q scripts/requirements/<input>.in 2>&1 | tail -n 1
```

Closes #2270. This is less invasive compared to the other PR; we can revisit tracking the number of network/build requests later.
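A minimal sketch of the kind of counter behind these "Tried N versions" lines, assuming a simple map from package name to count and the commit's rule of skipping virtual extra-packages; the type and method names here are ours, not uv's.

```rust
use std::collections::BTreeMap;

// Bump a per-package count each time the solver tries a version,
// skipping virtual extra-packages so only real packages are tallied.
#[derive(Default)]
struct TriedTally {
    counts: BTreeMap<String, usize>,
}

impl TriedTally {
    fn record(&mut self, package: &str, is_virtual_extra: bool) {
        if is_virtual_extra {
            return;
        }
        *self.counts.entry(package.to_string()).or_insert(0) += 1;
    }

    fn report(&self) -> String {
        let total: usize = self.counts.values().sum();
        let mut entries: Vec<_> = self.counts.iter().collect();
        // Sort by count descending, then by name, like the logs above.
        entries.sort_by(|a, b| b.1.cmp(a.1).then(a.0.cmp(b.0)));
        let body: Vec<String> = entries
            .iter()
            .map(|(name, n)| format!("{name} {n}"))
            .collect();
        format!("Tried {total} versions: {}", body.join(", "))
    }
}
```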
**96c3c2e774** Support unnamed requirements in `--require-hashes` (#2993)
## Summary

This PR enables `--require-hashes` with unnamed requirements. The key change is that `PackageId` becomes `VersionId` (since it refers to a package at a specific version), and the new `PackageId` consists of _either_ a package name _or_ a URL. The hashes are keyed by `PackageId`, so we can generate the `RequiredHashes` before we have names for all packages, and enforce them throughout.

Closes #2979.
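The name-or-URL key can be sketched as a small enum. This is illustrative; uv's real `PackageId` and `RequiredHashes` are more involved. The point is the either/or key: hashes can be looked up by URL before a requirement's name is known.

```rust
use std::collections::HashMap;

// Illustrative sketch of a key that is either a package name or a URL.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum PackageId {
    Name(String),
    Url(String),
}

// Hashes keyed by `PackageId`, so unnamed (URL-only) requirements
// can be checked against required hashes like named ones.
struct RequiredHashes(HashMap<PackageId, Vec<String>>);

impl RequiredHashes {
    fn get(&self, id: &PackageId) -> Option<&[String]> {
        self.0.get(id).map(Vec::as_slice)
    }
}
```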
**f9c0632953** Ignore direct URL distributions in prefetcher (#2943)
## Summary

The prefetcher tallies the number of times we tried a given package, and then once we hit a threshold, grabs the version map, assuming it's already been fetched. For direct URL distributions, though, we don't have a version map, and there's no need to prefetch.

Closes https://github.com/astral-sh/uv/issues/2941.
**fb4ba2bbc2** Speed up cold cache urllib3/boto3/botocore with batched prefetching (#2452)
Speed up cold cache urllib3/boto3/botocore with batched prefetching (#2452)
With pubgrub being fast for complex ranges, we can now compute the next n candidates without taking a performance hit. This speeds up cold cache `urllib3<1.25.4` `boto3` from maybe 40s - 50s to ~2s. See docstrings for details on the heuristics. **Before**  **After**  --- We need two parts of the prefetching, first looking for compatible version and then falling back to flat next versions. After we selected a boto3 version, there is only one compatible botocore version remaining, so when won't find other compatible candidates for prefetching. We see this as a pattern where we only prefetch boto3 (stack bars), but not botocore (sequential requests between the stacked bars).  The risk is that we're completely wrong with the guess and cause a lot of useless network requests. I think this is acceptable since this mechanism only triggers when we're already on the bad path and we should simply have fetched all versions after some seconds (assuming a fast index like pypi). --- It would be even better if the pubgrub state was copy-on-write so we could simulate more progress than we actually have; currently we're guessing what the next version is which could be completely wrong, but i think this is still a valuable heuristic. Fixes #170. |