Commit Graph

9 Commits

Author SHA1 Message Date
konsti 76418f5bdf
Arc-wrap `PubGrubPackage` for cheap cloning in pubgrub (#3688)
Pubgrub stores incompatibilities as (package name, version range)
tuples, meaning it needs to clone the package name for each
incompatibility and for each non-borrowed operation on incompatibilities.
https://github.com/astral-sh/uv/pull/3673 made me realize that
`PubGrubPackage` has gotten large (expensive to copy), so, like `Version`
and other structs, I've added an `Arc` wrapper around it.

It's a pity clippy forbids `.deref()`; it's less opaque than `&**` and
has IDE support (clicking on `.deref()` jumps to the right impl).
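
A minimal sketch of the pattern, with simplified, hypothetical type shapes (the real `PubGrubPackageInner` has more variants): the enum is wrapped in an `Arc` so that clones only bump a reference count, and `Deref` is forwarded so call sites can keep treating the wrapper like the inner type.

```rust
use std::ops::Deref;
use std::sync::Arc;

// Hypothetical, simplified stand-ins for the real uv types.
#[derive(Debug)]
enum PubGrubPackageInner {
    Root,
    Package { name: String, extra: Option<String> },
}

// Cheap-to-clone wrapper: cloning copies a pointer instead of the whole enum.
#[derive(Debug, Clone)]
struct PubGrubPackage(Arc<PubGrubPackageInner>);

impl Deref for PubGrubPackage {
    type Target = PubGrubPackageInner;

    fn deref(&self) -> &Self::Target {
        &self.0
    }
}

fn main() {
    let package = PubGrubPackage(Arc::new(PubGrubPackageInner::Package {
        name: "boto3".to_string(),
        extra: None,
    }));
    // Each incompatibility can hold its own clone for the cost of a refcount bump.
    let for_incompatibility = package.clone();
    println!("{:?}", &*for_incompatibility);
}
```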

## Benchmarks

It looks like this matters most for complex resolutions, I assume
because they carry the larger `PubGrubPackageInner::Package` and
`PubGrubPackageInner::Extra` variants.

```bash
hyperfine --warmup 5 "./uv-main pip compile -q ./scripts/requirements/jupyter.in" "./uv-branch pip compile -q ./scripts/requirements/jupyter.in"
hyperfine --warmup 5 "./uv-main pip compile -q ./scripts/requirements/airflow.in" "./uv-branch pip compile -q ./scripts/requirements/airflow.in"
hyperfine --warmup 5 "./uv-main pip compile -q ./scripts/requirements/boto3.in" "./uv-branch pip compile -q ./scripts/requirements/boto3.in"
```

```
Benchmark 1: ./uv-main pip compile -q ./scripts/requirements/jupyter.in
  Time (mean ± σ):      18.2 ms ±   1.6 ms    [User: 14.4 ms, System: 26.0 ms]
  Range (min … max):    15.8 ms …  22.5 ms    181 runs

Benchmark 2: ./uv-branch pip compile -q ./scripts/requirements/jupyter.in
  Time (mean ± σ):      17.8 ms ±   1.4 ms    [User: 14.4 ms, System: 25.3 ms]
  Range (min … max):    15.4 ms …  23.1 ms    159 runs

Summary
  ./uv-branch pip compile -q ./scripts/requirements/jupyter.in ran
    1.02 ± 0.12 times faster than ./uv-main pip compile -q ./scripts/requirements/jupyter.in
```

```
Benchmark 1: ./uv-main pip compile -q ./scripts/requirements/airflow.in
  Time (mean ± σ):     153.7 ms ±   3.5 ms    [User: 165.2 ms, System: 157.6 ms]
  Range (min … max):   150.4 ms … 163.0 ms    19 runs

Benchmark 2: ./uv-branch pip compile -q ./scripts/requirements/airflow.in
  Time (mean ± σ):     123.9 ms ±   4.6 ms    [User: 152.4 ms, System: 133.8 ms]
  Range (min … max):   118.4 ms … 138.1 ms    24 runs

Summary
  ./uv-branch pip compile -q ./scripts/requirements/airflow.in ran
    1.24 ± 0.05 times faster than ./uv-main pip compile -q ./scripts/requirements/airflow.in
```

```
Benchmark 1: ./uv-main pip compile -q ./scripts/requirements/boto3.in
  Time (mean ± σ):     327.0 ms ±   3.8 ms    [User: 344.5 ms, System: 71.6 ms]
  Range (min … max):   322.7 ms … 334.6 ms    10 runs

Benchmark 2: ./uv-branch pip compile -q ./scripts/requirements/boto3.in
  Time (mean ± σ):     311.2 ms ±   3.1 ms    [User: 339.3 ms, System: 63.1 ms]
  Range (min … max):   307.8 ms … 317.0 ms    10 runs

Summary
  ./uv-branch pip compile -q ./scripts/requirements/boto3.in ran
    1.05 ± 0.02 times faster than ./uv-main pip compile -q ./scripts/requirements/boto3.in
```

2024-05-21 13:49:35 +02:00
Andrew Gallant 776a7e47f3 uv-resolver: add `Option<MarkerTree>` to `PubGrubPackage`
This just adds the field to the type and always sets it to `None`. There
are no semantic changes in this commit.
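
A hedged sketch of the shape of the change (variant and field names are hypothetical, not uv's actual definitions): the variant gains a `marker` field, but every construction site still passes `None`.

```rust
// Hypothetical, simplified types; only the new `marker` field matters here.
#[derive(Debug, Clone)]
enum MarkerTree {
    // e.g. `python_version >= "3.9"`; the real representation is richer.
    Expression(String),
}

#[derive(Debug, Clone)]
enum PubGrubPackage {
    Root,
    Package {
        name: String,
        extra: Option<String>,
        /// New field: the marker under which this dependency applies.
        marker: Option<MarkerTree>,
    },
}

fn main() {
    let package = PubGrubPackage::Package {
        name: "requests".to_string(),
        extra: None,
        marker: None, // always `None` in this commit, so behavior is unchanged
    };
    println!("{package:?}");
}
```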

Closes #3359
2024-05-20 19:56:24 -04:00
Andrew Gallant eac8221718 uv-resolver: use named fields for some PubGrubPackage variants
I'm planning to add another field here (markers), which puts a lot of
stress on the positional approach. So let's just switch over to named
fields.
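
A hypothetical before/after sketch (not the actual uv definitions) of what the switch buys: with named fields, a later field addition only touches the match arms that care.

```rust
// Before: positional (tuple-style) variant.
#[derive(Debug)]
enum PackagePositional {
    Package(String, Option<String>),
}

// After: named (struct-style) variant.
#[derive(Debug)]
enum PackageNamed {
    Package { name: String, extra: Option<String> },
}

fn main() {
    let positional = PackagePositional::Package("flask".to_string(), None);
    let named = PackageNamed::Package {
        name: "flask".to_string(),
        extra: None,
    };
    println!("{positional:?}");
    // `..` ignores fields this site doesn't care about, so adding `marker`
    // later won't force a change here.
    if let PackageNamed::Package { name, .. } = &named {
        println!("{name}");
    }
}
```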
2024-05-20 19:56:24 -04:00
Ibraheem Ahmed 39af09f09b
Parallelize resolver (#3627)
## Summary

This PR introduces parallelism to the resolver. Specifically, we can
perform PubGrub resolution on a separate thread, while keeping all I/O
on the tokio thread. We already have the infrastructure set up for this
with the channel and `OnceMap`, which makes this change relatively
simple. The big change needed to make this possible is removing the
lifetimes on some of the types that need to be shared between the
resolver and pubgrub thread.
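
A minimal sketch of that split, under assumed message types (this is not the actual uv code): the solver loop runs on a dedicated OS thread, the tokio runtime keeps handling I/O, and the two sides communicate over a channel.

```rust
use tokio::sync::mpsc;

// Hypothetical message type: the solver asks the I/O side for package metadata.
#[derive(Debug)]
enum Request {
    Metadata(String),
    Done,
}

#[tokio::main]
async fn main() {
    let (req_tx, mut req_rx) = mpsc::unbounded_channel::<Request>();

    // Resolution runs on its own thread so it never blocks the I/O runtime.
    let solver = std::thread::spawn(move || {
        for name in ["boto3", "botocore"] {
            req_tx.send(Request::Metadata(name.to_string())).unwrap();
        }
        req_tx.send(Request::Done).unwrap();
    });

    // The tokio side services requests as they arrive.
    while let Some(request) = req_rx.recv().await {
        match request {
            Request::Metadata(name) => println!("fetching metadata for {name}"),
            Request::Done => break,
        }
    }

    solver.join().unwrap();
}
```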

A related PR, https://github.com/astral-sh/uv/pull/1163, found that
adding `yield_now` calls improved throughput. With optimal scheduling we
might be able to get away with everything on the same thread here.
However, in the ideal pipeline with perfect prefetching, the resolution
and prefetching can run completely in parallel without depending on one
another. While this would be very difficult to achieve, even with our
current prefetching pattern we see a consistent performance improvement
from parallelism.
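
For reference, the `yield_now` idea from that PR looks roughly like this (illustrative only, not uv's code): an explicit yield point in a long-running async loop lets the scheduler run queued I/O tasks between steps.

```rust
#[tokio::main]
async fn main() {
    for step in 0..3 {
        // ...one unit of resolution work would happen here...
        println!("resolver step {step}");
        // Hand control back to the tokio scheduler before continuing.
        tokio::task::yield_now().await;
    }
}
```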

This does also require reverting a few of the changes from
https://github.com/astral-sh/uv/pull/3413, but not all of them. The
sharing is isolated to the resolver task.

## Test Plan

On smaller tasks, performance is mixed, with ~2% improvements/regressions
in either direction. However, on medium-to-large resolution tasks we see
the benefits of parallelism, with improvements anywhere from 10-50%.

```
./scripts/requirements/jupyter.in
Benchmark 1: ./target/profiling/baseline (resolve-warm)
  Time (mean ± σ):      29.2 ms ±   1.8 ms    [User: 20.3 ms, System: 29.8 ms]
  Range (min … max):    26.4 ms …  36.0 ms    91 runs
 
Benchmark 2: ./target/profiling/parallel (resolve-warm)
  Time (mean ± σ):      25.5 ms ±   1.0 ms    [User: 19.5 ms, System: 25.5 ms]
  Range (min … max):    23.6 ms …  27.8 ms    99 runs
 
Summary
  ./target/profiling/parallel (resolve-warm) ran
    1.15 ± 0.08 times faster than ./target/profiling/baseline (resolve-warm)
```
```
./scripts/requirements/boto3.in   
Benchmark 1: ./target/profiling/baseline (resolve-warm)
  Time (mean ± σ):     487.1 ms ±   6.2 ms    [User: 464.6 ms, System: 61.6 ms]
  Range (min … max):   480.0 ms … 497.3 ms    10 runs
 
Benchmark 2: ./target/profiling/parallel (resolve-warm)
  Time (mean ± σ):     430.8 ms ±   9.3 ms    [User: 529.0 ms, System: 77.2 ms]
  Range (min … max):   417.1 ms … 442.5 ms    10 runs
 
Summary
  ./target/profiling/parallel (resolve-warm) ran
    1.13 ± 0.03 times faster than ./target/profiling/baseline (resolve-warm)
```
```
./scripts/requirements/airflow.in 
Benchmark 1: ./target/profiling/baseline (resolve-warm)
  Time (mean ± σ):     478.1 ms ±  18.8 ms    [User: 482.6 ms, System: 205.0 ms]
  Range (min … max):   454.7 ms … 508.9 ms    10 runs
 
Benchmark 2: ./target/profiling/parallel (resolve-warm)
  Time (mean ± σ):     308.7 ms ±  11.7 ms    [User: 428.5 ms, System: 209.5 ms]
  Range (min … max):   287.8 ms … 323.1 ms    10 runs
 
Summary
  ./target/profiling/parallel (resolve-warm) ran
    1.55 ± 0.08 times faster than ./target/profiling/baseline (resolve-warm)
```
2024-05-17 11:47:30 -04:00
Andrew Gallant 018a7150d6
uv-distribution: include all wheels in distribution types (#3595)
Our current flow of data from "simple registry package" to "final
resolved distribution" goes through a number of types:

* `SimpleMetadata` is the API response from a registry that includes all
published versions for a package. Each version has an assortment of
metadata
associated with it.
* `VersionFiles` is the aforementioned metadata. It is split in two: a
group of files for source distributions and a group of files for wheels.
* `PrioritizedDist` collects a subset of the files from `VersionFiles`
to form a selection of the "best" sdist and the "best" wheel for the
current environment.
* `CompatibleDist` is created from a borrowed `PrioritizedDist` that,
perhaps among other things, encapsulates the decision of whether to pick
an sdist or a wheel. (This decision depends both on compatibility and
the action being performed. e.g., When doing installation, a
`CompatibleDist` will sometimes select an sdist over a wheel.)
* `ResolvedDistRef` is like a `ResolvedDist`, but borrows a `Dist`.
* `ResolvedDist` is the almost-final-form of a distribution in a
resolution and is created from a `ResolvedDistRef`.
* `AnnotatedResolvedDist` is a new data type that is the actual final
form of a distribution that a universal lock file cares about. It
bundles a `ResolvedDist` with some metadata needed to generate a lock
file.

One of the requirements of a universal lock file is that we include all
wheels (and maybe all source distributions? but at least one if it's
present) associated with a distribution. But the above flow of data (in
the step from `VersionFiles` to `PrioritizedDist`) drops all wheels
except for the best one.

To remedy this, in this PR, we rejigger `PrioritizedDist`,
`CompatibleDist` and `ResolvedDistRef` so that all wheel data is
preserved. And when a `ResolvedDistRef` is finally turned into a
`ResolvedDist`, we copy all of the wheel data. And finally, we adjust
the `Lock` constructor to read this new data and include it in the lock
file. To make this work, we also modify `RegistryBuiltDist` so that it
can contain one or more wheels instead of just one.
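
A hedged sketch of the shape change (field names are hypothetical, not uv's actual definitions): the registry distribution keeps every wheel plus an index of the preferred one, so installation can still pick the best wheel while the lock file records them all.

```rust
#[derive(Debug, Clone)]
struct RegistryWheel {
    filename: String,
    url: String,
}

#[derive(Debug, Clone)]
struct RegistryBuiltDist {
    /// All published wheels for this version, not just the best match.
    wheels: Vec<RegistryWheel>,
    /// Which entry in `wheels` is preferred for the current environment.
    best_wheel_index: usize,
}

impl RegistryBuiltDist {
    fn best_wheel(&self) -> &RegistryWheel {
        &self.wheels[self.best_wheel_index]
    }
}

fn main() {
    let dist = RegistryBuiltDist {
        wheels: vec![
            RegistryWheel {
                filename: "example-1.0-py3-none-any.whl".to_string(),
                url: "https://example.invalid/example-1.0-py3-none-any.whl".to_string(),
            },
            RegistryWheel {
                filename: "example-1.0-cp312-cp312-manylinux_2_17_x86_64.whl".to_string(),
                url: "https://example.invalid/example-1.0-cp312.whl".to_string(),
            },
        ],
        best_wheel_index: 1,
    };
    println!("install: {}", dist.best_wheel().filename);
    println!("lock: {} wheels recorded", dist.wheels.len());
}
```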

One shortcoming here (called out in the code as a FIXME) is that if a
source distribution is selected as the "best" thing to use (perhaps
there are no compatible wheels), then the wheels won't end up in the
lock file. I plan to fix this in a follow-up PR.

We also aren't totally consistent on source distribution naming.
Sometimes we use `sdist`. Sometimes `source`. Sometimes `source_dist`.
I think it'd be nice to just use `sdist` everywhere, but I do prefer
the type names to be `SourceDist`. And sometimes you want function
names to match the type names (i.e., `from_source_dist`), which in turn
leads to an appearance of inconsistency. I'm open to ideas.

Closes #3351
2024-05-15 15:07:28 -04:00
konsti d1b07a3f49
Log versions tried from batch prefetch (#3090)
This is required for evaluating #3087.

This also removes tracking of virtual packages from extras from the
batch prefetcher (we only track real packages).

Let's look at some stats:
* jupyter: Tried 100 versions: anyio 1, argon2-cffi 1,
argon2-cffi-bindings 1, arrow 1, asttokens 1, async-lru 1, attrs 1,
babel 1, beautifulsoup4 1, bleach 1, certifi 1, cffi 1,
charset-normalizer 1, comm 1, debugpy 1, decorator 1, defusedxml 1,
exceptiongroup 1, executing 1, fastjsonschema 1, fqdn 1, h11 1, httpcore
1, httpx 1, idna 1, ipykernel 1, ipython 1, ipywidgets 1, isoduration 1,
jedi 1, jinja2 1, json5 1, jsonpointer 1, jsonschema 1,
jsonschema-specifications 1, jupyter 1, jupyter-client 1,
jupyter-console 1, jupyter-core 1, jupyter-events 1, jupyter-lsp 1,
jupyter-server 1, jupyter-server-terminals 1, jupyterlab 1,
jupyterlab-pygments 1, jupyterlab-server 1, jupyterlab-widgets 1,
markupsafe 1, matplotlib-inline 1, mistune 1, nbclient 1, nbconvert 1,
nbformat 1, nest-asyncio 1, notebook 1, notebook-shim 1, overrides 1,
packaging 1, pandocfilters 1, parso 1, pexpect 1, platformdirs 1,
prometheus-client 1, prompt-toolkit 1, psutil 1, ptyprocess 1, pure-eval
1, pycparser 1, pygments 1, python-dateutil 1, python-json-logger 1,
pyyaml 1, pyzmq 1, qtconsole 1, qtpy 1, referencing 1, requests 1,
rfc3339-validator 1, rfc3986-validator 1, root 1, rpds-py 1, send2trash
1, six 1, sniffio 1, soupsieve 1, stack-data 1, terminado 1, tinycss2 1,
tomli 1, tornado 1, traitlets 1, types-python-dateutil 1,
typing-extensions 1, uri-template 1, urllib3 1, wcwidth 1, webcolors 1,
webencodings 1, websocket-client 1, widgetsnbextension 1
* boto3: botocore 1697, boto3 849, urllib3 2, jmespath 1,
python-dateutil 1, root 1, s3transfer 1, six 1
* transformers-extras: Tried 1191 versions: sagemaker 152, hypothesis
67, tensorflow 21, jsonschema 19, tensorflow-cpu 18, multiprocess 10,
pathos 10, tensorflow-text 10, chex 8, tf-keras 8, tf2onnx 8, aiohttp 6,
aiosignal 6, alembic 6, annotated-types 6, apscheduler 6, attrs 6,
backoff 6, binaryornot 6, black 6, boto3 6, click 6, coloredlogs 6,
colorlog 6, dash 6, dash-bootstrap-components 6, dlinfo 6,
exceptiongroup 6, execnet 6, fire 6, frozenlist 6, gitdb 6, google-auth
6, google-auth-oauthlib 6, hjson 6, iniconfig 6, jinja2-time 6, markdown
6, markdown-it-py 6, markupsafe 6, mpmath 6, namex 6, nbformat 6, ninja
6, nvidia-nvjitlink-cu12 6, onnxconverter-common 6, pandas 6, plac 6,
platformdirs 6, pluggy 6, portalocker 6, poyo 6, protobuf3-to-dict 6,
py-cpuinfo 6, py3nvml 6, pyarrow 6, pyarrow-hotfix 6, pydantic-core 6,
pygments 6, pynvml 6, pypng 6, python-slugify 6, responses 6,
smdebug-rulesconfig 6, soupsieve 6, sqlalchemy 6,
tensorboard-data-server 6, tensorboard-plugin-wit 6, tensorboardx 6,
threadpoolctl 6, tomli 6, wasabi 6, wcwidth 6, werkzeug 6, wheel 6,
xxhash 6, zipp 6, etils 5, tensorboard 5, beautifulsoup4 4, cffi 4,
clldutils 4, codecarbon 4, datasets 4, dill 4, evaluate 4, gitpython 4,
hf-doc-builder 4, kenlm 4, librosa 4, llvmlite 4, nest-asyncio 4, nltk
4, optuna 4, parameterized 4, phonemizer 4, psutil 4, pyctcdecode 4,
pytest 4, pytest-timeout 4, pytest-xdist 4, ray 4, rjieba 4, rouge-score
4, ruff 4, sacrebleu 4, sacremoses 4, sigopt 4, sortedcontainers 4,
tensorstore 4, timeout-decorator 4, toolz 4, torchaudio 4, accelerate 3,
audioread 3, cookiecutter 3, decorator 3, deepspeed 3, faiss-cpu 3, flax
3, fugashi 3, ipadic 3, isort 3, jax 3, jaxlib 3, joblib 3, keras-nlp 3,
lazy-loader 3, numba 3, optax 3, pooch 3, pydantic 3, pygtrie 3, rhoknp
3, scikit-learn 3, segments 3, soundfile 3, soxr 3, sudachidict-core 3,
sudachipy 3, torch 3, unidic 3, unidic-lite 3, urllib3 3, absl-py 2,
arrow 2, astunparse 2, async-timeout 2, botocore 2, cachetools 2,
certifi 2, chardet 2, charset-normalizer 2, csvw 2, dash-core-components
2, dash-html-components 2, dash-table 2, diffusers 2, dm-tree 2,
fastjsonschema 2, flask 2, flatbuffers 2, fsspec 2, ftfy 2, gast 2,
google-pasta 2, greenlet 2, grpcio 2, h5py 2, humanfriendly 2, idna 2,
importlib-metadata 2, importlib-resources 2, jinja2 2, jmespath 2,
jupyter-core 2, kagglehub 2, keras 2, keras-core 2, keras-preprocessing
2, libclang 2, mako 2, mdurl 2, ml-dtypes 2, msgpack 2, multidict 2,
mypy-extensions 2, networkx 2, nvidia-cublas-cu12 2,
nvidia-cuda-cupti-cu12 2, nvidia-cuda-nvrtc-cu12 2,
nvidia-cuda-runtime-cu12 2, nvidia-cudnn-cu12 2, nvidia-cufft-cu12 2,
nvidia-curand-cu12 2, nvidia-cusolver-cu12 2, nvidia-cusparse-cu12 2,
nvidia-nccl-cu12 2, nvidia-nvtx-cu12 2, onnx 2, onnxruntime 2,
onnxruntime-tools 2, opencv-python 2, opt-einsum 2, orbax-checkpoint 2,
pathspec 2, plotly 2, pox 2, ppft 2, pyasn1-modules 2, pycparser 2,
pyrsistent 2, python-dateutil 2, pytz 2, requests-oauthlib 2, retrying
2, rich 2, rsa 2, s3transfer 2, scipy 2, setuptools 2, six 2, smmap 2,
sympy 2, tabulate 2, tensorflow-estimator 2, tensorflow-hub 2,
tensorflow-io-gcs-filesystem 2, termcolor 2, text-unidecode 2, traitlets
2, triton 2, typing-extensions 2, tzdata 2, tzlocal 2, wrapt 2,
xmltodict 2, yarl 2, Python 1, av 1, babel 1, bibtexparser 1, blinker 1,
colorama 1, decord 1, filelock 1, huggingface-hub 1, isodate 1,
itsdangerous 1, language-tags 1, lxml 1, numpy 1, oauthlib 1, packaging
1, pillow 1, protobuf 1, pyasn1 1, pylatexenc 1, pyparsing 1, pyyaml 1,
rdflib 1, regex 1, requests 1, rfc3986 1, root 1, safetensors 1,
sentencepiece 1, tenacity 1, timm 1, tokenizers 1, torchvision 1, tqdm
1, transformers 1, types-python-dateutil 1, uritemplate 1


You can reproduce them with python 3.10 and:
```
RUST_LOG=uv_resolver=debug cargo run pip compile -o /dev/null -q scripts/requirements/<input>.in 2>&1 | tail -n 1
```

Closes #2270. This is less invasive than the other PR; we can revisit
tracking the number of network/build requests later.
2024-04-17 09:08:21 +00:00
Charlie Marsh 96c3c2e774
Support unnamed requirements in `--require-hashes` (#2993)
## Summary

This PR enables `--require-hashes` with unnamed requirements. The key
change is that `PackageId` becomes `VersionId` (since it refers to a
package at a specific version), and the new `PackageId` consists of
_either_ a package name _or_ a URL. The hashes are keyed by `PackageId`,
so we can generate the `RequiredHashes` before we have names for all
packages, and enforce them throughout.
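
A hedged sketch of the keying idea (simplified; not uv's exact types): required hashes are looked up by either a package name or, for unnamed requirements, the verbatim URL, so enforcement can start before resolution has named every package.

```rust
use std::collections::HashMap;

// Hypothetical, simplified stand-ins for the real types.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum PackageId {
    Name(String),
    Url(String),
}

#[derive(Debug, Default)]
struct RequiredHashes(HashMap<PackageId, Vec<String>>);

impl RequiredHashes {
    fn insert(&mut self, id: PackageId, digest: &str) {
        self.0.entry(id).or_default().push(digest.to_string());
    }

    fn get(&self, id: &PackageId) -> Option<&[String]> {
        self.0.get(id).map(|digests| digests.as_slice())
    }
}

fn main() {
    let mut hashes = RequiredHashes::default();
    hashes.insert(PackageId::Name("idna".to_string()), "sha256:deadbeef");
    hashes.insert(
        PackageId::Url("https://example.invalid/pkg.tar.gz".to_string()),
        "sha256:cafef00d",
    );
    // An unnamed requirement can be checked by URL before resolution names it.
    let key = PackageId::Url("https://example.invalid/pkg.tar.gz".to_string());
    println!("{:?}", hashes.get(&key));
}
```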

Closes #2979.
2024-04-11 11:26:50 -04:00
Charlie Marsh f9c0632953
Ignore direct URL distributions in prefetcher (#2943)
## Summary

The prefetcher tallies the number of times we tried a given package, and
then once we hit a threshold, grabs the version map, assuming it's
already been fetched. For direct URL distributions, though, we don't
have a version map! And there's no need to prefetch.
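
A hedged sketch of that guard (types and names hypothetical): only registry requirements participate in the tally, because a direct URL has no version map to prefetch.

```rust
use std::collections::HashMap;

enum Requirement {
    Registry { name: String },
    DirectUrl { url: String },
}

struct Prefetcher {
    tries: HashMap<String, u32>,
    threshold: u32,
}

impl Prefetcher {
    fn on_tried(&mut self, requirement: &Requirement) {
        let name = match requirement {
            Requirement::Registry { name } => name,
            // No version map exists for a direct URL, so there is nothing to prefetch.
            Requirement::DirectUrl { .. } => return,
        };
        let count = self.tries.entry(name.clone()).or_insert(0);
        *count += 1;
        if *count >= self.threshold {
            println!("prefetching the version map for {name}");
        }
    }
}

fn main() {
    let mut prefetcher = Prefetcher { tries: HashMap::new(), threshold: 2 };
    let boto3 = Requirement::Registry { name: "boto3".to_string() };
    let direct = Requirement::DirectUrl { url: "https://example.invalid/a.whl".to_string() };
    prefetcher.on_tried(&boto3);
    prefetcher.on_tried(&boto3); // hits the threshold, triggers a prefetch
    prefetcher.on_tried(&direct); // skipped
}
```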

Closes https://github.com/astral-sh/uv/issues/2941.
2024-04-09 14:09:41 -05:00
konsti fb4ba2bbc2
Speed up cold cache urllib3/boto3/botocore with batched prefetching (#2452)
With pubgrub being fast for complex ranges, we can now compute the next
n candidates without taking a performance hit. This speeds up a cold-cache
resolution of `urllib3<1.25.4` `boto3` from roughly 40s-50s to ~2s. See the
docstrings for details on the heuristics.
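
A toy sketch of the batching idea (not uv's implementation; the real heuristics live in the batch prefetcher's docstrings): once a package keeps getting retried, take the next few versions that still satisfy the current range and fetch their metadata ahead of time.

```rust
/// Pick up to `batch_size` versions, newest first, that still satisfy the
/// current range, as prefetch candidates.
fn next_candidates<'a>(
    versions_newest_first: &[&'a str],
    in_range: impl Fn(&str) -> bool,
    batch_size: usize,
) -> Vec<&'a str> {
    versions_newest_first
        .iter()
        .copied()
        .filter(|&version| in_range(version))
        .take(batch_size)
        .collect()
}

fn main() {
    // Pretend botocore published these, newest first.
    let versions = ["1.34.100", "1.34.99", "1.34.98", "1.34.97", "1.34.96"];
    // Stand-in for the current pubgrub range, e.g. `<1.34.100`.
    let batch = next_candidates(&versions, |version| version != "1.34.100", 3);
    for version in batch {
        println!("prefetching metadata for botocore {version}");
    }
}
```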

**Before**


![image](https://github.com/astral-sh/uv/assets/6826232/b7b06950-e45b-4c49-b65e-ae19fe9888cc)

**After**


![image](https://github.com/astral-sh/uv/assets/6826232/1c749248-850e-49c1-9d57-a7d78f87b3aa)

---

We need two parts to the prefetching: first looking for compatible
versions, then falling back to flat next versions. After we select a
boto3 version, there is only one compatible botocore version remaining,
so we won't find other compatible candidates for prefetching. We see
this as a pattern where we only prefetch boto3 (stacked bars), but not
botocore (sequential requests between the stacked bars).


![image](https://github.com/astral-sh/uv/assets/6826232/e5186800-23ac-4ed1-99b9-4d1046fbd03a)

The risk is that we're completely wrong with the guess and cause a lot
of useless network requests. I think this is acceptable since this
mechanism only triggers when we're already on the bad path and we should
simply have fetched all versions after some seconds (assuming a fast
index like pypi).

---

It would be even better if the pubgrub state were copy-on-write, so we
could simulate more progress than we actually have; currently we're
guessing what the next version is, which could be completely wrong, but I
think this is still a valuable heuristic.

Fixes #170.
2024-04-08 14:28:56 +00:00