139 Commits

Author SHA1 Message Date
Charlie Marsh
67fb023f10 Avoid duplicating authorization header with netrc (#2325)
## Summary

The netrc middleware we added in
https://github.com/astral-sh/uv/pull/2241 has a slight problem. If you
include credentials in your index URL, _and_ in the netrc file, the
crate blindly adds the netrc credentials as a header. And given the
`ReqwestBuilder` API, this means you end up with _two_ `Authorization`
headers, which always leads to an invalid request, though the exact
failure can take different forms.

This PR removes the middleware crate in favor of our own middleware.
Instead of using the `RequestInitialiser` API, we have to use the
`Middleware` API, so that we can remove the header on the request
itself.

Closes https://github.com/astral-sh/uv/issues/2323.

## Test Plan

- Verified that running against a private index with credentials in the
URL (but no netrc file) worked without error.
- Verified that running against a private index with credentials in the
netrc file (but not the URL) worked without error.
- Verified that running against a private index with a mix of
credentials in both _also_ worked without error.
2024-03-10 15:02:24 +00:00
Andrew Gallant
b3b5afaf78 uv-fs: transparently support reading UTF-16 files
This PR tweaks uv to support reading `requirements.txt` regardless of
whether it is encoded as UTF-8 or UTF-16. This is particularly relevant
on Windows where `uv pip freeze > requirements.txt` will likely write a
UTF-16 encoded `requirements.txt` file.

There is some discussion on #1666 where it's suggested that perhaps
we should explicitly not support this. I didn't see that until I
had already put this PR together, but even so, I think it's worth
considering this. UTF-16 is predominant on Windows. It is very easy
to produce a UTF-16 encoded file. Moreover, there is an easy and well
specified way to recognize and transcode UTF-16 encoded data to UTF-8.

I think the downside of this is that it could encourage the use UTF-16
encoded `requirements.txt` files *in addition* to UTF-8 encoded
files, and it would probably be nice to converge and standardize on
one encoding. One possible alternative to this PR is that we provide
a better error message. Another alternative is to ensure that a
`-o/--output` flag exists for all commands (neither `uv pip freeze` nor
`pip freeze` have such a flag) so that users can always write output
to a file without relying on their environment's piping behavior.
(Although this last alternative seems a little sad to me.)

It's also worth noting the [PEP-0508] doesn't seem to mention file
encoding at all. So I think from a "do the standards allow this"
perspective, this change is OK.

Finally, `pip` itself seems to work with UTF-16 encoded
`requirements.txt` files.

I think I personally overall lean towards supporting UTF-16 for
`requirements.txt` files. In part because I think it smoothes out the
UX a little bit, in part because there is no obvious specification
(that I'm aware of) that mandates that these files are UTF-8, and
finally in part because `pip` supports it too.

Fixes #1666, Fixes #2276

[PEP-0508]: https://peps.python.org/pep-0508/
2024-03-08 09:10:14 -05:00
Charlie Marsh
996a859b29 Use reparse points to detect Windows installer shims (#2284)
## Summary

This PR enables use of the Windows Store Pythons even with `py` is not
installed. Specifically, we need to ensure that the `python.exe` and
`python3.exe` executables installed into the
`C:\Users\crmar\AppData\Local\Microsoft\WindowsApp` directory _are_ used
when they're not "App execution aliases" (which merely open the Windows
Store, to help you install Python).

When `py` is installed, this isn't strictly necessary, since the
"resolved" executables are discovered via `py`. These look like
`C:\Users\crmar\AppData\Local\Microsoft\WindowsApps\PythonSoftwareFoundation.Python.3.11_qbs5n2kfra8p0\python.exe`.

Closes https://github.com/astral-sh/uv/issues/2264.

## Test Plan

- Removed all Python installations from my Windows machine.
- Uninstalled `py`.
- Enabled "App execution aliases".
- Verified that for both `cargo run venv --python python.exe` and `cargo
run venv --python python3.exe`, `uv` exited with a failure that no
Python could be found.
- Installed Python 3.10 via the Windows Store.
- Verified that the above commands succeeded without error.
- Verified that `cargo run venv --python python3.10.exe` _also_
succeeded.
2024-03-07 15:47:32 -05:00
Bas Schoenmaeckers
e7742070c1 feat: Add netrc authentication to uv-client (#2241)
## Summary

Add netrc support to the uv-client.

closes #1405 

## Test Plan

I've added a corresponding test case to validate the correct header.
Furthermore a tested it against a real world private repository.
2024-03-06 20:48:30 +00:00
konsti
97b4526825 Update pubgrub onto upstream (#2243)
No functional changes, just pulling in the upstream changes in
preparation of new changes.
2024-03-06 16:06:35 +00:00
jannisko
71626e8dec Support remote https:// requirements files (#1332) (#2081)
## Summary

Allow using http(s) urls for constraints and requirements files handed
to the CLI, by handling paths starting with `http://` or `https://`
differently. This allows commands for such as: `uv pip install -c
https://raw.githubusercontent.com/apache/airflow/constraints-2.8.1/constraints-3.8.txt
requests`.

closes #1332

## Test Plan

Testing install using a `constraints.txt` file hosted on github in the
airflow repository:

fbdc2eba8e/crates/uv/tests/pip_install.rs (L1440-L1484)

## Advice Needed

- filesystem/http dispatch is implemented at a relatively low level (at
`crates/uv-fs/src/lib.rs#read_to_string`). Should I change some naming
here so it is obvious that the function is able to dispatch?
- I kept the CLI argument for -c and -r as a PathBuf, even though now it
is technically either a path or a url. We could either keep this as is
for now, or implement a new enum for this case? The enum could then
handle dispatch to files/http.
- Using another abstraction layer like
https://docs.rs/object_store/latest/object_store/ for the
files/urls/[s3] could work as well, though I ran into a bug during
testing which I couldn't debug
2024-03-06 04:18:11 +00:00
konsti
2a53e789b0 Add an option to bytecode compile during installation (#2086)
Add a `--compile` option to `pip install` and `pip sync`.

I chose to implement this as a separate pass over the entire venv. If we
wanted to compile during installation, we'd have to make sure that
writing is exclusive, to avoid concurrent processes writing broken
`.pyc` files. Additionally, this ensures that the entire site-packages
are bytecode compiled, even if there are packages that aren't from this
`uv` invocation. The disadvantage is that we do not update RECORD and
rely on this comment from [PEP 491](https://peps.python.org/pep-0491/):

> Uninstallers should be smart enough to remove .pyc even if it is not
mentioned in RECORD.

If this is a problem we can change it to run during installation and
write RECORD entries.

Internally, this is implemented as an async work-stealing subprocess
worker pool. The producer is a directory traversal over site-packages,
sending each `.py` file to a bounded async FIFO queue/channel. Each
worker has a long-running python process. It pops the queue to get a
single path (or exists if the channel is closed), then sends it to
stdin, waits until it's informed that the compilation is done through a
line on stdout, and repeat. This is fast, e.g. installing `jupyter
plotly` on Python 3.12 it processes 15876 files in 319ms with 32 threads
(vs. 3.8s with a single core). The python processes internally calls
`compileall.compile_file`, the same as pip.

Like pip, we ignore and silence all compilation errors
(https://github.com/astral-sh/uv/issues/1559). There is a 10s timeout to
handle the case when the workers got stuck. For the reviewers, please
check if i missed any spots where we could deadlock, this is the hardest
part of this PR.

I've added `uv-dev compile <dir>` and `uv-dev clear-compile <dir>`
commands, mainly for my own benchmarking. I don't want to expose them in
`uv`, they almost certainly not the correct workflow and we don't want
to support them.

Fixes #1788
Closes #1559
Closes #1928
2024-03-05 03:35:24 +00:00
dependabot[bot]
bc43f609b9 Bump http from 0.2.11 to 0.2.12 (#2179) 2024-03-04 22:22:45 +00:00
dependabot[bot]
9fcffae084 Bump anstream from 0.6.12 to 0.6.13 (#2177) 2024-03-04 16:16:10 -06:00
Charlie Marsh
6f54aacfea Remove camino from uv-virtualenv (#2119)
## Summary

I think Camino is nice but it makes it much harder to work in
`uv-virtualenv`, since it's the _only_ crate that uses it. If we want to
use Camino, we should use it everywhere IMO.
2024-03-01 19:36:56 +00:00
Charlie Marsh
10175143d1 Add a --python flag to allow installation into arbitrary Python interpreters (#2000)
## Summary

This PR adds a `--python` flag that allows users to provide a specific
Python interpreter into which `uv` should install packages. This would
replace the `VIRTUAL_ENV=` workaround that folks have been using to
install into arbitrary, system environments, while _also_ actually being
correct for installing into non-virtual environments, where the bin and
site-packages paths can differ.

The approach taken here is to use `sysconfig.get_paths()` to get the
correct paths from the interpreter, and then use those for determining
the `bin` and `site-packages` directories, rather than constructing them
based on hard-coded expectations for each platform.

Closes https://github.com/astral-sh/uv/issues/1396.

Closes https://github.com/astral-sh/uv/issues/1779.

Closes https://github.com/astral-sh/uv/issues/1988.

## Test Plan

- Verified that, on my Windows machine, I was able to install `requests`
into a global environment with: `cargo run pip install requests --python
'C:\\Users\\crmarsh\\AppData\\Local\\Programs\\Python\\Python3.12\\python.exe`,
then `python` and `import requests`.
- Verified that, on macOS, I was able to install `requests` into a
global environment installed via Homebrew with: `cargo run pip install
requests --python $(which python3.8)`.
2024-02-28 02:10:29 +00:00
dependabot[bot]
6678d545fb Bump serde_json from 1.0.113 to 1.0.114 (#1996) 2024-02-26 23:12:54 +00:00
dependabot[bot]
abbdad1856 Bump anstream from 0.6.11 to 0.6.12 (#1994) 2024-02-26 17:01:16 -06:00
dependabot[bot]
ae525d4720 Bump serde from 1.0.196 to 1.0.197 (#1993) 2024-02-26 17:01:09 -06:00
dependabot[bot]
22fb373ab2 Bump target-lexicon from 0.12.13 to 0.12.14 (#1992) 2024-02-26 17:01:01 -06:00
Charlie Marsh
4fdf230c2f Support recursive extras in direct pyproject.toml files (#1990)
## Summary

When a `pyproject.toml` is provided directly to `uv pip compile`, we
were failing to resolve recursive extras. The solution I settled on here
is to flatten them recursively when determining the requirements
upfront.

Closes https://github.com/astral-sh/uv/issues/1987.

## Test Plan

`cargo test`
2024-02-26 15:44:25 -05:00
samypr100
df812a181e feat: bump actions/{download,upload}-artifact (#1947)
## Summary

Closes #1943

Makes sure `build-binaries` and `publish-pypi` workflows are compatible
with `actions/{download,upload}-artifact@v4`. In nature, this PR is very
similar to the changes in https://github.com/astral-sh/ruff/pull/10105.
This PR also updates cargo-dist.

## Test Plan

I ran a small non-dry-run [smoke
test](https://github.com/samypr100/uv/actions/runs/8027864059) on my own
fork CI with only linux builds (for speed) and those jobs seem to work
at a glance.
2024-02-25 20:09:04 -05:00
dependabot[bot]
8ff6182815 Bump anyhow from 1.0.79 to 1.0.80 (#1941) 2024-02-23 14:01:40 -06:00
dependabot[bot]
7e0d9b2cea Bump pyo3 from 0.20.2 to 0.20.3 (#1940) 2024-02-23 14:01:18 -06:00
dependabot[bot]
fee79aea7d Bump itertools from 0.10.5 to 0.12.1 (#1939) 2024-02-23 14:01:11 -06:00
dependabot[bot]
086e58e54a Bump textwrap from 0.16.0 to 0.16.1 (#1938) 2024-02-23 14:01:05 -06:00
konsti
5a50a753bd Update cargo dist (#1929)
Pull in the fixes in the last few releases.
2024-02-23 18:56:34 +00:00
Zanie Blue
fe1847561c Retain authentication when making range requests (#1902)
Needs https://github.com/prefix-dev/async_http_range_reader/pull/9
Closes https://github.com/astral-sh/uv/issues/1709
2024-02-23 15:21:10 +00:00
Charlie Marsh
7eaed07f6c Move conflicting dependencies into PubGrub (#1796)
## Summary

This revives a PR from long ago
(https://github.com/astral-sh/uv/pull/383 and
https://github.com/zanieb/pubgrub/pull/24) that modifies how we deal
with dependencies that are declared multiple times within a single
package.

To quote from the originating PR:

> Uses an experimental pubgrub branch (#370) that allows us to handle
multiple version ranges for a single dependency to the solver which
results in better error messages because the derivation tree contains
all of the relevant versions. Previously, the version ranges were merged
(by us) in the resolver before handing them to pubgrub since only one
range could be provided per package. Since we don't merge the versions
anymore, we no longer give the solver an empty range for conflicting
requirements; instead the solver comes to that conclusion from the
provided versions. You can see the improved error message for direct
dependencies in [this
snapshot](https://github.com/astral-sh/puffin/pull/383/files#diff-a0437f2c20cde5e2f15199a3bf81a102b92580063268417847ec9c793a115bd0).

The main issue with that PR was around its handling of URL dependencies,
so this PR _also_ refactors how we handle those. Previously, we stored
URL dependencies on `PubGrubPackage`, but they were omitted from the
hash and equality implementations of `PubGrubPackage`. This led to some
really careful codepaths wherein we had to ensure that we always visited
URLs before non-URL packages, so that the URL-inclusive versions were
included in any hashmaps, etc. I considered preserving this approach,
but it would require us to rely on lots of internal details of PubGrub
(since we'd now be relying on PubGrub to merge those packages in the
"right" order).

So, instead, we now _always_ set the URL on a given package, whenever
that package was _given_ a URL upfront. I think this is easier to reason
about: if the user provided a URL for `flask`, then we should just
always add the URL for `flask`. If we see some other URL for `flask`, we
error, like before. If we see some unknown URL for `flask`, we error,
like before.

Closes https://github.com/astral-sh/uv/issues/1522.

Closes https://github.com/astral-sh/uv/issues/1821.

Closes https://github.com/astral-sh/uv/issues/1615.
2024-02-21 21:27:58 -05:00
Charlie Marsh
88a0c13865 Use async unzip for local source distributions (#1809)
## Summary

We currently maintain separate untar methods for sync and async, but we
only use the sync version when the user provides a local source
distribution. (Otherwise, we untar as we download the distribution.) In
my testing, this is actually slower anyway:

```
❯ python -m scripts.bench \
        --uv-path ./target/release/main \
        --uv-path ./target/release/uv \
        ./requirements.in --benchmark resolve-cold --min-runs 50
Benchmark 1: ./target/release/main (resolve-cold)
  Time (mean ± σ):     835.2 ms ± 107.4 ms    [User: 346.0 ms, System: 151.3 ms]
  Range (min … max):   639.2 ms … 1051.0 ms    50 runs

Benchmark 2: ./target/release/uv (resolve-cold)
  Time (mean ± σ):     750.7 ms ±  91.9 ms    [User: 345.7 ms, System: 149.4 ms]
  Range (min … max):   637.9 ms … 905.7 ms    50 runs

Summary
  './target/release/uv (resolve-cold)' ran
    1.11 ± 0.20 times faster than './target/release/main (resolve-cold)'
```
2024-02-21 14:11:37 +00:00
konsti
2928c6e574 Backport changes from publish crates (#1739)
Backport of changes for the published new versions of pep440_rs and
pep508_rs to make it easier to keep them in sync.
2024-02-20 19:33:27 +01:00
Alexander Gherm
4dfcf32e4c Add shell completions generation (#1675)
<!--
Thank you for contributing to uv! To help us out with reviewing, please
consider the following:

- Does this pull request include a summary of the change? (See below.)
- Does this pull request include a descriptive title?
- Does this pull request include references to any relevant issues?
-->

## Summary

Adds cli command / flag (`generate-shell-completion <SHELL>` /
`--generate-shell-completion <SHELL>`) to generate the completion script
for the given shell. Implemented in exactly the same way as it is done
in ruff
(https://github.com/astral-sh/ruff/blob/main/crates/ruff/src/lib.rs#L197)

Closes https://github.com/astral-sh/uv/issues/1654

## Test Plan

I've normally tested the generated script manually only for bash shell
on Ubuntu 22.04.3
```bash
$ uv --generate-shell-completion bash > /usr/share/bash-completion/completions/uv
$ uv # <TAB>
-q                         -h                         --verbose                  --no-cache                 --version                  clean
-v                         -V                         --no-color                 --cache-dir                pip                        generate-shell-completion
-n                         --quiet                    --color                    --help                     venv                       help
$ uv pip # <TAB>
-q           -n           -V           --verbose    --color      --cache-dir  --version    sync         uninstall    help
-v           -h           --quiet      --no-color   --no-cache   --help       compile      install      freeze
```
2024-02-18 21:43:18 -06:00
Charlie Marsh
facc60f3a8 Add graceful fallback for Artifactory indexes (#1574)
## Summary

There are more details in https://github.com/astral-sh/uv/issues/1370,
but it looks like Artifactory servers have incorrect behavior when it
comes to HTTP range requests, in that they return `Accept-Ranges:
bytes`, but then incorrectly return 200 requests when you actually ask
for a given range.

This PR ensures that we fallback gracefully in this case. It's built on
https://github.com/prefix-dev/async_http_range_reader/pull/5. Assuming
that gets merged upstream, we can then remove the Git dependency.

Closes https://github.com/astral-sh/uv/issues/1370.

## Test Plan

`cargo run pip install requests -i
https://killjoyuvbug.jfrog.io/artifactory/api/pypi/pypi/simple
--verbose`
2024-02-17 14:37:06 +00:00
Charlie Marsh
01ffc36520 Apply percent-decoding to file-based URLs (#1541)
## Summary

Closes https://github.com/astral-sh/uv/issues/1537.
2024-02-16 16:11:16 -05:00
Zanie Blue
9737b93b79 Use the system trust store for HTTPS requests (#1512)
Closes #1474 

Using the `rustls-tls-native-roots` feature

> `rustls-tls`: Enables TLS functionality provided by rustls. Equivalent
to rustls-tls-webpki-roots.
>
> `rustls-tls-webpki-roots`: Enables TLS functionality provided by
rustls, while using root certificates from the webpki-roots crate.
>
> `rustls-tls-native-roots`: Enables TLS functionality provided by
rustls, while using root certificates from the rustls-native-certs
crate.

Additional context:

- https://github.com/seanmonstar/reqwest/issues/1554
- https://github.com/encode/httpx/issues/302
- [Should I use the native certs or
webpki-roots?](https://github.com/rustls/rustls-native-certs#should-i-use-this-or-webpki-roots)

Prior discussion at https://github.com/astral-sh/uv/pull/609
2024-02-16 14:07:18 -05:00
Charlie Marsh
458b4f2dde Remove Docker (#1323)
I will revisit this later.
2024-02-15 18:30:18 +00:00
Zanie Blue
2586f655bb Rename to uv (#1302)
First, replace all usages in files in-place. I used my editor for this.
If someone wants to add a one-liner that'd be fun.

Then, update directory and file names:

```
# Run twice for nested directories
find . -type d -print0 | xargs -0 rename s/puffin/uv/g
find . -type d -print0 | xargs -0 rename s/puffin/uv/g

# Update files
find . -type f -print0 | xargs -0 rename s/puffin/uv/g
```

Then add all the files again

```
# Add all the files again
git add crates
git add python/uv

# This one needs a force-add
git add -f crates/uv-trampoline
```
2024-02-15 11:19:46 -06:00
Andrew Gallant
8102980192 puffin-resolver: make VersionMap construction lazy
That is, a `PrioritizedDistribution` for a specific version of a
package is not actually materialized in memory until a corresponding
`VersionMap::get` call is made for that version. Similarly, iteration
lazily materializes distributions as it moves through the map. It
specifically does not materialize everything first.

The main reason why this is effective is that an
`OwnedArchive<SimpleMetadata>` represents a zero-copy (other than
reading the source file) version of `SimpleMetadata` that is really just
a `Vec<u8>` internally. The problem with `VersionMap` construction
previously is that it had to eagerly materialize a `SimpleMetadata` in
memory before anything else, which defeats a large part of the purpose
of zero-copy deserialization. By making more of `VersionMap`
construction itself lazy, we permit doing some parts of resolution
without necessarily fully deserializing a `SimpleMetadata` into memory.
Indeed, with this commit, in the warm cached case, a `SimpleMetadata` is
itself never materialized fully in memory.

This does not completely and totally fully realize the benefits of
zero-copy deserialization. For example, we are likely still building
lots of distributions in memory that we don't actually need in some
cases. Perhaps in cases where no resolution exists, or when one needs to
iterate over large portions of the total versions published for a
package.
2024-02-15 08:10:32 -05:00
Andrew Gallant
bdb491baf6 deps: bump pubgrub
This brings in a [PR] that makes `Range::as_singleton` return a
borrow.

[PR]: https://github.com/zanieb/pubgrub/pull/23
2024-02-15 08:10:32 -05:00
Charlie Marsh
16bb80132f Add an --offline mode (#1270)
## Summary

This PR adds an `--offline` flag to Puffin that disables network
requests (implemented as a Reqwest middleware on our registry client).
When `--offline` is provided, we also allow the HTTP cache to return
stale data.

Closes #942.
2024-02-13 03:35:23 +00:00
Charlie Marsh
c75eef28b5 Upgrade to miette v6.0.0 (#1272) 2024-02-11 23:23:27 -05:00
Charlie Marsh
62416286e2 Remove add and remove commands (#1259)
## Summary

These add and remove dependencies from a `pyproject.toml` -- but they're
currently hidden, and don't match the rest of the workflow. We can
re-add them when the time is right.
2024-02-06 14:18:27 -05:00
Andrew Gallant
d4b4c21133 initial implementation of zero-copy deserialization for SimpleMetadata (#1249)
(Please review this PR commit by commit.)

This PR closes an initial loop on zero-copy deserialization. That
is, provides a way to get a `Archived<SimpleMetadata>` (spelled
`OwnedArchive<SimpleMetadata>` in the code) from a `CachedClient`. The
main benefit of zero-copy deserialization is that we can read bytes
from a file, cast those bytes to a structured representation without
cost, and then start using that type as any other Rust type. The
"catch" is that the structured representation is not the actual type
you started with, but the "archived" version of it.

In order to make all this work, we ended up needing to shave a rather
large yak: we had to re-implement HTTP cache semantics. Previously,
we were using the `http-cache-semantics` crate. While it does support
Serde, it doesn't support `rkyv`. Moreover, even simple support for
`rkyv` wouldn't be enough. What we actually want is for the HTTP cache
semantics to be implemented on the *archived* type so that we can
decide whether our cached response is stale or not without needing to
do a full deserialization into the unarchived type. This is why, in
this PR, you'll see `impl ArchivedCachePolicy { ... }` instead of
`impl CachePolicy { ... }`. (The `derive(rkyv::Archive)` macro
automatically introduces the `ArchivedCachePolicy` type into the
current namespace.)

Unfortunately, this PR does not fully realize the dream that is
zero-copy deserialization. Namely, while a `CachedClient` can now
provide an `OwnedArchive<SimpleMetadata>`, the rest of our code
doesn't really make use of it. Indeed, as soon as we go to build a
`VersionMap`, we eagerly convert our archived metadata into an owned
`SimpleMetadata` via deserialization (that *isn't* zero-copy). After
this change, a lot of the work now shifts to `rkyv` deserialization
and `VersionMap` construction. More precisely, the main thing we drop
here is `CachePolicy` deserialization (which is now truly zero-copy)
and the parsing of the MessagePack format for `SimpleMetadata`. But we
are still paying for deserialization. We're just paying for it in a
different place.

This PR does seem to bring a speed-up, but it is somewhat underwhelming.
My measurements have been pretty noisy, but I get a 1.1x speedup fairly
often:

```
$ hyperfine -w5 "puffin-main pip compile --cache-dir ~/astral/tmp/cache-main ~/astral/tmp/reqs/home-assistant-reduced.in -o /dev/null" "puffin-test pip compile --cache-dir ~/astral/tmp/cache-test ~/astral/tmp/reqs/home-assistant-reduced.in -o /dev/null" ; A kang
Benchmark 1: puffin-main pip compile --cache-dir ~/astral/tmp/cache-main ~/astral/tmp/reqs/home-assistant-reduced.in -o /dev/null
  Time (mean ± σ):     164.4 ms ±  18.8 ms    [User: 427.1 ms, System: 348.6 ms]
  Range (min … max):   131.1 ms … 190.5 ms    18 runs

Benchmark 2: puffin-test pip compile --cache-dir ~/astral/tmp/cache-test ~/astral/tmp/reqs/home-assistant-reduced.in -o /dev/null
  Time (mean ± σ):     148.3 ms ±  10.2 ms    [User: 357.1 ms, System: 319.4 ms]
  Range (min … max):   136.8 ms … 184.4 ms    19 runs

Summary
  puffin-test pip compile --cache-dir ~/astral/tmp/cache-test ~/astral/tmp/reqs/home-assistant-reduced.in -o /dev/null ran
    1.11 ± 0.15 times faster than puffin-main pip compile --cache-dir ~/astral/tmp/cache-main ~/astral/tmp/reqs/home-assistant-reduced.in -o /dev/null
```

One downside is that this does increase cache size (`rkyv`'s
serialization format is not as compact as MessagePack). On disk size
increases by about 1.8x for our `simple-v0` cache.

```
$ sort-filesize cache-main
4.0K    cache-main/CACHEDIR.TAG
4.0K    cache-main/.gitignore
8.0K    cache-main/interpreter-v0
8.7M    cache-main/wheels-v0
18M     cache-main/archive-v0
59M     cache-main/simple-v0
109M    cache-main/built-wheels-v0
193M    cache-main
193M    total

$ sort-filesize cache-test
4.0K    cache-test/CACHEDIR.TAG
4.0K    cache-test/.gitignore
8.0K    cache-test/interpreter-v0
8.7M    cache-test/wheels-v0
18M     cache-test/archive-v0
107M    cache-test/simple-v0
109M    cache-test/built-wheels-v0
242M    cache-test
242M    total
```

Also, while I initially intended to do a simplistic implementation of
HTTP cache semantics, I found that everything was somewhat
inter-connected. I could have wrote code that _specifically_ only worked
with the present behavior of PyPI, but then it would need to be special
cased and everything else would need to continue to use
`http-cache-sematics`. By implementing what we need based on what Puffin
actually is (which is still less than what `http-cache-semantics` does),
we can avoid special casing and use zero-copy deserialization for our
cache policy in _all_ cases.
2024-02-05 16:47:53 -05:00
Zanie Blue
d090acf13d Improve error messaging when a dependency is not found (#1241)
Previously, whenever we encountered a missing package we would throw an
error without information about why the package was requested. This
meant that if a transitive dependency required a missing package, the
user would have no idea why it was even selected. Here, we track
`NotFound` and `NoIndex` errors as `NoVersions` incompatibilities with
an attached reason. Improves our test coverage for `--no-index` without
`--find-links`.

The
[snapshots](https://github.com/astral-sh/puffin/pull/1241/files#diff-3eea1658f165476252f1f061d0aa9f915aabdceafac21611cdf45019447f60ec)
show a nice improvement.

I think this will also enable backtracking to another version if some
version of transitive dependency has a missing dependency. I'll write a
scenario for that next.

Requires https://github.com/zanieb/pubgrub/pull/22
2024-02-05 08:43:05 -06:00
konsti
f10f902570 Yield after channel send and move cpu tasks to thread (#1163)
## Summary

Previously, we were blocking operations that could run in parallel. We
would send request through our main requests channel, but not yield so
that the receiver could only start processing requests much later than
necessary. We solve this by switching to the async
`tokio::sync::mpsc::channel`, where send is an async functions that
yields.

Due to the increased parallelism cache deserialization and the
conversion from simple api request to version map became bottlenecks, so
i moved them to `spawn_blocking`. Together these result in a 30-60%
speedup for larger warm cache resolution. Small cases such as black
already resolve in 5.7 ms on my machine so there's no speedup to be
gained, refresh and no cache were to noisy to get signal from.

Note for the future: Revisit the bounded channel if we want to produce
requests from `process_request`, too, (this would be good for
prefetching) to avoid deadlocks.

## Details

We can look at the behavior change through the spans:

```
RUST_LOG=puffin=info TRACING_DURATIONS_FILE=target/traces/jupyter-warm-branch.ndjson cargo run --features tracing-durations-export --bin puffin-dev --profile profiling -- resolve jupyter 2> /dev/null
```

Below, you can see how on main, we have discrete phases: All (cached)
simple api requests in parallel, then all (cached) metadata requests in
parallel, repeat until done. The solver is mostly waiting until it has
it's version map from the simple API query to be able to choose a
version. The main thread is blocked by process requests.

In the PR branch, the simple api requests succeeds much earlier,
allowing the solver to advance and also to schedule more prefetching.
Due to that `parse_cache` and `from_metadata` became bottlenecks, so i
moved them off the main thread (green color, and their spans can now
overlap because they can run on multiple threads in parallel). The main
thread isn't blocked on `process_request` anymore, instead it has
frequent idle times. The spans are all much shorter, which indicates
that on main they could have finished much earlier, but a task didn't
yield so they weren't scheduled to finish (though i haven't dug deep
enough to understand the exact scheduling between the process request
stream and the solver here).

**main**


![jupyter-warm-main](https://github.com/astral-sh/puffin/assets/6826232/693c53cc-1090-41b7-b02a-a607fcd2cd99)

**PR**


![jupyter-warm-branch](https://github.com/astral-sh/puffin/assets/6826232/33435f34-b39b-4b0a-a9d7-4bfc22f55f05)

## Benchmarks

```
$ hyperfine --warmup 3 "target/profiling/main-dev resolve jupyter" "target/profiling/branch-dev resolve jupyter"
Benchmark 1: target/profiling/main-dev resolve jupyter
  Time (mean ± σ):      29.1 ms ±   0.7 ms    [User: 22.9 ms, System: 11.1 ms]
  Range (min … max):    27.7 ms …  32.2 ms    103 runs
 
Benchmark 2: target/profiling/branch-dev resolve jupyter
  Time (mean ± σ):      18.8 ms ±   1.1 ms    [User: 37.0 ms, System: 22.7 ms]
  Range (min … max):    16.5 ms …  21.9 ms    154 runs
 
Summary
  target/profiling/branch-dev resolve jupyter ran
    1.55 ± 0.10 times faster than target/profiling/main-dev resolve jupyter

$ hyperfine --warmup 3 "target/profiling/main-dev resolve meine_stadt_transparent" "target/profiling/branch-dev resolve meine_stadt_transparent"
Benchmark 1: target/profiling/main-dev resolve meine_stadt_transparent
  Time (mean ± σ):      37.8 ms ±   0.9 ms    [User: 30.7 ms, System: 14.1 ms]
  Range (min … max):    36.6 ms …  41.5 ms    79 runs
 
Benchmark 2: target/profiling/branch-dev resolve meine_stadt_transparent
  Time (mean ± σ):      24.7 ms ±   1.5 ms    [User: 47.0 ms, System: 39.3 ms]
  Range (min … max):    21.5 ms …  28.7 ms    113 runs
 
Summary
  target/profiling/branch-dev resolve meine_stadt_transparent ran
    1.53 ± 0.10 times faster than target/profiling/main-dev resolve meine_stadt_transparent

$ hyperfine --warmup 3 "target/profiling/main pip compile scripts/requirements/home-assistant.in" "target/profiling/branch pip compile scripts/requirements/home-assistant.in"
Benchmark 1: target/profiling/main pip compile scripts/requirements/home-assistant.in
  Time (mean ± σ):     229.0 ms ±   2.8 ms    [User: 197.3 ms, System: 63.7 ms]
  Range (min … max):   225.8 ms … 234.0 ms    13 runs
 
Benchmark 2: target/profiling/branch pip compile scripts/requirements/home-assistant.in
  Time (mean ± σ):      91.4 ms ±   5.3 ms    [User: 289.2 ms, System: 176.9 ms]
  Range (min … max):    81.0 ms … 104.7 ms    32 runs
 
Summary
  target/profiling/branch pip compile scripts/requirements/home-assistant.in ran
    2.50 ± 0.15 times faster than target/profiling/main pip compile scripts/requirements/home-assistant.in
```
2024-02-02 18:18:24 +01:00
Charlie Marsh
b2f1bbaa63 Add a Ctrl+C handler to the confirm workflow (#1202)
Fixes an issue whereby exiting the confirmation prompt can lead to your
cursor disappearing: https://github.com/console-rs/dialoguer/issues/294.

See:
b839a2c5b7/rye/src/main.rs (L36-L48).
2024-01-31 02:08:27 +00:00
Charlie Marsh
3f5e7306a5 Remove WaitMap dependency (#1183)
## Summary

This is an attempt to https://github.com/astral-sh/puffin/pull/1163 by
removing the `WaitMap` and gaining more granular control over the values
that we hold over `await` boundaries.
2024-01-30 15:25:22 -05:00
Charlie Marsh
aa3b79ec63 Prompt user for missing -r and -e flags in pip install (#1180)
## Summary

If the user runs a command like `pip install requirements.txt`, we now
prompt them to ask if they meant to include the `-r` flag:

![Screenshot 2024-01-29 at 8 38
29 PM](https://github.com/astral-sh/puffin/assets/1309177/82b9f7a2-2526-4144-b200-a5015e5b8a4b)

![Screenshot 2024-01-29 at 8 38
33 PM](https://github.com/astral-sh/puffin/assets/1309177/bd8ebb51-2537-4540-a0e0-718e66a1c69c)

The specific logic is: if the requirement ends in `.txt` or `.in`, and
the file exists locally, prompt the user for `-r`. If the requirement
contains a directory separator, and the directory exists locally, prompt
the user for `-e`.

Closes #1166.
2024-01-30 18:58:45 +00:00
konsti
614bb0cf52 Update async_http_range_reader to 0.5.0 (#1189)
Removes a git dep and removes itertools 0.11
2024-01-30 16:32:53 +00:00
konsti
ab27913f68 Instrument the main function and add jupyter.in (#1186)
Instrument the main function as anchor span for checking overhead and
update tracing-durations-export to 0.2.0 for differentiating
blocking/non-blocking tasks.

Add a `jupyter.in` requirement since `pip install jupyter` is a common
operation. I tried `jupyterlab` too but there is no difference in
performance (1.00 ± 0.07).
2024-01-30 11:03:24 +00:00
Charlie Marsh
61a3060383 Run cargo update (#1178) 2024-01-29 21:01:37 -05:00
Charlie Marsh
fa3c9afdc1 Deduplicate pep440_rs in dependency tree (#1177)
## Summary

Closes https://github.com/astral-sh/puffin/issues/1176.

## Test Plan

`cargo tree -p puffin -i pep440_rs` runs without error. Previously, it
errored due to multiple versions.
2024-01-29 16:11:42 -05:00
Charlie Marsh
4b9daf9604 Use tokio_tar instead of async_tar (#1170)
## Summary

`tokio_tar` is a fork of `async_tar` that uses Tokio instead of
`async-std`. Using it removes a significant dependency from our tree.

(There is an open PR
(https://github.com/dignifiedquire/async-tar/pull/41) in `async-tar` to
add Tokio support, but it's over a year old.)

See:
https://github.com/astral-sh/puffin/pull/1157#discussion_r1469190249.
2024-01-29 10:00:30 -05:00
Charlie Marsh
d88ce76979 Stream unpacking of source distribution downloads (#1157)
This PR migrates our source distribution downloads to unzip as we
stream, similar to our approach for wheels.

In my testing, this showed a consistent speedup (e.g., 6% here for a few
representative source distributions):

```text
❯ python -m scripts.bench --puffin-path ./target/release/main --puffin-path ./target/release/puffin --benchmark install-cold requirements.in
Benchmark 1: ./target/release/main (install-cold)
  Time (mean ± σ):      1.503 s ±  0.039 s    [User: 1.479 s, System: 0.537 s]
  Range (min … max):    1.466 s …  1.605 s    10 runs

Benchmark 2: ./target/release/puffin (install-cold)
  Time (mean ± σ):      1.421 s ±  0.024 s    [User: 1.505 s, System: 0.593 s]
  Range (min … max):    1.381 s …  1.454 s    10 runs

Summary
  './target/release/puffin (install-cold)' ran
    1.06 ± 0.03 times faster than './target/release/main (install-cold)'
```
2024-01-28 20:09:24 -05:00
Andrew Gallant
5219d37250 add initial rkyv support (#1135)
This PR adds initial support for [rkyv] to puffin. In particular,
the main aim here is to make puffin-client's `SimpleMetadata` type
possible to deserialize from a `&[u8]` without doing any copies. This
PR **stops short of actuallying doing that zero-copy deserialization**.
Instead, this PR is about adding the necessary trait impls to a variety
of types, along with a smattering of small refactorings to make rkyv
possible to use.

For those unfamiliar, rkyv works via the interplay of three traits:
`Archive`, `Serialize` and `Deserialize`. The usual flow of things is
this:

* Make a type `T` implement `Archive`, `Serialize` and `Deserialize`.
rkyv
helpfully provides `derive` macros to make this pretty painless in most
  cases.
* The process of implementing `Archive` for `T` *usually* creates an
entirely
new distinct type within the same namespace. One can refer to this type
without naming it explicitly via `Archived<T>` (where `Archived` is a
clever
  type alias defined by rkyv).
* Serialization happens from `T` to (conceptually) a `Vec<u8>`. The
serialization format is specifically designed to reflect the in-memory
layout
  of `Archived<T>`. Notably, *not* `T`. But `Archived<T>`.
* One can then get an `Archived<T>` with no copying (albeit, we will
likely
need to incur some cost for validation) from the previously created
`&[u8]`.
This is quite literally [implemented as a pointer cast][rkyv-ptr-cast].
* The problem with an `Archived<T>` is that it isn't your `T`. It's
something
  else. And while there is limited interoperability between a `T` and an
`Archived<T>`, the main issue is that the surrounding code generally
demands
a `T` and not an `Archived<T>`. **This is at the heart of the tension
for
  introducing zero-copy deserialization, and this is mostly an intrinsic
problem to the technique and not an rkyv-specific issue.** For this
reason,
  given an `Archived<T>`, one can get a `T` back via an explicit
deserialization step. This step is like any other kind of
deserialization,
although generally faster since no real "parsing" is required. But it
will
  allocate and create all necessary objects.

This PR largely proceeds by deriving the three aforementioned traits
for `SimpleMetadata`. And, of course, all of its type dependencies. But
we stop there for now.

The main issue with carrying this work forward so that rkyv is actually
used to deserialize a `SimpleMetadata` is figuring out how to deal
with `DataWithCachePolicy` inside of the cached client. Ideally, this
type would itself have rkyv support, but adding it is difficult. The
main difficulty lay in the fact that its `CachePolicy` type is opaque,
not easily constructable and is internally the tip of the iceberg of
a rat's nest of types found in more crates such as `http`. While one
"dumb"-but-annoying approach would be to fork both of those crates
and add rkyv trait impls to all necessary types, it is my belief that
this is the wrong approach. What we'd *like* to do is not just use
rkyv to deserialize a `DataWithCachePolicy`, but we'd actually like to
get an `Archived<DataWithCachePolicy>` and make actual decisions used
the archived type directly. Doing that will require some work to make
`Archived<DataWithCachePolicy>` directly useful.

My suspicion is that, after doing the above, we may want to mush
forward with a similar approach for `SimpleMetadata`. That is, we want
`Archived<SimpleMetadata>` to be as useful as possible. But right
now, the structure of the code demands an eager conversion (and thus
deserialization) into a `SimpleMetadata` and then into a `VersionMap`.
Getting rid of that eagerness is, I think, the next step after dealing
with `DataWithCachePolicy` to unlock bigger wins here.

There are many commits in this PR, but most are tiny. I still encourage
review to happen commit-by-commit.

[rkyv]: https://rkyv.org/
[rkyv-ptr-cast]:
https://docs.rs/rkyv/latest/src/rkyv/util/mod.rs.html#63-68
2024-01-28 12:14:59 -05:00