116 Commits

Author SHA1 Message Date
Charlie Marsh
7eaed07f6c Move conflicting dependencies into PubGrub (#1796)
## Summary

This revives a PR from long ago
(https://github.com/astral-sh/uv/pull/383 and
https://github.com/zanieb/pubgrub/pull/24) that modifies how we deal
with dependencies that are declared multiple times within a single
package.

To quote from the originating PR:

> Uses an experimental pubgrub branch (#370) that allows us to handle
multiple version ranges for a single dependency to the solver which
results in better error messages because the derivation tree contains
all of the relevant versions. Previously, the version ranges were merged
(by us) in the resolver before handing them to pubgrub since only one
range could be provided per package. Since we don't merge the versions
anymore, we no longer give the solver an empty range for conflicting
requirements; instead the solver comes to that conclusion from the
provided versions. You can see the improved error message for direct
dependencies in [this
snapshot](https://github.com/astral-sh/puffin/pull/383/files#diff-a0437f2c20cde5e2f15199a3bf81a102b92580063268417847ec9c793a115bd0).

The main issue with that PR was around its handling of URL dependencies,
so this PR _also_ refactors how we handle those. Previously, we stored
URL dependencies on `PubGrubPackage`, but they were omitted from the
hash and equality implementations of `PubGrubPackage`. This led to some
really careful codepaths wherein we had to ensure that we always visited
URLs before non-URL packages, so that the URL-inclusive versions were
included in any hashmaps, etc. I considered preserving this approach,
but it would require us to rely on lots of internal details of PubGrub
(since we'd now be relying on PubGrub to merge those packages in the
"right" order).

So, instead, we now _always_ set the URL on a given package, whenever
that package was _given_ a URL upfront. I think this is easier to reason
about: if the user provided a URL for `flask`, then we should just
always add the URL for `flask`. If we see some other URL for `flask`, we
error, like before. If we see some unknown URL for `flask`, we error,
like before.

Closes https://github.com/astral-sh/uv/issues/1522.

Closes https://github.com/astral-sh/uv/issues/1821.

Closes https://github.com/astral-sh/uv/issues/1615.
2024-02-21 21:27:58 -05:00
Charlie Marsh
88a0c13865 Use async unzip for local source distributions (#1809)
## Summary

We currently maintain separate untar methods for sync and async, but we
only use the sync version when the user provides a local source
distribution. (Otherwise, we untar as we download the distribution.) In
my testing, this is actually slower anyway:

```
❯ python -m scripts.bench \
        --uv-path ./target/release/main \
        --uv-path ./target/release/uv \
        ./requirements.in --benchmark resolve-cold --min-runs 50
Benchmark 1: ./target/release/main (resolve-cold)
  Time (mean ± σ):     835.2 ms ± 107.4 ms    [User: 346.0 ms, System: 151.3 ms]
  Range (min … max):   639.2 ms … 1051.0 ms    50 runs

Benchmark 2: ./target/release/uv (resolve-cold)
  Time (mean ± σ):     750.7 ms ±  91.9 ms    [User: 345.7 ms, System: 149.4 ms]
  Range (min … max):   637.9 ms … 905.7 ms    50 runs

Summary
  './target/release/uv (resolve-cold)' ran
    1.11 ± 0.20 times faster than './target/release/main (resolve-cold)'
```
2024-02-21 14:11:37 +00:00
konsti
2928c6e574 Backport changes from publish crates (#1739)
Backport of changes for the published new versions of pep440_rs and
pep508_rs to make it easier to keep them in sync.
2024-02-20 19:33:27 +01:00
Alexander Gherm
4dfcf32e4c Add shell completions generation (#1675)
<!--
Thank you for contributing to uv! To help us out with reviewing, please
consider the following:

- Does this pull request include a summary of the change? (See below.)
- Does this pull request include a descriptive title?
- Does this pull request include references to any relevant issues?
-->

## Summary

Adds cli command / flag (`generate-shell-completion <SHELL>` /
`--generate-shell-completion <SHELL>`) to generate the completion script
for the given shell. Implemented in exactly the same way as it is done
in ruff
(https://github.com/astral-sh/ruff/blob/main/crates/ruff/src/lib.rs#L197)

Closes https://github.com/astral-sh/uv/issues/1654

## Test Plan

I've normally tested the generated script manually only for bash shell
on Ubuntu 22.04.3
```bash
$ uv --generate-shell-completion bash > /usr/share/bash-completion/completions/uv
$ uv # <TAB>
-q                         -h                         --verbose                  --no-cache                 --version                  clean
-v                         -V                         --no-color                 --cache-dir                pip                        generate-shell-completion
-n                         --quiet                    --color                    --help                     venv                       help
$ uv pip # <TAB>
-q           -n           -V           --verbose    --color      --cache-dir  --version    sync         uninstall    help
-v           -h           --quiet      --no-color   --no-cache   --help       compile      install      freeze
```
2024-02-18 21:43:18 -06:00
Charlie Marsh
facc60f3a8 Add graceful fallback for Artifactory indexes (#1574)
## Summary

There are more details in https://github.com/astral-sh/uv/issues/1370,
but it looks like Artifactory servers have incorrect behavior when it
comes to HTTP range requests, in that they return `Accept-Ranges:
bytes`, but then incorrectly return 200 requests when you actually ask
for a given range.

This PR ensures that we fallback gracefully in this case. It's built on
https://github.com/prefix-dev/async_http_range_reader/pull/5. Assuming
that gets merged upstream, we can then remove the Git dependency.

Closes https://github.com/astral-sh/uv/issues/1370.

## Test Plan

`cargo run pip install requests -i
https://killjoyuvbug.jfrog.io/artifactory/api/pypi/pypi/simple
--verbose`
2024-02-17 14:37:06 +00:00
Charlie Marsh
01ffc36520 Apply percent-decoding to file-based URLs (#1541)
## Summary

Closes https://github.com/astral-sh/uv/issues/1537.
2024-02-16 16:11:16 -05:00
Zanie Blue
9737b93b79 Use the system trust store for HTTPS requests (#1512)
Closes #1474 

Using the `rustls-tls-native-roots` feature

> `rustls-tls`: Enables TLS functionality provided by rustls. Equivalent
to rustls-tls-webpki-roots.
>
> `rustls-tls-webpki-roots`: Enables TLS functionality provided by
rustls, while using root certificates from the webpki-roots crate.
>
> `rustls-tls-native-roots`: Enables TLS functionality provided by
rustls, while using root certificates from the rustls-native-certs
crate.

Additional context:

- https://github.com/seanmonstar/reqwest/issues/1554
- https://github.com/encode/httpx/issues/302
- [Should I use the native certs or
webpki-roots?](https://github.com/rustls/rustls-native-certs#should-i-use-this-or-webpki-roots)

Prior discussion at https://github.com/astral-sh/uv/pull/609
2024-02-16 14:07:18 -05:00
Charlie Marsh
458b4f2dde Remove Docker (#1323)
I will revisit this later.
2024-02-15 18:30:18 +00:00
Zanie Blue
2586f655bb Rename to uv (#1302)
First, replace all usages in files in-place. I used my editor for this.
If someone wants to add a one-liner that'd be fun.

Then, update directory and file names:

```
# Run twice for nested directories
find . -type d -print0 | xargs -0 rename s/puffin/uv/g
find . -type d -print0 | xargs -0 rename s/puffin/uv/g

# Update files
find . -type f -print0 | xargs -0 rename s/puffin/uv/g
```

Then add all the files again

```
# Add all the files again
git add crates
git add python/uv

# This one needs a force-add
git add -f crates/uv-trampoline
```
2024-02-15 11:19:46 -06:00
Andrew Gallant
8102980192 puffin-resolver: make VersionMap construction lazy
That is, a `PrioritizedDistribution` for a specific version of a
package is not actually materialized in memory until a corresponding
`VersionMap::get` call is made for that version. Similarly, iteration
lazily materializes distributions as it moves through the map. It
specifically does not materialize everything first.

The main reason why this is effective is that an
`OwnedArchive<SimpleMetadata>` represents a zero-copy (other than
reading the source file) version of `SimpleMetadata` that is really just
a `Vec<u8>` internally. The problem with `VersionMap` construction
previously is that it had to eagerly materialize a `SimpleMetadata` in
memory before anything else, which defeats a large part of the purpose
of zero-copy deserialization. By making more of `VersionMap`
construction itself lazy, we permit doing some parts of resolution
without necessarily fully deserializing a `SimpleMetadata` into memory.
Indeed, with this commit, in the warm cached case, a `SimpleMetadata` is
itself never materialized fully in memory.

This does not completely and totally fully realize the benefits of
zero-copy deserialization. For example, we are likely still building
lots of distributions in memory that we don't actually need in some
cases. Perhaps in cases where no resolution exists, or when one needs to
iterate over large portions of the total versions published for a
package.
2024-02-15 08:10:32 -05:00
Andrew Gallant
bdb491baf6 deps: bump pubgrub
This brings in a [PR] that makes `Range::as_singleton` return a
borrow.

[PR]: https://github.com/zanieb/pubgrub/pull/23
2024-02-15 08:10:32 -05:00
Charlie Marsh
16bb80132f Add an --offline mode (#1270)
## Summary

This PR adds an `--offline` flag to Puffin that disables network
requests (implemented as a Reqwest middleware on our registry client).
When `--offline` is provided, we also allow the HTTP cache to return
stale data.

Closes #942.
2024-02-13 03:35:23 +00:00
Charlie Marsh
c75eef28b5 Upgrade to miette v6.0.0 (#1272) 2024-02-11 23:23:27 -05:00
Charlie Marsh
62416286e2 Remove add and remove commands (#1259)
## Summary

These add and remove dependencies from a `pyproject.toml` -- but they're
currently hidden, and don't match the rest of the workflow. We can
re-add them when the time is right.
2024-02-06 14:18:27 -05:00
Andrew Gallant
d4b4c21133 initial implementation of zero-copy deserialization for SimpleMetadata (#1249)
(Please review this PR commit by commit.)

This PR closes an initial loop on zero-copy deserialization. That
is, provides a way to get a `Archived<SimpleMetadata>` (spelled
`OwnedArchive<SimpleMetadata>` in the code) from a `CachedClient`. The
main benefit of zero-copy deserialization is that we can read bytes
from a file, cast those bytes to a structured representation without
cost, and then start using that type as any other Rust type. The
"catch" is that the structured representation is not the actual type
you started with, but the "archived" version of it.

In order to make all this work, we ended up needing to shave a rather
large yak: we had to re-implement HTTP cache semantics. Previously,
we were using the `http-cache-semantics` crate. While it does support
Serde, it doesn't support `rkyv`. Moreover, even simple support for
`rkyv` wouldn't be enough. What we actually want is for the HTTP cache
semantics to be implemented on the *archived* type so that we can
decide whether our cached response is stale or not without needing to
do a full deserialization into the unarchived type. This is why, in
this PR, you'll see `impl ArchivedCachePolicy { ... }` instead of
`impl CachePolicy { ... }`. (The `derive(rkyv::Archive)` macro
automatically introduces the `ArchivedCachePolicy` type into the
current namespace.)

Unfortunately, this PR does not fully realize the dream that is
zero-copy deserialization. Namely, while a `CachedClient` can now
provide an `OwnedArchive<SimpleMetadata>`, the rest of our code
doesn't really make use of it. Indeed, as soon as we go to build a
`VersionMap`, we eagerly convert our archived metadata into an owned
`SimpleMetadata` via deserialization (that *isn't* zero-copy). After
this change, a lot of the work now shifts to `rkyv` deserialization
and `VersionMap` construction. More precisely, the main thing we drop
here is `CachePolicy` deserialization (which is now truly zero-copy)
and the parsing of the MessagePack format for `SimpleMetadata`. But we
are still paying for deserialization. We're just paying for it in a
different place.

This PR does seem to bring a speed-up, but it is somewhat underwhelming.
My measurements have been pretty noisy, but I get a 1.1x speedup fairly
often:

```
$ hyperfine -w5 "puffin-main pip compile --cache-dir ~/astral/tmp/cache-main ~/astral/tmp/reqs/home-assistant-reduced.in -o /dev/null" "puffin-test pip compile --cache-dir ~/astral/tmp/cache-test ~/astral/tmp/reqs/home-assistant-reduced.in -o /dev/null" ; A kang
Benchmark 1: puffin-main pip compile --cache-dir ~/astral/tmp/cache-main ~/astral/tmp/reqs/home-assistant-reduced.in -o /dev/null
  Time (mean ± σ):     164.4 ms ±  18.8 ms    [User: 427.1 ms, System: 348.6 ms]
  Range (min … max):   131.1 ms … 190.5 ms    18 runs

Benchmark 2: puffin-test pip compile --cache-dir ~/astral/tmp/cache-test ~/astral/tmp/reqs/home-assistant-reduced.in -o /dev/null
  Time (mean ± σ):     148.3 ms ±  10.2 ms    [User: 357.1 ms, System: 319.4 ms]
  Range (min … max):   136.8 ms … 184.4 ms    19 runs

Summary
  puffin-test pip compile --cache-dir ~/astral/tmp/cache-test ~/astral/tmp/reqs/home-assistant-reduced.in -o /dev/null ran
    1.11 ± 0.15 times faster than puffin-main pip compile --cache-dir ~/astral/tmp/cache-main ~/astral/tmp/reqs/home-assistant-reduced.in -o /dev/null
```

One downside is that this does increase cache size (`rkyv`'s
serialization format is not as compact as MessagePack). On disk size
increases by about 1.8x for our `simple-v0` cache.

```
$ sort-filesize cache-main
4.0K    cache-main/CACHEDIR.TAG
4.0K    cache-main/.gitignore
8.0K    cache-main/interpreter-v0
8.7M    cache-main/wheels-v0
18M     cache-main/archive-v0
59M     cache-main/simple-v0
109M    cache-main/built-wheels-v0
193M    cache-main
193M    total

$ sort-filesize cache-test
4.0K    cache-test/CACHEDIR.TAG
4.0K    cache-test/.gitignore
8.0K    cache-test/interpreter-v0
8.7M    cache-test/wheels-v0
18M     cache-test/archive-v0
107M    cache-test/simple-v0
109M    cache-test/built-wheels-v0
242M    cache-test
242M    total
```

Also, while I initially intended to do a simplistic implementation of
HTTP cache semantics, I found that everything was somewhat
inter-connected. I could have wrote code that _specifically_ only worked
with the present behavior of PyPI, but then it would need to be special
cased and everything else would need to continue to use
`http-cache-sematics`. By implementing what we need based on what Puffin
actually is (which is still less than what `http-cache-semantics` does),
we can avoid special casing and use zero-copy deserialization for our
cache policy in _all_ cases.
2024-02-05 16:47:53 -05:00
Zanie Blue
d090acf13d Improve error messaging when a dependency is not found (#1241)
Previously, whenever we encountered a missing package we would throw an
error without information about why the package was requested. This
meant that if a transitive dependency required a missing package, the
user would have no idea why it was even selected. Here, we track
`NotFound` and `NoIndex` errors as `NoVersions` incompatibilities with
an attached reason. Improves our test coverage for `--no-index` without
`--find-links`.

The
[snapshots](https://github.com/astral-sh/puffin/pull/1241/files#diff-3eea1658f165476252f1f061d0aa9f915aabdceafac21611cdf45019447f60ec)
show a nice improvement.

I think this will also enable backtracking to another version if some
version of transitive dependency has a missing dependency. I'll write a
scenario for that next.

Requires https://github.com/zanieb/pubgrub/pull/22
2024-02-05 08:43:05 -06:00
konsti
f10f902570 Yield after channel send and move cpu tasks to thread (#1163)
## Summary

Previously, we were blocking operations that could run in parallel. We
would send request through our main requests channel, but not yield so
that the receiver could only start processing requests much later than
necessary. We solve this by switching to the async
`tokio::sync::mpsc::channel`, where send is an async functions that
yields.

Due to the increased parallelism cache deserialization and the
conversion from simple api request to version map became bottlenecks, so
i moved them to `spawn_blocking`. Together these result in a 30-60%
speedup for larger warm cache resolution. Small cases such as black
already resolve in 5.7 ms on my machine so there's no speedup to be
gained, refresh and no cache were to noisy to get signal from.

Note for the future: Revisit the bounded channel if we want to produce
requests from `process_request`, too, (this would be good for
prefetching) to avoid deadlocks.

## Details

We can look at the behavior change through the spans:

```
RUST_LOG=puffin=info TRACING_DURATIONS_FILE=target/traces/jupyter-warm-branch.ndjson cargo run --features tracing-durations-export --bin puffin-dev --profile profiling -- resolve jupyter 2> /dev/null
```

Below, you can see how on main, we have discrete phases: All (cached)
simple api requests in parallel, then all (cached) metadata requests in
parallel, repeat until done. The solver is mostly waiting until it has
it's version map from the simple API query to be able to choose a
version. The main thread is blocked by process requests.

In the PR branch, the simple api requests succeeds much earlier,
allowing the solver to advance and also to schedule more prefetching.
Due to that `parse_cache` and `from_metadata` became bottlenecks, so i
moved them off the main thread (green color, and their spans can now
overlap because they can run on multiple threads in parallel). The main
thread isn't blocked on `process_request` anymore, instead it has
frequent idle times. The spans are all much shorter, which indicates
that on main they could have finished much earlier, but a task didn't
yield so they weren't scheduled to finish (though i haven't dug deep
enough to understand the exact scheduling between the process request
stream and the solver here).

**main**


![jupyter-warm-main](https://github.com/astral-sh/puffin/assets/6826232/693c53cc-1090-41b7-b02a-a607fcd2cd99)

**PR**


![jupyter-warm-branch](https://github.com/astral-sh/puffin/assets/6826232/33435f34-b39b-4b0a-a9d7-4bfc22f55f05)

## Benchmarks

```
$ hyperfine --warmup 3 "target/profiling/main-dev resolve jupyter" "target/profiling/branch-dev resolve jupyter"
Benchmark 1: target/profiling/main-dev resolve jupyter
  Time (mean ± σ):      29.1 ms ±   0.7 ms    [User: 22.9 ms, System: 11.1 ms]
  Range (min … max):    27.7 ms …  32.2 ms    103 runs
 
Benchmark 2: target/profiling/branch-dev resolve jupyter
  Time (mean ± σ):      18.8 ms ±   1.1 ms    [User: 37.0 ms, System: 22.7 ms]
  Range (min … max):    16.5 ms …  21.9 ms    154 runs
 
Summary
  target/profiling/branch-dev resolve jupyter ran
    1.55 ± 0.10 times faster than target/profiling/main-dev resolve jupyter

$ hyperfine --warmup 3 "target/profiling/main-dev resolve meine_stadt_transparent" "target/profiling/branch-dev resolve meine_stadt_transparent"
Benchmark 1: target/profiling/main-dev resolve meine_stadt_transparent
  Time (mean ± σ):      37.8 ms ±   0.9 ms    [User: 30.7 ms, System: 14.1 ms]
  Range (min … max):    36.6 ms …  41.5 ms    79 runs
 
Benchmark 2: target/profiling/branch-dev resolve meine_stadt_transparent
  Time (mean ± σ):      24.7 ms ±   1.5 ms    [User: 47.0 ms, System: 39.3 ms]
  Range (min … max):    21.5 ms …  28.7 ms    113 runs
 
Summary
  target/profiling/branch-dev resolve meine_stadt_transparent ran
    1.53 ± 0.10 times faster than target/profiling/main-dev resolve meine_stadt_transparent

$ hyperfine --warmup 3 "target/profiling/main pip compile scripts/requirements/home-assistant.in" "target/profiling/branch pip compile scripts/requirements/home-assistant.in"
Benchmark 1: target/profiling/main pip compile scripts/requirements/home-assistant.in
  Time (mean ± σ):     229.0 ms ±   2.8 ms    [User: 197.3 ms, System: 63.7 ms]
  Range (min … max):   225.8 ms … 234.0 ms    13 runs
 
Benchmark 2: target/profiling/branch pip compile scripts/requirements/home-assistant.in
  Time (mean ± σ):      91.4 ms ±   5.3 ms    [User: 289.2 ms, System: 176.9 ms]
  Range (min … max):    81.0 ms … 104.7 ms    32 runs
 
Summary
  target/profiling/branch pip compile scripts/requirements/home-assistant.in ran
    2.50 ± 0.15 times faster than target/profiling/main pip compile scripts/requirements/home-assistant.in
```
2024-02-02 18:18:24 +01:00
Charlie Marsh
b2f1bbaa63 Add a Ctrl+C handler to the confirm workflow (#1202)
Fixes an issue whereby exiting the confirmation prompt can lead to your
cursor disappearing: https://github.com/console-rs/dialoguer/issues/294.

See:
b839a2c5b7/rye/src/main.rs (L36-L48).
2024-01-31 02:08:27 +00:00
Charlie Marsh
3f5e7306a5 Remove WaitMap dependency (#1183)
## Summary

This is an attempt to https://github.com/astral-sh/puffin/pull/1163 by
removing the `WaitMap` and gaining more granular control over the values
that we hold over `await` boundaries.
2024-01-30 15:25:22 -05:00
Charlie Marsh
aa3b79ec63 Prompt user for missing -r and -e flags in pip install (#1180)
## Summary

If the user runs a command like `pip install requirements.txt`, we now
prompt them to ask if they meant to include the `-r` flag:

![Screenshot 2024-01-29 at 8 38
29 PM](https://github.com/astral-sh/puffin/assets/1309177/82b9f7a2-2526-4144-b200-a5015e5b8a4b)

![Screenshot 2024-01-29 at 8 38
33 PM](https://github.com/astral-sh/puffin/assets/1309177/bd8ebb51-2537-4540-a0e0-718e66a1c69c)

The specific logic is: if the requirement ends in `.txt` or `.in`, and
the file exists locally, prompt the user for `-r`. If the requirement
contains a directory separator, and the directory exists locally, prompt
the user for `-e`.

Closes #1166.
2024-01-30 18:58:45 +00:00
konsti
614bb0cf52 Update async_http_range_reader to 0.5.0 (#1189)
Removes a git dep and removes itertools 0.11
2024-01-30 16:32:53 +00:00
konsti
ab27913f68 Instrument the main function and add jupyter.in (#1186)
Instrument the main function as anchor span for checking overhead and
update tracing-durations-export to 0.2.0 for differentiating
blocking/non-blocking tasks.

Add a `jupyter.in` requirement since `pip install jupyter` is a common
operation. I tried `jupyterlab` too but there is no difference in
performance (1.00 ± 0.07).
2024-01-30 11:03:24 +00:00
Charlie Marsh
61a3060383 Run cargo update (#1178) 2024-01-29 21:01:37 -05:00
Charlie Marsh
fa3c9afdc1 Deduplicate pep440_rs in dependency tree (#1177)
## Summary

Closes https://github.com/astral-sh/puffin/issues/1176.

## Test Plan

`cargo tree -p puffin -i pep440_rs` runs without error. Previously, it
errored due to multiple versions.
2024-01-29 16:11:42 -05:00
Charlie Marsh
4b9daf9604 Use tokio_tar instead of async_tar (#1170)
## Summary

`tokio_tar` is a fork of `async_tar` that uses Tokio instead of
`async-std`. Using it removes a significant dependency from our tree.

(There is an open PR
(https://github.com/dignifiedquire/async-tar/pull/41) in `async-tar` to
add Tokio support, but it's over a year old.)

See:
https://github.com/astral-sh/puffin/pull/1157#discussion_r1469190249.
2024-01-29 10:00:30 -05:00
Charlie Marsh
d88ce76979 Stream unpacking of source distribution downloads (#1157)
This PR migrates our source distribution downloads to unzip as we
stream, similar to our approach for wheels.

In my testing, this showed a consistent speedup (e.g., 6% here for a few
representative source distributions):

```text
❯ python -m scripts.bench --puffin-path ./target/release/main --puffin-path ./target/release/puffin --benchmark install-cold requirements.in
Benchmark 1: ./target/release/main (install-cold)
  Time (mean ± σ):      1.503 s ±  0.039 s    [User: 1.479 s, System: 0.537 s]
  Range (min … max):    1.466 s …  1.605 s    10 runs

Benchmark 2: ./target/release/puffin (install-cold)
  Time (mean ± σ):      1.421 s ±  0.024 s    [User: 1.505 s, System: 0.593 s]
  Range (min … max):    1.381 s …  1.454 s    10 runs

Summary
  './target/release/puffin (install-cold)' ran
    1.06 ± 0.03 times faster than './target/release/main (install-cold)'
```
2024-01-28 20:09:24 -05:00
Andrew Gallant
5219d37250 add initial rkyv support (#1135)
This PR adds initial support for [rkyv] to puffin. In particular,
the main aim here is to make puffin-client's `SimpleMetadata` type
possible to deserialize from a `&[u8]` without doing any copies. This
PR **stops short of actuallying doing that zero-copy deserialization**.
Instead, this PR is about adding the necessary trait impls to a variety
of types, along with a smattering of small refactorings to make rkyv
possible to use.

For those unfamiliar, rkyv works via the interplay of three traits:
`Archive`, `Serialize` and `Deserialize`. The usual flow of things is
this:

* Make a type `T` implement `Archive`, `Serialize` and `Deserialize`.
rkyv
helpfully provides `derive` macros to make this pretty painless in most
  cases.
* The process of implementing `Archive` for `T` *usually* creates an
entirely
new distinct type within the same namespace. One can refer to this type
without naming it explicitly via `Archived<T>` (where `Archived` is a
clever
  type alias defined by rkyv).
* Serialization happens from `T` to (conceptually) a `Vec<u8>`. The
serialization format is specifically designed to reflect the in-memory
layout
  of `Archived<T>`. Notably, *not* `T`. But `Archived<T>`.
* One can then get an `Archived<T>` with no copying (albeit, we will
likely
need to incur some cost for validation) from the previously created
`&[u8]`.
This is quite literally [implemented as a pointer cast][rkyv-ptr-cast].
* The problem with an `Archived<T>` is that it isn't your `T`. It's
something
  else. And while there is limited interoperability between a `T` and an
`Archived<T>`, the main issue is that the surrounding code generally
demands
a `T` and not an `Archived<T>`. **This is at the heart of the tension
for
  introducing zero-copy deserialization, and this is mostly an intrinsic
problem to the technique and not an rkyv-specific issue.** For this
reason,
  given an `Archived<T>`, one can get a `T` back via an explicit
deserialization step. This step is like any other kind of
deserialization,
although generally faster since no real "parsing" is required. But it
will
  allocate and create all necessary objects.

This PR largely proceeds by deriving the three aforementioned traits
for `SimpleMetadata`. And, of course, all of its type dependencies. But
we stop there for now.

The main issue with carrying this work forward so that rkyv is actually
used to deserialize a `SimpleMetadata` is figuring out how to deal
with `DataWithCachePolicy` inside of the cached client. Ideally, this
type would itself have rkyv support, but adding it is difficult. The
main difficulty lay in the fact that its `CachePolicy` type is opaque,
not easily constructable and is internally the tip of the iceberg of
a rat's nest of types found in more crates such as `http`. While one
"dumb"-but-annoying approach would be to fork both of those crates
and add rkyv trait impls to all necessary types, it is my belief that
this is the wrong approach. What we'd *like* to do is not just use
rkyv to deserialize a `DataWithCachePolicy`, but we'd actually like to
get an `Archived<DataWithCachePolicy>` and make actual decisions used
the archived type directly. Doing that will require some work to make
`Archived<DataWithCachePolicy>` directly useful.

My suspicion is that, after doing the above, we may want to mush
forward with a similar approach for `SimpleMetadata`. That is, we want
`Archived<SimpleMetadata>` to be as useful as possible. But right
now, the structure of the code demands an eager conversion (and thus
deserialization) into a `SimpleMetadata` and then into a `VersionMap`.
Getting rid of that eagerness is, I think, the next step after dealing
with `DataWithCachePolicy` to unlock bigger wins here.

There are many commits in this PR, but most are tiny. I still encourage
review to happen commit-by-commit.

[rkyv]: https://rkyv.org/
[rkyv-ptr-cast]:
https://docs.rs/rkyv/latest/src/rkyv/util/mod.rs.html#63-68
2024-01-28 12:14:59 -05:00
Charlie Marsh
d6795da0ea Set permissions after streaming unzip (#1151)
## Summary

When we migrated to an "unzip while we stream" solution, we lost the
logic to set permissions on the extracted files, so executables in
wheels were no longer executable. It turns out this is a little tricky,
since the permissions metadata is in the central directory at the _end_
of the zip file, and the async ZIP reader explicitly stops iteration
once it hits the central directory. (Specifically, it goes 4 bytes into
the central directory, since it sees the 4-byte signature header and
then stops.)

So, to solve that, I've added a `CentralDirectoryReader` that continues
where that iterator left off. This required forking the async zip crate:
https://github.com/charliermarsh/rs-async-zip/pull/1. It took a lot of
fiddling but I'm quite confident in the code now, especially since the
async zip crate validates the signature kind on every read.

The central directory is typically quite small (even for the Zig wheel,
which is enormous, it's just around 1MB), so I don't expect this to have
a high cost.

Closes https://github.com/astral-sh/puffin/issues/1148.
2024-01-27 19:22:44 -05:00
Charlie Marsh
abe1867a0d Enable Windows wheel builds in CI (#1129)
Closes https://github.com/astral-sh/puffin/issues/990.
2024-01-27 01:12:25 +00:00
konsti
39021263dd Windows launchers using posy trampolines (#1092)
## Background

In virtual environments, we want to install python programs as console
commands, e.g. `black .` over `python -m black .`. They may be called
[entrypoints](https://packaging.python.org/en/latest/specifications/entry-points/)
or scripts. For entrypoints, we're given a module name and function to
call in that module.

On Unix, we generate a minimal python script launcher. Text files are
runnable on unix by adding a shebang at their top, e.g.

```python
#!/usr/bin/env python
```

will make the operating system run the file with the current python
interpreter. A venv launcher for black in `/home/ferris/colorize/.venv`
(module name: `black`, function to call: `patched_main`) would look like
this:

```python
#!/home/ferris/colorize/.venv/bin/python
# -*- coding: utf-8 -*-
import re
import sys
from black import patched_main
if __name__ == "__main__":
    sys.argv[0] = re.sub(r"(-script\.pyw|\.exe)?$", "", sys.argv[0])
    sys.exit(patched_main())
```

On windows, this doesn't work, we can only rely on launching `.exe`
files.

## Summary

We use posy's rust implementation of a trampoline, which is based on
distlib's c++ implementation. We pre-build a minimal exe and append the
launcher script as stored zip archive behind it. The exe will look for
the venv python interpreter next to it and use it to execute the
appended script.

The changes in this PR make the `black` entrypoint work:

```powershell
cargo run -- venv .venv
cargo run -q -- pip install black
.\.venv\Scripts\black --version
```

Integration with our existing tests will be done in follow-up PRs.

## Implementation and Details

I've vendored the posy trampoline crate. It is a formatted, renamed and
slightly changed for embedding version of
https://github.com/njsmith/posy/pull/28.

The posy launchers are smaller than the distlib launchers, 16K vs 106K
for black. Currently only `x86_64-pc-windows-msvc` is supported. The
crate requires a nightly compiler for its no-std binary size tricks.

On windows, an application can be launched with a console or without (to
create windows instead), which needs two different launchers. The gui
launcher will subsequently use `pythonw.exe` while the console launcher
uses `python.exe`.
2024-01-26 13:54:11 +00:00
Charlie Marsh
50057cd5f2 Re-add Cargo's known hosts checking (#1118)
## Summary

This ensures that (like Cargo) we don't suffer from
https://github.com/advisories/GHSA-r5w3-xm58-jv6j, by way of checking
known hosts when fetching via `libgit2`.

The implementation is taken from Cargo itself, modified to remove all
configuration, since we don't yet support configuration for known hosts,
etc.

Closes #285.
2024-01-25 22:29:36 -05:00
Charlie Marsh
5ad2e60561 Use same-file to detect interpreter shims (#1099)
Our existing detection doesn't work on Windows, because we canoncalize
the interpreter path but not `info.sys_executable`, so the former
includes the UNC prefix, etc. This is cross-platform and gets at the
intent of the check.
2024-01-25 12:27:49 -05:00
Zanie Blue
272555e915 Switch to ref on main for PubGrub (#1094)
Just fixing the wrong merge order from
https://github.com/astral-sh/puffin/pull/1088
2024-01-25 14:50:12 +00:00
Charlie Marsh
904db967af Use junctions instead of symlinks on Windows (#1087)
## Summary

When we unzip wheels in the cache, we write the directories out to an
`archive-v0` bucket, and then symlink into that bucket from the
`wheels-v0` and `built-wheels-v0` buckets.

On Windows, symlinks are not well supported. Specifically, they need to
be explicitly enabled by the user. So, instead of symlinks, we now use
junctions, which are well-supported on Windows, and allow you to
(effectively) symlink a directory to another directory. This PR
implements said junction support, which gets the core installer working
on Windows.

In the past, we also used symlinks to implement another primitive: we
wanted to be able to replace a directory "atomically" (I put
"atomically" in quotes because I don't know if it's actually a
guaranteed atomic operation), in case someone was trying to use the
directory while we were replacing it (as opposed to deleting the
directory, then moving it into place).

On Windows, it doesn't appear to be possible to atomically replace a
junction. So instead, I'm using a new design, whereby the cache always
returns canonicalized paths. We know these canonicalized paths are
unique and won't be replaced, so they're safe for writers to rely on. In
general, when we write new data to the cache, we now return the
canonicalized path. When we read from the cache, and try to identify
(e.g.) the set of wheels available to us, we canonicalize the links
immediately and consider them non-existent if that operation fails.

Closes #1085.

---------

Co-authored-by: konstin <konstin@mailbox.org>
2024-01-25 10:06:38 +01:00
Zanie Blue
ed1ac640b9 Consolidate UnusableDependencies into a generic Unavailable incompatibility (#1088)
Requires https://github.com/zanieb/pubgrub/pull/20

In short, `UnusableDependencies` can be generalized into `Unavailable`
which encompasses incompatibilities where a package range which is
unusable for some inherent reason as well as when its dependencies are
unusable. We can eventually use this to track more incompatibilities in
the solver. I made the reason string required because I can't see a case
where we should leave it out.

Additionally, this improves the display of conflicts in the root
requirements.
2024-01-24 22:10:44 -06:00
Charlie Marsh
0519375bd6 Remove some unused dependencies (#1077) 2024-01-24 11:58:21 -05:00
Charlie Marsh
63f3434b21 Use nanoid instead of uuid (#1074)
## Summary

Gives us equivalent randomness with ~half as many characters.
2024-01-24 05:05:14 +00:00
Andrew Gallant
eebc2f340a make some things guaranteed to be deterministic (#1065)
This PR replaces a few uses of hash maps/sets with btree maps/sets and
index maps/sets. This has the benefit of guaranteeing a deterministic
order of iteration.

I made these changes as part of looking into a flaky test.
Unfortunately, I'm not optimistic that anything here will actually fix
the flaky test, since I don't believe anything was actually dependent
on the order of iteration.
2024-01-23 20:30:33 -05:00
Charlie Marsh
6561617c56 Store source distribution builds under a unique manifest ID (#1051)
## Summary

This is a refactor of the source distribution cache that again aims to
make the cache purely additive. Instead of deleting all built wheels
when the cache gets invalidated (e.g., because the source distribution
changed on PyPI or something), we now treat each invalidation as its own
cache directory. The manifest inside of the source distribution
directory now becomes a pointer to the "latest" version of the source
distribution cache.

Here's a visual example:

![Screenshot 2024-01-22 at 5 35
41 PM](https://github.com/astral-sh/puffin/assets/1309177/ca103c83-e116-4956-b91c-8434fe62cffe)

With this change, we avoid deleting built distributions that might be
relied on elsewhere and maintain our invariant that the cache is purely
additive. The cost is that we now preserve stale wheels, but we should
add a garbage collection mechanism to deal with that.
2024-01-23 19:49:11 +00:00
Charlie Marsh
53b7e3cb4f Update list of expected architectures for cargo-dist (#1021) 2024-01-19 22:32:01 +00:00
Charlie Marsh
b3954f2449 Enable PowerPC builds (#1017)
Closes #1015.
2024-01-19 17:29:11 -05:00
Charlie Marsh
72924935f8 Upgrade cargo-dist (#1016) 2024-01-19 20:19:02 +00:00
Charlie Marsh
7b365195cb Add support for ARM Linux builds in release (#1012)
Closes #992.
2024-01-19 15:13:07 -05:00
Charlie Marsh
980e1f6d79 Set explicit Docker permissions (#997) 2024-01-19 05:29:29 +00:00
Charlie Marsh
6af4eb7a45 Remove [puffin] prefix (#989)
And disable the plan job on CI now.
2024-01-19 01:59:06 +00:00
Charlie Marsh
3a1cd44fc6 Add Puffin Docker image (#985)
Missing piece for the release.

## Test Plan

Built the image locally:

```shell
❯ docker run 99956098e1f8f04e209dcfc4a0afcee67df1fe8a726c164884e67f035b1a0f42
Usage: puffin [OPTIONS] <COMMAND>

Commands:
  pip    Resolve and install Python packages
  venv   Create a virtual environment
  clean  Clear the cache
  help   Print this message or the help of the given subcommand(s)

Options:
  -q, --quiet                  Do not print any output
  -v, --verbose                Use verbose output
  -n, --no-cache               Avoid reading from or writing to the cache
      --cache-dir <CACHE_DIR>  Path to the cache directory [env: PUFFIN_CACHE_DIR=]
  -h, --help                   Print help
  -V, --version                Print version
```
2024-01-18 20:21:31 -05:00
Charlie Marsh
59d700ad2d Run cargo-dist plan on PR (#971) 2024-01-18 16:16:11 -05:00
Charlie Marsh
f9154e8297 Add release workflow (#961)
## Summary

This PR adds a release workflow powered by `cargo-dist`. It's similar to
the version that's PR'd in Ruff
(https://github.com/astral-sh/ruff/pull/9559), with the exception that
it doesn't include the Docker build or the "update dependents" step for
pre-commit.
2024-01-18 15:44:11 -05:00
Charlie Marsh
96a61fb351 Remove RFC2047 decoder (#967)
## Summary

- This was inherited from
d719988323/src/metadata.rs (LL78C2-L91C26)
- ...which introduced this code here:
9cd1d43f7c
- ...with the originating issue here:
https://github.com/PyO3/maturin/issues/612
- ...and the upstream issue here:
https://github.com/staktrace/mailparse/issues/50

It seems like the goal was to support Unicode in certain header fields,
but I don't think this is necessary for us. We only use
`get_first_value` for `Requires-Python`, which has to be ASCII, doesn't
it?

In my testing, it seems like the `charset` hack can also be removed. The
tests I copied over actually work without it, which makes me a bit
skeptical.

The main benefit here is that we get to a remove a _big_ dependency
stack, including Chumsky and Stacker and psm which have limited
cross-platform support.
2024-01-18 15:09:45 -05:00
Charlie Marsh
fb22680311 Remove legacy release files (#960)
I added these long ago, and they're going to make the release diff a lot
more confusing.
2024-01-18 05:07:55 +00:00