Commit Graph

1000 Commits

Author SHA1 Message Date
Andrew Gallant ed000d0dd5 puffin-resolver: add singleton fast path
In many cases, version ranges are actually just pins to a
specific and single version. And we can detect that statically
by examining the range. If we do have a range that is just one
version, then we can ask a `VersionMap` for just that version
instead of iterating over what's in the map until we find one
that satisfies the range.

I had tried this before making `VersionMap` construction lazy,
but it didn't seem to matter much. But helps a lot more now
with a lazy `VersionMap` because it lets us avoid creating a
lot of distributions in memory that we won't ultimately use.
2024-02-15 08:10:32 -05:00
Andrew Gallant 8102980192 puffin-resolver: make VersionMap construction lazy
That is, a `PrioritizedDistribution` for a specific version of a
package is not actually materialized in memory until a corresponding
`VersionMap::get` call is made for that version. Similarly, iteration
lazily materializes distributions as it moves through the map. It
specifically does not materialize everything first.

The main reason why this is effective is that an
`OwnedArchive<SimpleMetadata>` represents a zero-copy (other than
reading the source file) version of `SimpleMetadata` that is really just
a `Vec<u8>` internally. The problem with `VersionMap` construction
previously is that it had to eagerly materialize a `SimpleMetadata` in
memory before anything else, which defeats a large part of the purpose
of zero-copy deserialization. By making more of `VersionMap`
construction itself lazy, we permit doing some parts of resolution
without necessarily fully deserializing a `SimpleMetadata` into memory.
Indeed, with this commit, in the warm cached case, a `SimpleMetadata` is
itself never materialized fully in memory.

This does not completely and totally fully realize the benefits of
zero-copy deserialization. For example, we are likely still building
lots of distributions in memory that we don't actually need in some
cases. Perhaps in cases where no resolution exists, or when one needs to
iterate over large portions of the total versions published for a
package.
2024-02-15 08:10:32 -05:00
Andrew Gallant e2f3ad0e28 puffin-resolver: add some trace calls
This commit adds some logging to candidate selection during
resolution. The idea with these logs is to get a signal on
how much "exploring" the resolver does in specific examples.

For example, this logs helped me realize that at least in
some cases, candidate selection was looking through a long list
of versions even when its range consisted of exactly one
version. We'll use this fact in a later commit.
2024-02-15 08:10:32 -05:00
Andrew Gallant 1cff7c3774 platform-tags: make Tags use an Arc internally
This makes cloning and thus sharing across multiple threads much
cheaper. Since Tags is conceptually immutable once it is constructed,
this doesn't pose an issue and shouldn't introduce any additional
costs.
2024-02-15 08:10:32 -05:00
Charlie Marsh e4fffc15f5
Remove Cargo-specific error messages (#1306)
We're leveraging Cargo's git implementation, but we left in some
Cargo-specific error messages for features we don't yet support.
2024-02-15 06:04:22 +00:00
Zanie Blue 9808c6b500
Reset all of the snapshots for consistent indentation (#1300)
This is really annoying, but the snapshots keep changing indentation
when updated.

I could not get insta to update them. So I added a print statement to
`main` and updated the snapshots, then removed the statement and updated
the snapshots again to force them all to refresh.
2024-02-14 12:50:28 -06:00
Charlie Marsh 40b74fb0fb
Replace `MarkupSafe` for no-binary tests (#1296) 2024-02-14 04:44:07 +00:00
Zanie Blue 7fec2a311a
Refactor storage of distribution metadata needed in resolver (#1291)
Follows #1290 and https://github.com/astral-sh/puffin/pull/912 with some
minor clean-up.
2024-02-13 04:19:21 +00:00
Zanie Blue 3bff8d5f79
Add scenario coverage for wheels with incompatible ABI and Python tags (#1285)
We use

- An arbitrary ABI hash: `MMMMMM` (six base64 characters)
- An unlikely Jython27 Python tag

For cases that are valid but are never going to be available during
tests.

See https://github.com/zanieb/packse/pull/109
2024-02-12 22:14:38 -06:00
Zanie Blue b5dd8b7de2
Track yanked versions as incompatibilities (#1290)
Moves yanked version filtering from `VersionMap::from_metadata` to the
resolver and tracks it as a PubGrub unavailable incompatibility so
yanked versions are reflected in error messages.

e.g. before
```
╰─▶ Because only albatross<=0.1.0 is available and you require albatross>0.1.0, 
       we can conclude that the requirements are unsatisfiable.
```

after

```
╰─▶ Because only the following versions of albatross are available:
            albatross<=0.1.0
            albatross==1.0.0
      and albatross==1.0.0 is unusable because it was yanked, we can conclude that albatross>0.1.0 cannot be used.
      And because you require albatross>0.1.0, we can conclude that the requirements are unsatisfiable.
```
2024-02-12 22:01:17 -06:00
Charlie Marsh d8619f668a
Surface errors for offline `--find-links` URLs (#1271)
## Summary

Ensures that if the user passes `--no-index` with `--find-links`, and
we're unable to access the HTML page, we show an appropriate hint.
2024-02-13 03:41:00 +00:00
Charlie Marsh 16bb80132f
Add an `--offline` mode (#1270)
## Summary

This PR adds an `--offline` flag to Puffin that disables network
requests (implemented as a Reqwest middleware on our registry client).
When `--offline` is provided, we also allow the HTTP cache to return
stale data.

Closes #942.
2024-02-13 03:35:23 +00:00
Zanie Blue 0cd6b7be8c
Fix incompatible wheel test scenarios (#1284)
I had specified the tags incorrectly
https://github.com/zanieb/packse/pull/105
2024-02-12 18:51:49 +00:00
Zanie Blue 6d24d998e0
Add scenarios for yanked packages (#1283) 2024-02-12 12:44:59 -06:00
Zanie Blue 336d12556c
Add scenario tests for `--only-binary` and `--no-binary` (#1279) 2024-02-12 11:21:14 -06:00
Charlie Marsh b386590b3c
Add some compatibility arguments to `puffin venv` (#1282)
See: https://github.com/astral-sh/puffin/issues/1276.
2024-02-12 03:19:55 +00:00
Charlie Marsh 93b7a1140f
Allow virtualenv creation at existing, empty directories (#1281)
## Summary

If the directory exists but is empty, we should allow `puffin venv`
without erroring.

Also adds test cases for a variety of error cases.
2024-02-12 03:13:13 +00:00
Charlie Marsh b7e3933fe7
Place editable requirements before non-editable requirements (#1278)
## Summary

`pip-compile` puts the editable requirements first.

Closes https://github.com/astral-sh/puffin/issues/1275.
2024-02-12 02:26:40 +00:00
Charlie Marsh a16ec45d1f
Set an exclude cutoff for virtualenv tests (#1280)
## Summary

This test is failing since a new version of one of the seed packages was
uploaded.
2024-02-12 02:21:05 +00:00
Zanie Blue a37b08808e
Implement pip compatible `--no-binary` and `--only-binary` options (#1268)
Updates our `--no-binary` option and adds a `--only-binary` option for
compatibility with `pip` which uses `:all:`, `:none:` and `<name>` for
specifying packages.

This required adding support for `--only-binary <name>` into our
resolver, previously it was only a boolean toggle.

Retains`--no-build` which is equivalent to `--only-binary :all:`. This
is common enough for safety that I would prefer it is available without
pip's awkward `:all:` syntax.

---------

Co-authored-by: konsti <konstin@mailbox.org>
2024-02-11 19:31:41 -06:00
Charlie Marsh d98b3c3070
Strip UNC prefix when setting working directory (#1277)
## Summary

For PEP 517 builds, the current working directory needs to be set to the
directory of the source distribution. It turns out that on Windows, if
you use a UNC path for the working directory, then relative paths are
interpreted relative to the root of the current drive
([source](https://www.fileside.app/blog/2023-03-17_windows-file-paths/#paths-relative-to-the-root-of-the-current-drive)).
So, when builds attempted to resolve relative paths, they always
errored...

This PR ensures that we remove the UNC prefix when setting the current
working directory.

Closes #1238.

## Test Plan

I tested this on my Windows machine by installing `ujson` with
`--no-binary ujson`. (I don't want to add that specific test, since it's
really slow to build.)
2024-02-12 00:51:36 +00:00
Charlie Marsh ba4c6e1a55
Remove unused deps (#1273) 2024-02-11 18:53:58 +00:00
Charlie Marsh 32aacc35a9
Bump version to v0.0.4 (#1269) 2024-02-09 16:42:17 -05:00
konsti 561e33e353
Validate instead of discovering python patch version (#1266)
Contrary to our prior assumption, we can't reliably select a specific
patch version. With the deadsnakes PPA for example, `python3.12` is
installed into `PATH`, but `python3.12.1` isn't. Based on the assumption
(or rather, observation) that users have a single python patch version
per python minor version installed, generally the latest, we only check
if the installed patch version matches the selected patch version, if
any, instead of search for one.

In the process, i deduplicated the python discovery logic.
2024-02-08 22:38:00 +01:00
konsti 1dc9904f8c
Run the test suite on windows in CI (#1262)
Run `cargo test` on windows in CI, pulling the switch on tier 1 windows
support.

These changes make the bootstrap script virtually required for running
the tests. This gives us consistency between and CI, but it also locks
our tests to python-build-standalone and an articificial `PATH`.

I've deleted the shell bootstrap script in favor of only the python one,
which also runs on windows. I've left the (sym)link creation of the
bootstrap in place, even though it is not used by the tests anymore.

I've reactivated the three tests that would previously stack overflow by
doubling their stack sizes. The stack overflows only happen in debug
mode, so this is neither a user facing problem nor an actual problem
with our code and this workaround seems better than optimizing our code
for case that the (release) compiler can optimize much better for.

The handling of patch versions will be fixed in a follow-up PR.

Closes #1160 
Closes #1161

---------

Co-authored-by: Charlie Marsh <charlie.r.marsh@gmail.com>
2024-02-08 22:09:55 +01:00
Andrew Gallant 96276d9e3e
puffin-resolver: simplify version map construction (#1267)
In the process of making VersionMap construction lazy, I realized this
refactoring would be useful to me. It also simplifies a fair bit of case
analysis and does fewer BTreeMap lookups during construction. With that
said, this doesn't seem to matter for perf:

```
$ hyperfine -w10 --runs 50 \
    "puffin-main pip compile --cache-dir ~/astral/tmp/cache-main ~/astral/tmp/reqs/home-assistant-reduced.in -o /dev/null" \
    "puffin-test pip compile --cache-dir ~/astral/tmp/cache-test ~/astral/tmp/reqs/home-assistant-reduced.in -o /dev/null"
Benchmark 1: puffin-main pip compile --cache-dir ~/astral/tmp/cache-main ~/astral/tmp/reqs/home-assistant-reduced.in -o /dev/null
  Time (mean ± σ):     146.8 ms ±   4.1 ms    [User: 350.1 ms, System: 314.2 ms]
  Range (min … max):   140.7 ms … 158.0 ms    50 runs

Benchmark 2: puffin-test pip compile --cache-dir ~/astral/tmp/cache-test ~/astral/tmp/reqs/home-assistant-reduced.in -o /dev/null
  Time (mean ± σ):     146.8 ms ±   4.5 ms    [User: 359.8 ms, System: 308.3 ms]
  Range (min … max):   138.2 ms … 160.1 ms    50 runs

Summary
  puffin-main pip compile --cache-dir ~/astral/tmp/cache-main ~/astral/tmp/reqs/home-assistant-reduced.in -o /dev/null ran
    1.00 ± 0.04 times faster than puffin-test pip compile --cache-dir ~/astral/tmp/cache-test ~/astral/tmp/reqs/home-assistant-reduced.in -o /dev/null
```

But the simplification is still nice, and will decrease the delta
between what we have now and a lazy version map.
2024-02-08 15:33:33 -05:00
Zanie Blue fc2ab611d5
Use "locations" instead of "listings" for find links errors (#1263) 2024-02-07 10:28:22 -06:00
konsti ab45485eb5
Reduce stack sizes further and ignore remaining tests (#1261)
This PR reduces the stack sizes a windows a little further using the
stack traces from stack overflows combined with looking at the type
sizes. Ultimately, it ignore the three remaining tests failing in debug
on windows due to stack overflows to unblock `cargo test` for windows on
CI.

444 tests run: 444 passed (39 slow), 1 skipped
2024-02-06 23:08:18 +01:00
konsti e0cdf1a16f
Use anstream consistently and remove clippy lints (#1260)
We need to use the anstream print macros instead of the std print
macros, otherwise we risk wrong color behavior
(https://github.com/astral-sh/puffin/pull/1258#discussion_r1480428236).
Luckily, the `print_stderr` and `print_stdout` lints catch usages of the
std prints.

This PR switches over to anstream consistently and removes the now
redundant clippy lints. The lints should catch missing anstream usage in
the future.
2024-02-06 22:16:26 +01:00
konsti f4ca175df4
Search and replace windows specific tests in deps (#1255)
Remove windows-only dependencies from the snapshot output using regex.
We now do the filtering entirely on our without relying on insta
settings.

435 tests run: 430 passed (30 slow), 5 failed, 1 skipped
2024-02-06 19:31:42 +00:00
konsti ac49dec4a2
Multiple entries in PUFFIN_PYTHON_PATH for windows tests (#1254)
There are no binary installers for the latests patch versions of cpython
for windows, and building them is hard. As an alternative, we download
python-build-standanlone cpythons and put them into `<project
root>/bin`. On unix, we can symlink `pythonx.y.z` into this directory
and point `PUFFIN_PYTHON_PATH` to it. On windows, all pythons are called
`python.exe` and they don't like being linked. Instead, we add the path
to each directory containing a `python.exe` to `PUFFIN_PYTHON_PATH`,
similar to the regular `PATH`. The python discovery on windows was
extended to respect `PUFFIN_PYTHON_PATH` where needed.

These changes mean that we don't need to (sym)link pythons anymore and
could drop that part to the script.

435 tests run: 389 passed (21 slow), 46 failed, 1 skipped
2024-02-06 20:28:30 +01:00
Charlie Marsh 91118a962a
Offer tip when users omits pip prefix (#1257)
## Summary

Open to other opinions here. We could just continue (and warn), prompt
the user with a confirmation, etc.

(The weird thing about those two options is we might need to validate
the command-line arguments _before_ we do that -- so you could get
errors for bad arguments, and then get a warning that your subcommand is
wrong. I can probably avoid that with more work if it feels like a
better out come though.)

Closes https://github.com/astral-sh/puffin/issues/1256.
2024-02-06 19:25:07 +00:00
Charlie Marsh 62416286e2
Remove `add` and `remove` commands (#1259)
## Summary

These add and remove dependencies from a `pyproject.toml` -- but they're
currently hidden, and don't match the rest of the workflow. We can
re-add them when the time is right.
2024-02-06 14:18:27 -05:00
Zanie Blue d4bbaf1755
Add hint for `--no-index` without `--find-links` (#1258)
Since unavailable packages with `--no-index` can be confusing when the
user does not also provide `--find-links` we add a hint for this case.
Required some plumbing to get the required information to the
`NoSolution` error.

---------

Co-authored-by: konstin <konstin@mailbox.org>
2024-02-06 11:04:14 -06:00
konsti b2a810fe37
Add windows specific filters for tests (#1231)
Add more windows specific filters in various places.

435 tests run: 333 passed (21 slow), 102 failed, 1 skipped
2024-02-06 15:58:16 +01:00
Andrew Gallant d4b4c21133
initial implementation of zero-copy deserialization for SimpleMetadata (#1249)
(Please review this PR commit by commit.)

This PR closes an initial loop on zero-copy deserialization. That
is, provides a way to get a `Archived<SimpleMetadata>` (spelled
`OwnedArchive<SimpleMetadata>` in the code) from a `CachedClient`. The
main benefit of zero-copy deserialization is that we can read bytes
from a file, cast those bytes to a structured representation without
cost, and then start using that type as any other Rust type. The
"catch" is that the structured representation is not the actual type
you started with, but the "archived" version of it.

In order to make all this work, we ended up needing to shave a rather
large yak: we had to re-implement HTTP cache semantics. Previously,
we were using the `http-cache-semantics` crate. While it does support
Serde, it doesn't support `rkyv`. Moreover, even simple support for
`rkyv` wouldn't be enough. What we actually want is for the HTTP cache
semantics to be implemented on the *archived* type so that we can
decide whether our cached response is stale or not without needing to
do a full deserialization into the unarchived type. This is why, in
this PR, you'll see `impl ArchivedCachePolicy { ... }` instead of
`impl CachePolicy { ... }`. (The `derive(rkyv::Archive)` macro
automatically introduces the `ArchivedCachePolicy` type into the
current namespace.)

Unfortunately, this PR does not fully realize the dream that is
zero-copy deserialization. Namely, while a `CachedClient` can now
provide an `OwnedArchive<SimpleMetadata>`, the rest of our code
doesn't really make use of it. Indeed, as soon as we go to build a
`VersionMap`, we eagerly convert our archived metadata into an owned
`SimpleMetadata` via deserialization (that *isn't* zero-copy). After
this change, a lot of the work now shifts to `rkyv` deserialization
and `VersionMap` construction. More precisely, the main thing we drop
here is `CachePolicy` deserialization (which is now truly zero-copy)
and the parsing of the MessagePack format for `SimpleMetadata`. But we
are still paying for deserialization. We're just paying for it in a
different place.

This PR does seem to bring a speed-up, but it is somewhat underwhelming.
My measurements have been pretty noisy, but I get a 1.1x speedup fairly
often:

```
$ hyperfine -w5 "puffin-main pip compile --cache-dir ~/astral/tmp/cache-main ~/astral/tmp/reqs/home-assistant-reduced.in -o /dev/null" "puffin-test pip compile --cache-dir ~/astral/tmp/cache-test ~/astral/tmp/reqs/home-assistant-reduced.in -o /dev/null" ; A kang
Benchmark 1: puffin-main pip compile --cache-dir ~/astral/tmp/cache-main ~/astral/tmp/reqs/home-assistant-reduced.in -o /dev/null
  Time (mean ± σ):     164.4 ms ±  18.8 ms    [User: 427.1 ms, System: 348.6 ms]
  Range (min … max):   131.1 ms … 190.5 ms    18 runs

Benchmark 2: puffin-test pip compile --cache-dir ~/astral/tmp/cache-test ~/astral/tmp/reqs/home-assistant-reduced.in -o /dev/null
  Time (mean ± σ):     148.3 ms ±  10.2 ms    [User: 357.1 ms, System: 319.4 ms]
  Range (min … max):   136.8 ms … 184.4 ms    19 runs

Summary
  puffin-test pip compile --cache-dir ~/astral/tmp/cache-test ~/astral/tmp/reqs/home-assistant-reduced.in -o /dev/null ran
    1.11 ± 0.15 times faster than puffin-main pip compile --cache-dir ~/astral/tmp/cache-main ~/astral/tmp/reqs/home-assistant-reduced.in -o /dev/null
```

One downside is that this does increase cache size (`rkyv`'s
serialization format is not as compact as MessagePack). On disk size
increases by about 1.8x for our `simple-v0` cache.

```
$ sort-filesize cache-main
4.0K    cache-main/CACHEDIR.TAG
4.0K    cache-main/.gitignore
8.0K    cache-main/interpreter-v0
8.7M    cache-main/wheels-v0
18M     cache-main/archive-v0
59M     cache-main/simple-v0
109M    cache-main/built-wheels-v0
193M    cache-main
193M    total

$ sort-filesize cache-test
4.0K    cache-test/CACHEDIR.TAG
4.0K    cache-test/.gitignore
8.0K    cache-test/interpreter-v0
8.7M    cache-test/wheels-v0
18M     cache-test/archive-v0
107M    cache-test/simple-v0
109M    cache-test/built-wheels-v0
242M    cache-test
242M    total
```

Also, while I initially intended to do a simplistic implementation of
HTTP cache semantics, I found that everything was somewhat
inter-connected. I could have wrote code that _specifically_ only worked
with the present behavior of PyPI, but then it would need to be special
cased and everything else would need to continue to use
`http-cache-sematics`. By implementing what we need based on what Puffin
actually is (which is still less than what `http-cache-semantics` does),
we can avoid special casing and use zero-copy deserialization for our
cache policy in _all_ cases.
2024-02-05 16:47:53 -05:00
Charlie Marsh 398659a9b0
Show yank warnings for `pip install` (#1253)
Closes https://github.com/astral-sh/puffin/issues/1252.
2024-02-05 17:15:44 +00:00
Zanie Blue d090acf13d
Improve error messaging when a dependency is not found (#1241)
Previously, whenever we encountered a missing package we would throw an
error without information about why the package was requested. This
meant that if a transitive dependency required a missing package, the
user would have no idea why it was even selected. Here, we track
`NotFound` and `NoIndex` errors as `NoVersions` incompatibilities with
an attached reason. Improves our test coverage for `--no-index` without
`--find-links`.

The
[snapshots](https://github.com/astral-sh/puffin/pull/1241/files#diff-3eea1658f165476252f1f061d0aa9f915aabdceafac21611cdf45019447f60ec)
show a nice improvement.

I think this will also enable backtracking to another version if some
version of transitive dependency has a missing dependency. I'll write a
scenario for that next.

Requires https://github.com/zanieb/pubgrub/pull/22
2024-02-05 08:43:05 -06:00
Charlie Marsh be9125b0f0
Remove unnecessary `is_dir` in `clone_recursive` (#1247) 2024-02-04 23:54:22 +00:00
Charlie Marsh 9d42cfd09b
Clarify documentation for `--no-index` (#1243)
Closes #1242.
2024-02-04 18:46:01 -05:00
Andrew Gallant 586eeb6eca
tests: update snapshot for new `pip` release (#1245)
See: https://pypi.org/project/pip/#history
2024-02-03 15:44:13 -05:00
Zanie Blue bc2f8f5b1e
Fix direct use of `range.simplify` (#1236) 2024-02-02 18:08:25 +00:00
Zanie Blue 6db9db0079
Invert display of "no versions" incompatibilities with multiple ranges (#1233)
Closes #884 

e.g.

```
❯ cargo run -q -- pip compile --python-version 3.12 requirements.in
  × No solution found when resolving dependencies:
  ╰─▶ Because the requested Python version (3.12) does not satisfy Python>=3.6,<3.10 and recommenders==1.0.0 depends on Python>=3.6,<3.9, we can conclude that recommenders==1.0.0 cannot be used.
      And because only the following versions of recommenders are available:
          recommenders<=0.7
          recommenders==1.0.0
          recommenders==1.1.0
          recommenders==1.1.1
      we can conclude that recommenders>0.7,<1.1.0 cannot be used. (1)

      Because the requested Python version (3.12) does not satisfy Python>=3.6,<3.10 and recommenders>=1.1.0 depends on Python>=3.6,<3.10, we can conclude that recommenders>=1.1.0 cannot be used.
      And because we know from (1) that recommenders>0.7,<1.1.0 cannot be used, we can conclude that recommenders>0.7 cannot be used.
      And because you require recommenders>0.7, we can conclude that the requirements are unsatisfiable.
```
2024-02-02 12:00:25 -06:00
konsti f10f902570
Yield after channel send and move cpu tasks to thread (#1163)
## Summary

Previously, we were blocking operations that could run in parallel. We
would send request through our main requests channel, but not yield so
that the receiver could only start processing requests much later than
necessary. We solve this by switching to the async
`tokio::sync::mpsc::channel`, where send is an async functions that
yields.

Due to the increased parallelism cache deserialization and the
conversion from simple api request to version map became bottlenecks, so
i moved them to `spawn_blocking`. Together these result in a 30-60%
speedup for larger warm cache resolution. Small cases such as black
already resolve in 5.7 ms on my machine so there's no speedup to be
gained, refresh and no cache were to noisy to get signal from.

Note for the future: Revisit the bounded channel if we want to produce
requests from `process_request`, too, (this would be good for
prefetching) to avoid deadlocks.

## Details

We can look at the behavior change through the spans:

```
RUST_LOG=puffin=info TRACING_DURATIONS_FILE=target/traces/jupyter-warm-branch.ndjson cargo run --features tracing-durations-export --bin puffin-dev --profile profiling -- resolve jupyter 2> /dev/null
```

Below, you can see how on main, we have discrete phases: All (cached)
simple api requests in parallel, then all (cached) metadata requests in
parallel, repeat until done. The solver is mostly waiting until it has
it's version map from the simple API query to be able to choose a
version. The main thread is blocked by process requests.

In the PR branch, the simple api requests succeeds much earlier,
allowing the solver to advance and also to schedule more prefetching.
Due to that `parse_cache` and `from_metadata` became bottlenecks, so i
moved them off the main thread (green color, and their spans can now
overlap because they can run on multiple threads in parallel). The main
thread isn't blocked on `process_request` anymore, instead it has
frequent idle times. The spans are all much shorter, which indicates
that on main they could have finished much earlier, but a task didn't
yield so they weren't scheduled to finish (though i haven't dug deep
enough to understand the exact scheduling between the process request
stream and the solver here).

**main**


![jupyter-warm-main](https://github.com/astral-sh/puffin/assets/6826232/693c53cc-1090-41b7-b02a-a607fcd2cd99)

**PR**


![jupyter-warm-branch](https://github.com/astral-sh/puffin/assets/6826232/33435f34-b39b-4b0a-a9d7-4bfc22f55f05)

## Benchmarks

```
$ hyperfine --warmup 3 "target/profiling/main-dev resolve jupyter" "target/profiling/branch-dev resolve jupyter"
Benchmark 1: target/profiling/main-dev resolve jupyter
  Time (mean ± σ):      29.1 ms ±   0.7 ms    [User: 22.9 ms, System: 11.1 ms]
  Range (min … max):    27.7 ms …  32.2 ms    103 runs
 
Benchmark 2: target/profiling/branch-dev resolve jupyter
  Time (mean ± σ):      18.8 ms ±   1.1 ms    [User: 37.0 ms, System: 22.7 ms]
  Range (min … max):    16.5 ms …  21.9 ms    154 runs
 
Summary
  target/profiling/branch-dev resolve jupyter ran
    1.55 ± 0.10 times faster than target/profiling/main-dev resolve jupyter

$ hyperfine --warmup 3 "target/profiling/main-dev resolve meine_stadt_transparent" "target/profiling/branch-dev resolve meine_stadt_transparent"
Benchmark 1: target/profiling/main-dev resolve meine_stadt_transparent
  Time (mean ± σ):      37.8 ms ±   0.9 ms    [User: 30.7 ms, System: 14.1 ms]
  Range (min … max):    36.6 ms …  41.5 ms    79 runs
 
Benchmark 2: target/profiling/branch-dev resolve meine_stadt_transparent
  Time (mean ± σ):      24.7 ms ±   1.5 ms    [User: 47.0 ms, System: 39.3 ms]
  Range (min … max):    21.5 ms …  28.7 ms    113 runs
 
Summary
  target/profiling/branch-dev resolve meine_stadt_transparent ran
    1.53 ± 0.10 times faster than target/profiling/main-dev resolve meine_stadt_transparent

$ hyperfine --warmup 3 "target/profiling/main pip compile scripts/requirements/home-assistant.in" "target/profiling/branch pip compile scripts/requirements/home-assistant.in"
Benchmark 1: target/profiling/main pip compile scripts/requirements/home-assistant.in
  Time (mean ± σ):     229.0 ms ±   2.8 ms    [User: 197.3 ms, System: 63.7 ms]
  Range (min … max):   225.8 ms … 234.0 ms    13 runs
 
Benchmark 2: target/profiling/branch pip compile scripts/requirements/home-assistant.in
  Time (mean ± σ):      91.4 ms ±   5.3 ms    [User: 289.2 ms, System: 176.9 ms]
  Range (min … max):    81.0 ms … 104.7 ms    32 runs
 
Summary
  target/profiling/branch pip compile scripts/requirements/home-assistant.in ran
    2.50 ± 0.15 times faster than target/profiling/main pip compile scripts/requirements/home-assistant.in
```
2024-02-02 18:18:24 +01:00
konsti 3771f6656e
Allow additional assertions on command output (#1226)
In the scenario tests, we want to make sure we're actually conforming to
the scenario's expectations, so we now have an extra assertion on
whether resolution failed or succeeded as well as that it includes the
given packages.

Closes #1112
Closes #1030
2024-02-02 09:41:35 +00:00
konsti b16422a108
Remove insta_cmd (#1225)
We need more flexible filters than those `inta` offers, and `insta_cmd`
makes it impossible to plug in programmatic filters. At the same time we
use barely any of `insta_cmd`'s features. We can replace the subset we
need in about 50 loc.
2024-02-02 09:37:04 +00:00
konsti 0925e446a8
Refactor remaining integration tests (#1220)
Mostly a mechanical refactor to use the `puffin_snapshot!` and
`TestContext` infrastructure in the add, remove, venv and pip uninstall
tests, in preparation for adding programmatic windows testing filters.
The is only one remaining usage of `assert_cmd_snapshot!` now in the
`puffin_snapshot!` macro.
2024-02-02 10:26:59 +01:00
konsti 6b050a1972
Refactor pip install and sync tests (#1213)
Mostly a mechanical refactor to use the `puffin_snapshot!` and
`TestContext` infrastructure in the pip install and pip sync tests, in
preparation for adding programmatic windows testing filters.
2024-02-02 10:26:31 +01:00
Charlie Marsh d77d129e8d
Run `cargo update` (#1230) 2024-02-01 11:14:38 -05:00
Charlie Marsh bb49ebee1e
Avoid race condition in clone file replacement (#1229)
## Summary

I've never seen this in practice but in theory it is possible, and we
have the same guardrail in the hardlink path.
2024-02-01 10:55:23 -05:00
konsti 809c6d676f
Use normalized display in tests and other small windows fixes (#1228)
Split out from the large test refactoring PR. Use `normalized_display`
in tests and two more thiserror derives to match snapshots and output,
and other small windows fixes.
2024-02-01 16:12:30 +01:00
Charlie Marsh 9487378ef9
Avoid TOCTOU errors in data directory installations (#1227)
## Summary

See: https://github.com/astral-sh/puffin/issues/1224

## Test Plan

Ran `python -m scripts.bench --puffin
scripts/requirements/compiled/jupyter.txt --min-runs 100 --benchmark
install-warm --verbose` several times, which failed eventually on `main`
but not on this branch.
2024-02-01 14:55:29 +00:00
konsti ea0bfc565d
Refactor pip scenario tests (#1212)
Mostly a mechanical refactor to use the `puffin_snapshot!` and
`TestContext` infrastructure in the pip compile and pip install
scenarios, in preparation for adding programmatic windows testing
filters.
2024-02-01 10:31:40 +01:00
Charlie Marsh 0757862a7a
Accommodate minute-level filters in Insta (#1219)
I don't know why `compile_editable` took over a minute in this case, but
seems like it did? Hard to test this fix.


https://github.com/astral-sh/puffin/actions/runs/7734769259/job/21089338951?pr=1216
2024-02-01 09:43:09 +01:00
Charlie Marsh 8cbe1d220c
Remove double-download for source distributions (#1218)
## Summary

Oops -- this was using a different cache key than the route above (this
is the wheel _metadata_ route vs. the wheel build route), so we were
saving and building source distributions twice in `pip install`.
2024-02-01 04:41:29 +00:00
Charlie Marsh 51e8609ee8
Use Python 3.12 in benchmarks (#1215)
I originally used Python 3.10, since 3.10 and 3.11 are by far the most
common (at least for [Ruff](https://pypistats.org/packages/ruff)). But
3.12 should give Python tools the most favorable benchmarks.
2024-01-31 15:51:13 -05:00
Charlie Marsh c4bfb6efee
Add a `BENCHMARKS.md` with rendered benchmarks (#1211)
As a precursor to the release, I want to include a structured document
with detailed benchmarks.

Closes https://github.com/astral-sh/puffin/issues/1210.
2024-01-31 20:11:52 +00:00
Andrew Gallant b9d89e7624
puffin-client: generalize SimpleMetadaRaw into OwnedArchive<A> (#1208)
It turns out that the pattern I coded up for SimpleMetadataRaw is
generally useful when working with rkyv. This commit makes it generic by
supporting any type that implements rkyv's traits, and makes a few
simplifying assumptions by picking a concrete serializer, validator and
deserializer. In effect, this lets use own any archived value.

We also rejigger the API a little bit and double-down on
`OwnedArchive<A>` just being a owned wrapper for `Archived<A>`. Namely,
we implement `Deref` and turn its inherent methods into methods that
require fully qualified syntax. (As is standard for things that
implement `Deref` to avoid ambiguity with the deref target's methods.)

(This PR also makes a couple small simplifications to our custom rkyv
serializer since we no longer need to use it directly. We do still need
to name the type in trait bounds, so it has to be public.)
2024-01-31 11:56:34 -05:00
konsti 234e8d0bb7
Abstract away test duplication in pip-compile (#1187)
In preparation for the new windows handling, i want to introduce a
`TestContext` and `puffin_snapshot!` abstraction. This PR applies those
changes for pip-compile. My plan is to use those for all venv-based
integration tests and build the custom windows filters on top of
`puffin_snapshot!`.
2024-01-31 16:11:10 +00:00
Charlie Marsh 01258c1bb3
Report number of bytes deleted when clearing cache (#1203)
## Summary

This is based on Cargo's `clean` implementation, with modifications
based on some of my own preferences, and to better adhere to patterns we
use in our codebase:

![Screenshot 2024-01-31 at 1 31
10 AM](https://github.com/astral-sh/puffin/assets/1309177/38704798-b17f-4972-ab67-00484ce63d62)
2024-01-31 10:48:28 -05:00
Charlie Marsh 8f9258fae3
Invert default feature for testing (#1200)
## Summary

We have some flags in Puffin that enable us to opt-in to certain tests.
To date, they've been opt-in, so we've run our tests with
`--all-features`. This PR makes them opt-out, and we now run tests with
default features.

The main motivation here is that I want to get tests working for macOS
on CI, but for unknown reasons, macOS can't compile the PyO3 features at
the same time as everything else due to strange linker issues. By
avoiding `--all-features` for tests, we thus avoid unnecessarily
including features that we don't actually use in Puffin.

I verified that the exact same number of tests (439) are run before and
after this change. For users, the primary difference is that you now
need to specify `--no-default-features --features pypi --features
python` to avoid (e.g.) including the Git tests.
2024-01-31 09:44:26 -05:00
Charlie Marsh b2f1bbaa63
Add a Ctrl+C handler to the confirm workflow (#1202)
Fixes an issue whereby exiting the confirmation prompt can lead to your
cursor disappearing: https://github.com/console-rs/dialoguer/issues/294.

See:
b839a2c5b7/rye/src/main.rs (L36-L48).
2024-01-31 02:08:27 +00:00
Charlie Marsh 262f29b558
Add missing `--exclude-newer` to executable tests (#1201)
A new version of `platformdirs` came out, which broke these.
2024-01-30 20:26:11 -05:00
Charlie Marsh b88b9e1f3d
Remove dedicated `flate2` features from Puffin (#1199)
We should be able to enable and disable these without crate-internal
features.
2024-01-30 19:41:08 -05:00
Andrew Gallant b47f70917f
puffin-client: simplify use of http-cache-semantics (#1197)
The `http-cache-semantics` crate is polymorphic on the types of requests
and responses it accepts. We had previously been explicitly converting
between `http` and `reqwest` types, but this isn't necessary. We can
provide impls of the traits in `http-cache-semantics` for `reqwest`'s
types (via a wrapper). This saves us from the awkward request/response
type conversions.

While this does clone the request, this is:

1. Not new. We were previously cloning the request to do the conversion.
2. An artifact (I believe) of http-cache-semantics API. (It kind of
   seems like an API bug to me?)

There is also a little bit of messiness around inter-operating between
http::uri::Uri and url::Url. But overall shouldn't be a big deal.
2024-01-30 18:20:44 -05:00
Charlie Marsh 3f5e7306a5
Remove `WaitMap` dependency (#1183)
## Summary

This is an attempt to https://github.com/astral-sh/puffin/pull/1163 by
removing the `WaitMap` and gaining more granular control over the values
that we hold over `await` boundaries.
2024-01-30 15:25:22 -05:00
Charlie Marsh c129717b41
Add support for `--no-deps` to `pip install` (#1191)
## Summary

Closes https://github.com/astral-sh/puffin/issues/1188.
2024-01-30 19:54:57 +00:00
Charlie Marsh 8305acc584
Add a builder for resolution options (#1192) 2024-01-30 19:50:16 +00:00
Charlie Marsh aa3b79ec63
Prompt user for missing `-r` and `-e` flags in `pip install` (#1180)
## Summary

If the user runs a command like `pip install requirements.txt`, we now
prompt them to ask if they meant to include the `-r` flag:

![Screenshot 2024-01-29 at 8 38
29 PM](https://github.com/astral-sh/puffin/assets/1309177/82b9f7a2-2526-4144-b200-a5015e5b8a4b)

![Screenshot 2024-01-29 at 8 38
33 PM](https://github.com/astral-sh/puffin/assets/1309177/bd8ebb51-2537-4540-a0e0-718e66a1c69c)

The specific logic is: if the requirement ends in `.txt` or `.in`, and
the file exists locally, prompt the user for `-r`. If the requirement
contains a directory separator, and the directory exists locally, prompt
the user for `-e`.

Closes #1166.
2024-01-30 18:58:45 +00:00
Charlie Marsh 7a937e0f60
Error when parsing `requirements.txt`-like packages in `requirements.txt` file (#1179)
## Summary

Like https://github.com/astral-sh/puffin/pull/1180, this PR adds logic
for `requirements.txt` parsing whereby if a requirement _looks like_ a
local requirements file or an editable directory, we prompt the user to
correct the error (typically, by adding `-r`).
2024-01-30 18:55:11 +00:00
konsti 4ad0dc8b9e
Add windows aarch64 trampolines (#1190)
Lacking windows compatible aarch64 hardware, i cross compiled the
trampoline from x86_64 linux to aarch64-pc-windows-msvc; I added the
instructions to the puffin-trampoline readme. With some testing on an
aarch64 windows machine, this should be sufficient to build working
win_arm64 tagged wheels.

i686-pc-windows-msvc is failing with an error:

```
error: linking with `lld-link` failed: exit status: 1
  = note: lld-link: error: undefined symbol: __aulldiv
          >>> referenced by libcompiler_builtins-2fb09dee087e9f64.rlib(compiler_builtins-2fb09dee087e9f64.compiler_builtins.597f0152646f1b8-cgu.0.rcgu.o):(compiler_builtins::int::specialized_div_rem::u128_div_rem::h06aed1e23a3f8f5c)
          >>> referenced by libcompiler_builtins-2fb09dee087e9f64.rlib(compiler_builtins-2fb09dee087e9f64.compiler_builtins.597f0152646f1b8-cgu.0.rcgu.o):(compiler_builtins::int::specialized_div_rem::u128_div_rem::h06aed1e23a3f8f5c)
          >>> referenced by libcompiler_builtins-2fb09dee087e9f64.rlib(compiler_builtins-2fb09dee087e9f64.compiler_builtins.597f0152646f1b8-cgu.0.rcgu.o):(compiler_builtins::int::specialized_div_rem::u128_div_rem::h06aed1e23a3f8f5c)
          >>> referenced 4 more times

          lld-link: error: undefined symbol: __aullrem
          >>> referenced by libcompiler_builtins-2fb09dee087e9f64.rlib(compiler_builtins-2fb09dee087e9f64.compiler_builtins.597f0152646f1b8-cgu.0.rcgu.o):(compiler_builtins::int::specialized_div_rem::u128_div_rem::h06aed1e23a3f8f5c)
          >>> referenced by libcompiler_builtins-2fb09dee087e9f64.rlib(compiler_builtins-2fb09dee087e9f64.compiler_builtins.597f0152646f1b8-cgu.0.rcgu.o):(compiler_builtins::int::specialized_div_rem::u128_div_rem::h06aed1e23a3f8f5c)
          >>> referenced by libcompiler_builtins-2fb09dee087e9f64.rlib(compiler_builtins-2fb09dee087e9f64.compiler_builtins.597f0152646f1b8-cgu.0.rcgu.o):(compiler_builtins::int::specialized_div_rem::u128_div_rem::h06aed1e23a3f8f5c)
          >>> referenced 4 more times
```
2024-01-30 17:51:27 +00:00
Charlie Marsh c479c26cab
Add compatibility arguments for `pip sync` (#1185)
## Summary

As with `pip compile`, we can provide useful error messages and warnings
when people pass `pip sync` arguments.

Closes https://github.com/astral-sh/puffin/issues/1184.
2024-01-30 08:48:55 -05:00
konsti ab27913f68
Instrument the main function and add jupyter.in (#1186)
Instrument the main function as anchor span for checking overhead and
update tracing-durations-export to 0.2.0 for differentiating
blocking/non-blocking tasks.

Add a `jupyter.in` requirement since `pip install jupyter` is a common
operation. I tried `jupyterlab` too but there is no difference in
performance (1.00 ± 0.07).
2024-01-30 11:03:24 +00:00
konsti a6c4cbfe55
Cleanup puffin interpreter errors (#1169)
Use `virtualenv` consistently, remove unused error variants and hint the
user towards installing missing python versions.

I didn't touch the Readme but i replaced `virtualenv environment` with
`virtualenv` in the strings i found.

Fixes https://github.com/astral-sh/puffin/issues/1167
2024-01-30 10:52:46 +01:00
Charlie Marsh bd934207e4
Accept relative file paths in CLI requirements (#1182)
## Summary

See: https://github.com/astral-sh/puffin/issues/1181.

## Test Plan

```
❯ cargo run -- pip install packse@../../zanieb/packse
    Finished dev [unoptimized + debuginfo] target(s) in 0.15s
     Running `target/debug/puffin pip install 'packse@../../zanieb/packse'`
error: Distribution not found at: file:///Users/crmarsh/zanieb/packse
```
2024-01-30 03:31:24 +00:00
konsti d4ed5ea858
Fix the `compile_python_37` test with python 3.7 installed (#1172)
Make the test `compile_python_37` pass whether python 3.7 is installed
or not by muting the warning for a missing 3.7. The resolution error is
independent of whether 3.7 is installed or not.
2024-01-29 18:59:28 +01:00
Charlie Marsh 67a09649f2
Support parsing `--find-links`, `--index-url`, and `--extra-index-url` in `requirements.txt` (#1146)
## Summary

This PR adds support for `--find-links`, `--index-url`, and
`--extra-index-url` arguments when specified in a `requirements.txt`.

It's a mostly-straightforward change. The only uncertain piece is what
to do when multiple files include these flags, and/or when we include
them on the CLI and in other files.

In general:

- If _anything_ specifies `--no-index`, we respect it.
- We combine all `--extra-index-url` and `--find-links` across all
sources, since those are just vectors.
- If we see multiple `--index-url` in requirements files, we error.
- We respect the `--index-url` from the command line over any provided
in a requirements file.

(`pip-compile` seems to just pick one semi-arbitrarily when multiple are
provided.)

Closes https://github.com/astral-sh/puffin/issues/1143.
2024-01-29 15:06:40 +00:00
Charlie Marsh 4b9daf9604
Use tokio_tar instead of async_tar (#1170)
## Summary

`tokio_tar` is a fork of `async_tar` that uses Tokio instead of
`async-std`. Using it removes a significant dependency from our tree.

(There is an open PR
(https://github.com/dignifiedquire/async-tar/pull/41) in `async-tar` to
add Tokio support, but it's over a year old.)

See:
https://github.com/astral-sh/puffin/pull/1157#discussion_r1469190249.
2024-01-29 10:00:30 -05:00
Andrew Gallant a42b385e9b
puffin-client: add SimpleMetadataRaw (#1150)
This adds what is effectively an owned wrapper around
`Archived<SimpleMetadata>`. Normally, an `Archived<SimpleMetadata>`
has to be used behind a pointer (since it has a lifetime
attached to its underlying byte buffer), but we create a
wrapper around it that owns the underlying buffer and provides
free access to the archived type.

This in effect creates an anchor point for the archived type
and lets us pass it around easily. (There has to be an anchor
point for it somewhere.)

An alternative to this approach would be to store it as a file
backed memory map. But in practice, we're dealing with small
files, and just reading them on to the heap is likely to be
faster. (Memory maps also have wildly different perf characteristics
across platforms.)

Note that this commit just defines the type. It isn't actually
used anywhere yet.
2024-01-29 09:37:06 -05:00
konsti be48200642
Small instrumentation improvements (#1164)
Less verbose span fields for `Dist`s by using the display impl and no
more min length in the tracing durations plot config for comparability
(we lose spans due to a speedup otherwise). Both wait points in the
solver loop are now instrumented so we can inspect what we're waiting
for to progress in the solver.
2024-01-29 10:55:19 +00:00
konsti 8bfc3c1b37
Trim `get_cached_with_callback` and `send_cached` down some more. (#1128)
I noticed that `get_cached_with_callback` and `send_cached` are large
both in terms of llvm lines and in terms of types (and large types can
cause buffer overflows on windows). `get_cached_with_callback`
specifically is large because it's monomorphized for each callback. I've
split both functions into smaller units and boxed the callback.

llvm lines, before:

```
  Lines                 Copies               Function name
  -----                 ------               -------------
  909511                21625                (TOTAL)
   36026 (4.0%,  4.0%)     33 (0.2%,  0.2%)  <&mut rmp_serde::decode::Deserializer<R,C> as serde:🇩🇪:Deserializer>::deserialize_any
   14688 (1.6%,  5.6%)      8 (0.0%,  0.2%)  puffin_client::cached_client::CachedClient::get_cached_with_callback::{{closure}}::{{closure}}
   13748 (1.5%,  7.1%)      5 (0.0%,  0.2%)  puffin_client::cached_client::CachedClient::send_cached::{{closure}}
   12460 (1.4%,  8.5%)     35 (0.2%,  0.4%)  alloc::raw_vec::RawVec<T,A>::grow_amortized
   10731 (1.2%,  9.6%)    122 (0.6%,  0.9%)  <alloc::boxed::Box<T,A> as core::ops::drop::Drop>::drop
    8952 (1.0%, 10.6%)      9 (0.0%,  1.0%)  core::slice::sort::partition_in_blocks
    8216 (0.9%, 11.5%)    323 (1.5%,  2.5%)  <core::result::Result<T,E> as core::ops::try_trait::Try>::branch
    7745 (0.9%, 12.4%)    205 (0.9%,  3.4%)  core::result::Result<T,E>::map_err
    6862 (0.8%, 13.1%)     54 (0.2%,  3.7%)  <alloc::vec::Vec<T> as alloc::vec::spec_from_iter_nested::SpecFromIterNested<T,I>>::from_iter
    6720 (0.7%, 13.9%)    133 (0.6%,  4.3%)  std::panicking::try
    6600 (0.7%, 14.6%)     45 (0.2%,  4.5%)  <alloc::sync::Weak<T,A> as core::ops::drop::Drop>::drop
    5899 (0.6%, 15.2%)     33 (0.2%,  4.6%)  rmp_serde::decode::Deserializer<R,C>::read_str_data
    5610 (0.6%, 15.9%)     33 (0.2%,  4.8%)  alloc::raw_vec::RawVec<T,A>::allocate_in
    5187 (0.6%, 16.4%)    133 (0.6%,  5.4%)  std::panicking::try::do_catch
    4740 (0.5%, 17.0%)    268 (1.2%,  6.7%)  core::ops::function::FnOnce::call_once
    4670 (0.5%, 17.5%)     40 (0.2%,  6.8%)  puffin_client::cached_client::CachedClient::get_cached_with_callback::{{closure}}::{{closure}}::{{closure}}
    4527 (0.5%, 18.0%)     54 (0.2%,  7.1%)  core::iter::traits::iterator::Iterator::try_fold
```

after:

```
  Lines                 Copies               Function name
  -----                 ------               -------------
  910275                21712                (TOTAL)
   36026 (4.0%,  4.0%)     33 (0.2%,  0.2%)  <&mut rmp_serde::decode::Deserializer<R,C> as serde:🇩🇪:Deserializer>::deserialize_any
   12460 (1.4%,  5.3%)     35 (0.2%,  0.3%)  alloc::raw_vec::RawVec<T,A>::grow_amortized
   10935 (1.2%,  6.5%)    124 (0.6%,  0.9%)  <alloc::boxed::Box<T,A> as core::ops::drop::Drop>::drop
    8952 (1.0%,  7.5%)      9 (0.0%,  0.9%)  core::slice::sort::partition_in_blocks
    8714 (1.0%,  8.5%)      5 (0.0%,  0.9%)  puffin_client::cached_client::CachedClient::send_cached_handle_stale::{{closure}}
    8216 (0.9%,  9.4%)    323 (1.5%,  2.4%)  <core::result::Result<T,E> as core::ops::try_trait::Try>::branch
    8192 (0.9%, 10.3%)      8 (0.0%,  2.5%)  puffin_client::cached_client::CachedClient::get_cached_with_callback::{{closure}}::{{closure}}
    7745 (0.9%, 11.1%)    205 (0.9%,  3.4%)  core::result::Result<T,E>::map_err
    6862 (0.8%, 11.9%)     54 (0.2%,  3.7%)  <alloc::vec::Vec<T> as alloc::vec::spec_from_iter_nested::SpecFromIterNested<T,I>>::from_iter
    6778 (0.7%, 12.6%)      5 (0.0%,  3.7%)  puffin_client::cached_client::CachedClient::send_cached::{{closure}}
    6720 (0.7%, 13.4%)    133 (0.6%,  4.3%)  std::panicking::try
    6600 (0.7%, 14.1%)     45 (0.2%,  4.5%)  <alloc::sync::Weak<T,A> as core::ops::drop::Drop>::drop
    5899 (0.6%, 14.7%)     33 (0.2%,  4.7%)  rmp_serde::decode::Deserializer<R,C>::read_str_data
    5610 (0.6%, 15.3%)     33 (0.2%,  4.8%)  alloc::raw_vec::RawVec<T,A>::allocate_in
    5187 (0.6%, 15.9%)    133 (0.6%,  5.4%)  std::panicking::try::do_catch
    4740 (0.5%, 16.4%)    268 (1.2%,  6.7%)  core::ops::function::FnOnce::call_once
    4527 (0.5%, 16.9%)     54 (0.2%,  6.9%)  core::iter::traits::iterator::Iterator::try_fold
```

Stack sizes diff:
https://gist.github.com/konstin/a3f38276aacf1170038a756c8c49793c
2024-01-29 08:31:27 +00:00
Charlie Marsh fa3f0d7a55
Remove cache `purge` methods to `clean` (#1159)
This is more consistent with the public interface.
2024-01-28 21:15:11 -05:00
Charlie Marsh d88ce76979
Stream unpacking of source distribution downloads (#1157)
This PR migrates our source distribution downloads to unzip as we
stream, similar to our approach for wheels.

In my testing, this showed a consistent speedup (e.g., 6% here for a few
representative source distributions):

```text
❯ python -m scripts.bench --puffin-path ./target/release/main --puffin-path ./target/release/puffin --benchmark install-cold requirements.in
Benchmark 1: ./target/release/main (install-cold)
  Time (mean ± σ):      1.503 s ±  0.039 s    [User: 1.479 s, System: 0.537 s]
  Range (min … max):    1.466 s …  1.605 s    10 runs

Benchmark 2: ./target/release/puffin (install-cold)
  Time (mean ± σ):      1.421 s ±  0.024 s    [User: 1.505 s, System: 0.593 s]
  Range (min … max):    1.381 s …  1.454 s    10 runs

Summary
  './target/release/puffin (install-cold)' ran
    1.06 ± 0.03 times faster than './target/release/main (install-cold)'
```
2024-01-28 20:09:24 -05:00
Andrew Gallant 5219d37250
add initial rkyv support (#1135)
This PR adds initial support for [rkyv] to puffin. In particular,
the main aim here is to make puffin-client's `SimpleMetadata` type
possible to deserialize from a `&[u8]` without doing any copies. This
PR **stops short of actuallying doing that zero-copy deserialization**.
Instead, this PR is about adding the necessary trait impls to a variety
of types, along with a smattering of small refactorings to make rkyv
possible to use.

For those unfamiliar, rkyv works via the interplay of three traits:
`Archive`, `Serialize` and `Deserialize`. The usual flow of things is
this:

* Make a type `T` implement `Archive`, `Serialize` and `Deserialize`.
rkyv
helpfully provides `derive` macros to make this pretty painless in most
  cases.
* The process of implementing `Archive` for `T` *usually* creates an
entirely
new distinct type within the same namespace. One can refer to this type
without naming it explicitly via `Archived<T>` (where `Archived` is a
clever
  type alias defined by rkyv).
* Serialization happens from `T` to (conceptually) a `Vec<u8>`. The
serialization format is specifically designed to reflect the in-memory
layout
  of `Archived<T>`. Notably, *not* `T`. But `Archived<T>`.
* One can then get an `Archived<T>` with no copying (albeit, we will
likely
need to incur some cost for validation) from the previously created
`&[u8]`.
This is quite literally [implemented as a pointer cast][rkyv-ptr-cast].
* The problem with an `Archived<T>` is that it isn't your `T`. It's
something
  else. And while there is limited interoperability between a `T` and an
`Archived<T>`, the main issue is that the surrounding code generally
demands
a `T` and not an `Archived<T>`. **This is at the heart of the tension
for
  introducing zero-copy deserialization, and this is mostly an intrinsic
problem to the technique and not an rkyv-specific issue.** For this
reason,
  given an `Archived<T>`, one can get a `T` back via an explicit
deserialization step. This step is like any other kind of
deserialization,
although generally faster since no real "parsing" is required. But it
will
  allocate and create all necessary objects.

This PR largely proceeds by deriving the three aforementioned traits
for `SimpleMetadata`. And, of course, all of its type dependencies. But
we stop there for now.

The main issue with carrying this work forward so that rkyv is actually
used to deserialize a `SimpleMetadata` is figuring out how to deal
with `DataWithCachePolicy` inside of the cached client. Ideally, this
type would itself have rkyv support, but adding it is difficult. The
main difficulty lay in the fact that its `CachePolicy` type is opaque,
not easily constructable and is internally the tip of the iceberg of
a rat's nest of types found in more crates such as `http`. While one
"dumb"-but-annoying approach would be to fork both of those crates
and add rkyv trait impls to all necessary types, it is my belief that
this is the wrong approach. What we'd *like* to do is not just use
rkyv to deserialize a `DataWithCachePolicy`, but we'd actually like to
get an `Archived<DataWithCachePolicy>` and make actual decisions used
the archived type directly. Doing that will require some work to make
`Archived<DataWithCachePolicy>` directly useful.

My suspicion is that, after doing the above, we may want to mush
forward with a similar approach for `SimpleMetadata`. That is, we want
`Archived<SimpleMetadata>` to be as useful as possible. But right
now, the structure of the code demands an eager conversion (and thus
deserialization) into a `SimpleMetadata` and then into a `VersionMap`.
Getting rid of that eagerness is, I think, the next step after dealing
with `DataWithCachePolicy` to unlock bigger wins here.

There are many commits in this PR, but most are tiny. I still encourage
review to happen commit-by-commit.

[rkyv]: https://rkyv.org/
[rkyv-ptr-cast]:
https://docs.rs/rkyv/latest/src/rkyv/util/mod.rs.html#63-68
2024-01-28 12:14:59 -05:00
Charlie Marsh a25a1f2958
Avoid re-creating directories in async unzip (#1155)
This PR extends the optimizations from #1154 to other unzip paths.
2024-01-28 14:30:38 +00:00
Charlie Marsh 3d10f344f3
Only include visited packages in error message derivation (#1144)
## Summary

This is my guess as to the source of the resolver flake, based on
information and extensive debugging from @zanieb. In short, if we rely
on `self.index.packages` as a source of truth during error reporting, we
open ourselves up to a source of non-determinism, because we fetch
package metadata asynchronously in the background while we solve -- so
packages _could_ be included in or excluded from the index depending on
the order in which those requests are returned.

So, instead, we now track the set of packages that _were_ visited by the
solver. Visiting a package _requires_ that we wait for its metadata to
be available. By limiting analysis to those packages that were visited
during solving, we are faithfully representing the state of the solver
at the time of failure.

Closes #863
2024-01-28 09:27:22 -05:00
Charlie Marsh 6f2c235d21
Avoid re-creating directories during unzip (#1154)
## Summary

We have this optimization in `wheel.rs`, in the installer, but it makes
a huge difference for zips with many small files:

```
Benchmarking file_reader/Django-5.0.1-py3-none-any.whl: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 74.2s, or reduce sample count to 10.
file_reader/Django-5.0.1-py3-none-any.whl
                        time:   [751.63 ms 757.78 ms 764.27 ms]
                        change: [-1.0290% +0.0841% +1.2289%] (p = 0.88 > 0.05)
                        No change in performance detected.
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high mild

Benchmarking buffered_reader/Django-5.0.1-py3-none-any.whl: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 53.4s, or reduce sample count to 10.
buffered_reader/Django-5.0.1-py3-none-any.whl
                        time:   [529.86 ms 536.44 ms 543.35 ms]
                        change: [+0.0293% +1.5543% +3.1426%] (p = 0.05 > 0.05)
                        No change in performance detected.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild
```

That's almost 30% faster...
2024-01-28 00:07:54 -05:00
Charlie Marsh 888a9e6f53
Remove an unnecessary `Path` clone (#1153) 2024-01-28 03:16:51 +00:00
Charlie Marsh d243250dec
Avoid unnecessary permissions changes for copy paths (#1152)
In Rust, `fs::copy` automatically preserves permissions (see:
https://doc.rust-lang.org/std/fs/fn.copy.html).

Elsewhere, when copying from the zip archive out to the cache, we can
set permissions during file creation, rather than as a separate call.

Both of these should be slightly more efficient.
2024-01-27 22:11:55 -05:00
Charlie Marsh d6795da0ea
Set permissions after streaming unzip (#1151)
## Summary

When we migrated to an "unzip while we stream" solution, we lost the
logic to set permissions on the extracted files, so executables in
wheels were no longer executable. It turns out this is a little tricky,
since the permissions metadata is in the central directory at the _end_
of the zip file, and the async ZIP reader explicitly stops iteration
once it hits the central directory. (Specifically, it goes 4 bytes into
the central directory, since it sees the 4-byte signature header and
then stops.)

So, to solve that, I've added a `CentralDirectoryReader` that continues
where that iterator left off. This required forking the async zip crate:
https://github.com/charliermarsh/rs-async-zip/pull/1. It took a lot of
fiddling but I'm quite confident in the code now, especially since the
async zip crate validates the signature kind on every read.

The central directory is typically quite small (even for the Zig wheel,
which is enormous, it's just around 1MB), so I don't expect this to have
a high cost.

Closes https://github.com/astral-sh/puffin/issues/1148.
2024-01-27 19:22:44 -05:00
Charlie Marsh 15ca17a68d
Support relative `file:` paths for `--find-links` (#1147)
Just for consistency.
2024-01-27 03:48:25 +00:00
Charlie Marsh 4e19e6846d
Accept long form of pip arguments in `requirements.txt` (#1145) 2024-01-26 21:56:10 -05:00
Charlie Marsh addb94fbd6
Add support for emitting index URLs and --find-links (#1142)
Closes https://github.com/astral-sh/puffin/issues/1140.
2024-01-27 01:37:55 +00:00
Charlie Marsh a2ef2010d2
Add arguments for pip-compile compatibility (#1139)
## Summary

This ensures that we warn when redundant options are passed (like
`--allow-unsafe`, which is really common for forwards compatibility
since it's going to be the default in a future release), and errors when
known variants are passed that we _don't_ support (like
`--resolver=backtracking`).

Closes https://github.com/astral-sh/puffin/issues/1127.
2024-01-26 16:54:02 -05:00
Charlie Marsh 06024653f9
Reduce visibility of some methods in `wheel.rs` (#1125) 2024-01-26 16:34:51 -05:00
Zanie Blue 5cc4e5d31e
Add `pip compile` test where specific Python versions are available on the system (#1111)
Extends https://github.com/astral-sh/puffin/pull/1106 with the scenario
from https://github.com/zanieb/packse/pull/95 which tests that `pip
compile` will use the matching system Python version for builds when
available
2024-01-26 18:38:24 +00:00
Zanie Blue 91f421cf97
Do not allow `pip compile` scenario tests to discover other Python versions (#1106)
In https://github.com/astral-sh/puffin/pull/1040 we broke the pip
compile scenarios designed to test failure when a required Python
version is not available — resolution succeeded because all of the
Python versions were available in CI. Following #1105 we have the
ability to isolate tests from Python versions available in the system.
Here, we limit the scenarios to only the Python version in the current
environment, restoring our ability to test the error messages.

With https://github.com/zanieb/packse/pull/95, we will be able to
specify scenarios with access to additional system Python versions. This
will allow us to include test coverage where resolution can succeed by
using a version available elsewhere on the system. See #1111 for this
follow-up.
2024-01-26 18:18:15 +00:00
Zanie Blue 21577ad002
Add bootstrapping and isolation of development Python versions (#1105)
Replaces https://github.com/astral-sh/puffin/pull/1068 and #1070 which
were more complicated than I wanted.

- Introduces a `.python-versions` file which defines the Python versions
needed for development
- Adds a Bash script at `scripts/bootstrap/install` which installs the
required Python versions from `python-build-standalone` to `./bin`
- Checks in a `versions.json` file with metadata about available
versions on each platform and a `fetch-version` Python script derived
from `rye` for updating the versions
- Updates CI to use these Python builds instead of the `setup-python`
action
- Updates to the latest packse scenarios which require Python 3.8+
instead of 3.7+ since we cannot use 3.7 anymore and includes new test
coverage of patch Python version requests
- Adds a `PUFFIN_PYTHON_PATH` variable to prevent lookup of system
Python versions for isolation during development

Tested on Linux (via CI) and macOS (locally) — presumably it will be a
bit more complicated to do proper Windows support.
2024-01-26 12:12:48 -06:00
Charlie Marsh cc0e211074
Avoid embedding launcher scripts on non-Windows (#1124)
Just to reduce binary size on all other platforms.
2024-01-26 17:19:05 +01:00
Charlie Marsh f946d46273
Avoid allocating a max-size buffer (#1123)
This seems potentially-dangerous with no upside.
2024-01-26 14:27:19 +00:00
konsti 39021263dd
Windows launchers using posy trampolines (#1092)
## Background

In virtual environments, we want to install python programs as console
commands, e.g. `black .` over `python -m black .`. They may be called
[entrypoints](https://packaging.python.org/en/latest/specifications/entry-points/)
or scripts. For entrypoints, we're given a module name and function to
call in that module.

On Unix, we generate a minimal python script launcher. Text files are
runnable on unix by adding a shebang at their top, e.g.

```python
#!/usr/bin/env python
```

will make the operating system run the file with the current python
interpreter. A venv launcher for black in `/home/ferris/colorize/.venv`
(module name: `black`, function to call: `patched_main`) would look like
this:

```python
#!/home/ferris/colorize/.venv/bin/python
# -*- coding: utf-8 -*-
import re
import sys
from black import patched_main
if __name__ == "__main__":
    sys.argv[0] = re.sub(r"(-script\.pyw|\.exe)?$", "", sys.argv[0])
    sys.exit(patched_main())
```

On windows, this doesn't work, we can only rely on launching `.exe`
files.

## Summary

We use posy's rust implementation of a trampoline, which is based on
distlib's c++ implementation. We pre-build a minimal exe and append the
launcher script as stored zip archive behind it. The exe will look for
the venv python interpreter next to it and use it to execute the
appended script.

The changes in this PR make the `black` entrypoint work:

```powershell
cargo run -- venv .venv
cargo run -q -- pip install black
.\.venv\Scripts\black --version
```

Integration with our existing tests will be done in follow-up PRs.

## Implementation and Details

I've vendored the posy trampoline crate. It is a formatted, renamed and
slightly changed for embedding version of
https://github.com/njsmith/posy/pull/28.

The posy launchers are smaller than the distlib launchers, 16K vs 106K
for black. Currently only `x86_64-pc-windows-msvc` is supported. The
crate requires a nightly compiler for its no-std binary size tricks.

On windows, an application can be launched with a console or without (to
create windows instead), which needs two different launchers. The gui
launcher will subsequently use `pythonw.exe` while the console launcher
uses `python.exe`.
2024-01-26 13:54:11 +00:00
konsti f1d3b08c12
Add missing version to pip sync test (#1121)
The test started failing due to a newer version on pypi.
2024-01-26 13:36:25 +00:00
Charlie Marsh 361a2039d2
Add `--no-annotate` and `--no-header` flags (#1117)
Closes #1107.
Closes #1108.
2024-01-26 12:14:18 +00:00
Charlie Marsh 7755f986c3
Support extras in editable requirements (#1113)
## Summary

This PR adds support for requirements like `-e .[d]`.

Closes #1091.
2024-01-26 12:07:51 +00:00
Charlie Marsh f593b65447
Remove refresh checks from the install plan (#1119)
## Summary

Rather than checking cache freshness in the install plan, it's a lot
simple to have the install plan _never_ return cached data when the
refresh policy is in place, and then rely on the distribution database
to check for freshness. The original implementation didn't support this,
since the distribution database was rebuilding things too often. Now, it
rarely rebuilds (it's much better about this), so it seems conceptually
much simpler to split up the responsibilities like this.
2024-01-25 22:48:16 -05:00
Charlie Marsh 50057cd5f2
Re-add Cargo's known hosts checking (#1118)
## Summary

This ensures that (like Cargo) we don't suffer from
https://github.com/advisories/GHSA-r5w3-xm58-jv6j, by way of checking
known hosts when fetching via `libgit2`.

The implementation is taken from Cargo itself, modified to remove all
configuration, since we don't yet support configuration for known hosts,
etc.

Closes #285.
2024-01-25 22:29:36 -05:00
Charlie Marsh 67b41427cc
Store source distribution directly in the cache (#1116)
I want to move towards using the archive bucket exclusively for wheels.
We never overwrite source distributions, so there's no need to symlink
them.
2024-01-25 20:52:31 -05:00
Charlie Marsh 77351c7874
Use snapshots for requirements.txt error tests (#1115)
## Summary

I find these too difficult to edit and maintain. This brings them closer
to the rest of our testing setups.
2024-01-25 20:35:52 -05:00
Charlie Marsh 57c116ee9a
Move Black editable to flit backend (#1114)
I ran into a bug in PDM that's making it impossible to use the Black
example for extras: https://github.com/pdm-project/pdm/issues/2591.

I've confirmed that Flit handles it correctly.
2024-01-25 19:54:54 -05:00
Zanie Blue 3a05ef5285
Add venv tests for missing Python versions (#1096)
These demonstrate some lackluster error messages.
2024-01-25 13:57:05 -06:00
Charlie Marsh f36c167982
Use a consolidated error for distribution failures (#1104)
## Summary

Use a single error type in `puffin_distribution`, rather than two
confusingly similar types between `DistributionDatabase` and the source
distribution module.

Also removes the `#[from]` for IO errors and replaces with explicit
wrapping, which is verbose but removes a bunch of incorrect error
messages.
2024-01-25 14:49:11 -05:00
Charlie Marsh 8ef819e07e
Remove `Option` wrapper from requirement extras (#1103)
There's no semantic difference between `None` and empty, so seems
simpler to represent this way.
2024-01-25 13:21:53 -05:00
Andrew Gallant 067acfe79e
puffin-client: rejigger error type (#1102)
This PR changes the error type to be boxed internally so that it uses
less size on the stack. This makes functions returning `Result<T,
Error>`, in particular, return something much smaller.

The specific thing that motivated this was Clippy lints firing when I
tried to refactor code in this crate.

I chose to achieve boxing by splitting the enum out into a separate
type, and then wiring up the necessary `From` impl to make error
conversions easy, and then making `Error` itself opaque. We could expose
the `Box`, but there isn't a ton of benefit in doing so because one
cannot pattern match through a `Box`.

This required using more explicit error conversions in several places.
And as a result, I was able to remove all `#[from]` attributes on
non-transparent error variants.
2024-01-25 13:13:21 -05:00
Charlie Marsh 3e86c80874
Set buffer size when unzipping (#1101)
The zip archive includes an uncompressed size header, which we can use
to preallocate.
2024-01-25 17:58:36 +00:00
Charlie Marsh e0902d7d5a
Make `puffin-fs` `tokio` dependency opt-in (#1100) 2024-01-25 12:47:46 -05:00
Charlie Marsh 5ad2e60561
Use `same-file` to detect interpreter shims (#1099)
Our existing detection doesn't work on Windows, because we canoncalize
the interpreter path but not `info.sys_executable`, so the former
includes the UNC prefix, etc. This is cross-platform and gets at the
intent of the check.
2024-01-25 12:27:49 -05:00
Charlie Marsh f4939e50a6
Remove UNC prefixes on Windows (#1086)
## Summary

This PR adds a `NormalizedDisplay` trait that we can use for user-facing
paths, to strip the UNC prefix on Windows.

On other platforms, the implementation is a no-op (vs. `Display`).

I audited all usages of `.display()`, and changed any that were
user-facing, either via `println!` or `eprintln!`, or by way of being
included in error messages. I did _not_ change uses that were only in
tests or only went to tracing.

Closes https://github.com/astral-sh/puffin/issues/1084.
2024-01-25 11:44:22 -05:00
konsti 035cd81ac8
Fix venv PATH on windows (#1095)
Windows uses `;` instead of `:` to separate `PATH` entries. This pull
request switches from manually using `:` to the `std::env` functions.
This fixes

```
puffin pip install -e scripts/editable-installs/maturin_editable
```

on windows.
2024-01-25 15:40:52 +00:00
Charlie Marsh 904db967af
Use junctions instead of symlinks on Windows (#1087)
## Summary

When we unzip wheels in the cache, we write the directories out to an
`archive-v0` bucket, and then symlink into that bucket from the
`wheels-v0` and `built-wheels-v0` buckets.

On Windows, symlinks are not well supported. Specifically, they need to
be explicitly enabled by the user. So, instead of symlinks, we now use
junctions, which are well-supported on Windows, and allow you to
(effectively) symlink a directory to another directory. This PR
implements said junction support, which gets the core installer working
on Windows.

In the past, we also used symlinks to implement another primitive: we
wanted to be able to replace a directory "atomically" (I put
"atomically" in quotes because I don't know if it's actually a
guaranteed atomic operation), in case someone was trying to use the
directory while we were replacing it (as opposed to deleting the
directory, then moving it into place).

On Windows, it doesn't appear to be possible to atomically replace a
junction. So instead, I'm using a new design, whereby the cache always
returns canonicalized paths. We know these canonicalized paths are
unique and won't be replaced, so they're safe for writers to rely on. In
general, when we write new data to the cache, we now return the
canonicalized path. When we read from the cache, and try to identify
(e.g.) the set of wheels available to us, we canonicalize the links
immediately and consider them non-existent if that operation fails.

Closes #1085.

---------

Co-authored-by: konstin <konstin@mailbox.org>
2024-01-25 10:06:38 +01:00
Charlie Marsh 036b7e5f43
Use `parse_headers` rather than parsing body (#1090)
Looking at the internals, this should make almost no difference in
performance, but anyway...
2024-01-25 09:41:21 +01:00
Zanie Blue ed1ac640b9
Consolidate `UnusableDependencies` into a generic `Unavailable` incompatibility (#1088)
Requires https://github.com/zanieb/pubgrub/pull/20

In short, `UnusableDependencies` can be generalized into `Unavailable`
which encompasses incompatibilities where a package range which is
unusable for some inherent reason as well as when its dependencies are
unusable. We can eventually use this to track more incompatibilities in
the solver. I made the reason string required because I can't see a case
where we should leave it out.

Additionally, this improves the display of conflicts in the root
requirements.
2024-01-24 22:10:44 -06:00
Zanie Blue 091f8e09ff
Use a cache directory for venv tests (#1089) 2024-01-24 22:09:37 -06:00
konsti ed6a1606b9
Use `which::which` instead of `which::which_global` (#1083)
`which::which_global` does not resolve relative paths, which we want to
support, while `which::which` does.
2024-01-24 18:35:57 -06:00
Charlie Marsh cedd2e0b3f
Use a buffered reader for wheel metadata (#1082)
## Summary

It turns out this is significantly faster when reading (e.g.) _just_ the
`METADATA` file from a zipped wheel.

I audited other `File::open` usages, and everything else seems to be
using a buffered reader already (directly, or in whatever third-party
crate it's passed to) _or_ is read immediately in full.

See the criterion benchmark:

```
file_reader/numpy-1.26.3-pp39-pypy39_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
                        time:   [6.9618 ms 6.9664 ms 6.9713 ms]
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high mild
file_reader/flask-3.0.1-py3-none-any.whl
                        time:   [237.50 µs 238.25 µs 239.13 µs]
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) high mild
  4 (4.00%) high severe

buffered_reader/numpy-1.26.3-pp39-pypy39_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
                        time:   [648.92 µs 653.85 µs 660.09 µs]
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe
buffered_reader/flask-3.0.1-py3-none-any.whl
                        time:   [39.578 µs 39.712 µs 39.869 µs]
Found 8 outliers among 100 measurements (8.00%)
  3 (3.00%) high mild
  5 (5.00%) high severe
```
2024-01-24 15:22:55 -05:00
Zanie Blue 0019fe71f6
Add warning when target version does not match build version (#1072)
Follow-up to https://github.com/astral-sh/puffin/pull/1040 adding a
user-facing warning when we cannot build with their requested version.

e.g.

```
❯ cargo run -- pip compile requirements.in --python-version 3.11.4 --no-build
Resolved 8 packages in 483ms
❯ cargo run -- pip compile requirements.in --python-version 3.11.4
warning: The requested Python version 3.11.4 is not available; 3.11.7 will be used to build dependencies instead.
Resolved 8 packages in 71ms
❯ cargo run -- pip compile requirements.in --python-version 3.11
Resolved 8 packages in 71ms
```
2024-01-24 13:42:19 -06:00
Charlie Marsh 738e8341e2
Use a consistent `Timestamp` struct (#1081)
## Summary

This PR uses `ctime` consistently on Unix as a more conservative
approach to change detection. It also ensures that our timestamp
abstraction is entirely internal, so we can change the representation
and logic easily across the codebase in the future.
2024-01-24 14:21:31 -05:00
Zanie Blue bdfabfb088
Fixup doc for `find_best` (#1079) 2024-01-24 12:55:01 -06:00
konsti 2e0ce70d13
Initial windows support (#940)
## Summary

First batch of changes for windows support. Notable changes:

* Fixes all compile errors and added windows specific paths.
* Working venv creation on windows, both from a base interpreter and
from a venv. This requires querying `stdlib` from the sysconfig paths to
find the launcher.
* Basic url/path conversion handling for windows.
* `if cfg!(...)` instead of `#[cfg()]`. This should make it easier to
keep everything compiling across platforms.

## Outlook

Test summary: 402 tests run: 299 passed (15 slow), 103 failed, 1 skipped

There are various reason for the remaining test failure:
* Windows-specific colorama and tzdata dependencies that change the
snapshot slightly. This is by far the biggest batch.
* Some url-path handling issues. I fixed some in the PR, some remain.
* Lack of the latest python patch versions for older pythons on my
machine, since there are no builds for windows and we need to register
them in the registry for them to be picked up for `py --list-paths` (CC
@zanieb RE #1070).
* Lack of entrypoint launchers.
* ... likely more
2024-01-24 18:27:49 +01:00
Zanie Blue ea4ab29bad
Prefer target Python version over current version for builds (#1040)
Extends #1029 
Closes https://github.com/astral-sh/puffin/issues/1038

Instead of always using the current Python version for builds when a
target version is provided, we will do our best to use a compatible
Python version for builds.

Removes behavior where Python versions without patch versions were
always assumed to be the latest known patch version (previously
discussed in https://github.com/astral-sh/puffin/pull/534). While this
was convenient for resolutions which include packages which require
minimum patch versions e.g. `requires-python=">=3.7.4"`, it conflicts
with the idea that the target Python version you provide is the
_minimum_ compatible version. Additionally, it complicates interpreter
lookup as we cannot tell if the user has asked for that specific patch
version or not.
2024-01-24 11:12:02 -06:00
Charlie Marsh 0519375bd6
Remove some unused dependencies (#1077) 2024-01-24 11:58:21 -05:00
Charlie Marsh afb571643f
Avoid unzipping local wheels when fresh (#1076)
Since the archive is a single file in this case, we can rely on the
modification timestamp to check for freshness.
2024-01-24 15:01:16 +00:00
konsti 411613a24e
No python prefix in packse scenarios (#1066)
In windows, `python3.9` and `python3.11` are not in `PATH`. Instead, we
should pass only the python version to `puffin venv -p` in packse
scenarios (#1039).
2024-01-24 11:22:48 +00:00
Charlie Marsh 63f3434b21
Use nanoid instead of uuid (#1074)
## Summary

Gives us equivalent randomness with ~half as many characters.
2024-01-24 05:05:14 +00:00
Andrew Gallant eebc2f340a
make some things guaranteed to be deterministic (#1065)
This PR replaces a few uses of hash maps/sets with btree maps/sets and
index maps/sets. This has the benefit of guaranteeing a deterministic
order of iteration.

I made these changes as part of looking into a flaky test.
Unfortunately, I'm not optimistic that anything here will actually fix
the flaky test, since I don't believe anything was actually dependent
on the order of iteration.
2024-01-23 20:30:33 -05:00
Charlie Marsh 1b3a3f4e80
Add `--refresh` behavior to the cache (#1057)
## Summary

This PR is an alternative approach to #949 which should be much safer.
As in #949, we add a `Refresh` policy to the cache. However, instead of
deleting entries from the cache the first time we read them, we now
check if the entry is sufficiently new (created after the start of the
command) if the refresh policy applies. If the entry is stale, then we
avoid reading it and continue onward, relying on the cache to
appropriately overwrite based on "new" data. (This relies on the
preceding PRs, which ensure the cache is append-only, and ensure that we
can atomically overwrite.)

Unfortunately, there are just a lot of paths through the cache, and
didn't data is handled with different policies, so I really had to go
through and consider the "right" behavior for each case. For example,
the HTTP requests can use `max-age=0, must-revalidate`. But for the
routes that are based on filesystem modification, we need to do
something slightly different.

Closes #945.
2024-01-23 18:30:26 -05:00
Charlie Marsh cf8b452414
Track HTTP caches for URL wheels (#1071)
## Summary

This PR ensures that we store HTTP caching information for wheels.
Previously, we only stored these for source distributions. This will be
helpful for refresh, since we can avoid re-downloading wheels that are
unchanged per HTTP caching semantics.

There should be zero performance hit here for warm installs, and only an
extremely small hit for cold installs (writing the HTTP cache data to
disk). The hyperfine benchmarks reflect this.
2024-01-23 17:31:42 -05:00
Charlie Marsh 09f5884f28
Avoid revalidating immutable HTTP responses (#1069)
## Summary

If you send a revalidation request to a resource that returns an
`immutable` directive, the server apparently returns a 200 instead of a
304? In other words, the server can ignore the revalidation request.
This PR adds handling on top of the HTTP cache semantics to respect
immutable resources, which is especially useful since all PyPI files are
immutable.
2024-01-23 16:22:21 -05:00
Charlie Marsh 5621c414cf
Use symlinks for directories entries in cache (#1037)
## Summary

One problem we have in the cache today is that we can't overwrite
entries atomically, because we store unzipped _directories_ in the cache
(which makes installation _much_ faster than storing zipped
directories). So, if you ignore the existing contents of the cache when
writing, you might run into an error, because you might attempt to write
a directory where a directory already exists.

This is especially annoying for cache refresh, because in order to
refresh the cache, we have to purge it (i.e., delete a bunch of stuff),
which is also highly unsafe if Puffin is running across multiple threads
or multiple processes.

The solution I'm proposing here is that whenever we persist a
_directory_ to the cache, we persist it to a special "archive" bucket.
Then, within the other buckets, directory entries are actually symlinks
into that "archive" bucket. With symlinks, we can atomically replace,
which means we can easily overwrite cache entries without having to
delete from the cache.

The main downside is that we'll now accumulate dangling entries in the
"archive" bucket, and so we'll need to implement some form of garbage
collection to ensure that we remove entries with no symlinks. Another
downside is that cache reads and writes will be a bit slower, since we
need to deal with creating and resolving these symlinks.

As an example... after this change, the cache entry for this unzipped
wheel is actually a symlink:

![Screenshot 2024-01-22 at 11 56
18 AM](https://github.com/astral-sh/puffin/assets/1309177/99ff6940-5096-4246-8d16-2a7bdcdd8d4b)

Then, within the archive directory, we actually have two unique entries
(since I intentionally ran the command twice to ensure overwrites were
safe):

![Screenshot 2024-01-22 at 11 56
22 AM](https://github.com/astral-sh/puffin/assets/1309177/717d04e2-25d9-4225-b190-bad1441868c6)
2024-01-23 19:52:37 +00:00
Charlie Marsh 556080225d
Use ctime for interpreter timestamps (#1067)
Per https://apenwarr.ca/log/20181113, `ctime` should be a lot more
conservative, and should detect things like the issue we see with the
python-build-standalone builds, where the `mtime` is identical across
builds.

On Windows, I'm just using `last_write_time`. But we should probably add
`volume_serial_number` and other attributes via
[`winapi_util`](https://docs.rs/winapi-util/latest/winapi_util/index.html).
2024-01-23 19:52:20 +00:00
Charlie Marsh 6561617c56
Store source distribution builds under a unique manifest ID (#1051)
## Summary

This is a refactor of the source distribution cache that again aims to
make the cache purely additive. Instead of deleting all built wheels
when the cache gets invalidated (e.g., because the source distribution
changed on PyPI or something), we now treat each invalidation as its own
cache directory. The manifest inside of the source distribution
directory now becomes a pointer to the "latest" version of the source
distribution cache.

Here's a visual example:

![Screenshot 2024-01-22 at 5 35
41 PM](https://github.com/astral-sh/puffin/assets/1309177/ca103c83-e116-4956-b91c-8434fe62cffe)

With this change, we avoid deleting built distributions that might be
relied on elsewhere and maintain our invariant that the cache is purely
additive. The cost is that we now preserve stale wheels, but we should
add a garbage collection mechanism to deal with that.
2024-01-23 19:49:11 +00:00
Charlie Marsh e32027e384
Avoid persisting manifest data in standalone file (#1044)
## Summary

This PR gets rid of the manifest that we store for source distributions.
Historically, that manifest included the source distribution metadata,
plus a list of built wheels.

The problem with the manifest is that it duplicates state, since we now
have to look at both the manifest and the filesystem to understand the
cache state. Instead, I think we should treat the cache as the source of
truth, and get rid of the duplicated state in the manifest.

Now, we store the manifest (which is merely used to check for cache
freshness -- in future PRs, I will repurpose it though, so I left it
around), then the distribution metadata as its own file, then any
distributions in the same directory. When we want to see if there are
any valid distributions, we `readdir` on the directory. This is also
much more consistent with how the install plan works.
2024-01-23 19:46:48 +00:00
Zanie Blue 1f0a21d127
Write an `Into<anstream::ColorChoice>` implementation for more idiomatic code (#1064)
Follow-up to #1049
2024-01-23 15:43:16 +00:00
konsti 1131341cbc
Support more formats in `puffin venv`, incl. windows support (#1039)
Mirroring `virtualenv -p` and driven by the lack of `pythonx.y` in
`PATH` on windows, this PR adds `-p x.y` support to `puffin venv` (first
commit).

Supported formats:
* NEW: `-p 3.10` searches for an installed Python 3.10 (Looking for
`python3.10` on linux/mac).
  Specifying a patch version is not supported
* `-p python3.10` or `-p python.exe` looks for a binary in `PATH`
* `-p /home/ferris/.local/bin/python3.10` uses this exact Python

In the second commit, we add python interpreter search on windows using
`py --list-paths`. On windows, all python are called `python.exe` so the
unix trick of looking for `python{}.{}` in `PATH` doesn't work. Instead,
we ask the python launcher for windows to tell us about all installed
packages. We should eventually migrate this to [PEP
514](https://peps.python.org/pep-0514/) by reading the registry entries
ourselves.
2024-01-23 15:35:07 +00:00
Charlie Marsh cb04fa4496
Hide `--exclude-newer` from the command line (#1058)
This exists for our own test suite.
2024-01-23 00:29:47 -05:00
Zanie Blue 5db81c7caa
Add `--color always|never|auto` interface (#1049)
Extends #1048 interface providing a more general interface that I think
should be standard.

Allows forcing colors to be on _or_ off. e.g. `NO_COLOR=1 pip install
pip-tools --color always` would be colored.

Hides the `--no-color` option as it only exists for compatibility (and
seems better than throwing an error when people assume it will exist).

Has a nice side-effect of documenting our coloring behaviors e.g.

```
--color <COLOR>
    Control colors in output
    
    [default: auto]

    Possible values:
    - auto:   Enables colored output only when the output is going to a terminal or TTY with support
    - always: Enables colored output regardless of the detected environment
    - never:  Disables colored output
```
2024-01-22 23:01:36 -06:00
Zanie Blue a9a7b0069b
Add `--force-reinstall` alias for `--reinstall` to match pip interface (#1045)
Tested with `cargo run -- pip install pip-tools --force-reinstall`.

The alias is hidden.
2024-01-22 22:59:43 -06:00
Zanie Blue a87e071b5e
Add `--no-color` support for `pip` compatibility (#1048)
Adds `--no-color` as provided by `pip`.

See #1049 for follow-up.
2024-01-22 22:56:51 -06:00
Charlie Marsh 81401a17e5
Use `archive_mtime` in another call site (#1056)
_Not_ using this was an oversight.
2024-01-23 04:51:18 +00:00
Charlie Marsh 9fd3b8298d
Use `fs_err::tokio` consistently in distribution database (#1055) 2024-01-22 19:14:29 -05:00
Zanie Blue f3562e5a25
Canonicalize paths to interpreter executables before checking modified time (#1046)
If the executable is a symbolic link, checking the modified time will
not reflect changes to the source file e.g.

```
❯ touch foo
❯ ln -s foo foobar
❯ gstat -c %Y foo
1705958431
❯ gstat -c %Y foobar
1705958438
❯ touch foo
❯ gstat -c %Y foobar
1705958438
```

This can result in a stale cache being treated as fresh; for example,
when Rye changes the interpreter linked in a virtual environment.
2024-01-22 15:44:22 -06:00
Zanie Blue c06bc335c4
Fix failing test cases (#1047)
These tests from #1041 failing on `main`
https://github.com/astral-sh/puffin/actions/runs/7616995716/job/20745019216
due to conflict with #1042
2024-01-22 21:31:12 +00:00
Zanie Blue a2efd74209
Add complex Python requirement scenarios (#1041)
Follows #1011 with some more scenarios
2024-01-22 14:31:06 -06:00
Charlie Marsh c8941d4799
Rename metadata.msgpack to manifest.msgpack (#1043)
We store the `Manifest` at this path, so this name feels more
appropriate.
2024-01-22 15:00:41 -05:00
Zanie Blue 89eb8547ce
Fix missing comma before conclusions (#1042)
Closes https://github.com/astral-sh/puffin/issues/1010
2024-01-22 13:31:09 -06:00
Zanie Blue e21948f353
Improve display of Python versions (#1029)
In https://github.com/astral-sh/puffin/pull/986 there was some confusion
about what these values are set to and I noticed that we never actually
display the target version being used for a resolution.

- Consistently display the Python interpreter being used, i.e. make it
clear that we are referring the the interpreter/installed Python version
and always show the version number
- Display the target Python version during solving
2024-01-22 18:46:18 +00:00
Charlie Marsh e6f5c8360c
Use a separate memory index for each requirement (#1036)
Closes #1005.
2024-01-22 16:22:03 +00:00
Charlie Marsh b0e73d796c
Add support for PyPy wheels (#1028)
## Summary

This PR adds support for PyPy wheels by changing the compatible tags
based on the implementation name and version of the current interpreter.

For now, we only support CPython and PyPy, and explicitly error out when
given other interpreters. (Is this right? Should we just fallback to
CPython tags...? Or skip the ABI-specific tags for unknown
interpreters?)

The logic is based on
4d85340613/src/packaging/tags.py (L247).
Note, however, that `packaging` uses the `EXT_SUFFIX` variable from
`sysconfig`... Instead, I looked at the way that PyPy formats the tags,
and recreated them based on the Python and implementation version. For
example, PyPy wheels look like
`cchardet-2.1.7-pp37-pypy37_pp73-win_amd64.whl` -- so that's `pp37` for
PyPy with Python version 3.7, and then `pypy37_pp73` for PyPy with
Python version 3.7 and PyPy version 7.3.

Closes https://github.com/astral-sh/puffin/issues/1013.

## Test Plan

I tested this manually, but I couldn't find macOS universal PyPy
wheels... So instead I added `cchardet` to a `requirements.in`, ran
`cargo run pip sync requirements.in --index-url
https://pypy.kmtea.eu/simple --verbose`, and added logging to verify
that the platform tags matched (even if the architecture didn't).
2024-01-22 14:22:27 +00:00
Charlie Marsh 145ba0e5ab
Allow relative paths in requirements.txt (#1027)
This PR attempts to fix a common footgun in `requirements.txt` files.
Previously, to provide a file, you had to use `package_name @
file:///Users/crmarsh/...` -- in other words, an absolute path.

Now, these requirements follow the exact same rules as editables, so you
can do:
```
package_name @ ./file.zip
```

And similar.

The way the parsing is setup, this is intentionally _not_ supported when
reading metadata -- only when parsing `requirements.txt` directly.

Closes #984.
2024-01-22 14:20:30 +00:00
Charlie Marsh e09a51653e
Propagate cancellation errors in `OnceMap` (#1032)
## Summary

Ensures that if an operation is cancelled in one thread, we propagate it
to others rather than panicking.

Related to https://github.com/astral-sh/puffin/issues/1005.
2024-01-22 09:00:21 -05:00
Charlie Marsh db0c76c4ba
Improve `requirements-txt` error formatting (#1026)
- Wrap filename in quotes
- Only show the start position (I think the end is a bit noisy)
2024-01-22 13:42:17 +00:00
konsti 765e3175e1
Make windows compile (#1035)
Minimal changes to make `cargo check`/`cargo run` work to unblock the
remaining PR stacking
2024-01-22 13:11:20 +00:00
Charlie Marsh b9bee013ce
Use full Python version for installed version (#1033)
## Summary

`interpreter.version()` returns the `python_full_version`, but the
marker variant uses `python_version` instead of `python_full_version` --
so it's omitting the patch.
2024-01-22 00:44:39 -06:00
Zanie Blue 6202c9e1b5
Use current and requested Python versions in `requires-python` incompatibility errors (#986)
Closes https://github.com/astral-sh/puffin/issues/806
2024-01-22 00:32:02 -06:00
Charlie Marsh 23f73592b1
Add test to avoid invalidating virtualenv (#1031)
## Summary

I think if we used symlinks (instead of hardlinks), this test would fail
-- so it's worth including.
2024-01-21 19:53:58 -05:00
Charlie Marsh 540442b8de
Treat missing package name error as an unsupported requirement (#1025)
## Summary

Based on user feedback. Calling it a "parse error" is misleading, since
this is really something we don't support, but that users can work
around.
2024-01-21 19:53:10 -05:00
Zanie Blue 4026710189
Add scenario tests for `pip-compile` (#1011)
e.g. for scenarios that test resolution _without_ installation.

This refactors the `update` script to generate scenario test files for
`pip compile` _and_ `pip install`. We don't overlap scenarios to save
time. We only generate `pip compile` test cases for scenarios we cannot
represent with `pip install` e.g. a `--python-version` override.

The _one_ scenario I added happened to reveal a bug in our resolver
where we were incorrectly filtering versions by the installed version
when wheels were available. Per the comment at
https://github.com/astral-sh/puffin/issues/883#issuecomment-1890773112,
we should _only_ need to check for a compatible installed Python version
when using a different _target_ Python version if we need to build a
source distribution.
53bce68400
resolves this by removing the excessive constraints — the correct Python
version incompatibilities are applied elsewhere.
2024-01-21 17:47:42 -06:00
Charlie Marsh d9cc9dbf88
Improve error message when editable requirement doesn't exist (#1024)
Making these a lot clearer in the common case by reducing the depth of
the error.
2024-01-20 12:59:18 -05:00
Charlie Marsh 69d2791a43
Remove URL clone in requirements-txt parser (#1020) 2024-01-19 17:30:17 -05:00
Charlie Marsh b3954f2449
Enable PowerPC builds (#1017)
Closes #1015.
2024-01-19 17:29:11 -05:00
Charlie Marsh 459c2abc81
Avoid canonicalizing paths in `requirements-txt` (#1019)
## Summary

When you specify an editable that doesn't exist, it should error, but
not in the parser -- the error should be downstream.
2024-01-19 16:28:04 -05:00
Charlie Marsh d55e34c310
Make editable URL parsing more robust (#1018)
This just generalizes the parsing to handle arbitrary schemes instead of
encoding a fixed list.
2024-01-19 16:01:33 -05:00
Charlie Marsh c66395977d
Rename `pep440-rs` to `Readme.md` (#1014)
This is due to a bug in Maturin
(https://github.com/PyO3/maturin/pull/1915), so I'll just fix our setup
to work with existing versions.

Closes https://github.com/astral-sh/puffin/issues/991.
2024-01-19 15:16:12 -05:00
Zanie Blue 33b35f7020
Add support for disabling installation from pre-built wheels (#956)
Adds support for disabling installation from pre-built wheels i.e. the
package must be built from source locally.
We will still always use pre-built wheels for metadata during
resolution.

Available via `--no-binary` and `--no-binary-package <name>` flags in
`pip install` and `pip sync`. There is no flag for `pip compile` since
no installation happens there.

```
--no-binary

    Don't install pre-built wheels.
    
    When enabled, all installed packages will be installed from a source distribution. 
    The resolver will still use pre-built wheels for metadata.


--no-binary-package <NO_BINARY_PACKAGE>

    Don't install pre-built wheels for a specific package.
    
    When enabled, the specified packages will be installed from a source distribution. 
    The resolver will still use pre-built wheels for metadata.
```

When packages are already installed, the `--no-binary` flag will have no
affect without the `--reinstall` flag. In the future, I'd like to change
this by tracking if a local distribution is from a pre-built wheel or a
locally-built wheel. However, this is significantly more complex and
different than `pip`'s behavior so deferring for now.

For reference, `pip`'s flag works as follows:

```
--no-binary <format_control>

    Do not use binary packages. Can be supplied multiple times, and each time adds to the
    existing value. Accepts either ":all:" to disable all binary packages, ":none:" to empty the
    set (notice the colons), or one or more package names with commas between them (no colons).
    Note that some packages are tricky to compile and may fail to install when this option is
    used on them.
```

Note we are not matching the exact `pip` interface here because it seems
complicated to use. I think we may want to consider adjusting our
interface for this behavior since we're not entirely compatible anyway
e.g. I think `--force-build` and `--force-build-package` are clearer
names. We could also consider matching the `pip` interface or only
allowing `--no-binary <package>` for compatibility. We can of course do
whatever we want in our _own_ install interfaces later.

Additionally, we may want to further consider the semantics of
`--no-binary`. For example, if I run `pip install pydantic --no-binary`
I expect _just_ Pydantic to be installed without binaries but by default
we will build all of Pydantic's dependencies too.

This work was prompted by #895, as it is much easier to measure
performance gains from building source distributions if we have a flag
to ensure we actually build source distributions. Additionally, this is
a flag I have used frequently in production to debug packages that ship
Cythonized wheels.
2024-01-19 11:24:27 -06:00
Zanie Blue 8b49d900bd
Refer to the user instead of "root" when mentioning direct dependencies (#982)
Closes https://github.com/astral-sh/puffin/issues/857
2024-01-19 11:17:42 -06:00
Zanie Blue ae7a2cddc2
Avoid showing negations of ranges in error messages (#981)
Closes https://github.com/astral-sh/puffin/issues/980
2024-01-19 11:07:14 -06:00
Zanie Blue 02ed195982
Improve simple no version messages using complement of range (#979)
Improves some of the "no versions of <package> are available" messages
by showing the complement or inversion of the package.

Does not address cases like

```
Because there are no versions of crow that satisfy any of:
    crow>1.0.0,<2.0.0a5
    crow>2.0.0a7,<2.0.0b1
    crow>2.0.0b1,<2.0.0b5
...
```

which are a bit more complicated; I'll focus on those cases in a
follow-up.
2024-01-19 16:48:20 +00:00
Zanie Blue 7bb4fda8af
Say "depend on" instead of "depends on" when proper in error messages (#968)
I would like to spend some additional time working on the package range
display abstractions, but maybe that is best done _after_ I've done a
good bit of fiddling with the error messages.

Addresses
https://github.com/astral-sh/puffin/pull/868#discussion_r1447593081
2024-01-19 16:08:17 +00:00
Zanie Blue 5fe3444e5a
Use more realistic names in scenario snapshots (#978)
This is helpful to make the error messages more realistic and the names
are indisputably cuter.
2024-01-19 10:01:34 -06:00
Charlie Marsh 5adb08a304
Allow relative paths and environment variables in all editable representations (#1000)
## Summary

I don't know if this is actually a good change, but it tries to make the
editable install experience more consistent. Specifically, we now
support...

```
# Use a relative path with a `file://` prefix.
# Prior to this PR, we supported `file:../foo`, but not `file://../foo`, which felt inconsistent.
-e file://../foo

# Use environment variables with paths, not just URLs.
# Prior to this PR, we supported `file://${PROJECT_ROOT}/../foo`, but not the below.
-e ${PROJECT_ROOT}/../foo
```

Importantly, `-e file://../foo` is actually not supported by pip... `-e
file:../foo` _is_ supported though. We support both, as of this PR. Open
to feedback.
2024-01-19 09:00:37 -05:00
konsti cd2fb6fd60
Box `PrioritizedDistribution` (#948)
On top of https://github.com/astral-sh/puffin/pull/947, we can also box
`PrioritizedDistribution`.

In a simple benchmark, this seems to slightly improve performance when
comparing only this commit to main, even though the benchmark is too
noisy to establish significance:

```
$ hyperfine --warmup 30 --runs 300 "target/profiling/main-dev resolve meine_stadt_transparent" "target/profiling/puffin-dev resolve meine_stadt_transparent"
  Benchmark 1: target/profiling/main-dev resolve meine_stadt_transparent
    Time (mean ± σ):      83.6 ms ±   2.0 ms    [User: 77.7 ms, System: 20.0 ms]
    Range (min … max):    81.4 ms …  98.2 ms    300 runs

    Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

  Benchmark 2: target/profiling/puffin-dev resolve meine_stadt_transparent
    Time (mean ± σ):      80.8 ms ±   2.2 ms    [User: 75.4 ms, System: 19.5 ms]
    Range (min … max):    78.6 ms …  98.6 ms    300 runs

    Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

  Summary
    target/profiling/puffin-dev resolve meine_stadt_transparent ran
      1.03 ± 0.04 times faster than target/profiling/main-dev resolve meine_stadt_transparent
```

The effect on type sizes however is considerable ([downstack
PR](https://gist.github.com/konstin/38e6c774db541db46d61f1d4ea6b498f)
vs. [this
PR](https://gist.github.com/konstin/003a77fe7d7d246b0d535e3fc843cb36)):

```patch
--- branch.txt  2024-01-17 14:26:01.826085176 +0100
+++ boxed-prioritized-dist.txt  2024-01-17 14:25:57.101900963 +0100
@@ -1,19 +1,3 @@
-9264 alloc::collections::btree::node::InternalNode<pep440_rs::version::Version, distribution_types::PrioritizedDistribution> align=8
-   9168 data
-     96 edges
-
-9264 alloc::collections::btree::node::InternalNode<pep440_rs::Version, distribution_types::PrioritizedDistribution> align=8
-   9168 data
-     96 edges
-
-9168 alloc::collections::btree::node::LeafNode<pep440_rs::version::Version, distribution_types::PrioritizedDistribution> align=8
-   9064 vals
-     88 keys
-
-9168 alloc::collections::btree::node::LeafNode<pep440_rs::Version, distribution_types::PrioritizedDistribution> align=8
-   9064 vals
-     88 keys
-
 8992 tokio::sync::mpsc::block::Block<hyper::client::dispatch::Envelope<http::request::Request<reqwest::async_impl::body::ImplStream>, http::response::Response<hyper::body::body::Body>>> align=8
    8960 values
      32 header
@@ -74,10 +58,23 @@
          40 __tracing_attr_span
      64 variant Unresumed, Returned, Panicked

+5648 {async fn body@crates/puffin-client/src/registry_client.rs:224:5: 224:30} align=8
+   5647 variant Suspend0
+       5576 __awaitee align=8
+         40 __tracing_attr_span
```
2024-01-19 10:44:41 +01:00
konsti 47fc90d1b3
Reduce stack usage by boxing `File` in `Dist`, `CachePolicy` and large futures (#1004)
This is https://github.com/astral-sh/puffin/pull/947 again but this time
merging into main instead of downstack, sorry for the noise.

---

Windows has a default stack size of 1MB, which makes puffin often fail
with stack overflows. The PR reduces stack size by three changes:

* Boxing `File` in `Dist`, reducing the size from 496 to 240.
* Boxing the largest futures.
* Boxing `CachePolicy`

## Method

Debugging happened on linux using
https://github.com/astral-sh/puffin/pull/941 to limit the stack size to
1MB. Used ran the command below.

```
RUSTFLAGS=-Zprint-type-sizes cargo +nightly build -p puffin-cli -j 1 > type-sizes.txt && top-type-sizes -w -s -h 10 < type-sizes.txt > sizes.txt
```

The main drawback is top-type-sizes not saying what the `__awaitee` is,
so it requires manually looking up with a future with matching size.

When the `brotli` features on `reqwest` is active, a lot of brotli types
show up. Toggling this feature however seems to have no effect. I assume
they are false positives since the `brotli` crate has elaborate control
about allocation. The sizes are therefore shown with the feature off.

## Results

The largest future goes from 12208B to 6416B, the largest type
(`PrioritizedDistribution`, see also #948) from 17448B to 9264B. Full
diff: https://gist.github.com/konstin/62635c0d12110a616a1b2bfcde21304f

For the second commit, i iteratively boxed the largest file until the
tests passed, then with an 800KB stack limit looked through the
backtrace of a failing test and added some more boxing.

Quick benchmarking showed no difference:

```console
$ hyperfine --warmup 2 "target/profiling/main-dev resolve meine_stadt_transparent" "target/profiling/puffin-dev resolve meine_stadt_transparent" 
Benchmark 1: target/profiling/main-dev resolve meine_stadt_transparent
  Time (mean ± σ):      49.2 ms ±   3.0 ms    [User: 39.8 ms, System: 24.0 ms]
  Range (min … max):    46.6 ms …  63.0 ms    55 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 2: target/profiling/puffin-dev resolve meine_stadt_transparent
  Time (mean ± σ):      47.4 ms ±   3.2 ms    [User: 41.3 ms, System: 20.6 ms]
  Range (min … max):    44.6 ms …  60.5 ms    62 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Summary
  target/profiling/puffin-dev resolve meine_stadt_transparent ran
    1.04 ± 0.09 times faster than target/profiling/main-dev resolve meine_stadt_transparent
```
2024-01-19 09:38:36 +00:00
konsti 66e651901e
Add an env var to artificially limit the stack size (#941)
By default, windows has a stack size limit of 1MB which we run against
in debug without any explicit culprit. A new environment variable
`PUFFIN_STACK_SIZE` allows setting an artificially smaller stack size.
2024-01-19 09:34:46 +00:00
Charlie Marsh 69c72b6fa1
Validate wheel metadata against filename (#1002)
Closes #983.
2024-01-19 05:48:55 +00:00
Charlie Marsh f86d9b1c31
Add tests for missing file errors (#1001) 2024-01-19 05:47:25 +00:00
Charlie Marsh c8285cb5ef
Bump version to v0.0.3 (#999) 2024-01-18 23:39:35 -05:00
Charlie Marsh 9b24fcd306
Remove verbatim URL from path file location (#998)
## Summary

I got confused by why `VerbatimUrl` was on `Path`. Since it's directly
computed from it, I think we should just compute it as-needed. I think
it's also possibly-buggy because the URL is the URL of the _directory_,
not the artifact itself, which differs from other distributions.
2024-01-18 22:40:48 -05:00
Charlie Marsh 732ef7adb7
Bump version to v0.0.2 (#987)
Bumping the version so that I can test the release process again
(including PyPI publish).
2024-01-18 20:56:09 -05:00
Charlie Marsh fe180804b5
Avoid encoding current version in test output (#988) 2024-01-19 01:50:23 +00:00
Charlie Marsh 3a1cd44fc6
Add Puffin Docker image (#985)
Missing piece for the release.

## Test Plan

Built the image locally:

```shell
❯ docker run 99956098e1f8f04e209dcfc4a0afcee67df1fe8a726c164884e67f035b1a0f42
Usage: puffin [OPTIONS] <COMMAND>

Commands:
  pip    Resolve and install Python packages
  venv   Create a virtual environment
  clean  Clear the cache
  help   Print this message or the help of the given subcommand(s)

Options:
  -q, --quiet                  Do not print any output
  -v, --verbose                Use verbose output
  -n, --no-cache               Avoid reading from or writing to the cache
      --cache-dir <CACHE_DIR>  Path to the cache directory [env: PUFFIN_CACHE_DIR=]
  -h, --help                   Print help
  -V, --version                Print version
```
2024-01-18 20:21:31 -05:00
Charlie Marsh 5e2b715366
Rename `puffin-cli` crate to `puffin` (#976)
## Summary

Like in Ruff, this simplifies a few things.
2024-01-18 19:02:52 -05:00
Charlie Marsh 6cad0f609c
Mark `puffin-dev` as `publish = false` (#975) 2024-01-18 17:20:44 -05:00
Charlie Marsh 8eadca4f8d
Remove unused path method (#974) 2024-01-18 21:59:12 +00:00
Charlie Marsh a262936366
Allow file:-relative paths in editable installs (#970)
Supports editable install via (e.g.) `puffin pip install -e file:.`,
which pip seems to support.

Closes #964.
2024-01-18 21:15:42 +00:00
Charlie Marsh f9154e8297
Add release workflow (#961)
## Summary

This PR adds a release workflow powered by `cargo-dist`. It's similar to
the version that's PR'd in Ruff
(https://github.com/astral-sh/ruff/pull/9559), with the exception that
it doesn't include the Docker build or the "update dependents" step for
pre-commit.
2024-01-18 15:44:11 -05:00
Charlie Marsh a883de4fb0
Enforce modification freshness checks against virtual environment (#959)
## Summary

This PR is like #957, but for validating the virtual environment, rather
than the cache. So, if you have a local wheel, and you rebuild it, we'll
now correctly uninstall and reinstall it in the virtual environment.
2024-01-18 20:21:16 +00:00
Charlie Marsh 96a61fb351
Remove RFC2047 decoder (#967)
## Summary

- This was inherited from
d719988323/src/metadata.rs (LL78C2-L91C26)
- ...which introduced this code here:
9cd1d43f7c
- ...with the originating issue here:
https://github.com/PyO3/maturin/issues/612
- ...and the upstream issue here:
https://github.com/staktrace/mailparse/issues/50

It seems like the goal was to support Unicode in certain header fields,
but I don't think this is necessary for us. We only use
`get_first_value` for `Requires-Python`, which has to be ASCII, doesn't
it?

In my testing, it seems like the `charset` hack can also be removed. The
tests I copied over actually work without it, which makes me a bit
skeptical.

The main benefit here is that we get to a remove a _big_ dependency
stack, including Chumsky and Stacker and psm which have limited
cross-platform support.
2024-01-18 15:09:45 -05:00
Charlie Marsh f17bad0a75
Mark path-based cache entries as stale during install plan (#957)
## Summary

This is a small correctness improvement that ensures that we avoid using
stale cache entries for local dependencies in the install plan. We
already have some logic like this in the source distribution builder,
but it didn't apply in the install plan, and so we'd end up using stale
wheels.

Specifically, now, if you create a new local wheel, and run `pip sync`,
we'll mark the cache entries as stale and make sure we unzip it and
install it. (If the wheel is _already_ installed, we won't reinstall it
though, which will be a separate change. This is just about reading from
the cache, not the environment.)
2024-01-18 19:13:29 +00:00
konsti a11744e438
Normalize base python in venv creation (#966)
Fixes #965

We have to canonicalize the interpreter path, otherwise the home is set
to the venv dir instead of the real root. This would make
python-build-standalone fail with the encodings module not being found
because its home is wrong.
2024-01-18 15:32:30 +00:00
konsti 7acde5a9a0
Fix `pep508_rs` doc test (#963)
Since nextest does not run doctests, this did not show up on CI.
2024-01-18 14:24:30 +00:00
konsti 5ec5a3243c
Set miette hook in all of puffin-cli (#962)
Fixes #938
2024-01-18 08:37:26 -05:00
Charlie Marsh 8ae8ddc7d9
Fix 3-to-2 reference in pip sync test (#958) 2024-01-18 04:33:46 +00:00
Charlie Marsh fbe70f4218
Split install plan into builder and struct (#955)
The `InstallPlan` does a lot of work in the constructor, which I tend to
feel is an anti-pattern. With cache refresh, it's also going to need to
be made `async`, so it really feels like it should be a clearer method
rather than an async, fallible constructor that does a bunch of IO. This
PR splits into a `Planner` (with a `build` method) and a `Plan`.
2024-01-17 15:28:46 -05:00
Charlie Marsh 055fd64eb1
Add an `--update-package` setting to allow individual package upgrades (#953)
Closes #950.
2024-01-17 14:31:52 -05:00
Zanie Blue a4204d00c1
Bump to latest packse version with "extras" scenarios (#935)
Includes:

- https://github.com/zanieb/packse/pull/83 (replaces some of the
post-processing here)
- https://github.com/zanieb/packse/pull/82
- https://github.com/zanieb/packse/pull/81
2024-01-17 13:25:48 -06:00
Charlie Marsh a0420114c3
Avoid storing absolute URLs for files (#944)
## Summary

It turns out that storing an absolute URL for every file caused a
significant performance regression. This PR attempts to address the
regression with two changes.

The first is that we now store the raw string if the URL is an absolute
URL. If the URL is relative, we store the base URL alongside the raw
relative string. As such, we avoid serializing and deserializing URLs
until we need them (later on), except for the base URL.

The second is that we now use the internal `Url` crate methods for
serializing and deserializing. If you look inside `Url`, its standard
serializer and deserialization actually convert it to a string, then
parse the string. But the crate exposes some other methods for faster
serialization and deserialization (with fewer guarantees). I think this
is totally fine since the cache is entirely internal.

If we _just_ change the `Url` serialization (and no other code -- so
continue to store URLs for every file), then the regression goes down to
about 5%:

```shell
❯ python -m scripts.bench \
        --puffin-path ./target/release/main \
        --puffin-path ./target/release/relative --puffin-path ./target/release/puffin \
        scripts/requirements/home-assistant.in --benchmark resolve-warm
Benchmark 1: ./target/release/main (resolve-warm)
  Time (mean ± σ):     496.3 ms ±   4.3 ms    [User: 452.4 ms, System: 175.5 ms]
  Range (min … max):   487.3 ms … 502.4 ms    10 runs

Benchmark 2: ./target/release/relative (resolve-warm)
  Time (mean ± σ):     284.8 ms ±   2.1 ms    [User: 245.8 ms, System: 165.6 ms]
  Range (min … max):   280.3 ms … 288.0 ms    10 runs

Benchmark 3: ./target/release/puffin (resolve-warm)
  Time (mean ± σ):     300.4 ms ±   3.2 ms    [User: 255.5 ms, System: 178.1 ms]
  Range (min … max):   295.4 ms … 305.1 ms    10 runs

Summary
  './target/release/relative (resolve-warm)' ran
    1.05 ± 0.01 times faster than './target/release/puffin (resolve-warm)'
    1.74 ± 0.02 times faster than './target/release/main (resolve-warm)'
```

So I considered _just_ making that change. But 5% is kind of
borderline...

With both of these changes, the regression is down to 1-2%:

```
Benchmark 1: ./target/release/relative (resolve-warm)
  Time (mean ± σ):     282.6 ms ±   7.4 ms    [User: 244.6 ms, System: 181.3 ms]
  Range (min … max):   275.1 ms … 318.5 ms    30 runs

Benchmark 2: ./target/release/puffin (resolve-warm)
  Time (mean ± σ):     286.8 ms ±   2.2 ms    [User: 247.0 ms, System: 169.1 ms]
  Range (min … max):   282.3 ms … 290.7 ms    30 runs

Summary
  './target/release/relative (resolve-warm)' ran
    1.01 ± 0.03 times faster than './target/release/puffin (resolve-warm)'
```

It's consistently ~2%-ish, but at this point it's unclear if that's due
to the URL change or something other change between now and then.

Closes #943.
2024-01-17 09:15:21 -05:00
Charlie Marsh b8fbd529a1
Move `OnceMap` into its own crate (#946)
## Summary

This is extremely generic (like `WaitMap`), and I want to use it in the
cache.
2024-01-17 04:09:15 +00:00
konsti 5051b2c004
Use tempfile to prevent install io race crashes (#929)
On ubuntu and python 3.10,

```
cargo run -q -- pip-install --find-links https://storage.googleapis.com/jax-releases/jax_cuda_releases.html "jax[cuda12_pip]==0.4.23"
```

non-deterministically but for me consistently fails to install with
messages such as

```
error: Failed to install: nvidia_nccl_cu12-2.19.3-py3-none-manylinux1_x86_64.whl (nvidia-nccl-cu12==2.19.3)
  Caused by: failed to remove file `/home/konsti/projects/puffin/.venv/lib/python3.10/site-packages/nvidia/__init__.py`
  Caused by: No such file or directory (os error 2)
```

```
error: Failed to install: nvidia_cublas_cu12-12.3.4.1-py3-none-manylinux1_x86_64.whl (nvidia-cublas-cu12==12.3.4.1)
  Caused by: Replacing an existing file or directory failed
```

```
error: Failed to install: nvidia_cuda_nvcc_cu12-12.3.107-py3-none-manylinux1_x86_64.whl (nvidia-cuda-nvcc-cu12==12.3.107)
  Caused by: failed to hardlink file from /home/konsti/.cache/puffin/wheels-v0/pypi/nvidia-cuda-nvcc-cu12/nvidia_cuda_nvcc_cu12-12.3.107-py3-none-manylinux1_x86_64/nvidia/__init__.py to /home/konsti/projects/puffin/.venv/lib/python3.10/site-packages/nvidia/__init__.py
  Caused by: File exists (os error 17)
```

We install a lot of nvidia package, that all contain
`nvidia/__init__.py`, since they all install themselves into the
`nvidia` module:

```
nvidia-cublas-cu12==12.3.4.1
nvidia-cuda-cupti-cu12==12.3.101
nvidia-cuda-nvcc-cu12==12.3.107
nvidia-cuda-nvrtc-cu12==12.3.107
nvidia-cuda-runtime-cu12==12.3.101
nvidia-cudnn-cu12==8.9.7.29
nvidia-cufft-cu12==11.0.12.1
nvidia-cusolver-cu12==11.5.4.101
nvidia-cusparse-cu12==12.2.0.103
nvidia-nccl-cu12==2.19.3
nvidia-nvjitlink-cu12==12.3.101
```

```
$  tree -L 1 .venv/lib/python3.10/site-packages/nvidia
.venv/lib/python3.10/site-packages/nvidia
├── cublas
├── cuda_cupti
├── cuda_nvcc
├── cuda_nvrtc
├── cuda_runtime
├── cudnn
├── cufft
├── cusolver
├── cusparse
├── __init__.py
├── nccl
└── nvjitlink
```

When installing we get a race condition, each package installation is
its own thread:
* Installer Thread 1 creates `nvidia/__init__.py`
* Installer Thread 2 sees an existing  `nvidia/__init__.py`
* Installer Thread 3 sees an existing  `nvidia/__init__.py`
* Installer Thread 2 removes `nvidia/__init__.py`
* Installer Thread 3 tries to remove `nvidia/__init__.py`, it doesn't
exist anymore -> failure.

We switch to a new strategy: When the target files exists, we don't
remove it, but instead hardlink the source file to a tempfile first,
then renaming the tempfile to the target file. Renaming is considered an
atomic operation.

I've put the logging on debug level because they cases indicate a
conflict between two packages while being rare.

Closes #925

---------

Co-authored-by: Charlie Marsh <charlie.r.marsh@gmail.com>
2024-01-16 21:07:39 +00:00
Charlie Marsh b50e5fcbc5
Fetch `--find-links` indexes in parallel (#934)
## Summary

Removes a TODO.

## Test Plan

Tested manually with:

```shell
cargo run -p puffin-cli -- \
    pip compile requirements.in -n \
    --find-links 'https://download.pytorch.org/whl/torch_stable.html' \
    --find-links 'https://storage.googleapis.com/jax-releases/jax_cuda_releases.html' \
    --verbose
```

And inspecting the logs to ensure that the two requests were kicked off
concrurently.
2024-01-16 11:37:35 +01:00
Charlie Marsh 2f8f126f2f
Share a single `Index` across resolutions (#906)
## Summary

This PR uses a single `Index` that's shared between the top-level
resolver and any sub-resolutions happen in the course of that top-level
resolution (namely, to resolve build dependencies for any source
distributions).

In theory it's an optimization, since (e.g.) if we have two packages
that both need the `flit-core` build system, and we attempt to build
them both at once, we'll only fetch its metadata _once_, and share it
across the two resolutions. In practice, I haven't been able to get this
to show up in benchmarks. I suspect you'd need a _lot_ of source
distributions for it to matter... Though it may still be worth doing, it
strikes me as a cleaner design.

Closes #200.

Closes #541.
2024-01-16 05:37:15 +00:00
Charlie Marsh 0f592b67bb
Remove clone from `RegistryWheelIndex` (#937)
Doesn't need to own the package names.
2024-01-15 16:18:12 -05:00
Charlie Marsh 2a69b273ce
Use a standalone error type for `--find-links` registry (#936) 2024-01-15 19:48:48 +00:00
Charlie Marsh e71e3e8dd1
Refresh `BuildDispatch` when running pip install with `--reinstall` (#933)
## Summary

This fixes an extremely subtle bug in `pip install --reinstall`, whereby
if you depend on `setuptools` at the top level, we end up uninstalling
it after resolving, which breaks some cached state. If we have
`--reinstall`, we need to reset that cached state between resolving and
installing.

## Test Plan

Running `pip install --reinstall` with:

```txt
setuptools
devpi @ e334eb4dc9bb023329e4b610e4515b/devpi-2.2.0.tar.gz
```

Fails on `main`, but passes.
2024-01-15 18:56:18 +00:00
Charlie Marsh 116da6b7de
Share in-flight map across resolutions (#932)
## Summary

This PR fixes a subtle bug in `pip install` when using `--reinstall`. If
a package depends on a build system directly (e.g., `waitress` depends
on `setuptools`), and then you have other packages that also need the
build system to build a source distribution, right now, we don't share
the `OnceMap` between those cases.

This lifts the `InFlight` tracking up a level, so that it's initialized
once per command, then shared everywhere.

## Test Plan

I'm having trouble coming up with an identical test-case and hesitant to
add this slow test to the suite... But if you run `pip install
--reinstall` with:

```
waitress @ git+https://github.com/zanieb/waitress
devpi-server @ git+https://github.com/zanieb/devpi#subdirectory=server
```

It fails consistently on `main` and passes here.
2024-01-15 13:11:22 -05:00
Charlie Marsh 249ca10765
Move Puffin subcommands to a pip namespace (#921)
## Summary

This makes the separation clearer between the legacy `pip` API and the
API we'll add in the future for the package manager itself. It also
enables seamless `puffin pip` aliasing for those that want it.

Closes #918.
2024-01-15 16:36:45 +00:00
Charlie Marsh e54fdea93f
Continue to respect `--find-links` with `--no-index` (#931)
Like `pip`, we should allow `--find-links` with `--no-index`.
2024-01-15 16:19:27 +00:00
Charlie Marsh 42888a9609
Share flat index across resolutions (#930)
## Summary

This PR restructures the flat index fetching in a few ways:

1. It now lives in its own `FlatIndexClient`, since it felt a bit
awkward (in my opinion) for it to live in `RegistryClient`.
2. We now fetch the `FlatIndex` outside of the resolver. This has a few
benefits: (1) the resolver construct is no longer `async` and no longer
returns `Result`, which feels better for a resolver; and (2) we can
share the `FlatIndex` across resolutions rather than re-fetching it for
every source distribution build.
2024-01-15 11:02:02 -05:00
Charlie Marsh e6d7124147
Add an extra struct around the package-to-flat index map (#923)
## Summary

`FlatIndex` is now the thing that's keyed on `PackageName`, while
`FlatDistributions` is what used to be called `FlatIndex` (a map from
version to `PrioritizedDistribution`, for a single package). I find this
a bit clearer, since we can also remove the `from_files` that doesn't
return `Self`, which I had trouble following.
2024-01-15 14:48:10 +00:00
Charlie Marsh 9a3f3d385c
Remove `PubGrubVersion` (#924)
## Summary

I'm running into some annoyances converting `&Version` to
`&PubGrubVersion` (which is just a wrapper type around `Version`), and I
realized... We don't even need `PubGrubVersion`?

The reason we "need" it today is due to the orphan trait rule: `Version`
is defined in `pep440_rs`, but we want to `impl
pubgrub::version::Version for Version` in the resolver crate.

Instead of introducing a new type here, which leads to a lot of
awkwardness around conversion and API isolation, what if we instead just
implement `pubgrub::version::Version` in `pep440_rs` via a feature? That
way, we can just use `Version` everywhere without any confusion and
conversion for the wrapper type.
2024-01-15 08:51:12 -05:00
konsti 8860a9c29e
Add flat index urls to registry wheel index (#928)
Previously, we were missing flat index wheels in the cache.
2024-01-15 10:21:59 +00:00
konsti 95f3cca28d
Use fs_err in more places (#926)
Before:

```
error: Failed to download distributions
  Caused by: Failed to fetch wheel: jaxlib==0.4.23+cuda12.cudnn89
  Caused by: Directory not empty (os error 39)
```

After:

```
error: Failed to download distributions
  Caused by: Failed to fetch wheel: jaxlib==0.4.23+cuda12.cudnn89
  Caused by: failed to rename file from /home/konsti/.cache/puffin/.tmpcG7tVP/jaxlib-0.4.23+cuda12.cudnn89-cp310-cp310-manylinux2014_x86_64.whl to /home/konsti/.cache/puffin/wheels-v0/index/9ff50b883297fa9d/jaxlib/jaxlib-0.4.23+cuda12.cudnn89-cp310-cp310-manylinux2014_x86_64
  Caused by: Directory not empty (os error 39)
```
2024-01-15 09:39:33 +00:00
konsti 82ff136a74
Add find links supports to pip-sync (#914)
Closes #877
2024-01-15 03:04:55 +00:00
konsti f63776b894
Support HTML indexes in `--find-links` (#913)
The simple html format parser luckily seems to work for find links too,
at least it can parse
https://storage.googleapis.com/jax-releases/jax_cuda_releases.html.
2024-01-15 02:54:34 +00:00
konsti e9b6b6fa36
Implement `--find-links` as flat indexes (directories in pip-compile) (#912)
Add directory `--find-links` support for local paths to pip-compile.

It seems that pip joins all sources and then picks the best package. We
explicitly give find links packages precedence if the same exists on an
index and locally by prefilling the `VersionMap`, otherwise they are
added as another index and the existing rules of precedence apply.

Internally, the feature is called _flat index_, which is more meaningful
than _find links_: We're not looking for links, we're picking up local
directories, and (TBD) support another index format that's just a flat
list of files instead of a nested index.

`RegistryBuiltDist` and `RegistrySourceDist` now use `WheelFilename` and
`SourceDistFilename` respectively. The `File` inside `RegistryBuiltDist`
and `RegistrySourceDist` gained the ability to represent both a url and
a path so that `--find-links` with a url and with a path works the same,
both being locked as `<package_name>@<version>` instead of
`<package_name> @ <url>`. (This is more of a detail, this PR in general
still work if we strip that and have directory find links represented as
`<package_name> @ file:///path/to/file.ext`)

`PrioritizedDistribution` and `FlatIndex` have been moved to locations
where we can use them in the upstack PR.

I added a `scripts/wheels` directory with stripped down wheels to use
for testing.

We're lacking tests for correct tag priority precedence with flat
indexes, i only confirmed this manually since it is not covered in the
pip-compile or pip-sync output.

Closes #876
2024-01-15 02:04:10 +00:00
konsti 5ffbfadf66
Make hashes optional (#910)
There is no guarantee that indexes provide hashes at all or the sha256
we support specifically. [PEP
503](https://peps.python.org/pep-0503/#specification):

> The URL SHOULD include a hash in the form of a URL fragment with the
following syntax: #<hashname>=<hashvalue>, where <hashname> is the
lowercase name of the hash function (such as sha256) and <hashvalue> is
the hex encoded digest.

We instead use the url as input to generate a hash when caching.
2024-01-14 16:32:55 -05:00
Zanie Blue 9ad19b7e54
Bump to the latest packse version (#916) 2024-01-14 12:49:23 -06:00
konsti a53bdeba4c
Remove `base` from `RegistryBuiltDist` and `RegistrySourceDist` (#919)
Follow-up to https://github.com/astral-sh/puffin/pull/917 i found
rebasing the find-links PRs, this field became unused through the
absolute URLs.
2024-01-14 17:46:16 +00:00
Charlie Marsh 0374000ec0
Normalize extras when evaluating PEP 508 markers (#915)
## Summary

We always normalize extra names in our requirements (e.g., `cuda12_pip`
to `cuda12-pip`), but we weren't normalizing within PEP 508 markers,
which meant we ended up comparing `cuda12-pip` (normalized) against
`cuda12_pip` (unnormalized).

Closes https://github.com/astral-sh/puffin/issues/911.
2024-01-14 17:16:54 +00:00
konsti a99e5e00f2
Use absolute urls in `distribution_type::File` (#917)
Previously, the url on file could either be a relative or an absolute
url, depending on the index, and we would finalize it lazily. Now we
finalize the url when converting `pypi_types::File` to
`distribution_types::File`. This change is required to make the hashes
on `File` optional (https://github.com/astral-sh/puffin/pull/910), which
are currently the only unique field usable for caching.
2024-01-14 17:15:24 +00:00
Charlie Marsh 6e18e56789
Adjust markers to match target Python version (#909)
## Summary

This PR ensures that when the user passes in `--python-version`, we
adjust the _markers_ to match the target version, thus forcing us to
select compatible wheels for the `--python-version`, rather than the
installed version.

## Context

Let's call Python 3.10 the "installed" environment and Python 3.12 the
"target" environment. For each version, we have _both_ a Python version
(to match against `Requires-Python`) and a set of tags (to match against
wheels).

The rules for resolution are as follows...

- For each package, for each version, we try to find the "best
candidate" for resolution and installation.
- We first look for a wheel that's compatible with the _target_
environment. This requires testing against both the `Requires-Python`
and the markers. (We won't have to build or run this code, so the
_installed_ version is irrelevant.) **(This PR corrects _this_ bullet --
previously, we validated against the _installed_ markers, rather than
the target markers.)**
- If we can't find a compatible wheel, we accept any _incompatible_
wheel as long as there's a source distribution. The source distribution
_must_ be compatible with the target environment. (We won't have to
build or run this code, so the _installed_ version is irrelevant.)
- If there are no wheels, then the source distribution must be
compatible with _both_ the installed and target environments, since we
need to build it.

This is all true for the top-level resolution. When we perform a
sub-resolution (when resolving the build dependencies of a source
distribution), we should _only_ use the installed environment, and
ignore the target environment, since we assume that the dependencies
will be the same in both environments once built -- so our goal is
"just" to build the distribution, without concern for which build
dependencies it uses.

Closes https://github.com/astral-sh/puffin/issues/883.
2024-01-14 15:39:15 +00:00
Charlie Marsh 8187c05d8a
Use `DashMap` for redirects (#908)
## Summary

We don't need to wait on these, so it's simpler to use a standard
concurrent hash map.
2024-01-13 20:36:02 +00:00
Charlie Marsh f527f2add9
Remove erroneous local `Index` in resolver (#907) 2024-01-13 15:19:00 -05:00
Charlie Marsh 231686e71b
Remove `incompatibilities` from index (#905)
This isn't really part of the "index", it's part of the resolution.
2024-01-13 02:57:15 +00:00
Charlie Marsh 477186dcb3
Remove `ResolutionGraph#requirements` (#903) 2024-01-12 20:09:19 +00:00
Charlie Marsh d3f65c317d
Avoid some additional clones for `PackageName` (#896) 2024-01-12 17:54:40 +00:00
konsti aee6aed684
Make install_editable test faster (#901)
Remove a test case from the `install_editable` that slows it down from
3.6s to 6.5s while providing low test coverage. It also seems to block
other tests sometimes, `cargo nextest run -E "test(editable)"
--all-features` has more consistent and lower runtimes. Surprisingly
this seems to have bigger effect than switching from pyo3 to cffi.

Used test commands:
```
rm -rf scripts/editable-installs/maturin_editable/target/ && time cargo nextest run -E "test(=install_editable)" --all-features
rm -rf scripts/editable-installs/maturin_editable/target/ && time cargo nextest run -E "test(editable)" --all-features
 ```

Part of #878
2024-01-12 18:50:27 +01:00
konsti 878bc4bf8d
Stub out DTLSsocket test (#900)
Replace the DTLSsocket test with a dummy package that does nothing but
contain the build system specs that we need. This should speed up one of
the slowest tests.

Part of #878
2024-01-12 18:50:16 +01:00
Charlie Marsh 06039e1293
Add hashes to `pip-compile` output (#894)
## Summary

Adds hashes to `pip-compile` output, though we don't actually check
those hashes in `pip-sync` yet.

Closes https://github.com/astral-sh/puffin/issues/131.
2024-01-12 12:44:19 -05:00
konsti 0cc98c771e
Fix a tracing panic (#899) 2024-01-12 14:47:58 +00:00
Charlie Marsh 11b11d04a7
Ignore installed version when determining wheel compatibility (#890) 2024-01-12 08:57:00 -05:00
Charlie Marsh 5fd2c380a7
Add `into_cached_dist` to `LocalWheel` (#893)
Simplifies `unzip_wheel` a bit and avoids unnecessarily cloning in the
common case.
2024-01-12 09:01:30 +00:00
Charlie Marsh 35c1faa575
Move in-flight tracking to the download level (#892)
## Summary

Now that `get_or_build_wheel` will often _also_ handle the unzip step,
we need to move our per-target locking (`OnceMap`) up a level.
Previously, it was only applied to the unzip step, to prevent us from
attempting to unzip into the same target concurrently; now, it's applied
at the `get_wheel` level, which includes both downloading and unzipping.

## Test Plan

It seems like none of our existing tests catch this -- perhaps because
they're too "simple"? You need to run into a situation in which you're
doing multiple source distribution builds concurrently (since they'll
all try to download `setuptools`):

```
rm -rf foo && virtualenv --clear .venv && cargo run -p puffin-cli -- pip-compile ./scripts/requirements/pydantic.in  --verbose --cache-dir foo
```
2024-01-12 09:52:22 +01:00
Charlie Marsh 60cea0f07d
Use consistent parse terminology in pyproject error (#891)
We use `parse` for the other file types.
2024-01-11 21:25:47 -05:00
bojanserafimov 4c047f858f
Remove InMemoryWheel and dead code (#879) 2024-01-11 10:11:07 -05:00
bojanserafimov 10227a74f8
Unzip while downloading (#856) 2024-01-11 09:41:46 -05:00
konsti 0dfbddd275
Shorten resolve many dev output (#885) 2024-01-11 13:53:13 +00:00
konsti 8c2b7d55af
Cleanup deps and docs (#882)
Fix warnings from `cargo +nightly udeps` and `cargo doc`.

Removes all mentions of regex from pep440_rs.
2024-01-11 10:43:40 +00:00
Zanie Blue d6fa628e11
Fix failing test (#880) 2024-01-11 00:41:37 +00:00
Zanie Blue 811332eacc
Improve handling of "full" version ranges (#868)
Reduces the number of implementation branches handling `Range:full`,
deferring it to `PackageRange`.
Improves some user-facing messages, e.g. saying `all versions of
<package>` instead of `<package>*`.
Changes the member names of the `PackageRangeKind` enum — they were not
very clear.
2024-01-10 21:03:55 +00:00
Zanie Blue a65c55ff4a
Say "cannot be used" and "must be used" instead of "forbidden" and "mandatory" (#867)
Closes #858
2024-01-10 20:49:40 +00:00
Zanie Blue 845ba6801d
Improve formatting of incompatible terms when there are two items (#866) 2024-01-10 20:36:54 +00:00