Commit Graph

257 Commits

Author SHA1 Message Date
Andrew Gallant 8d70055140
puffin-client: actually use SimpleMetadataRaw
This commit introduces SimpleMetadataRaw into the registry
client without actually changing the cache contents. This
effectively is just lining up the API we want. To achieve
this, we actually introduce a perf regression by continuing
to deserialize SimpleMetadata as a MessagePack blob, and then
doing a rkyv serialization followed by a deserialization.

The idea is that we want to change the cached client to deal
with archived types instead of round-tripping through Serde.
2024-02-04 09:46:56 -05:00
Zanie Blue bc2f8f5b1e
Fix direct use of `range.simplify` (#1236) 2024-02-02 18:08:25 +00:00
Zanie Blue 6db9db0079
Invert display of "no versions" incompatibilities with multiple ranges (#1233)
Closes #884 

e.g.

```
❯ cargo run -q -- pip compile --python-version 3.12 requirements.in
  × No solution found when resolving dependencies:
  ╰─▶ Because the requested Python version (3.12) does not satisfy Python>=3.6,<3.10 and recommenders==1.0.0 depends on Python>=3.6,<3.9, we can conclude that recommenders==1.0.0 cannot be used.
      And because only the following versions of recommenders are available:
          recommenders<=0.7
          recommenders==1.0.0
          recommenders==1.1.0
          recommenders==1.1.1
      we can conclude that recommenders>0.7,<1.1.0 cannot be used. (1)

      Because the requested Python version (3.12) does not satisfy Python>=3.6,<3.10 and recommenders>=1.1.0 depends on Python>=3.6,<3.10, we can conclude that recommenders>=1.1.0 cannot be used.
      And because we know from (1) that recommenders>0.7,<1.1.0 cannot be used, we can conclude that recommenders>0.7 cannot be used.
      And because you require recommenders>0.7, we can conclude that the requirements are unsatisfiable.
```
2024-02-02 12:00:25 -06:00
konsti f10f902570
Yield after channel send and move cpu tasks to thread (#1163)
## Summary

Previously, we were blocking operations that could run in parallel. We
would send request through our main requests channel, but not yield so
that the receiver could only start processing requests much later than
necessary. We solve this by switching to the async
`tokio::sync::mpsc::channel`, where send is an async functions that
yields.

Due to the increased parallelism cache deserialization and the
conversion from simple api request to version map became bottlenecks, so
i moved them to `spawn_blocking`. Together these result in a 30-60%
speedup for larger warm cache resolution. Small cases such as black
already resolve in 5.7 ms on my machine so there's no speedup to be
gained, refresh and no cache were to noisy to get signal from.

Note for the future: Revisit the bounded channel if we want to produce
requests from `process_request`, too, (this would be good for
prefetching) to avoid deadlocks.

## Details

We can look at the behavior change through the spans:

```
RUST_LOG=puffin=info TRACING_DURATIONS_FILE=target/traces/jupyter-warm-branch.ndjson cargo run --features tracing-durations-export --bin puffin-dev --profile profiling -- resolve jupyter 2> /dev/null
```

Below, you can see how on main, we have discrete phases: All (cached)
simple api requests in parallel, then all (cached) metadata requests in
parallel, repeat until done. The solver is mostly waiting until it has
it's version map from the simple API query to be able to choose a
version. The main thread is blocked by process requests.

In the PR branch, the simple api requests succeeds much earlier,
allowing the solver to advance and also to schedule more prefetching.
Due to that `parse_cache` and `from_metadata` became bottlenecks, so i
moved them off the main thread (green color, and their spans can now
overlap because they can run on multiple threads in parallel). The main
thread isn't blocked on `process_request` anymore, instead it has
frequent idle times. The spans are all much shorter, which indicates
that on main they could have finished much earlier, but a task didn't
yield so they weren't scheduled to finish (though i haven't dug deep
enough to understand the exact scheduling between the process request
stream and the solver here).

**main**


![jupyter-warm-main](https://github.com/astral-sh/puffin/assets/6826232/693c53cc-1090-41b7-b02a-a607fcd2cd99)

**PR**


![jupyter-warm-branch](https://github.com/astral-sh/puffin/assets/6826232/33435f34-b39b-4b0a-a9d7-4bfc22f55f05)

## Benchmarks

```
$ hyperfine --warmup 3 "target/profiling/main-dev resolve jupyter" "target/profiling/branch-dev resolve jupyter"
Benchmark 1: target/profiling/main-dev resolve jupyter
  Time (mean ± σ):      29.1 ms ±   0.7 ms    [User: 22.9 ms, System: 11.1 ms]
  Range (min … max):    27.7 ms …  32.2 ms    103 runs
 
Benchmark 2: target/profiling/branch-dev resolve jupyter
  Time (mean ± σ):      18.8 ms ±   1.1 ms    [User: 37.0 ms, System: 22.7 ms]
  Range (min … max):    16.5 ms …  21.9 ms    154 runs
 
Summary
  target/profiling/branch-dev resolve jupyter ran
    1.55 ± 0.10 times faster than target/profiling/main-dev resolve jupyter

$ hyperfine --warmup 3 "target/profiling/main-dev resolve meine_stadt_transparent" "target/profiling/branch-dev resolve meine_stadt_transparent"
Benchmark 1: target/profiling/main-dev resolve meine_stadt_transparent
  Time (mean ± σ):      37.8 ms ±   0.9 ms    [User: 30.7 ms, System: 14.1 ms]
  Range (min … max):    36.6 ms …  41.5 ms    79 runs
 
Benchmark 2: target/profiling/branch-dev resolve meine_stadt_transparent
  Time (mean ± σ):      24.7 ms ±   1.5 ms    [User: 47.0 ms, System: 39.3 ms]
  Range (min … max):    21.5 ms …  28.7 ms    113 runs
 
Summary
  target/profiling/branch-dev resolve meine_stadt_transparent ran
    1.53 ± 0.10 times faster than target/profiling/main-dev resolve meine_stadt_transparent

$ hyperfine --warmup 3 "target/profiling/main pip compile scripts/requirements/home-assistant.in" "target/profiling/branch pip compile scripts/requirements/home-assistant.in"
Benchmark 1: target/profiling/main pip compile scripts/requirements/home-assistant.in
  Time (mean ± σ):     229.0 ms ±   2.8 ms    [User: 197.3 ms, System: 63.7 ms]
  Range (min … max):   225.8 ms … 234.0 ms    13 runs
 
Benchmark 2: target/profiling/branch pip compile scripts/requirements/home-assistant.in
  Time (mean ± σ):      91.4 ms ±   5.3 ms    [User: 289.2 ms, System: 176.9 ms]
  Range (min … max):    81.0 ms … 104.7 ms    32 runs
 
Summary
  target/profiling/branch pip compile scripts/requirements/home-assistant.in ran
    2.50 ± 0.15 times faster than target/profiling/main pip compile scripts/requirements/home-assistant.in
```
2024-02-02 18:18:24 +01:00
Charlie Marsh 8f9258fae3
Invert default feature for testing (#1200)
## Summary

We have some flags in Puffin that enable us to opt-in to certain tests.
To date, they've been opt-in, so we've run our tests with
`--all-features`. This PR makes them opt-out, and we now run tests with
default features.

The main motivation here is that I want to get tests working for macOS
on CI, but for unknown reasons, macOS can't compile the PyO3 features at
the same time as everything else due to strange linker issues. By
avoiding `--all-features` for tests, we thus avoid unnecessarily
including features that we don't actually use in Puffin.

I verified that the exact same number of tests (439) are run before and
after this change. For users, the primary difference is that you now
need to specify `--no-default-features --features pypi --features
python` to avoid (e.g.) including the Git tests.
2024-01-31 09:44:26 -05:00
Charlie Marsh 3f5e7306a5
Remove `WaitMap` dependency (#1183)
## Summary

This is an attempt to https://github.com/astral-sh/puffin/pull/1163 by
removing the `WaitMap` and gaining more granular control over the values
that we hold over `await` boundaries.
2024-01-30 15:25:22 -05:00
Charlie Marsh c129717b41
Add support for `--no-deps` to `pip install` (#1191)
## Summary

Closes https://github.com/astral-sh/puffin/issues/1188.
2024-01-30 19:54:57 +00:00
Charlie Marsh 8305acc584
Add a builder for resolution options (#1192) 2024-01-30 19:50:16 +00:00
konsti be48200642
Small instrumentation improvements (#1164)
Less verbose span fields for `Dist`s by using the display impl and no
more min length in the tracing durations plot config for comparability
(we lose spans due to a speedup otherwise). Both wait points in the
solver loop are now instrumented so we can inspect what we're waiting
for to progress in the solver.
2024-01-29 10:55:19 +00:00
Andrew Gallant 5219d37250
add initial rkyv support (#1135)
This PR adds initial support for [rkyv] to puffin. In particular,
the main aim here is to make puffin-client's `SimpleMetadata` type
possible to deserialize from a `&[u8]` without doing any copies. This
PR **stops short of actuallying doing that zero-copy deserialization**.
Instead, this PR is about adding the necessary trait impls to a variety
of types, along with a smattering of small refactorings to make rkyv
possible to use.

For those unfamiliar, rkyv works via the interplay of three traits:
`Archive`, `Serialize` and `Deserialize`. The usual flow of things is
this:

* Make a type `T` implement `Archive`, `Serialize` and `Deserialize`.
rkyv
helpfully provides `derive` macros to make this pretty painless in most
  cases.
* The process of implementing `Archive` for `T` *usually* creates an
entirely
new distinct type within the same namespace. One can refer to this type
without naming it explicitly via `Archived<T>` (where `Archived` is a
clever
  type alias defined by rkyv).
* Serialization happens from `T` to (conceptually) a `Vec<u8>`. The
serialization format is specifically designed to reflect the in-memory
layout
  of `Archived<T>`. Notably, *not* `T`. But `Archived<T>`.
* One can then get an `Archived<T>` with no copying (albeit, we will
likely
need to incur some cost for validation) from the previously created
`&[u8]`.
This is quite literally [implemented as a pointer cast][rkyv-ptr-cast].
* The problem with an `Archived<T>` is that it isn't your `T`. It's
something
  else. And while there is limited interoperability between a `T` and an
`Archived<T>`, the main issue is that the surrounding code generally
demands
a `T` and not an `Archived<T>`. **This is at the heart of the tension
for
  introducing zero-copy deserialization, and this is mostly an intrinsic
problem to the technique and not an rkyv-specific issue.** For this
reason,
  given an `Archived<T>`, one can get a `T` back via an explicit
deserialization step. This step is like any other kind of
deserialization,
although generally faster since no real "parsing" is required. But it
will
  allocate and create all necessary objects.

This PR largely proceeds by deriving the three aforementioned traits
for `SimpleMetadata`. And, of course, all of its type dependencies. But
we stop there for now.

The main issue with carrying this work forward so that rkyv is actually
used to deserialize a `SimpleMetadata` is figuring out how to deal
with `DataWithCachePolicy` inside of the cached client. Ideally, this
type would itself have rkyv support, but adding it is difficult. The
main difficulty lay in the fact that its `CachePolicy` type is opaque,
not easily constructable and is internally the tip of the iceberg of
a rat's nest of types found in more crates such as `http`. While one
"dumb"-but-annoying approach would be to fork both of those crates
and add rkyv trait impls to all necessary types, it is my belief that
this is the wrong approach. What we'd *like* to do is not just use
rkyv to deserialize a `DataWithCachePolicy`, but we'd actually like to
get an `Archived<DataWithCachePolicy>` and make actual decisions used
the archived type directly. Doing that will require some work to make
`Archived<DataWithCachePolicy>` directly useful.

My suspicion is that, after doing the above, we may want to mush
forward with a similar approach for `SimpleMetadata`. That is, we want
`Archived<SimpleMetadata>` to be as useful as possible. But right
now, the structure of the code demands an eager conversion (and thus
deserialization) into a `SimpleMetadata` and then into a `VersionMap`.
Getting rid of that eagerness is, I think, the next step after dealing
with `DataWithCachePolicy` to unlock bigger wins here.

There are many commits in this PR, but most are tiny. I still encourage
review to happen commit-by-commit.

[rkyv]: https://rkyv.org/
[rkyv-ptr-cast]:
https://docs.rs/rkyv/latest/src/rkyv/util/mod.rs.html#63-68
2024-01-28 12:14:59 -05:00
Charlie Marsh 3d10f344f3
Only include visited packages in error message derivation (#1144)
## Summary

This is my guess as to the source of the resolver flake, based on
information and extensive debugging from @zanieb. In short, if we rely
on `self.index.packages` as a source of truth during error reporting, we
open ourselves up to a source of non-determinism, because we fetch
package metadata asynchronously in the background while we solve -- so
packages _could_ be included in or excluded from the index depending on
the order in which those requests are returned.

So, instead, we now track the set of packages that _were_ visited by the
solver. Visiting a package _requires_ that we wait for its metadata to
be available. By limiting analysis to those packages that were visited
during solving, we are faithfully representing the state of the solver
at the time of failure.

Closes #863
2024-01-28 09:27:22 -05:00
Charlie Marsh 361a2039d2
Add `--no-annotate` and `--no-header` flags (#1117)
Closes #1107.
Closes #1108.
2024-01-26 12:14:18 +00:00
Charlie Marsh 7755f986c3
Support extras in editable requirements (#1113)
## Summary

This PR adds support for requirements like `-e .[d]`.

Closes #1091.
2024-01-26 12:07:51 +00:00
Charlie Marsh f36c167982
Use a consolidated error for distribution failures (#1104)
## Summary

Use a single error type in `puffin_distribution`, rather than two
confusingly similar types between `DistributionDatabase` and the source
distribution module.

Also removes the `#[from]` for IO errors and replaces with explicit
wrapping, which is verbose but removes a bunch of incorrect error
messages.
2024-01-25 14:49:11 -05:00
Charlie Marsh 8ef819e07e
Remove `Option` wrapper from requirement extras (#1103)
There's no semantic difference between `None` and empty, so seems
simpler to represent this way.
2024-01-25 13:21:53 -05:00
Andrew Gallant 067acfe79e
puffin-client: rejigger error type (#1102)
This PR changes the error type to be boxed internally so that it uses
less size on the stack. This makes functions returning `Result<T,
Error>`, in particular, return something much smaller.

The specific thing that motivated this was Clippy lints firing when I
tried to refactor code in this crate.

I chose to achieve boxing by splitting the enum out into a separate
type, and then wiring up the necessary `From` impl to make error
conversions easy, and then making `Error` itself opaque. We could expose
the `Box`, but there isn't a ton of benefit in doing so because one
cannot pattern match through a `Box`.

This required using more explicit error conversions in several places.
And as a result, I was able to remove all `#[from]` attributes on
non-transparent error variants.
2024-01-25 13:13:21 -05:00
Zanie Blue ed1ac640b9
Consolidate `UnusableDependencies` into a generic `Unavailable` incompatibility (#1088)
Requires https://github.com/zanieb/pubgrub/pull/20

In short, `UnusableDependencies` can be generalized into `Unavailable`
which encompasses incompatibilities where a package range which is
unusable for some inherent reason as well as when its dependencies are
unusable. We can eventually use this to track more incompatibilities in
the solver. I made the reason string required because I can't see a case
where we should leave it out.

Additionally, this improves the display of conflicts in the root
requirements.
2024-01-24 22:10:44 -06:00
konsti 2e0ce70d13
Initial windows support (#940)
## Summary

First batch of changes for windows support. Notable changes:

* Fixes all compile errors and added windows specific paths.
* Working venv creation on windows, both from a base interpreter and
from a venv. This requires querying `stdlib` from the sysconfig paths to
find the launcher.
* Basic url/path conversion handling for windows.
* `if cfg!(...)` instead of `#[cfg()]`. This should make it easier to
keep everything compiling across platforms.

## Outlook

Test summary: 402 tests run: 299 passed (15 slow), 103 failed, 1 skipped

There are various reason for the remaining test failure:
* Windows-specific colorama and tzdata dependencies that change the
snapshot slightly. This is by far the biggest batch.
* Some url-path handling issues. I fixed some in the PR, some remain.
* Lack of the latest python patch versions for older pythons on my
machine, since there are no builds for windows and we need to register
them in the registry for them to be picked up for `py --list-paths` (CC
@zanieb RE #1070).
* Lack of entrypoint launchers.
* ... likely more
2024-01-24 18:27:49 +01:00
Charlie Marsh 0519375bd6
Remove some unused dependencies (#1077) 2024-01-24 11:58:21 -05:00
Andrew Gallant eebc2f340a
make some things guaranteed to be deterministic (#1065)
This PR replaces a few uses of hash maps/sets with btree maps/sets and
index maps/sets. This has the benefit of guaranteeing a deterministic
order of iteration.

I made these changes as part of looking into a flaky test.
Unfortunately, I'm not optimistic that anything here will actually fix
the flaky test, since I don't believe anything was actually dependent
on the order of iteration.
2024-01-23 20:30:33 -05:00
Zanie Blue 89eb8547ce
Fix missing comma before conclusions (#1042)
Closes https://github.com/astral-sh/puffin/issues/1010
2024-01-22 13:31:09 -06:00
Zanie Blue e21948f353
Improve display of Python versions (#1029)
In https://github.com/astral-sh/puffin/pull/986 there was some confusion
about what these values are set to and I noticed that we never actually
display the target version being used for a resolution.

- Consistently display the Python interpreter being used, i.e. make it
clear that we are referring the the interpreter/installed Python version
and always show the version number
- Display the target Python version during solving
2024-01-22 18:46:18 +00:00
Charlie Marsh b0e73d796c
Add support for PyPy wheels (#1028)
## Summary

This PR adds support for PyPy wheels by changing the compatible tags
based on the implementation name and version of the current interpreter.

For now, we only support CPython and PyPy, and explicitly error out when
given other interpreters. (Is this right? Should we just fallback to
CPython tags...? Or skip the ABI-specific tags for unknown
interpreters?)

The logic is based on
4d85340613/src/packaging/tags.py (L247).
Note, however, that `packaging` uses the `EXT_SUFFIX` variable from
`sysconfig`... Instead, I looked at the way that PyPy formats the tags,
and recreated them based on the Python and implementation version. For
example, PyPy wheels look like
`cchardet-2.1.7-pp37-pypy37_pp73-win_amd64.whl` -- so that's `pp37` for
PyPy with Python version 3.7, and then `pypy37_pp73` for PyPy with
Python version 3.7 and PyPy version 7.3.

Closes https://github.com/astral-sh/puffin/issues/1013.

## Test Plan

I tested this manually, but I couldn't find macOS universal PyPy
wheels... So instead I added `cchardet` to a `requirements.in`, ran
`cargo run pip sync requirements.in --index-url
https://pypy.kmtea.eu/simple --verbose`, and added logging to verify
that the platform tags matched (even if the architecture didn't).
2024-01-22 14:22:27 +00:00
Charlie Marsh e09a51653e
Propagate cancellation errors in `OnceMap` (#1032)
## Summary

Ensures that if an operation is cancelled in one thread, we propagate it
to others rather than panicking.

Related to https://github.com/astral-sh/puffin/issues/1005.
2024-01-22 09:00:21 -05:00
Charlie Marsh b9bee013ce
Use full Python version for installed version (#1033)
## Summary

`interpreter.version()` returns the `python_full_version`, but the
marker variant uses `python_version` instead of `python_full_version` --
so it's omitting the patch.
2024-01-22 00:44:39 -06:00
Zanie Blue 6202c9e1b5
Use current and requested Python versions in `requires-python` incompatibility errors (#986)
Closes https://github.com/astral-sh/puffin/issues/806
2024-01-22 00:32:02 -06:00
Zanie Blue 4026710189
Add scenario tests for `pip-compile` (#1011)
e.g. for scenarios that test resolution _without_ installation.

This refactors the `update` script to generate scenario test files for
`pip compile` _and_ `pip install`. We don't overlap scenarios to save
time. We only generate `pip compile` test cases for scenarios we cannot
represent with `pip install` e.g. a `--python-version` override.

The _one_ scenario I added happened to reveal a bug in our resolver
where we were incorrectly filtering versions by the installed version
when wheels were available. Per the comment at
https://github.com/astral-sh/puffin/issues/883#issuecomment-1890773112,
we should _only_ need to check for a compatible installed Python version
when using a different _target_ Python version if we need to build a
source distribution.
53bce68400
resolves this by removing the excessive constraints — the correct Python
version incompatibilities are applied elsewhere.
2024-01-21 17:47:42 -06:00
Zanie Blue 33b35f7020
Add support for disabling installation from pre-built wheels (#956)
Adds support for disabling installation from pre-built wheels i.e. the
package must be built from source locally.
We will still always use pre-built wheels for metadata during
resolution.

Available via `--no-binary` and `--no-binary-package <name>` flags in
`pip install` and `pip sync`. There is no flag for `pip compile` since
no installation happens there.

```
--no-binary

    Don't install pre-built wheels.
    
    When enabled, all installed packages will be installed from a source distribution. 
    The resolver will still use pre-built wheels for metadata.


--no-binary-package <NO_BINARY_PACKAGE>

    Don't install pre-built wheels for a specific package.
    
    When enabled, the specified packages will be installed from a source distribution. 
    The resolver will still use pre-built wheels for metadata.
```

When packages are already installed, the `--no-binary` flag will have no
affect without the `--reinstall` flag. In the future, I'd like to change
this by tracking if a local distribution is from a pre-built wheel or a
locally-built wheel. However, this is significantly more complex and
different than `pip`'s behavior so deferring for now.

For reference, `pip`'s flag works as follows:

```
--no-binary <format_control>

    Do not use binary packages. Can be supplied multiple times, and each time adds to the
    existing value. Accepts either ":all:" to disable all binary packages, ":none:" to empty the
    set (notice the colons), or one or more package names with commas between them (no colons).
    Note that some packages are tricky to compile and may fail to install when this option is
    used on them.
```

Note we are not matching the exact `pip` interface here because it seems
complicated to use. I think we may want to consider adjusting our
interface for this behavior since we're not entirely compatible anyway
e.g. I think `--force-build` and `--force-build-package` are clearer
names. We could also consider matching the `pip` interface or only
allowing `--no-binary <package>` for compatibility. We can of course do
whatever we want in our _own_ install interfaces later.

Additionally, we may want to further consider the semantics of
`--no-binary`. For example, if I run `pip install pydantic --no-binary`
I expect _just_ Pydantic to be installed without binaries but by default
we will build all of Pydantic's dependencies too.

This work was prompted by #895, as it is much easier to measure
performance gains from building source distributions if we have a flag
to ensure we actually build source distributions. Additionally, this is
a flag I have used frequently in production to debug packages that ship
Cythonized wheels.
2024-01-19 11:24:27 -06:00
Zanie Blue 8b49d900bd
Refer to the user instead of "root" when mentioning direct dependencies (#982)
Closes https://github.com/astral-sh/puffin/issues/857
2024-01-19 11:17:42 -06:00
Zanie Blue ae7a2cddc2
Avoid showing negations of ranges in error messages (#981)
Closes https://github.com/astral-sh/puffin/issues/980
2024-01-19 11:07:14 -06:00
Zanie Blue 02ed195982
Improve simple no version messages using complement of range (#979)
Improves some of the "no versions of <package> are available" messages
by showing the complement or inversion of the package.

Does not address cases like

```
Because there are no versions of crow that satisfy any of:
    crow>1.0.0,<2.0.0a5
    crow>2.0.0a7,<2.0.0b1
    crow>2.0.0b1,<2.0.0b5
...
```

which are a bit more complicated; I'll focus on those cases in a
follow-up.
2024-01-19 16:48:20 +00:00
Zanie Blue 7bb4fda8af
Say "depend on" instead of "depends on" when proper in error messages (#968)
I would like to spend some additional time working on the package range
display abstractions, but maybe that is best done _after_ I've done a
good bit of fiddling with the error messages.

Addresses
https://github.com/astral-sh/puffin/pull/868#discussion_r1447593081
2024-01-19 16:08:17 +00:00
konsti 47fc90d1b3
Reduce stack usage by boxing `File` in `Dist`, `CachePolicy` and large futures (#1004)
This is https://github.com/astral-sh/puffin/pull/947 again but this time
merging into main instead of downstack, sorry for the noise.

---

Windows has a default stack size of 1MB, which makes puffin often fail
with stack overflows. The PR reduces stack size by three changes:

* Boxing `File` in `Dist`, reducing the size from 496 to 240.
* Boxing the largest futures.
* Boxing `CachePolicy`

## Method

Debugging happened on linux using
https://github.com/astral-sh/puffin/pull/941 to limit the stack size to
1MB. Used ran the command below.

```
RUSTFLAGS=-Zprint-type-sizes cargo +nightly build -p puffin-cli -j 1 > type-sizes.txt && top-type-sizes -w -s -h 10 < type-sizes.txt > sizes.txt
```

The main drawback is top-type-sizes not saying what the `__awaitee` is,
so it requires manually looking up with a future with matching size.

When the `brotli` features on `reqwest` is active, a lot of brotli types
show up. Toggling this feature however seems to have no effect. I assume
they are false positives since the `brotli` crate has elaborate control
about allocation. The sizes are therefore shown with the feature off.

## Results

The largest future goes from 12208B to 6416B, the largest type
(`PrioritizedDistribution`, see also #948) from 17448B to 9264B. Full
diff: https://gist.github.com/konstin/62635c0d12110a616a1b2bfcde21304f

For the second commit, i iteratively boxed the largest file until the
tests passed, then with an 800KB stack limit looked through the
backtrace of a failing test and added some more boxing.

Quick benchmarking showed no difference:

```console
$ hyperfine --warmup 2 "target/profiling/main-dev resolve meine_stadt_transparent" "target/profiling/puffin-dev resolve meine_stadt_transparent" 
Benchmark 1: target/profiling/main-dev resolve meine_stadt_transparent
  Time (mean ± σ):      49.2 ms ±   3.0 ms    [User: 39.8 ms, System: 24.0 ms]
  Range (min … max):    46.6 ms …  63.0 ms    55 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 2: target/profiling/puffin-dev resolve meine_stadt_transparent
  Time (mean ± σ):      47.4 ms ±   3.2 ms    [User: 41.3 ms, System: 20.6 ms]
  Range (min … max):    44.6 ms …  60.5 ms    62 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Summary
  target/profiling/puffin-dev resolve meine_stadt_transparent ran
    1.04 ± 0.09 times faster than target/profiling/main-dev resolve meine_stadt_transparent
```
2024-01-19 09:38:36 +00:00
Charlie Marsh f9154e8297
Add release workflow (#961)
## Summary

This PR adds a release workflow powered by `cargo-dist`. It's similar to
the version that's PR'd in Ruff
(https://github.com/astral-sh/ruff/pull/9559), with the exception that
it doesn't include the Docker build or the "update dependents" step for
pre-commit.
2024-01-18 15:44:11 -05:00
Charlie Marsh b8fbd529a1
Move `OnceMap` into its own crate (#946)
## Summary

This is extremely generic (like `WaitMap`), and I want to use it in the
cache.
2024-01-17 04:09:15 +00:00
Charlie Marsh 2f8f126f2f
Share a single `Index` across resolutions (#906)
## Summary

This PR uses a single `Index` that's shared between the top-level
resolver and any sub-resolutions happen in the course of that top-level
resolution (namely, to resolve build dependencies for any source
distributions).

In theory it's an optimization, since (e.g.) if we have two packages
that both need the `flit-core` build system, and we attempt to build
them both at once, we'll only fetch its metadata _once_, and share it
across the two resolutions. In practice, I haven't been able to get this
to show up in benchmarks. I suspect you'd need a _lot_ of source
distributions for it to matter... Though it may still be worth doing, it
strikes me as a cleaner design.

Closes #200.

Closes #541.
2024-01-16 05:37:15 +00:00
Charlie Marsh e54fdea93f
Continue to respect `--find-links` with `--no-index` (#931)
Like `pip`, we should allow `--find-links` with `--no-index`.
2024-01-15 16:19:27 +00:00
Charlie Marsh 42888a9609
Share flat index across resolutions (#930)
## Summary

This PR restructures the flat index fetching in a few ways:

1. It now lives in its own `FlatIndexClient`, since it felt a bit
awkward (in my opinion) for it to live in `RegistryClient`.
2. We now fetch the `FlatIndex` outside of the resolver. This has a few
benefits: (1) the resolver construct is no longer `async` and no longer
returns `Result`, which feels better for a resolver; and (2) we can
share the `FlatIndex` across resolutions rather than re-fetching it for
every source distribution build.
2024-01-15 11:02:02 -05:00
Charlie Marsh e6d7124147
Add an extra struct around the package-to-flat index map (#923)
## Summary

`FlatIndex` is now the thing that's keyed on `PackageName`, while
`FlatDistributions` is what used to be called `FlatIndex` (a map from
version to `PrioritizedDistribution`, for a single package). I find this
a bit clearer, since we can also remove the `from_files` that doesn't
return `Self`, which I had trouble following.
2024-01-15 14:48:10 +00:00
Charlie Marsh 9a3f3d385c
Remove `PubGrubVersion` (#924)
## Summary

I'm running into some annoyances converting `&Version` to
`&PubGrubVersion` (which is just a wrapper type around `Version`), and I
realized... We don't even need `PubGrubVersion`?

The reason we "need" it today is due to the orphan trait rule: `Version`
is defined in `pep440_rs`, but we want to `impl
pubgrub::version::Version for Version` in the resolver crate.

Instead of introducing a new type here, which leads to a lot of
awkwardness around conversion and API isolation, what if we instead just
implement `pubgrub::version::Version` in `pep440_rs` via a feature? That
way, we can just use `Version` everywhere without any confusion and
conversion for the wrapper type.
2024-01-15 08:51:12 -05:00
konsti 82ff136a74
Add find links supports to pip-sync (#914)
Closes #877
2024-01-15 03:04:55 +00:00
konsti f63776b894
Support HTML indexes in `--find-links` (#913)
The simple html format parser luckily seems to work for find links too,
at least it can parse
https://storage.googleapis.com/jax-releases/jax_cuda_releases.html.
2024-01-15 02:54:34 +00:00
konsti e9b6b6fa36
Implement `--find-links` as flat indexes (directories in pip-compile) (#912)
Add directory `--find-links` support for local paths to pip-compile.

It seems that pip joins all sources and then picks the best package. We
explicitly give find links packages precedence if the same exists on an
index and locally by prefilling the `VersionMap`, otherwise they are
added as another index and the existing rules of precedence apply.

Internally, the feature is called _flat index_, which is more meaningful
than _find links_: We're not looking for links, we're picking up local
directories, and (TBD) support another index format that's just a flat
list of files instead of a nested index.

`RegistryBuiltDist` and `RegistrySourceDist` now use `WheelFilename` and
`SourceDistFilename` respectively. The `File` inside `RegistryBuiltDist`
and `RegistrySourceDist` gained the ability to represent both a url and
a path so that `--find-links` with a url and with a path works the same,
both being locked as `<package_name>@<version>` instead of
`<package_name> @ <url>`. (This is more of a detail, this PR in general
still work if we strip that and have directory find links represented as
`<package_name> @ file:///path/to/file.ext`)

`PrioritizedDistribution` and `FlatIndex` have been moved to locations
where we can use them in the upstack PR.

I added a `scripts/wheels` directory with stripped down wheels to use
for testing.

We're lacking tests for correct tag priority precedence with flat
indexes, i only confirmed this manually since it is not covered in the
pip-compile or pip-sync output.

Closes #876
2024-01-15 02:04:10 +00:00
konsti 5ffbfadf66
Make hashes optional (#910)
There is no guarantee that indexes provide hashes at all or the sha256
we support specifically. [PEP
503](https://peps.python.org/pep-0503/#specification):

> The URL SHOULD include a hash in the form of a URL fragment with the
following syntax: #<hashname>=<hashvalue>, where <hashname> is the
lowercase name of the hash function (such as sha256) and <hashvalue> is
the hex encoded digest.

We instead use the url as input to generate a hash when caching.
2024-01-14 16:32:55 -05:00
konsti a53bdeba4c
Remove `base` from `RegistryBuiltDist` and `RegistrySourceDist` (#919)
Follow-up to https://github.com/astral-sh/puffin/pull/917 i found
rebasing the find-links PRs, this field became unused through the
absolute URLs.
2024-01-14 17:46:16 +00:00
Charlie Marsh 0374000ec0
Normalize extras when evaluating PEP 508 markers (#915)
## Summary

We always normalize extra names in our requirements (e.g., `cuda12_pip`
to `cuda12-pip`), but we weren't normalizing within PEP 508 markers,
which meant we ended up comparing `cuda12-pip` (normalized) against
`cuda12_pip` (unnormalized).

Closes https://github.com/astral-sh/puffin/issues/911.
2024-01-14 17:16:54 +00:00
Charlie Marsh 6e18e56789
Adjust markers to match target Python version (#909)
## Summary

This PR ensures that when the user passes in `--python-version`, we
adjust the _markers_ to match the target version, thus forcing us to
select compatible wheels for the `--python-version`, rather than the
installed version.

## Context

Let's call Python 3.10 the "installed" environment and Python 3.12 the
"target" environment. For each version, we have _both_ a Python version
(to match against `Requires-Python`) and a set of tags (to match against
wheels).

The rules for resolution are as follows...

- For each package, for each version, we try to find the "best
candidate" for resolution and installation.
- We first look for a wheel that's compatible with the _target_
environment. This requires testing against both the `Requires-Python`
and the markers. (We won't have to build or run this code, so the
_installed_ version is irrelevant.) **(This PR corrects _this_ bullet --
previously, we validated against the _installed_ markers, rather than
the target markers.)**
- If we can't find a compatible wheel, we accept any _incompatible_
wheel as long as there's a source distribution. The source distribution
_must_ be compatible with the target environment. (We won't have to
build or run this code, so the _installed_ version is irrelevant.)
- If there are no wheels, then the source distribution must be
compatible with _both_ the installed and target environments, since we
need to build it.

This is all true for the top-level resolution. When we perform a
sub-resolution (when resolving the build dependencies of a source
distribution), we should _only_ use the installed environment, and
ignore the target environment, since we assume that the dependencies
will be the same in both environments once built -- so our goal is
"just" to build the distribution, without concern for which build
dependencies it uses.

Closes https://github.com/astral-sh/puffin/issues/883.
2024-01-14 15:39:15 +00:00
Charlie Marsh 8187c05d8a
Use `DashMap` for redirects (#908)
## Summary

We don't need to wait on these, so it's simpler to use a standard
concurrent hash map.
2024-01-13 20:36:02 +00:00
Charlie Marsh f527f2add9
Remove erroneous local `Index` in resolver (#907) 2024-01-13 15:19:00 -05:00
Charlie Marsh 231686e71b
Remove `incompatibilities` from index (#905)
This isn't really part of the "index", it's part of the resolution.
2024-01-13 02:57:15 +00:00