Commit Graph

894 Commits

Author SHA1 Message Date
Charlie Marsh 0f592b67bb
Remove clone from `RegistryWheelIndex` (#937)
Doesn't need to own the package names.
2024-01-15 16:18:12 -05:00
Charlie Marsh 2a69b273ce
Use a standalone error type for `--find-links` registry (#936) 2024-01-15 19:48:48 +00:00
Charlie Marsh e71e3e8dd1
Refresh `BuildDispatch` when running pip install with `--reinstall` (#933)
## Summary

This fixes an extremely subtle bug in `pip install --reinstall`, whereby
if you depend on `setuptools` at the top level, we end up uninstalling
it after resolving, which breaks some cached state. If we have
`--reinstall`, we need to reset that cached state between resolving and
installing.

## Test Plan

Running `pip install --reinstall` with:

```txt
setuptools
devpi @ e334eb4dc9bb023329e4b610e4515b/devpi-2.2.0.tar.gz
```

Fails on `main`, but passes.
2024-01-15 18:56:18 +00:00
Charlie Marsh 116da6b7de
Share in-flight map across resolutions (#932)
## Summary

This PR fixes a subtle bug in `pip install` when using `--reinstall`. If
a package depends on a build system directly (e.g., `waitress` depends
on `setuptools`), and then you have other packages that also need the
build system to build a source distribution, right now, we don't share
the `OnceMap` between those cases.

This lifts the `InFlight` tracking up a level, so that it's initialized
once per command, then shared everywhere.

## Test Plan

I'm having trouble coming up with an identical test-case and hesitant to
add this slow test to the suite... But if you run `pip install
--reinstall` with:

```
waitress @ git+https://github.com/zanieb/waitress
devpi-server @ git+https://github.com/zanieb/devpi#subdirectory=server
```

It fails consistently on `main` and passes here.
2024-01-15 13:11:22 -05:00
Charlie Marsh 249ca10765
Move Puffin subcommands to a pip namespace (#921)
## Summary

This makes the separation clearer between the legacy `pip` API and the
API we'll add in the future for the package manager itself. It also
enables seamless `puffin pip` aliasing for those that want it.

Closes #918.
2024-01-15 16:36:45 +00:00
Charlie Marsh e54fdea93f
Continue to respect `--find-links` with `--no-index` (#931)
Like `pip`, we should allow `--find-links` with `--no-index`.
2024-01-15 16:19:27 +00:00
Charlie Marsh 42888a9609
Share flat index across resolutions (#930)
## Summary

This PR restructures the flat index fetching in a few ways:

1. It now lives in its own `FlatIndexClient`, since it felt a bit
awkward (in my opinion) for it to live in `RegistryClient`.
2. We now fetch the `FlatIndex` outside of the resolver. This has a few
benefits: (1) the resolver construct is no longer `async` and no longer
returns `Result`, which feels better for a resolver; and (2) we can
share the `FlatIndex` across resolutions rather than re-fetching it for
every source distribution build.
2024-01-15 11:02:02 -05:00
Charlie Marsh e6d7124147
Add an extra struct around the package-to-flat index map (#923)
## Summary

`FlatIndex` is now the thing that's keyed on `PackageName`, while
`FlatDistributions` is what used to be called `FlatIndex` (a map from
version to `PrioritizedDistribution`, for a single package). I find this
a bit clearer, since we can also remove the `from_files` that doesn't
return `Self`, which I had trouble following.
2024-01-15 14:48:10 +00:00
Charlie Marsh 9a3f3d385c
Remove `PubGrubVersion` (#924)
## Summary

I'm running into some annoyances converting `&Version` to
`&PubGrubVersion` (which is just a wrapper type around `Version`), and I
realized... We don't even need `PubGrubVersion`?

The reason we "need" it today is due to the orphan trait rule: `Version`
is defined in `pep440_rs`, but we want to `impl
pubgrub::version::Version for Version` in the resolver crate.

Instead of introducing a new type here, which leads to a lot of
awkwardness around conversion and API isolation, what if we instead just
implement `pubgrub::version::Version` in `pep440_rs` via a feature? That
way, we can just use `Version` everywhere without any confusion and
conversion for the wrapper type.
2024-01-15 08:51:12 -05:00
konsti 8860a9c29e
Add flat index urls to registry wheel index (#928)
Previously, we were missing flat index wheels in the cache.
2024-01-15 10:21:59 +00:00
konsti 95f3cca28d
Use fs_err in more places (#926)
Before:

```
error: Failed to download distributions
  Caused by: Failed to fetch wheel: jaxlib==0.4.23+cuda12.cudnn89
  Caused by: Directory not empty (os error 39)
```

After:

```
error: Failed to download distributions
  Caused by: Failed to fetch wheel: jaxlib==0.4.23+cuda12.cudnn89
  Caused by: failed to rename file from /home/konsti/.cache/puffin/.tmpcG7tVP/jaxlib-0.4.23+cuda12.cudnn89-cp310-cp310-manylinux2014_x86_64.whl to /home/konsti/.cache/puffin/wheels-v0/index/9ff50b883297fa9d/jaxlib/jaxlib-0.4.23+cuda12.cudnn89-cp310-cp310-manylinux2014_x86_64
  Caused by: Directory not empty (os error 39)
```
2024-01-15 09:39:33 +00:00
Charlie Marsh 05029d219f
Remove `--find-links` limitation from README (#922)
These are now supported.
2024-01-15 03:09:15 +00:00
konsti 82ff136a74
Add find links supports to pip-sync (#914)
Closes #877
2024-01-15 03:04:55 +00:00
konsti f63776b894
Support HTML indexes in `--find-links` (#913)
The simple html format parser luckily seems to work for find links too,
at least it can parse
https://storage.googleapis.com/jax-releases/jax_cuda_releases.html.
2024-01-15 02:54:34 +00:00
konsti e9b6b6fa36
Implement `--find-links` as flat indexes (directories in pip-compile) (#912)
Add directory `--find-links` support for local paths to pip-compile.

It seems that pip joins all sources and then picks the best package. We
explicitly give find links packages precedence if the same exists on an
index and locally by prefilling the `VersionMap`, otherwise they are
added as another index and the existing rules of precedence apply.

Internally, the feature is called _flat index_, which is more meaningful
than _find links_: We're not looking for links, we're picking up local
directories, and (TBD) support another index format that's just a flat
list of files instead of a nested index.

`RegistryBuiltDist` and `RegistrySourceDist` now use `WheelFilename` and
`SourceDistFilename` respectively. The `File` inside `RegistryBuiltDist`
and `RegistrySourceDist` gained the ability to represent both a url and
a path so that `--find-links` with a url and with a path works the same,
both being locked as `<package_name>@<version>` instead of
`<package_name> @ <url>`. (This is more of a detail, this PR in general
still work if we strip that and have directory find links represented as
`<package_name> @ file:///path/to/file.ext`)

`PrioritizedDistribution` and `FlatIndex` have been moved to locations
where we can use them in the upstack PR.

I added a `scripts/wheels` directory with stripped down wheels to use
for testing.

We're lacking tests for correct tag priority precedence with flat
indexes, i only confirmed this manually since it is not covered in the
pip-compile or pip-sync output.

Closes #876
2024-01-15 02:04:10 +00:00
konsti 5ffbfadf66
Make hashes optional (#910)
There is no guarantee that indexes provide hashes at all or the sha256
we support specifically. [PEP
503](https://peps.python.org/pep-0503/#specification):

> The URL SHOULD include a hash in the form of a URL fragment with the
following syntax: #<hashname>=<hashvalue>, where <hashname> is the
lowercase name of the hash function (such as sha256) and <hashvalue> is
the hex encoded digest.

We instead use the url as input to generate a hash when caching.
2024-01-14 16:32:55 -05:00
Zanie Blue 9ad19b7e54
Bump to the latest packse version (#916) 2024-01-14 12:49:23 -06:00
konsti a53bdeba4c
Remove `base` from `RegistryBuiltDist` and `RegistrySourceDist` (#919)
Follow-up to https://github.com/astral-sh/puffin/pull/917 i found
rebasing the find-links PRs, this field became unused through the
absolute URLs.
2024-01-14 17:46:16 +00:00
Charlie Marsh 0374000ec0
Normalize extras when evaluating PEP 508 markers (#915)
## Summary

We always normalize extra names in our requirements (e.g., `cuda12_pip`
to `cuda12-pip`), but we weren't normalizing within PEP 508 markers,
which meant we ended up comparing `cuda12-pip` (normalized) against
`cuda12_pip` (unnormalized).

Closes https://github.com/astral-sh/puffin/issues/911.
2024-01-14 17:16:54 +00:00
konsti a99e5e00f2
Use absolute urls in `distribution_type::File` (#917)
Previously, the url on file could either be a relative or an absolute
url, depending on the index, and we would finalize it lazily. Now we
finalize the url when converting `pypi_types::File` to
`distribution_types::File`. This change is required to make the hashes
on `File` optional (https://github.com/astral-sh/puffin/pull/910), which
are currently the only unique field usable for caching.
2024-01-14 17:15:24 +00:00
Charlie Marsh 6e18e56789
Adjust markers to match target Python version (#909)
## Summary

This PR ensures that when the user passes in `--python-version`, we
adjust the _markers_ to match the target version, thus forcing us to
select compatible wheels for the `--python-version`, rather than the
installed version.

## Context

Let's call Python 3.10 the "installed" environment and Python 3.12 the
"target" environment. For each version, we have _both_ a Python version
(to match against `Requires-Python`) and a set of tags (to match against
wheels).

The rules for resolution are as follows...

- For each package, for each version, we try to find the "best
candidate" for resolution and installation.
- We first look for a wheel that's compatible with the _target_
environment. This requires testing against both the `Requires-Python`
and the markers. (We won't have to build or run this code, so the
_installed_ version is irrelevant.) **(This PR corrects _this_ bullet --
previously, we validated against the _installed_ markers, rather than
the target markers.)**
- If we can't find a compatible wheel, we accept any _incompatible_
wheel as long as there's a source distribution. The source distribution
_must_ be compatible with the target environment. (We won't have to
build or run this code, so the _installed_ version is irrelevant.)
- If there are no wheels, then the source distribution must be
compatible with _both_ the installed and target environments, since we
need to build it.

This is all true for the top-level resolution. When we perform a
sub-resolution (when resolving the build dependencies of a source
distribution), we should _only_ use the installed environment, and
ignore the target environment, since we assume that the dependencies
will be the same in both environments once built -- so our goal is
"just" to build the distribution, without concern for which build
dependencies it uses.

Closes https://github.com/astral-sh/puffin/issues/883.
2024-01-14 15:39:15 +00:00
Charlie Marsh 8187c05d8a
Use `DashMap` for redirects (#908)
## Summary

We don't need to wait on these, so it's simpler to use a standard
concurrent hash map.
2024-01-13 20:36:02 +00:00
Charlie Marsh f527f2add9
Remove erroneous local `Index` in resolver (#907) 2024-01-13 15:19:00 -05:00
Charlie Marsh 231686e71b
Remove `incompatibilities` from index (#905)
This isn't really part of the "index", it's part of the resolution.
2024-01-13 02:57:15 +00:00
Charlie Marsh 477186dcb3
Remove `ResolutionGraph#requirements` (#903) 2024-01-12 20:09:19 +00:00
Charlie Marsh d3f65c317d
Avoid some additional clones for `PackageName` (#896) 2024-01-12 17:54:40 +00:00
konsti aee6aed684
Make install_editable test faster (#901)
Remove a test case from the `install_editable` that slows it down from
3.6s to 6.5s while providing low test coverage. It also seems to block
other tests sometimes, `cargo nextest run -E "test(editable)"
--all-features` has more consistent and lower runtimes. Surprisingly
this seems to have bigger effect than switching from pyo3 to cffi.

Used test commands:
```
rm -rf scripts/editable-installs/maturin_editable/target/ && time cargo nextest run -E "test(=install_editable)" --all-features
rm -rf scripts/editable-installs/maturin_editable/target/ && time cargo nextest run -E "test(editable)" --all-features
 ```

Part of #878
2024-01-12 18:50:27 +01:00
konsti 878bc4bf8d
Stub out DTLSsocket test (#900)
Replace the DTLSsocket test with a dummy package that does nothing but
contain the build system specs that we need. This should speed up one of
the slowest tests.

Part of #878
2024-01-12 18:50:16 +01:00
konsti 174200519f
Don't install cargo insta on CI (#902)
We don't use it anymore on CI.
2024-01-12 17:45:37 +00:00
Charlie Marsh 06039e1293
Add hashes to `pip-compile` output (#894)
## Summary

Adds hashes to `pip-compile` output, though we don't actually check
those hashes in `pip-sync` yet.

Closes https://github.com/astral-sh/puffin/issues/131.
2024-01-12 12:44:19 -05:00
Andrew Gallant 1629141d67
bench: set log level to WARN, make --verbose raise it to INFO (#898)
I personally found the output by default somewhat noisy, especially for
large requirements files. Since --verbose is already a thing, I propose
making the extra output opt-in.
2024-01-12 10:07:58 -05:00
konsti 0cc98c771e
Fix a tracing panic (#899) 2024-01-12 14:47:58 +00:00
Andrew Gallant cf62d296b3
bench: ignore empty requirements lines (#897)
In particular, this script previously choked on the `home-assistant.in`
requirements file because it contains many empty lines.
2024-01-12 09:39:48 -05:00
Charlie Marsh 11b11d04a7
Ignore installed version when determining wheel compatibility (#890) 2024-01-12 08:57:00 -05:00
Charlie Marsh 5fd2c380a7
Add `into_cached_dist` to `LocalWheel` (#893)
Simplifies `unzip_wheel` a bit and avoids unnecessarily cloning in the
common case.
2024-01-12 09:01:30 +00:00
Charlie Marsh 35c1faa575
Move in-flight tracking to the download level (#892)
## Summary

Now that `get_or_build_wheel` will often _also_ handle the unzip step,
we need to move our per-target locking (`OnceMap`) up a level.
Previously, it was only applied to the unzip step, to prevent us from
attempting to unzip into the same target concurrently; now, it's applied
at the `get_wheel` level, which includes both downloading and unzipping.

## Test Plan

It seems like none of our existing tests catch this -- perhaps because
they're too "simple"? You need to run into a situation in which you're
doing multiple source distribution builds concurrently (since they'll
all try to download `setuptools`):

```
rm -rf foo && virtualenv --clear .venv && cargo run -p puffin-cli -- pip-compile ./scripts/requirements/pydantic.in  --verbose --cache-dir foo
```
2024-01-12 09:52:22 +01:00
Charlie Marsh 60cea0f07d
Use consistent parse terminology in pyproject error (#891)
We use `parse` for the other file types.
2024-01-11 21:25:47 -05:00
Zanie Blue 65c600b666
Use a larger runner for tests (#889)
Alternative to #875. Instead of partitioning tests across multiple
runners via nextest, we use a larger GitHub Actions runner.
Additionally, we explore using nextest to take advantage of the
increased number of cores.

On the 8-core machine, nextest is 22% faster than insta. In combination
with the vastly more readable output, I think this means we should
switch over. As noted in #875 we lose the ability to detect unreferenced
snapshot files but since we inline all of our snapshots this shouldn't
matter.

### Benchmarks

The following are the runtime of _just_ the test portion of the test job
in GitHub Actions except the partitioned case from #875 which requires a
separate build step making runner overhead relevant.

The compile times are noted as a reference as a possible lower bound of
test times. The compile time **is included** in all of the test times
shown.

Where the nextest thread count is not noted, it is inferred from the CPU
count.

```
test                                  time        diff
------------------------------------------------------
2-core (main)                         4m 53s
2-core-nextest-partioned (#875)       3m 56s      -19%
4-core-compilation                       32s      
4-core-insta                          1m 47s      -63%
4-core-nextest                        1m 40s      -66%
8-core-compilation                       18s      
8-core-insta                          1m  9s      -76%
8-core-nextest                        1m  5s      -78%
8-core-nextest-12-threads                54s      -82%
8-core-nextest-16-threads                55s      -82%
```


### Cost

We must pay per-minute costs for these runners:

> Larger runners are not eligible for the use of included minutes on
private repositories. For both private and public repositories, when
larger runners are in use, they will always be billed at the per-minute
rate.
>
> Compared to standard GitHub-hosted runners, larger runners are billed
differently. Larger runners are only billed at the per-minute rate for
the amount of time workflows are executed on them.


[[source]](https://docs.github.com/en/actions/using-github-hosted-runners/about-larger-runners/about-larger-runners#understanding-billing)

The per-minute rates are as follows:

> Linux	2	$0.008  (main)
> Linux	4	$0.016
> Linux	8	$0.032  (pull request)


[[source]](https://docs.github.com/en/billing/managing-billing-for-github-actions/about-billing-for-github-actions#per-minute-rates)

The per-minute cost increases by 4x but the workflow is 5.2x faster
since we are making use of the extra compute. We will not get any free
minutes executing these runners once the repository is public.
Additionally, we will not make use of our [3,000 minutes /
month](https://docs.github.com/en/billing/managing-billing-for-github-actions/about-billing-for-github-actions#included-storage-and-minutes)
of included minutes. Using the 8-core machines, the included 3,000
minutes should account for approximately ~$100.

Here's a brief analysis of costs from the last few
```

Minutes used
------------
November 1090 + 3000 = 4090
December 1357 + 3000 = 4357
January  2655 in 7 days
         ~3x more expected
                     = 11000 estimated

Costs
-----
November  1090 * 0.008   = $ 8.72
December  1357 * 0.008   = $10.86
January   8000 * 0.008   = $64 projected using 3000 included minutes and 2-core machines
          (11000 - (0.82 * 11000)) * 0.032
                         = $63 projected without included minutes and 4-core machines with perf improvement
          (11000 - (0.70 * 11000)) * 0.032
                         = $100 projected with a more conservative 70% reduction in total runtime

```

We can reduce costs (once public) by disabling larger runners for
non-organization users e.g.
https://github.com/PrefectHQ/prefect/pull/9519
2024-01-11 16:14:21 -06:00
Zanie Blue 90edfa8fe7
Use `mold` for linking in CI tests (#887)
Derived from https://github.com/astral-sh/puffin/pull/875

This gets us a significant speedup.

I would not read the commits individually. I can squash them but they
were used for testing various scenarios.

### Test compile times

Ranges are the lowest and highest I've seen. Huge variability in GitHub
Actions runners.

**Before:**
7m 21s - 8m 22s (cold cache)
110s - 120s (warm cache)

**After:**
6m 15s - 7m 05s (cold cache)
57s - 70s (warm cache)

**Improvement:**
4% - 25% (cold cache)
36% - 52% (warm cache)
2024-01-11 12:28:53 -06:00
bojanserafimov 4c047f858f
Remove InMemoryWheel and dead code (#879) 2024-01-11 10:11:07 -05:00
bojanserafimov 10227a74f8
Unzip while downloading (#856) 2024-01-11 09:41:46 -05:00
Charlie Marsh 4123a35228
Run `cargo update` (#873) 2024-01-11 09:10:07 -05:00
konsti 0dfbddd275
Shorten resolve many dev output (#885) 2024-01-11 13:53:13 +00:00
konsti 8c2b7d55af
Cleanup deps and docs (#882)
Fix warnings from `cargo +nightly udeps` and `cargo doc`.

Removes all mentions of regex from pep440_rs.
2024-01-11 10:43:40 +00:00
Zanie Blue d6fa628e11
Fix failing test (#880) 2024-01-11 00:41:37 +00:00
Zanie Blue d47eeccca8
Only write to rust cache in CI from main branch (#874)
Each cache entry is ~1 GB of our allotted 10 GB for the repository which
is quite a bit. We're probably losing cache entries all the time since
we add an entry per commit per pull request.

Saving the cache takes ~3 minutes
([example](https://github.com/astral-sh/puffin/actions/runs/7479909295/job/20358124969)),
it's probably just slowing down CI. It's ~25% of our test runtime and
~50% of our clippy runtime.
2024-01-10 15:04:27 -06:00
Zanie Blue 811332eacc
Improve handling of "full" version ranges (#868)
Reduces the number of implementation branches handling `Range:full`,
deferring it to `PackageRange`.
Improves some user-facing messages, e.g. saying `all versions of
<package>` instead of `<package>*`.
Changes the member names of the `PackageRangeKind` enum — they were not
very clear.
2024-01-10 21:03:55 +00:00
Zanie Blue a65c55ff4a
Say "cannot be used" and "must be used" instead of "forbidden" and "mandatory" (#867)
Closes #858
2024-01-10 20:49:40 +00:00
Zanie Blue 845ba6801d
Improve formatting of incompatible terms when there are two items (#866) 2024-01-10 20:36:54 +00:00
Zanie Blue 93d3093a2a
Improve formatting of package ranges in error messages (#864)
Closes #810
Closes https://github.com/astral-sh/puffin/issues/812
Requires https://github.com/zanieb/pubgrub/pull/19 and
https://github.com/zanieb/pubgrub/pull/18

- Always pair package ranges with names e.g. `... of a matching a<1.0`
instead of `... of a matching <1.0`
- Split range segments onto multiple lines when not a singleton as
suggested in
[#850](https://github.com/astral-sh/puffin/pull/850#discussion_r1446419610)
- Improve formatting when ranges are split across multiple lines e.g. by
avoiding extra spaces and improving wording

Note review will require expanding the hidden files as there are
significant changes to the report formatter and snapshots.

Bear with me here as these are definitely not perfect still.

The following changes build on top of this independently for further
improvements:
- #868 
- #867 
- #866 
- #871
2024-01-10 14:16:23 -06:00