Commit Graph

707 Commits

Author SHA1 Message Date
Zanie Blue 65c600b666
Use a larger runner for tests (#889)
Alternative to #875. Instead of partitioning tests across multiple
runners via nextest, we use a larger GitHub Actions runner.
Additionally, we explore using nextest to take advantage of the
increased number of cores.

On the 8-core machine, nextest is 22% faster than insta. In combination
with the vastly more readable output, I think this means we should
switch over. As noted in #875 we lose the ability to detect unreferenced
snapshot files but since we inline all of our snapshots this shouldn't
matter.

### Benchmarks

The following are the runtime of _just_ the test portion of the test job
in GitHub Actions except the partitioned case from #875 which requires a
separate build step making runner overhead relevant.

The compile times are noted as a reference as a possible lower bound of
test times. The compile time **is included** in all of the test times
shown.

Where the nextest thread count is not noted, it is inferred from the CPU
count.

```
test                                  time        diff
------------------------------------------------------
2-core (main)                         4m 53s
2-core-nextest-partioned (#875)       3m 56s      -19%
4-core-compilation                       32s      
4-core-insta                          1m 47s      -63%
4-core-nextest                        1m 40s      -66%
8-core-compilation                       18s      
8-core-insta                          1m  9s      -76%
8-core-nextest                        1m  5s      -78%
8-core-nextest-12-threads                54s      -82%
8-core-nextest-16-threads                55s      -82%
```


### Cost

We must pay per-minute costs for these runners:

> Larger runners are not eligible for the use of included minutes on
private repositories. For both private and public repositories, when
larger runners are in use, they will always be billed at the per-minute
rate.
>
> Compared to standard GitHub-hosted runners, larger runners are billed
differently. Larger runners are only billed at the per-minute rate for
the amount of time workflows are executed on them.


[[source]](https://docs.github.com/en/actions/using-github-hosted-runners/about-larger-runners/about-larger-runners#understanding-billing)

The per-minute rates are as follows:

> Linux	2	$0.008  (main)
> Linux	4	$0.016
> Linux	8	$0.032  (pull request)


[[source]](https://docs.github.com/en/billing/managing-billing-for-github-actions/about-billing-for-github-actions#per-minute-rates)

The per-minute cost increases by 4x but the workflow is 5.2x faster
since we are making use of the extra compute. We will not get any free
minutes executing these runners once the repository is public.
Additionally, we will not make use of our [3,000 minutes /
month](https://docs.github.com/en/billing/managing-billing-for-github-actions/about-billing-for-github-actions#included-storage-and-minutes)
of included minutes. Using the 8-core machines, the included 3,000
minutes should account for approximately ~$100.

Here's a brief analysis of costs from the last few
```

Minutes used
------------
November 1090 + 3000 = 4090
December 1357 + 3000 = 4357
January  2655 in 7 days
         ~3x more expected
                     = 11000 estimated

Costs
-----
November  1090 * 0.008   = $ 8.72
December  1357 * 0.008   = $10.86
January   8000 * 0.008   = $64 projected using 3000 included minutes and 2-core machines
          (11000 - (0.82 * 11000)) * 0.032
                         = $63 projected without included minutes and 4-core machines with perf improvement
          (11000 - (0.70 * 11000)) * 0.032
                         = $100 projected with a more conservative 70% reduction in total runtime

```

We can reduce costs (once public) by disabling larger runners for
non-organization users e.g.
https://github.com/PrefectHQ/prefect/pull/9519
2024-01-11 16:14:21 -06:00
Zanie Blue 90edfa8fe7
Use `mold` for linking in CI tests (#887)
Derived from https://github.com/astral-sh/puffin/pull/875

This gets us a significant speedup.

I would not read the commits individually. I can squash them but they
were used for testing various scenarios.

### Test compile times

Ranges are the lowest and highest I've seen. Huge variability in GitHub
Actions runners.

**Before:**
7m 21s - 8m 22s (cold cache)
110s - 120s (warm cache)

**After:**
6m 15s - 7m 05s (cold cache)
57s - 70s (warm cache)

**Improvement:**
4% - 25% (cold cache)
36% - 52% (warm cache)
2024-01-11 12:28:53 -06:00
bojanserafimov 4c047f858f
Remove InMemoryWheel and dead code (#879) 2024-01-11 10:11:07 -05:00
bojanserafimov 10227a74f8
Unzip while downloading (#856) 2024-01-11 09:41:46 -05:00
Charlie Marsh 4123a35228
Run `cargo update` (#873) 2024-01-11 09:10:07 -05:00
konsti 0dfbddd275
Shorten resolve many dev output (#885) 2024-01-11 13:53:13 +00:00
konsti 8c2b7d55af
Cleanup deps and docs (#882)
Fix warnings from `cargo +nightly udeps` and `cargo doc`.

Removes all mentions of regex from pep440_rs.
2024-01-11 10:43:40 +00:00
Zanie Blue d6fa628e11
Fix failing test (#880) 2024-01-11 00:41:37 +00:00
Zanie Blue d47eeccca8
Only write to rust cache in CI from main branch (#874)
Each cache entry is ~1 GB of our allotted 10 GB for the repository which
is quite a bit. We're probably losing cache entries all the time since
we add an entry per commit per pull request.

Saving the cache takes ~3 minutes
([example](https://github.com/astral-sh/puffin/actions/runs/7479909295/job/20358124969)),
it's probably just slowing down CI. It's ~25% of our test runtime and
~50% of our clippy runtime.
2024-01-10 15:04:27 -06:00
Zanie Blue 811332eacc
Improve handling of "full" version ranges (#868)
Reduces the number of implementation branches handling `Range:full`,
deferring it to `PackageRange`.
Improves some user-facing messages, e.g. saying `all versions of
<package>` instead of `<package>*`.
Changes the member names of the `PackageRangeKind` enum — they were not
very clear.
2024-01-10 21:03:55 +00:00
Zanie Blue a65c55ff4a
Say "cannot be used" and "must be used" instead of "forbidden" and "mandatory" (#867)
Closes #858
2024-01-10 20:49:40 +00:00
Zanie Blue 845ba6801d
Improve formatting of incompatible terms when there are two items (#866) 2024-01-10 20:36:54 +00:00
Zanie Blue 93d3093a2a
Improve formatting of package ranges in error messages (#864)
Closes #810
Closes https://github.com/astral-sh/puffin/issues/812
Requires https://github.com/zanieb/pubgrub/pull/19 and
https://github.com/zanieb/pubgrub/pull/18

- Always pair package ranges with names e.g. `... of a matching a<1.0`
instead of `... of a matching <1.0`
- Split range segments onto multiple lines when not a singleton as
suggested in
[#850](https://github.com/astral-sh/puffin/pull/850#discussion_r1446419610)
- Improve formatting when ranges are split across multiple lines e.g. by
avoiding extra spaces and improving wording

Note review will require expanding the hidden files as there are
significant changes to the report formatter and snapshots.

Bear with me here as these are definitely not perfect still.

The following changes build on top of this independently for further
improvements:
- #868 
- #867 
- #866 
- #871
2024-01-10 14:16:23 -06:00
konsti 4d8bfd7f61
Split source dist error type into error and kind (#872)
It's a better, less redundant error type. It will come in handy when
adding a second parse function.
2024-01-10 17:42:54 +00:00
bojanserafimov 6993f87037
Commit compiled requirement test files (#869) 2024-01-10 09:24:17 -05:00
Charlie Marsh fbb57b24dd
Add `--seed` flag to `venv` to allow seed package environments (#865)
## Summary

Installs the seed packages you get with `virtualenv`, but opt-in rather
than opt-out.

Closes https://github.com/astral-sh/puffin/issues/852.

## Test Plan

```
❯ ./scripts/benchmarks/venv.sh
+ hyperfine --runs 20 --warmup 3 --prepare 'rm -rf .venv' './target/release/puffin venv' --prepare 'rm -rf .venv' 'virtualenv --without-pip .venv' --prepare 'rm -rf .venv' 'python -m venv --without-pip .venv'
Benchmark 1: ./target/release/puffin venv
  Time (mean ± σ):       4.6 ms ±   0.2 ms    [User: 2.4 ms, System: 3.6 ms]
  Range (min … max):     4.3 ms …   4.9 ms    20 runs

  Warning: Command took less than 5 ms to complete. Note that the results might be inaccurate because hyperfine can not calibrate the shell startup time much more precise than this limit. You can try to use the `-N`/`--shell=none` option to disable the shell completely.

Benchmark 2: virtualenv --without-pip .venv
  Time (mean ± σ):      73.3 ms ±   0.3 ms    [User: 57.4 ms, System: 14.2 ms]
  Range (min … max):    72.8 ms …  74.0 ms    20 runs

Benchmark 3: python -m venv --without-pip .venv
  Time (mean ± σ):      22.5 ms ±   0.3 ms    [User: 17.0 ms, System: 4.9 ms]
  Range (min … max):    22.0 ms …  23.2 ms    20 runs

Summary
  './target/release/puffin venv' ran
    4.92 ± 0.20 times faster than 'python -m venv --without-pip .venv'
   16.00 ± 0.63 times faster than 'virtualenv --without-pip .venv'
+ hyperfine --runs 20 --warmup 3 --prepare 'rm -rf .venv' './target/release/puffin venv --seed' --prepare 'rm -rf .venv' 'virtualenv .venv' --prepare 'rm -rf .venv' 'python -m venv .venv'
Benchmark 1: ./target/release/puffin venv --seed
  Time (mean ± σ):      20.2 ms ±   0.4 ms    [User: 8.6 ms, System: 15.7 ms]
  Range (min … max):    19.7 ms …  21.2 ms    20 runs

Benchmark 2: virtualenv .venv
  Time (mean ± σ):     135.1 ms ±   2.4 ms    [User: 66.7 ms, System: 65.7 ms]
  Range (min … max):   133.2 ms … 142.8 ms    20 runs

Benchmark 3: python -m venv .venv
  Time (mean ± σ):      1.656 s ±  0.014 s    [User: 1.447 s, System: 0.186 s]
  Range (min … max):    1.641 s …  1.697 s    20 runs

Summary
  './target/release/puffin venv --seed' ran
    6.67 ± 0.17 times faster than 'virtualenv .venv'
   81.79 ± 1.70 times faster than 'python -m venv .venv'
```
2024-01-09 20:45:56 -05:00
Charlie Marsh 55f2be72e2
Default to PEP 517-based builds (#843)
## Summary

Our current setup uses the legacy `setup.py`-based builds if a
`pyproject.toml` file isn't present. This matches pip's behavior.
However, `pypa/build` uses PEP 517-based builds in such cases, and it
looks like pip plans to make that the default
(https://github.com/pypa/pip/issues/9175), with the limiting factor
being performance issues related to isolated builds.

This is now the default behavior, but the `--legacy-setup-py` flag
allows users to opt-in to using `setup.py` directly for distributions
that lack a `pyproject.toml`.
2024-01-10 01:27:06 +00:00
Charlie Marsh e26dc8e33d
Add support for `prepare_metadata_for_build_wheel` (#842)
## Summary

This PR adds support for `prepare_metadata_for_build_wheel`, which
allows us to determine source distribution metadata without building the
source distribution. This represents an optimization for the resolver,
as we can skip the expensive build phase for build backends that support
it.

For reference, `prepare_metadata_for_build_wheel` seems to be supported
by:

- `hatchling` (as of
[1.0.9](https://hatch.pypa.io/latest/history/hatchling/#hatchling-v1.9.0)).
- `flit`
- `setuptools`

In fact, it seems to work for every backend _except_ those using legacy
`setup.py`.

Closes #599.
2024-01-10 00:07:37 +00:00
konsti 858d5584cc
Use `Dist` in `VersionMap` (#851)
Refactoring split out from find links support: Find links files can be
represented as `Dist`, but not really as `File`, they don't have url nor
hashes.

`DistRequiresPython` is somewhat odd as an in between type.
2024-01-10 00:14:42 +01:00
konsti 1203f8f9e8
Gourgeist updates (#862)
* Use caching again
* Make clap feature only required for the cli/bin optional
2024-01-09 23:04:15 +00:00
bojanserafimov e67b7858e6
Use zlib-ng for faster decompression (#859) 2024-01-09 16:13:36 -05:00
Zanie Blue 34d548de21
Improve error messages when there are no versions of a singleton range (#855) 2024-01-09 15:09:52 -06:00
Charlie Marsh 33982efb25
Remove a TOCTOU read in build (#860)
We should just read and handle the not-found case, rather than checking
if the file doesn't exist first.
2024-01-09 20:33:08 +00:00
Charlie Marsh 31139aa88d
Add derive feature to `gourgeist` (#854)
Needed to build `gourgeist` directly, probably dropped during a
refactor.
2024-01-09 17:46:16 +00:00
Charlie Marsh 7f6aa1a236
Add a basic venv benchmark (#853) 2024-01-09 17:43:16 +00:00
konsti ee6d809b60
Remove unused `Result` (#849)
Remove some dead code, seems to be a refactoring oversight
2024-01-09 16:35:10 +00:00
konsti 643e5e4a49
Use pdm for black editable as PEP 621 test case (#848)
This gives us a PEP 621 test package in tree and increases the diversity
for the editable tests a bit.
2024-01-09 16:33:05 +00:00
konsti 5b0b072e3c
Allow files >4GB on 32-bit platforms (#847)
Changes `File::size` from a `usize` to a `u64`.

The motivations are that with tensorflow wheels being 475 MB
(https://pypi.org/project/tensorflow/2.15.0.post1/#files), we're already
only one order of magnitude away and to avoid target dependent failures.
2024-01-09 17:31:49 +01:00
Charlie Marsh ee3a6431c7
Show available pre-releases in error hints (#844)
## Summary

If pre-releases are available for a package that we otherwise couldn't
resolve, we now show a hint that includes one of the example versions.

Closes https://github.com/astral-sh/puffin/issues/811.
2024-01-09 09:58:38 -05:00
konsti b1edecdf1f
Filter out files with invalid requires python specifiers (#775)
Instead of trying to fixup _all_ the invalid version specifiers on pypi
and elsewhere, this filters out distributions with invalid
`requires-python` version specifiers that even
`LenientVersionSpecifiers` couldn't parse, as opposed to failing
entirely, which we currently do.

I would be nicer to model through an invalid distribution pubgrub type,
together with e.g. source dists with an unknown extension, so that the
version itself still shows up in the error trace.

At the same time, we reduce the log level for fixups from warning to
trace, as they are not actionable for the user.
2024-01-09 02:46:27 +00:00
Zanie Blue 64da1f0306
Always pair package names with ranges in error messages (#838)
Adjusts display of "no versions available" in error messages to be
consistent with other package/range pairings i.e. we usually display
"<package-name><range>".
2024-01-08 22:11:10 +00:00
Charlie Marsh 19c6d655b5
Avoid duplicated source distribution handling in url (#841)
## Summary

Right now, both the callback _and_ the "We have no compatible wheel"
paths have a lot of repeated code. This PR changes the callback to
_just_ remove all the wheels and handle the download, and the rest of
the method following the callback is responsible for finding and
building any wheels.
2024-01-08 16:19:54 -05:00
Charlie Marsh cc9140643e
Rename `metadata` to `built_wheel` in `source/mod.rs` (#840) 2024-01-08 19:20:20 +00:00
Charlie Marsh df254087d9
Break `source_dist.rs` into a module (#839)
## Summary

Finding this file hard to edit and work in since it's gotten quite
large.
2024-01-08 19:14:45 +00:00
Zanie Blue 2b0c2e294b
Fix formatting of negated singleton versions in error messages (#836)
Closes #805 
Requires https://github.com/zanieb/pubgrub/pull/17
2024-01-08 12:33:01 -06:00
Zanie Blue c3d37db85b
Check locked dependencies in CI (#837) 2024-01-08 18:18:40 +00:00
Charlie Marsh aeefe65227
Fix `tracing-duration-export` compilation (#835)
## Summary

I'm unable to run `puffin-cli` on `main` as the
`tracing-durations-export` is marked as optional, but the crate actually
depends on it to compile. Further, without `tracing-durations-export`,
there are `Option` types that can't resolve to a concrete type.

This PR fixes compilation with and without the feature.
2024-01-08 18:04:23 +00:00
Charlie Marsh c06bf658bb
Remove some filesystem calls from the installer (#834)
Noticed these when working on something unrelated. Generally:

- Prefer `entry.file_type()` over `entry.path().is_file()` or similar,
as the former is almost always free on Unix.
- Call `entry.path()` once, since it allocates internally (returns a
`PathBuf`).
2024-01-08 12:59:01 -05:00
bojanserafimov cd6265a66d
Fix typo in bench docs (#833) 2024-01-08 12:35:26 -05:00
konsti 004147d441
Add tracing_durations_export feature to puffin-cli (#830)
The optional `tracing-durations-export` feature allows creating
parallelism plots from all puffin-cli commands without affecting
production builds.

Usage:

```
virtualenv --clear -p 3.10 .venv310 && TRACING_DURATIONS_FILE=target/traces/jupyter-no-cache.ndjson RUST_LOG=puffin=info VIRTUAL_ENV=.venv310 cargo run --bin puffin --profile profiling --features tracing-durations-export -- pip-install -v --no-cache jupyter
virtualenv --clear -p 3.10 .venv310 && TRACING_DURATIONS_FILE=target/traces/jupyter.ndjson RUST_LOG=puffin=info VIRTUAL_ENV=.venv310 cargo run --bin puffin --profile profiling --features tracing-durations-export -- pip-install -v jupyter
 ```

Output, plotted in collapsed mode for readability:

Cached jupyter:

![jupyter](https://github.com/astral-sh/puffin/assets/6826232/f7e03c68-0438-4cf4-bceb-9a4a146cc506)

Uncached jupyter:

![image](https://github.com/astral-sh/puffin/assets/6826232/cfdd3383-7a9d-43d6-b8d0-201f64611596)
2024-01-08 16:20:45 +01:00
konsti b6338b5e4a
Use tracing-durations-export to visualize parallelism bottlenecks (dev commands) (#816)
Example usage:

```
# Cached
TRACING_DURATIONS_FILE=target/traces/black.ndjson RUST_LOG=puffin=info cargo run --bin puffin-dev --profile profiling -- resolve black
TRACING_DURATIONS_FILE=target/traces/meine_stadt_transparent.ndjson RUST_LOG=puffin=info cargo run --bin puffin-dev --profile profiling -- resolve meine_stadt_transparent
TRACING_DURATIONS_FILE=target/traces/jupyter.ndjson RUST_LOG=puffin=info cargo run --bin puffin-dev --profile profiling -- resolve jupyter

# No cache
TRACING_DURATIONS_FILE=target/traces/black-no-cache.ndjson RUST_LOG=puffin=info cargo run --bin puffin-dev --profile profiling -- resolve --no-cache black
TRACING_DURATIONS_FILE=target/traces/meine_stadt_transparent-no-cache.ndjson RUST_LOG=puffin=info cargo run --bin puffin-dev --profile profiling -- resolve --no-cache meine_stadt_transparent
TRACING_DURATIONS_FILE=target/traces/jupyter-no-cache.ndjson RUST_LOG=puffin=info cargo run --bin puffin-dev --profile profiling -- resolve --no-cache jupyter
```

Uncached black output example:


![black-no-cache](https://github.com/astral-sh/puffin/assets/6826232/38497b89-7214-453b-9456-c9d9cbf7d2d5)
2024-01-08 16:20:38 +01:00
konsti 243392f718
`cargo run` run `puffin` by default (#831)
`cargo run` now runs `puffin` by default. `cargo run --bin puffin-dev`
remains working.
2024-01-08 12:49:06 +00:00
konsti 3f587156ec
Improve install instrumentation (#829)
Add tracing spans to different phases of the wheel installation.
2024-01-08 10:13:59 +00:00
konsti 2d4743d782
Update compare_with_pip.py (#828)
Preparing for #817
2024-01-08 09:46:30 +00:00
konsti 60ba7dd14f
Use `std::io::read_to_string` (#826)
The `std::io::read_to_string` shorthand was stabilized in 1.65.
2024-01-08 09:15:38 +00:00
Charlie Marsh 54838914be
Migrate back to `owo-colors` (#824)
In the past, I moved us to `owo-colors`
(https://github.com/astral-sh/puffin/pull/121); then, we moved back,
because we ran into issues with overriding the settings to force-disable
colors. But `anstream` solved those problems, so I'm moving us _back_ to
`owo-colors`, since it's what `anstream` recommends, and it's already
used by many of our dependencies (`miette`, `configparser`).

---------

Co-authored-by: konstin <konstin@mailbox.org>
2024-01-08 08:54:57 +00:00
Charlie Marsh 17452e3e64
Simplify ranges in pre-release hints (#825)
Closes https://github.com/astral-sh/puffin/issues/807.
2024-01-07 12:40:22 -05:00
Charlie Marsh e6fcb9c4d3
Use `anstream` for all color control (#823)
## Summary

We can use `anstream` for all color control, rather than going through
`colored`. Note that we still need the `colored` crate, since `colored`
and `anstream` solve different problems. (`anstream` recommends using
`owo-colors` alongside it, but `colored` seems to work fine?)

Resolves the issue raised in
https://github.com/astral-sh/puffin/pull/742 via `anstream` rather than
`colored`.

Closes https://github.com/astral-sh/puffin/issues/782.
2024-01-06 20:44:05 -05:00
Charlie Marsh fed492831a
Inline some format placeholders (#822) 2024-01-06 23:13:44 +00:00
Charlie Marsh 77c3a67029
Remove `pub(crate)` from `RegistryClient` fields (#821) 2024-01-06 22:05:18 +00:00