Commit Graph

182 Commits

Author SHA1 Message Date
konsti e23292641f
Add pypi 10k packages with most dependents dataset (#711)
From manual inspection, this dataset generated through the [libraries.io
API](https://libraries.io/api#project-search) seems more mainstream than
the current 8k one, which is also preserved. I've added the dataset to
the repo because the API requires an API key.
2023-12-24 18:31:52 +00:00
Charlie Marsh 5bce699ee1
Add support for HTML indexes (#719)
## Summary

This PR adds support for HTML index responses (as with
`--index-url=https://download.pytorch.org/whl`).

Closes https://github.com/astral-sh/puffin/issues/412.
2023-12-24 16:04:00 +00:00
konsti e60f0ec732
Update pubgrub (#713)
Easier than i expected: We simply never construct the pubgrub error
variants since we have our own main loop. The `unreachable!()`s can be
removed when never is stabilized
2023-12-20 23:56:59 +01:00
Charlie Marsh 98fcb76015
Lock entire virtualenv during modifying commands (#695)
These commands all assume that the `site-packages` are constant
throughout.

Closes #691.
2023-12-18 16:44:45 -05:00
konsti 89ca0d68b9
`exclude_newer` in puffin-dev resolve-cli (#684)
Internal dev tool change.
2023-12-18 14:06:54 +00:00
konsti f059c6e6a6
Support editable in pip-sync and pip-compile (#587)
Support `-e path/do/dir` in pip-sync and and pip-compile.
2023-12-16 22:37:34 +00:00
konsti 71964ec7a8
Switch to msgpack in the cached client (#662)
This gives a 1.23 speedup on transformers-extras. We could change to
msgpack for the entire cache if we want. I only tried this format and
postcard so far, where postcard was much slower (like 1.6s).

I don't actually want to merge it like this, i wanted to figure out the
ballpark of improvement for switching away from json.

```
hyperfine --warmup 3 --runs 10 "target/profiling/puffin pip-compile --cache-dir cache-msgpack scripts/requirements/transformers-extras.in" "target/profiling/branch pip-compile scripts/requirements/transformers-extras.in"
Benchmark 1: target/profiling/puffin pip-compile --cache-dir cache-msgpack scripts/requirements/transformers-extras.in
  Time (mean ± σ):     179.1 ms ±   4.8 ms    [User: 157.5 ms, System: 48.1 ms]
  Range (min … max):   174.9 ms … 188.1 ms    10 runs

Benchmark 2: target/profiling/branch pip-compile scripts/requirements/transformers-extras.in
  Time (mean ± σ):     221.1 ms ±   6.7 ms    [User: 208.1 ms, System: 46.5 ms]
  Range (min … max):   213.5 ms … 235.5 ms    10 runs

Summary
  target/profiling/puffin pip-compile --cache-dir cache-msgpack scripts/requirements/transformers-extras.in ran
    1.23 ± 0.05 times faster than target/profiling/branch pip-compile scripts/requirements/transformers-extras.in
```

Disadvantage: We can't manually look into the cache anymore to debug
things

- [ ] Check more formats, i currently only tested json, msgpack and
postcard, there should be other formats, too
- [x] Switch over `CachedByTimestamp` serialization (for the interpreter
caching)
- [x] Switch over error handling and make sure puffin is still resilient
to cache failure
2023-12-16 21:01:35 +00:00
konsti 620f73b38b
Speed up version parsing for a 1.27±0.03 speedup in transformers-extras with conservative changes (#660)
Two low-hanging fruits as optimizations for version parsing: A fast path
for release only versions and removing the regex from version specifiers
(still calling into version's parsing regex if required). This enables
optimizing the serde format since we now see the serde part instead of
only PEP 440 parsing. I intentionally didn't rewrite the full PEP 440 at
this step.

```console
$ hyperfine --warmup 5 --runs 50 "target/profiling/puffin pip-compile scripts/requirements/transformers-extras.in" "target/profiling/main pip-compile scripts/requirements/transformers-extras.in"
  Benchmark 1: target/profiling/puffin pip-compile scripts/requirements/transformers-extras.in
    Time (mean ± σ):     217.1 ms ±   3.2 ms    [User: 194.0 ms, System: 55.1 ms]
    Range (min … max):   211.0 ms … 228.1 ms    50 runs

  Benchmark 2: target/profiling/main pip-compile scripts/requirements/transformers-extras.in
    Time (mean ± σ):     276.7 ms ±   5.7 ms    [User: 252.4 ms, System: 54.6 ms]
    Range (min … max):   268.9 ms … 303.5 ms    50 runs

  Summary
    target/profiling/puffin pip-compile scripts/requirements/transformers-extras.in ran
      1.27 ± 0.03 times faster than target/profiling/main pip-compile scripts/requirements/transformers-extras.in
```

---------

Co-authored-by: Andrew Gallant <andrew@astral.sh>
2023-12-15 14:03:35 -05:00
Charlie Marsh 9470c20e7a
Avoid double resolution during source builds (#656)
## Summary

This PR ensures that we re-use the resolution to install the build
dependencies when building a source distribution. Currently, we only
pass along the list of requirements, and then use the `Finder` to map
each requirement to a distribution. But we already determine the correct
distribution when resolving!

Closes https://github.com/astral-sh/puffin/issues/655.
2023-12-15 17:27:16 +00:00
Charlie Marsh ed8dfbfcf7
Preserve verbatim URLs (#639)
## Summary

This PR adds a `VerbatimUrl` struct to preserve verbatim URLs throughout
the resolution and installation pipeline. In short, alongside the parsed
`Url`, we also keep the URL as written by the user. This enables us to
display the URL exactly as written by the user, rather than the
serialized path that we use internally.

This will be especially useful once we start expanding environment
variables since, at that point, we'll be able to write the version of
the URL that includes the _unexpected_ environment variable to the
output file.
2023-12-14 15:03:39 +00:00
Charlie Marsh db7e2dedbb
Move archive extraction into its own crate (#647)
We have some shared utilities beyond `puffin-build` and
`puffin-distribution`, and further, I want to be able to access the
sdist archive extraction logic from `puffin-distribution`. This is
really generic, so moving into its own crate.
2023-12-14 04:49:09 +00:00
Charlie Marsh 920e10fc8f
Use `FxHash` consistently (#632) 2023-12-13 05:36:03 +00:00
Charlie Marsh a24eb57e93
Make warnings user-facing (#628)
## Summary

Now, `puffin_warnings::warn_once` and `puffin_warnings::warn` will go to
`stderr`, as long as the user isn't running under `--quiet`. Previously,
these went through `tracing`, and so were only visible when running
under `--verbose`.
2023-12-12 21:24:38 -05:00
Zanie Blue 490fb55ac5
Use available versions to simplify unsat error reports (#547)
Uses https://github.com/pubgrub-rs/pubgrub/pull/156 to consolidate
version ranges in error reports using the actual available versions for
each package.

Alternative to https://github.com/zanieb/pubgrub/pull/8 which implements
this behavior as a method in the `Reporter` — here it's implemented in
our custom report formatter (#521) instead which requires no upstream
changes.

Requires https://github.com/zanieb/pubgrub/pull/11 to only retrieve the
versions for packages that will be used in the report.

This is a work in progress. Some things to do:
- ~We may want to allow lazy retrieval of the version maps from the
formatter~
- [x] We should probably create a separate error type for no solution
instead of mixing them with other resolve errors
- ~We can probably do something smarter than creating vectors to hold
the versions~
- [x] This degrades error messages when a single version is not
available, we'll need to special case that
- [x] It seems safer to coerce the error type in `resolve` instead of
`solve` if feasible
2023-12-12 23:25:16 +00:00
Charlie Marsh 1181288078
Download, build, and install in a single pipeline phase (#605)
## Summary

At present, we have two separate phases within the installation pipeline
related to populating wheels into the cache. The first phase downloads
the distribution, and then builds any source distributions into wheels;
the second phase unzips all the built wheels into the cache.

This PR merges those two phases into one, such that we seamlessly
download, build, and unzip wheels in one pass. This is more efficient,
since we can start unzipping while we build. It also ensures that if the
install _fails_ partway through, we don't end up with a bunch of
downloaded wheels that we never had a chance to unzip. The code is also
much simpler.

The main downside is that the user-facing feedback isn't as granular,
since we only have one phase and one progress bar for what was
originally three distinct phases.

Closes https://github.com/astral-sh/puffin/issues/571.

## Test Plan

I ran the benchmark script on two separate requirements files, and saw a
7% and 31% speedup respectively:

```text
+ TARGET=./scripts/benchmarks/requirements.txt
+ hyperfine --runs 100 --warmup 10 --prepare 'virtualenv --clear .venv' './target/release/main pip-sync ./scripts/benchmarks/requirements.txt --no-cache' --prepare 'virtualenv --clear .venv' './target/release/puffin pip-sync ./scripts/benchmarks/requirements.txt --no-cache'
Benchmark 1: ./target/release/main pip-sync ./scripts/benchmarks/requirements.txt --no-cache
  Time (mean ± σ):     269.4 ms ±  33.0 ms    [User: 42.4 ms, System: 117.5 ms]
  Range (min … max):   221.7 ms … 446.7 ms    100 runs

Benchmark 2: ./target/release/puffin pip-sync ./scripts/benchmarks/requirements.txt --no-cache
  Time (mean ± σ):     250.6 ms ±  28.3 ms    [User: 41.5 ms, System: 127.4 ms]
  Range (min … max):   207.6 ms … 336.4 ms    100 runs

Summary
  './target/release/puffin pip-sync ./scripts/benchmarks/requirements.txt --no-cache' ran
    1.07 ± 0.18 times faster than './target/release/main pip-sync ./scripts/benchmarks/requirements.txt --no-cache'
```

```text
+ TARGET=./scripts/benchmarks/requirements-large.txt
+ hyperfine --runs 100 --warmup 10 --prepare 'virtualenv --clear .venv' './target/release/main pip-sync ./scripts/benchmarks/requirements-large.txt --no-cache' --prepare 'virtualenv --clear .venv' './target/release/puffin pip-sync ./scripts/benchmarks/requirements-large.txt --no-cache'
Benchmark 1: ./target/release/main pip-sync ./scripts/benchmarks/requirements-large.txt --no-cache
  Time (mean ± σ):      5.053 s ±  0.354 s    [User: 1.413 s, System: 6.710 s]
  Range (min … max):    4.584 s …  6.333 s    100 runs

Benchmark 2: ./target/release/puffin pip-sync ./scripts/benchmarks/requirements-large.txt --no-cache
  Time (mean ± σ):      3.845 s ±  0.225 s    [User: 1.364 s, System: 6.970 s]
  Range (min … max):    3.482 s …  4.715 s    100 runs

Summary
  './target/release/puffin pip-sync ./scripts/benchmarks/requirements-large.txt --no-cache' ran
```
2023-12-11 15:42:29 +00:00
Charlie Marsh 32f54a5947
Use async `Command` for wheel build operations (#601)
Incredibly, this speeds up the install on a large project from 2m6s to
50s.
2023-12-09 16:20:52 +00:00
Charlie Marsh a24534b0ce
Use `rustc-hash` instead of `fxhash` crate (#594)
`fxhash` is the old, less maintained version of this crate
(`rustc-hash`). We use the latter in Ruff.
2023-12-08 20:27:49 +00:00
konsti 6005d7a552
Keep track of in flight unzips using `OnceMap` (#544)
I saw warnings when we were e.g. unzipping wheel and setuptools in two
tasks at the same time. We now keep track of in flight unzips.

This introduces a `OnceMap` abstraction which we also use in the
resolver.
2023-12-08 20:18:11 +00:00
Charlie Marsh 4b8642c6f7
Enable selective cache purging in `puffin clean` (#589)
## Summary

This PR enables `puffin clean` to accept package names as command line
arguments, and selectively purge entries from the cache tied to the
given package.

Relate to #572.

## Test Plan

Modified all the caching tests to run an additional step to (1) purge
the cache, and (2) re-install the package.
2023-12-08 19:51:32 +00:00
Zanie Blue ef7be9103c
Parse `SimpleJson` into categorized data in the client (#522)
Extends #517 with a suggestion from @konstin to parse the `SimpleJson`
into an intermediate type `SimpleMetadata(BTreeMap<Version,
VersionFiles>)` before converting to a `VersionMap`. This reduces the
number of times we need to parse the response. Additionally, we cache
the parsed response now instead of `SimpleJson`.

`VersionFiles` stores two vectors with
`WheelFilename`/`SourceDistFilename` and `File` tuples. These can be
iterated over together or separately. A new enum `DistFilename` was
added to capture the `SourceDistFilename` and `WheelFilename` variants
allowing iteration over both vectors.
2023-12-07 11:04:47 -06:00
Charlie Marsh aa065f5c97
Modify install plan to support all distribution types (#581)
This PR adds caching support for built wheels in the installer.
Specifically, the `RegistryWheelIndex` now indexes both downloaded and
built wheels (from registries), and we have a new `BuiltWheelIndex` that
takes a subdirectory and returns the "best-matching" compatible wheel.

Closes #570.
2023-12-07 04:43:34 +00:00
konsti 366c389385
Parse editable installs (#564)
Parse `-e` for editable installs in `requirements.txt`.

Unlike all the other requirements, editable installs don't have the name
of the package specified.
2023-12-06 18:21:15 +01:00
konsti 3f4d7b7826
Improve path source dist caching (#578)
Path distribution cache reading errors are no longer fatal.

We now invalidate the path file source dists if its modification
timestamp changed, and invalidate path dir source dists if
`pyproject.toml` or alternatively `setup.py` changed, which seems good
choices since changing pyproject.toml should trigger a rebuild and the
user can `touch` the file as part of their workflow.

`CachedByTimestamp` is now a shared util. It doesn't have methods as i
don't think it's worth it yet for two users.

Closes #478

TODO(konstin): Write a test. This is probably twice as much work as that
fix itself, so i made that PR without one for now.
2023-12-06 11:47:01 -05:00
Charlie Marsh a15da36d74
Avoid removing local wheels when unzipping (#560)
## Summary

When installing a local wheel, we need to avoid removing the zipped
wheel (since it lives outside of the cache), _and_ need to ensure that
we unzip the wheel into the cache (rather than replacing the zipped
wheel, which may even live outside of the project).

Closes https://github.com/astral-sh/puffin/issues/553.
2023-12-05 17:50:08 +00:00
Charlie Marsh 6f055ecf3b
Remove existing built wheels when building source distributions (#559)
This PR modifies the source distribution building to replace any
existing targets after building the new wheel. In some cases, the
existence of an existing target may be indicative of a bug, so we warn.
It's partially a workaround for some (but not all) of the errors in
https://github.com/astral-sh/puffin/issues/554.
2023-12-05 12:45:24 -05:00
Zanie Blue 37ca2e2928
Bump pubgrub for latest upstream (#525)
https://github.com/pubgrub-rs/pubgrub/pull/157
2023-12-04 09:09:30 -06:00
konsti 6dc8ebcb90
Test interpreter cache invalidation (#540)
Add missing test for #529/#508.
2023-12-04 10:03:43 +00:00
Charlie Marsh ee2fca3a48
Add CACHEDIR and .gitignore tags to cache directories (#526)
## Summary

Even if this will typically be in the user's application folder (rather
than a local directory), it's still a good practice.

Closes https://github.com/astral-sh/puffin/issues/280.
2023-12-02 00:37:51 +00:00
konsti 9806901a16
Consolidate wheel caches (#524)
After this change, two wheel caches remain: `built-wheels-v0` and
`wheels-v0`, docs screenshots below. Each contains both the wheel
metadata, cache policy and zip or unzipped wheels under the same name.

The zipped/unzipped strategy is as follows: In `pip-compile`, when we
build a wheel, we store it zipped. When `pip-sync` or a source dist
build in `pip-compile` need to install the wheel, we unzip it, remove
the file and replace it with the unzipped wheel.

This removes `WheelCache` and `UrlIndex` in favor of `Cache` plus
`WheelCache`. The non-built wheel cache now considers index urls and the
url for url wheels.

I'm unsure if we need the `Unzipper` type, this could just be a
function.

I move `no_index` into `IndexUrls` and started using `IndexUrl` up to
the clap level.

I left a number of TODOs in the code, namely performing the actual
invalidation of unzipped wheels and making the `InstallPlan` understand
cache invalidation (i.e. uninstall wheels when their remote changed).


![image](https://github.com/astral-sh/puffin/assets/6826232/c4d45979-485b-4954-848d-fd3347ee2510)
2023-12-01 20:16:33 +00:00
Zanie Blue 2a8544df9e
Use a custom pubgrub report formatter (#521)
Uses https://github.com/zanieb/pubgrub/pull/10 to drastically simplify
our reporter implementation. This will allow us to make use of upstream
improvements to the reporter e.g.
https://github.com/zanieb/pubgrub/pull/8 without multiple duplicative
pull requests.
2023-12-01 13:36:12 -06:00
Zanie Blue efcc4f1409
Use upstream commit for reflink-copy requirement (#523)
https://github.com/cargo-bins/reflink-copy/pull/51 was merged
2023-12-01 10:58:24 +00:00
Zanie Blue 5f1f207628
Recursively merge existing package directories on installation (#516)
Previously, when installing a package we would delete the target
directory before copying (or linking) the contents of the package.
However, this means that we do not properly support namespace packages
which can share a target directory. Instead the last package to be
installed would be override existing packages. Since we install packages
in parallel, this could result in a race condition where the target
directory already exists which is not allowed when using `clonefile`.
See example error in #515.
c7e63d2dce
provides a regression test for this — it fails on `main`.

Here, we implement a recursive merge when the target directory already
exists. Both packages will be installed into the same directory. We no
longer delete the target directory, which seems okay since we uninstall
packages before installing now.

When files conflict, we will likely throw an error still. The correct
behavior to implement in this case is unclear, as if we just take "first
write wins" or "last write wins" we could end up with some files from
one package and some from another resulting in two broken packages. A
possible solution here is to lock the target directories while copying.
2023-11-30 10:14:51 -06:00
konsti 929df586fb
Skip tf-models-nightly in resolve-many dev script for now (#510)
`tf-models-nightly` has pathologic backtracking behaviour, skip it for
now so we can benchmark the rest.
2023-11-28 18:25:32 +00:00
konsti d89fbeb642
Migrate interpreter query to custom caching (#508)
This removes the last usage of cacache by replacing it with a custom,
flat json caching keyed by the digest of the executable path.


![image](https://github.com/astral-sh/puffin/assets/6826232/8f777c4c-1f1b-4656-ba7b-002175270556)

A step towards #478. I've made `CachedByTimestamp<T>` generic over `T`
but intentionally not moved it to `puffin-cache` yet.
2023-11-28 17:14:59 +00:00
konsti 5435d44756
Introduce `Cache`, `CacheBucket` and `CacheEntry` (#507)
This is mostly a mechanical refactor that moves 80% of our code to the
same cache abstraction.

It introduces cache `Cache`, which abstracts away the path of the cache
and the temp dir drop and is passed throughout the codebase. To get a
specific cache bucket, you need to requests your `CacheBucket` from
`Cache`. `CacheBucket` is the centralizes the names of all cache
buckets, moving them away from the string constants spread throughout
the crates.

Specifically for working with the `CachedClient`, there is a
`CacheEntry`. I'm not sure yet if that is a strict improvement over
`cache_dir: PathBuf, cache_file: String`, i may have to rotate that
later.

The interpreter cache moved into `interpreter-v0`.

We can use the `CacheBucket` page to document the cache structure in
each bucket:


![image](https://github.com/astral-sh/puffin/assets/6826232/b023fdfb-e34d-4c2d-8663-b5f73937a539)
2023-11-28 17:11:14 +00:00
konsti 8855f44b5f
Move simple index queries to `CachedClient` (#504)
Replaces the usage of `http-cache-reqwest` for simple index queries with
our custom cached client, removing `http-cache-reqwest` altogether.

The new cache paths are `<cache>/simple-v0/<index>/<package_name>.json`.
I could not test with a non-pypi index since i'm not aware of any other
json indices (jax and torch are both html indices).

In a future step, we can transform the response to be a
`HashMap<Version, {source_dists: Vec<(SourceDistFilename, File)>,
wheels: Vec<(WheeFilename, File)>}` (independent of python version, this
cache is used by all environments together). This should speed up cache
deserialization a bit, since we don't need to try source dist and wheel
anymore and drop incompatible dists, and it should make building the
`VersionMap` simpler. We can speed this up even further by splitting
into a version lists and the info for each version. I'm mentioning this
because deserialization was a major bottleneck in the rust part of the
old python prototype.

Fixes #481
2023-11-28 00:11:03 +00:00
konsti d54e780843
Source dist metadata refactor (#468)
## Summary and motivation

For a given source dist, we store the metadata of each wheel built
through it in `built-wheel-metadata-v0/pypi/<source dist
filename>/metadata.json`. During resolution, we check the cache status
of the source dist. If it is fresh, we check `metadata.json` for a
matching wheel. If there is one we use that metadata, if there isn't, we
build one. If the source is stale, we build a wheel and override
`metadata.json` with that single wheel. This PR thereby ties the local
built wheel metadata cache to the freshness of the remote source dist.
This functionality is available through `SourceDistCachedBuilder`.

`puffin_installer::Builder`, `puffin_installer::Downloader` and
`Fetcher` are removed, instead there are now `FetchAndBuild` which calls
into the also new `SourceDistCachedBuilder`. `FetchAndBuild` is the new
main high-level abstraction: It spawns parallel fetching/building, for
wheel metadata it calls into the registry client, for wheel files it
fetches them, for source dists it calls `SourceDistCachedBuilder`. It
handles locks around builds, and newly added also inter-process file
locking for git operations.

Fetching and building source distributions now happens in parallel in
`pip-sync`, i.e. we don't have to wait for the largest wheel to be
downloaded to start building source distributions.

In a follow-up PR, I'll also clear built wheels when they've become
stale.

Another effect is that in a fully cached resolution, we need neither zip
reading nor email parsing.

Closes #473

## Source dist cache structure 

Entries by supported sources:
 * `<build wheel metadata cache>/pypi/foo-1.0.0.zip/metadata.json`
* `<build wheel metadata
cache>/<sha256(index-url)>/foo-1.0.0.zip/metadata.json`
* `<build wheel metadata
cache>/url/<sha256(url)>/foo-1.0.0.zip/metadata.json`
But the url filename does not need to be a valid source dist filename

(<https://github.com/search?q=path%3A**%2Frequirements.txt+master.zip&type=code>),
so it could also be the following and we have to take any string as
filename:
* `<build wheel metadata
cache>/url/<sha256(url)>/master.zip/metadata.json`

Example:
```text
# git source dist
pydantic-extra-types @ git+https://github.com/pydantic/pydantic-extra-types.git
# pypi source dist
django_allauth==0.51.0
# url source dist
werkzeug @ ff1904eb5e2853bf83db817a7dd53d/werkzeug-3.0.1.tar.gz
```
will be stored as
```text
built-wheel-metadata-v0
├── git
│   └── 5c56bc1c58c34c11
│       └── 843b753e9e8cb74e83cac55598719b39a4d5ef1f
│           └── metadata.json
├── pypi
│   └── django-allauth-0.51.0.tar.gz
│       └── metadata.json
└── url
    └── 6781bd6440ae72c2
        └── werkzeug-3.0.1.tar.gz
            └── metadata.json
```

The inside of a `metadata.json`:
```json
{
  "data": {
    "django_allauth-0.51.0-py3-none-any.whl": {
      "metadata-version": "2.1",
      "name": "django-allauth",
      "version": "0.51.0",
      ...
    }
  }
}
```
2023-11-24 17:47:58 +00:00
konsti 8d247fe95b
Add `Tags::from_interpreter` (#498)
Small refactoring
2023-11-24 11:36:01 +00:00
Charlie Marsh 17228ba04e
Add support for path dependencies (#471)
## Summary

This PR adds support for local path dependencies. The approach mostly
just falls out of our existing approach and infrastructure for Git and
URL dependencies.

Closes https://github.com/astral-sh/puffin/issues/436. (We'll open a
separate issue for editable installs.)

## Test Plan

Added `pip-compile` tests that pre-download a wheel or source
distribution, then install it via local path.
2023-11-21 11:49:42 +00:00
Charlie Marsh f1aa70d9d3
Refactor distribution types to return `Result` (#470)
## Summary

A variety of small refactors to the distribution types crate to (1)
return `Result` if we find an invalid wheel, rather than treating it as
a source distribution with a `.whl` suffix, and (2) DRY up some repeated
code around URLs.
2023-11-20 23:08:54 +00:00
konsti f0841cdb6e
Wheel metadata refactor (#462)
A consistent cache structure for remote wheel metadata:

 * `<wheel metadata cache>/pypi/foo-1.0.0-py3-none-any.json`
* `<wheel metadata
cache>/<digest(index-url)>/foo-1.0.0-py3-none-any.json`
* `<wheel metadata cache>/url/<digest(url)>/foo-1.0.0-py3-none-any.json`

The source dist caching will use a similar structure (#468).
2023-11-20 17:26:36 +01:00
konsti d3e9e1783f
Refactor lenient parsing (#467)
Deduplicate lenient parsing code between version specifiers and
Requirement. Use `warn_once!` since the warnings did show up multiple
times in my code. Fix the macro hygiene in `warn_once!`.
2023-11-20 15:35:38 +00:00
Zanie Blue e9b6fb90d6
Bump pubgrub to get range display changes (#444)
See https://github.com/zanieb/pubgrub/pull/5
2023-11-20 09:12:48 -06:00
Charlie Marsh 60f595b469
Prefer future stream over `JoinSet` in downloader (#469)
This avoids introducing a static lifetime requirement and, in my
benchmarks, is even a little faster.
2023-11-20 13:23:30 +00:00
Charlie Marsh 8decb29bad
Use a dedicated error type for `puffin-distribution` (#466) 2023-11-20 11:38:27 +00:00
Charlie Marsh 35fd86631b
Unify distribution operations into a single crate (#460)
## Summary

This PR unifies the behavior that lived in the resolver's `distribution`
crates with the behaviors that were spread between the various structs
in the installer crate into a single `Fetcher` struct that is intended
to manage all interactions with distributions. Specifically, the
interface of this struct is such that it can access distribution
metadata, download distributions, return those downloads, etc., all with
a common cache.

Overall, this is mostly just DRYing up code that was repeated between
the two crates, and putting it behind a reasonable shared interface.
2023-11-20 11:22:52 +00:00
konsti 46bb18f06e
Track file index (#452)
Track the index (or at least its url) where we got a file from across
the source code.

Fixes #448
2023-11-20 08:48:16 +00:00
Charlie Marsh 6fd582f8b9
Rename `puffin-distribution` to `distribution-types` (#458)
## Summary

This crate only contains types, and I want to introduce a new crate for
all _operations_ on distributions, so this feels like a more natural
name given we also have `pypi-types`.
2023-11-20 09:40:26 +01:00
konsti 255edf4445
Serde support for WheelFilename through str repr (#459)
I need this later, splitting out for PR size
2023-11-19 19:43:14 +00:00
konsti ab60233131
Use absolute cache paths (#453)
Previously, git requirements would fail when setting `--cache-dir`:

```console
$ cargo run --bin puffin -- pip-compile --cache-dir cache-all-kinds scripts/benchmarks/requirements/all-kinds.in
error: Failed to build distribution from URL: git+https://github.com/pydantic/pydantic-extra-types.git
  Caused by: Invalid path URL: cache-all-kinds/git-v0/db/b49ffcfeb6c2e9d8
  ```

The cause is using a relative and not an absolute path, which `Url` needs, the solution is to turn the cache dir into an absolute path.

This never showed up in the tests since the tests use absolute temp dirs for everything.
2023-11-19 13:32:32 +00:00
konsti bf71e7adcf
Add graphviz output to puffin-dev resolve-cli (#443)
I added output in graphviz DOT format to `puffin-dev resolve-cli` to
help with debugging resolutions. This requires tracking the requested
ranges in the graph. I also fixed the direction of the graph.

 Output for `black`:

```dot
digraph {
    0 [ label="click\n8.1.7"]
    1 [ label="black\n23.11.0"]
    2 [ label="packaging\n23.2"]
    3 [ label="mypy-extensions\n1.0.0"]
    4 [ label="tomli\n2.0.1"]
    5 [ label="pathspec\n0.11.2"]
    6 [ label="typing-extensions\n4.8.0"]
    7 [ label="platformdirs\n4.0.0"]
    1 -> 0 [ label=">=8.0.0"]
    1 -> 3 [ label=">=0.4.3"]
    1 -> 5 [ label=">=0.9.0"]
    1 -> 4 [ label=">=1.1.0"]
    1 -> 6 [ label=">=4.0.1"]
    1 -> 2 [ label=">=22.0"]
    1 -> 7 [ label=">=2"]
}
```


![image](https://github.com/astral-sh/puffin/assets/6826232/4a440fcd-6248-4349-8e1a-c3e0363e42b1)

transformers:


![image](https://github.com/astral-sh/puffin/assets/6826232/a13a693c-a8c0-4a4f-95d9-3458431c678a)

jupyter:


![graphviz](https://github.com/astral-sh/puffin/assets/6826232/ef730033-6fd9-4ea9-ac93-8c874c19a101)
2023-11-17 18:16:24 +00:00
Zanie Blue 221751487c
Use `UnusableDependencies` for URL dependency conflicts (#425)
Extends #424 with support for URL dependency incompatibilities.

Requires changes to `miette` to prevent URLs from being word wrapped;
accepted upstream in https://github.com/zkat/miette/pull/321
2023-11-17 08:28:12 -06:00
Charlie Marsh 2094680cdd
Add a `warn_user_once!` macro (#442)
Closes https://github.com/astral-sh/puffin/issues/429.
2023-11-17 02:34:06 +00:00
konsti 1883dbdc21
Always¹ clear temporary directories (#437)
Always¹ clear the temporary directories we create.

* Clear source dist downloads: Previously, the temporary directories
would remain in the cache dir, now they are cleared properly
* Clear wheel file downloads: Delete the `.whl` file, we only need to
cache the unpacked wheel
* Consistent handling of cache arguments: Abstract the handling for CLI
cache args away, again making sure we remove the `--no-cache` temp dir.

There are no more `into_path()` calls that persist `TempDir`s that i
could find.

¹Assuming drop is run, and deleting the directory doesn't silently
error.
2023-11-16 20:49:48 +00:00
Zanie Blue 0d9d4f9fca
Add an `UnusableDependencies` incompatibility kind and use for conflicting versions (#424)
Addresses
https://github.com/astral-sh/puffin/issues/309#issuecomment-1792648969

Similar to #338 this throws an error when merging versions results in an
empty set. Instead of propagating that error, we capture it and return a
new dependency type of `Unusable`. Unusable dependencies are a new
incompatibility kind which includes an arbitrary "reason" string that we
present to the user. Adding a new incompatibility kind requires changes
to the vendored pubgrub crate.

We could use this same incompatibility kind for conflicting urls as in
#284 which should allow the solver to backtrack to another valid version
instead of failing (see #425).

Unlike #383 this does not require changes to PubGrub's package mapping
model. I think in the long run we'll want PubGrub to accept multiple
versions per package to solve this specific issue, but we're interested
in it being merged upstream first. This pull request is just using the
issue as a simple case to explore adding a new incompatibility type.

We may or may not be able convince them to add this new incompatibility
type upstream. As discussed in
https://github.com/pubgrub-rs/pubgrub/issues/152, we may want a more
general incompatibility kind instead which can be used for arbitrary
problems. An upstream pull request has been opened for discussion at
https://github.com/pubgrub-rs/pubgrub/pull/153.

Related to:
- https://github.com/pubgrub-rs/pubgrub/issues/152
- #338 
- #383

---------

Co-authored-by: konsti <konstin@mailbox.org>
2023-11-16 20:02:06 +00:00
Zanie Blue 832058dbba
Switch from vendored PubGrub to a fork (#438)
A fork will let us stay up to date with the upstream while replaying our
work on top of it.

I expect a similar workflow to the RustPython-Parser fork we maintained,
except that I wrote an automation to create tags for each commit on the
fork (https://github.com/zanieb/pubgrub/pull/2) so we do not need to
manually tag and document each commit.

To update with the upstream:

- Rebase our fork's `main` branch on top of the latest changes in
upstream's `dev` branch
- Force push, overwriting our `main` branch history
- Change the commit hash here to the last commit on `main` in our fork

Since we automatically tag each commit on the fork, we should never lose
the commits that are dropped from `main` during rebase.
2023-11-16 13:49:19 -06:00
konsti e41ec12239
Option to resolve at a fixed timestamp with `pip-compile --exclude-newer YYYY-MM-DD` (#434)
This works by filtering out files with a more recent upload time, so if
the index you use does not provide upload times, the results might be
inaccurate. pypi provides upload times for all files. This is, the field
is non-nullable in the warehouse schema, but the simple API PEP does not
know this field.

If you have only pypi dependencies, this means deterministic,
reproducible(!) resolution. We could try doing the same for git repos
but it doesn't seem worth the effort, i'd recommend pinning commits
since git histories are arbitrarily malleable and also if you care about
reproducibility and such you such not use git dependencies but a custom
index.

Timestamps are given either as RFC 3339 timestamps such as
`2006-12-02T02:07:43Z` or as UTC dates in the same format such as
`2006-12-02`. Dates are interpreted as including this day, i.e. until
midnight UTC that day. Date only is required to make this ergonomic and
midnight seems like an ergonomic choice.

In action for `pandas`:

```console
$ target/debug/puffin pip-compile --exclude-newer 2023-11-16 target/pandas.in
Resolved 6 packages in 679ms
# This file was autogenerated by Puffin v0.0.1 via the following command:
#    target/debug/puffin pip-compile --exclude-newer 2023-11-16 target/pandas.in
numpy==1.26.2
    # via pandas
pandas==2.1.3
python-dateutil==2.8.2
    # via pandas
pytz==2023.3.post1
    # via pandas
six==1.16.0
    # via python-dateutil
tzdata==2023.3
    # via pandas
$ target/debug/puffin pip-compile --exclude-newer 2022-11-16 target/pandas.in
Resolved 5 packages in 655ms
# This file was autogenerated by Puffin v0.0.1 via the following command:
#    target/debug/puffin pip-compile --exclude-newer 2022-11-16 target/pandas.in
numpy==1.23.4
    # via pandas
pandas==1.5.1
python-dateutil==2.8.2
    # via pandas
pytz==2022.6
    # via pandas
six==1.16.0
    # via python-dateutil
$ target/debug/puffin pip-compile --exclude-newer 2021-11-16 target/pandas.in
Resolved 5 packages in 594ms
# This file was autogenerated by Puffin v0.0.1 via the following command:
#    target/debug/puffin pip-compile --exclude-newer 2021-11-16 target/pandas.in
numpy==1.21.4
    # via pandas
pandas==1.3.4
python-dateutil==2.8.2
    # via pandas
pytz==2021.3
    # via pandas
six==1.16.0
    # via python-dateutil
```
2023-11-16 19:46:17 +00:00
konsti 751f7fa9c6
Improve PEP 691 compatibility (#428)
[PEP 691](https://peps.python.org/pep-0691/#project-detail) has slightly
different, more relaxed rules around file metadata. These changes are
now reflected in the `File` struct. This will make it easier to support
alternative indices.

I had expected that i need to introduce a separate type for that, so i'm
happy it's two `Option`s more and an alias.

Part of #412
2023-11-16 19:03:44 +01:00
Charlie Marsh d3caf9ae86
Choose most-compatible wheel in resolver and installer (#422)
## Summary

This PR implements logic to sort wheels by priority, where priority is
defined as preferring more "specific" wheels over less "specific"
wheels. For example, in the case of Black, my machine now selects
`black-23.11.0-cp311-cp311-macosx_11_0_arm64.whl`, whereas sorting by
lowest priority instead gives me `black-23.11.0-py3-none-any.whl`.

As part of this change, I've also modified the resolver to fallback to
using incompatible wheels when determining package metadata, if no
compatible wheels are available.

The `VersionMap` was also moved out of `resolver.rs` and into its own
file with a wrapper type, for clarity.

Closes https://github.com/astral-sh/puffin/issues/380.
Closes https://github.com/astral-sh/puffin/issues/421.
2023-11-15 18:22:11 +00:00
Charlie Marsh 0af2f7e39f
Use `anstream` to avoid writing colorized output (#415)
A more robust solution to avoiding colorized output by ensuring we write
to `stdout` and `stderr` via the
[`anstream`](https://docs.rs/anstream/latest/anstream/) crate.

Closes https://github.com/astral-sh/puffin/issues/393.
2023-11-13 20:00:12 +00:00
Andrew Gallant 63f7f65190
change global allocator to jemalloc (and mimalloc on Windows) (#399)
This copies the allocator configuration used in the Ruff project. In
particular, this gives us an instant 10% win when resolving the top 1K
PyPI packages:

    $ hyperfine \
"./target/profiling/puffin-dev-main resolve-many --cache-dir
cache-docker-no-build --no-build pypi_top_8k_flat.txt --limit 1000 2>
/dev/null" \
"./target/profiling/puffin-dev resolve-many --cache-dir
cache-docker-no-build --no-build pypi_top_8k_flat.txt --limit 1000 2>
/dev/null"
Benchmark 1: ./target/profiling/puffin-dev-main resolve-many --cache-dir
cache-docker-no-build --no-build pypi_top_8k_flat.txt --limit 1000 2>
/dev/null
Time (mean ± σ): 974.2 ms ± 26.4 ms [User: 17503.3 ms, System: 2205.3
ms]
      Range (min … max):   943.5 ms … 1015.9 ms    10 runs

Benchmark 2: ./target/profiling/puffin-dev resolve-many --cache-dir
cache-docker-no-build --no-build pypi_top_8k_flat.txt --limit 1000 2>
/dev/null
Time (mean ± σ): 883.1 ms ± 23.3 ms [User: 14626.1 ms, System: 2542.2
ms]
      Range (min … max):   849.5 ms … 916.9 ms    10 runs

    Summary
'./target/profiling/puffin-dev resolve-many --cache-dir
cache-docker-no-build --no-build pypi_top_8k_flat.txt --limit 1000 2>
/dev/null' ran
1.10 ± 0.04 times faster than './target/profiling/puffin-dev-main
resolve-many --cache-dir cache-docker-no-build --no-build
pypi_top_8k_flat.txt --limit 1000 2> /dev/null'

I was moved to do this because I noticed `malloc`/`free` taking up a
fairly sizeable percentage of time during light profiling.

As is becoming a pattern, it will be easier to review this
commit-by-commit.

Ref #396 (wouldn't call this issue fixed)

-----

I did also try adding a `smallvec` optimization to the
`Version::release` field, but it didn't bare any fruit. I still think
there is more to explore since the results I observed don't quite line
up with what I expect. (So probably either my mental model is off or my
measurement process is flawed.) You can see that attempt with a little
more explanation here:
f9528b4ecd

In the course of adding the `smallvec` optimization, I also shrunk the
`Version` fields from a `usize` to a `u32`. They should at least be a
fixed size integer since version numbers aren't used to index memory,
and I shrunk it to `u32` since it seems reasonable to assume that all
version numbers will be smaller than `2^32`.
2023-11-10 14:48:59 -05:00
konsti 5cef40d87a
Add proper caching for pypi metadata fetching kinds (#368)
I intend this to become the main form of caching for puffin: You can
make http requests, you tranform the data to what you really need, you
have control over the cache key, and the cache is always json (or
anything else much faster we want to replace it with as long as it's
serde!)
2023-11-10 11:03:40 +00:00
konsti d1b57acaa8
Implement PEP 517 backend-path (#385)
Closes #192
2023-11-10 11:54:23 +01:00
Andrew Gallant 33c0901a28
distribution-filename: speed up is_compatible (#367)
This PR tweaks the representation of `Tags` in order to offer a
faster implementation of `WheelFilename::is_compatible`. We now use a
nested map of tags that lets us avoid looping over every supported
platform tag. As the code comments suggest, that is the essential gain.
We still do not mind looping over the tags in each wheel name since they
tend to be quite small. And pushing our thumb on that side of things can
make things worse overall since it would likely slow down WheelFilename
construction itself.

For micro-benchmarks, we improve considerably for compatibility
checking:

    $ critcmp base test3
group base test3
----- ---- -----
build_platform_tags/burntsushi-archlinux 1.00 46.2±0.28µs ? ?/sec 2.48
114.8±0.45µs ? ?/sec
wheelname_parsing/flyte-long-compatible 1.00 624.8±3.31ns 174.0 MB/sec
1.01 629.4±4.30ns 172.7 MB/sec
wheelname_parsing/flyte-long-incompatible 1.00 743.6±4.23ns 165.4 MB/sec
1.00 746.9±4.62ns 164.7 MB/sec
wheelname_parsing/flyte-short-compatible 1.00 526.7±4.76ns 54.3 MB/sec
1.01 530.2±5.81ns 54.0 MB/sec
wheelname_parsing/flyte-short-incompatible 1.00 540.4±4.93ns 60.0 MB/sec
1.01 545.7±5.31ns 59.4 MB/sec
wheelname_parsing_failure/flyte-long-extension 1.00 13.6±0.13ns 3.2
GB/sec 1.01 13.7±0.14ns 3.2 GB/sec
wheelname_parsing_failure/flyte-short-extension 1.00 14.0±0.20ns 1160.4
MB/sec 1.01 14.1±0.14ns 1146.5 MB/sec
wheelname_tag_compatibility/flyte-long-compatible 11.33 159.8±2.79ns
680.5 MB/sec 1.00 14.1±0.23ns 7.5 GB/sec
wheelname_tag_compatibility/flyte-long-incompatible 237.60
1671.8±37.99ns 73.6 MB/sec 1.00 7.0±0.08ns 17.1 GB/sec
wheelname_tag_compatibility/flyte-short-compatible 16.07 223.5±8.60ns
128.0 MB/sec 1.00 13.9±0.30ns 2.0 GB/sec
wheelname_tag_compatibility/flyte-short-incompatible 149.83 628.3±2.13ns
51.6 MB/sec 1.00 4.2±0.10ns 7.6 GB/sec

We do regress slightly on the time it takes for `Tags::new` to run, but
this is somewhat expected. And in absolute terms, 114us is perfectly
acceptable given that it's only executed ~once for each `puffin`
invocation.

Ad hoc benchmarks indicate an overall 25% perf improvement in `puffin
pip-compile` times. This roughly corresponds with how much time
`is_compatible` was taking. Indeed, profiling confirms that it has
virtually disappeared from the profile.

Fixes #157
2023-11-09 09:01:03 -05:00
konsti d407bbbee6
Special case missing header build errors (on linux) (#354)
One of the most common errors i observed are build failures due to
missing header files. On ubuntu, this generally means that you need to
install some `<...>-dev` package that the documentation tells you about,
e.g. [mysqlclient](https://github.com/PyMySQL/mysqlclient#linux) needs
`default-libmysqlclient-dev`, [some psycopg
versions](https://www.psycopg.org/psycopg3/docs/basic/install.html#local-installation)
(i remember that this was always required at some earlier point) require
`libpq-dev` and pygraphviz wants `graphviz-dev`. This is quite common
for many scientific packages (where conda has an advantage because they
can provide those package as a dependency).

The error message can be completely inscrutable if you're just a python
programmer (or user) and not a c programmer (example: pygraphviz):

```
warning: no files found matching '*.png' under directory 'doc'
warning: no files found matching '*.txt' under directory 'doc'
warning: no files found matching '*.css' under directory 'doc'
warning: no previously-included files matching '*~' found anywhere in distribution
warning: no previously-included files matching '*.pyc' found anywhere in distribution
warning: no previously-included files matching '.svn' found anywhere in distribution
no previously-included directories found matching 'doc/build'
pygraphviz/graphviz_wrap.c:3020:10: fatal error: graphviz/cgraph.h: No such file or directory
 3020 | #include "graphviz/cgraph.h"
      |          ^~~~~~~~~~~~~~~~~~~
compilation terminated.
error: command '/usr/bin/gcc' failed with exit code 1
```

The only relevant part is `Fatal error: graphviz/cgraph.h: No such file
or directory`. Why is this file not there and how do i get it to be
there?

This is even harder to spot in pip's output, where it's 11 lines above
the last line:


![image](https://github.com/astral-sh/puffin/assets/6826232/7a3d7279-e7b1-4511-ab22-d0a35be5e672)

I've special cased missing headers and made sure that the last line
tells you the important information: We're missing some header, please
check the documentation of {package} {version} for what to install:


![image](https://github.com/astral-sh/puffin/assets/6826232/4bbb8923-5a82-472f-ab1f-9e1471aa2896)

Scrolling up:


![image](https://github.com/astral-sh/puffin/assets/6826232/89a2495a-e188-4288-b534-ad885ee08763)

The difference gets even clearer with a default ubuntu terminal with its
80 columns:


![image](https://github.com/astral-sh/puffin/assets/6826232/49fb27bc-07c6-4b10-a1a1-30ec8e112438)

---

Note that the situation is better for a missing compiler, there i get:

```
[...]
warning: no previously-included files matching '*~' found anywhere in distribution
warning: no previously-included files matching '*.pyc' found anywhere in distribution
warning: no previously-included files matching '.svn' found anywhere in distribution
no previously-included directories found matching 'doc/build'
error: command 'gcc' failed: No such file or directory
---
```
Putting the last line into google, the first two results tell me to
`sudo apt-get install gcc`, the third even tells me about `sudo apt
install build-essential`
2023-11-08 15:26:39 +00:00
Andrew Gallant 294955ecff
fix platform detection on Linux (#359)
Rejigger Linux platform detection

This change makes some very small improvements to the Linux platform
detection logic. In particular, the existing logic did not work on my
Archlinux machine since /lib64/ld-linux-x86-64.so.2 isn't a symlink. In
that case, the detection logic should have fallen back to the slower
`ldd --version` technique, but `read_link` fails outright when its
argument isn't a symbolic link. So we tweak the logic to allow it to
fail, and if it does, we still try the `ldd --version` approach instead
of giving up completely.

I also made some cosmetic improvements to the regex matching, as well as
ensuring that the regexes are only compiled exactly once.
2023-11-07 11:39:35 -05:00
Charlie Marsh b0286a8939
Add user feedback when building source distributions in the resolver (#347)
It looks like Cargo, notice the bold green lines at the top (which
appear during the resolution, to indicate Git fetches and source
distribution builds):

<img width="868" alt="Screen Shot 2023-11-06 at 11 28 47 PM"
src="https://github.com/astral-sh/puffin/assets/1309177/9647a480-7be7-41e9-b1d3-69faefd054ae">

<img width="868" alt="Screen Shot 2023-11-06 at 11 28 51 PM"
src="https://github.com/astral-sh/puffin/assets/1309177/6bc491aa-5b51-4b37-9ee1-257f1bc1c049">

Closes https://github.com/astral-sh/puffin/issues/287 although we can do
a lot more here.
2023-11-07 14:17:31 +00:00
Charlie Marsh 2c32bc5a86
Respect direct URLs in puffin installer (#345)
We now write the `direct_url.json` when installing, and _skip_
installing if we find a package installed via the direct URL that the
user is requesting.

A lot of TODOs, especially around cleaning up the `Source` abstraction
and its relationship to `DirectUrl`. I'm gonna keep working on these
today, but this works and makes the requirements clear.

Closes #332.
2023-11-07 09:11:27 -05:00
konsti fbe28d3b7c
Fix mastodon-py dist-info handling (#336)
mastodon-py 1.5.1 uses a dot in its dist-info dir name, which we
previously didn't handle, causing home-assistant to fail. The new
implementation is based on
2f83540272/src/packaging/utils.py (L146-L172).

Part of #199

```
unzip -l  Mastodon.py-1.5.1-py2.py3-none-any.whl
Archive:  Mastodon.py-1.5.1-py2.py3-none-any.whl
  Length      Date    Time    Name
---------  ---------- -----   ----
   153929  2020-02-29 17:39   mastodon/Mastodon.py
     1029  2019-10-11 19:15   mastodon/__init__.py
     7357  2019-10-11 20:24   mastodon/streaming.py
       10  2020-03-14 18:14   Mastodon.py-1.5.1.dist-info/DESCRIPTION.rst
     1398  2020-03-14 18:14   Mastodon.py-1.5.1.dist-info/metadata.json
        9  2020-03-14 18:14   Mastodon.py-1.5.1.dist-info/top_level.txt
      110  2020-03-14 18:14   Mastodon.py-1.5.1.dist-info/WHEEL
     1543  2020-03-14 18:14   Mastodon.py-1.5.1.dist-info/METADATA
      753  2020-03-14 18:14   Mastodon.py-1.5.1.dist-info/RECORD
---------                     -------
   166138                     9 files
```
2023-11-07 12:36:11 +01:00
Charlie Marsh 2c114592bd
Only store small wheels in-memory (#348)
Closes https://github.com/astral-sh/puffin/issues/246.
2023-11-07 00:50:00 +00:00
Charlie Marsh a5e535f6fb
Remove `virtualenv` setup from gourgeist (#339)
We now only support building bare environments.
2023-11-06 18:32:45 +00:00
Charlie Marsh b013ea9c93
Move `DirectUrl` into `pypi-types` (#343)
This needs to be reused elsewhere, and there's nothing specific to wheel
installation about it.
2023-11-06 18:26:33 +00:00
Charlie Marsh 24e30e6557
Split `puffin-package` into requirements.txt parser and `pypi-types` (#341)
There are only two things left in this crate and they don't really have
anything to do with one another.
2023-11-06 18:19:49 +00:00
Charlie Marsh d9bcfafa16
Write `direct_url.json` in wheel installer (#337)
## Summary

This PR just adds the logic in `install-wheel-rs` to write
`direct_url.json`. We're not actually taking advantage of it yet (or
wiring it through) in Puffin.

Part of https://github.com/astral-sh/puffin/issues/332.
2023-11-06 17:09:28 +00:00
konsti 9b077f3d0f
`cargo upgrade --incompatible` (#330)
Ran `cargo upgrade --incompatible`, seems there are no changes required.

From cacache 0.12.0:
> BREAKING CHANGE: some signatures for copy have changed, and copy no
longer automatically reflinks

`which` 5.0.0 seems to have only error message changes.
2023-11-06 14:14:47 +00:00
konsti b2439b24a1
Fetch wheel metadata by async range requests on the remote wheel (#301)
Use range requests and async zip to extract the METADATA file from a
remote wheel.

We currently only cache when the remote says the remote declares the
resource as immutable, see
https://github.com/06chaynes/http-cache/issues/57 and
https://github.com/baszalmstra/async_http_range_reader/pull/1 . The
cache is stored as json with the description omitted, this improve cache
deserialization performance.
2023-11-06 15:06:49 +01:00
Charlie Marsh 6d672b8951
Add source distribution support to `pip-compile` (#323)
## Summary

This is a first-pass at adding source distribution support to the
installer.

The previous installation flow was:

1. Come up with a plan.
1. Find a distribution (specific file) for every package that we'll need
to download.
1. Download those distributions.
1. Unzip them (since we assumed they were all wheels).
1. Install them into the virtual environment.

Now, Step (3) downloads both wheels and source distributions, and we
insert a step between Steps (3) and (4) to build any source
distributions into zipped wheels.

There are a bunch of TODOs, the most important (IMO) is that we
basically have two implementations of downloading and building, between
the stuff in `puffin_installer` and `puffin_resolver` (namely in
`crates/puffin-resolver/src/distribution`). I didn't attempt to clean
that up here -- it's already a problem, and it's related to the overall
problem we need to solve around unified caching and resource management.

Closes #243.
2023-11-06 08:22:36 -05:00
konsti b79a15b458
Update pyproject-toml to 0.8.0 (#329) 2023-11-06 13:16:36 +00:00
Charlie Marsh d785ffdbff
Move `Source` abstraction into `puffin-distribution` (#321)
No code changes, but this will allow it to be shared between the
installer and the resolver.
2023-11-06 02:31:15 +00:00
Charlie Marsh fa1bbbbe08
Write fully-precise Git SHAs to `pip-compile` output (#299)
This PR adds a mechanism by which we can ensure that we _always_ try to
refresh Git dependencies when resolving; further, we now write the fully
resolved SHA to the "lockfile". However, nothing in the code _assumes_
we do this, so the installer will remain agnostic to this behavior.

The specific approach taken here is minimally invasive. Specifically,
when we try to fetch a source distribution, we check if it's a Git
dependency; if it is, we fetch, and return the exact SHA, which we then
map back to a new URL. In the resolver, we keep track of URL
"redirects", and then we use the redirect (1) for the actual source
distribution building, and (2) when writing back out to the lockfile. As
such, none of the types outside of the resolver change at all, since
we're just mapping `RemoteDistribution` to `RemoteDistribution`, but
swapping out the internal URLs.

There are some inefficiencies here since, e.g., we do the Git fetch,
send back the "precise" URL, then a moment later, do a Git checkout of
that URL (which will be _mostly_ a no-op -- since we have a full SHA, we
don't have to fetch anything, but we _do_ check back on disk to see if
the SHA is still checked out). A more efficient approach would be to
return the path to the checked-out revision when we do this conversion
to a "precise" URL, since we'd then only interact with the Git repo
exactly once. But this runs the risk that the checked-out SHA changes
between the time we make the "precise" URL and the time we build the
source distribution.

Closes #286.
2023-11-03 16:26:57 +00:00
Charlie Marsh 62c474d880
Add support for Git dependencies (#283)
## Summary

This PR adds support for Git dependencies, like:

```
flask @ git+https://github.com/pallets/flask.git
```

Right now, they're only supported in the resolver (and not the
installer), since the installer doesn't yet support source distributions
at all.

The general approach here is based on Cargo's Git implementation.
Specifically, I adapted Cargo's
[`git`](23eb492cf9/src/cargo/sources/git/mod.rs)
module to perform the cloning, which is based on `libgit2`.

As compared to Cargo's implementation, I made the following changes:

- Removed any unnecessary code.
- Fixed any Clippy errors for our stricter ruleset.
- Removed the dependency on `curl`, in favor of `reqwest` which we use
elsewhere.
- Removed the ability to use `gix`. Cargo allows the use of `gix` as an
experimental flag, but it only supports a small subset of the
operations. When Cargo fully adopts `gix`, we should plan to do the
same.
- Removed Cargo's host key checking. We need to re-add this! I'll do it
shortly.
- Removed Cargo's progress bars. We should re-add this too, but we use
`indicatif` and Cargo had their own thing.

There are a few follow-ups to consider:

- Adding support in the installer.
- When we lock, we should write out the Git URL that includes the exact
SHA. This lets us cache in perpetuity and avoids dependencies changing
without re-locking.
- When we resolve, we should _always_ try to refresh Git dependencies.
(Right now, we skip if the wheel was already built.)

I'll work on the latter two in follow-up PRs.

Closes #202.
2023-11-02 15:14:55 +00:00
konsti 4adaa9a700
Wheel filename distribution package name (#278)
The normalized name abstractions were not consistently, this PR uses
them where they were previously missing:
* `WheelFilename::distribution`
* `Requirement::name`
* `Requirement::extras`
* `Metadata21::name`
* `Metadata21::provides_dist`

With `puffin-package` depending on `pep508_rs` this would be cyclical
crate dependency, so `puffin-normalize` gets split out from
`puffin-package`.

`DistInfoName` has the same task and semantics as `PackageName`, so it's
merged into the latter.

`PackageName` and `ExtraName` documentation is moved onto the type and
their constructors are called `new` instead of `normalize`. We now use
these constructors rarely enough the implicit allocation by
`to_string()` shouldn't matter anymore, while more actual cloning
becomes visible.
2023-11-02 11:15:27 +00:00
Charlie Marsh 2ee555df7b
Use `puffin_cache::digest` in another site (#289) 2023-11-02 04:48:14 +00:00
Charlie Marsh 8123e1a8f6
Add stable hash crate (#281)
This PR adds a `puffin-cache` crate that we can share across a variety of
other crates to generate stable hashes.
2023-11-01 23:41:45 +00:00
Charlie Marsh 2652caa3e3
Add support for URL dependencies (#251)
## Summary

This PR adds support for resolving and installing dependencies via
direct URLs, like:

```
werkzeug @ 960bb4017c4aed12b5ed8b78e0153e/Werkzeug-2.0.0-py3-none-any.whl
```

These are fairly common (e.g., with `torch`), but you most often see
them as Git dependencies.

Broadly, structs like `RemoteDistribution` and friends are now enums
that can represent either registry-based dependencies or URL-based
dependencies:

```rust
/// A built distribution (wheel) that exists as a remote file (e.g., on `PyPI`).
#[derive(Debug, Clone)]
#[allow(clippy::large_enum_variant)]
pub enum RemoteDistribution {
    /// The distribution exists in a registry, like `PyPI`.
    Registry(PackageName, Version, File),
    /// The distribution exists at an arbitrary URL.
    Url(PackageName, Url),
}
```

In the resolver, we now allow packages to take on an extra, optional
`Url` field:

```rust
#[derive(Debug, Clone, Eq, Derivative)]
#[derivative(PartialEq, Hash)]
pub enum PubGrubPackage {
    Root,
    Package(
        PackageName,
        Option<DistInfoName>,
        #[derivative(PartialEq = "ignore")]
        #[derivative(PartialOrd = "ignore")]
        #[derivative(Hash = "ignore")]
        Option<Url>,
    ),
}
```

However, for the purpose of version satisfaction, we ignore the URL.
This allows for the URL dependency to satisfy the transitive request in
cases like:

```
flask==3.0.0
werkzeug @ 254c3e9b5f5941e900b71206e6313b/werkzeug-3.0.1-py3-none-any.whl
```

There are a couple limitations in the current approach:

- The caching for remote URLs is done separately in the resolver vs. the
installer. I decided not to sweat this too much... We need to figure out
caching holistically.
- We don't support any sort of time-based cache for remote URLs -- they
just exist forever. This will be a problem for URL dependencies, where
we need some way to evict and refresh them. But I've deferred it for
now.
- I think I need to redo how this is modeled in the resolver, because
right now, we don't detect a variety of invalid cases, e.g., providing
two different URLs for a dependency, asking for a URL dependency and a
_different version_ of the same dependency in the list of first-party
dependencies, etc.
- (We don't yet support VCS dependencies.)
2023-11-01 09:21:44 -04:00
Charlie Marsh 89dad0c9ad
Move distribution abstraction in shared crate (#258)
This also allows us to get rid of `PinnedPackage` _and_ to remove some
`Result<...>` types due to needless conversions between
otherwise-identical types.
2023-10-31 15:30:06 -04:00
Charlie Marsh 3312ce30f5
Upgrade crates and remove unused dependencies (#256) 2023-10-31 13:16:58 -04:00
konsti 29bd0a4ed8
Fix musl compilation (#234)
musl (which we already use in ruff) allows statically linked binaries on
linux. This PR switches to rustls and vendors and fixes the glibc
detection. Using static musl builds makes it easier to avoid glibc
errors in docker and we'll need it later for alpine users anyway.

An alternative is using vendored openssl.
2023-10-30 18:10:17 +01:00
Charlie Marsh 2ba85bf80e
Add PubGrub's priority queue (#221)
Pulls in https://github.com/pubgrub-rs/pubgrub/pull/104.
2023-10-29 21:16:02 +00:00
konsti 5ad58474ca
Add script to check the top 8k pypi packages (#198)
To check to top 1k (current state):

```bash
scripts/resolve/get_pypi_top_8k.sh
cargo run --bin puffin-dev -- resolve-many scripts/resolve/pypi_top_8k_flat.txt --limit 1000
```

Results:
```
Errors: pywin32, geoip2, maxminddb, pypika, dirac
Success: 995, Error: 5
```
pywin32 has no solution for the build environment, 3 have no
`[build-system]` entry in pyproject.toml, `dirac` is missing cmake
2023-10-26 12:03:59 +00:00
konsti 216b6c41c2
Start puffin-dev (#193)
Currently, this is only the source distribution building feature moved.
It's intended that we can add development and test commands there
without affecting the main cli surface
2023-10-26 09:17:22 +00:00
konsti 889f6173cc
Unify python interpreter abstractions (#178)
Previously, we had two python interpreter metadata structs, one in
gourgeist and one in puffin. Both would spawn a subprocess to query
overlapping metadata and both would appear in the cli crate, if you
weren't careful you could even have to different base interpreters at
once. This change unifies this to one set of metadata, queried and
cached once.

Another effect of this crate is proper separation of python interpreter
and venv. A base interpreter (such as `/usr/bin/python/`, but also pyenv
and conda installed python) has a set of metadata. A venv has a root and
inherits the base python metadata except for `sys.prefix`, which unlike
`sys.base_prefix`, gets set to the venv root. From the root and the
interpreter info we can compute the paths inside the venv. We can reuse
the interpreter info of the base interpreter when creating a venv
without having to query the newly created `python`.
2023-10-25 20:11:36 +00:00
konsti 1fbe328257
Build source distributions in the resolver (#138)
This is isn't ready, but it can resolve
`meine_stadt_transparent==0.2.14`.

The source distributions are currently being built serially one after
the other, i don't know if that is incidentally due to the resolution
order, because sdist building is blocking or because of something in the
resolver that could be improved.

It's a bit annoying that the thing that was supposed to do http requests
now suddenly also has to a whole download/unpack/resolve/install/build
routine, it messes up the type hierarchy. The much bigger problem though
is avoid recursive crate dependencies, it's the reason for the callback
and for splitting the builder into two crates (badly named atm)
2023-10-25 20:05:13 +00:00
Charlie Marsh 49a27ff33c
Add support for parameterized link modes (#164)
Allows the user to select between clone, hardlink, and copy semantics
for installs. (The pnpm documentation has a decent description of what
these mean: https://pnpm.io/npmrc#package-import-method.)

Closes #159.
2023-10-22 04:35:50 +00:00
Charlie Marsh b665f1489a
Add tests for `puffin sync` (#161)
Closes #158.
2023-10-22 03:25:00 +00:00
Charlie Marsh 3072c3265e
Add support for lowest and lowest-direct resolution modes (#160)
Borrows terminology from pnpm by introducing three resolution modes:

- "Highest": always choose the highest compliant version (default).
- "Lowest": always choose the lowest compliant version.
- "LowestDirect": choose the lowest compliant version of direct
dependencies, and the highest compliant version of any transitive
dependencies. (This makes a bit more sense than "lowest".)

Closes https://github.com/astral-sh/puffin/issues/142.
2023-10-21 22:58:06 -04:00
konsti ae9d1f7572
Add source distribution filename abstraction (#154)
The need for this became clear when working on the source distribution
integration into the resolver.

While at it i also switch the `WheelFilename` version to the parsed
`pep440_rs` version now that we have this crate.
2023-10-20 17:45:57 +02:00
Charlie Marsh 4645f79237
Use `FxHash` (#151) 2023-10-20 05:26:06 +00:00
Charlie Marsh 8001c792e7
Show requirement sources in `pip-compile` output (#149)
Builds up a complete resolved graph from PubGrub, and shows the sources
that led to each package being included in the resolution, like
`pip-compile`.

Closes https://github.com/astral-sh/puffin/issues/60.
2023-10-20 05:14:59 +00:00
Charlie Marsh 9b3405bf0e
Upgrade PubGrub to dev branch (#147)
Updates to `29c48fb9f3daa11bd02794edd55060d0b01ee705` from the
`pubgrub-rs` dev branch. This lets us reduce the number of changes we've
made to PubGrub itself (now, only changing visibility to export a few
things from the `solver.rs` module).
2023-10-20 03:23:26 +00:00