## Summary
This PR makes the `pypi_types::File` a response-only type (i.e., a type
that's only used when deserializing over the wire), and adds a separate
internal `File` type. Right now, the representations are similar, but
already, we can avoid the "lenient" deserialization on our internal
`File` type, and avoid the special-casing of the property names that's
required in the JSON. Over time, we can evolve this representation
entirely separately from the representation we receive from PyPI and
other indexes.
This allows the default index URL to be easily overridden with a local
index e.g. a `packse` server
```
export PUFFIN_INDEX_URL="http://localhost:3141/packages/all/+simple"
```
The high level goal here is to improve the tests for the version parser.
Namely, we now check not just that version strings parse successfully,
but that they parse to the expected result.
We also do a few other cleanups. Most notably, `Version` is now an
opaque type so that we can more easily change its representation going
forward.
Reviewing commit-by-commit is suggested. :-)
The test creates a cache from multiple sources and injects faults (once
using invalid data and once by making the files unreadable on the fs
level), then resolves again.
I didn't test git because it has its own locking and correctness logic.
The main drawback is that this test is slow (2.5s for me), we could
`#[ignore]` it.
This is a pure refactor to follow-up #690, to separate the metadata that
we know upfront about distributions (like the version, for
registry-based distributions) vs. the metadata that requires building
(like the version, for URL-based distributions).
We now show the fully-resolved URL, rather than the URL as given by the
user, _everywhere_ except for the output resolution file (which should
retain relative paths, unexpanded environment variables, etc.).
Closes https://github.com/astral-sh/puffin/issues/687.
With `Option<T>` and `.unwrap_or_default()` later, the default of `T`
isn't shown in the help output.
Old:
```
--link-mode <LINK_MODE>
The method to use when installing packages from the global cache
Possible values:
- clone: Clone (i.e., copy-on-write) packages from the wheel into the site packages
- copy: Copy packages from the wheel into the site packages
- hardlink: Hard link packages from the wheel into the site packages
-q, --quiet
Do not print any output
--resolution <RESOLUTION>
Possible values:
- highest: Resolve the highest compatible version of each package
- lowest: Resolve the lowest compatible version of each package
- lowest-direct: Resolve the lowest compatible version of any direct dependencies, and the highest compatible version of any transitive dependencies
--prerelease <PRERELEASE>
Possible values:
- disallow: Disallow all pre-release versions
- allow: Allow all pre-release versions
- if-necessary: Allow pre-release versions if all versions of a package are pre-release
- explicit: Allow pre-release versions for first-party packages with explicit pre-release markers in their version requirements
- if-necessary-or-explicit: Allow pre-release versions if all versions of a package are pre-release, or if the package has an explicit pre-release marker in its version requirements
```

New:
```
--link-mode <LINK_MODE>
The method to use when installing packages from the global cache
[default: hardlink]
Possible values:
- clone: Clone (i.e., copy-on-write) packages from the wheel into the site packages
- copy: Copy packages from the wheel into the site packages
- hardlink: Hard link packages from the wheel into the site packages
-q, --quiet
Do not print any output
--resolution <RESOLUTION>
[default: highest]
Possible values:
- highest: Resolve the highest compatible version of each package
- lowest: Resolve the lowest compatible version of each package
- lowest-direct: Resolve the lowest compatible version of any direct dependencies, and the highest compatible version of any transitive dependencies
--prerelease <PRERELEASE>
[default: if-necessary-or-explicit]
Possible values:
- disallow: Disallow all pre-release versions
- allow: Allow all pre-release versions
- if-necessary: Allow pre-release versions if all versions of a package are pre-release
- explicit: Allow pre-release versions for first-party packages with explicit pre-release markers in their version requirements
- if-necessary-or-explicit: Allow pre-release versions if all versions of a package are pre-release, or if the package has an explicit pre-release marker in its version requirements
```

This PR uses borrowed data in `BuildDispatch` which makes creating a
`BuildDispatch` extremely cheap (only one allocation, for the Python
executable). I can be talked out of this, it will have no measurable
impact.
Separate branch for rebasing #677 onto main because i don't trust the
rebase enough to force push.
Closes#677.
---
If you install `black` from PyPI, then `-e ../black`, we need to
uninstall the existing `black`. This sounds simple, but that in turn
requires that we _know_ `-e ../black` maps to the package `black`, so
that we can mark it for uninstallation in the install plan. This, in
turn, means that we need to build editable dependencies prior to the
install plan.
This is just a bunch of reorganization to fix that specific bug
(installing multiple versions of `black` if you run through the above
workflow): we now run through the list of editables upfront, mark those
that are already installed, build those that aren't, and then ensure
that `InstallPlan` correctly removes those that need to be removed, etc.
Closes#676.
Co-authored-by: Charlie Marsh <charlie.r.marsh@gmail.com>
Per the title: adds support for `-e` installs to `puffin pip-install`.
There were some challenges here around threading the editable installs
to the right places. Namely, we want to build _once_, then reuse the
editable installs from the resolution. At present, we were losing the
`editable: true` flag on the `Dist` that came back through the
resolution, so it required some changes to the resolver.
Closes https://github.com/astral-sh/puffin/issues/672.
This PR modifies `SitePackages` to store all distributions in a flat
vector, and maintain two indexes (hash maps) from "per-element data for
an element in the vector" to "index of that element". This enables us to
maintain a map on both package name and editable URL.
Two low-hanging fruits as optimizations for version parsing: A fast path
for release only versions and removing the regex from version specifiers
(still calling into version's parsing regex if required). This enables
optimizing the serde format since we now see the serde part instead of
only PEP 440 parsing. I intentionally didn't rewrite the full PEP 440 at
this step.
```console
$ hyperfine --warmup 5 --runs 50 "target/profiling/puffin pip-compile scripts/requirements/transformers-extras.in" "target/profiling/main pip-compile scripts/requirements/transformers-extras.in"
Benchmark 1: target/profiling/puffin pip-compile scripts/requirements/transformers-extras.in
Time (mean ± σ): 217.1 ms ± 3.2 ms [User: 194.0 ms, System: 55.1 ms]
Range (min … max): 211.0 ms … 228.1 ms 50 runs
Benchmark 2: target/profiling/main pip-compile scripts/requirements/transformers-extras.in
Time (mean ± σ): 276.7 ms ± 5.7 ms [User: 252.4 ms, System: 54.6 ms]
Range (min … max): 268.9 ms … 303.5 ms 50 runs
Summary
target/profiling/puffin pip-compile scripts/requirements/transformers-extras.in ran
1.27 ± 0.03 times faster than target/profiling/main pip-compile scripts/requirements/transformers-extras.in
```
---------
Co-authored-by: Andrew Gallant <andrew@astral.sh>
## Summary
This PR ensures that we re-use the resolution to install the build
dependencies when building a source distribution. Currently, we only
pass along the list of requirements, and then use the `Finder` to map
each requirement to a distribution. But we already determine the correct
distribution when resolving!
Closes https://github.com/astral-sh/puffin/issues/655.
## Summary
This PR enables users to express relative dependencies via environment
variables. Like pip, PDM, Hatch, Rye, and others, we now allow users to
express dependencies like:
```text
flask @ file://${PROJECT_ROOT}/flask-3.0.0-py3-none-any.whl
```
In the compiled requirements file, we'll also preserve the unexpanded
environment variable.
Closes https://github.com/astral-sh/puffin/issues/592.
## Summary
This PR adds a `VerbatimUrl` struct to preserve verbatim URLs throughout
the resolution and installation pipeline. In short, alongside the parsed
`Url`, we also keep the URL as written by the user. This enables us to
display the URL exactly as written by the user, rather than the
serialized path that we use internally.
This will be especially useful once we start expanding environment
variables since, at that point, we'll be able to write the version of
the URL that includes the _unexpected_ environment variable to the
output file.
## Summary
When resolving `transformers[tensorboard]`, the `[tensorboard]` extra
doesn't exist. Previously, we returned "unknown" dependencies for this
variant, which leads the resolution to try all versions, then fail. This
PR instead warns, but returns the base dependencies for the package,
which matches `pip`. (Poetry doesn't even warn, it just proceeds as
normal.)
Arguably, it would be better to return a custom incompatibility here and
then propagate... But this PR is better than the status quo, and I don't
know if we have support for that behavior yet...? (\cc @zanieb)
Closes#386.
Closes https://github.com/astral-sh/puffin/issues/423.
## Summary
This PR enables overrides to be passed to `pip-compile` and
`pip-install` via a new `--overrides` flag.
When overrides are provided, we effectively replace any requirements
that are overridden with the overridden versions. This is applied at all
depths of the tree.
The merge semantics are such that we replace _all_ requirements of a
package with _all_ requirements from the overrides files. So, for
example, if a package declares:
```
foo >= 1.0; python_version < '3.11'
foo < 1.0; python_version >= '3.11'
```
And the user provides an override like:
```
foo >= 2.0
```
Then _both_ of the `foo` requirements in the package will be replaced
with the override.
If instead, the user provided an override like:
```
foo >= 2.0; python_version < '3.11'
foo < 3.0; python_version >= '3.11'
```
Then we'd replace _both_ of the original `foo` requirements with both of
these overrides. (In technical terms, for each package in the
requirements file, we flat-map over its overrides.)
Closes https://github.com/astral-sh/puffin/issues/511.
## Summary
Now, `puffin_warnings::warn_once` and `puffin_warnings::warn` will go to
`stderr`, as long as the user isn't running under `--quiet`. Previously,
these went through `tracing`, and so were only visible when running
under `--verbose`.
Uses https://github.com/pubgrub-rs/pubgrub/pull/156 to consolidate
version ranges in error reports using the actual available versions for
each package.
Alternative to https://github.com/zanieb/pubgrub/pull/8 which implements
this behavior as a method in the `Reporter` — here it's implemented in
our custom report formatter (#521) instead which requires no upstream
changes.
Requires https://github.com/zanieb/pubgrub/pull/11 to only retrieve the
versions for packages that will be used in the report.
This is a work in progress. Some things to do:
- ~We may want to allow lazy retrieval of the version maps from the
formatter~
- [x] We should probably create a separate error type for no solution
instead of mixing them with other resolve errors
- ~We can probably do something smarter than creating vectors to hold
the versions~
- [x] This degrades error messages when a single version is not
available, we'll need to special case that
- [x] It seems safer to coerce the error type in `resolve` instead of
`solve` if feasible
## Summary
Now, after running `pip-install`, we validate that the set of installed
packages is consistent -- that is, that we don't have any packages that
are missing dependencies, or incompatible versions of installed
dependencies.
## Summary
At present, when performing a `pip-install`, we first do a resolution,
then take the set of requirements and basically run them through our
`pip-sync`, which itself includes re-resolving the dependencies to get a
specific `Dist` for each package. (E.g., the set of requirements might
say `flask==3.0.0`, but the installer needs a specific _wheel_ or source
distribution to install.)
This PR removes this second resolution by exposing the set of pinned
packages from the resolution. The main challenge here is that we have an
optimization in the resolver such that we let the resolver read metadata
from an incompatible wheel as long as a source distribution exists for a
given package. This lets us avoid building source distributions in the
resolver under the assumption that we'll be able to install the package
later on, if needed. As such, the resolver now needs to track the
resolution and installation filenames separately.
## Summary
When running `puffin pip-install`, we should respect versions that are
already installed in the environment. For example, if you run `puffin
pip-install flask==2.0.0` and then `puffin pip-install flask`, we should
avoid upgrading Flask. The most natural way to model this is to mark
them as "preferences".
(It's not enough to just filter those requirements out prior to
resolving, since we may not have the _dependencies_ of those packages
installed. We _could_ recursively verify this across the
`site-packages`, but that would be a larger PR.)
## Summary
This PR adds a `pip-install` command that operates like, well, `pip
install`. In short, it resolves the provided dependency, then makes sure
they're all installed in the environment. The primary differences with
`pip-sync` are that (1) `pip-sync` ignores dependencies, and assumes
that the packages represent a complete set; and (2) `pip-sync`
uninstalls any unlisted packages.
There are a bunch of TODOs that I'll resolve in subsequent PRs.
Closes https://github.com/astral-sh/puffin/issues/129.
## Summary
At present, we have two separate phases within the installation pipeline
related to populating wheels into the cache. The first phase downloads
the distribution, and then builds any source distributions into wheels;
the second phase unzips all the built wheels into the cache.
This PR merges those two phases into one, such that we seamlessly
download, build, and unzip wheels in one pass. This is more efficient,
since we can start unzipping while we build. It also ensures that if the
install _fails_ partway through, we don't end up with a bunch of
downloaded wheels that we never had a chance to unzip. The code is also
much simpler.
The main downside is that the user-facing feedback isn't as granular,
since we only have one phase and one progress bar for what was
originally three distinct phases.
Closes https://github.com/astral-sh/puffin/issues/571.
## Test Plan
I ran the benchmark script on two separate requirements files, and saw a
7% and 31% speedup respectively:
```text
+ TARGET=./scripts/benchmarks/requirements.txt
+ hyperfine --runs 100 --warmup 10 --prepare 'virtualenv --clear .venv' './target/release/main pip-sync ./scripts/benchmarks/requirements.txt --no-cache' --prepare 'virtualenv --clear .venv' './target/release/puffin pip-sync ./scripts/benchmarks/requirements.txt --no-cache'
Benchmark 1: ./target/release/main pip-sync ./scripts/benchmarks/requirements.txt --no-cache
Time (mean ± σ): 269.4 ms ± 33.0 ms [User: 42.4 ms, System: 117.5 ms]
Range (min … max): 221.7 ms … 446.7 ms 100 runs
Benchmark 2: ./target/release/puffin pip-sync ./scripts/benchmarks/requirements.txt --no-cache
Time (mean ± σ): 250.6 ms ± 28.3 ms [User: 41.5 ms, System: 127.4 ms]
Range (min … max): 207.6 ms … 336.4 ms 100 runs
Summary
'./target/release/puffin pip-sync ./scripts/benchmarks/requirements.txt --no-cache' ran
1.07 ± 0.18 times faster than './target/release/main pip-sync ./scripts/benchmarks/requirements.txt --no-cache'
```
```text
+ TARGET=./scripts/benchmarks/requirements-large.txt
+ hyperfine --runs 100 --warmup 10 --prepare 'virtualenv --clear .venv' './target/release/main pip-sync ./scripts/benchmarks/requirements-large.txt --no-cache' --prepare 'virtualenv --clear .venv' './target/release/puffin pip-sync ./scripts/benchmarks/requirements-large.txt --no-cache'
Benchmark 1: ./target/release/main pip-sync ./scripts/benchmarks/requirements-large.txt --no-cache
Time (mean ± σ): 5.053 s ± 0.354 s [User: 1.413 s, System: 6.710 s]
Range (min … max): 4.584 s … 6.333 s 100 runs
Benchmark 2: ./target/release/puffin pip-sync ./scripts/benchmarks/requirements-large.txt --no-cache
Time (mean ± σ): 3.845 s ± 0.225 s [User: 1.364 s, System: 6.970 s]
Range (min … max): 3.482 s … 4.715 s 100 runs
Summary
'./target/release/puffin pip-sync ./scripts/benchmarks/requirements-large.txt --no-cache' ran
```
I saw warnings when we were e.g. unzipping wheel and setuptools in two
tasks at the same time. We now keep track of in flight unzips.
This introduces a `OnceMap` abstraction which we also use in the
resolver.
## Summary
This PR adds two flags to `pip-sync`: `--reinstall`, and
`--reinstall-package [PACKAGE]`. The former reinstalls all packages in
the requirements, while the latter can be repeated and reinstalls all
specified packages.
For our purposes, a reinstall includes (1) purging the cache, and (2)
marking any already-installed versions as extraneous.
Closes#572.
Closes https://github.com/astral-sh/puffin/issues/271.
## Summary
This PR enables `puffin clean` to accept package names as command line
arguments, and selectively purge entries from the cache tied to the
given package.
Relate to #572.
## Test Plan
Modified all the caching tests to run an additional step to (1) purge
the cache, and (2) re-install the package.
Also ensures that we filter out any incompatible requirements when
building the install plan. In general, we assume that requirements were
generated by `pip-compile`, in which case all requirements should be
compatible and there should be no duplicates; but we should handle this
case gracefully.
Closes https://github.com/astral-sh/puffin/issues/582.
This PR adds caching support for built wheels in the installer.
Specifically, the `RegistryWheelIndex` now indexes both downloaded and
built wheels (from registries), and we have a new `BuiltWheelIndex` that
takes a subdirectory and returns the "best-matching" compatible wheel.
Closes#570.
Adds a few more tests for re-installs with various kinds of source
distributions, and changes the tests to use packages that we can safely
import (via `check_command`) for extra validation.
Once we properly respect cached built wheels, we should expect these
snapshots to change, since we'll no longer download and re-build
unnecessarily.
I'm working off of @konstin's commit here to implement arbitrary unsat
test cases for the resolver.
The entirety of the resolver's io are two functions: Get the version map
for a package (PEP 440 version -> distribution) and get the metadata for
a distribution. A new trait `ResolverProvider` abstracts these two away and
allows replacing the real network requests e.g. with stored responses
(https://github.com/pradyunsg/pip-resolver-benchmarks/blob/main/scenarios/pyrax_198.json).
---------
Co-authored-by: konsti <konstin@mailbox.org>
## Summary
This enables users to rely on yanked versions via explicit `==` markers,
which is necessary in some projects (and, in my opinion, reasonable).
Closes#551.
## Summary
Allows, e.g., `--python-version 3.7` or `--python-version 3.7.9`. This
was also feedback I received in the original PR.
Closes https://github.com/astral-sh/puffin/issues/533.
Ensure we're using atomic writes everywhere in our cache to avoid broken
cache records and error with parallel puffin actions
(https://github.com/astral-sh/puffin/pull/544#issuecomment-1838841581).
All json files that are written to the cache are written atomically and
the build wheels are written to temp dir and then moved atomically. I
didn't touch venv creation though, i don't think that's worth it since
python does not support atomic package installation through its design.
## Summary
This PR modifies the behavior of our `--python-version` override in two
ways:
1. First, we always use the "real" interpreter in the source
distribution builder. I think this is correct. We don't need to use the
fake markers for recursive builds, because all we care about is the
top-level resolution, and we already assume that a single source
distribution will always return the same metadata regardless of its
build environment.
2. Second, we require that source distributions are compatible with
_both_ the "real" interpreter version and the marker environment. This
ensures that we don't try to build source distributions that are
compatible with our interpreter, but incompatible with the target
version.
Closes https://github.com/astral-sh/puffin/issues/407.
## Summary
This PR modifies `puffin-build` to be closer in behavior to
[pip](a15dd75d98/src/pip/_internal/pyproject.py (L53))
and
[build](de5b44b0c2/src/build/__init__.py (L94)).
Specifically, if a project contains a `[build-system]` field, but no
`build-backend`, we now perform a PEP 517 build (instead of using
`setup.py` directly) _and_ respect the `requires` of the
`[build-system]`. Without this change, we were failing to build source
distributions for packages like `ujson`.
Closes#527.
---------
Co-authored-by: konstin <konstin@mailbox.org>
## Summary
Even if this will typically be in the user's application folder (rather
than a local directory), it's still a good practice.
Closes https://github.com/astral-sh/puffin/issues/280.
After this change, two wheel caches remain: `built-wheels-v0` and
`wheels-v0`, docs screenshots below. Each contains both the wheel
metadata, cache policy and zip or unzipped wheels under the same name.
The zipped/unzipped strategy is as follows: In `pip-compile`, when we
build a wheel, we store it zipped. When `pip-sync` or a source dist
build in `pip-compile` need to install the wheel, we unzip it, remove
the file and replace it with the unzipped wheel.
This removes `WheelCache` and `UrlIndex` in favor of `Cache` plus
`WheelCache`. The non-built wheel cache now considers index urls and the
url for url wheels.
I'm unsure if we need the `Unzipper` type, this could just be a
function.
I move `no_index` into `IndexUrls` and started using `IndexUrl` up to
the clap level.
I left a number of TODOs in the code, namely performing the actual
invalidation of unzipped wheels and making the `InstallPlan` understand
cache invalidation (i.e. uninstall wheels when their remote changed).

Remove built wheels alongside their metadata when their index source
dist or url source dist changed. For git source dists, we currently
don't clear the previous build but use a new directory (not sure what's
right here - are there any generic cache GC approaches out there? I've
seen that e.g. spotify keeps its cache at 10GB max, but i also haven't
seen any reusable, well tested approaches for this). Path distributions
are unchanged (#478).
I like the structure of metadata alongside the wheel for cache
invalidation, i'll try to do that for `wheels-v0`/`wheel-metadata-v0`
too. (The unzipped wheels afaik currently lack cache invalidation when
the remote changed.) This should give is roughly the same structure for
wheel and built wheels and a very similar pattern of invalidation.
Previously, when installing a package we would delete the target
directory before copying (or linking) the contents of the package.
However, this means that we do not properly support namespace packages
which can share a target directory. Instead the last package to be
installed would be override existing packages. Since we install packages
in parallel, this could result in a race condition where the target
directory already exists which is not allowed when using `clonefile`.
See example error in #515.
c7e63d2dce
provides a regression test for this — it fails on `main`.
Here, we implement a recursive merge when the target directory already
exists. Both packages will be installed into the same directory. We no
longer delete the target directory, which seems okay since we uninstall
packages before installing now.
When files conflict, we will likely throw an error still. The correct
behavior to implement in this case is unclear, as if we just take "first
write wins" or "last write wins" we could end up with some files from
one package and some from another resulting in two broken packages. A
possible solution here is to lock the target directories while copying.
This is mostly a mechanical refactor that moves 80% of our code to the
same cache abstraction.
It introduces cache `Cache`, which abstracts away the path of the cache
and the temp dir drop and is passed throughout the codebase. To get a
specific cache bucket, you need to requests your `CacheBucket` from
`Cache`. `CacheBucket` is the centralizes the names of all cache
buckets, moving them away from the string constants spread throughout
the crates.
Specifically for working with the `CachedClient`, there is a
`CacheEntry`. I'm not sure yet if that is a strict improvement over
`cache_dir: PathBuf, cache_file: String`, i may have to rotate that
later.
The interpreter cache moved into `interpreter-v0`.
We can use the `CacheBucket` page to document the cache structure in
each bucket:

**Motivation** Previously, we would install any wheel with the correct
package name and version from the cache, even if it doesn't match the
current python interpreter.
**Summary** The unzipped wheel cache for registries now uses the entire
wheel filename over the name-version (`editables-0.5-py3-none-any.whl`
over `editables-0.5`).
Built wheels are not stored in the `wheels-v0` unzipped wheels cache
anymore. For each source distribution, there can be multiple built
wheels (with different compatibility tags), so i argue that we need a
different cache structure for them (follow up PR).
For `all-kinds.in` with
```bash
rm -rf cache-all-kinds
virtualenv --clear -p 3.12 .venv
cargo run --bin puffin -- pip-sync --cache-dir cache-all-kinds target/all-kinds.txt
```
we get:
**Before**
```
cache-all-kinds/wheels-v0/
├── registry
│ ├── annotated_types-0.6.0
│ ├── asgiref-3.7.2
│ ├── blinker-1.7.0
│ ├── certifi-2023.11.17
│ ├── cffi-1.16.0
│ ├── [...]
│ ├── tzdata-2023.3
│ ├── urllib3-2.1.0
│ └── wheel-0.42.0
└── url
├── 4b8be67c801a7ecb
│ ├── flask
│ └── flask-3.0.0.dist-info
├── 6781bd6440ae72c2
│ ├── werkzeug
│ └── werkzeug-3.0.1.dist-info
└── a67db8ed076e3814
├── pydantic_extra_types
└── pydantic_extra_types-2.1.0.dist-info
48 directories, 0 files
```
**After**
```
cache-all-kinds/wheels-v0/
├── registry
│ ├── annotated_types-0.6.0-py3-none-any.whl
│ ├── asgiref-3.7.2-py3-none-any.whl
│ ├── blinker-1.7.0-py3-none-any.whl
│ ├── certifi-2023.11.17-py3-none-any.whl
│ ├── cffi-1.16.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
│ ├── [...]
│ ├── tzdata-2023.3-py2.py3-none-any.whl
│ ├── urllib3-2.1.0-py3-none-any.whl
│ └── wheel-0.42.0-py3-none-any.whl
└── url
└── 4b8be67c801a7ecb
└── flask-3.0.0-py3-none-any.whl
39 directories, 0 files
```
**Outlook** Part of #477 "Fix wheel caching". Further tasks:
* Replace the `CacheShard` with `WheelMetadataCache` which handles urls
properly.
* Delete unzipped wheels when their remote wheel changed
* Store built wheels next to the `metadata.json` in the source dist
directory; delete built wheels when their source dist changed (different
cache bucket, but it's the same problem of fixing wheel caching) I'll
make stacked PRs for those
## Summary and motivation
For a given source dist, we store the metadata of each wheel built
through it in `built-wheel-metadata-v0/pypi/<source dist
filename>/metadata.json`. During resolution, we check the cache status
of the source dist. If it is fresh, we check `metadata.json` for a
matching wheel. If there is one we use that metadata, if there isn't, we
build one. If the source is stale, we build a wheel and override
`metadata.json` with that single wheel. This PR thereby ties the local
built wheel metadata cache to the freshness of the remote source dist.
This functionality is available through `SourceDistCachedBuilder`.
`puffin_installer::Builder`, `puffin_installer::Downloader` and
`Fetcher` are removed, instead there are now `FetchAndBuild` which calls
into the also new `SourceDistCachedBuilder`. `FetchAndBuild` is the new
main high-level abstraction: It spawns parallel fetching/building, for
wheel metadata it calls into the registry client, for wheel files it
fetches them, for source dists it calls `SourceDistCachedBuilder`. It
handles locks around builds, and newly added also inter-process file
locking for git operations.
Fetching and building source distributions now happens in parallel in
`pip-sync`, i.e. we don't have to wait for the largest wheel to be
downloaded to start building source distributions.
In a follow-up PR, I'll also clear built wheels when they've become
stale.
Another effect is that in a fully cached resolution, we need neither zip
reading nor email parsing.
Closes#473
## Source dist cache structure
Entries by supported sources:
* `<build wheel metadata cache>/pypi/foo-1.0.0.zip/metadata.json`
* `<build wheel metadata
cache>/<sha256(index-url)>/foo-1.0.0.zip/metadata.json`
* `<build wheel metadata
cache>/url/<sha256(url)>/foo-1.0.0.zip/metadata.json`
But the url filename does not need to be a valid source dist filename
(<https://github.com/search?q=path%3A**%2Frequirements.txt+master.zip&type=code>),
so it could also be the following and we have to take any string as
filename:
* `<build wheel metadata
cache>/url/<sha256(url)>/master.zip/metadata.json`
Example:
```text
# git source dist
pydantic-extra-types @ git+https://github.com/pydantic/pydantic-extra-types.git
# pypi source dist
django_allauth==0.51.0
# url source dist
werkzeug @ ff1904eb5e2853bf83db817a7dd53d/werkzeug-3.0.1.tar.gz
```
will be stored as
```text
built-wheel-metadata-v0
├── git
│ └── 5c56bc1c58c34c11
│ └── 843b753e9e8cb74e83cac55598719b39a4d5ef1f
│ └── metadata.json
├── pypi
│ └── django-allauth-0.51.0.tar.gz
│ └── metadata.json
└── url
└── 6781bd6440ae72c2
└── werkzeug-3.0.1.tar.gz
└── metadata.json
```
The inside of a `metadata.json`:
```json
{
"data": {
"django_allauth-0.51.0-py3-none-any.whl": {
"metadata-version": "2.1",
"name": "django-allauth",
"version": "0.51.0",
...
}
}
}
```
Preparing for #235, some refactoring to `puffin_interpreter`.
* Added a dedicated error type instead of anyhow
* `InterpreterInfo` -> `Interpreter`
* `detect_virtual_env` now returns an option so it can be chained for
#235
## Summary
This PR adds support for local path dependencies. The approach mostly
just falls out of our existing approach and infrastructure for Git and
URL dependencies.
Closes https://github.com/astral-sh/puffin/issues/436. (We'll open a
separate issue for editable installs.)
## Test Plan
Added `pip-compile` tests that pre-download a wheel or source
distribution, then install it via local path.
## Summary
This PR unifies the behavior that lived in the resolver's `distribution`
crates with the behaviors that were spread between the various structs
in the installer crate into a single `Fetcher` struct that is intended
to manage all interactions with distributions. Specifically, the
interface of this struct is such that it can access distribution
metadata, download distributions, return those downloads, etc., all with
a common cache.
Overall, this is mostly just DRYing up code that was repeated between
the two crates, and putting it behind a reasonable shared interface.
## Summary
This crate only contains types, and I want to introduce a new crate for
all _operations_ on distributions, so this feels like a more natural
name given we also have `pypi-types`.
```
error[E0432]: unresolved imports `puffin_cache::CacheArgs`, `puffin_cache::CacheDir`
--> crates/puffin-cli/src/main.rs:11:20
|
11 | use puffin_cache::{CacheArgs, CacheDir};
| ^^^^^^^^^ ^^^^^^^^ no `CacheDir` in the root
| |
| no `CacheArgs` in the root
|
note: found an item that was configured out
--> /Users/mz/eng/src/astral-sh/puffin/crates/puffin-cache/src/lib.rs:7:15
|
7 | pub use cli::{CacheArgs, CacheDir};
| ^^^^^^^^^
= note: the item is gated behind the `clap` feature
note: found an item that was configured out
--> /Users/mz/eng/src/astral-sh/puffin/crates/puffin-cache/src/lib.rs:7:26
|
7 | pub use cli::{CacheArgs, CacheDir};
| ^^^^^^^^
= note: the item is gated behind the `clap` feature
For more information about this error, try `rustc --explain E0432`.
error: could not compile `puffin-cli` (bin "puffin") due to previous error
```
## Summary
This is a refactor to address a TODO in the build context whereby we
aren't respecting the resolution options in recursive resolutions. Now,
the options are split out from the resolution _manifest_, and shared
across the build context tree.
Extends #424 with support for URL dependency incompatibilities.
Requires changes to `miette` to prevent URLs from being word wrapped;
accepted upstream in https://github.com/zkat/miette/pull/321
Always¹ clear the temporary directories we create.
* Clear source dist downloads: Previously, the temporary directories
would remain in the cache dir, now they are cleared properly
* Clear wheel file downloads: Delete the `.whl` file, we only need to
cache the unpacked wheel
* Consistent handling of cache arguments: Abstract the handling for CLI
cache args away, again making sure we remove the `--no-cache` temp dir.
There are no more `into_path()` calls that persist `TempDir`s that i
could find.
¹Assuming drop is run, and deleting the directory doesn't silently
error.
Addresses
https://github.com/astral-sh/puffin/issues/309#issuecomment-1792648969
Similar to #338 this throws an error when merging versions results in an
empty set. Instead of propagating that error, we capture it and return a
new dependency type of `Unusable`. Unusable dependencies are a new
incompatibility kind which includes an arbitrary "reason" string that we
present to the user. Adding a new incompatibility kind requires changes
to the vendored pubgrub crate.
We could use this same incompatibility kind for conflicting urls as in
#284 which should allow the solver to backtrack to another valid version
instead of failing (see #425).
Unlike #383 this does not require changes to PubGrub's package mapping
model. I think in the long run we'll want PubGrub to accept multiple
versions per package to solve this specific issue, but we're interested
in it being merged upstream first. This pull request is just using the
issue as a simple case to explore adding a new incompatibility type.
We may or may not be able convince them to add this new incompatibility
type upstream. As discussed in
https://github.com/pubgrub-rs/pubgrub/issues/152, we may want a more
general incompatibility kind instead which can be used for arbitrary
problems. An upstream pull request has been opened for discussion at
https://github.com/pubgrub-rs/pubgrub/pull/153.
Related to:
- https://github.com/pubgrub-rs/pubgrub/issues/152
- #338
- #383
---------
Co-authored-by: konsti <konstin@mailbox.org>
A fork will let us stay up to date with the upstream while replaying our
work on top of it.
I expect a similar workflow to the RustPython-Parser fork we maintained,
except that I wrote an automation to create tags for each commit on the
fork (https://github.com/zanieb/pubgrub/pull/2) so we do not need to
manually tag and document each commit.
To update with the upstream:
- Rebase our fork's `main` branch on top of the latest changes in
upstream's `dev` branch
- Force push, overwriting our `main` branch history
- Change the commit hash here to the last commit on `main` in our fork
Since we automatically tag each commit on the fork, we should never lose
the commits that are dropped from `main` during rebase.
This works by filtering out files with a more recent upload time, so if
the index you use does not provide upload times, the results might be
inaccurate. pypi provides upload times for all files. This is, the field
is non-nullable in the warehouse schema, but the simple API PEP does not
know this field.
If you have only pypi dependencies, this means deterministic,
reproducible(!) resolution. We could try doing the same for git repos
but it doesn't seem worth the effort, i'd recommend pinning commits
since git histories are arbitrarily malleable and also if you care about
reproducibility and such you such not use git dependencies but a custom
index.
Timestamps are given either as RFC 3339 timestamps such as
`2006-12-02T02:07:43Z` or as UTC dates in the same format such as
`2006-12-02`. Dates are interpreted as including this day, i.e. until
midnight UTC that day. Date only is required to make this ergonomic and
midnight seems like an ergonomic choice.
In action for `pandas`:
```console
$ target/debug/puffin pip-compile --exclude-newer 2023-11-16 target/pandas.in
Resolved 6 packages in 679ms
# This file was autogenerated by Puffin v0.0.1 via the following command:
# target/debug/puffin pip-compile --exclude-newer 2023-11-16 target/pandas.in
numpy==1.26.2
# via pandas
pandas==2.1.3
python-dateutil==2.8.2
# via pandas
pytz==2023.3.post1
# via pandas
six==1.16.0
# via python-dateutil
tzdata==2023.3
# via pandas
$ target/debug/puffin pip-compile --exclude-newer 2022-11-16 target/pandas.in
Resolved 5 packages in 655ms
# This file was autogenerated by Puffin v0.0.1 via the following command:
# target/debug/puffin pip-compile --exclude-newer 2022-11-16 target/pandas.in
numpy==1.23.4
# via pandas
pandas==1.5.1
python-dateutil==2.8.2
# via pandas
pytz==2022.6
# via pandas
six==1.16.0
# via python-dateutil
$ target/debug/puffin pip-compile --exclude-newer 2021-11-16 target/pandas.in
Resolved 5 packages in 594ms
# This file was autogenerated by Puffin v0.0.1 via the following command:
# target/debug/puffin pip-compile --exclude-newer 2021-11-16 target/pandas.in
numpy==1.21.4
# via pandas
pandas==1.3.4
python-dateutil==2.8.2
# via pandas
pytz==2021.3
# via pandas
six==1.16.0
# via python-dateutil
```
[PEP 691](https://peps.python.org/pep-0691/#project-detail) has slightly
different, more relaxed rules around file metadata. These changes are
now reflected in the `File` struct. This will make it easier to support
alternative indices.
I had expected that i need to introduce a separate type for that, so i'm
happy it's two `Option`s more and an alias.
Part of #412
Previously, we were assuming that `which <python>` return the path to
the python executable. This is not true when using pyenv shims, which
are bash scripts. Instead, we have to use `sys.executable`. Luckily,
we're already querying the python interpreter and can do it in that
pass.
We are also not allowed to cache the execution of the python interpreter
through the shim because pyenv might change the target. As a heuristic,
we check whether `sys.executable`, the real binary, is the same our
canonicalized `which` result.
---------
Co-authored-by: Zanie Blue <contact@zanie.dev>
## Summary
This PR implements logic to sort wheels by priority, where priority is
defined as preferring more "specific" wheels over less "specific"
wheels. For example, in the case of Black, my machine now selects
`black-23.11.0-cp311-cp311-macosx_11_0_arm64.whl`, whereas sorting by
lowest priority instead gives me `black-23.11.0-py3-none-any.whl`.
As part of this change, I've also modified the resolver to fallback to
using incompatible wheels when determining package metadata, if no
compatible wheels are available.
The `VersionMap` was also moved out of `resolver.rs` and into its own
file with a wrapper type, for clarity.
Closes https://github.com/astral-sh/puffin/issues/380.
Closes https://github.com/astral-sh/puffin/issues/421.
The pip compile test now explicitly set their python version and `puffin
venv` resolves e.g. `python3.12` correctly now. The venv creation is
moved to a shared method
Implement two behaviors for yanked versions:
* During `pip-compile`, yanked versions are filtered out entirely, we
currently treat them is if they don't exist. This is leads to confusing
error messages because a version that does exist seems to have suddenly
disappeared.
* During `pip-sync`, we warn when we fetch a remote distribution and it
has been yanked. We currently don't warn on cached or installed
distributions that have been yanked.