musl (which we already use in ruff) allows statically linked binaries on
linux. This PR switches to rustls and vendors and fixes the glibc
detection. Using static musl builds makes it easier to avoid glibc
errors in docker and we'll need it later for alpine users anyway.
An alternative is using vendored openssl.
I didn't realize this, but they made a bunch of improvements to how
PubGrub represents versions which lets us greatly simplify our own
PubGrub version wrapper
(https://github.com/pubgrub-rs/guide/pull/6/files).
We now accept a pre-release if (1) all versions are pre-releases, or (2)
there was a pre-release marker in the dependency specifiers for a direct
dependency.
The code is written such that we can support a variety of pre-release
strategies.
Closes https://github.com/astral-sh/puffin/issues/191.
To check to top 1k (current state):
```bash
scripts/resolve/get_pypi_top_8k.sh
cargo run --bin puffin-dev -- resolve-many scripts/resolve/pypi_top_8k_flat.txt --limit 1000
```
Results:
```
Errors: pywin32, geoip2, maxminddb, pypika, dirac
Success: 995, Error: 5
```
pywin32 has no solution for the build environment, 3 have no
`[build-system]` entry in pyproject.toml, `dirac` is missing cmake
Currently, this is only the source distribution building feature moved.
It's intended that we can add development and test commands there
without affecting the main cli surface
Select a compatible wheel for a version, even we already found a source
distribution previously.
If no wheel is found, select the most recent source distribution, not
the oldest compatible one.
This fixes the resolution of `mst.in`, which i added
Like `pip-compile`, we now respect existing versions from the
`requirements.txt` provided via `--output-file`, unless you pass a
`--upgrade` flag.
Closes#166.
Modifies the resolver to remove any incompatible distributions upfront,
and store them in an index by version. This will be necessary to support
`--upgrade` semantics.
This actually does cause a meaningful slowdown right now (since we now
iterate over all files, even if we otherwise never would've needed to
touch them), but we should be able to optimize it out later.
Everywhere else, we use cache to refer to a filesystem cache, so this is
kind of confusing. It's really an in-memory index that we build up over
the course of the solve.
Previously, we had two python interpreter metadata structs, one in
gourgeist and one in puffin. Both would spawn a subprocess to query
overlapping metadata and both would appear in the cli crate, if you
weren't careful you could even have to different base interpreters at
once. This change unifies this to one set of metadata, queried and
cached once.
Another effect of this crate is proper separation of python interpreter
and venv. A base interpreter (such as `/usr/bin/python/`, but also pyenv
and conda installed python) has a set of metadata. A venv has a root and
inherits the base python metadata except for `sys.prefix`, which unlike
`sys.base_prefix`, gets set to the venv root. From the root and the
interpreter info we can compute the paths inside the venv. We can reuse
the interpreter info of the base interpreter when creating a venv
without having to query the newly created `python`.
This is isn't ready, but it can resolve
`meine_stadt_transparent==0.2.14`.
The source distributions are currently being built serially one after
the other, i don't know if that is incidentally due to the resolution
order, because sdist building is blocking or because of something in the
resolver that could be improved.
It's a bit annoying that the thing that was supposed to do http requests
now suddenly also has to a whole download/unpack/resolve/install/build
routine, it messes up the type hierarchy. The much bigger problem though
is avoid recursive crate dependencies, it's the reason for the callback
and for splitting the builder into two crates (badly named atm)
As elsewhere, we just use the `pip` and `pip-compile` APIs. So we
support `--index-url` to override PyPI, then `--extra-index-url` to add
_additional_ indexes, and `--no-index` to avoid hitting the index at
all.
Closes#156.
Allows the user to select between clone, hardlink, and copy semantics
for installs. (The pnpm documentation has a decent description of what
these mean: https://pnpm.io/npmrc#package-import-method.)
Closes#159.
Borrows terminology from pnpm by introducing three resolution modes:
- "Highest": always choose the highest compliant version (default).
- "Lowest": always choose the lowest compliant version.
- "LowestDirect": choose the lowest compliant version of direct
dependencies, and the highest compliant version of any transitive
dependencies. (This makes a bit more sense than "lowest".)
Closes https://github.com/astral-sh/puffin/issues/142.
The need for this became clear when working on the source distribution
integration into the resolver.
While at it i also switch the `WheelFilename` version to the parsed
`pep440_rs` version now that we have this crate.
Builds up a complete resolved graph from PubGrub, and shows the sources
that led to each package being included in the resolution, like
`pip-compile`.
Closes https://github.com/astral-sh/puffin/issues/60.
Kind of an oversight in my initial implementation. If we find that any
package has _no_ matching versions, we should select it! This lets us
short-circuit _immediately_ when top-level dependencies aren't
satisfiable.
Updates to `29c48fb9f3daa11bd02794edd55060d0b01ee705` from the
`pubgrub-rs` dev branch. This lets us reduce the number of changes we've
made to PubGrub itself (now, only changing visibility to export a few
things from the `solver.rs` module).
Gets rid of the custom `DistInfo` struct in the site-packages
abstraction in favor of a new kind of distribution
(`InstalledDistribution`). No change in behavior.
This is also a lot faster. Unfortunately it copies a lot of code from
the sync cli since the `Printer` is private.
The first commit are some refactorings i made when i thought about how i
could reuse the existing code.
Support calling `prepare_metadata_for_build_wheel`, which can give you
the metadata without executing the actual build if the backend supports
it.
This makes the code a lot uglier since we effectively have a state
machine:
* Setup: Either venv plus requires (PEP 517) or just a venv (setup.py)
* Get metadata (optional step): None (setup.py) or
`prepare_metadata_for_build_wheel` and saving that result
* Build: `setup.py`, `build_wheel()` or
`build_wheel(metadata_directory=metadata_directory)`, but i think i got
general flow right.
@charliermarsh This is a "barely works but unblocks building on top"
implementation, say if you want more polishing (i'll look at this again
tomorrow)
This needs far better error handling and user-facing feedback, but it
does the basic operation (and includes discovery of the `pyproject.toml`
file, etc.).
This PR enables us to make "fixups" to bad metadata. I copied over the
one fixup that @konstin made in `monotrail-resolve`, and added a few
common ones for `Requires-Python`.
This PR reverts #109 which is actually a performance _regression_ since
we need to iterate over a bunch of wheels that we could otherwise
entirely ignore.
Rather than constantly iterating over all files and testing their
compatibility with the current platform, just store wheels we can
actually consider in the solver cache.
This adds a basic sdist builder that has been tested with two source
distributions, one with a PEP 517 backend and one with setup.py.
It uses pip for requirements installation atm, lacks testing in all
directions, lacks checks for recursive requirements, can't pass in
already resolved versions, doesn't support prepare metadata for build to
allow resolution to continue without doing the actual (native) build,
error messages are mediocre, etc.
```console
$ RUST_LOG=puffin_build=debug puffin-build --wheels wheels downloads/tqdm-4.66.1.tar.gz
2023-10-16T12:28:35.503182Z DEBUG build_sdist{path="downloads/tqdm-4.66.1.tar.gz" base_python="/usr/bin/python3"}: puffin_build: Building downloads/tqdm-4.66.1.tar.gz
2023-10-16T12:28:35.521780Z INFO build_sdist{path="downloads/tqdm-4.66.1.tar.gz" base_python="/usr/bin/python3"}:extract_archive: puffin_build: close time.busy=18.4ms time.idle=16.7µs
2023-10-16T12:28:35.845096Z DEBUG build_sdist{path="downloads/tqdm-4.66.1.tar.gz" base_python="/usr/bin/python3"}:resolve_and_install: puffin_build: Calling pip to install build dependencies
2023-10-16T12:28:37.668660Z INFO build_sdist{path="downloads/tqdm-4.66.1.tar.gz" base_python="/usr/bin/python3"}:resolve_and_install: puffin_build: close time.busy=1.82s time.idle=13.2µs
2023-10-16T12:28:37.668744Z DEBUG build_sdist{path="downloads/tqdm-4.66.1.tar.gz" base_python="/usr/bin/python3"}: puffin_build: Calling `setuptools.build_meta.get_requires_for_build_wheel()`
2023-10-16T12:28:38.159205Z INFO build_sdist{path="downloads/tqdm-4.66.1.tar.gz" base_python="/usr/bin/python3"}:run_python_script{python_interpreter="/tmp/.tmpm4cTra/venv/bin/python"}: puffin_build: close time.busy=490ms time.idle=13.0µs
2023-10-16T12:28:38.159304Z DEBUG build_sdist{path="downloads/tqdm-4.66.1.tar.gz" base_python="/usr/bin/python3"}: puffin_build: Calling `setuptools.build_meta.build_wheel()`
2023-10-16T12:28:38.501732Z INFO build_sdist{path="downloads/tqdm-4.66.1.tar.gz" base_python="/usr/bin/python3"}:run_python_script{python_interpreter="/tmp/.tmpm4cTra/venv/bin/python"}: puffin_build: close time.busy=342ms time.idle=15.2µs
2023-10-16T12:28:38.522700Z INFO build_sdist{path="downloads/tqdm-4.66.1.tar.gz" base_python="/usr/bin/python3"}: puffin_build: close time.busy=3.02s time.idle=16.2µs
Wheel built to /home/konsti/projects/puffin/crates/puffin-build/wheels/tqdm-4.66.1-py3-none-any.whl
2023-10-16T12:28:38.522772Z DEBUG puffin_build: Took 3020ms
$ puffin-build --wheels wheels downloads/geoextract-0.3.1.tar.gz
2023-10-16T12:28:40.884622Z DEBUG build_sdist{path="downloads/geoextract-0.3.1.tar.gz" base_python="/usr/bin/python3"}: puffin_build: Building downloads/geoextract-0.3.1.tar.gz
2023-10-16T12:28:40.887743Z INFO build_sdist{path="downloads/geoextract-0.3.1.tar.gz" base_python="/usr/bin/python3"}:extract_archive: puffin_build: close time.busy=2.97ms time.idle=12.6µs
2023-10-16T12:28:41.469738Z INFO build_sdist{path="downloads/geoextract-0.3.1.tar.gz" base_python="/usr/bin/python3"}: puffin_build: close time.busy=585ms time.idle=15.3µs
Wheel built to /home/konsti/projects/puffin/crates/puffin-build/wheels/geoextract-0.3.1-py3-none-any.whl
2023-10-16T12:28:41.469814Z DEBUG puffin_build: Took 585ms
```
The main change is to print the whole error chain. We can combine this
with adding `.context` to distinct phases to be able to locate crashes
without having to use a debugger.
## Summary
This PR enables the proof-of-concept resolver to backtrack by way of
using the `pubgrub-rs` crate.
Rather than using PubGrub as a _framework_ (implementing the
`DependencyProvider` trait, letting PubGrub call us), I've instead
copied over PubGrub's primary solver hook (which is only ~100 lines or
so) and modified it for our purposes (e.g., made it async).
There's a lot to improve here, but it's a start that will let us
understand PubGrub's appropriateness for this problem space. A few
observations:
- In simple cases, the resolver is slower than our current (naive)
resolver. I think it's just that the pipelining isn't as efficient as in
the naive case, where we can just stream package and version fetches
concurrently without any bottlenecks.
- A lot of the code here relates to bridging PubGrub with our own
abstractions -- so we need a `PubGrubPackage`, a `PubGrubVersion`, etc.
This fixes two bugs on linux:
`/tmp` and `$HOME` are technically on two different partitions on my
machine, which means that rename-as-atomic-dir-write doesn't work. The
solution is to create the temp dir in the target directory.
zip files may contain directory entries, we can't create files for them
but need to create directories. We could skip them though because iirc
they are not in the RECORD so they won't be uninstalled.
We can always restore these from history, but right now, it feels a lot
more productive to just hit PyPI directly for our integration tests,
since we don't have to spend time figuring out mocks.
Remove the parser I wrote in favor of Konsti's which is much more
complete. The only change vs. the version in `poc-monotrail` is that I
changed the tests to use insta rather than manually storing and
comparing against JSON snapshots.
Closes https://github.com/astral-sh/puffin/issues/89.
It looks like using _either_ async Rust with a `JoinSet` _or_
parallelizing a fixed threadpool with Rayon provide about a ~5% speed-up
over our current serial approach:
```console
❯ hyperfine --runs 30 --warmup 5 --prepare "./target/release/puffin venv .venv" \
"./target/release/rayon sync ./scripts/benchmarks/requirements-large.txt" \
"./target/release/async sync ./scripts/benchmarks/requirements-large.txt" \
"./target/release/main sync ./scripts/benchmarks/requirements-large.txt"
Benchmark 1: ./target/release/rayon sync ./scripts/benchmarks/requirements-large.txt
Time (mean ± σ): 295.7 ms ± 16.9 ms [User: 28.6 ms, System: 263.3 ms]
Range (min … max): 249.2 ms … 315.9 ms 30 runs
Benchmark 2: ./target/release/async sync ./scripts/benchmarks/requirements-large.txt
Time (mean ± σ): 296.2 ms ± 20.2 ms [User: 36.1 ms, System: 340.1 ms]
Range (min … max): 258.0 ms … 359.4 ms 30 runs
Benchmark 3: ./target/release/main sync ./scripts/benchmarks/requirements-large.txt
Time (mean ± σ): 306.6 ms ± 19.5 ms [User: 25.3 ms, System: 220.5 ms]
Range (min … max): 269.6 ms … 332.2 ms 30 runs
Summary
'./target/release/rayon sync ./scripts/benchmarks/requirements-large.txt' ran
1.00 ± 0.09 times faster than './target/release/async sync ./scripts/benchmarks/requirements-large.txt'
1.04 ± 0.09 times faster than './target/release/main sync ./scripts/benchmarks/requirements-large.txt'
```
It's much easier to just parallelize with Rayon and avoid async in the
underlying wheel code, so this PR takes that approach for now.
Mocks out the PyPI client using some checked-in fixtures. The test is
very basic, and I'm not very happy with all the ceremony around the
mocks and such, but it's an interesting experiment at least.
When we go to install a locked `requirements.txt`, if a wheel is already
available in the local cache, and matches the version specifiers, we can
just use it directly without fetching the package metadata. This speeds
up the no-op case by about 33%.
Closes https://github.com/astral-sh/puffin/issues/48.
I think this isn't necessary to support in this generic crate. If we
choose to adopt Monotrail-style concepts, we'll likely need to rework
them anyway.
It turns out that on macOS, you can pass `clonefile` a directory to
recursively copy an entire directory. This speeds up wheel installation
dramatically, by about 3x.
The setup is now as follows:
- All user-facing logging goes through `tracing` at an `info` leve.
(This excludes messages that go to `stdout`, like the compiled
`requirements.txt` file.)
- We have `--quiet` and `--verbose` command-line flags to set the
tracing filter and format defaults. So if you use `--verbose`, we
include timestamps and targets, and filter at `puffin=debug` level.
- However, we always respect `RUST_LOG`. So you can override the
_filter_ via `RUST_LOG`.
For example: the standard setup filters to `puffin=info`, and doesn't
show timestamps or targets:
<img width="1235" alt="Screen Shot 2023-10-08 at 3 41 22 PM"
src="https://github.com/astral-sh/puffin/assets/1309177/54ca4db6-c66a-439e-bfa3-b86dee136e45">
If you run with `--verbose`, you get debug logging, but confined to our
crates:
<img width="1235" alt="Screen Shot 2023-10-08 at 3 41 57 PM"
src="https://github.com/astral-sh/puffin/assets/1309177/c5c1af11-7f7a-4038-a173-d9eca4c3630b">
If you want verbose logging with _all_ crates, you can add
`RUST_LOG=debug`:
<img width="1235" alt="Screen Shot 2023-10-08 at 3 42 39 PM"
src="https://github.com/astral-sh/puffin/assets/1309177/0b5191f4-4db0-4db9-86ba-6f9fa521bcb6">
I think this is a reasonable setup, though we can see how it feels and
refine over time.
Closes https://github.com/astral-sh/puffin/issues/57.
This PR gets `gourgeist` passing our local CI and integrated into the
broader workspace.
There's some duplicate between concepts in `gourgeist` (like the
`InterpreterInfo`) and structs we have elsewhere, but we can tackle
those later.
This PR copies over the `gourgeist` crate at commit
`e64c17a263dac6933702dc8d155425c053fe885a` with no modifications.
It won't pass CI, but modifications will intentionally be confined to
later PRs.
This PR massively speeds up the case in which you need to install wheels
that already exist in the global cache.
The new strategy is as follows:
- Download the wheel into the content-addressed cache.
- Unzip the wheel into the cache, but ignore content-addressing. It
turns out that writing to `cacache` for every file in the zip added a
ton of overhead, and I don't see any actual advantages to doing so.
Instead, we just unzip the contents into a directory at, e.g.,
`~/.cache/puffin/django-4.1.5`.
- (The unzip itself is now parallelized with Rayon.)
- When installing the wheel, we now support unzipping from a directory
instead of a zip archive. This required duplicating and tweaking a few
functions.
- When installing the wheel, we now use reflinks (or copy-on-write
links). These have a few fantastic properties: (1) they're extremely
cheap to create (on macOS, they are allegedly faster than hard links);
(2) they minimize disk space, since we avoid copying files entirely in
the vast majority of cases; and (3) if the user then edits a file
locally, the cache doesn't get polluted. Orogene, Bun, and soon pnpm all
use reflinks.
Puffin is now ~15x faster than `pip` for the common case of installing
cached data into a fresh environment.
Closes https://github.com/astral-sh/puffin/issues/21.
Closes https://github.com/astral-sh/puffin/issues/39.
This PR modifies the `install-wheel-rs` (and a few other crates) to get
everything playing nicely. Specifically, CI should pass, and all these
crates now use workspace dependencies between one another.
As part of this change, I split out the wheel name parsing into its own
`wheel-filename` crate, and the compatibility tag parsing into its own
`platform-tags` crate.
This PR copies over the `install-wheel-rs` crate at commit
`10730ea1a84c58af6b35fb74c89ed0578ab042b6` with no modifications.
It won't pass CI, but modifications will intentionally be confined to
later PRs.
This PR modifies the PEP 440 and PEP 508 crates to pass CI, primarily by
fixing all lint violations.
We're also now using these crates in the workspace via `path`.
(Previously, we were still fetching them from Cargo.)
This PR copies over the `pep440-rs` crate at commit
`82aa5d4dcbe676b121dc931b0afa09a82de8e3d7` with no modifications.
It won't pass CI, but modifications will intentionally be confined to
later PRs.
This PR copies over the `pep440-rs` crate at commit
`a8303b01ffef6fccfdce562a887f6b110d482ef3` with no modifications.
It won't pass CI, but modifications will intentionally be confined to
later PRs.