## Summary
This PR adds support for resolving and installing dependencies via
direct URLs, like:
```
werkzeug @ 960bb4017c4aed12b5ed8b78e0153e/Werkzeug-2.0.0-py3-none-any.whl
```
These are fairly common (e.g., with `torch`), but you most often see
them as Git dependencies.
Broadly, structs like `RemoteDistribution` and friends are now enums
that can represent either registry-based dependencies or URL-based
dependencies:
```rust
/// A built distribution (wheel) that exists as a remote file (e.g., on `PyPI`).
#[derive(Debug, Clone)]
#[allow(clippy::large_enum_variant)]
pub enum RemoteDistribution {
/// The distribution exists in a registry, like `PyPI`.
Registry(PackageName, Version, File),
/// The distribution exists at an arbitrary URL.
Url(PackageName, Url),
}
```
In the resolver, we now allow packages to take on an extra, optional
`Url` field:
```rust
#[derive(Debug, Clone, Eq, Derivative)]
#[derivative(PartialEq, Hash)]
pub enum PubGrubPackage {
Root,
Package(
PackageName,
Option<DistInfoName>,
#[derivative(PartialEq = "ignore")]
#[derivative(PartialOrd = "ignore")]
#[derivative(Hash = "ignore")]
Option<Url>,
),
}
```
However, for the purpose of version satisfaction, we ignore the URL.
This allows for the URL dependency to satisfy the transitive request in
cases like:
```
flask==3.0.0
werkzeug @ 254c3e9b5f5941e900b71206e6313b/werkzeug-3.0.1-py3-none-any.whl
```
There are a couple of limitations in the current approach:
- The caching for remote URLs is done separately in the resolver vs. the
installer. I decided not to sweat this too much... We need to figure out
caching holistically.
- We don't support any sort of time-based cache for remote URLs -- they
just exist forever. This will be a problem for URL dependencies, where
we need some way to evict and refresh them. But I've deferred it for
now.
- I think I need to redo how this is modeled in the resolver, because
right now, we don't detect a variety of invalid cases, e.g., providing
two different URLs for a dependency, asking for a URL dependency and a
_different version_ of the same dependency in the list of first-party
dependencies, etc.
- (We don't yet support VCS dependencies.)
Tests would sometimes flake locally with this, e.g., "1.50s" was not
filtered correctly.
Verified with
```diff
diff --git a/crates/puffin-cli/src/commands/pip_compile.rs b/crates/puffin-cli/src/commands/pip_compile.rs
index 0193216..2d6f8af 100644
--- a/crates/puffin-cli/src/commands/pip_compile.rs
+++ b/crates/puffin-cli/src/commands/pip_compile.rs
@@ -150,6 +150,8 @@ pub(crate) async fn pip_compile(
result => result,
}?;
+ std::thread::sleep(std::time::Duration::from_secs(1));
+
let s = if resolution.len() == 1 { "" } else { "s" };
writeln!(
printer,
```
This also allows us to get rid of `PinnedPackage` _and_ to remove some
`Result<...>` types due to needless conversions between
otherwise-identical types.
Extends #253. Closes #241.
Adds `extras` to `RequirementsSpecification` to track extras used to
construct the requirements so we can throw an error when not all of the
requested extras are used.
Going to add some tests.
Extends #239. Closes #245.
Normalizes optional dependency group names found in pyproject files
before comparing them to the normalized user-requested extras.
Adds support for `pip-compile --extra <name> ...` which includes
optional dependencies in the specified group in the resolution.
Following precedent in `pip-compile`, if a given extra is not found,
there is no error. ~We could consider warning in this case.~ We should
probably add an error, but it expands scope and will be considered
separately in #241.
There are packages such as DTLSSocket 0.1.16 that say
```toml
[build-system]
requires = ["Cython<3", "setuptools", "wheel"]
```
In this case, we need to install the `requires` PEP 517 style, but then call
`setup.py` in the legacy way.
Part of making home-assistant work.
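A minimal sketch of that decision, assuming hypothetical `PyProjectToml` and `BuildSystem` types (not the real crate's API):
```rust
use serde::Deserialize;

#[derive(Deserialize)]
struct PyProjectToml {
    #[serde(rename = "build-system")]
    build_system: Option<BuildSystem>,
}

#[derive(Deserialize)]
struct BuildSystem {
    #[serde(default)]
    requires: Vec<String>,
    #[serde(rename = "build-backend")]
    build_backend: Option<String>,
}

enum BuildStrategy {
    /// Call the named PEP 517 backend.
    Pep517 { backend: String, requires: Vec<String> },
    /// Install `requires` PEP 517 style, then invoke `setup.py` the legacy way.
    LegacyWithRequires { requires: Vec<String> },
    /// No `[build-system]` at all: plain legacy `setup.py` build.
    Legacy,
}

fn select_strategy(pyproject: Option<&PyProjectToml>) -> BuildStrategy {
    match pyproject.and_then(|p| p.build_system.as_ref()) {
        Some(build_system) => match &build_system.build_backend {
            Some(backend) => BuildStrategy::Pep517 {
                backend: backend.clone(),
                requires: build_system.requires.clone(),
            },
            // The DTLSSocket case: `requires` is set, but there's no backend.
            None => BuildStrategy::LegacyWithRequires {
                requires: build_system.requires.clone(),
            },
        },
        None => BuildStrategy::Legacy,
    }
}
```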
musl (which we already use in ruff) allows statically linked binaries on
Linux. This PR switches to rustls, and vendors and fixes the glibc
detection. Using static musl builds makes it easier to avoid glibc
errors in Docker, and we'll need it later for Alpine users anyway.
An alternative is using vendored openssl.
I didn't realize this, but they made a bunch of improvements to how
PubGrub represents versions, which lets us greatly simplify our own
PubGrub version wrapper
(https://github.com/pubgrub-rs/guide/pull/6/files).
We now accept a pre-release if (1) all versions are pre-releases, or (2)
there was a pre-release marker in the dependency specifiers for a direct
dependency.
The code is written such that we can support a variety of pre-release
strategies.
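One way such strategies could be encoded (a sketch with assumed names, not puffin's actual types):
```rust
use std::collections::HashSet;

#[derive(Debug, Clone)]
enum PreReleaseStrategy {
    /// Never accept pre-release versions.
    Disallow,
    /// Always accept pre-release versions.
    Allow,
    /// Accept a pre-release if all versions of the package are pre-releases,
    /// or if a direct dependency's specifiers contained a pre-release marker
    /// (the behavior described above).
    IfNecessaryOrExplicit(HashSet<String>),
}

impl PreReleaseStrategy {
    fn allows(&self, package: &str, all_versions_are_prereleases: bool) -> bool {
        match self {
            Self::Disallow => false,
            Self::Allow => true,
            Self::IfNecessaryOrExplicit(explicit) => {
                all_versions_are_prereleases || explicit.contains(package)
            }
        }
    }
}
```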
Closes https://github.com/astral-sh/puffin/issues/191.
To check the top 1k (current state):
```bash
scripts/resolve/get_pypi_top_8k.sh
cargo run --bin puffin-dev -- resolve-many scripts/resolve/pypi_top_8k_flat.txt --limit 1000
```
Results:
```
Errors: pywin32, geoip2, maxminddb, pypika, dirac
Success: 995, Error: 5
```
pywin32 has no solution for the build environment, 3 have no
`[build-system]` entry in pyproject.toml, and `dirac` is missing cmake.
Currently, only the source distribution building feature has been moved.
The intent is that we can add development and test commands there
without affecting the main CLI surface.
Select a compatible wheel for a version, even if we already found a
source distribution previously.
If no wheel is found, select the most recent source distribution, not
the oldest compatible one.
This fixes the resolution of `mst.in`, which I added.
Like `pip-compile`, we now respect existing versions from the
`requirements.txt` provided via `--output-file`, unless you pass a
`--upgrade` flag.
Closes #166.
Modifies the resolver to remove any incompatible distributions upfront,
and store them in an index by version. This will be necessary to support
`--upgrade` semantics.
This actually does cause a meaningful slowdown right now (since we now
iterate over all files, even if we otherwise never would've needed to
touch them), but we should be able to optimize it out later.
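Schematically, the idea is to filter once and index by version (a sketch with placeholder types, not the actual resolver code):
```rust
use std::collections::BTreeMap;

/// Placeholder for a distribution file listed in the registry.
struct File {
    filename: String,
    // ... url, hashes, etc.
}

/// Drop incompatible distributions upfront, and key the survivors by version,
/// so the solver never has to revisit files it can't use.
fn index_compatible_files<V: Ord>(
    files: Vec<(V, File)>,
    is_compatible: impl Fn(&File) -> bool,
) -> BTreeMap<V, Vec<File>> {
    let mut index: BTreeMap<V, Vec<File>> = BTreeMap::new();
    for (version, file) in files {
        if is_compatible(&file) {
            index.entry(version).or_default().push(file);
        }
    }
    index
}
```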
Everywhere else, we use "cache" to refer to a filesystem cache, so this is
kind of confusing. It's really an in-memory index that we build up over
the course of the solve.
Previously, we had two Python interpreter metadata structs, one in
gourgeist and one in puffin. Both would spawn a subprocess to query
overlapping metadata, and both would appear in the CLI crate; if you
weren't careful, you could even have two different base interpreters at
once. This change unifies this to one set of metadata, queried and
cached once.
Another effect of this crate is proper separation of python interpreter
and venv. A base interpreter (such as `/usr/bin/python/`, but also pyenv
and conda installed python) has a set of metadata. A venv has a root and
inherits the base python metadata except for `sys.prefix`, which unlike
`sys.base_prefix`, gets set to the venv root. From the root and the
interpreter info we can compute the paths inside the venv. We can reuse
the interpreter info of the base interpreter when creating a venv
without having to query the newly created `python`.
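Schematically (field and type names here are assumptions for illustration, and the layout is Unix-style):
```rust
use std::path::PathBuf;

/// Metadata queried once from a base interpreter via a subprocess.
struct InterpreterInfo {
    version: (u8, u8),
    /// For a venv, `sys.prefix` is the venv root...
    sys_prefix: PathBuf,
    /// ...while `sys.base_prefix` stays the base interpreter's prefix.
    sys_base_prefix: PathBuf,
}

/// A venv is just a root plus the (reused) base interpreter metadata.
struct Venv {
    root: PathBuf,
    interpreter: InterpreterInfo,
}

impl Venv {
    /// Compute paths inside the venv from the root and the interpreter info,
    /// without querying the newly created `python`.
    fn site_packages(&self) -> PathBuf {
        let (major, minor) = self.interpreter.version;
        self.root
            .join("lib")
            .join(format!("python{major}.{minor}"))
            .join("site-packages")
    }
    fn python_executable(&self) -> PathBuf {
        self.root.join("bin").join("python")
    }
}
```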
This isn't ready, but it can resolve
`meine_stadt_transparent==0.2.14`.
The source distributions are currently being built serially, one after
the other. I don't know if that is incidental due to the resolution
order, because sdist building is blocking, or because of something in the
resolver that could be improved.
It's a bit annoying that the thing that was supposed to do HTTP requests
now suddenly also has to run a whole download/unpack/resolve/install/build
routine; it messes up the type hierarchy. The much bigger problem, though,
is avoiding recursive crate dependencies; that's the reason for the callback
and for splitting the builder into two crates (badly named atm).
As elsewhere, we just use the `pip` and `pip-compile` APIs. So we
support `--index-url` to override PyPI, then `--extra-index-url` to add
_additional_ indexes, and `--no-index` to avoid hitting the index at
all.
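For example (a hypothetical invocation; the flags are the ones described above, but the exact command shape is an assumption):
```console
$ puffin pip-compile requirements.in \
    --index-url https://internal.example.com/simple \
    --extra-index-url https://extra.example.com/simple
```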
Closes #156.
Allows the user to select between clone, hardlink, and copy semantics
for installs. (The pnpm documentation has a decent description of what
these mean: https://pnpm.io/npmrc#package-import-method.)
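A minimal sketch of the three semantics (using the `reflink` crate for clones; names are illustrative):
```rust
use std::fs;
use std::io;
use std::path::Path;

enum LinkMode {
    /// Copy-on-write clone (reflink), where the filesystem supports it.
    Clone,
    /// Hard link: cheap, but cache and site-packages then share inodes.
    Hardlink,
    /// Plain byte-for-byte copy.
    Copy,
}

fn install_file(mode: &LinkMode, src: &Path, dst: &Path) -> io::Result<()> {
    match mode {
        LinkMode::Clone => reflink::reflink(src, dst),
        LinkMode::Hardlink => fs::hard_link(src, dst),
        LinkMode::Copy => fs::copy(src, dst).map(|_| ()),
    }
}
```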
Closes #159.
Borrows terminology from pnpm by introducing three resolution modes:
- "Highest": always choose the highest compliant version (default).
- "Lowest": always choose the lowest compliant version.
- "LowestDirect": choose the lowest compliant version of direct
dependencies, and the highest compliant version of any transitive
dependencies. (This makes a bit more sense than "lowest".)
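A sketch of how the modes could map onto candidate selection (assumed names, not the actual resolver API):
```rust
enum ResolutionMode {
    /// Always choose the highest compliant version (the default).
    Highest,
    /// Always choose the lowest compliant version.
    Lowest,
    /// Lowest for direct dependencies, highest for transitive ones.
    LowestDirect,
}

/// Pick a version from the compliant candidates according to the mode.
fn select<'a, V: Ord>(
    mode: &ResolutionMode,
    candidates: &'a [V],
    is_direct: bool,
) -> Option<&'a V> {
    match mode {
        ResolutionMode::Highest => candidates.iter().max(),
        ResolutionMode::Lowest => candidates.iter().min(),
        ResolutionMode::LowestDirect if is_direct => candidates.iter().min(),
        ResolutionMode::LowestDirect => candidates.iter().max(),
    }
}
```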
Closes https://github.com/astral-sh/puffin/issues/142.
The need for this became clear when working on the source distribution
integration into the resolver.
While at it, I also switched the `WheelFilename` version to the parsed
`pep440_rs` version, now that we have this crate.
Builds up a complete resolved graph from PubGrub, and shows the sources
that led to each package being included in the resolution, like
`pip-compile`.
Closes https://github.com/astral-sh/puffin/issues/60.
Kind of an oversight in my initial implementation. If we find that any
package has _no_ matching versions, we should select it! This lets us
short-circuit _immediately_ when top-level dependencies aren't
satisfiable.
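The underlying heuristic (a sketch, not the actual solver code): when deciding which undecided package to work on next, prefer the one with the fewest candidate versions, so a package with zero candidates is selected immediately and the conflict surfaces right away.
```rust
/// Choose the undecided package with the fewest matching versions.
fn choose_next<P>(candidate_counts: &[(P, usize)]) -> Option<&P> {
    candidate_counts
        .iter()
        .min_by_key(|(_, count)| *count)
        .map(|(package, _)| package)
}
```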
Updates to `29c48fb9f3daa11bd02794edd55060d0b01ee705` from the
`pubgrub-rs` dev branch. This lets us reduce the number of changes we've
made to PubGrub itself (now, only changing visibility to export a few
things from the `solver.rs` module).
Gets rid of the custom `DistInfo` struct in the site-packages
abstraction in favor of a new kind of distribution
(`InstalledDistribution`). No change in behavior.
This is also a lot faster. Unfortunately, it copies a lot of code from
the sync CLI, since the `Printer` is private.
The first commit contains some refactorings I made when I thought about
how I could reuse the existing code.
Support calling `prepare_metadata_for_build_wheel`, which can give you
the metadata without executing the actual build if the backend supports
it.
This makes the code a lot uglier, since we effectively have a state
machine:
* Setup: either a venv plus the installed `requires` (PEP 517), or just a venv (setup.py)
* Get metadata (optional step): none (setup.py), or
`prepare_metadata_for_build_wheel` and saving that result
* Build: `setup.py`, `build_wheel()`, or
`build_wheel(metadata_directory=metadata_directory)`
I think I got the general flow right, though.
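A sketch of that state machine (enum and method names are illustrative, not the crate's actual API):
```rust
use std::path::PathBuf;

enum BuildPlan {
    /// Legacy build: just a venv, no backend.
    SetupPy,
    /// PEP 517 build: a venv plus the installed `requires`.
    Pep517 {
        backend: String,
        /// Set if `prepare_metadata_for_build_wheel` ran and we saved its result.
        metadata_directory: Option<PathBuf>,
    },
}

impl BuildPlan {
    /// Which call the final "build" step should make.
    fn build_call(&self) -> String {
        match self {
            BuildPlan::SetupPy => "setup.py bdist_wheel".to_string(),
            BuildPlan::Pep517 { backend, metadata_directory: None } => {
                format!("{backend}.build_wheel()")
            }
            BuildPlan::Pep517 { backend, metadata_directory: Some(dir) } => {
                format!("{backend}.build_wheel(metadata_directory={})", dir.display())
            }
        }
    }
}
```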
@charliermarsh This is a "barely works but unblocks building on top"
implementation; say if you want more polish (I'll look at this again
tomorrow).
This needs far better error handling and user-facing feedback, but it
does the basic operation (and includes discovery of the `pyproject.toml`
file, etc.).
This PR enables us to make "fixups" to bad metadata. I copied over the
one fixup that @konstin made in `monotrail-resolve`, and added a few
common ones for `Requires-Python`.
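As an illustration of the kind of fixup involved (a sketch; the actual rules in this PR may differ): `Requires-Python: >=3.5.*` appears in the wild but is invalid per PEP 440, since `.*` is only allowed with `==`/`!=`.
```rust
/// Repair a known-bad `Requires-Python` pattern before parsing.
fn fixup_requires_python(value: &str) -> String {
    if let Some(rest) = value.strip_suffix(".*") {
        // `.*` is only valid with `==`/`!=`; drop it from ordered comparisons.
        if rest.starts_with('>') || rest.starts_with('<') {
            return rest.to_string();
        }
    }
    value.to_string()
}

fn main() {
    assert_eq!(fixup_requires_python(">=3.5.*"), ">=3.5");
    assert_eq!(fixup_requires_python(">=3.8"), ">=3.8");
}
```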
This PR reverts #109, which actually introduced a performance
_regression_: it made us iterate over a bunch of wheels that we could
otherwise entirely ignore.
Rather than constantly iterating over all files and testing their
compatibility with the current platform, just store wheels we can
actually consider in the solver cache.
This adds a basic sdist builder that has been tested with two source
distributions, one with a PEP 517 backend and one with setup.py.
It uses pip for requirements installation atm, lacks testing in all
directions, lacks checks for recursive requirements, can't pass in
already-resolved versions, doesn't support preparing metadata for build to
allow resolution to continue without doing the actual (native) build,
error messages are mediocre, etc.
```console
$ RUST_LOG=puffin_build=debug puffin-build --wheels wheels downloads/tqdm-4.66.1.tar.gz
2023-10-16T12:28:35.503182Z DEBUG build_sdist{path="downloads/tqdm-4.66.1.tar.gz" base_python="/usr/bin/python3"}: puffin_build: Building downloads/tqdm-4.66.1.tar.gz
2023-10-16T12:28:35.521780Z INFO build_sdist{path="downloads/tqdm-4.66.1.tar.gz" base_python="/usr/bin/python3"}:extract_archive: puffin_build: close time.busy=18.4ms time.idle=16.7µs
2023-10-16T12:28:35.845096Z DEBUG build_sdist{path="downloads/tqdm-4.66.1.tar.gz" base_python="/usr/bin/python3"}:resolve_and_install: puffin_build: Calling pip to install build dependencies
2023-10-16T12:28:37.668660Z INFO build_sdist{path="downloads/tqdm-4.66.1.tar.gz" base_python="/usr/bin/python3"}:resolve_and_install: puffin_build: close time.busy=1.82s time.idle=13.2µs
2023-10-16T12:28:37.668744Z DEBUG build_sdist{path="downloads/tqdm-4.66.1.tar.gz" base_python="/usr/bin/python3"}: puffin_build: Calling `setuptools.build_meta.get_requires_for_build_wheel()`
2023-10-16T12:28:38.159205Z INFO build_sdist{path="downloads/tqdm-4.66.1.tar.gz" base_python="/usr/bin/python3"}:run_python_script{python_interpreter="/tmp/.tmpm4cTra/venv/bin/python"}: puffin_build: close time.busy=490ms time.idle=13.0µs
2023-10-16T12:28:38.159304Z DEBUG build_sdist{path="downloads/tqdm-4.66.1.tar.gz" base_python="/usr/bin/python3"}: puffin_build: Calling `setuptools.build_meta.build_wheel()`
2023-10-16T12:28:38.501732Z INFO build_sdist{path="downloads/tqdm-4.66.1.tar.gz" base_python="/usr/bin/python3"}:run_python_script{python_interpreter="/tmp/.tmpm4cTra/venv/bin/python"}: puffin_build: close time.busy=342ms time.idle=15.2µs
2023-10-16T12:28:38.522700Z INFO build_sdist{path="downloads/tqdm-4.66.1.tar.gz" base_python="/usr/bin/python3"}: puffin_build: close time.busy=3.02s time.idle=16.2µs
Wheel built to /home/konsti/projects/puffin/crates/puffin-build/wheels/tqdm-4.66.1-py3-none-any.whl
2023-10-16T12:28:38.522772Z DEBUG puffin_build: Took 3020ms
$ puffin-build --wheels wheels downloads/geoextract-0.3.1.tar.gz
2023-10-16T12:28:40.884622Z DEBUG build_sdist{path="downloads/geoextract-0.3.1.tar.gz" base_python="/usr/bin/python3"}: puffin_build: Building downloads/geoextract-0.3.1.tar.gz
2023-10-16T12:28:40.887743Z INFO build_sdist{path="downloads/geoextract-0.3.1.tar.gz" base_python="/usr/bin/python3"}:extract_archive: puffin_build: close time.busy=2.97ms time.idle=12.6µs
2023-10-16T12:28:41.469738Z INFO build_sdist{path="downloads/geoextract-0.3.1.tar.gz" base_python="/usr/bin/python3"}: puffin_build: close time.busy=585ms time.idle=15.3µs
Wheel built to /home/konsti/projects/puffin/crates/puffin-build/wheels/geoextract-0.3.1-py3-none-any.whl
2023-10-16T12:28:41.469814Z DEBUG puffin_build: Took 585ms
```
The main change is to print the whole error chain. We can combine this
with adding `.context` to distinct phases to be able to locate crashes
without having to use a debugger.
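A minimal sketch of the pattern with `anyhow` (assuming that's the error crate in play):
```rust
use anyhow::{Context, Result};

fn extract_archive(_path: &std::path::Path) -> Result<()> {
    anyhow::bail!("unexpected end of file")
}

fn build(path: &std::path::Path) -> Result<()> {
    // Each distinct phase gets its own `.context(...)`.
    extract_archive(path).context("Failed to extract archive")?;
    Ok(())
}

fn main() {
    if let Err(err) = build(std::path::Path::new("downloads/tqdm-4.66.1.tar.gz")) {
        // `{:?}` on `anyhow::Error` prints the whole chain ("Caused by: ...").
        eprintln!("{err:?}");
    }
}
```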
## Summary
This PR enables the proof-of-concept resolver to backtrack by way of
using the `pubgrub-rs` crate.
Rather than using PubGrub as a _framework_ (implementing the
`DependencyProvider` trait, letting PubGrub call us), I've instead
copied over PubGrub's primary solver hook (which is only ~100 lines or
so) and modified it for our purposes (e.g., made it async).
There's a lot to improve here, but it's a start that will let us
understand PubGrub's appropriateness for this problem space. A few
observations:
- In simple cases, the resolver is slower than our current (naive)
resolver. I think it's just that the pipelining isn't as efficient as in
the naive case, where we can just stream package and version fetches
concurrently without any bottlenecks.
- A lot of the code here relates to bridging PubGrub with our own
abstractions -- so we need a `PubGrubPackage`, a `PubGrubVersion`, etc.
This fixes two bugs on Linux:
`/tmp` and `$HOME` are technically on two different partitions on my
machine, which means that rename-as-atomic-dir-write doesn't work. The
solution is to create the temp dir in the target directory.
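A sketch of the fix, using the `tempfile` crate:
```rust
use std::fs;
use std::io;
use std::path::Path;

fn write_dir_atomically(target: &Path) -> io::Result<()> {
    let parent = target.parent().expect("target must have a parent");
    // Create the scratch directory next to the target, so the final rename
    // stays on one filesystem (rename across partitions fails).
    let temp_dir = tempfile::tempdir_in(parent)?;
    // ... populate `temp_dir.path()` ...
    fs::rename(temp_dir.into_path(), target)?;
    Ok(())
}
```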
Zip files may contain directory entries; we can't create files for them,
but need to create directories instead. We could also skip them, though,
because IIRC they are not in the RECORD, so they won't be uninstalled.
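A sketch of the handling during unpacking, using the `zip` crate (error handling elided):
```rust
use std::fs;
use std::io;
use std::path::Path;

fn unpack(archive: &mut zip::ZipArchive<fs::File>, target: &Path) -> zip::result::ZipResult<()> {
    for i in 0..archive.len() {
        let mut entry = archive.by_index(i)?;
        let Some(relative) = entry.enclosed_name() else {
            continue;
        };
        let path = target.join(relative);
        if entry.is_dir() {
            // Directory entry: create the directory rather than a file.
            fs::create_dir_all(&path)?;
        } else {
            if let Some(parent) = path.parent() {
                fs::create_dir_all(parent)?;
            }
            let mut out = fs::File::create(&path)?;
            io::copy(&mut entry, &mut out)?;
        }
    }
    Ok(())
}
```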