## Summary
This is a first-pass at adding source distribution support to the
installer.
The previous installation flow was:
1. Come up with a plan.
1. Find a distribution (specific file) for every package that we'll need
to download.
1. Download those distributions.
1. Unzip them (since we assumed they were all wheels).
1. Install them into the virtual environment.
Now, Step (3) downloads both wheels and source distributions, and we
insert a step between Steps (3) and (4) to build any source
distributions into zipped wheels.
There are a bunch of TODOs, the most important (IMO) is that we
basically have two implementations of downloading and building, between
the stuff in `puffin_installer` and `puffin_resolver` (namely in
`crates/puffin-resolver/src/distribution`). I didn't attempt to clean
that up here -- it's already a problem, and it's related to the overall
problem we need to solve around unified caching and resource management.
Closes#243.
`PackageName` and `ExtraName` can now only be constructed from valid
names. They share the same rules, so i gave them the same
implementation. Constructors are split between `new` (owned) and
`from_str` (borrowed), with the owned version avoiding allocations.
Closes#279
---------
Co-authored-by: Zanie <contact@zanie.dev>
## Summary
This just enables the `DistributionFinder` (previously known as the
`WheelFinder`) to select source distributions when there are no matching
wheels for a given platform. As a reminder, the `DistributionFinder` is
a simple resolver that doesn't look at any dependencies: it just takes a
set of pinned packages, and finds a distribution to install to satisfy
each requirement.
Extends #295Closes#214
Copies some of the implementations from `pubgrub::report` so we can
implement Puffin `PubGrubPackage` specific display when explaining
failed resolutions.
Here, we just drop the dummy version number if it's a
`PubGrubPackage::Root` package. In the future, we can further customize
reporting.
Part of https://github.com/astral-sh/puffin/issues/214
Adds a `project: Option<PackageName>` to the `Manifest`, `Resolver`, and
`RequirementsSpecification`.
To populate an optional `name` for `PubGubPackage::Root`.
I'll work on removing the version number next.
Should we consider using the parent directory name when a
`pyproject.toml` file is not present?
This PR makes the cache non-optional in most of Puffin, which simplifies
the code, allows us to reuse the cache within a single command (even
with `--no-cache`), and also allows us to use the cache for disk storage
across an invocation.
I left the cache as optional for the `Virtualenv` and `InterpreterInfo`
abstractions, since those are generic enough that it seems nice to have
a non-cached version, but it's kind of arbitrary.
The normalized name abstractions were not consistently, this PR uses
them where they were previously missing:
* `WheelFilename::distribution`
* `Requirement::name`
* `Requirement::extras`
* `Metadata21::name`
* `Metadata21::provides_dist`
With `puffin-package` depending on `pep508_rs` this would be cyclical
crate dependency, so `puffin-normalize` gets split out from
`puffin-package`.
`DistInfoName` has the same task and semantics as `PackageName`, so it's
merged into the latter.
`PackageName` and `ExtraName` documentation is moved onto the type and
their constructors are called `new` instead of `normalize`. We now use
these constructors rarely enough the implicit allocation by
`to_string()` shouldn't matter anymore, while more actual cloning
becomes visible.
## Summary
This PR adds support for resolving and installing dependencies via
direct URLs, like:
```
werkzeug @ 960bb4017c4aed12b5ed8b78e0153e/Werkzeug-2.0.0-py3-none-any.whl
```
These are fairly common (e.g., with `torch`), but you most often see
them as Git dependencies.
Broadly, structs like `RemoteDistribution` and friends are now enums
that can represent either registry-based dependencies or URL-based
dependencies:
```rust
/// A built distribution (wheel) that exists as a remote file (e.g., on `PyPI`).
#[derive(Debug, Clone)]
#[allow(clippy::large_enum_variant)]
pub enum RemoteDistribution {
/// The distribution exists in a registry, like `PyPI`.
Registry(PackageName, Version, File),
/// The distribution exists at an arbitrary URL.
Url(PackageName, Url),
}
```
In the resolver, we now allow packages to take on an extra, optional
`Url` field:
```rust
#[derive(Debug, Clone, Eq, Derivative)]
#[derivative(PartialEq, Hash)]
pub enum PubGrubPackage {
Root,
Package(
PackageName,
Option<DistInfoName>,
#[derivative(PartialEq = "ignore")]
#[derivative(PartialOrd = "ignore")]
#[derivative(Hash = "ignore")]
Option<Url>,
),
}
```
However, for the purpose of version satisfaction, we ignore the URL.
This allows for the URL dependency to satisfy the transitive request in
cases like:
```
flask==3.0.0
werkzeug @ 254c3e9b5f5941e900b71206e6313b/werkzeug-3.0.1-py3-none-any.whl
```
There are a couple limitations in the current approach:
- The caching for remote URLs is done separately in the resolver vs. the
installer. I decided not to sweat this too much... We need to figure out
caching holistically.
- We don't support any sort of time-based cache for remote URLs -- they
just exist forever. This will be a problem for URL dependencies, where
we need some way to evict and refresh them. But I've deferred it for
now.
- I think I need to redo how this is modeled in the resolver, because
right now, we don't detect a variety of invalid cases, e.g., providing
two different URLs for a dependency, asking for a URL dependency and a
_different version_ of the same dependency in the list of first-party
dependencies, etc.
- (We don't yet support VCS dependencies.)
This also allows us to get rid of `PinnedPackage` _and_ to remove some
`Result<...>` types due to needless conversions between
otherwise-identical types.
Extends #253Closes#241
Adds `extras` to `RequirementsSpecification` to track extras used to
construct the requirements so we can throw an error when not all of the
requested extras are used.
Going to add some tests.
Extends #239Closes#245
Normalizes optional dependency group names found in pyproject files
before comparing them to the normalized user-requested extras.
Adds support for `pip-compile --extra <name> ...` which includes
optional dependencies in the specified group in the resolution.
Following precedent in `pip-compile`, if a given extra is not found,
there is no error. ~We could consider warning in this case.~ We should
probably add an error but it expands scope and will be considered
separately in #241
We now accept a pre-release if (1) all versions are pre-releases, or (2)
there was a pre-release marker in the dependency specifiers for a direct
dependency.
The code is written such that we can support a variety of pre-release
strategies.
Closes https://github.com/astral-sh/puffin/issues/191.
Like `pip-compile`, we now respect existing versions from the
`requirements.txt` provided via `--output-file`, unless you pass a
`--upgrade` flag.
Closes#166.
Previously, we had two python interpreter metadata structs, one in
gourgeist and one in puffin. Both would spawn a subprocess to query
overlapping metadata and both would appear in the cli crate, if you
weren't careful you could even have to different base interpreters at
once. This change unifies this to one set of metadata, queried and
cached once.
Another effect of this crate is proper separation of python interpreter
and venv. A base interpreter (such as `/usr/bin/python/`, but also pyenv
and conda installed python) has a set of metadata. A venv has a root and
inherits the base python metadata except for `sys.prefix`, which unlike
`sys.base_prefix`, gets set to the venv root. From the root and the
interpreter info we can compute the paths inside the venv. We can reuse
the interpreter info of the base interpreter when creating a venv
without having to query the newly created `python`.
This is isn't ready, but it can resolve
`meine_stadt_transparent==0.2.14`.
The source distributions are currently being built serially one after
the other, i don't know if that is incidentally due to the resolution
order, because sdist building is blocking or because of something in the
resolver that could be improved.
It's a bit annoying that the thing that was supposed to do http requests
now suddenly also has to a whole download/unpack/resolve/install/build
routine, it messes up the type hierarchy. The much bigger problem though
is avoid recursive crate dependencies, it's the reason for the callback
and for splitting the builder into two crates (badly named atm)
As elsewhere, we just use the `pip` and `pip-compile` APIs. So we
support `--index-url` to override PyPI, then `--extra-index-url` to add
_additional_ indexes, and `--no-index` to avoid hitting the index at
all.
Closes#156.
Allows the user to select between clone, hardlink, and copy semantics
for installs. (The pnpm documentation has a decent description of what
these mean: https://pnpm.io/npmrc#package-import-method.)
Closes#159.
Borrows terminology from pnpm by introducing three resolution modes:
- "Highest": always choose the highest compliant version (default).
- "Lowest": always choose the lowest compliant version.
- "LowestDirect": choose the lowest compliant version of direct
dependencies, and the highest compliant version of any transitive
dependencies. (This makes a bit more sense than "lowest".)
Closes https://github.com/astral-sh/puffin/issues/142.
Builds up a complete resolved graph from PubGrub, and shows the sources
that led to each package being included in the resolution, like
`pip-compile`.
Closes https://github.com/astral-sh/puffin/issues/60.
Gets rid of the custom `DistInfo` struct in the site-packages
abstraction in favor of a new kind of distribution
(`InstalledDistribution`). No change in behavior.
This is also a lot faster. Unfortunately it copies a lot of code from
the sync cli since the `Printer` is private.
The first commit are some refactorings i made when i thought about how i
could reuse the existing code.
This needs far better error handling and user-facing feedback, but it
does the basic operation (and includes discovery of the `pyproject.toml`
file, etc.).
The main change is to print the whole error chain. We can combine this
with adding `.context` to distinct phases to be able to locate crashes
without having to use a debugger.
## Summary
This PR enables the proof-of-concept resolver to backtrack by way of
using the `pubgrub-rs` crate.
Rather than using PubGrub as a _framework_ (implementing the
`DependencyProvider` trait, letting PubGrub call us), I've instead
copied over PubGrub's primary solver hook (which is only ~100 lines or
so) and modified it for our purposes (e.g., made it async).
There's a lot to improve here, but it's a start that will let us
understand PubGrub's appropriateness for this problem space. A few
observations:
- In simple cases, the resolver is slower than our current (naive)
resolver. I think it's just that the pipelining isn't as efficient as in
the naive case, where we can just stream package and version fetches
concurrently without any bottlenecks.
- A lot of the code here relates to bridging PubGrub with our own
abstractions -- so we need a `PubGrubPackage`, a `PubGrubVersion`, etc.
Remove the parser I wrote in favor of Konsti's which is much more
complete. The only change vs. the version in `poc-monotrail` is that I
changed the tests to use insta rather than manually storing and
comparing against JSON snapshots.
Closes https://github.com/astral-sh/puffin/issues/89.
When we go to install a locked `requirements.txt`, if a wheel is already
available in the local cache, and matches the version specifiers, we can
just use it directly without fetching the package metadata. This speeds
up the no-op case by about 33%.
Closes https://github.com/astral-sh/puffin/issues/48.
The setup is now as follows:
- All user-facing logging goes through `tracing` at an `info` leve.
(This excludes messages that go to `stdout`, like the compiled
`requirements.txt` file.)
- We have `--quiet` and `--verbose` command-line flags to set the
tracing filter and format defaults. So if you use `--verbose`, we
include timestamps and targets, and filter at `puffin=debug` level.
- However, we always respect `RUST_LOG`. So you can override the
_filter_ via `RUST_LOG`.
For example: the standard setup filters to `puffin=info`, and doesn't
show timestamps or targets:
<img width="1235" alt="Screen Shot 2023-10-08 at 3 41 22 PM"
src="https://github.com/astral-sh/puffin/assets/1309177/54ca4db6-c66a-439e-bfa3-b86dee136e45">
If you run with `--verbose`, you get debug logging, but confined to our
crates:
<img width="1235" alt="Screen Shot 2023-10-08 at 3 41 57 PM"
src="https://github.com/astral-sh/puffin/assets/1309177/c5c1af11-7f7a-4038-a173-d9eca4c3630b">
If you want verbose logging with _all_ crates, you can add
`RUST_LOG=debug`:
<img width="1235" alt="Screen Shot 2023-10-08 at 3 42 39 PM"
src="https://github.com/astral-sh/puffin/assets/1309177/0b5191f4-4db0-4db9-86ba-6f9fa521bcb6">
I think this is a reasonable setup, though we can see how it feels and
refine over time.
Closes https://github.com/astral-sh/puffin/issues/57.
This PR modifies the `install-wheel-rs` (and a few other crates) to get
everything playing nicely. Specifically, CI should pass, and all these
crates now use workspace dependencies between one another.
As part of this change, I split out the wheel name parsing into its own
`wheel-filename` crate, and the compatibility tag parsing into its own
`platform-tags` crate.