## Summary
This was just a missing line -- we have `dependencies.remove(&package);`
in the ~identical branch above, but it must've been an oversight to omit
it here.
Closes https://github.com/astral-sh/uv/issues/1467.
## Test Plan
`cargo test`
## Summary
It turns out that it's not uncommon to end up with repeated packages in
requirements files when running `pip-sync`, e.g., you might have
`anyio==4.0.0` specified multiple times. This PR relaxes our assertions
in the install plan to allow such repeated packages, as long as the
requirement markers are exactly the same (i.e., they are truly
duplicates).
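A minimal sketch of the relaxed check, using made-up types rather than uv's actual install-plan code:
```rust
use std::collections::HashMap;

/// Repeated entries for the same package are fine, as long as their markers
/// are exactly the same (illustrative types, not uv's install plan).
fn check_duplicates(entries: &[(&str, Option<&str>)]) -> Result<(), String> {
    let mut seen: HashMap<&str, Option<&str>> = HashMap::new();
    for &(name, marker) in entries {
        if let Some(previous) = seen.insert(name, marker) {
            if previous != marker {
                return Err(format!("conflicting requirements for `{name}`"));
            }
        }
    }
    Ok(())
}

fn main() {
    // True duplicates are allowed...
    assert!(check_duplicates(&[("anyio", None), ("anyio", None)]).is_ok());
    // ...but the same package with different markers is still rejected.
    assert!(check_duplicates(&[
        ("anyio", None),
        ("anyio", Some("python_version < '3.9'")),
    ])
    .is_err());
}
```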
Closes https://github.com/astral-sh/uv/issues/1552.
## Summary
If you're developing on a package like `attrs` locally, and it has a
recursive extra like `attrs[dev]`, it turns out that we then try to find
the `attrs` in `attrs[dev]` from the registry, rather than recognizing
that it's part of the editable.
This PR fixes the issue by making editables slightly more first-class
throughout the resolver. Instead of mocking metadata, we explicitly
check for extras in various places. Part of the problem here is that we
treated editables as URL dependencies, but when we saw an _extra_ like
`attrs[dev]`, we didn't map that back to the URL. So now, we treat them
as registry dependencies, but with the appropriate guardrails
throughout.
Closes https://github.com/astral-sh/uv/issues/1447.
## Test Plan
- Cloned `attrs`.
- Ran `cargo run venv && cargo run pip install -e ".[dev]" -v`.
## Summary
This _could_ fix https://github.com/astral-sh/uv/issues/1454, but I'm
not sure. I was able to replicate by forcing a bunch of error states.
But, in short, if we fail to hardlink on the initial copy due to a file
existing, and then fail _again_, we fall back to copying. But if we copy,
then the tempfile doesn't exist, and so the `fs_err::rename(&tempfile,
&out_path)?;` will fail with "File not found".
This PR just ensures that the cases are explicitly mutually exclusive:
we only attempt to rename if the hardlink succeeded.
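A minimal sketch of that control flow (illustrative only, not uv's actual linking code):
```rust
use std::fs;
use std::io;
use std::path::Path;

/// Only rename the temporary file if the hard link actually created it;
/// otherwise fall back to copying straight to the destination.
fn link_or_copy(src: &Path, tempfile: &Path, out_path: &Path) -> io::Result<()> {
    match fs::hard_link(src, tempfile) {
        Ok(()) => {
            // The temp file exists, so renaming it into place is valid.
            fs::rename(tempfile, out_path)
        }
        Err(_) => {
            // The temp file was never created; don't try to rename it.
            fs::copy(src, out_path).map(|_| ())
        }
    }
}
```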
This PR fixes the OS detection for Alpine Linux such that the version
of musl available is correctly determined. The issue boiled down to
a regex that required 2 digits for each version component. But a
valid musl version is 1.2.4, which only has a single digit for each
component.
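A minimal sketch of the difference, matching against a musl-style `ldd --version` banner with the `regex` crate (both patterns are illustrative, not uv's exact regex):
```rust
use regex::Regex;

fn main() {
    // Requiring two digits per component misses `1.2.4`; one-or-more digits
    // per component matches it.
    let strict = Regex::new(r"Version (\d{2})\.(\d{2})\.(\d{2})").unwrap();
    let relaxed = Regex::new(r"Version (\d+)\.(\d+)\.(\d+)").unwrap();

    let banner = "musl libc (x86_64)\nVersion 1.2.4";
    assert!(!strict.is_match(banner));
    assert!(relaxed.is_match(banner));
}
```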
It's unclear how this was working for musl before this change. My
theory is that our other methods of OS detection were somehow working.
The first commit in this PR cleans up our Linux detection logic and adds
lots of tracing calls to make debugging issues like this easier in the
future. To do so, one can run:
$ RUST_LOG=trace uv pip install -v whatever
The second commit has the actual fix.
Fixes #1427
## Summary
By using the display representation of `Version` to form a `PackageId`,
we run the risk (as seen in the linked issue) of thinking that versions
like `2021.1` and `2021.1.0` are not equivalent.
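A minimal sketch of the idea, comparing by release components and ignoring trailing zeros (illustrative; not uv's actual `Version` type):
```rust
/// Compare versions by their release components, treating trailing zeros as
/// insignificant.
fn release_key(version: &str) -> Vec<u64> {
    let mut parts: Vec<u64> = version
        .split('.')
        .filter_map(|part| part.parse().ok())
        .collect();
    while parts.last() == Some(&0) {
        parts.pop();
    }
    parts
}

fn main() {
    // `2021.1` and `2021.1.0` identify the same version...
    assert_eq!(release_key("2021.1"), release_key("2021.1.0"));
    // ...while `2021.1.1` does not.
    assert_ne!(release_key("2021.1"), release_key("2021.1.1"));
}
```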
Closes https://github.com/astral-sh/uv/issues/1536
This fixes a bug where `uv pip install` failed to install `polars`:
```
$ uv pip install polars==0.14.0
error: Failed to download: polars==0.14.0
Caused by: Couldn't parse metadata of polars-0.14.0-cp37-abi3-manylinux_2_12_x86_64.manylinux2010_x86_64.whl from 749022b096cb7c1c2cc32b7f433c4f/polars-0.14.0-cp37-abi3-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Caused by: Operator >= cannot be used with a wildcard version specifier
pyarrow>=4.0.*; extra == 'pyarrow'
^^^^^^^
```
Since `pyarrow>=4.0.*; extra == 'pyarrow'` is invalid *and* it comes
from the metadata of a dependency (that isn't under the control of the
end user), we actually attempt to "fix" it. Namely, wildcard
dependency specifications are only allowed with `==` and `!=`, as per
the [Version Specifiers spec]. (They aren't explicitly forbidden in
these cases, but instead only have specified behavior for the `==` and
`!=` operators.)
This is all fine, but it turns out that when we fix the `>=4.0.*`
component, we also strip the quotes around `pyarrow`. (Because some
dependency specifications include stray quotes.) We fix this by making
our quote stripping a bit more selective. (We require that it appear
adjacent to a digit or a `*`.)
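A minimal sketch of that heuristic (illustrative; the real implementation may differ in detail):
```rust
/// Strip a stray quote only when it sits next to a digit or `*`, so quotes
/// that belong to a marker such as `extra == 'pyarrow'` are left alone.
fn strip_stray_quotes(spec: &str) -> String {
    let chars: Vec<char> = spec.chars().collect();
    let mut out = String::new();
    for (i, &c) in chars.iter().enumerate() {
        if c == '\'' || c == '"' {
            let prev = i.checked_sub(1).and_then(|j| chars.get(j));
            let next = chars.get(i + 1);
            let adjacent = [prev, next]
                .into_iter()
                .flatten()
                .any(|&ch| ch.is_ascii_digit() || ch == '*');
            if adjacent {
                continue; // drop the stray quote
            }
        }
        out.push(c);
    }
    out
}

fn main() {
    // The quote next to `*` is dropped; the marker quotes survive.
    assert_eq!(
        strip_stray_quotes("pyarrow>=4.0.*'; extra == 'pyarrow'"),
        "pyarrow>=4.0.*; extra == 'pyarrow'"
    );
}
```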
Note that #1477 also reports this error:
```
$ uv pip install 'requests>=2.30.*'
error: Failed to parse `requests>=2.30.*`
Caused by: Operator >= cannot be used with a wildcard version specifier
requests>=2.30.*
```
However, we specifically keep that error message since it's something
under the end user's control. And similarly for a dependency
specification in a `requirements.txt` file.
Fixes #1477
[Version Specifiers spec]:
https://packaging.python.org/en/latest/specifications/version-specifiers/
It turns out that /bin/ls can sometimes be a plain text file. For
example, in Rocky Linux 9:
```
$ cat /bin/ls
#!/usr/bin/coreutils --coreutils-prog-shebang=ls
```
However, `/bin/sh` is an ELF binary:
```
$ file /bin/sh
/bin/sh: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=7acbb41bf6f1b7d977f1b44675bf3ed213776835, for GNU/Linux 3.2.0, stripped
```
In a related issue (#1433), @zanieb fixed #1395 where, on NixOS,
`/bin/ls` doesn't exist but `/bin/sh` does. However, the fix attempts
`/bin/ls` first and only tries `/bin/sh` if `/bin/ls` doesn't exist. If
`/bin/ls` exists but isn't a valid ELF file, then the entire enterprise
gives up and `uv` fails to detect the version of `libc` that is
installed.
Instead of tweaking the logic to keep trying `/bin/ls` and then
`/bin/sh` after even if parsing `/bin/ls` fails, we just switch over to
reading `/bin/sh` only. It seems like a more fundamental thing to sniff
and likely less error prone.
We can adjust this heuristic as needed if it proves to be problematic.
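A minimal sketch of that kind of sniffing, i.e. checking for the four-byte ELF magic number (illustrative, not the actual detection code):
```rust
use std::fs::File;
use std::io::Read;
use std::path::Path;

/// Does the file start with the four-byte ELF magic number?
fn is_elf(path: &Path) -> std::io::Result<bool> {
    let mut magic = [0u8; 4];
    File::open(path)?.read_exact(&mut magic)?;
    Ok(&magic == b"\x7fELF")
}

fn main() -> std::io::Result<()> {
    // On Rocky Linux 9, `/bin/ls` is a coreutils shell script, but `/bin/sh`
    // is a real ELF binary, so sniffing `/bin/sh` is the safer choice.
    println!("/bin/sh is ELF: {}", is_elf(Path::new("/bin/sh"))?);
    Ok(())
}
```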
I tested this fix manually on Rocky Linux 9 via Docker:
```
$ cross b -r -p uv --target x86_64-unknown-linux-musl
$ cp target/x86_64-unknown-linux-musl/release/uv ~/astral/issues/uv/i1486/uv
$ docker run --rm -it --mount type=bind,src=/home/andrew/astral/issues/uv/i1486,dst=/host rockylinux:9 bash
[root@df2baa65d2f8 /]# /host/uv venv
Using Python 3.9.18 interpreter at /usr/bin/python3.9
Creating virtualenv at: .venv
[root@df2baa65d2f8 /]#
```
Fixes #1486, Ref #1433
I'm not sure if we should just switch to _always_ reading from sh
instead? I don't love that all these errors are strings, and if
`/bin/ls` exists but can't be parsed we still won't try `/bin/sh`. We
may want to address these things in the future.
Closes https://github.com/astral-sh/uv/issues/1395
## Summary
It looks like `devpi` might add an empty fragment (`#`) at the end of
the URL. We expect it to contain the hash; this just makes
empty-fragment map to "no hash".
Closes https://github.com/astral-sh/uv/issues/1441.
## Summary
If a distribution contains a `+`, it'll be HTML-escaped; so when we try
to identify the `#`, we'll split in the wrong location.
Closes https://github.com/astral-sh/uv/issues/1338.
Closes https://github.com/astral-sh/uv/issues/1388
Fixes incorrect handling of relative paths returned by indexes without
an explicit `<base>`.
`Url::join` will drop the last segment of a URL, e.g. `http://foo/bar` ->
`http://foo/baz`, if there is no trailing slash, but what we want is
`http://foo/bar/baz`. We don't add the trailing `/` in
`base_url_join_relative` because flat indexes are `http://foo/bar.html`
and we _want_ `bar.html` to be replaced.
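For illustration, `Url::join` from the `url` crate behaves like this:
```rust
use url::Url;

fn main() -> Result<(), url::ParseError> {
    // Without a trailing slash, the last segment is replaced...
    let base = Url::parse("http://foo/bar")?;
    assert_eq!(base.join("baz")?.as_str(), "http://foo/baz");

    // ...while a trailing slash keeps it and appends the relative path.
    let base = Url::parse("http://foo/bar/")?;
    assert_eq!(base.join("baz")?.as_str(), "http://foo/bar/baz");
    Ok(())
}
```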
## Summary
In a `requirements.txt` file, it turns out that the `-c` and `-r`
entries should be interpreted as relative to the file in which they're
declared, while the `-e` entries should be interpreted as relative to
the current working directory, no matter where they're defined.
Previously, we always used the current working directory; now, we use
the declaring file's directory for `-c` and `-r`.
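A minimal sketch of the rule (illustrative only, not the actual requirements parser):
```rust
use std::path::{Path, PathBuf};

/// `-r`/`-c` entries resolve against the directory of the declaring file,
/// while `-e` entries resolve against the current working directory.
fn resolve_entry(flag: &str, value: &str, declaring_file: &Path) -> PathBuf {
    match flag {
        "-r" | "-c" => declaring_file
            .parent()
            .unwrap_or_else(|| Path::new("."))
            .join(value),
        _ => PathBuf::from(value), // e.g. `-e`: left relative to the CWD
    }
}

fn main() {
    let file = Path::new("subdir/requirements.in");
    assert_eq!(
        resolve_entry("-r", "extra.txt", file),
        PathBuf::from("subdir/extra.txt")
    );
    assert_eq!(resolve_entry("-e", "../pkg", file), PathBuf::from("../pkg"));
}
```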
Closes https://github.com/astral-sh/uv/issues/1367.
Closes https://github.com/astral-sh/uv/issues/1416.
## Summary
Closes https://github.com/astral-sh/uv/issues/1402.
## Test Plan
Ran `cargo run pip install junos-eznc==2.6.5`, which still fails for me,
but fails identically to `pip` (and not on the `requires-python`):
```
/private/var/folders/nt/6gf2v7_s3k13zq_t3944rwz40000gn/T/.tmp7mxT9L/built-wheels-v0/pypi/ncclient/0.6.13/4vvPwmDC_CL2OUXd68Zqb/ncclient-0.6.13.tar.gz/versioneer.py:421: SyntaxWarning: invalid escape sequence '\s'
LONG_VERSION_PY['git'] = '''
Traceback (most recent call last):
File "<string>", line 10, in <module>
File "/private/var/folders/nt/6gf2v7_s3k13zq_t3944rwz40000gn/T/.tmplD5mMO/.venv/lib/python3.12/site-packages/setuptools/build_meta.py", line 366, in prepare_metadata_for_build_wheel
self.run_setup()
File "/private/var/folders/nt/6gf2v7_s3k13zq_t3944rwz40000gn/T/.tmplD5mMO/.venv/lib/python3.12/site-packages/setuptools/build_meta.py", line 480, in run_setup
super().run_setup(setup_script=setup_script)
File "/private/var/folders/nt/6gf2v7_s3k13zq_t3944rwz40000gn/T/.tmplD5mMO/.venv/lib/python3.12/site-packages/setuptools/build_meta.py", line 311, in run_setup
exec(code, locals())
File "<string>", line 45, in <module>
File "/private/var/folders/nt/6gf2v7_s3k13zq_t3944rwz40000gn/T/.tmp7mxT9L/built-wheels-v0/pypi/ncclient/0.6.13/4vvPwmDC_CL2OUXd68Zqb/ncclient-0.6.13.tar.gz/versioneer.py", line 1480, in get_version
return get_versions()["version"]
^^^^^^^^^^^^^^
File "/private/var/folders/nt/6gf2v7_s3k13zq_t3944rwz40000gn/T/.tmp7mxT9L/built-wheels-v0/pypi/ncclient/0.6.13/4vvPwmDC_CL2OUXd68Zqb/ncclient-0.6.13.tar.gz/versioneer.py", line 1412, in get_versions
cfg = get_config_from_root(root)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/private/var/folders/nt/6gf2v7_s3k13zq_t3944rwz40000gn/T/.tmp7mxT9L/built-wheels-v0/pypi/ncclient/0.6.13/4vvPwmDC_CL2OUXd68Zqb/ncclient-0.6.13.tar.gz/versioneer.py", line 342, in get_config_from_root
parser = configparser.SafeConfigParser()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: module 'configparser' has no attribute 'SafeConfigParser'. Did you mean: 'RawConfigParser'?
```
This PR improves the error message for the problem described in
https://github.com/astral-sh/uv/issues/1376. The original output
duplicates the actual error message and includes lots of noise
(`DirEntry { inner: DirEntry(...) }`).
```
$ uv pip install hexdump==3.3
error: Failed to download and build: hexdump==3.3
Caused by: Failed to extract source distribution: The top level of the archive must only contain a list directory, but it contains: [DirEntry { inner: DirEntry("/home/robin/.cache/uv/.tmpgSvTCk/__main__.py") }, DirEntry { inner: DirEntry("/home/robin/.cache/uv/.tmpgSvTCk/hexdump.py") }, DirEntry { inner: DirEntry("/home/robin/.cache/uv/.tmpgSvTCk/data") }, DirEntry { inner: DirEntry("/home/robin/.cache/uv/.tmpgSvTCk/PKG-INFO") }, DirEntry { inner: DirEntry("/home/robin/.cache/uv/.tmpgSvTCk/setup.py") }, DirEntry { inner: DirEntry("/home/robin/.cache/uv/.tmpgSvTCk/README.txt") }]
Caused by: The top level of the archive must only contain a list directory, but it contains: [DirEntry { inner: DirEntry("/home/robin/.cache/uv/.tmpgSvTCk/__main__.py") }, DirEntry { inner: DirEntry("/home/robin/.cache/uv/.tmpgSvTCk/hexdump.py") }, DirEntry { inner: DirEntry("/home/robin/.cache/uv/.tmpgSvTCk/data") }, DirEntry { inner: DirEntry("/home/robin/.cache/uv/.tmpgSvTCk/PKG-INFO") }, DirEntry { inner: DirEntry("/home/robin/.cache/uv/.tmpgSvTCk/setup.py") }, DirEntry { inner: DirEntry("/home/robin/.cache/uv/.tmpgSvTCk/README.txt") }]
```
This PR removes the duplication and `DirEntry` internals so that the
error message is easier to grasp:
```
$ uv pip install hexdump==3.3
error: Failed to download and build: hexdump==3.3
Caused by: Failed to extract source distribution
Caused by: The top level of the archive must only contain a list directory, but it contains: ["__main__.py", "hexdump.py", "data", "PKG-INFO", "setup.py", "README.txt"]
```
It's a little picky about the value, but that seems okay.
```
❯ ./target/debug/uv pip install trio
Audited 1 package in 4ms
❯ UV_NO_CACHE=true ./target/debug/uv pip install trio
Audited 1 package in 50ms
```
Closes #1382
First, replace all usages in files in-place. I used my editor for this.
If someone wants to add a one-liner that'd be fun.
Then, update directory and file names:
```
# Run twice for nested directories
find . -type d -print0 | xargs -0 rename s/puffin/uv/g
find . -type d -print0 | xargs -0 rename s/puffin/uv/g
# Update files
find . -type f -print0 | xargs -0 rename s/puffin/uv/g
```
Then add all the files again
```
# Add all the files again
git add crates
git add python/uv
# This one needs a force-add
git add -f crates/uv-trampoline
```
Instead of dropping versions without a compatible distribution, we track
them as incompatibilities in the solver. This implementation follows
patterns established in https://github.com/astral-sh/puffin/pull/1290.
This required some significant refactoring of how we track incompatible
distributions. Notably:
- `Option<TagPriority>` is now `WheelCompatibility` which allows us to
track the reason a wheel is incompatible instead of just `None`.
- `Candidate` now has a `CandidateDist` with `Compatible` and
`Incompatible` variants instead of just `ResolvableDist`; candidates
are not strictly compatible anymore
- `ResolvableDist` was renamed to `CompatibleDist`
- `IncompatibleWheel` was given an ordering implementation so we can
track the "most compatible" (but still incompatible) wheel. This allows
us to collapse the reason a version cannot be used to a single
incompatibility.
- The filtering in the `VersionMap` is retained; we still only store one
incompatible wheel per version. This is sufficient for error reporting.
- A `TagCompatibility` type was added for tracking which part of a wheel
tag is incompatible
- `Candidate::validate_python` moved to
`PythonRequirement::validate_dist`
I am doing more refactoring in #1298 — I think a couple passes will be
necessary to clarify the relationships of these types.
Includes improved error message snapshots for multiple incompatible
Python tag types from #1285 — we should add more scenarios for coverage
of behavior when multiple tags with different levels are present.
Mostly throwing this up here as a discussion topic. Having something
like this is primarily useful for enabling use cases similar to `rye
add` where I want to use this currently. One can accomplish something
similar with `unearth` today or by abusing regular `pip install`:
```
$ ~/.rye/self/bin/pip install --no-deps --dry-run flask --report - -q | jq '.install[0].metadata | {name, version}'
{
"name": "Flask",
"version": "3.0.2"
}
```
Another option would be to have a `puffin resolve` command or similar
that works like `pip compile` without dependencies, takes the
requirements as arguments and returns a line for each resolution. That
would be a larger change.
This rolls back the optimization in the previous commit to be more
general. That is, instead of specializing the case of a range for a
singleton version, we make iteration over the distributions in a
`VersionMap` more explicitly lazy. Iteration now provides a `Version`
(like it did previously) and a _handle_ to a distribution that can be
turned into a `ResolvableDist`.
Doing things this way permits callers to iterate over the versions and
only materialize a distribution if they actually need one. In cases like
candidate selection, one can often rule out use of a distribution
through its version alone, and thus skip construction of that
distribution entirely.
In many cases, version ranges are actually just pins to a
specific and single version. And we can detect that statically
by examining the range. If we do have a range that is just one
version, then we can ask a `VersionMap` for just that version
instead of iterating over what's in the map until we find one
that satisfies the range.
I had tried this before making `VersionMap` construction lazy,
but it didn't seem to matter much. But it helps a lot more now
with a lazy `VersionMap` because it lets us avoid creating a
lot of distributions in memory that we won't ultimately use.
That is, a `PrioritizedDistribution` for a specific version of a
package is not actually materialized in memory until a corresponding
`VersionMap::get` call is made for that version. Similarly, iteration
lazily materializes distributions as it moves through the map. It
specifically does not materialize everything first.
The main reason why this is effective is that an
`OwnedArchive<SimpleMetadata>` represents a zero-copy (other than
reading the source file) version of `SimpleMetadata` that is really just
a `Vec<u8>` internally. The problem with `VersionMap` construction
previously is that it had to eagerly materialize a `SimpleMetadata` in
memory before anything else, which defeats a large part of the purpose
of zero-copy deserialization. By making more of `VersionMap`
construction itself lazy, we permit doing some parts of resolution
without necessarily fully deserializing a `SimpleMetadata` into memory.
Indeed, with this commit, in the warm cached case, a `SimpleMetadata` is
itself never materialized fully in memory.
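A minimal sketch of that shape, with illustrative types standing in for `VersionMap` and the archived metadata:
```rust
use std::cell::OnceCell;
use std::collections::BTreeMap;

/// Raw bytes are stored per version; a "distribution" is only materialized
/// (and cached) the first time a caller asks for it.
struct LazyVersionMap {
    raw: BTreeMap<String, (Vec<u8>, OnceCell<String>)>,
}

impl LazyVersionMap {
    /// Iterate over versions without materializing any distribution.
    fn versions<'a>(&'a self) -> impl Iterator<Item = &'a str> + 'a {
        self.raw.keys().map(String::as_str)
    }

    /// Materialize the distribution for a single version, on demand.
    fn get<'a>(&'a self, version: &str) -> Option<&'a str> {
        let (bytes, cell) = self.raw.get(version)?;
        Some(
            cell.get_or_init(|| String::from_utf8_lossy(bytes).into_owned())
                .as_str(),
        )
    }
}
```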
This does not completely and totally fully realize the benefits of
zero-copy deserialization. For example, we are likely still building
lots of distributions in memory that we don't actually need in some
cases. Perhaps in cases where no resolution exists, or when one needs to
iterate over large portions of the total versions published for a
package.
This commit adds some logging to candidate selection during
resolution. The idea with these logs is to get a signal on
how much "exploring" the resolver does in specific examples.
For example, these logs helped me realize that at least in
some cases, candidate selection was looking through a long list
of versions even when its range consisted of exactly one
version. We'll use this fact in a later commit.
This makes cloning and thus sharing across multiple threads much
cheaper. Since Tags is conceptually immutable once it is constructed,
this doesn't pose an issue and shouldn't introduce any additional
costs.
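A minimal sketch of the pattern (the `Tags` type here is a stand-in):
```rust
use std::sync::Arc;

/// `Tags` is immutable once constructed, so sharing it behind an `Arc` makes
/// "cloning" a cheap reference-count bump rather than a deep copy.
struct Tags(Vec<String>);

fn main() {
    let tags = Arc::new(Tags(vec!["cp312-cp312-manylinux_2_28_x86_64".into()]));
    let for_worker = Arc::clone(&tags); // no copy of the underlying tag list
    let handle = std::thread::spawn(move || for_worker.0.len());
    assert_eq!(handle.join().unwrap(), tags.0.len());
}
```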
This is really annoying, but the snapshots keep changing indentation
when updated.
I could not get insta to update them. So I added a print statement to
`main` and updated the snapshots, then removed the statement and updated
the snapshots again to force them all to refresh.
We use
- An arbitrary ABI hash: `MMMMMM` (six base64 characters)
- An unlikely Jython27 Python tag
For cases that are valid but are never going to be available during
tests.
See https://github.com/zanieb/packse/pull/109
Moves yanked version filtering from `VersionMap::from_metadata` to the
resolver and tracks it as a PubGrub unavailable incompatibility so
yanked versions are reflected in error messages.
e.g. before
```
╰─▶ Because only albatross<=0.1.0 is available and you require albatross>0.1.0,
we can conclude that the requirements are unsatisfiable.
```
after
```
╰─▶ Because only the following versions of albatross are available:
albatross<=0.1.0
albatross==1.0.0
and albatross==1.0.0 is unusable because it was yanked, we can conclude that albatross>0.1.0 cannot be used.
And because you require albatross>0.1.0, we can conclude that the requirements are unsatisfiable.
```
## Summary
This PR adds an `--offline` flag to Puffin that disables network
requests (implemented as a Reqwest middleware on our registry client).
When `--offline` is provided, we also allow the HTTP cache to return
stale data.
Closes #942.
Updates our `--no-binary` option and adds a `--only-binary` option for
compatibility with `pip` which uses `:all:`, `:none:` and `<name>` for
specifying packages.
This required adding support for `--only-binary <name>` into our
resolver; previously, it was only a boolean toggle.
Retains `--no-build`, which is equivalent to `--only-binary :all:`. This
is common enough for safety that I would prefer it is available without
pip's awkward `:all:` syntax.
---------
Co-authored-by: konsti <konstin@mailbox.org>
## Summary
For PEP 517 builds, the current working directory needs to be set to the
directory of the source distribution. It turns out that on Windows, if
you use a UNC path for the working directory, then relative paths are
interpreted relative to the root of the current drive
([source](https://www.fileside.app/blog/2023-03-17_windows-file-paths/#paths-relative-to-the-root-of-the-current-drive)).
So, when builds attempted to resolve relative paths, they always
errored...
This PR ensures that we remove the UNC prefix when setting the current
working directory.
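A minimal sketch of the idea, stripping the verbatim (`\\?\`) prefix from such a path (illustrative; the actual fix may rely on an existing crate for this):
```rust
/// Strip the Windows verbatim prefix (`\\?\`) so that relative paths resolved
/// against the working directory behave as expected.
fn strip_verbatim_prefix(path: &str) -> &str {
    path.strip_prefix(r"\\?\").unwrap_or(path)
}

fn main() {
    assert_eq!(
        strip_verbatim_prefix(r"\\?\C:\Users\ferris\attrs-23.2.0"),
        r"C:\Users\ferris\attrs-23.2.0"
    );
    assert_eq!(strip_verbatim_prefix(r"C:\plain\path"), r"C:\plain\path");
}
```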
Closes #1238.
## Test Plan
I tested this on my Windows machine by installing `ujson` with
`--no-binary ujson`. (I don't want to add that specific test, since it's
really slow to build.)
Contrary to our prior assumption, we can't reliably select a specific
patch version. With the deadsnakes PPA for example, `python3.12` is
installed into `PATH`, but `python3.12.1` isn't. Based on the assumption
(or rather, observation) that users have a single python patch version
per python minor version installed, generally the latest, we only check
if the installed patch version matches the selected patch version, if
any, instead of searching for one.
In the process, I deduplicated the python discovery logic.
Run `cargo test` on windows in CI, pulling the switch on tier 1 windows
support.
These changes make the bootstrap script virtually required for running
the tests. This gives us consistency between local runs and CI, but it also
locks our tests to python-build-standalone and an artificial `PATH`.
I've deleted the shell bootstrap script in favor of only the python one,
which also runs on windows. I've left the (sym)link creation of the
bootstrap in place, even though it is not used by the tests anymore.
I've reactivated the three tests that would previously stack overflow by
doubling their stack sizes. The stack overflows only happen in debug
mode, so this is neither a user facing problem nor an actual problem
with our code, and this workaround seems better than optimizing our code
for a case that the (release) compiler can optimize much better anyway.
The handling of patch versions will be fixed in a follow-up PR.
Closes #1160, closes #1161
---------
Co-authored-by: Charlie Marsh <charlie.r.marsh@gmail.com>
In the process of making VersionMap construction lazy, I realized this
refactoring would be useful to me. It also simplifies a fair bit of case
analysis and does fewer BTreeMap lookups during construction. With that
said, this doesn't seem to matter for perf:
```
$ hyperfine -w10 --runs 50 \
"puffin-main pip compile --cache-dir ~/astral/tmp/cache-main ~/astral/tmp/reqs/home-assistant-reduced.in -o /dev/null" \
"puffin-test pip compile --cache-dir ~/astral/tmp/cache-test ~/astral/tmp/reqs/home-assistant-reduced.in -o /dev/null"
Benchmark 1: puffin-main pip compile --cache-dir ~/astral/tmp/cache-main ~/astral/tmp/reqs/home-assistant-reduced.in -o /dev/null
Time (mean ± σ): 146.8 ms ± 4.1 ms [User: 350.1 ms, System: 314.2 ms]
Range (min … max): 140.7 ms … 158.0 ms 50 runs
Benchmark 2: puffin-test pip compile --cache-dir ~/astral/tmp/cache-test ~/astral/tmp/reqs/home-assistant-reduced.in -o /dev/null
Time (mean ± σ): 146.8 ms ± 4.5 ms [User: 359.8 ms, System: 308.3 ms]
Range (min … max): 138.2 ms … 160.1 ms 50 runs
Summary
puffin-main pip compile --cache-dir ~/astral/tmp/cache-main ~/astral/tmp/reqs/home-assistant-reduced.in -o /dev/null ran
1.00 ± 0.04 times faster than puffin-test pip compile --cache-dir ~/astral/tmp/cache-test ~/astral/tmp/reqs/home-assistant-reduced.in -o /dev/null
```
But the simplification is still nice, and will decrease the delta
between what we have now and a lazy version map.
This PR reduces the stack sizes on windows a little further, using the
stack traces from stack overflows combined with looking at the type
sizes. Ultimately, it ignores the three remaining tests failing in debug
on windows due to stack overflows to unblock `cargo test` for windows on
CI.
444 tests run: 444 passed (39 slow), 1 skipped
We need to use the anstream print macros instead of the std print
macros, otherwise we risk wrong color behavior
(https://github.com/astral-sh/puffin/pull/1258#discussion_r1480428236).
Luckily, the `print_stderr` and `print_stdout` lints catch usages of the
std prints.
This PR switches over to anstream consistently and removes the now
redundant clippy lints. The lints should catch missing anstream usage in
the future.
Remove windows-only dependencies from the snapshot output using regex.
We now do the filtering entirely on our own, without relying on insta
settings.
435 tests run: 430 passed (30 slow), 5 failed, 1 skipped
There are no binary installers for the latest patch versions of cpython
for windows, and building them is hard. As an alternative, we download
python-build-standalone cpythons and put them into `<project
root>/bin`. On unix, we can symlink `pythonx.y.z` into this directory
and point `PUFFIN_PYTHON_PATH` to it. On windows, all pythons are called
`python.exe` and they don't like being linked. Instead, we add the path
to each directory containing a `python.exe` to `PUFFIN_PYTHON_PATH`,
similar to the regular `PATH`. The python discovery on windows was
extended to respect `PUFFIN_PYTHON_PATH` where needed.
These changes mean that we don't need to (sym)link pythons anymore and
could drop that part of the script.
435 tests run: 389 passed (21 slow), 46 failed, 1 skipped
## Summary
Open to other opinions here. We could just continue (and warn), prompt
the user with a confirmation, etc.
(The weird thing about those two options is we might need to validate
the command-line arguments _before_ we do that -- so you could get
errors for bad arguments, and then get a warning that your subcommand is
wrong. I can probably avoid that with more work if it feels like a
better outcome, though.)
Closes https://github.com/astral-sh/puffin/issues/1256.
## Summary
These add and remove dependencies from a `pyproject.toml` -- but they're
currently hidden, and don't match the rest of the workflow. We can
re-add them when the time is right.
Since unavailable packages with `--no-index` can be confusing when the
user does not also provide `--find-links`, we add a hint for this case.
Required some plumbing to get the required information to the
`NoSolution` error.
---------
Co-authored-by: konstin <konstin@mailbox.org>
(Please review this PR commit by commit.)
This PR closes an initial loop on zero-copy deserialization. That
is, it provides a way to get an `Archived<SimpleMetadata>` (spelled
`OwnedArchive<SimpleMetadata>` in the code) from a `CachedClient`. The
main benefit of zero-copy deserialization is that we can read bytes
from a file, cast those bytes to a structured representation without
cost, and then start using that type as any other Rust type. The
"catch" is that the structured representation is not the actual type
you started with, but the "archived" version of it.
In order to make all this work, we ended up needing to shave a rather
large yak: we had to re-implement HTTP cache semantics. Previously,
we were using the `http-cache-semantics` crate. While it does support
Serde, it doesn't support `rkyv`. Moreover, even simple support for
`rkyv` wouldn't be enough. What we actually want is for the HTTP cache
semantics to be implemented on the *archived* type so that we can
decide whether our cached response is stale or not without needing to
do a full deserialization into the unarchived type. This is why, in
this PR, you'll see `impl ArchivedCachePolicy { ... }` instead of
`impl CachePolicy { ... }`. (The `derive(rkyv::Archive)` macro
automatically introduces the `ArchivedCachePolicy` type into the
current namespace.)
Unfortunately, this PR does not fully realize the dream that is
zero-copy deserialization. Namely, while a `CachedClient` can now
provide an `OwnedArchive<SimpleMetadata>`, the rest of our code
doesn't really make use of it. Indeed, as soon as we go to build a
`VersionMap`, we eagerly convert our archived metadata into an owned
`SimpleMetadata` via deserialization (that *isn't* zero-copy). After
this change, a lot of the work now shifts to `rkyv` deserialization
and `VersionMap` construction. More precisely, the main thing we drop
here is `CachePolicy` deserialization (which is now truly zero-copy)
and the parsing of the MessagePack format for `SimpleMetadata`. But we
are still paying for deserialization. We're just paying for it in a
different place.
This PR does seem to bring a speed-up, but it is somewhat underwhelming.
My measurements have been pretty noisy, but I get a 1.1x speedup fairly
often:
```
$ hyperfine -w5 "puffin-main pip compile --cache-dir ~/astral/tmp/cache-main ~/astral/tmp/reqs/home-assistant-reduced.in -o /dev/null" "puffin-test pip compile --cache-dir ~/astral/tmp/cache-test ~/astral/tmp/reqs/home-assistant-reduced.in -o /dev/null" ; A kang
Benchmark 1: puffin-main pip compile --cache-dir ~/astral/tmp/cache-main ~/astral/tmp/reqs/home-assistant-reduced.in -o /dev/null
Time (mean ± σ): 164.4 ms ± 18.8 ms [User: 427.1 ms, System: 348.6 ms]
Range (min … max): 131.1 ms … 190.5 ms 18 runs
Benchmark 2: puffin-test pip compile --cache-dir ~/astral/tmp/cache-test ~/astral/tmp/reqs/home-assistant-reduced.in -o /dev/null
Time (mean ± σ): 148.3 ms ± 10.2 ms [User: 357.1 ms, System: 319.4 ms]
Range (min … max): 136.8 ms … 184.4 ms 19 runs
Summary
puffin-test pip compile --cache-dir ~/astral/tmp/cache-test ~/astral/tmp/reqs/home-assistant-reduced.in -o /dev/null ran
1.11 ± 0.15 times faster than puffin-main pip compile --cache-dir ~/astral/tmp/cache-main ~/astral/tmp/reqs/home-assistant-reduced.in -o /dev/null
```
One downside is that this does increase cache size (`rkyv`'s
serialization format is not as compact as MessagePack). On disk size
increases by about 1.8x for our `simple-v0` cache.
```
$ sort-filesize cache-main
4.0K cache-main/CACHEDIR.TAG
4.0K cache-main/.gitignore
8.0K cache-main/interpreter-v0
8.7M cache-main/wheels-v0
18M cache-main/archive-v0
59M cache-main/simple-v0
109M cache-main/built-wheels-v0
193M cache-main
193M total
$ sort-filesize cache-test
4.0K cache-test/CACHEDIR.TAG
4.0K cache-test/.gitignore
8.0K cache-test/interpreter-v0
8.7M cache-test/wheels-v0
18M cache-test/archive-v0
107M cache-test/simple-v0
109M cache-test/built-wheels-v0
242M cache-test
242M total
```
Also, while I initially intended to do a simplistic implementation of
HTTP cache semantics, I found that everything was somewhat
inter-connected. I could have written code that _specifically_ only worked
with the present behavior of PyPI, but then it would need to be special
cased and everything else would need to continue to use
`http-cache-semantics`. By implementing what we need based on what Puffin
actually is (which is still less than what `http-cache-semantics` does),
we can avoid special casing and use zero-copy deserialization for our
cache policy in _all_ cases.
Previously, whenever we encountered a missing package we would throw an
error without information about why the package was requested. This
meant that if a transitive dependency required a missing package, the
user would have no idea why it was even selected. Here, we track
`NotFound` and `NoIndex` errors as `NoVersions` incompatibilities with
an attached reason. Improves our test coverage for `--no-index` without
`--find-links`.
The
[snapshots](https://github.com/astral-sh/puffin/pull/1241/files#diff-3eea1658f165476252f1f061d0aa9f915aabdceafac21611cdf45019447f60ec)
show a nice improvement.
I think this will also enable backtracking to another version if some
version of a transitive dependency has a missing dependency. I'll write a
scenario for that next.
Requires https://github.com/zanieb/pubgrub/pull/22
Closes #884
e.g.
```
❯ cargo run -q -- pip compile --python-version 3.12 requirements.in
× No solution found when resolving dependencies:
╰─▶ Because the requested Python version (3.12) does not satisfy Python>=3.6,<3.10 and recommenders==1.0.0 depends on Python>=3.6,<3.9, we can conclude that recommenders==1.0.0 cannot be used.
And because only the following versions of recommenders are available:
recommenders<=0.7
recommenders==1.0.0
recommenders==1.1.0
recommenders==1.1.1
we can conclude that recommenders>0.7,<1.1.0 cannot be used. (1)
Because the requested Python version (3.12) does not satisfy Python>=3.6,<3.10 and recommenders>=1.1.0 depends on Python>=3.6,<3.10, we can conclude that recommenders>=1.1.0 cannot be used.
And because we know from (1) that recommenders>0.7,<1.1.0 cannot be used, we can conclude that recommenders>0.7 cannot be used.
And because you require recommenders>0.7, we can conclude that the requirements are unsatisfiable.
```
## Summary
Previously, we were blocking operations that could run in parallel. We
would send requests through our main requests channel but not yield, so
the receiver could only start processing requests much later than
necessary. We solve this by switching to the async
`tokio::sync::mpsc::channel`, where `send` is an async function that
yields.
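A minimal sketch of why the async channel helps (assuming tokio's `macros` and `rt-multi-thread` features; illustrative, not the resolver's actual channel wiring):
```rust
use tokio::sync::mpsc;

#[tokio::main]
async fn main() {
    let (tx, mut rx) = mpsc::channel::<u32>(16);
    let producer = tokio::spawn(async move {
        for request in 0..100u32 {
            // `send` is async: when the channel is full it yields to the
            // scheduler instead of blocking, so the receiver gets to run.
            tx.send(request).await.unwrap();
        }
    });
    while let Some(request) = rx.recv().await {
        let _ = request; // process the request
    }
    producer.await.unwrap();
}
```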
Due to the increased parallelism, cache deserialization and the
conversion from simple API responses to version maps became bottlenecks,
so I moved them to `spawn_blocking`. Together, these result in a 30-60%
speedup for larger warm-cache resolutions. Small cases such as black
already resolve in 5.7 ms on my machine, so there's no speedup to be
gained; refresh and no-cache runs were too noisy to get a signal from.
Note for the future: Revisit the bounded channel if we want to produce
requests from `process_request`, too, (this would be good for
prefetching) to avoid deadlocks.
## Details
We can look at the behavior change through the spans:
```
RUST_LOG=puffin=info TRACING_DURATIONS_FILE=target/traces/jupyter-warm-branch.ndjson cargo run --features tracing-durations-export --bin puffin-dev --profile profiling -- resolve jupyter 2> /dev/null
```
Below, you can see how on main, we have discrete phases: All (cached)
simple api requests in parallel, then all (cached) metadata requests in
parallel, repeat until done. The solver is mostly waiting until it has
its version map from the simple API query to be able to choose a
version. The main thread is blocked by process requests.
In the PR branch, the simple api requests succeeds much earlier,
allowing the solver to advance and also to schedule more prefetching.
Because of that, `parse_cache` and `from_metadata` became bottlenecks, so I
moved them off the main thread (green color, and their spans can now
overlap because they can run on multiple threads in parallel). The main
thread isn't blocked on `process_request` anymore, instead it has
frequent idle times. The spans are all much shorter, which indicates
that on main they could have finished much earlier, but a task didn't
yield so they weren't scheduled to finish (though I haven't dug deep
enough to understand the exact scheduling between the process request
stream and the solver here).
**main**

**PR**

## Benchmarks
```
$ hyperfine --warmup 3 "target/profiling/main-dev resolve jupyter" "target/profiling/branch-dev resolve jupyter"
Benchmark 1: target/profiling/main-dev resolve jupyter
Time (mean ± σ): 29.1 ms ± 0.7 ms [User: 22.9 ms, System: 11.1 ms]
Range (min … max): 27.7 ms … 32.2 ms 103 runs
Benchmark 2: target/profiling/branch-dev resolve jupyter
Time (mean ± σ): 18.8 ms ± 1.1 ms [User: 37.0 ms, System: 22.7 ms]
Range (min … max): 16.5 ms … 21.9 ms 154 runs
Summary
target/profiling/branch-dev resolve jupyter ran
1.55 ± 0.10 times faster than target/profiling/main-dev resolve jupyter
$ hyperfine --warmup 3 "target/profiling/main-dev resolve meine_stadt_transparent" "target/profiling/branch-dev resolve meine_stadt_transparent"
Benchmark 1: target/profiling/main-dev resolve meine_stadt_transparent
Time (mean ± σ): 37.8 ms ± 0.9 ms [User: 30.7 ms, System: 14.1 ms]
Range (min … max): 36.6 ms … 41.5 ms 79 runs
Benchmark 2: target/profiling/branch-dev resolve meine_stadt_transparent
Time (mean ± σ): 24.7 ms ± 1.5 ms [User: 47.0 ms, System: 39.3 ms]
Range (min … max): 21.5 ms … 28.7 ms 113 runs
Summary
target/profiling/branch-dev resolve meine_stadt_transparent ran
1.53 ± 0.10 times faster than target/profiling/main-dev resolve meine_stadt_transparent
$ hyperfine --warmup 3 "target/profiling/main pip compile scripts/requirements/home-assistant.in" "target/profiling/branch pip compile scripts/requirements/home-assistant.in"
Benchmark 1: target/profiling/main pip compile scripts/requirements/home-assistant.in
Time (mean ± σ): 229.0 ms ± 2.8 ms [User: 197.3 ms, System: 63.7 ms]
Range (min … max): 225.8 ms … 234.0 ms 13 runs
Benchmark 2: target/profiling/branch pip compile scripts/requirements/home-assistant.in
Time (mean ± σ): 91.4 ms ± 5.3 ms [User: 289.2 ms, System: 176.9 ms]
Range (min … max): 81.0 ms … 104.7 ms 32 runs
Summary
target/profiling/branch pip compile scripts/requirements/home-assistant.in ran
2.50 ± 0.15 times faster than target/profiling/main pip compile scripts/requirements/home-assistant.in
```
In the scenario tests, we want to make sure we're actually conforming to
the scenario's expectations, so we now have an extra assertion on
whether resolution failed or succeeded as well as that it includes the
given packages.
Closes #1112, closes #1030
We need more flexible filters than those `insta` offers, and `insta_cmd`
makes it impossible to plug in programmatic filters. At the same time we
use barely any of `insta_cmd`'s features. We can replace the subset we
need in about 50 loc.
Mostly a mechanical refactor to use the `puffin_snapshot!` and
`TestContext` infrastructure in the add, remove, venv and pip uninstall
tests, in preparation for adding programmatic windows testing filters.
There is only one remaining usage of `assert_cmd_snapshot!` now, in the
`puffin_snapshot!` macro.
Mostly a mechanical refactor to use the `puffin_snapshot!` and
`TestContext` infrastructure in the pip install and pip sync tests, in
preparation for adding programmatic windows testing filters.
Split out from the large test refactoring PR. Use `normalized_display`
in tests and two more thiserror derives to match snapshots and output,
and other small windows fixes.
## Summary
See: https://github.com/astral-sh/puffin/issues/1224
## Test Plan
Ran `python -m scripts.bench --puffin
scripts/requirements/compiled/jupyter.txt --min-runs 100 --benchmark
install-warm --verbose` several times, which failed eventually on `main`
but not on this branch.
Mostly a mechanical refactor to use the `puffin_snapshot!` and
`TestContext` infrastructure in the pip compile and pip install
scenarios, in preparation for adding programmatic windows testing
filters.
## Summary
Oops -- this was using a different cache key than the route above (this
is the wheel _metadata_ route vs. the wheel build route), so we were
saving and building source distributions twice in `pip install`.
I originally used Python 3.10, since 3.10 and 3.11 are by far the most
common (at least for [Ruff](https://pypistats.org/packages/ruff)). But
3.12 should give Python tools the most favorable benchmarks.
It turns out that the pattern I coded up for SimpleMetadataRaw is
generally useful when working with rkyv. This commit makes it generic by
supporting any type that implements rkyv's traits, and makes a few
simplifying assumptions by picking a concrete serializer, validator and
deserializer. In effect, this lets us own any archived value.
We also rejigger the API a little bit and double-down on
`OwnedArchive<A>` just being a owned wrapper for `Archived<A>`. Namely,
we implement `Deref` and turn its inherent methods into methods that
require fully qualified syntax. (As is standard for things that
implement `Deref` to avoid ambiguity with the deref target's methods.)
(This PR also makes a couple small simplifications to our custom rkyv
serializer since we no longer need to use it directly. We do still need
to name the type in trait bounds, so it has to be public.)
In preparation for the new windows handling, I want to introduce a
`TestContext` and `puffin_snapshot!` abstraction. This PR applies those
changes for pip-compile. My plan is to use those for all venv-based
integration tests and build the custom windows filters on top of
`puffin_snapshot!`.
## Summary
We have some flags in Puffin that enable us to opt-in to certain tests.
To date, they've been opt-in, so we've run our tests with
`--all-features`. This PR makes them opt-out, and we now run tests with
default features.
The main motivation here is that I want to get tests working for macOS
on CI, but for unknown reasons, macOS can't compile the PyO3 features at
the same time as everything else due to strange linker issues. By
avoiding `--all-features` for tests, we thus avoid unnecessarily
including features that we don't actually use in Puffin.
I verified that the exact same number of tests (439) are run before and
after this change. For users, the primary difference is that you now
need to specify `--no-default-features --features pypi --features
python` to avoid (e.g.) including the Git tests.
The `http-cache-semantics` crate is polymorphic on the types of requests
and responses it accepts. We had previously been explicitly converting
between `http` and `reqwest` types, but this isn't necessary. We can
provide impls of the traits in `http-cache-semantics` for `reqwest`'s
types (via a wrapper). This saves us from the awkward request/response
type conversions.
While this does clone the request, this is:
1. Not new. We were previously cloning the request to do the conversion.
2. An artifact (I believe) of http-cache-semantics API. (It kind of
seems like an API bug to me?)
There is also a little bit of messiness around inter-operating between
`http::uri::Uri` and `url::Url`. But overall it shouldn't be a big deal.
## Summary
This is an attempt to address https://github.com/astral-sh/puffin/pull/1163 by
removing the `WaitMap` and gaining more granular control over the values
that we hold over `await` boundaries.
## Summary
Like https://github.com/astral-sh/puffin/pull/1180, this PR adds logic
for `requirements.txt` parsing whereby if a requirement _looks like_ a
local requirements file or an editable directory, we prompt the user to
correct the error (typically, by adding `-r`).
Lacking windows-compatible aarch64 hardware, I cross-compiled the
trampoline from x86_64 linux to aarch64-pc-windows-msvc; I added the
instructions to the puffin-trampoline readme. With some testing on an
aarch64 windows machine, this should be sufficient to build working
win_arm64 tagged wheels.
i686-pc-windows-msvc is failing with an error:
```
error: linking with `lld-link` failed: exit status: 1
= note: lld-link: error: undefined symbol: __aulldiv
>>> referenced by libcompiler_builtins-2fb09dee087e9f64.rlib(compiler_builtins-2fb09dee087e9f64.compiler_builtins.597f0152646f1b8-cgu.0.rcgu.o):(compiler_builtins::int::specialized_div_rem::u128_div_rem::h06aed1e23a3f8f5c)
>>> referenced by libcompiler_builtins-2fb09dee087e9f64.rlib(compiler_builtins-2fb09dee087e9f64.compiler_builtins.597f0152646f1b8-cgu.0.rcgu.o):(compiler_builtins::int::specialized_div_rem::u128_div_rem::h06aed1e23a3f8f5c)
>>> referenced by libcompiler_builtins-2fb09dee087e9f64.rlib(compiler_builtins-2fb09dee087e9f64.compiler_builtins.597f0152646f1b8-cgu.0.rcgu.o):(compiler_builtins::int::specialized_div_rem::u128_div_rem::h06aed1e23a3f8f5c)
>>> referenced 4 more times
lld-link: error: undefined symbol: __aullrem
>>> referenced by libcompiler_builtins-2fb09dee087e9f64.rlib(compiler_builtins-2fb09dee087e9f64.compiler_builtins.597f0152646f1b8-cgu.0.rcgu.o):(compiler_builtins::int::specialized_div_rem::u128_div_rem::h06aed1e23a3f8f5c)
>>> referenced by libcompiler_builtins-2fb09dee087e9f64.rlib(compiler_builtins-2fb09dee087e9f64.compiler_builtins.597f0152646f1b8-cgu.0.rcgu.o):(compiler_builtins::int::specialized_div_rem::u128_div_rem::h06aed1e23a3f8f5c)
>>> referenced by libcompiler_builtins-2fb09dee087e9f64.rlib(compiler_builtins-2fb09dee087e9f64.compiler_builtins.597f0152646f1b8-cgu.0.rcgu.o):(compiler_builtins::int::specialized_div_rem::u128_div_rem::h06aed1e23a3f8f5c)
>>> referenced 4 more times
```
Instrument the main function as anchor span for checking overhead and
update tracing-durations-export to 0.2.0 for differentiating
blocking/non-blocking tasks.
Add a `jupyter.in` requirement since `pip install jupyter` is a common
operation. I tried `jupyterlab` too but there is no difference in
performance (1.00 ± 0.07).
Use `virtualenv` consistently, remove unused error variants and hint the
user towards installing missing python versions.
I didn't touch the Readme, but I replaced `virtualenv environment` with
`virtualenv` in the strings i found.
Fixes https://github.com/astral-sh/puffin/issues/1167
## Summary
See: https://github.com/astral-sh/puffin/issues/1181.
## Test Plan
```
❯ cargo run -- pip install packse@../../zanieb/packse
Finished dev [unoptimized + debuginfo] target(s) in 0.15s
Running `target/debug/puffin pip install 'packse@../../zanieb/packse'`
error: Distribution not found at: file:///Users/crmarsh/zanieb/packse
```
Make the test `compile_python_37` pass whether python 3.7 is installed
or not by muting the warning for a missing 3.7. The resolution error is
independent of whether 3.7 is installed or not.
## Summary
This PR adds support for `--find-links`, `--index-url`, and
`--extra-index-url` arguments when specified in a `requirements.txt`.
It's a mostly-straightforward change. The only uncertain piece is what
to do when multiple files include these flags, and/or when we include
them on the CLI and in other files.
In general:
- If _anything_ specifies `--no-index`, we respect it.
- We combine all `--extra-index-url` and `--find-links` across all
sources, since those are just vectors.
- If we see multiple `--index-url` in requirements files, we error.
- We respect the `--index-url` from the command line over any provided
in a requirements file.
(`pip-compile` seems to just pick one semi-arbitrarily when multiple are
provided.)
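A minimal sketch of those combination rules, with made-up types rather than uv's actual options structs:
```rust
/// Illustrative option set; not uv's actual types.
#[derive(Default)]
struct IndexOptions {
    no_index: bool,
    index_url: Option<String>,
    extra_index_urls: Vec<String>,
    find_links: Vec<String>,
}

fn combine(cli: IndexOptions, files: Vec<IndexOptions>) -> Result<IndexOptions, String> {
    let mut merged = cli;
    let mut file_index_url: Option<String> = None;
    for file in files {
        // If _anything_ specifies `--no-index`, respect it.
        merged.no_index |= file.no_index;
        // `--extra-index-url` and `--find-links` are simply accumulated.
        merged.extra_index_urls.extend(file.extra_index_urls);
        merged.find_links.extend(file.find_links);
        // Multiple `--index-url` values across requirements files is an error.
        if let Some(url) = file.index_url {
            if file_index_url.replace(url).is_some() {
                return Err("multiple `--index-url` values in requirements files".into());
            }
        }
    }
    // The command line wins over any `--index-url` from a requirements file.
    if merged.index_url.is_none() {
        merged.index_url = file_index_url;
    }
    Ok(merged)
}
```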
Closes https://github.com/astral-sh/puffin/issues/1143.
This adds what is effectively an owned wrapper around
`Archived<SimpleMetadata>`. Normally, an `Archived<SimpleMetadata>`
has to be used behind a pointer (since it has a lifetime
attached to its underlying byte buffer), but we create a
wrapper around it that owns the underlying buffer and provides
free access to the archived type.
This in effect creates an anchor point for the archived type
and lets us pass it around easily. (There has to be an anchor
point for it somewhere.)
An alternative to this approach would be to store it as a file
backed memory map. But in practice, we're dealing with small
files, and just reading them on to the heap is likely to be
faster. (Memory maps also have wildly different perf characteristics
across platforms.)
Note that this commit just defines the type. It isn't actually
used anywhere yet.
Less verbose span fields for `Dist`s by using the display impl and no
more min length in the tracing durations plot config for comparability
(we lose spans due to a speedup otherwise). Both wait points in the
solver loop are now instrumented so we can inspect what we're waiting
for to progress in the solver.
This PR migrates our source distribution downloads to unzip as we
stream, similar to our approach for wheels.
In my testing, this showed a consistent speedup (e.g., 6% here for a few
representative source distributions):
```text
❯ python -m scripts.bench --puffin-path ./target/release/main --puffin-path ./target/release/puffin --benchmark install-cold requirements.in
Benchmark 1: ./target/release/main (install-cold)
Time (mean ± σ): 1.503 s ± 0.039 s [User: 1.479 s, System: 0.537 s]
Range (min … max): 1.466 s … 1.605 s 10 runs
Benchmark 2: ./target/release/puffin (install-cold)
Time (mean ± σ): 1.421 s ± 0.024 s [User: 1.505 s, System: 0.593 s]
Range (min … max): 1.381 s … 1.454 s 10 runs
Summary
'./target/release/puffin (install-cold)' ran
1.06 ± 0.03 times faster than './target/release/main (install-cold)'
```
This PR adds initial support for [rkyv] to puffin. In particular,
the main aim here is to make puffin-client's `SimpleMetadata` type
possible to deserialize from a `&[u8]` without doing any copies. This
PR **stops short of actually doing that zero-copy deserialization**.
Instead, this PR is about adding the necessary trait impls to a variety
of types, along with a smattering of small refactorings to make rkyv
possible to use.
For those unfamiliar, rkyv works via the interplay of three traits:
`Archive`, `Serialize` and `Deserialize`. The usual flow of things is
this:
* Make a type `T` implement `Archive`, `Serialize` and `Deserialize`. rkyv
helpfully provides `derive` macros to make this pretty painless in most
cases.
* The process of implementing `Archive` for `T` *usually* creates an entirely
new distinct type within the same namespace. One can refer to this type
without naming it explicitly via `Archived<T>` (where `Archived` is a clever
type alias defined by rkyv).
* Serialization happens from `T` to (conceptually) a `Vec<u8>`. The
serialization format is specifically designed to reflect the in-memory layout
of `Archived<T>`. Notably, *not* `T`. But `Archived<T>`.
* One can then get an `Archived<T>` with no copying (albeit, we will likely
need to incur some cost for validation) from the previously created `&[u8]`.
This is quite literally [implemented as a pointer cast][rkyv-ptr-cast].
* The problem with an `Archived<T>` is that it isn't your `T`. It's something
else. And while there is limited interoperability between a `T` and an
`Archived<T>`, the main issue is that the surrounding code generally demands
a `T` and not an `Archived<T>`. **This is at the heart of the tension for
introducing zero-copy deserialization, and this is mostly an intrinsic
problem to the technique and not an rkyv-specific issue.** For this reason,
given an `Archived<T>`, one can get a `T` back via an explicit
deserialization step. This step is like any other kind of deserialization,
although generally faster since no real "parsing" is required. But it will
allocate and create all necessary objects.
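For illustration, here is that flow end to end, written against the rkyv 0.7-style API (the `Metadata` type is a stand-in, not uv's `SimpleMetadata`):
```rust
use rkyv::{Archive, Deserialize, Serialize};

/// Illustrative stand-in for `SimpleMetadata`.
#[derive(Archive, Serialize, Deserialize)]
struct Metadata {
    name: String,
    versions: Vec<String>,
}

fn main() {
    let value = Metadata {
        name: "flask".to_string(),
        versions: vec!["3.0.2".to_string()],
    };
    // Serialize into a buffer whose layout matches `Archived<Metadata>`.
    let bytes = rkyv::to_bytes::<_, 256>(&value).unwrap();
    // Reinterpret the buffer as the archived type without copying.
    // (Unchecked here only because we just produced these bytes ourselves.)
    let archived = unsafe { rkyv::archived_root::<Metadata>(&bytes) };
    assert_eq!(archived.name.as_str(), "flask");
    assert_eq!(archived.versions.len(), 1);
}
```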
This PR largely proceeds by deriving the three aforementioned traits
for `SimpleMetadata`. And, of course, all of its type dependencies. But
we stop there for now.
The main issue with carrying this work forward so that rkyv is actually
used to deserialize a `SimpleMetadata` is figuring out how to deal
with `DataWithCachePolicy` inside of the cached client. Ideally, this
type would itself have rkyv support, but adding it is difficult. The
main difficulty lay in the fact that its `CachePolicy` type is opaque,
not easily constructable and is internally the tip of the iceberg of
a rat's nest of types found in more crates such as `http`. While one
"dumb"-but-annoying approach would be to fork both of those crates
and add rkyv trait impls to all necessary types, it is my belief that
this is the wrong approach. What we'd *like* to do is not just use
rkyv to deserialize a `DataWithCachePolicy`, but we'd actually like to
get an `Archived<DataWithCachePolicy>` and make actual decisions used
the archived type directly. Doing that will require some work to make
`Archived<DataWithCachePolicy>` directly useful.
My suspicion is that, after doing the above, we may want to mush
forward with a similar approach for `SimpleMetadata`. That is, we want
`Archived<SimpleMetadata>` to be as useful as possible. But right
now, the structure of the code demands an eager conversion (and thus
deserialization) into a `SimpleMetadata` and then into a `VersionMap`.
Getting rid of that eagerness is, I think, the next step after dealing
with `DataWithCachePolicy` to unlock bigger wins here.
There are many commits in this PR, but most are tiny. I still encourage
review to happen commit-by-commit.
[rkyv]: https://rkyv.org/
[rkyv-ptr-cast]:
https://docs.rs/rkyv/latest/src/rkyv/util/mod.rs.html#63-68
## Summary
This is my guess as to the source of the resolver flake, based on
information and extensive debugging from @zanieb. In short, if we rely
on `self.index.packages` as a source of truth during error reporting, we
open ourselves up to a source of non-determinism, because we fetch
package metadata asynchronously in the background while we solve -- so
packages _could_ be included in or excluded from the index depending on
the order in which those requests are returned.
So, instead, we now track the set of packages that _were_ visited by the
solver. Visiting a package _requires_ that we wait for its metadata to
be available. By limiting analysis to those packages that were visited
during solving, we are faithfully representing the state of the solver
at the time of failure.
Closes #863
## Summary
We have this optimization in `wheel.rs`, in the installer, but it makes
a huge difference for zips with many small files:
```
Benchmarking file_reader/Django-5.0.1-py3-none-any.whl: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 74.2s, or reduce sample count to 10.
file_reader/Django-5.0.1-py3-none-any.whl
time: [751.63 ms 757.78 ms 764.27 ms]
change: [-1.0290% +0.0841% +1.2289%] (p = 0.88 > 0.05)
No change in performance detected.
Found 4 outliers among 100 measurements (4.00%)
4 (4.00%) high mild
Benchmarking buffered_reader/Django-5.0.1-py3-none-any.whl: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 53.4s, or reduce sample count to 10.
buffered_reader/Django-5.0.1-py3-none-any.whl
time: [529.86 ms 536.44 ms 543.35 ms]
change: [+0.0293% +1.5543% +3.1426%] (p = 0.05 > 0.05)
No change in performance detected.
Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) high mild
```
That's almost 30% faster...
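The change is essentially wrapping the file in a buffered reader before handing it to the zip reader; a minimal sketch:
```rust
use std::fs::File;
use std::io::BufReader;
use std::path::Path;

/// Wrap the wheel file in a `BufReader` so the zip reader's many small reads
/// hit an in-memory buffer rather than going to the OS each time.
fn open_wheel(path: &Path) -> std::io::Result<BufReader<File>> {
    Ok(BufReader::new(File::open(path)?))
}
```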
In Rust, `fs::copy` automatically preserves permissions (see:
https://doc.rust-lang.org/std/fs/fn.copy.html).
Elsewhere, when copying from the zip archive out to the cache, we can
set permissions during file creation, rather than as a separate call.
Both of these should be slightly more efficient.
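A minimal sketch of setting the mode at creation time on Unix (illustrative, not the exact installer code):
```rust
#[cfg(unix)]
fn create_with_mode(path: &std::path::Path, mode: u32) -> std::io::Result<std::fs::File> {
    use std::os::unix::fs::OpenOptionsExt;

    // Set the permissions at creation time instead of making a separate
    // `set_permissions` call after the file is written.
    std::fs::OpenOptions::new()
        .write(true)
        .create(true)
        .truncate(true)
        .mode(mode)
        .open(path)
}
```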
## Summary
When we migrated to an "unzip while we stream" solution, we lost the
logic to set permissions on the extracted files, so executables in
wheels were no longer executable. It turns out this is a little tricky,
since the permissions metadata is in the central directory at the _end_
of the zip file, and the async ZIP reader explicitly stops iteration
once it hits the central directory. (Specifically, it goes 4 bytes into
the central directory, since it sees the 4-byte signature header and
then stops.)
So, to solve that, I've added a `CentralDirectoryReader` that continues
where that iterator left off. This required forking the async zip crate:
https://github.com/charliermarsh/rs-async-zip/pull/1. It took a lot of
fiddling but I'm quite confident in the code now, especially since the
async zip crate validates the signature kind on every read.
The central directory is typically quite small (even for the Zig wheel,
which is enormous, it's just around 1MB), so I don't expect this to have
a high cost.
Closes https://github.com/astral-sh/puffin/issues/1148.
## Summary
This ensures that we warn when redundant options are passed (like
`--allow-unsafe`, which is really common for forwards compatibility
since it's going to be the default in a future release), and errors when
known variants are passed that we _don't_ support (like
`--resolver=backtracking`).
Closes https://github.com/astral-sh/puffin/issues/1127.
In https://github.com/astral-sh/puffin/pull/1040 we broke the pip
compile scenarios designed to test failure when a required Python
version is not available — resolution succeeded because all of the
Python versions were available in CI. Following #1105 we have the
ability to isolate tests from Python versions available in the system.
Here, we limit the scenarios to only the Python version in the current
environment, restoring our ability to test the error messages.
With https://github.com/zanieb/packse/pull/95, we will be able to
specify scenarios with access to additional system Python versions. This
will allow us to include test coverage where resolution can succeed by
using a version available elsewhere on the system. See #1111 for this
follow-up.
Replaces https://github.com/astral-sh/puffin/pull/1068 and #1070 which
were more complicated than I wanted.
- Introduces a `.python-versions` file which defines the Python versions
needed for development
- Adds a Bash script at `scripts/bootstrap/install` which installs the
required Python versions from `python-build-standalone` to `./bin`
- Checks in a `versions.json` file with metadata about available
versions on each platform and a `fetch-version` Python script derived
from `rye` for updating the versions
- Updates CI to use these Python builds instead of the `setup-python`
action
- Updates to the latest packse scenarios which require Python 3.8+
instead of 3.7+ since we cannot use 3.7 anymore and includes new test
coverage of patch Python version requests
- Adds a `PUFFIN_PYTHON_PATH` variable to prevent lookup of system
Python versions for isolation during development
Tested on Linux (via CI) and macOS (locally) — presumably it will be a
bit more complicated to do proper Windows support.
## Background
In virtual environments, we want to install python programs as console
commands, e.g. `black .` over `python -m black .`. They may be called
[entrypoints](https://packaging.python.org/en/latest/specifications/entry-points/)
or scripts. For entrypoints, we're given a module name and function to
call in that module.
On Unix, we generate a minimal Python script launcher. Text files can be
made runnable on Unix by adding a shebang at their top, e.g.
```python
#!/usr/bin/env python
```
will make the operating system run the file with the current python
interpreter. A venv launcher for black in `/home/ferris/colorize/.venv`
(module name: `black`, function to call: `patched_main`) would look like
this:
```python
#!/home/ferris/colorize/.venv/bin/python
# -*- coding: utf-8 -*-
import re
import sys
from black import patched_main
if __name__ == "__main__":
sys.argv[0] = re.sub(r"(-script\.pyw|\.exe)?$", "", sys.argv[0])
sys.exit(patched_main())
```
On Windows, this doesn't work; we can only rely on launching `.exe`
files.
## Summary
We use posy's Rust implementation of a trampoline, which is based on
distlib's C++ implementation. We pre-build a minimal exe and append the
launcher script as a stored zip archive behind it. The exe will look for
the venv Python interpreter next to it and use it to execute the
appended script.
The changes in this PR make the `black` entrypoint work:
```powershell
cargo run -- venv .venv
cargo run -q -- pip install black
.\.venv\Scripts\black --version
```
Integration with our existing tests will be done in follow-up PRs.
## Implementation and Details
I've vendored the posy trampoline crate. It is a version of
https://github.com/njsmith/posy/pull/28 that has been formatted, renamed,
and slightly modified for embedding.
The posy launchers are smaller than the distlib launchers, 16K vs 106K
for black. Currently only `x86_64-pc-windows-msvc` is supported. The
crate requires a nightly compiler for its no-std binary size tricks.
On Windows, an application can be launched with a console or without one
(to create windows instead), which requires two different launchers. The
GUI launcher will subsequently use `pythonw.exe`, while the console
launcher uses `python.exe`.
## Summary
Rather than checking cache freshness in the install plan, it's a lot
simpler to have the install plan _never_ return cached data when the
refresh policy is in place, and then rely on the distribution database
to check for freshness. The original implementation didn't support this,
since the distribution database was rebuilding things too often. Now, it
rarely rebuilds (it's much better about this), so it seems conceptually
much simpler to split up the responsibilities like this.
## Summary
This ensures that (like Cargo) we don't suffer from
https://github.com/advisories/GHSA-r5w3-xm58-jv6j, by way of checking
known hosts when fetching via `libgit2`.
The implementation is taken from Cargo itself, modified to remove all
configuration, since we don't yet support configuration for known hosts,
etc.
Closes #285.
## Summary
Use a single error type in `puffin_distribution`, rather than two
confusingly similar types between `DistributionDatabase` and the source
distribution module.
Also removes the `#[from]` for IO errors and replaces with explicit
wrapping, which is verbose but removes a bunch of incorrect error
messages.
This PR changes the error type to be boxed internally so that it takes up
less space on the stack. This makes functions returning `Result<T,
Error>`, in particular, return something much smaller.
The specific thing that motivated this was Clippy lints firing when I
tried to refactor code in this crate.
I chose to achieve boxing by splitting the enum out into a separate
type, and then wiring up the necessary `From` impl to make error
conversions easy, and then making `Error` itself opaque. We could expose
the `Box`, but there isn't a ton of benefit in doing so because one
cannot pattern match through a `Box`.
This required using more explicit error conversions in several places.
And as a result, I was able to remove all `#[from]` attributes on
non-transparent error variants.
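A rough sketch of the shape this produces (type and variant names here are illustrative, not the actual `puffin_distribution` definitions):
```rust
use std::io;

/// Opaque wrapper: the error is a single pointer on the stack.
#[derive(Debug, thiserror::Error)]
#[error(transparent)]
pub struct Error(Box<ErrorKind>);

/// The actual variants live behind the box.
#[derive(Debug, thiserror::Error)]
pub enum ErrorKind {
    #[error("failed to read distribution")]
    Io(#[source] io::Error),
    // ... other variants
}

impl From<ErrorKind> for Error {
    fn from(kind: ErrorKind) -> Self {
        Self(Box::new(kind))
    }
}

// Call sites convert explicitly, e.g.:
//     std::fs::read(path).map_err(ErrorKind::Io)?;
```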
Our existing detection doesn't work on Windows, because we canonicalize
the interpreter path but not `info.sys_executable`, so the former
includes the UNC prefix, etc. This is cross-platform and gets at the
intent of the check.
## Summary
This PR adds a `NormalizedDisplay` trait that we can use for user-facing
paths, to strip the UNC prefix on Windows.
On other platforms, the implementation is a no-op (vs. `Display`).
I audited all usages of `.display()`, and changed any that were
user-facing, either via `println!` or `eprintln!`, or by way of being
included in error messages. I did _not_ change uses that were only in
tests or only went to tracing.
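A minimal sketch of the idea (the trait and method names here match the description above, but the exact signature and behavior are assumptions):
```rust
use std::path::Path;

/// Strip the Windows verbatim (`\\?\`) prefix that canonicalization adds;
/// a no-op on other platforms. Returns a `String` for simplicity.
pub trait NormalizedDisplay {
    fn normalized_display(&self) -> String;
}

impl NormalizedDisplay for Path {
    fn normalized_display(&self) -> String {
        let display = self.display().to_string();
        if !cfg!(windows) {
            return display;
        }
        match display.strip_prefix(r"\\?\") {
            Some(stripped) => stripped.to_owned(),
            None => display,
        }
    }
}
```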
Closes https://github.com/astral-sh/puffin/issues/1084.
Windows uses `;` instead of `:` to separate `PATH` entries. This pull
request switches from manually using `:` to the `std::env` functions.
This fixes
```
puffin pip install -e scripts/editable-installs/maturin_editable
```
on Windows.
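For illustration, the standard library handles the separator for us (a sketch; the function here is hypothetical):
```rust
use std::env;
use std::ffi::OsString;
use std::path::PathBuf;

// Prepend an entry to PATH without hard-coding `:` or `;`:
// `split_paths`/`join_paths` use the platform's separator.
fn prepend_to_path(entry: PathBuf) -> Result<OsString, env::JoinPathsError> {
    let existing = env::var_os("PATH").unwrap_or_default();
    let paths = std::iter::once(entry).chain(env::split_paths(&existing));
    env::join_paths(paths)
}
```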
## Summary
When we unzip wheels in the cache, we write the directories out to an
`archive-v0` bucket, and then symlink into that bucket from the
`wheels-v0` and `built-wheels-v0` buckets.
On Windows, symlinks are not well supported. Specifically, they need to
be explicitly enabled by the user. So, instead of symlinks, we now use
junctions, which are well-supported on Windows, and allow you to
(effectively) symlink a directory to another directory. This PR
implements said junction support, which gets the core installer working
on Windows.
In the past, we also used symlinks to implement another primitive: we
wanted to be able to replace a directory "atomically" (I put
"atomically" in quotes because I don't know if it's actually a
guaranteed atomic operation), in case someone was trying to use the
directory while we were replacing it (as opposed to deleting the
directory, then moving it into place).
On Windows, it doesn't appear to be possible to atomically replace a
junction. So instead, I'm using a new design, whereby the cache always
returns canonicalized paths. We know these canonicalized paths are
unique and won't be replaced, so they're safe for writers to rely on. In
general, when we write new data to the cache, we now return the
canonicalized path. When we read from the cache, and try to identify
(e.g.) the set of wheels available to us, we canonicalize the links
immediately and consider them non-existent if that operation fails.
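A rough sketch of the directory-link primitive this implies (illustrative only; the Windows arm assumes the third-party `junction` crate rather than whatever the PR actually uses):
```rust
use std::io;
use std::path::Path;

// Link a directory into place: a symlink on Unix, a junction on Windows
// (junctions don't require the user to enable symlink support).
fn link_dir(src: &Path, dst: &Path) -> io::Result<()> {
    #[cfg(unix)]
    {
        std::os::unix::fs::symlink(src, dst)
    }
    #[cfg(windows)]
    {
        junction::create(src, dst)
    }
}
```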
Closes #1085.
---------
Co-authored-by: konstin <konstin@mailbox.org>
Requires https://github.com/zanieb/pubgrub/pull/20
In short, `UnusableDependencies` can be generalized into `Unavailable`,
which encompasses incompatibilities where a package range is unusable for
some inherent reason, as well as when its dependencies are unusable. We
can eventually use this to track more incompatibilities in
the solver. I made the reason string required because I can't see a case
where we should leave it out.
Additionally, this improves the display of conflicts in the root
requirements.
## Summary
It turns out this is significantly faster when reading (e.g.) _just_ the
`METADATA` file from a zipped wheel.
I audited other `File::open` usages, and everything else seems to be
using a buffered reader already (directly, or in whatever third-party
crate it's passed to) _or_ is read immediately in full.
See the criterion benchmark:
```
file_reader/numpy-1.26.3-pp39-pypy39_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
time: [6.9618 ms 6.9664 ms 6.9713 ms]
Found 4 outliers among 100 measurements (4.00%)
4 (4.00%) high mild
file_reader/flask-3.0.1-py3-none-any.whl
time: [237.50 µs 238.25 µs 239.13 µs]
Found 7 outliers among 100 measurements (7.00%)
3 (3.00%) high mild
4 (4.00%) high severe
buffered_reader/numpy-1.26.3-pp39-pypy39_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
time: [648.92 µs 653.85 µs 660.09 µs]
Found 4 outliers among 100 measurements (4.00%)
3 (3.00%) high mild
1 (1.00%) high severe
buffered_reader/flask-3.0.1-py3-none-any.whl
time: [39.578 µs 39.712 µs 39.869 µs]
Found 8 outliers among 100 measurements (8.00%)
3 (3.00%) high mild
5 (5.00%) high severe
```
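For reference, the change amounts to wrapping the file handle before handing it to the zip reader; a minimal sketch (using the synchronous `zip` crate purely for illustration):
```rust
use std::fs::File;
use std::io::{BufReader, Read};
use std::path::Path;

// Buffer reads so that pulling just `METADATA` out of a wheel doesn't
// issue a syscall per small read.
fn read_metadata(wheel: &Path) -> anyhow::Result<String> {
    let reader = BufReader::new(File::open(wheel)?);
    let mut archive = zip::ZipArchive::new(reader)?;
    let mut contents = String::new();
    for index in 0..archive.len() {
        let mut entry = archive.by_index(index)?;
        // Wheel metadata lives at `<name>-<version>.dist-info/METADATA`.
        if entry.name().ends_with(".dist-info/METADATA") {
            entry.read_to_string(&mut contents)?;
            break;
        }
    }
    Ok(contents)
}
```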
Follow-up to https://github.com/astral-sh/puffin/pull/1040 adding a
user-facing warning when we cannot build with their requested version.
e.g.
```
❯ cargo run -- pip compile requirements.in --python-version 3.11.4 --no-build
Resolved 8 packages in 483ms
❯ cargo run -- pip compile requirements.in --python-version 3.11.4
warning: The requested Python version 3.11.4 is not available; 3.11.7 will be used to build dependencies instead.
Resolved 8 packages in 71ms
❯ cargo run -- pip compile requirements.in --python-version 3.11
Resolved 8 packages in 71ms
```
## Summary
This PR uses `ctime` consistently on Unix as a more conservative
approach to change detection. It also ensures that our timestamp
abstraction is entirely internal, so we can change the representation
and logic easily across the codebase in the future.
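As a sketch of what the Unix side might look like (an assumed helper; the real timestamp abstraction is internal):
```rust
use std::io;
use std::path::Path;

// Use the inode change time (ctime) as the change token on Unix: it is
// bumped on metadata changes too, so it is more conservative than mtime.
#[cfg(unix)]
fn change_token(path: &Path) -> io::Result<(i64, i64)> {
    use std::os::unix::fs::MetadataExt;
    let metadata = std::fs::metadata(path)?;
    Ok((metadata.ctime(), metadata.ctime_nsec()))
}
```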
## Summary
First batch of changes for windows support. Notable changes:
* Fixes all compile errors and adds Windows-specific paths.
* Working venv creation on Windows, both from a base interpreter and
from a venv. This requires querying `stdlib` from the sysconfig paths to
find the launcher.
* Basic URL/path conversion handling for Windows.
* `if cfg!(...)` instead of `#[cfg()]`. This should make it easier to
keep everything compiling across platforms.
## Outlook
Test summary: 402 tests run: 299 passed (15 slow), 103 failed, 1 skipped
There are various reasons for the remaining test failures:
* Windows-specific colorama and tzdata dependencies that change the
snapshot slightly. This is by far the biggest batch.
* Some url-path handling issues. I fixed some in the PR, some remain.
* Lack of the latest Python patch versions for older Pythons on my
machine, since there are no builds for Windows and we need to register
them in the registry for them to be picked up by `py --list-paths` (CC
@zanieb RE #1070).
* Lack of entrypoint launchers.
* ... likely more
Extends #1029
Closes https://github.com/astral-sh/puffin/issues/1038
Instead of always using the current Python version for builds when a
target version is provided, we will do our best to use a compatible
Python version for builds.
Removes behavior where Python versions without patch versions were
always assumed to be the latest known patch version (previously
discussed in https://github.com/astral-sh/puffin/pull/534). While this
was convenient for resolutions which include packages which require
minimum patch versions e.g. `requires-python=">=3.7.4"`, it conflicts
with the idea that the target Python version you provide is the
_minimum_ compatible version. Additionally, it complicates interpreter
lookup as we cannot tell if the user has asked for that specific patch
version or not.
On Windows, `python3.9` and `python3.11` are not in `PATH`. Instead, we
should pass only the Python version to `puffin venv -p` in packse
scenarios (#1039).
This PR replaces a few uses of hash maps/sets with btree maps/sets and
index maps/sets. This has the benefit of guaranteeing a deterministic
order of iteration.
I made these changes as part of looking into a flaky test.
Unfortunately, I'm not optimistic that anything here will actually fix
the flaky test, since I don't believe anything was actually dependent
on the order of iteration.
## Summary
This PR is an alternative approach to #949 which should be much safer.
As in #949, we add a `Refresh` policy to the cache. However, instead of
deleting entries from the cache the first time we read them, we now check
whether the entry is sufficiently new (created after the start of the
command) when the refresh policy applies. If the entry is stale, then we
avoid reading it and continue onward, relying on the cache to
appropriately overwrite based on "new" data. (This relies on the
preceding PRs, which ensure the cache is append-only, and ensure that we
can atomically overwrite.)
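The freshness rule itself is simple; roughly (an illustrative sketch, using mtime as a stand-in for the entry's timestamp):
```rust
use std::io;
use std::path::Path;
use std::time::SystemTime;

// Under a refresh policy, a cache entry only counts as fresh if it was
// written after the current command started.
struct Refresh {
    command_start: SystemTime,
}

impl Refresh {
    fn is_fresh(&self, entry: &Path) -> io::Result<bool> {
        let timestamp = std::fs::metadata(entry)?.modified()?;
        Ok(timestamp >= self.command_start)
    }
}
```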
Unfortunately, there are just a lot of paths through the cache, and
different data is handled with different policies, so I really had to go
through and consider the "right" behavior for each case. For example,
the HTTP requests can use `max-age=0, must-revalidate`. But for the
routes that are based on filesystem modification, we need to do
something slightly different.
Closes #945.
## Summary
This PR ensures that we store HTTP caching information for wheels.
Previously, we only stored these for source distributions. This will be
helpful for refresh, since we can avoid re-downloading wheels that are
unchanged per HTTP caching semantics.
There should be zero performance hit here for warm installs, and only an
extremely small hit for cold installs (writing the HTTP cache data to
disk). The hyperfine benchmarks reflect this.
## Summary
If you send a revalidation request to a resource that returns an
`immutable` directive, the server apparently returns a 200 instead of a
304? In other words, the server can ignore the revalidation request.
This PR adds handling on top of the HTTP cache semantics to respect
immutable resources, which is especially useful since all PyPI files are
immutable.
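Conceptually, the added handling boils down to a check like this before revalidating (a sketch; the real logic sits on top of the HTTP cache semantics layer):
```rust
// Treat a cached response as permanently fresh if its Cache-Control header
// carried the `immutable` directive, instead of sending a revalidation
// request that the server may answer with a full 200.
fn is_immutable(cache_control: &str) -> bool {
    cache_control
        .split(',')
        .any(|directive| directive.trim().eq_ignore_ascii_case("immutable"))
}
```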
## Summary
One problem we have in the cache today is that we can't overwrite
entries atomically, because we store unzipped _directories_ in the cache
(which makes installation _much_ faster than storing zipped
directories). So, if you ignore the existing contents of the cache when
writing, you might run into an error, because you might attempt to write
a directory where a directory already exists.
This is especially annoying for cache refresh, because in order to
refresh the cache, we have to purge it (i.e., delete a bunch of stuff),
which is also highly unsafe if Puffin is running across multiple threads
or multiple processes.
The solution I'm proposing here is that whenever we persist a
_directory_ to the cache, we persist it to a special "archive" bucket.
Then, within the other buckets, directory entries are actually symlinks
into that "archive" bucket. With symlinks, we can atomically replace,
which means we can easily overwrite cache entries without having to
delete from the cache.
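The replacement primitive looks roughly like this on Unix (a sketch with an assumed helper name; handling of a pre-existing temporary link is omitted):
```rust
use std::io;
use std::path::Path;

// Atomically repoint `link` at `target`: create a temporary symlink, then
// rename it over the existing link in a single step.
#[cfg(unix)]
fn replace_symlink(target: &Path, link: &Path) -> io::Result<()> {
    let staging = link.with_extension("tmp");
    std::os::unix::fs::symlink(target, &staging)?;
    std::fs::rename(&staging, link)
}
```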
The main downside is that we'll now accumulate dangling entries in the
"archive" bucket, and so we'll need to implement some form of garbage
collection to ensure that we remove entries with no symlinks. Another
downside is that cache reads and writes will be a bit slower, since we
need to deal with creating and resolving these symlinks.
As an example... after this change, the cache entry for this unzipped
wheel is actually a symlink:

Then, within the archive directory, we actually have two unique entries
(since I intentionally ran the command twice to ensure overwrites were
safe):

Per https://apenwarr.ca/log/20181113, `ctime` should be a lot more
conservative, and should detect things like the issue we see with the
python-build-standalone builds, where the `mtime` is identical across
builds.
On Windows, I'm just using `last_write_time`. But we should probably add
`volume_serial_number` and other attributes via
[`winapi_util`](https://docs.rs/winapi-util/latest/winapi_util/index.html).
## Summary
This is a refactor of the source distribution cache that again aims to
make the cache purely additive. Instead of deleting all built wheels
when the cache gets invalidated (e.g., because the source distribution
changed on PyPI or something), we now treat each invalidation as its own
cache directory. The manifest inside of the source distribution
directory now becomes a pointer to the "latest" version of the source
distribution cache.
Here's a visual example:

With this change, we avoid deleting built distributions that might be
relied on elsewhere and maintain our invariant that the cache is purely
additive. The cost is that we now preserve stale wheels, but we should
add a garbage collection mechanism to deal with that.
## Summary
This PR gets rid of the manifest that we store for source distributions.
Historically, that manifest included the source distribution metadata,
plus a list of built wheels.
The problem with the manifest is that it duplicates state, since we now
have to look at both the manifest and the filesystem to understand the
cache state. Instead, I think we should treat the cache as the source of
truth, and get rid of the duplicated state in the manifest.
Now, we store the manifest (which is merely used to check for cache
freshness -- in future PRs, I will repurpose it though, so I left it
around), then the distribution metadata as its own file, then any
distributions in the same directory. When we want to see if there are
any valid distributions, we `readdir` on the directory. This is also
much more consistent with how the install plan works.
Mirroring `virtualenv -p` and driven by the lack of `pythonx.y` in
`PATH` on Windows, this PR adds `-p x.y` support to `puffin venv` (first
commit).
Supported formats:
* NEW: `-p 3.10` searches for an installed Python 3.10 (looking for
`python3.10` on Linux/macOS). Specifying a patch version is not supported.
* `-p python3.10` or `-p python.exe` looks for a binary in `PATH`
* `-p /home/ferris/.local/bin/python3.10` uses this exact Python
In the second commit, we add Python interpreter search on Windows using
`py --list-paths`. On Windows, all Pythons are called `python.exe`, so the
Unix trick of looking for `python{}.{}` in `PATH` doesn't work. Instead,
we ask the Python launcher for Windows to tell us about all installed
interpreters. We should eventually migrate this to [PEP
514](https://peps.python.org/pep-0514/) by reading the registry entries
ourselves.
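As a rough illustration of how the `-p` forms above can be distinguished (hypothetical types and function, not the actual puffin code):
```rust
use std::path::{PathBuf, MAIN_SEPARATOR};

/// Hypothetical request kinds mirroring the supported `-p` forms.
enum PythonRequest {
    /// `-p 3.10`: search for an installed Python 3.10.
    Version(u8, u8),
    /// `-p python3.10` / `-p python.exe`: look up a binary in `PATH`.
    ExecutableName(String),
    /// `-p /home/ferris/.local/bin/python3.10`: use this exact interpreter.
    ExactPath(PathBuf),
}

fn parse_python_request(arg: &str) -> PythonRequest {
    // `x.y` where both parts are plain numbers is a version request.
    if let Some((major, minor)) = arg
        .split_once('.')
        .and_then(|(a, b)| Some((a.parse::<u8>().ok()?, b.parse::<u8>().ok()?)))
    {
        return PythonRequest::Version(major, minor);
    }
    // Anything with a path separator is an explicit interpreter path.
    if arg.contains(MAIN_SEPARATOR) || arg.contains('/') {
        return PythonRequest::ExactPath(PathBuf::from(arg));
    }
    PythonRequest::ExecutableName(arg.to_string())
}
```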
Extends #1048, providing a more general interface that I think should be
standard.
Allows forcing colors to be on _or_ off. e.g. `NO_COLOR=1 pip install
pip-tools --color always` would be colored.
Hides the `--no-color` option as it only exists for compatibility (and
seems better than throwing an error when people assume it will exist).
Has a nice side-effect of documenting our coloring behaviors e.g.
```
--color <COLOR>
Control colors in output
[default: auto]
Possible values:
- auto: Enables colored output only when the output is going to a terminal or TTY with support
- always: Enables colored output regardless of the detected environment
- never: Disables colored output
```