Python/uv - uv - Gitea: Git with a cup of tea

Commit Graph

Author	SHA1	Message	Date
Zanie Blue	af39bbde75	Add long-form version output (#1930 ) Similar to https://github.com/astral-sh/ruff/pull/8034 Adds more version information so it's clear what revision the user is on ``` ❯ cargo run -q -- --version uv 0.1.10 (`daa8565a7` 2024-02-23) ❯ cargo run -q -- -V uv 0.1.10 ❯ cargo run -q -- version uv 0.1.10 (`daa8565a7` 2024-02-23) ❯ cargo run -q -- version --output-format json { "version": "0.1.10", "commit_info": { "short_commit_hash": "daa8565a7", "commit_hash": "daa8565a75249305821fdc34ace085060c082ba3", "commit_date": "2024-02-23", "last_tag": null, "commits_since_last_tag": 0 } } ```	2024-02-23 13:45:01 -06:00
Zanie Blue	daa8565a75	Bump version to 0.1.10 (#1923 )	2024-02-23 11:40:36 -06:00
Zanie Blue	fe1847561c	Retain authentication when making range requests (#1902 ) Needs https://github.com/prefix-dev/async_http_range_reader/pull/9 Closes https://github.com/astral-sh/uv/issues/1709	2024-02-23 15:21:10 +00:00
Charlie Marsh	0212cb72e9	Bump version to v0.1.9 (#1891 )	2024-02-23 01:32:48 +00:00
Charlie Marsh	aa73a4f0ea	Add support for `config_settings` in PEP 517 hooks (#1833 ) ## Summary Adds `--config-setting` / `-C` (with a `--config-settings` alias for convenience) to the CLI. Closes https://github.com/astral-sh/uv/issues/1460.	2024-02-23 00:53:45 +00:00
Zanie Blue	8a12b2ebf9	Ensure authentication is passed from the index url to distribution files (#1886 ) Closes https://github.com/astral-sh/uv/issues/1709 Closes https://github.com/astral-sh/uv/issues/1371 Tested with the reproduction provided in #1709 which gets past the HTTP 401. Reuses the same copying logic we introduced in https://github.com/astral-sh/uv/pull/1874 to ensure authentication is attached to file URLs with a realm that matches that of the index. I had to move the authentication logic into a new crate so it could be used in `distribution-types`. We will want to do something more robust in the future, like tracking all realms with authentication in a central store and performing lookups there. That's what `pip` does and it allows consolidation of logic like netrc lookups. That refactor feels significant though, and I'd like to get this fixed ASAP so this is a minimal fix.	2024-02-22 18:10:17 -06:00
Zanie Blue	f0b39a36b4	Bump version to 0.1.8 (#1880 )	2024-02-22 13:11:58 -06:00
Charlie Marsh	12462e5730	Bump version to v0.1.7 (#1851 )	2024-02-21 22:31:23 -05:00
Charlie Marsh	7eaed07f6c	Move conflicting dependencies into PubGrub (#1796 ) ## Summary This revives a PR from long ago (https://github.com/astral-sh/uv/pull/383 and https://github.com/zanieb/pubgrub/pull/24) that modifies how we deal with dependencies that are declared multiple times within a single package. To quote from the originating PR: > Uses an experimental pubgrub branch (#370) that allows us to handle multiple version ranges for a single dependency to the solver which results in better error messages because the derivation tree contains all of the relevant versions. Previously, the version ranges were merged (by us) in the resolver before handing them to pubgrub since only one range could be provided per package. Since we don't merge the versions anymore, we no longer give the solver an empty range for conflicting requirements; instead the solver comes to that conclusion from the provided versions. You can see the improved error message for direct dependencies in [this snapshot](https://github.com/astral-sh/puffin/pull/383/files#diff-a0437f2c20cde5e2f15199a3bf81a102b92580063268417847ec9c793a115bd0). The main issue with that PR was around its handling of URL dependencies, so this PR _also_ refactors how we handle those. Previously, we stored URL dependencies on `PubGrubPackage`, but they were omitted from the hash and equality implementations of `PubGrubPackage`. This led to some really careful codepaths wherein we had to ensure that we always visited URLs before non-URL packages, so that the URL-inclusive versions were included in any hashmaps, etc. I considered preserving this approach, but it would require us to rely on lots of internal details of PubGrub (since we'd now be relying on PubGrub to merge those packages in the "right" order). So, instead, we now _always_ set the URL on a given package, whenever that package was _given_ a URL upfront. 
I think this is easier to reason about: if the user provided a URL for `flask`, then we should just always add the URL for `flask`. If we see some other URL for `flask`, we error, like before. If we see some unknown URL for `flask`, we error, like before. Closes https://github.com/astral-sh/uv/issues/1522. Closes https://github.com/astral-sh/uv/issues/1821. Closes https://github.com/astral-sh/uv/issues/1615.	2024-02-21 21:27:58 -05:00
Micha Reiser	fac9d843dc	Normalize `VIRTUAL_ENV` path in activation scripts (#1817 )	2024-02-21 15:52:32 +00:00
Charlie Marsh	88a0c13865	Use async unzip for local source distributions (#1809 ) ## Summary We currently maintain separate untar methods for sync and async, but we only use the sync version when the user provides a local source distribution. (Otherwise, we untar as we download the distribution.) In my testing, this is actually slower anyway: ``` ❯ python -m scripts.bench \ --uv-path ./target/release/main \ --uv-path ./target/release/uv \ ./requirements.in --benchmark resolve-cold --min-runs 50 Benchmark 1: ./target/release/main (resolve-cold) Time (mean ± σ): 835.2 ms ± 107.4 ms [User: 346.0 ms, System: 151.3 ms] Range (min … max): 639.2 ms … 1051.0 ms 50 runs Benchmark 2: ./target/release/uv (resolve-cold) Time (mean ± σ): 750.7 ms ± 91.9 ms [User: 345.7 ms, System: 149.4 ms] Range (min … max): 637.9 ms … 905.7 ms 50 runs Summary './target/release/uv (resolve-cold)' ran 1.11 ± 0.20 times faster than './target/release/main (resolve-cold)' ```	2024-02-21 14:11:37 +00:00
Charlie Marsh	a2a1b2fb0f	Avoid enforcing URL correctness for installed distributions (#1793 ) ## Summary Allows the corresponding `pypi_types` struct to use any URL, since other installers can put those into the environment, and Poetry seems to write invalid URLs. If we see a distribution with an invalid URL, we just treat it as a registry distribution, which isn't ideal, but is better than (1) erroring, and (2) changing `Url` to `String` everywhere internally. (I'm torn on this second option.) Closes https://github.com/astral-sh/uv/issues/1744. ## Test Plan - Added `flask = { git = "git@github.com:pallets/flask.git", rev = "b90a4f1f4a370e92054b9cc9db0efcb864f87ebe" }` to `scripts/editable-installs/poetry_editable/pyproject.toml`. - Ran `poetry install`. - Ran `cargo pip freeze`. Verified that it errored on `main`, but passed here. - Ran `cargo run pip install "flask==3.0.0"`. Verified that it uninstalled the existing Flask, and installed a new version from the registry.	2024-02-21 09:06:31 -05:00
Zanie Blue	d07b587f3f	Retain passwords in Git URLs (#1717 ) Fixes handling of GitHub PATs in HTTPS URLs, which were otherwise dropped. We now support the following authentication schemes: ``` git+https://<user>:<token>/... git+https://<token>/... ``` On Windows, the username is required. We can consider adding a special-case for this in the future, but this just matches libgit2's behavior. I tested with fine-grained tokens, OAuth tokens, and "classic" tokens. There's test coverage for fine-grained tokens in CI where we use a real private repository and PAT. Yes, the PAT is committed to make this test usable by anyone. It has read-only permissions to the single repository, expires Feb 1 2025, and is in an isolated organization and GitHub account. Does not yet address SSH authentication. Related: - https://github.com/astral-sh/uv/issues/1514 - https://github.com/astral-sh/uv/issues/1452	2024-02-21 00:12:56 +00:00
konsti	2928c6e574	Backport changes from publish crates (#1739 ) Backport of changes for the published new versions of pep440_rs and pep508_rs to make it easier to keep them in sync.	2024-02-20 19:33:27 +01:00
Charlie Marsh	ede2828fde	Bump version to v0.1.6 (#1736 )	2024-02-20 12:22:26 -05:00
Di-Is	36edaeecf2	Control pip timeout duration via environment variable (#1694 ) ## Summary Add the environment variable `UV_REQUEST_TIMEOUT` to allow control over pip timeouts. Closes #1549 ## Test Plan I built uv in the repository top Dockerfile, set the timeout to 3 seconds, and ran `uv pip install torch`. I measured the execution time with the time command and confirmed that the process finished at a value close to the timeout we set. ```bash root@037c69228cdc:~# time UV_REQUEST_TIMEOUT=3 /uv pip install torch Resolved 22 packages in 25ms error: Failed to download distributions Caused by: Failed to fetch wheel: nvidia-cusolver-cu12==11.4.5.107 Caused by: Failed to extract source distribution Caused by: request or response body error: operation timed out Caused by: operation timed out real 0m3.064s user 0m0.225s sys 0m0.240s ```	2024-02-19 22:37:56 -06:00
Charlie Marsh	034f62b24f	Respect `--index-url` provided via requirements.txt (#1719 ) ## Summary When we read `--index-url` from a `requirements.txt`, we attempt to respect the `--index-url` provided by the CLI if it exists. Unfortunately, `--index-url` from the CLI has a default value... so we _never_ respect the `--index-url` in the requirements file. This PR modifies the CLI to use `None`, and moves the default into logic in the `IndexLocations` struct. Closes https://github.com/astral-sh/uv/issues/1692.	2024-02-20 00:02:26 +00:00
Alexander Gherm	4dfcf32e4c	Add shell completions generation (#1675 ) ## Summary Adds a CLI command / flag (`generate-shell-completion <SHELL>` / `--generate-shell-completion <SHELL>`) to generate the completion script for the given shell. Implemented in exactly the same way as it is done in ruff (https://github.com/astral-sh/ruff/blob/main/crates/ruff/src/lib.rs#L197) Closes https://github.com/astral-sh/uv/issues/1654 ## Test Plan I've only tested the generated script manually, for the bash shell on Ubuntu 22.04.3 ```bash $ uv --generate-shell-completion bash > /usr/share/bash-completion/completions/uv $ uv # <TAB> -q -h --verbose --no-cache --version clean -v -V --no-color --cache-dir pip generate-shell-completion -n --quiet --color --help venv help $ uv pip # <TAB> -q -n -V --verbose --color --cache-dir --version sync uninstall help -v -h --quiet --no-color --no-cache --help compile install freeze ```	2024-02-18 21:43:18 -06:00
Zanie Blue	07349e39e8	Bump version to v0.1.5 (#1671 )	2024-02-18 20:18:07 -06:00
Charlie Marsh	5cdc6de4a9	Add `CACHEDIR.TAG` to uv-created virtualenvs (#1653 ) ## Summary Just as we mark virtualenvs as `gitignore`d by default, we should also mark them as `CACHEDIR.TAG`, to ensure that they aren't included in backups, etc. Closes https://github.com/astral-sh/uv/issues/1648. ## Test Plan Ran `cargo run venv` and: ``` ❯ ls .venv CACHEDIR.TAG bin lib pyvenv.cfg ```	2024-02-18 13:32:11 -05:00
Charlie Marsh	ea62ae4ebd	Bump version to v0.1.4 (#1608 )	2024-02-17 15:22:07 -05:00
Charlie Marsh	facc60f3a8	Add graceful fallback for Artifactory indexes (#1574 ) ## Summary There are more details in https://github.com/astral-sh/uv/issues/1370, but it looks like Artifactory servers have incorrect behavior when it comes to HTTP range requests, in that they return `Accept-Ranges: bytes`, but then incorrectly return 200 responses when you actually ask for a given range. This PR ensures that we fall back gracefully in this case. It's built on https://github.com/prefix-dev/async_http_range_reader/pull/5. Assuming that gets merged upstream, we can then remove the Git dependency. Closes https://github.com/astral-sh/uv/issues/1370. ## Test Plan `cargo run pip install requests -i https://killjoyuvbug.jfrog.io/artifactory/api/pypi/pypi/simple --verbose`	2024-02-17 14:37:06 +00:00
Charlie Marsh	1110489c29	Bump version to v0.1.3 (#1557 )	2024-02-16 19:45:29 -05:00
Charlie Marsh	9e0336c28a	Remove URL encoding when determining file name (#1555 ) ## Summary Closes https://github.com/astral-sh/uv/issues/1553.	2024-02-16 19:15:24 -05:00
Charlie Marsh	4f216f3a74	Apply percent-decoding to filepaths in HTML find-links (#1544 ) ## Summary Closes https://github.com/astral-sh/uv/issues/1542.	2024-02-16 16:47:04 -05:00
Charlie Marsh	01ffc36520	Apply percent-decoding to file-based URLs (#1541 ) ## Summary Closes https://github.com/astral-sh/uv/issues/1537.	2024-02-16 16:11:16 -05:00
Zanie Blue	9737b93b79	Use the system trust store for HTTPS requests (#1512 ) Closes #1474 Using the `rustls-tls-native-roots` feature > `rustls-tls`: Enables TLS functionality provided by rustls. Equivalent to rustls-tls-webpki-roots. > > `rustls-tls-webpki-roots`: Enables TLS functionality provided by rustls, while using root certificates from the webpki-roots crate. > > `rustls-tls-native-roots`: Enables TLS functionality provided by rustls, while using root certificates from the rustls-native-certs crate. Additional context: - https://github.com/seanmonstar/reqwest/issues/1554 - https://github.com/encode/httpx/issues/302 - [Should I use the native certs or webpki-roots?](https://github.com/rustls/rustls-native-certs#should-i-use-this-or-webpki-roots) Prior discussion at https://github.com/astral-sh/uv/pull/609	2024-02-16 14:07:18 -05:00
Zanie Blue	2ea44d863a	Add warning for empty requirements files (#1519 ) Also, improve tracing of requirements file parsing. Per my confusion in #1334	2024-02-16 18:19:09 +00:00
Charlie Marsh	659327f24a	Bump version to v0.1.2 (#1439 )	2024-02-16 01:17:19 -05:00
Zanie Blue	e0885b7c8e	Bump version to 0.1.1 (#1359 )	2024-02-15 15:38:22 -06:00
Charlie Marsh	27177613d4	Bump version to v0.1.0 (#1325 )	2024-02-15 14:12:23 -05:00
Charlie Marsh	0579a04014	Bump to v0.0.5 for pre-release (#1324 ) This is easier than figuring out the version parsing.	2024-02-15 18:33:34 +00:00
Charlie Marsh	ad12d97e71	Set crate to prerelease (#1320 )	2024-02-15 18:21:09 +00:00
Charlie Marsh	06f2b6eee2	Bump version and update pyproject.toml metadata (#1316 ) Also ensures that we no longer clear the README when uploading to PyPI :)	2024-02-15 18:03:35 +00:00
Zanie Blue	2586f655bb	Rename to `uv` (#1302 ) First, replace all usages in files in-place. I used my editor for this. If someone wants to add a one-liner that'd be fun. Then, update directory and file names: ``` # Run twice for nested directories find . -type d -print0 | xargs -0 rename s/puffin/uv/g find . -type d -print0 | xargs -0 rename s/puffin/uv/g # Update files find . -type f -print0 | xargs -0 rename s/puffin/uv/g ``` Then add all the files again ``` # Add all the files again git add crates git add python/uv # This one needs a force-add git add -f crates/uv-trampoline ```	2024-02-15 11:19:46 -06:00
Andrew Gallant	8102980192	puffin-resolver: make VersionMap construction lazy That is, a `PrioritizedDistribution` for a specific version of a package is not actually materialized in memory until a corresponding `VersionMap::get` call is made for that version. Similarly, iteration lazily materializes distributions as it moves through the map. It specifically does not materialize everything first. The main reason why this is effective is that an `OwnedArchive<SimpleMetadata>` represents a zero-copy (other than reading the source file) version of `SimpleMetadata` that is really just a `Vec<u8>` internally. The problem with `VersionMap` construction previously is that it had to eagerly materialize a `SimpleMetadata` in memory before anything else, which defeats a large part of the purpose of zero-copy deserialization. By making more of `VersionMap` construction itself lazy, we permit doing some parts of resolution without necessarily fully deserializing a `SimpleMetadata` into memory. Indeed, with this commit, in the warm cached case, a `SimpleMetadata` is itself never materialized fully in memory. This does not completely and totally fully realize the benefits of zero-copy deserialization. For example, we are likely still building lots of distributions in memory that we don't actually need in some cases. Perhaps in cases where no resolution exists, or when one needs to iterate over large portions of the total versions published for a package.	2024-02-15 08:10:32 -05:00
Andrew Gallant	bdb491baf6	deps: bump pubgrub This brings in a [PR] that makes `Range::as_singleton` return a borrow. [PR]: https://github.com/zanieb/pubgrub/pull/23	2024-02-15 08:10:32 -05:00
Zanie Blue	b5dd8b7de2	Track yanked versions as incompatibilities (#1290 ) Moves yanked version filtering from `VersionMap::from_metadata` to the resolver and tracks it as a PubGrub unavailable incompatibility so yanked versions are reflected in error messages. e.g. before ``` ╰─▶ Because only albatross<=0.1.0 is available and you require albatross>0.1.0, we can conclude that the requirements are unsatisfiable. ``` after ``` ╰─▶ Because only the following versions of albatross are available: albatross<=0.1.0 albatross==1.0.0 and albatross==1.0.0 is unusable because it was yanked, we can conclude that albatross>0.1.0 cannot be used. And because you require albatross>0.1.0, we can conclude that the requirements are unsatisfiable. ```	2024-02-12 22:01:17 -06:00
Charlie Marsh	16bb80132f	Add an `--offline` mode (#1270 ) ## Summary This PR adds an `--offline` flag to Puffin that disables network requests (implemented as a Reqwest middleware on our registry client). When `--offline` is provided, we also allow the HTTP cache to return stale data. Closes #942.	2024-02-13 03:35:23 +00:00
Charlie Marsh	c75eef28b5	Upgrade to miette v6.0.0 (#1272 )	2024-02-11 23:23:27 -05:00
Charlie Marsh	ba4c6e1a55	Remove unused deps (#1273 )	2024-02-11 18:53:58 +00:00
Charlie Marsh	32aacc35a9	Bump version to v0.0.4 (#1269 )	2024-02-09 16:42:17 -05:00
konsti	ab45485eb5	Reduce stack sizes further and ignore remaining tests (#1261 ) This PR reduces the stack sizes on Windows a little further using the stack traces from stack overflows combined with looking at the type sizes. Ultimately, it ignores the three remaining tests failing in debug on Windows due to stack overflows to unblock `cargo test` for Windows on CI. 444 tests run: 444 passed (39 slow), 1 skipped	2024-02-06 23:08:18 +01:00
Charlie Marsh	62416286e2	Remove `add` and `remove` commands (#1259 ) ## Summary These add and remove dependencies from a `pyproject.toml` -- but they're currently hidden, and don't match the rest of the workflow. We can re-add them when the time is right.	2024-02-06 14:18:27 -05:00
Andrew Gallant	d4b4c21133	initial implementation of zero-copy deserialization for SimpleMetadata (#1249 ) (Please review this PR commit by commit.) This PR closes an initial loop on zero-copy deserialization. That is, provides a way to get a `Archived<SimpleMetadata>` (spelled `OwnedArchive<SimpleMetadata>` in the code) from a `CachedClient`. The main benefit of zero-copy deserialization is that we can read bytes from a file, cast those bytes to a structured representation without cost, and then start using that type as any other Rust type. The "catch" is that the structured representation is not the actual type you started with, but the "archived" version of it. In order to make all this work, we ended up needing to shave a rather large yak: we had to re-implement HTTP cache semantics. Previously, we were using the `http-cache-semantics` crate. While it does support Serde, it doesn't support `rkyv`. Moreover, even simple support for `rkyv` wouldn't be enough. What we actually want is for the HTTP cache semantics to be implemented on the archived type so that we can decide whether our cached response is stale or not without needing to do a full deserialization into the unarchived type. This is why, in this PR, you'll see `impl ArchivedCachePolicy { ... }` instead of `impl CachePolicy { ... }`. (The `derive(rkyv::Archive)` macro automatically introduces the `ArchivedCachePolicy` type into the current namespace.) Unfortunately, this PR does not fully realize the dream that is zero-copy deserialization. Namely, while a `CachedClient` can now provide an `OwnedArchive<SimpleMetadata>`, the rest of our code doesn't really make use of it. Indeed, as soon as we go to build a `VersionMap`, we eagerly convert our archived metadata into an owned `SimpleMetadata` via deserialization (that isn't zero-copy). After this change, a lot of the work now shifts to `rkyv` deserialization and `VersionMap` construction. 
More precisely, the main thing we drop here is `CachePolicy` deserialization (which is now truly zero-copy) and the parsing of the MessagePack format for `SimpleMetadata`. But we are still paying for deserialization. We're just paying for it in a different place. This PR does seem to bring a speed-up, but it is somewhat underwhelming. My measurements have been pretty noisy, but I get a 1.1x speedup fairly often: ``` $ hyperfine -w5 "puffin-main pip compile --cache-dir ~/astral/tmp/cache-main ~/astral/tmp/reqs/home-assistant-reduced.in -o /dev/null" "puffin-test pip compile --cache-dir ~/astral/tmp/cache-test ~/astral/tmp/reqs/home-assistant-reduced.in -o /dev/null" ; A kang Benchmark 1: puffin-main pip compile --cache-dir ~/astral/tmp/cache-main ~/astral/tmp/reqs/home-assistant-reduced.in -o /dev/null Time (mean ± σ): 164.4 ms ± 18.8 ms [User: 427.1 ms, System: 348.6 ms] Range (min … max): 131.1 ms … 190.5 ms 18 runs Benchmark 2: puffin-test pip compile --cache-dir ~/astral/tmp/cache-test ~/astral/tmp/reqs/home-assistant-reduced.in -o /dev/null Time (mean ± σ): 148.3 ms ± 10.2 ms [User: 357.1 ms, System: 319.4 ms] Range (min … max): 136.8 ms … 184.4 ms 19 runs Summary puffin-test pip compile --cache-dir ~/astral/tmp/cache-test ~/astral/tmp/reqs/home-assistant-reduced.in -o /dev/null ran 1.11 ± 0.15 times faster than puffin-main pip compile --cache-dir ~/astral/tmp/cache-main ~/astral/tmp/reqs/home-assistant-reduced.in -o /dev/null ``` One downside is that this does increase cache size (`rkyv`'s serialization format is not as compact as MessagePack). On disk size increases by about 1.8x for our `simple-v0` cache. 
``` $ sort-filesize cache-main 4.0K cache-main/CACHEDIR.TAG 4.0K cache-main/.gitignore 8.0K cache-main/interpreter-v0 8.7M cache-main/wheels-v0 18M cache-main/archive-v0 59M cache-main/simple-v0 109M cache-main/built-wheels-v0 193M cache-main 193M total $ sort-filesize cache-test 4.0K cache-test/CACHEDIR.TAG 4.0K cache-test/.gitignore 8.0K cache-test/interpreter-v0 8.7M cache-test/wheels-v0 18M cache-test/archive-v0 107M cache-test/simple-v0 109M cache-test/built-wheels-v0 242M cache-test 242M total ``` Also, while I initially intended to do a simplistic implementation of HTTP cache semantics, I found that everything was somewhat inter-connected. I could have written code that _specifically_ only worked with the present behavior of PyPI, but then it would need to be special cased and everything else would need to continue to use `http-cache-semantics`. By implementing what we need based on what Puffin actually is (which is still less than what `http-cache-semantics` does), we can avoid special casing and use zero-copy deserialization for our cache policy in _all_ cases.	2024-02-05 16:47:53 -05:00
Zanie Blue	d090acf13d	Improve error messaging when a dependency is not found (#1241 ) Previously, whenever we encountered a missing package we would throw an error without information about why the package was requested. This meant that if a transitive dependency required a missing package, the user would have no idea why it was even selected. Here, we track `NotFound` and `NoIndex` errors as `NoVersions` incompatibilities with an attached reason. Improves our test coverage for `--no-index` without `--find-links`. The [snapshots](https://github.com/astral-sh/puffin/pull/1241/files#diff-3eea1658f165476252f1f061d0aa9f915aabdceafac21611cdf45019447f60ec) show a nice improvement. I think this will also enable backtracking to another version if some version of transitive dependency has a missing dependency. I'll write a scenario for that next. Requires https://github.com/zanieb/pubgrub/pull/22	2024-02-05 08:43:05 -06:00
konsti	f10f902570	Yield after channel send and move cpu tasks to thread (#1163 ) ## Summary Previously, we were blocking operations that could run in parallel. We would send requests through our main requests channel, but not yield so that the receiver could only start processing requests much later than necessary. We solve this by switching to the async `tokio::sync::mpsc::channel`, where send is an async function that yields. Due to the increased parallelism, cache deserialization and the conversion from simple API request to version map became bottlenecks, so I moved them to `spawn_blocking`. Together these result in a 30-60% speedup for larger warm cache resolution. Small cases such as black already resolve in 5.7 ms on my machine so there's no speedup to be gained; refresh and no cache were too noisy to get signal from. Note for the future: Revisit the bounded channel if we want to produce requests from `process_request`, too, (this would be good for prefetching) to avoid deadlocks. ## Details We can look at the behavior change through the spans: ``` RUST_LOG=puffin=info TRACING_DURATIONS_FILE=target/traces/jupyter-warm-branch.ndjson cargo run --features tracing-durations-export --bin puffin-dev --profile profiling -- resolve jupyter 2> /dev/null ``` Below, you can see how on main, we have discrete phases: All (cached) simple API requests in parallel, then all (cached) metadata requests in parallel, repeat until done. The solver is mostly waiting until it has its version map from the simple API query to be able to choose a version. The main thread is blocked by process requests. In the PR branch, the simple API requests succeed much earlier, allowing the solver to advance and also to schedule more prefetching. Due to that `parse_cache` and `from_metadata` became bottlenecks, so I moved them off the main thread (green color, and their spans can now overlap because they can run on multiple threads in parallel). 
The main thread isn't blocked on `process_request` anymore, instead it has frequent idle times. The spans are all much shorter, which indicates that on main they could have finished much earlier, but a task didn't yield so they weren't scheduled to finish (though I haven't dug deep enough to understand the exact scheduling between the process request stream and the solver here). main ![jupyter-warm-main](https://github.com/astral-sh/puffin/assets/6826232/693c53cc-1090-41b7-b02a-a607fcd2cd99) PR ![jupyter-warm-branch](https://github.com/astral-sh/puffin/assets/6826232/33435f34-b39b-4b0a-a9d7-4bfc22f55f05) ## Benchmarks ``` $ hyperfine --warmup 3 "target/profiling/main-dev resolve jupyter" "target/profiling/branch-dev resolve jupyter" Benchmark 1: target/profiling/main-dev resolve jupyter Time (mean ± σ): 29.1 ms ± 0.7 ms [User: 22.9 ms, System: 11.1 ms] Range (min … max): 27.7 ms … 32.2 ms 103 runs Benchmark 2: target/profiling/branch-dev resolve jupyter Time (mean ± σ): 18.8 ms ± 1.1 ms [User: 37.0 ms, System: 22.7 ms] Range (min … max): 16.5 ms … 21.9 ms 154 runs Summary target/profiling/branch-dev resolve jupyter ran 1.55 ± 0.10 times faster than target/profiling/main-dev resolve jupyter $ hyperfine --warmup 3 "target/profiling/main-dev resolve meine_stadt_transparent" "target/profiling/branch-dev resolve meine_stadt_transparent" Benchmark 1: target/profiling/main-dev resolve meine_stadt_transparent Time (mean ± σ): 37.8 ms ± 0.9 ms [User: 30.7 ms, System: 14.1 ms] Range (min … max): 36.6 ms … 41.5 ms 79 runs Benchmark 2: target/profiling/branch-dev resolve meine_stadt_transparent Time (mean ± σ): 24.7 ms ± 1.5 ms [User: 47.0 ms, System: 39.3 ms] Range (min … max): 21.5 ms … 28.7 ms 113 runs Summary target/profiling/branch-dev resolve meine_stadt_transparent ran 1.53 ± 0.10 times faster than target/profiling/main-dev resolve meine_stadt_transparent $ hyperfine --warmup 3 "target/profiling/main pip compile scripts/requirements/home-assistant.in" 
"target/profiling/branch pip compile scripts/requirements/home-assistant.in" Benchmark 1: target/profiling/main pip compile scripts/requirements/home-assistant.in Time (mean ± σ): 229.0 ms ± 2.8 ms [User: 197.3 ms, System: 63.7 ms] Range (min … max): 225.8 ms … 234.0 ms 13 runs Benchmark 2: target/profiling/branch pip compile scripts/requirements/home-assistant.in Time (mean ± σ): 91.4 ms ± 5.3 ms [User: 289.2 ms, System: 176.9 ms] Range (min … max): 81.0 ms … 104.7 ms 32 runs Summary target/profiling/branch pip compile scripts/requirements/home-assistant.in ran 2.50 ± 0.15 times faster than target/profiling/main pip compile scripts/requirements/home-assistant.in ```	2024-02-02 18:18:24 +01:00
konsti	b16422a108	Remove insta_cmd (#1225 ) We need more flexible filters than those `insta` offers, and `insta_cmd` makes it impossible to plug in programmatic filters. At the same time we use barely any of `insta_cmd`'s features. We can replace the subset we need in about 50 loc.	2024-02-02 09:37:04 +00:00
Charlie Marsh	d77d129e8d	Run `cargo update` (#1230 )	2024-02-01 11:14:38 -05:00
Charlie Marsh	c4bfb6efee	Add a `BENCHMARKS.md` with rendered benchmarks (#1211 ) As a precursor to the release, I want to include a structured document with detailed benchmarks. Closes https://github.com/astral-sh/puffin/issues/1210.	2024-01-31 20:11:52 +00:00
Charlie Marsh	01258c1bb3	Report number of bytes deleted when clearing cache (#1203 ) ## Summary This is based on Cargo's `clean` implementation, with modifications based on some of my own preferences, and to better adhere to patterns we use in our codebase: ![Screenshot 2024-01-31 at 1 31 10 AM](https://github.com/astral-sh/puffin/assets/1309177/38704798-b17f-4972-ab67-00484ce63d62)	2024-01-31 10:48:28 -05:00
Charlie Marsh	b2f1bbaa63	Add a Ctrl+C handler to the confirm workflow (#1202 ) Fixes an issue whereby exiting the confirmation prompt can lead to your cursor disappearing: https://github.com/console-rs/dialoguer/issues/294. See: `b839a2c5b7/rye/src/main.rs (L36-L48)`.	2024-01-31 02:08:27 +00:00
Charlie Marsh	3f5e7306a5	Remove `WaitMap` dependency (#1183 ) ## Summary This is an attempt to https://github.com/astral-sh/puffin/pull/1163 by removing the `WaitMap` and gaining more granular control over the values that we hold over `await` boundaries.	2024-01-30 15:25:22 -05:00
Charlie Marsh	aa3b79ec63	Prompt user for missing `-r` and `-e` flags in `pip install` (#1180 ) ## Summary If the user runs a command like `pip install requirements.txt`, we now prompt them to ask if they meant to include the `-r` flag: ![Screenshot 2024-01-29 at 8 38 29 PM](https://github.com/astral-sh/puffin/assets/1309177/82b9f7a2-2526-4144-b200-a5015e5b8a4b) ![Screenshot 2024-01-29 at 8 38 33 PM](https://github.com/astral-sh/puffin/assets/1309177/bd8ebb51-2537-4540-a0e0-718e66a1c69c) The specific logic is: if the requirement ends in `.txt` or `.in`, and the file exists locally, prompt the user for `-r`. If the requirement contains a directory separator, and the directory exists locally, prompt the user for `-e`. Closes #1166.	2024-01-30 18:58:45 +00:00
konsti	614bb0cf52	Update async_http_range_reader to 0.5.0 (#1189 ) Removes a git dep and removes itertools 0.11	2024-01-30 16:32:53 +00:00
konsti	ab27913f68	Instrument the main function and add jupyter.in (#1186 ) Instrument the main function as anchor span for checking overhead and update tracing-durations-export to 0.2.0 for differentiating blocking/non-blocking tasks. Add a `jupyter.in` requirement since `pip install jupyter` is a common operation. I tried `jupyterlab` too but there is no difference in performance (1.00 ± 0.07).	2024-01-30 11:03:24 +00:00
Charlie Marsh	61a3060383	Run `cargo update` (#1178 )	2024-01-29 21:01:37 -05:00
Charlie Marsh	fa3c9afdc1	Deduplicate `pep440_rs` in dependency tree (#1177 ) ## Summary Closes https://github.com/astral-sh/puffin/issues/1176. ## Test Plan `cargo tree -p puffin -i pep440_rs` runs without error. Previously, it errored due to multiple versions.	2024-01-29 16:11:42 -05:00
Charlie Marsh	67a09649f2	Support parsing `--find-links`, `--index-url`, and `--extra-index-url` in `requirements.txt` (#1146 ) ## Summary This PR adds support for `--find-links`, `--index-url`, and `--extra-index-url` arguments when specified in a `requirements.txt`. It's a mostly-straightforward change. The only uncertain piece is what to do when multiple files include these flags, and/or when we include them on the CLI and in other files. In general: - If _anything_ specifies `--no-index`, we respect it. - We combine all `--extra-index-url` and `--find-links` across all sources, since those are just vectors. - If we see multiple `--index-url` in requirements files, we error. - We respect the `--index-url` from the command line over any provided in a requirements file. (`pip-compile` seems to just pick one semi-arbitrarily when multiple are provided.) Closes https://github.com/astral-sh/puffin/issues/1143.	2024-01-29 15:06:40 +00:00
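The precedence rules listed in #1146 can be sketched as a small merge function. This is a minimal sketch under stated assumptions: `IndexOptions` and `merge_index_options` are hypothetical names, and the sketch assumes that a second `--index-url` in requirements files is an error even when the CLI also provides one, which the commit message does not spell out.

```rust
/// Index-related options collected from one source (hypothetical type, for illustration).
#[derive(Debug, Default, Clone, PartialEq, Eq)]
pub struct IndexOptions {
    pub no_index: bool,
    pub index_url: Option<String>,
    pub extra_index_urls: Vec<String>,
    pub find_links: Vec<String>,
}

/// Merge the command-line options with the options found in requirements files.
pub fn merge_index_options(
    cli: &IndexOptions,
    files: &[IndexOptions],
) -> Result<IndexOptions, String> {
    let mut merged = cli.clone();
    let mut file_index_url: Option<&String> = None;

    for file in files {
        // If anything specifies `--no-index`, respect it.
        merged.no_index |= file.no_index;

        // `--extra-index-url` and `--find-links` are just vectors: combine them all.
        merged.extra_index_urls.extend(file.extra_index_urls.iter().cloned());
        merged.find_links.extend(file.find_links.iter().cloned());

        // Multiple `--index-url` entries across requirements files are an error.
        if let Some(url) = &file.index_url {
            if let Some(previous) = file_index_url {
                return Err(format!("multiple `--index-url` values: {previous} and {url}"));
            }
            file_index_url = Some(url);
        }
    }

    // The command line wins over any `--index-url` from a requirements file.
    if merged.index_url.is_none() {
        merged.index_url = file_index_url.cloned();
    }
    Ok(merged)
}
```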
Charlie Marsh	4b9daf9604	Use tokio_tar instead of async_tar (#1170 ) ## Summary `tokio_tar` is a fork of `async_tar` that uses Tokio instead of `async-std`. Using it removes a significant dependency from our tree. (There is an open PR (https://github.com/dignifiedquire/async-tar/pull/41) in `async-tar` to add Tokio support, but it's over a year old.) See: https://github.com/astral-sh/puffin/pull/1157#discussion_r1469190249.	2024-01-29 10:00:30 -05:00
Charlie Marsh	d88ce76979	Stream unpacking of source distribution downloads (#1157 ) This PR migrates our source distribution downloads to unzip as we stream, similar to our approach for wheels. In my testing, this showed a consistent speedup (e.g., 6% here for a few representative source distributions): ```text ❯ python -m scripts.bench --puffin-path ./target/release/main --puffin-path ./target/release/puffin --benchmark install-cold requirements.in Benchmark 1: ./target/release/main (install-cold) Time (mean ± σ): 1.503 s ± 0.039 s [User: 1.479 s, System: 0.537 s] Range (min … max): 1.466 s … 1.605 s 10 runs Benchmark 2: ./target/release/puffin (install-cold) Time (mean ± σ): 1.421 s ± 0.024 s [User: 1.505 s, System: 0.593 s] Range (min … max): 1.381 s … 1.454 s 10 runs Summary './target/release/puffin (install-cold)' ran 1.06 ± 0.03 times faster than './target/release/main (install-cold)' ```	2024-01-28 20:09:24 -05:00
Andrew Gallant	5219d37250	add initial rkyv support (#1135 ) This PR adds initial support for [rkyv] to puffin. In particular, the main aim here is to make puffin-client's `SimpleMetadata` type possible to deserialize from a `&[u8]` without doing any copies. This PR stops short of actually doing that zero-copy deserialization. Instead, this PR is about adding the necessary trait impls to a variety of types, along with a smattering of small refactorings to make rkyv possible to use. For those unfamiliar, rkyv works via the interplay of three traits: `Archive`, `Serialize` and `Deserialize`. The usual flow of things is this: * Make a type `T` implement `Archive`, `Serialize` and `Deserialize`. rkyv helpfully provides `derive` macros to make this pretty painless in most cases. * The process of implementing `Archive` for `T` usually creates an entirely new distinct type within the same namespace. One can refer to this type without naming it explicitly via `Archived<T>` (where `Archived` is a clever type alias defined by rkyv). * Serialization happens from `T` to (conceptually) a `Vec<u8>`. The serialization format is specifically designed to reflect the in-memory layout of `Archived<T>`. Notably, not `T`. But `Archived<T>`. * One can then get an `Archived<T>` with no copying (albeit, we will likely need to incur some cost for validation) from the previously created `&[u8]`. This is quite literally [implemented as a pointer cast][rkyv-ptr-cast]. * The problem with an `Archived<T>` is that it isn't your `T`. It's something else. And while there is limited interoperability between a `T` and an `Archived<T>`, the main issue is that the surrounding code generally demands a `T` and not an `Archived<T>`. This is at the heart of the tension for introducing zero-copy deserialization, and this is mostly an intrinsic problem to the technique and not an rkyv-specific issue. For this reason, given an `Archived<T>`, one can get a `T` back via an explicit deserialization step. 
This step is like any other kind of deserialization, although generally faster since no real "parsing" is required. But it will allocate and create all necessary objects. This PR largely proceeds by deriving the three aforementioned traits for `SimpleMetadata`. And, of course, all of its type dependencies. But we stop there for now. The main issue with carrying this work forward so that rkyv is actually used to deserialize a `SimpleMetadata` is figuring out how to deal with `DataWithCachePolicy` inside of the cached client. Ideally, this type would itself have rkyv support, but adding it is difficult. The main difficulty lies in the fact that its `CachePolicy` type is opaque, not easily constructable and is internally the tip of the iceberg of a rat's nest of types found in more crates such as `http`. While one "dumb"-but-annoying approach would be to fork both of those crates and add rkyv trait impls to all necessary types, it is my belief that this is the wrong approach. What we'd like to do is not just use rkyv to deserialize a `DataWithCachePolicy`, but we'd actually like to get an `Archived<DataWithCachePolicy>` and make actual decisions using the archived type directly. Doing that will require some work to make `Archived<DataWithCachePolicy>` directly useful. My suspicion is that, after doing the above, we may want to mush forward with a similar approach for `SimpleMetadata`. That is, we want `Archived<SimpleMetadata>` to be as useful as possible. But right now, the structure of the code demands an eager conversion (and thus deserialization) into a `SimpleMetadata` and then into a `VersionMap`. Getting rid of that eagerness is, I think, the next step after dealing with `DataWithCachePolicy` to unlock bigger wins here. There are many commits in this PR, but most are tiny. I still encourage review to happen commit-by-commit. [rkyv]: https://rkyv.org/ [rkyv-ptr-cast]: https://docs.rs/rkyv/latest/src/rkyv/util/mod.rs.html#63-68	2024-01-28 12:14:59 -05:00
Charlie Marsh	6f2c235d21	Avoid re-creating directories during unzip (#1154 ) ## Summary We have this optimization in `wheel.rs`, in the installer, but it makes a huge difference for zips with many small files: ``` Benchmarking file_reader/Django-5.0.1-py3-none-any.whl: Warming up for 3.0000 s Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 74.2s, or reduce sample count to 10. file_reader/Django-5.0.1-py3-none-any.whl time: [751.63 ms 757.78 ms 764.27 ms] change: [-1.0290% +0.0841% +1.2289%] (p = 0.88 > 0.05) No change in performance detected. Found 4 outliers among 100 measurements (4.00%) 4 (4.00%) high mild Benchmarking buffered_reader/Django-5.0.1-py3-none-any.whl: Warming up for 3.0000 s Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 53.4s, or reduce sample count to 10. buffered_reader/Django-5.0.1-py3-none-any.whl time: [529.86 ms 536.44 ms 543.35 ms] change: [+0.0293% +1.5543% +3.1426%] (p = 0.05 > 0.05) No change in performance detected. Found 3 outliers among 100 measurements (3.00%) 3 (3.00%) high mild ``` That's almost 30% faster...	2024-01-28 00:07:54 -05:00
Charlie Marsh	d6795da0ea	Set permissions after streaming unzip (#1151 ) ## Summary When we migrated to an "unzip while we stream" solution, we lost the logic to set permissions on the extracted files, so executables in wheels were no longer executable. It turns out this is a little tricky, since the permissions metadata is in the central directory at the _end_ of the zip file, and the async ZIP reader explicitly stops iteration once it hits the central directory. (Specifically, it goes 4 bytes into the central directory, since it sees the 4-byte signature header and then stops.) So, to solve that, I've added a `CentralDirectoryReader` that continues where that iterator left off. This required forking the async zip crate: https://github.com/charliermarsh/rs-async-zip/pull/1. It took a lot of fiddling but I'm quite confident in the code now, especially since the async zip crate validates the signature kind on every read. The central directory is typically quite small (even for the Zig wheel, which is enormous, it's just around 1MB), so I don't expect this to have a high cost. Closes https://github.com/astral-sh/puffin/issues/1148.	2024-01-27 19:22:44 -05:00
Charlie Marsh	50057cd5f2	Re-add Cargo's known hosts checking (#1118 ) ## Summary This ensures that (like Cargo) we don't suffer from https://github.com/advisories/GHSA-r5w3-xm58-jv6j, by way of checking known hosts when fetching via `libgit2`. The implementation is taken from Cargo itself, modified to remove all configuration, since we don't yet support configuration for known hosts, etc. Closes #285.	2024-01-25 22:29:36 -05:00
Charlie Marsh	77351c7874	Use snapshots for requirements.txt error tests (#1115 ) ## Summary I find these too difficult to edit and maintain. This brings them closer to the rest of our testing setups.	2024-01-25 20:35:52 -05:00
Charlie Marsh	5ad2e60561	Use `same-file` to detect interpreter shims (#1099 ) Our existing detection doesn't work on Windows, because we canonicalize the interpreter path but not `info.sys_executable`, so the former includes the UNC prefix, etc. This is cross-platform and gets at the intent of the check.	2024-01-25 12:27:49 -05:00
Charlie Marsh	f4939e50a6	Remove UNC prefixes on Windows (#1086 ) ## Summary This PR adds a `NormalizedDisplay` trait that we can use for user-facing paths, to strip the UNC prefix on Windows. On other platforms, the implementation is a no-op (vs. `Display`). I audited all usages of `.display()`, and changed any that were user-facing, either via `println!` or `eprintln!`, or by way of being included in error messages. I did _not_ change uses that were only in tests or only went to tracing. Closes https://github.com/astral-sh/puffin/issues/1084.	2024-01-25 11:44:22 -05:00
Zanie Blue	272555e915	Switch to ref on `main` for PubGrub (#1094 ) Just fixing the wrong merge order from https://github.com/astral-sh/puffin/pull/1088	2024-01-25 14:50:12 +00:00
Charlie Marsh	904db967af	Use junctions instead of symlinks on Windows (#1087 ) ## Summary When we unzip wheels in the cache, we write the directories out to an `archive-v0` bucket, and then symlink into that bucket from the `wheels-v0` and `built-wheels-v0` buckets. On Windows, symlinks are not well supported. Specifically, they need to be explicitly enabled by the user. So, instead of symlinks, we now use junctions, which are well-supported on Windows, and allow you to (effectively) symlink a directory to another directory. This PR implements said junction support, which gets the core installer working on Windows. In the past, we also used symlinks to implement another primitive: we wanted to be able to replace a directory "atomically" (I put "atomically" in quotes because I don't know if it's actually a guaranteed atomic operation), in case someone was trying to use the directory while we were replacing it (as opposed to deleting the directory, then moving it into place). On Windows, it doesn't appear to be possible to atomically replace a junction. So instead, I'm using a new design, whereby the cache always returns canonicalized paths. We know these canonicalized paths are unique and won't be replaced, so they're safe for writers to rely on. In general, when we write new data to the cache, we now return the canonicalized path. When we read from the cache, and try to identify (e.g.) the set of wheels available to us, we canonicalize the links immediately and consider them non-existent if that operation fails. Closes #1085. --------- Co-authored-by: konstin <konstin@mailbox.org>	2024-01-25 10:06:38 +01:00
Zanie Blue	ed1ac640b9	Consolidate `UnusableDependencies` into a generic `Unavailable` incompatibility (#1088 ) Requires https://github.com/zanieb/pubgrub/pull/20 In short, `UnusableDependencies` can be generalized into `Unavailable` which encompasses incompatibilities where a package range which is unusable for some inherent reason as well as when its dependencies are unusable. We can eventually use this to track more incompatibilities in the solver. I made the reason string required because I can't see a case where we should leave it out. Additionally, this improves the display of conflicts in the root requirements.	2024-01-24 22:10:44 -06:00
konsti	2e0ce70d13	Initial windows support (#940 ) ## Summary First batch of changes for windows support. Notable changes: * Fixes all compile errors and added windows specific paths. * Working venv creation on windows, both from a base interpreter and from a venv. This requires querying `stdlib` from the sysconfig paths to find the launcher. * Basic url/path conversion handling for windows. * `if cfg!(...)` instead of `#[cfg()]`. This should make it easier to keep everything compiling across platforms. ## Outlook Test summary: 402 tests run: 299 passed (15 slow), 103 failed, 1 skipped There are various reasons for the remaining test failures: * Windows-specific colorama and tzdata dependencies that change the snapshot slightly. This is by far the biggest batch. * Some url-path handling issues. I fixed some in the PR, some remain. * Lack of the latest python patch versions for older pythons on my machine, since there are no builds for windows and we need to register them in the registry for them to be picked up for `py --list-paths` (CC @zanieb RE #1070). * Lack of entrypoint launchers. * ... likely more	2024-01-24 18:27:49 +01:00
Charlie Marsh	0519375bd6	Remove some unused dependencies (#1077 )	2024-01-24 11:58:21 -05:00
Charlie Marsh	63f3434b21	Use nanoid instead of uuid (#1074 ) ## Summary Gives us equivalent randomness with ~half as many characters.	2024-01-24 05:05:14 +00:00
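The "equivalent randomness with ~half as many characters" claim in #1074 can be checked with a little arithmetic. The sketch below assumes nanoid's default settings (21 characters drawn from a 64-symbol alphabet) and a version 4 UUID in its 36-character hyphenated form with 122 random bits; the commit message does not say which ID length the project actually uses, and `entropy_bits` is a hypothetical helper.

```rust
/// Random bits carried by a version 4 UUID (128 bits minus the 6 fixed version/variant bits).
pub const UUID_V4_RANDOM_BITS: u32 = 122;
/// Length of the canonical hyphenated UUID string.
pub const UUID_STRING_LEN: u32 = 36;
/// Assumed nanoid defaults: 21 characters from a 64-symbol alphabet.
pub const NANOID_DEFAULT_LEN: u32 = 21;
pub const NANOID_ALPHABET_SIZE: u32 = 64;

/// Bits of entropy in an ID of `length` symbols drawn uniformly from `alphabet_size` symbols.
pub fn entropy_bits(alphabet_size: u32, length: u32) -> f64 {
    f64::from(length) * f64::from(alphabet_size).log2()
}
```

Under those assumptions, `entropy_bits(NANOID_ALPHABET_SIZE, NANOID_DEFAULT_LEN)` comes to 21 × 6 = 126 bits in 21 characters, against 122 bits in 36 characters for the UUID.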
Andrew Gallant	eebc2f340a	make some things guaranteed to be deterministic (#1065 ) This PR replaces a few uses of hash maps/sets with btree maps/sets and index maps/sets. This has the benefit of guaranteeing a deterministic order of iteration. I made these changes as part of looking into a flaky test. Unfortunately, I'm not optimistic that anything here will actually fix the flaky test, since I don't believe anything was actually dependent on the order of iteration.	2024-01-23 20:30:33 -05:00
Charlie Marsh	5621c414cf	Use symlinks for directories entries in cache (#1037 ) ## Summary One problem we have in the cache today is that we can't overwrite entries atomically, because we store unzipped _directories_ in the cache (which makes installation _much_ faster than storing zipped directories). So, if you ignore the existing contents of the cache when writing, you might run into an error, because you might attempt to write a directory where a directory already exists. This is especially annoying for cache refresh, because in order to refresh the cache, we have to purge it (i.e., delete a bunch of stuff), which is also highly unsafe if Puffin is running across multiple threads or multiple processes. The solution I'm proposing here is that whenever we persist a _directory_ to the cache, we persist it to a special "archive" bucket. Then, within the other buckets, directory entries are actually symlinks into that "archive" bucket. With symlinks, we can atomically replace, which means we can easily overwrite cache entries without having to delete from the cache. The main downside is that we'll now accumulate dangling entries in the "archive" bucket, and so we'll need to implement some form of garbage collection to ensure that we remove entries with no symlinks. Another downside is that cache reads and writes will be a bit slower, since we need to deal with creating and resolving these symlinks. As an example... after this change, the cache entry for this unzipped wheel is actually a symlink: ![Screenshot 2024-01-22 at 11 56 18 AM](https://github.com/astral-sh/puffin/assets/1309177/99ff6940-5096-4246-8d16-2a7bdcdd8d4b) Then, within the archive directory, we actually have two unique entries (since I intentionally ran the command twice to ensure overwrites were safe): ![Screenshot 2024-01-22 at 11 56 22 AM](https://github.com/astral-sh/puffin/assets/1309177/717d04e2-25d9-4225-b190-bad1441868c6)	2024-01-23 19:52:37 +00:00
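The "atomically replace a symlink" idea from #1037 is commonly implemented by creating a new symlink under a temporary name and renaming it over the old one. The sketch below is Unix-only and illustrative: `replace_symlink` is a hypothetical helper, not the function used in the PR, and the fixed `.tmp` suffix is a simplification (a real implementation would want a unique temporary name per writer).

```rust
use std::fs;
use std::io;
use std::os::unix::fs::symlink;
use std::path::Path;

/// Point `link` at `target`, replacing any existing symlink at `link`.
///
/// Readers see either the old link or the new one, never a missing entry,
/// because `rename` swaps the directory entry in a single step.
pub fn replace_symlink(target: &Path, link: &Path) -> io::Result<()> {
    let file_name = link
        .file_name()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "link has no file name"))?;

    // Build a sibling path such as `entry.tmp` (simplification: not unique per writer).
    let mut temp_name = file_name.to_os_string();
    temp_name.push(".tmp");
    let temp = link.with_file_name(temp_name);

    // Clear a leftover temporary link from an earlier, interrupted attempt.
    match fs::remove_file(&temp) {
        Ok(()) => {}
        Err(err) if err.kind() == io::ErrorKind::NotFound => {}
        Err(err) => return Err(err),
    }

    // Create the new link under the temporary name, then rename it into place.
    symlink(target, &temp)?;
    fs::rename(&temp, link)
}
```

The old target directory is left untouched in the "archive" bucket, which is why the commit message notes that dangling entries will need garbage collection.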
Charlie Marsh	6561617c56	Store source distribution builds under a unique manifest ID (#1051 ) ## Summary This is a refactor of the source distribution cache that again aims to make the cache purely additive. Instead of deleting all built wheels when the cache gets invalidated (e.g., because the source distribution changed on PyPI or something), we now treat each invalidation as its own cache directory. The manifest inside of the source distribution directory now becomes a pointer to the "latest" version of the source distribution cache. Here's a visual example: ![Screenshot 2024-01-22 at 5 35 41 PM](https://github.com/astral-sh/puffin/assets/1309177/ca103c83-e116-4956-b91c-8434fe62cffe) With this change, we avoid deleting built distributions that might be relied on elsewhere and maintain our invariant that the cache is purely additive. The cost is that we now preserve stale wheels, but we should add a garbage collection mechanism to deal with that.	2024-01-23 19:49:11 +00:00
konsti	1131341cbc	Support more formats in `puffin venv`, incl. windows support (#1039 ) Mirroring `virtualenv -p` and driven by the lack of `pythonx.y` in `PATH` on windows, this PR adds `-p x.y` support to `puffin venv` (first commit). Supported formats: * NEW: `-p 3.10` searches for an installed Python 3.10 (Looking for `python3.10` on linux/mac). Specifying a patch version is not supported * `-p python3.10` or `-p python.exe` looks for a binary in `PATH` * `-p /home/ferris/.local/bin/python3.10` uses this exact Python In the second commit, we add python interpreter search on windows using `py --list-paths`. On windows, all python binaries are called `python.exe` so the unix trick of looking for `python{}.{}` in `PATH` doesn't work. Instead, we ask the python launcher for windows to tell us about all installed python versions. We should eventually migrate this to [PEP 514](https://peps.python.org/pep-0514/) by reading the registry entries ourselves.	2024-01-23 15:35:07 +00:00
Charlie Marsh	b0e73d796c	Add support for PyPy wheels (#1028 ) ## Summary This PR adds support for PyPy wheels by changing the compatible tags based on the implementation name and version of the current interpreter. For now, we only support CPython and PyPy, and explicitly error out when given other interpreters. (Is this right? Should we just fallback to CPython tags...? Or skip the ABI-specific tags for unknown interpreters?) The logic is based on `4d85340613/src/packaging/tags.py (L247)`. Note, however, that `packaging` uses the `EXT_SUFFIX` variable from `sysconfig`... Instead, I looked at the way that PyPy formats the tags, and recreated them based on the Python and implementation version. For example, PyPy wheels look like `cchardet-2.1.7-pp37-pypy37_pp73-win_amd64.whl` -- so that's `pp37` for PyPy with Python version 3.7, and then `pypy37_pp73` for PyPy with Python version 3.7 and PyPy version 7.3. Closes https://github.com/astral-sh/puffin/issues/1013. ## Test Plan I tested this manually, but I couldn't find macOS universal PyPy wheels... So instead I added `cchardet` to a `requirements.in`, ran `cargo run pip sync requirements.in --index-url https://pypy.kmtea.eu/simple --verbose`, and added logging to verify that the platform tags matched (even if the architecture didn't).	2024-01-22 14:22:27 +00:00
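The tag format described in #1028 can be reproduced with two small formatting helpers. This is a sketch based only on the example in the commit message (`pp37` and `pypy37_pp73`); `pypy_python_tag` and `pypy_abi_tag` are hypothetical names, not the functions added by the PR.

```rust
/// Python tag for PyPy, e.g. `pp37` for PyPy implementing Python 3.7.
pub fn pypy_python_tag(python: (u8, u8)) -> String {
    format!("pp{}{}", python.0, python.1)
}

/// ABI tag for PyPy, e.g. `pypy37_pp73` for Python 3.7 on PyPy 7.3.
pub fn pypy_abi_tag(python: (u8, u8), pypy: (u8, u8)) -> String {
    format!("pypy{}{}_pp{}{}", python.0, python.1, pypy.0, pypy.1)
}
```

For the `cchardet-2.1.7-pp37-pypy37_pp73-win_amd64.whl` example, `pypy_python_tag((3, 7))` and `pypy_abi_tag((3, 7), (7, 3))` yield the second and third dash-separated tag components.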
Charlie Marsh	e09a51653e	Propagate cancellation errors in `OnceMap` (#1032 ) ## Summary Ensures that if an operation is cancelled in one thread, we propagate it to others rather than panicking. Related to https://github.com/astral-sh/puffin/issues/1005.	2024-01-22 09:00:21 -05:00
Charlie Marsh	b3954f2449	Enable PowerPC builds (#1017 ) Closes #1015.	2024-01-19 17:29:11 -05:00
Zanie Blue	33b35f7020	Add support for disabling installation from pre-built wheels (#956 ) Adds support for disabling installation from pre-built wheels i.e. the package must be built from source locally. We will still always use pre-built wheels for metadata during resolution. Available via `--no-binary` and `--no-binary-package <name>` flags in `pip install` and `pip sync`. There is no flag for `pip compile` since no installation happens there. ``` --no-binary Don't install pre-built wheels. When enabled, all installed packages will be installed from a source distribution. The resolver will still use pre-built wheels for metadata. --no-binary-package <NO_BINARY_PACKAGE> Don't install pre-built wheels for a specific package. When enabled, the specified packages will be installed from a source distribution. The resolver will still use pre-built wheels for metadata. ``` When packages are already installed, the `--no-binary` flag will have no effect without the `--reinstall` flag. In the future, I'd like to change this by tracking if a local distribution is from a pre-built wheel or a locally-built wheel. However, this is significantly more complex and different than `pip`'s behavior so deferring for now. For reference, `pip`'s flag works as follows: ``` --no-binary <format_control> Do not use binary packages. Can be supplied multiple times, and each time adds to the existing value. Accepts either ":all:" to disable all binary packages, ":none:" to empty the set (notice the colons), or one or more package names with commas between them (no colons). Note that some packages are tricky to compile and may fail to install when this option is used on them. ``` Note we are not matching the exact `pip` interface here because it seems complicated to use. I think we may want to consider adjusting our interface for this behavior since we're not entirely compatible anyway e.g. I think `--force-build` and `--force-build-package` are clearer names. 
We could also consider matching the `pip` interface or only allowing `--no-binary <package>` for compatibility. We can of course do whatever we want in our _own_ install interfaces later. Additionally, we may want to further consider the semantics of `--no-binary`. For example, if I run `pip install pydantic --no-binary` I expect _just_ Pydantic to be installed without binaries but by default we will build all of Pydantic's dependencies too. This work was prompted by #895, as it is much easier to measure performance gains from building source distributions if we have a flag to ensure we actually build source distributions. Additionally, this is a flag I have used frequently in production to debug packages that ship Cythonized wheels.	2024-01-19 11:24:27 -06:00
Charlie Marsh	5adb08a304	Allow relative paths and environment variables in all editable representations (#1000 ) ## Summary I don't know if this is actually a good change, but it tries to make the editable install experience more consistent. Specifically, we now support... ``` # Use a relative path with a `file://` prefix. # Prior to this PR, we supported `file:../foo`, but not `file://../foo`, which felt inconsistent. -e file://../foo # Use environment variables with paths, not just URLs. # Prior to this PR, we supported `file://${PROJECT_ROOT}/../foo`, but not the below. -e ${PROJECT_ROOT}/../foo ``` Importantly, `-e file://../foo` is actually not supported by pip... `-e file:../foo` _is_ supported though. We support both, as of this PR. Open to feedback.	2024-01-19 09:00:37 -05:00
Charlie Marsh	c8285cb5ef	Bump version to v0.0.3 (#999 )	2024-01-18 23:39:35 -05:00
Charlie Marsh	732ef7adb7	Bump version to v0.0.2 (#987 ) Bumping the version so that I can test the release process again (including PyPI publish).	2024-01-18 20:56:09 -05:00
Charlie Marsh	5e2b715366	Rename `puffin-cli` crate to `puffin` (#976 ) ## Summary Like in Ruff, this simplifies a few things.	2024-01-18 19:02:52 -05:00
Charlie Marsh	96a61fb351	Remove RFC2047 decoder (#967 ) ## Summary - This was inherited from `d719988323/src/metadata.rs (LL78C2-L91C26)` - ...which introduced this code here: `9cd1d43f7c` - ...with the originating issue here: https://github.com/PyO3/maturin/issues/612 - ...and the upstream issue here: https://github.com/staktrace/mailparse/issues/50 It seems like the goal was to support Unicode in certain header fields, but I don't think this is necessary for us. We only use `get_first_value` for `Requires-Python`, which has to be ASCII, doesn't it? In my testing, it seems like the `charset` hack can also be removed. The tests I copied over actually work without it, which makes me a bit skeptical. The main benefit here is that we get to remove a _big_ dependency stack, including Chumsky and Stacker and psm which have limited cross-platform support.	2024-01-18 15:09:45 -05:00
Charlie Marsh	bb55fc19b3	Run `cargo update` (#951 )	2024-01-17 13:15:47 -05:00
Charlie Marsh	b8fbd529a1	Move `OnceMap` into its own crate (#946 ) ## Summary This is extremely generic (like `WaitMap`), and I want to use it in the cache.	2024-01-17 04:09:15 +00:00
Charlie Marsh	9a3f3d385c	Remove `PubGrubVersion` (#924 ) ## Summary I'm running into some annoyances converting `&Version` to `&PubGrubVersion` (which is just a wrapper type around `Version`), and I realized... We don't even need `PubGrubVersion`? The reason we "need" it today is due to the orphan trait rule: `Version` is defined in `pep440_rs`, but we want to `impl pubgrub::version::Version for Version` in the resolver crate. Instead of introducing a new type here, which leads to a lot of awkwardness around conversion and API isolation, what if we instead just implement `pubgrub::version::Version` in `pep440_rs` via a feature? That way, we can just use `Version` everywhere without any confusion and conversion for the wrapper type.	2024-01-15 08:51:12 -05:00
konsti	e9b6b6fa36	Implement `--find-links` as flat indexes (directories in pip-compile) (#912 ) Add directory `--find-links` support for local paths to pip-compile. It seems that pip joins all sources and then picks the best package. We explicitly give find links packages precedence if the same exists on an index and locally by prefilling the `VersionMap`, otherwise they are added as another index and the existing rules of precedence apply. Internally, the feature is called _flat index_, which is more meaningful than _find links_: We're not looking for links, we're picking up local directories, and (TBD) support another index format that's just a flat list of files instead of a nested index. `RegistryBuiltDist` and `RegistrySourceDist` now use `WheelFilename` and `SourceDistFilename` respectively. The `File` inside `RegistryBuiltDist` and `RegistrySourceDist` gained the ability to represent both a url and a path so that `--find-links` with a url and with a path works the same, both being locked as `<package_name>@<version>` instead of `<package_name> @ <url>`. (This is more of a detail, this PR in general still works if we strip that and have directory find links represented as `<package_name> @ file:///path/to/file.ext`) `PrioritizedDistribution` and `FlatIndex` have been moved to locations where we can use them in the upstack PR. I added a `scripts/wheels` directory with stripped down wheels to use for testing. We're lacking tests for correct tag priority precedence with flat indexes, I only confirmed this manually since it is not covered in the pip-compile or pip-sync output. Closes #876	2024-01-15 02:04:10 +00:00
konsti	5ffbfadf66	Make hashes optional (#910 ) There is no guarantee that indexes provide hashes at all or the sha256 we support specifically. [PEP 503](https://peps.python.org/pep-0503/#specification): > The URL SHOULD include a hash in the form of a URL fragment with the following syntax: #<hashname>=<hashvalue>, where <hashname> is the lowercase name of the hash function (such as sha256) and <hashvalue> is the hex encoded digest. We instead use the url as input to generate a hash when caching.	2024-01-14 16:32:55 -05:00
Charlie Marsh	231686e71b	Remove `incompatibilities` from index (#905 ) This isn't really part of the "index", it's part of the resolution.	2024-01-13 02:57:15 +00:00
bojanserafimov	10227a74f8	Unzip while downloading (#856 )	2024-01-11 09:41:46 -05:00
Charlie Marsh	4123a35228	Run `cargo update` (#873 )	2024-01-11 09:10:07 -05:00
konsti	8c2b7d55af	Cleanup deps and docs (#882 ) Fix warnings from `cargo +nightly udeps` and `cargo doc`. Removes all mentions of regex from pep440_rs.	2024-01-11 10:43:40 +00:00
Zanie Blue	93d3093a2a	Improve formatting of package ranges in error messages (#864 ) Closes #810 Closes https://github.com/astral-sh/puffin/issues/812 Requires https://github.com/zanieb/pubgrub/pull/19 and https://github.com/zanieb/pubgrub/pull/18 - Always pair package ranges with names e.g. `... of a matching a<1.0` instead of `... of a matching <1.0` - Split range segments onto multiple lines when not a singleton as suggested in [#850](https://github.com/astral-sh/puffin/pull/850#discussion_r1446419610) - Improve formatting when ranges are split across multiple lines e.g. by avoiding extra spaces and improving wording Note review will require expanding the hidden files as there are significant changes to the report formatter and snapshots. Bear with me here as these are definitely not perfect still. The following changes build on top of this independently for further improvements: - #868 - #867 - #866 - #871	2024-01-10 14:16:23 -06:00
konsti	1203f8f9e8	Gourgeist updates (#862 ) * Use caching again * Make clap feature only required for the cli/bin optional	2024-01-09 23:04:15 +00:00
bojanserafimov	e67b7858e6	Use zlib-ng for faster decompression (#859 )	2024-01-09 16:13:36 -05:00
Zanie Blue	2b0c2e294b	Fix formatting of negated singleton versions in error messages (#836 ) Closes #805 Requires https://github.com/zanieb/pubgrub/pull/17	2024-01-08 12:33:01 -06:00
konsti	b6338b5e4a	Use tracing-durations-export to visualize parallelism bottlenecks (dev commands) (#816 ) Example usage: ``` # Cached TRACING_DURATIONS_FILE=target/traces/black.ndjson RUST_LOG=puffin=info cargo run --bin puffin-dev --profile profiling -- resolve black TRACING_DURATIONS_FILE=target/traces/meine_stadt_transparent.ndjson RUST_LOG=puffin=info cargo run --bin puffin-dev --profile profiling -- resolve meine_stadt_transparent TRACING_DURATIONS_FILE=target/traces/jupyter.ndjson RUST_LOG=puffin=info cargo run --bin puffin-dev --profile profiling -- resolve jupyter # No cache TRACING_DURATIONS_FILE=target/traces/black-no-cache.ndjson RUST_LOG=puffin=info cargo run --bin puffin-dev --profile profiling -- resolve --no-cache black TRACING_DURATIONS_FILE=target/traces/meine_stadt_transparent-no-cache.ndjson RUST_LOG=puffin=info cargo run --bin puffin-dev --profile profiling -- resolve --no-cache meine_stadt_transparent TRACING_DURATIONS_FILE=target/traces/jupyter-no-cache.ndjson RUST_LOG=puffin=info cargo run --bin puffin-dev --profile profiling -- resolve --no-cache jupyter ``` Uncached black output example: ![black-no-cache](https://github.com/astral-sh/puffin/assets/6826232/38497b89-7214-453b-9456-c9d9cbf7d2d5)	2024-01-08 16:20:38 +01:00
Charlie Marsh	54838914be	Migrate back to `owo-colors` (#824 ) In the past, I moved us to `owo-colors` (https://github.com/astral-sh/puffin/pull/121); then, we moved back, because we ran into issues with overriding the settings to force-disable colors. But `anstream` solved those problems, so I'm moving us _back_ to `owo-colors`, since it's what `anstream` recommends, and it's already used by many of our dependencies (`miette`, `configparser`). --------- Co-authored-by: konstin <konstin@mailbox.org>	2024-01-08 08:54:57 +00:00
Charlie Marsh	e6fcb9c4d3	Use `anstream` for all color control (#823 ) ## Summary We can use `anstream` for all color control, rather than going through `colored`. Note that we still need the `colored` crate, since `colored` and `anstream` solve different problems. (`anstream` recommends using `owo-colors` alongside it, but `colored` seems to work fine?) Resolves the issue raised in https://github.com/astral-sh/puffin/pull/742 via `anstream` rather than `colored`. Closes https://github.com/astral-sh/puffin/issues/782.	2024-01-06 20:44:05 -05:00
konsti	5820a9d937	Update dependencies (#794 ) Pull in a bunch of updates so they get some testing before we announce the project. textwrap 0.16 is blocked on miette updating, http 1.0 on reqwest.	2024-01-05 11:40:12 -05:00
konsti	673bece595	Allow `pip-compile` without a venv (#494 ) The semantics are a bit unintuitive because `--python-version` is a preference when looking for a python version without a venv, but if we don't find that exact version we'll take `python3` and patch the markers. This will make more sense once we start provisioning python builds. We can now resolve black with both python 3.8 and 3.12, with or without that python version being in scope. In the example below, `PATH=$HOME/.cargo/bin:/usr/bin` removes the pyenv builds and leaves only `python3`, which is python 3.11. ```console $ RUST_LOG=puffin::commands=debug cargo run --bin puffin -q -- pip-compile -v scripts/benchmarks/requirements/black.in --python-version py38 0.004108s DEBUG puffin::commands::pip_compile Using Python 3.8 at /home/konsti/.local/bin/python3.8 Resolved 8 packages in 44ms # This file was autogenerated by Puffin v0.0.1 via the following command: # puffin pip-compile -v scripts/benchmarks/requirements/black.in --python-version py38 black==23.11.0 [...] platformdirs==4.0.0 # via black tomli==2.0.1 # via black typing-extensions==4.8.0 # via black $ PATH=$HOME/.cargo/bin:/usr/bin RUST_LOG=puffin::commands=debug cargo run --bin puffin -q -- pip-compile -v scripts/benchmarks/requirements/black.in --python-version py38 0.004315s DEBUG puffin::commands::pip_compile Using Python 3.11 at /usr/bin/python3 Resolved 8 packages in 43ms # This file was autogenerated by Puffin v0.0.1 via the following command: # puffin pip-compile -v scripts/benchmarks/requirements/black.in --python-version py38 black==23.11.0 [...] 
platformdirs==4.0.0 # via black tomli==2.0.1 # via black typing-extensions==4.8.0 # via black ``` ```console $ RUST_LOG=puffin::commands=debug cargo run --bin puffin -q -- pip-compile -v scripts/benchmarks/requirements/black.in --python-version py312 0.004216s DEBUG puffin::commands::pip_compile Using Python 3.12 at /home/konsti/.local/bin/python3.12 Resolved 6 packages in 37ms # This file was autogenerated by Puffin v0.0.1 via the following command: # puffin pip-compile -v scripts/benchmarks/requirements/black.in --python-version py312 black==23.11.0 [...] platformdirs==4.0.0 # via black $ PATH=$HOME/.cargo/bin:/usr/bin RUST_LOG=puffin::commands=debug cargo run --bin puffin -q -- pip-compile -v scripts/benchmarks/requirements/black.in --python-version py312 0.004190s DEBUG puffin::commands::pip_compile Using Python 3.11 at /usr/bin/python3 Resolved 6 packages in 39ms # This file was autogenerated by Puffin v0.0.1 via the following command: # puffin pip-compile -v scripts/benchmarks/requirements/black.in --python-version py312 black==23.11.0 [...] platformdirs==4.0.0 # via black ``` Fixes #235. Co-authored-by: Charlie Marsh <charlie.r.marsh@gmail.com>	2024-01-05 15:01:06 +00:00
Zanie Blue	5e04a95c45	Disable line wrapping during scenario tests (#784 ) Adds support for a `PUFFIN_NO_WRAP` environment variable which disables line wrapping in `miette` output. We set this variable in the scenario tests to improve the readability of snapshots. I contributed the ability to disable line wrapping upstream at https://github.com/zkat/miette/pull/328	2024-01-04 19:07:16 +00:00
konsti	2db9135c51	Update pubgrub to 78b8add6942766e5fb070bbda1de570e93d6399f (#783 ) Pull in the latest perf improvements	2024-01-04 15:55:35 +00:00
konsti	cd43708369	Flag to force latest version in resolve-many (#741 ) Also fixes color when redirecting puffin-dev to a log file.	2024-01-02 11:04:26 +00:00
konsti	3f8dc9f5bb	Update pubgrub (#737 ) Pull in https://github.com/pubgrub-rs/pubgrub/pull/170 and https://github.com/pubgrub-rs/pubgrub/pull/171	2023-12-28 21:13:27 +00:00
Charlie Marsh	188ab75769	Split `File` into internal and external type (#729 ) ## Summary This PR makes the `pypi_types::File` a response-only type (i.e., a type that's only used when deserializing over the wire), and adds a separate internal `File` type. Right now, the representations are similar, but already, we can avoid the "lenient" deserialization on our internal `File` type, and avoid the special-casing of the property names that's required in the JSON. Over time, we can evolve this representation entirely separately from the representation we receive from PyPI and other indexes.	2023-12-25 15:42:28 -05:00
Charlie Marsh	6ff21374dc	Split `puffin-cache` into Puffin-specific and generic utilities (#728 ) This crate started off as generic caching utilities, but we started adding a lot of Puffin-specific stuff (like the cache buckets abstraction that knows about Git vs. direct URL vs. indexes and so on). This PR moves the generic stuff into a new `cache-key` crate.	2023-12-25 14:38:56 +00:00
Charlie Marsh	187ccef4e1	Cache `Tags` on `Interpreter` (#726 )	2023-12-25 13:41:10 +00:00
Charlie Marsh	5b2e381f87	Remove `platform-tags` dependency on `puffin-interpreter` (#725 ) Cuts off a large internal dependency chain from what is otherwise a very general crate.	2023-12-24 23:06:50 +00:00
Charlie Marsh	343880820b	Un-escape HTML entities when decoding (#723 ) I don't have a good testing strategy here (I'm manually testing against `devpi` via `packse`), but the HTML index uses (e.g.) `data-requires-python="&gt;=3.8"`, so we need to decode.	2023-12-24 16:35:45 -05:00
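As a rough illustration of the decoding step described in #723 (not the project's Rust implementation), Python's standard `html.unescape` shows what un-escaping does to an attribute value as it appears in the raw HTML of an index page. The sample value is an assumption modelled on the commit message.

```python
import html

# Raw attribute value as it would appear in the HTML source of an index page
# (sample value assumed for illustration).
raw_requires_python = "&gt;=3.8"

# Un-escape HTML entities so the value can be parsed as a version specifier.
requires_python = html.unescape(raw_requires_python)
print(requires_python)
```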
konsti	e23292641f	Add pypi 10k packages with most dependents dataset (#711 ) From manual inspection, this dataset generated through the [libraries.io API](https://libraries.io/api#project-search) seems more mainstream than the current 8k one, which is also preserved. I've added the dataset to the repo because the API requires an API key.	2023-12-24 18:31:52 +00:00
Charlie Marsh	5bce699ee1	Add support for HTML indexes (#719 ) ## Summary This PR adds support for HTML index responses (as with `--index-url=https://download.pytorch.org/whl`). Closes https://github.com/astral-sh/puffin/issues/412.	2023-12-24 16:04:00 +00:00
konsti	e60f0ec732	Update pubgrub (#713 ) Easier than i expected: We simply never construct the pubgrub error variants since we have our own main loop. The `unreachable!()`s can be removed when never is stabilized	2023-12-20 23:56:59 +01:00
Charlie Marsh	98fcb76015	Lock entire virtualenv during modifying commands (#695 ) These commands all assume that the `site-packages` are constant throughout. Closes #691.	2023-12-18 16:44:45 -05:00
konsti	89ca0d68b9	`exclude_newer` in puffin-dev resolve-cli (#684 ) Internal dev tool change.	2023-12-18 14:06:54 +00:00
konsti	f059c6e6a6	Support editable in pip-sync and pip-compile (#587 ) Support `-e path/to/dir` in pip-sync and pip-compile.	2023-12-16 22:37:34 +00:00
konsti	71964ec7a8	Switch to msgpack in the cached client (#662 ) This gives a 1.23 speedup on transformers-extras. We could change to msgpack for the entire cache if we want. I only tried this format and postcard so far, where postcard was much slower (like 1.6s). I don't actually want to merge it like this, i wanted to figure out the ballpark of improvement for switching away from json. ``` hyperfine --warmup 3 --runs 10 "target/profiling/puffin pip-compile --cache-dir cache-msgpack scripts/requirements/transformers-extras.in" "target/profiling/branch pip-compile scripts/requirements/transformers-extras.in" Benchmark 1: target/profiling/puffin pip-compile --cache-dir cache-msgpack scripts/requirements/transformers-extras.in Time (mean ± σ): 179.1 ms ± 4.8 ms [User: 157.5 ms, System: 48.1 ms] Range (min … max): 174.9 ms … 188.1 ms 10 runs Benchmark 2: target/profiling/branch pip-compile scripts/requirements/transformers-extras.in Time (mean ± σ): 221.1 ms ± 6.7 ms [User: 208.1 ms, System: 46.5 ms] Range (min … max): 213.5 ms … 235.5 ms 10 runs Summary target/profiling/puffin pip-compile --cache-dir cache-msgpack scripts/requirements/transformers-extras.in ran 1.23 ± 0.05 times faster than target/profiling/branch pip-compile scripts/requirements/transformers-extras.in ``` Disadvantage: We can't manually look into the cache anymore to debug things - [ ] Check more formats, i currently only tested json, msgpack and postcard, there should be other formats, too - [x] Switch over `CachedByTimestamp` serialization (for the interpreter caching) - [x] Switch over error handling and make sure puffin is still resilient to cache failure	2023-12-16 21:01:35 +00:00
konsti	620f73b38b	Speed up version parsing for a 1.27±0.03 speedup in transformers-extras with conservative changes (#660 ) Two low-hanging fruits as optimizations for version parsing: A fast path for release only versions and removing the regex from version specifiers (still calling into version's parsing regex if required). This enables optimizing the serde format since we now see the serde part instead of only PEP 440 parsing. I intentionally didn't rewrite the full PEP 440 at this step. ```console $ hyperfine --warmup 5 --runs 50 "target/profiling/puffin pip-compile scripts/requirements/transformers-extras.in" "target/profiling/main pip-compile scripts/requirements/transformers-extras.in" Benchmark 1: target/profiling/puffin pip-compile scripts/requirements/transformers-extras.in Time (mean ± σ): 217.1 ms ± 3.2 ms [User: 194.0 ms, System: 55.1 ms] Range (min … max): 211.0 ms … 228.1 ms 50 runs Benchmark 2: target/profiling/main pip-compile scripts/requirements/transformers-extras.in Time (mean ± σ): 276.7 ms ± 5.7 ms [User: 252.4 ms, System: 54.6 ms] Range (min … max): 268.9 ms … 303.5 ms 50 runs Summary target/profiling/puffin pip-compile scripts/requirements/transformers-extras.in ran 1.27 ± 0.03 times faster than target/profiling/main pip-compile scripts/requirements/transformers-extras.in ``` --------- Co-authored-by: Andrew Gallant <andrew@astral.sh>	2023-12-15 14:03:35 -05:00
Charlie Marsh	9470c20e7a	Avoid double resolution during source builds (#656 ) ## Summary This PR ensures that we re-use the resolution to install the build dependencies when building a source distribution. Currently, we only pass along the list of requirements, and then use the `Finder` to map each requirement to a distribution. But we already determine the correct distribution when resolving! Closes https://github.com/astral-sh/puffin/issues/655.	2023-12-15 17:27:16 +00:00
Charlie Marsh	ed8dfbfcf7	Preserve verbatim URLs (#639 ) ## Summary This PR adds a `VerbatimUrl` struct to preserve verbatim URLs throughout the resolution and installation pipeline. In short, alongside the parsed `Url`, we also keep the URL as written by the user. This enables us to display the URL exactly as written by the user, rather than the serialized path that we use internally. This will be especially useful once we start expanding environment variables since, at that point, we'll be able to write the version of the URL that includes the _unexpanded_ environment variable to the output file.	2023-12-14 15:03:39 +00:00
Charlie Marsh	db7e2dedbb	Move archive extraction into its own crate (#647 ) We have some shared utilities beyond `puffin-build` and `puffin-distribution`, and further, I want to be able to access the sdist archive extraction logic from `puffin-distribution`. This is really generic, so moving into its own crate.	2023-12-14 04:49:09 +00:00
Charlie Marsh	920e10fc8f	Use `FxHash` consistently (#632 )	2023-12-13 05:36:03 +00:00
Charlie Marsh	a24eb57e93	Make warnings user-facing (#628 ) ## Summary Now, `puffin_warnings::warn_once` and `puffin_warnings::warn` will go to `stderr`, as long as the user isn't running under `--quiet`. Previously, these went through `tracing`, and so were only visible when running under `--verbose`.	2023-12-12 21:24:38 -05:00
Zanie Blue	490fb55ac5	Use available versions to simplify unsat error reports (#547 ) Uses https://github.com/pubgrub-rs/pubgrub/pull/156 to consolidate version ranges in error reports using the actual available versions for each package. Alternative to https://github.com/zanieb/pubgrub/pull/8 which implements this behavior as a method in the `Reporter` — here it's implemented in our custom report formatter (#521) instead which requires no upstream changes. Requires https://github.com/zanieb/pubgrub/pull/11 to only retrieve the versions for packages that will be used in the report. This is a work in progress. Some things to do: - ~We may want to allow lazy retrieval of the version maps from the formatter~ - [x] We should probably create a separate error type for no solution instead of mixing them with other resolve errors - ~We can probably do something smarter than creating vectors to hold the versions~ - [x] This degrades error messages when a single version is not available, we'll need to special case that - [x] It seems safer to coerce the error type in `resolve` instead of `solve` if feasible	2023-12-12 23:25:16 +00:00
Charlie Marsh	1181288078	Download, build, and install in a single pipeline phase (#605 ) ## Summary At present, we have two separate phases within the installation pipeline related to populating wheels into the cache. The first phase downloads the distribution, and then builds any source distributions into wheels; the second phase unzips all the built wheels into the cache. This PR merges those two phases into one, such that we seamlessly download, build, and unzip wheels in one pass. This is more efficient, since we can start unzipping while we build. It also ensures that if the install _fails_ partway through, we don't end up with a bunch of downloaded wheels that we never had a chance to unzip. The code is also much simpler. The main downside is that the user-facing feedback isn't as granular, since we only have one phase and one progress bar for what was originally three distinct phases. Closes https://github.com/astral-sh/puffin/issues/571. ## Test Plan I ran the benchmark script on two separate requirements files, and saw a 7% and 31% speedup respectively: ```text + TARGET=./scripts/benchmarks/requirements.txt + hyperfine --runs 100 --warmup 10 --prepare 'virtualenv --clear .venv' './target/release/main pip-sync ./scripts/benchmarks/requirements.txt --no-cache' --prepare 'virtualenv --clear .venv' './target/release/puffin pip-sync ./scripts/benchmarks/requirements.txt --no-cache' Benchmark 1: ./target/release/main pip-sync ./scripts/benchmarks/requirements.txt --no-cache Time (mean ± σ): 269.4 ms ± 33.0 ms [User: 42.4 ms, System: 117.5 ms] Range (min … max): 221.7 ms … 446.7 ms 100 runs Benchmark 2: ./target/release/puffin pip-sync ./scripts/benchmarks/requirements.txt --no-cache Time (mean ± σ): 250.6 ms ± 28.3 ms [User: 41.5 ms, System: 127.4 ms] Range (min … max): 207.6 ms … 336.4 ms 100 runs Summary './target/release/puffin pip-sync ./scripts/benchmarks/requirements.txt --no-cache' ran 1.07 ± 0.18 times faster than './target/release/main pip-sync 
./scripts/benchmarks/requirements.txt --no-cache' ``` ```text + TARGET=./scripts/benchmarks/requirements-large.txt + hyperfine --runs 100 --warmup 10 --prepare 'virtualenv --clear .venv' './target/release/main pip-sync ./scripts/benchmarks/requirements-large.txt --no-cache' --prepare 'virtualenv --clear .venv' './target/release/puffin pip-sync ./scripts/benchmarks/requirements-large.txt --no-cache' Benchmark 1: ./target/release/main pip-sync ./scripts/benchmarks/requirements-large.txt --no-cache Time (mean ± σ): 5.053 s ± 0.354 s [User: 1.413 s, System: 6.710 s] Range (min … max): 4.584 s … 6.333 s 100 runs Benchmark 2: ./target/release/puffin pip-sync ./scripts/benchmarks/requirements-large.txt --no-cache Time (mean ± σ): 3.845 s ± 0.225 s [User: 1.364 s, System: 6.970 s] Range (min … max): 3.482 s … 4.715 s 100 runs Summary './target/release/puffin pip-sync ./scripts/benchmarks/requirements-large.txt --no-cache' ran ```	2023-12-11 15:42:29 +00:00
Charlie Marsh	32f54a5947	Use async `Command` for wheel build operations (#601 ) Incredibly, this speeds up the install on a large project from 2m6s to 50s.	2023-12-09 16:20:52 +00:00
Charlie Marsh	a24534b0ce	Use `rustc-hash` instead of `fxhash` crate (#594 ) `fxhash` is the old, less maintained version of this crate (`rustc-hash`). We use the latter in Ruff.	2023-12-08 20:27:49 +00:00
konsti	6005d7a552	Keep track of in flight unzips using `OnceMap` (#544 ) I saw warnings when we were e.g. unzipping wheel and setuptools in two tasks at the same time. We now keep track of in flight unzips. This introduces a `OnceMap` abstraction which we also use in the resolver.	2023-12-08 20:18:11 +00:00
Charlie Marsh	4b8642c6f7	Enable selective cache purging in `puffin clean` (#589 ) ## Summary This PR enables `puffin clean` to accept package names as command line arguments, and selectively purge entries from the cache tied to the given package. Relates to #572. ## Test Plan Modified all the caching tests to run an additional step to (1) purge the cache, and (2) re-install the package.	2023-12-08 19:51:32 +00:00
Zanie Blue	ef7be9103c	Parse `SimpleJson` into categorized data in the client (#522 ) Extends #517 with a suggestion from @konstin to parse the `SimpleJson` into an intermediate type `SimpleMetadata(BTreeMap<Version, VersionFiles>)` before converting to a `VersionMap`. This reduces the number of times we need to parse the response. Additionally, we cache the parsed response now instead of `SimpleJson`. `VersionFiles` stores two vectors with `WheelFilename`/`SourceDistFilename` and `File` tuples. These can be iterated over together or separately. A new enum `DistFilename` was added to capture the `SourceDistFilename` and `WheelFilename` variants allowing iteration over both vectors.	2023-12-07 11:04:47 -06:00
Charlie Marsh	aa065f5c97	Modify install plan to support all distribution types (#581 ) This PR adds caching support for built wheels in the installer. Specifically, the `RegistryWheelIndex` now indexes both downloaded and built wheels (from registries), and we have a new `BuiltWheelIndex` that takes a subdirectory and returns the "best-matching" compatible wheel. Closes #570.	2023-12-07 04:43:34 +00:00
konsti	366c389385	Parse editable installs (#564 ) Parse `-e` for editable installs in `requirements.txt`. Unlike all the other requirements, editable installs don't have the name of the package specified.	2023-12-06 18:21:15 +01:00
konsti	3f4d7b7826	Improve path source dist caching (#578 ) Path distribution cache reading errors are no longer fatal. We now invalidate path file source dists if their modification timestamp changed, and invalidate path dir source dists if `pyproject.toml` or alternatively `setup.py` changed. These seem like good choices, since changing `pyproject.toml` should trigger a rebuild and the user can `touch` the file as part of their workflow. `CachedByTimestamp` is now a shared util. It doesn't have methods as I don't think it's worth it yet for two users. Closes #478 TODO(konstin): Write a test. This is probably twice as much work as the fix itself, so I made this PR without one for now.	2023-12-06 11:47:01 -05:00
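The timestamp-based invalidation described in #578 can be sketched as follows. This is a minimal Python illustration, not the project's Rust code: the `CachedByTimestamp` name comes from the commit message, while its fields and the `get_or_rebuild` helper are assumptions made for the example.

```python
import os
from dataclasses import dataclass
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


@dataclass
class CachedByTimestamp(Generic[T]):
    # Modification time of the source file when `data` was computed.
    timestamp_ns: int
    data: T


def get_or_rebuild(
    path: str,
    cached: Optional[CachedByTimestamp[T]],
    build: Callable[[str], T],
) -> CachedByTimestamp[T]:
    """Hypothetical helper: reuse the entry only if the file's mtime is unchanged."""
    mtime_ns = os.stat(path).st_mtime_ns
    if cached is not None and cached.timestamp_ns == mtime_ns:
        return cached  # Fresh: the file was not modified since we cached it.
    # Stale or missing: rebuild and record the new timestamp.
    return CachedByTimestamp(timestamp_ns=mtime_ns, data=build(path))
```

With this shape, running `touch` on the tracked file changes its modification time and therefore forces a rebuild on the next lookup.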
Charlie Marsh	a15da36d74	Avoid removing local wheels when unzipping (#560 ) ## Summary When installing a local wheel, we need to avoid removing the zipped wheel (since it lives outside of the cache), _and_ need to ensure that we unzip the wheel into the cache (rather than replacing the zipped wheel, which may even live outside of the project). Closes https://github.com/astral-sh/puffin/issues/553.	2023-12-05 17:50:08 +00:00
Charlie Marsh	6f055ecf3b	Remove existing built wheels when building source distributions (#559 ) This PR modifies the source distribution building to replace any existing targets after building the new wheel. In some cases, the existence of an existing target may be indicative of a bug, so we warn. It's partially a workaround for some (but not all) of the errors in https://github.com/astral-sh/puffin/issues/554.	2023-12-05 12:45:24 -05:00
Zanie Blue	37ca2e2928	Bump pubgrub for latest upstream (#525 ) https://github.com/pubgrub-rs/pubgrub/pull/157	2023-12-04 09:09:30 -06:00
konsti	6dc8ebcb90	Test interpreter cache invalidation (#540 ) Add missing test for #529/#508.	2023-12-04 10:03:43 +00:00
Charlie Marsh	ee2fca3a48	Add CACHEDIR and .gitignore tags to cache directories (#526 ) ## Summary Even if this will typically be in the user's application folder (rather than a local directory), it's still a good practice. Closes https://github.com/astral-sh/puffin/issues/280.	2023-12-02 00:37:51 +00:00
konsti	9806901a16	Consolidate wheel caches (#524 ) After this change, two wheel caches remain: `built-wheels-v0` and `wheels-v0`, docs screenshots below. Each contains both the wheel metadata, cache policy and zip or unzipped wheels under the same name. The zipped/unzipped strategy is as follows: In `pip-compile`, when we build a wheel, we store it zipped. When `pip-sync` or a source dist build in `pip-compile` need to install the wheel, we unzip it, remove the file and replace it with the unzipped wheel. This removes `WheelCache` and `UrlIndex` in favor of `Cache` plus `WheelCache`. The non-built wheel cache now considers index urls and the url for url wheels. I'm unsure if we need the `Unzipper` type, this could just be a function. I move `no_index` into `IndexUrls` and started using `IndexUrl` up to the clap level. I left a number of TODOs in the code, namely performing the actual invalidation of unzipped wheels and making the `InstallPlan` understand cache invalidation (i.e. uninstall wheels when their remote changed). ![image](https://github.com/astral-sh/puffin/assets/6826232/c4d45979-485b-4954-848d-fd3347ee2510)	2023-12-01 20:16:33 +00:00
Zanie Blue	2a8544df9e	Use a custom pubgrub report formatter (#521 ) Uses https://github.com/zanieb/pubgrub/pull/10 to drastically simplify our reporter implementation. This will allow us to make use of upstream improvements to the reporter e.g. https://github.com/zanieb/pubgrub/pull/8 without multiple duplicative pull requests.	2023-12-01 13:36:12 -06:00
Zanie Blue	efcc4f1409	Use upstream commit for reflink-copy requirement (#523 ) https://github.com/cargo-bins/reflink-copy/pull/51 was merged	2023-12-01 10:58:24 +00:00
Zanie Blue	5f1f207628	Recursively merge existing package directories on installation (#516 ) Previously, when installing a package we would delete the target directory before copying (or linking) the contents of the package. However, this means that we do not properly support namespace packages which can share a target directory. Instead, the last package to be installed would override existing packages. Since we install packages in parallel, this could result in a race condition where the target directory already exists which is not allowed when using `clonefile`. See example error in #515. `c7e63d2dce` provides a regression test for this — it fails on `main`. Here, we implement a recursive merge when the target directory already exists. Both packages will be installed into the same directory. We no longer delete the target directory, which seems okay since we uninstall packages before installing now. When files conflict, we will likely throw an error still. The correct behavior to implement in this case is unclear, as if we just take "first write wins" or "last write wins" we could end up with some files from one package and some from another resulting in two broken packages. A possible solution here is to lock the target directories while copying.	2023-11-30 10:14:51 -06:00
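A recursive merge of the kind described in #516 might look like the Python sketch below. It is an illustration under assumptions, not the project's implementation: the `merge_tree` name is hypothetical, files are copied rather than linked or cloned, and a conflicting file raises an error (the commit message only says an error is likely in that case).

```python
import os
import shutil


def merge_tree(src: str, dst: str) -> None:
    """Hypothetical helper: copy `src` into `dst`, merging existing directories."""
    # The target may already exist, e.g. for a shared namespace package.
    os.makedirs(dst, exist_ok=True)
    for entry in os.scandir(src):
        target = os.path.join(dst, entry.name)
        if entry.is_dir(follow_symlinks=False):
            # Recurse instead of deleting or replacing the existing directory.
            merge_tree(entry.path, target)
        elif os.path.exists(target):
            # Assumption: conflicting files are treated as an error.
            raise FileExistsError(f"conflicting file: {target}")
        else:
            shutil.copy2(entry.path, target)
```

Two packages that both ship a top-level `ns/` directory can then be installed one after the other into the same target without the second removing the files of the first.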
konsti	929df586fb	Skip tf-models-nightly in resolve-many dev script for now (#510 ) `tf-models-nightly` has pathologic backtracking behaviour, skip it for now so we can benchmark the rest.	2023-11-28 18:25:32 +00:00
konsti	d89fbeb642	Migrate interpreter query to custom caching (#508 ) This removes the last usage of cacache by replacing it with a custom, flat json caching keyed by the digest of the executable path. ![image](https://github.com/astral-sh/puffin/assets/6826232/8f777c4c-1f1b-4656-ba7b-002175270556) A step towards #478. I've made `CachedByTimestamp<T>` generic over `T` but intentionally not moved it to `puffin-cache` yet.	2023-11-28 17:14:59 +00:00
konsti	5435d44756	Introduce `Cache`, `CacheBucket` and `CacheEntry` (#507 ) This is mostly a mechanical refactor that moves 80% of our code to the same cache abstraction. It introduces `Cache`, which abstracts away the path of the cache and the temp dir drop and is passed throughout the codebase. To get a specific cache bucket, you need to request your `CacheBucket` from `Cache`. `CacheBucket` centralizes the names of all cache buckets, moving them away from the string constants spread throughout the crates. Specifically for working with the `CachedClient`, there is a `CacheEntry`. I'm not sure yet if that is a strict improvement over `cache_dir: PathBuf, cache_file: String`, I may have to rotate that later. The interpreter cache moved into `interpreter-v0`. We can use the `CacheBucket` page to document the cache structure in each bucket: ![image](https://github.com/astral-sh/puffin/assets/6826232/b023fdfb-e34d-4c2d-8663-b5f73937a539)	2023-11-28 17:11:14 +00:00
konsti	8855f44b5f	Move simple index queries to `CachedClient` (#504 ) Replaces the usage of `http-cache-reqwest` for simple index queries with our custom cached client, removing `http-cache-reqwest` altogether. The new cache paths are `<cache>/simple-v0/<index>/<package_name>.json`. I could not test with a non-pypi index since I'm not aware of any other json indices (jax and torch are both html indices). In a future step, we can transform the response to be a `HashMap<Version, {source_dists: Vec<(SourceDistFilename, File)>, wheels: Vec<(WheelFilename, File)>}>` (independent of python version, this cache is used by all environments together). This should speed up cache deserialization a bit, since we don't need to try source dist and wheel anymore and drop incompatible dists, and it should make building the `VersionMap` simpler. We can speed this up even further by splitting into a version list and the info for each version. I'm mentioning this because deserialization was a major bottleneck in the rust part of the old python prototype. Fixes #481	2023-11-28 00:11:03 +00:00
konsti	d54e780843	Source dist metadata refactor (#468 ) ## Summary and motivation For a given source dist, we store the metadata of each wheel built through it in `built-wheel-metadata-v0/pypi/<source dist filename>/metadata.json`. During resolution, we check the cache status of the source dist. If it is fresh, we check `metadata.json` for a matching wheel. If there is one we use that metadata, if there isn't, we build one. If the source is stale, we build a wheel and override `metadata.json` with that single wheel. This PR thereby ties the local built wheel metadata cache to the freshness of the remote source dist. This functionality is available through `SourceDistCachedBuilder`. `puffin_installer::Builder`, `puffin_installer::Downloader` and `Fetcher` are removed, instead there is now `FetchAndBuild`, which calls into the also-new `SourceDistCachedBuilder`. `FetchAndBuild` is the new main high-level abstraction: It spawns parallel fetching/building, for wheel metadata it calls into the registry client, for wheel files it fetches them, for source dists it calls `SourceDistCachedBuilder`. It handles locks around builds and, newly added, inter-process file locking for git operations. Fetching and building source distributions now happens in parallel in `pip-sync`, i.e. we don't have to wait for the largest wheel to be downloaded to start building source distributions. In a follow-up PR, I'll also clear built wheels when they've become stale. Another effect is that in a fully cached resolution, we need neither zip reading nor email parsing. 
Closes #473 ## Source dist cache structure Entries by supported sources: * `<build wheel metadata cache>/pypi/foo-1.0.0.zip/metadata.json` * `<build wheel metadata cache>/<sha256(index-url)>/foo-1.0.0.zip/metadata.json` * `<build wheel metadata cache>/url/<sha256(url)>/foo-1.0.0.zip/metadata.json` But the url filename does not need to be a valid source dist filename (<https://github.com/search?q=path%3A*%2Frequirements.txt+master.zip&type=code>), so it could also be the following and we have to take any string as filename: `<build wheel metadata cache>/url/<sha256(url)>/master.zip/metadata.json` Example: ```text # git source dist pydantic-extra-types @ git+https://github.com/pydantic/pydantic-extra-types.git # pypi source dist django_allauth==0.51.0 # url source dist werkzeug @ `ff1904eb5e`2853bf83db817a7dd53d/werkzeug-3.0.1.tar.gz ``` will be stored as ```text built-wheel-metadata-v0 ├── git │ └── 5c56bc1c58c34c11 │ └── 843b753e9e8cb74e83cac55598719b39a4d5ef1f │ └── metadata.json ├── pypi │ └── django-allauth-0.51.0.tar.gz │ └── metadata.json └── url └── 6781bd6440ae72c2 └── werkzeug-3.0.1.tar.gz └── metadata.json ``` The inside of a `metadata.json`: ```json { "data": { "django_allauth-0.51.0-py3-none-any.whl": { "metadata-version": "2.1", "name": "django-allauth", "version": "0.51.0", ... } } } ```	2023-11-24 17:47:58 +00:00
konsti	8d247fe95b	Add `Tags::from_interpreter` (#498 ) Small refactoring	2023-11-24 11:36:01 +00:00
Charlie Marsh	17228ba04e	Add support for path dependencies (#471 ) ## Summary This PR adds support for local path dependencies. The approach mostly just falls out of our existing approach and infrastructure for Git and URL dependencies. Closes https://github.com/astral-sh/puffin/issues/436. (We'll open a separate issue for editable installs.) ## Test Plan Added `pip-compile` tests that pre-download a wheel or source distribution, then install it via local path.	2023-11-21 11:49:42 +00:00
Charlie Marsh	f1aa70d9d3	Refactor distribution types to return `Result` (#470 ) ## Summary A variety of small refactors to the distribution types crate to (1) return `Result` if we find an invalid wheel, rather than treating it as a source distribution with a `.whl` suffix, and (2) DRY up some repeated code around URLs.	2023-11-20 23:08:54 +00:00
konsti	f0841cdb6e	Wheel metadata refactor (#462 ) A consistent cache structure for remote wheel metadata: * `<wheel metadata cache>/pypi/foo-1.0.0-py3-none-any.json` * `<wheel metadata cache>/<digest(index-url)>/foo-1.0.0-py3-none-any.json` * `<wheel metadata cache>/url/<digest(url)>/foo-1.0.0-py3-none-any.json` The source dist caching will use a similar structure (#468).	2023-11-20 17:26:36 +01:00
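The three cache locations listed in #462 can be expressed as a small path-building function. The Python sketch below is illustrative only: the `wheel_metadata_path` helper and the choice of a truncated SHA-256 as the digest are assumptions, since the commit message does not say which digest is used.

```python
import hashlib
from pathlib import Path
from typing import Optional


def digest(value: str) -> str:
    # Assumption: a truncated SHA-256 hex digest stands in for the real digest.
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:16]


def wheel_metadata_path(
    cache_root: str,
    wheel_stem: str,
    index_url: Optional[str] = None,
    url: Optional[str] = None,
) -> Path:
    """Hypothetical helper mirroring the layout from the commit message."""
    root = Path(cache_root)
    if url is not None:
        # Direct URL wheel: <cache>/url/<digest(url)>/<wheel>.json
        return root / "url" / digest(url) / f"{wheel_stem}.json"
    if index_url is not None:
        # Non-PyPI index: <cache>/<digest(index-url)>/<wheel>.json
        return root / digest(index_url) / f"{wheel_stem}.json"
    # Default registry: <cache>/pypi/<wheel>.json
    return root / "pypi" / f"{wheel_stem}.json"
```

Keying the directory on the index or URL keeps metadata for identically named wheels from different sources apart.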
konsti	d3e9e1783f	Refactor lenient parsing (#467 ) Deduplicate lenient parsing code between version specifiers and Requirement. Use `warn_once!` since the warnings did show up multiple times in my code. Fix the macro hygiene in `warn_once!`.	2023-11-20 15:35:38 +00:00
Zanie Blue	e9b6fb90d6	Bump pubgrub to get range display changes (#444 ) See https://github.com/zanieb/pubgrub/pull/5	2023-11-20 09:12:48 -06:00
Charlie Marsh	60f595b469	Prefer future stream over `JoinSet` in downloader (#469 ) This avoids introducing a static lifetime requirement and, in my benchmarks, is even a little faster.	2023-11-20 13:23:30 +00:00
Charlie Marsh	8decb29bad	Use a dedicated error type for `puffin-distribution` (#466 )	2023-11-20 11:38:27 +00:00
Charlie Marsh	35fd86631b	Unify distribution operations into a single crate (#460 ) ## Summary This PR unifies the behavior that lived in the resolver's `distribution` crates with the behaviors that were spread between the various structs in the installer crate into a single `Fetcher` struct that is intended to manage all interactions with distributions. Specifically, the interface of this struct is such that it can access distribution metadata, download distributions, return those downloads, etc., all with a common cache. Overall, this is mostly just DRYing up code that was repeated between the two crates, and putting it behind a reasonable shared interface.	2023-11-20 11:22:52 +00:00
konsti	46bb18f06e	Track file index (#452 ) Track the index (or at least its url) where we got a file from across the source code. Fixes #448	2023-11-20 08:48:16 +00:00
Charlie Marsh	6fd582f8b9	Rename `puffin-distribution` to `distribution-types` (#458 ) ## Summary This crate only contains types, and I want to introduce a new crate for all _operations_ on distributions, so this feels like a more natural name given we also have `pypi-types`.	2023-11-20 09:40:26 +01:00
konsti	255edf4445	Serde support for WheelFilename through str repr (#459 ) I need this later, splitting out for PR size	2023-11-19 19:43:14 +00:00
konsti	ab60233131	Use absolute cache paths (#453 ) Previously, git requirements would fail when setting `--cache-dir`: ```console $ cargo run --bin puffin -- pip-compile --cache-dir cache-all-kinds scripts/benchmarks/requirements/all-kinds.in error: Failed to build distribution from URL: git+https://github.com/pydantic/pydantic-extra-types.git Caused by: Invalid path URL: cache-all-kinds/git-v0/db/b49ffcfeb6c2e9d8 ``` The cause is using a relative and not an absolute path, which `Url` needs, the solution is to turn the cache dir into an absolute path. This never showed up in the tests since the tests use absolute temp dirs for everything.	2023-11-19 13:32:32 +00:00
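The fix described in #453 can be sketched as follows. This is a minimal illustration that assumes only the standard library; the helper name `absolute_cache_dir` is hypothetical and is not taken from the puffin source.

```rust
use std::env;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: turn a user-supplied `--cache-dir` value into an
/// absolute path so it can later be embedded in a file URL.
fn absolute_cache_dir(cache_dir: &Path) -> io::Result<PathBuf> {
    if cache_dir.is_absolute() {
        Ok(cache_dir.to_path_buf())
    } else {
        // Join onto the current working directory. Unlike `fs::canonicalize`,
        // this does not require the directory to exist yet.
        Ok(env::current_dir()?.join(cache_dir))
    }
}

fn main() -> io::Result<()> {
    let dir = absolute_cache_dir(Path::new("cache-all-kinds"))?;
    assert!(dir.is_absolute());
    println!("{}", dir.display());
    Ok(())
}
```

Joining onto the working directory is chosen here because a cache directory may not exist on first use; whether the actual change does this or canonicalizes is not stated in the commit message.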
konsti	bf71e7adcf	Add graphviz output to puffin-dev resolve-cli (#443 ) I added output in graphviz DOT format to `puffin-dev resolve-cli` to help with debugging resolutions. This requires tracking the requested ranges in the graph. I also fixed the direction of the graph. Output for `black`: ```dot digraph { 0 [ label="click\n8.1.7"] 1 [ label="black\n23.11.0"] 2 [ label="packaging\n23.2"] 3 [ label="mypy-extensions\n1.0.0"] 4 [ label="tomli\n2.0.1"] 5 [ label="pathspec\n0.11.2"] 6 [ label="typing-extensions\n4.8.0"] 7 [ label="platformdirs\n4.0.0"] 1 -> 0 [ label=">=8.0.0"] 1 -> 3 [ label=">=0.4.3"] 1 -> 5 [ label=">=0.9.0"] 1 -> 4 [ label=">=1.1.0"] 1 -> 6 [ label=">=4.0.1"] 1 -> 2 [ label=">=22.0"] 1 -> 7 [ label=">=2"] } ``` ![image](https://github.com/astral-sh/puffin/assets/6826232/4a440fcd-6248-4349-8e1a-c3e0363e42b1) transformers: ![image](https://github.com/astral-sh/puffin/assets/6826232/a13a693c-a8c0-4a4f-95d9-3458431c678a) jupyter: ![graphviz](https://github.com/astral-sh/puffin/assets/6826232/ef730033-6fd9-4ea9-ac93-8c874c19a101)	2023-11-17 18:16:24 +00:00
Zanie Blue	221751487c	Use `UnusableDependencies` for URL dependency conflicts (#425 ) Extends #424 with support for URL dependency incompatibilities. Requires changes to `miette` to prevent URLs from being word wrapped; accepted upstream in https://github.com/zkat/miette/pull/321	2023-11-17 08:28:12 -06:00
Charlie Marsh	2094680cdd	Add a `warn_user_once!` macro (#442 ) Closes https://github.com/astral-sh/puffin/issues/429.	2023-11-17 02:34:06 +00:00
konsti	1883dbdc21	Always¹ clear temporary directories (#437 ) Always¹ clear the temporary directories we create. * Clear source dist downloads: Previously, the temporary directories would remain in the cache dir, now they are cleared properly * Clear wheel file downloads: Delete the `.whl` file, we only need to cache the unpacked wheel * Consistent handling of cache arguments: Abstract the handling for CLI cache args away, again making sure we remove the `--no-cache` temp dir. There are no more `into_path()` calls that persist `TempDir`s that i could find. ¹Assuming drop is run, and deleting the directory doesn't silently error.	2023-11-16 20:49:48 +00:00
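The cleanup-on-drop behavior that #437 relies on, including its footnote, can be sketched with a small guard type. This is an assumption-laden illustration using only the standard library; `ScratchDir` is a hypothetical name, and the real code uses `TempDir` rather than this type.

```rust
use std::env;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::process;

/// Hypothetical guard: owns a scratch directory and removes it on drop.
struct ScratchDir {
    path: PathBuf,
}

impl ScratchDir {
    fn new(label: &str) -> io::Result<Self> {
        // The process id keeps concurrent processes from sharing a directory.
        let path = env::temp_dir().join(format!("{label}-{}", process::id()));
        fs::create_dir_all(&path)?;
        Ok(Self { path })
    }

    fn path(&self) -> &Path {
        &self.path
    }
}

impl Drop for ScratchDir {
    fn drop(&mut self) {
        // A failure here is ignored: this is the "deleting the directory
        // doesn't silently error" caveat from the commit message.
        let _ = fs::remove_dir_all(&self.path);
    }
}

fn main() -> io::Result<()> {
    let kept;
    {
        let dir = ScratchDir::new("puffin-sketch")?;
        kept = dir.path().to_path_buf();
        assert!(kept.is_dir());
    } // `dir` is dropped here, which removes the directory.
    assert!(!kept.exists());
    Ok(())
}
```

The other half of the footnote also applies: if the guard is leaked or the process aborts, `drop` never runs and the directory stays behind.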
Zanie Blue	0d9d4f9fca	Add an `UnusableDependencies` incompatibility kind and use for conflicting versions (#424 ) Addresses https://github.com/astral-sh/puffin/issues/309#issuecomment-1792648969 Similar to #338 this throws an error when merging versions results in an empty set. Instead of propagating that error, we capture it and return a new dependency type of `Unusable`. Unusable dependencies are a new incompatibility kind which includes an arbitrary "reason" string that we present to the user. Adding a new incompatibility kind requires changes to the vendored pubgrub crate. We could use this same incompatibility kind for conflicting urls as in #284 which should allow the solver to backtrack to another valid version instead of failing (see #425). Unlike #383 this does not require changes to PubGrub's package mapping model. I think in the long run we'll want PubGrub to accept multiple versions per package to solve this specific issue, but we're interested in it being merged upstream first. This pull request is just using the issue as a simple case to explore adding a new incompatibility type. We may or may not be able to convince them to add this new incompatibility type upstream. As discussed in https://github.com/pubgrub-rs/pubgrub/issues/152, we may want a more general incompatibility kind instead which can be used for arbitrary problems. An upstream pull request has been opened for discussion at https://github.com/pubgrub-rs/pubgrub/pull/153. Related to: - https://github.com/pubgrub-rs/pubgrub/issues/152 - #338 - #383 --------- Co-authored-by: konsti <konstin@mailbox.org>	2023-11-16 20:02:06 +00:00
Zanie Blue	832058dbba	Switch from vendored PubGrub to a fork (#438 ) A fork will let us stay up to date with the upstream while replaying our work on top of it. I expect a similar workflow to the RustPython-Parser fork we maintained, except that I wrote an automation to create tags for each commit on the fork (https://github.com/zanieb/pubgrub/pull/2) so we do not need to manually tag and document each commit. To update with the upstream: - Rebase our fork's `main` branch on top of the latest changes in upstream's `dev` branch - Force push, overwriting our `main` branch history - Change the commit hash here to the last commit on `main` in our fork Since we automatically tag each commit on the fork, we should never lose the commits that are dropped from `main` during rebase.	2023-11-16 13:49:19 -06:00
konsti	e41ec12239	Option to resolve at a fixed timestamp with `pip-compile --exclude-newer YYYY-MM-DD` (#434 ) This works by filtering out files with a more recent upload time, so if the index you use does not provide upload times, the results might be inaccurate. pypi provides upload times for all files. That is, the field is non-nullable in the warehouse schema, but the simple API PEP does not know this field. If you have only pypi dependencies, this means deterministic, reproducible(!) resolution. We could try doing the same for git repos but it doesn't seem worth the effort, i'd recommend pinning commits since git histories are arbitrarily malleable and also if you care about reproducibility and such you should not use git dependencies but a custom index. Timestamps are given either as RFC 3339 timestamps such as `2006-12-02T02:07:43Z` or as UTC dates in the same format such as `2006-12-02`. Dates are interpreted as including this day, i.e. until midnight UTC that day. Date only is required to make this ergonomic and midnight seems like an ergonomic choice.
In action for `pandas`: ```console $ target/debug/puffin pip-compile --exclude-newer 2023-11-16 target/pandas.in Resolved 6 packages in 679ms # This file was autogenerated by Puffin v0.0.1 via the following command: # target/debug/puffin pip-compile --exclude-newer 2023-11-16 target/pandas.in numpy==1.26.2 # via pandas pandas==2.1.3 python-dateutil==2.8.2 # via pandas pytz==2023.3.post1 # via pandas six==1.16.0 # via python-dateutil tzdata==2023.3 # via pandas $ target/debug/puffin pip-compile --exclude-newer 2022-11-16 target/pandas.in Resolved 5 packages in 655ms # This file was autogenerated by Puffin v0.0.1 via the following command: # target/debug/puffin pip-compile --exclude-newer 2022-11-16 target/pandas.in numpy==1.23.4 # via pandas pandas==1.5.1 python-dateutil==2.8.2 # via pandas pytz==2022.6 # via pandas six==1.16.0 # via python-dateutil $ target/debug/puffin pip-compile --exclude-newer 2021-11-16 target/pandas.in Resolved 5 packages in 594ms # This file was autogenerated by Puffin v0.0.1 via the following command: # target/debug/puffin pip-compile --exclude-newer 2021-11-16 target/pandas.in numpy==1.21.4 # via pandas pandas==1.3.4 python-dateutil==2.8.2 # via pandas pytz==2021.3 # via pandas six==1.16.0 # via python-dateutil ```	2023-11-16 19:46:17 +00:00
konsti	751f7fa9c6	Improve PEP 691 compatibility (#428 ) [PEP 691](https://peps.python.org/pep-0691/#project-detail) has slightly different, more relaxed rules around file metadata. These changes are now reflected in the `File` struct. This will make it easier to support alternative indices. I had expected that i need to introduce a separate type for that, so i'm happy it's two `Option`s more and an alias. Part of #412	2023-11-16 19:03:44 +01:00
Charlie Marsh	d3caf9ae86	Choose most-compatible wheel in resolver and installer (#422 ) ## Summary This PR implements logic to sort wheels by priority, where priority is defined as preferring more "specific" wheels over less "specific" wheels. For example, in the case of Black, my machine now selects `black-23.11.0-cp311-cp311-macosx_11_0_arm64.whl`, whereas sorting by lowest priority instead gives me `black-23.11.0-py3-none-any.whl`. As part of this change, I've also modified the resolver to fallback to using incompatible wheels when determining package metadata, if no compatible wheels are available. The `VersionMap` was also moved out of `resolver.rs` and into its own file with a wrapper type, for clarity. Closes https://github.com/astral-sh/puffin/issues/380. Closes https://github.com/astral-sh/puffin/issues/421.	2023-11-15 18:22:11 +00:00
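The "most specific wheel wins" rule from #422 can be sketched like this. It assumes the supported tags are listed from most to least specific, so a lower index means a higher priority; the `Tag` alias and both function names are hypothetical and do not mirror the puffin types.

```rust
/// (python tag, ABI tag, platform tag), all hypothetical stand-ins.
type Tag = (&'static str, &'static str, &'static str);

/// Index of the best supported tag that the wheel carries; lower is better.
/// Returns `None` when the wheel is incompatible with every supported tag.
fn priority(supported: &[Tag], wheel_tags: &[Tag]) -> Option<usize> {
    wheel_tags
        .iter()
        .filter_map(|tag| supported.iter().position(|s| s == tag))
        .min()
}

/// Pick the compatible wheel whose best tag is the most specific one.
fn best_wheel<'a>(supported: &[Tag], wheels: &'a [(&'a str, Vec<Tag>)]) -> Option<&'a str> {
    wheels
        .iter()
        .filter_map(|(name, tags)| priority(supported, tags).map(|p| (p, *name)))
        .min_by_key(|(p, _)| *p)
        .map(|(_, name)| name)
}

fn main() {
    // Most specific tag first, as assumed above.
    let supported: Vec<Tag> = vec![
        ("cp311", "cp311", "macosx_11_0_arm64"),
        ("py3", "none", "any"),
    ];
    let wheels = vec![
        ("black-23.11.0-py3-none-any.whl", vec![("py3", "none", "any")]),
        (
            "black-23.11.0-cp311-cp311-macosx_11_0_arm64.whl",
            vec![("cp311", "cp311", "macosx_11_0_arm64")],
        ),
    ];
    println!("{:?}", best_wheel(&supported, &wheels));
}
```

A linear `position` scan keeps the sketch short; it is not meant to reflect how the real resolver stores or looks up priorities.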
Charlie Marsh	0af2f7e39f	Use `anstream` to avoid writing colorized output (#415 ) A more robust solution to avoiding colorized output by ensuring we write to `stdout` and `stderr` via the [`anstream`](https://docs.rs/anstream/latest/anstream/) crate. Closes https://github.com/astral-sh/puffin/issues/393.	2023-11-13 20:00:12 +00:00
Andrew Gallant	63f7f65190	change global allocator to jemalloc (and mimalloc on Windows) (#399 ) This copies the allocator configuration used in the Ruff project. In particular, this gives us an instant 10% win when resolving the top 1K PyPI packages: $ hyperfine \ "./target/profiling/puffin-dev-main resolve-many --cache-dir cache-docker-no-build --no-build pypi_top_8k_flat.txt --limit 1000 2> /dev/null" \ "./target/profiling/puffin-dev resolve-many --cache-dir cache-docker-no-build --no-build pypi_top_8k_flat.txt --limit 1000 2> /dev/null" Benchmark 1: ./target/profiling/puffin-dev-main resolve-many --cache-dir cache-docker-no-build --no-build pypi_top_8k_flat.txt --limit 1000 2> /dev/null Time (mean ± σ): 974.2 ms ± 26.4 ms [User: 17503.3 ms, System: 2205.3 ms] Range (min … max): 943.5 ms … 1015.9 ms 10 runs Benchmark 2: ./target/profiling/puffin-dev resolve-many --cache-dir cache-docker-no-build --no-build pypi_top_8k_flat.txt --limit 1000 2> /dev/null Time (mean ± σ): 883.1 ms ± 23.3 ms [User: 14626.1 ms, System: 2542.2 ms] Range (min … max): 849.5 ms … 916.9 ms 10 runs Summary './target/profiling/puffin-dev resolve-many --cache-dir cache-docker-no-build --no-build pypi_top_8k_flat.txt --limit 1000 2> /dev/null' ran 1.10 ± 0.04 times faster than './target/profiling/puffin-dev-main resolve-many --cache-dir cache-docker-no-build --no-build pypi_top_8k_flat.txt --limit 1000 2> /dev/null' I was moved to do this because I noticed `malloc`/`free` taking up a fairly sizeable percentage of time during light profiling. As is becoming a pattern, it will be easier to review this commit-by-commit. Ref #396 (wouldn't call this issue fixed) ----- I did also try adding a `smallvec` optimization to the `Version::release` field, but it didn't bear any fruit. I still think there is more to explore since the results I observed don't quite line up with what I expect. (So probably either my mental model is off or my measurement process is flawed.)
You can see that attempt with a little more explanation here: `f9528b4ecd` In the course of adding the `smallvec` optimization, I also shrunk the `Version` fields from a `usize` to a `u32`. They should at least be a fixed size integer since version numbers aren't used to index memory, and I shrunk it to `u32` since it seems reasonable to assume that all version numbers will be smaller than `2^32`.	2023-11-10 14:48:59 -05:00
konsti	5cef40d87a	Add proper caching for pypi metadata fetching kinds (#368 ) I intend this to become the main form of caching for puffin: You can make http requests, you transform the data to what you really need, you have control over the cache key, and the cache is always json (or anything else much faster we want to replace it with as long as it's serde!)	2023-11-10 11:03:40 +00:00
konsti	d1b57acaa8	Implement PEP 517 backend-path (#385 ) Closes #192	2023-11-10 11:54:23 +01:00
Andrew Gallant	33c0901a28	distribution-filename: speed up is_compatible (#367 ) This PR tweaks the representation of `Tags` in order to offer a faster implementation of `WheelFilename::is_compatible`. We now use a nested map of tags that lets us avoid looping over every supported platform tag. As the code comments suggest, that is the essential gain. We still do not mind looping over the tags in each wheel name since they tend to be quite small. And pushing our thumb on that side of things can make things worse overall since it would likely slow down WheelFilename construction itself. For micro-benchmarks, we improve considerably for compatibility checking: $ critcmp base test3 group base test3 ----- ---- ----- build_platform_tags/burntsushi-archlinux 1.00 46.2±0.28µs ? ?/sec 2.48 114.8±0.45µs ? ?/sec wheelname_parsing/flyte-long-compatible 1.00 624.8±3.31ns 174.0 MB/sec 1.01 629.4±4.30ns 172.7 MB/sec wheelname_parsing/flyte-long-incompatible 1.00 743.6±4.23ns 165.4 MB/sec 1.00 746.9±4.62ns 164.7 MB/sec wheelname_parsing/flyte-short-compatible 1.00 526.7±4.76ns 54.3 MB/sec 1.01 530.2±5.81ns 54.0 MB/sec wheelname_parsing/flyte-short-incompatible 1.00 540.4±4.93ns 60.0 MB/sec 1.01 545.7±5.31ns 59.4 MB/sec wheelname_parsing_failure/flyte-long-extension 1.00 13.6±0.13ns 3.2 GB/sec 1.01 13.7±0.14ns 3.2 GB/sec wheelname_parsing_failure/flyte-short-extension 1.00 14.0±0.20ns 1160.4 MB/sec 1.01 14.1±0.14ns 1146.5 MB/sec wheelname_tag_compatibility/flyte-long-compatible 11.33 159.8±2.79ns 680.5 MB/sec 1.00 14.1±0.23ns 7.5 GB/sec wheelname_tag_compatibility/flyte-long-incompatible 237.60 1671.8±37.99ns 73.6 MB/sec 1.00 7.0±0.08ns 17.1 GB/sec wheelname_tag_compatibility/flyte-short-compatible 16.07 223.5±8.60ns 128.0 MB/sec 1.00 13.9±0.30ns 2.0 GB/sec wheelname_tag_compatibility/flyte-short-incompatible 149.83 628.3±2.13ns 51.6 MB/sec 1.00 4.2±0.10ns 7.6 GB/sec We do regress slightly on the time it takes for `Tags::new` to run, but this is somewhat expected. 
And in absolute terms, 114us is perfectly acceptable given that it's only executed ~once for each `puffin` invocation. Ad hoc benchmarks indicate an overall 25% perf improvement in `puffin pip-compile` times. This roughly corresponds with how much time `is_compatible` was taking. Indeed, profiling confirms that it has virtually disappeared from the profile. Fixes #157	2023-11-09 09:01:03 -05:00
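The nested tag map described in #367 can be sketched as below. This is a simplified illustration under stated assumptions: the struct layout, the string-keyed maps, and the method signatures are hypothetical and are not copied from `distribution-filename`.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical nested index: python tag -> ABI tag -> set of platform tags.
struct Tags {
    map: HashMap<String, HashMap<String, HashSet<String>>>,
}

impl Tags {
    /// Built once per invocation from every supported (python, abi, platform) triple.
    fn new(supported: &[(&str, &str, &str)]) -> Self {
        let mut map: HashMap<String, HashMap<String, HashSet<String>>> = HashMap::new();
        for (python, abi, platform) in supported {
            map.entry(python.to_string())
                .or_default()
                .entry(abi.to_string())
                .or_default()
                .insert(platform.to_string());
        }
        Self { map }
    }

    /// Loops only over the wheel's own (short) tag lists and uses hash
    /// lookups instead of scanning every supported platform tag.
    fn is_compatible(&self, pythons: &[&str], abis: &[&str], platforms: &[&str]) -> bool {
        for python in pythons {
            let Some(abi_map) = self.map.get(*python) else { continue };
            for abi in abis {
                let Some(supported_platforms) = abi_map.get(*abi) else { continue };
                if platforms.iter().any(|p| supported_platforms.contains(*p)) {
                    return true;
                }
            }
        }
        false
    }
}

fn main() {
    let tags = Tags::new(&[
        ("cp311", "cp311", "manylinux_2_17_x86_64"),
        ("py3", "none", "any"),
    ]);
    println!("{}", tags.is_compatible(&["py3"], &["none"], &["any"]));
    println!("{}", tags.is_compatible(&["cp27"], &["cp27m"], &["win32"]));
}
```

The trade-off matches the benchmark above: building the nested map costs more up front than a flat list, while each compatibility check becomes a few hash lookups.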
konsti	d407bbbee6	Special case missing header build errors (on linux) (#354 ) One of the most common errors i observed are build failures due to missing header files. On ubuntu, this generally means that you need to install some `<...>-dev` package that the documentation tells you about, e.g. [mysqlclient](https://github.com/PyMySQL/mysqlclient#linux) needs `default-libmysqlclient-dev`, [some psycopg versions](https://www.psycopg.org/psycopg3/docs/basic/install.html#local-installation) (i remember that this was always required at some earlier point) require `libpq-dev` and pygraphviz wants `graphviz-dev`. This is quite common for many scientific packages (where conda has an advantage because they can provide those package as a dependency). The error message can be completely inscrutable if you're just a python programmer (or user) and not a c programmer (example: pygraphviz): ``` warning: no files found matching '.png' under directory 'doc' warning: no files found matching '.txt' under directory 'doc' warning: no files found matching '.css' under directory 'doc' warning: no previously-included files matching '~' found anywhere in distribution warning: no previously-included files matching '.pyc' found anywhere in distribution warning: no previously-included files matching '.svn' found anywhere in distribution no previously-included directories found matching 'doc/build' pygraphviz/graphviz_wrap.c:3020:10: fatal error: graphviz/cgraph.h: No such file or directory 3020 \| #include "graphviz/cgraph.h" \| ^~~~~~~~~~~~~~~~~~~ compilation terminated. error: command '/usr/bin/gcc' failed with exit code 1 ``` The only relevant part is `Fatal error: graphviz/cgraph.h: No such file or directory`. Why is this file not there and how do i get it to be there? 
This is even harder to spot in pip's output, where it's 11 lines above the last line: ![image](https://github.com/astral-sh/puffin/assets/6826232/7a3d7279-e7b1-4511-ab22-d0a35be5e672) I've special cased missing headers and made sure that the last line tells you the important information: We're missing some header, please check the documentation of {package} {version} for what to install: ![image](https://github.com/astral-sh/puffin/assets/6826232/4bbb8923-5a82-472f-ab1f-9e1471aa2896) Scrolling up: ![image](https://github.com/astral-sh/puffin/assets/6826232/89a2495a-e188-4288-b534-ad885ee08763) The difference gets even clearer with a default ubuntu terminal with its 80 columns: ![image](https://github.com/astral-sh/puffin/assets/6826232/49fb27bc-07c6-4b10-a1a1-30ec8e112438) --- Note that the situation is better for a missing compiler, there i get: ``` [...] warning: no previously-included files matching '~' found anywhere in distribution warning: no previously-included files matching '*.pyc' found anywhere in distribution warning: no previously-included files matching '.svn' found anywhere in distribution no previously-included directories found matching 'doc/build' error: command 'gcc' failed: No such file or directory --- ``` Putting the last line into google, the first two results tell me to `sudo apt-get install gcc`, the third even tells me about `sudo apt install build-essential`	2023-11-08 15:26:39 +00:00
Andrew Gallant	294955ecff	fix platform detection on Linux (#359 ) Rejigger Linux platform detection This change makes some very small improvements to the Linux platform detection logic. In particular, the existing logic did not work on my Archlinux machine since /lib64/ld-linux-x86-64.so.2 isn't a symlink. In that case, the detection logic should have fallen back to the slower `ldd --version` technique, but `read_link` fails outright when its argument isn't a symbolic link. So we tweak the logic to allow it to fail, and if it does, we still try the `ldd --version` approach instead of giving up completely. I also made some cosmetic improvements to the regex matching, as well as ensuring that the regexes are only compiled exactly once.	2023-11-07 11:39:35 -05:00
Charlie Marsh	b0286a8939	Add user feedback when building source distributions in the resolver (#347 ) It looks like Cargo, notice the bold green lines at the top (which appear during the resolution, to indicate Git fetches and source distribution builds): <img width="868" alt="Screen Shot 2023-11-06 at 11 28 47 PM" src="https://github.com/astral-sh/puffin/assets/1309177/9647a480-7be7-41e9-b1d3-69faefd054ae"> <img width="868" alt="Screen Shot 2023-11-06 at 11 28 51 PM" src="https://github.com/astral-sh/puffin/assets/1309177/6bc491aa-5b51-4b37-9ee1-257f1bc1c049"> Closes https://github.com/astral-sh/puffin/issues/287 although we can do a lot more here.	2023-11-07 14:17:31 +00:00
Charlie Marsh	2c32bc5a86	Respect direct URLs in puffin installer (#345 ) We now write the `direct_url.json` when installing, and _skip_ installing if we find a package installed via the direct URL that the user is requesting. A lot of TODOs, especially around cleaning up the `Source` abstraction and its relationship to `DirectUrl`. I'm gonna keep working on these today, but this works and makes the requirements clear. Closes #332.	2023-11-07 09:11:27 -05:00
konsti	fbe28d3b7c	Fix mastodon-py dist-info handling (#336 ) mastodon-py 1.5.1 uses a dot in its dist-info dir name, which we previously didn't handle, causing home-assistant to fail. The new implementation is based on `2f83540272/src/packaging/utils.py (L146-L172)`. Part of #199 ``` unzip -l Mastodon.py-1.5.1-py2.py3-none-any.whl Archive: Mastodon.py-1.5.1-py2.py3-none-any.whl Length Date Time Name --------- ---------- ----- ---- 153929 2020-02-29 17:39 mastodon/Mastodon.py 1029 2019-10-11 19:15 mastodon/__init__.py 7357 2019-10-11 20:24 mastodon/streaming.py 10 2020-03-14 18:14 Mastodon.py-1.5.1.dist-info/DESCRIPTION.rst 1398 2020-03-14 18:14 Mastodon.py-1.5.1.dist-info/metadata.json 9 2020-03-14 18:14 Mastodon.py-1.5.1.dist-info/top_level.txt 110 2020-03-14 18:14 Mastodon.py-1.5.1.dist-info/WHEEL 1543 2020-03-14 18:14 Mastodon.py-1.5.1.dist-info/METADATA 753 2020-03-14 18:14 Mastodon.py-1.5.1.dist-info/RECORD --------- ------- 166138 9 files ```	2023-11-07 12:36:11 +01:00
Charlie Marsh	2c114592bd	Only store small wheels in-memory (#348 ) Closes https://github.com/astral-sh/puffin/issues/246.	2023-11-07 00:50:00 +00:00
Charlie Marsh	a5e535f6fb	Remove `virtualenv` setup from gourgeist (#339 ) We now only support building bare environments.	2023-11-06 18:32:45 +00:00
Charlie Marsh	b013ea9c93	Move `DirectUrl` into `pypi-types` (#343 ) This needs to be reused elsewhere, and there's nothing specific to wheel installation about it.	2023-11-06 18:26:33 +00:00
Charlie Marsh	24e30e6557	Split `puffin-package` into requirements.txt parser and `pypi-types` (#341 ) There are only two things left in this crate and they don't really have anything to do with one another.	2023-11-06 18:19:49 +00:00
Charlie Marsh	d9bcfafa16	Write `direct_url.json` in wheel installer (#337 ) ## Summary This PR just adds the logic in `install-wheel-rs` to write `direct_url.json`. We're not actually taking advantage of it yet (or wiring it through) in Puffin. Part of https://github.com/astral-sh/puffin/issues/332.	2023-11-06 17:09:28 +00:00
konsti	9b077f3d0f	`cargo upgrade --incompatible` (#330 ) Ran `cargo upgrade --incompatible`, seems there are no changes required. From cacache 0.12.0: > BREAKING CHANGE: some signatures for copy have changed, and copy no longer automatically reflinks `which` 5.0.0 seems to have only error message changes.	2023-11-06 14:14:47 +00:00
konsti	b2439b24a1	Fetch wheel metadata by async range requests on the remote wheel (#301 ) Use range requests and async zip to extract the METADATA file from a remote wheel. We currently only cache when the remote declares the resource as immutable, see https://github.com/06chaynes/http-cache/issues/57 and https://github.com/baszalmstra/async_http_range_reader/pull/1 . The cache is stored as json with the description omitted, which improves cache deserialization performance.	2023-11-06 15:06:49 +01:00
Charlie Marsh	6d672b8951	Add source distribution support to `pip-compile` (#323 ) ## Summary This is a first-pass at adding source distribution support to the installer. The previous installation flow was: 1. Come up with a plan. 2. Find a distribution (specific file) for every package that we'll need to download. 3. Download those distributions. 4. Unzip them (since we assumed they were all wheels). 5. Install them into the virtual environment. Now, Step (3) downloads both wheels and source distributions, and we insert a step between Steps (3) and (4) to build any source distributions into zipped wheels. There are a bunch of TODOs, the most important (IMO) is that we basically have two implementations of downloading and building, between the stuff in `puffin_installer` and `puffin_resolver` (namely in `crates/puffin-resolver/src/distribution`). I didn't attempt to clean that up here -- it's already a problem, and it's related to the overall problem we need to solve around unified caching and resource management. Closes #243.	2023-11-06 08:22:36 -05:00
konsti	b79a15b458	Update pyproject-toml to 0.8.0 (#329 )	2023-11-06 13:16:36 +00:00
Charlie Marsh	d785ffdbff	Move `Source` abstraction into `puffin-distribution` (#321 ) No code changes, but this will allow it to be shared between the installer and the resolver.	2023-11-06 02:31:15 +00:00
Charlie Marsh	fa1bbbbe08	Write fully-precise Git SHAs to `pip-compile` output (#299 ) This PR adds a mechanism by which we can ensure that we _always_ try to refresh Git dependencies when resolving; further, we now write the fully resolved SHA to the "lockfile". However, nothing in the code _assumes_ we do this, so the installer will remain agnostic to this behavior. The specific approach taken here is minimally invasive. Specifically, when we try to fetch a source distribution, we check if it's a Git dependency; if it is, we fetch, and return the exact SHA, which we then map back to a new URL. In the resolver, we keep track of URL "redirects", and then we use the redirect (1) for the actual source distribution building, and (2) when writing back out to the lockfile. As such, none of the types outside of the resolver change at all, since we're just mapping `RemoteDistribution` to `RemoteDistribution`, but swapping out the internal URLs. There are some inefficiencies here since, e.g., we do the Git fetch, send back the "precise" URL, then a moment later, do a Git checkout of that URL (which will be _mostly_ a no-op -- since we have a full SHA, we don't have to fetch anything, but we _do_ check back on disk to see if the SHA is still checked out). A more efficient approach would be to return the path to the checked-out revision when we do this conversion to a "precise" URL, since we'd then only interact with the Git repo exactly once. But this runs the risk that the checked-out SHA changes between the time we make the "precise" URL and the time we build the source distribution. Closes #286.	2023-11-03 16:26:57 +00:00
Charlie Marsh	62c474d880	Add support for Git dependencies (#283 ) ## Summary This PR adds support for Git dependencies, like: ``` flask @ git+https://github.com/pallets/flask.git ``` Right now, they're only supported in the resolver (and not the installer), since the installer doesn't yet support source distributions at all. The general approach here is based on Cargo's Git implementation. Specifically, I adapted Cargo's [`git`](`23eb492cf9/src/cargo/sources/git/mod.rs`) module to perform the cloning, which is based on `libgit2`. As compared to Cargo's implementation, I made the following changes: - Removed any unnecessary code. - Fixed any Clippy errors for our stricter ruleset. - Removed the dependency on `curl`, in favor of `reqwest` which we use elsewhere. - Removed the ability to use `gix`. Cargo allows the use of `gix` as an experimental flag, but it only supports a small subset of the operations. When Cargo fully adopts `gix`, we should plan to do the same. - Removed Cargo's host key checking. We need to re-add this! I'll do it shortly. - Removed Cargo's progress bars. We should re-add this too, but we use `indicatif` and Cargo had their own thing. There are a few follow-ups to consider: - Adding support in the installer. - When we lock, we should write out the Git URL that includes the exact SHA. This lets us cache in perpetuity and avoids dependencies changing without re-locking. - When we resolve, we should _always_ try to refresh Git dependencies. (Right now, we skip if the wheel was already built.) I'll work on the latter two in follow-up PRs. Closes #202.	2023-11-02 15:14:55 +00:00
konsti	4adaa9a700	Wheel filename distribution package name (#278 ) The normalized name abstractions were not used consistently; this PR uses them where they were previously missing: * `WheelFilename::distribution` * `Requirement::name` * `Requirement::extras` * `Metadata21::name` * `Metadata21::provides_dist` With `puffin-package` depending on `pep508_rs` this would be a cyclical crate dependency, so `puffin-normalize` gets split out from `puffin-package`. `DistInfoName` has the same task and semantics as `PackageName`, so it's merged into the latter. `PackageName` and `ExtraName` documentation is moved onto the type and their constructors are called `new` instead of `normalize`. We now use these constructors rarely enough that the implicit allocation by `to_string()` shouldn't matter anymore, while more actual cloning becomes visible.	2023-11-02 11:15:27 +00:00
Charlie Marsh	2ee555df7b	Use `puffin_cache::digest` in another site (#289 )	2023-11-02 04:48:14 +00:00
Charlie Marsh	8123e1a8f6	Add stable hash crate (#281 ) This PR adds a `puffin-cache` crate that we can share across a variety of other crates to generate stable hashes.	2023-11-01 23:41:45 +00:00
Charlie Marsh	2652caa3e3	Add support for URL dependencies (#251 ) ## Summary This PR adds support for resolving and installing dependencies via direct URLs, like: ``` werkzeug @ `960bb4017c`4aed12b5ed8b78e0153e/Werkzeug-2.0.0-py3-none-any.whl ``` These are fairly common (e.g., with `torch`), but you most often see them as Git dependencies. Broadly, structs like `RemoteDistribution` and friends are now enums that can represent either registry-based dependencies or URL-based dependencies: ```rust /// A built distribution (wheel) that exists as a remote file (e.g., on `PyPI`). #[derive(Debug, Clone)] #[allow(clippy::large_enum_variant)] pub enum RemoteDistribution { /// The distribution exists in a registry, like `PyPI`. Registry(PackageName, Version, File), /// The distribution exists at an arbitrary URL. Url(PackageName, Url), } ``` In the resolver, we now allow packages to take on an extra, optional `Url` field: ```rust #[derive(Debug, Clone, Eq, Derivative)] #[derivative(PartialEq, Hash)] pub enum PubGrubPackage { Root, Package( PackageName, Option<DistInfoName>, #[derivative(PartialEq = "ignore")] #[derivative(PartialOrd = "ignore")] #[derivative(Hash = "ignore")] Option<Url>, ), } ``` However, for the purpose of version satisfaction, we ignore the URL. This allows for the URL dependency to satisfy the transitive request in cases like: ``` flask==3.0.0 werkzeug @ `254c3e9b5f`5941e900b71206e6313b/werkzeug-3.0.1-py3-none-any.whl ``` There are a couple limitations in the current approach: - The caching for remote URLs is done separately in the resolver vs. the installer. I decided not to sweat this too much... We need to figure out caching holistically. - We don't support any sort of time-based cache for remote URLs -- they just exist forever. This will be a problem for URL dependencies, where we need some way to evict and refresh them. But I've deferred it for now. 
- I think I need to redo how this is modeled in the resolver, because right now, we don't detect a variety of invalid cases, e.g., providing two different URLs for a dependency, asking for a URL dependency and a _different version_ of the same dependency in the list of first-party dependencies, etc. - (We don't yet support VCS dependencies.)	2023-11-01 09:21:44 -04:00
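The "ignore the URL for version satisfaction" idea from #251 can be sketched without the `derivative` crate by writing the trait impls by hand. This is a simplified stand-in: the `Package` enum and its `String` fields are hypothetical and omit the `DistInfoName` field of the real `PubGrubPackage`.

```rust
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Hypothetical, simplified package key: identity is the name only.
#[derive(Debug, Clone)]
enum Package {
    Root,
    Named { name: String, url: Option<String> },
}

impl PartialEq for Package {
    fn eq(&self, other: &Self) -> bool {
        match (self, other) {
            (Package::Root, Package::Root) => true,
            // The URL is deliberately left out of the comparison.
            (Package::Named { name: a, .. }, Package::Named { name: b, .. }) => a == b,
            _ => false,
        }
    }
}

impl Eq for Package {}

impl Hash for Package {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Must agree with `eq`: hash the variant and the name, never the URL.
        match self {
            Package::Root => 0u8.hash(state),
            Package::Named { name, .. } => {
                1u8.hash(state);
                name.hash(state);
            }
        }
    }
}

fn main() {
    let from_registry = Package::Named { name: "werkzeug".into(), url: None };
    let from_url = Package::Named {
        name: "werkzeug".into(),
        // Placeholder URL, used only for illustration.
        url: Some("https://example.invalid/werkzeug-3.0.1-py3-none-any.whl".into()),
    };
    assert_eq!(from_registry, from_url);

    let mut seen = HashSet::new();
    assert!(seen.insert(Package::Root));
    assert!(seen.insert(from_registry));
    // The URL variant collapses onto the registry entry.
    assert!(!seen.insert(from_url));
}
```

Treating both forms as the same key is what lets a URL dependency satisfy a transitive request for the same package; it is also why, as the limitations above note, two different URLs for one package are not detected by equality alone.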
Charlie Marsh	89dad0c9ad	Move distribution abstraction in shared crate (#258 ) This also allows us to get rid of `PinnedPackage` _and_ to remove some `Result<...>` types due to needless conversions between otherwise-identical types.	2023-10-31 15:30:06 -04:00
Charlie Marsh	3312ce30f5	Upgrade crates and remove unused dependencies (#256 )	2023-10-31 13:16:58 -04:00
konsti	29bd0a4ed8	Fix musl compilation (#234 ) musl (which we already use in ruff) allows statically linked binaries on linux. This PR switches to rustls and vendors and fixes the glibc detection. Using static musl builds makes it easier to avoid glibc errors in docker and we'll need it later for alpine users anyway. An alternative is using vendored openssl.	2023-10-30 18:10:17 +01:00
Charlie Marsh	2ba85bf80e	Add PubGrub's priority queue (#221 ) Pulls in https://github.com/pubgrub-rs/pubgrub/pull/104.	2023-10-29 21:16:02 +00:00
konsti	5ad58474ca	Add script to check the top 8k pypi packages (#198 ) To check the top 1k (current state): ```bash scripts/resolve/get_pypi_top_8k.sh cargo run --bin puffin-dev -- resolve-many scripts/resolve/pypi_top_8k_flat.txt --limit 1000 ``` Results: ``` Errors: pywin32, geoip2, maxminddb, pypika, dirac Success: 995, Error: 5 ``` pywin32 has no solution for the build environment, 3 have no `[build-system]` entry in pyproject.toml, and `dirac` is missing cmake.	2023-10-26 12:03:59 +00:00
konsti	216b6c41c2	Start puffin-dev (#193 ) Currently, this is only the source distribution building feature moved. It's intended that we can add development and test commands there without affecting the main cli surface	2023-10-26 09:17:22 +00:00
konsti	889f6173cc	Unify python interpreter abstractions (#178 ) Previously, we had two Python interpreter metadata structs, one in gourgeist and one in puffin. Both would spawn a subprocess to query overlapping metadata, and both would appear in the CLI crate; if you weren't careful, you could even have two different base interpreters at once. This change unifies them into one set of metadata, queried and cached once. Another effect of this crate is proper separation of Python interpreter and venv. A base interpreter (such as `/usr/bin/python/`, but also pyenv- and conda-installed Python) has a set of metadata. A venv has a root and inherits the base Python metadata except for `sys.prefix`, which, unlike `sys.base_prefix`, gets set to the venv root. From the root and the interpreter info we can compute the paths inside the venv. We can reuse the interpreter info of the base interpreter when creating a venv without having to query the newly created `python`.	2023-10-25 20:11:36 +00:00
konsti	1fbe328257	Build source distributions in the resolver (#138 ) This isn't ready, but it can resolve `meine_stadt_transparent==0.2.14`. The source distributions are currently being built serially, one after the other; I don't know if that is incidentally due to the resolution order, because sdist building is blocking, or because of something in the resolver that could be improved. It's a bit annoying that the thing that was supposed to do HTTP requests now suddenly also has to do a whole download/unpack/resolve/install/build routine; it messes up the type hierarchy. The much bigger problem, though, is avoiding recursive crate dependencies; it's the reason for the callback and for splitting the builder into two crates (badly named atm).	2023-10-25 20:05:13 +00:00
Charlie Marsh	49a27ff33c	Add support for parameterized link modes (#164 ) Allows the user to select between clone, hardlink, and copy semantics for installs. (The pnpm documentation has a decent description of what these mean: https://pnpm.io/npmrc#package-import-method.) Closes #159.	2023-10-22 04:35:50 +00:00
Charlie Marsh	b665f1489a	Add tests for `puffin sync` (#161 ) Closes #158.	2023-10-22 03:25:00 +00:00
Charlie Marsh	3072c3265e	Add support for lowest and lowest-direct resolution modes (#160 ) Borrows terminology from pnpm by introducing three resolution modes: - "Highest": always choose the highest compliant version (default). - "Lowest": always choose the lowest compliant version. - "LowestDirect": choose the lowest compliant version of direct dependencies, and the highest compliant version of any transitive dependencies. (This makes a bit more sense than "lowest".) Closes https://github.com/astral-sh/puffin/issues/142.	2023-10-21 22:58:06 -04:00
konsti	ae9d1f7572	Add source distribution filename abstraction (#154 ) The need for this became clear when working on the source distribution integration into the resolver. While at it, I also switched the `WheelFilename` version to the parsed `pep440_rs` version now that we have this crate.	2023-10-20 17:45:57 +02:00
Charlie Marsh	4645f79237	Use `FxHash` (#151 )	2023-10-20 05:26:06 +00:00
Charlie Marsh	8001c792e7	Show requirement sources in `pip-compile` output (#149 ) Builds up a complete resolved graph from PubGrub, and shows the sources that led to each package being included in the resolution, like `pip-compile`. Closes https://github.com/astral-sh/puffin/issues/60.	2023-10-20 05:14:59 +00:00
Charlie Marsh	9b3405bf0e	Upgrade PubGrub to dev branch (#147 ) Updates to `29c48fb9f3daa11bd02794edd55060d0b01ee705` from the `pubgrub-rs` dev branch. This lets us reduce the number of changes we've made to PubGrub itself (now, only changing visibility to export a few things from the `solver.rs` module).	2023-10-20 03:23:26 +00:00
Charlie Marsh	03101c6a5c	Add an autogeneration header to pip-compile (#145 ) Closes https://github.com/astral-sh/puffin/issues/132.	2023-10-19 20:57:27 -04:00
Charlie Marsh	d5105a76c5	Improve and test diagnostics for requirements-reading CLI commands (#143 ) Also removes `owo_colors` because it was really painful to get it to avoid printing colors during tests.	2023-10-19 18:13:40 -04:00
Charlie Marsh	ba181eacdd	Accept dependencies from `pyproject.toml` (#141 ) Doesn't support extras yet. It's also supported for `pip uninstall`, which `pip` itself doesn't support, but whatever. Closes #127.	2023-10-19 18:42:05 +00:00
Charlie Marsh	7ef6c0315c	Unify site-packages into distribution enum (#136 ) Gets rid of the custom `DistInfo` struct in the site-packages abstraction in favor of a new kind of distribution (`InstalledDistribution`). No change in behavior.	2023-10-19 04:37:52 +00:00
Charlie Marsh	4b91ae4769	Add CLI tests for add and remove commands (#124 )	2023-10-19 01:06:48 +00:00
konsti	8cc4fe0d44	Install source distribution requirements with puffin itself instead of pip (#122 ) This is also a lot faster. Unfortunately, it copies a lot of code from the sync CLI since the `Printer` is private. The first commit contains some refactorings I made when I thought about how I could reuse the existing code.	2023-10-18 19:11:17 +00:00
Charlie Marsh	7bc42ca2ce	Use `owo_colors` instead of `colored` (#121 ) This is what `miette` uses so seems better to avoid two coloring crates.	2023-10-18 18:57:07 +00:00
Charlie Marsh	1fc03780f9	Use `miette` for `puffin add` diagnostics (#119 ) Experiment in using `miette` for better user-facing diagnostics in the CLI crate. For now, only the `add` command has been migrated, and all the library crates continue to use `anyhow`.	2023-10-18 14:24:09 -04:00
Charlie Marsh	4c87a1d42c	Add a `puffin add` command (#117 ) This needs far better error handling and user-facing feedback, but it does the basic operation (and includes discovery of the `pyproject.toml` file, etc.).	2023-10-18 00:51:20 -04:00
Charlie Marsh	5f5788e866	Surface PubGrub derivation trees (#108 ) I think the derivation trees could be stronger but this exposes PubGrub's proof-like error messages. Closes #102.	2023-10-16 14:14:36 -04:00
Charlie Marsh	7e8ffeb2df	Use `fs-err` in more crates (#100 ) Closes https://github.com/astral-sh/puffin/issues/88.	2023-10-16 13:37:58 +00:00
konsti	fa2fd14587	Add basic sdist builder (#104 ) This adds a basic sdist builder that has been tested with two source distributions, one with a PEP 517 backend and one with setup.py. It uses pip for requirements installation atm, lacks testing in all directions, lacks checks for recursive requirements, can't pass in already resolved versions, doesn't support prepare metadata for build to allow resolution to continue without doing the actual (native) build, error messages are mediocre, etc. ```console $ RUST_LOG=puffin_build=debug puffin-build --wheels wheels downloads/tqdm-4.66.1.tar.gz 2023-10-16T12:28:35.503182Z DEBUG build_sdist{path="downloads/tqdm-4.66.1.tar.gz" base_python="/usr/bin/python3"}: puffin_build: Building downloads/tqdm-4.66.1.tar.gz 2023-10-16T12:28:35.521780Z INFO build_sdist{path="downloads/tqdm-4.66.1.tar.gz" base_python="/usr/bin/python3"}:extract_archive: puffin_build: close time.busy=18.4ms time.idle=16.7µs 2023-10-16T12:28:35.845096Z DEBUG build_sdist{path="downloads/tqdm-4.66.1.tar.gz" base_python="/usr/bin/python3"}:resolve_and_install: puffin_build: Calling pip to install build dependencies 2023-10-16T12:28:37.668660Z INFO build_sdist{path="downloads/tqdm-4.66.1.tar.gz" base_python="/usr/bin/python3"}:resolve_and_install: puffin_build: close time.busy=1.82s time.idle=13.2µs 2023-10-16T12:28:37.668744Z DEBUG build_sdist{path="downloads/tqdm-4.66.1.tar.gz" base_python="/usr/bin/python3"}: puffin_build: Calling `setuptools.build_meta.get_requires_for_build_wheel()` 2023-10-16T12:28:38.159205Z INFO build_sdist{path="downloads/tqdm-4.66.1.tar.gz" base_python="/usr/bin/python3"}:run_python_script{python_interpreter="/tmp/.tmpm4cTra/venv/bin/python"}: puffin_build: close time.busy=490ms time.idle=13.0µs 2023-10-16T12:28:38.159304Z DEBUG build_sdist{path="downloads/tqdm-4.66.1.tar.gz" base_python="/usr/bin/python3"}: puffin_build: Calling `setuptools.build_meta.build_wheel()` 2023-10-16T12:28:38.501732Z INFO 
build_sdist{path="downloads/tqdm-4.66.1.tar.gz" base_python="/usr/bin/python3"}:run_python_script{python_interpreter="/tmp/.tmpm4cTra/venv/bin/python"}: puffin_build: close time.busy=342ms time.idle=15.2µs 2023-10-16T12:28:38.522700Z INFO build_sdist{path="downloads/tqdm-4.66.1.tar.gz" base_python="/usr/bin/python3"}: puffin_build: close time.busy=3.02s time.idle=16.2µs Wheel built to /home/konsti/projects/puffin/crates/puffin-build/wheels/tqdm-4.66.1-py3-none-any.whl 2023-10-16T12:28:38.522772Z DEBUG puffin_build: Took 3020ms $ puffin-build --wheels wheels downloads/geoextract-0.3.1.tar.gz 2023-10-16T12:28:40.884622Z DEBUG build_sdist{path="downloads/geoextract-0.3.1.tar.gz" base_python="/usr/bin/python3"}: puffin_build: Building downloads/geoextract-0.3.1.tar.gz 2023-10-16T12:28:40.887743Z INFO build_sdist{path="downloads/geoextract-0.3.1.tar.gz" base_python="/usr/bin/python3"}:extract_archive: puffin_build: close time.busy=2.97ms time.idle=12.6µs 2023-10-16T12:28:41.469738Z INFO build_sdist{path="downloads/geoextract-0.3.1.tar.gz" base_python="/usr/bin/python3"}: puffin_build: close time.busy=585ms time.idle=15.3µs Wheel built to /home/konsti/projects/puffin/crates/puffin-build/wheels/geoextract-0.3.1-py3-none-any.whl 2023-10-16T12:28:41.469814Z DEBUG puffin_build: Took 585ms ```	2023-10-16 12:43:31 +00:00
Charlie Marsh	471a1d657d	Migrate resolver proof-of-concept to PubGrub (#97 ) ## Summary This PR enables the proof-of-concept resolver to backtrack by way of using the `pubgrub-rs` crate. Rather than using PubGrub as a _framework_ (implementing the `DependencyProvider` trait, letting PubGrub call us), I've instead copied over PubGrub's primary solver hook (which is only ~100 lines or so) and modified it for our purposes (e.g., made it async). There's a lot to improve here, but it's a start that will let us understand PubGrub's appropriateness for this problem space. A few observations: - In simple cases, the resolver is slower than our current (naive) resolver. I think it's just that the pipelining isn't as efficient as in the naive case, where we can just stream package and version fetches concurrently without any bottlenecks. - A lot of the code here relates to bridging PubGrub with our own abstractions -- so we need a `PubGrubPackage`, a `PubGrubVersion`, etc.	2023-10-15 22:05:44 -04:00
Charlie Marsh	a622345fbc	Replace mocked server with 'real' integration tests (#91 ) We can always restore these from history, but right now, it feels a lot more productive to just hit PyPI directly for our integration tests, since we don't have to spend time figuring out mocks.	2023-10-12 17:34:48 +00:00
Charlie Marsh	496cb7b2ef	Migrate to `requirements_txt.rs` (#90 ) Remove the parser I wrote in favor of Konsti's which is much more complete. The only change vs. the version in `poc-monotrail` is that I changed the tests to use insta rather than manually storing and comparing against JSON snapshots. Closes https://github.com/astral-sh/puffin/issues/89.	2023-10-12 17:09:00 +00:00
Charlie Marsh	ed68d31e03	Add a basic test for the resolver (#86 ) Mocks out the PyPI client using some checked-in fixtures. The test is very basic, and I'm not very happy with all the ceremony around the mocks and such, but it's an interesting experiment at least.	2023-10-11 03:30:53 +00:00
Charlie Marsh	d0764bdc23	Add `puffin venv` command to create virtual environments (#83 ) Closes https://github.com/astral-sh/puffin/issues/58.	2023-10-10 13:46:25 -04:00
Charlie Marsh	a0294a510c	Rework `puffin sync` output to summarize (#81 ) This also moves away from using `tracing` for user-facing logging, instead introducing a new `Printer` abstraction. Closes #66.	2023-10-10 03:29:09 +00:00
Charlie Marsh	ba2b200fce	Enable release builds via `cargo-dist` (#79 )	2023-10-09 20:48:55 +00:00
Charlie Marsh	ba72950546	Avoid passing cached wheels to the resolver step (#70 ) When we go to install a locked `requirements.txt`, if a wheel is already available in the local cache, and matches the version specifiers, we can just use it directly without fetching the package metadata. This speeds up the no-op case by about 33%. Closes https://github.com/astral-sh/puffin/issues/48.	2023-10-08 22:17:19 -04:00
Charlie Marsh	0ca17a1cf2	Use local copy of `gourgeist` (#62 ) This PR gets `gourgeist` passing our local CI and integrated into the broader workspace. There's some duplication between concepts in `gourgeist` (like the `InterpreterInfo`) and structs we have elsewhere, but we can tackle those later.	2023-10-08 18:45:08 +00:00
Charlie Marsh	d1ed41170b	Cache environment marker lookups (#55 ) Closes https://github.com/astral-sh/puffin/issues/53.	2023-10-08 05:31:19 +00:00
Charlie Marsh	5eef6e9636	Store cached wheels by dist-info-like name (#52 ) Closes https://github.com/astral-sh/puffin/issues/50.	2023-10-08 04:28:04 +00:00
Charlie Marsh	2a846e76b7	Store unzipped wheels in a cache (#49 ) This PR massively speeds up the case in which you need to install wheels that already exist in the global cache. The new strategy is as follows: - Download the wheel into the content-addressed cache. - Unzip the wheel into the cache, but ignore content-addressing. It turns out that writing to `cacache` for every file in the zip added a ton of overhead, and I don't see any actual advantages to doing so. Instead, we just unzip the contents into a directory at, e.g., `~/.cache/puffin/django-4.1.5`. - (The unzip itself is now parallelized with Rayon.) - When installing the wheel, we now support unzipping from a directory instead of a zip archive. This required duplicating and tweaking a few functions. - When installing the wheel, we now use reflinks (or copy-on-write links). These have a few fantastic properties: (1) they're extremely cheap to create (on macOS, they are allegedly faster than hard links); (2) they minimize disk space, since we avoid copying files entirely in the vast majority of cases; and (3) if the user then edits a file locally, the cache doesn't get polluted. Orogene, Bun, and soon pnpm all use reflinks. Puffin is now ~15x faster than `pip` for the common case of installing cached data into a fresh environment. Closes https://github.com/astral-sh/puffin/issues/21. Closes https://github.com/astral-sh/puffin/issues/39.	2023-10-08 04:04:48 +00:00
Charlie Marsh	92160e37df	Surface error when unable to find package (#45 )	2023-10-07 19:43:12 +00:00
Charlie Marsh	9be02d1590	Skip already-installed dependencies during `sync` command (#43 ) Closes https://github.com/astral-sh/puffin/issues/35.	2023-10-07 19:26:45 +00:00
Charlie Marsh	bc1736feff	Add a `freeze` command to list installed dependencies (#42 ) A pre-requisite for https://github.com/astral-sh/puffin/issues/35.	2023-10-07 18:46:09 +00:00
Charlie Marsh	f3015ffc1f	Add a `clean` command to clear the cache (#41 )	2023-10-07 15:19:03 +00:00
Charlie Marsh	162952bf64	Add a content-addressed cache for wheels (#38 ) Closes https://github.com/astral-sh/puffin/issues/4.	2023-10-07 14:24:52 +00:00
Charlie Marsh	ae28552b3a	Use local copy of `install-wheel-rs` (#34 ) This PR modifies the `install-wheel-rs` crate (and a few others) to get everything playing nicely. Specifically, CI should pass, and all these crates now use workspace dependencies between one another. As part of this change, I split out the wheel name parsing into its own `wheel-filename` crate, and the compatibility tag parsing into its own `platform-tags` crate.	2023-10-07 01:43:55 +00:00
Charlie Marsh	c8477991a9	Use local versions of PEP 440 and PEP 508 crates (#32 ) This PR modifies the PEP 440 and PEP 508 crates to pass CI, primarily by fixing all lint violations. We're also now using these crates in the workspace via `path`. (Previously, we were still fetching them from Cargo.)	2023-10-07 00:16:44 +00:00
Charlie Marsh	1e6a217503	Check in `Cargo.lock` (#29 )	2023-10-06 20:59:17 +00:00
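
The three resolution modes described in commit 3072c3265e ("Highest", "Lowest", "LowestDirect") can be illustrated with a minimal sketch. This is not the code from that commit: the names `ResolutionMode` and `select_version`, the `is_direct` flag, and the modelling of versions as `(major, minor, patch)` tuples are all assumptions made for illustration. The candidates are assumed to have already been filtered down to compliant versions.

```rust
/// Hypothetical enum mirroring the three mode names from the commit message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ResolutionMode {
    /// Always choose the highest compliant version (the default).
    Highest,
    /// Always choose the lowest compliant version.
    Lowest,
    /// Lowest for direct dependencies, highest for transitive ones.
    LowestDirect,
}

/// Pick one version out of `candidates`, which are assumed to already
/// satisfy the requirement. Versions are plain (major, minor, patch)
/// tuples here; a real resolver would use a PEP 440 version type.
pub fn select_version(
    mode: ResolutionMode,
    is_direct: bool,
    candidates: &[(u32, u32, u32)],
) -> Option<(u32, u32, u32)> {
    // Decide whether this particular package should take the lowest version.
    let pick_lowest = match mode {
        ResolutionMode::Highest => false,
        ResolutionMode::Lowest => true,
        ResolutionMode::LowestDirect => is_direct,
    };

    // Tuples compare lexicographically, which matches version ordering
    // for this simplified three-component model.
    if pick_lowest {
        candidates.iter().copied().min()
    } else {
        candidates.iter().copied().max()
    }
}
```

Under this sketch, "LowestDirect" behaves like "Lowest" for a package listed by the user and like "Highest" for anything pulled in transitively, which is the distinction the commit message draws.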

1446 Commits