Python/uv - uv - Gitea: Git with a cup of tea

Commit Graph

Author	SHA1	Message	Date
Zanie	4ab7176b74	WIP: Rust implementation	2024-01-17 08:11:39 -06:00
Charlie Marsh	0f592b67bb	Remove clone from `RegistryWheelIndex` (#937 ) Doesn't need to own the package names.	2024-01-15 16:18:12 -05:00
konsti	8860a9c29e	Add flat index urls to registry wheel index (#928 ) Previously, we were missing flat index wheels in the cache.	2024-01-15 10:21:59 +00:00
konsti	95f3cca28d	Use fs_err in more places (#926 ) Before: ``` error: Failed to download distributions Caused by: Failed to fetch wheel: jaxlib==0.4.23+cuda12.cudnn89 Caused by: Directory not empty (os error 39) ``` After: ``` error: Failed to download distributions Caused by: Failed to fetch wheel: jaxlib==0.4.23+cuda12.cudnn89 Caused by: failed to rename file from /home/konsti/.cache/puffin/.tmpcG7tVP/jaxlib-0.4.23+cuda12.cudnn89-cp310-cp310-manylinux2014_x86_64.whl to /home/konsti/.cache/puffin/wheels-v0/index/9ff50b883297fa9d/jaxlib/jaxlib-0.4.23+cuda12.cudnn89-cp310-cp310-manylinux2014_x86_64 Caused by: Directory not empty (os error 39) ```	2024-01-15 09:39:33 +00:00
konsti	e9b6b6fa36	Implement `--find-links` as flat indexes (directories in pip-compile) (#912 ) Add directory `--find-links` support for local paths to pip-compile. It seems that pip joins all sources and then picks the best package. We explicitly give find links packages precedence if the same exists on an index and locally by prefilling the `VersionMap`, otherwise they are added as another index and the existing rules of precedence apply. Internally, the feature is called _flat index_, which is more meaningful than _find links_: We're not looking for links, we're picking up local directories, and (TBD) support another index format that's just a flat list of files instead of a nested index. `RegistryBuiltDist` and `RegistrySourceDist` now use `WheelFilename` and `SourceDistFilename` respectively. The `File` inside `RegistryBuiltDist` and `RegistrySourceDist` gained the ability to represent both a url and a path so that `--find-links` with a url and with a path works the same, both being locked as `<package_name>@<version>` instead of `<package_name> @ <url>`. (This is more of a detail; this PR in general still works if we strip that and have directory find links represented as `<package_name> @ file:///path/to/file.ext`.) `PrioritizedDistribution` and `FlatIndex` have been moved to locations where we can use them in the upstack PR. I added a `scripts/wheels` directory with stripped down wheels to use for testing. We're lacking tests for correct tag priority precedence with flat indexes; I only confirmed this manually since it is not covered in the pip-compile or pip-sync output. Closes #876	2024-01-15 02:04:10 +00:00
konsti	5ffbfadf66	Make hashes optional (#910 ) There is no guarantee that indexes provide hashes at all or the sha256 we support specifically. [PEP 503](https://peps.python.org/pep-0503/#specification): > The URL SHOULD include a hash in the form of a URL fragment with the following syntax: #<hashname>=<hashvalue>, where <hashname> is the lowercase name of the hash function (such as sha256) and <hashvalue> is the hex encoded digest. We instead use the url as input to generate a hash when caching.	2024-01-14 16:32:55 -05:00
konsti	a99e5e00f2	Use absolute urls in `distribution_types::File` (#917 ) Previously, the url on file could either be a relative or an absolute url, depending on the index, and we would finalize it lazily. Now we finalize the url when converting `pypi_types::File` to `distribution_types::File`. This change is required to make the hashes on `File` optional (https://github.com/astral-sh/puffin/pull/910), which are currently the only unique field usable for caching.	2024-01-14 17:15:24 +00:00
Charlie Marsh	5fd2c380a7	Add `into_cached_dist` to `LocalWheel` (#893 ) Simplifies `unzip_wheel` a bit and avoids unnecessarily cloning in the common case.	2024-01-12 09:01:30 +00:00
Charlie Marsh	35c1faa575	Move in-flight tracking to the download level (#892 ) ## Summary Now that `get_or_build_wheel` will often _also_ handle the unzip step, we need to move our per-target locking (`OnceMap`) up a level. Previously, it was only applied to the unzip step, to prevent us from attempting to unzip into the same target concurrently; now, it's applied at the `get_wheel` level, which includes both downloading and unzipping. ## Test Plan It seems like none of our existing tests catch this -- perhaps because they're too "simple"? You need to run into a situation in which you're doing multiple source distribution builds concurrently (since they'll all try to download `setuptools`): ``` rm -rf foo && virtualenv --clear .venv && cargo run -p puffin-cli -- pip-compile ./scripts/requirements/pydantic.in --verbose --cache-dir foo ```	2024-01-12 09:52:22 +01:00
bojanserafimov	4c047f858f	Remove InMemoryWheel and dead code (#879 )	2024-01-11 10:11:07 -05:00
bojanserafimov	10227a74f8	Unzip while downloading (#856 )	2024-01-11 09:41:46 -05:00
Charlie Marsh	e26dc8e33d	Add support for `prepare_metadata_for_build_wheel` (#842 ) ## Summary This PR adds support for `prepare_metadata_for_build_wheel`, which allows us to determine source distribution metadata without building the source distribution. This represents an optimization for the resolver, as we can skip the expensive build phase for build backends that support it. For reference, `prepare_metadata_for_build_wheel` seems to be supported by: - `hatchling` (as of [1.0.9](https://hatch.pypa.io/latest/history/hatchling/#hatchling-v1.9.0)). - `flit` - `setuptools` In fact, it seems to work for every backend _except_ those using legacy `setup.py`. Closes #599.	2024-01-10 00:07:37 +00:00
konsti	ee6d809b60	Remove unused `Result` (#849 ) Remove some dead code, seems to be a refactoring oversight	2024-01-09 16:35:10 +00:00
konsti	5b0b072e3c	Allow files >4GB on 32-bit platforms (#847 ) Changes `File::size` from a `usize` to a `u64`. The motivations are that, with tensorflow wheels being 475 MB (https://pypi.org/project/tensorflow/2.15.0.post1/#files), we're already only one order of magnitude away from the limit, and that we want to avoid target-dependent failures.	2024-01-09 17:31:49 +01:00
Charlie Marsh	19c6d655b5	Avoid duplicated source distribution handling in url (#841 ) ## Summary Right now, both the callback _and_ the "We have no compatible wheel" paths have a lot of repeated code. This PR changes the callback to _just_ remove all the wheels and handle the download, and the rest of the method following the callback is responsible for finding and building any wheels.	2024-01-08 16:19:54 -05:00
Charlie Marsh	cc9140643e	Rename `metadata` to `built_wheel` in `source/mod.rs` (#840 )	2024-01-08 19:20:20 +00:00
Charlie Marsh	df254087d9	Break `source_dist.rs` into a module (#839 ) ## Summary Finding this file hard to edit and work in since it's gotten quite large.	2024-01-08 19:14:45 +00:00
konsti	26f597a787	Add spans to all significant tasks (#740 ) I've tried to investigate puffin's performance with respect to builds and parallelism in general, but found the previous instrumentation too granular. I've tried to add spans to every function that either needs noticeable io or cpu resources without creating duplication. This also fixes some wrong tracing usage on async functions (https://docs.rs/tracing/latest/tracing/struct.Span.html#in-asynchronous-code) and some spans that weren't actually entered.	2024-01-02 16:17:03 +00:00
Charlie Marsh	007f52bb4e	Add support for relative URLs in simple metadata responses (#721 ) ## Summary This PR adds support for relative URLs in the simple JSON responses. We already support relative URLs for HTML responses, but the handling has been consolidated between the two. Similar to index URLs, we now store the base alongside the metadata, and use the base when resolving the URL. Closes #455. ## Test Plan `cargo test` (to test HTML indexes). Separately, I also ran `cargo run -p puffin-cli -- pip-compile requirements.in -n --index-url=http://localhost:3141/packages/pypi/+simple` on the `zb/relative` branch with `packse` running, and forced both HTML and JSON by limiting the `accept` header.	2023-12-27 08:53:21 -05:00
Charlie Marsh	bbe0246205	Change internal representation of `CacheEntry` to avoid allocations (#730 ) Removes a TODO.	2023-12-26 02:10:30 +00:00
Charlie Marsh	188ab75769	Split `File` into internal and external type (#729 ) ## Summary This PR makes the `pypi_types::File` a response-only type (i.e., a type that's only used when deserializing over the wire), and adds a separate internal `File` type. Right now, the representations are similar, but already, we can avoid the "lenient" deserialization on our internal `File` type, and avoid the special-casing of the property names that's required in the JSON. Over time, we can evolve this representation entirely separately from the representation we receive from PyPI and other indexes.	2023-12-25 15:42:28 -05:00
Charlie Marsh	6ff21374dc	Split `puffin-cache` into Puffin-specific and generic utilities (#728 ) This crate started off as generic caching utilities, but we started adding a lot of Puffin-specific stuff (like the cache buckets abstraction that knows about Git vs. direct URL vs. indexes and so on). This PR moves the generic stuff into a new `cache-key` crate.	2023-12-25 14:38:56 +00:00
Charlie Marsh	ad34bb02a9	Modify some inconsistent exports (#724 )	2023-12-24 22:30:03 +00:00
konsti	e23292641f	Add pypi 10k packages with most dependents dataset (#711 ) From manual inspection, this dataset generated through the [libraries.io API](https://libraries.io/api#project-search) seems more mainstream than the current 8k one, which is also preserved. I've added the dataset to the repo because the API requires an API key.	2023-12-24 18:31:52 +00:00
konsti	b7ad97a823	Show resource and lockfile when waiting (#715 ) We lock git checkout directories and the virtualenv to avoid two puffin instances running in parallel changing files at the same time and leading to a broken state. When one instance is blocking another, we need to inform the user (why is the program hanging?) and also add some information for them to debug the situation. The new messages will print ``` Waiting to acquire lock for /home/konsti/projects/puffin/.venv (lockfile: /home/konsti/projects/puffin/.venv/.lock) ``` or ``` Waiting to acquire lock for git+https://github.com/pydantic/pydantic-extra-types@0ce9f207a1e09a862287ab77512f0060c1625223 (lockfile: /home/konsti/projects/puffin/cache-all-kinds/git-v0/locks/f157fd329a506a34) ``` The messages aren't perfect but clear enough to see what the contention is and in the worst case to delete the lockfile. Fixes #714	2023-12-21 00:05:49 +01:00
Charlie Marsh	3660d8a08e	Introduce separate traits for ahead-of-time and installed metadata (#692 ) This is a pure refactor to follow-up #690, to separate the metadata that we know upfront about distributions (like the version, for registry-based distributions) vs. the metadata that requires building (like the version, for URL-based distributions).	2023-12-18 22:37:45 +00:00
Charlie Marsh	98fcb76015	Lock entire virtualenv during modifying commands (#695 ) These commands all assume that the `site-packages` are constant throughout. Closes #691.	2023-12-18 16:44:45 -05:00
konsti	f059c6e6a6	Support editable in pip-sync and pip-compile (#587 ) Support `-e path/to/dir` in pip-sync and pip-compile.	2023-12-16 22:37:34 +00:00
konsti	71964ec7a8	Switch to msgpack in the cached client (#662 ) This gives a 1.23x speedup on transformers-extras. We could change to msgpack for the entire cache if we want. I only tried this format and postcard so far, where postcard was much slower (like 1.6s). I don't actually want to merge it like this; I wanted to figure out the ballpark of improvement for switching away from json. ``` hyperfine --warmup 3 --runs 10 "target/profiling/puffin pip-compile --cache-dir cache-msgpack scripts/requirements/transformers-extras.in" "target/profiling/branch pip-compile scripts/requirements/transformers-extras.in" Benchmark 1: target/profiling/puffin pip-compile --cache-dir cache-msgpack scripts/requirements/transformers-extras.in Time (mean ± σ): 179.1 ms ± 4.8 ms [User: 157.5 ms, System: 48.1 ms] Range (min … max): 174.9 ms … 188.1 ms 10 runs Benchmark 2: target/profiling/branch pip-compile scripts/requirements/transformers-extras.in Time (mean ± σ): 221.1 ms ± 6.7 ms [User: 208.1 ms, System: 46.5 ms] Range (min … max): 213.5 ms … 235.5 ms 10 runs Summary target/profiling/puffin pip-compile --cache-dir cache-msgpack scripts/requirements/transformers-extras.in ran 1.23 ± 0.05 times faster than target/profiling/branch pip-compile scripts/requirements/transformers-extras.in ``` Disadvantage: We can't manually look into the cache anymore to debug things - [ ] Check more formats; I currently only tested json, msgpack and postcard, there should be other formats, too - [x] Switch over `CachedByTimestamp` serialization (for the interpreter caching) - [x] Switch over error handling and make sure puffin is still resilient to cache failure	2023-12-16 21:01:35 +00:00
Charlie Marsh	1129661a22	Ignore missing manifest entries in the built wheel cache (#654 ) ## Summary This is more of a hypothetical problem, but the cache manifest could in theory get out-of-sync with the contents on disk. This PR modifies the `BuiltWheelMetadata` lookup to warn (but not fail) if the manifest includes a wheel that no longer exists on disk. You can mimic this by removing a wheel from the `built-wheels-v0` cache without modifying the manifest correspondingly.	2023-12-15 17:24:09 +00:00
Charlie Marsh	84093773ef	Store source distribution sources in the cache (#653 ) ## Summary This PR modifies `source_dist.rs` to store source distributions (from remote URLs) in the cache. The cache structure for registries now looks like: <img width="1053" alt="Screen Shot 2023-12-14 at 10 43 43 PM" src="https://github.com/astral-sh/puffin/assets/1309177/3c2dbf6b-5926-41f2-b69b-74031741aba8"> (I will update the docs prior to merging, if approved.) The benefit here is that we can reuse the source distribution (avoid download + unzipping it) if we need to build multiple wheels. In the future, it will be even more relevant, since we'll need to reuse the source distribution to support https://github.com/astral-sh/puffin/issues/599. I also included some misc. refactors to DRY up repeated operations and add some more abstraction to `source_dist.rs`.	2023-12-15 17:19:33 +00:00
Charlie Marsh	a361ccfbb3	Remove additional metadata call in `source_dist.rs` (#652 )	2023-12-14 19:45:31 +00:00
Charlie Marsh	ed8dfbfcf7	Preserve verbatim URLs (#639 ) ## Summary This PR adds a `VerbatimUrl` struct to preserve verbatim URLs throughout the resolution and installation pipeline. In short, alongside the parsed `Url`, we also keep the URL as written by the user. This enables us to display the URL exactly as written by the user, rather than the serialized path that we use internally. This will be especially useful once we start expanding environment variables since, at that point, we'll be able to write the version of the URL that includes the _unexpanded_ environment variable to the output file.	2023-12-14 15:03:39 +00:00
Charlie Marsh	db7e2dedbb	Move archive extraction into its own crate (#647 ) We have some shared utilities beyond `puffin-build` and `puffin-distribution`, and further, I want to be able to access the sdist archive extraction logic from `puffin-distribution`. This is really generic, so moving into its own crate.	2023-12-14 04:49:09 +00:00
Charlie Marsh	388641643d	Remove `SourceDistDownload` struct (#646 ) This is created in one place, then immediately destructed into fields.	2023-12-14 02:34:50 +00:00
Charlie Marsh	e0127581b6	Use `fs_err` in more places (#644 )	2023-12-14 01:11:45 +00:00
Charlie Marsh	8071a23863	Add dedicated ID types to avoid opaque strings (#642 ) This allows us to enforce type safety within the resolver. For example, in the index, we can remove `String` as a key type and enforce that callers _must_ present us with a `PackageId`. (This actually caught one bug, where we were using the SHA rather than the package ID. That bug shouldn't have had any effect given where it was, since those are 1:1, but it's still problematic.)	2023-12-14 00:53:33 +00:00
konsti	0b20f6a25a	Proper unzip error type (#636 ) Move the `Unzip` trait from anyhow to `ZipError\|io::Error`.	2023-12-13 12:55:59 +00:00
konsti	a24a681db9	Towards using `prepare_metadata_for_build_wheel` in the resolver (#616 ) Make `prepare_metadata_for_build_wheel` accessible across the puffin codebase by splitting the build call into a setup, a metadata and a wheel call. This does not actually use the hook yet, but it's the required refactoring for it. Part of #599.	2023-12-12 20:45:37 +00:00
Charlie Marsh	1181288078	Download, build, and install in a single pipeline phase (#605 ) ## Summary At present, we have two separate phases within the installation pipeline related to populating wheels into the cache. The first phase downloads the distribution, and then builds any source distributions into wheels; the second phase unzips all the built wheels into the cache. This PR merges those two phases into one, such that we seamlessly download, build, and unzip wheels in one pass. This is more efficient, since we can start unzipping while we build. It also ensures that if the install _fails_ partway through, we don't end up with a bunch of downloaded wheels that we never had a chance to unzip. The code is also much simpler. The main downside is that the user-facing feedback isn't as granular, since we only have one phase and one progress bar for what was originally three distinct phases. Closes https://github.com/astral-sh/puffin/issues/571. ## Test Plan I ran the benchmark script on two separate requirements files, and saw a 7% and 31% speedup respectively: ```text + TARGET=./scripts/benchmarks/requirements.txt + hyperfine --runs 100 --warmup 10 --prepare 'virtualenv --clear .venv' './target/release/main pip-sync ./scripts/benchmarks/requirements.txt --no-cache' --prepare 'virtualenv --clear .venv' './target/release/puffin pip-sync ./scripts/benchmarks/requirements.txt --no-cache' Benchmark 1: ./target/release/main pip-sync ./scripts/benchmarks/requirements.txt --no-cache Time (mean ± σ): 269.4 ms ± 33.0 ms [User: 42.4 ms, System: 117.5 ms] Range (min … max): 221.7 ms … 446.7 ms 100 runs Benchmark 2: ./target/release/puffin pip-sync ./scripts/benchmarks/requirements.txt --no-cache Time (mean ± σ): 250.6 ms ± 28.3 ms [User: 41.5 ms, System: 127.4 ms] Range (min … max): 207.6 ms … 336.4 ms 100 runs Summary './target/release/puffin pip-sync ./scripts/benchmarks/requirements.txt --no-cache' ran 1.07 ± 0.18 times faster than './target/release/main pip-sync 
./scripts/benchmarks/requirements.txt --no-cache' ``` ```text + TARGET=./scripts/benchmarks/requirements-large.txt + hyperfine --runs 100 --warmup 10 --prepare 'virtualenv --clear .venv' './target/release/main pip-sync ./scripts/benchmarks/requirements-large.txt --no-cache' --prepare 'virtualenv --clear .venv' './target/release/puffin pip-sync ./scripts/benchmarks/requirements-large.txt --no-cache' Benchmark 1: ./target/release/main pip-sync ./scripts/benchmarks/requirements-large.txt --no-cache Time (mean ± σ): 5.053 s ± 0.354 s [User: 1.413 s, System: 6.710 s] Range (min … max): 4.584 s … 6.333 s 100 runs Benchmark 2: ./target/release/puffin pip-sync ./scripts/benchmarks/requirements-large.txt --no-cache Time (mean ± σ): 3.845 s ± 0.225 s [User: 1.364 s, System: 6.970 s] Range (min … max): 3.482 s … 4.715 s 100 runs Summary './target/release/puffin pip-sync ./scripts/benchmarks/requirements-large.txt --no-cache' ran ```	2023-12-11 15:42:29 +00:00
Charlie Marsh	00f1703111	Avoid storing partial wheels in the cache (#604 ) Closes https://github.com/astral-sh/puffin/issues/603.	2023-12-09 19:11:30 -05:00
Charlie Marsh	f1c05dcd66	Buffer streamed file writes (#602 )	2023-12-09 16:20:31 +00:00
Charlie Marsh	0499fe0613	Fix incorrect unknown size marker in traces (#600 ) It said `(unknown size)` for _all_ disk-based wheels.	2023-12-09 04:46:01 +00:00
Charlie Marsh	714a64549b	Use a progress bar for the build phase (#597 ) I think this might've been an oversight when copying over the build reporting during the source distribution refactor.	2023-12-09 04:05:13 +00:00
Charlie Marsh	a24534b0ce	Use `rustc-hash` instead of `fxhash` crate (#594 ) `fxhash` is the old, less maintained version of this crate (`rustc-hash`). We use the latter in Ruff.	2023-12-08 20:27:49 +00:00
konsti	6005d7a552	Keep track of in flight unzips using `OnceMap` (#544 ) I saw warnings when we were e.g. unzipping wheel and setuptools in two tasks at the same time. We now keep track of in flight unzips. This introduces a `OnceMap` abstraction which we also use in the resolver.	2023-12-08 20:18:11 +00:00
Charlie Marsh	4b8642c6f7	Enable selective cache purging in `puffin clean` (#589 ) ## Summary This PR enables `puffin clean` to accept package names as command line arguments, and selectively purge entries from the cache tied to the given package. Related to #572. ## Test Plan Modified all the caching tests to run an additional step to (1) purge the cache, and (2) re-install the package.	2023-12-08 19:51:32 +00:00
Charlie Marsh	5ae3a8b1cb	Restructure Git cache to include package name (#588 ) ## Summary This PR modifies the Git wheel cache to: (1) use a shorter version of the SHA, to save space; and (2) include the package name, for consistency with all other buckets. I considered removing the URL hash entirely, and _just_ using the SHA, which would be even _more_ consistent with other buckets. But if we remove the URL, then we won't have separate directories for subdirectories (which are part of the URL). Before: <img width="1035" alt="Screen Shot 2023-12-07 at 7 23 42 PM" src="https://github.com/astral-sh/puffin/assets/1309177/86afce67-682f-464f-9ba1-0b60d5b7f19f"> After: <img width="1232" alt="Screen Shot 2023-12-07 at 8 09 23 PM" src="https://github.com/astral-sh/puffin/assets/1309177/eda42a19-974f-47fe-8c83-54a602ddfd2d">	2023-12-07 20:17:41 -05:00
Charlie Marsh	a825b2db06	Shard the registry cache by package (#583 ) ## Summary This PR modifies the cache structure in a few ways. Most notably, we now shard the set of registry wheels by package, and index them lazily when computing the install plan. This applies both to built wheels: <img width="989" alt="Screen Shot 2023-12-06 at 4 42 19 PM" src="https://github.com/astral-sh/puffin/assets/1309177/0e8a306f-befd-4be9-a63e-2303389837bb"> And remote wheels: <img width="836" alt="Screen Shot 2023-12-06 at 4 42 30 PM" src="https://github.com/astral-sh/puffin/assets/1309177/7fd908cd-dd86-475e-9779-07ed067b4a1a"> For other distributions, we now consistently cache using the package name, which is really just for clarity and debuggability (we could consider omitting these): <img width="955" alt="Screen Shot 2023-12-06 at 4 58 30 PM" src="https://github.com/astral-sh/puffin/assets/1309177/3e8d0f99-df45-429a-9175-d57b54a72e56"> Obliquely closes https://github.com/astral-sh/puffin/issues/575.	2023-12-07 05:02:46 +00:00
Charlie Marsh	aa065f5c97	Modify install plan to support all distribution types (#581 ) This PR adds caching support for built wheels in the installer. Specifically, the `RegistryWheelIndex` now indexes both downloaded and built wheels (from registries), and we have a new `BuiltWheelIndex` that takes a subdirectory and returns the "best-matching" compatible wheel. Closes #570.	2023-12-07 04:43:34 +00:00
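
Note on in-flight tracking (commits 6005d7a552 and 35c1faa575): the per-target locking described there can be illustrated with a minimal sketch. This is not puffin's `OnceMap`, which is used in async code; it is a blocking, standard-library-only stand-in, and the names `get_or_compute` and `count_unzips` are hypothetical. It only shows the idea that concurrent callers asking for the same target share a single computation.

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex, OnceLock};
use std::thread;

/// Hypothetical, blocking stand-in for the `OnceMap` idea: at most one
/// computation runs per key, and every caller receives the same result.
struct OnceMap<K, V> {
    slots: Mutex<HashMap<K, Arc<OnceLock<V>>>>,
}

impl<K: Eq + Hash, V: Clone> OnceMap<K, V> {
    fn new() -> Self {
        Self { slots: Mutex::new(HashMap::new()) }
    }

    /// Returns the value for `key`, running `compute` only if no other
    /// caller has already started (or finished) the work for that key.
    fn get_or_compute(&self, key: K, compute: impl FnOnce() -> V) -> V {
        // Hold the map lock only long enough to find or create the slot.
        let slot = {
            let mut slots = self.slots.lock().unwrap();
            slots.entry(key).or_default().clone()
        };
        // `get_or_init` blocks other callers until the first one finishes.
        slot.get_or_init(compute).clone()
    }
}

/// Spawns `callers` threads that all request the same target and returns
/// how many times the (simulated) unzip actually ran.
fn count_unzips(callers: usize) -> usize {
    let map: OnceMap<&'static str, String> = OnceMap::new();
    let runs = AtomicUsize::new(0);
    thread::scope(|scope| {
        for _ in 0..callers {
            scope.spawn(|| {
                map.get_or_compute("setuptools", || {
                    runs.fetch_add(1, Ordering::SeqCst);
                    String::from("unzipped/setuptools")
                })
            });
        }
    });
    runs.load(Ordering::SeqCst)
}

fn main() {
    // Eight concurrent requests for the same target share one computation.
    assert_eq!(count_unzips(8), 1);
}
```

Applying the same map one level up, around download plus unzip rather than unzip alone, is the change 35c1faa575 describes.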


96 Commits
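
Note on optional hashes (commits 5ffbfadf66 and a99e5e00f2): those messages say the cache uses the file URL, rather than an index-provided hash, as input when generating a cache key. A minimal sketch of that idea follows. The `IndexFile` struct and `cache_key` function are hypothetical, and the standard library's `DefaultHasher` stands in for whatever hash puffin actually uses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for an index file entry; not puffin's `File` type.
struct IndexFile {
    /// Absolute URL of the file, finalized when the entry is created.
    url: String,
    /// Digest reported by the index, if it reported one at all.
    sha256: Option<String>,
}

/// Derives a cache key from the URL alone, so that a missing or
/// unsupported index hash does not affect caching.
fn cache_key(file: &IndexFile) -> String {
    let mut hasher = DefaultHasher::new();
    file.url.hash(&mut hasher);
    format!("{:016x}", hasher.finish())
}

fn main() {
    let with_hash = IndexFile {
        url: String::from("https://example.invalid/simple/foo/foo-1.0-py3-none-any.whl"),
        sha256: Some(String::from("abc123")),
    };
    let without_hash = IndexFile {
        url: with_hash.url.clone(),
        sha256: None,
    };
    // The key depends only on the URL, not on whether a digest is present.
    assert_eq!(cache_key(&with_hash), cache_key(&without_hash));
    assert!(with_hash.sha256.is_some() && without_hash.sha256.is_none());
}
```

This only works if every entry carries an absolute URL, which is why a99e5e00f2 finalizes the URL eagerly before the hashes become optional.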