Python/uv - uv - Gitea: Git with a cup of tea

Commit Graph

Author	SHA1	Message	Date
Charlie Marsh	9fd3b8298d	Use `fs_err::tokio` consistently in distribution database (#1055 )	2024-01-22 19:14:29 -05:00
Zanie Blue	33b35f7020	Add support for disabling installation from pre-built wheels (#956 ) Adds support for disabling installation from pre-built wheels i.e. the package must be built from source locally. We will still always use pre-built wheels for metadata during resolution. Available via `--no-binary` and `--no-binary-package <name>` flags in `pip install` and `pip sync`. There is no flag for `pip compile` since no installation happens there. ``` --no-binary Don't install pre-built wheels. When enabled, all installed packages will be installed from a source distribution. The resolver will still use pre-built wheels for metadata. --no-binary-package <NO_BINARY_PACKAGE> Don't install pre-built wheels for a specific package. When enabled, the specified packages will be installed from a source distribution. The resolver will still use pre-built wheels for metadata. ``` When packages are already installed, the `--no-binary` flag will have no effect without the `--reinstall` flag. In the future, I'd like to change this by tracking if a local distribution is from a pre-built wheel or a locally-built wheel. However, this is significantly more complex and different than `pip`'s behavior so deferring for now. For reference, `pip`'s flag works as follows: ``` --no-binary <format_control> Do not use binary packages. Can be supplied multiple times, and each time adds to the existing value. Accepts either ":all:" to disable all binary packages, ":none:" to empty the set (notice the colons), or one or more package names with commas between them (no colons). Note that some packages are tricky to compile and may fail to install when this option is used on them. ``` Note we are not matching the exact `pip` interface here because it seems complicated to use. I think we may want to consider adjusting our interface for this behavior since we're not entirely compatible anyway e.g. I think `--force-build` and `--force-build-package` are clearer names. 
We could also consider matching the `pip` interface or only allowing `--no-binary <package>` for compatibility. We can of course do whatever we want in our _own_ install interfaces later. Additionally, we may want to further consider the semantics of `--no-binary`. For example, if I run `pip install pydantic --no-binary` I expect _just_ Pydantic to be installed without binaries but by default we will build all of Pydantic's dependencies too. This work was prompted by #895, as it is much easier to measure performance gains from building source distributions if we have a flag to ensure we actually build source distributions. Additionally, this is a flag I have used frequently in production to debug packages that ship Cythonized wheels.	2024-01-19 11:24:27 -06:00
konsti	47fc90d1b3	Reduce stack usage by boxing `File` in `Dist`, `CachePolicy` and large futures (#1004 ) This is https://github.com/astral-sh/puffin/pull/947 again but this time merging into main instead of downstack, sorry for the noise. --- Windows has a default stack size of 1MB, which makes puffin often fail with stack overflows. The PR reduces stack usage through three changes: * Boxing `File` in `Dist`, reducing the size from 496 to 240. * Boxing the largest futures. * Boxing `CachePolicy` ## Method Debugging happened on linux using https://github.com/astral-sh/puffin/pull/941 to limit the stack size to 1MB. I ran the command below. ``` RUSTFLAGS=-Zprint-type-sizes cargo +nightly build -p puffin-cli -j 1 > type-sizes.txt && top-type-sizes -w -s -h 10 < type-sizes.txt > sizes.txt ``` The main drawback is top-type-sizes not saying what the `__awaitee` is, so it requires manually looking up a future with a matching size. When the `brotli` feature on `reqwest` is active, a lot of brotli types show up. Toggling this feature however seems to have no effect. I assume they are false positives since the `brotli` crate has elaborate control over allocation. The sizes are therefore shown with the feature off. ## Results The largest future goes from 12208B to 6416B, the largest type (`PrioritizedDistribution`, see also #948) from 17448B to 9264B. Full diff: https://gist.github.com/konstin/62635c0d12110a616a1b2bfcde21304f For the second commit, I iteratively boxed the largest file until the tests passed, then with an 800KB stack limit looked through the backtrace of a failing test and added some more boxing. 
Quick benchmarking showed no difference: ```console $ hyperfine --warmup 2 "target/profiling/main-dev resolve meine_stadt_transparent" "target/profiling/puffin-dev resolve meine_stadt_transparent" Benchmark 1: target/profiling/main-dev resolve meine_stadt_transparent Time (mean ± σ): 49.2 ms ± 3.0 ms [User: 39.8 ms, System: 24.0 ms] Range (min … max): 46.6 ms … 63.0 ms 55 runs Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options. Benchmark 2: target/profiling/puffin-dev resolve meine_stadt_transparent Time (mean ± σ): 47.4 ms ± 3.2 ms [User: 41.3 ms, System: 20.6 ms] Range (min … max): 44.6 ms … 60.5 ms 62 runs Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options. Summary target/profiling/puffin-dev resolve meine_stadt_transparent ran 1.04 ± 0.09 times faster than target/profiling/main-dev resolve meine_stadt_transparent ```	2024-01-19 09:38:36 +00:00
Charlie Marsh	9b24fcd306	Remove verbatim URL from path file location (#998 ) ## Summary I got confused by why `VerbatimUrl` was on `Path`. Since it's directly computed from it, I think we should just compute it as-needed. I think it's also possibly-buggy because the URL is the URL of the _directory_, not the artifact itself, which differs from other distributions.	2024-01-18 22:40:48 -05:00
Charlie Marsh	a0420114c3	Avoid storing absolute URLs for files (#944 ) ## Summary It turns out that storing an absolute URL for every file caused a significant performance regression. This PR attempts to address the regression with two changes. The first is that we now store the raw string if the URL is an absolute URL. If the URL is relative, we store the base URL alongside the raw relative string. As such, we avoid serializing and deserializing URLs until we need them (later on), except for the base URL. The second is that we now use the internal `Url` crate methods for serializing and deserializing. If you look inside `Url`, its standard serializer and deserialization actually convert it to a string, then parse the string. But the crate exposes some other methods for faster serialization and deserialization (with fewer guarantees). I think this is totally fine since the cache is entirely internal. If we _just_ change the `Url` serialization (and no other code -- so continue to store URLs for every file), then the regression goes down to about 5%: ```shell ❯ python -m scripts.bench \ --puffin-path ./target/release/main \ --puffin-path ./target/release/relative --puffin-path ./target/release/puffin \ scripts/requirements/home-assistant.in --benchmark resolve-warm Benchmark 1: ./target/release/main (resolve-warm) Time (mean ± σ): 496.3 ms ± 4.3 ms [User: 452.4 ms, System: 175.5 ms] Range (min … max): 487.3 ms … 502.4 ms 10 runs Benchmark 2: ./target/release/relative (resolve-warm) Time (mean ± σ): 284.8 ms ± 2.1 ms [User: 245.8 ms, System: 165.6 ms] Range (min … max): 280.3 ms … 288.0 ms 10 runs Benchmark 3: ./target/release/puffin (resolve-warm) Time (mean ± σ): 300.4 ms ± 3.2 ms [User: 255.5 ms, System: 178.1 ms] Range (min … max): 295.4 ms … 305.1 ms 10 runs Summary './target/release/relative (resolve-warm)' ran 1.05 ± 0.01 times faster than './target/release/puffin (resolve-warm)' 1.74 ± 0.02 times faster than './target/release/main (resolve-warm)' ``` So I 
considered _just_ making that change. But 5% is kind of borderline... With both of these changes, the regression is down to 1-2%: ``` Benchmark 1: ./target/release/relative (resolve-warm) Time (mean ± σ): 282.6 ms ± 7.4 ms [User: 244.6 ms, System: 181.3 ms] Range (min … max): 275.1 ms … 318.5 ms 30 runs Benchmark 2: ./target/release/puffin (resolve-warm) Time (mean ± σ): 286.8 ms ± 2.2 ms [User: 247.0 ms, System: 169.1 ms] Range (min … max): 282.3 ms … 290.7 ms 30 runs Summary './target/release/relative (resolve-warm)' ran 1.01 ± 0.03 times faster than './target/release/puffin (resolve-warm)' ``` It's consistently ~2%-ish, but at this point it's unclear if that's due to the URL change or some other change between now and then. Closes #943.	2024-01-17 09:15:21 -05:00
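The two storage cases described in this commit (the raw string for an absolute URL, or the base URL plus the raw relative string) can be illustrated with a minimal Python sketch. `FileLocation`, `from_index` and `to_url` are hypothetical names chosen for illustration, not the types used in the Rust codebase, and the scheme check is a deliberate simplification. The point is only that joining and parsing are deferred until the URL is actually needed.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urljoin


@dataclass(frozen=True)
class FileLocation:
    """Hypothetical stand-in for a cached file entry (not the real Rust type)."""

    raw: str                    # URL exactly as it appeared in the index response
    base: Optional[str] = None  # Only set when `raw` is relative

    @classmethod
    def from_index(cls, base: str, raw: str) -> "FileLocation":
        # Simplified scheme check (an assumption); no URL is parsed or joined here.
        if "://" in raw:
            return cls(raw=raw)
        return cls(raw=raw, base=base)

    def to_url(self) -> str:
        # The join is paid for only when the URL is actually needed.
        if self.base is None:
            return self.raw
        return urljoin(self.base, self.raw)
```

Storing only strings keeps serialization and deserialization of the cache cheap; the cost of building a full URL is moved to the few files that are actually downloaded.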
konsti	95f3cca28d	Use fs_err in more places (#926 ) Before: ``` error: Failed to download distributions Caused by: Failed to fetch wheel: jaxlib==0.4.23+cuda12.cudnn89 Caused by: Directory not empty (os error 39) ``` After: ``` error: Failed to download distributions Caused by: Failed to fetch wheel: jaxlib==0.4.23+cuda12.cudnn89 Caused by: failed to rename file from /home/konsti/.cache/puffin/.tmpcG7tVP/jaxlib-0.4.23+cuda12.cudnn89-cp310-cp310-manylinux2014_x86_64.whl to /home/konsti/.cache/puffin/wheels-v0/index/9ff50b883297fa9d/jaxlib/jaxlib-0.4.23+cuda12.cudnn89-cp310-cp310-manylinux2014_x86_64 Caused by: Directory not empty (os error 39) ```	2024-01-15 09:39:33 +00:00
konsti	e9b6b6fa36	Implement `--find-links` as flat indexes (directories in pip-compile) (#912 ) Add directory `--find-links` support for local paths to pip-compile. It seems that pip joins all sources and then picks the best package. We explicitly give find links packages precedence if the same exists on an index and locally by prefilling the `VersionMap`, otherwise they are added as another index and the existing rules of precedence apply. Internally, the feature is called _flat index_, which is more meaningful than _find links_: We're not looking for links, we're picking up local directories, and (TBD) support another index format that's just a flat list of files instead of a nested index. `RegistryBuiltDist` and `RegistrySourceDist` now use `WheelFilename` and `SourceDistFilename` respectively. The `File` inside `RegistryBuiltDist` and `RegistrySourceDist` gained the ability to represent both a url and a path so that `--find-links` with a url and with a path works the same, both being locked as `<package_name>@<version>` instead of `<package_name> @ <url>`. (This is more of a detail; this PR in general still works if we strip that and have directory find links represented as `<package_name> @ file:///path/to/file.ext`) `PrioritizedDistribution` and `FlatIndex` have been moved to locations where we can use them in the upstack PR. I added a `scripts/wheels` directory with stripped down wheels to use for testing. We're lacking tests for correct tag priority precedence with flat indexes; I only confirmed this manually since it is not covered in the pip-compile or pip-sync output. Closes #876	2024-01-15 02:04:10 +00:00
konsti	a99e5e00f2	Use absolute urls in `distribution_type::File` (#917 ) Previously, the url on file could either be a relative or an absolute url, depending on the index, and we would finalize it lazily. Now we finalize the url when converting `pypi_types::File` to `distribution_types::File`. This change is required to make the hashes on `File` optional (https://github.com/astral-sh/puffin/pull/910), which are currently the only unique field usable for caching.	2024-01-14 17:15:24 +00:00
Charlie Marsh	35c1faa575	Move in-flight tracking to the download level (#892 ) ## Summary Now that `get_or_build_wheel` will often _also_ handle the unzip step, we need to move our per-target locking (`OnceMap`) up a level. Previously, it was only applied to the unzip step, to prevent us from attempting to unzip into the same target concurrently; now, it's applied at the `get_wheel` level, which includes both downloading and unzipping. ## Test Plan It seems like none of our existing tests catch this -- perhaps because they're too "simple"? You need to run into a situation in which you're doing multiple source distribution builds concurrently (since they'll all try to download `setuptools`): ``` rm -rf foo && virtualenv --clear .venv && cargo run -p puffin-cli -- pip-compile ./scripts/requirements/pydantic.in --verbose --cache-dir foo ```	2024-01-12 09:52:22 +01:00
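The per-target locking idea described above (concurrent requests for the same target share one download-and-unzip operation) can be sketched in Python with `asyncio`. The `OnceMap` class below is a toy illustration written for this note, not the Rust `OnceMap` type from the codebase, and `get_or_run` is a hypothetical method name.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Hashable


class OnceMap:
    """Toy in-flight tracker: at most one task is started per key."""

    def __init__(self) -> None:
        self._tasks: Dict[Hashable, asyncio.Task] = {}

    async def get_or_run(self, key: Hashable, work: Callable[[], Awaitable]):
        task = self._tasks.get(key)
        if task is None:
            # First caller for this key starts the work; later callers reuse it.
            task = asyncio.create_task(work())
            self._tasks[key] = task
        # Every caller awaits the same task, so the work runs once.
        return await task
```

Because the lookup and the task creation happen without an intervening `await`, two coroutines on the same event loop cannot both start work for the same key. Placing the map around the whole download-plus-unzip step, rather than only the unzip step, is what the commit describes.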
bojanserafimov	4c047f858f	Remove InMemoryWheel and dead code (#879 )	2024-01-11 10:11:07 -05:00
bojanserafimov	10227a74f8	Unzip while downloading (#856 )	2024-01-11 09:41:46 -05:00
Charlie Marsh	e26dc8e33d	Add support for `prepare_metadata_for_build_wheel` (#842 ) ## Summary This PR adds support for `prepare_metadata_for_build_wheel`, which allows us to determine source distribution metadata without building the source distribution. This represents an optimization for the resolver, as we can skip the expensive build phase for build backends that support it. For reference, `prepare_metadata_for_build_wheel` seems to be supported by: - `hatchling` (as of [1.9.0](https://hatch.pypa.io/latest/history/hatchling/#hatchling-v1.9.0)). - `flit` - `setuptools` In fact, it seems to work for every backend _except_ those using legacy `setup.py`. Closes #599.	2024-01-10 00:07:37 +00:00
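Since `prepare_metadata_for_build_wheel` is an optional PEP 517 hook, a frontend has to probe for it and fall back to a full build when it is missing. The Python sketch below shows that probe under simplified assumptions: `prepare_metadata`, `ToyBackend` and `LegacyBackend` are hypothetical names made up for this illustration, and the backends are in-process objects rather than the subprocess calls a real frontend would use.

```python
import os
import tempfile


def prepare_metadata(backend, out_dir):
    """Return the path of a `.dist-info` directory, or None if a build is needed."""
    hook = getattr(backend, "prepare_metadata_for_build_wheel", None)
    if hook is None:
        # Hook not implemented: the caller must run `build_wheel` and read
        # the metadata out of the resulting wheel instead.
        return None
    # Per PEP 517, the hook returns the basename of the directory it created.
    dist_info = hook(out_dir)
    return os.path.join(out_dir, dist_info)


class ToyBackend:
    """Hypothetical backend that implements the optional hook."""

    def prepare_metadata_for_build_wheel(self, metadata_directory, config_settings=None):
        name = "foo-1.0.0.dist-info"
        os.makedirs(os.path.join(metadata_directory, name))
        with open(os.path.join(metadata_directory, name, "METADATA"), "w") as f:
            f.write("Metadata-Version: 2.1\nName: foo\nVersion: 1.0.0\n")
        return name


class LegacyBackend:
    """Hypothetical backend without the optional hook."""
```

When the hook exists, the resolver can read `METADATA` from the returned directory and skip the expensive wheel build entirely.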
konsti	5b0b072e3c	Allow files >4GB on 32-bit platforms (#847 ) Changes `File::size` from a `usize` to a `u64`. The motivations are that, with tensorflow wheels being 475 MB (https://pypi.org/project/tensorflow/2.15.0.post1/#files), we're already only one order of magnitude away from the limit of a 32-bit `usize`, and that this avoids target-dependent failures.	2024-01-09 17:31:49 +01:00
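The "one order of magnitude" remark can be checked with a few lines of arithmetic. The sketch below uses Python integers to stand in for Rust's `usize` on a 32-bit target and for `u64`; the helper `fits` is a hypothetical name used only here, and the 475 MB figure is the one quoted in the commit message.

```python
# Largest value representable in 32 bits (Rust's `usize` on 32-bit targets).
USIZE_MAX_32 = 2**32 - 1
# Largest value representable in a `u64`.
U64_MAX = 2**64 - 1

# 475 MB, the tensorflow wheel size quoted in the commit message.
TENSORFLOW_WHEEL = 475 * 10**6


def fits(size: int, max_value: int) -> bool:
    """Return True if `size` bytes can be stored in an integer with this maximum."""
    return 0 <= size <= max_value


# How many tensorflow-sized wheels fit below the 32-bit limit.
headroom = USIZE_MAX_32 / TENSORFLOW_WHEEL
```

A file ten times the size of that wheel no longer fits in 32 bits but fits easily in a `u64`, which is why the field type is independent of the target after this change.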
konsti	26f597a787	Add spans to all significant tasks (#740 ) I've tried to investigate puffin's performance with respect to builds and parallelism in general, but found the previous instrumentation too granular. I've tried to add spans to every function that needs noticeable I/O or CPU resources, without creating duplication. This also fixes some wrong tracing usage on async functions (https://docs.rs/tracing/latest/tracing/struct.Span.html#in-asynchronous-code) and some spans that weren't actually entered.	2024-01-02 16:17:03 +00:00
Charlie Marsh	007f52bb4e	Add support for relative URLs in simple metadata responses (#721 ) ## Summary This PR adds support for relative URLs in the simple JSON responses. We already support relative URLs for HTML responses, but the handling has been consolidated between the two. Similar to index URLs, we now store the base alongside the metadata, and use the base when resolving the URL. Closes #455. ## Test Plan `cargo test` (to test HTML indexes). Separately, I also ran `cargo run -p puffin-cli -- pip-compile requirements.in -n --index-url=http://localhost:3141/packages/pypi/+simple` on the `zb/relative` branch with `packse` running, and forced both HTML and JSON by limiting the `accept` header.	2023-12-27 08:53:21 -05:00
Charlie Marsh	bbe0246205	Change internal representation of `CacheEntry` to avoid allocations (#730 ) Removes a TODO.	2023-12-26 02:10:30 +00:00
Charlie Marsh	6ff21374dc	Split `puffin-cache` into Puffin-specific and generic utilities (#728 ) This crate started off as generic caching utilities, but we started adding a lot of Puffin-specific stuff (like the cache buckets abstraction that knows about Git vs. direct URL vs. indexes and so on). This PR moves the generic stuff into a new `cache-key` crate.	2023-12-25 14:38:56 +00:00
Charlie Marsh	ad34bb02a9	Modify some inconsistent exports (#724 )	2023-12-24 22:30:03 +00:00
Charlie Marsh	3660d8a08e	Introduce separate traits for ahead-of-time and installed metadata (#692 ) This is a pure refactor to follow-up #690, to separate the metadata that we know upfront about distributions (like the version, for registry-based distributions) vs. the metadata that requires building (like the version, for URL-based distributions).	2023-12-18 22:37:45 +00:00
konsti	f059c6e6a6	Support editable in pip-sync and pip-compile (#587 ) Support `-e path/to/dir` in pip-sync and pip-compile.	2023-12-16 22:37:34 +00:00
Charlie Marsh	ed8dfbfcf7	Preserve verbatim URLs (#639 ) ## Summary This PR adds a `VerbatimUrl` struct to preserve verbatim URLs throughout the resolution and installation pipeline. In short, alongside the parsed `Url`, we also keep the URL as written by the user. This enables us to display the URL exactly as written by the user, rather than the serialized path that we use internally. This will be especially useful once we start expanding environment variables since, at that point, we'll be able to write the version of the URL that includes the _unexpanded_ environment variable to the output file.	2023-12-14 15:03:39 +00:00
Charlie Marsh	1181288078	Download, build, and install in a single pipeline phase (#605 ) ## Summary At present, we have two separate phases within the installation pipeline related to populating wheels into the cache. The first phase downloads the distribution, and then builds any source distributions into wheels; the second phase unzips all the built wheels into the cache. This PR merges those two phases into one, such that we seamlessly download, build, and unzip wheels in one pass. This is more efficient, since we can start unzipping while we build. It also ensures that if the install _fails_ partway through, we don't end up with a bunch of downloaded wheels that we never had a chance to unzip. The code is also much simpler. The main downside is that the user-facing feedback isn't as granular, since we only have one phase and one progress bar for what was originally three distinct phases. Closes https://github.com/astral-sh/puffin/issues/571. ## Test Plan I ran the benchmark script on two separate requirements files, and saw a 7% and 31% speedup respectively: ```text + TARGET=./scripts/benchmarks/requirements.txt + hyperfine --runs 100 --warmup 10 --prepare 'virtualenv --clear .venv' './target/release/main pip-sync ./scripts/benchmarks/requirements.txt --no-cache' --prepare 'virtualenv --clear .venv' './target/release/puffin pip-sync ./scripts/benchmarks/requirements.txt --no-cache' Benchmark 1: ./target/release/main pip-sync ./scripts/benchmarks/requirements.txt --no-cache Time (mean ± σ): 269.4 ms ± 33.0 ms [User: 42.4 ms, System: 117.5 ms] Range (min … max): 221.7 ms … 446.7 ms 100 runs Benchmark 2: ./target/release/puffin pip-sync ./scripts/benchmarks/requirements.txt --no-cache Time (mean ± σ): 250.6 ms ± 28.3 ms [User: 41.5 ms, System: 127.4 ms] Range (min … max): 207.6 ms … 336.4 ms 100 runs Summary './target/release/puffin pip-sync ./scripts/benchmarks/requirements.txt --no-cache' ran 1.07 ± 0.18 times faster than './target/release/main pip-sync 
./scripts/benchmarks/requirements.txt --no-cache' ``` ```text + TARGET=./scripts/benchmarks/requirements-large.txt + hyperfine --runs 100 --warmup 10 --prepare 'virtualenv --clear .venv' './target/release/main pip-sync ./scripts/benchmarks/requirements-large.txt --no-cache' --prepare 'virtualenv --clear .venv' './target/release/puffin pip-sync ./scripts/benchmarks/requirements-large.txt --no-cache' Benchmark 1: ./target/release/main pip-sync ./scripts/benchmarks/requirements-large.txt --no-cache Time (mean ± σ): 5.053 s ± 0.354 s [User: 1.413 s, System: 6.710 s] Range (min … max): 4.584 s … 6.333 s 100 runs Benchmark 2: ./target/release/puffin pip-sync ./scripts/benchmarks/requirements-large.txt --no-cache Time (mean ± σ): 3.845 s ± 0.225 s [User: 1.364 s, System: 6.970 s] Range (min … max): 3.482 s … 4.715 s 100 runs Summary './target/release/puffin pip-sync ./scripts/benchmarks/requirements-large.txt --no-cache' ran ```	2023-12-11 15:42:29 +00:00
Charlie Marsh	00f1703111	Avoid storing partial wheels in the cache (#604 ) Closes https://github.com/astral-sh/puffin/issues/603.	2023-12-09 19:11:30 -05:00
Charlie Marsh	f1c05dcd66	Buffer streamed file writes (#602 )	2023-12-09 16:20:31 +00:00
Charlie Marsh	0499fe0613	Fix incorrect unknown size marker in traces (#600 ) It said `(unknown size)` for _all_ disk-based wheels.	2023-12-09 04:46:01 +00:00
Charlie Marsh	a825b2db06	Shard the registry cache by package (#583 ) ## Summary This PR modifies the cache structure in a few ways. Most notably, we now shard the set of registry wheels by package, and index them lazily when computing the install plan. This applies both to built wheels: <img width="989" alt="Screen Shot 2023-12-06 at 4 42 19 PM" src="https://github.com/astral-sh/puffin/assets/1309177/0e8a306f-befd-4be9-a63e-2303389837bb"> And remote wheels: <img width="836" alt="Screen Shot 2023-12-06 at 4 42 30 PM" src="https://github.com/astral-sh/puffin/assets/1309177/7fd908cd-dd86-475e-9779-07ed067b4a1a"> For other distributions, we now consistently cache using the package name, which is really just for clarity and debuggability (we could consider omitting these): <img width="955" alt="Screen Shot 2023-12-06 at 4 58 30 PM" src="https://github.com/astral-sh/puffin/assets/1309177/3e8d0f99-df45-429a-9175-d57b54a72e56"> Obliquely closes https://github.com/astral-sh/puffin/issues/575.	2023-12-07 05:02:46 +00:00
Charlie Marsh	5370484307	Remove `.whl` extension for cached, unzipped wheels (#574 ) ## Summary This PR uses the wheel stem (e.g., `foo-1.2.3-py3-none-any`) instead of the wheel name (e.g., `foo-1.2.3-py3-none-any.whl`) when storing unzipped wheels in the cache, which removes a class of confusing issues around overwrites and directory-vs.-file collisions. For now, we retain _both_ the zipped and unzipped wheels in the cache, though we can easily change this by storing the zipped wheels in a temporary directory. Closes https://github.com/astral-sh/puffin/issues/573. ## Test Plan Some examples from my local cache: <img width="835" alt="Screen Shot 2023-12-05 at 4 09 55 PM" src="https://github.com/astral-sh/puffin/assets/1309177/784146aa-b080-416e-9767-40c843fe5d6a"> <img width="847" alt="Screen Shot 2023-12-05 at 4 12 14 PM" src="https://github.com/astral-sh/puffin/assets/1309177/4bc7f30f-bef3-47f1-b4e8-da9cabf87f28"> <img width="637" alt="Screen Shot 2023-12-05 at 4 09 50 PM" src="https://github.com/astral-sh/puffin/assets/1309177/25ca4944-4a06-4a08-ac85-c6f7d8b5c8ea">	2023-12-05 22:41:22 +00:00
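The difference between the wheel name and the wheel stem is small but is what lets the zipped file and the unzipped directory sit side by side in the cache. A minimal Python sketch, where `unzipped_cache_name` is a hypothetical helper written for this note rather than a function from the codebase:

```python
def unzipped_cache_name(wheel_filename: str) -> str:
    """Return the directory name for an unzipped wheel: the filename minus `.whl`."""
    suffix = ".whl"
    if not wheel_filename.endswith(suffix):
        raise ValueError(f"not a wheel filename: {wheel_filename!r}")
    # Strip only the extension; versions may contain dots and `+` characters.
    return wheel_filename[: -len(suffix)]
```

With distinct names, a zipped `foo-1.2.3-py3-none-any.whl` file and an unzipped `foo-1.2.3-py3-none-any` directory can no longer collide on the same path.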
Charlie Marsh	a15da36d74	Avoid removing local wheels when unzipping (#560 ) ## Summary When installing a local wheel, we need to avoid removing the zipped wheel (since it lives outside of the cache), _and_ need to ensure that we unzip the wheel into the cache (rather than replacing the zipped wheel, which may even live outside of the project). Closes https://github.com/astral-sh/puffin/issues/553.	2023-12-05 17:50:08 +00:00
Charlie Marsh	6f055ecf3b	Remove existing built wheels when building source distributions (#559 ) This PR modifies the source distribution building to replace any existing targets after building the new wheel. In some cases, the existence of an existing target may be indicative of a bug, so we warn. It's partially a workaround for some (but not all) of the errors in https://github.com/astral-sh/puffin/issues/554.	2023-12-05 12:45:24 -05:00
Charlie Marsh	f99e3560e8	Avoid returning zipped wheels from registry and URL indexes (#558 ) ## Summary This is hard to reproduce, but if you run a long installation process that errors part-way through, you can end up with zipped wheels in the `Wheels` cache, which is intended to contain only unzipped wheels. This PR avoids returning those entries from the registry, since they would otherwise lead to errors downstream when we treat them as directories.	2023-12-05 09:53:45 +01:00
konsti	9806901a16	Consolidate wheel caches (#524 ) After this change, two wheel caches remain: `built-wheels-v0` and `wheels-v0`, docs screenshots below. Each contains the wheel metadata, the cache policy and the zipped or unzipped wheels under the same name. The zipped/unzipped strategy is as follows: In `pip-compile`, when we build a wheel, we store it zipped. When `pip-sync` or a source dist build in `pip-compile` needs to install the wheel, we unzip it, remove the file and replace it with the unzipped wheel. This removes `WheelCache` and `UrlIndex` in favor of `Cache` plus `WheelCache`. The non-built wheel cache now considers index urls and the url for url wheels. I'm unsure if we need the `Unzipper` type, this could just be a function. I moved `no_index` into `IndexUrls` and started using `IndexUrl` up to the clap level. I left a number of TODOs in the code, namely performing the actual invalidation of unzipped wheels and making the `InstallPlan` understand cache invalidation (i.e. uninstall wheels when their remote changed). ![image](https://github.com/astral-sh/puffin/assets/6826232/c4d45979-485b-4954-848d-fd3347ee2510)	2023-12-01 20:16:33 +00:00
konsti	4551994b7d	Clear built wheels when remote changed (#519 ) Remove built wheels alongside their metadata when their index source dist or url source dist changed. For git source dists, we currently don't clear the previous build but use a new directory (not sure what's right here - are there any generic cache GC approaches out there? I've seen that e.g. spotify keeps its cache at 10GB max, but I also haven't seen any reusable, well tested approaches for this). Path distributions are unchanged (#478). I like the structure of metadata alongside the wheel for cache invalidation, I'll try to do that for `wheels-v0`/`wheel-metadata-v0` too. (The unzipped wheels afaik currently lack cache invalidation when the remote changed.) This should give us roughly the same structure for wheels and built wheels and a very similar pattern of invalidation.	2023-12-01 14:56:47 -05:00
konsti	5435d44756	Introduce `Cache`, `CacheBucket` and `CacheEntry` (#507 ) This is mostly a mechanical refactor that moves 80% of our code to the same cache abstraction. It introduces `Cache`, which abstracts away the path of the cache and the temp dir drop and is passed throughout the codebase. To get a specific cache bucket, you need to request your `CacheBucket` from `Cache`. `CacheBucket` centralizes the names of all cache buckets, moving them away from the string constants spread throughout the crates. Specifically for working with the `CachedClient`, there is a `CacheEntry`. I'm not sure yet if that is a strict improvement over `cache_dir: PathBuf, cache_file: String`, I may have to rotate that later. The interpreter cache moved into `interpreter-v0`. We can use the `CacheBucket` page to document the cache structure in each bucket: ![image](https://github.com/astral-sh/puffin/assets/6826232/b023fdfb-e34d-4c2d-8663-b5f73937a539)	2023-11-28 17:11:14 +00:00
Charlie Marsh	3eb0a43995	Perform a single Git fetch when building source distributions (#499 ) ## Summary We need to pass in the distribution with the "precise" URL to avoid refetching. ## Test Plan Ran `cargo run -p puffin-cli -- pip-compile requirements.in --verbose` with `flask @ git+https://github.com/pallets/flask.git` and verified that we only checked out Flask once.	2023-11-25 23:29:41 +00:00
konsti	d54e780843	Source dist metadata refactor (#468 ) ## Summary and motivation For a given source dist, we store the metadata of each wheel built through it in `built-wheel-metadata-v0/pypi/<source dist filename>/metadata.json`. During resolution, we check the cache status of the source dist. If it is fresh, we check `metadata.json` for a matching wheel. If there is one we use that metadata, if there isn't, we build one. If the source is stale, we build a wheel and override `metadata.json` with that single wheel. This PR thereby ties the local built wheel metadata cache to the freshness of the remote source dist. This functionality is available through `SourceDistCachedBuilder`. `puffin_installer::Builder`, `puffin_installer::Downloader` and `Fetcher` are removed; instead there is now `FetchAndBuild`, which calls into the also new `SourceDistCachedBuilder`. `FetchAndBuild` is the new main high-level abstraction: It spawns parallel fetching/building, for wheel metadata it calls into the registry client, for wheel files it fetches them, for source dists it calls `SourceDistCachedBuilder`. It handles locks around builds and, newly added, inter-process file locking for git operations. Fetching and building source distributions now happens in parallel in `pip-sync`, i.e. we don't have to wait for the largest wheel to be downloaded to start building source distributions. In a follow-up PR, I'll also clear built wheels when they've become stale. Another effect is that in a fully cached resolution, we need neither zip reading nor email parsing. 
Closes #473 ## Source dist cache structure Entries by supported sources: * `<build wheel metadata cache>/pypi/foo-1.0.0.zip/metadata.json` * `<build wheel metadata cache>/<sha256(index-url)>/foo-1.0.0.zip/metadata.json` * `<build wheel metadata cache>/url/<sha256(url)>/foo-1.0.0.zip/metadata.json` But the url filename does not need to be a valid source dist filename (<https://github.com/search?q=path%3A*%2Frequirements.txt+master.zip&type=code>), so it could also be the following and we have to take any string as filename: `<build wheel metadata cache>/url/<sha256(url)>/master.zip/metadata.json` Example: ```text # git source dist pydantic-extra-types @ git+https://github.com/pydantic/pydantic-extra-types.git # pypi source dist django_allauth==0.51.0 # url source dist werkzeug @ `ff1904eb5e`2853bf83db817a7dd53d/werkzeug-3.0.1.tar.gz ``` will be stored as ```text built-wheel-metadata-v0 ├── git │ └── 5c56bc1c58c34c11 │ └── 843b753e9e8cb74e83cac55598719b39a4d5ef1f │ └── metadata.json ├── pypi │ └── django-allauth-0.51.0.tar.gz │ └── metadata.json └── url └── 6781bd6440ae72c2 └── werkzeug-3.0.1.tar.gz └── metadata.json ``` The inside of a `metadata.json`: ```json { "data": { "django_allauth-0.51.0-py3-none-any.whl": { "metadata-version": "2.1", "name": "django-allauth", "version": "0.51.0", ... } } } ```	2023-11-24 17:47:58 +00:00
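The fresh/stale decision described in the summary can be written down as a small Python function. This is a sketch of the stated rules only: `resolve_metadata`, `is_compatible` and `build` are hypothetical names, the `cached` dictionary mirrors the `"data"` object of `metadata.json`, and keeping the other entries when a fresh source needs a new build is an assumption, since the commit message does not say what happens to them.

```python
from typing import Callable, Dict, Tuple

Metadata = Dict[str, str]
# Maps wheel filename to its metadata, like the "data" object in metadata.json.
WheelMetadata = Dict[str, Metadata]


def resolve_metadata(
    cached: WheelMetadata,
    source_is_fresh: bool,
    is_compatible: Callable[[str], bool],
    build: Callable[[], Tuple[str, Metadata]],
) -> Tuple[Metadata, WheelMetadata]:
    """Return the metadata to use and the new contents for metadata.json."""
    if source_is_fresh:
        # Fresh source dist: reuse a matching wheel's metadata if one exists.
        for filename, metadata in cached.items():
            if is_compatible(filename):
                return metadata, cached
        # No match: build, and (assumption) keep the existing entries.
        filename, metadata = build()
        return metadata, {**cached, filename: metadata}
    # Stale source dist: build and override the file with that single wheel.
    filename, metadata = build()
    return metadata, {filename: metadata}
```

Only the fresh-and-matching branch avoids a build, which is the case where a fully cached resolution needs neither zip reading nor email parsing.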

35 Commits