Python/uv - uv

Commit Graph

Author	SHA1	Message	Date
konsti	e23292641f	Add pypi 10k packages with most dependents dataset (#711) From manual inspection, this dataset, generated through the [libraries.io API](https://libraries.io/api#project-search), seems more mainstream than the current 8k one, which is also preserved. I've added the dataset to the repo because the API requires an API key.	2023-12-24 18:31:52 +00:00
Charlie Marsh	5bce699ee1	Add support for HTML indexes (#719 ) ## Summary This PR adds support for HTML index responses (as with `--index-url=https://download.pytorch.org/whl`). Closes https://github.com/astral-sh/puffin/issues/412.	2023-12-24 16:04:00 +00:00
konsti	e60f0ec732	Update pubgrub (#713) Easier than I expected: we simply never construct the pubgrub error variants since we have our own main loop. The `unreachable!()`s can be removed when the never type is stabilized.	2023-12-20 23:56:59 +01:00
Charlie Marsh	98fcb76015	Lock entire virtualenv during modifying commands (#695 ) These commands all assume that the `site-packages` are constant throughout. Closes #691.	2023-12-18 16:44:45 -05:00
konsti	89ca0d68b9	`exclude_newer` in puffin-dev resolve-cli (#684 ) Internal dev tool change.	2023-12-18 14:06:54 +00:00
konsti	f059c6e6a6	Support editable in pip-sync and pip-compile (#587) Support `-e path/to/dir` in pip-sync and pip-compile.	2023-12-16 22:37:34 +00:00
konsti	71964ec7a8	Switch to msgpack in the cached client (#662 ) This gives a 1.23 speedup on transformers-extras. We could change to msgpack for the entire cache if we want. I only tried this format and postcard so far, where postcard was much slower (like 1.6s). I don't actually want to merge it like this, i wanted to figure out the ballpark of improvement for switching away from json. ``` hyperfine --warmup 3 --runs 10 "target/profiling/puffin pip-compile --cache-dir cache-msgpack scripts/requirements/transformers-extras.in" "target/profiling/branch pip-compile scripts/requirements/transformers-extras.in" Benchmark 1: target/profiling/puffin pip-compile --cache-dir cache-msgpack scripts/requirements/transformers-extras.in Time (mean ± σ): 179.1 ms ± 4.8 ms [User: 157.5 ms, System: 48.1 ms] Range (min … max): 174.9 ms … 188.1 ms 10 runs Benchmark 2: target/profiling/branch pip-compile scripts/requirements/transformers-extras.in Time (mean ± σ): 221.1 ms ± 6.7 ms [User: 208.1 ms, System: 46.5 ms] Range (min … max): 213.5 ms … 235.5 ms 10 runs Summary target/profiling/puffin pip-compile --cache-dir cache-msgpack scripts/requirements/transformers-extras.in ran 1.23 ± 0.05 times faster than target/profiling/branch pip-compile scripts/requirements/transformers-extras.in ``` Disadvantage: We can't manually look into the cache anymore to debug things - [ ] Check more formats, i currently only tested json, msgpack and postcard, there should be other formats, too - [x] Switch over `CachedByTimestamp` serialization (for the interpreter caching) - [x] Switch over error handling and make sure puffin is still resilient to cache failure	2023-12-16 21:01:35 +00:00
konsti	620f73b38b	Speed up version parsing for a 1.27±0.03 speedup in transformers-extras with conservative changes (#660 ) Two low-hanging fruits as optimizations for version parsing: A fast path for release only versions and removing the regex from version specifiers (still calling into version's parsing regex if required). This enables optimizing the serde format since we now see the serde part instead of only PEP 440 parsing. I intentionally didn't rewrite the full PEP 440 at this step. ```console $ hyperfine --warmup 5 --runs 50 "target/profiling/puffin pip-compile scripts/requirements/transformers-extras.in" "target/profiling/main pip-compile scripts/requirements/transformers-extras.in" Benchmark 1: target/profiling/puffin pip-compile scripts/requirements/transformers-extras.in Time (mean ± σ): 217.1 ms ± 3.2 ms [User: 194.0 ms, System: 55.1 ms] Range (min … max): 211.0 ms … 228.1 ms 50 runs Benchmark 2: target/profiling/main pip-compile scripts/requirements/transformers-extras.in Time (mean ± σ): 276.7 ms ± 5.7 ms [User: 252.4 ms, System: 54.6 ms] Range (min … max): 268.9 ms … 303.5 ms 50 runs Summary target/profiling/puffin pip-compile scripts/requirements/transformers-extras.in ran 1.27 ± 0.03 times faster than target/profiling/main pip-compile scripts/requirements/transformers-extras.in ``` --------- Co-authored-by: Andrew Gallant <andrew@astral.sh>	2023-12-15 14:03:35 -05:00
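The release-only fast path from #660 can be illustrated with a rough Python sketch. This is not the Rust code from the commit: `parse_release_fast`, `parse_release`, and the simplified fallback regex are hypothetical names and a deliberately incomplete stand-in for a full PEP 440 parser.

```python
import re

# Hypothetical, simplified slow path: captures the release segment and ignores
# the rest. A real PEP 440 parser handles far more (epochs, pre/post/dev, local).
_SLOW_PATH = re.compile(r"^\s*v?(?:\d+!)?(\d+(?:\.\d+)*)(.*)$")


def parse_release_fast(version: str):
    """Return the release tuple for plain versions like "1.26.2", else None."""
    parts = version.split(".")
    # Only ASCII digit groups separated by dots qualify for the fast path.
    if all(part.isascii() and part.isdigit() for part in parts):
        return tuple(int(part) for part in parts)
    return None


def parse_release(version: str):
    """Try the regex-free fast path first, then fall back to the regex."""
    fast = parse_release_fast(version)
    if fast is not None:
        return fast
    match = _SLOW_PATH.match(version)
    if match is None:
        raise ValueError(f"not a version: {version!r}")
    return tuple(int(part) for part in match.group(1).split("."))
```

Most pinned versions in a requirements file are release-only, which is why a cheap check before the regex can pay off.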
Charlie Marsh	9470c20e7a	Avoid double resolution during source builds (#656 ) ## Summary This PR ensures that we re-use the resolution to install the build dependencies when building a source distribution. Currently, we only pass along the list of requirements, and then use the `Finder` to map each requirement to a distribution. But we already determine the correct distribution when resolving! Closes https://github.com/astral-sh/puffin/issues/655.	2023-12-15 17:27:16 +00:00
Charlie Marsh	ed8dfbfcf7	Preserve verbatim URLs (#639) ## Summary This PR adds a `VerbatimUrl` struct to preserve verbatim URLs throughout the resolution and installation pipeline. In short, alongside the parsed `Url`, we also keep the URL as written by the user. This enables us to display the URL exactly as written by the user, rather than the serialized path that we use internally. This will be especially useful once we start expanding environment variables since, at that point, we'll be able to write the version of the URL that includes the _unexpanded_ environment variable to the output file.	2023-12-14 15:03:39 +00:00
Charlie Marsh	db7e2dedbb	Move archive extraction into its own crate (#647 ) We have some shared utilities beyond `puffin-build` and `puffin-distribution`, and further, I want to be able to access the sdist archive extraction logic from `puffin-distribution`. This is really generic, so moving into its own crate.	2023-12-14 04:49:09 +00:00
Charlie Marsh	920e10fc8f	Use `FxHash` consistently (#632 )	2023-12-13 05:36:03 +00:00
Charlie Marsh	a24eb57e93	Make warnings user-facing (#628 ) ## Summary Now, `puffin_warnings::warn_once` and `puffin_warnings::warn` will go to `stderr`, as long as the user isn't running under `--quiet`. Previously, these went through `tracing`, and so were only visible when running under `--verbose`.	2023-12-12 21:24:38 -05:00
Zanie Blue	490fb55ac5	Use available versions to simplify unsat error reports (#547 ) Uses https://github.com/pubgrub-rs/pubgrub/pull/156 to consolidate version ranges in error reports using the actual available versions for each package. Alternative to https://github.com/zanieb/pubgrub/pull/8 which implements this behavior as a method in the `Reporter` — here it's implemented in our custom report formatter (#521) instead which requires no upstream changes. Requires https://github.com/zanieb/pubgrub/pull/11 to only retrieve the versions for packages that will be used in the report. This is a work in progress. Some things to do: - ~We may want to allow lazy retrieval of the version maps from the formatter~ - [x] We should probably create a separate error type for no solution instead of mixing them with other resolve errors - ~We can probably do something smarter than creating vectors to hold the versions~ - [x] This degrades error messages when a single version is not available, we'll need to special case that - [x] It seems safer to coerce the error type in `resolve` instead of `solve` if feasible	2023-12-12 23:25:16 +00:00
Charlie Marsh	1181288078	Download, build, and install in a single pipeline phase (#605 ) ## Summary At present, we have two separate phases within the installation pipeline related to populating wheels into the cache. The first phase downloads the distribution, and then builds any source distributions into wheels; the second phase unzips all the built wheels into the cache. This PR merges those two phases into one, such that we seamlessly download, build, and unzip wheels in one pass. This is more efficient, since we can start unzipping while we build. It also ensures that if the install _fails_ partway through, we don't end up with a bunch of downloaded wheels that we never had a chance to unzip. The code is also much simpler. The main downside is that the user-facing feedback isn't as granular, since we only have one phase and one progress bar for what was originally three distinct phases. Closes https://github.com/astral-sh/puffin/issues/571. ## Test Plan I ran the benchmark script on two separate requirements files, and saw a 7% and 31% speedup respectively: ```text + TARGET=./scripts/benchmarks/requirements.txt + hyperfine --runs 100 --warmup 10 --prepare 'virtualenv --clear .venv' './target/release/main pip-sync ./scripts/benchmarks/requirements.txt --no-cache' --prepare 'virtualenv --clear .venv' './target/release/puffin pip-sync ./scripts/benchmarks/requirements.txt --no-cache' Benchmark 1: ./target/release/main pip-sync ./scripts/benchmarks/requirements.txt --no-cache Time (mean ± σ): 269.4 ms ± 33.0 ms [User: 42.4 ms, System: 117.5 ms] Range (min … max): 221.7 ms … 446.7 ms 100 runs Benchmark 2: ./target/release/puffin pip-sync ./scripts/benchmarks/requirements.txt --no-cache Time (mean ± σ): 250.6 ms ± 28.3 ms [User: 41.5 ms, System: 127.4 ms] Range (min … max): 207.6 ms … 336.4 ms 100 runs Summary './target/release/puffin pip-sync ./scripts/benchmarks/requirements.txt --no-cache' ran 1.07 ± 0.18 times faster than './target/release/main pip-sync 
./scripts/benchmarks/requirements.txt --no-cache' ``` ```text + TARGET=./scripts/benchmarks/requirements-large.txt + hyperfine --runs 100 --warmup 10 --prepare 'virtualenv --clear .venv' './target/release/main pip-sync ./scripts/benchmarks/requirements-large.txt --no-cache' --prepare 'virtualenv --clear .venv' './target/release/puffin pip-sync ./scripts/benchmarks/requirements-large.txt --no-cache' Benchmark 1: ./target/release/main pip-sync ./scripts/benchmarks/requirements-large.txt --no-cache Time (mean ± σ): 5.053 s ± 0.354 s [User: 1.413 s, System: 6.710 s] Range (min … max): 4.584 s … 6.333 s 100 runs Benchmark 2: ./target/release/puffin pip-sync ./scripts/benchmarks/requirements-large.txt --no-cache Time (mean ± σ): 3.845 s ± 0.225 s [User: 1.364 s, System: 6.970 s] Range (min … max): 3.482 s … 4.715 s 100 runs Summary './target/release/puffin pip-sync ./scripts/benchmarks/requirements-large.txt --no-cache' ran ```	2023-12-11 15:42:29 +00:00
Charlie Marsh	32f54a5947	Use async `Command` for wheel build operations (#601 ) Incredibly, this speeds up the install on a large project from 2m6s to 50s.	2023-12-09 16:20:52 +00:00
Charlie Marsh	a24534b0ce	Use `rustc-hash` instead of `fxhash` crate (#594 ) `fxhash` is the old, less maintained version of this crate (`rustc-hash`). We use the latter in Ruff.	2023-12-08 20:27:49 +00:00
konsti	6005d7a552	Keep track of in flight unzips using `OnceMap` (#544 ) I saw warnings when we were e.g. unzipping wheel and setuptools in two tasks at the same time. We now keep track of in flight unzips. This introduces a `OnceMap` abstraction which we also use in the resolver.	2023-12-08 20:18:11 +00:00
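The idea of tracking in-flight work per key, as described for `OnceMap` in #544, can be sketched in Python with `asyncio`. This is only an illustration of the pattern, not the Rust abstraction itself; the `get_or_compute` method name and the demo coroutine are assumptions.

```python
import asyncio


class OnceMap:
    """Run at most one computation per key; later callers await the first one."""

    def __init__(self):
        self._tasks = {}

    async def get_or_compute(self, key, compute):
        task = self._tasks.get(key)
        if task is None:
            # First caller for this key: register the in-flight work before
            # awaiting, so concurrent callers find it instead of starting again.
            task = asyncio.ensure_future(compute())
            self._tasks[key] = task
        return await task


async def _demo():
    calls = []

    async def unzip():
        # Stand-in for unzipping a wheel; records how often it really runs.
        calls.append("setuptools")
        await asyncio.sleep(0)
        return "unzipped"

    once = OnceMap()
    results = await asyncio.gather(
        once.get_or_compute("setuptools", unzip),
        once.get_or_compute("setuptools", unzip),
    )
    return calls, results


calls, results = asyncio.run(_demo())
```

Both callers receive the result, but the unzip body runs only once.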
Charlie Marsh	4b8642c6f7	Enable selective cache purging in `puffin clean` (#589) ## Summary This PR enables `puffin clean` to accept package names as command line arguments, and selectively purge entries from the cache tied to the given package. Relates to #572. ## Test Plan Modified all the caching tests to run an additional step to (1) purge the cache, and (2) re-install the package.	2023-12-08 19:51:32 +00:00
Zanie Blue	ef7be9103c	Parse `SimpleJson` into categorized data in the client (#522 ) Extends #517 with a suggestion from @konstin to parse the `SimpleJson` into an intermediate type `SimpleMetadata(BTreeMap<Version, VersionFiles>)` before converting to a `VersionMap`. This reduces the number of times we need to parse the response. Additionally, we cache the parsed response now instead of `SimpleJson`. `VersionFiles` stores two vectors with `WheelFilename`/`SourceDistFilename` and `File` tuples. These can be iterated over together or separately. A new enum `DistFilename` was added to capture the `SourceDistFilename` and `WheelFilename` variants allowing iteration over both vectors.	2023-12-07 11:04:47 -06:00
Charlie Marsh	aa065f5c97	Modify install plan to support all distribution types (#581 ) This PR adds caching support for built wheels in the installer. Specifically, the `RegistryWheelIndex` now indexes both downloaded and built wheels (from registries), and we have a new `BuiltWheelIndex` that takes a subdirectory and returns the "best-matching" compatible wheel. Closes #570.	2023-12-07 04:43:34 +00:00
konsti	366c389385	Parse editable installs (#564 ) Parse `-e` for editable installs in `requirements.txt`. Unlike all the other requirements, editable installs don't have the name of the package specified.	2023-12-06 18:21:15 +01:00
konsti	3f4d7b7826	Improve path source dist caching (#578) Path distribution cache reading errors are no longer fatal. We now invalidate path file source dists if their modification timestamp changed, and invalidate path dir source dists if `pyproject.toml` or alternatively `setup.py` changed. These seem like good choices, since changing `pyproject.toml` should trigger a rebuild and the user can `touch` the file as part of their workflow. `CachedByTimestamp` is now a shared util. It doesn't have methods as I don't think it's worth it yet for two users. Closes #478 TODO(konstin): Write a test. This is probably twice as much work as the fix itself, so I made this PR without one for now.	2023-12-06 11:47:01 -05:00
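Timestamp-based invalidation of this kind can be sketched as follows. This is a Python illustration under assumptions: the field names and the `read_if_fresh` helper are hypothetical, and the real `CachedByTimestamp` is a Rust type whose layout is not shown in the commit message.

```python
import os
import tempfile
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class CachedByTimestamp:
    # Modification time of the source file when the entry was written.
    timestamp_ns: int
    data: Any


def read_if_fresh(entry: Optional[CachedByTimestamp], path: str):
    """Return the cached data only if the file's mtime is unchanged."""
    if entry is None:
        return None
    if os.stat(path).st_mtime_ns != entry.timestamp_ns:
        return None  # Stale: the caller should rebuild and rewrite the entry.
    return entry.data


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "pyproject.toml")
    with open(path, "w") as handle:
        handle.write("[project]\nname = 'demo'\n")
    entry = CachedByTimestamp(os.stat(path).st_mtime_ns, {"name": "demo"})
    fresh = read_if_fresh(entry, path)

    # Simulate `touch` by moving the modification time forward.
    stat = os.stat(path)
    os.utime(path, ns=(stat.st_atime_ns, stat.st_mtime_ns + 10_000_000_000))
    stale = read_if_fresh(entry, path)
```

Comparing for inequality rather than "newer than" also invalidates the entry when a file is replaced by an older copy.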
Charlie Marsh	a15da36d74	Avoid removing local wheels when unzipping (#560 ) ## Summary When installing a local wheel, we need to avoid removing the zipped wheel (since it lives outside of the cache), _and_ need to ensure that we unzip the wheel into the cache (rather than replacing the zipped wheel, which may even live outside of the project). Closes https://github.com/astral-sh/puffin/issues/553.	2023-12-05 17:50:08 +00:00
Charlie Marsh	6f055ecf3b	Remove existing built wheels when building source distributions (#559 ) This PR modifies the source distribution building to replace any existing targets after building the new wheel. In some cases, the existence of an existing target may be indicative of a bug, so we warn. It's partially a workaround for some (but not all) of the errors in https://github.com/astral-sh/puffin/issues/554.	2023-12-05 12:45:24 -05:00
Zanie Blue	37ca2e2928	Bump pubgrub for latest upstream (#525 ) https://github.com/pubgrub-rs/pubgrub/pull/157	2023-12-04 09:09:30 -06:00
konsti	6dc8ebcb90	Test interpreter cache invalidation (#540 ) Add missing test for #529/#508.	2023-12-04 10:03:43 +00:00
Charlie Marsh	ee2fca3a48	Add CACHEDIR and .gitignore tags to cache directories (#526 ) ## Summary Even if this will typically be in the user's application folder (rather than a local directory), it's still a good practice. Closes https://github.com/astral-sh/puffin/issues/280.	2023-12-02 00:37:51 +00:00
konsti	9806901a16	Consolidate wheel caches (#524 ) After this change, two wheel caches remain: `built-wheels-v0` and `wheels-v0`, docs screenshots below. Each contains both the wheel metadata, cache policy and zip or unzipped wheels under the same name. The zipped/unzipped strategy is as follows: In `pip-compile`, when we build a wheel, we store it zipped. When `pip-sync` or a source dist build in `pip-compile` need to install the wheel, we unzip it, remove the file and replace it with the unzipped wheel. This removes `WheelCache` and `UrlIndex` in favor of `Cache` plus `WheelCache`. The non-built wheel cache now considers index urls and the url for url wheels. I'm unsure if we need the `Unzipper` type, this could just be a function. I move `no_index` into `IndexUrls` and started using `IndexUrl` up to the clap level. I left a number of TODOs in the code, namely performing the actual invalidation of unzipped wheels and making the `InstallPlan` understand cache invalidation (i.e. uninstall wheels when their remote changed). ![image](https://github.com/astral-sh/puffin/assets/6826232/c4d45979-485b-4954-848d-fd3347ee2510)	2023-12-01 20:16:33 +00:00
Zanie Blue	2a8544df9e	Use a custom pubgrub report formatter (#521 ) Uses https://github.com/zanieb/pubgrub/pull/10 to drastically simplify our reporter implementation. This will allow us to make use of upstream improvements to the reporter e.g. https://github.com/zanieb/pubgrub/pull/8 without multiple duplicative pull requests.	2023-12-01 13:36:12 -06:00
Zanie Blue	efcc4f1409	Use upstream commit for reflink-copy requirement (#523 ) https://github.com/cargo-bins/reflink-copy/pull/51 was merged	2023-12-01 10:58:24 +00:00
Zanie Blue	5f1f207628	Recursively merge existing package directories on installation (#516) Previously, when installing a package we would delete the target directory before copying (or linking) the contents of the package. However, this means that we do not properly support namespace packages, which can share a target directory. Instead, the last package to be installed would override existing packages. Since we install packages in parallel, this could result in a race condition where the target directory already exists, which is not allowed when using `clonefile`. See example error in #515. `c7e63d2dce` provides a regression test for this; it fails on `main`. Here, we implement a recursive merge when the target directory already exists. Both packages will be installed into the same directory. We no longer delete the target directory, which seems okay since we uninstall packages before installing now. When files conflict, we will likely throw an error still. The correct behavior to implement in this case is unclear, as if we just take "first write wins" or "last write wins" we could end up with some files from one package and some from another, resulting in two broken packages. A possible solution here is to lock the target directories while copying.	2023-11-30 10:14:51 -06:00
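A recursive merge of this shape might look like the Python sketch below. It is an illustration only: `merge_tree` is a hypothetical helper, plain copying stands in for `clonefile` or hard links, and raising on a file conflict is one possible reading of "we will likely throw an error still".

```python
import os
import shutil
import tempfile


def merge_tree(src: str, dst: str) -> None:
    """Copy src into dst, merging directories that already exist."""
    os.makedirs(dst, exist_ok=True)
    for entry in os.scandir(src):
        target = os.path.join(dst, entry.name)
        if entry.is_dir(follow_symlinks=False):
            # Shared directories (e.g. namespace packages) are merged, not replaced.
            merge_tree(entry.path, target)
        elif os.path.exists(target):
            # Assumption: a conflicting file is an error rather than an overwrite.
            raise FileExistsError(target)
        else:
            shutil.copy2(entry.path, target)


with tempfile.TemporaryDirectory() as tmp:
    # Two packages that both contribute modules to the namespace package `ns`.
    for package, module in (("pkg_a", "a.py"), ("pkg_b", "b.py")):
        os.makedirs(os.path.join(tmp, package, "ns"))
        with open(os.path.join(tmp, package, "ns", module), "w") as handle:
            handle.write("")
    site_packages = os.path.join(tmp, "site-packages")
    merge_tree(os.path.join(tmp, "pkg_a"), site_packages)
    merge_tree(os.path.join(tmp, "pkg_b"), site_packages)
    merged = sorted(os.listdir(os.path.join(site_packages, "ns")))

    # Installing the same files a second time hits the conflict branch.
    try:
        merge_tree(os.path.join(tmp, "pkg_a"), site_packages)
        conflict = False
    except FileExistsError:
        conflict = True
```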
konsti	929df586fb	Skip tf-models-nightly in resolve-many dev script for now (#510) `tf-models-nightly` has pathological backtracking behaviour; skip it for now so we can benchmark the rest.	2023-11-28 18:25:32 +00:00
konsti	d89fbeb642	Migrate interpreter query to custom caching (#508 ) This removes the last usage of cacache by replacing it with a custom, flat json caching keyed by the digest of the executable path. ![image](https://github.com/astral-sh/puffin/assets/6826232/8f777c4c-1f1b-4656-ba7b-002175270556) A step towards #478. I've made `CachedByTimestamp<T>` generic over `T` but intentionally not moved it to `puffin-cache` yet.	2023-11-28 17:14:59 +00:00
konsti	5435d44756	Introduce `Cache`, `CacheBucket` and `CacheEntry` (#507) This is mostly a mechanical refactor that moves 80% of our code to the same cache abstraction. It introduces `Cache`, which abstracts away the path of the cache and the temp dir drop and is passed throughout the codebase. To get a specific cache bucket, you need to request your `CacheBucket` from `Cache`. `CacheBucket` centralizes the names of all cache buckets, moving them away from the string constants spread throughout the crates. Specifically for working with the `CachedClient`, there is a `CacheEntry`. I'm not sure yet if that is a strict improvement over `cache_dir: PathBuf, cache_file: String`; I may have to rotate that later. The interpreter cache moved into `interpreter-v0`. We can use the `CacheBucket` page to document the cache structure in each bucket: ![image](https://github.com/astral-sh/puffin/assets/6826232/b023fdfb-e34d-4c2d-8663-b5f73937a539)	2023-11-28 17:11:14 +00:00
konsti	8855f44b5f	Move simple index queries to `CachedClient` (#504) Replaces the usage of `http-cache-reqwest` for simple index queries with our custom cached client, removing `http-cache-reqwest` altogether. The new cache paths are `<cache>/simple-v0/<index>/<package_name>.json`. I could not test with a non-pypi index since I'm not aware of any other json indices (jax and torch are both html indices). In a future step, we can transform the response to be a `HashMap<Version, {source_dists: Vec<(SourceDistFilename, File)>, wheels: Vec<(WheelFilename, File)>}>` (independent of python version, this cache is used by all environments together). This should speed up cache deserialization a bit, since we don't need to try source dist and wheel anymore and drop incompatible dists, and it should make building the `VersionMap` simpler. We can speed this up even further by splitting into a version list and the info for each version. I'm mentioning this because deserialization was a major bottleneck in the rust part of the old python prototype. Fixes #481	2023-11-28 00:11:03 +00:00
konsti	d54e780843	Source dist metadata refactor (#468) ## Summary and motivation For a given source dist, we store the metadata of each wheel built through it in `built-wheel-metadata-v0/pypi/<source dist filename>/metadata.json`. During resolution, we check the cache status of the source dist. If it is fresh, we check `metadata.json` for a matching wheel. If there is one, we use that metadata; if there isn't, we build one. If the source is stale, we build a wheel and overwrite `metadata.json` with that single wheel. This PR thereby ties the local built wheel metadata cache to the freshness of the remote source dist. This functionality is available through `SourceDistCachedBuilder`. `puffin_installer::Builder`, `puffin_installer::Downloader` and `Fetcher` are removed; instead, there is now `FetchAndBuild`, which calls into the also new `SourceDistCachedBuilder`. `FetchAndBuild` is the new main high-level abstraction: It spawns parallel fetching/building; for wheel metadata it calls into the registry client, for wheel files it fetches them, and for source dists it calls `SourceDistCachedBuilder`. It handles locks around builds and, newly added, inter-process file locking for git operations. Fetching and building source distributions now happens in parallel in `pip-sync`, i.e. we don't have to wait for the largest wheel to be downloaded to start building source distributions. In a follow-up PR, I'll also clear built wheels when they've become stale. Another effect is that in a fully cached resolution, we need neither zip reading nor email parsing. 
Closes #473 ## Source dist cache structure Entries by supported sources: * `<build wheel metadata cache>/pypi/foo-1.0.0.zip/metadata.json` * `<build wheel metadata cache>/<sha256(index-url)>/foo-1.0.0.zip/metadata.json` * `<build wheel metadata cache>/url/<sha256(url)>/foo-1.0.0.zip/metadata.json` But the url filename does not need to be a valid source dist filename (<https://github.com/search?q=path%3A*%2Frequirements.txt+master.zip&type=code>), so it could also be the following and we have to take any string as filename: `<build wheel metadata cache>/url/<sha256(url)>/master.zip/metadata.json` Example: ```text # git source dist pydantic-extra-types @ git+https://github.com/pydantic/pydantic-extra-types.git # pypi source dist django_allauth==0.51.0 # url source dist werkzeug @ `ff1904eb5e`2853bf83db817a7dd53d/werkzeug-3.0.1.tar.gz ``` will be stored as ```text built-wheel-metadata-v0 ├── git │ └── 5c56bc1c58c34c11 │ └── 843b753e9e8cb74e83cac55598719b39a4d5ef1f │ └── metadata.json ├── pypi │ └── django-allauth-0.51.0.tar.gz │ └── metadata.json └── url └── 6781bd6440ae72c2 └── werkzeug-3.0.1.tar.gz └── metadata.json ``` The inside of a `metadata.json`: ```json { "data": { "django_allauth-0.51.0-py3-none-any.whl": { "metadata-version": "2.1", "name": "django-allauth", "version": "0.51.0", ... } } } ```	2023-11-24 17:47:58 +00:00
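The three cache layouts listed for #468 can be expressed as a small path-building function. The sketch below is in Python and rests on assumptions: `metadata_path` and `short_digest` are hypothetical names, and truncating SHA-256 to 16 hex characters only mirrors the shape of the directory names in the example tree, not a confirmed hashing scheme.

```python
import hashlib
from pathlib import PurePosixPath


def short_digest(value: str) -> str:
    # Assumption: SHA-256 truncated to 16 hex characters.
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:16]


def metadata_path(cache_root, source_kind, filename, origin=None):
    """Build the metadata.json path for a pypi, index, or url source dist."""
    root = PurePosixPath(cache_root)
    if source_kind == "pypi":
        bucket = root / "pypi"
    elif source_kind == "index":
        bucket = root / short_digest(origin)
    elif source_kind == "url":
        bucket = root / "url" / short_digest(origin)
    else:
        raise ValueError(f"unknown source kind: {source_kind}")
    # The filename is taken as an arbitrary string, e.g. "master.zip".
    return bucket / filename / "metadata.json"


pypi_path = metadata_path(
    "built-wheel-metadata-v0", "pypi", "django-allauth-0.51.0.tar.gz"
)
url_path = metadata_path(
    "built-wheel-metadata-v0",
    "url",
    "master.zip",
    origin="https://example.com/archive/master.zip",
)
```

The `origin` URL above is a placeholder for illustration; the git layout from the example tree is left out because its directory naming is not described in the message.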
konsti	8d247fe95b	Add `Tags::from_interpreter` (#498 ) Small refactoring	2023-11-24 11:36:01 +00:00
Charlie Marsh	17228ba04e	Add support for path dependencies (#471 ) ## Summary This PR adds support for local path dependencies. The approach mostly just falls out of our existing approach and infrastructure for Git and URL dependencies. Closes https://github.com/astral-sh/puffin/issues/436. (We'll open a separate issue for editable installs.) ## Test Plan Added `pip-compile` tests that pre-download a wheel or source distribution, then install it via local path.	2023-11-21 11:49:42 +00:00
Charlie Marsh	f1aa70d9d3	Refactor distribution types to return `Result` (#470 ) ## Summary A variety of small refactors to the distribution types crate to (1) return `Result` if we find an invalid wheel, rather than treating it as a source distribution with a `.whl` suffix, and (2) DRY up some repeated code around URLs.	2023-11-20 23:08:54 +00:00
konsti	f0841cdb6e	Wheel metadata refactor (#462 ) A consistent cache structure for remote wheel metadata: * `<wheel metadata cache>/pypi/foo-1.0.0-py3-none-any.json` * `<wheel metadata cache>/<digest(index-url)>/foo-1.0.0-py3-none-any.json` * `<wheel metadata cache>/url/<digest(url)>/foo-1.0.0-py3-none-any.json` The source dist caching will use a similar structure (#468).	2023-11-20 17:26:36 +01:00
konsti	d3e9e1783f	Refactor lenient parsing (#467 ) Deduplicate lenient parsing code between version specifiers and Requirement. Use `warn_once!` since the warnings did show up multiple times in my code. Fix the macro hygiene in `warn_once!`.	2023-11-20 15:35:38 +00:00
Zanie Blue	e9b6fb90d6	Bump pubgrub to get range display changes (#444 ) See https://github.com/zanieb/pubgrub/pull/5	2023-11-20 09:12:48 -06:00
Charlie Marsh	60f595b469	Prefer future stream over `JoinSet` in downloader (#469 ) This avoids introducing a static lifetime requirement and, in my benchmarks, is even a little faster.	2023-11-20 13:23:30 +00:00
Charlie Marsh	8decb29bad	Use a dedicated error type for `puffin-distribution` (#466 )	2023-11-20 11:38:27 +00:00
Charlie Marsh	35fd86631b	Unify distribution operations into a single crate (#460 ) ## Summary This PR unifies the behavior that lived in the resolver's `distribution` crates with the behaviors that were spread between the various structs in the installer crate into a single `Fetcher` struct that is intended to manage all interactions with distributions. Specifically, the interface of this struct is such that it can access distribution metadata, download distributions, return those downloads, etc., all with a common cache. Overall, this is mostly just DRYing up code that was repeated between the two crates, and putting it behind a reasonable shared interface.	2023-11-20 11:22:52 +00:00
konsti	46bb18f06e	Track file index (#452 ) Track the index (or at least its url) where we got a file from across the source code. Fixes #448	2023-11-20 08:48:16 +00:00
Charlie Marsh	6fd582f8b9	Rename `puffin-distribution` to `distribution-types` (#458 ) ## Summary This crate only contains types, and I want to introduce a new crate for all _operations_ on distributions, so this feels like a more natural name given we also have `pypi-types`.	2023-11-20 09:40:26 +01:00
konsti	255edf4445	Serde support for WheelFilename through str repr (#459 ) I need this later, splitting out for PR size	2023-11-19 19:43:14 +00:00
konsti	ab60233131	Use absolute cache paths (#453 ) Previously, git requirements would fail when setting `--cache-dir`: ```console $ cargo run --bin puffin -- pip-compile --cache-dir cache-all-kinds scripts/benchmarks/requirements/all-kinds.in error: Failed to build distribution from URL: git+https://github.com/pydantic/pydantic-extra-types.git Caused by: Invalid path URL: cache-all-kinds/git-v0/db/b49ffcfeb6c2e9d8 ``` The cause is using a relative and not an absolute path, which `Url` needs, the solution is to turn the cache dir into an absolute path. This never showed up in the tests since the tests use absolute temp dirs for everything.	2023-11-19 13:32:32 +00:00
konsti	bf71e7adcf	Add graphviz output to puffin-dev resolve-cli (#443 ) I added output in graphviz DOT format to `puffin-dev resolve-cli` to help with debugging resolutions. This requires tracking the requested ranges in the graph. I also fixed the direction of the graph. Output for `black`: ```dot digraph { 0 [ label="click\n8.1.7"] 1 [ label="black\n23.11.0"] 2 [ label="packaging\n23.2"] 3 [ label="mypy-extensions\n1.0.0"] 4 [ label="tomli\n2.0.1"] 5 [ label="pathspec\n0.11.2"] 6 [ label="typing-extensions\n4.8.0"] 7 [ label="platformdirs\n4.0.0"] 1 -> 0 [ label=">=8.0.0"] 1 -> 3 [ label=">=0.4.3"] 1 -> 5 [ label=">=0.9.0"] 1 -> 4 [ label=">=1.1.0"] 1 -> 6 [ label=">=4.0.1"] 1 -> 2 [ label=">=22.0"] 1 -> 7 [ label=">=2"] } ``` ![image](https://github.com/astral-sh/puffin/assets/6826232/4a440fcd-6248-4349-8e1a-c3e0363e42b1) transformers: ![image](https://github.com/astral-sh/puffin/assets/6826232/a13a693c-a8c0-4a4f-95d9-3458431c678a) jupyter: ![graphviz](https://github.com/astral-sh/puffin/assets/6826232/ef730033-6fd9-4ea9-ac93-8c874c19a101)	2023-11-17 18:16:24 +00:00
Zanie Blue	221751487c	Use `UnusableDependencies` for URL dependency conflicts (#425 ) Extends #424 with support for URL dependency incompatibilities. Requires changes to `miette` to prevent URLs from being word wrapped; accepted upstream in https://github.com/zkat/miette/pull/321	2023-11-17 08:28:12 -06:00
Charlie Marsh	2094680cdd	Add a `warn_user_once!` macro (#442 ) Closes https://github.com/astral-sh/puffin/issues/429.	2023-11-17 02:34:06 +00:00
konsti	1883dbdc21	Always¹ clear temporary directories (#437) Always¹ clear the temporary directories we create. * Clear source dist downloads: Previously, the temporary directories would remain in the cache dir; now they are cleared properly * Clear wheel file downloads: Delete the `.whl` file, we only need to cache the unpacked wheel * Consistent handling of cache arguments: Abstract the handling for CLI cache args away, again making sure we remove the `--no-cache` temp dir. There are no more `into_path()` calls that persist `TempDir`s that I could find. ¹Assuming drop is run, and deleting the directory doesn't silently error.	2023-11-16 20:49:48 +00:00
Zanie Blue	0d9d4f9fca	Add an `UnusableDependencies` incompatibility kind and use for conflicting versions (#424) Addresses https://github.com/astral-sh/puffin/issues/309#issuecomment-1792648969 Similar to #338 this throws an error when merging versions results in an empty set. Instead of propagating that error, we capture it and return a new dependency type of `Unusable`. Unusable dependencies are a new incompatibility kind which includes an arbitrary "reason" string that we present to the user. Adding a new incompatibility kind requires changes to the vendored pubgrub crate. We could use this same incompatibility kind for conflicting urls as in #284 which should allow the solver to backtrack to another valid version instead of failing (see #425). Unlike #383 this does not require changes to PubGrub's package mapping model. I think in the long run we'll want PubGrub to accept multiple versions per package to solve this specific issue, but we're interested in it being merged upstream first. This pull request is just using the issue as a simple case to explore adding a new incompatibility type. We may or may not be able to convince them to add this new incompatibility type upstream. As discussed in https://github.com/pubgrub-rs/pubgrub/issues/152, we may want a more general incompatibility kind instead which can be used for arbitrary problems. An upstream pull request has been opened for discussion at https://github.com/pubgrub-rs/pubgrub/pull/153. Related to: - https://github.com/pubgrub-rs/pubgrub/issues/152 - #338 - #383 --------- Co-authored-by: konsti <konstin@mailbox.org>	2023-11-16 20:02:06 +00:00
Zanie Blue	832058dbba	Switch from vendored PubGrub to a fork (#438 ) A fork will let us stay up to date with the upstream while replaying our work on top of it. I expect a similar workflow to the RustPython-Parser fork we maintained, except that I wrote an automation to create tags for each commit on the fork (https://github.com/zanieb/pubgrub/pull/2) so we do not need to manually tag and document each commit. To update with the upstream: - Rebase our fork's `main` branch on top of the latest changes in upstream's `dev` branch - Force push, overwriting our `main` branch history - Change the commit hash here to the last commit on `main` in our fork Since we automatically tag each commit on the fork, we should never lose the commits that are dropped from `main` during rebase.	2023-11-16 13:49:19 -06:00
konsti	e41ec12239	Option to resolve at a fixed timestamp with `pip-compile --exclude-newer YYYY-MM-DD` (#434 ) This works by filtering out files with a more recent upload time, so if the index you use does not provide upload times, the results might be inaccurate. pypi provides upload times for all files. That is, the field is non-nullable in the warehouse schema, but the simple API PEP does not know this field. If you have only pypi dependencies, this means deterministic, reproducible(!) resolution. We could try doing the same for git repos but it doesn't seem worth the effort, i'd recommend pinning commits since git histories are arbitrarily malleable and also if you care about reproducibility and such you should not use git dependencies but a custom index. Timestamps are given either as RFC 3339 timestamps such as `2006-12-02T02:07:43Z` or as UTC dates in the same format such as `2006-12-02`. Dates are interpreted as including this day, i.e. until midnight UTC that day. Date only is required to make this ergonomic and midnight seems like an ergonomic choice. 
In action for `pandas`: ```console $ target/debug/puffin pip-compile --exclude-newer 2023-11-16 target/pandas.in Resolved 6 packages in 679ms # This file was autogenerated by Puffin v0.0.1 via the following command: # target/debug/puffin pip-compile --exclude-newer 2023-11-16 target/pandas.in numpy==1.26.2 # via pandas pandas==2.1.3 python-dateutil==2.8.2 # via pandas pytz==2023.3.post1 # via pandas six==1.16.0 # via python-dateutil tzdata==2023.3 # via pandas $ target/debug/puffin pip-compile --exclude-newer 2022-11-16 target/pandas.in Resolved 5 packages in 655ms # This file was autogenerated by Puffin v0.0.1 via the following command: # target/debug/puffin pip-compile --exclude-newer 2022-11-16 target/pandas.in numpy==1.23.4 # via pandas pandas==1.5.1 python-dateutil==2.8.2 # via pandas pytz==2022.6 # via pandas six==1.16.0 # via python-dateutil $ target/debug/puffin pip-compile --exclude-newer 2021-11-16 target/pandas.in Resolved 5 packages in 594ms # This file was autogenerated by Puffin v0.0.1 via the following command: # target/debug/puffin pip-compile --exclude-newer 2021-11-16 target/pandas.in numpy==1.21.4 # via pandas pandas==1.3.4 python-dateutil==2.8.2 # via pandas pytz==2021.3 # via pandas six==1.16.0 # via python-dateutil ```	2023-11-16 19:46:17 +00:00
konsti	751f7fa9c6	Improve PEP 691 compatibility (#428 ) [PEP 691](https://peps.python.org/pep-0691/#project-detail) has slightly different, more relaxed rules around file metadata. These changes are now reflected in the `File` struct. This will make it easier to support alternative indices. I had expected that i need to introduce a separate type for that, so i'm happy it's two `Option`s more and an alias. Part of #412	2023-11-16 19:03:44 +01:00
Charlie Marsh	d3caf9ae86	Choose most-compatible wheel in resolver and installer (#422 ) ## Summary This PR implements logic to sort wheels by priority, where priority is defined as preferring more "specific" wheels over less "specific" wheels. For example, in the case of Black, my machine now selects `black-23.11.0-cp311-cp311-macosx_11_0_arm64.whl`, whereas sorting by lowest priority instead gives me `black-23.11.0-py3-none-any.whl`. As part of this change, I've also modified the resolver to fallback to using incompatible wheels when determining package metadata, if no compatible wheels are available. The `VersionMap` was also moved out of `resolver.rs` and into its own file with a wrapper type, for clarity. Closes https://github.com/astral-sh/puffin/issues/380. Closes https://github.com/astral-sh/puffin/issues/421.	2023-11-15 18:22:11 +00:00
Charlie Marsh	0af2f7e39f	Use `anstream` to avoid writing colorized output (#415 ) A more robust solution to avoiding colorized output by ensuring we write to `stdout` and `stderr` via the [`anstream`](https://docs.rs/anstream/latest/anstream/) crate. Closes https://github.com/astral-sh/puffin/issues/393.	2023-11-13 20:00:12 +00:00
Andrew Gallant	63f7f65190	change global allocator to jemalloc (and mimalloc on Windows) (#399 ) This copies the allocator configuration used in the Ruff project. In particular, this gives us an instant 10% win when resolving the top 1K PyPI packages: $ hyperfine \ "./target/profiling/puffin-dev-main resolve-many --cache-dir cache-docker-no-build --no-build pypi_top_8k_flat.txt --limit 1000 2> /dev/null" \ "./target/profiling/puffin-dev resolve-many --cache-dir cache-docker-no-build --no-build pypi_top_8k_flat.txt --limit 1000 2> /dev/null" Benchmark 1: ./target/profiling/puffin-dev-main resolve-many --cache-dir cache-docker-no-build --no-build pypi_top_8k_flat.txt --limit 1000 2> /dev/null Time (mean ± σ): 974.2 ms ± 26.4 ms [User: 17503.3 ms, System: 2205.3 ms] Range (min … max): 943.5 ms … 1015.9 ms 10 runs Benchmark 2: ./target/profiling/puffin-dev resolve-many --cache-dir cache-docker-no-build --no-build pypi_top_8k_flat.txt --limit 1000 2> /dev/null Time (mean ± σ): 883.1 ms ± 23.3 ms [User: 14626.1 ms, System: 2542.2 ms] Range (min … max): 849.5 ms … 916.9 ms 10 runs Summary './target/profiling/puffin-dev resolve-many --cache-dir cache-docker-no-build --no-build pypi_top_8k_flat.txt --limit 1000 2> /dev/null' ran 1.10 ± 0.04 times faster than './target/profiling/puffin-dev-main resolve-many --cache-dir cache-docker-no-build --no-build pypi_top_8k_flat.txt --limit 1000 2> /dev/null' I was moved to do this because I noticed `malloc`/`free` taking up a fairly sizeable percentage of time during light profiling. As is becoming a pattern, it will be easier to review this commit-by-commit. Ref #396 (wouldn't call this issue fixed) ----- I did also try adding a `smallvec` optimization to the `Version::release` field, but it didn't bear any fruit. I still think there is more to explore since the results I observed don't quite line up with what I expect. (So probably either my mental model is off or my measurement process is flawed.) 
You can see that attempt with a little more explanation here: `f9528b4ecd` In the course of adding the `smallvec` optimization, I also shrunk the `Version` fields from a `usize` to a `u32`. They should at least be a fixed size integer since version numbers aren't used to index memory, and I shrunk it to `u32` since it seems reasonable to assume that all version numbers will be smaller than `2^32`.	2023-11-10 14:48:59 -05:00
konsti	5cef40d87a	Add proper caching for pypi metadata fetching kinds (#368 ) I intend this to become the main form of caching for puffin: You can make http requests, you transform the data to what you really need, you have control over the cache key, and the cache is always json (or anything else much faster we want to replace it with as long as it's serde!)	2023-11-10 11:03:40 +00:00
konsti	d1b57acaa8	Implement PEP 517 backend-path (#385 ) Closes #192	2023-11-10 11:54:23 +01:00
Andrew Gallant	33c0901a28	distribution-filename: speed up is_compatible (#367 ) This PR tweaks the representation of `Tags` in order to offer a faster implementation of `WheelFilename::is_compatible`. We now use a nested map of tags that lets us avoid looping over every supported platform tag. As the code comments suggest, that is the essential gain. We still do not mind looping over the tags in each wheel name since they tend to be quite small. And pushing our thumb on that side of things can make things worse overall since it would likely slow down WheelFilename construction itself. For micro-benchmarks, we improve considerably for compatibility checking: $ critcmp base test3 group base test3 ----- ---- ----- build_platform_tags/burntsushi-archlinux 1.00 46.2±0.28µs ? ?/sec 2.48 114.8±0.45µs ? ?/sec wheelname_parsing/flyte-long-compatible 1.00 624.8±3.31ns 174.0 MB/sec 1.01 629.4±4.30ns 172.7 MB/sec wheelname_parsing/flyte-long-incompatible 1.00 743.6±4.23ns 165.4 MB/sec 1.00 746.9±4.62ns 164.7 MB/sec wheelname_parsing/flyte-short-compatible 1.00 526.7±4.76ns 54.3 MB/sec 1.01 530.2±5.81ns 54.0 MB/sec wheelname_parsing/flyte-short-incompatible 1.00 540.4±4.93ns 60.0 MB/sec 1.01 545.7±5.31ns 59.4 MB/sec wheelname_parsing_failure/flyte-long-extension 1.00 13.6±0.13ns 3.2 GB/sec 1.01 13.7±0.14ns 3.2 GB/sec wheelname_parsing_failure/flyte-short-extension 1.00 14.0±0.20ns 1160.4 MB/sec 1.01 14.1±0.14ns 1146.5 MB/sec wheelname_tag_compatibility/flyte-long-compatible 11.33 159.8±2.79ns 680.5 MB/sec 1.00 14.1±0.23ns 7.5 GB/sec wheelname_tag_compatibility/flyte-long-incompatible 237.60 1671.8±37.99ns 73.6 MB/sec 1.00 7.0±0.08ns 17.1 GB/sec wheelname_tag_compatibility/flyte-short-compatible 16.07 223.5±8.60ns 128.0 MB/sec 1.00 13.9±0.30ns 2.0 GB/sec wheelname_tag_compatibility/flyte-short-incompatible 149.83 628.3±2.13ns 51.6 MB/sec 1.00 4.2±0.10ns 7.6 GB/sec We do regress slightly on the time it takes for `Tags::new` to run, but this is somewhat expected. 
And in absolute terms, 114us is perfectly acceptable given that it's only executed ~once for each `puffin` invocation. Ad hoc benchmarks indicate an overall 25% perf improvement in `puffin pip-compile` times. This roughly corresponds with how much time `is_compatible` was taking. Indeed, profiling confirms that it has virtually disappeared from the profile. Fixes #157	2023-11-09 09:01:03 -05:00
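The nested-map idea described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the actual `Tags` implementation: the struct layout, the `(python, abi, platform)` nesting order, and the method signatures are assumptions made for the example. The point is that a compatibility check loops only over the few tags in a wheel name and does hash lookups into the supported set, rather than scanning every supported platform tag.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical stand-in for the set of supported tags, stored as
/// python tag -> ABI tag -> set of platform tags (the nesting order is an assumption).
struct Tags {
    map: HashMap<String, HashMap<String, HashSet<String>>>,
}

impl Tags {
    /// Build the nested map once from a flat list of supported triples.
    fn new(supported: &[(&str, &str, &str)]) -> Self {
        let mut map: HashMap<String, HashMap<String, HashSet<String>>> = HashMap::new();
        for (py, abi, plat) in supported {
            map.entry(py.to_string())
                .or_default()
                .entry(abi.to_string())
                .or_default()
                .insert(plat.to_string());
        }
        Tags { map }
    }

    /// Loop over the (small) tag lists of a wheel and look each one up,
    /// instead of iterating over every supported tag.
    fn is_compatible(&self, pys: &[&str], abis: &[&str], plats: &[&str]) -> bool {
        for py in pys {
            let Some(abi_map) = self.map.get(*py) else { continue };
            for abi in abis {
                let Some(plat_set) = abi_map.get(*abi) else { continue };
                if plats.iter().any(|p| plat_set.contains(*p)) {
                    return true;
                }
            }
        }
        false
    }
}
```

Building the map costs more up front than keeping a flat list, which matches the small `Tags::new` regression the commit message reports.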
konsti	d407bbbee6	Special case missing header build errors (on linux) (#354 ) One of the most common errors i observed are build failures due to missing header files. On ubuntu, this generally means that you need to install some `<...>-dev` package that the documentation tells you about, e.g. [mysqlclient](https://github.com/PyMySQL/mysqlclient#linux) needs `default-libmysqlclient-dev`, [some psycopg versions](https://www.psycopg.org/psycopg3/docs/basic/install.html#local-installation) (i remember that this was always required at some earlier point) require `libpq-dev` and pygraphviz wants `graphviz-dev`. This is quite common for many scientific packages (where conda has an advantage because they can provide those package as a dependency). The error message can be completely inscrutable if you're just a python programmer (or user) and not a c programmer (example: pygraphviz): ``` warning: no files found matching '.png' under directory 'doc' warning: no files found matching '.txt' under directory 'doc' warning: no files found matching '.css' under directory 'doc' warning: no previously-included files matching '~' found anywhere in distribution warning: no previously-included files matching '.pyc' found anywhere in distribution warning: no previously-included files matching '.svn' found anywhere in distribution no previously-included directories found matching 'doc/build' pygraphviz/graphviz_wrap.c:3020:10: fatal error: graphviz/cgraph.h: No such file or directory 3020 \| #include "graphviz/cgraph.h" \| ^~~~~~~~~~~~~~~~~~~ compilation terminated. error: command '/usr/bin/gcc' failed with exit code 1 ``` The only relevant part is `Fatal error: graphviz/cgraph.h: No such file or directory`. Why is this file not there and how do i get it to be there? 
This is even harder to spot in pip's output, where it's 11 lines above the last line: ![image](https://github.com/astral-sh/puffin/assets/6826232/7a3d7279-e7b1-4511-ab22-d0a35be5e672) I've special cased missing headers and made sure that the last line tells you the important information: We're missing some header, please check the documentation of {package} {version} for what to install: ![image](https://github.com/astral-sh/puffin/assets/6826232/4bbb8923-5a82-472f-ab1f-9e1471aa2896) Scrolling up: ![image](https://github.com/astral-sh/puffin/assets/6826232/89a2495a-e188-4288-b534-ad885ee08763) The difference gets even clearer with a default ubuntu terminal with its 80 columns: ![image](https://github.com/astral-sh/puffin/assets/6826232/49fb27bc-07c6-4b10-a1a1-30ec8e112438) --- Note that the situation is better for a missing compiler, there i get: ``` [...] warning: no previously-included files matching '~' found anywhere in distribution warning: no previously-included files matching '*.pyc' found anywhere in distribution warning: no previously-included files matching '.svn' found anywhere in distribution no previously-included directories found matching 'doc/build' error: command 'gcc' failed: No such file or directory --- ``` Putting the last line into google, the first two results tell me to `sudo apt-get install gcc`, the third even tells me about `sudo apt install build-essential`	2023-11-08 15:26:39 +00:00
Andrew Gallant	294955ecff	fix platform detection on Linux (#359 ) Rejigger Linux platform detection This change makes some very small improvements to the Linux platform detection logic. In particular, the existing logic did not work on my Archlinux machine since /lib64/ld-linux-x86-64.so.2 isn't a symlink. In that case, the detection logic should have fallen back to the slower `ldd --version` technique, but `read_link` fails outright when its argument isn't a symbolic link. So we tweak the logic to allow it to fail, and if it does, we still try the `ldd --version` approach instead of giving up completely. I also made some cosmetic improvements to the regex matching, as well as ensuring that the regexes are only compiled exactly once.	2023-11-07 11:39:35 -05:00
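A rough sketch of the fallback behaviour described in the commit above, assuming a hypothetical `detect_libc` helper that takes the slow path as a closure (the real code shells out to `ldd --version` and parses versions with regexes, which is omitted here). `std::fs::read_link` returns an error when its argument is not a symbolic link, so the error is treated as "try the slow path" rather than as a hard failure.

```rust
use std::fs;
use std::path::Path;

/// Hypothetical helper: try the fast symlink-based detection first and
/// fall back to a slower strategy if the path is not a symlink (or is missing).
fn detect_libc(ld_path: &Path, slow_fallback: impl FnOnce() -> Option<String>) -> Option<String> {
    match fs::read_link(ld_path) {
        // Fast path: the symlink target usually encodes the libc version.
        Ok(target) => Some(target.to_string_lossy().into_owned()),
        // `read_link` fails outright on non-symlinks, so do not give up here.
        Err(_) => slow_fallback(),
    }
}
```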
Charlie Marsh	b0286a8939	Add user feedback when building source distributions in the resolver (#347 ) It looks like Cargo, notice the bold green lines at the top (which appear during the resolution, to indicate Git fetches and source distribution builds): <img width="868" alt="Screen Shot 2023-11-06 at 11 28 47 PM" src="https://github.com/astral-sh/puffin/assets/1309177/9647a480-7be7-41e9-b1d3-69faefd054ae"> <img width="868" alt="Screen Shot 2023-11-06 at 11 28 51 PM" src="https://github.com/astral-sh/puffin/assets/1309177/6bc491aa-5b51-4b37-9ee1-257f1bc1c049"> Closes https://github.com/astral-sh/puffin/issues/287 although we can do a lot more here.	2023-11-07 14:17:31 +00:00
Charlie Marsh	2c32bc5a86	Respect direct URLs in puffin installer (#345 ) We now write the `direct_url.json` when installing, and _skip_ installing if we find a package installed via the direct URL that the user is requesting. A lot of TODOs, especially around cleaning up the `Source` abstraction and its relationship to `DirectUrl`. I'm gonna keep working on these today, but this works and makes the requirements clear. Closes #332.	2023-11-07 09:11:27 -05:00
konsti	fbe28d3b7c	Fix mastodon-py dist-info handling (#336 ) mastodon-py 1.5.1 uses a dot in its dist-info dir name, which we previously didn't handle, causing home-assistant to fail. The new implementation is based on `2f83540272/src/packaging/utils.py (L146-L172)`. Part of #199 ``` unzip -l Mastodon.py-1.5.1-py2.py3-none-any.whl Archive: Mastodon.py-1.5.1-py2.py3-none-any.whl Length Date Time Name --------- ---------- ----- ---- 153929 2020-02-29 17:39 mastodon/Mastodon.py 1029 2019-10-11 19:15 mastodon/__init__.py 7357 2019-10-11 20:24 mastodon/streaming.py 10 2020-03-14 18:14 Mastodon.py-1.5.1.dist-info/DESCRIPTION.rst 1398 2020-03-14 18:14 Mastodon.py-1.5.1.dist-info/metadata.json 9 2020-03-14 18:14 Mastodon.py-1.5.1.dist-info/top_level.txt 110 2020-03-14 18:14 Mastodon.py-1.5.1.dist-info/WHEEL 1543 2020-03-14 18:14 Mastodon.py-1.5.1.dist-info/METADATA 753 2020-03-14 18:14 Mastodon.py-1.5.1.dist-info/RECORD --------- ------- 166138 9 files ```	2023-11-07 12:36:11 +01:00
Charlie Marsh	2c114592bd	Only store small wheels in-memory (#348 ) Closes https://github.com/astral-sh/puffin/issues/246.	2023-11-07 00:50:00 +00:00
Charlie Marsh	a5e535f6fb	Remove `virtualenv` setup from gourgeist (#339 ) We now only support building bare environments.	2023-11-06 18:32:45 +00:00
Charlie Marsh	b013ea9c93	Move `DirectUrl` into `pypi-types` (#343 ) This needs to be reused elsewhere, and there's nothing specific to wheel installation about it.	2023-11-06 18:26:33 +00:00
Charlie Marsh	24e30e6557	Split `puffin-package` into requirements.txt parser and `pypi-types` (#341 ) There are only two things left in this crate and they don't really have anything to do with one another.	2023-11-06 18:19:49 +00:00
Charlie Marsh	d9bcfafa16	Write `direct_url.json` in wheel installer (#337 ) ## Summary This PR just adds the logic in `install-wheel-rs` to write `direct_url.json`. We're not actually taking advantage of it yet (or wiring it through) in Puffin. Part of https://github.com/astral-sh/puffin/issues/332.	2023-11-06 17:09:28 +00:00
konsti	9b077f3d0f	`cargo upgrade --incompatible` (#330 ) Ran `cargo upgrade --incompatible`, seems there are no changes required. From cacache 0.12.0: > BREAKING CHANGE: some signatures for copy have changed, and copy no longer automatically reflinks `which` 5.0.0 seems to have only error message changes.	2023-11-06 14:14:47 +00:00
konsti	b2439b24a1	Fetch wheel metadata by async range requests on the remote wheel (#301 ) Use range requests and async zip to extract the METADATA file from a remote wheel. We currently only cache when the remote declares the resource as immutable, see https://github.com/06chaynes/http-cache/issues/57 and https://github.com/baszalmstra/async_http_range_reader/pull/1 . The cache is stored as json with the description omitted, which improves cache deserialization performance.	2023-11-06 15:06:49 +01:00
Charlie Marsh	6d672b8951	Add source distribution support to `pip-compile` (#323 ) ## Summary This is a first-pass at adding source distribution support to the installer. The previous installation flow was: 1. Come up with a plan. 1. Find a distribution (specific file) for every package that we'll need to download. 1. Download those distributions. 1. Unzip them (since we assumed they were all wheels). 1. Install them into the virtual environment. Now, Step (3) downloads both wheels and source distributions, and we insert a step between Steps (3) and (4) to build any source distributions into zipped wheels. There are a bunch of TODOs, the most important (IMO) is that we basically have two implementations of downloading and building, between the stuff in `puffin_installer` and `puffin_resolver` (namely in `crates/puffin-resolver/src/distribution`). I didn't attempt to clean that up here -- it's already a problem, and it's related to the overall problem we need to solve around unified caching and resource management. Closes #243.	2023-11-06 08:22:36 -05:00
konsti	b79a15b458	Update pyproject-toml to 0.8.0 (#329 )	2023-11-06 13:16:36 +00:00
Charlie Marsh	d785ffdbff	Move `Source` abstraction into `puffin-distribution` (#321 ) No code changes, but this will allow it to be shared between the installer and the resolver.	2023-11-06 02:31:15 +00:00
Charlie Marsh	fa1bbbbe08	Write fully-precise Git SHAs to `pip-compile` output (#299 ) This PR adds a mechanism by which we can ensure that we _always_ try to refresh Git dependencies when resolving; further, we now write the fully resolved SHA to the "lockfile". However, nothing in the code _assumes_ we do this, so the installer will remain agnostic to this behavior. The specific approach taken here is minimally invasive. Specifically, when we try to fetch a source distribution, we check if it's a Git dependency; if it is, we fetch, and return the exact SHA, which we then map back to a new URL. In the resolver, we keep track of URL "redirects", and then we use the redirect (1) for the actual source distribution building, and (2) when writing back out to the lockfile. As such, none of the types outside of the resolver change at all, since we're just mapping `RemoteDistribution` to `RemoteDistribution`, but swapping out the internal URLs. There are some inefficiencies here since, e.g., we do the Git fetch, send back the "precise" URL, then a moment later, do a Git checkout of that URL (which will be _mostly_ a no-op -- since we have a full SHA, we don't have to fetch anything, but we _do_ check back on disk to see if the SHA is still checked out). A more efficient approach would be to return the path to the checked-out revision when we do this conversion to a "precise" URL, since we'd then only interact with the Git repo exactly once. But this runs the risk that the checked-out SHA changes between the time we make the "precise" URL and the time we build the source distribution. Closes #286.	2023-11-03 16:26:57 +00:00
Charlie Marsh	62c474d880	Add support for Git dependencies (#283 ) ## Summary This PR adds support for Git dependencies, like: ``` flask @ git+https://github.com/pallets/flask.git ``` Right now, they're only supported in the resolver (and not the installer), since the installer doesn't yet support source distributions at all. The general approach here is based on Cargo's Git implementation. Specifically, I adapted Cargo's [`git`](`23eb492cf9/src/cargo/sources/git/mod.rs`) module to perform the cloning, which is based on `libgit2`. As compared to Cargo's implementation, I made the following changes: - Removed any unnecessary code. - Fixed any Clippy errors for our stricter ruleset. - Removed the dependency on `curl`, in favor of `reqwest` which we use elsewhere. - Removed the ability to use `gix`. Cargo allows the use of `gix` as an experimental flag, but it only supports a small subset of the operations. When Cargo fully adopts `gix`, we should plan to do the same. - Removed Cargo's host key checking. We need to re-add this! I'll do it shortly. - Removed Cargo's progress bars. We should re-add this too, but we use `indicatif` and Cargo had their own thing. There are a few follow-ups to consider: - Adding support in the installer. - When we lock, we should write out the Git URL that includes the exact SHA. This lets us cache in perpetuity and avoids dependencies changing without re-locking. - When we resolve, we should _always_ try to refresh Git dependencies. (Right now, we skip if the wheel was already built.) I'll work on the latter two in follow-up PRs. Closes #202.	2023-11-02 15:14:55 +00:00
konsti	4adaa9a700	Wheel filename distribution package name (#278 ) The normalized name abstractions were not used consistently; this PR uses them where they were previously missing: * `WheelFilename::distribution` * `Requirement::name` * `Requirement::extras` * `Metadata21::name` * `Metadata21::provides_dist` With `puffin-package` depending on `pep508_rs` this would be a cyclical crate dependency, so `puffin-normalize` gets split out from `puffin-package`. `DistInfoName` has the same task and semantics as `PackageName`, so it's merged into the latter. `PackageName` and `ExtraName` documentation is moved onto the type and their constructors are called `new` instead of `normalize`. We now use these constructors rarely enough that the implicit allocation by `to_string()` shouldn't matter anymore, while more actual cloning becomes visible.	2023-11-02 11:15:27 +00:00
Charlie Marsh	2ee555df7b	Use `puffin_cache::digest` in another site (#289 )	2023-11-02 04:48:14 +00:00
Charlie Marsh	8123e1a8f6	Add stable hash crate (#281 ) This PR adds a `puffin-cache` crate that we can share across a variety of other crates to generate stable hashes.	2023-11-01 23:41:45 +00:00
Charlie Marsh	2652caa3e3	Add support for URL dependencies (#251 ) ## Summary This PR adds support for resolving and installing dependencies via direct URLs, like: ``` werkzeug @ `960bb4017c`4aed12b5ed8b78e0153e/Werkzeug-2.0.0-py3-none-any.whl ``` These are fairly common (e.g., with `torch`), but you most often see them as Git dependencies. Broadly, structs like `RemoteDistribution` and friends are now enums that can represent either registry-based dependencies or URL-based dependencies: ```rust /// A built distribution (wheel) that exists as a remote file (e.g., on `PyPI`). #[derive(Debug, Clone)] #[allow(clippy::large_enum_variant)] pub enum RemoteDistribution { /// The distribution exists in a registry, like `PyPI`. Registry(PackageName, Version, File), /// The distribution exists at an arbitrary URL. Url(PackageName, Url), } ``` In the resolver, we now allow packages to take on an extra, optional `Url` field: ```rust #[derive(Debug, Clone, Eq, Derivative)] #[derivative(PartialEq, Hash)] pub enum PubGrubPackage { Root, Package( PackageName, Option<DistInfoName>, #[derivative(PartialEq = "ignore")] #[derivative(PartialOrd = "ignore")] #[derivative(Hash = "ignore")] Option<Url>, ), } ``` However, for the purpose of version satisfaction, we ignore the URL. This allows for the URL dependency to satisfy the transitive request in cases like: ``` flask==3.0.0 werkzeug @ `254c3e9b5f`5941e900b71206e6313b/werkzeug-3.0.1-py3-none-any.whl ``` There are a couple limitations in the current approach: - The caching for remote URLs is done separately in the resolver vs. the installer. I decided not to sweat this too much... We need to figure out caching holistically. - We don't support any sort of time-based cache for remote URLs -- they just exist forever. This will be a problem for URL dependencies, where we need some way to evict and refresh them. But I've deferred it for now. 
- I think I need to redo how this is modeled in the resolver, because right now, we don't detect a variety of invalid cases, e.g., providing two different URLs for a dependency, asking for a URL dependency and a _different version_ of the same dependency in the list of first-party dependencies, etc. - (We don't yet support VCS dependencies.)	2023-11-01 09:21:44 -04:00
Charlie Marsh	89dad0c9ad	Move distribution abstraction in shared crate (#258 ) This also allows us to get rid of `PinnedPackage` _and_ to remove some `Result<...>` types due to needless conversions between otherwise-identical types.	2023-10-31 15:30:06 -04:00
Charlie Marsh	3312ce30f5	Upgrade crates and remove unused dependencies (#256 )	2023-10-31 13:16:58 -04:00
konsti	29bd0a4ed8	Fix musl compilation (#234 ) musl (which we already use in ruff) allows statically linked binaries on linux. This PR switches to rustls and vendors and fixes the glibc detection. Using static musl builds makes it easier to avoid glibc errors in docker and we'll need it later for alpine users anyway. An alternative is using vendored openssl.	2023-10-30 18:10:17 +01:00
Charlie Marsh	2ba85bf80e	Add PubGrub's priority queue (#221 ) Pulls in https://github.com/pubgrub-rs/pubgrub/pull/104.	2023-10-29 21:16:02 +00:00
konsti	5ad58474ca	Add script to check the top 8k pypi packages (#198 ) To check the top 1k (current state): ```bash scripts/resolve/get_pypi_top_8k.sh cargo run --bin puffin-dev -- resolve-many scripts/resolve/pypi_top_8k_flat.txt --limit 1000 ``` Results: ``` Errors: pywin32, geoip2, maxminddb, pypika, dirac Success: 995, Error: 5 ``` pywin32 has no solution for the build environment, 3 have no `[build-system]` entry in pyproject.toml, `dirac` is missing cmake	2023-10-26 12:03:59 +00:00
konsti	216b6c41c2	Start puffin-dev (#193 ) Currently, this is only the source distribution building feature moved. It's intended that we can add development and test commands there without affecting the main cli surface	2023-10-26 09:17:22 +00:00
konsti	889f6173cc	Unify python interpreter abstractions (#178 ) Previously, we had two python interpreter metadata structs, one in gourgeist and one in puffin. Both would spawn a subprocess to query overlapping metadata and both would appear in the cli crate, if you weren't careful you could even have two different base interpreters at once. This change unifies this to one set of metadata, queried and cached once. Another effect of this crate is proper separation of python interpreter and venv. A base interpreter (such as `/usr/bin/python/`, but also pyenv and conda installed python) has a set of metadata. A venv has a root and inherits the base python metadata except for `sys.prefix`, which unlike `sys.base_prefix`, gets set to the venv root. From the root and the interpreter info we can compute the paths inside the venv. We can reuse the interpreter info of the base interpreter when creating a venv without having to query the newly created `python`.	2023-10-25 20:11:36 +00:00
konsti	1fbe328257	Build source distributions in the resolver (#138 ) This isn't ready, but it can resolve `meine_stadt_transparent==0.2.14`. The source distributions are currently being built serially one after the other, i don't know if that is incidentally due to the resolution order, because sdist building is blocking or because of something in the resolver that could be improved. It's a bit annoying that the thing that was supposed to do http requests now suddenly also has to do a whole download/unpack/resolve/install/build routine, it messes up the type hierarchy. The much bigger problem though is avoiding recursive crate dependencies, it's the reason for the callback and for splitting the builder into two crates (badly named atm)	2023-10-25 20:05:13 +00:00
Charlie Marsh	49a27ff33c	Add support for parameterized link modes (#164 ) Allows the user to select between clone, hardlink, and copy semantics for installs. (The pnpm documentation has a decent description of what these mean: https://pnpm.io/npmrc#package-import-method.) Closes #159.	2023-10-22 04:35:50 +00:00
Charlie Marsh	b665f1489a	Add tests for `puffin sync` (#161 ) Closes #158.	2023-10-22 03:25:00 +00:00
Charlie Marsh	3072c3265e	Add support for lowest and lowest-direct resolution modes (#160 ) Borrows terminology from pnpm by introducing three resolution modes: - "Highest": always choose the highest compliant version (default). - "Lowest": always choose the lowest compliant version. - "LowestDirect": choose the lowest compliant version of direct dependencies, and the highest compliant version of any transitive dependencies. (This makes a bit more sense than "lowest".) Closes https://github.com/astral-sh/puffin/issues/142.	2023-10-21 22:58:06 -04:00
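The three resolution modes listed in the commit above can be illustrated with a small sketch. The enum and function names here are hypothetical, versions are modelled as plain `(u32, u32, u32)` tuples rather than PEP 440 versions, and the candidate slice is assumed to already contain only compliant versions; the actual resolver integrates this choice into PubGrub's version selection.

```rust
/// Hypothetical mirror of the three modes named in the commit message.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ResolutionMode {
    Highest,
    Lowest,
    LowestDirect,
}

/// Pick one version from the compliant candidates.
/// `is_direct` says whether the package is a direct (first-party) dependency.
fn pick_version(
    mode: ResolutionMode,
    is_direct: bool,
    candidates: &[(u32, u32, u32)],
) -> Option<(u32, u32, u32)> {
    let prefer_lowest = match mode {
        ResolutionMode::Highest => false,
        ResolutionMode::Lowest => true,
        // Lowest for direct dependencies, highest for transitive ones.
        ResolutionMode::LowestDirect => is_direct,
    };
    if prefer_lowest {
        candidates.iter().copied().min()
    } else {
        candidates.iter().copied().max()
    }
}
```

With candidates `1.0.0`, `2.1.0`, and `3.0.0`, "LowestDirect" would pick `1.0.0` for a direct dependency and `3.0.0` for a transitive one.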
konsti	ae9d1f7572	Add source distribution filename abstraction (#154 ) The need for this became clear when working on the source distribution integration into the resolver. While at it i also switch the `WheelFilename` version to the parsed `pep440_rs` version now that we have this crate.	2023-10-20 17:45:57 +02:00
Charlie Marsh	4645f79237	Use `FxHash` (#151 )	2023-10-20 05:26:06 +00:00
Charlie Marsh	8001c792e7	Show requirement sources in `pip-compile` output (#149 ) Builds up a complete resolved graph from PubGrub, and shows the sources that led to each package being included in the resolution, like `pip-compile`. Closes https://github.com/astral-sh/puffin/issues/60.	2023-10-20 05:14:59 +00:00
Charlie Marsh	9b3405bf0e	Upgrade PubGrub to dev branch (#147 ) Updates to `29c48fb9f3daa11bd02794edd55060d0b01ee705` from the `pubgrub-rs` dev branch. This lets us reduce the number of changes we've made to PubGrub itself (now, only changing visibility to export a few things from the `solver.rs` module).	2023-10-20 03:23:26 +00:00


182 Commits