Python/uv - uv - Gitea: Git with a cup of tea

Commit Graph

Author	SHA1	Message	Date
Charlie Marsh	d88ce76979	Stream unpacking of source distribution downloads (#1157 ) This PR migrates our source distribution downloads to unzip as we stream, similar to our approach for wheels. In my testing, this showed a consistent speedup (e.g., 6% here for a few representative source distributions):

```text
❯ python -m scripts.bench --puffin-path ./target/release/main --puffin-path ./target/release/puffin --benchmark install-cold requirements.in
Benchmark 1: ./target/release/main (install-cold)
  Time (mean ± σ):      1.503 s ±  0.039 s    [User: 1.479 s, System: 0.537 s]
  Range (min … max):    1.466 s …  1.605 s    10 runs

Benchmark 2: ./target/release/puffin (install-cold)
  Time (mean ± σ):      1.421 s ±  0.024 s    [User: 1.505 s, System: 0.593 s]
  Range (min … max):    1.381 s …  1.454 s    10 runs

Summary
  './target/release/puffin (install-cold)' ran
    1.06 ± 0.03 times faster than './target/release/main (install-cold)'
```
	2024-01-28 20:09:24 -05:00
Andrew Gallant	5219d37250	add initial rkyv support (#1135 ) This PR adds initial support for [rkyv] to puffin. In particular, the main aim here is to make it possible to deserialize puffin-client's `SimpleMetadata` type from a `&[u8]` without doing any copies. This PR stops short of actually doing that zero-copy deserialization. Instead, this PR is about adding the necessary trait impls to a variety of types, along with a smattering of small refactorings to make rkyv possible to use.

For those unfamiliar, rkyv works via the interplay of three traits: `Archive`, `Serialize` and `Deserialize`. The usual flow of things is this:

* Make a type `T` implement `Archive`, `Serialize` and `Deserialize`. rkyv helpfully provides `derive` macros to make this pretty painless in most cases.
* The process of implementing `Archive` for `T` usually creates an entirely new distinct type within the same namespace. One can refer to this type without naming it explicitly via `Archived<T>` (where `Archived` is a clever type alias defined by rkyv).
* Serialization happens from `T` to (conceptually) a `Vec<u8>`. The serialization format is specifically designed to reflect the in-memory layout of `Archived<T>`. Notably, not `T`. But `Archived<T>`.
* One can then get an `Archived<T>` with no copying (albeit, we will likely need to incur some cost for validation) from the previously created `&[u8]`. This is quite literally [implemented as a pointer cast][rkyv-ptr-cast].
* The problem with an `Archived<T>` is that it isn't your `T`. It's something else. And while there is limited interoperability between a `T` and an `Archived<T>`, the main issue is that the surrounding code generally demands a `T` and not an `Archived<T>`. This is at the heart of the tension for introducing zero-copy deserialization, and this is mostly an intrinsic problem to the technique and not an rkyv-specific issue. For this reason, given an `Archived<T>`, one can get a `T` back via an explicit deserialization step. This step is like any other kind of deserialization, although generally faster since no real "parsing" is required. But it will allocate and create all necessary objects.

This PR largely proceeds by deriving the three aforementioned traits for `SimpleMetadata`. And, of course, all of its type dependencies. But we stop there for now.

The main issue with carrying this work forward so that rkyv is actually used to deserialize a `SimpleMetadata` is figuring out how to deal with `DataWithCachePolicy` inside of the cached client. Ideally, this type would itself have rkyv support, but adding it is difficult. The main difficulty lies in the fact that its `CachePolicy` type is opaque, not easily constructible and is internally the tip of the iceberg of a rat's nest of types found in more crates such as `http`. While one "dumb"-but-annoying approach would be to fork both of those crates and add rkyv trait impls to all necessary types, it is my belief that this is the wrong approach. What we'd like to do is not just use rkyv to deserialize a `DataWithCachePolicy`; we'd actually like to get an `Archived<DataWithCachePolicy>` and make actual decisions using the archived type directly. Doing that will require some work to make `Archived<DataWithCachePolicy>` directly useful.

My suspicion is that, after doing the above, we may want to push forward with a similar approach for `SimpleMetadata`. That is, we want `Archived<SimpleMetadata>` to be as useful as possible. But right now, the structure of the code demands an eager conversion (and thus deserialization) into a `SimpleMetadata` and then into a `VersionMap`. Getting rid of that eagerness is, I think, the next step after dealing with `DataWithCachePolicy` to unlock bigger wins here.

There are many commits in this PR, but most are tiny. I still encourage review to happen commit-by-commit.

[rkyv]: https://rkyv.org/
[rkyv-ptr-cast]: https://docs.rs/rkyv/latest/src/rkyv/util/mod.rs.html#63-68
	2024-01-28 12:14:59 -05:00
Charlie Marsh	d6795da0ea	Set permissions after streaming unzip (#1151 ) ## Summary When we migrated to an "unzip while we stream" solution, we lost the logic to set permissions on the extracted files, so executables in wheels were no longer executable. It turns out this is a little tricky, since the permissions metadata is in the central directory at the _end_ of the zip file, and the async ZIP reader explicitly stops iteration once it hits the central directory. (Specifically, it goes 4 bytes into the central directory, since it sees the 4-byte signature header and then stops.) So, to solve that, I've added a `CentralDirectoryReader` that continues where that iterator left off. This required forking the async zip crate: https://github.com/charliermarsh/rs-async-zip/pull/1. It took a lot of fiddling but I'm quite confident in the code now, especially since the async zip crate validates the signature kind on every read. The central directory is typically quite small (even for the Zig wheel, which is enormous, it's just around 1MB), so I don't expect this to have a high cost. Closes https://github.com/astral-sh/puffin/issues/1148.	2024-01-27 19:22:44 -05:00
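The permission bits this fix recovers live in each central directory file header. The sketch below is dependency-free and assumes a single well-formed entry at the start of a buffer. It shows where the Unix mode sits: the upper 16 bits of the "external file attributes" field, meaningful when the "version made by" host byte is 3 (Unix). `parse_central_directory_entry` and `CentralDirectoryEntry` are hypothetical names. They are not the API of the forked crate or of the PR's `CentralDirectoryReader`.

```rust
use std::convert::TryInto;

/// Signature that starts every central directory file header (`PK\x01\x02`).
const CENTRAL_DIRECTORY_SIGNATURE: u32 = 0x0201_4b50;

/// Hypothetical minimal view of one entry: only what is needed to restore permissions.
#[derive(Debug, PartialEq)]
pub struct CentralDirectoryEntry {
    pub file_name: String,
    /// Unix mode bits, if the archive was written on a Unix host.
    pub unix_mode: Option<u32>,
}

fn read_u16(buf: &[u8], offset: usize) -> Option<u16> {
    Some(u16::from_le_bytes(buf.get(offset..offset + 2)?.try_into().ok()?))
}

fn read_u32(buf: &[u8], offset: usize) -> Option<u32> {
    Some(u32::from_le_bytes(buf.get(offset..offset + 4)?.try_into().ok()?))
}

/// Parses one central directory file header at the start of `buf`, returning
/// the entry and the header's total length (so a caller can advance to the next one).
pub fn parse_central_directory_entry(buf: &[u8]) -> Option<(CentralDirectoryEntry, usize)> {
    // Validate the signature before trusting any other field.
    if read_u32(buf, 0)? != CENTRAL_DIRECTORY_SIGNATURE {
        return None;
    }
    // Upper byte of "version made by" identifies the host system; 3 means Unix.
    let made_by_host = read_u16(buf, 4)? >> 8;
    let name_len = read_u16(buf, 28)? as usize;
    let extra_len = read_u16(buf, 30)? as usize;
    let comment_len = read_u16(buf, 32)? as usize;
    let external_attrs = read_u32(buf, 38)?;
    // The fixed-size part of the header is 46 bytes; the file name follows it.
    let name = buf.get(46..46 + name_len)?;
    let file_name = String::from_utf8_lossy(name).into_owned();
    // On Unix hosts, the mode lives in the upper 16 bits of the external attributes.
    let unix_mode = (made_by_host == 3).then(|| external_attrs >> 16);
    let total_len = 46 + name_len + extra_len + comment_len;
    Some((CentralDirectoryEntry { file_name, unix_mode }, total_len))
}

/// True if any owner/group/other execute bit is set.
pub fn is_executable(mode: u32) -> bool {
    mode & 0o111 != 0
}
```

Reading the central directory requires the whole (typically small) trailer of the archive. That is why this metadata is only available after the streamed entries have been extracted.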
Charlie Marsh	abe1867a0d	Enable Windows wheel builds in CI (#1129 ) Closes https://github.com/astral-sh/puffin/issues/990.	2024-01-27 01:12:25 +00:00
konsti	39021263dd	Windows launchers using posy trampolines (#1092 )

## Background

In virtual environments, we want to install Python programs as console commands, e.g. `black .` instead of `python -m black .`. These may be called [entrypoints](https://packaging.python.org/en/latest/specifications/entry-points/) or scripts. For entrypoints, we're given a module name and a function to call in that module.

On Unix, we generate a minimal Python script launcher. Text files become runnable on Unix by adding a shebang at the top, e.g.

```python
#!/usr/bin/env python
```

will make the operating system run the file with the current Python interpreter. A venv launcher for black in `/home/ferris/colorize/.venv` (module name: `black`, function to call: `patched_main`) would look like this:

```python
#!/home/ferris/colorize/.venv/bin/python
# -*- coding: utf-8 -*-
import re
import sys
from black import patched_main
if __name__ == "__main__":
    sys.argv[0] = re.sub(r"(-script\.pyw|\.exe)?$", "", sys.argv[0])
    sys.exit(patched_main())
```

On Windows, this doesn't work; we can only rely on launching `.exe` files.

## Summary

We use posy's Rust implementation of a trampoline, which is based on distlib's C++ implementation. We pre-build a minimal exe and append the launcher script to it as a stored (uncompressed) zip archive. The exe will look for the venv Python interpreter next to it and use it to execute the appended script.

The changes in this PR make the `black` entrypoint work:

```powershell
cargo run -- venv .venv
cargo run -q -- pip install black
.\.venv\Scripts\black --version
```

Integration with our existing tests will be done in follow-up PRs.

## Implementation and Details

I've vendored the posy trampoline crate. It is a formatted and renamed version of https://github.com/njsmith/posy/pull/28, slightly modified for embedding. The posy launchers are smaller than the distlib launchers: 16K vs 106K for black. Currently only `x86_64-pc-windows-msvc` is supported. The crate requires a nightly compiler for its no-std binary size tricks.

On Windows, an application can be launched with a console or without one (to create windows instead), which requires two different launchers. The GUI launcher will subsequently use `pythonw.exe`, while the console launcher uses `python.exe`.	2024-01-26 13:54:11 +00:00
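The two pieces of the launcher design described in this PR can be sketched in Rust. The first renders the Python launcher script for an entrypoint. The second chooses which interpreter sits next to the launcher. `launcher_script`, `interpreter_for` and `LauncherKind` are hypothetical names, and this is not the vendored posy trampoline code. It also assumes the launcher lives in the same directory as `python.exe`/`pythonw.exe`, as the PR describes for the venv's `Scripts` folder.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical: whether the entrypoint runs with a console or as a GUI app.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum LauncherKind {
    Console,
    Gui,
}

/// Renders the Python launcher script for `module:function`. On Unix this file is
/// executed directly via its shebang; on Windows the text would instead be appended,
/// as a stored zip archive, to the prebuilt trampoline `.exe`.
pub fn launcher_script(python: &Path, module: &str, function: &str) -> String {
    let lines = [
        format!("#!{}", python.display()),
        "# -*- coding: utf-8 -*-".to_string(),
        "import re".to_string(),
        "import sys".to_string(),
        format!("from {} import {}", module, function),
        "if __name__ == \"__main__\":".to_string(),
        // Strip a trailing `-script.pyw` or `.exe` so argv[0] looks like the command name.
        r#"    sys.argv[0] = re.sub(r"(-script\.pyw|\.exe)?$", "", sys.argv[0])"#.to_string(),
        format!("    sys.exit({}())", function),
    ];
    lines.join("\n") + "\n"
}

/// Picks the interpreter next to the launcher: `pythonw.exe` for GUI launchers
/// (no console window), `python.exe` for console launchers.
pub fn interpreter_for(launcher: &Path, kind: LauncherKind) -> Option<PathBuf> {
    let scripts_dir = launcher.parent()?;
    let exe = match kind {
        LauncherKind::Console => "python.exe",
        LauncherKind::Gui => "pythonw.exe",
    };
    Some(scripts_dir.join(exe))
}
```

Storing the script uncompressed keeps the trampoline's job simple: it can locate the appended archive without pulling a decompressor into the no-std binary.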
Charlie Marsh	50057cd5f2	Re-add Cargo's known hosts checking (#1118 ) ## Summary This ensures that (like Cargo) we don't suffer from https://github.com/advisories/GHSA-r5w3-xm58-jv6j, by way of checking known hosts when fetching via `libgit2`. The implementation is taken from Cargo itself, modified to remove all configuration, since we don't yet support configuration for known hosts, etc. Closes #285.	2024-01-25 22:29:36 -05:00
Charlie Marsh	5ad2e60561	Use `same-file` to detect interpreter shims (#1099 ) Our existing detection doesn't work on Windows, because we canonicalize the interpreter path but not `info.sys_executable`, so the former includes the UNC prefix, etc. This is cross-platform and gets at the intent of the check.	2024-01-25 12:27:49 -05:00
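The idea behind the change is to compare file identity rather than path strings. Two paths are the same file if they resolve to the same underlying file, regardless of prefixes such as `\\?\` that canonicalization adds on Windows. The following is a Unix-only, standard-library sketch of that check, comparing device and inode numbers. `is_same_file` here is a hypothetical local function. The `same-file` crate's `same_file::is_same_file` provides the cross-platform version the PR switched to.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Unix-only sketch: true if `a` and `b` refer to the same file on disk.
/// `fs::metadata` follows symlinks, so a shim symlinked to the real interpreter
/// resolves to the same (device, inode) pair as the interpreter itself.
#[cfg(unix)]
pub fn is_same_file(a: &Path, b: &Path) -> io::Result<bool> {
    use std::os::unix::fs::MetadataExt;
    let (ma, mb) = (fs::metadata(a)?, fs::metadata(b)?);
    Ok(ma.dev() == mb.dev() && ma.ino() == mb.ino())
}
```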
Zanie Blue	272555e915	Switch to ref on `main` for PubGrub (#1094 ) Just fixing the wrong merge order from https://github.com/astral-sh/puffin/pull/1088	2024-01-25 14:50:12 +00:00
Charlie Marsh	904db967af	Use junctions instead of symlinks on Windows (#1087 ) ## Summary When we unzip wheels in the cache, we write the directories out to an `archive-v0` bucket, and then symlink into that bucket from the `wheels-v0` and `built-wheels-v0` buckets. On Windows, symlinks are not well supported. Specifically, they need to be explicitly enabled by the user. So, instead of symlinks, we now use junctions, which are well-supported on Windows, and allow you to (effectively) symlink a directory to another directory. This PR implements said junction support, which gets the core installer working on Windows. In the past, we also used symlinks to implement another primitive: we wanted to be able to replace a directory "atomically" (I put "atomically" in quotes because I don't know if it's actually a guaranteed atomic operation), in case someone was trying to use the directory while we were replacing it (as opposed to deleting the directory, then moving it into place). On Windows, it doesn't appear to be possible to atomically replace a junction. So instead, I'm using a new design, whereby the cache always returns canonicalized paths. We know these canonicalized paths are unique and won't be replaced, so they're safe for writers to rely on. In general, when we write new data to the cache, we now return the canonicalized path. When we read from the cache, and try to identify (e.g.) the set of wheels available to us, we canonicalize the links immediately and consider them non-existent if that operation fails. Closes #1085. --------- Co-authored-by: konstin <konstin@mailbox.org>	2024-01-25 10:06:38 +01:00
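The read-side rule from this PR canonicalizes each link immediately and treats a failure as "not present". The sketch below assumes a bucket directory whose entries are links (symlinks on Unix, junctions on Windows). `resolve_cache_entry` and `available_entries` are hypothetical helpers, not the puffin cache API.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Resolves a cache link to its canonical target. A link that cannot be
/// canonicalized (e.g. its target was removed) is treated as nonexistent.
pub fn resolve_cache_entry(link: &Path) -> Option<PathBuf> {
    fs::canonicalize(link).ok()
}

/// Lists a bucket's entries as canonical paths, skipping any that fail to resolve.
/// Canonical paths point into the archive bucket and are never replaced in place,
/// so callers can keep using them safely.
pub fn available_entries(bucket: &Path) -> std::io::Result<Vec<PathBuf>> {
    let mut entries = Vec::new();
    for entry in fs::read_dir(bucket)? {
        if let Some(target) = resolve_cache_entry(&entry?.path()) {
            entries.push(target);
        }
    }
    entries.sort();
    Ok(entries)
}
```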
Zanie Blue	ed1ac640b9	Consolidate `UnusableDependencies` into a generic `Unavailable` incompatibility (#1088 ) Requires https://github.com/zanieb/pubgrub/pull/20 In short, `UnusableDependencies` can be generalized into `Unavailable`, which encompasses incompatibilities where a package range is unusable for some inherent reason as well as those where its dependencies are unusable. We can eventually use this to track more incompatibilities in the solver. I made the reason string required because I can't see a case where we should leave it out. Additionally, this improves the display of conflicts in the root requirements.	2024-01-24 22:10:44 -06:00
Charlie Marsh	0519375bd6	Remove some unused dependencies (#1077 )	2024-01-24 11:58:21 -05:00
Charlie Marsh	63f3434b21	Use nanoid instead of uuid (#1074 ) ## Summary Gives us equivalent randomness with ~half as many characters.	2024-01-24 05:05:14 +00:00
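The "~half as many characters" claim can be checked with a quick calculation. This assumes nanoid's default settings: 21 characters drawn from a 64-symbol URL-safe alphabet. It also assumes a random (version 4) UUID, which carries 122 random bits in its 36-character text form (32 hex digits plus 4 hyphens). The PR itself does not state which settings puffin uses.

$$
\underbrace{21 \cdot \log_2 64}_{\text{nanoid}} = 21 \cdot 6 = 126 \text{ bits}
\qquad \text{vs.} \qquad
\underbrace{128 - 6}_{\text{UUIDv4}} = 122 \text{ bits},
\qquad
\frac{21}{36} \approx 0.58
$$

The 6 bits subtracted for UUIDv4 are the fixed version (4) and variant (2) bits.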
Andrew Gallant	eebc2f340a	make some things guaranteed to be deterministic (#1065 ) This PR replaces a few uses of hash maps/sets with btree maps/sets and index maps/sets. This has the benefit of guaranteeing a deterministic order of iteration. I made these changes as part of looking into a flaky test. Unfortunately, I'm not optimistic that anything here will actually fix the flaky test, since I don't believe anything was actually dependent on the order of iteration.	2024-01-23 20:30:33 -05:00
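The property this PR relies on can be shown with the standard library alone. A `BTreeMap` always iterates in key order. A `HashMap` iterates in an order that depends on its randomly seeded hasher, even though lookups behave the same. `deterministic_order` and `hashed_order` are hypothetical helpers. `IndexMap`/`IndexSet` (insertion-ordered, from the `indexmap` crate) are left out to keep the sketch dependency-free.

```rust
use std::collections::{BTreeMap, HashMap};

/// Iterates a BTreeMap: output is always sorted by key, independent of
/// insertion order and of any hashing seed.
pub fn deterministic_order(pairs: &[(&str, &str)]) -> Vec<String> {
    let map: BTreeMap<&str, &str> = pairs.iter().copied().collect();
    map.iter().map(|(name, version)| format!("{}=={}", name, version)).collect()
}

/// Same data in a HashMap: identical contents, but the iteration order is
/// not guaranteed to be stable between runs.
pub fn hashed_order(pairs: &[(&str, &str)]) -> Vec<String> {
    let map: HashMap<&str, &str> = pairs.iter().copied().collect();
    map.iter().map(|(name, version)| format!("{}=={}", name, version)).collect()
}
```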
Charlie Marsh	6561617c56	Store source distribution builds under a unique manifest ID (#1051 ) ## Summary This is a refactor of the source distribution cache that again aims to make the cache purely additive. Instead of deleting all built wheels when the cache gets invalidated (e.g., because the source distribution changed on PyPI or something), we now treat each invalidation as its own cache directory. The manifest inside of the source distribution directory now becomes a pointer to the "latest" version of the source distribution cache. Here's a visual example: ![Screenshot 2024-01-22 at 5 35 41 PM](https://github.com/astral-sh/puffin/assets/1309177/ca103c83-e116-4956-b91c-8434fe62cffe) With this change, we avoid deleting built distributions that might be relied on elsewhere and maintain our invariant that the cache is purely additive. The cost is that we now preserve stale wheels, but we should add a garbage collection mechanism to deal with that.	2024-01-23 19:49:11 +00:00
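The pointer-based layout can be modeled in memory to show why the cache stays purely additive. Invalidation allocates a fresh manifest ID and repoints the manifest at it. Wheels built under older IDs are never removed, only hidden from new lookups. `SourceDistCache` and its methods are hypothetical, and the real cache is a directory tree on disk.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory model of one source distribution's cache directory.
#[derive(Default)]
pub struct SourceDistCache {
    /// Pointer to the "latest" manifest ID, if any build has happened yet.
    manifest: Option<String>,
    /// Built wheels per manifest ID; buckets are only ever added, never deleted.
    buckets: BTreeMap<String, Vec<String>>,
    next_id: u64,
}

impl SourceDistCache {
    /// The source distribution changed: allocate a new ID and repoint the
    /// manifest, leaving every existing bucket untouched.
    pub fn invalidate(&mut self) -> String {
        self.next_id += 1;
        let id = format!("manifest-{}", self.next_id);
        self.buckets.insert(id.clone(), Vec::new());
        self.manifest = Some(id.clone());
        id
    }

    /// Records a built wheel under the current manifest.
    pub fn add_wheel(&mut self, wheel: &str) {
        if self.manifest.is_none() {
            self.invalidate();
        }
        let id = self.manifest.clone().expect("invalidate sets the manifest");
        self.buckets.entry(id).or_default().push(wheel.to_string());
    }

    /// Wheels visible to new lookups: only those under the latest manifest.
    pub fn current_wheels(&self) -> &[String] {
        self.manifest
            .as_ref()
            .and_then(|id| self.buckets.get(id))
            .map(|wheels| wheels.as_slice())
            .unwrap_or(&[])
    }

    /// All wheels still retained, including stale ones awaiting garbage collection.
    pub fn retained_wheels(&self) -> usize {
        self.buckets.values().map(Vec::len).sum()
    }
}
```

The trade-off named in the PR shows up directly: `retained_wheels` keeps growing until some separate garbage-collection step prunes unreferenced buckets.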
Charlie Marsh	53b7e3cb4f	Update list of expected architectures for `cargo-dist` (#1021 )	2024-01-19 22:32:01 +00:00
Charlie Marsh	b3954f2449	Enable PowerPC builds (#1017 ) Closes #1015.	2024-01-19 17:29:11 -05:00
Charlie Marsh	72924935f8	Upgrade cargo-dist (#1016 )	2024-01-19 20:19:02 +00:00
Charlie Marsh	7b365195cb	Add support for ARM Linux builds in release (#1012 ) Closes #992.	2024-01-19 15:13:07 -05:00
Charlie Marsh	980e1f6d79	Set explicit Docker permissions (#997 )	2024-01-19 05:29:29 +00:00
Charlie Marsh	6af4eb7a45	Remove `[puffin]` prefix (#989 ) And disable the plan job on CI now.	2024-01-19 01:59:06 +00:00
Charlie Marsh	3a1cd44fc6	Add Puffin Docker image (#985 ) Missing piece for the release. ## Test Plan Built the image locally:

```shell
❯ docker run 99956098e1f8f04e209dcfc4a0afcee67df1fe8a726c164884e67f035b1a0f42
Usage: puffin [OPTIONS] <COMMAND>

Commands:
  pip    Resolve and install Python packages
  venv   Create a virtual environment
  clean  Clear the cache
  help   Print this message or the help of the given subcommand(s)

Options:
  -q, --quiet                  Do not print any output
  -v, --verbose                Use verbose output
  -n, --no-cache               Avoid reading from or writing to the cache
      --cache-dir <CACHE_DIR>  Path to the cache directory [env: PUFFIN_CACHE_DIR=]
  -h, --help                   Print help
  -V, --version                Print version
```
	2024-01-18 20:21:31 -05:00
Charlie Marsh	59d700ad2d	Run cargo-dist plan on PR (#971 )	2024-01-18 16:16:11 -05:00
Charlie Marsh	f9154e8297	Add release workflow (#961 ) ## Summary This PR adds a release workflow powered by `cargo-dist`. It's similar to the version that's PR'd in Ruff (https://github.com/astral-sh/ruff/pull/9559), with the exception that it doesn't include the Docker build or the "update dependents" step for pre-commit.	2024-01-18 15:44:11 -05:00
Charlie Marsh	96a61fb351	Remove RFC2047 decoder (#967 ) ## Summary - This was inherited from `d719988323/src/metadata.rs (LL78C2-L91C26)` - ...which introduced this code here: `9cd1d43f7c` - ...with the originating issue here: https://github.com/PyO3/maturin/issues/612 - ...and the upstream issue here: https://github.com/staktrace/mailparse/issues/50 It seems like the goal was to support Unicode in certain header fields, but I don't think this is necessary for us. We only use `get_first_value` for `Requires-Python`, which has to be ASCII, doesn't it? In my testing, it seems like the `charset` hack can also be removed. The tests I copied over actually work without it, which makes me a bit skeptical. The main benefit here is that we get to remove a _big_ dependency stack, including Chumsky, Stacker, and psm, which have limited cross-platform support.	2024-01-18 15:09:45 -05:00
Charlie Marsh	fb22680311	Remove legacy release files (#960 ) I added these long ago, and they're going to make the release diff a lot more confusing.	2024-01-18 05:07:55 +00:00
Charlie Marsh	231686e71b	Remove `incompatibilities` from index (#905 ) This isn't really part of the "index", it's part of the resolution.	2024-01-13 02:57:15 +00:00
Zanie Blue	93d3093a2a	Improve formatting of package ranges in error messages (#864 ) Closes #810 Closes https://github.com/astral-sh/puffin/issues/812 Requires https://github.com/zanieb/pubgrub/pull/19 and https://github.com/zanieb/pubgrub/pull/18 - Always pair package ranges with names e.g. `... of a matching a<1.0` instead of `... of a matching <1.0` - Split range segments onto multiple lines when not a singleton as suggested in [#850](https://github.com/astral-sh/puffin/pull/850#discussion_r1446419610) - Improve formatting when ranges are split across multiple lines e.g. by avoiding extra spaces and improving wording Note review will require expanding the hidden files as there are significant changes to the report formatter and snapshots. Bear with me here as these are definitely not perfect still. The following changes build on top of this independently for further improvements: - #868 - #867 - #866 - #871	2024-01-10 14:16:23 -06:00
bojanserafimov	e67b7858e6	Use zlib-ng for faster decompression (#859 )	2024-01-09 16:13:36 -05:00
Zanie Blue	2b0c2e294b	Fix formatting of negated singleton versions in error messages (#836 ) Closes #805 Requires https://github.com/zanieb/pubgrub/pull/17	2024-01-08 12:33:01 -06:00
Charlie Marsh	aeefe65227	Fix `tracing-durations-export` compilation (#835 ) ## Summary I'm unable to run `puffin-cli` on `main`, as the `tracing-durations-export` dependency is marked as optional, but the crate actually depends on it to compile. Further, without `tracing-durations-export`, there are `Option` types that can't resolve to a concrete type. This PR fixes compilation with and without the feature.	2024-01-08 18:04:23 +00:00
Charlie Marsh	54838914be	Migrate back to `owo-colors` (#824 ) In the past, I moved us to `owo-colors` (https://github.com/astral-sh/puffin/pull/121); then, we moved back, because we ran into issues with overriding the settings to force-disable colors. But `anstream` solved those problems, so I'm moving us _back_ to `owo-colors`, since it's what `anstream` recommends, and it's already used by many of our dependencies (`miette`, `configparser`). --------- Co-authored-by: konstin <konstin@mailbox.org>	2024-01-08 08:54:57 +00:00
Charlie Marsh	ca2e3d7073	Remove outdated Cargo.toml comment (#813 )	2024-01-06 02:50:52 +00:00
konsti	5820a9d937	Update dependencies (#794 ) Pull in a bunch of updates so they get some testing before we announce the project. textwrap 0.16 is blocked on miette updating, http 1.0 on reqwest.	2024-01-05 11:40:12 -05:00
Zanie Blue	5e04a95c45	Disable line wrapping during scenario tests (#784 ) Adds support for a `PUFFIN_NO_WRAP` environment variable which disables line wrapping in `miette` output. We set this variable in the scenario tests to improve the readability of snapshots. I contributed the ability to disable line wrapping upstream at https://github.com/zkat/miette/pull/328	2024-01-04 19:07:16 +00:00
konsti	2db9135c51	Update pubgrub to 78b8add6942766e5fb070bbda1de570e93d6399f (#783 ) Pull in the latest perf improvements	2024-01-04 15:55:35 +00:00
konsti	3f8dc9f5bb	Update pubgrub (#737 ) Pull in https://github.com/pubgrub-rs/pubgrub/pull/170 and https://github.com/pubgrub-rs/pubgrub/pull/171	2023-12-28 21:13:27 +00:00
Charlie Marsh	343880820b	Un-escape HTML entities when decoding (#723 ) I don't have a good testing strategy here (I'm manually testing against `devpi` via `packse`), but the HTML index uses (e.g.) `data-requires-python="&gt;=3.8"`, so we need to decode.	2023-12-24 16:35:45 -05:00
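An HTML index can write a specifier such as `>=3.8` as `&gt;=3.8` inside an attribute, so it must be un-escaped before version parsing. The following is a simplified, dependency-free sketch of that decoding. It handles the named entities most likely to appear in such attributes plus decimal and hex numeric references. `unescape_html` is a hypothetical helper, and a complete decoder must cover the full HTML5 named-entity table.

```rust
/// Simplified HTML entity decoder: `&lt;`, `&gt;`, `&amp;`, `&quot;`, `&apos;`,
/// and numeric references (`&#62;`, `&#x3E;`). Unrecognized sequences are kept as-is.
pub fn unescape_html(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    while let Some(amp) = rest.find('&') {
        out.push_str(&rest[..amp]);
        let tail = &rest[amp..];
        // Try to decode `&name;` starting at `tail`, returning (char, bytes consumed).
        let decoded = tail.find(';').and_then(|semi| {
            let name = &tail[1..semi];
            let ch = match name {
                "lt" => Some('<'),
                "gt" => Some('>'),
                "amp" => Some('&'),
                "quot" => Some('"'),
                "apos" => Some('\''),
                _ => {
                    let code = if let Some(hex) = name.strip_prefix("#x").or_else(|| name.strip_prefix("#X")) {
                        u32::from_str_radix(hex, 16).ok()
                    } else if let Some(dec) = name.strip_prefix('#') {
                        dec.parse::<u32>().ok()
                    } else {
                        None
                    };
                    code.and_then(char::from_u32)
                }
            };
            ch.map(|c| (c, semi + 1))
        });
        match decoded {
            Some((c, consumed)) => {
                out.push(c);
                rest = &tail[consumed..];
            }
            None => {
                // Not a recognized entity: keep the '&' literally and continue after it.
                out.push('&');
                rest = &tail[1..];
            }
        }
    }
    out.push_str(rest);
    out
}
```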
Charlie Marsh	5bce699ee1	Add support for HTML indexes (#719 ) ## Summary This PR adds support for HTML index responses (as with `--index-url=https://download.pytorch.org/whl`). Closes https://github.com/astral-sh/puffin/issues/412.	2023-12-24 16:04:00 +00:00
konsti	e60f0ec732	Update pubgrub (#713 ) Easier than I expected: we simply never construct the pubgrub error variants, since we have our own main loop. The `unreachable!()`s can be removed once the never type (`!`) is stabilized.	2023-12-20 23:56:59 +01:00
konsti	71964ec7a8	Switch to msgpack in the cached client (#662 ) This gives a 1.23x speedup on transformers-extras. We could change to msgpack for the entire cache if we want. I only tried this format and postcard so far, where postcard was much slower (like 1.6s). I don't actually want to merge it like this; I wanted to figure out the ballpark of improvement for switching away from JSON.

```
hyperfine --warmup 3 --runs 10 "target/profiling/puffin pip-compile --cache-dir cache-msgpack scripts/requirements/transformers-extras.in" "target/profiling/branch pip-compile scripts/requirements/transformers-extras.in"
Benchmark 1: target/profiling/puffin pip-compile --cache-dir cache-msgpack scripts/requirements/transformers-extras.in
  Time (mean ± σ):     179.1 ms ±   4.8 ms    [User: 157.5 ms, System: 48.1 ms]
  Range (min … max):   174.9 ms … 188.1 ms    10 runs

Benchmark 2: target/profiling/branch pip-compile scripts/requirements/transformers-extras.in
  Time (mean ± σ):     221.1 ms ±   6.7 ms    [User: 208.1 ms, System: 46.5 ms]
  Range (min … max):   213.5 ms … 235.5 ms    10 runs

Summary
  target/profiling/puffin pip-compile --cache-dir cache-msgpack scripts/requirements/transformers-extras.in ran
    1.23 ± 0.05 times faster than target/profiling/branch pip-compile scripts/requirements/transformers-extras.in
```

Disadvantage: we can't manually look into the cache anymore to debug things.

- [ ] Check more formats; I've currently only tested JSON, msgpack, and postcard, and there should be other formats, too
- [x] Switch over `CachedByTimestamp` serialization (for the interpreter caching)
- [x] Switch over error handling and make sure puffin is still resilient to cache failure	2023-12-16 21:01:35 +00:00
Zanie Blue	490fb55ac5	Use available versions to simplify unsat error reports (#547 ) Uses https://github.com/pubgrub-rs/pubgrub/pull/156 to consolidate version ranges in error reports using the actual available versions for each package. Alternative to https://github.com/zanieb/pubgrub/pull/8 which implements this behavior as a method in the `Reporter` — here it's implemented in our custom report formatter (#521) instead which requires no upstream changes. Requires https://github.com/zanieb/pubgrub/pull/11 to only retrieve the versions for packages that will be used in the report. This is a work in progress. Some things to do: - ~We may want to allow lazy retrieval of the version maps from the formatter~ - [x] We should probably create a separate error type for no solution instead of mixing them with other resolve errors - ~We can probably do something smarter than creating vectors to hold the versions~ - [x] This degrades error messages when a single version is not available, we'll need to special case that - [x] It seems safer to coerce the error type in `resolve` instead of `solve` if feasible	2023-12-12 23:25:16 +00:00
Charlie Marsh	a24534b0ce	Use `rustc-hash` instead of `fxhash` crate (#594 ) `fxhash` is the old, less maintained version of this crate (`rustc-hash`). We use the latter in Ruff.	2023-12-08 20:27:49 +00:00
konsti	366c389385	Parse editable installs (#564 ) Parse `-e` for editable installs in `requirements.txt`. Unlike all the other requirements, editable installs don't have the name of the package specified.	2023-12-06 18:21:15 +01:00
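The distinction this PR draws is that an editable line carries a path or URL but no package name. A minimal sketch of classifying a single `requirements.txt` line follows. It recognizes only the `-e`/`--editable` flag and leaves everything else unparsed. `RequirementLine` and `classify_line` are hypothetical, not puffin's requirements parser.

```rust
/// One non-comment line from a `requirements.txt` file (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum RequirementLine<'a> {
    /// `-e <path-or-url>` / `--editable <path-or-url>`: no package name is given,
    /// so the name has to be discovered later, e.g. by building the project.
    Editable(&'a str),
    /// Any other requirement, left unparsed in this sketch.
    Named(&'a str),
}

/// Classifies one line; returns `None` for blank lines and full-line comments.
pub fn classify_line(line: &str) -> Option<RequirementLine<'_>> {
    // Drop a trailing ` # comment` and surrounding whitespace.
    let line = line.split(" #").next().unwrap_or("").trim();
    if line.is_empty() || line.starts_with('#') {
        return None;
    }
    for flag in ["-e", "--editable"] {
        if let Some(rest) = line.strip_prefix(flag) {
            // Accept `-e path`, `-e=path`, and `--editable=path`.
            let target = rest.trim_start_matches(|c: char| c == '=' || c.is_whitespace());
            if rest.len() != target.len() && !target.is_empty() {
                return Some(RequirementLine::Editable(target));
            }
        }
    }
    Some(RequirementLine::Named(line))
}
```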
Zanie Blue	37ca2e2928	Bump pubgrub for latest upstream (#525 ) https://github.com/pubgrub-rs/pubgrub/pull/157	2023-12-04 09:09:30 -06:00
Charlie Marsh	ee2fca3a48	Add CACHEDIR and .gitignore tags to cache directories (#526 ) ## Summary Even if this will typically be in the user's application folder (rather than a local directory), it's still a good practice. Closes https://github.com/astral-sh/puffin/issues/280.	2023-12-02 00:37:51 +00:00
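Both tags are plain files written at the cache root. The `CACHEDIR.TAG` signature line below is the fixed header defined by the Cache Directory Tagging Specification. The comment lines after it, the `*` pattern in `.gitignore`, and the `init_cache_dir` helper are illustrative assumptions, not necessarily what puffin writes.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// `CACHEDIR.TAG` contents: the first line is the fixed signature required by the
/// Cache Directory Tagging Specification; the comment lines are illustrative.
const CACHEDIR_TAG: &str = "Signature: 8a477f597d28d172789f06886806bc55\n\
# This file is a cache directory tag.\n\
# For information about cache directory tags, see:\n\
#\thttps://bford.info/cachedir/\n";

/// Creates `root` if needed and tags it for backup tools and for Git.
/// Existing tag files are left untouched.
pub fn init_cache_dir(root: &Path) -> io::Result<()> {
    fs::create_dir_all(root)?;
    let tag = root.join("CACHEDIR.TAG");
    if !tag.exists() {
        fs::write(&tag, CACHEDIR_TAG)?;
    }
    let gitignore = root.join(".gitignore");
    if !gitignore.exists() {
        // Ignore everything in the cache directory.
        fs::write(&gitignore, "*\n")?;
    }
    Ok(())
}
```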
Zanie Blue	2a8544df9e	Use a custom pubgrub report formatter (#521 ) Uses https://github.com/zanieb/pubgrub/pull/10 to drastically simplify our reporter implementation. This will allow us to make use of upstream improvements to the reporter e.g. https://github.com/zanieb/pubgrub/pull/8 without multiple duplicative pull requests.	2023-12-01 13:36:12 -06:00
Zanie Blue	efcc4f1409	Use upstream commit for reflink-copy requirement (#523 ) https://github.com/cargo-bins/reflink-copy/pull/51 was merged	2023-12-01 10:58:24 +00:00
Zanie Blue	5f1f207628	Recursively merge existing package directories on installation (#516 ) Previously, when installing a package, we would delete the target directory before copying (or linking) the contents of the package. However, this means that we do not properly support namespace packages, which can share a target directory. Instead, the last package to be installed would override existing packages. Since we install packages in parallel, this could result in a race condition where the target directory already exists, which is not allowed when using `clonefile`. See example error in #515. `c7e63d2dce` provides a regression test for this; it fails on `main`. Here, we implement a recursive merge when the target directory already exists. Both packages will be installed into the same directory. We no longer delete the target directory, which seems okay since we now uninstall packages before installing. When files conflict, we will likely still throw an error. The correct behavior in this case is unclear: if we just take "first write wins" or "last write wins", we could end up with some files from one package and some from another, resulting in two broken packages. A possible solution here is to lock the target directories while copying.	2023-11-30 10:14:51 -06:00
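A recursive merge along the lines described above can be sketched with the standard library. Directories are created if missing and descended into. Each file is created with `create_new`, so a file that already exists at the destination surfaces as an `AlreadyExists` error instead of being silently overwritten, and the existence check and creation happen in one atomic step, which matters when installs run in parallel. `merge_dir` is a hypothetical helper. It copies bytes only and ignores `clonefile`, hard links, and permissions.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Recursively copies `src` into `dst`, merging with existing directories
/// (e.g. a shared namespace package) instead of deleting `dst` first.
pub fn merge_dir(src: &Path, dst: &Path) -> io::Result<()> {
    // Succeeds if `dst` already exists.
    fs::create_dir_all(dst)?;
    for entry in fs::read_dir(src)? {
        let entry = entry?;
        let target = dst.join(entry.file_name());
        if entry.file_type()?.is_dir() {
            merge_dir(&entry.path(), &target)?;
        } else {
            let mut from = fs::File::open(entry.path())?;
            // `create_new` fails with `AlreadyExists` if another package wrote this file.
            let mut to = fs::OpenOptions::new()
                .write(true)
                .create_new(true)
                .open(&target)
                .map_err(|err| io::Error::new(err.kind(), format!("{}: {}", target.display(), err)))?;
            io::copy(&mut from, &mut to)?;
        }
    }
    Ok(())
}
```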
konsti	d89fbeb642	Migrate interpreter query to custom caching (#508 ) This removes the last usage of cacache by replacing it with a custom, flat json caching keyed by the digest of the executable path. ![image](https://github.com/astral-sh/puffin/assets/6826232/8f777c4c-1f1b-4656-ba7b-002175270556) A step towards #478. I've made `CachedByTimestamp<T>` generic over `T` but intentionally not moved it to `puffin-cache` yet.	2023-11-28 17:14:59 +00:00
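Only the name `CachedByTimestamp<T>` comes from this PR. The fields, the freshness check, and the `cache_key` helper below are assumptions, meant only to illustrate caching an interpreter query keyed by the executable path and invalidated when the executable's modification time changes. `DefaultHasher` is used only to stay dependency-free. A persistent on-disk cache needs a digest that is stable across runs and toolchains, which `DefaultHasher` does not promise.

```rust
use std::path::Path;
use std::time::SystemTime;

/// A cached value tagged with the modification time of the file it was derived from.
/// (Struct name from the PR; fields and methods are illustrative assumptions.)
#[derive(Debug, Clone, PartialEq)]
pub struct CachedByTimestamp<T> {
    pub timestamp: SystemTime,
    pub data: T,
}

impl<T> CachedByTimestamp<T> {
    /// Returns the cached data only if the executable is unchanged since the
    /// entry was written; otherwise the caller must re-query the interpreter.
    pub fn get_if_fresh(&self, current_mtime: SystemTime) -> Option<&T> {
        (self.timestamp == current_mtime).then(|| &self.data)
    }
}

/// Flat cache-file name derived from a digest of the executable path.
pub fn cache_key(executable: &Path) -> String {
    use std::collections::hash_map::DefaultHasher;
    use std::hash::{Hash, Hasher};
    let mut hasher = DefaultHasher::new();
    executable.hash(&mut hasher);
    format!("{:016x}.json", hasher.finish())
}
```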
konsti	8855f44b5f	Move simple index queries to `CachedClient` (#504 ) Replaces the usage of `http-cache-reqwest` for simple index queries with our custom cached client, removing `http-cache-reqwest` altogether. The new cache paths are `<cache>/simple-v0/<index>/<package_name>.json`. I could not test with a non-PyPI index since I'm not aware of any other JSON indices (jax and torch are both HTML indices). In a future step, we can transform the response to be a `HashMap<Version, {source_dists: Vec<(SourceDistFilename, File)>, wheels: Vec<(WheelFilename, File)>}` (independent of Python version; this cache is used by all environments together). This should speed up cache deserialization a bit, since we don't need to try source dist and wheel anymore and drop incompatible dists, and it should make building the `VersionMap` simpler. We can speed this up even further by splitting into a version list and the info for each version. I'm mentioning this because deserialization was a major bottleneck in the Rust part of the old Python prototype. Fixes #481	2023-11-28 00:11:03 +00:00
Charlie Marsh	9d35128840	Use Clippy lint table over Cargo config (#490 ) Closes https://github.com/astral-sh/puffin/issues/482.	2023-11-22 15:10:27 +00:00
konsti	7c7daa8f83	Consistent Cargo.toml syntax (#483 ) Remove the last Cargo.toml inconsistencies, see `1526b3458a (r1401083681)`. Now all `[dependencies]` are workspace dependencies.	2023-11-22 08:34:08 +00:00
Zanie Blue	e9b6fb90d6	Bump pubgrub to get range display changes (#444 ) See https://github.com/zanieb/pubgrub/pull/5	2023-11-20 09:12:48 -06:00
Zanie Blue	221751487c	Use `UnusableDependencies` for URL dependency conflicts (#425 ) Extends #424 with support for URL dependency incompatibilities. Requires changes to `miette` to prevent URLs from being word wrapped; accepted upstream in https://github.com/zkat/miette/pull/321	2023-11-17 08:28:12 -06:00
Zanie Blue	0d9d4f9fca	Add an `UnusableDependencies` incompatibility kind and use for conflicting versions (#424 ) Addresses https://github.com/astral-sh/puffin/issues/309#issuecomment-1792648969 Similar to #338, this throws an error when merging versions results in an empty set. Instead of propagating that error, we capture it and return a new dependency type of `Unusable`. Unusable dependencies are a new incompatibility kind which includes an arbitrary "reason" string that we present to the user. Adding a new incompatibility kind requires changes to the vendored pubgrub crate. We could use this same incompatibility kind for conflicting URLs as in #284, which should allow the solver to backtrack to another valid version instead of failing (see #425). Unlike #383, this does not require changes to PubGrub's package mapping model. I think in the long run we'll want PubGrub to accept multiple versions per package to solve this specific issue, but we're interested in it being merged upstream first. This pull request is just using the issue as a simple case to explore adding a new incompatibility type. We may or may not be able to convince them to add this new incompatibility type upstream. As discussed in https://github.com/pubgrub-rs/pubgrub/issues/152, we may want a more general incompatibility kind instead, which can be used for arbitrary problems. An upstream pull request has been opened for discussion at https://github.com/pubgrub-rs/pubgrub/pull/153. Related to: - https://github.com/pubgrub-rs/pubgrub/issues/152 - #338 - #383 --------- Co-authored-by: konsti <konstin@mailbox.org>	2023-11-16 20:02:06 +00:00
Zanie Blue	832058dbba	Switch from vendored PubGrub to a fork (#438 ) A fork will let us stay up to date with the upstream while replaying our work on top of it. I expect a similar workflow to the RustPython-Parser fork we maintained, except that I wrote an automation to create tags for each commit on the fork (https://github.com/zanieb/pubgrub/pull/2) so we do not need to manually tag and document each commit. To update with the upstream: - Rebase our fork's `main` branch on top of the latest changes in upstream's `dev` branch - Force push, overwriting our `main` branch history - Change the commit hash here to the last commit on `main` in our fork Since we automatically tag each commit on the fork, we should never lose the commits that are dropped from `main` during rebase.	2023-11-16 13:49:19 -06:00
konsti	e41ec12239	Option to resolve at a fixed timestamp with `pip-compile --exclude-newer YYYY-MM-DD` (#434 ) This works by filtering out files with a more recent upload time, so if the index you use does not provide upload times, the results might be inaccurate. pypi provides upload times for all files. This is, the field is non-nullable in the warehouse schema, but the simple API PEP does not know this field. If you have only pypi dependencies, this means deterministic, reproducible(!) resolution. We could try doing the same for git repos but it doesn't seem worth the effort, i'd recommend pinning commits since git histories are arbitrarily malleable and also if you care about reproducibility and such you such not use git dependencies but a custom index. Timestamps are given either as RFC 3339 timestamps such as `2006-12-02T02:07:43Z` or as UTC dates in the same format such as `2006-12-02`. Dates are interpreted as including this day, i.e. until midnight UTC that day. Date only is required to make this ergonomic and midnight seems like an ergonomic choice. 
In action for `pandas`: ```console $ target/debug/puffin pip-compile --exclude-newer 2023-11-16 target/pandas.in Resolved 6 packages in 679ms # This file was autogenerated by Puffin v0.0.1 via the following command: # target/debug/puffin pip-compile --exclude-newer 2023-11-16 target/pandas.in numpy==1.26.2 # via pandas pandas==2.1.3 python-dateutil==2.8.2 # via pandas pytz==2023.3.post1 # via pandas six==1.16.0 # via python-dateutil tzdata==2023.3 # via pandas $ target/debug/puffin pip-compile --exclude-newer 2022-11-16 target/pandas.in Resolved 5 packages in 655ms # This file was autogenerated by Puffin v0.0.1 via the following command: # target/debug/puffin pip-compile --exclude-newer 2022-11-16 target/pandas.in numpy==1.23.4 # via pandas pandas==1.5.1 python-dateutil==2.8.2 # via pandas pytz==2022.6 # via pandas six==1.16.0 # via python-dateutil $ target/debug/puffin pip-compile --exclude-newer 2021-11-16 target/pandas.in Resolved 5 packages in 594ms # This file was autogenerated by Puffin v0.0.1 via the following command: # target/debug/puffin pip-compile --exclude-newer 2021-11-16 target/pandas.in numpy==1.21.4 # via pandas pandas==1.3.4 python-dateutil==2.8.2 # via pandas pytz==2021.3 # via pandas six==1.16.0 # via python-dateutil ```	2023-11-16 19:46:17 +00:00
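Here is a sketch of the `--exclude-newer` cutoff semantics described above, assuming upload times are available as Unix seconds. A bare `YYYY-MM-DD` date includes the whole day, so its cutoff is the following midnight UTC. A full timestamp is used as-is. The function names are hypothetical, and only the `Z` (UTC) form of RFC 3339 is handled. Treating the cutoff as exclusive is also an assumption of this sketch. Date arithmetic uses Howard Hinnant's `days_from_civil` algorithm so no date crate is needed.

```rust
/// Days since 1970-01-01 for a proleptic Gregorian date (Howard Hinnant's `days_from_civil`).
fn days_from_civil(year: i64, month: i64, day: i64) -> i64 {
    let y = if month <= 2 { year - 1 } else { year };
    let era = (if y >= 0 { y } else { y - 399 }) / 400;
    let yoe = y - era * 400; // year of era, [0, 399]
    let mp = (month + 9) % 12; // months since March, [0, 11]
    let doy = (153 * mp + 2) / 5 + day - 1; // day of year, [0, 365]
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy; // day of era
    era * 146_097 + doe - 719_468
}

/// Parse `YYYY-MM-DD` (basic range checks only; no per-month validation).
fn parse_date(s: &str) -> Option<(i64, i64, i64)> {
    let mut parts = s.split('-');
    let year: i64 = parts.next()?.parse().ok()?;
    let month: i64 = parts.next()?.parse().ok()?;
    let day: i64 = parts.next()?.parse().ok()?;
    if parts.next().is_some() || !(1..=12).contains(&month) || !(1..=31).contains(&day) {
        return None;
    }
    Some((year, month, day))
}

/// Turn an `--exclude-newer` value into an exclusive cutoff in Unix seconds.
/// Accepts `YYYY-MM-DDTHH:MM:SSZ` or a bare `YYYY-MM-DD`.
pub fn parse_exclude_newer(s: &str) -> Option<i64> {
    match s.split_once('T') {
        Some((date, time)) => {
            let (y, m, d) = parse_date(date)?;
            let mut hms = time.strip_suffix('Z')?.split(':');
            let h: i64 = hms.next()?.parse().ok()?;
            let min: i64 = hms.next()?.parse().ok()?;
            let sec: i64 = hms.next()?.parse().ok()?;
            if hms.next().is_some() {
                return None;
            }
            Some(days_from_civil(y, m, d) * 86_400 + h * 3_600 + min * 60 + sec)
        }
        // A bare date includes that whole day, so the cutoff is the next midnight UTC.
        None => {
            let (y, m, d) = parse_date(s)?;
            Some((days_from_civil(y, m, d) + 1) * 86_400)
        }
    }
}

/// Keep only files uploaded strictly before the cutoff.
pub fn exclude_newer<'a>(files: &[(&'a str, i64)], cutoff: i64) -> Vec<&'a str> {
    files
        .iter()
        .filter(|(_, uploaded)| *uploaded < cutoff)
        .map(|(name, _)| *name)
        .collect()
}
```

For example, `1999-12-31` and `2000-01-01T00:00:00Z` yield the same cutoff, since the date form covers the entire day.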
Charlie Marsh	0af2f7e39f	Use `anstream` to avoid writing colorized output (#415 ) A more robust solution to avoiding colorized output by ensuring we write to `stdout` and `stderr` via the [`anstream`](https://docs.rs/anstream/latest/anstream/) crate. Closes https://github.com/astral-sh/puffin/issues/393.	2023-11-13 20:00:12 +00:00
konsti	5cef40d87a	Add proper caching for pypi metadata fetching kinds (#368 ) I intend this to become the main form of caching for puffin: you make HTTP requests, you transform the data into what you really need, you have control over the cache key, and the cache is always JSON (or anything much faster we want to replace it with, as long as it's serde!)	2023-11-10 11:03:40 +00:00
Charlie Marsh	2c114592bd	Only store small wheels in-memory (#348 ) Closes https://github.com/astral-sh/puffin/issues/246.	2023-11-07 00:50:00 +00:00
konsti	9b077f3d0f	`cargo upgrade --incompatible` (#330 ) Ran `cargo upgrade --incompatible`; it seems no changes are required. From cacache 0.12.0: > BREAKING CHANGE: some signatures for copy have changed, and copy no longer automatically reflinks. `which` 5.0.0 seems to have only error-message changes.	2023-11-06 14:14:47 +00:00
konsti	b2439b24a1	Fetch wheel metadata by async range requests on the remote wheel (#301 ) Use range requests and async zip to extract the METADATA file from a remote wheel. We currently only cache when the remote declares the resource as immutable; see https://github.com/06chaynes/http-cache/issues/57 and https://github.com/baszalmstra/async_http_range_reader/pull/1 . The cache is stored as JSON with the description omitted, which improves cache deserialization performance.	2023-11-06 15:06:49 +01:00
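The range-request approach relies on the zip layout. The End of Central Directory (EOCD) record sits at the end of the archive and records where the central directory starts and how large it is. A client can therefore fetch a suffix of the file, locate the EOCD, and then request only the central directory (and afterwards only the METADATA entry). The sketch below covers just the first step. It ignores ZIP64 and is not how `async_http_range_reader` or `async_zip` are implemented. `central_directory_range`, `suffix_range`, and `range_header` are hypothetical helpers.

```rust
/// Signature of the zip End of Central Directory record (`PK\x05\x06`).
const EOCD_SIGNATURE: [u8; 4] = [0x50, 0x4b, 0x05, 0x06];
/// Size of the EOCD record without its trailing comment.
const EOCD_MIN_LEN: usize = 22;

fn u16_at(bytes: &[u8], at: usize) -> u16 {
    u16::from_le_bytes([bytes[at], bytes[at + 1]])
}

fn u32_at(bytes: &[u8], at: usize) -> u32 {
    u32::from_le_bytes([bytes[at], bytes[at + 1], bytes[at + 2], bytes[at + 3]])
}

/// HTTP `Range` value for the last `n` bytes of a resource.
pub fn suffix_range(n: u64) -> String {
    format!("bytes=-{n}")
}

/// HTTP `Range` value for `len` (non-zero) bytes starting at `start`; the end is inclusive.
pub fn range_header(start: u64, len: u64) -> String {
    format!("bytes={}-{}", start, start + len - 1)
}

/// Given the tail of a zip archive, return `(offset, size)` of its central directory.
/// Scans backwards, since the EOCD record may be followed by a variable-length comment.
pub fn central_directory_range(tail: &[u8]) -> Option<(u64, u64)> {
    if tail.len() < EOCD_MIN_LEN {
        return None;
    }
    for i in (0..=tail.len() - EOCD_MIN_LEN).rev() {
        if tail[i..i + 4] != EOCD_SIGNATURE {
            continue;
        }
        // The comment length must account exactly for the remaining bytes.
        let comment_len = usize::from(u16_at(tail, i + 20));
        if i + EOCD_MIN_LEN + comment_len != tail.len() {
            continue;
        }
        let size = u64::from(u32_at(tail, i + 12));
        let offset = u64::from(u32_at(tail, i + 16));
        return Some((offset, size));
    }
    None
}
```

A client would first send `suffix_range(n)` for some guessed `n`, then `range_header(offset, size)` for the central directory.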
konsti	b79a15b458	Update pyproject-toml to 0.8.0 (#329 )	2023-11-06 13:16:36 +00:00
Charlie Marsh	62c474d880	Add support for Git dependencies (#283 ) ## Summary This PR adds support for Git dependencies, like: ``` flask @ git+https://github.com/pallets/flask.git ``` Right now, they're only supported in the resolver (and not the installer), since the installer doesn't yet support source distributions at all. The general approach here is based on Cargo's Git implementation. Specifically, I adapted Cargo's [`git`](`23eb492cf9/src/cargo/sources/git/mod.rs`) module to perform the cloning, which is based on `libgit2`. As compared to Cargo's implementation, I made the following changes: - Removed any unnecessary code. - Fixed any Clippy errors for our stricter ruleset. - Removed the dependency on `curl`, in favor of `reqwest` which we use elsewhere. - Removed the ability to use `gix`. Cargo allows the use of `gix` as an experimental flag, but it only supports a small subset of the operations. When Cargo fully adopts `gix`, we should plan to do the same. - Removed Cargo's host key checking. We need to re-add this! I'll do it shortly. - Removed Cargo's progress bars. We should re-add this too, but we use `indicatif` and Cargo had their own thing. There are a few follow-ups to consider: - Adding support in the installer. - When we lock, we should write out the Git URL that includes the exact SHA. This lets us cache in perpetuity and avoids dependencies changing without re-locking. - When we resolve, we should _always_ try to refresh Git dependencies. (Right now, we skip if the wheel was already built.) I'll work on the latter two in follow-up PRs. Closes #202.	2023-11-02 15:14:55 +00:00
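This sketch splits a `git+` requirement URL into the repository and an optional revision after a trailing `@`. It also shows the follow-up idea of writing out the exact SHA when locking. `GitUrl`, `parse_git_url`, and `pinned` are hypothetical. The sketch rejects revisions containing `/` so that `git@host` in SSH URLs isn't mistaken for a revision. As a consequence, branch names with slashes aren't supported here.

```rust
/// A `git+` requirement URL split into its repository and optional revision.
#[derive(Debug, PartialEq, Eq)]
pub struct GitUrl<'a> {
    pub repository: &'a str,
    pub rev: Option<&'a str>,
}

/// Parse `git+<repository>[@<rev>]`; returns `None` for non-`git+` URLs.
pub fn parse_git_url(url: &str) -> Option<GitUrl<'_>> {
    let rest = url.strip_prefix("git+")?;
    match rest.rsplit_once('@') {
        // A `/` after the last `@` means the `@` belonged to the authority
        // (e.g. `ssh://git@github.com/...`), not to a revision.
        Some((repository, rev)) if !rev.is_empty() && !rev.contains('/') => Some(GitUrl {
            repository,
            rev: Some(rev),
        }),
        _ => Some(GitUrl {
            repository: rest,
            rev: None,
        }),
    }
}

impl GitUrl<'_> {
    /// Render the URL pinned to an exact commit SHA, e.g. for a lockfile.
    pub fn pinned(&self, sha: &str) -> String {
        format!("git+{}@{}", self.repository, sha)
    }
}
```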
konsti	4adaa9a700	Wheel filename distribution package name (#278 ) The normalized name abstractions were not used consistently; this PR uses them where they were previously missing: * `WheelFilename::distribution` * `Requirement::name` * `Requirement::extras` * `Metadata21::name` * `Metadata21::provides_dist` Since `puffin-package` depends on `pep508_rs`, this would be a cyclical crate dependency, so `puffin-normalize` is split out from `puffin-package`. `DistInfoName` has the same task and semantics as `PackageName`, so it's merged into the latter. `PackageName` and `ExtraName` documentation is moved onto the type, and their constructors are called `new` instead of `normalize`. We now use these constructors rarely enough that the implicit allocation by `to_string()` shouldn't matter anymore, while more of the actual cloning becomes visible.	2023-11-02 11:15:27 +00:00
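PEP 503 defines the normalized form of a project name: lowercase it and replace every run of `-`, `_`, and `.` with a single `-`. Below is a minimal sketch of a `PackageName::new` constructor that normalizes on construction. It assumes `PackageName` wraps a `String`; the real `puffin-normalize` type may differ.

```rust
/// A package name normalized per PEP 503 (hypothetical sketch).
#[derive(Debug, Clone, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct PackageName(String);

impl PackageName {
    /// Lowercase the name and collapse each run of `-`, `_`, `.` into a single `-`.
    pub fn new(name: &str) -> Self {
        let mut normalized = String::with_capacity(name.len());
        let mut in_separator_run = false;
        for c in name.chars() {
            if matches!(c, '-' | '_' | '.') {
                if !in_separator_run {
                    normalized.push('-');
                }
                in_separator_run = true;
            } else {
                normalized.extend(c.to_lowercase());
                in_separator_run = false;
            }
        }
        PackageName(normalized)
    }

    pub fn as_str(&self) -> &str {
        &self.0
    }
}
```

Normalizing inside the constructor means two `PackageName` values compare equal whenever PEP 503 says the names refer to the same project.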
Charlie Marsh	2652caa3e3	Add support for URL dependencies (#251 ) ## Summary This PR adds support for resolving and installing dependencies via direct URLs, like: ``` werkzeug @ `960bb4017c`4aed12b5ed8b78e0153e/Werkzeug-2.0.0-py3-none-any.whl ``` These are fairly common (e.g., with `torch`), but you most often see them as Git dependencies. Broadly, structs like `RemoteDistribution` and friends are now enums that can represent either registry-based dependencies or URL-based dependencies: ```rust /// A built distribution (wheel) that exists as a remote file (e.g., on `PyPI`). #[derive(Debug, Clone)] #[allow(clippy::large_enum_variant)] pub enum RemoteDistribution { /// The distribution exists in a registry, like `PyPI`. Registry(PackageName, Version, File), /// The distribution exists at an arbitrary URL. Url(PackageName, Url), } ``` In the resolver, we now allow packages to take on an extra, optional `Url` field: ```rust #[derive(Debug, Clone, Eq, Derivative)] #[derivative(PartialEq, Hash)] pub enum PubGrubPackage { Root, Package( PackageName, Option<DistInfoName>, #[derivative(PartialEq = "ignore")] #[derivative(PartialOrd = "ignore")] #[derivative(Hash = "ignore")] Option<Url>, ), } ``` However, for the purpose of version satisfaction, we ignore the URL. This allows the URL dependency to satisfy the transitive request in cases like: ``` flask==3.0.0 werkzeug @ `254c3e9b5f`5941e900b71206e6313b/werkzeug-3.0.1-py3-none-any.whl ``` There are a couple of limitations in the current approach: - The caching for remote URLs is done separately in the resolver vs. the installer. I decided not to sweat this too much... We need to figure out caching holistically. - We don't support any sort of time-based cache for remote URLs -- they just exist forever. This will be a problem for URL dependencies, where we need some way to evict and refresh them. But I've deferred it for now. 
- I think I need to redo how this is modeled in the resolver, because right now, we don't detect a variety of invalid cases, e.g., providing two different URLs for a dependency, asking for a URL dependency and a _different version_ of the same dependency in the list of first-party dependencies, etc. - (We don't yet support VCS dependencies.)	2023-11-01 09:21:44 -04:00
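The derivative attributes above exclude the `Url` field from equality and hashing. The same effect can be written by hand with the standard library, as in this sketch. Here `String` stands in for `PackageName`, `DistInfoName`, and `Url`, so this is an illustration rather than the actual `PubGrubPackage`. Keeping `Hash` consistent with `PartialEq` matters: two values that compare equal must hash identically, so the URL is skipped in both.

```rust
use std::hash::{Hash, Hasher};

/// Sketch of a resolver package whose URL is ignored for equality and hashing.
/// The fields stand in for `PackageName`, `Option<DistInfoName>`, and `Option<Url>`.
#[derive(Debug, Clone)]
pub enum PubGrubPackage {
    Root,
    Package(String, Option<String>, Option<String>),
}

impl PartialEq for PubGrubPackage {
    fn eq(&self, other: &Self) -> bool {
        match (self, other) {
            (Self::Root, Self::Root) => true,
            // Compare name and extra; deliberately skip the URL.
            (Self::Package(name_a, extra_a, _), Self::Package(name_b, extra_b, _)) => {
                name_a == name_b && extra_a == extra_b
            }
            _ => false,
        }
    }
}

impl Eq for PubGrubPackage {}

impl Hash for PubGrubPackage {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Hash exactly the fields that `eq` compares, so equal values hash equally.
        std::mem::discriminant(self).hash(state);
        if let Self::Package(name, extra, _) = self {
            name.hash(state);
            extra.hash(state);
        }
    }
}
```

With these impls, a `werkzeug` package coming from a URL and one coming from the registry collapse to a single entry in a `HashSet`. That is what lets the URL dependency satisfy the transitive request.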
Charlie Marsh	3312ce30f5	Upgrade crates and remove unused dependencies (#256 )	2023-10-31 13:16:58 -04:00
konsti	29bd0a4ed8	Fix musl compilation (#234 ) musl (which we already use in ruff) allows statically linked binaries on Linux. This PR switches to rustls, and vendors and fixes the glibc detection. Using static musl builds makes it easier to avoid glibc errors in Docker, and we'll need it later for Alpine users anyway. An alternative is using vendored OpenSSL.	2023-10-30 18:10:17 +01:00
konsti	5ad58474ca	Add script to check the top 8k pypi packages (#198 ) To check the top 1k (current state): ```bash scripts/resolve/get_pypi_top_8k.sh cargo run --bin puffin-dev -- resolve-many scripts/resolve/pypi_top_8k_flat.txt --limit 1000 ``` Results: ``` Errors: pywin32, geoip2, maxminddb, pypika, dirac Success: 995, Error: 5 ``` pywin32 has no solution for the build environment, 3 have no `[build-system]` entry in pyproject.toml, and `dirac` is missing CMake.	2023-10-26 12:03:59 +00:00
Charlie Marsh	49a27ff33c	Add support for parameterized link modes (#164 ) Allows the user to select between clone, hardlink, and copy semantics for installs. (The pnpm documentation has a decent description of what these mean: https://pnpm.io/npmrc#package-import-method.) Closes #159.	2023-10-22 04:35:50 +00:00
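This sketch shows how a link-mode setting might dispatch per file using only `std::fs`. `LinkMode` and `link_file` are hypothetical names. The standard library has no reflink API, which requires platform calls such as `clonefile` on macOS. `Clone` therefore falls back to a plain copy here. `Hardlink` also falls back to a copy when linking fails, for example across filesystems; that fallback is a design assumption of this sketch.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// How files are placed from the cache into the environment (hypothetical sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LinkMode {
    /// Copy-on-write clone (reflink).
    Clone,
    /// Hard link to the cached file.
    Hardlink,
    /// Full byte-for-byte copy.
    Copy,
}

impl LinkMode {
    pub fn link_file(self, from: &Path, to: &Path) -> io::Result<()> {
        match self {
            // `std` has no reflink API; a real implementation would call into the
            // platform (e.g. `clonefile` on macOS). This sketch just copies.
            LinkMode::Clone => fs::copy(from, to).map(|_| ()),
            // Hard links fail across filesystems; fall back to copying if linking fails.
            LinkMode::Hardlink => {
                fs::hard_link(from, to).or_else(|_| fs::copy(from, to).map(|_| ()))
            }
            LinkMode::Copy => fs::copy(from, to).map(|_| ()),
        }
    }
}
```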
Charlie Marsh	4645f79237	Use `FxHash` (#151 )	2023-10-20 05:26:06 +00:00
Charlie Marsh	d5105a76c5	Improve and test diagnostics for requirements-reading CLI commands (#143 ) Also removes `owo_colors` because it was really painful to get it to avoid printing colors during tests.	2023-10-19 18:13:40 -04:00
Charlie Marsh	7bc42ca2ce	Use `owo_colors` instead of `colored` (#121 ) This is what `miette` uses so seems better to avoid two coloring crates.	2023-10-18 18:57:07 +00:00
Charlie Marsh	1fc03780f9	Use `miette` for `puffin add` diagnostics (#119 ) Experiment in using `miette` for better user-facing diagnostics in the CLI crate: (screenshot: https://github.com/astral-sh/puffin/assets/1309177/30299da0-da65-4972-944f-cb8cc5f72a77) For now, only the `add` command has been migrated, and all the library crates continue to use `anyhow`.	2023-10-18 14:24:09 -04:00
Charlie Marsh	4c87a1d42c	Add a `puffin add` command (#117 ) This needs far better error handling and user-facing feedback, but it does the basic operation (and includes discovery of the `pyproject.toml` file, etc.).	2023-10-18 00:51:20 -04:00
konsti	fa2fd14587	Add basic sdist builder (#104 ) This adds a basic sdist builder that has been tested with two source distributions, one with a PEP 517 backend and one with setup.py. It currently uses pip to install requirements, lacks testing in all directions, lacks checks for recursive requirements, can't pass in already-resolved versions, and doesn't support `prepare_metadata_for_build_wheel` to allow resolution to continue without doing the actual (native) build; its error messages are also mediocre, etc. ```console $ RUST_LOG=puffin_build=debug puffin-build --wheels wheels downloads/tqdm-4.66.1.tar.gz 2023-10-16T12:28:35.503182Z DEBUG build_sdist{path="downloads/tqdm-4.66.1.tar.gz" base_python="/usr/bin/python3"}: puffin_build: Building downloads/tqdm-4.66.1.tar.gz 2023-10-16T12:28:35.521780Z INFO build_sdist{path="downloads/tqdm-4.66.1.tar.gz" base_python="/usr/bin/python3"}:extract_archive: puffin_build: close time.busy=18.4ms time.idle=16.7µs 2023-10-16T12:28:35.845096Z DEBUG build_sdist{path="downloads/tqdm-4.66.1.tar.gz" base_python="/usr/bin/python3"}:resolve_and_install: puffin_build: Calling pip to install build dependencies 2023-10-16T12:28:37.668660Z INFO build_sdist{path="downloads/tqdm-4.66.1.tar.gz" base_python="/usr/bin/python3"}:resolve_and_install: puffin_build: close time.busy=1.82s time.idle=13.2µs 2023-10-16T12:28:37.668744Z DEBUG build_sdist{path="downloads/tqdm-4.66.1.tar.gz" base_python="/usr/bin/python3"}: puffin_build: Calling `setuptools.build_meta.get_requires_for_build_wheel()` 2023-10-16T12:28:38.159205Z INFO build_sdist{path="downloads/tqdm-4.66.1.tar.gz" base_python="/usr/bin/python3"}:run_python_script{python_interpreter="/tmp/.tmpm4cTra/venv/bin/python"}: puffin_build: close time.busy=490ms time.idle=13.0µs 2023-10-16T12:28:38.159304Z DEBUG build_sdist{path="downloads/tqdm-4.66.1.tar.gz" base_python="/usr/bin/python3"}: puffin_build: Calling `setuptools.build_meta.build_wheel()` 2023-10-16T12:28:38.501732Z INFO 
build_sdist{path="downloads/tqdm-4.66.1.tar.gz" base_python="/usr/bin/python3"}:run_python_script{python_interpreter="/tmp/.tmpm4cTra/venv/bin/python"}: puffin_build: close time.busy=342ms time.idle=15.2µs 2023-10-16T12:28:38.522700Z INFO build_sdist{path="downloads/tqdm-4.66.1.tar.gz" base_python="/usr/bin/python3"}: puffin_build: close time.busy=3.02s time.idle=16.2µs Wheel built to /home/konsti/projects/puffin/crates/puffin-build/wheels/tqdm-4.66.1-py3-none-any.whl 2023-10-16T12:28:38.522772Z DEBUG puffin_build: Took 3020ms $ puffin-build --wheels wheels downloads/geoextract-0.3.1.tar.gz 2023-10-16T12:28:40.884622Z DEBUG build_sdist{path="downloads/geoextract-0.3.1.tar.gz" base_python="/usr/bin/python3"}: puffin_build: Building downloads/geoextract-0.3.1.tar.gz 2023-10-16T12:28:40.887743Z INFO build_sdist{path="downloads/geoextract-0.3.1.tar.gz" base_python="/usr/bin/python3"}:extract_archive: puffin_build: close time.busy=2.97ms time.idle=12.6µs 2023-10-16T12:28:41.469738Z INFO build_sdist{path="downloads/geoextract-0.3.1.tar.gz" base_python="/usr/bin/python3"}: puffin_build: close time.busy=585ms time.idle=15.3µs Wheel built to /home/konsti/projects/puffin/crates/puffin-build/wheels/geoextract-0.3.1-py3-none-any.whl 2023-10-16T12:28:41.469814Z DEBUG puffin_build: Took 585ms ```	2023-10-16 12:43:31 +00:00
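The log shows the builder calling the PEP 517 hooks `get_requires_for_build_wheel()` and then `build_wheel()` through a Python subprocess. This sketch generates such a hook script from a `build-backend` string. It also handles the `module:object` form that PEP 517 allows. `hook_script` is hypothetical, not puffin's code. Note that `args` is spliced in as a raw Python expression, so it must come from trusted code.

```rust
/// Build a Python snippet that imports a PEP 517 backend and calls one hook.
/// `build_backend` may be `module` or `module:object`; `args` is a raw Python
/// argument list and must come from trusted code.
pub fn hook_script(build_backend: &str, hook: &str, args: &str) -> String {
    let (module, object) = match build_backend.split_once(':') {
        Some((module, object)) => (module, Some(object)),
        None => (build_backend, None),
    };
    let mut script = format!("import {module} as backend\n");
    if let Some(object) = object {
        // PEP 517 allows an object path after the colon, e.g. `setuptools.build_meta:__legacy__`.
        script.push_str(&format!("backend = backend.{object}\n"));
    }
    script.push_str(&format!("print(backend.{hook}({args}))\n"));
    script
}
```

The resulting string would be passed to the interpreter inside the build virtualenv, for example the `/tmp/.../venv/bin/python` seen in the log.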
Charlie Marsh	471a1d657d	Migrate resolver proof-of-concept to PubGrub (#97 ) ## Summary This PR enables the proof-of-concept resolver to backtrack by way of using the `pubgrub-rs` crate. Rather than using PubGrub as a _framework_ (implementing the `DependencyProvider` trait, letting PubGrub call us), I've instead copied over PubGrub's primary solver hook (which is only ~100 lines or so) and modified it for our purposes (e.g., made it async). There's a lot to improve here, but it's a start that will let us understand PubGrub's appropriateness for this problem space. A few observations: - In simple cases, the resolver is slower than our current (naive) resolver. I think it's just that the pipelining isn't as efficient as in the naive case, where we can just stream package and version fetches concurrently without any bottlenecks. - A lot of the code here relates to bridging PubGrub with our own abstractions -- so we need a `PubGrubPackage`, a `PubGrubVersion`, etc.	2023-10-15 22:05:44 -04:00
konstin	a8d020f53c	Add `profiling` profile for profilers	2023-10-13 11:51:07 +02:00
Charlie Marsh	496cb7b2ef	Migrate to `requirements_txt.rs` (#90 ) Remove the parser I wrote in favor of Konsti's which is much more complete. The only change vs. the version in `poc-monotrail` is that I changed the tests to use insta rather than manually storing and comparing against JSON snapshots. Closes https://github.com/astral-sh/puffin/issues/89.	2023-10-12 17:09:00 +00:00
Charlie Marsh	a0294a510c	Rework `puffin sync` output to summarize (#81 ) This also moves away from using `tracing` for user-facing logging, instead introducing a new `Printer` abstraction. Closes #66.	2023-10-10 03:29:09 +00:00
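A hypothetical sketch of a `Printer` abstraction for user-facing output. Unlike `tracing`, it is controlled only by an output mode. The `Default`/`Quiet` variants and the `write_line` method are assumptions, not puffin's actual API.

```rust
use std::io::{self, Write};

/// Where user-facing messages go, independent of `tracing` log levels (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Printer {
    /// Print summaries for the user.
    Default,
    /// Suppress all user-facing output.
    Quiet,
}

impl Printer {
    /// Write one line to `out` unless the printer is quiet.
    pub fn write_line(self, out: &mut impl Write, message: &str) -> io::Result<()> {
        match self {
            Printer::Default => writeln!(out, "{message}"),
            Printer::Quiet => Ok(()),
        }
    }
}
```

Taking any `impl Write` lets the CLI pass a locked `stderr` while tests capture output in a `Vec<u8>`.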
Charlie Marsh	ba2b200fce	Enable release builds via `cargo-dist` (#79 )	2023-10-09 20:48:55 +00:00
Charlie Marsh	0ca17a1cf2	Use local copy of `gourgeist` (#62 ) This PR gets `gourgeist` passing our local CI and integrated into the broader workspace. There's some duplication between concepts in `gourgeist` (like the `InterpreterInfo`) and structs we have elsewhere, but we can tackle those later.	2023-10-08 18:45:08 +00:00
Charlie Marsh	2a846e76b7	Store unzipped wheels in a cache (#49 ) This PR massively speeds up the case in which you need to install wheels that already exist in the global cache. The new strategy is as follows: - Download the wheel into the content-addressed cache. - Unzip the wheel into the cache, but ignore content-addressing. It turns out that writing to `cacache` for every file in the zip added a ton of overhead, and I don't see any actual advantages to doing so. Instead, we just unzip the contents into a directory at, e.g., `~/.cache/puffin/django-4.1.5`. - (The unzip itself is now parallelized with Rayon.) - When installing the wheel, we now support unzipping from a directory instead of a zip archive. This required duplicating and tweaking a few functions. - When installing the wheel, we now use reflinks (or copy-on-write links). These have a few fantastic properties: (1) they're extremely cheap to create (on macOS, they are allegedly faster than hard links); (2) they minimize disk space, since we avoid copying files entirely in the vast majority of cases; and (3) if the user then edits a file locally, the cache doesn't get polluted. Orogene, Bun, and soon pnpm all use reflinks. Puffin is now ~15x faster than `pip` for the common case of installing cached data into a fresh environment. Closes https://github.com/astral-sh/puffin/issues/21. Closes https://github.com/astral-sh/puffin/issues/39.	2023-10-08 04:04:48 +00:00
Charlie Marsh	f3015ffc1f	Add a `clean` command to clear the cache (#41 )	2023-10-07 15:19:03 +00:00
Charlie Marsh	ae28552b3a	Use local copy of `install-wheel-rs` (#34 ) This PR modifies `install-wheel-rs` (and a few other crates) to get everything playing nicely. Specifically, CI should pass, and all these crates now use workspace dependencies between one another. As part of this change, I split out the wheel name parsing into its own `wheel-filename` crate, and the compatibility tag parsing into its own `platform-tags` crate.	2023-10-07 01:43:55 +00:00
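PEP 427 names wheels `{distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl`, and each tag field may be a `.`-separated compressed set (e.g. `py2.py3`). This is a minimal sketch of the kind of parsing a `wheel-filename` crate performs. It also expands the compressed tags into the triples that compatibility checks compare against. `WheelFilename::parse` and `tag_triples` are hypothetical. They skip validation such as requiring the build tag to start with a digit.

```rust
/// Components of a wheel filename per PEP 427 (hypothetical sketch).
#[derive(Debug, PartialEq, Eq)]
pub struct WheelFilename {
    pub distribution: String,
    pub version: String,
    pub build_tag: Option<String>,
    pub python_tags: Vec<String>,
    pub abi_tags: Vec<String>,
    pub platform_tags: Vec<String>,
}

impl WheelFilename {
    /// Parse `{dist}-{version}(-{build})?-{python}-{abi}-{platform}.whl`.
    pub fn parse(filename: &str) -> Option<Self> {
        let stem = filename.strip_suffix(".whl")?;
        let parts: Vec<&str> = stem.split('-').collect();
        let (distribution, version, build_tag, python, abi, platform) = match parts.as_slice() {
            [d, v, py, abi, plat] => (*d, *v, None, *py, *abi, *plat),
            [d, v, build, py, abi, plat] => (*d, *v, Some(*build), *py, *abi, *plat),
            _ => return None,
        };
        // Each tag field may be a compressed set such as `py2.py3`.
        let expand = |field: &str| field.split('.').map(str::to_string).collect::<Vec<_>>();
        Some(WheelFilename {
            distribution: distribution.to_string(),
            version: version.to_string(),
            build_tag: build_tag.map(str::to_string),
            python_tags: expand(python),
            abi_tags: expand(abi),
            platform_tags: expand(platform),
        })
    }

    /// Every `(python, abi, platform)` triple the filename claims to support.
    pub fn tag_triples(&self) -> Vec<(&str, &str, &str)> {
        let mut triples = Vec::new();
        for py in &self.python_tags {
            for abi in &self.abi_tags {
                for plat in &self.platform_tags {
                    triples.push((py.as_str(), abi.as_str(), plat.as_str()));
                }
            }
        }
        triples
    }
}
```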
Charlie Marsh	c8477991a9	Use local versions of PEP 440 and PEP 508 crates (#32 ) This PR modifies the PEP 440 and PEP 508 crates to pass CI, primarily by fixing all lint violations. We're also now using these crates in the workspace via `path`. (Previously, we were still fetching them from Cargo.)	2023-10-07 00:16:44 +00:00
Charlie Marsh	dd26cfa0cc	Migrate to `tokio` (#27 ) Closes https://github.com/astral-sh/puffin/issues/26.	2023-10-06 20:31:03 +00:00
Charlie Marsh	ca6aa207ff	Move to workspace dependencies (#25 )	2023-10-06 19:49:41 +00:00
Charlie Marsh	2d6266b167	Add an HTTP cache (and `--no-cache` argument) (#14 ) Closes https://github.com/astral-sh/puffin/issues/3.	2023-10-05 19:14:05 -04:00
Charlie Marsh	b059c590c4	Add basic CI via GitHub Actions (#10 ) Closes https://github.com/astral-sh/puffin/issues/1.	2023-10-05 13:42:58 -04:00
Charlie Marsh	8b9ac30507	Add license, Cargo.toml, etc.	2023-10-05 12:45:38 -04:00


441 Commits