This commit attempts an optimization that switches a version's `release`
field over to a `smallvec` optimization. The idea is that most versions
are very small and can be stored inline.
Interestingly, I was unable to observe any obvious benefit:
$ hyperfine \
"./target/profiling/puffin-dev-u32 resolve-many --cache-dir cache-docker-no-build --no-build pypi_top_8k_flat.txt --limit 1000 2> /dev/null" \
"./target/profiling/puffin-dev-smallvec-release resolve-many --cache-dir cache-docker-no-build --no-build pypi_top_8k_flat.txt --limit 1000 2> /dev/null"
Benchmark 1: ./target/profiling/puffin-dev-u32 resolve-many --cache-dir cache-docker-no-build --no-build pypi_top_8k_flat.txt --limit 1000 2> /dev/null
Time (mean ± σ): 872.2 ms ± 26.5 ms [User: 14646.0 ms, System: 2516.0 ms]
Range (min … max): 833.0 ms … 912.0 ms 10 runs
Benchmark 2: ./target/profiling/puffin-dev-smallvec-release resolve-many --cache-dir cache-docker-no-build --no-build pypi_top_8k_flat.txt --limit 1000 2> /dev/null
Time (mean ± σ): 882.3 ms ± 17.4 ms [User: 14764.4 ms, System: 2520.9 ms]
Range (min … max): 859.7 ms … 912.7 ms 10 runs
Summary
'./target/profiling/puffin-dev-u32 resolve-many --cache-dir cache-docker-no-build --no-build pypi_top_8k_flat.txt --limit 1000 2> /dev/null' ran
1.01 ± 0.04 times faster than './target/profiling/puffin-dev-smallvec-release resolve-many --cache-dir cache-docker-no-build --no-build pypi_top_8k_flat.txt --limit 1000 2> /dev/null'
My hypothesis is that because of an earlier commit that switched the
global allocator to jemalloc, the cost of allocation had precipitously
decreased. To the point that the reduction in allocs from the smallvec
becomes a wash. To test my hypothesis, I dropped the jemalloc commit and
measured the perf of the smallvec optimization against main:
$ hyperfine \
"./target/profiling/puffin-dev-main resolve-many --cache-dir cache-docker-no-build --no-build pypi_top_8k_flat.txt --limit 1000 2> /dev/null" \
"./target/profiling/puffin-dev-smallvec-release-no-jemalloc resolve-many --cache-dir cache-docker-no-build --no-build pypi_top_8k_flat.txt --limit 1000 2> /dev/null"
Benchmark 1: ./target/profiling/puffin-dev-main resolve-many --cache-dir cache-docker-no-build --no-build pypi_top_8k_flat.txt --limit 1000 2> /dev/null
Time (mean ± σ): 968.0 ms ± 20.0 ms [User: 17637.4 ms, System: 2151.9 ms]
Range (min … max): 940.2 ms … 1005.3 ms 10 runs
Benchmark 2: ./target/profiling/puffin-dev-smallvec-release-no-jemalloc resolve-many --cache-dir cache-docker-no-build --no-build pypi_top_8k_flat.txt --limit 1000 2> /dev/null
Time (mean ± σ): 958.4 ms ± 15.7 ms [User: 17119.7 ms, System: 2246.1 ms]
Range (min … max): 944.7 ms … 993.3 ms 10 runs
Summary
'./target/profiling/puffin-dev-smallvec-release-no-jemalloc resolve-many --cache-dir cache-docker-no-build --no-build pypi_top_8k_flat.txt --limit 1000 2> /dev/null' ran
1.01 ± 0.03 times faster than './target/profiling/puffin-dev-main resolve-many --cache-dir cache-docker-no-build --no-build pypi_top_8k_flat.txt --limit 1000 2> /dev/null'
Fiddlesticks. Even when allocation is (presumably) more expensive, the
smallvec optimization didn't help. This suggests something is off about
my mental model of the code. So there are more avenues to explore here!
It's not clear whether these routines are used in any hot path so I
don't know whether this will have a perf improvement, but there's really
no reason not to do this.
I did this change for two reasons. Firstly, `usize` should generally
only be used for something that is indexing memory. This is why it
varies in size depending on the target. Since version numbers aren't
really things that we use to index memory, we should be using a fixed
size integer type. Secondly, I shrunk it down to `u32` because it
doesn't really seem like we need to support version numbers bigger than
2^32. In theory, we could even take this down to 2^16 via `u16`, but
that seems... within the realm of possibility? The actual reason to
shrink it is to make `Version` take up less space, allocate less memory
and (hopefully) increase the effectiveness of a small data optimization.
Comparing this with the latest change there is a small improvement:
$ hyperfine \
"./target/profiling/puffin-dev-jemalloc resolve-many --cache-dir cache-docker-no-build --no-build pypi_top_8k_flat.txt --limit 1000 2> /dev/null" \
"./target/profiling/puffin-dev-u32 resolve-many --cache-dir cache-docker-no-build --no-build pypi_top_8k_flat.txt --limit 1000 2> /dev/null"
Benchmark 1: ./target/profiling/puffin-dev-jemalloc resolve-many --cache-dir cache-docker-no-build --no-build pypi_top_8k_flat.txt --limit 1000 2> /dev/null
Time (mean ± σ): 893.2 ms ± 19.5 ms [User: 14709.5 ms, System: 2463.3 ms]
Range (min … max): 858.4 ms … 914.1 ms 10 runs
Benchmark 2: ./target/profiling/puffin-dev-u32 resolve-many --cache-dir cache-docker-no-build --no-build pypi_top_8k_flat.txt --limit 1000 2> /dev/null
Time (mean ± σ): 866.9 ms ± 13.9 ms [User: 14745.4 ms, System: 2424.8 ms]
Range (min … max): 850.1 ms … 896.4 ms 10 runs
Summary
'./target/profiling/puffin-dev-u32 resolve-many --cache-dir cache-docker-no-build --no-build pypi_top_8k_flat.txt --limit 1000 2> /dev/null' ran
1.03 ± 0.03 times faster than './target/profiling/puffin-dev-jemalloc resolve-many --cache-dir cache-docker-no-build --no-build pypi_top_8k_flat.txt --limit 1000 2> /dev/null'
The idea is that this might unlock better improvements later. (Spoiler
alert: it didn't. But we keep this change anyway because it seems like
good sense.)
This copies the allocator configuration used in the Ruff project. In
particular, this gives us an instant 10% win when resolving the top 1K
PyPI packages:
$ hyperfine \
"./target/profiling/puffin-dev-main resolve-many --cache-dir cache-docker-no-build --no-build pypi_top_8k_flat.txt --limit 1000 2> /dev/null" \
"./target/profiling/puffin-dev resolve-many --cache-dir cache-docker-no-build --no-build pypi_top_8k_flat.txt --limit 1000 2> /dev/null"
Benchmark 1: ./target/profiling/puffin-dev-main resolve-many --cache-dir cache-docker-no-build --no-build pypi_top_8k_flat.txt --limit 1000 2> /dev/null
Time (mean ± σ): 974.2 ms ± 26.4 ms [User: 17503.3 ms, System: 2205.3 ms]
Range (min … max): 943.5 ms … 1015.9 ms 10 runs
Benchmark 2: ./target/profiling/puffin-dev resolve-many --cache-dir cache-docker-no-build --no-build pypi_top_8k_flat.txt --limit 1000 2> /dev/null
Time (mean ± σ): 883.1 ms ± 23.3 ms [User: 14626.1 ms, System: 2542.2 ms]
Range (min … max): 849.5 ms … 916.9 ms 10 runs
Summary
'./target/profiling/puffin-dev resolve-many --cache-dir cache-docker-no-build --no-build pypi_top_8k_flat.txt --limit 1000 2> /dev/null' ran
1.10 ± 0.04 times faster than './target/profiling/puffin-dev-main resolve-many --cache-dir cache-docker-no-build --no-build pypi_top_8k_flat.txt --limit 1000 2> /dev/null'
I was moved to do this because I noticed `malloc`/`free` taking up a
fairly sizeable percentage of time during light profiling.
My editor is configured to remove all trailing whitespace on every line
whenever I save. I find that configuration to be extremely useful and
would prefer not to turn it off. Unfortunately, the tests in this file
do rely on significant trailing whitespace that was getting stripped
whenever I saved the file. I've tweaked the strings to avoid this by
inserting explicit an explicit ASCII whitespace escape (\x20). Not
great, but if we only do this rarely it seems okay?
## Summary
It looks like, when you install `pip`, it includes a bunch of
`__pycache__` directories in the RECORD file (although these directories
don't exist until you run `pip`). Our uninstaller assumed that the
RECORD file only contained _files_.
Closes https://github.com/astral-sh/puffin/issues/389.
I intend this to become the main form of caching for puffin: You can
make http requests, you tranform the data to what you really need, you
have control over the cache key, and the cache is always json (or
anything else much faster we want to replace it with as long as it's
serde!)
## Summary
This PR refactors our `RemoteDistribution` type such that it now follows
a clear hierarchy that matches the actual variants, and encodes the
differences between source and built distributions:
```rust
pub enum Distribution {
Built(BuiltDistribution),
Source(SourceDistribution),
}
pub enum BuiltDistribution {
Registry(RegistryBuiltDistribution),
DirectUrl(DirectUrlBuiltDistribution),
}
pub enum SourceDistribution {
Registry(RegistrySourceDistribution),
DirectUrl(DirectUrlSourceDistribution),
Git(GitSourceDistribution),
}
/// A built distribution (wheel) that exists in a registry, like `PyPI`.
pub struct RegistryBuiltDistribution {
pub name: PackageName,
pub version: Version,
pub file: File,
}
/// A built distribution (wheel) that exists at an arbitrary URL.
pub struct DirectUrlBuiltDistribution {
pub name: PackageName,
pub url: Url,
}
/// A source distribution that exists in a registry, like `PyPI`.
pub struct RegistrySourceDistribution {
pub name: PackageName,
pub version: Version,
pub file: File,
}
/// A source distribution that exists at an arbitrary URL.
pub struct DirectUrlSourceDistribution {
pub name: PackageName,
pub url: Url,
}
/// A source distribution that exists in a Git repository.
pub struct GitSourceDistribution {
pub name: PackageName,
pub url: Url,
}
```
Most of the PR just stems downstream from this change. There are no
behavioral changes, so I'm largely relying on lint, tests, and the
compiler for correctness.
This PR tweaks the representation of `Tags` in order to offer a
faster implementation of `WheelFilename::is_compatible`. We now use a
nested map of tags that lets us avoid looping over every supported
platform tag. As the code comments suggest, that is the essential gain.
We still do not mind looping over the tags in each wheel name since they
tend to be quite small. And pushing our thumb on that side of things can
make things worse overall since it would likely slow down WheelFilename
construction itself.
For micro-benchmarks, we improve considerably for compatibility
checking:
$ critcmp base test3
group base test3
----- ---- -----
build_platform_tags/burntsushi-archlinux 1.00 46.2±0.28µs ? ?/sec 2.48
114.8±0.45µs ? ?/sec
wheelname_parsing/flyte-long-compatible 1.00 624.8±3.31ns 174.0 MB/sec
1.01 629.4±4.30ns 172.7 MB/sec
wheelname_parsing/flyte-long-incompatible 1.00 743.6±4.23ns 165.4 MB/sec
1.00 746.9±4.62ns 164.7 MB/sec
wheelname_parsing/flyte-short-compatible 1.00 526.7±4.76ns 54.3 MB/sec
1.01 530.2±5.81ns 54.0 MB/sec
wheelname_parsing/flyte-short-incompatible 1.00 540.4±4.93ns 60.0 MB/sec
1.01 545.7±5.31ns 59.4 MB/sec
wheelname_parsing_failure/flyte-long-extension 1.00 13.6±0.13ns 3.2
GB/sec 1.01 13.7±0.14ns 3.2 GB/sec
wheelname_parsing_failure/flyte-short-extension 1.00 14.0±0.20ns 1160.4
MB/sec 1.01 14.1±0.14ns 1146.5 MB/sec
wheelname_tag_compatibility/flyte-long-compatible 11.33 159.8±2.79ns
680.5 MB/sec 1.00 14.1±0.23ns 7.5 GB/sec
wheelname_tag_compatibility/flyte-long-incompatible 237.60
1671.8±37.99ns 73.6 MB/sec 1.00 7.0±0.08ns 17.1 GB/sec
wheelname_tag_compatibility/flyte-short-compatible 16.07 223.5±8.60ns
128.0 MB/sec 1.00 13.9±0.30ns 2.0 GB/sec
wheelname_tag_compatibility/flyte-short-incompatible 149.83 628.3±2.13ns
51.6 MB/sec 1.00 4.2±0.10ns 7.6 GB/sec
We do regress slightly on the time it takes for `Tags::new` to run, but
this is somewhat expected. And in absolute terms, 114us is perfectly
acceptable given that it's only executed ~once for each `puffin`
invocation.
Ad hoc benchmarks indicate an overall 25% perf improvement in `puffin
pip-compile` times. This roughly corresponds with how much time
`is_compatible` was taking. Indeed, profiling confirms that it has
virtually disappeared from the profile.
Fixes#157
## Summary
Low-priority but fun thing to end the day. You can now pass
`--target-version py37`, and we'll generate a resolution for Python 3.7.
See: https://github.com/astral-sh/puffin/issues/183.
One of the most common errors i observed are build failures due to
missing header files. On ubuntu, this generally means that you need to
install some `<...>-dev` package that the documentation tells you about,
e.g. [mysqlclient](https://github.com/PyMySQL/mysqlclient#linux) needs
`default-libmysqlclient-dev`, [some psycopg
versions](https://www.psycopg.org/psycopg3/docs/basic/install.html#local-installation)
(i remember that this was always required at some earlier point) require
`libpq-dev` and pygraphviz wants `graphviz-dev`. This is quite common
for many scientific packages (where conda has an advantage because they
can provide those package as a dependency).
The error message can be completely inscrutable if you're just a python
programmer (or user) and not a c programmer (example: pygraphviz):
```
warning: no files found matching '*.png' under directory 'doc'
warning: no files found matching '*.txt' under directory 'doc'
warning: no files found matching '*.css' under directory 'doc'
warning: no previously-included files matching '*~' found anywhere in distribution
warning: no previously-included files matching '*.pyc' found anywhere in distribution
warning: no previously-included files matching '.svn' found anywhere in distribution
no previously-included directories found matching 'doc/build'
pygraphviz/graphviz_wrap.c:3020:10: fatal error: graphviz/cgraph.h: No such file or directory
3020 | #include "graphviz/cgraph.h"
| ^~~~~~~~~~~~~~~~~~~
compilation terminated.
error: command '/usr/bin/gcc' failed with exit code 1
```
The only relevant part is `Fatal error: graphviz/cgraph.h: No such file
or directory`. Why is this file not there and how do i get it to be
there?
This is even harder to spot in pip's output, where it's 11 lines above
the last line:

I've special cased missing headers and made sure that the last line
tells you the important information: We're missing some header, please
check the documentation of {package} {version} for what to install:

Scrolling up:

The difference gets even clearer with a default ubuntu terminal with its
80 columns:

---
Note that the situation is better for a missing compiler, there i get:
```
[...]
warning: no previously-included files matching '*~' found anywhere in distribution
warning: no previously-included files matching '*.pyc' found anywhere in distribution
warning: no previously-included files matching '.svn' found anywhere in distribution
no previously-included directories found matching 'doc/build'
error: command 'gcc' failed: No such file or directory
---
```
Putting the last line into google, the first two results tell me to
`sudo apt-get install gcc`, the third even tells me about `sudo apt
install build-essential`
By default, we will build source distributions for both resolving and
installing, running arbitrary code. `--no-build` adds an option to ban
this and only install from wheels, no source distributions or git builds
allowed. We also don't fetch these and instead report immediately.
I've heard from users for whom this is a requirement, i'm implementing
it now because it's helpful for testing.
I'm thinking about adding a shared `PuffinSharedArgs` struct so we don't
have to repeat each option everywhere.
Closes https://github.com/astral-sh/puffin/issues/356.
The example from the issue now renders as:
```
❯ cargo run --bin puffin-dev -q -- resolve-cli tensorflow-cpu-aws
puffin-dev failed
Caused by: No solution found when resolving build dependencies for source distribution:
Caused by: Because there is no available version for tensorflow-cpu-aws and root depends on tensorflow-cpu-aws, version solving failed.
```
This was dumb of me. We pass out indexes when adding progress bars, but
were then removing entries on completion, so any outstanding indexes
were now _invalid_. We just shouldn't remove them. The `MultiProgress`
retains a reference anyway, IIUC.
Closes https://github.com/astral-sh/puffin/issues/360.
Rejigger Linux platform detection
This change makes some very small improvements to the Linux platform
detection logic. In particular, the existing logic did not work on my
Archlinux machine since /lib64/ld-linux-x86-64.so.2 isn't a symlink. In
that case, the detection logic should have fallen back to the slower
`ldd --version` technique, but `read_link` fails outright when its
argument isn't a symbolic link. So we tweak the logic to allow it to
fail, and if it does, we still try the `ldd --version` approach instead
of giving up completely.
I also made some cosmetic improvements to the regex matching, as well as
ensuring that the regexes are only compiled exactly once.
We now write the `direct_url.json` when installing, and _skip_
installing if we find a package installed via the direct URL that the
user is requesting.
A lot of TODOs, especially around cleaning up the `Source` abstraction
and its relationship to `DirectUrl`. I'm gonna keep working on these
today, but this works and makes the requirements clear.
Closes#332.
## Summary
This PR just adds the logic in `install-wheel-rs` to write
`direct_url.json`. We're not actually taking advantage of it yet (or
wiring it through) in Puffin.
Part of https://github.com/astral-sh/puffin/issues/332.
Ran `cargo upgrade --incompatible`, seems there are no changes required.
From cacache 0.12.0:
> BREAKING CHANGE: some signatures for copy have changed, and copy no
longer automatically reflinks
`which` 5.0.0 seems to have only error message changes.
## Summary
This is a first-pass at adding source distribution support to the
installer.
The previous installation flow was:
1. Come up with a plan.
1. Find a distribution (specific file) for every package that we'll need
to download.
1. Download those distributions.
1. Unzip them (since we assumed they were all wheels).
1. Install them into the virtual environment.
Now, Step (3) downloads both wheels and source distributions, and we
insert a step between Steps (3) and (4) to build any source
distributions into zipped wheels.
There are a bunch of TODOs, the most important (IMO) is that we
basically have two implementations of downloading and building, between
the stuff in `puffin_installer` and `puffin_resolver` (namely in
`crates/puffin-resolver/src/distribution`). I didn't attempt to clean
that up here -- it's already a problem, and it's related to the overall
problem we need to solve around unified caching and resource management.
Closes#243.
`PackageName` and `ExtraName` can now only be constructed from valid
names. They share the same rules, so i gave them the same
implementation. Constructors are split between `new` (owned) and
`from_str` (borrowed), with the owned version avoiding allocations.
Closes#279
---------
Co-authored-by: Zanie <contact@zanie.dev>
## Summary
This just enables the `DistributionFinder` (previously known as the
`WheelFinder`) to select source distributions when there are no matching
wheels for a given platform. As a reminder, the `DistributionFinder` is
a simple resolver that doesn't look at any dependencies: it just takes a
set of pinned packages, and finds a distribution to install to satisfy
each requirement.