This commit attempts an optimization that switches a version's `release`
field over to a `smallvec` optimization. The idea is that most versions
are very small and can be stored inline.
Interestingly, I was unable to observe any obvious benefit:
$ hyperfine \
"./target/profiling/puffin-dev-u32 resolve-many --cache-dir cache-docker-no-build --no-build pypi_top_8k_flat.txt --limit 1000 2> /dev/null" \
"./target/profiling/puffin-dev-smallvec-release resolve-many --cache-dir cache-docker-no-build --no-build pypi_top_8k_flat.txt --limit 1000 2> /dev/null"
Benchmark 1: ./target/profiling/puffin-dev-u32 resolve-many --cache-dir cache-docker-no-build --no-build pypi_top_8k_flat.txt --limit 1000 2> /dev/null
Time (mean ± σ): 872.2 ms ± 26.5 ms [User: 14646.0 ms, System: 2516.0 ms]
Range (min … max): 833.0 ms … 912.0 ms 10 runs
Benchmark 2: ./target/profiling/puffin-dev-smallvec-release resolve-many --cache-dir cache-docker-no-build --no-build pypi_top_8k_flat.txt --limit 1000 2> /dev/null
Time (mean ± σ): 882.3 ms ± 17.4 ms [User: 14764.4 ms, System: 2520.9 ms]
Range (min … max): 859.7 ms … 912.7 ms 10 runs
Summary
'./target/profiling/puffin-dev-u32 resolve-many --cache-dir cache-docker-no-build --no-build pypi_top_8k_flat.txt --limit 1000 2> /dev/null' ran
1.01 ± 0.04 times faster than './target/profiling/puffin-dev-smallvec-release resolve-many --cache-dir cache-docker-no-build --no-build pypi_top_8k_flat.txt --limit 1000 2> /dev/null'
My hypothesis is that because of an earlier commit that switched the
global allocator to jemalloc, the cost of allocation had precipitously
decreased. To the point that the reduction in allocs from the smallvec
becomes a wash. To test my hypothesis, I dropped the jemalloc commit and
measured the perf of the smallvec optimization against main:
$ hyperfine \
"./target/profiling/puffin-dev-main resolve-many --cache-dir cache-docker-no-build --no-build pypi_top_8k_flat.txt --limit 1000 2> /dev/null" \
"./target/profiling/puffin-dev-smallvec-release-no-jemalloc resolve-many --cache-dir cache-docker-no-build --no-build pypi_top_8k_flat.txt --limit 1000 2> /dev/null"
Benchmark 1: ./target/profiling/puffin-dev-main resolve-many --cache-dir cache-docker-no-build --no-build pypi_top_8k_flat.txt --limit 1000 2> /dev/null
Time (mean ± σ): 968.0 ms ± 20.0 ms [User: 17637.4 ms, System: 2151.9 ms]
Range (min … max): 940.2 ms … 1005.3 ms 10 runs
Benchmark 2: ./target/profiling/puffin-dev-smallvec-release-no-jemalloc resolve-many --cache-dir cache-docker-no-build --no-build pypi_top_8k_flat.txt --limit 1000 2> /dev/null
Time (mean ± σ): 958.4 ms ± 15.7 ms [User: 17119.7 ms, System: 2246.1 ms]
Range (min … max): 944.7 ms … 993.3 ms 10 runs
Summary
'./target/profiling/puffin-dev-smallvec-release-no-jemalloc resolve-many --cache-dir cache-docker-no-build --no-build pypi_top_8k_flat.txt --limit 1000 2> /dev/null' ran
1.01 ± 0.03 times faster than './target/profiling/puffin-dev-main resolve-many --cache-dir cache-docker-no-build --no-build pypi_top_8k_flat.txt --limit 1000 2> /dev/null'
Fiddlesticks. Even when allocation is (presumably) more expensive, the
smallvec optimization didn't help. This suggests something is off about
my mental model of the code. So there are more avenues to explore here!
It's not clear whether these routines are used in any hot path so I
don't know whether this will have a perf improvement, but there's really
no reason not to do this.
I did this change for two reasons. Firstly, `usize` should generally
only be used for something that is indexing memory. This is why it
varies in size depending on the target. Since version numbers aren't
really things that we use to index memory, we should be using a fixed
size integer type. Secondly, I shrunk it down to `u32` because it
doesn't really seem like we need to support version numbers bigger than
2^32. In theory, we could even take this down to 2^16 via `u16`, but
that seems... within the realm of possibility? The actual reason to
shrink it is to make `Version` take up less space, allocate less memory
and (hopefully) increase the effectiveness of a small data optimization.
Comparing this with the latest change there is a small improvement:
$ hyperfine \
"./target/profiling/puffin-dev-jemalloc resolve-many --cache-dir cache-docker-no-build --no-build pypi_top_8k_flat.txt --limit 1000 2> /dev/null" \
"./target/profiling/puffin-dev-u32 resolve-many --cache-dir cache-docker-no-build --no-build pypi_top_8k_flat.txt --limit 1000 2> /dev/null"
Benchmark 1: ./target/profiling/puffin-dev-jemalloc resolve-many --cache-dir cache-docker-no-build --no-build pypi_top_8k_flat.txt --limit 1000 2> /dev/null
Time (mean ± σ): 893.2 ms ± 19.5 ms [User: 14709.5 ms, System: 2463.3 ms]
Range (min … max): 858.4 ms … 914.1 ms 10 runs
Benchmark 2: ./target/profiling/puffin-dev-u32 resolve-many --cache-dir cache-docker-no-build --no-build pypi_top_8k_flat.txt --limit 1000 2> /dev/null
Time (mean ± σ): 866.9 ms ± 13.9 ms [User: 14745.4 ms, System: 2424.8 ms]
Range (min … max): 850.1 ms … 896.4 ms 10 runs
Summary
'./target/profiling/puffin-dev-u32 resolve-many --cache-dir cache-docker-no-build --no-build pypi_top_8k_flat.txt --limit 1000 2> /dev/null' ran
1.03 ± 0.03 times faster than './target/profiling/puffin-dev-jemalloc resolve-many --cache-dir cache-docker-no-build --no-build pypi_top_8k_flat.txt --limit 1000 2> /dev/null'
The idea is that this might unlock better improvements later. (Spoiler
alert: it didn't. But we keep this change anyway because it seems like
good sense.)
Ran `cargo upgrade --incompatible`, seems there are no changes required.
From cacache 0.12.0:
> BREAKING CHANGE: some signatures for copy have changed, and copy no
longer automatically reflinks
`which` 5.0.0 seems to have only error message changes.
We now accept a pre-release if (1) all versions are pre-releases, or (2)
there was a pre-release marker in the dependency specifiers for a direct
dependency.
The code is written such that we can support a variety of pre-release
strategies.
Closes https://github.com/astral-sh/puffin/issues/191.
Previously, we had two python interpreter metadata structs, one in
gourgeist and one in puffin. Both would spawn a subprocess to query
overlapping metadata and both would appear in the cli crate, if you
weren't careful you could even have to different base interpreters at
once. This change unifies this to one set of metadata, queried and
cached once.
Another effect of this crate is proper separation of python interpreter
and venv. A base interpreter (such as `/usr/bin/python/`, but also pyenv
and conda installed python) has a set of metadata. A venv has a root and
inherits the base python metadata except for `sys.prefix`, which unlike
`sys.base_prefix`, gets set to the venv root. From the root and the
interpreter info we can compute the paths inside the venv. We can reuse
the interpreter info of the base interpreter when creating a venv
without having to query the newly created `python`.
This is isn't ready, but it can resolve
`meine_stadt_transparent==0.2.14`.
The source distributions are currently being built serially one after
the other, i don't know if that is incidentally due to the resolution
order, because sdist building is blocking or because of something in the
resolver that could be improved.
It's a bit annoying that the thing that was supposed to do http requests
now suddenly also has to a whole download/unpack/resolve/install/build
routine, it messes up the type hierarchy. The much bigger problem though
is avoid recursive crate dependencies, it's the reason for the callback
and for splitting the builder into two crates (badly named atm)
This PR modifies the PEP 440 and PEP 508 crates to pass CI, primarily by
fixing all lint violations.
We're also now using these crates in the workspace via `path`.
(Previously, we were still fetching them from Cargo.)
This PR copies over the `pep440-rs` crate at commit
`a8303b01ffef6fccfdce562a887f6b110d482ef3` with no modifications.
It won't pass CI, but modifications will intentionally be confined to
later PRs.