Charlie Marsh 5621c414cf
Use symlinks for directories entries in cache (#1037)
## Summary

One problem we have in the cache today is that we can't overwrite
entries atomically, because we store unzipped _directories_ in the cache
(which makes installation _much_ faster than storing zipped
directories). So, if you ignore the existing contents of the cache when
writing, you might run into an error, because you might attempt to write
a directory where a directory already exists.

This is especially annoying for cache refresh, because in order to
refresh the cache, we have to purge it (i.e., delete a bunch of stuff),
which is also highly unsafe if Puffin is running across multiple threads
or multiple processes.

The solution I'm proposing here is that whenever we persist a
_directory_ to the cache, we persist it to a special "archive" bucket.
Then, within the other buckets, directory entries are actually symlinks
into that "archive" bucket. With symlinks, we can atomically replace,
which means we can easily overwrite cache entries without having to
delete from the cache.
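Here is a minimal sketch of the write path described above, assuming a Unix filesystem where the staged directory, the archive bucket, and the bucket entry all live on the same filesystem. `persist_dir` and its caller-supplied unique `id` are hypothetical and are not the `puffin-cache` API. The key step is creating the new symlink at a temporary path and then renaming it over the old entry. POSIX `rename` performs that replacement atomically.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: move a fully written `staged` directory into the
/// archive bucket under a caller-supplied unique `id`, then atomically point
/// the bucket `entry` at it via a symlink. Unix-only.
#[cfg(unix)]
fn persist_dir(staged: &Path, archive: &Path, entry: &Path, id: &str) -> io::Result<PathBuf> {
    // 1. Move the directory into the archive under a fresh name, so no
    //    existing archive entry is ever overwritten in place.
    fs::create_dir_all(archive)?;
    fs::rename(staged, archive.join(id))?;
    // Use an absolute target: a relative one would resolve against the
    // symlink's own directory, not the current working directory.
    let archived = fs::canonicalize(archive.join(id))?;

    // 2. Create the new symlink at a temporary path next to `entry`.
    let parent = entry
        .parent()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "entry has no parent"))?;
    fs::create_dir_all(parent)?;
    let tmp = parent.join(format!(".{id}.tmp"));
    std::os::unix::fs::symlink(&archived, &tmp)?;

    // 3. rename(2) replaces an existing symlink at `entry` atomically;
    //    it operates on the link itself and does not follow it.
    fs::rename(&tmp, entry)?;
    Ok(archived)
}
```

A reader resolving `entry` sees either the old symlink or the new one, never a half-written directory. The previously archived directory is left in place, which is exactly the garbage that the next paragraph refers to.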

The main downside is that we'll now accumulate dangling entries in the
"archive" bucket, and so we'll need to implement some form of garbage
collection to remove archive entries that no symlink points to. Another
downside is that cache reads and writes will be a bit slower, since we
need to deal with creating and resolving these symlinks.
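One simple form of that garbage collection is a mark-and-sweep pass. It collects the resolved targets of every symlink in the other buckets, then deletes the archive entries that none of them reference. The sketch below is hypothetical (`collect_garbage` is not part of Puffin). As written, it is only safe when no other thread or process is writing to the cache. A freshly archived directory whose symlink has not been created yet would look unreferenced and be deleted.

```rust
use std::collections::HashSet;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Recursively record the canonical target of every symlink under `dir`.
fn collect_link_targets(dir: &Path, live: &mut HashSet<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        // `DirEntry::file_type` does not follow symlinks.
        let file_type = entry.file_type()?;
        if file_type.is_symlink() {
            // Resolve to an absolute, canonical target; dangling links are skipped.
            if let Ok(target) = fs::canonicalize(entry.path()) {
                live.insert(target);
            }
        } else if file_type.is_dir() {
            collect_link_targets(&entry.path(), live)?;
        }
    }
    Ok(())
}

/// Mark: gather every archive entry referenced from `buckets`.
/// Sweep: delete every archive entry that was not marked.
/// Assumption: no concurrent writers while this runs.
/// Returns the number of archive entries removed.
fn collect_garbage(archive: &Path, buckets: &[&Path]) -> io::Result<usize> {
    let mut live = HashSet::new();
    for bucket in buckets {
        collect_link_targets(bucket, &mut live)?;
    }
    let mut removed = 0;
    for entry in fs::read_dir(archive)? {
        let path = fs::canonicalize(entry?.path())?;
        if !live.contains(&path) {
            fs::remove_dir_all(&path)?;
            removed += 1;
        }
    }
    Ok(removed)
}
```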

As an example, after this change, the cache entry for this unzipped
wheel is a symlink:

![Screenshot 2024-01-22 at 11 56 18 AM](https://github.com/astral-sh/puffin/assets/1309177/99ff6940-5096-4246-8d16-2a7bdcdd8d4b)

Then, within the archive directory, we actually have two unique entries
(since I intentionally ran the command twice to ensure overwrites were
safe):

![Screenshot 2024-01-22 at 11 56 22 AM](https://github.com/astral-sh/puffin/assets/1309177/717d04e2-25d9-4225-b190-bad1441868c6)
2024-01-23 19:52:37 +00:00
| Name | Last commit | Date |
| --- | --- | --- |
| bench | Use Clippy lint table over Cargo config (#490) | 2023-11-22 15:10:27 +00:00 |
| cache-key | Avoid storing absolute URLs for files (#944) | 2024-01-17 09:15:21 -05:00 |
| distribution-filename | Implement `--find-links` as flat indexes (directories in pip-compile) (#912) | 2024-01-15 02:04:10 +00:00 |
| distribution-types | Improve error message when editable requirement doesn't exist (#1024) | 2024-01-20 12:59:18 -05:00 |
| gourgeist | Add support for PyPy wheels (#1028) | 2024-01-22 14:22:27 +00:00 |
| install-wheel-rs | Validate wheel metadata against filename (#1002) | 2024-01-19 05:48:55 +00:00 |
| once-map | Propagate cancellation errors in `OnceMap` (#1032) | 2024-01-22 09:00:21 -05:00 |
| pep440-rs | Rename `pep440-rs` to `Readme.md` (#1014) | 2024-01-19 15:16:12 -05:00 |
| pep508-rs | Allow relative paths in requirements.txt (#1027) | 2024-01-22 14:20:30 +00:00 |
| platform-host | Error when `ldd` is not in path (#506) | 2023-11-28 05:55:04 +00:00 |
| platform-tags | Add support for PyPy wheels (#1028) | 2024-01-22 14:22:27 +00:00 |
| puffin | Write an `Into<anstream::ColorChoice>` implementation for more idiomatic code (#1064) | 2024-01-23 15:43:16 +00:00 |
| puffin-build | Add support for PyPy wheels (#1028) | 2024-01-22 14:22:27 +00:00 |
| puffin-cache | Use symlinks for directories entries in cache (#1037) | 2024-01-23 19:52:37 +00:00 |
| puffin-client | Reduce stack usage by boxing `File` in `Dist`, `CachePolicy` and large futures (#1004) | 2024-01-19 09:38:36 +00:00 |
| puffin-dev | Use a separate memory index for each requirement (#1036) | 2024-01-22 16:22:03 +00:00 |
| puffin-dispatch | Add support for disabling installation from pre-built wheels (#956) | 2024-01-19 11:24:27 -06:00 |
| puffin-distribution | Use symlinks for directories entries in cache (#1037) | 2024-01-23 19:52:37 +00:00 |
| puffin-extract | Use fs_err in more places (#926) | 2024-01-15 09:39:33 +00:00 |
| puffin-fs | Use symlinks for directories entries in cache (#1037) | 2024-01-23 19:52:37 +00:00 |
| puffin-git | Split `puffin-cache` into Puffin-specific and generic utilities (#728) | 2023-12-25 14:38:56 +00:00 |
| puffin-installer | Use symlinks for directories entries in cache (#1037) | 2024-01-23 19:52:37 +00:00 |
| puffin-interpreter | Use ctime for interpreter timestamps (#1067) | 2024-01-23 19:52:20 +00:00 |
| puffin-normalize | Avoid some additional clones for `PackageName` (#896) | 2024-01-12 17:54:40 +00:00 |
| puffin-resolver | Fix missing comma before conclusions (#1042) | 2024-01-22 13:31:09 -06:00 |
| puffin-traits | Add support for disabling installation from pre-built wheels (#956) | 2024-01-19 11:24:27 -06:00 |
| puffin-warnings | Migrate back to `owo-colors` (#824) | 2024-01-08 08:54:57 +00:00 |
| puffin-workspace | Use Clippy lint table over Cargo config (#490) | 2023-11-22 15:10:27 +00:00 |
| pypi-types | Remove RFC2047 decoder (#967) | 2024-01-18 15:09:45 -05:00 |
| requirements-txt | Allow relative paths in requirements.txt (#1027) | 2024-01-22 14:20:30 +00:00 |
| README.md | Rename `puffin-cli` crate to `puffin` (#976) | 2024-01-18 19:02:52 -05:00 |

README.md

Crates

bench

Functionality for benchmarking Puffin.

cache-key

Generic functionality for caching paths, URLs, and other resources across platforms.

distribution-filename

Parse built distribution (wheel) and source distribution (sdist) filenames to extract structured metadata.
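Wheel filenames follow the PEP 427 layout `{name}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl`. The sketch below splits a filename into those components. The `WheelFilename` struct and `parse_wheel_filename` function are illustrative stand-ins, not the crate's actual types. They skip the name and version validation that a real implementation needs, and they do not handle sdist filenames at all.

```rust
/// Illustrative (hypothetical) structured form of a wheel filename.
#[derive(Debug, PartialEq)]
struct WheelFilename {
    name: String,
    version: String,
    build_tag: Option<String>,
    python_tag: String,
    abi_tag: String,
    platform_tag: String,
}

/// Split a wheel filename into its PEP 427 components (no further validation).
fn parse_wheel_filename(filename: &str) -> Result<WheelFilename, String> {
    let stem = filename
        .strip_suffix(".whl")
        .ok_or_else(|| format!("`{filename}` does not end in `.whl`"))?;
    // PEP 427 escapes `-` inside components to `_`, so splitting on `-`
    // yields 5 parts, or 6 when an optional build tag is present.
    let parts: Vec<&str> = stem.split('-').collect();
    let (name, version, build_tag, python_tag, abi_tag, platform_tag) = match parts.as_slice() {
        [n, v, py, abi, plat] => (*n, *v, None, *py, *abi, *plat),
        [n, v, b, py, abi, plat] => (*n, *v, Some(*b), *py, *abi, *plat),
        _ => return Err(format!("`{filename}` should have 5 or 6 `-`-separated components")),
    };
    Ok(WheelFilename {
        name: name.to_string(),
        version: version.to_string(),
        build_tag: build_tag.map(String::from),
        python_tag: python_tag.to_string(),
        abi_tag: abi_tag.to_string(),
        platform_tag: platform_tag.to_string(),
    })
}
```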

distribution-types

Abstractions for representing built distributions (wheels) and source distributions (sdists), and the sources from which they can be downloaded.

gourgeist

A venv replacement to create virtual environments in Rust.

install-wheel-rs

Install built distributions (wheels) into a virtual environment.

once-map

A waitmap-like concurrent hash map for executing tasks exactly once.
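The "exactly once" behavior can be illustrated with a synchronous sketch. It uses a mutex-guarded `HashMap` whose values are shared `OnceLock` cells, so the first caller for a key runs the task and every concurrent caller for the same key blocks until the result is ready. This is a thread-based stand-in chosen for brevity and requires Rust 1.70+ for `OnceLock`. The `OnceMap` type and `get_or_run` method below are hypothetical and do not mirror the crate's actual API.

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::sync::{Arc, Mutex, OnceLock};

/// Hypothetical synchronous "once map": each key's task runs at most once,
/// and concurrent callers for the same key wait for that single run.
struct OnceMap<K, V> {
    cells: Mutex<HashMap<K, Arc<OnceLock<V>>>>,
}

impl<K: Eq + Hash, V: Clone> OnceMap<K, V> {
    fn new() -> Self {
        Self { cells: Mutex::new(HashMap::new()) }
    }

    /// Return the value for `key`, running `task` only if no other caller
    /// has already started (or finished) the task for that key.
    fn get_or_run(&self, key: K, task: impl FnOnce() -> V) -> V {
        // Hold the map lock only long enough to find or create the key's cell,
        // so tasks for different keys can run in parallel.
        let cell = {
            let mut cells = self.cells.lock().unwrap();
            let cell = cells.entry(key).or_insert_with(|| Arc::new(OnceLock::new()));
            Arc::clone(cell)
        };
        // `OnceLock::get_or_init` runs `task` at most once; other callers
        // block here until the first one stores the value.
        cell.get_or_init(task).clone()
    }
}
```

Splitting the map lock from the per-key cell is the main design choice. A single global lock held across the task would also give "exactly once", but it would serialize unrelated keys.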

pep440-rs

Utilities for interacting with Python version numbers and specifiers.

pep508-rs

Utilities for interacting with PEP 508 dependency specifiers.

platform-host

Functionality for detecting the current platform (operating system, architecture, etc.).

platform-tags

Functionality for parsing and inferring Python platform tags as per PEP 425.
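PEP 425 tags are `python-abi-platform` triples. A wheel can advertise a compressed tag set by joining alternatives with `.` in any component (e.g. `py2.py3-none-any`). Here is a minimal sketch of expanding such a set into individual tags. `Tag` and `expand_tags` are illustrative names rather than the crate's API, and inferring which tags the current platform supports is out of scope.

```rust
/// Hypothetical representation of a single PEP 425 tag triple.
#[derive(Debug, Clone, PartialEq)]
struct Tag {
    python: String,
    abi: String,
    platform: String,
}

/// Expand a compressed tag set (e.g. `py2.py3-none-any`) into the
/// Cartesian product of its dot-separated alternatives.
fn expand_tags(compressed: &str) -> Result<Vec<Tag>, String> {
    let parts: Vec<&str> = compressed.split('-').collect();
    let [python, abi, platform] = parts.as_slice() else {
        return Err(format!("expected `python-abi-platform`, got `{compressed}`"));
    };
    let mut tags = Vec::new();
    for py in python.split('.') {
        for a in abi.split('.') {
            for plat in platform.split('.') {
                tags.push(Tag {
                    python: py.to_string(),
                    abi: a.to_string(),
                    platform: plat.to_string(),
                });
            }
        }
    }
    Ok(tags)
}
```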

puffin

Command-line interface for the Puffin package manager.

puffin-build

A PEP 517-compatible build frontend for Puffin.

puffin-cache

Functionality for caching Python packages and associated metadata.

puffin-client

Client for interacting with PyPI-compatible HTTP APIs.

puffin-dev

Development utilities for Puffin.

puffin-dispatch

A centralized struct for resolving and building source distributions in isolated environments. Implements the traits defined in puffin-traits.

puffin-distribution

Client for interacting with built distributions (wheels) and source distributions (sdists). Capable of fetching metadata, distribution contents, etc.

puffin-extract

Utilities for extracting files from archives.

puffin-fs

Utilities for interacting with the filesystem.

puffin-git

Functionality for interacting with Git repositories.

puffin-installer

Functionality for installing Python packages into a virtual environment.

puffin-interpreter

Functionality for detecting and leveraging the current Python interpreter.

puffin-normalize

Normalize package and extra names as per Python specifications.
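The package-name rule from PEP 503 is to lowercase the name and collapse every run of `-`, `_`, and `.` into a single `-`. PEP 685 applies the same rule to extra names. Here is a dependency-free sketch, with `normalize_name` as a hypothetical helper rather than the crate's API.

```rust
/// Hypothetical helper implementing PEP 503 name normalization:
/// lowercase, and collapse each run of `-`, `_`, `.` into one `-`.
fn normalize_name(name: &str) -> String {
    let mut out = String::with_capacity(name.len());
    let mut last_was_sep = false;
    for c in name.chars() {
        if matches!(c, '-' | '_' | '.') {
            // Emit a single `-` for the whole run of separators.
            if !last_was_sep {
                out.push('-');
            }
            last_was_sep = true;
        } else {
            out.extend(c.to_lowercase());
            last_was_sep = false;
        }
    }
    out
}
```

For example, `normalize_name("Typing_Extensions")` and `normalize_name("typing.extensions")` both yield `typing-extensions`, so either spelling maps to the same package.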

puffin-package

Types and functionality for working with Python packages, e.g., parsing wheel files.

puffin-resolver

Functionality for resolving Python packages and their dependencies.

puffin-traits

Shared traits for Puffin, to avoid circular dependencies.

pypi-types

General-purpose type definitions for types used in PyPI-compatible APIs.

puffin-warnings

User-facing warnings for Puffin.

requirements-txt

Functionality for parsing requirements.txt files.