Commit Graph

28 Commits

Author SHA1 Message Date
konsti 11ed4f7183
Generate versioned pip launchers (#1918)
Users expect pip to have `pip`, `pip3` and `pip3.x` entrypoints. But pip
is a universal wheel, so it contains the `pip3.x` entrypoint where it
was built on. To fix this, pip special cases itself when installing
(3898741e29/src/pip/_internal/operations/install/wheel.py (L283)),
replacing the wheel entrypoint with one for the current version. We now
do the same.

Fixes #1593
2024-02-23 18:01:31 +00:00
Jane Lewis da3a7ec801
Linker copies files as a fallback when ref-linking fails (#1773)
## Summary

Fixes #1444.

In situations where the installer fails to perform a reflink, a regular
file copy is also attempted, as a fallback. This circumvents issues with
linking files across filesystems or volumes.

## Test Plan
N/A
2024-02-21 21:57:31 -05:00
Charlie Marsh 4e0b6f8f84
Avoid attempting rename in copy fallback path (#1546)
## Summary

This _could_ fix https://github.com/astral-sh/uv/issues/1454, but I'm
not sure. I was able to replicate by forcing a bunch of error states.
But, in short, if we fail to hardlink on the initial copy due to a file
existing, and then fail _again_, we fallback to copying. But if we copy,
then the tempfile doesn't exist, and so the `fs_err::rename(&tempfile,
&out_path)?;` will fail with "File not found".

This PR just ensures that the cases are explicitly mutually exclusive:
we only attempt to rename if the hardlink succeeded.
2024-02-16 17:08:49 -05:00
Zanie Blue 2586f655bb
Rename to `uv` (#1302)
First, replace all usages in files in-place. I used my editor for this.
If someone wants to add a one-liner that'd be fun.

Then, update directory and file names:

```
# Run twice for nested directories
find . -type d -print0 | xargs -0 rename s/puffin/uv/g
find . -type d -print0 | xargs -0 rename s/puffin/uv/g

# Update files
find . -type f -print0 | xargs -0 rename s/puffin/uv/g
```

Then add all the files again

```
# Add all the files again
git add crates
git add python/uv

# This one needs a force-add
git add -f crates/uv-trampoline
```
2024-02-15 11:19:46 -06:00
Charlie Marsh be9125b0f0
Remove unnecessary `is_dir` in `clone_recursive` (#1247) 2024-02-04 23:54:22 +00:00
Charlie Marsh bb49ebee1e
Avoid race condition in clone file replacement (#1229)
## Summary

I've never seen this in practice but in theory it is possible, and we
have the same guardrail in the hardlink path.
2024-02-01 10:55:23 -05:00
Charlie Marsh d243250dec
Avoid unnecessary permissions changes for copy paths (#1152)
In Rust, `fs::copy` automatically preserves permissions (see:
https://doc.rust-lang.org/std/fs/fn.copy.html).

Elsewhere, when copying from the zip archive out to the cache, we can
set permissions during file creation, rather than as a separate call.

Both of these should be slightly more efficient.
2024-01-27 22:11:55 -05:00
konsti 39021263dd
Windows launchers using posy trampolines (#1092)
## Background

In virtual environments, we want to install python programs as console
commands, e.g. `black .` over `python -m black .`. They may be called
[entrypoints](https://packaging.python.org/en/latest/specifications/entry-points/)
or scripts. For entrypoints, we're given a module name and function to
call in that module.

On Unix, we generate a minimal python script launcher. Text files are
runnable on unix by adding a shebang at their top, e.g.

```python
#!/usr/bin/env python
```

will make the operating system run the file with the current python
interpreter. A venv launcher for black in `/home/ferris/colorize/.venv`
(module name: `black`, function to call: `patched_main`) would look like
this:

```python
#!/home/ferris/colorize/.venv/bin/python
# -*- coding: utf-8 -*-
import re
import sys
from black import patched_main
if __name__ == "__main__":
    sys.argv[0] = re.sub(r"(-script\.pyw|\.exe)?$", "", sys.argv[0])
    sys.exit(patched_main())
```

On windows, this doesn't work, we can only rely on launching `.exe`
files.

## Summary

We use posy's rust implementation of a trampoline, which is based on
distlib's c++ implementation. We pre-build a minimal exe and append the
launcher script as stored zip archive behind it. The exe will look for
the venv python interpreter next to it and use it to execute the
appended script.

The changes in this PR make the `black` entrypoint work:

```powershell
cargo run -- venv .venv
cargo run -q -- pip install black
.\.venv\Scripts\black --version
```

Integration with our existing tests will be done in follow-up PRs.

## Implementation and Details

I've vendored the posy trampoline crate. It is a formatted, renamed and
slightly changed for embedding version of
https://github.com/njsmith/posy/pull/28.

The posy launchers are smaller than the distlib launchers, 16K vs 106K
for black. Currently only `x86_64-pc-windows-msvc` is supported. The
crate requires a nightly compiler for its no-std binary size tricks.

On windows, an application can be launched with a console or without (to
create windows instead), which needs two different launchers. The gui
launcher will subsequently use `pythonw.exe` while the console launcher
uses `python.exe`.
2024-01-26 13:54:11 +00:00
Charlie Marsh 69c72b6fa1
Validate wheel metadata against filename (#1002)
Closes #983.
2024-01-19 05:48:55 +00:00
konsti 5051b2c004
Use tempfile to prevent install io race crashes (#929)
On ubuntu and python 3.10,

```
cargo run -q -- pip-install --find-links https://storage.googleapis.com/jax-releases/jax_cuda_releases.html "jax[cuda12_pip]==0.4.23"
```

non-deterministically but for me consistently fails to install with
messages such as

```
error: Failed to install: nvidia_nccl_cu12-2.19.3-py3-none-manylinux1_x86_64.whl (nvidia-nccl-cu12==2.19.3)
  Caused by: failed to remove file `/home/konsti/projects/puffin/.venv/lib/python3.10/site-packages/nvidia/__init__.py`
  Caused by: No such file or directory (os error 2)
```

```
error: Failed to install: nvidia_cublas_cu12-12.3.4.1-py3-none-manylinux1_x86_64.whl (nvidia-cublas-cu12==12.3.4.1)
  Caused by: Replacing an existing file or directory failed
```

```
error: Failed to install: nvidia_cuda_nvcc_cu12-12.3.107-py3-none-manylinux1_x86_64.whl (nvidia-cuda-nvcc-cu12==12.3.107)
  Caused by: failed to hardlink file from /home/konsti/.cache/puffin/wheels-v0/pypi/nvidia-cuda-nvcc-cu12/nvidia_cuda_nvcc_cu12-12.3.107-py3-none-manylinux1_x86_64/nvidia/__init__.py to /home/konsti/projects/puffin/.venv/lib/python3.10/site-packages/nvidia/__init__.py
  Caused by: File exists (os error 17)
```

We install a lot of nvidia package, that all contain
`nvidia/__init__.py`, since they all install themselves into the
`nvidia` module:

```
nvidia-cublas-cu12==12.3.4.1
nvidia-cuda-cupti-cu12==12.3.101
nvidia-cuda-nvcc-cu12==12.3.107
nvidia-cuda-nvrtc-cu12==12.3.107
nvidia-cuda-runtime-cu12==12.3.101
nvidia-cudnn-cu12==8.9.7.29
nvidia-cufft-cu12==11.0.12.1
nvidia-cusolver-cu12==11.5.4.101
nvidia-cusparse-cu12==12.2.0.103
nvidia-nccl-cu12==2.19.3
nvidia-nvjitlink-cu12==12.3.101
```

```
$  tree -L 1 .venv/lib/python3.10/site-packages/nvidia
.venv/lib/python3.10/site-packages/nvidia
├── cublas
├── cuda_cupti
├── cuda_nvcc
├── cuda_nvrtc
├── cuda_runtime
├── cudnn
├── cufft
├── cusolver
├── cusparse
├── __init__.py
├── nccl
└── nvjitlink
```

When installing we get a race condition, each package installation is
its own thread:
* Installer Thread 1 creates `nvidia/__init__.py`
* Installer Thread 2 sees an existing  `nvidia/__init__.py`
* Installer Thread 3 sees an existing  `nvidia/__init__.py`
* Installer Thread 2 removes `nvidia/__init__.py`
* Installer Thread 3 tries to remove `nvidia/__init__.py`, it doesn't
exist anymore -> failure.

We switch to a new strategy: When the target files exists, we don't
remove it, but instead hardlink the source file to a tempfile first,
then renaming the tempfile to the target file. Renaming is considered an
atomic operation.

I've put the logging on debug level because they cases indicate a
conflict between two packages while being rare.

Closes #925

---------

Co-authored-by: Charlie Marsh <charlie.r.marsh@gmail.com>
2024-01-16 21:07:39 +00:00
Charlie Marsh c06bf658bb
Remove some filesystem calls from the installer (#834)
Noticed these when working on something unrelated. Generally:

- Prefer `entry.file_type()` over `entry.path().is_file()` or similar,
as the former is almost always free on Unix.
- Call `entry.path()` once, since it allocates internally (returns a
`PathBuf`).
2024-01-08 12:59:01 -05:00
konsti 3f587156ec
Improve install instrumentation (#829)
Add tracing spans to different phases of the wheel installation.
2024-01-08 10:13:59 +00:00
Charlie Marsh 02b157085e
Add INSTALLER file to install-wheel-rs (#760)
See:
https://packaging.python.org/en/latest/specifications/recording-installed-packages/#the-installer-file
2024-01-03 17:30:54 -05:00
konsti 26f597a787
Add spans to all significant tasks (#740)
I've tried to investigate puffin's performance wrt to builds and
parallelism in general, but found the previous instrumentation to
granular. I've tried to add spans to every function that either needs
noticeable io or cpu resources without creating duplication. This also
fixes some wrong tracing usage on async functions
(https://docs.rs/tracing/latest/tracing/struct.Span.html#in-asynchronous-code)
and some spans that weren't actually entered.
2024-01-02 16:17:03 +00:00
Charlie Marsh af13c83177
Overwrite individual files when reflinking (#556)
Similar to #516, but for individual files.

## Test Plan

Ran:

```sh
cargo run -p puffin-cli -- pip-uninstall plaid-python
mkdir -p /Users/crmarsh/workspace/puffin/.venv/lib/python3.10/site-packages/tests
echo "x=1" > /Users/crmarsh/workspace/puffin/.venv/lib/python3.10/site-packages/tests/__init__.py
cargo run -p puffin-cli -- pip-sync requirements.txt --no-cache --verbose
```
2023-12-04 23:59:35 +00:00
Charlie Marsh 3b55d0b295
Deduplicate various `.dist-info/METADATA` read implementations (#531)
Closes https://github.com/astral-sh/puffin/issues/484.
2023-12-03 21:29:00 -05:00
Zanie Blue 5f1f207628
Recursively merge existing package directories on installation (#516)
Previously, when installing a package we would delete the target
directory before copying (or linking) the contents of the package.
However, this means that we do not properly support namespace packages
which can share a target directory. Instead the last package to be
installed would be override existing packages. Since we install packages
in parallel, this could result in a race condition where the target
directory already exists which is not allowed when using `clonefile`.
See example error in #515.
c7e63d2dce
provides a regression test for this — it fails on `main`.

Here, we implement a recursive merge when the target directory already
exists. Both packages will be installed into the same directory. We no
longer delete the target directory, which seems okay since we uninstall
packages before installing now.

When files conflict, we will likely throw an error still. The correct
behavior to implement in this case is unclear, as if we just take "first
write wins" or "last write wins" we could end up with some files from
one package and some from another resulting in two broken packages. A
possible solution here is to lock the target directories while copying.
2023-11-30 10:14:51 -06:00
konsti 6841c06e2d
Show error paths in install-wheel-rs (#514)
Ensure that we consistently show a path for all io errors in
install-wheel-rs either (preferred) through `fs_err`, or as fallback by
a custom error type. For zip reading errors, we rely on the caller to
add the name and/or location of the wheel.
2023-11-29 20:14:34 +01:00
Charlie Marsh 06b312de7e
Overwrite existing files when hardlinking (#402)
## Summary

Closes https://github.com/astral-sh/puffin/issues/390.

## Test Plan

Installed `jupyter_core==5.5.0`, then removed the `jupyter_core` and
`jupyter_core-5.5.0.dist-info` directories from my virtualenv manually,
but left `jupyter.py`. I then re-ran `puffin pip-compile`, and verified
that it errored on `main` but succeeded here.
2023-11-10 20:24:19 +00:00
Charlie Marsh 56a4b51eb6
Refactor hardlink fallback to use an enum (#401)
Makes an invalid state unrepresentable (`first_try_hard_linking = true`,
`use_copy_fallback` = true`).
2023-11-10 15:18:51 -05:00
konsti d407bbbee6
Special case missing header build errors (on linux) (#354)
One of the most common errors i observed are build failures due to
missing header files. On ubuntu, this generally means that you need to
install some `<...>-dev` package that the documentation tells you about,
e.g. [mysqlclient](https://github.com/PyMySQL/mysqlclient#linux) needs
`default-libmysqlclient-dev`, [some psycopg
versions](https://www.psycopg.org/psycopg3/docs/basic/install.html#local-installation)
(i remember that this was always required at some earlier point) require
`libpq-dev` and pygraphviz wants `graphviz-dev`. This is quite common
for many scientific packages (where conda has an advantage because they
can provide those package as a dependency).

The error message can be completely inscrutable if you're just a python
programmer (or user) and not a c programmer (example: pygraphviz):

```
warning: no files found matching '*.png' under directory 'doc'
warning: no files found matching '*.txt' under directory 'doc'
warning: no files found matching '*.css' under directory 'doc'
warning: no previously-included files matching '*~' found anywhere in distribution
warning: no previously-included files matching '*.pyc' found anywhere in distribution
warning: no previously-included files matching '.svn' found anywhere in distribution
no previously-included directories found matching 'doc/build'
pygraphviz/graphviz_wrap.c:3020:10: fatal error: graphviz/cgraph.h: No such file or directory
 3020 | #include "graphviz/cgraph.h"
      |          ^~~~~~~~~~~~~~~~~~~
compilation terminated.
error: command '/usr/bin/gcc' failed with exit code 1
```

The only relevant part is `Fatal error: graphviz/cgraph.h: No such file
or directory`. Why is this file not there and how do i get it to be
there?

This is even harder to spot in pip's output, where it's 11 lines above
the last line:


![image](https://github.com/astral-sh/puffin/assets/6826232/7a3d7279-e7b1-4511-ab22-d0a35be5e672)

I've special cased missing headers and made sure that the last line
tells you the important information: We're missing some header, please
check the documentation of {package} {version} for what to install:


![image](https://github.com/astral-sh/puffin/assets/6826232/4bbb8923-5a82-472f-ab1f-9e1471aa2896)

Scrolling up:


![image](https://github.com/astral-sh/puffin/assets/6826232/89a2495a-e188-4288-b534-ad885ee08763)

The difference gets even clearer with a default ubuntu terminal with its
80 columns:


![image](https://github.com/astral-sh/puffin/assets/6826232/49fb27bc-07c6-4b10-a1a1-30ec8e112438)

---

Note that the situation is better for a missing compiler, there i get:

```
[...]
warning: no previously-included files matching '*~' found anywhere in distribution
warning: no previously-included files matching '*.pyc' found anywhere in distribution
warning: no previously-included files matching '.svn' found anywhere in distribution
no previously-included directories found matching 'doc/build'
error: command 'gcc' failed: No such file or directory
---
```
Putting the last line into google, the first two results tell me to
`sudo apt-get install gcc`, the third even tells me about `sudo apt
install build-essential`
2023-11-08 15:26:39 +00:00
Charlie Marsh b013ea9c93
Move `DirectUrl` into `pypi-types` (#343)
This needs to be reused elsewhere, and there's nothing specific to wheel
installation about it.
2023-11-06 18:26:33 +00:00
Charlie Marsh d9bcfafa16
Write `direct_url.json` in wheel installer (#337)
## Summary

This PR just adds the logic in `install-wheel-rs` to write
`direct_url.json`. We're not actually taking advantage of it yet (or
wiring it through) in Puffin.

Part of https://github.com/astral-sh/puffin/issues/332.
2023-11-06 17:09:28 +00:00
konsti c6aa1cd7a3
Only fall back to copy when the first hard linking failed (#268)
Hard linking might not be supported but we (afaik) can't detect this
ahead of time, so we'll try hard linking the first file, if this
succeeds we'll know later hard linking errors are not due to lack of
os/fs support, if it fails we'll switch to copying for the rest of the
install. Follow up to
https://github.com/astral-sh/puffin/pull/237#discussion_r1376705137
2023-11-01 18:35:52 +01:00
konsti 35d6bd761b
Fallback to copy if hardlinking failed (#237) 2023-10-30 19:10:01 +00:00
Charlie Marsh ffbf6b6c16
Avoid symlinking `RECORD` file (#227)
This is the one file that gets modified during installation. Hardlinking
it is bad!
2023-10-30 03:10:38 +00:00
konsti 889f6173cc
Unify python interpreter abstractions (#178)
Previously, we had two python interpreter metadata structs, one in
gourgeist and one in puffin. Both would spawn a subprocess to query
overlapping metadata and both would appear in the cli crate, if you
weren't careful you could even have to different base interpreters at
once. This change unifies this to one set of metadata, queried and
cached once.

Another effect of this crate is proper separation of python interpreter
and venv. A base interpreter (such as `/usr/bin/python/`, but also pyenv
and conda installed python) has a set of metadata. A venv has a root and
inherits the base python metadata except for `sys.prefix`, which unlike
`sys.base_prefix`, gets set to the venv root. From the root and the
interpreter info we can compute the paths inside the venv. We can reuse
the interpreter info of the base interpreter when creating a venv
without having to query the newly created `python`.
2023-10-25 20:11:36 +00:00
Charlie Marsh 49a27ff33c
Add support for parameterized link modes (#164)
Allows the user to select between clone, hardlink, and copy semantics
for installs. (The pnpm documentation has a decent description of what
these mean: https://pnpm.io/npmrc#package-import-method.)

Closes #159.
2023-10-22 04:35:50 +00:00