## Introduction
PEP 621 is limited. Specifically, it lacks
* Relative path support
* Editable support
* Workspace support
* Index pinning or any sort of index specification
The semantics of urls are a custom extension, PEP 440 does not specify
how to use git references or subdirectories, instead pip has a custom
stringly format. We need to somehow support these while still stying
compatible with PEP 621.
## `tool.uv.source`
Drawing inspiration from cargo, poetry and rye, we add `tool.uv.sources`
or (for now stub only) `tool.uv.workspace`:
```toml
[project]
name = "albatross"
version = "0.1.0"
dependencies = [
"tqdm >=4.66.2,<5",
"torch ==2.2.2",
"transformers[torch] >=4.39.3,<5",
"importlib_metadata >=7.1.0,<8; python_version < '3.10'",
"mollymawk ==0.1.0"
]
[tool.uv.sources]
tqdm = { git = "https://github.com/tqdm/tqdm", rev = "cc372d09dcd5a5eabdc6ed4cf365bdb0be004d44" }
importlib_metadata = { url = "https://github.com/python/importlib_metadata/archive/refs/tags/v7.1.0.zip" }
torch = { index = "torch-cu118" }
mollymawk = { workspace = true }
[tool.uv.workspace]
include = [
"packages/mollymawk"
]
[tool.uv.indexes]
torch-cu118 = "https://download.pytorch.org/whl/cu118"
```
See `docs/specifying_dependencies.md` for a detailed explanation of the
format. The basic gist is that `project.dependencies` is what ends up on
pypi, while `tool.uv.sources` are your non-published additions. We do
support the full range or PEP 508, we just hide it in the docs and
prefer the exploded table for easier readability and less confusing with
actual url parts.
This format should eventually be able to subsume requirements.txt's
current use cases. While we will continue to support the legacy `uv pip`
interface, this is a piece of the uv's own top level interface. Together
with `uv run` and a lockfile format, you should only need to write
`pyproject.toml` and do `uv run`, which generates/uses/updates your
lockfile behind the scenes, no more pip-style requirements involved. It
also lays the groundwork for implementing index pinning.
## Changes
This PR implements:
* Reading and lowering `project.dependencies`,
`project.optional-dependencies` and `tool.uv.sources` into a new
requirements format, including:
* Git dependencies
* Url dependencies
* Path dependencies, including relative and editable
* `pip install` integration
* Error reporting for invalid `tool.uv.sources`
* Json schema integration (works in pycharm, see below)
* Draft user-level docs (see `docs/specifying_dependencies.md`)
It does not implement:
* No `pip compile` testing, deprioritizing towards our own lockfile
* Index pinning (stub definitions only)
* Development dependencies
* Workspace support (stub definitions only)
* Overrides in pyproject.toml
* Patching/replacing dependencies
One technically breaking change is that we now require user provided
pyproject.toml to be valid wrt to PEP 621. Included files still fall
back to PEP 517. That means `pip install -r requirements.txt` requires
it to be valid while `pip install -r requirements.txt` with `-e .` as
content falls back to PEP 517 as before.
## Implementation
The `pep508` requirement is replaced by a new `UvRequirement` (name up
for bikeshedding, not particularly attached to the uv prefix). The still
existing `pep508_rs::Requirement` type is a url format copied from pip's
requirements.txt and doesn't appropriately capture all features we
want/need to support. The bulk of the diff is changing the requirement
type throughout the codebase.
We still use `VerbatimUrl` in many places, where we would expect a
parsed/decomposed url type, specifically:
* Reading core metadata except top level pyproject.toml files, we fail a
step later instead if the url isn't supported.
* Allowed `Urls`.
* `PackageId` with a custom `CanonicalUrl` comparison, instead of
canonicalizing urls eagerly.
* `PubGrubPackage`: We eventually convert the `VerbatimUrl` back to a
`Dist` (`Dist::from_url`), instead of remembering the url.
* Source dist types: We use verbatim url even though we know and require
that these are supported urls we can and have parsed.
I tried to make improve the situation be replacing `VerbatimUrl`, but
these changes would require massive invasive changes (see e.g.
https://github.com/astral-sh/uv/pull/3253). A main problem is the ref
`VersionOrUrl` and applying overrides, which assume the same
requirement/url type everywhere. In its current form, this PR increases
this tech debt.
I've tried to split off PRs and commits, but the main refactoring is
still a single monolith commit to make it compile and the tests pass.
## Demo
Adding
d1ae3b85d5/pyproject.json
as json schema (v7) to pycharm for `pyproject.toml`, you can try the IDE
support already:

[dove.webm](https://github.com/astral-sh/uv/assets/6826232/c293c272-c80b-459d-8c95-8c46a8d198a1)
In *some* places in our crates, `serde` (and `rkyv`) are optional
dependencies. I believe this was done out of reasons of "good sense,"
that is, it follows a Rust ecosystem pattern where serde integration
tends to be an opt-in crate feature. (And similarly for `rkyv`.)
However, ultimately, `uv` itself requires `serde` and `rkyv` to
function. Since our crates are strictly internal, there are limited
consumers for our crates without `serde` (and `rkyv`) enabled. I think
one possibility is that optional `serde` (and `rkyv`) integration means
that someone can do this:
cargo test -p pep440_rs
And this will run tests _without_ `serde` or `rkyv` enabled. That in
turn could lead to faster iteration time by reducing compile times. But,
I'm not sure this is worth supporting. The iterative compilation times
of
individual crates are probably fast enough in debug mode, even with
`serde` and `rkyv` enabled. Namely, `serde` and `rkyv` themselves
shouldn't need to be re-compiled in most cases. On `main`:
```
from-scratch: `cargo test -p pep440_rs --lib` 0.685
incremental: `cargo test -p pep440_rs --lib` 0.278s
from-scratch: `cargo test -p pep440_rs --features serde,rkyv --lib` 3.948s
incremental: `cargo test -p pep440_rs --features serde,rkyv --lib` 0.321s
```
So while a from-scratch build does take significantly longer, an
incremental build is about the same.
The benefit of doing this change is two-fold:
1. It brings out crates into alignment with "reality." In particular,
some crates were _implicitly_ relying on `serde` being enabled
without explicitly declaring it. This technically means that our
`Cargo.toml`s were wrong in some cases, but it is hard to observe it
because of feature unification in a Cargo workspace.
2. We no longer need to deal with the cognitive burden of writing
`#[cfg_attr(feature = "serde", ...)]` everywhere.
Previously, uv-auth would fail to compile due to a missing process
feature. I chose to make all tokio features we use top level features,
so we can share the tokio cache between all test invocations.
## Summary
I'm surprised we haven't hit this before, but apparently we don't allow
comments after `--index-url`, `-e` entries, etc., in the
requirements.txt parser.
Closes#3011.
## Test Plan
`cargo test`
Needed to prevent circular dependencies in my toolchain work (#2931). I
think this is probably a reasonable change as we move towards persistent
configuration too?
Unfortunately `BuildIsolation` needs to be in `uv-types` to avoid
circular dependencies still. We might be able to resolve that in the
future.
## Summary
`uv` was failing to install requirements defined like:
```
file://localhost/Users/crmarsh/Downloads/iniconfig-2.0.0-py3-none-any.whl
```
Closes https://github.com/astral-sh/uv/issues/2652.
## Summary
First piece of https://github.com/astral-sh/uv/issues/313. In order to
support unnamed requirements, we need to be able to parse them in
`requirements-txt`, which in turn means that we need to introduce a new
type that's distinct from `pep508::Requirement`, given that these
_aren't_ PEP 508-compatible requirements.
Part of: https://github.com/astral-sh/uv/issues/313.
Scott schafer got me the idea: We can avoid repeating the path for
workspaces dependencies everywhere if we declare them in the virtual
package once and treat them as workspace dependencies from there on.
## Summary
This PR changes our user-facing representation for paths to use relative
paths, when the path is within the current working directory. This
mirrors what we do in Ruff. (If the path is _outside_ the current
working directory, we print an absolute path.)
Before:
```shell
❯ uv venv .venv2
Using Python 3.12.2 interpreter at: /Users/crmarsh/workspace/uv/.venv/bin/python3
Creating virtualenv at: .venv2
Activate with: source .venv2/bin/activate
```
After:
```shell
❯ cargo run venv .venv2
Finished dev [unoptimized + debuginfo] target(s) in 0.15s
Running `target/debug/uv venv .venv2`
Using Python 3.12.2 interpreter at: .venv/bin/python3
Creating virtualenv at: .venv2
Activate with: source .venv2/bin/activate
```
Note that we still want to use the existing `.simplified_display()`
anywhere that the path is being simplified, but _still_ intended for
machine consumption (e.g., when passing to `.current_dir()`).
## Summary
I tried out `cargo shear` to see if there are any unused dependencies
that `cargo udeps` isn't reporting. It turned out, there are a few. This
PR removes those dependencies.
## Test Plan
`cargo build`
## Summary
This PR ensures that we expand environment variables _before_ sniffing
for the URL scheme (e.g., `file://` vs. `https://` vs. something else).
Closes https://github.com/astral-sh/uv/issues/2375.
## Summary
We now initialize an HTTP client in advance for remote requirements
files. It turns out this adds a significant overhead, even for
operations like auditing the environment (at least on macOS).
This PR makes initialization lazy. After a lot of evaluation, I took the
easiest route, which is: we just pass in `Connectivity`, and then use
the default HTTP client. So we won't respect netrc files and anything
else that we get from our registry client. If we want to keep using the
registry client, we _can_, it's just way more ceremony to pass down a
closure.
See: https://github.com/astral-sh/uv/issues/2346.
## Test Plan
- Verified that `cargo run pip compile
https://raw.githubusercontent.com/ansible/ansible/f1ded0f41759235eb15a7d13dbc3c95dce5d5acd/requirements.txt`
completed without error.
- Verified that `cargo run pip compile
https://raw.githubusercontent.com/ansible/ansible/f1ded0f41759235eb15a7d13dbc3c95dce5d5acd/requirements.txt
--offline` failed with an error.
- Verified that `./target/release/uv pip install requests` completed in
0-2ms, rather than hundreds.
This PR tweaks uv to support reading `requirements.txt` regardless of
whether it is encoded as UTF-8 or UTF-16. This is particularly relevant
on Windows where `uv pip freeze > requirements.txt` will likely write a
UTF-16 encoded `requirements.txt` file.
There is some discussion on #1666 where it's suggested that perhaps
we should explicitly not support this. I didn't see that until I
had already put this PR together, but even so, I think it's worth
considering this. UTF-16 is predominant on Windows. It is very easy
to produce a UTF-16 encoded file. Moreover, there is an easy and well
specified way to recognize and transcode UTF-16 encoded data to UTF-8.
I think the downside of this is that it could encourage the use UTF-16
encoded `requirements.txt` files *in addition* to UTF-8 encoded
files, and it would probably be nice to converge and standardize on
one encoding. One possible alternative to this PR is that we provide
a better error message. Another alternative is to ensure that a
`-o/--output` flag exists for all commands (neither `uv pip freeze` nor
`pip freeze` have such a flag) so that users can always write output
to a file without relying on their environment's piping behavior.
(Although this last alternative seems a little sad to me.)
It's also worth noting the [PEP-0508] doesn't seem to mention file
encoding at all. So I think from a "do the standards allow this"
perspective, this change is OK.
Finally, `pip` itself seems to work with UTF-16 encoded
`requirements.txt` files.
I think I personally overall lean towards supporting UTF-16 for
`requirements.txt` files. In part because I think it smoothes out the
UX a little bit, in part because there is no obvious specification
(that I'm aware of) that mandates that these files are UTF-8, and
finally in part because `pip` supports it too.
Fixes#1666, Fixes#2276
[PEP-0508]: https://peps.python.org/pep-0508/
## Summary
Allow using http(s) urls for constraints and requirements files handed
to the CLI, by handling paths starting with `http://` or `https://`
differently. This allows commands for such as: `uv pip install -c
https://raw.githubusercontent.com/apache/airflow/constraints-2.8.1/constraints-3.8.txt
requests`.
closes#1332
## Test Plan
Testing install using a `constraints.txt` file hosted on github in the
airflow repository:
fbdc2eba8e/crates/uv/tests/pip_install.rs (L1440-L1484)
## Advice Needed
- filesystem/http dispatch is implemented at a relatively low level (at
`crates/uv-fs/src/lib.rs#read_to_string`). Should I change some naming
here so it is obvious that the function is able to dispatch?
- I kept the CLI argument for -c and -r as a PathBuf, even though now it
is technically either a path or a url. We could either keep this as is
for now, or implement a new enum for this case? The enum could then
handle dispatch to files/http.
- Using another abstraction layer like
https://docs.rs/object_store/latest/object_store/ for the
files/urls/[s3] could work as well, though I ran into a bug during
testing which I couldn't debug
## Summary
This PR expands environment variables in `-r` and `-c` paths _within_
requirements files. We already do this for `@` URL references and
others.
Closes https://github.com/astral-sh/uv/issues/1473.
## Summary
Closes#2012
This changes `RequirementsTxtParserError::Parser` to take a line, column
instead of cursor location to improve reporting of parser errors. A new
function was added to compute the line and column based on the content
and cursor location when a parser error occurs for simplicity.
Given `uv pip compile .\requirements.txt` of below
```
numpy>=1,<2
--borken
tqdm
```
Before:
```
error: Unexpected '-', expected '-c', '-e', '-r' or the start of a requirement in `.\requirements.txt` at position 14
```
After:
```
error: Unexpected '-', expected '-c', '-e', '-r' or the start of a requirement in `.\requirements.txt` at position 2:3
```
Open Question: Do we want to support `line:column` for other types of
errors? I didn't look dig other potential error types where this might
be desired.
## Test Plan
New test was added to `requirements-txt` crate with this example.
## Summary
This also preserves the environment variables in the output file, e.g.:
```
Resolved 1 package in 216ms
# This file was autogenerated by uv via the following command:
# uv pip compile requirements.in --emit-index-url
--index-url https://test.pypi.org/${SUFFIX}
requests==2.5.4.1
```
I'm torn on whether that's correct or undesirable here.
Closes#2035.
Address a few pedantic lints
lints are separated into separate commits so they can be reviewed
individually.
I've not added enforcement for any of these lints, but that could be
added if desirable.
## Summary
The main change is that we need to have an explicit list of protocols we
_do_ support (like `https`), so that when we see a Windows absolute path
(`C:\...`), we don't treat the `C` as a protocol itself.
Closes https://github.com/astral-sh/uv/issues/1539.
## Summary
In a `requirements.txt` file, it turns out that the `-c` and `-r`
entries should be interpreted as relative to the file in which they're
declared, while the `-e` entries should be interpreted as relative to
the current working directory, no matter where they're defined.
Previously, we always used the current working directory; now, we use
the declaring file's directory for `-c` and `-r`.
Closes https://github.com/astral-sh/uv/issues/1367.
Closes https://github.com/astral-sh/uv/issues/1416.
First, replace all usages in files in-place. I used my editor for this.
If someone wants to add a one-liner that'd be fun.
Then, update directory and file names:
```
# Run twice for nested directories
find . -type d -print0 | xargs -0 rename s/puffin/uv/g
find . -type d -print0 | xargs -0 rename s/puffin/uv/g
# Update files
find . -type f -print0 | xargs -0 rename s/puffin/uv/g
```
Then add all the files again
```
# Add all the files again
git add crates
git add python/uv
# This one needs a force-add
git add -f crates/uv-trampoline
```
## Summary
For PEP 517 builds, the current working directory needs to be set to the
directory of the source distribution. It turns out that on Windows, if
you use a UNC path for the working directory, then relative paths are
interpreted relative to the root of the current drive
([source](https://www.fileside.app/blog/2023-03-17_windows-file-paths/#paths-relative-to-the-root-of-the-current-drive)).
So, when builds attempted to resolve relative paths, they always
errored...
This PR ensures that we remove the UNC prefix when setting the current
working directory.
Closes#1238.
## Test Plan
I tested this on my Windows machine by installing `ujson` with
`--no-binary ujson`. (I don't want to add that specific test, since it's
really slow to build.)
## Summary
Like https://github.com/astral-sh/puffin/pull/1180, this PR adds logic
for `requirements.txt` parsing whereby if a requirement _looks like_ a
local requirements file or an editable directory, we prompt the user to
correct the error (typically, by adding `-r`).
## Summary
See: https://github.com/astral-sh/puffin/issues/1181.
## Test Plan
```
❯ cargo run -- pip install packse@../../zanieb/packse
Finished dev [unoptimized + debuginfo] target(s) in 0.15s
Running `target/debug/puffin pip install 'packse@../../zanieb/packse'`
error: Distribution not found at: file:///Users/crmarsh/zanieb/packse
```
## Summary
This PR adds support for `--find-links`, `--index-url`, and
`--extra-index-url` arguments when specified in a `requirements.txt`.
It's a mostly-straightforward change. The only uncertain piece is what
to do when multiple files include these flags, and/or when we include
them on the CLI and in other files.
In general:
- If _anything_ specifies `--no-index`, we respect it.
- We combine all `--extra-index-url` and `--find-links` across all
sources, since those are just vectors.
- If we see multiple `--index-url` in requirements files, we error.
- We respect the `--index-url` from the command line over any provided
in a requirements file.
(`pip-compile` seems to just pick one semi-arbitrarily when multiple are
provided.)
Closes https://github.com/astral-sh/puffin/issues/1143.
This PR adds initial support for [rkyv] to puffin. In particular,
the main aim here is to make puffin-client's `SimpleMetadata` type
possible to deserialize from a `&[u8]` without doing any copies. This
PR **stops short of actuallying doing that zero-copy deserialization**.
Instead, this PR is about adding the necessary trait impls to a variety
of types, along with a smattering of small refactorings to make rkyv
possible to use.
For those unfamiliar, rkyv works via the interplay of three traits:
`Archive`, `Serialize` and `Deserialize`. The usual flow of things is
this:
* Make a type `T` implement `Archive`, `Serialize` and `Deserialize`.
rkyv
helpfully provides `derive` macros to make this pretty painless in most
cases.
* The process of implementing `Archive` for `T` *usually* creates an
entirely
new distinct type within the same namespace. One can refer to this type
without naming it explicitly via `Archived<T>` (where `Archived` is a
clever
type alias defined by rkyv).
* Serialization happens from `T` to (conceptually) a `Vec<u8>`. The
serialization format is specifically designed to reflect the in-memory
layout
of `Archived<T>`. Notably, *not* `T`. But `Archived<T>`.
* One can then get an `Archived<T>` with no copying (albeit, we will
likely
need to incur some cost for validation) from the previously created
`&[u8]`.
This is quite literally [implemented as a pointer cast][rkyv-ptr-cast].
* The problem with an `Archived<T>` is that it isn't your `T`. It's
something
else. And while there is limited interoperability between a `T` and an
`Archived<T>`, the main issue is that the surrounding code generally
demands
a `T` and not an `Archived<T>`. **This is at the heart of the tension
for
introducing zero-copy deserialization, and this is mostly an intrinsic
problem to the technique and not an rkyv-specific issue.** For this
reason,
given an `Archived<T>`, one can get a `T` back via an explicit
deserialization step. This step is like any other kind of
deserialization,
although generally faster since no real "parsing" is required. But it
will
allocate and create all necessary objects.
This PR largely proceeds by deriving the three aforementioned traits
for `SimpleMetadata`. And, of course, all of its type dependencies. But
we stop there for now.
The main issue with carrying this work forward so that rkyv is actually
used to deserialize a `SimpleMetadata` is figuring out how to deal
with `DataWithCachePolicy` inside of the cached client. Ideally, this
type would itself have rkyv support, but adding it is difficult. The
main difficulty lay in the fact that its `CachePolicy` type is opaque,
not easily constructable and is internally the tip of the iceberg of
a rat's nest of types found in more crates such as `http`. While one
"dumb"-but-annoying approach would be to fork both of those crates
and add rkyv trait impls to all necessary types, it is my belief that
this is the wrong approach. What we'd *like* to do is not just use
rkyv to deserialize a `DataWithCachePolicy`, but we'd actually like to
get an `Archived<DataWithCachePolicy>` and make actual decisions used
the archived type directly. Doing that will require some work to make
`Archived<DataWithCachePolicy>` directly useful.
My suspicion is that, after doing the above, we may want to mush
forward with a similar approach for `SimpleMetadata`. That is, we want
`Archived<SimpleMetadata>` to be as useful as possible. But right
now, the structure of the code demands an eager conversion (and thus
deserialization) into a `SimpleMetadata` and then into a `VersionMap`.
Getting rid of that eagerness is, I think, the next step after dealing
with `DataWithCachePolicy` to unlock bigger wins here.
There are many commits in this PR, but most are tiny. I still encourage
review to happen commit-by-commit.
[rkyv]: https://rkyv.org/
[rkyv-ptr-cast]:
https://docs.rs/rkyv/latest/src/rkyv/util/mod.rs.html#63-68