Recursively resolve direct URL references upfront (#2684)

## Summary

This PR would enable us to support transitive URL requirements. The key
idea is to leverage the fact that...

- URL requirements can only come from URL requirements.
- URL requirements identify a _specific_ version, and so don't require
backtracking.

Prior to running the "real" resolver, we recursively resolve any URL
requirements, and collect all the known URLs upfront, then pass those to
the resolver as "lookahead" requirements. This means the resolver knows
upfront that if a given package is included, it _must_ use the provided
URL.

Closes https://github.com/astral-sh/uv/issues/1808.
This commit is contained in:
Charlie Marsh 2024-04-01 17:16:20 -04:00 committed by GitHub
parent f60e7495ca
commit a48bcaecb1
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194
7 changed files with 107 additions and 97 deletions

View File

@ -38,41 +38,6 @@ Instead, uv supports its own environment variables, like `UV_INDEX_URL`. In the
also support persistent configuration in its own configuration file format (e.g., `pyproject.toml` also support persistent configuration in its own configuration file format (e.g., `pyproject.toml`
or `uv.toml` or similar). For more, see [#651](https://github.com/astral-sh/uv/issues/651). or `uv.toml` or similar). For more, see [#651](https://github.com/astral-sh/uv/issues/651).
## Transitive direct URL dependencies
While uv does support direct URL dependencies (e.g., `black @ https://...`), it does not support
nested (or "transitive") direct URL dependencies, instead requiring that any direct URL dependencies
are declared upfront.
For example, if `black @ https://...` itself had a dependency on `toml @ https://...`, uv would
reject the transitive direct URL dependency on `toml` and require that `toml` be declared as a
dependency in the `pyproject.toml` file, like:
```toml
# pyproject.toml
dependencies = [
"black @ https://...",
"toml @ https://...",
]
```
This is a deliberate choice to avoid the correctness and security issues associated with allowing
transitive dependencies to introduce arbitrary URLs into the dependency graph.
For example:
- Your package depends on `package_a==1.0.0`.
- Your package depends on `package_b==1.0.0`.
- `package_b==1.0.0` depends on `package_a @ https://...`.
If `package_a @ https://...` happens to resolve to version `1.0.0`, `pip` would install `package_a`
from the direct URL. This is a security issue, since the direct URL could be controlled by an
attacker, and a correctness issue, since the direct URL could resolve to an entirely different
package with the same name and version.
In the future, uv may allow transitive URL dependencies in some form (e.g., with user opt-in).
For more, see [#1808](https://github.com/astral-sh/uv/issues/1808).
## Pre-release compatibility ## Pre-release compatibility
By default, uv will accept pre-release versions during dependency resolution in two cases: By default, uv will accept pre-release versions during dependency resolution in two cases:
@ -167,6 +132,23 @@ In the future, uv will support pinning packages to dedicated indexes (see: [#171
Additionally, [PEP 708](https://peps.python.org/pep-0708/) is a provisional standard that aims to Additionally, [PEP 708](https://peps.python.org/pep-0708/) is a provisional standard that aims to
address the "dependency confusion" issue across package registries and installers. address the "dependency confusion" issue across package registries and installers.
## Transitive direct URL dependencies for constraints and overrides
While uv does support URL dependencies (e.g., `black @ https://...`), it does not support
_transitive_ (i.e., "nested") direct URL dependencies for constraints and overrides.
Specifically, if a constraint or override is defined using a direct URL dependency, and the
constrained package has a direct URL dependency of its own, uv _may_ reject that transitive direct
URL dependency during resolution.
uv also makes the assumption that non-URL dependencies won't introduce URL dependencies (i.e., that
dependencies fetched from a registry will not themselves have direct URL dependencies). If a non-URL
dependency _does_ introduce a URL dependency, uv will reject the URL dependency during resolution.
If uv rejects a transitive URL dependency in either case, the best course of action is to provide
the URL dependency as a direct dependency in the `requirements.in` file, rather than as a
constraint, override, or transitive dependency.
## Virtual environments by default ## Virtual environments by default
`uv pip install` and `uv pip sync` are designed to work with virtual environments by default. `uv pip install` and `uv pip sync` are designed to work with virtual environments by default.

View File

@ -1,29 +1,34 @@
use std::collections::VecDeque; use std::collections::VecDeque;
use std::sync::Arc;
use anyhow::Result; use anyhow::{Context, Result};
use futures::stream::FuturesUnordered; use futures::stream::FuturesUnordered;
use futures::StreamExt; use futures::StreamExt;
use rustc_hash::FxHashSet;
use distribution_types::{BuildableSource, Dist, LocalEditable}; use distribution_types::{Dist, LocalEditable};
use pep508_rs::{MarkerEnvironment, Requirement, VersionOrUrl}; use pep508_rs::{MarkerEnvironment, Requirement, VersionOrUrl};
use pypi_types::Metadata23; use pypi_types::Metadata23;
use uv_client::RegistryClient; use uv_client::RegistryClient;
use uv_distribution::{Reporter, SourceDistributionBuilder}; use uv_distribution::{DistributionDatabase, Reporter};
use uv_types::{BuildContext, Constraints, Overrides, RequestedRequirements}; use uv_types::{BuildContext, Constraints, Overrides, RequestedRequirements};
/// A resolver for resolving lookahead requirements from local dependencies. /// A resolver for resolving lookahead requirements from direct URLs.
/// ///
/// The resolver extends certain privileges to "first-party" requirements. For example, first-party /// The resolver extends certain privileges to "first-party" requirements. For example, first-party
/// requirements are allowed to contain direct URL references, local version specifiers, and more. /// requirements are allowed to contain direct URL references, local version specifiers, and more.
/// ///
/// We make an exception for transitive requirements of _local_ dependencies. For example, /// The lookahead resolver resolves requirements recursively for direct URLs, so that the resolver
/// `pip install .` should treat the dependencies of `.` as if they were first-party dependencies. /// can treat them as first-party dependencies for the purpose of analyzing their specifiers.
/// This matches our treatment of editable installs (`pip install -e .`). /// Namely, this enables transitive direct URL dependencies, since we can tell the resolver all of
/// the known URLs upfront.
/// ///
/// The lookahead resolver resolves requirements for local dependencies, so that the resolver can /// This strategy relies on the assumption that direct URLs are only introduced by other direct
/// treat them as first-party dependencies for the purpose of analyzing their specifiers. /// URLs, and not by PyPI dependencies. (If a direct URL _is_ introduced by a PyPI dependency, then
pub struct LookaheadResolver<'a> { /// the resolver will (correctly) reject it later on with a conflict error.) Further, it's only
/// possible because a direct URL points to a _specific_ version of a package, and so we know that
/// any correct resolution will _have_ to include it (unlike with PyPI dependencies, which may
/// require a range of versions and backtracking).
pub struct LookaheadResolver<'a, Context: BuildContext + Send + Sync> {
/// The direct requirements for the project. /// The direct requirements for the project.
requirements: &'a [Requirement], requirements: &'a [Requirement],
/// The constraints for the project. /// The constraints for the project.
@ -32,46 +37,43 @@ pub struct LookaheadResolver<'a> {
overrides: &'a Overrides, overrides: &'a Overrides,
/// The editable requirements for the project. /// The editable requirements for the project.
editables: &'a [(LocalEditable, Metadata23)], editables: &'a [(LocalEditable, Metadata23)],
/// The reporter to use when building source distributions. /// The database for fetching and building distributions.
reporter: Option<Arc<dyn Reporter>>, database: DistributionDatabase<'a, Context>,
} }
impl<'a> LookaheadResolver<'a> { impl<'a, Context: BuildContext + Send + Sync> LookaheadResolver<'a, Context> {
/// Instantiate a new [`LookaheadResolver`] for a given set of requirements. /// Instantiate a new [`LookaheadResolver`] for a given set of requirements.
pub fn new( pub fn new(
requirements: &'a [Requirement], requirements: &'a [Requirement],
constraints: &'a Constraints, constraints: &'a Constraints,
overrides: &'a Overrides, overrides: &'a Overrides,
editables: &'a [(LocalEditable, Metadata23)], editables: &'a [(LocalEditable, Metadata23)],
context: &'a Context,
client: &'a RegistryClient,
) -> Self { ) -> Self {
Self { Self {
requirements, requirements,
constraints, constraints,
overrides, overrides,
editables, editables,
reporter: None, database: DistributionDatabase::new(client, context),
} }
} }
/// Set the [`Reporter`] to use for this resolver. /// Set the [`Reporter`] to use for this resolver.
#[must_use] #[must_use]
pub fn with_reporter(self, reporter: impl Reporter + 'static) -> Self { pub fn with_reporter(self, reporter: impl Reporter + 'static) -> Self {
let reporter: Arc<dyn Reporter> = Arc::new(reporter);
Self { Self {
reporter: Some(reporter), database: self.database.with_reporter(reporter),
..self ..self
} }
} }
/// Resolve the requirements from the provided source trees. /// Resolve the requirements from the provided source trees.
pub async fn resolve<T: BuildContext>( pub async fn resolve(self, markers: &MarkerEnvironment) -> Result<Vec<RequestedRequirements>> {
self,
context: &T,
markers: &MarkerEnvironment,
client: &RegistryClient,
) -> Result<Vec<RequestedRequirements>> {
let mut results = Vec::new(); let mut results = Vec::new();
let mut futures = FuturesUnordered::new(); let mut futures = FuturesUnordered::new();
let mut seen = FxHashSet::default();
// Queue up the initial requirements. // Queue up the initial requirements.
let mut queue: VecDeque<Requirement> = self let mut queue: VecDeque<Requirement> = self
@ -88,7 +90,12 @@ impl<'a> LookaheadResolver<'a> {
while !queue.is_empty() || !futures.is_empty() { while !queue.is_empty() || !futures.is_empty() {
while let Some(requirement) = queue.pop_front() { while let Some(requirement) = queue.pop_front() {
futures.push(self.lookahead(requirement, context, client)); // Ignore duplicates. If we have conflicting URLs, we'll catch that later.
if matches!(requirement.version_or_url, Some(VersionOrUrl::Url(_))) {
if seen.insert(requirement.name.clone()) {
futures.push(self.lookahead(requirement));
}
}
} }
while let Some(result) = futures.next().await { while let Some(result) = futures.next().await {
@ -110,12 +117,7 @@ impl<'a> LookaheadResolver<'a> {
} }
/// Infer the package name for a given "unnamed" requirement. /// Infer the package name for a given "unnamed" requirement.
async fn lookahead<T: BuildContext>( async fn lookahead(&self, requirement: Requirement) -> Result<Option<RequestedRequirements>> {
&self,
requirement: Requirement,
context: &T,
client: &RegistryClient,
) -> Result<Option<RequestedRequirements>> {
// Determine whether the requirement represents a local distribution. // Determine whether the requirement represents a local distribution.
let Some(VersionOrUrl::Url(url)) = requirement.version_or_url.as_ref() else { let Some(VersionOrUrl::Url(url)) = requirement.version_or_url.as_ref() else {
return Ok(None); return Ok(None);
@ -124,29 +126,28 @@ impl<'a> LookaheadResolver<'a> {
// Convert to a buildable distribution. // Convert to a buildable distribution.
let dist = Dist::from_url(requirement.name, url.clone())?; let dist = Dist::from_url(requirement.name, url.clone())?;
// Only support source trees (and not, e.g., wheels).
let Dist::Source(source_dist) = &dist else {
return Ok(None);
};
if !source_dist.as_path().is_some_and(std::path::Path::is_dir) {
return Ok(None);
}
// Run the PEP 517 build process to extract metadata from the source distribution. // Run the PEP 517 build process to extract metadata from the source distribution.
let builder = if let Some(reporter) = self.reporter.clone() { let (metadata, _precise) = self
SourceDistributionBuilder::new(client, context).with_reporter(reporter) .database
} else { .get_or_build_wheel_metadata(&dist)
SourceDistributionBuilder::new(client, context) .await
}; .with_context(|| match &dist {
Dist::Built(built) => format!("Failed to download: {built}"),
Dist::Source(source) => format!("Failed to download and build: {source}"),
})?;
let metadata = builder // Consider the dependencies to be "direct" if the requirement is a local source tree.
.download_and_build_metadata(&BuildableSource::Dist(source_dist)) let direct = if let Dist::Source(source_dist) = &dist {
.await?; source_dist.as_path().is_some_and(std::path::Path::is_dir)
} else {
false
};
// Return the requirements from the metadata. // Return the requirements from the metadata.
Ok(Some(RequestedRequirements::new( Ok(Some(RequestedRequirements::new(
requirement.extras, requirement.extras,
metadata.requires_dist, metadata.requires_dist,
direct,
))) )))
} }
} }

View File

@ -137,6 +137,7 @@ impl Manifest {
) -> impl Iterator<Item = &PackageName> { ) -> impl Iterator<Item = &PackageName> {
self.lookaheads self.lookaheads
.iter() .iter()
.filter(|lookahead| lookahead.direct())
.flat_map(|lookahead| { .flat_map(|lookahead| {
self.overrides self.overrides
.apply(lookahead.requirements()) .apply(lookahead.requirements())

View File

@ -12,14 +12,17 @@ pub struct RequestedRequirements {
extras: Vec<ExtraName>, extras: Vec<ExtraName>,
/// The set of requirements that were requested by the originating requirement. /// The set of requirements that were requested by the originating requirement.
requirements: Vec<Requirement>, requirements: Vec<Requirement>,
/// Whether the dependencies were direct or transitive.
direct: bool,
} }
impl RequestedRequirements { impl RequestedRequirements {
/// Instantiate a [`RequestedRequirements`] with the given `extras` and `requirements`. /// Instantiate a [`RequestedRequirements`] with the given `extras` and `requirements`.
pub fn new(extras: Vec<ExtraName>, requirements: Vec<Requirement>) -> Self { pub fn new(extras: Vec<ExtraName>, requirements: Vec<Requirement>, direct: bool) -> Self {
Self { Self {
extras, extras,
requirements, requirements,
direct,
} }
} }
@ -32,4 +35,9 @@ impl RequestedRequirements {
pub fn requirements(&self) -> &[Requirement] { pub fn requirements(&self) -> &[Requirement] {
&self.requirements &self.requirements
} }
/// Return whether the dependencies were direct or transitive.
pub fn direct(&self) -> bool {
self.direct
}
} }

View File

@ -338,10 +338,17 @@ pub(crate) async fn pip_compile(
}; };
// Determine any lookahead requirements. // Determine any lookahead requirements.
let lookaheads = LookaheadResolver::new(&requirements, &constraints, &overrides, &editables) let lookaheads = LookaheadResolver::new(
.with_reporter(ResolverReporter::from(printer)) &requirements,
.resolve(&build_dispatch, &markers, &client) &constraints,
.await?; &overrides,
&editables,
&build_dispatch,
&client,
)
.with_reporter(ResolverReporter::from(printer))
.resolve(&markers)
.await?;
// Create a manifest of the requirements. // Create a manifest of the requirements.
let manifest = Manifest::new( let manifest = Manifest::new(

View File

@ -539,10 +539,17 @@ async fn resolve(
.collect(); .collect();
// Determine any lookahead requirements. // Determine any lookahead requirements.
let lookaheads = LookaheadResolver::new(&requirements, &constraints, &overrides, &editables) let lookaheads = LookaheadResolver::new(
.with_reporter(ResolverReporter::from(printer)) &requirements,
.resolve(build_dispatch, markers, client) &constraints,
.await?; &overrides,
&editables,
build_dispatch,
client,
)
.with_reporter(ResolverReporter::from(printer))
.resolve(markers)
.await?;
// Create a manifest of the requirements. // Create a manifest of the requirements.
let manifest = Manifest::new( let manifest = Manifest::new(

View File

@ -1699,7 +1699,7 @@ fn compatible_repeated_narrowed_url_dependency() -> Result<()> {
fn incompatible_narrowed_url_dependency() -> Result<()> { fn incompatible_narrowed_url_dependency() -> Result<()> {
let context = TestContext::new("3.12"); let context = TestContext::new("3.12");
let requirements_in = context.temp_dir.child("requirements.in"); let requirements_in = context.temp_dir.child("requirements.in");
requirements_in.write_str("werkzeug @ git+https://github.com/pallets/werkzeug.git@main\nwerkzeug @ git+https://github.com/pallets/werkzeug@32e69512134c2f8183c6438b2b2e13fd24e9d19f\nwerkzeug @ git+https://github.com/pallets/werkzeug.git@v1.0.0")?; requirements_in.write_str("werkzeug @ git+https://github.com/pallets/werkzeug.git@main\nwerkzeug @ git+https://github.com/pallets/werkzeug@32e69512134c2f8183c6438b2b2e13fd24e9d19f\nwerkzeug @ git+https://github.com/pallets/werkzeug.git@3.0.1")?;
uv_snapshot!(context.compile() uv_snapshot!(context.compile()
.arg("requirements.in"), @r###" .arg("requirements.in"), @r###"
@ -1710,7 +1710,7 @@ fn incompatible_narrowed_url_dependency() -> Result<()> {
----- stderr ----- ----- stderr -----
error: Requirements contain conflicting URLs for package `werkzeug`: error: Requirements contain conflicting URLs for package `werkzeug`:
- git+https://github.com/pallets/werkzeug@32e69512134c2f8183c6438b2b2e13fd24e9d19f - git+https://github.com/pallets/werkzeug@32e69512134c2f8183c6438b2b2e13fd24e9d19f
- git+https://github.com/pallets/werkzeug.git@v1.0.0 - git+https://github.com/pallets/werkzeug.git@3.0.1
"### "###
); );
@ -1718,10 +1718,9 @@ fn incompatible_narrowed_url_dependency() -> Result<()> {
} }
/// Request `hatchling_editable`, which depends on `https://files.pythonhosted.org/packages/ef/a6/62565a6e1cf69e10f5727360368e451d4b7f58beeac6173dc9db836a5b46/iniconfig-2.0.0-py3-none-any.whl`. /// Request `hatchling_editable`, which depends on `https://files.pythonhosted.org/packages/ef/a6/62565a6e1cf69e10f5727360368e451d4b7f58beeac6173dc9db836a5b46/iniconfig-2.0.0-py3-none-any.whl`.
/// Since this URL isn't declared upfront, we should reject it.
#[test] #[test]
#[cfg(feature = "git")] #[cfg(feature = "git")]
fn disallowed_transitive_url_dependency() -> Result<()> { fn allowed_transitive_git_dependency() -> Result<()> {
let context = TestContext::new("3.12"); let context = TestContext::new("3.12");
let requirements_in = context.temp_dir.child("requirements.in"); let requirements_in = context.temp_dir.child("requirements.in");
@ -1729,12 +1728,17 @@ fn disallowed_transitive_url_dependency() -> Result<()> {
uv_snapshot!(context.compile() uv_snapshot!(context.compile()
.arg("requirements.in"), @r###" .arg("requirements.in"), @r###"
success: false success: true
exit_code: 2 exit_code: 0
----- stdout ----- ----- stdout -----
# This file was autogenerated by uv via the following command:
# uv pip compile --cache-dir [CACHE_DIR] --exclude-newer 2024-03-25T00:00:00Z requirements.in
hatchling-editable @ https://github.com/astral-sh/uv/files/14762645/hatchling_editable.zip
iniconfig @ git+https://github.com/pytest-dev/iniconfig@9cae43103df70bac6fde7b9f35ad11a9f1be0cb4
# via hatchling-editable
----- stderr ----- ----- stderr -----
error: Package `iniconfig` attempted to resolve via URL: git+https://github.com/pytest-dev/iniconfig@9cae43103df70bac6fde7b9f35ad11a9f1be0cb4. URL dependencies must be expressed as direct requirements or constraints. Consider adding `iniconfig @ git+https://github.com/pytest-dev/iniconfig@9cae43103df70bac6fde7b9f35ad11a9f1be0cb4` to your dependencies or constraints file. Resolved 2 packages in [TIME]
"### "###
); );