uv-resolver: fix conflicting extra bug during `uv sync`

In #10875, I relaxed the error checking during resolution to permit
dependencies like `foo[x1]`, where `x1` was defined to be conflicting.
In exchange, the error was, roughly speaking, moved to installation
time. This was achieved by looking at the full set of enabled extras
and checking whether any conflicts occurred. If so, an error was
reported. This ends up being more expressive and permits more valid
configurations.

However, in doing so, I introduced a bug in how the accumulated extras
were being passed to conflict marker evaluation. Namely, we weren't
accounting for the fact that if `foo[x1]` was enabled, then that fact
should be carried through to all conflict marker evaluations. This
matters because some of those markers use expressions like
`extra != 'x1'` to indicate that a dependency should only be included
if an extra *isn't* enabled.

In #10985, this manifested with PyTorch: `torch==2.4.1` and
`torch==2.4.1+cpu` were being installed simultaneously. Namely, the
choice to install `torch==2.4.1` did not take into account that the
`cpu` extra had been enabled. If it had, its conflict marker would
have evaluated to `false`. Since it didn't, and since
`torch==2.4.1+cpu` was also being included, we ended up installing both
versions.
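
To make the failure mode concrete, here's a minimal sketch, in Rust but
with none of uv's actual types or marker machinery, of how an
`extra != 'cpu'`-style marker misbehaves when an activated extra never
makes it into the set it's evaluated against:

    use std::collections::BTreeSet;

    // Stand-in for evaluating `torch==2.4.1`'s conflict marker,
    // which is (in effect) `extra != 'cpu'`.
    fn torch_2_4_1_included(activated: &BTreeSet<&str>) -> bool {
        !activated.contains("cpu")
    }

    fn main() {
        // The bug: `cpu` was activated somewhere in the graph, but never
        // accumulated before this evaluation, so the marker passes...
        let missed: BTreeSet<&str> = BTreeSet::new();
        assert!(torch_2_4_1_included(&missed)); // torch==2.4.1 kept, wrongly

        // ...whereas against the full set of activated extras, the marker
        // is false and only `torch==2.4.1+cpu` survives.
        let full = BTreeSet::from(["cpu"]);
        assert!(!torch_2_4_1_included(&full));
    }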

The approach I took in this PR is to add another breadth-first
traversal over the dependency tree (which runs before the existing one)
to accumulate all of the activated extras. Only in the second traversal
do we actually build up the resolution graph.
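
In sketch form (toy types of my own, not the real ones; the actual
traversal, with marker evaluation and the conflict check, is in the
diff below):

    use std::collections::{BTreeSet, VecDeque};

    // Toy dependency node: the extras that an edge to it activates,
    // plus its children in the graph.
    struct Dep {
        extras: Vec<&'static str>,
        children: Vec<usize>,
    }

    // Pass 1: walk the entire graph once, only to accumulate activated
    // extras, so that pass 2 can evaluate every conflict marker against
    // the complete set.
    fn activated_extras(graph: &[Dep], root: usize) -> BTreeSet<&'static str> {
        let mut activated = BTreeSet::new();
        let mut seen = BTreeSet::from([root]);
        let mut queue = VecDeque::from([root]);
        while let Some(ix) = queue.pop_front() {
            for &child in &graph[ix].children {
                activated.extend(graph[child].extras.iter().copied());
                if seen.insert(child) {
                    queue.push_back(child);
                }
            }
        }
        // Pass 2 (elided): traverse again, this time evaluating markers
        // against `activated` and actually building the resolution graph.
        activated
    }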

Unfortunately, I have no automated regression test to include here. The
regression test we _ought_ to include involves `torch`. And while we're
generally fine using `torch` in tests that only generate a lock file, the
regression test here actually requires running installation. And
downloading and installing `torch` in tests is bad juju. So adding a
regression test for this is blocked on better infrastructure for PyTorch
tests. With that said, I did manually verify that the test case in #10985
no longer installs multiple versions of `torch`.

Fixes #10985

@@ -267,8 +267,137 @@ pub trait Installable<'lock> {
}
}
let mut all_activated_extras: BTreeSet<(&PackageName, &ExtraName)> =
activated_extras.iter().copied().collect();
// Below, we traverse the dependency graph in a breadth first manner
// twice. It's only in the second traversal that we actually build
// up our resolution graph. In the first traversal, we accumulate all
// activated extras. This includes the extras explicitly enabled on
// the CLI (which were gathered above) and the extras enabled via
// dependency specifications like `foo[extra]`. We need to do this
// to correctly support conflicting extras.
//
// In particular, the way conflicting extras works is by forking the
// resolver based on the extras that are declared as conflicting. But
// this forking needs to be made manifest somehow in the lock file to
// avoid multiple versions of the same package being installed into the
// environment. This is why "conflict markers" were invented. For
// example, you might have both `torch` and `torch+cpu` in your
// dependency graph, where the latter is only enabled when the `cpu`
// extra is enabled, and the former is specifically *not* enabled
// when the `cpu` extra is enabled.
//
// In order to evaluate these conflict markers correctly, we need to
// know whether the `cpu` extra is enabled when we visit the `torch`
// dependency. If we think it's disabled, then we'll erroneously
// include it if the extra is actually enabled. But in order to tell
// if it's enabled, we need to traverse the entire dependency graph
// first to inspect which extras are enabled!
//
// Of course, we don't need to do this at all if there aren't any
// conflicts. In which case, we skip all of this and just do the one
// traversal below.
if !self.lock().conflicts().is_empty() {
let mut activated_extras_set: BTreeSet<(&PackageName, &ExtraName)> =
activated_extras.iter().copied().collect();
let mut queue = queue.clone();
let mut seen = seen.clone();
while let Some((package, extra)) = queue.pop_front() {
let deps = if let Some(extra) = extra {
Either::Left(
package
.optional_dependencies
.get(extra)
.into_iter()
.flatten(),
)
} else {
Either::Right(package.dependencies.iter())
};
for dep in deps {
let mut additional_activated_extras = vec![];
for extra in &dep.extra {
let key = (&dep.package_id.name, extra);
if !activated_extras_set.contains(&key) {
additional_activated_extras.push(key);
}
}
let temp_activated_extras = if additional_activated_extras.is_empty() {
Cow::Borrowed(&activated_extras)
} else {
let mut owned = activated_extras.clone();
owned.extend_from_slice(&additional_activated_extras);
Cow::Owned(owned)
};
if !dep.complexified_marker.evaluate(
marker_env,
&temp_activated_extras,
&activated_groups,
) {
continue;
}
// It is, I believe, possible to be here for a dependency that
// will ultimately not be included in the final resolution.
// Specifically, carrying on from the example in the comments
// above, we might visit `torch` first and thus not know if
// the `cpu` feature is enabled or not, and thus, the marker
// evaluation above will pass.
//
// So is this a problem? Well, this is the main reason why we
// do two graph traversals. On the second traversal below, we
// will have seen all of the enabled extras, and so `torch`
// will be excluded.
//
// But could this lead to a bigger list of activated extras
// than we actually have? I believe that is indeed possible,
// but I think it is only a problem if it leads to extras that
// *conflict* with one another being simultaneously enabled.
// However, after this first traversal, we check our set of
// accumulated extras to ensure that there are no conflicts. If
// there are, we raise an error. ---AG
for key in additional_activated_extras {
activated_extras_set.insert(key);
activated_extras.push(key);
}
let dep_dist = self.lock().find_by_id(&dep.package_id);
// Push its dependencies on the queue.
if seen.insert((&dep.package_id, None)) {
queue.push_back((dep_dist, None));
}
for extra in &dep.extra {
if seen.insert((&dep.package_id, Some(extra))) {
queue.push_back((dep_dist, Some(extra)));
}
}
}
}
// At time of writing, it's somewhat expected that the set of
// conflicting extras is pretty small. With that said, the
// time complexity of the following routine is pretty gross.
// Namely, `set.contains` is linear in the size of the set,
// iteration over all conflicts is also obviously linear in
// the number of conflicting sets and then for each of those,
// we visit every possible pair of activated extra from above,
// which is quadratic in the total number of extras enabled. I
// believe the simplest improvement here, if it's necessary, is
// to adjust the `Conflicts` internals to own these sorts of
// checks. ---AG
for set in self.lock().conflicts().iter() {
for ((pkg1, extra1), (pkg2, extra2)) in
activated_extras_set.iter().tuple_combinations()
{
if set.contains(pkg1, *extra1) && set.contains(pkg2, *extra2) {
return Err(LockErrorKind::ConflictingExtra {
package1: (*pkg1).clone(),
extra1: (*extra1).clone(),
package2: (*pkg2).clone(),
extra2: (*extra2).clone(),
}
.into());
}
}
}
}
while let Some((package, extra)) = queue.pop_front() {
let deps = if let Some(extra) = extra {
Either::Left(
@@ -282,15 +411,6 @@ pub trait Installable<'lock> {
Either::Right(package.dependencies.iter())
};
for dep in deps {
let mut activated_extras = Cow::Borrowed(&*activated_extras);
if !dep.extra.is_empty() {
let mut extended = activated_extras.to_vec();
for extra in &dep.extra {
extended.push((&dep.package_id.name, extra));
all_activated_extras.insert((&dep.package_id.name, extra));
}
activated_extras = Cow::Owned(extended);
}
if !dep.complexified_marker.evaluate(
marker_env,
&activated_extras,
@@ -343,32 +463,6 @@ pub trait Installable<'lock> {
}
}
// At time of writing, it's somewhat expected that the set of
// conflicting extras is pretty small. With that said, the
// time complexity of the following routine is pretty gross.
// Namely, `set.contains` is linear in the size of the set,
// iteration over all conflicts is also obviously linear in
// the number of conflicting sets and then for each of those,
// we visit every possible pair of activated extra from above,
// which is quadratic in the total number of extras enabled. I
// believe the simplest improvement here, if it's necessary, is
// to adjust the `Conflicts` internals to own these sorts of
// checks. ---AG
for set in self.lock().conflicts().iter() {
for ((pkg1, extra1), (pkg2, extra2)) in all_activated_extras.iter().tuple_combinations()
{
if set.contains(pkg1, *extra1) && set.contains(pkg2, *extra2) {
return Err(LockErrorKind::ConflictingExtra {
package1: (*pkg1).clone(),
extra1: (*extra1).clone(),
package2: (*pkg2).clone(),
extra2: (*extra2).clone(),
}
.into());
}
}
}
Ok(Resolution::new(petgraph))
}