mirror of
https://github.com/astral-sh/ruff
synced 2026-01-21 05:20:49 -05:00
## Summary Derived from #17371 Fixes astral-sh/ty#256 Fixes https://github.com/astral-sh/ty/issues/1415 Fixes https://github.com/astral-sh/ty/issues/1433 Fixes https://github.com/astral-sh/ty/issues/1524 Properly handles any kind of recursive inference and prevents panics. --- Let me explain techniques for converging fixed-point iterations during recursive type inference. There are two types of type inference that naively don't converge (causing salsa to panic): divergent type inference and oscillating type inference. ### Divergent type inference Divergent type inference occurs when eagerly expanding a recursive type. A typical example is this: ```python class C: def f(self, other: "C"): self.x = (other.x, 1) reveal_type(C().x) # revealed: Unknown | tuple[Unknown | tuple[Unknown | tuple[..., Literal[1]], Literal[1]], Literal[1]] ``` To solve this problem, we have already introduced `Divergent` types (https://github.com/astral-sh/ruff/pull/20312). `Divergent` types are treated as a kind of dynamic type [^1]. ```python Unknown | tuple[Unknown | tuple[Unknown | tuple[..., Literal[1]], Literal[1]], Literal[1]] => Unknown | tuple[Divergent, Literal[1]] ``` When a query function that returns a type enters a cycle, it sets `Divergent` as the cycle initial value (instead of `Never`). Then, in the cycle recovery function, it reduces the nesting of types containing `Divergent` to converge. ```python 0th: Divergent 1st: Unknown | tuple[Divergent, Literal[1]] 2nd: Unknown | tuple[Unknown | tuple[Divergent, Literal[1]], Literal[1]] => Unknown | tuple[Divergent, Literal[1]] ``` Each cycle recovery function for each query should operate only on the `Divergent` type originating from that query. For this reason, while `Divergent` appears the same as `Any` to the user, it internally carries some information: the location where the cycle occurred. Previously, we roughly identified this by having the scope where the cycle occurred, but with the update to salsa, functions that create cycle initial values can now receive a `salsa::Id` (https://github.com/salsa-rs/salsa/pull/1012). This is an opaque ID that uniquely identifies the cycle head (the query that is the starting point for the fixed-point iteration). `Divergent` now has this `salsa::Id`. ### Oscillating type inference Now, another thing to consider is oscillating type inference. Oscillating type inference arises from the fact that monotonicity is broken. Monotonicity here means that for a query function, if it enters a cycle, the calculation must start from a "bottom value" and progress towards the final result with each cycle. Monotonicity breaks down in type systems that have features like overloading and overriding. ```python class Base: def flip(self) -> "Sub": return Sub() class Sub(Base): def flip(self) -> "Base": return Base() class C: def __init__(self, x: Sub): self.x = x def replace_with(self, other: "C"): self.x = other.x.flip() reveal_type(C(Sub()).x) ``` Naive fixed-point iteration results in `Divergent -> Sub -> Base -> Sub -> ...`, which oscillates forever without diverging or converging. To address this, the salsa API has been modified so that the cycle recovery function receives the value of the previous cycle (https://github.com/salsa-rs/salsa/pull/1012). The cycle recovery function returns the union type of the current cycle and the previous cycle. In the above example, the result type for each cycle is `Divergent -> Sub -> Base (= Sub | Base) -> Base`, which converges. The final result of oscillating type inference does not contain `Divergent` because `Divergent` that appears in a union type can be removed, as is clear from the expansion. This simplification is performed at the same time as nesting reduction. ``` T | Divergent = T | (T | (T | ...)) = T ``` [^1]: In theory, it may be possible to strictly treat types containing `Divergent` types as recursive types, but we probably shouldn't go that deep yet. (AFAIK, there are no PEPs that specify how to handle implicitly recursive types that aren't named by type aliases) ## Performance analysis A happy side effect of this PR is that we've observed widespread performance improvements! This is likely due to the removal of the `ITERATIONS_BEFORE_FALLBACK` and max-specialization depth trick (https://github.com/astral-sh/ty/issues/1433, https://github.com/astral-sh/ty/issues/1415), which means we reach a fixed point much sooner. ## Ecosystem analysis The changes look good overall. You may notice changes in the converged values for recursive types, this is because the way recursive types are normalized has been changed. Previously, types containing `Divergent` types were normalized by replacing them with the `Divergent` type itself, but in this PR, types with a nesting level of 2 or more that contain `Divergent` types are normalized by replacing them with a type with a nesting level of 1. This means that information about the non-divergent parts of recursive types is no longer lost. ```python # previous tuple[tuple[Divergent, int], int] => Divergent # now tuple[tuple[Divergent, int], int] => tuple[Divergent, int] ``` The false positive error introduced in this PR occurs in class definitions with self-referential base classes, such as the one below. ```python from typing_extensions import Generic, TypeVar T = TypeVar("T") U = TypeVar("U") class Base2(Generic[T, U]): ... # TODO: no error # error: [unsupported-base] "Unsupported class base with type `<class 'Base2[Sub2, U@Sub2]'> | <class 'Base2[Sub2[Unknown], U@Sub2]'>`" class Sub2(Base2["Sub2", U]): ... ``` This is due to the lack of support for unions of MROs, or because cyclic legacy generic types are not inferred as generic types early in the query cycle. ## Test Plan All samples listed in astral-sh/ty#256 are tested and passed without any panic! ## Acknowledgments Thanks to @MichaReiser for working on bug fixes and improvements to salsa for this PR. @carljm also contributed early on to the discussion of the query convergence mechanism proposed in this PR. --------- Co-authored-by: Carl Meyer <carl@astral.sh>
263 lines
7.8 KiB
Rust
263 lines
7.8 KiB
Rust
use anyhow::{Context, anyhow};
|
|
use ruff_db::Db;
|
|
use ruff_db::files::{File, Files, system_path_to_file};
|
|
use ruff_db::system::{DbWithTestSystem, System, SystemPath, SystemPathBuf, TestSystem};
|
|
use ruff_db::vendored::VendoredFileSystem;
|
|
use ruff_python_ast::PythonVersion;
|
|
|
|
use ty_python_semantic::lint::{LintRegistry, RuleSelection};
|
|
use ty_python_semantic::pull_types::pull_types;
|
|
use ty_python_semantic::{
|
|
Program, ProgramSettings, PythonPlatform, PythonVersionSource, PythonVersionWithSource,
|
|
SearchPathSettings, default_lint_registry,
|
|
};
|
|
|
|
use test_case::test_case;
|
|
|
|
fn get_cargo_workspace_root() -> anyhow::Result<SystemPathBuf> {
|
|
Ok(SystemPathBuf::from(String::from_utf8(
|
|
std::process::Command::new("cargo")
|
|
.args(["locate-project", "--workspace", "--message-format", "plain"])
|
|
.output()?
|
|
.stdout,
|
|
)?)
|
|
.parent()
|
|
.unwrap()
|
|
.to_owned())
|
|
}
|
|
|
|
/// Test that all snippets in testcorpus can be checked without panic (except for [`KNOWN_FAILURES`])
|
|
#[test]
|
|
fn corpus_no_panic() -> anyhow::Result<()> {
|
|
let crate_root = String::from(env!("CARGO_MANIFEST_DIR"));
|
|
run_corpus_tests(&format!("{crate_root}/resources/corpus/**/*.py"))
|
|
}
|
|
|
|
#[test]
|
|
fn parser_no_panic() -> anyhow::Result<()> {
|
|
let workspace_root = get_cargo_workspace_root()?;
|
|
run_corpus_tests(&format!(
|
|
"{workspace_root}/crates/ruff_python_parser/resources/**/*.py"
|
|
))
|
|
}
|
|
|
|
#[test_case("a-e")]
|
|
#[test_case("f")]
|
|
#[test_case("g-o")]
|
|
#[test_case("p")]
|
|
#[test_case("q-z")]
|
|
#[test_case("!a-z")]
|
|
fn linter_no_panic(range: &str) -> anyhow::Result<()> {
|
|
let workspace_root = get_cargo_workspace_root()?;
|
|
run_corpus_tests(&format!(
|
|
"{workspace_root}/crates/ruff_linter/resources/test/fixtures/[{range}]*/**/*.py"
|
|
))
|
|
}
|
|
|
|
#[test]
|
|
fn linter_stubs_no_panic() -> anyhow::Result<()> {
|
|
let workspace_root = get_cargo_workspace_root()?;
|
|
run_corpus_tests(&format!(
|
|
"{workspace_root}/crates/ruff_linter/resources/test/fixtures/**/*.pyi"
|
|
))
|
|
}
|
|
|
|
#[test_case("a-e")]
|
|
#[test_case("f-k")]
|
|
#[test_case("l-p")]
|
|
#[test_case("q-z")]
|
|
#[test_case("!a-z")]
|
|
fn typeshed_no_panic(range: &str) -> anyhow::Result<()> {
|
|
let workspace_root = get_cargo_workspace_root()?;
|
|
run_corpus_tests(&format!(
|
|
"{workspace_root}/crates/ty_vendored/vendor/typeshed/stdlib/[{range}]*.pyi"
|
|
))
|
|
}
|
|
|
|
#[expect(clippy::print_stdout)]
|
|
fn run_corpus_tests(pattern: &str) -> anyhow::Result<()> {
|
|
let root = SystemPathBuf::from("/src");
|
|
|
|
let mut db = CorpusDb::new();
|
|
db.memory_file_system()
|
|
.create_directory_all(root.as_ref())?;
|
|
|
|
let workspace_root = get_cargo_workspace_root()?;
|
|
let workspace_root = workspace_root.to_string();
|
|
|
|
let corpus = glob::glob(pattern).context("Failed to compile pattern")?;
|
|
|
|
for path in corpus {
|
|
let path = path.context("Failed to glob path")?;
|
|
let path = SystemPathBuf::from_path_buf(path).map_err(|path| {
|
|
anyhow!(
|
|
"Failed to convert path '{path}' to system path",
|
|
path = path.display()
|
|
)
|
|
})?;
|
|
|
|
let relative_path = path.strip_prefix(&workspace_root)?;
|
|
|
|
let (py_expected_to_fail, pyi_expected_to_fail) = KNOWN_FAILURES
|
|
.iter()
|
|
.find_map(|(path, py_fail, pyi_fail)| {
|
|
if *path == relative_path.as_str().replace('\\', "/") {
|
|
Some((*py_fail, *pyi_fail))
|
|
} else {
|
|
None
|
|
}
|
|
})
|
|
.unwrap_or((false, false));
|
|
|
|
let source = path.as_path();
|
|
let source_filename = source.file_name().unwrap();
|
|
|
|
let code = std::fs::read_to_string(source)
|
|
.with_context(|| format!("Failed to read test file: {path}"))?;
|
|
|
|
let mut check_with_file_name = |path: &SystemPath| {
|
|
db.memory_file_system().write_file_all(path, &code).unwrap();
|
|
File::sync_path(&mut db, path);
|
|
|
|
// this test is only asserting that we can pull every expression type without a panic
|
|
// (and some non-expressions that clearly define a single type)
|
|
let file = system_path_to_file(&db, path).unwrap();
|
|
|
|
let result = std::panic::catch_unwind(|| pull_types(&db, file));
|
|
|
|
let expected_to_fail = if path.extension().map(|e| e == "pyi").unwrap_or(false) {
|
|
pyi_expected_to_fail
|
|
} else {
|
|
py_expected_to_fail
|
|
};
|
|
if let Err(err) = result {
|
|
if !expected_to_fail {
|
|
println!(
|
|
"Check failed for {relative_path:?}. Consider fixing it or adding it to KNOWN_FAILURES"
|
|
);
|
|
std::panic::resume_unwind(err);
|
|
}
|
|
} else {
|
|
assert!(
|
|
!expected_to_fail,
|
|
"Expected to panic, but did not. Consider removing this path from KNOWN_FAILURES"
|
|
);
|
|
}
|
|
|
|
db.memory_file_system().remove_file(path).unwrap();
|
|
file.sync(&mut db);
|
|
};
|
|
|
|
if source.extension() == Some("pyi") {
|
|
println!("checking {relative_path}");
|
|
let pyi_dest = root.join(source_filename);
|
|
check_with_file_name(&pyi_dest);
|
|
} else {
|
|
println!("checking {relative_path}");
|
|
let py_dest = root.join(source_filename);
|
|
check_with_file_name(&py_dest);
|
|
|
|
let pyi_dest = root.join(format!("{source_filename}i"));
|
|
println!("re-checking as stub file: {pyi_dest}");
|
|
check_with_file_name(&pyi_dest);
|
|
}
|
|
}
|
|
|
|
Ok(())
|
|
}
|
|
|
|
/// Whether or not the .py/.pyi version of this file is expected to fail
|
|
#[rustfmt::skip]
|
|
const KNOWN_FAILURES: &[(&str, bool, bool)] = &[
|
|
];
|
|
|
|
#[salsa::db]
|
|
#[derive(Clone)]
|
|
pub struct CorpusDb {
|
|
storage: salsa::Storage<Self>,
|
|
files: Files,
|
|
rule_selection: RuleSelection,
|
|
system: TestSystem,
|
|
vendored: VendoredFileSystem,
|
|
}
|
|
|
|
impl CorpusDb {
|
|
#[expect(clippy::new_without_default)]
|
|
pub fn new() -> Self {
|
|
let db = Self {
|
|
storage: salsa::Storage::new(None),
|
|
system: TestSystem::default(),
|
|
vendored: ty_vendored::file_system().clone(),
|
|
rule_selection: RuleSelection::from_registry(default_lint_registry()),
|
|
files: Files::default(),
|
|
};
|
|
|
|
Program::from_settings(
|
|
&db,
|
|
ProgramSettings {
|
|
python_version: PythonVersionWithSource {
|
|
version: PythonVersion::latest_ty(),
|
|
source: PythonVersionSource::default(),
|
|
},
|
|
python_platform: PythonPlatform::default(),
|
|
search_paths: SearchPathSettings::new(vec![])
|
|
.to_search_paths(db.system(), db.vendored())
|
|
.unwrap(),
|
|
},
|
|
);
|
|
|
|
db
|
|
}
|
|
}
|
|
|
|
impl DbWithTestSystem for CorpusDb {
|
|
fn test_system(&self) -> &TestSystem {
|
|
&self.system
|
|
}
|
|
|
|
fn test_system_mut(&mut self) -> &mut TestSystem {
|
|
&mut self.system
|
|
}
|
|
}
|
|
|
|
#[salsa::db]
|
|
impl ruff_db::Db for CorpusDb {
|
|
fn vendored(&self) -> &VendoredFileSystem {
|
|
&self.vendored
|
|
}
|
|
|
|
fn system(&self) -> &dyn System {
|
|
&self.system
|
|
}
|
|
|
|
fn files(&self) -> &Files {
|
|
&self.files
|
|
}
|
|
|
|
fn python_version(&self) -> PythonVersion {
|
|
Program::get(self).python_version(self)
|
|
}
|
|
}
|
|
|
|
#[salsa::db]
|
|
impl ty_python_semantic::Db for CorpusDb {
|
|
fn should_check_file(&self, file: File) -> bool {
|
|
!file.path(self).is_vendored_path()
|
|
}
|
|
|
|
fn rule_selection(&self, _file: File) -> &RuleSelection {
|
|
&self.rule_selection
|
|
}
|
|
|
|
fn lint_registry(&self) -> &LintRegistry {
|
|
default_lint_registry()
|
|
}
|
|
|
|
fn verbose(&self) -> bool {
|
|
false
|
|
}
|
|
}
|
|
|
|
#[salsa::db]
|
|
impl salsa::Database for CorpusDb {}
|