ruff/crates/ty_python_semantic/src/semantic_index/use_def.rs

1129 lines
49 KiB
Rust

//! First, some terminology:
//!
//! * A "place" is semantically a location where a value can be read or written, and syntactically,
//! an expression that can be the target of an assignment, e.g. `x`, `x[0]`, `x.y`. (The term is
//! borrowed from Rust). In Python syntax, an expression like `f().x` is also allowed as the
//! target so it can be called a place, but we do not record declarations / bindings like `f().x:
//! int`, `f().x = ...`. Type checking itself can be done by recording only assignments to names,
//! but in order to perform type narrowing by attribute/subscript assignments, they must also be
//! recorded.
//!
//! * A "binding" gives a new value to a place. This includes many different Python statements
//! (assignment statements of course, but also imports, `def` and `class` statements, `as`
//! clauses in `with` and `except` statements, match patterns, and others) and even one
//! expression kind (named expressions). It notably does not include annotated assignment
//! statements without a right-hand side value; these do not assign any new value to the place.
//! We consider function parameters to be bindings as well, since (from the perspective of the
//! function's internal scope), a function parameter begins the scope bound to a value.
//!
//! * A "declaration" establishes an upper bound type for the values that a variable may be
//! permitted to take on. Annotated assignment statements (with or without an RHS value) are
//! declarations; annotated function parameters are also declarations. We consider `def` and
//! `class` statements to also be declarations, so as to prohibit accidentally shadowing them.
//!
//! Annotated assignments with a right-hand side, and annotated function parameters, are both
//! bindings and declarations.
//!
//! We use [`Definition`] as the universal term (and Salsa tracked struct) encompassing both
//! bindings and declarations. (This sacrifices a bit of type safety in exchange for improved
//! performance via fewer Salsa tracked structs and queries, since most declarations -- typed
//! parameters and annotated assignments with RHS -- are both bindings and declarations.)
//!
//! At any given use of a variable, we can ask about both its "declared type" and its "inferred
//! type". These may be different, but the inferred type must always be assignable to the declared
//! type; that is, the declared type is always wider, and the inferred type may be more precise. If
//! we see an invalid assignment, we emit a diagnostic and abandon our inferred type, deferring to
//! the declared type (this allows an explicit annotation to override bad inference, without a
//! cast), maintaining the invariant.
//!
//! The **inferred type** represents the most precise type we believe encompasses all possible
//! values for the variable at a given use. It is based on a union of the bindings which can reach
//! that use through some control flow path, and the narrowing constraints that control flow must
//! have passed through between the binding and the use. For example, in this code:
//!
//! ```python
//! x = 1 if flag else None
//! if x is not None:
//! use(x)
//! ```
//!
//! For the use of `x` on the third line, the inferred type should be `Literal[1]`. This is based
//! on the binding on the first line, which assigns the type `Literal[1] | None`, and the narrowing
//! constraint on the second line, which rules out the type `None`, since control flow must pass
//! through this constraint to reach the use in question.
//!
//! The **declared type** represents the code author's declaration (usually through a type
//! annotation) that a given variable should not be assigned any type outside the declared type. In
//! our model, declared types are also control-flow-sensitive; we allow the code author to
//! explicitly redeclare the same variable with a different type. So for a given binding of a
//! variable, we will want to ask which declarations of that variable can reach that binding, in
//! order to determine whether the binding is permitted, or should be a type error. For example:
//!
//! ```python
//! from pathlib import Path
//! def f(path: str):
//! path: Path = Path(path)
//! ```
//!
//! In this function, the initial declared type of `path` is `str`, meaning that the assignment
//! `path = Path(path)` would be a type error, since it assigns to `path` a value whose type is not
//! assignable to `str`. This is the purpose of declared types: they prevent accidental assignment
//! of the wrong type to a variable.
//!
//! But in some cases it is useful to "shadow" or "redeclare" a variable with a new type, and we
//! permit this, as long as it is done with an explicit re-annotation. So `path: Path =
//! Path(path)`, with the explicit `: Path` annotation, is permitted.
//!
//! The general rule is that whatever declaration(s) can reach a given binding determine the
//! validity of that binding. If there is a path in which the place is not declared, that is a
//! declaration of `Unknown`. If multiple declarations can reach a binding, we union them, but by
//! default we also issue a type error, since this implicit union of declared types may hide an
//! error.
//!
//! To support type inference, we build a map from each use of a place to the bindings live at
//! that use, and the type narrowing constraints that apply to each binding.
//!
//! Let's take this code sample:
//!
//! ```python
//! x = 1
//! x = 2
//! y = x
//! if flag:
//! x = 3
//! else:
//! x = 4
//! z = x
//! ```
//!
//! In this snippet, we have four bindings of `x` (the statements assigning `1`, `2`, `3`, and `4`
//! to it), and two uses of `x` (the `y = x` and `z = x` assignments). The first binding of `x`
//! does not reach any use, because it's immediately replaced by the second binding, before any use
//! happens. (A linter could thus flag the statement `x = 1` as likely superfluous.)
//!
//! The first use of `x` has one live binding: the assignment `x = 2`.
//!
//! Things get a bit more complex when we have branches. We will definitely take either the `if` or
//! the `else` branch. Thus, the second use of `x` has two live bindings: `x = 3` and `x = 4`. The
//! `x = 2` assignment is no longer visible, because it must be replaced by either `x = 3` or `x =
//! 4`, no matter which branch was taken. We don't know which branch was taken, so we must consider
//! both bindings as live, which means eventually we would (in type inference) look at these two
//! bindings and infer a type of `Literal[3, 4]` -- the union of `Literal[3]` and `Literal[4]` --
//! for the second use of `x`.
//!
//! So that's one question our use-def map needs to answer: given a specific use of a place, which
//! binding(s) can reach that use. In [`AstIds`](crate::semantic_index::ast_ids::AstIds) we number
//! all uses (that means a `Name`/`ExprAttribute`/`ExprSubscript` node with `Load` context)
//! so we have a `ScopedUseId` to efficiently represent each use.
//!
//! We also need to know, for a given definition of a place, what type narrowing constraints apply
//! to it. For instance, in this code sample:
//!
//! ```python
//! x = 1 if flag else None
//! if x is not None:
//! use(x)
//! ```
//!
//! At the use of `x`, the live binding of `x` is `1 if flag else None`, which would infer as the
//! type `Literal[1] | None`. But the constraint `x is not None` dominates this use, which means we
//! can rule out the possibility that `x` is `None` here, which should give us the type
//! `Literal[1]` for this use.
//!
//! For declared types, we need to be able to answer the question "given a binding to a place,
//! which declarations of that place can reach the binding?" This allows us to emit a diagnostic
//! if the binding is attempting to bind a value of a type that is not assignable to the declared
//! type for that place, at that point in control flow.
//!
//! We also need to know, given a declaration of a place, what the inferred type of that place is
//! at that point. This allows us to emit a diagnostic in a case like `x = "foo"; x: int`. The
//! binding `x = "foo"` occurs before the declaration `x: int`, so according to our
//! control-flow-sensitive interpretation of declarations, the assignment is not an error. But the
//! declaration is an error, since it would violate the "inferred type must be assignable to
//! declared type" rule.
//!
//! Another case we need to handle is when a place is referenced from a different scope (for
//! example, an import or a nonlocal reference). We call this "public" use of a place. For public
//! use of a place, we prefer the declared type, if there are any declarations of that place; if
//! not, we fall back to the inferred type. So we also need to know which declarations and bindings
//! can reach the end of the scope.
//!
//! Technically, public use of a place could occur from any point in control flow of the scope
//! where the place is defined (via inline imports and import cycles, in the case of an import, or
//! via a function call partway through the local scope that ends up using a place from the scope
//! via a global or nonlocal reference.) But modeling this fully accurately requires whole-program
//! analysis that isn't tractable for an efficient analysis, since it means a given place could
//! have a different type every place it's referenced throughout the program, depending on the
//! shape of arbitrarily-sized call/import graphs. So we follow other Python type checkers in
//! making the simplifying assumption that usually the scope will finish execution before its
//! places are made visible to other scopes; for instance, most imports will import from a
//! complete module, not a partially-executed module. (We may want to get a little smarter than
//! this in the future for some closures, but for now this is where we start.)
//!
//! The data structure we build to answer these questions is the `UseDefMap`. It has a
//! `bindings_by_use` vector of [`Bindings`] indexed by [`ScopedUseId`], a
//! `declarations_by_binding` vector of [`Declarations`] indexed by [`ScopedDefinitionId`], a
//! `bindings_by_declaration` vector of [`Bindings`] indexed by [`ScopedDefinitionId`], and
//! `public_bindings` and `public_definitions` vectors indexed by [`ScopedPlaceId`]. The values in
//! each of these vectors are (in principle) a list of live bindings at that use/definition, or at
//! the end of the scope for that place, with a list of the dominating constraints for each
//! binding.
//!
//! In order to avoid vectors-of-vectors-of-vectors and all the allocations that would entail, we
//! don't actually store these "list of visible definitions" as a vector of [`Definition`].
//! Instead, [`Bindings`] and [`Declarations`] are structs which use bit-sets to track
//! definitions (and constraints, in the case of bindings) in terms of [`ScopedDefinitionId`] and
//! [`ScopedPredicateId`], which are indices into the `all_definitions` and `predicates`
//! indexvecs in the [`UseDefMap`].
//!
//! There is another special kind of possible "definition" for a place: there might be a path from
//! the scope entry to a given use in which the place is never bound. We model this with a special
//! "unbound/undeclared" definition (a [`DefinitionState::Undefined`] entry at the start of the
//! `all_definitions` vector). If that sentinel definition is present in the live bindings at a
//! given use, it means that there is a possible path through control flow in which that place is
//! unbound. Similarly, if that sentinel is present in the live declarations, it means that the
//! place is (possibly) undeclared.
//!
//! To build a [`UseDefMap`], the [`UseDefMapBuilder`] is notified of each new use, definition, and
//! constraint as they are encountered by the
//! [`SemanticIndexBuilder`](crate::semantic_index::builder::SemanticIndexBuilder) AST visit. For
//! each place, the builder tracks the `PlaceState` (`Bindings` and `Declarations`) for that place.
//! When we hit a use or definition of a place, we record the necessary parts of the current state
//! for that place that we need for that use or definition. When we reach the end of the scope, it
//! records the state for each place as the public definitions of that place.
//!
//! ```python
//! x = 1
//! x = 2
//! y = x
//! if flag:
//! x = 3
//! else:
//! x = 4
//! z = x
//! ```
//!
//! Let's walk through the above example. Initially we do not have any record of `x`. When we add
//! the new place (before we process the first binding), we create a new undefined `PlaceState`
//! which has a single live binding (the "unbound" definition) and a single live declaration (the
//! "undeclared" definition). When we see `x = 1`, we record that as the sole live binding of `x`.
//! The "unbound" binding is no longer visible. Then we see `x = 2`, and we replace `x = 1` as the
//! sole live binding of `x`. When we get to `y = x`, we record that the live bindings for that use
//! of `x` are just the `x = 2` definition.
//!
//! Then we hit the `if` branch. We visit the `test` node (`flag` in this case), since that will
//! happen regardless. Then we take a pre-branch snapshot of the current state for all places,
//! which we'll need later. Then we record `flag` as a possible constraint on the current binding
//! (`x = 2`), and go ahead and visit the `if` body. When we see `x = 3`, it replaces `x = 2`
//! (constrained by `flag`) as the sole live binding of `x`. At the end of the `if` body, we take
//! another snapshot of the current place state; we'll call this the post-if-body snapshot.
//!
//! Now we need to visit the `else` clause. The conditions when entering the `else` clause should
//! be the pre-if conditions; if we are entering the `else` clause, we know that the `if` test
//! failed and we didn't execute the `if` body. So we first reset the builder to the pre-if state,
//! using the snapshot we took previously (meaning we now have `x = 2` as the sole binding for `x`
//! again), and record a *negative* `flag` constraint for all live bindings (`x = 2`). We then
//! visit the `else` clause, where `x = 4` replaces `x = 2` as the sole live binding of `x`.
//!
//! Now we reach the end of the if/else, and want to visit the following code. The state here needs
//! to reflect that we might have gone through the `if` branch, or we might have gone through the
//! `else` branch, and we don't know which. So we need to "merge" our current builder state
//! (reflecting the end-of-else state, with `x = 4` as the only live binding) with our post-if-body
//! snapshot (which has `x = 3` as the only live binding). The result of this merge is that we now
//! have two live bindings of `x`: `x = 3` and `x = 4`.
//!
//! Another piece of information that the `UseDefMap` needs to provide are reachability constraints.
//! See [`reachability_constraints.rs`] for more details, in particular how they apply to bindings.
//!
//! The [`UseDefMapBuilder`] itself just exposes methods for taking a snapshot, resetting to a
//! snapshot, and merging a snapshot into the current state. The logic using these methods lives in
//! [`SemanticIndexBuilder`](crate::semantic_index::builder::SemanticIndexBuilder), e.g. where it
//! visits a `StmtIf` node.
use ruff_index::{IndexVec, newtype_index};
use rustc_hash::FxHashMap;
use self::place_state::{
Bindings, Declarations, EagerSnapshot, LiveBindingsIterator, LiveDeclaration,
LiveDeclarationsIterator, PlaceState, ScopedDefinitionId,
};
use crate::node_key::NodeKey;
use crate::place::BoundnessAnalysis;
use crate::semantic_index::ast_ids::ScopedUseId;
use crate::semantic_index::definition::{Definition, DefinitionState};
use crate::semantic_index::narrowing_constraints::{
ConstraintKey, NarrowingConstraints, NarrowingConstraintsBuilder, NarrowingConstraintsIterator,
};
use crate::semantic_index::place::{
FileScopeId, PlaceExpr, PlaceExprWithFlags, ScopeKind, ScopedPlaceId,
};
use crate::semantic_index::predicate::{
Predicate, PredicateOrLiteral, Predicates, PredicatesBuilder, ScopedPredicateId,
};
use crate::semantic_index::reachability_constraints::{
ReachabilityConstraints, ReachabilityConstraintsBuilder, ScopedReachabilityConstraintId,
};
use crate::semantic_index::use_def::place_state::PreviousDefinitions;
use crate::semantic_index::{EagerSnapshotResult, SemanticIndex};
use crate::types::{IntersectionBuilder, Truthiness, Type, infer_narrowing_constraint};
mod place_state;
/// Applicable definitions and constraints for every use of a name.
#[derive(Debug, PartialEq, Eq, salsa::Update, get_size2::GetSize)]
pub(crate) struct UseDefMap<'db> {
/// Array of [`Definition`] in this scope. Only the first entry should be [`DefinitionState::Undefined`];
/// this represents the implicit "unbound"/"undeclared" definition of every place.
all_definitions: IndexVec<ScopedDefinitionId, DefinitionState<'db>>,
/// Array of predicates in this scope.
predicates: Predicates<'db>,
/// Array of narrowing constraints in this scope.
narrowing_constraints: NarrowingConstraints,
/// Array of reachability constraints in this scope.
reachability_constraints: ReachabilityConstraints,
/// [`Bindings`] reaching a [`ScopedUseId`].
bindings_by_use: IndexVec<ScopedUseId, Bindings>,
/// Tracks whether or not a given AST node is reachable from the start of the scope.
node_reachability: FxHashMap<NodeKey, ScopedReachabilityConstraintId>,
/// If the definition is a binding (only) -- `x = 1` for example -- then we need
/// [`Declarations`] to know whether this binding is permitted by the live declarations.
///
/// If the definition is both a declaration and a binding -- `x: int = 1` for example -- then
/// we don't actually need anything here, all we'll need to validate is that our own RHS is a
/// valid assignment to our own annotation.
declarations_by_binding: FxHashMap<Definition<'db>, Declarations>,
/// If the definition is a declaration (only) -- `x: int` for example -- then we need
/// [`Bindings`] to know whether this declaration is consistent with the previously
/// inferred type.
///
/// If the definition is both a declaration and a binding -- `x: int = 1` for example -- then
/// we don't actually need anything here, all we'll need to validate is that our own RHS is a
/// valid assignment to our own annotation.
bindings_by_declaration: FxHashMap<Definition<'db>, Bindings>,
/// [`PlaceState`] visible at end of scope for each place.
end_of_scope_places: IndexVec<ScopedPlaceId, PlaceState>,
/// All potentially reachable bindings and declarations, for each place.
reachable_definitions: IndexVec<ScopedPlaceId, ReachableDefinitions>,
/// Snapshot of bindings in this scope that can be used to resolve a reference in a nested
/// eager scope.
eager_snapshots: EagerSnapshots,
/// Whether or not the end of the scope is reachable.
///
/// This is used to check if the function can implicitly return `None`.
/// For example:
/// ```py
/// def f(cond: bool) -> int | None:
/// if cond:
/// return 1
///
/// def g() -> int:
/// if True:
/// return 1
/// ```
///
/// Function `f` may implicitly return `None`, but `g` cannot.
///
/// This is used by [`UseDefMap::can_implicitly_return_none`].
end_of_scope_reachability: ScopedReachabilityConstraintId,
}
pub(crate) enum ApplicableConstraints<'map, 'db> {
UnboundBinding(ConstraintsIterator<'map, 'db>),
ConstrainedBindings(BindingWithConstraintsIterator<'map, 'db>),
}
impl<'db> UseDefMap<'db> {
pub(crate) fn bindings_at_use(
&self,
use_id: ScopedUseId,
) -> BindingWithConstraintsIterator<'_, 'db> {
self.bindings_iterator(
&self.bindings_by_use[use_id],
BoundnessAnalysis::BasedOnUnboundVisibility,
)
}
pub(crate) fn applicable_constraints(
&self,
constraint_key: ConstraintKey,
enclosing_scope: FileScopeId,
expr: &PlaceExpr,
index: &'db SemanticIndex,
) -> ApplicableConstraints<'_, 'db> {
match constraint_key {
ConstraintKey::NarrowingConstraint(constraint) => {
ApplicableConstraints::UnboundBinding(ConstraintsIterator {
predicates: &self.predicates,
constraint_ids: self.narrowing_constraints.iter_predicates(constraint),
})
}
ConstraintKey::EagerNestedScope(nested_scope) => {
let EagerSnapshotResult::FoundBindings(bindings) =
index.eager_snapshot(enclosing_scope, expr, nested_scope)
else {
unreachable!(
"The result of `SemanticIndex::eager_snapshot` must be `FoundBindings`"
)
};
ApplicableConstraints::ConstrainedBindings(bindings)
}
ConstraintKey::UseId(use_id) => {
ApplicableConstraints::ConstrainedBindings(self.bindings_at_use(use_id))
}
}
}
pub(super) fn is_reachable(
&self,
db: &dyn crate::Db,
reachability: ScopedReachabilityConstraintId,
) -> bool {
self.reachability_constraints
.evaluate(db, &self.predicates, reachability)
.may_be_true()
}
/// Check whether or not a given expression is reachable from the start of the scope. This
/// is a local analysis which does not capture the possibility that the entire scope might
/// be unreachable. Use [`super::SemanticIndex::is_node_reachable`] for the global
/// analysis.
#[track_caller]
pub(super) fn is_node_reachable(&self, db: &dyn crate::Db, node_key: NodeKey) -> bool {
self
.reachability_constraints
.evaluate(
db,
&self.predicates,
*self
.node_reachability
.get(&node_key)
.expect("`is_node_reachable` should only be called on AST nodes with recorded reachability"),
)
.may_be_true()
}
pub(crate) fn end_of_scope_bindings(
&self,
place: ScopedPlaceId,
) -> BindingWithConstraintsIterator<'_, 'db> {
self.bindings_iterator(
self.end_of_scope_places[place].bindings(),
BoundnessAnalysis::BasedOnUnboundVisibility,
)
}
pub(crate) fn all_reachable_bindings(
&self,
place: ScopedPlaceId,
) -> BindingWithConstraintsIterator<'_, 'db> {
self.bindings_iterator(
&self.reachable_definitions[place].bindings,
BoundnessAnalysis::AssumeBound,
)
}
pub(crate) fn eager_snapshot(
&self,
eager_bindings: ScopedEagerSnapshotId,
) -> EagerSnapshotResult<'_, 'db> {
match self.eager_snapshots.get(eager_bindings) {
Some(EagerSnapshot::Constraint(constraint)) => {
EagerSnapshotResult::FoundConstraint(*constraint)
}
Some(EagerSnapshot::Bindings(bindings)) => EagerSnapshotResult::FoundBindings(
self.bindings_iterator(bindings, BoundnessAnalysis::BasedOnUnboundVisibility),
),
None => EagerSnapshotResult::NotFound,
}
}
pub(crate) fn bindings_at_declaration(
&self,
declaration: Definition<'db>,
) -> BindingWithConstraintsIterator<'_, 'db> {
self.bindings_iterator(
&self.bindings_by_declaration[&declaration],
BoundnessAnalysis::BasedOnUnboundVisibility,
)
}
pub(crate) fn declarations_at_binding(
&self,
binding: Definition<'db>,
) -> DeclarationsIterator<'_, 'db> {
self.declarations_iterator(
&self.declarations_by_binding[&binding],
BoundnessAnalysis::BasedOnUnboundVisibility,
)
}
pub(crate) fn end_of_scope_declarations<'map>(
&'map self,
place: ScopedPlaceId,
) -> DeclarationsIterator<'map, 'db> {
let declarations = self.end_of_scope_places[place].declarations();
self.declarations_iterator(declarations, BoundnessAnalysis::BasedOnUnboundVisibility)
}
pub(crate) fn all_reachable_declarations(
&self,
place: ScopedPlaceId,
) -> DeclarationsIterator<'_, 'db> {
let declarations = &self.reachable_definitions[place].declarations;
self.declarations_iterator(declarations, BoundnessAnalysis::AssumeBound)
}
pub(crate) fn all_end_of_scope_declarations<'map>(
&'map self,
) -> impl Iterator<Item = (ScopedPlaceId, DeclarationsIterator<'map, 'db>)> + 'map {
(0..self.end_of_scope_places.len())
.map(ScopedPlaceId::from_usize)
.map(|place_id| (place_id, self.end_of_scope_declarations(place_id)))
}
pub(crate) fn all_end_of_scope_bindings<'map>(
&'map self,
) -> impl Iterator<Item = (ScopedPlaceId, BindingWithConstraintsIterator<'map, 'db>)> + 'map
{
(0..self.end_of_scope_places.len())
.map(ScopedPlaceId::from_usize)
.map(|place_id| (place_id, self.end_of_scope_bindings(place_id)))
}
/// This function is intended to be called only once inside `TypeInferenceBuilder::infer_function_body`.
pub(crate) fn can_implicitly_return_none(&self, db: &dyn crate::Db) -> bool {
!self
.reachability_constraints
.evaluate(db, &self.predicates, self.end_of_scope_reachability)
.is_always_false()
}
pub(crate) fn is_binding_reachable(
&self,
db: &dyn crate::Db,
binding: &BindingWithConstraints<'_, 'db>,
) -> Truthiness {
self.reachability_constraints.evaluate(
db,
&self.predicates,
binding.reachability_constraint,
)
}
fn bindings_iterator<'map>(
&'map self,
bindings: &'map Bindings,
boundness_analysis: BoundnessAnalysis,
) -> BindingWithConstraintsIterator<'map, 'db> {
BindingWithConstraintsIterator {
all_definitions: &self.all_definitions,
predicates: &self.predicates,
narrowing_constraints: &self.narrowing_constraints,
reachability_constraints: &self.reachability_constraints,
boundness_analysis,
inner: bindings.iter(),
}
}
fn declarations_iterator<'map>(
&'map self,
declarations: &'map Declarations,
boundness_analysis: BoundnessAnalysis,
) -> DeclarationsIterator<'map, 'db> {
DeclarationsIterator {
all_definitions: &self.all_definitions,
predicates: &self.predicates,
reachability_constraints: &self.reachability_constraints,
boundness_analysis,
inner: declarations.iter(),
}
}
}
/// Uniquely identifies a snapshot of a place state that can be used to resolve a reference in a
/// nested eager scope.
///
/// An eager scope has its entire body executed immediately at the location where it is defined.
/// For any free references in the nested scope, we use the bindings that are visible at the point
/// where the nested scope is defined, instead of using the public type of the place.
///
/// There is a unique ID for each distinct [`EagerSnapshotKey`] in the file.
#[newtype_index]
#[derive(get_size2::GetSize)]
pub(crate) struct ScopedEagerSnapshotId;
#[derive(Clone, Copy, Debug, Eq, Hash, PartialEq, get_size2::GetSize)]
pub(crate) struct EagerSnapshotKey {
/// The enclosing scope containing the bindings
pub(crate) enclosing_scope: FileScopeId,
/// The referenced place (in the enclosing scope)
pub(crate) enclosing_place: ScopedPlaceId,
/// The nested eager scope containing the reference
pub(crate) nested_scope: FileScopeId,
}
/// A snapshot of place states that can be used to resolve a reference in a nested eager scope.
type EagerSnapshots = IndexVec<ScopedEagerSnapshotId, EagerSnapshot>;
#[derive(Debug)]
pub(crate) struct BindingWithConstraintsIterator<'map, 'db> {
all_definitions: &'map IndexVec<ScopedDefinitionId, DefinitionState<'db>>,
pub(crate) predicates: &'map Predicates<'db>,
pub(crate) narrowing_constraints: &'map NarrowingConstraints,
pub(crate) reachability_constraints: &'map ReachabilityConstraints,
pub(crate) boundness_analysis: BoundnessAnalysis,
inner: LiveBindingsIterator<'map>,
}
impl<'map, 'db> Iterator for BindingWithConstraintsIterator<'map, 'db> {
type Item = BindingWithConstraints<'map, 'db>;
fn next(&mut self) -> Option<Self::Item> {
let predicates = self.predicates;
let narrowing_constraints = self.narrowing_constraints;
self.inner
.next()
.map(|live_binding| BindingWithConstraints {
binding: self.all_definitions[live_binding.binding],
narrowing_constraint: ConstraintsIterator {
predicates,
constraint_ids: narrowing_constraints
.iter_predicates(live_binding.narrowing_constraint),
},
reachability_constraint: live_binding.reachability_constraint,
})
}
}
impl std::iter::FusedIterator for BindingWithConstraintsIterator<'_, '_> {}
pub(crate) struct BindingWithConstraints<'map, 'db> {
pub(crate) binding: DefinitionState<'db>,
pub(crate) narrowing_constraint: ConstraintsIterator<'map, 'db>,
pub(crate) reachability_constraint: ScopedReachabilityConstraintId,
}
pub(crate) struct ConstraintsIterator<'map, 'db> {
predicates: &'map Predicates<'db>,
constraint_ids: NarrowingConstraintsIterator<'map>,
}
impl<'db> Iterator for ConstraintsIterator<'_, 'db> {
type Item = Predicate<'db>;
fn next(&mut self) -> Option<Self::Item> {
self.constraint_ids
.next()
.map(|narrowing_constraint| self.predicates[narrowing_constraint.predicate()])
}
}
impl std::iter::FusedIterator for ConstraintsIterator<'_, '_> {}
impl<'db> ConstraintsIterator<'_, 'db> {
pub(crate) fn narrow(
self,
db: &'db dyn crate::Db,
base_ty: Type<'db>,
place: ScopedPlaceId,
) -> Type<'db> {
let constraint_tys: Vec<_> = self
.filter_map(|constraint| infer_narrowing_constraint(db, constraint, place))
.collect();
if constraint_tys.is_empty() {
base_ty
} else {
constraint_tys
.into_iter()
.rev()
.fold(
IntersectionBuilder::new(db).add_positive(base_ty),
IntersectionBuilder::add_positive,
)
.build()
}
}
}
#[derive(Clone)]
pub(crate) struct DeclarationsIterator<'map, 'db> {
all_definitions: &'map IndexVec<ScopedDefinitionId, DefinitionState<'db>>,
pub(crate) predicates: &'map Predicates<'db>,
pub(crate) reachability_constraints: &'map ReachabilityConstraints,
pub(crate) boundness_analysis: BoundnessAnalysis,
inner: LiveDeclarationsIterator<'map>,
}
pub(crate) struct DeclarationWithConstraint<'db> {
pub(crate) declaration: DefinitionState<'db>,
pub(crate) reachability_constraint: ScopedReachabilityConstraintId,
}
impl<'db> Iterator for DeclarationsIterator<'_, 'db> {
type Item = DeclarationWithConstraint<'db>;
fn next(&mut self) -> Option<Self::Item> {
self.inner.next().map(
|LiveDeclaration {
declaration,
reachability_constraint,
}| {
DeclarationWithConstraint {
declaration: self.all_definitions[*declaration],
reachability_constraint: *reachability_constraint,
}
},
)
}
}
impl std::iter::FusedIterator for DeclarationsIterator<'_, '_> {}
#[derive(Debug, PartialEq, Eq, salsa::Update, get_size2::GetSize)]
struct ReachableDefinitions {
bindings: Bindings,
declarations: Declarations,
}
/// A snapshot of the definitions and constraints state at a particular point in control flow.
#[derive(Clone, Debug)]
pub(super) struct FlowSnapshot {
place_states: IndexVec<ScopedPlaceId, PlaceState>,
reachability: ScopedReachabilityConstraintId,
}
#[derive(Debug)]
pub(super) struct UseDefMapBuilder<'db> {
/// Append-only array of [`DefinitionState`].
all_definitions: IndexVec<ScopedDefinitionId, DefinitionState<'db>>,
/// Builder of predicates.
pub(super) predicates: PredicatesBuilder<'db>,
/// Builder of narrowing constraints.
pub(super) narrowing_constraints: NarrowingConstraintsBuilder,
/// Builder of reachability constraints.
pub(super) reachability_constraints: ReachabilityConstraintsBuilder,
/// Live bindings at each so-far-recorded use.
bindings_by_use: IndexVec<ScopedUseId, Bindings>,
/// Tracks whether or not the current point in control flow is reachable from the
/// start of the scope.
pub(super) reachability: ScopedReachabilityConstraintId,
/// Tracks whether or not a given AST node is reachable from the start of the scope.
node_reachability: FxHashMap<NodeKey, ScopedReachabilityConstraintId>,
/// Live declarations for each so-far-recorded binding.
declarations_by_binding: FxHashMap<Definition<'db>, Declarations>,
/// Live bindings for each so-far-recorded declaration.
bindings_by_declaration: FxHashMap<Definition<'db>, Bindings>,
/// Currently live bindings and declarations for each place.
place_states: IndexVec<ScopedPlaceId, PlaceState>,
/// All potentially reachable bindings and declarations, for each place.
reachable_definitions: IndexVec<ScopedPlaceId, ReachableDefinitions>,
/// Snapshots of place states in this scope that can be used to resolve a reference in a
/// nested eager scope.
eager_snapshots: EagerSnapshots,
/// Is this a class scope?
is_class_scope: bool,
}
impl<'db> UseDefMapBuilder<'db> {
pub(super) fn new(is_class_scope: bool) -> Self {
Self {
all_definitions: IndexVec::from_iter([DefinitionState::Undefined]),
predicates: PredicatesBuilder::default(),
narrowing_constraints: NarrowingConstraintsBuilder::default(),
reachability_constraints: ReachabilityConstraintsBuilder::default(),
bindings_by_use: IndexVec::new(),
reachability: ScopedReachabilityConstraintId::ALWAYS_TRUE,
node_reachability: FxHashMap::default(),
declarations_by_binding: FxHashMap::default(),
bindings_by_declaration: FxHashMap::default(),
place_states: IndexVec::new(),
reachable_definitions: IndexVec::new(),
eager_snapshots: EagerSnapshots::default(),
is_class_scope,
}
}
pub(super) fn mark_unreachable(&mut self) {
self.reachability = ScopedReachabilityConstraintId::ALWAYS_FALSE;
for state in &mut self.place_states {
state.record_reachability_constraint(
&mut self.reachability_constraints,
ScopedReachabilityConstraintId::ALWAYS_FALSE,
);
}
}
pub(super) fn add_place(&mut self, place: ScopedPlaceId) {
let new_place = self
.place_states
.push(PlaceState::undefined(self.reachability));
debug_assert_eq!(place, new_place);
let new_place = self.reachable_definitions.push(ReachableDefinitions {
bindings: Bindings::unbound(self.reachability),
declarations: Declarations::undeclared(self.reachability),
});
debug_assert_eq!(place, new_place);
}
pub(super) fn record_binding(
&mut self,
place: ScopedPlaceId,
binding: Definition<'db>,
is_place_name: bool,
) {
let def_id = self.all_definitions.push(DefinitionState::Defined(binding));
let place_state = &mut self.place_states[place];
self.declarations_by_binding
.insert(binding, place_state.declarations().clone());
place_state.record_binding(
def_id,
self.reachability,
self.is_class_scope,
is_place_name,
);
self.reachable_definitions[place].bindings.record_binding(
def_id,
self.reachability,
self.is_class_scope,
is_place_name,
PreviousDefinitions::AreKept,
);
}
pub(super) fn add_predicate(
&mut self,
predicate: PredicateOrLiteral<'db>,
) -> ScopedPredicateId {
match predicate {
PredicateOrLiteral::Predicate(predicate) => self.predicates.add_predicate(predicate),
PredicateOrLiteral::Literal(true) => ScopedPredicateId::ALWAYS_TRUE,
PredicateOrLiteral::Literal(false) => ScopedPredicateId::ALWAYS_FALSE,
}
}
pub(super) fn record_narrowing_constraint(&mut self, predicate: ScopedPredicateId) {
if predicate == ScopedPredicateId::ALWAYS_TRUE
|| predicate == ScopedPredicateId::ALWAYS_FALSE
{
// No need to record a narrowing constraint for `True` or `False`.
return;
}
let narrowing_constraint = predicate.into();
for state in &mut self.place_states {
state
.record_narrowing_constraint(&mut self.narrowing_constraints, narrowing_constraint);
}
}
/// Snapshot the state of a single place at the current point in control flow.
///
/// This is only used for `*`-import reachability constraints, which are handled differently
/// to most other reachability constraints. See the doc-comment for
/// [`Self::record_and_negate_star_import_reachability_constraint`] for more details.
pub(super) fn single_place_snapshot(&self, place: ScopedPlaceId) -> PlaceState {
self.place_states[place].clone()
}
/// This method exists solely for handling `*`-import reachability constraints.
///
/// The reason why we add reachability constraints for [`Definition`]s created by `*` imports
/// is laid out in the doc-comment for `StarImportPlaceholderPredicate`. But treating these
/// reachability constraints in the use-def map the same way as all other reachability constraints
/// was shown to lead to [significant regressions] for small codebases where typeshed
/// dominates. (Although `*` imports are not common generally, they are used in several
/// important places by typeshed.)
///
/// To solve these regressions, it was observed that we could do significantly less work for
/// `*`-import definitions. We do a number of things differently here to our normal handling of
/// reachability constraints:
///
/// - We only apply and negate the reachability constraints to a single symbol, rather than to
/// all symbols. This is possible here because, unlike most definitions, we know in advance that
/// exactly one definition occurs inside the "if-true" predicate branch, and we know exactly
/// which definition it is.
///
/// - We only snapshot the state for a single place prior to the definition, rather than doing
/// expensive calls to [`Self::snapshot`]. Again, this is possible because we know
/// that only a single definition occurs inside the "if-predicate-true" predicate branch.
///
/// - Normally we take care to check whether an "if-predicate-true" branch or an
/// "if-predicate-false" branch contains a terminal statement: these can affect the reachability
/// of symbols defined inside either branch. However, in the case of `*`-import definitions,
/// this is unnecessary (and therefore not done in this method), since we know that a `*`-import
/// predicate cannot create a terminal statement inside either branch.
///
/// [significant regressions]: https://github.com/astral-sh/ruff/pull/17286#issuecomment-2786755746
pub(super) fn record_and_negate_star_import_reachability_constraint(
&mut self,
reachability_id: ScopedReachabilityConstraintId,
symbol: ScopedPlaceId,
pre_definition_state: PlaceState,
) {
let negated_reachability_id = self
.reachability_constraints
.add_not_constraint(reachability_id);
let mut post_definition_state =
std::mem::replace(&mut self.place_states[symbol], pre_definition_state);
post_definition_state
.record_reachability_constraint(&mut self.reachability_constraints, reachability_id);
self.place_states[symbol].record_reachability_constraint(
&mut self.reachability_constraints,
negated_reachability_id,
);
self.place_states[symbol].merge(
post_definition_state,
&mut self.narrowing_constraints,
&mut self.reachability_constraints,
);
}
pub(super) fn record_reachability_constraint(
&mut self,
constraint: ScopedReachabilityConstraintId,
) {
self.reachability = self
.reachability_constraints
.add_and_constraint(self.reachability, constraint);
for state in &mut self.place_states {
state.record_reachability_constraint(&mut self.reachability_constraints, constraint);
}
}
pub(super) fn record_declaration(
&mut self,
place: ScopedPlaceId,
declaration: Definition<'db>,
) {
let def_id = self
.all_definitions
.push(DefinitionState::Defined(declaration));
let place_state = &mut self.place_states[place];
self.bindings_by_declaration
.insert(declaration, place_state.bindings().clone());
place_state.record_declaration(def_id, self.reachability);
self.reachable_definitions[place]
.declarations
.record_declaration(def_id, self.reachability, PreviousDefinitions::AreKept);
}
pub(super) fn record_declaration_and_binding(
&mut self,
place: ScopedPlaceId,
definition: Definition<'db>,
is_place_name: bool,
) {
// We don't need to store anything in self.bindings_by_declaration or
// self.declarations_by_binding.
let def_id = self
.all_definitions
.push(DefinitionState::Defined(definition));
let place_state = &mut self.place_states[place];
place_state.record_declaration(def_id, self.reachability);
place_state.record_binding(
def_id,
self.reachability,
self.is_class_scope,
is_place_name,
);
self.reachable_definitions[place]
.declarations
.record_declaration(def_id, self.reachability, PreviousDefinitions::AreKept);
self.reachable_definitions[place].bindings.record_binding(
def_id,
self.reachability,
self.is_class_scope,
is_place_name,
PreviousDefinitions::AreKept,
);
}
pub(super) fn delete_binding(&mut self, place: ScopedPlaceId, is_place_name: bool) {
let def_id = self.all_definitions.push(DefinitionState::Deleted);
let place_state = &mut self.place_states[place];
place_state.record_binding(
def_id,
self.reachability,
self.is_class_scope,
is_place_name,
);
}
pub(super) fn record_use(
&mut self,
place: ScopedPlaceId,
use_id: ScopedUseId,
node_key: NodeKey,
) {
// We have a use of a place; clone the current bindings for that place, and record them
// as the live bindings for this use.
let new_use = self
.bindings_by_use
.push(self.place_states[place].bindings().clone());
debug_assert_eq!(use_id, new_use);
// Track reachability of all uses of places to silence `unresolved-reference`
// diagnostics in unreachable code.
self.record_node_reachability(node_key);
}
pub(super) fn record_node_reachability(&mut self, node_key: NodeKey) {
self.node_reachability.insert(node_key, self.reachability);
}
pub(super) fn snapshot_eager_state(
&mut self,
enclosing_place: ScopedPlaceId,
scope: ScopeKind,
enclosing_place_expr: &PlaceExprWithFlags,
) -> ScopedEagerSnapshotId {
// Names bound in class scopes are never visible to nested scopes (but attributes/subscripts are visible),
// so we never need to save eager scope bindings in a class scope.
if (scope.is_class() && enclosing_place_expr.is_name()) || !enclosing_place_expr.is_bound()
{
self.eager_snapshots.push(EagerSnapshot::Constraint(
self.place_states[enclosing_place]
.bindings()
.unbound_narrowing_constraint(),
))
} else {
self.eager_snapshots.push(EagerSnapshot::Bindings(
self.place_states[enclosing_place].bindings().clone(),
))
}
}
/// Take a snapshot of the current visible-places state.
pub(super) fn snapshot(&self) -> FlowSnapshot {
FlowSnapshot {
place_states: self.place_states.clone(),
reachability: self.reachability,
}
}
/// Restore the current builder places state to the given snapshot.
pub(super) fn restore(&mut self, snapshot: FlowSnapshot) {
// We never remove places from `place_states` (it's an IndexVec, and the place
// IDs must line up), so the current number of known places must always be equal to or
// greater than the number of known places in a previously-taken snapshot.
let num_places = self.place_states.len();
debug_assert!(num_places >= snapshot.place_states.len());
// Restore the current visible-definitions state to the given snapshot.
self.place_states = snapshot.place_states;
self.reachability = snapshot.reachability;
// If the snapshot we are restoring is missing some places we've recorded since, we need
// to fill them in so the place IDs continue to line up. Since they don't exist in the
// snapshot, the correct state to fill them in with is "undefined".
self.place_states
.resize(num_places, PlaceState::undefined(self.reachability));
}
/// Merge the given snapshot into the current state, reflecting that we might have taken either
/// path to get here. The new state for each place should include definitions from both the
/// prior state and the snapshot.
pub(super) fn merge(&mut self, snapshot: FlowSnapshot) {
// As an optimization, if we know statically that either of the snapshots is always
// unreachable, we can leave it out of the merged result entirely. Note that we cannot
// perform any type inference at this point, so this is largely limited to unreachability
// via terminal statements. If a flow's reachability depends on an expression in the code,
// we will include the flow in the merged result; the reachability constraints of its
// bindings will include this reachability condition, so that later during type inference,
// we can determine whether any particular binding is non-visible due to unreachability.
if snapshot.reachability == ScopedReachabilityConstraintId::ALWAYS_FALSE {
return;
}
if self.reachability == ScopedReachabilityConstraintId::ALWAYS_FALSE {
self.restore(snapshot);
return;
}
// We never remove places from `place_states` (it's an IndexVec, and the place
// IDs must line up), so the current number of known places must always be equal to or
// greater than the number of known places in a previously-taken snapshot.
debug_assert!(self.place_states.len() >= snapshot.place_states.len());
let mut snapshot_definitions_iter = snapshot.place_states.into_iter();
for current in &mut self.place_states {
if let Some(snapshot) = snapshot_definitions_iter.next() {
current.merge(
snapshot,
&mut self.narrowing_constraints,
&mut self.reachability_constraints,
);
} else {
current.merge(
PlaceState::undefined(snapshot.reachability),
&mut self.narrowing_constraints,
&mut self.reachability_constraints,
);
// Place not present in snapshot, so it's unbound/undeclared from that path.
}
}
self.reachability = self
.reachability_constraints
.add_or_constraint(self.reachability, snapshot.reachability);
}
pub(super) fn finish(mut self) -> UseDefMap<'db> {
self.all_definitions.shrink_to_fit();
self.place_states.shrink_to_fit();
self.reachable_definitions.shrink_to_fit();
self.bindings_by_use.shrink_to_fit();
self.node_reachability.shrink_to_fit();
self.declarations_by_binding.shrink_to_fit();
self.bindings_by_declaration.shrink_to_fit();
self.eager_snapshots.shrink_to_fit();
UseDefMap {
all_definitions: self.all_definitions,
predicates: self.predicates.build(),
narrowing_constraints: self.narrowing_constraints.build(),
reachability_constraints: self.reachability_constraints.build(),
bindings_by_use: self.bindings_by_use,
node_reachability: self.node_reachability,
end_of_scope_places: self.place_states,
reachable_definitions: self.reachable_definitions,
declarations_by_binding: self.declarations_by_binding,
bindings_by_declaration: self.bindings_by_declaration,
eager_snapshots: self.eager_snapshots,
end_of_scope_reachability: self.reachability,
}
}
}