Commit Graph

39 Commits

Author SHA1 Message Date
Alex Waygood
5bac4f6bd4 [red-knot] Add a test to ensure that KnownClass::try_from_file_and_name() is kept up to date (#16326) 2025-02-24 12:14:20 +00:00
Micha Reiser
b3e99b25bf Fix missing serde feature for red_knot_python_semantic (#16169)
## Summary

Running `cargo test -p red_knot_python_semantic` failed because of a
missing serde feature. This PR enables the `ruff_python_ast`'`s `serde`
if the crate's `serde` feature is enabled

## Test Plan

`cargo test -p red_knot_python_semantic` compiles again
2025-02-14 20:31:55 +00:00
Ibraheem Ahmed
69d86d1d69 Transition to salsa coarse-grained tracked structs (#15763)
## Summary

Transition to using coarse-grained tracked structs (depends on
https://github.com/salsa-rs/salsa/pull/657). For now, this PR doesn't
add any `#[tracked]` fields, meaning that any changes cause the entire
struct to be invalidated. It also changes `AstNodeRef` to be
compared/hashed by pointer address, instead of performing a deep AST
comparison.

## Test Plan

This yields a 10-15% improvement on my machine (though weirdly some runs
were 5-10% without being flagged as inconsistent by criterion, is there
some non-determinism involved?). It's possible that some of this is
unrelated, I'll try applying the patch to the current salsa version to
make sure.

---------

Co-authored-by: Micha Reiser <micha@reiser.io>
2025-02-11 11:38:50 +01:00
Micha Reiser
af832560fc [red-knot] User-level configuration (#16021)
## Summary

This PR adds support for user-level configurations
(`~/.config/knot/knot.toml`) to Red Knot.

Red Knot will watch the user-level configuration file for changes but
only if it exists
when the process start. It doesn't watch for new configurations, 
mainly to simplify things for now (it would require watching the entire
`.config` directory because the `knot` subfolder might not exist
either).

The new `ConfigurationFile` struct seems a bit overkill for now but I
plan to use it for
hierarchical configurations as well. 


Red Knot uses the same strategy as uv and Ruff by using the etcetera
crate.

## Test Plan

Added CLI and file watching test
2025-02-10 16:44:23 +01:00
Micha Reiser
f7819e553f Add user_configuration_directory to System (#16020)
## Summary

This PR adds a new `user_configuration_directory` method to `System`. We
need it to resolve where to lookup a user-level `knot.toml`
configuration file.
The method belongs to `System` because not all platforms have a
convention of where to store such configuration files (e.g. wasm).


I refactored `TestSystem` to be a simple wrapper around an `Arc<dyn
System...>` and use the `System.as_any` method instead to cast it down
to an `InMemory` system. I also removed some `System` specific methods
from `InMemoryFileSystem`, they don't belong there.

This PR removes the `os` feature as a default feature from `ruff_db`.
Most crates depending on `ruff_db` don't need it because they only
depend on `System` or only depend on `os` for testing. This was
necessary to fix a compile error with `red_knot_wasm`

## Test Plan

I'll make use of the method in my next PR. So I guess we won't know if
it works before then but I copied the code from Ruff/uv, so I have high
confidence that it is correct.

`cargo test`
2025-02-10 15:50:55 +01:00
Micha Reiser
26c37b1e0e Add knot.toml schema (#15735)
## Summary

Adds a JSON schema generation step for Red Knot. This PR doesn't yet add
a publishing step because it's still a bit early for that


## Test plan

I tested the schema in Zed, VS Code and PyCharm:

* PyCharm: You have to manually add a schema mapping (settings JSON
Schema Mappings)
* Zed and VS code support the inline schema specification

```toml
#:schema /Users/micha/astral/ruff/knot.schema.json


[environment]
extra-paths = []


[rules]
call-possibly-unbound-method = "error"
unknown-rule = "error"

# duplicate-base = "error"
```

```json
{
    "$schema": "file:///Users/micha/astral/ruff/knot.schema.json",

    "environment": {
        "python-version": "3.13",
        "python-platform": "linux2"
    },

    "rules": {
        "unknown-rule": "error"
    }
}
```


https://github.com/user-attachments/assets/a18fcd96-7cbe-4110-985b-9f1935584411


The Schema overall works but all editors have their own quirks:

* PyCharm: Hovering a name always shows the section description instead
of the description of the specific setting. But it's the same for other
settings in `pyproject.toml` files 🤷
* VS Code (JSON): Using the generated schema in a JSON file gives
exactly the experience I want
* VS Code (TOML): 
* Properties with multiple possible values are repeated during
auto-completion without giving any hint how they're different. ![Screen
Shot 2025-02-06 at 14 05 35
PM](https://github.com/user-attachments/assets/d7f3c2a9-2351-4226-9fc1-b91aa192a237)
* The property description mushes together the description of the
property and the value, which looks sort of ridiculous. ![Screen Shot
2025-02-06 at 14 04 40
PM](https://github.com/user-attachments/assets/8b72f04a-c62a-49b5-810f-7ddd472884d0)
* Autocompletion and documentation hovering works (except the
limitations mentioned above)
* Zed:
* Very similar to VS Code with the exception that it uses the
description attribute to distinguish settings with multiple possible
values ![Screen Shot 2025-02-06 at 14 08 19
PM](https://github.com/user-attachments/assets/78a7f849-ff4e-44ff-8317-708eaf02dc1f)


I don't think there's much we can do here other than hope (or help)
editors improve their auto completion. The same short comings also apply
to ruff, so this isn't something new. For now, I think this is good
enough
2025-02-07 10:59:40 +01:00
Micha Reiser
2f85749fa0 type: ignore[codes] and knot: ignore (#15078) 2024-12-23 10:52:43 +01:00
Micha Reiser
0fc4e8f795 Introduce InferContext (#14956)
## Summary

I'm currently on the fence about landing the #14760 PR because it's
unclear how we'd support tracking used and unused suppression comments
in a performant way:
* Salsa adds an "untracked" dependency to every query reading
accumulated values. This has the effect that the query re-runs on every
revision. For example, a possible future query
`unused_suppression_comments(db, file)` would re-run on every
incremental change and for every file. I don't expect the operation
itself to be expensive, but it all adds up in a project with 100k+ files
* Salsa collects the accumulated values by traversing the entire query
dependency graph. It can skip over sub-graphs if it is known that they
contain no accumulated values. This makes accumulators a great tool for
when they are rare; diagnostics are a good example. Unfortunately,
suppressions are more common, and they often appear in many different
files, making the "skip over subgraphs" optimization less effective.

Because of that, I want to wait to adopt salsa accumulators for type
check diagnostics (we could start using them for other diagnostics)
until we have very specific reasons that justify regressing incremental
check performance.

This PR does a "small" refactor that brings us closer to what I have in
#14760 but without using accumulators. To emit a diagnostic, a method
needs:

* Access to the db
* Access to the currently checked file

This PR introduces a new `InferContext` that holds on to the db, the
current file, and the reported diagnostics. It replaces the
`TypeCheckDiagnosticsBuilder`. We pass the `InferContext` instead of the
`db` to methods that *might* emit diagnostics. This simplifies some of
the `Outcome` methods, which can now be called with a context instead of
a `db` and the diagnostics builder. Having the `db` and the file on a
single type like this would also be useful when using accumulators.

This PR doesn't solve the issue that the `Outcome` types feel somewhat
complicated nor that it can be annoying when you need to report a
`Diagnostic,` but you don't have access to an `InferContext` (or the
file). However, I also believe that accumulators won't solve these
problems because:

* Even with accumulators, it's necessary to have a reference to the file
that's being checked. The struggle would be to get a reference to that
file rather than getting a reference to `InferContext`.
* Users of the `HasTy` trait (e.g., a linter) don't want to bother
getting the `File` when calling `Type::return_ty` because they aren't
interested in the created diagnostics. They just want to know what
calling the current expression would return (and if it even is a
callable). This is what the different methods of `Outcome` enable today.
I can ask for the return type without needing extra data that's only
relevant for emitting a diagnostic.

A shortcoming of this approach is that it is now a bit confusing when to
pass `db` and when an `InferContext`. An option is that we'd make the
`file` on `InferContext` optional (it won't collect any diagnostics if
`None`) and change all methods on `Type` to take `InferContext` as the
first argument instead of a `db`. I'm interested in your opinion on
this.

Accumulators are definitely harder to use incorrectly because they
remove the need to merge the diagnostics explicitly and there's no risk
that we accidentally merge the diagnostics twice, resulting in
duplicated diagnostics. I still value performance more over making our
life slightly easier.
2024-12-18 12:22:33 +00:00
Micha Reiser
f52b1f4a4d Add tracing support to mdtest (#14935)
## Summary

This PR extends the mdtest configuration with a `log` setting that can
be any of:

* `true`: Enables tracing
* `false`: Disables tracing (default)
* String: An ENV_FILTER similar to `RED_KNOT_LOG`

```toml
log = true
```

Closes https://github.com/astral-sh/ruff/issues/13865

## Test Plan

I changed a test and tried `log=true`, `log=false`, and `log=INFO`
2024-12-13 09:10:01 +00:00
Micha Reiser
5fc8e5d80e [red-knot] Add infrastructure to declare lints (#14873)
## Summary

This is the second PR out of three that adds support for
enabling/disabling lint rules in Red Knot. You may want to take a look
at the [first PR](https://github.com/astral-sh/ruff/pull/14869) in this
stack to familiarize yourself with the used terminology.

This PR adds a new syntax to define a lint: 

```rust
declare_lint! {
    /// ## What it does
    /// Checks for references to names that are not defined.
    ///
    /// ## Why is this bad?
    /// Using an undefined variable will raise a `NameError` at runtime.
    ///
    /// ## Example
    ///
    /// ```python
    /// print(x)  # NameError: name 'x' is not defined
    /// ```
    pub(crate) static UNRESOLVED_REFERENCE = {
        summary: "detects references to names that are not defined",
        status: LintStatus::preview("1.0.0"),
        default_level: Level::Warn,
    }
}
```

A lint has a name and metadata about its status (preview, stable,
removed, deprecated), the default diagnostic level (unless the
configuration changes), and documentation. I use a macro here to derive
the kebab-case name and extract the documentation automatically.

This PR doesn't yet add any mechanism to discover all known lints. This
will be added in the next and last PR in this stack.


## Documentation
I documented some rules but then decided that it's probably not my best
use of time if I document all of them now (it also means that I play
catch-up with all of you forever). That's why I left some rules
undocumented (marked with TODO)

## Where is the best place to define all lints?

I'm not sure. I think what I have in this PR is fine but I also don't
love it because most lints are in a single place but not all of them. If
you have ideas, let me know.


## Why is the message not part of the lint, unlike Ruff's `Violation`

I understand that the main motivation for defining `message` on
`Violation` in Ruff is to remove the need to repeat the same message
over and over again. I'm not sure if this is an actual problem. Most
rules only emit a diagnostic in a single place and they commonly use
different messages if they emit diagnostics in different code paths,
requiring extra fields on the `Violation` struct.

That's why I'm not convinced that there's an actual need for it and
there are alternatives that can reduce the repetition when creating a
diagnostic:

* Create a helper function. We already do this in red knot with the
`add_xy` methods
* Create a custom `Diagnostic` implementation that tailors the entire
diagnostic and pre-codes e.g. the message

Avoiding an extra field on the `Violation` also removes the need to
allocate intermediate strings as it is commonly the place in Ruff.
Instead, Red Knot can use a borrowed string with `format_args`

## Test Plan

`cargo test`
2024-12-10 16:14:44 +00:00
David Peter
74309008fd [red-knot] Property tests (#14178)
## Summary

This PR adds a new `property_tests` module with quickcheck-based tests
that verify certain properties of types. The following properties are
currently checked:

* `is_equivalent_to`:
  * is reflexive: `T` is equivalent to itself
* `is_subtype_of`:
  * is reflexive: `T` is a subtype of `T`
* is antisymmetric: if `S <: T` and `T <: S`, then `S` is equivalent to
`T`
  * is transitive: `S <: T` & `T <: U` => `S <: U`
* `is_disjoint_from`:
  * is irreflexive: `T` is not disjoint from `T`
  * is symmetric: `S` disjoint from `T` => `T` disjoint from `S`
* `is_assignable_to`:
  * is reflexive
* `negate`:
  * is an involution: `T.negate().negate()` is equivalent to `T`

There are also some tests that validate higher-level properties like:

* `S <: T` implies that `S` is not disjoint from `T`
* `S <: T` implies that `S` is assignable to `T`
* A singleton type must also be single-valued

These tests found a few bugs so far:

- #14177 
- #14195 
- #14196 
- #14210
- #14731

Some additional notes:

- Quickcheck-based property tests are non-deterministic and finding
counter-examples might take an arbitrary long time. This makes them bad
candidates for running in CI (for every PR). We can think of running
them in a cron-job way from time to time, similar to fuzzing. But for
now, it's only possible to run them locally (see instructions in source
code).
- Some tests currently find false positive "counterexamples" because our
understanding of equivalence of types is not yet complete. We do not
understand that `int | str` is the same as `str | int`, for example.
These tests are in a separate `property_tests::flaky` module.
- Properties can not be formulated in every way possible, due to the
fact that `is_disjoint_from` and `is_subtype_of` can produce false
negative answers.
- The current shrinking implementation is very naive, which leads to
counterexamples that are very long (`str & Any & ~tuple[Any] &
~tuple[Unknown] & ~Literal[""] & ~Literal["a"] | str & int & ~tuple[Any]
& ~tuple[Unknown]`), requiring the developer to simplify manually. It
has not been a major issue so far, but there is a comment in the code
how this can be improved.
- The tests are currently implemented using a macro. This is a single
commit on top which can easily be reverted, if we prefer the plain code
instead. With the macro:
  ```rs
  // `S <: T` implies that `S` can be assigned to `T`.
  type_property_test!(
      subtype_of_implies_assignable_to, db,
forall types s, t. s.is_subtype_of(db, t) => s.is_assignable_to(db, t)
  );
  ```
  without the macro:
  ```rs
  /// `S <: T` implies that `S` can be assigned to `T`.
  #[quickcheck]
  fn subtype_of_implies_assignable_to(s: Ty, t: Ty) -> bool {
      let db = get_cached_db();
  
      let s = s.into_type(&db);
      let t = t.into_type(&db);
  
      !s.is_subtype_of(&*db, t) || s.is_assignable_to(&*db, t)
  }
  ```

## Test Plan

```bash
while cargo test --release -p red_knot_python_semantic --features property_tests types::property_tests; do :; done
```
2024-12-03 13:54:54 +01:00
Micha Reiser
81e5830585 Workspace discovery (#14308) 2024-11-15 19:20:15 +01:00
Micha Reiser
2b58705cc1 Remove the optional salsa dependency from the AST crate (#14363) 2024-11-15 16:46:04 +00:00
Dhruv Manilawala
9ec690b8f8 [red-knot] Add support for string annotations (#14151)
## Summary

This PR adds support for parsing and inferring types within string
annotations.

### Implementation (attempt 1)

This is preserved in
6217f48924.

The implementation here would separate the inference of string
annotations in the deferred query. This requires the following:
* Two ways of evaluating the deferred definitions - lazily and eagerly. 
* An eager evaluation occurs right outside the definition query which in
this case would be in `binding_ty` and `declaration_ty`.
* A lazy evaluation occurs on demand like using the
`definition_expression_ty` to determine the function return type and
class bases.
* The above point means that when trying to get the binding type for a
variable in an annotated assignment, the definition query won't include
the type. So, it'll require going through the deferred query to get the
type.

This has the following limitations:
* Nested string annotations, although not necessarily a useful feature,
is difficult to implement unless we convert the implementation in an
infinite loop
* Partial string annotations require complex layout because inferring
the types for stringified and non-stringified parts of the annotation
are done in separate queries. This means we need to maintain additional
information

### Implementation (attempt 2)

This is the final diff in this PR.

The implementation here does the complete inference of string annotation
in the same definition query by maintaining certain state while trying
to infer different parts of an expression and take decisions
accordingly. These are:
* Allow names that are part of a string annotation to not exists in the
symbol table. For example, in `x: "Foo"`, if the "Foo" symbol is not
defined then it won't exists in the symbol table even though it's being
used. This is an invariant which is being allowed only for symbols in a
string annotation.
* Similarly, lookup name is updated to do the same and if the symbol
doesn't exists, then it's not bounded.
* Store the final type of a string annotation on the string expression
itself and not for any of the sub-expressions that are created after
parsing. This is because those sub-expressions won't exists in the
semantic index.

Design document:
https://www.notion.so/astral-sh/String-Annotations-12148797e1ca801197a9f146641e5b71?pvs=4

Closes: #13796 

## Test Plan

* Add various test cases in our markdown framework
* Run `red_knot` on LibCST (contains a lot of string annotations,
specifically
https://github.com/Instagram/LibCST/blob/main/libcst/matchers/_matcher_base.py),
FastAPI (good amount of annotated code including `typing.Literal`) and
compare against the `main` branch output
2024-11-15 04:10:18 +00:00
Micha Reiser
bc0586d922 Avoid cloning Name when looking up function and class types (#14092) 2024-11-04 15:52:59 +01:00
Alex Waygood
df45a0e3f9 [red-knot] Add MRO resolution for classes (#14027) 2024-11-04 13:31:38 +00:00
Aditya Pratap Singh
7fd8e30eed [red-knot] Cleanup generated names of mdtest tests (#13831)
Co-authored-by: Alex Waygood <alex.waygood@gmail.com>
Co-authored-by: Micha Reiser <micha@reiser.io>
2024-10-20 15:11:53 +00:00
David Peter
04b636cba2 [red knot] Use memmem::find instead of custom version (#13750)
This is a follow-up on #13746:

- Use `memmem::find` instead of rolling our own inferior version.
- Avoid `x.as_ref()` calls using `&**x`
2024-10-14 15:17:19 +02:00
Carl Meyer
93eff7f174 [red-knot] type inference/checking test framework (#13636)
## Summary

Adds a markdown-based test framework for writing tests of type inference
and type checking. Fixes #11664.

Implements the basic required features. A markdown test file is a suite
of tests, each test can contain one or more Python files, with
optionally specified path/name. The test writes all files to an
in-memory file system, runs red-knot, and matches the resulting
diagnostics against `Type: ` and `Error: ` assertions embedded in the
Python source as comments.

We will want to add features like incremental tests, setting custom
configuration for tests, writing non-Python files, testing syntax
errors, capturing full diagnostic output, etc. There's also plenty of
room for improved UX (colored output?).

## Test Plan

Lots of tests!

Sample of the current output when a test fails:

```
     Running tests/inference.rs (target/debug/deps/inference-7c96590aa84de2a4)

running 1 test
test inference::path_1_resources_inference_numbers_md ... FAILED

failures:

---- inference::path_1_resources_inference_numbers_md stdout ----
inference/numbers.md - Numbers - Floats
  /src/test.py
    line 2: unexpected error: [invalid-assignment] "Object of type `Literal["str"]` is not assignable to `int`"

thread 'inference::path_1_resources_inference_numbers_md' panicked at crates/red_knot_test/src/lib.rs:60:5:
Some tests failed.
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


failures:
    inference::path_1_resources_inference_numbers_md

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.19s

error: test failed, to rerun pass `-p red_knot_test --test inference`
```

---------

Co-authored-by: Micha Reiser <micha@reiser.io>
Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
2024-10-08 12:33:19 -07:00
Simon
888930b7d3 [red-knot] feat: implement integer comparison (#13571)
## Summary

Implements the comparison operator for `[Type::IntLiteral]` and
`[Type::BooleanLiteral]` (as an artifact of special handling of `True` and
`False` in python).
Sets the framework to implement more comparison for types known at
static time (e.g. `BooleanLiteral`, `StringLiteral`), allowing us to only
implement cases of the triplet `<left> Type`, `<right> Type`, `CmpOp`.
Contributes to #12701 (without checking off an item yet).

## Test Plan

- Added a test for the comparison of literals that should include most
cases of note.
- Added a test for the comparison of int instances

Please note that the cases do not cover 100% of the branches as there
are many and the current testing strategy with variables make this
fairly confusing once we have too many in one test.

---------

Co-authored-by: Carl Meyer <carl@astral.sh>
Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
2024-10-04 10:40:59 -07:00
Alex Waygood
82324678cf Rename the ruff_vendored crate to red_knot_vendored (#13586) 2024-10-01 16:16:59 +01:00
Micha Reiser
653c09001a Use an empty vendored file system in Ruff (#13436)
## Summary

This PR changes removes the typeshed stubs from the vendored file system
shipped with ruff
and instead ships an empty "typeshed".

Making the typeshed files optional required extracting the typshed files
into a new `ruff_vendored` crate. I do like this even if all our builds
always include typeshed because it means `red_knot_python_semantic`
contains less code that needs compiling.

This also allows us to use deflate because the compression algorithm
doesn't matter for an archive containing a single, empty file.

## Test Plan

`cargo test`

I verified with ` cargo tree -f "{p} {f}" -p <package> ` that:

* red_knot_wasm: enables `deflate` compression
* red_knot: enables `zstd` compression
* `ruff`: uses stored


I'm not quiet sure how to build the binary that maturin builds but
comparing the release artifact size with `strip = true` shows a `1.5MB`
size reduction

---------

Co-authored-by: Charlie Marsh <charlie.r.marsh@gmail.com>
2024-09-21 16:31:42 +00:00
Carl Meyer
175d067250 [red-knot] add initial Type::is_equivalent_to and Type::is_assignable_to (#13332)
These are quite incomplete, but I needed to start stubbing them out in
order to build and test declared-types.

Allowing unused for now, until they are used later in the declared-types
PR.

---------

Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
2024-09-12 14:15:25 -04:00
Teodoro Freund
b9c8113a8a Added bytes type and some inference (#13061)
## Summary

This PR adds the `bytes` type to red-knot:
- Added the `bytes` type
- Added support for bytes literals
- Support for the `+` operator

Improves on #12701 

Big TODO on supporting and normalizing r-prefixed bytestrings
(`rb"hello\n"`)

## Test Plan

Added a test for a bytes literals, concatenation, and corner values
2024-08-22 13:27:15 -07:00
Micha Reiser
dce87c21fd Eagerly validate typeshed versions (#12786) 2024-08-21 15:49:53 +00:00
Carl Meyer
6359e55383 [red-knot] type narrowing (#12706)
Extend the `UseDefMap` to also track which constraints (provided by e.g.
`if` tests) apply to each visible definition.

Uses a custom `BitSet` and `BitSetArray` to track which constraints
apply to which definitions, while keeping data inline as much as
possible.
2024-08-16 16:34:13 -07:00
Dhruv Manilawala
99dc208b00 [red-knot] Add filename and source location for diagnostics (#12842)
## Summary

I'm not sure if this is useful but this is a hacky implementation to add
the filename and row / column numbers to the current Red Knot
diagnostics.
2024-08-12 15:56:30 +00:00
Micha Reiser
a99a45868c Eagerly validate search paths (#12783)
Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
2024-08-12 07:46:59 +00:00
Micha Reiser
ffaa35eafe Add test helper to setup tracing (#12741) 2024-08-09 07:04:04 +00:00
Carl Meyer
bc5b9b81dd [red-knot] add dev dependency on ruff_db os feature from red_knot_pyt… (#12760) 2024-08-08 18:10:30 +01:00
Alex Waygood
f1de08c2a0 [red-knot] Merge the semantic and module-resolver crates (#12751) 2024-08-08 15:34:11 +01:00
Micha Reiser
eac965ecaf [red-knot] Watch search paths (#12407) 2024-07-24 07:38:50 +00:00
Carl Meyer
f22c8ab811 [red-knot] add maybe-undefined lint rule (#12414)
Add a lint rule to detect if a name is definitely or possibly undefined
at a given usage.

If I create the file `undef/main.py` with contents:

```python
x = int
def foo():
    z
    return x
if flag:
    y = x
y
```

And then run `cargo run --bin red_knot -- --current-directory
../ruff-examples/undef`, I get the output:

```
Name 'z' used when not defined.
Name 'flag' used when not defined.
Name 'y' used when possibly not defined.
```

If I modify the file to add `y = 0` at the top, red-knot re-checks it
and I get the new output:

```
Name 'z' used when not defined.
Name 'flag' used when not defined.
```

Note that `int` is not flagged, since it's a builtin, and `return x` in
the function scope is not flagged, since it refers to the global `x`.
2024-07-22 13:53:59 -07:00
Carl Meyer
595b1aa4a1 [red-knot] per-definition inference, use-def maps (#12269)
Implements definition-level type inference, with basic control flow
(only if statements and if expressions so far) in Salsa.

There are a couple key ideas here:

1) We can do type inference queries at any of three region
granularities: an entire scope, a single definition, or a single
expression. These are represented by the `InferenceRegion` enum, and the
entry points are the salsa queries `infer_scope_types`,
`infer_definition_types`, and `infer_expression_types`. Generally
per-scope will be used for scopes that we are directly checking and
per-definition will be used anytime we are looking up symbol types from
another module/scope. Per-expression should be uncommon: used only for
the RHS of an unpacking or multi-target assignment (to avoid
re-inferring the RHS once per symbol defined in the assignment) and for
test nodes in type narrowing (e.g. the `test` of an `If` node). All
three queries return a `TypeInference` with a map of types for all
definitions and expressions within their region. If you do e.g.
scope-level inference, when it hits a definition, or an
independently-inferable expression, it should use the relevant query
(which may already be cached) to get all types within the smaller
region. This avoids double-inferring smaller regions, even though larger
regions encompass smaller ones.

2) Instead of building a control-flow graph and lazily traversing it to
find definitions which reach a use of a name (which is O(n^2) in the
worst case), instead semantic indexing builds a use-def map, where every
use of a name knows which definitions can reach that use. We also no
longer track all definitions of a symbol in the symbol itself; instead
the use-def map also records which defs remain visible at the end of the
scope, and considers these the publicly-visible definitions of the
symbol (see below).

Major items left as TODOs in this PR, to be done in follow-up PRs:

1) Free/global references aren't supported yet (only lookup based on
definitions in current scope), which means the override-check example
doesn't currently work. This is the first thing I'll fix as follow-up to
this PR.

2) Control flow outside of if statements and expressions.

3) Type narrowing.

There are also some smaller relevant changes here:

1) Eliminate `Option` in the return type of member lookups; instead
always return `Type::Unbound` for a name we can't find. Also use
`Type::Unbound` for modules we can't resolve (not 100% sure about this
one yet.)

2) Eliminate the use of the terms "public" and "root" to refer to
module-global scope or symbols. Instead consistently use the term
"module-global". It's longer, but it's the clearest, and the most
consistent with typical Python terminology. In particular I don't like
"public" for this use because it has other implications around author
intent (is an underscore-prefixed module-global symbol "public"?). And
"root" is just not commonly used for this in Python.

3) Eliminate the `PublicSymbol` Salsa ingredient. Many non-module-global
symbols can also be seen from other scopes (e.g. by a free var in a
nested scope, or by class attribute access), and thus need to have a
"public type" (that is, the type not as seen from a particular use in
the control flow of the same scope, but the type as seen from some other
scope.) So all symbols need to have a "public type" (here I want to keep
the use of the term "public", unless someone has a better term to
suggest -- since it's "public type of a symbol" and not "public symbol"
the confusion with e.g. initial underscores is less of an issue.) At
least initially, I would like to try not having special handling for
module-global symbols vs other symbols.

4) Switch to using "definitions that reach end of scope" rather than
"all definitions" in determining the public type of a symbol. I'm
convinced that in general this is the right way to go. We may want to
refine this further in future for some free-variable cases, but it can
be changed purely by making changes to the building of the use-def map
(the `public_definitions` index in it), without affecting any other
code. One consequence of combining this with no control-flow support
(just last-definition-wins) is that some inference tests now give more
wrong-looking results; I left TODO comments on these tests to fix them
when control flow is added.

And some potential areas for consideration in the future:

1) Should `symbol_ty` be a Salsa query? This would require making all
symbols a Salsa ingredient, and tracking even more dependencies. But it
would save some repeated reconstruction of unions, for symbols with
multiple public definitions. For now I'm not making it a query, but open
to changing this in future with actual perf evidence that it's better.
2024-07-16 11:02:30 -07:00
Carl Meyer
0e44235981 [red-knot] intern types using Salsa (#12061)
Intern types using Salsa interning instead of in the `TypeInference`
result.

This eliminates the need for `TypingContext`, and also paves the way for
finer-grained type inference queries.
2024-07-05 12:16:37 -07:00
Micha Reiser
3ce8b9fcae Make Definition a salsa-ingredient (#12151) 2024-07-04 06:46:08 +00:00
Micha Reiser
5109b50bb3 Use CompactString for Identifier (#12101) 2024-07-01 10:06:02 +02:00
Alex Waygood
736a4ead14 [red-knot] Move module-resolution logic to its own crate (#11964) 2024-06-21 13:25:44 +00:00
Micha Reiser
2dfbf118d7 [red-knot] Extract red_knot_python_semantic crate (#11926) 2024-06-20 13:24:24 +02:00