## Summary
If the user is analyzing a script (i.e., we have no module path), it
seems reasonable to use the script name when trying to identify paths to
objects defined _within_ the script.
Closes https://github.com/astral-sh/ruff/issues/10960.
## Test Plan
Ran:
```shell
check --isolated --select=B008 \
--config 'lint.flake8-bugbear.extend-immutable-calls=["test.A"]' \
test.py
```
On:
```python
class A: pass
def f(a=A()):
pass
```
## Summary
Fixes#3011.
Type checkers currently allow forward references in all contexts in stub
files, and stubs frequently make use of this capability (although it
doesn't actually seem to be specc'd anywhere --neither in PEP 484, nor
https://typing.readthedocs.io/en/latest/source/stubs.html#id6, nor the
CPython typing docs). Implementing it so that Ruff allows forward
references in _all contexts_ in stub files seems non-trivial, however
(or at least, I couldn't figure out how to do it easily), so this PR
does not do that. Perhaps it _should_; if we think this apporach isn't
principled enough, I'm happy to close it and postpone changing anything
here.
However, this does reduce the number of F821 errors Ruff emits on
typeshed down from 76 to 2, which would mean that we could enable the
rule at typeshed. The remaining 2 F821 errors can be trivially fixed at
typeshed by moving definitions around; forward references in class bases
were really the only remaining places where there was a real _use case_
for forward references in stub files that Ruff wasn't yet allowing.
## Test plan
`cargo test`. I also ran this PR branch on typeshed to check to see if
there were any new false positives caused by the changes here; there
were none.
## Summary
This PR adds a new semantic model flag to indicate that the checker is
inside an f-string replacement field. This will be used to ignore
certain checks if the target version doesn't support a specific feature
like PEP 701.
fixes: #10761
## Test Plan
Add a test case from the raised issue.
## Summary
In https://github.com/astral-sh/ruff/pull/10341, we fixed some false
positives in `.pyi` files, but introduced others. This PR effectively
reverts the change in #10341 and fixes it in a slightly different way.
Instead of changing the _bindings_ we generate in the semantic model in
`.pyi` files, we instead change how we _resolve_ them.
Closes https://github.com/astral-sh/ruff/issues/10509.
## Summary
Given `del X`, we'll typically add a `BindingKind::Deletion` to `X` to
shadow the current binding. However, if the deletion is inside of a
conditional operation, we _won't_, as in:
```python
def f():
global X
if X > 0:
del X
```
We will, however, track it as a reference to the binding. This PR adds
the expression context to those resolved references, so that we can
detect that the `X` in `global X` was "assigned to".
Closes https://github.com/astral-sh/ruff/issues/10397.
## Summary
When you try to remove an internal representation leaking into another
type and end up rewriting a simple version of `smallvec`.
The goal of this PR is to replace the `Box<[&'a str]>` with
`Box<QualifiedName>` to avoid that the internal `QualifiedName`
representation leaks (and it gives us a nicer API too). However, doing
this when `QualifiedName` uses `SmallVec` internally gives us all sort
of funny lifetime errors. I was lost but @BurntSushi came to rescue me.
He figured out that `smallvec` has a variance problem which is already
tracked in https://github.com/servo/rust-smallvec/issues/146
To fix the variants problem, I could use the smallvec-2-alpha-4 or
implement our own smallvec. I went with implementing our own small vec
for this specific problem. It obviously isn't as sophisticated as
smallvec (only uses safe code), e.g. it doesn't perform any size
optimizations, but it does its job.
Other changes:
* Removed `Imported::qualified_name` (the version that returns a
`String`). This can be replaced by calling `ToString` on the qualified
name.
* Renamed `Imported::call_path` to `qualified_name` and changed its
return type to `&QualifiedName`.
* Renamed `QualifiedName::imported` to `user_defined` which is the more
common term when talking about builtins vs the rest/user defined
functions.
## Test plan
`cargo test`
The expression types in our AST are called `ExprYield`, `ExprAwait`,
`ExprStringLiteral` etc, except `ExprNamedExpr`, `ExprIfExpr` and
`ExprGenratorExpr`. This seems to align with [Python AST's
naming](https://docs.python.org/3/library/ast.html) but feels
inconsistent and excessive.
This PR removes the `Expr` postfix from `ExprNamedExpr`, `ExprIfExpr`,
and `ExprGeneratorExpr`.
## Summary
Charlie can probably explain this better than I but it turns out,
`CallPath` is used for two different things:
* To represent unqualified names like `version` where `version` can be a
local variable or imported (e.g. `from sys import version` where the
full qualified name is `sys.version`)
* To represent resolved, full qualified names
This PR splits `CallPath` into two types to make this destinction clear.
> Note: I haven't renamed all `call_path` variables to `qualified_name`
or `unqualified_name`. I can do that if that's welcomed but I first want
to get feedback on the approach and naming overall.
## Test Plan
`cargo test`
## Summary
This PR changes the `CallPath` type alias to a newtype wrapper.
A newtype wrapper allows us to limit the API and to experiment with
alternative ways to implement matching on `CallPath`s.
## Test Plan
`cargo test`
## Summary
Allows, e.g.:
```python
import os
os.environ["WORLD_SIZE"] = "1"
os.putenv("CUDA_VISIBLE_DEVICES", "4")
import torch
```
For now, this is only allowed in preview.
Closes https://github.com/astral-sh/ruff/issues/10059
## Summary
This PR introduces a new semantic model flag `DOCSTRING` which suggests
that the model is currently in a module / class / function docstring.
This is the first step in eliminating the docstring detection state
machine which is prone to bugs as stated in #7595.
## Test Plan
~TODO: Is there a way to add a test case for this?~
I tested this using the following code snippet and adding a print
statement in the `string_like` analyzer to print if we're currently in a
docstring or not.
<details><summary>Test code snippet:</summary>
<p>
```python
"Docstring" ", still a docstring"
"Not a docstring"
def foo():
"Docstring"
"Not a docstring"
if foo:
"Not a docstring"
pass
class Foo:
"Docstring"
"Not a docstring"
foo: int
"Unofficial variable docstring"
def method():
"Docstring"
"Not a docstring"
pass
def bar():
"Not a docstring".strip()
def baz():
_something_else = 1
"""Not a docstring"""
```
</p>
</details>
## Summary
Implement [implicit readlines
(FURB129)](https://github.com/dosisod/refurb/blob/master/refurb/checks/iterable/implicit_readlines.py)
lint.
## Notes
I need a help/an opinion about suggested implementations.
This implementation differs from the original one from `refurb` in the
following way. This implementation checks syntactically the call of the
method with the name `readlines()` inside `for` {loop|generator
expression}. The implementation from refurb also
[checks](https://github.com/dosisod/refurb/blob/master/refurb/checks/iterable/implicit_readlines.py#L43)
that callee is a variable with a type `io.TextIOWrapper` or
`io.BufferedReader`.
- I do not see a simple way to implement the same logic.
- The best I can have is something like
```rust
checker.semantic().binding(checker.semantic().resolve_name(attr_expr.value.as_name_expr()?)?).statement(checker.semantic())
```
and analyze cases. But this will be not about types, but about guessing
the type by assignment (or with) expression.
- Also this logic has several false negatives, when the callee is not a
variable, but the result of function call (e.g. `open(...)`).
- On the other side, maybe it is good to lint this on other things,
where this suggestion is not safe, and push the developers to change
their interfaces to be less surprising, comparing with the standard
library.
- Anyway while the current implementation has false-positives (I
mentioned some of them in the test) I marked the fixes to be unsafe.
## Summary
I was surprised to learn that we treat `x` in `[_ for x in y]` as an
"assignment" binding kind, rather than a dedicated comprehension
variable.
## Summary
This PR renames the semantic model flag `MODULE_DOCSTRING` to
`MODULE_DOCSTRING_BOUNDARY`. The main reason is for readability and for
the new semantic model flag `DOCSTRING` which tracks that the model is
in a module / class / function docstring.
I got confused earlier with the name until I looked at the use case and
it seems that the `_BOUNDARY` prefix is more appropriate for the
use-case and is consistent with other flags.
## Summary
This is a simple idea to avoid unnecessary work in the linter,
especially for rules that run on all name and/or all attribute nodes.
Imagine a rule like the NumPy deprecation check. If the user never
imported `numpy`, we should be able to skip that rule entirely --
whereas today, we do a `resolve_call_path` check on _every_ name in the
file. It turns out that there's basically a finite set of modules that
we care about, so we now track imports on those modules as explicit
flags on the semantic model. In rules that can _only_ ever trigger if
those modules were imported, we add a dedicated and extremely cheap
check to the top of the rule.
We could consider generalizing this to all modules, but I would expect
that not to be much faster than `resolve_call_path`, which is just a
hash map lookup on `TextSize` anyway.
It would also be nice to make this declarative, such that rules could
declare the modules they care about, the analyzers could call the rules
as appropriate. But, I don't think such a design should block merging
this.
Implements SIM113 from #998
Added tests
Limitations
- No fix yet
- Only flag cases where index variable immediately precede `for` loop
@charliermarsh please review and let me know any improvements
---------
Co-authored-by: Charlie Marsh <charlie.r.marsh@gmail.com>
## Summary
This PR attempts to improve `builtin-attribute-shadowing` (`A003`), a
rule which has been repeatedly criticized, but _does_ have value (just
not in the current form).
Historically, this rule would flag cases like:
```python
class Class:
id: int
```
This led to an increasing number of exceptions and special-cases to the
rule over time to try and improve it's specificity (e.g., ignore
`TypedDict`, ignore `@override`).
The crux of the issue is that given the above, referencing `id` will
never resolve to `Class.id`, so the shadowing is actually fine. There's
one exception, however:
```python
class Class:
id: int
def do_thing() -> id:
pass
```
Here, `id` actually resolves to the `id` attribute on the class, not the
`id` builtin.
So this PR completely reworks the rule around this _much_ more targeted
case, which will almost always be a mistake: when you reference a class
member from within the class, and that member shadows a builtin.
Closes https://github.com/astral-sh/ruff/issues/6524.
Closes https://github.com/astral-sh/ruff/issues/7806.
## Summary
On `main`, we flag redefinitions in cases like:
```python
import os
x = 1
if x > 0:
import os
```
That is, we consider these to be in the "same branch", since they're not
in disjoint branches. This matches Flake8's behavior, but it seems to
lead to false positives.
## Summary
Historically, we encoded this list by extracting the `__all__`. I went
to update it, but... is there really any value in it? Seems easier to
just treat `typing_extensions` as an alias for `typing`.
Closes https://github.com/astral-sh/ruff/issues/9334.
## Summary
I always found it odd that we had to pass this in, since it's really
higher-level context for the error. The awkwardness is further evidenced
by the fact that we pass in fake values everywhere (even outside of
tests). The source path isn't actually used to display the error; it's
only accessed elsewhere to _re-display_ the error in certain cases. This
PR modifies to instead pass the path directly in those cases.
## Summary
Given:
```python
from somewhere import get_cfg
def lookup_cfg(cfg_description):
cfg = get_cfg(cfg_description)
if cfg is not None:
return cfg
raise AttributeError(f"No cfg found matching {cfg_description}")
```
We were analyzing the method from last-to-first statement. So we saw the
`raise`, then assumed the method _always_ raised. In reality, though, it
_might_ return. This PR improves the branch analysis to respect these
mixed cases.
Closes https://github.com/astral-sh/ruff/issues/9269.
Closes https://github.com/astral-sh/ruff/issues/9304.
## Summary
Adds a rule to detect unions that include `typing.NoReturn` or
`typing.Never`. In such cases, the use of the bottom type is redundant.
Closes https://github.com/astral-sh/ruff/issues/9113.
## Test Plan
`cargo test`
## Summary
This allows us to fix usages like:
```python
from pandas import DataFrame
def baz() -> DataFrame:
...
```
By quoting the `DataFrame` in `-> DataFrame`. Without quotes, moving
`from pandas import DataFrame` into an `if TYPE_CHECKING:` block will
fail at runtime, since Python tries to evaluate the annotation to add it
to the function's `__annotations__`.
Unfortunately, this does require us to split our "annotation kind" flags
into three categories, rather than two:
- `typing-only`: The annotation is only evaluated at type-checking-time.
- `runtime-evaluated`: Python will evaluate the annotation at runtime
(like above) -- but we're willing to quote it.
- `runtime-required`: Python will evaluate the annotation at runtime
(like above), and some library (like Pydantic) needs it to be available
at runtime, so we _can't_ quote it.
This functionality is gated behind a setting
(`flake8-type-checking.quote-annotations`).
Closes https://github.com/astral-sh/ruff/issues/5559.
## Summary
Adds `find_assigned_value` a function which gets the `&Expr` assigned to
a given `id` if one exists in the semantic model.
Open TODOs:
- [ ] Handle `binding.kind.is_unpacked_assignment()`: I am bit confused
by this one. The snippet from its documentation does not appear to be
counted as an unpacked assignment and the only ones I could find for
which that was true were invalid Python like:
```python
x, y = 1
```
- [ ] How to handle AugAssign. Can we combine statements like:
```python
(a, b) = [(1, 2, 3), (4,)]
a += (6, 7)
```
to get the full value for a? Code currently just returns `None` for
these assign types
- [ ] Multi target assigns
```python
m_c = (m_d, m_e) = (0, 0)
trio.sleep(m_c) # OK
trio.sleep(m_d) # TRIO115
trio.sleep(m_e) # TRIO115
```
## Test Plan
Used the function in two rules:
- `TRIO115`
- `PERF101`
Expanded both their fixtures for explicit multi target check
This PR allows `matplotlib.use` calls to intersperse imports without
triggering `E402`. This is a pragmatic choice as it's common to require
`matplotlib.use` calls prior to importing from within `matplotlib`
itself.
Closes https://github.com/astral-sh/ruff/issues/9091.
## Summary
It's common to interleave a `sys.path` modification between imports at
the top of a file. This is a frequent cause of `# noqa: E402` false
positives, as seen in the ecosystem checks. This PR modifies E402 to
omit such modifications when determining the "import boundary".
(We could consider linting against `sys.path` modifications, but that
should be a separate rule.)
Closes: https://github.com/astral-sh/ruff/issues/5557.
Rebase of #6365 authored by @davidszotten.
## Summary
This PR updates the AST structure for an f-string elements.
The main **motivation** behind this change is to have a dedicated node
for the string part of an f-string. Previously, the existing
`ExprStringLiteral` node was used for this purpose which isn't exactly
correct. The `ExprStringLiteral` node should include the quotes as well
in the range but the f-string literal element doesn't include the quote
as it's a specific part within an f-string. For example,
```python
f"foo {x}"
# ^^^^
# This is the literal part of an f-string
```
The introduction of `FStringElement` enum is helpful which represent
either the literal part or the expression part of an f-string.
### Rule Updates
This means that there'll be two nodes representing a string depending on
the context. One for a normal string literal while the other is a string
literal within an f-string. The AST checker is updated to accommodate
this change. The rules which work on string literal are updated to check
on the literal part of f-string as well.
#### Notes
1. The `Expr::is_literal_expr` method would check for
`ExprStringLiteral` and return true if so. But now that we don't
represent the literal part of an f-string using that node, this improves
the method's behavior and confines to the actual expression. We do have
the `FStringElement::is_literal` method.
2. We avoid checking if we're in a f-string context before adding to
`string_type_definitions` because the f-string literal is now a
dedicated node and not part of `Expr`.
3. Annotations cannot use f-string so we avoid changing any rules which
work on annotation and checks for `ExprStringLiteral`.
## Test Plan
- All references of `Expr::StringLiteral` were checked to see if any of
the rules require updating to account for the f-string literal element
node.
- New test cases are added for rules which check against the literal
part of an f-string.
- Check the ecosystem results and ensure it remains unchanged.
## Performance
There's a performance penalty in the parser. The reason for this remains
unknown as it seems that the generated assembly code is now different
for the `__reduce154` function. The reduce function body is just popping
the `ParenthesizedExpr` on top of the stack and pushing it with the new
location.
- The size of `FStringElement` enum is the same as `Expr` which is what
it replaces in `FString::format_spec`
- The size of `FStringExpressionElement` is the same as
`ExprFormattedValue` which is what it replaces
I tried reducing the `Expr` enum from 80 bytes to 72 bytes but it hardly
resulted in any performance gain. The difference can be seen here:
- Original profile: https://share.firefox.dev/3Taa7ES
- Profile after boxing some node fields:
https://share.firefox.dev/3GsNXpD
### Backtracking
I tried backtracking the changes to see if any of the isolated change
produced this regression. The problem here is that the overall change is
so small that there's only a single checkpoint where I can backtrack and
that checkpoint results in the same regression. This checkpoint is to
revert using `Expr` to the `FString::format_spec` field. After this
point, the change would revert back to the original implementation.
## Review process
The review process is similar to #7927. The first set of commits update
the node structure, parser, and related AST files. Then, further commits
update the linter and formatter part to account for the AST change.
---------
Co-authored-by: David Szotten <davidszotten@gmail.com>
This PR renames the semantic model flag `LITERAL` to `TYPING_LITERAL` to
better reflect its purpose. The main motivation behind this change is to
avoid any confusion with the "literal" terminology used in the AST for
literal nodes like string, bytes, numbers, etc.
Update to [Rust
1.74](https://blog.rust-lang.org/2023/11/16/Rust-1.74.0.html) and use
the new clippy lints table.
The update itself introduced a new clippy lint about superfluous hashes
in raw strings, which got removed.
I moved our lint config from `rustflags` to the newly stabilized
[workspace.lints](https://doc.rust-lang.org/stable/cargo/reference/workspaces.html#the-lints-table).
One consequence is that we have to `unsafe_code = "warn"` instead of
"forbid" because the latter now actually bans unsafe code:
```
error[E0453]: allow(unsafe_code) incompatible with previous forbid
--> crates/ruff_source_file/src/newlines.rs:62:17
|
62 | #[allow(unsafe_code)]
| ^^^^^^^^^^^ overruled by previous forbid
|
= note: `forbid` lint level was set on command line
```
---------
Co-authored-by: Charlie Marsh <charlie.r.marsh@gmail.com>
## Summary
We already support inserting imports for `I002` -- this PR just adds the
same fix for `FA102`, which is explicitly about `from __future__ import
annotations`.
Closes https://github.com/astral-sh/ruff/issues/8682.
## Summary
This PR adds (unsafe) fixes to the flake8-annotations rules that enforce
missing return types, offering to automatically insert type annotations
for functions with literal return values. The logic is smart enough to
generate simplified unions (e.g., `float` instead of `int | float`) and
deal with implicit returns (`return` without a value).
Closes https://github.com/astral-sh/ruff/issues/1640 (though we could
open a separate issue for referring parameter types).
Closes https://github.com/astral-sh/ruff/issues/8213.
## Test Plan
`cargo test`
## Summary
An assignment can be _both_ (e.g.) a loop variable _and_ assigned via
unpacking. In other words, unpacking is a quality of an assignment, not
a _kind_.
## Summary
This brings ruff's behavior in line with what `pep8-naming` already does
and thus closes#8397.
I had initially implemented this to look at the last segment of a dotted
path only when the entry in the `*-decorators` setting started with a
`.`, but in the end I thought it's better to remain consistent w/
`pep8-naming` and doing a match against the last segment of the
decorator name in any case.
If you prefer to diverge from this in favor of less ambiguity in the
configuration let me know and I'll change it so you would need to put
e.g. `.expression` in the `classmethod-decorators` list.
## Test Plan
Tested against the file in the issue linked below, plus the new testcase
added in this PR.
~Improves detection of types imported from `typing_extensions`. Removes
the hard-coded list of supported types in `typing_extensions`; instead
assuming all types could be imported from `typing`, `_typeshed`, or
`typing_extensions`.~
~The typing extensions package appears to re-export types even if they
do not need modification.~
Adds detection of `if typing_extensions.TYPE_CHECKING` blocks. Avoids
inserting a new `if TYPE_CHECKING` block and `from typing import
TYPE_CHECKING` if `typing_extensions.TYPE_CHECKING` is used (closes
https://github.com/astral-sh/ruff/issues/8427)
---------
Co-authored-by: Charlie Marsh <charlie.r.marsh@gmail.com>
## Summary
We have this pattern in a bunch of places, where we find the _only_
binding to a name (and return `None`) if it's bound multiple times. This
PR DRYs it up into a method on `SemanticModel`.
## Summary
This PR splits the `Constant` enum as individual literal nodes. It
introduces the following new nodes for each variant:
* `ExprStringLiteral`
* `ExprBytesLiteral`
* `ExprNumberLiteral`
* `ExprBooleanLiteral`
* `ExprNoneLiteral`
* `ExprEllipsisLiteral`
The main motivation behind this refactor is to introduce the new AST
node for implicit string concatenation in the coming PR. The elements of
that node will be either a string literal, bytes literal or a f-string
which can be implemented using an enum. This means that a string or
bytes literal cannot be represented by `Constant::Str` /
`Constant::Bytes` which creates an inconsistency.
This PR avoids that inconsistency by splitting the constant nodes into
it's own literal nodes, literal being the more appropriate naming
convention from a static analysis tool perspective.
This also makes working with literals in the linter and formatter much
more ergonomic like, for example, if one would want to check if this is
a string literal, it can be done easily using
`Expr::is_string_literal_expr` or matching against `Expr::StringLiteral`
as oppose to matching against the `ExprConstant` and enum `Constant`. A
few AST helper methods can be simplified as well which will be done in a
follow-up PR.
This introduces a new `Expr::is_literal_expr` method which is the same
as `Expr::is_constant_expr`. There are also intermediary changes related
to implicit string concatenation which are quiet less. This is done so
as to avoid having a huge PR which this already is.
## Test Plan
1. Verify and update all of the existing snapshots (parser, visitor)
2. Verify that the ecosystem check output remains **unchanged** for both
the linter and formatter
### Formatter ecosystem check
#### `main`
| project | similarity index | total files | changed files |
|----------------|------------------:|------------------:|------------------:|
| cpython | 0.75803 | 1799 | 1647 |
| django | 0.99983 | 2772 | 34 |
| home-assistant | 0.99953 | 10596 | 186 |
| poetry | 0.99891 | 317 | 17 |
| transformers | 0.99966 | 2657 | 330 |
| twine | 1.00000 | 33 | 0 |
| typeshed | 0.99978 | 3669 | 20 |
| warehouse | 0.99977 | 654 | 13 |
| zulip | 0.99970 | 1459 | 22 |
#### `dhruv/constant-to-literal`
| project | similarity index | total files | changed files |
|----------------|------------------:|------------------:|------------------:|
| cpython | 0.75803 | 1799 | 1647 |
| django | 0.99983 | 2772 | 34 |
| home-assistant | 0.99953 | 10596 | 186 |
| poetry | 0.99891 | 317 | 17 |
| transformers | 0.99966 | 2657 | 330 |
| twine | 1.00000 | 33 | 0 |
| typeshed | 0.99978 | 3669 | 20 |
| warehouse | 0.99977 | 654 | 13 |
| zulip | 0.99970 | 1459 | 22 |
<!--
Thank you for contributing to Ruff! To help us out with reviewing,
please consider the following:
- Does this pull request include a summary of the change? (See below.)
- Does this pull request include a descriptive title?
- Does this pull request include references to any relevant issues?
-->
## Summary
In https://github.com/astral-sh/ruff/pull/7968, I introduced a
regression whereby we started to treat imports used _only_ in type
annotation bounds (with `__future__` annotations) as unused.
The root of the issue is that I started using `visit_annotation` for
these bounds. So we'd queue up the bound in the list of deferred type
parameters, then when visiting, we'd further queue it up in the list of
deferred type annotations... Which we'd then never visit, since deferred
type annotations are visited _before_ deferred type parameters.
Anyway, the better solution here is to use a dedicated flag for these,
since they have slightly different behavior than type annotations.
I've also fixed what I _think_ is a bug whereby we previously failed to
resolve `Callable` in:
```python
type RecordCallback[R: Record] = Callable[[R], None]
from collections.abc import Callable
```
IIUC, the values in type aliases should be evaluated lazily, like type
parameters.
Closes https://github.com/astral-sh/ruff/issues/8017.
## Test Plan
`cargo test`
## Summary
Check that the sequence type is a list, set, dict, or tuple before
recommending replacing the `enumerate(...)` call with `range(len(...))`.
Document behaviour so users are aware of the type inference limitation
leading to false negatives.
Closes#7656.
## Summary
This wasn't necessary in the past, since we _only_ applied this rule to
bodies that contained two statements, one of which was a `pass`. Now
that it applies to any `pass` in a block with multiple statements, we
can run into situations in which we remove both passes, and so need to
apply the fixes in isolation.
See:
https://github.com/astral-sh/ruff/issues/7455#issuecomment-1741107573.
## Summary
This is a follow-up to #7469 that attempts to achieve similar gains, but
without introducing malachite. Instead, this PR removes the `BigInt`
type altogether, instead opting for a simple enum that allows us to
store small integers directly and only allocate for values greater than
`i64`:
```rust
/// A Python integer literal. Represents both small (fits in an `i64`) and large integers.
#[derive(Clone, PartialEq, Eq, Hash)]
pub struct Int(Number);
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum Number {
/// A "small" number that can be represented as an `i64`.
Small(i64),
/// A "large" number that cannot be represented as an `i64`.
Big(Box<str>),
}
impl std::fmt::Display for Number {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
match self {
Number::Small(value) => write!(f, "{value}"),
Number::Big(value) => write!(f, "{value}"),
}
}
}
```
We typically don't care about numbers greater than `isize` -- our only
uses are comparisons against small constants (like `1`, `2`, `3`, etc.),
so there's no real loss of information, except in one or two rules where
we're now a little more conservative (with the worst-case being that we
don't flag, e.g., an `itertools.pairwise` that uses an extremely large
value for the slice start constant). For simplicity, a few diagnostics
now show a dedicated message when they see integers that are out of the
supported range (e.g., `outdated-version-block`).
An additional benefit here is that we get to remove a few dependencies,
especially `num-bigint`.
## Test Plan
`cargo test`
## Summary
Implement
[`no-ignored-enumerate-items`](https://github.com/dosisod/refurb/blob/master/refurb/checks/builtin/no_ignored_enumerate.py)
as `unnecessary-enumerate` (`FURB148`).
The auto-fix considers if a `start` argument is passed to the
`enumerate()` function. If only the index is used, then the suggested
fix is to pass the `start` value to the `range()` function. So,
```python
for i, _ in enumerate(xs, 1):
...
```
becomes
```python
for i in range(1, len(xs)):
...
```
If the index is ignored and only the value is ignored, and if a start
value greater than zero is passed to `enumerate()`, the rule doesn't
produce a suggestion. I couldn't find a unanimously accepted best way to
iterate over a collection whilst skipping the first n elements. The rule
still triggers, however.
Related to #1348.
## Test Plan
`cargo test`
---------
Co-authored-by: Dhruv Manilawala <dhruvmanila@gmail.com>
Co-authored-by: Charlie Marsh <charlie.r.marsh@gmail.com>
## Summary
The motivation here is that this enables us to implement `Ranged` in
crates that don't depend on `ruff_python_ast`.
Largely a mechanical refactor with a lot of regex, Clippy help, and
manual fixups.
## Test Plan
`cargo test`
## Summary
The range of the usage from `Globals` should be the range of the
identifier, not the range of the full `global pandas` statement.
Closes https://github.com/astral-sh/ruff/issues/6914.
## Summary
This PR is a follow-up to the suggestion in
https://github.com/astral-sh/ruff/pull/6345#discussion_r1285470953 to
use a single stack to store all statements and expressions, rather than
using separate vectors for each, which gives us something closer to a
full-fidelity chain. (We can then generalize this concept to include all
other AST nodes too.)
This is in part made possible by the removal of the hash map from
`&Stmt` to `StatementId` (#6694), which makes it much cheaper to store
these using a single interface (since doing so no longer introduces the
requirement that we hash all expressions).
I'll follow-up with some profiling, but a few notes on how the data
requirements have changed:
- We now store a `BranchId` for every expression, not just every
statement, so that's an extra `u32`.
- We now store a single `NodeId` on every snapshot, rather than separate
`StatementId` and `ExpressionId` IDs, so that's one fewer `u32` for each
snapshot.
- We're probably doing a few more lookups in general, since any calls to
`current_statement()` etc. now have to iterate up the node hierarchy
until they identify the first statement.
## Test Plan
`cargo test`
This _probably_ never matters given the set of rules we support and in
fact I'm having trouble thinking of a test-case for it, but it's
definitely incorrect _not_ to pass on the `BranchId` here.
## Summary
We have a few rules that rely on detecting whether two statements are in
different branches -- for example, different arms of an `if`-`else`.
Historically, the way this was implemented is that, given two statement
IDs, we'd find the common parent (by traversing upwards via our
`Statements` abstraction); then identify branches "manually" by matching
the parents against `try`, `if`, and `match`, and returning iterators
over the arms; then check if there's an arm for which one of the
statements is a child, and the other is not.
This has a few drawbacks:
1. First, the code is generally a bit hard to follow (Konsti mentioned
this too when working on the `ElifElseClause` refactor).
2. Second, this is the only place in the codebase where we need to go
from `&Stmt` to `StatementID` -- _everywhere_ else, we only need to go
in the _other_ direction. Supporting these lookups means we need to
maintain a mapping from `&Stmt` to `StatementID` that includes every
`&Stmt` in the program. (We _also_ end up maintaining a `depth` level
for every statement.) I'd like to get rid of these requirements to
improve efficiency, reduce complexity, and enable us to treat AST modes
more generically in the future. (When I looked at adding the `&Expr` to
our existing statement-tracking infrastructure, maintaining a hash map
with all the statements noticeably hurt performance.)
The solution implemented here instead makes branches a first-class
concept in the semantic model. Like with `Statements`, we now have a
`Branches` abstraction, where each branch points to its optional parent.
When we store statements, we store the `BranchID` alongside each
statement. When we need to detect whether two statements are in the same
branch, we just realize each statement's branch path and compare the
two. (Assuming that the two statements are in the same scope, then
they're on the same branch IFF one branch path is a subset of the other,
starting from the top.) We then add some calls to the visitor to push
and pop branches in the appropriate places, for `if`, `try`, and `match`
statements.
Note that a branch is not 1:1 with a statement; instead, each branch is
closer to a suite, but not _every_ suite is a branch. For example, each
arm in an `if`-`elif`-`else` is a branch, but the `else` in a `for` loop
is not considered a branch.
In addition to being much simpler, this should also be more efficient,
since we've shed the entire `&Stmt` hash map, plus the `depth` that we
track on `StatementWithParent` in favor of a single `Option<BranchID>`
on `StatementWithParent` plus a single vector for all branches. The
lookups should be faster too, since instead of doing a bunch of jumps
around with the hash map + repeated recursive calls to find the common
parents, we instead just do a few simple lookups in the `Branches`
vector to realize and compare the branch paths.
## Test Plan
`cargo test` -- we have a lot of coverage for this, which we inherited
from PyFlakes
## Summary
I noticed some inconsistencies around uses of `.range.start()`, structs
that have a `TextRange` field but don't implement `Ranged`, etc.
## Test Plan
`cargo test`
## Summary
Instead, we set an `is_star` flag on `Stmt::Try`. This is similar to the
pattern we've migrated towards for `Stmt::For` (removing
`Stmt::AsyncFor`) and friends. While these are significant differences
for an interpreter, we tend to handle these cases identically or nearly
identically.
## Test Plan
`cargo test`
## Summary
This method is almost never what you actually want, because it doesn't
respect Python's scoping semantics. For example, if you call this within
a class method, it will return class attributes, whereas Python actually
_skips_ symbols in classes unless the load occurs within the class
itself. I also want to move away from these kinds of dynamic lookups and
more towards `resolve_name`, which performs a lookup based on the stored
`BindingId` at the time of symbol resolution, and will make it much
easier for us to separate model building from linting in the near
future.
## Test Plan
`cargo test`
## Summary
Fixes some TODOs introduced in
https://github.com/astral-sh/ruff/pull/6538. In short, given an
expression like `1 if x > 0 else "Hello, world!"`, we now return a union
type that says the expression can resolve to either an `int` or a `str`.
The system remains very limited, it only works for obvious primitive
types, and there's no attempt to do inference on any more complex
variables. (If any expression yields `Unknown` or `TypeError`, we
propagate that result throughout and abort on the client's end.)
## Summary
Our `is_builtin` check did a naive walk over the parent scopes; instead,
it needs to (e.g.) skip symbols in a class scope if being called outside
of the class scope itself.
Closes https://github.com/astral-sh/ruff/issues/6466.
## Test Plan
`cargo test`
## Summary
We have some logic in the expression analyzer method to avoid
re-checking the inner `Union` in `Union[Union[...]]`, since the methods
that analyze `Union` expressions already recurse. Elsewhere, we have
logic to avoid re-checking the inner `|` in `int | (int | str)`, for the
same reason.
This PR unifies that logic into a single method _and_ ensures that, just
as we recurse over both `Union` and `|`, we also detect that we're in
_either_ kind of nested union.
Closes https://github.com/astral-sh/ruff/issues/6285.
## Test Plan
Added some new snapshots.
## Summary
This PR leverages the unified function definition node to add precise
AST node types to `MemberKind`, which is used to power our docstring
definition tracking (e.g., classes and functions, whether they're
methods or functions or nested functions and so on, whether they have a
docstring, etc.). It was painful to do this in the past because the
function variants needed to support a union anyway, but storing precise
nodes removes like a dozen panics.
No behavior changes -- purely a refactor.
## Test Plan
`cargo test`
## Summary
Per the suggestion in
https://github.com/astral-sh/ruff/discussions/6183, this PR removes
`AsyncWith`, `AsyncFor`, and `AsyncFunctionDef`, replacing them with an
`is_async` field on the non-async variants of those structs. Unlike an
interpreter, we _generally_ have identical handling for these nodes, so
separating them into distinct variants adds complexity from which we
don't really benefit. This can be seen below, where we get to remove a
_ton_ of code related to adding generic `Any*` wrappers, and a ton of
duplicate branches for these cases.
## Test Plan
`cargo test` is unchanged, apart from parser snapshots.
## Summary
See discussion in
https://github.com/astral-sh/ruff/pull/6351#discussion_r1284996979. We
can remove `RefEquality` entirely and instead use a text offset for
statement keys, since no two statements can start at the same text
offset.
## Test Plan
`cargo test`
## Summary
This PR fixes the performance degradation introduced in
https://github.com/astral-sh/ruff/pull/6345. Instead of using the
generic `Nodes` structs, we now use separate `Statement` and
`Expression` structs. Importantly, we can avoid tracking a bunch of
state for expressions that we need for parents: we don't need to track
reference-to-ID pointers (we just have no use-case for this -- I'd
actually like to remove this from statements too, but we need it for
branch detection right now), we don't need to track depth, etc.
In my testing, this entirely removes the regression on all-rules, and
gets us down to 2ms slower on the default rules (as a crude hyperfine
benchmark, so this is within margin of error IMO).
No behavioral changes.
## Summary
This PR attempts to draw a clearer divide between "methods that take
(e.g.) an expression or statement as input" and "methods that rely on
the _current_ expression or statement" in the semantic model, by
renaming methods like `stmt()` to `current_statement()`.
This had led to confusion in the past. For example, prior to this PR, we
had `scope()` (which returns the current scope), and `parent_scope`,
which returns the parent _of a scope that's passed in_. Now, the API is
clearer: `current_scope` returns the current scope, and `parent_scope`
takes a scope as argument and returns its parent.
Per above, I also changed `stmt` to `statement` and `expr` to
`expression`.
## Summary
When we iterate over the AST for analysis, we often process nodes in a
"deferred" manner. For example, if we're analyzing a function, we push
the function body onto a deferred stack, along with a snapshot of the
current semantic model state. Later, when we analyze the body, we
restore the semantic model state from the snapshot. This ensures that we
know the correct scope, hierarchy of statement parents, etc., when we go
to analyze the function body.
Historically, we _haven't_ included the _expression_ hierarchy in the
model snapshot -- so we track the current expression parents in the
visitor, but we never save and restore them when processing deferred
nodes. This can lead to subtle bugs, in that methods like
`expr_parent()` aren't guaranteed to be correct, if you're in a deferred
visitor.
This PR migrates expression tracking to mirror statement tracking
exactly. So we push all expressions onto an `IndexVec`, and include the
current expression on the snapshot. This ensures that `expr_parent()`
and related methods are "always correct" rather than "sometimes
correct".
There's a performance cost here, both at runtime and in terms of memory
consumption (we now store an additional pointer for every expression).
In my hyperfine testing, it's about a 1% performance decrease for
all-rules on CPython (up to 533.8ms, from 528.3ms) and a 4% performance
decrease for default-rules on CPython (up to 212ms, from 204ms).
However... I think this is worth it given the incorrectness of our
current approach. In the future, we may want to reconsider how we do
these upward traversals (e.g., with something like a red-green tree).
(**Note**: in https://github.com/astral-sh/ruff/pull/6351, the slowdown
seems to be entirely removed.)
## Summary
Historically, we've stored "qualified names" on our
`BindingKind::Import`, `BindingKind::SubmoduleImport`, and
`BindingKind::ImportFrom` structs. In Ruff, a "qualified name" is a
dot-separated path to a symbol. For example, given `import foo.bar`, the
"qualified name" would be `"foo.bar"`; and given `from foo.bar import
baz`, the "qualified name" would be `foo.bar.baz`.
This PR modifies the `BindingKind` structs to instead store _call paths_
rather than qualified names. So in the examples above, we'd store
`["foo", "bar"]` and `["foo", "bar", "baz"]`. It turns out that this
more efficient given our data access patterns. Namely, we frequently
need to convert the qualified name to a call path (whenever we call
`resolve_call_path`), and it turns out that we do this operation enough
that those conversations show up on benchmarks.
There are a few other advantages to using call paths, rather than
qualified names:
1. The size of `BindingKind` is reduced from 32 to 24 bytes, since we no
longer need to store a `String` (only a boxed slice).
2. All three import types are more consistent, since they now all store
a boxed slice, rather than some storing an `&str` and some storing a
`String` (for `BindingKind::ImportFrom`, we needed to allocate a
`String` to create the qualified name, but the call path is a slice of
static elements that don't require that allocation).
3. A lot of code gets simpler, in part because we now do call path
resolution "earlier". Most notably, for relative imports (`from .foo
import bar`), we store the _resolved_ call path rather than the relative
call path, so the semantic model doesn't have to deal with that
resolution. (See that `resolve_call_path` is simpler, fewer branches,
etc.)
In my testing, this change improves the all-rules benchmark by another
4-5% on top of the improvements mentioned in #6047.
## Summary
This PR removes a now-unnecessary abstraction from `helper.rs`
(`CallArguments`), in favor of adding methods to `Arguments` directly,
which helps with discoverability.
## Summary
This PR adds a new `Arguments` AST node, which we can use for function
calls and class definitions.
The `Arguments` node spans from the left (open) to right (close)
parentheses inclusive.
In the case of classes, the `Arguments` is an option, to differentiate
between:
```python
# None
class C: ...
# Some, with empty vectors
class C(): ...
```
In this PR, we don't really leverage this change (except that a few
rules get much simpler, since we don't need to lex to find the start and
end ranges of the parentheses, e.g.,
`crates/ruff/src/rules/pyupgrade/rules/lru_cache_without_parameters.rs`,
`crates/ruff/src/rules/pyupgrade/rules/unnecessary_class_parentheses.rs`).
In future PRs, this will be especially helpful for the formatter, since
we can track comments enclosed on the node itself.
## Test Plan
`cargo test`
## Summary
We have some code to ensure that if an aliased import is used, any
submodules should be marked as used too. This comment says it best:
```rust
// If the name of a submodule import is the same as an alias of another import, and the
// alias is used, then the submodule import should be marked as used too.
//
// For example, mark `pyarrow.csv` as used in:
//
// ```python
// import pyarrow as pa
// import pyarrow.csv
// print(pa.csv.read_csv("test.csv"))
// ```
```
However, it looks like when we go to look up `pyarrow` (of `import
pyarrow as pa`), we aren't checking to ensure the resolved binding is
_actually_ an import. This was causing us to attribute `print(rm.ANY)`
to `def requests_mock` here:
```python
import requests_mock as rm
def requests_mock(requests_mock: rm.Mocker):
print(rm.ANY)
```
Closes https://github.com/astral-sh/ruff/issues/6180.
Requires https://github.com/astral-sh/RustPython-Parser/pull/42
Related https://github.com/PyCQA/pyflakes/pull/778
[PEP-695](https://peps.python.org/pep-0695)
Part of #5062
<!--
Thank you for contributing to Ruff! To help us out with reviewing,
please consider the following:
- Does this pull request include a summary of the change? (See below.)
- Does this pull request include a descriptive title?
- Does this pull request include references to any relevant issues?
-->
## Summary
<!-- What's the purpose of the change? What does it do, and why? -->
Adds a scope for type parameters, a type parameter binding kind, and
checker visitation of type parameters in type alias statements, function
definitions, and class definitions.
A few changes were necessary to ensure correctness following the
insertion of a new scope between function and class scopes and their
parent.
## Test Plan
<!-- How was it tested? -->
Undefined name snapshots.
Unused type parameter rule will be added as follow-up.
## Summary
This PR stores the mapping from `ExprName` node to resolved `BindingId`,
which lets us skip scope lookups in `resolve_call_path`. It's enabled by
#6045, since that PR ensures that when we analyze a node (and thus call
`resolve_call_path`), we'll have already visited its `ExprName`
elements.
In more detail: imagine that we're traversing over `foo.bar()`. When we
read `foo`, it will be an `ExprName`, which we'll then resolve to a
binding via `handle_node_load`. With this change, we then store that
binding in a map. Later, if we call `collect_call_path` on `foo.bar`,
we'll identify `foo` (the "head" of the attribute) and grab the resolved
binding in that map. _Almost_ all names are now resolved in advance,
though it's not a strict requirement, and some rules break that pattern
(e.g., if we're analyzing arguments, and they need to inspect their
annotations, which are visited in a deferred manner).
This improves performance by 4-6% on the all-rules benchmark. It looks
like it hurts performance (1-2% drop) in the default-rules benchmark,
presumedly because those rules don't call `resolve_call_path` nearly as
much, and so we're paying for these extra writes.
Here's the benchmark data:
```
linter/default-rules/numpy/globals.py
time: [67.270 µs 67.380 µs 67.489 µs]
thrpt: [43.720 MiB/s 43.792 MiB/s 43.863 MiB/s]
change:
time: [+0.4747% +0.7752% +1.0626%] (p = 0.00 < 0.05)
thrpt: [-1.0514% -0.7693% -0.4724%]
Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high severe
linter/default-rules/pydantic/types.py
time: [1.4067 ms 1.4105 ms 1.4146 ms]
thrpt: [18.028 MiB/s 18.081 MiB/s 18.129 MiB/s]
change:
time: [+1.3152% +1.6953% +2.0414%] (p = 0.00 < 0.05)
thrpt: [-2.0006% -1.6671% -1.2981%]
Performance has regressed.
linter/default-rules/numpy/ctypeslib.py
time: [637.67 µs 638.96 µs 640.28 µs]
thrpt: [26.006 MiB/s 26.060 MiB/s 26.113 MiB/s]
change:
time: [+1.5859% +1.8109% +2.0353%] (p = 0.00 < 0.05)
thrpt: [-1.9947% -1.7787% -1.5611%]
Performance has regressed.
linter/default-rules/large/dataset.py
time: [3.2289 ms 3.2336 ms 3.2383 ms]
thrpt: [12.563 MiB/s 12.581 MiB/s 12.599 MiB/s]
change:
time: [+0.8029% +0.9898% +1.1740%] (p = 0.00 < 0.05)
thrpt: [-1.1604% -0.9801% -0.7965%]
Change within noise threshold.
linter/all-rules/numpy/globals.py
time: [134.05 µs 134.15 µs 134.26 µs]
thrpt: [21.977 MiB/s 21.995 MiB/s 22.012 MiB/s]
change:
time: [-4.4571% -4.1175% -3.8268%] (p = 0.00 < 0.05)
thrpt: [+3.9791% +4.2943% +4.6651%]
Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
2 (2.00%) low mild
3 (3.00%) high mild
3 (3.00%) high severe
linter/all-rules/pydantic/types.py
time: [2.5627 ms 2.5669 ms 2.5720 ms]
thrpt: [9.9158 MiB/s 9.9354 MiB/s 9.9516 MiB/s]
change:
time: [-5.8304% -5.6374% -5.4452%] (p = 0.00 < 0.05)
thrpt: [+5.7587% +5.9742% +6.1914%]
Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
6 (6.00%) high mild
1 (1.00%) high severe
linter/all-rules/numpy/ctypeslib.py
time: [1.3949 ms 1.3956 ms 1.3964 ms]
thrpt: [11.925 MiB/s 11.931 MiB/s 11.937 MiB/s]
change:
time: [-6.2496% -6.0856% -5.9293%] (p = 0.00 < 0.05)
thrpt: [+6.3030% +6.4799% +6.6662%]
Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
3 (3.00%) high mild
4 (4.00%) high severe
linter/all-rules/large/dataset.py
time: [5.5951 ms 5.6019 ms 5.6093 ms]
thrpt: [7.2527 MiB/s 7.2623 MiB/s 7.2711 MiB/s]
change:
time: [-5.1781% -4.9783% -4.8070%] (p = 0.00 < 0.05)
thrpt: [+5.0497% +5.2391% +5.4608%]
Performance has improved.
```
Still playing with this (the concepts need better names, documentation,
etc.), but opening up for feedback.
## Summary
Check for unused private `TypeVar`. See [original
implementation](2a86db8271/pyi.py (L1958)).
```
$ flake8 --select Y018 crates/ruff/resources/test/fixtures/flake8_pyi/PYI018.pyi
crates/ruff/resources/test/fixtures/flake8_pyi/PYI018.pyi:4:1: Y018 TypeVar "_T" is not used
crates/ruff/resources/test/fixtures/flake8_pyi/PYI018.pyi:5:1: Y018 TypeVar "_P" is not used
```
```
$ ./target/debug/ruff --select PYI018 crates/ruff/resources/test/fixtures/flake8_pyi/PYI018.pyi --no-cache
crates/ruff/resources/test/fixtures/flake8_pyi/PYI018.pyi:4:1: PYI018 TypeVar `_T` is never used
crates/ruff/resources/test/fixtures/flake8_pyi/PYI018.pyi:5:1: PYI018 TypeVar `_P` is never used
Found 2 errors.
```
In the file `unused_private_type_declaration.rs`, I'm planning to add
other rules that are similar to `PYI018` like the `PYI046`, `PYI047` and
`PYI049`.
ref #848
## Test Plan
Snapshots and manual runs of flake8.
## Summary
This PR adds a `logger-objects` setting that allows users to mark
specific symbols a `logging.Logger` objects. Currently, if a `logger` is
imported, we only flagged it as a `logging.Logger` if it comes exactly
from the `logging` module or is `flask.current_app.logger`.
This PR allows users to mark specific loggers, like
`logging_setup.logger`, to ensure that they're covered by the
`flake8-logging-format` rules and others.
For example, if you have a module `logging_setup.py` with the following
contents:
```python
import logging
logger = logging.getLogger(__name__)
```
Adding `"logging_setup.logger"` to `logger-objects` will ensure that
`logging_setup.logger` is treated as a `logging.Logger` object when
imported from other modules (e.g., `from logging_setup import logger`).
Closes https://github.com/astral-sh/ruff/issues/5694.
## Summary
This is equivalent for a single flag, but I think it's more likely to be
correct when the bitflags are modified -- the primary reason being that
we sometimes define flags as the union of other flags, e.g.:
```rust
const ANNOTATION = Self::TYPING_ONLY_ANNOTATION.bits() | Self::RUNTIME_ANNOTATION.bits();
```
In this case, `flags.contains(Flag::ANNOTATION)` requires that _both_
flags in the union are set, whereas `flags.intersects(Flag::ANNOTATION)`
requires that _at least one_ flag is set.
## Summary
As part of my continued quest to separate semantic model-building from
diagnostic emission, this PR moves our unresolved-reference rules to a
deferred pass. So, rather than emitting diagnostics as we encounter
unresolved references, we now track those unresolved references on the
semantic model (just like resolved references), and after traversal,
emit the relevant rules for any unresolved references.
## Summary
This PR moves two rules (`invalid-all-format` and `invalid-all-object`)
out of the name-binding phase, and into the dedicated pass over all
bindings that occurs at the end of the `Checker`. This is part of my
continued quest to separate the semantic model-building logic from the
actual rule enforcement.
## Summary
The vector of names here is immutable -- we never push to it after
initialization. Boxing reduces the size of the variant from 32 bytes to
24 bytes. (See:
https://nnethercote.github.io/perf-book/type-sizes.html#boxed-slices.)
It doesn't make a difference here, since it's not the largest variant,
but it still seems like a prudent change (and I was considering adding
another field to this variant, though I may no longer do so).
## Summary
No behavior change, but this is in theory more efficient, since we can
just iterate over the flat `Binding` vector rather than having to
iterate over binding chains via the `Scope`.
## Summary
This PR moves the "unused exception" rule out of the visitor and into a
deferred check. When we can base rules solely on the semantic model, we
probably should, as it greatly simplifies the `Checker` itself.
## Summary
The `SemanticModel` currently stores the "body" of a given `Suite`,
along with the current statement index. This is used to support "next
sibling" queries, but we only use this in exactly one place -- the rule
that simplifies constructs like this to `any` or `all`:
```python
for x in y:
if x == 0:
return True
return False
```
Instead of tracking the state, we can just do a (slightly more
expensive) traversal, by finding the node within its parent and
returning the next node in the body.
Note that we'll only have to do this extremely rarely -- namely, for
functions that contain something like:
```python
for x in y:
if x == 0:
return True
```
## Summary
Previously, `StmtIf` was defined recursively as
```rust
pub struct StmtIf {
pub range: TextRange,
pub test: Box<Expr>,
pub body: Vec<Stmt>,
pub orelse: Vec<Stmt>,
}
```
Every `elif` was represented as an `orelse` with a single `StmtIf`. This
means that this representation couldn't differentiate between
```python
if cond1:
x = 1
else:
if cond2:
x = 2
```
and
```python
if cond1:
x = 1
elif cond2:
x = 2
```
It also makes many checks harder than they need to be because we have to
recurse just to iterate over an entire if-elif-else and because we're
lacking nodes and ranges on the `elif` and `else` branches.
We change the representation to a flat
```rust
pub struct StmtIf {
pub range: TextRange,
pub test: Box<Expr>,
pub body: Vec<Stmt>,
pub elif_else_clauses: Vec<ElifElseClause>,
}
pub struct ElifElseClause {
pub range: TextRange,
pub test: Option<Expr>,
pub body: Vec<Stmt>,
}
```
where `test: Some(_)` represents an `elif` and `test: None` an else.
This representation is different tradeoff, e.g. we need to allocate the
`Vec<ElifElseClause>`, the `elif`s are now different than the `if`s
(which matters in rules where want to check both `if`s and `elif`s) and
the type system doesn't guarantee that the `test: None` else is actually
last. We're also now a bit more inconsistent since all other `else`,
those from `for`, `while` and `try`, still don't have nodes. With the
new representation some things became easier, e.g. finding the `elif`
token (we can use the start of the `ElifElseClause`) and formatting
comments for if-elif-else (no more dangling comments splitting, we only
have to insert the dangling comment after the colon manually and set
`leading_alternate_branch_comments`, everything else is taken of by
having nodes for each branch and the usual placement.rs fixups).
## Merge Plan
This PR requires coordination between the parser repo and the main ruff
repo. I've split the ruff part, into two stacked PRs which have to be
merged together (only the second one fixes all tests), the first for the
formatter to be reviewed by @michareiser and the second for the linter
to be reviewed by @charliermarsh.
* MH: Review and merge
https://github.com/astral-sh/RustPython-Parser/pull/20
* MH: Review and merge or move later in stack
https://github.com/astral-sh/RustPython-Parser/pull/21
* MH: Review and approve
https://github.com/astral-sh/RustPython-Parser/pull/22
* MH: Review and approve formatter PR
https://github.com/astral-sh/ruff/pull/5459
* CM: Review and approve linter PR
https://github.com/astral-sh/ruff/pull/5460
* Merge linter PR in formatter PR, fix ecosystem checks (ecosystem
checks can't run on the formatter PR and won't run on the linter PR, so
we need to merge them first)
* Merge https://github.com/astral-sh/RustPython-Parser/pull/22
* Create tag in the parser, update linter+formatter PR
* Merge linter+formatter PR https://github.com/astral-sh/ruff/pull/5459
---------
Co-authored-by: Micha Reiser <micha@reiser.io>
## Summary
The AST pass is broken up into three phases: pre-visit (which includes
analysis), recurse (visit all members), and post-visit (clean-up). We're
not supposed to edit semantic model flags in the pre-visit phase, but it
looks like we were for literal detection. This didn't matter in
practice, but I'm looking into some AST refactors for which this _does_
cause issues.
No behavior changes expected.
## Test Plan
Good test coverage on these.
## Summary
Python doesn't allow `"Foo" | None` if the annotation will be evaluated
at runtime (see the comments in the PR, or the semantic model
documentation for more on what this means and when it is true), but it
_does_ allow it if the annotation is typing-only.
This, for example, is invalid, as Python will evaluate `"Foo" | None` at
runtime in order to
populate the function's `__annotations__`:
```python
def f(x: "Foo" | None): ...
```
This, however, is valid:
```python
def f():
x: "Foo" | None
```
As is this:
```python
from __future__ import annotations
def f(x: "Foo" | None): ...
```
Closes#5706.
## Summary
In Python, the annotations on `x` and `y` here have very different
treatment:
```python
def foo(x: int):
y: int
```
The `int` in `x: int` is a runtime-required annotation, because `x` gets
added to the function's `__annotations__`. You'll notice, for example,
that this fails:
```python
from typing import TYPE_CHECKING
if TYPE_CHECKING:
from foo import Bar
def f(x: Bar):
...
```
Because `Bar` is required to be available at runtime, not just at typing
time. Meanwhile, this succeeds:
```python
from typing import TYPE_CHECKING
if TYPE_CHECKING:
from foo import Bar
def f():
x: Bar = 1
f()
```
(Both cases are fine if you use `from __future__ import annotations`.)
Historically, we've tracked those annotations that are _not_
runtime-required via the semantic model's `ANNOTATION` flag. But
annotations that _are_ runtime-required have been treated as "type
definitions" that aren't annotations.
This causes problems for the flake8-future-annotations rules, which try
to detect whether adding `from __future__ import annotations` would
_allow_ you to rewrite a type annotation. We need to know whether we're
in _any_ type annotation, runtime-required or not, since adding `from
__future__ import annotations` will convert any runtime-required
annotation to a typing-only annotation.
This PR adds separate state to track these runtime-required annotations.
The changes in the test fixtures are correct -- these were false
negatives before.
Closes https://github.com/astral-sh/ruff/issues/5574.
## Summary
This PR enables us to resolve attribute accesses within files, at least
for static and class methods. For example, we can now detect that this
is a function access (and avoid a false-positive):
```python
class Class:
@staticmethod
def error():
return ValueError("Something")
# OK
raise Class.error()
```
Closes#5487.
Closes#5416.
Support for `let…else` formatting was just merged to nightly
(rust-lang/rust#113225). Rerun `cargo fmt` with Rust nightly 2023-07-02
to pick this up. Followup to #939.
Signed-off-by: Anders Kaseorg <andersk@mit.edu>
## Summary
This PR extracts a bunch of complex logic from `add_binding`, instead
running the the shadowing rules in the deferred handler, thereby
decoupling the binding phase (during which we build up the semantic
model) from the analysis phase, and generally making `add_binding` much
more focused.
This was made possible by improving the semantic model to better handle
deletions -- previously, we'd "lose track" of bindings if they were
deleted, which made this kind of refactor impossible.
## Test Plan
We have good automated coverage for this, but I want to benchmark it
separately.