The docs were out of date, and the new version incorporates some
feedback.
I tried to keep the language concise and the information ordered by how
early you need it, so people can get the relevant information quickly
before jumping into the code.
I did some minor format_dev changes for consistency in the docs.
---------
Co-authored-by: Micha Reiser <micha@reiser.io>
Closes https://github.com/astral-sh/ruff/issues/6788 by special casing
integer literals with attribute access — either retaining parenthesis
for literals with values (e.g. `int(7).denominator` to
`(7).denominator)` or leaving calls without values (e.g.
`int().denominator`) unchanged.
## Summary
Another drive-by change to remove unnecessary custom lexing. We just
need to know the parenthesized range, so we can use...
`parenthesized_range`. I've also updated `parenthesized_range` to
support nested parentheses.
## Test Plan
`cargo test`
## Summary
This is effectively #6608, but with additional tests.
We aren't properly handling parenthesized patterns, but that needs to be
dealt with separately as it's somewhat involved.
Closes#6555
## Summary
Follows up on
https://github.com/astral-sh/ruff/pull/6652#discussion_r1300871033 with
some modifications to the `PatternMatchAs` comment handling.
Specifically, any comments between the `as` and the end are now
formatted as dangling, and we now insert some newlines in the
appropriate places.
## Test Plan
`cargo test`
## Summary
Ensures that we retain the open-parenthesis comment in cases like:
```python
match pattern_comments:
case ( # leading
only_leading
):
...
```
Previously, this was treated as a leading comment on `only_leading`.
## Test Plan
`cargo test`
## Summary
This PR fixes the bug where the decorator parentheses weren't being considered
when computing the autofix for `ANN204`. The existing logic would only look
for balanced parentheses and not multiple pairs of parentheses.
The solution is to remove the logic to generate the autofix and use the
`Parameters` end range directly which includes the parentheses as well.
## Test Plan
Add test case for `ANN204` with decorator being called
fixes: #6790
## Summary
Given:
```python
def end_of_file():
if False:
return 1
x = 2 \
```
Then when searching for the end of the `x = 2` statement, we'd reach a
panic as we'd hit the last line (`\\`) and abort, since the universal
iterator doesn't return trailing newlines. Instead, we should just use
the end of the file as the fallback.
Closes https://github.com/astral-sh/ruff/issues/6787.
## Test Plan
`cargo test`
## Summary
Avoid `C417` for `lambda` with default and variadic parameters.
## Test Plan
`cargo test` and checking if it generates any autofix errors as test
cases
for `lambda` with default parameters already exists.
fixes: #6715
## Summary
This PR updates the lexer tests to use the snapshot testing framework.
It also
makes the following changes:
* Remove the use of macros in the lexer tests
* Use `test_case` for EOL tests
## Test Plan
```
cargo test --package ruff_python_parser --lib --all-features -- lexer::tests --no-capture
```
<!--
Thank you for contributing to Ruff! To help us out with reviewing,
please consider the following:
- Does this pull request include a summary of the change? (See below.)
- Does this pull request include a descriptive title?
- Does this pull request include references to any relevant issues?
-->
## Summary
This PR adds a utility for transforming expressions via LibCST that
automatically wraps the expression in parentheses, applies a
user-provided transformation, then strips the parentheses from the
generated code. LibCST can't parse arbitrary expression ranges, since
some expressions may require parenthesization in order to be parsed
properly. For example:
```python
option = (
'{name}={value}'
.format(nam=name, value=value)
)
```
In this case, the expression range is:
```python
'{name}={value}'
.format(nam=name, value=value)
```
Which isn't valid on its own. So, instead, we add "fake" parentheses
around the expression.
We were already doing this in a few places, so this is mostly
formalizing and DRYing up that pattern.
Closes https://github.com/astral-sh/ruff/issues/6720.
## Summary
For imports, we enforce that there's _at least_ one empty line after an
import (assuming the next statement is _not_ an import), but allow up to
two at the module level.
Closes https://github.com/astral-sh/ruff/issues/6760.
## Test Plan
`cargo test`
## Summary
The isolation group for unused imports was relying on
`checker.semantic().current_statement()`, which isn't valid for that
rule, since it runs over the _scope_, not the statement. Instead, we
need to lookup the isolation group based on the `NodeId` of the
statement.
Our tests didn't catch this, because we mostly have cases that look like
this:
```python
if TYPE_CHECKING:
import shelve
import importlib
```
In this case, the two fixes to remove the two unused imports are
considered overlapping (since we delete the _full_ line, and the two
_full_ lines touch, and we consider exactly-adjacent fixes to be
overlapping), and so they don't run in a single pass due to the
non-overlapping-fixes requirement. That is: the isolation groups aren't
required for this case. They are, however, required for cases like:
```python
if TYPE_CHECKING:
import shelve
import importlib
```
...where the fixes don't overlap.
Closes https://github.com/astral-sh/ruff/issues/6758.
## Test Plan
`cargo test`
`IOError` is special, it is not actually a lint but an error before
linting. I'm not entirely sure how to document it since it does not
match the general lint rule pattern (`Checks that the file can be read
in its entirety.` is imho worse).
I added the in my experience two most common reasons for io errors on
unix systems and linked two tutorials on how to fix them.
See https://github.com/astral-sh/ruff/issues/2646
---------
Co-authored-by: Charlie Marsh <charlie.r.marsh@gmail.com>
## Summary
I noticed this in the ecosystem CI check from
https://github.com/astral-sh/ruff/pull/6742. If we include source code
directly in a diagnostic, we need to be careful to avoid rendering
multi-line diagnostics or even excessively long diagnostics.
## Test Plan
`cargo test`
## Summary
This PR is a follow-up to the suggestion in
https://github.com/astral-sh/ruff/pull/6345#discussion_r1285470953 to
use a single stack to store all statements and expressions, rather than
using separate vectors for each, which gives us something closer to a
full-fidelity chain. (We can then generalize this concept to include all
other AST nodes too.)
This is in part made possible by the removal of the hash map from
`&Stmt` to `StatementId` (#6694), which makes it much cheaper to store
these using a single interface (since doing so no longer introduces the
requirement that we hash all expressions).
I'll follow-up with some profiling, but a few notes on how the data
requirements have changed:
- We now store a `BranchId` for every expression, not just every
statement, so that's an extra `u32`.
- We now store a single `NodeId` on every snapshot, rather than separate
`StatementId` and `ExpressionId` IDs, so that's one fewer `u32` for each
snapshot.
- We're probably doing a few more lookups in general, since any calls to
`current_statement()` etc. now have to iterate up the node hierarchy
until they identify the first statement.
## Test Plan
`cargo test`
## Summary
Avoid attempting to rewrite `import matplotlib.pyplot` as `import
matplotlib.pyplot as plt`. We can't support these right now, since we
don't track references at the attribute level (like
`matplotlib.pyplot`).
Closes https://github.com/astral-sh/ruff/issues/6719.
## Summary
Given:
```python
from sys import *
exit(0)
```
We can't add `exit` to `from sys import *`, so we should just ignore it.
Ideally, we'd just resolve `exit` in the first place (since it's
imported from `from sys import *`), but as long as we don't support
wildcard imports, this is more consistent.
Closes https://github.com/astral-sh/ruff/issues/6718.
## Test Plan
`cargo test`
Avoid the nesting in a macro by using the new `WithNodeLevel` to
`PyFormatter` deref. No changes otherwise.
I wanted to follow this up with quickly fixing the typeshed empty line
rules but they turned out a lot more complex than i had anticipated.
**Summary** A common pattern in the code used to be
```rust
if statements.len() != 1 {
return;
}
use_single_entry(statements[0])?;
```
which can be better expressed as
```rust
let [statement] = statements else {
return;
};
use_single_entry(statements)?;
```
Direct indexing can cause panics if you don't manually take care of
checking the length, while matching (such as if-let or let-else) can
never panic.
This isn't a complete refactor, i've just removed some of the obvious
cases. I've specifically looked for `.len() != 1` and fixed those.
**Test Plan** No functional changes
## Summary
We're using LibCST to ensure that we return the full parenthesized range
of an expression, for display purposes. We can just use
`parenthesized_range` which is more efficient and removes one LibCST
dependency.
## Test Plan
`cargo test`
This _probably_ never matters given the set of rules we support and in
fact I'm having trouble thinking of a test-case for it, but it's
definitely incorrect _not_ to pass on the `BranchId` here.
## Summary
We have a few rules that rely on detecting whether two statements are in
different branches -- for example, different arms of an `if`-`else`.
Historically, the way this was implemented is that, given two statement
IDs, we'd find the common parent (by traversing upwards via our
`Statements` abstraction); then identify branches "manually" by matching
the parents against `try`, `if`, and `match`, and returning iterators
over the arms; then check if there's an arm for which one of the
statements is a child, and the other is not.
This has a few drawbacks:
1. First, the code is generally a bit hard to follow (Konsti mentioned
this too when working on the `ElifElseClause` refactor).
2. Second, this is the only place in the codebase where we need to go
from `&Stmt` to `StatementID` -- _everywhere_ else, we only need to go
in the _other_ direction. Supporting these lookups means we need to
maintain a mapping from `&Stmt` to `StatementID` that includes every
`&Stmt` in the program. (We _also_ end up maintaining a `depth` level
for every statement.) I'd like to get rid of these requirements to
improve efficiency, reduce complexity, and enable us to treat AST modes
more generically in the future. (When I looked at adding the `&Expr` to
our existing statement-tracking infrastructure, maintaining a hash map
with all the statements noticeably hurt performance.)
The solution implemented here instead makes branches a first-class
concept in the semantic model. Like with `Statements`, we now have a
`Branches` abstraction, where each branch points to its optional parent.
When we store statements, we store the `BranchID` alongside each
statement. When we need to detect whether two statements are in the same
branch, we just realize each statement's branch path and compare the
two. (Assuming that the two statements are in the same scope, then
they're on the same branch IFF one branch path is a subset of the other,
starting from the top.) We then add some calls to the visitor to push
and pop branches in the appropriate places, for `if`, `try`, and `match`
statements.
Note that a branch is not 1:1 with a statement; instead, each branch is
closer to a suite, but not _every_ suite is a branch. For example, each
arm in an `if`-`elif`-`else` is a branch, but the `else` in a `for` loop
is not considered a branch.
In addition to being much simpler, this should also be more efficient,
since we've shed the entire `&Stmt` hash map, plus the `depth` that we
track on `StatementWithParent` in favor of a single `Option<BranchID>`
on `StatementWithParent` plus a single vector for all branches. The
lookups should be faster too, since instead of doing a bunch of jumps
around with the hash map + repeated recursive calls to find the common
parents, we instead just do a few simple lookups in the `Branches`
vector to realize and compare the branch paths.
## Test Plan
`cargo test` -- we have a lot of coverage for this, which we inherited
from PyFlakes
## Summary
This is much simpler and avoids (1) multiple passes over the entire
function body, (2) requiring the rule to do its own binding tracking (we
can just use the semantic model), and (3) a usage of `StatementKey`.
In general, where we can, we should try to remove these kinds of custom
visitors that track name references, and instead rely on the semantic
model.
## Test Plan
`cargo test`
## Summary
If a lambda doesn't contain any parameters, or any parameter _tokens_
(like `*`), we can use `None` for the parameters. This feels like a
better representation to me, since, e.g., what should the `TextRange` be
for a non-existent set of parameters? It also allows us to remove
several sites where we check if the `Parameters` is empty by seeing if
it contains any arguments, so semantically, we're already trying to
detect and model around this elsewhere.
Changing this also fixes a number of issues with dangling comments in
parameter-less lambdas, since those comments are now automatically
marked as dangling on the lambda. (As-is, we were also doing something
not-great whereby the lambda was responsible for formatting dangling
comments on the parameters, which has been removed.)
Closes https://github.com/astral-sh/ruff/issues/6646.
Closes https://github.com/astral-sh/ruff/issues/6647.
## Test Plan
`cargo test`
## Summary
In working on https://github.com/astral-sh/ruff/pull/6628, I noticed
that we clone the source code contents, potentially multiple times,
prior to linting. The issue is that `SourceKind::Python` takes a
`String`, so we first have to provide it with a `String`. In the stdin
case, that means cloning. However, on top of this, we then have to clone
`source_kind.contents()` because `SourceKind` gets mutated. So for
stdin, we end up cloning twice. For non-stdin, we end up cloning once,
but unnecessarily (since the _contents_ don't get mutated, only the
kind).
This PR removes the `String` from `source_kind`, instead requiring that
we parse it out elsewhere. It reduces the number of clones down to 1 for
Jupyter Notebooks, and zero otherwise.
## Summary
The motivating code here was:
```python
with test as (
# test
foo):
pass
```
Which we were formatting as:
```python
with test as
# test
(foo):
pass
```
`with` statements are oddly difficult. This PR makes a bunch of subtle
modifications and adds a more extensive test suite. For example, we now
only preserve parentheses if there's more than one `WithItem` _or_ a
trailing comma; before, we always preserved.
Our formatting is_not_ the same as Black, but here's a diff of our
formatted code vs. Black's for the `with.py` test suite. The primary
difference is that we tend to break parentheses when they contain
comments rather than move them to the end of the life (this is a
consistent difference that we make across the codebase):
```diff
diff --git a/crates/ruff_python_formatter/foo.py b/crates/ruff_python_formatter/foo.py
index 85e761080..31625c876 100644
--- a/crates/ruff_python_formatter/foo.py
+++ b/crates/ruff_python_formatter/foo.py
@@ -1,6 +1,4 @@
-with (
- aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
-), aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa:
+with aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa, aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa:
...
# trailing
@@ -16,28 +14,33 @@ with (
# trailing
-with a, b: # a # comma # c # colon
+with (
+ a, # a # comma
+ b, # c
+): # colon
...
with (
- a as # a # as
- # own line
- b, # b # comma
+ a as ( # a # as
+ # own line
+ b
+ ), # b # comma
c, # c
): # colon
... # body
# body trailing own
-with (
- a as # a # as
+with a as ( # a # as
# own line
- bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb # b
-):
+ bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
+): # b
pass
-with (a,): # magic trailing comma
+with (
+ a,
+): # magic trailing comma
...
@@ -47,6 +50,7 @@ with a: # should remove brackets
with aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa + bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb as c:
...
+
with (
# leading comment
a
@@ -74,8 +78,7 @@ with (
with (
a # trailing same line comment
# trailing own line comment
- as b
-):
+) as b:
...
with (
@@ -87,7 +90,9 @@ with (
with (
a
# trailing own line comment
-) as b: # trailing as same line comment # trailing b same line comment
+) as ( # trailing as same line comment
+ b
+): # trailing b same line comment
...
with (
@@ -124,18 +129,24 @@ with ( # comment
...
with ( # outer comment
- CtxManager1() as example1, # inner comment
+ ( # inner comment
+ CtxManager1()
+ ) as example1,
CtxManager2() as example2,
CtxManager3() as example3,
):
...
-with CtxManager() as example: # outer comment
+with ( # outer comment
+ CtxManager()
+) as example:
...
with ( # outer comment
CtxManager()
-) as example, CtxManager2() as example2: # inner comment
+) as example, ( # inner comment
+ CtxManager2()
+) as example2:
...
with ( # outer comment
@@ -145,7 +156,9 @@ with ( # outer comment
...
with ( # outer comment
- (CtxManager1()), # inner comment
+ ( # inner comment
+ CtxManager1()
+ ),
CtxManager2(),
) as example:
...
@@ -179,7 +192,9 @@ with (
):
pass
-with a as (b): # foo
+with a as ( # foo
+ b
+):
pass
with f(
@@ -209,17 +224,13 @@ with f(
) as b, c as d:
pass
-with (
- aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa + bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
-) as b:
+with aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa + bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb as b:
pass
with aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa + bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb as b:
pass
-with (
- aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa + bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
-) as b, c as d:
+with aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa + bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb as b, c as d:
pass
with (
@@ -230,6 +241,8 @@ with (
pass
with (
- aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa + bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
-) as b, c as d:
+ aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
+ + bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb as b,
+ c as d,
+):
pass
```
Closes https://github.com/astral-sh/ruff/issues/6600.
## Test Plan
Before:
| project | similarity index |
|--------------|------------------|
| cpython | 0.75473 |
| django | 0.99804 |
| transformers | 0.99618 |
| twine | 0.99876 |
| typeshed | 0.74292 |
| warehouse | 0.99601 |
| zulip | 0.99727 |
After:
| project | similarity index |
|--------------|------------------|
| cpython | 0.75473 |
| django | 0.99804 |
| transformers | 0.99618 |
| twine | 0.99876 |
| typeshed | 0.74292 |
| warehouse | 0.99601 |
| zulip | 0.99727 |
`cargo test`
## Summary
Attaches comments around the `:=` operator in a named expression as
dangling, and formats them manually in the `named_expr.rs` formatter.
Closes https://github.com/astral-sh/ruff/issues/5695.
## Test Plan
`cargo test`
## Summary
This PR exposes our `is_expression_parenthesized` logic such that we can
use it to expand expressions when autofixing to include their
parenthesized ranges.
This solution has a few drawbacks: (1) we need to compute parenthesized
ranges in more places, which also relies on backwards lexing; and (2) we
need to make use of this in any relevant fixes.
However, I still think it's worth pursuing. On (1), the implementation
is very contained, so IMO we can easily swap this out for a more
performant solution in the future if needed. On (2), this improves
correctness and fixes some bad syntax errors detected by fuzzing, which
means it has value even if it's not as robust as an _actual_
`ParenthesizedExpression` node in the AST itself.
Closes https://github.com/astral-sh/ruff/issues/4925.
## Test Plan
`cargo test` with new cases that previously failed the fuzzer.
## Summary
I noticed some inconsistencies around uses of `.range.start()`, structs
that have a `TextRange` field but don't implement `Ranged`, etc.
## Test Plan
`cargo test`
## Summary
Given:
```python
if (
x
:=
( # 4
y # 5
) # 6
):
pass
```
It turns out the parser ended the range of the `NamedExpr` at the end of
`y`, rather than the end of the parenthesis that encloses `y`. This just
seems like a bug -- the range should be from the start of the name on
the left, to the end of the parenthesized node on the right.
## Test Plan
`cargo test`
Closes https://github.com/astral-sh/ruff/issues/6442
Python string formatting like `"hello {place}".format(place="world")`
supports format specifications for replaced content such as `"hello
{place:>10}".format(place="world")` which will align the text to the
right in a container filled up to ten characters.
Ruff parses formatted strings into `FormatPart`s each of which is either
a `Field` (content in `{...}`) or a `Literal` (the normal content).
Fields are parsed into name and format specifier sections (we'll ignore
conversion specifiers for now).
There are a myriad of specifiers that can be used in a `FormatSpec`.
Unfortunately for linters, the specifier values can be dynamically set.
For example, `"hello {place:{align}{width}}".format(place="world",
align=">", width=10)` and `"hello {place:{fmt}}".format(place="world",
fmt=">10")` will yield the same string as before but variables can be
used to determine the formatting. In this case, when parsing the format
specifier we can't know what _kind_ of specifier is being used as their
meaning is determined by both position and value.
Ruff does not support nested replacements and our current data model
does not support the concept. Here the data model is updated to support
this concept, although linting of specifications with replacements will
be inherently limited. We could split format specifications into two
types, one without any replacements that we can perform lints with and
one with replacements that we cannot inspect. However, it seems
excessive to drop all parsing of format specifiers due to the presence
of a replacement. Instead, I've opted to parse replacements eagerly and
ignore their possible effect on other format specifiers. This will allow
us to retain a simple interface for `FormatSpec` and most syntax checks.
We may need to add some handling to relax errors if a replacement was
seen previously.
It's worth noting that the nested replacement _can_ also include a
format specification although it may fail at runtime if you produce an
invalid outer format specification. For example, `"hello
{place:{fmt:<2}}".format(place="world", fmt=">10")` is valid so we need
to represent each nested replacement as a full `FormatPart`.
## Test plan
Adding unit tests for `FormatSpec` parsing and snapshots for PLE1300