## Summary
Document all `ruff_dev` subcommands and document the `format_dev` flags
in the formatter readme.
CC @zanieb please flag everything that isn't clear or missing
## Test Plan
n/a
<!--
Thank you for contributing to Ruff! To help us out with reviewing, please consider the following:
- Does this pull request include a summary of the change? (See below.)
- Does this pull request include a descriptive title?
- Does this pull request include references to any relevant issues?
-->
## Summary
This PR matches Black' behavior where it only omits the optional parentheses if the expression starts or ends with a parenthesized expression:
```python
a + [aaa, bbb, cccc] * c # Don't omit
[aaa, bbb, cccc] + a * c # Split
a + c * [aaa, bbb, ccc] # Split
```
<!-- What's the purpose of the change? What does it do, and why? -->
## Test Plan
This improves the Jaccard index from 0.945 to 0.946
<!--
Thank you for contributing to Ruff! To help us out with reviewing, please consider the following:
- Does this pull request include a summary of the change? (See below.)
- Does this pull request include a descriptive title?
- Does this pull request include references to any relevant issues?
-->
## Summary
This PR improves the Black compatibility when it comes to breaking comprehensions.
We want to avoid line breaks before the target and `in` whenever possible. Furthermore, `if X is not None` should be grouped together, similar to other binary like expressions
<!-- What's the purpose of the change? What does it do, and why? -->
## Test Plan
`cargo test`
<!-- How was it tested? -->
## Summary
We don't use `ModExpression` anywhere but it's part of the AST, removes
one `not_implemented_yet` and is a trivial 2-liner, so i implemented
formatting for `ModExpression`.
## Test Plan
None, this kind of node does not occur in file input. Otherwise all the
tests for expressions
<!--
Thank you for contributing to Ruff! To help us out with reviewing, please consider the following:
- Does this pull request include a summary of the change? (See below.)
- Does this pull request include a descriptive title?
- Does this pull request include references to any relevant issues?
-->
## Summary
I started working on this because I assumed that I would need access to options inside of `NeedsParantheses` but it then turned out that I won't.
Anyway, it kind of felt nice to pass fewer arguments. So I'm gonna put this out here to get your feedback if you prefer this over passing individual fiels.
Oh, I sneeked in another change. I renamed `context.contents` to `source`. `contents` is too generic and doesn't tell you anything.
<!-- What's the purpose of the change? What does it do, and why? -->
## Test Plan
It compiles
<!--
Thank you for contributing to Ruff! To help us out with reviewing, please consider the following:
- Does this pull request include a summary of the change? (See below.)
- Does this pull request include a descriptive title?
- Does this pull request include references to any relevant issues?
-->
## Summary
This PR implements Black's behavior where it first splits off parenthesized expressions before splitting before operands to avoid unnecessary parentheses:
```python
# We want
if a + [
b,
c
]:
pass
# Rather than
if (
a
+ [b, c]
):
pass
```
This is implemented by using the new IR elements introduced in #5596.
* We give the group wrapping the optional parentheses an ID (`parentheses_id`)
* We use `conditional_group` for the lower priority groups (all non-parenthesized expressions) with the condition that the `parentheses_id` group breaks (we want to split before operands only if the parentheses are necessary)
* We use `fits_expanded` to wrap all other parenthesized expressions (lists, dicts, sets), to prevent that expanding e.g. a list expands the `parentheses_id` group. We gate the `fits_expand` to only apply if the `parentheses_id` group fits (because we prefer `a\n+[b, c]` over expanding `[b, c]` if the whole expression gets parenthesized).
We limit using `fits_expanded` and `conditional_group` only to expressions that themselves are not in parentheses (checking the conditions isn't free)
## Test Plan
It increases the Jaccard index for Django from 0.915 to 0.917
## Incompatibilites
There are two incompatibilities left that I'm aware of (there may be more, I didn't go through all snapshot differences).
### Long string literals
I commented on the regression. The issue is that a very long string (or any content without a split point) may not fit when only breaking the right side. The formatter than inserts the optional parentheses. But this is kind of useless because the overlong string will still not fit, because there are no new split points.
I think we should ignore this incompatibility for now
### Expressions on statement level
I don't fully understand the logic behind this yet, but black doesn't break before the operators for the following example even though the expression exceeds the configured line width
```python
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa < bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb > ccccccccccccccccccccccccccccc == ddddddddddddddddddddd
```
But it would if the expression is used inside of a condition.
What I understand so far is that Black doesn't insert optional parentheses on the expression statement level (and a few other places) and, therefore, only breaks after opening parentheses. I propose to keep this deviation for now to avoid overlong-lines and use the compatibility report to make a decision if we should implement the same behavior.
## Summary
The similarity index, the fraction of unchanged lines, is easier to
understand than the jaccard index, the fraction between intersection and
union.
## Test Plan
I ran this on django and git a 0.945 index, meaning 5.5% of lines are
currently reformatted when compared to black
## Summary
Format statements such as `tree_depth += 1`. This is a statement that
does not allow any line breaks, the only thing to be mindful of is to
parenthesize the assigned expression
Jaccard index on django: 0.915 -> 0.918
## Test Plan
black tests, and two new tests, a basic one and one that ensures that
the child gets parentheses. I ran the django stability check.
## Summary
This is the result of running `cargo +nightly clippy --workspace
--all-targets --all-features -- -D warnings` and fixing all violations.
Just wanted to see if there were any interesting new checks on nightly
👀
## Summary
This PR implements the formatting of `raise` statements. I haven't
looked at the black implementation, this is inspired from from the
`return` statements formatting.
## Test Plan
The black differences with insta.
I also compared manually some edge cases with very long string and call
chaining and it seems to do the same formatting as black.
There is one issue:
```python
# input
raise OsError(
"aksjdhflsakhdflkjsadlfajkslhfdkjsaldajlahflashdfljahlfksajlhfajfjfsaahflakjslhdfkjalhdskjfa"
) from a.aaaaa(aksjdhflsakhdflkjsadlfajkslhfdkjsaldajlahflashdfljahlfksajlhfajfjfsaahflakjslhdfkjalhdskjfa).a(aaaa)
# black
raise OsError(
"aksjdhflsakhdflkjsadlfajkslhfdkjsaldajlahflashdfljahlfksajlhfajfjfsaahflakjslhdfkjalhdskjfa"
) from a.aaaaa(
aksjdhflsakhdflkjsadlfajkslhfdkjsaldajlahflashdfljahlfksajlhfajfjfsaahflakjslhdfkjalhdskjfa
).a(
aaaa
)
# ruff
raise OsError(
"aksjdhflsakhdflkjsadlfajkslhfdkjsaldajlahflashdfljahlfksajlhfajfjfsaahflakjslhdfkjalhdskjfa"
) from a.aaaaa(
aksjdhflsakhdflkjsadlfajkslhfdkjsaldajlahflashdfljahlfksajlhfajfjfsaahflakjslhdfkjalhdskjfa
).a(aaaa)
```
But I'm not sure this diff is the raise formatting implementation.
---------
Co-authored-by: Louis Dispa <ldispa@deezer.com>
## Summary
Fix an oversight in `find_only_token_in_range` where the following code
would panic due do the closing and opening parentheses being in the
range we scan:
```python
d1 = [
("a") if # 1
("b") else # 2
("c")
]
```
Closing and opening parentheses respectively are now correctly skipped.
## Test Plan
I added a regression test
## Summary
Format named expressions (walrus operator) such a `value := f()`.
Unlike tuples, named expression parentheses are not part of the range
even when mandatory, so mapping optional parentheses to always gives us
decent formatting without implementing all [PEP
572](https://peps.python.org/pep-0572/) rules on when we need
parentheses where other expressions wouldn't. We might want to revisit
this decision later and implement special cases, but for now this gives
us what we need.
## Test Plan
black fixtures, i added some fixtures and checked django and cpython for
stability.
Closes#5613
<!--
Thank you for contributing to Ruff! To help us out with reviewing,
please consider the following:
- Does this pull request include a summary of the change? (See below.)
- Does this pull request include a descriptive title?
- Does this pull request include references to any relevant issues?
-->
## Summary
Fix typos found by
[codespell](https://github.com/codespell-project/codespell).
I have left out `memoize` for now (see #5606).
<!-- What's the purpose of the change? What does it do, and why? -->
## Test Plan
CI tests.
<!-- How was it tested? -->
## Summary
Format `ExprIfExp`, also known as the ternary operator or inline `if`.
It can look like
```python
a1 = 1 if True else 2
```
but also
```python
b1 = (
# We return "a" ...
"a" # that's our True value
# ... if this condition matches ...
if True # that's our test
# ... otherwise we return "b§
else "b" # that's our False value
)
```
This also fixes a visitor order bug.
The jaccard index on django goes from 0.911 to 0.915.
## Test Plan
I added fixtures without and with comments in strange places.
## Summary
This extends the `ruff_dev` formatter script util. Instead of only doing
stability checks, you can now choose different compatible options on the
CLI and get statistics.
* It adds an option the formats all files that ruff would check to allow
looking at an entire black-formatted repository with `git diff`
* It computes the [Jaccard
index](https://en.wikipedia.org/wiki/Jaccard_index) as a measure of
deviation between input and output, which is useful as single number
metric for assessing our current deviations from black.
* It adds progress bars to both the single projects as well as the
multi-project mode.
* It adds an option to write the multi-project output to a file
Sample usage:
```
$ cargo run --bin ruff_dev -- format-dev --stability-check crates/ruff/resources/test/cpython
$ cargo run --bin ruff_dev -- format-dev --stability-check /home/konsti/projects/django
Syntax error in /home/konsti/projects/django/tests/test_runner_apps/tagged/tests_syntax_error.py: source contains syntax errors (parser error): BaseError { error: UnrecognizedToken(Name { name: "syntax_error" }, None), offset: 131, source_path: "<filename>" }
Found 0 stability errors in 2755 files (jaccard index 0.911) in 9.75s
$ cargo run --bin ruff_dev -- format-dev --write /home/konsti/projects/django
```
Options:
```
Several utils related to the formatter which can be run on one or more repositories. The selected set of files in a repository is the same as for `ruff check`.
* Check formatter stability: Format a repository twice and ensure that it looks that the first and second formatting look the same. * Format: Format the files in a repository to be able to check them with `git diff` * Statistics: The subcommand the Jaccard index between the (assumed to be black formatted) input and the ruff formatted output
Usage: ruff_dev format-dev [OPTIONS] [FILES]...
Arguments:
[FILES]...
Like `ruff check`'s files. See `--multi-project` if you want to format an ecosystem checkout
Options:
--stability-check
Check stability
We want to ensure that once formatted content stays the same when formatted again, which is known as formatter stability or formatter idempotency, and that the formatter prints syntactically valid code. As our test cases cover only a limited amount of code, this allows checking entire repositories.
--write
Format the files. Without this flag, the python files are not modified
--format <FORMAT>
Control the verbosity of the output
[default: default]
Possible values:
- minimal: Filenames only
- default: Filenames and reduced diff
- full: Full diff and invalid code
-x, --exit-first-error
Print only the first error and exit, `-x` is same as pytest
--multi-project
Checks each project inside a directory, useful e.g. if you want to check all of the ecosystem checkouts
--error-file <ERROR_FILE>
Write all errors to this file in addition to stdout. Only used in multi-project mode
```
## Test Plan
I ran this on django (2755 files, jaccard index 0.911) and discovered a
magic trailing comma problem and that we really needed to implement
import formatting. I ran the script on cpython to identify
https://github.com/astral-sh/ruff/pull/5558.
## Summary
The following code was previously leading to unstable formatting:
```python
try:
try:
pass
finally:
print(1) # issue7208
except A:
pass
```
The comment would be formatted as a trailing comment of `try` which is
unstable as an end-of-line comment gets two extra whitespaces.
This was originally found in
99b00efd5e/Lib/getpass.py (L68-L91)
## Test Plan
I added a regression test
## Summary
Format import statements in all their variants. Specifically, this
implemented formatting `StmtImport`, `StmtImportFrom` and `Alias`.
## Test Plan
I added some custom snapshots, even though this has been covered well by
black's tests.
## Summary
If a comma separated list has only one entry, black will respect the
magic trailing comma, but it will not add a new one.
The following code will remain as is:
```python
b1 = [
aksjdhflsakhdflkjsadlfajkslhfdkjsaldajlahflashdfljahlfksajlhfajfjfsaahflakjslhdfkjalhdskjfa
]
b2 = [
aksjdhflsakhdflkjsadlfajkslhfdkjsaldajlahflashdfljahlfksajlhfajfjfsaahflakjslhdfkjalhdskjfa,
]
b3 = [
aksjdhflsakhdflkjsadlfajkslhfdkjsaldajlahflashdfljahlfksajlhfajfjfsaahflakjslhdfkjalhdskjfa,
aksjdhflsakhdflkjsadlfajkslhfdkjsaldajlahflashdfljahlfksajlhfajfjfsaahflakjslhdfkjalhdskjfa
]
```
## Test Plan
This was first discovered in
7eeadc82c2/django/contrib/admin/checks.py (L674-L681),
which i've minimized into a call test.
I've added tests for the three cases (one entry + no comma, one entry +
comma, more than one entry) to the list tests.
The diffs from the black tests get smaller.
## Summary
Change generator formatting dummy to include `NOT_YET_IMPLEMENTED`. This
makes it easier to correctly identify them as dummies
## Test Plan
This is a dummy change
Support for `let…else` formatting was just merged to nightly
(rust-lang/rust#113225). Rerun `cargo fmt` with Rust nightly 2023-07-02
to pick this up. Followup to #939.
Signed-off-by: Anders Kaseorg <andersk@mit.edu>
<!--
Thank you for contributing to Ruff! To help us out with reviewing, please consider the following:
- Does this pull request include a summary of the change? (See below.)
- Does this pull request include a descriptive title?
- Does this pull request include references to any relevant issues?
-->
## Summary
This PR normalizes line endings inside of strings to `\n` as required by the printer.
<!-- What's the purpose of the change? What does it do, and why? -->
## Test Plan
I added a new test using `\r\n` and ran the ecosystem check. There are no remaining end of line panics.
https://gist.github.com/MichaReiser/8f36b1391ca7b48475b3a4f592d74ff4
<!-- How was it tested? -->
<!--
Thank you for contributing to Ruff! To help us out with reviewing, please consider the following:
- Does this pull request include a summary of the change? (See below.)
- Does this pull request include a descriptive title?
- Does this pull request include references to any relevant issues?
-->
## Summary
This PR fixes an issue where the binary expression formatting removed parentheses around the left hand side of an expression.
<!-- What's the purpose of the change? What does it do, and why? -->
## Test Plan
I added a new regression test and re-ran the ecosystem check. It brings down the `check-formatter-stability` output from a 3.4MB file down to 900KB.
<!-- How was it tested? -->
## Summary
This formats call expressions with magic trailing comma and parentheses
behaviour but without call chaining
## Test Plan
Lots of new test fixtures, including some that don't work yet
<!--
Thank you for contributing to Ruff! To help us out with reviewing, please consider the following:
- Does this pull request include a summary of the change? (See below.)
- Does this pull request include a descriptive title?
- Does this pull request include references to any relevant issues?
-->
## Summary
This PR extends the string formatting to respect the configured quote style.
<!-- What's the purpose of the change? What does it do, and why? -->
## Test Plan
Extended the string test with new cases and set it up to run twice: Once with the `quote_style: Doube`, and once with `quote_style: Single` single and double quotes.
<!-- How was it tested? -->
<!--
Thank you for contributing to Ruff! To help us out with reviewing, please consider the following:
- Does this pull request include a summary of the change? (See below.)
- Does this pull request include a descriptive title?
- Does this pull request include references to any relevant issues?
-->
## Summary
This PR adds tests that verify that the magic trailing comma is not respected if disabled in the formatter options.
Our test setup now allows to create a `<fixture-name>.options.json` file that contains an array of configurations that should be tested.
<!-- What's the purpose of the change? What does it do, and why? -->
## Test Plan
It's all about tests :)
<!-- How was it tested? -->
<!--
Thank you for contributing to Ruff! To help us out with reviewing, please consider the following:
- Does this pull request include a summary of the change? (See below.)
- Does this pull request include a descriptive title?
- Does this pull request include references to any relevant issues?
-->
## Summary
This PR adds a new `PyFormatOptions` struct that stores the python formatter options.
The new options aren't used yet, with the exception of magical trailing commas and the options passed to the printer.
I'll follow up with more PRs that use the new options (e.g. `QuoteStyle`).
<!-- What's the purpose of the change? What does it do, and why? -->
## Test Plan
`cargo test` I'll follow up with a new PR that adds support for overriding the options in our fixture tests.
## Motation
Previously,
```python
x = (
a1
.a2
# a
. # b
# c
a3
)
```
got formatted as
```python
x = a1.a2
# a
. # b
# c
a3
```
which is invalid syntax. This fixes that.
## Summary
This implements a basic form of attribute chaining
(<https://black.readthedocs.io/en/stable/the_black_code_style/current_style.html#call-chains>)
by checking if any inner attribute access contains an own line comment,
and if this is the case, adds parentheses around the outermost attribute
access while disabling parentheses for all inner attribute expressions.
We want to replace this with an implementation that uses recursion or a
stack while formatting instead of in `needs_parentheses` and also
includes calls rather sooner than later, but i'm fixing this now because
i'm uncomfortable with having known invalid syntax generation in the
formatter.
## Test Plan
I added new fixtures.
## Summary
This is small refactoring to reuse the code that detects the magic
trailing comma across functions. I make this change now to avoid copying
code in a later PR. @MichaReiser is planning on making a larger
refactoring later that integrates with the join nodes builder
## Test Plan
No functional changes. The magic trailing comma behaviour is checked by
the fixtures.
In the following code, the comment used to get wrongly associated with
the `if False` since it looked like an elif. This fixes it by checking
the indentation and adding a regression test
```python
if True:
pass
else: # Comment
if False:
pass
pass
```
Originally found in
1570b94a02/gradio/external.py (L478)
<!--
Thank you for contributing to Ruff! To help us out with reviewing, please consider the following:
- Does this pull request include a summary of the change? (See below.)
- Does this pull request include a descriptive title?
- Does this pull request include references to any relevant issues?
-->
## Summary
This PR implements formatting for non-f-string Strings that do not use implicit concatenation.
Docstring formatting is out of the scope of this PR.
<!-- What's the purpose of the change? What does it do, and why? -->
## Test Plan
I added a few tests for simple string literals.
## Performance
Ouch. This is hitting performance somewhat hard. This is probably because we now iterate each string a couple of times:
1. To detect if it is an implicit string continuation
2. To detect if the string contains any new lines
3. To detect the preferred quote
4. To normalize the string
Edit: I integrated the detection of newlines into the preferred quote detection so that we only iterate the string three time.
We can probably do better by merging the implicit string continuation with the quote detection and new line detection by iterating till the end of the string part and returning the offset. We then use our simple tokenizer to skip over any comments or whitespace until we find the first non trivia token. From there we keep continue doing this in a loop until we reach the end o the string. I'll leave this improvement for later.
<!--
Thank you for contributing to Ruff! To help us out with reviewing, please consider the following:
- Does this pull request include a summary of the change? (See below.)
- Does this pull request include a descriptive title?
- Does this pull request include references to any relevant issues?
-->
## Summary
This PR adds basic formatting for compare operations.
The implementation currently breaks diffeently when nesting binary like expressions. I haven't yet figured out what Black's logic is in that case but I think that this by itself is already an improvement worth merging.
<!-- What's the purpose of the change? What does it do, and why? -->
## Test Plan
I added a few new tests
<!-- How was it tested? -->
## Summary
This snippet used to panic because it expected to see a comma or
something similar after the `2` but met the closing parentheses that is
not part of the range and panicked
```python
a = {
1: (2),
# comment
3: True,
}
```
Originally found in
636a717ef0/testing/marionette/client/marionette_driver/geckoinstance.py (L109)
This snippet is also the test plan.
This solves an instability when formatting cpython. It also introduces
another one, but i think it's still a worthwhile change for now.
There's no proper testing since this is just a dummy.
<!--
Thank you for contributing to Ruff! To help us out with reviewing,
please consider the following:
- Does this pull request include a summary of the change? (See below.)
- Does this pull request include a descriptive title?
- Does this pull request include references to any relevant issues?
-->
## Summary
format StmtFor
still trying to learn how to help out with the formatter. trying
something slightly more advanced than [break](#5158)
mostly copied form StmtWhile
## Test Plan
snapshots
## Motivation
While black keeps parentheses nearly everywhere, the notable exception
is in the body of for loops:
```python
for (a, b) in x:
pass
```
becomes
```python
for a, b in x:
pass
```
This currently blocks #5163, which this PR should unblock.
## Solution
This changes the `ExprTuple` formatting option to include one additional
option that removes the parentheses when not using magic trailing comma
and not breaking. It is supposed to be used through
```rust
#[derive(Debug)]
struct ExprTupleWithoutParentheses<'a>(&'a Expr);
impl Format<PyFormatContext<'_>> for ExprTupleWithoutParentheses<'_> {
fn fmt(&self, f: &mut Formatter<PyFormatContext<'_>>) -> FormatResult<()> {
match self.0 {
Expr::Tuple(expr_tuple) => expr_tuple
.format()
.with_options(TupleParentheses::StripInsideForLoop)
.fmt(f),
other => other.format().with_options(Parenthesize::IfBreaks).fmt(f),
}
}
}
```
## Testing
The for loop formatting isn't merged due to missing this (and i didn't
want to create more git weirdness across two people), but I've confirmed
that when applying this to while loops instead of for loops, then
```rust
write!(
f,
[
text("while"),
space(),
ExprTupleWithoutParentheses(test.as_ref()),
text(":"),
trailing_comments(trailing_condition_comments),
block_indent(&body.format())
]
)?;
```
makes
```python
while (a, b):
pass
while (
ajssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssa,
b,
):
pass
while (a,b,):
pass
```
formatted as
```python
while a, b:
pass
while (
ajssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssa,
b,
):
pass
while (
a,
b,
):
pass
```
## Summary
This is a complete rewrite of the handling of `/` and `*` comment
handling in function signatures. The key problem is that slash and star
don't have a note. We now parse out the positions of slash and star and
their respective preceding and following note. I've left code comments
for each possible case of function signature structure and comment
placement
## Test Plan
I extended the function statement fixtures with cases that i found. If
you have more weird edge cases your input would be appreciated.
## Summary
This fixes two problems discovered when trying to format the cpython
repo with `cargo run --bin ruff_dev -- check-formatter-stability
projects/cpython`:
The first is to ignore try/except trailing comments for now since they
lead to unstable formatting on the dummy.
The second is to avoid dropping trailing if comments through placement:
This changes the placement to keep a comment trailing an if-elif or
if-elif-else to keep the comment a trailing comment on the entire if.
Previously the last comment would have been lost.
```python
if "first if":
pass
elif "first elif":
pass
```
The last remaining problem in cpython so far is function signature
argument separator comment placement which is its own PR on top of this.
## Test Plan
I added test fixtures of minimized examples with links back to the
original cpython location
This formats slice expressions and subscript expressions.
Spaces around the colons follows the same rules as black
(https://black.readthedocs.io/en/stable/the_black_code_style/current_style.html#slices):
```python
e00 = "e"[:]
e01 = "e"[:1]
e02 = "e"[: a()]
e10 = "e"[1:]
e11 = "e"[1:1]
e12 = "e"[1 : a()]
e20 = "e"[a() :]
e21 = "e"[a() : 1]
e22 = "e"[a() : a()]
e200 = "e"[a() : :]
e201 = "e"[a() :: 1]
e202 = "e"[a() :: a()]
e210 = "e"[a() : 1 :]
```
Comment placement is different due to our very different infrastructure.
If we have explicit bounds (e.g. `x[1:2]`) all comments get assigned as
leading or trailing to the bound expression. If a bound is missing
`[:]`, comments get marked as dangling and placed in the same section as
they were originally in:
```python
x = "x"[ # a
# b
: # c
# d
]
```
to
```python
x = "x"[
# a
# b
:
# c
# d
]
```
Except for the potential trailing end-of-line comments, all comments get
formatted on their own line. This can be improved by keeping end-of-line
comments after the opening bracket or after a colon as such but the
changes were already complex enough.
I added tests for comment placement and spaces.
## Summary
I found it hard to figure out which function decides placement for a
specific comment. An explicit loop makes this easier to debug
## Test Plan
There should be no functional changes, no changes to the formatting of
the fixtures.
## Summary
Previously, `DecoratedComment` used `text_position()` and
`SourceComment` used `position()`. This PR unifies this to
`line_position` everywhere.
## Test Plan
This is a rename refactoring.
<!--
Thank you for contributing to Ruff! To help us out with reviewing, please consider the following:
- Does this pull request include a summary of the change? (See below.)
- Does this pull request include a descriptive title?
- Does this pull request include references to any relevant issues?
-->
## Summary
This PR adds basic formatting for unary expressions.
<!-- What's the purpose of the change? What does it do, and why? -->
## Test Plan
I added a new `unary.py` with custom test cases
<!--
Thank you for contributing to Ruff! To help us out with reviewing, please consider the following:
- Does this pull request include a summary of the change? (See below.)
- Does this pull request include a descriptive title?
- Does this pull request include references to any relevant issues?
-->
## Summary
Black supports for layouts when it comes to breaking binary expressions:
```rust
#[derive(Copy, Clone, Debug, Eq, PartialEq)]
enum BinaryLayout {
/// Put each operand on their own line if either side expands
Default,
/// Try to expand the left to make it fit. Add parentheses if the left or right don't fit.
///
///```python
/// [
/// a,
/// b
/// ] & c
///```
ExpandLeft,
/// Try to expand the right to make it fix. Add parentheses if the left or right don't fit.
///
/// ```python
/// a & [
/// b,
/// c
/// ]
/// ```
ExpandRight,
/// Both the left and right side can be expanded. Try in the following order:
/// * expand the right side
/// * expand the left side
/// * expand both sides
///
/// to make the expression fit
///
/// ```python
/// [
/// a,
/// b
/// ] & [
/// c,
/// d
/// ]
/// ```
ExpandRightThenLeft,
}
```
Our current implementation only handles `ExpandRight` and `Default` correctly. This PR adds support for `ExpandRightThenLeft` and `ExpandLeft`.
## Test Plan
I added tests that play through all 4 binary expression layouts.
## Summary
In https://github.com/astral-sh/RustPython-Parser/pull/8, we modified
RustPython to include ranges for any identifiers that aren't
`Expr::Name` (which already has an identifier).
For example, the `e` in `except ValueError as e` was previously
un-ranged. To extract its range, we had to do some lexing of our own.
This change should improve performance and let us remove a bunch of
code.
## Test Plan
`cargo test`
## Summary
This PR upgrade RustPython to pull in the changes to `Arguments` (zip
defaults with their identifiers) and all the renames to `CmpOp` and
friends.
This documentation change improves the section on dangling comments in
the formatter.
---------
Co-authored-by: David Szotten <davidszotten@gmail.com>
Co-authored-by: Micha Reiser <micha@reiser.io>
<!--
Thank you for contributing to Ruff! To help us out with reviewing,
please consider the following:
- Does this pull request include a summary of the change? (See below.)
- Does this pull request include a descriptive title?
- Does this pull request include references to any relevant issues?
-->
## Summary
Format `continue` statement.
## Test Plan
`continue` is used already in some tests, but if a new test is needed I
could add it.
---------
Co-authored-by: konstin <konstin@mailbox.org>
## Summary
At present, when we store a binding, we include a `TextRange` alongside
it. The `TextRange` _sometimes_ matches the exact range of the
identifier to which the `Binding` is linked, but... not always.
For example, given:
```python
x = 1
```
The binding we create _will_ use the range of `x`, because the left-hand
side is an `Expr::Name`, which has a valid range on it.
However, given:
```python
try:
pass
except ValueError as e:
pass
```
When we create a binding for `e`, we don't have a `TextRange`... The AST
doesn't give us one. So we end up extracting it via lexing.
This PR extends that pattern to the rest of the binding kinds, to ensure
that whenever we create a binding, we always use the range of the bound
name. This leads to better diagnostics in cases like pattern matching,
whereby the diagnostic for "unused variable `x`" here used to include
`*x`, instead of just `x`:
```python
def f(provided: int) -> int:
match provided:
case [_, *x]:
pass
```
This is _also_ required for symbol renames, since we track writes as
bindings -- so we need to know the ranges of the bound symbols.
By storing these bindings precisely, we can also remove the
`binding.trimmed_range` abstraction -- since bindings already use the
"trimmed range".
To implement this behavior, I took some of our existing utilities (like
the code we had for `except ValueError as e` above), migrated them from
a full lexer to a zero-allocation lexer that _only_ identifies
"identifiers", and moved the behavior into a trait, so we can now do
`stmt.identifier(locator)` to get the range for the identifier.
Honestly, we might end up discarding much of this if we decide to put
ranges on all identifiers
(https://github.com/astral-sh/RustPython-Parser/pull/8). But even if we
do, this will _still_ be a good change, because the lexer introduced
here is useful beyond names (e.g., we use it find the `except` keyword
in an exception handler, to find the `else` after a `for` loop, and so
on). So, I'm fine committing this even if we end up changing our minds
about the right approach.
Closes#5090.
## Benchmarks
No significant change, with one statistically significant improvement
(-2.1654% on `linter/all-rules/large/dataset.py`):
```
linter/default-rules/numpy/globals.py
time: [73.922 µs 73.955 µs 73.986 µs]
thrpt: [39.882 MiB/s 39.898 MiB/s 39.916 MiB/s]
change:
time: [-0.5579% -0.4732% -0.3980%] (p = 0.00 < 0.05)
thrpt: [+0.3996% +0.4755% +0.5611%]
Change within noise threshold.
Found 6 outliers among 100 measurements (6.00%)
4 (4.00%) low severe
1 (1.00%) low mild
1 (1.00%) high mild
linter/default-rules/pydantic/types.py
time: [1.4909 ms 1.4917 ms 1.4926 ms]
thrpt: [17.087 MiB/s 17.096 MiB/s 17.106 MiB/s]
change:
time: [+0.2140% +0.2741% +0.3392%] (p = 0.00 < 0.05)
thrpt: [-0.3380% -0.2734% -0.2136%]
Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
3 (3.00%) high mild
1 (1.00%) high severe
linter/default-rules/numpy/ctypeslib.py
time: [688.97 µs 691.34 µs 694.15 µs]
thrpt: [23.988 MiB/s 24.085 MiB/s 24.168 MiB/s]
change:
time: [-1.3282% -0.7298% -0.1466%] (p = 0.02 < 0.05)
thrpt: [+0.1468% +0.7351% +1.3461%]
Change within noise threshold.
Found 15 outliers among 100 measurements (15.00%)
1 (1.00%) low mild
2 (2.00%) high mild
12 (12.00%) high severe
linter/default-rules/large/dataset.py
time: [3.3872 ms 3.4032 ms 3.4191 ms]
thrpt: [11.899 MiB/s 11.954 MiB/s 12.011 MiB/s]
change:
time: [-0.6427% -0.2635% +0.0906%] (p = 0.17 > 0.05)
thrpt: [-0.0905% +0.2642% +0.6469%]
No change in performance detected.
Found 20 outliers among 100 measurements (20.00%)
1 (1.00%) low severe
2 (2.00%) low mild
4 (4.00%) high mild
13 (13.00%) high severe
linter/all-rules/numpy/globals.py
time: [148.99 µs 149.21 µs 149.42 µs]
thrpt: [19.748 MiB/s 19.776 MiB/s 19.805 MiB/s]
change:
time: [-0.7340% -0.5068% -0.2778%] (p = 0.00 < 0.05)
thrpt: [+0.2785% +0.5094% +0.7395%]
Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) low mild
1 (1.00%) high severe
linter/all-rules/pydantic/types.py
time: [3.0362 ms 3.0396 ms 3.0441 ms]
thrpt: [8.3779 MiB/s 8.3903 MiB/s 8.3997 MiB/s]
change:
time: [-0.0957% +0.0618% +0.2125%] (p = 0.45 > 0.05)
thrpt: [-0.2121% -0.0618% +0.0958%]
No change in performance detected.
Found 11 outliers among 100 measurements (11.00%)
1 (1.00%) low severe
3 (3.00%) low mild
5 (5.00%) high mild
2 (2.00%) high severe
linter/all-rules/numpy/ctypeslib.py
time: [1.6879 ms 1.6894 ms 1.6909 ms]
thrpt: [9.8478 MiB/s 9.8562 MiB/s 9.8652 MiB/s]
change:
time: [-0.2279% -0.0888% +0.0436%] (p = 0.18 > 0.05)
thrpt: [-0.0435% +0.0889% +0.2284%]
No change in performance detected.
Found 5 outliers among 100 measurements (5.00%)
4 (4.00%) low mild
1 (1.00%) high severe
linter/all-rules/large/dataset.py
time: [7.1520 ms 7.1586 ms 7.1654 ms]
thrpt: [5.6777 MiB/s 5.6831 MiB/s 5.6883 MiB/s]
change:
time: [-2.5626% -2.1654% -1.7780%] (p = 0.00 < 0.05)
thrpt: [+1.8102% +2.2133% +2.6300%]
Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) low mild
1 (1.00%) high mild
```
## Summary
This fixes a number of problems in the formatter that showed up with
various files in the [cpython](https://github.com/python/cpython)
repository. These problems surfaced as unstable formatting and invalid
code. This is not the entirety of problems discovered through cpython,
but a big enough chunk to separate it. Individual fixes are generally
individual commits. They were discovered with #5055, which i update as i
work through the output
## Test Plan
I added regression tests with links to cpython for each entry, except
for the two stubs that also got comment stubs since they'll be
implemented properly later.
## Summary
This PR runs `rustfmt` with a few nightly options as a one-time fix to
catch some malformatted comments. I ended up just running with:
```toml
condense_wildcard_suffixes = true
edition = "2021"
max_width = 100
normalize_comments = true
normalize_doc_attributes = true
reorder_impl_items = true
unstable_features = true
use_field_init_shorthand = true
```
Since these all seem like reasonable things to fix, so may as well while
I'm here.
I've written done my condensed learnings from working on the formatter
so that others can have an easier start working on it.
This is a pure docs change
This implements formatting ExprTuple, including magic trailing comma. I
intentionally didn't change the settings mechanism but just added a
dummy global const flag.
Besides the snapshots, I added custom breaking/joining tests and a
deeply nested test case. The diffs look better than previously, proper
black compatibility depends on parentheses handling.
---------
Co-authored-by: Micha Reiser <micha@reiser.io>
## Summary
We use `.trim()` and friends in a bunch of places, to strip whitespace
from source code. However, not all Unicode whitespace characters are
considered "whitespace" in Python, which only supports the standard
space, tab, and form-feed characters.
This PR audits our usages of `.trim()`, `.trim_start()`, `.trim_end()`,
and `char::is_whitespace`, and replaces them as appropriate with a new
`.trim_whitespace()` analogues, powered by a `PythonWhitespace` trait.
In general, the only place that should continue to use `.trim()` is
content within docstrings, which don't need to adhere to Python's
semantic definitions of whitespace.
Closes#4991.
## Summary
`ruff_newlines` becomes `ruff_python_whitespace`, and includes the
existing "universal newline" handlers alongside the Python
whitespace-specific utilities.
* Implement StmtPass
This implements StmtPass as `pass`.
The snapshot diff is small because pass mainly occurs in bodies and function (#4951) and if/for bodies.
* Implement StmtReturn
This implements StmtReturn as `return` or `return {value}`.
The snapshot diff is small because return occurs in functions (#4951)
* A basic StmtAssign formatter and better dummies for expressions
The goal of this PR was formatting StmtAssign since many nodes in the black tests (and in python in general) are after an assignment. This caused unstable formatting: The spacing of power op spacing depends on the type of the two involved expressions, but each expression was formatted as dummy string and re-parsed as a ExprName, so in the second round the different rules of ExprName were applied, causing unstable formatting.
This PR does not necessarily bring us closer to black's style, but it unlocks a good porting of black's test suite and is a basis for implementing the Expr nodes.
* fmt
* Review
<!--
Thank you for contributing to Ruff! To help us out with reviewing, please consider the following:
- Does this pull request include a summary of the change? (See below.)
- Does this pull request include a descriptive title?
- Does this pull request include references to any relevant issues?
-->
## Summary
This PR replaces the `verbatim_text` builder with a `not_yet_implemented` builder that emits `NOT_YET_IMPLEMENTED_<NodeKind>` for not yet implemented nodes.
The motivation for this change is that partially formatting compound statements can result in incorrectly indented code, which is a syntax error:
```python
def func_no_args():
a; b; c
if True: raise RuntimeError
if False: ...
for i in range(10):
print(i)
continue
```
Get's reformatted to
```python
def func_no_args():
a; b; c
if True: raise RuntimeError
if False: ...
for i in range(10):
print(i)
continue
```
because our formatter does not yet support `for` statements and just inserts the text from the source.
## Downsides
Using an identifier will not work in all situations. For example, an identifier is invalid in an `Arguments ` position. That's why I kept `verbatim_text` around and e.g. use it in the `Arguments` formatting logic where incorrect indentations are impossible (to my knowledge). Meaning, `verbatim_text` we can opt in to `verbatim_text` when we want to iterate quickly on nodes that we don't want to provide a full implementation yet and using an identifier would be invalid.
## Upsides
Running this on main discovered stability issues with the newline handling that were previously "hidden" because of the verbatim formatting. I guess that's an upside :)
## Test Plan
None?
<!--
Thank you for contributing to Ruff! To help us out with reviewing, please consider the following:
- Does this pull request include a summary of the change? (See below.)
- Does this pull request include a descriptive title?
- Does this pull request include references to any relevant issues?
-->
## Summary
This issue fixes the removal of empty lines between a leading comment and the previous statement:
```python
a = 20
# leading comment
b = 10
```
Ruff removed the empty line between `a` and `b` because:
* The leading comments formatting does not preserve leading newlines (to avoid adding new lines at the top of a body)
* The `JoinNodesBuilder` counted the lines before `b`, which is 1 -> Doesn't insert a new line
This is fixed by changing the `JoinNodesBuilder` to count the lines instead *after* the last node. This correctly gives 1, and the `# leading comment` will insert the empty lines between any other leading comment or the node.
## Test Plan
I added a new test for empty lines.
According to https://docs.python.org/3/library/ast.html#ast-helpers, we expect type_ignores to be always be empty, so this adds a debug assert.
Test plan: I confirmed that the assertion holdes for the file below and for all the black tests which include a number of `type: ignore` comments.
```python
# type: ignore
if 1:
print("1") # type: ignore
# elsebranch
# type: ignore
else: # type: ignore
print("2") # type: ignore
while 1:
print()
# type: ignore
```
* Implement module formatting using JoinNodesBuilder
This uses JoinNodesBuilder to implement module formatting for #4800
See the snapshots for the changed behaviour. See one PR up for a CLI that i used to verify the trailing new line behaviour
* Add a formatter CLI for debugging
This adds a ruff_python_formatter cli modelled aber `rustfmt` that i use for debugging
* clippy
* Add print IR and print comments options
Tested with `cargo run --bin ruff_python_formatter -- --print-ir --print-comments scratch.py`
<!--
Thank you for contributing to Ruff! To help us out with reviewing, please consider the following:
- Does this pull request include a summary of the change? (See below.)
- Does this pull request include a descriptive title?
- Does this pull request include references to any relevant issues?
-->
## Summary
And more custom logic around comments in bodies... uff.
Let's say we have the following code
```python
if x == y:
pass # trailing comment of pass
else: # trailing comment of `else`
print("I have no comments")
```
Right now, the formatter attaches the `# trailing comment of `else` as a trailing comment of `pass` because it doesn't "see" that there's an `else` keyword in between (because the else body is just a Vec and not a node).
This PR adds custom logic that attaches the trailing comments after the `else` as dangling comments to the `if` statement. The if statement must then split the dangling comments by `comments.text_position()`:
* All comments up to the first end-of-line comment are leading comments of the `else` keyword.
* All end-of-line comments coming after are `trailing` comments for the `else` keyword.
## Test Plan
I added new unit tests.
### Summary
This PR adds custom logic to handle end-of-line comments of the last statement in a body.
For example:
```python
while True:
if something.changed:
do.stuff() # trailing comment
b
```
The `# trailing comment` is a trailing comment of the `do.stuff()` expression statement. We incorrectly attached the comment as a trailing comment of the enclosing `while` statement because the comment is between the end of the while statement (the `while` statement ends right after `do.stuff()`) and before the `b` statement.
This PR fixes the placement to correctly attach these comments to the last statement in a body (recursively).
## Test Plan
I reviewed the snapshots and they now look correct. This may appear odd because a lot comments have now disappeared. This is the expected result because we use `verbatim` formatting for the block statements (like `while`) and that means that it only formats the inner content of the block, but not any trailing comments. The comments were visible before, because they were associated with the block statement (e.g. `while`).
* Add Format for Stmt
* Implement basic module formatting
This implements formatting each statement in a module with a hard line break in between, so that we can start formatting statements.
Basic testing is done by the snapshots
* Use dummy verbatim formatter for all nodes
* Use new formatter infrastructure in CLI and test
* Expose the new formatter in the CLI
* Merge import blocks
# Summary
We need to support CR line endings (as opposed to LF and CRLF line endings, which are already supported). They're rare, but they do appear in Python code, and we tend to panic on any file that uses them.
Our `Locator` abstraction now supports CR line endings. However, Rust's `str#lines` implementation does _not_.
This PR adds a `UniversalNewlineIterator` implementation that respects all of CR, LF, and CRLF line endings, and plugs it into most of the `.lines()` call sites.
As an alternative design, it could be nice if we could leverage `Locator` for this. We've already computed all of the line endings, so we could probably iterate much more efficiently?
# Test Plan
Largely relying on automated testing, however, also ran over some known failure cases, like #3404.
In hindsight, `ruff_python` is too general. A good giveaway is that it's actually a prefix of some other crates. The intent of this crate is to reimplement pieces of the Python standard library and CPython itself, so `ruff_python_stdlib` feels appropriate.
This PR enables us to apply the proper quotation marks, including support for escapes. There are some significant TODOs, especially around implicit concatenations like:
```py
(
"abc"
"def"
)
```
Which are represented as a single AST node, which requires us to tokenize _within_ the formatter to identify all the individual string parts.
I manually changed these in #3080 and #3083 to get the tests passing (with notes around the deviations) -- but that's no longer necessary, now that we have proper testing that takes deviations into account.
This just re-formats all the `.py.expect` files with Black, both to add a trailing newline and be doubly-certain that they're correctly formatted.
I also ensured that we add a hard line break after each statement, and that we avoid including an extra newline in the generated Markdown (since the code should contain the exact expected newlines).
This PR changes the testing infrastructure to run all black tests and:
* Pass if Ruff and Black generate the same formatting
* Fail and write a markdown snapshot that shows the input code, the differences between Black and Ruff, Ruffs output, and Blacks output
This is achieved by introducing a new `fixture` macro (open to better name suggestions) that "duplicates" the attributed test for every file that matches the specified glob pattern. Creating a new test for each file over having a test that iterates over all files has the advantage that you can run a single test, and that test failures indicate which case is failing.
The `fixture` macro also makes it straightforward to e.g. setup our own spec tests that test very specific formatting by creating a new folder and use insta to assert the formatted output.
# Summary
This PR contains the code for the autoformatter proof-of-concept.
## Crate structure
The primary formatting hook is the `fmt` function in `crates/ruff_python_formatter/src/lib.rs`.
The current formatter approach is outlined in `crates/ruff_python_formatter/src/lib.rs`, and is structured as follows:
- Tokenize the code using the RustPython lexer.
- In `crates/ruff_python_formatter/src/trivia.rs`, extract a variety of trivia tokens from the token stream. These include comments, trailing commas, and empty lines.
- Generate the AST via the RustPython parser.
- In `crates/ruff_python_formatter/src/cst.rs`, convert the AST to a CST structure. As of now, the CST is nearly identical to the AST, except that every node gets a `trivia` vector. But we might want to modify it further.
- In `crates/ruff_python_formatter/src/attachment.rs`, attach each trivia token to the corresponding CST node. The logic for this is mostly in `decorate_trivia` and is ported almost directly from Prettier (given each token, find its preceding, following, and enclosing nodes, then attach the token to the appropriate node in a second pass).
- In `crates/ruff_python_formatter/src/newlines.rs`, normalize newlines to match Black’s preferences. This involves traversing the CST and inserting or removing `TriviaToken` values as we go.
- Call `format!` on the CST, which delegates to type-specific formatter implementations (e.g., `crates/ruff_python_formatter/src/format/stmt.rs` for `Stmt` nodes, and similar for `Expr` nodes; the others are trivial). Those type-specific implementations delegate to kind-specific functions (e.g., `format_func_def`).
## Testing and iteration
The formatter is being developed against the Black test suite, which was copied over in-full to `crates/ruff_python_formatter/resources/test/fixtures/black`.
The Black fixtures had to be modified to create `[insta](https://github.com/mitsuhiko/insta)`-compatible snapshots, which now exist in the repo.
My approach thus far has been to try and improve coverage by tackling fixtures one-by-one.
## What works, and what doesn’t
- *Most* nodes are supported at a basic level (though there are a few stragglers at time of writing, like `StmtKind::Try`).
- Newlines are properly preserved in most cases.
- Magic trailing commas are properly preserved in some (but not all) cases.
- Trivial leading and trailing standalone comments mostly work (although maybe not at the end of a file).
- Inline comments, and comments within expressions, often don’t work -- they work in a few cases, but it’s one-off right now. (We’re probably associating them with the “right” nodes more often than we are actually rendering them in the right place.)
- We don’t properly normalize string quotes. (At present, we just repeat any constants verbatim.)
- We’re mishandling a bunch of wrapping cases (if we treat Black as the reference implementation). Here are a few examples (demonstrating Black's stable behavior):
```py
# In some cases, if the end expression is "self-closing" (functions,
# lists, dictionaries, sets, subscript accesses, and any length-two
# boolean operations that end in these elments), Black
# will wrap like this...
if some_expression and f(
b,
c,
d,
):
pass
# ...whereas we do this:
if (
some_expression
and f(
b,
c,
d,
)
):
pass
# If function arguments can fit on a single line, then Black will
# format them like this, rather than exploding them vertically.
if f(
a, b, c, d, e, f, g, ...
):
pass
```
- We don’t properly preserve parentheses in all cases. Black preserves parentheses in some but not all cases.