Commit Graph

23 Commits

Author SHA1 Message Date
Micha Reiser c847cad389
Update insta snapshots (#14366) 2024-11-15 19:31:15 +01:00
renovate[bot] 53a80a5c11
Update Rust crate rustc-hash to v2 (#12001) 2024-06-23 20:46:42 -04:00
Dhruv Manilawala 13ffb5bc19
Replace LALRPOP parser with hand-written parser (#10036)
(Supersedes #9152, authored by @LaBatata101)

## Summary

This PR replaces the current parser generated from LALRPOP to a
hand-written recursive descent parser.

It also updates the grammar for [PEP
646](https://peps.python.org/pep-0646/) so that the parser outputs the
correct AST. For example, in `data[*x]`, the index expression is now a
tuple with a single starred expression instead of just a starred
expression.

Beyond the performance improvements, the parser is also error resilient
and can provide better error messages. The behavior as seen by any
downstream tools isn't changed. That is, the linter and formatter can
still assume that the parser will _stop_ at the first syntax error. This
will be updated in the following months.

For more details about the change here, refer to the PR corresponding to
the individual commits and the release blog post.

## Test Plan

Write _lots_ and _lots_ of tests for both valid and invalid syntax and
verify the output.

## Acknowledgements

- @MichaReiser for reviewing 100+ parser PRs and continuously providing
guidance throughout the project
- @LaBatata101 for initiating the transition to a hand-written parser in
#9152
- @addisoncrump for implementing the fuzzer which helped
[catch](https://github.com/astral-sh/ruff/pull/10903)
[a](https://github.com/astral-sh/ruff/pull/10910)
[lot](https://github.com/astral-sh/ruff/pull/10966)
[of](https://github.com/astral-sh/ruff/pull/10896)
[bugs](https://github.com/astral-sh/ruff/pull/10877)

---------

Co-authored-by: Victor Hugo Gomes <labatata101@linuxmail.org>
Co-authored-by: Micha Reiser <micha@reiser.io>
2024-04-18 17:57:39 +05:30
Dhruv Manilawala 230c9ce236
Split `Constant` to individual literal nodes (#8064)
## Summary

This PR splits the `Constant` enum as individual literal nodes. It
introduces the following new nodes for each variant:
* `ExprStringLiteral`
* `ExprBytesLiteral`
* `ExprNumberLiteral`
* `ExprBooleanLiteral`
* `ExprNoneLiteral`
* `ExprEllipsisLiteral`

The main motivation behind this refactor is to introduce the new AST
node for implicit string concatenation in the coming PR. The elements of
that node will be either a string literal, bytes literal or a f-string
which can be implemented using an enum. This means that a string or
bytes literal cannot be represented by `Constant::Str` /
`Constant::Bytes` which creates an inconsistency.

This PR avoids that inconsistency by splitting the constant nodes into
it's own literal nodes, literal being the more appropriate naming
convention from a static analysis tool perspective.

This also makes working with literals in the linter and formatter much
more ergonomic like, for example, if one would want to check if this is
a string literal, it can be done easily using
`Expr::is_string_literal_expr` or matching against `Expr::StringLiteral`
as oppose to matching against the `ExprConstant` and enum `Constant`. A
few AST helper methods can be simplified as well which will be done in a
follow-up PR.

This introduces a new `Expr::is_literal_expr` method which is the same
as `Expr::is_constant_expr`. There are also intermediary changes related
to implicit string concatenation which are quiet less. This is done so
as to avoid having a huge PR which this already is.

## Test Plan

1. Verify and update all of the existing snapshots (parser, visitor)
2. Verify that the ecosystem check output remains **unchanged** for both
the linter and formatter

### Formatter ecosystem check

#### `main`

| project | similarity index | total files | changed files |

|----------------|------------------:|------------------:|------------------:|
| cpython | 0.75803 | 1799 | 1647 |
| django | 0.99983 | 2772 | 34 |
| home-assistant | 0.99953 | 10596 | 186 |
| poetry | 0.99891 | 317 | 17 |
| transformers | 0.99966 | 2657 | 330 |
| twine | 1.00000 | 33 | 0 |
| typeshed | 0.99978 | 3669 | 20 |
| warehouse | 0.99977 | 654 | 13 |
| zulip | 0.99970 | 1459 | 22 |

#### `dhruv/constant-to-literal`

| project | similarity index | total files | changed files |

|----------------|------------------:|------------------:|------------------:|
| cpython | 0.75803 | 1799 | 1647 |
| django | 0.99983 | 2772 | 34 |
| home-assistant | 0.99953 | 10596 | 186 |
| poetry | 0.99891 | 317 | 17 |
| transformers | 0.99966 | 2657 | 330 |
| twine | 1.00000 | 33 | 0 |
| typeshed | 0.99978 | 3669 | 20 |
| warehouse | 0.99977 | 654 | 13 |
| zulip | 0.99970 | 1459 | 22 |
2023-10-30 12:13:23 +05:30
Micha Reiser 26ae0a6e8d
Fix dangling module comments (#7456) 2023-09-17 14:56:41 +00:00
Charlie Marsh 376d3caf47
Treat empty-line separated comments as trailing statement comments (#6999)
## Summary

This PR modifies our between-statement comment handling such that
comments that are not separated by a statement by any newlines continue
to be treated as leading comments on the statement, but comments that
_are_ separated are instead formatted as trailing comments on the
preceding statement.

See, e.g., the originating snippet:

```python
DEFAULT_TEMPLATE = "flatpages/default.html"

# This view is called from FlatpageFallbackMiddleware.process_response
# when a 404 is raised, which often means CsrfViewMiddleware.process_view
# has not been called even if CsrfViewMiddleware is installed. So we need
# to use @csrf_protect, in case the template needs {% csrf_token %}.
# However, we can't just wrap this view; if no matching flatpage exists,
# or a redirect is required for authentication, the 404 needs to be returned
# without any CSRF checks. Therefore, we only
# CSRF protect the internal implementation.


def flatpage(request, url):
    pass
```

Here, we need to ensure that the `def flatpage` is precede by two empty
lines. However, we want those two empty lines to be enforced from the
_end_ of the comment block, _unless_ the comments are directly atop the
`def flatpage`.

I played with this a bit, and I think the simplest conceptual model and
implementation is to instead treat those as trailing comments on the
preceding node. The main difficulty with this approach is that, in order
to be fully compatible with Black, we'd sometimes need to insert
newlines _between_ the preceding node and its trailing comments. See,
e.g.:

```python
def func():
    ...
# comment

x = 1
```

In this case, we'd need to insert two blank lines between `def func():
...` and `# comment`, but `# comment` is trailing comment on `def
func(): ...`. So, we'd need to take this case into account in the
various nodes that _require_ newlines after them: functions, classes,
and imports. After some discussion, we've opted _not_ to support this,
and just treat these as trailing comments -- so we won't insert newlines
there. This means our handling is still identical to Black's on
Black-formatted code, but avoids moving such trailing comments on
unformatted code.

I dislike that the empty handling is so complex, and that it's split
between so many different nodes, but this is really tricky. Continuing
to treat these as leading comments is very difficult too, since we'd
need to do similar tricks for the leading comment handling in those
nodes, and influencing leading comments is even harder, since they're
all formatted _before_ the node itself.

Closes https://github.com/astral-sh/ruff/issues/6761.

## Test Plan

`cargo test`

Surprisingly, it doesn't change the similarity at all (apart from a
0.00001 change in CPython), but I manually confirmed that it did fix the
originating issue in Django.

Before:

| project      | similarity index |
|--------------|------------------|
| cpython      | 0.76082          |
| django       | 0.99921          |
| transformers | 0.99854          |
| twine        | 0.99982          |
| typeshed     | 0.99953          |
| warehouse    | 0.99648          |
| zulip        | 0.99928          |


After:

| project      | similarity index |
|--------------|------------------|
| cpython      | 0.76081          |
| django       | 0.99921          |
| transformers | 0.99854          |
| twine        | 0.99982          |
| typeshed     | 0.99953          |
| warehouse    | 0.99648          |
| zulip        | 0.99928          |
2023-08-31 20:55:05 +00:00
Charlie Marsh a3d4f08f29
Add general support for parenthesized comments on expressions (#6485)
## Summary

This PR adds support for parenthesized comments. A parenthesized comment
is a comment that appears within a parenthesis, but not within the range
of the expression enclosed by the parenthesis. For example, the comment
here is a parenthesized comment:

```python
if (
    # comment
    True
):
    ...
```

The parentheses enclose the `True`, but the range of `True` doesn’t
include the `# comment`.

There are at least two problems associated with parenthesized comments:
(1) associating the comment with the correct (i.e., enclosed) node; and
(2) formatting the comment correctly, once it has been associated with
the enclosed node.

The solution proposed here for (1) is to search for parentheses between
preceding and following node, and use open and close parentheses to
break ties, rather than always assigning to the preceding node.

For (2), we handle these special parenthesized comments in `FormatExpr`.
The biggest risk with this approach is that we forget some codepath that
force-disables parenthesization (by passing in `Parentheses::Never`).
I've audited all usages of that enum and added additional handling +
test coverage for such cases.

Closes https://github.com/astral-sh/ruff/issues/6390.

## Test Plan

`cargo test` with new cases.

Before:

| project      | similarity index |
|--------------|------------------|
| build        | 0.75623          |
| cpython      | 0.75472          |
| django       | 0.99804          |
| transformers | 0.99618          |
| typeshed     | 0.74233          |
| warehouse    | 0.99601          |
| zulip        | 0.99727          |

After:

| project      | similarity index |
|--------------|------------------|
| build        | 0.75623          |
| cpython      | 0.75472          |
| django       | 0.99804          |
| transformers | 0.99618          |
| typeshed     | 0.74237          |
| warehouse    | 0.99601          |
| zulip        | 0.99727          |
2023-08-15 18:59:18 +00:00
Victor Hugo Gomes b05574babd
Fix formatter instability with half-indented comment (#6460)
## Summary
The bug was happening in this
[loop](75f402eb82/crates/ruff_python_formatter/src/comments/placement.rs (L545)).

Basically, In the first iteration of the loop, the `comment_indentation`
is bigger than `child_indentation` (`comment_indentation` is 7 and
`child_indentation` is 4) making the `Ordering::Greater` branch execute.
Inside the `Ordering::Greater` branch, the `if` block gets executed,
resulting in the update of these variables.
```rust
parent_body = current_body;                    
current_body = Some(last_child_in_current_body);
last_child_in_current_body = nested_child;
```
In the second iteration of the loop, `comment_indentation` is smaller
than `child_indentation` (`comment_indentation` is 7 and
`child_indentation` is 8) making the `Ordering::Less` branch execute.
Inside the `Ordering::Less` branch, the `if` block gets executed, this
is where the bug was happening. At this point `parent_body` should be a
`StmtFunctionDef` but it was a `StmtClassDef`. Causing the comment to be
incorrectly formatted.

That happened for the following code:
```python
class A:
    def f():
        pass
       # strangely indented comment

print()
```

There is only one problem that I couldn't figure it out a solution, the
variable `current_body` in this
[line](75f402eb82/crates/ruff_python_formatter/src/comments/placement.rs (L542C5-L542C49))
now gives this warning _"value assigned to `current_body` is never read
maybe it is overwritten before being read?"_
Any tips on how to solve that?

Closes #5337

## Test Plan

Add new test case.

---------

Co-authored-by: konstin <konstin@mailbox.org>
2023-08-11 11:21:16 +00:00
Charlie Marsh adc8bb7821
Rename `Arguments` to `Parameters` in the AST (#6253)
## Summary

This PR renames a few AST nodes for clarity:

- `Arguments` is now `Parameters`
- `Arg` is now `Parameter`
- `ArgWithDefault` is now `ParameterWithDefault`

For now, the attribute names that reference `Parameters` directly are
changed (e.g., on `StmtFunctionDef`), but the attributes on `Parameters`
itself are not (e.g., `vararg`). We may revisit that decision in the
future.

For context, the AST node formerly known as `Arguments` is used in
function definitions. Formally (outside of the Python context),
"arguments" typically refers to "the values passed to a function", while
"parameters" typically refers to "the variables used in a function
definition". E.g., if you Google "arguments vs parameters", you'll get
some explanation like:

> A parameter is a variable in a function definition. It is a
placeholder and hence does not have a concrete value. An argument is a
value passed during function invocation.

We're thus deviating from Python's nomenclature in favor of a scheme
that we find to be more precise.
2023-08-01 13:53:28 -04:00
konsti 13f9a16e33
Rewrite placement logic (#6040)
## Summary
This is a rewrite of the main comment placement logic. `place_comment`
now has three parts:

- place own line comments
  - between branches
  - after a branch
- place end-of-line comments
  - after colon
  - after a branch
- place comments for specific nodes (that include module level comments)

The rewrite fixed three bugs: `class A: # trailing comment` comments now
stay end-of-line, `try: # comment` remains end-of-line and deeply
indented try-else-finally comments remain with the right nested
statement.

It will be much easier to give more alternative branches nodes since
this is abstracted away by `is_node_with_body` and the first/last child
helpers. Adding new node types can now be done by adding an entry to the
`place_comment` match. The code went from 1526 lines before #6033 to
1213 lines now.

It thinks it easier to just read the new `placement.rs` rather than
reviewing the diff.

## Test Plan

The existing fixtures staying the same or improving plus new ones for
the bug fixes.
2023-07-26 16:21:23 +00:00
konsti 730e6b2b4c
Refactor `StmtIf`: Formatter and Linter (#5459)
## Summary

Previously, `StmtIf` was defined recursively as
```rust
pub struct StmtIf {
    pub range: TextRange,
    pub test: Box<Expr>,
    pub body: Vec<Stmt>,
    pub orelse: Vec<Stmt>,
}
```
Every `elif` was represented as an `orelse` with a single `StmtIf`. This
means that this representation couldn't differentiate between
```python
if cond1:
    x = 1
else:
    if cond2:
        x = 2
```
and 
```python
if cond1:
    x = 1
elif cond2:
    x = 2
```
It also makes many checks harder than they need to be because we have to
recurse just to iterate over an entire if-elif-else and because we're
lacking nodes and ranges on the `elif` and `else` branches.

We change the representation to a flat

```rust
pub struct StmtIf {
    pub range: TextRange,
    pub test: Box<Expr>,
    pub body: Vec<Stmt>,
    pub elif_else_clauses: Vec<ElifElseClause>,
}

pub struct ElifElseClause {
    pub range: TextRange,
    pub test: Option<Expr>,
    pub body: Vec<Stmt>,
}
```
where `test: Some(_)` represents an `elif` and `test: None` an else.

This representation is different tradeoff, e.g. we need to allocate the
`Vec<ElifElseClause>`, the `elif`s are now different than the `if`s
(which matters in rules where want to check both `if`s and `elif`s) and
the type system doesn't guarantee that the `test: None` else is actually
last. We're also now a bit more inconsistent since all other `else`,
those from `for`, `while` and `try`, still don't have nodes. With the
new representation some things became easier, e.g. finding the `elif`
token (we can use the start of the `ElifElseClause`) and formatting
comments for if-elif-else (no more dangling comments splitting, we only
have to insert the dangling comment after the colon manually and set
`leading_alternate_branch_comments`, everything else is taken of by
having nodes for each branch and the usual placement.rs fixups).

## Merge Plan

This PR requires coordination between the parser repo and the main ruff
repo. I've split the ruff part, into two stacked PRs which have to be
merged together (only the second one fixes all tests), the first for the
formatter to be reviewed by @michareiser and the second for the linter
to be reviewed by @charliermarsh.

* MH: Review and merge
https://github.com/astral-sh/RustPython-Parser/pull/20
* MH: Review and merge or move later in stack
https://github.com/astral-sh/RustPython-Parser/pull/21
* MH: Review and approve
https://github.com/astral-sh/RustPython-Parser/pull/22
* MH: Review and approve formatter PR
https://github.com/astral-sh/ruff/pull/5459
* CM: Review and approve linter PR
https://github.com/astral-sh/ruff/pull/5460
* Merge linter PR in formatter PR, fix ecosystem checks (ecosystem
checks can't run on the formatter PR and won't run on the linter PR, so
we need to merge them first)
 * Merge https://github.com/astral-sh/RustPython-Parser/pull/22
 * Create tag in the parser, update linter+formatter PR
 * Merge linter+formatter PR https://github.com/astral-sh/ruff/pull/5459

---------

Co-authored-by: Micha Reiser <micha@reiser.io>
2023-07-18 13:40:15 +02:00
Micha Reiser e520a3a721
Fix ArgWithDefault comments handling (#5204) 2023-06-20 20:48:07 +00:00
Charlie Marsh 36e01ad6eb
Upgrade RustPython (#5192)
## Summary

This PR upgrade RustPython to pull in the changes to `Arguments` (zip
defaults with their identifiers) and all the renames to `CmpOp` and
friends.
2023-06-19 21:09:53 +00:00
Micha Reiser 68969240c5
Format Function definitions (#4951) 2023-06-08 16:07:33 +00:00
Micha Reiser 39a1f3980f
Upgrade RustPython (#4900) 2023-06-08 05:53:14 +00:00
Micha Reiser c65f47d7c4
Format `while` Statement (#4810) 2023-06-05 08:24:00 +00:00
Micha Reiser d6daa61563
Handle trailing end-of-line comments in-between-bodies (#4812)
<!--
Thank you for contributing to Ruff! To help us out with reviewing, please consider the following:

- Does this pull request include a summary of the change? (See below.)
- Does this pull request include a descriptive title?
- Does this pull request include references to any relevant issues?
-->

## Summary

And more custom logic around comments in bodies... uff. 

Let's say we have the following code

```python
if x == y:
    pass # trailing comment of pass
else: # trailing comment of `else`
    print("I have no comments")
```

Right now, the formatter attaches the `# trailing comment of `else` as a trailing comment of `pass` because it doesn't "see" that there's an `else` keyword in between (because the else body is just a Vec and not a node). 

This PR adds custom logic that attaches the trailing comments after the `else` as dangling comments to the `if` statement. The if statement must then split the dangling comments by `comments.text_position()`:
* All comments up to the first end-of-line comment are leading comments of the `else` keyword.
* All end-of-line comments coming after are `trailing` comments for the `else` keyword.


## Test Plan

I added new unit tests.
2023-06-03 15:29:22 +02:00
Micha Reiser cb6788ab5f
Handle trailing body end-of-line comments (#4811)
### Summary

This PR adds custom logic to handle end-of-line comments of the last statement in a body. 

For example: 

```python
while True:
    if something.changed:
        do.stuff()  # trailing comment

b
```

The `# trailing comment` is a trailing comment of the `do.stuff()` expression statement. We incorrectly attached the comment as a trailing comment of the enclosing `while` statement  because the comment is between the end of the while statement (the `while` statement ends right after `do.stuff()`) and before the `b` statement. 


This PR fixes the placement to correctly attach these comments to the last statement in a body (recursively). 

## Test Plan

I reviewed the snapshots and they now look correct. This may appear odd because a lot comments have now disappeared. This is the expected result because we use `verbatim` formatting for the block statements (like `while`) and that means that it only formats the inner content of the block, but not any trailing comments. The comments were visible before, because they were associated with the block statement (e.g. `while`).
2023-06-03 15:17:33 +02:00
Micha Reiser 59148344be
Place comments of left and right binary expression operands (#4751) 2023-06-01 07:01:32 +00:00
Micha Reiser b7294b48e7
Handle positional-only-arguments separator comments (#4748) 2023-06-01 06:22:49 +00:00
Micha Reiser be31d71849
Correctly associate own-line comments in bodies (#4671) 2023-06-01 08:12:53 +02:00
Micha Reiser 0cd453bdf0
Generic "comment to node" association logic (#4642) 2023-05-30 09:28:01 +00:00
Micha Reiser 84a5584888
Add `Comments` data structure (#4641) 2023-05-30 08:54:55 +00:00