Python/ruff - ruff - Gitea: Git with a cup of tea

Commit Graph

Author	SHA1	Message	Date
Micha Reiser	c847cad389	Update insta snapshots (#14366 )	2024-11-15 19:31:15 +01:00
renovate[bot]	53a80a5c11	Update Rust crate rustc-hash to v2 (#12001 )	2024-06-23 20:46:42 -04:00
Dhruv Manilawala	13ffb5bc19	Replace LALRPOP parser with hand-written parser (#10036 ) (Supersedes #9152, authored by @LaBatata101) ## Summary This PR replaces the current parser generated from LALRPOP to a hand-written recursive descent parser. It also updates the grammar for [PEP 646](https://peps.python.org/pep-0646/) so that the parser outputs the correct AST. For example, in `data[*x]`, the index expression is now a tuple with a single starred expression instead of just a starred expression. Beyond the performance improvements, the parser is also error resilient and can provide better error messages. The behavior as seen by any downstream tools isn't changed. That is, the linter and formatter can still assume that the parser will _stop_ at the first syntax error. This will be updated in the following months. For more details about the change here, refer to the PR corresponding to the individual commits and the release blog post. ## Test Plan Write _lots_ and _lots_ of tests for both valid and invalid syntax and verify the output. ## Acknowledgements - @MichaReiser for reviewing 100+ parser PRs and continuously providing guidance throughout the project - @LaBatata101 for initiating the transition to a hand-written parser in #9152 - @addisoncrump for implementing the fuzzer which helped [catch](https://github.com/astral-sh/ruff/pull/10903) [a](https://github.com/astral-sh/ruff/pull/10910) [lot](https://github.com/astral-sh/ruff/pull/10966) [of](https://github.com/astral-sh/ruff/pull/10896) [bugs](https://github.com/astral-sh/ruff/pull/10877) --------- Co-authored-by: Victor Hugo Gomes <labatata101@linuxmail.org> Co-authored-by: Micha Reiser <micha@reiser.io>	2024-04-18 17:57:39 +05:30
Dhruv Manilawala	230c9ce236	Split `Constant` to individual literal nodes (#8064 ) ## Summary This PR splits the `Constant` enum as individual literal nodes. It introduces the following new nodes for each variant: * `ExprStringLiteral` * `ExprBytesLiteral` * `ExprNumberLiteral` * `ExprBooleanLiteral` * `ExprNoneLiteral` * `ExprEllipsisLiteral` The main motivation behind this refactor is to introduce the new AST node for implicit string concatenation in the coming PR. The elements of that node will be either a string literal, bytes literal or a f-string which can be implemented using an enum. This means that a string or bytes literal cannot be represented by `Constant::Str` / `Constant::Bytes` which creates an inconsistency. This PR avoids that inconsistency by splitting the constant nodes into it's own literal nodes, literal being the more appropriate naming convention from a static analysis tool perspective. This also makes working with literals in the linter and formatter much more ergonomic like, for example, if one would want to check if this is a string literal, it can be done easily using `Expr::is_string_literal_expr` or matching against `Expr::StringLiteral` as oppose to matching against the `ExprConstant` and enum `Constant`. A few AST helper methods can be simplified as well which will be done in a follow-up PR. This introduces a new `Expr::is_literal_expr` method which is the same as `Expr::is_constant_expr`. There are also intermediary changes related to implicit string concatenation which are quiet less. This is done so as to avoid having a huge PR which this already is. ## Test Plan 1. Verify and update all of the existing snapshots (parser, visitor) 2. Verify that the ecosystem check output remains unchanged for both the linter and formatter ### Formatter ecosystem check #### `main` \| project \| similarity index \| total files \| changed files \| \|----------------\|------------------:\|------------------:\|------------------:\| \| cpython \| 0.75803 \| 1799 \| 1647 \| \| django \| 0.99983 \| 2772 \| 34 \| \| home-assistant \| 0.99953 \| 10596 \| 186 \| \| poetry \| 0.99891 \| 317 \| 17 \| \| transformers \| 0.99966 \| 2657 \| 330 \| \| twine \| 1.00000 \| 33 \| 0 \| \| typeshed \| 0.99978 \| 3669 \| 20 \| \| warehouse \| 0.99977 \| 654 \| 13 \| \| zulip \| 0.99970 \| 1459 \| 22 \| #### `dhruv/constant-to-literal` \| project \| similarity index \| total files \| changed files \| \|----------------\|------------------:\|------------------:\|------------------:\| \| cpython \| 0.75803 \| 1799 \| 1647 \| \| django \| 0.99983 \| 2772 \| 34 \| \| home-assistant \| 0.99953 \| 10596 \| 186 \| \| poetry \| 0.99891 \| 317 \| 17 \| \| transformers \| 0.99966 \| 2657 \| 330 \| \| twine \| 1.00000 \| 33 \| 0 \| \| typeshed \| 0.99978 \| 3669 \| 20 \| \| warehouse \| 0.99977 \| 654 \| 13 \| \| zulip \| 0.99970 \| 1459 \| 22 \|	2023-10-30 12:13:23 +05:30
Micha Reiser	26ae0a6e8d	Fix dangling module comments (#7456 )	2023-09-17 14:56:41 +00:00
Charlie Marsh	376d3caf47	Treat empty-line separated comments as trailing statement comments (#6999 ) ## Summary This PR modifies our between-statement comment handling such that comments that are not separated by a statement by any newlines continue to be treated as leading comments on the statement, but comments that _are_ separated are instead formatted as trailing comments on the preceding statement. See, e.g., the originating snippet: ```python DEFAULT_TEMPLATE = "flatpages/default.html" # This view is called from FlatpageFallbackMiddleware.process_response # when a 404 is raised, which often means CsrfViewMiddleware.process_view # has not been called even if CsrfViewMiddleware is installed. So we need # to use @csrf_protect, in case the template needs {% csrf_token %}. # However, we can't just wrap this view; if no matching flatpage exists, # or a redirect is required for authentication, the 404 needs to be returned # without any CSRF checks. Therefore, we only # CSRF protect the internal implementation. def flatpage(request, url): pass ``` Here, we need to ensure that the `def flatpage` is precede by two empty lines. However, we want those two empty lines to be enforced from the _end_ of the comment block, _unless_ the comments are directly atop the `def flatpage`. I played with this a bit, and I think the simplest conceptual model and implementation is to instead treat those as trailing comments on the preceding node. The main difficulty with this approach is that, in order to be fully compatible with Black, we'd sometimes need to insert newlines _between_ the preceding node and its trailing comments. See, e.g.: ```python def func(): ... # comment x = 1 ``` In this case, we'd need to insert two blank lines between `def func(): ...` and `# comment`, but `# comment` is trailing comment on `def func(): ...`. So, we'd need to take this case into account in the various nodes that _require_ newlines after them: functions, classes, and imports. After some discussion, we've opted _not_ to support this, and just treat these as trailing comments -- so we won't insert newlines there. This means our handling is still identical to Black's on Black-formatted code, but avoids moving such trailing comments on unformatted code. I dislike that the empty handling is so complex, and that it's split between so many different nodes, but this is really tricky. Continuing to treat these as leading comments is very difficult too, since we'd need to do similar tricks for the leading comment handling in those nodes, and influencing leading comments is even harder, since they're all formatted _before_ the node itself. Closes https://github.com/astral-sh/ruff/issues/6761. ## Test Plan `cargo test` Surprisingly, it doesn't change the similarity at all (apart from a 0.00001 change in CPython), but I manually confirmed that it did fix the originating issue in Django. Before: \| project \| similarity index \| \|--------------\|------------------\| \| cpython \| 0.76082 \| \| django \| 0.99921 \| \| transformers \| 0.99854 \| \| twine \| 0.99982 \| \| typeshed \| 0.99953 \| \| warehouse \| 0.99648 \| \| zulip \| 0.99928 \| After: \| project \| similarity index \| \|--------------\|------------------\| \| cpython \| 0.76081 \| \| django \| 0.99921 \| \| transformers \| 0.99854 \| \| twine \| 0.99982 \| \| typeshed \| 0.99953 \| \| warehouse \| 0.99648 \| \| zulip \| 0.99928 \|	2023-08-31 20:55:05 +00:00
Charlie Marsh	a3d4f08f29	Add general support for parenthesized comments on expressions (#6485 ) ## Summary This PR adds support for parenthesized comments. A parenthesized comment is a comment that appears within a parenthesis, but not within the range of the expression enclosed by the parenthesis. For example, the comment here is a parenthesized comment: ```python if ( # comment True ): ... ``` The parentheses enclose the `True`, but the range of `True` doesn’t include the `# comment`. There are at least two problems associated with parenthesized comments: (1) associating the comment with the correct (i.e., enclosed) node; and (2) formatting the comment correctly, once it has been associated with the enclosed node. The solution proposed here for (1) is to search for parentheses between preceding and following node, and use open and close parentheses to break ties, rather than always assigning to the preceding node. For (2), we handle these special parenthesized comments in `FormatExpr`. The biggest risk with this approach is that we forget some codepath that force-disables parenthesization (by passing in `Parentheses::Never`). I've audited all usages of that enum and added additional handling + test coverage for such cases. Closes https://github.com/astral-sh/ruff/issues/6390. ## Test Plan `cargo test` with new cases. Before: \| project \| similarity index \| \|--------------\|------------------\| \| build \| 0.75623 \| \| cpython \| 0.75472 \| \| django \| 0.99804 \| \| transformers \| 0.99618 \| \| typeshed \| 0.74233 \| \| warehouse \| 0.99601 \| \| zulip \| 0.99727 \| After: \| project \| similarity index \| \|--------------\|------------------\| \| build \| 0.75623 \| \| cpython \| 0.75472 \| \| django \| 0.99804 \| \| transformers \| 0.99618 \| \| typeshed \| 0.74237 \| \| warehouse \| 0.99601 \| \| zulip \| 0.99727 \|	2023-08-15 18:59:18 +00:00
Victor Hugo Gomes	b05574babd	Fix formatter instability with half-indented comment (#6460 ) ## Summary The bug was happening in this [loop](`75f402eb82/crates/ruff_python_formatter/src/comments/placement.rs (L545)`). Basically, In the first iteration of the loop, the `comment_indentation` is bigger than `child_indentation` (`comment_indentation` is 7 and `child_indentation` is 4) making the `Ordering::Greater` branch execute. Inside the `Ordering::Greater` branch, the `if` block gets executed, resulting in the update of these variables. ```rust parent_body = current_body; current_body = Some(last_child_in_current_body); last_child_in_current_body = nested_child; ``` In the second iteration of the loop, `comment_indentation` is smaller than `child_indentation` (`comment_indentation` is 7 and `child_indentation` is 8) making the `Ordering::Less` branch execute. Inside the `Ordering::Less` branch, the `if` block gets executed, this is where the bug was happening. At this point `parent_body` should be a `StmtFunctionDef` but it was a `StmtClassDef`. Causing the comment to be incorrectly formatted. That happened for the following code: ```python class A: def f(): pass # strangely indented comment print() ``` There is only one problem that I couldn't figure it out a solution, the variable `current_body` in this [line](`75f402eb82/crates/ruff_python_formatter/src/comments/placement.rs (L542C5-L542C49)`) now gives this warning _"value assigned to `current_body` is never read maybe it is overwritten before being read?"_ Any tips on how to solve that? Closes #5337 ## Test Plan Add new test case. --------- Co-authored-by: konstin <konstin@mailbox.org>	2023-08-11 11:21:16 +00:00
Charlie Marsh	adc8bb7821	Rename `Arguments` to `Parameters` in the AST (#6253 ) ## Summary This PR renames a few AST nodes for clarity: - `Arguments` is now `Parameters` - `Arg` is now `Parameter` - `ArgWithDefault` is now `ParameterWithDefault` For now, the attribute names that reference `Parameters` directly are changed (e.g., on `StmtFunctionDef`), but the attributes on `Parameters` itself are not (e.g., `vararg`). We may revisit that decision in the future. For context, the AST node formerly known as `Arguments` is used in function definitions. Formally (outside of the Python context), "arguments" typically refers to "the values passed to a function", while "parameters" typically refers to "the variables used in a function definition". E.g., if you Google "arguments vs parameters", you'll get some explanation like: > A parameter is a variable in a function definition. It is a placeholder and hence does not have a concrete value. An argument is a value passed during function invocation. We're thus deviating from Python's nomenclature in favor of a scheme that we find to be more precise.	2023-08-01 13:53:28 -04:00
konsti	13f9a16e33	Rewrite placement logic (#6040 ) ## Summary This is a rewrite of the main comment placement logic. `place_comment` now has three parts: - place own line comments - between branches - after a branch - place end-of-line comments - after colon - after a branch - place comments for specific nodes (that include module level comments) The rewrite fixed three bugs: `class A: # trailing comment` comments now stay end-of-line, `try: # comment` remains end-of-line and deeply indented try-else-finally comments remain with the right nested statement. It will be much easier to give more alternative branches nodes since this is abstracted away by `is_node_with_body` and the first/last child helpers. Adding new node types can now be done by adding an entry to the `place_comment` match. The code went from 1526 lines before #6033 to 1213 lines now. It thinks it easier to just read the new `placement.rs` rather than reviewing the diff. ## Test Plan The existing fixtures staying the same or improving plus new ones for the bug fixes.	2023-07-26 16:21:23 +00:00
konsti	730e6b2b4c	Refactor `StmtIf`: Formatter and Linter (#5459 ) ## Summary Previously, `StmtIf` was defined recursively as ```rust pub struct StmtIf { pub range: TextRange, pub test: Box<Expr>, pub body: Vec<Stmt>, pub orelse: Vec<Stmt>, } ``` Every `elif` was represented as an `orelse` with a single `StmtIf`. This means that this representation couldn't differentiate between ```python if cond1: x = 1 else: if cond2: x = 2 ``` and ```python if cond1: x = 1 elif cond2: x = 2 ``` It also makes many checks harder than they need to be because we have to recurse just to iterate over an entire if-elif-else and because we're lacking nodes and ranges on the `elif` and `else` branches. We change the representation to a flat ```rust pub struct StmtIf { pub range: TextRange, pub test: Box<Expr>, pub body: Vec<Stmt>, pub elif_else_clauses: Vec<ElifElseClause>, } pub struct ElifElseClause { pub range: TextRange, pub test: Option<Expr>, pub body: Vec<Stmt>, } ``` where `test: Some(_)` represents an `elif` and `test: None` an else. This representation is different tradeoff, e.g. we need to allocate the `Vec<ElifElseClause>`, the `elif`s are now different than the `if`s (which matters in rules where want to check both `if`s and `elif`s) and the type system doesn't guarantee that the `test: None` else is actually last. We're also now a bit more inconsistent since all other `else`, those from `for`, `while` and `try`, still don't have nodes. With the new representation some things became easier, e.g. finding the `elif` token (we can use the start of the `ElifElseClause`) and formatting comments for if-elif-else (no more dangling comments splitting, we only have to insert the dangling comment after the colon manually and set `leading_alternate_branch_comments`, everything else is taken of by having nodes for each branch and the usual placement.rs fixups). ## Merge Plan This PR requires coordination between the parser repo and the main ruff repo. I've split the ruff part, into two stacked PRs which have to be merged together (only the second one fixes all tests), the first for the formatter to be reviewed by @michareiser and the second for the linter to be reviewed by @charliermarsh. * MH: Review and merge https://github.com/astral-sh/RustPython-Parser/pull/20 * MH: Review and merge or move later in stack https://github.com/astral-sh/RustPython-Parser/pull/21 * MH: Review and approve https://github.com/astral-sh/RustPython-Parser/pull/22 * MH: Review and approve formatter PR https://github.com/astral-sh/ruff/pull/5459 * CM: Review and approve linter PR https://github.com/astral-sh/ruff/pull/5460 * Merge linter PR in formatter PR, fix ecosystem checks (ecosystem checks can't run on the formatter PR and won't run on the linter PR, so we need to merge them first) * Merge https://github.com/astral-sh/RustPython-Parser/pull/22 * Create tag in the parser, update linter+formatter PR * Merge linter+formatter PR https://github.com/astral-sh/ruff/pull/5459 --------- Co-authored-by: Micha Reiser <micha@reiser.io>	2023-07-18 13:40:15 +02:00
Micha Reiser	e520a3a721	Fix ArgWithDefault comments handling (#5204 )	2023-06-20 20:48:07 +00:00
Charlie Marsh	36e01ad6eb	Upgrade RustPython (#5192 ) ## Summary This PR upgrade RustPython to pull in the changes to `Arguments` (zip defaults with their identifiers) and all the renames to `CmpOp` and friends.	2023-06-19 21:09:53 +00:00
Micha Reiser	68969240c5	Format Function definitions (#4951 )	2023-06-08 16:07:33 +00:00
Micha Reiser	39a1f3980f	Upgrade RustPython (#4900 )	2023-06-08 05:53:14 +00:00
Micha Reiser	c65f47d7c4	Format `while` Statement (#4810 )	2023-06-05 08:24:00 +00:00
Micha Reiser	d6daa61563	Handle trailing end-of-line comments in-between-bodies (#4812 ) <!-- Thank you for contributing to Ruff! To help us out with reviewing, please consider the following: - Does this pull request include a summary of the change? (See below.) - Does this pull request include a descriptive title? - Does this pull request include references to any relevant issues? --> ## Summary And more custom logic around comments in bodies... uff. Let's say we have the following code ```python if x == y: pass # trailing comment of pass else: # trailing comment of `else` print("I have no comments") ``` Right now, the formatter attaches the `# trailing comment of `else` as a trailing comment of `pass` because it doesn't "see" that there's an `else` keyword in between (because the else body is just a Vec and not a node). This PR adds custom logic that attaches the trailing comments after the `else` as dangling comments to the `if` statement. The if statement must then split the dangling comments by `comments.text_position()`: * All comments up to the first end-of-line comment are leading comments of the `else` keyword. * All end-of-line comments coming after are `trailing` comments for the `else` keyword. ## Test Plan I added new unit tests.	2023-06-03 15:29:22 +02:00
Micha Reiser	cb6788ab5f	Handle trailing body end-of-line comments (#4811 ) ### Summary This PR adds custom logic to handle end-of-line comments of the last statement in a body. For example: ```python while True: if something.changed: do.stuff() # trailing comment b ``` The `# trailing comment` is a trailing comment of the `do.stuff()` expression statement. We incorrectly attached the comment as a trailing comment of the enclosing `while` statement because the comment is between the end of the while statement (the `while` statement ends right after `do.stuff()`) and before the `b` statement. This PR fixes the placement to correctly attach these comments to the last statement in a body (recursively). ## Test Plan I reviewed the snapshots and they now look correct. This may appear odd because a lot comments have now disappeared. This is the expected result because we use `verbatim` formatting for the block statements (like `while`) and that means that it only formats the inner content of the block, but not any trailing comments. The comments were visible before, because they were associated with the block statement (e.g. `while`).	2023-06-03 15:17:33 +02:00
Micha Reiser	59148344be	Place comments of left and right binary expression operands (#4751 )	2023-06-01 07:01:32 +00:00
Micha Reiser	b7294b48e7	Handle positional-only-arguments separator comments (#4748 )	2023-06-01 06:22:49 +00:00
Micha Reiser	be31d71849	Correctly associate own-line comments in bodies (#4671 )	2023-06-01 08:12:53 +02:00
Micha Reiser	0cd453bdf0	Generic "comment to node" association logic (#4642 )	2023-05-30 09:28:01 +00:00
Micha Reiser	84a5584888	Add `Comments` data structure (#4641 )	2023-05-30 08:54:55 +00:00

23 Commits