Python/ruff - ruff - Gitea: Git with a cup of tea

Commit Graph

Author	SHA1	Message	Date
Dylan	4e1cf5747a	Fluent formatting of method chains (#21369 ) This PR implements a modification (in preview) to fluent formatting for method chains: We break _at_ the first call instead of _after_. For example, we have the following diff between `main` and this PR (with `line-length=8` so I don't have to stretch out the text): ```diff x = ( - df.merge() + df + .merge() .groupby() .agg() .filter() ) ``` ## Explanation of current implementation Recall that we traverse the AST to apply formatting. A method chain, while read left-to-right, is stored in the AST "in reverse". So if we start with something like ```python a.b.c.d().e.f() ``` then the first syntax node we meet is essentially `.f()`. So we have to peek ahead. And we actually _already_ do this in our current fluent formatting logic: we peek ahead to count how many calls we have in the chain to see whether we should be using fluent formatting or not. In this implementation, we actually _record_ this number inside the enum for `CallChainLayout`. That is, we make the variant `Fluent` hold an `AttributeState`. This state can be one of: - The number of call-like attributes preceding the current attribute - The state `FirstCallOrSubscript` which means we are at the first call-like attribute in the chain (reading from left to right) - The state `BeforeFirstCallOrSubscript` which means we are in the "first group" of attributes, preceding that first call. In our example, here's what it looks like at each attribute: ``` a.b.c.d().e.f @ Fluent(CallsOrSubscriptsPreceding(1)) a.b.c.d().e @ Fluent(CallsOrSubscriptsPreceding(1)) a.b.c.d @ Fluent(FirstCallOrSubscript) a.b.c @ Fluent(BeforeFirstCallOrSubscript) a.b @ Fluent(BeforeFirstCallOrSubscript) ``` Now, as we descend from the parent expression, we pass along this little piece of state and modify it as we go to track where we are. This state doesn't do anything except when we are in `FirstCallOrSubscript`, in which case we add a soft line break. 
Closes #8598 --------- Co-authored-by: Brent Westbrook <36778786+ntBre@users.noreply.github.com>	2025-12-15 09:29:50 -06:00
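The per-attribute labelling described in the commit above can be reproduced with a small Python sketch. This is only an illustration under stated assumptions: it uses Python's standard `ast` module rather than Ruff's Rust AST, the helper name `label_chain` is hypothetical, and it collects the whole chain first instead of threading state down during formatting as the PR does.

```python
import ast


def label_chain(source: str) -> list[tuple[str, str]]:
    """Label every attribute in a method chain, outermost attribute first."""
    node = ast.parse(source, mode="eval").body
    found = []  # (attribute node, is it directly called or subscripted?)
    call_like = False
    # The AST stores the chain "in reverse": the outermost node comes first.
    while True:
        if isinstance(node, ast.Call):
            call_like = True
            node = node.func
        elif isinstance(node, ast.Subscript):
            call_like = True
            node = node.value
        elif isinstance(node, ast.Attribute):
            found.append((node, call_like))
            call_like = False
            node = node.value
        else:
            break
    found.reverse()  # now in left-to-right reading order

    labels = []
    preceding = 0  # call-like attributes seen so far, reading left to right
    for attr, is_call_like in found:
        if preceding > 0:
            state = f"CallsOrSubscriptsPreceding({preceding})"
        elif is_call_like:
            state = "FirstCallOrSubscript"
        else:
            state = "BeforeFirstCallOrSubscript"
        labels.append((ast.unparse(attr), state))
        if is_call_like:
            preceding += 1
    labels.reverse()  # outermost first, matching the listing in the commit
    return labels


for text, state in label_chain("a.b.c.d().e.f()"):
    print(f"{text} @ Fluent({state})")
```

Only the `FirstCallOrSubscript` entry would trigger the extra soft line break; the other states just track position in the chain.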
Dylan	8156b45173	Avoid syntax error when formatting attribute expressions with outer parentheses, parenthesized value, and trailing comment on value (#20418 ) Closes #19350 This fixes a syntax error caused by formatting. However, the new tests reveal that there are some cases where formatting attributes with certain comments behaves strangely, both before and after this PR, so some more polish may be in order. For example, without parentheses around the value, and both before and after this PR, we have: ```python # unformatted variable = ( something # a comment .first_method("some string") ) # formatted variable = something.first_method("some string") # a comment ``` which is probably not where the comment ought to go.	2025-11-17 09:11:36 -06:00
Ibraheem Ahmed	c9dff5c7d5	[ty] AST garbage collection (#18482 ) ## Summary Garbage collect ASTs once we are done checking a given file. Queries with a cross-file dependency on the AST will reparse the file on demand. This reduces ty's peak memory usage by ~20-30%. The primary change of this PR is adding a `node_index` field to every AST node, that is assigned by the parser. `ParsedModule` can use this to create a flat index of AST nodes any time the file is parsed (or reparsed). This allows `AstNodeRef` to simply index into the current instance of the `ParsedModule`, instead of storing a pointer directly. The indices are somewhat hackily (using an atomic integer) assigned by the `parsed_module` query instead of by the parser directly. Assigning the indices in source-order in the (recursive) parser turns out to be difficult, and collecting the nodes during semantic indexing is impossible as `SemanticIndex` does not hold onto a specific `ParsedModuleRef`, which the pointers in the flat AST are tied to. This means that we have to do an extra AST traversal to assign and collect the nodes into a flat index, but the small performance impact (~3% on cold runs) seems worth it for the memory savings. Part of https://github.com/astral-sh/ty/issues/214.	2025-06-13 08:40:11 -04:00
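As a rough illustration of the index-based reference idea in the commit above, the following Python sketch stores an index instead of a pointer and resolves it against whichever parse is current. It is a toy model, not ty's implementation: the classes only borrow the names `ParsedModule` and `AstNodeRef`, the `node_index` attribute is attached to Python `ast` nodes purely for demonstration, and `ast.walk` stands in for the extra traversal.

```python
import ast


class ParsedModule:
    """Toy stand-in: parses source and builds a flat index of its AST nodes."""

    def __init__(self, source: str) -> None:
        self.tree = ast.parse(source)
        self.nodes: list[ast.AST] = []
        # One extra traversal assigns an index to every node and collects it.
        for index, node in enumerate(ast.walk(self.tree)):
            node.node_index = index  # assumed field, mirroring `node_index`
            self.nodes.append(node)


class AstNodeRef:
    """Holds only an index, so the parsed tree itself can be dropped."""

    def __init__(self, node: ast.AST) -> None:
        self.index = node.node_index

    def resolve(self, module: ParsedModule) -> ast.AST:
        # Index into the *current* parse instead of following a pointer.
        return module.nodes[self.index]


source = "def f():\n    return 1\n"
first = ParsedModule(source)
ref = AstNodeRef(first.tree.body[0])
del first  # "garbage collect" the AST once checking is done
reparsed = ParsedModule(source)  # reparse on demand
print(ref.resolve(reparsed).name)
```

The reference stays valid across reparses only because the traversal order, and therefore the index assignment, is deterministic for the same source.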
Micha Reiser	9ae698fe30	Switch to Rust 2024 edition (#18129 )	2025-05-16 13:25:28 +02:00
Micha Reiser	31180a84e4	Fix unstable formatting of trailing end-of-line comments of parenthesized attribute values (#16187 )	2025-02-18 08:43:51 +01:00
Micha Reiser	6a1e555537	Upgrade to Rust 1.78 (#11260 )	2024-05-03 12:46:21 +00:00
Micha Reiser	230c93459f	Delete redundant branch in `NeedsParentheses` (#8377 )	2023-10-31 12:06:17 +00:00
Dhruv Manilawala	230c9ce236	Split `Constant` to individual literal nodes (#8064 ) ## Summary This PR splits the `Constant` enum into individual literal nodes. It introduces the following new nodes for each variant: * `ExprStringLiteral` * `ExprBytesLiteral` * `ExprNumberLiteral` * `ExprBooleanLiteral` * `ExprNoneLiteral` * `ExprEllipsisLiteral` The main motivation behind this refactor is to introduce the new AST node for implicit string concatenation in the coming PR. The elements of that node will be either a string literal, bytes literal or an f-string which can be implemented using an enum. This means that a string or bytes literal cannot be represented by `Constant::Str` / `Constant::Bytes` which creates an inconsistency. This PR avoids that inconsistency by splitting the constant nodes into their own literal nodes, literal being the more appropriate naming convention from a static analysis tool perspective. This also makes working with literals in the linter and formatter much more ergonomic. For example, checking whether an expression is a string literal can be done easily using `Expr::is_string_literal_expr` or by matching against `Expr::StringLiteral`, as opposed to matching against the `ExprConstant` and enum `Constant`. A few AST helper methods can be simplified as well which will be done in a follow-up PR. This introduces a new `Expr::is_literal_expr` method which is the same as `Expr::is_constant_expr`. There are also intermediary changes related to implicit string concatenation which are quite few. This is done so as to avoid having a huge PR which this already is. ## Test Plan 1. Verify and update all of the existing snapshots (parser, visitor) 2. 
Verify that the ecosystem check output remains unchanged for both the linter and formatter ### Formatter ecosystem check #### `main` \| project \| similarity index \| total files \| changed files \| \|----------------\|------------------:\|------------------:\|------------------:\| \| cpython \| 0.75803 \| 1799 \| 1647 \| \| django \| 0.99983 \| 2772 \| 34 \| \| home-assistant \| 0.99953 \| 10596 \| 186 \| \| poetry \| 0.99891 \| 317 \| 17 \| \| transformers \| 0.99966 \| 2657 \| 330 \| \| twine \| 1.00000 \| 33 \| 0 \| \| typeshed \| 0.99978 \| 3669 \| 20 \| \| warehouse \| 0.99977 \| 654 \| 13 \| \| zulip \| 0.99970 \| 1459 \| 22 \| #### `dhruv/constant-to-literal` \| project \| similarity index \| total files \| changed files \| \|----------------\|------------------:\|------------------:\|------------------:\| \| cpython \| 0.75803 \| 1799 \| 1647 \| \| django \| 0.99983 \| 2772 \| 34 \| \| home-assistant \| 0.99953 \| 10596 \| 186 \| \| poetry \| 0.99891 \| 317 \| 17 \| \| transformers \| 0.99966 \| 2657 \| 330 \| \| twine \| 1.00000 \| 33 \| 0 \| \| typeshed \| 0.99978 \| 3669 \| 20 \| \| warehouse \| 0.99977 \| 654 \| 13 \| \| zulip \| 0.99970 \| 1459 \| 22 \|	2023-10-30 12:13:23 +05:30
Micha Reiser	8b665f40c8	Avoid parenthesizing octal/hex or binary literals in object positions (#8160 )	2023-10-24 15:12:52 +01:00
Charlie Marsh	d685107638	Move {AnyNodeRef, AstNode} to ruff_python_ast crate root (#8030 ) This is a do-over of https://github.com/astral-sh/ruff/pull/8011, which I accidentally merged into a non-`main` branch. Sorry!	2023-10-18 00:01:18 +00:00
konsti	2cbe1733c8	Use CommentRanges in backwards lexing (#7360 ) ## Summary The tokenizer was split into a forward and a backwards tokenizer. The backwards tokenizer uses the same names as the forwards ones (e.g. `next_token`). The backwards tokenizer gets the comment ranges that we already built to skip comments. --------- Co-authored-by: Micha Reiser <micha@reiser.io>	2023-09-16 03:21:45 +00:00
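To make the idea of skipping comments while lexing backwards concrete, here is a minimal Python sketch. It is an assumption-laden simplification, not Ruff's tokenizer: the function name `last_non_trivia_before` is hypothetical, comment ranges are plain `(start, end)` offset pairs sorted by start, and only whitespace and comments are treated as trivia.

```python
from bisect import bisect_right


def last_non_trivia_before(source, offset, comment_ranges):
    """Scan backwards from `offset`, skipping whitespace and known comments.

    `comment_ranges` is a sorted list of half-open (start, end) offsets that
    was built once up front, so the backwards scan never has to guess whether
    a `#` begins a comment or sits inside a string.
    """
    starts = [start for start, _ in comment_ranges]
    i = offset
    while i > 0:
        i -= 1
        k = bisect_right(starts, i) - 1  # last comment starting at or before i
        if k >= 0 and i < comment_ranges[k][1]:
            i = comment_ranges[k][0]  # jump to the comment start and continue
            continue
        if not source[i].isspace():
            return i
    return None


source = "x = 1  # note )\ny"
comment = (source.index("#"), source.index("\n"))
found = last_non_trivia_before(source, source.index("y"), [comment])
print(source[found])
```

Without the precomputed ranges, the `)` inside the comment would be picked up as the previous token when scanning backwards from `y`.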
Charlie Marsh	11287f944f	Avoid re-parenthesizing call chains whose inner values are parenthesized (#7373 ) ## Summary Given a statement like: ```python result = ( f(111111111111111111111111111111111111111111111111111111111111111111111111111111111) + 1 )() ``` When we go to parenthesize the target of the assignment, we use `maybe_parenthesize_expression` with `Parenthesize::IfBreaks`. This then checks if the call on the right-hand side needs to be parenthesized, the implementation of which looks like: ```rust impl NeedsParentheses for ExprCall { fn needs_parentheses( &self, _parent: AnyNodeRef, context: &PyFormatContext, ) -> OptionalParentheses { if CallChainLayout::from_expression(self.into(), context.source()) == CallChainLayout::Fluent { OptionalParentheses::Multiline } else if context.comments().has_dangling(self) { OptionalParentheses::Always } else { self.func.needs_parentheses(self.into(), context) } } } ``` Checking for `self.func.needs_parentheses(self.into(), context)` is problematic, since, as in the example above, `self.func` may _already_ be parenthesized -- in which case, we _don't_ want to parenthesize the entire expression. If we do, we end up with this non-ideal formatting: ```python result = ( ( f( 111111111111111111111111111111111111111111111111111111111111111111111111111111111 ) + 1 )() ) ``` This PR modifies the `NeedsParentheses` implementations for call chain expressions to return `Never` if the inner expression has its own parentheses, in which case, the formatting implementations for those expressions will preserve them anyway. Closes https://github.com/astral-sh/ruff/issues/7370. ## Test Plan Zulip improves a bit, everything else is unchanged. 
Before: \| project \| similarity index \| total files \| changed files \| \|--------------\|------------------:\|------------------:\|------------------:\| \| cpython \| 0.76083 \| 1789 \| 1632 \| \| django \| 0.99981 \| 2760 \| 40 \| \| transformers \| 0.99944 \| 2587 \| 413 \| \| twine \| 1.00000 \| 33 \| 0 \| \| typeshed \| 0.99983 \| 3496 \| 18 \| \| warehouse \| 0.99834 \| 648 \| 20 \| \| zulip \| 0.99956 \| 1437 \| 23 \| After: \| project \| similarity index \| total files \| changed files \| \|--------------\|------------------:\|------------------:\|------------------:\| \| cpython \| 0.76083 \| 1789 \| 1632 \| \| django \| 0.99981 \| 2760 \| 40 \| \| transformers \| 0.99944 \| 2587 \| 413 \| \| twine \| 1.00000 \| 33 \| 0 \| \| typeshed \| 0.99983 \| 3496 \| 18 \| \| warehouse \| 0.99834 \| 648 \| 20 \| \| zulip \| 0.99962 \| 1437 \| 22 \|	2023-09-14 05:05:37 -04:00
Charlie Marsh	ece30e7c69	Preserve parentheses around partial call chains (#7109 )	2023-09-04 10:57:04 +01:00
Micha Reiser	c05e4628b1	Introduce Token element (#7048 )	2023-09-02 10:05:47 +02:00
Charlie Marsh	fc89976c24	Move `Ranged` into `ruff_text_size` (#6919 ) ## Summary The motivation here is that this enables us to implement `Ranged` in crates that don't depend on `ruff_python_ast`. Largely a mechanical refactor with a lot of regex, Clippy help, and manual fixups. ## Test Plan `cargo test`	2023-08-27 14:12:51 -04:00
Charlie Marsh	edb9b0c62a	Use the formatter prelude in more files (#6882 ) Removes a bunch of imports that are made redundant by the prelude.	2023-08-25 16:51:07 -04:00
Micha Reiser	29a0c1003b	Use `BestFit` layout even for attributes with a short name (#6872 )	2023-08-25 17:47:02 +02:00
Charlie Marsh	59e70896c0	Fix formatting of comments between function and arguments (#6826 ) ## Summary We now format comments between a function and its arguments as dangling. Like with other strange placements, I've biased towards preserving the existing formatting, rather than attempting to reorder the comments. Closes https://github.com/astral-sh/ruff/issues/6818. ## Test Plan `cargo test` Before: \| project \| similarity index \| \|--------------\|------------------\| \| cpython \| 0.76050 \| \| django \| 0.99820 \| \| transformers \| 0.99800 \| \| twine \| 0.99876 \| \| typeshed \| 0.99953 \| \| warehouse \| 0.99615 \| \| zulip \| 0.99729 \| After: \| project \| similarity index \| \|--------------\|------------------\| \| cpython \| 0.76050 \| \| django \| 0.99820 \| \| transformers \| 0.99800 \| \| twine \| 0.99876 \| \| typeshed \| 0.99953 \| \| warehouse \| 0.99615 \| \| zulip \| 0.99729 \|	2023-08-25 04:06:56 +00:00
Charlie Marsh	474e8fbcd4	Format all attribute dot comments manually (#6825 ) ## Summary This PR modifies our formatting of comments around the `.` in an attribute. Specifically, the goal here is to avoid _reordering_ comments, and the net effect is that we generally leave comments where-they-are when dealing with comments around the dot (which you can also think of as comments between attributes). All comments around the dot are now treated as dangling and formatted manually, with the exception of end-of-line or parenthesized comments on the value, like those marked as trailing here, which remain trailing: ```python ( ( a # trailing end-of-line # trailing own-line ) # dangling before dot end-of-line .b # trailing end-of-line ) ``` Closes https://github.com/astral-sh/ruff/issues/6823. ## Test Plan `cargo test` Before: \| project \| similarity index \| \|--------------\|------------------\| \| cpython \| 0.76050 \| \| django \| 0.99820 \| \| transformers \| 0.99800 \| \| twine \| 0.99876 \| \| typeshed \| 0.99953 \| \| warehouse \| 0.99615 \| \| zulip \| 0.99729 \| After: \| project \| similarity index \| \|--------------\|------------------\| \| cpython \| 0.76050 \| \| django \| 0.99820 \| \| transformers \| 0.99800 \| \| twine \| 0.99876 \| \| typeshed \| 0.99953 \| \| warehouse \| 0.99615 \| \| zulip \| 0.99729 \|	2023-08-25 03:50:56 +00:00
Micha Reiser	0cea4975fc	Rename Comments methods (#6649 )	2023-08-18 06:37:01 +00:00
Micha Reiser	29c0b9f91c	Use single lookup for leading, dangling, and trailing comments (#6589 )	2023-08-15 17:39:45 +02:00
Charlie Marsh	f2939c678b	Avoid breaking call chains unnecessarily (#6488 ) ## Summary This PR attempts to fix the formatting of the following expression: ```python max_message_id = ( Message.objects.filter(recipient=recipient).order_by("id").reverse()[0].id ) ``` Specifically, Black preserves _that_ formatting, while we do: ```python max_message_id = ( Message.objects.filter(recipient=recipient) .order_by("id") .reverse()[0] .id ) ``` The fix here is to add a group around the entire call chain. ## Test Plan Before: - `zulip`: 0.99702 - `django`: 0.99784 - `warehouse`: 0.99585 - `build`: 0.75623 - `transformers`: 0.99470 - `cpython`: 0.75989 - `typeshed`: 0.74853 After: - `zulip`: 0.99703 - `django`: 0.99791 - `warehouse`: 0.99586 - `build`: 0.75623 - `transformers`: 0.99470 - `cpython`: 0.75989 - `typeshed`: 0.74853	2023-08-11 13:33:15 +00:00
konsti	99baad12d8	Call chain formatting in fluent style (#6151 ) Implement fluent style/call chains. See the `call_chains.py` formatting for examples. This isn't fully like black because in `raise A from B` they allow the breaking of `A` to influence the formatting of `B` even if it is already multiline. Similarity index: \| project \| main \| PR \| \|--------------\|-------\|-------\| \| build \| ??? \| 0.753 \| \| django \| 0.991 \| 0.998 \| \| transformers \| 0.993 \| 0.994 \| \| typeshed \| 0.723 \| 0.723 \| \| warehouse \| 0.978 \| 0.994 \| \| zulip \| 0.992 \| 0.994 \| Call chain formatting is affected by https://github.com/astral-sh/ruff/issues/627, but I'm cutting scope here. Closes #5343 Test Plan: * Added a dedicated call chains test file * The ecosystem checks found some bugs * I manually checked django and zulip formatting --------- Co-authored-by: Micha Reiser <micha@reiser.io>	2023-08-04 13:58:01 +00:00
Micha Reiser	6bf6646c5d	Respect indent when measuring with `MeasureMode::AllLines` (#6120 )	2023-07-27 10:22:13 -04:00
Micha Reiser	40f54375cb	Pull in RustPython parser (#6099 )	2023-07-27 09:29:11 +00:00
Micha Reiser	2cf00fee96	Remove parser dependency from ruff-python-ast (#6096 )	2023-07-26 17:47:22 +02:00
Micha Reiser	067b2a6ce6	Pass parent to `NeedsParentheses` (#5708 )	2023-07-13 08:57:29 +02:00
Micha Reiser	8665a1a19d	Pass `FormatContext` to `NeedsParentheses` ## Summary I started working on this because I assumed that I would need access to options inside of `NeedsParentheses` but it then turned out that I won't. Anyway, it kind of felt nice to pass fewer arguments. So I'm gonna put this out here to get your feedback if you prefer this over passing individual fields. Oh, I sneaked in another change. I renamed `context.contents` to `source`. `contents` is too generic and doesn't tell you anything. ## Test Plan It compiles	2023-07-11 14:28:50 +02:00
konstin	a52cd47c7f	Fix attribute chain own line comments (#5340 ) ## Motivation Previously, ```python x = ( a1 .a2 # a . # b # c a3 ) ``` got formatted as ```python x = a1.a2 # a . # b # c a3 ``` which is invalid syntax. This fixes that. ## Summary This implements a basic form of attribute chaining (<https://black.readthedocs.io/en/stable/the_black_code_style/current_style.html#call-chains>) by checking if any inner attribute access contains an own line comment, and if this is the case, adds parentheses around the outermost attribute access while disabling parentheses for all inner attribute expressions. We want to replace this with an implementation that uses recursion or a stack while formatting instead of in `needs_parentheses` and also includes calls sooner rather than later, but I'm fixing this now because I'm uncomfortable with having known invalid syntax generation in the formatter. ## Test Plan I added new fixtures.	2023-06-26 09:13:07 +00:00
Micha Reiser	ccf34aae8c	Format Attribute Expression (#5259 )	2023-06-21 21:33:53 +00:00
Micha Reiser	68969240c5	Format Function definitions (#4951 )	2023-06-08 16:07:33 +00:00
Micha Reiser	c1cc6f3be1	Add basic Constant formatting (#4954 )	2023-06-08 11:42:44 +00:00
konstin	23abad0bd5	A basic StmtAssign formatter and better dummies for expressions (#4938 ) * A basic StmtAssign formatter and better dummies for expressions The goal of this PR was formatting StmtAssign since many nodes in the black tests (and in python in general) are after an assignment. This caused unstable formatting: The spacing of the power operator depends on the types of the two involved expressions, but each expression was formatted as a dummy string and re-parsed as an ExprName, so in the second round the different rules of ExprName were applied, causing unstable formatting. This PR does not necessarily bring us closer to black's style, but it unlocks a good porting of black's test suite and is a basis for implementing the Expr nodes. * fmt * Review	2023-06-08 12:20:25 +02:00
Micha Reiser	bcf745c5ba	Replace verbatim text with `NOT_YET_IMPLEMENTED` (#4904 ) ## Summary This PR replaces the `verbatim_text` builder with a `not_yet_implemented` builder that emits `NOT_YET_IMPLEMENTED_<NodeKind>` for not yet implemented nodes. The motivation for this change is that partially formatting compound statements can result in incorrectly indented code, which is a syntax error: ```python def func_no_args(): a; b; c if True: raise RuntimeError if False: ... for i in range(10): print(i) continue ``` Gets reformatted to ```python def func_no_args(): a; b; c if True: raise RuntimeError if False: ... for i in range(10): print(i) continue ``` because our formatter does not yet support `for` statements and just inserts the text from the source. ## Downsides Using an identifier will not work in all situations. For example, an identifier is invalid in an `Arguments` position. That's why I kept `verbatim_text` around and e.g. use it in the `Arguments` formatting logic where incorrect indentations are impossible (to my knowledge). Meaning, we can opt in to `verbatim_text` when we want to iterate quickly on nodes for which we don't want to provide a full implementation yet and where using an identifier would be invalid. ## Upsides Running this on main discovered stability issues with the newline handling that were previously "hidden" because of the verbatim formatting. I guess that's an upside :) ## Test Plan None?	2023-06-07 14:57:25 +02:00
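The placeholder approach from the commit above might look roughly like this in Python. This is a hedged sketch, not the Rust builder: `format_statement` and the `SUPPORTED` tuple are hypothetical, and the node kind comes from Python's `ast` class names (for example `For`), which differ from Ruff's own node names.

```python
import ast

# Hypothetical set of node kinds the toy formatter already understands.
SUPPORTED = (ast.Pass, ast.Expr)


def format_statement(stmt: ast.stmt) -> str:
    """Format supported statements; emit a placeholder identifier otherwise."""
    if isinstance(stmt, SUPPORTED):
        return ast.unparse(stmt)
    # A single identifier cannot be mis-indented, unlike verbatim source text.
    return f"NOT_YET_IMPLEMENTED_{type(stmt).__name__}"


module = ast.parse("pass\nfor i in range(3):\n    print(i)\n")
formatted = [format_statement(stmt) for stmt in module.body]
print(formatted)
```

Because the placeholder is itself a valid expression statement, the output stays parseable even when a compound statement is not yet supported.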
Micha Reiser	3f032cf09d	Format binary expressions (#4862 ) * Format Binary Expressions * Extract NeedsParentheses trait	2023-06-06 08:34:53 +00:00
konstin	9bf168c0a4	Use dummy verbatim formatter for all nodes (#4755 )	2023-06-01 08:25:26 +00:00
konstin	0945803427	Generate FormatRule definitions (#4724 ) * Generate FormatRule definitions * Generate verbatim output * pub(crate) everything * clippy fix * Update crates/ruff_python_formatter/src/lib.rs Co-authored-by: Micha Reiser <micha@reiser.io> * Update crates/ruff_python_formatter/src/lib.rs Co-authored-by: Micha Reiser <micha@reiser.io> * stub out with Ok(()) again * Update crates/ruff_python_formatter/src/lib.rs Co-authored-by: Micha Reiser <micha@reiser.io> * PyFormatContext::{contents, locator} with `#[allow(unused)]` * Can't leak private type * remove commented code * Fix ruff errors * pub struct Format{node} due to rust rules --------- Co-authored-by: Julian LaNeve <lanevejulian@gmail.com> Co-authored-by: Micha Reiser <micha@reiser.io>	2023-06-01 08:38:53 +02:00

37 Commits