Commit Graph

32 Commits

Author SHA1 Message Date
Dylan 4e1cf5747a
Fluent formatting of method chains (#21369)
This PR implements a modification (in preview) to fluent formatting for
method chains: We break _at_ the first call instead of _after_.

For example, we have the following diff between `main` and this PR (with
`line-length=8` so I don't have to stretch out the text):

```diff
 x = (
-    df.merge()
+    df
+    .merge()
     .groupby()
     .agg()
     .filter()
 )
```

## Explanation of current implementation

Recall that we traverse the AST to apply formatting. A method chain,
while read left-to-right, is stored in the AST "in reverse". So if we
start with something like

```python
a.b.c.d().e.f()
```

then the first syntax node we meet is essentially `.f()`. So we have to
peek ahead. And we actually _already_ do this in our current fluent
formatting logic: we peek ahead to count how many calls we have in the
chain to see whether we should be using fluent formatting or now.

In this implementation, we actually _record_ this number inside the enum
for `CallChainLayout`. That is, we make the variant `Fluent` hold an
`AttributeState`. This state can either be:

- The number of call-like attributes preceding the current attribute
- The state `FirstCallOrSubscript` which means we are at the first
call-like attribute in the chain (reading from left to right)
- The state `BeforeFirstCallOrSubscript` which means we are in the
"first group" of attributes, preceding that first call.

In our example, here's what it looks like at each attribute:

```
a.b.c.d().e.f @ Fluent(CallsOrSubscriptsPreceding(1))
a.b.c.d().e @ Fluent(CallsOrSubscriptsPreceding(1))
a.b.c.d @ Fluent(FirstCallOrSubscript)
a.b.c @ Fluent(BeforeFirstCallOrSubscript)
a.b @ Fluent(BeforeFirstCallOrSubscript)
```

Now, as we descend down from the parent expression, we pass along this
little piece of state and modify it as we go to track where we are. This
state doesn't do anything except when we are in `FirstCallOrSubscript`,
in which case we add a soft line break.

Closes #8598

---------

Co-authored-by: Brent Westbrook <36778786+ntBre@users.noreply.github.com>
2025-12-15 09:29:50 -06:00
Brent Westbrook 0ebdebddd8
Keep lambda parameters on one line and parenthesize the body if it expands (#21385)
## Summary

This PR makes two changes to our formatting of `lambda` expressions:
1. We now parenthesize the body expression if it expands
2. We now try to keep the parameters on a single line

The latter of these fixes #8179:

Black formatting and this PR's formatting:

```py
def a():
    return b(
        c,
        d,
        e,
        f=lambda self, *args, **kwargs: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa(
            *args, **kwargs
        ),
    )
```

Stable Ruff formatting

```py
def a():
    return b(
        c,
        d,
        e,
        f=lambda self,
        *args,
        **kwargs: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa(*args, **kwargs),
    )
```

We don't parenthesize the body expression here because the call to
`aaaa...` has its own parentheses, but adding a binary operator shows
the new parenthesization:

```diff
@@ -3,7 +3,7 @@
         c,
         d,
         e,
-        f=lambda self, *args, **kwargs: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa(
-            *args, **kwargs
-        ) + 1,
+        f=lambda self, *args, **kwargs: (
+            aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa(*args, **kwargs) + 1
+        ),
     )
```

This is actually a new divergence from Black, which formats this input
like this:

```py
def a():
    return b(
        c,
        d,
        e,
        f=lambda self, *args, **kwargs: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa(
            *args, **kwargs
        )
        + 1,
    )
```

But I think this is an improvement, unlike the case from #8179.

One other, smaller benefit is that because we now add parentheses to
lambda bodies, we also remove redundant parentheses:

```diff
 @pytest.mark.parametrize(
     "f",
     [
-        lambda x: (x.expanding(min_periods=5).cov(x, pairwise=True)),
-        lambda x: (x.expanding(min_periods=5).corr(x, pairwise=True)),
+        lambda x: x.expanding(min_periods=5).cov(x, pairwise=True),
+        lambda x: x.expanding(min_periods=5).corr(x, pairwise=True),
     ],
 )
 def test_moment_functions_zero_length_pairwise(f):
```

## Test Plan

New tests taken from #8465 and probably a few more I should grab from
the ecosystem results.

---------

Co-authored-by: Micha Reiser <micha@reiser.io>
2025-12-12 12:02:25 -05:00
Brent Westbrook f3714fd3c1
Fix leading comment formatting for lambdas with multiple parameters (#21879)
## Summary

This is a follow-up to #21868. As soon as I started merging #21868 into
#21385, I realized that I had missed a test case with `**kwargs` after
the `*args` parameter. Such a case is supposed to be formatted on one
line like:

```py
# input
(
    lambda
    # comment
    *x,
    **y: x
)

# output
(
    lambda
    # comment
    *x, **y: x
)
```

which you can still see on the
[playground](https://play.ruff.rs/bd88d339-1358-40d2-819f-865bfcb23aef?secondary=Format),
but on `main` after #21868, this was formatted as:

```py
(
    lambda
    # comment
    *x,
    **y: x
)
```

because the leading comment on the first parameter caused the whole
group around the parameters to break.

Instead of making these comments leading comments on the first
parameter, this PR makes them leading comments on the parameters list as
a whole.

## Test Plan

New tests, and I will also try merging this into #21385 _before_ opening
it for review this time.

<hr>

(labeling `internal` since #21868 should not be released before some
kind of fix)
2025-12-09 18:15:12 -05:00
Brent Westbrook 0bec5c0362
Fix comment placement in lambda parameters (#21868)
Summary
--

This PR makes two changes to comment placement in lambda parameters.
First, we
now insert a line break if the first parameter has a leading comment:

```py
# input
(
    lambda
    * # comment 2
    x:
    x
)

# main
(
    lambda # comment 2
    *x: x
)

# this PR
(
    lambda
	# comment 2
    *x: x
)
```

Note the missing space in the output from main. This case is currently
unstable
on main. Also note that the new formatting is more consistent with our
stable
formatting in cases where the lambda has its own dangling comment:

```py
# input
(
    lambda # comment 1
    * # comment 2
    x:
    x
)

# output
(
    lambda  # comment 1
    # comment 2
    *x: x
)
```

and when a parameter without a comment precedes the split `*x`:

```py
# input
(
    lambda y,
    * # comment 2
    x:
    x
)

# output
(
    lambda y,
    # comment 2
    *x: x
)
```

This does change the stable formatting, but I think such cases are rare
(expecting zero hits in the ecosystem report), this fixes an existing
instability, and it should not change any code we've previously
formatted.

Second, this PR modifies the comment placement such that `# comment 2`
in these
outputs is still a leading comment on the parameter. This is also not
the case
on main, where it becomes a [dangling lambda
comment](https://play.ruff.rs/3b29bb7e-70e4-4365-88e0-e60fe1857a35?secondary=Comments).
This doesn't cause any
instability that I'm aware of on main, but it does cause problems when
trying to
adjust the placement of dangling lambda comments in #21385. Changing the
placement in this way should not affect any formatting here.

Test Plan
--

New lambda tests, plus existing tests covering the cases above with
multiple
comments around the parameters (see lambda.py 122-143, and 122-205 or so
more
broadly)

I also checked manually that the comments are now leading on the
parameter:

```shell
❯ cargo run --bin ruff_python_formatter -- --emit stdout --target-version 3.10 --print-comments <<EOF
(
    lambda
        # comment 2
    *x: x
)
EOF
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.15s
     Running `target/debug/ruff_python_formatter --emit stdout --target-version 3.10 --print-comments`
# Comment decoration: Range, Preceding, Following, Enclosing, Comment
21..32, None, Some((Parameters, 37..39)), (ExprLambda, 6..42), "# comment 2"
{
    Node {
        kind: Parameter,
        range: 37..39,
        source: `*x`,
    }: {
        "leading": [
            SourceComment {
                text: "# comment 2",
                position: OwnLine,
                formatted: true,
            },
        ],
        "dangling": [],
        "trailing": [],
    },
}
(
    lambda
    # comment 2
    *x: x
)
```

But I didn't see a great place to put a test like this. Is there
somewhere I can assert this comment placement since it doesn't affect
any formatting yet? Or is it okay to wait until we use this in #21385?
2025-12-09 14:07:48 -05:00
Ibraheem Ahmed c9dff5c7d5
[ty] AST garbage collection (#18482)
## Summary

Garbage collect ASTs once we are done checking a given file. Queries
with a cross-file dependency on the AST will reparse the file on demand.
This reduces ty's peak memory usage by ~20-30%.

The primary change of this PR is adding a `node_index` field to every
AST node, that is assigned by the parser. `ParsedModule` can use this to
create a flat index of AST nodes any time the file is parsed (or
reparsed). This allows `AstNodeRef` to simply index into the current
instance of the `ParsedModule`, instead of storing a pointer directly.

The indices are somewhat hackily (using an atomic integer) assigned by
the `parsed_module` query instead of by the parser directly. Assigning
the indices in source-order in the (recursive) parser turns out to be
difficult, and collecting the nodes during semantic indexing is
impossible as `SemanticIndex` does not hold onto a specific
`ParsedModuleRef`, which the pointers in the flat AST are tied to. This
means that we have to do an extra AST traversal to assign and collect
the nodes into a flat index, but the small performance impact (~3% on
cold runs) seems worth it for the memory savings.

Part of https://github.com/astral-sh/ty/issues/214.
2025-06-13 08:40:11 -04:00
Micha Reiser 6a1e555537
Upgrade to Rust 1.78 (#11260) 2024-05-03 12:46:21 +00:00
Charlie Marsh d685107638
Move {AnyNodeRef, AstNode} to ruff_python_ast crate root (#8030)
This is a do-over of https://github.com/astral-sh/ruff/pull/8011, which
I accidentally merged into a non-`main` branch. Sorry!
2023-10-18 00:01:18 +00:00
Charlie Marsh 4c4eceee36
Add dangling comment handling for `lambda` expressions (#7493)
## Summary

This PR adds dangling comment handling for `lambda` expressions. In
short, comments around the `lambda` and the `:` are all considered
dangling. Comments that come between the `lambda` and the `:` may be
moved after the colon for simplicity (this is an odd position for a
comment anyway), unless they also precede the lambda parameters, in
which case they're formatted before the parameters.

Closes https://github.com/astral-sh/ruff/issues/7470.

## Test Plan

`cargo test`

No change in similarity.

Before:

| project | similarity index | total files | changed files |

|--------------|------------------:|------------------:|------------------:|
| cpython | 0.76083 | 1789 | 1632 |
| django | 0.99982 | 2760 | 37 |
| transformers | 0.99957 | 2587 | 398 |
| twine | 1.00000 | 33 | 0 |
| typeshed | 0.99983 | 3496 | 18 |
| warehouse | 0.99929 | 648 | 16 |
| zulip | 0.99962 | 1437 | 22 |

After:

| project | similarity index | total files | changed files |

|--------------|------------------:|------------------:|------------------:|
| cpython | 0.76083 | 1789 | 1632 |
| django | 0.99982 | 2760 | 37 |
| transformers | 0.99957 | 2587 | 398 |
| twine | 1.00000 | 33 | 0 |
| typeshed | 0.99983 | 3496 | 18 |
| warehouse | 0.99929 | 648 | 16 |
| zulip | 0.99962 | 1437 | 22 |
2023-09-19 15:23:51 -04:00
Charlie Marsh 8ab2519717
Respect parentheses for precedence in `await` (#7468)
## Summary

We were using `Parenthesize::IfBreaks` universally for `await`, but
dropping parentheses can change the AST due to precedence. It turns out
that Black's rules aren't _exactly_ the same as operator precedence
(e.g., they leave parentheses around `await ([1, 2, 3])`, although they
aren't strictly required).

Closes https://github.com/astral-sh/ruff/issues/7467.

## Test Plan

`cargo test`

No change in similarity.

Before:

| project | similarity index | total files | changed files |

|--------------|------------------:|------------------:|------------------:|
| cpython | 0.76083 | 1789 | 1632 |
| django | 0.99982 | 2760 | 37 |
| transformers | 0.99957 | 2587 | 398 |
| twine | 1.00000 | 33 | 0 |
| typeshed | 0.99983 | 3496 | 18 |
| warehouse | 0.99929 | 648 | 16 |
| zulip | 0.99962 | 1437 | 22 |

After:

| project | similarity index | total files | changed files |

|--------------|------------------:|------------------:|------------------:|
| cpython | 0.76083 | 1789 | 1632 |
| django | 0.99982 | 2760 | 37 |
| transformers | 0.99957 | 2587 | 398 |
| twine | 1.00000 | 33 | 0 |
| typeshed | 0.99983 | 3496 | 18 |
| warehouse | 0.99929 | 648 | 16 |
| zulip | 0.99962 | 1437 | 22 |
2023-09-18 09:56:41 -04:00
qdegraaf 05951dd338
Fix inconsistent `expr_lambda` formatting (#6318) 2023-09-08 09:40:58 +00:00
Micha Reiser c05e4628b1
Introduce Token element (#7048) 2023-09-02 10:05:47 +02:00
Charlie Marsh edb9b0c62a
Use the formatter prelude in more files (#6882)
Removes a bunch of imports that are made redundant by the prelude.
2023-08-25 16:51:07 -04:00
Charlie Marsh 6a5acde226
Make `Parameters` an optional field on `ExprLambda` (#6669)
## Summary

If a lambda doesn't contain any parameters, or any parameter _tokens_
(like `*`), we can use `None` for the parameters. This feels like a
better representation to me, since, e.g., what should the `TextRange` be
for a non-existent set of parameters? It also allows us to remove
several sites where we check if the `Parameters` is empty by seeing if
it contains any arguments, so semantically, we're already trying to
detect and model around this elsewhere.

Changing this also fixes a number of issues with dangling comments in
parameter-less lambdas, since those comments are now automatically
marked as dangling on the lambda. (As-is, we were also doing something
not-great whereby the lambda was responsible for formatting dangling
comments on the parameters, which has been removed.)

Closes https://github.com/astral-sh/ruff/issues/6646.

Closes https://github.com/astral-sh/ruff/issues/6647.

## Test Plan

`cargo test`
2023-08-18 15:34:54 +00:00
Micha Reiser 29c0b9f91c
Use single lookup for leading, dangling, and trailing comments (#6589) 2023-08-15 17:39:45 +02:00
qdegraaf 278a4f6e14
Formatter: Fix posonlyargs for `expr_lambda` (#6562) 2023-08-14 17:38:56 +02:00
Victor Hugo Gomes 7c5791fb77
Fix formatting of `lambda` star arguments (#6257)
## Summary
Previously, the ruff formatter was removing the star argument of
`lambda` expressions when formatting.

Given the following code snippet
```python
lambda *a: ()
lambda **b: ()
```
it would be formatted to
```python
lambda: ()
lambda: ()
```

We fix this by checking for the presence of `args`, `vararg` or `kwarg`
in the `lambda` expression, before we were only checking for the
presence of `args`.

Fixes #5894

## Test Plan

Add new tests cases.

---------

Co-authored-by: Charlie Marsh <charlie.r.marsh@gmail.com>
2023-08-02 19:31:20 +00:00
Charlie Marsh adc8bb7821
Rename `Arguments` to `Parameters` in the AST (#6253)
## Summary

This PR renames a few AST nodes for clarity:

- `Arguments` is now `Parameters`
- `Arg` is now `Parameter`
- `ArgWithDefault` is now `ParameterWithDefault`

For now, the attribute names that reference `Parameters` directly are
changed (e.g., on `StmtFunctionDef`), but the attributes on `Parameters`
itself are not (e.g., `vararg`). We may revisit that decision in the
future.

For context, the AST node formerly known as `Arguments` is used in
function definitions. Formally (outside of the Python context),
"arguments" typically refers to "the values passed to a function", while
"parameters" typically refers to "the variables used in a function
definition". E.g., if you Google "arguments vs parameters", you'll get
some explanation like:

> A parameter is a variable in a function definition. It is a
placeholder and hence does not have a concrete value. An argument is a
value passed during function invocation.

We're thus deviating from Python's nomenclature in favor of a scheme
that we find to be more precise.
2023-08-01 13:53:28 -04:00
Micha Reiser 40f54375cb
Pull in RustPython parser (#6099) 2023-07-27 09:29:11 +00:00
Micha Reiser 2cf00fee96
Remove parser dependency from ruff-python-ast (#6096) 2023-07-26 17:47:22 +02:00
Chris Pryer f5c69c1b34
Update `ArgumentsParentheses` usage (#6070) 2023-07-25 18:03:48 +02:00
Chris Pryer 9e32585cb1
Use `dangling_node_comments` in `lambda` formatting (#5903) 2023-07-20 08:52:32 +02:00
Chris Pryer 38678142ed
Format `lambda` expression (#5806) 2023-07-19 11:47:56 +00:00
Micha Reiser 067b2a6ce6
Pass parent to `NeedsParentheses` (#5708) 2023-07-13 08:57:29 +02:00
konsti 0c8ec80d7b
Change lambda dummy to NOT_YET_IMPLEMENTED_lambda (#5687)
This only changes the dummy to be easier to identify.
2023-07-11 13:16:18 +00:00
Micha Reiser 8665a1a19d
Pass `FormatContext` to `NeedsParentheses`
<!--
Thank you for contributing to Ruff! To help us out with reviewing, please consider the following:

- Does this pull request include a summary of the change? (See below.)
- Does this pull request include a descriptive title?
- Does this pull request include references to any relevant issues?
-->

## Summary

I started working on this because I assumed that I would need access to options inside of `NeedsParantheses` but it then turned out that I won't. 
Anyway, it kind of felt nice to pass fewer arguments. So I'm gonna put this out here to get your feedback if you prefer this over passing individual fiels. 

Oh, I sneeked in another change. I renamed `context.contents` to `source`. `contents` is too generic and doesn't tell you anything. 

<!-- What's the purpose of the change? What does it do, and why? -->

## Test Plan

It compiles
2023-07-11 14:28:50 +02:00
Micha Reiser 68969240c5
Format Function definitions (#4951) 2023-06-08 16:07:33 +00:00
Micha Reiser c1cc6f3be1
Add basic Constant formatting (#4954) 2023-06-08 11:42:44 +00:00
konstin 23abad0bd5
A basic StmtAssign formatter and better dummies for expressions (#4938)
* A basic StmtAssign formatter and better dummies for expressions

The goal of this PR was formatting StmtAssign since many nodes in the black tests (and in python in general) are after an assignment. This caused unstable formatting: The spacing of power op spacing depends on the type of the two involved expressions, but each expression was formatted as dummy string and re-parsed as a ExprName, so in the second round the different rules of ExprName were applied, causing unstable formatting.

This PR does not necessarily bring us closer to black's style, but it unlocks a good porting of black's test suite and is a basis for implementing the Expr nodes.

* fmt

* Review
2023-06-08 12:20:25 +02:00
Micha Reiser bcf745c5ba
Replace verbatim text with `NOT_YET_IMPLEMENTED` (#4904)
<!--
Thank you for contributing to Ruff! To help us out with reviewing, please consider the following:

- Does this pull request include a summary of the change? (See below.)
- Does this pull request include a descriptive title?
- Does this pull request include references to any relevant issues?
-->

## Summary

This PR replaces the `verbatim_text` builder with a `not_yet_implemented` builder that emits `NOT_YET_IMPLEMENTED_<NodeKind>` for not yet implemented nodes. 

The motivation for this change is that partially formatting compound statements can result in incorrectly indented code, which is a syntax error:

```python
def func_no_args():
  a; b; c
  if True: raise RuntimeError
  if False: ...
  for i in range(10):
    print(i)
    continue
```

Get's reformatted to

```python
def func_no_args():
    a; b; c
    if True: raise RuntimeError
    if False: ...
    for i in range(10):
    print(i)
    continue
```

because our formatter does not yet support `for` statements and just inserts the text from the source. 

## Downsides

Using an identifier will not work in all situations. For example, an identifier is invalid in an `Arguments ` position. That's why I kept `verbatim_text` around and e.g. use it in the `Arguments` formatting logic where incorrect indentations are impossible (to my knowledge). Meaning, `verbatim_text` we can opt in to `verbatim_text` when we want to iterate quickly on nodes that we don't want to provide a full implementation yet and using an identifier would be invalid. 

## Upsides

Running this on main discovered stability issues with the newline handling that were previously "hidden" because of the verbatim formatting. I guess that's an upside :)

## Test Plan

None?
2023-06-07 14:57:25 +02:00
Micha Reiser 3f032cf09d
Format binary expressions (#4862)
* Format Binary Expressions

* Extract NeedsParentheses trait
2023-06-06 08:34:53 +00:00
konstin 9bf168c0a4
Use dummy verbatim formatter for all nodes (#4755) 2023-06-01 08:25:26 +00:00
konstin 0945803427
Generate FormatRule definitions (#4724)
* Generate FormatRule definitions

* Generate verbatim output

* pub(crate) everything

* clippy fix

* Update crates/ruff_python_formatter/src/lib.rs

Co-authored-by: Micha Reiser <micha@reiser.io>

* Update crates/ruff_python_formatter/src/lib.rs

Co-authored-by: Micha Reiser <micha@reiser.io>

* stub out with Ok(()) again

* Update crates/ruff_python_formatter/src/lib.rs

Co-authored-by: Micha Reiser <micha@reiser.io>

* PyFormatContext::{contents, locator} with `#[allow(unused)]`

* Can't leak private type

* remove commented code

* Fix ruff errors

* pub struct Format{node} due to rust rules

---------

Co-authored-by: Julian LaNeve <lanevejulian@gmail.com>
Co-authored-by: Micha Reiser <micha@reiser.io>
2023-06-01 08:38:53 +02:00