Commit Graph

37 Commits

Author SHA1 Message Date
Dylan 4e1cf5747a
Fluent formatting of method chains (#21369)
This PR implements a modification (in preview) to fluent formatting for
method chains: We break _at_ the first call instead of _after_.

For example, we have the following diff between `main` and this PR (with
`line-length=8` so I don't have to stretch out the text):

```diff
 x = (
-    df.merge()
+    df
+    .merge()
     .groupby()
     .agg()
     .filter()
 )
```

## Explanation of current implementation

Recall that we traverse the AST to apply formatting. A method chain,
while read left-to-right, is stored in the AST "in reverse". So if we
start with something like

```python
a.b.c.d().e.f()
```

then the first syntax node we meet is essentially `.f()`. So we have to
peek ahead. And we actually _already_ do this in our current fluent
formatting logic: we peek ahead to count how many calls we have in the
chain to see whether we should be using fluent formatting or now.

In this implementation, we actually _record_ this number inside the enum
for `CallChainLayout`. That is, we make the variant `Fluent` hold an
`AttributeState`. This state can either be:

- The number of call-like attributes preceding the current attribute
- The state `FirstCallOrSubscript` which means we are at the first
call-like attribute in the chain (reading from left to right)
- The state `BeforeFirstCallOrSubscript` which means we are in the
"first group" of attributes, preceding that first call.

In our example, here's what it looks like at each attribute:

```
a.b.c.d().e.f @ Fluent(CallsOrSubscriptsPreceding(1))
a.b.c.d().e @ Fluent(CallsOrSubscriptsPreceding(1))
a.b.c.d @ Fluent(FirstCallOrSubscript)
a.b.c @ Fluent(BeforeFirstCallOrSubscript)
a.b @ Fluent(BeforeFirstCallOrSubscript)
```

Now, as we descend down from the parent expression, we pass along this
little piece of state and modify it as we go to track where we are. This
state doesn't do anything except when we are in `FirstCallOrSubscript`,
in which case we add a soft line break.

Closes #8598

---------

Co-authored-by: Brent Westbrook <36778786+ntBre@users.noreply.github.com>
2025-12-15 09:29:50 -06:00
Dylan 8156b45173
Avoid syntax error when formatting attribute expressions with outer parentheses, parenthesized value, and trailing comment on value (#20418)
Closes #19350 

This fixes a syntax error caused by formatting. However, the new tests reveal that there are some cases where formatting attributes with certain comments behaves strangely, both before and after this PR, so some more polish may be in order.

For example, without parentheses around the value, and both before and after this PR, we have:

```python
# unformatted
variable = (
    something # a comment
    .first_method("some string")
)

# formatted
variable = something.first_method("some string")  # a comment
```

which is probably not where the comment ought to go.
2025-11-17 09:11:36 -06:00
Ibraheem Ahmed c9dff5c7d5
[ty] AST garbage collection (#18482)
## Summary

Garbage collect ASTs once we are done checking a given file. Queries
with a cross-file dependency on the AST will reparse the file on demand.
This reduces ty's peak memory usage by ~20-30%.

The primary change of this PR is adding a `node_index` field to every
AST node, that is assigned by the parser. `ParsedModule` can use this to
create a flat index of AST nodes any time the file is parsed (or
reparsed). This allows `AstNodeRef` to simply index into the current
instance of the `ParsedModule`, instead of storing a pointer directly.

The indices are somewhat hackily (using an atomic integer) assigned by
the `parsed_module` query instead of by the parser directly. Assigning
the indices in source-order in the (recursive) parser turns out to be
difficult, and collecting the nodes during semantic indexing is
impossible as `SemanticIndex` does not hold onto a specific
`ParsedModuleRef`, which the pointers in the flat AST are tied to. This
means that we have to do an extra AST traversal to assign and collect
the nodes into a flat index, but the small performance impact (~3% on
cold runs) seems worth it for the memory savings.

Part of https://github.com/astral-sh/ty/issues/214.
2025-06-13 08:40:11 -04:00
Micha Reiser 9ae698fe30
Switch to Rust 2024 edition (#18129) 2025-05-16 13:25:28 +02:00
Micha Reiser 31180a84e4
Fix unstable formatting of trailing end-of-line comments of parenthesized attribute values (#16187) 2025-02-18 08:43:51 +01:00
Micha Reiser 6a1e555537
Upgrade to Rust 1.78 (#11260) 2024-05-03 12:46:21 +00:00
Micha Reiser 230c93459f
Delete redundant branch in `NeedsParentheses` (#8377) 2023-10-31 12:06:17 +00:00
Dhruv Manilawala 230c9ce236
Split `Constant` to individual literal nodes (#8064)
## Summary

This PR splits the `Constant` enum as individual literal nodes. It
introduces the following new nodes for each variant:
* `ExprStringLiteral`
* `ExprBytesLiteral`
* `ExprNumberLiteral`
* `ExprBooleanLiteral`
* `ExprNoneLiteral`
* `ExprEllipsisLiteral`

The main motivation behind this refactor is to introduce the new AST
node for implicit string concatenation in the coming PR. The elements of
that node will be either a string literal, bytes literal or a f-string
which can be implemented using an enum. This means that a string or
bytes literal cannot be represented by `Constant::Str` /
`Constant::Bytes` which creates an inconsistency.

This PR avoids that inconsistency by splitting the constant nodes into
it's own literal nodes, literal being the more appropriate naming
convention from a static analysis tool perspective.

This also makes working with literals in the linter and formatter much
more ergonomic like, for example, if one would want to check if this is
a string literal, it can be done easily using
`Expr::is_string_literal_expr` or matching against `Expr::StringLiteral`
as oppose to matching against the `ExprConstant` and enum `Constant`. A
few AST helper methods can be simplified as well which will be done in a
follow-up PR.

This introduces a new `Expr::is_literal_expr` method which is the same
as `Expr::is_constant_expr`. There are also intermediary changes related
to implicit string concatenation which are quiet less. This is done so
as to avoid having a huge PR which this already is.

## Test Plan

1. Verify and update all of the existing snapshots (parser, visitor)
2. Verify that the ecosystem check output remains **unchanged** for both
the linter and formatter

### Formatter ecosystem check

#### `main`

| project | similarity index | total files | changed files |

|----------------|------------------:|------------------:|------------------:|
| cpython | 0.75803 | 1799 | 1647 |
| django | 0.99983 | 2772 | 34 |
| home-assistant | 0.99953 | 10596 | 186 |
| poetry | 0.99891 | 317 | 17 |
| transformers | 0.99966 | 2657 | 330 |
| twine | 1.00000 | 33 | 0 |
| typeshed | 0.99978 | 3669 | 20 |
| warehouse | 0.99977 | 654 | 13 |
| zulip | 0.99970 | 1459 | 22 |

#### `dhruv/constant-to-literal`

| project | similarity index | total files | changed files |

|----------------|------------------:|------------------:|------------------:|
| cpython | 0.75803 | 1799 | 1647 |
| django | 0.99983 | 2772 | 34 |
| home-assistant | 0.99953 | 10596 | 186 |
| poetry | 0.99891 | 317 | 17 |
| transformers | 0.99966 | 2657 | 330 |
| twine | 1.00000 | 33 | 0 |
| typeshed | 0.99978 | 3669 | 20 |
| warehouse | 0.99977 | 654 | 13 |
| zulip | 0.99970 | 1459 | 22 |
2023-10-30 12:13:23 +05:30
Micha Reiser 8b665f40c8
Avoid parenthesizing octal/hex or binary literals in object positions (#8160) 2023-10-24 15:12:52 +01:00
Charlie Marsh d685107638
Move {AnyNodeRef, AstNode} to ruff_python_ast crate root (#8030)
This is a do-over of https://github.com/astral-sh/ruff/pull/8011, which
I accidentally merged into a non-`main` branch. Sorry!
2023-10-18 00:01:18 +00:00
konsti 2cbe1733c8
Use CommentRanges in backwards lexing (#7360)
## Summary

The tokenizer was split into a forward and a backwards tokenizer. The
backwards tokenizer uses the same names as the forwards ones (e.g.
`next_token`). The backwards tokenizer gets the comment ranges that we
already built to skip comments.

---------

Co-authored-by: Micha Reiser <micha@reiser.io>
2023-09-16 03:21:45 +00:00
Charlie Marsh 11287f944f
Avoid re-parenthesizing call chains whose inner values are parenthesized (#7373)
## Summary

Given a statement like:

```python
result = (
    f(111111111111111111111111111111111111111111111111111111111111111111111111111111111)
    + 1
)()
```

When we go to parenthesize the target of the assignment, we use
`maybe_parenthesize_expression` with `Parenthesize::IfBreaks`. This then
checks if the call on the right-hand side needs to be parenthesized, the
implementation of which looks like:

```rust
impl NeedsParentheses for ExprCall {
    fn needs_parentheses(
        &self,
        _parent: AnyNodeRef,
        context: &PyFormatContext,
    ) -> OptionalParentheses {
        if CallChainLayout::from_expression(self.into(), context.source())
            == CallChainLayout::Fluent
        {
            OptionalParentheses::Multiline
        } else if context.comments().has_dangling(self) {
            OptionalParentheses::Always
        } else {
            self.func.needs_parentheses(self.into(), context)
        }
    }
}
```

Checking for `self.func.needs_parentheses(self.into(), context)` is
problematic, since, as in the example above, `self.func` may _already_
be parenthesized -- in which case, we _don't_ want to parenthesize the
entire expression. If we do, we end up with this non-ideal formatting:

```python
result = (
    (
        f(
            111111111111111111111111111111111111111111111111111111111111111111111111111111111
        )
        + 1
    )()
)
```

This PR modifies the `NeedsParentheses` implementations for call chain
expressions to return `Never` if the inner expression has its own
parentheses, in which case, the formatting implementations for those
expressions will preserve them anyway.

Closes https://github.com/astral-sh/ruff/issues/7370.

## Test Plan

Zulip improves a bit, everything else is unchanged.

Before:

| project | similarity index | total files | changed files |

|--------------|------------------:|------------------:|------------------:|
| cpython | 0.76083 | 1789 | 1632 |
| django | 0.99981 | 2760 | 40 |
| transformers | 0.99944 | 2587 | 413 |
| twine | 1.00000 | 33 | 0 |
| typeshed | 0.99983 | 3496 | 18 |
| warehouse | 0.99834 | 648 | 20 |
| zulip | 0.99956 | 1437 | 23 |

After:

| project | similarity index | total files | changed files |

|--------------|------------------:|------------------:|------------------:|
| cpython | 0.76083 | 1789 | 1632 |
| django | 0.99981 | 2760 | 40 |
| transformers | 0.99944 | 2587 | 413 |
| twine | 1.00000 | 33 | 0 |
| typeshed | 0.99983 | 3496 | 18 |
| warehouse | 0.99834 | 648 | 20 |
| **zulip** | **0.99962** | **1437** | **22** |
2023-09-14 05:05:37 -04:00
Charlie Marsh ece30e7c69
Preserve parentheses around partial call chains (#7109) 2023-09-04 10:57:04 +01:00
Micha Reiser c05e4628b1
Introduce Token element (#7048) 2023-09-02 10:05:47 +02:00
Charlie Marsh fc89976c24
Move `Ranged` into `ruff_text_size` (#6919)
## Summary

The motivation here is that this enables us to implement `Ranged` in
crates that don't depend on `ruff_python_ast`.

Largely a mechanical refactor with a lot of regex, Clippy help, and
manual fixups.

## Test Plan

`cargo test`
2023-08-27 14:12:51 -04:00
Charlie Marsh edb9b0c62a
Use the formatter prelude in more files (#6882)
Removes a bunch of imports that are made redundant by the prelude.
2023-08-25 16:51:07 -04:00
Micha Reiser 29a0c1003b
Use `BestFit` layout even for attributes with a short name (#6872) 2023-08-25 17:47:02 +02:00
Charlie Marsh 59e70896c0
Fix formatting of comments between function and arguments (#6826)
## Summary

We now format comments between a function and its arguments as dangling.
Like with other strange placements, I've biased towards preserving the
existing formatting, rather than attempting to reorder the comments.

Closes https://github.com/astral-sh/ruff/issues/6818.

## Test Plan

`cargo test`

Before:

| project      | similarity index |
|--------------|------------------|
| cpython      | 0.76050          |
| django       | 0.99820          |
| transformers | 0.99800          |
| twine        | 0.99876          |
| typeshed     | 0.99953          |
| warehouse    | 0.99615          |
| zulip        | 0.99729          |

After:

| project      | similarity index |
|--------------|------------------|
| cpython      | 0.76050          |
| django       | 0.99820          |
| transformers | 0.99800          |
| twine        | 0.99876          |
| typeshed     | 0.99953          |
| warehouse    | 0.99615          |
| zulip        | 0.99729          |
2023-08-25 04:06:56 +00:00
Charlie Marsh 474e8fbcd4
Format all attribute dot comments manually (#6825)
## Summary

This PR modifies our formatting of comments around the `.` in an
attribute. Specifically, the goal here is to avoid _reordering_
comments, and the net effect is that we generally leave comments
where-they-are when dealing with comments between around the dot (which
you can also think of as comments between attributes).

All comments around the dot are now treated as dangling and formatted
manually, with the exception of end-of-line or parenthesized comments on
the value, like those marked as trailing here, which remain trailing:

```python
(
    (
        a # trailing end-of-line
        # trailing own-line
    ) # dangling before dot end-of-line
    .b # trailing end-of-line
)
```

Closes https://github.com/astral-sh/ruff/issues/6823.

## Test Plan

`cargo test`

Before:

| project      | similarity index |
|--------------|------------------|
| cpython      | 0.76050          |
| django       | 0.99820          |
| transformers | 0.99800          |
| twine        | 0.99876          |
| typeshed     | 0.99953          |
| warehouse    | 0.99615          |
| zulip        | 0.99729          |

After:

| project      | similarity index |
|--------------|------------------|
| cpython      | 0.76050          |
| django       | 0.99820   |
| transformers | 0.99800          |
| twine        | 0.99876          |
| typeshed     | 0.99953          |
| warehouse    | 0.99615          |
| zulip        | 0.99729          |
2023-08-25 03:50:56 +00:00
Micha Reiser 0cea4975fc
Rename Comments methods (#6649) 2023-08-18 06:37:01 +00:00
Micha Reiser 29c0b9f91c
Use single lookup for leading, dangling, and trailing comments (#6589) 2023-08-15 17:39:45 +02:00
Charlie Marsh f2939c678b
Avoid breaking call chains unnecessarily (#6488)
## Summary

This PR attempts to fix the formatting of the following expression:

```python
max_message_id = (
    Message.objects.filter(recipient=recipient).order_by("id").reverse()[0].id
)
```

Specifically, Black preserves _that_ formatting, while we do:

```python
max_message_id = (
    Message.objects.filter(recipient=recipient)
    .order_by("id")
    .reverse()[0]
    .id
)
```

The fix here is to add a group around the entire call chain.

## Test Plan

Before:

- `zulip`: 0.99702
- `django`: 0.99784
- `warehouse`: 0.99585
- `build`: 0.75623
- `transformers`: 0.99470
- `cpython`: 0.75989
- `typeshed`: 0.74853

After:

- `zulip`: 0.99703
- `django`: 0.99791
- `warehouse`: 0.99586
- `build`: 0.75623
- `transformers`: 0.99470
- `cpython`: 0.75989
- `typeshed`: 0.74853
2023-08-11 13:33:15 +00:00
konsti 99baad12d8
Call chain formatting in fluent style (#6151)
Implement fluent style/call chains. See the `call_chains.py` formatting
for examples.

This isn't fully like black because in `raise A from B` they allow `A`
breaking can influence the formatting of `B` even if it is already
multiline.

Similarity index:

| project      | main  | PR    |
|--------------|-------|-------|
| build        | ???   | 0.753 |
| django       | 0.991 | 0.998 |
| transformers | 0.993 | 0.994 |
| typeshed     | 0.723 | 0.723 |
| warehouse    | 0.978 | 0.994 |
| zulip        | 0.992 | 0.994 |

Call chain formatting is affected by
https://github.com/astral-sh/ruff/issues/627, but i'm cutting scope
here.

Closes #5343

**Test Plan**:
 * Added a dedicated call chains test file
 * The ecosystem checks found some bugs
 * I manually check django and zulip formatting

---------

Co-authored-by: Micha Reiser <micha@reiser.io>
2023-08-04 13:58:01 +00:00
Micha Reiser 6bf6646c5d Respect indent when measuring with `MeasureMode::AllLines` (#6120) 2023-07-27 10:22:13 -04:00
Micha Reiser 40f54375cb
Pull in RustPython parser (#6099) 2023-07-27 09:29:11 +00:00
Micha Reiser 2cf00fee96
Remove parser dependency from ruff-python-ast (#6096) 2023-07-26 17:47:22 +02:00
Micha Reiser 067b2a6ce6
Pass parent to `NeedsParentheses` (#5708) 2023-07-13 08:57:29 +02:00
Micha Reiser 8665a1a19d
Pass `FormatContext` to `NeedsParentheses`
<!--
Thank you for contributing to Ruff! To help us out with reviewing, please consider the following:

- Does this pull request include a summary of the change? (See below.)
- Does this pull request include a descriptive title?
- Does this pull request include references to any relevant issues?
-->

## Summary

I started working on this because I assumed that I would need access to options inside of `NeedsParantheses` but it then turned out that I won't. 
Anyway, it kind of felt nice to pass fewer arguments. So I'm gonna put this out here to get your feedback if you prefer this over passing individual fiels. 

Oh, I sneeked in another change. I renamed `context.contents` to `source`. `contents` is too generic and doesn't tell you anything. 

<!-- What's the purpose of the change? What does it do, and why? -->

## Test Plan

It compiles
2023-07-11 14:28:50 +02:00
konstin a52cd47c7f
Fix attribute chain own line comments (#5340)
## Motation

Previously,
```python
x = (
    a1
    .a2
    # a
    .  # b
    # c
    a3
)
```
got formatted as
```python
x = a1.a2
# a
.  # b
# c
a3
```
which is invalid syntax. This fixes that.

## Summary

This implements a basic form of attribute chaining
(<https://black.readthedocs.io/en/stable/the_black_code_style/current_style.html#call-chains>)
by checking if any inner attribute access contains an own line comment,
and if this is the case, adds parentheses around the outermost attribute
access while disabling parentheses for all inner attribute expressions.
We want to replace this with an implementation that uses recursion or a
stack while formatting instead of in `needs_parentheses` and also
includes calls rather sooner than later, but i'm fixing this now because
i'm uncomfortable with having known invalid syntax generation in the
formatter.

## Test Plan

I added new fixtures.
2023-06-26 09:13:07 +00:00
Micha Reiser ccf34aae8c
Format Attribute Expression (#5259) 2023-06-21 21:33:53 +00:00
Micha Reiser 68969240c5
Format Function definitions (#4951) 2023-06-08 16:07:33 +00:00
Micha Reiser c1cc6f3be1
Add basic Constant formatting (#4954) 2023-06-08 11:42:44 +00:00
konstin 23abad0bd5
A basic StmtAssign formatter and better dummies for expressions (#4938)
* A basic StmtAssign formatter and better dummies for expressions

The goal of this PR was formatting StmtAssign since many nodes in the black tests (and in python in general) are after an assignment. This caused unstable formatting: The spacing of power op spacing depends on the type of the two involved expressions, but each expression was formatted as dummy string and re-parsed as a ExprName, so in the second round the different rules of ExprName were applied, causing unstable formatting.

This PR does not necessarily bring us closer to black's style, but it unlocks a good porting of black's test suite and is a basis for implementing the Expr nodes.

* fmt

* Review
2023-06-08 12:20:25 +02:00
Micha Reiser bcf745c5ba
Replace verbatim text with `NOT_YET_IMPLEMENTED` (#4904)
<!--
Thank you for contributing to Ruff! To help us out with reviewing, please consider the following:

- Does this pull request include a summary of the change? (See below.)
- Does this pull request include a descriptive title?
- Does this pull request include references to any relevant issues?
-->

## Summary

This PR replaces the `verbatim_text` builder with a `not_yet_implemented` builder that emits `NOT_YET_IMPLEMENTED_<NodeKind>` for not yet implemented nodes. 

The motivation for this change is that partially formatting compound statements can result in incorrectly indented code, which is a syntax error:

```python
def func_no_args():
  a; b; c
  if True: raise RuntimeError
  if False: ...
  for i in range(10):
    print(i)
    continue
```

Get's reformatted to

```python
def func_no_args():
    a; b; c
    if True: raise RuntimeError
    if False: ...
    for i in range(10):
    print(i)
    continue
```

because our formatter does not yet support `for` statements and just inserts the text from the source. 

## Downsides

Using an identifier will not work in all situations. For example, an identifier is invalid in an `Arguments ` position. That's why I kept `verbatim_text` around and e.g. use it in the `Arguments` formatting logic where incorrect indentations are impossible (to my knowledge). Meaning, `verbatim_text` we can opt in to `verbatim_text` when we want to iterate quickly on nodes that we don't want to provide a full implementation yet and using an identifier would be invalid. 

## Upsides

Running this on main discovered stability issues with the newline handling that were previously "hidden" because of the verbatim formatting. I guess that's an upside :)

## Test Plan

None?
2023-06-07 14:57:25 +02:00
Micha Reiser 3f032cf09d
Format binary expressions (#4862)
* Format Binary Expressions

* Extract NeedsParentheses trait
2023-06-06 08:34:53 +00:00
konstin 9bf168c0a4
Use dummy verbatim formatter for all nodes (#4755) 2023-06-01 08:25:26 +00:00
konstin 0945803427
Generate FormatRule definitions (#4724)
* Generate FormatRule definitions

* Generate verbatim output

* pub(crate) everything

* clippy fix

* Update crates/ruff_python_formatter/src/lib.rs

Co-authored-by: Micha Reiser <micha@reiser.io>

* Update crates/ruff_python_formatter/src/lib.rs

Co-authored-by: Micha Reiser <micha@reiser.io>

* stub out with Ok(()) again

* Update crates/ruff_python_formatter/src/lib.rs

Co-authored-by: Micha Reiser <micha@reiser.io>

* PyFormatContext::{contents, locator} with `#[allow(unused)]`

* Can't leak private type

* remove commented code

* Fix ruff errors

* pub struct Format{node} due to rust rules

---------

Co-authored-by: Julian LaNeve <lanevejulian@gmail.com>
Co-authored-by: Micha Reiser <micha@reiser.io>
2023-06-01 08:38:53 +02:00