This PR implements a modification (in preview) to fluent formatting for
method chains: We break _at_ the first call instead of _after_.
For example, we have the following diff between `main` and this PR (with
`line-length=8` so I don't have to stretch out the text):
```diff
x = (
- df.merge()
+ df
+ .merge()
.groupby()
.agg()
.filter()
)
```
## Explanation of current implementation
Recall that we traverse the AST to apply formatting. A method chain,
while read left-to-right, is stored in the AST "in reverse". So if we
start with something like
```python
a.b.c.d().e.f()
```
then the first syntax node we meet is essentially `.f()`. So we have to
peek ahead. And we actually _already_ do this in our current fluent
formatting logic: we peek ahead to count how many calls we have in the
chain to see whether we should be using fluent formatting or now.
In this implementation, we actually _record_ this number inside the enum
for `CallChainLayout`. That is, we make the variant `Fluent` hold an
`AttributeState`. This state can either be:
- The number of call-like attributes preceding the current attribute
- The state `FirstCallOrSubscript` which means we are at the first
call-like attribute in the chain (reading from left to right)
- The state `BeforeFirstCallOrSubscript` which means we are in the
"first group" of attributes, preceding that first call.
In our example, here's what it looks like at each attribute:
```
a.b.c.d().e.f @ Fluent(CallsOrSubscriptsPreceding(1))
a.b.c.d().e @ Fluent(CallsOrSubscriptsPreceding(1))
a.b.c.d @ Fluent(FirstCallOrSubscript)
a.b.c @ Fluent(BeforeFirstCallOrSubscript)
a.b @ Fluent(BeforeFirstCallOrSubscript)
```
Now, as we descend down from the parent expression, we pass along this
little piece of state and modify it as we go to track where we are. This
state doesn't do anything except when we are in `FirstCallOrSubscript`,
in which case we add a soft line break.
Closes#8598
---------
Co-authored-by: Brent Westbrook <36778786+ntBre@users.noreply.github.com>
## Summary
This PR makes two changes to our formatting of `lambda` expressions:
1. We now parenthesize the body expression if it expands
2. We now try to keep the parameters on a single line
The latter of these fixes#8179:
Black formatting and this PR's formatting:
```py
def a():
return b(
c,
d,
e,
f=lambda self, *args, **kwargs: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa(
*args, **kwargs
),
)
```
Stable Ruff formatting
```py
def a():
return b(
c,
d,
e,
f=lambda self,
*args,
**kwargs: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa(*args, **kwargs),
)
```
We don't parenthesize the body expression here because the call to
`aaaa...` has its own parentheses, but adding a binary operator shows
the new parenthesization:
```diff
@@ -3,7 +3,7 @@
c,
d,
e,
- f=lambda self, *args, **kwargs: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa(
- *args, **kwargs
- ) + 1,
+ f=lambda self, *args, **kwargs: (
+ aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa(*args, **kwargs) + 1
+ ),
)
```
This is actually a new divergence from Black, which formats this input
like this:
```py
def a():
return b(
c,
d,
e,
f=lambda self, *args, **kwargs: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa(
*args, **kwargs
)
+ 1,
)
```
But I think this is an improvement, unlike the case from #8179.
One other, smaller benefit is that because we now add parentheses to
lambda bodies, we also remove redundant parentheses:
```diff
@pytest.mark.parametrize(
"f",
[
- lambda x: (x.expanding(min_periods=5).cov(x, pairwise=True)),
- lambda x: (x.expanding(min_periods=5).corr(x, pairwise=True)),
+ lambda x: x.expanding(min_periods=5).cov(x, pairwise=True),
+ lambda x: x.expanding(min_periods=5).corr(x, pairwise=True),
],
)
def test_moment_functions_zero_length_pairwise(f):
```
## Test Plan
New tests taken from #8465 and probably a few more I should grab from
the ecosystem results.
---------
Co-authored-by: Micha Reiser <micha@reiser.io>
## Summary
This is a follow-up to #21868. As soon as I started merging #21868 into
#21385, I realized that I had missed a test case with `**kwargs` after
the `*args` parameter. Such a case is supposed to be formatted on one
line like:
```py
# input
(
lambda
# comment
*x,
**y: x
)
# output
(
lambda
# comment
*x, **y: x
)
```
which you can still see on the
[playground](https://play.ruff.rs/bd88d339-1358-40d2-819f-865bfcb23aef?secondary=Format),
but on `main` after #21868, this was formatted as:
```py
(
lambda
# comment
*x,
**y: x
)
```
because the leading comment on the first parameter caused the whole
group around the parameters to break.
Instead of making these comments leading comments on the first
parameter, this PR makes them leading comments on the parameters list as
a whole.
## Test Plan
New tests, and I will also try merging this into #21385 _before_ opening
it for review this time.
<hr>
(labeling `internal` since #21868 should not be released before some
kind of fix)
Summary
--
This PR makes two changes to comment placement in lambda parameters.
First, we
now insert a line break if the first parameter has a leading comment:
```py
# input
(
lambda
* # comment 2
x:
x
)
# main
(
lambda # comment 2
*x: x
)
# this PR
(
lambda
# comment 2
*x: x
)
```
Note the missing space in the output from main. This case is currently
unstable
on main. Also note that the new formatting is more consistent with our
stable
formatting in cases where the lambda has its own dangling comment:
```py
# input
(
lambda # comment 1
* # comment 2
x:
x
)
# output
(
lambda # comment 1
# comment 2
*x: x
)
```
and when a parameter without a comment precedes the split `*x`:
```py
# input
(
lambda y,
* # comment 2
x:
x
)
# output
(
lambda y,
# comment 2
*x: x
)
```
This does change the stable formatting, but I think such cases are rare
(expecting zero hits in the ecosystem report), this fixes an existing
instability, and it should not change any code we've previously
formatted.
Second, this PR modifies the comment placement such that `# comment 2`
in these
outputs is still a leading comment on the parameter. This is also not
the case
on main, where it becomes a [dangling lambda
comment](https://play.ruff.rs/3b29bb7e-70e4-4365-88e0-e60fe1857a35?secondary=Comments).
This doesn't cause any
instability that I'm aware of on main, but it does cause problems when
trying to
adjust the placement of dangling lambda comments in #21385. Changing the
placement in this way should not affect any formatting here.
Test Plan
--
New lambda tests, plus existing tests covering the cases above with
multiple
comments around the parameters (see lambda.py 122-143, and 122-205 or so
more
broadly)
I also checked manually that the comments are now leading on the
parameter:
```shell
❯ cargo run --bin ruff_python_formatter -- --emit stdout --target-version 3.10 --print-comments <<EOF
(
lambda
# comment 2
*x: x
)
EOF
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.15s
Running `target/debug/ruff_python_formatter --emit stdout --target-version 3.10 --print-comments`
# Comment decoration: Range, Preceding, Following, Enclosing, Comment
21..32, None, Some((Parameters, 37..39)), (ExprLambda, 6..42), "# comment 2"
{
Node {
kind: Parameter,
range: 37..39,
source: `*x`,
}: {
"leading": [
SourceComment {
text: "# comment 2",
position: OwnLine,
formatted: true,
},
],
"dangling": [],
"trailing": [],
},
}
(
lambda
# comment 2
*x: x
)
```
But I didn't see a great place to put a test like this. Is there
somewhere I can assert this comment placement since it doesn't affect
any formatting yet? Or is it okay to wait until we use this in #21385?
## Summary
Garbage collect ASTs once we are done checking a given file. Queries
with a cross-file dependency on the AST will reparse the file on demand.
This reduces ty's peak memory usage by ~20-30%.
The primary change of this PR is adding a `node_index` field to every
AST node, that is assigned by the parser. `ParsedModule` can use this to
create a flat index of AST nodes any time the file is parsed (or
reparsed). This allows `AstNodeRef` to simply index into the current
instance of the `ParsedModule`, instead of storing a pointer directly.
The indices are somewhat hackily (using an atomic integer) assigned by
the `parsed_module` query instead of by the parser directly. Assigning
the indices in source-order in the (recursive) parser turns out to be
difficult, and collecting the nodes during semantic indexing is
impossible as `SemanticIndex` does not hold onto a specific
`ParsedModuleRef`, which the pointers in the flat AST are tied to. This
means that we have to do an extra AST traversal to assign and collect
the nodes into a flat index, but the small performance impact (~3% on
cold runs) seems worth it for the memory savings.
Part of https://github.com/astral-sh/ty/issues/214.
## Summary
If a lambda doesn't contain any parameters, or any parameter _tokens_
(like `*`), we can use `None` for the parameters. This feels like a
better representation to me, since, e.g., what should the `TextRange` be
for a non-existent set of parameters? It also allows us to remove
several sites where we check if the `Parameters` is empty by seeing if
it contains any arguments, so semantically, we're already trying to
detect and model around this elsewhere.
Changing this also fixes a number of issues with dangling comments in
parameter-less lambdas, since those comments are now automatically
marked as dangling on the lambda. (As-is, we were also doing something
not-great whereby the lambda was responsible for formatting dangling
comments on the parameters, which has been removed.)
Closes https://github.com/astral-sh/ruff/issues/6646.
Closes https://github.com/astral-sh/ruff/issues/6647.
## Test Plan
`cargo test`
## Summary
Previously, the ruff formatter was removing the star argument of
`lambda` expressions when formatting.
Given the following code snippet
```python
lambda *a: ()
lambda **b: ()
```
it would be formatted to
```python
lambda: ()
lambda: ()
```
We fix this by checking for the presence of `args`, `vararg` or `kwarg`
in the `lambda` expression, before we were only checking for the
presence of `args`.
Fixes#5894
## Test Plan
Add new tests cases.
---------
Co-authored-by: Charlie Marsh <charlie.r.marsh@gmail.com>
## Summary
This PR renames a few AST nodes for clarity:
- `Arguments` is now `Parameters`
- `Arg` is now `Parameter`
- `ArgWithDefault` is now `ParameterWithDefault`
For now, the attribute names that reference `Parameters` directly are
changed (e.g., on `StmtFunctionDef`), but the attributes on `Parameters`
itself are not (e.g., `vararg`). We may revisit that decision in the
future.
For context, the AST node formerly known as `Arguments` is used in
function definitions. Formally (outside of the Python context),
"arguments" typically refers to "the values passed to a function", while
"parameters" typically refers to "the variables used in a function
definition". E.g., if you Google "arguments vs parameters", you'll get
some explanation like:
> A parameter is a variable in a function definition. It is a
placeholder and hence does not have a concrete value. An argument is a
value passed during function invocation.
We're thus deviating from Python's nomenclature in favor of a scheme
that we find to be more precise.
<!--
Thank you for contributing to Ruff! To help us out with reviewing, please consider the following:
- Does this pull request include a summary of the change? (See below.)
- Does this pull request include a descriptive title?
- Does this pull request include references to any relevant issues?
-->
## Summary
I started working on this because I assumed that I would need access to options inside of `NeedsParantheses` but it then turned out that I won't.
Anyway, it kind of felt nice to pass fewer arguments. So I'm gonna put this out here to get your feedback if you prefer this over passing individual fiels.
Oh, I sneeked in another change. I renamed `context.contents` to `source`. `contents` is too generic and doesn't tell you anything.
<!-- What's the purpose of the change? What does it do, and why? -->
## Test Plan
It compiles
* A basic StmtAssign formatter and better dummies for expressions
The goal of this PR was formatting StmtAssign since many nodes in the black tests (and in python in general) are after an assignment. This caused unstable formatting: The spacing of power op spacing depends on the type of the two involved expressions, but each expression was formatted as dummy string and re-parsed as a ExprName, so in the second round the different rules of ExprName were applied, causing unstable formatting.
This PR does not necessarily bring us closer to black's style, but it unlocks a good porting of black's test suite and is a basis for implementing the Expr nodes.
* fmt
* Review
<!--
Thank you for contributing to Ruff! To help us out with reviewing, please consider the following:
- Does this pull request include a summary of the change? (See below.)
- Does this pull request include a descriptive title?
- Does this pull request include references to any relevant issues?
-->
## Summary
This PR replaces the `verbatim_text` builder with a `not_yet_implemented` builder that emits `NOT_YET_IMPLEMENTED_<NodeKind>` for not yet implemented nodes.
The motivation for this change is that partially formatting compound statements can result in incorrectly indented code, which is a syntax error:
```python
def func_no_args():
a; b; c
if True: raise RuntimeError
if False: ...
for i in range(10):
print(i)
continue
```
Get's reformatted to
```python
def func_no_args():
a; b; c
if True: raise RuntimeError
if False: ...
for i in range(10):
print(i)
continue
```
because our formatter does not yet support `for` statements and just inserts the text from the source.
## Downsides
Using an identifier will not work in all situations. For example, an identifier is invalid in an `Arguments ` position. That's why I kept `verbatim_text` around and e.g. use it in the `Arguments` formatting logic where incorrect indentations are impossible (to my knowledge). Meaning, `verbatim_text` we can opt in to `verbatim_text` when we want to iterate quickly on nodes that we don't want to provide a full implementation yet and using an identifier would be invalid.
## Upsides
Running this on main discovered stability issues with the newline handling that were previously "hidden" because of the verbatim formatting. I guess that's an upside :)
## Test Plan
None?