Commit Graph

468 Commits

Author SHA1 Message Date
Dhruv Manilawala 13ffb5bc19
Replace LALRPOP parser with hand-written parser (#10036)
(Supersedes #9152, authored by @LaBatata101)

## Summary

This PR replaces the current parser generated from LALRPOP to a
hand-written recursive descent parser.

It also updates the grammar for [PEP
646](https://peps.python.org/pep-0646/) so that the parser outputs the
correct AST. For example, in `data[*x]`, the index expression is now a
tuple with a single starred expression instead of just a starred
expression.

Beyond the performance improvements, the parser is also error resilient
and can provide better error messages. The behavior as seen by any
downstream tools isn't changed. That is, the linter and formatter can
still assume that the parser will _stop_ at the first syntax error. This
will be updated in the following months.

For more details about the change here, refer to the PR corresponding to
the individual commits and the release blog post.

## Test Plan

Write _lots_ and _lots_ of tests for both valid and invalid syntax and
verify the output.

## Acknowledgements

- @MichaReiser for reviewing 100+ parser PRs and continuously providing
guidance throughout the project
- @LaBatata101 for initiating the transition to a hand-written parser in
#9152
- @addisoncrump for implementing the fuzzer which helped
[catch](https://github.com/astral-sh/ruff/pull/10903)
[a](https://github.com/astral-sh/ruff/pull/10910)
[lot](https://github.com/astral-sh/ruff/pull/10966)
[of](https://github.com/astral-sh/ruff/pull/10896)
[bugs](https://github.com/astral-sh/ruff/pull/10877)

---------

Co-authored-by: Victor Hugo Gomes <labatata101@linuxmail.org>
Co-authored-by: Micha Reiser <micha@reiser.io>
2024-04-18 17:57:39 +05:30
Alex Waygood f779babc5f
Improve handling of builtin symbols in linter rules (#10919)
Add a new method to the semantic model to simplify and improve the correctness of a common pattern
2024-04-16 11:37:31 +01:00
renovate[bot] 388658efdb
Update pre-commit dependencies (#10698)
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: Zanie Blue <contact@zanie.dev>
Co-authored-by: Alex Waygood <alex.waygood@gmail.com>
2024-04-06 23:00:41 +00:00
Charlie Marsh 7fb5f47efe
Respect `# noqa` directives on `__all__` openers (#10798)
## Summary

Historically, given:

```python
__all__ = [  # noqa: F822
    "Bernoulli",
    "Beta",
    "Binomial",
]
```

The F822 violations would be attached to the `__all__`, so this `# noqa`
would be enforced for _all_ definitions in the list. This changed in
https://github.com/astral-sh/ruff/pull/10525 for the better, in that we
now use the range of each string. But these `# noqa` directives stopped
working.

This PR sets the `__all__` as a parent range in the diagnostic, so that
these directives are respected once again.

Closes https://github.com/astral-sh/ruff/issues/10795.

## Test Plan

`cargo test`
2024-04-06 14:51:17 +00:00
Dhruv Manilawala 99dd3a8ab0
Implement `as_str` & `Display` for all operator enums (#10691)
## Summary

This PR adds the `as_str` implementation for all the operator methods.
It already exists for `CmpOp` which is being [used in the
linter](ffcd77860c/crates/ruff_linter/src/rules/flake8_simplify/rules/key_in_dict.rs (L117))
and it makes sense to implement it for the rest as well. This will also
be utilized in error messages for the new parser.
2024-04-02 10:34:36 +00:00
Dhruv Manilawala eee2d5b915
Remove unused operator methods and impl (#10690)
## Summary

This PR removes unused operator methods and impl traits. There is
already the `is_macro::Is` implementation for all the operators and this
seems unnecessary.
2024-04-02 15:53:20 +05:30
Alex Waygood a06ffeb54e
Track ranges of names inside `__all__` definitions (#10525) 2024-03-22 18:38:40 +00:00
Charlie Marsh 60fd98eb2f
Update Rust to v1.77 (#10510) 2024-03-21 12:10:33 -04:00
Alex Waygood 7caf0d064a
Simplify formatting of strings by using flags from the AST nodes (#10489) 2024-03-20 16:16:54 +00:00
Alex Waygood 162d2eb723
Track casing of r-string prefixes in the tokenizer and AST (#10314)
Co-authored-by: Micha Reiser <micha@reiser.io>
2024-03-18 17:18:04 +00:00
Dhruv Manilawala 5f40371ffc
Use `ExprFString` for `StringLike::FString` variant (#10311)
## Summary

This PR updates the `StringLike::FString` variant to use `ExprFString`
instead of `FStringLiteralElement`.

For context, the reason it used `FStringLiteralElement` is that the node
is actually the string part of an f-string ("foo" in `f"foo{x}"`). But,
this is inconsistent with other variants where the captured value is the
_entire_ string.

This is also problematic w.r.t. implicitly concatenated strings. Any
rules which work with `StringLike::FString` doesn't account for the
string part in an implicitly concatenated f-strings. For example, we
don't flag confusable character in the first part of `"𝐁ad" f"𝐁ad
string"`, but only the second part
(https://play.ruff.rs/16071c4c-a1dd-4920-b56f-e2ce2f69c843).

### Update `PYI053`

_This is included in this PR because otherwise it requires a temporary
workaround to be compatible with the old logic._

This PR also updates the `PYI053` (`string-or-bytes-too-long`) rule for
f-string to consider _all_ the visible characters in a f-string,
including the ones which are implicitly concatenated. This is consistent
with implicitly concatenated strings and bytes.

For example,

```python
def foo(
	# We count all the characters here
    arg1: str = '51 character ' 'stringgggggggggggggggggggggggggggggggg',
	# But not here because of the `{x}` replacement field which _breaks_ them up into two chunks
    arg2: str = f'51 character {x} stringgggggggggggggggggggggggggggggggggggggggggggg',
) -> None: ...
```

This PR fixes it to consider all _visible_ characters inside an f-string
which includes expressions as well.

fixes: #10310 
fixes: #10307 

## Test Plan

Add new test cases and update the snapshots.

## Review

To facilitate the review process, the change have been split into two
commits: one which has the code change while the other has the test
cases and updated snapshots.
2024-03-14 13:30:22 +05:30
Alex Waygood c2e15f38ee
Unify enums used for internal representation of quoting style (#10383) 2024-03-13 17:19:17 +00:00
Dhruv Manilawala 32d6f84e3d
Add methods to iter over f-string elements (#10309)
## Summary

This PR adds methods on `FString` to iterate over the two different kind
of elements it can have - literals and expressions. This is similar to
the methods we have on `ExprFString`.

---------

Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
2024-03-13 08:46:55 +00:00
Charlie Marsh dbf82233b8
Gate f-string struct size test for Rustc < 1.76 (#10371)
Closes https://github.com/astral-sh/ruff/issues/10319.
2024-03-12 15:46:36 -04:00
Alex Waygood 1d97f27335
Start tracking quoting style in the AST (#10298)
This PR modifies our AST so that nodes for string literals, bytes literals and f-strings all retain the following information:
- The quoting style used (double or single quotes)
- Whether the string is triple-quoted or not
- Whether the string is raw or not

This PR is a followup to #10256. Like with that PR, this PR does not, in itself, fix any bugs. However, it means that we will have the necessary information to preserve quoting style and rawness of strings in the `ExprGenerator` in a followup PR, which will allow us to provide a fix for https://github.com/astral-sh/ruff/issues/7799.

The information is recorded on the AST nodes using a bitflag field on each node, similarly to how we recorded the information on `Tok::String`, `Tok::FStringStart` and `Tok::FStringMiddle` tokens in #10298. Rather than reusing the bitflag I used for the tokens, however, I decided to create a custom bitflag for each AST node.

Using different bitflags for each node allows us to make invalid states unrepresentable: it is valid to set a `u` prefix on a string literal, but not on a bytes literal or an f-string. It also allows us to have better debug representations for each AST node modified in this PR.
2024-03-08 19:11:47 +00:00
Micha Reiser 8ea5b08700
refactor: Use `QualifiedName` for `Imported::call_path` (#10214)
## Summary

When you try to remove an internal representation leaking into another
type and end up rewriting a simple version of `smallvec`.

The goal of this PR is to replace the `Box<[&'a str]>` with
`Box<QualifiedName>` to avoid that the internal `QualifiedName`
representation leaks (and it gives us a nicer API too). However, doing
this when `QualifiedName` uses `SmallVec` internally gives us all sort
of funny lifetime errors. I was lost but @BurntSushi came to rescue me.
He figured out that `smallvec` has a variance problem which is already
tracked in https://github.com/servo/rust-smallvec/issues/146

To fix the variants problem, I could use the smallvec-2-alpha-4 or
implement our own smallvec. I went with implementing our own small vec
for this specific problem. It obviously isn't as sophisticated as
smallvec (only uses safe code), e.g. it doesn't perform any size
optimizations, but it does its job.

Other changes:

* Removed `Imported::qualified_name` (the version that returns a
`String`). This can be replaced by calling `ToString` on the qualified
name.
* Renamed `Imported::call_path` to `qualified_name` and changed its
return type to `&QualifiedName`.
* Renamed `QualifiedName::imported` to `user_defined` which is the more
common term when talking about builtins vs the rest/user defined
functions.


## Test plan

`cargo test`
2024-03-06 09:55:59 +01:00
Micha Reiser 184241f99a
Remove `Expr` postfix from `ExprNamed`, `ExprIf`, and `ExprGenerator` (#10229)
The expression types in our AST are called `ExprYield`, `ExprAwait`,
`ExprStringLiteral` etc, except `ExprNamedExpr`, `ExprIfExpr` and
`ExprGenratorExpr`. This seems to align with [Python AST's
naming](https://docs.python.org/3/library/ast.html) but feels
inconsistent and excessive.

This PR removes the `Expr` postfix from `ExprNamedExpr`, `ExprIfExpr`,
and `ExprGeneratorExpr`.
2024-03-04 12:55:01 +01:00
Micha Reiser a6d892b1f4
Split `CallPath` into `QualifiedName` and `UnqualifiedName` (#10210)
## Summary

Charlie can probably explain this better than I but it turns out,
`CallPath` is used for two different things:

* To represent unqualified names like `version` where `version` can be a
local variable or imported (e.g. `from sys import version` where the
full qualified name is `sys.version`)
* To represent resolved, full qualified names

This PR splits `CallPath` into two types to make this destinction clear.

> Note: I haven't renamed all `call_path` variables to `qualified_name`
or `unqualified_name`. I can do that if that's welcomed but I first want
to get feedback on the approach and naming overall.

## Test Plan

`cargo test`
2024-03-04 09:06:51 +00:00
Micha Reiser db25a563f7
Remove unneeded lifetime bounds (#10213)
## Summary

This PR removes the unneeded lifetime `'b` from many of our `Visitor`
implementations.

The lifetime is unneeded because it is only constraint by `'a`, so we
can use `'a` directly.

## Test Plan

`cargo build`
2024-03-03 18:12:11 +00:00
Micha Reiser e725b6fdaf
CallPath newtype wrapper (#10201)
## Summary

This PR changes the `CallPath` type alias to a newtype wrapper. 

A newtype wrapper allows us to limit the API and to experiment with
alternative ways to implement matching on `CallPath`s.



## Test Plan

`cargo test`
2024-03-03 16:54:24 +01:00
Jane Lewis 0293908b71
Implement RUF028 to detect useless formatter suppression comments (#9899)
<!--
Thank you for contributing to Ruff! To help us out with reviewing,
please consider the following:

- Does this pull request include a summary of the change? (See below.)
- Does this pull request include a descriptive title?
- Does this pull request include references to any relevant issues?
-->
Fixes #6611

## Summary

This lint rule spots comments that are _intended_ to suppress or enable
the formatter, but will be ignored by the Ruff formatter.

We borrow some functions the formatter uses for determining comment
placement / putting them in context within an AST.

The analysis function uses an AST visitor to visit each comment and
attach it to the AST. It then uses that context to check:
1. Is this comment in an expression?
2. Does this comment have bad placement? (e.g. a `# fmt: skip` above a
function instead of at the end of a line)
3. Is this comment redundant?
4. Does this comment actually suppress any code?
5. Does this comment have ambiguous placement? (e.g. a `# fmt: off`
above an `else:` block)

If any of these are true, a violation is thrown. The reported reason
depends on the order of the above check-list: in other words, a `# fmt:
skip` comment on its own line within a list expression will be reported
as being in an expression, since that reason takes priority.

The lint suggests removing the comment as an unsafe fix, regardless of
the reason.

## Test Plan

A snapshot test has been created.
2024-02-28 19:21:06 +00:00
Micha Reiser 77c5561646
Add `parenthesized` flag to `ExprTuple` and `ExprGenerator` (#9614) 2024-02-26 15:35:20 +00:00
Charlie Marsh 0304623878
[`perflint`] Catch a wider range of mutations in `PERF101` (#9955)
## Summary

This PR ensures that if a list `x` is modified within a `for` loop, we
avoid flagging `list(x)` as unnecessary. Previously, we only detected
calls to exactly `.append`, and they couldn't be nested within other
statements.

Closes https://github.com/astral-sh/ruff/issues/9925.
2024-02-12 12:17:55 -05:00
Charlie Marsh 6f0e4ad332
Remove unnecessary string cloning from the parser (#9884)
Closes https://github.com/astral-sh/ruff/issues/9869.
2024-02-09 16:03:27 -05:00
Charlie Marsh 49fe1b85f2
Reduce size of `Expr` from 80 to 64 bytes (#9900)
## Summary

This PR reduces the size of `Expr` from 80 to 64 bytes, by reducing the
sizes of...

- `ExprCall` from 72 to 56 bytes, by using boxed slices for `Arguments`.
- `ExprCompare` from 64 to 48 bytes, by using boxed slices for its
various vectors.

In testing, the parser gets a bit faster, and the linter benchmarks
improve quite a bit.
2024-02-09 02:53:13 +00:00
Micha Reiser fe7d965334
Reduce `Result<Tok, LexicalError>` size by using `Box<str>` instead of `String` (#9885) 2024-02-08 20:36:22 +00:00
Micha Reiser 688177ff6a
Use Rust 1.76 (#9897) 2024-02-08 18:20:08 +00:00
Charlie Marsh daae28efc7
Respect `async with` in `timeout-without-await` (#9859)
Closes https://github.com/astral-sh/ruff/issues/9855.
2024-02-06 12:04:24 -05:00
Dhruv Manilawala 36b752876e
Implement `AnyNode`/`AnyNodeRef` for `FStringFormatSpec` (#9836)
## Summary

This PR adds the `AnyNode` and `AnyNodeRef` implementation for
`FStringFormatSpec` node which will be required in the f-string
formatting.

The main usage for this is so that we can pass in the node directly to
`suppressed_node` in case debug expression is used to format is as
verbatim text.
2024-02-05 19:23:43 +00:00
Charlie Marsh ea1c089652
Use `AhoCorasick` to speed up quote match (#9773)
<!--
Thank you for contributing to Ruff! To help us out with reviewing,
please consider the following:

- Does this pull request include a summary of the change? (See below.)
- Does this pull request include a descriptive title?
- Does this pull request include references to any relevant issues?
-->

## Summary

When I was looking at the v0.2.0 release, this method showed up in a
CodSpeed regression (we were calling it more), so I decided to quickly
look at speeding it up. @BurntSushi suggested using Aho-Corasick, and it
looks like it's about 7 or 8x faster:

```text
Parser/AhoCorasick      time:   [8.5646 ns 8.5914 ns 8.6191 ns]
Parser/Iterator         time:   [64.992 ns 65.124 ns 65.271 ns]
```

## Test Plan

`cargo test`
2024-02-02 09:57:39 -05:00
Micha Reiser ce14f4dea5
Range formatting API (#9635) 2024-01-31 11:13:37 +01:00
Micha Reiser 5fe0fdd0a8
Delete `is_node_with_body` method (#9643) 2024-01-25 14:41:13 +00:00
Alex Waygood a1e65a92bd
Move `is_tuple_parenthesized` from the formatter to `ruff_python_ast` (#9533)
This allows it to be used in the linter as well as the formatter. It
will be useful in #9474
2024-01-15 16:10:40 +00:00
Chammika Mannakkara 0003c730e0
[`flake8-simplify`] Implement `enumerate-for-loop` (`SIM113`) (#7777)
Implements SIM113 from #998

Added tests
Limitations 
   - No fix yet
   - Only flag cases where index variable immediately precede `for` loop

@charliermarsh please review and let me know any improvements

---------

Co-authored-by: Charlie Marsh <charlie.r.marsh@gmail.com>
2024-01-14 11:00:59 -05:00
Charlie Marsh 009430e034
[`ruff`] Avoid treating named expressions as static keys (`RUF011`) (#9494)
Closes https://github.com/astral-sh/ruff/issues/9487.
2024-01-12 14:33:45 -05:00
Charlie Marsh a31a314b2b
Account for possibly-empty f-string values in truthiness logic (#9484)
Closes https://github.com/astral-sh/ruff/issues/9479.
2024-01-11 21:16:19 -05:00
Micha Reiser f192c72596
Remove type parameter from `parse_*` methods (#9466) 2024-01-11 19:41:19 +01:00
Micha Reiser 94968fedd5
Use Rust 1.75 toolchain (#9437) 2024-01-08 18:03:16 +01:00
Charlie Marsh 701697c37e
Support variable keys in static dictionary key rule (#9411)
Closes https://github.com/astral-sh/ruff/issues/9410.
2024-01-06 20:44:40 +00:00
Charlie Marsh 0e202718fd
Misc. small tweaks from perusing modules (#9383) 2024-01-03 12:30:25 -05:00
Charlie Marsh 9073220887
Make all dependencies workspace dependencies (#9333)
## Summary

This PR modifies our `Cargo.toml` files to use workspace dependencies
for _all_ dependencies, rather than the status quo of sporadically
trying to use workspace dependencies for those dependencies that are
used across multiple crates. I find the current situation more confusing
and harder to manage, since we have a mix of workspace and crate-local
dependencies, whereas this setup consistently uses the same approach for
all dependencies.
2024-01-02 13:41:59 +00:00
Charlie Marsh 1c9268d2ff
Remove some unused dependencies (#9330) 2023-12-31 07:38:16 -05:00
Charlie Marsh e80260a3c5
Remove source path from parser errors (#9322)
## Summary

I always found it odd that we had to pass this in, since it's really
higher-level context for the error. The awkwardness is further evidenced
by the fact that we pass in fake values everywhere (even outside of
tests). The source path isn't actually used to display the error; it's
only accessed elsewhere to _re-display_ the error in certain cases. This
PR modifies to instead pass the path directly in those cases.
2023-12-30 20:33:05 +00:00
Charlie Marsh eb9a1bc5f1
Use consistent re-export from `ruff_source_file` (#9320)
Right now, we both re-export (via `pub use`) and mark the modules
themselves a `pub`, so they can be imported through two different paths.
2023-12-30 14:48:45 -05:00
Charlie Marsh 2895e7d126
Respect mixed `return` and `raise` cases in return-type analysis (#9310)
## Summary

Given:

```python
from somewhere import get_cfg

def lookup_cfg(cfg_description):
    cfg = get_cfg(cfg_description)
    if cfg is not None:
        return cfg
    raise AttributeError(f"No cfg found matching {cfg_description}")
```

We were analyzing the method from last-to-first statement. So we saw the
`raise`, then assumed the method _always_ raised. In reality, though, it
_might_ return. This PR improves the branch analysis to respect these
mixed cases.

Closes https://github.com/astral-sh/ruff/issues/9269.
Closes https://github.com/astral-sh/ruff/issues/9304.
2023-12-29 16:46:37 +00:00
Charlie Marsh a9ceef5b5d
[`ruff`] Add `never-union` rule to detect redundant `typing.NoReturn` and `typing.Never` (#9217)
## Summary

Adds a rule to detect unions that include `typing.NoReturn` or
`typing.Never`. In such cases, the use of the bottom type is redundant.

Closes https://github.com/astral-sh/ruff/issues/9113.

## Test Plan

`cargo test`
2023-12-21 20:53:31 +00:00
Charlie Marsh 5ccc21aea2
Add support for `NoReturn` in auto-return-typing (#9206)
## Summary

Given a function like:

```python
def func(x: int):
    if not x:
        raise ValueError
    else:
        raise TypeError
```

We now correctly use `NoReturn` as the return type, rather than `None`.

Closes https://github.com/astral-sh/ruff/issues/9201.
2023-12-20 00:06:31 -05:00
Dhruv Manilawala 18452cf477
Add `as_slice` method for all string nodes (#9111)
This PR adds a `as_slice` method to all the string nodes which returns
all the parts of the nodes as a slice. This will be useful in the next
PR to split the string formatting to use this method to extract the
_single node_ or _implicitly concanated nodes_.
2023-12-13 06:31:20 +00:00
Dhruv Manilawala 96ae9fe685
Introduce `StringLike` enum (#9016)
## Summary

This PR introduces a new `StringLike` enum which is a narrow type to
indicate string-like nodes. These includes the string literals, bytes
literals, and the literal parts of f-strings.

The main motivation behind this is to avoid repetition of rule calling
in the AST checker. We add a new `analyze::string_like` function which
takes in the enum and calls all the respective rule functions which
expects atleast 2 of the variants of this enum.

I'm open to discarding this if others think it's not that useful at this
stage as currently only 3 rules require these nodes.

As suggested
[here](https://github.com/astral-sh/ruff/pull/8835#discussion_r1414746934)
and
[here](https://github.com/astral-sh/ruff/pull/8835#discussion_r1414750204).

## Test Plan

`cargo test`
2023-12-07 16:39:13 +00:00
Dhruv Manilawala cdac90ef68
New AST nodes for f-string elements (#8835)
Rebase of #6365 authored by @davidszotten.

## Summary

This PR updates the AST structure for an f-string elements.

The main **motivation** behind this change is to have a dedicated node
for the string part of an f-string. Previously, the existing
`ExprStringLiteral` node was used for this purpose which isn't exactly
correct. The `ExprStringLiteral` node should include the quotes as well
in the range but the f-string literal element doesn't include the quote
as it's a specific part within an f-string. For example,

```python
f"foo {x}"
# ^^^^
# This is the literal part of an f-string
```

The introduction of `FStringElement` enum is helpful which represent
either the literal part or the expression part of an f-string.

### Rule Updates

This means that there'll be two nodes representing a string depending on
the context. One for a normal string literal while the other is a string
literal within an f-string. The AST checker is updated to accommodate
this change. The rules which work on string literal are updated to check
on the literal part of f-string as well.

#### Notes

1. The `Expr::is_literal_expr` method would check for
`ExprStringLiteral` and return true if so. But now that we don't
represent the literal part of an f-string using that node, this improves
the method's behavior and confines to the actual expression. We do have
the `FStringElement::is_literal` method.
2. We avoid checking if we're in a f-string context before adding to
`string_type_definitions` because the f-string literal is now a
dedicated node and not part of `Expr`.
3. Annotations cannot use f-string so we avoid changing any rules which
work on annotation and checks for `ExprStringLiteral`.

## Test Plan

- All references of `Expr::StringLiteral` were checked to see if any of
the rules require updating to account for the f-string literal element
node.
- New test cases are added for rules which check against the literal
part of an f-string.
- Check the ecosystem results and ensure it remains unchanged.

## Performance

There's a performance penalty in the parser. The reason for this remains
unknown as it seems that the generated assembly code is now different
for the `__reduce154` function. The reduce function body is just popping
the `ParenthesizedExpr` on top of the stack and pushing it with the new
location.

- The size of `FStringElement` enum is the same as `Expr` which is what
it replaces in `FString::format_spec`
- The size of `FStringExpressionElement` is the same as
`ExprFormattedValue` which is what it replaces

I tried reducing the `Expr` enum from 80 bytes to 72 bytes but it hardly
resulted in any performance gain. The difference can be seen here:
- Original profile: https://share.firefox.dev/3Taa7ES
- Profile after boxing some node fields:
https://share.firefox.dev/3GsNXpD

### Backtracking

I tried backtracking the changes to see if any of the isolated change
produced this regression. The problem here is that the overall change is
so small that there's only a single checkpoint where I can backtrack and
that checkpoint results in the same regression. This checkpoint is to
revert using `Expr` to the `FString::format_spec` field. After this
point, the change would revert back to the original implementation.

## Review process

The review process is similar to #7927. The first set of commits update
the node structure, parser, and related AST files. Then, further commits
update the linter and formatter part to account for the AST change.

---------

Co-authored-by: David Szotten <davidszotten@gmail.com>
2023-12-07 10:28:05 -06:00
Dhruv Manilawala ef7778d794
Fix preorder visitor tests (#9025)
Follow-up PR to #9009 to fix the `PreorderVisitor` test cases as
suggested here: https://github.com/astral-sh/ruff/pull/9009#discussion_r1416459688
2023-12-06 16:58:51 +00:00
Dhruv Manilawala bd443ebe91
Add visitor tests for strings, bytes, f-strings (#9009)
This PR adds tests for visitor implementation for string literals, bytes
literals and f-strings.
2023-12-06 10:52:19 -06:00
Micha Reiser 7e390d3772
Move `ParenthesizedExpr` to `ruff_python_parser` (#8987) 2023-12-04 05:36:28 +00:00
Charlie Marsh e5db72459e
Detect implicit returns in auto-return-types (#8952)
## Summary

Adds detection for branches without a `return` or `raise`, so that we
can properly `Optional` the return types. I'd like to remove this and
replace it with our code graph analysis from the `unreachable.rs` rule,
but it at least fixes the worst offenders.

Closes #8942.
2023-12-01 12:35:01 -05:00
Charlie Marsh 6435e4e4aa
Enable auto-return-type involving `Optional` and `Union` annotations (#8885)
## Summary

Previously, this was only supported for Python 3.10 and later, since we
always use the PEP 604-style unions.
2023-11-28 18:35:55 -08:00
Dhruv Manilawala ec7456bac0
Rename `as_str` to `to_str` (#8886)
This PR renames the method on `StringLiteralValue` from `as_str` to
`to_str`. The main motivation is to follow the naming convention as
described in the [Rust API
Guidelines](https://rust-lang.github.io/api-guidelines/naming.html#ad-hoc-conversions-follow-as_-to_-into_-conventions-c-conv).
This method can perform a string allocation in case the string is
implicitly concatenated.
2023-11-28 18:50:42 -06:00
Dhruv Manilawala b28556d739
Update `E402` to work at cell level for notebooks (#8872)
## Summary

This PR updates the `E402` rule to work at cell level for Jupyter
notebooks. This is enabled only in preview to gather feedback.

The implementation basically resets the import boundary flag on the
semantic model when we encounter the first statement in a cell.

Another potential solution is to introduce `E403` rule that is
specifically for notebooks that works at cell level while `E402` will be
disabled for notebooks.

## Test Plan

Add a notebook with imports in multiple cells and verify that the rule
works as expected.

resolves: #8669
2023-11-29 00:32:35 +00:00
Dhruv Manilawala 501cca8b72
Remove `#[allow(unused_variables)]` from visitor methods (#8828)
Small follow-up to remove `#[allow(unused_variables)]` from visitor
methods and use underscore prefix for unused variables instead.
2023-11-25 00:09:46 +00:00
Dhruv Manilawala 626b0577cd
Explicit `as_str` (no deref), add no allocation methods (#8826)
## Summary

This PR is a follow-up to the AST refactor which does the following:
- Remove `Deref` implementation on `StringLiteralValue` and use explicit
`as_str` calls instead. The `Deref` implementation would implicitly
perform allocations in case of implicitly concatenated strings. This is
to make sure the allocation is explicit.
- Now, certain methods can be implemented to do zero allocations which
have been implemented in this PR. They are:
    - `is_empty`
    - `len`
    - `chars`
    - Custom `PartialEq` implementation to compare each character

## Test Plan

Run the linter test suite and make sure all tests pass.
2023-11-25 00:03:59 +00:00
Dhruv Manilawala 017e829115
Update string nodes for implicit concatenation (#7927)
## Summary

This PR updates the string nodes (`ExprStringLiteral`,
`ExprBytesLiteral`, and `ExprFString`) to account for implicit string
concatenation.

### Motivation

In Python, implicit string concatenation are joined while parsing
because the interpreter doesn't require the information for each part.
While that's feasible for an interpreter, it falls short for a static
analysis tool where having such information is more useful. Currently,
various parts of the code uses the lexer to get the individual string
parts.

One of the main challenge this solves is that of string formatting.
Currently, the formatter relies on the lexer to get the individual
string parts, and formats them including the comments accordingly. But,
with PEP 701, f-string can also contain comments. Without this change,
it becomes very difficult to add support for f-string formatting.

### Implementation

The initial proposal was made in this discussion:
https://github.com/astral-sh/ruff/discussions/6183#discussioncomment-6591993.
There were various AST designs which were explored for this task which
are available in the linked internal document[^1].

The selected variant was the one where the nodes were kept as it is
except that the `implicit_concatenated` field was removed and instead a
new struct was added to the `Expr*` struct. This would be a private
struct would contain the actual implementation of how the AST is
designed for both single and implicitly concatenated strings.

This implementation is achieved through an enum with two variants:
`Single` and `Concatenated` to avoid allocating a vector even for single
strings. There are various public methods available on the value struct
to query certain information regarding the node.

The nodes are structured in the following way:

```
ExprStringLiteral - "foo" "bar"
|- StringLiteral - "foo"
|- StringLiteral - "bar"

ExprBytesLiteral - b"foo" b"bar"
|- BytesLiteral - b"foo"
|- BytesLiteral - b"bar"

ExprFString - "foo" f"bar {x}"
|- FStringPart::Literal - "foo"
|- FStringPart::FString - f"bar {x}"
  |- StringLiteral - "bar "
  |- FormattedValue - "x"
```

[^1]: Internal document:
https://www.notion.so/astral-sh/Implicit-String-Concatenation-e036345dc48943f89e416c087bf6f6d9?pvs=4

#### Visitor

The way the nodes are structured is that the entire string, including
all the parts that are implicitly concatenation, is a single node
containing individual nodes for the parts. The previous section has a
representation of that tree for all the string nodes. This means that
new visitor methods are added to visit the individual parts of string,
bytes, and f-strings for `Visitor`, `PreorderVisitor`, and
`Transformer`.

## Test Plan

- `cargo insta test --workspace --all-features --unreferenced reject`
- Verify that the ecosystem results are unchanged
2023-11-24 17:55:41 -06:00
Samuel Cormier-Iijima 852a8f4a4f
[PIE796] don't report when using ellipses for enum values in stub files (#8825)
## Summary

Just ignores ellipses as enum values inside stub files.

Fixes #8818.
2023-11-24 15:24:57 +00:00
konsti 14e65afdc6
Update to Rust 1.74 and use new clippy lints table (#8722)
Update to [Rust
1.74](https://blog.rust-lang.org/2023/11/16/Rust-1.74.0.html) and use
the new clippy lints table.

The update itself introduced a new clippy lint about superfluous hashes
in raw strings, which got removed.

I moved our lint config from `rustflags` to the newly stabilized
[workspace.lints](https://doc.rust-lang.org/stable/cargo/reference/workspaces.html#the-lints-table).
One consequence is that we have to `unsafe_code = "warn"` instead of
"forbid" because the latter now actually bans unsafe code:

```
error[E0453]: allow(unsafe_code) incompatible with previous forbid
  --> crates/ruff_source_file/src/newlines.rs:62:17
   |
62 |         #[allow(unsafe_code)]
   |                 ^^^^^^^^^^^ overruled by previous forbid
   |
   = note: `forbid` lint level was set on command line
```

---------

Co-authored-by: Charlie Marsh <charlie.r.marsh@gmail.com>
2023-11-16 18:12:46 -05:00
Charlie Marsh bf2cc3f520
Add autotyping-like return type inference for annotation rules (#8643)
## Summary

This PR adds (unsafe) fixes to the flake8-annotations rules that enforce
missing return types, offering to automatically insert type annotations
for functions with literal return values. The logic is smart enough to
generate simplified unions (e.g., `float` instead of `int | float`) and
deal with implicit returns (`return` without a value).

Closes https://github.com/astral-sh/ruff/issues/1640 (though we could
open a separate issue for referring parameter types).

Closes https://github.com/astral-sh/ruff/issues/8213.

## Test Plan

`cargo test`
2023-11-13 23:34:15 -05:00
Charlie Marsh df9ade7fd9
Use AST transformer for `relocate` (#8660) 2023-11-13 13:24:27 -05:00
Charlie Marsh 345e1401cf
Treat `class C: ...` and `class C(): ...` equivalently (#8659)
## Summary

These should be seen as identical from the `ComparableAst` perspective.
2023-11-13 18:03:04 +00:00
Charlie Marsh d574fcd1ac
Compare formatted and unformatted ASTs during formatter tests (#8624)
## Summary

This PR implements validation in the formatter tests to ensure that we
don't modify the AST during formatting. Black has similar logic.

In implementing this, I learned that Black actually _does_ modify the
AST, and their test infrastructure normalizes the AST to wipe away those
differences. Specifically, Black changes the indentation of docstrings,
which _does_ modify the AST; and it also inserts parentheses in `del`
statements, which changes the AST too.

Ruff also does both these things, so we _also_ implement the same
normalization using a new visitor that allows for modifying the AST.

Closes https://github.com/astral-sh/ruff/issues/8184.

## Test Plan

`cargo test`
2023-11-13 17:43:27 +00:00
Jesse Serrao 39728a1198
Add check for is comparison with mutable initialisers to rule F632 (#8607)
## Summary

Adds an extra check to F632 to check for any `is` comparisons to a
mutable initialisers.
Implements #8589 .

Example:
```Python
named_var = {}
if named_var is {}:  # F632 (fix)
    pass
```
The if condition will always evaluate to False because it checks on
identity and it's impossible to take the same identity as a hard coded
list/set/dict initializer.

## Test Plan

Multiple test cases were added to ensure the rule works + doesn't flag
false positives + the fix works correctly.
2023-11-11 00:29:23 +00:00
Kar Petrosyan e2c7b1ece6
[TRIO] Add TRIO109 rule (#8534)
## Summary

Adds TRIO109 from the [flake8-trio
plugin](https://github.com/Zac-HD/flake8-trio).
Relates to: https://github.com/astral-sh/ruff/issues/8451
2023-11-07 17:13:01 -05:00
qdegraaf 4170ef0508
[`TRIO`] Add `TRIO105`: `SyncTrioCall` (#8490)
## Summary

Adds `TRIO105` from the [flake8-trio
plugin](https://github.com/Zac-HD/flake8-trio). The `MethodName` logic
mirrors that of `TRIO100` to stay consistent within the plugin.

It is at 95% parity with the exception of upstream also checking for a
slightly more complex scenario where a call to `start()` on a
`trio.Nursery` context should also be immediately awaited. Upstream
plugin appears to just check for anything named `nursery` judging from
[the relevant issue](https://github.com/Zac-HD/flake8-trio/issues/56).

Unsure if we want to do so something similar or, alternatively, if there
is some capability in ruff to check for calls made on this context some
other way

## Test Plan

Added a new fixture, based on [the one from upstream
plugin](https://github.com/Zac-HD/flake8-trio/blob/main/tests/eval_files/trio105.py)

## Issue link

Refers: https://github.com/astral-sh/ruff/issues/8451
2023-11-05 19:56:10 +00:00
Micha Reiser f16505d885
Formatter: Remove unnecessary `group` (#8455) 2023-11-03 04:14:29 +00:00
Dhruv Manilawala d350ede992
Remove unicode flag from comparable (#8440)
## Summary

This PR removes the `unicode` flag from the string literal in
`ComparableExpr`. This flag isn't required as all strings are unicode in
Python 3 so `"foo" == u"foo"`.
2023-11-02 13:21:45 +05:30
Dhruv Manilawala 97ae617fac
Introduce `LiteralExpressionRef` for all literals (#8339)
## Summary

This PR adds a new `LiteralExpressionRef` which wraps all of the literal
expression nodes in a single enum. This allows for a narrow type when
working exclusively with a literal node. Additionally, it also
implements a `Expr::as_literal_expr` method to return the new enum if
the expression is indeed a literal one.

A few rules have been updated to account for the new enum:
1. `redundant_literal_union`
2. `if_else_block_instead_of_dict_lookup`
3. `magic_value_comparison`

To account for the change in (2), a new `ComparableLiteral` has been
added which can be constructed from the new enum
(`ComparableLiteral::from(<LiteralExpressionRef>)`).

### Open Questions

1. The new `ComparableLiteral` can be exclusively used via the
`LiteralExpressionRef` enum. Should we remove all of the literal
variants from `ComparableExpr` and instead have a single
`ComparableExpr::Literal(ComparableLiteral)` variant instead?

## Test Plan

`cargo test`
2023-10-31 12:56:11 +00:00
Dhruv Manilawala 8977b6ae11
Inline AST helpers for new literal nodes (#8374)
A small refactor to inline the `is_const_none` now that there's a
dedicated `ExprNoneLiteral` node.
2023-10-31 11:06:54 +00:00
Charlie Marsh 161c093c06
Avoid including literal `shell=True` for truthy, non-`True` diagnostics (#8359)
## Summary

If the value of `shell` wasn't literally `True`, we now show a message
describing it as truthy, rather than the (misleading) `shell=True`
literal in the diagnostic.

Closes https://github.com/astral-sh/ruff/issues/8310.
2023-10-30 15:44:38 +00:00
Dhruv Manilawala b0dc5a86a1
Impl `Default` for `(String|Bytes|Boolean|None|Ellipsis)Literal` (#8341)
## Summary

This PR adds `Default` for the following literal nodes:
* `StringLiteral`
* `BytesLiteral`
* `BooleanLiteral`
* `NoneLiteral`
* `EllipsisLiteral`

The implementation creates the zero value of the respective literal
nodes in terms of the Python language.

## Test Plan

`cargo test`
2023-10-30 08:47:44 +00:00
Dhruv Manilawala 230c9ce236
Split `Constant` to individual literal nodes (#8064)
## Summary

This PR splits the `Constant` enum as individual literal nodes. It
introduces the following new nodes for each variant:
* `ExprStringLiteral`
* `ExprBytesLiteral`
* `ExprNumberLiteral`
* `ExprBooleanLiteral`
* `ExprNoneLiteral`
* `ExprEllipsisLiteral`

The main motivation behind this refactor is to introduce the new AST
node for implicit string concatenation in the coming PR. The elements of
that node will be either a string literal, bytes literal or a f-string
which can be implemented using an enum. This means that a string or
bytes literal cannot be represented by `Constant::Str` /
`Constant::Bytes` which creates an inconsistency.

This PR avoids that inconsistency by splitting the constant nodes into
it's own literal nodes, literal being the more appropriate naming
convention from a static analysis tool perspective.

This also makes working with literals in the linter and formatter much
more ergonomic like, for example, if one would want to check if this is
a string literal, it can be done easily using
`Expr::is_string_literal_expr` or matching against `Expr::StringLiteral`
as oppose to matching against the `ExprConstant` and enum `Constant`. A
few AST helper methods can be simplified as well which will be done in a
follow-up PR.

This introduces a new `Expr::is_literal_expr` method which is the same
as `Expr::is_constant_expr`. There are also intermediary changes related
to implicit string concatenation which are quiet less. This is done so
as to avoid having a huge PR which this already is.

## Test Plan

1. Verify and update all of the existing snapshots (parser, visitor)
2. Verify that the ecosystem check output remains **unchanged** for both
the linter and formatter

### Formatter ecosystem check

#### `main`

| project | similarity index | total files | changed files |

|----------------|------------------:|------------------:|------------------:|
| cpython | 0.75803 | 1799 | 1647 |
| django | 0.99983 | 2772 | 34 |
| home-assistant | 0.99953 | 10596 | 186 |
| poetry | 0.99891 | 317 | 17 |
| transformers | 0.99966 | 2657 | 330 |
| twine | 1.00000 | 33 | 0 |
| typeshed | 0.99978 | 3669 | 20 |
| warehouse | 0.99977 | 654 | 13 |
| zulip | 0.99970 | 1459 | 22 |

#### `dhruv/constant-to-literal`

| project | similarity index | total files | changed files |

|----------------|------------------:|------------------:|------------------:|
| cpython | 0.75803 | 1799 | 1647 |
| django | 0.99983 | 2772 | 34 |
| home-assistant | 0.99953 | 10596 | 186 |
| poetry | 0.99891 | 317 | 17 |
| transformers | 0.99966 | 2657 | 330 |
| twine | 1.00000 | 33 | 0 |
| typeshed | 0.99978 | 3669 | 20 |
| warehouse | 0.99977 | 654 | 13 |
| zulip | 0.99970 | 1459 | 22 |
2023-10-30 12:13:23 +05:30
Dhruv Manilawala 78bbf6d403
New `Singleton` enum for `PatternMatchSingleton` node (#8063)
## Summary

This PR adds a new `Singleton` enum for the `PatternMatchSingleton`
node.

Earlier the node was using the `Constant` enum but the value for this
pattern can only be either `None`, `True` or `False`. With the coming PR
to remove the `Constant`, this node required a new type to fill in.

This also has the benefit of narrowing the type down to only the
possible values for the node as evident by the removal of `unreachable`.

## Test Plan

Update the AST snapshots and run `cargo test`.
2023-10-30 05:48:53 +00:00
Dhruv Manilawala ec1be60dcb
Remove leftover constant tuple reference (#8062)
This PR removes the leftover reference to the tuple variant in
`Constant`.
2023-10-19 17:50:45 +00:00
konsti 8f9753f58e
Comments outside expression parentheses (#7873)
<!--
Thank you for contributing to Ruff! To help us out with reviewing,
please consider the following:

- Does this pull request include a summary of the change? (See below.)
- Does this pull request include a descriptive title?
- Does this pull request include references to any relevant issues?
-->

## Summary

Fixes https://github.com/astral-sh/ruff/issues/7448
Fixes https://github.com/astral-sh/ruff/issues/7892

I've removed automatic dangling comment formatting, we're doing manual
dangling comment formatting everywhere anyway (the
assert-all-comments-formatted ensures this) and dangling comments would
break the formatting there.

## Test Plan

New test file.

---------

Co-authored-by: Micha Reiser <micha@reiser.io>
2023-10-19 09:24:11 +00:00
konsti 0c3123e07e
Insert newline after nested function or class statements (#7946)
**Summary** Insert a newline after nested function and class
definitions, unless there is a trailing own line comment.

We need to e.g. format
```python
if platform.system() == "Linux":
    if sys.version > (3, 10):
        def f():
            print("old")
    else:
        def f():
            print("new")
    f()
```
as
```python
if platform.system() == "Linux":
    if sys.version > (3, 10):

        def f():
            print("old")

    else:

        def f():
            print("new")

    f()
```
even though `f()` is directly preceded by an if statement, not a
function or class definition. See the comments and fixtures for trailing
own line comment handling.

**Test Plan** I checked that the new content of `newlines.py` matches
black's formatting.

---------

Co-authored-by: Charlie Marsh <charlie.r.marsh@gmail.com>
2023-10-18 09:45:58 +00:00
Charlie Marsh d685107638
Move {AnyNodeRef, AstNode} to ruff_python_ast crate root (#8030)
This is a do-over of https://github.com/astral-sh/ruff/pull/8011, which
I accidentally merged into a non-`main` branch. Sorry!
2023-10-18 00:01:18 +00:00
Tom Kuson 62f1ee08e7
[`refurb`] Implement `single-item-membership-test` (`FURB171`) (#7815)
## Summary

Implement
[`no-single-item-in`](https://github.com/dosisod/refurb/blob/master/refurb/checks/iterable/no_single_item_in.py)
as `single-item-membership-test` (`FURB171`).

Uses the helper function `generate_comparison` from the `pycodestyle`
implementations; this function should probably be moved, but I am not
sure where at the moment.

Update: moved it to `ruff_python_ast::helpers`.

Related to #1348.

## Test Plan

`cargo test`
2023-10-08 14:08:47 +00:00
Charlie Marsh f71c80af68
Show changed files when running under `--check` (#7788)
## Summary

We now list each changed file when running with `--check`.

Closes https://github.com/astral-sh/ruff/issues/7782.

## Test Plan

```
❯ cargo run -p ruff_cli -- format foo.py --check
   Compiling ruff_cli v0.0.292 (/Users/crmarsh/workspace/ruff/crates/ruff_cli)
rgo +    Finished dev [unoptimized + debuginfo] target(s) in 1.41s
     Running `target/debug/ruff format foo.py --check`
warning: `ruff format` is a work-in-progress, subject to change at any time, and intended only for experimentation.
Would reformat: foo.py
1 file would be reformatted
```
2023-10-03 18:50:06 +00:00
Tom Kuson e129f77bcf
Extend `reimplemented-starmap` (`FURB140`) to catch calls with a single and starred argument (#7768) 2023-10-02 21:38:05 -04:00
Dhruv Manilawala e62e245c61
Add support for PEP 701 (#7376)
## Summary

This PR adds support for PEP 701 in Ruff. This is a rollup PR of all the
other individual PRs. The separate PRs were created for logic separation
and code reviews. Refer to each pull request for a detail description on
the change.

Refer to the PR description for the list of pull requests within this PR.

## Test Plan

### Formatter ecosystem checks

Explanation for the change in ecosystem check:
https://github.com/astral-sh/ruff/pull/7597#issue-1908878183

#### `main`

```
| project      | similarity index  | total files       | changed files     |
|--------------|------------------:|------------------:|------------------:|
| cpython      |           0.76083 |              1789 |              1631 |
| django       |           0.99983 |              2760 |                36 |
| transformers |           0.99963 |              2587 |               319 |
| twine        |           1.00000 |                33 |                 0 |
| typeshed     |           0.99983 |              3496 |                18 |
| warehouse    |           0.99967 |               648 |                15 |
| zulip        |           0.99972 |              1437 |                21 |
```

#### `dhruv/pep-701`

```
| project      | similarity index  | total files       | changed files     |
|--------------|------------------:|------------------:|------------------:|
| cpython      |           0.76051 |              1789 |              1632 |
| django       |           0.99983 |              2760 |                36 |
| transformers |           0.99963 |              2587 |               319 |
| twine        |           1.00000 |                33 |                 0 |
| typeshed     |           0.99983 |              3496 |                18 |
| warehouse    |           0.99967 |               648 |                15 |
| zulip        |           0.99972 |              1437 |                21 |
```
2023-09-29 02:55:39 +00:00
Charlie Marsh f45281345d
Include radix base prefix in large number representation (#7700)
## Summary

When lexing a number like `0x995DC9BBDF1939FA` that exceeds our small
number representation, we were only storing the portion after the base
(in this case, `995DC9BBDF1939FA`). When using that representation in
code generation, this could lead to invalid syntax, since
`995DC9BBDF1939FA)` on its own is not a valid integer.

This PR modifies the code to store the full span, including the radix
prefix.

See:
https://github.com/astral-sh/ruff/issues/7455#issuecomment-1739802958.

## Test Plan

`cargo test`
2023-09-28 20:38:06 +00:00
Charlie Marsh 0a8cad2550
Allow named expressions in `__all__` assignments (#7673)
## Summary

This PR adds support for named expressions when analyzing `__all__`
assignments, as per https://github.com/astral-sh/ruff/issues/7672. It
also loosens the enforcement around assignments like: `__all__ =
list(some_other_expression)`. We shouldn't flag these as invalid, even
though we can't analyze the members, since we _know_ they evaluate to a
`list`.

Closes https://github.com/astral-sh/ruff/issues/7672.

## Test Plan

`cargo test`
2023-09-27 00:36:55 -04:00
Charlie Marsh 93b5d8a0fb
Implement our own small-integer optimization (#7584)
## Summary

This is a follow-up to #7469 that attempts to achieve similar gains, but
without introducing malachite. Instead, this PR removes the `BigInt`
type altogether, instead opting for a simple enum that allows us to
store small integers directly and only allocate for values greater than
`i64`:

```rust
/// A Python integer literal. Represents both small (fits in an `i64`) and large integers.
#[derive(Clone, PartialEq, Eq, Hash)]
pub struct Int(Number);

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum Number {
    /// A "small" number that can be represented as an `i64`.
    Small(i64),
    /// A "large" number that cannot be represented as an `i64`.
    Big(Box<str>),
}

impl std::fmt::Display for Number {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        match self {
            Number::Small(value) => write!(f, "{value}"),
            Number::Big(value) => write!(f, "{value}"),
        }
    }
}
```

We typically don't care about numbers greater than `isize` -- our only
uses are comparisons against small constants (like `1`, `2`, `3`, etc.),
so there's no real loss of information, except in one or two rules where
we're now a little more conservative (with the worst-case being that we
don't flag, e.g., an `itertools.pairwise` that uses an extremely large
value for the slice start constant). For simplicity, a few diagnostics
now show a dedicated message when they see integers that are out of the
supported range (e.g., `outdated-version-block`).

An additional benefit here is that we get to remove a few dependencies,
especially `num-bigint`.

## Test Plan

`cargo test`
2023-09-25 15:13:21 +00:00
Charlie Marsh 4d6f5ff0a7
Remove `Int` wrapper type from parser (#7577)
## Summary

This is only used for the `level` field in relative imports (e.g., `from
..foo import bar`). It seems unnecessary to use a wrapper here, so this
PR changes to a `u32` directly.
2023-09-21 17:01:44 +00:00
Charlie Marsh 5df0326bc8
Treat parameters-with-newline as empty in function formatting (#7550)
## Summary

If a function has no parameters (and no comments within the parameters'
`()`), we're supposed to wrap the return annotation _whenever_ it
breaks. However, our `empty_parameters` test didn't properly account for
the case in which the parameters include a newline (but no other
content), like:

```python
def get_dashboards_hierarchy(
) -> Dict[Type['BaseDashboard'], List[Type['BaseDashboard']]]:
    """Get hierarchy of dashboards classes.

    Returns:
        Dict of dashboards classes.
    """
    dashboards_hierarchy = {}
```

This PR fixes that detection. Instead of lexing, it now checks if the
parameters itself is empty (or if it contains comments).

Closes https://github.com/astral-sh/ruff/issues/7457.
2023-09-20 16:20:22 -04:00
konsti 2cbe1733c8
Use CommentRanges in backwards lexing (#7360)
## Summary

The tokenizer was split into a forward and a backwards tokenizer. The
backwards tokenizer uses the same names as the forwards ones (e.g.
`next_token`). The backwards tokenizer gets the comment ranges that we
already built to skip comments.

---------

Co-authored-by: Micha Reiser <micha@reiser.io>
2023-09-16 03:21:45 +00:00
Charlie Marsh ec2f229a45
Remove `ExprContext` from `ComparableExpr` (#7362)
`ComparableExpr` includes the `ExprContext` field on an expression, so,
e.g., the two tuples in `(a, b) = (a, b)` won't be considered equal.
Similarly, the tuples in `[(a, b) for (a, b) in c]` _also_ wouldn't be
considered equal. I find this behavior surprising, since
`ComparableExpr` is intended to allow you to compare two ASTs, but
`ExprContext` is really encoding information about the broader context
for the expression.
2023-09-14 15:40:02 +00:00
konsti f4c7bff36b
Don't reorder parameters in function calls (#7268)
## Summary

In `f(*args, a=b, *args2, **kwargs)` the args (`*args`, `*args2`) and
keywords (`a=b`, `**kwargs`) are interleaved, which we previously didn't
handle.

Fixes #6498

**main**

| project | similarity index | total files | changed files |

|--------------|------------------:|------------------:|------------------:|
| cpython | 0.76083 | 1789 | 1632 |
| **django** | 0.99966 | 2760 | 58 |
| transformers | 0.99930 | 2587 | 447 |
| twine | 1.00000 | 33 | 0 |
| typeshed | 0.99983 | 3496 | 18 |
| warehouse | 0.99825 | 648 | 22 |
| zulip | 0.99950 | 1437 | 27 |

**PR**

| project | similarity index | total files | changed files |

|--------------|------------------:|------------------:|------------------:|
| cpython | 0.76083 | 1789 | 1632 |
| **django** | 0.99967 | 2760 | 53 |
| transformers | 0.99930 | 2587 | 447 |
| twine | 1.00000 | 33 | 0 |
| typeshed | 0.99983 | 3496 | 18 |
| warehouse | 0.99825 | 648 | 22 |
| zulip | 0.99950 | 1437 | 27 |


## Test Plan

New fixtures
2023-09-13 09:01:49 +00:00
konsti 56440ad835
Introduce `ArgOrKeyword` to keep call parameter order (#7302)
## Motivation

The `ast::Arguments` for call argument are split into positional
arguments (args) and keywords arguments (keywords). We currently assume
that call consists of first args and then keywords, which is generally
the case, but not always:

```python
f(*args, a=2, *args2, **kwargs)

class A(*args, a=2, *args2, **kwargs):
    pass
```

The consequence is accidentally reordering arguments
(https://github.com/astral-sh/ruff/pull/7268).

## Summary

`Arguments::args_and_keywords` returns an iterator of an `ArgOrKeyword`
enum that yields args and keywords in the correct order. I've fixed the
obvious `args` and `keywords` usages, but there might be some cases with
wrong assumptions remaining.

## Test Plan

The generator got new test cases, otherwise the stacked PR
(https://github.com/astral-sh/ruff/pull/7268) which uncovered this.
2023-09-13 08:45:46 +00:00
Dhruv Manilawala 04f2842e4f
Move `ExprConstant::kind` to `StringConstant::unicode` (#7180) 2023-09-06 07:39:25 +00:00
Dhruv Manilawala fa6bff0078
Add inline documentation for `Ipy*` AST nodes (#7178) 2023-09-06 12:07:34 +05:30
Charlie Marsh b0d171ac19
Supported starred exceptions in length-one tuple detection (#7080) 2023-09-03 13:31:13 +00:00
Charlie Marsh 68f605e80a
Fix `WithItem` ranges for parenthesized, non-`as` items (#6782)
## Summary

This PR attempts to address a problem in the parser related to the
range's of `WithItem` nodes in certain contexts -- specifically,
`WithItem` nodes in parentheses that do not have an `as` token after
them.

For example,
[here](https://play.ruff.rs/71be2d0b-2a04-4c7e-9082-e72bff152679):

```python
with (a, b):
    pass
```

The range of the `WithItem` `a` is set to the range of `(a, b)`, as is
the range of the `WithItem` `b`. In other words, when we have this kind
of sequence, we use the range of the entire parenthesized context,
rather than the ranges of the items themselves.

Note that this also applies to cases
[like](https://play.ruff.rs/c551e8e9-c3db-4b74-8cc6-7c4e3bf3713a):

```python
with (a, b, c as d):
    pass
```

You can see the issue in the parser here:

```rust
#[inline]
WithItemsNoAs: Vec<ast::WithItem> = {
    <location:@L> <all:OneOrMore<Test<"all">>> <end_location:@R> => {
        all.into_iter().map(|context_expr| ast::WithItem { context_expr, optional_vars: None, range: (location..end_location).into() }).collect()
    },
}
```

Fixing this issue is... very tricky. The naive approach is to use the
range of the `context_expr` as the range for the `WithItem`, but that
range will be incorrect when the `context_expr` is itself parenthesized.
For example, _that_ solution would fail here, since the range of the
first `WithItem` would be that of `a`, rather than `(a)`:

```python
with ((a), b):
    pass
```

The `with` parsing in general is highly precarious due to ambiguities in
the grammar. Changing it in _any_ way seems to lead to an ambiguous
grammar that LALRPOP fails to translate. Consensus seems to be that we
don't really understand _why_ the current grammar works (i.e., _how_ it
avoids these ambiguities as-is).

The solution implemented here is to avoid changing the grammar itself,
and instead change the shape of the nodes returned by various rules in
the grammar. Specifically, everywhere that we return `Expr`, we instead
return `ParenthesizedExpr`, which includes a parenthesized range and the
underlying `Expr` itself. (If an `Expr` isn't parenthesized, the ranges
will be equivalent.) In `WithItemsNoAs`, we can then use the
parenthesized range as the range for the `WithItem`.
2023-08-31 16:21:29 +01:00
Valeriy Savchenko 26d53c56a2
[refurb] Implement `repeated-append` rule (`FURB113`) (#6702)
## Summary

As an initial effort with replicating `refurb` rules (#1348 ), this PR
adds support for
[FURB113](https://github.com/dosisod/refurb/blob/master/refurb/checks/builtin/list_extend.py)
and adds a new category of checks.

## Test Plan

I included a new test + checked that all other tests pass.
2023-08-28 22:51:59 +00:00
Charlie Marsh 58f5f27dc3
Add TOML files to `SourceType` (#6929)
## Summary

This PR adds a higher-level enum (`SourceType`) around `PySourceType` to
allow us to use the same detection path to handle TOML files. Right now,
we have ad hoc `is_pyproject_toml` checks littered around, and some
codepaths are omitting that logic altogether (like `add_noqa`). Instead,
we should always be required to check the source type and handle TOML
files as appropriate.

This PR will also help with our pre-commit capabilities. If we add
`toml` to pre-commit (to support `pyproject.toml`), pre-commit will
start to pass _other_ files to Ruff (along with `poetry.lock` and
`Pipfile` -- see
[identify](b59996304f/identify/extensions.py (L355))).
By detecting those files and handling those cases, we avoid attempting
to parse them as Python files, which would lead to pre-commit errors.
(We tried to add `toml` to pre-commit here
(https://github.com/astral-sh/ruff-pre-commit/pull/44), but had to
revert here (https://github.com/astral-sh/ruff-pre-commit/pull/45) as it
led to the pre-commit hook attempting to parse `poetry.lock` files as
Python files.)
2023-08-28 15:01:48 +00:00
Charlie Marsh fc89976c24
Move `Ranged` into `ruff_text_size` (#6919)
## Summary

The motivation here is that this enables us to implement `Ranged` in
crates that don't depend on `ruff_python_ast`.

Largely a mechanical refactor with a lot of regex, Clippy help, and
manual fixups.

## Test Plan

`cargo test`
2023-08-27 14:12:51 -04:00
Micha Reiser 7c480236e0
Use dyn dispatch for `any_over_*` (#6912) 2023-08-27 15:54:01 +02:00
Charlie Marsh 15b73bdb8a
Introduce AST nodes for `PatternMatchClass` arguments (#6881)
## Summary

This PR introduces two new AST nodes to improve the representation of
`PatternMatchClass`. As a reminder, `PatternMatchClass` looks like this:

```python
case Point2D(0, 0, x=1, y=2):
  ...
```

Historically, this was represented as a vector of patterns (for the `0,
0` portion) and parallel vectors of keyword names (for `x` and `y`) and
values (for `1` and `2`). This introduces a bunch of challenges for the
formatter, but importantly, it's also really different from how we
represent similar nodes, like arguments (`func(0, 0, x=1, y=2)`) or
parameters (`def func(x, y)`).

So, firstly, we now use a single node (`PatternArguments`) for the
entire parenthesized region, making it much more consistent with our
other nodes. So, above, `PatternArguments` would be `(0, 0, x=1, y=2)`.

Secondly, we now have a `PatternKeyword` node for `x=1` and `y=2`. This
is much more similar to the how `Keyword` is represented within
`Arguments` for call expressions.

Closes https://github.com/astral-sh/ruff/issues/6866.

Closes https://github.com/astral-sh/ruff/issues/6880.
2023-08-26 14:45:44 +00:00
Dhruv Manilawala d1f07008f7
Rename Notebook related symbols (#6862)
This PR renames the following symbols:

* `PySourceType::Jupyter` -> `PySourceType::Ipynb`
* `SourceKind::Jupyter` -> `SourceKind::IpyNotebook`
* `JupyterIndex` -> `NotebookIndex`
2023-08-25 11:40:54 +05:30
Charlie Marsh 847432cacf
Avoid attempting to fix PT018 in multi-statement lines (#6829)
## Summary

These fixes will _always_ fail, so we should avoid trying to construct
them in the first place.

Closes https://github.com/astral-sh/ruff/issues/6812.
2023-08-23 19:09:34 -04:00
Charlie Marsh 26e63ab137
Remove lexing from flake8-pytest-style (#6795)
## Summary

Another drive-by change to remove unnecessary custom lexing. We just
need to know the parenthesized range, so we can use...
`parenthesized_range`. I've also updated `parenthesized_range` to
support nested parentheses.

## Test Plan

`cargo test`
2023-08-23 15:54:11 +00:00
Charlie Marsh 6a5acde226
Make `Parameters` an optional field on `ExprLambda` (#6669)
## Summary

If a lambda doesn't contain any parameters, or any parameter _tokens_
(like `*`), we can use `None` for the parameters. This feels like a
better representation to me, since, e.g., what should the `TextRange` be
for a non-existent set of parameters? It also allows us to remove
several sites where we check if the `Parameters` is empty by seeing if
it contains any arguments, so semantically, we're already trying to
detect and model around this elsewhere.

Changing this also fixes a number of issues with dangling comments in
parameter-less lambdas, since those comments are now automatically
marked as dangling on the lambda. (As-is, we were also doing something
not-great whereby the lambda was responsible for formatting dangling
comments on the parameters, which has been removed.)

Closes https://github.com/astral-sh/ruff/issues/6646.

Closes https://github.com/astral-sh/ruff/issues/6647.

## Test Plan

`cargo test`
2023-08-18 15:34:54 +00:00
Charlie Marsh 1050142a58
Expand expressions to include parentheses in E712 (#6575)
## Summary

This PR exposes our `is_expression_parenthesized` logic such that we can
use it to expand expressions when autofixing to include their
parenthesized ranges.

This solution has a few drawbacks: (1) we need to compute parenthesized
ranges in more places, which also relies on backwards lexing; and (2) we
need to make use of this in any relevant fixes.

However, I still think it's worth pursuing. On (1), the implementation
is very contained, so IMO we can easily swap this out for a more
performant solution in the future if needed. On (2), this improves
correctness and fixes some bad syntax errors detected by fuzzing, which
means it has value even if it's not as robust as an _actual_
`ParenthesizedExpression` node in the AST itself.

Closes https://github.com/astral-sh/ruff/issues/4925.

## Test Plan

`cargo test` with new cases that previously failed the fuzzer.
2023-08-17 15:51:09 +00:00
Charlie Marsh db1c556508
Implement `Ranged` on more structs (#6639)
## Summary

I noticed some inconsistencies around uses of `.range.start()`, structs
that have a `TextRange` field but don't implement `Ranged`, etc.

## Test Plan

`cargo test`
2023-08-17 11:22:39 -04:00
Charlie Marsh 1334232168
Introduce `ExpressionRef` (#6637)
## Summary

This PR revives the `ExpressionRef` concept introduced in
https://github.com/astral-sh/ruff/pull/5644, motivated by the change we
want to make in https://github.com/astral-sh/ruff/pull/6575 to narrow
the type of the expression that can be passed to `parenthesized_range`.

## Test Plan

`cargo test`
2023-08-17 10:07:16 -04:00
Micha Reiser 455db84a59
Replace `inline(always)` with `inline` (#6590) 2023-08-15 08:58:11 +02:00
Charlie Marsh 7f7df852e8
Remove some extraneous newlines in Cargo.toml (#6577) 2023-08-14 23:39:41 +00:00
Charlie Marsh 96d310fbab
Remove `Stmt::TryStar` (#6566)
## Summary

Instead, we set an `is_star` flag on `Stmt::Try`. This is similar to the
pattern we've migrated towards for `Stmt::For` (removing
`Stmt::AsyncFor`) and friends. While these are significant differences
for an interpreter, we tend to handle these cases identically or nearly
identically.

## Test Plan

`cargo test`
2023-08-14 13:39:44 -04:00
Charlie Marsh a7cf8f0b77
Replace dynamic implicit concatenation detection with parser flag (#6513)
## Summary

In https://github.com/astral-sh/ruff/pull/6512, we added a flag to the
AST to mark implicitly-concatenated string expressions. This PR makes
use of that flag to remove the `is_implicit_concatenation` method.

## Test Plan

`cargo test`
2023-08-14 10:27:17 -04:00
Charlie Marsh f16e780e0a
Add an implicit concatenation flag to string and bytes constants (#6512)
## Summary

Per the discussion in
https://github.com/astral-sh/ruff/discussions/6183, this PR adds an
`implicit_concatenated` flag to the string and bytes constant variants.
It's not actually _used_ anywhere as of this PR, but it is covered by
the tests.

Specifically, we now use a struct for the string and bytes cases, along
with the `Expr::FString` node. That struct holds the value, plus the
flag:

```rust
#[derive(Clone, Debug, PartialEq, is_macro::Is)]
pub enum Constant {
    Str(StringConstant),
    Bytes(BytesConstant),
    ...
}

#[derive(Clone, Debug, PartialEq, Eq)]
pub struct StringConstant {
    /// The string value as resolved by the parser (i.e., without quotes, or escape sequences, or
    /// implicit concatenations).
    pub value: String,
    /// Whether the string contains multiple string tokens that were implicitly concatenated.
    pub implicit_concatenated: bool,
}

impl Deref for StringConstant {
    type Target = str;
    fn deref(&self) -> &Self::Target {
        self.value.as_str()
    }
}

#[derive(Clone, Debug, PartialEq, Eq)]
pub struct BytesConstant {
    /// The bytes value as resolved by the parser (i.e., without quotes, or escape sequences, or
    /// implicit concatenations).
    pub value: Vec<u8>,
    /// Whether the string contains multiple string tokens that were implicitly concatenated.
    pub implicit_concatenated: bool,
}

impl Deref for BytesConstant {
    type Target = [u8];
    fn deref(&self) -> &Self::Target {
        self.value.as_slice()
    }
}
```

## Test Plan

`cargo test`
2023-08-14 13:46:54 +00:00
Micha Reiser 9584f613b9
Remove `allow(pedantic)` from formatter (#6549) 2023-08-14 14:02:06 +02:00
Micha Reiser ac5c8bb3b6
Add `AnyNodeRef.visit_preorder`
<!--
Thank you for contributing to Ruff! To help us out with reviewing, please consider the following:

- Does this pull request include a summary of the change? (See below.)
- Does this pull request include a descriptive title?
- Does this pull request include references to any relevant issues?
-->

## Summary

This PR adds the `AnyNodeRef.visit_preorder` method. I'll need this method to mark all comments of a suppressed node's children as formatted (in debug builds). 

I'm not super happy with this because it now requires a double-dispatch where the `walk_*` methods call into `node.visit_preorder` and the `visit_preorder` then calls back into the visitor. Meaning,
the new implementation now probably results in way more function calls. The other downside is that `AnyNodeRef` now contains code that is difficult to auto-generate. This could be mitigated by extracting the `visit_preorder` method into its own `VisitPreorder` trait. 

Anyway, this approach solves the need and avoids duplicating the visiting code once more. 

<!-- What's the purpose of the change? What does it do, and why? -->

## Test Plan

`cargo test`

<!-- How was it tested? -->
2023-08-10 08:35:09 +02:00
Charlie Marsh 395bb31247
Improve counting of message arguments when msg is provided as a keyword (#6456)
Closes https://github.com/astral-sh/ruff/issues/6454.
2023-08-09 20:39:10 +00:00
Dhruv Manilawala 6a64f2289b
Rename `Magic*` to `IpyEscape*` (#6395)
## Summary

This PR renames the `MagicCommand` token to `IpyEscapeCommand` token and
`MagicKind` to `IpyEscapeKind` type to better reflect the purpose of the
token and type. Similarly, it renames the AST nodes from `LineMagic` to
`IpyEscapeCommand` prefixed with `Stmt`/`Expr` wherever necessary.

It also makes renames from using `jupyter_magic` to
`ipython_escape_commands` in various function names.

The mode value is still `Mode::Jupyter` because the escape commands are
part of the IPython syntax but the lexing/parsing is done for a Jupyter
notebook.

### Motivation behind the rename:
* IPython codebase defines it as "EscapeCommand" / "Escape Sequences":
* Escape Sequences:
292e3a2345/IPython/core/inputtransformer2.py (L329-L333)
* Escape command:
292e3a2345/IPython/core/inputtransformer2.py (L410-L411)
* The word "magic" is used mainly for the actual magic commands i.e.,
the ones starting with `%`/`%%`
(https://ipython.readthedocs.io/en/stable/interactive/reference.html#magic-command-system).
So, this avoids any confusion between the Magic token (`%`, `%%`) and
the escape command itself.
## Test Plan

* `cargo test` to make sure all renames are done correctly.
* `grep` for `jupyter_escape`/`magic` to make sure all renames are done
correctly.
2023-08-09 13:28:18 +00:00
Micha Reiser a39dd76d95
Add `enter` and `leave_node` methods to Preoder visitor (#6422) 2023-08-09 09:09:00 +00:00
Charlie Marsh 3f0eea6d87
Rename `JoinedStr` to `FString` in the AST (#6379)
## Summary

Per the proposal in https://github.com/astral-sh/ruff/discussions/6183,
this PR renames the `JoinedStr` node to `FString`.
2023-08-07 17:33:17 +00:00
Charlie Marsh c439435615
Use dedicated AST nodes on `MemberKind` (#6374)
## Summary

This PR leverages the unified function definition node to add precise
AST node types to `MemberKind`, which is used to power our docstring
definition tracking (e.g., classes and functions, whether they're
methods or functions or nested functions and so on, whether they have a
docstring, etc.). It was painful to do this in the past because the
function variants needed to support a union anyway, but storing precise
nodes removes like a dozen panics.

No behavior changes -- purely a refactor.

## Test Plan

`cargo test`
2023-08-07 17:17:58 +00:00
Charlie Marsh daefa74e9a
Remove async AST node variants for `with`, `for`, and `def` (#6369)
## Summary

Per the suggestion in
https://github.com/astral-sh/ruff/discussions/6183, this PR removes
`AsyncWith`, `AsyncFor`, and `AsyncFunctionDef`, replacing them with an
`is_async` field on the non-async variants of those structs. Unlike an
interpreter, we _generally_ have identical handling for these nodes, so
separating them into distinct variants adds complexity from which we
don't really benefit. This can be seen below, where we get to remove a
_ton_ of code related to adding generic `Any*` wrappers, and a ton of
duplicate branches for these cases.

## Test Plan

`cargo test` is unchanged, apart from parser snapshots.
2023-08-07 16:36:02 +00:00
Charlie Marsh c895252aae
Remove `RefEquality` (#6393)
## Summary

See discussion in
https://github.com/astral-sh/ruff/pull/6351#discussion_r1284996979. We
can remove `RefEquality` entirely and instead use a text offset for
statement keys, since no two statements can start at the same text
offset.

## Test Plan

`cargo test`
2023-08-07 16:04:50 +00:00
Dhruv Manilawala e4a4660925
Support help end escape command with priority (#6272)
## Summary

This PR adds support for help end escape command in the lexer.

### What are "help end escape commands"?

First, the escape commands are special IPython syntax which enhances the
functionality for the IPython REPL. There are 9 types of escape kinds
which are recognized by the tokens which are present at the start of the
command (`?`, `??`, `!`, `!!`, etc.).

Here, the help command is using either the `?` or `??` token at the
start (`?str.replace` for example). Those 2 tokens are also supported
when they're at the end of the command (`str.replace?`), but the other
tokens aren't supported in that position.

There are mainly two types of help end escape commands:
1. Ending with either `?` or `??`, but it also starts with one of the
escape tokens (`%matplotlib?`)
2. On the other hand, there's a stricter version for (1) which doesn't
start with any escape tokens (`str.replace?`)

This PR adds support for (1) while (2) will be supported in the parser.

### Priority

Now, if the command starts and ends with an escape token, how do we
decide the kind of this command? This is where priority comes into
picture. This is simple as there's only one priority where `?`/`??` at
the end takes priority over any other escape token and all of the other
tokens are at the same priority. Remember that only `?`/`??` at the end
is considered valid.

This is mainly useful in the case where someone would want to invoke the
help command on the magic command itself. For example, in `%matplotlib?`
the help command takes priority which means that we want help for the
`matplotlib` magic function instead of calling the magic function
itself.

### Specification

Here's where things get a bit tricky. What if there are question mark
tokens at both ends. How do we decide if it's `Help` (`?`) kind or
`Help2` (`??`) kind?

|     | Magic       | Value     | Kind    |
| --- | ---         | ---       | ---     |
| 1   | `?foo?`     | `foo`     | `Help`  |
| 2   | `??foo?`    | `foo`     | `Help`  |
| 3   | `?foo??`    | `foo`     | `Help2` |
| 4   | `??foo??`   | `foo`     | `Help2` |
| 5   | `???foo??`  | `foo`     | `Help2` |
| 6   | `??foo???`  | `foo???`  | `Help2` |
| 7   | `???foo???` | `?foo???` | `Help2` |

Looking at the above table:

- The question mark tokens on the right takes priority over the ones on
the left but only if the number of question mark on the right is 1 or 2.
- If there are more than 2 question mark tokens on the right side, then
the left side is used to determine the same.
- If the right side is used to determine the kind, then all of the
question marks and whitespaces on the left side are ignored in the
`value`, but if it’s the other way around, then all of the extra
question marks are part of the `value`.

### References

- IPython implementation using the regex:
292e3a2345/IPython/core/inputtransformer2.py (L454-L462)
- Priorities:
292e3a2345/IPython/core/inputtransformer2.py (L466-L469)

## Test Plan

Add a bunch of test cases for the lexer and verify that it matches the
behavior of
IPython transformer.

resolves: #6357
2023-08-07 21:01:02 +05:30
Charlie Marsh 76148ddb76
Store call paths rather than stringified names (#6102)
## Summary

Historically, we've stored "qualified names" on our
`BindingKind::Import`, `BindingKind::SubmoduleImport`, and
`BindingKind::ImportFrom` structs. In Ruff, a "qualified name" is a
dot-separated path to a symbol. For example, given `import foo.bar`, the
"qualified name" would be `"foo.bar"`; and given `from foo.bar import
baz`, the "qualified name" would be `foo.bar.baz`.

This PR modifies the `BindingKind` structs to instead store _call paths_
rather than qualified names. So in the examples above, we'd store
`["foo", "bar"]` and `["foo", "bar", "baz"]`. It turns out that this
more efficient given our data access patterns. Namely, we frequently
need to convert the qualified name to a call path (whenever we call
`resolve_call_path`), and it turns out that we do this operation enough
that those conversations show up on benchmarks.

There are a few other advantages to using call paths, rather than
qualified names:

1. The size of `BindingKind` is reduced from 32 to 24 bytes, since we no
longer need to store a `String` (only a boxed slice).
2. All three import types are more consistent, since they now all store
a boxed slice, rather than some storing an `&str` and some storing a
`String` (for `BindingKind::ImportFrom`, we needed to allocate a
`String` to create the qualified name, but the call path is a slice of
static elements that don't require that allocation).
3. A lot of code gets simpler, in part because we now do call path
resolution "earlier". Most notably, for relative imports (`from .foo
import bar`), we store the _resolved_ call path rather than the relative
call path, so the semantic model doesn't have to deal with that
resolution. (See that `resolve_call_path` is simpler, fewer branches,
etc.)

In my testing, this change improves the all-rules benchmark by another
4-5% on top of the improvements mentioned in #6047.
2023-08-05 15:21:50 +00:00
Dhruv Manilawala 1ac2699b5e
Update `F841` autofix to not remove line magic expr (#6141)
## Summary

Update `F841` autofix to not remove line magic expr

## Test Plan

Added test case for assignment statement with and without type
annotation

fixes: #6116
2023-08-05 00:45:01 +00:00
konsti 1031bb6550
Formatter: Add SourceType to context to enable special formatting for stub files (#6331)
**Summary** This adds the information whether we're in a .py python
source file or in a .pyi stub file to enable people working on #5822 and
related issues.

I'm not completely happy with `Default` for something that depends on
the input.

**Test Plan** None, this is currently unused, i'm leaving this to first
implementation of stub file specific formatting.

---------

Co-authored-by: Micha Reiser <micha@reiser.io>
2023-08-04 11:52:26 +00:00
Charlie Marsh 2fa508793f
Return a slice in `StmtClassDef#bases` (#6311)
Slices are strictly more flexible, since you can always convert to an
iterator, etc., but not the other way around. Suggested in
https://github.com/astral-sh/ruff/pull/6259#discussion_r1282730994.
2023-08-03 16:21:55 +00:00
Charlie Marsh 9f3567dea6
Use `range: _` in lieu of `range: _range` (#6296)
## Summary

`range: _range` is slightly inconvenient because you can't use it
multiple times within a single match, unlike `_`.
2023-08-02 22:11:13 -04:00
Zanie Blue 1a60d1e3c6
Add formatting of type parameters in class and function definitions (#6161)
Part of #5062 
Closes https://github.com/astral-sh/ruff/issues/5931

Implements formatting of a sequence of type parameters in a dedicated
struct for reuse by classes, functions, and type aliases (preparing for
#5929). Adds formatting of type parameters in class and function
definitions — previously, they were just elided.
2023-08-02 20:29:28 +00:00
Charlie Marsh 23b8fc4366
Move `includes_arg_name` onto `Parameters` (#6282)
## Summary

Like #6279, no reason for this to be a standalone method.
2023-08-02 18:05:26 +00:00
Charlie Marsh fd40864924
Move `find_keyword` helpers onto `Arguments` struct (#6280)
## Summary

Similar to #6279, moving some helpers onto the struct in the name of
reducing the number of random undiscoverable utilities we have in
`helpers.rs`.

Most of the churn is migrating rules to take `ast::ExprCall` instead of
the spread call arguments.

## Test Plan

`cargo test`
2023-08-02 13:54:48 -04:00
Charlie Marsh 041946fb64
Remove `CallArguments` abstraction (#6279)
## Summary

This PR removes a now-unnecessary abstraction from `helper.rs`
(`CallArguments`), in favor of adding methods to `Arguments` directly,
which helps with discoverability.
2023-08-02 13:25:43 -04:00
Charlie Marsh 8a0f844642
Box type params and arguments fields on the class definition node (#6275)
## Summary

This PR boxes the `TypeParams` and `Arguments` fields on the class
definition node. These fields are optional and often emitted, and given
that class definition is our largest enum variant, we pay the cost of
including them for every statement in the AST. Boxing these types
reduces the statement size by 40 bytes, which seems like a good tradeoff
given how infrequently these are accessed.

## Test Plan

Need to benchmark, but no behavior changes.
2023-08-02 16:47:06 +00:00
Charlie Marsh 4c53bfe896
Add formatter support for call and class definition `Arguments` (#6274)
## Summary

This PR leverages the `Arguments` AST node introduced in #6259 in the
formatter, which ensures that we correctly handle trailing comments in
calls, like:

```python
f(
  1,
  # comment
)

pass
```

(Previously, this was treated as a leading comment on `pass`.)

This also allows us to unify the argument handling across calls and
class definitions.

## Test Plan

A bunch of new fixture tests, plus improved Black compatibility.
2023-08-02 11:54:22 -04:00
Charlie Marsh b095b7204b
Add a `TypeParams` node to the AST (#6261)
## Summary

Similar to #6259, this PR adds a `TypeParams` node to the AST, to
capture the list of type parameters with their surrounding brackets.

If a statement lacks type parameters, the `type_params` field will be
`None`.
2023-08-02 14:12:45 +00:00
Charlie Marsh 981e64f82b
Introduce an `Arguments` AST node for function calls and class definitions (#6259)
## Summary

This PR adds a new `Arguments` AST node, which we can use for function
calls and class definitions.

The `Arguments` node spans from the left (open) to right (close)
parentheses inclusive.

In the case of classes, the `Arguments` is an option, to differentiate
between:

```python
# None
class C: ...

# Some, with empty vectors
class C(): ...
```

In this PR, we don't really leverage this change (except that a few
rules get much simpler, since we don't need to lex to find the start and
end ranges of the parentheses, e.g.,
`crates/ruff/src/rules/pyupgrade/rules/lru_cache_without_parameters.rs`,
`crates/ruff/src/rules/pyupgrade/rules/unnecessary_class_parentheses.rs`).

In future PRs, this will be especially helpful for the formatter, since
we can track comments enclosed on the node itself.

## Test Plan

`cargo test`
2023-08-02 10:01:13 -04:00
Charlie Marsh 9c708d8fc1
Rename `Parameter#arg` and `ParameterWithDefault#def` fields (#6255)
## Summary

This PR renames...

- `Parameter#arg` to `Parameter#name`
- `ParameterWithDefault#def` to `ParameterWithDefault#parameter` (such
that `ParameterWithDefault` has a `default` and a `parameter`)

## Test Plan

`cargo test`
2023-08-01 14:28:34 -04:00
Charlie Marsh adc8bb7821
Rename `Arguments` to `Parameters` in the AST (#6253)
## Summary

This PR renames a few AST nodes for clarity:

- `Arguments` is now `Parameters`
- `Arg` is now `Parameter`
- `ArgWithDefault` is now `ParameterWithDefault`

For now, the attribute names that reference `Parameters` directly are
changed (e.g., on `StmtFunctionDef`), but the attributes on `Parameters`
itself are not (e.g., `vararg`). We may revisit that decision in the
future.

For context, the AST node formerly known as `Arguments` is used in
function definitions. Formally (outside of the Python context),
"arguments" typically refers to "the values passed to a function", while
"parameters" typically refers to "the variables used in a function
definition". E.g., if you Google "arguments vs parameters", you'll get
some explanation like:

> A parameter is a variable in a function definition. It is a
placeholder and hence does not have a concrete value. An argument is a
value passed during function invocation.

We're thus deviating from Python's nomenclature in favor of a scheme
that we find to be more precise.
2023-08-01 13:53:28 -04:00
konsti 1df7e9831b
Replace `.map_or(false, $closure)` with `.is_some_and(closure)` (#6244)
**Summary**
[Option::is_some_and](https://doc.rust-lang.org/stable/std/option/enum.Option.html#method.is_some_and)
and
[Result::is_ok_and](https://doc.rust-lang.org/std/result/enum.Result.html#method.is_ok_and)
are new methods is rust 1.70. I find them way more readable than
`.map_or(false, ...)`.

The changes are `s/.map_or(false,/.is_some_and(/g`, then manually
switching to `is_ok_and` where the value is a Result rather than an
Option.

**Test Plan** n/a^
2023-08-01 19:29:42 +02:00
Micha Reiser debfca3a11
Remove `Parse` trait (#6235) 2023-08-01 18:35:03 +02:00
Charlie Marsh 83fe103d6e
Allow generic tuple and list calls in __all__ (#6247)
## Summary

Allows, e.g., `__all__ = list[str]()`.

Closes https://github.com/astral-sh/ruff/issues/6226.
2023-08-01 12:01:48 -04:00
Micha Reiser f45e8645d7
Remove unused parser modes
<!--
Thank you for contributing to Ruff! To help us out with reviewing, please consider the following:

- Does this pull request include a summary of the change? (See below.)
- Does this pull request include a descriptive title?
- Does this pull request include references to any relevant issues?
-->

## Summary

This PR removes the `Interactive` and `FunctionType` parser modes that are unused by ruff

<!-- What's the purpose of the change? What does it do, and why? -->

## Test Plan

`cargo test`

<!-- How was it tested? -->
2023-08-01 13:10:07 +02:00
Micha Reiser 7c7231db2e
Remove unsupported `type_comment` field
<!--
Thank you for contributing to Ruff! To help us out with reviewing, please consider the following:

- Does this pull request include a summary of the change? (See below.)
- Does this pull request include a descriptive title?
- Does this pull request include references to any relevant issues?
-->

## Summary

This PR removes the `type_comment` field which our parser doesn't support.

<!-- What's the purpose of the change? What does it do, and why? -->

## Test Plan

`cargo test`

<!-- How was it tested? -->
2023-08-01 12:53:13 +02:00
Micha Reiser 4ad5903ef6
Delete type-ignore node
<!--
Thank you for contributing to Ruff! To help us out with reviewing, please consider the following:

- Does this pull request include a summary of the change? (See below.)
- Does this pull request include a descriptive title?
- Does this pull request include references to any relevant issues?
-->

## Summary

This PR removes the type ignore node from the AST because our parser doesn't support it, and just having it around is confusing.

<!-- What's the purpose of the change? What does it do, and why? -->

## Test Plan

`cargo build`

<!-- How was it tested? -->
2023-08-01 12:34:50 +02:00
Micha Reiser ecfdd8d58b
Add static assertions to nodes (#6228) 2023-08-01 11:54:49 +02:00
David Szotten ba990b676f
add `DebugText` for self-documenting f-strings (#6167) 2023-08-01 07:55:03 +02:00
Charlie Marsh 646ff6497c
Ignore end-of-line file exemption comments (#6160)
## Summary

This PR protects against code like:

```python
from typing import Optional

import bar  # ruff: noqa
import baz

class Foo:
    x: Optional[str] = None
```

In which the user wrote `# ruff: noqa` to ignore a specific error, not
realizing that it was a file-level exemption that thus turned off all
lint rules.

Specifically, if a `# ruff: noqa` directive is not at the start of a
line, we now ignore it and warn, since this is almost certainly a
mistake.
2023-07-29 00:40:32 +00:00
Micha Reiser 40f54375cb
Pull in RustPython parser (#6099) 2023-07-27 09:29:11 +00:00
konsti 13f9a16e33
Rewrite placement logic (#6040)
## Summary
This is a rewrite of the main comment placement logic. `place_comment`
now has three parts:

- place own line comments
  - between branches
  - after a branch
- place end-of-line comments
  - after colon
  - after a branch
- place comments for specific nodes (that include module level comments)

The rewrite fixed three bugs: `class A: # trailing comment` comments now
stay end-of-line, `try: # comment` remains end-of-line and deeply
indented try-else-finally comments remain with the right nested
statement.

It will be much easier to give more alternative branches nodes since
this is abstracted away by `is_node_with_body` and the first/last child
helpers. Adding new node types can now be done by adding an entry to the
`place_comment` match. The code went from 1526 lines before #6033 to
1213 lines now.

It thinks it easier to just read the new `placement.rs` rather than
reviewing the diff.

## Test Plan

The existing fixtures staying the same or improving plus new ones for
the bug fixes.
2023-07-26 16:21:23 +00:00
Micha Reiser 2cf00fee96
Remove parser dependency from ruff-python-ast (#6096) 2023-07-26 17:47:22 +02:00
Micha Reiser 16e1737d1b
Use cursor based lexer (#6012) 2023-07-26 11:32:26 +02:00
Dhruv Manilawala 025fa4eba8
Integrate the new Jupyter AST nodes in Ruff (#6086)
## Summary

This PR adds the implementation for the new Jupyter AST nodes i.e.,
`ExprLineMagic` and `StmtLineMagic`.

## Test Plan

Add test cases for `unparse` containing magic commands

resolves: #6087
2023-07-26 08:20:30 +00:00
Harutaka Kawamura 62f821daaa
Avoid raising PT012 for simple `with` statements (#6081) 2023-07-26 01:43:31 +00:00
Zanie Blue 389fe13c93
Implement visitation of type aliases and parameters (#5927)
<!--
Thank you for contributing to Ruff! To help us out with reviewing,
please consider the following:

- Does this pull request include a summary of the change? (See below.)
- Does this pull request include a descriptive title?
- Does this pull request include references to any relevant issues?
-->

## Summary

<!-- What's the purpose of the change? What does it do, and why? -->

Part of #5062 
Requires https://github.com/astral-sh/RustPython-Parser/pull/32

Adds visitation of type alias statements and type parameters in class
and function definitions.

Duplicates tests for `PreorderVisitor` into `Visitor` with new
snapshots. Testing required node implementations for the `TypeParam`
enum, which is a chunk of the diff and the reason we need `Ranged`
implementations in
https://github.com/astral-sh/RustPython-Parser/pull/32.

## Test Plan

<!-- How was it tested? -->

Adds unit tests with snapshots.
2023-07-25 17:11:26 +00:00
konsti e7f228f781
Placement refactor (#6034)
## Summary

This PR is a refactoring of placement.rs. The code got more consistent,
some comments were updated and some dead code was removed or replaced
with debug assertions. It also contains a bugfix for the placement of
end-of-branch comments with nested bodies inside try statements that
occurred when refactoring the nested body loop.

## Test Plan

The existing test cases don't change. I added a couple of cases that i
think should be tested but weren't, and a regression test for the bugfix
2023-07-25 11:49:05 +02:00
Charlie Marsh 0d94337b96
Avoid allocations in `SimpleCallArgs` (#6021)
## Summary

My intuition is that it's faster to do these checks as-needed rather
than allocation new hash maps and vectors for the arguments. (We
typically only query once anyway.)
2023-07-24 04:55:37 +00:00
Charlie Marsh 9834c69c98
Remove `__all__` enforcement rules out of binding phase (#5897)
## Summary

This PR moves two rules (`invalid-all-format` and `invalid-all-object`)
out of the name-binding phase, and into the dedicated pass over all
bindings that occurs at the end of the `Checker`. This is part of my
continued quest to separate the semantic model-building logic from the
actual rule enforcement.
2023-07-19 21:18:47 +00:00
Zanie Blue b27f0fa433
Implement `any_over_expr` for type alias and type params (#5866)
Part of https://github.com/astral-sh/ruff/issues/5062
2023-07-19 16:17:06 -05:00
Charlie Marsh 5f3da9955a
Rename `ruff_python_whitespace` to `ruff_python_trivia` (#5886)
## Summary

This crate now contains utilities for dealing with trivia more broadly:
whitespace, newlines, "simple" trivia lexing, etc. So renaming it to
reflect its increased responsibilities.

To avoid conflicts, I've also renamed `Token` and `TokenKind` to
`SimpleToken` and `SimpleTokenKind`.
2023-07-19 11:48:27 -04:00
Charlie Marsh 626d8dc2cc
Use `.as_ref()` in lieu of `&**` (#5874)
I find this less opaque (and often more succinct).
2023-07-19 00:49:13 +00:00
Charlie Marsh 2d505e2b04
Remove suite body tracking from `SemanticModel` (#5848)
## Summary

The `SemanticModel` currently stores the "body" of a given `Suite`,
along with the current statement index. This is used to support "next
sibling" queries, but we only use this in exactly one place -- the rule
that simplifies constructs like this to `any` or `all`:

```python
for x in y:
    if x == 0:
        return True
return False
```

Instead of tracking the state, we can just do a (slightly more
expensive) traversal, by finding the node within its parent and
returning the next node in the body.

Note that we'll only have to do this extremely rarely -- namely, for
functions that contain something like:

```python
for x in y:
    if x == 0:
        return True
```
2023-07-18 18:58:31 -04:00
Zanie Blue a93254f026
Implement `unparse` for type aliases and parameters (#5869)
Part of https://github.com/astral-sh/ruff/issues/5062
2023-07-18 16:25:49 -05:00
Zanie Blue 41da52a61b
Implement `TokenKind` for type aliases (#5870)
Part of https://github.com/astral-sh/ruff/issues/5062
2023-07-18 18:21:51 +00:00
Zanie Blue d5c43a45b3
Implement `Comparable` for type aliases and parameters (#5865)
Part of https://github.com/astral-sh/ruff/issues/5062
2023-07-18 17:18:14 +00:00
Zanie Blue 0eab4b3c22
Implement `AnyNode` and `AnyNodRef` for `StmtTypeAlias` (#5863)
Part of https://github.com/astral-sh/ruff/issues/5062
2023-07-18 10:44:55 -05:00
Charlie Marsh c868def374
Unroll `collect_call_path` to speed up common cases (#5792)
## Summary

This PR just naively unrolls `collect_call_path` to handle attribute
resolutions of up to eight segments. In profiling via Instruments, it
seems to be about 4x faster for a very hot code path (4% of total
execution time on `main`, 1% here).

Profiling by running `RAYON_NUM_THREADS=1 cargo instruments -t time
--profile release-debug --time-limit 10000 -p ruff_cli -o
FromSlice.trace -- check crates/ruff/resources/test/cpython --silent -e
--no-cache --select ALL`, and modifying the linter to loop infinitely up
to the specified time (10 seconds) to increase sample size.

Before:

<img width="1792" alt="Screen Shot 2023-07-15 at 5 13 34 PM"
src="https://github.com/astral-sh/ruff/assets/1309177/4a8b0b45-8b67-43e9-af5e-65b326928a8e">

After:

<img width="1792" alt="Screen Shot 2023-07-15 at 8 38 51 PM"
src="https://github.com/astral-sh/ruff/assets/1309177/d8829159-2c79-4a49-ab3c-9e4e86f5b2b1">
2023-07-18 11:29:59 -04:00
konsti 730e6b2b4c
Refactor `StmtIf`: Formatter and Linter (#5459)
## Summary

Previously, `StmtIf` was defined recursively as
```rust
pub struct StmtIf {
    pub range: TextRange,
    pub test: Box<Expr>,
    pub body: Vec<Stmt>,
    pub orelse: Vec<Stmt>,
}
```
Every `elif` was represented as an `orelse` with a single `StmtIf`. This
means that this representation couldn't differentiate between
```python
if cond1:
    x = 1
else:
    if cond2:
        x = 2
```
and 
```python
if cond1:
    x = 1
elif cond2:
    x = 2
```
It also makes many checks harder than they need to be because we have to
recurse just to iterate over an entire if-elif-else and because we're
lacking nodes and ranges on the `elif` and `else` branches.

We change the representation to a flat

```rust
pub struct StmtIf {
    pub range: TextRange,
    pub test: Box<Expr>,
    pub body: Vec<Stmt>,
    pub elif_else_clauses: Vec<ElifElseClause>,
}

pub struct ElifElseClause {
    pub range: TextRange,
    pub test: Option<Expr>,
    pub body: Vec<Stmt>,
}
```
where `test: Some(_)` represents an `elif` and `test: None` an else.

This representation is different tradeoff, e.g. we need to allocate the
`Vec<ElifElseClause>`, the `elif`s are now different than the `if`s
(which matters in rules where want to check both `if`s and `elif`s) and
the type system doesn't guarantee that the `test: None` else is actually
last. We're also now a bit more inconsistent since all other `else`,
those from `for`, `while` and `try`, still don't have nodes. With the
new representation some things became easier, e.g. finding the `elif`
token (we can use the start of the `ElifElseClause`) and formatting
comments for if-elif-else (no more dangling comments splitting, we only
have to insert the dangling comment after the colon manually and set
`leading_alternate_branch_comments`, everything else is taken of by
having nodes for each branch and the usual placement.rs fixups).

## Merge Plan

This PR requires coordination between the parser repo and the main ruff
repo. I've split the ruff part, into two stacked PRs which have to be
merged together (only the second one fixes all tests), the first for the
formatter to be reviewed by @michareiser and the second for the linter
to be reviewed by @charliermarsh.

* MH: Review and merge
https://github.com/astral-sh/RustPython-Parser/pull/20
* MH: Review and merge or move later in stack
https://github.com/astral-sh/RustPython-Parser/pull/21
* MH: Review and approve
https://github.com/astral-sh/RustPython-Parser/pull/22
* MH: Review and approve formatter PR
https://github.com/astral-sh/ruff/pull/5459
* CM: Review and approve linter PR
https://github.com/astral-sh/ruff/pull/5460
* Merge linter PR in formatter PR, fix ecosystem checks (ecosystem
checks can't run on the formatter PR and won't run on the linter PR, so
we need to merge them first)
 * Merge https://github.com/astral-sh/RustPython-Parser/pull/22
 * Create tag in the parser, update linter+formatter PR
 * Merge linter+formatter PR https://github.com/astral-sh/ruff/pull/5459

---------

Co-authored-by: Micha Reiser <micha@reiser.io>
2023-07-18 13:40:15 +02:00
David Szotten 52aa2fc875
upgrade rustpython to remove tuple-constants (#5840)
c.f. https://github.com/astral-sh/RustPython-Parser/pull/28

Tests: No snapshots changed

---------

Co-authored-by: Zanie <contact@zanie.dev>
2023-07-17 22:50:31 +00:00
Charlie Marsh 2cd117ba81
Remove `TryIdentifier` trait (#5816)
## Summary

Last remaining usage here is for patterns, but we now have ranges on
identifiers so it's unnecessary.
2023-07-16 21:24:16 -04:00
Charlie Marsh 01b05fe247
Remove `Identifier` usages for isolating exception names (#5797)
## Summary

The motivating change here is to remove `let range =
except_handler.try_identifier().unwrap();` and instead just do
`name.range()`, since exception names now have ranges attached to them
by the parse. This also required some refactors (which are improvements)
to the built-in attribute shadowing rules, since at least one invocation
relied on passing in the exception handler and calling
`.try_identifier()`. Now that we have easy access to identifiers, we can
remove the whole `AnyShadowing` abstraction.
2023-07-16 04:49:48 +00:00
Charlie Marsh 4782675bf9
Remove lexer-based comment range detection (#5785)
## Summary

I'm doing some unrelated profiling, and I noticed that this method is
actually measurable on the CPython benchmark -- it's > 1% of execution
time. We don't need to lex here, we already know the ranges of all
comments, so we can just do a simple binary search for overlap, which
brings the method down to 0%.

## Test Plan

`cargo test`
2023-07-16 01:03:27 +00:00
guillaumeLepape 6824b67f44
Include alias when formatting import-from structs (#5786)
## Summary

When required-imports is set with the syntax from ... import ... as ...,
autofix I002 is failing

## Test Plan

Reuse the same python files as
`crates/ruff/src/rules/isort/mod.rs:required_import` test.
2023-07-15 15:53:21 -04:00
Charlie Marsh 5a4516b812
Misc. stylistic changes from flipping through rules late at night (#5757)
## Summary

This is really bad PR hygiene, but a mix of: using `Locator`-based fixes
in a few places (in lieu of `Generator`-based fixes), using match syntax
to avoid `.len() == 1` checks, using common helpers in more places, etc.

## Test Plan

`cargo test`
2023-07-14 05:23:47 +00:00
Charlie Marsh 6dbc6d2e59
Use shared `Cursor` across crates (#5715)
## Summary

We have two `Cursor` implementations. This PR moves the implementation
from the formatter into `ruff_python_whitespace` (kind of a poorly-named
crate now) and uses it for both use-cases.
2023-07-12 21:09:27 +00:00
Charlie Marsh 4dee49d6fa
Run nightly Clippy over the Ruff repo (#5670)
## Summary

This is the result of running `cargo +nightly clippy --workspace
--all-targets --all-features -- -D warnings` and fixing all violations.
Just wanted to see if there were any interesting new checks on nightly
👀
2023-07-10 23:44:38 -04:00
konsti 0b9af031fb
Format ExprIfExp (ternary operator) (#5597)
## Summary

Format `ExprIfExp`, also known as the ternary operator or inline `if`.
It can look like
```python
a1 = 1 if True else 2
```
but also
```python
b1 = (
    # We return "a" ...
    "a" # that's our True value
    # ... if this condition matches ...
    if True # that's our test
    # ... otherwise we return "b§
    else "b" # that's our False value
)
```

This also fixes a visitor order bug.

The jaccard index on django goes from 0.911 to 0.915.

## Test Plan

I added fixtures without and with comments in strange places.
2023-07-07 19:11:52 +00:00
konsti 8184235f93
Try statements have a body: Fix formatter instability (#5558)
## Summary

The following code was previously leading to unstable formatting:
```python
try:
    try:
        pass
    finally:
        print(1)  # issue7208
except A:
    pass
```
The comment would be formatted as a trailing comment of `try` which is
unstable as an end-of-line comment gets two extra whitespaces.

This was originally found in
99b00efd5e/Lib/getpass.py (L68-L91)

## Test Plan

I added a regression test
2023-07-06 16:07:47 +02:00
Charlie Marsh dadad0e9ed
Remove some allocations in argument detection (#5481)
## Summary

Drive-by PR to remove some allocations around argument name matching.
2023-07-03 12:21:26 -04:00
Anders Kaseorg df13e69c3c
Format let-else with rustfmt nightly (#5461)
Support for `let…else` formatting was just merged to nightly
(rust-lang/rust#113225). Rerun `cargo fmt` with Rust nightly 2023-07-02
to pick this up. Followup to #939.

Signed-off-by: Anders Kaseorg <andersk@mit.edu>
2023-07-03 02:13:35 +00:00
Charlie Marsh fa1b85b3da
Remove prelude from `ruff_python_ast` (#5369)
## Summary

Per @MichaReiser, this is causing more confusion than it is helpful.
2023-06-26 11:43:49 -04:00
Micha Reiser 6ba9d5d5a4
Upgrade RustPython (#5334) 2023-06-23 20:39:47 +00:00
James Berry f85eb709e2
Visit AugAssign target after value (#5325)
## Summary

When visiting AugAssign in evaluation order, the AugAssign `target`
should be visited after it's `value`. Based on my testing, the pseudo
code for `a += b` is effectively:
```python
tmp = a
a = tmp.__iadd__(b)
```

That is, an ideal traversal order would look something like this:
1. load a
2. b
3. op
4. store a

But, there is only a single AST node which captures `a` in the statement
`a += b`, so it cannot be traversed both before and after the traversal
of `b` and the `op`.

Nonetheless, I think traversing `a` after `b` and the `op` makes the
most sense for a number of reasons:
1. All the other assignment expressions traverse their `value`s before
their `target`s. Having `AugAssign` traverse in the same order would be
more consistent.
2. Within the AST, the `ctx` of the `target` for an `AugAssign` is
`Store` (though technically this is a `Load` and `Store` operation, the
AST only indicates it as a `Store`). Since the the store portion of the
`AugAssign` occurs last, I think it makes sense to traverse the `target`
last as well.

The effect of this is marginal, but it may have an impact on the
behavior of #5271.
2023-06-23 09:54:54 -04:00
Micha Reiser c52aa8f065
Basic string formatting
<!--
Thank you for contributing to Ruff! To help us out with reviewing, please consider the following:

- Does this pull request include a summary of the change? (See below.)
- Does this pull request include a descriptive title?
- Does this pull request include references to any relevant issues?
-->

## Summary

This PR implements formatting for non-f-string Strings that do not use implicit concatenation. 

Docstring formatting is out of the scope of this PR.

<!-- What's the purpose of the change? What does it do, and why? -->

## Test Plan

I added a few tests for simple string literals. 

## Performance

Ouch. This is hitting performance somewhat hard. This is probably because we now iterate each string a couple of times:

1. To detect if it is an implicit string continuation
2. To detect if the string contains any new lines
3. To detect the preferred quote
4. To normalize the string

Edit: I integrated the detection of newlines into the preferred quote detection so that we only iterate the string three time.
We can probably do better by merging the implicit string continuation with the quote detection and new line detection by iterating till the end of the string part and returning the offset. We then use our simple tokenizer to skip over any comments or whitespace until we find the first non trivia token. From there we keep continue doing this in a loop until we reach the end o the string. I'll leave this improvement for later.
2023-06-23 09:46:05 +02:00
James Berry 2142bf6141
Fix annotation and format spec visitors (#5324)
## Summary

The `Visitor` and `preorder::Visitor` traits provide some convenience
functions, `visit_annotation` and `visit_format_spec`, for handling
annotation and format spec expressions respectively. Both of these
functions accept an `&Expr` and have a default implementation which
delegates to `walk_expr`. The problem with this approach is that any
custom handling done in `visit_expr` will be skipped for annotations and
format specs. Instead, to capture any custom logic implemented in
`visit_expr`, both of these function's default implementations should
delegate to `visit_expr` instead of `walk_expr`.

## Example

Consider the below `Visitor` implementation:
```rust
impl<'a> Visitor<'a> for Example<'a> {
    fn visit_expr(&mut self, expr: &'a Expr) {
        match expr {
            Expr::Name(ExprName { id, .. }) => println!("Visiting {:?}", id),
            _ => walk_expr(self, expr),
        }
    }
}
```

Run on the following Python snippet:
```python
a: b
```

I would expect such a visitor to print the following:
```
Visiting b
Visiting a
```

But it instead prints the following:
```
Visiting a
```

Our custom `visit_expr` handler is not invoked for the annotation.

## Test Plan

Tests added in #5271 caught this behavior.
2023-06-23 03:55:42 +00:00
James Berry f194572be8
Remove visit_arg_with_default (#5265)
## Summary

This is a follow up to #5221. Turns out it was easy to restructure the
visitor to get the right order, I'm just dumb 🤷‍♂️ I've
removed `visit_arg_with_default` entirely from the `Visitor`, although
it still exists as part of `preorder::Visitor`.
2023-06-21 16:00:24 -04:00
James Berry 9b5fb8f38f
Fix AST visitor traversal order (#5221)
## Summary

According to the AST visitor documentation, the AST visitor "visits all
nodes in the AST recursively in evaluation-order". However, the current
traversal fails to meet this specification in a few places.

### Function traversal

```python
order = []
@(order.append("decorator") or (lambda x: x))
def f(
    posonly: order.append("posonly annotation") = order.append("posonly default"),
    /,
    arg: order.append("arg annotation") = order.append("arg default"),
    *args: order.append("vararg annotation"),
    kwarg: order.append("kwarg annotation") = order.append("kwarg default"),
    **kwargs: order.append("kwarg annotation")
) -> order.append("return annotation"):
    pass
print(order)
```

Executing the above snippet using CPython 3.10.6 prints the following
result (formatted for readability):
```python
[
    'decorator',
    'posonly default',
    'arg default',
    'kwarg default',
    'arg annotation',
    'posonly annotation',
    'vararg annotation',
    'kwarg annotation',
    'kwarg annotation',
    'return annotation',
]
```

Here we can see that decorators are evaluated first, followed by
argument defaults, and annotations are last. The current traversal of a
function's AST does not align with this order.

### Annotated assignment traversal
```python
order = []
x: order.append("annotation") = order.append("expression")
print(order)
```

Executing the above snippet using CPython 3.10.6 prints the following
result:
```python
['expression', 'annotation']
```

Here we can see that an annotated assignments annotation gets evaluated
after the assignment's expression. The current traversal of an annotated
assignment's AST does not align with this order.

## Why?

I'm slowly working on #3946 and porting over some of the logic and tests
from ssort. ssort is very sensitive to AST traversal order, so ensuring
the utmost correctness here is important.

## Test Plan

There doesn't seem to be existing tests for the AST visitor, so I didn't
bother adding tests for these very subtle changes. However, this
behavior will be captured in the tests for the PR which addresses #3946.
2023-06-21 14:40:58 -04:00
Micha Reiser e520a3a721
Fix ArgWithDefault comments handling (#5204) 2023-06-20 20:48:07 +00:00
Charlie Marsh 7bc33a8d5f
Remove identifier lexing in favor of parser ranges (#5195)
## Summary

Now that all identifiers include ranges (#5194), we can remove a ton of
this "custom lexing" code that we have to sketchily extract identifier
ranges from source.

## Test Plan

`cargo test`
2023-06-20 12:07:29 -04:00
Charlie Marsh 6331598511
Upgrade `RustPython` to access ranged names (#5194)
## Summary

In https://github.com/astral-sh/RustPython-Parser/pull/8, we modified
RustPython to include ranges for any identifiers that aren't
`Expr::Name` (which already has an identifier).

For example, the `e` in `except ValueError as e` was previously
un-ranged. To extract its range, we had to do some lexing of our own.
This change should improve performance and let us remove a bunch of
code.

## Test Plan

`cargo test`
2023-06-20 15:43:38 +00:00
Charlie Marsh 8e06140d1d
Remove continuations when deleting statements (#5198)
## Summary

This PR modifies our statement deletion logic to delete any preceding
continuation lines.

For example, given:

```py
x = 1; \
  import os
```

We'll now rewrite to:

```py
x = 1;
```

In addition, the logic can now handle multiple preceding continuations
(which is unlikely, but valid).
2023-06-19 22:04:28 -04:00
Charlie Marsh 36e01ad6eb
Upgrade RustPython (#5192)
## Summary

This PR upgrade RustPython to pull in the changes to `Arguments` (zip
defaults with their identifiers) and all the renames to `CmpOp` and
friends.
2023-06-19 21:09:53 +00:00
Thomas de Zeeuw e3c12764f8
Only use a single cache file per Python package (#5117)
## Summary

This changes the caching design from one cache file per source file, to
one cache file per package. This greatly reduces the amount of cache
files that are opened and written, while maintaining roughly the same
(combined) size as bincode is very compact.

Below are some very much not scientific performance tests. It uses
projects/sources to check:

* small.py: single, 31 bytes Python file with 2 errors.
* test.py: single, 43k Python file with 8 errors.
* fastapi: FastAPI repo, 1134 files checked, 0 errors.

Source   | Before # files | After # files | Before size | After size
-------|-------|-------|-------|-------
small.py | 1              | 1             | 20 K        | 20 K
test.py  | 1              | 1             | 60 K        | 60 K
fastapi  | 1134           | 518           | 4.5 M       | 2.3 M

One question that might come up is why fastapi still has 518 cache files
and not 1? That is because this is using the existing package
resolution, which sees examples, docs, etc. as separate from the "main"
source code (in the fastapi directory in the repo). In this future it
might be worth consider switching to a one cache file per repo strategy.

This new design is not perfect and does have a number of known issues.
First, like the old design it doesn't remove the cache for a source file
that has been (re)moved until `ruff clean` is called.

Second, this currently uses a large mutex around the mutation of the
package cache (e.g. inserting result). This could be (or become) a
bottleneck. It's future work to test and improve this (if needed).

Third, currently the packages and opened and stored in a sequential
loop, this could be done parallel. This is also future work.


## Test Plan

Run `ruff check` (with caching enabled) twice on any Python source code
and it should produce the same results.
2023-06-19 17:46:13 +02:00
Charlie Marsh 2b82caa163
Detect continuations at start-of-file (#5173)
## Summary

Given:

```python
\
import os
```

Deleting `import os` leaves a syntax error: a file can't end in a
continuation. We have code to handle this case, but it failed to pick up
continuations at the _very start_ of a file.

Closes #5156.
2023-06-19 00:09:02 -04:00
Charlie Marsh fab2a4adf7
Use `matches!` for insecure hash rule (#5141) 2023-06-16 04:18:32 +00:00
Charlie Marsh 5ea3e42513
Always use identifier ranges to store bindings (#5110)
## Summary

At present, when we store a binding, we include a `TextRange` alongside
it. The `TextRange` _sometimes_ matches the exact range of the
identifier to which the `Binding` is linked, but... not always.

For example, given:

```python
x = 1
```

The binding we create _will_ use the range of `x`, because the left-hand
side is an `Expr::Name`, which has a valid range on it.

However, given:

```python
try:
  pass
except ValueError as e:
  pass
```

When we create a binding for `e`, we don't have a `TextRange`... The AST
doesn't give us one. So we end up extracting it via lexing.

This PR extends that pattern to the rest of the binding kinds, to ensure
that whenever we create a binding, we always use the range of the bound
name. This leads to better diagnostics in cases like pattern matching,
whereby the diagnostic for "unused variable `x`" here used to include
`*x`, instead of just `x`:

```python
def f(provided: int) -> int:
    match provided:
        case [_, *x]:
            pass
```

This is _also_ required for symbol renames, since we track writes as
bindings -- so we need to know the ranges of the bound symbols.

By storing these bindings precisely, we can also remove the
`binding.trimmed_range` abstraction -- since bindings already use the
"trimmed range".

To implement this behavior, I took some of our existing utilities (like
the code we had for `except ValueError as e` above), migrated them from
a full lexer to a zero-allocation lexer that _only_ identifies
"identifiers", and moved the behavior into a trait, so we can now do
`stmt.identifier(locator)` to get the range for the identifier.

Honestly, we might end up discarding much of this if we decide to put
ranges on all identifiers
(https://github.com/astral-sh/RustPython-Parser/pull/8). But even if we
do, this will _still_ be a good change, because the lexer introduced
here is useful beyond names (e.g., we use it find the `except` keyword
in an exception handler, to find the `else` after a `for` loop, and so
on). So, I'm fine committing this even if we end up changing our minds
about the right approach.

Closes #5090.

## Benchmarks

No significant change, with one statistically significant improvement
(-2.1654% on `linter/all-rules/large/dataset.py`):

```
linter/default-rules/numpy/globals.py
                        time:   [73.922 µs 73.955 µs 73.986 µs]
                        thrpt:  [39.882 MiB/s 39.898 MiB/s 39.916 MiB/s]
                 change:
                        time:   [-0.5579% -0.4732% -0.3980%] (p = 0.00 < 0.05)
                        thrpt:  [+0.3996% +0.4755% +0.5611%]
                        Change within noise threshold.
Found 6 outliers among 100 measurements (6.00%)
  4 (4.00%) low severe
  1 (1.00%) low mild
  1 (1.00%) high mild
linter/default-rules/pydantic/types.py
                        time:   [1.4909 ms 1.4917 ms 1.4926 ms]
                        thrpt:  [17.087 MiB/s 17.096 MiB/s 17.106 MiB/s]
                 change:
                        time:   [+0.2140% +0.2741% +0.3392%] (p = 0.00 < 0.05)
                        thrpt:  [-0.3380% -0.2734% -0.2136%]
                        Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe
linter/default-rules/numpy/ctypeslib.py
                        time:   [688.97 µs 691.34 µs 694.15 µs]
                        thrpt:  [23.988 MiB/s 24.085 MiB/s 24.168 MiB/s]
                 change:
                        time:   [-1.3282% -0.7298% -0.1466%] (p = 0.02 < 0.05)
                        thrpt:  [+0.1468% +0.7351% +1.3461%]
                        Change within noise threshold.
Found 15 outliers among 100 measurements (15.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  12 (12.00%) high severe
linter/default-rules/large/dataset.py
                        time:   [3.3872 ms 3.4032 ms 3.4191 ms]
                        thrpt:  [11.899 MiB/s 11.954 MiB/s 12.011 MiB/s]
                 change:
                        time:   [-0.6427% -0.2635% +0.0906%] (p = 0.17 > 0.05)
                        thrpt:  [-0.0905% +0.2642% +0.6469%]
                        No change in performance detected.
Found 20 outliers among 100 measurements (20.00%)
  1 (1.00%) low severe
  2 (2.00%) low mild
  4 (4.00%) high mild
  13 (13.00%) high severe

linter/all-rules/numpy/globals.py
                        time:   [148.99 µs 149.21 µs 149.42 µs]
                        thrpt:  [19.748 MiB/s 19.776 MiB/s 19.805 MiB/s]
                 change:
                        time:   [-0.7340% -0.5068% -0.2778%] (p = 0.00 < 0.05)
                        thrpt:  [+0.2785% +0.5094% +0.7395%]
                        Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) low mild
  1 (1.00%) high severe
linter/all-rules/pydantic/types.py
                        time:   [3.0362 ms 3.0396 ms 3.0441 ms]
                        thrpt:  [8.3779 MiB/s 8.3903 MiB/s 8.3997 MiB/s]
                 change:
                        time:   [-0.0957% +0.0618% +0.2125%] (p = 0.45 > 0.05)
                        thrpt:  [-0.2121% -0.0618% +0.0958%]
                        No change in performance detected.
Found 11 outliers among 100 measurements (11.00%)
  1 (1.00%) low severe
  3 (3.00%) low mild
  5 (5.00%) high mild
  2 (2.00%) high severe
linter/all-rules/numpy/ctypeslib.py
                        time:   [1.6879 ms 1.6894 ms 1.6909 ms]
                        thrpt:  [9.8478 MiB/s 9.8562 MiB/s 9.8652 MiB/s]
                 change:
                        time:   [-0.2279% -0.0888% +0.0436%] (p = 0.18 > 0.05)
                        thrpt:  [-0.0435% +0.0889% +0.2284%]
                        No change in performance detected.
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) low mild
  1 (1.00%) high severe
linter/all-rules/large/dataset.py
                        time:   [7.1520 ms 7.1586 ms 7.1654 ms]
                        thrpt:  [5.6777 MiB/s 5.6831 MiB/s 5.6883 MiB/s]
                 change:
                        time:   [-2.5626% -2.1654% -1.7780%] (p = 0.00 < 0.05)
                        thrpt:  [+1.8102% +2.2133% +2.6300%]
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
```
2023-06-15 18:43:19 +00:00
konstin 66089e1a2e
Fix a number of formatter errors from the cpython repository (#5089)
## Summary

This fixes a number of problems in the formatter that showed up with
various files in the [cpython](https://github.com/python/cpython)
repository. These problems surfaced as unstable formatting and invalid
code. This is not the entirety of problems discovered through cpython,
but a big enough chunk to separate it. Individual fixes are generally
individual commits. They were discovered with #5055, which i update as i
work through the output

## Test Plan

I added regression tests with links to cpython for each entry, except
for the two stubs that also got comment stubs since they'll be
implemented properly later.
2023-06-15 11:24:14 +00:00
Charlie Marsh 716cab2f19
Run `rustfmt` on nightly to clean up erroneous comments (#5106)
## Summary

This PR runs `rustfmt` with a few nightly options as a one-time fix to
catch some malformatted comments. I ended up just running with:

```toml
condense_wildcard_suffixes = true
edition = "2021"
max_width = 100
normalize_comments = true
normalize_doc_attributes = true
reorder_impl_items = true
unstable_features = true
use_field_init_shorthand = true
```

Since these all seem like reasonable things to fix, so may as well while
I'm here.
2023-06-15 00:19:05 +00:00
Charlie Marsh aa41ffcfde
Add `BindingKind` variants to represent deleted bindings (#5071)
## Summary

Our current mechanism for handling deletions (e.g., `del x`) is to
remove the symbol from the scope's `bindings` table. This "does the
right thing", in that if we then reference a deleted symbol, we're able
to determine that it's unbound -- but it causes a variety of problems,
mostly in that it makes certain bindings and references unreachable
after-the-fact.

Consider:

```python
x = 1
print(x)
del x
```

If we analyze this code _after_ running the semantic model over the AST,
we'll have no way of knowing that `x` was ever introduced in the scope,
much less that it was bound to a value, read, and then deleted --
because we effectively erased `x` from the model entirely when we hit
the deletion.

In practice, this will make it impossible for us to support local symbol
renames. It also means that certain rules that we want to move out of
the model-building phase and into the "check dead scopes" phase wouldn't
work today, since we'll have lost important information about the source
code.

This PR introduces two new `BindingKind` variants to model deletions:

- `BindingKind::Deletion`, which represents `x = 1; del x`.
- `BindingKind::UnboundException`, which represents:

```python
try:
  1 / 0
except Exception as e:
  pass
```

In the latter case, `e` gets unbound after the exception handler
(assuming it's triggered), so we want to handle it similarly to a
deletion.

The main challenge here is auditing all of our existing `Binding` and
`Scope` usages to understand whether they need to accommodate deletions
or otherwise behave differently. If you look one commit back on this
branch, you'll see that the code is littered with `NOTE(charlie)`
comments that describe the reasoning behind changing (or not) each of
those call sites. I've also augmented our test suite in preparation for
this change over a few prior PRs.

### Alternatives

As an alternative, I considered introducing a flag to `BindingFlags`,
like `BindingFlags::UNBOUND`, and setting that at the appropriate time.

This turned out to be a much more difficult change, because we tend to
match on `BindingKind` all over the place (e.g., we have a bunch of code
blocks that only run when a `BindingKind` is
`BindingKind::Importation`). As a result, introducing these new
`BindingKind` variants requires only a few changes at the client sites.
Adding a flag would've required a much wider-reaching change.
2023-06-14 09:27:24 -04:00