## Summary
This PR updates the linter to show all the parse errors as diagnostics
instead of just the first one.
Note that this doesn't affect the parse error displayed as error log
message. This will be removed in a follow-up PR.
### Breaking?
I don't think this is a breaking change even though this might give more
diagnostics. The main reason is that this shouldn't affect any users
because it'll only give additional diagnostics in the case of multiple
syntax errors.
## Test Plan
Add an integration test case which would raise more than one parse
error.
## Summary
This PR updates the parser to remove building the `CommentRanges` and
instead it'll be built by the linter and the formatter when it's
required.
For the linter, it'll be built and owned by the `Indexer` while for the
formatter it'll be built from the `Tokens` struct and passed as an
argument.
## Test Plan
`cargo insta test`
## Summary
This PR removes the `result-like` dependency and instead implement the
required functionality. The motivation being that `noqa.is_enabled()` is
easier to read than `noqa.into()`.
For context, I was just trying to understand the syntax error workflow
and I saw these flags which were being converted via `into`. I always
find `into` confusing because you never know what's it being converted
into unless you know the type. Later realized that it's just a boolean
flag. After removing the usages from these two flags, it turns out that
the dependency is only being used in one rule so I thought to remove
that as well.
## Test Plan
`cargo insta test`
## Summary
This PR updates the entire parser stack in multiple ways:
### Make the lexer lazy
* https://github.com/astral-sh/ruff/pull/11244
* https://github.com/astral-sh/ruff/pull/11473
Previously, Ruff's lexer would act as an iterator. The parser would
collect all the tokens in a vector first and then process the tokens to
create the syntax tree.
The first task in this project is to update the entire parsing flow to
make the lexer lazy. This includes the `Lexer`, `TokenSource`, and
`Parser`. For context, the `TokenSource` is a wrapper around the `Lexer`
to filter out the trivia tokens[^1]. Now, the parser will ask the token
source to get the next token and only then the lexer will continue and
emit the token. This means that the lexer needs to be aware of the
"current" token. When the `next_token` is called, the current token will
be updated with the newly lexed token.
The main motivation to make the lexer lazy is to allow re-lexing a token
in a different context. This is going to be really useful to make the
parser error resilience. For example, currently the emitted tokens
remains the same even if the parser can recover from an unclosed
parenthesis. This is important because the lexer emits a
`NonLogicalNewline` in parenthesized context while a normal `Newline` in
non-parenthesized context. This different kinds of newline is also used
to emit the indentation tokens which is important for the parser as it's
used to determine the start and end of a block.
Additionally, this allows us to implement the following functionalities:
1. Checkpoint - rewind infrastructure: The idea here is to create a
checkpoint and continue lexing. At a later point, this checkpoint can be
used to rewind the lexer back to the provided checkpoint.
2. Remove the `SoftKeywordTransformer` and instead use lookahead or
speculative parsing to determine whether a soft keyword is a keyword or
an identifier
3. Remove the `Tok` enum. The `Tok` enum represents the tokens emitted
by the lexer but it contains owned data which makes it expensive to
clone. The new `TokenKind` enum just represents the type of token which
is very cheap.
This brings up a question as to how will the parser get the owned value
which was stored on `Tok`. This will be solved by introducing a new
`TokenValue` enum which only contains a subset of token kinds which has
the owned value. This is stored on the lexer and is requested by the
parser when it wants to process the data. For example:
8196720f80/crates/ruff_python_parser/src/parser/expression.rs (L1260-L1262)
[^1]: Trivia tokens are `NonLogicalNewline` and `Comment`
### Remove `SoftKeywordTransformer`
* https://github.com/astral-sh/ruff/pull/11441
* https://github.com/astral-sh/ruff/pull/11459
* https://github.com/astral-sh/ruff/pull/11442
* https://github.com/astral-sh/ruff/pull/11443
* https://github.com/astral-sh/ruff/pull/11474
For context,
https://github.com/RustPython/RustPython/pull/4519/files#diff-5de40045e78e794aa5ab0b8aacf531aa477daf826d31ca129467703855408220
added support for soft keywords in the parser which uses infinite
lookahead to classify a soft keyword as a keyword or an identifier. This
is a brilliant idea as it basically wraps the existing Lexer and works
on top of it which means that the logic for lexing and re-lexing a soft
keyword remains separate. The change here is to remove
`SoftKeywordTransformer` and let the parser determine this based on
context, lookahead and speculative parsing.
* **Context:** The transformer needs to know the position of the lexer
between it being at a statement position or a simple statement position.
This is because a `match` token starts a compound statement while a
`type` token starts a simple statement. **The parser already knows
this.**
* **Lookahead:** Now that the parser knows the context it can perform
lookahead of up to two tokens to classify the soft keyword. The logic
for this is mentioned in the PR implementing it for `type` and `match
soft keyword.
* **Speculative parsing:** This is where the checkpoint - rewind
infrastructure helps. For `match` soft keyword, there are certain cases
for which we can't classify based on lookahead. The idea here is to
create a checkpoint and keep parsing. Based on whether the parsing was
successful and what tokens are ahead we can classify the remaining
cases. Refer to #11443 for more details.
If the soft keyword is being parsed in an identifier context, it'll be
converted to an identifier and the emitted token will be updated as
well. Refer
8196720f80/crates/ruff_python_parser/src/parser/expression.rs (L487-L491).
The `case` soft keyword doesn't require any special handling because
it'll be a keyword only in the context of a match statement.
### Update the parser API
* https://github.com/astral-sh/ruff/pull/11494
* https://github.com/astral-sh/ruff/pull/11505
Now that the lexer is in sync with the parser, and the parser helps to
determine whether a soft keyword is a keyword or an identifier, the
lexer cannot be used on its own. The reason being that it's not
sensitive to the context (which is correct). This means that the parser
API needs to be updated to not allow any access to the lexer.
Previously, there were multiple ways to parse the source code:
1. Passing the source code itself
2. Or, passing the tokens
Now that the lexer and parser are working together, the API
corresponding to (2) cannot exists. The final API is mentioned in this
PR description: https://github.com/astral-sh/ruff/pull/11494.
### Refactor the downstream tools (linter and formatter)
* https://github.com/astral-sh/ruff/pull/11511
* https://github.com/astral-sh/ruff/pull/11515
* https://github.com/astral-sh/ruff/pull/11529
* https://github.com/astral-sh/ruff/pull/11562
* https://github.com/astral-sh/ruff/pull/11592
And, the final set of changes involves updating all references of the
lexer and `Tok` enum. This was done in two-parts:
1. Update all the references in a way that doesn't require any changes
from this PR i.e., it can be done independently
* https://github.com/astral-sh/ruff/pull/11402
* https://github.com/astral-sh/ruff/pull/11406
* https://github.com/astral-sh/ruff/pull/11418
* https://github.com/astral-sh/ruff/pull/11419
* https://github.com/astral-sh/ruff/pull/11420
* https://github.com/astral-sh/ruff/pull/11424
2. Update all the remaining references to use the changes made in this
PR
For (2), there were various strategies used:
1. Introduce a new `Tokens` struct which wraps the token vector and add
methods to query a certain subset of tokens. These includes:
1. `up_to_first_unknown` which replaces the `tokenize` function
2. `in_range` and `after` which replaces the `lex_starts_at` function
where the former returns the tokens within the given range while the
latter returns all the tokens after the given offset
2. Introduce a new `TokenFlags` which is a set of flags to query certain
information from a token. Currently, this information is only limited to
any string type token but can be expanded to include other information
in the future as needed. https://github.com/astral-sh/ruff/pull/11578
3. Move the `CommentRanges` to the parsed output because this
information is common to both the linter and the formatter. This removes
the need for `tokens_and_ranges` function.
## Test Plan
- [x] Update and verify the test snapshots
- [x] Make sure the entire test suite is passing
- [x] Make sure there are no changes in the ecosystem checks
- [x] Run the fuzzer on the parser
- [x] Run this change on dozens of open-source projects
### Running this change on dozens of open-source projects
Refer to the PR description to get the list of open source projects used
for testing.
Now, the following tests were done between `main` and this branch:
1. Compare the output of `--select=E999` (syntax errors)
2. Compare the output of default rule selection
3. Compare the output of `--select=ALL`
**Conclusion: all output were same**
## What's next?
The next step is to introduce re-lexing logic and update the parser to
feed the recovery information to the lexer so that it can emit the
correct token. This moves us one step closer to having error resilience
in the parser and provides Ruff the possibility to lint even if the
source code contains syntax errors.
## Summary
Alternative to #11237
This PR adds a new `Tokens` struct which is a newtype wrapper around a
vector of lexer output. This allows us to add a `kinds` method which
returns an iterator over the corresponding `TokenKind`. This iterator is
implemented as a separate `TokenKindIter` struct to allow using the type
and provide additional methods like `peek` directly on the iterator.
This exposes the linter to access the stream of `TokenKind` instead of
`Tok`.
Edit: I've made the necessary downstream changes and plan to merge the
entire stack at once.
## Summary
This PR removes the `ImportMap` implementation and all its routing
through ruff.
The import map was added in https://github.com/astral-sh/ruff/pull/3243
but we then never ended up using it to do cross file analysis.
We are now working on adding multifile analysis to ruff, and revisit
import resolution as part of it.
```
hyperfine --warmup 10 --runs 20 --setup "./target/release/ruff clean" \
"./target/release/ruff check crates/ruff_linter/resources/test/cpython -e -s --extend-select=I" \
"./target/release/ruff-import check crates/ruff_linter/resources/test/cpython -e -s --extend-select=I"
Benchmark 1: ./target/release/ruff check crates/ruff_linter/resources/test/cpython -e -s --extend-select=I
Time (mean ± σ): 37.6 ms ± 0.9 ms [User: 52.2 ms, System: 63.7 ms]
Range (min … max): 35.8 ms … 39.8 ms 20 runs
Benchmark 2: ./target/release/ruff-import check crates/ruff_linter/resources/test/cpython -e -s --extend-select=I
Time (mean ± σ): 36.0 ms ± 0.7 ms [User: 50.3 ms, System: 58.4 ms]
Range (min … max): 34.5 ms … 37.6 ms 20 runs
Summary
./target/release/ruff-import check crates/ruff_linter/resources/test/cpython -e -s --extend-select=I ran
1.04 ± 0.03 times faster than ./target/release/ruff check crates/ruff_linter/resources/test/cpython -e -s --extend-select=I
```
I suspect that the performance improvement should even be more
significant for users that otherwise don't have any diagnostics.
```
hyperfine --warmup 10 --runs 20 --setup "cd ../ecosystem/airflow && ../../ruff/target/release/ruff clean" \
"./target/release/ruff check ../ecosystem/airflow -e -s --extend-select=I" \
"./target/release/ruff-import check ../ecosystem/airflow -e -s --extend-select=I"
Benchmark 1: ./target/release/ruff check ../ecosystem/airflow -e -s --extend-select=I
Time (mean ± σ): 53.7 ms ± 1.8 ms [User: 68.4 ms, System: 63.0 ms]
Range (min … max): 51.1 ms … 58.7 ms 20 runs
Benchmark 2: ./target/release/ruff-import check ../ecosystem/airflow -e -s --extend-select=I
Time (mean ± σ): 50.8 ms ± 1.4 ms [User: 50.7 ms, System: 60.9 ms]
Range (min … max): 48.5 ms … 55.3 ms 20 runs
Summary
./target/release/ruff-import check ../ecosystem/airflow -e -s --extend-select=I ran
1.06 ± 0.05 times faster than ./target/release/ruff check ../ecosystem/airflow -e -s --extend-select=I
```
## Test Plan
`cargo test`
Extends https://github.com/astral-sh/ruff/pull/9752 adding internal test
rules for redirection
Fixes a bug where we did not see warnings for exact codes that are
redirected (just prefixes)
Updated implementation of https://github.com/astral-sh/ruff/pull/7369
which was left out in the cold.
This was motivated again following changes in #9691 and #9689 where we
could not test the changes without actually deprecating or removing
rules.
---
Follow-up to discussion in https://github.com/astral-sh/ruff/pull/7210
Moves integration tests from using rules that are transitively in
nursery / preview groups to dedicated test rules that only exist during
development. These rules always raise violations (they do not require
specific file behavior). The rules are not available in production or in
the documentation.
Uses features instead of `cfg(test)` for cross-crate support per
https://github.com/rust-lang/cargo/issues/8379
## Summary
Given a statement like `colors = 6`, we currently treat the cell as an
automagic (since `colors` is an automagic) -- i.e., we assume it's
equivalent to `%colors = 6`. This PR adds some additional detection
whereby if the statement is an _assignment_, we avoid treating it as
such. I audited the list of automagics, and I believe this is safe for
all of them.
Closes https://github.com/astral-sh/ruff/issues/8526.
Closes https://github.com/astral-sh/ruff/issues/9648.
## Test Plan
`cargo test`
## Summary
Ensures that any lint rules that include line locations render them as
relative to the cell (and include the cell number) when inside a Jupyter
notebook.
Closes https://github.com/astral-sh/ruff/issues/6672.
## Test Plan
`cargo test`
## Summary
This is a non-behavior-changing refactor to follow-up
https://github.com/astral-sh/ruff/pull/9321 by modifying
`DisplayParseError` to use owned data and make it useable as a
standalone error type (rather than using references and implementing
`Display`). I don't feel very strongly either way. I thought it was
awkward that the `FormatCommandError` had two branches in the display
path, and wanted to represent the `Parse` vs. other cases as a separate
enum, so here we are.
## Summary
I always found it odd that we had to pass this in, since it's really
higher-level context for the error. The awkwardness is further evidenced
by the fact that we pass in fake values everywhere (even outside of
tests). The source path isn't actually used to display the error; it's
only accessed elsewhere to _re-display_ the error in certain cases. This
PR modifies to instead pass the path directly in those cases.
## Summary
If `RUF100` is ignored via `per-file-ignores`, we need to avoid raising
it. `RUF100` has special "self-ignore" logic, since the rule itself
deals with `# noqa` directives. This PR wires up `per-file-ignores` to
that "self-ignore" logic.
Closes https://github.com/astral-sh/ruff/issues/9297.
## Summary
This PR adds some helper structs to the linter paths to enable passing
in the pre-computed tokens and parsed source code during benchmarking,
to remove lexing and parsing from the overall linter benchmark
measurement. We already remove parsing for the formatter, and we have
separate benchmarks for the lexer and the parser, so this should make it
much easier to measure linter performance changes.
## Summary
This PR updates the `E703` rule to avoid flagging any semicolons if
they're present after the last expression in a notebook cell. These are
intended to hide the cell output.
Part of #8669
## Test Plan
Add test notebook and update the snapshots.
## Summary
This PR updates `B015` and `B018` to ignore last top-level expressions
in each cell of a Jupyter Notebook.
Part of #8669
## Test Plan
Add test cases for both rules and update the snapshots.
## Summary
I think it's reasonable to avoid raising `INP001` for scripts, and
shebangs are one sufficient way to detect scripts.
Closes https://github.com/astral-sh/ruff/issues/8690.
## Summary
`display` is a special-cased builtin in IPython. This PR adds it to the
builtin namespace when analyzing IPython notebooks.
Closes https://github.com/astral-sh/ruff/issues/8702.
## Summary
Prior to this change `extend_unsafe_fixes` took precedence over
`extend_safe_fixes` selectors, so any conflicts were resolved in favour
of `extend_unsafe_fixes`. Thanks to that ruff were conservatively
assuming that if configs conlict the fix corresponding to selected rule
will be treated as unsafe.
After this change we take into account Specificity of the selectors. For
conflicts between selectors of the same Specificity we will treat the
corresponding fixes as unsafe. But if the conflicting selectors are of
different specificity the more specific one will win.
## Test Plan
Tests were added for the `FixSafetyTable` struct. The
`check_extend_unsafe_fixes_conflict_with_extend_safe_fixes_by_specificity`
integration test was added to test conflicting rules of different
specificity.
Fixes#8404
---------
Co-authored-by: Zanie <contact@zanie.dev>
## Summary
Throughout the codebase, we have this pattern:
```rust
let mut diagnostic = ...
if checker.patch(Rule::UnusedVariable) {
// Do the fix.
}
diagnostics.push(diagnostic)
```
This was helpful when we computed fixes lazily; however, we now compute
fixes eagerly, and this is _only_ used to ensure that we don't generate
fixes for rules marked as unfixable.
We often forget to add this, and it leads to bugs in enforcing
`--unfixable`.
This PR instead removes all of these checks, moving the responsibility
of enforcing `--unfixable` up to `check_path`. This is similar to how
@zanieb handled the `--extend-unsafe` logic: we post-process the
diagnostics to remove any fixes that should be ignored.
Adds two configuration-file only settings `extend-safe-fixes` and
`extend-unsafe-fixes` which can be used to promote and demote the
applicability of fixes for rules.
Fixes with `Never` applicability cannot be promoted.
Rebase of https://github.com/astral-sh/ruff/pull/5119 authored by
@evanrittenhouse with additional refinements.
## Changes
- Adds `--unsafe-fixes` / `--no-unsafe-fixes` flags to `ruff check`
- Violations with unsafe fixes are not shown as fixable unless opted-in
- Fix applicability is respected now
- `Applicability::Never` fixes are no longer applied
- `Applicability::Sometimes` fixes require opt-in
- `Applicability::Always` fixes are unchanged
- Hints for availability of `--unsafe-fixes` added to `ruff check`
output
## Examples
Check hints at hidden unsafe fixes
```
❯ ruff check example.py --no-cache --select F601,W292
example.py:1:14: F601 Dictionary key literal `'a'` repeated
example.py:2:15: W292 [*] No newline at end of file
Found 2 errors.
[*] 1 fixable with the `--fix` option (1 hidden fix can be enabled with the `--unsafe-fixes` option).
```
We could add an indicator for which violations have hidden fixes in the
future.
Check treats unsafe fixes as applicable with opt-in
```
❯ ruff check example.py --no-cache --select F601,W292 --unsafe-fixes
example.py:1:14: F601 [*] Dictionary key literal `'a'` repeated
example.py:2:15: W292 [*] No newline at end of file
Found 2 errors.
[*] 2 fixable with the --fix option.
```
Also can be enabled in the config file
```
❯ cat ruff.toml
unsafe-fixes = true
```
And opted-out per invocation
```
❯ ruff check example.py --no-cache --select F601,W292 --no-unsafe-fixes
example.py:1:14: F601 Dictionary key literal `'a'` repeated
example.py:2:15: W292 [*] No newline at end of file
Found 2 errors.
[*] 1 fixable with the `--fix` option (1 hidden fix can be enabled with the `--unsafe-fixes` option).
```
Diff does not include unsafe fixes
```
❯ ruff check example.py --no-cache --select F601,W292 --diff
--- example.py
+++ example.py
@@ -1,2 +1,2 @@
x = {'a': 1, 'a': 1}
-print(('foo'))
+print(('foo'))
\ No newline at end of file
Would fix 1 error.
```
Unless there is opt-in
```
❯ ruff check example.py --no-cache --select F601,W292 --diff --unsafe-fixes
--- example.py
+++ example.py
@@ -1,2 +1,2 @@
-x = {'a': 1}
-print(('foo'))
+x = {'a': 1, 'a': 1}
+print(('foo'))
\ No newline at end of file
Would fix 2 errors.
```
https://github.com/astral-sh/ruff/pull/7790 will improve the diff
messages following this pull request
Similarly, `--fix` and `--fix-only` require the `--unsafe-fixes` flag to
apply unsafe fixes.
## Related
Replaces #5119
Closes https://github.com/astral-sh/ruff/issues/4185
Closes https://github.com/astral-sh/ruff/issues/7214
Closes https://github.com/astral-sh/ruff/issues/4845
Closes https://github.com/astral-sh/ruff/issues/3863
Addresses https://github.com/astral-sh/ruff/issues/6835
Addresses https://github.com/astral-sh/ruff/issues/7019
Needs follow-up https://github.com/astral-sh/ruff/issues/6962
Needs follow-up https://github.com/astral-sh/ruff/issues/4845
Needs follow-up https://github.com/astral-sh/ruff/issues/7436
Needs follow-up https://github.com/astral-sh/ruff/issues/7025
Needs follow-up https://github.com/astral-sh/ruff/issues/6434
Follow-up #7790
Follow-up https://github.com/astral-sh/ruff/pull/7792
---------
Co-authored-by: Evan Rittenhouse <evanrittenhouse@gmail.com>
## Summary
The parser now uses the raw source code as global context and slices
into it to parse debug text. It turns out we were always passing in the
_old_ source code, so when code was fixed, we were making invalid
accesses. This PR modifies the call to use the _fixed_ source code,
which will always be consistent with the tokens.
Closes https://github.com/astral-sh/ruff/issues/7711.
## Test Plan
`cargo test`
## Stack Summary
This stack splits `Settings` into `FormatterSettings` and `LinterSettings` and moves it into `ruff_workspace`. This change is necessary to add the `FormatterSettings` to `Settings` without adding `ruff_python_formatter` as a dependency to `ruff_linter` (and the linter should not contain the formatter settings).
A quick overview of our settings struct at play:
* `Options`: 1:1 representation of the options in the `pyproject.toml` or `ruff.toml`. Used for deserialization.
* `Configuration`: Resolved `Options`, potentially merged from multiple configurations (when using `extend`). The representation is very close if not identical to the `Options`.
* `Settings`: The resolved configuration that uses a data format optimized for reading. Optional fields are initialized with their default values. Initialized by `Configuration::into_settings` .
The goal of this stack is to split `Settings` into tool-specific resolved `Settings` that are independent of each other. This comes at the advantage that the individual crates don't need to know anything about the other tools. The downside is that information gets duplicated between `Settings`. Right now the duplication is minimal (`line-length`, `tab-width`) but we may need to come up with a solution if more expensive data needs sharing.
This stack focuses on `Settings`. Splitting `Configuration` into some smaller structs is something I'll follow up on later.
## PR Summary
This PR extracts the linter-specific settings into a new `LinterSettings` struct and adds it as a `linter` field to the `Settings` struct. This is in preparation for moving `Settings` from `ruff_linter` to `ruff_workspace`
## Test Plan
`cargo test`