mirror of https://github.com/astral-sh/ruff
5 Commits
| Author | SHA1 | Message | Date |
|---|---|---|---|
|
|
bf5b62edac
|
Maintain synchronicity between the lexer and the parser (#11457)
## Summary This PR updates the entire parser stack in multiple ways: ### Make the lexer lazy * https://github.com/astral-sh/ruff/pull/11244 * https://github.com/astral-sh/ruff/pull/11473 Previously, Ruff's lexer would act as an iterator. The parser would collect all the tokens in a vector first and then process the tokens to create the syntax tree. The first task in this project is to update the entire parsing flow to make the lexer lazy. This includes the `Lexer`, `TokenSource`, and `Parser`. For context, the `TokenSource` is a wrapper around the `Lexer` to filter out the trivia tokens[^1]. Now, the parser will ask the token source to get the next token and only then the lexer will continue and emit the token. This means that the lexer needs to be aware of the "current" token. When the `next_token` is called, the current token will be updated with the newly lexed token. The main motivation to make the lexer lazy is to allow re-lexing a token in a different context. This is going to be really useful to make the parser error resilience. For example, currently the emitted tokens remains the same even if the parser can recover from an unclosed parenthesis. This is important because the lexer emits a `NonLogicalNewline` in parenthesized context while a normal `Newline` in non-parenthesized context. This different kinds of newline is also used to emit the indentation tokens which is important for the parser as it's used to determine the start and end of a block. Additionally, this allows us to implement the following functionalities: 1. Checkpoint - rewind infrastructure: The idea here is to create a checkpoint and continue lexing. At a later point, this checkpoint can be used to rewind the lexer back to the provided checkpoint. 2. Remove the `SoftKeywordTransformer` and instead use lookahead or speculative parsing to determine whether a soft keyword is a keyword or an identifier 3. Remove the `Tok` enum. The `Tok` enum represents the tokens emitted by the lexer but it contains owned data which makes it expensive to clone. The new `TokenKind` enum just represents the type of token which is very cheap. This brings up a question as to how will the parser get the owned value which was stored on `Tok`. This will be solved by introducing a new `TokenValue` enum which only contains a subset of token kinds which has the owned value. This is stored on the lexer and is requested by the parser when it wants to process the data. For example: |
|
|
|
c504d7ab11
|
Track quoting style in the tokenizer (#10256) | |
|
|
c3ca34543f
|
Skip LibCST parsing for standard dedent adjustments (#9769)
## Summary Often, when fixing, we need to dedent a block of code (e.g., if we remove an `if` and dedent its body). Today, we use LibCST to parse and adjust the indentation, which is really expensive -- but this is only really necessary if the block contains a multiline string, since naively adjusting the indentation for such a string can change the whitespace _within_ the string. This PR uses a simple dedent implementation for cases in which the block doesn't intersect with a multi-line string (or an f-string, since we don't support tracking multi-line strings for f-strings right now). We could improve this even further by using the ranges to guide the dedent function, such that we don't apply the dedent if the line starts within a multiline string. But that would also need to take f-strings into account, which is a little tricky. ## Test Plan `cargo test` |
|
|
|
097e703071
|
Consider unterminated f-strings in `FStringRanges` (#8154)
## Summary
This PR removes the `debug_assertion` in the `Indexer` to allow
unterminated f-strings. This is mainly a fix in the development build
which now matches the release build.
The fix is simple: remove the `debug_assertion` which means that the
there could be `FStringStart` and possibly `FStringMiddle` tokens
without a corresponding f-string range in the `Indexer`. This means that
the code requesting for the f-string index need to account for the
`None` case, making the code safer.
This also updates the code which queries the `FStringRanges` to account
for the `None` case. This will happen when the `FStringStart` /
`FStringMiddle` tokens are present but the `FStringEnd` token isn't
which means that the `Indexer` won't contain the range for that
f-string.
## Test Plan
`cargo test`
Taking the following code as an example:
```python
f"{123}
```
This only emits a `FStringStart` token, but no `FStringMiddle` or
`FStringEnd` tokens.
And,
```python
f"\.png${
```
This emits a `FStringStart` and `FStringMiddle` token, but no
`FStringEnd` token.
fixes: #8065
|
|
|
|
e62e245c61
|
Add support for PEP 701 (#7376)
## Summary This PR adds support for PEP 701 in Ruff. This is a rollup PR of all the other individual PRs. The separate PRs were created for logic separation and code reviews. Refer to each pull request for a detail description on the change. Refer to the PR description for the list of pull requests within this PR. ## Test Plan ### Formatter ecosystem checks Explanation for the change in ecosystem check: https://github.com/astral-sh/ruff/pull/7597#issue-1908878183 #### `main` ``` | project | similarity index | total files | changed files | |--------------|------------------:|------------------:|------------------:| | cpython | 0.76083 | 1789 | 1631 | | django | 0.99983 | 2760 | 36 | | transformers | 0.99963 | 2587 | 319 | | twine | 1.00000 | 33 | 0 | | typeshed | 0.99983 | 3496 | 18 | | warehouse | 0.99967 | 648 | 15 | | zulip | 0.99972 | 1437 | 21 | ``` #### `dhruv/pep-701` ``` | project | similarity index | total files | changed files | |--------------|------------------:|------------------:|------------------:| | cpython | 0.76051 | 1789 | 1632 | | django | 0.99983 | 2760 | 36 | | transformers | 0.99963 | 2587 | 319 | | twine | 1.00000 | 33 | 0 | | typeshed | 0.99983 | 3496 | 18 | | warehouse | 0.99967 | 648 | 15 | | zulip | 0.99972 | 1437 | 21 | ``` |