mirror of https://github.com/astral-sh/ruff
4 Commits
## bf5b62edac: Maintain synchronicity between the lexer and the parser (#11457)
### Summary

This PR updates the entire parser stack in multiple ways:

#### Make the lexer lazy

* https://github.com/astral-sh/ruff/pull/11244
* https://github.com/astral-sh/ruff/pull/11473

Previously, Ruff's lexer acted as an iterator. The parser would collect all the tokens into a vector first and then process them to build the syntax tree. The first task in this project is to update the entire parsing flow to make the lexer lazy. This includes the `Lexer`, `TokenSource`, and `Parser`. For context, the `TokenSource` is a wrapper around the `Lexer` that filters out the trivia tokens[^1].

Now, the parser asks the token source for the next token, and only then does the lexer continue and emit it. This means the lexer needs to be aware of the "current" token. When `next_token` is called, the current token is updated with the newly lexed token.

The main motivation for making the lexer lazy is to allow re-lexing a token in a different context, which is going to be really useful for making the parser error resilient. For example, currently the emitted tokens remain the same even if the parser can recover from an unclosed parenthesis. This is important because the lexer emits a `NonLogicalNewline` in a parenthesized context but a normal `Newline` in a non-parenthesized context. These different kinds of newlines are also used to emit the indentation tokens, which the parser relies on to determine the start and end of a block.

Additionally, this allows us to implement the following functionalities:

1. Checkpoint-rewind infrastructure: the idea here is to create a checkpoint and continue lexing. At a later point, this checkpoint can be used to rewind the lexer back to the provided checkpoint.
2. Remove the `SoftKeywordTransformer` and instead use lookahead or speculative parsing to determine whether a soft keyword is a keyword or an identifier.
3. Remove the `Tok` enum. The `Tok` enum represents the tokens emitted by the lexer, but it contains owned data, which makes it expensive to clone. The new `TokenKind` enum represents only the type of token, which is very cheap. This raises the question of how the parser will get the owned value that was stored on `Tok`. This is solved by introducing a new `TokenValue` enum that contains only the subset of token kinds that carry an owned value. It is stored on the lexer and requested by the parser when it wants to process the data. For example:
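The example that originally followed this summary is truncated in this mirror. Still, the pull-based "current token" flow, the cheap `TokenKind` versus on-demand owned value split, and the checkpoint-rewind idea can be sketched in Python (every name below is illustrative, not Ruff's actual Rust API):

```python
from dataclasses import dataclass
from enum import Enum, auto

class TokenKind(Enum):
    # Cheap, copyable token kinds -- no owned data attached.
    NAME = auto()
    NUMBER = auto()
    EOF = auto()

@dataclass
class Checkpoint:
    offset: int

class LazyLexer:
    """Emits one token at a time; the parser pulls via next_token()."""

    def __init__(self, source: str):
        self.source = source
        self.offset = 0
        self.current_kind = TokenKind.EOF
        self.current_value = None  # owned data, fetched only on demand

    def checkpoint(self) -> Checkpoint:
        # Capture just enough state to resume lexing from this point.
        return Checkpoint(self.offset)

    def rewind(self, cp: Checkpoint) -> None:
        # Rewind the lexer back to a previously created checkpoint.
        self.offset = cp.offset

    def next_token(self) -> TokenKind:
        # Skip trivia (here: spaces), then lex one whitespace-delimited token.
        while self.offset < len(self.source) and self.source[self.offset] == " ":
            self.offset += 1
        if self.offset >= len(self.source):
            self.current_kind, self.current_value = TokenKind.EOF, None
            return self.current_kind
        start = self.offset
        while self.offset < len(self.source) and self.source[self.offset] != " ":
            self.offset += 1
        text = self.source[start:self.offset]
        kind = TokenKind.NUMBER if text.isdigit() else TokenKind.NAME
        self.current_kind, self.current_value = kind, text
        return self.current_kind

    def take_value(self):
        """The parser calls this only when it needs the owned data."""
        return self.current_value
```

Because the lexer only ever holds one "current" token, speculative parsing falls out naturally: take a checkpoint, lex ahead, and rewind if the speculation fails.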
## f7740a8a20: Allow SPDX license headers to exceed the line length (#10481)
Closes https://github.com/astral-sh/ruff/issues/10465.
## 84979f9673: Rename `tab-size` to `indent-width` (#8082)
### Summary

This PR renames the `tab-size` configuration option to `indent-width` to express that the formatter uses the option both to determine the indentation width and as the tab width. I first preferred naming the option `tab-width` but then decided to go with `indent-width` because:

* It aligns with the `indent-style` option.
* It would allow us to write a lint rule that asserts that each indentation uses `indent-width` spaces.

Closes #7643

### Test Plan

Added integration test.
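A sketch of how the renamed option might appear in a `pyproject.toml` (the values here are illustrative):

```toml
[tool.ruff]
line-length = 88
# Formerly `tab-size`: now used both as the indentation width and as the tab width.
indent-width = 4
```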
## 1646939383: Ignore overlong pragma comments when enforcing linter line length (#7692)
### Summary

This PR modifies the `line-too-long` and `doc-line-too-long` rules to ignore lines that are too long due to the presence of a pragma comment (e.g., `# type: ignore` or `# noqa`). That is, if a line only exceeds the limit due to the pragma comment, it will no longer be flagged as "too long". This behavior mirrors that of the formatter, thus ensuring that we don't flag lines under E501 that the formatter would otherwise avoid wrapping.

As a concrete example, given a line length of 88, the following would _no longer_ be considered an E501 violation:

```python
# The string literal is 88 characters, including quotes.
"shape:shape:shape:shape:shape:shape:shape:shape:shape:shape:shape:shape:shape:shape:sh" # type: ignore
```

This, however, would:

```python
# The string literal is 89 characters, including quotes.
"shape:shape:shape:shape:shape:shape:shape:shape:shape:shape:shape:shape:shape:shape:sha" # type: ignore
```

In addition to mirroring the formatter, this also means that adding a pragma comment (like `# noqa`) won't _cause_ additional violations to appear (namely, E501). It's very common for users to add a `# type: ignore` or similar to a line, only to find that they then have to add a suppression comment _after_ it that wasn't required before, as in `# type: ignore # noqa: E501`.

Closes https://github.com/astral-sh/ruff/issues/7471.

### Test Plan

`cargo test`
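The check described above can be approximated with a short sketch: measure the line, and if it is overlong, re-measure with any trailing pragma comment stripped. The regex and function here are illustrative, not ruff's actual implementation:

```python
import re

# Hypothetical pragma pattern; ruff's real detection is more involved.
PRAGMA = re.compile(r"#\s*(type:|noqa)")

def exceeds_limit(line: str, limit: int = 88) -> bool:
    """Flag a line only if it is still too long once a trailing
    pragma comment is ignored."""
    if len(line) <= limit:
        return False
    match = PRAGMA.search(line)
    if match:
        # Measure only the code that precedes the pragma comment.
        return len(line[:match.start()].rstrip()) > limit
    return True
```

With a limit of 88, an 88-character statement followed by `# type: ignore` passes, while an 89-character statement with the same comment is still flagged, matching the two examples above.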