## Summary
The problem: given a (row, column) number (e.g., for a token in the AST), we need to be able to map it to a precise byte index in the source code. A while ago, we moved to `ropey` for this, since it was faster in practice (mostly, I think, because it's able to defer indexing). However, at some threshold of accesses, it becomes faster to index the string in advance, as we're doing here.
## Benchmark
It looks like this is ~3.6% slower for the default rule set, but ~9.3% faster for `--select ALL`.
**I suspect there's a strategy that would be strictly faster in both cases**, based on deferring even more computation (right now, we lazily compute these offsets, but we do it for the entire file at once, even if we only need some slice at the top), or caching the `ropey` lookups in some way.
Before:

After:

## Alternatives
I tried tweaking the `Vec::with_capacity` hints, and even trying `Vec::with_capacity(str_indices::lines_crlf::count_breaks(contents))` to do a quick scan of the number of lines, but that turned out to be slower.
The idea is the same as #1867. Avoids emitting `SIM102` twice for the following code:
```python
if a:
if b:
if c:
d
```
```
resources/test/fixtures/flake8_simplify/SIM102.py:1:1: SIM102 Use a single `if` statement instead of nested `if` statements
resources/test/fixtures/flake8_simplify/SIM102.py:2:5: SIM102 Use a single `if` statement instead of nested `if` statements
```
For now, we're just gonna avoid flagging this for `elif` blocks, following the same reasoning as for ternaries. We can handle all of these cases, but we'll knock out the TODOs as a pair, and this avoids broken code.
Closes#2007.
# This commit was automatically generated by running the following
# script (followed by `cargo +nightly fmt`):
import glob
import re
from typing import NamedTuple
class Rule(NamedTuple):
code: str
name: str
path: str
def rules() -> list[Rule]:
"""Returns all the rules defined in `src/registry.rs`."""
file = open('src/registry.rs')
rules = []
while next(file) != 'ruff_macros::define_rule_mapping!(\n':
continue
while (line := next(file)) != ');\n':
line = line.strip().rstrip(',')
if line.startswith('//'):
continue
code, path = line.split(' => ')
name = path.rsplit('::')[-1]
rules.append(Rule(code, name, path))
return rules
code2name = {r.code: r.name for r in rules()}
for pattern in ('src/**/*.rs', 'ruff_cli/**/*.rs', 'ruff_dev/**/*.rs', 'scripts/add_*.py'):
for name in glob.glob(pattern, recursive=True):
with open(name) as f:
text = f.read()
text = re.sub('Rule(?:Code)?::([A-Z]\w+)', lambda m: 'Rule::' + code2name[m.group(1)], text)
text = re.sub(r'(?<!"<FilePattern>:<)RuleCode\b', 'Rule', text)
text = re.sub('(use crate::registry::{.*, Rule), Rule(.*)', r'\1\2', text) # fix duplicate import
with open(name, 'w') as f:
f.write(text)
This is slightly buggy due to Instagram/LibCST#855; it will complain `[ERROR] Failed to fix nested with: Failed to extract CST from source` when trying to fix nested parenthesized `with` statements lacking trailing commas. But presumably people who write parenthesized `with` statements already knew that they don’t need to nest them.
Signed-off-by: Anders Kaseorg <andersk@mit.edu>
If a `try` block has multiple statements, a compound statement, or
control flow, rewriting it with `contextlib.suppress` would obfuscate
the fact that the exception still short-circuits further statements in
the block.
Fixes#1947.
Signed-off-by: Anders Kaseorg <andersk@mit.edu>
Since our binding tracking is somewhat limited, I opted to favor false negatives over false positives. So, e.g., this won't trigger SIM115:
```py
with contextlib.ExitStack():
f = exit_stack.enter_context(open("filename"))
```
(Notice that `exit_stack` is unbound.)
The alternative strategy required us to incorrectly trigger SIM115 on this:
```py
with contextlib.ExitStack() as exit_stack:
exit_stack_ = exit_stack
f = exit_stack_.enter_context(open("filename"))
```
Closes#1945.
The Settings struct previously contained the fields:
pub enabled: HashableHashSet<RuleCode>,
pub fixable: HashableHashSet<RuleCode>,
This commit merges both fields into one by introducing a new
RuleTable type, wrapping HashableHashMap<RuleCode, bool>,
which has the following benefits:
1. It makes the invalid state that a rule is
disabled but fixable unrepresentable.
2. It encapsulates the implementation details of the table.
(It currently uses an FxHashMap but that may change.)
3. It results in more readable code.
settings.rules.enabled(rule)
settings.rules.should_fix(rule)
is more readable than:
settings.enabled.contains(rule)
settings.fixable.contains(rule)
The primary motivation is that we can now robustly detect `\` continuations due to the addition of `Tok::NonLogicalNewline`. This PR generalizes the approach we took to comments (track all lines that contain any comments), and applies it to continuations too.
Before
```
resources/test/fixtures/flake8_simplify/SIM208.py:1:13: SIM208 Use `a` instead of `not (not a)`
|
1 | if not (not a): # SIM208
| ^ SIM208
|
= help: Replace with `a`
resources/test/fixtures/flake8_simplify/SIM208.py:4:14: SIM208 Use `a == b` instead of `not (not a == b)`
|
4 | if not (not (a == b)): # SIM208
| ^^^^^^ SIM208
|
= help: Replace with `a == b`
```
After
```
resources/test/fixtures/flake8_simplify/SIM208.py:1:4: SIM208 Use `a` instead of `not (not a)`
|
1 | if not (not a): # SIM208
| ^^^^^^^^^^^ SIM208
|
= help: Replace with `a`
resources/test/fixtures/flake8_simplify/SIM208.py:4:4: SIM208 Use `a == b` instead of `not (not a == b)`
|
4 | if not (not (a == b)): # SIM208
| ^^^^^^^^^^^^^^^^^^ SIM208
|
= help: Replace with `a == b`
```