This commit removes rule redirects such as ("U" -> "UP") from the
RuleCodePrefix enum because they complicated the generation of that enum
(which we want to change to be prefix-agnostic in the future).
To preserve backwards compatibility redirects are now resolved
before the strum-generated RuleCodePrefix::from_str is invoked.
This change also brings two other advantages:
* Redirects are now only defined once
(previously they had to be defined twice:
once in ruff_macros/src/rule_code_prefix.rs
and a second time in src/registry.rs).
* The deprecated redirects will no longer be suggested in IDE
autocompletion within pyproject.toml since they are now no
longer part of the ruff.schema.json.
Yet another refactor to let us implement the many-to-many mapping
between codes and rules in a prefix-agnostic way.
We want to break up the RuleCodePrefix[1] enum into smaller enums.
To facilitate that this commit introduces a new wrapping type around
RuleCodePrefix so that we can start breaking it apart.
[1]: Actually `RuleCodePrefix` is the previous name of the autogenerated
enum ... I renamed it in b19258a243 to
RuleSelector since `ALL` isn't a prefix. This commit now renames it back
but only because the new `RuleSelector` wrapper type, introduced in this
commit, will let us move the `ALL` variant from `RuleCodePrefix` to
`RuleSelector` in the next commit.
At present, `ISC001` and `ISC002` flag concatenations like the following:
```py
"a" "b" # ISC001
"a" \
"b" # ISC002
```
However, multiline concatenations are allowed.
This PR adds a setting:
```toml
[tool.ruff.flake8-implicit-str-concat]
allow-multiline = false
```
Which extends `ISC002` to _also_ flag multiline concatenations, like:
```py
(
"a" # ISC002
"b"
)
```
Note that this is backwards compatible, as `allow-multiline` defaults to `true`.
This _did_ fix https://github.com/charliermarsh/ruff/issues/1894, but was a little premature. `toml` doesn't actually depend on `toml-edit` yet, and `v0.5.11` was mostly about deprecations AFAICT. So upgrading might solve that issue, but could introduce other incompatibilities, and I'd like to minimize churn. I expect that `toml` will have a new release soon, so we can revert this revert.
Reverts charliermarsh/ruff#2040.
More accurate since the enum also encompasses:
* ALL (which isn't a prefix at all)
* fully-qualified rule codes (which aren't prefixes unless you say
they're a prefix to the empty string but that's not intuitive)
Fixes: #1953
@charliermarsh thank you for the tips in the issue.
I'm not very familiar with Rust, so please excuse if my string formatting syntax is messy.
In terms of testing, I compared output of `flake8 --format=pylint ` and `cargo run --format=pylint` on the same code and the output syntax seems to check out.
# This commit was automatically generated by running the following
# script (followed by `cargo +nightly fmt`):
import glob
import re
from typing import NamedTuple
class Rule(NamedTuple):
code: str
name: str
path: str
def rules() -> list[Rule]:
"""Returns all the rules defined in `src/registry.rs`."""
file = open('src/registry.rs')
rules = []
while next(file) != 'ruff_macros::define_rule_mapping!(\n':
continue
while (line := next(file)) != ');\n':
line = line.strip().rstrip(',')
if line.startswith('//'):
continue
code, path = line.split(' => ')
name = path.rsplit('::')[-1]
rules.append(Rule(code, name, path))
return rules
code2name = {r.code: r.name for r in rules()}
for pattern in ('src/**/*.rs', 'ruff_cli/**/*.rs', 'ruff_dev/**/*.rs', 'scripts/add_*.py'):
for name in glob.glob(pattern, recursive=True):
with open(name) as f:
text = f.read()
text = re.sub('Rule(?:Code)?::([A-Z]\w+)', lambda m: 'Rule::' + code2name[m.group(1)], text)
text = re.sub(r'(?<!"<FilePattern>:<)RuleCode\b', 'Rule', text)
text = re.sub('(use crate::registry::{.*, Rule), Rule(.*)', r'\1\2', text) # fix duplicate import
with open(name, 'w') as f:
f.write(text)
The Settings struct previously contained the fields:
pub enabled: HashableHashSet<RuleCode>,
pub fixable: HashableHashSet<RuleCode>,
This commit merges both fields into one by introducing a new
RuleTable type, wrapping HashableHashMap<RuleCode, bool>,
which has the following benefits:
1. It makes the invalid state that a rule is
disabled but fixable unrepresentable.
2. It encapsulates the implementation details of the table.
(It currently uses an FxHashMap but that may change.)
3. It results in more readable code.
settings.rules.enabled(rule)
settings.rules.should_fix(rule)
is more readable than:
settings.enabled.contains(rule)
settings.fixable.contains(rule)
The caching mechanism of the CLI (ruff_cli::cache) relies on
ruff::settings::Settings implementing the Hash trait.
The ruff::settings::Settings struct previously couldn't automatically
derive the Hash implementation via the #[derive(Hash)] macro attribute
since some of its field types intentionally[1][2] don't implement Hash
(namely regex::Regex, globset::GlobMatcher and globset::GlobSet and
HashMap and HashSet from the standard library).
The code therefore previously implemented the Hash trait by hand for the
whole struct. Implementing Hash by hand for structs that are subject to
change is a bad idea since it's very easy to forget to update the Hash
implementation when adding a new field to the struct. And the Hash
implementation indeed was already incorrect by omitting several fields
from the hash.
This commit introduces wrapper types for Regex, GlobMatcher, GlobSet,
HashSet & HashMap that implement Hash so that we can still add
#[derive(Hash)] to the Settings struct, guaranteeing a correct hash
implementation.
[1]: https://github.com/rust-lang/regex/issues/364#issuecomment-301082076
[2]: The standard library doesn't impl<T: Hash + Ord> Hash for HashSet<T>
presumably since sorted() requires an allocation and Hash
implementations are generally expected to work without allocations.
We want to automatically derive Hash for the library settings, which
requires us to split off all the settings unused by the library
(since these shouldn't affect the hash used by ruff_cli::cache).
This lets you test the ruff linters or use the ruff library
without having to compile the ~100 additional dependencies
that are needed by the CLI.
Because we set the following in the [workspace] section of Cargo.toml:
default-members = [".", "ruff_cli"]
`cargo run` still runs the CLI and `cargo test` still tests
the code in src/ as well as the code in the new ruff_cli crate.
(But you can now also run `cargo test -p ruff` to only test the linters.)
This PR implements `W505` (`DocLineTooLong`), which is similar to `E501`
(`LineTooLong`) but confined to doc lines.
I based the "doc line" definition on pycodestyle, which defines a doc
line as a standalone comment or string statement. Our definition is a
bit more liberal, since we consider any string statement a doc line
(even if it's part of a multi-line statement) -- but that seems fine to
me.
Note that, unusually, this rule requires custom extraction from both the
token stream (to find standalone comments) and the AST (to find string
statements).
Closes#1784.