ruff/crates
Charlie Marsh 6fffde72e7
Use `memchr` for string lexing (#9888)
## Summary

On `main`, string lexing consists of walking through the string
character-by-character to search for the closing quote (with some
nuance: we also need to skip escaped characters, and error if we see
newlines in non-triple-quoted strings). This PR rewrites `lex_string` to
instead use `memchr` to search for the closing quote, which is
significantly faster. On my machine, at least, the `globals.py`
benchmark (which contains a lot of docstrings) runs in about 40% less
time (roughly 67% higher throughput):

```text
lexer/numpy/globals.py  time:   [3.6410 µs 3.6496 µs 3.6585 µs]
                        thrpt:  [806.53 MiB/s 808.49 MiB/s 810.41 MiB/s]
                 change:
                        time:   [-40.413% -40.185% -39.984%] (p = 0.00 < 0.05)
                        thrpt:  [+66.623% +67.181% +67.822%]
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
lexer/unicode/pypinyin.py
                        time:   [12.422 µs 12.445 µs 12.467 µs]
                        thrpt:  [337.03 MiB/s 337.65 MiB/s 338.27 MiB/s]
                 change:
                        time:   [-9.4213% -9.1930% -8.9586%] (p = 0.00 < 0.05)
                        thrpt:  [+9.8401% +10.124% +10.401%]
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe
lexer/pydantic/types.py time:   [107.45 µs 107.50 µs 107.56 µs]
                        thrpt:  [237.11 MiB/s 237.24 MiB/s 237.35 MiB/s]
                 change:
                        time:   [-4.0108% -3.7005% -3.3787%] (p = 0.00 < 0.05)
                        thrpt:  [+3.4968% +3.8427% +4.1784%]
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) high mild
  5 (5.00%) high severe
lexer/numpy/ctypeslib.py
                        time:   [46.123 µs 46.165 µs 46.208 µs]
                        thrpt:  [360.36 MiB/s 360.69 MiB/s 361.01 MiB/s]
                 change:
                        time:   [-19.313% -18.996% -18.710%] (p = 0.00 < 0.05)
                        thrpt:  [+23.016% +23.451% +23.935%]
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  3 (3.00%) low mild
  1 (1.00%) high mild
  4 (4.00%) high severe
lexer/large/dataset.py  time:   [231.07 µs 231.19 µs 231.33 µs]
                        thrpt:  [175.87 MiB/s 175.97 MiB/s 176.06 MiB/s]
                 change:
                        time:   [-2.0437% -1.7663% -1.4922%] (p = 0.00 < 0.05)
                        thrpt:  [+1.5148% +1.7981% +2.0864%]
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  5 (5.00%) high mild
  5 (5.00%) high severe
```
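The scan described in the summary can be sketched roughly as follows. This is a minimal, std-only illustration and not the actual `lex_string` code from the PR: `scan_single_quoted`, `find_any`, and `StringError` are hypothetical names, and `find_any` is a plain-iterator stand-in for `memchr::memchr3` so the example compiles without external crates. It only covers non-triple-quoted strings and treats `\n` as the only line break, which is a simplification.

```rust
/// Why the scan stopped without finding a closing quote (hypothetical type).
#[derive(Debug, PartialEq)]
enum StringError {
    /// Reached the end of input before the closing quote.
    Unterminated,
    /// Found an unescaped line break inside a non-triple-quoted string.
    UnexpectedNewline,
}

/// Stand-in for `memchr::memchr3`: offset of the first byte equal to any needle.
/// The real crate does this search with SIMD; this version is for illustration only.
fn find_any(needles: [u8; 3], haystack: &[u8]) -> Option<usize> {
    haystack.iter().position(|b| needles.contains(b))
}

/// Scans the body of a non-triple-quoted string. `src` starts right after the
/// opening quote; `quote` is the ASCII quote byte (`b'"'` or `b'\''`).
/// Returns the byte offset just past the closing quote.
fn scan_single_quoted(src: &str, quote: u8) -> Result<usize, StringError> {
    let bytes = src.as_bytes();
    let mut pos = 0;
    loop {
        // Jump straight to the next "interesting" byte instead of inspecting
        // every character: the quote, a backslash, or a newline.
        let Some(offset) = find_any([quote, b'\\', b'\n'], &bytes[pos..]) else {
            return Err(StringError::Unterminated);
        };
        pos += offset;
        match bytes[pos] {
            b'\\' => {
                // Skip the backslash and the (possibly multi-byte) escaped character.
                pos += 1;
                match src[pos..].chars().next() {
                    Some(c) => pos += c.len_utf8(),
                    None => return Err(StringError::Unterminated),
                }
            }
            b'\n' => return Err(StringError::UnexpectedNewline),
            // The only remaining possibility is the closing quote.
            _ => return Ok(pos + 1),
        }
    }
}
```

Searching for ASCII bytes is safe on UTF-8 input because those byte values never occur inside a multi-byte sequence. Long docstrings benefit most because the search skips over large runs of ordinary text in one call, which is consistent with `globals.py` showing the largest change above.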
2024-02-08 17:23:06 +00:00
| Name | Last commit | Date |
| --- | --- | --- |
| ruff | Make show-settings filters directory-agnostic (#9866) | 2024-02-07 03:20:27 +00:00 |
| ruff_benchmark | Approximate tokens len (#9546) | 2024-01-19 17:39:37 +01:00 |
| ruff_cache | Make all dependencies workspace dependencies (#9333) | 2024-01-02 13:41:59 +00:00 |
| ruff_dev | Add rule removal infrastructure (#9691) | 2024-02-01 13:35:02 -06:00 |
| ruff_diagnostics | Enable annotation quoting for multi-line expressions (#9142) | 2023-12-15 01:03:09 +00:00 |
| ruff_formatter | Range formatting: Fix invalid syntax after parenthesizing expression (#9751) | 2024-02-02 17:56:25 +01:00 |
| ruff_index | Make all dependencies workspace dependencies (#9333) | 2024-01-02 13:41:59 +00:00 |
| ruff_linter | RUF027 no longer has false negatives with string literals inside of method calls (#9865) | 2024-02-08 10:00:20 -05:00 |
| ruff_macros | Add rule removal infrastructure (#9691) | 2024-02-01 13:35:02 -06:00 |
| ruff_notebook | Detect automagic-like assignments in notebooks (#9653) | 2024-01-29 12:55:44 +00:00 |
| ruff_python_ast | Respect `async with` in `timeout-without-await` (#9859) | 2024-02-06 12:04:24 -05:00 |
| ruff_python_codegen | Remove source path from parser errors (#9322) | 2023-12-30 20:33:05 +00:00 |
| ruff_python_formatter | Implement `AnyNode`/`AnyNodeRef` for `FStringFormatSpec` (#9836) | 2024-02-05 19:23:43 +00:00 |
| ruff_python_index | Index multiline f-strings (#9837) | 2024-02-05 21:25:33 -05:00 |
| ruff_python_literal | Use Rust 1.75 toolchain (#9437) | 2024-01-08 18:03:16 +01:00 |
| ruff_python_parser | Use `memchr` for string lexing (#9888) | 2024-02-08 17:23:06 +00:00 |
| ruff_python_resolver | Use Rust 1.75 toolchain (#9437) | 2024-01-08 18:03:16 +01:00 |
| ruff_python_semantic | Short-circuit typing matches based on imports (#9800) | 2024-02-04 14:06:44 -05:00 |
| ruff_python_stdlib | Slight speed-up for lowercase and uppercase identifier checks (#9798) | 2024-02-03 14:40:41 +00:00 |
| ruff_python_trivia | Add fast-path for comment detection (#9808) | 2024-02-05 11:00:18 -05:00 |
| ruff_shrinking | Bump version to v0.2.1 (#9843) | 2024-02-05 15:31:05 -05:00 |
| ruff_source_file | Fix blank-line docstring rules for module-level docstrings (#9878) | 2024-02-07 16:48:28 -05:00 |
| ruff_text_size | Range formatting: Fix invalid syntax after parenthesizing expression (#9751) | 2024-02-02 17:56:25 +01:00 |
| ruff_wasm | Deduplicate deprecation warnings for v0.2.0 release (#9764) | 2024-02-01 17:10:24 -06:00 |
| ruff_workspace | Fix typo in option name: `output_format` -> `output-format` (#9874) | 2024-02-07 16:17:58 +00:00 |