* Use phf for confusables to reduce llvm lines
## Summary
This replaces FxHashMap for the confusables with a perfect hash map from the [phf crate](https://github.com/rust-phf/rust-phf) to reduce the generated llvm instructions.
A perfect hash function is one that doesn't have any collisions. We can build one because we know all keys at compile time. This improves hashmap efficiency, even though this is likely not noticeable in our case (except someone has a large non-english crate to test on).
The original hashmap contained a lot of duplicates, which i had to remove when phf_map complained, i did so by sorting the keys.
The important part that it reduces the llvm instructions generated (#3808, `RUSTFLAGS="-Csymbol-mangling-version=v0" cargo llvm-lines -p ruff --lib | head -20`):
```
Lines Copies Function name
----- ------ -------------
1740502 38973 (TOTAL)
27423 (1.6%, 1.6%) 1 (0.0%, 0.0%) ruff[cef4c65d96248843]::rules::ruff::rules::confusables::CONFUSABLES::{closure#0}
10193 (0.6%, 2.2%) 1 (0.0%, 0.0%) <ruff[cef4c65d96248843]::codes::RuleCodePrefix>::iter
8107 (0.5%, 2.6%) 1 (0.0%, 0.0%) <ruff[cef4c65d96248843]::codes::Rule>::noqa_code
7345 (0.4%, 3.0%) 1 (0.0%, 0.0%) <ruff[cef4c65d96248843]::checkers::ast::Checker as ruff_python_ast[3778b140caf21545]::visitor::Visitor>::visit_stmt
6412 (0.4%, 3.4%) 1 (0.0%, 0.0%) <<ruff[cef4c65d96248843]::settings::options::Options as serde[d89b1b632568f5a3]::de::Deserialize>::deserialize::__Visitor as serde[d89b1b632568f5a3]::de::Visitor>::visit_map::<toml_edit[7e3a6c5e67260672]::de::spanned::SpannedDeserializer<toml_edit[7e3a6c5e67260672]::de::value::ValueDeserializer>>
6412 (0.4%, 3.8%) 1 (0.0%, 0.0%) <<ruff[cef4c65d96248843]::settings::options::Options as serde[d89b1b632568f5a3]::de::Deserialize>::deserialize::__Visitor as serde[d89b1b632568f5a3]::de::Visitor>::visit_map::<toml_edit[7e3a6c5e67260672]::de::table::TableMapAccess>
6409 (0.4%, 4.2%) 1 (0.0%, 0.0%) <<ruff[cef4c65d96248843]::settings::options::Options as serde[d89b1b632568f5a3]::de::Deserialize>::deserialize::__Visitor as serde[d89b1b632568f5a3]::de::Visitor>::visit_map::<toml_edit[7e3a6c5e67260672]::de::datetime::DatetimeDeserializer>
5696 (0.3%, 4.5%) 1 (0.0%, 0.0%) <ruff[cef4c65d96248843]::checkers::ast::Checker as ruff_python_ast[3778b140caf21545]::visitor::Visitor>::visit_expr
4448 (0.3%, 4.7%) 1 (0.0%, 0.0%) ruff[cef4c65d96248843]::flake8_to_ruff::converter::convert
3702 (0.2%, 4.9%) 1 (0.0%, 0.0%) <&ruff[cef4c65d96248843]::registry::Linter as core[da82827a87f140f9]::iter::traits::collect::IntoIterator>::into_iter
3349 (0.2%, 5.1%) 1 (0.0%, 0.0%) <ruff[cef4c65d96248843]::registry::Linter>::code_for_rule
3132 (0.2%, 5.3%) 1 (0.0%, 0.0%) <ruff[cef4c65d96248843]::codes::Rule as core[da82827a87f140f9]::fmt::Debug>::fmt
3130 (0.2%, 5.5%) 1 (0.0%, 0.0%) <&str as core[da82827a87f140f9]::convert::From<&ruff[cef4c65d96248843]::codes::Rule>>::from
3130 (0.2%, 5.7%) 1 (0.0%, 0.0%) <&str as core[da82827a87f140f9]::convert::From<ruff[cef4c65d96248843]::codes::Rule>>::from
3130 (0.2%, 5.9%) 1 (0.0%, 0.0%) <ruff[cef4c65d96248843]::codes::Rule as core[da82827a87f140f9]::convert::AsRef<str>>::as_ref
3128 (0.2%, 6.0%) 1 (0.0%, 0.0%) <ruff[cef4c65d96248843]::codes::RuleIter>::get
2669 (0.2%, 6.2%) 1 (0.0%, 0.0%) <<ruff[cef4c65d96248843]::settings::options::Options as serde[d89b1b632568f5a3]::de::Deserialize>::deserialize::__Visitor as serde[d89b1b632568f5a3]::de::Visitor>::visit_seq::<toml_edit[7e3a6c5e67260672]::de::array::ArraySeqAccess>
```
After:
```
Lines Copies Function name
----- ------ -------------
1710487 38900 (TOTAL)
10193 (0.6%, 0.6%) 1 (0.0%, 0.0%) <ruff[52408f46d2058296]::codes::RuleCodePrefix>::iter
8107 (0.5%, 1.1%) 1 (0.0%, 0.0%) <ruff[52408f46d2058296]::codes::Rule>::noqa_code
7345 (0.4%, 1.5%) 1 (0.0%, 0.0%) <ruff[52408f46d2058296]::checkers::ast::Checker as ruff_python_ast[5588cd60041c8605]::visitor::Visitor>::visit_stmt
6412 (0.4%, 1.9%) 1 (0.0%, 0.0%) <<ruff[52408f46d2058296]::settings::options::Options as serde[d89b1b632568f5a3]::de::Deserialize>::deserialize::__Visitor as serde[d89b1b632568f5a3]::de::Visitor>::visit_map::<toml_edit[7e3a6c5e67260672]::de::spanned::SpannedDeserializer<toml_edit[7e3a6c5e67260672]::de::value::ValueDeserializer>>
6412 (0.4%, 2.2%) 1 (0.0%, 0.0%) <<ruff[52408f46d2058296]::settings::options::Options as serde[d89b1b632568f5a3]::de::Deserialize>::deserialize::__Visitor as serde[d89b1b632568f5a3]::de::Visitor>::visit_map::<toml_edit[7e3a6c5e67260672]::de::table::TableMapAccess>
6409 (0.4%, 2.6%) 1 (0.0%, 0.0%) <<ruff[52408f46d2058296]::settings::options::Options as serde[d89b1b632568f5a3]::de::Deserialize>::deserialize::__Visitor as serde[d89b1b632568f5a3]::de::Visitor>::visit_map::<toml_edit[7e3a6c5e67260672]::de::datetime::DatetimeDeserializer>
5696 (0.3%, 3.0%) 1 (0.0%, 0.0%) <ruff[52408f46d2058296]::checkers::ast::Checker as ruff_python_ast[5588cd60041c8605]::visitor::Visitor>::visit_expr
4448 (0.3%, 3.2%) 1 (0.0%, 0.0%) ruff[52408f46d2058296]::flake8_to_ruff::converter::convert
3702 (0.2%, 3.4%) 1 (0.0%, 0.0%) <&ruff[52408f46d2058296]::registry::Linter as core[da82827a87f140f9]::iter::traits::collect::IntoIterator>::into_iter
3349 (0.2%, 3.6%) 1 (0.0%, 0.0%) <ruff[52408f46d2058296]::registry::Linter>::code_for_rule
3132 (0.2%, 3.8%) 1 (0.0%, 0.0%) <ruff[52408f46d2058296]::codes::Rule as core[da82827a87f140f9]::fmt::Debug>::fmt
3130 (0.2%, 4.0%) 1 (0.0%, 0.0%) <&str as core[da82827a87f140f9]::convert::From<&ruff[52408f46d2058296]::codes::Rule>>::from
3130 (0.2%, 4.2%) 1 (0.0%, 0.0%) <&str as core[da82827a87f140f9]::convert::From<ruff[52408f46d2058296]::codes::Rule>>::from
3130 (0.2%, 4.4%) 1 (0.0%, 0.0%) <ruff[52408f46d2058296]::codes::Rule as core[da82827a87f140f9]::convert::AsRef<str>>::as_ref
3128 (0.2%, 4.5%) 1 (0.0%, 0.0%) <ruff[52408f46d2058296]::codes::RuleIter>::get
2669 (0.2%, 4.7%) 1 (0.0%, 0.0%) <<ruff[52408f46d2058296]::settings::options::Options as serde[d89b1b632568f5a3]::de::Deserialize>::deserialize::__Visitor as serde[d89b1b632568f5a3]::de::Visitor>::visit_seq::<toml_edit[7e3a6c5e67260672]::de::array::ArraySeqAccess>
2659 (0.2%, 4.9%) 1 (0.0%, 0.0%) <&ruff[52408f46d2058296]::codes::Pylint as core[da82827a87f140f9]::iter::traits::collect::IntoIterator>::into_iter
```
I'd assume this has a positive effect both on compile time and on runtime, but i don't know the actual effect on compile times and can't really measure.
## Test plan
Check CI for any performance regressions.
This should fix#3808 if we merge it.
* clippy
* Update update_ambiguous_characters.py
* Use dummy verbatim formatter for all nodes
* Use new formatter infrastructure in CLI and test
* Expose the new formatter in the CLI
* Merge import blocks
This adds a new rule `InvalidPyprojectToml` that lints pyproject.toml by checking if https://github.com/PyO3/pyproject-toml-rs can parse it. This means the linting is currently very basic, e.g. we don't check whether the name is actually a valid python project name or appropriately normalized. It does catch errors e.g. with invalid dependency requirements or problems withs the license specifications. It is open to be extended in the future (validate name, SPDX expressions, classifiers, ...), either in ruff or in pyproject-toml-rs.
Test plan:
```
scripts/ecosystem_all_check.sh check --select RUF200
```
This lead to a bunch of
```
RUF200 Failed to parse pyproject.toml: missing field `name`
```
(e.g. https://github.com/amitsk/fastapi-todos/blob/main/pyproject.toml) which is indeed invalid (https://packaging.python.org/en/latest/specifications/declaring-project-metadata/#specification).
Filtering those out, the following other problems were found by `cd target/ecosystem_all_results/ && rg RUF200`:
```
UCL-ARC:rred-reports.stdout.txt
1:pyproject.toml:27:16: RUF200 Failed to parse pyproject.toml: Version specifier `>='3.9'` doesn't match PEP 440 rules
EndlessTrax:python-start-project.stdout.txt
1:pyproject.toml:14:16: RUF200 Failed to parse pyproject.toml: Expected package name starting with an alphanumeric character, found '#'
redjax:gardening-api.stdout.txt
1:pyproject.toml:7:11: RUF200 Failed to parse pyproject.toml: Version `` doesn't match PEP 440 rules
ajslater:codex.stdout.txt
2: 3:17 RUF200 Failed to parse pyproject.toml: invalid type: sequence, expected a string
LDmitriy7:404_AvatarsBot.stdout.txt
1:pyproject.toml:3:11: RUF200 Failed to parse pyproject.toml: Version `` doesn't match PEP 440 rules
ajslater:comicbox.stdout.txt
1:pyproject.toml:3:17: RUF200 Failed to parse pyproject.toml: invalid type: sequence, expected a string
manueldevillena:forecast-earnings.stdout.txt
1:pyproject.toml:24:12: RUF200 Failed to parse pyproject.toml: Expected one of `@`, `(`, `<`, `=`, `>`, `~`, `!`, `;`, found `^`
redjax:ohio_utility_scraper.stdout.txt
1:pyproject.toml:11:11: RUF200 Failed to parse pyproject.toml: Version `` doesn't match PEP 440 rules
agronholm:typeguard.stdout.txt
1:pyproject.toml:40:8: RUF200 Failed to parse pyproject.toml: Expected a valid marker name, found 'python_implementation'
cyuss:decathlon-turnover.stdout.txt
1:pyproject.toml:7:12: RUF200 Failed to parse pyproject.toml: invalid type: string "Youcef", expected a table with 'name' and 'email' keys
ajslater:boilerplate.stdout.txt
1:pyproject.toml:3:17: RUF200 Failed to parse pyproject.toml: invalid type: sequence, expected a string
kaparoo:lightning-project-template.stdout.txt
1:pyproject.toml:56:16: RUF200 Failed to parse pyproject.toml: You can't mix a >= operator with a local version (`+cu117`)
dijital20:pytexas2023-decorators.stdout.txt
1:pyproject.toml:5:11: RUF200 Failed to parse pyproject.toml: Version `` doesn't match PEP 440 rules
pfouque:django-anymail-history.stdout.txt
1:pyproject.toml:137:12: RUF200 Failed to parse pyproject.toml: Version specifier `> = 1.2.0` doesn't match PEP 440 rules
pfouque:django-fakemessages.stdout.txt
1:pyproject.toml:130:12: RUF200 Failed to parse pyproject.toml: Version specifier `> = 1.2.0` doesn't match PEP 440 rules
pypa:build.stdout.txt
1:tests/packages/test-invalid-requirements/pyproject.toml:2:12: RUF200 Failed to parse pyproject.toml: Expected one of `@`, `(`, `<`, `=`, `>`, `~`, `!`, `;`, found `i`
4:tests/packages/test-no-requires/pyproject.toml:1:1: RUF200 Failed to parse pyproject.toml: missing field `requires`
UnoYakshi:FRAAND.stdout.txt
2: 3:11 RUF200 Failed to parse pyproject.toml: Version `` doesn't match PEP 440 rules
DHolmanCoding:python-template.stdout.txt
1:pyproject.toml:22:1: RUF200 Failed to parse pyproject.toml: missing field `requires`
```
Overall, this emitted errors in 43 out of 3408 projects (`rg -c RUF200 target/ecosystem_all_results/ | wc -l`)
Co-authored-by: Micha Reiser <micha@reiser.io>