From f6b40a021f8a9ace1a2cb1054d32bd967fdd088a Mon Sep 17 00:00:00 2001
From: konsti
Date: Fri, 21 Jul 2023 11:32:26 +0200
Subject: [PATCH] Document shrinking script (#5942)

**Summary** Document shrinking script: I think it's both in a good enough state and valuable enough to document its usage.
---
 crates/ruff_python_formatter/README.md | 36 +++++++++++++++++++-------
 1 file changed, 27 insertions(+), 9 deletions(-)

diff --git a/crates/ruff_python_formatter/README.md b/crates/ruff_python_formatter/README.md
index 1d8763cea8..ab8e8aa6ca 100644
--- a/crates/ruff_python_formatter/README.md
+++ b/crates/ruff_python_formatter/README.md
@@ -267,15 +267,6 @@ git clone --branch 3.10 https://github.com/python/cpython.git crates/ruff/resour
 cargo run --bin ruff_dev -- format-dev --stability-check crates/ruff/resources/test/cpython
 ```
 
-It is also possible large number of repositories using ruff. This dataset is large (~60GB), so we
-only do this occasionally:
-
-```shell
-curl https://raw.githubusercontent.com/akx/ruff-usage-aggregate/master/data/known-github-tomls-clean.jsonl> github_search.jsonl
-python scripts/check_ecosystem.py --checkouts target/checkouts --projects github_search.jsonl -v $(which true) $(which true)
-cargo run --bin ruff_dev -- format-dev --stability-check --multi-project target/checkouts
-```
-
 Compared to `ruff check`, `cargo run --bin ruff_dev -- format-dev` has 4 additional options:
 
 - `--write`: Format the files and write them back to disk
@@ -284,6 +275,33 @@ Compared to `ruff check`, `cargo run --bin ruff_dev -- format-dev` has 4 additio
 - `--error-file`: Use together with `--multi-project`, this writes all errors (but not status
   messages) to a file.
 
+It is also possible to check a large number of repositories. This dataset is large (~60GB), so we
+only do this occasionally:
+
+```shell
+# Get the list of projects
+curl https://raw.githubusercontent.com/akx/ruff-usage-aggregate/master/data/known-github-tomls-clean.jsonl > github_search.jsonl
+# Repurpose this script to download the repositories for us
+python scripts/check_ecosystem.py --checkouts target/checkouts --projects github_search.jsonl -v $(which true) $(which true)
+# Check each project for formatter stability
+cargo run --bin ruff_dev -- format-dev --stability-check --error-file target/formatter-ecosystem-errors.txt --multi-project target/checkouts
+```
+
+To shrink a formatter error from an entire file to a minimal reproducible example, you can use
+`ruff_shrinking`:
+
+```shell
+cargo run --bin ruff_shrinking -- <your_file> target/shrinking.py "Unstable formatting" "target/release/ruff_dev format-dev --stability-check target/shrinking.py"
+```
+
+The first argument is the input file, the second is the output file where the candidates
+and the eventual minimized version will be written to. The third argument is a regex matching the
+error message, e.g. "Unstable formatting" or "Formatter error". The last argument is the command
+with the error, e.g. running the stability check on the candidate file. The script will try various
+strategies to remove parts of the code. If the output of the command still matches, it will use that
+slightly smaller code as the starting point for the next iteration; otherwise it will revert and try
+a different strategy until all strategies are exhausted.
+
 ## The orphan rules and trait structure
 
 For the formatter, we would like to implement `Format` from the rust_formatter crate for all AST