## Getting started
1. [Install `uv`](https://docs.astral.sh/uv/getting-started/installation/)
    - Unix: `curl -LsSf https://astral.sh/uv/install.sh | sh`
    - Windows: `powershell -c "irm https://astral.sh/uv/install.ps1 | iex"`
1. Build ty: `cargo build --bin ty --release`
1. `cd` into the benchmark directory: `cd scripts/ty_benchmark`
1. Install Pyright: `npm ci --ignore-scripts`
1. Run benchmarks: `uv run benchmark`

Running the benchmarks requires hyperfine 1.20 or newer.
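Putting the steps together, a first run from a fresh checkout might look like this (a Unix-shell sketch; it assumes hyperfine is already installed, e.g. via `cargo install hyperfine`):

```shell
# Install uv (Unix)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Build the release ty binary from the repository root
cargo build --bin ty --release

# Switch to the benchmark directory and install Pyright
cd scripts/ty_benchmark
npm ci --ignore-scripts

# Run the benchmarks
uv run benchmark
```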
## Benchmarks
### Cold check time
Run with:
```shell
uv run --python 3.14 benchmark
```
Measures how long it takes to type check a project without a pre-existing cache.
You can pass `--single-threaded` to measure the check time when using only a single thread.
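For example, combined with the invocation above (assuming the flags compose as expected):

```shell
uv run --python 3.14 benchmark --single-threaded
```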
### Warm check time
Run with:
```shell
uv run --python 3.14 benchmark --warm
```
Measures how long it takes to recheck a project when nothing has changed.
> **Note**: Of the benchmarked type checkers, only mypy supports caching.
### LSP: Time to first diagnostic
Measures how long it takes for a newly started LSP to return the diagnostics for the files open in the editor.
Run with:
```bash
uv run --python 3.14 pytest src/benchmark/test_lsp_diagnostics.py::test_fetch_diagnostics
```
> **Note**: Use `-v -s` to see the set of diagnostics returned by each type checker.
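For example:

```bash
uv run --python 3.14 pytest -v -s src/benchmark/test_lsp_diagnostics.py::test_fetch_diagnostics
```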
### LSP: Re-check time
Measures how long it takes to recheck all open files after making a single change in a file.
Run with:
```bash
uv run --python 3.14 pytest src/benchmark/test_lsp_diagnostics.py::test_incremental_edit
```
> **Note**: This benchmark uses [pull diagnostics](https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/#textDocument_pullDiagnostics) for type checkers that support this operation (ty), and falls back to [publish diagnostics](https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/#textDocument_publishDiagnostics) otherwise (Pyright, Pyrefly).
## Known limitations
The tested type checkers implement Python's type system to varying degrees, and
some projects only pass type checking with a specific type checker.
## Updating the benchmark
The benchmark script supports snapshotting the results when running with `--snapshot` and `--accept`.
The goal of those snapshots is to catch accidental regressions, for example when a project adds
new dependencies that we fail to install. They are not intended as a testing tool: the snapshot
runner doesn't account for platform differences, so you may see differences when running the
snapshots on your own machine.
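A typical update flow might look like this (a sketch based on the flags named above; exact behavior may differ):

```shell
# Compare the current results against the stored snapshots
uv run benchmark --snapshot

# Re-run and accept the new results, updating the snapshots on disk
uv run benchmark --snapshot --accept
```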