# Getting started

- Install uv:
  - Unix: `curl -LsSf https://astral.sh/uv/install.sh | sh`
  - Windows: `powershell -c "irm https://astral.sh/uv/install.ps1 | iex"`
- Build ty: `cargo build --bin ty --release`
- `cd` into the benchmark directory: `cd scripts/ty_benchmark`
- Install Pyright: `npm ci --ignore-scripts`
- Run benchmarks: `uv run benchmark`
Requires hyperfine 1.20 or newer.
## Benchmarks

### Cold check time

Run with:

```sh
uv run --python 3.14 benchmark
```
Measures how long it takes to type check a project without a pre-existing cache.
You can run the benchmark with `--single-threaded` to measure the check time when using a single thread only.
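To make this concrete, here is a minimal sketch of how a cold-check run could be driven with hyperfine from Python. The project path, checker commands, and cache directory are hypothetical placeholders, not the actual benchmark configuration:

```python
# A minimal sketch, not the benchmark script itself: time cold checks by
# deleting any on-disk cache before every measured run.
import subprocess

PROJECT = "projects/example"  # hypothetical project checkout
CACHE = ".mypy_cache"         # only mypy keeps an on-disk cache

subprocess.run(
    [
        "hyperfine",
        "--prepare", f"rm -rf {CACHE}",  # runs before each timed iteration
        f"ty check {PROJECT}",
        f"mypy {PROJECT}",
    ],
    check=True,
)
```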
### Warm check time

Run with:

```sh
uv run --python 3.14 benchmark --warm
```
Measures how long it takes to recheck a project if there were no changes.
Note: Of the benchmarked type checkers, only mypy supports caching.
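For illustration, a warm run can be approximated by letting hyperfine populate the cache with a warmup iteration before timing starts. The command and project path are again placeholders:

```python
# A minimal sketch: hyperfine's --warmup run populates mypy's cache, so the
# timed runs that follow measure the warm (fully cached) check time.
import subprocess

PROJECT = "projects/example"  # hypothetical project checkout

subprocess.run(
    ["hyperfine", "--warmup", "1", f"mypy {PROJECT}"],
    check=True,
)
```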
### LSP: Time to first diagnostic
Measures how long it takes for a newly started LSP to return the diagnostics for the files open in the editor.
Run with:

```sh
uv run --python 3.14 pytest src/benchmark/test_lsp_diagnostics.py::test_fetch_diagnostics
```
Note: Use `-v -s` to see the set of diagnostics returned by each type checker.
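For context, the sketch below shows roughly what this measurement involves: start a language server over stdio, open a file, and time how long until the first `textDocument/publishDiagnostics` notification arrives. The server invocation, sample file, and bare-bones JSON-RPC client are illustrative simplifications; the real harness lives in `src/benchmark/test_lsp_diagnostics.py`:

```python
# A minimal, illustrative LSP client: measures the time from server start
# to the first publishDiagnostics notification for one opened file.
import json
import subprocess
import time

def send(proc, payload):
    # LSP messages are JSON-RPC bodies behind a Content-Length header.
    body = json.dumps(payload).encode()
    proc.stdin.write(f"Content-Length: {len(body)}\r\n\r\n".encode() + body)
    proc.stdin.flush()

def recv(proc):
    # Read headers until the blank line, then read the body.
    length = 0
    while line := proc.stdout.readline().strip():
        if line.lower().startswith(b"content-length:"):
            length = int(line.split(b":")[1])
    return json.loads(proc.stdout.read(length))

server = subprocess.Popen(
    ["ty", "server"],  # assumed invocation; each checker starts differently
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
)
start = time.monotonic()

send(server, {"jsonrpc": "2.0", "id": 1, "method": "initialize",
              "params": {"processId": None, "rootUri": None, "capabilities": {}}})
recv(server)  # initialize response
send(server, {"jsonrpc": "2.0", "method": "initialized", "params": {}})
send(server, {"jsonrpc": "2.0", "method": "textDocument/didOpen", "params": {
    "textDocument": {"uri": "file:///tmp/example.py", "languageId": "python",
                     "version": 1, "text": "x: int = 'oops'\n"}}})

# Skip unrelated server messages until the first diagnostics arrive.
while (msg := recv(server)).get("method") != "textDocument/publishDiagnostics":
    pass
print(f"first diagnostics after {time.monotonic() - start:.3f}s:",
      msg["params"]["diagnostics"])
```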
### LSP: Re-check time

Measures how long it takes to recheck all open files after making a single change in a file.

Run with:

```sh
uv run --python 3.14 pytest src/benchmark/test_lsp_diagnostics.py::test_incremental_edit
```
Note: This benchmark uses pull diagnostics for type checkers that support this operation (ty), and falls back to publish diagnostics otherwise (Pyright, Pyrefly).
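Continuing the illustrative client from the previous sketch (it reuses the `send`/`recv` helpers and the `server` handle defined there), the incremental measurement conceptually applies one edit and times how long the server needs to produce updated diagnostics:

```python
# Continues the sketch above; not runnable on its own.
import time

start = time.monotonic()

# 1. Notify the server of an edit (full-document sync; the version must bump).
send(server, {"jsonrpc": "2.0", "method": "textDocument/didChange", "params": {
    "textDocument": {"uri": "file:///tmp/example.py", "version": 2},
    "contentChanges": [{"text": "x: int = 1\n"}],
}})

# 2a. Checkers with pull-diagnostic support (ty) answer an explicit
#     LSP 3.17 textDocument/diagnostic request...
send(server, {"jsonrpc": "2.0", "id": 2, "method": "textDocument/diagnostic",
              "params": {"textDocument": {"uri": "file:///tmp/example.py"}}})
while "id" not in (msg := recv(server)):  # skip notifications until the response
    pass

# 2b. ...while publish-only servers (Pyright, Pyrefly) are waited on instead:
# while (msg := recv(server)).get("method") != "textDocument/publishDiagnostics":
#     pass

print(f"re-check took {time.monotonic() - start:.3f}s")
```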
## Known limitations

The tested type checkers implement Python's type system to varying degrees, and some projects only pass type checking with a specific type checker.
## Updating the benchmark

The benchmark script supports snapshotting the results when running with `--snapshot` and `--accept`.
The goal of these snapshots is to catch accidental regressions, for example, a project adding new
dependencies that we fail to install. They are not intended as a testing tool: the snapshot runner
doesn't account for platform differences, so you might see differences when running the snapshots
on your machine.
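As a rough illustration of the idea (the real snapshot format and comparison logic may differ), a snapshot check boils down to comparing the current results against a stored file and overwriting that file when `--accept` is passed:

```python
# A hypothetical sketch of snapshot handling, not the benchmark's actual code.
import json
from pathlib import Path

def check_snapshot(results: dict, path: Path, accept: bool) -> None:
    if accept or not path.exists():
        # --accept (or a missing snapshot) records the current results.
        path.write_text(json.dumps(results, indent=2, sort_keys=True))
        return
    stored = json.loads(path.read_text())
    if stored != results:
        raise SystemExit(f"snapshot mismatch for {path}; re-run with --accept")
```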