## Getting started

1. [Install `uv`](https://docs.astral.sh/uv/getting-started/installation/)

    - Unix: `curl -LsSf https://astral.sh/uv/install.sh | sh`
    - Windows: `powershell -c "irm https://astral.sh/uv/install.ps1 | iex"`

1. Build ty: `cargo build --bin ty --release`
1. `cd` into the benchmark directory: `cd scripts/ty_benchmark`
1. Install Pyright: `npm ci --ignore-scripts`
1. Run benchmarks: `uv run benchmark`

Requires hyperfine 1.20 or newer.

## Benchmarks

### Cold check time

Run with:

```shell
uv run --python 3.14 benchmark
```

Measures how long it takes to type check a project without a pre-existing cache.

You can run the benchmark with `--single-threaded` to measure the check time when using a single thread only.

### Warm check time

Run with:

```shell
uv run --python 3.14 benchmark --warm
```

Measures how long it takes to recheck a project when there were no changes.

> **Note**: Of the benchmarked type checkers, only mypy supports caching.

### LSP: Time to first diagnostic

Measures how long it takes for a newly started language server to return the diagnostics for the files open in the editor.

Run with:

```bash
uv run --python 3.14 pytest src/benchmark/test_lsp_diagnostics.py::test_fetch_diagnostics
```

> **Note**: Use `-v -s` to see the set of diagnostics returned by each type checker.

### LSP: Re-check time

Measures how long it takes to recheck all open files after making a single change in a file.

Run with:

```bash
uv run --python 3.14 pytest src/benchmark/test_lsp_diagnostics.py::test_incremental_edit
```

> **Note**: This benchmark uses [pull diagnostics](https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/#textDocument_pullDiagnostics) for type checkers that support this operation (ty), and falls back to [publish diagnostics](https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/#textDocument_publishDiagnostics) otherwise (Pyright, Pyrefly).

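
The difference between the two diagnostic models mentioned in the note can be sketched as follows. This is a minimal illustration based on the LSP 3.17 specification, not the benchmark's actual client code; the helper names (`frame`, `pull_diagnostics_request`, `is_published_diagnostics`) are hypothetical. With pull diagnostics the client sends a `textDocument/diagnostic` request and waits for the matching response; with publish diagnostics it has to wait for a `textDocument/publishDiagnostics` notification that the server sends on its own schedule.

```python
import json


def frame(message: dict) -> bytes:
    """Encode a JSON-RPC message using the LSP base protocol (Content-Length header)."""
    body = json.dumps(message).encode("utf-8")
    return f"Content-Length: {len(body)}\r\n\r\n".encode("ascii") + body


def pull_diagnostics_request(request_id: int, uri: str) -> dict:
    """Build a `textDocument/diagnostic` request (pull model)."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "textDocument/diagnostic",
        "params": {"textDocument": {"uri": uri}},
    }


def is_published_diagnostics(message: dict, uri: str) -> bool:
    """Return True if `message` is a publishDiagnostics notification for `uri` (push model)."""
    return (
        message.get("method") == "textDocument/publishDiagnostics"
        and "id" not in message  # notifications carry no request id
        and message.get("params", {}).get("uri") == uri
    )
```

A client measuring re-check time in the pull model would write `frame(pull_diagnostics_request(1, uri))` to the server's stdin and stop the clock on the response with `id` 1. In the push model it would instead stop the clock on the first incoming message for which `is_published_diagnostics` returns `True`.
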
## Known limitations

The tested type checkers implement Python's type system to varying degrees, and some projects only pass type checking with a specific type checker.

## Updating the benchmark

The benchmark script supports snapshotting the results when running with `--snapshot` and `--accept`.

The goal of those snapshots is to catch accidental regressions, for example a project adding new dependencies that we fail to install. They are not intended as a testing tool: the snapshot runner doesn't account for platform differences, so you might see differences when running the snapshots on your machine.
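
The general snapshot-and-accept pattern can be sketched as follows. This is an illustration of the idea only, not the benchmark script's actual implementation or snapshot format; `check_snapshot` and its parameters are hypothetical names chosen for this example.

```python
import difflib
from pathlib import Path


def check_snapshot(path: Path, actual: str, accept: bool = False) -> list[str]:
    """Compare `actual` against the stored snapshot at `path`.

    Returns an empty list when the snapshot matches (or was just accepted),
    otherwise the lines of a unified diff between the snapshot and `actual`.
    """
    expected = path.read_text() if path.exists() else ""
    if expected == actual:
        return []
    if accept:
        # Accepting overwrites the stored snapshot with the new output.
        path.write_text(actual)
        return []
    return list(
        difflib.unified_diff(
            expected.splitlines(),
            actual.splitlines(),
            fromfile="snapshot",
            tofile="current",
            lineterm="",
        )
    )
```

A non-empty diff signals that the output has drifted from the recorded state, which is the kind of accidental regression the snapshots are meant to surface. Passing `accept=True` records the new output as the expected one.
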