Ollama Benchmark Tool

A Go-based command-line tool for benchmarking Ollama models with configurable parameters and multiple output formats.

Features

  • Benchmark multiple models in a single run
  • Support for both text and image prompts
  • Configurable generation parameters (temperature, max tokens, seed, etc.)
  • Output in benchstat or CSV format
  • Detailed performance metrics (prefill, generate, load, total durations)

Building from Source

go build -o ollama-bench bench.go
./ollama-bench -model gpt-oss:20b -epochs 6 -format csv

Using Go Run (without building)

go run bench.go -model gpt-oss:20b -epochs 3

Usage

Basic Example

./ollama-bench -model gemma3 -epochs 6

Benchmark Multiple Models

./ollama-bench -model gemma3,gemma3n -epochs 6 -max-tokens 100 -p "Write me a short story" | tee gemma.bench
benchstat -col /name gemma.bench

With Image Prompt

./ollama-bench -model qwen3-vl -image photo.jpg -epochs 6 -max-tokens 100 -p "Describe this image"

Advanced Example

./ollama-bench -model llama3 -epochs 10 -temperature 0.7 -max-tokens 500 -seed 42 -format csv -output results.csv

Command Line Options

| Option | Description | Default |
|--------|-------------|---------|
| -model | Comma-separated list of models to benchmark | (required) |
| -epochs | Number of iterations per model | 1 |
| -max-tokens | Maximum tokens for model response | 0 (unlimited) |
| -temperature | Temperature parameter | 0.0 |
| -seed | Random seed | 0 (random) |
| -timeout | Timeout in seconds | 300 |
| -p | Prompt text | "Write a long story." |
| -image | Image file to include in prompt | |
| -k | Keep-alive duration in seconds | 0 |
| -format | Output format (benchstat, csv) | benchstat |
| -output | Output file for results | "" (stdout) |
| -v | Verbose mode | false |
| -debug | Show debug information | false |
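
For readers who want to see how options like these are typically wired up, here is a minimal sketch using Go's standard `flag` package. It covers only a subset of the options listed above and is not taken from bench.go: the `options` struct, the `parseOptions` helper, and the validation rules are assumptions made for illustration.

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"strings"
)

// options holds a subset of the documented flags.
// The struct and its field names are hypothetical, not from bench.go.
type options struct {
	models      []string
	epochs      int
	maxTokens   int
	temperature float64
	format      string
}

// parseOptions parses args using the defaults from the options table.
func parseOptions(args []string) (options, error) {
	fs := flag.NewFlagSet("ollama-bench", flag.ContinueOnError)
	model := fs.String("model", "", "comma-separated list of models to benchmark (required)")
	epochs := fs.Int("epochs", 1, "number of iterations per model")
	maxTokens := fs.Int("max-tokens", 0, "maximum tokens for model response (0 = unlimited)")
	temperature := fs.Float64("temperature", 0.0, "temperature parameter")
	format := fs.String("format", "benchstat", "output format (benchstat, csv)")
	if err := fs.Parse(args); err != nil {
		return options{}, err
	}

	// Split the comma-separated model list and drop empty entries.
	var models []string
	for _, m := range strings.Split(*model, ",") {
		if m = strings.TrimSpace(m); m != "" {
			models = append(models, m)
		}
	}
	if len(models) == 0 {
		return options{}, fmt.Errorf("-model is required")
	}
	if *format != "benchstat" && *format != "csv" {
		return options{}, fmt.Errorf("unknown format %q", *format)
	}
	return options{models, *epochs, *maxTokens, *temperature, *format}, nil
}

func main() {
	opts, err := parseOptions(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("%+v\n", opts)
}
```

Using a `flag.FlagSet` rather than the package-level flags keeps the parser testable, since it can be called with any argument slice.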

Output Formats

Markdown Format

The markdown format is suitable for copying and pasting into a GitHub issue. It looks like:

| Model | Step | Count | Duration | nsPerToken | tokensPerSec |
|-------|------|-------|----------|------------|--------------|
| gpt-oss:20b | prefill | 124 | 30.006458ms | 241987.56 | 4132.44 |
| gpt-oss:20b | generate | 200 | 2.646843954s | 13234219.77 | 75.56 |
| gpt-oss:20b | load | 1 | 121.674208ms | - | - |
| gpt-oss:20b | total | 1 | 2.861047625s | - | - |
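
The nsPerToken and tokensPerSec columns appear to follow directly from Count and Duration. Assuming that relationship (the helper names below are hypothetical and not taken from bench.go), the prefill row can be reproduced as follows:

```go
package main

import (
	"fmt"
	"time"
)

// nsPerToken returns the average time per token in nanoseconds.
func nsPerToken(count int, d time.Duration) float64 {
	if count <= 0 {
		return 0
	}
	return float64(d.Nanoseconds()) / float64(count)
}

// tokensPerSec returns throughput in tokens per second.
func tokensPerSec(count int, d time.Duration) float64 {
	if d <= 0 {
		return 0
	}
	return float64(count) / d.Seconds()
}

func main() {
	// Prefill row from the table: 124 tokens in 30.006458ms.
	prefill := 30006458 * time.Nanosecond
	fmt.Printf("%.2f ns/token, %.2f tokens/sec\n",
		nsPerToken(124, prefill), tokensPerSec(124, prefill))
	// prints "241987.56 ns/token, 4132.44 tokens/sec"
}
```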

Benchstat Format

Compatible with Go's benchstat tool for statistical analysis:

BenchmarkModel/name=gpt-oss:20b/step=prefill 128 78125.00 ns/token 12800.00 token/sec
BenchmarkModel/name=gpt-oss:20b/step=generate 512 19531.25 ns/token 51200.00 token/sec
BenchmarkModel/name=gpt-oss:20b/step=load 1 1500000000 ns/request
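
As an illustration of the line layout, the sketch below formats one token-based result in the same shape as the sample above. The `benchstatLine` function is a hypothetical helper, and deriving token/sec from ns/token is an assumption that happens to match the sample values.

```go
package main

import "fmt"

// benchstatLine formats one token-based result as a benchstat-style line:
// a benchmark name with key=value sub-parts, a count, then value/unit pairs.
func benchstatLine(model, step string, count int, nsPerToken float64) string {
	tokensPerSec := 0.0
	if nsPerToken > 0 {
		tokensPerSec = 1e9 / nsPerToken // one second is 1e9 nanoseconds
	}
	return fmt.Sprintf("BenchmarkModel/name=%s/step=%s %d %.2f ns/token %.2f token/sec",
		model, step, count, nsPerToken, tokensPerSec)
}

func main() {
	fmt.Println(benchstatLine("gpt-oss:20b", "prefill", 128, 78125))
	// prints "BenchmarkModel/name=gpt-oss:20b/step=prefill 128 78125.00 ns/token 12800.00 token/sec"
}
```

The `name=` and `step=` keys are what allow benchstat to group results by column, as in `benchstat -col /name`.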

CSV Format

Machine-readable comma-separated values:

NAME,STEP,COUNT,NS_PER_COUNT,TOKEN_PER_SEC
gpt-oss:20b,prefill,128,78125.00,12800.00
gpt-oss:20b,generate,512,19531.25,51200.00
gpt-oss:20b,load,1,1500000000,0
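
CSV output is convenient for post-processing. The following sketch reads results in the layout shown above with Go's `encoding/csv` package. The `result` type and the `parseResults` and `parseString` helpers are hypothetical names, and the column order is assumed to match the sample header.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"strconv"
	"strings"
)

// sampleCSV is the example output from this README.
const sampleCSV = `NAME,STEP,COUNT,NS_PER_COUNT,TOKEN_PER_SEC
gpt-oss:20b,prefill,128,78125.00,12800.00
gpt-oss:20b,generate,512,19531.25,51200.00
gpt-oss:20b,load,1,1500000000,0
`

// result is one parsed CSV row.
type result struct {
	Name, Step   string
	Count        int
	NsPerCount   float64
	TokensPerSec float64
}

// parseResults reads a header row followed by five-column data rows.
func parseResults(r io.Reader) ([]result, error) {
	records, err := csv.NewReader(r).ReadAll()
	if err != nil {
		return nil, err
	}
	if len(records) == 0 || len(records[0]) != 5 {
		return nil, fmt.Errorf("expected a 5-column header row")
	}
	var results []result
	for _, rec := range records[1:] { // skip the header
		count, err := strconv.Atoi(rec[2])
		if err != nil {
			return nil, fmt.Errorf("bad COUNT %q: %w", rec[2], err)
		}
		ns, err := strconv.ParseFloat(rec[3], 64)
		if err != nil {
			return nil, fmt.Errorf("bad NS_PER_COUNT %q: %w", rec[3], err)
		}
		tps, err := strconv.ParseFloat(rec[4], 64)
		if err != nil {
			return nil, fmt.Errorf("bad TOKEN_PER_SEC %q: %w", rec[4], err)
		}
		results = append(results, result{rec[0], rec[1], count, ns, tps})
	}
	return results, nil
}

// parseString is a convenience wrapper for in-memory input.
func parseString(s string) ([]result, error) {
	return parseResults(strings.NewReader(s))
}

func main() {
	results, err := parseString(sampleCSV)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, r := range results {
		fmt.Printf("%s %s: %.2f tokens/sec\n", r.Name, r.Step, r.TokensPerSec)
	}
}
```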

Metrics Explained

The tool reports four types of metrics for each model:

  • prefill: Time spent processing the prompt
  • generate: Time spent generating the response
  • load: Model loading time (one-time cost)
  • total: Total request duration
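
These four metrics line up with the timing fields that the Ollama API returns in its final response (`prompt_eval_duration`, `eval_duration`, `load_duration`, and `total_duration`, all in nanoseconds). The sketch below shows one plausible mapping. Whether bench.go maps them exactly this way is an assumption, and the `response` and `metric` types and the `metricsFromJSON` helper are hypothetical. The sample values are taken from the markdown table above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// sampleJSON holds only the timing fields of a final API response.
const sampleJSON = `{"total_duration":2861047625,"load_duration":121674208,
"prompt_eval_count":124,"prompt_eval_duration":30006458,
"eval_count":200,"eval_duration":2646843954}`

// response mirrors the timing fields; durations are JSON integers in nanoseconds.
type response struct {
	TotalDuration      time.Duration `json:"total_duration"`
	LoadDuration       time.Duration `json:"load_duration"`
	PromptEvalCount    int           `json:"prompt_eval_count"`
	PromptEvalDuration time.Duration `json:"prompt_eval_duration"`
	EvalCount          int           `json:"eval_count"`
	EvalDuration       time.Duration `json:"eval_duration"`
}

// metric is one reported row: a step name, a count, and a duration.
type metric struct {
	Step     string
	Count    int
	Duration time.Duration
}

// metricsFromJSON decodes a response and maps it to the four metrics.
func metricsFromJSON(data string) ([]metric, error) {
	var r response
	if err := json.Unmarshal([]byte(data), &r); err != nil {
		return nil, err
	}
	return []metric{
		{"prefill", r.PromptEvalCount, r.PromptEvalDuration}, // prompt processing
		{"generate", r.EvalCount, r.EvalDuration},            // response generation
		{"load", 1, r.LoadDuration},                          // model loading
		{"total", 1, r.TotalDuration},                        // whole request
	}, nil
}

func main() {
	metrics, err := metricsFromJSON(sampleJSON)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, m := range metrics {
		fmt.Printf("%-8s %4d %v\n", m.Step, m.Count, m.Duration)
	}
}
```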