mirror of https://github.com/ollama/ollama
- Both `/api/generate` and `/api/chat` now accept a `"think"` option that specifies whether thinking mode should be on or off
- Templates are passed this new option so that, e.g., qwen3's template can put `/think` or `/no_think` in the system prompt depending on the value of the setting
- Models' thinking support is inferred by inspecting model templates. The prefix and suffix the parser uses to identify thinking support are also automatically inferred from templates
- Thinking control and parsing are opt-in via the API to prevent breaking existing API consumers. If the `"think"` option is not specified, the behavior is unchanged from previous versions of ollama
- Add parsing for thinking blocks in both streaming and non-streaming modes in both `/generate` and `/chat`
- Update the CLI to make use of these changes. Users can pass `--think` or `--think=false` to control thinking, or during an interactive session they can use the commands `/set think` or `/set nothink`
- A `--hidethinking` option has also been added to the CLI. This makes it easy to use thinking in scripting scenarios like `ollama run qwen3 --think --hidethinking "my question here"`, where you only want to see the answer but still want the benefits of thinking models
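
The opt-in behavior of the `"think"` option can be illustrated from the client side. The following is a minimal sketch, not code from this repository: the `chatRequest` and `chatMessage` types and the `buildChatBody` helper are hypothetical names introduced here, and the `model`/`messages` field layout is assumed from the public `/api/chat` request shape. It shows one way to keep "not specified" distinct from an explicit `false` by using a pointer with `omitempty`.

```go
package main

import "encoding/json"

// chatMessage and chatRequest are hypothetical types for this sketch.
// They cover only the fields discussed here, not ollama's full API types.
type chatMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Model    string        `json:"model"`
	Messages []chatMessage `json:"messages"`
	// Think is a pointer so that "not specified" (nil) can be told apart
	// from an explicit false; omitempty drops the key only when it is nil.
	Think *bool `json:"think,omitempty"`
}

// buildChatBody returns a JSON body suitable for a POST to /api/chat.
// Pass nil for think to leave the option out of the request entirely.
func buildChatBody(model, prompt string, think *bool) ([]byte, error) {
	return json.Marshal(chatRequest{
		Model:    model,
		Messages: []chatMessage{{Role: "user", Content: prompt}},
		Think:    think,
	})
}
```

Calling `buildChatBody("qwen3", "hi", nil)` produces a body with no `"think"` key, which per the notes above should leave server behavior unchanged, while passing a pointer to `true` or `false` sets the option explicitly.
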
| Name | | |
|---|---|---|
| internal | ||
| auth.go | ||
| create.go | ||
| create_test.go | ||
| download.go | ||
| fixblobs.go | ||
| fixblobs_test.go | ||
| images.go | ||
| images_test.go | ||
| layer.go | ||
| manifest.go | ||
| manifest_test.go | ||
| model.go | ||
| modelpath.go | ||
| modelpath_test.go | ||
| prompt.go | ||
| prompt_test.go | ||
| quantization.go | ||
| quantization_test.go | ||
| routes.go | ||
| routes_create_test.go | ||
| routes_delete_test.go | ||
| routes_generate_test.go | ||
| routes_list_test.go | ||
| routes_test.go | ||
| sched.go | ||
| sched_test.go | ||
| sparse_common.go | ||
| sparse_windows.go | ||
| thinking.go | ||
| thinking_test.go | ||
| upload.go | ||
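
The commit notes above mention parsing thinking blocks using a prefix and suffix inferred from the model template. As a rough illustration of the non-streaming case only, here is a minimal sketch; it is not the implementation in `thinking.go`, the `splitThinking` name is hypothetical, and the `<think>`/`</think>` tags used in the usage note are assumed for illustration rather than taken from the source.

```go
package main

import "strings"

// splitThinking separates a leading thinking block from the rest of a
// complete (non-streaming) response. openTag and closeTag are supplied by
// the caller; in this sketch they stand in for the prefix and suffix that
// would be inferred from a model template.
func splitThinking(s, openTag, closeTag string) (thinking, content string) {
	const ws = " \t\r\n"
	trimmed := strings.TrimLeft(s, ws)
	if !strings.HasPrefix(trimmed, openTag) {
		// No thinking block at the start: return the input unchanged.
		return "", s
	}
	rest := trimmed[len(openTag):]
	end := strings.Index(rest, closeTag)
	if end < 0 {
		// Unterminated block: treat everything after the prefix as thinking.
		return strings.TrimSpace(rest), ""
	}
	thinking = strings.TrimSpace(rest[:end])
	content = strings.TrimLeft(rest[end+len(closeTag):], ws)
	return thinking, content
}
```

For example, `splitThinking("<think>\nadd 2 and 2\n</think>\n\n4", "<think>", "</think>")` returns `"add 2 and 2"` as the thinking text and `"4"` as the content. A streaming parser would additionally need to buffer partial tags across chunks, which this sketch does not attempt.
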