Commit Graph

4843 Commits

Author SHA1 Message Date
Daniel Hiltgen 3f30836734
CUDA: filter devices on secondary discovery (#13317)
We now do a deeper probe of CUDA devices to verify the library version has
the correct compute capability coverage for the device.  Due to ROCm also
interpreting the CUDA env var to filter AMD devices, we try to avoid setting
it which leads to problems in mixed vendor systems.  However without setting
it for this deeper probe, each CUDA library subprocess discovers all CUDA GPUs
and on systems with lots of GPUs, this can lead to hitting timeouts.  The fix is
to turn on the CUDA visibility env var just for this deeper probe use-case.
2025-12-03 12:58:16 -08:00
Nathan Hook cc9555aff0
Update user message format for temperature query (#13256) 2025-12-02 15:08:39 -08:00
hello_world 20aee96706
Add Vulkan GPU support instructions in development.md (#13265)
Added Vulkan SDK installation instructions and environment variable setup for building with Vulkan support.
2025-12-02 13:37:32 -08:00
Daniel Hiltgen 18b5958d46
test: avoid ministral tools test on low vram (#13302)
Avoid hitting test timeouts
2025-12-02 13:18:55 -08:00
Jesse Gross 5317202c38 llm: Don't always evict models on CPU-only systems
Model eviction happens when we have at least one other model
loaded and are unable to load all layers into VRAM. However, on
CPU-only systems we can never load layers into VRAM, so this
constantly triggered eviction.

Fixes #13227
2025-12-02 10:58:08 -08:00
Daniel Hiltgen d771043e88
test: add ministral-3 (#13300) 2025-12-02 09:52:16 -08:00
Daniel Hiltgen f8f1071818
CUDA: verify CC is supported by target library (#13298) 2025-12-02 09:28:41 -08:00
Patrick Devine d3e0a0dee4
model: ministral w/ llama4 scaling (#13292)
This change:

* fixes rope scaling in the mistral converter
* updates ministral to include llama4 scaling
* includes a new ministral parser for parsing reasoning and tool calling

---------

Co-authored-by: jmorganca <jmorganca@gmail.com>
2025-12-01 23:20:14 -08:00
Daniel Hiltgen 554172759c
win: warn if ggml-base detected in PATH (#13289)
If the user has somehow installed another GGML based app which places a
ggml-base lib somewhere in their PATH, we can experience runtime problems
due to incompatibilities.  This change adds a warning message if we detect
a ggml-base outside of our install location to aid in troubleshooting.
2025-12-01 15:36:47 -08:00
Bruce MacDonald 5b6a8e6001
api/client: handle non-json streaming errors (#13007)
While processing the response stream during a chat or generation if an error is occurred it is parsed and returned to the user. The issue with the existing code is that this assumed the response would be valid JSON, which is not a safe assumption and caused cryptic error messages to be displayed due to parsing failures:
`invalid character 'i' looking for beginning of value`

This change updates the stream function to return the raw error string if it cant be parsed as JSON. This should help with debugging issues by making sure the actual error reaches the user.
2025-12-01 15:10:16 -08:00
Daniel Hiltgen 467bbc0dd5
jetpack: require exact match or skip cuda_jetpack* (#13288)
The cuda_jetpack libs will enumerate discrete GPUs on SBSA systems
which leads to runtime failures of missing kernels.  This fix
requires an exact match to enable jetpacks instead of relying on
enumeration to filter out supported libraries.
2025-12-01 12:48:16 -08:00
Jeffrey Morgan 6d9f9323c5
.gitattributes: add app/webview to linguist-vendored (#13274) 2025-11-29 23:46:10 -05:00
Ondrej Kokes 0c2489605d
docs: fix output formatting in faq.mdx (#13231)
There were a few Markdown typos in one FAQ answer. It now renders as a proper ascii table.
2025-11-28 19:19:21 -05:00
EntropyYue 8b1b89a984
docs: remove deprecated parameters (#13237) 2025-11-26 11:03:09 +09:00
Eva H 47e272c35a
app/cmd: update ollama help to navigate to ollama doc instead of github page (#13174) 2025-11-20 16:30:35 -05:00
Jeffrey Morgan 417a81fda3
app: open app instead of always navigating to / on connect (#13164) 2025-11-20 12:59:17 -08:00
Daniel Hiltgen dba62ff3a5
discovery: fix cuda overlap case (#13176)
Recent refactoring introduced a regression for filtering cuda overlap to favor newest supported version.
2025-11-20 12:15:37 -08:00
Grace d70e935526
Parser for Cogito v2 (#13145) 2025-11-19 17:21:07 -08:00
Michael Yang 5c1063df7f
deepseek2: upgrade to run v3+ models (#13166)
the check for mla omits v3 and r1 which should not return unsupported.
instead check the tokenizer for compatibility
2025-11-19 17:05:39 -08:00
Jesse Gross cb485b2019 kvcache: Run tests both with and without PermutedV
The causal cache can store data differently depending on what is
best for the backend. We should run tests both ways.
2025-11-19 16:45:30 -08:00
nicole pardal b2af50960f
nomic-embed: nomic-embed-text defaulted to ollama runner (#13144) 2025-11-19 13:03:44 -08:00
Michael Yang eac5b8bfbd chore: mark vulkan shaders as vendored files 2025-11-19 12:01:23 -08:00
Patrick Devine 604e43b28d
models: enable deepseek2 (deepseek v3.1 w/ MLA) on the new engine (#13151) 2025-11-18 22:03:50 -08:00
Jesse Gross 53985b3c4d kvcache: Use SetRows to store cache data
We currently copy data into the KV cache in contiguous buffers using
ggml_cpy(). ggml_set_rows() was introduced to allow scatter operation
so that contiguous buffers are no longer required. The direct primary
benefit of this is that we no longer need to perform defragmentation.

However, GGML recently removed an optimization for ggml_cpy() and
we picked it up in 544b673 "ggml update to b6840 (#12791)". This
caused a roughly 40% drop in token generation performance on CUDA
due to CUDA graphs no longer being used. By switching to
ggml_set_rows(), the original optimization is no longer necessary
and CUDA performance is restored.

Fixes #13112
2025-11-18 20:42:28 -08:00
Jesse Gross b6e02cbbd2 ggml: Automatically make tensors contiguous on reshape
GGML requires tensors to be contiguous for reshape and if
this is not the case, it will assert fail. Contiguous is an
expensive operation, so it's best to do it lazily when it is
actually required rather than ahead of time when it may not
be needed.
2025-11-18 20:42:28 -08:00
Grace 91935631ac
Renderer for Cogito v2 (#13139) 2025-11-18 19:06:34 -08:00
nicole pardal 8de30b568a
nomic-embed-text model implementation (#13071) 2025-11-18 18:28:10 -08:00
Daniel Hiltgen 485da9fd35
win: exit instead of abort (#13138)
Calling abort on windows triggers the C++ runtime to attempt a debugger
attach, which causes the crashed runners to hang instead of exit, leading
to a timeout instead of a fast failure during discovery.
2025-11-18 16:33:33 -08:00
Michael Yang 0796d79d19 cuda: skip large batches
cuda panics on batches larger than 1024 so skip those and fallback to
cpu
2025-11-18 16:11:37 -08:00
Michael Yang 92981ae3f2 deepseekocr 2025-11-18 16:11:37 -08:00
Lhiam Andrei Lingco 8ed1adf3db
docs: fix typo in vscode.mdx (#13116) 2025-11-18 13:18:42 -08:00
Michael Yang 440a3823a6
fix(tokenizer): add special tokens to empty inputs (#13091) 2025-11-18 11:16:56 -08:00
Michael Yang 718961de68
migrate to golangci-lint v2 (#13109)
* migrate to golangci-lint v2
* copyloopvar
2025-11-18 11:00:26 -08:00
SamareshSingh 330f62a7fa
docs: add Void Editor to community integrations (#13124)
Void is an open source AI code editor and Cursor alternative that supports
Ollama. It's built on VS Code and allows users to connect directly to Ollama
for private LLM usage without going through a middleman backend.

Key features:
- Open source Cursor alternative
- Direct Ollama integration
- VS Code fork with full compatibility
- Agent mode and MCP support
- Works with any open source model

Fixes #12919

Signed-off-by: Samaresh Kumar Singh <ssam3003@gmail.com>
2025-11-17 19:20:36 -08:00
Grace 584e2d646f
Add deepseek v3.1 (#13063)
* Add mla for flash attention
* Revert to using chunks
2025-11-17 18:03:21 -08:00
Eva H 1fd4cb87b2
app/cmd: restrict ollama:// URL scheme to supported paths (#13120) 2025-11-17 20:10:45 -05:00
Cerussite 4aba2e8b72
discover: Support cgroups cores and memory limitations (#10292)
* Add supports for cgroups cores and memory limitations

* fix compile error and add logs

* remove cpu info log
2025-11-17 16:13:03 -08:00
Daniel Hiltgen 2f36d769aa
bring back sysfs based VRAM information for AMD (#12871)
* build: optimize dockerfile context for iterating

This moves the copy of the source into the layer AFTER
doing software installs so we don't have to go through
the RPM install for cuda, etc. every time you touch a
source file.

* amd: implement linux sysfs based VRAM lookup

This adds a C++ implementation of sysfs DRM VRAM discovery
for more accurate free VRAM data on linux for AMD GPUs.
2025-11-17 15:40:58 -08:00
Daniel Hiltgen 399eacf486
ci: fix missing vulkan binaries in linux bundles (#13123) 2025-11-17 15:39:59 -08:00
Eva H 231cc878cb
app/ui: fix to point ollama client to ui backend in dev mode (#13079) 2025-11-17 12:58:35 -05:00
Jeffrey Morgan aa676b313f
docs: link to ollama.com instead of hardcoding list of cloud models (#13110) 2025-11-16 20:56:09 -08:00
omahs dd0ed0ef17
docs: fix typos in repository documentation (#10683) 2025-11-15 20:22:29 -08:00
Joel Bryan Juliano d5649821ae
readme: add Kdeps to community integrations (#11877)
Kdeps is an AI framework for building Dockerized full-stack AI
applications declaratively and uses Ollama LLM models on the
backend
2025-11-15 19:19:03 -08:00
pierwill 4cea757e70
server: clean up manifest documentation (#12995)
Co-authored-by: pierwill <pierwill@users.noreply.github.com>
2025-11-15 19:13:15 -08:00
Vignesh Skanda a751bc159c
llama: test case typo and readability improvements (#13078) 2025-11-15 18:54:27 -08:00
Laurențiu Nicola 5d31242fbf
discover: fix typos in runner.go (#13096) 2025-11-15 18:52:54 -08:00
Patrick Devine d7fd72193f
tests: basic benchmarking test framework (#12964)
This change adds a basic benchmarking test framework for Ollama which can
be used to determine the prefill, eval, load duration, and total duration
for running a given model or models.
2025-11-15 18:17:40 -08:00
Daniel Hiltgen 72ff5b9d8c
log: warn if user overrides detected (#13088)
Many failed GPU discovery issues recently can be traced to incorrect override settings.
This extra logging should help quickly spot these and guide users to try unsetting them first.
2025-11-14 14:36:28 -08:00
Parth Sareen ce29f695b4
docs: add logprobs to openapi (#13090) 2025-11-14 14:14:58 -08:00
Michael Yang 12b174b10e
fix tensor merge (#13053) 2025-11-13 15:32:34 -08:00