Commit Graph

4901 Commits

Author SHA1 Message Date
Joel Bryan Juliano d5649821ae
readme: add Kdeps to community integrations (#11877)
Kdeps is an AI framework for declaratively building Dockerized
full-stack AI applications, using Ollama LLM models on the
backend.
2025-11-15 19:19:03 -08:00
pierwill 4cea757e70
server: clean up manifest documentation (#12995)
Co-authored-by: pierwill <pierwill@users.noreply.github.com>
2025-11-15 19:13:15 -08:00
Vignesh Skanda a751bc159c
llama: test case typo and readability improvements (#13078) 2025-11-15 18:54:27 -08:00
Laurențiu Nicola 5d31242fbf
discover: fix typos in runner.go (#13096) 2025-11-15 18:52:54 -08:00
Patrick Devine d7fd72193f
tests: basic benchmarking test framework (#12964)
This change adds a basic benchmarking test framework for Ollama which can
be used to measure the prefill, eval, load, and total durations
for running a given model or models.
2025-11-15 18:17:40 -08:00
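The durations above map naturally onto Go's testing benchmarks. A minimal sketch, assuming a hypothetical runOnce helper and illustrative metric names, not the framework's actual API:

```go
package bench

import (
	"testing"
	"time"
)

// metrics mirrors the per-phase durations the framework reports; the
// field names are illustrative, not Ollama's actual types.
type metrics struct {
	Load    time.Duration // model load time
	Prefill time.Duration // prompt processing time
	Eval    time.Duration // token generation time
	Total   time.Duration
}

// runOnce is a stand-in for a single generation against a named model.
func runOnce(model, prompt string) metrics {
	start := time.Now()
	_, _ = model, prompt
	// ... issue the request to a running server here ...
	return metrics{Total: time.Since(start)}
}

func BenchmarkModel(b *testing.B) {
	for i := 0; i < b.N; i++ {
		m := runOnce("llama3.2", "Why is the sky blue?")
		b.ReportMetric(m.Total.Seconds(), "total_s")
		b.ReportMetric(m.Prefill.Seconds(), "prefill_s")
	}
}
```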
Daniel Hiltgen 72ff5b9d8c
log: warn if user overrides detected (#13088)
Many recent GPU discovery failures can be traced to incorrect override settings.
This extra logging should help spot these quickly and guide users to try unsetting them first.
2025-11-14 14:36:28 -08:00
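A minimal sketch of the idea, assuming a hypothetical list of override variables; the real set checked by the commit lives in Ollama's discovery code:

```go
package main

import (
	"log/slog"
	"os"
)

// warnOnOverrides logs any discovery-related override that is set. The
// variable list below is a guess at common culprits, not the exact set.
func warnOnOverrides() {
	overrides := []string{
		"CUDA_VISIBLE_DEVICES",
		"HIP_VISIBLE_DEVICES",
		"OLLAMA_LLM_LIBRARY",
	}
	for _, name := range overrides {
		if v, ok := os.LookupEnv(name); ok {
			slog.Warn("user override detected; if GPU discovery fails, try unsetting it",
				"name", name, "value", v)
		}
	}
}

func main() { warnOnOverrides() }
```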
Parth Sareen ce29f695b4
docs: add logprobs to openapi (#13090) 2025-11-14 14:14:58 -08:00
Michael Yang 12b174b10e
fix tensor merge (#13053) 2025-11-13 15:32:34 -08:00
Michael Yang 333203d871
chore: update models to use slice/chunk/chunksections (#12934)
* use slice/chunks

* bert

* llama4

* gemma3n

* gptoss

* mistral3

* qwen3vl

* qwen25vl

* deepseek2

* remove unused ops
2025-11-13 15:20:12 -08:00
Parth Sareen c114987523
logprob: add bytes to logprobs (#13068) 2025-11-13 13:49:25 -08:00
Michael Yang b48083f33f
ml: add slice operation (#12870)
* slice

* chunk, chunksections
2025-11-13 13:28:21 -08:00
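A toy illustration of the chunk semantics on a 1-D buffer; the real ops act on ml.Tensor with dimension arguments, which this sketch omits:

```go
package main

import "fmt"

// chunk splits buf into n contiguous, equally sized pieces.
func chunk(buf []float32, n int) [][]float32 {
	size := len(buf) / n
	out := make([][]float32, 0, n)
	for i := 0; i < n; i++ {
		out = append(out, buf[i*size:(i+1)*size])
	}
	return out
}

func main() {
	x := []float32{1, 2, 3, 4, 5, 6}
	fmt.Println(chunk(x, 3)) // [[1 2] [3 4] [5 6]]
}
```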
nicole pardal 482bec824f
embeddings: added cli command to embedding docs (#12993) 2025-11-13 13:24:13 -08:00
Kowyo 684a9a8c5a
docs: fix typo (VSCode -> VS Code) (#13072) 2025-11-12 20:49:33 -08:00
Jeffrey Morgan 54a76d3773
app: remove source code for previous JavaScript-based macOS app (#13067)
The code in this directory has been replaced with the
new Go version in the 'app' directory.
2025-11-12 20:37:43 -08:00
Radhi 8a75d8b015
readme: add AI UI to community integrations (#13035) 2025-11-12 17:08:50 -08:00
Jeffrey Morgan f206357412
readme: fix incorrect header in community integrations (#13065) 2025-11-12 17:00:16 -08:00
Daniel Hiltgen 8224cd9063
ci: fix win vulkan (#13062) 2025-11-12 10:32:24 -08:00
Daniel Hiltgen 6286d9a3a5
Enable Vulkan with a temporary opt-in setting (#12931)
* docs: vulkan information

* Revert "CI: Set up temporary opt-out Vulkan support (#12614)"

This reverts commit 8b6e5baee7.

* vulkan: temporary opt-in for Vulkan support

Revert this once we're ready to enable by default.

* win: add vulkan CI build
2025-11-12 08:40:38 -08:00
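A sketch of how such a temporary opt-in gate might look; the environment variable name here is an assumption, not confirmed by the commit message:

```go
package main

import (
	"fmt"
	"os"
)

// vulkanEnabled gates Vulkan discovery behind an opt-in environment
// variable (name assumed for illustration; see the docs for the actual
// setting). Revert to default-on once support is ready.
func vulkanEnabled() bool {
	return os.Getenv("OLLAMA_VULKAN") == "1"
}

func main() {
	if vulkanEnabled() {
		fmt.Println("enumerating Vulkan devices")
	} else {
		fmt.Println("Vulkan disabled; opt in via environment variable")
	}
}
```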
Daniel Hiltgen 3a9e8e9fd4
vulkan: temporary carry of vulkan fixes (#12971)
This should be reverted once we update ggml past b6897
2025-11-12 08:31:40 -08:00
Jeffrey Morgan cb1cb06478
docs: rename api-reference.md back to api.md since redirect stopped working (#13056) 2025-11-11 15:53:06 -08:00
Jeffrey Morgan 2d5e066c8c
docs: fix openapi.yaml warnings, rename api.md to api-reference.md (#12904) 2025-11-11 15:39:35 -08:00
Bruce MacDonald 15968714bd
docs/openapi: document that delete and copy responses are empty (#13055)
Some route endpoints return an empty response with a 200 OK. These should be documented in the OpenAPI doc. Note that the previously documented delete response was incorrect.
2025-11-11 15:07:21 -08:00
Jesse Gross 8bf38552de llm: Prefer dedicated GPUs over iGPUs when allocating memory
We currently assign model layers to GPUs according to free VRAM,
which assumes that GPU performance is roughly equal. This does not
work well for mixed dGPU and iGPU systems because iGPUs typically
use system memory, which is large but slow.
This instead assigns layers to dGPUs first and then iGPUs.

In the future, this could be generalized to a more fine-grained
notion of GPU performance, but the dGPU vs. iGPU gap is the most
extreme.
2025-11-11 13:11:08 -08:00
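A minimal sketch of the dGPU-first ordering, with simplified stand-ins for Ollama's device structs:

```go
package main

import (
	"fmt"
	"sort"
)

// gpu is a simplified stand-in for Ollama's device info.
type gpu struct {
	Name       string
	Integrated bool
	FreeVRAM   uint64 // bytes
}

// orderForOffload sorts devices so discrete GPUs come first, breaking
// ties by free memory, mirroring the dGPU-before-iGPU idea.
func orderForOffload(gpus []gpu) {
	sort.SliceStable(gpus, func(i, j int) bool {
		if gpus[i].Integrated != gpus[j].Integrated {
			return !gpus[i].Integrated // discrete before integrated
		}
		return gpus[i].FreeVRAM > gpus[j].FreeVRAM
	})
}

func main() {
	devs := []gpu{
		{"iGPU", true, 32 << 30},
		{"dGPU", false, 8 << 30},
	}
	orderForOffload(devs)
	fmt.Println(devs[0].Name) // dGPU wins despite less free memory
}
```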
Jesse Gross b13fbad0fe llm: Separate llamaServer and ollamaServer code paths
Originally, llamaServer represented old memory estimates, which
could be used with either the old or new engine. ollamaServer was
used only for the new estimates and the new engine. Since these
implementations did not map directly to an engine, there was
engine-specific code in common code paths.

Now that new estimates are always used for the new engine, there is
a direct mapping between server type and engine. This separates out
most of the engine-specific code into the correct implementation
to make things easier to understand.
2025-11-11 13:11:08 -08:00
Jesse Gross f560bd077f llm: Use Ollama engine memory layouts for both old and new engines
Currently for both the old and new engines, there is code to
calculate how much memory is required for a model and lay out
the layers onto GPUs. This reuses the new engine's layout code
for the old engine as well, bringing them closer together. The
old engine continues to use its current method of estimating
required memory.

This reduces maintenance effort and improves consistency, as new
features only need to be implemented in one place. The newer code
is also more accurate, especially with multiple GPUs.
2025-11-11 13:11:08 -08:00
Jesse Gross 4372d0bfef llamarunner: Respect device ordering for offloaded layers
We used to control the way that llama.cpp saw devices using
CUDA_VISIBLE_DEVICES or similar. This would ensure that the layers
offloaded to a device were actually the ones intended. This is
particularly important because we might reorder devices based on
free memory or performance.

When we started explicitly scheduling layers, this logic went
away but the llamarunner didn't have any way to set the correct
order of devices. This meant that the correct number of layers
would be assigned to a device but not necessarily the layers
that were expected. This change sets up the devices correctly
based on the offload information.
2025-11-11 13:11:08 -08:00
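For illustration, the old env-var mechanism looked roughly like the sketch below; the actual change instead configures device order inside the llamarunner:

```go
package main

import (
	"fmt"
	"strings"
)

// visibleDevices renders the scheduler's device order into the
// CUDA_VISIBLE_DEVICES form llama.cpp honors. The helper is
// illustrative, not the code this commit adds.
func visibleDevices(orderedIDs []string) string {
	return "CUDA_VISIBLE_DEVICES=" + strings.Join(orderedIDs, ",")
}

func main() {
	// Suppose the scheduler reordered GPU 1 ahead of GPU 0 by free memory.
	fmt.Println(visibleDevices([]string{"1", "0"}))
}
```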
Eva H 31361c4d3c
app/ui: do not send thinking to prevent errors with cloud provider 2025-11-11 16:09:24 -05:00
Baptiste Jamin 59241c5bee
server: add logprobs and top_logprobs support to Ollama's API (#12899)
Adds logprobs support to Ollama's API including support for Ollama's
OpenAI-compatible API. By specifying the new 'logprobs' boolean parameter
in the API, Ollama will return the log probabilities for each token generated.
An integer 'top_logprobs' parameter can also be specified, up to a
value of 20. When specified, the API will also return that number of
most likely tokens at each token position.

Co-authored-by: Baptiste Jamin <baptiste@crisp.chat>
2025-11-11 08:49:50 -08:00
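A hedged example of exercising the new parameters against a local server; the field names follow the commit description, and the exact request shape in the real API may differ:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

func main() {
	body, _ := json.Marshal(map[string]any{
		"model":        "llama3.2",
		"prompt":       "Why is the sky blue?",
		"logprobs":     true, // return per-token log probabilities
		"top_logprobs": 5,    // up to 20 alternatives per position
		"stream":       false,
	})
	resp, err := http.Post("http://localhost:11434/api/generate",
		"application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	out, _ := io.ReadAll(resp.Body)
	fmt.Println(string(out))
}
```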
Eva Ho 2a9b61f099 address comment 2025-11-11 08:58:55 -05:00
Sheikh 6df4208836
docs: fix metal gpu section header (#13045) 2025-11-10 21:51:22 -08:00
Eva Ho 9d615cdaa0 fix test 2025-11-10 20:13:50 -05:00
Eva Ho 6a818b8a09 clean up 2025-11-10 19:08:42 -05:00
Eva Ho 2aaf29acb5 app/ui: do not send thinking to prevent errors with cloud provider 2025-11-10 19:05:00 -05:00
Eva H a42f826acb
app/ui: using streamdown AI elements for markdown rendering 2025-11-10 12:05:59 -05:00
Bruce MacDonald e10a3533a5
app/docs: remove out of date storybook instructions (#13006) 2025-11-08 13:28:18 -08:00
Patrick Devine 91ec3ddbeb
bugfix: don't include both consolidated.safetensors and model-*.safetensors (#13010) 2025-11-07 22:41:57 -08:00
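A sketch of the deduplication idea, assuming simple filename matching; the actual selection logic is an assumption based on the commit title:

```go
package main

import (
	"fmt"
	"path/filepath"
)

// selectSafetensors keeps only one copy of the weights when a repo ships
// both sharded model-*.safetensors files and a consolidated.safetensors.
func selectSafetensors(files []string) []string {
	var sharded, consolidated []string
	for _, f := range files {
		base := filepath.Base(f)
		if ok, _ := filepath.Match("model-*.safetensors", base); ok {
			sharded = append(sharded, f)
		} else if base == "consolidated.safetensors" {
			consolidated = append(consolidated, f)
		}
	}
	if len(sharded) > 0 {
		return sharded // including both would double-count the weights
	}
	return consolidated
}

func main() {
	fmt.Println(selectSafetensors([]string{
		"consolidated.safetensors",
		"model-00001-of-00002.safetensors",
		"model-00002-of-00002.safetensors",
	}))
}
```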
Parth Sareen 755ac3b069
docs: update n8n URL for Ollama (#12994) 2025-11-07 20:07:26 -08:00
Daniel Hiltgen 60b8973559
doc: re-add login autostart faq and GPU updates (#12975)
* doc: re-add login autostart faq

This appears to have been accidentally dropped during the doc migration.

* docs: GPU updates lost on the doc update

* review comments: improve windows login disable instructions
2025-11-07 11:21:44 -08:00
Tomoya Fujita d2ef679d42
docs: fix 404 link to modelfile documentation (#12996) 2025-11-07 10:06:46 -08:00
Thomas Stocker d4e0da0890
Remove unnecessary macOS 13 and lower patches (#12656)
* Remove unnecessary macOS 13 patch

* Remove unnecessary macOS version guard patch

* rename patches

* remove macOS 13 patch again

* rename files
2025-11-06 15:52:56 -08:00
Jeffrey Morgan 565b802a6b
openai: fix tool call ID mapping (#12988) 2025-11-06 15:26:25 -08:00
Saifeddine ALOUI 6c79e6c09a
readme: add security tools section and Ollama fortress to community integrations (#12981) 2025-11-06 15:21:13 -08:00
breatn 780762f9d2
server: fix duplicate 'is' typo in comment (#12985) 2025-11-06 14:44:44 -08:00
Jeffrey Morgan 30fcc71983
api: add omitempty to required tool function parameter type (#12989) 2025-11-06 14:08:55 -08:00
Eva Ho 3501a4bdf9 address comment 2025-11-06 16:49:22 -05:00
Eva H 73a0cafc1e
Merge pull request #12973 from macarronesc/main
feat: add support for WebP images in Ollama's app
2025-11-06 16:31:46 -05:00
Eva Ho e309c80474 address comments 2025-11-06 13:49:59 -05:00
Daniel Hiltgen 544b6739dd
ggml update to b6840 (#12791) 2025-11-06 10:19:22 -08:00
Daniel Alejandro Coll Tejeda a4a53692f8 refactor: remove GIF support from image validation tests and logging 2025-11-06 09:09:51 +00:00
7394112478 c4ba257c64
readme: remove 404 link (#11351) 2025-11-05 23:36:59 -08:00