History

Gabe Goodhart b95693056c feat: llama.cpp bump (17f7f4) for SSM performance improvements (#13408 ) * feat: Bump llama.cpp to the latest master (17f7f4b) This brings in significant improvements to prefill performance for all models using the SSM_CONV and SSM_SCAN ops (granite4, jamba, falcon-h, nemotron-h, Qwen3 Next) on Apple Metal. See https://github.com/ggml-org/llama.cpp/pull/17876 Branch: LlamaCPPMetalSSMImprovements Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * feat: Update patches 1-4 Branch: LlamaCPPMetalSSMImprovements Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * fix: Update patches 5-12 Branch: LlamaCPPMetalSSMImprovements Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * feat: Update patches 13-18 Branch: LlamaCPPMetalSSMImprovements Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * feat: Update patch 20 Branch: LlamaCPPMetalSSMImprovements Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * feat: Update patches 21-31 Branch: LlamaCPPMetalSSMImprovements Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * feat: Sync vendored code The two files I'm not sure about here are the swap from gemma3-iswa.cpp to gemma3.cpp (I chose to include this because I think it's required), and the inclusion of `ggml-zendnn.h` which I chose to omit. Branch: LlamaCPPMetalSSMImprovements Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> --------- Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>		2025-12-10 12:59:27 -08:00
..
llama.cpp	feat: llama.cpp bump (17f7f4) for SSM performance improvements (#13408 )	2025-12-10 12:59:27 -08:00
patches	feat: llama.cpp bump (17f7f4) for SSM performance improvements (#13408 )	2025-12-10 12:59:27 -08:00
.gitignore	Re-introduce the `llama` package (#5034 )	2024-10-08 08:53:54 -07:00
README.md	docs: improve syntax highlighting in code blocks (#8854 )	2025-02-07 09:55:07 -08:00
build-info.cpp	feat: llama.cpp bump (17f7f4) for SSM performance improvements (#13408 )	2025-12-10 12:59:27 -08:00
build-info.cpp.in	chore: update gitattributes (#8860 )	2025-02-05 16:37:18 -08:00
llama.go	llamarunner: Respect device ordering for offloaded layers	2025-11-11 13:11:08 -08:00
llama_test.go	llama: test case typo and readability improvements (#13078 )	2025-11-15 18:54:27 -08:00
sampling_ext.cpp	update vendored llama.cpp and ggml (#11823 )	2025-08-14 14:42:58 -07:00
sampling_ext.h	api: remove unused sampling parameters (#10581 )	2025-05-08 08:31:08 -07:00

README.md

`llama`

This package provides Go bindings to llama.cpp.

Vendoring

Ollama vendors llama.cpp and ggml. While we generally strive to contribute changes back upstream to avoid drift, we carry a small set of patches which are applied to the tracking commit.

If you update the vendoring code, start by running the following command to establish the tracking llama.cpp repo in the ./vendor/ directory.

make -f Makefile.sync apply-patches

Updating Base Commit

Pin to new base commit

To change the base commit, update FETCH_HEAD in Makefile.sync.

When updating to a newer base commit, the existing patches may not apply cleanly and require manual merge resolution.

Start by applying the patches. If any of the patches have conflicts, the git am will stop at the first failure.

make -f Makefile.sync apply-patches

If there are conflicts, you will see an error message. Resolve the conflicts in ./vendor/, and continue the patch series with git am --continue and rerun make -f Makefile.sync apply-patches. Repeat until all patches are successfully applied.

Once all patches are applied, commit the changes to the tracking repository.

make -f Makefile.sync format-patches sync

Generating Patches

When working on new fixes or features that impact vendored code, use the following model. First get a clean tracking repo with all current patches applied:

make -f Makefile.sync clean apply-patches

Iterate until you're ready to submit PRs. Once your code is ready, commit a change in the ./vendor/ directory, then generate the patches for ollama with

make -f Makefile.sync format-patches

In your ./vendor/ directory, create a branch, and cherry-pick the new commit to that branch, then submit a PR upstream to llama.cpp.

Commit the changes in the ollama repo and submit a PR to Ollama, which will include the vendored code update with your change, along with the patches.

After your PR upstream is merged, follow the Updating Base Commit instructions above, however first remove your patch before running apply-patches since the new base commit contains your change already.