* feat: Bump llama.cpp to the latest master (17f7f4b)
This brings in significant improvements to prefill performance for all
models using the SSM_CONV and SSM_SCAN ops (granite4, jamba, falcon-h,
nemotron-h, Qwen3 Next) on Apple Metal.
See https://github.com/ggml-org/llama.cpp/pull/17876
Branch: LlamaCPPMetalSSMImprovements
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* feat: Update patches 1-4
Branch: LlamaCPPMetalSSMImprovements
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* fix: Update patches 5-12
Branch: LlamaCPPMetalSSMImprovements
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* feat: Update patches 13-18
Branch: LlamaCPPMetalSSMImprovements
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* feat: Update patch 20
Branch: LlamaCPPMetalSSMImprovements
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* feat: Update patches 21-31
Branch: LlamaCPPMetalSSMImprovements
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* feat: Sync vendored code
The two changes I'm not sure about here are the swap from gemma3-iswa.cpp to
gemma3.cpp (which I included because I think it's required) and the
inclusion of `ggml-zendnn.h` (which I chose to omit).
Branch: LlamaCPPMetalSSMImprovements
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
---------
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* Revert "vulkan: temporary cary of vulkan fixes (#12971)"
This reverts commit 3a9e8e9fd4.
* ggml update to b7087
* fix argsort on metal
* update to b7108
* fix bakllava regression
This model lacks the metadata for the projector type.
* update to b7209
* fix TopK perf
* only build arm code on arm