* feat: Bump llama.cpp to the latest master (17f7f4b)
This brings in significant improvements to prefill performance for all
models using the SSM_CONV and SSM_SCAN ops (granite4, jamba, falcon-h,
nemotron-h, Qwen3 Next) on Apple Metal.
See https://github.com/ggml-org/llama.cpp/pull/17876
Branch: LlamaCPPMetalSSMImprovements
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* feat: Update patches 1-4
Branch: LlamaCPPMetalSSMImprovements
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* fix: Update patches 5-12
Branch: LlamaCPPMetalSSMImprovements
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* feat: Update patches 13-18
Branch: LlamaCPPMetalSSMImprovements
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* feat: Update patch 20
Branch: LlamaCPPMetalSSMImprovements
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* feat: Update patches 21-31
Branch: LlamaCPPMetalSSMImprovements
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* feat: Sync vendored code
The two files I'm not sure about here are the swap from gemma3-iswa.cpp to
gemma3.cpp (I chose to include this because I think it's required), and the
inclusion of `ggml-zendnn.h` which I chose to omit.
Branch: LlamaCPPMetalSSMImprovements
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
---------
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* Revert "vulkan: temporary cary of vulkan fixes (#12971)"
This reverts commit 3a9e8e9fd4.
* ggml update to b7087
* fix argsort on metal
* update to b7108
* fix bakllava regression
This model lacks the metadata for the projector type.
* update to b7209
* fix TopK perf
* only build arm code on arm
* build: optimize dockerfile context for iterating
This moves the copy of the source into the layer AFTER
doing software installs so we don't have to go through
the RPM install for cuda, etc. every time you touch a
source file.
* amd: implement linux sysfs based VRAM lookup
This adds a C++ implementation of sysfs DRM VRAM discovery
for more accurate free VRAM data on linux for AMD GPUs.