mirror of https://github.com/ollama/ollama
We currently copy data into the KV cache in contiguous buffers using
ggml_cpy(). ggml_set_rows() was introduced to allow scatter operations,
so contiguous buffers are no longer required. The primary direct
benefit is that we no longer need to perform defragmentation.
However, GGML recently removed an optimization for ggml_cpy() and
we picked it up in
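
To make the difference between the two write patterns concrete, here is a minimal sketch in plain Go. It does not use the GGML API: `copyContiguous` and `setRows` are hypothetical helpers that only model the behaviour described above, with the cache assumed to be a flat slice of fixed-size rows.

```go
package main

import "fmt"

// copyContiguous models a ggml_cpy()-style write (hypothetical helper):
// the new rows must land in one unbroken run starting at row `start`.
func copyContiguous(cache []float32, rowSize, start int, rows [][]float32) {
	for i, r := range rows {
		dst := (start + i) * rowSize
		copy(cache[dst:dst+rowSize], r)
	}
}

// setRows models a ggml_set_rows()-style scatter (hypothetical helper):
// rows[i] is written to cache row idx[i], so the free rows may be anywhere.
func setRows(cache []float32, rowSize int, idx []int, rows [][]float32) {
	for i, r := range rows {
		dst := idx[i] * rowSize
		copy(cache[dst:dst+rowSize], r)
	}
}

func main() {
	const rowSize = 2
	rows := [][]float32{{1, 2}, {3, 4}}

	// Contiguous write: needs two adjacent free rows (here rows 1 and 2).
	a := make([]float32, 4*rowSize)
	copyContiguous(a, rowSize, 1, rows)
	fmt.Println(a)

	// Scatter write: rows 3 and 0 are free but not adjacent.
	b := make([]float32, 4*rowSize)
	setRows(b, rowSize, []int{3, 0}, rows)
	fmt.Println(b)
}
```

With the contiguous variant, a cache whose free rows are scattered has to be compacted before a multi-row write can succeed. The scatter variant can use any free rows directly, which is why defragmentation becomes unnecessary.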
| Name |
|---|
| backend |
| nn |
| backend.go |
| device.go |
| path.go |