[PR #11058] gguf: even faster parsing by lazily reading array values #18696

Open
opened 2026-04-16 06:43:40 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/11058
Author: @mxyng
Created: 6/12/2025
Status: 🔄 Open

Base: main ← Head: mxyng/gguf


📝 Commits (3)

12e1357 benchmark tests
d9d980c lazy gguf arrays
8d97d4b use fs.gguf.File to show models

📊 Changes

7 files changed (+312 additions, -132 deletions)

📝 fs/ggml/gguf_test.go (+46 -0)
📝 fs/gguf/gguf.go (+78 -36)
📝 fs/gguf/gguf_test.go (+41 -0)
📝 fs/gguf/keyvalue.go (+47 -20)
📝 fs/gguf/reader.go (+6 -0)
📝 fs/gguf/tensor.go (+55 -48)
📝 server/routes.go (+39 -28)

📄 Description

This change speeds up GGUF parsing by discarding array items on the initial read and returning a struct that reads the items lazily using file offsets. The result is a noticeable performance improvement for both string arrays and numeric arrays in all cases.

An added benefit is that the array data is always available, since items can be read from the file on demand.
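The idea can be illustrated with a minimal sketch. This is not the PR's code: `lazyArray`, `scanArray`, `sample`, and the `file` interface are hypothetical names, and the layout is simplified to a little-endian uint64 item count followed by fixed-width items (a real GGUF array header also carries an item type, and string items are length-prefixed, so they cannot be skipped with a single seek). The initial pass records the offset and seeks past the items; the items are decoded only when `Values` is called.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// number lists the fixed-width item types this sketch supports.
type number interface {
	~float32 | ~int32 | ~uint32
}

// file is the minimal interface the sketch needs: sequential reads and
// seeks for the initial pass, random access for the lazy read.
type file interface {
	io.ReadSeeker
	io.ReaderAt
}

// lazyArray (hypothetical) remembers where the items live, not the items.
type lazyArray[T number] struct {
	r      io.ReaderAt
	offset int64  // byte offset of the first item
	count  uint64 // number of items
}

// Values decodes the items on demand from the recorded offset.
func (a lazyArray[T]) Values() ([]T, error) {
	vals := make([]T, a.count)
	sr := io.NewSectionReader(a.r, a.offset, int64(binary.Size(vals)))
	if err := binary.Read(sr, binary.LittleEndian, vals); err != nil {
		return nil, err
	}
	return vals, nil
}

// scanArray reads only the item count, records the current offset, and
// seeks past the items so parsing can continue with the next value.
func scanArray[T number](f file) (lazyArray[T], error) {
	var count uint64
	if err := binary.Read(f, binary.LittleEndian, &count); err != nil {
		return lazyArray[T]{}, err
	}
	offset, err := f.Seek(0, io.SeekCurrent)
	if err != nil {
		return lazyArray[T]{}, err
	}
	var zero T
	size := int64(binary.Size(zero)) * int64(count)
	if _, err := f.Seek(size, io.SeekCurrent); err != nil {
		return lazyArray[T]{}, err
	}
	return lazyArray[T]{r: f, offset: offset, count: count}, nil
}

// sample builds an in-memory "file": a 3-item float32 array followed by
// an unrelated uint32 that the parser should reach after the skip.
func sample() *bytes.Reader {
	var buf bytes.Buffer
	binary.Write(&buf, binary.LittleEndian, uint64(3))
	binary.Write(&buf, binary.LittleEndian, []float32{1.5, 2.5, 3.5})
	binary.Write(&buf, binary.LittleEndian, uint32(0xCAFE))
	return bytes.NewReader(buf.Bytes())
}

func main() {
	f := sample()
	arr, err := scanArray[float32](f)
	if err != nil {
		panic(err)
	}
	var next uint32 // the value stored after the array
	if err := binary.Read(f, binary.LittleEndian, &next); err != nil {
		panic(err)
	}
	vals, err := arr.Values()
	if err != nil {
		panic(err)
	}
	fmt.Println(arr.count, next == 0xCAFE, vals)
}
```

Because `Values` goes through `io.ReaderAt`, it does not move the sequential read position, so a lazy read can happen at any point without disturbing the rest of the parse.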

```
goos: darwin
goarch: arm64
pkg: github.com/ollama/ollama/fs/gguf
cpu: Apple M3 Max
BenchmarkReadArray/float32-16               8242            145775 ns/op
BenchmarkReadArray/string-16                 157           7694131 ns/op
BenchmarkReadArray/int32-16                 7747            149120 ns/op
BenchmarkReadArray/uint32-16                8326            146902 ns/op
PASS
ok      github.com/ollama/ollama/fs/gguf        7.287s
```

```
goos: darwin
goarch: arm64
pkg: github.com/ollama/ollama/fs/ggml
cpu: Apple M3 Max
BenchmarkReadArray/float32-maxArraySize=-1-16                 67          17617177 ns/op
BenchmarkReadArray/float32-maxArraySize=0-16                  67          17463565 ns/op
BenchmarkReadArray/float32-maxArraySize=1024-16               68          17402137 ns/op
BenchmarkReadArray/string-maxArraySize=-1-16                  45          23541357 ns/op
BenchmarkReadArray/string-maxArraySize=0-16                   93          12076803 ns/op
BenchmarkReadArray/string-maxArraySize=1024-16                93          12190660 ns/op
BenchmarkReadArray/int32-maxArraySize=-1-16                   68          17309949 ns/op
BenchmarkReadArray/int32-maxArraySize=0-16                    66          17324967 ns/op
BenchmarkReadArray/int32-maxArraySize=1024-16                 68          17315369 ns/op
BenchmarkReadArray/uint32-maxArraySize=-1-16                  67          17362288 ns/op
BenchmarkReadArray/uint32-maxArraySize=0-16                   67          17207407 ns/op
BenchmarkReadArray/uint32-maxArraySize=1024-16                68          17178488 ns/op
PASS
ok      github.com/ollama/ollama/fs/ggml        20.938s
```
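To put the two runs side by side, the ns/op figures can be turned into ratios. The sketch below copies the numbers from the output above and divides the `fs/ggml` result (`maxArraySize=-1` case) by the `fs/gguf` result for each item type. It assumes the two `BenchmarkReadArray` benchmarks decode comparable input, which the description does not state explicitly; `result`, `results`, and `ratio` are names made up for this illustration.

```go
package main

import "fmt"

// result pairs the ns/op figures for one array item type, copied from
// the benchmark output in the PR description.
type result struct {
	name string
	ggml float64 // fs/ggml, maxArraySize=-1
	gguf float64 // fs/gguf
}

// results returns the four pairs reported in the description.
func results() []result {
	return []result{
		{"float32", 17617177, 145775},
		{"string", 23541357, 7694131},
		{"int32", 17309949, 149120},
		{"uint32", 17362288, 146902},
	}
}

// ratio reports how many times longer the fs/ggml run took per op.
func ratio(r result) float64 {
	return r.ggml / r.gguf
}

func main() {
	for _, r := range results() {
		fmt.Printf("%-8s %.1fx\n", r.name, ratio(r))
	}
}
```

Under that assumption, the numeric array types show a ratio of two orders of magnitude, while the string case shows a much smaller one, roughly a factor of three.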

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-16 06:43:40 -05:00
Reference: github-starred/ollama#18696