[PR #11003] [CLOSED] server: do partial gguf kv read for capability check #60107

Closed
opened 2026-04-29 15:01:48 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/11003
Author: @BruceMacD
Created: 6/6/2025
Status: Closed

Base: mxyng/gguf ← Head: brucemacd/partial-read-caps


📝 Commits (6)

  • 8e3998b wip: incremental gguf parser
  • 735e807 gguf: update test to not rely on gguf on disc
  • e8d1933 fix lint unneeded conversions
  • d3cbbbf more type conversions
  • 22aed78 Update gguf_test.go
  • 3cf6283 server: do partial gguf kv read for capability check

📊 Changes

9 files changed (+1631 additions, -87 deletions)

View changed files

fs/gguf/gguf.go (+350 -0)
fs/gguf/gguf_test.go (+320 -0)
fs/gguf/keyvalue.go (+102 -0)
fs/gguf/keyvalue_test.go (+208 -0)
fs/gguf/lazy.go (+88 -0)
fs/gguf/reader.go (+34 -0)
fs/gguf/tensor.go (+284 -0)
📝 server/images.go (+13 -15)
📝 server/images_test.go (+232 -72)

📄 Description

Adding model capabilities to the tags endpoint made responses slow when many models are installed.
https://github.com/ollama/ollama/pull/10174#issuecomment-2948892454

This change uses the new partial GGUF reading capability, which reads only the key-value metadata needed for the capability check instead of the full file, to speed up the response. Comparison below:

Before returning capabilities:
❯ time ollama ls
ollama ls  0.01s user 0.01s system 78% cpu 0.028 total

Returning capabilities with full file read:
❯ time ollama ls
ollama ls  0.01s user 0.01s system 3% cpu 0.505 total

Returning capabilities with partial read:
❯ time ollama ls
ollama ls  0.01s user 0.01s system 5% cpu 0.345 total

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-29 15:01:48 -05:00
Reference: github-starred/ollama#60107