[PR #5244] [CLOSED] llm: suppress large allocations for GGUF arrays #58419

Closed
opened 2026-04-29 13:17:51 -05:00 by GiteaMirror · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/5244
Author: @bmizerany
Created: 6/23/2024
Status: Closed

Base: main ← Head: bmizerany/nosillyggufslurps


📝 Commits (1)

  • acbffa5 llm: suppress large allocations for GGUF arrays

📊 Changes

2 files changed (+31 additions, -7 deletions)


📝 llm/ggml.go (+1 -1)
📝 llm/gguf.go (+30 -6)

📄 Description

This introduces a small array type for holding GGUF arrays that prevents them from growing too large. It preserves the total size of the array but limits the number of elements that are actually allocated.

Extremely large GGUF arrays, such as token lists, are generally uninteresting to users and are not worth the memory overhead or the time spent allocating and freeing them. They are necessary for inference, but not for inspection.

The size of these arrays is, however, important in Ollama, so it is preserved in a separate field on the array type.
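The idea described above can be sketched as follows. This is an illustrative reconstruction, not the PR's actual code: the names `array`, `newArray`, and the `maxArraySize` cap are assumptions for the sketch, and the real implementation in llm/gguf.go may differ in naming and limits.

```go
package main

import "fmt"

// maxArraySize caps how many elements are actually allocated when
// decoding a GGUF array. The name and the specific limit here are
// illustrative; the PR may use different values.
const maxArraySize = 1024

// array holds a GGUF array whose element storage is bounded. The
// total element count from the file is kept in size even when
// values stores fewer entries, so callers can still report it.
type array struct {
	size   int   // total number of elements in the GGUF file
	values []any // at most maxArraySize decoded elements
}

// newArray records the true size but allocates backing storage
// for at most maxArraySize elements.
func newArray(size int) *array {
	n := size
	if n > maxArraySize {
		n = maxArraySize
	}
	return &array{size: size, values: make([]any, 0, n)}
}

// append keeps the first maxArraySize elements and silently drops
// the rest; the recorded total size is unaffected.
func (a *array) append(v any) {
	if len(a.values) < maxArraySize {
		a.values = append(a.values, v)
	}
}

func main() {
	a := newArray(100000) // e.g. a huge token list
	for i := 0; i < 100000; i++ {
		a.append(i)
	}
	fmt.Println(a.size, len(a.values)) // total size preserved, allocation capped
}
```

The key design point is that inspection code (e.g. `ollama show`) can still report how many tokens a model has without paying for a slice of every token string.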


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-29 13:17:51 -05:00

Reference: github-starred/ollama#58419