[PR #10540] [MERGED] ollamarunner: Re-enable worst case graph preallocation. #18541

Closed
opened 2026-04-16 06:38:37 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/10540
Author: @jessegross
Created: 5/2/2025
Status: Merged
Merged: 5/2/2025
Merged by: @jessegross

Base: main ← Head: jessegross/talloc


📝 Commits (1)

  • 2ed1140 ollamarunner: Re-enable worst case graph preallocation.

📊 Changes

3 files changed (+46 additions, -7 deletions)

View changed files

➕ llama/patches/0018-ggml-Don-t-assert-fail-when-tensor-data-changes-1322.patch (+38 -0)
📝 ml/backend/ggml/ggml/src/ggml-alloc.c (+4 -1)
📝 runner/ollamarunner/runner.go (+4 -6)

📄 Description

Worst case graph preallocation was disabled by a27462b "ollamarunner: Temporarily disable worst case graph preallocation" because it caused crashes with large batches when not running on a GPU.

This backports upstream llama.cpp commit f057808 "ggml: Don't assert fail when tensor data changes (#13222)", which fixes the underlying bug and allows the earlier workaround to be reverted.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-16 06:38:37 -05:00
Reference: github-starred/ollama#18541