[PR #12772] [CLOSED] 3470a5c89 ggml-alloc : make gallocr prefer chunks that allow memory reuse (#16788) #76239

Closed
opened 2026-05-05 08:45:10 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/12772
Author: @inforithmics
Created: 10/24/2025
Status: Closed

Base: main ← Head: LlamaCppUpdate


📝 Commits (10+)

473e60d Update sync
b06374d update patch
4ecdba1 Fix Patch
d1317b2 Removed 0030 patch, is included in sync
ca02afd sync files
f7d98de Merge remote-tracking branch 'upstream/main' into LlamaCppUpdate
2b8b9ac Update to fb349848f vulkan: Handle FA with all -inf mask values (#16447)
865d0a9 fixing cuda
39cfca9 Update macos files
21c7b44 Commit 061f0ef use passed ops instead of hardcoded ops (#16712)

📊 Changes

108 files changed (+4627 additions, -1390 deletions)

View changed files

📝 Makefile.sync (+1 -1)
📝 llama/build-info.cpp (+1 -1)
📝 llama/llama.cpp/common/json-schema-to-grammar.cpp (+12 -12)
📝 llama/llama.cpp/src/llama-arch.cpp (+40 -0)
📝 llama/llama.cpp/src/llama-arch.h (+4 -0)
📝 llama/llama.cpp/src/llama-batch.h (+1 -1)
📝 llama/llama.cpp/src/llama-chat.cpp (+35 -2)
📝 llama/llama.cpp/src/llama-chat.h (+2 -0)
📝 llama/llama.cpp/src/llama-context.cpp (+2 -1)
📝 llama/llama.cpp/src/llama-graph.cpp (+109 -43)
📝 llama/llama.cpp/src/llama-graph.h (+7 -3)
📝 llama/llama.cpp/src/llama-hparams.h (+2 -0)
📝 llama/llama.cpp/src/llama-model.cpp (+322 -46)
📝 llama/llama.cpp/src/llama-model.h (+3 -0)
📝 llama/llama.cpp/src/llama-quant.cpp (+7 -1)
📝 llama/llama.cpp/src/llama-vocab.cpp (+1 -0)
📝 llama/llama.cpp/src/llama.cpp (+4 -0)
📝 llama/llama.cpp/tools/mtmd/clip-impl.h (+2 -0)
📝 llama/llama.cpp/tools/mtmd/clip.cpp (+15 -3)
📝 llama/patches/0001-ggml-backend-malloc-and-free-using-the-same-compiler.patch (+21 -21)

...and 80 more files

📄 Description

  • macOS "go generate ./..." is not clean.

Improvements on Vulkan:

  • granite4:small-h improves from 6 TG/s to 12 TG/s on an AMD integrated GPU

Known Problems:

  • granite4 crashes when not enough memory is available on the GPU (probably a bug in the memory estimation logic in the llama.cpp engine).

Things fixed with synchronization:

  • Included fix for PLaMo2: https://github.com/ollama/ollama/pull/12761
  • vulkan: deduplicate Microsoft Direct3D12 devices
  • Vulkan: Odd compute buffer behaviors at specific context breakpoints, version b6568 and above (#16759)

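The headline change pulled in by this sync is llama.cpp's "ggml-alloc : make gallocr prefer chunks that allow memory reuse" (upstream #16788): when the graph allocator places a tensor into a buffer chunk, it should favor a chunk where a previously freed block can be reused over one that only has fresh tail space, which keeps peak buffer sizes down. Below is a minimal sketch of that two-pass preference, using hypothetical types and names rather than the actual ggml-alloc internals:

```c
#include <stdio.h>
#include <stddef.h>
#include <stdbool.h>

/* Hypothetical chunk descriptor; the real gallocr tracks free blocks
 * per buffer chunk, this just models enough state for the heuristic. */
typedef struct {
    size_t free_blocks[8]; /* sizes of free (previously used) blocks */
    int    n_free;
    size_t tail;           /* untouched space at the end of the chunk */
} chunk_t;

/* true if an existing free block in this chunk can satisfy the request */
static bool chunk_can_reuse(const chunk_t *c, size_t size) {
    for (int i = 0; i < c->n_free; i++) {
        if (c->free_blocks[i] >= size) {
            return true;
        }
    }
    return false;
}

/* Two-pass selection: first prefer any chunk that can reuse a freed
 * block; only then fall back to a chunk with enough tail space, which
 * grows that chunk's high-water mark. */
static int pick_chunk(const chunk_t *chunks, int n, size_t size) {
    for (int i = 0; i < n; i++) {
        if (chunk_can_reuse(&chunks[i], size)) {
            return i;
        }
    }
    for (int i = 0; i < n; i++) {
        if (chunks[i].tail >= size) {
            return i;
        }
    }
    return -1; /* no fit: the allocator would create a new chunk */
}

int main(void) {
    chunk_t chunks[2] = {
        { .free_blocks = {0},   .n_free = 0, .tail = 1024 }, /* only tail space */
        { .free_blocks = {256}, .n_free = 1, .tail = 0 },    /* reusable block  */
    };
    /* a 200-byte request lands in chunk 1, reusing freed memory,
     * instead of consuming fresh space in chunk 0 */
    printf("chose chunk %d\n", pick_chunk(chunks, 2, 200));
    return 0;
}
```

Preferring reuse over fresh tail space is what lets repeated graph evaluations settle into a stable working set instead of steadily enlarging every chunk; this sketch only captures that ordering, not the real allocator's alignment, splitting, or multi-buffer logic.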
🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

Reference: github-starred/ollama#76239