[PR #5117] [MERGED] Handle models with divergent layer sizes #22224

Closed
opened 2026-04-19 16:10:57 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/5117
Author: @dhiltgen
Created: 6/18/2024
Status: Merged
Merged: 6/18/2024
Merged by: @dhiltgen

Base: main ← Head: fix_prediction


📝 Commits (1)

  • 359b15a Handle models with divergent layer sizes

📊 Changes

1 file changed (+6 additions, -0 deletions)


📝 llm/memory.go (+6 -0)

📄 Description

The recent refactoring of the memory prediction assumed all layers are the same size, but for some models (like deepseek-coder-v2) this is not the case, so our predictions were significantly off: in the logs below, memory.required.full is 2.4 GiB without the fix and 9.2 GiB with it.

Without the fix:

time=2024-06-18T11:03:42.708-07:00 level=INFO source=memory.go:303 msg="offload to metal" layers.requested=-1 layers.model=28 layers.offload=28 layers.split="" memory.available="[96.0 GiB]" memory.required.full="2.4 GiB" memory.required.partial="2.4 GiB" memory.required.kv="432.0 MiB" memory.required.allocations="[2.4 GiB]" memory.weights.total="1.6 GiB" memory.weights.repeating="1.4 GiB" memory.weights.nonrepeating="164.1 MiB" memory.graph.full="72.0 MiB" memory.graph.partial="72.0 MiB"

With the fix:

time=2024-06-18T11:02:47.707-07:00 level=INFO source=memory.go:309 msg="offload to metal" layers.requested=-1 layers.model=28 layers.offload=28 layers.split="" memory.available="[96.0 GiB]" memory.required.full="9.2 GiB" memory.required.partial="9.2 GiB" memory.required.kv="432.0 MiB" memory.required.allocations="[9.2 GiB]" memory.weights.total="8.4 GiB" memory.weights.repeating="8.3 GiB" memory.weights.nonrepeating="164.1 MiB" memory.graph.full="72.0 MiB" memory.graph.partial="72.0 MiB"

Partial fix for #5113 but we'll need additional graph updates...
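
To illustrate why assuming uniform layer sizes can underestimate memory, here is a minimal Go sketch. It is not the actual change in llm/memory.go: the function names `estimateUniform` and `estimatePerLayer`, the choice of the first layer as the "representative" one, and the layer sizes are all hypothetical, chosen only to show the effect.

```go
package main

import "fmt"

// estimateUniform models the flawed assumption: it treats the first layer
// as representative and multiplies its size by the layer count.
// (Hypothetical helper; using the first layer is an assumption.)
func estimateUniform(layerSizes []uint64) uint64 {
	if len(layerSizes) == 0 {
		return 0
	}
	return layerSizes[0] * uint64(len(layerSizes))
}

// estimatePerLayer sums the actual size of every layer, so it stays
// correct when layers differ in size. (Hypothetical helper.)
func estimatePerLayer(layerSizes []uint64) uint64 {
	var total uint64
	for _, size := range layerSizes {
		total += size
	}
	return total
}

func main() {
	const MiB = 1 << 20

	// Made-up sizes: a small first layer followed by larger ones.
	sizes := []uint64{50 * MiB, 300 * MiB, 300 * MiB, 300 * MiB}

	fmt.Printf("uniform assumption: %d MiB\n", estimateUniform(sizes)/MiB)
	fmt.Printf("per-layer sum:      %d MiB\n", estimatePerLayer(sizes)/MiB)
}
```

With these made-up sizes the uniform estimate is 200 MiB while the per-layer sum is 950 MiB, the same kind of gap seen between the two log lines above.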


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-19 16:10:57 -05:00

Reference: github-starred/ollama#22224