[PR #9243] [MERGED] fix: the problem of loading models with large difference in initial and final layer sizes #75193

Closed
opened 2026-05-05 07:37:29 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/9243
Author: @itej89
Created: 2/20/2025
Status: Merged
Merged: 5/13/2025
Merged by: @jessegross

Base: main ← Head: main


📝 Commits (10+)

  • cecb96e Fixed over vram allcation dure to small initial layer sizes.
  • 0442fd0 Merge branch 'ollama:main' into main
  • 24fbbbb Merge branch 'ollama:main' into main
  • db62bca Update llm/memory.go
  • a674d85 Added required imports in memory.go
  • 6556387 Merge branch 'ollama:main' into main
  • 12cb3e0 Udpated the initial buffer computation to use kv size of the largest layer. Updated the overflow computation to use the exact layer sizes
  • b7caaa1 Merge branch 'ollama:main' into main
  • 54387cb Updated the inital buffer size to use the largest layer size and its kv size
  • 63f81f7 Merge branch 'ollama:main' into main

📊 Changes

1 file changed (+13 additions, -11 deletions)

View changed files

📝 llm/memory.go (+13 -11)

📄 Description

This PR addresses the bug reported for the DeepSeek R1 (deepseek-r1:671b) model failing to load on 3x and 4x MI210 GPU configurations, and possibly others.

Reported Bug:
https://github.com/ollama/ollama/issues/8776

Observations:
I have noticed that Ollama's Go code uses the initial layers to estimate the number of layers that can be offloaded to the GPU, while llama.cpp assigns the model's final layers to the GPU devices, as shown in the lines below:

Counting number of layers
https://github.com/ollama/ollama/blob/main/llm/memory.go#L213

Assigning layers to GPU
https://github.com/ollama/ollama/blob/main/llama/llama.cpp/src/llama.cpp#L330

This can cause problems when there is a large difference between the initial and final layer sizes, as is the case with DeepSeek R1: the Go code estimates more layers than can actually fit on the GPU.
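To illustrate the mismatch, here is a minimal, hypothetical Go sketch (not the actual llm/memory.go code; layer sizes and function names are made up) showing how counting layers front-to-back overestimates when llama.cpp actually places the last layers on the GPU:

```go
package main

import "fmt"

// countFromFront mimics the old estimate: walk layers 0..n-1 and stop
// when the next layer no longer fits in the remaining VRAM budget.
func countFromFront(layerSizes []uint64, vram uint64) int {
	n := 0
	for _, size := range layerSizes {
		if size > vram {
			break
		}
		vram -= size
		n++
	}
	return n
}

// countFromBack mimics llama.cpp's assignment order: the last n layers
// are the ones that actually land on the GPU.
func countFromBack(layerSizes []uint64, vram uint64) int {
	n := 0
	for i := len(layerSizes) - 1; i >= 0; i-- {
		if layerSizes[i] > vram {
			break
		}
		vram -= layerSizes[i]
		n++
	}
	return n
}

func main() {
	// Toy model: small initial layers, much larger final layers (sizes in MiB).
	layers := []uint64{100, 100, 100, 100, 800, 800, 800, 800}
	const vram = 2000

	fmt.Println("front estimate:", countFromFront(layers, vram)) // 6: overestimates
	fmt.Println("back estimate: ", countFromBack(layers, vram))  // 2: what llama.cpp can fit
}
```

With the toy sizes above, the front-to-back estimate claims 6 layers fit, while only 2 of the large final layers actually do, which is exactly the kind of over-allocation reported in the bug.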

This PR fixes the issue by updating the Go code to use the final layers and to allocate a buffer equal to the maximum layer size.
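The following is a hedged sketch of the fixed estimate, again with hypothetical names and data rather than the real llm/memory.go implementation: reserve headroom using the largest layer (weights plus its KV cache), then count layers from the end, matching llama.cpp's assignment order:

```go
package main

import "fmt"

// estimateOffload is a simplified stand-in for the fixed estimate,
// assuming per-layer weight and KV-cache sizes are known up front.
func estimateOffload(weights, kv []uint64, vram uint64) int {
	// Reserve a buffer equal to the largest single layer plus its KV
	// size, so the estimate cannot admit a layer that later fails to fit.
	var maxLayer uint64
	for i := range weights {
		if s := weights[i] + kv[i]; s > maxLayer {
			maxLayer = s
		}
	}
	if maxLayer > vram {
		return 0
	}
	vram -= maxLayer

	// Count from the final layer backwards, using exact layer sizes,
	// mirroring the order in which llama.cpp assigns layers to GPUs.
	n := 0
	for i := len(weights) - 1; i >= 0; i-- {
		size := weights[i] + kv[i]
		if size > vram {
			break
		}
		vram -= size
		n++
	}
	return n
}

func main() {
	weights := []uint64{100, 100, 100, 800, 800, 800} // MiB, toy values
	kv := []uint64{50, 50, 50, 50, 50, 50}
	fmt.Println("offloadable layers:", estimateOffload(weights, kv, 3000))
}
```

Reserving the largest layer's size up front makes the estimate conservative, so a layer admitted by the count can never overflow VRAM during actual assignment.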

Logs:
Error debug logs: VRAM allocation error.txt
Logs after this fix: After Changes - Success log.txt


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-05-05 07:37:29 -05:00

Reference: github-starred/ollama#75193