[PR #591] [MERGED] unbound max num gpu layers #10242

Closed · opened 2026-04-12 22:55:40 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/591
Author: @BruceMacD
Created: 9/25/2023
Status: Merged
Merged: 9/25/2023
Merged by: @BruceMacD

Base: `main` ← Head: `brucemacd/load-more-layers`


📝 Commits (10+)

- [`7f79a24`](https://github.com/ollama/ollama/commit/7f79a24131f2ee01c3800d4d739f33e6b82b14a9) unbound max num gpu layers
- [`b11077a`](https://github.com/ollama/ollama/commit/b11077a0e592782a26d0df90cdb7041d1903d295) gguf num_layer
- [`9163b25`](https://github.com/ollama/ollama/commit/9163b257ae3014614591597dd9cdce4ced904305) handle different types for block_count
- [`8e41c16`](https://github.com/ollama/ollama/commit/8e41c162038d051d10dd22e23f27ea8d9e4f58e1) use file size to guess layer size
- [`22473b8`](https://github.com/ollama/ollama/commit/22473b861897afa5a765be68ca28b3fb1fa0dab0) Update gguf.go
- [`d72c5ff`](https://github.com/ollama/ollama/commit/d72c5ffb2b151344f746f9d382c5609b07ad2544) Update llama.go
- [`9cc4b87`](https://github.com/ollama/ollama/commit/9cc4b879adfb3b1f18fc7bbade06579b0d8a6af6) return int64 from numlayers
- [`daee1cc`](https://github.com/ollama/ollama/commit/daee1cc3618cf7e2ab3cb4eacf2a145fcdda36d2) Update llama.go
- [`6542207`](https://github.com/ollama/ollama/commit/65422078b507a873d524eb1c8fd71e153aa35e1c) Update llama.go
- [`51b110e`](https://github.com/ollama/ollama/commit/51b110eb427a53f54f874ef51ce5e57e747a42ac) type casting fix

📊 Changes

4 files changed (+36 additions, -29 deletions)

View changed files

📝 llm/ggml.go (+1 -0)
📝 llm/gguf.go (+10 -0)
📝 llm/llama.go (+23 -27)
📝 llm/llm.go (+2 -2)

📄 Description

Load as many layers into VRAM as possible, using the model file size as a rough heuristic for the amount of memory required per layer.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-12 22:55:40 -05:00

Reference: github-starred/ollama#10242