[PR #10721] [MERGED] ggml: Seperate tensor load from backend creation #60039

Closed
opened 2026-04-29 14:57:26 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/10721
Author: @jessegross
Created: 5/15/2025
Status: Merged
Merged: 5/19/2025
Merged by: @jessegross

Base: main ← Head: jessegross/separate_load


📝 Commits (1)

  • c09c816 ggml: Seperate tensor load from backend creation

📊 Changes

13 files changed (+123 additions, -107 deletions)

View changed files

📝 convert/convert_test.go (+2 -2)
📝 fs/ggml/ggml.go (+8 -6)
📝 fs/ggml/gguf_test.go (+1 -1)
📝 llm/memory.go (+1 -1)
📝 llm/server.go (+1 -1)
📝 ml/backend.go (+5 -9)
📝 ml/backend/ggml/ggml.go (+83 -64)
📝 model/model.go (+3 -9)
📝 runner/ollamarunner/runner.go (+9 -4)
📝 server/create.go (+6 -6)
📝 server/images.go (+1 -1)
📝 server/model.go (+1 -1)
📝 server/quantization_test.go (+2 -2)

📄 Description

Currently, when the backend is created, the tensors are loaded at the same time, which is a slow operation. This change separates it into two steps:

  • Create the backend, including enumerating tensors and allocating memory
  • Load the tensor data

This allows more flexibility in managing model loading.
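The two-step shape described above can be sketched as follows. This is a minimal, hypothetical illustration, not Ollama's actual API: the `Backend`, `NewBackend`, and `Load` names and signatures here are assumptions made for the example.

```go
package main

import "fmt"

// Tensor is a stand-in for a model weight: metadata plus a data buffer.
type Tensor struct {
	Name string
	Size int
	data []byte
}

// Backend holds the enumerated tensors and the total memory reservation.
type Backend struct {
	tensors []*Tensor
	alloc   int
}

// NewBackend is the fast step: enumerate tensors and account for memory,
// without reading any tensor data.
func NewBackend(sizes map[string]int) *Backend {
	b := &Backend{}
	for name, size := range sizes {
		b.tensors = append(b.tensors, &Tensor{Name: name, Size: size})
		b.alloc += size
	}
	return b
}

// Load is the slow step: fill each tensor's buffer, reporting progress.
func (b *Backend) Load(progress func(done, total int)) {
	for i, t := range b.tensors {
		t.data = make([]byte, t.Size) // stand-in for reading from the model file
		progress(i+1, len(b.tensors))
	}
}

func main() {
	// Fast step: the caller can inspect allocations before committing to a load.
	b := NewBackend(map[string]int{"blk.0.attn_q.weight": 1024, "output.weight": 2048})
	fmt.Println("allocated:", b.alloc)

	// Slow step, now explicit and observable.
	b.Load(func(done, total int) { fmt.Printf("loaded %d/%d\n", done, total) })
}
```

Splitting the steps this way lets a caller do useful work between them, for example validating memory fit or wiring up progress reporting before any slow I/O begins.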


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-29 14:57:26 -05:00

Reference: github-starred/ollama#60039