[PR #9795] [CLOSED] Improve parallel model execution #75355

Closed
opened 2026-05-05 07:47:04 -05:00 by GiteaMirror · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/9795
Author: @Githubguy132010
Created: 3/16/2025
Status: Closed

Base: `main` ← Head: `improve-parallel-model-execution`


📝 Commits (1)

  • d338b66 Improve parallel model execution

📊 Changes

2 files changed (+213 additions, -68 deletions)

View changed files

📝 llama/llama.cpp/src/llama-model-loader.cpp (+121 -68)
📝 llm/memory.go (+92 -0)

📄 Description

Fixes #9787

Implement a dynamic load-balancing system and an efficient concurrency-control mechanism to manage parallel model execution.

  • llama/llama.cpp/src/llama-model-loader.cpp

    • Add necessary includes for threading and synchronization.
    • Implement a priority-based scheduling system using a task queue and worker threads (a conceptual Go sketch follows this list).
    • Distribute workload evenly across available GPUs and monitor GPU utilization.
    • Adjust workload distribution in real-time to avoid overloading any single GPU.
    • Ensure timely execution of critical tasks.
  • llm/memory.go

    • Add necessary imports for synchronization and timing.
    • Implement a task queue system to manage the execution of parallel tasks.
    • Add a concurrency control mechanism using fine-grained locking techniques (see the second sketch after this list).
    • Provide an example usage of the task queue and concurrency control.
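
The patch itself modifies C++ (llama/llama.cpp/src/llama-model-loader.cpp), but the mechanism the first group of bullets describes — a priority-ordered task queue drained by worker threads that steer work toward the least-loaded GPU — can be illustrated with a minimal Go sketch. All identifiers here (`scheduler`, `loadTask`, `leastLoaded`) and the least-loaded heuristic are assumptions for illustration, not code taken from the PR:

```go
// Hypothetical sketch only: names and heuristics are illustrative, not from the PR.
package loadsched

import (
	"container/heap"
	"sync"
)

// loadTask is one unit of model-loading work; higher priority runs first.
type loadTask struct {
	priority int
	run      func(gpu int)
}

// taskHeap is a max-heap ordered by task priority.
type taskHeap []loadTask

func (h taskHeap) Len() int           { return len(h) }
func (h taskHeap) Less(i, j int) bool { return h[i].priority > h[j].priority }
func (h taskHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *taskHeap) Push(x any)        { *h = append(*h, x.(loadTask)) }
func (h *taskHeap) Pop() any {
	old := *h
	t := old[len(old)-1]
	*h = old[:len(old)-1]
	return t
}

// scheduler dispatches queued tasks to worker goroutines, always picking the
// GPU with the fewest outstanding tasks.
type scheduler struct {
	mu      sync.Mutex
	cond    *sync.Cond
	tasks   taskHeap
	gpuLoad []int // outstanding tasks per GPU
}

func newScheduler(numGPUs, workers int) *scheduler {
	s := &scheduler{gpuLoad: make([]int, numGPUs)}
	s.cond = sync.NewCond(&s.mu)
	for i := 0; i < workers; i++ {
		go s.worker()
	}
	return s
}

// submit enqueues work; the highest-priority task is dispatched first.
func (s *scheduler) submit(priority int, run func(gpu int)) {
	s.mu.Lock()
	heap.Push(&s.tasks, loadTask{priority: priority, run: run})
	s.mu.Unlock()
	s.cond.Signal()
}

func (s *scheduler) worker() {
	for {
		s.mu.Lock()
		for s.tasks.Len() == 0 {
			s.cond.Wait()
		}
		t := heap.Pop(&s.tasks).(loadTask)
		gpu := s.leastLoaded()
		s.gpuLoad[gpu]++
		s.mu.Unlock()

		t.run(gpu) // run the (potentially long) load outside the lock

		s.mu.Lock()
		s.gpuLoad[gpu]--
		s.mu.Unlock()
	}
}

// leastLoaded returns the index of the GPU with the fewest outstanding tasks.
// Caller must hold s.mu.
func (s *scheduler) leastLoaded() int {
	best := 0
	for i, l := range s.gpuLoad {
		if l < s.gpuLoad[best] {
			best = i
		}
	}
	return best
}
```

Executing `t.run` outside the lock keeps long-running model loads from serializing the queue, which matches the stated goal of adjusting distribution in real time so no single GPU is overloaded.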

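For llm/memory.go, the description mentions a task queue, fine-grained locking, and an example usage. Below is a minimal sketch of the fine-grained locking idea, assuming one mutex per GPU's free-memory bookkeeping; `Tracker`, `Reserve`, and `Release` are hypothetical names, not identifiers from the patch:

```go
// Hypothetical sketch only: Tracker, Reserve, and Release are illustrative names.
package gpumem

import (
	"errors"
	"sync"
)

// ErrInsufficientMemory is returned when a GPU cannot satisfy a reservation.
var ErrInsufficientMemory = errors.New("not enough free memory on GPU")

// gpuMemory tracks an estimate of free VRAM for one GPU, guarded by its own lock.
type gpuMemory struct {
	mu   sync.Mutex
	free uint64 // bytes believed to be available
}

// Tracker holds per-GPU state. Locking is fine-grained: reserving memory on
// GPU 0 never blocks a concurrent reservation on GPU 1.
type Tracker struct {
	gpus []*gpuMemory
}

func NewTracker(freePerGPU []uint64) *Tracker {
	t := &Tracker{}
	for _, f := range freePerGPU {
		t.gpus = append(t.gpus, &gpuMemory{free: f})
	}
	return t
}

// Reserve atomically deducts n bytes from one GPU's free estimate.
func (t *Tracker) Reserve(gpu int, n uint64) error {
	g := t.gpus[gpu]
	g.mu.Lock()
	defer g.mu.Unlock()
	if g.free < n {
		return ErrInsufficientMemory
	}
	g.free -= n
	return nil
}

// Release returns n bytes to a GPU's free estimate (e.g. when a model unloads).
func (t *Tracker) Release(gpu int, n uint64) {
	g := t.gpus[gpu]
	g.mu.Lock()
	g.free += n
	g.mu.Unlock()
}

// Example usage: reserve memory before placing layers on a GPU, release on unload.
func exampleUsage() {
	t := NewTracker([]uint64{24 << 30, 24 << 30}) // two GPUs, 24 GiB free each
	if err := t.Reserve(0, 8<<30); err == nil {
		defer t.Release(0, 8<<30)
		// ... load model layers onto GPU 0 ...
	}
}
```

Because each GPU has its own lock, concurrent loads only contend when they target the same device, rather than serializing on a single global mutex.
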
🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-05-05 07:47:04 -05:00

Reference: github-starred/ollama#75355