[PR #7805] [MERGED] runner.go: Fix deadlock with many concurrent requests #12520

Closed
opened 2026-04-13 00:01:54 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/7805
Author: @jessegross
Created: 11/23/2024
Status: Merged
Merged: 11/23/2024
Merged by: @jessegross

Base: main ← Head: jessegross/deadlock


📝 Commits (1)

  • 1c94e46 runner.go: Fix deadlock with many concurrent requests

📊 Changes

1 file changed (+17 additions, -4 deletions)


📝 llama/runner/runner.go (+17 -4)

📄 Description

If there are no available slots for new sequences, a request will not be added to the processing queue but will continue on to wait for a response that never comes. Besides never receiving a response, the outstanding request prevents the model from being unloaded.

To prevent this, there are semaphores that limit the number of in-flight requests to the number of slots: one in the Ollama server and one in the runner.

  • The Ollama server's semaphore works, but it is not designed to protect the runner's internal data structures, and the runner can return a final response before clearing them.
  • The runner's internal semaphore has a similar problem: it can be released when a response is issued. This is wrong; it should only be released after the sequence's data structures have been cleared.

In addition, if a slot cannot be found, we should return an error rather than deadlock in the event we ever reach that point.

Fixes #7779
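The ordering described above can be sketched as a minimal, self-contained Go example. This is an illustration only, not the actual `runner.go` code: the `runner` type, `handle`, and `findFreeSlotLocked` names are hypothetical, and a buffered channel stands in for the real semaphore. The key points it demonstrates are releasing the semaphore only after the slot's data is cleared, and returning an error instead of hanging when no slot is found.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errNoSlot = errors.New("no available sequence slot")

// runner is a hypothetical stand-in for the real runner: a counting
// semaphore sized to the number of slots, plus per-slot sequence state.
type runner struct {
	sem  chan struct{} // acquire = send, release = receive
	mu   sync.Mutex
	seqs []*string // one entry per slot; nil means the slot is free
}

func newRunner(numSlots int) *runner {
	return &runner{
		sem:  make(chan struct{}, numSlots),
		seqs: make([]*string, numSlots),
	}
}

// handle processes one request end to end.
func (r *runner) handle(prompt string) error {
	r.sem <- struct{}{} // acquire a slot (blocks if all are busy)

	r.mu.Lock()
	i := r.findFreeSlotLocked()
	if i < 0 {
		r.mu.Unlock()
		<-r.sem
		// Returning an error here lets the caller recover instead of
		// waiting forever for a response that never comes.
		return errNoSlot
	}
	r.seqs[i] = &prompt
	r.mu.Unlock()

	// ... run generation and send the final response here ...

	// Clear the slot's data *before* releasing the semaphore. Releasing
	// first would admit a new request while the slot still appears
	// occupied, which is the ordering bug this PR fixes.
	r.mu.Lock()
	r.seqs[i] = nil
	r.mu.Unlock()
	<-r.sem
	return nil
}

func (r *runner) findFreeSlotLocked() int {
	for i, s := range r.seqs {
		if s == nil {
			return i
		}
	}
	return -1
}

func main() {
	r := newRunner(2)
	fmt.Println(r.handle("hello")) // prints "<nil>"
}
```

With this ordering, the semaphore count can never exceed the number of genuinely free slots, so a waiting request is only admitted once a slot is actually reusable.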


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-13 00:01:54 -05:00
Reference: github-starred/ollama#12520