[PR #1445] [MERGED] fix: parallel queueing race condition caused silent failure #15857

Closed
opened 2026-04-16 05:10:28 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/1445
Author: @BruceMacD
Created: 12/9/2023
Status: Merged
Merged: 12/9/2023
Merged by: @jmorganca

Base: `main` ← Head: `brucemacd/parallel`


📝 Commits (2)

- [`2439574`](https://github.com/ollama/ollama/commit/2439574b0fe04b4252dbcccf1c21a93a2bb3dcb8) fix: queued request failures
- [`2a43e92`](https://github.com/ollama/ollama/commit/2a43e92698ba01fa31250335fcde64f384d98f07) log steam errors

📊 Changes

1 file changed (+30 additions, -26 deletions)


📝 llm/llama.go (+30 -26)

📄 Description

As of the most recent llama.cpp update, concurrent requests hit a race condition that resulted in an empty response.

This was not easy to observe: the llm runner subprocess returned a 200 with the error `{"content":"slot unavailable"}` in the response stream, which just silently closed the channel.
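
For context, here is a minimal sketch (not the PR's actual code) of surfacing such an in-stream error instead of silently closing the channel. The `prediction` struct fields, the `data: ` prefix handling, and `readStream` are assumptions for illustration:

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"log"
	"strings"
)

// prediction mirrors the fields of interest in one streamed JSON line
// from the llama.cpp server (field names are assumptions).
type prediction struct {
	Content string `json:"content"`
	Stop    bool   `json:"stop"`
}

// readStream scans newline-delimited JSON events and returns an error
// instead of silently ending the stream when the server reports
// "slot unavailable" inside an otherwise-200 response.
func readStream(sc *bufio.Scanner, out chan<- string) error {
	defer close(out)
	for sc.Scan() {
		line := strings.TrimPrefix(sc.Text(), "data: ")
		if strings.TrimSpace(line) == "" {
			continue
		}
		var p prediction
		if err := json.Unmarshal([]byte(line), &p); err != nil {
			return fmt.Errorf("decoding stream: %w", err)
		}
		// A 200 response can still carry an error payload in the stream.
		if p.Content == "slot unavailable" {
			return fmt.Errorf("llm runner error in stream: %q", p.Content)
		}
		out <- p.Content
		if p.Stop {
			return nil
		}
	}
	return sc.Err()
}

func main() {
	// Simulated stream: the failure mode is an error string delivered
	// as ordinary content partway through a 200 response.
	body := "data: {\"content\":\"slot unavailable\",\"stop\":false}\n"
	out := make(chan string)
	go func() {
		for tok := range out {
			fmt.Print(tok)
		}
	}()
	if err := readStream(bufio.NewScanner(strings.NewReader(body)), out); err != nil {
		log.Println(err) // surfaced instead of a silently closed channel
	}
}
```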

This change resolves the race by allowing multiple slots in the llm runner subprocess. We manage the queueing ourselves, so this should be ok. @dhiltgen this may be a case we need to account for in the cgo changes.
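
For illustration, a minimal sketch of what launching the runner with multiple slots can look like, assuming the llama.cpp server's `--parallel` flag controls the slot count. The binary path, model path, port, and slot count here are hypothetical, not the PR's actual values:

```go
package main

import (
	"fmt"
	"os/exec"
	"strconv"
)

const numSlots = 4 // illustrative; the PR chose its own slot count

// startRunner launches the llama.cpp server subprocess with multiple
// slots, so a queued request never lands on a busy slot. Request
// queueing itself stays in the caller, not in the runner.
func startRunner(serverBin, modelPath string, port int) (*exec.Cmd, error) {
	cmd := exec.Command(serverBin,
		"--model", modelPath,
		"--port", strconv.Itoa(port),
		"--parallel", strconv.Itoa(numSlots), // multiple slots in the runner
	)
	if err := cmd.Start(); err != nil {
		return nil, fmt.Errorf("starting llm runner: %w", err)
	}
	return cmd, nil
}

func main() {
	// Hypothetical paths and port, purely for illustration.
	cmd, err := startRunner("./server", "./model.gguf", 8081)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("runner started, pid:", cmd.Process.Pid)
	_ = cmd.Process.Kill()
}
```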


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-16 05:10:28 -05:00
Reference: github-starred/ollama#15857