[PR #4144] [MERGED] Make maximum pending request configurable #10135

Closed
opened 2025-11-12 15:21:43 -06:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/4144
Author: @dhiltgen
Created: 5/3/2024
Status: Merged
Merged: 5/5/2024
Merged by: @dhiltgen

Base: main ← Head: max_queue


📝 Commits (2)

  • 20f6c06 Make maximum pending request configurable
  • 45d61aa Add integration test to push max queue limits

📊 Changes

4 files changed (+154 additions, -23 deletions)


📝 docs/faq.md (+6 -0)
➕ integration/max_queue_test.go (+117 -0)
📝 server/routes.go (+15 -18)
📝 server/sched.go (+16 -5)

📄 Description

Bump the maximum number of queued requests to 512 (from 10).
Make it configurable with a new env var, OLLAMA_MAX_QUEUE.
Return a 503 when the server is too busy instead of the more generic 500.

Fixes #4124

With the added integration test, here are some quick memory stats on Linux:

  • Just after starting ollama: RSS 429.0m
  • After loading orca-mini: RSS 456.8m (just the Go process, not the child runner)
  • During my stress test, where I push >512 connections: RSS 489.0m

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2025-11-12 15:21:43 -06:00
Reference: github-starred/ollama-ollama#10135