[GH-ISSUE #5486] Upper token limit scales with number of parallel requests #65468

Closed
opened 2026-05-03 21:24:33 -05:00 by GiteaMirror · 0 comments
Owner

Originally created by @jmorganca on GitHub (Jul 4, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/5486

Originally assigned to: @jmorganca on GitHub.

What is the issue?

The upper token limit scales with the number of parallel requests. It should instead be based on a single parallel request's context size.
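To illustrate the behavior being reported, here is a minimal Go sketch of how a server might size its KV cache by multiplying the per-request context by the parallel slot count, which makes the upper token limit scale with parallelism. The function name `effectiveCtx` and the specific numbers are illustrative assumptions, not Ollama's actual code.

```go
package main

import "fmt"

// effectiveCtx is a hypothetical sketch: the total context allocated
// is the per-request context size multiplied by the number of
// parallel request slots. Under this scheme, raising parallelism
// also raises the overall token limit, which is the behavior this
// issue argues against.
func effectiveCtx(numCtx, numParallel int) int {
	return numCtx * numParallel
}

func main() {
	// With a 2048-token context and 4 parallel slots, 8192 tokens
	// are allocated in total rather than 2048 per request.
	fmt.Println(effectiveCtx(2048, 4))
}
```

The issue's suggestion is that the limit exposed to callers should reflect a single request's context size (2048 here), independent of how many parallel slots are configured.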

OS

No response

GPU

No response

CPU

No response

Ollama version

No response

GiteaMirror added the bug label 2026-05-03 21:24:33 -05:00

Reference: github-starred/ollama#65468