[PR #11794] server: Reduce gpt-oss context length for small VRAM GPUs #13622

Closed
opened 2026-04-13 00:31:26 -05:00 by GiteaMirror · 0 comments
Owner

Original Pull Request: https://github.com/ollama/ollama/pull/11794

State: closed
Merged: Yes


gpt-oss works best with a context length of at least 8k. However, on GPUs with a limited amount of VRAM, this larger context incurs a significant performance hit. In those cases, we fall back to the Ollama default of 4k.
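The selection logic described above can be sketched as a small Go function. The 8 GiB VRAM cutoff, the function name, and the constants are illustrative assumptions, not the exact values or identifiers used in the PR:

```go
package main

import "fmt"

const (
	defaultCtx = 4096            // Ollama default context length (4k)
	gptOSSCtx  = 8192            // preferred context for gpt-oss (8k)
	vramCutoff = uint64(8) << 30 // hypothetical cutoff: 8 GiB of free VRAM
)

// contextLength picks a context window for gpt-oss based on available VRAM.
// Small-VRAM GPUs fall back to the Ollama default to avoid the performance
// hit of the larger context.
func contextLength(freeVRAM uint64) int {
	if freeVRAM < vramCutoff {
		return defaultCtx
	}
	return gptOSSCtx
}

func main() {
	fmt.Println(contextLength(4 << 30))  // 4 GiB GPU -> 4096
	fmt.Println(contextLength(16 << 30)) // 16 GiB GPU -> 8192
}
```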

GiteaMirror added the pull-request label 2026-04-13 00:31:27 -05:00
Reference: github-starred/ollama#13622