[GH-ISSUE #4079] About OLLAMA_PARALLEL split the max context length #64572

Open
opened 2026-05-03 18:11:05 -05:00 by GiteaMirror · 0 comments
Owner

Originally created by @DirtyKnightForVi on GitHub (May 1, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/4079

What is the issue?

I encountered this while testing SQL QA against an extremely large table, where I put all of the DDL into the `system` prompt.

With `OLLAMA_PARALLEL=4`, the model appears to understand only the last 4000 tokens of the DDL. This is quite different from my previous experience. My web UI is Open WebUI; it lets me set `num_ctx` to 16000, but that has no effect.

With `OLLAMA_PARALLEL=1`, however, the model understands the whole DDL!

So is the effective limit `max_num_ctx = 16000 / OLLAMA_PARALLEL`, even when the machine is otherwise idle?
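The arithmetic behind the hypothesis can be sketched as follows. This is a minimal illustration, assuming the reported behavior (the configured context window is divided evenly across parallel slots); note that in current Ollama releases the documented variable is `OLLAMA_NUM_PARALLEL`, and the values below are the ones from this report, not output from Ollama itself:

```shell
#!/bin/sh
# Hypothesized effective per-request context when parallelism splits num_ctx.
NUM_CTX=16000

# With 4 parallel slots, each request would see only a quarter of the window,
# matching the "last 4000 tokens" observation above.
echo "parallel=4 -> $((NUM_CTX / 4)) tokens per request"

# With a single slot, the full window is available and the whole DDL fits.
echo "parallel=1 -> $((NUM_CTX / 1)) tokens per request"
```

If this is indeed how the server allocates context, then raising `num_ctx` proportionally (e.g. 64000 for 4 slots) would be the workaround when parallel serving is needed.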

OS

Linux

GPU

Nvidia

CPU

Intel

Ollama version

0.1.33-RC5

GiteaMirror added the bug label 2026-05-03 18:11:05 -05:00