[GH-ISSUE #6927] Why Is n_ctx in the Log Always Four Times the num_ctx Value in the Modelfile When Building qwen2.5-coder-7b-instruct-q5_k_m.gguf? #30144

Closed
opened 2026-04-22 09:37:38 -05:00 by GiteaMirror · 1 comment
Owner

Originally created by @XiongDaowen on GitHub (Sep 24, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/6927

What is the issue?

When I built qwen2.5-coder-7b-instruct-q5_k_m.gguf using a Modelfile with `PARAMETER num_ctx 4096`, the log output showed `llama_new_context_with_model: n_ctx = 16384`. After setting `num_ctx` to different values, I noticed that `n_ctx` is always 4 times the value of `num_ctx`. Why is this happening?
The log:

![89d1ee67-30d0-49b6-9553-32fdc1840fb7](https://github.com/user-attachments/assets/b9b5a0d0-727b-4422-a592-320784054634)

The Modelfile:

![4dc02651-9a6d-4007-95be-adcbd9aa871e](https://github.com/user-attachments/assets/591e6e7c-3047-478e-9207-19c2fb3ea509)

OS

No response

GPU

No response

CPU

No response

Ollama version

0.3.3

GiteaMirror added the bug label 2026-04-22 09:37:38 -05:00
Author
Owner

@rick-github commented on GitHub (Sep 24, 2024):

Either `OLLAMA_NUM_PARALLEL` is unset or set to 4. `OLLAMA_NUM_PARALLEL` sets how many requests Ollama can handle concurrently. Each request handler needs its own context space, so the total context is (number of handlers * context size). If `OLLAMA_NUM_PARALLEL` is unset, Ollama chooses either 1 or 4 depending on how much VRAM is available. If you set `OLLAMA_NUM_PARALLEL=1` in the [server environment](https://github.com/ollama/ollama/blob/main/docs/faq.md#how-do-i-configure-ollama-server), `n_ctx` will be 4096.
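The arithmetic above can be sketched as a tiny calculation (a sketch only; `effective_n_ctx` is a hypothetical helper for illustration, not an Ollama API):

```python
def effective_n_ctx(num_ctx: int, num_parallel: int) -> int:
    """Total context the server allocates: one num_ctx-sized slice
    per parallel request handler."""
    return num_ctx * num_parallel

# With num_ctx 4096 and the default of 4 parallel handlers,
# the log reports n_ctx = 16384; with OLLAMA_NUM_PARALLEL=1 it is 4096.
print(effective_n_ctx(4096, 4))  # → 16384
print(effective_n_ctx(4096, 1))  # → 4096
```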


Reference: github-starred/ollama#30144