[GH-ISSUE #11186] Ollama Server ignore num_ctx for Qwen3 model #7373

Closed
opened 2026-04-12 19:25:52 -05:00 by GiteaMirror · 1 comment
Owner

Originally created by @JpEncausse on GitHub (Jun 24, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/11186

What is the issue?

By design, Ollama is an abstraction wrapper for calling various models.
Ollama exposes parameters that can be set:

  • in the HTTP request
  • as environment variables
  • or as CLI variables

A parameter like num_ctx can't be set in the HTTP request and must instead be set via the OLLAMA_CONTEXT_LENGTH environment variable for Qwen3 models.

This is inconsistent behavior. Please:

  • remove num_ctx, or handle it for all models
  • handle CLI, HTTP, and ENV the same way

Ollama is a generic wrapper; as users, we can't be expected to guess the behavior for a given model.
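
For reference, a sketch of the three configuration paths the report contrasts (the model name and context value below are illustrative, not taken from the report):

```console
# 1. Per request, in the HTTP request body's options
$ curl localhost:11434/api/generate -d '{"model":"qwen3","options":{"num_ctx":8192}}'

# 2. Server-wide default, via an environment variable read at server startup
$ OLLAMA_CONTEXT_LENGTH=8192 ollama serve

# 3. Interactively, in a CLI session
$ ollama run qwen3
>>> /set parameter num_ctx 8192
```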

Relevant log output


OS

Windows

GPU

Nvidia

CPU

AMD

Ollama version

0.9.2

GiteaMirror added the bug label 2026-04-12 19:25:52 -05:00
Author
Owner

@rick-github commented on GitHub (Jun 24, 2025):

```console
$ curl localhost:11434/api/generate -d '{"model":"qwen3"}'
{"model":"qwen3","created_at":"2025-06-24T21:19:51.947251954Z","response":"","done":true,"done_reason":"load"}
$ ps wwp$(pidof /usr/bin/ollama) | sed -ne 's/.*\(ctx-size\)/\1/p'
ctx-size 4096 --batch-size 512 --n-gpu-layers 37 --threads 8 --parallel 1 --port 41259
$ curl localhost:11434/api/generate -d '{"model":"qwen3","options":{"num_ctx":12345}}'
{"model":"qwen3","created_at":"2025-06-24T21:22:05.37284232Z","response":"","done":true,"done_reason":"load"}
$ ps wwp$(pidof /usr/bin/ollama) | sed -ne 's/.*\(ctx-size\)/\1/p'
ctx-size 12345 --batch-size 512 --n-gpu-layers 37 --threads 8 --parallel 1 --port 33679
```
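
In other words, the default ctx-size of 4096 is overridden by the num_ctx passed in the request options, so the per-request path does work for qwen3. For a rough equivalent check on Windows (the reporter's OS), where ps/pidof aren't available, the server log normally records the runner invocation when a model loads; the log path below is the default for a Windows install and is an assumption, adjust if yours differs:

```console
# Assumed default log location on a Windows install; the runner's command line,
# including the context size, is typically logged when a model is loaded.
> findstr "ctx-size" "%LOCALAPPDATA%\Ollama\server.log"
```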
Reference: github-starred/ollama#7373