[GH-ISSUE #4288] bug: extra zero being added to Context Length and Max Tokens #49191

Closed
opened 2026-04-28 10:55:29 -05:00 by GiteaMirror · 6 comments
Owner

Originally created by @edwardochoaphd on GitHub (May 9, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/4288

Originally assigned to: @dhiltgen on GitHub.

What is the issue?

first reported to OpenWebUI at: https://github.com/open-webui/open-webui/issues/2141

An OpenWebUI dev then checked it and said it may be an issue on the Ollama end...
![image](https://github.com/ollama/ollama/assets/20013196/e29b0cc3-8711-410d-90e9-500bbcaded41)

I set the context length and then read the log file. Here are two log examples showing that my context length had an extra zero added:

time=2024-05-08T20:12:24.868Z level=WARN source=memory.go:17 msg="requested context length is greater than model max context length" requested=81920 model=65536
time=2024-05-08T05:24:12.168Z level=WARN source=memory.go:17 msg="requested context length is greater than model max context length" requested=20480 model=8192

There also appears to be a second (related?) issue/bug with Max Tokens.
When it is set to 2048, an extra zero gets added here too during a conversation. See the two examples below of 20480, which should have been 2048:
...
time=2024-05-09T14:33:27.120Z level=WARN source=server.go:77 msg="requested context length is greater than the model's training context window size" requested=20480 "training size"=4096
...
.................................................................................................
llama_new_context_with_model: n_ctx = 20480

OS

Linux

GPU

Nvidia

CPU

No response

Ollama version

0.1.34

GiteaMirror added the bug label 2026-04-28 10:55:29 -05:00
Author
Owner

@edwardochoaphd commented on GitHub (May 11, 2024):

I'm using OpenWebUI to work around the bug; in case it's helpful for others, see additional comments in: https://github.com/open-webui/open-webui/discussions/2147#discussioncomment-9395291
![image](https://github.com/ollama/ollama/assets/20013196/592b5dc5-3b16-46ba-8d68-0c9f2287c3a5)

Author
Owner

@dhiltgen commented on GitHub (May 21, 2024):

We fixed some context calculation glitches related to concurrency in the past few weeks. Can you try upgrading to the latest version and see if the problem is resolved?

Author
Owner

@edwardochoaphd commented on GitHub (May 29, 2024):

@dhiltgen - apologies for the delay... I upgraded to Ollama 0.1.39 and am still seeing the issue of an extra zero being added when using OpenWebUI with
LLM = codegemma:7b-instruct-v1.1-fp16.
Then, in the OpenWebUI settings, I set max tokens = 2048...
and I get (from the log file):
...
llama_new_context_with_model: n_ctx = 20480

It looks like an extra zero is still being added...

I get the same result when trying other LLMs, like mixtral:8x22b-instruct-v0.1-q4_K_M.
I get (from the log file):
llama_new_context_with_model: n_ctx = 20480

Hopefully the above examples are helpful!

Author
Owner

@dhiltgen commented on GitHub (Jun 21, 2024):

Are you by any chance setting `OLLAMA_NUM_PARALLEL=10`?

When I try to repro on the latest release, I see the following with `OLLAMA_NUM_PARALLEL=1` (or unset):

```
llama_new_context_with_model: n_ctx      = 2048
```

If I set `OLLAMA_NUM_PARALLEL=10` then I see:

```
llama_new_context_with_model: n_ctx      = 20480
```
Author
Owner

@edwardochoaphd commented on GitHub (Jun 21, 2024):

That's interesting... I was setting OLLAMA_NUM_PARALLEL=10 back around the time I was getting that issue. I'm currently setting OLLAMA_NUM_PARALLEL=15, though I haven't checked in a while to see what that does. It's strange and interesting that you noticed that above...

With the LLM CognitiveComputations/dolphin-mixtral:8x22b-v2.9.2-Q5_K_M, what I'm getting is:
llama_new_context_with_model: n_ctx = 245760

Author
Owner

@dhiltgen commented on GitHub (Jun 23, 2024):

Each parallel request needs its own context, so the num parallel setting acts as a multiplier on the actual context allocated when we load the model. It sounds like the system is behaving as expected. In a future version, we'll set a reasonable default context based on available VRAM, but you'll still be able to set it explicitly.
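The multiplier behavior described above can be sketched as simple arithmetic (a hypothetical illustration, not Ollama's actual code): the total context passed to llama.cpp is the requested num_ctx times OLLAMA_NUM_PARALLEL, which is why a 2048-token setting with 10 parallel slots appears in the log as n_ctx = 20480.

```python
# Hypothetical sketch of the context-size multiplication described above;
# this is NOT Ollama's actual implementation, just the arithmetic it implies.

def effective_n_ctx(num_ctx: int, num_parallel: int) -> int:
    """Each parallel slot needs its own context, so the context allocated
    at model-load time is num_ctx * num_parallel."""
    return num_ctx * num_parallel

# The "extra zero" reports: max tokens 2048 with OLLAMA_NUM_PARALLEL=10
print(effective_n_ctx(2048, 10))   # 20480, matching the logged n_ctx

# The later n_ctx = 245760 report with OLLAMA_NUM_PARALLEL=15 would be
# consistent with a per-request context of 16384 (an assumption here,
# not stated in the thread):
print(effective_n_ctx(16384, 15))  # 245760
```

So the trailing "extra zero" is just multiplication by a parallel count of 10, not digit corruption.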

Reference: github-starred/ollama#49191