[GH-ISSUE #14082] New context length based on VRAM size is a crude rule (ollama 0.15.5_rc3). DeepSeek R1 32B can no longer work well (32G VRAM). #9194

Closed
opened 2026-04-12 22:02:36 -05:00 by GiteaMirror · 6 comments

Originally created by @taozebra on GitHub (Feb 5, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/14082

What is the issue?

When I use the DeepSeek R1 32B model, the output is very slow and memory usage is abnormal on a 32G VRAM GPU. I think the new VRAM-based rule for choosing the context length is inappropriate.
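A quick way to confirm whether the model spilled out of VRAM, assuming a standard Linux install of the `ollama` CLI, is to check the load split while the model is running:

```shell
# While the model is loaded, show how it is split between GPU and CPU;
# a non-trivial CPU share would explain the slowdown:
ollama ps
```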

Relevant log output

(none provided)
OS

Linux

GPU

AMD

CPU

AMD

Ollama version

0.15.5rc3

GiteaMirror added the bug label 2026-04-12 22:02:36 -05:00

@taozebra commented on GitHub (Feb 5, 2026):

![Image](https://github.com/user-attachments/assets/82aba545-8db8-44f2-9fcb-80b625947a33)

When I limit the context length to 4096 for DeepSeek R1, the output speed is fine.

@rick-github commented on GitHub (Feb 5, 2026):

You can set `OLLAMA_CONTEXT_LENGTH` in the server environment; you don't need to modify the code. Or create a Modelfile that has `PARAMETER num_ctx 4096`.
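A minimal sketch of both options, assuming a systemd-managed Linux install; the model tag `deepseek-r1:32b` and the variant name `deepseek-r1-4k` are illustrative:

```shell
# Option 1: set a server-wide default context length.
# Add the variable to the service (sudo systemctl edit ollama.service):
#   [Service]
#   Environment="OLLAMA_CONTEXT_LENGTH=4096"
# then: sudo systemctl restart ollama

# Option 2: bake the limit into a model variant via a Modelfile.
cat > Modelfile <<'EOF'
FROM deepseek-r1:32b
PARAMETER num_ctx 4096
EOF
ollama create deepseek-r1-4k -f Modelfile
ollama run deepseek-r1-4k
```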


@taozebra commented on GitHub (Feb 5, 2026):

> You can set `OLLAMA_CONTEXT_LENGTH` in the server environment; you don't need to modify the code. Or create a Modelfile that has `PARAMETER num_ctx 4096`.

Most of the time I just use Ollama with the default settings, so when I updated to the newest version I ran into this issue. Thanks for your tips, they are useful. In fact, I can modify the Open WebUI model configuration to solve the problem (like a Modelfile). However, I still hope that Ollama can be used out of the box.


@jessegross commented on GitHub (Feb 5, 2026):

Can you please post your server logs? In theory, the VRAM usage should be pretty close to what you have.
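On a standard systemd-based Linux install (assuming the service is named `ollama`), the server logs can be pulled like this:

```shell
# Tail the Ollama server log; the memory-estimate lines printed at model
# load show how the context size and GPU/CPU split were chosen:
journalctl -u ollama --no-pager | tail -n 200
```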


@rick-github commented on GitHub (Feb 5, 2026):

Maybe it's the num_parallel issue.


@taozebra commented on GitHub (Feb 6, 2026):

> Maybe it's the num_parallel issue.

Yes, I set `OLLAMA_NUM_PARALLEL=2`.
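Worth noting, as an assumption about the cause rather than something confirmed in this thread: each parallel slot needs its own share of KV cache, so the context allocation scales with `OLLAMA_NUM_PARALLEL` and can push a 32B model past 32G of VRAM. A quick test is to drop back to a single slot:

```shell
# Run the server with one slot and a fixed context length; if output
# speed recovers, the extra KV-cache allocation was the culprit:
OLLAMA_NUM_PARALLEL=1 OLLAMA_CONTEXT_LENGTH=4096 ollama serve
```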

Reference: github-starred/ollama#9194