[GH-ISSUE #10762] Ollama keeps flushing Qwen3 from memory every prompt #7070

Closed
opened 2026-04-12 18:59:41 -05:00 by GiteaMirror · 2 comments
Owner

Originally created by @codecrafting-io on GitHub (May 18, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/10762

What is the issue?

Since the public launch of Qwen3 14B, I have an issue where the model is flushed from memory after every prompt, which I think is the main cause of this model's poor performance. I have tried several versions, from 0.6.6 to the latest 0.7.0. I'm using Ollama together with Open WebUI, but I don't see this behavior with other models, such as DeepSeek R1 and Gemma 3. I know there is a keep-alive setting, but my goal is not to keep the model in memory indefinitely; it should just stay in memory for at least a few minutes, as the other models do.
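
For context, a minimal sketch of how the keep-alive can be set, either per request via the API or server-wide with an environment variable (the model tag `qwen3:14b` and the 10-minute value are assumptions for illustration):

```shell
# Override the keep-alive for a single request
curl http://localhost:11434/api/generate -d '{
  "model": "qwen3:14b",
  "prompt": "Hello",
  "keep_alive": "10m"
}'

# Or set a server-wide default before starting the server
export OLLAMA_KEEP_ALIVE=10m
ollama serve
```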

I'm using an RTX 4060 Ti 16 GB and I'm not sure what's going on, but to date the issue is still happening.

Relevant log output


OS

Linux

GPU

Nvidia

CPU

AMD

Ollama version

0.7.0

GiteaMirror added the bug and needs more info labels 2026-04-12 18:59:41 -05:00
Author
Owner

@rick-github commented on GitHub (May 18, 2025):

[Server logs](https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#how-to-troubleshoot-issues) may aid in debugging.

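For a systemd-managed Linux install, the linked troubleshooting doc's approach amounts to reading the service journal (assuming Ollama runs as the `ollama` systemd service):

```shell
# Show recent Ollama server logs on a systemd-based Linux install
journalctl -e -u ollama

# Or follow the log live while reproducing the problem
journalctl -u ollama -f
```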
Author
Owner

@pdevine commented on GitHub (May 20, 2025):

When you use `ollama ps` after a prompt, what does it say?

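A quick way to check is to run the command (or its API equivalent) right after a prompt completes; the UNTIL column, or the `expires_at` field from the API, shows when the model is scheduled to be unloaded (normally a few minutes out by default):

```shell
# List currently loaded models and when they will be unloaded
ollama ps

# The same information over the HTTP API
curl http://localhost:11434/api/ps
```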