[GH-ISSUE #10709] Possible memory leakage #69096

Closed
opened 2026-05-04 17:08:32 -05:00 by GiteaMirror · 5 comments
Owner

Originally created by @nekiee13 on GitHub (May 14, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/10709

What is the issue?

The OS crashes while running a Python RAG-evaluation pipeline (using Ollama’s API) on a dual AMD EPYC 9124 machine running Windows 11 Pro with an NVIDIA RTX 4090; the Ollama version is 0.6.8.

The OpenAI API and LM Studio API run fine with the same script.

As for Ollama, it sustains load longer with some models (for example glm4:9b, granite3.3:latest, or gemma3:27b-it-qat, where I couldn't trigger a crash), crashes after a medium duration with others (qwen3:30b), and crashes very rapidly with llama3-chatqa:8b-v1.5-fp16.
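
For context, a minimal sketch of the kind of sustained-load loop the pipeline exercises, using Ollama's documented `/api/chat` endpoint on the default port; the prompt, iteration count, and timeout are illustrative placeholders, not the reporter's actual RAG code:

```python
# Sustained-load sketch against a local Ollama server (default port 11434).
# MODEL matches the fastest-crashing case reported above; the prompt and loop
# count are illustrative stand-ins for the real RAG-evaluation pipeline.
import requests

OLLAMA_URL = "http://localhost:11434/api/chat"
MODEL = "llama3-chatqa:8b-v1.5-fp16"

for i in range(1000):
    resp = requests.post(
        OLLAMA_URL,
        json={
            "model": MODEL,
            "messages": [{"role": "user", "content": f"Summarize document chunk {i}."}],
            "stream": False,
        },
        timeout=300,
    )
    resp.raise_for_status()
    print(i, resp.json()["message"]["content"][:80])
```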

Relevant log output

## Logs overview
### Python
Python logs are clear.

### **Win+R + eventvwr.msc**
* Information, Warnings, Errors - Event ID 88 - the system was hibernated due to a critical thermal event

- HWiNFO confirms the machine is not running extremely hot with the RAG pipeline. Besides, if the cause were thermal, the crashes would not be associated with the Ollama client only: I can train an LSTM at full GPU load for 24 hours without issues, and I can run CPU inference for hours (100k+ ctx) without issues.
- My guess is that it is not thermal hibernation.

### **Win+R + perfmon /rel**
Generic report:
* The previous system shutdown at 12:50:35 AM on 5/13/2025 was unexpected.
* The previous system shutdown at 1:03:55 AM on 5/13/2025 was unexpected.



### **Windows RADAR Event**
A RADAR_PRE_LEAK_64 event implicates `ollama.exe` in a memory leak, so I suspect the leak originates in the Ollama back-end rather than in the Python client or thermal hibernation.
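
One way to corroborate the RADAR finding independently of Windows telemetry is to sample `ollama.exe` memory while the pipeline runs: an RSS that climbs monotonically under steady load points at a leak. A minimal sketch, assuming the third-party `psutil` package (not part of the original report):

```python
# Sample the resident set size (RSS) of all running ollama processes every
# 10 seconds; run this alongside the RAG pipeline and watch the trend.
import time

import psutil  # third-party: pip install psutil

def ollama_rss_bytes():
    """Sum RSS across every process whose name starts with 'ollama'."""
    total = 0
    for p in psutil.process_iter(["name", "memory_info"]):
        name = p.info["name"] or ""
        if name.lower().startswith("ollama") and p.info["memory_info"]:
            total += p.info["memory_info"].rss
    return total

while True:
    print(f"{time.strftime('%H:%M:%S')}  ollama RSS: {ollama_rss_bytes() / 2**20:.1f} MiB")
    time.sleep(10)
```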

OS

Windows

GPU

Nvidia

CPU

AMD

Ollama version

0.6.8

GiteaMirror added the "bug" and "needs more info" labels 2026-05-04 17:08:33 -05:00
Author
Owner

@rick-github commented on GitHub (May 14, 2025):

There have been some memory related issues fixed since 0.6.8. Does [0.7.0](https://github.com/ollama/ollama/releases/tag/v0.7.0) still exhibit the problem?

Author
Owner

@Gregory-Colin commented on GitHub (May 15, 2025):

I'm currently on 0.6.8 and while the memory leaks have been substantially ameliorated, they do still seem to exist.

Author
Owner

@rick-github commented on GitHub (May 15, 2025):

There have been some memory related issues fixed since 0.6.8. Does [0.7.0](https://github.com/ollama/ollama/releases/tag/v0.7.0) still exhibit the problem?

Author
Owner

@nekiee13 commented on GitHub (May 15, 2025):

Ollama 0.7.0 seems to be more stable overall.

The only problematic LLM at the moment is llama3-chatqa:8b-v1.5-fp16. This one still causes an almost immediate crash/shutdown; there is something exceptionally unstable about that model. But it is no problem - I simply will not use it.
At the moment I don't have any problems with other LLMs.

I will report to you if I encounter similar issues with further testing.

Thank you for your efforts.

Edit:
It is still crashing. I will look into the server logs as proposed.

Author
Owner

@rick-github commented on GitHub (May 15, 2025):

I was unable to reproduce a crash with llama3-chatqa:8b-v1.5-fp16 on my test system. [Server logs](https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#how-to-troubleshoot-issues) may aid in debugging.

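For anyone gathering those logs on Windows, a small sketch that prints the tail of the server log; the `%LOCALAPPDATA%\Ollama\server.log` location follows the troubleshooting guide linked above and may differ on a custom install:

```python
# Print the last 50 lines of the Ollama server log on Windows. The default
# location comes from the Ollama troubleshooting guide; adjust if needed.
import os
from pathlib import Path

log_path = Path(os.environ["LOCALAPPDATA"]) / "Ollama" / "server.log"

with open(log_path, encoding="utf-8", errors="replace") as f:
    for line in f.readlines()[-50:]:
        print(line.rstrip())
```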