[GH-ISSUE #9395] When the model exceeds the VRAM and uses the CPU (E5 2673v4) for inference, the program becomes unresponsive. #6129

Closed
opened 2026-04-12 17:28:33 -05:00 by GiteaMirror · 1 comment
Owner

Originally created by @hendrymax on GitHub (Feb 27, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/9395

What is the issue?

When the model exceeds the VRAM and uses the CPU (E5 2673v4) for inference, the program becomes unresponsive.

Relevant log output


OS

No response

GPU

No response

CPU

No response

Ollama version

No response

GiteaMirror added the needs more info and bug labels 2026-04-12 17:28:33 -05:00
Author
Owner

@rick-github commented on GitHub (Feb 27, 2025):

CPU is slower than GPU, so the model will take longer to respond. If you provide [server logs](https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#how-to-troubleshoot-issues), more specific remedies can be suggested.
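
As a starting point, a minimal sketch of how to check whether the model spilled out of VRAM and how to capture the server logs the comment asks for. This assumes a Linux install running Ollama as a systemd service; on other platforms the log location differs (see the troubleshooting doc linked above).

```shell
# Show loaded models; the PROCESSOR column reports the GPU/CPU split,
# e.g. "100% GPU" versus something like "41%/59% CPU/GPU".
ollama ps

# Flag a partial CPU offload, which explains slow or seemingly hung responses.
ollama ps | grep -q 'CPU' && echo "model is partially offloaded to CPU"

# Capture server logs to attach to the issue (systemd service install).
journalctl -u ollama --no-pager > ollama-server.log
```

If `ollama ps` shows any CPU share, the model does not fit in VRAM and inference falls back to the much slower CPU path; a smaller model or quantization may avoid the offload.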


Reference: github-starred/ollama#6129