[GH-ISSUE #7576] num_ctx causes 100% CPU with no GPU usage #51340

Closed
opened 2026-04-28 19:35:35 -05:00 by GiteaMirror · 6 comments

Originally created by @aaronbolton on GitHub (Nov 8, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/7576

What is the issue?

I tried to create a new model recently with only the parameter num_ctx set. When I run the model it shows 100% CPU with no GPU usage; even if the model were too big for the GPU, I would expect it to report a GPU/CPU split such as 100%/???%.
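For reference, a minimal sketch of what such a model definition looks like; the base model name `llama3.1` and the tag `mymodel-32k` are illustrative and not taken from the issue (the 32K context value comes from the discussion below):

```sh
# Illustrative Modelfile: override only num_ctx on top of a base model.
cat > Modelfile <<'EOF'
FROM llama3.1
PARAMETER num_ctx 32768
EOF

ollama create mymodel-32k -f Modelfile
ollama run mymodel-32k
```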

OS

Docker

GPU

Nvidia

CPU

Intel

Ollama version

0.4.0

GiteaMirror added the bug, needs more info labels 2026-04-28 19:35:36 -05:00

@rick-github commented on GitHub (Nov 8, 2024):

#7509

[Server logs](https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#how-to-troubleshoot-issues) will help in debugging.

@aaronbolton commented on GitHub (Nov 10, 2024):

Here is a fairly clean log; the model is loaded at around line 47.

[ollama_logs.txt](https://github.com/user-attachments/files/17691317/ollama_logs.txt)

@rick-github commented on GitHub (Nov 10, 2024):

You have `OLLAMA_NUM_PARALLEL=16` and are setting `num_ctx` to 32K, so the runner needs 28G just for context. Combined with the model's other memory requirements, that just won't fit in the 2x15G you have on your GPUs, so the entire model is loaded in RAM. Try reducing `OLLAMA_NUM_PARALLEL` or `num_ctx`.
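A hedged sketch of the two fixes suggested above, assuming a stock `ollama/ollama` Docker deployment with the NVIDIA container toolkit; the container name, base model, and chosen values are illustrative:

```sh
# Option 1: lower parallelism so less context memory is reserved.
# 16 slots x 32,768 tokens = 524,288 tokens of KV cache to allocate;
# 4 slots x 32,768 tokens = 131,072 tokens, roughly a quarter of the memory.
docker run -d --gpus=all \
  -e OLLAMA_NUM_PARALLEL=4 \
  -v ollama:/root/.ollama -p 11434:11434 \
  --name ollama ollama/ollama

# Option 2: keep the parallelism and shrink num_ctx instead.
cat > Modelfile <<'EOF'
FROM llama3.1
PARAMETER num_ctx 8192
EOF
ollama create mymodel-8k -f Modelfile
```

Either change shrinks the product of parallel slots and per-slot context, which is what the 28G figure above is sized from.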

@aaronbolton commented on GitHub (Nov 12, 2024):

Perfect, thank you for the pointer and the informative answer.

@liaoweiguo commented on GitHub (Dec 29, 2024):

How do we get this detailed memory information? It's important to know.

@rick-github commented on GitHub (Dec 29, 2024):

[Server logs](https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#how-to-troubleshoot-issues). Look for `memory.required`.
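A minimal way to pull those lines out of a Docker deployment, assuming the container is named `ollama` (adjust to your setup):

```sh
# Show the memory accounting the server logged when loading the model.
docker logs ollama 2>&1 | grep "memory.required"
```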
Reference: github-starred/ollama#51340