[GH-ISSUE #9002] How to limit GPU memory usage for Ollama when running large models? #67905

Closed
opened 2026-05-04 12:00:04 -05:00 by GiteaMirror · 1 comment
Owner

Originally created by @lizheyong on GitHub (Feb 11, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/9002

Environment

  • OS: CentOS (offline environment)
  • Hardware: 2x NVIDIA A100 40GB GPUs
  • Model: Deepseek-r1-70B (43GB)
  • Current Configuration:
    • CUDA_VISIBLE_DEVICES=0,1
    • OLLAMA_MAX_LOADED_MODELS=1

Current Behavior

  • The model spans across both GPUs
  • Total GPU memory usage is around 80GB
  • Both GPUs are fully occupied

Expected Behavior

  • The model should use only part of the second GPU's memory
  • Reserve some memory on the second GPU for other deep learning tasks
  • Still maintain model functionality

What I've Tried

  • Checked Ollama documentation
  • Searched online for solutions
  • Modified ollama.service configuration
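
The service configuration change was along these lines (a sketch of a systemd drop-in, assuming a systemd-managed `ollama.service`; the file path is illustrative):

```
# /etc/systemd/system/ollama.service.d/override.conf
[Service]
Environment="CUDA_VISIBLE_DEVICES=0,1"
Environment="OLLAMA_MAX_LOADED_MODELS=1"
```

Applied with `sudo systemctl daemon-reload && sudo systemctl restart ollama`. Neither variable caps per-GPU memory, which is why the model still fills both cards.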

Question

Is there any way to effectively limit Ollama's GPU memory usage, particularly on the second GPU?

GiteaMirror added the question label 2026-05-04 12:00:05 -05:00

@rick-github commented on GitHub (Feb 11, 2025):

https://github.com/ollama/ollama/issues/8597#issuecomment-2614533288

Or reduce num_ctx.
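
A reduced context window can be set per model, for example via a Modelfile (sketch; the 4096 value is illustrative, and the model tag is assumed):

```
FROM deepseek-r1:70b
PARAMETER num_ctx 4096
```

Then build it with `ollama create deepseek-r1-small-ctx -f Modelfile`. A smaller `num_ctx` shrinks the KV cache, which is what frees VRAM; the weights themselves still need their full ~43GB.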


Reference: github-starred/ollama#67905