[GH-ISSUE #4599] Enable concurrency by default for AMD GPU on windows - requires ROCm v6.2 windows release #28647

Open
opened 2026-04-22 07:07:31 -05:00 by GiteaMirror · 3 comments

Originally created by @dhiltgen on GitHub (May 23, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/4599

Originally assigned to: @dhiltgen on GitHub.

What is the issue?

When running with model concurrency, the scheduler is unaware of WDDM KMD memory allocations backed by system memory and only looks at the GPU-reported memory usage, which can lead to loading too many models and layers, resulting in thrashing and poor performance.
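
For illustration only, here is a minimal, self-contained sketch (not Ollama's actual scheduler code) of the naive layer-fit check described above, which trusts only the GPU-reported free memory. All figures are made up; under WDDM the kernel-mode driver can back GPU allocations with shared system memory, so the reported number can overstate what actually fits:

```c
// Illustrative only: a naive fit check based solely on reported free VRAM.
#include <stdint.h>
#include <stdio.h>

int main(void) {
    // Hypothetical figures for illustration.
    uint64_t reported_free = 8ULL << 30;   // device reports 8 GiB "free"
    uint64_t per_layer     = 400ULL << 20; // ~400 MiB estimated per layer
    int total_layers       = 33;

    // Naive check: how many layers appear to fit based on the reported figure.
    int fit = (int)(reported_free / per_layer);
    if (fit > total_layers) fit = total_layers;

    printf("reported free suggests offloading %d of %d layers\n", fit, total_layers);
    // Allocations already spilled into shared system memory by the WDDM KMD
    // are not reflected in the reported figure, so this count can be too high.
    return 0;
}
```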

OS

Windows

GPU

Nvidia, AMD

CPU

Intel

Ollama version

No response

GiteaMirror added the bug, amd, windows labels 2026-04-22 07:07:31 -05:00

@dhiltgen commented on GitHub (Jun 4, 2024):

PR #4517 now includes an update to pull in the NVML library on Windows to query GPU VRAM consumption for CUDA cards, which should address the CUDA portion of this. I'm still trying to find the optimal strategy for Radeon cards.
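
As a rough sketch of the kind of query NVML makes possible (illustrative, not the actual Ollama integration; it assumes the NVML header and runtime, nvml.dll on Windows, are available):

```c
// Query per-device VRAM usage via NVML instead of relying only on the
// free-memory figure observed at model-load time.
#include <stdio.h>
#include <nvml.h>

int main(void) {
    if (nvmlInit() != NVML_SUCCESS) {
        fprintf(stderr, "failed to initialize NVML\n");
        return 1;
    }

    unsigned int count = 0;
    nvmlDeviceGetCount(&count);

    for (unsigned int i = 0; i < count; i++) {
        nvmlDevice_t dev;
        nvmlMemory_t mem;
        if (nvmlDeviceGetHandleByIndex(i, &dev) == NVML_SUCCESS &&
            nvmlDeviceGetMemoryInfo(dev, &mem) == NVML_SUCCESS) {
            printf("GPU %u: %llu MiB used / %llu MiB total\n", i,
                   (unsigned long long)(mem.used >> 20),
                   (unsigned long long)(mem.total >> 20));
        }
    }

    nvmlShutdown();
    return 0;
}
```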


@dhiltgen commented on GitHub (Jun 22, 2024):

This is resolved now for CUDA. ROCm will require a version bump to v6.2 but that hasn't been released yet by AMD.
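
For comparison, a minimal sketch of the analogous per-device VRAM usage query via the rocm_smi library; this uses the existing Linux ROCm SMI API, and it is an assumption here that the ROCm v6.2 Windows release will expose an equivalent query for Radeon cards:

```c
// Query per-device VRAM usage via ROCm SMI (rsmi_dev_memory_usage_get).
#include <stdio.h>
#include <rocm_smi/rocm_smi.h>

int main(void) {
    if (rsmi_init(0) != RSMI_STATUS_SUCCESS) {
        fprintf(stderr, "failed to initialize ROCm SMI\n");
        return 1;
    }

    uint32_t count = 0;
    rsmi_num_monitor_devices(&count);

    for (uint32_t i = 0; i < count; i++) {
        uint64_t used = 0, total = 0;
        if (rsmi_dev_memory_usage_get(i, RSMI_MEM_TYPE_VRAM, &used) == RSMI_STATUS_SUCCESS &&
            rsmi_dev_memory_total_get(i, RSMI_MEM_TYPE_VRAM, &total) == RSMI_STATUS_SUCCESS) {
            printf("GPU %u: %llu MiB used / %llu MiB total\n", i,
                   (unsigned long long)(used >> 20),
                   (unsigned long long)(total >> 20));
        }
    }

    rsmi_shut_down();
    return 0;
}
```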


@dhiltgen commented on GitHub (Jul 3, 2024):

I'm re-tasking this issue to track AMD specifically, since NVIDIA support has already merged to main and concurrency will be enabled for CUDA GPUs in the upcoming release.

Reference: github-starred/ollama#28647