[GH-ISSUE #7328] Performance degradation with 8B+ models on Windows Radeon #66711

Closed
opened 2026-05-04 07:55:34 -05:00 by GiteaMirror · 3 comments

Originally created by @7shi on GitHub (Oct 23, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/7328

Originally assigned to: @dhiltgen on GitHub.

What is the issue?

When running models 8B or larger on Windows with Radeon GPU, performance is slower than CPU-only mode, despite having sufficient VRAM available.

Environment:

  • OS: Windows 11 Home [10.0.22631]
  • CPU: AMD Ryzen 5 5600X 6-Core Processor
  • GPU: Radeon RX 7600 XT
  • VRAM: 16GB

Root Cause Investigation:

I've identified that this is caused by a HIP SDK behavior where memory allocations larger than 4GB are being redirected to shared GPU memory instead of using dedicated VRAM. I've reported this behavior to the HIP team here:

https://github.com/ROCm/HIP/issues/3644
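As a rough illustration of why the cutoff lands at about 8B parameters (not from the original report; the ~4.5 bits-per-weight figure is an assumed average for a typical 4-bit quantization such as Q4_K_M), the weight buffer of an 8B model alone already crosses the 4 GiB allocation threshold described above:

```python
# Sketch: estimate whether a quantized model's weight allocation
# crosses the ~4 GiB HIP threshold described above. Bits-per-weight
# is an assumption for a typical 4-bit quantization, not a measured value.

GIB = 1024 ** 3
HIP_THRESHOLD_BYTES = 4 * GIB  # allocations above this reportedly fall back to shared memory

def weight_bytes(params: float, bits_per_weight: float) -> float:
    """Approximate size in bytes of the quantized weights alone (no KV cache, no overhead)."""
    return params * bits_per_weight / 8

for params, label in [(7e9, "7B"), (8e9, "8B"), (70e9, "70B")]:
    size = weight_bytes(params, bits_per_weight=4.5)
    verdict = "exceeds" if size > HIP_THRESHOLD_BYTES else "fits under"
    print(f"{label}: ~{size / GIB:.1f} GiB -> {verdict} the 4 GiB threshold")
```

Under this assumption a 7B model stays just below the threshold (~3.7 GiB) while an 8B model lands above it (~4.2 GiB), which matches the "8B or larger" boundary reported here.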

Current Status:

As this is a HIP-level issue, improvement in model performance will depend on resolution from the HIP team. Creating this issue for visibility and to help others who might encounter similar performance degradation with large models on Windows Radeon setups.

Impact:

  • Models 8B and larger run slower than CPU-only mode
  • Available VRAM remains unused while slower shared memory is being utilized

![379019648-866210ef-4a2c-4525-9026-9f614e19694e](https://github.com/user-attachments/assets/07a1739a-3ee0-4b68-b515-b5ee6c0f4b6f)

OS

Windows

GPU

AMD

CPU

AMD

Ollama version

0.3.14

GiteaMirror added the bug label 2026-05-04 07:55:34 -05:00

@0TTA commented on GitHub (Oct 23, 2024):

Same issue here!

<!-- gh-comment-id:2431752823 -->

@7shi commented on GitHub (Oct 23, 2024):

I've read your report; this may be the same issue.
https://github.com/ollama/ollama/issues/7330

<!-- gh-comment-id:2431909332 -->

@dhiltgen commented on GitHub (Oct 23, 2024):

This looks like a dup of #7107 - the only workaround for now is to downgrade to a driver OLDER than 24.9.1 until 24.11 comes out which should have the fix.

<!-- gh-comment-id:2432832082 -->
Reference: github-starred/ollama#66711