[GH-ISSUE #7107] Adrenalin Edition 24.9.1/24.10.1 slow ollama performance #66570

Closed
opened 2026-05-04 07:27:17 -05:00 by GiteaMirror · 16 comments

Originally created by @skarabaraks on GitHub (Oct 6, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/7107

Originally assigned to: @dhiltgen on GitHub.

What is the issue?

Both Adrenalin Edition drivers (24.9.1 and 24.10.1) significantly slow Ollama performance on Windows. GPU acceleration appears to be disabled.

No issues with ollama on Adrenalin 24.8.1 (a slightly older driver).

My system
OS: Windows 11 24H2
GPU: RX 6800 XT
CPU: Ryzen 5900XT
RAM: 32 GB
Ollama version: latest

GiteaMirror added the amd, bug, performance, windows labels 2026-05-04 07:27:18 -05:00

@n-shkutov commented on GitHub (Oct 6, 2024):

I can confirm that I have encountered the same problem as the author.

OS: Windows 11
GPU: AMD Radeon RX 6800 XT
CPU: AMD Ryzen 7 5800X3D
Ollama version: 0.3.12
AMD Software: Adrenalin Edition 24.9.1

Previously an answer was generated in 1-2 s; now it takes about 1 s for every 1-2 words.

I downgraded to the previous version of Adrenalin Edition (24.8.1) and everything is fast again. Some change in this driver version caused the performance degradation, and it is not clear what exactly.


@rick-github commented on GitHub (Oct 6, 2024):

[Server logs](https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#how-to-troubleshoot-issues) will help in diagnosis. The most likely culprit is an increase in context size that is pushing part of the model into system RAM; see https://github.com/ollama/ollama/issues/7081.
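
For reference, a quick way to check this is to grep the server log for the offload summary. This is a minimal sketch assuming the default Windows log location described in the linked troubleshooting doc:

```powershell
# Inspect the Ollama server log for the layer-offload summary; if layers or
# large buffers land in the "CPU buffer", part of the model is in system RAM.
Get-Content "$env:LOCALAPPDATA\Ollama\server.log" |
    Select-String -Pattern "offloaded", "buffer size"
```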


@dhiltgen commented on GitHub (Oct 6, 2024):

Logs with `AMD_LOG_LEVEL=3` would confirm, but I believe this is a known regression in the latest AMD driver on Windows. AMD has root-caused the defect and hopes to have a fix in the 24.10.1 driver in ~mid October.

With ROCm logging enabled, you'll see the following in the server log:

```
:3:hip_memory.cpp           :615 : 1204004670473 us: [pid:10512 tid:0x15724] hipMalloc ( 00000042F92FBF88, 4357881856 )
:1:palresource.cpp          :1204: 1204004670756 us: [pid:10512 tid:0x15724] Failed PAL memory allocation!
:1:palresource.cpp          :1204: 1204004670800 us: [pid:10512 tid:0x15724] Failed PAL memory allocation!
:3:hip_memory.cpp           :617 : 1204005495608 us: [pid:10512 tid:0x15724] hipMalloc: Returned hipSuccess : 000000030ABF0000: duration: 825135 us
llm_load_tensors: offloading 32 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 33/33 layers to GPU
llm_load_tensors:      ROCm0 buffer size =  4156.00 MiB
llm_load_tensors:        CPU buffer size =   281.81 MiB
```

It's those `Failed PAL memory allocation` errors that cause ROCm to fall back to allocating shared memory instead of dedicated VRAM, which has a significant performance impact.

I'm not aware of a workaround with the newer driver at this time, so downgrading until 24.10.1 comes out and we confirm the fix is likely the best course of action for now.
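
For reference, capturing this on Windows can look something like the following — a sketch, assuming you quit the Ollama tray app first so a fresh foreground server picks up the variable:

```powershell
# Run the server in the foreground with ROCm debug logging enabled
# and capture the output to a file.
$env:AMD_LOG_LEVEL = "3"
ollama serve 2>&1 | Tee-Object -FilePath .\amd_log.txt

# In a second terminal, load a model (e.g. `ollama run <model> "hi"`),
# then search the capture for the failure signature:
Select-String -Path .\amd_log.txt -Pattern "Failed PAL memory allocation"
```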


@polaco1782 commented on GitHub (Oct 16, 2024):

I can confirm that too; downgraded to 24.8.1 and it works fine on my AMD Radeon RX 7800 XT.


@boessu commented on GitHub (Oct 17, 2024):

I can confirm that too: AMD RX 6800 XT, 24.9.1. That's a shame, as most 7b/8b models run better on these older cards than on team green's cards of the same age.
However, in my case it now seems even slower than if I were using the CPU...
Is there a "use CPU" switch in ollama? It would be useful in these situations (also for comparisons).


@dhiltgen commented on GitHub (Oct 17, 2024):

@boessu you can set `OLLAMA_LLM_LIBRARY=cpu_avx2` (assuming your CPU has AVX2 support) for the server to force CPU-based inference.
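
For reference, on Windows that looks something like the sketch below; `cpu_avx` and `cpu` as fallback names are my assumption based on Ollama's other CPU runner variants:

```powershell
# Quit the Ollama tray app, then start the server with CPU-only inference.
# cpu_avx2 assumes AVX2 support; older CPUs can try cpu_avx or plain cpu.
$env:OLLAMA_LLM_LIBRARY = "cpu_avx2"
ollama serve
```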


@skarabaraks commented on GitHub (Oct 18, 2024):

> Logs with `AMD_LOG_LEVEL=3` would confirm, but I believe this is a known regression in the latest AMD driver on Windows. AMD has root-caused the defect and hopes to have a fix in the 24.10.1 driver in ~mid October.
>
> With ROCm logging enabled, you'll see the following in the server log:
>
> ```
> :3:hip_memory.cpp           :615 : 1204004670473 us: [pid:10512 tid:0x15724] hipMalloc ( 00000042F92FBF88, 4357881856 )
> :1:palresource.cpp          :1204: 1204004670756 us: [pid:10512 tid:0x15724] Failed PAL memory allocation!
> :1:palresource.cpp          :1204: 1204004670800 us: [pid:10512 tid:0x15724] Failed PAL memory allocation!
> :3:hip_memory.cpp           :617 : 1204005495608 us: [pid:10512 tid:0x15724] hipMalloc: Returned hipSuccess : 000000030ABF0000: duration: 825135 us
> llm_load_tensors: offloading 32 repeating layers to GPU
> llm_load_tensors: offloading non-repeating layers to GPU
> llm_load_tensors: offloaded 33/33 layers to GPU
> llm_load_tensors:      ROCm0 buffer size =  4156.00 MiB
> llm_load_tensors:        CPU buffer size =   281.81 MiB
> ```
>
> It's those `Failed PAL memory allocation` errors that cause ROCm to fall back to allocating shared memory instead of dedicated VRAM, which has a significant performance impact.
>
> I'm not aware of a workaround with the newer driver at this time, so downgrading until 24.10.1 comes out and we confirm the fix is likely the best course of action for now.

Just updated to Adrenalin Edition 24.10.1 and the issue persists.


@dhiltgen commented on GitHub (Oct 18, 2024):

Unfortunately it looks like the fix did not make it in time for the 24.10.1 release. It should be in AMD's 24.11.1 release in mid November.


@boessu commented on GitHub (Oct 19, 2024):

@dhiltgen Thanks for the information about the new driver version. So we'll keep using 24.8.1 for AI purposes. I guess the title of the issue should be changed to "Adrenalin Edition 24.9.1/24.10.1 slow ollama performance".
Quite frankly, the issue should be kept open for documentation purposes until it is fixed in the AMD driver, even though you can't fix the problem in ollama itself. It was very helpful for me to find it here.


@unclemusclez commented on GitHub (Oct 22, 2024):

> @boessu you can set `OLLAMA_LLM_LIBRARY=cpu_avx2` (assuming your CPU has AVX2 support) for the server to force CPU-based inference.

@dhiltgen i think this is the opposite of what people are trying to accomplish.

at the moment AMD GPUs are being FORCED to use shared memory on Windows, which slows inference.

I have not tried this, but eliminating `cpu_avx2` and the other CPU environment variables and setting the appropriate ROCm devices instead might remedy the issue.


@dhiltgen commented on GitHub (Oct 22, 2024):

@unclemusclez the root cause is a driver or ROCm defect. Unfortunately I'm not aware of any workaround at the Ollama layer besides downgrading or waiting for the new driver release.


@7shi commented on GitHub (Oct 23, 2024):

@dhiltgen Thank you for guiding me here. Since I confirmed that this issue occurs with a simple `hipMalloc()`, I reported it to the HIP team.
https://github.com/ROCm/HIP/issues/3644
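
For reference, a reproduction along these lines can be as small as timing one large allocation. This is a sketch of the idea, built with hipcc; the exact test in the linked HIP issue may differ:

```cpp
// Minimal hipMalloc timing sketch. On affected drivers the allocation below
// logs "Failed PAL memory allocation!" and takes orders of magnitude longer.
#include <hip/hip_runtime.h>
#include <chrono>
#include <cstdio>

int main() {
    void* buf = nullptr;
    const size_t bytes = 4ull << 30;  // ~4 GiB, similar to the log above

    const auto start = std::chrono::steady_clock::now();
    hipError_t err = hipMalloc(&buf, bytes);
    const auto stop = std::chrono::steady_clock::now();

    const long long us =
        std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
    std::printf("hipMalloc: %s, %lld us\n", hipGetErrorString(err), us);

    if (err == hipSuccess) hipFree(buf);
    return 0;
}
```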


@7shi commented on GitHub (Dec 5, 2024):

Good news - the recent Adrenalin 24.12.1 update resolves this issue. I've verified it myself after updating.
https://github.com/ROCm/HIP/issues/3644#issuecomment-2520966522


@unclemusclez commented on GitHub (Dec 6, 2024):

> Good news - the recent Adrenalin 24.12.1 update resolves this issue. I've verified it myself after updating. [ROCm/HIP#3644 (comment)](https://github.com/ROCm/HIP/issues/3644#issuecomment-2520966522)

ROCm 6.2.3 was just released as part of the Windows/WSL 2 Drivers.
https://www.amd.com/en/resources/support-articles/release-notes/RN-RAD-WIN-24-12-1.html


@Vesper-Works commented on GitHub (Jan 28, 2025):

For anyone else who's currently trying to figure out why Ollama is slow even though you have the latest driver with the fix: I just disabled "AMD SmartAccess Memory" in the Smart Technology section of the Adrenalin software.


@dhiltgen commented on GitHub (Jul 1, 2025):

I think we can close this one out now.

Reference: github-starred/ollama#66570