[GH-ISSUE #4253] A repeatable hang issue on Linux - dual radeon #28413

Closed
opened 2026-04-22 06:35:34 -05:00 by GiteaMirror · 5 comments

Originally created by @eliranwong on GitHub (May 8, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/4253

Originally assigned to: @dhiltgen on GitHub.

### What is the issue?

I experience a hang consistently.

Device information:

  • OS: Ubuntu
  • CPU: AMD Ryzen Threadripper 7960X (24 cores, 48 threads, 4.2 GHz base, 5.3 GHz turbo)
  • Memory: 256 GB RAM
  • GPUs: 2× AMD RX 7900 XTX

To reproduce the hang issue:

  1. ollama run command-r-plus:104b
  2. Ask a question and get a response
  3. Ctrl+d to exit the session
  4. Ask a question and get a response
  5. Ctrl+d to exit the session
  6. ollama run llama:70b
  7. Ask a question and get a response
  8. Ctrl+d to exit the session
  9. ollama run command-r-plus:104b

Ollama hangs at step 9
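
The steps above can be scripted for convenience. The following is a hypothetical sketch, not part of the original report: it assumes the models are already pulled, reads steps 3-5 as a second one-question session with the same model, relies on `ollama run` accepting a prompt on stdin in non-interactive mode, and reproduces the `llama:70b` tag verbatim from step 6.

```bash
#!/usr/bin/env bash
# Hypothetical automation of the repro steps above (not from the
# original report). Each `ollama run` reads one prompt from stdin,
# prints the response, and exits, standing in for one ask/exit cycle.
set -e
for model in command-r-plus:104b command-r-plus:104b llama:70b command-r-plus:104b; do
    echo ">>> loading $model"
    echo "Why is the sky blue?" | ollama run "$model"
done
# The final load of command-r-plus:104b is where the hang is reported.
```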

### OS

Linux

### GPU

AMD

### CPU

AMD

### Ollama version

0.1.34

GiteaMirror added the amd and bug labels 2026-04-22 06:35:34 -05:00

@eliranwong commented on GitHub (May 8, 2024):

Remarks:
I also tried swapping between the same model files several times with llama.cpp; they loaded perfectly every time.

It looks to me like the hang in ollama may be due to memory management.
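
For context, a direct llama.cpp load of the same model file looks roughly like the following. This is a hypothetical invocation (the binary name, model path, and prompt are placeholders; `-m`, `-ngl`, and `-p` are standard llama.cpp flags, with `-ngl 99` offloading all layers to the GPUs):

```bash
# Hypothetical llama.cpp invocation; binary name, model path, and
# prompt are placeholders. -ngl 99 offloads (up to) 99 layers to GPU.
./main -m /path/to/command-r-plus-104b.gguf -ngl 99 -p "Why is the sky blue?"
```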


@dhiltgen commented on GitHub (May 8, 2024):

While trying to repro this, I saw a few potentially related failures.

  • It seems that the free-memory reporting from the amd driver can be slow to update; sometimes, even after the prior model is unloaded and the process exits, the GPUs still report very low available memory, which triggers the next model to be loaded onto the CPU.
  • At least once I saw it run out of memory (so the GPU was over-reporting available memory, I think).

My suspicion is that you hit that OOM scenario when you saw it hang. Usually llama.cpp aborts on out-of-memory errors and exits the process, which we do detect, but occasionally I think it gets stuck and doesn't abort; in our current code that requires a 10-minute timer to expire before we detect it as stuck.
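
The detection pattern described here (wait for the runner to exit on its own, fall back to a long timeout if it wedges) can be sketched in shell. This is an illustration of the pattern only, not Ollama's actual code; `./llama-runner` is a hypothetical stand-in for the subprocess:

```bash
#!/usr/bin/env bash
# Sketch of the stuck-runner detection described above; not Ollama's
# actual implementation. ./llama-runner is a hypothetical subprocess
# that should abort and exit on an out-of-memory error.
./llama-runner &
pid=$!

# `tail --pid` (GNU coreutils) blocks until the process dies; `timeout`
# bounds the wait at 10 minutes, matching the timer mentioned above.
if timeout 600 tail --pid="$pid" -f /dev/null; then
    echo "runner exited on its own (e.g. aborted on OOM)"
else
    echo "runner still alive after 10 minutes; treating it as stuck"
    kill -9 "$pid"
fi
```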


@dhiltgen commented on GitHub (May 9, 2024):

#4294 should resolve the hang, but I still need to work on a solution to the lag in the amd driver reporting available VRAM.


@dhiltgen commented on GitHub (May 9, 2024):

It looks like it can take over a second for the free memory reported by the amd driver to converge after the process exits. I'll get a PR up that adds a shutdown check which loops until the free memory converges before we start loading the next model; that should fix the other part of this issue.
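
The convergence loop can be illustrated against the amdgpu driver's sysfs counters. This is a minimal sketch under stated assumptions, not the actual PR: it assumes a single GPU exposed at `card0` and treats two identical consecutive readings as "converged":

```bash
#!/usr/bin/env bash
# Hypothetical convergence check, not the actual PR. Polls the amdgpu
# driver's reported VRAM usage (bytes) until two consecutive readings
# match, i.e. the value has stopped changing after the runner exited.
used=/sys/class/drm/card0/device/mem_info_vram_used
prev=-1
while :; do
    cur=$(cat "$used")
    if [ "$cur" = "$prev" ]; then
        break
    fi
    prev=$cur
    sleep 0.25
done
echo "VRAM usage settled at $cur bytes"
```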


@eliranwong commented on GitHub (May 9, 2024):

Thanks a lot
