[GH-ISSUE #12512] Bug: GPU Hang (100% Utilization) on ROCm After Model Reload via /load Command #70367

Closed
opened 2026-05-04 21:17:20 -05:00 by GiteaMirror · 11 comments
Owner

Originally created by @926nathant on GitHub (Oct 6, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/12512

What is the issue?

System Information:

Host OS: Ubuntu (Linux)

Hardware: AMD Ryzen AI CPU / AMD Radeon GPU (Device ID 0x1586 as per rocm-smi)

AMD Driver/Platform: ROCm (tested with containers using pytorch/rocm:6.0 base)

Ollama Version: Latest Docker image as of 2025-10-07 (ollama/ollama)

Execution Environment: Docker

Problem Description:

When running Ollama on an AMD GPU with the ROCm backend, the GPU enters a silent, unrecoverable hang state after a model is unloaded and a new one is loaded. This issue has been reproduced consistently and appears to be caused by an improper cleanup of the GPU's memory or computational state during the model-switching process.

The hang manifests as 100% GPU utilization, but with a significant drop in power consumption and temperature, indicating the GPU is stuck in a simple, non-computational infinite loop. Once in this state, the model outputs garbage (e.g., GGGGGG... or #######...) until the Ollama container is restarted.

Steps to Reproduce:

The bug can be reproduced reliably with no external tools, using only the Ollama command-line interface inside the container:

Start the Ollama container:
docker run -d --gpus=all --name stanley-ollama ollama/ollama

Connect to the container's CLI:
docker exec -it stanley-ollama ollama run <model_A>
(e.g., phi3)

The first model (model_A) loads and functions correctly. Have a brief, successful conversation.

Use the /load command to switch to a different model:
/load <model_B>
(e.g., deepseek-r1:1.5b)

The second model (model_B) loads. The first 1-3 inference attempts may succeed.

After a few prompts, the model will begin to output a stream of garbage characters. The hang is now active.
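A minimal scripted variant of the same switch is sketched below (model names, container name, and prompts are just the examples from the steps above); alternating models with non-interactive `ollama run` calls exercises the same unload/load path as the interactive `/load` command:

```
# Warm up the first model (model_A) with a short, successful exchange.
docker exec -it stanley-ollama ollama run phi3 "Say hello in one short sentence."

# Switch to the second model (model_B) and send several prompts;
# garbage output (GGGG... or ####...) typically appears after a few turns.
for i in 1 2 3 4 5; do
  docker exec -it stanley-ollama ollama run deepseek-r1:1.5b "Reply briefly to prompt number $i."
done

# In a second terminal on the host, watch for the 100% GPU% / low-power signature.
watch -n 1 rocm-smi
```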

Expected Behavior:

The new model (model_B) should load into a clean state and function correctly for the duration of its use, regardless of how many models were loaded previously.

Actual Behavior:

The Ollama process corrupts the GPU state during the model switch. The GPU locks at 100% utilization, and all subsequent inference attempts produce garbage output until the service is restarted.
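The only recovery observed is restarting the Ollama service or container, for example (container and service names as used elsewhere in this report):

```
# Docker setup from the reproduction steps:
docker restart stanley-ollama

# Native install managed by systemd:
sudo systemctl restart ollama
```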

Relevant log output

1. rocm-smi output during the hang:
Note the 100% GPU utilization coinciding with a low power draw (33W) and low temperature (48°C), which is characteristic of a hang rather than a heavy workload.
======================================== ROCm System Management Interface ========================================
Device  Node  IDs              Temp    Power     Partitions          SCLK  MCLK  Fan  Perf  PwrCap  VRAM%  GPU%
==================================================================================================================
0       1     0x1586,   40251  48.0°C  33.043W   N/A, N/A, 0         N/A   N/A   0%   auto  N/A     18%    100%
==================================================================================================================

  

2. Kernel Logs (dmesg):
Host kernel logs show no amdgpu driver crashes, page faults, or GPU reset events during the hang. The issue appears to be entirely in the user-space libraries or the application.
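For anyone trying to reproduce this, the hang signature can be watched live from the host with standard tools (nothing Ollama-specific is assumed here):

```
# GPU utilization / power / temperature, refreshed every second
watch -n 1 rocm-smi

# Follow the kernel log for amdgpu / KFD events while the hang develops
sudo dmesg -wT | grep -iE 'amdgpu|kfd|hsa'
```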

3. Terminal Session Log:
(Example of the failure as seen in Open WebUI and in the terminal, below)
[openwebui output]
hello 
profile
 Arena Model
  Hello! How can I assist you today? 
  What kind of assistance are you looking for? 
profile
 Arena Model
  I'm a general-purpose conversational AI, so I don't have a specific task in mind. You can ask me anything that's on your mind, and I'll do my best to help or provide information. 

Here are some examples of things we could talk about: 

    General knowledge: history, science, technology, culture, etc.
    Language practice: conversation, vocabulary building, grammar correction, etc.
    Problem-solving: brainstorming ideas, exploring possibilities, etc.
    Creative writing: generating ideas, writing prompts, feedback, etc.
    Fun and games: wordplay, trivia, games like "Would you rather...", etc.
     
Or maybe you just want to chat about your day or interests? I'm happy to listen! 
    
 
(Userinput:)
  I’m interested in your interestes 
 
profile
 Arena Model
  As a conversational AI, I don't have personal interests or experiences like humans do. My purpose is to assist and provide information to users, so I don't have hobbies, preferences, or opinions. 

However, my training data includes a vast amount of text from the internet, books, and other sources, which allows me to understand and respond to a wide range of topics and questions. 

If you'd like to know more about how I was trained or what kind of information I can provide on a particular topic, feel free to ask! 
    
 
(Userinput:)
  what if you were really bad at speaking German 
 
 
profile
 Arena Model
  As an artificial intelligence developed by Microsoft, my goal is always to communicate effectively and accurately. While natural language processing has its limitations, including occasional errors or misunderstandings in some languages due to dialects and complex sentence structures, I strive for high-quality interactions across many different forms of communication—including German. 

If you are not satisfied with my proficiency at any time: 

    Feedback is always appreciated as it helps improve AI services like me over time!
    You can correct or suggest better responses, and I learn from this input to provide improved interactions in the future.
     

    
 
(Userinput:)
  a;lsd 
 
 
profile
 Arena Model
  Today at 1:07 AM
  "############################### 
[end openwebui output]
[terminal output]
... As an artificial intelligence developed by Microsoft, my goal is always to communicate effectively and accurately. While natural language processing has it
... s limitations, including occasional errors or misunderstandings in some languages due to dialects and complex sentence structures, I strive for high-qualit
... y interactions across many different forms of communication—including German. 
... 
... If you are not satisfied with my proficiency at any time: 
... 
...     Feedback is always appreciated as it helps improve AI services like me over time!
...     You can correct or suggest better responses, and I learn from this input to provide improved interactions in the future.
...      
... 
...     
...  
... 
... a;lsd 
...  
...  
... profile
... Arena Model
...    
... 
... "############################### 
...     
... 
... Above is the typical conversation. Real answers, then a long string of # or the capital letter "G".
GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG

>>> yes! That!
GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG

>>> Send a message (/? for help)
[end terminal output]

OS

Docker

GPU

AMD

CPU

AMD

Ollama version

0.12.3

GiteaMirror added the bug label 2026-05-04 21:17:20 -05:00
Author
Owner

@rick-github commented on GitHub (Oct 6, 2025):

[Server log](https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#how-to-troubleshoot-issues) may aid in debugging.

Author
Owner

@926nathant commented on GitHub (Oct 6, 2025):

[ollama log.txt](https://github.com/user-attachments/files/22727516/ollama.log.txt)
[ollama chat in openwebui.txt](https://github.com/user-attachments/files/22727517/ollama.chat.in.openwebui.txt)

Author
Owner

@rick-github commented on GitHub (Oct 6, 2025):

Full [server log](https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#how-to-troubleshoot-issues) may aid in debugging.

Author
Owner

@926nathant commented on GitHub (Oct 6, 2025):

Was able to recreate with native Ollama on Ubuntu 24.04...
see log.

[ollama.log](https://github.com/user-attachments/files/22731882/ollama.log)

Author
Owner

@rick-github commented on GitHub (Oct 6, 2025):

The log shows two models being used, llama3:8b-instruct-q4_0 and deepseek-r1:1.5b-qwen-distill-q4_K_M. The llama model is loaded, does a few chat requests, and then there's a GPU error:

Oct 06 13:34:11 MagicMax ollama[31080]: time=2025-10-06T13:34:11.980-04:00 level=INFO source=server.go:399 msg="starting runner" cmd="/usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-6a0746a1ec1aef3e7ec53868f220ff6e389f6f8ef87a01d77c96807de94ca2aa --port 42453"
Oct 06 13:34:13 MagicMax ollama[31080]: [GIN] 2025/10/06 - 13:34:13 | 200 |  1.671145967s |       127.0.0.1 | POST     "/api/generate"
Oct 06 13:34:25 MagicMax ollama[31080]: [GIN] 2025/10/06 - 13:34:25 | 200 |  2.528967921s |       127.0.0.1 | POST     "/api/chat"
Oct 06 13:35:58 MagicMax ollama[31080]: [GIN] 2025/10/06 - 13:35:58 | 200 |  7.797980106s |       127.0.0.1 | POST     "/api/chat"
Oct 06 13:38:01 MagicMax ollama[31080]: [GIN] 2025/10/06 - 13:38:01 | 200 |  8.493366225s |       127.0.0.1 | POST     "/api/chat"
Oct 06 13:39:18 MagicMax ollama[31080]: [GIN] 2025/10/06 - 13:39:18 | 200 |  8.661055958s |       127.0.0.1 | POST     "/api/chat"
Oct 06 13:40:06 MagicMax ollama[31080]: [GIN] 2025/10/06 - 13:40:06 | 200 |   8.65009734s |       127.0.0.1 | POST     "/api/chat"
Oct 06 13:40:43 MagicMax ollama[31080]: HW Exception by GPU node-1 (Agent handle: 0x556363024420) reason :GPU Hang

The llama model is reloaded a few minutes later, and then the deepseek model is loaded and does 12 requests, after which there's a quiet period and deepseek is loaded for one final request.

Oct 06 13:44:49 MagicMax ollama[31080]: time=2025-10-06T13:44:49.629-04:00 level=INFO source=server.go:399 msg="starting runner" cmd="/usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-6a0746a1ec1aef3e7ec53868f220ff6e389f6f8ef87a01d77c96807de94ca2aa --port 42349"
Oct 06 13:44:51 MagicMax ollama[31080]: [GIN] 2025/10/06 - 13:44:51 | 200 |  2.695380217s |       127.0.0.1 | POST     "/api/generate"
Oct 06 13:45:10 MagicMax ollama[31080]: [GIN] 2025/10/06 - 13:45:10 | 200 |  6.323452877s |       127.0.0.1 | POST     "/api/chat"
Oct 06 13:45:34 MagicMax ollama[31080]: [GIN] 2025/10/06 - 13:45:34 | 200 |      38.231µs |       127.0.0.1 | HEAD     "/"
Oct 06 13:45:34 MagicMax ollama[31080]: [GIN] 2025/10/06 - 13:45:34 | 404 |      293.43µs |       127.0.0.1 | POST     "/api/show"
Oct 06 13:45:56 MagicMax ollama[31080]: [GIN] 2025/10/06 - 13:45:56 | 200 | 22.875832301s |       127.0.0.1 | POST     "/api/pull"
Oct 06 13:45:56 MagicMax ollama[31080]: [GIN] 2025/10/06 - 13:45:56 | 200 |   25.935859ms |       127.0.0.1 | POST     "/api/show"
Oct 06 13:45:57 MagicMax ollama[31080]: time=2025-10-06T13:45:57.163-04:00 level=INFO source=server.go:399 msg="starting runner" cmd="/usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-aabd4debf0c8f08881923f2c25fc0fdeed24435271c2b3e92c4af36704040dbc --port 38973"
Oct 06 13:45:58 MagicMax ollama[31080]: [GIN] 2025/10/06 - 13:45:58 | 200 |  1.362617748s |       127.0.0.1 | POST     "/api/generate"
Oct 06 13:46:21 MagicMax ollama[31080]: [GIN] 2025/10/06 - 13:46:21 | 200 |  536.541653ms |       127.0.0.1 | POST     "/api/chat"
Oct 06 13:46:43 MagicMax ollama[31080]: [GIN] 2025/10/06 - 13:46:43 | 200 |  5.199556192s |       127.0.0.1 | POST     "/api/chat"
Oct 06 13:47:41 MagicMax ollama[31080]: [GIN] 2025/10/06 - 13:47:41 | 200 |  6.931446439s |       127.0.0.1 | POST     "/api/chat"
Oct 06 13:48:28 MagicMax ollama[31080]: [GIN] 2025/10/06 - 13:48:28 | 200 |  4.637018195s |       127.0.0.1 | POST     "/api/chat"
Oct 06 13:48:47 MagicMax ollama[31080]: [GIN] 2025/10/06 - 13:48:47 | 200 |  404.261863ms |       127.0.0.1 | POST     "/api/chat"
Oct 06 13:49:04 MagicMax ollama[31080]: [GIN] 2025/10/06 - 13:49:04 | 200 |  424.644609ms |       127.0.0.1 | POST     "/api/chat"
Oct 06 13:49:08 MagicMax ollama[31080]: [GIN] 2025/10/06 - 13:49:08 | 200 |  254.597441ms |       127.0.0.1 | POST     "/api/chat"
Oct 06 13:49:20 MagicMax ollama[31080]: [GIN] 2025/10/06 - 13:49:20 | 200 |  3.255730187s |       127.0.0.1 | POST     "/api/chat"
Oct 06 13:49:57 MagicMax ollama[31080]: [GIN] 2025/10/06 - 13:49:57 | 200 |  4.074961155s |       127.0.0.1 | POST     "/api/chat"
Oct 06 13:50:42 MagicMax ollama[31080]: [GIN] 2025/10/06 - 13:50:42 | 200 |  4.979097927s |       127.0.0.1 | POST     "/api/chat"
Oct 06 13:54:21 MagicMax ollama[31080]: [GIN] 2025/10/06 - 13:54:21 | 200 |  8.476761648s |       127.0.0.1 | POST     "/api/chat"
Oct 06 13:54:57 MagicMax ollama[31080]: [GIN] 2025/10/06 - 13:54:57 | 200 | 10.229929529s |       127.0.0.1 | POST     "/api/chat"
Oct 06 14:21:57 MagicMax ollama[31080]: time=2025-10-06T14:21:57.005-04:00 level=INFO source=server.go:399 msg="starting runner" cmd="/usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-aabd4debf0c8f08881923f2c25fc0fdeed24435271c2b3e92c4af36704040dbc --port 37847"
Oct 06 14:21:59 MagicMax ollama[31080]: [GIN] 2025/10/06 - 14:21:59 | 200 |  2.263976261s |       127.0.0.1 | POST     "/api/chat"

Other than the hang (currently tracked in https://github.com/ollama/ollama/issues/12472) there's not much of a problem shown here. The requests are not taking a long time to run and there are no other error messages. If you set `OLLAMA_DEBUG=1` in the server environment there might be more useful information logged.
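
For reference, the debug flag can be set roughly as follows for the native systemd install and for the Docker setup used earlier in this issue (the docker flags mirror the original reproduction command):

```
# systemd (native install): add an override with the variable, then restart
sudo systemctl edit ollama.service     # add under [Service]:  Environment="OLLAMA_DEBUG=1"
sudo systemctl restart ollama
journalctl -u ollama -f                # follow the server log

# Docker (flags mirroring the original reproduction command):
docker run -d --gpus=all -e OLLAMA_DEBUG=1 --name stanley-ollama ollama/ollama
docker logs -f stanley-ollama
```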

Author
Owner

@926nathant commented on GitHub (Oct 7, 2025):

yes, that was what I was seeing (not much). I'll set the debug tonight and recreate.

Author
Owner

@926nathant commented on GitHub (Oct 7, 2025):

I should clarify: in my memory the llama3 model was difficult to get the error to show with, but then, to my surprise (I had assumed it was a Docker thing), I managed to recreate it with llama3. Then, because it happens much more quickly with smaller models in Docker (like deepseek-r1:1.5b and other sub-4B models), I decided to load the larger deepseek model. It hallucinated pretty hard, pretty quickly, before it broke into the "GGGGGGGGGGGGGGGGGGGG" output.

Author
Owner

@926nathant commented on GitHub (Oct 8, 2025):

Chat log and Ollama log attached. This seems very closely tied to context switching; it often happens when I'm changing the topic or getting weird in the conversation. In this chat I used deepseek-r1:1.5b because it's usually one of the first to break. I had a lot of trouble getting it to break by asking it questions about this issue, so I changed the topic to peanut butter sandwiches. Its thinking output shows it didn't catch the topic change at all, or wasn't trying to change the context. In the next response, when I pointed out that it had failed to talk about peanut butter sandwiches, it broke into the GGGGGGGGGGGGGGGGGGGG output.

[ntennant MagicMax ollama chat-3.txt](https://github.com/user-attachments/files/22757005/ntennant.MagicMax.ollama.chat-3.txt)
[ntennant MagicMax OLLAMA log -3.txt](https://github.com/user-attachments/files/22757004/ntennant.MagicMax.OLLAMA.log.-3.txt)

Author
Owner

@926nathant commented on GitHub (Oct 10, 2025):

More logs attached. Here I attempted to bypass the GPU and run on CPU only:

MagicMax:~/Downloads/gits$ systemctl stop ollama
MagicMax:~/Downloads/gits$ ROCR_VISIBLE_DEVICES="" OLLAMA_DEBUG=1 ollama serve

[ollama-4.txt](https://github.com/user-attachments/files/22858459/ntennant.MagicMax.ollama-4.txt)
[ollama chat 4.txt](https://github.com/user-attachments/files/22858458/ntennant.MagicMax.ollama.chat.4.txt)

Author
Owner

@926nathant commented on GitHub (Oct 10, 2025):

I used the CPU only and could not recreate the issue.
Stopped Ollama Completely:

systemctl stop ollama
# Double-check that no ollama processes are running
ps aux | grep ollama

Temporarily Hid the ROCm Library:
sudo mv /usr/local/lib/ollama/rocm /usr/local/lib/ollama/rocm_temp

Ran Ollama in Debug Mode:

OLLAMA_DEBUG=1 ollama serve

[MagicMax ollama 5.txt](https://github.com/user-attachments/files/22858616/MagicMax.ollama.5.txt)

Author
Owner

@926nathant commented on GitHub (Oct 29, 2025):

Closing. Fixed with: https://github.com/ROCm/ROCm/issues/5499. Updating Ollama from 0.12.3 to 0.12.6 was the last step.

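For anyone landing here later, checking and updating the installed version goes roughly like this (the install script is the standard update path for a native Linux install; Docker users pull a newer image instead):

```
ollama -v                                      # confirm the running version
curl -fsSL https://ollama.com/install.sh | sh  # update a native Linux install
docker pull ollama/ollama                      # or refresh the Docker image
```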
Reference: github-starred/ollama#70367