[GH-ISSUE #12512] Bug: GPU Hang (100% Utilization) on ROCm After Model Reload via /load Command #70367

Closed
opened 2026-05-04 21:17:20 -05:00 by GiteaMirror · 11 comments
Owner

Originally created by @926nathant on GitHub (Oct 6, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/12512

What is the issue?

System Information:

Host OS: Ubuntu (Linux)

Hardware: AMD Ryzen AI CPU / AMD Radeon GPU (Device ID 0x1586 as per rocm-smi)

AMD Driver/Platform: ROCm (tested with containers using pytorch/rocm:6.0 base)

Ollama Version: Latest Docker image as of 2025-10-07 (ollama/ollama)

Execution Environment: Docker

Problem Description:

When running Ollama on an AMD GPU with the ROCm backend, the GPU enters a silent, unrecoverable hang state after a model is unloaded and a new one is loaded. This issue has been reproduced consistently and appears to be caused by an improper cleanup of the GPU's memory or computational state during the model-switching process.

The hang manifests as 100% GPU utilization, but with a significant drop in power consumption and temperature, indicating the GPU is stuck in a simple, non-computational infinite loop. Once in this state, the model outputs garbage (e.g., GGGGGG... or #######...) until the Ollama container is restarted.

Steps to Reproduce:

The bug can be reproduced reliably with no external tools, using only the Ollama command-line interface inside the container:

Start the Ollama container:
docker run -d --gpus=all --name stanley-ollama ollama/ollama

Connect to the container's CLI:
docker exec -it stanley-ollama ollama run <model_A>
(e.g., phi3)

The first model (model_A) loads and functions correctly. Have a brief, successful conversation.

Use the /load command to switch to a different model:
/load <model_B>
(e.g., deepseek-r1:1.5b)

The second model (model_B) loads. The first 1-3 inference attempts may succeed.

After a few prompts, the model will begin to output a stream of garbage characters. The hang is now active.
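A minimal scripted variant of the same switch is sketched below (model names, container name, and prompts are just the examples from the steps above); alternating models with non-interactive `ollama run` calls exercises the same unload/load path as the interactive `/load` command:

```
# Warm up the first model (model_A) with a short, successful exchange.
docker exec -it stanley-ollama ollama run phi3 "Say hello in one short sentence."

# Switch to the second model (model_B) and send several prompts;
# garbage output (GGGG... or ####...) typically appears after a few turns.
for i in 1 2 3 4 5; do
  docker exec -it stanley-ollama ollama run deepseek-r1:1.5b "Reply briefly to prompt number $i."
done

# In a second terminal on the host, watch for the 100% GPU% / low-power signature.
watch -n 1 rocm-smi
```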

Expected Behavior:

The new model (model_B) should load into a clean state and function correctly for the duration of its use, regardless of how many models were loaded previously.

Actual Behavior:

The Ollama process corrupts the GPU state during the model switch. The GPU locks at 100% utilization, and all subsequent inference attempts produce garbage output until the service is restarted.
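The only recovery observed is restarting the Ollama service or container, for example (container and service names as used elsewhere in this report):

```
# Docker setup from the reproduction steps:
docker restart stanley-ollama

# Native install managed by systemd:
sudo systemctl restart ollama
```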

Relevant log output

1. rocm-smi output during the hang:
Note the 100% GPU utilization coinciding with a low power draw (33W) and low temperature (48°C), which is characteristic of a hang rather than a heavy workload.
======================================== ROCm System Management Interface ========================================
Device  Node  IDs              Temp    Power     Partitions          SCLK  MCLK  Fan  Perf  PwrCap  VRAM%  GPU%
==================================================================================================================
0       1     0x1586,   40251  48.0°C  33.043W   N/A, N/A, 0         N/A   N/A   0%   auto  N/A     18%    100%
==================================================================================================================

  

2. Kernel Logs (dmesg):
Host kernel logs show no amdgpu driver crashes, page faults, or GPU reset events during the hang. The issue appears to be entirely in the user-space libraries or the application.
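For anyone trying to reproduce this, the hang signature can be watched live from the host with standard tools (nothing Ollama-specific is assumed here):

```
# GPU utilization / power / temperature, refreshed every second
watch -n 1 rocm-smi

# Follow the kernel log for amdgpu / KFD events while the hang develops
sudo dmesg -wT | grep -iE 'amdgpu|kfd|hsa'
```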

3. Terminal Session Log:
(Example of the failure as seen in Open WebUI and in the terminal, below)
[openwebui output]
hello 
profile
 Arena Model
  Hello! How can I assist you today? 
  What kind of assistance are you looking for? 
profile
 Arena Model
  I'm a general-purpose conversational AI, so I don't have a specific task in mind. You can ask me anything that's on your mind, and I'll do my best to help or provide information. 

Here are some examples of things we could talk about: 

    General knowledge: history, science, technology, culture, etc.
    Language practice: conversation, vocabulary building, grammar correction, etc.
    Problem-solving: brainstorming ideas, exploring possibilities, etc.
    Creative writing: generating ideas, writing prompts, feedback, etc.
    Fun and games: wordplay, trivia, games like "Would you rather...", etc.
     
Or maybe you just want to chat about your day or interests? I'm happy to listen! 
    
 
(Userinput:)
  I’m interested in your interestes 
 
profile
 Arena Model
  As a conversational AI, I don't have personal interests or experiences like humans do. My purpose is to assist and provide information to users, so I don't have hobbies, preferences, or opinions. 

However, my training data includes a vast amount of text from the internet, books, and other sources, which allows me to understand and respond to a wide range of topics and questions. 

If you'd like to know more about how I was trained or what kind of information I can provide on a particular topic, feel free to ask! 
    
 
(Userinput:)
  what if you were really bad at speaking German 
 
 
profile
 Arena Model
  As an artificial intelligence developed by Microsoft, my goal is always to communicate effectively and accurately. While natural language processing has its limitations, including occasional errors or misunderstandings in some languages due to dialects and complex sentence structures, I strive for high-quality interactions across many different forms of communication—including German. 

If you are not satisfied with my proficiency at any time: 

    Feedback is always appreciated as it helps improve AI services like me over time!
    You can correct or suggest better responses, and I learn from this input to provide improved interactions in the future.
     

    
 
(Userinput:)
  a;lsd 
 
 
profile
 Arena Model
  Today at 1:07 AM
  "############################### 
[end openwebui output]
[terminal output]
... As an artificial intelligence developed by Microsoft, my goal is always to communicate effectively and accurately. While natural language processing has it
... s limitations, including occasional errors or misunderstandings in some languages due to dialects and complex sentence structures, I strive for high-qualit
... y interactions across many different forms of communication—including German. 
... 
... If you are not satisfied with my proficiency at any time: 
... 
...     Feedback is always appreciated as it helps improve AI services like me over time!
...     You can correct or suggest better responses, and I learn from this input to provide improved interactions in the future.
...      
... 
...     
...  
... 
... a;lsd 
...  
...  
... profile
... Arena Model
...    
... 
... "############################### 
...     
... 
... Above is the typical conversation. Real answers, then a long string of # or the capital letter "G".
GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG

>>> yes! That!
GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG

>>> Send a message (/? for help)
[end terminal output]

OS

Docker

GPU

AMD

CPU

AMD

Ollama version

0.12.3

GiteaMirror added the bug label 2026-05-04 21:17:20 -05:00
Author
Owner

@rick-github commented on GitHub (Oct 6, 2025):

[Server log](https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#how-to-troubleshoot-issues) may aid in debugging.

Author
Owner

@926nathant commented on GitHub (Oct 6, 2025):

[ollama log.txt](https://github.com/user-attachments/files/22727516/ollama.log.txt)
[ollama chat in openwebui.txt](https://github.com/user-attachments/files/22727517/ollama.chat.in.openwebui.txt)

Author
Owner

@rick-github commented on GitHub (Oct 6, 2025):

Full [server log](https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#how-to-troubleshoot-issues) may aid in debugging.

Author
Owner

@926nathant commented on GitHub (Oct 6, 2025):

Was able to recreate with native Ollama on Ubuntu 24.04...
see log.

[ollama.log](https://github.com/user-attachments/files/22731882/ollama.log)

Author
Owner

@rick-github commented on GitHub (Oct 6, 2025):

The log shows two models being used, llama3:8b-instruct-q4_0 and deepseek-r1:1.5b-qwen-distill-q4_K_M. The llama model is loaded, does a few chat requests, and then there's a GPU error:

Oct 06 13:34:11 MagicMax ollama[31080]: time=2025-10-06T13:34:11.980-04:00 level=INFO source=server.go:399 msg="starting runner" cmd="/usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-6a0746a1ec1aef3e7ec53868f220ff6e389f6f8ef87a01d77c96807de94ca2aa --port 42453"
Oct 06 13:34:13 MagicMax ollama[31080]: [GIN] 2025/10/06 - 13:34:13 | 200 |  1.671145967s |       127.0.0.1 | POST     "/api/generate"
Oct 06 13:34:25 MagicMax ollama[31080]: [GIN] 2025/10/06 - 13:34:25 | 200 |  2.528967921s |       127.0.0.1 | POST     "/api/chat"
Oct 06 13:35:58 MagicMax ollama[31080]: [GIN] 2025/10/06 - 13:35:58 | 200 |  7.797980106s |       127.0.0.1 | POST     "/api/chat"
Oct 06 13:38:01 MagicMax ollama[31080]: [GIN] 2025/10/06 - 13:38:01 | 200 |  8.493366225s |       127.0.0.1 | POST     "/api/chat"
Oct 06 13:39:18 MagicMax ollama[31080]: [GIN] 2025/10/06 - 13:39:18 | 200 |  8.661055958s |       127.0.0.1 | POST     "/api/chat"
Oct 06 13:40:06 MagicMax ollama[31080]: [GIN] 2025/10/06 - 13:40:06 | 200 |   8.65009734s |       127.0.0.1 | POST     "/api/chat"
Oct 06 13:40:43 MagicMax ollama[31080]: HW Exception by GPU node-1 (Agent handle: 0x556363024420) reason :GPU Hang

The llama model is reloaded a few minutes later, and then the deepseek model is loaded and does 12 requests, after which there's a quiet period and deepseek is loaded for one final request.

Oct 06 13:44:49 MagicMax ollama[31080]: time=2025-10-06T13:44:49.629-04:00 level=INFO source=server.go:399 msg="starting runner" cmd="/usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-6a0746a1ec1aef3e7ec53868f220ff6e389f6f8ef87a01d77c96807de94ca2aa --port 42349"
Oct 06 13:44:51 MagicMax ollama[31080]: [GIN] 2025/10/06 - 13:44:51 | 200 |  2.695380217s |       127.0.0.1 | POST     "/api/generate"
Oct 06 13:45:10 MagicMax ollama[31080]: [GIN] 2025/10/06 - 13:45:10 | 200 |  6.323452877s |       127.0.0.1 | POST     "/api/chat"
Oct 06 13:45:34 MagicMax ollama[31080]: [GIN] 2025/10/06 - 13:45:34 | 200 |      38.231µs |       127.0.0.1 | HEAD     "/"
Oct 06 13:45:34 MagicMax ollama[31080]: [GIN] 2025/10/06 - 13:45:34 | 404 |      293.43µs |       127.0.0.1 | POST     "/api/show"
Oct 06 13:45:56 MagicMax ollama[31080]: [GIN] 2025/10/06 - 13:45:56 | 200 | 22.875832301s |       127.0.0.1 | POST     "/api/pull"
Oct 06 13:45:56 MagicMax ollama[31080]: [GIN] 2025/10/06 - 13:45:56 | 200 |   25.935859ms |       127.0.0.1 | POST     "/api/show"
Oct 06 13:45:57 MagicMax ollama[31080]: time=2025-10-06T13:45:57.163-04:00 level=INFO source=server.go:399 msg="starting runner" cmd="/usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-aabd4debf0c8f08881923f2c25fc0fdeed24435271c2b3e92c4af36704040dbc --port 38973"
Oct 06 13:45:58 MagicMax ollama[31080]: [GIN] 2025/10/06 - 13:45:58 | 200 |  1.362617748s |       127.0.0.1 | POST     "/api/generate"
Oct 06 13:46:21 MagicMax ollama[31080]: [GIN] 2025/10/06 - 13:46:21 | 200 |  536.541653ms |       127.0.0.1 | POST     "/api/chat"
Oct 06 13:46:43 MagicMax ollama[31080]: [GIN] 2025/10/06 - 13:46:43 | 200 |  5.199556192s |       127.0.0.1 | POST     "/api/chat"
Oct 06 13:47:41 MagicMax ollama[31080]: [GIN] 2025/10/06 - 13:47:41 | 200 |  6.931446439s |       127.0.0.1 | POST     "/api/chat"
Oct 06 13:48:28 MagicMax ollama[31080]: [GIN] 2025/10/06 - 13:48:28 | 200 |  4.637018195s |       127.0.0.1 | POST     "/api/chat"
Oct 06 13:48:47 MagicMax ollama[31080]: [GIN] 2025/10/06 - 13:48:47 | 200 |  404.261863ms |       127.0.0.1 | POST     "/api/chat"
Oct 06 13:49:04 MagicMax ollama[31080]: [GIN] 2025/10/06 - 13:49:04 | 200 |  424.644609ms |       127.0.0.1 | POST     "/api/chat"
Oct 06 13:49:08 MagicMax ollama[31080]: [GIN] 2025/10/06 - 13:49:08 | 200 |  254.597441ms |       127.0.0.1 | POST     "/api/chat"
Oct 06 13:49:20 MagicMax ollama[31080]: [GIN] 2025/10/06 - 13:49:20 | 200 |  3.255730187s |       127.0.0.1 | POST     "/api/chat"
Oct 06 13:49:57 MagicMax ollama[31080]: [GIN] 2025/10/06 - 13:49:57 | 200 |  4.074961155s |       127.0.0.1 | POST     "/api/chat"
Oct 06 13:50:42 MagicMax ollama[31080]: [GIN] 2025/10/06 - 13:50:42 | 200 |  4.979097927s |       127.0.0.1 | POST     "/api/chat"
Oct 06 13:54:21 MagicMax ollama[31080]: [GIN] 2025/10/06 - 13:54:21 | 200 |  8.476761648s |       127.0.0.1 | POST     "/api/chat"
Oct 06 13:54:57 MagicMax ollama[31080]: [GIN] 2025/10/06 - 13:54:57 | 200 | 10.229929529s |       127.0.0.1 | POST     "/api/chat"
Oct 06 14:21:57 MagicMax ollama[31080]: time=2025-10-06T14:21:57.005-04:00 level=INFO source=server.go:399 msg="starting runner" cmd="/usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-aabd4debf0c8f08881923f2c25fc0fdeed24435271c2b3e92c4af36704040dbc --port 37847"
Oct 06 14:21:59 MagicMax ollama[31080]: [GIN] 2025/10/06 - 14:21:59 | 200 |  2.263976261s |       127.0.0.1 | POST     "/api/chat"

Other than the hang (currently tracked in https://github.com/ollama/ollama/issues/12472) there's not much of a problem shown here. The requests are not taking a long time to run and there are no other error messages. If you set `OLLAMA_DEBUG=1` in the server environment there might be more useful information logged.
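
For reference, the debug flag can be set roughly as follows for the native systemd install and for the Docker setup used earlier in this issue (the docker flags mirror the original reproduction command):

```
# systemd (native install): add an override with the variable, then restart
sudo systemctl edit ollama.service     # add under [Service]:  Environment="OLLAMA_DEBUG=1"
sudo systemctl restart ollama
journalctl -u ollama -f                # follow the server log

# Docker (flags mirroring the original reproduction command):
docker run -d --gpus=all -e OLLAMA_DEBUG=1 --name stanley-ollama ollama/ollama
docker logs -f stanley-ollama
```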

Author
Owner

@926nathant commented on GitHub (Oct 7, 2025):

yes, that was what I was seeing (not much). I'll set the debug tonight and recreate.

Author
Owner

@926nathant commented on GitHub (Oct 7, 2025):

I should clarify: in my memory the llama3 model was difficult to get the error to show with, but then, to my surprise (I had assumed it was a Docker thing), I managed to recreate it with llama3. Then, because it happens much more quickly with smaller models in Docker (like deepseek-r1:1.5b and other sub-4B models), I decided to load the larger deepseek model. It hallucinated pretty hard, pretty quickly, before it broke into the "GGGGGGGGGGGGGGGGGGGG" output.

Author
Owner

@926nathant commented on GitHub (Oct 8, 2025):

Chat log and Ollama log attached. This seems very closely tied to context switching; it often happens when I'm changing the topic or getting weird in the conversation. In this chat I used deepseek-r1:1.5b because it's usually one of the first to break. I had a lot of trouble getting it to break by asking it questions about this issue, so I changed the topic to peanut butter sandwiches. Its thinking output shows it didn't catch the topic change at all, or wasn't trying to change the context. In the next response, when I pointed out that it had failed to talk about peanut butter sandwiches, it broke into the GGGGGGGGGGGGGGGGGGGG output.

[ntennant MagicMax ollama chat-3.txt](https://github.com/user-attachments/files/22757005/ntennant.MagicMax.ollama.chat-3.txt)
[ntennant MagicMax OLLAMA log -3.txt](https://github.com/user-attachments/files/22757004/ntennant.MagicMax.OLLAMA.log.-3.txt)

Author
Owner

@926nathant commented on GitHub (Oct 10, 2025):

More logs attached. Here I attempted to bypass the GPU and run on CPU only:

MagicMax:~/Downloads/gits$ systemctl stop ollama
MagicMax:~/Downloads/gits$ ROCR_VISIBLE_DEVICES="" OLLAMA_DEBUG=1 ollama serve

[ollama-4.txt](https://github.com/user-attachments/files/22858459/ntennant.MagicMax.ollama-4.txt)
[ollama chat 4.txt](https://github.com/user-attachments/files/22858458/ntennant.MagicMax.ollama.chat.4.txt)

Author
Owner

@926nathant commented on GitHub (Oct 10, 2025):

I used the CPU only and could not recreate the issue.
Stopped Ollama Completely:

systemctl stop ollama
# Double-check that no ollama processes are running
ps aux | grep ollama

Temporarily Hid the ROCm Library:
sudo mv /usr/local/lib/ollama/rocm /usr/local/lib/ollama/rocm_temp

Ran Ollama in Debug Mode:

OLLAMA_DEBUG=1 ollama serve

[MagicMax ollama 5.txt](https://github.com/user-attachments/files/22858616/MagicMax.ollama.5.txt)

Author
Owner

@926nathant commented on GitHub (Oct 29, 2025):

Closing. Fixed with: https://github.com/ROCm/ROCm/issues/5499. Updating Ollama from 0.12.3 to 0.12.6 was the last step.

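For anyone landing here later, checking and updating the installed version goes roughly like this (the install script is the standard update path for a native Linux install; Docker users pull a newer image instead):

```
ollama -v                                      # confirm the running version
curl -fsSL https://ollama.com/install.sh | sh  # update a native Linux install
docker pull ollama/ollama                      # or refresh the Docker image
```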
Reference: github-starred/ollama#70367