[GH-ISSUE #12209] Version 11 bombing out and responds with GGGGGGGGGGGGGGG #8124

Open
opened 2026-04-12 20:29:28 -05:00 by GiteaMirror · 25 comments

Originally created by @R1U2 on GitHub (Sep 7, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/12209

Originally assigned to: @dhiltgen on GitHub.

What is the issue?

Hi, this started last week. When running Ollama in a Docker environment on my Jetson Orin Nano, llama3.2 will, after two or three responses in Open WebUI, start replying with GGGGGGGGGGGGGGGGGGGGGGGGG.
At first I thought it was due to the Jetson settings etc., but nothing fixed it. I then rolled my Ollama version back to 0.10 and it has been running stable. I just stopped my container and started it again, and as it is set to latest it pulled ver 0.11 again. I thought the issue might have been addressed, but unfortunately not. I will be rolling my version back to 0.10 again.

Relevant log output


OS

Docker

GPU

Nvidia

CPU

Intel

Ollama version

0.11 latest

GiteaMirror added the nvidia, bug, needs more info labels 2026-04-12 20:29:28 -05:00

@rick-github commented on GitHub (Sep 9, 2025):

More context in #12142.
Also reported in the [discord](https://discord.com/channels/1128867683291627614/1211804431340019753/threads/1408405019774156961).


@dhiltgen commented on GitHub (Sep 9, 2025):

Which jetpack version are you running?
What did the "inference compute" line in the server log show?


@R1U2 commented on GitHub (Sep 9, 2025):

Hi, my device is the Jetson Orin Nano Super.

I am running Ollama in a Docker container from a docker-compose.yaml file.

A week ago, while looking into why it is doing this, I read on Reddit about another user with the same issue on his distro, except that flash attention was not enabled in the Ollama version he was using. He mentioned that once it was enabled his GGGGGGGGGGGGGGGGG responses went away. So basically the opposite of what we are experiencing now.

Looking at Ollama ver 0.11.8, flash attention is now permanently enabled. I will run a test later today with ver 0.11.7 and see if I can recreate the GGGGGGGGGGGGGGGG response with that version, then I will do 0.11.8 to compare the two.

I will run it with deepseek-r1 as well as llama3.2.

![Image](https://github.com/user-attachments/assets/904dee7b-91b0-4e7d-b898-c0038e569b54)

@R1U2 commented on GitHub (Sep 9, 2025):

Ok, as promised, I started with Ollama 0.11.8 and llama3.2:latest on my Jetson. Open WebUI runs on a separate host (omv7/Docker, Intel Haswell processor).
Running from the CLI gets the same response. If I ask another question or start a new conversation it will either be GGGGGGGGGGGG immediately or after the third question.

![Image](https://github.com/user-attachments/assets/86aa976c-e328-4684-b7a1-1bb438dd6101)

Ran deepseek-r1:7b for about 8 questions. No GGGGGGGGGGGGGGGGGGG responses, but twice when I changed the subject it gave me a response to the previous question.

![Image](https://github.com/user-attachments/assets/ea397150-25a5-4419-8630-6afe7893371c)

Cleared the chats and started with llama3.2:latest again. Managed to ask 5 questions before it gave me the GGGGGGGGGGGGG.

Cleared the chat and asked Qwen2.5-coder:3b some questions. It gave me GGGGGGGGGGGGGGGGGG after 5 questions.

![Image](https://github.com/user-attachments/assets/1ea0af9b-ba6e-4c61-8f86-3c17764cba83)

Cleared and ran deepseek-r1:1.5b; it gave me GGGGGGGGGGGGGGGG after two questions.

![Image](https://github.com/user-attachments/assets/0fcdfbe5-fc58-49bf-b5b5-008a738e09c9)

I will later roll back to Ollama version 0.11.7 and retest again.


@dhiltgen commented on GitHub (Sep 9, 2025):

From the screenshots above, it looks like you're on Jetpack v6. Did we select the correct runtime in the "inference compute" log line? Something like this:

```
time=2025-09-09T14:23:40.179-07:00 level=INFO source=types.go:131 msg="inference compute" id=GPU-67834ba8-0312-50b2-9286-9b3b02e80059 library=cuda variant=jetpack6 compute=8.7 driver=12.6 name=Orin total="61.4 GiB" available="51.2 GiB"
```

@R1U2 commented on GitHub (Sep 9, 2025):

@dhiltgen I ran a test now on my Intel NUC (omv7/Docker, Ollama 0.11.8 latest) running llama3.2. Using short questions the issue does not present itself. So I'm not sure where I must select the correct runtime?


@HazmanNaim commented on GitHub (Sep 11, 2025):

Hi, I encountered a similar issue. I am running Ollama (Docker version 0.11.10) on a Jetson Orin and experimenting with LangGraph agents. For some unknown reason, Ollama starts responding with GGGG after a few interactions, regardless of which model is loaded. In one case, triggering a tool call immediately caused Ollama to respond with GGGG. However, if I run Ollama on an amd64 machine, it is stable with no issues.

So I rolled back to Docker Ollama version 0.10.0, and the issue seems to have gone away. The problem is probably something related to version 0.11 when running on Jetson.


@v1ckxy commented on GitHub (Sep 16, 2025):

Same here. My Orin Nano starts throwing G's after two messages.


@eschoell commented on GitHub (Sep 22, 2025):

I am having the same issue running on an Orin. The gpt-oss model runs fine, while any other will quickly -- if not immediately -- fail. Based on that, it seems that whatever was done to support the gpt-oss model is the cause.


@thunderfm commented on GitHub (Sep 26, 2025):

Updated to 0.12.2 today and it seems to have been fixed. Tried a bunch of different models and they're all working well now.


@dhiltgen commented on GitHub (Sep 26, 2025):

@R1U2 please look in the server logs to see if Ollama auto-detected the correct runtime. This is not something you have to do; Ollama is supposed to figure it out from information on the system. If we chose the wrong runtime, gibberish responses (or crashes) will happen. Our troubleshooting guide explains how to find the logs: https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md
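As a quick way to check, you can filter the server log for the detection line rather than reading the whole file. The snippet below is a hypothetical sketch: it writes a sample log line (in the format quoted earlier in this thread) to a temp file and greps out the runtime variant; on a real system you would grep your actual server log instead (e.g. `docker logs <ollama-container> 2>&1 | grep 'inference compute'`).

```shell
# Hypothetical sketch: filter the runtime-detection line from a server log.
# /tmp/sample_server.log stands in for your real log; with Docker you would use:
#   docker logs <ollama-container> 2>&1 | grep 'inference compute'
cat > /tmp/sample_server.log <<'EOF'
time=2025-09-09T14:23:40.179-07:00 level=INFO source=types.go:131 msg="inference compute" library=cuda variant=jetpack6 compute=8.7 driver=12.6 name=Orin
EOF

# Extract the selected runtime variant; on a Jetson Orin this should be a
# jetpack variant, not the plain desktop CUDA runtime.
grep -o 'variant=[a-z0-9]*' /tmp/sample_server.log
```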


@eschoell commented on GitHub (Sep 27, 2025):

It has not been fixed. I am still experiencing the same problem with version 0.12.3.


@eschoell commented on GitHub (Sep 27, 2025):

I see the log, but I do not see where it talks about finding the runtime...


@rick-github commented on GitHub (Sep 27, 2025):

https://github.com/ollama/ollama/issues/12209#issuecomment-3271542429

Or post the log.


@R1U2 commented on GitHub (Sep 28, 2025):

Hi all, I have been watching the comments come in. I've been busy away from home during the week and have also been playing around with ComfyUI, learning how to use it.

During this exercise I have learnt a few things about my Jetson Orin Nano. Ollama, like ComfyUI, can be run either in Jetson containers or in Docker containers. The reason I don't like Jetson containers is that they are not persistent: when I reset the unit the container is gone, and everything needs to be downloaded again when a new container is started. I don't know enough about running it from the CLI to keep it running long enough to play around with it. Hence the need to run it in a Docker environment with Portainer, and a decent docker-compose.yaml file to bring it up. Portainer then makes it quick to change the network, attached storage, GPU settings, etc.

That said, I believe my previous Ollama container ran with a CPU bottleneck. Although it was still fast, I felt it was not accessing the GPU on the Jetson as it should; I saw that in jtop. After getting ComfyUI to run with a good docker-compose.yaml that performs well and uses the GPU, I redid my Ollama compose file and started it up again. I realized that DustyNV's last release, version 34.4.0, only included the ollama/ollama:0.10.0 version for Jetson. I will now retest with my new compose setup on version 0.11.8 and then on the new 0.12.2. If the results are the same I will post the logs as @dhiltgen requested.

Be back soon.


@R1U2 commented on GitHub (Sep 28, 2025):

Ok, test results are in.

Spun up Ollama 0.11.8 to retest.

llama3.2 had no issues and I could ask it about twenty questions.
I moved over to deepseek-r1:1.5b; on the second question I get the gggggggggg. Log file below.

[_ollama_logs.txt](https://github.com/user-attachments/files/22579894/_ollama_logs.txt)


@R1U2 commented on GitHub (Sep 28, 2025):

Ran deepseek-r1:7b.

6 questions and it bombed out.
I changed the subject on question 5 and asked it to tell me a joke; it replied, but with no joke and with the previous line of questioning. Question 6 was answered with ggggggggggggg. The log below does not show much.

[_ollama_logs(1).txt](https://github.com/user-attachments/files/22580175/_ollama_logs.1.txt)


@R1U2 commented on GitHub (Sep 28, 2025):

Started a new chat with Qwen; it bombed out on the second question. The log, below, does not show much.

[_ollama_logs(2).txt](https://github.com/user-attachments/files/22580186/_ollama_logs.2.txt)


@R1U2 commented on GitHub (Sep 28, 2025):

Spun up Ollama 0.12.3 with llama3.2:latest. 6 questions in, it bombs out with gggggggggg.
Log below.

[_ollama_logs.txt](https://github.com/user-attachments/files/22580221/_ollama_logs.txt)

@thunderfm - Still not fixed.

Will now revert back to 0.10.0 again until this is fixed. If there is anything you want me to assist with testing wise, let me know.


@eschoell commented on GitHub (Oct 22, 2025):

It seems that there should be enough info to proceed fixing this, correct? The issue still has the "needs more info" tag.

I have resorted to running the latest version in a Docker container (just for gpt-oss:20b) alongside the native build of v0.10 for *everything else*. This is clearly not a sustainable workaround.


@dhiltgen commented on GitHub (Oct 22, 2025):

@R1U2 your logs aren't complete, so I can't tell if this is a discovery problem where we're using the wrong CUDA runtime, or possibly over-committing GPU memory, or something else.

I believe you said you're using a container, so something like this should hopefully work (adjust the flags if you need to):

```
docker run --rm -it --runtime=nvidia -e JETSON_JETPACK=6 -e OLLAMA_DEBUG=2 ollama/ollama 2>&1 | tee serve.log
```

As soon as you see the log line `... msg="inference compute" ...` show up, ctrl-c the docker run and share that serve.log.

(I should also point out, if your container does not have JETSON_JETPACK=6 it's probable we're using the wrong runtime; see the note at https://github.com/ollama/ollama/blob/main/docs/docker.md#start-the-container)
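For those running via docker-compose (as several people in this thread are), the same flags can be expressed in the compose file. The sketch below is a minimal hypothetical example, not a tested or official config; the service name, image tag, volume path, and port mapping are placeholders to adjust for your setup:

```yaml
# Hypothetical docker-compose sketch for Ollama on a Jetson (Jetpack 6).
services:
  ollama:
    image: ollama/ollama:latest
    runtime: nvidia              # use the NVIDIA container runtime
    environment:
      - JETSON_JETPACK=6         # hint the Jetpack 6 CUDA runtime, per the comment above
      - OLLAMA_DEBUG=2           # verbose logging for troubleshooting
    volumes:
      - ./ollama:/root/.ollama   # persist models across container restarts
    ports:
      - "11434:11434"
    restart: unless-stopped
```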


@undeadindustries commented on GitHub (Oct 26, 2025):

Just chiming in that I'm getting this exact same issue on an NVIDIA DGX Spark, version 0.12.3.


@dhiltgen commented on GitHub (Nov 5, 2025):

@undeadindustries can you share more complete logs so we can try to isolate what's going wrong?


@undeadindustries commented on GitHub (Nov 10, 2025):

@dhiltgen absolutely. Just to make sure I'm giving you exactly what you want. Which command should I run for the logs and/or which log file would you like?

Thanks for looking into it!


@dhiltgen commented on GitHub (Nov 12, 2025):

@undeadindustries make sure you're running the latest version, start the server with OLLAMA_DEBUG=2, and share the log from startup to the point where it reports "inference compute" so we can see why it's failing to discover your GPU properly.


Reference: github-starred/ollama#8124