[GH-ISSUE #11850] In Ollama app, after the first reply the second prompt simply gets stuck - no response at all #7866

Open
opened 2026-04-12 20:01:35 -05:00 by GiteaMirror · 11 comments

Originally created by @nikolaevmaks on GitHub (Aug 11, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/11850

What is the issue?

In the Ollama app, after the first reply the second prompt simply gets stuck - no response at all. The app loads for some time, then simply stops; I don't see any answer.
There are no such problems in LM Studio or Open WebUI, for example.

How to reproduce:

First prompt: "explain Sobel filter very detailed"

Then second prompt:
">1.2 Derivation of the Sobel kernels
explain very detailed" - no response

Model: gpt-oss:120b
Macbook M3 Max, 128 GB
Ollama app version: 0.11.4

You can reproduce this after any sufficiently long first reply; there will be no second reply.

Image: https://github.com/user-attachments/assets/0819c3b3-1ba8-466c-83f9-e13fff06ed95
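
For anyone who wants to rule out the app itself, here is a minimal sketch of the same two-turn exchange against the local REST API (assuming the default endpoint at localhost:11434; `<first reply pasted here>` is a placeholder for the actual first answer):

```shell
# First turn: ask for a long reply so the context fills up quickly.
curl http://localhost:11434/api/chat -d '{
  "model": "gpt-oss:120b",
  "stream": false,
  "messages": [
    {"role": "user", "content": "explain Sobel filter very detailed"}
  ]
}'

# Second turn: resend the history plus the follow-up question.
# If the bug is in the server, this call hangs too; if it answers
# normally, the problem is in the app's chat handling.
curl http://localhost:11434/api/chat -d '{
  "model": "gpt-oss:120b",
  "stream": false,
  "messages": [
    {"role": "user", "content": "explain Sobel filter very detailed"},
    {"role": "assistant", "content": "<first reply pasted here>"},
    {"role": "user", "content": "1.2 Derivation of the Sobel kernels, explain very detailed"}
  ]
}'
```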

Relevant log output

(none provided)

OS

macOS

GPU

Apple

CPU

Apple

Ollama version

0.11.4

GiteaMirror added the bug label 2026-04-12 20:01:35 -05:00

@aole commented on GitHub (Aug 11, 2025):

Reproducible on Windows with the 20b model as well...

Image: https://github.com/user-attachments/assets/92ec9c42-daa8-4e98-b8fc-0c7e851827b3

but if I reopen the app, I can see the reply sitting in the chat...

Image: https://github.com/user-attachments/assets/1c83e7c6-8bc5-49d0-8f66-834c2bd61d7c

@WiiliamC commented on GitHub (Aug 13, 2025):

Ollama in Docker on Linux: gpt-oss 20b gets stuck as well.


@AndreC10002 commented on GitHub (Aug 13, 2025):

I am experiencing a similar issue with GPT-OSS and Ollama on Docker. In my case, Ollama stops using the GPU(s) and only uses the CPU(s), which makes it feel like it is stuck.

Nothing in the logs suggests a reason for abandoning the GPU(s), but you can see that the ollama process suddenly starts using 4+ cores to process requests. No other processes are using the GPU(s).

I've:

  • restarted the Ollama Docker container;
  • upgraded the Ollama Docker image to the latest version (0.11.4);
  • restarted the physical server.

This is running on Debian 12.8 with 192 GB of RAM and 2x RTX A4000 GPUs.
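
A quick way to confirm whether the model has silently fallen back to CPU is to compare Ollama's own view with the driver's (a sketch assuming an NVIDIA setup like the one above; the container name `ollama` is an assumption):

```shell
# Ollama's view: the PROCESSOR column shows the split, e.g. "100% GPU",
# or something like "42%/58% CPU/GPU" when the model has spilled to system RAM.
docker exec ollama ollama ps

# Driver's view: the ollama runner process should appear here while generating.
nvidia-smi
```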


@eliciel0513 commented on GitHub (Aug 15, 2025):

I'm using Open WebUI; same issue with GPT-OSS 20B. Follow-up responses just remain processing. VRAM stays utilized and does not spill over to system RAM. I'm getting the same behaviour in both Open WebUI and the Ollama app with GPT-OSS 20B, just staying processing/stuck, so it's some kind of bug in Ollama.

RTX 3090 24GB
i9 14900k
64GB RAM


@leedrake5 commented on GitHub (Aug 19, 2025):

I see this inconsistently too on Mac with 120b. It is often, but not always, the second prompt - everything just stalls out. I can see the model is engaged by RAM/VRAM use and high energy draw, but no response is returned. The app has to be force quit and restarted to produce another response, and then again, and again.


@anomaly256 commented on GitHub (Aug 21, 2025):

I'm having the same issue on a Linux host with 1 TB of RAM and 2x AMD MI60s, Ollama 0.11.5. The first prompt to any model that fits inside VRAM loads it and runs as expected. Any follow-up prompt to the same model times out, VRAM empties, and Ollama hangs.

Edit: same behaviour in 0.11.6. In 0.11.4 I get the hang, but the VRAM doesn't empty.
Edit: I was originally on ROCm 6.3.3; the issue persists after moving to ROCm 6.4.3, however. Likewise with testing various kernels from 6.8.12 to 6.14.4.
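
When the follow-up prompt hangs, the server log is the most useful evidence to attach here. A sketch, assuming the standard Linux systemd install (for Docker, `docker logs -f ollama` is the equivalent; the container name is an assumption):

```shell
# Follow the server log while sending the second prompt.
journalctl -u ollama -f --no-pager

# Optionally restart the server in the foreground with debug logging
# for more detail around the hang.
sudo systemctl stop ollama
OLLAMA_DEBUG=1 ollama serve
```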


@enporial commented on GitHub (Aug 31, 2025):

I am having this exact same issue. It started in the last few weeks. I can get a few back-and-forths in (2 or 3), then it just hangs. With thinking models I can see it stop mid-thought and just hang.


@leedrake5 commented on GitHub (Sep 14, 2025):

Hi all,

After testing, I'm almost certain this is due to the context limit being reached. Ollama just doesn't explain why the model stops producing responses mid-conversation. There should be a notification when a response can't be returned because of context/memory limits.

If you want to raise the limit, you can do so in the app settings.
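
For anyone who prefers not to rely on the app settings, the same limit can be raised outside the GUI as well. A sketch; 16384 is an arbitrary example value, and memory use grows with the context length:

```shell
# Per request: pass num_ctx in the API options.
curl http://localhost:11434/api/generate -d '{
  "model": "gpt-oss:120b",
  "prompt": "explain Sobel filter very detailed",
  "stream": false,
  "options": {"num_ctx": 16384}
}'

# Server-wide default: set the context length before starting the server.
OLLAMA_CONTEXT_LENGTH=16384 ollama serve

# Interactive CLI: inside `ollama run`, use
#   /set parameter num_ctx 16384
```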


@BillyOutlast commented on GitHub (Nov 21, 2025):

Still happening :/
Qwen-VL:4b


@jbourny commented on GitHub (Mar 6, 2026):

Qwen3.5:35b and 122b too. I have an Asus Ascent GX10; the first day everything worked well, but now, every time after the first reply, I can't write anymore - it's stuck. Even when I don't ask any questions, GPU usage stays high for nothing. There is no issue if I use the API from Claude Code or other clients.


@KintsugiUwU commented on GitHub (Mar 9, 2026):

Open WebUI in Docker with qwen3.5:0.8b gets stuck after the second prompt. The only way to fix it is to restart the Docker container.
i7-7700, 32 GB RAM, no GPU
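
For the restart workaround, a minimal sketch (the container name `ollama` is an assumption; substitute whatever `docker ps` shows):

```shell
# Grab the recent log before restarting, in case it shows where it hung.
docker logs --tail 100 ollama

# Restart the container; the model reloads on the next request.
docker restart ollama
```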
