[GH-ISSUE #2225] Ollama stops generating output and fails to run models after a few minutes #1273

Closed
opened 2026-04-12 11:04:03 -05:00 by GiteaMirror · 54 comments
Owner

Originally created by @TheStarAlight on GitHub (Jan 27, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/2225

Originally assigned to: @jmorganca on GitHub.

Hi, I'm running ollama on a Debian server and use oterm as the interface.
After some chats (fewer than 10 normal questions) ollama stops responding, and running `ollama run mixtral` never succeeds (it keeps loading).
I noticed the same issue was reported in #1863. Is there a solution at the moment? Also, I'm not the administrator of the server and I don't even know how to restart ollama 😂. The serve process seems to run as another user named ollama. Can anyone tell me how to restart it?
To developers: I can provide some debug information if you need it; just tell me how.
Thanks :D

GiteaMirror added the bug label 2026-04-12 11:04:03 -05:00

@TheStarAlight commented on GitHub (Jan 27, 2024):

The models I'm running include mixtral:latest and wizard-math:70b.
I have access to an NVIDIA A100 PCIe 80GB, the inputs are all simple sentences (no more than 100 words), and I made sure nobody else was using the GPU (checked with nvitop).

@jmorganca commented on GitHub (Jan 27, 2024):

Hi @TheStarAlight, would it be possible to share which version of Ollama you are running? `ollama -v` will print this out. Thanks so much, and I'm sorry you hit this issue.

@TheStarAlight commented on GitHub (Jan 27, 2024):

@jmorganca Sure! The ollama version is 0.1.20, just installed three days ago via the shell script. Please tell me if you need more information :)

@jmorganca commented on GitHub (Jan 27, 2024):

Would it be possible to test with the newest version 0.1.22, which should fix this? https://github.com/ollama/ollama/releases/tag/v0.1.22

You can download the latest version of Ollama here: https://ollama.ai/download

Keep me posted!

@glorat commented on GitHub (Jan 30, 2024):

Is this a dupe issue of #1458 ?

Happened to me too on 0.1.22 with mistral on macOS. Will post again if I can find a way to reproduce.

@TheStarAlight commented on GitHub (Jan 30, 2024):

@glorat I think so; it seems this problem happens on all platforms (Linux, macOS, and WSL).

@TheStarAlight commented on GitHub (Jan 30, 2024):

@jmorganca I'm sorry, I'm not the administrator of the server and the administrator has not responded to my request 😂. I'll try it on my own computer (though it can only run <4b models; even mistral got very slow after the first evaluation) until the ollama on the server gets updated.
Btw, how can I restart the ollama server process 😂? It is started by the user ollama and I cannot stop it without administrator privileges. The process has been hanging on the server for a few days and I just cannot find a way to stop it.
Thank you!
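
A minimal sketch of the two options here, assuming a stock Linux install with the default systemd unit name `ollama` and the `ollama` binary on your PATH (the 11435 port is an arbitrary choice):

```bash
# With admin rights, restart the system service:
sudo systemctl restart ollama

# Without admin rights you cannot signal the "ollama" user's process,
# but you can run a private server instance under your own account on
# a different port (models are pulled again into your own ~/.ollama),
# then point the CLI at it:
OLLAMA_HOST=127.0.0.1:11435 ollama serve &
OLLAMA_HOST=127.0.0.1:11435 ollama run mixtral
```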

@iplayfast commented on GitHub (Jan 30, 2024):

@jmorganca I can confirm that my memory issues seem to have gone away in my stress test (https://github.com/ollama/ollama/issues/1691). Other issues have surfaced, but I think ollama version 0.1.22 is a winner.

@tarbard commented on GitHub (Jan 31, 2024):

I'm seeing this behaviour on 0.1.22 too.
After a few interactions (in this case with codellama 70b), the API stops responding to ollama-webui, and `ollama run codellama:70b-instruct-q4_K_M` just shows the loading animation and never starts.

`journalctl -u ollama` doesn't show any errors, just the last successful calls. Is there any way to see more detailed logs?

`systemctl restart ollama` eventually restarts ollama, but it takes quite a while.

@thexclu commented on GitHub (Jan 31, 2024):

I have the same issue, running version 0.1.22 with mistral

@adriancbo commented on GitHub (Feb 4, 2024):

I am experiencing the same issue while running the technovangelist airenamer on version 0.1.23 with any llava model. It functions initially but then hangs after a few minutes, driving CPU usage to 100%. After that, I am unable to run any models. My system configuration is as follows:

  • Ubuntu 22.04
  • 2x Nvidia 4090 GPUs
  • 512GB RAM

@TheStarAlight commented on GitHub (Feb 7, 2024):

@jmorganca I tried the new version (0.1.22) of ollama and broke it on two separate servers with two identical inputs 😂; the problem still exists. However, I noticed that the problem occurs when the context gets a bit long (~1600 Chinese characters, 7 prompts). Could that be the cause?

@TheStarAlight commented on GitHub (Feb 7, 2024):

> @jmorganca I tried the new version (0.1.22) of ollama and broke it on two separate servers with two identical inputs 😂; the problem still exists. However, I noticed that the problem occurs when the context gets a bit long (~1600 Chinese characters, 7 prompts). Could that be the cause?

I should have explained it more clearly. I'm using ollama-webui and qwen:72b (a different model this time), and I forwarded port 11434 from the remote server so my local webui could access it. After the problem happened, I saved the chat history and switched to another server, then tried to continue the chat using the same prompt that had caused the problem on the previous server, and it got stuck in the middle as well, after just a single evaluation...

@lukebelbina commented on GitHub (Feb 8, 2024):

I am having the same issue with the latest version, 0.1.24. It works for a few minutes, then eventually starts hanging on every request.

@coolrazor007 commented on GitHub (Feb 10, 2024):

I'm seeing this on 0.1.24 as well. How far back should I roll back in the interim? Does anyone know when this was introduced?

@jmorganca commented on GitHub (Feb 11, 2024):

Sorry this is still a problem – what kind of prompt is being sent to the model – is it the same prompt over and over again, or a different one? Thanks!

@lukebelbina commented on GitHub (Feb 12, 2024):

> Sorry this is still a problem – what kind of prompt is being sent to the model – is it the same prompt over and over again, or a different one? Thanks!

I am sending the same preprompt with a different user message, one after another (about every 1-2 seconds), using llama:17b. It crashes 100% of the time within about 10 minutes.
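
A minimal repro sketch of that workload, assuming a local server on the default port; the model name, prompt, and iteration count are placeholders:

```bash
# Same fixed instruction with a varying user message, one request every
# ~2 s; -m caps each request at 120 s so a hang shows up as a timeout.
for i in $(seq 1 300); do
  curl -s -m 120 http://127.0.0.1:11434/api/generate \
    -d "{\"model\": \"mistral\", \"prompt\": \"You are a helpful assistant. Request $i: say hello.\", \"stream\": false}" \
    -o /dev/null -w "req $i: HTTP %{http_code} in %{time_total}s\n"
  sleep 2
done
```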

@lips85 commented on GitHub (Feb 13, 2024):

> Sorry this is still a problem – what kind of prompt is being sent to the model – is it the same prompt over and over again, or a different one? Thanks!

I was running ollama v0.1.24 on a Mac M3 Max (64GB).
I worked with two models, mistral:latest and openhermes:latest; after performing the same task several times, CPU usage increased to 99% and everything stopped.

I confirmed that it was working on the GPU before the operation stopped.
Before checking the GitHub issues, I thought this was a problem that only occurred on a specific OS (Apple silicon), but it seems to occur regardless of platform.

@TheStarAlight commented on GitHub (Feb 13, 2024):

> Sorry this is still a problem – what kind of prompt is being sent to the model – is it the same prompt over and over again, or a different one? Thanks!

@jmorganca Hi, thank you for your attention. I was just having regular chats using ollama-webui (just like using ChatGPT). But now I cannot reproduce my previous chat anymore; I just had a chat with qwen:72b that ran longer than 2000 Chinese characters and the problem seems to have gone away. But one thing is for sure about my previous situation (ollama 0.1.22):

> I should have explained it more clearly. I'm using ollama-webui and qwen:72b (a different model this time), and I forwarded port 11434 from the remote server so my local webui could access it. After the problem happened, I saved the chat history and switched to another server, then tried to continue the chat using the same prompt that had caused the problem on the previous server, and it got stuck in the middle as well, after just a single evaluation...

It seemed that this chat was "poisonous": the next prompt would crash every ollama server (at least my 2 servers) on the first run. I'll comment if I find another similar occasion :D

@timiil commented on GitHub (Feb 19, 2024):

It seems we are facing the same problem on Ubuntu; whether in a Docker environment or with a directly deployed ollama service, after we call the ollama HTTP endpoint several times, the ollama HTTP service hangs.

@TheStarAlight commented on GitHub (Feb 19, 2024):

Is there a reproducible way to trigger the issue? Or is there any way to save a verbose log?

@mjspeck commented on GitHub (Feb 19, 2024):

I think I'm running into this issue as well.

@Sinan-Karakaya commented on GitHub (Feb 22, 2024):

I am running into the same issue, using mistral with a pre-prompt on a Mac M1 chip. After a couple of generations, the server will not respond until I kill my request.

@bennylam commented on GitHub (Feb 29, 2024):

> Is there a reproducible way to trigger the issue? Or is there any way to save a verbose log?

I have run Ollama (version 0.1.23) with llama2:latest and mistral:latest for a long time without problems when I use only English in the chat conversation. However, when I ask (in English) a query that instructs the model to reply in an Asian language (e.g. Chinese, Vietnamese, or Thai), it gets stuck or freezes in most cases after a few conversations. You can still see that Ollama is running at http://127.0.0.1:11434/, but whatever you type at the command prompt (even in English) gets no response at all. Until you reboot the whole system (WSL on Windows 10 in my case), you cannot get Ollama to respond again.
I think you can reproduce the problem this way.
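
Following that description, a hypothetical repro sketch (model name and prompt are placeholders; the 60 s timeout turns a hang into a visible failure):

```bash
# Run this a few times in a row; on affected versions the
# reply-in-an-Asian-language conversation reportedly wedges the server
# after a few turns.
curl -s -m 60 http://127.0.0.1:11434/api/generate \
  -d '{"model": "llama2", "prompt": "Please reply in Chinese: what is the capital of France?", "stream": false}'
```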

@gOATiful commented on GitHub (Feb 29, 2024):

We encountered the same problem on Ubuntu 20.04.6 LTS.

@justinwaltrip commented on GitHub (Feb 29, 2024):

I was able to fix this issue by removing the JSON formatting parameter from my /api/generate calls.
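
For reference, the workaround amounts to dropping the `format` field from the request body. A sketch with curl (model and prompt are placeholders):

```bash
# Variant that reportedly triggers the hang: constrained JSON output.
curl http://127.0.0.1:11434/api/generate \
  -d '{"model": "mistral", "prompt": "List three colors as JSON.", "format": "json", "stream": false}'

# Workaround: the same request without the "format" field.
curl http://127.0.0.1:11434/api/generate \
  -d '{"model": "mistral", "prompt": "List three colors as JSON.", "stream": false}'
```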

@antonsapt4 commented on GitHub (Mar 1, 2024):

Have the same problem. I'm on a Mac M2 running ollama desktop version 0.1.27, using gemma:7b-instruct-q6_K.

The first boot runs fine, and some curl tests confirm it works. But after it idles, whenever I send curl again and the model boots and offloads to Metal, it hangs and restarts my MacBook. It happens every time.

Can somebody shed some light? Should I uninstall and download a new Ollama, or is there a setting that can fix this issue?

PS:
@justinwaltrip, on Ollama desktop, how do I remove the JSON formatting?

@koleshjr commented on GitHub (Mar 5, 2024):

I am having the same issue even on the new version, 0.1.28. It happens after about 200 iterations with a custom fine-tuned 4-bit mistral on Colab's free-tier T4.

@deadmanoz commented on GitHub (Mar 8, 2024):

We are experiencing the same issue on 0.1.28, using the official Docker image on Ubuntu 22.04 with 8x RTX A40000.

Running llava:34b with images as part of the request.

It will successfully process infrequent requests, then suddenly hang on some request and remain unresponsive until the container is restarted.

Requests are made to the /api/generate endpoint across the network, with stream set to false in the request.

ollama still responds to health checks.
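
Since the bare health endpoint keeps answering while generation is wedged, a liveness probe has to exercise /api/generate itself. A watchdog sketch under that assumption (container name, model, and intervals are placeholders):

```bash
# Probe with a tiny prompt and a hard timeout; restart the container
# when the probe times out or returns an HTTP error.
while true; do
  if ! curl -sf -m 60 http://127.0.0.1:11434/api/generate \
       -d '{"model": "llava:34b", "prompt": "ping", "stream": false}' \
       -o /dev/null; then
    docker restart ollama
  fi
  sleep 300
done
```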

@deadmanoz commented on GitHub (Mar 8, 2024):

I was just watching this as it hit the issue.

CPU usage on one core hits 100%.

It will just hang here now until I restart the container. This is the output of `docker logs ollama`:

```
encode_image_[GIN] 2024/03/08 - 06:09:12 | 200 | 19.031746157s |      172.21.0.4 | POST     "/api/generate"
time=2024-03-08T06:09:13.001Z level=INFO source=dyn_ext_server.go:171 msg="loaded 3 images"
[GIN] 2024/03/08 - 06:09:26 | 200 | 18.617710773s |      172.21.0.4 | POST     "/api/generate"
time=2024-03-08T06:09:49.931Z level=INFO source=dyn_ext_server.go:171 msg="loaded 3 images"
[GIN] 2024/03/08 - 06:10:04 | 200 | 14.109569108s |      172.21.0.4 | POST     "/api/generate"
time=2024-03-08T06:13:45.049Z level=INFO source=dyn_ext_server.go:171 msg="loaded 3 images"
[GIN] 2024/03/08 - 06:13:58 | 200 | 13.486728309s |      172.21.0.4 | POST     "/api/generate"
time=2024-03-08T06:14:55.013Z level=INFO source=dyn_ext_server.go:171 msg="loaded 3 images"
[GIN] 2024/03/08 - 06:15:07 | 200 | 12.661186052s |      172.21.0.4 | POST     "/api/generate"
time=2024-03-08T06:16:47.278Z level=INFO source=dyn_ext_server.go:171 msg="loaded 3 images"
```

(nvtop and htop screenshots are attached on the original GitHub issue; they show one CPU core pinned at 100%.)

Please advise if I can provide any further logs or assist with debugging this issue

@seanmavley commented on GitHub (Mar 9, 2024):

> @justinwaltrip, on Ollama desktop, how do I remove the JSON formatting?

@antonsapt4 Removing the json format param appears to work without issues for me.

Issue may be related to #1910

@deadmanoz commented on GitHub (Mar 9, 2024):

Note: I don't use `"format": "json"` and I still have this issue.

@jmccrosky commented on GitHub (Mar 17, 2024):

I also have this issue on an M3 Max. It seems to be somewhat random, but it tends to happen more quickly with larger models or larger prompts. For example, with one model, including only one image per prompt works consistently, but including two per prompt will trigger this issue after some runs... With a larger model, it happens even with only one image.

@niyogrv commented on GitHub (Mar 20, 2024):

#1863 and this seem to be the same issue.

@deadmanoz commented on GitHub (Mar 28, 2024):

Yes, looks to be the same issue!

🤞 that https://github.com/ollama/ollama/issues/2225 is the resolution!

@Master-Lucas commented on GitHub (Apr 3, 2024):

Even in Ollama version 0.1.30, Japanese text generation stops. I've tried it once with the lightweight "mistral" and three times with "dolphin-mistral" (all q4); at the point where generation fails, feeding all the input and output Japanese characters into a token counter shows it stops at around 3400-3600 tokens. It feels like it hangs after generating longer texts for about 6-9 turns (Mac Sonoma 14.4.1, 64GB). P.S. I used the terminal for these experiments, but the same issue occurs with the PageAssist webui.

@Mecil9 commented on GitHub (Apr 7, 2024):

I have the same problem. When I run ollama on an Apple M1 Max, Activity Monitor shows 100% CPU usage and 0% GPU usage, and after running for a while ollama becomes unresponsive. I don't know what causes it.
Once the CPU reaches 100%, ollama stops working. I have tried many methods, to no avail!

@jmorganca commented on GitHub (Apr 15, 2024):

Hi all, this should be fixed in 0.1.31 (hanging when unicode characters are in the prompt). Further fixes for hanging are also in 0.1.32 - stay tuned!
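
On Linux, re-running the install script upgrades an existing install in place (the original poster installed this way; the URL below is the currently documented one):

```bash
curl -fsSL https://ollama.com/install.sh | sh
ollama -v   # confirm the upgraded version
```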

@Destroyer commented on GitHub (Apr 20, 2024):

Running 0.1.32 and the issue still persists. Using CPU AVX on Debian 12 with the llama3 model, it gets stuck until my proxy times out after 60 seconds; reloading the page and typing the query again fixes it, but it is very annoying.

@NikitaDeveloperAI commented on GitHub (Jun 3, 2024):

Currently running Ollama 0.1.41, sadly this problem still persists.

@seanmavley commented on GitHub (Jun 3, 2024):

@NikitaDeveloperAI this problem may never get a universal fix, as it's like a game of whack-a-mole.

In our case, we stopped using `format="json"` and explicitly wrote a prompt that outputs the exact JSON structure we want.

For now, that approach appears to work with some level of predictability and consistency. We're still testing it, but so far it doesn't cause hanging issues with Ollama (see the sketch below).
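
A sketch of that prompt-pinned approach via /api/chat, with a placeholder model and schema; note there is no `format` field in the request body:

```bash
curl http://127.0.0.1:11434/api/chat -d '{
  "model": "mistral",
  "stream": false,
  "messages": [
    {"role": "system", "content": "Answer only with JSON of the form {\"name\": string, \"colors\": [string]}. No prose."},
    {"role": "user", "content": "Describe a rainbow."}
  ]
}'
```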

@thistlillo commented on GitHub (Mar 24, 2025):

I am using ollama version 0.6.1 on Rocky Linux 8.10 (Green Obsidian), on a machine with four H100 80GB GPUs; I configure several Ollama servers, one per GPU (a launch sketch follows the list below). Very often it freezes.

With the models available today:

  • Llama 3.3: it freezes every once in a while
  • gemma3:latest: it always freezes
  • qwen2.5:latest: it always freezes
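
A launch sketch for that one-server-per-GPU setup, assuming CUDA and that ports 11434-11437 are free:

```bash
# Pin each server instance to one GPU and give it its own port.
for gpu in 0 1 2 3; do
  CUDA_VISIBLE_DEVICES=$gpu \
  OLLAMA_HOST=127.0.0.1:$((11434 + gpu)) \
  ollama serve &
done
```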

@fireblade2534 commented on GitHub (Mar 25, 2025):

Same here with 0.6.2 and Gemma 3 on two A4500 GPUs.

@chm-dev commented on GitHub (Apr 11, 2025):

RTX 3090, Windows 11.
Exactly the same thing as @thistlillo:

qwen2.5-coder 32b freezes almost instantly after loading. Sometimes you might actually get one or two chats in before it freezes.

@festivus37 commented on GitHub (Apr 15, 2025):

Same. Ollama 0.6.5.
M3 Ultra, 512 GB.
First noticed it on Deepseek R1 671b q4 after a few queries, then I switched to Gemma 3 27b q8. It gets through about 7-10 requests and then starts freezing. The model's memory stays in use, the GPU is pegged permanently, and no output is sent via the API. Quitting Ollama fixes it for a short while, then it's back to the same behavior.

@softmarshmallow commented on GitHub (Apr 25, 2025):

Same here. Running an M1 Max, gemma3:27b freezes 100% of the time after 20-30 minutes. I thought it was related to the screen saver, but it was simply freezing at random. Very annoying; I need to run automation overnight, and there's no way to do that like this.

@Sven1403 commented on GitHub (Apr 28, 2025):

Same. Ollama 0.6.6 with an A16 vGPU.

First it works fine, and then it freezes. With `ollama ps` I see the model running or stuck in "Stopping...". The only way to resolve it is to reboot the PC; killing the ollama process isn't working.
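
The standard escalation path, sketched below, may still fail if the runner is stuck in uninterruptible GPU driver calls (which would match the reboot-only behavior reported here), but it is worth trying before a reboot:

```bash
# Force-kill the wedged service and restart it:
sudo systemctl kill -s SIGKILL ollama
sudo systemctl restart ollama

# Or, for a non-systemd install:
sudo pkill -9 ollama
```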

@remidebette commented on GitHub (Jun 4, 2025):

Hi guys,
Same issue with ollama deployed on an A100 40GB
(in a Kubernetes environment: Helm chart 1.19.0 deploying ollama v0.9.0).

@BillShiyaoZhang commented on GitHub (Jul 11, 2025):

Same issue. Ollama v0.9.6 on Mac M4 Pro

@czaku commented on GitHub (Aug 30, 2025):

Same for me: ollama 0.11.8 on the latest macOS keeps spinning but doesn't generate any output.

@spacetime-labs commented on GitHub (Sep 8, 2025):

Same issue here on version 0.11.10 on macOS. I use the ollama app, and models stop thinking and/or responding; they remain stuck on "loading", and when I go back, loading seems to have stopped and there is no output. Sometimes adding another prompt or starting a new conversation gets it to answer, but most times it gets stuck.

@Dayal-star commented on GitHub (Oct 24, 2025):

Today is 24-Oct-2025, and the same thing is happening; I am running on Windows 10 and the ollama version is 0.12.6.

@vijaykanade55-sys commented on GitHub (Oct 26, 2025):

I have been trying to run different models on Ollama, like llama 3.2 and gemma:2b, for the past few days, and I keep encountering the same issue of the run command stalling. I am using Windows 10 with ollama version 0.12.6. The run command does not progress and stops almost instantly (see the attached image). Can anyone help me resolve the issue?

(Screenshot of the stalled run command is attached on the original GitHub issue.)

@JT0719 commented on GitHub (Nov 6, 2025):

Same as @vijaykanade55-sys over here... I tried different models, reinstalled ollama, reset my PC, and tried different PCs (all Windows 10), with no luck. It downloads the model, then it stays on "loading..." in the interface and in cmd. It was working fine a few weeks ago.
ollama version 0.12.9.

Reference: github-starred/ollama#1273