[GH-ISSUE #2225] Ollama stops generating output and fails to run models after a few minutes #1273

Closed
opened 2026-04-12 11:04:03 -05:00 by GiteaMirror · 54 comments
Owner

Originally created by @TheStarAlight on GitHub (Jan 27, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/2225

Originally assigned to: @jmorganca on GitHub.

Hi, I'm running ollama on a Debian server and use oterm as the interface.
After some chats (fewer than 10 normal questions) ollama stops responding, and running `ollama run mixtral` never succeeds (it keeps loading).
I noticed the same issue was reported in #1863. Is there a solution at the moment? Also, I'm not the administrator of the server and I don't even know how to restart ollama 😂. The serve process seems to run as another user named ollama. Can anyone tell me how to restart it?
To developers: I can provide some debug information if you need it; just tell me how.
Thanks :D

GiteaMirror added the bug label 2026-04-12 11:04:03 -05:00

@TheStarAlight commented on GitHub (Jan 27, 2024):

The models I'm running include mixtral:latest and wizard-math:70b.
I have access to an NVIDIA A100 PCIe 80GB, the inputs are all simple sentences (no more than 100 words), and I made sure nobody else was using the GPU (checked with nvitop).

@jmorganca commented on GitHub (Jan 27, 2024):

Hi @TheStarAlight, would it be possible to share which version of Ollama you are running? `ollama -v` will print this out. Thanks so much, and I'm sorry you hit this issue.

@TheStarAlight commented on GitHub (Jan 27, 2024):

@jmorganca Sure! The ollama version is 0.1.20, just installed three days ago via the shell script. Please tell me if you need more information :)

@jmorganca commented on GitHub (Jan 27, 2024):

Would it be possible to test with the newest version 0.1.22, which should fix this? https://github.com/ollama/ollama/releases/tag/v0.1.22

You can download the latest version of Ollama here: https://ollama.ai/download

Keep me posted!

@glorat commented on GitHub (Jan 30, 2024):

Is this a dupe issue of #1458 ?

Happened to me too on 0.1.22 with mistral on macOS. Will post again if I can find a way to reproduce.

@TheStarAlight commented on GitHub (Jan 30, 2024):

@glorat I think so; it seems this problem happens on all platforms (Linux, macOS, and WSL).

@TheStarAlight commented on GitHub (Jan 30, 2024):

@jmorganca I'm sorry, I'm not the administrator of the server and the administrator has not responded to my request 😂. I'll try it on my own computer (though it can only run <4b models; even mistral got very slow after the first evaluation) until the ollama on the server gets updated.
Btw, how can I restart the ollama server process 😂? It is started by the user ollama and I cannot stop it without administrator privileges. The process has been hanging on the server for a few days and I just cannot find a way to stop it.
Thank you!
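
A minimal sketch of the two options here, assuming a stock Linux install with the default systemd unit name `ollama` and the `ollama` binary on your PATH (the 11435 port is an arbitrary choice):

```bash
# With admin rights, restart the system service:
sudo systemctl restart ollama

# Without admin rights you cannot signal the "ollama" user's process,
# but you can run a private server instance under your own account on
# a different port (models are pulled again into your own ~/.ollama),
# then point the CLI at it:
OLLAMA_HOST=127.0.0.1:11435 ollama serve &
OLLAMA_HOST=127.0.0.1:11435 ollama run mixtral
```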

@iplayfast commented on GitHub (Jan 30, 2024):

@jmorganca I can confirm that my memory issues seem to have gone away in my stress test (https://github.com/ollama/ollama/issues/1691). Other issues have surfaced, but I think ollama version 0.1.22 is a winner.

@tarbard commented on GitHub (Jan 31, 2024):

I'm seeing this behaviour on 0.1.22 too.
After a few interactions (in this case with codellama 70b), the API stops responding to ollama-webui, and `ollama run codellama:70b-instruct-q4_K_M` just shows the loading animation and never starts.

`journalctl -u ollama` doesn't show any errors, just the last successful calls. Is there any way to see more detailed logs?

`systemctl restart ollama` eventually restarts ollama, but it takes quite a while.

@thexclu commented on GitHub (Jan 31, 2024):

I have the same issue, running version 0.1.22 with mistral

@adriancbo commented on GitHub (Feb 4, 2024):

I am experiencing the same issue while running the technovangelist airenamer on version 0.1.23 with any llava model. It functions initially but then hangs after a few minutes, driving CPU usage to 100%. After that, I am unable to run any models. My system configuration is as follows:

  • Ubuntu 22.04
  • 2x Nvidia 4090 GPUs
  • 512GB RAM

@TheStarAlight commented on GitHub (Feb 7, 2024):

@jmorganca I tried the new version (0.1.22) of ollama and broke it on two separate servers with two identical inputs 😂; the problem still exists. However, I noticed that the problem occurs when the context gets a bit long (~1600 Chinese characters, 7 prompts). Could that be the cause?

@TheStarAlight commented on GitHub (Feb 7, 2024):

> @jmorganca I tried the new version (0.1.22) of ollama and broke it on two separate servers with two identical inputs 😂; the problem still exists. However, I noticed that the problem occurs when the context gets a bit long (~1600 Chinese characters, 7 prompts). Could that be the cause?

I should have explained it more clearly. I'm using ollama-webui and qwen:72b (a different model this time), and I forwarded port 11434 from the remote server so my local webui could access it. After the problem happened, I saved the chat history and switched to another server, then tried to continue the chat using the same prompt that had caused the problem on the previous server, and it got stuck in the middle as well, after just a single evaluation...

@lukebelbina commented on GitHub (Feb 8, 2024):

I am having the same issue with the latest version, 0.1.24. It works for a few minutes, then eventually starts hanging on every request.

@coolrazor007 commented on GitHub (Feb 10, 2024):

I'm seeing this on 0.1.24 as well. How far back should I roll back in the interim? Does anyone know when this was introduced?

@jmorganca commented on GitHub (Feb 11, 2024):

Sorry this is still a problem – what kind of prompt is being sent to the model – is it the same prompt over and over again, or a different one? Thanks!

@lukebelbina commented on GitHub (Feb 12, 2024):

> Sorry this is still a problem – what kind of prompt is being sent to the model – is it the same prompt over and over again, or a different one? Thanks!

I am sending the same preprompt with a different user message, one after another (about every 1-2 seconds), using llama:17b. It crashes 100% of the time within about 10 minutes.
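
A minimal repro sketch of that workload, assuming a local server on the default port; the model name, prompt, and iteration count are placeholders:

```bash
# Same fixed instruction with a varying user message, one request every
# ~2 s; -m caps each request at 120 s so a hang shows up as a timeout.
for i in $(seq 1 300); do
  curl -s -m 120 http://127.0.0.1:11434/api/generate \
    -d "{\"model\": \"mistral\", \"prompt\": \"You are a helpful assistant. Request $i: say hello.\", \"stream\": false}" \
    -o /dev/null -w "req $i: HTTP %{http_code} in %{time_total}s\n"
  sleep 2
done
```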

@lips85 commented on GitHub (Feb 13, 2024):

> Sorry this is still a problem – what kind of prompt is being sent to the model – is it the same prompt over and over again, or a different one? Thanks!

I was running ollama v0.1.24 on a Mac M3 Max (64GB).
I worked with two models, mistral:latest and openhermes:latest; after performing the same task several times, CPU usage increased to 99% and everything stopped.

I confirmed that it was working on the GPU before the operation stopped.
Before checking the GitHub issues, I thought this was a problem that only occurred on a specific OS (Apple silicon), but it seems to occur regardless of platform.

@TheStarAlight commented on GitHub (Feb 13, 2024):

> Sorry this is still a problem – what kind of prompt is being sent to the model – is it the same prompt over and over again, or a different one? Thanks!

@jmorganca Hi, thank you for your attention. I was just having regular chats using ollama-webui (just like using ChatGPT). But now I cannot reproduce my previous chat anymore; I just had a chat with qwen:72b that ran longer than 2000 Chinese characters and the problem seems to have gone away. But one thing is for sure about my previous situation (ollama 0.1.22):

> I should have explained it more clearly. I'm using ollama-webui and qwen:72b (a different model this time), and I forwarded port 11434 from the remote server so my local webui could access it. After the problem happened, I saved the chat history and switched to another server, then tried to continue the chat using the same prompt that had caused the problem on the previous server, and it got stuck in the middle as well, after just a single evaluation...

It seemed that this chat was "poisonous": the next prompt would crash every ollama server (at least my 2 servers) on the first run. I'll comment if I find another similar occasion :D

@timiil commented on GitHub (Feb 19, 2024):

It seems we are facing the same problem on Ubuntu; whether in a Docker environment or with a directly deployed ollama service, after we call the ollama HTTP endpoint several times, the ollama HTTP service hangs.

@TheStarAlight commented on GitHub (Feb 19, 2024):

Is there a reproducible way to trigger the issue? Or is there any way to save a verbose log?

@mjspeck commented on GitHub (Feb 19, 2024):

I think I'm running into this issue as well.

@Sinan-Karakaya commented on GitHub (Feb 22, 2024):

I am running into the same issue, using mistral with a pre-prompt on a Mac M1 chip. After a couple of generations, the server will not respond until I kill my request.

@bennylam commented on GitHub (Feb 29, 2024):

> Is there a reproducible way to trigger the issue? Or is there any way to save a verbose log?

I have run Ollama (version 0.1.23) with llama2:latest and mistral:latest for a long time without problems when I use only English in the chat conversation. However, when I ask (in English) a query that instructs the model to reply in an Asian language (e.g. Chinese, Vietnamese, or Thai), it gets stuck or freezes in most cases after a few conversations. You can still see that Ollama is running at http://127.0.0.1:11434/, but whatever you type at the command prompt (even in English) gets no response at all. Until you reboot the whole system (WSL on Windows 10 in my case), you cannot get Ollama to respond again.
I think you can reproduce the problem this way.
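
Following that description, a hypothetical repro sketch (model name and prompt are placeholders; the 60 s timeout turns a hang into a visible failure):

```bash
# Run this a few times in a row; on affected versions the
# reply-in-an-Asian-language conversation reportedly wedges the server
# after a few turns.
curl -s -m 60 http://127.0.0.1:11434/api/generate \
  -d '{"model": "llama2", "prompt": "Please reply in Chinese: what is the capital of France?", "stream": false}'
```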

@gOATiful commented on GitHub (Feb 29, 2024):

We encountered the same problem on Ubuntu 20.04.6 LTS.

@justinwaltrip commented on GitHub (Feb 29, 2024):

I was able to fix this issue by removing the JSON formatting parameter from my /api/generate calls.
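
For reference, the workaround amounts to dropping the `format` field from the request body. A sketch with curl (model and prompt are placeholders):

```bash
# Variant that reportedly triggers the hang: constrained JSON output.
curl http://127.0.0.1:11434/api/generate \
  -d '{"model": "mistral", "prompt": "List three colors as JSON.", "format": "json", "stream": false}'

# Workaround: the same request without the "format" field.
curl http://127.0.0.1:11434/api/generate \
  -d '{"model": "mistral", "prompt": "List three colors as JSON.", "stream": false}'
```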

@antonsapt4 commented on GitHub (Mar 1, 2024):

Have the same problem. I'm on a Mac M2 running ollama desktop version 0.1.27, using gemma:7b-instruct-q6_K.

The first boot runs fine, and some curl tests confirm it works. But after it idles, whenever I send curl again and the model boots and offloads to Metal, it hangs and restarts my MacBook. It happens every time.

Can somebody shed some light? Should I uninstall and download a new Ollama, or is there a setting that can fix this issue?

PS:
@justinwaltrip, on Ollama desktop, how do I remove the JSON formatting?

@koleshjr commented on GitHub (Mar 5, 2024):

I am having the same issue even on the new version, 0.1.28. It happens after about 200 iterations with a custom fine-tuned 4-bit mistral on Colab's free-tier T4.

@deadmanoz commented on GitHub (Mar 8, 2024):

We are experiencing the same issue on 0.1.28, using the official Docker image on Ubuntu 22.04 with 8x RTX A40000.

Running llava:34b with images as part of the request.

It will successfully process infrequent requests, then suddenly hang on some request and remain unresponsive until the container is restarted.

Requests are made to the /api/generate endpoint across the network, with stream set to false in the request.

ollama still responds to health checks.
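
Since the bare health endpoint keeps answering while generation is wedged, a liveness probe has to exercise /api/generate itself. A watchdog sketch under that assumption (container name, model, and intervals are placeholders):

```bash
# Probe with a tiny prompt and a hard timeout; restart the container
# when the probe times out or returns an HTTP error.
while true; do
  if ! curl -sf -m 60 http://127.0.0.1:11434/api/generate \
       -d '{"model": "llava:34b", "prompt": "ping", "stream": false}' \
       -o /dev/null; then
    docker restart ollama
  fi
  sleep 300
done
```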

@deadmanoz commented on GitHub (Mar 8, 2024):

I was just watching this as it hit the issue.

CPU usage on one core hits 100%.

It will just hang here now until I restart the container. This is the output of `docker logs ollama`:

```
encode_image_[GIN] 2024/03/08 - 06:09:12 | 200 | 19.031746157s |      172.21.0.4 | POST     "/api/generate"
time=2024-03-08T06:09:13.001Z level=INFO source=dyn_ext_server.go:171 msg="loaded 3 images"
[GIN] 2024/03/08 - 06:09:26 | 200 | 18.617710773s |      172.21.0.4 | POST     "/api/generate"
time=2024-03-08T06:09:49.931Z level=INFO source=dyn_ext_server.go:171 msg="loaded 3 images"
[GIN] 2024/03/08 - 06:10:04 | 200 | 14.109569108s |      172.21.0.4 | POST     "/api/generate"
time=2024-03-08T06:13:45.049Z level=INFO source=dyn_ext_server.go:171 msg="loaded 3 images"
[GIN] 2024/03/08 - 06:13:58 | 200 | 13.486728309s |      172.21.0.4 | POST     "/api/generate"
time=2024-03-08T06:14:55.013Z level=INFO source=dyn_ext_server.go:171 msg="loaded 3 images"
[GIN] 2024/03/08 - 06:15:07 | 200 | 12.661186052s |      172.21.0.4 | POST     "/api/generate"
time=2024-03-08T06:16:47.278Z level=INFO source=dyn_ext_server.go:171 msg="loaded 3 images"
```

(nvtop and htop screenshots are attached on the original GitHub issue; they show one CPU core pinned at 100%.)

Please advise if I can provide any further logs or assist with debugging this issue

@seanmavley commented on GitHub (Mar 9, 2024):

> @justinwaltrip, on Ollama desktop, how do I remove the JSON formatting?

@antonsapt4 Removing the json format param appears to work without issues for me.

Issue may be related to #1910

@deadmanoz commented on GitHub (Mar 9, 2024):

Note: I don't use `"format": "json"` and I still have this issue.

@jmccrosky commented on GitHub (Mar 17, 2024):

I also have this issue on an M3 Max. It seems to be somewhat random, but it tends to happen more quickly with larger models or larger prompts. For example, with one model, including only one image per prompt works consistently, but including two per prompt will trigger this issue after some runs... With a larger model, it happens even with only one image.

@niyogrv commented on GitHub (Mar 20, 2024):

#1863 and this seem to be the same issue.

@deadmanoz commented on GitHub (Mar 28, 2024):

Yes, looks to be the same issue!

🤞 that https://github.com/ollama/ollama/issues/2225 is the resolution!

@Master-Lucas commented on GitHub (Apr 3, 2024):

Even in Ollama version 0.1.30, Japanese text generation stops. I've tried it once with the lightweight "mistral" and three times with "dolphin-mistral" (all q4); at the point where generation fails, feeding all the input and output Japanese characters into a token counter shows it stops at around 3400-3600 tokens. It feels like it hangs after generating longer texts for about 6-9 turns (Mac Sonoma 14.4.1, 64GB). P.S. I used the terminal for these experiments, but the same issue occurs with the PageAssist webui.

@Mecil9 commented on GitHub (Apr 7, 2024):

I have the same problem. When I run ollama on an Apple M1 Max, Activity Monitor shows 100% CPU usage and 0% GPU usage, and after running for a while ollama becomes unresponsive. I don't know what causes it.
Once the CPU reaches 100%, ollama stops working. I have tried many methods, to no avail!

@jmorganca commented on GitHub (Apr 15, 2024):

Hi all, this should be fixed in 0.1.31 (hanging when unicode characters are in the prompt). Further fixes for hanging are also in 0.1.32 - stay tuned!
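
On Linux, re-running the install script upgrades an existing install in place (the original poster installed this way; the URL below is the currently documented one):

```bash
curl -fsSL https://ollama.com/install.sh | sh
ollama -v   # confirm the upgraded version
```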

@Destroyer commented on GitHub (Apr 20, 2024):

Running 0.1.32 and the issue still persists. Using CPU AVX on Debian 12 with the llama3 model, it gets stuck until my proxy times out after 60 seconds; reloading the page and typing the query again fixes it, but it is very annoying.

@NikitaDeveloperAI commented on GitHub (Jun 3, 2024):

Currently running Ollama 0.1.41, sadly this problem still persists.

@seanmavley commented on GitHub (Jun 3, 2024):

@NikitaDeveloperAI this problem may never get a universal fix, as it's like a game of whack-a-mole.

In our case, we stopped using `format="json"` and explicitly wrote a prompt that outputs the exact JSON structure we want.

For now, that approach appears to work with some level of predictability and consistency. We're still testing it, but so far it doesn't cause hanging issues with Ollama (see the sketch below).
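
A sketch of that prompt-pinned approach via /api/chat, with a placeholder model and schema; note there is no `format` field in the request body:

```bash
curl http://127.0.0.1:11434/api/chat -d '{
  "model": "mistral",
  "stream": false,
  "messages": [
    {"role": "system", "content": "Answer only with JSON of the form {\"name\": string, \"colors\": [string]}. No prose."},
    {"role": "user", "content": "Describe a rainbow."}
  ]
}'
```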

@thistlillo commented on GitHub (Mar 24, 2025):

I am using ollama version 0.6.1 on Rocky Linux 8.10 (Green Obsidian), on a machine with four H100 80GB GPUs; I configure several Ollama servers, one per GPU (a launch sketch follows the list below). Very often it freezes.

With the models available today:

  • Llama 3.3: it freezes every once in a while
  • gemma3:latest: it always freezes
  • qwen2.5:latest: it always freezes
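
A launch sketch for that one-server-per-GPU setup, assuming CUDA and that ports 11434-11437 are free:

```bash
# Pin each server instance to one GPU and give it its own port.
for gpu in 0 1 2 3; do
  CUDA_VISIBLE_DEVICES=$gpu \
  OLLAMA_HOST=127.0.0.1:$((11434 + gpu)) \
  ollama serve &
done
```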

@fireblade2534 commented on GitHub (Mar 25, 2025):

Same here with 0.6.2 and Gemma 3 on two A4500 GPUs.

@chm-dev commented on GitHub (Apr 11, 2025):

RTX 3090, Windows 11.
Exactly the same thing as @thistlillo:

qwen2.5-coder 32b freezes almost instantly after loading. Sometimes you might actually get one or two chats in before it freezes.

@festivus37 commented on GitHub (Apr 15, 2025):

Same. Ollama 0.6.5.
M3 Ultra, 512 GB.
First noticed it on Deepseek R1 671b q4 after a few queries, then I switched to Gemma 3 27b q8. It gets through about 7-10 requests and then starts freezing. The model's memory stays in use, the GPU is pegged permanently, and no output is sent via the API. Quitting Ollama fixes it for a short while, then it's back to the same behavior.

@softmarshmallow commented on GitHub (Apr 25, 2025):

Same here. Running an M1 Max, gemma3:27b freezes 100% of the time after 20-30 minutes. I thought it was related to the screen saver, but it was simply freezing at random. Very annoying; I need to run automation overnight, and there's no way to do that like this.

@Sven1403 commented on GitHub (Apr 28, 2025):

Same. Ollama 0.6.6 with an A16 vGPU.

First it works fine, and then it freezes. With `ollama ps` I see the model running or stuck in "Stopping...". The only way to resolve it is to reboot the PC; killing the ollama process isn't working.
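
The standard escalation path, sketched below, may still fail if the runner is stuck in uninterruptible GPU driver calls (which would match the reboot-only behavior reported here), but it is worth trying before a reboot:

```bash
# Force-kill the wedged service and restart it:
sudo systemctl kill -s SIGKILL ollama
sudo systemctl restart ollama

# Or, for a non-systemd install:
sudo pkill -9 ollama
```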

@remidebette commented on GitHub (Jun 4, 2025):

Hi guys,
Same issue with ollama deployed on an A100 40GB
(in a Kubernetes environment: Helm chart 1.19.0 deploying ollama v0.9.0).

@BillShiyaoZhang commented on GitHub (Jul 11, 2025):

Same issue. Ollama v0.9.6 on Mac M4 Pro

@czaku commented on GitHub (Aug 30, 2025):

Same for me: ollama 0.11.8 on the latest macOS keeps spinning but doesn't generate any output.

@spacetime-labs commented on GitHub (Sep 8, 2025):

Same issue here on version 0.11.10 on macOS. I use the ollama app, and models stop thinking and/or responding; they remain stuck on "loading", and when I go back, loading seems to have stopped and there is no output. Sometimes adding another prompt or starting a new conversation gets it to answer, but most times it gets stuck.

@Dayal-star commented on GitHub (Oct 24, 2025):

Today is 24-Oct-2025, and the same thing is happening; I am running on Windows 10 and the ollama version is 0.12.6.

@vijaykanade55-sys commented on GitHub (Oct 26, 2025):

I have been trying to run different models on Ollama, like llama 3.2 and gemma:2b, for the past few days, and I keep encountering the same issue of the run command stalling. I am using Windows 10 with ollama version 0.12.6. The run command does not progress and stops almost instantly (see the attached image). Can anyone help me resolve the issue?

(Screenshot of the stalled run command is attached on the original GitHub issue.)

@JT0719 commented on GitHub (Nov 6, 2025):

Same as @vijaykanade55-sys over here... I tried different models, reinstalled ollama, reset my PC, and tried different PCs (all Windows 10), with no luck. It downloads the model, then it stays on "loading..." in the interface and in cmd. It was working fine a few weeks ago.
ollama version 0.12.9.

Reference: github-starred/ollama#1273