[GH-ISSUE #1641] Ollama hangs sometimes if it runs out of VRAM #916

Closed
opened 2026-04-12 10:36:04 -05:00 by GiteaMirror · 3 comments

Originally created by @nick-tonjum on GitHub (Dec 20, 2023).
Original GitHub issue: https://github.com/ollama/ollama/issues/1641

Hi! I've been having an issue with models that cause the system to run out of VRAM. It usually goes like this:

  1. attempt to run a model via the API (for example, Llama 2 70B)
  2. ollama-runner tries to load the model into VRAM
  3. ollama-runner runs out of VRAM and the process gets killed
  4. the API hangs indefinitely until Ollama is restarted (via systemctl restart, or by killing the Docker container if applicable)

I don't know why it has to be restarted before it can process the next request. Would it be possible to add a feature where it detects that it has run out of VRAM or crashed, and then returns an error via the API and/or restarts automatically? This is something I've been running into a lot recently, as I only have 24 GB of VRAM.

Much appreciated!
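
In the meantime, a client-side request timeout at least keeps callers from blocking forever when the runner dies. Below is a minimal sketch against Ollama's standard /api/generate endpoint; the model name and timeout values are only placeholders, and this is a workaround rather than a fix for the server-side hang:

```python
import requests

OLLAMA_URL = "http://localhost:11434"  # default Ollama listen address

def generate(prompt: str, model: str = "llama2:70b", read_timeout: float = 300):
    """Non-streaming generate call that errors out instead of hanging."""
    try:
        resp = requests.post(
            f"{OLLAMA_URL}/api/generate",
            json={"model": model, "prompt": prompt, "stream": False},
            timeout=(5, read_timeout),  # (connect, read) timeouts in seconds
        )
        resp.raise_for_status()
        return resp.json()["response"]
    except requests.exceptions.Timeout:
        # Likely the runner was killed (e.g. out of VRAM) and the API is stuck;
        # raise so the caller can restart the service instead of waiting forever.
        raise RuntimeError(f"{model}: no response within {read_timeout}s") from None
```

With stream set to False, the read timeout fires when the server never answers, which is exactly the hang described above.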

GiteaMirror added the bug label 2026-04-12 10:36:04 -05:00

@iplayfast commented on GitHub (Dec 23, 2023):

I've run into this as well while making my project https://github.com/iplayfast/OllamaPlayground/tree/main/createnotes#readme
This project really exercises the Ollama system by checking that each model can be loaded and then asking it questions.

It hits a timeout when loading falcon:180b, and after that some models will load and others won't. meditron ends up heating up my CPU but not much else.
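
For reference, a loop in that spirit might look roughly like this. This is a sketch, not the project's actual code; it uses Ollama's /api/tags and /api/generate endpoints and an arbitrary 300-second limit:

```python
import time
import requests

OLLAMA_URL = "http://localhost:11434"
TIMEOUT_S = 300  # per-model time limit

# /api/tags lists the models available locally
models = [m["name"] for m in
          requests.get(f"{OLLAMA_URL}/api/tags", timeout=10).json()["models"]]

for name in models:
    print(f"attempting to load model {name}")
    start = time.time()
    try:
        requests.post(
            f"{OLLAMA_URL}/api/generate",
            json={"model": name, "prompt": "are you there", "stream": False},
            timeout=TIMEOUT_S,
        ).raise_for_status()
        print(f"model {name} loaded in {time.time() - start:.1f} seconds")
    except requests.exceptions.Timeout:
        print(f"Timed out after {TIMEOUT_S} seconds for question: are you there")
```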


@iplayfast commented on GitHub (Dec 23, 2023):

Found my problem (hopefully): I had downloaded falcon:180b as a stress test and forgot about it. 101 GB won't load...

After attempting to load it and timing out, it went on to load some other models, but then died on meditron, which is the pattern I've seen before. The funny thing is that meditron isn't an especially big model.

```
attempting to load model falcon:180b
Timed out after 300 seconds for question: are you there
model falcon:180b ------------not loaded------------ in 364.1 seconds
attempting to load model llama2:latest
model llama2:latest loaded in 27.6 seconds
attempting to load model llama2-uncensored:latest
model llama2-uncensored:latest loaded in 23.6 seconds
attempting to load model llava:latest
model llava:latest loaded in 30.6 seconds
attempting to load model magicoder:latest
model magicoder:latest loaded in 28.4 seconds
attempting to load model meditron:latest
Timed out after 300 seconds for question: are you there
```

I think this is just a case of needing better error handling when loading a model.
*** edit ***
Nope, the problem persists. Created a separate issue for it.
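
One way to approximate that error handling on the client side is a rough pre-flight check: compare each model's on-disk size (the size field that /api/tags reports in bytes on current Ollama builds, worth verifying against your version) with the free VRAM nvidia-smi reports, and skip anything that obviously cannot fit. A sketch under those assumptions; the comparison is only a heuristic, since quantization and partial CPU offload change the real requirement:

```python
import subprocess
import requests

OLLAMA_URL = "http://localhost:11434"

def free_vram_mib(gpu_index: int = 0) -> int:
    """Free memory on one GPU in MiB, as reported by nvidia-smi."""
    out = subprocess.check_output(
        ["nvidia-smi", f"--id={gpu_index}", "--query-gpu=memory.free",
         "--format=csv,noheader,nounits"], text=True)
    return int(out.strip())

def split_by_fit():
    """Partition local models into ones that might fit in VRAM and ones that won't."""
    models = requests.get(f"{OLLAMA_URL}/api/tags", timeout=10).json()["models"]
    free_mib = free_vram_mib()
    fits, too_big = [], []
    for m in models:
        size_mib = m["size"] / (1024 * 1024)  # on-disk size in bytes -> MiB
        (fits if size_mib < free_mib else too_big).append(m["name"])
    return fits, too_big
```

A 101 GB falcon:180b would land in the too_big bucket long before a load attempt ties up the server.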


@1e100 commented on GitHub (Jan 12, 2024):

Could be some kind of timeout. I have two RTX A6000s in my machine and only one is allocated to Ollama; the other is usually training or fine-tuning something. When that training is ongoing, the Ollama GPU is slower, so it can also hang on something as small as Mixtral.
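
A quick way to check whether contention is what's happening is to watch the Ollama GPU while a request is stuck. A trivial sketch; GPU index 0 is assumed to be the one Ollama is using:

```python
import subprocess
import time

def watch_gpu(gpu_index: int = 0, interval_s: float = 5):
    """Periodically print utilization and memory use for one GPU."""
    while True:
        out = subprocess.check_output(
            ["nvidia-smi", f"--id={gpu_index}",
             "--query-gpu=utilization.gpu,memory.used,memory.total",
             "--format=csv,noheader,nounits"], text=True).strip()
        util, used, total = out.split(", ")
        print(f"GPU {gpu_index}: {util}% util, {used}/{total} MiB used")
        time.sleep(interval_s)
```

If utilization and memory stay pegged by the other workload while the load stalls, that lines up with the slow-GPU explanation.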
