[GH-ISSUE #1197] Trying to run ollama on a server #47121

Closed
opened 2026-04-28 03:18:26 -05:00 by GiteaMirror · 7 comments

Originally created by @621625 on GitHub (Nov 19, 2023).
Original GitHub issue: https://github.com/ollama/ollama/issues/1197

Originally assigned to: @BruceMacD on GitHub.

Hi,

I'm Mr. Mist, a very friendly guy.

I'm trying to run ollama on a server but get this error message. Any ideas or solutions?

llama_new_context_with_model: kv self size = 256.00 MB
llama_new_context_with_model: compute buffer total size = 162.13 MB
2023/11/19 12:07:13 llama.go:438: error starting llama runner: timed out waiting for llama runner to start
2023/11/19 12:07:14 llama.go:430: signal: killed
2023/11/19 12:07:14 llama.go:504: llama runner stopped successfully
[GIN] 2023/11/19 - 12:07:14 | 500 | 4m32s | 127.0.0.1 | POST "/api/generate"
[GIN] 2023/11/19 - 12:18:11 | 200 | 581.303µs | 127.0.0.1 | HEAD "/"
[GIN] 2023/11/19 - 12:18:11 | 200 | 3.32279ms | 127.0.0.1 | GET "/api/tags"
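The timeout here usually masks an earlier failure that only shows up in the server log. A minimal way to catch it on a standard Linux install (assuming the systemd service the install script sets up; adjust the unit name if yours differs):

# Follow the Ollama server log while reproducing the error
journalctl -u ollama -f

# In another terminal, trigger the failure again
ollama run mistral "hello"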

GiteaMirror added the bug label 2026-04-28 03:18:26 -05:00

@kukidevs commented on GitHub (Nov 19, 2023):

Same here, getting "⠼ Error: timed out waiting for llama runner to start" while running "ollama run mistral" on Ubuntu.


@BruceMacD commented on GitHub (Nov 20, 2023):

Hi all, these messages indicate that your machine may not have enough memory (or other resources) to run the model you are trying to load. The error messages here could be better, sorry about that.

To confirm, what are the specs of the machines you are using?

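A quick sketch for collecting the specs asked for above, assuming a Linux machine (the nvidia-smi line only applies if an NVIDIA GPU is present):

nproc        # logical CPU cores
free -h      # total and available RAM
df -h /      # free disk space on the root volume
nvidia-smi   # GPU model and VRAM, if any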

@Amal-David commented on GitHub (Nov 20, 2023):

Hey, I'm getting the same error for llama2:70b, trying to run it on a G5.2xlarge (32 GB RAM, 24 GB A10, 80 GB storage with 20 GB available). RAM and GPU didn't seem to peak when observed. Do I need more storage?


@technovangelist commented on GitHub (Dec 4, 2023):

You will need at least 64 GB of RAM to run the 70B model, but there are plenty of other models that will fit in 32 GB of RAM. Perhaps try the 13b model for llama2?

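The 64 GB figure is consistent with a back-of-envelope estimate, assuming the roughly 4-bit quantization the default llama2:70b tag uses (weights alone, before KV cache and runtime overhead):

# ~70e9 parameters at ~0.5 bytes each: the weights alone exceed 32 GB of RAM
echo $(( 70 * 1000000000 / 2 / 1000000000 ))   # prints 35 (GB, approximate)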

@jexom commented on GitHub (Dec 21, 2023):

Having the same issue with dolphin-mixtral and wizard-vicuna-uncensored:30b on a 64 GB RAM machine running Ollama as a Docker container on Windows. It takes a long time to load and then times out.

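One Windows-specific wrinkle: Docker Desktop runs containers inside a WSL2 VM that by default only gets a fraction of the host RAM, so a 64 GB machine may expose far less to the container. A sketch using Docker's standard --memory flag and the run command from the Ollama docs (the 48g value is illustrative; the WSL2 cap itself is raised via %UserProfile%\.wslconfig followed by `wsl --shutdown`):

# Give the Ollama container an explicit memory ceiling
docker run -d --memory=48g -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama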

@technovangelist commented on GitHub (Jan 3, 2024):

We updated all of the models in the last week or so. Try re-pulling the models and let us know if your problem is solved.

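Re-pulling is just the pull command again; it fetches any layers that changed upstream (model names here are the ones mentioned in this thread):

# Re-download updated layers for a model
ollama pull mistral
ollama pull llama2:70b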

@BruceMacD commented on GitHub (Mar 11, 2024):

Thanks to everyone for the information on this one. When this issue was created there were various issues related to memory allocation (over-allocating the context window, offloading too many layers to the GPU) which have since been resolved. Closing this issue now; if you see any similar problems please open a new issue so they get attention.

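For reference, both allocation knobs mentioned above can also be tuned per model. A minimal sketch using the documented Modelfile parameters num_ctx (context window) and num_gpu (layers offloaded to the GPU); the Modelfile contents and the llama2-small tag name are illustrative:

# Modelfile
FROM llama2:70b
PARAMETER num_ctx 2048   # smaller context window, smaller KV cache
PARAMETER num_gpu 20     # offload only 20 layers to the GPU

# Build and run the trimmed variant
ollama create llama2-small -f Modelfile
ollama run llama2-small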