[GH-ISSUE #1838] Cuda Error with 2GB VRAM: Error: Post "http://127.0.0.1:11434/api/generate": EOF #47560

Closed
opened 2026-04-28 04:10:23 -05:00 by GiteaMirror · 8 comments

Originally created by @falaimo on GitHub (Jan 7, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/1838

Hello everyone, in Ollama version 0.1.18 I'm encountering the error `Error: Post "http://127.0.0.1:11434/api/generate": EOF` when starting Ollama with any model. I think it is related to CUDA...
[logs_ollama.txt](https://github.com/jmorganca/ollama/files/13852832/logs_ollama.txt)
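
(For context, that error is the CLI's POST to the server being cut off when the server process dies. The same failure can be reproduced by calling the generate endpoint directly; a minimal sketch assuming the default port and the `mistral` model used later in this thread:)

```bash
# Call the generate endpoint directly; if the server crashes while
# loading the model, the connection drops mid-response (the EOF).
curl http://127.0.0.1:11434/api/generate -d '{
  "model": "mistral",
  "prompt": "hello"
}'
```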

GiteaMirror added the bug label 2026-04-28 04:10:23 -05:00

@pierreuuuuu commented on GitHub (Jan 7, 2024):

Hello, I was about to create a ticket as well. I have the same behavior and the same error message about CUDA:
`GGML_ASSERT: /go/src/github.com/jmorganca/ollama/llm/llama.cpp/ggml-cuda.cu:7801: !"CUDA error"`
I don't know if it's related to the error, but I have the same GPU as you, a GeForce GTX 950M. My CUDA version is 12.3; the Nvidia driver is 545.23.08.
I'm also using Ollama v0.1.18, on Ubuntu 22.04.3, and I'm trying to use Mistral (`ollama run mistral`).

I've read older posts about `Error: Post "http://127.0.0.1:11434/api/generate": EOF`, and the answer was not enough RAM, but I have 16 GB and I thought that was enough for Mistral.

[logs.txt](https://github.com/jmorganca/ollama/files/13852949/logs.txt)

Any ideas?
Thanks for reading.


@kursatgormez commented on GitHub (Jan 7, 2024):

I got the same error after updating Ollama.


@jmorganca commented on GitHub (Jan 7, 2024):

Hi all, sorry you hit this error. Working on a fix!

Here's a handy one-line script for installing the previous version (which would fall back to CPU-only) until this is fixed:

```
curl https://ollama.ai/install.sh | sed 's#https://ollama.ai/download#https://github.com/jmorganca/ollama/releases/download/v0.1.17#' | sh
```
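
(The `sed` step rewrites the download base URL inside the install script, so it fetches the v0.1.17 release assets from the GitHub releases page instead of the latest build.)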

@kursatgormez commented on GitHub (Jan 7, 2024):

My machine is a MacBook Pro M2.


@jmorganca commented on GitHub (Jan 7, 2024):

@kursatgormez sorry about that – would it be possible to share any error you might see in the logs? `~/.ollama/logs/server.log`

Thanks so much
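
(For anyone else digging through that file, a quick sketch for pulling the most recent error lines out of the log, assuming a standard install where the log lives at the path above:)

```bash
# Show the last error/assert lines from the Ollama server log
grep -iE "error|assert" ~/.ollama/logs/server.log | tail -n 20
```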


@kursatgormez commented on GitHub (Jan 8, 2024):

My main purpose is fine-tuning Llama 2, so I used llama.cpp to create a GGUF file and then loaded it with ADAPTER.
Maybe the GGUF file caused this.
I lost my server.log, but if I run into this again I will ask. Thank you so much.
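
(For anyone reproducing that workflow, a minimal sketch of the Modelfile setup it describes — a base model plus a llama.cpp-converted adapter. The adapter filename here is a placeholder:)

```
# Hypothetical Modelfile: base model plus a GGUF LoRA adapter
FROM llama2
ADAPTER ./lora-adapter.gguf
```

Then build and run it with `ollama create my-finetune -f Modelfile` followed by `ollama run my-finetune`.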


@deltawi commented on GitHub (Jan 8, 2024):

Hey team, I am facing the same issue on Ubuntu 22.04 with an RTX A5000 GPU. I am trying `mixtral:8x7b-instruct-v0.1-q4_0`.

I ran:

```bash
ollama run mixtral:8x7b-instruct-v0.1-q4_0
```

@Cybervet commented on GitHub (Jan 8, 2024):

I think the problem continues, at least when compiling from source. Here is the error message when trying to run a small model on a card with 2 GB of VRAM. After the CUDA error, instead of falling back to CPU-only mode, it exits.

```
2024/01/08 17:39:36 routes.go:930: Listening on 127.0.0.1:11434 (version 0.0.0)
2024/01/08 17:39:42 shim_ext_server.go:142: Dynamic LLM variants [cuda]
2024/01/08 17:39:42 gpu.go:37: Detecting GPU type
2024/01/08 17:39:42 gpu.go:56: Nvidia GPU detected
2024/01/08 17:39:42 gpu.go:86: CUDA Compute Capability detected: 5.0

llm_load_tensors: ggml ctx size = 0.08 MiB
llm_load_tensors: using CUDA for GPU acceleration
llm_load_tensors: mem required = 35.52 MiB
llm_load_tensors: offloading 24 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 25/25 layers to GPU
llm_load_tensors: VRAM used: 703.44 MiB
...........................................................................................
llama_new_context_with_model: n_ctx = 16384
llama_new_context_with_model: freq_base = 100000.0
llama_new_context_with_model: freq_scale = 0.25

CUDA error 2 at /root/ollama/llm/llama.cpp/ggml-cuda.cu:9132: out of memory
current device: 0
GGML_ASSERT: /root/ollama/llm/llama.cpp/ggml-cuda.cu:9132: !"CUDA error"
SIGABRT: abort
PC=0x7fd38b6a9d3c m=4 sigcode=18446744073709551610
signal arrived during cgo execution
```
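
(Until the CPU fallback works again, one workaround sketch is to keep all layers off the GPU. `num_gpu` is the standard Ollama request option for the number of offloaded layers; whether this avoids the crash on a source build is untested:)

```bash
# Request CPU-only inference: num_gpu 0 means no layers are offloaded.
curl http://127.0.0.1:11434/api/generate -d '{
  "model": "mistral",
  "prompt": "hello",
  "options": { "num_gpu": 0 }
}'
```

Alternatively, hiding the GPU from the runtime entirely (`CUDA_VISIBLE_DEVICES="" ollama serve`) should force the CPU path.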
