[GH-ISSUE #9711] Error: POST Predict Request to http://127.0.0.1:53151/completion Failed – Connection Forcibly Closed by Remote Host (wsarecv) #6344

Closed
opened 2026-04-12 17:51:25 -05:00 by GiteaMirror · 15 comments

Originally created by @Dejon141 on GitHub (Mar 13, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/9711

What is the issue?

So I was running the new gemma3 models when I stumbled upon an error with gemma3:27b. I don't know why it happens; it persisted even after testing multiple times. I don't have any environment variables interfering, nor do I have an external host accessing the server. I tried the other gemma models and they worked, so it seems like either my GPU is a potato or there is some kind of internal error with the ollama server.

Error: POST predict: Post "http://127.0.0.1:53194/completion": read tcp 127.0.0.1:53196->127.0.0.1:53194: wsarecv: An existing connection was forcibly closed by the remote host.

COMMAND

"ollama run gemma3:27b"

SPECIFICATION

Geforce GTX 1060 6GB
Intel i5 10400F

Relevant log output

[server.log](https://github.com/user-attachments/files/19222811/server.log)

OS

Windows

GPU

Nvidia

CPU

Intel

Ollama version

0.6.0

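For context: gemma3:27b is far larger than the 6 GB of VRAM on a GTX 1060, so most of the model will be offloaded to system RAM. One way to see the actual CPU/GPU split while the model is loaded (a sketch; the exact column layout varies by Ollama version):

```shell
# in a second terminal, while the model is loaded
ollama ps      # the PROCESSOR column shows the split, e.g. "88%/12% CPU/GPU"
nvidia-smi     # shows how much VRAM is actually in use on the card
```

A heavy spill like that may explain why the smaller gemma3 variants worked here while 27b did not.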
GiteaMirror added the bug label 2026-04-12 17:51:25 -05:00

@sabre-code commented on GitHub (Mar 13, 2025):

Same here:

ResponseError: POST predict: Post "http://127.0.0.1:40133/completion": EOF


@vlimki commented on GitHub (Mar 13, 2025):

Same here.


@stonega commented on GitHub (Mar 13, 2025):

Same here.


@titandino88 commented on GitHub (Mar 13, 2025):

Same here.


@SimpleYj commented on GitHub (Mar 13, 2025):

Same here: Error: POST predict: Post "http://127.0.0.1:43845/completion": EOF


@vlimki commented on GitHub (Mar 13, 2025):

https://github.com/ollama/ollama/issues/9699#issuecomment-2719330180

This helped me out


@ACheshirov commented on GitHub (Mar 13, 2025):

Same here: Error: POST predict: Post "http://127.0.0.1:43569/completion": EOF


@tuber84 commented on GitHub (Mar 13, 2025):

Same problem


@omargohan commented on GitHub (Mar 14, 2025):

I had the same issue until I added `Environment="GGML_CUDA_ENABLE_UNIFIED_MEMORY=1"` to `ollama.service`, as suggested [here](https://github.com/ollama/ollama/issues/9707#issuecomment-2719746522).

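For a systemd-managed Linux install, that typically means adding a drop-in override and restarting the service. A minimal sketch (the unit name assumes the standard Linux install):

```shell
sudo systemctl edit ollama
# add these lines to the drop-in file that opens, then save:
#   [Service]
#   Environment="GGML_CUDA_ENABLE_UNIFIED_MEMORY=1"
sudo systemctl restart ollama
```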

@Walker555 commented on GitHub (Mar 14, 2025):

Fix for Docker users:
`docker run -d --gpus=all -v /path/to/my/volumes/ollama:/root/.ollama -p 11434:11434 -e GGML_CUDA_ENABLE_UNIFIED_MEMORY=1 --name ollama ollama/ollama:0.6.0`

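Note that `-e` only takes effect when the container is created; restarting an existing container will not pick the variable up, so the container must be removed and re-run. A quick way to confirm it is actually set (a sketch):

```shell
# should print the variable if the container was created with -e
docker exec ollama env | grep GGML_CUDA_ENABLE_UNIFIED_MEMORY
```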

@openelearning commented on GitHub (Mar 14, 2025):

Same problem with or without `Environment="GGML_CUDA_ENABLE_UNIFIED_MEMORY=1"`, with gemma 27b or 12b, and with NVIDIA CUDA.


@kurokirasama commented on GitHub (Mar 15, 2025):

Just reporting that the problem is not fixed in 0.6.1 for gemma3:27b on my main machine, but it was fixed on another machine (with lower resources), where the error occurred with gemma3:4b.


@fansRealx commented on GitHub (Mar 24, 2025):

Same problem. The number of words in your question exceeds the maximum token limit of the model; you can set the model's token length higher.

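If a too-small context window is the trigger, the context length can be raised per session from the interactive prompt, or baked into a Modelfile. A sketch of the interactive route (8192 is an arbitrary example value):

```shell
ollama run gemma3:27b
# then, at the >>> prompt inside the session:
#   /set parameter num_ctx 8192
```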

@tunisiano187 commented on GitHub (May 1, 2025):

0.6.6 has the same error; going back to 0.6.5 solved it.

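On Linux, the install script supports pinning a version via `OLLAMA_VERSION` (per the Ollama FAQ), which makes this kind of downgrade easy to reproduce:

```shell
# reinstall 0.6.5 specifically; on Windows, download the 0.6.5 installer
# from the GitHub releases page instead
curl -fsSL https://ollama.com/install.sh | OLLAMA_VERSION=0.6.5 sh
```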

@jessegross commented on GitHub (May 1, 2025):

I believe that the original issue has been resolved, so I'm going to close this. If you are still seeing this with 0.6.7, please create a new issue and attach logs.
