[GH-ISSUE #10450] An existing connection was forcibly closed by the remote host. GGML_ASSERT failed and "key not found" #53383

Closed
opened 2026-04-29 02:49:52 -05:00 by GiteaMirror · 2 comments

Originally created by @SakirHussain on GitHub (Apr 28, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/10450

### What is the issue?

After the recent Ollama update, I am unable to run gemma3:4b on my NVIDIA GTX 1650.

For context, I was able to run this model before the update, but now there appears to be a GPU memory issue that causes the following error:

```
ollama._types.ResponseError: POST predict: Post "http://127.0.0.1:56002/completion": read tcp 127.0.0.1:58282->127.0.0.1:56002: wsarecv: An existing connection was forcibly closed by the remote host. (status code: -1)
```
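For reference, here is a minimal sketch of the client call path that triggers this (using the ollama Python package, which is where `ollama._types.ResponseError` comes from; the prompt is illustrative):

```python
# Minimal reproduction sketch, assuming the ollama Python client is installed
# and `ollama serve` is running on the default port. The prompt is illustrative.
import ollama

try:
    result = ollama.generate(model="gemma3:4b", prompt="Say hello")
    print(result["response"])
except ollama.ResponseError as err:
    # On the affected setup this surfaces as:
    #   POST predict: ... wsarecv: An existing connection was forcibly
    #   closed by the remote host. (status code: -1)
    print(f"status {err.status_code}: {err.error}")
```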

I have also provided the ollama serve logs below.

I have looked at other issues where it was suggested that another process may be occupying the port, causing Ollama to time out. However, the ollama serve logs instead show repeated "key not found" warnings and a GGML_ASSERT failure in ggml-alloc.c.
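(For reference, a quick stdlib probe can show whether another process is already listening on a given port; 11434 is the default from the server config in the logs below. Note it only shows whether the port is occupied, not by which process.)

```python
# Sketch: check whether something is already listening on Ollama's default
# port. A successful connect (return code 0) means the port is occupied.
import socket

def port_in_use(host: str = "127.0.0.1", port: int = 11434) -> bool:
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        return sock.connect_ex((host, port)) == 0

print("port 11434 in use:", port_in_use())
```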

### Relevant log output

```shell
PS C:\Users\orpheus> ollama serve
2025/04/29 04:51:36 routes.go:1232: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:2048 OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:C:\\Users\\orpheus\\.ollama\\models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES:]"
time=2025-04-29T04:51:36.423+05:30 level=INFO source=images.go:458 msg="total blobs: 5"
time=2025-04-29T04:51:36.424+05:30 level=INFO source=images.go:465 msg="total unused blobs removed: 0"
time=2025-04-29T04:51:36.425+05:30 level=INFO source=routes.go:1299 msg="Listening on 127.0.0.1:11434 (version 0.6.6)"
time=2025-04-29T04:51:36.425+05:30 level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
time=2025-04-29T04:51:36.425+05:30 level=INFO source=gpu_windows.go:167 msg=packages count=1
time=2025-04-29T04:51:36.425+05:30 level=INFO source=gpu_windows.go:214 msg="" package=0 cores=6 efficiency=0 threads=12
time=2025-04-29T04:51:36.796+05:30 level=INFO source=gpu.go:319 msg="detected OS VRAM overhead" id=GPU-78440d0b-e257-86ed-efbc-7e199fb00e4c library=cuda compute=7.5 driver=12.8 name="NVIDIA GeForce GTX 1650" overhead="638.5 MiB"
time=2025-04-29T04:51:36.798+05:30 level=INFO source=types.go:130 msg="inference compute" id=GPU-78440d0b-e257-86ed-efbc-7e199fb00e4c library=cuda variant=v12 compute=7.5 driver=12.8 name="NVIDIA GeForce GTX 1650" total="4.0 GiB" available="3.2 GiB"
time=2025-04-29T04:52:35.471+05:30 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-04-29T04:52:35.538+05:30 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-04-29T04:52:35.582+05:30 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-04-29T04:52:35.606+05:30 level=INFO source=server.go:105 msg="system memory" total="15.3 GiB" free="5.1 GiB" free_swap="8.9 GiB"
time=2025-04-29T04:52:35.608+05:30 level=INFO source=server.go:138 msg=offload library=cuda layers.requested=-1 layers.model=35 layers.offload=0 layers.split="" memory.available="[3.2 GiB]" memory.gpu_overhead="0 B" memory.required.full="2.6 GiB" memory.required.partial="0 B" memory.required.kv="214.0 MiB" memory.required.allocations="[0 B]" memory.weights.total="2.3 GiB" memory.weights.repeating="1.8 GiB" memory.weights.nonrepeating="525.0 MiB" memory.graph.full="517.0 MiB" memory.graph.partial="1.0 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB"
time=2025-04-29T04:52:35.708+05:30 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-04-29T04:52:35.714+05:30 level=WARN source=ggml.go:152 msg="key not found" key=tokenizer.ggml.add_eot_token default=false
time=2025-04-29T04:52:35.723+05:30 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07
time=2025-04-29T04:52:35.723+05:30 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.local.freq_base default=10000
time=2025-04-29T04:52:35.723+05:30 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06
time=2025-04-29T04:52:35.723+05:30 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.freq_scale default=1
time=2025-04-29T04:52:35.723+05:30 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.mm_tokens_per_image default=256
time=2025-04-29T04:52:35.729+05:30 level=INFO source=server.go:405 msg="starting llama server" cmd="C:\\Users\\orpheus\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --model C:\\Users\\orpheus\\.ollama\\models\\blobs\\sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 2048 --batch-size 512 --threads 6 --no-mmap --parallel 1 --port 64435"
time=2025-04-29T04:52:35.733+05:30 level=INFO source=sched.go:451 msg="loaded runners" count=1
time=2025-04-29T04:52:35.733+05:30 level=INFO source=server.go:580 msg="waiting for llama runner to start responding"
time=2025-04-29T04:52:35.733+05:30 level=INFO source=server.go:614 msg="waiting for server to become available" status="llm server error"
time=2025-04-29T04:52:35.752+05:30 level=INFO source=runner.go:866 msg="starting ollama engine"
time=2025-04-29T04:52:35.754+05:30 level=INFO source=runner.go:929 msg="Server listening on 127.0.0.1:64435"
time=2025-04-29T04:52:35.848+05:30 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-04-29T04:52:35.849+05:30 level=WARN source=ggml.go:152 msg="key not found" key=general.name default=""
time=2025-04-29T04:52:35.850+05:30 level=WARN source=ggml.go:152 msg="key not found" key=general.description default=""
time=2025-04-29T04:52:35.850+05:30 level=INFO source=ggml.go:72 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=36
load_backend: loaded CPU backend from C:\Users\orpheus\AppData\Local\Programs\Ollama\lib\ollama\ggml-cpu-haswell.dll
time=2025-04-29T04:52:35.886+05:30 level=INFO source=ggml.go:109 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(clang)
time=2025-04-29T04:52:35.890+05:30 level=INFO source=ggml.go:298 msg="model weights" buffer=CPU size="3.6 GiB"
time=2025-04-29T04:52:35.985+05:30 level=INFO source=server.go:614 msg="waiting for server to become available" status="llm server loading model"
time=2025-04-29T04:52:37.890+05:30 level=WARN source=ggml.go:152 msg="key not found" key=tokenizer.ggml.add_eot_token default=false
time=2025-04-29T04:52:37.898+05:30 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07
time=2025-04-29T04:52:37.898+05:30 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.local.freq_base default=10000
time=2025-04-29T04:52:37.898+05:30 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06
time=2025-04-29T04:52:37.898+05:30 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.freq_scale default=1
time=2025-04-29T04:52:37.898+05:30 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.mm_tokens_per_image default=256
time=2025-04-29T04:52:37.953+05:30 level=INFO source=ggml.go:556 msg="compute graph" backend=CPU buffer_type=CPU size="58.0 MiB"
time=2025-04-29T04:52:37.988+05:30 level=INFO source=server.go:619 msg="llama runner started in 2.26 seconds"
ggml-alloc.c:819: GGML_ASSERT(talloc->buffer_id >= 0) failed
[GIN] 2025/04/29 - 04:52:38 | 200 |    2.7205007s |       127.0.0.1 | POST     "/api/generate"
time=2025-04-29T04:52:38.352+05:30 level=ERROR source=server.go:449 msg="llama runner terminated" error="exit status 0xc0000409"
```

### OS

_No response_

### GPU

_No response_

### CPU

_No response_

### Ollama version

_No response_

GiteaMirror added the bug label 2026-04-29 02:49:52 -05:00

@SakirHussain commented on GitHub (Apr 28, 2025):

OS - Windows 11

CPU - AMD Ryzen 5 5600H with Radeon Graphics

GPU - NVIDIA GeForce GTX 1650

Ollama version - 0.6.6

I have updated all drivers as well

@rick-github commented on GitHub (Apr 28, 2025):

#10410

Reference: github-starred/ollama#53383