[GH-ISSUE #10474] Error: llama runner process has terminated: GGML_ASSERT(tensor->op == GGML_OP_UNARY) failed #68947

Closed
opened 2026-05-04 16:17:57 -05:00 by GiteaMirror · 7 comments
Owner

Originally created by @bjorn-ver on GitHub (Apr 29, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/10474

What is the issue?

ollama run phi4:latest
Error: llama runner process has terminated: GGML_ASSERT(tensor->op == GGML_OP_UNARY) failed

It worked 2 days ago. Tried different models; no luck.

  • Happened after Windows updates (don't know if this is the cause).
  • Tried updating CUDA from 12.4 to 12.8; no luck.

Updated windows packages:
2025-04 .NET 8.0.15 Security Update for x64 Client (KB5056686)
2024-11 .NET 6.0.36 Update for x64 Client (KB5047486)
2025-04 Cumulative Update Preview for .NET Framework 3.5 and 4.8.1 for Windows 11, version 24H2 for x64 (KB5056579)
2025-04 Cumulative Update Preview for Windows 11 Version 24H2 for x64-based Systems (KB5055627)

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2025 NVIDIA Corporation
Built on Fri_Feb_21_20:42:46_Pacific_Standard_Time_2025
Cuda compilation tools, release 12.8, V12.8.93
Build cuda_12.8.r12.8/compiler.35583870_0

nvidia-smi
Tue Apr 29 16:21:33 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 576.02 Driver Version: 576.02 CUDA Version: 12.9 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Driver-Model | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 4080 WDDM | 00000000:01:00.0 On | N/A |
| 0% 41C P8 28W / 320W | 2440MiB / 16376MiB | 6% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+

Relevant log output

llama_context: KV self size  = 1600.00 MiB, K (f16):  800.00 MiB, V (f16):  800.00 MiB
D:\a\desktop-inference-engine-llama.cpp\desktop-inference-engine-llama.cpp\native\vendor\llama.cpp\ggml\src\ggml.c:1729: GGML_ASSERT(tensor->op == GGML_OP_UNARY) failed
time=2025-04-29T16:19:50.605+02:00 level=ERROR source=server.go:449 msg="llama runner terminated" error="exit status 0xc0000409"
time=2025-04-29T16:19:50.632+02:00 level=ERROR source=sched.go:457 msg="error loading llama server" error="llama runner process has terminated: GGML_ASSERT(tensor->op == GGML_OP_UNARY) failed"
[GIN] 2025/04/29 - 16:19:50 | 500 |    4.2487933s |       127.0.0.1 | POST     "/api/generate"
time=2025-04-29T16:19:55.649+02:00 level=WARN source=sched.go:648 msg="gpu VRAM usage didn't recover within timeout" seconds=5.0166343 model=C:\Users\bjorn\.ollama\models\blobs\sha256-fd7b6731c33c57f61767612f56517460ec2d1e2e5a3f0163e0eb3d8d8cb5df20

OS

Windows

GPU

Nvidia

CPU

Intel

Ollama version

0.6.6

GiteaMirror added the bug label 2026-05-04 16:17:58 -05:00
Author
Owner

@rick-github commented on GitHub (Apr 29, 2025):

https://github.com/ollama/ollama/issues/10469#issuecomment-2838430227

Author
Owner

@mattjrutter commented on GitHub (Apr 29, 2025):

Others confirm seeing the issue after updating Docker to 4.41.0:
https://github.com/ollama/ollama/issues/9149
Reinstalling ollama does nothing. Some have mentioned that downgrading Docker to 4.40.0 gets it back up and running; I haven't confirmed this myself.

Renaming the DLLs may also fix it: https://github.com/ollama/ollama/issues/9509
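
Before renaming anything, it can help to confirm that a foreign copy really is shadowing Ollama's DLLs. Below is a minimal diagnostic sketch (not part of Ollama) that scans every directory on the PATH for ggml*/llama* DLLs; those name patterns are assumptions, so adjust them to whatever DLLs your install actually ships:

```go
// find_shadow_dlls.go — diagnostic sketch: list every ggml*/llama* DLL
// reachable via PATH, so a foreign copy (e.g. one shipped by another
// product) that Windows would load first becomes visible.
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

func main() {
	for _, dir := range filepath.SplitList(os.Getenv("PATH")) {
		entries, err := os.ReadDir(dir)
		if err != nil {
			continue // skip unreadable or stale PATH entries
		}
		for _, e := range entries {
			name := strings.ToLower(e.Name())
			if strings.HasSuffix(name, ".dll") &&
				(strings.HasPrefix(name, "ggml") || strings.HasPrefix(name, "llama")) {
				fmt.Println(filepath.Join(dir, e.Name()))
			}
		}
	}
}
```

If the same DLL name shows up in more than one directory, the directory listed earlier on the PATH wins.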

Author
Owner

@seangalie commented on GitHub (Apr 29, 2025):

Renaming the DLLs and then turning off "Enable Docker Model Runner" in Docker Desktop settings > Features in development seems to fix the issue without breaking anything else.

Author
Owner

@mattjrutter commented on GitHub (Apr 29, 2025):

> Renaming the DLLs and then turning off "Enable Docker Model Runner" in Docker Desktop settings > Features in development seems to fix the issue without breaking anything else.

Yep, this worked for me. A band-aid solution, but it certainly works. Hopefully it helps those working on Ollama determine next steps. Ollama + Docker is a fairly popular duo to install together. It seems that llama.cpp has been an issue with Ollama since at least March, and the advice so far was to not use llama.cpp; with Docker now bringing it in, that pushed up the priority.

Author
Owner

@oncedays commented on GitHub (Apr 30, 2025):

Uninstalling Docker fixed it.

Author
Owner

@kiview commented on GitHub (Apr 30, 2025):

Hey folks, Docker employee working on Docker Model Runner here.
I can confirm this issue is caused by an unfortunate interaction between the two components:

  • Docker Desktop puts its binary folder on the system PATH, and that's where our own llama.cpp binaries and DLLs reside
  • Ollama loads its DLLs from the PATH

I think the best way forward is to make changes on both sides:

  • We are going to make sure our llama.cpp binaries and DLLs don't end up on the PATH (we are working on this and are hoping for the patch to be released today)
  • Ollama should explicitly load its own dependencies instead of relying on what is available on the PATH (a sketch of what this could look like follows this comment)

We are very sorry for the inconvenience to all users and to our colleagues at Ollama 🙇
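
For illustration only, here is a minimal sketch of what "explicitly load your own dependencies" could look like on Windows, assuming the DLLs ship next to the executable. The helper loadOwnDLL and the DLL name ggml-base.dll are hypothetical, not Ollama's actual code; the point is that loading by absolute path with restrictive LOAD_LIBRARY_SEARCH_* flags keeps LoadLibraryEx from ever consulting the PATH:

```go
//go:build windows

// load_own_dll.go — a hypothetical sketch, not Ollama's actual code.
// Loading a DLL by absolute path with restrictive search flags means
// the Windows loader never falls back to the PATH, so another
// product's copy of the same DLL cannot be picked up by mistake.
package main

import (
	"fmt"
	"os"
	"path/filepath"

	"golang.org/x/sys/windows"
)

// Documented Win32 LoadLibraryExW flags, defined locally for clarity:
// search only the DLL's own directory and System32 — never the PATH
// or the current directory.
const (
	loadLibrarySearchDLLLoadDir = 0x00000100
	loadLibrarySearchSystem32   = 0x00000800
)

// loadOwnDLL loads the named DLL from the directory of the running
// executable, bypassing the default search order entirely.
func loadOwnDLL(name string) (windows.Handle, error) {
	exe, err := os.Executable()
	if err != nil {
		return 0, err
	}
	full := filepath.Join(filepath.Dir(exe), name)
	return windows.LoadLibraryEx(full, 0,
		loadLibrarySearchDLLLoadDir|loadLibrarySearchSystem32)
}

func main() {
	// "ggml-base.dll" is a placeholder name for illustration.
	h, err := loadOwnDLL("ggml-base.dll")
	if err != nil {
		fmt.Fprintln(os.Stderr, "load failed:", err)
		os.Exit(1)
	}
	defer windows.FreeLibrary(h)
	fmt.Println("loaded explicitly, without consulting PATH")
}
```

This mirrors the second bullet above: the binary declares where its dependencies live instead of trusting whatever the PATH happens to contain.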

Author
Owner

@rick-github commented on GitHub (Apr 30, 2025):

#10485


Reference: github-starred/ollama#68947