[GH-ISSUE #14085] Qwen3-coder-Next:CUDA error:an illegal memory access was encountered on 0.15.5-rc3(Windows) #71256

Closed
opened 2026-05-05 00:57:31 -05:00 by GiteaMirror · 0 comments
Owner

Originally created by @Attect on GitHub (Feb 5, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/14085

What is the issue?

A CUDA error occurs during generation.
CPU: AMD 7950X
MEM: 128 GB
GPU: RTX 3090
OS: Windows 11 25H2
Ollama: 0.15.5-rc3
Model: qwen3-coder-next

Relevant log output

[GIN] 2026/02/05 - 11:06:46 | 200 |   30.0556278s |       127.0.0.1 | POST     "/api/generate"
time=2026-02-05T11:06:46.478+08:00 level=DEBUG source=sched.go:557 msg="context for request finished"
time=2026-02-05T11:06:46.478+08:00 level=DEBUG source=sched.go:310 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/qwen3-coder-next:latest runner.inference="[{ID:GPU-9b5c78bb-c978-5e14-c6fa-a71074d1f71a Library:CUDA}]" runner.size="57.9 GiB" runner.vram="20.0 GiB" runner.parallel=1 runner.pid=96688 runner.model=C:\ProgramData\ollama\blobs\sha256-30e51a7cb1cf1333b9e298b90b4c7790fe2572d8736b002482a0ac96328a2ffb runner.num_ctx=256000 duration=5m0s
time=2026-02-05T11:06:46.479+08:00 level=DEBUG source=sched.go:328 msg="after processing request finished event" runner.name=registry.ollama.ai/library/qwen3-coder-next:latest runner.inference="[{ID:GPU-9b5c78bb-c978-5e14-c6fa-a71074d1f71a Library:CUDA}]" runner.size="57.9 GiB" runner.vram="20.0 GiB" runner.parallel=1 runner.pid=96688 runner.model=C:\ProgramData\ollama\blobs\sha256-30e51a7cb1cf1333b9e298b90b4c7790fe2572d8736b002482a0ac96328a2ffb runner.num_ctx=256000 refCount=0
time=2026-02-05T11:07:01.227+08:00 level=DEBUG source=sched.go:698 msg="evaluating already loaded" model=C:\ProgramData\ollama\blobs\sha256-30e51a7cb1cf1333b9e298b90b4c7790fe2572d8736b002482a0ac96328a2ffb
time=2026-02-05T11:07:01.241+08:00 level=DEBUG source=server.go:1535 msg="completion request" images=0 prompt=89 format=""
time=2026-02-05T11:07:01.255+08:00 level=DEBUG source=cache.go:151 msg="loading cache slot" id=0 cache=0 prompt=16 used=0 remaining=16
ggml_cuda_compute_forward: PAD failed
CUDA error: an illegal memory access was encountered
  current device: 0, in function ggml_cuda_compute_forward at C:\a\ollama\ollama\ml\backend\ggml\ggml\src\ggml-cuda\ggml-cuda.cu:2882
  err
C:\a\ollama\ollama\ml\backend\ggml\ggml\src\ggml-cuda\ggml-cuda.cu:94: CUDA error
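Because CUDA reports kernel failures asynchronously, the line blamed in the log (`ggml-cuda.cu:2882`) may not be where the illegal access actually happened. A minimal diagnostic sketch for narrowing this down, assuming a standard Ollama install on Windows PowerShell (whether this localizes the `PAD` failure is an assumption, not a confirmed diagnosis):

```shell
# PowerShell: restart the Ollama server with synchronous CUDA launches,
# so errors are reported at the kernel that caused them rather than at
# a later sync point, plus verbose server logging.
$env:CUDA_LAUNCH_BLOCKING = "1"   # force synchronous CUDA error reporting
$env:OLLAMA_DEBUG = "1"           # enable debug-level server logs
ollama serve
```

Then re-run the failing `/api/generate` request and attach the new log output; with blocking launches the reported source location should point closer to the faulting kernel.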

OS

Windows

GPU

Nvidia

CPU

AMD

Ollama version

0.15.5-rc3

GiteaMirror added the bug label 2026-05-05 00:57:31 -05:00

Reference: github-starred/ollama#71256