[GH-ISSUE #14497] qwen3.5:35b throw 500 Internal Server Error on any second prompt #9403

Closed
opened 2026-04-12 22:19:37 -05:00 by GiteaMirror · 0 comments
Owner

Originally created by @BlackEric001 on GitHub (Feb 27, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/14497

What is the issue?

Windows 10
Ollama 0.17.1
Ryzen 5 2600, 32 GB RAM, RTX3060 12 GB

Model qwen3.5:35b
Any first prompt completes successfully.
Any second prompt crashes with "Error: 500 Internal Server Error: model runner has unexpectedly stopped, this may be due to resource limitations or an internal error, check ollama server logs for details"
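A minimal sketch of the reproduction, assuming a local Ollama server on the default port (11434) and the model name from this report; it simply sends the same `/api/chat` request twice in a row, which per the report succeeds the first time and returns 500 the second:

```python
# Hypothetical reproduction sketch: two consecutive non-streaming chat
# requests to a local Ollama server. The model name and default port are
# taken from this report; adjust as needed.
import json
import urllib.request


def chat_payload(prompt, model="qwen3.5:35b"):
    """Build the JSON body for Ollama's /api/chat endpoint."""
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }).encode()


def send_chat(prompt, url="http://127.0.0.1:11434/api/chat"):
    """Send one chat request and return the HTTP status code."""
    req = urllib.request.Request(
        url,
        data=chat_payload(prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


if __name__ == "__main__":
    # First prompt: reported to succeed.
    print("first:", send_chat("hello"))
    # Second prompt: reported to fail with 500 and crash the runner.
    print("second:", send_chat("hello"))
```

If the bug reproduces, the second call raises `urllib.error.HTTPError: 500`, matching the error message quoted above.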

Relevant log output

CUDA error: invalid argument
  current device: 0, in function ggml_cuda_cpy at C:\a\ollama\ollama\ml\backend\ggml\ggml\src\ggml-cuda\cpy.cu:438
  cudaMemcpyAsyncReserve(src1_ddc, src0_ddc, ggml_nbytes(src0), cudaMemcpyDeviceToDevice, main_stream)
C:\a\ollama\ollama\ml\backend\ggml\ggml\src\ggml-cuda\ggml-cuda.cu:94: CUDA error
time=2026-02-27T15:25:27.735+03:00 level=ERROR source=server.go:1610 msg="post predict" error="Post \"http://127.0.0.1:50951/completion\": read tcp 127.0.0.1:50961->127.0.0.1:50951: wsarecv: An existing connection was forcibly closed by the remote host."
[GIN] 2026/02/27 - 15:25:27 | 500 |    693.7146ms |       127.0.0.1 | POST     "/api/chat"
time=2026-02-27T15:25:27.735+03:00 level=DEBUG source=sched.go:433 msg="context for request finished" runner.name=registry.ollama.ai/library/qwen3.5:35b runner.inference="[{ID:GPU-37986fe8-f496-83f3-157b-42103078edb3 Library:CUDA}]" runner.size="25.2 GiB" runner.vram="10.6 GiB" runner.parallel=1 runner.pid=9008 runner.model=C:\Users\Eric\.ollama\models\blobs\sha256-d838916ba05b9d908e9c3fecf16273b942a99aae94d1725c3e9fdd772522cf1a runner.num_ctx=4096
time=2026-02-27T15:25:27.735+03:00 level=DEBUG source=sched.go:338 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/qwen3.5:35b runner.inference="[{ID:GPU-37986fe8-f496-83f3-157b-42103078edb3 Library:CUDA}]" runner.size="25.2 GiB" runner.vram="10.6 GiB" runner.parallel=1 runner.pid=9008 runner.model=C:\Users\Eric\.ollama\models\blobs\sha256-d838916ba05b9d908e9c3fecf16273b942a99aae94d1725c3e9fdd772522cf1a runner.num_ctx=4096 duration=5m0s
time=2026-02-27T15:25:27.735+03:00 level=DEBUG source=sched.go:356 msg="after processing request finished event" runner.name=registry.ollama.ai/library/qwen3.5:35b runner.inference="[{ID:GPU-37986fe8-f496-83f3-157b-42103078edb3 Library:CUDA}]" runner.size="25.2 GiB" runner.vram="10.6 GiB" runner.parallel=1 runner.pid=9008 runner.model=C:\Users\Eric\.ollama\models\blobs\sha256-d838916ba05b9d908e9c3fecf16273b942a99aae94d1725c3e9fdd772522cf1a runner.num_ctx=4096 refCount=0
time=2026-02-27T15:25:28.962+03:00 level=ERROR source=server.go:304 msg="llama runner terminated" error="exit status 1"

OS

No response

GPU

No response

CPU

No response

Ollama version

No response

GiteaMirror added the bug label 2026-04-12 22:19:37 -05:00

Reference: github-starred/ollama#9403