[GH-ISSUE #14088] qwen3 next not working in 0.15.5 when OLLAMA_NUM_PARALLEL=2 is configured #14087 #71259

Open
opened 2026-05-05 00:58:25 -05:00 by GiteaMirror · 10 comments

Originally created by @r4c0box on GitHub (Feb 5, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/14088

Originally assigned to: @jmorganca on GitHub.

What is the issue?

In my case, when I set Environment="OLLAMA_NUM_PARALLEL=num" in /etc/systemd/system/ollama.service and then run qwen3-coder-next:latest, it always reports:

> Error: 500 Internal Server Error: model runner has unexpectedly stopped, this may be due to resource limitations or an internal error, check ollama server logs for details

I tried it several times and it is still the same error, but if I do not set this environment variable, it works fine.

Relevant log output


OS

Linux

GPU

Nvidia

CPU

Intel

Ollama version

0.15.5-rc3

GiteaMirror added the bug label 2026-05-05 00:58:26 -05:00

@rick-github commented on GitHub (Feb 5, 2026):

[Server logs](https://github.com/ollama/ollama/blob/main/docs/troubleshooting.mdx) may aid in debugging.
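On a systemd install like the one in this report, the server logs can be collected with standard journalctl tooling:

```shell
# Dump the ollama service logs; add -f to follow new output live
journalctl -u ollama --no-pager
```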


@r4c0box commented on GitHub (Feb 5, 2026):

```shell
2月 05 14:28:04 test-B760M-Snow-Dream-wifi-W ollama[18718]: [GIN] 2026/02/05 - 14:28:04 | 200 | 5.017413597s | 192.168.5.1 | POST "/v1/messages?beta=true"
2月 05 14:28:29 test-B760M-Snow-Dream-wifi-W ollama[18718]: [GIN] 2026/02/05 - 14:28:29 | 404 | 273.601µs | 192.168.5.1 | POST "/v1/messages?beta=true"
2月 05 14:29:28 test-B760M-Snow-Dream-wifi-W ollama[18718]: [GIN] 2026/02/05 - 14:29:28 | 200 | 5.762331753s | 192.168.5.1 | POST "/api/chat"
2月 05 14:29:44 test-B760M-Snow-Dream-wifi-W ollama[18718]: [GIN] 2026/02/05 - 14:29:44 | 404 | 279.679µs | 192.168.5.1 | POST "/v1/messages?beta=true"
2月 05 14:29:44 test-B760M-Snow-Dream-wifi-W ollama[18718]: time=2026-02-05T14:29:44.967+08:00 level=WARN source=routes.go:2093 msg="model does not support thinking, relaxing thinking to nil" model=qwen3-coder-next:latest
2月 05 14:29:45 test-B760M-Snow-Dream-wifi-W ollama[18718]: panic: failed to build graph: model does not support operation
2月 05 14:29:45 test-B760M-Snow-Dream-wifi-W ollama[18718]: goroutine 11 [running]:
2月 05 14:29:45 test-B760M-Snow-Dream-wifi-W ollama[18718]: github.com/ollama/ollama/runner/ollamarunner.(*Server).run(0xc0002330e0, {0x58942f7cb020, 0xc000378280})
2月 05 14:29:45 test-B760M-Snow-Dream-wifi-W ollama[18718]: github.com/ollama/ollama/runner/ollamarunner/runner.go:454 +0x325
2月 05 14:29:45 test-B760M-Snow-Dream-wifi-W ollama[18718]: created by github.com/ollama/ollama/runner/ollamarunner.Execute in goroutine 1
2月 05 14:29:45 test-B760M-Snow-Dream-wifi-W ollama[18718]: github.com/ollama/ollama/runner/ollamarunner/runner.go:1422 +0x4c9
2月 05 14:29:45 test-B760M-Snow-Dream-wifi-W ollama[18718]: time=2026-02-05T14:29:45.578+08:00 level=ERROR source=server.go:1609 msg="post predict" error="Post "http://127.0.0.1:35767/completion": EOF"
2月 05 14:29:45 test-B760M-Snow-Dream-wifi-W ollama[18718]: [GIN] 2026/02/05 - 14:29:45 | 500 | 676.722835ms | 192.168.5.1 | POST "/v1/messages?beta=true"
2月 05 14:29:45 test-B760M-Snow-Dream-wifi-W ollama[18718]: [GIN] 2026/02/05 - 14:29:45 | 200 | 1.071997839s | 192.168.5.1 | POST "/api/chat"
2月 05 14:29:46 test-B760M-Snow-Dream-wifi-W ollama[18718]: time=2026-02-05T14:29:46.256+08:00 level=WARN source=routes.go:2093 msg="model does not support thinking, relaxing thinking to nil" model=qwen3-coder-next:latest
2月 05 14:29:46 test-B760M-Snow-Dream-wifi-W ollama[18718]: time=2026-02-05T14:29:46.321+08:00 level=INFO source=server.go:430 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 42109"
2月 05 14:29:46 test-B760M-Snow-Dream-wifi-W ollama[18718]: time=2026-02-05T14:29:46.978+08:00 level=INFO source=server.go:430 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 36867"
2月 05 14:29:47 test-B760M-Snow-Dream-wifi-W ollama[18718]: time=2026-02-05T14:29:47.399+08:00 level=INFO source=server.go:430 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 44839"
2月 05 14:29:47 test-B760M-Snow-Dream-wifi-W ollama[18718]: time=2026-02-05T14:29:47.931+08:00 level=INFO source=server.go:430 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 38613"
2月 05 14:29:48 test-B760M-Snow-Dream-wifi-W ollama[18718]: time=2026-02-05T14:29:48.557+08:00 level=INFO source=server.go:430 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 37609"
2月 05 14:29:49 test-B760M-Snow-Dream-wifi-W ollama[18718]: time=2026-02-05T14:29:49.223+08:00 level=INFO source=server.go:430 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 43333"
2月 05 14:29:49 test-B760M-Snow-Dream-wifi-W ollama[18718]: time=2026-02-05T14:29:49.887+08:00 level=INFO source=server.go:430 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 34481"
2月 05 14:29:50 test-B760M-Snow-Dream-wifi-W ollama[18718]: time=2026-02-05T14:29:50.593+08:00 level=INFO source=server.go:430 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 38669"
2月 05 14:29:51 test-B760M-Snow-Dream-wifi-W ollama[18718]: time=2026-02-05T14:29:51.316+08:00 level=INFO source=server.go:430 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 35331"
2月 05 14:29:51 test-B760M-Snow-Dream-wifi-W ollama[18718]: time=2026-02-05T14:29:51.728+08:00 level=INFO source=runner.go:464 msg="failure during GPU discovery" OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/cuda_v13]" extra_envs=map[] error="failed to finish discovery before timeout"
2月 05 14:29:51 test-B760M-Snow-Dream-wifi-W ollama[18718]: time=2026-02-05T14:29:51.728+08:00 level=WARN source=runner.go:356 msg="unable to refresh free memory, using old values"
2月 05 14:29:51 test-B760M-Snow-Dream-wifi-W ollama[18718]: time=2026-02-05T14:29:51.728+08:00 level=INFO source=server.go:430 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 40683"
2月 05 14:29:52 test-B760M-Snow-Dream-wifi-W ollama[18718]: time=2026-02-05T14:29:52.380+08:00 level=INFO source=server.go:246 msg="enabling flash attention"
2月 05 14:29:52 test-B760M-Snow-Dream-wifi-W ollama[18718]: time=2026-02-05T14:29:52.381+08:00 level=INFO source=server.go:430 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --model /usr/share/ollama/.ollama/models/blobs/sha256-30e51a7cb1cf1333b9e298b90b4c7790fe2572d8736b002482a0ac96328a2ffb --port 36421"
2月 05 14:29:52 test-B760M-Snow-Dream-wifi-W ollama[18718]: time=2026-02-05T14:29:52.381+08:00 level=INFO source=sched.go:463 msg="system memory" total="61.3 GiB" free="57.4 GiB" free_swap="8.0 GiB"
2月 05 14:29:52 test-B760M-Snow-Dream-wifi-W ollama[18718]: time=2026-02-05T14:29:52.381+08:00 level=INFO source=sched.go:470 msg="gpu memory" id=GPU-973dab5e-041e-d47f-5b51-27c124287be6 library=CUDA available="20.8 GiB" free="21.3 GiB" minimum="457.0 MiB" overhead="0 B"
2月 05 14:29:52 test-B760M-Snow-Dream-wifi-W ollama[18718]: time=2026-02-05T14:29:52.381+08:00 level=INFO source=sched.go:470 msg="gpu memory" id=GPU-3b1956ff-7570-052e-c344-9d8d2ea409d4 library=CUDA available="20.9 GiB" free="21.3 GiB" minimum="457.0 MiB" overhead="0 B"
2月 05 14:29:52 test-B760M-Snow-Dream-wifi-W ollama[18718]: time=2026-02-05T14:29:52.381+08:00 level=INFO source=sched.go:470 msg="gpu memory" id=GPU-0e6bd110-64ab-b976-743a-2ec6e22ec500 library=CUDA available="20.9 GiB" free="21.3 GiB" minimum="457.0 MiB" overhead="0 B"
2月 05 14:29:52 test-B760M-Snow-Dream-wifi-W ollama[18718]: time=2026-02-05T14:29:52.381+08:00 level=INFO source=server.go:756 msg="loading model" "model layers"=49 requested=-1
2月 05 14:29:52 test-B760M-Snow-Dream-wifi-W ollama[18718]: time=2026-02-05T14:29:52.388+08:00 level=INFO source=runner.go:1409 msg="starting ollama engine"
2月 05 14:29:52 test-B760M-Snow-Dream-wifi-W ollama[18718]: time=2026-02-05T14:29:52.388+08:00 level=INFO source=runner.go:1444 msg="Server listening on 127.0.0.1:36421"
2月 05 14:29:52 test-B760M-Snow-Dream-wifi-W ollama[18718]: time=2026-02-05T14:29:52.392+08:00 level=INFO source=runner.go:1282 msg=load request="{Operation:fit LoraPath:[] Parallel:2 BatchSize:512 FlashAttention:Enabled KvSize:65536 KvCacheType: NumThreads:4 GPULayers:49[ID:GPU-3b1956ff-7570-052e-c344-9d8d2ea409d4 Layers:49(0..48)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
2月 05 14:29:52 test-B760M-Snow-Dream-wifi-W ollama[18718]: time=2026-02-05T14:29:52.416+08:00 level=INFO source=ggml.go:136 msg="" architecture=qwen3next file_type=Q4_K_M name="" description="" num_tensors=843 num_key_values=38
2月 05 14:29:52 test-B760M-Snow-Dream-wifi-W ollama[18718]: load_backend: loaded CPU backend from /usr/lib/ollama/libggml-cpu-alderlake.so
2月 05 14:29:52 test-B760M-Snow-Dream-wifi-W ollama[18718]: ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
2月 05 14:29:52 test-B760M-Snow-Dream-wifi-W ollama[18718]: ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
2月 05 14:29:52 test-B760M-Snow-Dream-wifi-W ollama[18718]: ggml_cuda_init: found 3 CUDA devices:
2月 05 14:29:52 test-B760M-Snow-Dream-wifi-W ollama[18718]: Device 0: NVIDIA GeForce RTX 2080 Ti, compute capability 7.5, VMM: yes, ID: GPU-973dab5e-041e-d47f-5b51-27c124287be6
2月 05 14:29:52 test-B760M-Snow-Dream-wifi-W ollama[18718]: Device 1: NVIDIA GeForce RTX 2080 Ti, compute capability 7.5, VMM: yes, ID: GPU-3b1956ff-7570-052e-c344-9d8d2ea409d4
2月 05 14:29:52 test-B760M-Snow-Dream-wifi-W ollama[18718]: Device 2: NVIDIA GeForce RTX 2080 Ti, compute capability 7.5, VMM: yes, ID: GPU-0e6bd110-64ab-b976-743a-2ec6e22ec500
2月 05 14:29:52 test-B760M-Snow-Dream-wifi-W ollama[18718]: load_backend: loaded CUDA backend from /usr/lib/ollama/cuda_v13/libggml-cuda.so
2月 05 14:29:52 test-B760M-Snow-Dream-wifi-W ollama[18718]: time=2026-02-05T14:29:52.752+08:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX_VNNI=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=750,800,860,870,890,900,1000,1030,1100,1200,1210 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 CUDA.1.ARCHS=750,800,860,870,890,900,1000,1030,1100,1200,1210 CUDA.1.USE_GRAPHS=1 CUDA.1.PEER_MAX_BATCH_SIZE=128 CUDA.2.ARCHS=750,800,860,870,890,900,1000,1030,1100,1200,1210 CUDA.2.USE_GRAPHS=1 CUDA.2.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc)
2月 05 14:29:53 test-B760M-Snow-Dream-wifi-W ollama[18718]: time=2026-02-05T14:29:53.207+08:00 level=INFO source=runner.go:1282 msg=load request="{Operation:fit LoraPath:[] Parallel:2 BatchSize:512 FlashAttention:Enabled KvSize:65536 KvCacheType: NumThreads:4 GPULayers:49[ID:GPU-3b1956ff-7570-052e-c344-9d8d2ea409d4 Layers:16(0..15) ID:GPU-0e6bd110-64ab-b976-743a-2ec6e22ec500 Layers:16(16..31) ID:GPU-973dab5e-041e-d47f-5b51-27c124287be6 Layers:17(32..48)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
2月 05 14:29:53 test-B760M-Snow-Dream-wifi-W ollama[18718]: time=2026-02-05T14:29:53.647+08:00 level=INFO source=runner.go:1282 msg=load request="{Operation:alloc LoraPath:[] Parallel:2 BatchSize:512 FlashAttention:Enabled KvSize:65536 KvCacheType: NumThreads:4 GPULayers:49[ID:GPU-3b1956ff-7570-052e-c344-9d8d2ea409d4 Layers:16(0..15) ID:GPU-0e6bd110-64ab-b976-743a-2ec6e22ec500 Layers:16(16..31) ID:GPU-973dab5e-041e-d47f-5b51-27c124287be6 Layers:17(32..48)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
2月 05 14:29:53 test-B760M-Snow-Dream-wifi-W ollama[18718]: time=2026-02-05T14:29:53.982+08:00 level=INFO source=runner.go:1282 msg=load request="{Operation:commit LoraPath:[] Parallel:2 BatchSize:512 FlashAttention:Enabled KvSize:65536 KvCacheType: NumThreads:4 GPULayers:49[ID:GPU-3b1956ff-7570-052e-c344-9d8d2ea409d4 Layers:16(0..15) ID:GPU-0e6bd110-64ab-b976-743a-2ec6e22ec500 Layers:16(16..31) ID:GPU-973dab5e-041e-d47f-5b51-27c124287be6 Layers:17(32..48)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
2月 05 14:29:53 test-B760M-Snow-Dream-wifi-W ollama[18718]: time=2026-02-05T14:29:53.982+08:00 level=INFO source=ggml.go:482 msg="offloading 48 repeating layers to GPU"
2月 05 14:29:53 test-B760M-Snow-Dream-wifi-W ollama[18718]: time=2026-02-05T14:29:53.982+08:00 level=INFO source=ggml.go:489 msg="offloading output layer to GPU"
2月 05 14:29:53 test-B760M-Snow-Dream-wifi-W ollama[18718]: time=2026-02-05T14:29:53.982+08:00 level=INFO source=ggml.go:494 msg="offloaded 49/49 layers to GPU"
2月 05 14:29:53 test-B760M-Snow-Dream-wifi-W ollama[18718]: time=2026-02-05T14:29:53.982+08:00 level=INFO source=device.go:240 msg="model weights" device=CUDA0 size="16.2 GiB"
2月 05 14:29:53 test-B760M-Snow-Dream-wifi-W ollama[18718]: time=2026-02-05T14:29:53.982+08:00 level=INFO source=device.go:240 msg="model weights" device=CUDA1 size="15.9 GiB"
2月 05 14:29:53 test-B760M-Snow-Dream-wifi-W ollama[18718]: time=2026-02-05T14:29:53.982+08:00 level=INFO source=device.go:240 msg="model weights" device=CUDA2 size="15.9 GiB"
2月 05 14:29:53 test-B760M-Snow-Dream-wifi-W ollama[18718]: time=2026-02-05T14:29:53.982+08:00 level=INFO source=device.go:245 msg="model weights" device=CPU size="166.9 MiB"
2月 05 14:29:53 test-B760M-Snow-Dream-wifi-W ollama[18718]: time=2026-02-05T14:29:53.982+08:00 level=INFO source=device.go:251 msg="kv cache" device=CUDA0 size="2.1 GiB"
2月 05 14:29:53 test-B760M-Snow-Dream-wifi-W ollama[18718]: time=2026-02-05T14:29:53.982+08:00 level=INFO source=device.go:251 msg="kv cache" device=CUDA1 size="2.1 GiB"
2月 05 14:29:53 test-B760M-Snow-Dream-wifi-W ollama[18718]: time=2026-02-05T14:29:53.982+08:00 level=INFO source=device.go:251 msg="kv cache" device=CUDA2 size="2.1 GiB"
2月 05 14:29:53 test-B760M-Snow-Dream-wifi-W ollama[18718]: time=2026-02-05T14:29:53.982+08:00 level=INFO source=device.go:262 msg="compute graph" device=CUDA0 size="220.4 MiB"
2月 05 14:29:53 test-B760M-Snow-Dream-wifi-W ollama[18718]: time=2026-02-05T14:29:53.982+08:00 level=INFO source=device.go:262 msg="compute graph" device=CUDA1 size="318.2 MiB"
2月 05 14:29:53 test-B760M-Snow-Dream-wifi-W ollama[18718]: time=2026-02-05T14:29:53.982+08:00 level=INFO source=device.go:262 msg="compute graph" device=CUDA2 size="220.4 MiB"
2月 05 14:29:53 test-B760M-Snow-Dream-wifi-W ollama[18718]: time=2026-02-05T14:29:53.982+08:00 level=INFO source=device.go:267 msg="compute graph" device=CPU size="4.3 MiB"
2月 05 14:29:53 test-B760M-Snow-Dream-wifi-W ollama[18718]: time=2026-02-05T14:29:53.982+08:00 level=INFO source=device.go:272 msg="total memory" size="55.3 GiB"
2月 05 14:29:53 test-B760M-Snow-Dream-wifi-W ollama[18718]: time=2026-02-05T14:29:53.982+08:00 level=INFO source=sched.go:537 msg="loaded runners" count=1
2月 05 14:29:53 test-B760M-Snow-Dream-wifi-W ollama[18718]: time=2026-02-05T14:29:53.982+08:00 level=INFO source=server.go:1349 msg="waiting for llama runner to start responding"
2月 05 14:29:54 test-B760M-Snow-Dream-wifi-W ollama[18718]: time=2026-02-05T14:29:53.999+08:00 level=INFO source=server.go:1383 msg="waiting for server to become available" status="llm server loading model"
2月 05 14:30:06 test-B760M-Snow-Dream-wifi-W ollama[18718]: time=2026-02-05T14:30:06.788+08:00 level=INFO source=server.go:1387 msg="llama runner started in 14.41 seconds"
2月 05 14:30:35 test-B760M-Snow-Dream-wifi-W ollama[18718]: [GIN] 2026/02/05 - 14:30:35 | 200 | 49.099064684s | 192.168.5.1 | POST "/v1/messages?beta=true"
2月 05 14:30:35 test-B760M-Snow-Dream-wifi-W ollama[18718]: time=2026-02-05T14:30:35.449+08:00 level=WARN source=routes.go:2093 msg="model does not support thinking, relaxing thinking to nil" model=qwen3-coder-next:latest
2月 05 14:30:42 test-B760M-Snow-Dream-wifi-W ollama[18718]: [GIN] 2026/02/05 - 14:30:42 | 200 | 7.193161067s | 192.168.5.1 | POST "/v1/messages?beta=true"
```


@r4c0box commented on GitHub (Feb 5, 2026):

When I use two clients to ask questions at the same time, it causes the error shown in the log above. A minimal way to reproduce that scenario is sketched below.
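A sketch of the concurrent-request reproduction, assuming the default port 11434 and the model name from the report:

```shell
# Fire two /api/chat requests at the same time to exercise both parallel slots
curl -s http://localhost:11434/api/chat \
  -d '{"model":"qwen3-coder-next:latest","messages":[{"role":"user","content":"hello"}]}' &
curl -s http://localhost:11434/api/chat \
  -d '{"model":"qwen3-coder-next:latest","messages":[{"role":"user","content":"hello"}]}' &
wait
```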


@znowfox commented on GitHub (Feb 5, 2026):

So, 0.15.5-rc1 worked; it stopped working in rc2 and rc3.


@r4c0box commented on GitHub (Feb 5, 2026):

> So, 0.15.5-rc1 worked; it stopped working in rc2 and rc3.

OK, I will try rc1 and test again.


@r4c0box commented on GitHub (Feb 5, 2026):

But I cannot find the rc1 package in the releases...
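If the rc1 tag is still published (not confirmed here), the install script can pin a specific version:

```shell
# Pin a specific release via the install script; this fails if the tag was removed
curl -fsSL https://ollama.com/install.sh | OLLAMA_VERSION=0.15.5-rc1 sh
```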


@rick-github commented on GitHub (Feb 5, 2026):

The `-rc` releases have new default context logic that doesn't account for `OLLAMA_NUM_PARALLEL`. Until it's resolved, setting `OLLAMA_CONTEXT_LENGTH=4096` in the server environment will return to the previous behaviour.
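A sketch of this workaround as a systemd drop-in, keeping the parallel setting from the report (the value 2 is illustrative):

```shell
# Sketch: keep OLLAMA_NUM_PARALLEL but pin the context length so the new
# default-context logic is bypassed until the fix lands
sudo mkdir -p /etc/systemd/system/ollama.service.d
sudo tee /etc/systemd/system/ollama.service.d/override.conf <<'EOF'
[Service]
Environment="OLLAMA_NUM_PARALLEL=2"
Environment="OLLAMA_CONTEXT_LENGTH=4096"
EOF
sudo systemctl daemon-reload
sudo systemctl restart ollama
```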


@znowfox commented on GitHub (Feb 5, 2026):

So obviously it's llama.cpp.


@fighter3005 commented on GitHub (Feb 27, 2026):

Sorry to chime in here, but I am confused. I tried setting OLLAMA_NUM_PARALLEL and OLLAMA_CONTEXT_LENGTH=4096 in my docker compose environment variables, and also setting just OLLAMA_NUM_PARALLEL, but OLLAMA_NUM_PARALLEL is ignored regardless.
I tried the rocm and cuda containers on the latest version (0.17.4).
I tested qwen3-vl:4b-instruct-q8_0 and qwen3.5:27b-q4_K_M, as well as bge-m3:567m-fp16.
I don't get parallelism.
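For comparison, the compose environment described above maps to a plain docker run like this (a sketch; the parallel value 2 is illustrative, and the other flags follow Ollama's standard CUDA container invocation):

```shell
# Sketch: CUDA container with the environment variables from the comment above
docker run -d --gpus=all \
  -e OLLAMA_NUM_PARALLEL=2 \
  -e OLLAMA_CONTEXT_LENGTH=4096 \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama ollama/ollama
```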


@rick-github commented on GitHub (Feb 27, 2026):

Some model architectures [do not support](https://github.com/ollama/ollama/blob/dd5eb6337dab84d76d7edec2c102064504d75378/server/sched.go#L450) parallelism.
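One quick check is the model's reported architecture, since the linked sched.go logic keys off it (ollama show prints model metadata, including the architecture; which architectures are excluded is in the linked source):

```shell
# Print model metadata, including the architecture the scheduler checks
ollama show qwen3-vl:4b-instruct-q8_0
```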


Reference: github-starred/ollama#71259