[GH-ISSUE #15410] 500 Internal Server Error - All freakin models after reboot #71913

Open
opened 2026-05-05 02:56:14 -05:00 by GiteaMirror · 4 comments

Originally created by @ch473221 on GitHub (Apr 8, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/15410

What is the issue?

Hi all! This has been an issue across all models I've pulled directly from Ollama. I pull them, run them, and they work fine until a restart/reboot. After that, models will run and provide a prompt, but the second I interact with the model, it throws "500 Internal Server Error: model runner has unexpectedly stopped, this may be due to resource limitations or an internal error, check ollama server logs for details." I've spent a day troubleshooting and am stuck. Any help would be greatly appreciated!
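For context, a minimal repro sketch of the failing request (the endpoint and JSON shape follow the standard Ollama /api/chat REST API; the model name is taken from this report, so treat the snippet as illustrative rather than canonical):

```powershell
# Repro sketch: the 500 shows up on the first chat request after a reboot.
$body = @{
    model    = "qwen3-vl:4b"
    messages = @(@{ role = "user"; content = "hello" })
    stream   = $false
} | ConvertTo-Json -Depth 5

Invoke-RestMethod -Uri "http://127.0.0.1:11434/api/chat" `
    -Method Post -ContentType "application/json" -Body $body
```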

4080
64 GB RAM
4 TB PCIe 5.0 NVMe SSD
Windows 11
Models tested: qwen3-vl:4b, qwen35:4b
File hashes are consistent (no corruption; a hash-check sketch follows this list)
CPU-only mode produces gibberish after reboot
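For anyone repeating the corruption check, here is a minimal PowerShell sketch. It assumes the default blob layout, where each file under OLLAMA_MODELS\blobs is named after its expected SHA-256; the path below is the redacted OLLAMA_MODELS location from the log, so adjust it for your install:

```powershell
# Hash-check sketch: each blob filename encodes its expected SHA-256, so comparing
# the name against the recomputed digest flags on-disk corruption.
# Assumption: $models points at the OLLAMA_MODELS directory from the server log.
$models = "E:\PATH\TO\MODELS"
Get-ChildItem "$models\blobs\sha256-*" | ForEach-Object {
    $expected = $_.Name -replace '^sha256-', ''
    $actual   = (Get-FileHash -Algorithm SHA256 -Path $_.FullName).Hash.ToLower()
    if ($actual -ne $expected) { "MISMATCH: $($_.Name)" } else { "OK: $($_.Name)" }
}
```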

Relevant log output

Ollama version: 0.20.3
Model path: [REDACTED]

time=... msg="loading model"
time=... msg="starting ollama engine"
time=... msg="Server listening on 127.0.0.1"

time=... msg="llama runner started in ~2 seconds"

panic: failed to sample token

goroutine ... [running]:
github.com/ollama/ollama/runner/ollamarunner.(*Server).computeBatch(...)
    runner.go:762

error="Post \"http://127.0.0.1:PORT/completion\": 
wsarecv: An existing connection was forcibly closed by the remote host."

[GIN] ... | 500 | POST "/api/chat"

OS

Windows

GPU

Nvidia

CPU

Intel

Ollama version

0.20.3

GiteaMirror added the bug label 2026-05-05 02:56:14 -05:00

@rick-github commented on GitHub (Apr 8, 2026):

[Server logs](https://docs.ollama.com/troubleshooting) will aid in debugging.

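As an aside for Windows readers: the troubleshooting page linked above documents where the logs land. As a rough sketch (assuming the default desktop/tray install, which typically writes under %LOCALAPPDATA%\Ollama), the tail of the current server log can be grabbed with:

```powershell
# Log-location sketch: path assumed from the Ollama troubleshooting docs for the
# default Windows install; adjust if Ollama was installed elsewhere.
Get-Content "$env:LOCALAPPDATA\Ollama\server.log" -Tail 200
```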

@ch473221 commented on GitHub (Apr 8, 2026):

> [Server logs](https://docs.ollama.com/troubleshooting) will aid in debugging.

Here is the latest log...

time=2026-04-07T17:19:19.939-07:00 level=INFO source=routes.go:1744 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:0 OLLAMA_DEBUG:INFO OLLAMA_DEBUG_LOG_REQUESTS:false OLLAMA_EDITOR: OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:E:\PATH\TO\MODELS OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NO_CLOUD:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false OLLAMA_VULKAN:false ROCR_VISIBLE_DEVICES:]"
time=2026-04-07T17:19:19.939-07:00 level=INFO source=routes.go:1746 msg="Ollama cloud disabled: false"
time=2026-04-07T17:19:19.945-07:00 level=INFO source=images.go:499 msg="total blobs: 7"
time=2026-04-07T17:19:19.946-07:00 level=INFO source=images.go:506 msg="total unused blobs removed: 0"
time=2026-04-07T17:19:19.947-07:00 level=INFO source=routes.go:1802 msg="Listening on 127.0.0.1:11434 (version 0.20.3)"
time=2026-04-07T17:19:19.949-07:00 level=INFO source=runner.go:67 msg="discovering available GPUs..."
time=2026-04-07T17:19:19.961-07:00 level=INFO source=server.go:432 msg="starting runner" cmd="C:\Users\tctvs\AppData\Local\Programs\Ollama\ollama.exe runner --ollama-engine --port 51530"
time=2026-04-07T17:19:20.157-07:00 level=INFO source=runner.go:106 msg="experimental Vulkan support disabled. To enable, set OLLAMA_VULKAN=1"
time=2026-04-07T17:19:20.158-07:00 level=INFO source=server.go:432 msg="starting runner" cmd="C:\Users\tctvs\AppData\Local\Programs\Ollama\ollama.exe runner --ollama-engine --port 51542"
time=2026-04-07T17:19:20.543-07:00 level=INFO source=server.go:432 msg="starting runner" cmd="C:\Users\tctvs\AppData\Local\Programs\Ollama\ollama.exe runner --ollama-engine --port 51553"
time=2026-04-07T17:19:20.891-07:00 level=INFO source=server.go:432 msg="starting runner" cmd="C:\Users\tctvs\AppData\Local\Programs\Ollama\ollama.exe runner --ollama-engine --port 51563"
time=2026-04-07T17:19:20.891-07:00 level=INFO source=server.go:432 msg="starting runner" cmd="C:\Users\tctvs\AppData\Local\Programs\Ollama\ollama.exe runner --ollama-engine --port 51564"
time=2026-04-07T17:19:21.122-07:00 level=INFO source=types.go:42 msg="inference compute" id=GPU-REDACTED filter_id="" library=CUDA compute=8.9 name=CUDA0 description="NVIDIA GeForce RTX 4080" libdirs=ollama,cuda_v13 driver=13.2 pci_id=REDACTED type=discrete total="16.0 GiB" available="14.5 GiB"
time=2026-04-07T17:19:21.122-07:00 level=INFO source=routes.go:1852 msg="vram-based default context" total_vram="16.0 GiB" default_num_ctx=4096
[GIN] 2026/04/07 - 17:19:21 | 200 | 511.5µs | 127.0.0.1 | GET "/api/version"
[GIN] 2026/04/07 - 17:19:21 | 200 | 1.0255ms | 127.0.0.1 | GET "/api/version"
[GIN] 2026/04/07 - 17:19:21 | 200 | 0s | 127.0.0.1 | GET "/api/version"
[GIN] 2026/04/07 - 17:19:21 | 200 | 1.0274ms | 127.0.0.1 | GET "/api/tags"
[GIN] 2026/04/07 - 17:19:21 | 200 | 115.3674ms | 127.0.0.1 | POST "/api/show"
[GIN] 2026/04/07 - 17:19:21 | 401 | 203.1088ms | 127.0.0.1 | POST "/api/me"
[GIN] 2026/04/07 - 17:19:21 | 401 | 202.0785ms | 127.0.0.1 | POST "/api/me"
[GIN] 2026/04/07 - 17:19:24 | 200 | 510µs | 127.0.0.1 | GET "/api/tags"
[GIN] 2026/04/07 - 17:19:24 | 200 | 113.2493ms | 127.0.0.1 | POST "/api/show"
[GIN] 2026/04/07 - 17:19:24 | 200 | 109.5377ms | 127.0.0.1 | POST "/api/show"
time=2026-04-07T17:19:24.343-07:00 level=INFO source=server.go:432 msg="starting runner" cmd="C:\Users\tctvs\AppData\Local\Programs\Ollama\ollama.exe runner --ollama-engine --port 54510"
time=2026-04-07T17:19:24.516-07:00 level=INFO source=cpu_windows.go:148 msg=packages count=1
time=2026-04-07T17:19:24.516-07:00 level=INFO source=cpu_windows.go:164 msg="efficiency cores detected" maxEfficiencyClass=1
time=2026-04-07T17:19:24.516-07:00 level=INFO source=cpu_windows.go:195 msg="" package=0 cores=24 efficiency=16 threads=32
time=2026-04-07T17:19:24.586-07:00 level=INFO source=server.go:432 msg="starting runner" cmd="C:\Users\tctvs\AppData\Local\Programs\Ollama\ollama.exe runner --ollama-engine --model E:\PATH\TO\MODELS\blobs\sha256-81fb60c7daa80fc1123380b98970b320ae233409f0f71a72ed7b9b0d62f40490 --port 54520"
time=2026-04-07T17:19:24.589-07:00 level=INFO source=sched.go:484 msg="system memory" total="63.7 GiB" free="54.3 GiB" free_swap="48.8 GiB"
time=2026-04-07T17:19:24.589-07:00 level=INFO source=sched.go:491 msg="gpu memory" id=GPU-REDACTED library=CUDA available="14.0 GiB" free="14.5 GiB" minimum="457.0 MiB" overhead="0 B"
time=2026-04-07T17:19:24.589-07:00 level=INFO source=server.go:759 msg="loading model" "model layers"=33 requested=-1
time=2026-04-07T17:19:24.676-07:00 level=INFO source=runner.go:1417 msg="starting ollama engine"
time=2026-04-07T17:19:24.677-07:00 level=INFO source=runner.go:1452 msg="Server listening on 127.0.0.1:54520"
time=2026-04-07T17:19:24.685-07:00 level=INFO source=runner.go:1290 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Disabled KvSize:4096 KvCacheType: NumThreads:8 GPULayers:33[ID:GPU-REDACTED Layers:33(0..32)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-04-07T17:19:24.713-07:00 level=INFO source=ggml.go:136 msg="" architecture=qwen35 file_type=Q4_K_M name="" description="" num_tensors=834 num_key_values=52
load_backend: loaded CPU backend from C:\Users\USER\AppData\Local\Programs\Ollama\lib\ollama\ggml-cpu-alderlake.dll
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 4080, compute capability 8.9, VMM: yes, ID: GPU-REDACTED
load_backend: loaded CUDA backend from C:\Users\USER\AppData\Local\Programs\Ollama\lib\ollama\cuda_v13\ggml-cuda.dll
time=2026-04-07T17:19:24.774-07:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX_VNNI=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=750,800,860,870,890,900,1000,1030,1100,1200,1210 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(clang)
time=2026-04-07T17:19:25.395-07:00 level=INFO source=runner.go:1290 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Disabled KvSize:4096 KvCacheType: NumThreads:8 GPULayers:33[ID:GPU-REDACTED Layers:33(0..32)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-04-07T17:19:25.851-07:00 level=INFO source=runner.go:1290 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Disabled KvSize:4096 KvCacheType: NumThreads:8 GPULayers:33[ID:GPU-REDACTED Layers:33(0..32)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-04-07T17:19:25.851-07:00 level=INFO source=ggml.go:482 msg="offloading 32 repeating layers to GPU"
time=2026-04-07T17:19:25.851-07:00 level=INFO source=ggml.go:489 msg="offloading output layer to GPU"
time=2026-04-07T17:19:25.851-07:00 level=INFO source=ggml.go:494 msg="offloaded 33/33 layers to GPU"
time=2026-04-07T17:19:25.851-07:00 level=INFO source=device.go:240 msg="model weights" device=CUDA0 size="3.1 GiB"
time=2026-04-07T17:19:25.851-07:00 level=INFO source=device.go:245 msg="model weights" device=CPU size="504.4 MiB"
time=2026-04-07T17:19:25.851-07:00 level=INFO source=device.go:251 msg="kv cache" device=CUDA0 size="1.4 GiB"
time=2026-04-07T17:19:25.851-07:00 level=INFO source=device.go:262 msg="compute graph" device=CUDA0 size="4.3 GiB"
time=2026-04-07T17:19:25.851-07:00 level=INFO source=device.go:267 msg="compute graph" device=CPU size="19.8 MiB"
time=2026-04-07T17:19:25.851-07:00 level=INFO source=device.go:272 msg="total memory" size="9.3 GiB"
time=2026-04-07T17:19:25.851-07:00 level=INFO source=sched.go:561 msg="loaded runners" count=1
time=2026-04-07T17:19:25.851-07:00 level=INFO source=server.go:1352 msg="waiting for llama runner to start responding"
time=2026-04-07T17:19:25.851-07:00 level=INFO source=server.go:1386 msg="waiting for server to become available" status="llm server loading model"
time=2026-04-07T17:19:26.852-07:00 level=INFO source=server.go:1390 msg="llama runner started in 2.26 seconds"
panic: failed to sample token

goroutine 1107 [running]:
github.com/ollama/ollama/runner/ollamarunner.(*Server).computeBatch(0xc000218b40, {0x0, {0x7ff7a33822e0, 0xc00061e140}, {0x7ff7a3392498, 0xc0093e5938}, {0xc0002a3780, 0xb, 0x10}, {{0x7ff7a3392498, ...}, ...}, ...})
github.com/ollama/ollama/runner/ollamarunner/runner.go:762 +0x1c25
created by github.com/ollama/ollama/runner/ollamarunner.(*Server).run in goroutine 6
github.com/ollama/ollama/runner/ollamarunner/runner.go:459 +0x2cd
time=2026-04-07T17:19:27.069-07:00 level=ERROR source=server.go:1612 msg="post predict" error="Post "http://127.0.0.1:54520/completion": read tcp 127.0.0.1:54530->127.0.0.1:54520: wsarecv: An existing connection was forcibly closed by the remote host."
[GIN] 2026/04/07 - 17:19:27 | 500 | 2.8181144s | 127.0.0.1 | POST "/api/chat"
[GIN] 2026/04/07 - 17:19:54 | 200 | 1.0193ms | 127.0.0.1 | GET "/api/tags"
[GIN] 2026/04/07 - 17:20:24 | 200 | 802.9µs | 127.0.0.1 | GET "/api/tags"
[GIN] 2026/04/07 - 17:20:54 | 200 | 799.3µs | 127.0.0.1 | GET "/api/tags"
[GIN] 2026/04/07 - 17:21:15 | 200 | 73.4226ms | 127.0.0.1 | POST "/api/show"
[GIN] 2026/04/07 - 17:21:24 | 200 | 1.2399ms | 127.0.0.1 | GET "/api/tags"
[GIN] 2026/04/07 - 17:21:27 | 200 | 66.2095ms | 127.0.0.1 | POST "/api/show"
[GIN] 2026/04/07 - 17:21:27 | 200 | 63.3633ms | 127.0.0.1 | POST "/api/show"
time=2026-04-07T17:21:27.468-07:00 level=INFO source=server.go:432 msg="starting runner" cmd="C:\Users\tctvs\AppData\Local\Programs\Ollama\ollama.exe runner --ollama-engine --port 60219"
time=2026-04-07T17:21:27.669-07:00 level=INFO source=cpu_windows.go:148 msg=packages count=1
time=2026-04-07T17:21:27.669-07:00 level=INFO source=cpu_windows.go:164 msg="efficiency cores detected" maxEfficiencyClass=1
time=2026-04-07T17:21:27.669-07:00 level=INFO source=cpu_windows.go:195 msg="" package=0 cores=24 efficiency=16 threads=32
time=2026-04-07T17:21:27.669-07:00 level=INFO source=sched.go:627 msg="updated VRAM based on existing loaded models" gpu=GPU-REDACTED library=CUDA total="16.0 GiB" available="7.2 GiB"
time=2026-04-07T17:21:27.712-07:00 level=INFO source=server.go:432 msg="starting runner" cmd="C:\Users\tctvs\AppData\Local\Programs\Ollama\ollama.exe runner --ollama-engine --model E:\PATH\TO\MODELS\blobs\sha256-9c60bdd691c1897bbfe5ddbc67336848e18c346b7ee2ab8541b135f208e5bb38 --port 60231"
time=2026-04-07T17:21:27.714-07:00 level=INFO source=sched.go:484 msg="system memory" total="63.7 GiB" free="52.2 GiB" free_swap="46.3 GiB"
time=2026-04-07T17:21:27.714-07:00 level=INFO source=sched.go:491 msg="gpu memory" id=GPU-REDACTED library=CUDA available="6.7 GiB" free="7.2 GiB" minimum="457.0 MiB" overhead="0 B"
time=2026-04-07T17:21:27.714-07:00 level=INFO source=server.go:759 msg="loading model" "model layers"=37 requested=-1
time=2026-04-07T17:21:27.809-07:00 level=INFO source=runner.go:1417 msg="starting ollama engine"
time=2026-04-07T17:21:27.810-07:00 level=INFO source=runner.go:1452 msg="Server listening on 127.0.0.1:60231"
time=2026-04-07T17:21:27.811-07:00 level=INFO source=runner.go:1290 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Disabled KvSize:4096 KvCacheType: NumThreads:8 GPULayers:37[ID:GPU-REDACTED Layers:37(0..36)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-04-07T17:21:27.828-07:00 level=INFO source=ggml.go:136 msg="" architecture=qwen3vl file_type=Q4_K_M name="" description="" num_tensors=809 num_key_values=40
load_backend: loaded CPU backend from C:\Users\USER\AppData\Local\Programs\Ollama\lib\ollama\ggml-cpu-alderlake.dll
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 4080, compute capability 8.9, VMM: yes, ID: GPU-REDACTED
load_backend: loaded CUDA backend from C:\Users\USER\AppData\Local\Programs\Ollama\lib\ollama\cuda_v13\ggml-cuda.dll
time=2026-04-07T17:21:27.877-07:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX_VNNI=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=750,800,860,870,890,900,1000,1030,1100,1200,1210 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(clang)
time=2026-04-07T17:21:28.246-07:00 level=INFO source=server.go:1031 msg="model requires more gpu memory than is currently available, evicting a model to make space" "loaded layers"=28
time=2026-04-07T17:21:28.246-07:00 level=INFO source=runner.go:1290 msg=load request="{Operation:close LoraPath:[] Parallel:0 BatchSize:0 FlashAttention:Disabled KvSize:0 KvCacheType: NumThreads:0 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-04-07T17:21:28.246-07:00 level=INFO source=device.go:240 msg="model weights" device=CUDA0 size="3.1 GiB"
time=2026-04-07T17:21:28.246-07:00 level=INFO source=device.go:245 msg="model weights" device=CPU size="304.3 MiB"
time=2026-04-07T17:21:28.246-07:00 level=INFO source=device.go:251 msg="kv cache" device=CUDA0 size="576.0 MiB"
time=2026-04-07T17:21:28.246-07:00 level=INFO source=device.go:262 msg="compute graph" device=CUDA0 size="4.7 GiB"
time=2026-04-07T17:21:28.246-07:00 level=INFO source=device.go:267 msg="compute graph" device=CPU size="39.6 MiB"
time=2026-04-07T17:21:28.246-07:00 level=INFO source=device.go:272 msg="total memory" size="8.6 GiB"
time=2026-04-07T17:21:28.257-07:00 level=INFO source=server.go:432 msg="starting runner" cmd="C:\Users\tctvs\AppData\Local\Programs\Ollama\ollama.exe runner --ollama-engine --port 60243"
time=2026-04-07T17:21:28.681-07:00 level=INFO source=server.go:432 msg="starting runner" cmd="C:\Users\tctvs\AppData\Local\Programs\Ollama\ollama.exe runner --ollama-engine --port 60254"
time=2026-04-07T17:21:28.934-07:00 level=INFO source=server.go:432 msg="starting runner" cmd="C:\Users\tctvs\AppData\Local\Programs\Ollama\ollama.exe runner --ollama-engine --port 60265"
time=2026-04-07T17:21:29.101-07:00 level=INFO source=server.go:432 msg="starting runner" cmd="C:\Users\tctvs\AppData\Local\Programs\Ollama\ollama.exe runner --ollama-engine --port 60276"
time=2026-04-07T17:21:29.272-07:00 level=INFO source=cpu_windows.go:148 msg=packages count=1
time=2026-04-07T17:21:29.272-07:00 level=INFO source=cpu_windows.go:164 msg="efficiency cores detected" maxEfficiencyClass=1
time=2026-04-07T17:21:29.272-07:00 level=INFO source=cpu_windows.go:195 msg="" package=0 cores=24 efficiency=16 threads=32
time=2026-04-07T17:21:29.288-07:00 level=INFO source=sched.go:484 msg="system memory" total="63.7 GiB" free="51.5 GiB" free_swap="41.3 GiB"
time=2026-04-07T17:21:29.288-07:00 level=INFO source=sched.go:491 msg="gpu memory" id=GPU-REDACTED library=CUDA available="13.4 GiB" free="13.8 GiB" minimum="457.0 MiB" overhead="0 B"
time=2026-04-07T17:21:29.288-07:00 level=INFO source=server.go:759 msg="loading model" "model layers"=37 requested=-1
time=2026-04-07T17:21:29.289-07:00 level=INFO source=runner.go:1290 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Disabled KvSize:4096 KvCacheType: NumThreads:8 GPULayers:37[ID:GPU-REDACTED Layers:37(0..36)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-04-07T17:21:29.500-07:00 level=INFO source=runner.go:1290 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Disabled KvSize:4096 KvCacheType: NumThreads:8 GPULayers:37[ID:GPU-REDACTED Layers:37(0..36)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-04-07T17:21:29.820-07:00 level=INFO source=runner.go:1290 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Disabled KvSize:4096 KvCacheType: NumThreads:8 GPULayers:37[ID:GPU-REDACTED Layers:37(0..36)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-04-07T17:21:29.820-07:00 level=INFO source=device.go:240 msg="model weights" device=CUDA0 size="3.1 GiB"
time=2026-04-07T17:21:29.820-07:00 level=INFO source=device.go:245 msg="model weights" device=CPU size="304.3 MiB"
time=2026-04-07T17:21:29.820-07:00 level=INFO source=ggml.go:482 msg="offloading 36 repeating layers to GPU"
time=2026-04-07T17:21:29.820-07:00 level=INFO source=ggml.go:489 msg="offloading output layer to GPU"
time=2026-04-07T17:21:29.820-07:00 level=INFO source=ggml.go:494 msg="offloaded 37/37 layers to GPU"
time=2026-04-07T17:21:29.820-07:00 level=INFO source=device.go:251 msg="kv cache" device=CUDA0 size="576.0 MiB"
time=2026-04-07T17:21:29.820-07:00 level=INFO source=device.go:262 msg="compute graph" device=CUDA0 size="4.7 GiB"
time=2026-04-07T17:21:29.820-07:00 level=INFO source=device.go:267 msg="compute graph" device=CPU size="39.6 MiB"
time=2026-04-07T17:21:29.820-07:00 level=INFO source=device.go:272 msg="total memory" size="8.6 GiB"
time=2026-04-07T17:21:29.820-07:00 level=INFO source=sched.go:561 msg="loaded runners" count=1
time=2026-04-07T17:21:29.820-07:00 level=INFO source=server.go:1352 msg="waiting for llama runner to start responding"
time=2026-04-07T17:21:29.820-07:00 level=INFO source=server.go:1386 msg="waiting for server to become available" status="llm server loading model"
time=2026-04-07T17:21:30.572-07:00 level=INFO source=server.go:1390 msg="llama runner started in 2.86 seconds"
panic: failed to sample token

goroutine 700 [running]:
github.com/ollama/ollama/runner/ollamarunner.(*Server).computeBatch(0xc00022f2c0, {0x0, {0x7ff7a33822e0, 0xc003f62100}, {0x7ff7a3392498, 0xc000ff0468}, {0xc00004ea00, 0x11, 0x20}, {{0x7ff7a3392498, ...}, ...}, ...})
github.com/ollama/ollama/runner/ollamarunner/runner.go:762 +0x1c25
created by github.com/ollama/ollama/runner/ollamarunner.(*Server).run in goroutine 15
github.com/ollama/ollama/runner/ollamarunner/runner.go:459 +0x2cd
time=2026-04-07T17:21:30.746-07:00 level=ERROR source=server.go:1612 msg="post predict" error="Post "http://127.0.0.1:60231/completion": read tcp 127.0.0.1:60241->127.0.0.1:60231: wsarecv: An existing connection was forcibly closed by the remote host."
[GIN] 2026/04/07 - 17:21:30 | 500 | 3.3444095s | 127.0.0.1 | POST "/api/chat"
[GIN] 2026/04/07 - 17:21:54 | 200 | 3.0137ms | 127.0.0.1 | GET "/api/tags"
[GIN] 2026/04/07 - 17:22:24 | 200 | 502.9µs | 127.0.0.1 | GET "/api/tags"
[GIN] 2026/04/07 - 17:22:54 | 200 | 2.8243ms | 127.0.0.1 | GET "/api/tags"
[GIN] 2026/04/07 - 17:23:24 | 200 | 502.4µs | 127.0.0.1 | GET "/api/tags"
[GIN] 2026/04/07 - 17:23:54 | 200 | 1.9837ms | 127.0.0.1 | GET "/api/tags"
[GIN] 2026/04/07 - 17:24:24 | 200 | 1.1573ms | 127.0.0.1 | GET "/api/tags"
[GIN] 2026/04/07 - 17:24:54 | 200 | 548.7µs | 127.0.0.1 | GET "/api/tags"
[GIN] 2026/04/07 - 17:25:24 | 200 | 3.3265ms | 127.0.0.1 | GET "/api/tags"
[GIN] 2026/04/07 - 17:25:54 | 200 | 1.1009ms | 127.0.0.1 | GET "/api/tags"
[GIN] 2026/04/07 - 17:26:24 | 200 | 1.5456ms | 127.0.0.1 | GET "/api/tags"
time=2026-04-07T17:26:30.767-07:00 level=INFO source=server.go:432 msg="starting runner" cmd="C:\Users\tctvs\AppData\Local\Programs\Ollama\ollama.exe runner --ollama-engine --port 55514"
time=2026-04-07T17:26:31.200-07:00 level=INFO source=server.go:432 msg="starting runner" cmd="C:\Users\tctvs\AppData\Local\Programs\Ollama\ollama.exe runner --ollama-engine --port 55526"
time=2026-04-07T17:26:31.448-07:00 level=INFO source=server.go:432 msg="starting runner" cmd="C:\Users\tctvs\AppData\Local\Programs\Ollama\ollama.exe runner --ollama-engine --port 55537"
time=2026-04-07T17:26:31.698-07:00 level=INFO source=server.go:432 msg="starting runner" cmd="C:\Users\tctvs\AppData\Local\Programs\Ollama\ollama.exe runner --ollama-engine --port 55548"
time=2026-04-07T17:26:31.948-07:00 level=INFO source=server.go:432 msg="starting runner" cmd="C:\Users\tctvs\AppData\Local\Programs\Ollama\ollama.exe runner --ollama-engine --port 55559"
time=2026-04-07T17:26:32.199-07:00 level=INFO source=server.go:432 msg="starting runner" cmd="C:\Users\tctvs\AppData\Local\Programs\Ollama\ollama.exe runner --ollama-engine --port 55570"
time=2026-04-07T17:26:32.449-07:00 level=INFO source=server.go:432 msg="starting runner" cmd="C:\Users\tctvs\AppData\Local\Programs\Ollama\ollama.exe runner --ollama-engine --port 55582"
time=2026-04-07T17:26:32.700-07:00 level=INFO source=server.go:432 msg="starting runner" cmd="C:\Users\tctvs\AppData\Local\Programs\Ollama\ollama.exe runner --ollama-engine --port 55593"
time=2026-04-07T17:26:32.948-07:00 level=INFO source=server.go:432 msg="starting runner" cmd="C:\Users\tctvs\AppData\Local\Programs\Ollama\ollama.exe runner --ollama-engine --port 55604"
time=2026-04-07T17:26:33.200-07:00 level=INFO source=server.go:432 msg="starting runner" cmd="C:\Users\tctvs\AppData\Local\Programs\Ollama\ollama.exe runner --ollama-engine --port 55615"
time=2026-04-07T17:26:33.450-07:00 level=INFO source=server.go:432 msg="starting runner" cmd="C:\Users\tctvs\AppData\Local\Programs\Ollama\ollama.exe runner --ollama-engine --port 55627"
time=2026-04-07T17:26:33.698-07:00 level=INFO source=server.go:432 msg="starting runner" cmd="C:\Users\tctvs\AppData\Local\Programs\Ollama\ollama.exe runner --ollama-engine --port 55638"
time=2026-04-07T17:26:33.948-07:00 level=INFO source=server.go:432 msg="starting runner" cmd="C:\Users\tctvs\AppData\Local\Programs\Ollama\ollama.exe runner --ollama-engine --port 55649"
time=2026-04-07T17:26:34.199-07:00 level=INFO source=server.go:432 msg="starting runner" cmd="C:\Users\tctvs\AppData\Local\Programs\Ollama\ollama.exe runner --ollama-engine --port 55660"
time=2026-04-07T17:26:34.449-07:00 level=INFO source=server.go:432 msg="starting runner" cmd="C:\Users\tctvs\AppData\Local\Programs\Ollama\ollama.exe runner --ollama-engine --port 55671"
time=2026-04-07T17:26:34.699-07:00 level=INFO source=server.go:432 msg="starting runner" cmd="C:\Users\tctvs\AppData\Local\Programs\Ollama\ollama.exe runner --ollama-engine --port 55682"
time=2026-04-07T17:26:34.951-07:00 level=INFO source=server.go:432 msg="starting runner" cmd="C:\Users\tctvs\AppData\Local\Programs\Ollama\ollama.exe runner --ollama-engine --port 55693"
time=2026-04-07T17:26:35.199-07:00 level=INFO source=server.go:432 msg="starting runner" cmd="C:\Users\tctvs\AppData\Local\Programs\Ollama\ollama.exe runner --ollama-engine --port 55704"
time=2026-04-07T17:26:35.450-07:00 level=INFO source=server.go:432 msg="starting runner" cmd="C:\Users\tctvs\AppData\Local\Programs\Ollama\ollama.exe runner --ollama-engine --port 55715"
time=2026-04-07T17:26:35.698-07:00 level=INFO source=server.go:432 msg="starting runner" cmd="C:\Users\tctvs\AppData\Local\Programs\Ollama\ollama.exe runner --ollama-engine --port 55726"
[GIN] 2026/04/07 - 17:26:54 | 200 | 4.5294ms | 127.0.0.1 | GET "/api/tags"
[GIN] 2026/04/07 - 17:27:24 | 200 | 2.6667ms | 127.0.0.1 | GET "/api/tags"
[GIN] 2026/04/07 - 17:27:54 | 200 | 1.0473ms | 127.0.0.1 | GET "/api/tags"
[GIN] 2026/04/07 - 17:28:24 | 200 | 1.0379ms | 127.0.0.1 | GET "/api/tags"
[GIN] 2026/04/07 - 17:28:54 | 200 | 502.7µs | 127.0.0.1 | GET "/api/tags"
[GIN] 2026/04/07 - 17:29:24 | 200 | 2.1349ms | 127.0.0.1 | GET "/api/tags"
[GIN] 2026/04/07 - 17:29:54 | 200 | 2.9422ms | 127.0.0.1 | GET "/api/tags"
[GIN] 2026/04/07 - 17:30:24 | 200 | 504.7µs | 127.0.0.1 | GET "/api/tags"
[GIN] 2026/04/07 - 17:30:54 | 200 | 1.0288ms | 127.0.0.1 | GET "/api/tags"
[GIN] 2026/04/07 - 17:31:24 | 200 | 2.827ms | 127.0.0.1 | GET "/api/tags"


@ch473221 commented on GitHub (Apr 9, 2026):

OK, found the issue; there's a workaround, but it's a bit annoying. This seems to be a combined drive-specific/Ollama write issue, since a model I pulled to a different drive survives a reboot. I've isolated it to a manifest write issue. I pulled a model, it ran fine, I rebooted, and got the 500 error. I rebooted again and this time pulled the same model that had given the 500 error. Since the model's blobs were already saved, the pull only rewrote the manifest, and the model worked in that session and has survived reboots since. So there appears to be a bug in the initial manifest write. For reference, the models are on a 4 TB Gen 5 NVMe drive. Hope this helps someone.
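A minimal sketch of that workaround as described above (the model name and OLLAMA_MODELS path are taken from this thread; the manifests/ layout is assumed from Ollama's default on-disk structure, so verify it matches your install):

```powershell
# Workaround sketch: re-pulling a model whose blobs already exist should only
# rewrite the manifest, which was enough here to stop the 500s after reboot.
ollama pull qwen3-vl:4b

# Manifests sit next to the blobs under OLLAMA_MODELS (path assumed from the log);
# checking timestamps confirms the manifest was rewritten by the pull.
Get-ChildItem -Recurse "E:\PATH\TO\MODELS\manifests" | Select-Object FullName, LastWriteTime
```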


@PureBlissAK commented on GitHub (Apr 18, 2026):

🤖 Automated Triage & Analysis Report

Issue: #15410
Analyzed: 2026-04-18T18:22:20.337179

Analysis

  • Type: unknown
  • Severity: medium
  • Components: unknown

Implementation Plan

  • Effort: medium
  • Steps:

This issue has been triaged and marked for implementation.


Reference: github-starred/ollama#71913