[GH-ISSUE #8770] Error: llama runner process has terminated: exit status 2 #5693

Open
opened 2026-04-12 16:59:11 -05:00 by GiteaMirror · 17 comments

Originally created by @rty0511 on GitHub (Feb 2, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/8770

What is the issue?

When I used Ollama to deploy DeepSeek locally with `ollama run deepseek-v1:8b`, an error message cropped up: "llama runner process has terminated: exit status 2." Since my computer is equipped with an AMD 5600 CPU and a 6750 XT GPU, I had replaced some files with the ROCm libraries built for gfx1031 (https://github.com/likelovewant/ROCmLibs-for-gfx1103-AMD780M-APU/releases).
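
For reference, the same failure can be reproduced through the REST API, which surfaces the server's full 500 error body instead of just the CLI message. A minimal sketch, assuming the default local address and the model name from the report above:

```python
import json
import urllib.error
import urllib.request

# Minimal sketch (not an official repro): call the local Ollama REST
# API directly so the full error body is visible. Adjust the address
# and model name for your setup.
body = {"model": "deepseek-v1:8b", "prompt": "hi", "stream": False}
req = urllib.request.Request(
    "http://127.0.0.1:11434/api/generate",
    data=json.dumps(body).encode(),
    headers={"Content-Type": "application/json"},
)
try:
    with urllib.request.urlopen(req) as resp:
        print(resp.read().decode())
except urllib.error.HTTPError as e:
    # A crashing runner surfaces as HTTP 500 with
    # "llama runner process has terminated" in the body.
    print(e.code, e.read().decode())
```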

OS

Windows

GPU

AMD

CPU

AMD

Ollama version

0.5.7

GiteaMirror added the bug label 2026-04-12 16:59:11 -05:00

@rty0511 commented on GitHub (Feb 2, 2025):

Ollama server log looks like this:

2025/02/02 09:23:09 routes.go:1187: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:C:\Users\LocalUser\.ollama\models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES:]"
time=2025-02-02T09:23:09.826+08:00 level=INFO source=images.go:432 msg="total blobs: 5"
time=2025-02-02T09:23:09.827+08:00 level=INFO source=images.go:439 msg="total unused blobs removed: 0"
time=2025-02-02T09:23:09.833+08:00 level=INFO source=routes.go:1238 msg="Listening on 127.0.0.1:11434 (version 0.5.7)"
time=2025-02-02T09:23:09.839+08:00 level=INFO source=routes.go:1267 msg="Dynamic LLM libraries" runners="[cuda_v12_avx rocm_avx cpu cpu_avx cpu_avx2 cuda_v11_avx]"
time=2025-02-02T09:23:09.841+08:00 level=INFO source=gpu.go:226 msg="looking for compatible GPUs"
time=2025-02-02T09:23:09.843+08:00 level=INFO source=gpu_windows.go:167 msg=packages count=1
time=2025-02-02T09:23:09.843+08:00 level=INFO source=gpu_windows.go:214 msg="" package=0 cores=6 efficiency=0 threads=12
time=2025-02-02T09:23:12.268+08:00 level=INFO source=types.go:131 msg="inference compute" id=0 library=rocm variant="" compute=gfx1031 driver=6.1 name="AMD Radeon RX 6750 XT" total="12.0 GiB" available="11.8 GiB"
[GIN] 2025/02/02 - 09:24:08 | 200 | 0s | 127.0.0.1 | HEAD "/"
[GIN] 2025/02/02 - 09:24:08 | 200 | 60.035ms | 127.0.0.1 | POST "/api/show"
time=2025-02-02T09:24:09.529+08:00 level=INFO source=sched.go:185 msg="one or more GPUs detected that are unable to accurately report free memory - disabling default concurrency"
time=2025-02-02T09:24:09.561+08:00 level=INFO source=sched.go:714 msg="new model will fit in available VRAM in single GPU, loading" model=C:\Users\LocalUser\.ollama\models\blobs\sha256-6340dc3229b0d08ea9cc49b75d4098702983e17b4c096d57afbbf2ffc813f2be gpu=0 parallel=4 available=12721225728 required="6.5 GiB"
time=2025-02-02T09:24:10.000+08:00 level=INFO source=server.go:104 msg="system memory" total="15.9 GiB" free="11.2 GiB" free_swap="17.8 GiB"
time=2025-02-02T09:24:10.001+08:00 level=INFO source=memory.go:356 msg="offload to rocm" layers.requested=-1 layers.model=33 layers.offload=33 layers.split="" memory.available="[11.8 GiB]" memory.gpu_overhead="0 B" memory.required.full="6.5 GiB" memory.required.partial="6.5 GiB" memory.required.kv="1.0 GiB" memory.required.allocations="[6.5 GiB]" memory.weights.total="4.9 GiB" memory.weights.repeating="4.5 GiB" memory.weights.nonrepeating="411.0 MiB" memory.graph.full="560.0 MiB" memory.graph.partial="677.5 MiB"
time=2025-02-02T09:24:10.012+08:00 level=INFO source=server.go:376 msg="starting llama server" cmd="C:\Users\LocalUser\AppData\Local\Programs\Ollama\lib\ollama\runners\rocm_avx\ollama_llama_server.exe runner --model C:\Users\LocalUser\.ollama\models\blobs\sha256-6340dc3229b0d08ea9cc49b75d4098702983e17b4c096d57afbbf2ffc813f2be --ctx-size 8192 --batch-size 512 --n-gpu-layers 33 --threads 6 --parallel 4 --port 1400"
time=2025-02-02T09:24:10.017+08:00 level=INFO source=sched.go:449 msg="loaded runners" count=1
time=2025-02-02T09:24:10.017+08:00 level=INFO source=server.go:555 msg="waiting for llama runner to start responding"
time=2025-02-02T09:24:10.017+08:00 level=INFO source=server.go:589 msg="waiting for server to become available" status="llm server error"
time=2025-02-02T09:24:10.236+08:00 level=INFO source=runner.go:936 msg="starting go runner"
Exception 0xc0000005 0x0 0x0 0x7ff81e9b3278
PC=0x7ff81e9b3278
signal arrived during external code execution

runtime.cgocall(0x7ff6fcf114c0, 0xc000187b10)
runtime/cgocall.go:167 +0x3e fp=0xc000187ae8 sp=0xc000187a80 pc=0x7ff6fccc619e
github.com/ollama/ollama/llama._Cfunc_llama_print_system_info()
_cgo_gotypes.go:839 +0x52 fp=0xc000187b10 sp=0xc000187ae8 pc=0x7ff6fcd73972
github.com/ollama/ollama/llama.PrintSystemInfo()
github.com/ollama/ollama/llama/llama.go:115 +0x79 fp=0xc000187b58 sp=0xc000187b10 pc=0x7ff6fcd74af9
github.com/ollama/ollama/llama/runner.Execute({0xc0000ac010?, 0x1ffffffff?, 0x0?})
github.com/ollama/ollama/llama/runner/runner.go:937 +0x6de fp=0xc000187ef8 sp=0xc000187b58 pc=0x7ff6fcf0f09e
main.main()
github.com/ollama/ollama/cmd/runner/main.go:11 +0x54 fp=0xc000187f50 sp=0xc000187ef8 pc=0x7ff6fcf10af4
runtime.main()
runtime/proc.go:272 +0x27d fp=0xc000187fe0 sp=0xc000187f50 pc=0x7ff6fcc9a61d
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc000187fe8 sp=0xc000187fe0 pc=0x7ff6fccd3ec1

goroutine 18 gp=0xc0000861c0 m=nil [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:424 +0xce fp=0xc00004ffa8 sp=0xc00004ff88 pc=0x7ff6fcccc04e
runtime.goparkunlock(...)
runtime/proc.go:430
runtime.forcegchelper()
runtime/proc.go:337 +0xb8 fp=0xc00004ffe0 sp=0xc00004ffa8 pc=0x7ff6fcc9a938
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc00004ffe8 sp=0xc00004ffe0 pc=0x7ff6fccd3ec1
created by runtime.init.7 in goroutine 1
runtime/proc.go:325 +0x1a

goroutine 19 gp=0xc000086380 m=nil [GC sweep wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:424 +0xce fp=0xc000051f80 sp=0xc000051f60 pc=0x7ff6fcccc04e
runtime.goparkunlock(...)
runtime/proc.go:430
runtime.bgsweep(0xc00008a000)
runtime/mgcsweep.go:277 +0x94 fp=0xc000051fc8 sp=0xc000051f80 pc=0x7ff6fcc837b4
runtime.gcenable.gowrap1()
runtime/mgc.go:204 +0x25 fp=0xc000051fe0 sp=0xc000051fc8 pc=0x7ff6fcc78045
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc000051fe8 sp=0xc000051fe0 pc=0x7ff6fccd3ec1
created by runtime.gcenable in goroutine 1
runtime/mgc.go:204 +0x66

goroutine 20 gp=0xc000086540 m=nil [GC scavenge wait]:
runtime.gopark(0xc00008a000?, 0x7ff6fd2cbe70?, 0x1?, 0x0?, 0xc000086540?)
runtime/proc.go:424 +0xce fp=0xc000093f78 sp=0xc000093f58 pc=0x7ff6fcccc04e
runtime.goparkunlock(...)
runtime/proc.go:430
runtime.(*scavengerState).park(0x7ff6fd55a7a0)
runtime/mgcscavenge.go:425 +0x49 fp=0xc000093fa8 sp=0xc000093f78 pc=0x7ff6fcc811e9
runtime.bgscavenge(0xc00008a000)
runtime/mgcscavenge.go:653 +0x3c fp=0xc000093fc8 sp=0xc000093fa8 pc=0x7ff6fcc8175c
runtime.gcenable.gowrap2()
runtime/mgc.go:205 +0x25 fp=0xc000093fe0 sp=0xc000093fc8 pc=0x7ff6fcc77fe5
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc000093fe8 sp=0xc000093fe0 pc=0x7ff6fccd3ec1
created by runtime.gcenable in goroutine 1
runtime/mgc.go:205 +0xa5

goroutine 21 gp=0xc000086700 m=nil [finalizer wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:424 +0xce fp=0xc000095e20 sp=0xc000095e00 pc=0x7ff6fcccc04e
runtime.runfinq()
runtime/mfinal.go:193 +0x107 fp=0xc000095fe0 sp=0xc000095e20 pc=0x7ff6fcc77107
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc000095fe8 sp=0xc000095fe0 pc=0x7ff6fccd3ec1
created by runtime.createfing in goroutine 1
runtime/mfinal.go:163 +0x3d

goroutine 2 gp=0xc00004cc40 m=nil [chan receive]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:424 +0xce fp=0xc00008ff18 sp=0xc00008fef8 pc=0x7ff6fcccc04e
runtime.chanrecv(0xc00001a0e0, 0x0, 0x1)
runtime/chan.go:639 +0x41e fp=0xc00008ff90 sp=0xc00008ff18 pc=0x7ff6fcc67cfe
runtime.chanrecv1(0x0?, 0x0?)
runtime/chan.go:489 +0x12 fp=0xc00008ffb8 sp=0xc00008ff90 pc=0x7ff6fcc678d2
runtime.unique_runtime_registerUniqueMapCleanup.func1(...)
runtime/mgc.go:1781
runtime.unique_runtime_registerUniqueMapCleanup.gowrap1()
runtime/mgc.go:1784 +0x2f fp=0xc00008ffe0 sp=0xc00008ffb8 pc=0x7ff6fcc7af2f
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc00008ffe8 sp=0xc00008ffe0 pc=0x7ff6fccd3ec1
created by unique.runtime_registerUniqueMapCleanup in goroutine 1
runtime/mgc.go:1779 +0x96
rax 0x0
rbx 0x231c57635e0
rcx 0x231c57635f0
rdx 0x0
rdi 0x0
rsi 0x231c57635e8
rbp 0x64b4afebe0
rsp 0x64b4afeb20
r8 0x0
r9 0x1
r10 0xb1af6371d13b
r11 0x64b4afeb50
r12 0xc000
r13 0xffffffffffffffff
r14 0x0
r15 0x231c5763630
rip 0x7ff81e9b3278
rflags 0x10286
cs 0x33
fs 0x53
gs 0x2b
time=2025-02-02T09:24:10.519+08:00 level=ERROR source=sched.go:455 msg="error loading llama server" error="llama runner process has terminated: exit status 2"
[GIN] 2025/02/02 - 09:24:10 | 500 | 1.6004339s | 127.0.0.1 | POST "/api/generate"
[GIN] 2025/02/02 - 09:27:57 | 200 | 0s | 127.0.0.1 | GET "/api/version"
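
The crash happens inside `llama_print_system_info`, i.e. the moment the replaced ROCm libraries are first touched. One hedged way to confirm they are the culprit is to force one of the CPU runners listed in the log via `OLLAMA_LLM_LIBRARY` (quit the running Ollama app/service first); a sketch:

```python
import os
import subprocess

# Hedged sketch: force a CPU-only runner to test whether the crash is
# specific to the swapped-in ROCm gfx1031 libraries. "cpu_avx2" is one
# of the dynamic runners listed in the log above; if the model loads
# this way, the replaced rocm files are the likely cause.
env = dict(os.environ, OLLAMA_LLM_LIBRARY="cpu_avx2")
subprocess.run(["ollama", "serve"], env=env)
```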


@khaneliman commented on GitHub (Feb 3, 2025):

I ran into this issue when I provided the wrong versions for the graphics overrides.

(7900XTX) 1100 worked but 1102 threw this error.

```nix
HCC_AMDGPU_TARGET = "gfx1100";
HSA_OVERRIDE_GFX_VERSION = "11.0.0";
```
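
For readers not on Nix: these are plain environment variables, so here is a hedged sketch of launching the server with them set from Python. The values shown are for a 7900 XTX and are GPU-specific (gfx103x cards such as the 6750 XT are commonly overridden to 10.3.0 instead), and whether `HSA_OVERRIDE_GFX_VERSION` is honored by the Windows builds is an assumption here:

```python
import os
import subprocess

# Hedged sketch: the override must match the card's actual architecture
# family; a wrong value crashes the runner exactly as described above.
env = dict(
    os.environ,
    HCC_AMDGPU_TARGET="gfx1100",        # RDNA3 7900 XTX (GPU-specific)
    HSA_OVERRIDE_GFX_VERSION="11.0.0",  # e.g. "10.3.0" for gfx103x cards
)
subprocess.run(["ollama", "serve"], env=env)
```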

@MobileMoney-Xavier commented on GitHub (Feb 7, 2025):

> I ran into this issue when I provided the wrong versions for the graphics overrides.
>
> (7900XTX) 1100 worked but 1102 threw this error.
>
> HCC_AMDGPU_TARGET = "gfx1100";
> HSA_OVERRIDE_GFX_VERSION = "11.0.0";

I face the same issue on my 7900 XTX. Does that mean I should modify these two parameters? Will it affect my GPU when running other software?


@sunshinewithmoonlight commented on GitHub (Feb 7, 2025):

Same problem on Mac mini (2024) with 24GB RAM. Ollama version is 0.5.7. AnythingLLM version is 1.7.3-r2.

(screenshot: https://github.com/user-attachments/assets/7b5f8ec4-964a-4521-b862-470bd0bba6a0)

@creasyWinds commented on GitHub (Feb 7, 2025):

Did you resolve it?


@NicalDai commented on GitHub (Feb 12, 2025):

> Same problem on Mac mini (2024) with 24GB RAM. Ollama version is 0.5.7. AnythingLLM version is 1.7.3-r2.
>
> (screenshot quoted above)

This may be due to low memory; you could try a 14B model.


@shiji1236165 commented on GitHub (Feb 18, 2025):

All 64 GB of RAM and 16 GB of VRAM had been allocated when this message appeared, so use a model with fewer parameters.
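
As a rough rule of thumb (an approximation, not Ollama's exact accounting), weight memory is roughly parameters × bits-per-weight / 8, before KV cache and compute buffers are added:

```python
# Back-of-envelope weight memory; Ollama's own accounting (see the
# "offload to rocm" log line earlier in this thread) adds KV cache and
# graph buffers on top of this.
def weights_gib(params_billion: float, bits_per_weight: float) -> float:
    return params_billion * 1e9 * bits_per_weight / 8 / 2**30

for params in (8, 14, 32, 70):
    print(f"{params}B @ ~4.5 bpw (Q4_K_M): {weights_gib(params, 4.5):.1f} GiB")
```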


@Panican-Whyasker commented on GitHub (Feb 20, 2025):

Ollama 0.5.11 here on a NUMA machine with 768 GB of RAM and no GPU, running Windows Server 2016 Datacenter.

Deepseek-r1:671b (size: 404 GB) used to run for a while on older Ollama versions, but eventually crashed because Ollama limits the default num_ctx to just 2048 tokens. Deepseek also generates a lot of reasoning tokens (the text between < think > ... < /think >), so it ran out of context window very quickly, with the error "an error was encountered while running the model: read tcp 127.0.0.1:60865->127.0.0.1:58956: wsarecv: An existing connection was forcibly closed by the remote host."

Now, after setting the parameter 'num_ctx' to '16384' (and 'num_thread' to '36' due to the NUMA architecture), I immediately get an error after my prompt:

Error: llama runner process has terminated: exit status 2
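
For bisecting which option triggers the crash: num_ctx and num_thread can also be passed per request through the REST API (the same options a Modelfile accepts), which avoids editing the model between attempts. A hedged sketch, assuming the default server address:

```python
import json
import urllib.error
import urllib.request

# Hedged sketch: halve num_ctx between runs until the runner survives;
# the KV cache for 16384 tokens on a 671B model is a huge allocation.
body = {
    "model": "deepseek-r1:671b",
    "prompt": "hi",
    "stream": False,
    "options": {"num_ctx": 16384, "num_thread": 36},
}
req = urllib.request.Request(
    "http://127.0.0.1:11434/api/generate",
    data=json.dumps(body).encode(),
    headers={"Content-Type": "application/json"},
)
try:
    with urllib.request.urlopen(req) as resp:
        print(resp.read().decode())
except urllib.error.HTTPError as e:
    print(e.code, e.read().decode())
```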


@dataf3l commented on GitHub (Mar 4, 2025):

I fixed mine by reinstalling everything; it just kind of worked when I did that with the latest release. At the time of writing, the minor version ends in 13 or 12. I hope this helps.


@dataf3l commented on GitHub (Mar 4, 2025):

https://github.com/ollama/ollama/releases

@dataf3l commented on GitHub (Mar 4, 2025):

> https://github.com/ollama/ollama/releases

https://github.com/ollama/ollama/releases/tag/v0.5.13

@Vigneshgithub01 commented on GitHub (Jun 28, 2025):

I got the same error. I just installed the latest Microsoft Visual C++ Redistributable, and it is working for me.


@iwaqas commented on GitHub (Jun 30, 2025):

> I got the same error. I just installed the latest Microsoft Visual C++ Redistributable, and it is working for me.

Worked like a charm, thanks!
FYI: the system needs a RESTART after installing the latest Microsoft Visual C++ Redistributable.
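
A quick heuristic (not an official check) for whether the MSVC runtime DLLs the runner depends on are resolvable on the Windows DLL search path:

```python
import ctypes.util

# Hedged heuristic: if these runtime DLLs are missing, native runner
# loads can fail with cryptic exit codes on Windows. find_library
# searches the standard DLL search path.
for dll in ("vcruntime140", "vcruntime140_1", "msvcp140"):
    print(dll, "->", ctypes.util.find_library(dll) or "NOT FOUND")
```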


@jasonm23 commented on GitHub (Oct 28, 2025):

On Windows it's usually due to missing or incorrect Visual C++ Redistributable versions. The Ollama installer makes sure the required versions are installed, and installs them if needed.

Uninstalling and reinstalling Ollama should fix it in 99% of cases.

(No need to remove the models!)


@luislobo9b commented on GitHub (Nov 17, 2025):

I had the following error:
Error: 500 Internal Server Error: llama runner process has terminated: exit status 2

I wasn’t able to solve it on Windows using the Ollama application directly. The only way I managed to fix it was by using WSL (Windows Subsystem for Linux) and installing Ollama inside it. After doing that, I was able to run several models that had the same issue before.


@FlamxGames commented on GitHub (Dec 5, 2025):

Neither the Microsoft Visual C++ Redistributable update (and restart) nor uninstalling and reinstalling worked for me.

Llama 3.2 models fail, but gemma3:1b works fine, so this seems to be specific to some models. I only tested llama.

This is an old Windows 10 computer with an NVIDIA GeForce GTX 850M (2 GB). I used to be able to run llama models just fine; it seems some update broke it.

I uninstalled everything, even the models, downloaded llama3.2:1b (I used to use the 3b version) and got this failure. I tested gemma afterwards and it works fine.
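
Given only 2 GB of VRAM, and since the log below ends in "graph_reserve: failed to allocate compute buffers", one hedged test (a sketch, not a confirmed fix) is forcing zero GPU layers per request via the num_gpu option:

```python
import json
import urllib.error
import urllib.request

# Hedged sketch: num_gpu=0 keeps the whole model on the CPU, which
# separates "CUDA compute buffers don't fit in 2 GB" from a broken
# install. Assumes the default server address.
body = {"model": "llama3.2:1b", "prompt": "hi", "stream": False,
        "options": {"num_gpu": 0}}
req = urllib.request.Request(
    "http://127.0.0.1:11434/api/generate",
    data=json.dumps(body).encode(),
    headers={"Content-Type": "application/json"},
)
try:
    with urllib.request.urlopen(req) as resp:
        print(resp.read().decode())
except urllib.error.HTTPError as e:
    print(e.code, e.read().decode())
```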

These are the full logs of the failure:

[GIN] 2025/12/05 - 08:41:54 | 200 |      2.6176ms |       127.0.0.1 | GET      "/api/tags"
[GIN] 2025/12/05 - 08:41:54 | 200 |            0s |       127.0.0.1 | GET      "/api/version"
[GIN] 2025/12/05 - 08:41:54 | 404 |      1.0579ms |       127.0.0.1 | POST     "/api/show"
[GIN] 2025/12/05 - 08:41:57 | 200 |      1.0679ms |       127.0.0.1 | GET      "/api/tags"
[GIN] 2025/12/05 - 08:41:58 | 200 |      1.7254ms |       127.0.0.1 | GET      "/api/tags"
[GIN] 2025/12/05 - 08:41:59 | 200 |      1.0561ms |       127.0.0.1 | GET      "/api/tags"
[GIN] 2025/12/05 - 08:41:59 | 200 |      83.969ms |       127.0.0.1 | POST     "/api/show"
[GIN] 2025/12/05 - 08:42:02 | 200 |      1.6863ms |       127.0.0.1 | GET      "/api/tags"
[GIN] 2025/12/05 - 08:42:02 | 200 |     93.6888ms |       127.0.0.1 | POST     "/api/show"
[GIN] 2025/12/05 - 08:42:02 | 200 |     74.4676ms |       127.0.0.1 | POST     "/api/show"
time=2025-12-05T08:42:03.035-06:00 level=INFO source=server.go:392 msg="starting runner" cmd="C:\\Users\\x\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --port 50724"
time=2025-12-05T08:42:03.762-06:00 level=INFO source=cpu_windows.go:148 msg=packages count=1
time=2025-12-05T08:42:03.762-06:00 level=INFO source=cpu_windows.go:195 msg="" package=0 cores=4 efficiency=0 threads=8
llama_model_loader: loaded meta data with 30 key-value pairs and 147 tensors from C:\Users\x\.ollama\models\blobs\sha256-74701a8c35f6c8d9a4b91f3f3497643001d63e0c7a84e085bed452548fa88d45 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Llama 3.2 1B Instruct
llama_model_loader: - kv   3:                           general.finetune str              = Instruct
llama_model_loader: - kv   4:                           general.basename str              = Llama-3.2
llama_model_loader: - kv   5:                         general.size_label str              = 1B
llama_model_loader: - kv   6:                               general.tags arr[str,6]       = ["facebook", "meta", "pytorch", "llam...
llama_model_loader: - kv   7:                          general.languages arr[str,8]       = ["en", "de", "fr", "it", "pt", "hi", ...
llama_model_loader: - kv   8:                          llama.block_count u32              = 16
llama_model_loader: - kv   9:                       llama.context_length u32              = 131072
llama_model_loader: - kv  10:                     llama.embedding_length u32              = 2048
llama_model_loader: - kv  11:                  llama.feed_forward_length u32              = 8192
llama_model_loader: - kv  12:                 llama.attention.head_count u32              = 32
llama_model_loader: - kv  13:              llama.attention.head_count_kv u32              = 8
llama_model_loader: - kv  14:                       llama.rope.freq_base f32              = 500000.000000
llama_model_loader: - kv  15:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  16:                 llama.attention.key_length u32              = 64
llama_model_loader: - kv  17:               llama.attention.value_length u32              = 64
llama_model_loader: - kv  18:                          general.file_type u32              = 7
llama_model_loader: - kv  19:                           llama.vocab_size u32              = 128256
llama_model_loader: - kv  20:                 llama.rope.dimension_count u32              = 64
llama_model_loader: - kv  21:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  22:                         tokenizer.ggml.pre str              = llama-bpe
llama_model_loader: - kv  23:                      tokenizer.ggml.tokens arr[str,128256]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  24:                  tokenizer.ggml.token_type arr[i32,128256]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  25:                      tokenizer.ggml.merges arr[str,280147]  = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv  26:                tokenizer.ggml.bos_token_id u32              = 128000
llama_model_loader: - kv  27:                tokenizer.ggml.eos_token_id u32              = 128009
llama_model_loader: - kv  28:                    tokenizer.chat_template str              = {{- bos_token }}\n{%- if custom_tools ...
llama_model_loader: - kv  29:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:   34 tensors
llama_model_loader: - type q8_0:  113 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = Q8_0
print_info: file size   = 1.22 GiB (8.50 BPW) 
load: printing all EOG tokens:
load:   - 128001 ('<|end_of_text|>')
load:   - 128008 ('<|eom_id|>')
load:   - 128009 ('<|eot_id|>')
load: special tokens cache size = 256
load: token to piece cache size = 0.7999 MB
print_info: arch             = llama
print_info: vocab_only       = 1
print_info: model type       = ?B
print_info: model params     = 1.24 B
print_info: general.name     = Llama 3.2 1B Instruct
print_info: vocab type       = BPE
print_info: n_vocab          = 128256
print_info: n_merges         = 280147
print_info: BOS token        = 128000 '<|begin_of_text|>'
print_info: EOS token        = 128009 '<|eot_id|>'
print_info: EOT token        = 128001 '<|end_of_text|>'
print_info: EOM token        = 128008 '<|eom_id|>'
print_info: LF token         = 198 'Ċ'
print_info: EOG token        = 128001 '<|end_of_text|>'
print_info: EOG token        = 128008 '<|eom_id|>'
print_info: EOG token        = 128009 '<|eot_id|>'
print_info: max token length = 256
llama_model_load: vocab only - skipping tensors
time=2025-12-05T08:42:04.320-06:00 level=INFO source=server.go:392 msg="starting runner" cmd="C:\\Users\\x\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --model C:\\Users\\x\\.ollama\\models\\blobs\\sha256-74701a8c35f6c8d9a4b91f3f3497643001d63e0c7a84e085bed452548fa88d45 --port 50736"
time=2025-12-05T08:42:04.324-06:00 level=INFO source=sched.go:443 msg="system memory" total="7.9 GiB" free="2.0 GiB" free_swap="3.9 GiB"
time=2025-12-05T08:42:04.324-06:00 level=INFO source=sched.go:450 msg="gpu memory" id=GPU-e2592442-e152-3cd2-3beb-17e640a98765 library=CUDA available="1.5 GiB" free="2.0 GiB" minimum="457.0 MiB" overhead="0 B"
time=2025-12-05T08:42:04.324-06:00 level=INFO source=server.go:459 msg="loading model" "model layers"=17 requested=-1
time=2025-12-05T08:42:04.324-06:00 level=INFO source=device.go:240 msg="model weights" device=CUDA0 size="863.0 MiB"
time=2025-12-05T08:42:04.324-06:00 level=INFO source=device.go:245 msg="model weights" device=CPU size="389.4 MiB"
time=2025-12-05T08:42:04.324-06:00 level=INFO source=device.go:251 msg="kv cache" device=CUDA0 size="112.0 MiB"
time=2025-12-05T08:42:04.324-06:00 level=INFO source=device.go:256 msg="kv cache" device=CPU size="16.0 MiB"
time=2025-12-05T08:42:04.324-06:00 level=INFO source=device.go:262 msg="compute graph" device=CUDA0 size="464.0 MiB"
time=2025-12-05T08:42:04.324-06:00 level=INFO source=device.go:272 msg="total memory" size="1.8 GiB"
time=2025-12-05T08:42:04.365-06:00 level=INFO source=runner.go:963 msg="starting go runner"
load_backend: loaded CPU backend from C:\Users\x\AppData\Local\Programs\Ollama\lib\ollama\ggml-cpu-haswell.dll
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce GTX 850M, compute capability 5.0, VMM: yes, ID: GPU-e2592442-e152-3cd2-3beb-17e640a98765
load_backend: loaded CUDA backend from C:\Users\x\AppData\Local\Programs\Ollama\lib\ollama\cuda_v12\ggml-cuda.dll
time=2025-12-05T08:42:04.598-06:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,520,600,610,700,750,800,860,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(clang)
time=2025-12-05T08:42:04.600-06:00 level=INFO source=runner.go:999 msg="Server listening on 127.0.0.1:50736"
time=2025-12-05T08:42:04.608-06:00 level=INFO source=runner.go:893 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:false KvSize:4096 KvCacheType: NumThreads:4 GPULayers:14[ID:GPU-e2592442-e152-3cd2-3beb-17e640a98765 Layers:14(2..15)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2025-12-05T08:42:04.608-06:00 level=INFO source=server.go:1294 msg="waiting for llama runner to start responding"
time=2025-12-05T08:42:04.608-06:00 level=INFO source=server.go:1328 msg="waiting for server to become available" status="llm server loading model"
ggml_backend_cuda_device_get_memory device GPU-e2592442-e152-3cd2-3beb-17e640a98765 utilizing NVML memory reporting free: 2106064896 total: 2147483648
llama_model_load_from_file_impl: using device CUDA0 (NVIDIA GeForce GTX 850M) (0000:01:00.0) - 2008 MiB free
llama_model_loader: loaded meta data with 30 key-value pairs and 147 tensors from C:\Users\x\.ollama\models\blobs\sha256-74701a8c35f6c8d9a4b91f3f3497643001d63e0c7a84e085bed452548fa88d45 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Llama 3.2 1B Instruct
llama_model_loader: - kv   3:                           general.finetune str              = Instruct
llama_model_loader: - kv   4:                           general.basename str              = Llama-3.2
llama_model_loader: - kv   5:                         general.size_label str              = 1B
llama_model_loader: - kv   6:                               general.tags arr[str,6]       = ["facebook", "meta", "pytorch", "llam...
llama_model_loader: - kv   7:                          general.languages arr[str,8]       = ["en", "de", "fr", "it", "pt", "hi", ...
llama_model_loader: - kv   8:                          llama.block_count u32              = 16
llama_model_loader: - kv   9:                       llama.context_length u32              = 131072
llama_model_loader: - kv  10:                     llama.embedding_length u32              = 2048
llama_model_loader: - kv  11:                  llama.feed_forward_length u32              = 8192
llama_model_loader: - kv  12:                 llama.attention.head_count u32              = 32
llama_model_loader: - kv  13:              llama.attention.head_count_kv u32              = 8
llama_model_loader: - kv  14:                       llama.rope.freq_base f32              = 500000.000000
llama_model_loader: - kv  15:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  16:                 llama.attention.key_length u32              = 64
llama_model_loader: - kv  17:               llama.attention.value_length u32              = 64
llama_model_loader: - kv  18:                          general.file_type u32              = 7
llama_model_loader: - kv  19:                           llama.vocab_size u32              = 128256
llama_model_loader: - kv  20:                 llama.rope.dimension_count u32              = 64
llama_model_loader: - kv  21:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  22:                         tokenizer.ggml.pre str              = llama-bpe
llama_model_loader: - kv  23:                      tokenizer.ggml.tokens arr[str,128256]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  24:                  tokenizer.ggml.token_type arr[i32,128256]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  25:                      tokenizer.ggml.merges arr[str,280147]  = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv  26:                tokenizer.ggml.bos_token_id u32              = 128000
llama_model_loader: - kv  27:                tokenizer.ggml.eos_token_id u32              = 128009
llama_model_loader: - kv  28:                    tokenizer.chat_template str              = {{- bos_token }}\n{%- if custom_tools ...
llama_model_loader: - kv  29:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:   34 tensors
llama_model_loader: - type q8_0:  113 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = Q8_0
print_info: file size   = 1.22 GiB (8.50 BPW) 
load: printing all EOG tokens:
load:   - 128001 ('<|end_of_text|>')
load:   - 128008 ('<|eom_id|>')
load:   - 128009 ('<|eot_id|>')
load: special tokens cache size = 256
load: token to piece cache size = 0.7999 MB
print_info: arch             = llama
print_info: vocab_only       = 0
print_info: n_ctx_train      = 131072
print_info: n_embd           = 2048
print_info: n_layer          = 16
print_info: n_head           = 32
print_info: n_head_kv        = 8
print_info: n_rot            = 64
print_info: n_swa            = 0
print_info: is_swa_any       = 0
print_info: n_embd_head_k    = 64
print_info: n_embd_head_v    = 64
print_info: n_gqa            = 4
print_info: n_embd_k_gqa     = 512
print_info: n_embd_v_gqa     = 512
print_info: f_norm_eps       = 0.0e+00
print_info: f_norm_rms_eps   = 1.0e-05
print_info: f_clamp_kqv      = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale    = 0.0e+00
print_info: f_attn_scale     = 0.0e+00
print_info: n_ff             = 8192
print_info: n_expert         = 0
print_info: n_expert_used    = 0
print_info: causal attn      = 1
print_info: pooling type     = 0
print_info: rope type        = 0
print_info: rope scaling     = linear
print_info: freq_base_train  = 500000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn  = 131072
print_info: rope_finetuned   = unknown
print_info: model type       = 1B
print_info: model params     = 1.24 B
print_info: general.name     = Llama 3.2 1B Instruct
print_info: vocab type       = BPE
print_info: n_vocab          = 128256
print_info: n_merges         = 280147
print_info: BOS token        = 128000 '<|begin_of_text|>'
print_info: EOS token        = 128009 '<|eot_id|>'
print_info: EOT token        = 128001 '<|end_of_text|>'
print_info: EOM token        = 128008 '<|eom_id|>'
print_info: LF token         = 198 'Ċ'
print_info: EOG token        = 128001 '<|end_of_text|>'
print_info: EOG token        = 128008 '<|eom_id|>'
print_info: EOG token        = 128009 '<|eot_id|>'
print_info: max token length = 256
load_tensors: loading model tensors, this can take a while... (mmap = false)
load_tensors: offloading 14 repeating layers to GPU
load_tensors: offloaded 14/17 layers to GPU
load_tensors:        CUDA0 model buffer size =   862.97 MiB
load_tensors:    CUDA_Host model buffer size =   389.45 MiB
llama_context: constructing llama_context
llama_context: n_seq_max     = 1
llama_context: n_ctx         = 4096
llama_context: n_ctx_per_seq = 4096
llama_context: n_batch       = 512
llama_context: n_ubatch      = 512
llama_context: causal_attn   = 1
llama_context: flash_attn    = disabled
llama_context: kv_unified    = false
llama_context: freq_base     = 500000.0
llama_context: freq_scale    = 1
llama_context: n_ctx_per_seq (4096) < n_ctx_train (131072) -- the full capacity of the model will not be utilized
llama_context:        CPU  output buffer size =     0.50 MiB
llama_kv_cache:      CUDA0 KV buffer size =   112.00 MiB
llama_kv_cache:        CPU KV buffer size =    16.00 MiB
llama_kv_cache: size =  128.00 MiB (  4096 cells,  16 layers,  1/1 seqs), K (f16):   64.00 MiB, V (f16):   64.00 MiB
graph_reserve: failed to allocate compute buffers
Exception 0xc0000005 0x0 0x2749b3a7788 0x7ffd8933760a
PC=0x7ffd8933760a
signal arrived during external code execution

runtime.cgocall(0x7ff78153cca0, 0xc0002cfc00)
	runtime/cgocall.go:167 +0x3e fp=0xc0002cfbd8 sp=0xc0002cfb70 pc=0x7ff78080243e
github.com/ollama/ollama/llama._Cfunc_llama_init_from_model(0x274e3198010, {0x1000, 0x200, 0x200, 0x1, 0x4, 0x4, 0xffffffff, 0xffffffff, 0xffffffff, ...})
	_cgo_gotypes.go:754 +0x54 fp=0xc0002cfc00 sp=0xc0002cfbd8 pc=0x7ff780bd2eb4
github.com/ollama/ollama/llama.NewContextWithModel.func1(...)
	github.com/ollama/ollama/llama/llama.go:317
github.com/ollama/ollama/llama.NewContextWithModel(0xc000448838, {{0x1000, 0x200, 0x200, 0x1, 0x4, 0x4, 0xffffffff, 0xffffffff, 0xffffffff, ...}})
	github.com/ollama/ollama/llama/llama.go:317 +0x158 fp=0xc0002cfda0 sp=0xc0002cfc00 pc=0x7ff780bd7478
github.com/ollama/ollama/runner/llamarunner.(*Server).loadModel(0xc00078dc20, {{0xc000618760, 0x1, 0x1}, 0xe, 0x0, 0x0, {0xc000618758, 0x1, 0x2}, ...}, ...)
	github.com/ollama/ollama/runner/llamarunner/runner.go:845 +0x178 fp=0xc0002cfee8 sp=0xc0002cfda0 pc=0x7ff780c918b8
github.com/ollama/ollama/runner/llamarunner.(*Server).load.gowrap2()
	github.com/ollama/ollama/runner/llamarunner/runner.go:932 +0x115 fp=0xc0002cffe0 sp=0xc0002cfee8 pc=0x7ff780c92ad5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc0002cffe8 sp=0xc0002cffe0 pc=0x7ff78080d8e1
created by github.com/ollama/ollama/runner/llamarunner.(*Server).load in goroutine 11
	github.com/ollama/ollama/runner/llamarunner/runner.go:932 +0x88a

goroutine 1 gp=0xc0000021c0 m=nil [IO wait]:
runtime.gopark(0x7ff78080f0e0?, 0x7ff782684a00?, 0xa0?, 0xd6?, 0xc0006ed74c?)
	runtime/proc.go:435 +0xce fp=0xc00011d648 sp=0xc00011d628 pc=0x7ff78080598e
runtime.netpollblock(0x39c?, 0x807a0406?, 0xf7?)
	runtime/netpoll.go:575 +0xf7 fp=0xc00011d680 sp=0xc00011d648 pc=0x7ff7807cbdf7
internal/poll.runtime_pollWait(0x274de5385f0, 0x72)
	runtime/netpoll.go:351 +0x85 fp=0xc00011d6a0 sp=0xc00011d680 pc=0x7ff780804b25
internal/poll.(*pollDesc).wait(0x7ff78089a693?, 0x0?, 0x0)
	internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc00011d6c8 sp=0xc00011d6a0 pc=0x7ff78089bc87
internal/poll.execIO(0xc0006ed6a0, 0xc00011d770)
	internal/poll/fd_windows.go:177 +0x105 fp=0xc00011d740 sp=0xc00011d6c8 pc=0x7ff78089d0e5
internal/poll.(*FD).acceptOne(0xc0006ed688, 0x3a8, {0xc0001300f0?, 0xc00011d7d0?, 0x7ff7808a4da5?}, 0xc00011d804?)
	internal/poll/fd_windows.go:946 +0x65 fp=0xc00011d7a0 sp=0xc00011d740 pc=0x7ff7808a1665
internal/poll.(*FD).Accept(0xc0006ed688, 0xc00011d950)
	internal/poll/fd_windows.go:980 +0x1b6 fp=0xc00011d858 sp=0xc00011d7a0 pc=0x7ff7808a1996
net.(*netFD).accept(0xc0006ed688)
	net/fd_windows.go:182 +0x4b fp=0xc00011d970 sp=0xc00011d858 pc=0x7ff780912f0b
net.(*TCPListener).accept(0xc0004bd6c0)
	net/tcpsock_posix.go:159 +0x1b fp=0xc00011d9c0 sp=0xc00011d970 pc=0x7ff780928f5b
net.(*TCPListener).Accept(0xc0004bd6c0)
	net/tcpsock.go:380 +0x30 fp=0xc00011d9f0 sp=0xc00011d9c0 pc=0x7ff780927d10
net/http.(*onceCloseListener).Accept(0xc00010c3f0?)
	<autogenerated>:1 +0x24 fp=0xc00011da08 sp=0xc00011d9f0 pc=0x7ff780b41184
net/http.(*Server).Serve(0xc0006f9100, {0x7ff781cf7240, 0xc0004bd6c0})
	net/http/server.go:3424 +0x30c fp=0xc00011db38 sp=0xc00011da08 pc=0x7ff780b18a4c
github.com/ollama/ollama/runner/llamarunner.Execute({0xc0000a0020, 0x4, 0x6})
	github.com/ollama/ollama/runner/llamarunner/runner.go:1000 +0x8f5 fp=0xc00011dd08 sp=0xc00011db38 pc=0x7ff780c93495
github.com/ollama/ollama/runner.Execute({0xc0000a0010?, 0x0?, 0x0?})
	github.com/ollama/ollama/runner/runner.go:22 +0xd4 fp=0xc00011dd30 sp=0xc00011dd08 pc=0x7ff780d39d14
github.com/ollama/ollama/cmd.NewCLI.func2(0xc0006f8e00?, {0x7ff781b0fd5f?, 0x4?, 0x7ff781b0fd63?})
	github.com/ollama/ollama/cmd/cmd.go:1841 +0x45 fp=0xc00011dd58 sp=0xc00011dd30 pc=0x7ff7814cd505
github.com/spf13/cobra.(*Command).execute(0xc0006b1b08, {0xc0004bd4c0, 0x4, 0x4})
	github.com/spf13/cobra@v1.7.0/command.go:940 +0x85c fp=0xc00011de78 sp=0xc00011dd58 pc=0x7ff78098d9dc
github.com/spf13/cobra.(*Command).ExecuteC(0xc0004c4308)
	github.com/spf13/cobra@v1.7.0/command.go:1068 +0x3a5 fp=0xc00011df30 sp=0xc00011de78 pc=0x7ff78098e225
github.com/spf13/cobra.(*Command).Execute(...)
	github.com/spf13/cobra@v1.7.0/command.go:992
github.com/spf13/cobra.(*Command).ExecuteContext(...)
	github.com/spf13/cobra@v1.7.0/command.go:985
main.main()
	github.com/ollama/ollama/main.go:12 +0x4d fp=0xc00011df50 sp=0xc00011df30 pc=0x7ff7814cdfed
runtime.main()
	runtime/proc.go:283 +0x27d fp=0xc00011dfe0 sp=0xc00011df50 pc=0x7ff7807d4ddd
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc00011dfe8 sp=0xc00011dfe0 pc=0x7ff78080d8e1

goroutine 2 gp=0xc0000028c0 m=nil [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc00006ffa8 sp=0xc00006ff88 pc=0x7ff78080598e
runtime.goparkunlock(...)
	runtime/proc.go:441
runtime.forcegchelper()
	runtime/proc.go:348 +0xb8 fp=0xc00006ffe0 sp=0xc00006ffa8 pc=0x7ff7807d50f8
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc00006ffe8 sp=0xc00006ffe0 pc=0x7ff78080d8e1
created by runtime.init.7 in goroutine 1
	runtime/proc.go:336 +0x1a

goroutine 3 gp=0xc000002c40 m=nil [GC sweep wait]:
runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc000071f80 sp=0xc000071f60 pc=0x7ff78080598e
runtime.goparkunlock(...)
	runtime/proc.go:441
runtime.bgsweep(0xc00007e000)
	runtime/mgcsweep.go:316 +0xdf fp=0xc000071fc8 sp=0xc000071f80 pc=0x7ff7807bdebf
runtime.gcenable.gowrap1()
	runtime/mgc.go:204 +0x25 fp=0xc000071fe0 sp=0xc000071fc8 pc=0x7ff7807b2285
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000071fe8 sp=0xc000071fe0 pc=0x7ff78080d8e1
created by runtime.gcenable in goroutine 1
	runtime/mgc.go:204 +0x66

goroutine 4 gp=0xc000002e00 m=nil [GC scavenge wait]:
runtime.gopark(0x10000?, 0x7ff781ce3a60?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc000085f78 sp=0xc000085f58 pc=0x7ff78080598e
runtime.goparkunlock(...)
	runtime/proc.go:441
runtime.(*scavengerState).park(0x7ff7826ab3c0)
	runtime/mgcscavenge.go:425 +0x49 fp=0xc000085fa8 sp=0xc000085f78 pc=0x7ff7807bb909
runtime.bgscavenge(0xc00007e000)
	runtime/mgcscavenge.go:658 +0x59 fp=0xc000085fc8 sp=0xc000085fa8 pc=0x7ff7807bbe99
runtime.gcenable.gowrap2()
	runtime/mgc.go:205 +0x25 fp=0xc000085fe0 sp=0xc000085fc8 pc=0x7ff7807b2225
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000085fe8 sp=0xc000085fe0 pc=0x7ff78080d8e1
created by runtime.gcenable in goroutine 1
	runtime/mgc.go:205 +0xa5

goroutine 5 gp=0xc000003340 m=nil [finalizer wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc000087e30 sp=0xc000087e10 pc=0x7ff78080598e
runtime.runfinq()
	runtime/mfinal.go:196 +0x107 fp=0xc000087fe0 sp=0xc000087e30 pc=0x7ff7807b1207
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000087fe8 sp=0xc000087fe0 pc=0x7ff78080d8e1
created by runtime.createfing in goroutine 1
	runtime/mfinal.go:166 +0x3d

goroutine 6 gp=0xc000003dc0 m=nil [chan receive]:
runtime.gopark(0xc0000d3900?, 0xc000600018?, 0x60?, 0x3f?, 0x7ff7808fbe48?)
	runtime/proc.go:435 +0xce fp=0xc000073f18 sp=0xc000073ef8 pc=0x7ff78080598e
runtime.chanrecv(0xc000036540, 0x0, 0x1)
	runtime/chan.go:664 +0x445 fp=0xc000073f90 sp=0xc000073f18 pc=0x7ff7807a2d45
runtime.chanrecv1(0x7ff7807d4f40?, 0xc000073f76?)
	runtime/chan.go:506 +0x12 fp=0xc000073fb8 sp=0xc000073f90 pc=0x7ff7807a28d2
runtime.unique_runtime_registerUniqueMapCleanup.func2(...)
	runtime/mgc.go:1796
runtime.unique_runtime_registerUniqueMapCleanup.gowrap1()
	runtime/mgc.go:1799 +0x2f fp=0xc000073fe0 sp=0xc000073fb8 pc=0x7ff7807b54af
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000073fe8 sp=0xc000073fe0 pc=0x7ff78080d8e1
created by unique.runtime_registerUniqueMapCleanup in goroutine 1
	runtime/mgc.go:1794 +0x85

goroutine 7 gp=0xc0003da540 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc000081f38 sp=0xc000081f18 pc=0x7ff78080598e
runtime.gcBgMarkWorker(0xc000037b20)
	runtime/mgc.go:1423 +0xe9 fp=0xc000081fc8 sp=0xc000081f38 pc=0x7ff7807b47a9
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc000081fe0 sp=0xc000081fc8 pc=0x7ff7807b4685
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000081fe8 sp=0xc000081fe0 pc=0x7ff78080d8e1
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 8 gp=0xc0003da700 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc000083f38 sp=0xc000083f18 pc=0x7ff78080598e
runtime.gcBgMarkWorker(0xc000037b20)
	runtime/mgc.go:1423 +0xe9 fp=0xc000083fc8 sp=0xc000083f38 pc=0x7ff7807b47a9
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc000083fe0 sp=0xc000083fc8 pc=0x7ff7807b4685
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000083fe8 sp=0xc000083fe0 pc=0x7ff78080d8e1
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 18 gp=0xc0001061c0 m=nil [GC worker (idle)]:
runtime.gopark(0x25106eadedc?, 0x1?, 0x60?, 0xc0?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc000113f38 sp=0xc000113f18 pc=0x7ff78080598e
runtime.gcBgMarkWorker(0xc000037b20)
	runtime/mgc.go:1423 +0xe9 fp=0xc000113fc8 sp=0xc000113f38 pc=0x7ff7807b47a9
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc000113fe0 sp=0xc000113fc8 pc=0x7ff7807b4685
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000113fe8 sp=0xc000113fe0 pc=0x7ff78080d8e1
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 34 gp=0xc000484000 m=nil [GC worker (idle)]:
runtime.gopark(0x25106eadedc?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc00010ff38 sp=0xc00010ff18 pc=0x7ff78080598e
runtime.gcBgMarkWorker(0xc000037b20)
	runtime/mgc.go:1423 +0xe9 fp=0xc00010ffc8 sp=0xc00010ff38 pc=0x7ff7807b47a9
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc00010ffe0 sp=0xc00010ffc8 pc=0x7ff7807b4685
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc00010ffe8 sp=0xc00010ffe0 pc=0x7ff78080d8e1
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 9 gp=0xc0003da8c0 m=nil [GC worker (idle)]:
runtime.gopark(0x25106eaa2b4?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc000473f38 sp=0xc000473f18 pc=0x7ff78080598e
runtime.gcBgMarkWorker(0xc000037b20)
	runtime/mgc.go:1423 +0xe9 fp=0xc000473fc8 sp=0xc000473f38 pc=0x7ff7807b47a9
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc000473fe0 sp=0xc000473fc8 pc=0x7ff7807b4685
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000473fe8 sp=0xc000473fe0 pc=0x7ff78080d8e1
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 19 gp=0xc000106380 m=nil [GC worker (idle)]:
runtime.gopark(0x25106e2f834?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc000115f38 sp=0xc000115f18 pc=0x7ff78080598e
runtime.gcBgMarkWorker(0xc000037b20)
	runtime/mgc.go:1423 +0xe9 fp=0xc000115fc8 sp=0xc000115f38 pc=0x7ff7807b47a9
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc000115fe0 sp=0xc000115fc8 pc=0x7ff7807b4685
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000115fe8 sp=0xc000115fe0 pc=0x7ff78080d8e1
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 35 gp=0xc000484380 m=nil [GC worker (idle)]:
runtime.gopark(0x25106e2f834?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc000111f38 sp=0xc000111f18 pc=0x7ff78080598e
runtime.gcBgMarkWorker(0xc000037b20)
	runtime/mgc.go:1423 +0xe9 fp=0xc000111fc8 sp=0xc000111f38 pc=0x7ff7807b47a9
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc000111fe0 sp=0xc000111fc8 pc=0x7ff7807b4685
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000111fe8 sp=0xc000111fe0 pc=0x7ff78080d8e1
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 36 gp=0xc000484540 m=nil [GC worker (idle)]:
runtime.gopark(0x7ff7826f9fa0?, 0x1?, 0xa8?, 0xe6?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc00046ff38 sp=0xc00046ff18 pc=0x7ff78080598e
runtime.gcBgMarkWorker(0xc000037b20)
	runtime/mgc.go:1423 +0xe9 fp=0xc00046ffc8 sp=0xc00046ff38 pc=0x7ff7807b47a9
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc00046ffe0 sp=0xc00046ffc8 pc=0x7ff7807b4685
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc00046ffe8 sp=0xc00046ffe0 pc=0x7ff78080d8e1
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 10 gp=0xc000484e00 m=nil [sync.WaitGroup.Wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x80?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc000471e20 sp=0xc000471e00 pc=0x7ff78080598e
runtime.goparkunlock(...)
	runtime/proc.go:441
runtime.semacquire1(0xc00078dc40, 0x0, 0x1, 0x0, 0x18)
	runtime/sema.go:188 +0x22f fp=0xc000471e88 sp=0xc000471e20 pc=0x7ff7807e750f
sync.runtime_SemacquireWaitGroup(0x0?)
	runtime/sema.go:110 +0x25 fp=0xc000471ec0 sp=0xc000471e88 pc=0x7ff780806f85
sync.(*WaitGroup).Wait(0x0?)
	sync/waitgroup.go:118 +0x48 fp=0xc000471ee8 sp=0xc000471ec0 pc=0x7ff78081b7a8
github.com/ollama/ollama/runner/llamarunner.(*Server).run(0xc00078dc20, {0x7ff781cf9840, 0xc0004a7180})
	github.com/ollama/ollama/runner/llamarunner/runner.go:359 +0x4b fp=0xc000471fb8 sp=0xc000471ee8 pc=0x7ff780c8e26b
github.com/ollama/ollama/runner/llamarunner.Execute.gowrap1()
	github.com/ollama/ollama/runner/llamarunner/runner.go:979 +0x28 fp=0xc000471fe0 sp=0xc000471fb8 pc=0x7ff780c93708
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000471fe8 sp=0xc000471fe0 pc=0x7ff78080d8e1
created by github.com/ollama/ollama/runner/llamarunner.Execute in goroutine 1
	github.com/ollama/ollama/runner/llamarunner/runner.go:979 +0x4c5

goroutine 11 gp=0xc000484fc0 m=nil [IO wait]:
runtime.gopark(0x0?, 0xc0006ed920?, 0xc8?, 0xd9?, 0xc0006ed9cc?)
	runtime/proc.go:435 +0xce fp=0xc0000498c8 sp=0xc0000498a8 pc=0x7ff78080598e
runtime.netpollblock(0x3a4?, 0x807a0406?, 0xf7?)
	runtime/netpoll.go:575 +0xf7 fp=0xc000049900 sp=0xc0000498c8 pc=0x7ff7807cbdf7
internal/poll.runtime_pollWait(0x274de5384d8, 0x72)
	runtime/netpoll.go:351 +0x85 fp=0xc000049920 sp=0xc000049900 pc=0x7ff780804b25
internal/poll.(*pollDesc).wait(0x7ff7809cc9b7?, 0xc000049970?, 0x0)
	internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc000049948 sp=0xc000049920 pc=0x7ff78089bc87
internal/poll.execIO(0xc0006ed920, 0x7ff781b873c8)
	internal/poll/fd_windows.go:177 +0x105 fp=0xc0000499c0 sp=0xc000049948 pc=0x7ff78089d0e5
internal/poll.(*FD).Read(0xc0006ed908, {0xc0001f2000, 0x1000, 0x1000})
	internal/poll/fd_windows.go:438 +0x29b fp=0xc000049a60 sp=0xc0000499c0 pc=0x7ff78089ddbb
net.(*netFD).Read(0xc0006ed908, {0xc0001f2000?, 0xc000049ad0?, 0x7ff78089c145?})
	net/fd_posix.go:55 +0x25 fp=0xc000049aa8 sp=0xc000049a60 pc=0x7ff780911025
net.(*conn).Read(0xc0004a2330, {0xc0001f2000?, 0x0?, 0x0?})
	net/net.go:194 +0x45 fp=0xc000049af0 sp=0xc000049aa8 pc=0x7ff780920505
net/http.(*connReader).Read(0xc00003e480, {0xc0001f2000, 0x1000, 0x1000})
	net/http/server.go:798 +0x159 fp=0xc000049b40 sp=0xc000049af0 pc=0x7ff780b0d8f9
bufio.(*Reader).fill(0xc000211e60)
	bufio/bufio.go:113 +0x103 fp=0xc000049b78 sp=0xc000049b40 pc=0x7ff780936d43
bufio.(*Reader).Peek(0xc000211e60, 0x4)
	bufio/bufio.go:152 +0x53 fp=0xc000049b98 sp=0xc000049b78 pc=0x7ff780936e73
net/http.(*conn).serve(0xc00010c3f0, {0x7ff781cf9808, 0xc00060aa20})
	net/http/server.go:2137 +0x785 fp=0xc000049fb8 sp=0xc000049b98 pc=0x7ff780b136e5
net/http.(*Server).Serve.gowrap3()
	net/http/server.go:3454 +0x28 fp=0xc000049fe0 sp=0xc000049fb8 pc=0x7ff780b18e48
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000049fe8 sp=0xc000049fe0 pc=0x7ff78080d8e1
created by net/http.(*Server).Serve in goroutine 1
	net/http/server.go:3454 +0x485
rax     0xfffffffff4afcba0
rbx     0x0
rcx     0x274e33cb2d0
rdx     0x274f5bc15d0
rdi     0x274f525f720
rsi     0x274f525f550
rbp     0x79818ff7a0
rsp     0x79818ff558
r8      0x27499000e40
r9      0x1
r10     0x8000
r11     0x79818ff540
r12     0x274f5f53810
r13     0x274e33cd270
r14     0x1
r15     0x4a0
rip     0x7ffd8933760a
rflags  0x10206
cs      0x33
fs      0x53
gs      0x2b
time=2025-12-05T08:42:10.810-06:00 level=INFO source=server.go:1328 msg="waiting for server to become available" status="llm server error"
time=2025-12-05T08:42:11.061-06:00 level=INFO source=sched.go:470 msg="Load failed" model=C:\Users\x\.ollama\models\blobs\sha256-74701a8c35f6c8d9a4b91f3f3497643001d63e0c7a84e085bed452548fa88d45 error="llama runner process has terminated: exit status 2"
[GIN] 2025/12/05 - 08:42:11 | 500 |    8.2401105s |       127.0.0.1 | POST     "/api/chat"
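The log above ends with `graph_reserve: failed to allocate compute buffers` followed by an access violation (`Exception 0xc0000005`) inside `llama_init_from_model`, which points at the compute-buffer allocation on the 2 GB GTX 850M rather than the model weights themselves. As a hedged diagnostic (not a confirmed fix), forcing a CPU-only load should show whether the crash is GPU-memory related; `num_gpu` is Ollama's standard option for the number of offloaded layers, and the model name below is simply the one from the log:

```
# Ask the local server (default port 11434) to load the model with zero
# GPU layers. If this succeeds where the default load crashes, the failure
# is GPU-memory related, and a lower num_gpu (e.g. 8 instead of 14) may help.
curl http://127.0.0.1:11434/api/generate -d '{
  "model": "llama3.2:1b",
  "prompt": "hello",
  "options": { "num_gpu": 0 }
}'
```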
Author
Owner

@KIC commented on GitHub (Feb 16, 2026):

I get this error when I use the "/v1/chat/completions" API:
level=ERROR source=server.go:464 msg="llama runner terminated" error="exit status 2"
With the native API I had no issues, but I was hoping I could merge my clients into a single OpenAI-compatible client.

[GIN] 2026/02/16 - 13:00:47 | 500 | 6.841191037s | 172.19.0.6 | POST "/v1/chat/completions"

vs

[GIN] 2026/02/16 - 13:04:26 | 200 | 7.304322007s | 172.19.0.6 | POST "/api/generate"
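Since both endpoints drive the same runner, a minimal side-by-side repro can confirm whether the failure really tracks the endpoint rather than the request shape; the commands below assume a local server on the default port, and the model name is a placeholder to substitute with your own:

```
# OpenAI-compatible endpoint (the one returning 500 here)
curl http://127.0.0.1:11434/v1/chat/completions -d '{
  "model": "llama3.2",
  "messages": [{ "role": "user", "content": "hello" }]
}'

# Native endpoint (the one returning 200 here)
curl http://127.0.0.1:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "hello"
}'
```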
