[GH-ISSUE #14188] RTX 3060 crashes on cuda_v13 backend, v0.13.0 to v0.15.6 #9247

Closed
opened 2026-04-12 22:07:20 -05:00 by GiteaMirror · 1 comment

Originally created by @yunesj on GitHub (Feb 10, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/14188

What is the issue?

Ollama crashes on an RTX 3060 when using the cuda_v13 backend, in versions v0.13.0 through v0.15.6. After deleting the cuda_v13 directory, the backend falls back to cuda_v12 and works, as it did in versions <= 0.12.11.

It seems similar to https://github.com/ollama/ollama/pull/12300.
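The fallback workaround described above can be made reversible by moving the cuda_v13 directory aside rather than deleting it. A sketch, assuming the install prefix shown in the log below (`/usr/local/lib/ollama/lib/ollama`); adjust the path if your layout differs:

```shell
# Stop the service before touching the backend libraries.
sudo systemctl stop ollama

# Move cuda_v13 aside (reversible, unlike deleting it); Ollama then
# falls back to the cuda_v12 backend on the next start.
sudo mv /usr/local/lib/ollama/lib/ollama/cuda_v13 \
        /usr/local/lib/ollama/lib/ollama/cuda_v13.disabled

sudo systemctl start ollama

# To restore: stop the service and move cuda_v13.disabled back to cuda_v13.
```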

Relevant log output

Feb 10 06:52:34 ollama ollama[5962]: time=2026-02-10T06:52:34.546Z level=ERROR source=server.go:1205 msg="do load request" error="Post \"http://127.0.0.1:46567/load\": EOF"
Feb 10 06:52:34 ollama ollama[5962]: time=2026-02-10T06:52:34.546Z level=ERROR source=server.go:1205 msg="do load request" error="Post \"http://127.0.0.1:46567/load\": dial tcp 
127.0.0.1:46567: connect: connection refused"
Feb 10 06:52:34 ollama ollama[5962]: time=2026-02-10T06:52:34.546Z level=INFO source=sched.go:490 msg="Load failed" model=/root/.ollama/models/blobs/sha256-a3de86cd1c132c822487e
dedd47a324c50491393e6565cd14bafa40d0b8e686f error="model failed to load, this may be due to resource limitations or an internal error, check ollama server logs for details"
Feb 10 06:52:34 ollama ollama[5962]: [GIN] 2026/02/10 - 06:52:34 | 500 |  903.736937ms |       127.0.0.1 | POST     "/api/generate"
Feb 10 06:52:41 ollama ollama[5962]: [GIN] 2026/02/10 - 06:52:41 | 200 |      32.665µs |       127.0.0.1 | HEAD     "/"
Feb 10 06:52:41 ollama ollama[5962]: [GIN] 2026/02/10 - 06:52:41 | 200 |  129.807248ms |       127.0.0.1 | POST     "/api/show"
Feb 10 06:52:41 ollama ollama[5962]: [GIN] 2026/02/10 - 06:52:41 | 200 |  125.652266ms |       127.0.0.1 | POST     "/api/show"
Feb 10 06:52:41 ollama ollama[5962]: time=2026-02-10T06:52:41.952Z level=INFO source=server.go:431 msg="starting runner" cmd="/usr/local/lib/ollama/bin/ollama runner --ollama-en
gine --port 36061"
Feb 10 06:52:42 ollama ollama[5962]: time=2026-02-10T06:52:42.156Z level=WARN source=cpu_linux.go:130 msg="failed to parse CPU allowed micro secs" error="strconv.ParseInt: parsi
ng \"max\": invalid syntax"
Feb 10 06:52:42 ollama ollama[5962]: time=2026-02-10T06:52:42.238Z level=INFO source=server.go:247 msg="enabling flash attention"
Feb 10 06:52:42 ollama ollama[5962]: time=2026-02-10T06:52:42.238Z level=INFO source=server.go:431 msg="startFeb 10 06:52:42 ollama ollama[5962]: time=2026-02-10T06:52:42.249Z level=INFO source=runner.go:1411 msg="starting ollama engine"
Feb 10 06:52:42 ollama ollama[5962]: time=2026-02-10T06:52:42.249Z level=INFO source=runner.go:1446 msg="Server listening on 127.0.0.1:39333"
Feb 10 06:52:42 ollama ollama[5962]: time=2026-02-10T06:52:42.260Z level=INFO source=runner.go:1284 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAt
tention:Enabled KvSize:4096 KvCacheType: NumThreads:1 GPULayers:37[ID:GPU-60e3f5c9-f27e-4c35-502d-d4c9b9388f14 Layers:37(0..36)] MultiUserCache:false ProjectorPath: MainGPU:0 Us
eMmap:false}"
Feb 10 06:52:42 ollama ollama[5962]: time=2026-02-10T06:52:42.290Z level=INFO source=ggml.go:136 msg="" architecture=qwen3 file_type=Q4_K_M name="Qwen3 8B" description="" num_te
nsors=399 num_key_values=29
Feb 10 06:52:42 ollama ollama[5962]: load_backend: loaded CPU backend from /usr/local/lib/ollama/lib/ollama/libggml-cpu-haswell.so
Feb 10 06:52:42 ollama ollama[5962]: ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
Feb 10 06:52:42 ollama ollama[5962]: ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
Feb 10 06:52:42 ollama ollama[5962]: ggml_cuda_init: found 1 CUDA devices:
Feb 10 06:52:42 ollama ollama[5962]:   Device 0: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes, ID: GPU-60e3f5c9-f27e-4c35-502d-d4c9b9388f14
Feb 10 06:52:42 ollama ollama[5962]: load_backend: loaded CUDA backend from /usr/local/lib/ollama/lib/ollama/cuda_v13/libggml-cuda.so
Feb 10 06:52:42 ollama ollama[5962]: time=2026-02-10T06:52:42.456Z level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.
0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=750,800,860,870,890,900,1000,1030,1100,1200,1210 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compile
r=cgo(gcc)
Feb 10 06:52:42 ollama ollama[5962]: free(): invalid pointer
Feb 10 06:52:42 ollama ollama[5962]: SIGABRT: abort
Feb 10 06:52:42 ollama ollama[5962]: PC=0x7ccc3749eb2c m=5 sigcode=18446744073709551610
Feb 10 06:52:42 ollama ollama[5962]: signal arrived during cgo execution
Feb 10 06:52:42 ollama ollama[5962]: goroutine 23 gp=0xc000504e00 m=5 mp=0xc000100008 [syscall]:
Feb 10 06:52:42 ollama ollama[5962]: runtime.cgocall(0x65331c7f2d00, 0xc0000490d8)
Feb 10 06:52:42 ollama ollama[5962]:         runtime/cgocall.go:167 +0x4b fp=0xc0000490b0 sp=0xc000049078 pc=0x65331b9a994b
Feb 10 06:52:42 ollama ollama[5962]: github.com/ollama/ollama/ml/backend/ggml._Cfunc_ggml_backend_sched_reserve(0x7ccbdc79bfd0, 0x7ccb724e4e90)ing runner" cmd="/usr/local/lib/ollama/bin/ollama runner --ollama-en
gine --model /root/.ollama/models/blobs/sha256-a3de86cd1c132c822487ededd47a324c50491393e6565cd14bafa40d0b8e686f --port 39333"
Feb 10 06:52:42 ollama ollama[5962]: time=2026-02-10T06:52:42.238Z level=INFO source=sched.go:463 msg="system memory" total="8.0 GiB" free="107.4 MiB" free_swap="512.0 MiB"
Feb 10 06:52:42 ollama ollama[5962]: time=2026-02-10T06:52:42.238Z level=INFO source=sched.go:470 msg="gpu memory" id=GPU-60e3f5c9-f27e-4c35-502d-d4c9b9388f14 library=CUDA avail
able="11.2 GiB" free="11.6 GiB" minimum="457.0 MiB" overhead="0 B"
Feb 10 06:52:42 ollama ollama[5962]: time=2026-02-10T06:52:42.238Z level=INFO source=server.go:757 msg="loading model" "model layers"=37 requested=-1
Feb 10 06:52:42 ollama ollama[5962]: time=2026-02-10T06:52:42.249Z level=INFO source=runner.go:1411 msg="starting ollama engine"
Feb 10 06:52:42 ollama ollama[5962]: time=2026-02-10T06:52:42.249Z level=INFO source=runner.go:1446 msg="Server listening on 127.0.0.1:39333"
Feb 10 06:52:42 ollama ollama[5962]: time=2026-02-10T06:52:42.260Z level=INFO source=runner.go:1284 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAt
tention:Enabled KvSize:4096 KvCacheType: NumThreads:1 GPULayers:37[ID:GPU-60e3f5c9-f27e-4c35-502d-d4c9b9388f14 Layers:37(0..36)] MultiUserCache:false ProjectorPath: MainGPU:0 Us
eMmap:false}"
Feb 10 06:52:42 ollama ollama[5962]: time=2026-02-10T06:52:42.290Z level=INFO source=ggml.go:136 msg="" architecture=qwen3 file_type=Q4_K_M name="Qwen3 8B" description="" num_te
nsors=399 num_key_values=29
Feb 10 06:52:42 ollama ollama[5962]: load_backend: loaded CPU backend from /usr/local/lib/ollama/lib/ollama/libggml-cpu-haswell.so
Feb 10 06:52:42 ollama ollama[5962]: ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
Feb 10 06:52:42 ollama ollama[5962]: ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
Feb 10 06:52:42 ollama ollama[5962]: ggml_cuda_init: found 1 CUDA devices:
Feb 10 06:52:42 ollama ollama[5962]:   Device 0: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes, ID: GPU-60e3f5c9-f27e-4c35-502d-d4c9b9388f14
Feb 10 06:52:42 ollama ollama[5962]: load_backend: loaded CUDA backend from /usr/local/lib/ollama/lib/ollama/cuda_v13/libggml-cuda.so
Feb 10 06:52:42 ollama ollama[5962]: time=2026-02-10T06:52:42.456Z level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.
0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=750,800,860,870,890,900,1000,1030,1100,1200,1210 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compile
r=cgo(gcc)
Feb 10 06:52:42 ollama ollama[5962]: free(): invalid pointer
Feb 10 06:52:42 ollama ollama[5962]: SIGABRT: abort
Feb 10 06:52:42 ollama ollama[5962]: PC=0x7ccc3749eb2c m=5 sigcode=18446744073709551610
Feb 10 06:52:42 ollama ollama[5962]: signal arrived during cgo execution
Feb 10 06:52:42 ollama ollama[5962]: goroutine 23 gp=0xc000504e00 m=5 mp=0xc000100008 [syscall]:
Feb 10 06:52:42 ollama ollama[5962]: runtime.cgocall(0x65331c7f2d00, 0xc0000490d8)
Feb 10 06:52:42 ollama ollama[5962]:         runtime/cgocall.go:167 +0x4b fp=0xc0000490b0 sp=0xc000049078 pc=0x65331b9a994b
Feb 10 06:52:42 ollama ollama[5962]: github.com/ollama/ollama/ml/backend/ggml._Cfunc_ggml_backend_sched_reserve(0x7ccbdc79bfd0, 0x7ccb724e4e90)

...
Feb 10 06:52:42 ollama ollama[5962]: time=2026-02-10T06:52:42.680Z level=ERROR source=server.go:1205 msg="do load request" error="Post \"http://127.0.0.1:39333/load\": EOF"
Feb 10 06:52:42 ollama ollama[5962]: time=2026-02-10T06:52:42.681Z level=ERROR source=server.go:1205 msg="do load request" error="Post \"http://127.0.0.1:39333/load\": dial tcp 
127.0.0.1:39333: connect: connection refused"
Feb 10 06:52:42 ollama ollama[5962]: time=2026-02-10T06:52:42.681Z level=INFO source=sched.go:490 msg="Load failed" model=/root/.ollama/models/blobs/sha256-a3de86cd1c132c822487e
dedd47a324c50491393e6565cd14bafa40d0b8e686f error="model failed to load, this may be due to resource limitations or an internal error, check ollama server logs for details"
Feb 10 06:52:42 ollama ollama[5962]: [GIN] 2026/02/10 - 06:52:42 | 500 |  907.165692ms |       127.0.0.1 | POST     "/api/generate"
Feb 10 07:02:40 ollama ollama[5962]: [GIN] 2026/02/10 - 07:02:40 | 200 |      28.029µs |       127.0.0.1 | GET      "/api/version"
Feb 10 07:13:24 ollama systemd[1]: Stopping ollama.service - Ollama Service...

OS

Linux

GPU

Nvidia

CPU

Intel

Ollama version

0.15.6

GiteaMirror added the needs more info, bug labels 2026-04-12 22:07:20 -05:00

@rick-github commented on GitHub (Feb 10, 2026):

Set `OLLAMA_DEBUG=2` and post the full log from start to the end of the model crashdump.

Note that you don't need to delete the cuda_v13 directory; set `OLLAMA_LLM_LIBRARY=cuda_v12` to force the runner to use v12.
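On a systemd install like the one in the log above, both variables can be set with a drop-in override rather than editing the unit file. A sketch, assuming the service is named `ollama.service` as shown in the log:

```shell
# Create a systemd drop-in that sets both environment variables
# for the ollama service.
sudo mkdir -p /etc/systemd/system/ollama.service.d
sudo tee /etc/systemd/system/ollama.service.d/override.conf <<'EOF'
[Service]
Environment="OLLAMA_DEBUG=2"
Environment="OLLAMA_LLM_LIBRARY=cuda_v12"
EOF

# Reload unit files and restart the service so the override takes effect.
sudo systemctl daemon-reload
sudo systemctl restart ollama

# Follow the log from startup through the end of the crashdump.
journalctl -u ollama -f
```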


Reference: github-starred/ollama#9247