[GH-ISSUE #9847] reload model error #6448

Closed
opened 2026-04-12 18:00:24 -05:00 by GiteaMirror · 2 comments

Originally created by @lajiyou9 on GitHub (Mar 18, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/9847

What is the issue?

When I run the deepseek-r1:32b model and pass a different context length through the model parameters, the model reloads and throws an error.

Q1: Why does the model reload?
Q2: Why the error `host pid is error! SET_TASK_PID FAILED.`?
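
For reproduction, a minimal sketch of the request pattern involved, assuming the context length is passed via the `num_ctx` option of the generate API (the prompts are illustrative and 11434 is the default port):

```shell
# First request loads the model with an 8192-token context.
curl http://localhost:11434/api/generate -d '{
  "model": "deepseek-r1:32b",
  "prompt": "hello",
  "options": { "num_ctx": 8192 }
}'

# A second request with a different num_ctx forces a reload of the
# same model; this is when the HAMI-core messages below appear.
curl http://localhost:11434/api/generate -d '{
  "model": "deepseek-r1:32b",
  "prompt": "hello",
  "options": { "num_ctx": 16384 }
}'
```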

Relevant log output

[HAMI-core Msg(1:140178127021824:memory.c:511)]: orig free=17274503168 total=25386352640 limit=25757220864 usage=7617930240
[HAMI-core Msg(1:140178127021824:memory.c:511)]: orig free=17689739264 total=25386352640 limit=25757220864 usage=7198553088
[HAMI-core Msg(1:140178127021824:memory.c:511)]: orig free=17689739264 total=25386352640 limit=25757220864 usage=7198553088
[HAMI-core Msg(1:140178127021824:memory.c:511)]: orig free=17092050944 total=25386352640 limit=25757220864 usage=7796343808
[HAMI-core Msg(1:140178127021824:memory.c:511)]: orig free=17689739264 total=25386352640 limit=25757220864 usage=7198553088
[HAMI-core Msg(1:140178127021824:memory.c:511)]: orig free=17243045888 total=25386352640 limit=25757220864 usage=7644955648
[HAMI-core Msg(1:140178127021824:memory.c:511)]: orig free=17689739264 total=25386352640 limit=25757220864 usage=7198553088
[HAMI-core Msg(1:140179091461888:memory.c:511)]: orig free=17274503168 total=25386352640 limit=25757220864 usage=7617930240
[HAMI-core Msg(1:140179091461888:memory.c:511)]: orig free=25016664064 total=25386352640 limit=25757220864 usage=7198553088
[HAMI-core Msg(1:140179091461888:memory.c:511)]: orig free=25016664064 total=25386352640 limit=25757220864 usage=7198553088
[HAMI-core Msg(1:140179091461888:memory.c:511)]: orig free=25016664064 total=25386352640 limit=25757220864 usage=7796343808
[HAMI-core Msg(1:140179091461888:memory.c:511)]: orig free=25016664064 total=25386352640 limit=25757220864 usage=7198553088
[HAMI-core Msg(1:140179091461888:memory.c:511)]: orig free=25016664064 total=25386352640 limit=25757220864 usage=7644955648
[HAMI-core Msg(1:140179091461888:memory.c:511)]: orig free=25016664064 total=25386352640 limit=25757220864 usage=7198553088
[HAMI-core Msg(1:140179091461888:memory.c:511)]: orig free=25016664064 total=25386352640 limit=25757220864 usage=7617930240
[HAMI-core Msg(1:140179091461888:memory.c:511)]: orig free=25016664064 total=25386352640 limit=25757220864 usage=7198553088
[HAMI-core Msg(1:140179091461888:memory.c:511)]: orig free=25016664064 total=25386352640 limit=25757220864 usage=7198553088
[HAMI-core Msg(1:140179091461888:memory.c:511)]: orig free=25016664064 total=25386352640 limit=25757220864 usage=7796343808
[HAMI-core Msg(1:140179091461888:memory.c:511)]: orig free=25016664064 total=25386352640 limit=25757220864 usage=7198553088
[HAMI-core Msg(1:140179091461888:memory.c:511)]: orig free=25016664064 total=25386352640 limit=25757220864 usage=7644955648
[HAMI-core Msg(1:140179091461888:memory.c:511)]: orig free=25016664064 total=25386352640 limit=25757220864 usage=7198553088
time=2025-03-17T17:46:08.236+08:00 level=WARN source=sched.go:646 msg="gpu VRAM usage didn't recover within timeout" seconds=5.676611462 model=/root/.ollama/models/blobs/sha256-c62ccde5630c20c8a9cf601861d31977d07450cad6dfdf1c661aab307107bddb
[HAMI-core Msg(1:140179091461888:memory.c:511)]: orig free=25016664064 total=25386352640 limit=25757220864 usage=7617930240
[HAMI-core Msg(1:140179091461888:memory.c:511)]: orig free=25016664064 total=25386352640 limit=25757220864 usage=7198553088
[HAMI-core Msg(1:140179091461888:memory.c:511)]: orig free=25016664064 total=25386352640 limit=25757220864 usage=7198553088
[HAMI-core Msg(1:140179091461888:memory.c:511)]: orig free=25016664064 total=25386352640 limit=25757220864 usage=7796343808
[HAMI-core Msg(1:140179091461888:memory.c:511)]: orig free=25016664064 total=25386352640 limit=25757220864 usage=7198553088
[HAMI-core Msg(1:140179091461888:memory.c:511)]: orig free=25016664064 total=25386352640 limit=25757220864 usage=7644955648
[HAMI-core Msg(1:140179091461888:memory.c:511)]: orig free=25016664064 total=25386352640 limit=25757220864 usage=7198553088
time=2025-03-17T17:46:09.916+08:00 level=WARN source=sched.go:646 msg="gpu VRAM usage didn't recover within timeout" seconds=7.357191556 model=/root/.ollama/models/blobs/sha256-c62ccde5630c20c8a9cf601861d31977d07450cad6dfdf1c661aab307107bddb
[HAMI-core Msg(1:140178420778752:memory.c:511)]: orig free=25016664064 total=25386352640 limit=25757220864 usage=7617930240
[HAMI-core Msg(1:140178420778752:memory.c:511)]: orig free=25016664064 total=25386352640 limit=25757220864 usage=7198553088
[HAMI-core Msg(1:140178420778752:memory.c:511)]: orig free=25016664064 total=25386352640 limit=25757220864 usage=7198553088
[HAMI-core Msg(1:140178420778752:memory.c:511)]: orig free=25016664064 total=25386352640 limit=25757220864 usage=7796343808
[HAMI-core Msg(1:140178420778752:memory.c:511)]: orig free=25016664064 total=25386352640 limit=25757220864 usage=7198553088
[HAMI-core Msg(1:140178420778752:memory.c:511)]: orig free=25016664064 total=25386352640 limit=25757220864 usage=7644955648
[HAMI-core Msg(1:140178420778752:memory.c:511)]: orig free=25016664064 total=25386352640 limit=25757220864 usage=7198553088
time=2025-03-17T17:46:11.704+08:00 level=INFO source=sched.go:730 msg="new model will fit in available VRAM, loading" model=/root/.ollama/models/blobs/sha256-c62ccde5630c20c8a9cf601861d31977d07450cad6dfdf1c661aab307107bddb library=cuda parallel=4 required="31.7 GiB"
[HAMI-core Msg(1:140176852510464:memory.c:511)]: orig free=25016664064 total=25386352640 limit=25757220864 usage=7617930240
[HAMI-core Msg(1:140176852510464:memory.c:511)]: orig free=25016664064 total=25386352640 limit=25757220864 usage=7198553088
[HAMI-core Msg(1:140176852510464:memory.c:511)]: orig free=25016664064 total=25386352640 limit=25757220864 usage=7198553088
[HAMI-core Msg(1:140176852510464:memory.c:511)]: orig free=25016664064 total=25386352640 limit=25757220864 usage=7796343808
[HAMI-core Msg(1:140176852510464:memory.c:511)]: orig free=25016664064 total=25386352640 limit=25757220864 usage=7198553088
[HAMI-core Msg(1:140178127021824:memory.c:511)]: orig free=25016664064 total=25386352640 limit=25757220864 usage=7644955648
[HAMI-core Msg(1:140178127021824:memory.c:511)]: orig free=25016664064 total=25386352640 limit=25757220864 usage=7198553088
time=2025-03-17T17:46:13.214+08:00 level=WARN source=sched.go:646 msg="gpu VRAM usage didn't recover within timeout" seconds=10.654653934 model=/root/.ollama/models/blobs/sha256-c62ccde5630c20c8a9cf601861d31977d07450cad6dfdf1c661aab307107bddb
[HAMI-core Msg(1:140176256923392:memory.c:511)]: orig free=25016664064 total=25386352640 limit=25757220864 usage=7617930240
[HAMI-core Msg(1:140176256923392:memory.c:511)]: orig free=25016664064 total=25386352640 limit=25757220864 usage=7198553088
[HAMI-core Msg(1:140176256923392:memory.c:511)]: orig free=25016664064 total=25386352640 limit=25757220864 usage=7198553088
[HAMI-core Msg(1:140176256923392:memory.c:511)]: orig free=25016664064 total=25386352640 limit=25757220864 usage=7796343808
[HAMI-core Msg(1:140176852510464:memory.c:511)]: orig free=25016664064 total=25386352640 limit=25757220864 usage=7198553088
[HAMI-core Msg(1:140176852510464:memory.c:511)]: orig free=25016664064 total=25386352640 limit=25757220864 usage=7644955648
[HAMI-core Msg(1:140176852510464:memory.c:511)]: orig free=25016664064 total=25386352640 limit=25757220864 usage=7198553088
time=2025-03-17T17:46:14.792+08:00 level=INFO source=server.go:100 msg="system memory" total="503.3 GiB" free="474.4 GiB" free_swap="0 B"
time=2025-03-17T17:46:14.794+08:00 level=INFO source=memory.go:356 msg="offload to cuda" layers.requested=-1 layers.model=65 layers.offload=65 layers.split=10,10,9,9,9,9,9 memory.available="[17.3 GiB 17.3 GiB 17.3 GiB 17.3 GiB 16.9 GiB 16.9 GiB 16.7 GiB]" memory.gpu_overhead="0 B" memory.required.full="31.7 GiB" memory.required.partial="31.7 GiB" memory.required.kv="2.0 GiB" memory.required.allocations="[4.7 GiB 5.0 GiB 4.4 GiB 4.4 GiB 4.4 GiB 4.4 GiB 4.4 GiB]" memory.weights.total="19.5 GiB" memory.weights.repeating="18.9 GiB" memory.weights.nonrepeating="609.1 MiB" memory.graph.full="916.1 MiB" memory.graph.partial="916.1 MiB"
time=2025-03-17T17:46:14.794+08:00 level=INFO source=server.go:381 msg="starting llama server" cmd="/usr/bin/ollama runner --model /root/.ollama/models/blobs/sha256-c62ccde5630c20c8a9cf601861d31977d07450cad6dfdf1c661aab307107bddb --ctx-size 8192 --batch-size 512 --n-gpu-layers 65 --threads 56 --parallel 4 --tensor-split 10,10,9,9,9,9,9 --port 45405"
time=2025-03-17T17:46:14.795+08:00 level=INFO source=sched.go:449 msg="loaded runners" count=1
time=2025-03-17T17:46:14.795+08:00 level=INFO source=server.go:558 msg="waiting for llama runner to start responding"
time=2025-03-17T17:46:14.796+08:00 level=INFO source=server.go:592 msg="waiting for server to become available" status="llm server error"
time=2025-03-17T17:46:14.847+08:00 level=INFO source=runner.go:936 msg="starting go runner"
time=2025-03-17T17:46:14.848+08:00 level=INFO source=runner.go:937 msg=system info="CPU : LLAMAFILE = 1 | CPU : LLAMAFILE = 1 | cgo(gcc)" threads=56
time=2025-03-17T17:46:14.848+08:00 level=INFO source=runner.go:995 msg="Server listening on 127.0.0.1:45405"
[HAMI-core Msg(432:140668983752448:libvgpu.c:836)]: Initializing.....
[HAMI-core Warn(432:140668983752448:multiprocess_memory_limit.c:589)]: Kick dead proc 78
time=2025-03-17T17:46:15.048+08:00 level=INFO source=server.go:592 msg="waiting for server to become available" status="llm server loading model"
[HAMI-core ERROR (pid:432 thread=140668983752448 utils.c:146)]: host pid is error!
[HAMI-core Msg(432:140668983752448:libvgpu.c:855)]: Initialized
[HAMI-core Warn(432:140668983752448:libvgpu.c:857)]: SET_TASK_PID FAILED.
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 7 CUDA devices:
  Device 0: NVIDIA GeForce RTX 4090 D, compute capability 8.9, VMM: yes
  Device 1: NVIDIA GeForce RTX 4090 D, compute capability 8.9, VMM: yes
  Device 2: NVIDIA GeForce RTX 4090 D, compute capability 8.9, VMM: yes
  Device 3: NVIDIA GeForce RTX 4090 D, compute capability 8.9, VMM: yes
  Device 4: NVIDIA GeForce RTX 4090 D, compute capability 8.9, VMM: yes
  Device 5: NVIDIA GeForce RTX 4090 D, compute capability 8.9, VMM: yes
  Device 6: NVIDIA GeForce RTX 4090 D, compute capability 8.9, VMM: yes
load_backend: loaded CUDA backend from /usr/lib/ollama/cuda_v12/libggml-cuda.so
load_backend: loaded CPU backend from /usr/lib/ollama/libggml-cpu-icelake.so

OS

Docker

GPU

Nvidia

CPU

Intel

Ollama version

No response

GiteaMirror added the bug label 2026-04-12 18:00:24 -05:00

@rick-github commented on GitHub (Mar 18, 2025):

> Q1: Why does the model reload?

A change in context size means a change in the processing environment, so the model is reloaded.
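
The fixed context size is visible in the `starting llama server` line of the log (`--ctx-size 8192`); a runner started for one context size cannot serve a request asking for another. If a single context size is acceptable, one hedged workaround is to bake `num_ctx` into a derived model so repeated requests reuse the same runner (the derived model name below is illustrative):

```shell
# Fix the context size in a Modelfile so it never varies per request.
cat > Modelfile <<'EOF'
FROM deepseek-r1:32b
PARAMETER num_ctx 16384
EOF

# Create the derived model; all requests to it share one runner,
# so there is no reload between calls.
ollama create deepseek-r1-16k -f Modelfile
ollama run deepseek-r1-16k "hello"
```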

> Q2: Why the error `host pid is error! SET_TASK_PID FAILED.`?

HAMI-core is not part of ollama.
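
One hedged way to confirm where the message originates, assuming HAMi-core is injected via `LD_PRELOAD` (its usual mechanism), is to check the container environment and compare against a run without the hook (container names and the host port below are illustrative):

```shell
# If libvgpu.so appears in LD_PRELOAD, the HAMi-core hook is active
# inside the ollama container.
docker exec ollama sh -c 'echo "$LD_PRELOAD"'

# Running the stock image without the HAMi injection should make the
# "host pid is error" / "SET_TASK_PID FAILED" lines disappear, which
# would place the problem in the vGPU layer rather than in ollama.
docker run -d --gpus=all -v ollama:/root/.ollama -p 11435:11434 \
  --name ollama-nohook ollama/ollama
```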


@pdevine commented on GitHub (Mar 18, 2025):

I'm going to go ahead and close this as answered (thank you, @rick-github!)
