[GH-ISSUE #9260] I have 1.6T of space but the operation failed. Maybe it is an ollama allocation problem? #6034

Closed
opened 2026-04-12 17:22:20 -05:00 by GiteaMirror · 4 comments

Originally created by @darkSuperman on GitHub (Feb 21, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/9260

What is the issue?

I failed to run deepseek-r1:671b-q8_0 (713 GB): Error: llama runner process has terminated: error loading model: unable to allocate CUDA0 buffer. However, my machine has 1 TB of system memory and 640 GB of VRAM (8× A800 80 GB), about 1.6 TB in total, so in theory the model should fit if memory were allocated correctly.

I have previously created a model from a local Modelfile and started 671b-q4_K_M (404 GB) without any problems; it runs fully on the GPUs (100% GPU).
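
For reference, a minimal sketch of that Modelfile workflow (the derived model name and the parameter values below are illustrative, not the exact ones used):

```shell
# Derive a new model from an already-pulled tag (a local GGUF path
# also works after FROM) and override the defaults:
#   num_gpu = number of layers to offload to the GPU(s)
#   num_ctx = context window size in tokens
cat > Modelfile <<'EOF'
FROM deepseek-r1:671b-q8_0
PARAMETER num_gpu 62
PARAMETER num_ctx 2048
EOF

ollama create deepseek-r1-q8-custom -f Modelfile
ollama run deepseek-r1-q8-custom
```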

My guess is that the failure is caused by too many layers being offloaded to the GPUs combined with too large a context. However, I don't know how to set num_gpu and num_ctx the way I can with a local Modelfile, because I downloaded this model directly from Ollama with ollama run deepseek-r1:671b-q8_0.

I don't know how to change num_gpu and num_ctx in this case. Is there a way to do so, or is this a bug in Ollama itself?
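
For what it's worth, both parameters can also be overridden for a pulled model without writing a Modelfile, either interactively or per request through the API; a sketch with illustrative values:

```shell
# Interactively, inside `ollama run deepseek-r1:671b-q8_0`:
#   /set parameter num_gpu 62
#   /set parameter num_ctx 2048
# Or per request via the HTTP API:
curl http://localhost:11434/api/generate -d '{
  "model": "deepseek-r1:671b-q8_0",
  "prompt": "hello",
  "options": { "num_gpu": 62, "num_ctx": 2048 }
}'
```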

Relevant log output

Feb 21 01:40:26 paibo ollama[402476]: time=2025-02-21T01:40:26.917Z level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="1007.5 GiB" before.free="990.3 GiB" before.free_swap="8.0 GiB" now.total="1007.5 GiB" now.free="990.4 GiB" now.free_swap="8.0 GiB"
Feb 21 01:40:26 paibo ollama[402476]: initializing /usr/lib/x86_64-linux-gnu/libcuda.so.550.127.08
Feb 21 01:40:26 paibo ollama[402476]: dlsym: cuInit - 0x7843e667cbc0
Feb 21 01:40:26 paibo ollama[402476]: dlsym: cuDriverGetVersion - 0x7843e667cbe0
Feb 21 01:40:26 paibo ollama[402476]: dlsym: cuDeviceGetCount - 0x7843e667cc20
Feb 21 01:40:26 paibo ollama[402476]: dlsym: cuDeviceGet - 0x7843e667cc00
Feb 21 01:40:26 paibo ollama[402476]: dlsym: cuDeviceGetAttribute - 0x7843e667cd00
Feb 21 01:40:26 paibo ollama[402476]: dlsym: cuDeviceGetUuid - 0x7843e667cc60
Feb 21 01:40:26 paibo ollama[402476]: dlsym: cuDeviceGetName - 0x7843e667cc40
Feb 21 01:40:26 paibo ollama[402476]: dlsym: cuCtxCreate_v3 - 0x7843e667cee0
Feb 21 01:40:26 paibo ollama[402476]: dlsym: cuMemGetInfo_v2 - 0x7843e6686e20
Feb 21 01:40:26 paibo ollama[402476]: dlsym: cuCtxDestroy - 0x7843e66e1850
Feb 21 01:40:26 paibo ollama[402476]: calling cuInit
Feb 21 01:40:26 paibo ollama[402476]: calling cuDriverGetVersion
Feb 21 01:40:26 paibo ollama[402476]: raw version 0x2f08
Feb 21 01:40:26 paibo ollama[402476]: CUDA driver version: 12.4
Feb 21 01:40:26 paibo ollama[402476]: calling cuDeviceGetCount
Feb 21 01:40:26 paibo ollama[402476]: device count 8
Feb 21 01:40:27 paibo ollama[402476]: time=2025-02-21T01:40:27.875Z level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-3bcbe689-683f-8046-f5ca-7636b2697113 name="NVIDIA A800-SXM4-80GB" overhead="0 B" before.total="79.3 GiB" before.free="78.9 GiB" now.total="79.3 GiB" now.free="78.9 GiB" now.used="420.8 MiB"
Feb 21 01:40:27 paibo ollama[402476]: time=2025-02-21T01:40:27.989Z level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-ee81c063-67b5-367c-c0f5-cb70bafbcc4b name="NVIDIA A800-SXM4-80GB" overhead="0 B" before.total="79.3 GiB" before.free="78.9 GiB" now.total="79.3 GiB" now.free="78.9 GiB" now.used="420.8 MiB"
Feb 21 01:40:28 paibo ollama[402476]: time=2025-02-21T01:40:28.098Z level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-8f579a79-1630-5670-9247-c3b5122c6b4a name="NVIDIA A800-SXM4-80GB" overhead="0 B" before.total="79.3 GiB" before.free="78.9 GiB" now.total="79.3 GiB" now.free="78.9 GiB" now.used="420.8 MiB"
Feb 21 01:40:28 paibo ollama[402476]: time=2025-02-21T01:40:28.207Z level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-824656ea-21a7-3cf2-1c68-7fdafad7be61 name="NVIDIA A800-SXM4-80GB" overhead="0 B" before.total="79.3 GiB" before.free="78.9 GiB" now.total="79.3 GiB" now.free="78.9 GiB" now.used="420.8 MiB"
Feb 21 01:40:28 paibo ollama[402476]: time=2025-02-21T01:40:28.317Z level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-1401cc7b-d569-d4dd-7612-a61597c7c219 name="NVIDIA A800-SXM4-80GB" overhead="0 B" before.total="79.3 GiB" before.free="78.9 GiB" now.total="79.3 GiB" now.free="78.9 GiB" now.used="420.8 MiB"
Feb 21 01:40:28 paibo ollama[402476]: time=2025-02-21T01:40:28.427Z level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-6bea94dc-d4e2-5eec-dfac-bb80e23b972d name="NVIDIA A800-SXM4-80GB" overhead="0 B" before.total="79.3 GiB" before.free="78.9 GiB" now.total="79.3 GiB" now.free="78.9 GiB" now.used="420.8 MiB"
Feb 21 01:40:28 paibo ollama[402476]: time=2025-02-21T01:40:28.535Z level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-17189970-4906-3e07-39c5-ee5efa82f16c name="NVIDIA A800-SXM4-80GB" overhead="0 B" before.total="79.3 GiB" before.free="78.9 GiB" now.total="79.3 GiB" now.free="78.9 GiB" now.used="420.8 MiB"
Feb 21 01:40:28 paibo ollama[402476]: time=2025-02-21T01:40:28.644Z level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-d2f47573-5803-99f3-24d4-b1d3f50d967d name="NVIDIA A800-SXM4-80GB" overhead="0 B" before.total="79.3 GiB" before.free="78.9 GiB" now.total="79.3 GiB" now.free="78.9 GiB" now.used="420.8 MiB"
Feb 21 01:40:28 paibo ollama[402476]: releasing cuda driver library
Feb 21 01:40:28 paibo ollama[402476]: time=2025-02-21T01:40:28.687Z level=DEBUG source=sched.go:224 msg="loading first model" model=/usr/share/ollama/.ollama/models/blobs/sha256-b74fcbabbb0836b765a011150b96e24ff3937bc46ee4a820e88877893dbe9a63
Feb 21 01:40:28 paibo ollama[402476]: time=2025-02-21T01:40:28.687Z level=DEBUG source=memory.go:107 msg=evaluating library=cuda gpu_count=1 available="[78.9 GiB]"
Feb 21 01:40:28 paibo ollama[402476]: time=2025-02-21T01:40:28.691Z level=DEBUG source=memory.go:107 msg=evaluating library=cuda gpu_count=1 available="[78.9 GiB]"
Feb 21 01:40:28 paibo ollama[402476]: time=2025-02-21T01:40:28.695Z level=DEBUG source=memory.go:107 msg=evaluating library=cuda gpu_count=1 available="[78.9 GiB]"
Feb 21 01:40:28 paibo ollama[402476]: time=2025-02-21T01:40:28.698Z level=DEBUG source=memory.go:107 msg=evaluating library=cuda gpu_count=1 available="[78.9 GiB]"
Feb 21 01:40:28 paibo ollama[402476]: time=2025-02-21T01:40:28.702Z level=DEBUG source=memory.go:107 msg=evaluating library=cuda gpu_count=1 available="[78.9 GiB]"
Feb 21 01:40:28 paibo ollama[402476]: time=2025-02-21T01:40:28.706Z level=DEBUG source=memory.go:107 msg=evaluating library=cuda gpu_count=1 available="[78.9 GiB]"
Feb 21 01:40:28 paibo ollama[402476]: time=2025-02-21T01:40:28.709Z level=DEBUG source=memory.go:107 msg=evaluating library=cuda gpu_count=1 available="[78.9 GiB]"
Feb 21 01:40:28 paibo ollama[402476]: time=2025-02-21T01:40:28.712Z level=DEBUG source=memory.go:107 msg=evaluating library=cuda gpu_count=1 available="[78.9 GiB]"
Feb 21 01:40:28 paibo ollama[402476]: time=2025-02-21T01:40:28.716Z level=DEBUG source=memory.go:107 msg=evaluating library=cuda gpu_count=1 available="[78.9 GiB]"
Feb 21 01:40:28 paibo ollama[402476]: time=2025-02-21T01:40:28.719Z level=DEBUG source=memory.go:107 msg=evaluating library=cuda gpu_count=1 available="[78.9 GiB]"
Feb 21 01:40:28 paibo ollama[402476]: time=2025-02-21T01:40:28.723Z level=DEBUG source=memory.go:107 msg=evaluating library=cuda gpu_count=1 available="[78.9 GiB]"
Feb 21 01:40:28 paibo ollama[402476]: time=2025-02-21T01:40:28.726Z level=DEBUG source=memory.go:107 msg=evaluating library=cuda gpu_count=1 available="[78.9 GiB]"
Feb 21 01:40:28 paibo ollama[402476]: time=2025-02-21T01:40:28.728Z level=DEBUG source=memory.go:107 msg=evaluating library=cuda gpu_count=1 available="[78.9 GiB]"
Feb 21 01:40:28 paibo ollama[402476]: time=2025-02-21T01:40:28.730Z level=DEBUG source=memory.go:107 msg=evaluating library=cuda gpu_count=1 available="[78.9 GiB]"
Feb 21 01:40:28 paibo ollama[402476]: time=2025-02-21T01:40:28.733Z level=DEBUG source=memory.go:107 msg=evaluating library=cuda gpu_count=1 available="[78.9 GiB]"
Feb 21 01:40:28 paibo ollama[402476]: time=2025-02-21T01:40:28.735Z level=DEBUG source=memory.go:107 msg=evaluating library=cuda gpu_count=1 available="[78.9 GiB]"
Feb 21 01:40:28 paibo ollama[402476]: time=2025-02-21T01:40:28.738Z level=DEBUG source=memory.go:107 msg=evaluating library=cuda gpu_count=8 available="[78.9 GiB 78.9 GiB 78.9 GiB 78.9 GiB 78.9 GiB 78.9 GiB 78.9 GiB 78.9 GiB]"
Feb 21 01:40:28 paibo ollama[402476]: time=2025-02-21T01:40:28.740Z level=DEBUG source=memory.go:107 msg=evaluating library=cuda gpu_count=8 available="[78.9 GiB 78.9 GiB 78.9 GiB 78.9 GiB 78.9 GiB 78.9 GiB 78.9 GiB 78.9 GiB]"
Feb 21 01:40:28 paibo ollama[402476]: time=2025-02-21T01:40:28.743Z level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="1007.5 GiB" before.free="990.4 GiB" before.free_swap="8.0 GiB" now.total="1007.5 GiB" now.free="990.4 GiB" now.free_swap="8.0 GiB"
Feb 21 01:40:28 paibo ollama[402476]: initializing /usr/lib/x86_64-linux-gnu/libcuda.so.550.127.08
Feb 21 01:40:28 paibo ollama[402476]: dlsym: cuInit - 0x7843e667cbc0
Feb 21 01:40:28 paibo ollama[402476]: dlsym: cuDriverGetVersion - 0x7843e667cbe0
Feb 21 01:40:28 paibo ollama[402476]: dlsym: cuDeviceGetCount - 0x7843e667cc20
Feb 21 01:40:28 paibo ollama[402476]: dlsym: cuDeviceGet - 0x7843e667cc00
Feb 21 01:40:28 paibo ollama[402476]: dlsym: cuDeviceGetAttribute - 0x7843e667cd00
Feb 21 01:40:28 paibo ollama[402476]: dlsym: cuDeviceGetUuid - 0x7843e667cc60
Feb 21 01:40:28 paibo ollama[402476]: dlsym: cuDeviceGetName - 0x7843e667cc40
Feb 21 01:40:28 paibo ollama[402476]: dlsym: cuCtxCreate_v3 - 0x7843e667cee0
Feb 21 01:40:28 paibo ollama[402476]: dlsym: cuMemGetInfo_v2 - 0x7843e6686e20
Feb 21 01:40:28 paibo ollama[402476]: dlsym: cuCtxDestroy - 0x7843e66e1850
Feb 21 01:40:28 paibo ollama[402476]: calling cuInit
Feb 21 01:40:28 paibo ollama[402476]: calling cuDriverGetVersion
Feb 21 01:40:28 paibo ollama[402476]: raw version 0x2f08
Feb 21 01:40:28 paibo ollama[402476]: CUDA driver version: 12.4
Feb 21 01:40:28 paibo ollama[402476]: calling cuDeviceGetCount
Feb 21 01:40:28 paibo ollama[402476]: device count 8
Feb 21 01:40:28 paibo ollama[402476]: time=2025-02-21T01:40:28.880Z level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-3bcbe689-683f-8046-f5ca-7636b2697113 name="NVIDIA A800-SXM4-80GB" overhead="0 B" before.total="79.3 GiB" before.free="78.9 GiB" now.total="79.3 GiB" now.free="78.9 GiB" now.used="420.8 MiB"
Feb 21 01:40:28 paibo ollama[402476]: time=2025-02-21T01:40:28.988Z level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-ee81c063-67b5-367c-c0f5-cb70bafbcc4b name="NVIDIA A800-SXM4-80GB" overhead="0 B" before.total="79.3 GiB" before.free="78.9 GiB" now.total="79.3 GiB" now.free="78.9 GiB" now.used="420.8 MiB"
Feb 21 01:40:29 paibo ollama[402476]: time=2025-02-21T01:40:29.094Z level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-8f579a79-1630-5670-9247-c3b5122c6b4a name="NVIDIA A800-SXM4-80GB" overhead="0 B" before.total="79.3 GiB" before.free="78.9 GiB" now.total="79.3 GiB" now.free="78.9 GiB" now.used="420.8 MiB"
Feb 21 01:40:29 paibo ollama[402476]: time=2025-02-21T01:40:29.200Z level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-824656ea-21a7-3cf2-1c68-7fdafad7be61 name="NVIDIA A800-SXM4-80GB" overhead="0 B" before.total="79.3 GiB" before.free="78.9 GiB" now.total="79.3 GiB" now.free="78.9 GiB" now.used="420.8 MiB"
Feb 21 01:40:29 paibo ollama[402476]: time=2025-02-21T01:40:29.309Z level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-1401cc7b-d569-d4dd-7612-a61597c7c219 name="NVIDIA A800-SXM4-80GB" overhead="0 B" before.total="79.3 GiB" before.free="78.9 GiB" now.total="79.3 GiB" now.free="78.9 GiB" now.used="420.8 MiB"
Feb 21 01:40:29 paibo ollama[402476]: time=2025-02-21T01:40:29.415Z level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-6bea94dc-d4e2-5eec-dfac-bb80e23b972d name="NVIDIA A800-SXM4-80GB" overhead="0 B" before.total="79.3 GiB" before.free="78.9 GiB" now.total="79.3 GiB" now.free="78.9 GiB" now.used="420.8 MiB"
Feb 21 01:40:29 paibo ollama[402476]: time=2025-02-21T01:40:29.520Z level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-17189970-4906-3e07-39c5-ee5efa82f16c name="NVIDIA A800-SXM4-80GB" overhead="0 B" before.total="79.3 GiB" before.free="78.9 GiB" now.total="79.3 GiB" now.free="78.9 GiB" now.used="420.8 MiB"
Feb 21 01:40:29 paibo ollama[402476]: time=2025-02-21T01:40:29.626Z level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-d2f47573-5803-99f3-24d4-b1d3f50d967d name="NVIDIA A800-SXM4-80GB" overhead="0 B" before.total="79.3 GiB" before.free="78.9 GiB" now.total="79.3 GiB" now.free="78.9 GiB" now.used="420.8 MiB"
Feb 21 01:40:29 paibo ollama[402476]: releasing cuda driver library
Feb 21 01:40:29 paibo ollama[402476]: time=2025-02-21T01:40:29.626Z level=INFO source=server.go:100 msg="system memory" total="1007.5 GiB" free="990.4 GiB" free_swap="8.0 GiB"
Feb 21 01:40:29 paibo ollama[402476]: time=2025-02-21T01:40:29.626Z level=DEBUG source=memory.go:107 msg=evaluating library=cuda gpu_count=8 available="[78.9 GiB 78.9 GiB 78.9 GiB 78.9 GiB 78.9 GiB 78.9 GiB 78.9 GiB 78.9 GiB]"
Feb 21 01:40:29 paibo ollama[402476]: time=2025-02-21T01:40:29.628Z level=INFO source=memory.go:356 msg="offload to cuda" layers.requested=-1 layers.model=62 layers.offload=51 layers.split=7,7,7,6,6,6,6,6 memory.available="[78.9 GiB 78.9 GiB 78.9 GiB 78.9 GiB 78.9 GiB 78.9 GiB 78.9 GiB 78.9 GiB]" memory.gpu_overhead="0 B" memory.required.full="690.3 GiB" memory.required.partial="573.9 GiB" memory.required.kv="9.5 GiB" memory.required.allocations="[72.2 GiB 72.2 GiB 72.2 GiB 71.5 GiB 71.5 GiB 71.5 GiB 71.5 GiB 71.5 GiB]" memory.weights.total="672.0 GiB" memory.weights.repeating="671.1 GiB" memory.weights.nonrepeating="939.0 MiB" memory.graph.full="1019.5 MiB" memory.graph.partial="1019.5 MiB"
Feb 21 01:40:29 paibo ollama[402476]: time=2025-02-21T01:40:29.629Z level=DEBUG source=server.go:262 msg="compatible gpu libraries" compatible="[cuda_v12 cuda_v11]"
Feb 21 01:40:29 paibo ollama[402476]: time=2025-02-21T01:40:29.629Z level=DEBUG source=server.go:302 msg="adding gpu library" path=/usr/local/lib/ollama/cuda_v12
Feb 21 01:40:29 paibo ollama[402476]: time=2025-02-21T01:40:29.629Z level=DEBUG source=server.go:310 msg="adding gpu dependency paths" paths=[/usr/local/lib/ollama/cuda_v12]
Feb 21 01:40:29 paibo ollama[402476]: time=2025-02-21T01:40:29.629Z level=INFO source=server.go:380 msg="starting llama server" cmd="/usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-b74fcbabbb0836b765a011150b96e24ff3937bc46ee4a820e88877893dbe9a63 --ctx-size 2048 --batch-size 512 --n-gpu-layers 51 --verbose --threads 64 --parallel 1 --tensor-split 7,7,7,6,6,6,6,6 --port 33193"
Feb 21 01:40:29 paibo ollama[402476]: time=2025-02-21T01:40:29.629Z level=DEBUG source=server.go:398 msg=subprocess environment="[PATH=/home/paibo/anaconda3/bin:/home/paibo/anaconda3/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin LD_LIBRARY_PATH=/usr/local/lib/ollama/cuda_v12:/usr/local/lib/ollama/cuda_v12:/usr/local/lib/ollama CUDA_VISIBLE_DEVICES=GPU-3bcbe689-683f-8046-f5ca-7636b2697113,GPU-ee81c063-67b5-367c-c0f5-cb70bafbcc4b,GPU-8f579a79-1630-5670-9247-c3b5122c6b4a,GPU-824656ea-21a7-3cf2-1c68-7fdafad7be61,GPU-1401cc7b-d569-d4dd-7612-a61597c7c219,GPU-6bea94dc-d4e2-5eec-dfac-bb80e23b972d,GPU-17189970-4906-3e07-39c5-ee5efa82f16c,GPU-d2f47573-5803-99f3-24d4-b1d3f50d967d]"
Feb 21 01:40:29 paibo ollama[402476]: time=2025-02-21T01:40:29.630Z level=INFO source=sched.go:449 msg="loaded runners" count=1
Feb 21 01:40:29 paibo ollama[402476]: time=2025-02-21T01:40:29.630Z level=INFO source=server.go:557 msg="waiting for llama runner to start responding"
Feb 21 01:40:29 paibo ollama[402476]: time=2025-02-21T01:40:29.630Z level=INFO source=server.go:591 msg="waiting for server to become available" status="llm server error"
Feb 21 01:40:29 paibo ollama[402476]: time=2025-02-21T01:40:29.649Z level=INFO source=runner.go:936 msg="starting go runner"
Feb 21 01:40:29 paibo ollama[402476]: time=2025-02-21T01:40:29.649Z level=INFO source=runner.go:937 msg=system info="CPU : LLAMAFILE = 1 | CPU : LLAMAFILE = 1 | cgo(gcc)" threads=64
Feb 21 01:40:29 paibo ollama[402476]: time=2025-02-21T01:40:29.649Z level=DEBUG source=ggml.go:89 msg="ggml backend load all from path" path=/usr/local/lib/ollama/cuda_v12
Feb 21 01:40:29 paibo ollama[402476]: time=2025-02-21T01:40:29.650Z level=INFO source=runner.go:995 msg="Server listening on 127.0.0.1:33193"
Feb 21 01:40:29 paibo ollama[402476]: ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
Feb 21 01:40:29 paibo ollama[402476]: ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
Feb 21 01:40:29 paibo ollama[402476]: ggml_cuda_init: found 8 CUDA devices:
Feb 21 01:40:29 paibo ollama[402476]:   Device 0: NVIDIA A800-SXM4-80GB, compute capability 8.0, VMM: yes
Feb 21 01:40:29 paibo ollama[402476]:   Device 1: NVIDIA A800-SXM4-80GB, compute capability 8.0, VMM: yes
Feb 21 01:40:29 paibo ollama[402476]:   Device 2: NVIDIA A800-SXM4-80GB, compute capability 8.0, VMM: yes
Feb 21 01:40:29 paibo ollama[402476]:   Device 3: NVIDIA A800-SXM4-80GB, compute capability 8.0, VMM: yes
Feb 21 01:40:29 paibo ollama[402476]:   Device 4: NVIDIA A800-SXM4-80GB, compute capability 8.0, VMM: yes
Feb 21 01:40:29 paibo ollama[402476]:   Device 5: NVIDIA A800-SXM4-80GB, compute capability 8.0, VMM: yes
Feb 21 01:40:29 paibo ollama[402476]:   Device 6: NVIDIA A800-SXM4-80GB, compute capability 8.0, VMM: yes
Feb 21 01:40:29 paibo ollama[402476]:   Device 7: NVIDIA A800-SXM4-80GB, compute capability 8.0, VMM: yes
Feb 21 01:40:29 paibo ollama[402476]: time=2025-02-21T01:40:29.882Z level=INFO source=server.go:591 msg="waiting for server to become available" status="llm server loading model"
Feb 21 01:40:31 paibo ollama[402476]: load_backend: loaded CUDA backend from /usr/local/lib/ollama/cuda_v12/libggml-cuda.so
Feb 21 01:40:31 paibo ollama[402476]: time=2025-02-21T01:40:31.124Z level=DEBUG source=ggml.go:89 msg="ggml backend load all from path" path=/usr/local/lib/ollama
Feb 21 01:40:31 paibo ollama[402476]: ggml_backend_load_best: /usr/local/lib/ollama/libggml-cpu-haswell.so score: 55
Feb 21 01:40:31 paibo ollama[402476]: ggml_backend_load_best: /usr/local/lib/ollama/libggml-cpu-icelake.so score: 1463
Feb 21 01:40:31 paibo ollama[402476]: ggml_backend_load_best: /usr/local/lib/ollama/libggml-cpu-alderlake.so score: 0
Feb 21 01:40:31 paibo ollama[402476]: ggml_backend_load_best: /usr/local/lib/ollama/libggml-cpu-sapphirerapids.so score: 0
Feb 21 01:40:31 paibo ollama[402476]: ggml_backend_load_best: /usr/local/lib/ollama/libggml-cpu-sandybridge.so score: 20
Feb 21 01:40:31 paibo ollama[402476]: ggml_backend_load_best: /usr/local/lib/ollama/libggml-cpu-skylakex.so score: 183
Feb 21 01:40:31 paibo ollama[402476]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-icelake.so
Feb 21 01:40:31 paibo ollama[402476]: llama_load_model_from_file: using device CUDA0 (NVIDIA A800-SXM4-80GB) - 80801 MiB free
Feb 21 01:40:31 paibo ollama[402476]: llama_load_model_from_file: using device CUDA1 (NVIDIA A800-SXM4-80GB) - 80801 MiB free
Feb 21 01:40:31 paibo ollama[402476]: llama_load_model_from_file: using device CUDA2 (NVIDIA A800-SXM4-80GB) - 80801 MiB free
Feb 21 01:40:31 paibo ollama[402476]: llama_load_model_from_file: using device CUDA3 (NVIDIA A800-SXM4-80GB) - 80801 MiB free
Feb 21 01:40:31 paibo ollama[402476]: llama_load_model_from_file: using device CUDA4 (NVIDIA A800-SXM4-80GB) - 80801 MiB free
Feb 21 01:40:31 paibo ollama[402476]: llama_load_model_from_file: using device CUDA5 (NVIDIA A800-SXM4-80GB) - 80801 MiB free
Feb 21 01:40:31 paibo ollama[402476]: llama_load_model_from_file: using device CUDA6 (NVIDIA A800-SXM4-80GB) - 80801 MiB free
Feb 21 01:40:31 paibo ollama[402476]: llama_load_model_from_file: using device CUDA7 (NVIDIA A800-SXM4-80GB) - 80801 MiB free
Feb 21 01:40:31 paibo ollama[402476]: llama_model_loader: loaded meta data with 42 key-value pairs and 1025 tensors from /usr/share/ollama/.ollama/models/blobs/sha256-b74fcbabbb0836b765a011150b96e24ff3937bc46ee4a820e88877893dbe9a63 (version GGUF V3 (latest))
Feb 21 01:40:31 paibo ollama[402476]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
Feb 21 01:40:31 paibo ollama[402476]: llama_model_loader: - kv   0:                       general.architecture str              = deepseek2
Feb 21 01:40:31 paibo ollama[402476]: llama_model_loader: - kv   1:                               general.type str              = model
Feb 21 01:40:31 paibo ollama[402476]: llama_model_loader: - kv   2:                         general.size_label str              = 256x20B
Feb 21 01:40:31 paibo ollama[402476]: llama_model_loader: - kv   3:                      deepseek2.block_count u32              = 61
Feb 21 01:40:31 paibo ollama[402476]: llama_model_loader: - kv   4:                   deepseek2.context_length u32              = 163840
Feb 21 01:40:31 paibo ollama[402476]: llama_model_loader: - kv   5:                 deepseek2.embedding_length u32              = 7168
Feb 21 01:40:31 paibo ollama[402476]: llama_model_loader: - kv   6:              deepseek2.feed_forward_length u32              = 18432
Feb 21 01:40:31 paibo ollama[402476]: llama_model_loader: - kv   7:             deepseek2.attention.head_count u32              = 128
Feb 21 01:40:31 paibo ollama[402476]: llama_model_loader: - kv   8:          deepseek2.attention.head_count_kv u32              = 128
Feb 21 01:40:31 paibo ollama[402476]: llama_model_loader: - kv   9:                   deepseek2.rope.freq_base f32              = 10000.000000
Feb 21 01:40:31 paibo ollama[402476]: llama_model_loader: - kv  10: deepseek2.attention.layer_norm_rms_epsilon f32              = 0.000001
Feb 21 01:40:31 paibo ollama[402476]: llama_model_loader: - kv  11:                deepseek2.expert_used_count u32              = 8
Feb 21 01:40:31 paibo ollama[402476]: llama_model_loader: - kv  12:        deepseek2.leading_dense_block_count u32              = 3
Feb 21 01:40:31 paibo ollama[402476]: llama_model_loader: - kv  13:                       deepseek2.vocab_size u32              = 129280
Feb 21 01:40:31 paibo ollama[402476]: llama_model_loader: - kv  14:            deepseek2.attention.q_lora_rank u32              = 1536
Feb 21 01:40:31 paibo ollama[402476]: llama_model_loader: - kv  15:           deepseek2.attention.kv_lora_rank u32              = 512
Feb 21 01:40:31 paibo ollama[402476]: llama_model_loader: - kv  16:             deepseek2.attention.key_length u32              = 192
Feb 21 01:40:31 paibo ollama[402476]: llama_model_loader: - kv  17:           deepseek2.attention.value_length u32              = 128
Feb 21 01:40:31 paibo ollama[402476]: llama_model_loader: - kv  18:       deepseek2.expert_feed_forward_length u32              = 2048
Feb 21 01:40:31 paibo ollama[402476]: llama_model_loader: - kv  19:                     deepseek2.expert_count u32              = 256
Feb 21 01:40:31 paibo ollama[402476]: llama_model_loader: - kv  20:              deepseek2.expert_shared_count u32              = 1
Feb 21 01:40:31 paibo ollama[402476]: llama_model_loader: - kv  21:             deepseek2.expert_weights_scale f32              = 2.500000
Feb 21 01:40:31 paibo ollama[402476]: llama_model_loader: - kv  22:              deepseek2.expert_weights_norm bool             = true
Feb 21 01:40:31 paibo ollama[402476]: llama_model_loader: - kv  23:               deepseek2.expert_gating_func u32              = 2
Feb 21 01:40:31 paibo ollama[402476]: llama_model_loader: - kv  24:             deepseek2.rope.dimension_count u32              = 64
Feb 21 01:40:31 paibo ollama[402476]: llama_model_loader: - kv  25:                deepseek2.rope.scaling.type str              = yarn
Feb 21 01:40:31 paibo ollama[402476]: llama_model_loader: - kv  26:              deepseek2.rope.scaling.factor f32              = 40.000000
Feb 21 01:40:31 paibo ollama[402476]: llama_model_loader: - kv  27: deepseek2.rope.scaling.original_context_length u32              = 4096
Feb 21 01:40:31 paibo ollama[402476]: llama_model_loader: - kv  28: deepseek2.rope.scaling.yarn_log_multiplier f32              = 0.100000
Feb 21 01:40:31 paibo ollama[402476]: llama_model_loader: - kv  29:                       tokenizer.ggml.model str              = gpt2
Feb 21 01:40:31 paibo ollama[402476]: llama_model_loader: - kv  30:                         tokenizer.ggml.pre str              = deepseek-v3
Feb 21 01:40:31 paibo ollama[402476]: [132B blob data]
Feb 21 01:40:31 paibo ollama[402476]: llama_model_loader: - kv  32:                  tokenizer.ggml.token_type arr[i32,129280]  = [3, 3, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
Feb 21 01:40:31 paibo ollama[402476]: llama_model_loader: - kv  33:                      tokenizer.ggml.merges arr[str,127741]  = ["Ġ t", "Ġ a", "i n", "Ġ Ġ", "h e...
Feb 21 01:40:31 paibo ollama[402476]: llama_model_loader: - kv  34:                tokenizer.ggml.bos_token_id u32              = 0
Feb 21 01:40:31 paibo ollama[402476]: llama_model_loader: - kv  35:                tokenizer.ggml.eos_token_id u32              = 1
Feb 21 01:40:31 paibo ollama[402476]: llama_model_loader: - kv  36:            tokenizer.ggml.padding_token_id u32              = 1
Feb 21 01:40:31 paibo ollama[402476]: llama_model_loader: - kv  37:               tokenizer.ggml.add_bos_token bool             = true
Feb 21 01:40:31 paibo ollama[402476]: llama_model_loader: - kv  38:               tokenizer.ggml.add_eos_token bool             = false
Feb 21 01:40:31 paibo ollama[402476]: llama_model_loader: - kv  39:                    tokenizer.chat_template str              = {% if not add_generation_prompt is de...
Feb 21 01:40:31 paibo ollama[402476]: llama_model_loader: - kv  40:               general.quantization_version u32              = 2
Feb 21 01:40:31 paibo ollama[402476]: llama_model_loader: - kv  41:                          general.file_type u32              = 7
Feb 21 01:40:31 paibo ollama[402476]: llama_model_loader: - type  f32:  361 tensors
Feb 21 01:40:31 paibo ollama[402476]: llama_model_loader: - type q8_0:  664 tensors
Feb 21 01:40:31 paibo ollama[402476]: llm_load_vocab: control token: 128813 '<|tool▁output▁end|>' is not marked as EOG
Feb 21 01:40:31 paibo ollama[402476]: llm_load_vocab: control token: 128812 '<|tool▁output▁begin|>' is not marked as EOG
Feb 21 01:40:31 paibo ollama[402476]: llm_load_vocab: control token: 128811 '<|tool▁outputs▁end|>' is not marked as EOG
Feb 21 01:40:31 paibo ollama[402476]: llm_load_vocab: control token: 128810 '<|tool▁outputs▁begin|>' is not marked as EOG
Feb 21 01:40:31 paibo ollama[402476]: llm_load_vocab: control token: 128808 '<|tool▁call▁begin|>' is not marked as EOG
Feb 21 01:40:31 paibo ollama[402476]: llm_load_vocab: control token: 128804 '<|Assistant|>' is not marked as EOG
Feb 21 01:40:31 paibo ollama[402476]: llm_load_vocab: control token: 128803 '<|User|>' is not marked as EOG
Feb 21 01:40:31 paibo ollama[402476]: llm_load_vocab: control token: 128795 '<|place▁holder▁no▁795|>' is not marked as EOG
......
Feb 21 01:40:31 paibo ollama[402476]: llm_load_vocab: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
Feb 21 01:40:31 paibo ollama[402476]: llm_load_vocab: special tokens cache size = 818
Feb 21 01:40:31 paibo ollama[402476]: llm_load_vocab: token to piece cache size = 0.8223 MB
Feb 21 01:40:31 paibo ollama[402476]: llm_load_print_meta: format           = GGUF V3 (latest)
Feb 21 01:40:31 paibo ollama[402476]: llm_load_print_meta: arch             = deepseek2
Feb 21 01:40:31 paibo ollama[402476]: llm_load_print_meta: vocab type       = BPE
Feb 21 01:40:31 paibo ollama[402476]: llm_load_print_meta: n_vocab          = 129280
Feb 21 01:40:31 paibo ollama[402476]: llm_load_print_meta: n_merges         = 127741
Feb 21 01:40:31 paibo ollama[402476]: llm_load_print_meta: vocab_only       = 0
Feb 21 01:40:31 paibo ollama[402476]: llm_load_print_meta: n_ctx_train      = 163840
Feb 21 01:40:31 paibo ollama[402476]: llm_load_print_meta: n_embd           = 7168
Feb 21 01:40:31 paibo ollama[402476]: llm_load_print_meta: n_layer          = 61
Feb 21 01:40:31 paibo ollama[402476]: llm_load_print_meta: n_head           = 128
Feb 21 01:40:31 paibo ollama[402476]: llm_load_print_meta: n_head_kv        = 128
Feb 21 01:40:31 paibo ollama[402476]: llm_load_print_meta: n_rot            = 64
Feb 21 01:40:31 paibo ollama[402476]: llm_load_print_meta: n_swa            = 0
Feb 21 01:40:31 paibo ollama[402476]: llm_load_print_meta: n_embd_head_k    = 192
Feb 21 01:40:31 paibo ollama[402476]: llm_load_print_meta: n_embd_head_v    = 128
Feb 21 01:40:31 paibo ollama[402476]: llm_load_print_meta: n_gqa            = 1
Feb 21 01:40:31 paibo ollama[402476]: llm_load_print_meta: n_embd_k_gqa     = 24576
Feb 21 01:40:31 paibo ollama[402476]: llm_load_print_meta: n_embd_v_gqa     = 16384
Feb 21 01:40:31 paibo ollama[402476]: llm_load_print_meta: f_norm_eps       = 0.0e+00
Feb 21 01:40:31 paibo ollama[402476]: llm_load_print_meta: f_norm_rms_eps   = 1.0e-06
Feb 21 01:40:31 paibo ollama[402476]: llm_load_print_meta: f_clamp_kqv      = 0.0e+00
Feb 21 01:40:31 paibo ollama[402476]: llm_load_print_meta: f_max_alibi_bias = 0.0e+00
Feb 21 01:40:31 paibo ollama[402476]: llm_load_print_meta: f_logit_scale    = 0.0e+00
Feb 21 01:40:31 paibo ollama[402476]: llm_load_print_meta: n_ff             = 18432
Feb 21 01:40:31 paibo ollama[402476]: llm_load_print_meta: n_expert         = 256
Feb 21 01:40:31 paibo ollama[402476]: llm_load_print_meta: n_expert_used    = 8
Feb 21 01:40:31 paibo ollama[402476]: llm_load_print_meta: causal attn      = 1
Feb 21 01:40:31 paibo ollama[402476]: llm_load_print_meta: pooling type     = 0
Feb 21 01:40:31 paibo ollama[402476]: llm_load_print_meta: rope type        = 0
Feb 21 01:40:31 paibo ollama[402476]: llm_load_print_meta: rope scaling     = yarn
Feb 21 01:40:31 paibo ollama[402476]: llm_load_print_meta: freq_base_train  = 10000.0
Feb 21 01:40:31 paibo ollama[402476]: llm_load_print_meta: freq_scale_train = 0.025
Feb 21 01:40:31 paibo ollama[402476]: llm_load_print_meta: n_ctx_orig_yarn  = 4096
Feb 21 01:40:31 paibo ollama[402476]: llm_load_print_meta: rope_finetuned   = unknown
Feb 21 01:40:31 paibo ollama[402476]: llm_load_print_meta: ssm_d_conv       = 0
Feb 21 01:40:31 paibo ollama[402476]: llm_load_print_meta: ssm_d_inner      = 0
Feb 21 01:40:31 paibo ollama[402476]: llm_load_print_meta: ssm_d_state      = 0
Feb 21 01:40:31 paibo ollama[402476]: llm_load_print_meta: ssm_dt_rank      = 0
Feb 21 01:40:31 paibo ollama[402476]: llm_load_print_meta: ssm_dt_b_c_rms   = 0
Feb 21 01:40:31 paibo ollama[402476]: llm_load_print_meta: model type       = 671B
Feb 21 01:40:31 paibo ollama[402476]: llm_load_print_meta: model ftype      = Q8_0
Feb 21 01:40:31 paibo ollama[402476]: llm_load_print_meta: model params     = 671.03 B
Feb 21 01:40:31 paibo ollama[402476]: llm_load_print_meta: model size       = 664.29 GiB (8.50 BPW)
Feb 21 01:40:31 paibo ollama[402476]: llm_load_print_meta: general.name     = n/a
Feb 21 01:40:31 paibo ollama[402476]: llm_load_print_meta: BOS token        = 0 '<|begin▁of▁sentence|>'
Feb 21 01:40:31 paibo ollama[402476]: llm_load_print_meta: EOS token        = 1 '<|end▁of▁sentence|>'
Feb 21 01:40:31 paibo ollama[402476]: llm_load_print_meta: EOT token        = 1 '<|end▁of▁sentence|>'
Feb 21 01:40:31 paibo ollama[402476]: llm_load_print_meta: PAD token        = 1 '<|end▁of▁sentence|>'
Feb 21 01:40:31 paibo ollama[402476]: llm_load_print_meta: LF token         = 131 'Ä'
Feb 21 01:40:31 paibo ollama[402476]: llm_load_print_meta: FIM PRE token    = 128801 '<|fim▁begin|>'
Feb 21 01:40:31 paibo ollama[402476]: llm_load_print_meta: FIM SUF token    = 128800 '<|fim▁hole|>'
Feb 21 01:40:31 paibo ollama[402476]: llm_load_print_meta: FIM MID token    = 128802 '<|fim▁end|>'
Feb 21 01:40:31 paibo ollama[402476]: llm_load_print_meta: EOG token        = 1 '<|end▁of▁sentence|>'
Feb 21 01:40:31 paibo ollama[402476]: llm_load_print_meta: max token length = 256
Feb 21 01:40:31 paibo ollama[402476]: llm_load_print_meta: n_layer_dense_lead   = 3
Feb 21 01:40:31 paibo ollama[402476]: llm_load_print_meta: n_lora_q             = 1536
Feb 21 01:40:31 paibo ollama[402476]: llm_load_print_meta: n_lora_kv            = 512
Feb 21 01:40:31 paibo ollama[402476]: llm_load_print_meta: n_ff_exp             = 2048
Feb 21 01:40:31 paibo ollama[402476]: llm_load_print_meta: n_expert_shared      = 1
Feb 21 01:40:31 paibo ollama[402476]: llm_load_print_meta: expert_weights_scale = 2.5
Feb 21 01:40:31 paibo ollama[402476]: llm_load_print_meta: expert_weights_norm  = 1
Feb 21 01:40:31 paibo ollama[402476]: llm_load_print_meta: expert_gating_func   = sigmoid
Feb 21 01:40:31 paibo ollama[402476]: llm_load_print_meta: rope_yarn_log_mul    = 0.1000
Feb 21 01:40:31 paibo ollama[402476]: llm_load_tensors: tensor 'token_embd.weight' (q8_0) (and 157 others) cannot be used with preferred buffer type CUDA_Host, using CPU instead
Feb 21 01:40:41 paibo ollama[402476]: time=2025-02-21T01:40:41.624Z level=INFO source=server.go:591 msg="waiting for server to become available" status="llm server not responding"
Feb 21 01:40:41 paibo ollama[402476]: time=2025-02-21T01:40:41.881Z level=INFO source=server.go:591 msg="waiting for server to become available" status="llm server loading model"
Feb 21 01:40:44 paibo ollama[402476]: time=2025-02-21T01:40:44.088Z level=INFO source=server.go:591 msg="waiting for server to become available" status="llm server not responding"
Feb 21 01:40:44 paibo ollama[402476]: time=2025-02-21T01:40:44.347Z level=INFO source=server.go:591 msg="waiting for server to become available" status="llm server loading model"
Feb 21 01:40:45 paibo ollama[402476]: time=2025-02-21T01:40:45.551Z level=INFO source=server.go:591 msg="waiting for server to become available" status="llm server not responding"
Feb 21 01:40:45 paibo ollama[402476]: time=2025-02-21T01:40:45.807Z level=INFO source=server.go:591 msg="waiting for server to become available" status="llm server loading model"
Feb 21 01:40:54 paibo ollama[402476]: time=2025-02-21T01:40:54.535Z level=INFO source=server.go:591 msg="waiting for server to become available" status="llm server not responding"
Feb 21 01:40:54 paibo ollama[402476]: time=2025-02-21T01:40:54.797Z level=INFO source=server.go:591 msg="waiting for server to become available" status="llm server loading model"
Feb 21 01:40:58 paibo ollama[402476]: time=2025-02-21T01:40:58.007Z level=INFO source=server.go:591 msg="waiting for server to become available" status="llm server not responding"
Feb 21 01:40:58 paibo ollama[402476]: time=2025-02-21T01:40:58.260Z level=INFO source=server.go:591 msg="waiting for server to become available" status="llm server loading model"
Feb 21 01:41:00 paibo ollama[402476]: time=2025-02-21T01:41:00.216Z level=INFO source=server.go:591 msg="waiting for server to become available" status="llm server not responding"
Feb 21 01:41:00 paibo ollama[402476]: time=2025-02-21T01:41:00.476Z level=INFO source=server.go:591 msg="waiting for server to become available" status="llm server loading model"
Feb 21 01:41:02 paibo ollama[402476]: ggml_backend_cuda_buffer_type_alloc_buffer: allocating 81656.95 MiB on device 0: cudaMalloc failed: out of memory
Feb 21 01:41:16 paibo ollama[402476]: llama_model_load: error loading model: unable to allocate CUDA0 buffer
Feb 21 01:41:16 paibo ollama[402476]: llama_load_model_from_file: failed to load model
Feb 21 01:41:16 paibo ollama[402476]: panic: unable to load model: /usr/share/ollama/.ollama/models/blobs/sha256-b74fcbabbb0836b765a011150b96e24ff3937bc46ee4a820e88877893dbe9a63
Feb 21 01:41:16 paibo ollama[402476]: goroutine 85 [running]:
Feb 21 01:41:16 paibo ollama[402476]: github.com/ollama/ollama/llama/runner.(*Server).loadModel(0xc000283710, {0x33, 0x0, 0x1, 0x0, {0xc000504a80, 0x8, 0x8}, 0xc00060cd50, 0x0}, ...)
Feb 21 01:41:16 paibo ollama[402476]:         github.com/ollama/ollama/llama/runner/runner.go:852 +0x3ad
Feb 21 01:41:16 paibo ollama[402476]: created by github.com/ollama/ollama/llama/runner.Execute in goroutine 1
Feb 21 01:41:16 paibo ollama[402476]:         github.com/ollama/ollama/llama/runner/runner.go:970 +0xd0d
Feb 21 01:41:16 paibo ollama[402476]: time=2025-02-21T01:41:16.784Z level=INFO source=server.go:591 msg="waiting for server to become available" status="llm server error"
Feb 21 01:41:17 paibo ollama[402476]: time=2025-02-21T01:41:17.078Z level=ERROR source=server.go:421 msg="llama runner terminated" error="exit status 2"
Feb 21 01:41:17 paibo ollama[402476]: time=2025-02-21T01:41:17.286Z level=ERROR source=sched.go:455 msg="error loading llama server" error="llama runner process has terminated: error loading model: unable to allocate CUDA0 buffer\nllama_load_model_from_file: failed to load model"
Feb 21 01:41:17 paibo ollama[402476]: time=2025-02-21T01:41:17.286Z level=DEBUG source=sched.go:458 msg="triggering expiration for failed load" model=/usr/share/ollama/.ollama/models/blobs/sha256-b74fcbabbb0836b765a011150b96e24ff3937bc46ee4a820e88877893dbe9a63
Feb 21 01:41:17 paibo ollama[402476]: time=2025-02-21T01:41:17.286Z level=DEBUG source=sched.go:360 msg="runner expired event received" modelPath=/usr/share/ollama/.ollama/models/blobs/sha256-b74fcbabbb0836b765a011150b96e24ff3937bc46ee4a820e88877893dbe9a63
Feb 21 01:41:17 paibo ollama[402476]: time=2025-02-21T01:41:17.286Z level=DEBUG source=sched.go:375 msg="got lock to unload" modelPath=/usr/share/ollama/.ollama/models/blobs/sha256-b74fcbabbb0836b765a011150b96e24ff3937bc46ee4a820e88877893dbe9a63
Feb 21 01:41:17 paibo ollama[402476]: [GIN] 2025/02/21 - 01:41:17 | 500 | 50.423645663s |       127.0.0.1 | POST     "/api/generate"
Feb 21 01:41:17 paibo ollama[402476]: time=2025-02-21T01:41:17.286Z level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="1007.5 GiB" before.free="990.4 GiB" before.free_swap="8.0 GiB" now.total="1007.5 GiB" now.free="990.3 GiB" now.free_swap="8.0 GiB"
Feb 21 01:41:17 paibo ollama[402476]: initializing /usr/lib/x86_64-linux-gnu/libcuda.so.550.127.08
Feb 21 01:41:17 paibo ollama[402476]: dlsym: cuInit - 0x7843e667cbc0
Feb 21 01:41:17 paibo ollama[402476]: dlsym: cuDriverGetVersion - 0x7843e667cbe0
Feb 21 01:41:17 paibo ollama[402476]: dlsym: cuDeviceGetCount - 0x7843e667cc20
Feb 21 01:41:17 paibo ollama[402476]: dlsym: cuDeviceGet - 0x7843e667cc00
Feb 21 01:41:17 paibo ollama[402476]: dlsym: cuDeviceGetAttribute - 0x7843e667cd00
Feb 21 01:41:17 paibo ollama[402476]: dlsym: cuDeviceGetUuid - 0x7843e667cc60
Feb 21 01:41:17 paibo ollama[402476]: dlsym: cuDeviceGetName - 0x7843e667cc40
Feb 21 01:41:17 paibo ollama[402476]: dlsym: cuCtxCreate_v3 - 0x7843e667cee0
Feb 21 01:41:17 paibo ollama[402476]: dlsym: cuMemGetInfo_v2 - 0x7843e6686e20
Feb 21 01:41:17 paibo ollama[402476]: dlsym: cuCtxDestroy - 0x7843e66e1850
Feb 21 01:41:17 paibo ollama[402476]: calling cuInit
Feb 21 01:41:17 paibo ollama[402476]: calling cuDriverGetVersion
Feb 21 01:41:17 paibo ollama[402476]: raw version 0x2f08
Feb 21 01:41:17 paibo ollama[402476]: CUDA driver version: 12.4
Feb 21 01:41:17 paibo ollama[402476]: calling cuDeviceGetCount
Feb 21 01:41:17 paibo ollama[402476]: device count 8
Feb 21 01:41:17 paibo ollama[402476]: time=2025-02-21T01:41:17.905Z level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-3bcbe689-683f-8046-f5ca-7636b2697113 name="NVIDIA A800-SXM4-80GB" overhead="0 B" before.total="79.3 GiB" before.free="78.9 GiB" now.total="79.3 GiB" now.free="78.9 GiB" now.used="420.8 MiB"
Feb 21 01:41:18 paibo ollama[402476]: time=2025-02-21T01:41:18.011Z level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-ee81c063-67b5-367c-c0f5-cb70bafbcc4b name="NVIDIA A800-SXM4-80GB" overhead="0 B" before.total="79.3 GiB" before.free="78.9 GiB" now.total="79.3 GiB" now.free="78.9 GiB" now.used="420.8 MiB"
Feb 21 01:41:18 paibo ollama[402476]: time=2025-02-21T01:41:18.117Z level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-8f579a79-1630-5670-9247-c3b5122c6b4a name="NVIDIA A800-SXM4-80GB" overhead="0 B" before.total="79.3 GiB" before.free="78.9 GiB" now.total="79.3 GiB" now.free="78.9 GiB" now.used="420.8 MiB"
Feb 21 01:41:18 paibo ollama[402476]: time=2025-02-21T01:41:18.224Z level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-824656ea-21a7-3cf2-1c68-7fdafad7be61 name="NVIDIA A800-SXM4-80GB" overhead="0 B" before.total="79.3 GiB" before.free="78.9 GiB" now.total="79.3 GiB" now.free="78.9 GiB" now.used="420.8 MiB"
Feb 21 01:41:18 paibo ollama[402476]: time=2025-02-21T01:41:18.331Z level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-1401cc7b-d569-d4dd-7612-a61597c7c219 name="NVIDIA A800-SXM4-80GB" overhead="0 B" before.total="79.3 GiB" before.free="78.9 GiB" now.total="79.3 GiB" now.free="78.9 GiB" now.used="420.8 MiB"
Feb 21 01:41:18 paibo ollama[402476]: time=2025-02-21T01:41:18.441Z level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-6bea94dc-d4e2-5eec-dfac-bb80e23b972d name="NVIDIA A800-SXM4-80GB" overhead="0 B" before.total="79.3 GiB" before.free="78.9 GiB" now.total="79.3 GiB" now.free="78.9 GiB" now.used="420.8 MiB"
Feb 21 01:41:18 paibo ollama[402476]: time=2025-02-21T01:41:18.547Z level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-17189970-4906-3e07-39c5-ee5efa82f16c name="NVIDIA A800-SXM4-80GB" overhead="0 B" before.total="79.3 GiB" before.free="78.9 GiB" now.total="79.3 GiB" now.free="78.9 GiB" now.used="420.8 MiB"
Feb 21 01:41:18 paibo ollama[402476]: time=2025-02-21T01:41:18.653Z level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-d2f47573-5803-99f3-24d4-b1d3f50d967d name="NVIDIA A800-SXM4-80GB" overhead="0 B" before.total="79.3 GiB" before.free="78.9 GiB" now.total="79.3 GiB" now.free="78.9 GiB" now.used="420.8 MiB"
Feb 21 01:41:18 paibo ollama[402476]: releasing cuda driver library
Feb 21 01:41:18 paibo ollama[402476]: time=2025-02-21T01:41:18.653Z level=DEBUG source=server.go:1081 msg="stopping llama server"
Feb 21 01:41:18 paibo ollama[402476]: time=2025-02-21T01:41:18.653Z level=DEBUG source=sched.go:380 msg="runner released" modelPath=/usr/share/ollama/.ollama/models/blobs/sha256-b74fcbabbb0836b765a011150b96e24ff3937bc46ee4a820e88877893dbe9a63
Feb 21 01:41:18 paibo ollama[402476]: time=2025-02-21T01:41:18.903Z level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="1007.5 GiB" before.free="990.3 GiB" before.free_swap="8.0 GiB" now.total="1007.5 GiB" now.free="990.3 GiB" now.free_swap="8.0 GiB"
Feb 21 01:41:18 paibo ollama[402476]: initializing /usr/lib/x86_64-linux-gnu/libcuda.so.550.127.08
Feb 21 01:41:18 paibo ollama[402476]: dlsym: cuInit - 0x7843e667cbc0
Feb 21 01:41:18 paibo ollama[402476]: dlsym: cuDriverGetVersion - 0x7843e667cbe0
Feb 21 01:41:18 paibo ollama[402476]: dlsym: cuDeviceGetCount - 0x7843e667cc20
Feb 21 01:41:18 paibo ollama[402476]: dlsym: cuDeviceGet - 0x7843e667cc00
Feb 21 01:41:18 paibo ollama[402476]: dlsym: cuDeviceGetAttribute - 0x7843e667cd00
Feb 21 01:41:18 paibo ollama[402476]: dlsym: cuDeviceGetUuid - 0x7843e667cc60
Feb 21 01:41:18 paibo ollama[402476]: dlsym: cuDeviceGetName - 0x7843e667cc40
Feb 21 01:41:18 paibo ollama[402476]: dlsym: cuCtxCreate_v3 - 0x7843e667cee0
Feb 21 01:41:18 paibo ollama[402476]: dlsym: cuMemGetInfo_v2 - 0x7843e6686e20
Feb 21 01:41:18 paibo ollama[402476]: dlsym: cuCtxDestroy - 0x7843e66e1850
Feb 21 01:41:18 paibo ollama[402476]: calling cuInit
Feb 21 01:41:18 paibo ollama[402476]: calling cuDriverGetVersion
Feb 21 01:41:18 paibo ollama[402476]: raw version 0x2f08
Feb 21 01:41:18 paibo ollama[402476]: CUDA driver version: 12.4
Feb 21 01:41:18 paibo ollama[402476]: calling cuDeviceGetCount
Feb 21 01:41:18 paibo ollama[402476]: device count 8
Feb 21 01:41:19 paibo ollama[402476]: time=2025-02-21T01:41:19.008Z level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-3bcbe689-683f-8046-f5ca-7636b2697113 name="NVIDIA A800-SXM4-80GB" overhead="0 B" before.total="79.3 GiB" before.free="78.9 GiB" now.total="79.3 GiB" now.free="78.9 GiB" now.used="420.8 MiB"
Feb 21 01:41:19 paibo ollama[402476]: time=2025-02-21T01:41:19.116Z level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-ee81c063-67b5-367c-c0f5-cb70bafbcc4b name="NVIDIA A800-SXM4-80GB" overhead="0 B" before.total="79.3 GiB" before.free="78.9 GiB" now.total="79.3 GiB" now.free="78.9 GiB" now.used="420.8 MiB"
Feb 21 01:41:19 paibo ollama[402476]: time=2025-02-21T01:41:19.220Z level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-8f579a79-1630-5670-9247-c3b5122c6b4a name="NVIDIA A800-SXM4-80GB" overhead="0 B" before.total="79.3 GiB" before.free="78.9 GiB" now.total="79.3 GiB" now.free="78.9 GiB" now.used="420.8 MiB"
Feb 21 01:41:19 paibo ollama[402476]: time=2025-02-21T01:41:19.325Z level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-824656ea-21a7-3cf2-1c68-7fdafad7be61 name="NVIDIA A800-SXM4-80GB" overhead="0 B" before.total="79.3 GiB" before.free="78.9 GiB" now.total="79.3 GiB" now.free="78.9 GiB" now.used="420.8 MiB"
Feb 21 01:41:19 paibo ollama[402476]: time=2025-02-21T01:41:19.437Z level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-1401cc7b-d569-d4dd-7612-a61597c7c219 name="NVIDIA A800-SXM4-80GB" overhead="0 B" before.total="79.3 GiB" before.free="78.9 GiB" now.total="79.3 GiB" now.free="78.9 GiB" now.used="420.8 MiB"
Feb 21 01:41:19 paibo ollama[402476]: time=2025-02-21T01:41:19.543Z level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-6bea94dc-d4e2-5eec-dfac-bb80e23b972d name="NVIDIA A800-SXM4-80GB" overhead="0 B" before.total="79.3 GiB" before.free="78.9 GiB" now.total="79.3 GiB" now.free="78.9 GiB" now.used="420.8 MiB"
Feb 21 01:41:19 paibo ollama[402476]: time=2025-02-21T01:41:19.649Z level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-17189970-4906-3e07-39c5-ee5efa82f16c name="NVIDIA A800-SXM4-80GB" overhead="0 B" before.total="79.3 GiB" before.free="78.9 GiB" now.total="79.3 GiB" now.free="78.9 GiB" now.used="420.8 MiB"
Feb 21 01:41:19 paibo ollama[402476]: time=2025-02-21T01:41:19.758Z level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-d2f47573-5803-99f3-24d4-b1d3f50d967d name="NVIDIA A800-SXM4-80GB" overhead="0 B" before.total="79.3 GiB" before.free="78.9 GiB" now.total="79.3 GiB" now.free="78.9 GiB" now.used="420.8 MiB"
Feb 21 01:41:19 paibo ollama[402476]: releasing cuda driver library
Feb 21 01:41:19 paibo ollama[402476]: time=2025-02-21T01:41:19.758Z level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="1007.5 GiB" before.free="990.3 GiB" before.free_swap="8.0 GiB" now.total="1007.5 GiB" now.free="990.3 GiB" now.free_swap="8.0 GiB"
Feb 21 01:41:19 paibo ollama[402476]: initializing /usr/lib/x86_64-linux-gnu/libcuda.so.550.127.08
Feb 21 01:41:19 paibo ollama[402476]: dlsym: cuInit - 0x7843e667cbc0
Feb 21 01:41:19 paibo ollama[402476]: dlsym: cuDriverGetVersion - 0x7843e667cbe0
Feb 21 01:41:19 paibo ollama[402476]: dlsym: cuDeviceGetCount - 0x7843e667cc20
Feb 21 01:41:19 paibo ollama[402476]: dlsym: cuDeviceGet - 0x7843e667cc00
Feb 21 01:41:19 paibo ollama[402476]: dlsym: cuDeviceGetAttribute - 0x7843e667cd00
Feb 21 01:41:19 paibo ollama[402476]: dlsym: cuDeviceGetUuid - 0x7843e667cc60
Feb 21 01:41:19 paibo ollama[402476]: dlsym: cuDeviceGetName - 0x7843e667cc40
Feb 21 01:41:19 paibo ollama[402476]: dlsym: cuCtxCreate_v3 - 0x7843e667cee0
Feb 21 01:41:19 paibo ollama[402476]: dlsym: cuMemGetInfo_v2 - 0x7843e6686e20
Feb 21 01:41:19 paibo ollama[402476]: dlsym: cuCtxDestroy - 0x7843e66e1850
Feb 21 01:41:19 paibo ollama[402476]: calling cuInit
Feb 21 01:41:19 paibo ollama[402476]: calling cuDriverGetVersion
Feb 21 01:41:19 paibo ollama[402476]: raw version 0x2f08
Feb 21 01:41:19 paibo ollama[402476]: CUDA driver version: 12.4
Feb 21 01:41:19 paibo ollama[402476]: calling cuDeviceGetCount
Feb 21 01:41:19 paibo ollama[402476]: device count 8
Feb 21 01:41:19 paibo ollama[402476]: time=2025-02-21T01:41:19.865Z level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-3bcbe689-683f-8046-f5ca-7636b2697113 name="NVIDIA A800-SXM4-80GB" overhead="0 B" before.total="79.3 GiB" before.free="78.9 GiB" now.total="79.3 GiB" now.free="78.9 GiB" now.used="420.8 MiB"
Feb 21 01:41:20 paibo ollama[402476]: time=2025-02-21T01:41:20.798Z level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-ee81c063-67b5-367c-c0f5-cb70bafbcc4b name="NVIDIA A800-SXM4-80GB" overhead="0 B" before.total="79.3 GiB" before.free="78.9 GiB" now.total="79.3 GiB" now.free="78.9 GiB" now.used="420.8 MiB"
Feb 21 01:41:20 paibo ollama[402476]: time=2025-02-21T01:41:20.914Z level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-8f579a79-1630-5670-9247-c3b5122c6b4a name="NVIDIA A800-SXM4-80GB" overhead="0 B" before.total="79.3 GiB" before.free="78.9 GiB" now.total="79.3 GiB" now.free="78.9 GiB" now.used="420.8 MiB"
Feb 21 01:41:21 paibo ollama[402476]: time=2025-02-21T01:41:21.023Z level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-824656ea-21a7-3cf2-1c68-7fdafad7be61 name="NVIDIA A800-SXM4-80GB" overhead="0 B" before.total="79.3 GiB" before.free="78.9 GiB" now.total="79.3 GiB" now.free="78.9 GiB" now.used="420.8 MiB"
Feb 21 01:41:21 paibo ollama[402476]: time=2025-02-21T01:41:21.131Z level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-1401cc7b-d569-d4dd-7612-a61597c7c219 name="NVIDIA A800-SXM4-80GB" overhead="0 B" before.total="79.3 GiB" before.free="78.9 GiB" now.total="79.3 GiB" now.free="78.9 GiB" now.used="420.8 MiB"
Feb 21 01:41:21 paibo ollama[402476]: time=2025-02-21T01:41:21.240Z level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-6bea94dc-d4e2-5eec-dfac-bb80e23b972d name="NVIDIA A800-SXM4-80GB" overhead="0 B" before.total="79.3 GiB" before.free="78.9 GiB" now.total="79.3 GiB" now.free="78.9 GiB" now.used="420.8 MiB"
Feb 21 01:41:21 paibo ollama[402476]: time=2025-02-21T01:41:21.348Z level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-17189970-4906-3e07-39c5-ee5efa82f16c name="NVIDIA A800-SXM4-80GB" overhead="0 B" before.total="79.3 GiB" before.free="78.9 GiB" now.total="79.3 GiB" now.free="78.9 GiB" now.used="420.8 MiB"
Feb 21 01:41:21 paibo ollama[402476]: time=2025-02-21T01:41:21.454Z level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-d2f47573-5803-99f3-24d4-b1d3f50d967d name="NVIDIA A800-SXM4-80GB" overhead="0 B" before.total="79.3 GiB" before.free="78.9 GiB" now.total="79.3 GiB" now.free="78.9 GiB" now.used="420.8 MiB"
Feb 21 01:41:21 paibo ollama[402476]: releasing cuda driver library
Feb 21 01:41:21 paibo ollama[402476]: time=2025-02-21T01:41:21.454Z level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="1007.5 GiB" before.free="990.3 GiB" before.free_swap="8.0 GiB" now.total="1007.5 GiB" now.free="990.3 GiB" now.free_swap="8.0 GiB"
Feb 21 01:41:21 paibo ollama[402476]: initializing /usr/lib/x86_64-linux-gnu/libcuda.so.550.127.08
Feb 21 01:41:21 paibo ollama[402476]: dlsym: cuInit - 0x7843e667cbc0
Feb 21 01:41:21 paibo ollama[402476]: dlsym: cuDriverGetVersion - 0x7843e667cbe0
Feb 21 01:41:21 paibo ollama[402476]: dlsym: cuDeviceGetCount - 0x7843e667cc20
Feb 21 01:41:21 paibo ollama[402476]: dlsym: cuDeviceGet - 0x7843e667cc00
Feb 21 01:41:21 paibo ollama[402476]: dlsym: cuDeviceGetAttribute - 0x7843e667cd00
Feb 21 01:41:21 paibo ollama[402476]: dlsym: cuDeviceGetUuid - 0x7843e667cc60
Feb 21 01:41:21 paibo ollama[402476]: dlsym: cuDeviceGetName - 0x7843e667cc40
Feb 21 01:41:21 paibo ollama[402476]: dlsym: cuCtxCreate_v3 - 0x7843e667cee0
Feb 21 01:41:21 paibo ollama[402476]: dlsym: cuMemGetInfo_v2 - 0x7843e6686e20
Feb 21 01:41:21 paibo ollama[402476]: dlsym: cuCtxDestroy - 0x7843e66e1850
Feb 21 01:41:21 paibo ollama[402476]: calling cuInit
Feb 21 01:41:21 paibo ollama[402476]: calling cuDriverGetVersion
Feb 21 01:41:21 paibo ollama[402476]: raw version 0x2f08
Feb 21 01:41:21 paibo ollama[402476]: CUDA driver version: 12.4
Feb 21 01:41:21 paibo ollama[402476]: calling cuDeviceGetCount
Feb 21 01:41:21 paibo ollama[402476]: device count 8
Feb 21 01:41:21 paibo ollama[402476]: time=2025-02-21T01:41:21.560Z level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-3bcbe689-683f-8046-f5ca-7636b2697113 name="NVIDIA A800-SXM4-80GB" overhead="0 B" before.total="79.3 GiB" before.free="78.9 GiB" now.total="79.3 GiB" now.free="78.9 GiB" now.used="420.8 MiB"
Feb 21 01:41:21 paibo ollama[402476]: time=2025-02-21T01:41:21.669Z level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-ee81c063-67b5-367c-c0f5-cb70bafbcc4b name="NVIDIA A800-SXM4-80GB" overhead="0 B" before.total="79.3 GiB" before.free="78.9 GiB" now.total="79.3 GiB" now.free="78.9 GiB" now.used="420.8 MiB"
Feb 21 01:41:21 paibo ollama[402476]: time=2025-02-21T01:41:21.775Z level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-8f579a79-1630-5670-9247-c3b5122c6b4a name="NVIDIA A800-SXM4-80GB" overhead="0 B" before.total="79.3 GiB" before.free="78.9 GiB" now.total="79.3 GiB" now.free="78.9 GiB" now.used="420.8 MiB"
Feb 21 01:41:21 paibo ollama[402476]: time=2025-02-21T01:41:21.882Z level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-824656ea-21a7-3cf2-1c68-7fdafad7be61 name="NVIDIA A800-SXM4-80GB" overhead="0 B" before.total="79.3 GiB" before.free="78.9 GiB" now.total="79.3 GiB" now.free="78.9 GiB" now.used="420.8 MiB"
Feb 21 01:41:21 paibo ollama[402476]: time=2025-02-21T01:41:21.992Z level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-1401cc7b-d569-d4dd-7612-a61597c7c219 name="NVIDIA A800-SXM4-80GB" overhead="0 B" before.total="79.3 GiB" before.free="78.9 GiB" now.total="79.3 GiB" now.free="78.9 GiB" now.used="420.8 MiB"
Feb 21 01:41:22 paibo ollama[402476]: time=2025-02-21T01:41:22.103Z level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-6bea94dc-d4e2-5eec-dfac-bb80e23b972d name="NVIDIA A800-SXM4-80GB" overhead="0 B" before.total="79.3 GiB" before.free="78.9 GiB" now.total="79.3 GiB" now.free="78.9 GiB" now.used="420.8 MiB"
Feb 21 01:41:22 paibo ollama[402476]: time=2025-02-21T01:41:22.210Z level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-17189970-4906-3e07-39c5-ee5efa82f16c name="NVIDIA A800-SXM4-80GB" overhead="0 B" before.total="79.3 GiB" before.free="78.9 GiB" now.total="79.3 GiB" now.free="78.9 GiB" now.used="420.8 MiB"
Feb 21 01:41:22 paibo ollama[402476]: time=2025-02-21T01:41:22.317Z level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-d2f47573-5803-99f3-24d4-b1d3f50d967d name="NVIDIA A800-SXM4-80GB" overhead="0 B" before.total="79.3 GiB" before.free="78.9 GiB" now.total="79.3 GiB" now.free="78.9 GiB" now.used="420.8 MiB"
Feb 21 01:41:22 paibo ollama[402476]: releasing cuda driver library
Feb 21 01:41:22 paibo ollama[402476]: time=2025-02-21T01:41:22.317Z level=WARN source=sched.go:646 msg="gpu VRAM usage didn't recover within timeout" seconds=5.031002835 model=/usr/share/ollama/.ollama/models/blobs/sha256-b74fcbabbb0836b765a011150b96e24ff3937bc46ee4a820e88877893dbe9a63
Feb 21 01:41:22 paibo ollama[402476]: time=2025-02-21T01:41:22.317Z level=DEBUG source=sched.go:384 msg="sending an unloaded event" modelPath=/usr/share/ollama/.ollama/models/blobs/sha256-b74fcbabbb0836b765a011150b96e24ff3937bc46ee4a820e88877893dbe9a63
Feb 21 01:41:22 paibo ollama[402476]: time=2025-02-21T01:41:22.317Z level=DEBUG source=sched.go:308 msg="ignoring unload event with no pending requests"
Feb 21 01:41:22 paibo ollama[402476]: time=2025-02-21T01:41:22.317Z level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="1007.5 GiB" before.free="990.3 GiB" before.free_swap="8.0 GiB" now.total="1007.5 GiB" now.free="990.3 GiB" now.free_swap="8.0 GiB"
Feb 21 01:41:22 paibo ollama[402476]: initializing /usr/lib/x86_64-linux-gnu/libcuda.so.550.127.08
Feb 21 01:41:22 paibo ollama[402476]: dlsym: cuInit - 0x7843e667cbc0
Feb 21 01:41:22 paibo ollama[402476]: dlsym: cuDriverGetVersion - 0x7843e667cbe0
Feb 21 01:41:22 paibo ollama[402476]: dlsym: cuDeviceGetCount - 0x7843e667cc20
Feb 21 01:41:22 paibo ollama[402476]: dlsym: cuDeviceGet - 0x7843e667cc00
Feb 21 01:41:22 paibo ollama[402476]: dlsym: cuDeviceGetAttribute - 0x7843e667cd00
Feb 21 01:41:22 paibo ollama[402476]: dlsym: cuDeviceGetUuid - 0x7843e667cc60
Feb 21 01:41:22 paibo ollama[402476]: dlsym: cuDeviceGetName - 0x7843e667cc40
Feb 21 01:41:22 paibo ollama[402476]: dlsym: cuCtxCreate_v3 - 0x7843e667cee0
Feb 21 01:41:22 paibo ollama[402476]: dlsym: cuMemGetInfo_v2 - 0x7843e6686e20
Feb 21 01:41:22 paibo ollama[402476]: dlsym: cuCtxDestroy - 0x7843e66e1850
Feb 21 01:41:22 paibo ollama[402476]: calling cuInit
Feb 21 01:41:22 paibo ollama[402476]: calling cuDriverGetVersion
Feb 21 01:41:22 paibo ollama[402476]: raw version 0x2f08
Feb 21 01:41:22 paibo ollama[402476]: CUDA driver version: 12.4
Feb 21 01:41:22 paibo ollama[402476]: calling cuDeviceGetCount
Feb 21 01:41:22 paibo ollama[402476]: device count 8
Feb 21 01:41:22 paibo ollama[402476]: time=2025-02-21T01:41:22.422Z level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-3bcbe689-683f-8046-f5ca-7636b2697113 name="NVIDIA A800-SXM4-80GB" overhead="0 B" before.total="79.3 GiB" before.free="78.9 GiB" now.total="79.3 GiB" now.free="78.9 GiB" now.used="420.8 MiB"
Feb 21 01:41:22 paibo ollama[402476]: time=2025-02-21T01:41:22.528Z level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-ee81c063-67b5-367c-c0f5-cb70bafbcc4b name="NVIDIA A800-SXM4-80GB" overhead="0 B" before.total="79.3 GiB" before.free="78.9 GiB" now.total="79.3 GiB" now.free="78.9 GiB" now.used="420.8 MiB"
Feb 21 01:41:22 paibo ollama[402476]: time=2025-02-21T01:41:22.634Z level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-8f579a79-1630-5670-9247-c3b5122c6b4a name="NVIDIA A800-SXM4-80GB" overhead="0 B" before.total="79.3 GiB" before.free="78.9 GiB" now.total="79.3 GiB" now.free="78.9 GiB" now.used="420.8 MiB"
Feb 21 01:41:22 paibo ollama[402476]: time=2025-02-21T01:41:22.742Z level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-824656ea-21a7-3cf2-1c68-7fdafad7be61 name="NVIDIA A800-SXM4-80GB" overhead="0 B" before.total="79.3 GiB" before.free="78.9 GiB" now.total="79.3 GiB" now.free="78.9 GiB" now.used="420.8 MiB"
Feb 21 01:41:22 paibo ollama[402476]: time=2025-02-21T01:41:22.850Z level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-1401cc7b-d569-d4dd-7612-a61597c7c219 name="NVIDIA A800-SXM4-80GB" overhead="0 B" before.total="79.3 GiB" before.free="78.9 GiB" now.total="79.3 GiB" now.free="78.9 GiB" now.used="420.8 MiB"
Feb 21 01:41:22 paibo ollama[402476]: time=2025-02-21T01:41:22.985Z level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-6bea94dc-d4e2-5eec-dfac-bb80e23b972d name="NVIDIA A800-SXM4-80GB" overhead="0 B" before.total="79.3 GiB" before.free="78.9 GiB" now.total="79.3 GiB" now.free="78.9 GiB" now.used="420.8 MiB"
Feb 21 01:41:23 paibo ollama[402476]: time=2025-02-21T01:41:23.905Z level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-17189970-4906-3e07-39c5-ee5efa82f16c name="NVIDIA A800-SXM4-80GB" overhead="0 B" before.total="79.3 GiB" before.free="78.9 GiB" now.total="79.3 GiB" now.free="78.9 GiB" now.used="420.8 MiB"
Feb 21 01:41:24 paibo ollama[402476]: time=2025-02-21T01:41:24.017Z level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-d2f47573-5803-99f3-24d4-b1d3f50d967d name="NVIDIA A800-SXM4-80GB" overhead="0 B" before.total="79.3 GiB" before.free="78.9 GiB" now.total="79.3 GiB" now.free="78.9 GiB" now.used="420.8 MiB"
Feb 21 01:41:24 paibo ollama[402476]: releasing cuda driver library
Feb 21 01:41:24 paibo ollama[402476]: time=2025-02-21T01:41:24.017Z level=WARN source=sched.go:646 msg="gpu VRAM usage didn't recover within timeout" seconds=6.730869986 model=/usr/share/ollama/.ollama/models/blobs/sha256-b74fcbabbb0836b765a011150b96e24ff3937bc46ee4a820e88877893dbe9a63
Feb 21 01:41:24 paibo ollama[402476]: time=2025-02-21T01:41:24.017Z level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="1007.5 GiB" before.free="990.3 GiB" before.free_swap="8.0 GiB" now.total="1007.5 GiB" now.free="990.3 GiB" now.free_swap="8.0 GiB"
Feb 21 01:41:24 paibo ollama[402476]: initializing /usr/lib/x86_64-linux-gnu/libcuda.so.550.127.08
Feb 21 01:41:24 paibo ollama[402476]: dlsym: cuInit - 0x7843e667cbc0
Feb 21 01:41:24 paibo ollama[402476]: dlsym: cuDriverGetVersion - 0x7843e667cbe0
Feb 21 01:41:24 paibo ollama[402476]: dlsym: cuDeviceGetCount - 0x7843e667cc20
Feb 21 01:41:24 paibo ollama[402476]: dlsym: cuDeviceGet - 0x7843e667cc00
Feb 21 01:41:24 paibo ollama[402476]: dlsym: cuDeviceGetAttribute - 0x7843e667cd00
Feb 21 01:41:24 paibo ollama[402476]: dlsym: cuDeviceGetUuid - 0x7843e667cc60
Feb 21 01:41:24 paibo ollama[402476]: dlsym: cuDeviceGetName - 0x7843e667cc40
Feb 21 01:41:24 paibo ollama[402476]: dlsym: cuCtxCreate_v3 - 0x7843e667cee0
Feb 21 01:41:24 paibo ollama[402476]: dlsym: cuMemGetInfo_v2 - 0x7843e6686e20
Feb 21 01:41:24 paibo ollama[402476]: dlsym: cuCtxDestroy - 0x7843e66e1850
Feb 21 01:41:24 paibo ollama[402476]: calling cuInit
Feb 21 01:41:24 paibo ollama[402476]: calling cuDriverGetVersion
Feb 21 01:41:24 paibo ollama[402476]: raw version 0x2f08
Feb 21 01:41:24 paibo ollama[402476]: CUDA driver version: 12.4
Feb 21 01:41:24 paibo ollama[402476]: calling cuDeviceGetCount
Feb 21 01:41:24 paibo ollama[402476]: device count 8
Feb 21 01:41:24 paibo ollama[402476]: time=2025-02-21T01:41:24.125Z level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-3bcbe689-683f-8046-f5ca-7636b2697113 name="NVIDIA A800-SXM4-80GB" overhead="0 B" before.total="79.3 GiB" before.free="78.9 GiB" now.total="79.3 GiB" now.free="78.9 GiB" now.used="420.8 MiB"
Feb 21 01:41:24 paibo ollama[402476]: time=2025-02-21T01:41:24.234Z level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-ee81c063-67b5-367c-c0f5-cb70bafbcc4b name="NVIDIA A800-SXM4-80GB" overhead="0 B" before.total="79.3 GiB" before.free="78.9 GiB" now.total="79.3 GiB" now.free="78.9 GiB" now.used="420.8 MiB"
Feb 21 01:41:24 paibo ollama[402476]: time=2025-02-21T01:41:24.345Z level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-8f579a79-1630-5670-9247-c3b5122c6b4a name="NVIDIA A800-SXM4-80GB" overhead="0 B" before.total="79.3 GiB" before.free="78.9 GiB" now.total="79.3 GiB" now.free="78.9 GiB" now.used="420.8 MiB"
Feb 21 01:41:24 paibo ollama[402476]: time=2025-02-21T01:41:24.459Z level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-824656ea-21a7-3cf2-1c68-7fdafad7be61 name="NVIDIA A800-SXM4-80GB" overhead="0 B" before.total="79.3 GiB" before.free="78.9 GiB" now.total="79.3 GiB" now.free="78.9 GiB" now.used="420.8 MiB"
Feb 21 01:41:24 paibo ollama[402476]: time=2025-02-21T01:41:24.566Z level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-1401cc7b-d569-d4dd-7612-a61597c7c219 name="NVIDIA A800-SXM4-80GB" overhead="0 B" before.total="79.3 GiB" before.free="78.9 GiB" now.total="79.3 GiB" now.free="78.9 GiB" now.used="420.8 MiB"
Feb 21 01:41:24 paibo ollama[402476]: time=2025-02-21T01:41:24.676Z level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-6bea94dc-d4e2-5eec-dfac-bb80e23b972d name="NVIDIA A800-SXM4-80GB" overhead="0 B" before.total="79.3 GiB" before.free="78.9 GiB" now.total="79.3 GiB" now.free="78.9 GiB" now.used="420.8 MiB"
Feb 21 01:41:24 paibo ollama[402476]: time=2025-02-21T01:41:24.782Z level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-17189970-4906-3e07-39c5-ee5efa82f16c name="NVIDIA A800-SXM4-80GB" overhead="0 B" before.total="79.3 GiB" before.free="78.9 GiB" now.total="79.3 GiB" now.free="78.9 GiB" now.used="420.8 MiB"
Feb 21 01:41:24 paibo ollama[402476]: time=2025-02-21T01:41:24.889Z level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-d2f47573-5803-99f3-24d4-b1d3f50d967d name="NVIDIA A800-SXM4-80GB" overhead="0 B" before.total="79.3 GiB" before.free="78.9 GiB" now.total="79.3 GiB" now.free="78.9 GiB" now.used="420.8 MiB"
Feb 21 01:41:24 paibo ollama[402476]: releasing cuda driver library
Feb 21 01:41:24 paibo ollama[402476]: time=2025-02-21T01:41:24.889Z level=WARN source=sched.go:646 msg="gpu VRAM usage didn't recover within timeout" seconds=7.60291859 model=/usr/share/ollama/.ollama/models/blobs/sha256-b74fcbabbb0836b765a011150b96e24ff3937bc46ee4a820e88877893dbe9a63

OS

Linux

GPU

Nvidia

CPU

Intel

Ollama version

0.5.11
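For reference, `num_ctx` and `num_gpu` can be overridden for a registry-pulled model by layering a local Modelfile on top of it, the same way the q4_K_M variant was started. A minimal sketch, assuming the model was already pulled with `ollama run deepseek-r1:671b-q8_0`; the derived model name `deepseek-r1-671b-q8-tuned` and the parameter values are illustrative, not a verified fix for this allocation failure:

```shell
# Hypothetical Modelfile layered on the pulled model; FROM reuses the
# already-downloaded deepseek-r1:671b-q8_0 blobs, no re-download needed.
cat > Modelfile <<'EOF'
FROM deepseek-r1:671b-q8_0
PARAMETER num_ctx 2048
PARAMETER num_gpu 62
EOF

# Create a derived model carrying the overridden parameters, then run it.
ollama create deepseek-r1-671b-q8-tuned -f Modelfile
ollama run deepseek-r1-671b-q8-tuned
```

Inside an interactive `ollama run` session, `/set parameter num_ctx 2048` should apply the same override for that session only, without creating a derived model.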

now.free="78.9 GiB" now.used="420.8 MiB" Feb 21 01:41:24 paibo ollama[402476]: time=2025-02-21T01:41:24.017Z level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-d2f47573-5803-99f3-24d4-b1d3f50d967d name="NVIDIA A800-SXM4-80GB" overhead="0 B" before.total="79.3 GiB" before.free="78.9 GiB" now.total="79.3 GiB" now.free="78.9 GiB" now.used="420.8 MiB" Feb 21 01:41:24 paibo ollama[402476]: releasing cuda driver library Feb 21 01:41:24 paibo ollama[402476]: time=2025-02-21T01:41:24.017Z level=WARN source=sched.go:646 msg="gpu VRAM usage didn't recover within timeout" seconds=6.730869986 model=/usr/share/ollama/.ollama/models/blobs/sha256-b74fcbabbb0836b765a011150b96e24ff3937bc46ee4a820e88877893dbe9a63 Feb 21 01:41:24 paibo ollama[402476]: time=2025-02-21T01:41:24.017Z level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="1007.5 GiB" before.free="990.3 GiB" before.free_swap="8.0 GiB" now.total="1007.5 GiB" now.free="990.3 GiB" now.free_swap="8.0 GiB" Feb 21 01:41:24 paibo ollama[402476]: initializing /usr/lib/x86_64-linux-gnu/libcuda.so.550.127.08 Feb 21 01:41:24 paibo ollama[402476]: dlsym: cuInit - 0x7843e667cbc0 Feb 21 01:41:24 paibo ollama[402476]: dlsym: cuDriverGetVersion - 0x7843e667cbe0 Feb 21 01:41:24 paibo ollama[402476]: dlsym: cuDeviceGetCount - 0x7843e667cc20 Feb 21 01:41:24 paibo ollama[402476]: dlsym: cuDeviceGet - 0x7843e667cc00 Feb 21 01:41:24 paibo ollama[402476]: dlsym: cuDeviceGetAttribute - 0x7843e667cd00 Feb 21 01:41:24 paibo ollama[402476]: dlsym: cuDeviceGetUuid - 0x7843e667cc60 Feb 21 01:41:24 paibo ollama[402476]: dlsym: cuDeviceGetName - 0x7843e667cc40 Feb 21 01:41:24 paibo ollama[402476]: dlsym: cuCtxCreate_v3 - 0x7843e667cee0 Feb 21 01:41:24 paibo ollama[402476]: dlsym: cuMemGetInfo_v2 - 0x7843e6686e20 Feb 21 01:41:24 paibo ollama[402476]: dlsym: cuCtxDestroy - 0x7843e66e1850 Feb 21 01:41:24 paibo ollama[402476]: calling cuInit Feb 21 01:41:24 paibo ollama[402476]: calling cuDriverGetVersion Feb 21 01:41:24 paibo ollama[402476]: raw version 0x2f08 Feb 21 01:41:24 paibo ollama[402476]: CUDA driver version: 12.4 Feb 21 01:41:24 paibo ollama[402476]: calling cuDeviceGetCount Feb 21 01:41:24 paibo ollama[402476]: device count 8 Feb 21 01:41:24 paibo ollama[402476]: time=2025-02-21T01:41:24.125Z level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-3bcbe689-683f-8046-f5ca-7636b2697113 name="NVIDIA A800-SXM4-80GB" overhead="0 B" before.total="79.3 GiB" before.free="78.9 GiB" now.total="79.3 GiB" now.free="78.9 GiB" now.used="420.8 MiB" Feb 21 01:41:24 paibo ollama[402476]: time=2025-02-21T01:41:24.234Z level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-ee81c063-67b5-367c-c0f5-cb70bafbcc4b name="NVIDIA A800-SXM4-80GB" overhead="0 B" before.total="79.3 GiB" before.free="78.9 GiB" now.total="79.3 GiB" now.free="78.9 GiB" now.used="420.8 MiB" Feb 21 01:41:24 paibo ollama[402476]: time=2025-02-21T01:41:24.345Z level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-8f579a79-1630-5670-9247-c3b5122c6b4a name="NVIDIA A800-SXM4-80GB" overhead="0 B" before.total="79.3 GiB" before.free="78.9 GiB" now.total="79.3 GiB" now.free="78.9 GiB" now.used="420.8 MiB" Feb 21 01:41:24 paibo ollama[402476]: time=2025-02-21T01:41:24.459Z level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-824656ea-21a7-3cf2-1c68-7fdafad7be61 name="NVIDIA A800-SXM4-80GB" overhead="0 B" before.total="79.3 GiB" before.free="78.9 GiB" now.total="79.3 GiB" now.free="78.9 GiB" now.used="420.8 MiB" Feb 21 01:41:24 
paibo ollama[402476]: time=2025-02-21T01:41:24.566Z level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-1401cc7b-d569-d4dd-7612-a61597c7c219 name="NVIDIA A800-SXM4-80GB" overhead="0 B" before.total="79.3 GiB" before.free="78.9 GiB" now.total="79.3 GiB" now.free="78.9 GiB" now.used="420.8 MiB" Feb 21 01:41:24 paibo ollama[402476]: time=2025-02-21T01:41:24.676Z level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-6bea94dc-d4e2-5eec-dfac-bb80e23b972d name="NVIDIA A800-SXM4-80GB" overhead="0 B" before.total="79.3 GiB" before.free="78.9 GiB" now.total="79.3 GiB" now.free="78.9 GiB" now.used="420.8 MiB" Feb 21 01:41:24 paibo ollama[402476]: time=2025-02-21T01:41:24.782Z level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-17189970-4906-3e07-39c5-ee5efa82f16c name="NVIDIA A800-SXM4-80GB" overhead="0 B" before.total="79.3 GiB" before.free="78.9 GiB" now.total="79.3 GiB" now.free="78.9 GiB" now.used="420.8 MiB" Feb 21 01:41:24 paibo ollama[402476]: time=2025-02-21T01:41:24.889Z level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-d2f47573-5803-99f3-24d4-b1d3f50d967d name="NVIDIA A800-SXM4-80GB" overhead="0 B" before.total="79.3 GiB" before.free="78.9 GiB" now.total="79.3 GiB" now.free="78.9 GiB" now.used="420.8 MiB" Feb 21 01:41:24 paibo ollama[402476]: releasing cuda driver library Feb 21 01:41:24 paibo ollama[402476]: time=2025-02-21T01:41:24.889Z level=WARN source=sched.go:646 msg="gpu VRAM usage didn't recover within timeout" seconds=7.60291859 model=/usr/share/ollama/.ollama/models/blobs/sha256-b74fcbabbb0836b765a011150b96e24ff3937bc46ee4a820e88877893dbe9a63 ``` ### OS Linux ### GPU Nvidia ### CPU Intel ### Ollama version 0.5.11
GiteaMirror added the bug label 2026-04-12 17:22:20 -05:00

@rick-github commented on GitHub (Feb 21, 2025):

```
echo FROM deepseek-r1:671b-q8_0 > Modelfile
echo PARAMETER num_gpu 43 >> Modelfile
ollama create deepseek-r1:671b-g43-q8_0
ollama run deepseek-r1:671b-g43-q8_0
```
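
(The same Modelfile mechanism covers `num_ctx`, which the original question also asked about. A minimal sketch along the same lines; the 8192 context size and the model tag below are illustrative assumptions, not values suggested in this thread:)

```
echo FROM deepseek-r1:671b-q8_0 > Modelfile
echo PARAMETER num_gpu 43 >> Modelfile
echo PARAMETER num_ctx 8192 >> Modelfile   # example context size, tune to fit VRAM
ollama create deepseek-r1:671b-g43-c8k-q8_0 -f Modelfile   # hypothetical tag
ollama run deepseek-r1:671b-g43-c8k-q8_0
```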

@darkSuperman commented on GitHub (Feb 21, 2025):

@rick-github Thanks, when I set num_gpu to 43 the model runs normally: `29%/71% CPU/GPU`. But I noticed that GPU compute utilization is very low while CPU utilization is very high; is this expected?

![Image](https://github.com/user-attachments/assets/26f57acb-53f5-4837-afc3-707ae0987b17)

![Image](https://github.com/user-attachments/assets/d6ed753d-4688-454b-ac24-c5a098391d04)
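
(For context: the `29%/71% CPU/GPU` figure matches the PROCESSOR column printed by `ollama ps`. A sketch of what that output looks like; the ID, size, and expiry values below are made up for illustration:)

```
$ ollama ps
NAME                        ID              SIZE      PROCESSOR          UNTIL
deepseek-r1:671b-g43-q8_0   0a1b2c3d4e5f    750 GB    29%/71% CPU/GPU    4 minutes from now
```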

@rick-github commented on GitHub (Feb 21, 2025):

The GPU and CPU take turns running inference over the parts of the model each has loaded. The GPU is faster and ends up waiting on the CPU, so GPU utilization is low and CPU utilization is high.
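
(A rough back-of-envelope consistent with the reported split, assuming the 61 transformer layers plus output layer that this architecture is published with; the 62-layer total is an assumption, not something stated in the thread:)

```
num_gpu 43  =>  43 / 62 layers ≈ 69% of the model on GPU, ~31% on CPU
```

which lines up with the `29%/71% CPU/GPU` figure above.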

@darkSuperman commented on GitHub (Feb 21, 2025):

Thank you for your patience, I understand.

Reference: github-starred/ollama#6034