[GH-ISSUE #10369] ollama ps runtime shows using GPU but actually logs using cpu #32573

Closed
opened 2026-04-22 13:58:48 -05:00 by GiteaMirror · 19 comments
Owner

Originally created by @liuyixia-make on GitHub (Apr 22, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/10369

What is the issue?

This never happened with previous versions; the problem appeared after upgrading to 0.6.5, and my environment is configured to use the GPU:
Environment="USE_GPU=True"
Environment="CUDA_VISIBLE_DEVICES=0,1"
Environment="OLLAMA_FORCE_GPU=1"
Environment="OLLAMA_SCHED_SPREAD=1"
root@hpry:~# nvidia-smi
Tue Apr 22 20:17:46 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.120 Driver Version: 550.120 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 3090 Off | 00000000:01:00.0 Off | N/A |
| 42% 25C P8 26W / 350W | 4MiB / 24576MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA GeForce RTX 3090 Off | 00000000:05:00.0 Off | N/A |
| 42% 22C P8 13W / 350W | 4MiB / 24576MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+

Relevant log output

Apr 22 20:13:55 hpry ollama[1592]: load_tensors:   CPU_Mapped model buffer size = 18926.01 MiB
Apr 22 20:13:55 hpry ollama[1592]: llama_init_from_model: n_seq_max     = 4
Apr 22 20:13:55 hpry ollama[1592]: llama_init_from_model: n_ctx         = 8192
Apr 22 20:13:55 hpry ollama[1592]: llama_init_from_model: n_ctx_per_seq = 2048
Apr 22 20:13:55 hpry ollama[1592]: llama_init_from_model: n_batch       = 2048
Apr 22 20:13:55 hpry ollama[1592]: llama_init_from_model: n_ubatch      = 512
Apr 22 20:13:55 hpry ollama[1592]: llama_init_from_model: flash_attn    = 0
Apr 22 20:13:55 hpry ollama[1592]: llama_init_from_model: freq_base     = 1000000.0
Apr 22 20:13:55 hpry ollama[1592]: llama_init_from_model: freq_scale    = 1
Apr 22 20:13:55 hpry ollama[1592]: llama_init_from_model: n_ctx_per_seq (2048) < n_ctx_train (131072) -- the full capacity of the model will not be utilized
Apr 22 20:13:55 hpry ollama[1592]: llama_kv_cache_init: kv_size = 8192, offload = 1, type_k = 'f16', type_v = 'f16', n_layer = 64, can_shift = 1
Apr 22 20:13:55 hpry ollama[1592]: llama_kv_cache_init:        CPU KV buffer size =  2048.00 MiB
Apr 22 20:13:55 hpry ollama[1592]: llama_init_from_model: KV self size  = 2048.00 MiB, K (f16): 1024.00 MiB, V (f16): 1024.00 MiB
Apr 22 20:13:55 hpry ollama[1592]: llama_init_from_model:        CPU  output buffer size =     2.40 MiB
Apr 22 20:13:55 hpry ollama[1592]: llama_init_from_model:        CPU compute buffer size =   696.01 MiB
Apr 22 20:13:55 hpry ollama[1592]: llama_init_from_model: graph nodes  = 2246
Apr 22 20:13:55 hpry ollama[1592]: llama_init_from_model: graph splits = 1
Apr 22 20:13:56 hpry ollama[1592]: time=2025-04-22T20:13:56.016+08:00 level=INFO source=server.go:619 msg="llama runner started in 1.76 seconds"

root@hpry:~# ollama ps
NAME               ID              SIZE     PROCESSOR    UNTIL              
deepseek-r1:32b    38056bbcbb2d    25 GB    100% GPU     4 minutes from now
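
The ollama ps output above is served by the /api/ps endpoint, which can be queried directly to see the raw size and size_vram fields the PROCESSOR column is derived from (assuming the default listen address):

curl http://localhost:11434/api/ps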

OS

No response

GPU

No response

CPU

No response

Ollama version

No response

GiteaMirror added the bug label 2026-04-22 13:58:48 -05:00
Author
Owner

@rick-github commented on GitHub (Apr 22, 2025):

The full log will aid in debugging.
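
A sketch of how to capture one on a systemd install, assuming the default service name: turn on debug output, reproduce the problem, then dump the journal.

systemctl edit ollama        # add Environment="OLLAMA_DEBUG=1" under [Service]
systemctl restart ollama
# reproduce the issue, then:
journalctl -u ollama --no-pager > ollama.log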

Author
Owner

@liuyixia-make commented on GitHub (Apr 22, 2025):

Apr 22 21:20:46 hpry ollama[27413]: [GIN] 2025/04/22 - 21:20:46 | 200 | 35.346µs | 127.0.0.1 | HEAD "/"
Apr 22 21:20:46 hpry ollama[27413]: [GIN] 2025/04/22 - 21:20:46 | 200 | 19.322161ms | 127.0.0.1 | POST "/api/show"
Apr 22 21:20:46 hpry ollama[27413]: time=2025-04-22T21:20:46.666+08:00 level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="125.0 GiB" before.free="122.6 GiB" before.free_swap="8.0 GiB" now.total="125.0 GiB" now.free="122.5 GiB" now.free_swap="8.0 GiB"
Apr 22 21:20:46 hpry ollama[27413]: initializing /usr/lib/x86_64-linux-gnu/libcuda.so.550.120
Apr 22 21:20:46 hpry ollama[27413]: dlsym: cuInit - 0x7d4c95c7cbc0
Apr 22 21:20:46 hpry ollama[27413]: dlsym: cuDriverGetVersion - 0x7d4c95c7cbe0
Apr 22 21:20:46 hpry ollama[27413]: dlsym: cuDeviceGetCount - 0x7d4c95c7cc20
Apr 22 21:20:46 hpry ollama[27413]: dlsym: cuDeviceGet - 0x7d4c95c7cc00
Apr 22 21:20:46 hpry ollama[27413]: dlsym: cuDeviceGetAttribute - 0x7d4c95c7cd00
Apr 22 21:20:46 hpry ollama[27413]: dlsym: cuDeviceGetUuid - 0x7d4c95c7cc60
Apr 22 21:20:46 hpry ollama[27413]: dlsym: cuDeviceGetName - 0x7d4c95c7cc40
Apr 22 21:20:46 hpry ollama[27413]: dlsym: cuCtxCreate_v3 - 0x7d4c95c7cee0
Apr 22 21:20:46 hpry ollama[27413]: dlsym: cuMemGetInfo_v2 - 0x7d4c95c86e20
Apr 22 21:20:46 hpry ollama[27413]: dlsym: cuCtxDestroy - 0x7d4c95ce1850
Apr 22 21:20:46 hpry ollama[27413]: calling cuInit
Apr 22 21:20:46 hpry ollama[27413]: calling cuDriverGetVersion
Apr 22 21:20:46 hpry ollama[27413]: raw version 0x2f08
Apr 22 21:20:46 hpry ollama[27413]: CUDA driver version: 12.4
Apr 22 21:20:46 hpry ollama[27413]: calling cuDeviceGetCount
Apr 22 21:20:46 hpry ollama[27413]: device count 2
Apr 22 21:20:46 hpry ollama[27413]: time=2025-04-22T21:20:46.927+08:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-de554a1a-5def-5a94-397d-5512a37432da name="NVIDIA GeForce RTX 3090" overhead="0 B" before.total="23.7 GiB" before.free="23.4 GiB" now.total="23.7 GiB" now.free="23.4 GiB" now.used="260.9 MiB"
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.003+08:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-d73a3740-e0e8-a3a5-0ce2-0388bf79bdf4 name="NVIDIA GeForce RTX 3090" overhead="0 B" before.total="23.7 GiB" before.free="23.4 GiB" now.total="23.7 GiB" now.free="23.4 GiB" now.used="260.9 MiB"
Apr 22 21:20:47 hpry ollama[27413]: releasing cuda driver library
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.003+08:00 level=DEBUG source=sched.go:183 msg="updating default concurrency" OLLAMA_MAX_LOADED_MODELS=6 gpu_count=2
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.022+08:00 level=DEBUG source=sched.go:226 msg="loading first model" model=/usr/share/ollama/.ollama/models/blobs/sha256-eabc98a9bcbfce7fd70f3e07de599f8fda98120fefed5881934161ede8bd1a41
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.022+08:00 level=DEBUG source=memory.go:108 msg=evaluating library=cuda gpu_count=2 available="[23.4 GiB 23.4 GiB]"
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.022+08:00 level=WARN source=ggml.go:152 msg="key not found" key=qwen2.vision.block_count default=0
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.022+08:00 level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="125.0 GiB" before.free="122.5 GiB" before.free_swap="8.0 GiB" now.total="125.0 GiB" now.free="122.5 GiB" now.free_swap="8.0 GiB"
Apr 22 21:20:47 hpry ollama[27413]: initializing /usr/lib/x86_64-linux-gnu/libcuda.so.550.120
Apr 22 21:20:47 hpry ollama[27413]: dlsym: cuInit - 0x7d4c95c7cbc0
Apr 22 21:20:47 hpry ollama[27413]: dlsym: cuDriverGetVersion - 0x7d4c95c7cbe0
Apr 22 21:20:47 hpry ollama[27413]: dlsym: cuDeviceGetCount - 0x7d4c95c7cc20
Apr 22 21:20:47 hpry ollama[27413]: dlsym: cuDeviceGet - 0x7d4c95c7cc00
Apr 22 21:20:47 hpry ollama[27413]: dlsym: cuDeviceGetAttribute - 0x7d4c95c7cd00
Apr 22 21:20:47 hpry ollama[27413]: dlsym: cuDeviceGetUuid - 0x7d4c95c7cc60
Apr 22 21:20:47 hpry ollama[27413]: dlsym: cuDeviceGetName - 0x7d4c95c7cc40
Apr 22 21:20:47 hpry ollama[27413]: dlsym: cuCtxCreate_v3 - 0x7d4c95c7cee0
Apr 22 21:20:47 hpry ollama[27413]: dlsym: cuMemGetInfo_v2 - 0x7d4c95c86e20
Apr 22 21:20:47 hpry ollama[27413]: dlsym: cuCtxDestroy - 0x7d4c95ce1850
Apr 22 21:20:47 hpry ollama[27413]: calling cuInit
Apr 22 21:20:47 hpry ollama[27413]: calling cuDriverGetVersion
Apr 22 21:20:47 hpry ollama[27413]: raw version 0x2f08
Apr 22 21:20:47 hpry ollama[27413]: CUDA driver version: 12.4
Apr 22 21:20:47 hpry ollama[27413]: calling cuDeviceGetCount
Apr 22 21:20:47 hpry ollama[27413]: device count 2
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.175+08:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-de554a1a-5def-5a94-397d-5512a37432da name="NVIDIA GeForce RTX 3090" overhead="0 B" before.total="23.7 GiB" before.free="23.4 GiB" now.total="23.7 GiB" now.free="23.4 GiB" now.used="260.9 MiB"
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.243+08:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-d73a3740-e0e8-a3a5-0ce2-0388bf79bdf4 name="NVIDIA GeForce RTX 3090" overhead="0 B" before.total="23.7 GiB" before.free="23.4 GiB" now.total="23.7 GiB" now.free="23.4 GiB" now.used="260.9 MiB"
Apr 22 21:20:47 hpry ollama[27413]: releasing cuda driver library
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.243+08:00 level=WARN source=ggml.go:152 msg="key not found" key=qwen2.attention.key_length default=128
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.243+08:00 level=WARN source=ggml.go:152 msg="key not found" key=qwen2.attention.value_length default=128
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.243+08:00 level=WARN source=ggml.go:152 msg="key not found" key=qwen2.attention.key_length default=128
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.243+08:00 level=WARN source=ggml.go:152 msg="key not found" key=qwen2.attention.value_length default=128
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.243+08:00 level=INFO source=sched.go:732 msg="new model will fit in available VRAM, loading" model=/usr/share/ollama/.ollama/models/blobs/sha256-eabc98a9bcbfce7fd70f3e07de599f8fda98120fefed5881934161ede8bd1a41 library=cuda parallel=4 required="23.4 GiB"
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.243+08:00 level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="125.0 GiB" before.free="122.5 GiB" before.free_swap="8.0 GiB" now.total="125.0 GiB" now.free="122.5 GiB" now.free_swap="8.0 GiB"
Apr 22 21:20:47 hpry ollama[27413]: initializing /usr/lib/x86_64-linux-gnu/libcuda.so.550.120
Apr 22 21:20:47 hpry ollama[27413]: dlsym: cuInit - 0x7d4c95c7cbc0
Apr 22 21:20:47 hpry ollama[27413]: dlsym: cuDriverGetVersion - 0x7d4c95c7cbe0
Apr 22 21:20:47 hpry ollama[27413]: dlsym: cuDeviceGetCount - 0x7d4c95c7cc20
Apr 22 21:20:47 hpry ollama[27413]: dlsym: cuDeviceGet - 0x7d4c95c7cc00
Apr 22 21:20:47 hpry ollama[27413]: dlsym: cuDeviceGetAttribute - 0x7d4c95c7cd00
Apr 22 21:20:47 hpry ollama[27413]: dlsym: cuDeviceGetUuid - 0x7d4c95c7cc60
Apr 22 21:20:47 hpry ollama[27413]: dlsym: cuDeviceGetName - 0x7d4c95c7cc40
Apr 22 21:20:47 hpry ollama[27413]: dlsym: cuCtxCreate_v3 - 0x7d4c95c7cee0
Apr 22 21:20:47 hpry ollama[27413]: dlsym: cuMemGetInfo_v2 - 0x7d4c95c86e20
Apr 22 21:20:47 hpry ollama[27413]: dlsym: cuCtxDestroy - 0x7d4c95ce1850
Apr 22 21:20:47 hpry ollama[27413]: calling cuInit
Apr 22 21:20:47 hpry ollama[27413]: calling cuDriverGetVersion
Apr 22 21:20:47 hpry ollama[27413]: raw version 0x2f08
Apr 22 21:20:47 hpry ollama[27413]: CUDA driver version: 12.4
Apr 22 21:20:47 hpry ollama[27413]: calling cuDeviceGetCount
Apr 22 21:20:47 hpry ollama[27413]: device count 2
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.393+08:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-de554a1a-5def-5a94-397d-5512a37432da name="NVIDIA GeForce RTX 3090" overhead="0 B" before.total="23.7 GiB" before.free="23.4 GiB" now.total="23.7 GiB" now.free="23.4 GiB" now.used="260.9 MiB"
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.459+08:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-d73a3740-e0e8-a3a5-0ce2-0388bf79bdf4 name="NVIDIA GeForce RTX 3090" overhead="0 B" before.total="23.7 GiB" before.free="23.4 GiB" now.total="23.7 GiB" now.free="23.4 GiB" now.used="260.9 MiB"
Apr 22 21:20:47 hpry ollama[27413]: releasing cuda driver library
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.459+08:00 level=INFO source=server.go:105 msg="system memory" total="125.0 GiB" free="122.5 GiB" free_swap="8.0 GiB"
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.459+08:00 level=DEBUG source=memory.go:108 msg=evaluating library=cuda gpu_count=2 available="[23.4 GiB 23.4 GiB]"
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.459+08:00 level=WARN source=ggml.go:152 msg="key not found" key=qwen2.vision.block_count default=0
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.459+08:00 level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="125.0 GiB" before.free="122.5 GiB" before.free_swap="8.0 GiB" now.total="125.0 GiB" now.free="122.5 GiB" now.free_swap="8.0 GiB"
Apr 22 21:20:47 hpry ollama[27413]: initializing /usr/lib/x86_64-linux-gnu/libcuda.so.550.120
Apr 22 21:20:47 hpry ollama[27413]: dlsym: cuInit - 0x7d4c95c7cbc0
Apr 22 21:20:47 hpry ollama[27413]: dlsym: cuDriverGetVersion - 0x7d4c95c7cbe0
Apr 22 21:20:47 hpry ollama[27413]: dlsym: cuDeviceGetCount - 0x7d4c95c7cc20
Apr 22 21:20:47 hpry ollama[27413]: dlsym: cuDeviceGet - 0x7d4c95c7cc00
Apr 22 21:20:47 hpry ollama[27413]: dlsym: cuDeviceGetAttribute - 0x7d4c95c7cd00
Apr 22 21:20:47 hpry ollama[27413]: dlsym: cuDeviceGetUuid - 0x7d4c95c7cc60
Apr 22 21:20:47 hpry ollama[27413]: dlsym: cuDeviceGetName - 0x7d4c95c7cc40
Apr 22 21:20:47 hpry ollama[27413]: dlsym: cuCtxCreate_v3 - 0x7d4c95c7cee0
Apr 22 21:20:47 hpry ollama[27413]: dlsym: cuMemGetInfo_v2 - 0x7d4c95c86e20
Apr 22 21:20:47 hpry ollama[27413]: dlsym: cuCtxDestroy - 0x7d4c95ce1850
Apr 22 21:20:47 hpry ollama[27413]: calling cuInit
Apr 22 21:20:47 hpry ollama[27413]: calling cuDriverGetVersion
Apr 22 21:20:47 hpry ollama[27413]: raw version 0x2f08
Apr 22 21:20:47 hpry ollama[27413]: CUDA driver version: 12.4
Apr 22 21:20:47 hpry ollama[27413]: calling cuDeviceGetCount
Apr 22 21:20:47 hpry ollama[27413]: device count 2
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.609+08:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-de554a1a-5def-5a94-397d-5512a37432da name="NVIDIA GeForce RTX 3090" overhead="0 B" before.total="23.7 GiB" before.free="23.4 GiB" now.total="23.7 GiB" now.free="23.4 GiB" now.used="260.9 MiB"
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.669+08:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-d73a3740-e0e8-a3a5-0ce2-0388bf79bdf4 name="NVIDIA GeForce RTX 3090" overhead="0 B" before.total="23.7 GiB" before.free="23.4 GiB" now.total="23.7 GiB" now.free="23.4 GiB" now.used="260.9 MiB"
Apr 22 21:20:47 hpry ollama[27413]: releasing cuda driver library
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.669+08:00 level=WARN source=ggml.go:152 msg="key not found" key=qwen2.attention.key_length default=128
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.669+08:00 level=WARN source=ggml.go:152 msg="key not found" key=qwen2.attention.value_length default=128
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.669+08:00 level=WARN source=ggml.go:152 msg="key not found" key=qwen2.attention.key_length default=128
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.669+08:00 level=WARN source=ggml.go:152 msg="key not found" key=qwen2.attention.value_length default=128
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.669+08:00 level=INFO source=server.go:138 msg=offload library=cuda layers.requested=-1 layers.model=65 layers.offload=65 layers.split=33,32 memory.available="[23.4 GiB 23.4 GiB]" memory.gpu_overhead="0 B" memory.required.full="23.4 GiB" memory.required.partial="23.4 GiB" memory.required.kv="2.0 GiB" memory.required.allocations="[12.0 GiB 11.4 GiB]" memory.weights.total="18.1 GiB" memory.weights.repeating="17.5 GiB" memory.weights.nonrepeating="609.1 MiB" memory.graph.full="916.1 MiB" memory.graph.partial="916.1 MiB"
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.669+08:00 level=WARN source=ggml.go:152 msg="key not found" key=qwen2.attention.key_length default=128
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.669+08:00 level=WARN source=ggml.go:152 msg="key not found" key=qwen2.attention.value_length default=128
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.669+08:00 level=INFO source=server.go:185 msg="enabling flash attention"
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.669+08:00 level=WARN source=server.go:193 msg="kv cache type not supported by model" type=""
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.669+08:00 level=DEBUG source=server.go:262 msg="compatible gpu libraries" compatible=[cuda_v11]
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: loaded meta data with 34 key-value pairs and 771 tensors from /usr/share/ollama/.ollama/models/blobs/sha256-eabc98a9bcbfce7fd70f3e07de599f8fda98120fefed5881934161ede8bd1a41 (version GGUF V3 (latest))
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 0: general.architecture str = qwen2
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 1: general.type str = model
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 2: general.name str = Qwen2.5 32B Instruct
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 3: general.finetune str = Instruct
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 4: general.basename str = Qwen2.5
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 5: general.size_label str = 32B
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 6: general.license str = apache-2.0
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 7: general.license.link str = https://huggingface.co/Qwen/Qwen2.5-3...
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 8: general.base_model.count u32 = 1
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 9: general.base_model.0.name str = Qwen2.5 32B
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 10: general.base_model.0.organization str = Qwen
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 11: general.base_model.0.repo_url str = https://huggingface.co/Qwen/Qwen2.5-32B
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 12: general.tags arr[str,2] = ["chat", "text-generation"]
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 13: general.languages arr[str,1] = ["en"]
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 14: qwen2.block_count u32 = 64
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 15: qwen2.context_length u32 = 32768
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 16: qwen2.embedding_length u32 = 5120
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 17: qwen2.feed_forward_length u32 = 27648
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 18: qwen2.attention.head_count u32 = 40
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 19: qwen2.attention.head_count_kv u32 = 8
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 20: qwen2.rope.freq_base f32 = 1000000.000000
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 21: qwen2.attention.layer_norm_rms_epsilon f32 = 0.000001
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 22: general.file_type u32 = 15
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 23: tokenizer.ggml.model str = gpt2
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 24: tokenizer.ggml.pre str = qwen2
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 25: tokenizer.ggml.tokens arr[str,152064] = ["!", """, "#", "$", "%", "&", "'", ...
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 26: tokenizer.ggml.token_type arr[i32,152064] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 27: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 28: tokenizer.ggml.eos_token_id u32 = 151645
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 29: tokenizer.ggml.padding_token_id u32 = 151643
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 30: tokenizer.ggml.bos_token_id u32 = 151643
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 31: tokenizer.ggml.add_bos_token bool = false
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 32: tokenizer.chat_template str = {%- if tools %}\n {{- '<|im_start|>...
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 33: general.quantization_version u32 = 2
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - type f32: 321 tensors
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - type q4_K: 385 tensors
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - type q6_K: 65 tensors
Apr 22 21:20:47 hpry ollama[27413]: print_info: file format = GGUF V3 (latest)
Apr 22 21:20:47 hpry ollama[27413]: print_info: file type = Q4_K - Medium
Apr 22 21:20:47 hpry ollama[27413]: print_info: file size = 18.48 GiB (4.85 BPW)
Apr 22 21:20:47 hpry ollama[27413]: init_tokenizer: initializing tokenizer for type 2
Apr 22 21:20:47 hpry ollama[27413]: load: control token: 151660 '<|fim_middle|>' is not marked as EOG
Apr 22 21:20:47 hpry ollama[27413]: load: control token: 151659 '<|fim_prefix|>' is not marked as EOG
Apr 22 21:20:47 hpry ollama[27413]: load: control token: 151653 '<|vision_end|>' is not marked as EOG
Apr 22 21:20:47 hpry ollama[27413]: load: control token: 151648 '<|box_start|>' is not marked as EOG
Apr 22 21:20:47 hpry ollama[27413]: load: control token: 151646 '<|object_ref_start|>' is not marked as EOG
Apr 22 21:20:47 hpry ollama[27413]: load: control token: 151649 '<|box_end|>' is not marked as EOG
Apr 22 21:20:47 hpry ollama[27413]: load: control token: 151655 '<|image_pad|>' is not marked as EOG
Apr 22 21:20:47 hpry ollama[27413]: load: control token: 151651 '<|quad_end|>' is not marked as EOG
Apr 22 21:20:47 hpry ollama[27413]: load: control token: 151647 '<|object_ref_end|>' is not marked as EOG
Apr 22 21:20:47 hpry ollama[27413]: load: control token: 151652 '<|vision_start|>' is not marked as EOG
Apr 22 21:20:47 hpry ollama[27413]: load: control token: 151654 '<|vision_pad|>' is not marked as EOG
Apr 22 21:20:47 hpry ollama[27413]: load: control token: 151656 '<|video_pad|>' is not marked as EOG
Apr 22 21:20:47 hpry ollama[27413]: load: control token: 151644 '<|im_start|>' is not marked as EOG
Apr 22 21:20:47 hpry ollama[27413]: load: control token: 151661 '<|fim_suffix|>' is not marked as EOG
Apr 22 21:20:47 hpry ollama[27413]: load: control token: 151650 '<|quad_start|>' is not marked as EOG
Apr 22 21:20:47 hpry ollama[27413]: load: special tokens cache size = 22
Apr 22 21:20:47 hpry ollama[27413]: load: token to piece cache size = 0.9310 MB
Apr 22 21:20:47 hpry ollama[27413]: print_info: arch = qwen2
Apr 22 21:20:47 hpry ollama[27413]: print_info: vocab_only = 1
Apr 22 21:20:47 hpry ollama[27413]: print_info: model type = ?B
Apr 22 21:20:47 hpry ollama[27413]: print_info: model params = 32.76 B
Apr 22 21:20:47 hpry ollama[27413]: print_info: general.name = Qwen2.5 32B Instruct
Apr 22 21:20:47 hpry ollama[27413]: print_info: vocab type = BPE
Apr 22 21:20:47 hpry ollama[27413]: print_info: n_vocab = 152064
Apr 22 21:20:47 hpry ollama[27413]: print_info: n_merges = 151387
Apr 22 21:20:47 hpry ollama[27413]: print_info: BOS token = 151643 '<|endoftext|>'
Apr 22 21:20:47 hpry ollama[27413]: print_info: EOS token = 151645 '<|im_end|>'
Apr 22 21:20:47 hpry ollama[27413]: print_info: EOT token = 151645 '<|im_end|>'
Apr 22 21:20:47 hpry ollama[27413]: print_info: PAD token = 151643 '<|endoftext|>'
Apr 22 21:20:47 hpry ollama[27413]: print_info: LF token = 198 'Ċ'
Apr 22 21:20:47 hpry ollama[27413]: print_info: FIM PRE token = 151659 '<|fim_prefix|>'
Apr 22 21:20:47 hpry ollama[27413]: print_info: FIM SUF token = 151661 '<|fim_suffix|>'
Apr 22 21:20:47 hpry ollama[27413]: print_info: FIM MID token = 151660 '<|fim_middle|>'
Apr 22 21:20:47 hpry ollama[27413]: print_info: FIM PAD token = 151662 '<|fim_pad|>'
Apr 22 21:20:47 hpry ollama[27413]: print_info: FIM REP token = 151663 '<|repo_name|>'
Apr 22 21:20:47 hpry ollama[27413]: print_info: FIM SEP token = 151664 '<|file_sep|>'
Apr 22 21:20:47 hpry ollama[27413]: print_info: EOG token = 151643 '<|endoftext|>'
Apr 22 21:20:47 hpry ollama[27413]: print_info: EOG token = 151645 '<|im_end|>'
Apr 22 21:20:47 hpry ollama[27413]: print_info: EOG token = 151662 '<|fim_pad|>'
Apr 22 21:20:47 hpry ollama[27413]: print_info: EOG token = 151663 '<|repo_name|>'
Apr 22 21:20:47 hpry ollama[27413]: print_info: EOG token = 151664 '<|file_sep|>'
Apr 22 21:20:47 hpry ollama[27413]: print_info: max token length = 256
Apr 22 21:20:47 hpry ollama[27413]: llama_model_load: vocab only - skipping tensors
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.781+08:00 level=DEBUG source=server.go:335 msg="adding gpu library" path=/usr/local/lib/ollama/cuda_v11
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.781+08:00 level=INFO source=server.go:405 msg="starting llama server" cmd="/usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-eabc98a9bcbfce7fd70f3e07de599f8fda98120fefed5881934161ede8bd1a41 --ctx-size 8192 --batch-size 512 --n-gpu-layers 65 --verbose --threads 16 --flash-attn --parallel 4 --tensor-split 33,32 --port 44667"
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.781+08:00 level=DEBUG source=server.go:423 msg=subprocess environment="[PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin CUDA_VISIBLE_DEVICES=GPU-d73a3740-e0e8-a3a5-0ce2-0388bf79bdf4,GPU-de554a1a-5def-5a94-397d-5512a37432da LD_LIBRARY_PATH=/usr/local/lib/ollama/cuda_v11:/usr/local/lib/ollama]"
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.781+08:00 level=INFO source=sched.go:451 msg="loaded runners" count=1
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.781+08:00 level=INFO source=server.go:580 msg="waiting for llama runner to start responding"
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.781+08:00 level=INFO source=server.go:614 msg="waiting for server to become available" status="llm server error"
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.788+08:00 level=INFO source=runner.go:853 msg="starting go runner"
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.788+08:00 level=DEBUG source=ggml.go:99 msg="ggml backend load all from path" path=/usr/local/lib/ollama/cuda_v11
Apr 22 21:20:47 hpry ollama[27413]: ggml_backend_try_load_best: failed to load blas: filesystem error: status: Permission denied [/usr/local/lib/ollama/cuda_v11/libggml-blas.so]
Apr 22 21:20:47 hpry ollama[27413]: ggml_backend_try_load_best: failed to load cann: filesystem error: status: Permission denied [/usr/local/lib/ollama/cuda_v11/libggml-cann.so]
Apr 22 21:20:47 hpry ollama[27413]: ggml_backend_try_load_best: failed to load cuda: filesystem error: status: Permission denied [/usr/local/lib/ollama/cuda_v11/libggml-cuda.so]
Apr 22 21:20:47 hpry ollama[27413]: ggml_backend_try_load_best: failed to load hip: filesystem error: status: Permission denied [/usr/local/lib/ollama/cuda_v11/libggml-hip.so]
Apr 22 21:20:47 hpry ollama[27413]: ggml_backend_try_load_best: failed to load kompute: filesystem error: status: Permission denied [/usr/local/lib/ollama/cuda_v11/libggml-kompute.so]
Apr 22 21:20:47 hpry ollama[27413]: ggml_backend_try_load_best: failed to load metal: filesystem error: status: Permission denied [/usr/local/lib/ollama/cuda_v11/libggml-metal.so]
Apr 22 21:20:47 hpry ollama[27413]: ggml_backend_try_load_best: failed to load rpc: filesystem error: status: Permission denied [/usr/local/lib/ollama/cuda_v11/libggml-rpc.so]
Apr 22 21:20:47 hpry ollama[27413]: ggml_backend_try_load_best: failed to load sycl: filesystem error: status: Permission denied [/usr/local/lib/ollama/cuda_v11/libggml-sycl.so]
Apr 22 21:20:47 hpry ollama[27413]: ggml_backend_try_load_best: failed to load vulkan: filesystem error: status: Permission denied [/usr/local/lib/ollama/cuda_v11/libggml-vulkan.so]
Apr 22 21:20:47 hpry ollama[27413]: ggml_backend_try_load_best: failed to load opencl: filesystem error: status: Permission denied [/usr/local/lib/ollama/cuda_v11/libggml-opencl.so]
Apr 22 21:20:47 hpry ollama[27413]: ggml_backend_try_load_best: failed to load musa: filesystem error: status: Permission denied [/usr/local/lib/ollama/cuda_v11/libggml-musa.so]
Apr 22 21:20:47 hpry ollama[27413]: ggml_backend_try_load_best: failed to load cpu: filesystem error: status: Permission denied [/usr/local/lib/ollama/cuda_v11/libggml-cpu.so]
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.788+08:00 level=DEBUG source=ggml.go:99 msg="ggml backend load all from path" path=/usr/local/lib/ollama
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.788+08:00 level=INFO source=ggml.go:109 msg=system CPU.0.LLAMAFILE=1 compiler=cgo(gcc)
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.789+08:00 level=INFO source=runner.go:913 msg="Server listening on 127.0.0.1:44667"
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: loaded meta data with 34 key-value pairs and 771 tensors from /usr/share/ollama/.ollama/models/blobs/sha256-eabc98a9bcbfce7fd70f3e07de599f8fda98120fefed5881934161ede8bd1a41 (version GGUF V3 (latest))
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 0: general.architecture str = qwen2
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 1: general.type str = model
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 2: general.name str = Qwen2.5 32B Instruct
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 3: general.finetune str = Instruct
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 4: general.basename str = Qwen2.5
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 5: general.size_label str = 32B
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 6: general.license str = apache-2.0
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 7: general.license.link str = https://huggingface.co/Qwen/Qwen2.5-3...
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 8: general.base_model.count u32 = 1
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 9: general.base_model.0.name str = Qwen2.5 32B
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 10: general.base_model.0.organization str = Qwen
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 11: general.base_model.0.repo_url str = https://huggingface.co/Qwen/Qwen2.5-32B
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 12: general.tags arr[str,2] = ["chat", "text-generation"]
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 13: general.languages arr[str,1] = ["en"]
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 14: qwen2.block_count u32 = 64
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 15: qwen2.context_length u32 = 32768
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 16: qwen2.embedding_length u32 = 5120
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 17: qwen2.feed_forward_length u32 = 27648
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 18: qwen2.attention.head_count u32 = 40
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 19: qwen2.attention.head_count_kv u32 = 8
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 20: qwen2.rope.freq_base f32 = 1000000.000000
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 21: qwen2.attention.layer_norm_rms_epsilon f32 = 0.000001
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 22: general.file_type u32 = 15
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 23: tokenizer.ggml.model str = gpt2
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 24: tokenizer.ggml.pre str = qwen2
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 25: tokenizer.ggml.tokens arr[str,152064] = ["!", """, "#", "$", "%", "&", "'", ...
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 26: tokenizer.ggml.token_type arr[i32,152064] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 27: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 28: tokenizer.ggml.eos_token_id u32 = 151645
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 29: tokenizer.ggml.padding_token_id u32 = 151643
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 30: tokenizer.ggml.bos_token_id u32 = 151643
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 31: tokenizer.ggml.add_bos_token bool = false
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 32: tokenizer.chat_template str = {%- if tools %}\n {{- '<|im_start|>...
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 33: general.quantization_version u32 = 2
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - type f32: 321 tensors
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - type q4_K: 385 tensors
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - type q6_K: 65 tensors
Apr 22 21:20:47 hpry ollama[27413]: print_info: file format = GGUF V3 (latest)
Apr 22 21:20:47 hpry ollama[27413]: print_info: file type = Q4_K - Medium
Apr 22 21:20:47 hpry ollama[27413]: print_info: file size = 18.48 GiB (4.85 BPW)
Apr 22 21:20:47 hpry ollama[27413]: init_tokenizer: initializing tokenizer for type 2
Apr 22 21:20:47 hpry ollama[27413]: load: control token: 151660 '<|fim_middle|>' is not marked as EOG
Apr 22 21:20:47 hpry ollama[27413]: load: control token: 151659 '<|fim_prefix|>' is not marked as EOG
Apr 22 21:20:47 hpry ollama[27413]: load: control token: 151653 '<|vision_end|>' is not marked as EOG
Apr 22 21:20:47 hpry ollama[27413]: load: control token: 151648 '<|box_start|>' is not marked as EOG
Apr 22 21:20:47 hpry ollama[27413]: load: control token: 151646 '<|object_ref_start|>' is not marked as EOG
Apr 22 21:20:47 hpry ollama[27413]: load: control token: 151649 '<|box_end|>' is not marked as EOG
Apr 22 21:20:47 hpry ollama[27413]: load: control token: 151655 '<|image_pad|>' is not marked as EOG
Apr 22 21:20:47 hpry ollama[27413]: load: control token: 151651 '<|quad_end|>' is not marked as EOG
Apr 22 21:20:47 hpry ollama[27413]: load: control token: 151647 '<|object_ref_end|>' is not marked as EOG
Apr 22 21:20:47 hpry ollama[27413]: load: control token: 151652 '<|vision_start|>' is not marked as EOG
Apr 22 21:20:47 hpry ollama[27413]: load: control token: 151654 '<|vision_pad|>' is not marked as EOG
Apr 22 21:20:47 hpry ollama[27413]: load: control token: 151656 '<|video_pad|>' is not marked as EOG
Apr 22 21:20:47 hpry ollama[27413]: load: control token: 151644 '<|im_start|>' is not marked as EOG
Apr 22 21:20:47 hpry ollama[27413]: load: control token: 151661 '<|fim_suffix|>' is not marked as EOG
Apr 22 21:20:47 hpry ollama[27413]: load: control token: 151650 '<|quad_start|>' is not marked as EOG
Apr 22 21:20:47 hpry ollama[27413]: load: special tokens cache size = 22
Apr 22 21:20:47 hpry ollama[27413]: load: token to piece cache size = 0.9310 MB
Apr 22 21:20:47 hpry ollama[27413]: print_info: arch = qwen2
Apr 22 21:20:47 hpry ollama[27413]: print_info: vocab_only = 0
Apr 22 21:20:47 hpry ollama[27413]: print_info: n_ctx_train = 32768
Apr 22 21:20:47 hpry ollama[27413]: print_info: n_embd = 5120
Apr 22 21:20:47 hpry ollama[27413]: print_info: n_layer = 64
Apr 22 21:20:47 hpry ollama[27413]: print_info: n_head = 40
Apr 22 21:20:47 hpry ollama[27413]: print_info: n_head_kv = 8
Apr 22 21:20:47 hpry ollama[27413]: print_info: n_rot = 128
Apr 22 21:20:47 hpry ollama[27413]: print_info: n_swa = 0
Apr 22 21:20:47 hpry ollama[27413]: print_info: n_embd_head_k = 128
Apr 22 21:20:47 hpry ollama[27413]: print_info: n_embd_head_v = 128
Apr 22 21:20:47 hpry ollama[27413]: print_info: n_gqa = 5
Apr 22 21:20:47 hpry ollama[27413]: print_info: n_embd_k_gqa = 1024
Apr 22 21:20:47 hpry ollama[27413]: print_info: n_embd_v_gqa = 1024
Apr 22 21:20:47 hpry ollama[27413]: print_info: f_norm_eps = 0.0e+00
Apr 22 21:20:47 hpry ollama[27413]: print_info: f_norm_rms_eps = 1.0e-06
Apr 22 21:20:47 hpry ollama[27413]: print_info: f_clamp_kqv = 0.0e+00
Apr 22 21:20:47 hpry ollama[27413]: print_info: f_max_alibi_bias = 0.0e+00
Apr 22 21:20:47 hpry ollama[27413]: print_info: f_logit_scale = 0.0e+00
Apr 22 21:20:47 hpry ollama[27413]: print_info: n_ff = 27648
Apr 22 21:20:47 hpry ollama[27413]: print_info: n_expert = 0
Apr 22 21:20:47 hpry ollama[27413]: print_info: n_expert_used = 0
Apr 22 21:20:47 hpry ollama[27413]: print_info: causal attn = 1
Apr 22 21:20:47 hpry ollama[27413]: print_info: pooling type = 0
Apr 22 21:20:47 hpry ollama[27413]: print_info: rope type = 2
Apr 22 21:20:47 hpry ollama[27413]: print_info: rope scaling = linear
Apr 22 21:20:47 hpry ollama[27413]: print_info: freq_base_train = 1000000.0
Apr 22 21:20:47 hpry ollama[27413]: print_info: freq_scale_train = 1
Apr 22 21:20:47 hpry ollama[27413]: print_info: n_ctx_orig_yarn = 32768
Apr 22 21:20:47 hpry ollama[27413]: print_info: rope_finetuned = unknown
Apr 22 21:20:47 hpry ollama[27413]: print_info: ssm_d_conv = 0
Apr 22 21:20:47 hpry ollama[27413]: print_info: ssm_d_inner = 0
Apr 22 21:20:47 hpry ollama[27413]: print_info: ssm_d_state = 0
Apr 22 21:20:47 hpry ollama[27413]: print_info: ssm_dt_rank = 0
Apr 22 21:20:47 hpry ollama[27413]: print_info: ssm_dt_b_c_rms = 0
Apr 22 21:20:47 hpry ollama[27413]: print_info: model type = 32B
Apr 22 21:20:47 hpry ollama[27413]: print_info: model params = 32.76 B
Apr 22 21:20:47 hpry ollama[27413]: print_info: general.name = Qwen2.5 32B Instruct
Apr 22 21:20:47 hpry ollama[27413]: print_info: vocab type = BPE
Apr 22 21:20:47 hpry ollama[27413]: print_info: n_vocab = 152064
Apr 22 21:20:47 hpry ollama[27413]: print_info: n_merges = 151387
Apr 22 21:20:47 hpry ollama[27413]: print_info: BOS token = 151643 '<|endoftext|>'
Apr 22 21:20:47 hpry ollama[27413]: print_info: EOS token = 151645 '<|im_end|>'
Apr 22 21:20:47 hpry ollama[27413]: print_info: EOT token = 151645 '<|im_end|>'
Apr 22 21:20:47 hpry ollama[27413]: print_info: PAD token = 151643 '<|endoftext|>'
Apr 22 21:20:47 hpry ollama[27413]: print_info: LF token = 198 'Ċ'
Apr 22 21:20:47 hpry ollama[27413]: print_info: FIM PRE token = 151659 '<|fim_prefix|>'
Apr 22 21:20:47 hpry ollama[27413]: print_info: FIM SUF token = 151661 '<|fim_suffix|>'
Apr 22 21:20:47 hpry ollama[27413]: print_info: FIM MID token = 151660 '<|fim_middle|>'
Apr 22 21:20:47 hpry ollama[27413]: print_info: FIM PAD token = 151662 '<|fim_pad|>'
Apr 22 21:20:47 hpry ollama[27413]: print_info: FIM REP token = 151663 '<|repo_name|>'
Apr 22 21:20:47 hpry ollama[27413]: print_info: FIM SEP token = 151664 '<|file_sep|>'
Apr 22 21:20:47 hpry ollama[27413]: print_info: EOG token = 151643 '<|endoftext|>'
Apr 22 21:20:47 hpry ollama[27413]: print_info: EOG token = 151645 '<|im_end|>'
Apr 22 21:20:47 hpry ollama[27413]: print_info: EOG token = 151662 '<|fim_pad|>'
Apr 22 21:20:47 hpry ollama[27413]: print_info: EOG token = 151663 '<|repo_name|>'
Apr 22 21:20:47 hpry ollama[27413]: print_info: EOG token = 151664 '<|file_sep|>'
Apr 22 21:20:47 hpry ollama[27413]: print_info: max token length = 256
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: loading model tensors, this can take a while... (mmap = true)
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 0 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 1 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 2 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 3 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 4 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 5 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 6 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 7 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 8 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 9 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 10 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 11 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 12 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 13 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 14 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 15 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 16 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 17 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 18 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 19 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 20 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 21 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 22 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 23 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 24 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 25 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 26 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 27 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 28 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 29 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 30 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 31 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 32 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 33 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 34 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 35 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 36 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 37 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 38 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 39 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 40 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 41 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 42 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 43 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 44 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 45 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 46 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 47 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 48 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 49 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 50 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 51 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 52 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 53 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 54 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 55 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 56 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 57 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 58 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 59 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 60 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 61 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 62 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 63 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 64 assigned to device CPU
Apr 22 21:20:48 hpry ollama[27413]: time=2025-04-22T21:20:48.033+08:00 level=INFO source=server.go:614 msg="waiting for server to become available" status="llm server loading model"
Apr 22 21:20:48 hpry ollama[27413]: load_tensors: CPU_Mapped model buffer size = 18926.01 MiB
Apr 22 21:20:48 hpry ollama[27413]: llama_init_from_model: n_seq_max = 4
Apr 22 21:20:48 hpry ollama[27413]: llama_init_from_model: n_ctx = 8192
Apr 22 21:20:48 hpry ollama[27413]: llama_init_from_model: n_ctx_per_seq = 2048
Apr 22 21:20:48 hpry ollama[27413]: llama_init_from_model: n_batch = 2048
Apr 22 21:20:48 hpry ollama[27413]: llama_init_from_model: n_ubatch = 512
Apr 22 21:20:48 hpry ollama[27413]: llama_init_from_model: flash_attn = 1
Apr 22 21:20:48 hpry ollama[27413]: llama_init_from_model: freq_base = 1000000.0
Apr 22 21:20:48 hpry ollama[27413]: llama_init_from_model: freq_scale = 1
Apr 22 21:20:48 hpry ollama[27413]: llama_init_from_model: n_ctx_per_seq (2048) < n_ctx_train (32768) -- the full capacity of the model will not be utilized
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: kv_size = 8192, offload = 1, type_k = 'f16', type_v = 'f16', n_layer = 64, can_shift = 1
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 0: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 1: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 2: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 3: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 4: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 5: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 6: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 7: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 8: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 9: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 10: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 11: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 12: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 13: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 14: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 15: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 16: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 17: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 18: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 19: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 20: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 21: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 22: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 23: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 24: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 25: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 26: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 27: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 28: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 29: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 30: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 31: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 32: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 33: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 34: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 35: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 36: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 37: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 38: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 39: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 40: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 41: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 42: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 43: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 44: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 45: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 46: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 47: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 48: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 49: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 50: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 51: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 52: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 53: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 54: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 55: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 56: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 57: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 58: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 59: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 60: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 61: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 62: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 63: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: time=2025-04-22T21:20:48.784+08:00 level=DEBUG source=server.go:625 msg="model load progress 1.00"
Apr 22 21:20:49 hpry ollama[27413]: time=2025-04-22T21:20:49.035+08:00 level=DEBUG source=server.go:628 msg="model load completed, waiting for server to become available" status="llm server loading model"
Apr 22 21:20:49 hpry ollama[27413]: llama_kv_cache_init: CPU KV buffer size = 2048.00 MiB
Apr 22 21:20:49 hpry ollama[27413]: llama_init_from_model: KV self size = 2048.00 MiB, K (f16): 1024.00 MiB, V (f16): 1024.00 MiB
Apr 22 21:20:49 hpry ollama[27413]: llama_init_from_model: CPU output buffer size = 2.40 MiB
Apr 22 21:20:49 hpry ollama[27413]: llama_init_from_model: CPU compute buffer size = 307.00 MiB
Apr 22 21:20:49 hpry ollama[27413]: llama_init_from_model: graph nodes = 1991
Apr 22 21:20:49 hpry ollama[27413]: llama_init_from_model: graph splits = 1
Apr 22 21:20:49 hpry ollama[27413]: time=2025-04-22T21:20:49.536+08:00 level=INFO source=server.go:619 msg="llama runner started in 1.76 seconds"
Apr 22 21:20:49 hpry ollama[27413]: time=2025-04-22T21:20:49.536+08:00 level=DEBUG source=sched.go:464 msg="finished setting up runner" model=/usr/share/ollama/.ollama/models/blobs/sha256-eabc98a9bcbfce7fd70f3e07de599f8fda98120fefed5881934161ede8bd1a41
Apr 22 21:20:49 hpry ollama[27413]: [GIN] 2025/04/22 - 21:20:49 | 200 | 2.880933624s | 127.0.0.1 | POST "/api/generate"
Apr 22 21:20:49 hpry ollama[27413]: time=2025-04-22T21:20:49.536+08:00 level=DEBUG source=sched.go:468 msg="context for request finished"
Apr 22 21:20:49 hpry ollama[27413]: time=2025-04-22T21:20:49.536+08:00 level=DEBUG source=sched.go:341 msg="runner with non-zero duration has gone idle, adding timer" modelPath=/usr/share/ollama/.ollama/models/blobs/sha256-eabc98a9bcbfce7fd70f3e07de599f8fda98120fefed5881934161ede8bd1a41 duration=5m0s
Apr 22 21:20:49 hpry ollama[27413]: time=2025-04-22T21:20:49.536+08:00 level=DEBUG source=sched.go:359 msg="after processing request finished event" modelPath=/usr/share/ollama/.ollama/models/blobs/sha256-eabc98a9bcbfce7fd70f3e07de599f8fda98120fefed5881934161ede8bd1a41 refCount=0
root@hpry:~#
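Worth noting about this dump: the runner command line requests full offload (`--n-gpu-layers 65 --tensor-split 33,32`), while the `load_tensors` lines place every layer on CPU and all KV/compute buffers are CPU buffers. A one-liner to surface just that contrast from a journal capture (a sketch only; the `journalctl` flags are the ones used elsewhere in this thread):

```
# Compare the planned offload with what the runner actually did:
journalctl -u ollama --no-pager \
  | grep -E 'n-gpu-layers|ggml_backend_try_load_best|assigned to device|KV buffer size'
```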

@rick-github commented on GitHub (Apr 22, 2025):

It seems like you have some file permission problems:

```
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.788+08:00 level=DEBUG source=ggml.go:99 msg="ggml backend load all from path" path=/usr/local/lib/ollama/cuda_v11
Apr 22 21:20:47 hpry ollama[27413]: ggml_backend_try_load_best: failed to load blas: filesystem error: status: Permission denied [/usr/local/lib/ollama/cuda_v11/libggml-blas.so]
```

This is preventing the runner from loading the GPU-enabled backends, so only the basic CPU backend is used for inference:

```
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.788+08:00 level=INFO source=ggml.go:109 msg=system CPU.0.LLAMAFILE=1 compiler=cgo(gcc)
```

The reason that `ollama ps` reports "100% GPU" is that it fully expected to be able to load the GPU backends, but the permissions problem prevented that.

What's the output of the following:

```
p="/usr/local/lib/ollama/cuda_v11/libggml-blas.so"; ls -ld / $(while [ "$p" != "/" ]; do echo "$p"; p=$(dirname "$p"); done | tac)
```

@liuyixia-make commented on GitHub (Apr 22, 2025):

root@hpry:~# p="/usr/local/lib/ollama/cuda_v11/libggml-blas.so"; ls -ld / $(while [ "$p" != "/" ]; do echo "$p"; p=$(dirname "$p"); done | tac)
ls: cannot access '/usr/local/lib/ollama/cuda_v11/libggml-blas.so': No such file or directory
drwxr-xr-x 25 root root 4096 Apr 2 08:16 /
drwxr-xr-x 12 root root 4096 Feb 17 04:51 /usr
drwxr-xr-x 10 root root 4096 Feb 17 04:51 /usr/local
drwxr-xr-x 6 root root 4096 Apr 17 08:43 /usr/local/lib
drwxr-xr-x 3 ollama ollama 4096 Apr 17 08:43 /usr/local/lib/ollama
drwxr-xr-x 2 ollama ollama 4096 Apr 17 08:44 /usr/local/lib/ollama/cuda_v11


@liuyixia-make commented on GitHub (Apr 22, 2025):

I solved the permission problem, but the model is still assigned to the CPU.
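
(For reference: the exact command used isn't shown here, but a typical fix is to restore world read/execute access under the install prefix, assuming the default /usr/local layout:

sudo chmod -R a+rX /usr/local/lib/ollama
)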

root@hpry:~# journalctl -u ollama -n 500 --no-pager
Apr 22 21:35:11 hpry ollama[28685]: [GPU-d73a3740-e0e8-a3a5-0ce2-0388bf79bdf4] Compute Capability 8.6
Apr 22 21:35:11 hpry ollama[28685]: time=2025-04-22T21:35:11.156+08:00 level=WARN source=amd_linux.go:61 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
Apr 22 21:35:11 hpry ollama[28685]: time=2025-04-22T21:35:11.156+08:00 level=DEBUG source=amd_linux.go:101 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/0/properties"
Apr 22 21:35:11 hpry ollama[28685]: time=2025-04-22T21:35:11.156+08:00 level=DEBUG source=amd_linux.go:121 msg="detected CPU /sys/class/kfd/kfd/topology/nodes/0/properties"
Apr 22 21:35:11 hpry ollama[28685]: time=2025-04-22T21:35:11.156+08:00 level=DEBUG source=amd_linux.go:101 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/1/properties"
Apr 22 21:35:11 hpry ollama[28685]: time=2025-04-22T21:35:11.156+08:00 level=DEBUG source=amd_linux.go:206 msg="mapping amdgpu to drm sysfs nodes" amdgpu=/sys/class/kfd/kfd/topology/nodes/1/properties vendor=4098 device=5710 unique_id=0
Apr 22 21:35:11 hpry ollama[28685]: time=2025-04-22T21:35:11.156+08:00 level=DEBUG source=amd_linux.go:219 msg="failed to read sysfs node" file=/sys/class/drm/card0/device/vendor error="open /sys/class/drm/card0/device/vendor: no such file or directory"
Apr 22 21:35:11 hpry ollama[28685]: time=2025-04-22T21:35:11.156+08:00 level=DEBUG source=amd_linux.go:219 msg="failed to read sysfs node" file=/sys/class/drm/card0-Unknown-1/device/vendor error="open /sys/class/drm/card0-Unknown-1/device/vendor: no such file or directory"
Apr 22 21:35:11 hpry ollama[28685]: time=2025-04-22T21:35:11.156+08:00 level=DEBUG source=amd_linux.go:240 msg=matched amdgpu=/sys/class/kfd/kfd/topology/nodes/1/properties drm=/sys/class/drm/card1/device
Apr 22 21:35:11 hpry ollama[28685]: time=2025-04-22T21:35:11.156+08:00 level=INFO source=amd_linux.go:296 msg="unsupported Radeon iGPU detected skipping" id=0 total="512.0 MiB"
Apr 22 21:35:11 hpry ollama[28685]: time=2025-04-22T21:35:11.156+08:00 level=INFO source=amd_linux.go:402 msg="no compatible amdgpu devices detected"
Apr 22 21:35:11 hpry ollama[28685]: releasing cuda driver library
Apr 22 21:35:11 hpry ollama[28685]: time=2025-04-22T21:35:11.156+08:00 level=INFO source=types.go:130 msg="inference compute" id=GPU-de554a1a-5def-5a94-397d-5512a37432da library=cuda variant=v12 compute=8.6 driver=12.4 name="NVIDIA GeForce RTX 3090" total="23.7 GiB" available="23.4 GiB"
Apr 22 21:35:11 hpry ollama[28685]: time=2025-04-22T21:35:11.156+08:00 level=INFO source=types.go:130 msg="inference compute" id=GPU-d73a3740-e0e8-a3a5-0ce2-0388bf79bdf4 library=cuda variant=v12 compute=8.6 driver=12.4 name="NVIDIA GeForce RTX 3090" total="23.7 GiB" available="23.4 GiB"
Apr 22 21:35:35 hpry ollama[28685]: [GIN] 2025/04/22 - 21:35:35 | 200 | 42.629µs | 127.0.0.1 | HEAD "/"
Apr 22 21:35:35 hpry ollama[28685]: [GIN] 2025/04/22 - 21:35:35 | 200 | 18.904389ms | 127.0.0.1 | POST "/api/show"
Apr 22 21:35:35 hpry ollama[28685]: time=2025-04-22T21:35:35.422+08:00 level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="125.0 GiB" before.free="122.6 GiB" before.free_swap="8.0 GiB" now.total="125.0 GiB" now.free="122.5 GiB" now.free_swap="8.0 GiB"
Apr 22 21:35:35 hpry ollama[28685]: initializing /usr/lib/x86_64-linux-gnu/libcuda.so.550.120
Apr 22 21:35:35 hpry ollama[28685]: dlsym: cuInit - 0x72007127cbc0
Apr 22 21:35:35 hpry ollama[28685]: dlsym: cuDriverGetVersion - 0x72007127cbe0
Apr 22 21:35:35 hpry ollama[28685]: dlsym: cuDeviceGetCount - 0x72007127cc20
Apr 22 21:35:35 hpry ollama[28685]: dlsym: cuDeviceGet - 0x72007127cc00
Apr 22 21:35:35 hpry ollama[28685]: dlsym: cuDeviceGetAttribute - 0x72007127cd00
Apr 22 21:35:35 hpry ollama[28685]: dlsym: cuDeviceGetUuid - 0x72007127cc60
Apr 22 21:35:35 hpry ollama[28685]: dlsym: cuDeviceGetName - 0x72007127cc40
Apr 22 21:35:35 hpry ollama[28685]: dlsym: cuCtxCreate_v3 - 0x72007127cee0
Apr 22 21:35:35 hpry ollama[28685]: dlsym: cuMemGetInfo_v2 - 0x720071286e20
Apr 22 21:35:35 hpry ollama[28685]: dlsym: cuCtxDestroy - 0x7200712e1850
Apr 22 21:35:35 hpry ollama[28685]: calling cuInit
Apr 22 21:35:35 hpry ollama[28685]: calling cuDriverGetVersion
Apr 22 21:35:35 hpry ollama[28685]: raw version 0x2f08
Apr 22 21:35:35 hpry ollama[28685]: CUDA driver version: 12.4
Apr 22 21:35:35 hpry ollama[28685]: calling cuDeviceGetCount
Apr 22 21:35:35 hpry ollama[28685]: device count 2
Apr 22 21:35:35 hpry ollama[28685]: time=2025-04-22T21:35:35.675+08:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-de554a1a-5def-5a94-397d-5512a37432da name="NVIDIA GeForce RTX 3090" overhead="0 B" before.total="23.7 GiB" before.free="23.4 GiB" now.total="23.7 GiB" now.free="23.4 GiB" now.used="260.9 MiB"
Apr 22 21:35:35 hpry ollama[28685]: time=2025-04-22T21:35:35.749+08:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-d73a3740-e0e8-a3a5-0ce2-0388bf79bdf4 name="NVIDIA GeForce RTX 3090" overhead="0 B" before.total="23.7 GiB" before.free="23.4 GiB" now.total="23.7 GiB" now.free="23.4 GiB" now.used="260.9 MiB"
Apr 22 21:35:35 hpry ollama[28685]: releasing cuda driver library
Apr 22 21:35:35 hpry ollama[28685]: time=2025-04-22T21:35:35.749+08:00 level=DEBUG source=sched.go:183 msg="updating default concurrency" OLLAMA_MAX_LOADED_MODELS=6 gpu_count=2
Apr 22 21:35:35 hpry ollama[28685]: time=2025-04-22T21:35:35.766+08:00 level=DEBUG source=sched.go:226 msg="loading first model" model=/usr/share/ollama/.ollama/models/blobs/sha256-eabc98a9bcbfce7fd70f3e07de599f8fda98120fefed5881934161ede8bd1a41
Apr 22 21:35:35 hpry ollama[28685]: time=2025-04-22T21:35:35.766+08:00 level=DEBUG source=memory.go:108 msg=evaluating library=cuda gpu_count=2 available="[23.4 GiB 23.4 GiB]"
Apr 22 21:35:35 hpry ollama[28685]: time=2025-04-22T21:35:35.766+08:00 level=WARN source=ggml.go:152 msg="key not found" key=qwen2.vision.block_count default=0
Apr 22 21:35:35 hpry ollama[28685]: time=2025-04-22T21:35:35.766+08:00 level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="125.0 GiB" before.free="122.5 GiB" before.free_swap="8.0 GiB" now.total="125.0 GiB" now.free="122.4 GiB" now.free_swap="8.0 GiB"
Apr 22 21:35:35 hpry ollama[28685]: initializing /usr/lib/x86_64-linux-gnu/libcuda.so.550.120
Apr 22 21:35:35 hpry ollama[28685]: dlsym: cuInit - 0x72007127cbc0
Apr 22 21:35:35 hpry ollama[28685]: dlsym: cuDriverGetVersion - 0x72007127cbe0
Apr 22 21:35:35 hpry ollama[28685]: dlsym: cuDeviceGetCount - 0x72007127cc20
Apr 22 21:35:35 hpry ollama[28685]: dlsym: cuDeviceGet - 0x72007127cc00
Apr 22 21:35:35 hpry ollama[28685]: dlsym: cuDeviceGetAttribute - 0x72007127cd00
Apr 22 21:35:35 hpry ollama[28685]: dlsym: cuDeviceGetUuid - 0x72007127cc60
Apr 22 21:35:35 hpry ollama[28685]: dlsym: cuDeviceGetName - 0x72007127cc40
Apr 22 21:35:35 hpry ollama[28685]: dlsym: cuCtxCreate_v3 - 0x72007127cee0
Apr 22 21:35:35 hpry ollama[28685]: dlsym: cuMemGetInfo_v2 - 0x720071286e20
Apr 22 21:35:35 hpry ollama[28685]: dlsym: cuCtxDestroy - 0x7200712e1850
Apr 22 21:35:35 hpry ollama[28685]: calling cuInit
Apr 22 21:35:35 hpry ollama[28685]: calling cuDriverGetVersion
Apr 22 21:35:35 hpry ollama[28685]: raw version 0x2f08
Apr 22 21:35:35 hpry ollama[28685]: CUDA driver version: 12.4
Apr 22 21:35:35 hpry ollama[28685]: calling cuDeviceGetCount
Apr 22 21:35:35 hpry ollama[28685]: device count 2
Apr 22 21:35:35 hpry ollama[28685]: time=2025-04-22T21:35:35.925+08:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-de554a1a-5def-5a94-397d-5512a37432da name="NVIDIA GeForce RTX 3090" overhead="0 B" before.total="23.7 GiB" before.free="23.4 GiB" now.total="23.7 GiB" now.free="23.4 GiB" now.used="260.9 MiB"
Apr 22 21:35:35 hpry ollama[28685]: time=2025-04-22T21:35:35.988+08:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-d73a3740-e0e8-a3a5-0ce2-0388bf79bdf4 name="NVIDIA GeForce RTX 3090" overhead="0 B" before.total="23.7 GiB" before.free="23.4 GiB" now.total="23.7 GiB" now.free="23.4 GiB" now.used="260.9 MiB"
Apr 22 21:35:35 hpry ollama[28685]: releasing cuda driver library
Apr 22 21:35:35 hpry ollama[28685]: time=2025-04-22T21:35:35.988+08:00 level=WARN source=ggml.go:152 msg="key not found" key=qwen2.attention.key_length default=128
Apr 22 21:35:35 hpry ollama[28685]: time=2025-04-22T21:35:35.988+08:00 level=WARN source=ggml.go:152 msg="key not found" key=qwen2.attention.value_length default=128
Apr 22 21:35:35 hpry ollama[28685]: time=2025-04-22T21:35:35.988+08:00 level=WARN source=ggml.go:152 msg="key not found" key=qwen2.attention.key_length default=128
Apr 22 21:35:35 hpry ollama[28685]: time=2025-04-22T21:35:35.988+08:00 level=WARN source=ggml.go:152 msg="key not found" key=qwen2.attention.value_length default=128
Apr 22 21:35:35 hpry ollama[28685]: time=2025-04-22T21:35:35.988+08:00 level=INFO source=sched.go:732 msg="new model will fit in available VRAM, loading" model=/usr/share/ollama/.ollama/models/blobs/sha256-eabc98a9bcbfce7fd70f3e07de599f8fda98120fefed5881934161ede8bd1a41 library=cuda parallel=4 required="23.4 GiB"
Apr 22 21:35:35 hpry ollama[28685]: time=2025-04-22T21:35:35.988+08:00 level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="125.0 GiB" before.free="122.4 GiB" before.free_swap="8.0 GiB" now.total="125.0 GiB" now.free="122.5 GiB" now.free_swap="8.0 GiB"
Apr 22 21:35:35 hpry ollama[28685]: initializing /usr/lib/x86_64-linux-gnu/libcuda.so.550.120
Apr 22 21:35:35 hpry ollama[28685]: dlsym: cuInit - 0x72007127cbc0
Apr 22 21:35:35 hpry ollama[28685]: dlsym: cuDriverGetVersion - 0x72007127cbe0
Apr 22 21:35:35 hpry ollama[28685]: dlsym: cuDeviceGetCount - 0x72007127cc20
Apr 22 21:35:35 hpry ollama[28685]: dlsym: cuDeviceGet - 0x72007127cc00
Apr 22 21:35:35 hpry ollama[28685]: dlsym: cuDeviceGetAttribute - 0x72007127cd00
Apr 22 21:35:35 hpry ollama[28685]: dlsym: cuDeviceGetUuid - 0x72007127cc60
Apr 22 21:35:35 hpry ollama[28685]: dlsym: cuDeviceGetName - 0x72007127cc40
Apr 22 21:35:35 hpry ollama[28685]: dlsym: cuCtxCreate_v3 - 0x72007127cee0
Apr 22 21:35:35 hpry ollama[28685]: dlsym: cuMemGetInfo_v2 - 0x720071286e20
Apr 22 21:35:35 hpry ollama[28685]: dlsym: cuCtxDestroy - 0x7200712e1850
Apr 22 21:35:35 hpry ollama[28685]: calling cuInit
Apr 22 21:35:35 hpry ollama[28685]: calling cuDriverGetVersion
Apr 22 21:35:35 hpry ollama[28685]: raw version 0x2f08
Apr 22 21:35:35 hpry ollama[28685]: CUDA driver version: 12.4
Apr 22 21:35:35 hpry ollama[28685]: calling cuDeviceGetCount
Apr 22 21:35:35 hpry ollama[28685]: device count 2
Apr 22 21:35:36 hpry ollama[28685]: time=2025-04-22T21:35:36.141+08:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-de554a1a-5def-5a94-397d-5512a37432da name="NVIDIA GeForce RTX 3090" overhead="0 B" before.total="23.7 GiB" before.free="23.4 GiB" now.total="23.7 GiB" now.free="23.4 GiB" now.used="260.9 MiB"
Apr 22 21:35:36 hpry ollama[28685]: time=2025-04-22T21:35:36.205+08:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-d73a3740-e0e8-a3a5-0ce2-0388bf79bdf4 name="NVIDIA GeForce RTX 3090" overhead="0 B" before.total="23.7 GiB" before.free="23.4 GiB" now.total="23.7 GiB" now.free="23.4 GiB" now.used="260.9 MiB"
Apr 22 21:35:36 hpry ollama[28685]: releasing cuda driver library
Apr 22 21:35:36 hpry ollama[28685]: time=2025-04-22T21:35:36.205+08:00 level=INFO source=server.go:105 msg="system memory" total="125.0 GiB" free="122.5 GiB" free_swap="8.0 GiB"
Apr 22 21:35:36 hpry ollama[28685]: time=2025-04-22T21:35:36.205+08:00 level=DEBUG source=memory.go:108 msg=evaluating library=cuda gpu_count=2 available="[23.4 GiB 23.4 GiB]"
Apr 22 21:35:36 hpry ollama[28685]: time=2025-04-22T21:35:36.205+08:00 level=WARN source=ggml.go:152 msg="key not found" key=qwen2.vision.block_count default=0
Apr 22 21:35:36 hpry ollama[28685]: time=2025-04-22T21:35:36.206+08:00 level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="125.0 GiB" before.free="122.5 GiB" before.free_swap="8.0 GiB" now.total="125.0 GiB" now.free="122.5 GiB" now.free_swap="8.0 GiB"
Apr 22 21:35:36 hpry ollama[28685]: initializing /usr/lib/x86_64-linux-gnu/libcuda.so.550.120
Apr 22 21:35:36 hpry ollama[28685]: dlsym: cuInit - 0x72007127cbc0
Apr 22 21:35:36 hpry ollama[28685]: dlsym: cuDriverGetVersion - 0x72007127cbe0
Apr 22 21:35:36 hpry ollama[28685]: dlsym: cuDeviceGetCount - 0x72007127cc20
Apr 22 21:35:36 hpry ollama[28685]: dlsym: cuDeviceGet - 0x72007127cc00
Apr 22 21:35:36 hpry ollama[28685]: dlsym: cuDeviceGetAttribute - 0x72007127cd00
Apr 22 21:35:36 hpry ollama[28685]: dlsym: cuDeviceGetUuid - 0x72007127cc60
Apr 22 21:35:36 hpry ollama[28685]: dlsym: cuDeviceGetName - 0x72007127cc40
Apr 22 21:35:36 hpry ollama[28685]: dlsym: cuCtxCreate_v3 - 0x72007127cee0
Apr 22 21:35:36 hpry ollama[28685]: dlsym: cuMemGetInfo_v2 - 0x720071286e20
Apr 22 21:35:36 hpry ollama[28685]: dlsym: cuCtxDestroy - 0x7200712e1850
Apr 22 21:35:36 hpry ollama[28685]: calling cuInit
Apr 22 21:35:36 hpry ollama[28685]: calling cuDriverGetVersion
Apr 22 21:35:36 hpry ollama[28685]: raw version 0x2f08
Apr 22 21:35:36 hpry ollama[28685]: CUDA driver version: 12.4
Apr 22 21:35:36 hpry ollama[28685]: calling cuDeviceGetCount
Apr 22 21:35:36 hpry ollama[28685]: device count 2
Apr 22 21:35:36 hpry ollama[28685]: time=2025-04-22T21:35:36.355+08:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-de554a1a-5def-5a94-397d-5512a37432da name="NVIDIA GeForce RTX 3090" overhead="0 B" before.total="23.7 GiB" before.free="23.4 GiB" now.total="23.7 GiB" now.free="23.4 GiB" now.used="260.9 MiB"
Apr 22 21:35:36 hpry ollama[28685]: time=2025-04-22T21:35:36.423+08:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-d73a3740-e0e8-a3a5-0ce2-0388bf79bdf4 name="NVIDIA GeForce RTX 3090" overhead="0 B" before.total="23.7 GiB" before.free="23.4 GiB" now.total="23.7 GiB" now.free="23.4 GiB" now.used="260.9 MiB"
Apr 22 21:35:36 hpry ollama[28685]: releasing cuda driver library
Apr 22 21:35:36 hpry ollama[28685]: time=2025-04-22T21:35:36.423+08:00 level=WARN source=ggml.go:152 msg="key not found" key=qwen2.attention.key_length default=128
Apr 22 21:35:36 hpry ollama[28685]: time=2025-04-22T21:35:36.423+08:00 level=WARN source=ggml.go:152 msg="key not found" key=qwen2.attention.value_length default=128
Apr 22 21:35:36 hpry ollama[28685]: time=2025-04-22T21:35:36.423+08:00 level=WARN source=ggml.go:152 msg="key not found" key=qwen2.attention.key_length default=128
Apr 22 21:35:36 hpry ollama[28685]: time=2025-04-22T21:35:36.423+08:00 level=WARN source=ggml.go:152 msg="key not found" key=qwen2.attention.value_length default=128
Apr 22 21:35:36 hpry ollama[28685]: time=2025-04-22T21:35:36.423+08:00 level=INFO source=server.go:138 msg=offload library=cuda layers.requested=-1 layers.model=65 layers.offload=65 layers.split=33,32 memory.available="[23.4 GiB 23.4 GiB]" memory.gpu_overhead="0 B" memory.required.full="23.4 GiB" memory.required.partial="23.4 GiB" memory.required.kv="2.0 GiB" memory.required.allocations="[12.0 GiB 11.4 GiB]" memory.weights.total="18.1 GiB" memory.weights.repeating="17.5 GiB" memory.weights.nonrepeating="609.1 MiB" memory.graph.full="916.1 MiB" memory.graph.partial="916.1 MiB"
Apr 22 21:35:36 hpry ollama[28685]: time=2025-04-22T21:35:36.423+08:00 level=WARN source=ggml.go:152 msg="key not found" key=qwen2.attention.key_length default=128
Apr 22 21:35:36 hpry ollama[28685]: time=2025-04-22T21:35:36.423+08:00 level=WARN source=ggml.go:152 msg="key not found" key=qwen2.attention.value_length default=128
Apr 22 21:35:36 hpry ollama[28685]: time=2025-04-22T21:35:36.423+08:00 level=INFO source=server.go:185 msg="enabling flash attention"
Apr 22 21:35:36 hpry ollama[28685]: time=2025-04-22T21:35:36.423+08:00 level=WARN source=server.go:193 msg="kv cache type not supported by model" type=""
Apr 22 21:35:36 hpry ollama[28685]: time=2025-04-22T21:35:36.423+08:00 level=DEBUG source=server.go:262 msg="compatible gpu libraries" compatible=[cuda_v11]
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: loaded meta data with 34 key-value pairs and 771 tensors from /usr/share/ollama/.ollama/models/blobs/sha256-eabc98a9bcbfce7fd70f3e07de599f8fda98120fefed5881934161ede8bd1a41 (version GGUF V3 (latest))
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 0: general.architecture str = qwen2
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 1: general.type str = model
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 2: general.name str = Qwen2.5 32B Instruct
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 3: general.finetune str = Instruct
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 4: general.basename str = Qwen2.5
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 5: general.size_label str = 32B
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 6: general.license str = apache-2.0
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 7: general.license.link str = https://huggingface.co/Qwen/Qwen2.5-3...
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 8: general.base_model.count u32 = 1
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 9: general.base_model.0.name str = Qwen2.5 32B
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 10: general.base_model.0.organization str = Qwen
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 11: general.base_model.0.repo_url str = https://huggingface.co/Qwen/Qwen2.5-32B
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 12: general.tags arr[str,2] = ["chat", "text-generation"]
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 13: general.languages arr[str,1] = ["en"]
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 14: qwen2.block_count u32 = 64
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 15: qwen2.context_length u32 = 32768
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 16: qwen2.embedding_length u32 = 5120
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 17: qwen2.feed_forward_length u32 = 27648
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 18: qwen2.attention.head_count u32 = 40
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 19: qwen2.attention.head_count_kv u32 = 8
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 20: qwen2.rope.freq_base f32 = 1000000.000000
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 21: qwen2.attention.layer_norm_rms_epsilon f32 = 0.000001
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 22: general.file_type u32 = 15
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 23: tokenizer.ggml.model str = gpt2
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 24: tokenizer.ggml.pre str = qwen2
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 25: tokenizer.ggml.tokens arr[str,152064] = ["!", """, "#", "$", "%", "&", "'", ...
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 26: tokenizer.ggml.token_type arr[i32,152064] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 27: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 28: tokenizer.ggml.eos_token_id u32 = 151645
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 29: tokenizer.ggml.padding_token_id u32 = 151643
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 30: tokenizer.ggml.bos_token_id u32 = 151643
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 31: tokenizer.ggml.add_bos_token bool = false
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 32: tokenizer.chat_template str = {%- if tools %}\n {{- '<|im_start|>...
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 33: general.quantization_version u32 = 2
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - type f32: 321 tensors
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - type q4_K: 385 tensors
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - type q6_K: 65 tensors
Apr 22 21:35:36 hpry ollama[28685]: print_info: file format = GGUF V3 (latest)
Apr 22 21:35:36 hpry ollama[28685]: print_info: file type = Q4_K - Medium
Apr 22 21:35:36 hpry ollama[28685]: print_info: file size = 18.48 GiB (4.85 BPW)
Apr 22 21:35:36 hpry ollama[28685]: init_tokenizer: initializing tokenizer for type 2
Apr 22 21:35:36 hpry ollama[28685]: load: control token: 151660 '<|fim_middle|>' is not marked as EOG
Apr 22 21:35:36 hpry ollama[28685]: load: control token: 151659 '<|fim_prefix|>' is not marked as EOG
Apr 22 21:35:36 hpry ollama[28685]: load: control token: 151653 '<|vision_end|>' is not marked as EOG
Apr 22 21:35:36 hpry ollama[28685]: load: control token: 151648 '<|box_start|>' is not marked as EOG
Apr 22 21:35:36 hpry ollama[28685]: load: control token: 151646 '<|object_ref_start|>' is not marked as EOG
Apr 22 21:35:36 hpry ollama[28685]: load: control token: 151649 '<|box_end|>' is not marked as EOG
Apr 22 21:35:36 hpry ollama[28685]: load: control token: 151655 '<|image_pad|>' is not marked as EOG
Apr 22 21:35:36 hpry ollama[28685]: load: control token: 151651 '<|quad_end|>' is not marked as EOG
Apr 22 21:35:36 hpry ollama[28685]: load: control token: 151647 '<|object_ref_end|>' is not marked as EOG
Apr 22 21:35:36 hpry ollama[28685]: load: control token: 151652 '<|vision_start|>' is not marked as EOG
Apr 22 21:35:36 hpry ollama[28685]: load: control token: 151654 '<|vision_pad|>' is not marked as EOG
Apr 22 21:35:36 hpry ollama[28685]: load: control token: 151656 '<|video_pad|>' is not marked as EOG
Apr 22 21:35:36 hpry ollama[28685]: load: control token: 151644 '<|im_start|>' is not marked as EOG
Apr 22 21:35:36 hpry ollama[28685]: load: control token: 151661 '<|fim_suffix|>' is not marked as EOG
Apr 22 21:35:36 hpry ollama[28685]: load: control token: 151650 '<|quad_start|>' is not marked as EOG
Apr 22 21:35:36 hpry ollama[28685]: load: special tokens cache size = 22
Apr 22 21:35:36 hpry ollama[28685]: load: token to piece cache size = 0.9310 MB
Apr 22 21:35:36 hpry ollama[28685]: print_info: arch = qwen2
Apr 22 21:35:36 hpry ollama[28685]: print_info: vocab_only = 1
Apr 22 21:35:36 hpry ollama[28685]: print_info: model type = ?B
Apr 22 21:35:36 hpry ollama[28685]: print_info: model params = 32.76 B
Apr 22 21:35:36 hpry ollama[28685]: print_info: general.name = Qwen2.5 32B Instruct
Apr 22 21:35:36 hpry ollama[28685]: print_info: vocab type = BPE
Apr 22 21:35:36 hpry ollama[28685]: print_info: n_vocab = 152064
Apr 22 21:35:36 hpry ollama[28685]: print_info: n_merges = 151387
Apr 22 21:35:36 hpry ollama[28685]: print_info: BOS token = 151643 '<|endoftext|>'
Apr 22 21:35:36 hpry ollama[28685]: print_info: EOS token = 151645 '<|im_end|>'
Apr 22 21:35:36 hpry ollama[28685]: print_info: EOT token = 151645 '<|im_end|>'
Apr 22 21:35:36 hpry ollama[28685]: print_info: PAD token = 151643 '<|endoftext|>'
Apr 22 21:35:36 hpry ollama[28685]: print_info: LF token = 198 'Ċ'
Apr 22 21:35:36 hpry ollama[28685]: print_info: FIM PRE token = 151659 '<|fim_prefix|>'
Apr 22 21:35:36 hpry ollama[28685]: print_info: FIM SUF token = 151661 '<|fim_suffix|>'
Apr 22 21:35:36 hpry ollama[28685]: print_info: FIM MID token = 151660 '<|fim_middle|>'
Apr 22 21:35:36 hpry ollama[28685]: print_info: FIM PAD token = 151662 '<|fim_pad|>'
Apr 22 21:35:36 hpry ollama[28685]: print_info: FIM REP token = 151663 '<|repo_name|>'
Apr 22 21:35:36 hpry ollama[28685]: print_info: FIM SEP token = 151664 '<|file_sep|>'
Apr 22 21:35:36 hpry ollama[28685]: print_info: EOG token = 151643 '<|endoftext|>'
Apr 22 21:35:36 hpry ollama[28685]: print_info: EOG token = 151645 '<|im_end|>'
Apr 22 21:35:36 hpry ollama[28685]: print_info: EOG token = 151662 '<|fim_pad|>'
Apr 22 21:35:36 hpry ollama[28685]: print_info: EOG token = 151663 '<|repo_name|>'
Apr 22 21:35:36 hpry ollama[28685]: print_info: EOG token = 151664 '<|file_sep|>'
Apr 22 21:35:36 hpry ollama[28685]: print_info: max token length = 256
Apr 22 21:35:36 hpry ollama[28685]: llama_model_load: vocab only - skipping tensors
Apr 22 21:35:36 hpry ollama[28685]: time=2025-04-22T21:35:36.534+08:00 level=DEBUG source=server.go:335 msg="adding gpu library" path=/usr/local/lib/ollama/cuda_v11
Apr 22 21:35:36 hpry ollama[28685]: time=2025-04-22T21:35:36.534+08:00 level=INFO source=server.go:405 msg="starting llama server" cmd="/usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-eabc98a9bcbfce7fd70f3e07de599f8fda98120fefed5881934161ede8bd1a41 --ctx-size 8192 --batch-size 512 --n-gpu-layers 65 --verbose --threads 16 --flash-attn --parallel 4 --tensor-split 33,32 --port 37071"
Apr 22 21:35:36 hpry ollama[28685]: time=2025-04-22T21:35:36.534+08:00 level=DEBUG source=server.go:423 msg=subprocess environment="[PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin CUDA_VISIBLE_DEVICES=GPU-d73a3740-e0e8-a3a5-0ce2-0388bf79bdf4,GPU-de554a1a-5def-5a94-397d-5512a37432da LD_LIBRARY_PATH=/usr/local/lib/ollama/cuda_v11:/usr/local/lib/ollama]"
Apr 22 21:35:36 hpry ollama[28685]: time=2025-04-22T21:35:36.534+08:00 level=INFO source=sched.go:451 msg="loaded runners" count=1
Apr 22 21:35:36 hpry ollama[28685]: time=2025-04-22T21:35:36.535+08:00 level=INFO source=server.go:580 msg="waiting for llama runner to start responding"
Apr 22 21:35:36 hpry ollama[28685]: time=2025-04-22T21:35:36.535+08:00 level=INFO source=server.go:614 msg="waiting for server to become available" status="llm server error"
Apr 22 21:35:36 hpry ollama[28685]: time=2025-04-22T21:35:36.542+08:00 level=INFO source=runner.go:853 msg="starting go runner"
Apr 22 21:35:36 hpry ollama[28685]: time=2025-04-22T21:35:36.542+08:00 level=DEBUG source=ggml.go:99 msg="ggml backend load all from path" path=/usr/local/lib/ollama/cuda_v11
Apr 22 21:35:36 hpry ollama[28685]: time=2025-04-22T21:35:36.543+08:00 level=DEBUG source=ggml.go:99 msg="ggml backend load all from path" path=/usr/local/lib/ollama
Apr 22 21:35:36 hpry ollama[28685]: time=2025-04-22T21:35:36.543+08:00 level=INFO source=ggml.go:109 msg=system CPU.0.LLAMAFILE=1 compiler=cgo(gcc)
Apr 22 21:35:36 hpry ollama[28685]: time=2025-04-22T21:35:36.543+08:00 level=INFO source=runner.go:913 msg="Server listening on 127.0.0.1:37071"
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: loaded meta data with 34 key-value pairs and 771 tensors from /usr/share/ollama/.ollama/models/blobs/sha256-eabc98a9bcbfce7fd70f3e07de599f8fda98120fefed5881934161ede8bd1a41 (version GGUF V3 (latest))
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 0: general.architecture str = qwen2
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 1: general.type str = model
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 2: general.name str = Qwen2.5 32B Instruct
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 3: general.finetune str = Instruct
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 4: general.basename str = Qwen2.5
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 5: general.size_label str = 32B
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 6: general.license str = apache-2.0
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 7: general.license.link str = https://huggingface.co/Qwen/Qwen2.5-3...
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 8: general.base_model.count u32 = 1
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 9: general.base_model.0.name str = Qwen2.5 32B
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 10: general.base_model.0.organization str = Qwen
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 11: general.base_model.0.repo_url str = https://huggingface.co/Qwen/Qwen2.5-32B
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 12: general.tags arr[str,2] = ["chat", "text-generation"]
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 13: general.languages arr[str,1] = ["en"]
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 14: qwen2.block_count u32 = 64
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 15: qwen2.context_length u32 = 32768
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 16: qwen2.embedding_length u32 = 5120
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 17: qwen2.feed_forward_length u32 = 27648
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 18: qwen2.attention.head_count u32 = 40
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 19: qwen2.attention.head_count_kv u32 = 8
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 20: qwen2.rope.freq_base f32 = 1000000.000000
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 21: qwen2.attention.layer_norm_rms_epsilon f32 = 0.000001
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 22: general.file_type u32 = 15
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 23: tokenizer.ggml.model str = gpt2
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 24: tokenizer.ggml.pre str = qwen2
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 25: tokenizer.ggml.tokens arr[str,152064] = ["!", """, "#", "$", "%", "&", "'", ...
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 26: tokenizer.ggml.token_type arr[i32,152064] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 27: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 28: tokenizer.ggml.eos_token_id u32 = 151645
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 29: tokenizer.ggml.padding_token_id u32 = 151643
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 30: tokenizer.ggml.bos_token_id u32 = 151643
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 31: tokenizer.ggml.add_bos_token bool = false
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 32: tokenizer.chat_template str = {%- if tools %}\n {{- '<|im_start|>...
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 33: general.quantization_version u32 = 2
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - type f32: 321 tensors
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - type q4_K: 385 tensors
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - type q6_K: 65 tensors
Apr 22 21:35:36 hpry ollama[28685]: print_info: file format = GGUF V3 (latest)
Apr 22 21:35:36 hpry ollama[28685]: print_info: file type = Q4_K - Medium
Apr 22 21:35:36 hpry ollama[28685]: print_info: file size = 18.48 GiB (4.85 BPW)
Apr 22 21:35:36 hpry ollama[28685]: init_tokenizer: initializing tokenizer for type 2
Apr 22 21:35:36 hpry ollama[28685]: load: control token: 151660 '<|fim_middle|>' is not marked as EOG
Apr 22 21:35:36 hpry ollama[28685]: load: control token: 151659 '<|fim_prefix|>' is not marked as EOG
Apr 22 21:35:36 hpry ollama[28685]: load: control token: 151653 '<|vision_end|>' is not marked as EOG
Apr 22 21:35:36 hpry ollama[28685]: load: control token: 151648 '<|box_start|>' is not marked as EOG
Apr 22 21:35:36 hpry ollama[28685]: load: control token: 151646 '<|object_ref_start|>' is not marked as EOG
Apr 22 21:35:36 hpry ollama[28685]: load: control token: 151649 '<|box_end|>' is not marked as EOG
Apr 22 21:35:36 hpry ollama[28685]: load: control token: 151655 '<|image_pad|>' is not marked as EOG
Apr 22 21:35:36 hpry ollama[28685]: load: control token: 151651 '<|quad_end|>' is not marked as EOG
Apr 22 21:35:36 hpry ollama[28685]: load: control token: 151647 '<|object_ref_end|>' is not marked as EOG
Apr 22 21:35:36 hpry ollama[28685]: load: control token: 151652 '<|vision_start|>' is not marked as EOG
Apr 22 21:35:36 hpry ollama[28685]: load: control token: 151654 '<|vision_pad|>' is not marked as EOG
Apr 22 21:35:36 hpry ollama[28685]: load: control token: 151656 '<|video_pad|>' is not marked as EOG
Apr 22 21:35:36 hpry ollama[28685]: load: control token: 151644 '<|im_start|>' is not marked as EOG
Apr 22 21:35:36 hpry ollama[28685]: load: control token: 151661 '<|fim_suffix|>' is not marked as EOG
Apr 22 21:35:36 hpry ollama[28685]: load: control token: 151650 '<|quad_start|>' is not marked as EOG
Apr 22 21:35:36 hpry ollama[28685]: load: special tokens cache size = 22
Apr 22 21:35:36 hpry ollama[28685]: load: token to piece cache size = 0.9310 MB
Apr 22 21:35:36 hpry ollama[28685]: print_info: arch = qwen2
Apr 22 21:35:36 hpry ollama[28685]: print_info: vocab_only = 0
Apr 22 21:35:36 hpry ollama[28685]: print_info: n_ctx_train = 32768
Apr 22 21:35:36 hpry ollama[28685]: print_info: n_embd = 5120
Apr 22 21:35:36 hpry ollama[28685]: print_info: n_layer = 64
Apr 22 21:35:36 hpry ollama[28685]: print_info: n_head = 40
Apr 22 21:35:36 hpry ollama[28685]: print_info: n_head_kv = 8
Apr 22 21:35:36 hpry ollama[28685]: print_info: n_rot = 128
Apr 22 21:35:36 hpry ollama[28685]: print_info: n_swa = 0
Apr 22 21:35:36 hpry ollama[28685]: print_info: n_embd_head_k = 128
Apr 22 21:35:36 hpry ollama[28685]: print_info: n_embd_head_v = 128
Apr 22 21:35:36 hpry ollama[28685]: print_info: n_gqa = 5
Apr 22 21:35:36 hpry ollama[28685]: print_info: n_embd_k_gqa = 1024
Apr 22 21:35:36 hpry ollama[28685]: print_info: n_embd_v_gqa = 1024
Apr 22 21:35:36 hpry ollama[28685]: print_info: f_norm_eps = 0.0e+00
Apr 22 21:35:36 hpry ollama[28685]: print_info: f_norm_rms_eps = 1.0e-06
Apr 22 21:35:36 hpry ollama[28685]: print_info: f_clamp_kqv = 0.0e+00
Apr 22 21:35:36 hpry ollama[28685]: print_info: f_max_alibi_bias = 0.0e+00
Apr 22 21:35:36 hpry ollama[28685]: print_info: f_logit_scale = 0.0e+00
Apr 22 21:35:36 hpry ollama[28685]: print_info: n_ff = 27648
Apr 22 21:35:36 hpry ollama[28685]: print_info: n_expert = 0
Apr 22 21:35:36 hpry ollama[28685]: print_info: n_expert_used = 0
Apr 22 21:35:36 hpry ollama[28685]: print_info: causal attn = 1
Apr 22 21:35:36 hpry ollama[28685]: print_info: pooling type = 0
Apr 22 21:35:36 hpry ollama[28685]: print_info: rope type = 2
Apr 22 21:35:36 hpry ollama[28685]: print_info: rope scaling = linear
Apr 22 21:35:36 hpry ollama[28685]: print_info: freq_base_train = 1000000.0
Apr 22 21:35:36 hpry ollama[28685]: print_info: freq_scale_train = 1
Apr 22 21:35:36 hpry ollama[28685]: print_info: n_ctx_orig_yarn = 32768
Apr 22 21:35:36 hpry ollama[28685]: print_info: rope_finetuned = unknown
Apr 22 21:35:36 hpry ollama[28685]: print_info: ssm_d_conv = 0
Apr 22 21:35:36 hpry ollama[28685]: print_info: ssm_d_inner = 0
Apr 22 21:35:36 hpry ollama[28685]: print_info: ssm_d_state = 0
Apr 22 21:35:36 hpry ollama[28685]: print_info: ssm_dt_rank = 0
Apr 22 21:35:36 hpry ollama[28685]: print_info: ssm_dt_b_c_rms = 0
Apr 22 21:35:36 hpry ollama[28685]: print_info: model type = 32B
Apr 22 21:35:36 hpry ollama[28685]: print_info: model params = 32.76 B
Apr 22 21:35:36 hpry ollama[28685]: print_info: general.name = Qwen2.5 32B Instruct
Apr 22 21:35:36 hpry ollama[28685]: print_info: vocab type = BPE
Apr 22 21:35:36 hpry ollama[28685]: print_info: n_vocab = 152064
Apr 22 21:35:36 hpry ollama[28685]: print_info: n_merges = 151387
Apr 22 21:35:36 hpry ollama[28685]: print_info: BOS token = 151643 '<|endoftext|>'
Apr 22 21:35:36 hpry ollama[28685]: print_info: EOS token = 151645 '<|im_end|>'
Apr 22 21:35:36 hpry ollama[28685]: print_info: EOT token = 151645 '<|im_end|>'
Apr 22 21:35:36 hpry ollama[28685]: print_info: PAD token = 151643 '<|endoftext|>'
Apr 22 21:35:36 hpry ollama[28685]: print_info: LF token = 198 'Ċ'
Apr 22 21:35:36 hpry ollama[28685]: print_info: FIM PRE token = 151659 '<|fim_prefix|>'
Apr 22 21:35:36 hpry ollama[28685]: print_info: FIM SUF token = 151661 '<|fim_suffix|>'
Apr 22 21:35:36 hpry ollama[28685]: print_info: FIM MID token = 151660 '<|fim_middle|>'
Apr 22 21:35:36 hpry ollama[28685]: print_info: FIM PAD token = 151662 '<|fim_pad|>'
Apr 22 21:35:36 hpry ollama[28685]: print_info: FIM REP token = 151663 '<|repo_name|>'
Apr 22 21:35:36 hpry ollama[28685]: print_info: FIM SEP token = 151664 '<|file_sep|>'
Apr 22 21:35:36 hpry ollama[28685]: print_info: EOG token = 151643 '<|endoftext|>'
Apr 22 21:35:36 hpry ollama[28685]: print_info: EOG token = 151645 '<|im_end|>'
Apr 22 21:35:36 hpry ollama[28685]: print_info: EOG token = 151662 '<|fim_pad|>'
Apr 22 21:35:36 hpry ollama[28685]: print_info: EOG token = 151663 '<|repo_name|>'
Apr 22 21:35:36 hpry ollama[28685]: print_info: EOG token = 151664 '<|file_sep|>'
Apr 22 21:35:36 hpry ollama[28685]: print_info: max token length = 256
Apr 22 21:35:36 hpry ollama[28685]: load_tensors: loading model tensors, this can take a while... (mmap = true)
Apr 22 21:35:36 hpry ollama[28685]: load_tensors: layer 0 assigned to device CPU
Apr 22 21:35:36 hpry ollama[28685]: load_tensors: layer 1 assigned to device CPU
Apr 22 21:35:36 hpry ollama[28685]: load_tensors: layer 2 assigned to device CPU
...
...

Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 0: general.architecture str = qwen2 Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 1: general.type str = model Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 2: general.name str = Qwen2.5 32B Instruct Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 3: general.finetune str = Instruct Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 4: general.basename str = Qwen2.5 Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 5: general.size_label str = 32B Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 6: general.license str = apache-2.0 Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 7: general.license.link str = https://huggingface.co/Qwen/Qwen2.5-3... Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 8: general.base_model.count u32 = 1 Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 9: general.base_model.0.name str = Qwen2.5 32B Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 10: general.base_model.0.organization str = Qwen Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 11: general.base_model.0.repo_url str = https://huggingface.co/Qwen/Qwen2.5-32B Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 12: general.tags arr[str,2] = ["chat", "text-generation"] Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 13: general.languages arr[str,1] = ["en"] Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 14: qwen2.block_count u32 = 64 Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 15: qwen2.context_length u32 = 32768 Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 16: qwen2.embedding_length u32 = 5120 Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 17: qwen2.feed_forward_length u32 = 27648 Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 18: qwen2.attention.head_count u32 = 40 Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 19: qwen2.attention.head_count_kv u32 = 8 Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 20: qwen2.rope.freq_base f32 = 1000000.000000 Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 21: qwen2.attention.layer_norm_rms_epsilon f32 = 0.000001 Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 22: general.file_type u32 = 15 Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 23: tokenizer.ggml.model str = gpt2 Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 24: tokenizer.ggml.pre str = qwen2 Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 25: tokenizer.ggml.tokens arr[str,152064] = ["!", "\"", "#", "$", "%", "&", "'", ... Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 26: tokenizer.ggml.token_type arr[i32,152064] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 27: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",... 
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 28: tokenizer.ggml.eos_token_id u32 = 151645 Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 29: tokenizer.ggml.padding_token_id u32 = 151643 Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 30: tokenizer.ggml.bos_token_id u32 = 151643 Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 31: tokenizer.ggml.add_bos_token bool = false Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 32: tokenizer.chat_template str = {%- if tools %}\n {{- '<|im_start|>... Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 33: general.quantization_version u32 = 2 Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - type f32: 321 tensors Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - type q4_K: 385 tensors Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - type q6_K: 65 tensors Apr 22 21:35:36 hpry ollama[28685]: print_info: file format = GGUF V3 (latest) Apr 22 21:35:36 hpry ollama[28685]: print_info: file type = Q4_K - Medium Apr 22 21:35:36 hpry ollama[28685]: print_info: file size = 18.48 GiB (4.85 BPW) Apr 22 21:35:36 hpry ollama[28685]: init_tokenizer: initializing tokenizer for type 2 Apr 22 21:35:36 hpry ollama[28685]: load: control token: 151660 '<|fim_middle|>' is not marked as EOG Apr 22 21:35:36 hpry ollama[28685]: load: control token: 151659 '<|fim_prefix|>' is not marked as EOG Apr 22 21:35:36 hpry ollama[28685]: load: control token: 151653 '<|vision_end|>' is not marked as EOG Apr 22 21:35:36 hpry ollama[28685]: load: control token: 151648 '<|box_start|>' is not marked as EOG Apr 22 21:35:36 hpry ollama[28685]: load: control token: 151646 '<|object_ref_start|>' is not marked as EOG Apr 22 21:35:36 hpry ollama[28685]: load: control token: 151649 '<|box_end|>' is not marked as EOG Apr 22 21:35:36 hpry ollama[28685]: load: control token: 151655 '<|image_pad|>' is not marked as EOG Apr 22 21:35:36 hpry ollama[28685]: load: control token: 151651 '<|quad_end|>' is not marked as EOG Apr 22 21:35:36 hpry ollama[28685]: load: control token: 151647 '<|object_ref_end|>' is not marked as EOG Apr 22 21:35:36 hpry ollama[28685]: load: control token: 151652 '<|vision_start|>' is not marked as EOG Apr 22 21:35:36 hpry ollama[28685]: load: control token: 151654 '<|vision_pad|>' is not marked as EOG Apr 22 21:35:36 hpry ollama[28685]: load: control token: 151656 '<|video_pad|>' is not marked as EOG Apr 22 21:35:36 hpry ollama[28685]: load: control token: 151644 '<|im_start|>' is not marked as EOG Apr 22 21:35:36 hpry ollama[28685]: load: control token: 151661 '<|fim_suffix|>' is not marked as EOG Apr 22 21:35:36 hpry ollama[28685]: load: control token: 151650 '<|quad_start|>' is not marked as EOG Apr 22 21:35:36 hpry ollama[28685]: load: special tokens cache size = 22 Apr 22 21:35:36 hpry ollama[28685]: load: token to piece cache size = 0.9310 MB Apr 22 21:35:36 hpry ollama[28685]: print_info: arch = qwen2 Apr 22 21:35:36 hpry ollama[28685]: print_info: vocab_only = 0 Apr 22 21:35:36 hpry ollama[28685]: print_info: n_ctx_train = 32768 Apr 22 21:35:36 hpry ollama[28685]: print_info: n_embd = 5120 Apr 22 21:35:36 hpry ollama[28685]: print_info: n_layer = 64 Apr 22 21:35:36 hpry ollama[28685]: print_info: n_head = 40 Apr 22 21:35:36 hpry ollama[28685]: print_info: n_head_kv = 8 Apr 22 21:35:36 hpry ollama[28685]: print_info: n_rot = 128 Apr 22 21:35:36 hpry ollama[28685]: print_info: n_swa = 0 Apr 22 21:35:36 hpry ollama[28685]: print_info: n_embd_head_k = 128 Apr 22 21:35:36 hpry 
ollama[28685]: print_info: n_embd_head_v = 128 Apr 22 21:35:36 hpry ollama[28685]: print_info: n_gqa = 5 Apr 22 21:35:36 hpry ollama[28685]: print_info: n_embd_k_gqa = 1024 Apr 22 21:35:36 hpry ollama[28685]: print_info: n_embd_v_gqa = 1024 Apr 22 21:35:36 hpry ollama[28685]: print_info: f_norm_eps = 0.0e+00 Apr 22 21:35:36 hpry ollama[28685]: print_info: f_norm_rms_eps = 1.0e-06 Apr 22 21:35:36 hpry ollama[28685]: print_info: f_clamp_kqv = 0.0e+00 Apr 22 21:35:36 hpry ollama[28685]: print_info: f_max_alibi_bias = 0.0e+00 Apr 22 21:35:36 hpry ollama[28685]: print_info: f_logit_scale = 0.0e+00 Apr 22 21:35:36 hpry ollama[28685]: print_info: n_ff = 27648 Apr 22 21:35:36 hpry ollama[28685]: print_info: n_expert = 0 Apr 22 21:35:36 hpry ollama[28685]: print_info: n_expert_used = 0 Apr 22 21:35:36 hpry ollama[28685]: print_info: causal attn = 1 Apr 22 21:35:36 hpry ollama[28685]: print_info: pooling type = 0 Apr 22 21:35:36 hpry ollama[28685]: print_info: rope type = 2 Apr 22 21:35:36 hpry ollama[28685]: print_info: rope scaling = linear Apr 22 21:35:36 hpry ollama[28685]: print_info: freq_base_train = 1000000.0 Apr 22 21:35:36 hpry ollama[28685]: print_info: freq_scale_train = 1 Apr 22 21:35:36 hpry ollama[28685]: print_info: n_ctx_orig_yarn = 32768 Apr 22 21:35:36 hpry ollama[28685]: print_info: rope_finetuned = unknown Apr 22 21:35:36 hpry ollama[28685]: print_info: ssm_d_conv = 0 Apr 22 21:35:36 hpry ollama[28685]: print_info: ssm_d_inner = 0 Apr 22 21:35:36 hpry ollama[28685]: print_info: ssm_d_state = 0 Apr 22 21:35:36 hpry ollama[28685]: print_info: ssm_dt_rank = 0 Apr 22 21:35:36 hpry ollama[28685]: print_info: ssm_dt_b_c_rms = 0 Apr 22 21:35:36 hpry ollama[28685]: print_info: model type = 32B Apr 22 21:35:36 hpry ollama[28685]: print_info: model params = 32.76 B Apr 22 21:35:36 hpry ollama[28685]: print_info: general.name = Qwen2.5 32B Instruct Apr 22 21:35:36 hpry ollama[28685]: print_info: vocab type = BPE Apr 22 21:35:36 hpry ollama[28685]: print_info: n_vocab = 152064 Apr 22 21:35:36 hpry ollama[28685]: print_info: n_merges = 151387 Apr 22 21:35:36 hpry ollama[28685]: print_info: BOS token = 151643 '<|endoftext|>' Apr 22 21:35:36 hpry ollama[28685]: print_info: EOS token = 151645 '<|im_end|>' Apr 22 21:35:36 hpry ollama[28685]: print_info: EOT token = 151645 '<|im_end|>' Apr 22 21:35:36 hpry ollama[28685]: print_info: PAD token = 151643 '<|endoftext|>' Apr 22 21:35:36 hpry ollama[28685]: print_info: LF token = 198 'Ċ' Apr 22 21:35:36 hpry ollama[28685]: print_info: FIM PRE token = 151659 '<|fim_prefix|>' Apr 22 21:35:36 hpry ollama[28685]: print_info: FIM SUF token = 151661 '<|fim_suffix|>' Apr 22 21:35:36 hpry ollama[28685]: print_info: FIM MID token = 151660 '<|fim_middle|>' Apr 22 21:35:36 hpry ollama[28685]: print_info: FIM PAD token = 151662 '<|fim_pad|>' Apr 22 21:35:36 hpry ollama[28685]: print_info: FIM REP token = 151663 '<|repo_name|>' Apr 22 21:35:36 hpry ollama[28685]: print_info: FIM SEP token = 151664 '<|file_sep|>' Apr 22 21:35:36 hpry ollama[28685]: print_info: EOG token = 151643 '<|endoftext|>' Apr 22 21:35:36 hpry ollama[28685]: print_info: EOG token = 151645 '<|im_end|>' Apr 22 21:35:36 hpry ollama[28685]: print_info: EOG token = 151662 '<|fim_pad|>' Apr 22 21:35:36 hpry ollama[28685]: print_info: EOG token = 151663 '<|repo_name|>' Apr 22 21:35:36 hpry ollama[28685]: print_info: EOG token = 151664 '<|file_sep|>' Apr 22 21:35:36 hpry ollama[28685]: print_info: max token length = 256 Apr 22 21:35:36 hpry ollama[28685]: load_tensors: loading model tensors, this can 
take a while... (mmap = true) Apr 22 21:35:36 hpry ollama[28685]: load_tensors: layer 0 assigned to device CPU Apr 22 21:35:36 hpry ollama[28685]: load_tensors: layer 1 assigned to device CPU Apr 22 21:35:36 hpry ollama[28685]: load_tensors: layer 2 assigned to device CPU ... ...
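
Even in this debug trace the contradiction is self-contained: the scheduler plans a full offload (msg=offload library=cuda layers.offload=65) and points the runner at the CUDA directory (ggml backend load all from path .../cuda_v11), yet the runner's system info line reports only a CPU backend and every layer is assigned to device CPU, which is what you would expect if the ggml CUDA backend failed to load. A quick way to pull just those lines out of the journal (a sketch, assuming the default systemd unit name for the Linux installer):

journalctl -u ollama --no-pager | grep -E 'msg=offload|ggml backend load|msg=system|assigned to device'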

@rick-github commented on GitHub (Apr 22, 2025):

ls: cannot access '/usr/local/lib/ollama/cuda_v11/libggml-blas.so': No such file or directory

Your installation appears to be incomplete.
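
If that is the case, re-running the official install script should replace the runner libraries in place; afterwards the missing BLAS backend from the check above ought to exist (a sketch, using the standard Linux install command):

curl -fsSL https://ollama.com/install.sh | sh
ls -l /usr/local/lib/ollama/cuda_v11/libggml-blas.so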


@liuyixia-make commented on GitHub (Apr 22, 2025):

ls: cannot access '/usr/local/lib/ollama/cuda_v11/libggml-blas.so': No such file or directory

Your installation appears to be incomplete.

Something doesn't seem right. Here is what the directory actually contains:

root@hpry:/usr/local/lib/ollama/cuda_v11# ls -l /usr/local/lib/ollama/cuda_v11
total 1010100
-rwxr-xr-x 1 ollama ollama 93134848 Apr 17 08:44 libcublasLt.so.11.5.1.109
lrwxrwxrwx 1 ollama ollama 23 Apr 7 12:34 libcublas.so.11 -> libcublas.so.11.5.1.109
-rwxr-xr-x 1 ollama ollama 121866104 May 5 2021 libcublas.so.11.5.1.109
-rwxr-xr-x 1 ollama ollama 819327824 Apr 7 12:34 libggml-cuda.so
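
That listing has only four entries, so more than libggml-blas.so is absent (there is no libcublasLt.so.11 symlink either). One way to see the full delta is to diff the directory against a freshly extracted release tarball; a sketch, assuming the standard ollama-linux-amd64.tgz layout (adjust the URL and paths to match your version and architecture):

curl -fsSL -o /tmp/ollama-linux-amd64.tgz https://ollama.com/download/ollama-linux-amd64.tgz
mkdir -p /tmp/ollama-fresh
tar -C /tmp/ollama-fresh -xzf /tmp/ollama-linux-amd64.tgz
diff <(ls /tmp/ollama-fresh/lib/ollama/cuda_v11) <(ls /usr/local/lib/ollama/cuda_v11)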


@shaofanqi commented on GitHub (Apr 23, 2025):

Please help.

4月 23 17:39:17 ollama[2069]: time=2025-04-23T17:39:17.821+08:00 level=DEBUG source=gpu.go:406 msg="updating system memory data" before.total="251.5 GiB" before.free="246.1 GiB" before.free_swap="2.0 GiB" now.total="251.5 GiB" now.free="246.1 GiB" now.free_swap="2.0 GiB"
4月 23 17:39:17 ollama[2069]: initializing /usr/lib/x86_64-linux-gnu/libcuda.so.550.54.15
4月 23 17:39:17 ollama[2069]: dlsym: cuInit - 0x7f6c62679ef0
4月 23 17:39:17 ollama[2069]: dlsym: cuDriverGetVersion - 0x7f6c62679f10
4月 23 17:39:17 ollama[2069]: dlsym: cuDeviceGetCount - 0x7f6c62679f50
4月 23 17:39:17 ollama[2069]: dlsym: cuDeviceGet - 0x7f6c62679f30
4月 23 17:39:17 ollama[2069]: dlsym: cuDeviceGetAttribute - 0x7f6c6267a030
4月 23 17:39:17 ollama[2069]: dlsym: cuDeviceGetUuid - 0x7f6c62679f90
4月 23 17:39:17 ollama[2069]: dlsym: cuDeviceGetName - 0x7f6c62679f70
4月 23 17:39:17 ollama[2069]: dlsym: cuCtxCreate_v3 - 0x7f6c6267a210
4月 23 17:39:17 ollama[2069]: dlsym: cuMemGetInfo_v2 - 0x7f6c62684190
4月 23 17:39:17 ollama[2069]: dlsym: cuCtxDestroy - 0x7f6c626de7f0
4月 23 17:39:17 ollama[2069]: calling cuInit
4月 23 17:39:17 ollama[2069]: calling cuDriverGetVersion
4月 23 17:39:17 ollama[2069]: raw version 0x2f08
4月 23 17:39:17 ollama[2069]: CUDA driver version: 12.4
4月 23 17:39:17 ollama[2069]: calling cuDeviceGetCount
4月 23 17:39:17 ollama[2069]: device count 2
4月 23 17:39:17 ollama[2069]: time=2025-04-23T17:39:17.932+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-31e889b0-8e81-ae1f-c7c3-56b903ee95de name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="19.8 GiB" now.total="21.7 GiB" now.free="19.8 GiB" now.used="1.9 GiB"
4月 23 17:39:18 ollama[2069]: time=2025-04-23T17:39:18.039+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-10670930-fdf7-6410-320c-f8808f4aac75 name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="20.2 GiB" now.total="21.7 GiB" now.free="20.2 GiB" now.used="1.4 GiB"
4月 23 17:39:18 ollama[2069]: releasing cuda driver library
4月 23 17:39:18 ollama[2069]: time=2025-04-23T17:39:18.080+08:00 level=DEBUG source=sched.go:224 msg="loading first model" model=/usr/share/ollama/.ollama/models/blobs/sha256-6c0b473616167f14af406d122ed3a12f374f6958d2ed2a5af0a889e0cf9f1695
4月 23 17:39:18 ollama[2069]: time=2025-04-23T17:39:18.081+08:00 level=DEBUG source=memory.go:107 msg=evaluating library=cuda gpu_count=1 available="[20.2 GiB]"
4月 23 17:39:18 ollama[2069]: time=2025-04-23T17:39:18.081+08:00 level=DEBUG source=memory.go:107 msg=evaluating library=cuda gpu_count=1 available="[19.8 GiB]"
4月 23 17:39:18 ollama[2069]: time=2025-04-23T17:39:18.082+08:00 level=DEBUG source=memory.go:107 msg=evaluating library=cuda gpu_count=1 available="[20.2 GiB]"
4月 23 17:39:18 ollama[2069]: time=2025-04-23T17:39:18.083+08:00 level=DEBUG source=memory.go:107 msg=evaluating library=cuda gpu_count=1 available="[19.8 GiB]"
4月 23 17:39:18 ollama[2069]: time=2025-04-23T17:39:18.084+08:00 level=DEBUG source=memory.go:107 msg=evaluating library=cuda gpu_count=2 available="[20.2 GiB 19.8 GiB]"
4月 23 17:39:18 ollama[2069]: time=2025-04-23T17:39:18.085+08:00 level=INFO source=sched.go:730 msg="new model will fit in available VRAM, loading" model=/usr/share/ollama/.ollama/models/blobs/sha256-6c0b473616167f14af406d122ed3a12f374f6958d2ed2a5af0a889e0cf9f1695 library=cuda parallel=4 required="37.4 GiB"
4月 23 17:39:18 ollama[2069]: time=2025-04-23T17:39:18.085+08:00 level=DEBUG source=gpu.go:406 msg="updating system memory data" before.total="251.5 GiB" before.free="246.1 GiB" before.free_swap="2.0 GiB" now.total="251.5 GiB" now.free="246.1 GiB" now.free_swap="2.0 GiB"
4月 23 17:39:18 ollama[2069]: initializing /usr/lib/x86_64-linux-gnu/libcuda.so.550.54.15
4月 23 17:39:18 ollama[2069]: dlsym: cuInit - 0x7f6c62679ef0
4月 23 17:39:18 ollama[2069]: dlsym: cuDriverGetVersion - 0x7f6c62679f10
4月 23 17:39:18 ollama[2069]: dlsym: cuDeviceGetCount - 0x7f6c62679f50
4月 23 17:39:18 ollama[2069]: dlsym: cuDeviceGet - 0x7f6c62679f30
4月 23 17:39:18 ollama[2069]: dlsym: cuDeviceGetAttribute - 0x7f6c6267a030
4月 23 17:39:18 ollama[2069]: dlsym: cuDeviceGetUuid - 0x7f6c62679f90
4月 23 17:39:18 ollama[2069]: dlsym: cuDeviceGetName - 0x7f6c62679f70
4月 23 17:39:18 ollama[2069]: dlsym: cuCtxCreate_v3 - 0x7f6c6267a210
4月 23 17:39:18 ollama[2069]: dlsym: cuMemGetInfo_v2 - 0x7f6c62684190
4月 23 17:39:18 ollama[2069]: dlsym: cuCtxDestroy - 0x7f6c626de7f0
4月 23 17:39:18 ollama[2069]: calling cuInit
4月 23 17:39:18 ollama[2069]: calling cuDriverGetVersion
4月 23 17:39:18 ollama[2069]: raw version 0x2f08
4月 23 17:39:18 ollama[2069]: CUDA driver version: 12.4
4月 23 17:39:18 ollama[2069]: calling cuDeviceGetCount
4月 23 17:39:18 ollama[2069]: device count 2
4月 23 17:39:18 ollama[2069]: time=2025-04-23T17:39:18.176+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-31e889b0-8e81-ae1f-c7c3-56b903ee95de name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="19.8 GiB" now.total="21.7 GiB" now.free="19.8 GiB" now.used="1.9 GiB"
4月 23 17:39:18 ollama[2069]: time=2025-04-23T17:39:18.263+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-10670930-fdf7-6410-320c-f8808f4aac75 name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="20.2 GiB" now.total="21.7 GiB" now.free="20.2 GiB" now.used="1.4 GiB"
4月 23 17:39:18 ollama[2069]: releasing cuda driver library
4月 23 17:39:18 ollama[2069]: time=2025-04-23T17:39:18.263+08:00 level=INFO source=server.go:104 msg="system memory" total="251.5 GiB" free="246.1 GiB" free_swap="2.0 GiB"
4月 23 17:39:18 ollama[2069]: time=2025-04-23T17:39:18.263+08:00 level=DEBUG source=memory.go:107 msg=evaluating library=cuda gpu_count=2 available="[20.2 GiB 19.8 GiB]"
4月 23 17:39:18 ollama[2069]: time=2025-04-23T17:39:18.264+08:00 level=INFO source=memory.go:356 msg="offload to cuda" layers.requested=-1 layers.model=65 layers.offload=65 layers.split=33,32 memory.available="[20.2 GiB 19.8 GiB]" memory.gpu_overhead="0 B" memory.required.full="37.4 GiB" memory.required.partial="37.4 GiB" memory.required.kv="2.0 GiB" memory.required.allocations="[19.1 GiB 18.3 GiB]" memory.weights.total="32.9 GiB" memory.weights.repeating="32.1 GiB" memory.weights.nonrepeating="788.9 MiB" memory.graph.full="916.1 MiB" memory.graph.partial="916.1 MiB"
4月 23 17:39:18 ollama[2069]: time=2025-04-23T17:39:18.265+08:00 level=DEBUG source=gpu.go:713 msg="no filter required for library cpu"
4月 23 17:39:18 ollama[2069]: time=2025-04-23T17:39:18.265+08:00 level=INFO source=server.go:376 msg="starting llama server" cmd="/usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-6c0b473616167f14af406d122ed3a12f374f6958d2ed2a5af0a889e0cf9f1695 --ctx-size 8192 --batch-size 512 --n-gpu-layers 65 --verbose --threads 16 --parallel 4 --tensor-split 33,32 --port 35827"
4月 23 17:39:18 ollama[2069]: time=2025-04-23T17:39:18.265+08:00 level=DEBUG source=server.go:393 msg=subprocess environment="[PATH=/home/eppei/miniconda3/envs/vllm/bin:/home/eppei/miniconda3/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin LD_LIBRARY_PATH=/usr/local/lib/ollama:.:/usr/local/bin]"
4月 23 17:39:18 ollama[2069]: time=2025-04-23T17:39:18.265+08:00 level=INFO source=sched.go:449 msg="loaded runners" count=1
4月 23 17:39:18 ollama[2069]: time=2025-04-23T17:39:18.265+08:00 level=INFO source=server.go:555 msg="waiting for llama runner to start responding"
4月 23 17:39:18 ollama[2069]: time=2025-04-23T17:39:18.266+08:00 level=INFO source=server.go:589 msg="waiting for server to become available" status="llm server error"
4月 23 17:39:18 ollama[2069]: time=2025-04-23T17:39:18.287+08:00 level=INFO source=runner.go:936 msg="starting go runner"
4月 23 17:39:18 ollama[2069]: time=2025-04-23T17:39:18.287+08:00 level=INFO source=runner.go:937 msg=system info="CPU : LLAMAFILE = 1 | AARCH64_REPACK = 1 | CPU : LLAMAFILE = 1 | AARCH64_REPACK = 1 | cgo(gcc)" threads=16
4月 23 17:39:18 ollama[2069]: time=2025-04-23T17:39:18.287+08:00 level=INFO source=runner.go:995 msg="Server listening on 127.0.0.1:35827"
4月 23 17:39:18 ollama[2069]: llama_model_loader: loaded meta data with 33 key-value pairs and 771 tensors from /usr/share/ollama/.ollama/models/blobs/sha256-6c0b473616167f14af406d122ed3a12f374f6958d2ed2a5af0a889e0cf9f1695 (version GGUF V3 (latest))
4月 23 17:39:18 ollama[2069]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
4月 23 17:39:18 ollama[2069]: llama_model_loader: - kv 0: general.architecture str = qwen2
4月 23 17:39:18 ollama[2069]: llama_model_loader: - kv 1: general.type str = model
4月 23 17:39:18 ollama[2069]: llama_model_loader: - kv 2: general.name str = QwQ 32B
4月 23 17:39:18 ollama[2069]: llama_model_loader: - kv 3: general.basename str = QwQ
4月 23 17:39:18 ollama[2069]: llama_model_loader: - kv 4: general.size_label str = 32B
4月 23 17:39:18 ollama[2069]: llama_model_loader: - kv 5: general.license str = apache-2.0
4月 23 17:39:18 ollama[2069]: llama_model_loader: - kv 6: general.license.link str = https://huggingface.co/Qwen/QWQ-32B/b...
4月 23 17:39:18 ollama[2069]: llama_model_loader: - kv 7: general.base_model.count u32 = 1
4月 23 17:39:18 ollama[2069]: llama_model_loader: - kv 8: general.base_model.0.name str = Qwen2.5 32B
4月 23 17:39:18 ollama[2069]: llama_model_loader: - kv 9: general.base_model.0.organization str = Qwen
4月 23 17:39:18 ollama[2069]: llama_model_loader: - kv 10: general.base_model.0.repo_url str = https://huggingface.co/Qwen/Qwen2.5-32B
4月 23 17:39:18 ollama[2069]: llama_model_loader: - kv 11: general.tags arr[str,2] = ["chat", "text-generation"]
4月 23 17:39:18 ollama[2069]: llama_model_loader: - kv 12: general.languages arr[str,1] = ["en"]
4月 23 17:39:18 ollama[2069]: llama_model_loader: - kv 13: qwen2.block_count u32 = 64
4月 23 17:39:18 ollama[2069]: llama_model_loader: - kv 14: qwen2.context_length u32 = 131072
4月 23 17:39:18 ollama[2069]: llama_model_loader: - kv 15: qwen2.embedding_length u32 = 5120
4月 23 17:39:18 ollama[2069]: llama_model_loader: - kv 16: qwen2.feed_forward_length u32 = 27648
4月 23 17:39:18 ollama[2069]: llama_model_loader: - kv 17: qwen2.attention.head_count u32 = 40
4月 23 17:39:18 ollama[2069]: llama_model_loader: - kv 18: qwen2.attention.head_count_kv u32 = 8
4月 23 17:39:18 ollama[2069]: llama_model_loader: - kv 19: qwen2.rope.freq_base f32 = 1000000.000000
4月 23 17:39:18 ollama[2069]: llama_model_loader: - kv 20: qwen2.attention.layer_norm_rms_epsilon f32 = 0.000010
4月 23 17:39:18 ollama[2069]: llama_model_loader: - kv 21: tokenizer.ggml.model str = gpt2
4月 23 17:39:18 ollama[2069]: llama_model_loader: - kv 22: tokenizer.ggml.pre str = qwen2
4月 23 17:39:18 ollama[2069]: llama_model_loader: - kv 23: tokenizer.ggml.tokens arr[str,152064] = ["!", "\"", "#", "$", "%", "&", "'", ...
4月 23 17:39:18 ollama[2069]: llama_model_loader: - kv 24: tokenizer.ggml.token_type arr[i32,152064] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
4月 23 17:39:18 ollama[2069]: llama_model_loader: - kv 25: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
4月 23 17:39:18 ollama[2069]: llama_model_loader: - kv 26: tokenizer.ggml.eos_token_id u32 = 151645
4月 23 17:39:18 ollama[2069]: llama_model_loader: - kv 27: tokenizer.ggml.padding_token_id u32 = 151643
4月 23 17:39:18 ollama[2069]: llama_model_loader: - kv 28: tokenizer.ggml.bos_token_id u32 = 151643
4月 23 17:39:18 ollama[2069]: llama_model_loader: - kv 29: tokenizer.ggml.add_bos_token bool = false
4月 23 17:39:18 ollama[2069]: llama_model_loader: - kv 30: tokenizer.chat_template str = {%- if tools %}\n {{- '<|im_start|>...
4月 23 17:39:18 ollama[2069]: llama_model_loader: - kv 31: general.quantization_version u32 = 2
4月 23 17:39:18 ollama[2069]: llama_model_loader: - kv 32: general.file_type u32 = 7
4月 23 17:39:18 ollama[2069]: llama_model_loader: - type f32: 321 tensors
4月 23 17:39:18 ollama[2069]: llama_model_loader: - type q8_0: 450 tensors
4月 23 17:39:18 ollama[2069]: time=2025-04-23T17:39:18.517+08:00 level=INFO source=server.go:589 msg="waiting for server to become available" status="llm server loading model"
4月 23 17:39:18 ollama[2069]: llm_load_vocab: control token: 151660 '<|fim_middle|>' is not marked as EOG
4月 23 17:39:18 ollama[2069]: llm_load_vocab: control token: 151659 '<|fim_prefix|>' is not marked as EOG
4月 23 17:39:18 ollama[2069]: llm_load_vocab: control token: 151653 '<|vision_end|>' is not marked as EOG
4月 23 17:39:18 ollama[2069]: llm_load_vocab: control token: 151648 '<|box_start|>' is not marked as EOG
4月 23 17:39:18 ollama[2069]: llm_load_vocab: control token: 151646 '<|object_ref_start|>' is not marked as EOG
4月 23 17:39:18 ollama[2069]: llm_load_vocab: control token: 151649 '<|box_end|>' is not marked as EOG
4月 23 17:39:18 ollama[2069]: llm_load_vocab: control token: 151655 '<|image_pad|>' is not marked as EOG
4月 23 17:39:18 ollama[2069]: llm_load_vocab: control token: 151651 '<|quad_end|>' is not marked as EOG
4月 23 17:39:18 ollama[2069]: llm_load_vocab: control token: 151647 '<|object_ref_end|>' is not marked as EOG
4月 23 17:39:18 ollama[2069]: llm_load_vocab: control token: 151652 '<|vision_start|>' is not marked as EOG
4月 23 17:39:18 ollama[2069]: llm_load_vocab: control token: 151654 '<|vision_pad|>' is not marked as EOG
4月 23 17:39:18 ollama[2069]: llm_load_vocab: control token: 151656 '<|video_pad|>' is not marked as EOG
4月 23 17:39:18 ollama[2069]: llm_load_vocab: control token: 151644 '<|im_start|>' is not marked as EOG
4月 23 17:39:18 ollama[2069]: llm_load_vocab: control token: 151661 '<|fim_suffix|>' is not marked as EOG
4月 23 17:39:18 ollama[2069]: llm_load_vocab: control token: 151650 '<|quad_start|>' is not marked as EOG
4月 23 17:39:18 ollama[2069]: llm_load_vocab: special tokens cache size = 26
4月 23 17:39:18 ollama[2069]: llm_load_vocab: token to piece cache size = 0.9311 MB
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: format = GGUF V3 (latest)
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: arch = qwen2
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: vocab type = BPE
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: n_vocab = 152064
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: n_merges = 151387
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: vocab_only = 0
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: n_ctx_train = 131072
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: n_embd = 5120
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: n_layer = 64
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: n_head = 40
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: n_head_kv = 8
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: n_rot = 128
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: n_swa = 0
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: n_embd_head_k = 128
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: n_embd_head_v = 128
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: n_gqa = 5
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: n_embd_k_gqa = 1024
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: n_embd_v_gqa = 1024
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: f_norm_eps = 0.0e+00
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: f_norm_rms_eps = 1.0e-05
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: f_clamp_kqv = 0.0e+00
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: f_max_alibi_bias = 0.0e+00
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: f_logit_scale = 0.0e+00
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: n_ff = 27648
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: n_expert = 0
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: n_expert_used = 0
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: causal attn = 1
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: pooling type = 0
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: rope type = 2
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: rope scaling = linear
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: freq_base_train = 1000000.0
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: freq_scale_train = 1
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: n_ctx_orig_yarn = 131072
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: rope_finetuned = unknown
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: ssm_d_conv = 0
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: ssm_d_inner = 0
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: ssm_d_state = 0
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: ssm_dt_rank = 0
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: ssm_dt_b_c_rms = 0
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: model type = 32B
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: model ftype = Q8_0
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: model params = 32.76 B
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: model size = 32.42 GiB (8.50 BPW)
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: general.name = QwQ 32B
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: BOS token = 151643 '<|endoftext|>'
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: EOS token = 151645 '<|im_end|>'
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: EOT token = 151645 '<|im_end|>'
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: PAD token = 151643 '<|endoftext|>'
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: LF token = 148848 'ÄĬ'
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: FIM PRE token = 151659 '<|fim_prefix|>'
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: FIM SUF token = 151661 '<|fim_suffix|>'
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: FIM MID token = 151660 '<|fim_middle|>'
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: FIM PAD token = 151662 '<|fim_pad|>'
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: FIM REP token = 151663 '<|repo_name|>'
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: FIM SEP token = 151664 '<|file_sep|>'
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: EOG token = 151643 '<|endoftext|>'
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: EOG token = 151645 '<|im_end|>'
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: EOG token = 151662 '<|fim_pad|>'
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: EOG token = 151663 '<|repo_name|>'
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: EOG token = 151664 '<|file_sep|>'
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: max token length = 256
4月 23 17:39:18 ollama[2069]: llm_load_tensors: tensor 'token_embd.weight' (q8_0) (and 770 others) cannot be used with preferred buffer type CPU_AARCH64, using CPU instead
4月 23 17:39:20 ollama[2069]: llm_load_tensors: CPU_Mapped model buffer size = 33202.08 MiB
4月 23 17:39:20 ollama[2069]: llama_new_context_with_model: n_seq_max = 4
4月 23 17:39:20 ollama[2069]: llama_new_context_with_model: n_ctx = 8192
4月 23 17:39:20 ollama[2069]: llama_new_context_with_model: n_ctx_per_seq = 2048
4月 23 17:39:20 ollama[2069]: llama_new_context_with_model: n_batch = 2048
4月 23 17:39:20 ollama[2069]: llama_new_context_with_model: n_ubatch = 512
4月 23 17:39:20 ollama[2069]: llama_new_context_with_model: flash_attn = 0
4月 23 17:39:20 ollama[2069]: llama_new_context_with_model: freq_base = 1000000.0
4月 23 17:39:20 ollama[2069]: llama_new_context_with_model: freq_scale = 1
4月 23 17:39:20 ollama[2069]: llama_new_context_with_model: n_ctx_per_seq (2048) < n_ctx_train (131072) -- the full capacity of the model will not be utilized
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: kv_size = 8192, offload = 1, type_k = 'f16', type_v = 'f16', n_layer = 64, can_shift = 1
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 0: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 1: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 2: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 3: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 4: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 5: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 6: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 7: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 8: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 9: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 10: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 11: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 12: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 13: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 14: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 15: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 16: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 17: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 18: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 19: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 20: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 21: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 22: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 23: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 24: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 25: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 26: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 27: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 28: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 29: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 30: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 31: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 32: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 33: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 34: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 35: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 36: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 37: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 38: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 39: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 40: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 41: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 42: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 43: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 44: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 45: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 46: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 47: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 48: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 49: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 50: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 51: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 52: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 53: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 54: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 55: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 56: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 57: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 58: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 59: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 60: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 61: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 62: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 63: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: time=2025-04-23T17:39:20.278+08:00 level=DEBUG source=server.go:600 msg="model load progress 1.00"
4月 23 17:39:20 ollama[2069]: time=2025-04-23T17:39:20.529+08:00 level=DEBUG source=server.go:603 msg="model load completed, waiting for server to become available" status="llm server loading model"
4月 23 17:39:21 ollama[2069]: llama_kv_cache_init: CPU KV buffer size = 2048.00 MiB
4月 23 17:39:21 ollama[2069]: llama_new_context_with_model: KV self size = 2048.00 MiB, K (f16): 1024.00 MiB, V (f16): 1024.00 MiB
4月 23 17:39:21 ollama[2069]: llama_new_context_with_model: CPU output buffer size = 2.40 MiB
4月 23 17:39:21 ollama[2069]: llama_new_context_with_model: CPU compute buffer size = 696.01 MiB
4月 23 17:39:21 ollama[2069]: llama_new_context_with_model: graph nodes = 2246
4月 23 17:39:21 ollama[2069]: llama_new_context_with_model: graph splits = 1
4月 23 17:39:21 ollama[2069]: time=2025-04-23T17:39:21.534+08:00 level=INFO source=server.go:594 msg="llama runner started in 3.27 seconds"
4月 23 17:39:21 ollama[2069]: time=2025-04-23T17:39:21.534+08:00 level=DEBUG source=sched.go:462 msg="finished setting up runner" model=/usr/share/ollama/.ollama/models/blobs/sha256-6c0b473616167f14af406d122ed3a12f374f6958d2ed2a5af0a889e0cf9f1695
4月 23 17:39:21 ollama[2069]: time=2025-04-23T17:39:21.536+08:00 level=DEBUG source=server.go:966 msg="new runner detected, loading model for cgo tokenization"
4月 23 17:39:21 ollama[2069]: llama_model_loader: loaded meta data with 33 key-value pairs and 771 tensors from /usr/share/ollama/.ollama/models/blobs/sha256-6c0b473616167f14af406d122ed3a12f374f6958d2ed2a5af0a889e0cf9f1695 (version GGUF V3 (latest))
4月 23 17:39:21 ollama[2069]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
4月 23 17:39:21 ollama[2069]: llama_model_loader: - kv 0: general.architecture str = qwen2
4月 23 17:39:21 ollama[2069]: llama_model_loader: - kv 1: general.type str = model
4月 23 17:39:21 ollama[2069]: llama_model_loader: - kv 2: general.name str = QwQ 32B
4月 23 17:39:21 ollama[2069]: llama_model_loader: - kv 3: general.basename str = QwQ
4月 23 17:39:21 ollama[2069]: llama_model_loader: - kv 4: general.size_label str = 32B
4月 23 17:39:21 ollama[2069]: llama_model_loader: - kv 5: general.license str = apache-2.0
4月 23 17:39:21 ollama[2069]: llama_model_loader: - kv 6: general.license.link str = https://huggingface.co/Qwen/QWQ-32B/b...
4月 23 17:39:21 ollama[2069]: llama_model_loader: - kv 7: general.base_model.count u32 = 1
4月 23 17:39:21 ollama[2069]: llama_model_loader: - kv 8: general.base_model.0.name str = Qwen2.5 32B
4月 23 17:39:21 ollama[2069]: llama_model_loader: - kv 9: general.base_model.0.organization str = Qwen
4月 23 17:39:21 ollama[2069]: llama_model_loader: - kv 10: general.base_model.0.repo_url str = https://huggingface.co/Qwen/Qwen2.5-32B
4月 23 17:39:21 ollama[2069]: llama_model_loader: - kv 11: general.tags arr[str,2] = ["chat", "text-generation"]
4月 23 17:39:21 ollama[2069]: llama_model_loader: - kv 12: general.languages arr[str,1] = ["en"]
4月 23 17:39:21 ollama[2069]: llama_model_loader: - kv 13: qwen2.block_count u32 = 64
4月 23 17:39:21 ollama[2069]: llama_model_loader: - kv 14: qwen2.context_length u32 = 131072
4月 23 17:39:21 ollama[2069]: llama_model_loader: - kv 15: qwen2.embedding_length u32 = 5120
4月 23 17:39:21 ollama[2069]: llama_model_loader: - kv 16: qwen2.feed_forward_length u32 = 27648
4月 23 17:39:21 ollama[2069]: llama_model_loader: - kv 17: qwen2.attention.head_count u32 = 40
4月 23 17:39:21 ollama[2069]: llama_model_loader: - kv 18: qwen2.attention.head_count_kv u32 = 8
4月 23 17:39:21 ollama[2069]: llama_model_loader: - kv 19: qwen2.rope.freq_base f32 = 1000000.000000
4月 23 17:39:21 ollama[2069]: llama_model_loader: - kv 20: qwen2.attention.layer_norm_rms_epsilon f32 = 0.000010
4月 23 17:39:21 ollama[2069]: llama_model_loader: - kv 21: tokenizer.ggml.model str = gpt2
4月 23 17:39:21 ollama[2069]: llama_model_loader: - kv 22: tokenizer.ggml.pre str = qwen2
4月 23 17:39:21 ollama[2069]: llama_model_loader: - kv 23: tokenizer.ggml.tokens arr[str,152064] = ["!", "\"", "#", "$", "%", "&", "'", ...
4月 23 17:39:21 ollama[2069]: llama_model_loader: - kv 24: tokenizer.ggml.token_type arr[i32,152064] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
4月 23 17:39:21 ollama[2069]: llama_model_loader: - kv 25: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
4月 23 17:39:21 ollama[2069]: llama_model_loader: - kv 26: tokenizer.ggml.eos_token_id u32 = 151645
4月 23 17:39:21 ollama[2069]: llama_model_loader: - kv 27: tokenizer.ggml.padding_token_id u32 = 151643
4月 23 17:39:21 ollama[2069]: llama_model_loader: - kv 28: tokenizer.ggml.bos_token_id u32 = 151643
4月 23 17:39:21 ollama[2069]: llama_model_loader: - kv 29: tokenizer.ggml.add_bos_token bool = false
4月 23 17:39:21 ollama[2069]: llama_model_loader: - kv 30: tokenizer.chat_template str = {%- if tools %}\n {{- '<|im_start|>...
4月 23 17:39:21 ollama[2069]: llama_model_loader: - kv 31: general.quantization_version u32 = 2
4月 23 17:39:21 ollama[2069]: llama_model_loader: - kv 32: general.file_type u32 = 7
4月 23 17:39:21 ollama[2069]: llama_model_loader: - type f32: 321 tensors
4月 23 17:39:21 ollama[2069]: llama_model_loader: - type q8_0: 450 tensors
4月 23 17:39:21 ollama[2069]: llm_load_vocab: special tokens cache size = 26
4月 23 17:39:22 ollama[2069]: llm_load_vocab: token to piece cache size = 0.9311 MB
4月 23 17:39:22 ollama[2069]: llm_load_print_meta: format = GGUF V3 (latest)
4月 23 17:39:22 ollama[2069]: llm_load_print_meta: arch = qwen2
4月 23 17:39:22 ollama[2069]: llm_load_print_meta: vocab type = BPE
4月 23 17:39:22 ollama[2069]: llm_load_print_meta: n_vocab = 152064
4月 23 17:39:22 ollama[2069]: llm_load_print_meta: n_merges = 151387
4月 23 17:39:22 ollama[2069]: llm_load_print_meta: vocab_only = 1
4月 23 17:39:22 ollama[2069]: llm_load_print_meta: model type = ?B
4月 23 17:39:22 ollama[2069]: llm_load_print_meta: model ftype = all F32
4月 23 17:39:22 ollama[2069]: llm_load_print_meta: model params = 32.76 B
4月 23 17:39:22 ollama[2069]: llm_load_print_meta: model size = 32.42 GiB (8.50 BPW)
4月 23 17:39:22 ollama[2069]: llm_load_print_meta: general.name = QwQ 32B
4月 23 17:39:22 ollama[2069]: llm_load_print_meta: BOS token = 151643 '<|endoftext|>'
4月 23 17:39:22 ollama[2069]: llm_load_print_meta: EOS token = 151645 '<|im_end|>'
4月 23 17:39:22 ollama[2069]: llm_load_print_meta: EOT token = 151645 '<|im_end|>'
4月 23 17:39:22 ollama[2069]: llm_load_print_meta: PAD token = 151643 '<|endoftext|>'
4月 23 17:39:22 ollama[2069]: llm_load_print_meta: LF token = 148848 'ÄĬ'
4月 23 17:39:22 ollama[2069]: llm_load_print_meta: FIM PRE token = 151659 '<|fim_prefix|>'
4月 23 17:39:22 ollama[2069]: llm_load_print_meta: FIM SUF token = 151661 '<|fim_suffix|>'
4月 23 17:39:22 ollama[2069]: llm_load_print_meta: FIM MID token = 151660 '<|fim_middle|>'
4月 23 17:39:22 ollama[2069]: llm_load_print_meta: FIM PAD token = 151662 '<|fim_pad|>'
4月 23 17:39:22 ollama[2069]: llm_load_print_meta: FIM REP token = 151663 '<|repo_name|>'
4月 23 17:39:22 ollama[2069]: llm_load_print_meta: FIM SEP token = 151664 '<|file_sep|>'
4月 23 17:39:22 ollama[2069]: llm_load_print_meta: EOG token = 151643 '<|endoftext|>'
4月 23 17:39:22 ollama[2069]: llm_load_print_meta: EOG token = 151645 '<|im_end|>'
4月 23 17:39:22 ollama[2069]: llm_load_print_meta: EOG token = 151662 '<|fim_pad|>'
4月 23 17:39:22 ollama[2069]: llm_load_print_meta: EOG token = 151663 '<|repo_name|>'
4月 23 17:39:22 ollama[2069]: llm_load_print_meta: EOG token = 151664 '<|file_sep|>'
4月 23 17:39:22 ollama[2069]: llm_load_print_meta: max token length = 256
4月 23 17:39:22 ollama[2069]: llama_model_load: vocab only - skipping tensors
4月 23 17:39:22 ollama[2069]: time=2025-04-23T17:39:22.347+08:00 level=DEBUG source=routes.go:1470 msg="chat request" images=0 prompt="<|im_start|>system\n你是xxx<|im_end|>\n<|im_start|>user\n你好<|im_end|>\n<|im_start|>assistant\n"
4月 23 17:39:22 ollama[2069]: time=2025-04-23T17:39:22.404+08:00 level=DEBUG source=cache.go:104 msg="loading cache slot" id=0 cache=0 prompt=1594 used=0 remaining=1594
4月 23 17:39:35 ollama[2069]: [GIN] 2025/04/23 - 17:39:35 | 200 | 42.814µs | 127.0.0.1 | HEAD "/"
4月 23 17:39:35 ollama[2069]: [GIN] 2025/04/23 - 17:39:35 | 200 | 53.549µs | 127.0.0.1 | GET "/api/ps"
4月 23 18:01:48 ollama[2069]: [GIN] 2025/04/23 - 18:01:48 | 200 | 22m31s | 127.0.0.1 | POST "/v1/chat/completions"
4月 23 18:01:48 ollama[2069]: time=2025-04-23T18:01:48.824+08:00 level=DEBUG source=sched.go:466 msg="context for request finished"
4月 23 18:01:48 ollama[2069]: time=2025-04-23T18:01:48.824+08:00 level=DEBUG source=sched.go:339 msg="runner with non-zero duration has gone idle, adding timer" modelPath=/usr/share/ollama/.ollama/models/blobs/sha256-6c0b473616167f14af406d122ed3a12f374f6958d2ed2a5af0a889e0cf9f1695 duration=5m0s
4月 23 18:01:48 ollama[2069]: time=2025-04-23T18:01:48.824+08:00 level=DEBUG source=sched.go:357 msg="after processing request finished event" modelPath=/usr/share/ollama/.ollama/models/blobs/sha256-6c0b473616167f14af406d122ed3a12f374f6958d2ed2a5af0a889e0cf9f1695 refCount=0
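
The giveaway in this log is the runner's system info line: it lists only CPU backends (no CUDA entry), the model buffer ends up CPU_Mapped, and the subprocess LD_LIBRARY_PATH contains no CUDA runner directory, even though the scheduler planned offloading all 65 layers to cuda. A first check (a sketch; paths assume the default Linux install seen in this log) is whether the CUDA ggml backend is actually on disk where the runner can find it:

ls -l /usr/local/lib/ollama/
ls -l /usr/local/lib/ollama/cuda_v11/ 2>/dev/null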

@rick-github commented on GitHub (Apr 23, 2025):

```
Apr 23 17:39:18 ollama[2069]: time=2025-04-23T17:39:18.287+08:00 level=INFO source=runner.go:937 msg=system info="CPU : LLAMAFILE = 1 | AARCH64_REPACK = 1 | CPU : LLAMAFILE = 1 | AARCH64_REPACK = 1 | cgo(gcc)" threads=16
```

The runner didn't find any backends. What's the output from

```
find /usr/local/lib/ollama
```
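
A quick way to check for the backends directly is to look for the ggml shared objects the runner tries to load; a minimal sketch, assuming the default Linux install prefix used in this thread (exact filenames vary by release):

```
# List the ggml backend libraries; a healthy install prints libggml-base.so,
# several libggml-cpu-*.so variants, and libggml-cuda.so under cuda_v11/cuda_v12.
# If nothing prints, the runner has no backend and silently falls back to CPU.
find /usr/local/lib/ollama -name 'libggml-*.so*'
```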

@shaofanqi commented on GitHub (Apr 23, 2025):

> ```
> Apr 23 17:39:18 ollama[2069]: time=2025-04-23T17:39:18.287+08:00 level=INFO source=runner.go:937 msg=system info="CPU : LLAMAFILE = 1 | AARCH64_REPACK = 1 | CPU : LLAMAFILE = 1 | AARCH64_REPACK = 1 | cgo(gcc)" threads=16
> ```
>
> The runner didn't find any backends. What's the output from
>
> ```
> find /usr/local/lib/ollama
> ```

```
/usr/local/lib/ollama
```

This folder is empty.
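
An empty backend directory is worth cross-checking against the binary that is actually running, since a stale or partial install can leave the two out of sync. A hedged sketch, using the paths shown in this thread:

```
# Confirm which binary the service runs and what version it reports.
which ollama                   # in this thread: /usr/local/bin/ollama
ollama -v                      # client and server versions should match
ls -la /usr/local/lib/ollama   # should not be empty on a complete install
```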


@rick-github commented on GitHub (Apr 23, 2025):

Your installation is incomplete or otherwise incorrect. The output should be something like:

```
/usr/local/lib/ollama
/usr/local/lib/ollama/cuda_v11
/usr/local/lib/ollama/cuda_v11/libcudart.so.11.0
/usr/local/lib/ollama/cuda_v11/libcudart.so.11.3.109
/usr/local/lib/ollama/cuda_v11/libcublas.so.11
/usr/local/lib/ollama/cuda_v11/libcublasLt.so.11
/usr/local/lib/ollama/cuda_v11/libggml-cuda.so
/usr/local/lib/ollama/cuda_v11/libcublasLt.so.11.5.1.109
/usr/local/lib/ollama/cuda_v11/libcublas.so.11.5.1.109
/usr/local/lib/ollama/libggml-base.so
/usr/local/lib/ollama/libggml-cpu-sandybridge.so
/usr/local/lib/ollama/libggml-cpu-alderlake.so
/usr/local/lib/ollama/libggml-cpu-haswell.so
/usr/local/lib/ollama/libggml-cpu-skylakex.so
/usr/local/lib/ollama/cuda_v12
/usr/local/lib/ollama/cuda_v12/libcublasLt.so.12
/usr/local/lib/ollama/cuda_v12/libcudart.so.12
/usr/local/lib/ollama/cuda_v12/libggml-cuda.so
/usr/local/lib/ollama/cuda_v12/libcublas.so.12
/usr/local/lib/ollama/cuda_v12/libcublasLt.so.12.8.4.1
/usr/local/lib/ollama/cuda_v12/libcudart.so.12.8.90
/usr/local/lib/ollama/cuda_v12/libcublas.so.12.8.4.1
/usr/local/lib/ollama/libggml-cpu-icelake.so
```

That is, the directory should contain the CPU and GPU backends for ollama. How did you install ollama?
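
If the directory really is empty, re-running the official install script normally restores the payload; a minimal sketch, assuming the standard Linux install (adjust if ollama was installed some other way):

```
# Reinstall ollama, restart the service, then re-check the backends.
curl -fsSL https://ollama.com/install.sh | sh
sudo systemctl restart ollama
find /usr/local/lib/ollama
```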


@shaofanqi commented on GitHub (Apr 23, 2025):

```
sudo lsof -c ollama
```

```
lsof: WARNING: can't stat() fuse.gvfsd-fuse file system /run/user/128/gvfs
Output information may be incomplete.
lsof: WARNING: can't stat() fuse.portal file system /run/user/128/doc
Output information may be incomplete.
lsof: WARNING: can't stat() fuse.gvfsd-fuse file system /run/user/1000/gvfs
Output information may be incomplete.
lsof: WARNING: can't stat() fuse.portal file system /run/user/1000/doc
Output information may be incomplete.
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
ollama 2069 ollama cwd DIR 259,3 4096 2 /
ollama 2069 ollama rtd DIR 259,3 4096 2 /
ollama 2069 ollama txt REG 259,3 30038968 36732173 /usr/local/bin/ollama
ollama 2069 ollama mem REG 259,3 28392536 36599527 /usr/lib/x86_64-linux-gnu/libcuda.so.550.54.15
ollama 2069 ollama mem CHR 195,1 984 /dev/nvidia1
ollama 2069 ollama mem CHR 195,0 983 /dev/nvidia0
ollama 2069 ollama mem REG 259,3 2220400 36571227 /usr/lib/x86_64-linux-gnu/libc.so.6
ollama 2069 ollama mem REG 259,3 2260296 36597726 /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30
ollama 2069 ollama mem REG 259,3 125488 36596952 /usr/lib/x86_64-linux-gnu/libgcc_s.so.1
ollama 2069 ollama mem REG 259,3 940560 36571232 /usr/lib/x86_64-linux-gnu/libm.so.6
ollama 2069 ollama mem REG 259,3 14664 36571276 /usr/lib/x86_64-linux-gnu/librt.so.1
ollama 2069 ollama mem REG 259,3 21448 36571274 /usr/lib/x86_64-linux-gnu/libpthread.so.0
ollama 2069 ollama mem REG 259,3 14432 36571230 /usr/lib/x86_64-linux-gnu/libdl.so.2
ollama 2069 ollama mem REG 259,3 68552 36571275 /usr/lib/x86_64-linux-gnu/libresolv.so.2
ollama 2069 ollama mem REG 259,3 240936 36571221 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
ollama 2069 ollama 0r CHR 1,3 0t0 5 /dev/null
ollama 2069 ollama 1u unix 0xffffa097d54b5d80 0t0 24785 type=STREAM
ollama 2069 ollama 2u unix 0xffffa097d54b5d80 0t0 24785 type=STREAM
ollama 2069 ollama 3u IPv4 48380 0t0 TCP localhost:11434 (LISTEN)
ollama 2069 ollama 4u a_inode 0,15 0 1049 [eventpoll]
ollama 2069 ollama 5u a_inode 0,15 0 1049 [eventfd]
ollama 2069 ollama 6u IPv4 57398 0t0 TCP localhost:11434->localhost:42744 (ESTABLISHED)
ollama 2069 ollama 7u a_inode 0,15 0 1049 [eventfd]
ollama 2069 ollama 8r FIFO 0,14 0t0 33089 pipe
ollama 2069 ollama 9w FIFO 0,14 0t0 33089 pipe
ollama 2069 ollama 10r FIFO 0,14 0t0 33090 pipe
ollama 2069 ollama 11w FIFO 0,14 0t0 33090 pipe
ollama 2069 ollama 12u CHR 195,255 0t0 982 /dev/nvidiactl
ollama 2069 ollama 13u CHR 234,0 0t0 985 /dev/nvidia-uvm
ollama 2069 ollama 14u CHR 195,0 0t0 983 /dev/nvidia0
ollama 2069 ollama 15u CHR 195,1 0t0 984 /dev/nvidia1
ollama 2069 ollama 16u CHR 195,0 0t0 983 /dev/nvidia0
ollama 2069 ollama 17u CHR 195,0 0t0 983 /dev/nvidia0
ollama 2069 ollama 18u CHR 195,1 0t0 984 /dev/nvidia1
ollama 2069 ollama 19u CHR 195,1 0t0 984 /dev/nvidia1
ollama 2069 ollama 20u unix 0xffffa097cadf7700 0t0 33097 @cuda-uvmfd-4026531836-2069@ type=SEQPACKET
ollama 2069 ollama 22r FIFO 0,14 0t0 36779 pipe
ollama 2069 ollama 26u a_inode 0,15 0 1049 [pidfd]
ollama 16277 ollama cwd DIR 259,3 4096 2 /
ollama 16277 ollama rtd DIR 259,3 4096 2 /
ollama 16277 ollama txt REG 259,3 30038968 36732173 /usr/local/bin/ollama
ollama 16277 ollama mem REG 259,3 34820885056 36603946 /usr/share/ollama/.ollama/models/blobs/sha256-6c0b473616167f14af406d122ed3a12f374f6958d2ed2a5af0a889e0cf9f1695
ollama 16277 ollama mem REG 259,3 2220400 36571227 /usr/lib/x86_64-linux-gnu/libc.so.6
ollama 16277 ollama mem REG 259,3 2260296 36597726 /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30
ollama 16277 ollama mem REG 259,3 125488 36596952 /usr/lib/x86_64-linux-gnu/libgcc_s.so.1
ollama 16277 ollama mem REG 259,3 940560 36571232 /usr/lib/x86_64-linux-gnu/libm.so.6
ollama 16277 ollama mem REG 259,3 14664 36571276 /usr/lib/x86_64-linux-gnu/librt.so.1
ollama 16277 ollama mem REG 259,3 21448 36571274 /usr/lib/x86_64-linux-gnu/libpthread.so.0
ollama 16277 ollama mem REG 259,3 14432 36571230 /usr/lib/x86_64-linux-gnu/libdl.so.2
ollama 16277 ollama mem REG 259,3 68552 36571275 /usr/lib/x86_64-linux-gnu/libresolv.so.2
ollama 16277 ollama mem REG 259,3 240936 36571221 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
ollama 16277 ollama 0r CHR 1,3 0t0 5 /dev/null
ollama 16277 ollama 1u unix 0xffffa097d54b5d80 0t0 24785 type=STREAM
ollama 16277 ollama 2w FIFO 0,14 0t0 36779 pipe
ollama 16277 ollama 3u IPv4 36782 0t0 TCP localhost:44491 (LISTEN)
ollama 16277 ollama 4u a_inode 0,15 0 1049 [eventpoll]
ollama 16277 ollama 5u a_inode 0,15 0 1049 [eventfd]
```

> Your installation is incomplete or otherwise incorrect. The output should be something like:
>
> ```
> /usr/local/lib/ollama
> /usr/local/lib/ollama/cuda_v11
> /usr/local/lib/ollama/cuda_v11/libcudart.so.11.0
> /usr/local/lib/ollama/cuda_v11/libcudart.so.11.3.109
> /usr/local/lib/ollama/cuda_v11/libcublas.so.11
> /usr/local/lib/ollama/cuda_v11/libcublasLt.so.11
> /usr/local/lib/ollama/cuda_v11/libggml-cuda.so
> /usr/local/lib/ollama/cuda_v11/libcublasLt.so.11.5.1.109
> /usr/local/lib/ollama/cuda_v11/libcublas.so.11.5.1.109
> /usr/local/lib/ollama/libggml-base.so
> /usr/local/lib/ollama/libggml-cpu-sandybridge.so
> /usr/local/lib/ollama/libggml-cpu-alderlake.so
> /usr/local/lib/ollama/libggml-cpu-haswell.so
> /usr/local/lib/ollama/libggml-cpu-skylakex.so
> /usr/local/lib/ollama/cuda_v12
> /usr/local/lib/ollama/cuda_v12/libcublasLt.so.12
> /usr/local/lib/ollama/cuda_v12/libcudart.so.12
> /usr/local/lib/ollama/cuda_v12/libggml-cuda.so
> /usr/local/lib/ollama/cuda_v12/libcublas.so.12
> /usr/local/lib/ollama/cuda_v12/libcublasLt.so.12.8.4.1
> /usr/local/lib/ollama/cuda_v12/libcudart.so.12.8.90
> /usr/local/lib/ollama/cuda_v12/libcublas.so.12.8.4.1
> /usr/local/lib/ollama/libggml-cpu-icelake.so
> ```
>
> That is, the directory should contain the CPU and GPU backends for ollama. How did you install ollama?

It worked fine this morning. At noon I set `/set parameter num_ctx 131072` and ollama ran in a mixed CPU/GPU mode. When I exited with Ctrl+D and ran `/set parameter num_ctx 3000` again, it failed to use the GPU.
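
Note that the lsof listing above already answers the backend question for the runner process (PID 16277): it maps libc, libstdc++ and the model blob, but no `libggml-*.so` and no libcuda, which matches the empty /usr/local/lib/ollama directory. A sketch of the same check for any runner PID:

```
# Does the runner process have a ggml backend or CUDA library mapped?
sudo lsof -p 16277 | grep -E 'libggml|libcuda' || echo "no GPU backend mapped"
```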


@rick-github commented on GitHub (Apr 23, 2025):

> At noon I set `/set parameter num_ctx 131072` and ollama ran in a mixed CPU/GPU mode

How do you know this?


@shaofanqi commented on GitHub (Apr 23, 2025):

> > At noon I set `/set parameter num_ctx 131072` and ollama ran in a mixed CPU/GPU mode
>
> How do you know this?

From `ollama ps`: it shows 57% CPU / 43% GPU.


@rick-github commented on GitHub (Apr 23, 2025):

`ollama ps` shows what ollama expects the split to be, based on being able to load a GPU backend. But if the runner can't find a backend, it will fall back to CPU only, and the output from `ollama ps` will be incorrect. You need to check the logs to determine whether the runner did in fact load 43% of the model into VRAM.
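
A sketch of how to check that from the service log, assuming the journald setup used in this thread (the buffer names come from llama.cpp's load output):

```
# Compare the scheduler's plan with what the runner actually allocated.
journalctl -u ollama --no-pager | grep -E 'offload to cuda|model buffer size'
# "offload to cuda ... layers.offload=65" is only the plan; a line like
# "CUDA0 model buffer size = ..." means weights really landed in VRAM,
# while "CPU_Mapped model buffer size" alone means a CPU-only fallback.
```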


@shaofanqi commented on GitHub (Apr 23, 2025):

> `ollama ps` shows what ollama expects the split to be, based on being able to load a GPU backend. But if the runner can't find a backend, it will fall back to CPU only, and the output from `ollama ps` will be incorrect. You need to check the logs to determine whether the runner did in fact load 43% of the model into VRAM.

This is the log:

4月 23 16:32:11 ollama[2186]: time=2025-04-23T16:32:11.688+08:00 level=DEBUG source=gpu.go:406 msg="updating system memory data" before.total="251.5 GiB" before.free="243.1 GiB" before.free_swap="1.4 GiB" now.total="251.5 GiB" now.free="243.1 GiB" now.free_swap="1.4 GiB"
4月 23 16:32:11 ollama[2186]: initializing /usr/lib/x86_64-linux-gnu/libcuda.so.550.54.15
4月 23 16:32:11 ollama[2186]: dlsym: cuInit - 0x7fa7ace79ef0
4月 23 16:32:11 ollama[2186]: dlsym: cuDriverGetVersion - 0x7fa7ace79f10
4月 23 16:32:11 ollama[2186]: dlsym: cuDeviceGetCount - 0x7fa7ace79f50
4月 23 16:32:11 ollama[2186]: dlsym: cuDeviceGet - 0x7fa7ace79f30
4月 23 16:32:11 ollama[2186]: dlsym: cuDeviceGetAttribute - 0x7fa7ace7a030
4月 23 16:32:11 ollama[2186]: dlsym: cuDeviceGetUuid - 0x7fa7ace79f90
4月 23 16:32:11 ollama[2186]: dlsym: cuDeviceGetName - 0x7fa7ace79f70
4月 23 16:32:11 ollama[2186]: dlsym: cuCtxCreate_v3 - 0x7fa7ace7a210
4月 23 16:32:11 ollama[2186]: dlsym: cuMemGetInfo_v2 - 0x7fa7ace84190
4月 23 16:32:11 ollama[2186]: dlsym: cuCtxDestroy - 0x7fa7acede7f0
4月 23 16:32:11 ollama[2186]: calling cuInit
4月 23 16:32:11 ollama[2186]: calling cuDriverGetVersion
4月 23 16:32:11 ollama[2186]: raw version 0x2f08
4月 23 16:32:11 ollama[2186]: CUDA driver version: 12.4
4月 23 16:32:11 ollama[2186]: calling cuDeviceGetCount
4月 23 16:32:11 ollama[2186]: device count 2
4月 23 16:32:11 ollama[2186]: time=2025-04-23T16:32:11.777+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-31e889b0-8e81-ae1f-c7c3-56b903ee95de name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="19.8 GiB" now.total="21.7 GiB" now.free="19.8 GiB" now.used="1.9 GiB"
4月 23 16:32:11 ollama[2186]: time=2025-04-23T16:32:11.865+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-10670930-fdf7-6410-320c-f8808f4aac75 name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="20.2 GiB" now.total="21.7 GiB" now.free="20.2 GiB" now.used="1.5 GiB"
4月 23 16:32:11 ollama[2186]: releasing cuda driver library
4月 23 16:32:11 ollama[2186]: time=2025-04-23T16:32:11.865+08:00 level=DEBUG source=gpu.go:406 msg="updating system memory data" before.total="251.5 GiB" before.free="243.1 GiB" before.free_swap="1.4 GiB" now.total="251.5 GiB" now.free="243.1 GiB" now.free_swap="1.4 GiB"
4月 23 16:32:11 ollama[2186]: initializing /usr/lib/x86_64-linux-gnu/libcuda.so.550.54.15
4月 23 16:32:11 ollama[2186]: dlsym: cuInit - 0x7fa7ace79ef0
4月 23 16:32:11 ollama[2186]: dlsym: cuDriverGetVersion - 0x7fa7ace79f10
4月 23 16:32:11 ollama[2186]: dlsym: cuDeviceGetCount - 0x7fa7ace79f50
4月 23 16:32:11 ollama[2186]: dlsym: cuDeviceGet - 0x7fa7ace79f30
4月 23 16:32:11 ollama[2186]: dlsym: cuDeviceGetAttribute - 0x7fa7ace7a030
4月 23 16:32:11 ollama[2186]: dlsym: cuDeviceGetUuid - 0x7fa7ace79f90
4月 23 16:32:11 ollama[2186]: dlsym: cuDeviceGetName - 0x7fa7ace79f70
4月 23 16:32:11 ollama[2186]: dlsym: cuCtxCreate_v3 - 0x7fa7ace7a210
4月 23 16:32:11 ollama[2186]: dlsym: cuMemGetInfo_v2 - 0x7fa7ace84190
4月 23 16:32:11 ollama[2186]: dlsym: cuCtxDestroy - 0x7fa7acede7f0
4月 23 16:32:11 ollama[2186]: calling cuInit
4月 23 16:32:11 ollama[2186]: calling cuDriverGetVersion
4月 23 16:32:11 ollama[2186]: raw version 0x2f08
4月 23 16:32:11 ollama[2186]: CUDA driver version: 12.4
4月 23 16:32:11 ollama[2186]: calling cuDeviceGetCount
4月 23 16:32:11 ollama[2186]: device count 2
4月 23 16:32:11 ollama[2186]: time=2025-04-23T16:32:11.937+08:00 level=WARN source=sched.go:646 msg="gpu VRAM usage didn't recover within timeout" seconds=5.452710664 model=/usr/share/ollama/.ollama/models/blobs/sha256-6c0b473616167f14af406d122ed3a12f374f6958d2ed2a5af0a889e0cf9f1695
4月 23 16:32:11 ollama[2186]: time=2025-04-23T16:32:11.952+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-31e889b0-8e81-ae1f-c7c3-56b903ee95de name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="19.8 GiB" now.total="21.7 GiB" now.free="19.8 GiB" now.used="1.9 GiB"
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.041+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-10670930-fdf7-6410-320c-f8808f4aac75 name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="20.2 GiB" now.total="21.7 GiB" now.free="20.2 GiB" now.used="1.5 GiB"
4月 23 16:32:12 ollama[2186]: releasing cuda driver library
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.041+08:00 level=DEBUG source=gpu.go:406 msg="updating system memory data" before.total="251.5 GiB" before.free="243.1 GiB" before.free_swap="1.4 GiB" now.total="251.5 GiB" now.free="243.1 GiB" now.free_swap="1.4 GiB"
4月 23 16:32:12 ollama[2186]: initializing /usr/lib/x86_64-linux-gnu/libcuda.so.550.54.15
4月 23 16:32:12 ollama[2186]: dlsym: cuInit - 0x7fa7ace79ef0
4月 23 16:32:12 ollama[2186]: dlsym: cuDriverGetVersion - 0x7fa7ace79f10
4月 23 16:32:12 ollama[2186]: dlsym: cuDeviceGetCount - 0x7fa7ace79f50
4月 23 16:32:12 ollama[2186]: dlsym: cuDeviceGet - 0x7fa7ace79f30
4月 23 16:32:12 ollama[2186]: dlsym: cuDeviceGetAttribute - 0x7fa7ace7a030
4月 23 16:32:12 ollama[2186]: dlsym: cuDeviceGetUuid - 0x7fa7ace79f90
4月 23 16:32:12 ollama[2186]: dlsym: cuDeviceGetName - 0x7fa7ace79f70
4月 23 16:32:12 ollama[2186]: dlsym: cuCtxCreate_v3 - 0x7fa7ace7a210
4月 23 16:32:12 ollama[2186]: dlsym: cuMemGetInfo_v2 - 0x7fa7ace84190
4月 23 16:32:12 ollama[2186]: dlsym: cuCtxDestroy - 0x7fa7acede7f0
4月 23 16:32:12 ollama[2186]: calling cuInit
4月 23 16:32:12 ollama[2186]: calling cuDriverGetVersion
4月 23 16:32:12 ollama[2186]: raw version 0x2f08
4月 23 16:32:12 ollama[2186]: CUDA driver version: 12.4
4月 23 16:32:12 ollama[2186]: calling cuDeviceGetCount
4月 23 16:32:12 ollama[2186]: device count 2
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.078+08:00 level=DEBUG source=sched.go:224 msg="loading first model" model=/usr/share/ollama/.ollama/models/blobs/sha256-6c0b473616167f14af406d122ed3a12f374f6958d2ed2a5af0a889e0cf9f1695
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.078+08:00 level=DEBUG source=memory.go:107 msg=evaluating library=cuda gpu_count=1 available="[20.2 GiB]"
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.079+08:00 level=DEBUG source=memory.go:186 msg="gpu has too little memory to allocate any layers" id=GPU-10670930-fdf7-6410-320c-f8808f4aac75 library=cuda variant=v12 compute=7.5 driver=12.4 name="NVIDIA GeForce RTX 2080 Ti" total="21.7 GiB" available="20.2 GiB" minimum_memory=479199232 layer_size="2.5 GiB" gpu_zer_overhead="0 B" partial_offload="51.0 GiB" full_offload="41.0 GiB"
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.079+08:00 level=DEBUG source=memory.go:330 msg="insufficient VRAM to load any model layers"
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.079+08:00 level=DEBUG source=memory.go:107 msg=evaluating library=cuda gpu_count=1 available="[19.8 GiB]"
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.082+08:00 level=DEBUG source=memory.go:186 msg="gpu has too little memory to allocate any layers" id=GPU-31e889b0-8e81-ae1f-c7c3-56b903ee95de library=cuda variant=v12 compute=7.5 driver=12.4 name="NVIDIA GeForce RTX 2080 Ti" total="21.7 GiB" available="19.8 GiB" minimum_memory=479199232 layer_size="2.5 GiB" gpu_zer_overhead="0 B" partial_offload="51.0 GiB" full_offload="41.0 GiB"
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.082+08:00 level=DEBUG source=memory.go:330 msg="insufficient VRAM to load any model layers"
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.082+08:00 level=DEBUG source=memory.go:107 msg=evaluating library=cuda gpu_count=1 available="[20.2 GiB]"
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.084+08:00 level=DEBUG source=memory.go:107 msg=evaluating library=cuda gpu_count=1 available="[19.8 GiB]"
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.085+08:00 level=DEBUG source=memory.go:107 msg=evaluating library=cuda gpu_count=2 available="[20.2 GiB 19.8 GiB]"
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.087+08:00 level=DEBUG source=memory.go:186 msg="gpu has too little memory to allocate any layers" id=GPU-10670930-fdf7-6410-320c-f8808f4aac75 library=cuda variant=v12 compute=7.5 driver=12.4 name="NVIDIA GeForce RTX 2080 Ti" total="21.7 GiB" available="20.2 GiB" minimum_memory=479199232 layer_size="2.5 GiB" gpu_zer_overhead="0 B" partial_offload="51.0 GiB" full_offload="51.0 GiB"
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.087+08:00 level=DEBUG source=memory.go:186 msg="gpu has too little memory to allocate any layers" id=GPU-31e889b0-8e81-ae1f-c7c3-56b903ee95de library=cuda variant=v12 compute=7.5 driver=12.4 name="NVIDIA GeForce RTX 2080 Ti" total="21.7 GiB" available="19.8 GiB" minimum_memory=479199232 layer_size="2.5 GiB" gpu_zer_overhead="0 B" partial_offload="51.0 GiB" full_offload="51.0 GiB"
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.087+08:00 level=DEBUG source=memory.go:330 msg="insufficient VRAM to load any model layers"
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.087+08:00 level=DEBUG source=memory.go:107 msg=evaluating library=cuda gpu_count=2 available="[20.2 GiB 19.8 GiB]"
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.131+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-31e889b0-8e81-ae1f-c7c3-56b903ee95de name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="19.8 GiB" now.total="21.7 GiB" now.free="19.8 GiB" now.used="1.9 GiB"
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.218+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-10670930-fdf7-6410-320c-f8808f4aac75 name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="20.2 GiB" now.total="21.7 GiB" now.free="20.2 GiB" now.used="1.5 GiB"
4月 23 16:32:12 ollama[2186]: releasing cuda driver library
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.218+08:00 level=WARN source=sched.go:646 msg="gpu VRAM usage didn't recover within timeout" seconds=5.733618916 model=/usr/share/ollama/.ollama/models/blobs/sha256-6c0b473616167f14af406d122ed3a12f374f6958d2ed2a5af0a889e0cf9f1695
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.218+08:00 level=DEBUG source=gpu.go:406 msg="updating system memory data" before.total="251.5 GiB" before.free="243.1 GiB" before.free_swap="1.4 GiB" now.total="251.5 GiB" now.free="243.1 GiB" now.free_swap="1.4 GiB"
[... libcuda.so.550.54.15 re-initialization / dlsym / device-query block elided; identical to the blocks above and below: CUDA driver version 12.4, device count 2 ...]
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.305+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-31e889b0-8e81-ae1f-c7c3-56b903ee95de name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="19.8 GiB" now.total="21.7 GiB" now.free="19.8 GiB" now.used="1.9 GiB"
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.392+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-10670930-fdf7-6410-320c-f8808f4aac75 name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="20.2 GiB" now.total="21.7 GiB" now.free="20.2 GiB" now.used="1.5 GiB"
4月 23 16:32:12 ollama[2186]: releasing cuda driver library
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.392+08:00 level=INFO source=server.go:104 msg="system memory" total="251.5 GiB" free="243.1 GiB" free_swap="1.4 GiB"
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.392+08:00 level=DEBUG source=memory.go:107 msg=evaluating library=cuda gpu_count=2 available="[19.8 GiB 20.2 GiB]"
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.393+08:00 level=INFO source=memory.go:356 msg="offload to cuda" layers.requested=-1 layers.model=65 layers.offload=11 layers.split=5,6 memory.available="[19.8 GiB 20.2 GiB]" memory.gpu_overhead="0 B" memory.required.full="92.1 GiB" memory.required.partial="39.2 GiB" memory.required.kv="32.0 GiB" memory.required.allocations="[19.1 GiB 20.1 GiB]" memory.weights.total="62.9 GiB" memory.weights.repeating="62.1 GiB" memory.weights.nonrepeating="788.9 MiB" memory.graph.full="12.8 GiB" memory.graph.partial="12.8 GiB"
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.393+08:00 level=DEBUG source=gpu.go:713 msg="no filter required for library cpu"
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.393+08:00 level=INFO source=server.go:376 msg="starting llama server" cmd="/usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-6c0b473616167f14af406d122ed3a12f374f6958d2ed2a5af0a889e0cf9f1695 --ctx-size 131072 --batch-size 512 --n-gpu-layers 11 --verbose --threads 16 --parallel 1 --tensor-split 5,6 --port 39391"
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.393+08:00 level=DEBUG source=server.go:393 msg=subprocess environment="[PATH=/home/eppei/miniconda3/envs/vllm/bin:/home/eppei/miniconda3/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin LD_LIBRARY_PATH=/usr/local/lib/ollama:/usr/local/lib/ollama:/usr/local/bin]"
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.394+08:00 level=INFO source=sched.go:449 msg="loaded runners" count=1
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.394+08:00 level=INFO source=server.go:555 msg="waiting for llama runner to start responding"
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.394+08:00 level=INFO source=server.go:589 msg="waiting for server to become available" status="llm server error"
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.416+08:00 level=INFO source=runner.go:936 msg="starting go runner"
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.416+08:00 level=INFO source=runner.go:937 msg=system info="CPU : LLAMAFILE = 1 | AARCH64_REPACK = 1 | CPU : LLAMAFILE = 1 | AARCH64_REPACK = 1 | cgo(gcc)" threads=16
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.416+08:00 level=INFO source=runner.go:995 msg="Server listening on 127.0.0.1:39391"
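Note the system info line above: the runner reports only CPU backends (LLAMAFILE, AARCH64_REPACK) and no CUDA backend, even though it was started with --n-gpu-layers 11. One way to check which backends an install actually ships is to list the library path handed to the subprocess (a generic check, not part of the original report; directory contents vary by version):

ls /usr/local/lib/ollama
# a CUDA-enabled install normally carries GPU backend libraries here
# (e.g. a cuda_v12 subdirectory); if only CPU backends are present,
# the runner can never report anything but CPU in its system info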
4月 23 16:32:12 ollama[2186]: llama_model_loader: loaded meta data with 33 key-value pairs and 771 tensors from /usr/share/ollama/.ollama/models/blobs/sha256-6c0b473616167f14af406d122ed3a12f374f6958d2ed2a5af0a889e0cf9f1695 (version GGUF V3 (latest))
4月 23 16:32:12 ollama[2186]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 0: general.architecture str = qwen2
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 1: general.type str = model
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 2: general.name str = QwQ 32B
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 3: general.basename str = QwQ
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 4: general.size_label str = 32B
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 5: general.license str = apache-2.0
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 6: general.license.link str = https://huggingface.co/Qwen/QWQ-32B/b...
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 7: general.base_model.count u32 = 1
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 8: general.base_model.0.name str = Qwen2.5 32B
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 9: general.base_model.0.organization str = Qwen
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 10: general.base_model.0.repo_url str = https://huggingface.co/Qwen/Qwen2.5-32B
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 11: general.tags arr[str,2] = ["chat", "text-generation"]
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 12: general.languages arr[str,1] = ["en"]
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 13: qwen2.block_count u32 = 64
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 14: qwen2.context_length u32 = 131072
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 15: qwen2.embedding_length u32 = 5120
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 16: qwen2.feed_forward_length u32 = 27648
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 17: qwen2.attention.head_count u32 = 40
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 18: qwen2.attention.head_count_kv u32 = 8
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 19: qwen2.rope.freq_base f32 = 1000000.000000
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 20: qwen2.attention.layer_norm_rms_epsilon f32 = 0.000010
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 21: tokenizer.ggml.model str = gpt2
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 22: tokenizer.ggml.pre str = qwen2
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 23: tokenizer.ggml.tokens arr[str,152064] = ["!", """, "#", "$", "%", "&", "'", ...
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 24: tokenizer.ggml.token_type arr[i32,152064] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 25: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 26: tokenizer.ggml.eos_token_id u32 = 151645
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 27: tokenizer.ggml.padding_token_id u32 = 151643
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 28: tokenizer.ggml.bos_token_id u32 = 151643
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 29: tokenizer.ggml.add_bos_token bool = false
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 30: tokenizer.chat_template str = {%- if tools %}\n {{- '<|im_start|>...
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 31: general.quantization_version u32 = 2
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 32: general.file_type u32 = 7
4月 23 16:32:12 ollama[2186]: llama_model_loader: - type f32: 321 tensors
4月 23 16:32:12 ollama[2186]: llama_model_loader: - type q8_0: 450 tensors
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.646+08:00 level=INFO source=server.go:589 msg="waiting for server to become available" status="llm server loading model"
4月 23 16:32:12 ollama[2186]: llm_load_vocab: control token: 151660 '<|fim_middle|>' is not marked as EOG
4月 23 16:32:12 ollama[2186]: llm_load_vocab: control token: 151659 '<|fim_prefix|>' is not marked as EOG
4月 23 16:32:12 ollama[2186]: llm_load_vocab: control token: 151653 '<|vision_end|>' is not marked as EOG
4月 23 16:32:12 ollama[2186]: llm_load_vocab: control token: 151648 '<|box_start|>' is not marked as EOG
4月 23 16:32:12 ollama[2186]: llm_load_vocab: control token: 151646 '<|object_ref_start|>' is not marked as EOG
4月 23 16:32:12 ollama[2186]: llm_load_vocab: control token: 151649 '<|box_end|>' is not marked as EOG
4月 23 16:32:12 ollama[2186]: llm_load_vocab: control token: 151655 '<|image_pad|>' is not marked as EOG
4月 23 16:32:12 ollama[2186]: llm_load_vocab: control token: 151651 '<|quad_end|>' is not marked as EOG
4月 23 16:32:12 ollama[2186]: llm_load_vocab: control token: 151647 '<|object_ref_end|>' is not marked as EOG
4月 23 16:32:12 ollama[2186]: llm_load_vocab: control token: 151652 '<|vision_start|>' is not marked as EOG
4月 23 16:32:12 ollama[2186]: llm_load_vocab: control token: 151654 '<|vision_pad|>' is not marked as EOG
4月 23 16:32:12 ollama[2186]: llm_load_vocab: control token: 151656 '<|video_pad|>' is not marked as EOG
4月 23 16:32:12 ollama[2186]: llm_load_vocab: control token: 151644 '<|im_start|>' is not marked as EOG
4月 23 16:32:12 ollama[2186]: llm_load_vocab: control token: 151661 '<|fim_suffix|>' is not marked as EOG
4月 23 16:32:12 ollama[2186]: llm_load_vocab: control token: 151650 '<|quad_start|>' is not marked as EOG
4月 23 16:32:12 ollama[2186]: llm_load_vocab: special tokens cache size = 26
4月 23 16:32:13 ollama[2186]: llm_load_vocab: token to piece cache size = 0.9311 MB
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: format = GGUF V3 (latest)
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: arch = qwen2
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: vocab type = BPE
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: n_vocab = 152064
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: n_merges = 151387
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: vocab_only = 0
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: n_ctx_train = 131072
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: n_embd = 5120
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: n_layer = 64
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: n_head = 40
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: n_head_kv = 8
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: n_rot = 128
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: n_swa = 0
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: n_embd_head_k = 128
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: n_embd_head_v = 128
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: n_gqa = 5
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: n_embd_k_gqa = 1024
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: n_embd_v_gqa = 1024
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: f_norm_eps = 0.0e+00
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: f_norm_rms_eps = 1.0e-05
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: f_clamp_kqv = 0.0e+00
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: f_max_alibi_bias = 0.0e+00
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: f_logit_scale = 0.0e+00
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: n_ff = 27648
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: n_expert = 0
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: n_expert_used = 0
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: causal attn = 1
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: pooling type = 0
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: rope type = 2
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: rope scaling = linear
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: freq_base_train = 1000000.0
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: freq_scale_train = 1
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: n_ctx_orig_yarn = 131072
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: rope_finetuned = unknown
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: ssm_d_conv = 0
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: ssm_d_inner = 0
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: ssm_d_state = 0
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: ssm_dt_rank = 0
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: ssm_dt_b_c_rms = 0
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: model type = 32B
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: model ftype = Q8_0
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: model params = 32.76 B
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: model size = 32.42 GiB (8.50 BPW)
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: general.name = QwQ 32B
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: BOS token = 151643 '<|endoftext|>'
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: EOS token = 151645 '<|im_end|>'
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: EOT token = 151645 '<|im_end|>'
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: PAD token = 151643 '<|endoftext|>'
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: LF token = 148848 'ÄĬ'
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: FIM PRE token = 151659 '<|fim_prefix|>'
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: FIM SUF token = 151661 '<|fim_suffix|>'
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: FIM MID token = 151660 '<|fim_middle|>'
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: FIM PAD token = 151662 '<|fim_pad|>'
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: FIM REP token = 151663 '<|repo_name|>'
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: FIM SEP token = 151664 '<|file_sep|>'
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: EOG token = 151643 '<|endoftext|>'
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: EOG token = 151645 '<|im_end|>'
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: EOG token = 151662 '<|fim_pad|>'
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: EOG token = 151663 '<|repo_name|>'
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: EOG token = 151664 '<|file_sep|>'
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: max token length = 256
4月 23 16:32:13 ollama[2186]: llm_load_tensors: tensor 'token_embd.weight' (q8_0) (and 770 others) cannot be used with preferred buffer type CPU_AARCH64, using CPU instead
4月 23 16:32:14 ollama[2186]: llm_load_tensors: CPU_Mapped model buffer size = 33202.08 MiB
4月 23 16:32:14 ollama[2186]: llama_new_context_with_model: n_seq_max = 1
4月 23 16:32:14 ollama[2186]: llama_new_context_with_model: n_ctx = 131072
4月 23 16:32:14 ollama[2186]: llama_new_context_with_model: n_ctx_per_seq = 131072
4月 23 16:32:14 ollama[2186]: llama_new_context_with_model: n_batch = 512
4月 23 16:32:14 ollama[2186]: llama_new_context_with_model: n_ubatch = 512
4月 23 16:32:14 ollama[2186]: llama_new_context_with_model: flash_attn = 0
4月 23 16:32:14 ollama[2186]: llama_new_context_with_model: freq_base = 1000000.0
4月 23 16:32:14 ollama[2186]: llama_new_context_with_model: freq_scale = 1
4月 23 16:32:14 ollama[2186]: llama_kv_cache_init: kv_size = 131072, offload = 1, type_k = 'f16', type_v = 'f16', n_layer = 64, can_shift = 1
4月 23 16:32:14 ollama[2186]: llama_kv_cache_init: layer 0: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 16:32:14 ollama[2186]: llama_kv_cache_init: layer 1: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 16:32:14 ollama[2186]: llama_kv_cache_init: layers 2-62: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024 (61 identical per-layer lines collapsed)
4月 23 16:32:14 ollama[2186]: llama_kv_cache_init: layer 63: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 16:32:14 ollama[2186]: time=2025-04-23T16:32:14.906+08:00 level=DEBUG source=server.go:600 msg="model load progress 1.00"
4月 23 16:32:15 ollama[2186]: time=2025-04-23T16:32:15.158+08:00 level=DEBUG source=server.go:603 msg="model load completed, waiting for server to become available" status="llm server loading model"
4月 23 16:32:31 ollama[2186]: llama_kv_cache_init: CPU KV buffer size = 32768.00 MiB
4月 23 16:32:31 ollama[2186]: llama_new_context_with_model: KV self size = 32768.00 MiB, K (f16): 16384.00 MiB, V (f16): 16384.00 MiB
4月 23 16:32:31 ollama[2186]: llama_new_context_with_model: CPU output buffer size = 0.60 MiB
4月 23 16:32:31 ollama[2186]: llama_new_context_with_model: CPU compute buffer size = 10536.01 MiB
4月 23 16:32:31 ollama[2186]: llama_new_context_with_model: graph nodes = 2246
4月 23 16:32:31 ollama[2186]: llama_new_context_with_model: graph splits = 1
4月 23 16:32:31 ollama[2186]: time=2025-04-23T16:32:31.489+08:00 level=INFO source=server.go:594 msg="llama runner started in 19.09 seconds"
4月 23 16:32:31 ollama[2186]: time=2025-04-23T16:32:31.489+08:00 level=DEBUG source=sched.go:462 msg="finished setting up runner" model=/usr/share/ollama/.ollama/models/blobs/sha256-6c0b473616167f14af406d122ed3a12f374f6958d2ed2a5af0a889e0cf9f1695
4月 23 16:32:31 ollama[2186]: [48.0K blob data] (identical line repeated 5 times)
4月 23 16:32:31 ollama[2186]: 使提出相应意见建议<|im_end|>\n<|im_start|>assistant\n"
4月 23 16:32:31 ollama[2186]: time=2025-04-23T16:32:31.889+08:00 level=DEBUG source=cache.go:104 msg="loading cache slot" id=0 cache=0 prompt=70179 used=0 remaining=70179
4月 23 16:33:57 ollama[2186]: [GIN] 2025/04/23 - 16:33:57 | 200 | 38.816µs | 127.0.0.1 | HEAD "/"
4月 23 16:33:57 ollama[2186]: [GIN] 2025/04/23 - 16:33:57 | 200 | 63.861µs | 127.0.0.1 | GET "/api/ps"
4月 23 16:34:30 ollama[2186]: time=2025-04-23T16:34:30.883+08:00 level=DEBUG source=sched.go:466 msg="context for request finished"
4月 23 16:34:30 ollama[2186]: time=2025-04-23T16:34:30.883+08:00 level=DEBUG source=sched.go:339 msg="runner with non-zero duration has gone idle, adding timer" modelPath=/usr/share/ollama/.ollama/models/blobs/sha256-6c0b473616167f14af406d122ed3a12f374f6958d2ed2a5af0a889e0cf9f1695 duration=5m0s
4月 23 16:34:30 ollama[2186]: time=2025-04-23T16:34:30.883+08:00 level=DEBUG source=sched.go:357 msg="after processing request finished event" modelPath=/usr/share/ollama/.ollama/models/blobs/sha256-6c0b473616167f14af406d122ed3a12f374f6958d2ed2a5af0a889e0cf9f1695 refCount=0
4月 23 16:34:30 ollama[2186]: [GIN] 2025/04/23 - 16:34:30 | 200 | 2m24s | 127.0.0.1 | POST "/api/chat"
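To summarize the episode above: the scheduler decided to offload 11 layers with a 5,6 tensor split across the two GPUs ("offload to cuda"), but the runner then allocated everything on the host (a 33202 MiB CPU_Mapped model buffer, a 32768 MiB CPU KV buffer, and a 10536 MiB CPU compute buffer), so this 2m24s /api/chat request ran entirely on the CPU, matching the reported symptom that ollama ps claims GPU while the logs show CPU. While a request is in flight, the mismatch can be confirmed from the driver side (a generic check, not part of the original report):

nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv
# if the runner were really on the GPUs, the ollama runner process would
# appear here with several GiB of used_memory; an empty list matches the
# CPU-only buffers in the load log above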
4月 23 16:34:35 ollama[2186]: [GIN] 2025/04/23 - 16:34:35 | 200 | 35.318µs | 127.0.0.1 | HEAD "/"
4月 23 16:34:35 ollama[2186]: [GIN] 2025/04/23 - 16:34:35 | 200 | 39.087821ms | 127.0.0.1 | POST "/api/show"
4月 23 16:34:35 ollama[2186]: time=2025-04-23T16:34:35.936+08:00 level=DEBUG source=sched.go:575 msg="evaluating already loaded" model=/usr/share/ollama/.ollama/models/blobs/sha256-6c0b473616167f14af406d122ed3a12f374f6958d2ed2a5af0a889e0cf9f1695
4月 23 16:34:35 ollama[2186]: time=2025-04-23T16:34:35.936+08:00 level=DEBUG source=sched.go:283 msg="resetting model to expire immediately to make room" modelPath=/usr/share/ollama/.ollama/models/blobs/sha256-6c0b473616167f14af406d122ed3a12f374f6958d2ed2a5af0a889e0cf9f1695 refCount=0
4月 23 16:34:35 ollama[2186]: time=2025-04-23T16:34:35.936+08:00 level=DEBUG source=sched.go:296 msg="waiting for pending requests to complete and unload to occur" modelPath=/usr/share/ollama/.ollama/models/blobs/sha256-6c0b473616167f14af406d122ed3a12f374f6958d2ed2a5af0a889e0cf9f1695
4月 23 16:34:35 ollama[2186]: time=2025-04-23T16:34:35.936+08:00 level=DEBUG source=sched.go:360 msg="runner expired event received" modelPath=/usr/share/ollama/.ollama/models/blobs/sha256-6c0b473616167f14af406d122ed3a12f374f6958d2ed2a5af0a889e0cf9f1695
4月 23 16:34:35 ollama[2186]: time=2025-04-23T16:34:35.936+08:00 level=DEBUG source=sched.go:375 msg="got lock to unload" modelPath=/usr/share/ollama/.ollama/models/blobs/sha256-6c0b473616167f14af406d122ed3a12f374f6958d2ed2a5af0a889e0cf9f1695
4月 23 16:34:35 ollama[2186]: time=2025-04-23T16:34:35.936+08:00 level=DEBUG source=gpu.go:406 msg="updating system memory data" before.total="251.5 GiB" before.free="243.1 GiB" before.free_swap="1.4 GiB" now.total="251.5 GiB" now.free="210.8 GiB" now.free_swap="1.4 GiB"
4月 23 16:34:35 ollama[2186]: initializing /usr/lib/x86_64-linux-gnu/libcuda.so.550.54.15
4月 23 16:34:35 ollama[2186]: dlsym: cuInit - 0x7fa7ace79ef0
4月 23 16:34:35 ollama[2186]: dlsym: cuDriverGetVersion - 0x7fa7ace79f10
4月 23 16:34:35 ollama[2186]: dlsym: cuDeviceGetCount - 0x7fa7ace79f50
4月 23 16:34:35 ollama[2186]: dlsym: cuDeviceGet - 0x7fa7ace79f30
4月 23 16:34:35 ollama[2186]: dlsym: cuDeviceGetAttribute - 0x7fa7ace7a030
4月 23 16:34:35 ollama[2186]: dlsym: cuDeviceGetUuid - 0x7fa7ace79f90
4月 23 16:34:35 ollama[2186]: dlsym: cuDeviceGetName - 0x7fa7ace79f70
4月 23 16:34:35 ollama[2186]: dlsym: cuCtxCreate_v3 - 0x7fa7ace7a210
4月 23 16:34:35 ollama[2186]: dlsym: cuMemGetInfo_v2 - 0x7fa7ace84190
4月 23 16:34:35 ollama[2186]: dlsym: cuCtxDestroy - 0x7fa7acede7f0
4月 23 16:34:35 ollama[2186]: calling cuInit
4月 23 16:34:35 ollama[2186]: calling cuDriverGetVersion
4月 23 16:34:35 ollama[2186]: raw version 0x2f08
4月 23 16:34:35 ollama[2186]: CUDA driver version: 12.4
4月 23 16:34:35 ollama[2186]: calling cuDeviceGetCount
4月 23 16:34:35 ollama[2186]: device count 2
4月 23 16:34:36 ollama[2186]: time=2025-04-23T16:34:36.075+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-31e889b0-8e81-ae1f-c7c3-56b903ee95de name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="19.8 GiB" now.total="21.7 GiB" now.free="19.8 GiB" now.used="1.9 GiB"
4月 23 16:34:36 ollama[2186]: time=2025-04-23T16:34:36.218+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-10670930-fdf7-6410-320c-f8808f4aac75 name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="20.2 GiB" now.total="21.7 GiB" now.free="20.2 GiB" now.used="1.5 GiB"
4月 23 16:34:36 ollama[2186]: releasing cuda driver library
4月 23 16:34:36 ollama[2186]: time=2025-04-23T16:34:36.218+08:00 level=DEBUG source=server.go:1079 msg="stopping llama server"
4月 23 16:34:36 ollama[2186]: time=2025-04-23T16:34:36.219+08:00 level=DEBUG source=server.go:1085 msg="waiting for llama server to exit"
[... runner shutdown polling, 16:34:36.469 to 16:34:38.653: the same libcuda init / dlsym / device-query / "updating cuda memory data" cycle repeats roughly every 250 ms; system free memory recovers 210.8 → 216.9 → 222.2 → 227.1 → 231.8 → 236.6 → 241.4 → 242.9 GiB, while GPU free memory stays at 19.8 GiB and 20.2 GiB on every poll ...]
4月 23 16:34:38 ollama[2186]: time=2025-04-23T16:34:38.687+08:00 level=DEBUG source=server.go:1089 msg="llama server stopped"
4月 23 16:34:38 ollama[2186]: time=2025-04-23T16:34:38.687+08:00 level=DEBUG source=sched.go:380 msg="runner released" modelPath=/usr/share/ollama/.ollama/models/blobs/sha256-6c0b473616167f14af406d122ed3a12f374f6958d2ed2a5af0a889e0cf9f1695
4月 23 16:34:38 ollama[2186]: time=2025-04-23T16:34:38.719+08:00 level=DEBUG source=gpu.go:406 msg="updating system memory data" before.total="251.5 GiB" before.free="242.9 GiB" before.free_swap="1.4 GiB" now.total="251.5 GiB" now.free="243.1 GiB" now.free_swap="1.4 GiB"
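The shutdown polling tells the same story: GPU free memory never moves from 19.8 / 20.2 GiB at any point in the trace, consistent with the runner never having allocated any VRAM at all, which is presumably also why the earlier "gpu VRAM usage didn't recover within timeout" warnings fired: there was no VRAM usage to recover.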
[... idle polling continues, 16:34:38 to 16:34:39: the identical memory-poll / driver re-init cycle repeats with values unchanged (system free 243.1 GiB, GPU free 19.8 GiB / 20.2 GiB) ...]
4月 23 16:34:39 ollama[2186]: initializing /usr/lib/x86_64-linux-gnu/libcuda.so.550.54.15
4月 23 16:34:39 ollama[2186]: dlsym: cuInit - 0x7fa7ace79ef0
4月 23 16:34:39 ollama[2186]: dlsym: cuDriverGetVersion - 0x7fa7ace79f10
4月 23 16:34:39 ollama[2186]: dlsym: cuDeviceGetCount - 0x7fa7ace79f50
4月 23 16:34:39 ollama[2186]: dlsym: cuDeviceGet - 0x7fa7ace79f30
4月 23 16:34:39 ollama[2186]: dlsym: cuDeviceGetAttribute - 0x7fa7ace7a030
4月 23 16:34:39 ollama[2186]: dlsym: cuDeviceGetUuid - 0x7fa7ace79f90
4月 23 16:34:39 ollama[2186]: dlsym: cuDeviceGetName - 0x7fa7ace79f70
4月 23 16:34:39 ollama[2186]: dlsym: cuCtxCreate_v3 - 0x7fa7ace7a210
4月 23 16:34:39 ollama[2186]: dlsym: cuMemGetInfo_v2 - 0x7fa7ace84190
4月 23 16:34:39 ollama[2186]: dlsym: cuCtxDestroy - 0x7fa7acede7f0
4月 23 16:34:39 ollama[2186]: calling cuInit
4月 23 16:34:39 ollama[2186]: calling cuDriverGetVersion
4月 23 16:34:39 ollama[2186]: raw version 0x2f08
4月 23 16:34:39 ollama[2186]: CUDA driver version: 12.4
4月 23 16:34:39 ollama[2186]: calling cuDeviceGetCount
4月 23 16:34:39 ollama[2186]: device count 2
4月 23 16:34:39 ollama[2186]: time=2025-04-23T16:34:39.807+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-31e889b0-8e81-ae1f-c7c3-56b903ee95de name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="19.8 GiB" now.total="21.7 GiB" now.free="19.8 GiB" now.used="1.9 GiB"
4月 23 16:34:39 ollama[2186]: time=2025-04-23T16:34:39.891+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-10670930-fdf7-6410-320c-f8808f4aac75 name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="20.2 GiB" now.total="21.7 GiB" now.free="20.2 GiB" now.used="1.5 GiB"
4月 23 16:34:39 ollama[2186]: releasing cuda driver library
4月 23 16:34:39 ollama[2186]: time=2025-04-23T16:34:39.970+08:00 level=DEBUG source=gpu.go:406 msg="updating system memory data" before.total="251.5 GiB" before.free="243.1 GiB" before.free_swap="1.4 GiB" now.total="251.5 GiB" now.free="243.1 GiB" now.free_swap="1.4 GiB"
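
The repeated initializing / dlsym / "updating cuda memory data" / "releasing cuda driver library" blocks above are the scheduler's free-VRAM poll: after a runner exits, ollama re-opens libcuda, re-reads free memory on each device, releases the library, and repeats until usage returns to the pre-load baseline or a timeout fires (the "gpu VRAM usage didn't recover within timeout" warning that shows up in the log below). A minimal Go sketch of that polling shape, with illustrative names only (waitForVRAMRecovery, freeVRAM and the 250 ms interval are assumptions, not ollama's actual code):

package main

import (
	"fmt"
	"time"
)

// waitForVRAMRecovery is a hypothetical stand-in for the scheduler's
// post-unload poll; freeVRAM would wrap a driver query like cuMemGetInfo_v2.
func waitForVRAMRecovery(baselineFree uint64, timeout time.Duration, freeVRAM func() uint64) bool {
	deadline := time.Now().Add(timeout)
	for time.Now().Before(deadline) {
		// Each probe re-initializes and then releases the driver library,
		// which is why the init/dlsym/releasing lines repeat in the log.
		if freeVRAM() >= baselineFree {
			return true // the exiting runner's VRAM came back
		}
		time.Sleep(250 * time.Millisecond)
	}
	return false // surfaces as "gpu VRAM usage didn't recover within timeout"
}

func main() {
	free := uint64(18) << 30
	probe := func() uint64 { free += 1 << 30; return free } // stub: memory freeing over time
	fmt.Println("recovered:", waitForVRAMRecovery(21<<30, 2*time.Second, probe))
}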

<!-- gh-comment-id:2824117096 -->
@shaofanqi commented on GitHub (Apr 23, 2025):

> `ollama ps` is what ollama expects the split to be, based on being able to load a GPU backend. But if the runner can't find a backend, it will fall back to just CPU, and the output from `ollama ps` will be incorrect. You need to check the logs to determine whether the runner did in fact load 43% of the model in VRAM.

This is the log:

4月 23 16:32:11 ollama[2186]: time=2025-04-23T16:32:11.688+08:00 level=DEBUG source=gpu.go:406 msg="updating system memory data" before.total="251.5 GiB" before.free="243.1 GiB" before.free_swap="1.4 GiB" now.total="251.5 GiB" now.free="243.1 GiB" now.free_swap="1.4 GiB"
4月 23 16:32:11 ollama[2186]: initializing /usr/lib/x86_64-linux-gnu/libcuda.so.550.54.15
4月 23 16:32:11 ollama[2186]: dlsym: cuInit - 0x7fa7ace79ef0
4月 23 16:32:11 ollama[2186]: dlsym: cuDriverGetVersion - 0x7fa7ace79f10
4月 23 16:32:11 ollama[2186]: dlsym: cuDeviceGetCount - 0x7fa7ace79f50
4月 23 16:32:11 ollama[2186]: dlsym: cuDeviceGet - 0x7fa7ace79f30
4月 23 16:32:11 ollama[2186]: dlsym: cuDeviceGetAttribute - 0x7fa7ace7a030
4月 23 16:32:11 ollama[2186]: dlsym: cuDeviceGetUuid - 0x7fa7ace79f90
4月 23 16:32:11 ollama[2186]: dlsym: cuDeviceGetName - 0x7fa7ace79f70
4月 23 16:32:11 ollama[2186]: dlsym: cuCtxCreate_v3 - 0x7fa7ace7a210
4月 23 16:32:11 ollama[2186]: dlsym: cuMemGetInfo_v2 - 0x7fa7ace84190
4月 23 16:32:11 ollama[2186]: dlsym: cuCtxDestroy - 0x7fa7acede7f0
4月 23 16:32:11 ollama[2186]: calling cuInit
4月 23 16:32:11 ollama[2186]: calling cuDriverGetVersion
4月 23 16:32:11 ollama[2186]: raw version 0x2f08
4月 23 16:32:11 ollama[2186]: CUDA driver version: 12.4
4月 23 16:32:11 ollama[2186]: calling cuDeviceGetCount
4月 23 16:32:11 ollama[2186]: device count 2
4月 23 16:32:11 ollama[2186]: time=2025-04-23T16:32:11.777+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-31e889b0-8e81-ae1f-c7c3-56b903ee95de name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="19.8 GiB" now.total="21.7 GiB" now.free="19.8 GiB" now.used="1.9 GiB"
4月 23 16:32:11 ollama[2186]: time=2025-04-23T16:32:11.865+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-10670930-fdf7-6410-320c-f8808f4aac75 name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="20.2 GiB" now.total="21.7 GiB" now.free="20.2 GiB" now.used="1.5 GiB"
4月 23 16:32:11 ollama[2186]: releasing cuda driver library
4月 23 16:32:11 ollama[2186]: time=2025-04-23T16:32:11.865+08:00 level=DEBUG source=gpu.go:406 msg="updating system memory data" before.total="251.5 GiB" before.free="243.1 GiB" before.free_swap="1.4 GiB" now.total="251.5 GiB" now.free="243.1 GiB" now.free_swap="1.4 GiB"
4月 23 16:32:11 ollama[2186]: initializing /usr/lib/x86_64-linux-gnu/libcuda.so.550.54.15
4月 23 16:32:11 ollama[2186]: dlsym: cuInit - 0x7fa7ace79ef0
4月 23 16:32:11 ollama[2186]: dlsym: cuDriverGetVersion - 0x7fa7ace79f10
4月 23 16:32:11 ollama[2186]: dlsym: cuDeviceGetCount - 0x7fa7ace79f50
4月 23 16:32:11 ollama[2186]: dlsym: cuDeviceGet - 0x7fa7ace79f30
4月 23 16:32:11 ollama[2186]: dlsym: cuDeviceGetAttribute - 0x7fa7ace7a030
4月 23 16:32:11 ollama[2186]: dlsym: cuDeviceGetUuid - 0x7fa7ace79f90
4月 23 16:32:11 ollama[2186]: dlsym: cuDeviceGetName - 0x7fa7ace79f70
4月 23 16:32:11 ollama[2186]: dlsym: cuCtxCreate_v3 - 0x7fa7ace7a210
4月 23 16:32:11 ollama[2186]: dlsym: cuMemGetInfo_v2 - 0x7fa7ace84190
4月 23 16:32:11 ollama[2186]: dlsym: cuCtxDestroy - 0x7fa7acede7f0
4月 23 16:32:11 ollama[2186]: calling cuInit
4月 23 16:32:11 ollama[2186]: calling cuDriverGetVersion
4月 23 16:32:11 ollama[2186]: raw version 0x2f08
4月 23 16:32:11 ollama[2186]: CUDA driver version: 12.4
4月 23 16:32:11 ollama[2186]: calling cuDeviceGetCount
4月 23 16:32:11 ollama[2186]: device count 2
4月 23 16:32:11 ollama[2186]: time=2025-04-23T16:32:11.937+08:00 level=WARN source=sched.go:646 msg="gpu VRAM usage didn't recover within timeout" seconds=5.452710664 model=/usr/share/ollama/.ollama/models/blobs/sha256-6c0b473616167f14af406d122ed3a12f374f6958d2ed2a5af0a889e0cf9f1695
4月 23 16:32:11 ollama[2186]: time=2025-04-23T16:32:11.952+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-31e889b0-8e81-ae1f-c7c3-56b903ee95de name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="19.8 GiB" now.total="21.7 GiB" now.free="19.8 GiB" now.used="1.9 GiB"
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.041+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-10670930-fdf7-6410-320c-f8808f4aac75 name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="20.2 GiB" now.total="21.7 GiB" now.free="20.2 GiB" now.used="1.5 GiB"
4月 23 16:32:12 ollama[2186]: releasing cuda driver library
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.041+08:00 level=DEBUG source=gpu.go:406 msg="updating system memory data" before.total="251.5 GiB" before.free="243.1 GiB" before.free_swap="1.4 GiB" now.total="251.5 GiB" now.free="243.1 GiB" now.free_swap="1.4 GiB"
4月 23 16:32:12 ollama[2186]: initializing /usr/lib/x86_64-linux-gnu/libcuda.so.550.54.15
4月 23 16:32:12 ollama[2186]: dlsym: cuInit - 0x7fa7ace79ef0
4月 23 16:32:12 ollama[2186]: dlsym: cuDriverGetVersion - 0x7fa7ace79f10
4月 23 16:32:12 ollama[2186]: dlsym: cuDeviceGetCount - 0x7fa7ace79f50
4月 23 16:32:12 ollama[2186]: dlsym: cuDeviceGet - 0x7fa7ace79f30
4月 23 16:32:12 ollama[2186]: dlsym: cuDeviceGetAttribute - 0x7fa7ace7a030
4月 23 16:32:12 ollama[2186]: dlsym: cuDeviceGetUuid - 0x7fa7ace79f90
4月 23 16:32:12 ollama[2186]: dlsym: cuDeviceGetName - 0x7fa7ace79f70
4月 23 16:32:12 ollama[2186]: dlsym: cuCtxCreate_v3 - 0x7fa7ace7a210
4月 23 16:32:12 ollama[2186]: dlsym: cuMemGetInfo_v2 - 0x7fa7ace84190
4月 23 16:32:12 ollama[2186]: dlsym: cuCtxDestroy - 0x7fa7acede7f0
4月 23 16:32:12 ollama[2186]: calling cuInit
4月 23 16:32:12 ollama[2186]: calling cuDriverGetVersion
4月 23 16:32:12 ollama[2186]: raw version 0x2f08
4月 23 16:32:12 ollama[2186]: CUDA driver version: 12.4
4月 23 16:32:12 ollama[2186]: calling cuDeviceGetCount
4月 23 16:32:12 ollama[2186]: device count 2
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.078+08:00 level=DEBUG source=sched.go:224 msg="loading first model" model=/usr/share/ollama/.ollama/models/blobs/sha256-6c0b473616167f14af406d122ed3a12f374f6958d2ed2a5af0a889e0cf9f1695
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.078+08:00 level=DEBUG source=memory.go:107 msg=evaluating library=cuda gpu_count=1 available="[20.2 GiB]"
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.079+08:00 level=DEBUG source=memory.go:186 msg="gpu has too little memory to allocate any layers" id=GPU-10670930-fdf7-6410-320c-f8808f4aac75 library=cuda variant=v12 compute=7.5 driver=12.4 name="NVIDIA GeForce RTX 2080 Ti" total="21.7 GiB" available="20.2 GiB" minimum_memory=479199232 layer_size="2.5 GiB" gpu_zer_overhead="0 B" partial_offload="51.0 GiB" full_offload="41.0 GiB"
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.079+08:00 level=DEBUG source=memory.go:330 msg="insufficient VRAM to load any model layers"
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.079+08:00 level=DEBUG source=memory.go:107 msg=evaluating library=cuda gpu_count=1 available="[19.8 GiB]"
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.082+08:00 level=DEBUG source=memory.go:186 msg="gpu has too little memory to allocate any layers" id=GPU-31e889b0-8e81-ae1f-c7c3-56b903ee95de library=cuda variant=v12 compute=7.5 driver=12.4 name="NVIDIA GeForce RTX 2080 Ti" total="21.7 GiB" available="19.8 GiB" minimum_memory=479199232 layer_size="2.5 GiB" gpu_zer_overhead="0 B" partial_offload="51.0 GiB" full_offload="41.0 GiB"
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.082+08:00 level=DEBUG source=memory.go:330 msg="insufficient VRAM to load any model layers"
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.082+08:00 level=DEBUG source=memory.go:107 msg=evaluating library=cuda gpu_count=1 available="[20.2 GiB]"
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.084+08:00 level=DEBUG source=memory.go:107 msg=evaluating library=cuda gpu_count=1 available="[19.8 GiB]"
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.085+08:00 level=DEBUG source=memory.go:107 msg=evaluating library=cuda gpu_count=2 available="[20.2 GiB 19.8 GiB]"
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.087+08:00 level=DEBUG source=memory.go:186 msg="gpu has too little memory to allocate any layers" id=GPU-10670930-fdf7-6410-320c-f8808f4aac75 library=cuda variant=v12 compute=7.5 driver=12.4 name="NVIDIA GeForce RTX 2080 Ti" total="21.7 GiB" available="20.2 GiB" minimum_memory=479199232 layer_size="2.5 GiB" gpu_zer_overhead="0 B" partial_offload="51.0 GiB" full_offload="51.0 GiB"
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.087+08:00 level=DEBUG source=memory.go:186 msg="gpu has too little memory to allocate any layers" id=GPU-31e889b0-8e81-ae1f-c7c3-56b903ee95de library=cuda variant=v12 compute=7.5 driver=12.4 name="NVIDIA GeForce RTX 2080 Ti" total="21.7 GiB" available="19.8 GiB" minimum_memory=479199232 layer_size="2.5 GiB" gpu_zer_overhead="0 B" partial_offload="51.0 GiB" full_offload="51.0 GiB"
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.087+08:00 level=DEBUG source=memory.go:330 msg="insufficient VRAM to load any model layers"
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.087+08:00 level=DEBUG source=memory.go:107 msg=evaluating library=cuda gpu_count=2 available="[20.2 GiB 19.8 GiB]"
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.131+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-31e889b0-8e81-ae1f-c7c3-56b903ee95de name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="19.8 GiB" now.total="21.7 GiB" now.free="19.8 GiB" now.used="1.9 GiB"
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.218+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-10670930-fdf7-6410-320c-f8808f4aac75 name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="20.2 GiB" now.total="21.7 GiB" now.free="20.2 GiB" now.used="1.5 GiB"
4月 23 16:32:12 ollama[2186]: releasing cuda driver library
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.218+08:00 level=WARN source=sched.go:646 msg="gpu VRAM usage didn't recover within timeout" seconds=5.733618916 model=/usr/share/ollama/.ollama/models/blobs/sha256-6c0b473616167f14af406d122ed3a12f374f6958d2ed2a5af0a889e0cf9f1695
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.218+08:00 level=DEBUG source=gpu.go:406 msg="updating system memory data" before.total="251.5 GiB" before.free="243.1 GiB" before.free_swap="1.4 GiB" now.total="251.5 GiB" now.free="243.1 GiB" now.free_swap="1.4 GiB"
4月 23 16:32:12 ollama[2186]: initializing /usr/lib/x86_64-linux-gnu/libcuda.so.550.54.15
4月 23 16:32:12 ollama[2186]: dlsym: cuInit - 0x7fa7ace79ef0
4月 23 16:32:12 ollama[2186]: dlsym: cuDriverGetVersion - 0x7fa7ace79f10
4月 23 16:32:12 ollama[2186]: dlsym: cuDeviceGetCount - 0x7fa7ace79f50
4月 23 16:32:12 ollama[2186]: dlsym: cuDeviceGet - 0x7fa7ace79f30
4月 23 16:32:12 ollama[2186]: dlsym: cuDeviceGetAttribute - 0x7fa7ace7a030
4月 23 16:32:12 ollama[2186]: dlsym: cuDeviceGetUuid - 0x7fa7ace79f90
4月 23 16:32:12 ollama[2186]: dlsym: cuDeviceGetName - 0x7fa7ace79f70
4月 23 16:32:12 ollama[2186]: dlsym: cuCtxCreate_v3 - 0x7fa7ace7a210
4月 23 16:32:12 ollama[2186]: dlsym: cuMemGetInfo_v2 - 0x7fa7ace84190
4月 23 16:32:12 ollama[2186]: dlsym: cuCtxDestroy - 0x7fa7acede7f0
4月 23 16:32:12 ollama[2186]: calling cuInit
4月 23 16:32:12 ollama[2186]: calling cuDriverGetVersion
4月 23 16:32:12 ollama[2186]: raw version 0x2f08
4月 23 16:32:12 ollama[2186]: CUDA driver version: 12.4
4月 23 16:32:12 ollama[2186]: calling cuDeviceGetCount
4月 23 16:32:12 ollama[2186]: device count 2
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.305+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-31e889b0-8e81-ae1f-c7c3-56b903ee95de name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="19.8 GiB" now.total="21.7 GiB" now.free="19.8 GiB" now.used="1.9 GiB"
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.392+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-10670930-fdf7-6410-320c-f8808f4aac75 name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="20.2 GiB" now.total="21.7 GiB" now.free="20.2 GiB" now.used="1.5 GiB"
4月 23 16:32:12 ollama[2186]: releasing cuda driver library
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.392+08:00 level=INFO source=server.go:104 msg="system memory" total="251.5 GiB" free="243.1 GiB" free_swap="1.4 GiB"
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.392+08:00 level=DEBUG source=memory.go:107 msg=evaluating library=cuda gpu_count=2 available="[19.8 GiB 20.2 GiB]"
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.393+08:00 level=INFO source=memory.go:356 msg="offload to cuda" layers.requested=-1 layers.model=65 layers.offload=11 layers.split=5,6 memory.available="[19.8 GiB 20.2 GiB]" memory.gpu_overhead="0 B" memory.required.full="92.1 GiB" memory.required.partial="39.2 GiB" memory.required.kv="32.0 GiB" memory.required.allocations="[19.1 GiB 20.1 GiB]" memory.weights.total="62.9 GiB" memory.weights.repeating="62.1 GiB" memory.weights.nonrepeating="788.9 MiB" memory.graph.full="12.8 GiB" memory.graph.partial="12.8 GiB"
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.393+08:00 level=DEBUG source=gpu.go:713 msg="no filter required for library cpu"
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.393+08:00 level=INFO source=server.go:376 msg="starting llama server" cmd="/usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-6c0b473616167f14af406d122ed3a12f374f6958d2ed2a5af0a889e0cf9f1695 --ctx-size 131072 --batch-size 512 --n-gpu-layers 11 --verbose --threads 16 --parallel 1 --tensor-split 5,6 --port 39391"
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.393+08:00 level=DEBUG source=server.go:393 msg=subprocess environment="[PATH=/home/eppei/miniconda3/envs/vllm/bin:/home/eppei/miniconda3/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin LD_LIBRARY_PATH=/usr/local/lib/ollama:/usr/local/lib/ollama:/usr/local/bin]"
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.394+08:00 level=INFO source=sched.go:449 msg="loaded runners" count=1
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.394+08:00 level=INFO source=server.go:555 msg="waiting for llama runner to start responding"
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.394+08:00 level=INFO source=server.go:589 msg="waiting for server to become available" status="llm server error"
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.416+08:00 level=INFO source=runner.go:936 msg="starting go runner"
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.416+08:00 level=INFO source=runner.go:937 msg=system info="CPU : LLAMAFILE = 1 | AARCH64_REPACK = 1 | CPU : LLAMAFILE = 1 | AARCH64_REPACK = 1 | cgo(gcc)" threads=16
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.416+08:00 level=INFO source=runner.go:995 msg="Server listening on 127.0.0.1:39391"
4月 23 16:32:12 ollama[2186]: llama_model_loader: loaded meta data with 33 key-value pairs and 771 tensors from /usr/share/ollama/.ollama/models/blobs/sha256-6c0b473616167f14af406d122ed3a12f374f6958d2ed2a5af0a889e0cf9f1695 (version GGUF V3 (latest))
4月 23 16:32:12 ollama[2186]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 0: general.architecture str = qwen2
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 1: general.type str = model
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 2: general.name str = QwQ 32B
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 3: general.basename str = QwQ
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 4: general.size_label str = 32B
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 5: general.license str = apache-2.0
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 6: general.license.link str = https://huggingface.co/Qwen/QWQ-32B/b...
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 7: general.base_model.count u32 = 1
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 8: general.base_model.0.name str = Qwen2.5 32B
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 9: general.base_model.0.organization str = Qwen
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 10: general.base_model.0.repo_url str = https://huggingface.co/Qwen/Qwen2.5-32B
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 11: general.tags arr[str,2] = ["chat", "text-generation"]
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 12: general.languages arr[str,1] = ["en"]
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 13: qwen2.block_count u32 = 64
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 14: qwen2.context_length u32 = 131072
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 15: qwen2.embedding_length u32 = 5120
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 16: qwen2.feed_forward_length u32 = 27648
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 17: qwen2.attention.head_count u32 = 40
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 18: qwen2.attention.head_count_kv u32 = 8
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 19: qwen2.rope.freq_base f32 = 1000000.000000
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 20: qwen2.attention.layer_norm_rms_epsilon f32 = 0.000010
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 21: tokenizer.ggml.model str = gpt2
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 22: tokenizer.ggml.pre str = qwen2
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 23: tokenizer.ggml.tokens arr[str,152064] = ["!", "\"", "#", "$", "%", "&", "'", ...
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 24: tokenizer.ggml.token_type arr[i32,152064] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 25: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 26: tokenizer.ggml.eos_token_id u32 = 151645
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 27: tokenizer.ggml.padding_token_id u32 = 151643
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 28: tokenizer.ggml.bos_token_id u32 = 151643
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 29: tokenizer.ggml.add_bos_token bool = false
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 30: tokenizer.chat_template str = {%- if tools %}\n {{- '<|im_start|>...
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 31: general.quantization_version u32 = 2
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 32: general.file_type u32 = 7
4月 23 16:32:12 ollama[2186]: llama_model_loader: - type f32: 321 tensors
4月 23 16:32:12 ollama[2186]: llama_model_loader: - type q8_0: 450 tensors
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.646+08:00 level=INFO source=server.go:589 msg="waiting for server to become available" status="llm server loading model"
4月 23 16:32:12 ollama[2186]: llm_load_vocab: control token: 151660 '<|fim_middle|>' is not marked as EOG
4月 23 16:32:12 ollama[2186]: llm_load_vocab: control token: 151659 '<|fim_prefix|>' is not marked as EOG
4月 23 16:32:12 ollama[2186]: llm_load_vocab: control token: 151653 '<|vision_end|>' is not marked as EOG
4月 23 16:32:12 ollama[2186]: llm_load_vocab: control token: 151648 '<|box_start|>' is not marked as EOG
4月 23 16:32:12 ollama[2186]: llm_load_vocab: control token: 151646 '<|object_ref_start|>' is not marked as EOG
4月 23 16:32:12 ollama[2186]: llm_load_vocab: control token: 151649 '<|box_end|>' is not marked as EOG
4月 23 16:32:12 ollama[2186]: llm_load_vocab: control token: 151655 '<|image_pad|>' is not marked as EOG
4月 23 16:32:12 ollama[2186]: llm_load_vocab: control token: 151651 '<|quad_end|>' is not marked as EOG
4月 23 16:32:12 ollama[2186]: llm_load_vocab: control token: 151647 '<|object_ref_end|>' is not marked as EOG
4月 23 16:32:12 ollama[2186]: llm_load_vocab: control token: 151652 '<|vision_start|>' is not marked as EOG
4月 23 16:32:12 ollama[2186]: llm_load_vocab: control token: 151654 '<|vision_pad|>' is not marked as EOG
4月 23 16:32:12 ollama[2186]: llm_load_vocab: control token: 151656 '<|video_pad|>' is not marked as EOG
4月 23 16:32:12 ollama[2186]: llm_load_vocab: control token: 151644 '<|im_start|>' is not marked as EOG
4月 23 16:32:12 ollama[2186]: llm_load_vocab: control token: 151661 '<|fim_suffix|>' is not marked as EOG
4月 23 16:32:12 ollama[2186]: llm_load_vocab: control token: 151650 '<|quad_start|>' is not marked as EOG
4月 23 16:32:12 ollama[2186]: llm_load_vocab: special tokens cache size = 26
4月 23 16:32:13 ollama[2186]: llm_load_vocab: token to piece cache size = 0.9311 MB
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: format = GGUF V3 (latest)
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: arch = qwen2
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: vocab type = BPE
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: n_vocab = 152064
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: n_merges = 151387
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: vocab_only = 0
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: n_ctx_train = 131072
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: n_embd = 5120
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: n_layer = 64
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: n_head = 40
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: n_head_kv = 8
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: n_rot = 128
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: n_swa = 0
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: n_embd_head_k = 128
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: n_embd_head_v = 128
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: n_gqa = 5
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: n_embd_k_gqa = 1024
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: n_embd_v_gqa = 1024
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: f_norm_eps = 0.0e+00
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: f_norm_rms_eps = 1.0e-05
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: f_clamp_kqv = 0.0e+00
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: f_max_alibi_bias = 0.0e+00
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: f_logit_scale = 0.0e+00
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: n_ff = 27648
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: n_expert = 0
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: n_expert_used = 0
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: causal attn = 1
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: pooling type = 0
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: rope type = 2
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: rope scaling = linear
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: freq_base_train = 1000000.0
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: freq_scale_train = 1
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: n_ctx_orig_yarn = 131072
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: rope_finetuned = unknown
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: ssm_d_conv = 0
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: ssm_d_inner = 0
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: ssm_d_state = 0
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: ssm_dt_rank = 0
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: ssm_dt_b_c_rms = 0
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: model type = 32B
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: model ftype = Q8_0
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: model params = 32.76 B
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: model size = 32.42 GiB (8.50 BPW)
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: general.name = QwQ 32B
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: BOS token = 151643 '<|endoftext|>'
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: EOS token = 151645 '<|im_end|>'
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: EOT token = 151645 '<|im_end|>'
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: PAD token = 151643 '<|endoftext|>'
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: LF token = 148848 'ÄĬ'
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: FIM PRE token = 151659 '<|fim_prefix|>'
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: FIM SUF token = 151661 '<|fim_suffix|>'
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: FIM MID token = 151660 '<|fim_middle|>'
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: FIM PAD token = 151662 '<|fim_pad|>'
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: FIM REP token = 151663 '<|repo_name|>'
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: FIM SEP token = 151664 '<|file_sep|>'
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: EOG token = 151643 '<|endoftext|>'
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: EOG token = 151645 '<|im_end|>'
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: EOG token = 151662 '<|fim_pad|>'
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: EOG token = 151663 '<|repo_name|>'
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: EOG token = 151664 '<|file_sep|>'
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: max token length = 256
4月 23 16:32:13 ollama[2186]: llm_load_tensors: tensor 'token_embd.weight' (q8_0) (and 770 others) cannot be used with preferred buffer type CPU_AARCH64, using CPU instead
4月 23 16:32:14 ollama[2186]: llm_load_tensors: CPU_Mapped model buffer size = 33202.08 MiB
4月 23 16:32:14 ollama[2186]: llama_new_context_with_model: n_seq_max = 1
4月 23 16:32:14 ollama[2186]: llama_new_context_with_model: n_ctx = 131072
4月 23 16:32:14 ollama[2186]: llama_new_context_with_model: n_ctx_per_seq = 131072
4月 23 16:32:14 ollama[2186]: llama_new_context_with_model: n_batch = 512
4月 23 16:32:14 ollama[2186]: llama_new_context_with_model: n_ubatch = 512
4月 23 16:32:14 ollama[2186]: llama_new_context_with_model: flash_attn = 0
4月 23 16:32:14 ollama[2186]: llama_new_context_with_model: freq_base = 1000000.0
4月 23 16:32:14 ollama[2186]: llama_new_context_with_model: freq_scale = 1
4月 23 16:32:14 ollama[2186]: llama_kv_cache_init: kv_size = 131072, offload = 1, type_k = 'f16', type_v = 'f16', n_layer = 64, can_shift = 1
4月 23 16:32:14 ollama[2186]: llama_kv_cache_init: layer 0: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 16:32:14 ollama[2186]: llama_kv_cache_init: layer 1: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 16:32:14 ollama[2186]: llama_kv_cache_init: layer 2: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 16:32:14 ollama[2186]: llama_kv_cache_init: layer 3: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 16:32:14 ollama[2186]: llama_kv_cache_init: layer 4: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 16:32:14 ollama[2186]: llama_kv_cache_init: layer 5: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 16:32:14 ollama[2186]: llama_kv_cache_init: layer 6: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 16:32:14 ollama[2186]: llama_kv_cache_init: layer 7: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 16:32:14 ollama[2186]: llama_kv_cache_init: layer 8: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 16:32:14 ollama[2186]: llama_kv_cache_init: layer 9: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 16:32:14 ollama[2186]: llama_kv_cache_init: layer 10: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 16:32:14 ollama[2186]: llama_kv_cache_init: layer 11: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 16:32:14 ollama[2186]: llama_kv_cache_init: layer 12: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 16:32:14 ollama[2186]: llama_kv_cache_init: layer 13: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 16:32:14 ollama[2186]: llama_kv_cache_init: layer 14: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 16:32:14 ollama[2186]: llama_kv_cache_init: layer 15: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 16:32:14 ollama[2186]: llama_kv_cache_init: layer 16: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 16:32:14 ollama[2186]: llama_kv_cache_init: layer 17: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 16:32:14 ollama[2186]: llama_kv_cache_init: layer 18: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 16:32:14 ollama[2186]: llama_kv_cache_init: layer 19: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 16:32:14 ollama[2186]: llama_kv_cache_init: layer 20: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 16:32:14 ollama[2186]: llama_kv_cache_init: layer 21: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 16:32:14 ollama[2186]: llama_kv_cache_init: layer 22: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 16:32:14 ollama[2186]: llama_kv_cache_init: layer 23: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 16:32:14 ollama[2186]: llama_kv_cache_init: layer 24: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 16:32:14 ollama[2186]: llama_kv_cache_init: layer 25: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 16:32:14 ollama[2186]: llama_kv_cache_init: layer 26: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 16:32:14 ollama[2186]: llama_kv_cache_init: layer 27: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 16:32:14 ollama[2186]: llama_kv_cache_init: layer 28: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 16:32:14 ollama[2186]: llama_kv_cache_init: layer 29: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 16:32:14 ollama[2186]: llama_kv_cache_init: layer 30: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 16:32:14 ollama[2186]: llama_kv_cache_init: layer 31: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 16:32:14 ollama[2186]: llama_kv_cache_init: layer 32: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 16:32:14 ollama[2186]: llama_kv_cache_init: layer 33: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 16:32:14 ollama[2186]: llama_kv_cache_init: layer 34: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 16:32:14 ollama[2186]: llama_kv_cache_init: layer 35: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 16:32:14 ollama[2186]: llama_kv_cache_init: layer 36: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 16:32:14 ollama[2186]: llama_kv_cache_init: layer 37: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 16:32:14 ollama[2186]: llama_kv_cache_init: layer 38: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 16:32:14 ollama[2186]: llama_kv_cache_init: layer 39: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 16:32:14 ollama[2186]: llama_kv_cache_init: layer 40: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 16:32:14 ollama[2186]: llama_kv_cache_init: layer 41: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 16:32:14 ollama[2186]: llama_kv_cache_init: layer 42: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 16:32:14 ollama[2186]: llama_kv_cache_init: layer 43: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 16:32:14 ollama[2186]: llama_kv_cache_init: layer 44: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 16:32:14 ollama[2186]: llama_kv_cache_init: layer 45: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 16:32:14 ollama[2186]: llama_kv_cache_init: layer 46: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 16:32:14 ollama[2186]: llama_kv_cache_init: layer 47: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 16:32:14 ollama[2186]: llama_kv_cache_init: layer 48: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 16:32:14 ollama[2186]: llama_kv_cache_init: layer 49: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 16:32:14 ollama[2186]: llama_kv_cache_init: layer 50: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 16:32:14 ollama[2186]: llama_kv_cache_init: layer 51: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 16:32:14 ollama[2186]: llama_kv_cache_init: layer 52: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 16:32:14 ollama[2186]: llama_kv_cache_init: layer 53: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 16:32:14 ollama[2186]: llama_kv_cache_init: layer 54: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 16:32:14 ollama[2186]: llama_kv_cache_init: layer 55: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 16:32:14 ollama[2186]: llama_kv_cache_init: layer 56: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 16:32:14 ollama[2186]: llama_kv_cache_init: layer 57: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 16:32:14 ollama[2186]: llama_kv_cache_init: layer 58: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 16:32:14 ollama[2186]: llama_kv_cache_init: layer 59: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 16:32:14 ollama[2186]: llama_kv_cache_init: layer 60: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 16:32:14 ollama[2186]: llama_kv_cache_init: layer 61: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 16:32:14 ollama[2186]: llama_kv_cache_init: layer 62: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 16:32:14 ollama[2186]: llama_kv_cache_init: layer 63: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 16:32:14 ollama[2186]: time=2025-04-23T16:32:14.906+08:00 level=DEBUG source=server.go:600 msg="model load progress 1.00"
4月 23 16:32:15 ollama[2186]: time=2025-04-23T16:32:15.158+08:00 level=DEBUG source=server.go:603 msg="model load completed, waiting for server to become available" status="llm server loading model"
4月 23 16:32:31 ollama[2186]: llama_kv_cache_init: CPU KV buffer size = 32768.00 MiB
4月 23 16:32:31 ollama[2186]: llama_new_context_with_model: KV self size = 32768.00 MiB, K (f16): 16384.00 MiB, V (f16): 16384.00 MiB
4月 23 16:32:31 ollama[2186]: llama_new_context_with_model: CPU output buffer size = 0.60 MiB
4月 23 16:32:31 ollama[2186]: llama_new_context_with_model: CPU compute buffer size = 10536.01 MiB
4月 23 16:32:31 ollama[2186]: llama_new_context_with_model: graph nodes = 2246
4月 23 16:32:31 ollama[2186]: llama_new_context_with_model: graph splits = 1
4月 23 16:32:31 ollama[2186]: time=2025-04-23T16:32:31.489+08:00 level=INFO source=server.go:594 msg="llama runner started in 19.09 seconds"
4月 23 16:32:31 ollama[2186]: time=2025-04-23T16:32:31.489+08:00 level=DEBUG source=sched.go:462 msg="finished setting up runner" model=/usr/share/ollama/.ollama/models/blobs/sha256-6c0b473616167f14af406d122ed3a12f374f6958d2ed2a5af0a889e0cf9f1695
4月 23 16:32:31 ollama[2186]: [48.0K blob data]
4月 23 16:32:31 ollama[2186]: [48.0K blob data]
4月 23 16:32:31 ollama[2186]: [48.0K blob data]
4月 23 16:32:31 ollama[2186]: [48.0K blob data]
4月 23 16:32:31 ollama[2186]: [48.0K blob data]
4月 23 16:32:31 ollama[2186]: 使提出相应意见建议<|im_end|>\n<|im_start|>assistant\n"
4月 23 16:32:31 ollama[2186]: time=2025-04-23T16:32:31.889+08:00 level=DEBUG source=cache.go:104 msg="loading cache slot" id=0 cache=0 prompt=70179 used=0 remaining=70179
4月 23 16:33:57 ollama[2186]: [GIN] 2025/04/23 - 16:33:57 | 200 | 38.816µs | 127.0.0.1 | HEAD "/"
4月 23 16:33:57 ollama[2186]: [GIN] 2025/04/23 - 16:33:57 | 200 | 63.861µs | 127.0.0.1 | GET "/api/ps"
4月 23 16:34:30 ollama[2186]: time=2025-04-23T16:34:30.883+08:00 level=DEBUG source=sched.go:466 msg="context for request finished"
4月 23 16:34:30 ollama[2186]: time=2025-04-23T16:34:30.883+08:00 level=DEBUG source=sched.go:339 msg="runner with non-zero duration has gone idle, adding timer" modelPath=/usr/share/ollama/.ollama/models/blobs/sha256-6c0b473616167f14af406d122ed3a12f374f6958d2ed2a5af0a889e0cf9f1695 duration=5m0s
4月 23 16:34:30 ollama[2186]: time=2025-04-23T16:34:30.883+08:00 level=DEBUG source=sched.go:357 msg="after processing request finished event" modelPath=/usr/share/ollama/.ollama/models/blobs/sha256-6c0b473616167f14af406d122ed3a12f374f6958d2ed2a5af0a889e0cf9f1695 refCount=0
4月 23 16:34:30 ollama[2186]: [GIN] 2025/04/23 - 16:34:30 | 200 | 2m24s | 127.0.0.1 | POST "/api/chat"
4月 23 16:34:35 ollama[2186]: [GIN] 2025/04/23 - 16:34:35 | 200 | 35.318µs | 127.0.0.1 | HEAD "/"
4月 23 16:34:35 ollama[2186]: [GIN] 2025/04/23 - 16:34:35 | 200 | 39.087821ms | 127.0.0.1 | POST "/api/show"
4月 23 16:34:35 ollama[2186]: time=2025-04-23T16:34:35.936+08:00 level=DEBUG source=sched.go:575 msg="evaluating already loaded" model=/usr/share/ollama/.ollama/models/blobs/sha256-6c0b473616167f14af406d122ed3a12f374f6958d2ed2a5af0a889e0cf9f1695
4月 23 16:34:35 ollama[2186]: time=2025-04-23T16:34:35.936+08:00 level=DEBUG source=sched.go:283 msg="resetting model to expire immediately to make room" modelPath=/usr/share/ollama/.ollama/models/blobs/sha256-6c0b473616167f14af406d122ed3a12f374f6958d2ed2a5af0a889e0cf9f1695 refCount=0
4月 23 16:34:35 ollama[2186]: time=2025-04-23T16:34:35.936+08:00 level=DEBUG source=sched.go:296 msg="waiting for pending requests to complete and unload to occur" modelPath=/usr/share/ollama/.ollama/models/blobs/sha256-6c0b473616167f14af406d122ed3a12f374f6958d2ed2a5af0a889e0cf9f1695
4月 23 16:34:35 ollama[2186]: time=2025-04-23T16:34:35.936+08:00 level=DEBUG source=sched.go:360 msg="runner expired event received" modelPath=/usr/share/ollama/.ollama/models/blobs/sha256-6c0b473616167f14af406d122ed3a12f374f6958d2ed2a5af0a889e0cf9f1695
4月 23 16:34:35 ollama[2186]: time=2025-04-23T16:34:35.936+08:00 level=DEBUG source=sched.go:375 msg="got lock to unload" modelPath=/usr/share/ollama/.ollama/models/blobs/sha256-6c0b473616167f14af406d122ed3a12f374f6958d2ed2a5af0a889e0cf9f1695
4月 23 16:34:35 ollama[2186]: time=2025-04-23T16:34:35.936+08:00 level=DEBUG source=gpu.go:406 msg="updating system memory data" before.total="251.5 GiB" before.free="243.1 GiB" before.free_swap="1.4 GiB" now.total="251.5 GiB" now.free="210.8 GiB" now.free_swap="1.4 GiB"
4月 23 16:34:35 ollama[2186]: initializing /usr/lib/x86_64-linux-gnu/libcuda.so.550.54.15
4月 23 16:34:35 ollama[2186]: dlsym: cuInit - 0x7fa7ace79ef0
4月 23 16:34:35 ollama[2186]: dlsym: cuDriverGetVersion - 0x7fa7ace79f10
4月 23 16:34:35 ollama[2186]: dlsym: cuDeviceGetCount - 0x7fa7ace79f50
4月 23 16:34:35 ollama[2186]: dlsym: cuDeviceGet - 0x7fa7ace79f30
4月 23 16:34:35 ollama[2186]: dlsym: cuDeviceGetAttribute - 0x7fa7ace7a030
4月 23 16:34:35 ollama[2186]: dlsym: cuDeviceGetUuid - 0x7fa7ace79f90
4月 23 16:34:35 ollama[2186]: dlsym: cuDeviceGetName - 0x7fa7ace79f70
4月 23 16:34:35 ollama[2186]: dlsym: cuCtxCreate_v3 - 0x7fa7ace7a210
4月 23 16:34:35 ollama[2186]: dlsym: cuMemGetInfo_v2 - 0x7fa7ace84190
4月 23 16:34:35 ollama[2186]: dlsym: cuCtxDestroy - 0x7fa7acede7f0
4月 23 16:34:35 ollama[2186]: calling cuInit
4月 23 16:34:35 ollama[2186]: calling cuDriverGetVersion
4月 23 16:34:35 ollama[2186]: raw version 0x2f08
4月 23 16:34:35 ollama[2186]: CUDA driver version: 12.4
4月 23 16:34:35 ollama[2186]: calling cuDeviceGetCount
4月 23 16:34:35 ollama[2186]: device count 2
4月 23 16:34:36 ollama[2186]: time=2025-04-23T16:34:36.075+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-31e889b0-8e81-ae1f-c7c3-56b903ee95de name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="19.8 GiB" now.total="21.7 GiB" now.free="19.8 GiB" now.used="1.9 GiB"
4月 23 16:34:36 ollama[2186]: time=2025-04-23T16:34:36.218+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-10670930-fdf7-6410-320c-f8808f4aac75 name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="20.2 GiB" now.total="21.7 GiB" now.free="20.2 GiB" now.used="1.5 GiB"
4月 23 16:34:36 ollama[2186]: releasing cuda driver library
4月 23 16:34:36 ollama[2186]: time=2025-04-23T16:34:36.218+08:00 level=DEBUG source=server.go:1079 msg="stopping llama server"
4月 23 16:34:36 ollama[2186]: time=2025-04-23T16:34:36.219+08:00 level=DEBUG source=server.go:1085 msg="waiting for llama server to exit"
4月 23 16:34:36 ollama[2186]: time=2025-04-23T16:34:36.469+08:00 level=DEBUG source=gpu.go:406 msg="updating system memory data" before.total="251.5 GiB" before.free="210.8 GiB" before.free_swap="1.4 GiB" now.total="251.5 GiB" now.free="216.9 GiB" now.free_swap="1.4 GiB"
4月 23 16:34:36 ollama[2186]: initializing /usr/lib/x86_64-linux-gnu/libcuda.so.550.54.15
4月 23 16:34:36 ollama[2186]: dlsym: cuInit - 0x7fa7ace79ef0
4月 23 16:34:36 ollama[2186]: dlsym: cuDriverGetVersion - 0x7fa7ace79f10
4月 23 16:34:36 ollama[2186]: dlsym: cuDeviceGetCount - 0x7fa7ace79f50
4月 23 16:34:36 ollama[2186]: dlsym: cuDeviceGet - 0x7fa7ace79f30
4月 23 16:34:36 ollama[2186]: dlsym: cuDeviceGetAttribute - 0x7fa7ace7a030
4月 23 16:34:36 ollama[2186]: dlsym: cuDeviceGetUuid - 0x7fa7ace79f90
4月 23 16:34:36 ollama[2186]: dlsym: cuDeviceGetName - 0x7fa7ace79f70
4月 23 16:34:36 ollama[2186]: dlsym: cuCtxCreate_v3 - 0x7fa7ace7a210
4月 23 16:34:36 ollama[2186]: dlsym: cuMemGetInfo_v2 - 0x7fa7ace84190
4月 23 16:34:36 ollama[2186]: dlsym: cuCtxDestroy - 0x7fa7acede7f0
4月 23 16:34:36 ollama[2186]: calling cuInit
4月 23 16:34:36 ollama[2186]: calling cuDriverGetVersion
4月 23 16:34:36 ollama[2186]: raw version 0x2f08
4月 23 16:34:36 ollama[2186]: CUDA driver version: 12.4
4月 23 16:34:36 ollama[2186]: calling cuDeviceGetCount
4月 23 16:34:36 ollama[2186]: device count 2
4月 23 16:34:36 ollama[2186]: time=2025-04-23T16:34:36.566+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-31e889b0-8e81-ae1f-c7c3-56b903ee95de name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="19.8 GiB" now.total="21.7 GiB" now.free="19.8 GiB" now.used="1.9 GiB"
4月 23 16:34:36 ollama[2186]: time=2025-04-23T16:34:36.667+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-10670930-fdf7-6410-320c-f8808f4aac75 name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="20.2 GiB" now.total="21.7 GiB" now.free="20.2 GiB" now.used="1.5 GiB"
4月 23 16:34:36 ollama[2186]: releasing cuda driver library
4月 23 16:34:36 ollama[2186]: time=2025-04-23T16:34:36.720+08:00 level=DEBUG source=gpu.go:406 msg="updating system memory data" before.total="251.5 GiB" before.free="216.9 GiB" before.free_swap="1.4 GiB" now.total="251.5 GiB" now.free="222.2 GiB" now.free_swap="1.4 GiB"
4月 23 16:34:36 ollama[2186]: initializing /usr/lib/x86_64-linux-gnu/libcuda.so.550.54.15
4月 23 16:34:36 ollama[2186]: dlsym: cuInit - 0x7fa7ace79ef0
4月 23 16:34:36 ollama[2186]: dlsym: cuDriverGetVersion - 0x7fa7ace79f10
4月 23 16:34:36 ollama[2186]: dlsym: cuDeviceGetCount - 0x7fa7ace79f50
4月 23 16:34:36 ollama[2186]: dlsym: cuDeviceGet - 0x7fa7ace79f30
4月 23 16:34:36 ollama[2186]: dlsym: cuDeviceGetAttribute - 0x7fa7ace7a030
4月 23 16:34:36 ollama[2186]: dlsym: cuDeviceGetUuid - 0x7fa7ace79f90
4月 23 16:34:36 ollama[2186]: dlsym: cuDeviceGetName - 0x7fa7ace79f70
4月 23 16:34:36 ollama[2186]: dlsym: cuCtxCreate_v3 - 0x7fa7ace7a210
4月 23 16:34:36 ollama[2186]: dlsym: cuMemGetInfo_v2 - 0x7fa7ace84190
4月 23 16:34:36 ollama[2186]: dlsym: cuCtxDestroy - 0x7fa7acede7f0
4月 23 16:34:36 ollama[2186]: calling cuInit
4月 23 16:34:36 ollama[2186]: calling cuDriverGetVersion
4月 23 16:34:36 ollama[2186]: raw version 0x2f08
4月 23 16:34:36 ollama[2186]: CUDA driver version: 12.4
4月 23 16:34:36 ollama[2186]: calling cuDeviceGetCount
4月 23 16:34:36 ollama[2186]: device count 2
4月 23 16:34:36 ollama[2186]: time=2025-04-23T16:34:36.817+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-31e889b0-8e81-ae1f-c7c3-56b903ee95de name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="19.8 GiB" now.total="21.7 GiB" now.free="19.8 GiB" now.used="1.9 GiB"
4月 23 16:34:36 ollama[2186]: time=2025-04-23T16:34:36.910+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-10670930-fdf7-6410-320c-f8808f4aac75 name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="20.2 GiB" now.total="21.7 GiB" now.free="20.2 GiB" now.used="1.5 GiB"
4月 23 16:34:36 ollama[2186]: releasing cuda driver library
4月 23 16:34:36 ollama[2186]: time=2025-04-23T16:34:36.969+08:00 level=DEBUG source=gpu.go:406 msg="updating system memory data" before.total="251.5 GiB" before.free="222.2 GiB" before.free_swap="1.4 GiB" now.total="251.5 GiB" now.free="227.1 GiB" now.free_swap="1.4 GiB"
4月 23 16:34:36 ollama[2186]: initializing /usr/lib/x86_64-linux-gnu/libcuda.so.550.54.15
4月 23 16:34:36 ollama[2186]: dlsym: cuInit - 0x7fa7ace79ef0
4月 23 16:34:36 ollama[2186]: dlsym: cuDriverGetVersion - 0x7fa7ace79f10
4月 23 16:34:36 ollama[2186]: dlsym: cuDeviceGetCount - 0x7fa7ace79f50
4月 23 16:34:36 ollama[2186]: dlsym: cuDeviceGet - 0x7fa7ace79f30
4月 23 16:34:36 ollama[2186]: dlsym: cuDeviceGetAttribute - 0x7fa7ace7a030
4月 23 16:34:36 ollama[2186]: dlsym: cuDeviceGetUuid - 0x7fa7ace79f90
4月 23 16:34:36 ollama[2186]: dlsym: cuDeviceGetName - 0x7fa7ace79f70
4月 23 16:34:36 ollama[2186]: dlsym: cuCtxCreate_v3 - 0x7fa7ace7a210
4月 23 16:34:36 ollama[2186]: dlsym: cuMemGetInfo_v2 - 0x7fa7ace84190
4月 23 16:34:36 ollama[2186]: dlsym: cuCtxDestroy - 0x7fa7acede7f0
4月 23 16:34:36 ollama[2186]: calling cuInit
4月 23 16:34:36 ollama[2186]: calling cuDriverGetVersion
4月 23 16:34:36 ollama[2186]: raw version 0x2f08
4月 23 16:34:36 ollama[2186]: CUDA driver version: 12.4
4月 23 16:34:36 ollama[2186]: calling cuDeviceGetCount
4月 23 16:34:36 ollama[2186]: device count 2
4月 23 16:34:37 ollama[2186]: time=2025-04-23T16:34:37.066+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-31e889b0-8e81-ae1f-c7c3-56b903ee95de name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="19.8 GiB" now.total="21.7 GiB" now.free="19.8 GiB" now.used="1.9 GiB"
4月 23 16:34:37 ollama[2186]: time=2025-04-23T16:34:37.160+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-10670930-fdf7-6410-320c-f8808f4aac75 name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="20.2 GiB" now.total="21.7 GiB" now.free="20.2 GiB" now.used="1.5 GiB"
4月 23 16:34:37 ollama[2186]: releasing cuda driver library
4月 23 16:34:37 ollama[2186]: time=2025-04-23T16:34:37.219+08:00 level=DEBUG source=gpu.go:406 msg="updating system memory data" before.total="251.5 GiB" before.free="227.1 GiB" before.free_swap="1.4 GiB" now.total="251.5 GiB" now.free="231.8 GiB" now.free_swap="1.4 GiB"
4月 23 16:34:37 ollama[2186]: initializing /usr/lib/x86_64-linux-gnu/libcuda.so.550.54.15
4月 23 16:34:37 ollama[2186]: dlsym: cuInit - 0x7fa7ace79ef0
4月 23 16:34:37 ollama[2186]: dlsym: cuDriverGetVersion - 0x7fa7ace79f10
4月 23 16:34:37 ollama[2186]: dlsym: cuDeviceGetCount - 0x7fa7ace79f50
4月 23 16:34:37 ollama[2186]: dlsym: cuDeviceGet - 0x7fa7ace79f30
4月 23 16:34:37 ollama[2186]: dlsym: cuDeviceGetAttribute - 0x7fa7ace7a030
4月 23 16:34:37 ollama[2186]: dlsym: cuDeviceGetUuid - 0x7fa7ace79f90
4月 23 16:34:37 ollama[2186]: dlsym: cuDeviceGetName - 0x7fa7ace79f70
4月 23 16:34:37 ollama[2186]: dlsym: cuCtxCreate_v3 - 0x7fa7ace7a210
4月 23 16:34:37 ollama[2186]: dlsym: cuMemGetInfo_v2 - 0x7fa7ace84190
4月 23 16:34:37 ollama[2186]: dlsym: cuCtxDestroy - 0x7fa7acede7f0
4月 23 16:34:37 ollama[2186]: calling cuInit
4月 23 16:34:37 ollama[2186]: calling cuDriverGetVersion
4月 23 16:34:37 ollama[2186]: raw version 0x2f08
4月 23 16:34:37 ollama[2186]: CUDA driver version: 12.4
4月 23 16:34:37 ollama[2186]: calling cuDeviceGetCount
4月 23 16:34:37 ollama[2186]: device count 2
4月 23 16:34:37 ollama[2186]: time=2025-04-23T16:34:37.328+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-31e889b0-8e81-ae1f-c7c3-56b903ee95de name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="19.8 GiB" now.total="21.7 GiB" now.free="19.8 GiB" now.used="1.9 GiB"
4月 23 16:34:37 ollama[2186]: time=2025-04-23T16:34:37.415+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-10670930-fdf7-6410-320c-f8808f4aac75 name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="20.2 GiB" now.total="21.7 GiB" now.free="20.2 GiB" now.used="1.5 GiB"
4月 23 16:34:37 ollama[2186]: releasing cuda driver library
4月 23 16:34:37 ollama[2186]: time=2025-04-23T16:34:37.470+08:00 level=DEBUG source=gpu.go:406 msg="updating system memory data" before.total="251.5 GiB" before.free="231.8 GiB" before.free_swap="1.4 GiB" now.total="251.5 GiB" now.free="236.6 GiB" now.free_swap="1.4 GiB"
4月 23 16:34:37 ollama[2186]: initializing /usr/lib/x86_64-linux-gnu/libcuda.so.550.54.15
4月 23 16:34:37 ollama[2186]: dlsym: cuInit - 0x7fa7ace79ef0
4月 23 16:34:37 ollama[2186]: dlsym: cuDriverGetVersion - 0x7fa7ace79f10
4月 23 16:34:37 ollama[2186]: dlsym: cuDeviceGetCount - 0x7fa7ace79f50
4月 23 16:34:37 ollama[2186]: dlsym: cuDeviceGet - 0x7fa7ace79f30
4月 23 16:34:37 ollama[2186]: dlsym: cuDeviceGetAttribute - 0x7fa7ace7a030
4月 23 16:34:37 ollama[2186]: dlsym: cuDeviceGetUuid - 0x7fa7ace79f90
4月 23 16:34:37 ollama[2186]: dlsym: cuDeviceGetName - 0x7fa7ace79f70
4月 23 16:34:37 ollama[2186]: dlsym: cuCtxCreate_v3 - 0x7fa7ace7a210
4月 23 16:34:37 ollama[2186]: dlsym: cuMemGetInfo_v2 - 0x7fa7ace84190
4月 23 16:34:37 ollama[2186]: dlsym: cuCtxDestroy - 0x7fa7acede7f0
4月 23 16:34:37 ollama[2186]: calling cuInit
4月 23 16:34:37 ollama[2186]: calling cuDriverGetVersion
4月 23 16:34:37 ollama[2186]: raw version 0x2f08
4月 23 16:34:37 ollama[2186]: CUDA driver version: 12.4
4月 23 16:34:37 ollama[2186]: calling cuDeviceGetCount
4月 23 16:34:37 ollama[2186]: device count 2
4月 23 16:34:37 ollama[2186]: time=2025-04-23T16:34:37.566+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-31e889b0-8e81-ae1f-c7c3-56b903ee95de name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="19.8 GiB" now.total="21.7 GiB" now.free="19.8 GiB" now.used="1.9 GiB"
4月 23 16:34:37 ollama[2186]: time=2025-04-23T16:34:37.659+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-10670930-fdf7-6410-320c-f8808f4aac75 name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="20.2 GiB" now.total="21.7 GiB" now.free="20.2 GiB" now.used="1.5 GiB"
4月 23 16:34:37 ollama[2186]: releasing cuda driver library
4月 23 16:34:37 ollama[2186]: time=2025-04-23T16:34:37.719+08:00 level=DEBUG source=gpu.go:406 msg="updating system memory data" before.total="251.5 GiB" before.free="236.6 GiB" before.free_swap="1.4 GiB" now.total="251.5 GiB" now.free="241.4 GiB" now.free_swap="1.4 GiB"
4月 23 16:34:37 ollama[2186]: initializing /usr/lib/x86_64-linux-gnu/libcuda.so.550.54.15
4月 23 16:34:37 ollama[2186]: dlsym: cuInit - 0x7fa7ace79ef0
4月 23 16:34:37 ollama[2186]: dlsym: cuDriverGetVersion - 0x7fa7ace79f10
4月 23 16:34:37 ollama[2186]: dlsym: cuDeviceGetCount - 0x7fa7ace79f50
4月 23 16:34:37 ollama[2186]: dlsym: cuDeviceGet - 0x7fa7ace79f30
4月 23 16:34:37 ollama[2186]: dlsym: cuDeviceGetAttribute - 0x7fa7ace7a030
4月 23 16:34:37 ollama[2186]: dlsym: cuDeviceGetUuid - 0x7fa7ace79f90
4月 23 16:34:37 ollama[2186]: dlsym: cuDeviceGetName - 0x7fa7ace79f70
4月 23 16:34:37 ollama[2186]: dlsym: cuCtxCreate_v3 - 0x7fa7ace7a210
4月 23 16:34:37 ollama[2186]: dlsym: cuMemGetInfo_v2 - 0x7fa7ace84190
4月 23 16:34:37 ollama[2186]: dlsym: cuCtxDestroy - 0x7fa7acede7f0
4月 23 16:34:37 ollama[2186]: calling cuInit
4月 23 16:34:37 ollama[2186]: calling cuDriverGetVersion
4月 23 16:34:37 ollama[2186]: raw version 0x2f08
4月 23 16:34:37 ollama[2186]: CUDA driver version: 12.4
4月 23 16:34:37 ollama[2186]: calling cuDeviceGetCount
4月 23 16:34:37 ollama[2186]: device count 2
4月 23 16:34:37 ollama[2186]: time=2025-04-23T16:34:37.811+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-31e889b0-8e81-ae1f-c7c3-56b903ee95de name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="19.8 GiB" now.total="21.7 GiB" now.free="19.8 GiB" now.used="1.9 GiB"
4月 23 16:34:37 ollama[2186]: time=2025-04-23T16:34:37.898+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-10670930-fdf7-6410-320c-f8808f4aac75 name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="20.2 GiB" now.total="21.7 GiB" now.free="20.2 GiB" now.used="1.5 GiB"
4月 23 16:34:37 ollama[2186]: releasing cuda driver library
4月 23 16:34:37 ollama[2186]: time=2025-04-23T16:34:37.970+08:00 level=DEBUG source=gpu.go:406 msg="updating system memory data" before.total="251.5 GiB" before.free="241.4 GiB" before.free_swap="1.4 GiB" now.total="251.5 GiB" now.free="242.9 GiB" now.free_swap="1.4 GiB"
4月 23 16:34:37 ollama[2186]: initializing /usr/lib/x86_64-linux-gnu/libcuda.so.550.54.15
4月 23 16:34:37 ollama[2186]: dlsym: cuInit - 0x7fa7ace79ef0
4月 23 16:34:37 ollama[2186]: dlsym: cuDriverGetVersion - 0x7fa7ace79f10
4月 23 16:34:37 ollama[2186]: dlsym: cuDeviceGetCount - 0x7fa7ace79f50
4月 23 16:34:37 ollama[2186]: dlsym: cuDeviceGet - 0x7fa7ace79f30
4月 23 16:34:37 ollama[2186]: dlsym: cuDeviceGetAttribute - 0x7fa7ace7a030
4月 23 16:34:37 ollama[2186]: dlsym: cuDeviceGetUuid - 0x7fa7ace79f90
4月 23 16:34:37 ollama[2186]: dlsym: cuDeviceGetName - 0x7fa7ace79f70
4月 23 16:34:37 ollama[2186]: dlsym: cuCtxCreate_v3 - 0x7fa7ace7a210
4月 23 16:34:37 ollama[2186]: dlsym: cuMemGetInfo_v2 - 0x7fa7ace84190
4月 23 16:34:37 ollama[2186]: dlsym: cuCtxDestroy - 0x7fa7acede7f0
4月 23 16:34:37 ollama[2186]: calling cuInit
4月 23 16:34:37 ollama[2186]: calling cuDriverGetVersion
4月 23 16:34:37 ollama[2186]: raw version 0x2f08
4月 23 16:34:37 ollama[2186]: CUDA driver version: 12.4
4月 23 16:34:37 ollama[2186]: calling cuDeviceGetCount
4月 23 16:34:37 ollama[2186]: device count 2
4月 23 16:34:38 ollama[2186]: time=2025-04-23T16:34:38.061+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-31e889b0-8e81-ae1f-c7c3-56b903ee95de name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="19.8 GiB" now.total="21.7 GiB" now.free="19.8 GiB" now.used="1.9 GiB"
4月 23 16:34:38 ollama[2186]: time=2025-04-23T16:34:38.152+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-10670930-fdf7-6410-320c-f8808f4aac75 name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="20.2 GiB" now.total="21.7 GiB" now.free="20.2 GiB" now.used="1.5 GiB"
4月 23 16:34:38 ollama[2186]: releasing cuda driver library
4月 23 16:34:38 ollama[2186]: time=2025-04-23T16:34:38.219+08:00 level=DEBUG source=gpu.go:406 msg="updating system memory data" before.total="251.5 GiB" before.free="242.9 GiB" before.free_swap="1.4 GiB" now.total="251.5 GiB" now.free="242.9 GiB" now.free_swap="1.4 GiB"
4月 23 16:34:38 ollama[2186]: initializing /usr/lib/x86_64-linux-gnu/libcuda.so.550.54.15
4月 23 16:34:38 ollama[2186]: dlsym: cuInit - 0x7fa7ace79ef0
4月 23 16:34:38 ollama[2186]: dlsym: cuDriverGetVersion - 0x7fa7ace79f10
4月 23 16:34:38 ollama[2186]: dlsym: cuDeviceGetCount - 0x7fa7ace79f50
4月 23 16:34:38 ollama[2186]: dlsym: cuDeviceGet - 0x7fa7ace79f30
4月 23 16:34:38 ollama[2186]: dlsym: cuDeviceGetAttribute - 0x7fa7ace7a030
4月 23 16:34:38 ollama[2186]: dlsym: cuDeviceGetUuid - 0x7fa7ace79f90
4月 23 16:34:38 ollama[2186]: dlsym: cuDeviceGetName - 0x7fa7ace79f70
4月 23 16:34:38 ollama[2186]: dlsym: cuCtxCreate_v3 - 0x7fa7ace7a210
4月 23 16:34:38 ollama[2186]: dlsym: cuMemGetInfo_v2 - 0x7fa7ace84190
4月 23 16:34:38 ollama[2186]: dlsym: cuCtxDestroy - 0x7fa7acede7f0
4月 23 16:34:38 ollama[2186]: calling cuInit
4月 23 16:34:38 ollama[2186]: calling cuDriverGetVersion
4月 23 16:34:38 ollama[2186]: raw version 0x2f08
4月 23 16:34:38 ollama[2186]: CUDA driver version: 12.4
4月 23 16:34:38 ollama[2186]: calling cuDeviceGetCount
4月 23 16:34:38 ollama[2186]: device count 2
4月 23 16:34:38 ollama[2186]: time=2025-04-23T16:34:38.306+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-31e889b0-8e81-ae1f-c7c3-56b903ee95de name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="19.8 GiB" now.total="21.7 GiB" now.free="19.8 GiB" now.used="1.9 GiB"
4月 23 16:34:38 ollama[2186]: time=2025-04-23T16:34:38.391+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-10670930-fdf7-6410-320c-f8808f4aac75 name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="20.2 GiB" now.total="21.7 GiB" now.free="20.2 GiB" now.used="1.5 GiB"
4月 23 16:34:38 ollama[2186]: releasing cuda driver library
4月 23 16:34:38 ollama[2186]: time=2025-04-23T16:34:38.469+08:00 level=DEBUG source=gpu.go:406 msg="updating system memory data" before.total="251.5 GiB" before.free="242.9 GiB" before.free_swap="1.4 GiB" now.total="251.5 GiB" now.free="242.9 GiB" now.free_swap="1.4 GiB"
4月 23 16:34:38 ollama[2186]: initializing /usr/lib/x86_64-linux-gnu/libcuda.so.550.54.15
4月 23 16:34:38 ollama[2186]: dlsym: cuInit - 0x7fa7ace79ef0
4月 23 16:34:38 ollama[2186]: dlsym: cuDriverGetVersion - 0x7fa7ace79f10
4月 23 16:34:38 ollama[2186]: dlsym: cuDeviceGetCount - 0x7fa7ace79f50
4月 23 16:34:38 ollama[2186]: dlsym: cuDeviceGet - 0x7fa7ace79f30
4月 23 16:34:38 ollama[2186]: dlsym: cuDeviceGetAttribute - 0x7fa7ace7a030
4月 23 16:34:38 ollama[2186]: dlsym: cuDeviceGetUuid - 0x7fa7ace79f90
4月 23 16:34:38 ollama[2186]: dlsym: cuDeviceGetName - 0x7fa7ace79f70
4月 23 16:34:38 ollama[2186]: dlsym: cuCtxCreate_v3 - 0x7fa7ace7a210
4月 23 16:34:38 ollama[2186]: dlsym: cuMemGetInfo_v2 - 0x7fa7ace84190
4月 23 16:34:38 ollama[2186]: dlsym: cuCtxDestroy - 0x7fa7acede7f0
4月 23 16:34:38 ollama[2186]: calling cuInit
4月 23 16:34:38 ollama[2186]: calling cuDriverGetVersion
4月 23 16:34:38 ollama[2186]: raw version 0x2f08
4月 23 16:34:38 ollama[2186]: CUDA driver version: 12.4
4月 23 16:34:38 ollama[2186]: calling cuDeviceGetCount
4月 23 16:34:38 ollama[2186]: device count 2
4月 23 16:34:38 ollama[2186]: time=2025-04-23T16:34:38.562+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-31e889b0-8e81-ae1f-c7c3-56b903ee95de name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="19.8 GiB" now.total="21.7 GiB" now.free="19.8 GiB" now.used="1.9 GiB"
4月 23 16:34:38 ollama[2186]: time=2025-04-23T16:34:38.653+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-10670930-fdf7-6410-320c-f8808f4aac75 name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="20.2 GiB" now.total="21.7 GiB" now.free="20.2 GiB" now.used="1.5 GiB"
4月 23 16:34:38 ollama[2186]: releasing cuda driver library
4月 23 16:34:38 ollama[2186]: time=2025-04-23T16:34:38.687+08:00 level=DEBUG source=server.go:1089 msg="llama server stopped"
4月 23 16:34:38 ollama[2186]: time=2025-04-23T16:34:38.687+08:00 level=DEBUG source=sched.go:380 msg="runner released" modelPath=/usr/share/ollama/.ollama/models/blobs/sha256-6c0b473616167f14af406d122ed3a12f374f6958d2ed2a5af0a889e0cf9f1695
4月 23 16:34:38 ollama[2186]: time=2025-04-23T16:34:38.719+08:00 level=DEBUG source=gpu.go:406 msg="updating system memory data" before.total="251.5 GiB" before.free="242.9 GiB" before.free_swap="1.4 GiB" now.total="251.5 GiB" now.free="243.1 GiB" now.free_swap="1.4 GiB"
4月 23 16:34:38 ollama[2186]: initializing /usr/lib/x86_64-linux-gnu/libcuda.so.550.54.15
4月 23 16:34:38 ollama[2186]: dlsym: cuInit - 0x7fa7ace79ef0
4月 23 16:34:38 ollama[2186]: dlsym: cuDriverGetVersion - 0x7fa7ace79f10
4月 23 16:34:38 ollama[2186]: dlsym: cuDeviceGetCount - 0x7fa7ace79f50
4月 23 16:34:38 ollama[2186]: dlsym: cuDeviceGet - 0x7fa7ace79f30
4月 23 16:34:38 ollama[2186]: dlsym: cuDeviceGetAttribute - 0x7fa7ace7a030
4月 23 16:34:38 ollama[2186]: dlsym: cuDeviceGetUuid - 0x7fa7ace79f90
4月 23 16:34:38 ollama[2186]: dlsym: cuDeviceGetName - 0x7fa7ace79f70
4月 23 16:34:38 ollama[2186]: dlsym: cuCtxCreate_v3 - 0x7fa7ace7a210
4月 23 16:34:38 ollama[2186]: dlsym: cuMemGetInfo_v2 - 0x7fa7ace84190
4月 23 16:34:38 ollama[2186]: dlsym: cuCtxDestroy - 0x7fa7acede7f0
4月 23 16:34:38 ollama[2186]: calling cuInit
4月 23 16:34:38 ollama[2186]: calling cuDriverGetVersion
4月 23 16:34:38 ollama[2186]: raw version 0x2f08
4月 23 16:34:38 ollama[2186]: CUDA driver version: 12.4
4月 23 16:34:38 ollama[2186]: calling cuDeviceGetCount
4月 23 16:34:38 ollama[2186]: device count 2
4月 23 16:34:38 ollama[2186]: time=2025-04-23T16:34:38.814+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-31e889b0-8e81-ae1f-c7c3-56b903ee95de name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="19.8 GiB" now.total="21.7 GiB" now.free="19.8 GiB" now.used="1.9 GiB"
4月 23 16:34:38 ollama[2186]: time=2025-04-23T16:34:38.904+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-10670930-fdf7-6410-320c-f8808f4aac75 name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="20.2 GiB" now.total="21.7 GiB" now.free="20.2 GiB" now.used="1.5 GiB"
4月 23 16:34:38 ollama[2186]: releasing cuda driver library
4月 23 16:34:38 ollama[2186]: time=2025-04-23T16:34:38.970+08:00 level=DEBUG source=gpu.go:406 msg="updating system memory data" before.total="251.5 GiB" before.free="243.1 GiB" before.free_swap="1.4 GiB" now.total="251.5 GiB" now.free="243.1 GiB" now.free_swap="1.4 GiB"
4月 23 16:34:38 ollama[2186]: initializing /usr/lib/x86_64-linux-gnu/libcuda.so.550.54.15
4月 23 16:34:38 ollama[2186]: dlsym: cuInit - 0x7fa7ace79ef0
4月 23 16:34:38 ollama[2186]: dlsym: cuDriverGetVersion - 0x7fa7ace79f10
4月 23 16:34:38 ollama[2186]: dlsym: cuDeviceGetCount - 0x7fa7ace79f50
4月 23 16:34:38 ollama[2186]: dlsym: cuDeviceGet - 0x7fa7ace79f30
4月 23 16:34:38 ollama[2186]: dlsym: cuDeviceGetAttribute - 0x7fa7ace7a030
4月 23 16:34:38 ollama[2186]: dlsym: cuDeviceGetUuid - 0x7fa7ace79f90
4月 23 16:34:38 ollama[2186]: dlsym: cuDeviceGetName - 0x7fa7ace79f70
4月 23 16:34:38 ollama[2186]: dlsym:
cuCtxCreate_v3 - 0x7fa7ace7a210 4月 23 16:34:38 ollama[2186]: dlsym: cuMemGetInfo_v2 - 0x7fa7ace84190 4月 23 16:34:38 ollama[2186]: dlsym: cuCtxDestroy - 0x7fa7acede7f0 4月 23 16:34:38 ollama[2186]: calling cuInit 4月 23 16:34:38 ollama[2186]: calling cuDriverGetVersion 4月 23 16:34:38 ollama[2186]: raw version 0x2f08 4月 23 16:34:38 ollama[2186]: CUDA driver version: 12.4 4月 23 16:34:38 ollama[2186]: calling cuDeviceGetCount 4月 23 16:34:38 ollama[2186]: device count 2 4月 23 16:34:39 ollama[2186]: time=2025-04-23T16:34:39.059+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-31e889b0-8e81-ae1f-c7c3-56b903ee95de name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="19.8 GiB" now.total="21.7 GiB" now.free="19.8 GiB" now.used="1.9 GiB" 4月 23 16:34:39 ollama[2186]: time=2025-04-23T16:34:39.143+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-10670930-fdf7-6410-320c-f8808f4aac75 name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="20.2 GiB" now.total="21.7 GiB" now.free="20.2 GiB" now.used="1.5 GiB" 4月 23 16:34:39 ollama[2186]: releasing cuda driver library 4月 23 16:34:39 ollama[2186]: time=2025-04-23T16:34:39.219+08:00 level=DEBUG source=gpu.go:406 msg="updating system memory data" before.total="251.5 GiB" before.free="243.1 GiB" before.free_swap="1.4 GiB" now.total="251.5 GiB" now.free="243.1 GiB" now.free_swap="1.4 GiB" 4月 23 16:34:39 ollama[2186]: initializing /usr/lib/x86_64-linux-gnu/libcuda.so.550.54.15 4月 23 16:34:39 ollama[2186]: dlsym: cuInit - 0x7fa7ace79ef0 4月 23 16:34:39 ollama[2186]: dlsym: cuDriverGetVersion - 0x7fa7ace79f10 4月 23 16:34:39 ollama[2186]: dlsym: cuDeviceGetCount - 0x7fa7ace79f50 4月 23 16:34:39 ollama[2186]: dlsym: cuDeviceGet - 0x7fa7ace79f30 4月 23 16:34:39 ollama[2186]: dlsym: cuDeviceGetAttribute - 0x7fa7ace7a030 4月 23 16:34:39 ollama[2186]: dlsym: cuDeviceGetUuid - 0x7fa7ace79f90 4月 23 16:34:39 ollama[2186]: dlsym: cuDeviceGetName - 0x7fa7ace79f70 4月 23 16:34:39 ollama[2186]: dlsym: cuCtxCreate_v3 - 0x7fa7ace7a210 4月 23 16:34:39 ollama[2186]: dlsym: cuMemGetInfo_v2 - 0x7fa7ace84190 4月 23 16:34:39 ollama[2186]: dlsym: cuCtxDestroy - 0x7fa7acede7f0 4月 23 16:34:39 ollama[2186]: calling cuInit 4月 23 16:34:39 ollama[2186]: calling cuDriverGetVersion 4月 23 16:34:39 ollama[2186]: raw version 0x2f08 4月 23 16:34:39 ollama[2186]: CUDA driver version: 12.4 4月 23 16:34:39 ollama[2186]: calling cuDeviceGetCount 4月 23 16:34:39 ollama[2186]: device count 2 4月 23 16:34:39 ollama[2186]: time=2025-04-23T16:34:39.316+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-31e889b0-8e81-ae1f-c7c3-56b903ee95de name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="19.8 GiB" now.total="21.7 GiB" now.free="19.8 GiB" now.used="1.9 GiB" 4月 23 16:34:39 ollama[2186]: time=2025-04-23T16:34:39.408+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-10670930-fdf7-6410-320c-f8808f4aac75 name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="20.2 GiB" now.total="21.7 GiB" now.free="20.2 GiB" now.used="1.5 GiB" 4月 23 16:34:39 ollama[2186]: releasing cuda driver library 4月 23 16:34:39 ollama[2186]: time=2025-04-23T16:34:39.469+08:00 level=DEBUG source=gpu.go:406 msg="updating system memory data" before.total="251.5 GiB" before.free="243.1 GiB" before.free_swap="1.4 GiB" now.total="251.5 GiB" now.free="243.1 GiB" now.free_swap="1.4 GiB" 4月 23 16:34:39 ollama[2186]: initializing 
/usr/lib/x86_64-linux-gnu/libcuda.so.550.54.15 4月 23 16:34:39 ollama[2186]: dlsym: cuInit - 0x7fa7ace79ef0 4月 23 16:34:39 ollama[2186]: dlsym: cuDriverGetVersion - 0x7fa7ace79f10 4月 23 16:34:39 ollama[2186]: dlsym: cuDeviceGetCount - 0x7fa7ace79f50 4月 23 16:34:39 ollama[2186]: dlsym: cuDeviceGet - 0x7fa7ace79f30 4月 23 16:34:39 ollama[2186]: dlsym: cuDeviceGetAttribute - 0x7fa7ace7a030 4月 23 16:34:39 ollama[2186]: dlsym: cuDeviceGetUuid - 0x7fa7ace79f90 4月 23 16:34:39 ollama[2186]: dlsym: cuDeviceGetName - 0x7fa7ace79f70 4月 23 16:34:39 ollama[2186]: dlsym: cuCtxCreate_v3 - 0x7fa7ace7a210 4月 23 16:34:39 ollama[2186]: dlsym: cuMemGetInfo_v2 - 0x7fa7ace84190 4月 23 16:34:39 ollama[2186]: dlsym: cuCtxDestroy - 0x7fa7acede7f0 4月 23 16:34:39 ollama[2186]: calling cuInit 4月 23 16:34:39 ollama[2186]: calling cuDriverGetVersion 4月 23 16:34:39 ollama[2186]: raw version 0x2f08 4月 23 16:34:39 ollama[2186]: CUDA driver version: 12.4 4月 23 16:34:39 ollama[2186]: calling cuDeviceGetCount 4月 23 16:34:39 ollama[2186]: device count 2 4月 23 16:34:39 ollama[2186]: time=2025-04-23T16:34:39.560+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-31e889b0-8e81-ae1f-c7c3-56b903ee95de name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="19.8 GiB" now.total="21.7 GiB" now.free="19.8 GiB" now.used="1.9 GiB" 4月 23 16:34:39 ollama[2186]: time=2025-04-23T16:34:39.651+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-10670930-fdf7-6410-320c-f8808f4aac75 name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="20.2 GiB" now.total="21.7 GiB" now.free="20.2 GiB" now.used="1.5 GiB" 4月 23 16:34:39 ollama[2186]: releasing cuda driver library 4月 23 16:34:39 ollama[2186]: time=2025-04-23T16:34:39.719+08:00 level=DEBUG source=gpu.go:406 msg="updating system memory data" before.total="251.5 GiB" before.free="243.1 GiB" before.free_swap="1.4 GiB" now.total="251.5 GiB" now.free="243.1 GiB" now.free_swap="1.4 GiB" 4月 23 16:34:39 ollama[2186]: initializing /usr/lib/x86_64-linux-gnu/libcuda.so.550.54.15 4月 23 16:34:39 ollama[2186]: dlsym: cuInit - 0x7fa7ace79ef0 4月 23 16:34:39 ollama[2186]: dlsym: cuDriverGetVersion - 0x7fa7ace79f10 4月 23 16:34:39 ollama[2186]: dlsym: cuDeviceGetCount - 0x7fa7ace79f50 4月 23 16:34:39 ollama[2186]: dlsym: cuDeviceGet - 0x7fa7ace79f30 4月 23 16:34:39 ollama[2186]: dlsym: cuDeviceGetAttribute - 0x7fa7ace7a030 4月 23 16:34:39 ollama[2186]: dlsym: cuDeviceGetUuid - 0x7fa7ace79f90 4月 23 16:34:39 ollama[2186]: dlsym: cuDeviceGetName - 0x7fa7ace79f70 4月 23 16:34:39 ollama[2186]: dlsym: cuCtxCreate_v3 - 0x7fa7ace7a210 4月 23 16:34:39 ollama[2186]: dlsym: cuMemGetInfo_v2 - 0x7fa7ace84190 4月 23 16:34:39 ollama[2186]: dlsym: cuCtxDestroy - 0x7fa7acede7f0 4月 23 16:34:39 ollama[2186]: calling cuInit 4月 23 16:34:39 ollama[2186]: calling cuDriverGetVersion 4月 23 16:34:39 ollama[2186]: raw version 0x2f08 4月 23 16:34:39 ollama[2186]: CUDA driver version: 12.4 4月 23 16:34:39 ollama[2186]: calling cuDeviceGetCount 4月 23 16:34:39 ollama[2186]: device count 2 4月 23 16:34:39 ollama[2186]: time=2025-04-23T16:34:39.807+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-31e889b0-8e81-ae1f-c7c3-56b903ee95de name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="19.8 GiB" now.total="21.7 GiB" now.free="19.8 GiB" now.used="1.9 GiB" 4月 23 16:34:39 ollama[2186]: time=2025-04-23T16:34:39.891+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" 
gpu=GPU-10670930-fdf7-6410-320c-f8808f4aac75 name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="20.2 GiB" now.total="21.7 GiB" now.free="20.2 GiB" now.used="1.5 GiB" 4月 23 16:34:39 ollama[2186]: releasing cuda driver library 4月 23 16:34:39 ollama[2186]: time=2025-04-23T16:34:39.970+08:00 level=DEBUG source=gpu.go:406 msg="updating system memory data" before.total="251.5 GiB" before.free="243.1 GiB" before.free_swap="1.4 GiB" now.total="251.5 GiB" now.free="243.1 GiB" now.free_swap="1.4 GiB"

@shaofanqi commented on GitHub (Apr 23, 2025):

> ollama ps shows what ollama expects the split to be, based on being able to load a GPU backend. But if the runner can't find a backend, it will fall back to CPU only, and the output from ollama ps will be incorrect. You need to check the logs to determine whether the runner did in fact load 43% of the model in VRAM.

I may have found the reason for the missing runners, see below.

> Your installation is incomplete or otherwise incorrect. The output should be something like:
>
> /usr/local/lib/ollama
> /usr/local/lib/ollama/cuda_v11
> /usr/local/lib/ollama/cuda_v11/libcudart.so.11.0
> /usr/local/lib/ollama/cuda_v11/libcudart.so.11.3.109
> /usr/local/lib/ollama/cuda_v11/libcublas.so.11
> /usr/local/lib/ollama/cuda_v11/libcublasLt.so.11
> /usr/local/lib/ollama/cuda_v11/libggml-cuda.so
> /usr/local/lib/ollama/cuda_v11/libcublasLt.so.11.5.1.109
> /usr/local/lib/ollama/cuda_v11/libcublas.so.11.5.1.109
> /usr/local/lib/ollama/libggml-base.so
> /usr/local/lib/ollama/libggml-cpu-sandybridge.so
> /usr/local/lib/ollama/libggml-cpu-alderlake.so
> /usr/local/lib/ollama/libggml-cpu-haswell.so
> /usr/local/lib/ollama/libggml-cpu-skylakex.so
> /usr/local/lib/ollama/cuda_v12
> /usr/local/lib/ollama/cuda_v12/libcublasLt.so.12
> /usr/local/lib/ollama/cuda_v12/libcudart.so.12
> /usr/local/lib/ollama/cuda_v12/libggml-cuda.so
> /usr/local/lib/ollama/cuda_v12/libcublas.so.12
> /usr/local/lib/ollama/cuda_v12/libcublasLt.so.12.8.4.1
> /usr/local/lib/ollama/cuda_v12/libcudart.so.12.8.90
> /usr/local/lib/ollama/cuda_v12/libcublas.so.12.8.4.1
> /usr/local/lib/ollama/libggml-cpu-icelake.so
>
> That is, the directory should contain the CPU and GPU backends for ollama. How did you install ollama?
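
A quick way to confirm whether those backends are actually on disk (a minimal sketch, assuming the default Linux install prefix /usr/local):

# List everything the installer put in place
ls -R /usr/local/lib/ollama
# The CUDA runner is present only if this prints a path
find /usr/local/lib/ollama -name 'libggml-cuda.so'

If the find command prints nothing, the runner has no GPU backend to load and will silently fall back to CPU.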

I may have found the reason the backends were deleted: I failed to run the update script (https://ollama.com/install.sh). I think this part of the script deleted the old runners, and the new ones were never installed:

if [ -d "$OLLAMA_INSTALL_DIR/lib/ollama" ] ; then
    status "Cleaning up old version at $OLLAMA_INSTALL_DIR/lib/ollama"
    $SUDO rm -rf "$OLLAMA_INSTALL_DIR/lib/ollama"
fi


@rick-github commented on GitHub (Apr 23, 2025):

The ollama server calculates that it can offload 11 of 65 layers to the GPUs. This is why ollama ps shows GPU usage:

Apr 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.393+08:00 level=INFO source=memory.go:356 msg="offload to cuda" layers.requested=-1 layers.model=65 layers.offload=11 layers.split=5,6 memory.available="[19.8 GiB 20.2 GiB]" memory.gpu_overhead="0 B" memory.required.full="92.1 GiB" memory.required.partial="39.2 GiB" memory.required.kv="32.0 GiB" memory.required.allocations="[19.1 GiB 20.1 GiB]" memory.weights.total="62.9 GiB" memory.weights.repeating="62.1 GiB" memory.weights.nonrepeating="788.9 MiB" memory.graph.full="12.8 GiB" memory.graph.partial="12.8 GiB"
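
Reading the key fields in that line (back-of-envelope arithmetic from the values shown, not part of the original comment):

memory.required.full        = 92.1 GiB              (whole model in VRAM)
memory.available            = 19.8 + 20.2 = 40.0 GiB  (across the two 2080 Tis)
40.0 GiB < 92.1 GiB         -> full offload impossible, so a partial plan:
layers.offload              = 11 of 65, split 5,6
memory.required.allocations = [19.1 GiB, 20.1 GiB]  (fits within the free VRAM)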

But the runner is unable to find a GPU backend and only runs on CPU:

Apr 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.416+08:00 level=INFO source=runner.go:937 msg=system info="CPU : LLAMAFILE = 1 | AARCH64_REPACK = 1 | CPU : LLAMAFILE = 1 | AARCH64_REPACK = 1 | cgo(gcc)" threads=16
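
To compare the scheduler's plan against what the runner actually loaded, grepping the service logs for the two lines above works (a sketch, assuming a systemd install whose logs go to journald):

# The scheduler's plan: layers it intends to offload to CUDA
journalctl -u ollama --no-pager | grep 'offload to cuda'
# The runner's reality: a system info line listing only CPU features
# means no GPU backend was loaded
journalctl -u ollama --no-pager | grep 'msg=system'

A healthy GPU load should also mention CUDA in the runner's system info line, not just CPU features.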

@HENScience commented on GitHub (Apr 29, 2025):

Check that the installed Ollama build matches your system architecture (arm64 vs amd64).
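
A quick way to check the match (a sketch; exact output formats vary by distro):

# Hardware architecture: x86_64 (amd64) or aarch64 (arm64)
uname -m
# The installed binary should report the same architecture
file "$(command -v ollama)"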

Reference: github-starred/ollama#32573