[GH-ISSUE #9362] ollama do not use only gpus #52624

Closed
opened 2026-04-28 23:54:14 -05:00 by GiteaMirror · 8 comments
Owner

Originally created by @moluzhui on GitHub (Feb 26, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/9362

What is the issue?

I am running ollama on a Tesla P100 server with CUDA 10.1 installed. The OS is CentOS 7.

My command:

ollama run deepseek-r1:1.5b "some question"

The command `ollama ps` reports the model as 100% GPU, but `nvidia-smi` shows 0% GPU utilization and no running processes.

`ollama ps` output:

# ollama ps
NAME                ID              SIZE      PROCESSOR    UNTIL
deepseek-r1:1.5b    a42b25d8c10a    2.0 GB    100% GPU     4 minutes from now

GPU usage as reported by `nvidia-smi`:

# nvidia-smi
Wed Feb 26 19:48:45 2025
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.226.00   Driver Version: 418.226.00   CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  On   | 00000000:00:09.0 Off |                  Off |
| N/A   34C    P0    26W / 250W |     10MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
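
Per-process GPU memory can also be queried in a machine-readable form, which makes the "No running processes found" discrepancy easier to script around. Below is a minimal sketch that parses the CSV form of that query; the sample line is illustrative (the PID and memory figure are assumptions, not taken from this server), mimicking the output of `nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv,noheader`.

```python
# Parse the CSV output of nvidia-smi's per-process compute-apps query.
# The sample line below is illustrative; on the affected server you
# would capture the real output of:
#   nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv,noheader
sample = "21816, /usr/local/bin/ollama, 1900 MiB"

def parse_compute_apps(csv_text):
    """Return a list of (pid, process_name, used_memory) tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        pid, name, mem = [field.strip() for field in line.split(",")]
        rows.append((int(pid), name, mem))
    return rows

print(parse_compute_apps(sample))
```

If ollama were actually running on the GPU, the query would list its runner process here; an empty result matches the `nvidia-smi` table above.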

systemd unit file:

[Unit]
Description=Ollama Service
After=network-online.target

[Service]
ExecStart=/usr/local/bin/ollama serve
User=root
Group=root
Restart=always
RestartSec=3
Environment="PATH=$PATH"
Environment="OLLAMA_MODELS=/data/ollama/models"
Environment="OLLAMA_HOST=x.x.x.x:xxxxx"
Environment="OLLAMA_SCHED_SPREAD=1"
Environment="CUDA_VISIBLE_DEVICES=0"
Environment="OLLAMA_LLM_LIBRARY=cuda_v10"

[Install]
WantedBy=default.target
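
One detail worth checking in this unit: systemd performs no shell expansion inside `Environment=` lines, so `Environment="PATH=$PATH"` sets the service's PATH to the literal string `$PATH`. A minimal sketch, using a trimmed copy of the unit text above, that extracts the `Environment=` assignments and shows the value systemd would actually use:

```python
# systemd does not expand shell variables in Environment= lines, so
# Environment="PATH=$PATH" gives the service a literal "$PATH" value.
# The unit text below is a trimmed copy of the file from this issue.
import re

unit = '''
[Service]
ExecStart=/usr/local/bin/ollama serve
Environment="PATH=$PATH"
Environment="OLLAMA_LLM_LIBRARY=cuda_v10"
'''

def environment_vars(unit_text):
    """Return {name: value} for each Environment="NAME=VALUE" line."""
    env = {}
    for m in re.finditer(r'^Environment="([^=]+)=([^"]*)"', unit_text, re.M):
        env[m.group(1)] = m.group(2)
    return env

env = environment_vars(unit)
print(env["PATH"])  # literal "$PATH" -- systemd does not expand it
```

This particular line probably isn't the GPU problem, but a broken PATH can keep the service from finding helper binaries, so it is worth fixing while editing the unit.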

nvcc

# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Fri_Feb__8_19:08:17_PST_2019
Cuda compilation tools, release 10.1, V10.1.105

Relevant log output

Feb 26 19:46:29 iZbp1d0j83sijy1mh3b3s9Z systemd[1]: Started Ollama Service.
Feb 26 19:46:29 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: 2025/02/26 19:46:29 routes.go:1205: INFO server config env="map[CUDA_VISIBLE_DEVICES:0 GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://1.2.3.4:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY:cuda_v10 OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/data/ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://*] OLLAMA_SCHED_SPREAD:true ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
Feb 26 19:46:29 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: time=2025-02-26T19:46:29.571+08:00 level=INFO source=images.go:432 msg="total blobs: 11"
Feb 26 19:46:29 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: time=2025-02-26T19:46:29.571+08:00 level=INFO source=images.go:439 msg="total unused blobs removed: 0"
Feb 26 19:46:29 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: time=2025-02-26T19:46:29.571+08:00 level=INFO source=routes.go:1256 msg="Listening on 1.2.3.4:11434 (version 0.5.12)"
Feb 26 19:46:29 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: time=2025-02-26T19:46:29.571+08:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
Feb 26 19:46:29 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: time=2025-02-26T19:46:29.575+08:00 level=INFO source=gpu.go:612 msg="Unable to load cudart library /usr/lib64/libcuda.so.418.226.00: symbol lookup for cuCtxCreate_v3 failed: /usr/lib64/libcuda.so.418.226.00: undefined symbol: cuCtxCreate_v3"
Feb 26 19:46:29 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: time=2025-02-26T19:46:29.754+08:00 level=INFO source=types.go:130 msg="inference compute" id=GPU-6bccd7ac-afa6-0d26-8870-b01094a16c7d library=cuda variant=v11 compute=6.0 driver=0.0 name="" total="15.9 GiB" available="15.6 GiB"
Feb 26 19:47:22 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: [GIN] 2025/02/26 - 19:47:22 | 200 |     100.669µs |    1.2.3.4 | HEAD     "/"
Feb 26 19:47:22 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: [GIN] 2025/02/26 - 19:47:22 | 200 |    1.048744ms |    1.2.3.4 | GET      "/api/tags"
Feb 26 19:47:41 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: [GIN] 2025/02/26 - 19:47:41 | 200 |      37.491µs |    1.2.3.4 | HEAD     "/"
Feb 26 19:47:41 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: [GIN] 2025/02/26 - 19:47:41 | 200 |   23.315974ms |    1.2.3.4 | POST     "/api/show"
Feb 26 19:47:41 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: time=2025-02-26T19:47:41.481+08:00 level=WARN source=ggml.go:132 msg="key not found" key=qwen2.attention.key_length default=128
Feb 26 19:47:41 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: time=2025-02-26T19:47:41.481+08:00 level=WARN source=ggml.go:132 msg="key not found" key=qwen2.attention.value_length default=128
Feb 26 19:47:41 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: time=2025-02-26T19:47:41.481+08:00 level=INFO source=sched.go:731 msg="new model will fit in available VRAM, loading" model=/data/ollama/models/blobs/sha256-aabd4debf0c8f08881923f2c25fc0fdeed24435271c2b3e92c4af36704040dbc library=cuda parallel=4 required="1.9 GiB"
Feb 26 19:47:41 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: time=2025-02-26T19:47:41.636+08:00 level=INFO source=server.go:97 msg="system memory" total="58.8 GiB" free="57.7 GiB" free_swap="0 B"
Feb 26 19:47:41 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: time=2025-02-26T19:47:41.636+08:00 level=WARN source=ggml.go:132 msg="key not found" key=qwen2.attention.key_length default=128
Feb 26 19:47:41 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: time=2025-02-26T19:47:41.636+08:00 level=WARN source=ggml.go:132 msg="key not found" key=qwen2.attention.value_length default=128
Feb 26 19:47:41 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: time=2025-02-26T19:47:41.637+08:00 level=INFO source=server.go:130 msg=offload library=cuda layers.requested=-1 layers.model=29 layers.offload=29 layers.split="" memory.available="[15.6 GiB]" memory.gpu_overhead="0 B" memory.required.full="1.9 GiB" memory.required.partial="1.9 GiB" memory.required.kv="224.0 MiB" memory.required.allocations="[1.9 GiB]" memory.weights.total="976.1 MiB" memory.weights.repeating="793.5 MiB" memory.weights.nonrepeating="182.6 MiB" memory.graph.full="299.8 MiB" memory.graph.partial="482.3 MiB"
Feb 26 19:47:41 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: time=2025-02-26T19:47:41.638+08:00 level=INFO source=server.go:380 msg="starting llama server" cmd="/usr/local/bin/ollama runner --model /data/ollama/models/blobs/sha256-aabd4debf0c8f08881923f2c25fc0fdeed24435271c2b3e92c4af36704040dbc --ctx-size 8192 --batch-size 512 --n-gpu-layers 29 --threads 4 --parallel 4 --port 11463"
Feb 26 19:47:41 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: time=2025-02-26T19:47:41.638+08:00 level=INFO source=sched.go:450 msg="loaded runners" count=1
Feb 26 19:47:41 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: time=2025-02-26T19:47:41.638+08:00 level=INFO source=server.go:557 msg="waiting for llama runner to start responding"
Feb 26 19:47:41 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: time=2025-02-26T19:47:41.638+08:00 level=INFO source=server.go:591 msg="waiting for server to become available" status="llm server error"
Feb 26 19:47:41 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: time=2025-02-26T19:47:41.657+08:00 level=INFO source=runner.go:932 msg="starting go runner"
Feb 26 19:47:41 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: time=2025-02-26T19:47:41.658+08:00 level=INFO source=runner.go:935 msg=system info="CPU : LLAMAFILE = 1 | CPU : LLAMAFILE = 1 | cgo(gcc)" threads=4
Feb 26 19:47:41 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: time=2025-02-26T19:47:41.658+08:00 level=INFO source=runner.go:993 msg="Server listening on 127.0.0.1:11463"
Feb 26 19:47:41 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llama_model_loader: loaded meta data with 26 key-value pairs and 339 tensors from /data/ollama/models/blobs/sha256-aabd4debf0c8f08881923f2c25fc0fdeed24435271c2b3e92c4af36704040dbc (version GGUF V3 (latest))
Feb 26 19:47:41 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
Feb 26 19:47:41 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llama_model_loader: - kv   0:                       general.architecture str              = qwen2
Feb 26 19:47:41 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llama_model_loader: - kv   1:                               general.type str              = model
Feb 26 19:47:41 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llama_model_loader: - kv   2:                               general.name str              = DeepSeek R1 Distill Qwen 1.5B
Feb 26 19:47:41 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llama_model_loader: - kv   3:                           general.basename str              = DeepSeek-R1-Distill-Qwen
Feb 26 19:47:41 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llama_model_loader: - kv   4:                         general.size_label str              = 1.5B
Feb 26 19:47:41 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llama_model_loader: - kv   5:                          qwen2.block_count u32              = 28
Feb 26 19:47:41 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llama_model_loader: - kv   6:                       qwen2.context_length u32              = 131072
Feb 26 19:47:41 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llama_model_loader: - kv   7:                     qwen2.embedding_length u32              = 1536
Feb 26 19:47:41 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llama_model_loader: - kv   8:                  qwen2.feed_forward_length u32              = 8960
Feb 26 19:47:41 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llama_model_loader: - kv   9:                 qwen2.attention.head_count u32              = 12
Feb 26 19:47:41 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llama_model_loader: - kv  10:              qwen2.attention.head_count_kv u32              = 2
Feb 26 19:47:41 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llama_model_loader: - kv  11:                       qwen2.rope.freq_base f32              = 10000.000000
Feb 26 19:47:41 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llama_model_loader: - kv  12:     qwen2.attention.layer_norm_rms_epsilon f32              = 0.000001
Feb 26 19:47:41 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llama_model_loader: - kv  13:                          general.file_type u32              = 15
Feb 26 19:47:41 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llama_model_loader: - kv  14:                       tokenizer.ggml.model str              = gpt2
Feb 26 19:47:41 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llama_model_loader: - kv  15:                         tokenizer.ggml.pre str              = qwen2
Feb 26 19:47:41 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llama_model_loader: - kv  16:                      tokenizer.ggml.tokens arr[str,151936]  = ["!", "\"", "#", "$", "%", "&", "'", ...
Feb 26 19:47:41 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llama_model_loader: - kv  17:                  tokenizer.ggml.token_type arr[i32,151936]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
Feb 26 19:47:41 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: time=2025-02-26T19:47:41.890+08:00 level=INFO source=server.go:591 msg="waiting for server to become available" status="llm server loading model"
Feb 26 19:47:41 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llama_model_loader: - kv  18:                      tokenizer.ggml.merges arr[str,151387]  = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
Feb 26 19:47:41 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llama_model_loader: - kv  19:                tokenizer.ggml.bos_token_id u32              = 151646
Feb 26 19:47:41 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llama_model_loader: - kv  20:                tokenizer.ggml.eos_token_id u32              = 151643
Feb 26 19:47:41 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llama_model_loader: - kv  21:            tokenizer.ggml.padding_token_id u32              = 151643
Feb 26 19:47:41 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llama_model_loader: - kv  22:               tokenizer.ggml.add_bos_token bool             = true
Feb 26 19:47:41 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llama_model_loader: - kv  23:               tokenizer.ggml.add_eos_token bool             = false
Feb 26 19:47:41 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llama_model_loader: - kv  24:                    tokenizer.chat_template str              = {% if not add_generation_prompt is de...
Feb 26 19:47:41 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llama_model_loader: - kv  25:               general.quantization_version u32              = 2
Feb 26 19:47:41 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llama_model_loader: - type  f32:  141 tensors
Feb 26 19:47:41 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llama_model_loader: - type q4_K:  169 tensors
Feb 26 19:47:41 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llama_model_loader: - type q6_K:   29 tensors
Feb 26 19:47:42 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_vocab: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
Feb 26 19:47:42 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_vocab: special tokens cache size = 22
Feb 26 19:47:42 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_vocab: token to piece cache size = 0.9310 MB
Feb 26 19:47:42 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: format           = GGUF V3 (latest)
Feb 26 19:47:42 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: arch             = qwen2
Feb 26 19:47:42 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: vocab type       = BPE
Feb 26 19:47:42 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: n_vocab          = 151936
Feb 26 19:47:42 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: n_merges         = 151387
Feb 26 19:47:42 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: vocab_only       = 0
Feb 26 19:47:42 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: n_ctx_train      = 131072
Feb 26 19:47:42 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: n_embd           = 1536
Feb 26 19:47:42 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: n_layer          = 28
Feb 26 19:47:42 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: n_head           = 12
Feb 26 19:47:42 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: n_head_kv        = 2
Feb 26 19:47:42 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: n_rot            = 128
Feb 26 19:47:42 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: n_swa            = 0
Feb 26 19:47:42 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: n_embd_head_k    = 128
Feb 26 19:47:42 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: n_embd_head_v    = 128
Feb 26 19:47:42 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: n_gqa            = 6
Feb 26 19:47:42 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: n_embd_k_gqa     = 256
Feb 26 19:47:42 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: n_embd_v_gqa     = 256
Feb 26 19:47:42 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: f_norm_eps       = 0.0e+00
Feb 26 19:47:42 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: f_norm_rms_eps   = 1.0e-06
Feb 26 19:47:42 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: f_clamp_kqv      = 0.0e+00
Feb 26 19:47:42 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: f_max_alibi_bias = 0.0e+00
Feb 26 19:47:42 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: f_logit_scale    = 0.0e+00
Feb 26 19:47:42 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: n_ff             = 8960
Feb 26 19:47:42 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: n_expert         = 0
Feb 26 19:47:42 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: n_expert_used    = 0
Feb 26 19:47:42 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: causal attn      = 1
Feb 26 19:47:42 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: pooling type     = 0
Feb 26 19:47:42 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: rope type        = 2
Feb 26 19:47:42 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: rope scaling     = linear
Feb 26 19:47:42 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: freq_base_train  = 10000.0
Feb 26 19:47:42 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: freq_scale_train = 1
Feb 26 19:47:42 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: n_ctx_orig_yarn  = 131072
Feb 26 19:47:42 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: rope_finetuned   = unknown
Feb 26 19:47:42 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: ssm_d_conv       = 0
Feb 26 19:47:42 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: ssm_d_inner      = 0
Feb 26 19:47:42 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: ssm_d_state      = 0
Feb 26 19:47:42 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: ssm_dt_rank      = 0
Feb 26 19:47:42 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: ssm_dt_b_c_rms   = 0
Feb 26 19:47:42 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: model type       = 1.5B
Feb 26 19:47:42 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: model ftype      = Q4_K - Medium
Feb 26 19:47:42 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: model params     = 1.78 B
Feb 26 19:47:42 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: model size       = 1.04 GiB (5.00 BPW)
Feb 26 19:47:42 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: general.name     = DeepSeek R1 Distill Qwen 1.5B
Feb 26 19:47:42 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: BOS token        = 151646 '<|begin▁of▁sentence|>'
Feb 26 19:47:42 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: EOS token        = 151643 '<|end▁of▁sentence|>'
Feb 26 19:47:42 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: EOT token        = 151643 '<|end▁of▁sentence|>'
Feb 26 19:47:42 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: PAD token        = 151643 '<|end▁of▁sentence|>'
Feb 26 19:47:42 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: LF token         = 148848 'ÄĬ'
Feb 26 19:47:42 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: FIM PRE token    = 151659 '<|fim_prefix|>'
Feb 26 19:47:42 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: FIM SUF token    = 151661 '<|fim_suffix|>'
Feb 26 19:47:42 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: FIM MID token    = 151660 '<|fim_middle|>'
Feb 26 19:47:42 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: FIM PAD token    = 151662 '<|fim_pad|>'
Feb 26 19:47:42 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: FIM REP token    = 151663 '<|repo_name|>'
Feb 26 19:47:42 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: FIM SEP token    = 151664 '<|file_sep|>'
Feb 26 19:47:42 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: EOG token        = 151643 '<|end▁of▁sentence|>'
Feb 26 19:47:42 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: EOG token        = 151662 '<|fim_pad|>'
Feb 26 19:47:42 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: EOG token        = 151663 '<|repo_name|>'
Feb 26 19:47:42 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: EOG token        = 151664 '<|file_sep|>'
Feb 26 19:47:42 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: max token length = 256
Feb 26 19:47:42 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_tensors:   CPU_Mapped model buffer size =  1059.89 MiB
Feb 26 19:47:42 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llama_new_context_with_model: n_seq_max     = 4
Feb 26 19:47:42 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llama_new_context_with_model: n_ctx         = 8192
Feb 26 19:47:42 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llama_new_context_with_model: n_ctx_per_seq = 2048
Feb 26 19:47:42 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llama_new_context_with_model: n_batch       = 2048
Feb 26 19:47:42 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llama_new_context_with_model: n_ubatch      = 512
Feb 26 19:47:42 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llama_new_context_with_model: flash_attn    = 0
Feb 26 19:47:42 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llama_new_context_with_model: freq_base     = 10000.0
Feb 26 19:47:42 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llama_new_context_with_model: freq_scale    = 1
Feb 26 19:47:42 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llama_new_context_with_model: n_ctx_per_seq (2048) < n_ctx_train (131072) -- the full capacity of the model will not be utilized
Feb 26 19:47:42 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llama_kv_cache_init: kv_size = 8192, offload = 1, type_k = 'f16', type_v = 'f16', n_layer = 28, can_shift = 1
Feb 26 19:47:42 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llama_kv_cache_init:        CPU KV buffer size =   224.00 MiB
Feb 26 19:47:42 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llama_new_context_with_model: KV self size  =  224.00 MiB, K (f16):  112.00 MiB, V (f16):  112.00 MiB
Feb 26 19:47:42 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llama_new_context_with_model:        CPU  output buffer size =     2.34 MiB
Feb 26 19:47:42 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llama_new_context_with_model:        CPU compute buffer size =   302.75 MiB
Feb 26 19:47:42 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llama_new_context_with_model: graph nodes  = 986
Feb 26 19:47:42 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llama_new_context_with_model: graph splits = 1
Feb 26 19:47:42 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: time=2025-02-26T19:47:42.894+08:00 level=INFO source=server.go:596 msg="llama runner started in 1.26 seconds"
Feb 26 19:48:18 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: [GIN] 2025/02/26 - 19:48:18 | 200 |      47.674µs |    1.2.3.4 | HEAD     "/"
Feb 26 19:48:18 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: [GIN] 2025/02/26 - 19:48:18 | 200 |     182.918µs |    1.2.3.4 | GET      "/api/ps"
Feb 26 19:52:20 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llama_model_loader: loaded meta data with 26 key-value pairs and 339 tensors from /data/ollama/models/blobs/sha256-aabd4debf0c8f08881923f2c25fc0fdeed24435271c2b3e92c4af36704040dbc (version GGUF V3 (latest))
Feb 26 19:52:20 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
Feb 26 19:52:20 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llama_model_loader: - kv   0:                       general.architecture str              = qwen2
Feb 26 19:52:20 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llama_model_loader: - kv   1:                               general.type str              = model
Feb 26 19:52:20 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llama_model_loader: - kv   2:                               general.name str              = DeepSeek R1 Distill Qwen 1.5B
Feb 26 19:52:20 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llama_model_loader: - kv   3:                           general.basename str              = DeepSeek-R1-Distill-Qwen
Feb 26 19:52:20 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llama_model_loader: - kv   4:                         general.size_label str              = 1.5B
Feb 26 19:52:20 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llama_model_loader: - kv   5:                          qwen2.block_count u32              = 28
Feb 26 19:52:20 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llama_model_loader: - kv   6:                       qwen2.context_length u32              = 131072
Feb 26 19:52:20 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llama_model_loader: - kv   7:                     qwen2.embedding_length u32              = 1536
Feb 26 19:52:20 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llama_model_loader: - kv   8:                  qwen2.feed_forward_length u32              = 8960
Feb 26 19:52:20 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llama_model_loader: - kv   9:                 qwen2.attention.head_count u32              = 12
Feb 26 19:52:20 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llama_model_loader: - kv  10:              qwen2.attention.head_count_kv u32              = 2
Feb 26 19:52:20 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llama_model_loader: - kv  11:                       qwen2.rope.freq_base f32              = 10000.000000
Feb 26 19:52:20 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llama_model_loader: - kv  12:     qwen2.attention.layer_norm_rms_epsilon f32              = 0.000001
Feb 26 19:52:20 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llama_model_loader: - kv  13:                          general.file_type u32              = 15
Feb 26 19:52:20 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llama_model_loader: - kv  14:                       tokenizer.ggml.model str              = gpt2
Feb 26 19:52:20 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llama_model_loader: - kv  15:                         tokenizer.ggml.pre str              = qwen2
Feb 26 19:52:20 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llama_model_loader: - kv  16:                      tokenizer.ggml.tokens arr[str,151936]  = ["!", "\"", "#", "$", "%", "&", "'", ...
Feb 26 19:52:21 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llama_model_loader: - kv  17:                  tokenizer.ggml.token_type arr[i32,151936]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
Feb 26 19:52:21 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llama_model_loader: - kv  18:                      tokenizer.ggml.merges arr[str,151387]  = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
Feb 26 19:52:21 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llama_model_loader: - kv  19:                tokenizer.ggml.bos_token_id u32              = 151646
Feb 26 19:52:21 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llama_model_loader: - kv  20:                tokenizer.ggml.eos_token_id u32              = 151643
Feb 26 19:52:21 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llama_model_loader: - kv  21:            tokenizer.ggml.padding_token_id u32              = 151643
Feb 26 19:52:21 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llama_model_loader: - kv  22:               tokenizer.ggml.add_bos_token bool             = true
Feb 26 19:52:21 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llama_model_loader: - kv  23:               tokenizer.ggml.add_eos_token bool             = false
Feb 26 19:52:21 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llama_model_loader: - kv  24:                    tokenizer.chat_template str              = {% if not add_generation_prompt is de...
Feb 26 19:52:21 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llama_model_loader: - kv  25:               general.quantization_version u32              = 2
Feb 26 19:52:21 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llama_model_loader: - type  f32:  141 tensors
Feb 26 19:52:21 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llama_model_loader: - type q4_K:  169 tensors
Feb 26 19:52:21 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llama_model_loader: - type q6_K:   29 tensors
Feb 26 19:52:21 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_vocab: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
Feb 26 19:52:21 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_vocab: special tokens cache size = 22
Feb 26 19:52:21 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_vocab: token to piece cache size = 0.9310 MB
Feb 26 19:52:21 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: format           = GGUF V3 (latest)
Feb 26 19:52:21 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: arch             = qwen2
Feb 26 19:52:21 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: vocab type       = BPE
Feb 26 19:52:21 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: n_vocab          = 151936
Feb 26 19:52:21 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: n_merges         = 151387
Feb 26 19:52:21 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: vocab_only       = 1
Feb 26 19:52:21 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: model type       = ?B
Feb 26 19:52:21 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: model ftype      = all F32
Feb 26 19:52:21 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: model params     = 1.78 B
Feb 26 19:52:21 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: model size       = 1.04 GiB (5.00 BPW)
Feb 26 19:52:21 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: general.name     = DeepSeek R1 Distill Qwen 1.5B
Feb 26 19:52:21 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: BOS token        = 151646 '<|begin▁of▁sentence|>'
Feb 26 19:52:21 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: EOS token        = 151643 '<|end▁of▁sentence|>'
Feb 26 19:52:21 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: EOT token        = 151643 '<|end▁of▁sentence|>'
Feb 26 19:52:21 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: PAD token        = 151643 '<|end▁of▁sentence|>'
Feb 26 19:52:21 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: LF token         = 148848 'ÄĬ'
Feb 26 19:52:21 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: FIM PRE token    = 151659 '<|fim_prefix|>'
Feb 26 19:52:21 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: FIM SUF token    = 151661 '<|fim_suffix|>'
Feb 26 19:52:21 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: FIM MID token    = 151660 '<|fim_middle|>'
Feb 26 19:52:21 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: FIM PAD token    = 151662 '<|fim_pad|>'
Feb 26 19:52:21 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: FIM REP token    = 151663 '<|repo_name|>'
Feb 26 19:52:21 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: FIM SEP token    = 151664 '<|file_sep|>'
Feb 26 19:52:21 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: EOG token        = 151643 '<|end▁of▁sentence|>'
Feb 26 19:52:21 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: EOG token        = 151662 '<|fim_pad|>'
Feb 26 19:52:21 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: EOG token        = 151663 '<|repo_name|>'
Feb 26 19:52:21 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: EOG token        = 151664 '<|file_sep|>'
Feb 26 19:52:21 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llm_load_print_meta: max token length = 256
Feb 26 19:52:21 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: llama_model_load: vocab only - skipping tensors
Feb 26 19:52:21 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: [GIN] 2025/02/26 - 19:52:21 | 200 |         4m40s |    1.2.3.4 | POST     "/api/generate"
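
The key line in the log is the `undefined symbol: cuCtxCreate_v3` failure against `libcuda.so.418.226.00`: that symbol is exported only by newer (CUDA 11-era) drivers, so the 418.xx driver cannot satisfy the lookup and the model ends up in CPU buffers (`CPU_Mapped model buffer size`, `CPU KV buffer size`). A minimal sketch for probing a shared library for a symbol with ctypes; it is demonstrated against the running process so it runs anywhere, and the driver-library path in the comment is the one from this log:

```python
# The log shows libcuda.so.418.226.00 failing a dlsym lookup for
# cuCtxCreate_v3, a symbol only newer CUDA drivers export. This probes
# a shared library for a symbol; passing None probes the running
# process, which keeps the demonstration portable.
import ctypes

def has_symbol(library, symbol):
    """True if `library` (None = the running process) exports `symbol`."""
    try:
        lib = ctypes.CDLL(library)
    except OSError:
        return False
    return hasattr(lib, symbol)

# On the affected server one would probe the driver library itself:
#   has_symbol("/usr/lib64/libcuda.so.418.226.00", "cuCtxCreate_v3")
print(has_symbol(None, "printf"))          # glibc exports printf
print(has_symbol(None, "cuCtxCreate_v3"))  # absent without a CUDA driver loaded
```

If the probe on the real library returns False, updating the NVIDIA driver (rather than the CUDA toolkit) is the likely fix, since ollama loads the driver's `libcuda`, not the `nvcc` toolkit.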

OS

No response

GPU

Nvidia

CPU

Intel

Ollama version

0.5.12

GiteaMirror added the bug label 2026-04-28 23:54:14 -05:00

@rick-github commented on GitHub (Feb 26, 2025):

Feb 26 19:46:29 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: time=2025-02-26T19:46:29.575+08:00 level=INFO source=gpu.go:612 msg="Unable to load cudart library /usr/lib64/libcuda.so.418.226.00: symbol lookup for cuCtxCreate_v3 failed: /usr/lib64/libcuda.so.418.226.00: undefined symbol: cuCtxCreate_v3"

CUDA driver is too old.


@moluzhui commented on GitHub (Feb 26, 2025):

Feb 26 19:46:29 iZbp1d0j83sijy1mh3b3s9Z ollama[21816]: time=2025-02-26T19:46:29.575+08:00 level=INFO source=gpu.go:612 msg="Unable to load cudart library /usr/lib64/libcuda.so.418.226.00: symbol lookup for cuCtxCreate_v3 failed: /usr/lib64/libcuda.so.418.226.00: undefined symbol: cuCtxCreate_v3"

CUDA driver is too old.

Sorry, this is the version recommended by the server provider. Which CUDA versions does Ollama currently support? I can't find any version compatibility docs, but I will upgrade to a supported version.

Alternatively, which Ollama version should I downgrade to so that it works with CUDA 10.1?


@rick-github commented on GitHub (Feb 26, 2025):

CUDA 11 or above.


@moluzhui commented on GitHub (Feb 27, 2025):

The problem remains unresolved.
I upgraded CUDA to 11.4, after first upgrading the driver to the compatible version 470.256.02. However, after ollama run deepseek-r1:1.5b "same question" is executed, GPU utilization is still 0. The new output log is below.

# nvidia-smi
Thu Feb 27 10:24:27 2025
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.256.02   Driver Version: 470.256.02   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  On   | 00000000:00:09.0 Off |                  Off |
| N/A   34C    P0    27W / 250W |      2MiB / 16280MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

CUDA version

# nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Wed_Jun__2_19:15:15_PDT_2021
Cuda compilation tools, release 11.4, V11.4.48
Build cuda_11.4.r11.4/compiler.30033411_0
# cat /etc/systemd/system/ollama.service
[Unit]
Description=Ollama Service
After=network-online.target

[Service]
ExecStart=/usr/local/bin/ollama serve
User=root
Group=root
Restart=always
RestartSec=3
Environment="PATH=$PATH"
Environment="OLLAMA_MODELS=/data/ollama/models"
Environment="OLLAMA_HOST=10.2.3.4:11434"
Environment="OLLAMA_SCHED_SPREAD=1"
Environment="CUDA_VISIBLE_DEVICES=0"
Environment="OLLAMA_LLM_LIBRARY=cuda_v11"

[Install]
WantedBy=default.target
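One aside on the unit file above: systemd performs no variable expansion inside `Environment=` values, so `Environment="PATH=$PATH"` sets the service's PATH to the literal string `$PATH` rather than the shell's search path. Spelling the path out avoids that (the directories below are an assumption; adjust to the system):

```ini
# Literal PATH value; systemd does not expand $PATH in Environment= lines
Environment="PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin"
```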

Relevant log output

Feb 27 10:34:45 iZbp1d0j83sijy1mh3b3s9Z systemd[1]: Started Ollama Service.
Feb 27 10:34:45 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: 2025/02/27 10:34:45 routes.go:1205: INFO server config env="map[CUDA_VISIBLE_DEVICES:0 GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://10.2.3.4:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY:cuda_v11 OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/data/ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://*] OLLAMA_SCHED_SPREAD:true ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
Feb 27 10:34:45 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: time=2025-02-27T10:34:45.784+08:00 level=INFO source=images.go:432 msg="total blobs: 11"
Feb 27 10:34:45 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: time=2025-02-27T10:34:45.784+08:00 level=INFO source=images.go:439 msg="total unused blobs removed: 0"
Feb 27 10:34:45 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: time=2025-02-27T10:34:45.785+08:00 level=INFO source=routes.go:1256 msg="Listening on 10.2.3.4:11434 (version 0.5.12)"
Feb 27 10:34:45 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: time=2025-02-27T10:34:45.785+08:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
Feb 27 10:34:45 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: time=2025-02-27T10:34:45.942+08:00 level=INFO source=types.go:130 msg="inference compute" id=GPU-6bccd7ac-afa6-0d26-8870-b01094a16c7d library=cuda variant=v11 compute=6.0 driver=11.4 name="Tesla P100-PCIE-16GB" total="15.9 GiB" available="15.7 GiB"
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: [GIN] 2025/02/27 - 10:34:56 | 200 |      84.434µs |    10.2.3.4 | HEAD     "/"
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: [GIN] 2025/02/27 - 10:34:56 | 200 |    22.87061ms |    10.2.3.4 | POST     "/api/show"
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: time=2025-02-27T10:34:56.541+08:00 level=WARN source=ggml.go:132 msg="key not found" key=qwen2.attention.key_length default=128
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: time=2025-02-27T10:34:56.541+08:00 level=WARN source=ggml.go:132 msg="key not found" key=qwen2.attention.value_length default=128
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: time=2025-02-27T10:34:56.541+08:00 level=INFO source=sched.go:731 msg="new model will fit in available VRAM, loading" model=/data/ollama/models/blobs/sha256-aabd4debf0c8f08881923f2c25fc0fdeed24435271c2b3e92c4af36704040dbc library=cuda parallel=4 required="1.9 GiB"
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: time=2025-02-27T10:34:56.671+08:00 level=INFO source=server.go:97 msg="system memory" total="58.8 GiB" free="57.7 GiB" free_swap="0 B"
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: time=2025-02-27T10:34:56.671+08:00 level=WARN source=ggml.go:132 msg="key not found" key=qwen2.attention.key_length default=128
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: time=2025-02-27T10:34:56.672+08:00 level=WARN source=ggml.go:132 msg="key not found" key=qwen2.attention.value_length default=128
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: time=2025-02-27T10:34:56.672+08:00 level=INFO source=server.go:130 msg=offload library=cuda layers.requested=-1 layers.model=29 layers.offload=29 layers.split="" memory.available="[15.7 GiB]" memory.gpu_overhead="0 B" memory.required.full="1.9 GiB" memory.required.partial="1.9 GiB" memory.required.kv="224.0 MiB" memory.required.allocations="[1.9 GiB]" memory.weights.total="976.1 MiB" memory.weights.repeating="793.5 MiB" memory.weights.nonrepeating="182.6 MiB" memory.graph.full="299.8 MiB" memory.graph.partial="482.3 MiB"
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: time=2025-02-27T10:34:56.673+08:00 level=INFO source=server.go:380 msg="starting llama server" cmd="/usr/local/bin/ollama runner --model /data/ollama/models/blobs/sha256-aabd4debf0c8f08881923f2c25fc0fdeed24435271c2b3e92c4af36704040dbc --ctx-size 8192 --batch-size 512 --n-gpu-layers 29 --threads 4 --parallel 4 --port 28037"
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: time=2025-02-27T10:34:56.673+08:00 level=INFO source=sched.go:450 msg="loaded runners" count=1
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: time=2025-02-27T10:34:56.673+08:00 level=INFO source=server.go:557 msg="waiting for llama runner to start responding"
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: time=2025-02-27T10:34:56.677+08:00 level=INFO source=server.go:591 msg="waiting for server to become available" status="llm server error"
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: time=2025-02-27T10:34:56.693+08:00 level=INFO source=runner.go:932 msg="starting go runner"
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: time=2025-02-27T10:34:56.693+08:00 level=INFO source=runner.go:935 msg=system info="CPU : LLAMAFILE = 1 | CPU : LLAMAFILE = 1 | cgo(gcc)" threads=4
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: time=2025-02-27T10:34:56.693+08:00 level=INFO source=runner.go:993 msg="Server listening on 127.0.0.1:28037"
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: loaded meta data with 26 key-value pairs and 339 tensors from /data/ollama/models/blobs/sha256-aabd4debf0c8f08881923f2c25fc0fdeed24435271c2b3e92c4af36704040dbc (version GGUF V3 (latest))
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv   0:                       general.architecture str              = qwen2
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv   1:                               general.type str              = model
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv   2:                               general.name str              = DeepSeek R1 Distill Qwen 1.5B
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv   3:                           general.basename str              = DeepSeek-R1-Distill-Qwen
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv   4:                         general.size_label str              = 1.5B
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv   5:                          qwen2.block_count u32              = 28
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv   6:                       qwen2.context_length u32              = 131072
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv   7:                     qwen2.embedding_length u32              = 1536
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv   8:                  qwen2.feed_forward_length u32              = 8960
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv   9:                 qwen2.attention.head_count u32              = 12
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv  10:              qwen2.attention.head_count_kv u32              = 2
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv  11:                       qwen2.rope.freq_base f32              = 10000.000000
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv  12:     qwen2.attention.layer_norm_rms_epsilon f32              = 0.000001
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv  13:                          general.file_type u32              = 15
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv  14:                       tokenizer.ggml.model str              = gpt2
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv  15:                         tokenizer.ggml.pre str              = qwen2
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv  16:                      tokenizer.ggml.tokens arr[str,151936]  = ["!", "\"", "#", "$", "%", "&", "'", ...
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv  17:                  tokenizer.ggml.token_type arr[i32,151936]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: time=2025-02-27T10:34:56.928+08:00 level=INFO source=server.go:591 msg="waiting for server to become available" status="llm server loading model"
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv  18:                      tokenizer.ggml.merges arr[str,151387]  = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv  19:                tokenizer.ggml.bos_token_id u32              = 151646
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv  20:                tokenizer.ggml.eos_token_id u32              = 151643
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv  21:            tokenizer.ggml.padding_token_id u32              = 151643
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv  22:               tokenizer.ggml.add_bos_token bool             = true
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv  23:               tokenizer.ggml.add_eos_token bool             = false
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv  24:                    tokenizer.chat_template str              = {% if not add_generation_prompt is de...
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv  25:               general.quantization_version u32              = 2
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - type  f32:  141 tensors
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - type q4_K:  169 tensors
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - type q6_K:   29 tensors
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_vocab: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_vocab: special tokens cache size = 22
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_vocab: token to piece cache size = 0.9310 MB
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: format           = GGUF V3 (latest)
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: arch             = qwen2
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: vocab type       = BPE
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: n_vocab          = 151936
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: n_merges         = 151387
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: vocab_only       = 0
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: n_ctx_train      = 131072
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: n_embd           = 1536
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: n_layer          = 28
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: n_head           = 12
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: n_head_kv        = 2
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: n_rot            = 128
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: n_swa            = 0
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: n_embd_head_k    = 128
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: n_embd_head_v    = 128
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: n_gqa            = 6
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: n_embd_k_gqa     = 256
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: n_embd_v_gqa     = 256
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: f_norm_eps       = 0.0e+00
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: f_norm_rms_eps   = 1.0e-06
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: f_clamp_kqv      = 0.0e+00
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: f_max_alibi_bias = 0.0e+00
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: f_logit_scale    = 0.0e+00
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: n_ff             = 8960
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: n_expert         = 0
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: n_expert_used    = 0
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: causal attn      = 1
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: pooling type     = 0
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: rope type        = 2
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: rope scaling     = linear
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: freq_base_train  = 10000.0
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: freq_scale_train = 1
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: n_ctx_orig_yarn  = 131072
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: rope_finetuned   = unknown
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: ssm_d_conv       = 0
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: ssm_d_inner      = 0
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: ssm_d_state      = 0
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: ssm_dt_rank      = 0
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: ssm_dt_b_c_rms   = 0
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: model type       = 1.5B
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: model ftype      = Q4_K - Medium
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: model params     = 1.78 B
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: model size       = 1.04 GiB (5.00 BPW)
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: general.name     = DeepSeek R1 Distill Qwen 1.5B
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: BOS token        = 151646 '<|begin▁of▁sentence|>'
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: EOS token        = 151643 '<|end▁of▁sentence|>'
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: EOT token        = 151643 '<|end▁of▁sentence|>'
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: PAD token        = 151643 '<|end▁of▁sentence|>'
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: LF token         = 148848 'ÄĬ'
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: FIM PRE token    = 151659 '<|fim_prefix|>'
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: FIM SUF token    = 151661 '<|fim_suffix|>'
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: FIM MID token    = 151660 '<|fim_middle|>'
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: FIM PAD token    = 151662 '<|fim_pad|>'
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: FIM REP token    = 151663 '<|repo_name|>'
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: FIM SEP token    = 151664 '<|file_sep|>'
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: EOG token        = 151643 '<|end▁of▁sentence|>'
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: EOG token        = 151662 '<|fim_pad|>'
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: EOG token        = 151663 '<|repo_name|>'
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: EOG token        = 151664 '<|file_sep|>'
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: max token length = 256
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_tensors:   CPU_Mapped model buffer size =  1059.89 MiB
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_new_context_with_model: n_seq_max     = 4
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_new_context_with_model: n_ctx         = 8192
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_new_context_with_model: n_ctx_per_seq = 2048
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_new_context_with_model: n_batch       = 2048
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_new_context_with_model: n_ubatch      = 512
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_new_context_with_model: flash_attn    = 0
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_new_context_with_model: freq_base     = 10000.0
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_new_context_with_model: freq_scale    = 1
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_new_context_with_model: n_ctx_per_seq (2048) < n_ctx_train (131072) -- the full capacity of the model will not be utilized
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_kv_cache_init: kv_size = 8192, offload = 1, type_k = 'f16', type_v = 'f16', n_layer = 28, can_shift = 1
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_kv_cache_init:        CPU KV buffer size =   224.00 MiB
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_new_context_with_model: KV self size  =  224.00 MiB, K (f16):  112.00 MiB, V (f16):  112.00 MiB
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_new_context_with_model:        CPU  output buffer size =     2.34 MiB
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_new_context_with_model:        CPU compute buffer size =   302.75 MiB
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_new_context_with_model: graph nodes  = 986
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_new_context_with_model: graph splits = 1
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: time=2025-02-27T10:34:57.933+08:00 level=INFO source=server.go:596 msg="llama runner started in 1.26 seconds"
Feb 27 10:40:30 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: loaded meta data with 26 key-value pairs and 339 tensors from /data/ollama/models/blobs/sha256-aabd4debf0c8f08881923f2c25fc0fdeed24435271c2b3e92c4af36704040dbc (version GGUF V3 (latest))
Feb 27 10:40:30 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
Feb 27 10:40:30 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv   0:                       general.architecture str              = qwen2
Feb 27 10:40:30 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv   1:                               general.type str              = model
Feb 27 10:40:30 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv   2:                               general.name str              = DeepSeek R1 Distill Qwen 1.5B
Feb 27 10:40:30 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv   3:                           general.basename str              = DeepSeek-R1-Distill-Qwen
Feb 27 10:40:30 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv   4:                         general.size_label str              = 1.5B
Feb 27 10:40:30 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv   5:                          qwen2.block_count u32              = 28
Feb 27 10:40:30 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv   6:                       qwen2.context_length u32              = 131072
Feb 27 10:40:30 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv   7:                     qwen2.embedding_length u32              = 1536
Feb 27 10:40:30 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv   8:                  qwen2.feed_forward_length u32              = 8960
Feb 27 10:40:30 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv   9:                 qwen2.attention.head_count u32              = 12
Feb 27 10:40:30 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv  10:              qwen2.attention.head_count_kv u32              = 2
Feb 27 10:40:30 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv  11:                       qwen2.rope.freq_base f32              = 10000.000000
Feb 27 10:40:30 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv  12:     qwen2.attention.layer_norm_rms_epsilon f32              = 0.000001
Feb 27 10:40:30 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv  13:                          general.file_type u32              = 15
Feb 27 10:40:30 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv  14:                       tokenizer.ggml.model str              = gpt2
Feb 27 10:40:30 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv  15:                         tokenizer.ggml.pre str              = qwen2
Feb 27 10:40:30 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv  16:                      tokenizer.ggml.tokens arr[str,151936]  = ["!", "\"", "#", "$", "%", "&", "'", ...
Feb 27 10:40:30 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv  17:                  tokenizer.ggml.token_type arr[i32,151936]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
Feb 27 10:40:30 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv  18:                      tokenizer.ggml.merges arr[str,151387]  = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
Feb 27 10:40:30 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv  19:                tokenizer.ggml.bos_token_id u32              = 151646
Feb 27 10:40:30 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv  20:                tokenizer.ggml.eos_token_id u32              = 151643
Feb 27 10:40:30 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv  21:            tokenizer.ggml.padding_token_id u32              = 151643
Feb 27 10:40:30 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv  22:               tokenizer.ggml.add_bos_token bool             = true
Feb 27 10:40:30 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv  23:               tokenizer.ggml.add_eos_token bool             = false
Feb 27 10:40:30 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv  24:                    tokenizer.chat_template str              = {% if not add_generation_prompt is de...
Feb 27 10:40:30 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv  25:               general.quantization_version u32              = 2
Feb 27 10:40:30 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - type  f32:  141 tensors
Feb 27 10:40:30 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - type q4_K:  169 tensors
Feb 27 10:40:30 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - type q6_K:   29 tensors
Feb 27 10:40:31 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_vocab: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
Feb 27 10:40:31 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_vocab: special tokens cache size = 22
Feb 27 10:40:31 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_vocab: token to piece cache size = 0.9310 MB
Feb 27 10:40:31 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: format           = GGUF V3 (latest)
Feb 27 10:40:31 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: arch             = qwen2
Feb 27 10:40:31 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: vocab type       = BPE
Feb 27 10:40:31 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: n_vocab          = 151936
Feb 27 10:40:31 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: n_merges         = 151387
Feb 27 10:40:31 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: vocab_only       = 1
Feb 27 10:40:31 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: model type       = ?B
Feb 27 10:40:31 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: model ftype      = all F32
Feb 27 10:40:31 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: model params     = 1.78 B
Feb 27 10:40:31 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: model size       = 1.04 GiB (5.00 BPW)
Feb 27 10:40:31 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: general.name     = DeepSeek R1 Distill Qwen 1.5B
Feb 27 10:40:31 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: BOS token        = 151646 '<|begin▁of▁sentence|>'
Feb 27 10:40:31 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: EOS token        = 151643 '<|end▁of▁sentence|>'
Feb 27 10:40:31 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: EOT token        = 151643 '<|end▁of▁sentence|>'
Feb 27 10:40:31 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: PAD token        = 151643 '<|end▁of▁sentence|>'
Feb 27 10:40:31 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: LF token         = 148848 'ÄĬ'
Feb 27 10:40:31 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: FIM PRE token    = 151659 '<|fim_prefix|>'
Feb 27 10:40:31 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: FIM SUF token    = 151661 '<|fim_suffix|>'
Feb 27 10:40:31 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: FIM MID token    = 151660 '<|fim_middle|>'
Feb 27 10:40:31 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: FIM PAD token    = 151662 '<|fim_pad|>'
Feb 27 10:40:31 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: FIM REP token    = 151663 '<|repo_name|>'
Feb 27 10:40:31 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: FIM SEP token    = 151664 '<|file_sep|>'
Feb 27 10:40:31 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: EOG token        = 151643 '<|end▁of▁sentence|>'
Feb 27 10:40:31 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: EOG token        = 151662 '<|fim_pad|>'
Feb 27 10:40:31 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: EOG token        = 151663 '<|repo_name|>'
Feb 27 10:40:31 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: EOG token        = 151664 '<|file_sep|>'
Feb 27 10:40:31 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: max token length = 256
Feb 27 10:40:31 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_load: vocab only - skipping tensors
Feb 27 10:40:31 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: [GIN] 2025/02/27 - 10:40:31 | 200 |         5m35s |    10.2.3.4 | POST     "/api/generate"
Feb 27 10:40:58 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: [GIN] 2025/02/27 - 10:40:58 | 200 |      31.277µs |    10.2.3.4 | HEAD     "/"
Feb 27 10:40:58 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: [GIN] 2025/02/27 - 10:40:58 | 200 |    3.560035ms |    10.2.3.4 | GET      "/api/tags"
Feb 27 10:41:13 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: [GIN] 2025/02/27 - 10:41:13 | 200 |      33.794µs |    10.2.3.4 | HEAD     "/"
Feb 27 10:41:13 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: [GIN] 2025/02/27 - 10:41:13 | 200 |   23.038912ms |    10.2.3.4 | POST     "/api/show"
Feb 27 10:41:13 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: time=2025-02-27T10:41:13.636+08:00 level=INFO source=sched.go:508 msg="updated VRAM based on existing loaded models" gpu=GPU-6bccd7ac-afa6-0d26-8870-b01094a16c7d library=cuda total="15.9 GiB" available="14.0 GiB"
Feb 27 10:41:13 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: time=2025-02-27T10:41:13.636+08:00 level=WARN source=ggml.go:132 msg="key not found" key=qwen2.attention.key_length default=128
Feb 27 10:41:13 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: time=2025-02-27T10:41:13.636+08:00 level=WARN source=ggml.go:132 msg="key not found" key=qwen2.attention.value_length default=128
Feb 27 10:41:13 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: time=2025-02-27T10:41:13.637+08:00 level=INFO source=sched.go:731 msg="new model will fit in available VRAM, loading" model=/data/ollama/models/blobs/sha256-6e9f90f02bb3b39b59e81916e8cfce9deb45aeaeb9a54a5be4414486b907dc1e library=cuda parallel=4 required="10.8 GiB"
Feb 27 10:41:13 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: time=2025-02-27T10:41:13.767+08:00 level=INFO source=server.go:97 msg="system memory" total="58.8 GiB" free="57.4 GiB" free_swap="0 B"
Feb 27 10:41:13 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: time=2025-02-27T10:41:13.768+08:00 level=WARN source=ggml.go:132 msg="key not found" key=qwen2.attention.key_length default=128
Feb 27 10:41:13 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: time=2025-02-27T10:41:13.768+08:00 level=WARN source=ggml.go:132 msg="key not found" key=qwen2.attention.value_length default=128
Feb 27 10:41:13 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: time=2025-02-27T10:41:13.768+08:00 level=INFO source=server.go:130 msg=offload library=cuda layers.requested=-1 layers.model=49 layers.offload=49 layers.split="" memory.available="[14.0 GiB]" memory.gpu_overhead="0 B" memory.required.full="10.8 GiB" memory.required.partial="10.8 GiB" memory.required.kv="1.5 GiB" memory.required.allocations="[10.8 GiB]" memory.weights.total="8.9 GiB" memory.weights.repeating="8.3 GiB" memory.weights.nonrepeating="609.1 MiB" memory.graph.full="676.0 MiB" memory.graph.partial="916.1 MiB"
Feb 27 10:41:13 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: time=2025-02-27T10:41:13.769+08:00 level=INFO source=server.go:380 msg="starting llama server" cmd="/usr/local/bin/ollama runner --model /data/ollama/models/blobs/sha256-6e9f90f02bb3b39b59e81916e8cfce9deb45aeaeb9a54a5be4414486b907dc1e --ctx-size 8192 --batch-size 512 --n-gpu-layers 49 --threads 4 --parallel 4 --port 19631"
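Worth checking in the log above are the runner's buffer-allocation lines: despite the 29-layer offload plan, both buffers were allocated on the CPU (`CPU_Mapped model buffer`, `CPU KV buffer`). On a working CUDA load these lines would typically name a `CUDA0` buffer instead (buffer naming assumed from llama.cpp's usual output). A self-contained sketch of the filter, applied to the two lines quoted from the log:

```shell
# Count CPU-side buffer allocations in the two log lines quoted above;
# a GPU load would report CUDA0 buffers here instead.
printf '%s\n' \
  'llm_load_tensors:   CPU_Mapped model buffer size =  1059.89 MiB' \
  'llama_kv_cache_init:        CPU KV buffer size =   224.00 MiB' \
  | grep -c 'CPU'
# prints 2
# On the live system: journalctl -u ollama --no-pager | grep 'buffer size'
```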
<!-- gh-comment-id:2686706582 --> @moluzhui commented on GitHub (Feb 27, 2025):

The problem remains unresolved. I upgraded CUDA to version **11.4** and, before that, upgraded the driver to the compatible version **470.256.02**. However, after running `ollama run deepseek-r1:1.5b "same question"`, **GPU-util is still 0**. The new output is below.

```
# nvidia-smi
Thu Feb 27 10:24:27 2025
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.256.02   Driver Version: 470.256.02   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  On   | 00000000:00:09.0 Off |                  Off |
| N/A   34C    P0    27W / 250W |      2MiB / 16280MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
```

CUDA version:

```
# nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Wed_Jun__2_19:15:15_PDT_2021
Cuda compilation tools, release 11.4, V11.4.48
Build cuda_11.4.r11.4/compiler.30033411_0
```

systemd unit:

```
# cat /etc/systemd/system/ollama.service
[Unit]
Description=Ollama Service
After=network-online.target

[Service]
ExecStart=/usr/local/bin/ollama serve
User=root
Group=root
Restart=always
RestartSec=3
Environment="PATH=$PATH"
Environment="OLLAMA_MODELS=/data/ollama/models"
Environment="OLLAMA_HOST=10.2.3.4:11434"
Environment="OLLAMA_SCHED_SPREAD=1"
Environment="CUDA_VISIBLE_DEVICES=0"
Environment="OLLAMA_LLM_LIBRARY=cuda_v11"

[Install]
WantedBy=default.target
```

## Relevant log output

```
Feb 27 10:34:45 iZbp1d0j83sijy1mh3b3s9Z systemd[1]: Started Ollama Service.
Feb 27 10:34:45 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: 2025/02/27 10:34:45 routes.go:1205: INFO server config env="map[CUDA_VISIBLE_DEVICES:0 GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://10.2.3.4:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY:cuda_v11 OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/data/ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://*] OLLAMA_SCHED_SPREAD:true ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
Feb 27 10:34:45 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: time=2025-02-27T10:34:45.784+08:00 level=INFO source=images.go:432 msg="total blobs: 11"
Feb 27 10:34:45 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: time=2025-02-27T10:34:45.784+08:00 level=INFO source=images.go:439 msg="total unused blobs removed: 0"
Feb 27 10:34:45 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: time=2025-02-27T10:34:45.785+08:00 level=INFO source=routes.go:1256 msg="Listening on 10.2.3.4:11434 (version 0.5.12)"
Feb 27 10:34:45 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: time=2025-02-27T10:34:45.785+08:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
Feb 27 10:34:45 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: time=2025-02-27T10:34:45.942+08:00
level=INFO source=types.go:130 msg="inference compute" id=GPU-6bccd7ac-afa6-0d26-8870-b01094a16c7d library=cuda variant=v11 compute=6.0 driver=11.4 name="Tesla P100-PCIE-16GB" total="15.9 GiB" available="15.7 GiB"
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: [GIN] 2025/02/27 - 10:34:56 | 200 | 84.434µs | 10.2.3.4 | HEAD "/"
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: [GIN] 2025/02/27 - 10:34:56 | 200 | 22.87061ms | 10.2.3.4 | POST "/api/show"
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: time=2025-02-27T10:34:56.541+08:00 level=WARN source=ggml.go:132 msg="key not found" key=qwen2.attention.key_length default=128
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: time=2025-02-27T10:34:56.541+08:00 level=WARN source=ggml.go:132 msg="key not found" key=qwen2.attention.value_length default=128
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: time=2025-02-27T10:34:56.541+08:00 level=INFO source=sched.go:731 msg="new model will fit in available VRAM, loading" model=/data/ollama/models/blobs/sha256-aabd4debf0c8f08881923f2c25fc0fdeed24435271c2b3e92c4af36704040dbc library=cuda parallel=4 required="1.9 GiB"
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: time=2025-02-27T10:34:56.671+08:00 level=INFO source=server.go:97 msg="system memory" total="58.8 GiB" free="57.7 GiB" free_swap="0 B"
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: time=2025-02-27T10:34:56.671+08:00 level=WARN source=ggml.go:132 msg="key not found" key=qwen2.attention.key_length default=128
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: time=2025-02-27T10:34:56.672+08:00 level=WARN source=ggml.go:132 msg="key not found" key=qwen2.attention.value_length default=128
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: time=2025-02-27T10:34:56.672+08:00 level=INFO source=server.go:130 msg=offload library=cuda layers.requested=-1 layers.model=29 layers.offload=29 layers.split="" memory.available="[15.7 GiB]" memory.gpu_overhead="0 B" memory.required.full="1.9 GiB" memory.required.partial="1.9 GiB" memory.required.kv="224.0 MiB" memory.required.allocations="[1.9 GiB]" memory.weights.total="976.1 MiB" memory.weights.repeating="793.5 MiB" memory.weights.nonrepeating="182.6 MiB" memory.graph.full="299.8 MiB" memory.graph.partial="482.3 MiB"
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: time=2025-02-27T10:34:56.673+08:00 level=INFO source=server.go:380 msg="starting llama server" cmd="/usr/local/bin/ollama runner --model /data/ollama/models/blobs/sha256-aabd4debf0c8f08881923f2c25fc0fdeed24435271c2b3e92c4af36704040dbc --ctx-size 8192 --batch-size 512 --n-gpu-layers 29 --threads 4 --parallel 4 --port 28037"
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: time=2025-02-27T10:34:56.673+08:00 level=INFO source=sched.go:450 msg="loaded runners" count=1
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: time=2025-02-27T10:34:56.673+08:00 level=INFO source=server.go:557 msg="waiting for llama runner to start responding"
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: time=2025-02-27T10:34:56.677+08:00 level=INFO source=server.go:591 msg="waiting for server to become available" status="llm server error"
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: time=2025-02-27T10:34:56.693+08:00 level=INFO source=runner.go:932 msg="starting go runner"
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: time=2025-02-27T10:34:56.693+08:00 level=INFO source=runner.go:935 msg=system info="CPU : LLAMAFILE = 1 | CPU : LLAMAFILE = 1 | cgo(gcc)" threads=4
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: time=2025-02-27T10:34:56.693+08:00 level=INFO source=runner.go:993 msg="Server listening on 127.0.0.1:28037"
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: loaded meta data with 26 key-value pairs and 339 tensors from /data/ollama/models/blobs/sha256-aabd4debf0c8f08881923f2c25fc0fdeed24435271c2b3e92c4af36704040dbc (version GGUF V3 (latest))
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv   0: general.architecture str = qwen2
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv   1: general.type str = model
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv   2: general.name str = DeepSeek R1 Distill Qwen 1.5B
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv   3: general.basename str = DeepSeek-R1-Distill-Qwen
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv   4: general.size_label str = 1.5B
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv   5: qwen2.block_count u32 = 28
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv   6: qwen2.context_length u32 = 131072
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv   7: qwen2.embedding_length u32 = 1536
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv   8: qwen2.feed_forward_length u32 = 8960
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv   9: qwen2.attention.head_count u32 = 12
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv  10: qwen2.attention.head_count_kv u32 = 2
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv  11: qwen2.rope.freq_base f32 = 10000.000000
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv  12: qwen2.attention.layer_norm_rms_epsilon f32 = 0.000001
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv  13: general.file_type u32 = 15
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv  14: tokenizer.ggml.model str = gpt2
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv  15: tokenizer.ggml.pre str = qwen2
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv  16: tokenizer.ggml.tokens arr[str,151936] = ["!", "\"", "#", "$", "%", "&", "'", ...
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv  17: tokenizer.ggml.token_type arr[i32,151936] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: time=2025-02-27T10:34:56.928+08:00 level=INFO source=server.go:591 msg="waiting for server to become available" status="llm server loading model"
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv  18: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv  19: tokenizer.ggml.bos_token_id u32 = 151646
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv  20: tokenizer.ggml.eos_token_id u32 = 151643
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv  21: tokenizer.ggml.padding_token_id u32 = 151643
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv  22: tokenizer.ggml.add_bos_token bool = true
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv  23: tokenizer.ggml.add_eos_token bool = false
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv  24: tokenizer.chat_template str = {% if not add_generation_prompt is de...
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv  25: general.quantization_version u32 = 2
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - type  f32: 141 tensors
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - type q4_K: 169 tensors
Feb 27 10:34:56 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - type q6_K: 29 tensors
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_vocab: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_vocab: special tokens cache size = 22
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_vocab: token to piece cache size = 0.9310 MB
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: format = GGUF V3 (latest)
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: arch = qwen2
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: vocab type = BPE
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: n_vocab = 151936
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: n_merges = 151387
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: vocab_only = 0
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: n_ctx_train = 131072
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: n_embd = 1536
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: n_layer = 28
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: n_head = 12
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: n_head_kv = 2
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: n_rot = 128
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: n_swa = 0
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: n_embd_head_k = 128
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: n_embd_head_v = 128
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: n_gqa = 6
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: n_embd_k_gqa = 256
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: n_embd_v_gqa = 256
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: f_norm_eps = 0.0e+00
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: f_norm_rms_eps = 1.0e-06
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: f_clamp_kqv = 0.0e+00
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: f_max_alibi_bias = 0.0e+00
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: f_logit_scale = 0.0e+00
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: n_ff = 8960
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: n_expert = 0
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: n_expert_used = 0
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: causal attn = 1
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: pooling type = 0
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: rope type = 2
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: rope scaling = linear
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: freq_base_train = 10000.0
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: freq_scale_train = 1
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: n_ctx_orig_yarn = 131072
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: rope_finetuned = unknown
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: ssm_d_conv = 0
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: ssm_d_inner = 0
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: ssm_d_state = 0
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: ssm_dt_rank = 0
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: ssm_dt_b_c_rms = 0
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: model type = 1.5B
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: model ftype = Q4_K - Medium
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: model params = 1.78 B
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: model size = 1.04 GiB (5.00 BPW)
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: general.name = DeepSeek R1 Distill Qwen 1.5B
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: BOS token = 151646 '<|begin▁of▁sentence|>'
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: EOS token = 151643 '<|end▁of▁sentence|>'
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: EOT token = 151643 '<|end▁of▁sentence|>'
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: PAD token = 151643 '<|end▁of▁sentence|>'
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: LF token = 148848 'ÄĬ'
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: FIM PRE token = 151659 '<|fim_prefix|>'
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: FIM SUF token = 151661 '<|fim_suffix|>'
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: FIM MID token = 151660 '<|fim_middle|>'
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: FIM PAD token = 151662 '<|fim_pad|>'
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: FIM REP token = 151663 '<|repo_name|>'
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: FIM SEP token = 151664 '<|file_sep|>'
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: EOG token = 151643 '<|end▁of▁sentence|>'
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: EOG token = 151662 '<|fim_pad|>'
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: EOG token = 151663 '<|repo_name|>'
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: EOG token = 151664 '<|file_sep|>'
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: max token length = 256
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_tensors: CPU_Mapped model buffer size = 1059.89 MiB
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_new_context_with_model: n_seq_max = 4
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_new_context_with_model: n_ctx = 8192
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_new_context_with_model: n_ctx_per_seq = 2048
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_new_context_with_model: n_batch = 2048
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_new_context_with_model: n_ubatch = 512
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_new_context_with_model: flash_attn = 0
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_new_context_with_model: freq_base = 10000.0
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_new_context_with_model: freq_scale = 1
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_new_context_with_model: n_ctx_per_seq (2048) < n_ctx_train (131072) -- the full capacity of the model will not be utilized
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_kv_cache_init: kv_size = 8192, offload = 1, type_k = 'f16', type_v = 'f16', n_layer = 28, can_shift = 1
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_kv_cache_init: CPU KV buffer size = 224.00 MiB
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_new_context_with_model: KV self size = 224.00 MiB, K (f16): 112.00 MiB, V (f16): 112.00 MiB
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_new_context_with_model: CPU output buffer size = 2.34 MiB
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_new_context_with_model: CPU compute buffer size = 302.75 MiB
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_new_context_with_model: graph nodes = 986
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_new_context_with_model: graph splits = 1
Feb 27 10:34:57 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: time=2025-02-27T10:34:57.933+08:00 level=INFO source=server.go:596 msg="llama runner started in 1.26 seconds"
Feb 27 10:40:30 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: loaded meta data with 26 key-value pairs and 339 tensors from /data/ollama/models/blobs/sha256-aabd4debf0c8f08881923f2c25fc0fdeed24435271c2b3e92c4af36704040dbc (version GGUF V3 (latest))
Feb 27 10:40:30 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
Feb 27 10:40:30 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv   0: general.architecture str = qwen2
Feb 27 10:40:30 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv   1: general.type str = model
Feb 27 10:40:30 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv   2: general.name str = DeepSeek R1 Distill Qwen 1.5B
Feb 27 10:40:30 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv   3: general.basename str = DeepSeek-R1-Distill-Qwen
Feb 27 10:40:30 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv   4: general.size_label str = 1.5B
Feb 27 10:40:30 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv   5: qwen2.block_count u32 = 28
Feb 27 10:40:30 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv   6: qwen2.context_length u32 = 131072
Feb 27 10:40:30 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv   7: qwen2.embedding_length u32 = 1536
Feb 27 10:40:30 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv   8: qwen2.feed_forward_length u32 = 8960
Feb 27 10:40:30 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv   9: qwen2.attention.head_count u32 = 12
Feb 27 10:40:30 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv  10: qwen2.attention.head_count_kv u32 = 2
Feb 27 10:40:30 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv  11: qwen2.rope.freq_base f32 = 10000.000000
Feb 27 10:40:30 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv  12: qwen2.attention.layer_norm_rms_epsilon f32 = 0.000001
Feb 27 10:40:30 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv  13: general.file_type u32 = 15
Feb 27 10:40:30 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv  14: tokenizer.ggml.model str = gpt2
Feb 27 10:40:30 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv  15: tokenizer.ggml.pre str = qwen2
Feb 27 10:40:30 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv  16: tokenizer.ggml.tokens arr[str,151936] = ["!", "\"", "#", "$", "%", "&", "'", ...
Feb 27 10:40:30 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv  17: tokenizer.ggml.token_type arr[i32,151936] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
Feb 27 10:40:30 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv  18: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
Feb 27 10:40:30 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv  19: tokenizer.ggml.bos_token_id u32 = 151646
Feb 27 10:40:30 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv  20: tokenizer.ggml.eos_token_id u32 = 151643
Feb 27 10:40:30 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv  21: tokenizer.ggml.padding_token_id u32 = 151643
Feb 27 10:40:30 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv  22: tokenizer.ggml.add_bos_token bool = true
Feb 27 10:40:30 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv  23: tokenizer.ggml.add_eos_token bool = false
Feb 27 10:40:30 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv  24: tokenizer.chat_template str = {% if not add_generation_prompt is de...
Feb 27 10:40:30 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - kv  25: general.quantization_version u32 = 2
Feb 27 10:40:30 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - type  f32: 141 tensors
Feb 27 10:40:30 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - type q4_K: 169 tensors
Feb 27 10:40:30 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_loader: - type q6_K: 29 tensors
Feb 27 10:40:31 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_vocab: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
Feb 27 10:40:31 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_vocab: special tokens cache size = 22
Feb 27 10:40:31 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_vocab: token to piece cache size = 0.9310 MB
Feb 27 10:40:31 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: format = GGUF V3 (latest)
Feb 27 10:40:31 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: arch = qwen2
Feb 27 10:40:31 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: vocab type = BPE
Feb 27 10:40:31 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: n_vocab = 151936
Feb 27 10:40:31 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: n_merges = 151387
Feb 27 10:40:31 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: vocab_only = 1
Feb 27 10:40:31 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: model type = ?B
Feb 27 10:40:31 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: model ftype = all F32
Feb 27 10:40:31 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: model params = 1.78 B
Feb 27 10:40:31 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: model size = 1.04 GiB (5.00 BPW)
Feb 27 10:40:31 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: general.name = DeepSeek R1 Distill Qwen 1.5B
Feb 27 10:40:31 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: BOS token = 151646 '<|begin▁of▁sentence|>'
Feb 27 10:40:31 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: EOS token = 151643 '<|end▁of▁sentence|>'
Feb 27 10:40:31 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: EOT token = 151643 '<|end▁of▁sentence|>'
Feb 27 10:40:31 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: PAD token = 151643 '<|end▁of▁sentence|>'
Feb 27 10:40:31 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: LF token = 148848 'ÄĬ'
Feb 27 10:40:31 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: FIM PRE token = 151659 '<|fim_prefix|>'
Feb 27 10:40:31 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: FIM SUF token = 151661 '<|fim_suffix|>'
Feb 27 10:40:31 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: FIM MID token = 151660 '<|fim_middle|>'
Feb 27 10:40:31 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: FIM PAD token = 151662 '<|fim_pad|>'
Feb 27 10:40:31 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: FIM REP token = 151663 '<|repo_name|>'
Feb 27 10:40:31 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: FIM SEP token = 151664 '<|file_sep|>'
Feb 27 10:40:31 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: EOG token = 151643 '<|end▁of▁sentence|>'
Feb 27 10:40:31 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: EOG token = 151662 '<|fim_pad|>'
Feb 27 10:40:31 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: EOG token = 151663 '<|repo_name|>'
Feb 27 10:40:31 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: EOG token = 151664 '<|file_sep|>'
Feb 27 10:40:31 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llm_load_print_meta: max token length = 256
Feb 27 10:40:31 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: llama_model_load: vocab only - skipping tensors
Feb 27 10:40:31 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: [GIN] 2025/02/27 - 10:40:31 | 200 | 5m35s | 10.2.3.4 | POST "/api/generate"
Feb 27 10:40:58 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: [GIN] 2025/02/27 - 10:40:58 | 200 | 31.277µs | 10.2.3.4 | HEAD "/"
Feb 27 10:40:58 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: [GIN] 2025/02/27 - 10:40:58 | 200 | 3.560035ms | 10.2.3.4 | GET "/api/tags"
Feb 27 10:41:13 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: [GIN] 2025/02/27 - 10:41:13 | 200 | 33.794µs | 10.2.3.4 | HEAD "/"
Feb 27 10:41:13 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: [GIN] 2025/02/27 - 10:41:13 | 200 | 23.038912ms | 10.2.3.4 | POST "/api/show"
Feb 27 10:41:13 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: time=2025-02-27T10:41:13.636+08:00 level=INFO source=sched.go:508 msg="updated VRAM based on existing loaded models" gpu=GPU-6bccd7ac-afa6-0d26-8870-b01094a16c7d library=cuda total="15.9 GiB" available="14.0 GiB"
Feb 27 10:41:13 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: time=2025-02-27T10:41:13.636+08:00 level=WARN source=ggml.go:132 msg="key not found" key=qwen2.attention.key_length default=128
Feb 27 10:41:13 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: time=2025-02-27T10:41:13.636+08:00 level=WARN source=ggml.go:132 msg="key not found" key=qwen2.attention.value_length default=128
Feb 27 10:41:13 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: time=2025-02-27T10:41:13.637+08:00 level=INFO source=sched.go:731 msg="new model will fit in available VRAM, loading" model=/data/ollama/models/blobs/sha256-6e9f90f02bb3b39b59e81916e8cfce9deb45aeaeb9a54a5be4414486b907dc1e library=cuda parallel=4 required="10.8 GiB"
Feb 27 10:41:13 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: time=2025-02-27T10:41:13.767+08:00 level=INFO source=server.go:97 msg="system memory" total="58.8 GiB" free="57.4 GiB" free_swap="0 B"
Feb 27 10:41:13 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: time=2025-02-27T10:41:13.768+08:00 level=WARN source=ggml.go:132 msg="key not found" key=qwen2.attention.key_length default=128
Feb 27 10:41:13 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: time=2025-02-27T10:41:13.768+08:00 level=WARN source=ggml.go:132 msg="key not found" key=qwen2.attention.value_length default=128
Feb 27 10:41:13 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: time=2025-02-27T10:41:13.768+08:00 level=INFO source=server.go:130 msg=offload library=cuda layers.requested=-1 layers.model=49 layers.offload=49 layers.split="" memory.available="[14.0 GiB]" memory.gpu_overhead="0 B" memory.required.full="10.8 GiB" memory.required.partial="10.8 GiB" memory.required.kv="1.5 GiB" memory.required.allocations="[10.8 GiB]" memory.weights.total="8.9 GiB" memory.weights.repeating="8.3 GiB" memory.weights.nonrepeating="609.1 MiB" memory.graph.full="676.0 MiB" memory.graph.partial="916.1 MiB"
Feb 27 10:41:13 iZbp1d0j83sijy1mh3b3s9Z ollama[9355]: time=2025-02-27T10:41:13.769+08:00 level=INFO source=server.go:380 msg="starting llama server" cmd="/usr/local/bin/ollama runner --model /data/ollama/models/blobs/sha256-6e9f90f02bb3b39b59e81916e8cfce9deb45aeaeb9a54a5be4414486b907dc1e --ctx-size 8192 --batch-size 512 --n-gpu-layers 49 --threads 4 --parallel 4 --port 19631"
```
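Note what the runner lines in the journal above actually report: the system info string lists only CPU backends (`CPU : LLAMAFILE = 1 | CPU : LLAMAFILE = 1 | cgo(gcc)`), and `llm_load_tensors` allocates a `CPU_Mapped model buffer`, so even though the scheduler planned a full GPU offload, the weights ended up in system RAM, which matches the 0% GPU-Util in `nvidia-smi`. A minimal triage sketch for spotting this in a saved journal excerpt (the file name `ollama.log` and the sample lines are stand-ins, and the `CUDA0` buffer name is what a CUDA-backed load typically prints, not something that appears in this log):

```shell
# Stand-in for a saved journal excerpt (replace with e.g. `journalctl -u ollama > ollama.log`).
cat > ollama.log <<'EOF'
msg=system info="CPU : LLAMAFILE = 1 | CPU : LLAMAFILE = 1 | cgo(gcc)" threads=4
llm_load_tensors: CPU_Mapped model buffer size =  1059.89 MiB
EOF

# A CPU-only load shows "CPU_Mapped" buffers; a load using the CUDA backend
# would typically report a CUDA buffer (e.g. "CUDA0") here instead.
grep -c "CPU_Mapped model buffer size" ollama.log   # → 1
grep -q "CUDA0" ollama.log || echo "no CUDA buffers: inference is running on CPU"
```

In other words, `ollama ps` reporting `100% GPU` reflects the scheduler's plan, while the buffer lines reveal which backend the runner actually used.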

@moluzhui commented on GitHub (Feb 27, 2025):

The log output after setting the debug environment variable `OLLAMA_DEBUG=1` is as follows:

Feb 27 11:23:26 iZbp1d0j83sijy1mh3b3s9Z systemd[1]: Stopped Ollama Service.
Feb 27 11:23:26 iZbp1d0j83sijy1mh3b3s9Z systemd[1]: Started Ollama Service.
Feb 27 11:23:26 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: 2025/02/27 11:23:26 routes.go:1205: INFO server config env="map[CUDA_VISIBLE_DEVICES:0 GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_DEBUG:true OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://10.2.3.4:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY:cuda_v11 OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/data/ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://*] OLLAMA_SCHED_SPREAD:true ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
Feb 27 11:23:26 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:26.201+08:00 level=INFO source=images.go:432 msg="total blobs: 11"
Feb 27 11:23:26 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:26.202+08:00 level=INFO source=images.go:439 msg="total unused blobs removed: 0"
Feb 27 11:23:26 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:26.202+08:00 level=INFO source=routes.go:1256 msg="Listening on 10.2.3.4:11434 (version 0.5.12)"
Feb 27 11:23:26 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:26.202+08:00 level=DEBUG source=sched.go:106 msg="starting llm scheduler"
Feb 27 11:23:26 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:26.202+08:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
Feb 27 11:23:26 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:26.203+08:00 level=DEBUG source=gpu.go:98 msg="searching for GPU discovery libraries for NVIDIA"
Feb 27 11:23:26 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:26.203+08:00 level=DEBUG source=gpu.go:501 msg="Searching for GPU library" name=libcuda.so*
Feb 27 11:23:26 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:26.203+08:00 level=DEBUG source=gpu.go:525 msg="gpu library search" globs="[/usr/local/bin/libcuda.so* /libcuda.so* /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]"
Feb 27 11:23:26 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:26.204+08:00 level=DEBUG source=gpu.go:558 msg="discovered GPU libraries" paths=[/usr/lib64/libcuda.so.470.256.02]
Feb 27 11:23:26 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: initializing /usr/lib64/libcuda.so.470.256.02
Feb 27 11:23:26 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: dlsym: cuInit - 0x7feca2b11730
Feb 27 11:23:26 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: dlsym: cuDriverGetVersion - 0x7feca2b11700
Feb 27 11:23:26 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: dlsym: cuDeviceGetCount - 0x7feca2b116a0
Feb 27 11:23:26 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: dlsym: cuDeviceGet - 0x7feca2b116d0
Feb 27 11:23:26 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: dlsym: cuDeviceGetAttribute - 0x7feca2b11550
Feb 27 11:23:26 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: dlsym: cuDeviceGetUuid - 0x7feca2b11640
Feb 27 11:23:26 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: dlsym: cuDeviceGetName - 0x7feca2b11670
Feb 27 11:23:26 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: dlsym: cuCtxCreate_v3 - 0x7feca2b11280
Feb 27 11:23:26 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: dlsym: cuMemGetInfo_v2 - 0x7feca2b10bf0
Feb 27 11:23:26 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: dlsym: cuCtxDestroy - 0x7feca2b35860
Feb 27 11:23:26 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: calling cuInit
Feb 27 11:23:26 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: calling cuDriverGetVersion
Feb 27 11:23:26 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: raw version 0x2b20
Feb 27 11:23:26 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: CUDA driver version: 11.4
Feb 27 11:23:26 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: calling cuDeviceGetCount
Feb 27 11:23:26 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: device count 1
Feb 27 11:23:26 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:26.223+08:00 level=DEBUG source=gpu.go:125 msg="detected GPUs" count=1 library=/usr/lib64/libcuda.so.470.256.02
Feb 27 11:23:26 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: [GPU-6bccd7ac-afa6-0d26-8870-b01094a16c7d] CUDA totalMem 16280 mb
Feb 27 11:23:26 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: [GPU-6bccd7ac-afa6-0d26-8870-b01094a16c7d] CUDA freeMem 16025 mb
Feb 27 11:23:26 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: [GPU-6bccd7ac-afa6-0d26-8870-b01094a16c7d] Compute Capability 6.0
Feb 27 11:23:26 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:26.365+08:00 level=DEBUG source=amd_linux.go:419 msg="amdgpu driver not detected /sys/module/amdgpu"
Feb 27 11:23:26 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: releasing cuda driver library
Feb 27 11:23:26 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:26.365+08:00 level=INFO source=types.go:130 msg="inference compute" id=GPU-6bccd7ac-afa6-0d26-8870-b01094a16c7d library=cuda variant=v11 compute=6.0 driver=11.4 name="Tesla P100-PCIE-16GB" total="15.9 GiB" available="15.7 GiB"
Feb 27 11:23:57 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: [GIN] 2025/02/27 - 11:23:57 | 200 |      95.974µs |    10.2.3.4 | HEAD     "/"
Feb 27 11:23:57 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: [GIN] 2025/02/27 - 11:23:57 | 200 |   22.745396ms |    10.2.3.4 | POST     "/api/show"
Feb 27 11:23:57 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:57.994+08:00 level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="58.8 GiB" before.free="57.8 GiB" before.free_swap="0 B" now.total="58.8 GiB" now.free="57.7 GiB" now.free_swap="0 B"
Feb 27 11:23:57 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: initializing /usr/lib64/libcuda.so.470.256.02
Feb 27 11:23:57 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: dlsym: cuInit - 0x7feca2b11730
Feb 27 11:23:57 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: dlsym: cuDriverGetVersion - 0x7feca2b11700
Feb 27 11:23:57 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: dlsym: cuDeviceGetCount - 0x7feca2b116a0
Feb 27 11:23:57 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: dlsym: cuDeviceGet - 0x7feca2b116d0
Feb 27 11:23:57 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: dlsym: cuDeviceGetAttribute - 0x7feca2b11550
Feb 27 11:23:57 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: dlsym: cuDeviceGetUuid - 0x7feca2b11640
Feb 27 11:23:57 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: dlsym: cuDeviceGetName - 0x7feca2b11670
Feb 27 11:23:57 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: dlsym: cuCtxCreate_v3 - 0x7feca2b11280
Feb 27 11:23:57 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: dlsym: cuMemGetInfo_v2 - 0x7feca2b10bf0
Feb 27 11:23:57 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: dlsym: cuCtxDestroy - 0x7feca2b35860
Feb 27 11:23:57 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: calling cuInit
Feb 27 11:23:57 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: calling cuDriverGetVersion
Feb 27 11:23:57 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: raw version 0x2b20
Feb 27 11:23:57 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: CUDA driver version: 11.4
Feb 27 11:23:57 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: calling cuDeviceGetCount
Feb 27 11:23:57 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: device count 1
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:58.131+08:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-6bccd7ac-afa6-0d26-8870-b01094a16c7d name="Tesla P100-PCIE-16GB" overhead="0 B" before.total="15.9 GiB" before.free="15.7 GiB" now.total="15.9 GiB" now.free="15.7 GiB" now.used="255.1 MiB"
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: releasing cuda driver library
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:58.131+08:00 level=DEBUG source=sched.go:182 msg="updating default concurrency" OLLAMA_MAX_LOADED_MODELS=3 gpu_count=1
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:58.174+08:00 level=DEBUG source=sched.go:225 msg="loading first model" model=/data/ollama/models/blobs/sha256-aabd4debf0c8f08881923f2c25fc0fdeed24435271c2b3e92c4af36704040dbc
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:58.174+08:00 level=DEBUG source=memory.go:108 msg=evaluating library=cuda gpu_count=1 available="[15.7 GiB]"
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:58.175+08:00 level=WARN source=ggml.go:132 msg="key not found" key=qwen2.attention.key_length default=128
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:58.175+08:00 level=WARN source=ggml.go:132 msg="key not found" key=qwen2.attention.value_length default=128
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:58.175+08:00 level=INFO source=sched.go:731 msg="new model will fit in available VRAM, loading" model=/data/ollama/models/blobs/sha256-aabd4debf0c8f08881923f2c25fc0fdeed24435271c2b3e92c4af36704040dbc library=cuda parallel=4 required="1.9 GiB"
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:58.175+08:00 level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="58.8 GiB" before.free="57.7 GiB" before.free_swap="0 B" now.total="58.8 GiB" now.free="57.7 GiB" now.free_swap="0 B"
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: initializing /usr/lib64/libcuda.so.470.256.02
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: dlsym: cuInit - 0x7feca2b11730
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: dlsym: cuDriverGetVersion - 0x7feca2b11700
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: dlsym: cuDeviceGetCount - 0x7feca2b116a0
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: dlsym: cuDeviceGet - 0x7feca2b116d0
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: dlsym: cuDeviceGetAttribute - 0x7feca2b11550
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: dlsym: cuDeviceGetUuid - 0x7feca2b11640
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: dlsym: cuDeviceGetName - 0x7feca2b11670
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: dlsym: cuCtxCreate_v3 - 0x7feca2b11280
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: dlsym: cuMemGetInfo_v2 - 0x7feca2b10bf0
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: dlsym: cuCtxDestroy - 0x7feca2b35860
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: calling cuInit
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: calling cuDriverGetVersion
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: raw version 0x2b20
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: CUDA driver version: 11.4
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: calling cuDeviceGetCount
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: device count 1
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:58.306+08:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-6bccd7ac-afa6-0d26-8870-b01094a16c7d name="Tesla P100-PCIE-16GB" overhead="0 B" before.total="15.9 GiB" before.free="15.7 GiB" now.total="15.9 GiB" now.free="15.7 GiB" now.used="255.1 MiB"
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: releasing cuda driver library
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:58.306+08:00 level=INFO source=server.go:97 msg="system memory" total="58.8 GiB" free="57.7 GiB" free_swap="0 B"
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:58.306+08:00 level=DEBUG source=memory.go:108 msg=evaluating library=cuda gpu_count=1 available="[15.7 GiB]"
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:58.306+08:00 level=WARN source=ggml.go:132 msg="key not found" key=qwen2.attention.key_length default=128
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:58.306+08:00 level=WARN source=ggml.go:132 msg="key not found" key=qwen2.attention.value_length default=128
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:58.306+08:00 level=INFO source=server.go:130 msg=offload library=cuda layers.requested=-1 layers.model=29 layers.offload=29 layers.split="" memory.available="[15.7 GiB]" memory.gpu_overhead="0 B" memory.required.full="1.9 GiB" memory.required.partial="1.9 GiB" memory.required.kv="224.0 MiB" memory.required.allocations="[1.9 GiB]" memory.weights.total="976.1 MiB" memory.weights.repeating="793.5 MiB" memory.weights.nonrepeating="182.6 MiB" memory.graph.full="299.8 MiB" memory.graph.partial="482.3 MiB"
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:58.306+08:00 level=DEBUG source=server.go:259 msg="compatible gpu libraries" compatible=[]
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:58.307+08:00 level=INFO source=server.go:380 msg="starting llama server" cmd="/usr/local/bin/ollama runner --model /data/ollama/models/blobs/sha256-aabd4debf0c8f08881923f2c25fc0fdeed24435271c2b3e92c4af36704040dbc --ctx-size 8192 --batch-size 512 --n-gpu-layers 29 --verbose --threads 4 --parallel 4 --port 5188"
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:58.307+08:00 level=DEBUG source=server.go:398 msg=subprocess environment="[PATH=$PATH CUDA_VISIBLE_DEVICES=GPU-6bccd7ac-afa6-0d26-8870-b01094a16c7d LD_LIBRARY_PATH=/usr/local/bin]"
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:58.308+08:00 level=INFO source=sched.go:450 msg="loaded runners" count=1
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:58.308+08:00 level=INFO source=server.go:557 msg="waiting for llama runner to start responding"
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:58.308+08:00 level=INFO source=server.go:591 msg="waiting for server to become available" status="llm server error"
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:58.327+08:00 level=INFO source=runner.go:932 msg="starting go runner"
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:58.327+08:00 level=DEBUG source=ggml.go:89 msg="ggml backend load all from path" path=/usr/local/bin
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:58.328+08:00 level=INFO source=runner.go:935 msg=system info="CPU : LLAMAFILE = 1 | CPU : LLAMAFILE = 1 | cgo(gcc)" threads=4
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:58.328+08:00 level=INFO source=runner.go:993 msg="Server listening on 127.0.0.1:5188"
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: loaded meta data with 26 key-value pairs and 339 tensors from /data/ollama/models/blobs/sha256-aabd4debf0c8f08881923f2c25fc0fdeed24435271c2b3e92c4af36704040dbc (version GGUF V3 (latest))
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv   0:                       general.architecture str              = qwen2
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv   1:                               general.type str              = model
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv   2:                               general.name str              = DeepSeek R1 Distill Qwen 1.5B
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv   3:                           general.basename str              = DeepSeek-R1-Distill-Qwen
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv   4:                         general.size_label str              = 1.5B
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv   5:                          qwen2.block_count u32              = 28
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv   6:                       qwen2.context_length u32              = 131072
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv   7:                     qwen2.embedding_length u32              = 1536
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv   8:                  qwen2.feed_forward_length u32              = 8960
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv   9:                 qwen2.attention.head_count u32              = 12
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv  10:              qwen2.attention.head_count_kv u32              = 2
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv  11:                       qwen2.rope.freq_base f32              = 10000.000000
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv  12:     qwen2.attention.layer_norm_rms_epsilon f32              = 0.000001
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv  13:                          general.file_type u32              = 15
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv  14:                       tokenizer.ggml.model str              = gpt2
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv  15:                         tokenizer.ggml.pre str              = qwen2
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv  16:                      tokenizer.ggml.tokens arr[str,151936]  = ["!", "\"", "#", "$", "%", "&", "'", ...
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv  17:                  tokenizer.ggml.token_type arr[i32,151936]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:58.559+08:00 level=INFO source=server.go:591 msg="waiting for server to become available" status="llm server loading model"
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv  18:                      tokenizer.ggml.merges arr[str,151387]  = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv  19:                tokenizer.ggml.bos_token_id u32              = 151646
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv  20:                tokenizer.ggml.eos_token_id u32              = 151643
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv  21:            tokenizer.ggml.padding_token_id u32              = 151643
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv  22:               tokenizer.ggml.add_bos_token bool             = true
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv  23:               tokenizer.ggml.add_eos_token bool             = false
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv  24:                    tokenizer.chat_template str              = {% if not add_generation_prompt is de...
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv  25:               general.quantization_version u32              = 2
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - type  f32:  141 tensors
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - type q4_K:  169 tensors
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - type q6_K:   29 tensors
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_vocab: control token: 151661 '<|fim_suffix|>' is not marked as EOG
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_vocab: control token: 151654 '<|vision_pad|>' is not marked as EOG
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_vocab: control token: 151653 '<|vision_end|>' is not marked as EOG
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_vocab: control token: 151652 '<|vision_start|>' is not marked as EOG
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_vocab: control token: 151651 '<|quad_end|>' is not marked as EOG
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_vocab: control token: 151646 '<|begin▁of▁sentence|>' is not marked as EOG
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_vocab: control token: 151644 '<|User|>' is not marked as EOG
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_vocab: control token: 151655 '<|image_pad|>' is not marked as EOG
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_vocab: control token: 151643 '<|end▁of▁sentence|>' is not marked as EOG
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_vocab: control token: 151650 '<|quad_start|>' is not marked as EOG
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_vocab: control token: 151645 '<|Assistant|>' is not marked as EOG
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_vocab: control token: 151647 '<|EOT|>' is not marked as EOG
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_vocab: control token: 151660 '<|fim_middle|>' is not marked as EOG
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_vocab: control token: 151656 '<|video_pad|>' is not marked as EOG
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_vocab: control token: 151659 '<|fim_prefix|>' is not marked as EOG
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_vocab: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_vocab: special tokens cache size = 22
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_vocab: token to piece cache size = 0.9310 MB
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: format           = GGUF V3 (latest)
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: arch             = qwen2
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: vocab type       = BPE
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: n_vocab          = 151936
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: n_merges         = 151387
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: vocab_only       = 0
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: n_ctx_train      = 131072
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: n_embd           = 1536
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: n_layer          = 28
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: n_head           = 12
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: n_head_kv        = 2
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: n_rot            = 128
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: n_swa            = 0
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: n_embd_head_k    = 128
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: n_embd_head_v    = 128
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: n_gqa            = 6
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: n_embd_k_gqa     = 256
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: n_embd_v_gqa     = 256
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: f_norm_eps       = 0.0e+00
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: f_norm_rms_eps   = 1.0e-06
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: f_clamp_kqv      = 0.0e+00
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: f_max_alibi_bias = 0.0e+00
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: f_logit_scale    = 0.0e+00
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: n_ff             = 8960
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: n_expert         = 0
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: n_expert_used    = 0
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: causal attn      = 1
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: pooling type     = 0
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: rope type        = 2
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: rope scaling     = linear
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: freq_base_train  = 10000.0
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: freq_scale_train = 1
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: n_ctx_orig_yarn  = 131072
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: rope_finetuned   = unknown
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: ssm_d_conv       = 0
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: ssm_d_inner      = 0
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: ssm_d_state      = 0
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: ssm_dt_rank      = 0
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: ssm_dt_b_c_rms   = 0
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: model type       = 1.5B
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: model ftype      = Q4_K - Medium
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: model params     = 1.78 B
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: model size       = 1.04 GiB (5.00 BPW)
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: general.name     = DeepSeek R1 Distill Qwen 1.5B
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: BOS token        = 151646 '<|begin▁of▁sentence|>'
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: EOS token        = 151643 '<|end▁of▁sentence|>'
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: EOT token        = 151643 '<|end▁of▁sentence|>'
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: PAD token        = 151643 '<|end▁of▁sentence|>'
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: LF token         = 148848 'ÄĬ'
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: FIM PRE token    = 151659 '<|fim_prefix|>'
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: FIM SUF token    = 151661 '<|fim_suffix|>'
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: FIM MID token    = 151660 '<|fim_middle|>'
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: FIM PAD token    = 151662 '<|fim_pad|>'
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: FIM REP token    = 151663 '<|repo_name|>'
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: FIM SEP token    = 151664 '<|file_sep|>'
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: EOG token        = 151643 '<|end▁of▁sentence|>'
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: EOG token        = 151662 '<|fim_pad|>'
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: EOG token        = 151663 '<|repo_name|>'
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: EOG token        = 151664 '<|file_sep|>'
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: max token length = 256
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_tensors:   CPU_Mapped model buffer size =  1059.89 MiB
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_new_context_with_model: n_seq_max     = 4
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_new_context_with_model: n_ctx         = 8192
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_new_context_with_model: n_ctx_per_seq = 2048
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_new_context_with_model: n_batch       = 2048
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_new_context_with_model: n_ubatch      = 512
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_new_context_with_model: flash_attn    = 0
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_new_context_with_model: freq_base     = 10000.0
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_new_context_with_model: freq_scale    = 1
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_new_context_with_model: n_ctx_per_seq (2048) < n_ctx_train (131072) -- the full capacity of the model will not be utilized
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_kv_cache_init: kv_size = 8192, offload = 1, type_k = 'f16', type_v = 'f16', n_layer = 28, can_shift = 1
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_kv_cache_init: layer 0: n_embd_k_gqa = 256, n_embd_v_gqa = 256
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_kv_cache_init: layer 1: n_embd_k_gqa = 256, n_embd_v_gqa = 256
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_kv_cache_init: layer 2: n_embd_k_gqa = 256, n_embd_v_gqa = 256
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_kv_cache_init: layer 3: n_embd_k_gqa = 256, n_embd_v_gqa = 256
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_kv_cache_init: layer 4: n_embd_k_gqa = 256, n_embd_v_gqa = 256
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_kv_cache_init: layer 5: n_embd_k_gqa = 256, n_embd_v_gqa = 256
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_kv_cache_init: layer 6: n_embd_k_gqa = 256, n_embd_v_gqa = 256
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_kv_cache_init: layer 7: n_embd_k_gqa = 256, n_embd_v_gqa = 256
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_kv_cache_init: layer 8: n_embd_k_gqa = 256, n_embd_v_gqa = 256
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_kv_cache_init: layer 9: n_embd_k_gqa = 256, n_embd_v_gqa = 256
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_kv_cache_init: layer 10: n_embd_k_gqa = 256, n_embd_v_gqa = 256
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_kv_cache_init: layer 11: n_embd_k_gqa = 256, n_embd_v_gqa = 256
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_kv_cache_init: layer 12: n_embd_k_gqa = 256, n_embd_v_gqa = 256
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_kv_cache_init: layer 13: n_embd_k_gqa = 256, n_embd_v_gqa = 256
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_kv_cache_init: layer 14: n_embd_k_gqa = 256, n_embd_v_gqa = 256
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_kv_cache_init: layer 15: n_embd_k_gqa = 256, n_embd_v_gqa = 256
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_kv_cache_init: layer 16: n_embd_k_gqa = 256, n_embd_v_gqa = 256
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_kv_cache_init: layer 17: n_embd_k_gqa = 256, n_embd_v_gqa = 256
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_kv_cache_init: layer 18: n_embd_k_gqa = 256, n_embd_v_gqa = 256
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_kv_cache_init: layer 19: n_embd_k_gqa = 256, n_embd_v_gqa = 256
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_kv_cache_init: layer 20: n_embd_k_gqa = 256, n_embd_v_gqa = 256
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_kv_cache_init: layer 21: n_embd_k_gqa = 256, n_embd_v_gqa = 256
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_kv_cache_init: layer 22: n_embd_k_gqa = 256, n_embd_v_gqa = 256
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_kv_cache_init: layer 23: n_embd_k_gqa = 256, n_embd_v_gqa = 256
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_kv_cache_init: layer 24: n_embd_k_gqa = 256, n_embd_v_gqa = 256
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_kv_cache_init: layer 25: n_embd_k_gqa = 256, n_embd_v_gqa = 256
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_kv_cache_init: layer 26: n_embd_k_gqa = 256, n_embd_v_gqa = 256
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_kv_cache_init: layer 27: n_embd_k_gqa = 256, n_embd_v_gqa = 256
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_kv_cache_init:        CPU KV buffer size =   224.00 MiB
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_new_context_with_model: KV self size  =  224.00 MiB, K (f16):  112.00 MiB, V (f16):  112.00 MiB
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_new_context_with_model:        CPU  output buffer size =     2.34 MiB
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_new_context_with_model:        CPU compute buffer size =   302.75 MiB
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_new_context_with_model: graph nodes  = 986
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_new_context_with_model: graph splits = 1
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:59.563+08:00 level=INFO source=server.go:596 msg="llama runner started in 1.26 seconds"
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:59.563+08:00 level=DEBUG source=sched.go:463 msg="finished setting up runner" model=/data/ollama/models/blobs/sha256-aabd4debf0c8f08881923f2c25fc0fdeed24435271c2b3e92c4af36704040dbc
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:59.563+08:00 level=DEBUG source=routes.go:289 msg="generate request" images=0 prompt="<|User|>how to restart redis cluster<|Assistant|>"
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:59.567+08:00 level=DEBUG source=cache.go:104 msg="loading cache slot" id=0 cache=0 prompt=8 used=0 remaining=8
Feb 27 11:28:55 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:28:55.887+08:00 level=DEBUG source=server.go:968 msg="new runner detected, loading model for cgo tokenization"
Feb 27 11:28:55 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: loaded meta data with 26 key-value pairs and 339 tensors from /data/ollama/models/blobs/sha256-aabd4debf0c8f08881923f2c25fc0fdeed24435271c2b3e92c4af36704040dbc (version GGUF V3 (latest))
Feb 27 11:28:55 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
Feb 27 11:28:55 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv   0:                       general.architecture str              = qwen2
Feb 27 11:28:55 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv   1:                               general.type str              = model
Feb 27 11:28:55 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv   2:                               general.name str              = DeepSeek R1 Distill Qwen 1.5B
Feb 27 11:28:55 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv   3:                           general.basename str              = DeepSeek-R1-Distill-Qwen
Feb 27 11:28:55 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv   4:                         general.size_label str              = 1.5B
Feb 27 11:28:55 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv   5:                          qwen2.block_count u32              = 28
Feb 27 11:28:55 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv   6:                       qwen2.context_length u32              = 131072
Feb 27 11:28:55 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv   7:                     qwen2.embedding_length u32              = 1536
Feb 27 11:28:55 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv   8:                  qwen2.feed_forward_length u32              = 8960
Feb 27 11:28:55 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv   9:                 qwen2.attention.head_count u32              = 12
Feb 27 11:28:55 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv  10:              qwen2.attention.head_count_kv u32              = 2
Feb 27 11:28:55 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv  11:                       qwen2.rope.freq_base f32              = 10000.000000
Feb 27 11:28:55 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv  12:     qwen2.attention.layer_norm_rms_epsilon f32              = 0.000001
Feb 27 11:28:55 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv  13:                          general.file_type u32              = 15
Feb 27 11:28:55 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv  14:                       tokenizer.ggml.model str              = gpt2
Feb 27 11:28:55 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv  15:                         tokenizer.ggml.pre str              = qwen2
Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv  16:                      tokenizer.ggml.tokens arr[str,151936]  = ["!", "\"", "#", "$", "%", "&", "'", ...
Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv  17:                  tokenizer.ggml.token_type arr[i32,151936]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv  18:                      tokenizer.ggml.merges arr[str,151387]  = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv  19:                tokenizer.ggml.bos_token_id u32              = 151646
Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv  20:                tokenizer.ggml.eos_token_id u32              = 151643
Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv  21:            tokenizer.ggml.padding_token_id u32              = 151643
Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv  22:               tokenizer.ggml.add_bos_token bool             = true
Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv  23:               tokenizer.ggml.add_eos_token bool             = false
Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv  24:                    tokenizer.chat_template str              = {% if not add_generation_prompt is de...
Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv  25:               general.quantization_version u32              = 2
Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - type  f32:  141 tensors
Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - type q4_K:  169 tensors
Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - type q6_K:   29 tensors
Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_vocab: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_vocab: special tokens cache size = 22
Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_vocab: token to piece cache size = 0.9310 MB
Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: format           = GGUF V3 (latest)
Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: arch             = qwen2
Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: vocab type       = BPE
Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: n_vocab          = 151936
Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: n_merges         = 151387
Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: vocab_only       = 1
Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: model type       = ?B
Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: model ftype      = all F32
Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: model params     = 1.78 B
Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: model size       = 1.04 GiB (5.00 BPW)
Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: general.name     = DeepSeek R1 Distill Qwen 1.5B
Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: BOS token        = 151646 '<|begin▁of▁sentence|>'
Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: EOS token        = 151643 '<|end▁of▁sentence|>'
Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: EOT token        = 151643 '<|end▁of▁sentence|>'
Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: PAD token        = 151643 '<|end▁of▁sentence|>'
Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: LF token         = 148848 'ÄĬ'
Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: FIM PRE token    = 151659 '<|fim_prefix|>'
Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: FIM SUF token    = 151661 '<|fim_suffix|>'
Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: FIM MID token    = 151660 '<|fim_middle|>'
Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: FIM PAD token    = 151662 '<|fim_pad|>'
Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: FIM REP token    = 151663 '<|repo_name|>'
Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: FIM SEP token    = 151664 '<|file_sep|>'
Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: EOG token        = 151643 '<|end▁of▁sentence|>'
Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: EOG token        = 151662 '<|fim_pad|>'
Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: EOG token        = 151663 '<|repo_name|>'
Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: EOG token        = 151664 '<|file_sep|>'
Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: max token length = 256
Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_load: vocab only - skipping tensors
Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: [GIN] 2025/02/27 - 11:28:56 | 200 |         4m58s |    10.2.3.4 | POST     "/api/generate"
Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:28:56.855+08:00 level=DEBUG source=sched.go:467 msg="context for request finished"
Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:28:56.855+08:00 level=DEBUG source=sched.go:340 msg="runner with non-zero duration has gone idle, adding timer" modelPath=/data/ollama/models/blobs/sha256-aabd4debf0c8f08881923f2c25fc0fdeed24435271c2b3e92c4af36704040dbc duration=5m0s
Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:28:56.855+08:00 level=DEBUG source=sched.go:358 msg="after processing request finished event" modelPath=/data/ollama/models/blobs/sha256-aabd4debf0c8f08881923f2c25fc0fdeed24435271c2b3e92c4af36704040dbc refCount=0
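The `CPU KV buffer size = 224.00 MiB` figure in the log above follows directly from the model geometry it prints (28 layers, n_embd_k_gqa = n_embd_v_gqa = 256, f16 cache) and the 8192-token context the server was started with. A quick sketch of the arithmetic (the helper name is my own, not part of Ollama or llama.cpp):

```python
def kv_cache_bytes(n_ctx: int, n_layer: int, n_embd_k_gqa: int,
                   n_embd_v_gqa: int, bytes_per_elem: int = 2) -> tuple[int, int]:
    """Size in bytes of the K and V caches (f16 -> 2 bytes per element)."""
    k = n_ctx * n_layer * n_embd_k_gqa * bytes_per_elem
    v = n_ctx * n_layer * n_embd_v_gqa * bytes_per_elem
    return k, v

# Values from the log: ctx 8192, 28 layers, GQA KV width 256, f16 cache.
k, v = kv_cache_bytes(8192, 28, 256, 256)
print(k / 2**20, v / 2**20, (k + v) / 2**20)  # 112.0 112.0 224.0 (MiB)
```

This matches the `K (f16): 112.00 MiB, V (f16): 112.00 MiB` split reported by `llama_new_context_with_model`. Note the buffer is labeled *CPU* KV buffer, consistent with the runner not using the GPU.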

systemd setting

# cat /etc/systemd/system/ollama.service
[Unit]
Description=Ollama Service
After=network-online.target

[Service]
ExecStart=/usr/local/bin/ollama serve
User=root
Group=root
Restart=always
RestartSec=3
Environment="PATH=$PATH"
Environment="OLLAMA_MODELS=/data/ollama/models"
Environment="OLLAMA_HOST=10.2.3.4:11434"
Environment="OLLAMA_SCHED_SPREAD=1"
Environment="CUDA_VISIBLE_DEVICES=0"
Environment="OLLAMA_LLM_LIBRARY=cuda_v11"
Environment="OLLAMA_DEBUG=1"

[Install]
WantedBy=default.target
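After editing the unit file's `Environment=` lines, `systemctl daemon-reload` followed by `systemctl restart ollama` is needed before the new values take effect; the resulting debug log can then be checked for whether a GPU runner was actually selected. A small, hypothetical Python helper (the function name and heuristics are my own, not part of Ollama) that scans captured journal text for the telltale marker lines, copied verbatim (abridged) from the log in the comment below:

```python
import re

def diagnose_gpu_usage(journal_text: str) -> list[str]:
    """Heuristic triage of Ollama debug output (hypothetical helper)."""
    findings = []
    # GPU discovery succeeded at the scheduler level.
    if re.search(r'msg="detected GPUs" count=[1-9]', journal_text):
        findings.append("discovery: GPU(s) detected")
    # No GPU runner library was found on disk, so a CPU runner is used
    # even though the scheduler reports layers as "offloaded".
    if 'msg="compatible gpu libraries" compatible=[]' in journal_text:
        findings.append("no compatible GPU runner library -> CPU runner")
    # The runner's backend list confirms it: CPU only, no CUDA backend.
    m = re.search(r'msg=system info="([^"]*)"', journal_text)
    if m and "CUDA" not in m.group(1):
        findings.append("runner backend list has no CUDA -> inference on CPU")
    return findings

sample = "\n".join([
    'time=2025-02-27T11:23:26.223+08:00 level=DEBUG source=gpu.go:125 msg="detected GPUs" count=1',
    'time=2025-02-27T11:23:58.306+08:00 level=DEBUG source=server.go:259 msg="compatible gpu libraries" compatible=[]',
    'time=2025-02-27T11:23:58.328+08:00 level=INFO source=runner.go:935 msg=system info="CPU : LLAMAFILE = 1 | CPU : LLAMAFILE = 1 | cgo(gcc)" threads=4',
])
for finding in diagnose_gpu_usage(sample):
    print(finding)
```

All three markers appear in the debug log below, which is why `ollama ps` can report `100% GPU` (the scheduler's plan) while `nvidia-smi` shows 0% utilization (the actual runner is CPU-only).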
<!-- gh-comment-id:2686764201 -->
@moluzhui commented on GitHub (Feb 27, 2025):

The log output after setting the debug environment variable `OLLAMA_DEBUG=1` is as follows:

```bash
Feb 27 11:23:26 iZbp1d0j83sijy1mh3b3s9Z systemd[1]: Stopped Ollama Service.
Feb 27 11:23:26 iZbp1d0j83sijy1mh3b3s9Z systemd[1]: Started Ollama Service.
Feb 27 11:23:26 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: 2025/02/27 11:23:26 routes.go:1205: INFO server config env="map[CUDA_VISIBLE_DEVICES:0 GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_DEBUG:true OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://10.2.3.4:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY:cuda_v11 OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/data/ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://*] OLLAMA_SCHED_SPREAD:true ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
Feb 27 11:23:26 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:26.201+08:00 level=INFO source=images.go:432 msg="total blobs: 11"
Feb 27 11:23:26 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:26.202+08:00 level=INFO source=images.go:439 msg="total unused blobs removed: 0"
Feb 27 11:23:26 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:26.202+08:00 level=INFO source=routes.go:1256 msg="Listening on 10.2.3.4:11434 (version 0.5.12)"
Feb 27 11:23:26 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:26.202+08:00 level=DEBUG source=sched.go:106 msg="starting llm scheduler"
Feb 27 11:23:26 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:26.202+08:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
Feb 27 11:23:26 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:26.203+08:00 level=DEBUG source=gpu.go:98 msg="searching for GPU discovery libraries for NVIDIA"
Feb 27 11:23:26 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:26.203+08:00 level=DEBUG source=gpu.go:501 msg="Searching for GPU library" name=libcuda.so*
Feb 27 11:23:26 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:26.203+08:00 level=DEBUG source=gpu.go:525 msg="gpu library search" globs="[/usr/local/bin/libcuda.so* /libcuda.so* /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]"
Feb 27 11:23:26 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:26.204+08:00 level=DEBUG source=gpu.go:558 msg="discovered GPU libraries" paths=[/usr/lib64/libcuda.so.470.256.02]
Feb 27 11:23:26 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: initializing /usr/lib64/libcuda.so.470.256.02
Feb 27 11:23:26 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: dlsym: cuInit - 0x7feca2b11730
Feb 27 11:23:26 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: dlsym: cuDriverGetVersion - 0x7feca2b11700
Feb 27 11:23:26 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: dlsym: cuDeviceGetCount - 0x7feca2b116a0
Feb 27 11:23:26 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: dlsym: cuDeviceGet - 0x7feca2b116d0
Feb 27 11:23:26 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: dlsym: cuDeviceGetAttribute - 0x7feca2b11550
Feb 27 11:23:26 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: dlsym: cuDeviceGetUuid - 0x7feca2b11640
Feb 27 11:23:26 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: dlsym: cuDeviceGetName - 0x7feca2b11670
Feb 27 11:23:26 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: dlsym: cuCtxCreate_v3 - 0x7feca2b11280
Feb 27 11:23:26 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: dlsym: cuMemGetInfo_v2 - 0x7feca2b10bf0
Feb 27 11:23:26 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: dlsym: cuCtxDestroy - 0x7feca2b35860
Feb 27 11:23:26 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: calling cuInit
Feb 27 11:23:26 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: calling cuDriverGetVersion
Feb 27 11:23:26 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: raw version 0x2b20
Feb 27 11:23:26 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: CUDA driver version: 11.4
Feb 27 11:23:26 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: calling cuDeviceGetCount
Feb 27 11:23:26 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: device count 1
Feb 27 11:23:26 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:26.223+08:00 level=DEBUG source=gpu.go:125 msg="detected GPUs" count=1 library=/usr/lib64/libcuda.so.470.256.02
Feb 27 11:23:26 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: [GPU-6bccd7ac-afa6-0d26-8870-b01094a16c7d] CUDA totalMem 16280 mb
Feb 27 11:23:26 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: [GPU-6bccd7ac-afa6-0d26-8870-b01094a16c7d] CUDA freeMem 16025 mb
Feb 27 11:23:26 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: [GPU-6bccd7ac-afa6-0d26-8870-b01094a16c7d] Compute Capability 6.0
Feb 27 11:23:26 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:26.365+08:00 level=DEBUG source=amd_linux.go:419 msg="amdgpu driver not detected /sys/module/amdgpu"
Feb 27 11:23:26 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: releasing cuda driver library
Feb 27 11:23:26 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:26.365+08:00 level=INFO source=types.go:130 msg="inference compute" id=GPU-6bccd7ac-afa6-0d26-8870-b01094a16c7d library=cuda variant=v11 compute=6.0 driver=11.4 name="Tesla P100-PCIE-16GB" total="15.9 GiB" available="15.7 GiB"
Feb 27 11:23:57 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: [GIN] 2025/02/27 - 11:23:57 | 200 | 95.974µs | 10.2.3.4 | HEAD "/"
Feb 27 11:23:57 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: [GIN] 2025/02/27 - 11:23:57 | 200 | 22.745396ms | 10.2.3.4 | POST "/api/show"
Feb 27 11:23:57 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:57.994+08:00 level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="58.8 GiB" before.free="57.8 GiB" before.free_swap="0 B" now.total="58.8 GiB" now.free="57.7 GiB" now.free_swap="0 B"
Feb 27 11:23:57 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: initializing /usr/lib64/libcuda.so.470.256.02
Feb 27 11:23:57 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: dlsym: cuInit - 0x7feca2b11730
Feb 27 11:23:57 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: dlsym: cuDriverGetVersion - 0x7feca2b11700
Feb 27 11:23:57 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: dlsym: cuDeviceGetCount - 0x7feca2b116a0
Feb 27 11:23:57 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: dlsym: cuDeviceGet - 0x7feca2b116d0
Feb 27 11:23:57 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: dlsym: cuDeviceGetAttribute - 0x7feca2b11550
Feb 27 11:23:57 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: dlsym: cuDeviceGetUuid - 0x7feca2b11640
Feb 27 11:23:57 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: dlsym: cuDeviceGetName - 0x7feca2b11670
Feb 27 11:23:57 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: dlsym: cuCtxCreate_v3 - 0x7feca2b11280
Feb 27 11:23:57 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: dlsym: cuMemGetInfo_v2 - 0x7feca2b10bf0
Feb 27 11:23:57 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: dlsym: cuCtxDestroy - 0x7feca2b35860
Feb 27 11:23:57 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: calling cuInit
Feb 27 11:23:57 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: calling cuDriverGetVersion
Feb 27 11:23:57 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: raw version 0x2b20
Feb 27 11:23:57 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: CUDA driver version: 11.4
Feb 27 11:23:57 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: calling cuDeviceGetCount
Feb 27 11:23:57 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: device count 1
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:58.131+08:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-6bccd7ac-afa6-0d26-8870-b01094a16c7d name="Tesla P100-PCIE-16GB" overhead="0 B" before.total="15.9 GiB" before.free="15.7 GiB" now.total="15.9 GiB" now.free="15.7 GiB" now.used="255.1 MiB"
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: releasing cuda driver library
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:58.131+08:00 level=DEBUG source=sched.go:182 msg="updating default concurrency" OLLAMA_MAX_LOADED_MODELS=3 gpu_count=1
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:58.174+08:00 level=DEBUG source=sched.go:225 msg="loading first model" model=/data/ollama/models/blobs/sha256-aabd4debf0c8f08881923f2c25fc0fdeed24435271c2b3e92c4af36704040dbc
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:58.174+08:00 level=DEBUG source=memory.go:108 msg=evaluating library=cuda gpu_count=1 available="[15.7 GiB]"
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:58.175+08:00 level=WARN source=ggml.go:132 msg="key not found" key=qwen2.attention.key_length default=128
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:58.175+08:00 level=WARN source=ggml.go:132 msg="key not found" key=qwen2.attention.value_length default=128
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:58.175+08:00 level=INFO source=sched.go:731 msg="new model will fit in available VRAM, loading" model=/data/ollama/models/blobs/sha256-aabd4debf0c8f08881923f2c25fc0fdeed24435271c2b3e92c4af36704040dbc library=cuda parallel=4 required="1.9 GiB"
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:58.175+08:00 level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="58.8 GiB" before.free="57.7 GiB" before.free_swap="0 B" now.total="58.8 GiB" now.free="57.7 GiB" now.free_swap="0 B"
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: initializing /usr/lib64/libcuda.so.470.256.02
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: dlsym: cuInit - 0x7feca2b11730
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: dlsym: cuDriverGetVersion - 0x7feca2b11700
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: dlsym: cuDeviceGetCount - 0x7feca2b116a0
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: dlsym: cuDeviceGet - 0x7feca2b116d0
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: dlsym: cuDeviceGetAttribute - 0x7feca2b11550
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: dlsym: cuDeviceGetUuid - 0x7feca2b11640
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: dlsym: cuDeviceGetName - 0x7feca2b11670
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: dlsym: cuCtxCreate_v3 - 0x7feca2b11280
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: dlsym: cuMemGetInfo_v2 - 0x7feca2b10bf0
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: dlsym: cuCtxDestroy - 0x7feca2b35860
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: calling cuInit
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: calling cuDriverGetVersion
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: raw version 0x2b20
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: CUDA driver version: 11.4
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: calling cuDeviceGetCount
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: device count 1
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:58.306+08:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-6bccd7ac-afa6-0d26-8870-b01094a16c7d name="Tesla P100-PCIE-16GB" overhead="0 B" before.total="15.9 GiB" before.free="15.7 GiB" now.total="15.9 GiB" now.free="15.7 GiB" now.used="255.1 MiB"
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: releasing cuda driver library
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:58.306+08:00 level=INFO source=server.go:97 msg="system memory" total="58.8 GiB" free="57.7 GiB" free_swap="0 B"
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:58.306+08:00 level=DEBUG source=memory.go:108 msg=evaluating library=cuda gpu_count=1 available="[15.7 GiB]"
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:58.306+08:00 level=WARN source=ggml.go:132 msg="key not found" key=qwen2.attention.key_length default=128
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:58.306+08:00 level=WARN source=ggml.go:132 msg="key not found" key=qwen2.attention.value_length default=128
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:58.306+08:00 level=INFO source=server.go:130 msg=offload library=cuda layers.requested=-1 layers.model=29 layers.offload=29 layers.split="" memory.available="[15.7 GiB]" memory.gpu_overhead="0 B" memory.required.full="1.9 GiB" memory.required.partial="1.9 GiB" memory.required.kv="224.0 MiB" memory.required.allocations="[1.9 GiB]" memory.weights.total="976.1 MiB" memory.weights.repeating="793.5 MiB" memory.weights.nonrepeating="182.6 MiB" memory.graph.full="299.8 MiB" memory.graph.partial="482.3 MiB"
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:58.306+08:00 level=DEBUG source=server.go:259 msg="compatible gpu libraries" compatible=[]
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:58.307+08:00 level=INFO source=server.go:380 msg="starting llama server" cmd="/usr/local/bin/ollama runner --model /data/ollama/models/blobs/sha256-aabd4debf0c8f08881923f2c25fc0fdeed24435271c2b3e92c4af36704040dbc --ctx-size 8192 --batch-size 512 --n-gpu-layers 29 --verbose --threads 4 --parallel 4 --port 5188"
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:58.307+08:00 level=DEBUG source=server.go:398 msg=subprocess environment="[PATH=$PATH CUDA_VISIBLE_DEVICES=GPU-6bccd7ac-afa6-0d26-8870-b01094a16c7d LD_LIBRARY_PATH=/usr/local/bin]"
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:58.308+08:00 level=INFO source=sched.go:450 msg="loaded runners" count=1
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:58.308+08:00 level=INFO source=server.go:557 msg="waiting for llama runner to start responding"
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:58.308+08:00 level=INFO source=server.go:591 msg="waiting for server to become available" status="llm server error"
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:58.327+08:00 level=INFO source=runner.go:932 msg="starting go runner"
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:58.327+08:00 level=DEBUG source=ggml.go:89 msg="ggml backend load all from path" path=/usr/local/bin
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:58.328+08:00 level=INFO source=runner.go:935 msg=system info="CPU : LLAMAFILE = 1 | CPU : LLAMAFILE = 1 | cgo(gcc)" threads=4
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:58.328+08:00 level=INFO source=runner.go:993 msg="Server listening on 127.0.0.1:5188"
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: loaded meta data with 26 key-value pairs and 339 tensors from /data/ollama/models/blobs/sha256-aabd4debf0c8f08881923f2c25fc0fdeed24435271c2b3e92c4af36704040dbc (version GGUF V3 (latest))
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv   0:                       general.architecture str              = qwen2
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv   1:                               general.type str              = model
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv   2:                               general.name str              = DeepSeek R1 Distill Qwen 1.5B
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv   3:                           general.basename str              = DeepSeek-R1-Distill-Qwen
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv   4:                         general.size_label str              = 1.5B
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv   5:                          qwen2.block_count u32              = 28
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv   6:                       qwen2.context_length u32              = 131072
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv   7:                     qwen2.embedding_length u32              = 1536
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv   8:                  qwen2.feed_forward_length u32              = 8960
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv   9:                 qwen2.attention.head_count u32              = 12
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv  10:              qwen2.attention.head_count_kv u32              = 2
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv  11:                       qwen2.rope.freq_base f32              = 10000.000000
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv  12:     qwen2.attention.layer_norm_rms_epsilon f32              = 0.000001
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv  13:                          general.file_type u32              = 15
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv  14:                       tokenizer.ggml.model str              = gpt2
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv  15:                         tokenizer.ggml.pre str              = qwen2
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv  16:                      tokenizer.ggml.tokens arr[str,151936]  = ["!", "\"", "#", "$", "%", "&", "'", ...
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv  17:                  tokenizer.ggml.token_type arr[i32,151936]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:58.559+08:00 level=INFO source=server.go:591 msg="waiting for server to become available" status="llm server loading model"
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv  18:                      tokenizer.ggml.merges arr[str,151387]  = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv  19:                tokenizer.ggml.bos_token_id u32              = 151646
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv  20:                tokenizer.ggml.eos_token_id u32              = 151643
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv  21:            tokenizer.ggml.padding_token_id u32              = 151643
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv  22:               tokenizer.ggml.add_bos_token bool             = true
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv  23:               tokenizer.ggml.add_eos_token bool             = false
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv  24:                    tokenizer.chat_template str              = {% if not add_generation_prompt is de...
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv  25:               general.quantization_version u32              = 2
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - type  f32:  141 tensors
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - type q4_K:  169 tensors
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - type q6_K:   29 tensors
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_vocab: control token: 151661 '<|fim_suffix|>' is not marked as EOG
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_vocab: control token: 151654 '<|vision_pad|>' is not marked as EOG
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_vocab: control token: 151653 '<|vision_end|>' is not marked as EOG
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_vocab: control token: 151652 '<|vision_start|>' is not marked as EOG
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_vocab: control token: 151651 '<|quad_end|>' is not marked as EOG
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_vocab: control token: 151646 '<|begin▁of▁sentence|>' is not marked as EOG
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_vocab: control token: 151644 '<|User|>' is not marked as EOG
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_vocab: control token: 151655 '<|image_pad|>' is not marked as EOG
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_vocab: control token: 151643 '<|end▁of▁sentence|>' is not marked as EOG
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_vocab: control token: 151650 '<|quad_start|>' is not marked as EOG
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_vocab: control token: 151645 '<|Assistant|>' is not marked as EOG
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_vocab: control token: 151647 '<|EOT|>' is not marked as EOG
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_vocab: control token: 151660 '<|fim_middle|>' is not marked as EOG
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_vocab: control token: 151656 '<|video_pad|>' is not marked as EOG
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_vocab: control token: 151659 '<|fim_prefix|>' is not marked as EOG
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_vocab: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_vocab: special tokens cache size = 22
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_vocab: token to piece cache size = 0.9310 MB
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: format           = GGUF V3 (latest)
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: arch             = qwen2
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: vocab type       = BPE
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: n_vocab          = 151936
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: n_merges         = 151387
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: vocab_only       = 0
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: n_ctx_train      = 131072
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: n_embd           = 1536
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: n_layer          = 28
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: n_head           = 12
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: n_head_kv        = 2
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: n_rot            = 128
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: n_swa            = 0
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: n_embd_head_k    = 128
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: n_embd_head_v    = 128
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: n_gqa            = 6
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: n_embd_k_gqa     = 256
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: n_embd_v_gqa     = 256
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: f_norm_eps       = 0.0e+00
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: f_norm_rms_eps   = 1.0e-06
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: f_clamp_kqv      = 0.0e+00
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: f_max_alibi_bias = 0.0e+00
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: f_logit_scale    = 0.0e+00
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: n_ff             = 8960
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: n_expert         = 0
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: n_expert_used    = 0
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: causal attn      = 1
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: pooling type     = 0
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: rope type        = 2
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: rope scaling     = linear
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: freq_base_train  = 10000.0
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: freq_scale_train = 1
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: n_ctx_orig_yarn  = 131072
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: rope_finetuned   = unknown
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: ssm_d_conv       = 0
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: ssm_d_inner      = 0
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: ssm_d_state      = 0
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: ssm_dt_rank      = 0
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: ssm_dt_b_c_rms   = 0
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: model type       = 1.5B
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: model ftype      = Q4_K - Medium
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: model params     = 1.78 B
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: model size       = 1.04 GiB (5.00 BPW)
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: general.name     = DeepSeek R1 Distill Qwen 1.5B
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: BOS token        = 151646 '<|begin▁of▁sentence|>'
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: EOS token        = 151643 '<|end▁of▁sentence|>'
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: EOT token        = 151643 '<|end▁of▁sentence|>'
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: PAD token        = 151643 '<|end▁of▁sentence|>'
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: LF token         = 148848 'ÄĬ'
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: FIM PRE token    = 151659 '<|fim_prefix|>'
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: FIM SUF token    = 151661 '<|fim_suffix|>'
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: FIM MID token    = 151660 '<|fim_middle|>'
Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: FIM PAD token    = 151662 '<|fim_pad|>'
Feb 27 11:23:59
iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: FIM REP token = 151663 '<|repo_name|>' Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: FIM SEP token = 151664 '<|file_sep|>' Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: EOG token = 151643 '<|end▁of▁sentence|>' Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: EOG token = 151662 '<|fim_pad|>' Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: EOG token = 151663 '<|repo_name|>' Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: EOG token = 151664 '<|file_sep|>' Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: max token length = 256 Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_tensors: CPU_Mapped model buffer size = 1059.89 MiB Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_new_context_with_model: n_seq_max = 4 Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_new_context_with_model: n_ctx = 8192 Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_new_context_with_model: n_ctx_per_seq = 2048 Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_new_context_with_model: n_batch = 2048 Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_new_context_with_model: n_ubatch = 512 Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_new_context_with_model: flash_attn = 0 Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_new_context_with_model: freq_base = 10000.0 Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_new_context_with_model: freq_scale = 1 Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_new_context_with_model: n_ctx_per_seq (2048) < n_ctx_train (131072) -- the full capacity of the model will not be utilized Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_kv_cache_init: kv_size = 8192, offload = 1, type_k = 'f16', type_v = 
'f16', n_layer = 28, can_shift = 1 Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_kv_cache_init: layer 0: n_embd_k_gqa = 256, n_embd_v_gqa = 256 Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_kv_cache_init: layer 1: n_embd_k_gqa = 256, n_embd_v_gqa = 256 Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_kv_cache_init: layer 2: n_embd_k_gqa = 256, n_embd_v_gqa = 256 Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_kv_cache_init: layer 3: n_embd_k_gqa = 256, n_embd_v_gqa = 256 Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_kv_cache_init: layer 4: n_embd_k_gqa = 256, n_embd_v_gqa = 256 Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_kv_cache_init: layer 5: n_embd_k_gqa = 256, n_embd_v_gqa = 256 Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_kv_cache_init: layer 6: n_embd_k_gqa = 256, n_embd_v_gqa = 256 Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_kv_cache_init: layer 7: n_embd_k_gqa = 256, n_embd_v_gqa = 256 Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_kv_cache_init: layer 8: n_embd_k_gqa = 256, n_embd_v_gqa = 256 Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_kv_cache_init: layer 9: n_embd_k_gqa = 256, n_embd_v_gqa = 256 Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_kv_cache_init: layer 10: n_embd_k_gqa = 256, n_embd_v_gqa = 256 Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_kv_cache_init: layer 11: n_embd_k_gqa = 256, n_embd_v_gqa = 256 Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_kv_cache_init: layer 12: n_embd_k_gqa = 256, n_embd_v_gqa = 256 Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_kv_cache_init: layer 13: n_embd_k_gqa = 256, n_embd_v_gqa = 256 Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_kv_cache_init: layer 14: n_embd_k_gqa = 256, n_embd_v_gqa = 256 Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_kv_cache_init: layer 15: n_embd_k_gqa 
= 256, n_embd_v_gqa = 256 Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_kv_cache_init: layer 16: n_embd_k_gqa = 256, n_embd_v_gqa = 256 Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_kv_cache_init: layer 17: n_embd_k_gqa = 256, n_embd_v_gqa = 256 Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_kv_cache_init: layer 18: n_embd_k_gqa = 256, n_embd_v_gqa = 256 Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_kv_cache_init: layer 19: n_embd_k_gqa = 256, n_embd_v_gqa = 256 Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_kv_cache_init: layer 20: n_embd_k_gqa = 256, n_embd_v_gqa = 256 Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_kv_cache_init: layer 21: n_embd_k_gqa = 256, n_embd_v_gqa = 256 Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_kv_cache_init: layer 22: n_embd_k_gqa = 256, n_embd_v_gqa = 256 Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_kv_cache_init: layer 23: n_embd_k_gqa = 256, n_embd_v_gqa = 256 Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_kv_cache_init: layer 24: n_embd_k_gqa = 256, n_embd_v_gqa = 256 Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_kv_cache_init: layer 25: n_embd_k_gqa = 256, n_embd_v_gqa = 256 Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_kv_cache_init: layer 26: n_embd_k_gqa = 256, n_embd_v_gqa = 256 Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_kv_cache_init: layer 27: n_embd_k_gqa = 256, n_embd_v_gqa = 256 Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_kv_cache_init: CPU KV buffer size = 224.00 MiB Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_new_context_with_model: KV self size = 224.00 MiB, K (f16): 112.00 MiB, V (f16): 112.00 MiB Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_new_context_with_model: CPU output buffer size = 2.34 MiB Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_new_context_with_model: CPU 
compute buffer size = 302.75 MiB Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_new_context_with_model: graph nodes = 986 Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_new_context_with_model: graph splits = 1 Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:59.563+08:00 level=INFO source=server.go:596 msg="llama runner started in 1.26 seconds" Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:59.563+08:00 level=DEBUG source=sched.go:463 msg="finished setting up runner" model=/data/ollama/models/blobs/sha256-aabd4debf0c8f08881923f2c25fc0fdeed24435271c2b3e92c4af36704040dbc Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:59.563+08:00 level=DEBUG source=routes.go:289 msg="generate request" images=0 prompt="<|User|>how to restart redis cluster<|Assistant|>" Feb 27 11:23:59 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:59.567+08:00 level=DEBUG source=cache.go:104 msg="loading cache slot" id=0 cache=0 prompt=8 used=0 remaining=8 Feb 27 11:28:55 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:28:55.887+08:00 level=DEBUG source=server.go:968 msg="new runner detected, loading model for cgo tokenization" Feb 27 11:28:55 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: loaded meta data with 26 key-value pairs and 339 tensors from /data/ollama/models/blobs/sha256-aabd4debf0c8f08881923f2c25fc0fdeed24435271c2b3e92c4af36704040dbc (version GGUF V3 (latest)) Feb 27 11:28:55 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output. 
Feb 27 11:28:55 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv 0: general.architecture str = qwen2 Feb 27 11:28:55 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv 1: general.type str = model Feb 27 11:28:55 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv 2: general.name str = DeepSeek R1 Distill Qwen 1.5B Feb 27 11:28:55 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv 3: general.basename str = DeepSeek-R1-Distill-Qwen Feb 27 11:28:55 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv 4: general.size_label str = 1.5B Feb 27 11:28:55 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv 5: qwen2.block_count u32 = 28 Feb 27 11:28:55 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv 6: qwen2.context_length u32 = 131072 Feb 27 11:28:55 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv 7: qwen2.embedding_length u32 = 1536 Feb 27 11:28:55 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv 8: qwen2.feed_forward_length u32 = 8960 Feb 27 11:28:55 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv 9: qwen2.attention.head_count u32 = 12 Feb 27 11:28:55 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv 10: qwen2.attention.head_count_kv u32 = 2 Feb 27 11:28:55 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv 11: qwen2.rope.freq_base f32 = 10000.000000 Feb 27 11:28:55 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv 12: qwen2.attention.layer_norm_rms_epsilon f32 = 0.000001 Feb 27 11:28:55 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv 13: general.file_type u32 = 15 Feb 27 11:28:55 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv 14: tokenizer.ggml.model str = gpt2 Feb 27 11:28:55 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv 15: tokenizer.ggml.pre str = qwen2 Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv 16: 
tokenizer.ggml.tokens arr[str,151936] = ["!", "\"", "#", "$", "%", "&", "'", ... Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv 17: tokenizer.ggml.token_type arr[i32,151936] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv 18: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",... Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv 19: tokenizer.ggml.bos_token_id u32 = 151646 Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv 20: tokenizer.ggml.eos_token_id u32 = 151643 Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv 21: tokenizer.ggml.padding_token_id u32 = 151643 Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv 22: tokenizer.ggml.add_bos_token bool = true Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv 23: tokenizer.ggml.add_eos_token bool = false Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv 24: tokenizer.chat_template str = {% if not add_generation_prompt is de... 
Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - kv 25: general.quantization_version u32 = 2 Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - type f32: 141 tensors Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - type q4_K: 169 tensors Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_loader: - type q6_K: 29 tensors Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_vocab: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_vocab: special tokens cache size = 22 Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_vocab: token to piece cache size = 0.9310 MB Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: format = GGUF V3 (latest) Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: arch = qwen2 Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: vocab type = BPE Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: n_vocab = 151936 Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: n_merges = 151387 Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: vocab_only = 1 Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: model type = ?B Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: model ftype = all F32 Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: model params = 1.78 B Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: model size = 1.04 GiB (5.00 BPW) Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: general.name = DeepSeek R1 Distill Qwen 1.5B Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: BOS token = 151646 '<|begin▁of▁sentence|>' Feb 27 11:28:56 
iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: EOS token = 151643 '<|end▁of▁sentence|>' Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: EOT token = 151643 '<|end▁of▁sentence|>' Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: PAD token = 151643 '<|end▁of▁sentence|>' Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: LF token = 148848 'ÄĬ' Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: FIM PRE token = 151659 '<|fim_prefix|>' Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: FIM SUF token = 151661 '<|fim_suffix|>' Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: FIM MID token = 151660 '<|fim_middle|>' Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: FIM PAD token = 151662 '<|fim_pad|>' Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: FIM REP token = 151663 '<|repo_name|>' Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: FIM SEP token = 151664 '<|file_sep|>' Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: EOG token = 151643 '<|end▁of▁sentence|>' Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: EOG token = 151662 '<|fim_pad|>' Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: EOG token = 151663 '<|repo_name|>' Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: EOG token = 151664 '<|file_sep|>' Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llm_load_print_meta: max token length = 256 Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: llama_model_load: vocab only - skipping tensors Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: [GIN] 2025/02/27 - 11:28:56 | 200 | 4m58s | 10.2.3.4 | POST "/api/generate" Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:28:56.855+08:00 
level=DEBUG source=sched.go:467 msg="context for request finished" Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:28:56.855+08:00 level=DEBUG source=sched.go:340 msg="runner with non-zero duration has gone idle, adding timer" modelPath=/data/ollama/models/blobs/sha256-aabd4debf0c8f08881923f2c25fc0fdeed24435271c2b3e92c4af36704040dbc duration=5m0s Feb 27 11:28:56 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:28:56.855+08:00 level=DEBUG source=sched.go:358 msg="after processing request finished event" modelPath=/data/ollama/models/blobs/sha256-aabd4debf0c8f08881923f2c25fc0fdeed24435271c2b3e92c4af36704040dbc refCount=0 ``` systemd setting ``` # cat /etc/systemd/system/ollama.service [Unit] Description=Ollama Service After=network-online.target [Service] ExecStart=/usr/local/bin/ollama serve User=root Group=root Restart=always RestartSec=3 Environment="PATH=$PATH" Environment="OLLAMA_MODELS=/data/ollama/models" Environment="OLLAMA_HOST=10.2.3.4:11434" Environment="OLLAMA_SCHED_SPREAD=1" Environment="CUDA_VISIBLE_DEVICES=0" Environment="OLLAMA_LLM_LIBRARY=cuda_v11" Environment="OLLAMA_DEBUG=1" [Install] WantedBy=default.target ```

@rick-github commented on GitHub (Feb 27, 2025):

```
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:58.306+08:00 level=DEBUG source=server.go:259 msg="compatible gpu libraries" compatible=[]
Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:58.327+08:00 level=DEBUG source=ggml.go:89 msg="ggml backend load all from path" path=/usr/local/bin
```

The good news is that the GPU is detected. The bad news is that you apparently don't have any GPU enabled runners. What's the output of:

```
find $(dirname $(dirname $(command -v ollama)))/lib/ollama | xargs ls -ld
```

@moluzhui commented on GitHub (Feb 28, 2025):

> ```
> Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:58.306+08:00 level=DEBUG source=server.go:259 msg="compatible gpu libraries" compatible=[]
> Feb 27 11:23:58 iZbp1d0j83sijy1mh3b3s9Z ollama[26115]: time=2025-02-27T11:23:58.327+08:00 level=DEBUG source=ggml.go:89 msg="ggml backend load all from path" path=/usr/local/bin
> ```
>
> The good news is that the GPU is detected. The bad news is that you apparently don't have any GPU enabled runners. What's the output of:
>
> ```
> find $(dirname $(dirname $(command -v ollama)))/lib/ollama | xargs ls -ld
> ```

I made a stupid mistake: I hadn't put `lib/ollama` in the correct directory. After moving it, the GPU is now being used properly. Thank you very much. I will close this.
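For context, the diagnostic command in the previous comment resolves the install prefix from the `ollama` binary's path and lists `lib/ollama` beneath it, which is where the GPU runner libraries are expected to live. A minimal sketch of that path logic, assuming the default `/usr/local/bin/ollama` install location:

```shell
# Resolve where lib/ollama should sit, given the binary location.
bin=/usr/local/bin/ollama                 # assumed default install path
prefix=$(dirname "$(dirname "$bin")")     # strip "/ollama", then "/bin" -> /usr/local
libdir="$prefix/lib/ollama"               # GPU runners (e.g. CUDA libs) go here
echo "$libdir"                            # /usr/local/lib/ollama
```

In other words, if the binary is at `/usr/local/bin/ollama`, the runners belong in `/usr/local/lib/ollama`; extracting the release tarball somewhere else leaves `compatible=[]` in the debug log and forces CPU inference.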


@Revita1ize commented on GitHub (May 23, 2025):

Excuse me, where is the right directory for `lib/ollama`?

Reference: github-starred/ollama#52624