[GH-ISSUE #7409] smollm got cuda error #51223

Closed
opened 2026-04-28 18:56:53 -05:00 by GiteaMirror · 5 comments

Originally created by @cool9203 on GitHub (Oct 29, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/7409

What is the issue?

Running [smollm](https://ollama.com/library/smollm:135m) results in a CUDA error.

Steps to reproduce:

  1. ollama run smollm:135m
  2. Input any text

```
Error: an unknown error was encountered while running the model CUDA error: CUBLAS_STATUS_NOT_SUPPORTED
  current device: 0, in function ggml_cuda_mul_mat_batched_cublas at /go/src/github.com/ollama/ollama/llm/llama.cpp/ggml/src/ggml-cuda.cu:1896
  cublasGemmBatchedEx(ctx.cublas_handle(), CUBLAS_OP_T, CUBLAS_OP_N, ne01, ne11, ne10, alpha, (const void **) (ptrs_src.get() + 0*ne23), CUDA_R_16F, nb01/nb00, (const void **) (ptrs_src.get() + 1*ne23), CUDA_R_16F, nb11/nb10, beta, ( void **) (ptrs_dst.get() + 0*ne23), cu_data_type, ne01, ne23, cu_compute_type, CUBLAS_GEMM_DEFAULT_TENSOR_OP)
/go/src/github.com/ollama/ollama/llm/llama.cpp/ggml/src/ggml-cuda.cu:106: CUDA error
```
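The same crash should presumably also be reproducible without the interactive CLI by sending a single request over the HTTP API; a minimal sketch, assuming the default `127.0.0.1:11434` endpoint:

```sh
# Pull the model, then trigger one completion via the HTTP API.
ollama pull smollm:135m
curl http://127.0.0.1:11434/api/generate -d '{
  "model": "smollm:135m",
  "prompt": "hello",
  "stream": false
}'
# On affected setups this should abort the runner with
# "CUDA error: CUBLAS_STATUS_NOT_SUPPORTED" as above.
```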

Screenshot:

![image](https://github.com/user-attachments/assets/40d5bacb-d6a3-4bf9-9799-16a6e2149ee7)

GPU:
NVIDIA RTX 3060 Ti
CUDA version: 12.0

![image](https://github.com/user-attachments/assets/7e00958b-0a55-4ebf-a407-307e2b62a180)
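(The screenshot presumably shows `nvidia-smi` output; for reference, the driver's reported CUDA version and the GPU's compute capability can be queried directly from inside WSL2. A minimal sketch, assuming standard NVIDIA tooling is on the PATH:)

```sh
# Summary table; the header shows the driver's supported CUDA version,
# e.g. "CUDA Version: 12.0" for this report.
nvidia-smi

# Machine-readable subset (the compute_cap field needs a reasonably
# recent driver):
nvidia-smi --query-gpu=name,driver_version,compute_cap --format=csv
```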

OS

WSL2

GPU

Nvidia

CPU

No response

Ollama version

0.3.14

GiteaMirror added the bug label 2026-04-28 18:56:53 -05:00

@rick-github commented on GitHub (Oct 29, 2024):

Please post full server log.
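(On a systemd-managed install like the one in this thread, the server log can typically be collected from the journal; a minimal sketch, assuming the default `ollama` unit name:)

```sh
# Follow the Ollama server log live:
journalctl -u ollama -f

# Or capture everything since the last boot to attach to the issue:
journalctl -u ollama -b --no-pager > ollama-server.log
```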


@cool9203 commented on GitHub (Oct 30, 2024):

This log is from ollama version `0.4.0-rc5`:

```
Oct 30 09:20:44 220908-NB systemd[1]: Started Ollama Service.
Oct 30 09:20:45 220908-NB ollama[198]: 2024/10/30 09:20:45 routes.go:1170: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/usr/share/ollama/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://*] OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
Oct 30 09:20:45 220908-NB ollama[198]: time=2024-10-30T09:20:45.105+08:00 level=INFO source=images.go:754 msg="total blobs: 18"
Oct 30 09:20:45 220908-NB ollama[198]: time=2024-10-30T09:20:45.111+08:00 level=INFO source=images.go:761 msg="total unused blobs removed: 0"
Oct 30 09:20:45 220908-NB ollama[198]: time=2024-10-30T09:20:45.113+08:00 level=INFO source=routes.go:1217 msg="Listening on 127.0.0.1:11434 (version 0.4.0-rc5)"
Oct 30 09:20:45 220908-NB ollama[198]: time=2024-10-30T09:20:45.124+08:00 level=INFO source=common.go:168 msg="extracting embedded files" dir=/tmp/ollama2316810848/runners
Oct 30 09:20:45 220908-NB ollama[198]: time=2024-10-30T09:20:45.289+08:00 level=INFO source=common.go:82 msg="Dynamic LLM libraries" runners="[cpu cpu_avx cpu_avx2 cuda_v11 cuda_v12 rocm]"
Oct 30 09:20:45 220908-NB ollama[198]: time=2024-10-30T09:20:45.290+08:00 level=INFO source=gpu.go:221 msg="looking for compatible GPUs"
Oct 30 09:20:47 220908-NB ollama[198]: time=2024-10-30T09:20:47.844+08:00 level=INFO source=types.go:123 msg="inference compute" id=GPU-fa0dce43-ece7-81bd-c88c-48d046c522e5 library=cuda variant=v11 compute=8.6 driver=12.0 name="NVIDIA GeForce RTX 3060 Ti" total="8.0 GiB" available="7.0 GiB"
Oct 30 09:20:56 220908-NB ollama[198]: [GIN] 2024/10/30 - 09:20:56 | 200 |     907.115µs |       127.0.0.1 | HEAD     "/"
Oct 30 09:20:56 220908-NB ollama[198]: [GIN] 2024/10/30 - 09:20:56 | 200 |    1.702733ms |       127.0.0.1 | GET      "/api/tags"
Oct 30 09:21:02 220908-NB ollama[198]: [GIN] 2024/10/30 - 09:21:02 | 200 |      34.656µs |       127.0.0.1 | HEAD     "/"
Oct 30 09:21:02 220908-NB ollama[198]: [GIN] 2024/10/30 - 09:21:02 | 200 |   10.449795ms |       127.0.0.1 | POST     "/api/show"
Oct 30 09:21:02 220908-NB ollama[198]: time=2024-10-30T09:21:02.196+08:00 level=INFO source=sched.go:715 msg="new model will fit in available VRAM in single GPU, loading" model=/usr/share/ollama/.ollama/models/blobs/sha256-eb2c714d40d4b35ba4b8ee98475a06d51d8080a17d2d2a75a23665985c739b94 gpu=GPU-fa0dce43-ece7-81bd-c88c-48d046c522e5 parallel=4 available=7488929792 required="895.2 MiB"
Oct 30 09:21:02 220908-NB ollama[198]: time=2024-10-30T09:21:02.310+08:00 level=INFO source=llama-server.go:72 msg="system memory" total="7.6 GiB" free="6.8 GiB" free_swap="20.0 GiB"
Oct 30 09:21:02 220908-NB ollama[198]: time=2024-10-30T09:21:02.310+08:00 level=INFO source=memory.go:346 msg="offload to cuda" layers.requested=-1 layers.model=31 layers.offload=31 layers.split="" memory.available="[7.0 GiB]" memory.gpu_overhead="0 B" memory.required.full="895.2 MiB" memory.required.partial="895.2 MiB" memory.required.kv="180.0 MiB" memory.required.allocations="[895.2 MiB]" memory.weights.total="237.1 MiB" memory.weights.repeating="208.4 MiB" memory.weights.nonrepeating="28.7 MiB" memory.graph.full="164.5 MiB" memory.graph.partial="168.4 MiB"
Oct 30 09:21:02 220908-NB ollama[198]: time=2024-10-30T09:21:02.311+08:00 level=INFO source=llama-server.go:355 msg="starting llama server" cmd="/tmp/ollama2316810848/runners/cuda_v11/ollama_llama_server --model /usr/share/ollama/.ollama/models/blobs/sha256-eb2c714d40d4b35ba4b8ee98475a06d51d8080a17d2d2a75a23665985c739b94 --ctx-size 8192 --batch-size 512 --embedding --n-gpu-layers 31 --threads 4 --parallel 4 --port 39413"
Oct 30 09:21:02 220908-NB ollama[198]: time=2024-10-30T09:21:02.312+08:00 level=INFO source=sched.go:450 msg="loaded runners" count=1
Oct 30 09:21:02 220908-NB ollama[198]: time=2024-10-30T09:21:02.312+08:00 level=INFO source=llama-server.go:534 msg="waiting for llama runner to start responding"
Oct 30 09:21:02 220908-NB ollama[198]: time=2024-10-30T09:21:02.312+08:00 level=INFO source=llama-server.go:568 msg="waiting for server to become available" status="llm server error"
Oct 30 09:21:02 220908-NB ollama[198]: time=2024-10-30T09:21:02.344+08:00 level=INFO source=runner.go:869 msg="starting go runner"
Oct 30 09:21:02 220908-NB ollama[198]: time=2024-10-30T09:21:02.344+08:00 level=INFO source=.:0 msg="Server listening on 127.0.0.1:39413"
Oct 30 09:21:02 220908-NB ollama[198]: llama_model_loader: loaded meta data with 39 key-value pairs and 272 tensors from /usr/share/ollama/.ollama/models/blobs/sha256-eb2c714d40d4b35ba4b8ee98475a06d51d8080a17d2d2a75a23665985c739b94 (version GGUF V3 (latest))
Oct 30 09:21:02 220908-NB ollama[198]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
Oct 30 09:21:02 220908-NB ollama[198]: llama_model_loader: - kv   0:                       general.architecture str              = llama
Oct 30 09:21:02 220908-NB ollama[198]: llama_model_loader: - kv   1:                               general.type str              = model
Oct 30 09:21:02 220908-NB ollama[198]: llama_model_loader: - kv   2:                               general.name str              = SmolLM 135M
Oct 30 09:21:02 220908-NB ollama[198]: llama_model_loader: - kv   3:                       general.organization str              = HuggingFaceTB
Oct 30 09:21:02 220908-NB ollama[198]: llama_model_loader: - kv   4:                           general.finetune str              = Instruct
Oct 30 09:21:02 220908-NB ollama[198]: llama_model_loader: - kv   5:                           general.basename str              = SmolLM
Oct 30 09:21:02 220908-NB ollama[198]: llama_model_loader: - kv   6:                         general.size_label str              = 135M
Oct 30 09:21:02 220908-NB ollama[198]: llama_model_loader: - kv   7:                            general.license str              = apache-2.0
Oct 30 09:21:02 220908-NB ollama[198]: llama_model_loader: - kv   8:                   general.base_model.count u32              = 1
Oct 30 09:21:02 220908-NB ollama[198]: llama_model_loader: - kv   9:                  general.base_model.0.name str              = SmolLM 135M
Oct 30 09:21:02 220908-NB ollama[198]: llama_model_loader: - kv  10:          general.base_model.0.organization str              = HuggingFaceTB
Oct 30 09:21:02 220908-NB ollama[198]: llama_model_loader: - kv  11:              general.base_model.0.repo_url str              = https://huggingface.co/HuggingFaceTB/...
Oct 30 09:21:02 220908-NB ollama[198]: llama_model_loader: - kv  12:                               general.tags arr[str,3]       = ["alignment-handbook", "trl", "sft"]
Oct 30 09:21:02 220908-NB ollama[198]: llama_model_loader: - kv  13:                          general.languages arr[str,1]       = ["en"]
Oct 30 09:21:02 220908-NB ollama[198]: llama_model_loader: - kv  14:                           general.datasets arr[str,4]       = ["Magpie-Align/Magpie-Pro-300K-Filter...
Oct 30 09:21:02 220908-NB ollama[198]: llama_model_loader: - kv  15:                          llama.block_count u32              = 30
Oct 30 09:21:02 220908-NB ollama[198]: llama_model_loader: - kv  16:                       llama.context_length u32              = 2048
Oct 30 09:21:02 220908-NB ollama[198]: llama_model_loader: - kv  17:                     llama.embedding_length u32              = 576
Oct 30 09:21:02 220908-NB ollama[198]: llama_model_loader: - kv  18:                  llama.feed_forward_length u32              = 1536
Oct 30 09:21:02 220908-NB ollama[198]: llama_model_loader: - kv  19:                 llama.attention.head_count u32              = 9
Oct 30 09:21:02 220908-NB ollama[198]: llama_model_loader: - kv  20:              llama.attention.head_count_kv u32              = 3
Oct 30 09:21:02 220908-NB ollama[198]: llama_model_loader: - kv  21:                       llama.rope.freq_base f32              = 10000.000000
Oct 30 09:21:02 220908-NB ollama[198]: llama_model_loader: - kv  22:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
Oct 30 09:21:02 220908-NB ollama[198]: llama_model_loader: - kv  23:                          general.file_type u32              = 2
Oct 30 09:21:02 220908-NB ollama[198]: llama_model_loader: - kv  24:                           llama.vocab_size u32              = 49152
Oct 30 09:21:02 220908-NB ollama[198]: llama_model_loader: - kv  25:                 llama.rope.dimension_count u32              = 64
Oct 30 09:21:02 220908-NB ollama[198]: llama_model_loader: - kv  26:            tokenizer.ggml.add_space_prefix bool             = false
Oct 30 09:21:02 220908-NB ollama[198]: llama_model_loader: - kv  27:               tokenizer.ggml.add_bos_token bool             = false
Oct 30 09:21:02 220908-NB ollama[198]: llama_model_loader: - kv  28:                       tokenizer.ggml.model str              = gpt2
Oct 30 09:21:02 220908-NB ollama[198]: llama_model_loader: - kv  29:                         tokenizer.ggml.pre str              = smollm
Oct 30 09:21:02 220908-NB ollama[198]: llama_model_loader: - kv  30:                      tokenizer.ggml.tokens arr[str,49152]   = ["<|endoftext|>", "<|im_start|>", "<|...
Oct 30 09:21:02 220908-NB ollama[198]: llama_model_loader: - kv  31:                  tokenizer.ggml.token_type arr[i32,49152]   = [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, ...
Oct 30 09:21:02 220908-NB ollama[198]: llama_model_loader: - kv  32:                      tokenizer.ggml.merges arr[str,48900]   = ["Ġ t", "Ġ a", "i n", "h e", "Ġ Ġ...
Oct 30 09:21:02 220908-NB ollama[198]: llama_model_loader: - kv  33:                tokenizer.ggml.bos_token_id u32              = 1
Oct 30 09:21:02 220908-NB ollama[198]: llama_model_loader: - kv  34:                tokenizer.ggml.eos_token_id u32              = 2
Oct 30 09:21:02 220908-NB ollama[198]: llama_model_loader: - kv  35:            tokenizer.ggml.unknown_token_id u32              = 0
Oct 30 09:21:02 220908-NB ollama[198]: llama_model_loader: - kv  36:            tokenizer.ggml.padding_token_id u32              = 2
Oct 30 09:21:02 220908-NB ollama[198]: llama_model_loader: - kv  37:                    tokenizer.chat_template str              = {% for message in messages %}{{'<|im_...
Oct 30 09:21:02 220908-NB ollama[198]: llama_model_loader: - kv  38:               general.quantization_version u32              = 2
Oct 30 09:21:02 220908-NB ollama[198]: llama_model_loader: - type  f32:   61 tensors
Oct 30 09:21:02 220908-NB ollama[198]: llama_model_loader: - type q4_0:  210 tensors
Oct 30 09:21:02 220908-NB ollama[198]: llama_model_loader: - type q8_0:    1 tensors
Oct 30 09:21:02 220908-NB ollama[198]: llm_load_vocab: special tokens cache size = 17
Oct 30 09:21:02 220908-NB ollama[198]: llm_load_vocab: token to piece cache size = 0.3170 MB
Oct 30 09:21:02 220908-NB ollama[198]: llm_load_print_meta: format           = GGUF V3 (latest)
Oct 30 09:21:02 220908-NB ollama[198]: llm_load_print_meta: arch             = llama
Oct 30 09:21:02 220908-NB ollama[198]: llm_load_print_meta: vocab type       = BPE
Oct 30 09:21:02 220908-NB ollama[198]: llm_load_print_meta: n_vocab          = 49152
Oct 30 09:21:02 220908-NB ollama[198]: llm_load_print_meta: n_merges         = 48900
Oct 30 09:21:02 220908-NB ollama[198]: llm_load_print_meta: vocab_only       = 0
Oct 30 09:21:02 220908-NB ollama[198]: llm_load_print_meta: n_ctx_train      = 2048
Oct 30 09:21:02 220908-NB ollama[198]: llm_load_print_meta: n_embd           = 576
Oct 30 09:21:02 220908-NB ollama[198]: llm_load_print_meta: n_layer          = 30
Oct 30 09:21:02 220908-NB ollama[198]: llm_load_print_meta: n_head           = 9
Oct 30 09:21:02 220908-NB ollama[198]: llm_load_print_meta: n_head_kv        = 3
Oct 30 09:21:02 220908-NB ollama[198]: llm_load_print_meta: n_rot            = 64
Oct 30 09:21:02 220908-NB ollama[198]: llm_load_print_meta: n_swa            = 0
Oct 30 09:21:02 220908-NB ollama[198]: llm_load_print_meta: n_embd_head_k    = 64
Oct 30 09:21:02 220908-NB ollama[198]: llm_load_print_meta: n_embd_head_v    = 64
Oct 30 09:21:02 220908-NB ollama[198]: llm_load_print_meta: n_gqa            = 3
Oct 30 09:21:02 220908-NB ollama[198]: llm_load_print_meta: n_embd_k_gqa     = 192
Oct 30 09:21:02 220908-NB ollama[198]: llm_load_print_meta: n_embd_v_gqa     = 192
Oct 30 09:21:02 220908-NB ollama[198]: llm_load_print_meta: f_norm_eps       = 0.0e+00
Oct 30 09:21:02 220908-NB ollama[198]: llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
Oct 30 09:21:02 220908-NB ollama[198]: llm_load_print_meta: f_clamp_kqv      = 0.0e+00
Oct 30 09:21:02 220908-NB ollama[198]: llm_load_print_meta: f_max_alibi_bias = 0.0e+00
Oct 30 09:21:02 220908-NB ollama[198]: llm_load_print_meta: f_logit_scale    = 0.0e+00
Oct 30 09:21:02 220908-NB ollama[198]: llm_load_print_meta: n_ff             = 1536
Oct 30 09:21:02 220908-NB ollama[198]: llm_load_print_meta: n_expert         = 0
Oct 30 09:21:02 220908-NB ollama[198]: llm_load_print_meta: n_expert_used    = 0
Oct 30 09:21:02 220908-NB ollama[198]: llm_load_print_meta: causal attn      = 1
Oct 30 09:21:02 220908-NB ollama[198]: llm_load_print_meta: pooling type     = 0
Oct 30 09:21:02 220908-NB ollama[198]: llm_load_print_meta: rope type        = 0
Oct 30 09:21:02 220908-NB ollama[198]: llm_load_print_meta: rope scaling     = linear
Oct 30 09:21:02 220908-NB ollama[198]: llm_load_print_meta: freq_base_train  = 10000.0
Oct 30 09:21:02 220908-NB ollama[198]: llm_load_print_meta: freq_scale_train = 1
Oct 30 09:21:02 220908-NB ollama[198]: llm_load_print_meta: n_ctx_orig_yarn  = 2048
Oct 30 09:21:02 220908-NB ollama[198]: llm_load_print_meta: rope_finetuned   = unknown
Oct 30 09:21:02 220908-NB ollama[198]: llm_load_print_meta: ssm_d_conv       = 0
Oct 30 09:21:02 220908-NB ollama[198]: llm_load_print_meta: ssm_d_inner      = 0
Oct 30 09:21:02 220908-NB ollama[198]: llm_load_print_meta: ssm_d_state      = 0
Oct 30 09:21:02 220908-NB ollama[198]: llm_load_print_meta: ssm_dt_rank      = 0
Oct 30 09:21:02 220908-NB ollama[198]: llm_load_print_meta: ssm_dt_b_c_rms   = 0
Oct 30 09:21:02 220908-NB ollama[198]: llm_load_print_meta: model type       = ?B
Oct 30 09:21:02 220908-NB ollama[198]: llm_load_print_meta: model ftype      = Q4_0
Oct 30 09:21:02 220908-NB ollama[198]: llm_load_print_meta: model params     = 134.52 M
Oct 30 09:21:02 220908-NB ollama[198]: llm_load_print_meta: model size       = 85.77 MiB (5.35 BPW)
Oct 30 09:21:02 220908-NB ollama[198]: llm_load_print_meta: general.name     = SmolLM 135M
Oct 30 09:21:02 220908-NB ollama[198]: llm_load_print_meta: BOS token        = 1 '<|im_start|>'
Oct 30 09:21:02 220908-NB ollama[198]: llm_load_print_meta: EOS token        = 2 '<|im_end|>'
Oct 30 09:21:02 220908-NB ollama[198]: llm_load_print_meta: UNK token        = 0 '<|endoftext|>'
Oct 30 09:21:02 220908-NB ollama[198]: llm_load_print_meta: PAD token        = 2 '<|im_end|>'
Oct 30 09:21:02 220908-NB ollama[198]: llm_load_print_meta: LF token         = 143 'Ä'
Oct 30 09:21:02 220908-NB ollama[198]: llm_load_print_meta: EOT token        = 0 '<|endoftext|>'
Oct 30 09:21:02 220908-NB ollama[198]: llm_load_print_meta: EOG token        = 0 '<|endoftext|>'
Oct 30 09:21:02 220908-NB ollama[198]: llm_load_print_meta: EOG token        = 2 '<|im_end|>'
Oct 30 09:21:02 220908-NB ollama[198]: llm_load_print_meta: max token length = 162
Oct 30 09:21:02 220908-NB ollama[198]: ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
Oct 30 09:21:02 220908-NB ollama[198]: ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
Oct 30 09:21:02 220908-NB ollama[198]: ggml_cuda_init: found 1 CUDA devices:
Oct 30 09:21:02 220908-NB ollama[198]:   Device 0: NVIDIA GeForce RTX 3060 Ti, compute capability 8.6, VMM: yes
Oct 30 09:21:02 220908-NB ollama[198]: time=2024-10-30T09:21:02.563+08:00 level=INFO source=llama-server.go:568 msg="waiting for server to become available" status="llm server loading model"
Oct 30 09:21:03 220908-NB ollama[198]: llm_load_tensors: ggml ctx size =    0.25 MiB
Oct 30 09:21:03 220908-NB ollama[198]: llm_load_tensors: offloading 30 repeating layers to GPU
Oct 30 09:21:03 220908-NB ollama[198]: llm_load_tensors: offloading non-repeating layers to GPU
Oct 30 09:21:03 220908-NB ollama[198]: llm_load_tensors: offloaded 31/31 layers to GPU
Oct 30 09:21:03 220908-NB ollama[198]: llm_load_tensors:        CPU buffer size =    28.69 MiB
Oct 30 09:21:03 220908-NB ollama[198]: llm_load_tensors:      CUDA0 buffer size =    85.82 MiB
Oct 30 09:21:03 220908-NB ollama[198]: llama_new_context_with_model: n_ctx      = 8192
Oct 30 09:21:03 220908-NB ollama[198]: llama_new_context_with_model: n_batch    = 2048
Oct 30 09:21:03 220908-NB ollama[198]: llama_new_context_with_model: n_ubatch   = 512
Oct 30 09:21:03 220908-NB ollama[198]: llama_new_context_with_model: flash_attn = 0
Oct 30 09:21:03 220908-NB ollama[198]: llama_new_context_with_model: freq_base  = 10000.0
Oct 30 09:21:03 220908-NB ollama[198]: llama_new_context_with_model: freq_scale = 1
Oct 30 09:21:03 220908-NB ollama[198]: llama_kv_cache_init:      CUDA0 KV buffer size =   180.00 MiB
Oct 30 09:21:03 220908-NB ollama[198]: llama_new_context_with_model: KV self size  =  180.00 MiB, K (f16):   90.00 MiB, V (f16):   90.00 MiB
Oct 30 09:21:03 220908-NB ollama[198]: llama_new_context_with_model:  CUDA_Host  output buffer size =     0.76 MiB
Oct 30 09:21:03 220908-NB ollama[198]: llama_new_context_with_model:      CUDA0 compute buffer size =   164.50 MiB
Oct 30 09:21:03 220908-NB ollama[198]: llama_new_context_with_model:  CUDA_Host compute buffer size =    17.13 MiB
Oct 30 09:21:03 220908-NB ollama[198]: llama_new_context_with_model: graph nodes  = 966
Oct 30 09:21:03 220908-NB ollama[198]: llama_new_context_with_model: graph splits = 2
Oct 30 09:21:03 220908-NB ollama[198]: time=2024-10-30T09:21:03.573+08:00 level=INFO source=llama-server.go:573 msg="llama runner started in 1.26 seconds"
Oct 30 09:21:03 220908-NB ollama[198]: [GIN] 2024/10/30 - 09:21:03 | 200 |  1.535215335s |       127.0.0.1 | POST     "/api/generate"
Oct 30 09:21:08 220908-NB ollama[198]: CUDA error: CUBLAS_STATUS_NOT_SUPPORTED
Oct 30 09:21:08 220908-NB ollama[198]:   current device: 0, in function ggml_cuda_mul_mat_batched_cublas at ggml-cuda.cu:1926
Oct 30 09:21:08 220908-NB ollama[198]:   cublasGemmBatchedEx(ctx.cublas_handle(), CUBLAS_OP_T, CUBLAS_OP_N, ne01, ne11, ne10, alpha, (const void **) (ptrs_src.get() + 0*ne23), CUDA_R_16F, nb01/nb00, (const void **) (ptrs_src.get() + 1*ne23), CUDA_R_16F, nb11/nb10, beta, ( void **) (ptrs_dst.get() + 0*ne23), cu_data_type, ne01, ne23, cu_compute_type, CUBLAS_GEMM_DEFAULT_TENSOR_OP)
Oct 30 09:21:08 220908-NB ollama[198]: ggml-cuda.cu:132: CUDA error
Oct 30 09:21:08 220908-NB ollama[198]: /tmp/ollama2316810848/runners/cuda_v11/ollama_llama_server(+0x3a3688)[0x5572fdae9688]
Oct 30 09:21:08 220908-NB ollama[198]: /tmp/ollama2316810848/runners/cuda_v11/ollama_llama_server(ggml_abort+0x136)[0x5572fdaeafb6]
Oct 30 09:21:08 220908-NB ollama[198]: /usr/local/lib/ollama/libggml_cuda_v11.so(+0x36a52)[0x7f21a8c6fa52]
Oct 30 09:21:08 220908-NB ollama[198]: /usr/local/lib/ollama/libggml_cuda_v11.so(+0x3936c)[0x7f21a8c7236c]
Oct 30 09:21:08 220908-NB ollama[198]: /usr/local/lib/ollama/libggml_cuda_v11.so(+0x4408e)[0x7f21a8c7d08e]
Oct 30 09:21:08 220908-NB ollama[198]: /tmp/ollama2316810848/runners/cuda_v11/ollama_llama_server(ggml_backend_sched_graph_compute_async+0x181)[0x5572fdad3971]
Oct 30 09:21:08 220908-NB ollama[198]: /tmp/ollama2316810848/runners/cuda_v11/ollama_llama_server(llama_decode+0x5f1)[0x5572fdbb60a1]
Oct 30 09:21:08 220908-NB ollama[198]: /tmp/ollama2316810848/runners/cuda_v11/ollama_llama_server(_cgo_aa09f91ec3e0_Cfunc_llama_decode+0x4f)[0x5572fdacb83f]
Oct 30 09:21:08 220908-NB ollama[198]: /tmp/ollama2316810848/runners/cuda_v11/ollama_llama_server(+0x171961)[0x5572fd8b7961]
Oct 30 09:21:08 220908-NB ollama[198]: SIGABRT: abort
Oct 30 09:21:08 220908-NB ollama[198]: PC=0x7f218fe419fc m=3 sigcode=18446744073709551610
Oct 30 09:21:08 220908-NB ollama[198]: signal arrived during cgo execution
Oct 30 09:21:08 220908-NB ollama[198]: goroutine 20 gp=0xc000102a80 m=3 mp=0xc000059008 [syscall]:
Oct 30 09:21:08 220908-NB ollama[198]: runtime.cgocall(0x5572fdacb7f0, 0xc000204ad8)
Oct 30 09:21:08 220908-NB ollama[198]:         runtime/cgocall.go:157 +0x4b fp=0xc000204ab0 sp=0xc000204a78 pc=0x5572fd84f2cb
Oct 30 09:21:08 220908-NB ollama[198]: github.com/ollama/ollama/llama._Cfunc_llama_decode(0x7f21340062d0, {0x1, 0x7f21341773f0, 0x0, 0x7f2134179400, 0x7f213417b410, 0x7f21342f4ca0, 0x7f21341402e0, 0x0, 0x0, ...})
Oct 30 09:21:08 220908-NB ollama[198]:         _cgo_gotypes.go:512 +0x4f fp=0xc000204ad8 sp=0xc000204ab0 pc=0x5572fd94cb0f
Oct 30 09:21:08 220908-NB ollama[198]: github.com/ollama/ollama/llama.(*Context).Decode.func1(0x5572fd85f178?, 0x15?)
Oct 30 09:21:08 220908-NB ollama[198]:         github.com/ollama/ollama/llama/llama.go:124 +0x11e fp=0xc000204be0 sp=0xc000204ad8 pc=0x5572fd94e9be
Oct 30 09:21:08 220908-NB ollama[198]: github.com/ollama/ollama/llama.(*Context).Decode(0xc000242280?, 0xc000204ef8?)
Oct 30 09:21:08 220908-NB ollama[198]:         github.com/ollama/ollama/llama/llama.go:124 +0x17 fp=0xc000204c28 sp=0xc000204be0 pc=0x5572fd94e7d7
Oct 30 09:21:08 220908-NB ollama[198]: main.(*Server).processBatch(0xc000144120, 0xc000204ef8, 0xc000204e90)
Oct 30 09:21:08 220908-NB ollama[198]:         github.com/ollama/ollama/llama/runner/runner.go:434 +0x285 fp=0xc000204df8 sp=0xc000204c28 pc=0x5572fdac69a5
Oct 30 09:21:08 220908-NB ollama[198]: main.(*Server).run(0xc000144120, {0x5572fde08cc8, 0xc000180000})
Oct 30 09:21:08 220908-NB ollama[198]:         github.com/ollama/ollama/llama/runner/runner.go:352 +0x359 fp=0xc000204fb8 sp=0xc000204df8 pc=0x5572fdac6279
Oct 30 09:21:08 220908-NB ollama[198]: main.main.gowrap2()
Oct 30 09:21:08 220908-NB ollama[198]:         github.com/ollama/ollama/llama/runner/runner.go:907 +0x28 fp=0xc000204fe0 sp=0xc000204fb8 pc=0x5572fdaca988
Oct 30 09:21:08 220908-NB ollama[198]: runtime.goexit({})
Oct 30 09:21:08 220908-NB ollama[198]:         runtime/asm_amd64.s:1695 +0x1 fp=0xc000204fe8 sp=0xc000204fe0 pc=0x5572fd8b7ce1
Oct 30 09:21:08 220908-NB ollama[198]: created by main.main in goroutine 1
Oct 30 09:21:08 220908-NB ollama[198]:         github.com/ollama/ollama/llama/runner/runner.go:907 +0xcab
Oct 30 09:21:08 220908-NB ollama[198]: goroutine 1 gp=0xc0000061c0 m=nil [IO wait]:
Oct 30 09:21:08 220908-NB ollama[198]: runtime.gopark(0x1?, 0xc00002b908?, 0xf4?, 0x5c?, 0xc00002b8e8?)
Oct 30 09:21:08 220908-NB ollama[198]:         runtime/proc.go:402 +0xce fp=0xc00002b888 sp=0xc00002b868 pc=0x5572fd885f0e
Oct 30 09:21:08 220908-NB ollama[198]: runtime.netpollblock(0x10?, 0xfd84ea26?, 0x72?)
Oct 30 09:21:08 220908-NB ollama[198]:         runtime/netpoll.go:573 +0xf7 fp=0xc00002b8c0 sp=0xc00002b888 pc=0x5572fd87e157
Oct 30 09:21:08 220908-NB ollama[198]: internal/poll.runtime_pollWait(0x7f21a8457f50, 0x72)
Oct 30 09:21:08 220908-NB ollama[198]:         runtime/netpoll.go:345 +0x85 fp=0xc00002b8e0 sp=0xc00002b8c0 pc=0x5572fd8b29a5
Oct 30 09:21:08 220908-NB ollama[198]: internal/poll.(*pollDesc).wait(0x3?, 0x7f21a8452d48?, 0x0)
Oct 30 09:21:08 220908-NB ollama[198]:         internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc00002b908 sp=0xc00002b8e0 pc=0x5572fd902c87
Oct 30 09:21:08 220908-NB ollama[198]: internal/poll.(*pollDesc).waitRead(...)
Oct 30 09:21:08 220908-NB ollama[198]:         internal/poll/fd_poll_runtime.go:89
Oct 30 09:21:08 220908-NB ollama[198]: internal/poll.(*FD).Accept(0xc00017c080)
Oct 30 09:21:08 220908-NB ollama[198]:         internal/poll/fd_unix.go:611 +0x2ac fp=0xc00002b9b0 sp=0xc00002b908 pc=0x5572fd90414c
Oct 30 09:21:08 220908-NB ollama[198]: net.(*netFD).accept(0xc00017c080)
Oct 30 09:21:08 220908-NB ollama[198]:         net/fd_unix.go:172 +0x29 fp=0xc00002ba68 sp=0xc00002b9b0 pc=0x5572fd971789
Oct 30 09:21:08 220908-NB ollama[198]: net.(*TCPListener).accept(0xc0001461c0)
Oct 30 09:21:08 220908-NB ollama[198]:         net/tcpsock_posix.go:159 +0x1e fp=0xc00002ba90 sp=0xc00002ba68 pc=0x5572fd9824be
Oct 30 09:21:08 220908-NB ollama[198]: net.(*TCPListener).Accept(0xc0001461c0)
Oct 30 09:21:08 220908-NB ollama[198]:         net/tcpsock.go:327 +0x30 fp=0xc00002bac0 sp=0xc00002ba90 pc=0x5572fd981810
Oct 30 09:21:08 220908-NB ollama[198]: net/http.(*onceCloseListener).Accept(0xc00021a000?)
Oct 30 09:21:08 220908-NB ollama[198]:         <autogenerated>:1 +0x24 fp=0xc00002bad8 sp=0xc00002bac0 pc=0x5572fdaa8924
Oct 30 09:21:08 220908-NB ollama[198]: net/http.(*Server).Serve(0xc000196000, {0x5572fde086c0, 0xc0001461c0})
Oct 30 09:21:08 220908-NB ollama[198]:         net/http/server.go:3260 +0x33e fp=0xc00002bc08 sp=0xc00002bad8 pc=0x5572fda9f73e
Oct 30 09:21:08 220908-NB ollama[198]: main.main()
Oct 30 09:21:08 220908-NB ollama[198]:         github.com/ollama/ollama/llama/runner/runner.go:927 +0x104c fp=0xc00002bf50 sp=0xc00002bc08 pc=0x5572fdaca70c
Oct 30 09:21:08 220908-NB ollama[198]: runtime.main()
Oct 30 09:21:08 220908-NB ollama[198]:         runtime/proc.go:271 +0x29d fp=0xc00002bfe0 sp=0xc00002bf50 pc=0x5572fd885add
Oct 30 09:21:08 220908-NB ollama[198]: runtime.goexit({})
Oct 30 09:21:08 220908-NB ollama[198]:         runtime/asm_amd64.s:1695 +0x1 fp=0xc00002bfe8 sp=0xc00002bfe0 pc=0x5572fd8b7ce1
Oct 30 09:21:08 220908-NB ollama[198]: goroutine 2 gp=0xc000006c40 m=nil [force gc (idle)]:
Oct 30 09:21:08 220908-NB ollama[198]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
Oct 30 09:21:08 220908-NB ollama[198]:         runtime/proc.go:402 +0xce fp=0xc000052fa8 sp=0xc000052f88 pc=0x5572fd885f0e
Oct 30 09:21:08 220908-NB ollama[198]: runtime.goparkunlock(...)
Oct 30 09:21:08 220908-NB ollama[198]:         runtime/proc.go:408
Oct 30 09:21:08 220908-NB ollama[198]: runtime.forcegchelper()
Oct 30 09:21:08 220908-NB ollama[198]:         runtime/proc.go:326 +0xb8 fp=0xc000052fe0 sp=0xc000052fa8 pc=0x5572fd885d98
Oct 30 09:21:08 220908-NB ollama[198]: runtime.goexit({})
Oct 30 09:21:08 220908-NB ollama[198]:         runtime/asm_amd64.s:1695 +0x1 fp=0xc000052fe8 sp=0xc000052fe0 pc=0x5572fd8b7ce1
Oct 30 09:21:08 220908-NB ollama[198]: created by runtime.init.6 in goroutine 1
Oct 30 09:21:08 220908-NB ollama[198]:         runtime/proc.go:314 +0x1a
Oct 30 09:21:08 220908-NB ollama[198]: goroutine 3 gp=0xc000007180 m=nil [GC sweep wait]:
Oct 30 09:21:08 220908-NB ollama[198]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
Oct 30 09:21:08 220908-NB ollama[198]:         runtime/proc.go:402 +0xce fp=0xc000053780 sp=0xc000053760 pc=0x5572fd885f0e
Oct 30 09:21:08 220908-NB ollama[198]: runtime.goparkunlock(...)
Oct 30 09:21:08 220908-NB ollama[198]:         runtime/proc.go:408
Oct 30 09:21:08 220908-NB ollama[198]: runtime.bgsweep(0xc00007c000)
Oct 30 09:21:08 220908-NB ollama[198]:         runtime/mgcsweep.go:278 +0x94 fp=0xc0000537c8 sp=0xc000053780 pc=0x5572fd870a54
Oct 30 09:21:08 220908-NB ollama[198]: runtime.gcenable.gowrap1()
Oct 30 09:21:08 220908-NB ollama[198]:         runtime/mgc.go:203 +0x25 fp=0xc0000537e0 sp=0xc0000537c8 pc=0x5572fd865585
Oct 30 09:21:08 220908-NB ollama[198]: runtime.goexit({})
Oct 30 09:21:08 220908-NB ollama[198]:         runtime/asm_amd64.s:1695 +0x1 fp=0xc0000537e8 sp=0xc0000537e0 pc=0x5572fd8b7ce1
Oct 30 09:21:08 220908-NB ollama[198]: created by runtime.gcenable in goroutine 1
Oct 30 09:21:08 220908-NB ollama[198]:         runtime/mgc.go:203 +0x66
Oct 30 09:21:08 220908-NB ollama[198]: goroutine 4 gp=0xc000007340 m=nil [GC scavenge wait]:
Oct 30 09:21:08 220908-NB ollama[198]: runtime.gopark(0xc00007c000?, 0x5572fdd08be8?, 0x1?, 0x0?, 0xc000007340?)
Oct 30 09:21:08 220908-NB ollama[198]:         runtime/proc.go:402 +0xce fp=0xc000053f78 sp=0xc000053f58 pc=0x5572fd885f0e
Oct 30 09:21:08 220908-NB ollama[198]: runtime.goparkunlock(...)
Oct 30 09:21:08 220908-NB ollama[198]:         runtime/proc.go:408
Oct 30 09:21:08 220908-NB ollama[198]: runtime.(*scavengerState).park(0x5572fdfd54c0)
Oct 30 09:21:08 220908-NB ollama[198]:         runtime/mgcscavenge.go:425 +0x49 fp=0xc000053fa8 sp=0xc000053f78 pc=0x5572fd86e449
Oct 30 09:21:08 220908-NB ollama[198]: runtime.bgscavenge(0xc00007c000)
Oct 30 09:21:08 220908-NB ollama[198]:         runtime/mgcscavenge.go:653 +0x3c fp=0xc000053fc8 sp=0xc000053fa8 pc=0x5572fd86e9dc
Oct 30 09:21:08 220908-NB ollama[198]: runtime.gcenable.gowrap2()
Oct 30 09:21:08 220908-NB ollama[198]:         runtime/mgc.go:204 +0x25 fp=0xc000053fe0 sp=0xc000053fc8 pc=0x5572fd865525
Oct 30 09:21:08 220908-NB ollama[198]: runtime.goexit({})
Oct 30 09:21:08 220908-NB ollama[198]:         runtime/asm_amd64.s:1695 +0x1 fp=0xc000053fe8 sp=0xc000053fe0 pc=0x5572fd8b7ce1
Oct 30 09:21:08 220908-NB ollama[198]: created by runtime.gcenable in goroutine 1
Oct 30 09:21:08 220908-NB ollama[198]:         runtime/mgc.go:204 +0xa5
Oct 30 09:21:08 220908-NB ollama[198]: goroutine 18 gp=0xc000102700 m=nil [finalizer wait]:
Oct 30 09:21:08 220908-NB ollama[198]: runtime.gopark(0xc000052648?, 0x5572fd858e85?, 0xa8?, 0x1?, 0xc0000061c0?)
Oct 30 09:21:08 220908-NB ollama[198]:         runtime/proc.go:402 +0xce fp=0xc000052620 sp=0xc000052600 pc=0x5572fd885f0e
Oct 30 09:21:08 220908-NB ollama[198]: runtime.runfinq()
Oct 30 09:21:08 220908-NB ollama[198]:         runtime/mfinal.go:194 +0x107 fp=0xc0000527e0 sp=0xc000052620 pc=0x5572fd8645c7
Oct 30 09:21:08 220908-NB ollama[198]: runtime.goexit({})
Oct 30 09:21:08 220908-NB ollama[198]:         runtime/asm_amd64.s:1695 +0x1 fp=0xc0000527e8 sp=0xc0000527e0 pc=0x5572fd8b7ce1
Oct 30 09:21:08 220908-NB ollama[198]: created by runtime.createfing in goroutine 1
Oct 30 09:21:08 220908-NB ollama[198]:         runtime/mfinal.go:164 +0x3d
Oct 30 09:21:08 220908-NB ollama[198]: goroutine 34 gp=0xc000220000 m=nil [select]:
Oct 30 09:21:08 220908-NB ollama[198]: runtime.gopark(0xc000257a80?, 0x2?, 0x18?, 0x77?, 0xc000257824?)
Oct 30 09:21:08 220908-NB ollama[198]:         runtime/proc.go:402 +0xce fp=0xc000257698 sp=0xc000257678 pc=0x5572fd885f0e
Oct 30 09:21:08 220908-NB ollama[198]: runtime.selectgo(0xc000257a80, 0xc000257820, 0xc000218280?, 0x0, 0x1?, 0x1)
Oct 30 09:21:08 220908-NB ollama[198]:         runtime/select.go:327 +0x725 fp=0xc0002577b8 sp=0xc000257698 pc=0x5572fd8972e5
Oct 30 09:21:08 220908-NB ollama[198]: main.(*Server).completion(0xc000144120, {0x5572fde08870, 0xc0002361c0}, 0xc0002266c0)
Oct 30 09:21:08 220908-NB ollama[198]:         github.com/ollama/ollama/llama/runner/runner.go:659 +0x8d1 fp=0xc000257ab8 sp=0xc0002577b8 pc=0x5572fdac8111
Oct 30 09:21:08 220908-NB ollama[198]: main.(*Server).completion-fm({0x5572fde08870?, 0xc0002361c0?}, 0x5572fdaa3a6d?)
Oct 30 09:21:08 220908-NB ollama[198]:         <autogenerated>:1 +0x36 fp=0xc000257ae8 sp=0xc000257ab8 pc=0x5572fdacb076
Oct 30 09:21:08 220908-NB ollama[198]: net/http.HandlerFunc.ServeHTTP(0xc00011ea90?, {0x5572fde08870?, 0xc0002361c0?}, 0x10?)
Oct 30 09:21:08 220908-NB ollama[198]:         net/http/server.go:2171 +0x29 fp=0xc000257b10 sp=0xc000257ae8 pc=0x5572fda9c509
Oct 30 09:21:08 220908-NB ollama[198]: net/http.(*ServeMux).ServeHTTP(0x5572fd858e85?, {0x5572fde08870, 0xc0002361c0}, 0xc0002266c0)
Oct 30 09:21:08 220908-NB ollama[198]:         net/http/server.go:2688 +0x1ad fp=0xc000257b60 sp=0xc000257b10 pc=0x5572fda9e38d
Oct 30 09:21:08 220908-NB ollama[198]: net/http.serverHandler.ServeHTTP({0x5572fde07bc0?}, {0x5572fde08870?, 0xc0002361c0?}, 0x6?)
Oct 30 09:21:08 220908-NB ollama[198]:         net/http/server.go:3142 +0x8e fp=0xc000257b90 sp=0xc000257b60 pc=0x5572fda9f3ae
Oct 30 09:21:08 220908-NB ollama[198]: net/http.(*conn).serve(0xc00021a000, {0x5572fde08c90, 0xc00011cd80})
Oct 30 09:21:08 220908-NB ollama[198]:         net/http/server.go:2044 +0x5e8 fp=0xc000257fb8 sp=0xc000257b90 pc=0x5572fda9b148
Oct 30 09:21:08 220908-NB ollama[198]: net/http.(*Server).Serve.gowrap3()
Oct 30 09:21:08 220908-NB ollama[198]:         net/http/server.go:3290 +0x28 fp=0xc000257fe0 sp=0xc000257fb8 pc=0x5572fda9fb28
Oct 30 09:21:08 220908-NB ollama[198]: runtime.goexit({})
Oct 30 09:21:08 220908-NB ollama[198]:         runtime/asm_amd64.s:1695 +0x1 fp=0xc000257fe8 sp=0xc000257fe0 pc=0x5572fd8b7ce1
Oct 30 09:21:08 220908-NB ollama[198]: created by net/http.(*Server).Serve in goroutine 1
Oct 30 09:21:08 220908-NB ollama[198]:         net/http/server.go:3290 +0x4b4
Oct 30 09:21:08 220908-NB ollama[198]: goroutine 37 gp=0xc000220380 m=nil [IO wait]:
Oct 30 09:21:08 220908-NB ollama[198]: runtime.gopark(0x10?, 0x10?, 0xf0?, 0xed?, 0xb?)
Oct 30 09:21:08 220908-NB ollama[198]:         runtime/proc.go:402 +0xce fp=0xc00004eda8 sp=0xc00004ed88 pc=0x5572fd885f0e
Oct 30 09:21:08 220908-NB ollama[198]: runtime.netpollblock(0x5572fd8ec818?, 0xfd84ea26?, 0x72?)
Oct 30 09:21:08 220908-NB ollama[198]:         runtime/netpoll.go:573 +0xf7 fp=0xc00004ede0 sp=0xc00004eda8 pc=0x5572fd87e157
Oct 30 09:21:08 220908-NB ollama[198]: internal/poll.runtime_pollWait(0x7f21a8457e58, 0x72)
Oct 30 09:21:08 220908-NB ollama[198]:         runtime/netpoll.go:345 +0x85 fp=0xc00004ee00 sp=0xc00004ede0 pc=0x5572fd8b29a5
Oct 30 09:21:08 220908-NB ollama[198]: internal/poll.(*pollDesc).wait(0xc000218000?, 0xc00020e101?, 0x0)
Oct 30 09:21:08 220908-NB ollama[198]:         internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc00004ee28 sp=0xc00004ee00 pc=0x5572fd902c87
Oct 30 09:21:08 220908-NB ollama[198]: internal/poll.(*pollDesc).waitRead(...)
Oct 30 09:21:08 220908-NB ollama[198]:         internal/poll/fd_poll_runtime.go:89
Oct 30 09:21:08 220908-NB ollama[198]: internal/poll.(*FD).Read(0xc000218000, {0xc00020e101, 0x1, 0x1})
Oct 30 09:21:08 220908-NB ollama[198]:         internal/poll/fd_unix.go:164 +0x27a fp=0xc00004eec0 sp=0xc00004ee28 pc=0x5572fd9037da
Oct 30 09:21:08 220908-NB ollama[198]: net.(*netFD).Read(0xc000218000, {0xc00020e101?, 0xc00004ef48?, 0x5572fd8b45d0?})
Oct 30 09:21:08 220908-NB ollama[198]:         net/fd_posix.go:55 +0x25 fp=0xc00004ef08 sp=0xc00004eec0 pc=0x5572fd970685
Oct 30 09:21:08 220908-NB ollama[198]: net.(*conn).Read(0xc000210008, {0xc00020e101?, 0x0?, 0x5572fe0be060?})
Oct 30 09:21:08 220908-NB ollama[198]:         net/net.go:185 +0x45 fp=0xc00004ef50 sp=0xc00004ef08 pc=0x5572fd97a945
Oct 30 09:21:08 220908-NB ollama[198]: net.(*TCPConn).Read(0x5572fdf96830?, {0xc00020e101?, 0x0?, 0x0?})
Oct 30 09:21:08 220908-NB ollama[198]:         <autogenerated>:1 +0x25 fp=0xc00004ef80 sp=0xc00004ef50 pc=0x5572fd986325
Oct 30 09:21:08 220908-NB ollama[198]: net/http.(*connReader).backgroundRead(0xc00020e0f0)
Oct 30 09:21:08 220908-NB ollama[198]:         net/http/server.go:681 +0x37 fp=0xc00004efc8 sp=0xc00004ef80 pc=0x5572fda950b7
Oct 30 09:21:08 220908-NB ollama[198]: net/http.(*connReader).startBackgroundRead.gowrap2()
Oct 30 09:21:08 220908-NB ollama[198]:         net/http/server.go:677 +0x25 fp=0xc00004efe0 sp=0xc00004efc8 pc=0x5572fda94fe5
Oct 30 09:21:08 220908-NB ollama[198]: runtime.goexit({})
Oct 30 09:21:08 220908-NB ollama[198]:         runtime/asm_amd64.s:1695 +0x1 fp=0xc00004efe8 sp=0xc00004efe0 pc=0x5572fd8b7ce1
Oct 30 09:21:08 220908-NB ollama[198]: created by net/http.(*connReader).startBackgroundRead in goroutine 34
Oct 30 09:21:08 220908-NB ollama[198]:         net/http/server.go:677 +0xba
Oct 30 09:21:08 220908-NB ollama[198]: rax    0x0
Oct 30 09:21:08 220908-NB ollama[198]: rbx    0x7f2148da9000
Oct 30 09:21:08 220908-NB ollama[198]: rcx    0x7f218fe419fc
Oct 30 09:21:08 220908-NB ollama[198]: rdx    0x6
Oct 30 09:21:08 220908-NB ollama[198]: rdi    0x227
Oct 30 09:21:08 220908-NB ollama[198]: rsi    0x229
Oct 30 09:21:08 220908-NB ollama[198]: rbp    0x229
Oct 30 09:21:08 220908-NB ollama[198]: rsp    0x7f2148d9fea0
Oct 30 09:21:08 220908-NB ollama[198]: r8     0x7f2148d9ff70
Oct 30 09:21:08 220908-NB ollama[198]: r9     0x5
Oct 30 09:21:08 220908-NB ollama[198]: r10    0x8
Oct 30 09:21:08 220908-NB ollama[198]: r11    0x246
Oct 30 09:21:08 220908-NB ollama[198]: r12    0x6
Oct 30 09:21:08 220908-NB ollama[198]: r13    0x16
Oct 30 09:21:08 220908-NB ollama[198]: r14    0x7f21a8ecf0b0
Oct 30 09:21:08 220908-NB ollama[198]: r15    0x7f2102afa790
Oct 30 09:21:08 220908-NB ollama[198]: rip    0x7f218fe419fc
Oct 30 09:21:08 220908-NB ollama[198]: rflags 0x246
Oct 30 09:21:08 220908-NB ollama[198]: cs     0x33
Oct 30 09:21:08 220908-NB ollama[198]: fs     0x0
Oct 30 09:21:08 220908-NB ollama[198]: gs     0x0
Oct 30 09:21:08 220908-NB ollama[198]: [GIN] 2024/10/30 - 09:21:08 | 200 |  1.772771544s |       127.0.0.1 | POST     "/api/chat"
```
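(A hedged diagnostic step, not a confirmed fix: the abort happens inside the `cuda_v11` runner even though the driver reports CUDA 12.0 and a `cuda_v12` runner is listed under "Dynamic LLM libraries" above. The `OLLAMA_LLM_LIBRARY` variable from the server config dump can force a specific runner, which would narrow the failure down to the v11 build:)

```sh
# Hypothetical narrowing step: force a specific runner and retry.
sudo systemctl edit ollama
#   [Service]
#   Environment="OLLAMA_LLM_LIBRARY=cuda_v12"
sudo systemctl restart ollama
ollama run smollm:135m   # does the CUBLAS_STATUS_NOT_SUPPORTED abort persist?
```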
Oct 30 09:21:08 220908-NB ollama[198]: runtime/proc.go:408 Oct 30 09:21:08 220908-NB ollama[198]: runtime.(*scavengerState).park(0x5572fdfd54c0) Oct 30 09:21:08 220908-NB ollama[198]: runtime/mgcscavenge.go:425 +0x49 fp=0xc000053fa8 sp=0xc000053f78 pc=0x5572fd86e449 Oct 30 09:21:08 220908-NB ollama[198]: runtime.bgscavenge(0xc00007c000) Oct 30 09:21:08 220908-NB ollama[198]: runtime/mgcscavenge.go:653 +0x3c fp=0xc000053fc8 sp=0xc000053fa8 pc=0x5572fd86e9dc Oct 30 09:21:08 220908-NB ollama[198]: runtime.gcenable.gowrap2() Oct 30 09:21:08 220908-NB ollama[198]: runtime/mgc.go:204 +0x25 fp=0xc000053fe0 sp=0xc000053fc8 pc=0x5572fd865525 Oct 30 09:21:08 220908-NB ollama[198]: runtime.goexit({}) Oct 30 09:21:08 220908-NB ollama[198]: runtime/asm_amd64.s:1695 +0x1 fp=0xc000053fe8 sp=0xc000053fe0 pc=0x5572fd8b7ce1 Oct 30 09:21:08 220908-NB ollama[198]: created by runtime.gcenable in goroutine 1 Oct 30 09:21:08 220908-NB ollama[198]: runtime/mgc.go:204 +0xa5 Oct 30 09:21:08 220908-NB ollama[198]: goroutine 18 gp=0xc000102700 m=nil [finalizer wait]: Oct 30 09:21:08 220908-NB ollama[198]: runtime.gopark(0xc000052648?, 0x5572fd858e85?, 0xa8?, 0x1?, 0xc0000061c0?) Oct 30 09:21:08 220908-NB ollama[198]: runtime/proc.go:402 +0xce fp=0xc000052620 sp=0xc000052600 pc=0x5572fd885f0e Oct 30 09:21:08 220908-NB ollama[198]: runtime.runfinq() Oct 30 09:21:08 220908-NB ollama[198]: runtime/mfinal.go:194 +0x107 fp=0xc0000527e0 sp=0xc000052620 pc=0x5572fd8645c7 Oct 30 09:21:08 220908-NB ollama[198]: runtime.goexit({}) Oct 30 09:21:08 220908-NB ollama[198]: runtime/asm_amd64.s:1695 +0x1 fp=0xc0000527e8 sp=0xc0000527e0 pc=0x5572fd8b7ce1 Oct 30 09:21:08 220908-NB ollama[198]: created by runtime.createfing in goroutine 1 Oct 30 09:21:08 220908-NB ollama[198]: runtime/mfinal.go:164 +0x3d Oct 30 09:21:08 220908-NB ollama[198]: goroutine 34 gp=0xc000220000 m=nil [select]: Oct 30 09:21:08 220908-NB ollama[198]: runtime.gopark(0xc000257a80?, 0x2?, 0x18?, 0x77?, 0xc000257824?) Oct 30 09:21:08 220908-NB ollama[198]: runtime/proc.go:402 +0xce fp=0xc000257698 sp=0xc000257678 pc=0x5572fd885f0e Oct 30 09:21:08 220908-NB ollama[198]: runtime.selectgo(0xc000257a80, 0xc000257820, 0xc000218280?, 0x0, 0x1?, 0x1) Oct 30 09:21:08 220908-NB ollama[198]: runtime/select.go:327 +0x725 fp=0xc0002577b8 sp=0xc000257698 pc=0x5572fd8972e5 Oct 30 09:21:08 220908-NB ollama[198]: main.(*Server).completion(0xc000144120, {0x5572fde08870, 0xc0002361c0}, 0xc0002266c0) Oct 30 09:21:08 220908-NB ollama[198]: github.com/ollama/ollama/llama/runner/runner.go:659 +0x8d1 fp=0xc000257ab8 sp=0xc0002577b8 pc=0x5572fdac8111 Oct 30 09:21:08 220908-NB ollama[198]: main.(*Server).completion-fm({0x5572fde08870?, 0xc0002361c0?}, 0x5572fdaa3a6d?) Oct 30 09:21:08 220908-NB ollama[198]: <autogenerated>:1 +0x36 fp=0xc000257ae8 sp=0xc000257ab8 pc=0x5572fdacb076 Oct 30 09:21:08 220908-NB ollama[198]: net/http.HandlerFunc.ServeHTTP(0xc00011ea90?, {0x5572fde08870?, 0xc0002361c0?}, 0x10?) Oct 30 09:21:08 220908-NB ollama[198]: net/http/server.go:2171 +0x29 fp=0xc000257b10 sp=0xc000257ae8 pc=0x5572fda9c509 Oct 30 09:21:08 220908-NB ollama[198]: net/http.(*ServeMux).ServeHTTP(0x5572fd858e85?, {0x5572fde08870, 0xc0002361c0}, 0xc0002266c0) Oct 30 09:21:08 220908-NB ollama[198]: net/http/server.go:2688 +0x1ad fp=0xc000257b60 sp=0xc000257b10 pc=0x5572fda9e38d Oct 30 09:21:08 220908-NB ollama[198]: net/http.serverHandler.ServeHTTP({0x5572fde07bc0?}, {0x5572fde08870?, 0xc0002361c0?}, 0x6?) 
Oct 30 09:21:08 220908-NB ollama[198]: net/http/server.go:3142 +0x8e fp=0xc000257b90 sp=0xc000257b60 pc=0x5572fda9f3ae Oct 30 09:21:08 220908-NB ollama[198]: net/http.(*conn).serve(0xc00021a000, {0x5572fde08c90, 0xc00011cd80}) Oct 30 09:21:08 220908-NB ollama[198]: net/http/server.go:2044 +0x5e8 fp=0xc000257fb8 sp=0xc000257b90 pc=0x5572fda9b148 Oct 30 09:21:08 220908-NB ollama[198]: net/http.(*Server).Serve.gowrap3() Oct 30 09:21:08 220908-NB ollama[198]: net/http/server.go:3290 +0x28 fp=0xc000257fe0 sp=0xc000257fb8 pc=0x5572fda9fb28 Oct 30 09:21:08 220908-NB ollama[198]: runtime.goexit({}) Oct 30 09:21:08 220908-NB ollama[198]: runtime/asm_amd64.s:1695 +0x1 fp=0xc000257fe8 sp=0xc000257fe0 pc=0x5572fd8b7ce1 Oct 30 09:21:08 220908-NB ollama[198]: created by net/http.(*Server).Serve in goroutine 1 Oct 30 09:21:08 220908-NB ollama[198]: net/http/server.go:3290 +0x4b4 Oct 30 09:21:08 220908-NB ollama[198]: goroutine 37 gp=0xc000220380 m=nil [IO wait]: Oct 30 09:21:08 220908-NB ollama[198]: runtime.gopark(0x10?, 0x10?, 0xf0?, 0xed?, 0xb?) Oct 30 09:21:08 220908-NB ollama[198]: runtime/proc.go:402 +0xce fp=0xc00004eda8 sp=0xc00004ed88 pc=0x5572fd885f0e Oct 30 09:21:08 220908-NB ollama[198]: runtime.netpollblock(0x5572fd8ec818?, 0xfd84ea26?, 0x72?) Oct 30 09:21:08 220908-NB ollama[198]: runtime/netpoll.go:573 +0xf7 fp=0xc00004ede0 sp=0xc00004eda8 pc=0x5572fd87e157 Oct 30 09:21:08 220908-NB ollama[198]: internal/poll.runtime_pollWait(0x7f21a8457e58, 0x72) Oct 30 09:21:08 220908-NB ollama[198]: runtime/netpoll.go:345 +0x85 fp=0xc00004ee00 sp=0xc00004ede0 pc=0x5572fd8b29a5 Oct 30 09:21:08 220908-NB ollama[198]: internal/poll.(*pollDesc).wait(0xc000218000?, 0xc00020e101?, 0x0) Oct 30 09:21:08 220908-NB ollama[198]: internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc00004ee28 sp=0xc00004ee00 pc=0x5572fd902c87 Oct 30 09:21:08 220908-NB ollama[198]: internal/poll.(*pollDesc).waitRead(...) 
Oct 30 09:21:08 220908-NB ollama[198]: internal/poll/fd_poll_runtime.go:89 Oct 30 09:21:08 220908-NB ollama[198]: internal/poll.(*FD).Read(0xc000218000, {0xc00020e101, 0x1, 0x1}) Oct 30 09:21:08 220908-NB ollama[198]: internal/poll/fd_unix.go:164 +0x27a fp=0xc00004eec0 sp=0xc00004ee28 pc=0x5572fd9037da Oct 30 09:21:08 220908-NB ollama[198]: net.(*netFD).Read(0xc000218000, {0xc00020e101?, 0xc00004ef48?, 0x5572fd8b45d0?}) Oct 30 09:21:08 220908-NB ollama[198]: net/fd_posix.go:55 +0x25 fp=0xc00004ef08 sp=0xc00004eec0 pc=0x5572fd970685 Oct 30 09:21:08 220908-NB ollama[198]: net.(*conn).Read(0xc000210008, {0xc00020e101?, 0x0?, 0x5572fe0be060?}) Oct 30 09:21:08 220908-NB ollama[198]: net/net.go:185 +0x45 fp=0xc00004ef50 sp=0xc00004ef08 pc=0x5572fd97a945 Oct 30 09:21:08 220908-NB ollama[198]: net.(*TCPConn).Read(0x5572fdf96830?, {0xc00020e101?, 0x0?, 0x0?}) Oct 30 09:21:08 220908-NB ollama[198]: <autogenerated>:1 +0x25 fp=0xc00004ef80 sp=0xc00004ef50 pc=0x5572fd986325 Oct 30 09:21:08 220908-NB ollama[198]: net/http.(*connReader).backgroundRead(0xc00020e0f0) Oct 30 09:21:08 220908-NB ollama[198]: net/http/server.go:681 +0x37 fp=0xc00004efc8 sp=0xc00004ef80 pc=0x5572fda950b7 Oct 30 09:21:08 220908-NB ollama[198]: net/http.(*connReader).startBackgroundRead.gowrap2() Oct 30 09:21:08 220908-NB ollama[198]: net/http/server.go:677 +0x25 fp=0xc00004efe0 sp=0xc00004efc8 pc=0x5572fda94fe5 Oct 30 09:21:08 220908-NB ollama[198]: runtime.goexit({}) Oct 30 09:21:08 220908-NB ollama[198]: runtime/asm_amd64.s:1695 +0x1 fp=0xc00004efe8 sp=0xc00004efe0 pc=0x5572fd8b7ce1 Oct 30 09:21:08 220908-NB ollama[198]: created by net/http.(*connReader).startBackgroundRead in goroutine 34 Oct 30 09:21:08 220908-NB ollama[198]: net/http/server.go:677 +0xba Oct 30 09:21:08 220908-NB ollama[198]: rax 0x0 Oct 30 09:21:08 220908-NB ollama[198]: rbx 0x7f2148da9000 Oct 30 09:21:08 220908-NB ollama[198]: rcx 0x7f218fe419fc Oct 30 09:21:08 220908-NB ollama[198]: rdx 0x6 Oct 30 09:21:08 220908-NB ollama[198]: rdi 0x227 Oct 30 09:21:08 220908-NB ollama[198]: rsi 0x229 Oct 30 09:21:08 220908-NB ollama[198]: rbp 0x229 Oct 30 09:21:08 220908-NB ollama[198]: rsp 0x7f2148d9fea0 Oct 30 09:21:08 220908-NB ollama[198]: r8 0x7f2148d9ff70 Oct 30 09:21:08 220908-NB ollama[198]: r9 0x5 Oct 30 09:21:08 220908-NB ollama[198]: r10 0x8 Oct 30 09:21:08 220908-NB ollama[198]: r11 0x246 Oct 30 09:21:08 220908-NB ollama[198]: r12 0x6 Oct 30 09:21:08 220908-NB ollama[198]: r13 0x16 Oct 30 09:21:08 220908-NB ollama[198]: r14 0x7f21a8ecf0b0 Oct 30 09:21:08 220908-NB ollama[198]: r15 0x7f2102afa790 Oct 30 09:21:08 220908-NB ollama[198]: rip 0x7f218fe419fc Oct 30 09:21:08 220908-NB ollama[198]: rflags 0x246 Oct 30 09:21:08 220908-NB ollama[198]: cs 0x33 Oct 30 09:21:08 220908-NB ollama[198]: fs 0x0 Oct 30 09:21:08 220908-NB ollama[198]: gs 0x0 Oct 30 09:21:08 220908-NB ollama[198]: [GIN] 2024/10/30 - 09:21:08 | 200 | 1.772771544s | 127.0.0.1 | POST "/api/chat" ```
Author
Owner

@rick-github commented on GitHub (Oct 30, 2024):

If I use cuda_v11, I see this error; if I use cuda_v12, it works fine. Nvidia driver version 525.* has some issues with CUDA 12.0 (#6556), so my recommendation is to upgrade your Nvidia driver.
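
One way to test this directly, assuming a systemd-managed install, is to pin the runner with the `OLLAMA_LLM_LIBRARY` variable that appears in the server-config dump in the logs (the runner names come from the "Dynamic LLM libraries" line). A minimal sketch:

```
# Sketch only: force a specific runner so cuda_v11 vs cuda_v12 can be compared.
sudo systemctl edit ollama        # add under [Service]:
                                  #   Environment="OLLAMA_LLM_LIBRARY=cuda_v11"
sudo systemctl restart ollama
ollama run smollm:135m "hi"       # crashes with the v11 runner for me
# Switch the override to cuda_v12 and restart; the same prompt completes fine.
```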

Author
Owner

@cool9203 commented on GitHub (Nov 1, 2024):

I upgraded the Nvidia driver and still get the error.
I tried CUDA 12.3 and 12.2; the error is the same every time.

By the way, my GPU is not installed directly in the PC; it is attached externally over Thunderbolt. I hope that doesn't affect anything.

smollm:135m and smollm:360m hit this error, while smollm:1.7b and llama3.1 do not.
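
A minimal way to reproduce the per-model difference, assuming the default server address from the log below (127.0.0.1:11434), is to send the same prompt to each tag:

```
# Sketch: hit /api/generate (the endpoint shown in the logs) once per tag.
# Only smollm:135m and smollm:360m abort the runner here.
for m in smollm:135m smollm:360m smollm:1.7b llama3.1; do
  echo "== $m =="
  curl -s http://127.0.0.1:11434/api/generate \
       -d "{\"model\":\"$m\",\"prompt\":\"hello\",\"stream\":false}" | head -c 300
  echo
done
```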

Log (ollama version 0.4.0-rc5, CUDA 12.3):

Nov 01 14:35:55 220908-NB systemd[1]: Started Ollama Service.
Nov 01 14:35:55 220908-NB ollama[2592]: 2024/11/01 14:35:55 routes.go:1170: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/usr/share/ollama/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://*] OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
Nov 01 14:35:55 220908-NB ollama[2592]: time=2024-11-01T14:35:55.952+08:00 level=INFO source=images.go:754 msg="total blobs: 13"
Nov 01 14:35:55 220908-NB ollama[2592]: time=2024-11-01T14:35:55.952+08:00 level=INFO source=images.go:761 msg="total unused blobs removed: 0"
Nov 01 14:35:55 220908-NB ollama[2592]: time=2024-11-01T14:35:55.952+08:00 level=INFO source=routes.go:1217 msg="Listening on 127.0.0.1:11434 (version 0.4.0-rc5)"
Nov 01 14:35:55 220908-NB ollama[2592]: time=2024-11-01T14:35:55.953+08:00 level=INFO source=common.go:168 msg="extracting embedded files" dir=/tmp/ollama2178342259/runners
Nov 01 14:35:56 220908-NB ollama[2592]: time=2024-11-01T14:35:56.045+08:00 level=INFO source=common.go:82 msg="Dynamic LLM libraries" runners="[cuda_v11 cuda_v12 rocm cpu cpu_avx cpu_avx2]"
Nov 01 14:35:56 220908-NB ollama[2592]: time=2024-11-01T14:35:56.045+08:00 level=INFO source=gpu.go:221 msg="looking for compatible GPUs"
Nov 01 14:35:58 220908-NB ollama[2592]: time=2024-11-01T14:35:58.131+08:00 level=INFO source=types.go:123 msg="inference compute" id=GPU-fa0dce43-ece7-81bd-c88c-48d046c522e5 library=cuda variant=v12 compute=8.6 driver=12.3 name="NVIDIA GeForce RTX 3060 Ti" total="8.0 GiB" available="7.0 GiB"
Nov 01 14:35:58 220908-NB ollama[2592]: time=2024-11-01T14:35:58.132+08:00 level=INFO source=types.go:123 msg="inference compute" id=GPU-b438b072-bc61-cae2-d728-b342033cda6a library=cuda variant=v12 compute=8.6 driver=12.3 name="NVIDIA GeForce RTX 3060 Ti" total="8.0 GiB" available="7.0 GiB"
Nov 01 14:36:31 220908-NB ollama[2592]: [GIN] 2024/11/01 - 14:36:31 | 200 |      44.799µs |       127.0.0.1 | HEAD     "/"
Nov 01 14:36:31 220908-NB ollama[2592]: [GIN] 2024/11/01 - 14:36:31 | 200 |     5.28753ms |       127.0.0.1 | POST     "/api/show"
Nov 01 14:36:31 220908-NB ollama[2592]: time=2024-11-01T14:36:31.305+08:00 level=INFO source=sched.go:715 msg="new model will fit in available VRAM in single GPU, loading" model=/usr/share/ollama/.ollama/models/blobs/sha256-eb2c714d40d4b35ba4b8ee98475a06d51d8080a17d2d2a75a23665985c739b94 gpu=GPU-fa0dce43-ece7-81bd-c88c-48d046c522e5 parallel=4 available=7488929792 required="895.2 MiB"
Nov 01 14:36:31 220908-NB ollama[2592]: time=2024-11-01T14:36:31.520+08:00 level=INFO source=llama-server.go:72 msg="system memory" total="7.6 GiB" free="6.8 GiB" free_swap="20.0 GiB"
Nov 01 14:36:31 220908-NB ollama[2592]: time=2024-11-01T14:36:31.521+08:00 level=INFO source=memory.go:346 msg="offload to cuda" layers.requested=-1 layers.model=31 layers.offload=31 layers.split="" memory.available="[7.0 GiB]" memory.gpu_overhead="0 B" memory.required.full="895.2 MiB" memory.required.partial="895.2 MiB" memory.required.kv="180.0 MiB" memory.required.allocations="[895.2 MiB]" memory.weights.total="237.1 MiB" memory.weights.repeating="208.4 MiB" memory.weights.nonrepeating="28.7 MiB" memory.graph.full="164.5 MiB" memory.graph.partial="168.4 MiB"
Nov 01 14:36:31 220908-NB ollama[2592]: time=2024-11-01T14:36:31.521+08:00 level=INFO source=llama-server.go:355 msg="starting llama server" cmd="/tmp/ollama2178342259/runners/cuda_v12/ollama_llama_server --model /usr/share/ollama/.ollama/models/blobs/sha256-eb2c714d40d4b35ba4b8ee98475a06d51d8080a17d2d2a75a23665985c739b94 --ctx-size 8192 --batch-size 512 --embedding --n-gpu-layers 31 --threads 4 --parallel 4 --port 44793"
Nov 01 14:36:31 220908-NB ollama[2592]: time=2024-11-01T14:36:31.521+08:00 level=INFO source=sched.go:450 msg="loaded runners" count=1
Nov 01 14:36:31 220908-NB ollama[2592]: time=2024-11-01T14:36:31.521+08:00 level=INFO source=llama-server.go:534 msg="waiting for llama runner to start responding"
Nov 01 14:36:31 220908-NB ollama[2592]: time=2024-11-01T14:36:31.521+08:00 level=INFO source=llama-server.go:568 msg="waiting for server to become available" status="llm server error"
Nov 01 14:36:31 220908-NB ollama[2592]: time=2024-11-01T14:36:31.551+08:00 level=INFO source=runner.go:869 msg="starting go runner"
Nov 01 14:36:31 220908-NB ollama[2592]: time=2024-11-01T14:36:31.551+08:00 level=INFO source=.:0 msg="Server listening on 127.0.0.1:44793"
Nov 01 14:36:31 220908-NB ollama[2592]: llama_model_loader: loaded meta data with 39 key-value pairs and 272 tensors from /usr/share/ollama/.ollama/models/blobs/sha256-eb2c714d40d4b35ba4b8ee98475a06d51d8080a17d2d2a75a23665985c739b94 (version GGUF V3 (latest))
Nov 01 14:36:31 220908-NB ollama[2592]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
Nov 01 14:36:31 220908-NB ollama[2592]: llama_model_loader: - kv   0:                       general.architecture str              = llama
Nov 01 14:36:31 220908-NB ollama[2592]: llama_model_loader: - kv   1:                               general.type str              = model
Nov 01 14:36:31 220908-NB ollama[2592]: llama_model_loader: - kv   2:                               general.name str              = SmolLM 135M
Nov 01 14:36:31 220908-NB ollama[2592]: llama_model_loader: - kv   3:                       general.organization str              = HuggingFaceTB
Nov 01 14:36:31 220908-NB ollama[2592]: llama_model_loader: - kv   4:                           general.finetune str              = Instruct
Nov 01 14:36:31 220908-NB ollama[2592]: llama_model_loader: - kv   5:                           general.basename str              = SmolLM
Nov 01 14:36:31 220908-NB ollama[2592]: llama_model_loader: - kv   6:                         general.size_label str              = 135M
Nov 01 14:36:31 220908-NB ollama[2592]: llama_model_loader: - kv   7:                            general.license str              = apache-2.0
Nov 01 14:36:31 220908-NB ollama[2592]: llama_model_loader: - kv   8:                   general.base_model.count u32              = 1
Nov 01 14:36:31 220908-NB ollama[2592]: llama_model_loader: - kv   9:                  general.base_model.0.name str              = SmolLM 135M
Nov 01 14:36:31 220908-NB ollama[2592]: llama_model_loader: - kv  10:          general.base_model.0.organization str              = HuggingFaceTB
Nov 01 14:36:31 220908-NB ollama[2592]: llama_model_loader: - kv  11:              general.base_model.0.repo_url str              = https://huggingface.co/HuggingFaceTB/...
Nov 01 14:36:31 220908-NB ollama[2592]: llama_model_loader: - kv  12:                               general.tags arr[str,3]       = ["alignment-handbook", "trl", "sft"]
Nov 01 14:36:31 220908-NB ollama[2592]: llama_model_loader: - kv  13:                          general.languages arr[str,1]       = ["en"]
Nov 01 14:36:31 220908-NB ollama[2592]: llama_model_loader: - kv  14:                           general.datasets arr[str,4]       = ["Magpie-Align/Magpie-Pro-300K-Filter...
Nov 01 14:36:31 220908-NB ollama[2592]: llama_model_loader: - kv  15:                          llama.block_count u32              = 30
Nov 01 14:36:31 220908-NB ollama[2592]: llama_model_loader: - kv  16:                       llama.context_length u32              = 2048
Nov 01 14:36:31 220908-NB ollama[2592]: llama_model_loader: - kv  17:                     llama.embedding_length u32              = 576
Nov 01 14:36:31 220908-NB ollama[2592]: llama_model_loader: - kv  18:                  llama.feed_forward_length u32              = 1536
Nov 01 14:36:31 220908-NB ollama[2592]: llama_model_loader: - kv  19:                 llama.attention.head_count u32              = 9
Nov 01 14:36:31 220908-NB ollama[2592]: llama_model_loader: - kv  20:              llama.attention.head_count_kv u32              = 3
Nov 01 14:36:31 220908-NB ollama[2592]: llama_model_loader: - kv  21:                       llama.rope.freq_base f32              = 10000.000000
Nov 01 14:36:31 220908-NB ollama[2592]: llama_model_loader: - kv  22:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
Nov 01 14:36:31 220908-NB ollama[2592]: llama_model_loader: - kv  23:                          general.file_type u32              = 2
Nov 01 14:36:31 220908-NB ollama[2592]: llama_model_loader: - kv  24:                           llama.vocab_size u32              = 49152
Nov 01 14:36:31 220908-NB ollama[2592]: llama_model_loader: - kv  25:                 llama.rope.dimension_count u32              = 64
Nov 01 14:36:31 220908-NB ollama[2592]: llama_model_loader: - kv  26:            tokenizer.ggml.add_space_prefix bool             = false
Nov 01 14:36:31 220908-NB ollama[2592]: llama_model_loader: - kv  27:               tokenizer.ggml.add_bos_token bool             = false
Nov 01 14:36:31 220908-NB ollama[2592]: llama_model_loader: - kv  28:                       tokenizer.ggml.model str              = gpt2
Nov 01 14:36:31 220908-NB ollama[2592]: llama_model_loader: - kv  29:                         tokenizer.ggml.pre str              = smollm
Nov 01 14:36:31 220908-NB ollama[2592]: llama_model_loader: - kv  30:                      tokenizer.ggml.tokens arr[str,49152]   = ["<|endoftext|>", "<|im_start|>", "<|...
Nov 01 14:36:31 220908-NB ollama[2592]: llama_model_loader: - kv  31:                  tokenizer.ggml.token_type arr[i32,49152]   = [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, ...
Nov 01 14:36:31 220908-NB ollama[2592]: llama_model_loader: - kv  32:                      tokenizer.ggml.merges arr[str,48900]   = ["Ġ t", "Ġ a", "i n", "h e", "Ġ Ġ...
Nov 01 14:36:31 220908-NB ollama[2592]: llama_model_loader: - kv  33:                tokenizer.ggml.bos_token_id u32              = 1
Nov 01 14:36:31 220908-NB ollama[2592]: llama_model_loader: - kv  34:                tokenizer.ggml.eos_token_id u32              = 2
Nov 01 14:36:31 220908-NB ollama[2592]: llama_model_loader: - kv  35:            tokenizer.ggml.unknown_token_id u32              = 0
Nov 01 14:36:31 220908-NB ollama[2592]: llama_model_loader: - kv  36:            tokenizer.ggml.padding_token_id u32              = 2
Nov 01 14:36:31 220908-NB ollama[2592]: llama_model_loader: - kv  37:                    tokenizer.chat_template str              = {% for message in messages %}{{'<|im_...
Nov 01 14:36:31 220908-NB ollama[2592]: llama_model_loader: - kv  38:               general.quantization_version u32              = 2
Nov 01 14:36:31 220908-NB ollama[2592]: llama_model_loader: - type  f32:   61 tensors
Nov 01 14:36:31 220908-NB ollama[2592]: llama_model_loader: - type q4_0:  210 tensors
Nov 01 14:36:31 220908-NB ollama[2592]: llama_model_loader: - type q8_0:    1 tensors
Nov 01 14:36:31 220908-NB ollama[2592]: llm_load_vocab: special tokens cache size = 17
Nov 01 14:36:31 220908-NB ollama[2592]: llm_load_vocab: token to piece cache size = 0.3170 MB
Nov 01 14:36:31 220908-NB ollama[2592]: llm_load_print_meta: format           = GGUF V3 (latest)
Nov 01 14:36:31 220908-NB ollama[2592]: llm_load_print_meta: arch             = llama
Nov 01 14:36:31 220908-NB ollama[2592]: llm_load_print_meta: vocab type       = BPE
Nov 01 14:36:31 220908-NB ollama[2592]: llm_load_print_meta: n_vocab          = 49152
Nov 01 14:36:31 220908-NB ollama[2592]: llm_load_print_meta: n_merges         = 48900
Nov 01 14:36:31 220908-NB ollama[2592]: llm_load_print_meta: vocab_only       = 0
Nov 01 14:36:31 220908-NB ollama[2592]: llm_load_print_meta: n_ctx_train      = 2048
Nov 01 14:36:31 220908-NB ollama[2592]: llm_load_print_meta: n_embd           = 576
Nov 01 14:36:31 220908-NB ollama[2592]: llm_load_print_meta: n_layer          = 30
Nov 01 14:36:31 220908-NB ollama[2592]: llm_load_print_meta: n_head           = 9
Nov 01 14:36:31 220908-NB ollama[2592]: llm_load_print_meta: n_head_kv        = 3
Nov 01 14:36:31 220908-NB ollama[2592]: llm_load_print_meta: n_rot            = 64
Nov 01 14:36:31 220908-NB ollama[2592]: llm_load_print_meta: n_swa            = 0
Nov 01 14:36:31 220908-NB ollama[2592]: llm_load_print_meta: n_embd_head_k    = 64
Nov 01 14:36:31 220908-NB ollama[2592]: llm_load_print_meta: n_embd_head_v    = 64
Nov 01 14:36:31 220908-NB ollama[2592]: llm_load_print_meta: n_gqa            = 3
Nov 01 14:36:31 220908-NB ollama[2592]: llm_load_print_meta: n_embd_k_gqa     = 192
Nov 01 14:36:31 220908-NB ollama[2592]: llm_load_print_meta: n_embd_v_gqa     = 192
Nov 01 14:36:31 220908-NB ollama[2592]: llm_load_print_meta: f_norm_eps       = 0.0e+00
Nov 01 14:36:31 220908-NB ollama[2592]: llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
Nov 01 14:36:31 220908-NB ollama[2592]: llm_load_print_meta: f_clamp_kqv      = 0.0e+00
Nov 01 14:36:31 220908-NB ollama[2592]: llm_load_print_meta: f_max_alibi_bias = 0.0e+00
Nov 01 14:36:31 220908-NB ollama[2592]: llm_load_print_meta: f_logit_scale    = 0.0e+00
Nov 01 14:36:31 220908-NB ollama[2592]: llm_load_print_meta: n_ff             = 1536
Nov 01 14:36:31 220908-NB ollama[2592]: llm_load_print_meta: n_expert         = 0
Nov 01 14:36:31 220908-NB ollama[2592]: llm_load_print_meta: n_expert_used    = 0
Nov 01 14:36:31 220908-NB ollama[2592]: llm_load_print_meta: causal attn      = 1
Nov 01 14:36:31 220908-NB ollama[2592]: llm_load_print_meta: pooling type     = 0
Nov 01 14:36:31 220908-NB ollama[2592]: llm_load_print_meta: rope type        = 0
Nov 01 14:36:31 220908-NB ollama[2592]: llm_load_print_meta: rope scaling     = linear
Nov 01 14:36:31 220908-NB ollama[2592]: llm_load_print_meta: freq_base_train  = 10000.0
Nov 01 14:36:31 220908-NB ollama[2592]: llm_load_print_meta: freq_scale_train = 1
Nov 01 14:36:31 220908-NB ollama[2592]: llm_load_print_meta: n_ctx_orig_yarn  = 2048
Nov 01 14:36:31 220908-NB ollama[2592]: llm_load_print_meta: rope_finetuned   = unknown
Nov 01 14:36:31 220908-NB ollama[2592]: llm_load_print_meta: ssm_d_conv       = 0
Nov 01 14:36:31 220908-NB ollama[2592]: llm_load_print_meta: ssm_d_inner      = 0
Nov 01 14:36:31 220908-NB ollama[2592]: llm_load_print_meta: ssm_d_state      = 0
Nov 01 14:36:31 220908-NB ollama[2592]: llm_load_print_meta: ssm_dt_rank      = 0
Nov 01 14:36:31 220908-NB ollama[2592]: llm_load_print_meta: ssm_dt_b_c_rms   = 0
Nov 01 14:36:31 220908-NB ollama[2592]: llm_load_print_meta: model type       = ?B
Nov 01 14:36:31 220908-NB ollama[2592]: llm_load_print_meta: model ftype      = Q4_0
Nov 01 14:36:31 220908-NB ollama[2592]: llm_load_print_meta: model params     = 134.52 M
Nov 01 14:36:31 220908-NB ollama[2592]: llm_load_print_meta: model size       = 85.77 MiB (5.35 BPW)
Nov 01 14:36:31 220908-NB ollama[2592]: llm_load_print_meta: general.name     = SmolLM 135M
Nov 01 14:36:31 220908-NB ollama[2592]: llm_load_print_meta: BOS token        = 1 '<|im_start|>'
Nov 01 14:36:31 220908-NB ollama[2592]: llm_load_print_meta: EOS token        = 2 '<|im_end|>'
Nov 01 14:36:31 220908-NB ollama[2592]: llm_load_print_meta: UNK token        = 0 '<|endoftext|>'
Nov 01 14:36:31 220908-NB ollama[2592]: llm_load_print_meta: PAD token        = 2 '<|im_end|>'
Nov 01 14:36:31 220908-NB ollama[2592]: llm_load_print_meta: LF token         = 143 'Ä'
Nov 01 14:36:31 220908-NB ollama[2592]: llm_load_print_meta: EOT token        = 0 '<|endoftext|>'
Nov 01 14:36:31 220908-NB ollama[2592]: llm_load_print_meta: EOG token        = 0 '<|endoftext|>'
Nov 01 14:36:31 220908-NB ollama[2592]: llm_load_print_meta: EOG token        = 2 '<|im_end|>'
Nov 01 14:36:31 220908-NB ollama[2592]: llm_load_print_meta: max token length = 162
Nov 01 14:36:31 220908-NB ollama[2592]: ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
Nov 01 14:36:31 220908-NB ollama[2592]: ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
Nov 01 14:36:31 220908-NB ollama[2592]: ggml_cuda_init: found 1 CUDA devices:
Nov 01 14:36:31 220908-NB ollama[2592]:   Device 0: NVIDIA GeForce RTX 3060 Ti, compute capability 8.6, VMM: yes
Nov 01 14:36:31 220908-NB ollama[2592]: time=2024-11-01T14:36:31.772+08:00 level=INFO source=llama-server.go:568 msg="waiting for server to become available" status="llm server loading model"
Nov 01 14:36:31 220908-NB ollama[2592]: llm_load_tensors: ggml ctx size =    0.25 MiB
Nov 01 14:36:31 220908-NB ollama[2592]: llm_load_tensors: offloading 30 repeating layers to GPU
Nov 01 14:36:31 220908-NB ollama[2592]: llm_load_tensors: offloading non-repeating layers to GPU
Nov 01 14:36:31 220908-NB ollama[2592]: llm_load_tensors: offloaded 31/31 layers to GPU
Nov 01 14:36:31 220908-NB ollama[2592]: llm_load_tensors:        CPU buffer size =    28.69 MiB
Nov 01 14:36:31 220908-NB ollama[2592]: llm_load_tensors:      CUDA0 buffer size =    85.82 MiB
Nov 01 14:36:31 220908-NB ollama[2592]: llama_new_context_with_model: n_ctx      = 8192
Nov 01 14:36:31 220908-NB ollama[2592]: llama_new_context_with_model: n_batch    = 2048
Nov 01 14:36:31 220908-NB ollama[2592]: llama_new_context_with_model: n_ubatch   = 512
Nov 01 14:36:31 220908-NB ollama[2592]: llama_new_context_with_model: flash_attn = 0
Nov 01 14:36:31 220908-NB ollama[2592]: llama_new_context_with_model: freq_base  = 10000.0
Nov 01 14:36:31 220908-NB ollama[2592]: llama_new_context_with_model: freq_scale = 1
Nov 01 14:36:31 220908-NB ollama[2592]: llama_kv_cache_init:      CUDA0 KV buffer size =   180.00 MiB
Nov 01 14:36:31 220908-NB ollama[2592]: llama_new_context_with_model: KV self size  =  180.00 MiB, K (f16):   90.00 MiB, V (f16):   90.00 MiB
Nov 01 14:36:31 220908-NB ollama[2592]: llama_new_context_with_model:  CUDA_Host  output buffer size =     0.76 MiB
Nov 01 14:36:31 220908-NB ollama[2592]: llama_new_context_with_model:      CUDA0 compute buffer size =   164.50 MiB
Nov 01 14:36:31 220908-NB ollama[2592]: llama_new_context_with_model:  CUDA_Host compute buffer size =    17.13 MiB
Nov 01 14:36:31 220908-NB ollama[2592]: llama_new_context_with_model: graph nodes  = 966
Nov 01 14:36:31 220908-NB ollama[2592]: llama_new_context_with_model: graph splits = 2
Nov 01 14:36:32 220908-NB ollama[2592]: time=2024-11-01T14:36:32.023+08:00 level=INFO source=llama-server.go:573 msg="llama runner started in 0.50 seconds"
Nov 01 14:36:32 220908-NB ollama[2592]: [GIN] 2024/11/01 - 14:36:32 | 200 |  1.008514607s |       127.0.0.1 | POST     "/api/generate"
Nov 01 14:36:35 220908-NB ollama[2592]: CUDA error: the requested functionality is not supported
Nov 01 14:36:35 220908-NB ollama[2592]:   current device: 0, in function ggml_cuda_mul_mat_batched_cublas at ggml-cuda.cu:1926
Nov 01 14:36:35 220908-NB ollama[2592]:   cublasGemmBatchedEx(ctx.cublas_handle(), CUBLAS_OP_T, CUBLAS_OP_N, ne01, ne11, ne10, alpha, (const void **) (ptrs_src.get() + 0*ne23), CUDA_R_16F, nb01/nb00, (const void **) (ptrs_src.get() + 1*ne23), CUDA_R_16F, nb11/nb10, beta, ( void **) (ptrs_dst.get() + 0*ne23), cu_data_type, ne01, ne23, cu_compute_type, CUBLAS_GEMM_DEFAULT_TENSOR_OP)
Nov 01 14:36:35 220908-NB ollama[2592]: ggml-cuda.cu:132: CUDA error
Nov 01 14:36:35 220908-NB ollama[2592]: /tmp/ollama2178342259/runners/cuda_v12/ollama_llama_server(+0x39e688)[0x559dff2cc688]
Nov 01 14:36:35 220908-NB ollama[2592]: /tmp/ollama2178342259/runners/cuda_v12/ollama_llama_server(ggml_abort+0x136)[0x559dff2cdfb6]
Nov 01 14:36:35 220908-NB ollama[2592]: /usr/local/lib/ollama/libggml_cuda_v12.so(+0x47892)[0x7fe10b205892]
Nov 01 14:36:35 220908-NB ollama[2592]: /usr/local/lib/ollama/libggml_cuda_v12.so(+0x4a20c)[0x7fe10b20820c]
Nov 01 14:36:35 220908-NB ollama[2592]: /usr/local/lib/ollama/libggml_cuda_v12.so(+0x54909)[0x7fe10b212909]
Nov 01 14:36:35 220908-NB ollama[2592]: /tmp/ollama2178342259/runners/cuda_v12/ollama_llama_server(ggml_backend_sched_graph_compute_async+0x181)[0x559dff2b6971]
Nov 01 14:36:35 220908-NB ollama[2592]: /tmp/ollama2178342259/runners/cuda_v12/ollama_llama_server(llama_decode+0x5f1)[0x559dff3990a1]
Nov 01 14:36:35 220908-NB ollama[2592]: /tmp/ollama2178342259/runners/cuda_v12/ollama_llama_server(_cgo_aa09f91ec3e0_Cfunc_llama_decode+0x4f)[0x559dff2ae83f]
Nov 01 14:36:35 220908-NB ollama[2592]: /tmp/ollama2178342259/runners/cuda_v12/ollama_llama_server(+0x16c961)[0x559dff09a961]
Nov 01 14:36:35 220908-NB ollama[2592]: SIGABRT: abort
Nov 01 14:36:35 220908-NB ollama[2592]: PC=0x7fe0e5a419fc m=5 sigcode=18446744073709551610
Nov 01 14:36:35 220908-NB ollama[2592]: signal arrived during cgo execution
Nov 01 14:36:35 220908-NB ollama[2592]: goroutine 7 gp=0xc0000e4000 m=5 mp=0xc000100008 [syscall]:
Nov 01 14:36:35 220908-NB ollama[2592]: runtime.cgocall(0x559dff2ae7f0, 0xc000062ad8)
Nov 01 14:36:35 220908-NB ollama[2592]:         runtime/cgocall.go:157 +0x4b fp=0xc000062ab0 sp=0xc000062a78 pc=0x559dff0322cb
Nov 01 14:36:35 220908-NB ollama[2592]: github.com/ollama/ollama/llama._Cfunc_llama_decode(0x7fe07c0062d0, {0x1, 0x7fe07c0777f0, 0x0, 0x7fe07c079800, 0x7fe07c0530e0, 0x7fe07c0550f0, 0x7fe07c067520, 0x0, 0x0, ...})
Nov 01 14:36:35 220908-NB ollama[2592]:         _cgo_gotypes.go:512 +0x4f fp=0xc000062ad8 sp=0xc000062ab0 pc=0x559dff12fb0f
Nov 01 14:36:35 220908-NB ollama[2592]: github.com/ollama/ollama/llama.(*Context).Decode.func1(0x559dff042178?, 0x15?)
Nov 01 14:36:35 220908-NB ollama[2592]:         github.com/ollama/ollama/llama/llama.go:124 +0x11e fp=0xc000062be0 sp=0xc000062ad8 pc=0x559dff1319be
Nov 01 14:36:35 220908-NB ollama[2592]: github.com/ollama/ollama/llama.(*Context).Decode(0xc0000d03c0?, 0xc000062ef8?)
Nov 01 14:36:35 220908-NB ollama[2592]:         github.com/ollama/ollama/llama/llama.go:124 +0x17 fp=0xc000062c28 sp=0xc000062be0 pc=0x559dff1317d7
Nov 01 14:36:35 220908-NB ollama[2592]: main.(*Server).processBatch(0xc0000b2120, 0xc000062ef8, 0xc000062e90)
Nov 01 14:36:35 220908-NB ollama[2592]:         github.com/ollama/ollama/llama/runner/runner.go:434 +0x285 fp=0xc000062df8 sp=0xc000062c28 pc=0x559dff2a99a5
Nov 01 14:36:35 220908-NB ollama[2592]: main.(*Server).run(0xc0000b2120, {0x559dff5e6cc8, 0xc000088050})
Nov 01 14:36:35 220908-NB ollama[2592]:         github.com/ollama/ollama/llama/runner/runner.go:352 +0x359 fp=0xc000062fb8 sp=0xc000062df8 pc=0x559dff2a9279
Nov 01 14:36:35 220908-NB ollama[2592]: main.main.gowrap2()
Nov 01 14:36:35 220908-NB ollama[2592]:         github.com/ollama/ollama/llama/runner/runner.go:907 +0x28 fp=0xc000062fe0 sp=0xc000062fb8 pc=0x559dff2ad988
Nov 01 14:36:35 220908-NB ollama[2592]: runtime.goexit({})
Nov 01 14:36:35 220908-NB ollama[2592]:         runtime/asm_amd64.s:1695 +0x1 fp=0xc000062fe8 sp=0xc000062fe0 pc=0x559dff09ace1
Nov 01 14:36:35 220908-NB ollama[2592]: created by main.main in goroutine 1
Nov 01 14:36:35 220908-NB ollama[2592]:         github.com/ollama/ollama/llama/runner/runner.go:907 +0xcab
Nov 01 14:36:35 220908-NB ollama[2592]: goroutine 1 gp=0xc0000061c0 m=nil [IO wait]:
Nov 01 14:36:35 220908-NB ollama[2592]: runtime.gopark(0x1?, 0xc00002b908?, 0xf4?, 0x8c?, 0xc00002b8e8?)
Nov 01 14:36:35 220908-NB ollama[2592]:         runtime/proc.go:402 +0xce fp=0xc00002b888 sp=0xc00002b868 pc=0x559dff068f0e
Nov 01 14:36:35 220908-NB ollama[2592]: runtime.netpollblock(0x10?, 0xff031a26?, 0x9d?)
Nov 01 14:36:35 220908-NB ollama[2592]:         runtime/netpoll.go:573 +0xf7 fp=0xc00002b8c0 sp=0xc00002b888 pc=0x559dff061157
Nov 01 14:36:35 220908-NB ollama[2592]: internal/poll.runtime_pollWait(0x7fe103d88020, 0x72)
Nov 01 14:36:35 220908-NB ollama[2592]:         runtime/netpoll.go:345 +0x85 fp=0xc00002b8e0 sp=0xc00002b8c0 pc=0x559dff0959a5
Nov 01 14:36:35 220908-NB ollama[2592]: internal/poll.(*pollDesc).wait(0x3?, 0x7fe103dc3de8?, 0x0)
Nov 01 14:36:35 220908-NB ollama[2592]:         internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc00002b908 sp=0xc00002b8e0 pc=0x559dff0e5c87
Nov 01 14:36:35 220908-NB ollama[2592]: internal/poll.(*pollDesc).waitRead(...)
Nov 01 14:36:35 220908-NB ollama[2592]:         internal/poll/fd_poll_runtime.go:89
Nov 01 14:36:35 220908-NB ollama[2592]: internal/poll.(*FD).Accept(0xc0000e0080)
Nov 01 14:36:35 220908-NB ollama[2592]:         internal/poll/fd_unix.go:611 +0x2ac fp=0xc00002b9b0 sp=0xc00002b908 pc=0x559dff0e714c
Nov 01 14:36:35 220908-NB ollama[2592]: net.(*netFD).accept(0xc0000e0080)
Nov 01 14:36:35 220908-NB ollama[2592]:         net/fd_unix.go:172 +0x29 fp=0xc00002ba68 sp=0xc00002b9b0 pc=0x559dff154789
Nov 01 14:36:35 220908-NB ollama[2592]: net.(*TCPListener).accept(0xc0000721e0)
Nov 01 14:36:35 220908-NB ollama[2592]:         net/tcpsock_posix.go:159 +0x1e fp=0xc00002ba90 sp=0xc00002ba68 pc=0x559dff1654be
Nov 01 14:36:35 220908-NB ollama[2592]: net.(*TCPListener).Accept(0xc0000721e0)
Nov 01 14:36:35 220908-NB ollama[2592]:         net/tcpsock.go:327 +0x30 fp=0xc00002bac0 sp=0xc00002ba90 pc=0x559dff164810
Nov 01 14:36:35 220908-NB ollama[2592]: net/http.(*onceCloseListener).Accept(0xc000192000?)
Nov 01 14:36:35 220908-NB ollama[2592]:         <autogenerated>:1 +0x24 fp=0xc00002bad8 sp=0xc00002bac0 pc=0x559dff28b924
Nov 01 14:36:35 220908-NB ollama[2592]: net/http.(*Server).Serve(0xc0000f6000, {0x559dff5e66c0, 0xc0000721e0})
Nov 01 14:36:35 220908-NB ollama[2592]:         net/http/server.go:3260 +0x33e fp=0xc00002bc08 sp=0xc00002bad8 pc=0x559dff28273e
Nov 01 14:36:35 220908-NB ollama[2592]: main.main()
Nov 01 14:36:35 220908-NB ollama[2592]:         github.com/ollama/ollama/llama/runner/runner.go:927 +0x104c fp=0xc00002bf50 sp=0xc00002bc08 pc=0x559dff2ad70c
Nov 01 14:36:35 220908-NB ollama[2592]: runtime.main()
Nov 01 14:36:35 220908-NB ollama[2592]:         runtime/proc.go:271 +0x29d fp=0xc00002bfe0 sp=0xc00002bf50 pc=0x559dff068add
Nov 01 14:36:35 220908-NB ollama[2592]: runtime.goexit({})
Nov 01 14:36:35 220908-NB ollama[2592]:         runtime/asm_amd64.s:1695 +0x1 fp=0xc00002bfe8 sp=0xc00002bfe0 pc=0x559dff09ace1
Nov 01 14:36:35 220908-NB ollama[2592]: goroutine 2 gp=0xc000006c40 m=nil [force gc (idle)]:
Nov 01 14:36:35 220908-NB ollama[2592]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
Nov 01 14:36:35 220908-NB ollama[2592]:         runtime/proc.go:402 +0xce fp=0xc000052fa8 sp=0xc000052f88 pc=0x559dff068f0e
Nov 01 14:36:35 220908-NB ollama[2592]: runtime.goparkunlock(...)
Nov 01 14:36:35 220908-NB ollama[2592]:         runtime/proc.go:408
Nov 01 14:36:35 220908-NB ollama[2592]: runtime.forcegchelper()
Nov 01 14:36:35 220908-NB ollama[2592]:         runtime/proc.go:326 +0xb8 fp=0xc000052fe0 sp=0xc000052fa8 pc=0x559dff068d98
Nov 01 14:36:35 220908-NB ollama[2592]: runtime.goexit({})
Nov 01 14:36:35 220908-NB ollama[2592]:         runtime/asm_amd64.s:1695 +0x1 fp=0xc000052fe8 sp=0xc000052fe0 pc=0x559dff09ace1
Nov 01 14:36:35 220908-NB ollama[2592]: created by runtime.init.6 in goroutine 1
Nov 01 14:36:35 220908-NB ollama[2592]:         runtime/proc.go:314 +0x1a
Nov 01 14:36:35 220908-NB ollama[2592]: goroutine 3 gp=0xc000007180 m=nil [GC sweep wait]:
Nov 01 14:36:35 220908-NB ollama[2592]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
Nov 01 14:36:35 220908-NB ollama[2592]:         runtime/proc.go:402 +0xce fp=0xc000053780 sp=0xc000053760 pc=0x559dff068f0e
Nov 01 14:36:35 220908-NB ollama[2592]: runtime.goparkunlock(...)
Nov 01 14:36:35 220908-NB ollama[2592]:         runtime/proc.go:408
Nov 01 14:36:35 220908-NB ollama[2592]: runtime.bgsweep(0xc00007c000)
Nov 01 14:36:35 220908-NB ollama[2592]:         runtime/mgcsweep.go:278 +0x94 fp=0xc0000537c8 sp=0xc000053780 pc=0x559dff053a54
Nov 01 14:36:35 220908-NB ollama[2592]: runtime.gcenable.gowrap1()
Nov 01 14:36:35 220908-NB ollama[2592]:         runtime/mgc.go:203 +0x25 fp=0xc0000537e0 sp=0xc0000537c8 pc=0x559dff048585
Nov 01 14:36:35 220908-NB ollama[2592]: runtime.goexit({})
Nov 01 14:36:35 220908-NB ollama[2592]:         runtime/asm_amd64.s:1695 +0x1 fp=0xc0000537e8 sp=0xc0000537e0 pc=0x559dff09ace1
Nov 01 14:36:35 220908-NB ollama[2592]: created by runtime.gcenable in goroutine 1
Nov 01 14:36:35 220908-NB ollama[2592]:         runtime/mgc.go:203 +0x66
Nov 01 14:36:35 220908-NB ollama[2592]: goroutine 4 gp=0xc000007340 m=nil [GC scavenge wait]:
Nov 01 14:36:35 220908-NB ollama[2592]: runtime.gopark(0xc00007c000?, 0x559dff4e8be8?, 0x1?, 0x0?, 0xc000007340?)
Nov 01 14:36:35 220908-NB ollama[2592]:         runtime/proc.go:402 +0xce fp=0xc000053f78 sp=0xc000053f58 pc=0x559dff068f0e
Nov 01 14:36:35 220908-NB ollama[2592]: runtime.goparkunlock(...)
Nov 01 14:36:35 220908-NB ollama[2592]:         runtime/proc.go:408
Nov 01 14:36:35 220908-NB ollama[2592]: runtime.(*scavengerState).park(0x559dff7b34c0)
Nov 01 14:36:35 220908-NB ollama[2592]:         runtime/mgcscavenge.go:425 +0x49 fp=0xc000053fa8 sp=0xc000053f78 pc=0x559dff051449
Nov 01 14:36:35 220908-NB ollama[2592]: runtime.bgscavenge(0xc00007c000)
Nov 01 14:36:35 220908-NB ollama[2592]:         runtime/mgcscavenge.go:653 +0x3c fp=0xc000053fc8 sp=0xc000053fa8 pc=0x559dff0519dc
Nov 01 14:36:35 220908-NB ollama[2592]: runtime.gcenable.gowrap2()
Nov 01 14:36:35 220908-NB ollama[2592]:         runtime/mgc.go:204 +0x25 fp=0xc000053fe0 sp=0xc000053fc8 pc=0x559dff048525
Nov 01 14:36:35 220908-NB ollama[2592]: runtime.goexit({})
Nov 01 14:36:35 220908-NB ollama[2592]:         runtime/asm_amd64.s:1695 +0x1 fp=0xc000053fe8 sp=0xc000053fe0 pc=0x559dff09ace1
Nov 01 14:36:35 220908-NB ollama[2592]: created by runtime.gcenable in goroutine 1
Nov 01 14:36:35 220908-NB ollama[2592]:         runtime/mgc.go:204 +0xa5
Nov 01 14:36:35 220908-NB ollama[2592]: goroutine 5 gp=0xc000007c00 m=nil [finalizer wait]:
Nov 01 14:36:35 220908-NB ollama[2592]: runtime.gopark(0xc000052648?, 0x559dff03be85?, 0xa8?, 0x1?, 0xc0000061c0?)
Nov 01 14:36:35 220908-NB ollama[2592]:         runtime/proc.go:402 +0xce fp=0xc000052620 sp=0xc000052600 pc=0x559dff068f0e
Nov 01 14:36:35 220908-NB ollama[2592]: runtime.runfinq()
Nov 01 14:36:35 220908-NB ollama[2592]:         runtime/mfinal.go:194 +0x107 fp=0xc0000527e0 sp=0xc000052620 pc=0x559dff0475c7
Nov 01 14:36:35 220908-NB ollama[2592]: runtime.goexit({})
Nov 01 14:36:35 220908-NB ollama[2592]:         runtime/asm_amd64.s:1695 +0x1 fp=0xc0000527e8 sp=0xc0000527e0 pc=0x559dff09ace1
Nov 01 14:36:35 220908-NB ollama[2592]: created by runtime.createfing in goroutine 1
Nov 01 14:36:35 220908-NB ollama[2592]:         runtime/mfinal.go:164 +0x3d
Nov 01 14:36:35 220908-NB ollama[2592]: goroutine 18 gp=0xc000198000 m=nil [select]:
Nov 01 14:36:35 220908-NB ollama[2592]: runtime.gopark(0xc000211a80?, 0x2?, 0x18?, 0x17?, 0xc000211824?)
Nov 01 14:36:35 220908-NB ollama[2592]:         runtime/proc.go:402 +0xce fp=0xc000211698 sp=0xc000211678 pc=0x559dff068f0e
Nov 01 14:36:35 220908-NB ollama[2592]: runtime.selectgo(0xc000211a80, 0xc000211820, 0xc000130000?, 0x0, 0x1?, 0x1)
Nov 01 14:36:35 220908-NB ollama[2592]:         runtime/select.go:327 +0x725 fp=0xc0002117b8 sp=0xc000211698 pc=0x559dff07a2e5
Nov 01 14:36:35 220908-NB ollama[2592]: main.(*Server).completion(0xc0000b2120, {0x559dff5e6870, 0xc0000de540}, 0xc0000b4480)
Nov 01 14:36:35 220908-NB ollama[2592]:         github.com/ollama/ollama/llama/runner/runner.go:659 +0x8d1 fp=0xc000211ab8 sp=0xc0002117b8 pc=0x559dff2ab111
Nov 01 14:36:35 220908-NB ollama[2592]: main.(*Server).completion-fm({0x559dff5e6870?, 0xc0000de540?}, 0x559dff286a6d?)
Nov 01 14:36:35 220908-NB ollama[2592]:         <autogenerated>:1 +0x36 fp=0xc000211ae8 sp=0xc000211ab8 pc=0x559dff2ae076
Nov 01 14:36:35 220908-NB ollama[2592]: net/http.HandlerFunc.ServeHTTP(0xc000098dd0?, {0x559dff5e6870?, 0xc0000de540?}, 0x10?)
Nov 01 14:36:35 220908-NB ollama[2592]:         net/http/server.go:2171 +0x29 fp=0xc000211b10 sp=0xc000211ae8 pc=0x559dff27f509
Nov 01 14:36:35 220908-NB ollama[2592]: net/http.(*ServeMux).ServeHTTP(0x559dff03be85?, {0x559dff5e6870, 0xc0000de540}, 0xc0000b4480)
Nov 01 14:36:35 220908-NB ollama[2592]:         net/http/server.go:2688 +0x1ad fp=0xc000211b60 sp=0xc000211b10 pc=0x559dff28138d
Nov 01 14:36:35 220908-NB ollama[2592]: net/http.serverHandler.ServeHTTP({0x559dff5e5bc0?}, {0x559dff5e6870?, 0xc0000de540?}, 0x6?)
Nov 01 14:36:35 220908-NB ollama[2592]:         net/http/server.go:3142 +0x8e fp=0xc000211b90 sp=0xc000211b60 pc=0x559dff2823ae
Nov 01 14:36:35 220908-NB ollama[2592]: net/http.(*conn).serve(0xc000192000, {0x559dff5e6c90, 0xc000096d80})
Nov 01 14:36:35 220908-NB ollama[2592]:         net/http/server.go:2044 +0x5e8 fp=0xc000211fb8 sp=0xc000211b90 pc=0x559dff27e148
Nov 01 14:36:35 220908-NB ollama[2592]: net/http.(*Server).Serve.gowrap3()
Nov 01 14:36:35 220908-NB ollama[2592]:         net/http/server.go:3290 +0x28 fp=0xc000211fe0 sp=0xc000211fb8 pc=0x559dff282b28
Nov 01 14:36:35 220908-NB ollama[2592]: runtime.goexit({})
Nov 01 14:36:35 220908-NB ollama[2592]:         runtime/asm_amd64.s:1695 +0x1 fp=0xc000211fe8 sp=0xc000211fe0 pc=0x559dff09ace1
Nov 01 14:36:35 220908-NB ollama[2592]: created by net/http.(*Server).Serve in goroutine 1
Nov 01 14:36:35 220908-NB ollama[2592]:         net/http/server.go:3290 +0x4b4
Nov 01 14:36:35 220908-NB ollama[2592]: goroutine 9 gp=0xc0000e41c0 m=nil [IO wait]:
Nov 01 14:36:35 220908-NB ollama[2592]: runtime.gopark(0x10?, 0x10?, 0xf0?, 0x55?, 0xb?)
Nov 01 14:36:35 220908-NB ollama[2592]:         runtime/proc.go:402 +0xce fp=0xc0000555a8 sp=0xc000055588 pc=0x559dff068f0e
Nov 01 14:36:35 220908-NB ollama[2592]: runtime.netpollblock(0x559dff0cf818?, 0xff031a26?, 0x9d?)
Nov 01 14:36:35 220908-NB ollama[2592]:         runtime/netpoll.go:573 +0xf7 fp=0xc0000555e0 sp=0xc0000555a8 pc=0x559dff061157
Nov 01 14:36:35 220908-NB ollama[2592]: internal/poll.runtime_pollWait(0x7fe103d87f28, 0x72)
Nov 01 14:36:35 220908-NB ollama[2592]:         runtime/netpoll.go:345 +0x85 fp=0xc000055600 sp=0xc0000555e0 pc=0x559dff0959a5
Nov 01 14:36:35 220908-NB ollama[2592]: internal/poll.(*pollDesc).wait(0xc000190000?, 0xc000186101?, 0x0)
Nov 01 14:36:35 220908-NB ollama[2592]:         internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc000055628 sp=0xc000055600 pc=0x559dff0e5c87
Nov 01 14:36:35 220908-NB ollama[2592]: internal/poll.(*pollDesc).waitRead(...)
Nov 01 14:36:35 220908-NB ollama[2592]:         internal/poll/fd_poll_runtime.go:89
Nov 01 14:36:35 220908-NB ollama[2592]: internal/poll.(*FD).Read(0xc000190000, {0xc000186101, 0x1, 0x1})
Nov 01 14:36:35 220908-NB ollama[2592]:         internal/poll/fd_unix.go:164 +0x27a fp=0xc0000556c0 sp=0xc000055628 pc=0x559dff0e67da
Nov 01 14:36:35 220908-NB ollama[2592]: net.(*netFD).Read(0xc000190000, {0xc000186101?, 0xc000055748?, 0x559dff0975d0?})
Nov 01 14:36:35 220908-NB ollama[2592]:         net/fd_posix.go:55 +0x25 fp=0xc000055708 sp=0xc0000556c0 pc=0x559dff153685
Nov 01 14:36:35 220908-NB ollama[2592]: net.(*conn).Read(0xc000188008, {0xc000186101?, 0x0?, 0x559dff89c060?})
Nov 01 14:36:35 220908-NB ollama[2592]:         net/net.go:185 +0x45 fp=0xc000055750 sp=0xc000055708 pc=0x559dff15d945
Nov 01 14:36:35 220908-NB ollama[2592]: net.(*TCPConn).Read(0x559dff774830?, {0xc000186101?, 0x0?, 0x0?})
Nov 01 14:36:35 220908-NB ollama[2592]:         <autogenerated>:1 +0x25 fp=0xc000055780 sp=0xc000055750 pc=0x559dff169325
Nov 01 14:36:35 220908-NB ollama[2592]: net/http.(*connReader).backgroundRead(0xc0001860f0)
Nov 01 14:36:35 220908-NB ollama[2592]:         net/http/server.go:681 +0x37 fp=0xc0000557c8 sp=0xc000055780 pc=0x559dff2780b7
Nov 01 14:36:35 220908-NB ollama[2592]: net/http.(*connReader).startBackgroundRead.gowrap2()
Nov 01 14:36:35 220908-NB ollama[2592]:         net/http/server.go:677 +0x25 fp=0xc0000557e0 sp=0xc0000557c8 pc=0x559dff277fe5
Nov 01 14:36:35 220908-NB ollama[2592]: runtime.goexit({})
Nov 01 14:36:35 220908-NB ollama[2592]:         runtime/asm_amd64.s:1695 +0x1 fp=0xc0000557e8 sp=0xc0000557e0 pc=0x559dff09ace1
Nov 01 14:36:35 220908-NB ollama[2592]: created by net/http.(*connReader).startBackgroundRead in goroutine 18
Nov 01 14:36:35 220908-NB ollama[2592]:         net/http/server.go:677 +0xba
Nov 01 14:36:35 220908-NB ollama[2592]: rax    0x0
Nov 01 14:36:35 220908-NB ollama[2592]: rbx    0x7fe0851fc000
Nov 01 14:36:35 220908-NB ollama[2592]: rcx    0x7fe0e5a419fc
Nov 01 14:36:35 220908-NB ollama[2592]: rdx    0x6
Nov 01 14:36:35 220908-NB ollama[2592]: rdi    0xadb
Nov 01 14:36:35 220908-NB ollama[2592]: rsi    0xadf
Nov 01 14:36:35 220908-NB ollama[2592]: rbp    0xadf
Nov 01 14:36:35 220908-NB ollama[2592]: rsp    0x7fe0851d2e60
Nov 01 14:36:35 220908-NB ollama[2592]: r8     0x7fe0851d2f30
Nov 01 14:36:35 220908-NB ollama[2592]: r9     0x5
Nov 01 14:36:35 220908-NB ollama[2592]: r10    0x8
Nov 01 14:36:35 220908-NB ollama[2592]: r11    0x246
Nov 01 14:36:35 220908-NB ollama[2592]: r12    0x6
Nov 01 14:36:35 220908-NB ollama[2592]: r13    0x16
Nov 01 14:36:35 220908-NB ollama[2592]: r14    0x7fe0e96c6c48
Nov 01 14:36:35 220908-NB ollama[2592]: r15    0x7fe07c0c4000
Nov 01 14:36:35 220908-NB ollama[2592]: rip    0x7fe0e5a419fc
Nov 01 14:36:35 220908-NB ollama[2592]: rflags 0x246
Nov 01 14:36:35 220908-NB ollama[2592]: cs     0x33
Nov 01 14:36:35 220908-NB ollama[2592]: fs     0x0
Nov 01 14:36:35 220908-NB ollama[2592]: gs     0x0
Nov 01 14:36:35 220908-NB ollama[2592]: [GIN] 2024/11/01 - 14:36:35 | 200 |  1.301513399s |       127.0.0.1 | POST     "/api/chat"

nvidia-smi:

Fri Nov  1 14:39:47 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.23.07              Driver Version: 546.12       CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3060 Ti     On  | 00000000:03:00.0 Off |                  N/A |
|  0%   35C    P8              12W / 225W |      0MiB /  8192MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce RTX 3060 Ti     On  | 00000000:2E:00.0 Off |                  N/A |
|  0%   31C    P8               9W / 225W |      0MiB /  8192MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      2592      C   /ollama                                   N/A      |
|    0   N/A  N/A      2592      C   /ollama                                   N/A      |
|    1   N/A  N/A      2592      C   /ollama                                   N/A      |
|    1   N/A  N/A      2592      C   /ollama                                   N/A      |
+---------------------------------------------------------------------------------------+
Fri Nov  1 15:08:46 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.04              Driver Version: 536.23       CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3060 Ti     On  | 00000000:03:00.0 Off |                  N/A |
|  0%   39C    P8              10W / 225W |      0MiB /  8192MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A       173      C   /ollama                                   N/A      |
+---------------------------------------------------------------------------------------+
Author
Owner

@rick-github commented on GitHub (Nov 1, 2024):

I'm unable to replicate the issue. The Nvidia documentation for `cublasGemmBatchedEx` says:

```
the combination of the parameters Atype, Btype and Ctype or the algorithm, algo is not supported
```

so this would seem to be a problem at the [llama.cpp](https://github.com/ggerganov/llama.cpp/issues) level, although there are no directly related issues there. Have you considered using the [smollm2](https://ollama.com/library/smollm2:135m) models instead?
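
To narrow this down outside of ollama, the failing call can be probed with a standalone program that issues the same kind of batched FP16 GEMM and reports the cuBLAS status. This is an illustrative sketch only, not llama.cpp code: the matrix sizes are arbitrary, and the FP16-in/FP16-out/FP16-compute combination with `CUBLAS_GEMM_DEFAULT_TENSOR_OP` is an assumption chosen to mirror the call in the crash log (the actual `cu_data_type`/`cu_compute_type` ggml picks may differ).

```c
// probe.cu -- minimal standalone check of the cublasGemmBatchedEx path.
// Build: nvcc probe.cu -lcublas -o probe
// Only the returned status matters here; the buffers are never initialized.
#include <cstdio>
#include <cuda_fp16.h>
#include <cuda_runtime.h>
#include <cublas_v2.h>

int main() {
    const int m = 64, n = 64, k = 64, batch = 4;      // arbitrary sizes
    const __half alpha = __float2half(1.0f);
    const __half beta  = __float2half(0.0f);

    // One contiguous FP16 buffer per operand, sliced into per-batch matrices.
    __half *A, *B, *C;
    cudaMalloc(&A, sizeof(__half) * m * k * batch);
    cudaMalloc(&B, sizeof(__half) * k * n * batch);
    cudaMalloc(&C, sizeof(__half) * m * n * batch);

    const void *hA[batch], *hB[batch];
    void *hC[batch];
    for (int i = 0; i < batch; ++i) {
        hA[i] = A + i * m * k;
        hB[i] = B + i * k * n;
        hC[i] = C + i * m * n;
    }

    // cublasGemmBatchedEx expects *device* arrays of pointers.
    const void **dA, **dB;
    void **dC;
    cudaMalloc(&dA, sizeof(void *) * batch);
    cudaMalloc(&dB, sizeof(void *) * batch);
    cudaMalloc(&dC, sizeof(void *) * batch);
    cudaMemcpy(dA, hA, sizeof(void *) * batch, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, sizeof(void *) * batch, cudaMemcpyHostToDevice);
    cudaMemcpy(dC, hC, sizeof(void *) * batch, cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);

    // Same shape of call as the crash site: A transposed, FP16 types,
    // FP16 compute, default tensor-op algorithm.
    cublasStatus_t st = cublasGemmBatchedEx(
        handle, CUBLAS_OP_T, CUBLAS_OP_N, m, n, k,
        &alpha, dA, CUDA_R_16F, k,
                dB, CUDA_R_16F, k,
        &beta,  dC, CUDA_R_16F, m,
        batch, CUBLAS_COMPUTE_16F, CUBLAS_GEMM_DEFAULT_TENSOR_OP);

    printf("cublasGemmBatchedEx status: %d (%s)\n", (int)st,
           st == CUBLAS_STATUS_NOT_SUPPORTED ? "CUBLAS_STATUS_NOT_SUPPORTED"
         : st == CUBLAS_STATUS_SUCCESS       ? "CUBLAS_STATUS_SUCCESS"
                                             : "other");

    cublasDestroy(handle);
    cudaFree(A); cudaFree(B); cudaFree(C);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    return 0;
}
```

If this prints `CUBLAS_STATUS_NOT_SUPPORTED` on the affected WSL2 setup, the problem sits in the driver/cuBLAS combination rather than in ollama itself. The suggested workaround in the meantime is simply `ollama run smollm2:135m`.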

Reference: github-starred/ollama#51223