[GH-ISSUE #4893] Error: error loading llama server" error="llama runner process has terminated: exit status 0xc0000409 #49600

Closed
opened 2026-04-28 12:23:22 -05:00 by GiteaMirror · 2 comments
Owner

Originally created by @Hsiayukoo on GitHub (Jun 7, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/4893

What is the issue?

1. Background

I want to use llama.cpp to build a llama2-7b model from my own ckpt file, following these steps:

  1. Download llama-2-7b.Q2_K.gguf from Hugging Face. (This GGUF file can be loaded by Ollama.)
  2. Read that GGUF file, extract the metadata key-value pairs, and save them as JSON.
  3. Read my ckpt file, convert each tensor name to the GGUF naming convention, and convert the tensors to numpy.ndarray.
  4. Follow the llama.cpp writer example (https://github.com/ggerganov/llama.cpp/blob/master/gguf-py/examples/writer.py) to write example.gguf. I verified that everything before the tensor data is identical to llama-2-7b.Q2_K.gguf, that the tensor-data start offsets match, and that example.gguf can be quantized by llama.cpp.
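The dtype choice in step 4 may matter here. This is a minimal, hedged sketch (the helper name `target_dtype` is mine, and the `GGUFWriter` usage is assumed from the linked writer.py example): the quantize log below shows the norm weights stored as f16, whereas stock Q2_K files keep small 1-D tensors in f32, which would line up with the "only f32 src1 supported" assert.

```python
import numpy as np

def target_dtype(name: str, arr: np.ndarray):
    """Pick a storage dtype before handing a tensor to GGUFWriter.

    Assumption (inferred from the logs, not confirmed): llama.cpp accepts
    f16 for the large 2-D weight matrices, but expects small 1-D tensors
    such as *_norm.weight to be stored as f32.
    """
    if arr.ndim == 1:          # norm weights, biases, etc.
        return np.float32
    return np.float16          # large matmul weights

# Hypothetical usage with gguf-py (API shape taken from writer.py):
#   writer.add_tensor(name, arr.astype(target_dtype(name, arr)))
```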


2. What happened?

error loading llama server" error="llama runner process has terminated: exit status 0xc0000409

Here is the log:

[GIN] 2024/06/07 - 09:15:59 | 200 |            0s |       127.0.0.1 | HEAD     "/"
[GIN] 2024/06/07 - 09:16:40 | 200 |            0s |       127.0.0.1 | POST     "/api/blobs/sha256:1446039e892b513e16dd803d0b4ca3b8ee9c2b0c61b808f4884d070d01dc9f2f"
[GIN] 2024/06/07 - 09:19:25 | 200 |         2m44s |       127.0.0.1 | POST     "/api/create"
[GIN] 2024/06/07 - 09:19:33 | 200 |            0s |       127.0.0.1 | HEAD     "/"
[GIN] 2024/06/07 - 09:19:33 | 200 |     26.3074ms |       127.0.0.1 | POST     "/api/show"
[GIN] 2024/06/07 - 09:19:33 | 200 |      1.1037ms |       127.0.0.1 | POST     "/api/show"
time=2024-06-07T09:19:34.199+08:00 level=INFO source=memory.go:133 msg="offload to gpu" layers.requested=-1 layers.real=33 memory.available="55.3 GiB" memory.required.full="13.8 GiB" memory.required.partial="13.8 GiB" memory.required.kv="1.0 GiB" memory.weights.total="12.3 GiB" memory.weights.repeating="12.1 GiB" memory.weights.nonrepeating="250.0 MiB" memory.graph.full="164.0 MiB" memory.graph.partial="193.0 MiB"
time=2024-06-07T09:19:34.201+08:00 level=INFO source=server.go:341 msg="starting llama server" cmd="C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\ollama_runners\\cpu_avx2\\ollama_llama_server.exe --model D:\\ollama_models\\blobs\\sha256-1446039e892b513e16dd803d0b4ca3b8ee9c2b0c61b808f4884d070d01dc9f2f --ctx-size 2048 --batch-size 512 --embedding --log-disable --parallel 1 --port 64940"
time=2024-06-07T09:19:34.245+08:00 level=INFO source=sched.go:338 msg="loaded runners" count=1
time=2024-06-07T09:19:34.245+08:00 level=INFO source=server.go:529 msg="waiting for llama runner to start responding"
time=2024-06-07T09:19:34.245+08:00 level=INFO source=server.go:567 msg="waiting for server to become available" status="llm server error"
INFO [wmain] build info | build=3051 commit="5921b8f0" tid="11088" timestamp=1717723174
INFO [wmain] system info | n_threads=8 n_threads_batch=-1 system_info="AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | " tid="11088" timestamp=1717723174 total_threads=16
INFO [wmain] HTTP server listening | hostname="127.0.0.1" n_threads_http="15" port="64940" tid="11088" timestamp=1717723174
llama_model_loader: loaded meta data with 19 key-value pairs and 291 tensors from D:\ollama_models\blobs\sha256-1446039e892b513e16dd803d0b4ca3b8ee9c2b0c61b808f4884d070d01dc9f2f (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = LLaMA v2
llama_model_loader: - kv   2:                       llama.context_length u32              = 4096
llama_model_loader: - kv   3:                     llama.embedding_length u32              = 4096
llama_model_loader: - kv   4:                          llama.block_count u32              = 32
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 11008
llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv   7:                 llama.attention.head_count u32              = 32
llama_model_loader: - kv   8:              llama.attention.head_count_kv u32              = 32
llama_model_loader: - kv   9:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  10:                          general.file_type u32              = 10
llama_model_loader: - kv  11:                       tokenizer.ggml.model str              = llama
llama_model_loader: - kv  12:                      tokenizer.ggml.tokens arr[str,32000]   = ["<unk>", "<s>", "</s>", "<0x00>", "<...
llama_model_loader: - kv  13:                      tokenizer.ggml.scores arr[f32,32000]   = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv  14:                  tokenizer.ggml.token_type arr[i32,32000]   = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv  15:                tokenizer.ggml.bos_token_id u32              = 1
llama_model_loader: - kv  16:                tokenizer.ggml.eos_token_id u32              = 2
llama_model_loader: - kv  17:            tokenizer.ggml.unknown_token_id u32              = 0
llama_model_loader: - kv  18:               general.quantization_version u32              = 2
llama_model_loader: - type  f16:  291 tensors
llm_load_vocab: special tokens cache size = 259
llm_load_vocab: token to piece cache size = 0.3368 MB
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = llama
llm_load_print_meta: vocab type       = SPM
llm_load_print_meta: n_vocab          = 32000
llm_load_print_meta: n_merges         = 0
llm_load_print_meta: n_ctx_train      = 4096
llm_load_print_meta: n_embd           = 4096
llm_load_print_meta: n_head           = 32
llm_load_print_meta: n_head_kv        = 32
llm_load_print_meta: n_layer          = 32
llm_load_print_meta: n_rot            = 128
llm_load_print_meta: n_embd_head_k    = 128
llm_load_print_meta: n_embd_head_v    = 128
llm_load_print_meta: n_gqa            = 1
llm_load_print_meta: n_embd_k_gqa     = 4096
llm_load_print_meta: n_embd_v_gqa     = 4096
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale    = 0.0e+00
llm_load_print_meta: n_ff             = 11008
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: causal attn      = 1
llm_load_print_meta: pooling type     = 0
llm_load_print_meta: rope type        = 0
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx  = 4096
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: ssm_d_conv       = 0
llm_load_print_meta: ssm_d_inner      = 0
llm_load_print_meta: ssm_d_state      = 0
llm_load_print_meta: ssm_dt_rank      = 0
llm_load_print_meta: model type       = 7B
llm_load_print_meta: model ftype      = Q2_K - Medium
llm_load_print_meta: model params     = 6.74 B
llm_load_print_meta: model size       = 12.55 GiB (16.00 BPW) 
llm_load_print_meta: general.name     = LLaMA v2
llm_load_print_meta: BOS token        = 1 '<s>'
llm_load_print_meta: EOS token        = 2 '</s>'
llm_load_print_meta: UNK token        = 0 '<unk>'
llm_load_print_meta: LF token         = 13 '<0x0A>'
llm_load_tensors: ggml ctx size =    0.15 MiB
time=2024-06-07T09:19:34.503+08:00 level=INFO source=server.go:567 msg="waiting for server to become available" status="llm server loading model"
llm_load_tensors:        CPU buffer size = 12852.51 MiB
llama_new_context_with_model: n_ctx      = 2048
llama_new_context_with_model: n_batch    = 512
llama_new_context_with_model: n_ubatch   = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base  = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init:        CPU KV buffer size =  1024.00 MiB
llama_new_context_with_model: KV self size  = 1024.00 MiB, K (f16):  512.00 MiB, V (f16):  512.00 MiB
llama_new_context_with_model:        CPU  output buffer size =     0.14 MiB
llama_new_context_with_model:        CPU compute buffer size =   164.01 MiB
llama_new_context_with_model: graph nodes  = 1030
llama_new_context_with_model: graph splits = 1
GGML_ASSERT: C:\a\ollama\ollama\llm\llama.cpp\ggml.c:10240: src1->type == GGML_TYPE_F32 && "only f32 src1 supported for now"
GGML_ASSERT:time=2024-06-07T09:19:36.086+08:00 level=INFO source=server.go:567 msg="waiting for server to become available" status="llm server error"
time=2024-06-07T09:19:36.349+08:00 level=ERROR source=sched.go:344 msg="error loading llama server" error="llama runner process has terminated: exit status 0xc0000409 "
[GIN] 2024/06/07 - 09:19:36 | 500 |    2.4369838s |       127.0.0.1 | POST     "/api/chat"

Here is the quantize log:

PS D:\gguf-mindspore\llama.cpp-master> .\bin\quantize.exe .\example.gguf .\example_q2.gguf q2_k
main: build = 10 (a30cd28)
main: built with gcc.exe (x86_64-posix-seh-rev1, Built by MinGW-Builds project) 13.2.0 for x86_64-w64-mingw32
main: quantizing '.\example.gguf' to '.\example_q2.gguf' as Q2_K
llama_model_loader: loaded meta data with 19 key-value pairs and 291 tensors from .\example.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = LLaMA v2
llama_model_loader: - kv   2:                       llama.context_length u32              = 4096
llama_model_loader: - kv   3:                     llama.embedding_length u32              = 4096
llama_model_loader: - kv   4:                          llama.block_count u32              = 32
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 11008
llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv   7:                 llama.attention.head_count u32              = 32
llama_model_loader: - kv   8:              llama.attention.head_count_kv u32              = 32
llama_model_loader: - kv   9:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  10:                          general.file_type u32              = 10
llama_model_loader: - kv  11:                       tokenizer.ggml.model str              = llama
llama_model_loader: - kv  12:                      tokenizer.ggml.tokens arr[str,32000]   = ["<unk>", "<s>", "</s>", "<0x00>", "<...
llama_model_loader: - kv  13:                      tokenizer.ggml.scores arr[f32,32000]   = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv  14:                  tokenizer.ggml.token_type arr[i32,32000]   = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv  15:                tokenizer.ggml.bos_token_id u32              = 1
llama_model_loader: - kv  16:                tokenizer.ggml.eos_token_id u32              = 2
llama_model_loader: - kv  17:            tokenizer.ggml.unknown_token_id u32              = 0
llama_model_loader: - kv  18:               general.quantization_version u32              = 2
llama_model_loader: - type  f16:  291 tensors
[   1/ 291]                    token_embd.weight - [ 4096, 32000,     1,     1], type =    f16, converting to q2_K .. size =   250.00 MiB ->    41.02 MiB
[   2/ 291]                  blk.0.attn_q.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q2_K .. size =    32.00 MiB ->     5.25 MiB
[   3/ 291]                  blk.0.attn_k.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q2_K .. size =    32.00 MiB ->     5.25 MiB
[   4/ 291]                  blk.0.attn_v.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q3_K .. size =    32.00 MiB ->     6.88 MiB
[   5/ 291]             blk.0.attn_output.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q3_K .. size =    32.00 MiB ->     6.88 MiB
[   6/ 291]                blk.0.ffn_gate.weight - [ 4096, 11008,     1,     1], type =    f16, converting to q2_K .. size =    86.00 MiB ->    14.11 MiB
[   7/ 291]                  blk.0.ffn_up.weight - [ 4096, 11008,     1,     1], type =    f16, converting to q2_K .. size =    86.00 MiB ->    14.11 MiB
[   8/ 291]                blk.0.ffn_down.weight - [11008,  4096,     1,     1], type =    f16, converting to q3_K .. size =    86.00 MiB ->    18.48 MiB
[   9/ 291]               blk.0.attn_norm.weight - [ 4096,     1,     1,     1], type =    f16, size =    0.008 MB
[  10/ 291]                blk.0.ffn_norm.weight - [ 4096,     1,     1,     1], type =    f16, size =    0.008 MB
[  11/ 291]                  blk.1.attn_q.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q2_K .. size =    32.00 MiB ->     5.25 MiB
[  12/ 291]                  blk.1.attn_k.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q2_K .. size =    32.00 MiB ->     5.25 MiB
[  13/ 291]                  blk.1.attn_v.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q3_K .. size =    32.00 MiB ->     6.88 MiB
[  14/ 291]             blk.1.attn_output.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q3_K .. size =    32.00 MiB ->     6.88 MiB
[  15/ 291]                blk.1.ffn_gate.weight - [ 4096, 11008,     1,     1], type =    f16, converting to q2_K .. size =    86.00 MiB ->    14.11 MiB
[  16/ 291]                  blk.1.ffn_up.weight - [ 4096, 11008,     1,     1], type =    f16, converting to q2_K .. size =    86.00 MiB ->    14.11 MiB
[  17/ 291]                blk.1.ffn_down.weight - [11008,  4096,     1,     1], type =    f16, converting to q3_K .. size =    86.00 MiB ->    18.48 MiB
[  18/ 291]               blk.1.attn_norm.weight - [ 4096,     1,     1,     1], type =    f16, size =    0.008 MB
[  19/ 291]                blk.1.ffn_norm.weight - [ 4096,     1,     1,     1], type =    f16, size =    0.008 MB
[  20/ 291]                  blk.2.attn_q.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q2_K .. size =    32.00 MiB ->     5.25 MiB
[  21/ 291]                  blk.2.attn_k.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q2_K .. size =    32.00 MiB ->     5.25 MiB
[  22/ 291]                  blk.2.attn_v.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q3_K .. size =    32.00 MiB ->     6.88 MiB
[  23/ 291]             blk.2.attn_output.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q3_K .. size =    32.00 MiB ->     6.88 MiB
[  24/ 291]                blk.2.ffn_gate.weight - [ 4096, 11008,     1,     1], type =    f16, converting to q2_K .. size =    86.00 MiB ->    14.11 MiB
[  25/ 291]                  blk.2.ffn_up.weight - [ 4096, 11008,     1,     1], type =    f16, converting to q2_K .. size =    86.00 MiB ->    14.11 MiB
[  26/ 291]                blk.2.ffn_down.weight - [11008,  4096,     1,     1], type =    f16, converting to q3_K .. size =    86.00 MiB ->    18.48 MiB
[  27/ 291]               blk.2.attn_norm.weight - [ 4096,     1,     1,     1], type =    f16, size =    0.008 MB
[  28/ 291]                blk.2.ffn_norm.weight - [ 4096,     1,     1,     1], type =    f16, size =    0.008 MB
[  29/ 291]                  blk.3.attn_q.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q2_K .. size =    32.00 MiB ->     5.25 MiB
[  30/ 291]                  blk.3.attn_k.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q2_K .. size =    32.00 MiB ->     5.25 MiB
[  31/ 291]                  blk.3.attn_v.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q3_K .. size =    32.00 MiB ->     6.88 MiB
[  32/ 291]             blk.3.attn_output.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q3_K .. size =    32.00 MiB ->     6.88 MiB
[  33/ 291]                blk.3.ffn_gate.weight - [ 4096, 11008,     1,     1], type =    f16, converting to q2_K .. size =    86.00 MiB ->    14.11 MiB
[  34/ 291]                  blk.3.ffn_up.weight - [ 4096, 11008,     1,     1], type =    f16, converting to q2_K .. size =    86.00 MiB ->    14.11 MiB
[  35/ 291]                blk.3.ffn_down.weight - [11008,  4096,     1,     1], type =    f16, converting to q3_K .. size =    86.00 MiB ->    18.48 MiB
[  36/ 291]               blk.3.attn_norm.weight - [ 4096,     1,     1,     1], type =    f16, size =    0.008 MB
[  37/ 291]                blk.3.ffn_norm.weight - [ 4096,     1,     1,     1], type =    f16, size =    0.008 MB
[  38/ 291]                  blk.4.attn_q.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q2_K .. size =    32.00 MiB ->     5.25 MiB
[  39/ 291]                  blk.4.attn_k.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q2_K .. size =    32.00 MiB ->     5.25 MiB
[  40/ 291]                  blk.4.attn_v.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q3_K .. size =    32.00 MiB ->     6.88 MiB
[  41/ 291]             blk.4.attn_output.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q3_K .. size =    32.00 MiB ->     6.88 MiB
[  42/ 291]                blk.4.ffn_gate.weight - [ 4096, 11008,     1,     1], type =    f16, converting to q2_K .. size =    86.00 MiB ->    14.11 MiB
[  43/ 291]                  blk.4.ffn_up.weight - [ 4096, 11008,     1,     1], type =    f16, converting to q2_K .. size =    86.00 MiB ->    14.11 MiB
[  44/ 291]                blk.4.ffn_down.weight - [11008,  4096,     1,     1], type =    f16, converting to q3_K .. size =    86.00 MiB ->    18.48 MiB
[  45/ 291]               blk.4.attn_norm.weight - [ 4096,     1,     1,     1], type =    f16, size =    0.008 MB
[  46/ 291]                blk.4.ffn_norm.weight - [ 4096,     1,     1,     1], type =    f16, size =    0.008 MB
[  47/ 291]                  blk.5.attn_q.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q2_K .. size =    32.00 MiB ->     5.25 MiB
[  48/ 291]                  blk.5.attn_k.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q2_K .. size =    32.00 MiB ->     5.25 MiB
[  49/ 291]                  blk.5.attn_v.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q3_K .. size =    32.00 MiB ->     6.88 MiB
[  50/ 291]             blk.5.attn_output.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q3_K .. size =    32.00 MiB ->     6.88 MiB
[  51/ 291]                blk.5.ffn_gate.weight - [ 4096, 11008,     1,     1], type =    f16, converting to q2_K .. size =    86.00 MiB ->    14.11 MiB
[  52/ 291]                  blk.5.ffn_up.weight - [ 4096, 11008,     1,     1], type =    f16, converting to q2_K .. size =    86.00 MiB ->    14.11 MiB
[  53/ 291]                blk.5.ffn_down.weight - [11008,  4096,     1,     1], type =    f16, converting to q3_K .. size =    86.00 MiB ->    18.48 MiB
[  54/ 291]               blk.5.attn_norm.weight - [ 4096,     1,     1,     1], type =    f16, size =    0.008 MB
[  55/ 291]                blk.5.ffn_norm.weight - [ 4096,     1,     1,     1], type =    f16, size =    0.008 MB
[  56/ 291]                  blk.6.attn_q.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q2_K .. size =    32.00 MiB ->     5.25 MiB
[  57/ 291]                  blk.6.attn_k.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q2_K .. size =    32.00 MiB ->     5.25 MiB
[  58/ 291]                  blk.6.attn_v.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q3_K .. size =    32.00 MiB ->     6.88 MiB
[  59/ 291]             blk.6.attn_output.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q3_K .. size =    32.00 MiB ->     6.88 MiB
[  60/ 291]                blk.6.ffn_gate.weight - [ 4096, 11008,     1,     1], type =    f16, converting to q2_K .. size =    86.00 MiB ->    14.11 MiB
[  61/ 291]                  blk.6.ffn_up.weight - [ 4096, 11008,     1,     1], type =    f16, converting to q2_K .. size =    86.00 MiB ->    14.11 MiB
[  62/ 291]                blk.6.ffn_down.weight - [11008,  4096,     1,     1], type =    f16, converting to q3_K .. size =    86.00 MiB ->    18.48 MiB
[  63/ 291]               blk.6.attn_norm.weight - [ 4096,     1,     1,     1], type =    f16, size =    0.008 MB
[  64/ 291]                blk.6.ffn_norm.weight - [ 4096,     1,     1,     1], type =    f16, size =    0.008 MB
[  65/ 291]                  blk.7.attn_q.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q2_K .. size =    32.00 MiB ->     5.25 MiB
[  66/ 291]                  blk.7.attn_k.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q2_K .. size =    32.00 MiB ->     5.25 MiB
[  67/ 291]                  blk.7.attn_v.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q3_K .. size =    32.00 MiB ->     6.88 MiB
[  68/ 291]             blk.7.attn_output.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q3_K .. size =    32.00 MiB ->     6.88 MiB
[  69/ 291]                blk.7.ffn_gate.weight - [ 4096, 11008,     1,     1], type =    f16, converting to q2_K .. size =    86.00 MiB ->    14.11 MiB
[  70/ 291]                  blk.7.ffn_up.weight - [ 4096, 11008,     1,     1], type =    f16, converting to q2_K .. size =    86.00 MiB ->    14.11 MiB
[  71/ 291]                blk.7.ffn_down.weight - [11008,  4096,     1,     1], type =    f16, converting to q3_K .. size =    86.00 MiB ->    18.48 MiB
[  72/ 291]               blk.7.attn_norm.weight - [ 4096,     1,     1,     1], type =    f16, size =    0.008 MB
[  73/ 291]                blk.7.ffn_norm.weight - [ 4096,     1,     1,     1], type =    f16, size =    0.008 MB
[  74/ 291]                  blk.8.attn_q.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q2_K .. size =    32.00 MiB ->     5.25 MiB
[  75/ 291]                  blk.8.attn_k.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q2_K .. size =    32.00 MiB ->     5.25 MiB
[  76/ 291]                  blk.8.attn_v.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q3_K .. size =    32.00 MiB ->     6.88 MiB
[  77/ 291]             blk.8.attn_output.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q3_K .. size =    32.00 MiB ->     6.88 MiB
[  78/ 291]                blk.8.ffn_gate.weight - [ 4096, 11008,     1,     1], type =    f16, converting to q2_K .. size =    86.00 MiB ->    14.11 MiB
[  79/ 291]                  blk.8.ffn_up.weight - [ 4096, 11008,     1,     1], type =    f16, converting to q2_K .. size =    86.00 MiB ->    14.11 MiB
[  80/ 291]                blk.8.ffn_down.weight - [11008,  4096,     1,     1], type =    f16, converting to q3_K .. size =    86.00 MiB ->    18.48 MiB
[  81/ 291]               blk.8.attn_norm.weight - [ 4096,     1,     1,     1], type =    f16, size =    0.008 MB
[  82/ 291]                blk.8.ffn_norm.weight - [ 4096,     1,     1,     1], type =    f16, size =    0.008 MB
[  83/ 291]                  blk.9.attn_q.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q2_K .. size =    32.00 MiB ->     5.25 MiB
[  84/ 291]                  blk.9.attn_k.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q2_K .. size =    32.00 MiB ->     5.25 MiB
[  85/ 291]                  blk.9.attn_v.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q3_K .. size =    32.00 MiB ->     6.88 MiB
[  86/ 291]             blk.9.attn_output.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q3_K .. size =    32.00 MiB ->     6.88 MiB
[  87/ 291]                blk.9.ffn_gate.weight - [ 4096, 11008,     1,     1], type =    f16, converting to q2_K .. size =    86.00 MiB ->    14.11 MiB
[  88/ 291]                  blk.9.ffn_up.weight - [ 4096, 11008,     1,     1], type =    f16, converting to q2_K .. size =    86.00 MiB ->    14.11 MiB
[  89/ 291]                blk.9.ffn_down.weight - [11008,  4096,     1,     1], type =    f16, converting to q3_K .. size =    86.00 MiB ->    18.48 MiB
[  90/ 291]               blk.9.attn_norm.weight - [ 4096,     1,     1,     1], type =    f16, size =    0.008 MB
[  91/ 291]                blk.9.ffn_norm.weight - [ 4096,     1,     1,     1], type =    f16, size =    0.008 MB
[  92/ 291]                 blk.10.attn_q.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q2_K .. size =    32.00 MiB ->     5.25 MiB
[  93/ 291]                 blk.10.attn_k.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q2_K .. size =    32.00 MiB ->     5.25 MiB
[  94/ 291]                 blk.10.attn_v.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q3_K .. size =    32.00 MiB ->     6.88 MiB
[  95/ 291]            blk.10.attn_output.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q3_K .. size =    32.00 MiB ->     6.88 MiB
[  96/ 291]               blk.10.ffn_gate.weight - [ 4096, 11008,     1,     1], type =    f16, converting to q2_K .. size =    86.00 MiB ->    14.11 MiB
[  97/ 291]                 blk.10.ffn_up.weight - [ 4096, 11008,     1,     1], type =    f16, converting to q2_K .. size =    86.00 MiB ->    14.11 MiB
[  98/ 291]               blk.10.ffn_down.weight - [11008,  4096,     1,     1], type =    f16, converting to q3_K .. size =    86.00 MiB ->    18.48 MiB
[  99/ 291]              blk.10.attn_norm.weight - [ 4096,     1,     1,     1], type =    f16, size =    0.008 MB
[ 100/ 291]               blk.10.ffn_norm.weight - [ 4096,     1,     1,     1], type =    f16, size =    0.008 MB
[ 101/ 291]                 blk.11.attn_q.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q2_K .. size =    32.00 MiB ->     5.25 MiB
[ 102/ 291]                 blk.11.attn_k.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q2_K .. size =    32.00 MiB ->     5.25 MiB
[ 103/ 291]                 blk.11.attn_v.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q3_K .. size =    32.00 MiB ->     6.88 MiB
[ 104/ 291]            blk.11.attn_output.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q3_K .. size =    32.00 MiB ->     6.88 MiB
[ 105/ 291]               blk.11.ffn_gate.weight - [ 4096, 11008,     1,     1], type =    f16, converting to q2_K .. size =    86.00 MiB ->    14.11 MiB
[ 106/ 291]                 blk.11.ffn_up.weight - [ 4096, 11008,     1,     1], type =    f16, converting to q2_K .. size =    86.00 MiB ->    14.11 MiB
[ 107/ 291]               blk.11.ffn_down.weight - [11008,  4096,     1,     1], type =    f16, converting to q3_K .. size =    86.00 MiB ->    18.48 MiB
[ 108/ 291]              blk.11.attn_norm.weight - [ 4096,     1,     1,     1], type =    f16, size =    0.008 MB
[ 109/ 291]               blk.11.ffn_norm.weight - [ 4096,     1,     1,     1], type =    f16, size =    0.008 MB
[ 110/ 291]                 blk.12.attn_q.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q2_K .. size =    32.00 MiB ->     5.25 MiB
[ 111/ 291]                 blk.12.attn_k.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q2_K .. size =    32.00 MiB ->     5.25 MiB
[ 112/ 291]                 blk.12.attn_v.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q3_K .. size =    32.00 MiB ->     6.88 MiB
[ 113/ 291]            blk.12.attn_output.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q3_K .. size =    32.00 MiB ->     6.88 MiB
[ 114/ 291]               blk.12.ffn_gate.weight - [ 4096, 11008,     1,     1], type =    f16, converting to q2_K .. size =    86.00 MiB ->    14.11 MiB
[ 115/ 291]                 blk.12.ffn_up.weight - [ 4096, 11008,     1,     1], type =    f16, converting to q2_K .. size =    86.00 MiB ->    14.11 MiB
[ 116/ 291]               blk.12.ffn_down.weight - [11008,  4096,     1,     1], type =    f16, converting to q3_K .. size =    86.00 MiB ->    18.48 MiB
[ 117/ 291]              blk.12.attn_norm.weight - [ 4096,     1,     1,     1], type =    f16, size =    0.008 MB
[ 118/ 291]               blk.12.ffn_norm.weight - [ 4096,     1,     1,     1], type =    f16, size =    0.008 MB
[ 119/ 291]                 blk.13.attn_q.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q2_K .. size =    32.00 MiB ->     5.25 MiB
[ 120/ 291]                 blk.13.attn_k.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q2_K .. size =    32.00 MiB ->     5.25 MiB
[ 121/ 291]                 blk.13.attn_v.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q3_K .. size =    32.00 MiB ->     6.88 MiB
[ 122/ 291]            blk.13.attn_output.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q3_K .. size =    32.00 MiB ->     6.88 MiB
[ 123/ 291]               blk.13.ffn_gate.weight - [ 4096, 11008,     1,     1], type =    f16, converting to q2_K .. size =    86.00 MiB ->    14.11 MiB
[ 124/ 291]                 blk.13.ffn_up.weight - [ 4096, 11008,     1,     1], type =    f16, converting to q2_K .. size =    86.00 MiB ->    14.11 MiB
[ 125/ 291]               blk.13.ffn_down.weight - [11008,  4096,     1,     1], type =    f16, converting to q3_K .. size =    86.00 MiB ->    18.48 MiB
[ 126/ 291]              blk.13.attn_norm.weight - [ 4096,     1,     1,     1], type =    f16, size =    0.008 MB
[ 127/ 291]               blk.13.ffn_norm.weight - [ 4096,     1,     1,     1], type =    f16, size =    0.008 MB
[ 128/ 291]                 blk.14.attn_q.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q2_K .. size =    32.00 MiB ->     5.25 MiB
[ 129/ 291]                 blk.14.attn_k.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q2_K .. size =    32.00 MiB ->     5.25 MiB
[ 130/ 291]                 blk.14.attn_v.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q3_K .. size =    32.00 MiB ->     6.88 MiB
[ 131/ 291]            blk.14.attn_output.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q3_K .. size =    32.00 MiB ->     6.88 MiB
[ 132/ 291]               blk.14.ffn_gate.weight - [ 4096, 11008,     1,     1], type =    f16, converting to q2_K .. size =    86.00 MiB ->    14.11 MiB
[ 133/ 291]                 blk.14.ffn_up.weight - [ 4096, 11008,     1,     1], type =    f16, converting to q2_K .. size =    86.00 MiB ->    14.11 MiB
[ 134/ 291]               blk.14.ffn_down.weight - [11008,  4096,     1,     1], type =    f16, converting to q3_K .. size =    86.00 MiB ->    18.48 MiB
[ 135/ 291]              blk.14.attn_norm.weight - [ 4096,     1,     1,     1], type =    f16, size =    0.008 MB
[ 136/ 291]               blk.14.ffn_norm.weight - [ 4096,     1,     1,     1], type =    f16, size =    0.008 MB
[ 137/ 291]                 blk.15.attn_q.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q2_K .. size =    32.00 MiB ->     5.25 MiB
[ 138/ 291]                 blk.15.attn_k.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q2_K .. size =    32.00 MiB ->     5.25 MiB
[ 139/ 291]                 blk.15.attn_v.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q3_K .. size =    32.00 MiB ->     6.88 MiB
[ 140/ 291]            blk.15.attn_output.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q3_K .. size =    32.00 MiB ->     6.88 MiB
[ 141/ 291]               blk.15.ffn_gate.weight - [ 4096, 11008,     1,     1], type =    f16, converting to q2_K .. size =    86.00 MiB ->    14.11 MiB
[ 142/ 291]                 blk.15.ffn_up.weight - [ 4096, 11008,     1,     1], type =    f16, converting to q2_K .. size =    86.00 MiB ->    14.11 MiB
[ 143/ 291]               blk.15.ffn_down.weight - [11008,  4096,     1,     1], type =    f16, converting to q3_K .. size =    86.00 MiB ->    18.48 MiB
[ 144/ 291]              blk.15.attn_norm.weight - [ 4096,     1,     1,     1], type =    f16, size =    0.008 MB
[ 145/ 291]               blk.15.ffn_norm.weight - [ 4096,     1,     1,     1], type =    f16, size =    0.008 MB
[ 146/ 291]                 blk.16.attn_q.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q2_K .. size =    32.00 MiB ->     5.25 MiB
[ 147/ 291]                 blk.16.attn_k.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q2_K .. size =    32.00 MiB ->     5.25 MiB
[ 148/ 291]                 blk.16.attn_v.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q3_K .. size =    32.00 MiB ->     6.88 MiB
[ 149/ 291]            blk.16.attn_output.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q3_K .. size =    32.00 MiB ->     6.88 MiB
[ 150/ 291]               blk.16.ffn_gate.weight - [ 4096, 11008,     1,     1], type =    f16, converting to q2_K .. size =    86.00 MiB ->    14.11 MiB
[ 151/ 291]                 blk.16.ffn_up.weight - [ 4096, 11008,     1,     1], type =    f16, converting to q2_K .. size =    86.00 MiB ->    14.11 MiB
[ 152/ 291]               blk.16.ffn_down.weight - [11008,  4096,     1,     1], type =    f16, converting to q3_K .. size =    86.00 MiB ->    18.48 MiB
[ 153/ 291]              blk.16.attn_norm.weight - [ 4096,     1,     1,     1], type =    f16, size =    0.008 MB
[ 154/ 291]               blk.16.ffn_norm.weight - [ 4096,     1,     1,     1], type =    f16, size =    0.008 MB
[ 155/ 291]                 blk.17.attn_q.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q2_K .. size =    32.00 MiB ->     5.25 MiB
[ 156/ 291]                 blk.17.attn_k.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q2_K .. size =    32.00 MiB ->     5.25 MiB
[ 157/ 291]                 blk.17.attn_v.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q3_K .. size =    32.00 MiB ->     6.88 MiB
[ 158/ 291]            blk.17.attn_output.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q3_K .. size =    32.00 MiB ->     6.88 MiB
[ 159/ 291]               blk.17.ffn_gate.weight - [ 4096, 11008,     1,     1], type =    f16, converting to q2_K .. size =    86.00 MiB ->    14.11 MiB
[ 160/ 291]                 blk.17.ffn_up.weight - [ 4096, 11008,     1,     1], type =    f16, converting to q2_K .. size =    86.00 MiB ->    14.11 MiB
[ 161/ 291]               blk.17.ffn_down.weight - [11008,  4096,     1,     1], type =    f16, converting to q3_K .. size =    86.00 MiB ->    18.48 MiB
[ 162/ 291]              blk.17.attn_norm.weight - [ 4096,     1,     1,     1], type =    f16, size =    0.008 MB
[ 163/ 291]               blk.17.ffn_norm.weight - [ 4096,     1,     1,     1], type =    f16, size =    0.008 MB
[ 164/ 291]                 blk.18.attn_q.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q2_K .. size =    32.00 MiB ->     5.25 MiB
[ 165/ 291]                 blk.18.attn_k.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q2_K .. size =    32.00 MiB ->     5.25 MiB
[ 166/ 291]                 blk.18.attn_v.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q3_K .. size =    32.00 MiB ->     6.88 MiB
[ 167/ 291]            blk.18.attn_output.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q3_K .. size =    32.00 MiB ->     6.88 MiB
[ 168/ 291]               blk.18.ffn_gate.weight - [ 4096, 11008,     1,     1], type =    f16, converting to q2_K .. size =    86.00 MiB ->    14.11 MiB
[ 169/ 291]                 blk.18.ffn_up.weight - [ 4096, 11008,     1,     1], type =    f16, converting to q2_K .. size =    86.00 MiB ->    14.11 MiB
[ 170/ 291]               blk.18.ffn_down.weight - [11008,  4096,     1,     1], type =    f16, converting to q3_K .. size =    86.00 MiB ->    18.48 MiB
[ 171/ 291]              blk.18.attn_norm.weight - [ 4096,     1,     1,     1], type =    f16, size =    0.008 MB
[ 172/ 291]               blk.18.ffn_norm.weight - [ 4096,     1,     1,     1], type =    f16, size =    0.008 MB
[ 173/ 291]                 blk.19.attn_q.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q2_K .. size =    32.00 MiB ->     5.25 MiB
[ 174/ 291]                 blk.19.attn_k.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q2_K .. size =    32.00 MiB ->     5.25 MiB
[ 175/ 291]                 blk.19.attn_v.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q3_K .. size =    32.00 MiB ->     6.88 MiB
[ 176/ 291]            blk.19.attn_output.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q3_K .. size =    32.00 MiB ->     6.88 MiB
[ 177/ 291]               blk.19.ffn_gate.weight - [ 4096, 11008,     1,     1], type =    f16, converting to q2_K .. size =    86.00 MiB ->    14.11 MiB
[ 178/ 291]                 blk.19.ffn_up.weight - [ 4096, 11008,     1,     1], type =    f16, converting to q2_K .. size =    86.00 MiB ->    14.11 MiB
[ 179/ 291]               blk.19.ffn_down.weight - [11008,  4096,     1,     1], type =    f16, converting to q3_K .. size =    86.00 MiB ->    18.48 MiB
[ 180/ 291]              blk.19.attn_norm.weight - [ 4096,     1,     1,     1], type =    f16, size =    0.008 MB
[ 181/ 291]               blk.19.ffn_norm.weight - [ 4096,     1,     1,     1], type =    f16, size =    0.008 MB
[ 182/ 291]                 blk.20.attn_q.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q2_K .. size =    32.00 MiB ->     5.25 MiB
[ 183/ 291]                 blk.20.attn_k.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q2_K .. size =    32.00 MiB ->     5.25 MiB
[ 184/ 291]                 blk.20.attn_v.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q3_K .. size =    32.00 MiB ->     6.88 MiB
[ 185/ 291]            blk.20.attn_output.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q3_K .. size =    32.00 MiB ->     6.88 MiB
[ 186/ 291]               blk.20.ffn_gate.weight - [ 4096, 11008,     1,     1], type =    f16, converting to q2_K .. size =    86.00 MiB ->    14.11 MiB
[ 187/ 291]                 blk.20.ffn_up.weight - [ 4096, 11008,     1,     1], type =    f16, converting to q2_K .. size =    86.00 MiB ->    14.11 MiB
[ 188/ 291]               blk.20.ffn_down.weight - [11008,  4096,     1,     1], type =    f16, converting to q3_K .. size =    86.00 MiB ->    18.48 MiB
[ 189/ 291]              blk.20.attn_norm.weight - [ 4096,     1,     1,     1], type =    f16, size =    0.008 MB
[ 190/ 291]               blk.20.ffn_norm.weight - [ 4096,     1,     1,     1], type =    f16, size =    0.008 MB
[ 191/ 291]                 blk.21.attn_q.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q2_K .. size =    32.00 MiB ->     5.25 MiB
[ 192/ 291]                 blk.21.attn_k.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q2_K .. size =    32.00 MiB ->     5.25 MiB
[ 193/ 291]                 blk.21.attn_v.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q3_K .. size =    32.00 MiB ->     6.88 MiB
[ 194/ 291]            blk.21.attn_output.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q3_K .. size =    32.00 MiB ->     6.88 MiB
[ 195/ 291]               blk.21.ffn_gate.weight - [ 4096, 11008,     1,     1], type =    f16, converting to q2_K .. size =    86.00 MiB ->    14.11 MiB
[ 196/ 291]                 blk.21.ffn_up.weight - [ 4096, 11008,     1,     1], type =    f16, converting to q2_K .. size =    86.00 MiB ->    14.11 MiB
[ 197/ 291]               blk.21.ffn_down.weight - [11008,  4096,     1,     1], type =    f16, converting to q3_K .. size =    86.00 MiB ->    18.48 MiB
[ 198/ 291]              blk.21.attn_norm.weight - [ 4096,     1,     1,     1], type =    f16, size =    0.008 MB
[ 199/ 291]               blk.21.ffn_norm.weight - [ 4096,     1,     1,     1], type =    f16, size =    0.008 MB
[ 200/ 291]                 blk.22.attn_q.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q2_K .. size =    32.00 MiB ->     5.25 MiB
[ 201/ 291]                 blk.22.attn_k.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q2_K .. size =    32.00 MiB ->     5.25 MiB
[ 202/ 291]                 blk.22.attn_v.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q3_K .. size =    32.00 MiB ->     6.88 MiB
[ 203/ 291]            blk.22.attn_output.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q3_K .. size =    32.00 MiB ->     6.88 MiB
[ 204/ 291]               blk.22.ffn_gate.weight - [ 4096, 11008,     1,     1], type =    f16, converting to q2_K .. size =    86.00 MiB ->    14.11 MiB
[ 205/ 291]                 blk.22.ffn_up.weight - [ 4096, 11008,     1,     1], type =    f16, converting to q2_K .. size =    86.00 MiB ->    14.11 MiB
[ 206/ 291]               blk.22.ffn_down.weight - [11008,  4096,     1,     1], type =    f16, converting to q3_K .. size =    86.00 MiB ->    18.48 MiB
[ 207/ 291]              blk.22.attn_norm.weight - [ 4096,     1,     1,     1], type =    f16, size =    0.008 MB
[ 208/ 291]               blk.22.ffn_norm.weight - [ 4096,     1,     1,     1], type =    f16, size =    0.008 MB
[ 209/ 291]                 blk.23.attn_q.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q2_K .. size =    32.00 MiB ->     5.25 MiB
[ 210/ 291]                 blk.23.attn_k.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q2_K .. size =    32.00 MiB ->     5.25 MiB
[ 211/ 291]                 blk.23.attn_v.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q3_K .. size =    32.00 MiB ->     6.88 MiB
[ 212/ 291]            blk.23.attn_output.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q3_K .. size =    32.00 MiB ->     6.88 MiB
[ 213/ 291]               blk.23.ffn_gate.weight - [ 4096, 11008,     1,     1], type =    f16, converting to q2_K .. size =    86.00 MiB ->    14.11 MiB
[ 214/ 291]                 blk.23.ffn_up.weight - [ 4096, 11008,     1,     1], type =    f16, converting to q2_K .. size =    86.00 MiB ->    14.11 MiB
[ 215/ 291]               blk.23.ffn_down.weight - [11008,  4096,     1,     1], type =    f16, converting to q3_K .. size =    86.00 MiB ->    18.48 MiB
[ 216/ 291]              blk.23.attn_norm.weight - [ 4096,     1,     1,     1], type =    f16, size =    0.008 MB
[ 217/ 291]               blk.23.ffn_norm.weight - [ 4096,     1,     1,     1], type =    f16, size =    0.008 MB
[ 218/ 291]                 blk.24.attn_q.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q2_K .. size =    32.00 MiB ->     5.25 MiB
[ 219/ 291]                 blk.24.attn_k.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q2_K .. size =    32.00 MiB ->     5.25 MiB
[ 220/ 291]                 blk.24.attn_v.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q3_K .. size =    32.00 MiB ->     6.88 MiB
[ 221/ 291]            blk.24.attn_output.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q3_K .. size =    32.00 MiB ->     6.88 MiB
[ 222/ 291]               blk.24.ffn_gate.weight - [ 4096, 11008,     1,     1], type =    f16, converting to q2_K .. size =    86.00 MiB ->    14.11 MiB
[ 223/ 291]                 blk.24.ffn_up.weight - [ 4096, 11008,     1,     1], type =    f16, converting to q2_K .. size =    86.00 MiB ->    14.11 MiB
[ 224/ 291]               blk.24.ffn_down.weight - [11008,  4096,     1,     1], type =    f16, converting to q3_K .. size =    86.00 MiB ->    18.48 MiB
[ 225/ 291]              blk.24.attn_norm.weight - [ 4096,     1,     1,     1], type =    f16, size =    0.008 MB
[ 226/ 291]               blk.24.ffn_norm.weight - [ 4096,     1,     1,     1], type =    f16, size =    0.008 MB
[ 227/ 291]                 blk.25.attn_q.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q2_K .. size =    32.00 MiB ->     5.25 MiB
[ 228/ 291]                 blk.25.attn_k.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q2_K .. size =    32.00 MiB ->     5.25 MiB
[ 229/ 291]                 blk.25.attn_v.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q3_K .. size =    32.00 MiB ->     6.88 MiB
[ 230/ 291]            blk.25.attn_output.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q3_K .. size =    32.00 MiB ->     6.88 MiB
[ 231/ 291]               blk.25.ffn_gate.weight - [ 4096, 11008,     1,     1], type =    f16, converting to q2_K .. size =    86.00 MiB ->    14.11 MiB
[ 232/ 291]                 blk.25.ffn_up.weight - [ 4096, 11008,     1,     1], type =    f16, converting to q2_K .. size =    86.00 MiB ->    14.11 MiB
[ 233/ 291]               blk.25.ffn_down.weight - [11008,  4096,     1,     1], type =    f16, converting to q3_K .. size =    86.00 MiB ->    18.48 MiB
[ 234/ 291]              blk.25.attn_norm.weight - [ 4096,     1,     1,     1], type =    f16, size =    0.008 MB
[ 235/ 291]               blk.25.ffn_norm.weight - [ 4096,     1,     1,     1], type =    f16, size =    0.008 MB
[ 236/ 291]                 blk.26.attn_q.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q2_K .. size =    32.00 MiB ->     5.25 MiB
[ 237/ 291]                 blk.26.attn_k.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q2_K .. size =    32.00 MiB ->     5.25 MiB
[ 238/ 291]                 blk.26.attn_v.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q3_K .. size =    32.00 MiB ->     6.88 MiB
[ 239/ 291]            blk.26.attn_output.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q3_K .. size =    32.00 MiB ->     6.88 MiB
[ 240/ 291]               blk.26.ffn_gate.weight - [ 4096, 11008,     1,     1], type =    f16, converting to q2_K .. size =    86.00 MiB ->    14.11 MiB
[ 241/ 291]                 blk.26.ffn_up.weight - [ 4096, 11008,     1,     1], type =    f16, converting to q2_K .. size =    86.00 MiB ->    14.11 MiB
[ 242/ 291]               blk.26.ffn_down.weight - [11008,  4096,     1,     1], type =    f16, converting to q3_K .. size =    86.00 MiB ->    18.48 MiB
[ 243/ 291]              blk.26.attn_norm.weight - [ 4096,     1,     1,     1], type =    f16, size =    0.008 MB
[ 244/ 291]               blk.26.ffn_norm.weight - [ 4096,     1,     1,     1], type =    f16, size =    0.008 MB
[ 245/ 291]                 blk.27.attn_q.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q2_K .. size =    32.00 MiB ->     5.25 MiB
[ 246/ 291]                 blk.27.attn_k.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q2_K .. size =    32.00 MiB ->     5.25 MiB
[ 247/ 291]                 blk.27.attn_v.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q3_K .. size =    32.00 MiB ->     6.88 MiB
[ 248/ 291]            blk.27.attn_output.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q3_K .. size =    32.00 MiB ->     6.88 MiB
[ 249/ 291]               blk.27.ffn_gate.weight - [ 4096, 11008,     1,     1], type =    f16, converting to q2_K .. size =    86.00 MiB ->    14.11 MiB
[ 250/ 291]                 blk.27.ffn_up.weight - [ 4096, 11008,     1,     1], type =    f16, converting to q2_K .. size =    86.00 MiB ->    14.11 MiB
[ 251/ 291]               blk.27.ffn_down.weight - [11008,  4096,     1,     1], type =    f16, converting to q3_K .. size =    86.00 MiB ->    18.48 MiB
[ 252/ 291]              blk.27.attn_norm.weight - [ 4096,     1,     1,     1], type =    f16, size =    0.008 MB
[ 253/ 291]               blk.27.ffn_norm.weight - [ 4096,     1,     1,     1], type =    f16, size =    0.008 MB
[ 254/ 291]                 blk.28.attn_q.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q2_K .. size =    32.00 MiB ->     5.25 MiB
[ 255/ 291]                 blk.28.attn_k.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q2_K .. size =    32.00 MiB ->     5.25 MiB
[ 256/ 291]                 blk.28.attn_v.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q3_K .. size =    32.00 MiB ->     6.88 MiB
[ 257/ 291]            blk.28.attn_output.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q3_K .. size =    32.00 MiB ->     6.88 MiB
[ 258/ 291]               blk.28.ffn_gate.weight - [ 4096, 11008,     1,     1], type =    f16, converting to q2_K .. size =    86.00 MiB ->    14.11 MiB
[ 259/ 291]                 blk.28.ffn_up.weight - [ 4096, 11008,     1,     1], type =    f16, converting to q2_K .. size =    86.00 MiB ->    14.11 MiB
[ 260/ 291]               blk.28.ffn_down.weight - [11008,  4096,     1,     1], type =    f16, converting to q3_K .. size =    86.00 MiB ->    18.48 MiB
[ 261/ 291]              blk.28.attn_norm.weight - [ 4096,     1,     1,     1], type =    f16, size =    0.008 MB
[ 262/ 291]               blk.28.ffn_norm.weight - [ 4096,     1,     1,     1], type =    f16, size =    0.008 MB
[ 263/ 291]                 blk.29.attn_q.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q2_K .. size =    32.00 MiB ->     5.25 MiB
[ 264/ 291]                 blk.29.attn_k.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q2_K .. size =    32.00 MiB ->     5.25 MiB
[ 265/ 291]                 blk.29.attn_v.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q3_K .. size =    32.00 MiB ->     6.88 MiB
[ 266/ 291]            blk.29.attn_output.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q3_K .. size =    32.00 MiB ->     6.88 MiB
[ 267/ 291]               blk.29.ffn_gate.weight - [ 4096, 11008,     1,     1], type =    f16, converting to q2_K .. size =    86.00 MiB ->    14.11 MiB
[ 268/ 291]                 blk.29.ffn_up.weight - [ 4096, 11008,     1,     1], type =    f16, converting to q2_K .. size =    86.00 MiB ->    14.11 MiB
[ 269/ 291]               blk.29.ffn_down.weight - [11008,  4096,     1,     1], type =    f16, converting to q3_K .. size =    86.00 MiB ->    18.48 MiB
[ 270/ 291]              blk.29.attn_norm.weight - [ 4096,     1,     1,     1], type =    f16, size =    0.008 MB
[ 271/ 291]               blk.29.ffn_norm.weight - [ 4096,     1,     1,     1], type =    f16, size =    0.008 MB
[ 272/ 291]                 blk.30.attn_q.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q2_K .. size =    32.00 MiB ->     5.25 MiB
[ 273/ 291]                 blk.30.attn_k.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q2_K .. size =    32.00 MiB ->     5.25 MiB
[ 274/ 291]                 blk.30.attn_v.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q3_K .. size =    32.00 MiB ->     6.88 MiB
[ 275/ 291]            blk.30.attn_output.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q3_K .. size =    32.00 MiB ->     6.88 MiB
[ 276/ 291]               blk.30.ffn_gate.weight - [ 4096, 11008,     1,     1], type =    f16, converting to q2_K .. size =    86.00 MiB ->    14.11 MiB
[ 277/ 291]                 blk.30.ffn_up.weight - [ 4096, 11008,     1,     1], type =    f16, converting to q2_K .. size =    86.00 MiB ->    14.11 MiB
[ 278/ 291]               blk.30.ffn_down.weight - [11008,  4096,     1,     1], type =    f16, converting to q3_K .. size =    86.00 MiB ->    18.48 MiB
[ 279/ 291]              blk.30.attn_norm.weight - [ 4096,     1,     1,     1], type =    f16, size =    0.008 MB
[ 280/ 291]               blk.30.ffn_norm.weight - [ 4096,     1,     1,     1], type =    f16, size =    0.008 MB
[ 281/ 291]                 blk.31.attn_q.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q2_K .. size =    32.00 MiB ->     5.25 MiB
[ 282/ 291]                 blk.31.attn_k.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q2_K .. size =    32.00 MiB ->     5.25 MiB
[ 283/ 291]                 blk.31.attn_v.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q3_K .. size =    32.00 MiB ->     6.88 MiB
[ 284/ 291]            blk.31.attn_output.weight - [ 4096,  4096,     1,     1], type =    f16, converting to q3_K .. size =    32.00 MiB ->     6.88 MiB
[ 285/ 291]               blk.31.ffn_gate.weight - [ 4096, 11008,     1,     1], type =    f16, converting to q2_K .. size =    86.00 MiB ->    14.11 MiB
[ 286/ 291]                 blk.31.ffn_up.weight - [ 4096, 11008,     1,     1], type =    f16, converting to q2_K .. size =    86.00 MiB ->    14.11 MiB
[ 287/ 291]               blk.31.ffn_down.weight - [11008,  4096,     1,     1], type =    f16, converting to q3_K .. size =    86.00 MiB ->    18.48 MiB
[ 288/ 291]              blk.31.attn_norm.weight - [ 4096,     1,     1,     1], type =    f16, size =    0.008 MB
[ 289/ 291]               blk.31.ffn_norm.weight - [ 4096,     1,     1,     1], type =    f16, size =    0.008 MB
[ 290/ 291]                   output_norm.weight - [ 4096,     1,     1,     1], type =    f16, size =    0.008 MB
[ 291/ 291]                        output.weight - [ 4096, 32000,     1,     1], type =    f16, converting to q6_K .. size =   250.00 MiB ->   102.54 MiB
llama_model_quantize_internal: model size  = 12852.51 MB
llama_model_quantize_internal: quant size  =  2414.31 MB

main: quantize time = 80060.02 ms
main:    total time = 80060.03 ms

OS

Windows

GPU

Other

CPU

Intel

Ollama version

0.1.41
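As a side note on the quantize summary above (`model size = 12852.51 MB` → `quant size = 2414.31 MB` for 6.74 B parameters): those numbers are consistent with a correct f16 → Q2_K conversion. A minimal sketch, assuming the sizes are MiB and the parameter count is taken from the loader output, to compute the implied bits per weight:

```python
MIB = 1024 * 1024

def bits_per_weight(size_mib: float, n_params: float) -> float:
    """Average bits per parameter implied by a model size in MiB."""
    return size_mib * MIB * 8 / n_params

n_params = 6.74e9  # "model params = 6.74 B" from llm_load_print_meta

f16_bpw = bits_per_weight(12852.51, n_params)  # pre-quantization size
q2k_bpw = bits_per_weight(2414.31, n_params)   # post-quantization size

print(f"f16: {f16_bpw:.2f} bpw, Q2_K: {q2k_bpw:.2f} bpw")
# → f16: 16.00 bpw, Q2_K: 3.00 bpw
```

The ~16 bpw before and ~3 bpw after match plain f16 storage and the Q2_K-Medium mix (q2_K/q3_K blocks plus a q6_K output layer), so the quantize step itself looks healthy; the later `GGML_ASSERT` at load time points at the tensor data or metadata written by the custom writer script rather than at quantization.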

size = 86.00 MiB -> 14.11 MiB [ 34/ 291] blk.3.ffn_up.weight - [ 4096, 11008, 1, 1], type = f16, converting to q2_K .. size = 86.00 MiB -> 14.11 MiB [ 35/ 291] blk.3.ffn_down.weight - [11008, 4096, 1, 1], type = f16, converting to q3_K .. size = 86.00 MiB -> 18.48 MiB [ 36/ 291] blk.3.attn_norm.weight - [ 4096, 1, 1, 1], type = f16, size = 0.008 MB [ 37/ 291] blk.3.ffn_norm.weight - [ 4096, 1, 1, 1], type = f16, size = 0.008 MB [ 38/ 291] blk.4.attn_q.weight - [ 4096, 4096, 1, 1], type = f16, converting to q2_K .. size = 32.00 MiB -> 5.25 MiB [ 39/ 291] blk.4.attn_k.weight - [ 4096, 4096, 1, 1], type = f16, converting to q2_K .. size = 32.00 MiB -> 5.25 MiB [ 40/ 291] blk.4.attn_v.weight - [ 4096, 4096, 1, 1], type = f16, converting to q3_K .. size = 32.00 MiB -> 6.88 MiB [ 41/ 291] blk.4.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q3_K .. size = 32.00 MiB -> 6.88 MiB [ 42/ 291] blk.4.ffn_gate.weight - [ 4096, 11008, 1, 1], type = f16, converting to q2_K .. size = 86.00 MiB -> 14.11 MiB [ 43/ 291] blk.4.ffn_up.weight - [ 4096, 11008, 1, 1], type = f16, converting to q2_K .. size = 86.00 MiB -> 14.11 MiB [ 44/ 291] blk.4.ffn_down.weight - [11008, 4096, 1, 1], type = f16, converting to q3_K .. size = 86.00 MiB -> 18.48 MiB [ 45/ 291] blk.4.attn_norm.weight - [ 4096, 1, 1, 1], type = f16, size = 0.008 MB [ 46/ 291] blk.4.ffn_norm.weight - [ 4096, 1, 1, 1], type = f16, size = 0.008 MB [ 47/ 291] blk.5.attn_q.weight - [ 4096, 4096, 1, 1], type = f16, converting to q2_K .. size = 32.00 MiB -> 5.25 MiB [ 48/ 291] blk.5.attn_k.weight - [ 4096, 4096, 1, 1], type = f16, converting to q2_K .. size = 32.00 MiB -> 5.25 MiB [ 49/ 291] blk.5.attn_v.weight - [ 4096, 4096, 1, 1], type = f16, converting to q3_K .. size = 32.00 MiB -> 6.88 MiB [ 50/ 291] blk.5.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q3_K .. size = 32.00 MiB -> 6.88 MiB [ 51/ 291] blk.5.ffn_gate.weight - [ 4096, 11008, 1, 1], type = f16, converting to q2_K .. 
size = 86.00 MiB -> 14.11 MiB [ 52/ 291] blk.5.ffn_up.weight - [ 4096, 11008, 1, 1], type = f16, converting to q2_K .. size = 86.00 MiB -> 14.11 MiB [ 53/ 291] blk.5.ffn_down.weight - [11008, 4096, 1, 1], type = f16, converting to q3_K .. size = 86.00 MiB -> 18.48 MiB [ 54/ 291] blk.5.attn_norm.weight - [ 4096, 1, 1, 1], type = f16, size = 0.008 MB [ 55/ 291] blk.5.ffn_norm.weight - [ 4096, 1, 1, 1], type = f16, size = 0.008 MB [ 56/ 291] blk.6.attn_q.weight - [ 4096, 4096, 1, 1], type = f16, converting to q2_K .. size = 32.00 MiB -> 5.25 MiB [ 57/ 291] blk.6.attn_k.weight - [ 4096, 4096, 1, 1], type = f16, converting to q2_K .. size = 32.00 MiB -> 5.25 MiB [ 58/ 291] blk.6.attn_v.weight - [ 4096, 4096, 1, 1], type = f16, converting to q3_K .. size = 32.00 MiB -> 6.88 MiB [ 59/ 291] blk.6.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q3_K .. size = 32.00 MiB -> 6.88 MiB [ 60/ 291] blk.6.ffn_gate.weight - [ 4096, 11008, 1, 1], type = f16, converting to q2_K .. size = 86.00 MiB -> 14.11 MiB [ 61/ 291] blk.6.ffn_up.weight - [ 4096, 11008, 1, 1], type = f16, converting to q2_K .. size = 86.00 MiB -> 14.11 MiB [ 62/ 291] blk.6.ffn_down.weight - [11008, 4096, 1, 1], type = f16, converting to q3_K .. size = 86.00 MiB -> 18.48 MiB [ 63/ 291] blk.6.attn_norm.weight - [ 4096, 1, 1, 1], type = f16, size = 0.008 MB [ 64/ 291] blk.6.ffn_norm.weight - [ 4096, 1, 1, 1], type = f16, size = 0.008 MB [ 65/ 291] blk.7.attn_q.weight - [ 4096, 4096, 1, 1], type = f16, converting to q2_K .. size = 32.00 MiB -> 5.25 MiB [ 66/ 291] blk.7.attn_k.weight - [ 4096, 4096, 1, 1], type = f16, converting to q2_K .. size = 32.00 MiB -> 5.25 MiB [ 67/ 291] blk.7.attn_v.weight - [ 4096, 4096, 1, 1], type = f16, converting to q3_K .. size = 32.00 MiB -> 6.88 MiB [ 68/ 291] blk.7.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q3_K .. size = 32.00 MiB -> 6.88 MiB [ 69/ 291] blk.7.ffn_gate.weight - [ 4096, 11008, 1, 1], type = f16, converting to q2_K .. 
size = 86.00 MiB -> 14.11 MiB [ 70/ 291] blk.7.ffn_up.weight - [ 4096, 11008, 1, 1], type = f16, converting to q2_K .. size = 86.00 MiB -> 14.11 MiB [ 71/ 291] blk.7.ffn_down.weight - [11008, 4096, 1, 1], type = f16, converting to q3_K .. size = 86.00 MiB -> 18.48 MiB [ 72/ 291] blk.7.attn_norm.weight - [ 4096, 1, 1, 1], type = f16, size = 0.008 MB [ 73/ 291] blk.7.ffn_norm.weight - [ 4096, 1, 1, 1], type = f16, size = 0.008 MB [ 74/ 291] blk.8.attn_q.weight - [ 4096, 4096, 1, 1], type = f16, converting to q2_K .. size = 32.00 MiB -> 5.25 MiB [ 75/ 291] blk.8.attn_k.weight - [ 4096, 4096, 1, 1], type = f16, converting to q2_K .. size = 32.00 MiB -> 5.25 MiB [ 76/ 291] blk.8.attn_v.weight - [ 4096, 4096, 1, 1], type = f16, converting to q3_K .. size = 32.00 MiB -> 6.88 MiB [ 77/ 291] blk.8.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q3_K .. size = 32.00 MiB -> 6.88 MiB [ 78/ 291] blk.8.ffn_gate.weight - [ 4096, 11008, 1, 1], type = f16, converting to q2_K .. size = 86.00 MiB -> 14.11 MiB [ 79/ 291] blk.8.ffn_up.weight - [ 4096, 11008, 1, 1], type = f16, converting to q2_K .. size = 86.00 MiB -> 14.11 MiB [ 80/ 291] blk.8.ffn_down.weight - [11008, 4096, 1, 1], type = f16, converting to q3_K .. size = 86.00 MiB -> 18.48 MiB [ 81/ 291] blk.8.attn_norm.weight - [ 4096, 1, 1, 1], type = f16, size = 0.008 MB [ 82/ 291] blk.8.ffn_norm.weight - [ 4096, 1, 1, 1], type = f16, size = 0.008 MB [ 83/ 291] blk.9.attn_q.weight - [ 4096, 4096, 1, 1], type = f16, converting to q2_K .. size = 32.00 MiB -> 5.25 MiB [ 84/ 291] blk.9.attn_k.weight - [ 4096, 4096, 1, 1], type = f16, converting to q2_K .. size = 32.00 MiB -> 5.25 MiB [ 85/ 291] blk.9.attn_v.weight - [ 4096, 4096, 1, 1], type = f16, converting to q3_K .. size = 32.00 MiB -> 6.88 MiB [ 86/ 291] blk.9.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q3_K .. size = 32.00 MiB -> 6.88 MiB [ 87/ 291] blk.9.ffn_gate.weight - [ 4096, 11008, 1, 1], type = f16, converting to q2_K .. 
size = 86.00 MiB -> 14.11 MiB [ 88/ 291] blk.9.ffn_up.weight - [ 4096, 11008, 1, 1], type = f16, converting to q2_K .. size = 86.00 MiB -> 14.11 MiB [ 89/ 291] blk.9.ffn_down.weight - [11008, 4096, 1, 1], type = f16, converting to q3_K .. size = 86.00 MiB -> 18.48 MiB [ 90/ 291] blk.9.attn_norm.weight - [ 4096, 1, 1, 1], type = f16, size = 0.008 MB [ 91/ 291] blk.9.ffn_norm.weight - [ 4096, 1, 1, 1], type = f16, size = 0.008 MB [ 92/ 291] blk.10.attn_q.weight - [ 4096, 4096, 1, 1], type = f16, converting to q2_K .. size = 32.00 MiB -> 5.25 MiB [ 93/ 291] blk.10.attn_k.weight - [ 4096, 4096, 1, 1], type = f16, converting to q2_K .. size = 32.00 MiB -> 5.25 MiB [ 94/ 291] blk.10.attn_v.weight - [ 4096, 4096, 1, 1], type = f16, converting to q3_K .. size = 32.00 MiB -> 6.88 MiB [ 95/ 291] blk.10.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q3_K .. size = 32.00 MiB -> 6.88 MiB [ 96/ 291] blk.10.ffn_gate.weight - [ 4096, 11008, 1, 1], type = f16, converting to q2_K .. size = 86.00 MiB -> 14.11 MiB [ 97/ 291] blk.10.ffn_up.weight - [ 4096, 11008, 1, 1], type = f16, converting to q2_K .. size = 86.00 MiB -> 14.11 MiB [ 98/ 291] blk.10.ffn_down.weight - [11008, 4096, 1, 1], type = f16, converting to q3_K .. size = 86.00 MiB -> 18.48 MiB [ 99/ 291] blk.10.attn_norm.weight - [ 4096, 1, 1, 1], type = f16, size = 0.008 MB [ 100/ 291] blk.10.ffn_norm.weight - [ 4096, 1, 1, 1], type = f16, size = 0.008 MB [ 101/ 291] blk.11.attn_q.weight - [ 4096, 4096, 1, 1], type = f16, converting to q2_K .. size = 32.00 MiB -> 5.25 MiB [ 102/ 291] blk.11.attn_k.weight - [ 4096, 4096, 1, 1], type = f16, converting to q2_K .. size = 32.00 MiB -> 5.25 MiB [ 103/ 291] blk.11.attn_v.weight - [ 4096, 4096, 1, 1], type = f16, converting to q3_K .. size = 32.00 MiB -> 6.88 MiB [ 104/ 291] blk.11.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q3_K .. 
size = 32.00 MiB -> 6.88 MiB [ 105/ 291] blk.11.ffn_gate.weight - [ 4096, 11008, 1, 1], type = f16, converting to q2_K .. size = 86.00 MiB -> 14.11 MiB [ 106/ 291] blk.11.ffn_up.weight - [ 4096, 11008, 1, 1], type = f16, converting to q2_K .. size = 86.00 MiB -> 14.11 MiB [ 107/ 291] blk.11.ffn_down.weight - [11008, 4096, 1, 1], type = f16, converting to q3_K .. size = 86.00 MiB -> 18.48 MiB [ 108/ 291] blk.11.attn_norm.weight - [ 4096, 1, 1, 1], type = f16, size = 0.008 MB [ 109/ 291] blk.11.ffn_norm.weight - [ 4096, 1, 1, 1], type = f16, size = 0.008 MB [ 110/ 291] blk.12.attn_q.weight - [ 4096, 4096, 1, 1], type = f16, converting to q2_K .. size = 32.00 MiB -> 5.25 MiB [ 111/ 291] blk.12.attn_k.weight - [ 4096, 4096, 1, 1], type = f16, converting to q2_K .. size = 32.00 MiB -> 5.25 MiB [ 112/ 291] blk.12.attn_v.weight - [ 4096, 4096, 1, 1], type = f16, converting to q3_K .. size = 32.00 MiB -> 6.88 MiB [ 113/ 291] blk.12.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q3_K .. size = 32.00 MiB -> 6.88 MiB [ 114/ 291] blk.12.ffn_gate.weight - [ 4096, 11008, 1, 1], type = f16, converting to q2_K .. size = 86.00 MiB -> 14.11 MiB [ 115/ 291] blk.12.ffn_up.weight - [ 4096, 11008, 1, 1], type = f16, converting to q2_K .. size = 86.00 MiB -> 14.11 MiB [ 116/ 291] blk.12.ffn_down.weight - [11008, 4096, 1, 1], type = f16, converting to q3_K .. size = 86.00 MiB -> 18.48 MiB [ 117/ 291] blk.12.attn_norm.weight - [ 4096, 1, 1, 1], type = f16, size = 0.008 MB [ 118/ 291] blk.12.ffn_norm.weight - [ 4096, 1, 1, 1], type = f16, size = 0.008 MB [ 119/ 291] blk.13.attn_q.weight - [ 4096, 4096, 1, 1], type = f16, converting to q2_K .. size = 32.00 MiB -> 5.25 MiB [ 120/ 291] blk.13.attn_k.weight - [ 4096, 4096, 1, 1], type = f16, converting to q2_K .. size = 32.00 MiB -> 5.25 MiB [ 121/ 291] blk.13.attn_v.weight - [ 4096, 4096, 1, 1], type = f16, converting to q3_K .. 
size = 32.00 MiB -> 6.88 MiB [ 122/ 291] blk.13.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q3_K .. size = 32.00 MiB -> 6.88 MiB [ 123/ 291] blk.13.ffn_gate.weight - [ 4096, 11008, 1, 1], type = f16, converting to q2_K .. size = 86.00 MiB -> 14.11 MiB [ 124/ 291] blk.13.ffn_up.weight - [ 4096, 11008, 1, 1], type = f16, converting to q2_K .. size = 86.00 MiB -> 14.11 MiB [ 125/ 291] blk.13.ffn_down.weight - [11008, 4096, 1, 1], type = f16, converting to q3_K .. size = 86.00 MiB -> 18.48 MiB [ 126/ 291] blk.13.attn_norm.weight - [ 4096, 1, 1, 1], type = f16, size = 0.008 MB [ 127/ 291] blk.13.ffn_norm.weight - [ 4096, 1, 1, 1], type = f16, size = 0.008 MB [ 128/ 291] blk.14.attn_q.weight - [ 4096, 4096, 1, 1], type = f16, converting to q2_K .. size = 32.00 MiB -> 5.25 MiB [ 129/ 291] blk.14.attn_k.weight - [ 4096, 4096, 1, 1], type = f16, converting to q2_K .. size = 32.00 MiB -> 5.25 MiB [ 130/ 291] blk.14.attn_v.weight - [ 4096, 4096, 1, 1], type = f16, converting to q3_K .. size = 32.00 MiB -> 6.88 MiB [ 131/ 291] blk.14.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q3_K .. size = 32.00 MiB -> 6.88 MiB [ 132/ 291] blk.14.ffn_gate.weight - [ 4096, 11008, 1, 1], type = f16, converting to q2_K .. size = 86.00 MiB -> 14.11 MiB [ 133/ 291] blk.14.ffn_up.weight - [ 4096, 11008, 1, 1], type = f16, converting to q2_K .. size = 86.00 MiB -> 14.11 MiB [ 134/ 291] blk.14.ffn_down.weight - [11008, 4096, 1, 1], type = f16, converting to q3_K .. size = 86.00 MiB -> 18.48 MiB [ 135/ 291] blk.14.attn_norm.weight - [ 4096, 1, 1, 1], type = f16, size = 0.008 MB [ 136/ 291] blk.14.ffn_norm.weight - [ 4096, 1, 1, 1], type = f16, size = 0.008 MB [ 137/ 291] blk.15.attn_q.weight - [ 4096, 4096, 1, 1], type = f16, converting to q2_K .. size = 32.00 MiB -> 5.25 MiB [ 138/ 291] blk.15.attn_k.weight - [ 4096, 4096, 1, 1], type = f16, converting to q2_K .. 
size = 32.00 MiB -> 5.25 MiB [ 139/ 291] blk.15.attn_v.weight - [ 4096, 4096, 1, 1], type = f16, converting to q3_K .. size = 32.00 MiB -> 6.88 MiB [ 140/ 291] blk.15.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q3_K .. size = 32.00 MiB -> 6.88 MiB [ 141/ 291] blk.15.ffn_gate.weight - [ 4096, 11008, 1, 1], type = f16, converting to q2_K .. size = 86.00 MiB -> 14.11 MiB [ 142/ 291] blk.15.ffn_up.weight - [ 4096, 11008, 1, 1], type = f16, converting to q2_K .. size = 86.00 MiB -> 14.11 MiB [ 143/ 291] blk.15.ffn_down.weight - [11008, 4096, 1, 1], type = f16, converting to q3_K .. size = 86.00 MiB -> 18.48 MiB [ 144/ 291] blk.15.attn_norm.weight - [ 4096, 1, 1, 1], type = f16, size = 0.008 MB [ 145/ 291] blk.15.ffn_norm.weight - [ 4096, 1, 1, 1], type = f16, size = 0.008 MB [ 146/ 291] blk.16.attn_q.weight - [ 4096, 4096, 1, 1], type = f16, converting to q2_K .. size = 32.00 MiB -> 5.25 MiB [ 147/ 291] blk.16.attn_k.weight - [ 4096, 4096, 1, 1], type = f16, converting to q2_K .. size = 32.00 MiB -> 5.25 MiB [ 148/ 291] blk.16.attn_v.weight - [ 4096, 4096, 1, 1], type = f16, converting to q3_K .. size = 32.00 MiB -> 6.88 MiB [ 149/ 291] blk.16.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q3_K .. size = 32.00 MiB -> 6.88 MiB [ 150/ 291] blk.16.ffn_gate.weight - [ 4096, 11008, 1, 1], type = f16, converting to q2_K .. size = 86.00 MiB -> 14.11 MiB [ 151/ 291] blk.16.ffn_up.weight - [ 4096, 11008, 1, 1], type = f16, converting to q2_K .. size = 86.00 MiB -> 14.11 MiB [ 152/ 291] blk.16.ffn_down.weight - [11008, 4096, 1, 1], type = f16, converting to q3_K .. size = 86.00 MiB -> 18.48 MiB [ 153/ 291] blk.16.attn_norm.weight - [ 4096, 1, 1, 1], type = f16, size = 0.008 MB [ 154/ 291] blk.16.ffn_norm.weight - [ 4096, 1, 1, 1], type = f16, size = 0.008 MB [ 155/ 291] blk.17.attn_q.weight - [ 4096, 4096, 1, 1], type = f16, converting to q2_K .. 
size = 32.00 MiB -> 5.25 MiB [ 156/ 291] blk.17.attn_k.weight - [ 4096, 4096, 1, 1], type = f16, converting to q2_K .. size = 32.00 MiB -> 5.25 MiB [ 157/ 291] blk.17.attn_v.weight - [ 4096, 4096, 1, 1], type = f16, converting to q3_K .. size = 32.00 MiB -> 6.88 MiB [ 158/ 291] blk.17.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q3_K .. size = 32.00 MiB -> 6.88 MiB [ 159/ 291] blk.17.ffn_gate.weight - [ 4096, 11008, 1, 1], type = f16, converting to q2_K .. size = 86.00 MiB -> 14.11 MiB [ 160/ 291] blk.17.ffn_up.weight - [ 4096, 11008, 1, 1], type = f16, converting to q2_K .. size = 86.00 MiB -> 14.11 MiB [ 161/ 291] blk.17.ffn_down.weight - [11008, 4096, 1, 1], type = f16, converting to q3_K .. size = 86.00 MiB -> 18.48 MiB [ 162/ 291] blk.17.attn_norm.weight - [ 4096, 1, 1, 1], type = f16, size = 0.008 MB [ 163/ 291] blk.17.ffn_norm.weight - [ 4096, 1, 1, 1], type = f16, size = 0.008 MB [ 164/ 291] blk.18.attn_q.weight - [ 4096, 4096, 1, 1], type = f16, converting to q2_K .. size = 32.00 MiB -> 5.25 MiB [ 165/ 291] blk.18.attn_k.weight - [ 4096, 4096, 1, 1], type = f16, converting to q2_K .. size = 32.00 MiB -> 5.25 MiB [ 166/ 291] blk.18.attn_v.weight - [ 4096, 4096, 1, 1], type = f16, converting to q3_K .. size = 32.00 MiB -> 6.88 MiB [ 167/ 291] blk.18.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q3_K .. size = 32.00 MiB -> 6.88 MiB [ 168/ 291] blk.18.ffn_gate.weight - [ 4096, 11008, 1, 1], type = f16, converting to q2_K .. size = 86.00 MiB -> 14.11 MiB [ 169/ 291] blk.18.ffn_up.weight - [ 4096, 11008, 1, 1], type = f16, converting to q2_K .. size = 86.00 MiB -> 14.11 MiB [ 170/ 291] blk.18.ffn_down.weight - [11008, 4096, 1, 1], type = f16, converting to q3_K .. 
size = 86.00 MiB -> 18.48 MiB [ 171/ 291] blk.18.attn_norm.weight - [ 4096, 1, 1, 1], type = f16, size = 0.008 MB [ 172/ 291] blk.18.ffn_norm.weight - [ 4096, 1, 1, 1], type = f16, size = 0.008 MB [ 173/ 291] blk.19.attn_q.weight - [ 4096, 4096, 1, 1], type = f16, converting to q2_K .. size = 32.00 MiB -> 5.25 MiB [ 174/ 291] blk.19.attn_k.weight - [ 4096, 4096, 1, 1], type = f16, converting to q2_K .. size = 32.00 MiB -> 5.25 MiB [ 175/ 291] blk.19.attn_v.weight - [ 4096, 4096, 1, 1], type = f16, converting to q3_K .. size = 32.00 MiB -> 6.88 MiB [ 176/ 291] blk.19.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q3_K .. size = 32.00 MiB -> 6.88 MiB [ 177/ 291] blk.19.ffn_gate.weight - [ 4096, 11008, 1, 1], type = f16, converting to q2_K .. size = 86.00 MiB -> 14.11 MiB [ 178/ 291] blk.19.ffn_up.weight - [ 4096, 11008, 1, 1], type = f16, converting to q2_K .. size = 86.00 MiB -> 14.11 MiB [ 179/ 291] blk.19.ffn_down.weight - [11008, 4096, 1, 1], type = f16, converting to q3_K .. size = 86.00 MiB -> 18.48 MiB [ 180/ 291] blk.19.attn_norm.weight - [ 4096, 1, 1, 1], type = f16, size = 0.008 MB [ 181/ 291] blk.19.ffn_norm.weight - [ 4096, 1, 1, 1], type = f16, size = 0.008 MB [ 182/ 291] blk.20.attn_q.weight - [ 4096, 4096, 1, 1], type = f16, converting to q2_K .. size = 32.00 MiB -> 5.25 MiB [ 183/ 291] blk.20.attn_k.weight - [ 4096, 4096, 1, 1], type = f16, converting to q2_K .. size = 32.00 MiB -> 5.25 MiB [ 184/ 291] blk.20.attn_v.weight - [ 4096, 4096, 1, 1], type = f16, converting to q3_K .. size = 32.00 MiB -> 6.88 MiB [ 185/ 291] blk.20.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q3_K .. size = 32.00 MiB -> 6.88 MiB [ 186/ 291] blk.20.ffn_gate.weight - [ 4096, 11008, 1, 1], type = f16, converting to q2_K .. size = 86.00 MiB -> 14.11 MiB [ 187/ 291] blk.20.ffn_up.weight - [ 4096, 11008, 1, 1], type = f16, converting to q2_K .. 
size = 86.00 MiB -> 14.11 MiB [ 188/ 291] blk.20.ffn_down.weight - [11008, 4096, 1, 1], type = f16, converting to q3_K .. size = 86.00 MiB -> 18.48 MiB [ 189/ 291] blk.20.attn_norm.weight - [ 4096, 1, 1, 1], type = f16, size = 0.008 MB [ 190/ 291] blk.20.ffn_norm.weight - [ 4096, 1, 1, 1], type = f16, size = 0.008 MB [ 191/ 291] blk.21.attn_q.weight - [ 4096, 4096, 1, 1], type = f16, converting to q2_K .. size = 32.00 MiB -> 5.25 MiB [ 192/ 291] blk.21.attn_k.weight - [ 4096, 4096, 1, 1], type = f16, converting to q2_K .. size = 32.00 MiB -> 5.25 MiB [ 193/ 291] blk.21.attn_v.weight - [ 4096, 4096, 1, 1], type = f16, converting to q3_K .. size = 32.00 MiB -> 6.88 MiB [ 194/ 291] blk.21.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q3_K .. size = 32.00 MiB -> 6.88 MiB [ 195/ 291] blk.21.ffn_gate.weight - [ 4096, 11008, 1, 1], type = f16, converting to q2_K .. size = 86.00 MiB -> 14.11 MiB [ 196/ 291] blk.21.ffn_up.weight - [ 4096, 11008, 1, 1], type = f16, converting to q2_K .. size = 86.00 MiB -> 14.11 MiB [ 197/ 291] blk.21.ffn_down.weight - [11008, 4096, 1, 1], type = f16, converting to q3_K .. size = 86.00 MiB -> 18.48 MiB [ 198/ 291] blk.21.attn_norm.weight - [ 4096, 1, 1, 1], type = f16, size = 0.008 MB [ 199/ 291] blk.21.ffn_norm.weight - [ 4096, 1, 1, 1], type = f16, size = 0.008 MB [ 200/ 291] blk.22.attn_q.weight - [ 4096, 4096, 1, 1], type = f16, converting to q2_K .. size = 32.00 MiB -> 5.25 MiB [ 201/ 291] blk.22.attn_k.weight - [ 4096, 4096, 1, 1], type = f16, converting to q2_K .. size = 32.00 MiB -> 5.25 MiB [ 202/ 291] blk.22.attn_v.weight - [ 4096, 4096, 1, 1], type = f16, converting to q3_K .. size = 32.00 MiB -> 6.88 MiB [ 203/ 291] blk.22.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q3_K .. size = 32.00 MiB -> 6.88 MiB [ 204/ 291] blk.22.ffn_gate.weight - [ 4096, 11008, 1, 1], type = f16, converting to q2_K .. 
size = 86.00 MiB -> 14.11 MiB [ 205/ 291] blk.22.ffn_up.weight - [ 4096, 11008, 1, 1], type = f16, converting to q2_K .. size = 86.00 MiB -> 14.11 MiB [ 206/ 291] blk.22.ffn_down.weight - [11008, 4096, 1, 1], type = f16, converting to q3_K .. size = 86.00 MiB -> 18.48 MiB [ 207/ 291] blk.22.attn_norm.weight - [ 4096, 1, 1, 1], type = f16, size = 0.008 MB [ 208/ 291] blk.22.ffn_norm.weight - [ 4096, 1, 1, 1], type = f16, size = 0.008 MB [ 209/ 291] blk.23.attn_q.weight - [ 4096, 4096, 1, 1], type = f16, converting to q2_K .. size = 32.00 MiB -> 5.25 MiB [ 210/ 291] blk.23.attn_k.weight - [ 4096, 4096, 1, 1], type = f16, converting to q2_K .. size = 32.00 MiB -> 5.25 MiB [ 211/ 291] blk.23.attn_v.weight - [ 4096, 4096, 1, 1], type = f16, converting to q3_K .. size = 32.00 MiB -> 6.88 MiB [ 212/ 291] blk.23.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q3_K .. size = 32.00 MiB -> 6.88 MiB [ 213/ 291] blk.23.ffn_gate.weight - [ 4096, 11008, 1, 1], type = f16, converting to q2_K .. size = 86.00 MiB -> 14.11 MiB [ 214/ 291] blk.23.ffn_up.weight - [ 4096, 11008, 1, 1], type = f16, converting to q2_K .. size = 86.00 MiB -> 14.11 MiB [ 215/ 291] blk.23.ffn_down.weight - [11008, 4096, 1, 1], type = f16, converting to q3_K .. size = 86.00 MiB -> 18.48 MiB [ 216/ 291] blk.23.attn_norm.weight - [ 4096, 1, 1, 1], type = f16, size = 0.008 MB [ 217/ 291] blk.23.ffn_norm.weight - [ 4096, 1, 1, 1], type = f16, size = 0.008 MB [ 218/ 291] blk.24.attn_q.weight - [ 4096, 4096, 1, 1], type = f16, converting to q2_K .. size = 32.00 MiB -> 5.25 MiB [ 219/ 291] blk.24.attn_k.weight - [ 4096, 4096, 1, 1], type = f16, converting to q2_K .. size = 32.00 MiB -> 5.25 MiB [ 220/ 291] blk.24.attn_v.weight - [ 4096, 4096, 1, 1], type = f16, converting to q3_K .. size = 32.00 MiB -> 6.88 MiB [ 221/ 291] blk.24.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q3_K .. 
size = 32.00 MiB -> 6.88 MiB [ 222/ 291] blk.24.ffn_gate.weight - [ 4096, 11008, 1, 1], type = f16, converting to q2_K .. size = 86.00 MiB -> 14.11 MiB [ 223/ 291] blk.24.ffn_up.weight - [ 4096, 11008, 1, 1], type = f16, converting to q2_K .. size = 86.00 MiB -> 14.11 MiB [ 224/ 291] blk.24.ffn_down.weight - [11008, 4096, 1, 1], type = f16, converting to q3_K .. size = 86.00 MiB -> 18.48 MiB [ 225/ 291] blk.24.attn_norm.weight - [ 4096, 1, 1, 1], type = f16, size = 0.008 MB [ 226/ 291] blk.24.ffn_norm.weight - [ 4096, 1, 1, 1], type = f16, size = 0.008 MB [ 227/ 291] blk.25.attn_q.weight - [ 4096, 4096, 1, 1], type = f16, converting to q2_K .. size = 32.00 MiB -> 5.25 MiB [ 228/ 291] blk.25.attn_k.weight - [ 4096, 4096, 1, 1], type = f16, converting to q2_K .. size = 32.00 MiB -> 5.25 MiB [ 229/ 291] blk.25.attn_v.weight - [ 4096, 4096, 1, 1], type = f16, converting to q3_K .. size = 32.00 MiB -> 6.88 MiB [ 230/ 291] blk.25.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q3_K .. size = 32.00 MiB -> 6.88 MiB [ 231/ 291] blk.25.ffn_gate.weight - [ 4096, 11008, 1, 1], type = f16, converting to q2_K .. size = 86.00 MiB -> 14.11 MiB [ 232/ 291] blk.25.ffn_up.weight - [ 4096, 11008, 1, 1], type = f16, converting to q2_K .. size = 86.00 MiB -> 14.11 MiB [ 233/ 291] blk.25.ffn_down.weight - [11008, 4096, 1, 1], type = f16, converting to q3_K .. size = 86.00 MiB -> 18.48 MiB [ 234/ 291] blk.25.attn_norm.weight - [ 4096, 1, 1, 1], type = f16, size = 0.008 MB [ 235/ 291] blk.25.ffn_norm.weight - [ 4096, 1, 1, 1], type = f16, size = 0.008 MB [ 236/ 291] blk.26.attn_q.weight - [ 4096, 4096, 1, 1], type = f16, converting to q2_K .. size = 32.00 MiB -> 5.25 MiB [ 237/ 291] blk.26.attn_k.weight - [ 4096, 4096, 1, 1], type = f16, converting to q2_K .. size = 32.00 MiB -> 5.25 MiB [ 238/ 291] blk.26.attn_v.weight - [ 4096, 4096, 1, 1], type = f16, converting to q3_K .. 
size = 32.00 MiB -> 6.88 MiB [ 239/ 291] blk.26.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q3_K .. size = 32.00 MiB -> 6.88 MiB [ 240/ 291] blk.26.ffn_gate.weight - [ 4096, 11008, 1, 1], type = f16, converting to q2_K .. size = 86.00 MiB -> 14.11 MiB [ 241/ 291] blk.26.ffn_up.weight - [ 4096, 11008, 1, 1], type = f16, converting to q2_K .. size = 86.00 MiB -> 14.11 MiB [ 242/ 291] blk.26.ffn_down.weight - [11008, 4096, 1, 1], type = f16, converting to q3_K .. size = 86.00 MiB -> 18.48 MiB [ 243/ 291] blk.26.attn_norm.weight - [ 4096, 1, 1, 1], type = f16, size = 0.008 MB [ 244/ 291] blk.26.ffn_norm.weight - [ 4096, 1, 1, 1], type = f16, size = 0.008 MB [ 245/ 291] blk.27.attn_q.weight - [ 4096, 4096, 1, 1], type = f16, converting to q2_K .. size = 32.00 MiB -> 5.25 MiB [ 246/ 291] blk.27.attn_k.weight - [ 4096, 4096, 1, 1], type = f16, converting to q2_K .. size = 32.00 MiB -> 5.25 MiB [ 247/ 291] blk.27.attn_v.weight - [ 4096, 4096, 1, 1], type = f16, converting to q3_K .. size = 32.00 MiB -> 6.88 MiB [ 248/ 291] blk.27.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q3_K .. size = 32.00 MiB -> 6.88 MiB [ 249/ 291] blk.27.ffn_gate.weight - [ 4096, 11008, 1, 1], type = f16, converting to q2_K .. size = 86.00 MiB -> 14.11 MiB [ 250/ 291] blk.27.ffn_up.weight - [ 4096, 11008, 1, 1], type = f16, converting to q2_K .. size = 86.00 MiB -> 14.11 MiB [ 251/ 291] blk.27.ffn_down.weight - [11008, 4096, 1, 1], type = f16, converting to q3_K .. size = 86.00 MiB -> 18.48 MiB [ 252/ 291] blk.27.attn_norm.weight - [ 4096, 1, 1, 1], type = f16, size = 0.008 MB [ 253/ 291] blk.27.ffn_norm.weight - [ 4096, 1, 1, 1], type = f16, size = 0.008 MB [ 254/ 291] blk.28.attn_q.weight - [ 4096, 4096, 1, 1], type = f16, converting to q2_K .. size = 32.00 MiB -> 5.25 MiB [ 255/ 291] blk.28.attn_k.weight - [ 4096, 4096, 1, 1], type = f16, converting to q2_K .. 
size = 32.00 MiB -> 5.25 MiB [ 256/ 291] blk.28.attn_v.weight - [ 4096, 4096, 1, 1], type = f16, converting to q3_K .. size = 32.00 MiB -> 6.88 MiB [ 257/ 291] blk.28.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q3_K .. size = 32.00 MiB -> 6.88 MiB [ 258/ 291] blk.28.ffn_gate.weight - [ 4096, 11008, 1, 1], type = f16, converting to q2_K .. size = 86.00 MiB -> 14.11 MiB [ 259/ 291] blk.28.ffn_up.weight - [ 4096, 11008, 1, 1], type = f16, converting to q2_K .. size = 86.00 MiB -> 14.11 MiB [ 260/ 291] blk.28.ffn_down.weight - [11008, 4096, 1, 1], type = f16, converting to q3_K .. size = 86.00 MiB -> 18.48 MiB [ 261/ 291] blk.28.attn_norm.weight - [ 4096, 1, 1, 1], type = f16, size = 0.008 MB [ 262/ 291] blk.28.ffn_norm.weight - [ 4096, 1, 1, 1], type = f16, size = 0.008 MB [ 263/ 291] blk.29.attn_q.weight - [ 4096, 4096, 1, 1], type = f16, converting to q2_K .. size = 32.00 MiB -> 5.25 MiB [ 264/ 291] blk.29.attn_k.weight - [ 4096, 4096, 1, 1], type = f16, converting to q2_K .. size = 32.00 MiB -> 5.25 MiB [ 265/ 291] blk.29.attn_v.weight - [ 4096, 4096, 1, 1], type = f16, converting to q3_K .. size = 32.00 MiB -> 6.88 MiB [ 266/ 291] blk.29.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q3_K .. size = 32.00 MiB -> 6.88 MiB [ 267/ 291] blk.29.ffn_gate.weight - [ 4096, 11008, 1, 1], type = f16, converting to q2_K .. size = 86.00 MiB -> 14.11 MiB [ 268/ 291] blk.29.ffn_up.weight - [ 4096, 11008, 1, 1], type = f16, converting to q2_K .. size = 86.00 MiB -> 14.11 MiB [ 269/ 291] blk.29.ffn_down.weight - [11008, 4096, 1, 1], type = f16, converting to q3_K .. size = 86.00 MiB -> 18.48 MiB [ 270/ 291] blk.29.attn_norm.weight - [ 4096, 1, 1, 1], type = f16, size = 0.008 MB [ 271/ 291] blk.29.ffn_norm.weight - [ 4096, 1, 1, 1], type = f16, size = 0.008 MB [ 272/ 291] blk.30.attn_q.weight - [ 4096, 4096, 1, 1], type = f16, converting to q2_K .. 
size = 32.00 MiB -> 5.25 MiB [ 273/ 291] blk.30.attn_k.weight - [ 4096, 4096, 1, 1], type = f16, converting to q2_K .. size = 32.00 MiB -> 5.25 MiB [ 274/ 291] blk.30.attn_v.weight - [ 4096, 4096, 1, 1], type = f16, converting to q3_K .. size = 32.00 MiB -> 6.88 MiB [ 275/ 291] blk.30.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q3_K .. size = 32.00 MiB -> 6.88 MiB [ 276/ 291] blk.30.ffn_gate.weight - [ 4096, 11008, 1, 1], type = f16, converting to q2_K .. size = 86.00 MiB -> 14.11 MiB [ 277/ 291] blk.30.ffn_up.weight - [ 4096, 11008, 1, 1], type = f16, converting to q2_K .. size = 86.00 MiB -> 14.11 MiB [ 278/ 291] blk.30.ffn_down.weight - [11008, 4096, 1, 1], type = f16, converting to q3_K .. size = 86.00 MiB -> 18.48 MiB [ 279/ 291] blk.30.attn_norm.weight - [ 4096, 1, 1, 1], type = f16, size = 0.008 MB [ 280/ 291] blk.30.ffn_norm.weight - [ 4096, 1, 1, 1], type = f16, size = 0.008 MB [ 281/ 291] blk.31.attn_q.weight - [ 4096, 4096, 1, 1], type = f16, converting to q2_K .. size = 32.00 MiB -> 5.25 MiB [ 282/ 291] blk.31.attn_k.weight - [ 4096, 4096, 1, 1], type = f16, converting to q2_K .. size = 32.00 MiB -> 5.25 MiB [ 283/ 291] blk.31.attn_v.weight - [ 4096, 4096, 1, 1], type = f16, converting to q3_K .. size = 32.00 MiB -> 6.88 MiB [ 284/ 291] blk.31.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q3_K .. size = 32.00 MiB -> 6.88 MiB [ 285/ 291] blk.31.ffn_gate.weight - [ 4096, 11008, 1, 1], type = f16, converting to q2_K .. size = 86.00 MiB -> 14.11 MiB [ 286/ 291] blk.31.ffn_up.weight - [ 4096, 11008, 1, 1], type = f16, converting to q2_K .. size = 86.00 MiB -> 14.11 MiB [ 287/ 291] blk.31.ffn_down.weight - [11008, 4096, 1, 1], type = f16, converting to q3_K .. 
size = 86.00 MiB -> 18.48 MiB [ 288/ 291] blk.31.attn_norm.weight - [ 4096, 1, 1, 1], type = f16, size = 0.008 MB [ 289/ 291] blk.31.ffn_norm.weight - [ 4096, 1, 1, 1], type = f16, size = 0.008 MB [ 290/ 291] output_norm.weight - [ 4096, 1, 1, 1], type = f16, size = 0.008 MB [ 291/ 291] output.weight - [ 4096, 32000, 1, 1], type = f16, converting to q6_K .. size = 250.00 MiB -> 102.54 MiB llama_model_quantize_internal: model size = 12852.51 MB llama_model_quantize_internal: quant size = 2414.31 MB main: quantize time = 80060.02 ms main: total time = 80060.03 ms ``` ### OS Windows ### GPU Other ### CPU Intel ### Ollama version 0.1.41
GiteaMirror added the bug label 2026-04-28 12:23:22 -05:00

@jmorganca commented on GitHub (Jun 9, 2024):

Hi there @Hsiayukoo, sorry this happened. Judging from the assertion `only f32 src1 supported for now`, there may have been an issue converting and quantizing the model: a tensor that should be in f32 format may be stored in another dtype.

You can import GGUF files directly into Ollama via `ollama create`.

Create a `Modelfile`:

```
FROM ./my-model.gguf
```

then create the Ollama model:

```
ollama create my-model -f Modelfile
```

and you should be able to use it from there:

```
ollama run my-model
```

Let me know if this helps.


@Hsiayukoo commented on GitHub (Jun 11, 2024):

Thanks, I converted all the tensors to np.float32 and it works.
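The fix above can be sketched as a small preprocessing pass over the checkpoint tensors before handing them to gguf-py's writer. This is a hedged illustration, not the reporter's actual script: the `tensors` dict, the `cast_all_f32` helper, and the small example shapes are all hypothetical, and it only shows the dtype cast the reporter describes (the actual GGUF writing would follow the llama.cpp `writer.py` example linked in the issue).

```python
import numpy as np

def cast_all_f32(tensors):
    """Cast every checkpoint tensor to float32 before GGUF export.

    llama.cpp asserts `only f32 src1 supported for now` when a tensor it
    expects in f32 (e.g. the 1-D *_norm weights) arrives in another dtype,
    so casting everything to float32 avoids the 0xc0000409 crash here.
    """
    return {name: np.asarray(t).astype(np.float32) for name, t in tensors.items()}

# Hypothetical tensors mimicking llama2 GGUF naming, with small shapes
tensors = {
    "blk.0.attn_norm.weight": np.ones(8, dtype=np.float16),
    "blk.0.attn_q.weight": np.ones((8, 8), dtype=np.float16),
}
casted = cast_all_f32(tensors)
```

After this pass, each array would be passed to `GGUFWriter.add_tensor` as in the upstream example; the quantizer then re-compresses the large matrices anyway, so the float32 intermediate mainly costs disk space, not final model size.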

Reference: github-starred/ollama#49600