[GH-ISSUE #7188] Bad juju creating a model (the llama.cpp generated file starts with "GGUF") #4568

Closed
opened 2026-04-12 15:30:07 -05:00 by GiteaMirror · 12 comments
Owner

Originally created by @robbiemu on GitHub (Oct 13, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/7188

What is the issue?

I have the [2b base model of Salamandra](https://huggingface.co/robbiemu/salamandra-2b) quantized to several different weight formats, but I am getting an error when creating a model from it.

ollama create salamandra:2b_bf16 -f ./Modelfile
transferring model data 100%
Error: invalid file magic
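
Note that the file really does start with the GGUF magic (see the title). For reference, here is a minimal sketch of checking the magic by hand — my own illustration, not Ollama's actual validation code; GGUF files begin with the 4-byte ASCII magic `GGUF` followed by a little-endian u32 format version:

```
# Minimal, hand-rolled GGUF magic check -- an illustration, not Ollama's code.
# A GGUF file starts with the 4-byte ASCII magic b"GGUF", then a little-endian
# uint32 format version (3 for current files).
import struct
import sys

def check_gguf_magic(path: str) -> None:
    with open(path, "rb") as f:
        magic = f.read(4)
        (version,) = struct.unpack("<I", f.read(4))
    if magic != b"GGUF":
        sys.exit(f"{path}: invalid file magic {magic!r}")
    print(f"{path}: GGUF v{version}, magic OK")

check_gguf_magic("./salamandra-2b_bf16.gguf")
```

Equivalently, `head -c 4 ./salamandra-2b_bf16.gguf` should print `GGUF`, so the magic itself looks fine.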

I hesitate to even use the chat template, because it is a base model, not a chat model, so its chat output looks kind of erratic:

> hola, puedes decirme por qué el sol es amarillo?
2014-03-09
Hola!
El color del Sol se debe a que la luz solar tiene una longitud de onda larga (de unos 580 nm) y corta (entre los 400nm y los 760nm).
La radiación visible es aquella cuya longitud de onda está entre el rojo y el azul, por lo tanto las longitudes de onda más cortas son rojas.
El Sol emite luz en todas direcciones pero la mayor
>

But its text generation looks good (aligned with what you would expect from a base model). This is a log of a full run, in case the details are pertinent to the file-magic error above:

llama-cli -m ./salamandra-2b_bf16.gguf --ctx-size 8192 --rope-freq-base 10000.0 --top-p 0.95 --repeat-penalty 1.2 --temp 0.1 --n-predict 128 --top-k 40 -p "hola, puedes decirme por qué el sol es amarillo?"
build: 3889 (b6d6c528) with Apple clang version 16.0.0 (clang-1600.0.26.3) for arm64-apple-darwin24.0.0
main: warning: changing RoPE frequency base to 10000.
main: llama backend init
main: load the model and apply lora adapter, if any
llama_model_loader: loaded meta data with 29 key-value pairs and 219 tensors from ./salamandra-2b_bf16.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                         general.size_label str              = 2.3B
llama_model_loader: - kv   3:                            general.license str              = apache-2.0
llama_model_loader: - kv   4:                               general.tags arr[str,1]       = ["text-generation"]
llama_model_loader: - kv   5:                          general.languages arr[str,36]      = ["bg", "ca", "code", "cs", "cy", "da"...
llama_model_loader: - kv   6:                          llama.block_count u32              = 24
llama_model_loader: - kv   7:                       llama.context_length u32              = 8192
llama_model_loader: - kv   8:                     llama.embedding_length u32              = 2048
llama_model_loader: - kv   9:                  llama.feed_forward_length u32              = 5440
llama_model_loader: - kv  10:                 llama.attention.head_count u32              = 16
llama_model_loader: - kv  11:              llama.attention.head_count_kv u32              = 16
llama_model_loader: - kv  12:                       llama.rope.freq_base f32              = 10000.000000
llama_model_loader: - kv  13:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  14:                          general.file_type u32              = 32
llama_model_loader: - kv  15:                           llama.vocab_size u32              = 256000
llama_model_loader: - kv  16:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv  17:            tokenizer.ggml.add_space_prefix bool             = true
llama_model_loader: - kv  18:                       tokenizer.ggml.model str              = llama
llama_model_loader: - kv  19:                         tokenizer.ggml.pre str              = default
llama_model_loader: - kv  20:                      tokenizer.ggml.tokens arr[str,256000]  = ["<unk>", "<s>", "</s>", "<pad>", "<|...
llama_model_loader: - kv  21:                      tokenizer.ggml.scores arr[f32,256000]  = [-1000.000000, -1000.000000, -1000.00...
llama_model_loader: - kv  22:                  tokenizer.ggml.token_type arr[i32,256000]  = [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, ...
llama_model_loader: - kv  23:                tokenizer.ggml.bos_token_id u32              = 1
llama_model_loader: - kv  24:                tokenizer.ggml.eos_token_id u32              = 2
llama_model_loader: - kv  25:            tokenizer.ggml.unknown_token_id u32              = 0
llama_model_loader: - kv  26:               tokenizer.ggml.add_bos_token bool             = true
llama_model_loader: - kv  27:               tokenizer.ggml.add_eos_token bool             = false
llama_model_loader: - kv  28:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:   49 tensors
llama_model_loader: - type bf16:  170 tensors
llm_load_vocab: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
llm_load_vocab: special tokens cache size = 104
llm_load_vocab: token to piece cache size = 1.8842 MB
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = llama
llm_load_print_meta: vocab type       = SPM
llm_load_print_meta: n_vocab          = 256000
llm_load_print_meta: n_merges         = 0
llm_load_print_meta: vocab_only       = 0
llm_load_print_meta: n_ctx_train      = 8192
llm_load_print_meta: n_embd           = 2048
llm_load_print_meta: n_layer          = 24
llm_load_print_meta: n_head           = 16
llm_load_print_meta: n_head_kv        = 16
llm_load_print_meta: n_rot            = 128
llm_load_print_meta: n_swa            = 0
llm_load_print_meta: n_embd_head_k    = 128
llm_load_print_meta: n_embd_head_v    = 128
llm_load_print_meta: n_gqa            = 1
llm_load_print_meta: n_embd_k_gqa     = 2048
llm_load_print_meta: n_embd_v_gqa     = 2048
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale    = 0.0e+00
llm_load_print_meta: n_ff             = 5440
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: causal attn      = 1
llm_load_print_meta: pooling type     = 0
llm_load_print_meta: rope type        = 0
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn  = 8192
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: ssm_d_conv       = 0
llm_load_print_meta: ssm_d_inner      = 0
llm_load_print_meta: ssm_d_state      = 0
llm_load_print_meta: ssm_dt_rank      = 0
llm_load_print_meta: ssm_dt_b_c_rms   = 0
llm_load_print_meta: model type       = ?B
llm_load_print_meta: model ftype      = BF16
llm_load_print_meta: model params     = 2.25 B
llm_load_print_meta: model size       = 4.20 GiB (16.00 BPW)
llm_load_print_meta: general.name     = n/a
llm_load_print_meta: BOS token        = 1 '<s>'
llm_load_print_meta: EOS token        = 2 '</s>'
llm_load_print_meta: UNK token        = 0 '<unk>'
llm_load_print_meta: LF token         = 145 '<0x0A>'
llm_load_print_meta: EOT token        = 5 '<|im_end|>'
llm_load_print_meta: EOG token        = 2 '</s>'
llm_load_print_meta: EOG token        = 5 '<|im_end|>'
llm_load_print_meta: max token length = 72
llm_load_tensors: ggml ctx size =    0.20 MiB
llm_load_tensors: offloading 24 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 25/25 layers to GPU
llm_load_tensors:      Metal buffer size =  4298.39 MiB
llm_load_tensors:        CPU buffer size =  1000.00 MiB
.......................................................
llama_new_context_with_model: n_ctx      = 8192
llama_new_context_with_model: n_batch    = 2048
llama_new_context_with_model: n_ubatch   = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base  = 10000.0
llama_new_context_with_model: freq_scale = 1
ggml_metal_init: allocating
ggml_metal_init: found device: Apple M3 Max
ggml_metal_init: picking default device: Apple M3 Max
ggml_metal_init: using embedded metal library
ggml_metal_init: GPU name:   Apple M3 Max
ggml_metal_init: GPU family: MTLGPUFamilyApple9  (1009)
ggml_metal_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_init: GPU family: MTLGPUFamilyMetal3  (5001)
ggml_metal_init: simdgroup reduction support   = true
ggml_metal_init: simdgroup matrix mul. support = true
ggml_metal_init: hasUnifiedMemory              = true
ggml_metal_init: recommendedMaxWorkingSetSize  = 42949.67 MB
llama_kv_cache_init:      Metal KV buffer size =  1536.00 MiB
llama_new_context_with_model: KV self size  = 1536.00 MiB, K (f16):  768.00 MiB, V (f16):  768.00 MiB
llama_new_context_with_model:        CPU  output buffer size =     0.98 MiB
llama_new_context_with_model:      Metal compute buffer size =   288.00 MiB
llama_new_context_with_model:        CPU compute buffer size =   500.00 MiB
llama_new_context_with_model: graph nodes  = 774
llama_new_context_with_model: graph splits = 339
llama_init_from_gpt_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
main: llama threadpool init, n_threads = 12

system_info: n_threads = 12 (n_threads_batch = 12) / 16 | AVX = 0 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 0 | NEON = 1 | SVE = 0 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | RISCV_VECT = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 | MATMUL_INT8 = 1 | LLAMAFILE = 1 |

sampler seed: 892523417
sampler params:
	repeat_last_n = 64, repeat_penalty = 1.200, frequency_penalty = 0.000, presence_penalty = 0.000
	top_k = 40, tfs_z = 1.000, top_p = 0.950, min_p = 0.050, typical_p = 1.000, temp = 0.100
	mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
sampler chain: logits -> logit-bias -> penalties -> top-k -> tail-free -> typical -> top-p -> min-p -> temp-ext -> softmax -> dist
generate: n_ctx = 8192, n_batch = 2048, n_predict = 128, n_keep = 1

 hola, puedes decirme por qué el sol es amarillo?
¿Por qué la luz del sol se ve más clara en verano que en invierno?
La respuesta a esta pregunta está relacionada con las propiedades de los colores.
Los colores son formas diferentes de energía electromagnética y tienen una longitud de onda diferente.
En general, cuanto mayor sea el número de vibraciones por segundo (velocidad), menor será la frecuencia del color.
Por ejemplo, si se mide en Hertz (Hz) o ciclos por segundo, un color rojo tiene 160 Hz mientras que uno azul tiene solo 435 Hz.
La luz visible es una combinación de colores compu

llama_perf_sampler_print:    sampling time =     132.98 ms /   143 runs   (    0.93 ms per token,  1075.34 tokens per second)
llama_perf_context_print:        load time =     685.66 ms
llama_perf_context_print: prompt eval time =    1167.94 ms /    15 tokens (   77.86 ms per token,    12.84 tokens per second)
llama_perf_context_print:        eval time =   17103.92 ms /   127 runs   (  134.68 ms per token,     7.43 tokens per second)
llama_perf_context_print:       total time =   18423.56 ms /   142 tokens
ggml_metal_free: deallocating

But perhaps I did something wrong when I wrote my Modelfile?

# Ollama Modelfile for Salamandra 2B IQ4_NL

FROM ./salamandra-2b_Q8_0.gguf

# Model Parameters
PARAMETER num_ctx 8192
PARAMETER rope_freq_base 10000.0
PARAMETER top_p 0.95
PARAMETER repeat_penalty 1.2

# System Prompt
SYSTEM """You are a multilingual assistant capable of understanding and responding in multiple languages. Adapt your responses to match the user's input language while providing clear, accurate, and concise information."""

# Template
TEMPLATE """{{ if .System }}<|im_start|>system
{{ .System }}<|im_end|>{{ end }}{{ if .Prompt }}<|im_start|>user
{{ .Prompt }}<|im_end|>{{ end }}<|im_start|>assistant"""

# License
LICENSE """

                                 Apache License
                           Version 2.0, January 2004
                        http://www.apache.org/licenses/

   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

   1. Definitions.

      "License" shall mean the terms and conditions for use, reproduction,
      and distribution as defined by Sections 1 through 9 of this document.

      "Licensor" shall mean the copyright owner or entity authorized by
      the copyright owner that is granting the License.

      "Legal Entity" shall mean the union of the acting entity and all
      other entities that control, are controlled by, or are under common
      control with that entity. For the purposes of this definition,
      "control" means (i) the power, direct or indirect, to cause the
      direction or management of such entity, whether by contract or
      otherwise, or (ii) ownership of fifty percent (50%) or more of the
      outstanding shares, or (iii) beneficial ownership of such entity.

      "You" (or "Your") shall mean an individual or Legal Entity
      exercising permissions granted by this License.

      "Source" form shall mean the preferred form for making modifications,
      including but not limited to software source code, documentation
      source, and configuration files.

      "Object" form shall mean any form resulting from mechanical
      transformation or translation of a Source form, including but
      not limited to compiled object code, generated documentation,
      and conversions to other media types.

      "Work" shall mean the work of authorship, whether in Source or
      Object form, made available under the License, as indicated by a
      copyright notice that is included in or attached to the work
      (an example is provided in the Appendix below).

      "Derivative Works" shall mean any work, whether in Source or Object
      form, that is based on (or derived from) the Work and for which the
      editorial revisions, annotations, elaborations, or other modifications
      represent, as a whole, an original work of authorship. For the purposes
      of this License, Derivative Works shall not include works that remain
      separable from, or merely link (or bind by name) to the interfaces of,
      the Work and Derivative Works thereof.

      "Contribution" shall mean any work of authorship, including
      the original version of the Work and any modifications or additions
      to that Work or Derivative Works thereof, that is intentionally
      submitted to Licensor for inclusion in the Work by the copyright owner
      or by an individual or Legal Entity authorized to submit on behalf of
      the copyright owner. For the purposes of this definition, "submitted"
      means any form of electronic, verbal, or written communication sent
      to the Licensor or its representatives, including but not limited to
      communication on electronic mailing lists, source code control systems,
      and issue tracking systems that are managed by, or on behalf of, the
      Licensor for the purpose of discussing and improving the Work, but
      excluding communication that is conspicuously marked or otherwise
      designated in writing by the copyright owner as "Not a Contribution."

      "Contributor" shall mean Licensor and any individual or Legal Entity
      on behalf of whom a Contribution has been received by Licensor and
      subsequently incorporated within the Work.

   2. Grant of Copyright License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      copyright license to reproduce, prepare Derivative Works of,
      publicly display, publicly perform, sublicense, and distribute the
      Work and such Derivative Works in Source or Object form.

   3. Grant of Patent License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      (except as stated in this section) patent license to make, have made,
      use, offer to sell, sell, import, and otherwise transfer the Work,
      where such license applies only to those patent claims licensable
      by such Contributor that are necessarily infringed by their
      Contribution(s) alone or by combination of their Contribution(s)
      with the Work to which such Contribution(s) was submitted. If You
      institute patent litigation against any entity (including a
      cross-claim or counterclaim in a lawsuit) alleging that the Work
      or a Contribution incorporated within the Work constitutes direct
      or contributory patent infringement, then any patent licenses
      granted to You under this License for that Work shall terminate
      as of the date such litigation is filed.

   4. Redistribution. You may reproduce and distribute copies of the
      Work or Derivative Works thereof in any medium, with or without
      modifications, and in Source or Object form, provided that You
      meet the following conditions:

      (a) You must give any other recipients of the Work or
          Derivative Works a copy of this License; and

      (b) You must cause any modified files to carry prominent notices
          stating that You changed the files; and

      (c) You must retain, in the Source form of any Derivative Works
          that You distribute, all copyright, patent, trademark, and
          attribution notices from the Source form of the Work,
          excluding those notices that do not pertain to any part of
          the Derivative Works; and

      (d) If the Work includes a "NOTICE" text file as part of its
          distribution, then any Derivative Works that You distribute must
          include a readable copy of the attribution notices contained
          within such NOTICE file, excluding those notices that do not
          pertain to any part of the Derivative Works, in at least one
          of the following places: within a NOTICE text file distributed
          as part of the Derivative Works; within the Source form or
          documentation, if provided along with the Derivative Works; or,
          within a display generated by the Derivative Works, if and
          wherever such third-party notices normally appear. The contents
          of the NOTICE file are for informational purposes only and
          do not modify the License. You may add Your own attribution
          notices within Derivative Works that You distribute, alongside
          or as an addendum to the NOTICE text from the Work, provided
          that such additional attribution notices cannot be construed
          as modifying the License.

      You may add Your own copyright statement to Your modifications and
      may provide additional or different license terms and conditions
      for use, reproduction, or distribution of Your modifications, or
      for any such Derivative Works as a whole, provided Your use,
      reproduction, and distribution of the Work otherwise complies with
      the conditions stated in this License.

   5. Submission of Contributions. Unless You explicitly state otherwise,
      any Contribution intentionally submitted for inclusion in the Work
      by You to the Licensor shall be under the terms and conditions of
      this License, without any additional terms or conditions.
      Notwithstanding the above, nothing herein shall supersede or modify
      the terms of any separate license agreement you may have executed
      with Licensor regarding such Contributions.

   6. Trademarks. This License does not grant permission to use the trade
      names, trademarks, service marks, or product names of the Licensor,
      except as required for reasonable and customary use in describing the
      origin of the Work and reproducing the content of the NOTICE file.

   7. Disclaimer of Warranty. Unless required by applicable law or
      agreed to in writing, Licensor provides the Work (and each
      Contributor provides its Contributions) on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
      implied, including, without limitation, any warranties or conditions
      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
      PARTICULAR PURPOSE. You are solely responsible for determining the
      appropriateness of using or redistributing the Work and assume any
      risks associated with Your exercise of permissions under this License.

   8. Limitation of Liability. In no event and under no legal theory,
      whether in tort (including negligence), contract, or otherwise,
      unless required by applicable law (such as deliberate and grossly
      negligent acts) or agreed to in writing, shall any Contributor be
      liable to You for damages, including any direct, indirect, special,
      incidental, or consequential damages of any character arising as a
      result of this License or out of the use or inability to use the
      Work (including but not limited to damages for loss of goodwill,
      work stoppage, computer failure or malfunction, or any and all
      other commercial damages or losses), even if such Contributor
      has been advised of the possibility of such damages.

   9. Accepting Warranty or Additional Liability. While redistributing
      the Work or Derivative Works thereof, You may choose to offer,
      and charge a fee for, acceptance of support, warranty, indemnity,
      or other liability obligations and/or rights consistent with this
      License. However, in accepting such obligations, You may act only
      on Your own behalf and on Your sole responsibility, not on behalf
      of any other Contributor, and only if You agree to indemnify,
      defend, and hold each Contributor harmless for any liability
      incurred by, or claims asserted against, such Contributor by reason
      of your accepting any such warranty or additional liability.

   END OF TERMS AND CONDITIONS

   APPENDIX: How to apply the Apache License to your work.

      To apply the Apache License to your work, attach the following
      boilerplate notice, with the fields enclosed by brackets "[]"
      replaced with your own identifying information. (Don't include
      the brackets!)  The text should be enclosed in the appropriate
      comment syntax for the file format. We also recommend that a
      file or class name and description of purpose be included on the
      same "printed page" as the copyright notice for easier
      identification within third-party archives.

   Copyright 2024 Language Technologies Unit, Barcelona Supercomputing Center

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License."""

OS

macOS

GPU

Apple

CPU

Apple

Ollama version

0.3.13

GiteaMirror added the bug label 2026-04-12 15:30:07 -05:00
Author
Owner

@robbiemu commented on GitHub (Oct 13, 2024):

This is the log of how the GGUF file was made:

/Users/Shared/Public/Github/llama.cpp/llama-cli --version
version: 3906 (7eee341b)
built with Apple clang version 16.0.0 (clang-1600.0.26.3) for arm64-apple-darwin24.0.0

/Users/Shared/Public/Github/llama.cpp/convert_hf_to_gguf.py --outtype bf16 . --outfile ./salamandra-2b_bf16.gguf
INFO:hf-to-gguf:Loading model:
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Exporting model...
INFO:hf-to-gguf:gguf: loading model part 'model.safetensors'
INFO:hf-to-gguf:output.weight,               torch.bfloat16 --> BF16, shape = {2048, 256000}
INFO:hf-to-gguf:token_embd.weight,           torch.bfloat16 --> BF16, shape = {2048, 256000}
INFO:hf-to-gguf:blk.0.attn_norm.weight,      torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.0.ffn_down.weight,       torch.bfloat16 --> BF16, shape = {5440, 2048}
INFO:hf-to-gguf:blk.0.ffn_gate.weight,       torch.bfloat16 --> BF16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.0.ffn_up.weight,         torch.bfloat16 --> BF16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.0.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.0.attn_k.weight,         torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.0.attn_output.weight,    torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.0.attn_q.weight,         torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.0.attn_v.weight,         torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.1.attn_norm.weight,      torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.1.ffn_down.weight,       torch.bfloat16 --> BF16, shape = {5440, 2048}
INFO:hf-to-gguf:blk.1.ffn_gate.weight,       torch.bfloat16 --> BF16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.1.ffn_up.weight,         torch.bfloat16 --> BF16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.1.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.1.attn_k.weight,         torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.1.attn_output.weight,    torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.1.attn_q.weight,         torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.1.attn_v.weight,         torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.10.attn_norm.weight,     torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.10.ffn_down.weight,      torch.bfloat16 --> BF16, shape = {5440, 2048}
INFO:hf-to-gguf:blk.10.ffn_gate.weight,      torch.bfloat16 --> BF16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.10.ffn_up.weight,        torch.bfloat16 --> BF16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.10.ffn_norm.weight,      torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.10.attn_k.weight,        torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.10.attn_output.weight,   torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.10.attn_q.weight,        torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.10.attn_v.weight,        torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.11.attn_norm.weight,     torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.11.ffn_down.weight,      torch.bfloat16 --> BF16, shape = {5440, 2048}
INFO:hf-to-gguf:blk.11.ffn_gate.weight,      torch.bfloat16 --> BF16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.11.ffn_up.weight,        torch.bfloat16 --> BF16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.11.ffn_norm.weight,      torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.11.attn_k.weight,        torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.11.attn_output.weight,   torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.11.attn_q.weight,        torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.11.attn_v.weight,        torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.12.attn_norm.weight,     torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.12.ffn_down.weight,      torch.bfloat16 --> BF16, shape = {5440, 2048}
INFO:hf-to-gguf:blk.12.ffn_gate.weight,      torch.bfloat16 --> BF16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.12.ffn_up.weight,        torch.bfloat16 --> BF16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.12.ffn_norm.weight,      torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.12.attn_k.weight,        torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.12.attn_output.weight,   torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.12.attn_q.weight,        torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.12.attn_v.weight,        torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.13.attn_norm.weight,     torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.13.ffn_down.weight,      torch.bfloat16 --> BF16, shape = {5440, 2048}
INFO:hf-to-gguf:blk.13.ffn_gate.weight,      torch.bfloat16 --> BF16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.13.ffn_up.weight,        torch.bfloat16 --> BF16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.13.ffn_norm.weight,      torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.13.attn_k.weight,        torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.13.attn_output.weight,   torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.13.attn_q.weight,        torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.13.attn_v.weight,        torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.14.attn_norm.weight,     torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.14.ffn_down.weight,      torch.bfloat16 --> BF16, shape = {5440, 2048}
INFO:hf-to-gguf:blk.14.ffn_gate.weight,      torch.bfloat16 --> BF16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.14.ffn_up.weight,        torch.bfloat16 --> BF16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.14.ffn_norm.weight,      torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.14.attn_k.weight,        torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.14.attn_output.weight,   torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.14.attn_q.weight,        torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.14.attn_v.weight,        torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.15.attn_norm.weight,     torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.15.ffn_down.weight,      torch.bfloat16 --> BF16, shape = {5440, 2048}
INFO:hf-to-gguf:blk.15.ffn_gate.weight,      torch.bfloat16 --> BF16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.15.ffn_up.weight,        torch.bfloat16 --> BF16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.15.ffn_norm.weight,      torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.15.attn_k.weight,        torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.15.attn_output.weight,   torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.15.attn_q.weight,        torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.15.attn_v.weight,        torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.16.attn_norm.weight,     torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.16.ffn_down.weight,      torch.bfloat16 --> BF16, shape = {5440, 2048}
INFO:hf-to-gguf:blk.16.ffn_gate.weight,      torch.bfloat16 --> BF16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.16.ffn_up.weight,        torch.bfloat16 --> BF16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.16.ffn_norm.weight,      torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.16.attn_k.weight,        torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.16.attn_output.weight,   torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.16.attn_q.weight,        torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.16.attn_v.weight,        torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.17.attn_norm.weight,     torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.17.ffn_down.weight,      torch.bfloat16 --> BF16, shape = {5440, 2048}
INFO:hf-to-gguf:blk.17.ffn_gate.weight,      torch.bfloat16 --> BF16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.17.ffn_up.weight,        torch.bfloat16 --> BF16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.17.ffn_norm.weight,      torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.17.attn_k.weight,        torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.17.attn_output.weight,   torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.17.attn_q.weight,        torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.17.attn_v.weight,        torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.18.attn_norm.weight,     torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.18.ffn_down.weight,      torch.bfloat16 --> BF16, shape = {5440, 2048}
INFO:hf-to-gguf:blk.18.ffn_gate.weight,      torch.bfloat16 --> BF16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.18.ffn_up.weight,        torch.bfloat16 --> BF16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.18.ffn_norm.weight,      torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.18.attn_k.weight,        torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.18.attn_output.weight,   torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.18.attn_q.weight,        torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.18.attn_v.weight,        torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.19.attn_norm.weight,     torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.19.ffn_down.weight,      torch.bfloat16 --> BF16, shape = {5440, 2048}
INFO:hf-to-gguf:blk.19.ffn_gate.weight,      torch.bfloat16 --> BF16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.19.ffn_up.weight,        torch.bfloat16 --> BF16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.19.ffn_norm.weight,      torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.19.attn_k.weight,        torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.19.attn_output.weight,   torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.19.attn_q.weight,        torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.19.attn_v.weight,        torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.2.attn_norm.weight,      torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.2.ffn_down.weight,       torch.bfloat16 --> BF16, shape = {5440, 2048}
INFO:hf-to-gguf:blk.2.ffn_gate.weight,       torch.bfloat16 --> BF16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.2.ffn_up.weight,         torch.bfloat16 --> BF16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.2.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.2.attn_k.weight,         torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.2.attn_output.weight,    torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.2.attn_q.weight,         torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.2.attn_v.weight,         torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.20.attn_norm.weight,     torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.20.ffn_down.weight,      torch.bfloat16 --> BF16, shape = {5440, 2048}
INFO:hf-to-gguf:blk.20.ffn_gate.weight,      torch.bfloat16 --> BF16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.20.ffn_up.weight,        torch.bfloat16 --> BF16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.20.ffn_norm.weight,      torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.20.attn_k.weight,        torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.20.attn_output.weight,   torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.20.attn_q.weight,        torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.20.attn_v.weight,        torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.21.attn_norm.weight,     torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.21.ffn_down.weight,      torch.bfloat16 --> BF16, shape = {5440, 2048}
INFO:hf-to-gguf:blk.21.ffn_gate.weight,      torch.bfloat16 --> BF16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.21.ffn_up.weight,        torch.bfloat16 --> BF16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.21.ffn_norm.weight,      torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.21.attn_k.weight,        torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.21.attn_output.weight,   torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.21.attn_q.weight,        torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.21.attn_v.weight,        torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.22.attn_norm.weight,     torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.22.ffn_down.weight,      torch.bfloat16 --> BF16, shape = {5440, 2048}
INFO:hf-to-gguf:blk.22.ffn_gate.weight,      torch.bfloat16 --> BF16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.22.ffn_up.weight,        torch.bfloat16 --> BF16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.22.ffn_norm.weight,      torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.22.attn_k.weight,        torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.22.attn_output.weight,   torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.22.attn_q.weight,        torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.22.attn_v.weight,        torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.23.attn_norm.weight,     torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.23.ffn_down.weight,      torch.bfloat16 --> BF16, shape = {5440, 2048}
INFO:hf-to-gguf:blk.23.ffn_gate.weight,      torch.bfloat16 --> BF16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.23.ffn_up.weight,        torch.bfloat16 --> BF16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.23.ffn_norm.weight,      torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.23.attn_k.weight,        torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.23.attn_output.weight,   torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.23.attn_q.weight,        torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.23.attn_v.weight,        torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.3.attn_norm.weight,      torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.3.ffn_down.weight,       torch.bfloat16 --> BF16, shape = {5440, 2048}
INFO:hf-to-gguf:blk.3.ffn_gate.weight,       torch.bfloat16 --> BF16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.3.ffn_up.weight,         torch.bfloat16 --> BF16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.3.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.3.attn_k.weight,         torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.3.attn_output.weight,    torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.3.attn_q.weight,         torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.3.attn_v.weight,         torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.4.attn_norm.weight,      torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.4.ffn_down.weight,       torch.bfloat16 --> BF16, shape = {5440, 2048}
INFO:hf-to-gguf:blk.4.ffn_gate.weight,       torch.bfloat16 --> BF16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.4.ffn_up.weight,         torch.bfloat16 --> BF16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.4.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.4.attn_k.weight,         torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.4.attn_output.weight,    torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.4.attn_q.weight,         torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.4.attn_v.weight,         torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.5.attn_norm.weight,      torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.5.ffn_down.weight,       torch.bfloat16 --> BF16, shape = {5440, 2048}
INFO:hf-to-gguf:blk.5.ffn_gate.weight,       torch.bfloat16 --> BF16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.5.ffn_up.weight,         torch.bfloat16 --> BF16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.5.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.5.attn_k.weight,         torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.5.attn_output.weight,    torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.5.attn_q.weight,         torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.5.attn_v.weight,         torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.6.attn_norm.weight,      torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.6.ffn_down.weight,       torch.bfloat16 --> BF16, shape = {5440, 2048}
INFO:hf-to-gguf:blk.6.ffn_gate.weight,       torch.bfloat16 --> BF16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.6.ffn_up.weight,         torch.bfloat16 --> BF16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.6.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.6.attn_k.weight,         torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.6.attn_output.weight,    torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.6.attn_q.weight,         torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.6.attn_v.weight,         torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.7.attn_norm.weight,      torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.7.ffn_down.weight,       torch.bfloat16 --> BF16, shape = {5440, 2048}
INFO:hf-to-gguf:blk.7.ffn_gate.weight,       torch.bfloat16 --> BF16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.7.ffn_up.weight,         torch.bfloat16 --> BF16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.7.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.7.attn_k.weight,         torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.7.attn_output.weight,    torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.7.attn_q.weight,         torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.7.attn_v.weight,         torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.8.attn_norm.weight,      torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.8.ffn_down.weight,       torch.bfloat16 --> BF16, shape = {5440, 2048}
INFO:hf-to-gguf:blk.8.ffn_gate.weight,       torch.bfloat16 --> BF16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.8.ffn_up.weight,         torch.bfloat16 --> BF16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.8.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.8.attn_k.weight,         torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.8.attn_output.weight,    torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.8.attn_q.weight,         torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.8.attn_v.weight,         torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.9.attn_norm.weight,      torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.9.ffn_down.weight,       torch.bfloat16 --> BF16, shape = {5440, 2048}
INFO:hf-to-gguf:blk.9.ffn_gate.weight,       torch.bfloat16 --> BF16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.9.ffn_up.weight,         torch.bfloat16 --> BF16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.9.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.9.attn_k.weight,         torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.9.attn_output.weight,    torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.9.attn_q.weight,         torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.9.attn_v.weight,         torch.bfloat16 --> BF16, shape = {2048, 2048}
INFO:hf-to-gguf:output_norm.weight,          torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:Set meta model
INFO:hf-to-gguf:Set model parameters
INFO:hf-to-gguf:gguf: context length = 8192
INFO:hf-to-gguf:gguf: embedding length = 2048
INFO:hf-to-gguf:gguf: feed forward length = 5440
INFO:hf-to-gguf:gguf: head count = 16
INFO:hf-to-gguf:gguf: key-value head count = 16
INFO:hf-to-gguf:gguf: rope theta = 10000.0
INFO:hf-to-gguf:gguf: rms norm epsilon = 1e-05
INFO:hf-to-gguf:gguf: file type = 32
INFO:hf-to-gguf:Set model tokenizer
INFO:gguf.vocab:Setting special token type bos to 1
INFO:gguf.vocab:Setting special token type eos to 2
INFO:gguf.vocab:Setting special token type unk to 0
INFO:gguf.vocab:Setting add_bos_token to True
INFO:gguf.vocab:Setting add_eos_token to False
INFO:hf-to-gguf:Set model quantization version
INFO:gguf.gguf_writer:Writing the following files:
INFO:gguf.gguf_writer:salamandra-2b_bf16.gguf: n_tensors = 219, total_size = 4.5G
Writing: 100%|████████████████████████████████████████████████████████████████████| 4.51G/4.51G [00:10<00:00, 419Mbyte/s]
INFO:hf-to-gguf:Model successfully exported to salamandra-2b_bf16.gguf

@robbiemu commented on GitHub (Oct 13, 2024):

The problem can be seen here:

xxd -l 16 salamandra-2b_bf16.gguf | head
00000000: 4747 5546 0300 0000 db00 0000 0000 0000  GGUF............

The problem is that Ollama's ggml.go file is specifically checking for two magic numbers:
0x67676a74 ("ggjt" in ASCII)
0x67676d6c ("ggml" in ASCII)
The file starts with "GGUF", which is neither of these expected values.

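To illustrate the kind of check being described, here is a minimal Go sketch (an assumption of the general shape, not ollama's actual ggml.go code): the first four bytes of the file are read as a little-endian uint32, and a check that only knows the two legacy values rejects a GGUF file.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"os"
)

// Magic values from the discussion above, as recovered when the first
// four bytes of the file are read as a little-endian uint32.
const (
	magicGGML uint32 = 0x67676d6c // legacy "ggml" container
	magicGGJT uint32 = 0x67676a74 // legacy "ggjt" container
	magicGGUF uint32 = 0x46554747 // the ASCII bytes "GGUF" read little-endian
)

func main() {
	f, err := os.Open("salamandra-2b_bf16.gguf")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	var magic uint32
	if err := binary.Read(f, binary.LittleEndian, &magic); err != nil {
		panic(err)
	}

	switch magic {
	case magicGGML, magicGGJT:
		fmt.Println("legacy ggml/ggjt container")
	case magicGGUF:
		fmt.Println("GGUF container")
	default:
		fmt.Println("Error: invalid file magic")
	}
}
```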

@rick-github commented on GitHub (Oct 13, 2024):

Ollama doesn't properly handle models with BF16 tensor types; I've submitted https://github.com/ollama/ollama/pull/7193 to address that.

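For context on why BF16 handling is a small fix: a bfloat16 value is just the upper sixteen bits of an IEEE-754 float32, so widening one to f32 is a single shift. A minimal illustration (my own sketch, not the PR's actual code):

```go
package main

import (
	"fmt"
	"math"
)

// bf16ToF32 widens a bfloat16 bit pattern to float32 by restoring the
// sixteen low-order mantissa bits as zeros.
func bf16ToF32(b uint16) float32 {
	return math.Float32frombits(uint32(b) << 16)
}

func main() {
	fmt.Println(bf16ToF32(0x3f80)) // 0x3f80 is 1.0 in bfloat16; prints 1
}
```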
Your Modelfile also needs to be adjusted to match your test with llama-cli. With the original TEMPLATE, the model usually responds in English:

$ diff -u Modelefile.orig Modelfile.salamandra-2b_bf16.gguf
--- Modelefile.orig	2024-10-13 19:48:42.530644497 +0200
+++ Modelfile.salamandra-2b_bf16.gguf	2024-10-13 19:50:16.945055692 +0200
@@ -4,7 +4,6 @@
 
 # Model Parameters
 PARAMETER num_ctx 8192
-PARAMETER rope_freq_base 10000.0
 PARAMETER top_p 0.95
 PARAMETER repeat_penalty 1.2
 
@@ -12,9 +11,7 @@
 SYSTEM """You are a multilingual assistant capable of understanding and responding in multiple languages. Adapt your responses to match the user's input language while providing clear, accurate, and concise information."""
 
 # Template
-TEMPLATE """{{ if .System }}<|im_start|>system
-{{ .System }}<|im_end|>{{ end }}{{ if .Prompt }}<|im_start|>user
-{{ .Prompt }}<|im_end|>{{ end }}<|im_start|>assistant"""
+TEMPLATE """{{ .Prompt }}"""
 
 # License
 LICENSE """
$ ollama create salamandra:2b_bf16 -f Modelfile.salamandra-2b_bf16.gguf 
transferring model data 100% 
using existing layer sha256:da1b0660e660a079a4d6c24989d69217c0814fe8e7e64123566b69d1c2fdbbf0 
using existing layer sha256:c3ba9be54b31eb6790caff873863db748ca96f138199b06887291bcb8bdebf70 
using existing layer sha256:b507b9c2f6ca642bffcd06665ea7c91f235fd32daeefdf875a0f938db05fb315 
using existing layer sha256:689957b165f2f55104a7d39410eee3802e2d8198dc5a132b94055bffc9025ec4 
using existing layer sha256:dcb1db86b8e15efc93f4cc65d57a9ad3be18bbbd1a82883bd94fdec990ab631c 
creating new layer sha256:2121edc63e38e5ff90b6cf5a3116ac07e74a0afb6ad5131a70c5e984ef6449f9 
writing manifest 
success 
$ curl -s localhost:11434/api/generate -d '{"model":"salamandra:2b_bf16","options":{"temperature":0.1,"num_predict":128,"top_k":40,"seed":892523417},"prompt":"hola, puedes decirme por qué el sol es amarillo?","stream":false}' | jq -r .response

¿Por qué la luz del sol se ve más clara en verano que en invierno?
La respuesta a esta pregunta está relacionada con las propiedades de los colores. Los colores son diferentes porque tienen una longitud de onda diferente y esto afecta su intensidad o brillo. En general, cuando hay mayor cantidad de energía luminosa (luz) en un lugar, la luz se ve más brillante. Esto es lo que ocurre durante el verano: debido a que hay mucha más radiación solar en ese momento del año, las personas perciben una mayor luminosidad y colorido natural.
¿Por qué los colores son diferentes?
Los colores dependen

The response is not quite the same as in your llama-cli test; there are some parameters (tfs_z, mirostat_lr, etc.) that may be affecting that.


@robbiemu commented on GitHub (Oct 13, 2024):

Thank you! Can't wait.

My tests had --temp 0.1, so that's probably the source of the variation :)


@pdevine commented on GitHub (Oct 15, 2024):

> The problem is that Ollama's ggml.go file is specifically checking for two magic numbers:
> 0x67676a74 ("ggjt" in ASCII)
> 0x67676d6c ("ggml" in ASCII)
> The file starts with "GGUF", which is neither of these expected values.

@robbiemu The GGJT and GGML file formats haven't really been used for a while, and I doubt either would work at this point. You can only import from Safetensors and GGUF right now. The magic # for GGUF is 0x46554747 (the ASCII bytes "GGUF" read as a little-endian uint32).

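A quick way to see where that number comes from (a minimal sketch, assuming the magic is read little-endian, consistent with the values quoted above):

```go
package main

import (
	"encoding/binary"
	"fmt"
)

func main() {
	// The file literally starts with the ASCII bytes "GGUF" (see the
	// xxd output above); reinterpreting them as a little-endian
	// uint32 gives the magic value.
	fmt.Printf("0x%08x\n", binary.LittleEndian.Uint32([]byte("GGUF"))) // 0x46554747
}
```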
I think you should be able to make this work just by changing the way you're running convert_hf_to_gguf.py: change --outtype bf16 to --outtype fp16.


@robbiemu commented on GitHub (Oct 15, 2024):

@pdevine yes? I'm wondering where those values are read from, then (the error message only appears in the repo in the code where I got them). Thank you for merging the PR :)

I didn't want to make that change because the result is then no longer an exact copy (so, even if only in the tiniest ways, lossy), and this is, after all, still a GGUF.

Can I just clone the repo now that the PR is merged and build my models early, or is it more complicated and I should wait for the next release?


@pdevine commented on GitHub (Oct 15, 2024):

@robbiemu I went ahead and built the 2b-instruct model. You can try it out with ollama run pdevine/salamandra:2b. I just built it from the safetensors weights on hf (https://huggingface.co/BSC-LT/salamandra-2b-instruct). The template was a bit of a pain but seems to work; you can see it at https://ollama.com/pdevine/salamandra/blobs/b69a90018566.

The steps to get this to work:

  1. clone the BSC-LT/salamandra repos
  2. move the tokenizer.json to something like tokenizer.json-old (it's getting picked up, but the protobuf tokenizer works fine)
  3. create a Modelfile in the directory with FROM . in it, and add the TEMPLATE for the instruct models (see the sketch below)
  4. ollama create salamandra:2b

I think you've probably seen it, but there are more instructions at https://github.com/ollama/ollama/blob/main/docs/import.md#Importing-a-model-from-Safetensors-weights.

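For reference, a minimal Modelfile for step 3 might look like the sketch below. The FROM . line is from the steps above; the TEMPLATE is copied from the instruct-style template shown earlier in this thread as an assumption of its general shape, not necessarily the exact template pdevine published:

```
FROM .
TEMPLATE """{{ if .System }}<|im_start|>system
{{ .System }}<|im_end|>{{ end }}{{ if .Prompt }}<|im_start|>user
{{ .Prompt }}<|im_end|>{{ end }}<|im_start|>assistant"""
```

With that in place, ollama create salamandra:2b -f Modelfile run from the model directory performs the import.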

@robbiemu commented on GitHub (Oct 15, 2024):

Where did that template come from?? :)

-- btw, I just finished the imatrix-based quants for the 2b-instruct too :) they are on huggingface

I have a question about that tokenizer that they haven't answered yet: the base model's tokenizer is twice the size of the one for the instruct fine-tune.


@pdevine commented on GitHub (Oct 15, 2024):

I just wrote the template myself. You can find the original template in tokenizer-config.json.


@robbiemu commented on GitHub (Oct 15, 2024):

I hadn't seen it because so far I've only looked at building the template for Ollama with the base model :) They didn't add the chat template since the base model has no chat training.
I can skip uploading the base models to Ollama if you think we're better served with just the instruct models.

When is the next release? I want to use these models, but I need that PR to do the same with my imatrix-based quantizations:

ollama create salamandra:2b-instruct_bf16 -f ./Modelfile
transferring model data 100%
Error: invalid file magic

@robbiemu commented on GitHub (Oct 15, 2024):

@pdevine one last question. On review of this conversation, this part of your reply still stood out to me:

  • move the tokenizer.json to something like tokenizer.json-old (it's getting picked up, but the protobuf tokenizer works fine)

I want to verify the basis for this choice. Is it because this puts lower memory demands on its use? Not because you faced any issues with the provided tokenizer, right? Would you recommend the protobuf one for any other reason?


@pdevine commented on GitHub (Oct 15, 2024):

So there are two tokenizer files in there: tokenizer.json and tokenizer.model. For some reason there is a bug reading through the tokenizer.json with the merges. I just used the tokenizer.model file instead, which is a protobuf version of essentially the same thing.

This applies specifically when you're converting from safetensors, not when pulling in a GGUF-based file; the GGUF file already contains the tokenizer data.
