[GH-ISSUE #7679] The fine-tuned codegemma model exhibits abnormal performance #4900

Closed
opened 2026-04-12 15:56:57 -05:00 by GiteaMirror · 3 comments

Originally created by @TheSongg on GitHub (Nov 15, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/7679

What is the issue?

I downloaded the codegemma and codellama models from Hugging Face and fine-tuned them with LLaMA-Factory. After importing the fine-tuned models into Ollama, CodeLlama works normally, but CodeGemma behaves as if it never learned anything from the fine-tuning dataset. The same fine-tuned CodeGemma model works fine when loaded back into LLaMA-Factory. I have modified the Modelfile several times while creating the CodeGemma model, but it has had no effect. What could be the cause, and how can I resolve it? Thank you.
ollama: 0.4.1
llama factory: 0.8.3
codegemma: https://huggingface.co/google/codegemma-7b
codellama: https://huggingface.co/codellama/CodeLlama-7b-hf
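
For reference, the usual LLaMA-Factory flow merges the LoRA adapter into the base weights before exporting. Below is a minimal sketch of that merge step using PEFT; the adapter path is an illustrative placeholder, not the reporter's actual layout:

# Sketch: fold a PEFT LoRA adapter into the base model before exporting
# safetensors for Ollama. "path/to/adapter" is a hypothetical placeholder.
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

model = AutoPeftModelForCausalLM.from_pretrained("path/to/adapter")
merged = model.merge_and_unload()  # bake the LoRA deltas into the base weights
merged.save_pretrained("/usr/case", safe_serialization=True)
# Ship the tokenizer with the weights so `ollama create` picks it up too.
AutoTokenizer.from_pretrained("path/to/adapter").save_pretrained("/usr/case")

merge_and_unload() returns a plain transformers model, so the exported safetensors carry the fine-tuned behaviour on their own, with no runtime adapter needed.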

OS

Linux

GPU

Nvidia

CPU

Intel

Ollama version

0.4.1

GiteaMirror added the "needs more info" and "bug" labels 2026-04-12 15:56:57 -05:00

@pdevine commented on GitHub (Nov 15, 2024):

@TheSongg it's hard to say what went wrong without your Modelfile or the model itself. Maybe you could push the models to ollama.com? You can find out more information on how to do that in the docs: https://github.com/ollama/ollama/blob/main/docs/import.md#sharing-your-model-on-ollamacom


@TheSongg commented on GitHub (Nov 18, 2024):

I merged the fine-tuned weights into the base model (codegemma) and saved the result in safetensors format under /usr/case; I also tried converting the model to GGUF. Below is the Modelfile for the safetensors version.

FROM /usr/case
PARAMETER temperature 0.95
PARAMETER num_ctx 4096
PARAMETER top_p 0.7

TEMPLATE """
<start_of_turn>user
{{ if .System }}{{ .System }} {{ end }}{{ .Prompt }}<end_of_turn>
<start_of_turn>model
{{ .Response }}<end_of_turn>
"""


@TheSongg commented on GitHub (Nov 21, 2024):

time=2024-11-21T07:20:15.452Z level=INFO source=server.go:105 msg="system memory" total="503.2 GiB" free="450.3 GiB" free_swap="8.0 GiB"
time=2024-11-21T07:20:15.453Z level=INFO source=memory.go:343 msg="offload to cpu" layers.requested=-1 layers.model=29 layers.offload=0 layers.split="" memory.available="[450.3 GiB]" memory.gpu_overhead="0 B" memory.required.full="24.2 GiB" memory.required.partial="0 B" memory.required.kv="7.0 GiB" memory.required.allocations="[24.2 GiB]" memory.weights.total="21.4 GiB" memory.weights.repeating="20.0 GiB" memory.weights.nonrepeating="1.5 GiB" memory.graph.full="572.0 MiB" memory.graph.partial="1.1 GiB"
time=2024-11-21T07:20:15.454Z level=INFO source=server.go:383 msg="starting llama server" cmd="/tmp/ollama1497612674/runners/cpu_avx2/ollama_llama_server --model /home/ubuntu/.ollama/models/blobs/sha256-9831f8f5e23db7c3f529497724e829ac23b2eb4c0083394461b8e60cce4452e9 --ctx-size 16384 --batch-size 512 --lora /home/ubuntu/.ollama/models/blobs/sha256-07cd0a2c1e4cd3e1824d6b1002c27daf95f68cdc52963090a63d61bed00c24bd --threads 56 --no-mmap --parallel 4 --port 45157"
time=2024-11-21T07:20:15.454Z level=INFO source=sched.go:449 msg="loaded runners" count=1
time=2024-11-21T07:20:15.454Z level=INFO source=server.go:562 msg="waiting for llama runner to start responding"
time=2024-11-21T07:20:15.455Z level=INFO source=server.go:596 msg="waiting for server to become available" status="llm server error"
time=2024-11-21T07:20:15.465Z level=INFO source=runner.go:863 msg="starting go runner"
time=2024-11-21T07:20:15.465Z level=INFO source=runner.go:864 msg=system info="AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | RISCV_VECT = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | cgo(gcc)" threads=56
time=2024-11-21T07:20:15.466Z level=INFO source=.:0 msg="Server listening on 127.0.0.1:45157"
llama_model_loader: loaded meta data with 32 key-value pairs and 254 tensors from /home/ubuntu/.ollama/models/blobs/sha256-9831f8f5e23db7c3f529497724e829ac23b2eb4c0083394461b8e60cce4452e9 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = gemma
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.name str = Codegemma
llama_model_loader: - kv 3: general.size_label str = 8.5B
llama_model_loader: - kv 4: general.license str = gemma
llama_model_loader: - kv 5: general.license.link str = https://ai.google.dev/gemma/terms
llama_model_loader: - kv 6: general.tags arr[str,1] = ["text-generation"]
llama_model_loader: - kv 7: gemma.context_length u32 = 8192
llama_model_loader: - kv 8: gemma.embedding_length u32 = 3072
llama_model_loader: - kv 9: gemma.block_count u32 = 28
llama_model_loader: - kv 10: gemma.feed_forward_length u32 = 24576
llama_model_loader: - kv 11: gemma.attention.head_count u32 = 16
llama_model_loader: - kv 12: gemma.attention.head_count_kv u32 = 16
llama_model_loader: - kv 13: gemma.attention.layer_norm_rms_epsilon f32 = 0.000001
llama_model_loader: - kv 14: gemma.attention.key_length u32 = 256
llama_model_loader: - kv 15: gemma.attention.value_length u32 = 256
llama_model_loader: - kv 16: general.file_type u32 = 1
llama_model_loader: - kv 17: tokenizer.ggml.model str = llama
llama_model_loader: - kv 18: tokenizer.ggml.pre str = default
llama_model_loader: - kv 19: tokenizer.ggml.tokens arr[str,256000] = ["<pad>", "<eos>", "<bos>", "<unk>", ...
time=2024-11-21T07:20:15.707Z level=INFO source=server.go:596 msg="waiting for server to become available" status="llm server loading model"
llama_model_loader: - kv 20: tokenizer.ggml.scores arr[f32,256000] = [-1000.000000, -1000.000000, -1000.00...
llama_model_loader: - kv 21: tokenizer.ggml.token_type arr[i32,256000] = [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, ...
llama_model_loader: - kv 22: tokenizer.ggml.bos_token_id u32 = 2
llama_model_loader: - kv 23: tokenizer.ggml.eos_token_id u32 = 1
llama_model_loader: - kv 24: tokenizer.ggml.unknown_token_id u32 = 3
llama_model_loader: - kv 25: tokenizer.ggml.padding_token_id u32 = 0
llama_model_loader: - kv 26: tokenizer.ggml.add_bos_token bool = true
llama_model_loader: - kv 27: tokenizer.ggml.add_eos_token bool = false
llama_model_loader: - kv 28: tokenizer.chat_template str = {{ bos_token }}{% if messages[0]['rol...
llama_model_loader: - kv 29: tokenizer.ggml.eot_token_id u32 = 107
llama_model_loader: - kv 30: tokenizer.ggml.add_space_prefix bool = false
llama_model_loader: - kv 31: general.quantization_version u32 = 2
llama_model_loader: - type f32: 57 tensors
llama_model_loader: - type f16: 197 tensors
llm_load_vocab: control-looking token: '<end_of_turn>' was not control-type; this is probably a bug in the model. its type will be overridden
llm_load_vocab: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
llm_load_vocab: special tokens cache size = 249
llm_load_vocab: token to piece cache size = 1.6014 MB
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = gemma
llm_load_print_meta: vocab type = SPM
llm_load_print_meta: n_vocab = 256000
llm_load_print_meta: n_merges = 0
llm_load_print_meta: vocab_only = 0
llm_load_print_meta: n_ctx_train = 8192
llm_load_print_meta: n_embd = 3072
llm_load_print_meta: n_layer = 28
llm_load_print_meta: n_head = 16
llm_load_print_meta: n_head_kv = 16
llm_load_print_meta: n_rot = 256
llm_load_print_meta: n_swa = 0
llm_load_print_meta: n_embd_head_k = 256
llm_load_print_meta: n_embd_head_v = 256
llm_load_print_meta: n_gqa = 1
llm_load_print_meta: n_embd_k_gqa = 4096
llm_load_print_meta: n_embd_v_gqa = 4096
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-06
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale = 0.0e+00
llm_load_print_meta: n_ff = 24576
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: causal attn = 1
llm_load_print_meta: pooling type = 0
llm_load_print_meta: rope type = 2
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn = 8192
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 0
llm_load_print_meta: ssm_d_inner = 0
llm_load_print_meta: ssm_d_state = 0
llm_load_print_meta: ssm_dt_rank = 0
llm_load_print_meta: ssm_dt_b_c_rms = 0
llm_load_print_meta: model type = 7B
llm_load_print_meta: model ftype = F16
llm_load_print_meta: model params = 8.54 B
llm_load_print_meta: model size = 15.90 GiB (16.00 BPW)
llm_load_print_meta: general.name = Codegemma
llm_load_print_meta: BOS token = 2 '<bos>'
llm_load_print_meta: EOS token = 1 '<eos>'
llm_load_print_meta: UNK token = 3 '<unk>'
llm_load_print_meta: PAD token = 0 '<pad>'
llm_load_print_meta: LF token = 227 '<0x0A>'
llm_load_print_meta: PRE token = 67 '<|fim_prefix|>'
llm_load_print_meta: SUF token = 69 '<|fim_suffix|>'
llm_load_print_meta: MID token = 68 '<|fim_middle|>'
llm_load_print_meta: EOT token = 107 '<end_of_turn>'
llm_load_print_meta: EOG token = 1 '<eos>'
llm_load_print_meta: EOG token = 107 '<end_of_turn>'
llm_load_print_meta: max token length = 48
llm_load_tensors: ggml ctx size = 0.12 MiB
llm_load_tensors: CPU buffer size = 17784.67 MiB
llama_new_context_with_model: n_ctx = 16384
llama_new_context_with_model: n_batch = 2048
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: CPU KV buffer size = 7168.00 MiB
llama_new_context_with_model: KV self size = 7168.00 MiB, K (f16): 3584.00 MiB, V (f16): 3584.00 MiB
llama_new_context_with_model: CPU output buffer size = 3.95 MiB
llama_new_context_with_model: CPU compute buffer size = 572.01 MiB
llama_new_context_with_model: graph nodes = 931
llama_new_context_with_model: graph splits = 1
llama_lora_adapter_init_internal: loading lora adapter from '/home/ubuntu/.ollama/models/blobs/sha256-07cd0a2c1e4cd3e1824d6b1002c27daf95f68cdc52963090a63d61bed00c24bd' ...
llama_lora_adapter_init_internal: CPU LoRA buffer size = 95.38 MiB
llama_lora_adapter_init_internal: loaded 392 tensors from lora file
time=2024-11-21T07:20:30.755Z level=INFO source=server.go:601 msg="llama runner started in 15.30 seconds"
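
Two things stand out in this log: the model runs fully on CPU (layers.offload=0), and Ollama loads a separate LoRA blob (the --lora argument, with llama_lora_adapter_init_internal reporting 392 tensors), meaning the adapter is applied at load time rather than having been merged into the weights. To separate template problems from weight problems, the model can be queried in raw mode, which bypasses the Modelfile TEMPLATE entirely. A minimal sketch against the default local endpoint, with a hypothetical model name:

# Sketch: query Ollama with raw=True so the Modelfile TEMPLATE is skipped
# and hand-written Gemma turn markers are sent verbatim.
import json
import urllib.request

payload = {
    "model": "codegemma-ft",  # hypothetical name for the imported model
    "prompt": ("<start_of_turn>user\n"
               "Write hello world in Python.<end_of_turn>\n"
               "<start_of_turn>model\n"),
    "raw": True,      # bypass template processing
    "stream": False,
}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])

If prompts formatted exactly like the fine-tuning data behave correctly in raw mode but not through the normal chat path, the TEMPLATE rather than the weights is the likely culprit.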
