[GH-ISSUE #9511] snowflake-arctic-embed2 causes GGML_ASSERT(ctx->kv[key_id].get_type() != GGUF_TYPE_STRING) failed #6199

Closed
opened 2026-04-12 17:35:06 -05:00 by GiteaMirror · 9 comments

Originally created by @jmorganca on GitHub (Mar 5, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/9511

Originally assigned to: @jmorganca on GitHub.

What is the issue?

GGUF parsing is broken for models such as snowflake-arctic-embed2 as of the most recent commit that updated the vendored llama.cpp code.

Logs:

2025-03-05 09:00:48 time=2025-03-05T07:00:48.651Z level=INFO source=runner.go:934 msg=system info="CPU : LLAMAFILE = 1 | CUDA : ARCHS = 500,600,610,700,750,800,860,870,890,900,1200 | USE_GRAPHS = 1 | PEER_MAX_BATCH_SIZE = 128 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | LLAMAFILE = 1 | cgo(gcc)" threads=10
2025-03-05 09:00:48 time=2025-03-05T07:00:48.672Z level=INFO source=runner.go:992 msg="Server listening on 127.0.0.1:37753"
2025-03-05 09:00:48 time=2025-03-05T07:00:48.743Z level=INFO source=server.go:591 msg="waiting for server to become available" status="llm server loading model"
2025-03-05 09:00:48 llama_model_load_from_file_impl: using device CUDA0 (NVIDIA GeForce RTX 4080 Laptop GPU) - 11047 MiB free
2025-03-05 09:00:48 llama_model_loader: loaded meta data with 34 key-value pairs and 389 tensors from /root/.ollama/models/blobs/sha256-8c625c9569c3c799f5f9595b5a141f91d224233055608189d66746347c14e613 (version GGUF V3 (latest))
2025-03-05 09:00:48 llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
2025-03-05 09:00:48 llama_model_loader: - kv   0:                       general.architecture str              = bert
2025-03-05 09:00:48 llama_model_loader: - kv   1:                               general.type str              = model
2025-03-05 09:00:48 llama_model_loader: - kv   2:                         general.size_label str              = 567M
2025-03-05 09:00:48 llama_model_loader: - kv   3:                            general.license str              = apache-2.0
2025-03-05 09:00:48 llama_model_loader: - kv   4:                               general.tags arr[str,8]       = ["sentence-transformers", "feature-ex...
2025-03-05 09:00:48 llama_model_loader: - kv   5:                          general.languages arr[str,74]      = ["af", "ar", "az", "be", "bg", "bn", ...
2025-03-05 09:00:48 llama_model_loader: - kv   6:                           bert.block_count u32              = 24
2025-03-05 09:00:48 llama_model_loader: - kv   7:                        bert.context_length u32              = 8192
2025-03-05 09:00:48 llama_model_loader: - kv   8:                      bert.embedding_length u32              = 1024
2025-03-05 09:00:48 llama_model_loader: - kv   9:                   bert.feed_forward_length u32              = 4096
2025-03-05 09:00:48 llama_model_loader: - kv  10:                  bert.attention.head_count u32              = 16
2025-03-05 09:00:48 llama_model_loader: - kv  11:          bert.attention.layer_norm_epsilon f32              = 0.000010
2025-03-05 09:00:48 llama_model_loader: - kv  12:                          general.file_type u32              = 1
2025-03-05 09:00:48 llama_model_loader: - kv  13:                      bert.attention.causal bool             = false
2025-03-05 09:00:48 llama_model_loader: - kv  14:                          bert.pooling_type u32              = 2
2025-03-05 09:00:48 llama_model_loader: - kv  15:                       tokenizer.ggml.model str              = t5
2025-03-05 09:00:48 llama_model_loader: - kv  16:                         tokenizer.ggml.pre str              = default
2025-03-05 09:00:48 llama_model_loader: - kv  17:                      tokenizer.ggml.tokens arr[str,250002]  = ["<s>", "<pad>", "</s>", "<unk>", ","...
2025-03-05 09:00:48 llama_model_loader: - kv  18:                  tokenizer.ggml.token_type arr[i32,250002]  = [3, 3, 3, 3, 1, 1, 1, 1, 1, 1, 1, 1, ...
2025-03-05 09:00:48 llama_model_loader: - kv  19:                      tokenizer.ggml.scores arr[f32,250002]  = [-10000.000000, -10000.000000, -10000...
2025-03-05 09:00:48 llama_model_loader: - kv  20:               tokenizer.ggml.add_bos_token bool             = true
2025-03-05 09:00:48 llama_model_loader: - kv  21:               tokenizer.ggml.add_eos_token bool             = true
2025-03-05 09:00:48 llama_model_loader: - kv  22:            tokenizer.ggml.token_type_count u32              = 1
2025-03-05 09:00:48 llama_model_loader: - kv  23:                tokenizer.ggml.bos_token_id u32              = 0
2025-03-05 09:00:48 llama_model_loader: - kv  24:                tokenizer.ggml.eos_token_id u32              = 2
2025-03-05 09:00:48 llama_model_loader: - kv  25:            tokenizer.ggml.unknown_token_id u32              = 3
2025-03-05 09:00:48 llama_model_loader: - kv  26:          tokenizer.ggml.seperator_token_id u32              = 2
2025-03-05 09:00:48 llama_model_loader: - kv  27:            tokenizer.ggml.padding_token_id u32              = 1
2025-03-05 09:00:48 llama_model_loader: - kv  28:                tokenizer.ggml.cls_token_id u32              = 0
2025-03-05 09:00:48 llama_model_loader: - kv  29:               tokenizer.ggml.mask_token_id u32              = 250001
2025-03-05 09:00:48 llama_model_loader: - kv  30:        tokenizer.ggml.precompiled_charsmap arr[str,316720]  = ["A", "L", "Q", "C", "A", "A", "C", "...
2025-03-05 09:00:48 llama_model_loader: - kv  31:    tokenizer.ggml.remove_extra_whitespaces bool             = true
2025-03-05 09:00:48 llama_model_loader: - kv  32:            tokenizer.ggml.add_space_prefix bool             = true
2025-03-05 09:00:48 llama_model_loader: - kv  33:               general.quantization_version u32              = 2
2025-03-05 09:00:48 llama_model_loader: - type  f32:  244 tensors
2025-03-05 09:00:48 llama_model_loader: - type  f16:  145 tensors
2025-03-05 09:00:48 print_info: file format = GGUF V3 (latest)
2025-03-05 09:00:48 print_info: file type   = F16
2025-03-05 09:00:48 print_info: file size   = 1.07 GiB (16.25 BPW) 
2025-03-05 09:00:48 gguf.cpp:780: GGML_ASSERT(ctx->kv[key_id].get_type() != GGUF_TYPE_STRING) failed

Relevant log output


OS

No response

GPU

No response

CPU

No response

Ollama version

No response

GiteaMirror added the bug label 2026-04-12 17:35:06 -05:00

@Fade78 commented on GitHub (Mar 6, 2025):

Got this too.

@Fade78 commented on GitHub (Mar 6, 2025):

Fixed in 0.6.0.

Old instructions:

How to roll back on Docker

While waiting for the fix, you can switch to the image sha256:fe6cfe49855d66ff55c88022c7e0c92071a00803aff4e6f9aef884829174477f (short ID fe6cfe49855d). You can run a container by passing the image ID instead of the image name. To check that you have this image, run docker image ls --no-trunc ollama/ollama or docker image ls ollama/ollama.
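Docker's default listings shorten an image ID to its first 12 hex digits. That is why the full sha256: ID and fe6cfe49855d refer to the same image, and why the --no-trunc flag is needed to see the long form. The tiny Go sketch below reproduces that truncation. The shortID name is hypothetical and not part of any Docker API.

```go
package main

import (
	"fmt"
	"strings"
)

// shortID mimics Docker's truncated display: drop the "sha256:" prefix and
// keep the first 12 hex characters.
func shortID(id string) string {
	hex := strings.TrimPrefix(id, "sha256:")
	if len(hex) > 12 {
		hex = hex[:12]
	}
	return hex
}

func main() {
	full := "sha256:fe6cfe49855d66ff55c88022c7e0c92071a00803aff4e6f9aef884829174477f"
	fmt.Println(shortID(full)) // fe6cfe49855d
}
```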

@Carbaz commented on GitHub (Mar 6, 2025):

ggml-cpu.c:8624: GGML_ASSERT(i01 >= 0 && i01 < ne01) failed

This happened when I tried using snowflake-arctic-embed instead of snowflake-arctic-embed2.
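This second assertion is a different check from the GGUF_TYPE_STRING one. In ggml's naming, ne01 is the size of dimension 1 (the row count) of the first source tensor, and i01 is an index into it. The condition therefore requires a row index that lies inside that tensor. A row gather such as an embedding lookup by token ID is a typical place for a check like this, but the thread does not say which tensor or index was involved.

The minimal Go sketch below implements a bounds-checked row gather. getRows is a hypothetical name, not a ggml function.

```go
package main

import "fmt"

// getRows gathers whole rows from a row-major nRows x nCols table, one per index,
// rejecting any index outside [0, nRows).
func getRows(table []float32, nRows, nCols int, ids []int32) ([][]float32, error) {
	out := make([][]float32, 0, len(ids))
	for _, id := range ids {
		i := int(id)
		if i < 0 || i >= nRows { // analogous to i01 >= 0 && i01 < ne01
			return nil, fmt.Errorf("row %d out of range [0, %d)", i, nRows)
		}
		out = append(out, table[i*nCols:(i+1)*nCols])
	}
	return out, nil
}

func main() {
	// Hypothetical 3-row "embedding table" with 2 values per row.
	table := []float32{0, 1, 10, 11, 20, 21}
	rows, err := getRows(table, 3, 2, []int32{2, 0})
	fmt.Println(rows, err) // [[20 21] [0 1]] <nil>
	_, err = getRows(table, 3, 2, []int32{3})
	fmt.Println(err) // row 3 out of range [0, 3)
}
```

ggml aborts the process on a failed assertion. The sketch returns an error instead, so the caller can see which index was out of range.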

@DangerousBerries commented on GitHub (Mar 8, 2025):

This still isn't fixed for me even with the latest version.

@thiswillbeyourgithub commented on GitHub (Mar 8, 2025):

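As of today, the issue is still present in the "latest" Docker image (this one, I believe: https://hub.docker.com/layers/ollama/ollama/latest/images/sha256-2853c8e31d8cf3f9c6a1e15ffd3d1085226026030378415907f91ea16a6c3c86). It disappears when I switch back to 0.5.12 via image: ollama/ollama:0.5.12. So far, using the 0.5.13 client with the 0.5.12 Docker container causes no issues.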
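Taken together, the reports in this thread place the regression between 0.5.12 (works) and 0.6.0 (reported fixed), with 0.5.13 affected. The small Go sketch below encodes that range, for example to decide whether a pinned image tag needs changing. The range comes only from these comments. The helper names are hypothetical, and pre-release suffixes such as -rc tags are not handled.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parse splits "MAJOR.MINOR.PATCH" into integers; pre-release suffixes are rejected.
func parse(v string) ([3]int, error) {
	var out [3]int
	parts := strings.Split(strings.TrimPrefix(v, "v"), ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("want MAJOR.MINOR.PATCH, got %q", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, err
		}
		out[i] = n
	}
	return out, nil
}

// less compares two parsed versions component by component.
func less(a, b [3]int) bool {
	for i := range a {
		if a[i] != b[i] {
			return a[i] < b[i]
		}
	}
	return false
}

// affected reports whether v falls in [0.5.13, 0.6.0), the range this thread
// reports as hitting the GGUF_TYPE_STRING assertion.
func affected(v string) (bool, error) {
	p, err := parse(v)
	if err != nil {
		return false, err
	}
	return !less(p, [3]int{0, 5, 13}) && less(p, [3]int{0, 6, 0}), nil
}

func main() {
	for _, v := range []string{"0.5.12", "0.5.13", "0.6.0"} {
		ok, err := affected(v)
		fmt.Println(v, ok, err)
	}
}
```

Floating tags such as "latest" cannot be classified this way. Pinning an explicit version tag, as in the comment above, keeps the check meaningful.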

@ChellShort commented on GitHub (Mar 11, 2025):

Downgrading to Ollama 0.5.12 resolved the issue for me.

@DangerousBerries commented on GitHub (Mar 12, 2025):

@jmorganca this should probably be reopened considering that people continue coming here with the same problem.

@juangon commented on GitHub (Mar 12, 2025):

@DangerousBerries, there was a new release a few hours ago that fixes it.

@DangerousBerries commented on GitHub (Mar 13, 2025):

> @DangerousBerries there was a new release few hours ago fixing it

You're right, thank you for letting me know.

Reference: github-starred/ollama#6199