[GH-ISSUE #10685] v0.7.0 crashes when using snowflake-arctic-embed2 #32783

Closed
opened 2026-04-22 14:37:08 -05:00 by GiteaMirror · 6 comments

Originally created by @Blumlaut on GitHub (May 13, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/10685

Originally assigned to: @jmorganca on GitHub.

What is the issue?

When using Ollama 0.7.0-rc0 with `qwen3:8b` and Open WebUI knowledge (with `snowflake-arctic-embed2` as the embedding model), Ollama crashes and restarts. It works fine when knowledge is not used.

0.6.9 works perfectly fine with the same parameters.

Relevant log output

https://pastebin.com/ike4qdHF

OS

Docker (Debian Bullseye in a Proxmox KVM)

GPU

Nvidia RTX 3060 12GB LHR (Driver Version 535.216.03)

CPU

AMD Ryzen 5 5600G

Ollama version

v0.7.0-rc0

GiteaMirror added the bug label 2026-04-22 14:37:08 -05:00

@rick-github commented on GitHub (May 13, 2025):

Seems to be a problem with snowflake-arctic-embed2:latest. None of the other embedding models in my library are affected.

```
llama-vocab.cpp:1472: GGML_ASSERT(pc_type == GGUF_TYPE_INT8 || pc_type == GGUF_TYPE_UINT8) failed
/bin/ollama(+0x1057118)[0x5584c1e66118]
/bin/ollama(ggml_abort+0x136)[0x5584c1e66436]
/bin/ollama(_ZN11llama_vocab4impl4loadER18llama_model_loaderRK6LLM_KV+0x2cf3)[0x5584c1f73dd3]
/bin/ollama(_ZN11llama_model10load_vocabER18llama_model_loader+0x30)[0x5584c1f0b1c0]
/bin/ollama(+0x116c96a)[0x5584c1f7b96a]
/bin/ollama(llama_model_load_from_file+0xa2)[0x5584c1f7c152]
/bin/ollama(_cgo_e5c1b5ac37b2_Cfunc_llama_model_load_from_file+0x4b)[0x5584c1e4043b]
/bin/ollama(+0x3a2461)[0x5584c11b1461]
SIGABRT: abort
PC=0x7fe1de65d00b m=11 sigcode=18446744073709551610
signal arrived during cgo execution

goroutine 23 gp=0xc000103340 m=11 mp=0xc000680008 [syscall]:
runtime.cgocall(0x5584c1e403f0, 0xc000698a10)
	runtime/cgocall.go:167 +0x4b fp=0xc0006989e8 sp=0xc0006989b0 pc=0x5584c11a6dab
github.com/ollama/ollama/llama._Cfunc_llama_model_load_from_file(0x7fe1480c11f0, {0x0, 0x0, 0x0, 0x1, 0x0, 0x0, 0x0, 0x0, 0x0, ...})
```
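The failing assertion says llama.cpp will only accept the `tokenizer.ggml.precompiled_charsmap` metadata value if its GGUF element type is INT8 or UINT8. A minimal Python sketch of that check (type codes are from the GGUF specification; the helper function is illustrative, not part of llama.cpp's API):

```python
# GGUF metadata value-type codes (subset, per the GGUF specification)
GGUF_TYPE_UINT8 = 0
GGUF_TYPE_INT8 = 1
GGUF_TYPE_STRING = 8


def check_precompiled_charsmap_type(pc_type: int) -> None:
    """Mirror the llama.cpp assert: only (U)INT8 data is accepted."""
    if pc_type not in (GGUF_TYPE_UINT8, GGUF_TYPE_INT8):
        # In llama.cpp this is GGML_ABORT, which raises SIGABRT and
        # takes down the whole server process, as seen in the log above.
        raise AssertionError(
            "GGML_ASSERT(pc_type == GGUF_TYPE_INT8 || "
            "pc_type == GGUF_TYPE_UINT8) failed"
        )


check_precompiled_charsmap_type(GGUF_TYPE_UINT8)       # passes
try:
    check_precompiled_charsmap_type(GGUF_TYPE_STRING)  # the crashing case
except AssertionError as e:
    print("abort:", e)
```

Because the check lives in model loading, any request that causes the embedding model to load (such as an Open WebUI knowledge query) triggers the abort.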

@Blumlaut commented on GitHub (May 13, 2025):

> Seems to be a problem with snowflake-arctic-embed2:latest. None of the other embedding models in my library are affected.

Good to know! I didn't have time to test a different model, but it seems you're right; I updated the issue title accordingly.


@rick-github commented on GitHub (May 13, 2025):

Appears to be a result of https://github.com/ggml-org/llama.cpp/commit/10d2af0eaa0aafd7c6577b279dfa5221ff44a63f in llama.cpp, where an assert was added for the type of `tokenizer.ggml.precompiled_charsmap`:

https://github.com/ollama/ollama/blob/c7f4ae7b9c8976b4d50c59eb87e9582ea9c5c82f/llama/llama.cpp/src/llama-vocab.cpp#L1469-L1472

The problem seems to be that in snowflake-arctic-embed2, this is declared as [STRING]:

```console
$ dumpgguf --no-tensors /root/.ollama/models/blobs/sha256-8c625c9569c3c799f5f9595b5a141f91d224233055608189d66746347c14e613 | grep tokenizer.ggml.precompiled_charsmap
     34: [STRING]   |   316720 | tokenizer.ggml.precompiled_charsmap = ['A', 'L', 'Q', 'C', 'A', 'A', ...]
```

whereas in other models it's [UINT8], e.g. granite-embedding:278m:

```console
$ dumpgguf --no-tensors /root/.ollama/models/blobs/sha256-a658c903bf83a9d84fe1b9869874229ddfdfc1c88b04f958f17dd24e243d262c | grep tokenizer.ggml.precompiled_charsmap
     30: [UINT8]    |   237539 | tokenizer.ggml.precompiled_charsmap = [0, 180, 2, 0, 0, 132, ...]
```
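For anyone without a `dumpgguf`-style tool at hand, the declared type of a metadata key can be read straight from the GGUF header. Below is a hand-rolled sketch based on the published GGUF layout (magic, version, tensor count, KV count, then typed key/value pairs); it is not Ollama's or llama.cpp's own tooling, and it only reports types, skipping over values:

```python
import struct

# GGUF metadata value-type codes (per the GGUF specification)
GGUF_TYPE_NAMES = {0: "UINT8", 1: "INT8", 2: "UINT16", 3: "INT16",
                   4: "UINT32", 5: "INT32", 6: "FLOAT32", 7: "BOOL",
                   8: "STRING", 9: "ARRAY", 10: "UINT64", 11: "INT64",
                   12: "FLOAT64"}
SCALAR_SIZE = {0: 1, 1: 1, 2: 2, 3: 2, 4: 4, 5: 4, 6: 4, 7: 1,
               10: 8, 11: 8, 12: 8}


def read_metadata_types(data: bytes) -> dict:
    """Return {key: declared type} for every metadata KV in a GGUF blob."""
    assert data[:4] == b"GGUF", "not a GGUF file"
    version, n_tensors, n_kv = struct.unpack_from("<IQQ", data, 4)
    off = 4 + 4 + 8 + 8  # magic + version + tensor count + KV count

    def read_string():
        nonlocal off
        (n,) = struct.unpack_from("<Q", data, off); off += 8
        s = data[off:off + n].decode("utf-8"); off += n
        return s

    def skip_value(vtype):
        nonlocal off
        if vtype == 8:                 # STRING: length-prefixed bytes
            read_string()
        elif vtype == 9:               # ARRAY: element type, count, elements
            etype, count = struct.unpack_from("<IQ", data, off); off += 12
            for _ in range(count):
                skip_value(etype)
        else:                          # fixed-size scalar
            off += SCALAR_SIZE[vtype]

    out = {}
    for _ in range(n_kv):
        key = read_string()
        (vtype,) = struct.unpack_from("<I", data, off); off += 4
        if vtype == 9:
            # Peek at the element type so arrays print like "[UINT8]" above
            (etype,) = struct.unpack_from("<I", data, off)
            out[key] = f"[{GGUF_TYPE_NAMES[etype]}]"
        else:
            out[key] = GGUF_TYPE_NAMES[vtype]
        skip_value(vtype)
    return out
```

Run against the two blobs above, a reader like this would show the type mismatch directly: the snowflake-arctic-embed2 charsmap is declared as string data, while granite-embedding's is a `[UINT8]` array, which is the only form the new llama.cpp assert accepts.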

@jmorganca commented on GitHub (May 13, 2025):

Closing for https://github.com/ollama/ollama/issues/10685


@ProjectMoon commented on GitHub (May 13, 2025):

Err, isn't this #10685? Lol.


@jmorganca commented on GitHub (May 13, 2025):

Oops!

Reference: github-starred/ollama#32783