[GH-ISSUE #8376] Ollama version doesn't properly truncate tokens to 512 max for official snowflake-arctic-embed-l model #67433

Open
opened 2026-05-04 10:19:48 -05:00 by GiteaMirror · 1 comment
Owner

Originally created by @shuaiscott on GitHub (Jan 10, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/8376

What is the issue?

When using the official Ollama model of snowflake-arctic-embed-l (latest/335m - 21ab8b9b0545), if input is greater than 512 tokens, instead of truncating, the model encounters an error.

On a previous version (0.3.9) when you pass it more than 512 tokens, it returns only [0,0,0...] embeddings.
In 0.5.4, Ollama returns a 500 error and the logs show that "Process xxxxxx (ollama_llama_se) of user xxx dumped core"

Logs:

```
llama_model_load: vocab only - skipping tensors
ggml-cpu.c:8400: GGML_ASSERT(i01 >= 0 && i01 < ne01) failed
ggml-cpu.c:8400: GGML_ASSERT(i01 >= 0 && i01 < ne01) failed
SIGSEGV: segmentation violation
PC=0x7fcc733ecc57 m=5 sigcode=1 addr=0x207203fe0
signal arrived during cgo execution
goroutine 8 gp=0xc0000f21c0 m=5 mp=0xc000100008 [syscall]:
runtime.cgocall(0x562b649d47d0, 0xc000073b90)
        runtime/cgocall.go:167
github.com/ollama/ollama/llama._Cfunc_llama_decode(0x7fcbf115bfa0, {0x2, 0x7fcbf0b80590, 0x0, 0x0, 0x7fcbf0b80da0, 0x7fcbf0b815b, 0x7fcbf0b81dc0, 0x7fcbf1144dc0})
...
```

I've checked my Ollama parameters, and this occurs when `"truncate": true`. Other embedding models properly truncate the input, and I see an INFO log in Ollama saying "input truncated". I don't see this message with snowflake-arctic-embed-l.

When `"truncate"` is set to false, I get the expected "input length exceeds maximum context length".
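For reference, the failing request shape can be sketched as below. This is only an illustration built around Ollama's `/api/embed` JSON body; the helper name `build_embed_request` and the placeholder input are my own, and nothing here sends a network request:

```python
import json

def build_embed_request(model, text, truncate=True, num_ctx=None):
    """Build the JSON body for a POST to Ollama's /api/embed endpoint.

    With truncate=True the server is expected to clip over-long input;
    with truncate=False it should instead return
    "input length exceeds maximum context length".
    """
    body = {"model": model, "input": text, "truncate": truncate}
    if num_ctx is not None:
        # Per-request option; setting this is the workaround discussed
        # in the comment below.
        body["options"] = {"num_ctx": num_ctx}
    return body

# A request of the kind that triggers the crash: well over 512 tokens
# with truncation enabled (the repeated word is an arbitrary placeholder).
payload = build_embed_request("snowflake-arctic-embed:l", "word " * 1000)
print(json.dumps(payload)[:60])
```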

https://ollama.com/library/snowflake-arctic-embed

OS

Linux

GPU

Nvidia

CPU

Intel

Ollama version

0.5.4

GiteaMirror added the bug label 2026-05-04 10:19:48 -05:00
Author
Owner

@rick-github commented on GitHub (Jan 10, 2025):

https://github.com/ollama/ollama/issues/7288

The problem can be worked around by setting `num_ctx` for the model to the actual context length of the model, rather than the default value of 2048 that Ollama uses. You can either do that by setting `num_ctx` in the API call (`"options":{"num_ctx":512}`) or by creating a copy of the model with the parameter:

```console
$ ollama show --modelfile snowflake-arctic-embed:l > Modelfile
$ echo PARAMETER num_ctx 512 >> Modelfile
$ ollama create snowflake-arctic-embed:l-c512
```

and then adjust the client to use `snowflake-arctic-embed:l-c512` instead of `snowflake-arctic-embed:l`.
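Until the server-side truncation is fixed, a client can also defend itself by clipping input before sending. A rough sketch (my own stopgap, not part of Ollama): whitespace word counts only approximate the model's real tokenizer, so clip with a safety margin:

```python
def clip_to_token_budget(text: str, max_tokens: int = 512, margin: int = 64) -> str:
    """Crudely pre-truncate text so an embed request stays under the
    model's 512-token context. Splitting on whitespace only approximates
    the tokenizer's count, hence the subtracted safety margin."""
    budget = max_tokens - margin
    words = text.split()
    if len(words) <= budget:
        return text
    return " ".join(words[:budget])

clipped = clip_to_token_budget("word " * 1000)
print(len(clipped.split()))  # 448 words: 512 minus the 64-word margin
```

This does not address the crash itself, but it keeps clients from tripping it while relying on `"truncate": true`.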

Reference: github-starred/ollama#67433