[GH-ISSUE #9499] Embedding model fails with SIGSEGV error #6188

Closed
opened 2026-04-12 17:33:25 -05:00 by GiteaMirror · 3 comments
Owner

Originally created by @ProDG on GitHub (Mar 4, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/9499

What is the issue?

I ran into an issue when trying to build embeddings from a text containing a lot of meaningless punctuation (used for the visual formatting of a table of contents, I believe).

I'll provide code based on LangChain, but the problem is definitely not in LangChain itself, as you can see from the attached logs.

Code to reproduce the issue:

from langchain_ollama import OllamaEmbeddings

embeddings_model = OllamaEmbeddings(
    base_url=settings.OLLAMA_URL,  # URL of the Ollama server
    model='jeffh/intfloat-multilingual-e5-large-instruct:f32',
)

print(embeddings_model.embed_query("Hello. How are you? Fine. " * 50))  # this works without any issues
print(embeddings_model.embed_query("Hello . . . . . . . . . . " * 50))  # this fails
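To take LangChain out of the picture entirely, the same crash should be reproducible by calling Ollama's HTTP API directly. A minimal sketch (assuming the documented `POST /api/embed` endpoint and the default address `http://localhost:11434`; adjust for your setup):

```python
# Direct-API repro sketch, bypassing LangChain entirely.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumption: default Ollama address
MODEL = "jeffh/intfloat-multilingual-e5-large-instruct:f32"

def build_embed_payload(text: str, model: str = MODEL) -> dict:
    """Request body for Ollama's embed endpoint."""
    return {"model": model, "input": text}

def embed(text: str) -> list:
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/embed",
        data=json.dumps(build_embed_payload(text)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["embeddings"]

if __name__ == "__main__":
    embed("Hello. How are you? Fine. " * 50)  # completes normally
    embed("Hello . . . . . . . . . . " * 50)  # runner dies with SIGSEGV
```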

I also tried other quantizations of the model:

  • jeffh/intfloat-multilingual-e5-large-instruct:f16
  • jeffh/intfloat-multilingual-e5-large-instruct:q8_0

And a different model:

  • zylonai/multilingual-e5-large:latest

Results were pretty much the same.

On the same setup, tens of megabytes of other texts were processed successfully.

The attached logs are from the Ollama Docker container. It runs in a VM with a GPU (an RTX 3090 with 24 GB of VRAM), and there were no other issues with this setup.

If needed, I can provide the exact text on which I hit the issue (a text chunk retrieved via Unstructured from a book, 972 bytes long if I remember correctly).
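As a stop-gap while this is being investigated, I'm collapsing long punctuation runs before embedding. A rough sketch (the regex and the cutoff of four dots are arbitrary choices on my side, not part of any fix):

```python
import re

def sanitize_for_embedding(text: str) -> str:
    # Collapse runs of 4+ dots (optionally space-separated, like the
    # ". . . . ." leaders in a table of contents) into a single ellipsis.
    return re.sub(r"(?:\.\s*){4,}", "... ", text).strip()
```

With this applied, the crashing input above embeds without triggering the assert, though it obviously only papers over the underlying bug.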

Relevant log output

time=2025-03-04T15:02:05.084Z level=WARN source=types.go:512 msg="invalid option provided" option=tfs_z
time=2025-03-04T15:02:10.136Z level=WARN source=sched.go:647 msg="gpu VRAM usage didn't recover within timeout" seconds=5.051770335 model=/root/.ollama/models/blobs/sha256-136c89cb1c6ea901df358fe576ee9cd7501daa0f72e28526eca929e2bbeeb4ed
time=2025-03-04T15:02:10.255Z level=INFO source=sched.go:508 msg="updated VRAM based on existing loaded models" gpu=GPU-d8622296-6d17-435e-57df-9631b117f22e library=cuda total="23.7 GiB" available="6.7 GiB"
time=2025-03-04T15:02:10.255Z level=WARN source=ggml.go:132 msg="key not found" key=bert.attention.head_count_kv default=1
time=2025-03-04T15:02:10.255Z level=WARN source=ggml.go:132 msg="key not found" key=bert.attention.key_length default=64
time=2025-03-04T15:02:10.255Z level=WARN source=ggml.go:132 msg="key not found" key=bert.attention.value_length default=64
time=2025-03-04T15:02:10.255Z level=WARN source=ggml.go:132 msg="key not found" key=bert.attention.head_count_kv default=1
time=2025-03-04T15:02:10.255Z level=INFO source=sched.go:715 msg="new model will fit in available VRAM in single GPU, loading" model=/root/.ollama/models/blobs/sha256-136c89cb1c6ea901df358fe576ee9cd7501daa0f72e28526eca929e2bbeeb4ed gpu=GPU-d8622296-6d17-435e-57df-9631b117f22e parallel=1 available=7182680064 required="2.6 GiB"
time=2025-03-04T15:02:10.297Z level=INFO source=server.go:97 msg="system memory" total="47.0 GiB" free="36.7 GiB" free_swap="2.7 MiB"
time=2025-03-04T15:02:10.297Z level=WARN source=ggml.go:132 msg="key not found" key=bert.attention.head_count_kv default=1
time=2025-03-04T15:02:10.297Z level=WARN source=ggml.go:132 msg="key not found" key=bert.attention.key_length default=64
time=2025-03-04T15:02:10.297Z level=WARN source=ggml.go:132 msg="key not found" key=bert.attention.value_length default=64
time=2025-03-04T15:02:10.297Z level=WARN source=ggml.go:132 msg="key not found" key=bert.attention.head_count_kv default=1
time=2025-03-04T15:02:10.297Z level=INFO source=server.go:130 msg=offload library=cuda layers.requested=-1 layers.model=25 layers.offload=25 layers.split="" memory.available="[6.7 GiB]" memory.gpu_overhead="0 B" memory.required.full="2.6 GiB" memory.required.partial="2.6 GiB" memory.required.kv="12.0 MiB" memory.required.allocations="[2.6 GiB]" memory.weights.total="1.1 GiB" memory.weights.repeating="188.6 MiB" memory.weights.nonrepeating="976.6 MiB" memory.graph.full="32.0 MiB" memory.graph.partial="32.0 MiB"
time=2025-03-04T15:02:10.297Z level=INFO source=server.go:380 msg="starting llama server" cmd="/usr/bin/ollama runner --model /root/.ollama/models/blobs/sha256-136c89cb1c6ea901df358fe576ee9cd7501daa0f72e28526eca929e2bbeeb4ed --ctx-size 2048 --batch-size 512 --n-gpu-layers 25 --threads 8 --parallel 1 --port 40671"
time=2025-03-04T15:02:10.298Z level=INFO source=sched.go:450 msg="loaded runners" count=3
time=2025-03-04T15:02:10.298Z level=INFO source=server.go:557 msg="waiting for llama runner to start responding"
time=2025-03-04T15:02:10.298Z level=INFO source=server.go:591 msg="waiting for server to become available" status="llm server error"
time=2025-03-04T15:02:10.308Z level=INFO source=runner.go:932 msg="starting go runner"
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
load_backend: loaded CUDA backend from /usr/lib/ollama/cuda_v12/libggml-cuda.so
load_backend: loaded CPU backend from /usr/lib/ollama/libggml-cpu-alderlake.so
time=2025-03-04T15:02:10.334Z level=INFO source=runner.go:935 msg=system info="CPU : LLAMAFILE = 1 | CPU : LLAMAFILE = 1 | CUDA : ARCHS = 600,610,620,700,720,750,800,860,870,890,900 | USE_GRAPHS = 1 | PEER_MAX_BATCH_SIZE = 128 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | LLAMAFILE = 1 | cgo(gcc)" threads=8
time=2025-03-04T15:02:10.334Z level=INFO source=runner.go:993 msg="Server listening on 127.0.0.1:40671"
llama_load_model_from_file: using device CUDA0 (NVIDIA GeForce RTX 3090) - 6849 MiB free
llama_model_loader: loaded meta data with 37 key-value pairs and 389 tensors from /root/.ollama/models/blobs/sha256-136c89cb1c6ea901df358fe576ee9cd7501daa0f72e28526eca929e2bbeeb4ed (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = bert
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = multilingual-e5-large-instruct
llama_model_loader: - kv   3:                       general.organization str              = Tmp
llama_model_loader: - kv   4:                           general.finetune str              = instruct
llama_model_loader: - kv   5:                           general.basename str              = intfloat-multilingual-e5
llama_model_loader: - kv   6:                         general.size_label str              = large
llama_model_loader: - kv   7:                            general.license str              = mit
llama_model_loader: - kv   8:                               general.tags arr[str,3]       = ["mteb", "sentence-transformers", "tr...
llama_model_loader: - kv   9:                          general.languages arr[str,94]      = ["multilingual", "af", "am", "ar", "a...
llama_model_loader: - kv  10:                           bert.block_count u32              = 24
llama_model_loader: - kv  11:                        bert.context_length u32              = 512
llama_model_loader: - kv  12:                      bert.embedding_length u32              = 1024
llama_model_loader: - kv  13:                   bert.feed_forward_length u32              = 4096
llama_model_loader: - kv  14:                  bert.attention.head_count u32              = 16
llama_model_loader: - kv  15:          bert.attention.layer_norm_epsilon f32              = 0.000010
llama_model_loader: - kv  16:                          general.file_type u32              = 0
llama_model_loader: - kv  17:                      bert.attention.causal bool             = false
llama_model_loader: - kv  18:                          bert.pooling_type u32              = 1
llama_model_loader: - kv  19:                       tokenizer.ggml.model str              = t5
llama_model_loader: - kv  20:                         tokenizer.ggml.pre str              = default
time=2025-03-04T15:02:10.386Z level=WARN source=sched.go:647 msg="gpu VRAM usage didn't recover within timeout" seconds=5.3018933520000004 model=/root/.ollama/models/blobs/sha256-136c89cb1c6ea901df358fe576ee9cd7501daa0f72e28526eca929e2bbeeb4ed
llama_model_loader: - kv  21:                      tokenizer.ggml.tokens arr[str,250002]  = ["<s>", "<pad>", "</s>", "<unk>", ","...
llama_model_loader: - kv  22:                      tokenizer.ggml.scores arr[f32,250002]  = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv  23:                  tokenizer.ggml.token_type arr[i32,250002]  = [3, 3, 3, 2, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  24:            tokenizer.ggml.add_space_prefix bool             = true
llama_model_loader: - kv  25:            tokenizer.ggml.token_type_count u32              = 1
llama_model_loader: - kv  26:    tokenizer.ggml.remove_extra_whitespaces bool             = true
llama_model_loader: - kv  27:        tokenizer.ggml.precompiled_charsmap arr[u8,237539]   = [0, 180, 2, 0, 0, 132, 0, 0, 0, 0, 0,...
llama_model_loader: - kv  28:                tokenizer.ggml.bos_token_id u32              = 0
llama_model_loader: - kv  29:                tokenizer.ggml.eos_token_id u32              = 2
llama_model_loader: - kv  30:            tokenizer.ggml.unknown_token_id u32              = 3
llama_model_loader: - kv  31:          tokenizer.ggml.seperator_token_id u32              = 2
llama_model_loader: - kv  32:            tokenizer.ggml.padding_token_id u32              = 1
llama_model_loader: - kv  33:               tokenizer.ggml.mask_token_id u32              = 250001
llama_model_loader: - kv  34:               tokenizer.ggml.add_bos_token bool             = true
llama_model_loader: - kv  35:               tokenizer.ggml.add_eos_token bool             = true
llama_model_loader: - kv  36:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:  389 tensors
time=2025-03-04T15:02:10.549Z level=INFO source=server.go:591 msg="waiting for server to become available" status="llm server loading model"
llm_load_vocab: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
llm_load_vocab: special tokens cache size = 4
llm_load_vocab: token to piece cache size = 2.1668 MB
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = bert
llm_load_print_meta: vocab type       = UGM
llm_load_print_meta: n_vocab          = 250002
llm_load_print_meta: n_merges         = 0
llm_load_print_meta: vocab_only       = 0
llm_load_print_meta: n_ctx_train      = 512
llm_load_print_meta: n_embd           = 1024
llm_load_print_meta: n_layer          = 24
llm_load_print_meta: n_head           = 16
llm_load_print_meta: n_head_kv        = 16
llm_load_print_meta: n_rot            = 64
llm_load_print_meta: n_swa            = 0
llm_load_print_meta: n_embd_head_k    = 64
llm_load_print_meta: n_embd_head_v    = 64
llm_load_print_meta: n_gqa            = 1
llm_load_print_meta: n_embd_k_gqa     = 1024
llm_load_print_meta: n_embd_v_gqa     = 1024
llm_load_print_meta: f_norm_eps       = 1.0e-05
llm_load_print_meta: f_norm_rms_eps   = 0.0e+00
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale    = 0.0e+00
llm_load_print_meta: n_ff             = 4096
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: causal attn      = 0
llm_load_print_meta: pooling type     = 1
llm_load_print_meta: rope type        = 2
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn  = 512
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: ssm_d_conv       = 0
llm_load_print_meta: ssm_d_inner      = 0
llm_load_print_meta: ssm_d_state      = 0
llm_load_print_meta: ssm_dt_rank      = 0
llm_load_print_meta: ssm_dt_b_c_rms   = 0
llm_load_print_meta: model type       = 335M
llm_load_print_meta: model ftype      = all F32
llm_load_print_meta: model params     = 558.84 M
llm_load_print_meta: model size       = 2.08 GiB (32.00 BPW) 
llm_load_print_meta: general.name     = multilingual-e5-large-instruct
llm_load_print_meta: BOS token        = 0 '<s>'
llm_load_print_meta: EOS token        = 2 '</s>'
llm_load_print_meta: UNK token        = 3 '<unk>'
llm_load_print_meta: SEP token        = 2 '</s>'
llm_load_print_meta: PAD token        = 1 '<pad>'
llm_load_print_meta: MASK token       = 250001 '[PAD250000]'
llm_load_print_meta: LF token         = 6 '▁'
llm_load_print_meta: EOG token        = 2 '</s>'
llm_load_print_meta: max token length = 48
llm_load_tensors: offloading 24 repeating layers to GPU
llm_load_tensors: offloading output layer to GPU
llm_load_tensors: offloaded 25/25 layers to GPU
llm_load_tensors:        CUDA0 model buffer size =  1153.22 MiB
llm_load_tensors:   CPU_Mapped model buffer size =   978.58 MiB
llama_new_context_with_model: n_seq_max     = 1
llama_new_context_with_model: n_ctx         = 2048
llama_new_context_with_model: n_ctx_per_seq = 2048
llama_new_context_with_model: n_batch       = 512
llama_new_context_with_model: n_ubatch      = 512
llama_new_context_with_model: flash_attn    = 0
llama_new_context_with_model: freq_base     = 10000.0
llama_new_context_with_model: freq_scale    = 1
llama_new_context_with_model: n_ctx_pre_seq (2048) > n_ctx_train (512) -- possible training context overflow
llama_kv_cache_init: kv_size = 2048, offload = 1, type_k = 'f16', type_v = 'f16', n_layer = 24, can_shift = 1
llama_kv_cache_init:      CUDA0 KV buffer size =   192.00 MiB
llama_new_context_with_model: KV self size  =  192.00 MiB, K (f16):   96.00 MiB, V (f16):   96.00 MiB
llama_new_context_with_model:  CUDA_Host  output buffer size =     0.00 MiB
llama_new_context_with_model:      CUDA0 compute buffer size =    26.00 MiB
llama_new_context_with_model:  CUDA_Host compute buffer size =     6.00 MiB
llama_new_context_with_model: graph nodes  = 851
llama_new_context_with_model: graph splits = 4 (with bs=512), 2 (with bs=1)
time=2025-03-04T15:02:11.051Z level=INFO source=server.go:596 msg="llama runner started in 0.75 seconds"
llama_model_loader: loaded meta data with 37 key-value pairs and 389 tensors from /root/.ollama/models/blobs/sha256-136c89cb1c6ea901df358fe576ee9cd7501daa0f72e28526eca929e2bbeeb4ed (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = bert
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = multilingual-e5-large-instruct
llama_model_loader: - kv   3:                       general.organization str              = Tmp
llama_model_loader: - kv   4:                           general.finetune str              = instruct
llama_model_loader: - kv   5:                           general.basename str              = intfloat-multilingual-e5
llama_model_loader: - kv   6:                         general.size_label str              = large
llama_model_loader: - kv   7:                            general.license str              = mit
llama_model_loader: - kv   8:                               general.tags arr[str,3]       = ["mteb", "sentence-transformers", "tr...
llama_model_loader: - kv   9:                          general.languages arr[str,94]      = ["multilingual", "af", "am", "ar", "a...
llama_model_loader: - kv  10:                           bert.block_count u32              = 24
llama_model_loader: - kv  11:                        bert.context_length u32              = 512
llama_model_loader: - kv  12:                      bert.embedding_length u32              = 1024
llama_model_loader: - kv  13:                   bert.feed_forward_length u32              = 4096
llama_model_loader: - kv  14:                  bert.attention.head_count u32              = 16
llama_model_loader: - kv  15:          bert.attention.layer_norm_epsilon f32              = 0.000010
llama_model_loader: - kv  16:                          general.file_type u32              = 0
llama_model_loader: - kv  17:                      bert.attention.causal bool             = false
llama_model_loader: - kv  18:                          bert.pooling_type u32              = 1
llama_model_loader: - kv  19:                       tokenizer.ggml.model str              = t5
llama_model_loader: - kv  20:                         tokenizer.ggml.pre str              = default
llama_model_loader: - kv  21:                      tokenizer.ggml.tokens arr[str,250002]  = ["<s>", "<pad>", "</s>", "<unk>", ","...
llama_model_loader: - kv  22:                      tokenizer.ggml.scores arr[f32,250002]  = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv  23:                  tokenizer.ggml.token_type arr[i32,250002]  = [3, 3, 3, 2, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  24:            tokenizer.ggml.add_space_prefix bool             = true
llama_model_loader: - kv  25:            tokenizer.ggml.token_type_count u32              = 1
llama_model_loader: - kv  26:    tokenizer.ggml.remove_extra_whitespaces bool             = true
llama_model_loader: - kv  27:        tokenizer.ggml.precompiled_charsmap arr[u8,237539]   = [0, 180, 2, 0, 0, 132, 0, 0, 0, 0, 0,...
llama_model_loader: - kv  28:                tokenizer.ggml.bos_token_id u32              = 0
llama_model_loader: - kv  29:                tokenizer.ggml.eos_token_id u32              = 2
llama_model_loader: - kv  30:            tokenizer.ggml.unknown_token_id u32              = 3
llama_model_loader: - kv  31:          tokenizer.ggml.seperator_token_id u32              = 2
llama_model_loader: - kv  32:            tokenizer.ggml.padding_token_id u32              = 1
llama_model_loader: - kv  33:               tokenizer.ggml.mask_token_id u32              = 250001
llama_model_loader: - kv  34:               tokenizer.ggml.add_bos_token bool             = true
llama_model_loader: - kv  35:               tokenizer.ggml.add_eos_token bool             = true
llama_model_loader: - kv  36:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:  389 tensors
llm_load_vocab: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
llm_load_vocab: special tokens cache size = 4
llm_load_vocab: token to piece cache size = 2.1668 MB
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = bert
llm_load_print_meta: vocab type       = UGM
llm_load_print_meta: n_vocab          = 250002
llm_load_print_meta: n_merges         = 0
llm_load_print_meta: vocab_only       = 1
llm_load_print_meta: model type       = ?B
llm_load_print_meta: model ftype      = all F32
llm_load_print_meta: model params     = 558.84 M
llm_load_print_meta: model size       = 2.08 GiB (32.00 BPW) 
llm_load_print_meta: general.name     = multilingual-e5-large-instruct
llm_load_print_meta: BOS token        = 0 '<s>'
llm_load_print_meta: EOS token        = 2 '</s>'
llm_load_print_meta: UNK token        = 3 '<unk>'
llm_load_print_meta: SEP token        = 2 '</s>'
llm_load_print_meta: PAD token        = 1 '<pad>'
llm_load_print_meta: MASK token       = 250001 '[PAD250000]'
llm_load_print_meta: LF token         = 6 '▁'
llm_load_print_meta: EOG token        = 2 '</s>'
llm_load_print_meta: max token length = 48
llama_model_load: vocab only - skipping tensors
//ml/backend/ggml/ggml/src/ggml-cpu/ggml-cpu.c:8456: GGML_ASSERT(i01 >= 0 && i01 < ne01) failed
SIGSEGV: segmentation violation
PC=0x79a828624c47 m=3 sigcode=1 addr=0x204a03fe0
signal arrived during cgo execution

goroutine 24 gp=0xc000504380 m=3 mp=0xc00007ae08 [syscall]:
runtime.cgocall(0x57d1d1cbace0, 0xc00008dba0)
        runtime/cgocall.go:167 +0x4b fp=0xc00008db78 sp=0xc00008db40 pc=0x57d1d10a5acb
github.com/ollama/ollama/llama._Cfunc_llama_decode(0x79a7413845e0, {0x1, 0x79a741391980, 0x0, 0x0, 0x79a741392580, 0x79a741393180, 0x79a74149f980, 0x79a741356bc0})
        _cgo_gotypes.go:545 +0x4f fp=0xc00008dba0 sp=0xc00008db78 pc=0x57d1d145b56f
github.com/ollama/ollama/llama.(*Context).Decode.func1(0x57d1d147a48b?, 0x79a7413845e0?)
        github.com/ollama/ollama/llama/llama.go:163 +0xf5 fp=0xc00008dc90 sp=0xc00008dba0 pc=0x57d1d145e295
github.com/ollama/ollama/llama.(*Context).Decode(0x57d1d2bae480?, 0x0?)
        github.com/ollama/ollama/llama/llama.go:163 +0x13 fp=0xc00008dcd8 sp=0xc00008dc90 pc=0x57d1d145e113
github.com/ollama/ollama/runner/llamarunner.(*Server).processBatch(0xc0001ad5f0, 0xc000112960, 0xc00008df20)
        github.com/ollama/ollama/runner/llamarunner/runner.go:435 +0x23f fp=0xc00008dee0 sp=0xc00008dcd8 pc=0x57d1d147927f
github.com/ollama/ollama/runner/llamarunner.(*Server).run(0xc0001ad5f0, {0x57d1d230d920, 0xc000142410})
        github.com/ollama/ollama/runner/llamarunner/runner.go:343 +0x1d5 fp=0xc00008dfb8 sp=0xc00008dee0 pc=0x57d1d1478cb5
github.com/ollama/ollama/runner/llamarunner.Execute.gowrap2()
        github.com/ollama/ollama/runner/llamarunner/runner.go:973 +0x28 fp=0xc00008dfe0 sp=0xc00008dfb8 pc=0x57d1d147db48
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc00008dfe8 sp=0xc00008dfe0 pc=0x57d1d10b45a1
created by github.com/ollama/ollama/runner/llamarunner.Execute in goroutine 1
        github.com/ollama/ollama/runner/llamarunner/runner.go:973 +0xdb5

goroutine 1 gp=0xc0000061c0 m=nil [IO wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc0005875c0 sp=0xc0005875a0 pc=0x57d1d10ac1ce
runtime.netpollblock(0xc0004adf80?, 0xd1042fe6?, 0xd1?)
        runtime/netpoll.go:575 +0xf7 fp=0xc0005875f8 sp=0xc0005875c0 pc=0x57d1d106fe37
internal/poll.runtime_pollWait(0x79a833dc6680, 0x72)
        runtime/netpoll.go:351 +0x85 fp=0xc000587618 sp=0xc0005875f8 pc=0x57d1d10ab4c5
internal/poll.(*pollDesc).wait(0xc000123f00?, 0x900000036?, 0x0)
        internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc000587640 sp=0xc000587618 pc=0x57d1d1133707
internal/poll.(*pollDesc).waitRead(...)
        internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Accept(0xc000123f00)
        internal/poll/fd_unix.go:620 +0x295 fp=0xc0005876e8 sp=0xc000587640 pc=0x57d1d1138ad5
net.(*netFD).accept(0xc000123f00)
        net/fd_unix.go:172 +0x29 fp=0xc0005877a0 sp=0xc0005876e8 pc=0x57d1d11a1bc9
net.(*TCPListener).accept(0xc000146b40)
        net/tcpsock_posix.go:159 +0x1e fp=0xc0005877f0 sp=0xc0005877a0 pc=0x57d1d11b783e
net.(*TCPListener).Accept(0xc000146b40)
        net/tcpsock.go:372 +0x30 fp=0xc000587820 sp=0xc0005877f0 pc=0x57d1d11b66f0
net/http.(*onceCloseListener).Accept(0xc0004ac000?)
        <autogenerated>:1 +0x24 fp=0xc000587838 sp=0xc000587820 pc=0x57d1d1400964
net/http.(*Server).Serve(0xc000154c30, {0x57d1d230b4f8, 0xc000146b40})
        net/http/server.go:3330 +0x30c fp=0xc000587968 sp=0xc000587838 pc=0x57d1d13d88ec
github.com/ollama/ollama/runner/llamarunner.Execute({0xc000036120, 0xe, 0xe})
        github.com/ollama/ollama/runner/llamarunner/runner.go:994 +0x1174 fp=0xc000587d08 sp=0xc000587968 pc=0x57d1d147d834
github.com/ollama/ollama/runner.Execute({0xc000036110?, 0x0?, 0x0?})
        github.com/ollama/ollama/runner/runner.go:22 +0xd4 fp=0xc000587d30 sp=0xc000587d08 pc=0x57d1d16adc54
github.com/ollama/ollama/cmd.NewCLI.func2(0xc000037400?, {0x57d1d1ea8050?, 0x4?, 0x57d1d1ea8054?})
        github.com/ollama/ollama/cmd/cmd.go:1280 +0x45 fp=0xc000587d58 sp=0xc000587d30 pc=0x57d1d1cba245
github.com/spf13/cobra.(*Command).execute(0xc000175b08, {0xc00013e700, 0xe, 0xe})
        github.com/spf13/cobra@v1.7.0/command.go:940 +0x862 fp=0xc000587e78 sp=0xc000587d58 pc=0x57d1d121a902
github.com/spf13/cobra.(*Command).ExecuteC(0xc00046fb08)
        github.com/spf13/cobra@v1.7.0/command.go:1068 +0x3a5 fp=0xc000587f30 sp=0xc000587e78 pc=0x57d1d121b145
github.com/spf13/cobra.(*Command).Execute(...)
        github.com/spf13/cobra@v1.7.0/command.go:992
github.com/spf13/cobra.(*Command).ExecuteContext(...)
        github.com/spf13/cobra@v1.7.0/command.go:985
main.main()
        github.com/ollama/ollama/main.go:12 +0x4d fp=0xc000587f50 sp=0xc000587f30 pc=0x57d1d1cba5cd
runtime.main()
        runtime/proc.go:272 +0x29d fp=0xc000587fe0 sp=0xc000587f50 pc=0x57d1d10774dd
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000587fe8 sp=0xc000587fe0 pc=0x57d1d10b45a1

goroutine 2 gp=0xc000006c40 m=nil [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc000074fa8 sp=0xc000074f88 pc=0x57d1d10ac1ce
runtime.goparkunlock(...)
        runtime/proc.go:430
runtime.forcegchelper()
        runtime/proc.go:337 +0xb8 fp=0xc000074fe0 sp=0xc000074fa8 pc=0x57d1d1077818
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000074fe8 sp=0xc000074fe0 pc=0x57d1d10b45a1
created by runtime.init.7 in goroutine 1
        runtime/proc.go:325 +0x1a

goroutine 3 gp=0xc000007180 m=nil [GC sweep wait]:
runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc000075780 sp=0xc000075760 pc=0x57d1d10ac1ce
runtime.goparkunlock(...)
        runtime/proc.go:430
runtime.bgsweep(0xc00003e080)
        runtime/mgcsweep.go:317 +0xdf fp=0xc0000757c8 sp=0xc000075780 pc=0x57d1d1061ebf
runtime.gcenable.gowrap1()
        runtime/mgc.go:204 +0x25 fp=0xc0000757e0 sp=0xc0000757c8 pc=0x57d1d1056505
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc0000757e8 sp=0xc0000757e0 pc=0x57d1d10b45a1
created by runtime.gcenable in goroutine 1
        runtime/mgc.go:204 +0x66

goroutine 4 gp=0xc000007340 m=nil [GC scavenge wait]:
runtime.gopark(0x10000?, 0x57d1d205b6f8?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc000075f78 sp=0xc000075f58 pc=0x57d1d10ac1ce
runtime.goparkunlock(...)
        runtime/proc.go:430
runtime.(*scavengerState).park(0x57d1d2b02080)
        runtime/mgcscavenge.go:425 +0x49 fp=0xc000075fa8 sp=0xc000075f78 pc=0x57d1d105f889
runtime.bgscavenge(0xc00003e080)
        runtime/mgcscavenge.go:658 +0x59 fp=0xc000075fc8 sp=0xc000075fa8 pc=0x57d1d105fe19
runtime.gcenable.gowrap2()
        runtime/mgc.go:205 +0x25 fp=0xc000075fe0 sp=0xc000075fc8 pc=0x57d1d10564a5
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000075fe8 sp=0xc000075fe0 pc=0x57d1d10b45a1
created by runtime.gcenable in goroutine 1
        runtime/mgc.go:205 +0xa5

goroutine 5 gp=0xc000007c00 m=nil [finalizer wait]:
runtime.gopark(0xc000074648?, 0x57d1d104ca05?, 0xb0?, 0x1?, 0xc0000061c0?)
        runtime/proc.go:424 +0xce fp=0xc000074620 sp=0xc000074600 pc=0x57d1d10ac1ce
runtime.runfinq()
        runtime/mfinal.go:193 +0x107 fp=0xc0000747e0 sp=0xc000074620 pc=0x57d1d1055587
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc0000747e8 sp=0xc0000747e0 pc=0x57d1d10b45a1
created by runtime.createfing in goroutine 1
        runtime/mfinal.go:163 +0x3d

goroutine 6 gp=0xc0001cb500 m=nil [chan receive]:
runtime.gopark(0xc000076760?, 0x57d1d1189245?, 0x60?, 0x69?, 0x57d1d2320280?)
        runtime/proc.go:424 +0xce fp=0xc000076718 sp=0xc0000766f8 pc=0x57d1d10ac1ce
runtime.chanrecv(0xc0000ac310, 0x0, 0x1)
        runtime/chan.go:639 +0x41c fp=0xc000076790 sp=0xc000076718 pc=0x57d1d1045bfc
runtime.chanrecv1(0x0?, 0x0?)
        runtime/chan.go:489 +0x12 fp=0xc0000767b8 sp=0xc000076790 pc=0x57d1d10457b2
runtime.unique_runtime_registerUniqueMapCleanup.func1(...)
        runtime/mgc.go:1781
runtime.unique_runtime_registerUniqueMapCleanup.gowrap1()
        runtime/mgc.go:1784 +0x2f fp=0xc0000767e0 sp=0xc0000767b8 pc=0x57d1d105956f
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc0000767e8 sp=0xc0000767e0 pc=0x57d1d10b45a1
created by unique.runtime_registerUniqueMapCleanup in goroutine 1
        runtime/mgc.go:1779 +0x96

goroutine 7 gp=0xc0001cba40 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc000076f38 sp=0xc000076f18 pc=0x57d1d10ac1ce
runtime.gcBgMarkWorker(0xc0000ad730)
        runtime/mgc.go:1412 +0xe9 fp=0xc000076fc8 sp=0xc000076f38 pc=0x57d1d1058869
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1328 +0x25 fp=0xc000076fe0 sp=0xc000076fc8 pc=0x57d1d1058745
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000076fe8 sp=0xc000076fe0 pc=0x57d1d10b45a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1328 +0x105

goroutine 8 gp=0xc0001cbc00 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc000077738 sp=0xc000077718 pc=0x57d1d10ac1ce
runtime.gcBgMarkWorker(0xc0000ad730)
        runtime/mgc.go:1412 +0xe9 fp=0xc0000777c8 sp=0xc000077738 pc=0x57d1d1058869
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1328 +0x25 fp=0xc0000777e0 sp=0xc0000777c8 pc=0x57d1d1058745
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc0000777e8 sp=0xc0000777e0 pc=0x57d1d10b45a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1328 +0x105

goroutine 9 gp=0xc0001cbdc0 m=nil [GC worker (idle)]:
runtime.gopark(0x1d75291645414?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc000077f38 sp=0xc000077f18 pc=0x57d1d10ac1ce
runtime.gcBgMarkWorker(0xc0000ad730)
        runtime/mgc.go:1412 +0xe9 fp=0xc000077fc8 sp=0xc000077f38 pc=0x57d1d1058869
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1328 +0x25 fp=0xc000077fe0 sp=0xc000077fc8 pc=0x57d1d1058745
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000077fe8 sp=0xc000077fe0 pc=0x57d1d10b45a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1328 +0x105

goroutine 18 gp=0xc000104380 m=nil [GC worker (idle)]:
runtime.gopark(0x1d7529163edb3?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc000070738 sp=0xc000070718 pc=0x57d1d10ac1ce
runtime.gcBgMarkWorker(0xc0000ad730)
        runtime/mgc.go:1412 +0xe9 fp=0xc0000707c8 sp=0xc000070738 pc=0x57d1d1058869
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1328 +0x25 fp=0xc0000707e0 sp=0xc0000707c8 pc=0x57d1d1058745
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc0000707e8 sp=0xc0000707e0 pc=0x57d1d10b45a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1328 +0x105

goroutine 19 gp=0xc000104540 m=nil [GC worker (idle)]:
runtime.gopark(0x1d7529163ed06?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc000070f38 sp=0xc000070f18 pc=0x57d1d10ac1ce
runtime.gcBgMarkWorker(0xc0000ad730)
        runtime/mgc.go:1412 +0xe9 fp=0xc000070fc8 sp=0xc000070f38 pc=0x57d1d1058869
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1328 +0x25 fp=0xc000070fe0 sp=0xc000070fc8 pc=0x57d1d1058745
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000070fe8 sp=0xc000070fe0 pc=0x57d1d10b45a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1328 +0x105

goroutine 20 gp=0xc000104700 m=nil [GC worker (idle)]:
runtime.gopark(0x1d7529163d411?, 0x3?, 0xb?, 0x17?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc000071738 sp=0xc000071718 pc=0x57d1d10ac1ce
runtime.gcBgMarkWorker(0xc0000ad730)
        runtime/mgc.go:1412 +0xe9 fp=0xc0000717c8 sp=0xc000071738 pc=0x57d1d1058869
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1328 +0x25 fp=0xc0000717e0 sp=0xc0000717c8 pc=0x57d1d1058745
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc0000717e8 sp=0xc0000717e0 pc=0x57d1d10b45a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1328 +0x105

goroutine 21 gp=0xc0001048c0 m=nil [GC worker (idle)]:
runtime.gopark(0x1d7529163e7ab?, 0x3?, 0x85?, 0x1a?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc000071f38 sp=0xc000071f18 pc=0x57d1d10ac1ce
runtime.gcBgMarkWorker(0xc0000ad730)
        runtime/mgc.go:1412 +0xe9 fp=0xc000071fc8 sp=0xc000071f38 pc=0x57d1d1058869
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1328 +0x25 fp=0xc000071fe0 sp=0xc000071fc8 pc=0x57d1d1058745
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000071fe8 sp=0xc000071fe0 pc=0x57d1d10b45a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1328 +0x105

goroutine 22 gp=0xc000104a80 m=nil [GC worker (idle)]:
runtime.gopark(0x1d7529161f9a1?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc000072738 sp=0xc000072718 pc=0x57d1d10ac1ce
runtime.gcBgMarkWorker(0xc0000ad730)
        runtime/mgc.go:1412 +0xe9 fp=0xc0000727c8 sp=0xc000072738 pc=0x57d1d1058869
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1328 +0x25 fp=0xc0000727e0 sp=0xc0000727c8 pc=0x57d1d1058745
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc0000727e8 sp=0xc0000727e0 pc=0x57d1d10b45a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1328 +0x105

goroutine 10 gp=0xc000504700 m=nil [chan receive]:
runtime.gopark(0x57d1d10b25b4?, 0xc00021f898?, 0xd0?, 0xa2?, 0xc00021f880?)
        runtime/proc.go:424 +0xce fp=0xc00021f860 sp=0xc00021f840 pc=0x57d1d10ac1ce
runtime.chanrecv(0xc0002f5ab0, 0xc00021fa10, 0x1)
        runtime/chan.go:639 +0x41c fp=0xc00021f8d8 sp=0xc00021f860 pc=0x57d1d1045bfc
runtime.chanrecv1(0xc00069cd80?, 0xc000384808?)
        runtime/chan.go:489 +0x12 fp=0xc00021f900 sp=0xc00021f8d8 pc=0x57d1d10457b2
github.com/ollama/ollama/runner/llamarunner.(*Server).embeddings(0xc0001ad5f0, {0x57d1d230b708, 0xc00013eb60}, 0xc0004823c0)
        github.com/ollama/ollama/runner/llamarunner/runner.go:783 +0x746 fp=0xc00021fac0 sp=0xc00021f900 pc=0x57d1d147bc06
github.com/ollama/ollama/runner/llamarunner.(*Server).embeddings-fm({0x57d1d230b708?, 0xc00013eb60?}, 0x57d1d13e26c7?)
        <autogenerated>:1 +0x36 fp=0xc00021faf0 sp=0xc00021fac0 pc=0x57d1d147dff6
net/http.HandlerFunc.ServeHTTP(0xc00013e8c0?, {0x57d1d230b708?, 0xc00013eb60?}, 0x0?)
        net/http/server.go:2220 +0x29 fp=0xc00021fb18 sp=0xc00021faf0 pc=0x57d1d13d4ee9
net/http.(*ServeMux).ServeHTTP(0x57d1d104ca05?, {0x57d1d230b708, 0xc00013eb60}, 0xc0004823c0)
        net/http/server.go:2747 +0x1ca fp=0xc00021fb68 sp=0xc00021fb18 pc=0x57d1d13d6dea
net/http.serverHandler.ServeHTTP({0x57d1d23080d0?}, {0x57d1d230b708?, 0xc00013eb60?}, 0x6?)
        net/http/server.go:3210 +0x8e fp=0xc00021fb98 sp=0xc00021fb68 pc=0x57d1d13f434e
net/http.(*conn).serve(0xc0004ac000, {0x57d1d230d8e8, 0xc00069cb10})
        net/http/server.go:2092 +0x5d0 fp=0xc00021ffb8 sp=0xc00021fb98 pc=0x57d1d13d3890
net/http.(*Server).Serve.gowrap3()
        net/http/server.go:3360 +0x28 fp=0xc00021ffe0 sp=0xc00021ffb8 pc=0x57d1d13d8ce8
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc00021ffe8 sp=0xc00021ffe0 pc=0x57d1d10b45a1
created by net/http.(*Server).Serve in goroutine 1
        net/http/server.go:3360 +0x485

goroutine 28 gp=0xc000105500 m=nil [IO wait]:
runtime.gopark(0x57d1d1050ee5?, 0xc00001de00?, 0xa5?, 0x10?, 0xb?)
        runtime/proc.go:424 +0xce fp=0xc00001dda8 sp=0xc00001dd88 pc=0x57d1d10ac1ce
runtime.netpollblock(0x57d1d10cf6b8?, 0xd1042fe6?, 0xd1?)
        runtime/netpoll.go:575 +0xf7 fp=0xc00001dde0 sp=0xc00001dda8 pc=0x57d1d106fe37
internal/poll.runtime_pollWait(0x79a833dc6568, 0x72)
        runtime/netpoll.go:351 +0x85 fp=0xc00001de00 sp=0xc00001dde0 pc=0x57d1d10ab4c5
internal/poll.(*pollDesc).wait(0xc0004aa000?, 0xc0000b6641?, 0x0)
        internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc00001de28 sp=0xc00001de00 pc=0x57d1d1133707
internal/poll.(*pollDesc).waitRead(...)
        internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc0004aa000, {0xc0000b6641, 0x1, 0x1})
        internal/poll/fd_unix.go:165 +0x27a fp=0xc00001dec0 sp=0xc00001de28 pc=0x57d1d11349fa
net.(*netFD).Read(0xc0004aa000, {0xc0000b6641?, 0xc00001df48?, 0x57d1d10ade50?})
        net/fd_posix.go:55 +0x25 fp=0xc00001df08 sp=0xc00001dec0 pc=0x57d1d119fc05
net.(*conn).Read(0xc000126030, {0xc0000b6641?, 0x0?, 0x57d1d2bae480?})
        net/net.go:189 +0x45 fp=0xc00001df50 sp=0xc00001df08 pc=0x57d1d11ae205
net.(*TCPConn).Read(0x57d1d2a5ff60?, {0xc0000b6641?, 0x0?, 0x0?})
        <autogenerated>:1 +0x25 fp=0xc00001df80 sp=0xc00001df50 pc=0x57d1d11c1405
net/http.(*connReader).backgroundRead(0xc0000b6630)
        net/http/server.go:690 +0x37 fp=0xc00001dfc8 sp=0xc00001df80 pc=0x57d1d13ce217
net/http.(*connReader).startBackgroundRead.gowrap2()
        net/http/server.go:686 +0x25 fp=0xc00001dfe0 sp=0xc00001dfc8 pc=0x57d1d13ce145
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc00001dfe8 sp=0xc00001dfe0 pc=0x57d1d10b45a1
created by net/http.(*connReader).startBackgroundRead in goroutine 10
        net/http/server.go:686 +0xb6

rax    0x204a03fe0
rbx    0x79a8241707b0
rcx    0xff8
rdx    0x79a824008980
rdi    0x79a824008990
rsi    0x0
rbp    0x79a82bdff2b0
rsp    0x79a82bdff290
r8     0x0
r9     0x79a833c2ea28
r10    0x0
r11    0x246
r12    0x79a741384540
r13    0x79a824008990
r14    0x0
r15    0x57d1e9240f70
rip    0x79a828624c47
rflags 0x10297
cs     0x33
fs     0x0
gs     0x0
SIGABRT: abort
PC=0x79a87a90500b m=3 sigcode=18446744073709551610
signal arrived during cgo execution

goroutine 24 gp=0xc000504380 m=3 mp=0xc00007ae08 [syscall]:
runtime.cgocall(0x57d1d1cbace0, 0xc00008dba0)
        runtime/cgocall.go:167 +0x4b fp=0xc00008db78 sp=0xc00008db40 pc=0x57d1d10a5acb
github.com/ollama/ollama/llama._Cfunc_llama_decode(0x79a7413845e0, {0x1, 0x79a741391980, 0x0, 0x0, 0x79a741392580, 0x79a741393180, 0x79a74149f980, 0x79a741356bc0})
        _cgo_gotypes.go:545 +0x4f fp=0xc00008dba0 sp=0xc00008db78 pc=0x57d1d145b56f
github.com/ollama/ollama/llama.(*Context).Decode.func1(0x57d1d147a48b?, 0x79a7413845e0?)
        github.com/ollama/ollama/llama/llama.go:163 +0xf5 fp=0xc00008dc90 sp=0xc00008dba0 pc=0x57d1d145e295
github.com/ollama/ollama/llama.(*Context).Decode(0x57d1d2bae480?, 0x0?)
        github.com/ollama/ollama/llama/llama.go:163 +0x13 fp=0xc00008dcd8 sp=0xc00008dc90 pc=0x57d1d145e113
github.com/ollama/ollama/runner/llamarunner.(*Server).processBatch(0xc0001ad5f0, 0xc000112960, 0xc00008df20)
        github.com/ollama/ollama/runner/llamarunner/runner.go:435 +0x23f fp=0xc00008dee0 sp=0xc00008dcd8 pc=0x57d1d147927f
github.com/ollama/ollama/runner/llamarunner.(*Server).run(0xc0001ad5f0, {0x57d1d230d920, 0xc000142410})
        github.com/ollama/ollama/runner/llamarunner/runner.go:343 +0x1d5 fp=0xc00008dfb8 sp=0xc00008dee0 pc=0x57d1d1478cb5
github.com/ollama/ollama/runner/llamarunner.Execute.gowrap2()
        github.com/ollama/ollama/runner/llamarunner/runner.go:973 +0x28 fp=0xc00008dfe0 sp=0xc00008dfb8 pc=0x57d1d147db48
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc00008dfe8 sp=0xc00008dfe0 pc=0x57d1d10b45a1
created by github.com/ollama/ollama/runner/llamarunner.Execute in goroutine 1
        github.com/ollama/ollama/runner/llamarunner/runner.go:973 +0xdb5

goroutine 1 gp=0xc0000061c0 m=nil [IO wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc0005875c0 sp=0xc0005875a0 pc=0x57d1d10ac1ce
runtime.netpollblock(0xc0004adf80?, 0xd1042fe6?, 0xd1?)
        runtime/netpoll.go:575 +0xf7 fp=0xc0005875f8 sp=0xc0005875c0 pc=0x57d1d106fe37
internal/poll.runtime_pollWait(0x79a833dc6680, 0x72)
        runtime/netpoll.go:351 +0x85 fp=0xc000587618 sp=0xc0005875f8 pc=0x57d1d10ab4c5
internal/poll.(*pollDesc).wait(0xc000123f00?, 0x900000036?, 0x0)
        internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc000587640 sp=0xc000587618 pc=0x57d1d1133707
internal/poll.(*pollDesc).waitRead(...)
        internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Accept(0xc000123f00)
        internal/poll/fd_unix.go:620 +0x295 fp=0xc0005876e8 sp=0xc000587640 pc=0x57d1d1138ad5
net.(*netFD).accept(0xc000123f00)
        net/fd_unix.go:172 +0x29 fp=0xc0005877a0 sp=0xc0005876e8 pc=0x57d1d11a1bc9
net.(*TCPListener).accept(0xc000146b40)
        net/tcpsock_posix.go:159 +0x1e fp=0xc0005877f0 sp=0xc0005877a0 pc=0x57d1d11b783e
net.(*TCPListener).Accept(0xc000146b40)
        net/tcpsock.go:372 +0x30 fp=0xc000587820 sp=0xc0005877f0 pc=0x57d1d11b66f0
net/http.(*onceCloseListener).Accept(0xc0004ac000?)
        <autogenerated>:1 +0x24 fp=0xc000587838 sp=0xc000587820 pc=0x57d1d1400964
net/http.(*Server).Serve(0xc000154c30, {0x57d1d230b4f8, 0xc000146b40})
        net/http/server.go:3330 +0x30c fp=0xc000587968 sp=0xc000587838 pc=0x57d1d13d88ec
github.com/ollama/ollama/runner/llamarunner.Execute({0xc000036120, 0xe, 0xe})
        github.com/ollama/ollama/runner/llamarunner/runner.go:994 +0x1174 fp=0xc000587d08 sp=0xc000587968 pc=0x57d1d147d834
github.com/ollama/ollama/runner.Execute({0xc000036110?, 0x0?, 0x0?})
        github.com/ollama/ollama/runner/runner.go:22 +0xd4 fp=0xc000587d30 sp=0xc000587d08 pc=0x57d1d16adc54
github.com/ollama/ollama/cmd.NewCLI.func2(0xc000037400?, {0x57d1d1ea8050?, 0x4?, 0x57d1d1ea8054?})
        github.com/ollama/ollama/cmd/cmd.go:1280 +0x45 fp=0xc000587d58 sp=0xc000587d30 pc=0x57d1d1cba245
github.com/spf13/cobra.(*Command).execute(0xc000175b08, {0xc00013e700, 0xe, 0xe})
        github.com/spf13/cobra@v1.7.0/command.go:940 +0x862 fp=0xc000587e78 sp=0xc000587d58 pc=0x57d1d121a902
github.com/spf13/cobra.(*Command).ExecuteC(0xc00046fb08)
        github.com/spf13/cobra@v1.7.0/command.go:1068 +0x3a5 fp=0xc000587f30 sp=0xc000587e78 pc=0x57d1d121b145
github.com/spf13/cobra.(*Command).Execute(...)
        github.com/spf13/cobra@v1.7.0/command.go:992
github.com/spf13/cobra.(*Command).ExecuteContext(...)
        github.com/spf13/cobra@v1.7.0/command.go:985
main.main()
        github.com/ollama/ollama/main.go:12 +0x4d fp=0xc000587f50 sp=0xc000587f30 pc=0x57d1d1cba5cd
runtime.main()
        runtime/proc.go:272 +0x29d fp=0xc000587fe0 sp=0xc000587f50 pc=0x57d1d10774dd
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000587fe8 sp=0xc000587fe0 pc=0x57d1d10b45a1

goroutine 2 gp=0xc000006c40 m=nil [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc000074fa8 sp=0xc000074f88 pc=0x57d1d10ac1ce
runtime.goparkunlock(...)
        runtime/proc.go:430
runtime.forcegchelper()
        runtime/proc.go:337 +0xb8 fp=0xc000074fe0 sp=0xc000074fa8 pc=0x57d1d1077818
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000074fe8 sp=0xc000074fe0 pc=0x57d1d10b45a1
created by runtime.init.7 in goroutine 1
        runtime/proc.go:325 +0x1a

goroutine 3 gp=0xc000007180 m=nil [GC sweep wait]:
runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc000075780 sp=0xc000075760 pc=0x57d1d10ac1ce
runtime.goparkunlock(...)
        runtime/proc.go:430
runtime.bgsweep(0xc00003e080)
        runtime/mgcsweep.go:317 +0xdf fp=0xc0000757c8 sp=0xc000075780 pc=0x57d1d1061ebf
runtime.gcenable.gowrap1()
        runtime/mgc.go:204 +0x25 fp=0xc0000757e0 sp=0xc0000757c8 pc=0x57d1d1056505
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc0000757e8 sp=0xc0000757e0 pc=0x57d1d10b45a1
created by runtime.gcenable in goroutine 1
        runtime/mgc.go:204 +0x66

goroutine 4 gp=0xc000007340 m=nil [GC scavenge wait]:
runtime.gopark(0x10000?, 0x57d1d205b6f8?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc000075f78 sp=0xc000075f58 pc=0x57d1d10ac1ce
runtime.goparkunlock(...)
        runtime/proc.go:430
runtime.(*scavengerState).park(0x57d1d2b02080)
        runtime/mgcscavenge.go:425 +0x49 fp=0xc000075fa8 sp=0xc000075f78 pc=0x57d1d105f889
runtime.bgscavenge(0xc00003e080)
        runtime/mgcscavenge.go:658 +0x59 fp=0xc000075fc8 sp=0xc000075fa8 pc=0x57d1d105fe19
runtime.gcenable.gowrap2()
        runtime/mgc.go:205 +0x25 fp=0xc000075fe0 sp=0xc000075fc8 pc=0x57d1d10564a5
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000075fe8 sp=0xc000075fe0 pc=0x57d1d10b45a1
created by runtime.gcenable in goroutine 1
        runtime/mgc.go:205 +0xa5

goroutine 5 gp=0xc000007c00 m=nil [finalizer wait]:
runtime.gopark(0xc000074648?, 0x57d1d104ca05?, 0xb0?, 0x1?, 0xc0000061c0?)
        runtime/proc.go:424 +0xce fp=0xc000074620 sp=0xc000074600 pc=0x57d1d10ac1ce
runtime.runfinq()
        runtime/mfinal.go:193 +0x107 fp=0xc0000747e0 sp=0xc000074620 pc=0x57d1d1055587
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc0000747e8 sp=0xc0000747e0 pc=0x57d1d10b45a1
created by runtime.createfing in goroutine 1
        runtime/mfinal.go:163 +0x3d

goroutine 6 gp=0xc0001cb500 m=nil [chan receive]:
runtime.gopark(0xc000076760?, 0x57d1d1189245?, 0x60?, 0x69?, 0x57d1d2320280?)
        runtime/proc.go:424 +0xce fp=0xc000076718 sp=0xc0000766f8 pc=0x57d1d10ac1ce
runtime.chanrecv(0xc0000ac310, 0x0, 0x1)
        runtime/chan.go:639 +0x41c fp=0xc000076790 sp=0xc000076718 pc=0x57d1d1045bfc
runtime.chanrecv1(0x0?, 0x0?)
        runtime/chan.go:489 +0x12 fp=0xc0000767b8 sp=0xc000076790 pc=0x57d1d10457b2
runtime.unique_runtime_registerUniqueMapCleanup.func1(...)
        runtime/mgc.go:1781
runtime.unique_runtime_registerUniqueMapCleanup.gowrap1()
        runtime/mgc.go:1784 +0x2f fp=0xc0000767e0 sp=0xc0000767b8 pc=0x57d1d105956f
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc0000767e8 sp=0xc0000767e0 pc=0x57d1d10b45a1
created by unique.runtime_registerUniqueMapCleanup in goroutine 1
        runtime/mgc.go:1779 +0x96

goroutine 7 gp=0xc0001cba40 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc000076f38 sp=0xc000076f18 pc=0x57d1d10ac1ce
runtime.gcBgMarkWorker(0xc0000ad730)
        runtime/mgc.go:1412 +0xe9 fp=0xc000076fc8 sp=0xc000076f38 pc=0x57d1d1058869
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1328 +0x25 fp=0xc000076fe0 sp=0xc000076fc8 pc=0x57d1d1058745
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000076fe8 sp=0xc000076fe0 pc=0x57d1d10b45a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1328 +0x105

goroutine 8 gp=0xc0001cbc00 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc000077738 sp=0xc000077718 pc=0x57d1d10ac1ce
runtime.gcBgMarkWorker(0xc0000ad730)
        runtime/mgc.go:1412 +0xe9 fp=0xc0000777c8 sp=0xc000077738 pc=0x57d1d1058869
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1328 +0x25 fp=0xc0000777e0 sp=0xc0000777c8 pc=0x57d1d1058745
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc0000777e8 sp=0xc0000777e0 pc=0x57d1d10b45a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1328 +0x105

goroutine 9 gp=0xc0001cbdc0 m=nil [GC worker (idle)]:
runtime.gopark(0x1d75291645414?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc000077f38 sp=0xc000077f18 pc=0x57d1d10ac1ce
runtime.gcBgMarkWorker(0xc0000ad730)
        runtime/mgc.go:1412 +0xe9 fp=0xc000077fc8 sp=0xc000077f38 pc=0x57d1d1058869
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1328 +0x25 fp=0xc000077fe0 sp=0xc000077fc8 pc=0x57d1d1058745
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000077fe8 sp=0xc000077fe0 pc=0x57d1d10b45a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1328 +0x105

goroutine 18 gp=0xc000104380 m=nil [GC worker (idle)]:
runtime.gopark(0x1d7529163edb3?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc000070738 sp=0xc000070718 pc=0x57d1d10ac1ce
runtime.gcBgMarkWorker(0xc0000ad730)
        runtime/mgc.go:1412 +0xe9 fp=0xc0000707c8 sp=0xc000070738 pc=0x57d1d1058869
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1328 +0x25 fp=0xc0000707e0 sp=0xc0000707c8 pc=0x57d1d1058745
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc0000707e8 sp=0xc0000707e0 pc=0x57d1d10b45a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1328 +0x105

goroutine 19 gp=0xc000104540 m=nil [GC worker (idle)]:
runtime.gopark(0x1d7529163ed06?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc000070f38 sp=0xc000070f18 pc=0x57d1d10ac1ce
runtime.gcBgMarkWorker(0xc0000ad730)
        runtime/mgc.go:1412 +0xe9 fp=0xc000070fc8 sp=0xc000070f38 pc=0x57d1d1058869
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1328 +0x25 fp=0xc000070fe0 sp=0xc000070fc8 pc=0x57d1d1058745
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000070fe8 sp=0xc000070fe0 pc=0x57d1d10b45a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1328 +0x105

goroutine 20 gp=0xc000104700 m=nil [GC worker (idle)]:
runtime.gopark(0x1d7529163d411?, 0x3?, 0xb?, 0x17?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc000071738 sp=0xc000071718 pc=0x57d1d10ac1ce
runtime.gcBgMarkWorker(0xc0000ad730)
        runtime/mgc.go:1412 +0xe9 fp=0xc0000717c8 sp=0xc000071738 pc=0x57d1d1058869
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1328 +0x25 fp=0xc0000717e0 sp=0xc0000717c8 pc=0x57d1d1058745
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc0000717e8 sp=0xc0000717e0 pc=0x57d1d10b45a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1328 +0x105

goroutine 21 gp=0xc0001048c0 m=nil [GC worker (idle)]:
runtime.gopark(0x1d7529163e7ab?, 0x3?, 0x85?, 0x1a?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc000071f38 sp=0xc000071f18 pc=0x57d1d10ac1ce
runtime.gcBgMarkWorker(0xc0000ad730)
        runtime/mgc.go:1412 +0xe9 fp=0xc000071fc8 sp=0xc000071f38 pc=0x57d1d1058869
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1328 +0x25 fp=0xc000071fe0 sp=0xc000071fc8 pc=0x57d1d1058745
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000071fe8 sp=0xc000071fe0 pc=0x57d1d10b45a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1328 +0x105

goroutine 22 gp=0xc000104a80 m=nil [GC worker (idle)]:
runtime.gopark(0x1d7529161f9a1?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc000072738 sp=0xc000072718 pc=0x57d1d10ac1ce
runtime.gcBgMarkWorker(0xc0000ad730)
        runtime/mgc.go:1412 +0xe9 fp=0xc0000727c8 sp=0xc000072738 pc=0x57d1d1058869
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1328 +0x25 fp=0xc0000727e0 sp=0xc0000727c8 pc=0x57d1d1058745
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc0000727e8 sp=0xc0000727e0 pc=0x57d1d10b45a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1328 +0x105

goroutine 10 gp=0xc000504700 m=nil [chan receive]:
runtime.gopark(0x57d1d10b25b4?, 0xc00021f898?, 0xd0?, 0xa2?, 0xc00021f880?)
        runtime/proc.go:424 +0xce fp=0xc00021f860 sp=0xc00021f840 pc=0x57d1d10ac1ce
runtime.chanrecv(0xc0002f5ab0, 0xc00021fa10, 0x1)
        runtime/chan.go:639 +0x41c fp=0xc00021f8d8 sp=0xc00021f860 pc=0x57d1d1045bfc
runtime.chanrecv1(0xc00069cd80?, 0xc000384808?)
        runtime/chan.go:489 +0x12 fp=0xc00021f900 sp=0xc00021f8d8 pc=0x57d1d10457b2
github.com/ollama/ollama/runner/llamarunner.(*Server).embeddings(0xc0001ad5f0, {0x57d1d230b708, 0xc00013eb60}, 0xc0004823c0)
        github.com/ollama/ollama/runner/llamarunner/runner.go:783 +0x746 fp=0xc00021fac0 sp=0xc00021f900 pc=0x57d1d147bc06
github.com/ollama/ollama/runner/llamarunner.(*Server).embeddings-fm({0x57d1d230b708?, 0xc00013eb60?}, 0x57d1d13e26c7?)
        <autogenerated>:1 +0x36 fp=0xc00021faf0 sp=0xc00021fac0 pc=0x57d1d147dff6
net/http.HandlerFunc.ServeHTTP(0xc00013e8c0?, {0x57d1d230b708?, 0xc00013eb60?}, 0x0?)
        net/http/server.go:2220 +0x29 fp=0xc00021fb18 sp=0xc00021faf0 pc=0x57d1d13d4ee9
net/http.(*ServeMux).ServeHTTP(0x57d1d104ca05?, {0x57d1d230b708, 0xc00013eb60}, 0xc0004823c0)
        net/http/server.go:2747 +0x1ca fp=0xc00021fb68 sp=0xc00021fb18 pc=0x57d1d13d6dea
net/http.serverHandler.ServeHTTP({0x57d1d23080d0?}, {0x57d1d230b708?, 0xc00013eb60?}, 0x6?)
        net/http/server.go:3210 +0x8e fp=0xc00021fb98 sp=0xc00021fb68 pc=0x57d1d13f434e
net/http.(*conn).serve(0xc0004ac000, {0x57d1d230d8e8, 0xc00069cb10})
        net/http/server.go:2092 +0x5d0 fp=0xc00021ffb8 sp=0xc00021fb98 pc=0x57d1d13d3890
net/http.(*Server).Serve.gowrap3()
        net/http/server.go:3360 +0x28 fp=0xc00021ffe0 sp=0xc00021ffb8 pc=0x57d1d13d8ce8
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc00021ffe8 sp=0xc00021ffe0 pc=0x57d1d10b45a1
created by net/http.(*Server).Serve in goroutine 1
        net/http/server.go:3360 +0x485

goroutine 28 gp=0xc000105500 m=nil [IO wait]:
runtime.gopark(0x57d1d1050ee5?, 0xc00001de00?, 0xa5?, 0x10?, 0xb?)
        runtime/proc.go:424 +0xce fp=0xc00001dda8 sp=0xc00001dd88 pc=0x57d1d10ac1ce
runtime.netpollblock(0x57d1d10cf6b8?, 0xd1042fe6?, 0xd1?)
        runtime/netpoll.go:575 +0xf7 fp=0xc00001dde0 sp=0xc00001dda8 pc=0x57d1d106fe37
internal/poll.runtime_pollWait(0x79a833dc6568, 0x72)
        runtime/netpoll.go:351 +0x85 fp=0xc00001de00 sp=0xc00001dde0 pc=0x57d1d10ab4c5
internal/poll.(*pollDesc).wait(0xc0004aa000?, 0xc0000b6641?, 0x0)
        internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc00001de28 sp=0xc00001de00 pc=0x57d1d1133707
internal/poll.(*pollDesc).waitRead(...)
        internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc0004aa000, {0xc0000b6641, 0x1, 0x1})
        internal/poll/fd_unix.go:165 +0x27a fp=0xc00001dec0 sp=0xc00001de28 pc=0x57d1d11349fa
net.(*netFD).Read(0xc0004aa000, {0xc0000b6641?, 0xc00001df48?, 0x57d1d10ade50?})
        net/fd_posix.go:55 +0x25 fp=0xc00001df08 sp=0xc00001dec0 pc=0x57d1d119fc05
net.(*conn).Read(0xc000126030, {0xc0000b6641?, 0x0?, 0x57d1d2bae480?})
        net/net.go:189 +0x45 fp=0xc00001df50 sp=0xc00001df08 pc=0x57d1d11ae205
net.(*TCPConn).Read(0x57d1d2a5ff60?, {0xc0000b6641?, 0x0?, 0x0?})
        <autogenerated>:1 +0x25 fp=0xc00001df80 sp=0xc00001df50 pc=0x57d1d11c1405
net/http.(*connReader).backgroundRead(0xc0000b6630)
        net/http/server.go:690 +0x37 fp=0xc00001dfc8 sp=0xc00001df80 pc=0x57d1d13ce217
net/http.(*connReader).startBackgroundRead.gowrap2()
        net/http/server.go:686 +0x25 fp=0xc00001dfe0 sp=0xc00001dfc8 pc=0x57d1d13ce145
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc00001dfe8 sp=0xc00001dfe0 pc=0x57d1d10b45a1
created by net/http.(*connReader).startBackgroundRead in goroutine 10
        net/http/server.go:686 +0xb6

rax    0x0
rbx    0x79a82be00700
rcx    0x79a87a90500b
rdx    0x0
rdi    0x2
rsi    0x79a82bdff2b0
rbp    0x79a8329a801d
rsp    0x79a82bdff2b0
r8     0x0
r9     0x79a82bdff2b0
r10    0x8
r11    0x246
r12    0x79a8329a8790
r13    0x2108
r14    0x1
r15    0x79a74132e040
rip    0x79a87a90500b
rflags 0x246
cs     0x33
fs     0x0
gs     0x0
[GIN] 2025/03/04 - 15:02:11 | 500 |  6.507803808s |    192.168.1.75 | POST     "/api/embed"
time=2025-03-04T15:02:11.591Z level=ERROR source=routes.go:478 msg="embedding generation failed" error="do embedding request: Post \"http://127.0.0.1:40671/embedding\": EOF"
time=2025-03-04T15:02:11.600Z level=ERROR source=server.go:421 msg="llama runner terminated" error="exit status 2"

OS

Linux

GPU

NVIDIA RTX 3090, 24 GB VRAM

CPU

Intel Core i7-12700KF

Ollama version

0.5.12
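
To rule out the client library, the failure can also be reproduced without LangChain by calling Ollama's `/api/embed` endpoint directly. This is a minimal sketch using only the standard library; the server URL is a placeholder and the model is assumed to be pulled already.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # adjust for your setup
MODEL = "jeffh/intfloat-multilingual-e5-large-instruct:f32"

def embed_payload(text: str) -> bytes:
    """JSON request body for Ollama's /api/embed endpoint."""
    return json.dumps({"model": MODEL, "input": text}).encode()

def embed(text: str) -> list[float]:
    """POST the text to /api/embed and return the first embedding vector."""
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/embed",
        data=embed_payload(text),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["embeddings"][0]

# Usage (requires a running Ollama server):
#   embed("Hello. How are you? Fine. " * 50)   # returns normally
#   embed("Hello . . . . . . . . . . " * 50)   # 500 response, runner crashes
```

If the second call still returns HTTP 500 with the runner log above, the crash is in the runner's `llama_decode` path rather than in LangChain.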

llama_model_loader: - kv 10: bert.block_count u32 = 24 llama_model_loader: - kv 11: bert.context_length u32 = 512 llama_model_loader: - kv 12: bert.embedding_length u32 = 1024 llama_model_loader: - kv 13: bert.feed_forward_length u32 = 4096 llama_model_loader: - kv 14: bert.attention.head_count u32 = 16 llama_model_loader: - kv 15: bert.attention.layer_norm_epsilon f32 = 0.000010 llama_model_loader: - kv 16: general.file_type u32 = 0 llama_model_loader: - kv 17: bert.attention.causal bool = false llama_model_loader: - kv 18: bert.pooling_type u32 = 1 llama_model_loader: - kv 19: tokenizer.ggml.model str = t5 llama_model_loader: - kv 20: tokenizer.ggml.pre str = default llama_model_loader: - kv 21: tokenizer.ggml.tokens arr[str,250002] = ["<s>", "<pad>", "</s>", "<unk>", ","... llama_model_loader: - kv 22: tokenizer.ggml.scores arr[f32,250002] = [0.000000, 0.000000, 0.000000, 0.0000... llama_model_loader: - kv 23: tokenizer.ggml.token_type arr[i32,250002] = [3, 3, 3, 2, 1, 1, 1, 1, 1, 1, 1, 1, ... llama_model_loader: - kv 24: tokenizer.ggml.add_space_prefix bool = true llama_model_loader: - kv 25: tokenizer.ggml.token_type_count u32 = 1 llama_model_loader: - kv 26: tokenizer.ggml.remove_extra_whitespaces bool = true llama_model_loader: - kv 27: tokenizer.ggml.precompiled_charsmap arr[u8,237539] = [0, 180, 2, 0, 0, 132, 0, 0, 0, 0, 0,... 
llama_model_loader: - kv 28: tokenizer.ggml.bos_token_id u32 = 0 llama_model_loader: - kv 29: tokenizer.ggml.eos_token_id u32 = 2 llama_model_loader: - kv 30: tokenizer.ggml.unknown_token_id u32 = 3 llama_model_loader: - kv 31: tokenizer.ggml.seperator_token_id u32 = 2 llama_model_loader: - kv 32: tokenizer.ggml.padding_token_id u32 = 1 llama_model_loader: - kv 33: tokenizer.ggml.mask_token_id u32 = 250001 llama_model_loader: - kv 34: tokenizer.ggml.add_bos_token bool = true llama_model_loader: - kv 35: tokenizer.ggml.add_eos_token bool = true llama_model_loader: - kv 36: general.quantization_version u32 = 2 llama_model_loader: - type f32: 389 tensors llm_load_vocab: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect llm_load_vocab: special tokens cache size = 4 llm_load_vocab: token to piece cache size = 2.1668 MB llm_load_print_meta: format = GGUF V3 (latest) llm_load_print_meta: arch = bert llm_load_print_meta: vocab type = UGM llm_load_print_meta: n_vocab = 250002 llm_load_print_meta: n_merges = 0 llm_load_print_meta: vocab_only = 1 llm_load_print_meta: model type = ?B llm_load_print_meta: model ftype = all F32 llm_load_print_meta: model params = 558.84 M llm_load_print_meta: model size = 2.08 GiB (32.00 BPW) llm_load_print_meta: general.name = multilingual-e5-large-instruct llm_load_print_meta: BOS token = 0 '<s>' llm_load_print_meta: EOS token = 2 '</s>' llm_load_print_meta: UNK token = 3 '<unk>' llm_load_print_meta: SEP token = 2 '</s>' llm_load_print_meta: PAD token = 1 '<pad>' llm_load_print_meta: MASK token = 250001 '[PAD250000]' llm_load_print_meta: LF token = 6 '▁' llm_load_print_meta: EOG token = 2 '</s>' llm_load_print_meta: max token length = 48 llama_model_load: vocab only - skipping tensors //ml/backend/ggml/ggml/src/ggml-cpu/ggml-cpu.c:8456: GGML_ASSERT(i01 >= 0 && i01 < ne01) failed SIGSEGV: segmentation violation PC=0x79a828624c47 m=3 sigcode=1 addr=0x204a03fe0 signal arrived during cgo execution goroutine 24 
gp=0xc000504380 m=3 mp=0xc00007ae08 [syscall]:
runtime.cgocall(0x57d1d1cbace0, 0xc00008dba0)
	runtime/cgocall.go:167 +0x4b fp=0xc00008db78 sp=0xc00008db40 pc=0x57d1d10a5acb
github.com/ollama/ollama/llama._Cfunc_llama_decode(0x79a7413845e0, {0x1, 0x79a741391980, 0x0, 0x0, 0x79a741392580, 0x79a741393180, 0x79a74149f980, 0x79a741356bc0})
	_cgo_gotypes.go:545 +0x4f fp=0xc00008dba0 sp=0xc00008db78 pc=0x57d1d145b56f
github.com/ollama/ollama/llama.(*Context).Decode.func1(0x57d1d147a48b?, 0x79a7413845e0?)
	github.com/ollama/ollama/llama/llama.go:163 +0xf5 fp=0xc00008dc90 sp=0xc00008dba0 pc=0x57d1d145e295
github.com/ollama/ollama/llama.(*Context).Decode(0x57d1d2bae480?, 0x0?)
	github.com/ollama/ollama/llama/llama.go:163 +0x13 fp=0xc00008dcd8 sp=0xc00008dc90 pc=0x57d1d145e113
github.com/ollama/ollama/runner/llamarunner.(*Server).processBatch(0xc0001ad5f0, 0xc000112960, 0xc00008df20)
	github.com/ollama/ollama/runner/llamarunner/runner.go:435 +0x23f fp=0xc00008dee0 sp=0xc00008dcd8 pc=0x57d1d147927f
github.com/ollama/ollama/runner/llamarunner.(*Server).run(0xc0001ad5f0, {0x57d1d230d920, 0xc000142410})
	github.com/ollama/ollama/runner/llamarunner/runner.go:343 +0x1d5 fp=0xc00008dfb8 sp=0xc00008dee0 pc=0x57d1d1478cb5
github.com/ollama/ollama/runner/llamarunner.Execute.gowrap2()
	github.com/ollama/ollama/runner/llamarunner/runner.go:973 +0x28 fp=0xc00008dfe0 sp=0xc00008dfb8 pc=0x57d1d147db48
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc00008dfe8 sp=0xc00008dfe0 pc=0x57d1d10b45a1
created by github.com/ollama/ollama/runner/llamarunner.Execute in goroutine 1
	github.com/ollama/ollama/runner/llamarunner/runner.go:973 +0xdb5

goroutine 1 gp=0xc0000061c0 m=nil [IO wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:424 +0xce fp=0xc0005875c0 sp=0xc0005875a0 pc=0x57d1d10ac1ce
runtime.netpollblock(0xc0004adf80?, 0xd1042fe6?, 0xd1?)
	runtime/netpoll.go:575 +0xf7 fp=0xc0005875f8 sp=0xc0005875c0 pc=0x57d1d106fe37
internal/poll.runtime_pollWait(0x79a833dc6680, 0x72)
	runtime/netpoll.go:351 +0x85 fp=0xc000587618 sp=0xc0005875f8 pc=0x57d1d10ab4c5
internal/poll.(*pollDesc).wait(0xc000123f00?, 0x900000036?, 0x0)
	internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc000587640 sp=0xc000587618 pc=0x57d1d1133707
internal/poll.(*pollDesc).waitRead(...)
	internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Accept(0xc000123f00)
	internal/poll/fd_unix.go:620 +0x295 fp=0xc0005876e8 sp=0xc000587640 pc=0x57d1d1138ad5
net.(*netFD).accept(0xc000123f00)
	net/fd_unix.go:172 +0x29 fp=0xc0005877a0 sp=0xc0005876e8 pc=0x57d1d11a1bc9
net.(*TCPListener).accept(0xc000146b40)
	net/tcpsock_posix.go:159 +0x1e fp=0xc0005877f0 sp=0xc0005877a0 pc=0x57d1d11b783e
net.(*TCPListener).Accept(0xc000146b40)
	net/tcpsock.go:372 +0x30 fp=0xc000587820 sp=0xc0005877f0 pc=0x57d1d11b66f0
net/http.(*onceCloseListener).Accept(0xc0004ac000?)
	<autogenerated>:1 +0x24 fp=0xc000587838 sp=0xc000587820 pc=0x57d1d1400964
net/http.(*Server).Serve(0xc000154c30, {0x57d1d230b4f8, 0xc000146b40})
	net/http/server.go:3330 +0x30c fp=0xc000587968 sp=0xc000587838 pc=0x57d1d13d88ec
github.com/ollama/ollama/runner/llamarunner.Execute({0xc000036120, 0xe, 0xe})
	github.com/ollama/ollama/runner/llamarunner/runner.go:994 +0x1174 fp=0xc000587d08 sp=0xc000587968 pc=0x57d1d147d834
github.com/ollama/ollama/runner.Execute({0xc000036110?, 0x0?, 0x0?})
	github.com/ollama/ollama/runner/runner.go:22 +0xd4 fp=0xc000587d30 sp=0xc000587d08 pc=0x57d1d16adc54
github.com/ollama/ollama/cmd.NewCLI.func2(0xc000037400?, {0x57d1d1ea8050?, 0x4?, 0x57d1d1ea8054?})
	github.com/ollama/ollama/cmd/cmd.go:1280 +0x45 fp=0xc000587d58 sp=0xc000587d30 pc=0x57d1d1cba245
github.com/spf13/cobra.(*Command).execute(0xc000175b08, {0xc00013e700, 0xe, 0xe})
	github.com/spf13/cobra@v1.7.0/command.go:940 +0x862 fp=0xc000587e78 sp=0xc000587d58 pc=0x57d1d121a902
github.com/spf13/cobra.(*Command).ExecuteC(0xc00046fb08)
	github.com/spf13/cobra@v1.7.0/command.go:1068 +0x3a5 fp=0xc000587f30 sp=0xc000587e78 pc=0x57d1d121b145
github.com/spf13/cobra.(*Command).Execute(...)
	github.com/spf13/cobra@v1.7.0/command.go:992
github.com/spf13/cobra.(*Command).ExecuteContext(...)
	github.com/spf13/cobra@v1.7.0/command.go:985
main.main()
	github.com/ollama/ollama/main.go:12 +0x4d fp=0xc000587f50 sp=0xc000587f30 pc=0x57d1d1cba5cd
runtime.main()
	runtime/proc.go:272 +0x29d fp=0xc000587fe0 sp=0xc000587f50 pc=0x57d1d10774dd
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000587fe8 sp=0xc000587fe0 pc=0x57d1d10b45a1

goroutine 2 gp=0xc000006c40 m=nil [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:424 +0xce fp=0xc000074fa8 sp=0xc000074f88 pc=0x57d1d10ac1ce
runtime.goparkunlock(...)
	runtime/proc.go:430
runtime.forcegchelper()
	runtime/proc.go:337 +0xb8 fp=0xc000074fe0 sp=0xc000074fa8 pc=0x57d1d1077818
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000074fe8 sp=0xc000074fe0 pc=0x57d1d10b45a1
created by runtime.init.7 in goroutine 1
	runtime/proc.go:325 +0x1a

goroutine 3 gp=0xc000007180 m=nil [GC sweep wait]:
runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:424 +0xce fp=0xc000075780 sp=0xc000075760 pc=0x57d1d10ac1ce
runtime.goparkunlock(...)
	runtime/proc.go:430
runtime.bgsweep(0xc00003e080)
	runtime/mgcsweep.go:317 +0xdf fp=0xc0000757c8 sp=0xc000075780 pc=0x57d1d1061ebf
runtime.gcenable.gowrap1()
	runtime/mgc.go:204 +0x25 fp=0xc0000757e0 sp=0xc0000757c8 pc=0x57d1d1056505
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc0000757e8 sp=0xc0000757e0 pc=0x57d1d10b45a1
created by runtime.gcenable in goroutine 1
	runtime/mgc.go:204 +0x66

goroutine 4 gp=0xc000007340 m=nil [GC scavenge wait]:
runtime.gopark(0x10000?, 0x57d1d205b6f8?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:424 +0xce fp=0xc000075f78 sp=0xc000075f58 pc=0x57d1d10ac1ce
runtime.goparkunlock(...)
	runtime/proc.go:430
runtime.(*scavengerState).park(0x57d1d2b02080)
	runtime/mgcscavenge.go:425 +0x49 fp=0xc000075fa8 sp=0xc000075f78 pc=0x57d1d105f889
runtime.bgscavenge(0xc00003e080)
	runtime/mgcscavenge.go:658 +0x59 fp=0xc000075fc8 sp=0xc000075fa8 pc=0x57d1d105fe19
runtime.gcenable.gowrap2()
	runtime/mgc.go:205 +0x25 fp=0xc000075fe0 sp=0xc000075fc8 pc=0x57d1d10564a5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000075fe8 sp=0xc000075fe0 pc=0x57d1d10b45a1
created by runtime.gcenable in goroutine 1
	runtime/mgc.go:205 +0xa5

goroutine 5 gp=0xc000007c00 m=nil [finalizer wait]:
runtime.gopark(0xc000074648?, 0x57d1d104ca05?, 0xb0?, 0x1?, 0xc0000061c0?)
	runtime/proc.go:424 +0xce fp=0xc000074620 sp=0xc000074600 pc=0x57d1d10ac1ce
runtime.runfinq()
	runtime/mfinal.go:193 +0x107 fp=0xc0000747e0 sp=0xc000074620 pc=0x57d1d1055587
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc0000747e8 sp=0xc0000747e0 pc=0x57d1d10b45a1
created by runtime.createfing in goroutine 1
	runtime/mfinal.go:163 +0x3d

goroutine 6 gp=0xc0001cb500 m=nil [chan receive]:
runtime.gopark(0xc000076760?, 0x57d1d1189245?, 0x60?, 0x69?, 0x57d1d2320280?)
	runtime/proc.go:424 +0xce fp=0xc000076718 sp=0xc0000766f8 pc=0x57d1d10ac1ce
runtime.chanrecv(0xc0000ac310, 0x0, 0x1)
	runtime/chan.go:639 +0x41c fp=0xc000076790 sp=0xc000076718 pc=0x57d1d1045bfc
runtime.chanrecv1(0x0?, 0x0?)
	runtime/chan.go:489 +0x12 fp=0xc0000767b8 sp=0xc000076790 pc=0x57d1d10457b2
runtime.unique_runtime_registerUniqueMapCleanup.func1(...)
	runtime/mgc.go:1781
runtime.unique_runtime_registerUniqueMapCleanup.gowrap1()
	runtime/mgc.go:1784 +0x2f fp=0xc0000767e0 sp=0xc0000767b8 pc=0x57d1d105956f
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc0000767e8 sp=0xc0000767e0 pc=0x57d1d10b45a1
created by unique.runtime_registerUniqueMapCleanup in goroutine 1
	runtime/mgc.go:1779 +0x96

goroutine 7 gp=0xc0001cba40 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:424 +0xce fp=0xc000076f38 sp=0xc000076f18 pc=0x57d1d10ac1ce
runtime.gcBgMarkWorker(0xc0000ad730)
	runtime/mgc.go:1412 +0xe9 fp=0xc000076fc8 sp=0xc000076f38 pc=0x57d1d1058869
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1328 +0x25 fp=0xc000076fe0 sp=0xc000076fc8 pc=0x57d1d1058745
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000076fe8 sp=0xc000076fe0 pc=0x57d1d10b45a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1328 +0x105

goroutine 8 gp=0xc0001cbc00 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:424 +0xce fp=0xc000077738 sp=0xc000077718 pc=0x57d1d10ac1ce
runtime.gcBgMarkWorker(0xc0000ad730)
	runtime/mgc.go:1412 +0xe9 fp=0xc0000777c8 sp=0xc000077738 pc=0x57d1d1058869
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1328 +0x25 fp=0xc0000777e0 sp=0xc0000777c8 pc=0x57d1d1058745
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc0000777e8 sp=0xc0000777e0 pc=0x57d1d10b45a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1328 +0x105

goroutine 9 gp=0xc0001cbdc0 m=nil [GC worker (idle)]:
runtime.gopark(0x1d75291645414?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:424 +0xce fp=0xc000077f38 sp=0xc000077f18 pc=0x57d1d10ac1ce
runtime.gcBgMarkWorker(0xc0000ad730)
	runtime/mgc.go:1412 +0xe9 fp=0xc000077fc8 sp=0xc000077f38 pc=0x57d1d1058869
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1328 +0x25 fp=0xc000077fe0 sp=0xc000077fc8 pc=0x57d1d1058745
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000077fe8 sp=0xc000077fe0 pc=0x57d1d10b45a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1328 +0x105

goroutine 18 gp=0xc000104380 m=nil [GC worker (idle)]:
runtime.gopark(0x1d7529163edb3?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:424 +0xce fp=0xc000070738 sp=0xc000070718 pc=0x57d1d10ac1ce
runtime.gcBgMarkWorker(0xc0000ad730)
	runtime/mgc.go:1412 +0xe9 fp=0xc0000707c8 sp=0xc000070738 pc=0x57d1d1058869
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1328 +0x25 fp=0xc0000707e0 sp=0xc0000707c8 pc=0x57d1d1058745
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc0000707e8 sp=0xc0000707e0 pc=0x57d1d10b45a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1328 +0x105

goroutine 19 gp=0xc000104540 m=nil [GC worker (idle)]:
runtime.gopark(0x1d7529163ed06?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:424 +0xce fp=0xc000070f38 sp=0xc000070f18 pc=0x57d1d10ac1ce
runtime.gcBgMarkWorker(0xc0000ad730)
	runtime/mgc.go:1412 +0xe9 fp=0xc000070fc8 sp=0xc000070f38 pc=0x57d1d1058869
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1328 +0x25 fp=0xc000070fe0 sp=0xc000070fc8 pc=0x57d1d1058745
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000070fe8 sp=0xc000070fe0 pc=0x57d1d10b45a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1328 +0x105

goroutine 20 gp=0xc000104700 m=nil [GC worker (idle)]:
runtime.gopark(0x1d7529163d411?, 0x3?, 0xb?, 0x17?, 0x0?)
	runtime/proc.go:424 +0xce fp=0xc000071738 sp=0xc000071718 pc=0x57d1d10ac1ce
runtime.gcBgMarkWorker(0xc0000ad730)
	runtime/mgc.go:1412 +0xe9 fp=0xc0000717c8 sp=0xc000071738 pc=0x57d1d1058869
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1328 +0x25 fp=0xc0000717e0 sp=0xc0000717c8 pc=0x57d1d1058745
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc0000717e8 sp=0xc0000717e0 pc=0x57d1d10b45a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1328 +0x105

goroutine 21 gp=0xc0001048c0 m=nil [GC worker (idle)]:
runtime.gopark(0x1d7529163e7ab?, 0x3?, 0x85?, 0x1a?, 0x0?)
	runtime/proc.go:424 +0xce fp=0xc000071f38 sp=0xc000071f18 pc=0x57d1d10ac1ce
runtime.gcBgMarkWorker(0xc0000ad730)
	runtime/mgc.go:1412 +0xe9 fp=0xc000071fc8 sp=0xc000071f38 pc=0x57d1d1058869
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1328 +0x25 fp=0xc000071fe0 sp=0xc000071fc8 pc=0x57d1d1058745
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000071fe8 sp=0xc000071fe0 pc=0x57d1d10b45a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1328 +0x105

goroutine 22 gp=0xc000104a80 m=nil [GC worker (idle)]:
runtime.gopark(0x1d7529161f9a1?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:424 +0xce fp=0xc000072738 sp=0xc000072718 pc=0x57d1d10ac1ce
runtime.gcBgMarkWorker(0xc0000ad730)
	runtime/mgc.go:1412 +0xe9 fp=0xc0000727c8 sp=0xc000072738 pc=0x57d1d1058869
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1328 +0x25 fp=0xc0000727e0 sp=0xc0000727c8 pc=0x57d1d1058745
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc0000727e8 sp=0xc0000727e0 pc=0x57d1d10b45a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1328 +0x105

goroutine 10 gp=0xc000504700 m=nil [chan receive]:
runtime.gopark(0x57d1d10b25b4?, 0xc00021f898?, 0xd0?, 0xa2?, 0xc00021f880?)
	runtime/proc.go:424 +0xce fp=0xc00021f860 sp=0xc00021f840 pc=0x57d1d10ac1ce
runtime.chanrecv(0xc0002f5ab0, 0xc00021fa10, 0x1)
	runtime/chan.go:639 +0x41c fp=0xc00021f8d8 sp=0xc00021f860 pc=0x57d1d1045bfc
runtime.chanrecv1(0xc00069cd80?, 0xc000384808?)
	runtime/chan.go:489 +0x12 fp=0xc00021f900 sp=0xc00021f8d8 pc=0x57d1d10457b2
github.com/ollama/ollama/runner/llamarunner.(*Server).embeddings(0xc0001ad5f0, {0x57d1d230b708, 0xc00013eb60}, 0xc0004823c0)
	github.com/ollama/ollama/runner/llamarunner/runner.go:783 +0x746 fp=0xc00021fac0 sp=0xc00021f900 pc=0x57d1d147bc06
github.com/ollama/ollama/runner/llamarunner.(*Server).embeddings-fm({0x57d1d230b708?, 0xc00013eb60?}, 0x57d1d13e26c7?)
	<autogenerated>:1 +0x36 fp=0xc00021faf0 sp=0xc00021fac0 pc=0x57d1d147dff6
net/http.HandlerFunc.ServeHTTP(0xc00013e8c0?, {0x57d1d230b708?, 0xc00013eb60?}, 0x0?)
	net/http/server.go:2220 +0x29 fp=0xc00021fb18 sp=0xc00021faf0 pc=0x57d1d13d4ee9
net/http.(*ServeMux).ServeHTTP(0x57d1d104ca05?, {0x57d1d230b708, 0xc00013eb60}, 0xc0004823c0)
	net/http/server.go:2747 +0x1ca fp=0xc00021fb68 sp=0xc00021fb18 pc=0x57d1d13d6dea
net/http.serverHandler.ServeHTTP({0x57d1d23080d0?}, {0x57d1d230b708?, 0xc00013eb60?}, 0x6?)
	net/http/server.go:3210 +0x8e fp=0xc00021fb98 sp=0xc00021fb68 pc=0x57d1d13f434e
net/http.(*conn).serve(0xc0004ac000, {0x57d1d230d8e8, 0xc00069cb10})
	net/http/server.go:2092 +0x5d0 fp=0xc00021ffb8 sp=0xc00021fb98 pc=0x57d1d13d3890
net/http.(*Server).Serve.gowrap3()
	net/http/server.go:3360 +0x28 fp=0xc00021ffe0 sp=0xc00021ffb8 pc=0x57d1d13d8ce8
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc00021ffe8 sp=0xc00021ffe0 pc=0x57d1d10b45a1
created by net/http.(*Server).Serve in goroutine 1
	net/http/server.go:3360 +0x485

goroutine 28 gp=0xc000105500 m=nil [IO wait]:
runtime.gopark(0x57d1d1050ee5?, 0xc00001de00?, 0xa5?, 0x10?, 0xb?)
	runtime/proc.go:424 +0xce fp=0xc00001dda8 sp=0xc00001dd88 pc=0x57d1d10ac1ce
runtime.netpollblock(0x57d1d10cf6b8?, 0xd1042fe6?, 0xd1?)
	runtime/netpoll.go:575 +0xf7 fp=0xc00001dde0 sp=0xc00001dda8 pc=0x57d1d106fe37
internal/poll.runtime_pollWait(0x79a833dc6568, 0x72)
	runtime/netpoll.go:351 +0x85 fp=0xc00001de00 sp=0xc00001dde0 pc=0x57d1d10ab4c5
internal/poll.(*pollDesc).wait(0xc0004aa000?, 0xc0000b6641?, 0x0)
	internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc00001de28 sp=0xc00001de00 pc=0x57d1d1133707
internal/poll.(*pollDesc).waitRead(...)
	internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc0004aa000, {0xc0000b6641, 0x1, 0x1})
	internal/poll/fd_unix.go:165 +0x27a fp=0xc00001dec0 sp=0xc00001de28 pc=0x57d1d11349fa
net.(*netFD).Read(0xc0004aa000, {0xc0000b6641?, 0xc00001df48?, 0x57d1d10ade50?})
	net/fd_posix.go:55 +0x25 fp=0xc00001df08 sp=0xc00001dec0 pc=0x57d1d119fc05
net.(*conn).Read(0xc000126030, {0xc0000b6641?, 0x0?, 0x57d1d2bae480?})
	net/net.go:189 +0x45 fp=0xc00001df50 sp=0xc00001df08 pc=0x57d1d11ae205
net.(*TCPConn).Read(0x57d1d2a5ff60?, {0xc0000b6641?, 0x0?, 0x0?})
	<autogenerated>:1 +0x25 fp=0xc00001df80 sp=0xc00001df50 pc=0x57d1d11c1405
net/http.(*connReader).backgroundRead(0xc0000b6630)
	net/http/server.go:690 +0x37 fp=0xc00001dfc8 sp=0xc00001df80 pc=0x57d1d13ce217
net/http.(*connReader).startBackgroundRead.gowrap2()
	net/http/server.go:686 +0x25 fp=0xc00001dfe0 sp=0xc00001dfc8 pc=0x57d1d13ce145
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc00001dfe8 sp=0xc00001dfe0 pc=0x57d1d10b45a1
created by net/http.(*connReader).startBackgroundRead in goroutine 10
	net/http/server.go:686 +0xb6

rax    0x204a03fe0
rbx    0x79a8241707b0
rcx    0xff8
rdx    0x79a824008980
rdi    0x79a824008990
rsi    0x0
rbp    0x79a82bdff2b0
rsp    0x79a82bdff290
r8     0x0
r9     0x79a833c2ea28
r10    0x0
r11    0x246
r12    0x79a741384540
r13    0x79a824008990
r14    0x0
r15    0x57d1e9240f70
rip    0x79a828624c47
rflags 0x10297
cs     0x33
fs     0x0
gs     0x0

SIGABRT: abort
PC=0x79a87a90500b m=3 sigcode=18446744073709551610
signal arrived during cgo execution

goroutine 24 gp=0xc000504380 m=3 mp=0xc00007ae08 [syscall]:
runtime.cgocall(0x57d1d1cbace0, 0xc00008dba0)
	runtime/cgocall.go:167 +0x4b fp=0xc00008db78 sp=0xc00008db40 pc=0x57d1d10a5acb
github.com/ollama/ollama/llama._Cfunc_llama_decode(0x79a7413845e0, {0x1, 0x79a741391980, 0x0, 0x0, 0x79a741392580, 0x79a741393180, 0x79a74149f980, 0x79a741356bc0})
	_cgo_gotypes.go:545 +0x4f fp=0xc00008dba0 sp=0xc00008db78 pc=0x57d1d145b56f
github.com/ollama/ollama/llama.(*Context).Decode.func1(0x57d1d147a48b?, 0x79a7413845e0?)
	github.com/ollama/ollama/llama/llama.go:163 +0xf5 fp=0xc00008dc90 sp=0xc00008dba0 pc=0x57d1d145e295
github.com/ollama/ollama/llama.(*Context).Decode(0x57d1d2bae480?, 0x0?)
	github.com/ollama/ollama/llama/llama.go:163 +0x13 fp=0xc00008dcd8 sp=0xc00008dc90 pc=0x57d1d145e113
github.com/ollama/ollama/runner/llamarunner.(*Server).processBatch(0xc0001ad5f0, 0xc000112960, 0xc00008df20)
	github.com/ollama/ollama/runner/llamarunner/runner.go:435 +0x23f fp=0xc00008dee0 sp=0xc00008dcd8 pc=0x57d1d147927f
github.com/ollama/ollama/runner/llamarunner.(*Server).run(0xc0001ad5f0, {0x57d1d230d920, 0xc000142410})
	github.com/ollama/ollama/runner/llamarunner/runner.go:343 +0x1d5 fp=0xc00008dfb8 sp=0xc00008dee0 pc=0x57d1d1478cb5
github.com/ollama/ollama/runner/llamarunner.Execute.gowrap2()
	github.com/ollama/ollama/runner/llamarunner/runner.go:973 +0x28 fp=0xc00008dfe0 sp=0xc00008dfb8 pc=0x57d1d147db48
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc00008dfe8 sp=0xc00008dfe0 pc=0x57d1d10b45a1
created by github.com/ollama/ollama/runner/llamarunner.Execute in goroutine 1
	github.com/ollama/ollama/runner/llamarunner/runner.go:973 +0xdb5

goroutine 1 gp=0xc0000061c0 m=nil [IO wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:424 +0xce fp=0xc0005875c0 sp=0xc0005875a0 pc=0x57d1d10ac1ce
runtime.netpollblock(0xc0004adf80?, 0xd1042fe6?, 0xd1?)
	runtime/netpoll.go:575 +0xf7 fp=0xc0005875f8 sp=0xc0005875c0 pc=0x57d1d106fe37
internal/poll.runtime_pollWait(0x79a833dc6680, 0x72)
	runtime/netpoll.go:351 +0x85 fp=0xc000587618 sp=0xc0005875f8 pc=0x57d1d10ab4c5
internal/poll.(*pollDesc).wait(0xc000123f00?, 0x900000036?, 0x0)
	internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc000587640 sp=0xc000587618 pc=0x57d1d1133707
internal/poll.(*pollDesc).waitRead(...)
	internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Accept(0xc000123f00)
	internal/poll/fd_unix.go:620 +0x295 fp=0xc0005876e8 sp=0xc000587640 pc=0x57d1d1138ad5
net.(*netFD).accept(0xc000123f00)
	net/fd_unix.go:172 +0x29 fp=0xc0005877a0 sp=0xc0005876e8 pc=0x57d1d11a1bc9
net.(*TCPListener).accept(0xc000146b40)
	net/tcpsock_posix.go:159 +0x1e fp=0xc0005877f0 sp=0xc0005877a0 pc=0x57d1d11b783e
net.(*TCPListener).Accept(0xc000146b40)
	net/tcpsock.go:372 +0x30 fp=0xc000587820 sp=0xc0005877f0 pc=0x57d1d11b66f0
net/http.(*onceCloseListener).Accept(0xc0004ac000?)
	<autogenerated>:1 +0x24 fp=0xc000587838 sp=0xc000587820 pc=0x57d1d1400964
net/http.(*Server).Serve(0xc000154c30, {0x57d1d230b4f8, 0xc000146b40})
	net/http/server.go:3330 +0x30c fp=0xc000587968 sp=0xc000587838 pc=0x57d1d13d88ec
github.com/ollama/ollama/runner/llamarunner.Execute({0xc000036120, 0xe, 0xe})
	github.com/ollama/ollama/runner/llamarunner/runner.go:994 +0x1174 fp=0xc000587d08 sp=0xc000587968 pc=0x57d1d147d834
github.com/ollama/ollama/runner.Execute({0xc000036110?, 0x0?, 0x0?})
	github.com/ollama/ollama/runner/runner.go:22 +0xd4 fp=0xc000587d30 sp=0xc000587d08 pc=0x57d1d16adc54
github.com/ollama/ollama/cmd.NewCLI.func2(0xc000037400?, {0x57d1d1ea8050?, 0x4?, 0x57d1d1ea8054?})
	github.com/ollama/ollama/cmd/cmd.go:1280 +0x45 fp=0xc000587d58 sp=0xc000587d30 pc=0x57d1d1cba245
github.com/spf13/cobra.(*Command).execute(0xc000175b08, {0xc00013e700, 0xe, 0xe})
	github.com/spf13/cobra@v1.7.0/command.go:940 +0x862 fp=0xc000587e78 sp=0xc000587d58 pc=0x57d1d121a902
github.com/spf13/cobra.(*Command).ExecuteC(0xc00046fb08)
	github.com/spf13/cobra@v1.7.0/command.go:1068 +0x3a5 fp=0xc000587f30 sp=0xc000587e78 pc=0x57d1d121b145
github.com/spf13/cobra.(*Command).Execute(...)
	github.com/spf13/cobra@v1.7.0/command.go:992
github.com/spf13/cobra.(*Command).ExecuteContext(...)
	github.com/spf13/cobra@v1.7.0/command.go:985
main.main()
	github.com/ollama/ollama/main.go:12 +0x4d fp=0xc000587f50 sp=0xc000587f30 pc=0x57d1d1cba5cd
runtime.main()
	runtime/proc.go:272 +0x29d fp=0xc000587fe0 sp=0xc000587f50 pc=0x57d1d10774dd
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000587fe8 sp=0xc000587fe0 pc=0x57d1d10b45a1

goroutine 2 gp=0xc000006c40 m=nil [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:424 +0xce fp=0xc000074fa8 sp=0xc000074f88 pc=0x57d1d10ac1ce
runtime.goparkunlock(...)
	runtime/proc.go:430
runtime.forcegchelper()
	runtime/proc.go:337 +0xb8 fp=0xc000074fe0 sp=0xc000074fa8 pc=0x57d1d1077818
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000074fe8 sp=0xc000074fe0 pc=0x57d1d10b45a1
created by runtime.init.7 in goroutine 1
	runtime/proc.go:325 +0x1a

goroutine 3 gp=0xc000007180 m=nil [GC sweep wait]:
runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:424 +0xce fp=0xc000075780 sp=0xc000075760 pc=0x57d1d10ac1ce
runtime.goparkunlock(...)
	runtime/proc.go:430
runtime.bgsweep(0xc00003e080)
	runtime/mgcsweep.go:317 +0xdf fp=0xc0000757c8 sp=0xc000075780 pc=0x57d1d1061ebf
runtime.gcenable.gowrap1()
	runtime/mgc.go:204 +0x25 fp=0xc0000757e0 sp=0xc0000757c8 pc=0x57d1d1056505
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc0000757e8 sp=0xc0000757e0 pc=0x57d1d10b45a1
created by runtime.gcenable in goroutine 1
	runtime/mgc.go:204 +0x66

goroutine 4 gp=0xc000007340 m=nil [GC scavenge wait]:
runtime.gopark(0x10000?, 0x57d1d205b6f8?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:424 +0xce fp=0xc000075f78 sp=0xc000075f58 pc=0x57d1d10ac1ce
runtime.goparkunlock(...)
	runtime/proc.go:430
runtime.(*scavengerState).park(0x57d1d2b02080)
	runtime/mgcscavenge.go:425 +0x49 fp=0xc000075fa8 sp=0xc000075f78 pc=0x57d1d105f889
runtime.bgscavenge(0xc00003e080)
	runtime/mgcscavenge.go:658 +0x59 fp=0xc000075fc8 sp=0xc000075fa8 pc=0x57d1d105fe19
runtime.gcenable.gowrap2()
	runtime/mgc.go:205 +0x25 fp=0xc000075fe0 sp=0xc000075fc8 pc=0x57d1d10564a5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000075fe8 sp=0xc000075fe0 pc=0x57d1d10b45a1
created by runtime.gcenable in goroutine 1
	runtime/mgc.go:205 +0xa5

goroutine 5 gp=0xc000007c00 m=nil [finalizer wait]:
runtime.gopark(0xc000074648?, 0x57d1d104ca05?, 0xb0?, 0x1?, 0xc0000061c0?)
	runtime/proc.go:424 +0xce fp=0xc000074620 sp=0xc000074600 pc=0x57d1d10ac1ce
runtime.runfinq()
	runtime/mfinal.go:193 +0x107 fp=0xc0000747e0 sp=0xc000074620 pc=0x57d1d1055587
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc0000747e8 sp=0xc0000747e0 pc=0x57d1d10b45a1
created by runtime.createfing in goroutine 1
	runtime/mfinal.go:163 +0x3d

goroutine 6 gp=0xc0001cb500 m=nil [chan receive]:
runtime.gopark(0xc000076760?, 0x57d1d1189245?, 0x60?, 0x69?, 0x57d1d2320280?)
	runtime/proc.go:424 +0xce fp=0xc000076718 sp=0xc0000766f8 pc=0x57d1d10ac1ce
runtime.chanrecv(0xc0000ac310, 0x0, 0x1)
	runtime/chan.go:639 +0x41c fp=0xc000076790 sp=0xc000076718 pc=0x57d1d1045bfc
runtime.chanrecv1(0x0?, 0x0?)
	runtime/chan.go:489 +0x12 fp=0xc0000767b8 sp=0xc000076790 pc=0x57d1d10457b2
runtime.unique_runtime_registerUniqueMapCleanup.func1(...)
	runtime/mgc.go:1781
runtime.unique_runtime_registerUniqueMapCleanup.gowrap1()
	runtime/mgc.go:1784 +0x2f fp=0xc0000767e0 sp=0xc0000767b8 pc=0x57d1d105956f
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc0000767e8 sp=0xc0000767e0 pc=0x57d1d10b45a1
created by unique.runtime_registerUniqueMapCleanup in goroutine 1
	runtime/mgc.go:1779 +0x96

goroutine 7 gp=0xc0001cba40 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:424 +0xce fp=0xc000076f38 sp=0xc000076f18 pc=0x57d1d10ac1ce
runtime.gcBgMarkWorker(0xc0000ad730)
	runtime/mgc.go:1412 +0xe9 fp=0xc000076fc8 sp=0xc000076f38 pc=0x57d1d1058869
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1328 +0x25 fp=0xc000076fe0 sp=0xc000076fc8 pc=0x57d1d1058745
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000076fe8 sp=0xc000076fe0 pc=0x57d1d10b45a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1328 +0x105

goroutine 8 gp=0xc0001cbc00 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:424 +0xce fp=0xc000077738 sp=0xc000077718 pc=0x57d1d10ac1ce
runtime.gcBgMarkWorker(0xc0000ad730)
	runtime/mgc.go:1412 +0xe9 fp=0xc0000777c8 sp=0xc000077738 pc=0x57d1d1058869
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1328 +0x25 fp=0xc0000777e0 sp=0xc0000777c8 pc=0x57d1d1058745
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc0000777e8 sp=0xc0000777e0 pc=0x57d1d10b45a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1328 +0x105

goroutine 9 gp=0xc0001cbdc0 m=nil [GC worker (idle)]:
runtime.gopark(0x1d75291645414?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:424 +0xce fp=0xc000077f38 sp=0xc000077f18 pc=0x57d1d10ac1ce
runtime.gcBgMarkWorker(0xc0000ad730)
	runtime/mgc.go:1412 +0xe9 fp=0xc000077fc8 sp=0xc000077f38 pc=0x57d1d1058869
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1328 +0x25 fp=0xc000077fe0 sp=0xc000077fc8 pc=0x57d1d1058745
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000077fe8 sp=0xc000077fe0 pc=0x57d1d10b45a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1328 +0x105

goroutine 18 gp=0xc000104380 m=nil [GC worker (idle)]:
runtime.gopark(0x1d7529163edb3?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:424 +0xce fp=0xc000070738 sp=0xc000070718 pc=0x57d1d10ac1ce
runtime.gcBgMarkWorker(0xc0000ad730)
	runtime/mgc.go:1412 +0xe9 fp=0xc0000707c8 sp=0xc000070738 pc=0x57d1d1058869
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1328 +0x25 fp=0xc0000707e0 sp=0xc0000707c8 pc=0x57d1d1058745
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc0000707e8 sp=0xc0000707e0 pc=0x57d1d10b45a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1328 +0x105

goroutine 19 gp=0xc000104540 m=nil [GC worker (idle)]:
runtime.gopark(0x1d7529163ed06?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:424 +0xce fp=0xc000070f38 sp=0xc000070f18 pc=0x57d1d10ac1ce
runtime.gcBgMarkWorker(0xc0000ad730)
	runtime/mgc.go:1412 +0xe9 fp=0xc000070fc8 sp=0xc000070f38 pc=0x57d1d1058869
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1328 +0x25 fp=0xc000070fe0 sp=0xc000070fc8 pc=0x57d1d1058745
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000070fe8 sp=0xc000070fe0 pc=0x57d1d10b45a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1328 +0x105

goroutine 20 gp=0xc000104700 m=nil [GC worker (idle)]:
runtime.gopark(0x1d7529163d411?, 0x3?, 0xb?, 0x17?, 0x0?)
	runtime/proc.go:424 +0xce fp=0xc000071738 sp=0xc000071718 pc=0x57d1d10ac1ce
runtime.gcBgMarkWorker(0xc0000ad730)
	runtime/mgc.go:1412 +0xe9 fp=0xc0000717c8 sp=0xc000071738 pc=0x57d1d1058869
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1328 +0x25 fp=0xc0000717e0 sp=0xc0000717c8 pc=0x57d1d1058745
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc0000717e8 sp=0xc0000717e0 pc=0x57d1d10b45a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1328 +0x105

goroutine 21 gp=0xc0001048c0 m=nil [GC worker (idle)]:
runtime.gopark(0x1d7529163e7ab?, 0x3?, 0x85?, 0x1a?, 0x0?)
	runtime/proc.go:424 +0xce fp=0xc000071f38 sp=0xc000071f18 pc=0x57d1d10ac1ce
runtime.gcBgMarkWorker(0xc0000ad730)
	runtime/mgc.go:1412 +0xe9 fp=0xc000071fc8 sp=0xc000071f38 pc=0x57d1d1058869
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1328 +0x25 fp=0xc000071fe0 sp=0xc000071fc8 pc=0x57d1d1058745
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000071fe8 sp=0xc000071fe0 pc=0x57d1d10b45a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1328 +0x105

goroutine 22 gp=0xc000104a80 m=nil [GC worker (idle)]:
runtime.gopark(0x1d7529161f9a1?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:424 +0xce fp=0xc000072738 sp=0xc000072718 pc=0x57d1d10ac1ce
runtime.gcBgMarkWorker(0xc0000ad730)
	runtime/mgc.go:1412 +0xe9 fp=0xc0000727c8 sp=0xc000072738 pc=0x57d1d1058869
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1328 +0x25 fp=0xc0000727e0 sp=0xc0000727c8 pc=0x57d1d1058745
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc0000727e8 sp=0xc0000727e0 pc=0x57d1d10b45a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1328 +0x105

goroutine 10 gp=0xc000504700 m=nil [chan receive]:
runtime.gopark(0x57d1d10b25b4?, 0xc00021f898?, 0xd0?, 0xa2?, 0xc00021f880?)
	runtime/proc.go:424 +0xce fp=0xc00021f860 sp=0xc00021f840 pc=0x57d1d10ac1ce
runtime.chanrecv(0xc0002f5ab0, 0xc00021fa10, 0x1)
	runtime/chan.go:639 +0x41c fp=0xc00021f8d8 sp=0xc00021f860 pc=0x57d1d1045bfc
runtime.chanrecv1(0xc00069cd80?, 0xc000384808?)
	runtime/chan.go:489 +0x12 fp=0xc00021f900 sp=0xc00021f8d8 pc=0x57d1d10457b2
github.com/ollama/ollama/runner/llamarunner.(*Server).embeddings(0xc0001ad5f0, {0x57d1d230b708, 0xc00013eb60}, 0xc0004823c0)
	github.com/ollama/ollama/runner/llamarunner/runner.go:783 +0x746 fp=0xc00021fac0 sp=0xc00021f900 pc=0x57d1d147bc06
github.com/ollama/ollama/runner/llamarunner.(*Server).embeddings-fm({0x57d1d230b708?, 0xc00013eb60?}, 0x57d1d13e26c7?)
<autogenerated>:1 +0x36 fp=0xc00021faf0 sp=0xc00021fac0 pc=0x57d1d147dff6 net/http.HandlerFunc.ServeHTTP(0xc00013e8c0?, {0x57d1d230b708?, 0xc00013eb60?}, 0x0?) net/http/server.go:2220 +0x29 fp=0xc00021fb18 sp=0xc00021faf0 pc=0x57d1d13d4ee9 net/http.(*ServeMux).ServeHTTP(0x57d1d104ca05?, {0x57d1d230b708, 0xc00013eb60}, 0xc0004823c0) net/http/server.go:2747 +0x1ca fp=0xc00021fb68 sp=0xc00021fb18 pc=0x57d1d13d6dea net/http.serverHandler.ServeHTTP({0x57d1d23080d0?}, {0x57d1d230b708?, 0xc00013eb60?}, 0x6?) net/http/server.go:3210 +0x8e fp=0xc00021fb98 sp=0xc00021fb68 pc=0x57d1d13f434e net/http.(*conn).serve(0xc0004ac000, {0x57d1d230d8e8, 0xc00069cb10}) net/http/server.go:2092 +0x5d0 fp=0xc00021ffb8 sp=0xc00021fb98 pc=0x57d1d13d3890 net/http.(*Server).Serve.gowrap3() net/http/server.go:3360 +0x28 fp=0xc00021ffe0 sp=0xc00021ffb8 pc=0x57d1d13d8ce8 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc00021ffe8 sp=0xc00021ffe0 pc=0x57d1d10b45a1 created by net/http.(*Server).Serve in goroutine 1 net/http/server.go:3360 +0x485 goroutine 28 gp=0xc000105500 m=nil [IO wait]: runtime.gopark(0x57d1d1050ee5?, 0xc00001de00?, 0xa5?, 0x10?, 0xb?) runtime/proc.go:424 +0xce fp=0xc00001dda8 sp=0xc00001dd88 pc=0x57d1d10ac1ce runtime.netpollblock(0x57d1d10cf6b8?, 0xd1042fe6?, 0xd1?) runtime/netpoll.go:575 +0xf7 fp=0xc00001dde0 sp=0xc00001dda8 pc=0x57d1d106fe37 internal/poll.runtime_pollWait(0x79a833dc6568, 0x72) runtime/netpoll.go:351 +0x85 fp=0xc00001de00 sp=0xc00001dde0 pc=0x57d1d10ab4c5 internal/poll.(*pollDesc).wait(0xc0004aa000?, 0xc0000b6641?, 0x0) internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc00001de28 sp=0xc00001de00 pc=0x57d1d1133707 internal/poll.(*pollDesc).waitRead(...) 
internal/poll/fd_poll_runtime.go:89 internal/poll.(*FD).Read(0xc0004aa000, {0xc0000b6641, 0x1, 0x1}) internal/poll/fd_unix.go:165 +0x27a fp=0xc00001dec0 sp=0xc00001de28 pc=0x57d1d11349fa net.(*netFD).Read(0xc0004aa000, {0xc0000b6641?, 0xc00001df48?, 0x57d1d10ade50?}) net/fd_posix.go:55 +0x25 fp=0xc00001df08 sp=0xc00001dec0 pc=0x57d1d119fc05 net.(*conn).Read(0xc000126030, {0xc0000b6641?, 0x0?, 0x57d1d2bae480?}) net/net.go:189 +0x45 fp=0xc00001df50 sp=0xc00001df08 pc=0x57d1d11ae205 net.(*TCPConn).Read(0x57d1d2a5ff60?, {0xc0000b6641?, 0x0?, 0x0?}) <autogenerated>:1 +0x25 fp=0xc00001df80 sp=0xc00001df50 pc=0x57d1d11c1405 net/http.(*connReader).backgroundRead(0xc0000b6630) net/http/server.go:690 +0x37 fp=0xc00001dfc8 sp=0xc00001df80 pc=0x57d1d13ce217 net/http.(*connReader).startBackgroundRead.gowrap2() net/http/server.go:686 +0x25 fp=0xc00001dfe0 sp=0xc00001dfc8 pc=0x57d1d13ce145 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc00001dfe8 sp=0xc00001dfe0 pc=0x57d1d10b45a1 created by net/http.(*connReader).startBackgroundRead in goroutine 10 net/http/server.go:686 +0xb6 rax 0x0 rbx 0x79a82be00700 rcx 0x79a87a90500b rdx 0x0 rdi 0x2 rsi 0x79a82bdff2b0 rbp 0x79a8329a801d rsp 0x79a82bdff2b0 r8 0x0 r9 0x79a82bdff2b0 r10 0x8 r11 0x246 r12 0x79a8329a8790 r13 0x2108 r14 0x1 r15 0x79a74132e040 rip 0x79a87a90500b rflags 0x246 cs 0x33 fs 0x0 gs 0x0 [GIN] 2025/03/04 - 15:02:11 | 500 | 6.507803808s | 192.168.1.75 | POST "/api/embed" time=2025-03-04T15:02:11.591Z level=ERROR source=routes.go:478 msg="embedding generation failed" error="do embedding request: Post \"http://127.0.0.1:40671/embedding\": EOF" time=2025-03-04T15:02:11.600Z level=ERROR source=server.go:421 msg="llama runner terminated" error="exit status 2"
```

### OS

Linux

### GPU

NVIDIA RTX 3090 (24 GB VRAM)

### CPU

Intel Core i7-12700KF

### Ollama version

0.5.12
GiteaMirror added the bug label 2026-04-12 17:33:25 -05:00

@rick-github commented on GitHub (Mar 4, 2025):

```
llama_new_context_with_model: n_ctx_pre_seq (2048) > n_ctx_train (512) -- possible training context overflow
```

https://github.com/ollama/ollama/issues/7288#issuecomment-2591709109


@ProDG commented on GitHub (Mar 5, 2025):

@rick-github thank you, this is helpful.

In general, it feels like a design flaw, and I can't even say whether it is in Ollama, LangChain, Unstructured, or all of them.

In RAG pipelines, a document is split into chunks (whose size limit is set in _characters_). Then, when the chunks are fed into Ollama to generate embeddings, there is a context size limit in _tokens_. As I understand it, there is no way to ensure that a particular 1000 characters do not exceed 510 tokens other than trying to generate the embedding and seeing whether it fails.

For now, I'll probably just add code that cleans up `" . "` sequences from the initial text; that will help in this particular case and similar ones. I'll also wrap the embedding generation call in `try .. except` so that one problematic chunk doesn't fail the whole pipeline.

But both 'fixes' are ugly. Any suggestions are appreciated.
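For what it's worth, both workarounds can be sketched in a few lines. This is only an illustration (`clean_chunk` and `safe_embed` are hypothetical helpers, and the regex is one arbitrary way to collapse dot leaders), not a real fix for the underlying crash:

```python
import re

def clean_chunk(text: str) -> str:
    # Collapse runs of space-separated dots (dot leaders from a
    # table of contents) into a single ". ".
    return re.sub(r"(?:\s*\.){2,}\s*", ". ", text)

def safe_embed(embed_fn, chunk):
    # Wrap the embedding call so one problematic chunk doesn't
    # fail the whole pipeline; returns None on failure.
    try:
        return embed_fn(chunk)
    except Exception:
        return None
```

Normal prose passes through `clean_chunk` unchanged; only multi-dot runs are collapsed.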


@rick-github commented on GitHub (Mar 5, 2025):

If you set `num_ctx` to 512, it won't fail. The issue is that the pipeline's chunk size is too big. Ignoring the Ollama failure: if you send a chunk for embedding that's too big for the context window of the model, the embedding you get back is going to be missing information. If that's acceptable for the RAG application, then just set `num_ctx`. If not, then either the chunk size needs to be set to a static size that doesn't exceed the context buffer, or it needs to be determined dynamically during embedding. You're right that the tokenization is opaque, see #3582. Until then, the only option is to determine a character:token ratio and use that to set the chunk size.
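As a sketch (assuming the `/api/embed` request shape, where model options such as `num_ctx` go in an `options` field; LangChain's `OllamaEmbeddings` also accepts a `num_ctx` argument), the request body would look like:

```python
# Sketch: cap the runner's context at the model's n_ctx_train (512)
# via the "options" field of an Ollama /api/embed request.
payload = {
    "model": "jeffh/intfloat-multilingual-e5-large-instruct:f32",
    "input": "Hello . . . . . . . . . . " * 50,
    "options": {"num_ctx": 512},
}
# e.g. requests.post("http://localhost:11434/api/embed", json=payload)
```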

```console
$ curl -s localhost:11434/api/embed -d '{"model":"jeffh/intfloat-multilingual-e5-large-instruct:f32","input":"'"$(python -c 'print("Hello. How are you? Fine. " * 20, end="")')"'"}' | jq .prompt_eval_count
160
$ curl -s localhost:11434/api/embed -d '{"model":"jeffh/intfloat-multilingual-e5-large-instruct:f32","input":"'"$(python -c 'print("Hello . . . . . . . . . . " * 20, end="")')"'"}' | jq .prompt_eval_count
420
$ python -c 'print("Hello . . . . . . . . . . " * 20, end="")' | wc -c
520
```

So the worst-case ratio is about 520:420, giving a chunk size of 512 * (520 / 420) ≈ 633.

Reference: github-starred/ollama#6188