[GH-ISSUE #9499] Embedding model fails with SIGSEGV error #6188

Closed
opened 2026-04-12 17:33:25 -05:00 by GiteaMirror · 3 comments
Owner

Originally created by @ProDG on GitHub (Mar 4, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/9499

What is the issue?

I ran into an issue when trying to build embeddings from a text containing a lot of meaningless punctuation (used for the visual formatting of a table of contents, I believe).

I'll provide code based on LangChain, but the problem is definitely not in LangChain itself, as you can see from the attached logs.

Code to reproduce the issue:

from langchain_ollama import OllamaEmbeddings

embeddings_model = OllamaEmbeddings(
    base_url=settings.OLLAMA_URL,  # URL of the Ollama server
    model='jeffh/intfloat-multilingual-e5-large-instruct:f32',
)

print(embeddings_model.embed_query("Hello. How are you? Fine. " * 50))  # this works without any issues
print(embeddings_model.embed_query("Hello . . . . . . . . . . " * 50))  # this fails
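To take LangChain out of the picture entirely, the same crash should be reproducible by calling Ollama's HTTP API directly. A minimal sketch (assuming the documented `POST /api/embed` endpoint and the default address `http://localhost:11434`; adjust for your setup):

```python
# Direct-API repro sketch, bypassing LangChain entirely.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumption: default Ollama address
MODEL = "jeffh/intfloat-multilingual-e5-large-instruct:f32"

def build_embed_payload(text: str, model: str = MODEL) -> dict:
    """Request body for Ollama's embed endpoint."""
    return {"model": model, "input": text}

def embed(text: str) -> list:
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/embed",
        data=json.dumps(build_embed_payload(text)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["embeddings"]

if __name__ == "__main__":
    embed("Hello. How are you? Fine. " * 50)  # completes normally
    embed("Hello . . . . . . . . . . " * 50)  # runner dies with SIGSEGV
```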

I also tried other quantizations of the model:

  • jeffh/intfloat-multilingual-e5-large-instruct:f16
  • jeffh/intfloat-multilingual-e5-large-instruct:q8_0

And a different model:

  • zylonai/multilingual-e5-large:latest

Results were pretty much the same.

On the same setup, tens of megabytes of other texts were processed successfully.

The attached logs are from the Ollama Docker container. It runs in a VM with a GPU (an RTX 3090 with 24 GB of VRAM), and there were no other issues with this setup.

If needed, I can provide the exact text on which I hit the issue (a text chunk retrieved via Unstructured from a book, 972 bytes long if I remember correctly).
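As a stop-gap while this is being investigated, I'm collapsing long punctuation runs before embedding. A rough sketch (the regex and the cutoff of four dots are arbitrary choices on my side, not part of any fix):

```python
import re

def sanitize_for_embedding(text: str) -> str:
    # Collapse runs of 4+ dots (optionally space-separated, like the
    # ". . . . ." leaders in a table of contents) into a single ellipsis.
    return re.sub(r"(?:\.\s*){4,}", "... ", text).strip()
```

With this applied, the crashing input above embeds without triggering the assert, though it obviously only papers over the underlying bug.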

Relevant log output

time=2025-03-04T15:02:05.084Z level=WARN source=types.go:512 msg="invalid option provided" option=tfs_z
time=2025-03-04T15:02:10.136Z level=WARN source=sched.go:647 msg="gpu VRAM usage didn't recover within timeout" seconds=5.051770335 model=/root/.ollama/models/blobs/sha256-136c89cb1c6ea901df358fe576ee9cd7501daa0f72e28526eca929e2bbeeb4ed
time=2025-03-04T15:02:10.255Z level=INFO source=sched.go:508 msg="updated VRAM based on existing loaded models" gpu=GPU-d8622296-6d17-435e-57df-9631b117f22e library=cuda total="23.7 GiB" available="6.7 GiB"
time=2025-03-04T15:02:10.255Z level=WARN source=ggml.go:132 msg="key not found" key=bert.attention.head_count_kv default=1
time=2025-03-04T15:02:10.255Z level=WARN source=ggml.go:132 msg="key not found" key=bert.attention.key_length default=64
time=2025-03-04T15:02:10.255Z level=WARN source=ggml.go:132 msg="key not found" key=bert.attention.value_length default=64
time=2025-03-04T15:02:10.255Z level=WARN source=ggml.go:132 msg="key not found" key=bert.attention.head_count_kv default=1
time=2025-03-04T15:02:10.255Z level=INFO source=sched.go:715 msg="new model will fit in available VRAM in single GPU, loading" model=/root/.ollama/models/blobs/sha256-136c89cb1c6ea901df358fe576ee9cd7501daa0f72e28526eca929e2bbeeb4ed gpu=GPU-d8622296-6d17-435e-57df-9631b117f22e parallel=1 available=7182680064 required="2.6 GiB"
time=2025-03-04T15:02:10.297Z level=INFO source=server.go:97 msg="system memory" total="47.0 GiB" free="36.7 GiB" free_swap="2.7 MiB"
time=2025-03-04T15:02:10.297Z level=WARN source=ggml.go:132 msg="key not found" key=bert.attention.head_count_kv default=1
time=2025-03-04T15:02:10.297Z level=WARN source=ggml.go:132 msg="key not found" key=bert.attention.key_length default=64
time=2025-03-04T15:02:10.297Z level=WARN source=ggml.go:132 msg="key not found" key=bert.attention.value_length default=64
time=2025-03-04T15:02:10.297Z level=WARN source=ggml.go:132 msg="key not found" key=bert.attention.head_count_kv default=1
time=2025-03-04T15:02:10.297Z level=INFO source=server.go:130 msg=offload library=cuda layers.requested=-1 layers.model=25 layers.offload=25 layers.split="" memory.available="[6.7 GiB]" memory.gpu_overhead="0 B" memory.required.full="2.6 GiB" memory.required.partial="2.6 GiB" memory.required.kv="12.0 MiB" memory.required.allocations="[2.6 GiB]" memory.weights.total="1.1 GiB" memory.weights.repeating="188.6 MiB" memory.weights.nonrepeating="976.6 MiB" memory.graph.full="32.0 MiB" memory.graph.partial="32.0 MiB"
time=2025-03-04T15:02:10.297Z level=INFO source=server.go:380 msg="starting llama server" cmd="/usr/bin/ollama runner --model /root/.ollama/models/blobs/sha256-136c89cb1c6ea901df358fe576ee9cd7501daa0f72e28526eca929e2bbeeb4ed --ctx-size 2048 --batch-size 512 --n-gpu-layers 25 --threads 8 --parallel 1 --port 40671"
time=2025-03-04T15:02:10.298Z level=INFO source=sched.go:450 msg="loaded runners" count=3
time=2025-03-04T15:02:10.298Z level=INFO source=server.go:557 msg="waiting for llama runner to start responding"
time=2025-03-04T15:02:10.298Z level=INFO source=server.go:591 msg="waiting for server to become available" status="llm server error"
time=2025-03-04T15:02:10.308Z level=INFO source=runner.go:932 msg="starting go runner"
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
load_backend: loaded CUDA backend from /usr/lib/ollama/cuda_v12/libggml-cuda.so
load_backend: loaded CPU backend from /usr/lib/ollama/libggml-cpu-alderlake.so
time=2025-03-04T15:02:10.334Z level=INFO source=runner.go:935 msg=system info="CPU : LLAMAFILE = 1 | CPU : LLAMAFILE = 1 | CUDA : ARCHS = 600,610,620,700,720,750,800,860,870,890,900 | USE_GRAPHS = 1 | PEER_MAX_BATCH_SIZE = 128 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | LLAMAFILE = 1 | cgo(gcc)" threads=8
time=2025-03-04T15:02:10.334Z level=INFO source=runner.go:993 msg="Server listening on 127.0.0.1:40671"
llama_load_model_from_file: using device CUDA0 (NVIDIA GeForce RTX 3090) - 6849 MiB free
llama_model_loader: loaded meta data with 37 key-value pairs and 389 tensors from /root/.ollama/models/blobs/sha256-136c89cb1c6ea901df358fe576ee9cd7501daa0f72e28526eca929e2bbeeb4ed (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = bert
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = multilingual-e5-large-instruct
llama_model_loader: - kv   3:                       general.organization str              = Tmp
llama_model_loader: - kv   4:                           general.finetune str              = instruct
llama_model_loader: - kv   5:                           general.basename str              = intfloat-multilingual-e5
llama_model_loader: - kv   6:                         general.size_label str              = large
llama_model_loader: - kv   7:                            general.license str              = mit
llama_model_loader: - kv   8:                               general.tags arr[str,3]       = ["mteb", "sentence-transformers", "tr...
llama_model_loader: - kv   9:                          general.languages arr[str,94]      = ["multilingual", "af", "am", "ar", "a...
llama_model_loader: - kv  10:                           bert.block_count u32              = 24
llama_model_loader: - kv  11:                        bert.context_length u32              = 512
llama_model_loader: - kv  12:                      bert.embedding_length u32              = 1024
llama_model_loader: - kv  13:                   bert.feed_forward_length u32              = 4096
llama_model_loader: - kv  14:                  bert.attention.head_count u32              = 16
llama_model_loader: - kv  15:          bert.attention.layer_norm_epsilon f32              = 0.000010
llama_model_loader: - kv  16:                          general.file_type u32              = 0
llama_model_loader: - kv  17:                      bert.attention.causal bool             = false
llama_model_loader: - kv  18:                          bert.pooling_type u32              = 1
llama_model_loader: - kv  19:                       tokenizer.ggml.model str              = t5
llama_model_loader: - kv  20:                         tokenizer.ggml.pre str              = default
time=2025-03-04T15:02:10.386Z level=WARN source=sched.go:647 msg="gpu VRAM usage didn't recover within timeout" seconds=5.3018933520000004 model=/root/.ollama/models/blobs/sha256-136c89cb1c6ea901df358fe576ee9cd7501daa0f72e28526eca929e2bbeeb4ed
llama_model_loader: - kv  21:                      tokenizer.ggml.tokens arr[str,250002]  = ["<s>", "<pad>", "</s>", "<unk>", ","...
llama_model_loader: - kv  22:                      tokenizer.ggml.scores arr[f32,250002]  = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv  23:                  tokenizer.ggml.token_type arr[i32,250002]  = [3, 3, 3, 2, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  24:            tokenizer.ggml.add_space_prefix bool             = true
llama_model_loader: - kv  25:            tokenizer.ggml.token_type_count u32              = 1
llama_model_loader: - kv  26:    tokenizer.ggml.remove_extra_whitespaces bool             = true
llama_model_loader: - kv  27:        tokenizer.ggml.precompiled_charsmap arr[u8,237539]   = [0, 180, 2, 0, 0, 132, 0, 0, 0, 0, 0,...
llama_model_loader: - kv  28:                tokenizer.ggml.bos_token_id u32              = 0
llama_model_loader: - kv  29:                tokenizer.ggml.eos_token_id u32              = 2
llama_model_loader: - kv  30:            tokenizer.ggml.unknown_token_id u32              = 3
llama_model_loader: - kv  31:          tokenizer.ggml.seperator_token_id u32              = 2
llama_model_loader: - kv  32:            tokenizer.ggml.padding_token_id u32              = 1
llama_model_loader: - kv  33:               tokenizer.ggml.mask_token_id u32              = 250001
llama_model_loader: - kv  34:               tokenizer.ggml.add_bos_token bool             = true
llama_model_loader: - kv  35:               tokenizer.ggml.add_eos_token bool             = true
llama_model_loader: - kv  36:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:  389 tensors
time=2025-03-04T15:02:10.549Z level=INFO source=server.go:591 msg="waiting for server to become available" status="llm server loading model"
llm_load_vocab: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
llm_load_vocab: special tokens cache size = 4
llm_load_vocab: token to piece cache size = 2.1668 MB
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = bert
llm_load_print_meta: vocab type       = UGM
llm_load_print_meta: n_vocab          = 250002
llm_load_print_meta: n_merges         = 0
llm_load_print_meta: vocab_only       = 0
llm_load_print_meta: n_ctx_train      = 512
llm_load_print_meta: n_embd           = 1024
llm_load_print_meta: n_layer          = 24
llm_load_print_meta: n_head           = 16
llm_load_print_meta: n_head_kv        = 16
llm_load_print_meta: n_rot            = 64
llm_load_print_meta: n_swa            = 0
llm_load_print_meta: n_embd_head_k    = 64
llm_load_print_meta: n_embd_head_v    = 64
llm_load_print_meta: n_gqa            = 1
llm_load_print_meta: n_embd_k_gqa     = 1024
llm_load_print_meta: n_embd_v_gqa     = 1024
llm_load_print_meta: f_norm_eps       = 1.0e-05
llm_load_print_meta: f_norm_rms_eps   = 0.0e+00
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale    = 0.0e+00
llm_load_print_meta: n_ff             = 4096
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: causal attn      = 0
llm_load_print_meta: pooling type     = 1
llm_load_print_meta: rope type        = 2
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn  = 512
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: ssm_d_conv       = 0
llm_load_print_meta: ssm_d_inner      = 0
llm_load_print_meta: ssm_d_state      = 0
llm_load_print_meta: ssm_dt_rank      = 0
llm_load_print_meta: ssm_dt_b_c_rms   = 0
llm_load_print_meta: model type       = 335M
llm_load_print_meta: model ftype      = all F32
llm_load_print_meta: model params     = 558.84 M
llm_load_print_meta: model size       = 2.08 GiB (32.00 BPW) 
llm_load_print_meta: general.name     = multilingual-e5-large-instruct
llm_load_print_meta: BOS token        = 0 '<s>'
llm_load_print_meta: EOS token        = 2 '</s>'
llm_load_print_meta: UNK token        = 3 '<unk>'
llm_load_print_meta: SEP token        = 2 '</s>'
llm_load_print_meta: PAD token        = 1 '<pad>'
llm_load_print_meta: MASK token       = 250001 '[PAD250000]'
llm_load_print_meta: LF token         = 6 '▁'
llm_load_print_meta: EOG token        = 2 '</s>'
llm_load_print_meta: max token length = 48
llm_load_tensors: offloading 24 repeating layers to GPU
llm_load_tensors: offloading output layer to GPU
llm_load_tensors: offloaded 25/25 layers to GPU
llm_load_tensors:        CUDA0 model buffer size =  1153.22 MiB
llm_load_tensors:   CPU_Mapped model buffer size =   978.58 MiB
llama_new_context_with_model: n_seq_max     = 1
llama_new_context_with_model: n_ctx         = 2048
llama_new_context_with_model: n_ctx_per_seq = 2048
llama_new_context_with_model: n_batch       = 512
llama_new_context_with_model: n_ubatch      = 512
llama_new_context_with_model: flash_attn    = 0
llama_new_context_with_model: freq_base     = 10000.0
llama_new_context_with_model: freq_scale    = 1
llama_new_context_with_model: n_ctx_pre_seq (2048) > n_ctx_train (512) -- possible training context overflow
llama_kv_cache_init: kv_size = 2048, offload = 1, type_k = 'f16', type_v = 'f16', n_layer = 24, can_shift = 1
llama_kv_cache_init:      CUDA0 KV buffer size =   192.00 MiB
llama_new_context_with_model: KV self size  =  192.00 MiB, K (f16):   96.00 MiB, V (f16):   96.00 MiB
llama_new_context_with_model:  CUDA_Host  output buffer size =     0.00 MiB
llama_new_context_with_model:      CUDA0 compute buffer size =    26.00 MiB
llama_new_context_with_model:  CUDA_Host compute buffer size =     6.00 MiB
llama_new_context_with_model: graph nodes  = 851
llama_new_context_with_model: graph splits = 4 (with bs=512), 2 (with bs=1)
time=2025-03-04T15:02:11.051Z level=INFO source=server.go:596 msg="llama runner started in 0.75 seconds"
llama_model_loader: loaded meta data with 37 key-value pairs and 389 tensors from /root/.ollama/models/blobs/sha256-136c89cb1c6ea901df358fe576ee9cd7501daa0f72e28526eca929e2bbeeb4ed (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = bert
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = multilingual-e5-large-instruct
llama_model_loader: - kv   3:                       general.organization str              = Tmp
llama_model_loader: - kv   4:                           general.finetune str              = instruct
llama_model_loader: - kv   5:                           general.basename str              = intfloat-multilingual-e5
llama_model_loader: - kv   6:                         general.size_label str              = large
llama_model_loader: - kv   7:                            general.license str              = mit
llama_model_loader: - kv   8:                               general.tags arr[str,3]       = ["mteb", "sentence-transformers", "tr...
llama_model_loader: - kv   9:                          general.languages arr[str,94]      = ["multilingual", "af", "am", "ar", "a...
llama_model_loader: - kv  10:                           bert.block_count u32              = 24
llama_model_loader: - kv  11:                        bert.context_length u32              = 512
llama_model_loader: - kv  12:                      bert.embedding_length u32              = 1024
llama_model_loader: - kv  13:                   bert.feed_forward_length u32              = 4096
llama_model_loader: - kv  14:                  bert.attention.head_count u32              = 16
llama_model_loader: - kv  15:          bert.attention.layer_norm_epsilon f32              = 0.000010
llama_model_loader: - kv  16:                          general.file_type u32              = 0
llama_model_loader: - kv  17:                      bert.attention.causal bool             = false
llama_model_loader: - kv  18:                          bert.pooling_type u32              = 1
llama_model_loader: - kv  19:                       tokenizer.ggml.model str              = t5
llama_model_loader: - kv  20:                         tokenizer.ggml.pre str              = default
llama_model_loader: - kv  21:                      tokenizer.ggml.tokens arr[str,250002]  = ["<s>", "<pad>", "</s>", "<unk>", ","...
llama_model_loader: - kv  22:                      tokenizer.ggml.scores arr[f32,250002]  = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv  23:                  tokenizer.ggml.token_type arr[i32,250002]  = [3, 3, 3, 2, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  24:            tokenizer.ggml.add_space_prefix bool             = true
llama_model_loader: - kv  25:            tokenizer.ggml.token_type_count u32              = 1
llama_model_loader: - kv  26:    tokenizer.ggml.remove_extra_whitespaces bool             = true
llama_model_loader: - kv  27:        tokenizer.ggml.precompiled_charsmap arr[u8,237539]   = [0, 180, 2, 0, 0, 132, 0, 0, 0, 0, 0,...
llama_model_loader: - kv  28:                tokenizer.ggml.bos_token_id u32              = 0
llama_model_loader: - kv  29:                tokenizer.ggml.eos_token_id u32              = 2
llama_model_loader: - kv  30:            tokenizer.ggml.unknown_token_id u32              = 3
llama_model_loader: - kv  31:          tokenizer.ggml.seperator_token_id u32              = 2
llama_model_loader: - kv  32:            tokenizer.ggml.padding_token_id u32              = 1
llama_model_loader: - kv  33:               tokenizer.ggml.mask_token_id u32              = 250001
llama_model_loader: - kv  34:               tokenizer.ggml.add_bos_token bool             = true
llama_model_loader: - kv  35:               tokenizer.ggml.add_eos_token bool             = true
llama_model_loader: - kv  36:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:  389 tensors
llm_load_vocab: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
llm_load_vocab: special tokens cache size = 4
llm_load_vocab: token to piece cache size = 2.1668 MB
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = bert
llm_load_print_meta: vocab type       = UGM
llm_load_print_meta: n_vocab          = 250002
llm_load_print_meta: n_merges         = 0
llm_load_print_meta: vocab_only       = 1
llm_load_print_meta: model type       = ?B
llm_load_print_meta: model ftype      = all F32
llm_load_print_meta: model params     = 558.84 M
llm_load_print_meta: model size       = 2.08 GiB (32.00 BPW) 
llm_load_print_meta: general.name     = multilingual-e5-large-instruct
llm_load_print_meta: BOS token        = 0 '<s>'
llm_load_print_meta: EOS token        = 2 '</s>'
llm_load_print_meta: UNK token        = 3 '<unk>'
llm_load_print_meta: SEP token        = 2 '</s>'
llm_load_print_meta: PAD token        = 1 '<pad>'
llm_load_print_meta: MASK token       = 250001 '[PAD250000]'
llm_load_print_meta: LF token         = 6 '▁'
llm_load_print_meta: EOG token        = 2 '</s>'
llm_load_print_meta: max token length = 48
llama_model_load: vocab only - skipping tensors
//ml/backend/ggml/ggml/src/ggml-cpu/ggml-cpu.c:8456: GGML_ASSERT(i01 >= 0 && i01 < ne01) failed
SIGSEGV: segmentation violation
PC=0x79a828624c47 m=3 sigcode=1 addr=0x204a03fe0
signal arrived during cgo execution

goroutine 24 gp=0xc000504380 m=3 mp=0xc00007ae08 [syscall]:
runtime.cgocall(0x57d1d1cbace0, 0xc00008dba0)
        runtime/cgocall.go:167 +0x4b fp=0xc00008db78 sp=0xc00008db40 pc=0x57d1d10a5acb
github.com/ollama/ollama/llama._Cfunc_llama_decode(0x79a7413845e0, {0x1, 0x79a741391980, 0x0, 0x0, 0x79a741392580, 0x79a741393180, 0x79a74149f980, 0x79a741356bc0})
        _cgo_gotypes.go:545 +0x4f fp=0xc00008dba0 sp=0xc00008db78 pc=0x57d1d145b56f
github.com/ollama/ollama/llama.(*Context).Decode.func1(0x57d1d147a48b?, 0x79a7413845e0?)
        github.com/ollama/ollama/llama/llama.go:163 +0xf5 fp=0xc00008dc90 sp=0xc00008dba0 pc=0x57d1d145e295
github.com/ollama/ollama/llama.(*Context).Decode(0x57d1d2bae480?, 0x0?)
        github.com/ollama/ollama/llama/llama.go:163 +0x13 fp=0xc00008dcd8 sp=0xc00008dc90 pc=0x57d1d145e113
github.com/ollama/ollama/runner/llamarunner.(*Server).processBatch(0xc0001ad5f0, 0xc000112960, 0xc00008df20)
        github.com/ollama/ollama/runner/llamarunner/runner.go:435 +0x23f fp=0xc00008dee0 sp=0xc00008dcd8 pc=0x57d1d147927f
github.com/ollama/ollama/runner/llamarunner.(*Server).run(0xc0001ad5f0, {0x57d1d230d920, 0xc000142410})
        github.com/ollama/ollama/runner/llamarunner/runner.go:343 +0x1d5 fp=0xc00008dfb8 sp=0xc00008dee0 pc=0x57d1d1478cb5
github.com/ollama/ollama/runner/llamarunner.Execute.gowrap2()
        github.com/ollama/ollama/runner/llamarunner/runner.go:973 +0x28 fp=0xc00008dfe0 sp=0xc00008dfb8 pc=0x57d1d147db48
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc00008dfe8 sp=0xc00008dfe0 pc=0x57d1d10b45a1
created by github.com/ollama/ollama/runner/llamarunner.Execute in goroutine 1
        github.com/ollama/ollama/runner/llamarunner/runner.go:973 +0xdb5

goroutine 1 gp=0xc0000061c0 m=nil [IO wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc0005875c0 sp=0xc0005875a0 pc=0x57d1d10ac1ce
runtime.netpollblock(0xc0004adf80?, 0xd1042fe6?, 0xd1?)
        runtime/netpoll.go:575 +0xf7 fp=0xc0005875f8 sp=0xc0005875c0 pc=0x57d1d106fe37
internal/poll.runtime_pollWait(0x79a833dc6680, 0x72)
        runtime/netpoll.go:351 +0x85 fp=0xc000587618 sp=0xc0005875f8 pc=0x57d1d10ab4c5
internal/poll.(*pollDesc).wait(0xc000123f00?, 0x900000036?, 0x0)
        internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc000587640 sp=0xc000587618 pc=0x57d1d1133707
internal/poll.(*pollDesc).waitRead(...)
        internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Accept(0xc000123f00)
        internal/poll/fd_unix.go:620 +0x295 fp=0xc0005876e8 sp=0xc000587640 pc=0x57d1d1138ad5
net.(*netFD).accept(0xc000123f00)
        net/fd_unix.go:172 +0x29 fp=0xc0005877a0 sp=0xc0005876e8 pc=0x57d1d11a1bc9
net.(*TCPListener).accept(0xc000146b40)
        net/tcpsock_posix.go:159 +0x1e fp=0xc0005877f0 sp=0xc0005877a0 pc=0x57d1d11b783e
net.(*TCPListener).Accept(0xc000146b40)
        net/tcpsock.go:372 +0x30 fp=0xc000587820 sp=0xc0005877f0 pc=0x57d1d11b66f0
net/http.(*onceCloseListener).Accept(0xc0004ac000?)
        <autogenerated>:1 +0x24 fp=0xc000587838 sp=0xc000587820 pc=0x57d1d1400964
net/http.(*Server).Serve(0xc000154c30, {0x57d1d230b4f8, 0xc000146b40})
        net/http/server.go:3330 +0x30c fp=0xc000587968 sp=0xc000587838 pc=0x57d1d13d88ec
github.com/ollama/ollama/runner/llamarunner.Execute({0xc000036120, 0xe, 0xe})
        github.com/ollama/ollama/runner/llamarunner/runner.go:994 +0x1174 fp=0xc000587d08 sp=0xc000587968 pc=0x57d1d147d834
github.com/ollama/ollama/runner.Execute({0xc000036110?, 0x0?, 0x0?})
        github.com/ollama/ollama/runner/runner.go:22 +0xd4 fp=0xc000587d30 sp=0xc000587d08 pc=0x57d1d16adc54
github.com/ollama/ollama/cmd.NewCLI.func2(0xc000037400?, {0x57d1d1ea8050?, 0x4?, 0x57d1d1ea8054?})
        github.com/ollama/ollama/cmd/cmd.go:1280 +0x45 fp=0xc000587d58 sp=0xc000587d30 pc=0x57d1d1cba245
github.com/spf13/cobra.(*Command).execute(0xc000175b08, {0xc00013e700, 0xe, 0xe})
        github.com/spf13/cobra@v1.7.0/command.go:940 +0x862 fp=0xc000587e78 sp=0xc000587d58 pc=0x57d1d121a902
github.com/spf13/cobra.(*Command).ExecuteC(0xc00046fb08)
        github.com/spf13/cobra@v1.7.0/command.go:1068 +0x3a5 fp=0xc000587f30 sp=0xc000587e78 pc=0x57d1d121b145
github.com/spf13/cobra.(*Command).Execute(...)
        github.com/spf13/cobra@v1.7.0/command.go:992
github.com/spf13/cobra.(*Command).ExecuteContext(...)
        github.com/spf13/cobra@v1.7.0/command.go:985
main.main()
        github.com/ollama/ollama/main.go:12 +0x4d fp=0xc000587f50 sp=0xc000587f30 pc=0x57d1d1cba5cd
runtime.main()
        runtime/proc.go:272 +0x29d fp=0xc000587fe0 sp=0xc000587f50 pc=0x57d1d10774dd
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000587fe8 sp=0xc000587fe0 pc=0x57d1d10b45a1

goroutine 2 gp=0xc000006c40 m=nil [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc000074fa8 sp=0xc000074f88 pc=0x57d1d10ac1ce
runtime.goparkunlock(...)
        runtime/proc.go:430
runtime.forcegchelper()
        runtime/proc.go:337 +0xb8 fp=0xc000074fe0 sp=0xc000074fa8 pc=0x57d1d1077818
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000074fe8 sp=0xc000074fe0 pc=0x57d1d10b45a1
created by runtime.init.7 in goroutine 1
        runtime/proc.go:325 +0x1a

goroutine 3 gp=0xc000007180 m=nil [GC sweep wait]:
runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc000075780 sp=0xc000075760 pc=0x57d1d10ac1ce
runtime.goparkunlock(...)
        runtime/proc.go:430
runtime.bgsweep(0xc00003e080)
        runtime/mgcsweep.go:317 +0xdf fp=0xc0000757c8 sp=0xc000075780 pc=0x57d1d1061ebf
runtime.gcenable.gowrap1()
        runtime/mgc.go:204 +0x25 fp=0xc0000757e0 sp=0xc0000757c8 pc=0x57d1d1056505
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc0000757e8 sp=0xc0000757e0 pc=0x57d1d10b45a1
created by runtime.gcenable in goroutine 1
        runtime/mgc.go:204 +0x66

goroutine 4 gp=0xc000007340 m=nil [GC scavenge wait]:
runtime.gopark(0x10000?, 0x57d1d205b6f8?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc000075f78 sp=0xc000075f58 pc=0x57d1d10ac1ce
runtime.goparkunlock(...)
        runtime/proc.go:430
runtime.(*scavengerState).park(0x57d1d2b02080)
        runtime/mgcscavenge.go:425 +0x49 fp=0xc000075fa8 sp=0xc000075f78 pc=0x57d1d105f889
runtime.bgscavenge(0xc00003e080)
        runtime/mgcscavenge.go:658 +0x59 fp=0xc000075fc8 sp=0xc000075fa8 pc=0x57d1d105fe19
runtime.gcenable.gowrap2()
        runtime/mgc.go:205 +0x25 fp=0xc000075fe0 sp=0xc000075fc8 pc=0x57d1d10564a5
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000075fe8 sp=0xc000075fe0 pc=0x57d1d10b45a1
created by runtime.gcenable in goroutine 1
        runtime/mgc.go:205 +0xa5

goroutine 5 gp=0xc000007c00 m=nil [finalizer wait]:
runtime.gopark(0xc000074648?, 0x57d1d104ca05?, 0xb0?, 0x1?, 0xc0000061c0?)
        runtime/proc.go:424 +0xce fp=0xc000074620 sp=0xc000074600 pc=0x57d1d10ac1ce
runtime.runfinq()
        runtime/mfinal.go:193 +0x107 fp=0xc0000747e0 sp=0xc000074620 pc=0x57d1d1055587
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc0000747e8 sp=0xc0000747e0 pc=0x57d1d10b45a1
created by runtime.createfing in goroutine 1
        runtime/mfinal.go:163 +0x3d

goroutine 6 gp=0xc0001cb500 m=nil [chan receive]:
runtime.gopark(0xc000076760?, 0x57d1d1189245?, 0x60?, 0x69?, 0x57d1d2320280?)
        runtime/proc.go:424 +0xce fp=0xc000076718 sp=0xc0000766f8 pc=0x57d1d10ac1ce
runtime.chanrecv(0xc0000ac310, 0x0, 0x1)
        runtime/chan.go:639 +0x41c fp=0xc000076790 sp=0xc000076718 pc=0x57d1d1045bfc
runtime.chanrecv1(0x0?, 0x0?)
        runtime/chan.go:489 +0x12 fp=0xc0000767b8 sp=0xc000076790 pc=0x57d1d10457b2
runtime.unique_runtime_registerUniqueMapCleanup.func1(...)
        runtime/mgc.go:1781
runtime.unique_runtime_registerUniqueMapCleanup.gowrap1()
        runtime/mgc.go:1784 +0x2f fp=0xc0000767e0 sp=0xc0000767b8 pc=0x57d1d105956f
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc0000767e8 sp=0xc0000767e0 pc=0x57d1d10b45a1
created by unique.runtime_registerUniqueMapCleanup in goroutine 1
        runtime/mgc.go:1779 +0x96

goroutine 7 gp=0xc0001cba40 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc000076f38 sp=0xc000076f18 pc=0x57d1d10ac1ce
runtime.gcBgMarkWorker(0xc0000ad730)
        runtime/mgc.go:1412 +0xe9 fp=0xc000076fc8 sp=0xc000076f38 pc=0x57d1d1058869
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1328 +0x25 fp=0xc000076fe0 sp=0xc000076fc8 pc=0x57d1d1058745
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000076fe8 sp=0xc000076fe0 pc=0x57d1d10b45a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1328 +0x105

goroutine 8 gp=0xc0001cbc00 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc000077738 sp=0xc000077718 pc=0x57d1d10ac1ce
runtime.gcBgMarkWorker(0xc0000ad730)
        runtime/mgc.go:1412 +0xe9 fp=0xc0000777c8 sp=0xc000077738 pc=0x57d1d1058869
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1328 +0x25 fp=0xc0000777e0 sp=0xc0000777c8 pc=0x57d1d1058745
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc0000777e8 sp=0xc0000777e0 pc=0x57d1d10b45a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1328 +0x105

goroutine 9 gp=0xc0001cbdc0 m=nil [GC worker (idle)]:
runtime.gopark(0x1d75291645414?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc000077f38 sp=0xc000077f18 pc=0x57d1d10ac1ce
runtime.gcBgMarkWorker(0xc0000ad730)
        runtime/mgc.go:1412 +0xe9 fp=0xc000077fc8 sp=0xc000077f38 pc=0x57d1d1058869
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1328 +0x25 fp=0xc000077fe0 sp=0xc000077fc8 pc=0x57d1d1058745
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000077fe8 sp=0xc000077fe0 pc=0x57d1d10b45a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1328 +0x105

goroutine 18 gp=0xc000104380 m=nil [GC worker (idle)]:
runtime.gopark(0x1d7529163edb3?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc000070738 sp=0xc000070718 pc=0x57d1d10ac1ce
runtime.gcBgMarkWorker(0xc0000ad730)
        runtime/mgc.go:1412 +0xe9 fp=0xc0000707c8 sp=0xc000070738 pc=0x57d1d1058869
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1328 +0x25 fp=0xc0000707e0 sp=0xc0000707c8 pc=0x57d1d1058745
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc0000707e8 sp=0xc0000707e0 pc=0x57d1d10b45a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1328 +0x105

goroutine 19 gp=0xc000104540 m=nil [GC worker (idle)]:
runtime.gopark(0x1d7529163ed06?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc000070f38 sp=0xc000070f18 pc=0x57d1d10ac1ce
runtime.gcBgMarkWorker(0xc0000ad730)
        runtime/mgc.go:1412 +0xe9 fp=0xc000070fc8 sp=0xc000070f38 pc=0x57d1d1058869
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1328 +0x25 fp=0xc000070fe0 sp=0xc000070fc8 pc=0x57d1d1058745
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000070fe8 sp=0xc000070fe0 pc=0x57d1d10b45a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1328 +0x105

goroutine 20 gp=0xc000104700 m=nil [GC worker (idle)]:
runtime.gopark(0x1d7529163d411?, 0x3?, 0xb?, 0x17?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc000071738 sp=0xc000071718 pc=0x57d1d10ac1ce
runtime.gcBgMarkWorker(0xc0000ad730)
        runtime/mgc.go:1412 +0xe9 fp=0xc0000717c8 sp=0xc000071738 pc=0x57d1d1058869
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1328 +0x25 fp=0xc0000717e0 sp=0xc0000717c8 pc=0x57d1d1058745
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc0000717e8 sp=0xc0000717e0 pc=0x57d1d10b45a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1328 +0x105

goroutine 21 gp=0xc0001048c0 m=nil [GC worker (idle)]:
runtime.gopark(0x1d7529163e7ab?, 0x3?, 0x85?, 0x1a?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc000071f38 sp=0xc000071f18 pc=0x57d1d10ac1ce
runtime.gcBgMarkWorker(0xc0000ad730)
        runtime/mgc.go:1412 +0xe9 fp=0xc000071fc8 sp=0xc000071f38 pc=0x57d1d1058869
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1328 +0x25 fp=0xc000071fe0 sp=0xc000071fc8 pc=0x57d1d1058745
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000071fe8 sp=0xc000071fe0 pc=0x57d1d10b45a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1328 +0x105

goroutine 22 gp=0xc000104a80 m=nil [GC worker (idle)]:
runtime.gopark(0x1d7529161f9a1?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc000072738 sp=0xc000072718 pc=0x57d1d10ac1ce
runtime.gcBgMarkWorker(0xc0000ad730)
        runtime/mgc.go:1412 +0xe9 fp=0xc0000727c8 sp=0xc000072738 pc=0x57d1d1058869
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1328 +0x25 fp=0xc0000727e0 sp=0xc0000727c8 pc=0x57d1d1058745
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc0000727e8 sp=0xc0000727e0 pc=0x57d1d10b45a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1328 +0x105

goroutine 10 gp=0xc000504700 m=nil [chan receive]:
runtime.gopark(0x57d1d10b25b4?, 0xc00021f898?, 0xd0?, 0xa2?, 0xc00021f880?)
        runtime/proc.go:424 +0xce fp=0xc00021f860 sp=0xc00021f840 pc=0x57d1d10ac1ce
runtime.chanrecv(0xc0002f5ab0, 0xc00021fa10, 0x1)
        runtime/chan.go:639 +0x41c fp=0xc00021f8d8 sp=0xc00021f860 pc=0x57d1d1045bfc
runtime.chanrecv1(0xc00069cd80?, 0xc000384808?)
        runtime/chan.go:489 +0x12 fp=0xc00021f900 sp=0xc00021f8d8 pc=0x57d1d10457b2
github.com/ollama/ollama/runner/llamarunner.(*Server).embeddings(0xc0001ad5f0, {0x57d1d230b708, 0xc00013eb60}, 0xc0004823c0)
        github.com/ollama/ollama/runner/llamarunner/runner.go:783 +0x746 fp=0xc00021fac0 sp=0xc00021f900 pc=0x57d1d147bc06
github.com/ollama/ollama/runner/llamarunner.(*Server).embeddings-fm({0x57d1d230b708?, 0xc00013eb60?}, 0x57d1d13e26c7?)
        <autogenerated>:1 +0x36 fp=0xc00021faf0 sp=0xc00021fac0 pc=0x57d1d147dff6
net/http.HandlerFunc.ServeHTTP(0xc00013e8c0?, {0x57d1d230b708?, 0xc00013eb60?}, 0x0?)
        net/http/server.go:2220 +0x29 fp=0xc00021fb18 sp=0xc00021faf0 pc=0x57d1d13d4ee9
net/http.(*ServeMux).ServeHTTP(0x57d1d104ca05?, {0x57d1d230b708, 0xc00013eb60}, 0xc0004823c0)
        net/http/server.go:2747 +0x1ca fp=0xc00021fb68 sp=0xc00021fb18 pc=0x57d1d13d6dea
net/http.serverHandler.ServeHTTP({0x57d1d23080d0?}, {0x57d1d230b708?, 0xc00013eb60?}, 0x6?)
        net/http/server.go:3210 +0x8e fp=0xc00021fb98 sp=0xc00021fb68 pc=0x57d1d13f434e
net/http.(*conn).serve(0xc0004ac000, {0x57d1d230d8e8, 0xc00069cb10})
        net/http/server.go:2092 +0x5d0 fp=0xc00021ffb8 sp=0xc00021fb98 pc=0x57d1d13d3890
net/http.(*Server).Serve.gowrap3()
        net/http/server.go:3360 +0x28 fp=0xc00021ffe0 sp=0xc00021ffb8 pc=0x57d1d13d8ce8
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc00021ffe8 sp=0xc00021ffe0 pc=0x57d1d10b45a1
created by net/http.(*Server).Serve in goroutine 1
        net/http/server.go:3360 +0x485

goroutine 28 gp=0xc000105500 m=nil [IO wait]:
runtime.gopark(0x57d1d1050ee5?, 0xc00001de00?, 0xa5?, 0x10?, 0xb?)
        runtime/proc.go:424 +0xce fp=0xc00001dda8 sp=0xc00001dd88 pc=0x57d1d10ac1ce
runtime.netpollblock(0x57d1d10cf6b8?, 0xd1042fe6?, 0xd1?)
        runtime/netpoll.go:575 +0xf7 fp=0xc00001dde0 sp=0xc00001dda8 pc=0x57d1d106fe37
internal/poll.runtime_pollWait(0x79a833dc6568, 0x72)
        runtime/netpoll.go:351 +0x85 fp=0xc00001de00 sp=0xc00001dde0 pc=0x57d1d10ab4c5
internal/poll.(*pollDesc).wait(0xc0004aa000?, 0xc0000b6641?, 0x0)
        internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc00001de28 sp=0xc00001de00 pc=0x57d1d1133707
internal/poll.(*pollDesc).waitRead(...)
        internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc0004aa000, {0xc0000b6641, 0x1, 0x1})
        internal/poll/fd_unix.go:165 +0x27a fp=0xc00001dec0 sp=0xc00001de28 pc=0x57d1d11349fa
net.(*netFD).Read(0xc0004aa000, {0xc0000b6641?, 0xc00001df48?, 0x57d1d10ade50?})
        net/fd_posix.go:55 +0x25 fp=0xc00001df08 sp=0xc00001dec0 pc=0x57d1d119fc05
net.(*conn).Read(0xc000126030, {0xc0000b6641?, 0x0?, 0x57d1d2bae480?})
        net/net.go:189 +0x45 fp=0xc00001df50 sp=0xc00001df08 pc=0x57d1d11ae205
net.(*TCPConn).Read(0x57d1d2a5ff60?, {0xc0000b6641?, 0x0?, 0x0?})
        <autogenerated>:1 +0x25 fp=0xc00001df80 sp=0xc00001df50 pc=0x57d1d11c1405
net/http.(*connReader).backgroundRead(0xc0000b6630)
        net/http/server.go:690 +0x37 fp=0xc00001dfc8 sp=0xc00001df80 pc=0x57d1d13ce217
net/http.(*connReader).startBackgroundRead.gowrap2()
        net/http/server.go:686 +0x25 fp=0xc00001dfe0 sp=0xc00001dfc8 pc=0x57d1d13ce145
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc00001dfe8 sp=0xc00001dfe0 pc=0x57d1d10b45a1
created by net/http.(*connReader).startBackgroundRead in goroutine 10
        net/http/server.go:686 +0xb6

rax    0x204a03fe0
rbx    0x79a8241707b0
rcx    0xff8
rdx    0x79a824008980
rdi    0x79a824008990
rsi    0x0
rbp    0x79a82bdff2b0
rsp    0x79a82bdff290
r8     0x0
r9     0x79a833c2ea28
r10    0x0
r11    0x246
r12    0x79a741384540
r13    0x79a824008990
r14    0x0
r15    0x57d1e9240f70
rip    0x79a828624c47
rflags 0x10297
cs     0x33
fs     0x0
gs     0x0
SIGABRT: abort
PC=0x79a87a90500b m=3 sigcode=18446744073709551610
signal arrived during cgo execution

goroutine 24 gp=0xc000504380 m=3 mp=0xc00007ae08 [syscall]:
runtime.cgocall(0x57d1d1cbace0, 0xc00008dba0)
        runtime/cgocall.go:167 +0x4b fp=0xc00008db78 sp=0xc00008db40 pc=0x57d1d10a5acb
github.com/ollama/ollama/llama._Cfunc_llama_decode(0x79a7413845e0, {0x1, 0x79a741391980, 0x0, 0x0, 0x79a741392580, 0x79a741393180, 0x79a74149f980, 0x79a741356bc0})
        _cgo_gotypes.go:545 +0x4f fp=0xc00008dba0 sp=0xc00008db78 pc=0x57d1d145b56f
github.com/ollama/ollama/llama.(*Context).Decode.func1(0x57d1d147a48b?, 0x79a7413845e0?)
        github.com/ollama/ollama/llama/llama.go:163 +0xf5 fp=0xc00008dc90 sp=0xc00008dba0 pc=0x57d1d145e295
github.com/ollama/ollama/llama.(*Context).Decode(0x57d1d2bae480?, 0x0?)
        github.com/ollama/ollama/llama/llama.go:163 +0x13 fp=0xc00008dcd8 sp=0xc00008dc90 pc=0x57d1d145e113
github.com/ollama/ollama/runner/llamarunner.(*Server).processBatch(0xc0001ad5f0, 0xc000112960, 0xc00008df20)
        github.com/ollama/ollama/runner/llamarunner/runner.go:435 +0x23f fp=0xc00008dee0 sp=0xc00008dcd8 pc=0x57d1d147927f
github.com/ollama/ollama/runner/llamarunner.(*Server).run(0xc0001ad5f0, {0x57d1d230d920, 0xc000142410})
        github.com/ollama/ollama/runner/llamarunner/runner.go:343 +0x1d5 fp=0xc00008dfb8 sp=0xc00008dee0 pc=0x57d1d1478cb5
github.com/ollama/ollama/runner/llamarunner.Execute.gowrap2()
        github.com/ollama/ollama/runner/llamarunner/runner.go:973 +0x28 fp=0xc00008dfe0 sp=0xc00008dfb8 pc=0x57d1d147db48
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc00008dfe8 sp=0xc00008dfe0 pc=0x57d1d10b45a1
created by github.com/ollama/ollama/runner/llamarunner.Execute in goroutine 1
        github.com/ollama/ollama/runner/llamarunner/runner.go:973 +0xdb5

goroutine 1 gp=0xc0000061c0 m=nil [IO wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc0005875c0 sp=0xc0005875a0 pc=0x57d1d10ac1ce
runtime.netpollblock(0xc0004adf80?, 0xd1042fe6?, 0xd1?)
        runtime/netpoll.go:575 +0xf7 fp=0xc0005875f8 sp=0xc0005875c0 pc=0x57d1d106fe37
internal/poll.runtime_pollWait(0x79a833dc6680, 0x72)
        runtime/netpoll.go:351 +0x85 fp=0xc000587618 sp=0xc0005875f8 pc=0x57d1d10ab4c5
internal/poll.(*pollDesc).wait(0xc000123f00?, 0x900000036?, 0x0)
        internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc000587640 sp=0xc000587618 pc=0x57d1d1133707
internal/poll.(*pollDesc).waitRead(...)
        internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Accept(0xc000123f00)
        internal/poll/fd_unix.go:620 +0x295 fp=0xc0005876e8 sp=0xc000587640 pc=0x57d1d1138ad5
net.(*netFD).accept(0xc000123f00)
        net/fd_unix.go:172 +0x29 fp=0xc0005877a0 sp=0xc0005876e8 pc=0x57d1d11a1bc9
net.(*TCPListener).accept(0xc000146b40)
        net/tcpsock_posix.go:159 +0x1e fp=0xc0005877f0 sp=0xc0005877a0 pc=0x57d1d11b783e
net.(*TCPListener).Accept(0xc000146b40)
        net/tcpsock.go:372 +0x30 fp=0xc000587820 sp=0xc0005877f0 pc=0x57d1d11b66f0
net/http.(*onceCloseListener).Accept(0xc0004ac000?)
        <autogenerated>:1 +0x24 fp=0xc000587838 sp=0xc000587820 pc=0x57d1d1400964
net/http.(*Server).Serve(0xc000154c30, {0x57d1d230b4f8, 0xc000146b40})
        net/http/server.go:3330 +0x30c fp=0xc000587968 sp=0xc000587838 pc=0x57d1d13d88ec
github.com/ollama/ollama/runner/llamarunner.Execute({0xc000036120, 0xe, 0xe})
        github.com/ollama/ollama/runner/llamarunner/runner.go:994 +0x1174 fp=0xc000587d08 sp=0xc000587968 pc=0x57d1d147d834
github.com/ollama/ollama/runner.Execute({0xc000036110?, 0x0?, 0x0?})
        github.com/ollama/ollama/runner/runner.go:22 +0xd4 fp=0xc000587d30 sp=0xc000587d08 pc=0x57d1d16adc54
github.com/ollama/ollama/cmd.NewCLI.func2(0xc000037400?, {0x57d1d1ea8050?, 0x4?, 0x57d1d1ea8054?})
        github.com/ollama/ollama/cmd/cmd.go:1280 +0x45 fp=0xc000587d58 sp=0xc000587d30 pc=0x57d1d1cba245
github.com/spf13/cobra.(*Command).execute(0xc000175b08, {0xc00013e700, 0xe, 0xe})
        github.com/spf13/cobra@v1.7.0/command.go:940 +0x862 fp=0xc000587e78 sp=0xc000587d58 pc=0x57d1d121a902
github.com/spf13/cobra.(*Command).ExecuteC(0xc00046fb08)
        github.com/spf13/cobra@v1.7.0/command.go:1068 +0x3a5 fp=0xc000587f30 sp=0xc000587e78 pc=0x57d1d121b145
github.com/spf13/cobra.(*Command).Execute(...)
        github.com/spf13/cobra@v1.7.0/command.go:992
github.com/spf13/cobra.(*Command).ExecuteContext(...)
        github.com/spf13/cobra@v1.7.0/command.go:985
main.main()
        github.com/ollama/ollama/main.go:12 +0x4d fp=0xc000587f50 sp=0xc000587f30 pc=0x57d1d1cba5cd
runtime.main()
        runtime/proc.go:272 +0x29d fp=0xc000587fe0 sp=0xc000587f50 pc=0x57d1d10774dd
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000587fe8 sp=0xc000587fe0 pc=0x57d1d10b45a1

goroutine 2 gp=0xc000006c40 m=nil [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc000074fa8 sp=0xc000074f88 pc=0x57d1d10ac1ce
runtime.goparkunlock(...)
        runtime/proc.go:430
runtime.forcegchelper()
        runtime/proc.go:337 +0xb8 fp=0xc000074fe0 sp=0xc000074fa8 pc=0x57d1d1077818
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000074fe8 sp=0xc000074fe0 pc=0x57d1d10b45a1
created by runtime.init.7 in goroutine 1
        runtime/proc.go:325 +0x1a

goroutine 3 gp=0xc000007180 m=nil [GC sweep wait]:
runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc000075780 sp=0xc000075760 pc=0x57d1d10ac1ce
runtime.goparkunlock(...)
        runtime/proc.go:430
runtime.bgsweep(0xc00003e080)
        runtime/mgcsweep.go:317 +0xdf fp=0xc0000757c8 sp=0xc000075780 pc=0x57d1d1061ebf
runtime.gcenable.gowrap1()
        runtime/mgc.go:204 +0x25 fp=0xc0000757e0 sp=0xc0000757c8 pc=0x57d1d1056505
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc0000757e8 sp=0xc0000757e0 pc=0x57d1d10b45a1
created by runtime.gcenable in goroutine 1
        runtime/mgc.go:204 +0x66

goroutine 4 gp=0xc000007340 m=nil [GC scavenge wait]:
runtime.gopark(0x10000?, 0x57d1d205b6f8?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc000075f78 sp=0xc000075f58 pc=0x57d1d10ac1ce
runtime.goparkunlock(...)
        runtime/proc.go:430
runtime.(*scavengerState).park(0x57d1d2b02080)
        runtime/mgcscavenge.go:425 +0x49 fp=0xc000075fa8 sp=0xc000075f78 pc=0x57d1d105f889
runtime.bgscavenge(0xc00003e080)
        runtime/mgcscavenge.go:658 +0x59 fp=0xc000075fc8 sp=0xc000075fa8 pc=0x57d1d105fe19
runtime.gcenable.gowrap2()
        runtime/mgc.go:205 +0x25 fp=0xc000075fe0 sp=0xc000075fc8 pc=0x57d1d10564a5
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000075fe8 sp=0xc000075fe0 pc=0x57d1d10b45a1
created by runtime.gcenable in goroutine 1
        runtime/mgc.go:205 +0xa5

goroutine 5 gp=0xc000007c00 m=nil [finalizer wait]:
runtime.gopark(0xc000074648?, 0x57d1d104ca05?, 0xb0?, 0x1?, 0xc0000061c0?)
        runtime/proc.go:424 +0xce fp=0xc000074620 sp=0xc000074600 pc=0x57d1d10ac1ce
runtime.runfinq()
        runtime/mfinal.go:193 +0x107 fp=0xc0000747e0 sp=0xc000074620 pc=0x57d1d1055587
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc0000747e8 sp=0xc0000747e0 pc=0x57d1d10b45a1
created by runtime.createfing in goroutine 1
        runtime/mfinal.go:163 +0x3d

goroutine 6 gp=0xc0001cb500 m=nil [chan receive]:
runtime.gopark(0xc000076760?, 0x57d1d1189245?, 0x60?, 0x69?, 0x57d1d2320280?)
        runtime/proc.go:424 +0xce fp=0xc000076718 sp=0xc0000766f8 pc=0x57d1d10ac1ce
runtime.chanrecv(0xc0000ac310, 0x0, 0x1)
        runtime/chan.go:639 +0x41c fp=0xc000076790 sp=0xc000076718 pc=0x57d1d1045bfc
runtime.chanrecv1(0x0?, 0x0?)
        runtime/chan.go:489 +0x12 fp=0xc0000767b8 sp=0xc000076790 pc=0x57d1d10457b2
runtime.unique_runtime_registerUniqueMapCleanup.func1(...)
        runtime/mgc.go:1781
runtime.unique_runtime_registerUniqueMapCleanup.gowrap1()
        runtime/mgc.go:1784 +0x2f fp=0xc0000767e0 sp=0xc0000767b8 pc=0x57d1d105956f
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc0000767e8 sp=0xc0000767e0 pc=0x57d1d10b45a1
created by unique.runtime_registerUniqueMapCleanup in goroutine 1
        runtime/mgc.go:1779 +0x96

goroutine 7 gp=0xc0001cba40 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc000076f38 sp=0xc000076f18 pc=0x57d1d10ac1ce
runtime.gcBgMarkWorker(0xc0000ad730)
        runtime/mgc.go:1412 +0xe9 fp=0xc000076fc8 sp=0xc000076f38 pc=0x57d1d1058869
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1328 +0x25 fp=0xc000076fe0 sp=0xc000076fc8 pc=0x57d1d1058745
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000076fe8 sp=0xc000076fe0 pc=0x57d1d10b45a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1328 +0x105

goroutine 8 gp=0xc0001cbc00 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc000077738 sp=0xc000077718 pc=0x57d1d10ac1ce
runtime.gcBgMarkWorker(0xc0000ad730)
        runtime/mgc.go:1412 +0xe9 fp=0xc0000777c8 sp=0xc000077738 pc=0x57d1d1058869
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1328 +0x25 fp=0xc0000777e0 sp=0xc0000777c8 pc=0x57d1d1058745
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc0000777e8 sp=0xc0000777e0 pc=0x57d1d10b45a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1328 +0x105

goroutine 9 gp=0xc0001cbdc0 m=nil [GC worker (idle)]:
runtime.gopark(0x1d75291645414?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc000077f38 sp=0xc000077f18 pc=0x57d1d10ac1ce
runtime.gcBgMarkWorker(0xc0000ad730)
        runtime/mgc.go:1412 +0xe9 fp=0xc000077fc8 sp=0xc000077f38 pc=0x57d1d1058869
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1328 +0x25 fp=0xc000077fe0 sp=0xc000077fc8 pc=0x57d1d1058745
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000077fe8 sp=0xc000077fe0 pc=0x57d1d10b45a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1328 +0x105

goroutine 18 gp=0xc000104380 m=nil [GC worker (idle)]:
runtime.gopark(0x1d7529163edb3?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc000070738 sp=0xc000070718 pc=0x57d1d10ac1ce
runtime.gcBgMarkWorker(0xc0000ad730)
        runtime/mgc.go:1412 +0xe9 fp=0xc0000707c8 sp=0xc000070738 pc=0x57d1d1058869
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1328 +0x25 fp=0xc0000707e0 sp=0xc0000707c8 pc=0x57d1d1058745
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc0000707e8 sp=0xc0000707e0 pc=0x57d1d10b45a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1328 +0x105

goroutine 19 gp=0xc000104540 m=nil [GC worker (idle)]:
runtime.gopark(0x1d7529163ed06?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc000070f38 sp=0xc000070f18 pc=0x57d1d10ac1ce
runtime.gcBgMarkWorker(0xc0000ad730)
        runtime/mgc.go:1412 +0xe9 fp=0xc000070fc8 sp=0xc000070f38 pc=0x57d1d1058869
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1328 +0x25 fp=0xc000070fe0 sp=0xc000070fc8 pc=0x57d1d1058745
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000070fe8 sp=0xc000070fe0 pc=0x57d1d10b45a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1328 +0x105

goroutine 20 gp=0xc000104700 m=nil [GC worker (idle)]:
runtime.gopark(0x1d7529163d411?, 0x3?, 0xb?, 0x17?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc000071738 sp=0xc000071718 pc=0x57d1d10ac1ce
runtime.gcBgMarkWorker(0xc0000ad730)
        runtime/mgc.go:1412 +0xe9 fp=0xc0000717c8 sp=0xc000071738 pc=0x57d1d1058869
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1328 +0x25 fp=0xc0000717e0 sp=0xc0000717c8 pc=0x57d1d1058745
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc0000717e8 sp=0xc0000717e0 pc=0x57d1d10b45a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1328 +0x105

goroutine 21 gp=0xc0001048c0 m=nil [GC worker (idle)]:
runtime.gopark(0x1d7529163e7ab?, 0x3?, 0x85?, 0x1a?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc000071f38 sp=0xc000071f18 pc=0x57d1d10ac1ce
runtime.gcBgMarkWorker(0xc0000ad730)
        runtime/mgc.go:1412 +0xe9 fp=0xc000071fc8 sp=0xc000071f38 pc=0x57d1d1058869
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1328 +0x25 fp=0xc000071fe0 sp=0xc000071fc8 pc=0x57d1d1058745
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000071fe8 sp=0xc000071fe0 pc=0x57d1d10b45a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1328 +0x105

goroutine 22 gp=0xc000104a80 m=nil [GC worker (idle)]:
runtime.gopark(0x1d7529161f9a1?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc000072738 sp=0xc000072718 pc=0x57d1d10ac1ce
runtime.gcBgMarkWorker(0xc0000ad730)
        runtime/mgc.go:1412 +0xe9 fp=0xc0000727c8 sp=0xc000072738 pc=0x57d1d1058869
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1328 +0x25 fp=0xc0000727e0 sp=0xc0000727c8 pc=0x57d1d1058745
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc0000727e8 sp=0xc0000727e0 pc=0x57d1d10b45a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1328 +0x105

goroutine 10 gp=0xc000504700 m=nil [chan receive]:
runtime.gopark(0x57d1d10b25b4?, 0xc00021f898?, 0xd0?, 0xa2?, 0xc00021f880?)
        runtime/proc.go:424 +0xce fp=0xc00021f860 sp=0xc00021f840 pc=0x57d1d10ac1ce
runtime.chanrecv(0xc0002f5ab0, 0xc00021fa10, 0x1)
        runtime/chan.go:639 +0x41c fp=0xc00021f8d8 sp=0xc00021f860 pc=0x57d1d1045bfc
runtime.chanrecv1(0xc00069cd80?, 0xc000384808?)
        runtime/chan.go:489 +0x12 fp=0xc00021f900 sp=0xc00021f8d8 pc=0x57d1d10457b2
github.com/ollama/ollama/runner/llamarunner.(*Server).embeddings(0xc0001ad5f0, {0x57d1d230b708, 0xc00013eb60}, 0xc0004823c0)
        github.com/ollama/ollama/runner/llamarunner/runner.go:783 +0x746 fp=0xc00021fac0 sp=0xc00021f900 pc=0x57d1d147bc06
github.com/ollama/ollama/runner/llamarunner.(*Server).embeddings-fm({0x57d1d230b708?, 0xc00013eb60?}, 0x57d1d13e26c7?)
        <autogenerated>:1 +0x36 fp=0xc00021faf0 sp=0xc00021fac0 pc=0x57d1d147dff6
net/http.HandlerFunc.ServeHTTP(0xc00013e8c0?, {0x57d1d230b708?, 0xc00013eb60?}, 0x0?)
        net/http/server.go:2220 +0x29 fp=0xc00021fb18 sp=0xc00021faf0 pc=0x57d1d13d4ee9
net/http.(*ServeMux).ServeHTTP(0x57d1d104ca05?, {0x57d1d230b708, 0xc00013eb60}, 0xc0004823c0)
        net/http/server.go:2747 +0x1ca fp=0xc00021fb68 sp=0xc00021fb18 pc=0x57d1d13d6dea
net/http.serverHandler.ServeHTTP({0x57d1d23080d0?}, {0x57d1d230b708?, 0xc00013eb60?}, 0x6?)
        net/http/server.go:3210 +0x8e fp=0xc00021fb98 sp=0xc00021fb68 pc=0x57d1d13f434e
net/http.(*conn).serve(0xc0004ac000, {0x57d1d230d8e8, 0xc00069cb10})
        net/http/server.go:2092 +0x5d0 fp=0xc00021ffb8 sp=0xc00021fb98 pc=0x57d1d13d3890
net/http.(*Server).Serve.gowrap3()
        net/http/server.go:3360 +0x28 fp=0xc00021ffe0 sp=0xc00021ffb8 pc=0x57d1d13d8ce8
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc00021ffe8 sp=0xc00021ffe0 pc=0x57d1d10b45a1
created by net/http.(*Server).Serve in goroutine 1
        net/http/server.go:3360 +0x485

goroutine 28 gp=0xc000105500 m=nil [IO wait]:
runtime.gopark(0x57d1d1050ee5?, 0xc00001de00?, 0xa5?, 0x10?, 0xb?)
        runtime/proc.go:424 +0xce fp=0xc00001dda8 sp=0xc00001dd88 pc=0x57d1d10ac1ce
runtime.netpollblock(0x57d1d10cf6b8?, 0xd1042fe6?, 0xd1?)
        runtime/netpoll.go:575 +0xf7 fp=0xc00001dde0 sp=0xc00001dda8 pc=0x57d1d106fe37
internal/poll.runtime_pollWait(0x79a833dc6568, 0x72)
        runtime/netpoll.go:351 +0x85 fp=0xc00001de00 sp=0xc00001dde0 pc=0x57d1d10ab4c5
internal/poll.(*pollDesc).wait(0xc0004aa000?, 0xc0000b6641?, 0x0)
        internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc00001de28 sp=0xc00001de00 pc=0x57d1d1133707
internal/poll.(*pollDesc).waitRead(...)
        internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc0004aa000, {0xc0000b6641, 0x1, 0x1})
        internal/poll/fd_unix.go:165 +0x27a fp=0xc00001dec0 sp=0xc00001de28 pc=0x57d1d11349fa
net.(*netFD).Read(0xc0004aa000, {0xc0000b6641?, 0xc00001df48?, 0x57d1d10ade50?})
        net/fd_posix.go:55 +0x25 fp=0xc00001df08 sp=0xc00001dec0 pc=0x57d1d119fc05
net.(*conn).Read(0xc000126030, {0xc0000b6641?, 0x0?, 0x57d1d2bae480?})
        net/net.go:189 +0x45 fp=0xc00001df50 sp=0xc00001df08 pc=0x57d1d11ae205
net.(*TCPConn).Read(0x57d1d2a5ff60?, {0xc0000b6641?, 0x0?, 0x0?})
        <autogenerated>:1 +0x25 fp=0xc00001df80 sp=0xc00001df50 pc=0x57d1d11c1405
net/http.(*connReader).backgroundRead(0xc0000b6630)
        net/http/server.go:690 +0x37 fp=0xc00001dfc8 sp=0xc00001df80 pc=0x57d1d13ce217
net/http.(*connReader).startBackgroundRead.gowrap2()
        net/http/server.go:686 +0x25 fp=0xc00001dfe0 sp=0xc00001dfc8 pc=0x57d1d13ce145
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc00001dfe8 sp=0xc00001dfe0 pc=0x57d1d10b45a1
created by net/http.(*connReader).startBackgroundRead in goroutine 10
        net/http/server.go:686 +0xb6

rax    0x0
rbx    0x79a82be00700
rcx    0x79a87a90500b
rdx    0x0
rdi    0x2
rsi    0x79a82bdff2b0
rbp    0x79a8329a801d
rsp    0x79a82bdff2b0
r8     0x0
r9     0x79a82bdff2b0
r10    0x8
r11    0x246
r12    0x79a8329a8790
r13    0x2108
r14    0x1
r15    0x79a74132e040
rip    0x79a87a90500b
rflags 0x246
cs     0x33
fs     0x0
gs     0x0
[GIN] 2025/03/04 - 15:02:11 | 500 |  6.507803808s |    192.168.1.75 | POST     "/api/embed"
time=2025-03-04T15:02:11.591Z level=ERROR source=routes.go:478 msg="embedding generation failed" error="do embedding request: Post \"http://127.0.0.1:40671/embedding\": EOF"
time=2025-03-04T15:02:11.600Z level=ERROR source=server.go:421 msg="llama runner terminated" error="exit status 2"

OS

Linux

GPU

NVIDIA RTX 3090, 24 GB VRAM

CPU

Intel Core i7-12700KF

Ollama version

0.5.12
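
To rule out the client library, the failure can also be reproduced without LangChain by calling Ollama's `/api/embed` endpoint directly. This is a minimal sketch using only the standard library; the server URL is a placeholder and the model is assumed to be pulled already.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # adjust for your setup
MODEL = "jeffh/intfloat-multilingual-e5-large-instruct:f32"

def embed_payload(text: str) -> bytes:
    """JSON request body for Ollama's /api/embed endpoint."""
    return json.dumps({"model": MODEL, "input": text}).encode()

def embed(text: str) -> list[float]:
    """POST the text to /api/embed and return the first embedding vector."""
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/embed",
        data=embed_payload(text),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["embeddings"][0]

# Usage (requires a running Ollama server):
#   embed("Hello. How are you? Fine. " * 50)   # returns normally
#   embed("Hello . . . . . . . . . . " * 50)   # 500 response, runner crashes
```

If the second call still returns HTTP 500 with the runner log above, the crash is in the runner's `llama_decode` path rather than in LangChain.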

llama_model_loader: - kv 10: bert.block_count u32 = 24 llama_model_loader: - kv 11: bert.context_length u32 = 512 llama_model_loader: - kv 12: bert.embedding_length u32 = 1024 llama_model_loader: - kv 13: bert.feed_forward_length u32 = 4096 llama_model_loader: - kv 14: bert.attention.head_count u32 = 16 llama_model_loader: - kv 15: bert.attention.layer_norm_epsilon f32 = 0.000010 llama_model_loader: - kv 16: general.file_type u32 = 0 llama_model_loader: - kv 17: bert.attention.causal bool = false llama_model_loader: - kv 18: bert.pooling_type u32 = 1 llama_model_loader: - kv 19: tokenizer.ggml.model str = t5 llama_model_loader: - kv 20: tokenizer.ggml.pre str = default llama_model_loader: - kv 21: tokenizer.ggml.tokens arr[str,250002] = ["<s>", "<pad>", "</s>", "<unk>", ","... llama_model_loader: - kv 22: tokenizer.ggml.scores arr[f32,250002] = [0.000000, 0.000000, 0.000000, 0.0000... llama_model_loader: - kv 23: tokenizer.ggml.token_type arr[i32,250002] = [3, 3, 3, 2, 1, 1, 1, 1, 1, 1, 1, 1, ... llama_model_loader: - kv 24: tokenizer.ggml.add_space_prefix bool = true llama_model_loader: - kv 25: tokenizer.ggml.token_type_count u32 = 1 llama_model_loader: - kv 26: tokenizer.ggml.remove_extra_whitespaces bool = true llama_model_loader: - kv 27: tokenizer.ggml.precompiled_charsmap arr[u8,237539] = [0, 180, 2, 0, 0, 132, 0, 0, 0, 0, 0,... 
llama_model_loader: - kv 28: tokenizer.ggml.bos_token_id u32 = 0 llama_model_loader: - kv 29: tokenizer.ggml.eos_token_id u32 = 2 llama_model_loader: - kv 30: tokenizer.ggml.unknown_token_id u32 = 3 llama_model_loader: - kv 31: tokenizer.ggml.seperator_token_id u32 = 2 llama_model_loader: - kv 32: tokenizer.ggml.padding_token_id u32 = 1 llama_model_loader: - kv 33: tokenizer.ggml.mask_token_id u32 = 250001 llama_model_loader: - kv 34: tokenizer.ggml.add_bos_token bool = true llama_model_loader: - kv 35: tokenizer.ggml.add_eos_token bool = true llama_model_loader: - kv 36: general.quantization_version u32 = 2 llama_model_loader: - type f32: 389 tensors llm_load_vocab: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect llm_load_vocab: special tokens cache size = 4 llm_load_vocab: token to piece cache size = 2.1668 MB llm_load_print_meta: format = GGUF V3 (latest) llm_load_print_meta: arch = bert llm_load_print_meta: vocab type = UGM llm_load_print_meta: n_vocab = 250002 llm_load_print_meta: n_merges = 0 llm_load_print_meta: vocab_only = 1 llm_load_print_meta: model type = ?B llm_load_print_meta: model ftype = all F32 llm_load_print_meta: model params = 558.84 M llm_load_print_meta: model size = 2.08 GiB (32.00 BPW) llm_load_print_meta: general.name = multilingual-e5-large-instruct llm_load_print_meta: BOS token = 0 '<s>' llm_load_print_meta: EOS token = 2 '</s>' llm_load_print_meta: UNK token = 3 '<unk>' llm_load_print_meta: SEP token = 2 '</s>' llm_load_print_meta: PAD token = 1 '<pad>' llm_load_print_meta: MASK token = 250001 '[PAD250000]' llm_load_print_meta: LF token = 6 '▁' llm_load_print_meta: EOG token = 2 '</s>' llm_load_print_meta: max token length = 48 llama_model_load: vocab only - skipping tensors //ml/backend/ggml/ggml/src/ggml-cpu/ggml-cpu.c:8456: GGML_ASSERT(i01 >= 0 && i01 < ne01) failed SIGSEGV: segmentation violation PC=0x79a828624c47 m=3 sigcode=1 addr=0x204a03fe0 signal arrived during cgo execution goroutine 24 
gp=0xc000504380 m=3 mp=0xc00007ae08 [syscall]:
runtime.cgocall(0x57d1d1cbace0, 0xc00008dba0)
	runtime/cgocall.go:167 +0x4b fp=0xc00008db78 sp=0xc00008db40 pc=0x57d1d10a5acb
github.com/ollama/ollama/llama._Cfunc_llama_decode(0x79a7413845e0, {0x1, 0x79a741391980, 0x0, 0x0, 0x79a741392580, 0x79a741393180, 0x79a74149f980, 0x79a741356bc0})
	_cgo_gotypes.go:545 +0x4f fp=0xc00008dba0 sp=0xc00008db78 pc=0x57d1d145b56f
github.com/ollama/ollama/llama.(*Context).Decode.func1(0x57d1d147a48b?, 0x79a7413845e0?)
	github.com/ollama/ollama/llama/llama.go:163 +0xf5 fp=0xc00008dc90 sp=0xc00008dba0 pc=0x57d1d145e295
github.com/ollama/ollama/llama.(*Context).Decode(0x57d1d2bae480?, 0x0?)
	github.com/ollama/ollama/llama/llama.go:163 +0x13 fp=0xc00008dcd8 sp=0xc00008dc90 pc=0x57d1d145e113
github.com/ollama/ollama/runner/llamarunner.(*Server).processBatch(0xc0001ad5f0, 0xc000112960, 0xc00008df20)
	github.com/ollama/ollama/runner/llamarunner/runner.go:435 +0x23f fp=0xc00008dee0 sp=0xc00008dcd8 pc=0x57d1d147927f
github.com/ollama/ollama/runner/llamarunner.(*Server).run(0xc0001ad5f0, {0x57d1d230d920, 0xc000142410})
	github.com/ollama/ollama/runner/llamarunner/runner.go:343 +0x1d5 fp=0xc00008dfb8 sp=0xc00008dee0 pc=0x57d1d1478cb5
github.com/ollama/ollama/runner/llamarunner.Execute.gowrap2()
	github.com/ollama/ollama/runner/llamarunner/runner.go:973 +0x28 fp=0xc00008dfe0 sp=0xc00008dfb8 pc=0x57d1d147db48
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc00008dfe8 sp=0xc00008dfe0 pc=0x57d1d10b45a1
created by github.com/ollama/ollama/runner/llamarunner.Execute in goroutine 1
	github.com/ollama/ollama/runner/llamarunner/runner.go:973 +0xdb5

goroutine 1 gp=0xc0000061c0 m=nil [IO wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:424 +0xce fp=0xc0005875c0 sp=0xc0005875a0 pc=0x57d1d10ac1ce
runtime.netpollblock(0xc0004adf80?, 0xd1042fe6?, 0xd1?)
	runtime/netpoll.go:575 +0xf7 fp=0xc0005875f8 sp=0xc0005875c0 pc=0x57d1d106fe37
internal/poll.runtime_pollWait(0x79a833dc6680, 0x72)
	runtime/netpoll.go:351 +0x85 fp=0xc000587618 sp=0xc0005875f8 pc=0x57d1d10ab4c5
internal/poll.(*pollDesc).wait(0xc000123f00?, 0x900000036?, 0x0)
	internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc000587640 sp=0xc000587618 pc=0x57d1d1133707
internal/poll.(*pollDesc).waitRead(...)
	internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Accept(0xc000123f00)
	internal/poll/fd_unix.go:620 +0x295 fp=0xc0005876e8 sp=0xc000587640 pc=0x57d1d1138ad5
net.(*netFD).accept(0xc000123f00)
	net/fd_unix.go:172 +0x29 fp=0xc0005877a0 sp=0xc0005876e8 pc=0x57d1d11a1bc9
net.(*TCPListener).accept(0xc000146b40)
	net/tcpsock_posix.go:159 +0x1e fp=0xc0005877f0 sp=0xc0005877a0 pc=0x57d1d11b783e
net.(*TCPListener).Accept(0xc000146b40)
	net/tcpsock.go:372 +0x30 fp=0xc000587820 sp=0xc0005877f0 pc=0x57d1d11b66f0
net/http.(*onceCloseListener).Accept(0xc0004ac000?)
	<autogenerated>:1 +0x24 fp=0xc000587838 sp=0xc000587820 pc=0x57d1d1400964
net/http.(*Server).Serve(0xc000154c30, {0x57d1d230b4f8, 0xc000146b40})
	net/http/server.go:3330 +0x30c fp=0xc000587968 sp=0xc000587838 pc=0x57d1d13d88ec
github.com/ollama/ollama/runner/llamarunner.Execute({0xc000036120, 0xe, 0xe})
	github.com/ollama/ollama/runner/llamarunner/runner.go:994 +0x1174 fp=0xc000587d08 sp=0xc000587968 pc=0x57d1d147d834
github.com/ollama/ollama/runner.Execute({0xc000036110?, 0x0?, 0x0?})
	github.com/ollama/ollama/runner/runner.go:22 +0xd4 fp=0xc000587d30 sp=0xc000587d08 pc=0x57d1d16adc54
github.com/ollama/ollama/cmd.NewCLI.func2(0xc000037400?, {0x57d1d1ea8050?, 0x4?, 0x57d1d1ea8054?})
	github.com/ollama/ollama/cmd/cmd.go:1280 +0x45 fp=0xc000587d58 sp=0xc000587d30 pc=0x57d1d1cba245
github.com/spf13/cobra.(*Command).execute(0xc000175b08, {0xc00013e700, 0xe, 0xe})
	github.com/spf13/cobra@v1.7.0/command.go:940 +0x862 fp=0xc000587e78 sp=0xc000587d58 pc=0x57d1d121a902
github.com/spf13/cobra.(*Command).ExecuteC(0xc00046fb08)
	github.com/spf13/cobra@v1.7.0/command.go:1068 +0x3a5 fp=0xc000587f30 sp=0xc000587e78 pc=0x57d1d121b145
github.com/spf13/cobra.(*Command).Execute(...)
	github.com/spf13/cobra@v1.7.0/command.go:992
github.com/spf13/cobra.(*Command).ExecuteContext(...)
	github.com/spf13/cobra@v1.7.0/command.go:985
main.main()
	github.com/ollama/ollama/main.go:12 +0x4d fp=0xc000587f50 sp=0xc000587f30 pc=0x57d1d1cba5cd
runtime.main()
	runtime/proc.go:272 +0x29d fp=0xc000587fe0 sp=0xc000587f50 pc=0x57d1d10774dd
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000587fe8 sp=0xc000587fe0 pc=0x57d1d10b45a1

goroutine 2 gp=0xc000006c40 m=nil [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:424 +0xce fp=0xc000074fa8 sp=0xc000074f88 pc=0x57d1d10ac1ce
runtime.goparkunlock(...)
	runtime/proc.go:430
runtime.forcegchelper()
	runtime/proc.go:337 +0xb8 fp=0xc000074fe0 sp=0xc000074fa8 pc=0x57d1d1077818
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000074fe8 sp=0xc000074fe0 pc=0x57d1d10b45a1
created by runtime.init.7 in goroutine 1
	runtime/proc.go:325 +0x1a

goroutine 3 gp=0xc000007180 m=nil [GC sweep wait]:
runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:424 +0xce fp=0xc000075780 sp=0xc000075760 pc=0x57d1d10ac1ce
runtime.goparkunlock(...)
	runtime/proc.go:430
runtime.bgsweep(0xc00003e080)
	runtime/mgcsweep.go:317 +0xdf fp=0xc0000757c8 sp=0xc000075780 pc=0x57d1d1061ebf
runtime.gcenable.gowrap1()
	runtime/mgc.go:204 +0x25 fp=0xc0000757e0 sp=0xc0000757c8 pc=0x57d1d1056505
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc0000757e8 sp=0xc0000757e0 pc=0x57d1d10b45a1
created by runtime.gcenable in goroutine 1
	runtime/mgc.go:204 +0x66

goroutine 4 gp=0xc000007340 m=nil [GC scavenge wait]:
runtime.gopark(0x10000?, 0x57d1d205b6f8?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:424 +0xce fp=0xc000075f78 sp=0xc000075f58 pc=0x57d1d10ac1ce
runtime.goparkunlock(...)
	runtime/proc.go:430
runtime.(*scavengerState).park(0x57d1d2b02080)
	runtime/mgcscavenge.go:425 +0x49 fp=0xc000075fa8 sp=0xc000075f78 pc=0x57d1d105f889
runtime.bgscavenge(0xc00003e080)
	runtime/mgcscavenge.go:658 +0x59 fp=0xc000075fc8 sp=0xc000075fa8 pc=0x57d1d105fe19
runtime.gcenable.gowrap2()
	runtime/mgc.go:205 +0x25 fp=0xc000075fe0 sp=0xc000075fc8 pc=0x57d1d10564a5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000075fe8 sp=0xc000075fe0 pc=0x57d1d10b45a1
created by runtime.gcenable in goroutine 1
	runtime/mgc.go:205 +0xa5

goroutine 5 gp=0xc000007c00 m=nil [finalizer wait]:
runtime.gopark(0xc000074648?, 0x57d1d104ca05?, 0xb0?, 0x1?, 0xc0000061c0?)
	runtime/proc.go:424 +0xce fp=0xc000074620 sp=0xc000074600 pc=0x57d1d10ac1ce
runtime.runfinq()
	runtime/mfinal.go:193 +0x107 fp=0xc0000747e0 sp=0xc000074620 pc=0x57d1d1055587
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc0000747e8 sp=0xc0000747e0 pc=0x57d1d10b45a1
created by runtime.createfing in goroutine 1
	runtime/mfinal.go:163 +0x3d

goroutine 6 gp=0xc0001cb500 m=nil [chan receive]:
runtime.gopark(0xc000076760?, 0x57d1d1189245?, 0x60?, 0x69?, 0x57d1d2320280?)
	runtime/proc.go:424 +0xce fp=0xc000076718 sp=0xc0000766f8 pc=0x57d1d10ac1ce
runtime.chanrecv(0xc0000ac310, 0x0, 0x1)
	runtime/chan.go:639 +0x41c fp=0xc000076790 sp=0xc000076718 pc=0x57d1d1045bfc
runtime.chanrecv1(0x0?, 0x0?)
	runtime/chan.go:489 +0x12 fp=0xc0000767b8 sp=0xc000076790 pc=0x57d1d10457b2
runtime.unique_runtime_registerUniqueMapCleanup.func1(...)
	runtime/mgc.go:1781
runtime.unique_runtime_registerUniqueMapCleanup.gowrap1()
	runtime/mgc.go:1784 +0x2f fp=0xc0000767e0 sp=0xc0000767b8 pc=0x57d1d105956f
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc0000767e8 sp=0xc0000767e0 pc=0x57d1d10b45a1
created by unique.runtime_registerUniqueMapCleanup in goroutine 1
	runtime/mgc.go:1779 +0x96

goroutine 7 gp=0xc0001cba40 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:424 +0xce fp=0xc000076f38 sp=0xc000076f18 pc=0x57d1d10ac1ce
runtime.gcBgMarkWorker(0xc0000ad730)
	runtime/mgc.go:1412 +0xe9 fp=0xc000076fc8 sp=0xc000076f38 pc=0x57d1d1058869
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1328 +0x25 fp=0xc000076fe0 sp=0xc000076fc8 pc=0x57d1d1058745
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000076fe8 sp=0xc000076fe0 pc=0x57d1d10b45a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1328 +0x105

goroutine 8 gp=0xc0001cbc00 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:424 +0xce fp=0xc000077738 sp=0xc000077718 pc=0x57d1d10ac1ce
runtime.gcBgMarkWorker(0xc0000ad730)
	runtime/mgc.go:1412 +0xe9 fp=0xc0000777c8 sp=0xc000077738 pc=0x57d1d1058869
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1328 +0x25 fp=0xc0000777e0 sp=0xc0000777c8 pc=0x57d1d1058745
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc0000777e8 sp=0xc0000777e0 pc=0x57d1d10b45a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1328 +0x105

goroutine 9 gp=0xc0001cbdc0 m=nil [GC worker (idle)]:
runtime.gopark(0x1d75291645414?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:424 +0xce fp=0xc000077f38 sp=0xc000077f18 pc=0x57d1d10ac1ce
runtime.gcBgMarkWorker(0xc0000ad730)
	runtime/mgc.go:1412 +0xe9 fp=0xc000077fc8 sp=0xc000077f38 pc=0x57d1d1058869
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1328 +0x25 fp=0xc000077fe0 sp=0xc000077fc8 pc=0x57d1d1058745
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000077fe8 sp=0xc000077fe0 pc=0x57d1d10b45a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1328 +0x105

goroutine 18 gp=0xc000104380 m=nil [GC worker (idle)]:
runtime.gopark(0x1d7529163edb3?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:424 +0xce fp=0xc000070738 sp=0xc000070718 pc=0x57d1d10ac1ce
runtime.gcBgMarkWorker(0xc0000ad730)
	runtime/mgc.go:1412 +0xe9 fp=0xc0000707c8 sp=0xc000070738 pc=0x57d1d1058869
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1328 +0x25 fp=0xc0000707e0 sp=0xc0000707c8 pc=0x57d1d1058745
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc0000707e8 sp=0xc0000707e0 pc=0x57d1d10b45a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1328 +0x105

goroutine 19 gp=0xc000104540 m=nil [GC worker (idle)]:
runtime.gopark(0x1d7529163ed06?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:424 +0xce fp=0xc000070f38 sp=0xc000070f18 pc=0x57d1d10ac1ce
runtime.gcBgMarkWorker(0xc0000ad730)
	runtime/mgc.go:1412 +0xe9 fp=0xc000070fc8 sp=0xc000070f38 pc=0x57d1d1058869
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1328 +0x25 fp=0xc000070fe0 sp=0xc000070fc8 pc=0x57d1d1058745
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000070fe8 sp=0xc000070fe0 pc=0x57d1d10b45a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1328 +0x105

goroutine 20 gp=0xc000104700 m=nil [GC worker (idle)]:
runtime.gopark(0x1d7529163d411?, 0x3?, 0xb?, 0x17?, 0x0?)
	runtime/proc.go:424 +0xce fp=0xc000071738 sp=0xc000071718 pc=0x57d1d10ac1ce
runtime.gcBgMarkWorker(0xc0000ad730)
	runtime/mgc.go:1412 +0xe9 fp=0xc0000717c8 sp=0xc000071738 pc=0x57d1d1058869
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1328 +0x25 fp=0xc0000717e0 sp=0xc0000717c8 pc=0x57d1d1058745
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc0000717e8 sp=0xc0000717e0 pc=0x57d1d10b45a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1328 +0x105

goroutine 21 gp=0xc0001048c0 m=nil [GC worker (idle)]:
runtime.gopark(0x1d7529163e7ab?, 0x3?, 0x85?, 0x1a?, 0x0?)
	runtime/proc.go:424 +0xce fp=0xc000071f38 sp=0xc000071f18 pc=0x57d1d10ac1ce
runtime.gcBgMarkWorker(0xc0000ad730)
	runtime/mgc.go:1412 +0xe9 fp=0xc000071fc8 sp=0xc000071f38 pc=0x57d1d1058869
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1328 +0x25 fp=0xc000071fe0 sp=0xc000071fc8 pc=0x57d1d1058745
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000071fe8 sp=0xc000071fe0 pc=0x57d1d10b45a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1328 +0x105

goroutine 22 gp=0xc000104a80 m=nil [GC worker (idle)]:
runtime.gopark(0x1d7529161f9a1?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:424 +0xce fp=0xc000072738 sp=0xc000072718 pc=0x57d1d10ac1ce
runtime.gcBgMarkWorker(0xc0000ad730)
	runtime/mgc.go:1412 +0xe9 fp=0xc0000727c8 sp=0xc000072738 pc=0x57d1d1058869
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1328 +0x25 fp=0xc0000727e0 sp=0xc0000727c8 pc=0x57d1d1058745
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc0000727e8 sp=0xc0000727e0 pc=0x57d1d10b45a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1328 +0x105

goroutine 10 gp=0xc000504700 m=nil [chan receive]:
runtime.gopark(0x57d1d10b25b4?, 0xc00021f898?, 0xd0?, 0xa2?, 0xc00021f880?)
	runtime/proc.go:424 +0xce fp=0xc00021f860 sp=0xc00021f840 pc=0x57d1d10ac1ce
runtime.chanrecv(0xc0002f5ab0, 0xc00021fa10, 0x1)
	runtime/chan.go:639 +0x41c fp=0xc00021f8d8 sp=0xc00021f860 pc=0x57d1d1045bfc
runtime.chanrecv1(0xc00069cd80?, 0xc000384808?)
	runtime/chan.go:489 +0x12 fp=0xc00021f900 sp=0xc00021f8d8 pc=0x57d1d10457b2
github.com/ollama/ollama/runner/llamarunner.(*Server).embeddings(0xc0001ad5f0, {0x57d1d230b708, 0xc00013eb60}, 0xc0004823c0)
	github.com/ollama/ollama/runner/llamarunner/runner.go:783 +0x746 fp=0xc00021fac0 sp=0xc00021f900 pc=0x57d1d147bc06
github.com/ollama/ollama/runner/llamarunner.(*Server).embeddings-fm({0x57d1d230b708?, 0xc00013eb60?}, 0x57d1d13e26c7?)
	<autogenerated>:1 +0x36 fp=0xc00021faf0 sp=0xc00021fac0 pc=0x57d1d147dff6
net/http.HandlerFunc.ServeHTTP(0xc00013e8c0?, {0x57d1d230b708?, 0xc00013eb60?}, 0x0?)
	net/http/server.go:2220 +0x29 fp=0xc00021fb18 sp=0xc00021faf0 pc=0x57d1d13d4ee9
net/http.(*ServeMux).ServeHTTP(0x57d1d104ca05?, {0x57d1d230b708, 0xc00013eb60}, 0xc0004823c0)
	net/http/server.go:2747 +0x1ca fp=0xc00021fb68 sp=0xc00021fb18 pc=0x57d1d13d6dea
net/http.serverHandler.ServeHTTP({0x57d1d23080d0?}, {0x57d1d230b708?, 0xc00013eb60?}, 0x6?)
	net/http/server.go:3210 +0x8e fp=0xc00021fb98 sp=0xc00021fb68 pc=0x57d1d13f434e
net/http.(*conn).serve(0xc0004ac000, {0x57d1d230d8e8, 0xc00069cb10})
	net/http/server.go:2092 +0x5d0 fp=0xc00021ffb8 sp=0xc00021fb98 pc=0x57d1d13d3890
net/http.(*Server).Serve.gowrap3()
	net/http/server.go:3360 +0x28 fp=0xc00021ffe0 sp=0xc00021ffb8 pc=0x57d1d13d8ce8
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc00021ffe8 sp=0xc00021ffe0 pc=0x57d1d10b45a1
created by net/http.(*Server).Serve in goroutine 1
	net/http/server.go:3360 +0x485

goroutine 28 gp=0xc000105500 m=nil [IO wait]:
runtime.gopark(0x57d1d1050ee5?, 0xc00001de00?, 0xa5?, 0x10?, 0xb?)
	runtime/proc.go:424 +0xce fp=0xc00001dda8 sp=0xc00001dd88 pc=0x57d1d10ac1ce
runtime.netpollblock(0x57d1d10cf6b8?, 0xd1042fe6?, 0xd1?)
	runtime/netpoll.go:575 +0xf7 fp=0xc00001dde0 sp=0xc00001dda8 pc=0x57d1d106fe37
internal/poll.runtime_pollWait(0x79a833dc6568, 0x72)
	runtime/netpoll.go:351 +0x85 fp=0xc00001de00 sp=0xc00001dde0 pc=0x57d1d10ab4c5
internal/poll.(*pollDesc).wait(0xc0004aa000?, 0xc0000b6641?, 0x0)
	internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc00001de28 sp=0xc00001de00 pc=0x57d1d1133707
internal/poll.(*pollDesc).waitRead(...)
	internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc0004aa000, {0xc0000b6641, 0x1, 0x1})
	internal/poll/fd_unix.go:165 +0x27a fp=0xc00001dec0 sp=0xc00001de28 pc=0x57d1d11349fa
net.(*netFD).Read(0xc0004aa000, {0xc0000b6641?, 0xc00001df48?, 0x57d1d10ade50?})
	net/fd_posix.go:55 +0x25 fp=0xc00001df08 sp=0xc00001dec0 pc=0x57d1d119fc05
net.(*conn).Read(0xc000126030, {0xc0000b6641?, 0x0?, 0x57d1d2bae480?})
	net/net.go:189 +0x45 fp=0xc00001df50 sp=0xc00001df08 pc=0x57d1d11ae205
net.(*TCPConn).Read(0x57d1d2a5ff60?, {0xc0000b6641?, 0x0?, 0x0?})
	<autogenerated>:1 +0x25 fp=0xc00001df80 sp=0xc00001df50 pc=0x57d1d11c1405
net/http.(*connReader).backgroundRead(0xc0000b6630)
	net/http/server.go:690 +0x37 fp=0xc00001dfc8 sp=0xc00001df80 pc=0x57d1d13ce217
net/http.(*connReader).startBackgroundRead.gowrap2()
	net/http/server.go:686 +0x25 fp=0xc00001dfe0 sp=0xc00001dfc8 pc=0x57d1d13ce145
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc00001dfe8 sp=0xc00001dfe0 pc=0x57d1d10b45a1
created by net/http.(*connReader).startBackgroundRead in goroutine 10
	net/http/server.go:686 +0xb6

rax    0x204a03fe0
rbx    0x79a8241707b0
rcx    0xff8
rdx    0x79a824008980
rdi    0x79a824008990
rsi    0x0
rbp    0x79a82bdff2b0
rsp    0x79a82bdff290
r8     0x0
r9     0x79a833c2ea28
r10    0x0
r11    0x246
r12    0x79a741384540
r13    0x79a824008990
r14    0x0
r15    0x57d1e9240f70
rip    0x79a828624c47
rflags 0x10297
cs     0x33
fs     0x0
gs     0x0

SIGABRT: abort
PC=0x79a87a90500b m=3 sigcode=18446744073709551610
signal arrived during cgo execution

goroutine 24 gp=0xc000504380 m=3 mp=0xc00007ae08 [syscall]:
runtime.cgocall(0x57d1d1cbace0, 0xc00008dba0)
	runtime/cgocall.go:167 +0x4b fp=0xc00008db78 sp=0xc00008db40 pc=0x57d1d10a5acb
github.com/ollama/ollama/llama._Cfunc_llama_decode(0x79a7413845e0, {0x1, 0x79a741391980, 0x0, 0x0, 0x79a741392580, 0x79a741393180, 0x79a74149f980, 0x79a741356bc0})
	_cgo_gotypes.go:545 +0x4f fp=0xc00008dba0 sp=0xc00008db78 pc=0x57d1d145b56f
github.com/ollama/ollama/llama.(*Context).Decode.func1(0x57d1d147a48b?, 0x79a7413845e0?)
	github.com/ollama/ollama/llama/llama.go:163 +0xf5 fp=0xc00008dc90 sp=0xc00008dba0 pc=0x57d1d145e295
github.com/ollama/ollama/llama.(*Context).Decode(0x57d1d2bae480?, 0x0?)
	github.com/ollama/ollama/llama/llama.go:163 +0x13 fp=0xc00008dcd8 sp=0xc00008dc90 pc=0x57d1d145e113
github.com/ollama/ollama/runner/llamarunner.(*Server).processBatch(0xc0001ad5f0, 0xc000112960, 0xc00008df20)
	github.com/ollama/ollama/runner/llamarunner/runner.go:435 +0x23f fp=0xc00008dee0 sp=0xc00008dcd8 pc=0x57d1d147927f
github.com/ollama/ollama/runner/llamarunner.(*Server).run(0xc0001ad5f0, {0x57d1d230d920, 0xc000142410})
	github.com/ollama/ollama/runner/llamarunner/runner.go:343 +0x1d5 fp=0xc00008dfb8 sp=0xc00008dee0 pc=0x57d1d1478cb5
github.com/ollama/ollama/runner/llamarunner.Execute.gowrap2()
	github.com/ollama/ollama/runner/llamarunner/runner.go:973 +0x28 fp=0xc00008dfe0 sp=0xc00008dfb8 pc=0x57d1d147db48
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc00008dfe8 sp=0xc00008dfe0 pc=0x57d1d10b45a1
created by github.com/ollama/ollama/runner/llamarunner.Execute in goroutine 1
	github.com/ollama/ollama/runner/llamarunner/runner.go:973 +0xdb5

goroutine 1 gp=0xc0000061c0 m=nil [IO wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:424 +0xce fp=0xc0005875c0 sp=0xc0005875a0 pc=0x57d1d10ac1ce
runtime.netpollblock(0xc0004adf80?, 0xd1042fe6?, 0xd1?)
	runtime/netpoll.go:575 +0xf7 fp=0xc0005875f8 sp=0xc0005875c0 pc=0x57d1d106fe37
internal/poll.runtime_pollWait(0x79a833dc6680, 0x72)
	runtime/netpoll.go:351 +0x85 fp=0xc000587618 sp=0xc0005875f8 pc=0x57d1d10ab4c5
internal/poll.(*pollDesc).wait(0xc000123f00?, 0x900000036?, 0x0)
	internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc000587640 sp=0xc000587618 pc=0x57d1d1133707
internal/poll.(*pollDesc).waitRead(...)
	internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Accept(0xc000123f00)
	internal/poll/fd_unix.go:620 +0x295 fp=0xc0005876e8 sp=0xc000587640 pc=0x57d1d1138ad5
net.(*netFD).accept(0xc000123f00)
	net/fd_unix.go:172 +0x29 fp=0xc0005877a0 sp=0xc0005876e8 pc=0x57d1d11a1bc9
net.(*TCPListener).accept(0xc000146b40)
	net/tcpsock_posix.go:159 +0x1e fp=0xc0005877f0 sp=0xc0005877a0 pc=0x57d1d11b783e
net.(*TCPListener).Accept(0xc000146b40)
	net/tcpsock.go:372 +0x30 fp=0xc000587820 sp=0xc0005877f0 pc=0x57d1d11b66f0
net/http.(*onceCloseListener).Accept(0xc0004ac000?)
	<autogenerated>:1 +0x24 fp=0xc000587838 sp=0xc000587820 pc=0x57d1d1400964
net/http.(*Server).Serve(0xc000154c30, {0x57d1d230b4f8, 0xc000146b40})
	net/http/server.go:3330 +0x30c fp=0xc000587968 sp=0xc000587838 pc=0x57d1d13d88ec
github.com/ollama/ollama/runner/llamarunner.Execute({0xc000036120, 0xe, 0xe})
	github.com/ollama/ollama/runner/llamarunner/runner.go:994 +0x1174 fp=0xc000587d08 sp=0xc000587968 pc=0x57d1d147d834
github.com/ollama/ollama/runner.Execute({0xc000036110?, 0x0?, 0x0?})
	github.com/ollama/ollama/runner/runner.go:22 +0xd4 fp=0xc000587d30 sp=0xc000587d08 pc=0x57d1d16adc54
github.com/ollama/ollama/cmd.NewCLI.func2(0xc000037400?, {0x57d1d1ea8050?, 0x4?, 0x57d1d1ea8054?})
	github.com/ollama/ollama/cmd/cmd.go:1280 +0x45 fp=0xc000587d58 sp=0xc000587d30 pc=0x57d1d1cba245
github.com/spf13/cobra.(*Command).execute(0xc000175b08, {0xc00013e700, 0xe, 0xe})
	github.com/spf13/cobra@v1.7.0/command.go:940 +0x862 fp=0xc000587e78 sp=0xc000587d58 pc=0x57d1d121a902
github.com/spf13/cobra.(*Command).ExecuteC(0xc00046fb08)
	github.com/spf13/cobra@v1.7.0/command.go:1068 +0x3a5 fp=0xc000587f30 sp=0xc000587e78 pc=0x57d1d121b145
github.com/spf13/cobra.(*Command).Execute(...)
	github.com/spf13/cobra@v1.7.0/command.go:992
github.com/spf13/cobra.(*Command).ExecuteContext(...)
	github.com/spf13/cobra@v1.7.0/command.go:985
main.main()
	github.com/ollama/ollama/main.go:12 +0x4d fp=0xc000587f50 sp=0xc000587f30 pc=0x57d1d1cba5cd
runtime.main()
	runtime/proc.go:272 +0x29d fp=0xc000587fe0 sp=0xc000587f50 pc=0x57d1d10774dd
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000587fe8 sp=0xc000587fe0 pc=0x57d1d10b45a1

goroutine 2 gp=0xc000006c40 m=nil [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:424 +0xce fp=0xc000074fa8 sp=0xc000074f88 pc=0x57d1d10ac1ce
runtime.goparkunlock(...)
	runtime/proc.go:430
runtime.forcegchelper()
	runtime/proc.go:337 +0xb8 fp=0xc000074fe0 sp=0xc000074fa8 pc=0x57d1d1077818
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000074fe8 sp=0xc000074fe0 pc=0x57d1d10b45a1
created by runtime.init.7 in goroutine 1
	runtime/proc.go:325 +0x1a

goroutine 3 gp=0xc000007180 m=nil [GC sweep wait]:
runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:424 +0xce fp=0xc000075780 sp=0xc000075760 pc=0x57d1d10ac1ce
runtime.goparkunlock(...)
	runtime/proc.go:430
runtime.bgsweep(0xc00003e080)
	runtime/mgcsweep.go:317 +0xdf fp=0xc0000757c8 sp=0xc000075780 pc=0x57d1d1061ebf
runtime.gcenable.gowrap1()
	runtime/mgc.go:204 +0x25 fp=0xc0000757e0 sp=0xc0000757c8 pc=0x57d1d1056505
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc0000757e8 sp=0xc0000757e0 pc=0x57d1d10b45a1
created by runtime.gcenable in goroutine 1
	runtime/mgc.go:204 +0x66

goroutine 4 gp=0xc000007340 m=nil [GC scavenge wait]:
runtime.gopark(0x10000?, 0x57d1d205b6f8?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:424 +0xce fp=0xc000075f78 sp=0xc000075f58 pc=0x57d1d10ac1ce
runtime.goparkunlock(...)
	runtime/proc.go:430
runtime.(*scavengerState).park(0x57d1d2b02080)
	runtime/mgcscavenge.go:425 +0x49 fp=0xc000075fa8 sp=0xc000075f78 pc=0x57d1d105f889
runtime.bgscavenge(0xc00003e080)
	runtime/mgcscavenge.go:658 +0x59 fp=0xc000075fc8 sp=0xc000075fa8 pc=0x57d1d105fe19
runtime.gcenable.gowrap2()
	runtime/mgc.go:205 +0x25 fp=0xc000075fe0 sp=0xc000075fc8 pc=0x57d1d10564a5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000075fe8 sp=0xc000075fe0 pc=0x57d1d10b45a1
created by runtime.gcenable in goroutine 1
	runtime/mgc.go:205 +0xa5

goroutine 5 gp=0xc000007c00 m=nil [finalizer wait]:
runtime.gopark(0xc000074648?, 0x57d1d104ca05?, 0xb0?, 0x1?, 0xc0000061c0?)
	runtime/proc.go:424 +0xce fp=0xc000074620 sp=0xc000074600 pc=0x57d1d10ac1ce
runtime.runfinq()
	runtime/mfinal.go:193 +0x107 fp=0xc0000747e0 sp=0xc000074620 pc=0x57d1d1055587
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc0000747e8 sp=0xc0000747e0 pc=0x57d1d10b45a1
created by runtime.createfing in goroutine 1
	runtime/mfinal.go:163 +0x3d

goroutine 6 gp=0xc0001cb500 m=nil [chan receive]:
runtime.gopark(0xc000076760?, 0x57d1d1189245?, 0x60?, 0x69?, 0x57d1d2320280?)
	runtime/proc.go:424 +0xce fp=0xc000076718 sp=0xc0000766f8 pc=0x57d1d10ac1ce
runtime.chanrecv(0xc0000ac310, 0x0, 0x1)
	runtime/chan.go:639 +0x41c fp=0xc000076790 sp=0xc000076718 pc=0x57d1d1045bfc
runtime.chanrecv1(0x0?, 0x0?)
	runtime/chan.go:489 +0x12 fp=0xc0000767b8 sp=0xc000076790 pc=0x57d1d10457b2
runtime.unique_runtime_registerUniqueMapCleanup.func1(...)
	runtime/mgc.go:1781
runtime.unique_runtime_registerUniqueMapCleanup.gowrap1()
	runtime/mgc.go:1784 +0x2f fp=0xc0000767e0 sp=0xc0000767b8 pc=0x57d1d105956f
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc0000767e8 sp=0xc0000767e0 pc=0x57d1d10b45a1
created by unique.runtime_registerUniqueMapCleanup in goroutine 1
	runtime/mgc.go:1779 +0x96

goroutine 7 gp=0xc0001cba40 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:424 +0xce fp=0xc000076f38 sp=0xc000076f18 pc=0x57d1d10ac1ce
runtime.gcBgMarkWorker(0xc0000ad730)
	runtime/mgc.go:1412 +0xe9 fp=0xc000076fc8 sp=0xc000076f38 pc=0x57d1d1058869
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1328 +0x25 fp=0xc000076fe0 sp=0xc000076fc8 pc=0x57d1d1058745
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000076fe8 sp=0xc000076fe0 pc=0x57d1d10b45a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1328 +0x105

goroutine 8 gp=0xc0001cbc00 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:424 +0xce fp=0xc000077738 sp=0xc000077718 pc=0x57d1d10ac1ce
runtime.gcBgMarkWorker(0xc0000ad730)
	runtime/mgc.go:1412 +0xe9 fp=0xc0000777c8 sp=0xc000077738 pc=0x57d1d1058869
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1328 +0x25 fp=0xc0000777e0 sp=0xc0000777c8 pc=0x57d1d1058745
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc0000777e8 sp=0xc0000777e0 pc=0x57d1d10b45a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1328 +0x105

goroutine 9 gp=0xc0001cbdc0 m=nil [GC worker (idle)]:
runtime.gopark(0x1d75291645414?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:424 +0xce fp=0xc000077f38 sp=0xc000077f18 pc=0x57d1d10ac1ce
runtime.gcBgMarkWorker(0xc0000ad730)
	runtime/mgc.go:1412 +0xe9 fp=0xc000077fc8 sp=0xc000077f38 pc=0x57d1d1058869
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1328 +0x25 fp=0xc000077fe0 sp=0xc000077fc8 pc=0x57d1d1058745
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000077fe8 sp=0xc000077fe0 pc=0x57d1d10b45a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1328 +0x105

goroutine 18 gp=0xc000104380 m=nil [GC worker (idle)]:
runtime.gopark(0x1d7529163edb3?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:424 +0xce fp=0xc000070738 sp=0xc000070718 pc=0x57d1d10ac1ce
runtime.gcBgMarkWorker(0xc0000ad730)
	runtime/mgc.go:1412 +0xe9 fp=0xc0000707c8 sp=0xc000070738 pc=0x57d1d1058869
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1328 +0x25 fp=0xc0000707e0 sp=0xc0000707c8 pc=0x57d1d1058745
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc0000707e8 sp=0xc0000707e0 pc=0x57d1d10b45a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1328 +0x105

goroutine 19 gp=0xc000104540 m=nil [GC worker (idle)]:
runtime.gopark(0x1d7529163ed06?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:424 +0xce fp=0xc000070f38 sp=0xc000070f18 pc=0x57d1d10ac1ce
runtime.gcBgMarkWorker(0xc0000ad730)
	runtime/mgc.go:1412 +0xe9 fp=0xc000070fc8 sp=0xc000070f38 pc=0x57d1d1058869
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1328 +0x25 fp=0xc000070fe0 sp=0xc000070fc8 pc=0x57d1d1058745
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000070fe8 sp=0xc000070fe0 pc=0x57d1d10b45a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1328 +0x105

goroutine 20 gp=0xc000104700 m=nil [GC worker (idle)]:
runtime.gopark(0x1d7529163d411?, 0x3?, 0xb?, 0x17?, 0x0?)
	runtime/proc.go:424 +0xce fp=0xc000071738 sp=0xc000071718 pc=0x57d1d10ac1ce
runtime.gcBgMarkWorker(0xc0000ad730)
	runtime/mgc.go:1412 +0xe9 fp=0xc0000717c8 sp=0xc000071738 pc=0x57d1d1058869
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1328 +0x25 fp=0xc0000717e0 sp=0xc0000717c8 pc=0x57d1d1058745
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc0000717e8 sp=0xc0000717e0 pc=0x57d1d10b45a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1328 +0x105

goroutine 21 gp=0xc0001048c0 m=nil [GC worker (idle)]:
runtime.gopark(0x1d7529163e7ab?, 0x3?, 0x85?, 0x1a?, 0x0?)
	runtime/proc.go:424 +0xce fp=0xc000071f38 sp=0xc000071f18 pc=0x57d1d10ac1ce
runtime.gcBgMarkWorker(0xc0000ad730)
	runtime/mgc.go:1412 +0xe9 fp=0xc000071fc8 sp=0xc000071f38 pc=0x57d1d1058869
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1328 +0x25 fp=0xc000071fe0 sp=0xc000071fc8 pc=0x57d1d1058745
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000071fe8 sp=0xc000071fe0 pc=0x57d1d10b45a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1328 +0x105

goroutine 22 gp=0xc000104a80 m=nil [GC worker (idle)]:
runtime.gopark(0x1d7529161f9a1?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:424 +0xce fp=0xc000072738 sp=0xc000072718 pc=0x57d1d10ac1ce
runtime.gcBgMarkWorker(0xc0000ad730)
	runtime/mgc.go:1412 +0xe9 fp=0xc0000727c8 sp=0xc000072738 pc=0x57d1d1058869
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1328 +0x25 fp=0xc0000727e0 sp=0xc0000727c8 pc=0x57d1d1058745
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc0000727e8 sp=0xc0000727e0 pc=0x57d1d10b45a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1328 +0x105

goroutine 10 gp=0xc000504700 m=nil [chan receive]:
runtime.gopark(0x57d1d10b25b4?, 0xc00021f898?, 0xd0?, 0xa2?, 0xc00021f880?)
	runtime/proc.go:424 +0xce fp=0xc00021f860 sp=0xc00021f840 pc=0x57d1d10ac1ce
runtime.chanrecv(0xc0002f5ab0, 0xc00021fa10, 0x1)
	runtime/chan.go:639 +0x41c fp=0xc00021f8d8 sp=0xc00021f860 pc=0x57d1d1045bfc
runtime.chanrecv1(0xc00069cd80?, 0xc000384808?)
	runtime/chan.go:489 +0x12 fp=0xc00021f900 sp=0xc00021f8d8 pc=0x57d1d10457b2
github.com/ollama/ollama/runner/llamarunner.(*Server).embeddings(0xc0001ad5f0, {0x57d1d230b708, 0xc00013eb60}, 0xc0004823c0)
	github.com/ollama/ollama/runner/llamarunner/runner.go:783 +0x746 fp=0xc00021fac0 sp=0xc00021f900 pc=0x57d1d147bc06
github.com/ollama/ollama/runner/llamarunner.(*Server).embeddings-fm({0x57d1d230b708?, 0xc00013eb60?}, 0x57d1d13e26c7?)
<autogenerated>:1 +0x36 fp=0xc00021faf0 sp=0xc00021fac0 pc=0x57d1d147dff6 net/http.HandlerFunc.ServeHTTP(0xc00013e8c0?, {0x57d1d230b708?, 0xc00013eb60?}, 0x0?) net/http/server.go:2220 +0x29 fp=0xc00021fb18 sp=0xc00021faf0 pc=0x57d1d13d4ee9 net/http.(*ServeMux).ServeHTTP(0x57d1d104ca05?, {0x57d1d230b708, 0xc00013eb60}, 0xc0004823c0) net/http/server.go:2747 +0x1ca fp=0xc00021fb68 sp=0xc00021fb18 pc=0x57d1d13d6dea net/http.serverHandler.ServeHTTP({0x57d1d23080d0?}, {0x57d1d230b708?, 0xc00013eb60?}, 0x6?) net/http/server.go:3210 +0x8e fp=0xc00021fb98 sp=0xc00021fb68 pc=0x57d1d13f434e net/http.(*conn).serve(0xc0004ac000, {0x57d1d230d8e8, 0xc00069cb10}) net/http/server.go:2092 +0x5d0 fp=0xc00021ffb8 sp=0xc00021fb98 pc=0x57d1d13d3890 net/http.(*Server).Serve.gowrap3() net/http/server.go:3360 +0x28 fp=0xc00021ffe0 sp=0xc00021ffb8 pc=0x57d1d13d8ce8 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc00021ffe8 sp=0xc00021ffe0 pc=0x57d1d10b45a1 created by net/http.(*Server).Serve in goroutine 1 net/http/server.go:3360 +0x485 goroutine 28 gp=0xc000105500 m=nil [IO wait]: runtime.gopark(0x57d1d1050ee5?, 0xc00001de00?, 0xa5?, 0x10?, 0xb?) runtime/proc.go:424 +0xce fp=0xc00001dda8 sp=0xc00001dd88 pc=0x57d1d10ac1ce runtime.netpollblock(0x57d1d10cf6b8?, 0xd1042fe6?, 0xd1?) runtime/netpoll.go:575 +0xf7 fp=0xc00001dde0 sp=0xc00001dda8 pc=0x57d1d106fe37 internal/poll.runtime_pollWait(0x79a833dc6568, 0x72) runtime/netpoll.go:351 +0x85 fp=0xc00001de00 sp=0xc00001dde0 pc=0x57d1d10ab4c5 internal/poll.(*pollDesc).wait(0xc0004aa000?, 0xc0000b6641?, 0x0) internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc00001de28 sp=0xc00001de00 pc=0x57d1d1133707 internal/poll.(*pollDesc).waitRead(...) 
internal/poll/fd_poll_runtime.go:89 internal/poll.(*FD).Read(0xc0004aa000, {0xc0000b6641, 0x1, 0x1}) internal/poll/fd_unix.go:165 +0x27a fp=0xc00001dec0 sp=0xc00001de28 pc=0x57d1d11349fa net.(*netFD).Read(0xc0004aa000, {0xc0000b6641?, 0xc00001df48?, 0x57d1d10ade50?}) net/fd_posix.go:55 +0x25 fp=0xc00001df08 sp=0xc00001dec0 pc=0x57d1d119fc05 net.(*conn).Read(0xc000126030, {0xc0000b6641?, 0x0?, 0x57d1d2bae480?}) net/net.go:189 +0x45 fp=0xc00001df50 sp=0xc00001df08 pc=0x57d1d11ae205 net.(*TCPConn).Read(0x57d1d2a5ff60?, {0xc0000b6641?, 0x0?, 0x0?}) <autogenerated>:1 +0x25 fp=0xc00001df80 sp=0xc00001df50 pc=0x57d1d11c1405 net/http.(*connReader).backgroundRead(0xc0000b6630) net/http/server.go:690 +0x37 fp=0xc00001dfc8 sp=0xc00001df80 pc=0x57d1d13ce217 net/http.(*connReader).startBackgroundRead.gowrap2() net/http/server.go:686 +0x25 fp=0xc00001dfe0 sp=0xc00001dfc8 pc=0x57d1d13ce145 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc00001dfe8 sp=0xc00001dfe0 pc=0x57d1d10b45a1 created by net/http.(*connReader).startBackgroundRead in goroutine 10 net/http/server.go:686 +0xb6 rax 0x0 rbx 0x79a82be00700 rcx 0x79a87a90500b rdx 0x0 rdi 0x2 rsi 0x79a82bdff2b0 rbp 0x79a8329a801d rsp 0x79a82bdff2b0 r8 0x0 r9 0x79a82bdff2b0 r10 0x8 r11 0x246 r12 0x79a8329a8790 r13 0x2108 r14 0x1 r15 0x79a74132e040 rip 0x79a87a90500b rflags 0x246 cs 0x33 fs 0x0 gs 0x0 [GIN] 2025/03/04 - 15:02:11 | 500 | 6.507803808s | 192.168.1.75 | POST "/api/embed" time=2025-03-04T15:02:11.591Z level=ERROR source=routes.go:478 msg="embedding generation failed" error="do embedding request: Post \"http://127.0.0.1:40671/embedding\": EOF" time=2025-03-04T15:02:11.600Z level=ERROR source=server.go:421 msg="llama runner terminated" error="exit status 2"
```

### OS

Linux

### GPU

NVIDIA RTX 3090 (24 GB VRAM)

### CPU

Intel Core i7-12700KF

### Ollama version

0.5.12
GiteaMirror added the bug label 2026-04-12 17:33:25 -05:00

@rick-github commented on GitHub (Mar 4, 2025):

```
llama_new_context_with_model: n_ctx_pre_seq (2048) > n_ctx_train (512) -- possible training context overflow
```

https://github.com/ollama/ollama/issues/7288#issuecomment-2591709109


@ProDG commented on GitHub (Mar 5, 2025):

@rick-github thank you, this is helpful.

In general, it feels like a design flaw, and I can't even say whether it is in Ollama, LangChain, Unstructured, or all of them.

In RAG pipelines, a document is split into chunks (whose size limit is set in _characters_). Then, when the chunks are fed into Ollama to generate embeddings, there is a context size limit in _tokens_. As I understand it, there is no way to ensure that a particular 1000 characters do not exceed 510 tokens other than trying to generate the embedding and seeing whether it fails.

For now, I'll probably just add code that cleans up `" . "` sequences from the initial text; that will help in this particular case and similar ones. I'll also wrap the embedding generation call in `try .. except` so that one problematic chunk doesn't fail the whole pipeline.

But both 'fixes' are ugly. Any suggestions are appreciated.
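For what it's worth, both workarounds can be sketched in a few lines. This is only an illustration (`clean_chunk` and `safe_embed` are hypothetical helpers, and the regex is one arbitrary way to collapse dot leaders), not a real fix for the underlying crash:

```python
import re

def clean_chunk(text: str) -> str:
    # Collapse runs of space-separated dots (dot leaders from a
    # table of contents) into a single ". ".
    return re.sub(r"(?:\s*\.){2,}\s*", ". ", text)

def safe_embed(embed_fn, chunk):
    # Wrap the embedding call so one problematic chunk doesn't
    # fail the whole pipeline; returns None on failure.
    try:
        return embed_fn(chunk)
    except Exception:
        return None
```

Normal prose passes through `clean_chunk` unchanged; only multi-dot runs are collapsed.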


@rick-github commented on GitHub (Mar 5, 2025):

If you set `num_ctx` to 512, it won't fail. The issue is that the pipeline's chunk size is too big. Ignoring the Ollama failure: if you send a chunk for embedding that's too big for the context window of the model, the embedding you get back is going to be missing information. If that's acceptable for the RAG application, then just set `num_ctx`. If not, then either the chunk size needs to be set to a static size that doesn't exceed the context buffer, or it needs to be determined dynamically during embedding. You're right that the tokenization is opaque, see #3582. Until then, the only option is to determine a character:token ratio and use that to set the chunk size.
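As a sketch (assuming the `/api/embed` request shape, where model options such as `num_ctx` go in an `options` field; LangChain's `OllamaEmbeddings` also accepts a `num_ctx` argument), the request body would look like:

```python
# Sketch: cap the runner's context at the model's n_ctx_train (512)
# via the "options" field of an Ollama /api/embed request.
payload = {
    "model": "jeffh/intfloat-multilingual-e5-large-instruct:f32",
    "input": "Hello . . . . . . . . . . " * 50,
    "options": {"num_ctx": 512},
}
# e.g. requests.post("http://localhost:11434/api/embed", json=payload)
```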

```console
$ curl -s localhost:11434/api/embed -d '{"model":"jeffh/intfloat-multilingual-e5-large-instruct:f32","input":"'"$(python -c 'print("Hello. How are you? Fine. " * 20, end="")')"'"}' | jq .prompt_eval_count
160
$ curl -s localhost:11434/api/embed -d '{"model":"jeffh/intfloat-multilingual-e5-large-instruct:f32","input":"'"$(python -c 'print("Hello . . . . . . . . . . " * 20, end="")')"'"}' | jq .prompt_eval_count
420
$ python -c 'print("Hello . . . . . . . . . . " * 20, end="")' | wc -c
520
```

So the worst-case ratio is about 520:420, giving a chunk size of 512 * (520 / 420) ≈ 633.

Reference: github-starred/ollama#6188