[GH-ISSUE #12585] Ollama v0.12.5: nomic-embed-text is failing multiple times #70410

Open
opened 2026-05-04 21:26:52 -05:00 by GiteaMirror · 5 comments
Owner

Originally created by @premsai2030 on GitHub (Oct 12, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/12585

Originally assigned to: @jmorganca on GitHub.

What is the issue?

I am using the Ollama API to call the nomic-embed-text model, and it is failing intermittently. The input text itself is not the problem: if I run the same API call 10 times, the success rate is roughly 50-50, and this only happens on the newer version.

Relevant log output

500 Server Error: Internal Server Error for url: http://localhost:11434/api/embeddings
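The intermittent failure described above can be checked with a small script like the following. This is only a sketch: the endpoint and model name come from the report, but the function names, payload field `prompt`, and retry count are illustrative assumptions.

```python
import json
import urllib.request
import urllib.error

def embed_once(text, url="http://localhost:11434/api/embeddings",
               model="nomic-embed-text"):
    """POST one embedding request; True on HTTP 200, False on any failure."""
    body = json.dumps({"model": model, "prompt": text}).encode()
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"})
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status == 200
    except urllib.error.URLError:
        # Covers the 500 Internal Server Error seen in the log,
        # as well as connection failures.
        return False

def success_rate(call, attempts=10):
    """Run the same call repeatedly and report the fraction that succeed."""
    ok = sum(1 for _ in range(attempts) if call())
    return ok / attempts
```

With a server exhibiting the bug, `success_rate(lambda: embed_once("hello world"))` should come back around 0.5 rather than 1.0.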

OS

macOS

GPU

Apple

CPU

Apple

Ollama version

ollama version is 0.12.5

GiteaMirror added the embeddings, bug labels 2026-05-04 21:26:53 -05:00
Author
Owner

@phrozen commented on GitHub (Oct 12, 2025):

I'll just add here instead of opening a new issue. I am using Qwen3-Embedding:8B and testing both the Q8 and Q4_K_M quantizations, and as of this morning (when I updated to 0.12.5) I am getting a ton of zero/infinite vectors when embedding in batches across different dimensions (1024, 2048, 4096). The same data was fine yesterday, so this is definitely an issue with the latest version. It was caught by integration testing and pinned down to the latest version (0.12.5).

Edit: Confirmed. After downgrading to 0.12.3, my embeddings are back to normal. I'd caution anyone using embedding models against upgrading. This is on macOS Tahoe 26.0.1 (M2 Max).
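The zero/infinite-vector symptom described above is easy to catch automatically, which is presumably how the integration tests flagged it. A minimal validity check (illustrative; the function name and exact criteria are not from the report) might look like:

```python
import math

def is_valid_embedding(vec):
    """Reject embeddings that are empty, contain NaN/inf, or are all zeros."""
    if not vec:
        return False
    if any(not math.isfinite(x) for x in vec):
        return False  # NaN or infinite components
    if all(x == 0.0 for x in vec):
        return False  # degenerate all-zero vector
    return True
```

Running every returned vector through a check like this after an Ollama upgrade would surface this class of regression before bad vectors reach an index.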


@frikishaan commented on GitHub (Oct 13, 2025):

Embeddings are not working for me either: the API does not respond, and I am also unable to run any text models. I tried embeddinggemma:300m-qat-q8_0 and nomic-embed-text:latest after updating to v0.12.5 this morning.

Platform - Windows 10


@xpomul commented on GitHub (Oct 13, 2025):

I have the same issue on a MacBook Pro (M1).

Log output:

msg="server config" env="map[HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:true OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_KEEP_ALIVE:4h0m0s OLLAMA_KV_CACHE_TYPE:q8_0 OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:3 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/Users/stefan/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[* http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false http_proxy: https_proxy: no_proxy:]"
....
time=2025-10-13T17:02:25.698+02:00 level=INFO source=runner.go:799 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:false KvSize:2048 KvCacheType: NumThreads:8 GPULayers:13[ID:0 Layers:13(0..12)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:true}"
llama_model_load_from_file_impl: using device Metal (Apple M1 Max) (unknown id) - 53083 MiB free
time=2025-10-13T17:02:25.698+02:00 level=INFO source=server.go:1271 msg="waiting for llama runner to start responding"
time=2025-10-13T17:02:25.698+02:00 level=INFO source=server.go:1305 msg="waiting for server to become available" status="llm server loading model"
llama_model_loader: loaded meta data with 24 key-value pairs and 112 tensors from /Users/stefan/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = nomic-bert
llama_model_loader: - kv   1:                               general.name str              = nomic-embed-text-v1.5
llama_model_loader: - kv   2:                     nomic-bert.block_count u32              = 12
llama_model_loader: - kv   3:                  nomic-bert.context_length u32              = 2048
llama_model_loader: - kv   4:                nomic-bert.embedding_length u32              = 768
llama_model_loader: - kv   5:             nomic-bert.feed_forward_length u32              = 3072
llama_model_loader: - kv   6:            nomic-bert.attention.head_count u32              = 12
llama_model_loader: - kv   7:    nomic-bert.attention.layer_norm_epsilon f32              = 0.000000
llama_model_loader: - kv   8:                          general.file_type u32              = 1
llama_model_loader: - kv   9:                nomic-bert.attention.causal bool             = false
llama_model_loader: - kv  10:                    nomic-bert.pooling_type u32              = 1
llama_model_loader: - kv  11:                  nomic-bert.rope.freq_base f32              = 1000.000000
llama_model_loader: - kv  12:            tokenizer.ggml.token_type_count u32              = 2
llama_model_loader: - kv  13:                tokenizer.ggml.bos_token_id u32              = 101
llama_model_loader: - kv  14:                tokenizer.ggml.eos_token_id u32              = 102
llama_model_loader: - kv  15:                       tokenizer.ggml.model str              = bert
llama_model_loader: - kv  16:                      tokenizer.ggml.tokens arr[str,30522]   = ["[PAD]", "[unused0]", "[unused1]", "...
llama_model_loader: - kv  17:                      tokenizer.ggml.scores arr[f32,30522]   = [-1000.000000, -1000.000000, -1000.00...
llama_model_loader: - kv  18:                  tokenizer.ggml.token_type arr[i32,30522]   = [3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  19:            tokenizer.ggml.unknown_token_id u32              = 100
llama_model_loader: - kv  20:          tokenizer.ggml.seperator_token_id u32              = 102
llama_model_loader: - kv  21:            tokenizer.ggml.padding_token_id u32              = 0
llama_model_loader: - kv  22:                tokenizer.ggml.cls_token_id u32              = 101
llama_model_loader: - kv  23:               tokenizer.ggml.mask_token_id u32              = 103
llama_model_loader: - type  f32:   51 tensors
llama_model_loader: - type  f16:   61 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = F16
print_info: file size   = 260.86 MiB (16.00 BPW) 
load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
load: printing all EOG tokens:
load:   - 102 ('[SEP]')
load: special tokens cache size = 5
load: token to piece cache size = 0.2032 MB
print_info: arch             = nomic-bert
print_info: vocab_only       = 0
print_info: n_ctx_train      = 2048
print_info: n_embd           = 768
print_info: n_layer          = 12
print_info: n_head           = 12
print_info: n_head_kv        = 12
print_info: n_rot            = 64
print_info: n_swa            = 0
print_info: is_swa_any       = 0
print_info: n_embd_head_k    = 64
print_info: n_embd_head_v    = 64
print_info: n_gqa            = 1
print_info: n_embd_k_gqa     = 768
print_info: n_embd_v_gqa     = 768
print_info: f_norm_eps       = 1.0e-12
print_info: f_norm_rms_eps   = 0.0e+00
print_info: f_clamp_kqv      = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale    = 0.0e+00
print_info: f_attn_scale     = 0.0e+00
print_info: n_ff             = 3072
print_info: n_expert         = 0
print_info: n_expert_used    = 0
print_info: causal attn      = 0
print_info: pooling type     = 1
print_info: rope type        = 2
print_info: rope scaling     = linear
print_info: freq_base_train  = 1000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn  = 2048
print_info: rope_finetuned   = unknown
print_info: model type       = 137M
print_info: model params     = 136.73 M
print_info: general.name     = nomic-embed-text-v1.5
print_info: vocab type       = WPM
print_info: n_vocab          = 30522
print_info: n_merges         = 0
print_info: BOS token        = 101 '[CLS]'
print_info: EOS token        = 102 '[SEP]'
print_info: UNK token        = 100 '[UNK]'
print_info: SEP token        = 102 '[SEP]'
print_info: PAD token        = 0 '[PAD]'
print_info: MASK token       = 103 '[MASK]'
print_info: LF token         = 0 '[PAD]'
print_info: EOG token        = 102 '[SEP]'
print_info: max token length = 21
load_tensors: loading model tensors, this can take a while... (mmap = true)
load_tensors: offloading 12 repeating layers to GPU
load_tensors: offloading output layer to GPU
load_tensors: offloaded 13/13 layers to GPU
load_tensors:   CPU_Mapped model buffer size =    44.72 MiB
load_tensors: Metal_Mapped model buffer size =   216.14 MiB
llama_context: constructing llama_context
llama_context: n_seq_max     = 1
llama_context: n_ctx         = 2048
llama_context: n_ctx_per_seq = 2048
llama_context: n_batch       = 512
llama_context: n_ubatch      = 512
llama_context: causal_attn   = 0
llama_context: flash_attn    = disabled
llama_context: kv_unified    = false
llama_context: freq_base     = 1000.0
llama_context: freq_scale    = 1
ggml_metal_init: allocating
ggml_metal_init: picking default device: Apple M1 Max
ggml_metal_init: use bfloat         = true
ggml_metal_init: use fusion         = true
ggml_metal_init: use concurrency    = true
ggml_metal_init: use graph optimize = true
llama_context:        CPU  output buffer size =     0.12 MiB
llama_context:      Metal compute buffer size =    23.51 MiB
llama_context:        CPU compute buffer size =     4.01 MiB
llama_context: graph nodes  = 371
llama_context: graph splits = 2
time=2025-10-13T17:02:25.951+02:00 level=INFO source=server.go:1309 msg="llama runner started in 0.30 seconds"
time=2025-10-13T17:02:25.951+02:00 level=INFO source=sched.go:481 msg="loaded runners" count=1
time=2025-10-13T17:02:25.951+02:00 level=INFO source=server.go:1271 msg="waiting for llama runner to start responding"
time=2025-10-13T17:02:25.952+02:00 level=INFO source=server.go:1309 msg="llama runner started in 0.30 seconds"
init: embeddings required but some input tokens were not marked as outputs -> overriding
output_reserve: reallocating output buffer from size 0.12 MiB to 61.11 MiB
init: embeddings required but some input tokens were not marked as outputs -> overriding
SIGTRAP: trace trap
PC=0x1815b6b1c m=4 sigcode=0
signal arrived during cgo execution

goroutine 22 gp=0x14000503180 m=4 mp=0x14000073808 [syscall]:
runtime.cgocall(0x1051458ec, 0x14000687ba8)
	runtime/cgocall.go:167 +0x44 fp=0x14000687b70 sp=0x14000687b30 pc=0x10465e824
github.com/ollama/ollama/llama._Cfunc_llama_decode(0x1062bd2f0, {0x200, 0xcb70ce800, 0x0, 0xcb6820800, 0x1062c5570, 0xcb6f2a800, 0x1062bfb50})
	_cgo_gotypes.go:674 +0x30 fp=0x14000687ba0 sp=0x14000687b70 pc=0x1049a4320
github.com/ollama/ollama/llama.(*Context).Decode.func1(...)
	github.com/ollama/ollama/llama/llama.go:158
github.com/ollama/ollama/llama.(*Context).Decode(0x140003e2c08?, 0xb87?)
	github.com/ollama/ollama/llama/llama.go:158 +0xcc fp=0x14000687c90 sp=0x14000687ba0 pc=0x1049a63cc
github.com/ollama/ollama/runner/llamarunner.(*Server).processBatch(0x14000421680, 0x14000386140, 0x14000687f18)
	github.com/ollama/ollama/runner/llamarunner/runner.go:439 +0x1c8 fp=0x14000687ed0 sp=0x14000687c90 pc=0x104a4aac8
github.com/ollama/ollama/runner/llamarunner.(*Server).run(0x14000421680, {0x1058430c0, 0x1400004f630})
	github.com/ollama/ollama/runner/llamarunner/runner.go:343 +0x15c fp=0x14000687fa0 sp=0x14000687ed0 pc=0x104a4a79c
github.com/ollama/ollama/runner/llamarunner.Execute.gowrap1()
	github.com/ollama/ollama/runner/llamarunner/runner.go:880 +0x2c fp=0x14000687fd0 sp=0x14000687fa0 pc=0x104a4e43c
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x14000687fd0 sp=0x14000687fd0 pc=0x104669f94
created by github.com/ollama/ollama/runner/llamarunner.Execute in goroutine 1
	github.com/ollama/ollama/runner/llamarunner/runner.go:880 +0x418

goroutine 1 gp=0x140000021c0 m=nil [IO wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x140000b1720 sp=0x140000b1700 pc=0x104661bf0
runtime.netpollblock(0x140000b17b8?, 0x46e408c?, 0x1?)
	runtime/netpoll.go:575 +0x150 fp=0x140000b1760 sp=0x140000b1720 pc=0x1046277c0
internal/poll.runtime_pollWait(0x131a2fc00, 0x72)
	runtime/netpoll.go:351 +0xa0 fp=0x140000b1790 sp=0x140000b1760 pc=0x104660e20
internal/poll.(*pollDesc).wait(0x140001d8000?, 0x1046e611c?, 0x0)
	internal/poll/fd_poll_runtime.go:84 +0x28 fp=0x140000b17c0 sp=0x140000b1790 pc=0x1046dfb88
internal/poll.(*pollDesc).waitRead(...)
	internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Accept(0x140001d8000)
	internal/poll/fd_unix.go:613 +0x21c fp=0x140000b1870 sp=0x140000b17c0 pc=0x1046e416c
net.(*netFD).accept(0x140001d8000)
	net/fd_unix.go:161 +0x28 fp=0x140000b1930 sp=0x140000b1870 pc=0x104745ed8
net.(*TCPListener).accept(0x14000428000)
	net/tcpsock_posix.go:159 +0x24 fp=0x140000b1980 sp=0x140000b1930 pc=0x104759544
net.(*TCPListener).Accept(0x14000428000)
	net/tcpsock.go:380 +0x2c fp=0x140000b19c0 sp=0x140000b1980 pc=0x1047585ec
net/http.(*onceCloseListener).Accept(0x140000c0090?)
	<autogenerated>:1 +0x2c fp=0x140000b19e0 sp=0x140000b19c0 pc=0x10492ce5c
net/http.(*Server).Serve(0x140001fa700, {0x105840a88, 0x14000428000})
	net/http/server.go:3463 +0x24c fp=0x140000b1b10 sp=0x140000b19e0 pc=0x104907fec
github.com/ollama/ollama/runner/llamarunner.Execute({0x14000136140, 0x4, 0x4})
	github.com/ollama/ollama/runner/llamarunner/runner.go:901 +0x754 fp=0x140000b1ce0 sp=0x140000b1b10 pc=0x104a4e254
github.com/ollama/ollama/runner.Execute({0x14000136130?, 0x0?, 0x0?})
	github.com/ollama/ollama/runner/runner.go:22 +0x128 fp=0x140000b1d10 sp=0x140000b1ce0 pc=0x104abe5f8
github.com/ollama/ollama/cmd.NewCLI.func2(0x140000a8b00?, {0x1053973cb?, 0x4?, 0x1053973cf?})
	github.com/ollama/ollama/cmd/cmd.go:1769 +0x50 fp=0x140000b1d40 sp=0x140000b1d10 pc=0x1050f67d0
github.com/spf13/cobra.(*Command).execute(0x140001fd208, {0x140005aba80, 0x4, 0x4})
	github.com/spf13/cobra@v1.7.0/command.go:940 +0x5e0 fp=0x140000b1e60 sp=0x140000b1d40 pc=0x1047b0660
github.com/spf13/cobra.(*Command).ExecuteC(0x140005bec08)
	github.com/spf13/cobra@v1.7.0/command.go:1068 +0x2ec fp=0x140000b1f20 sp=0x140000b1e60 pc=0x1047b0d3c
github.com/spf13/cobra.(*Command).Execute(...)
	github.com/spf13/cobra@v1.7.0/command.go:992
github.com/spf13/cobra.(*Command).ExecuteContext(...)
	github.com/spf13/cobra@v1.7.0/command.go:985
main.main()
	github.com/ollama/ollama/main.go:12 +0x54 fp=0x140000b1f40 sp=0x140000b1f20 pc=0x1050f72e4
runtime.main()
	runtime/proc.go:285 +0x278 fp=0x140000b1fd0 sp=0x140000b1f40 pc=0x10462e278
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x140000b1fd0 sp=0x140000b1fd0 pc=0x104669f94

goroutine 2 gp=0x14000002c40 m=nil [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x1400006cf90 sp=0x1400006cf70 pc=0x104661bf0
runtime.goparkunlock(...)
	runtime/proc.go:466
runtime.forcegchelper()
	runtime/proc.go:373 +0xb4 fp=0x1400006cfd0 sp=0x1400006cf90 pc=0x10462e5c4
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400006cfd0 sp=0x1400006cfd0 pc=0x104669f94
created by runtime.init.7 in goroutine 1
	runtime/proc.go:361 +0x24

goroutine 3 gp=0x14000003180 m=nil [GC sweep wait]:
runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x1400006d760 sp=0x1400006d740 pc=0x104661bf0
runtime.goparkunlock(...)
	runtime/proc.go:466
runtime.bgsweep(0x14000098000)
	runtime/mgcsweep.go:323 +0x104 fp=0x1400006d7b0 sp=0x1400006d760 pc=0x104619084
runtime.gcenable.gowrap1()
	runtime/mgc.go:212 +0x28 fp=0x1400006d7d0 sp=0x1400006d7b0 pc=0x10460ccd8
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400006d7d0 sp=0x1400006d7d0 pc=0x104669f94
created by runtime.gcenable in goroutine 1
	runtime/mgc.go:212 +0x6c

goroutine 4 gp=0x14000003340 m=nil [GC scavenge wait]:
runtime.gopark(0x10000?, 0x105553498?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x1400006df60 sp=0x1400006df40 pc=0x104661bf0
runtime.goparkunlock(...)
	runtime/proc.go:466
runtime.(*scavengerState).park(0x106128ba0)
	runtime/mgcscavenge.go:425 +0x5c fp=0x1400006df90 sp=0x1400006df60 pc=0x104616b9c
runtime.bgscavenge(0x14000098000)
	runtime/mgcscavenge.go:658 +0xac fp=0x1400006dfb0 sp=0x1400006df90 pc=0x10461713c
runtime.gcenable.gowrap2()
	runtime/mgc.go:213 +0x28 fp=0x1400006dfd0 sp=0x1400006dfb0 pc=0x10460cc78
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400006dfd0 sp=0x1400006dfd0 pc=0x104669f94
created by runtime.gcenable in goroutine 1
	runtime/mgc.go:213 +0xac

goroutine 18 gp=0x14000102700 m=nil [finalizer wait]:
runtime.gopark(0x105700f80?, 0x1e?, 0xc8?, 0xc5?, 0x1000000000000?)
	runtime/proc.go:460 +0xc0 fp=0x1400006c580 sp=0x1400006c560 pc=0x104661bf0
runtime.runFinalizers()
	runtime/mfinal.go:210 +0x104 fp=0x1400006c7d0 sp=0x1400006c580 pc=0x10460bcc4
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400006c7d0 sp=0x1400006c7d0 pc=0x104669f94
created by runtime.createfing in goroutine 1
	runtime/mfinal.go:172 +0x78

goroutine 19 gp=0x14000103180 m=nil [cleanup wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000068740 sp=0x14000068720 pc=0x104661bf0
runtime.goparkunlock(...)
	runtime/proc.go:466
runtime.(*cleanupQueue).dequeue(0x106129a80)
	runtime/mcleanup.go:439 +0x110 fp=0x14000068780 sp=0x14000068740 pc=0x1046091b0
runtime.runCleanups()
	runtime/mcleanup.go:635 +0x40 fp=0x140000687d0 sp=0x14000068780 pc=0x1046099c0
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x140000687d0 sp=0x140000687d0 pc=0x104669f94
created by runtime.(*cleanupQueue).createGs in goroutine 1
	runtime/mcleanup.go:589 +0x108

goroutine 20 gp=0x14000103500 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000068f10 sp=0x14000068ef0 pc=0x104661bf0
runtime.gcBgMarkWorker(0x14000111880)
	runtime/mgc.go:1463 +0xe0 fp=0x14000068fb0 sp=0x14000068f10 pc=0x10460f350
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x14000068fd0 sp=0x14000068fb0 pc=0x10460f238
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x14000068fd0 sp=0x14000068fd0 pc=0x104669f94
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 5 gp=0x14000003880 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x1400006e710 sp=0x1400006e6f0 pc=0x104661bf0
runtime.gcBgMarkWorker(0x14000111880)
	runtime/mgc.go:1463 +0xe0 fp=0x1400006e7b0 sp=0x1400006e710 pc=0x10460f350
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x1400006e7d0 sp=0x1400006e7b0 pc=0x10460f238
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400006e7d0 sp=0x1400006e7d0 pc=0x104669f94
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 34 gp=0x14000502380 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000518710 sp=0x140005186f0 pc=0x104661bf0
runtime.gcBgMarkWorker(0x14000111880)
	runtime/mgc.go:1463 +0xe0 fp=0x140005187b0 sp=0x14000518710 pc=0x10460f350
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x140005187d0 sp=0x140005187b0 pc=0x10460f238
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x140005187d0 sp=0x140005187d0 pc=0x104669f94
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 35 gp=0x14000502540 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000518f10 sp=0x14000518ef0 pc=0x104661bf0
runtime.gcBgMarkWorker(0x14000111880)
	runtime/mgc.go:1463 +0xe0 fp=0x14000518fb0 sp=0x14000518f10 pc=0x10460f350
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x14000518fd0 sp=0x14000518fb0 pc=0x10460f238
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x14000518fd0 sp=0x14000518fd0 pc=0x104669f94
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 36 gp=0x14000502700 m=nil [GC worker (idle)]:
runtime.gopark(0x322f65c9b3226?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000680f10 sp=0x14000680ef0 pc=0x104661bf0
runtime.gcBgMarkWorker(0x14000111880)
	runtime/mgc.go:1463 +0xe0 fp=0x14000680fb0 sp=0x14000680f10 pc=0x10460f350
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x14000680fd0 sp=0x14000680fb0 pc=0x10460f238
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x14000680fd0 sp=0x14000680fd0 pc=0x104669f94
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 21 gp=0x140001036c0 m=nil [GC worker (idle)]:
runtime.gopark(0x322f65c9b656b?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000069710 sp=0x140000696f0 pc=0x104661bf0
runtime.gcBgMarkWorker(0x14000111880)
	runtime/mgc.go:1463 +0xe0 fp=0x140000697b0 sp=0x14000069710 pc=0x10460f350
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x140000697d0 sp=0x140000697b0 pc=0x10460f238
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x140000697d0 sp=0x140000697d0 pc=0x104669f94
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 6 gp=0x14000003a40 m=nil [GC worker (idle)]:
runtime.gopark(0x322f65c9b13b6?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000684f10 sp=0x14000684ef0 pc=0x104661bf0
runtime.gcBgMarkWorker(0x14000111880)
	runtime/mgc.go:1463 +0xe0 fp=0x14000684fb0 sp=0x14000684f10 pc=0x10460f350
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x14000684fd0 sp=0x14000684fb0 pc=0x10460f238
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x14000684fd0 sp=0x14000684fd0 pc=0x104669f94
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 37 gp=0x140005028c0 m=nil [GC worker (idle)]:
runtime.gopark(0x322f65c9b66e2?, 0x3?, 0xc?, 0x5?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000519f10 sp=0x14000519ef0 pc=0x104661bf0
runtime.gcBgMarkWorker(0x14000111880)
	runtime/mgc.go:1463 +0xe0 fp=0x14000519fb0 sp=0x14000519f10 pc=0x10460f350
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x14000519fd0 sp=0x14000519fb0 pc=0x10460f238
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x14000519fd0 sp=0x14000519fd0 pc=0x104669f94
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 7 gp=0x14000003c00 m=nil [GC worker (idle)]:
runtime.gopark(0x322f65c9b6665?, 0x3?, 0x4c?, 0x1d?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x1400006f710 sp=0x1400006f6f0 pc=0x104661bf0
runtime.gcBgMarkWorker(0x14000111880)
	runtime/mgc.go:1463 +0xe0 fp=0x1400006f7b0 sp=0x1400006f710 pc=0x10460f350
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x1400006f7d0 sp=0x1400006f7b0 pc=0x10460f238
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400006f7d0 sp=0x1400006f7d0 pc=0x104669f94
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 8 gp=0x14000003dc0 m=nil [GC worker (idle)]:
runtime.gopark(0x322f65c9add83?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x140000b9f10 sp=0x140000b9ef0 pc=0x104661bf0
runtime.gcBgMarkWorker(0x14000111880)
	runtime/mgc.go:1463 +0xe0 fp=0x140000b9fb0 sp=0x140000b9f10 pc=0x10460f350
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x140000b9fd0 sp=0x140000b9fb0 pc=0x10460f238
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x140000b9fd0 sp=0x140000b9fd0 pc=0x104669f94
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 9 gp=0x14000582380 m=nil [chan receive]:
runtime.gopark(0x14000049868?, 0x104642734?, 0x98?, 0x98?, 0x104663c8c?)
	runtime/proc.go:460 +0xc0 fp=0x14000049850 sp=0x14000049830 pc=0x104661bf0
runtime.chanrecv(0x140003261c0, 0x14000049a40, 0x1)
	runtime/chan.go:667 +0x428 fp=0x140000498d0 sp=0x14000049850 pc=0x1045fc318
runtime.chanrecv1(0x140002742d0?, 0x14000400000?)
	runtime/chan.go:509 +0x14 fp=0x14000049900 sp=0x140000498d0 pc=0x1045fbeb4
github.com/ollama/ollama/runner/llamarunner.(*Server).embeddings(0x14000421680, {0x105840c68, 0x1400001a3c0}, 0x14000014b40)
	github.com/ollama/ollama/runner/llamarunner/runner.go:712 +0x5a0 fp=0x14000049a90 sp=0x14000049900 pc=0x104a4c910
github.com/ollama/ollama/runner/llamarunner.(*Server).embeddings-fm({0x105840c68?, 0x1400001a3c0?}, 0x14000049b18?)
	<autogenerated>:1 +0x40 fp=0x14000049ac0 sp=0x14000049a90 pc=0x104a4e760
net/http.HandlerFunc.ServeHTTP(0x14000162000?, {0x105840c68?, 0x1400001a3c0?}, 0x14000049b00?)
	net/http/server.go:2322 +0x38 fp=0x14000049af0 sp=0x14000049ac0 pc=0x104904c28
net/http.(*ServeMux).ServeHTTP(0x10?, {0x105840c68, 0x1400001a3c0}, 0x14000014b40)
	net/http/server.go:2861 +0x190 fp=0x14000049b40 sp=0x14000049af0 pc=0x1049066c0
net/http.serverHandler.ServeHTTP({0x10583d6d0?}, {0x105840c68?, 0x1400001a3c0?}, 0x1?)
	net/http/server.go:3340 +0xb0 fp=0x14000049b70 sp=0x14000049b40 pc=0x104920eb0
net/http.(*conn).serve(0x140000c0090, {0x105843088, 0x140007749c0})
	net/http/server.go:2109 +0x528 fp=0x14000049fa0 sp=0x14000049b70 pc=0x104903018
net/http.(*Server).Serve.gowrap3()
	net/http/server.go:3493 +0x2c fp=0x14000049fd0 sp=0x14000049fa0 pc=0x10490834c
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x14000049fd0 sp=0x14000049fd0 pc=0x104669f94
created by net/http.(*Server).Serve in goroutine 1
	net/http/server.go:3493 +0x384

goroutine 42 gp=0x14000602540 m=nil [IO wait]:
runtime.gopark(0xffffffffffffffff?, 0xffffffffffffffff?, 0x23?, 0x0?, 0x104685c20?)
	runtime/proc.go:460 +0xc0 fp=0x1400017d580 sp=0x1400017d560 pc=0x104661bf0
runtime.netpollblock(0x0?, 0x0?, 0x0?)
	runtime/netpoll.go:575 +0x150 fp=0x1400017d5c0 sp=0x1400017d580 pc=0x1046277c0
internal/poll.runtime_pollWait(0x131a2fa00, 0x72)
	runtime/netpoll.go:351 +0xa0 fp=0x1400017d5f0 sp=0x1400017d5c0 pc=0x104660e20
internal/poll.(*pollDesc).wait(0x140001d8a80?, 0x14000428061?, 0x0)
	internal/poll/fd_poll_runtime.go:84 +0x28 fp=0x1400017d620 sp=0x1400017d5f0 pc=0x1046dfb88
internal/poll.(*pollDesc).waitRead(...)
	internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0x140001d8a80, {0x14000428061, 0x1, 0x1})
	internal/poll/fd_unix.go:165 +0x1e0 fp=0x1400017d6c0 sp=0x1400017d620 pc=0x1046e0da0
net.(*netFD).Read(0x140001d8a80, {0x14000428061?, 0x106065380?, 0x14000428114?})
	net/fd_posix.go:68 +0x28 fp=0x1400017d710 sp=0x1400017d6c0 pc=0x1047446d8
net.(*conn).Read(0x1400011c180, {0x14000428061?, 0x0?, 0x0?})
	net/net.go:196 +0x34 fp=0x1400017d760 sp=0x1400017d710 pc=0x104750cb4
net/http.(*connReader).backgroundRead(0x14000428040)
	net/http/server.go:702 +0x38 fp=0x1400017d7b0 sp=0x1400017d760 pc=0x1048fe088
net/http.(*connReader).startBackgroundRead.gowrap2()
	net/http/server.go:698 +0x28 fp=0x1400017d7d0 sp=0x1400017d7b0 pc=0x1048fdf78
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400017d7d0 sp=0x1400017d7d0 pc=0x104669f94
created by net/http.(*connReader).startBackgroundRead in goroutine 9
	net/http/server.go:698 +0xb8

r0      0x1067fc000
r1      0x1067ffba0
r2      0x0
r3      0x106802fc0
r4      0xcb710d800
r5      0x0
r6      0xffffffffbfc007ff
r7      0xfffff0003ffff800
r8      0xcb710d800
r9      0x0
r10     0x300100401980
r11     0xbf8192bdc0558a82
r12     0xbf3701bb3eff790e
r13     0x546319e169533ed5
r14     0x1068454c8
r15     0xcb710c000
r16     0x2839d9e88
r17     0xffffffffb00007ff
r18     0x0
r19     0xc00
r20     0xc00414634b5
r21     0x0
r22     0x1062bd430
r23     0x0
r24     0x300
r25     0x300
r26     0x200
r27     0x1062bd430
r28     0x0
r29     0x16d02acd0
lr      0x181748a78
sp      0x16d02ac60
pc      0x1815b6b1c
fault   0x1815b6b1c
<!-- gh-comment-id:3397980685 --> @xpomul commented on GitHub (Oct 13, 2025): I have the same issue. MacBook Pro M1 Mac. Log output: ``` msg="server config" env="map[HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:true OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_KEEP_ALIVE:4h0m0s OLLAMA_KV_CACHE_TYPE:q8_0 OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:3 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/Users/stefan/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[* http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false http_proxy: https_proxy: no_proxy:]" .... time=2025-10-13T17:02:25.698+02:00 level=INFO source=runner.go:799 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:false KvSize:2048 KvCacheType: NumThreads:8 GPULayers:13[ID:0 Layers:13(0..12)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:true}" llama_model_load_from_file_impl: using device Metal (Apple M1 Max) (unknown id) - 53083 MiB free time=2025-10-13T17:02:25.698+02:00 level=INFO source=server.go:1271 msg="waiting for llama runner to start responding" time=2025-10-13T17:02:25.698+02:00 level=INFO source=server.go:1305 msg="waiting for server to become available" status="llm server loading model" llama_model_loader: loaded meta data with 24 key-value pairs and 112 tensors from /Users/stefan/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 (version GGUF V3 (latest)) llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output. 
llama_model_loader: - kv 0: general.architecture str = nomic-bert llama_model_loader: - kv 1: general.name str = nomic-embed-text-v1.5 llama_model_loader: - kv 2: nomic-bert.block_count u32 = 12 llama_model_loader: - kv 3: nomic-bert.context_length u32 = 2048 llama_model_loader: - kv 4: nomic-bert.embedding_length u32 = 768 llama_model_loader: - kv 5: nomic-bert.feed_forward_length u32 = 3072 llama_model_loader: - kv 6: nomic-bert.attention.head_count u32 = 12 llama_model_loader: - kv 7: nomic-bert.attention.layer_norm_epsilon f32 = 0.000000 llama_model_loader: - kv 8: general.file_type u32 = 1 llama_model_loader: - kv 9: nomic-bert.attention.causal bool = false llama_model_loader: - kv 10: nomic-bert.pooling_type u32 = 1 llama_model_loader: - kv 11: nomic-bert.rope.freq_base f32 = 1000.000000 llama_model_loader: - kv 12: tokenizer.ggml.token_type_count u32 = 2 llama_model_loader: - kv 13: tokenizer.ggml.bos_token_id u32 = 101 llama_model_loader: - kv 14: tokenizer.ggml.eos_token_id u32 = 102 llama_model_loader: - kv 15: tokenizer.ggml.model str = bert llama_model_loader: - kv 16: tokenizer.ggml.tokens arr[str,30522] = ["[PAD]", "[unused0]", "[unused1]", "... llama_model_loader: - kv 17: tokenizer.ggml.scores arr[f32,30522] = [-1000.000000, -1000.000000, -1000.00... llama_model_loader: - kv 18: tokenizer.ggml.token_type arr[i32,30522] = [3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... 
llama_model_loader: - kv 19: tokenizer.ggml.unknown_token_id u32 = 100
llama_model_loader: - kv 20: tokenizer.ggml.seperator_token_id u32 = 102
llama_model_loader: - kv 21: tokenizer.ggml.padding_token_id u32 = 0
llama_model_loader: - kv 22: tokenizer.ggml.cls_token_id u32 = 101
llama_model_loader: - kv 23: tokenizer.ggml.mask_token_id u32 = 103
llama_model_loader: - type f32: 51 tensors
llama_model_loader: - type f16: 61 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type = F16
print_info: file size = 260.86 MiB (16.00 BPW)
load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
load: printing all EOG tokens:
load: - 102 ('[SEP]')
load: special tokens cache size = 5
load: token to piece cache size = 0.2032 MB
print_info: arch = nomic-bert
print_info: vocab_only = 0
print_info: n_ctx_train = 2048
print_info: n_embd = 768
print_info: n_layer = 12
print_info: n_head = 12
print_info: n_head_kv = 12
print_info: n_rot = 64
print_info: n_swa = 0
print_info: is_swa_any = 0
print_info: n_embd_head_k = 64
print_info: n_embd_head_v = 64
print_info: n_gqa = 1
print_info: n_embd_k_gqa = 768
print_info: n_embd_v_gqa = 768
print_info: f_norm_eps = 1.0e-12
print_info: f_norm_rms_eps = 0.0e+00
print_info: f_clamp_kqv = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale = 0.0e+00
print_info: f_attn_scale = 0.0e+00
print_info: n_ff = 3072
print_info: n_expert = 0
print_info: n_expert_used = 0
print_info: causal attn = 0
print_info: pooling type = 1
print_info: rope type = 2
print_info: rope scaling = linear
print_info: freq_base_train = 1000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn = 2048
print_info: rope_finetuned = unknown
print_info: model type = 137M
print_info: model params = 136.73 M
print_info: general.name = nomic-embed-text-v1.5
print_info: vocab type = WPM
print_info: n_vocab = 30522
print_info: n_merges = 0
print_info: BOS token = 101 '[CLS]'
print_info: EOS token = 102 '[SEP]'
print_info: UNK token = 100 '[UNK]'
print_info: SEP token = 102 '[SEP]'
print_info: PAD token = 0 '[PAD]'
print_info: MASK token = 103 '[MASK]'
print_info: LF token = 0 '[PAD]'
print_info: EOG token = 102 '[SEP]'
print_info: max token length = 21
load_tensors: loading model tensors, this can take a while... (mmap = true)
load_tensors: offloading 12 repeating layers to GPU
load_tensors: offloading output layer to GPU
load_tensors: offloaded 13/13 layers to GPU
load_tensors: CPU_Mapped model buffer size = 44.72 MiB
load_tensors: Metal_Mapped model buffer size = 216.14 MiB
llama_context: constructing llama_context
llama_context: n_seq_max = 1
llama_context: n_ctx = 2048
llama_context: n_ctx_per_seq = 2048
llama_context: n_batch = 512
llama_context: n_ubatch = 512
llama_context: causal_attn = 0
llama_context: flash_attn = disabled
llama_context: kv_unified = false
llama_context: freq_base = 1000.0
llama_context: freq_scale = 1
ggml_metal_init: allocating
ggml_metal_init: picking default device: Apple M1 Max
ggml_metal_init: use bfloat = true
ggml_metal_init: use fusion = true
ggml_metal_init: use concurrency = true
ggml_metal_init: use graph optimize = true
llama_context: CPU output buffer size = 0.12 MiB
llama_context: Metal compute buffer size = 23.51 MiB
llama_context: CPU compute buffer size = 4.01 MiB
llama_context: graph nodes = 371
llama_context: graph splits = 2
time=2025-10-13T17:02:25.951+02:00 level=INFO source=server.go:1309 msg="llama runner started in 0.30 seconds"
time=2025-10-13T17:02:25.951+02:00 level=INFO source=sched.go:481 msg="loaded runners" count=1
time=2025-10-13T17:02:25.951+02:00 level=INFO source=server.go:1271 msg="waiting for llama runner to start responding"
time=2025-10-13T17:02:25.952+02:00 level=INFO source=server.go:1309 msg="llama runner started in 0.30 seconds"
init: embeddings required but some input tokens were not marked as outputs -> overriding
output_reserve: reallocating output buffer from size 0.12 MiB to 61.11 MiB
init: embeddings required but some input tokens were not marked as outputs -> overriding
SIGTRAP: trace trap
PC=0x1815b6b1c m=4 sigcode=0
signal arrived during cgo execution

goroutine 22 gp=0x14000503180 m=4 mp=0x14000073808 [syscall]:
runtime.cgocall(0x1051458ec, 0x14000687ba8)
	runtime/cgocall.go:167 +0x44 fp=0x14000687b70 sp=0x14000687b30 pc=0x10465e824
github.com/ollama/ollama/llama._Cfunc_llama_decode(0x1062bd2f0, {0x200, 0xcb70ce800, 0x0, 0xcb6820800, 0x1062c5570, 0xcb6f2a800, 0x1062bfb50})
	_cgo_gotypes.go:674 +0x30 fp=0x14000687ba0 sp=0x14000687b70 pc=0x1049a4320
github.com/ollama/ollama/llama.(*Context).Decode.func1(...)
	github.com/ollama/ollama/llama/llama.go:158
github.com/ollama/ollama/llama.(*Context).Decode(0x140003e2c08?, 0xb87?)
	github.com/ollama/ollama/llama/llama.go:158 +0xcc fp=0x14000687c90 sp=0x14000687ba0 pc=0x1049a63cc
github.com/ollama/ollama/runner/llamarunner.(*Server).processBatch(0x14000421680, 0x14000386140, 0x14000687f18)
	github.com/ollama/ollama/runner/llamarunner/runner.go:439 +0x1c8 fp=0x14000687ed0 sp=0x14000687c90 pc=0x104a4aac8
github.com/ollama/ollama/runner/llamarunner.(*Server).run(0x14000421680, {0x1058430c0, 0x1400004f630})
	github.com/ollama/ollama/runner/llamarunner/runner.go:343 +0x15c fp=0x14000687fa0 sp=0x14000687ed0 pc=0x104a4a79c
github.com/ollama/ollama/runner/llamarunner.Execute.gowrap1()
	github.com/ollama/ollama/runner/llamarunner/runner.go:880 +0x2c fp=0x14000687fd0 sp=0x14000687fa0 pc=0x104a4e43c
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x14000687fd0 sp=0x14000687fd0 pc=0x104669f94
created by github.com/ollama/ollama/runner/llamarunner.Execute in goroutine 1
	github.com/ollama/ollama/runner/llamarunner/runner.go:880 +0x418

goroutine 1 gp=0x140000021c0 m=nil [IO wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x140000b1720 sp=0x140000b1700 pc=0x104661bf0
runtime.netpollblock(0x140000b17b8?, 0x46e408c?, 0x1?)
	runtime/netpoll.go:575 +0x150 fp=0x140000b1760 sp=0x140000b1720 pc=0x1046277c0
internal/poll.runtime_pollWait(0x131a2fc00, 0x72)
	runtime/netpoll.go:351 +0xa0 fp=0x140000b1790 sp=0x140000b1760 pc=0x104660e20
internal/poll.(*pollDesc).wait(0x140001d8000?, 0x1046e611c?, 0x0)
	internal/poll/fd_poll_runtime.go:84 +0x28 fp=0x140000b17c0 sp=0x140000b1790 pc=0x1046dfb88
internal/poll.(*pollDesc).waitRead(...)
	internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Accept(0x140001d8000)
	internal/poll/fd_unix.go:613 +0x21c fp=0x140000b1870 sp=0x140000b17c0 pc=0x1046e416c
net.(*netFD).accept(0x140001d8000)
	net/fd_unix.go:161 +0x28 fp=0x140000b1930 sp=0x140000b1870 pc=0x104745ed8
net.(*TCPListener).accept(0x14000428000)
	net/tcpsock_posix.go:159 +0x24 fp=0x140000b1980 sp=0x140000b1930 pc=0x104759544
net.(*TCPListener).Accept(0x14000428000)
	net/tcpsock.go:380 +0x2c fp=0x140000b19c0 sp=0x140000b1980 pc=0x1047585ec
net/http.(*onceCloseListener).Accept(0x140000c0090?)
	<autogenerated>:1 +0x2c fp=0x140000b19e0 sp=0x140000b19c0 pc=0x10492ce5c
net/http.(*Server).Serve(0x140001fa700, {0x105840a88, 0x14000428000})
	net/http/server.go:3463 +0x24c fp=0x140000b1b10 sp=0x140000b19e0 pc=0x104907fec
github.com/ollama/ollama/runner/llamarunner.Execute({0x14000136140, 0x4, 0x4})
	github.com/ollama/ollama/runner/llamarunner/runner.go:901 +0x754 fp=0x140000b1ce0 sp=0x140000b1b10 pc=0x104a4e254
github.com/ollama/ollama/runner.Execute({0x14000136130?, 0x0?, 0x0?})
	github.com/ollama/ollama/runner/runner.go:22 +0x128 fp=0x140000b1d10 sp=0x140000b1ce0 pc=0x104abe5f8
github.com/ollama/ollama/cmd.NewCLI.func2(0x140000a8b00?, {0x1053973cb?, 0x4?, 0x1053973cf?})
	github.com/ollama/ollama/cmd/cmd.go:1769 +0x50 fp=0x140000b1d40 sp=0x140000b1d10 pc=0x1050f67d0
github.com/spf13/cobra.(*Command).execute(0x140001fd208, {0x140005aba80, 0x4, 0x4})
	github.com/spf13/cobra@v1.7.0/command.go:940 +0x5e0 fp=0x140000b1e60 sp=0x140000b1d40 pc=0x1047b0660
github.com/spf13/cobra.(*Command).ExecuteC(0x140005bec08)
	github.com/spf13/cobra@v1.7.0/command.go:1068 +0x2ec fp=0x140000b1f20 sp=0x140000b1e60 pc=0x1047b0d3c
github.com/spf13/cobra.(*Command).Execute(...)
	github.com/spf13/cobra@v1.7.0/command.go:992
github.com/spf13/cobra.(*Command).ExecuteContext(...)
	github.com/spf13/cobra@v1.7.0/command.go:985
main.main()
	github.com/ollama/ollama/main.go:12 +0x54 fp=0x140000b1f40 sp=0x140000b1f20 pc=0x1050f72e4
runtime.main()
	runtime/proc.go:285 +0x278 fp=0x140000b1fd0 sp=0x140000b1f40 pc=0x10462e278
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x140000b1fd0 sp=0x140000b1fd0 pc=0x104669f94

goroutine 2 gp=0x14000002c40 m=nil [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x1400006cf90 sp=0x1400006cf70 pc=0x104661bf0
runtime.goparkunlock(...)
	runtime/proc.go:466
runtime.forcegchelper()
	runtime/proc.go:373 +0xb4 fp=0x1400006cfd0 sp=0x1400006cf90 pc=0x10462e5c4
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400006cfd0 sp=0x1400006cfd0 pc=0x104669f94
created by runtime.init.7 in goroutine 1
	runtime/proc.go:361 +0x24

goroutine 3 gp=0x14000003180 m=nil [GC sweep wait]:
runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x1400006d760 sp=0x1400006d740 pc=0x104661bf0
runtime.goparkunlock(...)
	runtime/proc.go:466
runtime.bgsweep(0x14000098000)
	runtime/mgcsweep.go:323 +0x104 fp=0x1400006d7b0 sp=0x1400006d760 pc=0x104619084
runtime.gcenable.gowrap1()
	runtime/mgc.go:212 +0x28 fp=0x1400006d7d0 sp=0x1400006d7b0 pc=0x10460ccd8
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400006d7d0 sp=0x1400006d7d0 pc=0x104669f94
created by runtime.gcenable in goroutine 1
	runtime/mgc.go:212 +0x6c

goroutine 4 gp=0x14000003340 m=nil [GC scavenge wait]:
runtime.gopark(0x10000?, 0x105553498?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x1400006df60 sp=0x1400006df40 pc=0x104661bf0
runtime.goparkunlock(...)
	runtime/proc.go:466
runtime.(*scavengerState).park(0x106128ba0)
	runtime/mgcscavenge.go:425 +0x5c fp=0x1400006df90 sp=0x1400006df60 pc=0x104616b9c
runtime.bgscavenge(0x14000098000)
	runtime/mgcscavenge.go:658 +0xac fp=0x1400006dfb0 sp=0x1400006df90 pc=0x10461713c
runtime.gcenable.gowrap2()
	runtime/mgc.go:213 +0x28 fp=0x1400006dfd0 sp=0x1400006dfb0 pc=0x10460cc78
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400006dfd0 sp=0x1400006dfd0 pc=0x104669f94
created by runtime.gcenable in goroutine 1
	runtime/mgc.go:213 +0xac

goroutine 18 gp=0x14000102700 m=nil [finalizer wait]:
runtime.gopark(0x105700f80?, 0x1e?, 0xc8?, 0xc5?, 0x1000000000000?)
	runtime/proc.go:460 +0xc0 fp=0x1400006c580 sp=0x1400006c560 pc=0x104661bf0
runtime.runFinalizers()
	runtime/mfinal.go:210 +0x104 fp=0x1400006c7d0 sp=0x1400006c580 pc=0x10460bcc4
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400006c7d0 sp=0x1400006c7d0 pc=0x104669f94
created by runtime.createfing in goroutine 1
	runtime/mfinal.go:172 +0x78

goroutine 19 gp=0x14000103180 m=nil [cleanup wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000068740 sp=0x14000068720 pc=0x104661bf0
runtime.goparkunlock(...)
	runtime/proc.go:466
runtime.(*cleanupQueue).dequeue(0x106129a80)
	runtime/mcleanup.go:439 +0x110 fp=0x14000068780 sp=0x14000068740 pc=0x1046091b0
runtime.runCleanups()
	runtime/mcleanup.go:635 +0x40 fp=0x140000687d0 sp=0x14000068780 pc=0x1046099c0
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x140000687d0 sp=0x140000687d0 pc=0x104669f94
created by runtime.(*cleanupQueue).createGs in goroutine 1
	runtime/mcleanup.go:589 +0x108

goroutine 20 gp=0x14000103500 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000068f10 sp=0x14000068ef0 pc=0x104661bf0
runtime.gcBgMarkWorker(0x14000111880)
	runtime/mgc.go:1463 +0xe0 fp=0x14000068fb0 sp=0x14000068f10 pc=0x10460f350
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x14000068fd0 sp=0x14000068fb0 pc=0x10460f238
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x14000068fd0 sp=0x14000068fd0 pc=0x104669f94
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 5 gp=0x14000003880 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x1400006e710 sp=0x1400006e6f0 pc=0x104661bf0
runtime.gcBgMarkWorker(0x14000111880)
	runtime/mgc.go:1463 +0xe0 fp=0x1400006e7b0 sp=0x1400006e710 pc=0x10460f350
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x1400006e7d0 sp=0x1400006e7b0 pc=0x10460f238
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400006e7d0 sp=0x1400006e7d0 pc=0x104669f94
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 34 gp=0x14000502380 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000518710 sp=0x140005186f0 pc=0x104661bf0
runtime.gcBgMarkWorker(0x14000111880)
	runtime/mgc.go:1463 +0xe0 fp=0x140005187b0 sp=0x14000518710 pc=0x10460f350
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x140005187d0 sp=0x140005187b0 pc=0x10460f238
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x140005187d0 sp=0x140005187d0 pc=0x104669f94
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 35 gp=0x14000502540 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000518f10 sp=0x14000518ef0 pc=0x104661bf0
runtime.gcBgMarkWorker(0x14000111880)
	runtime/mgc.go:1463 +0xe0 fp=0x14000518fb0 sp=0x14000518f10 pc=0x10460f350
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x14000518fd0 sp=0x14000518fb0 pc=0x10460f238
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x14000518fd0 sp=0x14000518fd0 pc=0x104669f94
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 36 gp=0x14000502700 m=nil [GC worker (idle)]:
runtime.gopark(0x322f65c9b3226?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000680f10 sp=0x14000680ef0 pc=0x104661bf0
runtime.gcBgMarkWorker(0x14000111880)
	runtime/mgc.go:1463 +0xe0 fp=0x14000680fb0 sp=0x14000680f10 pc=0x10460f350
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x14000680fd0 sp=0x14000680fb0 pc=0x10460f238
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x14000680fd0 sp=0x14000680fd0 pc=0x104669f94
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 21 gp=0x140001036c0 m=nil [GC worker (idle)]:
runtime.gopark(0x322f65c9b656b?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000069710 sp=0x140000696f0 pc=0x104661bf0
runtime.gcBgMarkWorker(0x14000111880)
	runtime/mgc.go:1463 +0xe0 fp=0x140000697b0 sp=0x14000069710 pc=0x10460f350
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x140000697d0 sp=0x140000697b0 pc=0x10460f238
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x140000697d0 sp=0x140000697d0 pc=0x104669f94
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 6 gp=0x14000003a40 m=nil [GC worker (idle)]:
runtime.gopark(0x322f65c9b13b6?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000684f10 sp=0x14000684ef0 pc=0x104661bf0
runtime.gcBgMarkWorker(0x14000111880)
	runtime/mgc.go:1463 +0xe0 fp=0x14000684fb0 sp=0x14000684f10 pc=0x10460f350
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x14000684fd0 sp=0x14000684fb0 pc=0x10460f238
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x14000684fd0 sp=0x14000684fd0 pc=0x104669f94
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 37 gp=0x140005028c0 m=nil [GC worker (idle)]:
runtime.gopark(0x322f65c9b66e2?, 0x3?, 0xc?, 0x5?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000519f10 sp=0x14000519ef0 pc=0x104661bf0
runtime.gcBgMarkWorker(0x14000111880)
	runtime/mgc.go:1463 +0xe0 fp=0x14000519fb0 sp=0x14000519f10 pc=0x10460f350
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x14000519fd0 sp=0x14000519fb0 pc=0x10460f238
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x14000519fd0 sp=0x14000519fd0 pc=0x104669f94
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 7 gp=0x14000003c00 m=nil [GC worker (idle)]:
runtime.gopark(0x322f65c9b6665?, 0x3?, 0x4c?, 0x1d?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x1400006f710 sp=0x1400006f6f0 pc=0x104661bf0
runtime.gcBgMarkWorker(0x14000111880)
	runtime/mgc.go:1463 +0xe0 fp=0x1400006f7b0 sp=0x1400006f710 pc=0x10460f350
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x1400006f7d0 sp=0x1400006f7b0 pc=0x10460f238
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400006f7d0 sp=0x1400006f7d0 pc=0x104669f94
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 8 gp=0x14000003dc0 m=nil [GC worker (idle)]:
runtime.gopark(0x322f65c9add83?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x140000b9f10 sp=0x140000b9ef0 pc=0x104661bf0
runtime.gcBgMarkWorker(0x14000111880)
	runtime/mgc.go:1463 +0xe0 fp=0x140000b9fb0 sp=0x140000b9f10 pc=0x10460f350
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x140000b9fd0 sp=0x140000b9fb0 pc=0x10460f238
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x140000b9fd0 sp=0x140000b9fd0 pc=0x104669f94
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 9 gp=0x14000582380 m=nil [chan receive]:
runtime.gopark(0x14000049868?, 0x104642734?, 0x98?, 0x98?, 0x104663c8c?)
	runtime/proc.go:460 +0xc0 fp=0x14000049850 sp=0x14000049830 pc=0x104661bf0
runtime.chanrecv(0x140003261c0, 0x14000049a40, 0x1)
	runtime/chan.go:667 +0x428 fp=0x140000498d0 sp=0x14000049850 pc=0x1045fc318
runtime.chanrecv1(0x140002742d0?, 0x14000400000?)
	runtime/chan.go:509 +0x14 fp=0x14000049900 sp=0x140000498d0 pc=0x1045fbeb4
github.com/ollama/ollama/runner/llamarunner.(*Server).embeddings(0x14000421680, {0x105840c68, 0x1400001a3c0}, 0x14000014b40)
	github.com/ollama/ollama/runner/llamarunner/runner.go:712 +0x5a0 fp=0x14000049a90 sp=0x14000049900 pc=0x104a4c910
github.com/ollama/ollama/runner/llamarunner.(*Server).embeddings-fm({0x105840c68?, 0x1400001a3c0?}, 0x14000049b18?)
	<autogenerated>:1 +0x40 fp=0x14000049ac0 sp=0x14000049a90 pc=0x104a4e760
net/http.HandlerFunc.ServeHTTP(0x14000162000?, {0x105840c68?, 0x1400001a3c0?}, 0x14000049b00?)
	net/http/server.go:2322 +0x38 fp=0x14000049af0 sp=0x14000049ac0 pc=0x104904c28
net/http.(*ServeMux).ServeHTTP(0x10?, {0x105840c68, 0x1400001a3c0}, 0x14000014b40)
	net/http/server.go:2861 +0x190 fp=0x14000049b40 sp=0x14000049af0 pc=0x1049066c0
net/http.serverHandler.ServeHTTP({0x10583d6d0?}, {0x105840c68?, 0x1400001a3c0?}, 0x1?)
	net/http/server.go:3340 +0xb0 fp=0x14000049b70 sp=0x14000049b40 pc=0x104920eb0
net/http.(*conn).serve(0x140000c0090, {0x105843088, 0x140007749c0})
	net/http/server.go:2109 +0x528 fp=0x14000049fa0 sp=0x14000049b70 pc=0x104903018
net/http.(*Server).Serve.gowrap3()
	net/http/server.go:3493 +0x2c fp=0x14000049fd0 sp=0x14000049fa0 pc=0x10490834c
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x14000049fd0 sp=0x14000049fd0 pc=0x104669f94
created by net/http.(*Server).Serve in goroutine 1
	net/http/server.go:3493 +0x384

goroutine 42 gp=0x14000602540 m=nil [IO wait]:
runtime.gopark(0xffffffffffffffff?, 0xffffffffffffffff?, 0x23?, 0x0?, 0x104685c20?)
	runtime/proc.go:460 +0xc0 fp=0x1400017d580 sp=0x1400017d560 pc=0x104661bf0
runtime.netpollblock(0x0?, 0x0?, 0x0?)
	runtime/netpoll.go:575 +0x150 fp=0x1400017d5c0 sp=0x1400017d580 pc=0x1046277c0
internal/poll.runtime_pollWait(0x131a2fa00, 0x72)
	runtime/netpoll.go:351 +0xa0 fp=0x1400017d5f0 sp=0x1400017d5c0 pc=0x104660e20
internal/poll.(*pollDesc).wait(0x140001d8a80?, 0x14000428061?, 0x0)
	internal/poll/fd_poll_runtime.go:84 +0x28 fp=0x1400017d620 sp=0x1400017d5f0 pc=0x1046dfb88
internal/poll.(*pollDesc).waitRead(...)
	internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0x140001d8a80, {0x14000428061, 0x1, 0x1})
	internal/poll/fd_unix.go:165 +0x1e0 fp=0x1400017d6c0 sp=0x1400017d620 pc=0x1046e0da0
net.(*netFD).Read(0x140001d8a80, {0x14000428061?, 0x106065380?, 0x14000428114?})
	net/fd_posix.go:68 +0x28 fp=0x1400017d710 sp=0x1400017d6c0 pc=0x1047446d8
net.(*conn).Read(0x1400011c180, {0x14000428061?, 0x0?, 0x0?})
	net/net.go:196 +0x34 fp=0x1400017d760 sp=0x1400017d710 pc=0x104750cb4
net/http.(*connReader).backgroundRead(0x14000428040)
	net/http/server.go:702 +0x38 fp=0x1400017d7b0 sp=0x1400017d760 pc=0x1048fe088
net/http.(*connReader).startBackgroundRead.gowrap2()
	net/http/server.go:698 +0x28 fp=0x1400017d7d0 sp=0x1400017d7b0 pc=0x1048fdf78
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400017d7d0 sp=0x1400017d7d0 pc=0x104669f94
created by net/http.(*connReader).startBackgroundRead in goroutine 9
	net/http/server.go:698 +0xb8

r0  0x1067fc000
r1  0x1067ffba0
r2  0x0
r3  0x106802fc0
r4  0xcb710d800
r5  0x0
r6  0xffffffffbfc007ff
r7  0xfffff0003ffff800
r8  0xcb710d800
r9  0x0
r10 0x300100401980
r11 0xbf8192bdc0558a82
r12 0xbf3701bb3eff790e
r13 0x546319e169533ed5
r14 0x1068454c8
r15 0xcb710c000
r16 0x2839d9e88
r17 0xffffffffb00007ff
r18 0x0
r19 0xc00
r20 0xc00414634b5
r21 0x0
r22 0x1062bd430
r23 0x0
r24 0x300
r25 0x300
r26 0x200
r27 0x1062bd430
r28 0x0
r29 0x16d02acd0
lr  0x181748a78
sp  0x16d02ac60
pc  0x1815b6b1c
fault 0x1815b6b1c
```
<!-- gh-comment-id:3417834408 -->

@phrozen commented on GitHub (Oct 18, 2025):

I tested ollama `0.12.6` and at least my issue is fixed, thank you! No more zero/infinite vectors while generating embeddings over open data sets and synthetic tests for `qwen3-embedding:8B`.
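For anyone wanting to check their own server for this regression, the reports above can be probed with a short script that posts the same prompt repeatedly and flags both HTTP 500s and degenerate (all-zero or non-finite) vectors. This is a minimal sketch using only the standard library; the model name, prompt, and run count are illustrative assumptions, not part of any official tooling:

```python
import json
import math
import urllib.error
import urllib.request

# Legacy embeddings endpoint on the default Ollama port, as in the reports above.
OLLAMA_URL = "http://localhost:11434/api/embeddings"


def is_bad_vector(vec):
    """True if the embedding is empty, all zeros, or contains NaN/inf values."""
    if not vec:
        return True
    if any(not math.isfinite(x) for x in vec):
        return True
    return all(x == 0.0 for x in vec)


def probe(model="nomic-embed-text", prompt="hello world", runs=10):
    """POST the same prompt `runs` times; count server errors and bad vectors."""
    errors = bad = 0
    body = json.dumps({"model": model, "prompt": prompt}).encode()
    for _ in range(runs):
        req = urllib.request.Request(
            OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
        )
        try:
            with urllib.request.urlopen(req) as resp:
                vec = json.load(resp)["embedding"]
                bad += is_bad_vector(vec)
        except urllib.error.HTTPError:
            errors += 1  # e.g. the intermittent 500s reported on 0.12.5
    print(f"{errors}/{runs} HTTP errors, {bad}/{runs} zero/non-finite vectors")
```

Calling `probe()` against an affected 0.12.5 install should surface the roughly 50% failure rate described in the original report; on a fixed build both counters should stay at zero.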
<!-- gh-comment-id:3441834117 -->

@liaoweiguo commented on GitHub (Oct 24, 2025):

Fixed on NVIDIA, but still failing on Apple Metal with 0.12.6 and `bge-m3`.
Reference: github-starred/ollama#70410