[GH-ISSUE #12585] Ollama v0.12.5: nomic-embed-text is failing multiple times #70410

Open
opened 2026-05-04 21:26:52 -05:00 by GiteaMirror · 5 comments
Owner

Originally created by @premsai2030 on GitHub (Oct 12, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/12585

Originally assigned to: @jmorganca on GitHub.

What is the issue?

I am using the Ollama API to call the nomic-embed-text model, and it is failing intermittently. The input text itself is not the problem: if I run the same API call 10 times, the success rate is roughly 50-50, and this only happens on the newer version.

Relevant log output

500 Server Error: Internal Server Error for url: http://localhost:11434/api/embeddings
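The intermittent failure described above can be checked with a small script like the following. This is only a sketch: the endpoint and model name come from the report, but the function names, payload field `prompt`, and retry count are illustrative assumptions.

```python
import json
import urllib.request
import urllib.error

def embed_once(text, url="http://localhost:11434/api/embeddings",
               model="nomic-embed-text"):
    """POST one embedding request; True on HTTP 200, False on any failure."""
    body = json.dumps({"model": model, "prompt": text}).encode()
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"})
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status == 200
    except urllib.error.URLError:
        # Covers the 500 Internal Server Error seen in the log,
        # as well as connection failures.
        return False

def success_rate(call, attempts=10):
    """Run the same call repeatedly and report the fraction that succeed."""
    ok = sum(1 for _ in range(attempts) if call())
    return ok / attempts
```

With a server exhibiting the bug, `success_rate(lambda: embed_once("hello world"))` should come back around 0.5 rather than 1.0.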

OS

macOS

GPU

Apple

CPU

Apple

Ollama version

ollama version is 0.12.5

GiteaMirror added the embeddings, bug labels 2026-05-04 21:26:53 -05:00
Author
Owner

@phrozen commented on GitHub (Oct 12, 2025):

I'll just add here instead of opening a new issue. I am using Qwen3-Embedding:8B and testing both the Q8 and Q4_K_M quantizations, and as of this morning (when I updated to 0.12.5) I am getting a ton of zero/infinite vectors when embedding in batches across different dimensions (1024, 2048, 4096). The same data was fine yesterday, so this is definitely an issue with the latest version. It was caught by integration testing and pinned down to the latest version (0.12.5).

Edit: Confirmed. After downgrading to 0.12.3, my embeddings are back to normal. I'd caution anyone using embedding models against upgrading. This is on macOS Tahoe 26.0.1 (M2 Max).
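The zero/infinite-vector symptom described above is easy to catch automatically, which is presumably how the integration tests flagged it. A minimal validity check (illustrative; the function name and exact criteria are not from the report) might look like:

```python
import math

def is_valid_embedding(vec):
    """Reject embeddings that are empty, contain NaN/inf, or are all zeros."""
    if not vec:
        return False
    if any(not math.isfinite(x) for x in vec):
        return False  # NaN or infinite components
    if all(x == 0.0 for x in vec):
        return False  # degenerate all-zero vector
    return True
```

Running every returned vector through a check like this after an Ollama upgrade would surface this class of regression before bad vectors reach an index.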


@frikishaan commented on GitHub (Oct 13, 2025):

Embeddings are not working for me either: the API does not respond, and I am also unable to run any text models. I tried embeddinggemma:300m-qat-q8_0 and nomic-embed-text:latest after updating to v0.12.5 this morning.

Platform - Windows 10


@xpomul commented on GitHub (Oct 13, 2025):

I have the same issue on a MacBook Pro (M1).

Log output:

msg="server config" env="map[HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:true OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_KEEP_ALIVE:4h0m0s OLLAMA_KV_CACHE_TYPE:q8_0 OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:3 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/Users/stefan/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[* http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false http_proxy: https_proxy: no_proxy:]"
....
time=2025-10-13T17:02:25.698+02:00 level=INFO source=runner.go:799 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:false KvSize:2048 KvCacheType: NumThreads:8 GPULayers:13[ID:0 Layers:13(0..12)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:true}"
llama_model_load_from_file_impl: using device Metal (Apple M1 Max) (unknown id) - 53083 MiB free
time=2025-10-13T17:02:25.698+02:00 level=INFO source=server.go:1271 msg="waiting for llama runner to start responding"
time=2025-10-13T17:02:25.698+02:00 level=INFO source=server.go:1305 msg="waiting for server to become available" status="llm server loading model"
llama_model_loader: loaded meta data with 24 key-value pairs and 112 tensors from /Users/stefan/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = nomic-bert
llama_model_loader: - kv   1:                               general.name str              = nomic-embed-text-v1.5
llama_model_loader: - kv   2:                     nomic-bert.block_count u32              = 12
llama_model_loader: - kv   3:                  nomic-bert.context_length u32              = 2048
llama_model_loader: - kv   4:                nomic-bert.embedding_length u32              = 768
llama_model_loader: - kv   5:             nomic-bert.feed_forward_length u32              = 3072
llama_model_loader: - kv   6:            nomic-bert.attention.head_count u32              = 12
llama_model_loader: - kv   7:    nomic-bert.attention.layer_norm_epsilon f32              = 0.000000
llama_model_loader: - kv   8:                          general.file_type u32              = 1
llama_model_loader: - kv   9:                nomic-bert.attention.causal bool             = false
llama_model_loader: - kv  10:                    nomic-bert.pooling_type u32              = 1
llama_model_loader: - kv  11:                  nomic-bert.rope.freq_base f32              = 1000.000000
llama_model_loader: - kv  12:            tokenizer.ggml.token_type_count u32              = 2
llama_model_loader: - kv  13:                tokenizer.ggml.bos_token_id u32              = 101
llama_model_loader: - kv  14:                tokenizer.ggml.eos_token_id u32              = 102
llama_model_loader: - kv  15:                       tokenizer.ggml.model str              = bert
llama_model_loader: - kv  16:                      tokenizer.ggml.tokens arr[str,30522]   = ["[PAD]", "[unused0]", "[unused1]", "...
llama_model_loader: - kv  17:                      tokenizer.ggml.scores arr[f32,30522]   = [-1000.000000, -1000.000000, -1000.00...
llama_model_loader: - kv  18:                  tokenizer.ggml.token_type arr[i32,30522]   = [3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  19:            tokenizer.ggml.unknown_token_id u32              = 100
llama_model_loader: - kv  20:          tokenizer.ggml.seperator_token_id u32              = 102
llama_model_loader: - kv  21:            tokenizer.ggml.padding_token_id u32              = 0
llama_model_loader: - kv  22:                tokenizer.ggml.cls_token_id u32              = 101
llama_model_loader: - kv  23:               tokenizer.ggml.mask_token_id u32              = 103
llama_model_loader: - type  f32:   51 tensors
llama_model_loader: - type  f16:   61 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = F16
print_info: file size   = 260.86 MiB (16.00 BPW) 
load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
load: printing all EOG tokens:
load:   - 102 ('[SEP]')
load: special tokens cache size = 5
load: token to piece cache size = 0.2032 MB
print_info: arch             = nomic-bert
print_info: vocab_only       = 0
print_info: n_ctx_train      = 2048
print_info: n_embd           = 768
print_info: n_layer          = 12
print_info: n_head           = 12
print_info: n_head_kv        = 12
print_info: n_rot            = 64
print_info: n_swa            = 0
print_info: is_swa_any       = 0
print_info: n_embd_head_k    = 64
print_info: n_embd_head_v    = 64
print_info: n_gqa            = 1
print_info: n_embd_k_gqa     = 768
print_info: n_embd_v_gqa     = 768
print_info: f_norm_eps       = 1.0e-12
print_info: f_norm_rms_eps   = 0.0e+00
print_info: f_clamp_kqv      = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale    = 0.0e+00
print_info: f_attn_scale     = 0.0e+00
print_info: n_ff             = 3072
print_info: n_expert         = 0
print_info: n_expert_used    = 0
print_info: causal attn      = 0
print_info: pooling type     = 1
print_info: rope type        = 2
print_info: rope scaling     = linear
print_info: freq_base_train  = 1000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn  = 2048
print_info: rope_finetuned   = unknown
print_info: model type       = 137M
print_info: model params     = 136.73 M
print_info: general.name     = nomic-embed-text-v1.5
print_info: vocab type       = WPM
print_info: n_vocab          = 30522
print_info: n_merges         = 0
print_info: BOS token        = 101 '[CLS]'
print_info: EOS token        = 102 '[SEP]'
print_info: UNK token        = 100 '[UNK]'
print_info: SEP token        = 102 '[SEP]'
print_info: PAD token        = 0 '[PAD]'
print_info: MASK token       = 103 '[MASK]'
print_info: LF token         = 0 '[PAD]'
print_info: EOG token        = 102 '[SEP]'
print_info: max token length = 21
load_tensors: loading model tensors, this can take a while... (mmap = true)
load_tensors: offloading 12 repeating layers to GPU
load_tensors: offloading output layer to GPU
load_tensors: offloaded 13/13 layers to GPU
load_tensors:   CPU_Mapped model buffer size =    44.72 MiB
load_tensors: Metal_Mapped model buffer size =   216.14 MiB
llama_context: constructing llama_context
llama_context: n_seq_max     = 1
llama_context: n_ctx         = 2048
llama_context: n_ctx_per_seq = 2048
llama_context: n_batch       = 512
llama_context: n_ubatch      = 512
llama_context: causal_attn   = 0
llama_context: flash_attn    = disabled
llama_context: kv_unified    = false
llama_context: freq_base     = 1000.0
llama_context: freq_scale    = 1
ggml_metal_init: allocating
ggml_metal_init: picking default device: Apple M1 Max
ggml_metal_init: use bfloat         = true
ggml_metal_init: use fusion         = true
ggml_metal_init: use concurrency    = true
ggml_metal_init: use graph optimize = true
llama_context:        CPU  output buffer size =     0.12 MiB
llama_context:      Metal compute buffer size =    23.51 MiB
llama_context:        CPU compute buffer size =     4.01 MiB
llama_context: graph nodes  = 371
llama_context: graph splits = 2
time=2025-10-13T17:02:25.951+02:00 level=INFO source=server.go:1309 msg="llama runner started in 0.30 seconds"
time=2025-10-13T17:02:25.951+02:00 level=INFO source=sched.go:481 msg="loaded runners" count=1
time=2025-10-13T17:02:25.951+02:00 level=INFO source=server.go:1271 msg="waiting for llama runner to start responding"
time=2025-10-13T17:02:25.952+02:00 level=INFO source=server.go:1309 msg="llama runner started in 0.30 seconds"
init: embeddings required but some input tokens were not marked as outputs -> overriding
output_reserve: reallocating output buffer from size 0.12 MiB to 61.11 MiB
init: embeddings required but some input tokens were not marked as outputs -> overriding
SIGTRAP: trace trap
PC=0x1815b6b1c m=4 sigcode=0
signal arrived during cgo execution

goroutine 22 gp=0x14000503180 m=4 mp=0x14000073808 [syscall]:
runtime.cgocall(0x1051458ec, 0x14000687ba8)
	runtime/cgocall.go:167 +0x44 fp=0x14000687b70 sp=0x14000687b30 pc=0x10465e824
github.com/ollama/ollama/llama._Cfunc_llama_decode(0x1062bd2f0, {0x200, 0xcb70ce800, 0x0, 0xcb6820800, 0x1062c5570, 0xcb6f2a800, 0x1062bfb50})
	_cgo_gotypes.go:674 +0x30 fp=0x14000687ba0 sp=0x14000687b70 pc=0x1049a4320
github.com/ollama/ollama/llama.(*Context).Decode.func1(...)
	github.com/ollama/ollama/llama/llama.go:158
github.com/ollama/ollama/llama.(*Context).Decode(0x140003e2c08?, 0xb87?)
	github.com/ollama/ollama/llama/llama.go:158 +0xcc fp=0x14000687c90 sp=0x14000687ba0 pc=0x1049a63cc
github.com/ollama/ollama/runner/llamarunner.(*Server).processBatch(0x14000421680, 0x14000386140, 0x14000687f18)
	github.com/ollama/ollama/runner/llamarunner/runner.go:439 +0x1c8 fp=0x14000687ed0 sp=0x14000687c90 pc=0x104a4aac8
github.com/ollama/ollama/runner/llamarunner.(*Server).run(0x14000421680, {0x1058430c0, 0x1400004f630})
	github.com/ollama/ollama/runner/llamarunner/runner.go:343 +0x15c fp=0x14000687fa0 sp=0x14000687ed0 pc=0x104a4a79c
github.com/ollama/ollama/runner/llamarunner.Execute.gowrap1()
	github.com/ollama/ollama/runner/llamarunner/runner.go:880 +0x2c fp=0x14000687fd0 sp=0x14000687fa0 pc=0x104a4e43c
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x14000687fd0 sp=0x14000687fd0 pc=0x104669f94
created by github.com/ollama/ollama/runner/llamarunner.Execute in goroutine 1
	github.com/ollama/ollama/runner/llamarunner/runner.go:880 +0x418

goroutine 1 gp=0x140000021c0 m=nil [IO wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x140000b1720 sp=0x140000b1700 pc=0x104661bf0
runtime.netpollblock(0x140000b17b8?, 0x46e408c?, 0x1?)
	runtime/netpoll.go:575 +0x150 fp=0x140000b1760 sp=0x140000b1720 pc=0x1046277c0
internal/poll.runtime_pollWait(0x131a2fc00, 0x72)
	runtime/netpoll.go:351 +0xa0 fp=0x140000b1790 sp=0x140000b1760 pc=0x104660e20
internal/poll.(*pollDesc).wait(0x140001d8000?, 0x1046e611c?, 0x0)
	internal/poll/fd_poll_runtime.go:84 +0x28 fp=0x140000b17c0 sp=0x140000b1790 pc=0x1046dfb88
internal/poll.(*pollDesc).waitRead(...)
	internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Accept(0x140001d8000)
	internal/poll/fd_unix.go:613 +0x21c fp=0x140000b1870 sp=0x140000b17c0 pc=0x1046e416c
net.(*netFD).accept(0x140001d8000)
	net/fd_unix.go:161 +0x28 fp=0x140000b1930 sp=0x140000b1870 pc=0x104745ed8
net.(*TCPListener).accept(0x14000428000)
	net/tcpsock_posix.go:159 +0x24 fp=0x140000b1980 sp=0x140000b1930 pc=0x104759544
net.(*TCPListener).Accept(0x14000428000)
	net/tcpsock.go:380 +0x2c fp=0x140000b19c0 sp=0x140000b1980 pc=0x1047585ec
net/http.(*onceCloseListener).Accept(0x140000c0090?)
	<autogenerated>:1 +0x2c fp=0x140000b19e0 sp=0x140000b19c0 pc=0x10492ce5c
net/http.(*Server).Serve(0x140001fa700, {0x105840a88, 0x14000428000})
	net/http/server.go:3463 +0x24c fp=0x140000b1b10 sp=0x140000b19e0 pc=0x104907fec
github.com/ollama/ollama/runner/llamarunner.Execute({0x14000136140, 0x4, 0x4})
	github.com/ollama/ollama/runner/llamarunner/runner.go:901 +0x754 fp=0x140000b1ce0 sp=0x140000b1b10 pc=0x104a4e254
github.com/ollama/ollama/runner.Execute({0x14000136130?, 0x0?, 0x0?})
	github.com/ollama/ollama/runner/runner.go:22 +0x128 fp=0x140000b1d10 sp=0x140000b1ce0 pc=0x104abe5f8
github.com/ollama/ollama/cmd.NewCLI.func2(0x140000a8b00?, {0x1053973cb?, 0x4?, 0x1053973cf?})
	github.com/ollama/ollama/cmd/cmd.go:1769 +0x50 fp=0x140000b1d40 sp=0x140000b1d10 pc=0x1050f67d0
github.com/spf13/cobra.(*Command).execute(0x140001fd208, {0x140005aba80, 0x4, 0x4})
	github.com/spf13/cobra@v1.7.0/command.go:940 +0x5e0 fp=0x140000b1e60 sp=0x140000b1d40 pc=0x1047b0660
github.com/spf13/cobra.(*Command).ExecuteC(0x140005bec08)
	github.com/spf13/cobra@v1.7.0/command.go:1068 +0x2ec fp=0x140000b1f20 sp=0x140000b1e60 pc=0x1047b0d3c
github.com/spf13/cobra.(*Command).Execute(...)
	github.com/spf13/cobra@v1.7.0/command.go:992
github.com/spf13/cobra.(*Command).ExecuteContext(...)
	github.com/spf13/cobra@v1.7.0/command.go:985
main.main()
	github.com/ollama/ollama/main.go:12 +0x54 fp=0x140000b1f40 sp=0x140000b1f20 pc=0x1050f72e4
runtime.main()
	runtime/proc.go:285 +0x278 fp=0x140000b1fd0 sp=0x140000b1f40 pc=0x10462e278
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x140000b1fd0 sp=0x140000b1fd0 pc=0x104669f94

goroutine 2 gp=0x14000002c40 m=nil [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x1400006cf90 sp=0x1400006cf70 pc=0x104661bf0
runtime.goparkunlock(...)
	runtime/proc.go:466
runtime.forcegchelper()
	runtime/proc.go:373 +0xb4 fp=0x1400006cfd0 sp=0x1400006cf90 pc=0x10462e5c4
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400006cfd0 sp=0x1400006cfd0 pc=0x104669f94
created by runtime.init.7 in goroutine 1
	runtime/proc.go:361 +0x24

goroutine 3 gp=0x14000003180 m=nil [GC sweep wait]:
runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x1400006d760 sp=0x1400006d740 pc=0x104661bf0
runtime.goparkunlock(...)
	runtime/proc.go:466
runtime.bgsweep(0x14000098000)
	runtime/mgcsweep.go:323 +0x104 fp=0x1400006d7b0 sp=0x1400006d760 pc=0x104619084
runtime.gcenable.gowrap1()
	runtime/mgc.go:212 +0x28 fp=0x1400006d7d0 sp=0x1400006d7b0 pc=0x10460ccd8
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400006d7d0 sp=0x1400006d7d0 pc=0x104669f94
created by runtime.gcenable in goroutine 1
	runtime/mgc.go:212 +0x6c

goroutine 4 gp=0x14000003340 m=nil [GC scavenge wait]:
runtime.gopark(0x10000?, 0x105553498?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x1400006df60 sp=0x1400006df40 pc=0x104661bf0
runtime.goparkunlock(...)
	runtime/proc.go:466
runtime.(*scavengerState).park(0x106128ba0)
	runtime/mgcscavenge.go:425 +0x5c fp=0x1400006df90 sp=0x1400006df60 pc=0x104616b9c
runtime.bgscavenge(0x14000098000)
	runtime/mgcscavenge.go:658 +0xac fp=0x1400006dfb0 sp=0x1400006df90 pc=0x10461713c
runtime.gcenable.gowrap2()
	runtime/mgc.go:213 +0x28 fp=0x1400006dfd0 sp=0x1400006dfb0 pc=0x10460cc78
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400006dfd0 sp=0x1400006dfd0 pc=0x104669f94
created by runtime.gcenable in goroutine 1
	runtime/mgc.go:213 +0xac

goroutine 18 gp=0x14000102700 m=nil [finalizer wait]:
runtime.gopark(0x105700f80?, 0x1e?, 0xc8?, 0xc5?, 0x1000000000000?)
	runtime/proc.go:460 +0xc0 fp=0x1400006c580 sp=0x1400006c560 pc=0x104661bf0
runtime.runFinalizers()
	runtime/mfinal.go:210 +0x104 fp=0x1400006c7d0 sp=0x1400006c580 pc=0x10460bcc4
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400006c7d0 sp=0x1400006c7d0 pc=0x104669f94
created by runtime.createfing in goroutine 1
	runtime/mfinal.go:172 +0x78

goroutine 19 gp=0x14000103180 m=nil [cleanup wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000068740 sp=0x14000068720 pc=0x104661bf0
runtime.goparkunlock(...)
	runtime/proc.go:466
runtime.(*cleanupQueue).dequeue(0x106129a80)
	runtime/mcleanup.go:439 +0x110 fp=0x14000068780 sp=0x14000068740 pc=0x1046091b0
runtime.runCleanups()
	runtime/mcleanup.go:635 +0x40 fp=0x140000687d0 sp=0x14000068780 pc=0x1046099c0
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x140000687d0 sp=0x140000687d0 pc=0x104669f94
created by runtime.(*cleanupQueue).createGs in goroutine 1
	runtime/mcleanup.go:589 +0x108

goroutine 20 gp=0x14000103500 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000068f10 sp=0x14000068ef0 pc=0x104661bf0
runtime.gcBgMarkWorker(0x14000111880)
	runtime/mgc.go:1463 +0xe0 fp=0x14000068fb0 sp=0x14000068f10 pc=0x10460f350
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x14000068fd0 sp=0x14000068fb0 pc=0x10460f238
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x14000068fd0 sp=0x14000068fd0 pc=0x104669f94
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 5 gp=0x14000003880 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x1400006e710 sp=0x1400006e6f0 pc=0x104661bf0
runtime.gcBgMarkWorker(0x14000111880)
	runtime/mgc.go:1463 +0xe0 fp=0x1400006e7b0 sp=0x1400006e710 pc=0x10460f350
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x1400006e7d0 sp=0x1400006e7b0 pc=0x10460f238
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400006e7d0 sp=0x1400006e7d0 pc=0x104669f94
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 34 gp=0x14000502380 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000518710 sp=0x140005186f0 pc=0x104661bf0
runtime.gcBgMarkWorker(0x14000111880)
	runtime/mgc.go:1463 +0xe0 fp=0x140005187b0 sp=0x14000518710 pc=0x10460f350
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x140005187d0 sp=0x140005187b0 pc=0x10460f238
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x140005187d0 sp=0x140005187d0 pc=0x104669f94
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 35 gp=0x14000502540 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000518f10 sp=0x14000518ef0 pc=0x104661bf0
runtime.gcBgMarkWorker(0x14000111880)
	runtime/mgc.go:1463 +0xe0 fp=0x14000518fb0 sp=0x14000518f10 pc=0x10460f350
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x14000518fd0 sp=0x14000518fb0 pc=0x10460f238
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x14000518fd0 sp=0x14000518fd0 pc=0x104669f94
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 36 gp=0x14000502700 m=nil [GC worker (idle)]:
runtime.gopark(0x322f65c9b3226?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000680f10 sp=0x14000680ef0 pc=0x104661bf0
runtime.gcBgMarkWorker(0x14000111880)
	runtime/mgc.go:1463 +0xe0 fp=0x14000680fb0 sp=0x14000680f10 pc=0x10460f350
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x14000680fd0 sp=0x14000680fb0 pc=0x10460f238
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x14000680fd0 sp=0x14000680fd0 pc=0x104669f94
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 21 gp=0x140001036c0 m=nil [GC worker (idle)]:
runtime.gopark(0x322f65c9b656b?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000069710 sp=0x140000696f0 pc=0x104661bf0
runtime.gcBgMarkWorker(0x14000111880)
	runtime/mgc.go:1463 +0xe0 fp=0x140000697b0 sp=0x14000069710 pc=0x10460f350
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x140000697d0 sp=0x140000697b0 pc=0x10460f238
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x140000697d0 sp=0x140000697d0 pc=0x104669f94
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 6 gp=0x14000003a40 m=nil [GC worker (idle)]:
runtime.gopark(0x322f65c9b13b6?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000684f10 sp=0x14000684ef0 pc=0x104661bf0
runtime.gcBgMarkWorker(0x14000111880)
	runtime/mgc.go:1463 +0xe0 fp=0x14000684fb0 sp=0x14000684f10 pc=0x10460f350
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x14000684fd0 sp=0x14000684fb0 pc=0x10460f238
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x14000684fd0 sp=0x14000684fd0 pc=0x104669f94
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 37 gp=0x140005028c0 m=nil [GC worker (idle)]:
runtime.gopark(0x322f65c9b66e2?, 0x3?, 0xc?, 0x5?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000519f10 sp=0x14000519ef0 pc=0x104661bf0
runtime.gcBgMarkWorker(0x14000111880)
	runtime/mgc.go:1463 +0xe0 fp=0x14000519fb0 sp=0x14000519f10 pc=0x10460f350
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x14000519fd0 sp=0x14000519fb0 pc=0x10460f238
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x14000519fd0 sp=0x14000519fd0 pc=0x104669f94
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 7 gp=0x14000003c00 m=nil [GC worker (idle)]:
runtime.gopark(0x322f65c9b6665?, 0x3?, 0x4c?, 0x1d?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x1400006f710 sp=0x1400006f6f0 pc=0x104661bf0
runtime.gcBgMarkWorker(0x14000111880)
	runtime/mgc.go:1463 +0xe0 fp=0x1400006f7b0 sp=0x1400006f710 pc=0x10460f350
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x1400006f7d0 sp=0x1400006f7b0 pc=0x10460f238
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400006f7d0 sp=0x1400006f7d0 pc=0x104669f94
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 8 gp=0x14000003dc0 m=nil [GC worker (idle)]:
runtime.gopark(0x322f65c9add83?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x140000b9f10 sp=0x140000b9ef0 pc=0x104661bf0
runtime.gcBgMarkWorker(0x14000111880)
	runtime/mgc.go:1463 +0xe0 fp=0x140000b9fb0 sp=0x140000b9f10 pc=0x10460f350
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x140000b9fd0 sp=0x140000b9fb0 pc=0x10460f238
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x140000b9fd0 sp=0x140000b9fd0 pc=0x104669f94
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 9 gp=0x14000582380 m=nil [chan receive]:
runtime.gopark(0x14000049868?, 0x104642734?, 0x98?, 0x98?, 0x104663c8c?)
	runtime/proc.go:460 +0xc0 fp=0x14000049850 sp=0x14000049830 pc=0x104661bf0
runtime.chanrecv(0x140003261c0, 0x14000049a40, 0x1)
	runtime/chan.go:667 +0x428 fp=0x140000498d0 sp=0x14000049850 pc=0x1045fc318
runtime.chanrecv1(0x140002742d0?, 0x14000400000?)
	runtime/chan.go:509 +0x14 fp=0x14000049900 sp=0x140000498d0 pc=0x1045fbeb4
github.com/ollama/ollama/runner/llamarunner.(*Server).embeddings(0x14000421680, {0x105840c68, 0x1400001a3c0}, 0x14000014b40)
	github.com/ollama/ollama/runner/llamarunner/runner.go:712 +0x5a0 fp=0x14000049a90 sp=0x14000049900 pc=0x104a4c910
github.com/ollama/ollama/runner/llamarunner.(*Server).embeddings-fm({0x105840c68?, 0x1400001a3c0?}, 0x14000049b18?)
	<autogenerated>:1 +0x40 fp=0x14000049ac0 sp=0x14000049a90 pc=0x104a4e760
net/http.HandlerFunc.ServeHTTP(0x14000162000?, {0x105840c68?, 0x1400001a3c0?}, 0x14000049b00?)
	net/http/server.go:2322 +0x38 fp=0x14000049af0 sp=0x14000049ac0 pc=0x104904c28
net/http.(*ServeMux).ServeHTTP(0x10?, {0x105840c68, 0x1400001a3c0}, 0x14000014b40)
	net/http/server.go:2861 +0x190 fp=0x14000049b40 sp=0x14000049af0 pc=0x1049066c0
net/http.serverHandler.ServeHTTP({0x10583d6d0?}, {0x105840c68?, 0x1400001a3c0?}, 0x1?)
	net/http/server.go:3340 +0xb0 fp=0x14000049b70 sp=0x14000049b40 pc=0x104920eb0
net/http.(*conn).serve(0x140000c0090, {0x105843088, 0x140007749c0})
	net/http/server.go:2109 +0x528 fp=0x14000049fa0 sp=0x14000049b70 pc=0x104903018
net/http.(*Server).Serve.gowrap3()
	net/http/server.go:3493 +0x2c fp=0x14000049fd0 sp=0x14000049fa0 pc=0x10490834c
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x14000049fd0 sp=0x14000049fd0 pc=0x104669f94
created by net/http.(*Server).Serve in goroutine 1
	net/http/server.go:3493 +0x384

goroutine 42 gp=0x14000602540 m=nil [IO wait]:
runtime.gopark(0xffffffffffffffff?, 0xffffffffffffffff?, 0x23?, 0x0?, 0x104685c20?)
	runtime/proc.go:460 +0xc0 fp=0x1400017d580 sp=0x1400017d560 pc=0x104661bf0
runtime.netpollblock(0x0?, 0x0?, 0x0?)
	runtime/netpoll.go:575 +0x150 fp=0x1400017d5c0 sp=0x1400017d580 pc=0x1046277c0
internal/poll.runtime_pollWait(0x131a2fa00, 0x72)
	runtime/netpoll.go:351 +0xa0 fp=0x1400017d5f0 sp=0x1400017d5c0 pc=0x104660e20
internal/poll.(*pollDesc).wait(0x140001d8a80?, 0x14000428061?, 0x0)
	internal/poll/fd_poll_runtime.go:84 +0x28 fp=0x1400017d620 sp=0x1400017d5f0 pc=0x1046dfb88
internal/poll.(*pollDesc).waitRead(...)
	internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0x140001d8a80, {0x14000428061, 0x1, 0x1})
	internal/poll/fd_unix.go:165 +0x1e0 fp=0x1400017d6c0 sp=0x1400017d620 pc=0x1046e0da0
net.(*netFD).Read(0x140001d8a80, {0x14000428061?, 0x106065380?, 0x14000428114?})
	net/fd_posix.go:68 +0x28 fp=0x1400017d710 sp=0x1400017d6c0 pc=0x1047446d8
net.(*conn).Read(0x1400011c180, {0x14000428061?, 0x0?, 0x0?})
	net/net.go:196 +0x34 fp=0x1400017d760 sp=0x1400017d710 pc=0x104750cb4
net/http.(*connReader).backgroundRead(0x14000428040)
	net/http/server.go:702 +0x38 fp=0x1400017d7b0 sp=0x1400017d760 pc=0x1048fe088
net/http.(*connReader).startBackgroundRead.gowrap2()
	net/http/server.go:698 +0x28 fp=0x1400017d7d0 sp=0x1400017d7b0 pc=0x1048fdf78
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400017d7d0 sp=0x1400017d7d0 pc=0x104669f94
created by net/http.(*connReader).startBackgroundRead in goroutine 9
	net/http/server.go:698 +0xb8

r0      0x1067fc000
r1      0x1067ffba0
r2      0x0
r3      0x106802fc0
r4      0xcb710d800
r5      0x0
r6      0xffffffffbfc007ff
r7      0xfffff0003ffff800
r8      0xcb710d800
r9      0x0
r10     0x300100401980
r11     0xbf8192bdc0558a82
r12     0xbf3701bb3eff790e
r13     0x546319e169533ed5
r14     0x1068454c8
r15     0xcb710c000
r16     0x2839d9e88
r17     0xffffffffb00007ff
r18     0x0
r19     0xc00
r20     0xc00414634b5
r21     0x0
r22     0x1062bd430
r23     0x0
r24     0x300
r25     0x300
r26     0x200
r27     0x1062bd430
r28     0x0
r29     0x16d02acd0
lr      0x181748a78
sp      0x16d02ac60
pc      0x1815b6b1c
fault   0x1815b6b1c
<!-- gh-comment-id:3397980685 --> @xpomul commented on GitHub (Oct 13, 2025): I have the same issue. MacBook Pro M1 Mac. Log output: ``` msg="server config" env="map[HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:true OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_KEEP_ALIVE:4h0m0s OLLAMA_KV_CACHE_TYPE:q8_0 OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:3 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/Users/stefan/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[* http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false http_proxy: https_proxy: no_proxy:]" .... time=2025-10-13T17:02:25.698+02:00 level=INFO source=runner.go:799 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:false KvSize:2048 KvCacheType: NumThreads:8 GPULayers:13[ID:0 Layers:13(0..12)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:true}" llama_model_load_from_file_impl: using device Metal (Apple M1 Max) (unknown id) - 53083 MiB free time=2025-10-13T17:02:25.698+02:00 level=INFO source=server.go:1271 msg="waiting for llama runner to start responding" time=2025-10-13T17:02:25.698+02:00 level=INFO source=server.go:1305 msg="waiting for server to become available" status="llm server loading model" llama_model_loader: loaded meta data with 24 key-value pairs and 112 tensors from /Users/stefan/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 (version GGUF V3 (latest)) llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output. 
llama_model_loader: - kv 0: general.architecture str = nomic-bert llama_model_loader: - kv 1: general.name str = nomic-embed-text-v1.5 llama_model_loader: - kv 2: nomic-bert.block_count u32 = 12 llama_model_loader: - kv 3: nomic-bert.context_length u32 = 2048 llama_model_loader: - kv 4: nomic-bert.embedding_length u32 = 768 llama_model_loader: - kv 5: nomic-bert.feed_forward_length u32 = 3072 llama_model_loader: - kv 6: nomic-bert.attention.head_count u32 = 12 llama_model_loader: - kv 7: nomic-bert.attention.layer_norm_epsilon f32 = 0.000000 llama_model_loader: - kv 8: general.file_type u32 = 1 llama_model_loader: - kv 9: nomic-bert.attention.causal bool = false llama_model_loader: - kv 10: nomic-bert.pooling_type u32 = 1 llama_model_loader: - kv 11: nomic-bert.rope.freq_base f32 = 1000.000000 llama_model_loader: - kv 12: tokenizer.ggml.token_type_count u32 = 2 llama_model_loader: - kv 13: tokenizer.ggml.bos_token_id u32 = 101 llama_model_loader: - kv 14: tokenizer.ggml.eos_token_id u32 = 102 llama_model_loader: - kv 15: tokenizer.ggml.model str = bert llama_model_loader: - kv 16: tokenizer.ggml.tokens arr[str,30522] = ["[PAD]", "[unused0]", "[unused1]", "... llama_model_loader: - kv 17: tokenizer.ggml.scores arr[f32,30522] = [-1000.000000, -1000.000000, -1000.00... llama_model_loader: - kv 18: tokenizer.ggml.token_type arr[i32,30522] = [3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... 
llama_model_loader: - kv 19: tokenizer.ggml.unknown_token_id u32 = 100
llama_model_loader: - kv 20: tokenizer.ggml.seperator_token_id u32 = 102
llama_model_loader: - kv 21: tokenizer.ggml.padding_token_id u32 = 0
llama_model_loader: - kv 22: tokenizer.ggml.cls_token_id u32 = 101
llama_model_loader: - kv 23: tokenizer.ggml.mask_token_id u32 = 103
llama_model_loader: - type f32: 51 tensors
llama_model_loader: - type f16: 61 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type = F16
print_info: file size = 260.86 MiB (16.00 BPW)
load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
load: printing all EOG tokens:
load: - 102 ('[SEP]')
load: special tokens cache size = 5
load: token to piece cache size = 0.2032 MB
print_info: arch = nomic-bert
print_info: vocab_only = 0
print_info: n_ctx_train = 2048
print_info: n_embd = 768
print_info: n_layer = 12
print_info: n_head = 12
print_info: n_head_kv = 12
print_info: n_rot = 64
print_info: n_swa = 0
print_info: is_swa_any = 0
print_info: n_embd_head_k = 64
print_info: n_embd_head_v = 64
print_info: n_gqa = 1
print_info: n_embd_k_gqa = 768
print_info: n_embd_v_gqa = 768
print_info: f_norm_eps = 1.0e-12
print_info: f_norm_rms_eps = 0.0e+00
print_info: f_clamp_kqv = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale = 0.0e+00
print_info: f_attn_scale = 0.0e+00
print_info: n_ff = 3072
print_info: n_expert = 0
print_info: n_expert_used = 0
print_info: causal attn = 0
print_info: pooling type = 1
print_info: rope type = 2
print_info: rope scaling = linear
print_info: freq_base_train = 1000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn = 2048
print_info: rope_finetuned = unknown
print_info: model type = 137M
print_info: model params = 136.73 M
print_info: general.name = nomic-embed-text-v1.5
print_info: vocab type = WPM
print_info: n_vocab = 30522
print_info: n_merges = 0
print_info: BOS token = 101 '[CLS]'
print_info: EOS token = 102 '[SEP]'
print_info: UNK token = 100 '[UNK]'
print_info: SEP token = 102 '[SEP]'
print_info: PAD token = 0 '[PAD]'
print_info: MASK token = 103 '[MASK]'
print_info: LF token = 0 '[PAD]'
print_info: EOG token = 102 '[SEP]'
print_info: max token length = 21
load_tensors: loading model tensors, this can take a while... (mmap = true)
load_tensors: offloading 12 repeating layers to GPU
load_tensors: offloading output layer to GPU
load_tensors: offloaded 13/13 layers to GPU
load_tensors: CPU_Mapped model buffer size = 44.72 MiB
load_tensors: Metal_Mapped model buffer size = 216.14 MiB
llama_context: constructing llama_context
llama_context: n_seq_max = 1
llama_context: n_ctx = 2048
llama_context: n_ctx_per_seq = 2048
llama_context: n_batch = 512
llama_context: n_ubatch = 512
llama_context: causal_attn = 0
llama_context: flash_attn = disabled
llama_context: kv_unified = false
llama_context: freq_base = 1000.0
llama_context: freq_scale = 1
ggml_metal_init: allocating
ggml_metal_init: picking default device: Apple M1 Max
ggml_metal_init: use bfloat = true
ggml_metal_init: use fusion = true
ggml_metal_init: use concurrency = true
ggml_metal_init: use graph optimize = true
llama_context: CPU output buffer size = 0.12 MiB
llama_context: Metal compute buffer size = 23.51 MiB
llama_context: CPU compute buffer size = 4.01 MiB
llama_context: graph nodes = 371
llama_context: graph splits = 2
time=2025-10-13T17:02:25.951+02:00 level=INFO source=server.go:1309 msg="llama runner started in 0.30 seconds"
time=2025-10-13T17:02:25.951+02:00 level=INFO source=sched.go:481 msg="loaded runners" count=1
time=2025-10-13T17:02:25.951+02:00 level=INFO source=server.go:1271 msg="waiting for llama runner to start responding"
time=2025-10-13T17:02:25.952+02:00 level=INFO source=server.go:1309 msg="llama runner started in 0.30 seconds"
init: embeddings required but some input tokens were not marked as outputs -> overriding
output_reserve: reallocating output buffer from size 0.12 MiB to 61.11 MiB
init: embeddings required but some input tokens were not marked as outputs -> overriding
SIGTRAP: trace trap
PC=0x1815b6b1c m=4 sigcode=0
signal arrived during cgo execution

goroutine 22 gp=0x14000503180 m=4 mp=0x14000073808 [syscall]:
runtime.cgocall(0x1051458ec, 0x14000687ba8)
	runtime/cgocall.go:167 +0x44 fp=0x14000687b70 sp=0x14000687b30 pc=0x10465e824
github.com/ollama/ollama/llama._Cfunc_llama_decode(0x1062bd2f0, {0x200, 0xcb70ce800, 0x0, 0xcb6820800, 0x1062c5570, 0xcb6f2a800, 0x1062bfb50})
	_cgo_gotypes.go:674 +0x30 fp=0x14000687ba0 sp=0x14000687b70 pc=0x1049a4320
github.com/ollama/ollama/llama.(*Context).Decode.func1(...)
	github.com/ollama/ollama/llama/llama.go:158
github.com/ollama/ollama/llama.(*Context).Decode(0x140003e2c08?, 0xb87?)
	github.com/ollama/ollama/llama/llama.go:158 +0xcc fp=0x14000687c90 sp=0x14000687ba0 pc=0x1049a63cc
github.com/ollama/ollama/runner/llamarunner.(*Server).processBatch(0x14000421680, 0x14000386140, 0x14000687f18)
	github.com/ollama/ollama/runner/llamarunner/runner.go:439 +0x1c8 fp=0x14000687ed0 sp=0x14000687c90 pc=0x104a4aac8
github.com/ollama/ollama/runner/llamarunner.(*Server).run(0x14000421680, {0x1058430c0, 0x1400004f630})
	github.com/ollama/ollama/runner/llamarunner/runner.go:343 +0x15c fp=0x14000687fa0 sp=0x14000687ed0 pc=0x104a4a79c
github.com/ollama/ollama/runner/llamarunner.Execute.gowrap1()
	github.com/ollama/ollama/runner/llamarunner/runner.go:880 +0x2c fp=0x14000687fd0 sp=0x14000687fa0 pc=0x104a4e43c
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x14000687fd0 sp=0x14000687fd0 pc=0x104669f94
created by github.com/ollama/ollama/runner/llamarunner.Execute in goroutine 1
	github.com/ollama/ollama/runner/llamarunner/runner.go:880 +0x418

goroutine 1 gp=0x140000021c0 m=nil [IO wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x140000b1720 sp=0x140000b1700 pc=0x104661bf0
runtime.netpollblock(0x140000b17b8?, 0x46e408c?, 0x1?)
	runtime/netpoll.go:575 +0x150 fp=0x140000b1760 sp=0x140000b1720 pc=0x1046277c0
internal/poll.runtime_pollWait(0x131a2fc00, 0x72)
	runtime/netpoll.go:351 +0xa0 fp=0x140000b1790 sp=0x140000b1760 pc=0x104660e20
internal/poll.(*pollDesc).wait(0x140001d8000?, 0x1046e611c?, 0x0)
	internal/poll/fd_poll_runtime.go:84 +0x28 fp=0x140000b17c0 sp=0x140000b1790 pc=0x1046dfb88
internal/poll.(*pollDesc).waitRead(...)
	internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Accept(0x140001d8000)
	internal/poll/fd_unix.go:613 +0x21c fp=0x140000b1870 sp=0x140000b17c0 pc=0x1046e416c
net.(*netFD).accept(0x140001d8000)
	net/fd_unix.go:161 +0x28 fp=0x140000b1930 sp=0x140000b1870 pc=0x104745ed8
net.(*TCPListener).accept(0x14000428000)
	net/tcpsock_posix.go:159 +0x24 fp=0x140000b1980 sp=0x140000b1930 pc=0x104759544
net.(*TCPListener).Accept(0x14000428000)
	net/tcpsock.go:380 +0x2c fp=0x140000b19c0 sp=0x140000b1980 pc=0x1047585ec
net/http.(*onceCloseListener).Accept(0x140000c0090?)
	<autogenerated>:1 +0x2c fp=0x140000b19e0 sp=0x140000b19c0 pc=0x10492ce5c
net/http.(*Server).Serve(0x140001fa700, {0x105840a88, 0x14000428000})
	net/http/server.go:3463 +0x24c fp=0x140000b1b10 sp=0x140000b19e0 pc=0x104907fec
github.com/ollama/ollama/runner/llamarunner.Execute({0x14000136140, 0x4, 0x4})
	github.com/ollama/ollama/runner/llamarunner/runner.go:901 +0x754 fp=0x140000b1ce0 sp=0x140000b1b10 pc=0x104a4e254
github.com/ollama/ollama/runner.Execute({0x14000136130?, 0x0?, 0x0?})
	github.com/ollama/ollama/runner/runner.go:22 +0x128 fp=0x140000b1d10 sp=0x140000b1ce0 pc=0x104abe5f8
github.com/ollama/ollama/cmd.NewCLI.func2(0x140000a8b00?, {0x1053973cb?, 0x4?, 0x1053973cf?})
	github.com/ollama/ollama/cmd/cmd.go:1769 +0x50 fp=0x140000b1d40 sp=0x140000b1d10 pc=0x1050f67d0
github.com/spf13/cobra.(*Command).execute(0x140001fd208, {0x140005aba80, 0x4, 0x4})
	github.com/spf13/cobra@v1.7.0/command.go:940 +0x5e0 fp=0x140000b1e60 sp=0x140000b1d40 pc=0x1047b0660
github.com/spf13/cobra.(*Command).ExecuteC(0x140005bec08)
	github.com/spf13/cobra@v1.7.0/command.go:1068 +0x2ec fp=0x140000b1f20 sp=0x140000b1e60 pc=0x1047b0d3c
github.com/spf13/cobra.(*Command).Execute(...)
	github.com/spf13/cobra@v1.7.0/command.go:992
github.com/spf13/cobra.(*Command).ExecuteContext(...)
	github.com/spf13/cobra@v1.7.0/command.go:985
main.main()
	github.com/ollama/ollama/main.go:12 +0x54 fp=0x140000b1f40 sp=0x140000b1f20 pc=0x1050f72e4
runtime.main()
	runtime/proc.go:285 +0x278 fp=0x140000b1fd0 sp=0x140000b1f40 pc=0x10462e278
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x140000b1fd0 sp=0x140000b1fd0 pc=0x104669f94

goroutine 2 gp=0x14000002c40 m=nil [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x1400006cf90 sp=0x1400006cf70 pc=0x104661bf0
runtime.goparkunlock(...)
	runtime/proc.go:466
runtime.forcegchelper()
	runtime/proc.go:373 +0xb4 fp=0x1400006cfd0 sp=0x1400006cf90 pc=0x10462e5c4
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400006cfd0 sp=0x1400006cfd0 pc=0x104669f94
created by runtime.init.7 in goroutine 1
	runtime/proc.go:361 +0x24

goroutine 3 gp=0x14000003180 m=nil [GC sweep wait]:
runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x1400006d760 sp=0x1400006d740 pc=0x104661bf0
runtime.goparkunlock(...)
	runtime/proc.go:466
runtime.bgsweep(0x14000098000)
	runtime/mgcsweep.go:323 +0x104 fp=0x1400006d7b0 sp=0x1400006d760 pc=0x104619084
runtime.gcenable.gowrap1()
	runtime/mgc.go:212 +0x28 fp=0x1400006d7d0 sp=0x1400006d7b0 pc=0x10460ccd8
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400006d7d0 sp=0x1400006d7d0 pc=0x104669f94
created by runtime.gcenable in goroutine 1
	runtime/mgc.go:212 +0x6c

goroutine 4 gp=0x14000003340 m=nil [GC scavenge wait]:
runtime.gopark(0x10000?, 0x105553498?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x1400006df60 sp=0x1400006df40 pc=0x104661bf0
runtime.goparkunlock(...)
	runtime/proc.go:466
runtime.(*scavengerState).park(0x106128ba0)
	runtime/mgcscavenge.go:425 +0x5c fp=0x1400006df90 sp=0x1400006df60 pc=0x104616b9c
runtime.bgscavenge(0x14000098000)
	runtime/mgcscavenge.go:658 +0xac fp=0x1400006dfb0 sp=0x1400006df90 pc=0x10461713c
runtime.gcenable.gowrap2()
	runtime/mgc.go:213 +0x28 fp=0x1400006dfd0 sp=0x1400006dfb0 pc=0x10460cc78
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400006dfd0 sp=0x1400006dfd0 pc=0x104669f94
created by runtime.gcenable in goroutine 1
	runtime/mgc.go:213 +0xac

goroutine 18 gp=0x14000102700 m=nil [finalizer wait]:
runtime.gopark(0x105700f80?, 0x1e?, 0xc8?, 0xc5?, 0x1000000000000?)
	runtime/proc.go:460 +0xc0 fp=0x1400006c580 sp=0x1400006c560 pc=0x104661bf0
runtime.runFinalizers()
	runtime/mfinal.go:210 +0x104 fp=0x1400006c7d0 sp=0x1400006c580 pc=0x10460bcc4
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400006c7d0 sp=0x1400006c7d0 pc=0x104669f94
created by runtime.createfing in goroutine 1
	runtime/mfinal.go:172 +0x78

goroutine 19 gp=0x14000103180 m=nil [cleanup wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000068740 sp=0x14000068720 pc=0x104661bf0
runtime.goparkunlock(...)
	runtime/proc.go:466
runtime.(*cleanupQueue).dequeue(0x106129a80)
	runtime/mcleanup.go:439 +0x110 fp=0x14000068780 sp=0x14000068740 pc=0x1046091b0
runtime.runCleanups()
	runtime/mcleanup.go:635 +0x40 fp=0x140000687d0 sp=0x14000068780 pc=0x1046099c0
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x140000687d0 sp=0x140000687d0 pc=0x104669f94
created by runtime.(*cleanupQueue).createGs in goroutine 1
	runtime/mcleanup.go:589 +0x108

goroutine 20 gp=0x14000103500 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000068f10 sp=0x14000068ef0 pc=0x104661bf0
runtime.gcBgMarkWorker(0x14000111880)
	runtime/mgc.go:1463 +0xe0 fp=0x14000068fb0 sp=0x14000068f10 pc=0x10460f350
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x14000068fd0 sp=0x14000068fb0 pc=0x10460f238
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x14000068fd0 sp=0x14000068fd0 pc=0x104669f94
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 5 gp=0x14000003880 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x1400006e710 sp=0x1400006e6f0 pc=0x104661bf0
runtime.gcBgMarkWorker(0x14000111880)
	runtime/mgc.go:1463 +0xe0 fp=0x1400006e7b0 sp=0x1400006e710 pc=0x10460f350
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x1400006e7d0 sp=0x1400006e7b0 pc=0x10460f238
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400006e7d0 sp=0x1400006e7d0 pc=0x104669f94
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 34 gp=0x14000502380 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000518710 sp=0x140005186f0 pc=0x104661bf0
runtime.gcBgMarkWorker(0x14000111880)
	runtime/mgc.go:1463 +0xe0 fp=0x140005187b0 sp=0x14000518710 pc=0x10460f350
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x140005187d0 sp=0x140005187b0 pc=0x10460f238
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x140005187d0 sp=0x140005187d0 pc=0x104669f94
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 35 gp=0x14000502540 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000518f10 sp=0x14000518ef0 pc=0x104661bf0
runtime.gcBgMarkWorker(0x14000111880)
	runtime/mgc.go:1463 +0xe0 fp=0x14000518fb0 sp=0x14000518f10 pc=0x10460f350
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x14000518fd0 sp=0x14000518fb0 pc=0x10460f238
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x14000518fd0 sp=0x14000518fd0 pc=0x104669f94
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 36 gp=0x14000502700 m=nil [GC worker (idle)]:
runtime.gopark(0x322f65c9b3226?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000680f10 sp=0x14000680ef0 pc=0x104661bf0
runtime.gcBgMarkWorker(0x14000111880)
	runtime/mgc.go:1463 +0xe0 fp=0x14000680fb0 sp=0x14000680f10 pc=0x10460f350
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x14000680fd0 sp=0x14000680fb0 pc=0x10460f238
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x14000680fd0 sp=0x14000680fd0 pc=0x104669f94
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 21 gp=0x140001036c0 m=nil [GC worker (idle)]:
runtime.gopark(0x322f65c9b656b?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000069710 sp=0x140000696f0 pc=0x104661bf0
runtime.gcBgMarkWorker(0x14000111880)
	runtime/mgc.go:1463 +0xe0 fp=0x140000697b0 sp=0x14000069710 pc=0x10460f350
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x140000697d0 sp=0x140000697b0 pc=0x10460f238
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x140000697d0 sp=0x140000697d0 pc=0x104669f94
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 6 gp=0x14000003a40 m=nil [GC worker (idle)]:
runtime.gopark(0x322f65c9b13b6?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000684f10 sp=0x14000684ef0 pc=0x104661bf0
runtime.gcBgMarkWorker(0x14000111880)
	runtime/mgc.go:1463 +0xe0 fp=0x14000684fb0 sp=0x14000684f10 pc=0x10460f350
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x14000684fd0 sp=0x14000684fb0 pc=0x10460f238
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x14000684fd0 sp=0x14000684fd0 pc=0x104669f94
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 37 gp=0x140005028c0 m=nil [GC worker (idle)]:
runtime.gopark(0x322f65c9b66e2?, 0x3?, 0xc?, 0x5?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000519f10 sp=0x14000519ef0 pc=0x104661bf0
runtime.gcBgMarkWorker(0x14000111880)
	runtime/mgc.go:1463 +0xe0 fp=0x14000519fb0 sp=0x14000519f10 pc=0x10460f350
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x14000519fd0 sp=0x14000519fb0 pc=0x10460f238
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x14000519fd0 sp=0x14000519fd0 pc=0x104669f94
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 7 gp=0x14000003c00 m=nil [GC worker (idle)]:
runtime.gopark(0x322f65c9b6665?, 0x3?, 0x4c?, 0x1d?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x1400006f710 sp=0x1400006f6f0 pc=0x104661bf0
runtime.gcBgMarkWorker(0x14000111880)
	runtime/mgc.go:1463 +0xe0 fp=0x1400006f7b0 sp=0x1400006f710 pc=0x10460f350
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x1400006f7d0 sp=0x1400006f7b0 pc=0x10460f238
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400006f7d0 sp=0x1400006f7d0 pc=0x104669f94
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 8 gp=0x14000003dc0 m=nil [GC worker (idle)]:
runtime.gopark(0x322f65c9add83?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x140000b9f10 sp=0x140000b9ef0 pc=0x104661bf0
runtime.gcBgMarkWorker(0x14000111880)
	runtime/mgc.go:1463 +0xe0 fp=0x140000b9fb0 sp=0x140000b9f10 pc=0x10460f350
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x140000b9fd0 sp=0x140000b9fb0 pc=0x10460f238
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x140000b9fd0 sp=0x140000b9fd0 pc=0x104669f94
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 9 gp=0x14000582380 m=nil [chan receive]:
runtime.gopark(0x14000049868?, 0x104642734?, 0x98?, 0x98?, 0x104663c8c?)
	runtime/proc.go:460 +0xc0 fp=0x14000049850 sp=0x14000049830 pc=0x104661bf0
runtime.chanrecv(0x140003261c0, 0x14000049a40, 0x1)
	runtime/chan.go:667 +0x428 fp=0x140000498d0 sp=0x14000049850 pc=0x1045fc318
runtime.chanrecv1(0x140002742d0?, 0x14000400000?)
	runtime/chan.go:509 +0x14 fp=0x14000049900 sp=0x140000498d0 pc=0x1045fbeb4
github.com/ollama/ollama/runner/llamarunner.(*Server).embeddings(0x14000421680, {0x105840c68, 0x1400001a3c0}, 0x14000014b40)
	github.com/ollama/ollama/runner/llamarunner/runner.go:712 +0x5a0 fp=0x14000049a90 sp=0x14000049900 pc=0x104a4c910
github.com/ollama/ollama/runner/llamarunner.(*Server).embeddings-fm({0x105840c68?, 0x1400001a3c0?}, 0x14000049b18?)
	<autogenerated>:1 +0x40 fp=0x14000049ac0 sp=0x14000049a90 pc=0x104a4e760
net/http.HandlerFunc.ServeHTTP(0x14000162000?, {0x105840c68?, 0x1400001a3c0?}, 0x14000049b00?)
	net/http/server.go:2322 +0x38 fp=0x14000049af0 sp=0x14000049ac0 pc=0x104904c28
net/http.(*ServeMux).ServeHTTP(0x10?, {0x105840c68, 0x1400001a3c0}, 0x14000014b40)
	net/http/server.go:2861 +0x190 fp=0x14000049b40 sp=0x14000049af0 pc=0x1049066c0
net/http.serverHandler.ServeHTTP({0x10583d6d0?}, {0x105840c68?, 0x1400001a3c0?}, 0x1?)
	net/http/server.go:3340 +0xb0 fp=0x14000049b70 sp=0x14000049b40 pc=0x104920eb0
net/http.(*conn).serve(0x140000c0090, {0x105843088, 0x140007749c0})
	net/http/server.go:2109 +0x528 fp=0x14000049fa0 sp=0x14000049b70 pc=0x104903018
net/http.(*Server).Serve.gowrap3()
	net/http/server.go:3493 +0x2c fp=0x14000049fd0 sp=0x14000049fa0 pc=0x10490834c
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x14000049fd0 sp=0x14000049fd0 pc=0x104669f94
created by net/http.(*Server).Serve in goroutine 1
	net/http/server.go:3493 +0x384

goroutine 42 gp=0x14000602540 m=nil [IO wait]:
runtime.gopark(0xffffffffffffffff?, 0xffffffffffffffff?, 0x23?, 0x0?, 0x104685c20?)
	runtime/proc.go:460 +0xc0 fp=0x1400017d580 sp=0x1400017d560 pc=0x104661bf0
runtime.netpollblock(0x0?, 0x0?, 0x0?)
	runtime/netpoll.go:575 +0x150 fp=0x1400017d5c0 sp=0x1400017d580 pc=0x1046277c0
internal/poll.runtime_pollWait(0x131a2fa00, 0x72)
	runtime/netpoll.go:351 +0xa0 fp=0x1400017d5f0 sp=0x1400017d5c0 pc=0x104660e20
internal/poll.(*pollDesc).wait(0x140001d8a80?, 0x14000428061?, 0x0)
	internal/poll/fd_poll_runtime.go:84 +0x28 fp=0x1400017d620 sp=0x1400017d5f0 pc=0x1046dfb88
internal/poll.(*pollDesc).waitRead(...)
	internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0x140001d8a80, {0x14000428061, 0x1, 0x1})
	internal/poll/fd_unix.go:165 +0x1e0 fp=0x1400017d6c0 sp=0x1400017d620 pc=0x1046e0da0
net.(*netFD).Read(0x140001d8a80, {0x14000428061?, 0x106065380?, 0x14000428114?})
	net/fd_posix.go:68 +0x28 fp=0x1400017d710 sp=0x1400017d6c0 pc=0x1047446d8
net.(*conn).Read(0x1400011c180, {0x14000428061?, 0x0?, 0x0?})
	net/net.go:196 +0x34 fp=0x1400017d760 sp=0x1400017d710 pc=0x104750cb4
net/http.(*connReader).backgroundRead(0x14000428040)
	net/http/server.go:702 +0x38 fp=0x1400017d7b0 sp=0x1400017d760 pc=0x1048fe088
net/http.(*connReader).startBackgroundRead.gowrap2()
	net/http/server.go:698 +0x28 fp=0x1400017d7d0 sp=0x1400017d7b0 pc=0x1048fdf78
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400017d7d0 sp=0x1400017d7d0 pc=0x104669f94
created by net/http.(*connReader).startBackgroundRead in goroutine 9
	net/http/server.go:698 +0xb8

r0  0x1067fc000
r1  0x1067ffba0
r2  0x0
r3  0x106802fc0
r4  0xcb710d800
r5  0x0
r6  0xffffffffbfc007ff
r7  0xfffff0003ffff800
r8  0xcb710d800
r9  0x0
r10 0x300100401980
r11 0xbf8192bdc0558a82
r12 0xbf3701bb3eff790e
r13 0x546319e169533ed5
r14 0x1068454c8
r15 0xcb710c000
r16 0x2839d9e88
r17 0xffffffffb00007ff
r18 0x0
r19 0xc00
r20 0xc00414634b5
r21 0x0
r22 0x1062bd430
r23 0x0
r24 0x300
r25 0x300
r26 0x200
r27 0x1062bd430
r28 0x0
r29 0x16d02acd0
lr  0x181748a78
sp  0x16d02ac60
pc  0x1815b6b1c
fault 0x1815b6b1c
```
<!-- gh-comment-id:3417834408 -->

@phrozen commented on GitHub (Oct 18, 2025):

I tested ollama `0.12.6` and at least my issue is fixed, thank you! No more zero/infinite vectors while generating embeddings over open data sets and synthetic tests for `qwen3-embedding:8B`.
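For anyone wanting to check their own server for this regression, the reports above can be probed with a short script that posts the same prompt repeatedly and flags both HTTP 500s and degenerate (all-zero or non-finite) vectors. This is a minimal sketch using only the standard library; the model name, prompt, and run count are illustrative assumptions, not part of any official tooling:

```python
import json
import math
import urllib.error
import urllib.request

# Legacy embeddings endpoint on the default Ollama port, as in the reports above.
OLLAMA_URL = "http://localhost:11434/api/embeddings"


def is_bad_vector(vec):
    """True if the embedding is empty, all zeros, or contains NaN/inf values."""
    if not vec:
        return True
    if any(not math.isfinite(x) for x in vec):
        return True
    return all(x == 0.0 for x in vec)


def probe(model="nomic-embed-text", prompt="hello world", runs=10):
    """POST the same prompt `runs` times; count server errors and bad vectors."""
    errors = bad = 0
    body = json.dumps({"model": model, "prompt": prompt}).encode()
    for _ in range(runs):
        req = urllib.request.Request(
            OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
        )
        try:
            with urllib.request.urlopen(req) as resp:
                vec = json.load(resp)["embedding"]
                bad += is_bad_vector(vec)
        except urllib.error.HTTPError:
            errors += 1  # e.g. the intermittent 500s reported on 0.12.5
    print(f"{errors}/{runs} HTTP errors, {bad}/{runs} zero/non-finite vectors")
```

Calling `probe()` against an affected 0.12.5 install should surface the roughly 50% failure rate described in the original report; on a fixed build both counters should stay at zero.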
<!-- gh-comment-id:3441834117 -->

@liaoweiguo commented on GitHub (Oct 24, 2025):

Fixed on NVIDIA, but still failing on Apple Metal with 0.12.6 and `bge-m3`.
Reference: github-starred/ollama#70410