[GH-ISSUE #13054] Ollama 0.12.10 embedding crash (nomic-embed-text-v1.5 on macOS) #55162

Open
opened 2026-04-29 08:25:40 -05:00 by GiteaMirror · 24 comments
Owner

Originally created by @smileBeda on GitHub (Nov 11, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/13054

Originally assigned to: @npardal on GitHub.

What is the issue?

I’m seeing the runner crash whenever I embed larger chunks. The server log shows it still launching with num_ctx=8192 even though the model only supports 2048, and the run ends in a SIGTRAP:

time=2025-11-11T18:03:30.193-03:00 level=WARN source=server.go:173 msg="requested context size too large for model" num_ctx=8192 n_ctx_train=2048
init: embeddings required but some input tokens were not marked as outputs -> overriding
SIGTRAP: trace trap
PC=0x181ddab1c m=9 sigcode=0
...
github.com/ollama/ollama/runner/llamarunner.(*Server).processBatch

Requests look like:
POST http://IP:11434/api/embed

For the record, the Ollama server runs in fully default mode, no changes made whatsoever, just a plain ollama serve:
OLLAMA_HOST=0.0.0.0 ollama serve

If I keep my input chunks as small as 512 tokens it works.
But that model should support at least 2048:

ollama show nomic-embed-text 
  Model
    architecture        nomic-bert    
    parameters          137M          
    context length      2048          
    embedding length    768           
    quantization        F16           

  Capabilities
    embedding    

  Parameters
    num_ctx    8192    

  License
    Apache License               
    Version 2.0, January 2004    
    ...                          
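The num_ctx 8192 shown under Parameters above is the value the runner inherits from the model's Modelfile. As a hedged sketch of one possible workaround (assuming the standard Modelfile PARAMETER mechanism applies to this embedding model), one could create a local variant capped at the trained window:

```
# Hypothetical Modelfile (e.g. nomic2k.Modelfile): cap num_ctx at the trained window
FROM nomic-embed-text
PARAMETER num_ctx 2048
```

Then `ollama create nomic-embed-text-2k -f nomic2k.Modelfile` and embed against the new name. The API documentation also describes a per-request `options` field (e.g. `{"num_ctx": 2048}`) that may achieve the same cap without creating a new model; whether either actually avoids the crash is untested here.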
  1. Is there a supported way to force the embedding runner to stay within 2048 tokens (e.g., a server-wide num_ctx cap) or a patch that prevents this crash? Any guidance or workaround would be appreciated.
  2. And why would Ollama even try to force such a window on a model that clearly does not support it?
  3. Why does it not support the actual window size?
  4. Most importantly: why did this just work without any error in prior releases? I am not sure when it broke, but I am sure it is Ollama, because my API calls and data did not change!
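Until the crash itself is fixed, a purely client-side guard is to split inputs before they reach /api/embed, since 512-token chunks reportedly work. A minimal sketch (the 512-token budget and the rough 4-characters-per-token heuristic are assumptions for illustration, not values taken from Ollama):

```python
# Hypothetical client-side guard: split text into chunks that stay well
# under the model's trained window before calling /api/embed.
# max_tokens=512 and chars_per_token=4 are assumed heuristics, not Ollama values.

def chunk_text(text: str, max_tokens: int = 512, chars_per_token: int = 4) -> list[str]:
    """Split text into pieces of at most max_tokens * chars_per_token characters."""
    limit = max_tokens * chars_per_token
    return [text[i:i + limit] for i in range(0, len(text), limit)] or [""]

# Each chunk would then be POSTed to http://IP:11434/api/embed separately,
# and the resulting vectors averaged or stored per-chunk as the application requires.
```

This sidesteps the server-side context handling entirely, at the cost of one request per chunk.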

Thanks

Relevant log output

Too long to share, above gist gives the idea.

OS

macOS

GPU

Apple

CPU

Apple

Ollama version

0.12.10

GiteaMirror added the embeddingsbugmacos labels 2026-04-29 08:25:41 -05:00
Author
Owner

@aaronpliu commented on GitHub (Nov 12, 2025):

I am experiencing the same issue. After reverting to the previous version, nomic-embed works again, and my other LLMs work as well. I suspect the new version broke something. I upgraded because I wanted to use a VL model, and the new version does reduce resource consumption and supports new models, but it also introduced some breaking changes that have led to instability. There are too many issues in the new version; hopefully there will be more testing and verification before the next release.

Author
Owner

@rick-github commented on GitHub (Nov 12, 2025):

  1. Is there a supported way to force the embedding runner to stay within 2048 tokens (e.g., a server-wide num_ctx cap) or a patch that prevents this crash? Any guidance or workaround would be appreciated.

It already does this, that's what the requested context size too large for model message is about.

  2. And why would Ollama even try to force such a window on a model that by far does not support it?

It doesn't.

  3. Why does it not support the actual window size?

It does.

  4. Most importantly? Why did this just work without any error in prior releases? I am not sure WHEN it broke but I am sure it is Ollama, because my API calls or data did not change!

A full server log will help to answer this question.

Author
Owner

@smileBeda commented on GitHub (Nov 12, 2025):

@rick-github, the nomic-embed-text model I use reports a window of 2048, yet it only works if you pass at most 512.
As for not respecting the window: I get that Ollama throws an error (server side, not client side; client side it just says EOF), but that is not what I consider "respecting" the window size. Ollama itself tries to force a size of 8192, as shown in the opening post.
The model's own report says num_ctx 8192, which is what seems to be used when loading the model.

Also, have you noticed @aaronpliu's comment saying not only that he sees the same thing, but confirming it has been happening since the last few releases, and that rolling back fixes it?

Full log:
(of several requests hitting the server; some pass, some do not. This was with 2048-token chunks; it WORKS with 512-token chunks)

ollama serve
time=2025-11-12T08:54:22.205-03:00 level=INFO source=routes.go:1525 msg="server config" env="map[HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/Users/bedas/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false http_proxy: https_proxy: no_proxy:]"
time=2025-11-12T08:54:22.206-03:00 level=INFO source=images.go:522 msg="total blobs: 9"
time=2025-11-12T08:54:22.206-03:00 level=INFO source=images.go:529 msg="total unused blobs removed: 0"
time=2025-11-12T08:54:22.207-03:00 level=INFO source=routes.go:1578 msg="Listening on [::]:11434 (version 0.12.10)"
time=2025-11-12T08:54:22.207-03:00 level=INFO source=runner.go:67 msg="discovering available GPUs..."
time=2025-11-12T08:54:22.208-03:00 level=INFO source=server.go:400 msg="starting runner" cmd="/opt/homebrew/Cellar/ollama/0.12.10/bin/ollama runner --ollama-engine --port 54584"
time=2025-11-12T08:54:22.272-03:00 level=INFO source=types.go:42 msg="inference compute" id=0 filter_id=0 library=Metal compute=0.0 name=Metal description="Apple M2 Pro" libdirs="" driver=0.0 pci_id="" type=discrete total="11.8 GiB" available="11.8 GiB"
time=2025-11-12T08:54:22.272-03:00 level=INFO source=routes.go:1619 msg="entering low vram mode" "total vram"="11.8 GiB" threshold="20.0 GiB"
[GIN] 2025/11/12 - 08:54:45 | 200 |    4.001917ms |     192.168.1.3 | GET      "/api/tags"
ggml_metal_library_init: using embedded metal library
ggml_metal_library_init: loaded in 0.017 sec
ggml_metal_device_init: GPU name:   Apple M2 Pro
ggml_metal_device_init: GPU family: MTLGPUFamilyApple8  (1008)
ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_device_init: GPU family: MTLGPUFamilyMetal4  (5002)
ggml_metal_device_init: simdgroup reduction   = true
ggml_metal_device_init: simdgroup matrix mul. = true
ggml_metal_device_init: has unified memory    = true
ggml_metal_device_init: has bfloat            = true
ggml_metal_device_init: use residency sets    = true
ggml_metal_device_init: use shared buffers    = true
ggml_metal_device_init: recommendedMaxWorkingSetSize  = 12713.12 MB
llama_model_load_from_file_impl: using device Metal (Apple M2 Pro) (unknown id) - 12123 MiB free
llama_model_loader: loaded meta data with 24 key-value pairs and 112 tensors from /Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = nomic-bert
llama_model_loader: - kv   1:                               general.name str              = nomic-embed-text-v1.5
llama_model_loader: - kv   2:                     nomic-bert.block_count u32              = 12
llama_model_loader: - kv   3:                  nomic-bert.context_length u32              = 2048
llama_model_loader: - kv   4:                nomic-bert.embedding_length u32              = 768
llama_model_loader: - kv   5:             nomic-bert.feed_forward_length u32              = 3072
llama_model_loader: - kv   6:            nomic-bert.attention.head_count u32              = 12
llama_model_loader: - kv   7:    nomic-bert.attention.layer_norm_epsilon f32              = 0.000000
llama_model_loader: - kv   8:                          general.file_type u32              = 1
llama_model_loader: - kv   9:                nomic-bert.attention.causal bool             = false
llama_model_loader: - kv  10:                    nomic-bert.pooling_type u32              = 1
llama_model_loader: - kv  11:                  nomic-bert.rope.freq_base f32              = 1000.000000
llama_model_loader: - kv  12:            tokenizer.ggml.token_type_count u32              = 2
llama_model_loader: - kv  13:                tokenizer.ggml.bos_token_id u32              = 101
llama_model_loader: - kv  14:                tokenizer.ggml.eos_token_id u32              = 102
llama_model_loader: - kv  15:                       tokenizer.ggml.model str              = bert
llama_model_loader: - kv  16:                      tokenizer.ggml.tokens arr[str,30522]   = ["[PAD]", "[unused0]", "[unused1]", "...
llama_model_loader: - kv  17:                      tokenizer.ggml.scores arr[f32,30522]   = [-1000.000000, -1000.000000, -1000.00...
llama_model_loader: - kv  18:                  tokenizer.ggml.token_type arr[i32,30522]   = [3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  19:            tokenizer.ggml.unknown_token_id u32              = 100
llama_model_loader: - kv  20:          tokenizer.ggml.seperator_token_id u32              = 102
llama_model_loader: - kv  21:            tokenizer.ggml.padding_token_id u32              = 0
llama_model_loader: - kv  22:                tokenizer.ggml.cls_token_id u32              = 101
llama_model_loader: - kv  23:               tokenizer.ggml.mask_token_id u32              = 103
llama_model_loader: - type  f32:   51 tensors
llama_model_loader: - type  f16:   61 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = F16
print_info: file size   = 260.86 MiB (16.00 BPW) 
load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
load: printing all EOG tokens:
load:   - 102 ('[SEP]')
load: special tokens cache size = 5
load: token to piece cache size = 0.2032 MB
print_info: arch             = nomic-bert
print_info: vocab_only       = 1
print_info: model type       = ?B
print_info: model params     = 136.73 M
print_info: general.name     = nomic-embed-text-v1.5
print_info: vocab type       = WPM
print_info: n_vocab          = 30522
print_info: n_merges         = 0
print_info: BOS token        = 101 '[CLS]'
print_info: EOS token        = 102 '[SEP]'
print_info: UNK token        = 100 '[UNK]'
print_info: SEP token        = 102 '[SEP]'
print_info: PAD token        = 0 '[PAD]'
print_info: MASK token       = 103 '[MASK]'
print_info: LF token         = 0 '[PAD]'
print_info: EOG token        = 102 '[SEP]'
print_info: max token length = 21
llama_model_load: vocab only - skipping tensors
time=2025-11-12T08:54:45.493-03:00 level=WARN source=server.go:173 msg="requested context size too large for model" num_ctx=8192 n_ctx_train=2048
time=2025-11-12T08:54:45.494-03:00 level=INFO source=server.go:400 msg="starting runner" cmd="/opt/homebrew/Cellar/ollama/0.12.10/bin/ollama runner --model /Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 --port 54587"
time=2025-11-12T08:54:45.496-03:00 level=INFO source=server.go:470 msg="system memory" total="16.0 GiB" free="4.7 GiB" free_swap="0 B"
time=2025-11-12T08:54:45.496-03:00 level=INFO source=memory.go:37 msg="new model will fit in available VRAM across minimum required GPUs, loading" model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 library=Metal parallel=1 required="809.4 MiB" gpus=1
time=2025-11-12T08:54:45.496-03:00 level=INFO source=server.go:522 msg=offload library=Metal layers.requested=-1 layers.model=13 layers.offload=13 layers.split=[13] memory.available="[11.8 GiB]" memory.gpu_overhead="0 B" memory.required.full="809.4 MiB" memory.required.partial="809.4 MiB" memory.required.kv="6.0 MiB" memory.required.allocations="[809.4 MiB]" memory.weights.total="260.9 MiB" memory.weights.repeating="216.1 MiB" memory.weights.nonrepeating="44.7 MiB" memory.graph.full="12.0 MiB" memory.graph.partial="12.0 MiB"
time=2025-11-12T08:54:45.513-03:00 level=INFO source=runner.go:910 msg="starting go runner"
ggml_metal_library_init: using embedded metal library
ggml_metal_library_init: loaded in 0.006 sec
ggml_metal_device_init: GPU name:   Apple M2 Pro
ggml_metal_device_init: GPU family: MTLGPUFamilyApple8  (1008)
ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_device_init: GPU family: MTLGPUFamilyMetal4  (5002)
ggml_metal_device_init: simdgroup reduction   = true
ggml_metal_device_init: simdgroup matrix mul. = true
ggml_metal_device_init: has unified memory    = true
ggml_metal_device_init: has bfloat            = true
ggml_metal_device_init: use residency sets    = true
ggml_metal_device_init: use shared buffers    = true
ggml_metal_device_init: recommendedMaxWorkingSetSize  = 12713.12 MB
time=2025-11-12T08:54:45.513-03:00 level=INFO source=ggml.go:104 msg=system Metal.0.EMBED_LIBRARY=1 CPU.0.NEON=1 CPU.0.ARM_FMA=1 CPU.0.FP16_VA=1 CPU.0.DOTPROD=1 CPU.0.LLAMAFILE=1 CPU.0.ACCELERATE=1 compiler=cgo(clang)
time=2025-11-12T08:54:45.537-03:00 level=INFO source=runner.go:946 msg="Server listening on 127.0.0.1:54587"
time=2025-11-12T08:54:45.542-03:00 level=INFO source=runner.go:845 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:false KvSize:2048 KvCacheType: NumThreads:6 GPULayers:13[ID:0 Layers:13(0..12)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:true}"
llama_model_load_from_file_impl: using device Metal (Apple M2 Pro) (unknown id) - 12123 MiB free
time=2025-11-12T08:54:45.542-03:00 level=INFO source=server.go:1251 msg="waiting for llama runner to start responding"
time=2025-11-12T08:54:45.542-03:00 level=INFO source=server.go:1285 msg="waiting for server to become available" status="llm server loading model"
llama_model_loader: loaded meta data with 24 key-value pairs and 112 tensors from /Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = nomic-bert
llama_model_loader: - kv   1:                               general.name str              = nomic-embed-text-v1.5
llama_model_loader: - kv   2:                     nomic-bert.block_count u32              = 12
llama_model_loader: - kv   3:                  nomic-bert.context_length u32              = 2048
llama_model_loader: - kv   4:                nomic-bert.embedding_length u32              = 768
llama_model_loader: - kv   5:             nomic-bert.feed_forward_length u32              = 3072
llama_model_loader: - kv   6:            nomic-bert.attention.head_count u32              = 12
llama_model_loader: - kv   7:    nomic-bert.attention.layer_norm_epsilon f32              = 0.000000
llama_model_loader: - kv   8:                          general.file_type u32              = 1
llama_model_loader: - kv   9:                nomic-bert.attention.causal bool             = false
llama_model_loader: - kv  10:                    nomic-bert.pooling_type u32              = 1
llama_model_loader: - kv  11:                  nomic-bert.rope.freq_base f32              = 1000.000000
llama_model_loader: - kv  12:            tokenizer.ggml.token_type_count u32              = 2
llama_model_loader: - kv  13:                tokenizer.ggml.bos_token_id u32              = 101
llama_model_loader: - kv  14:                tokenizer.ggml.eos_token_id u32              = 102
llama_model_loader: - kv  15:                       tokenizer.ggml.model str              = bert
llama_model_loader: - kv  16:                      tokenizer.ggml.tokens arr[str,30522]   = ["[PAD]", "[unused0]", "[unused1]", "...
llama_model_loader: - kv  17:                      tokenizer.ggml.scores arr[f32,30522]   = [-1000.000000, -1000.000000, -1000.00...
llama_model_loader: - kv  18:                  tokenizer.ggml.token_type arr[i32,30522]   = [3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  19:            tokenizer.ggml.unknown_token_id u32              = 100
llama_model_loader: - kv  20:          tokenizer.ggml.seperator_token_id u32              = 102
llama_model_loader: - kv  21:            tokenizer.ggml.padding_token_id u32              = 0
llama_model_loader: - kv  22:                tokenizer.ggml.cls_token_id u32              = 101
llama_model_loader: - kv  23:               tokenizer.ggml.mask_token_id u32              = 103
llama_model_loader: - type  f32:   51 tensors
llama_model_loader: - type  f16:   61 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = F16
print_info: file size   = 260.86 MiB (16.00 BPW) 
load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
load: printing all EOG tokens:
load:   - 102 ('[SEP]')
load: special tokens cache size = 5
load: token to piece cache size = 0.2032 MB
print_info: arch             = nomic-bert
print_info: vocab_only       = 0
print_info: n_ctx_train      = 2048
print_info: n_embd           = 768
print_info: n_layer          = 12
print_info: n_head           = 12
print_info: n_head_kv        = 12
print_info: n_rot            = 64
print_info: n_swa            = 0
print_info: is_swa_any       = 0
print_info: n_embd_head_k    = 64
print_info: n_embd_head_v    = 64
print_info: n_gqa            = 1
print_info: n_embd_k_gqa     = 768
print_info: n_embd_v_gqa     = 768
print_info: f_norm_eps       = 1.0e-12
print_info: f_norm_rms_eps   = 0.0e+00
print_info: f_clamp_kqv      = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale    = 0.0e+00
print_info: f_attn_scale     = 0.0e+00
print_info: n_ff             = 3072
print_info: n_expert         = 0
print_info: n_expert_used    = 0
print_info: causal attn      = 0
print_info: pooling type     = 1
print_info: rope type        = 2
print_info: rope scaling     = linear
print_info: freq_base_train  = 1000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn  = 2048
print_info: rope_finetuned   = unknown
print_info: model type       = 137M
print_info: model params     = 136.73 M
print_info: general.name     = nomic-embed-text-v1.5
print_info: vocab type       = WPM
print_info: n_vocab          = 30522
print_info: n_merges         = 0
print_info: BOS token        = 101 '[CLS]'
print_info: EOS token        = 102 '[SEP]'
print_info: UNK token        = 100 '[UNK]'
print_info: SEP token        = 102 '[SEP]'
print_info: PAD token        = 0 '[PAD]'
print_info: MASK token       = 103 '[MASK]'
print_info: LF token         = 0 '[PAD]'
print_info: EOG token        = 102 '[SEP]'
print_info: max token length = 21
load_tensors: loading model tensors, this can take a while... (mmap = true)
load_tensors: offloading 12 repeating layers to GPU
load_tensors: offloading output layer to GPU
load_tensors: offloaded 13/13 layers to GPU
load_tensors:   CPU_Mapped model buffer size =    44.72 MiB
load_tensors: Metal_Mapped model buffer size =   216.14 MiB
llama_init_from_model: model default pooling_type is [1], but [-1] was specified
llama_context: constructing llama_context
llama_context: n_seq_max     = 1
llama_context: n_ctx         = 2048
llama_context: n_ctx_per_seq = 2048
llama_context: n_batch       = 512
llama_context: n_ubatch      = 512
llama_context: causal_attn   = 0
llama_context: flash_attn    = disabled
llama_context: kv_unified    = false
llama_context: freq_base     = 1000.0
llama_context: freq_scale    = 1
ggml_metal_init: allocating
ggml_metal_init: picking default device: Apple M2 Pro
ggml_metal_init: use bfloat         = true
ggml_metal_init: use fusion         = true
ggml_metal_init: use concurrency    = true
ggml_metal_init: use graph optimize = true
llama_context:        CPU  output buffer size =     0.12 MiB
llama_context:      Metal compute buffer size =    23.51 MiB
llama_context:        CPU compute buffer size =     4.01 MiB
llama_context: graph nodes  = 371
llama_context: graph splits = 2
time=2025-11-12T08:54:45.793-03:00 level=INFO source=server.go:1289 msg="llama runner started in 0.30 seconds"
time=2025-11-12T08:54:45.793-03:00 level=INFO source=sched.go:500 msg="loaded runners" count=1
time=2025-11-12T08:54:45.793-03:00 level=INFO source=server.go:1251 msg="waiting for llama runner to start responding"
time=2025-11-12T08:54:45.794-03:00 level=INFO source=server.go:1289 msg="llama runner started in 0.30 seconds"
init: embeddings required but some input tokens were not marked as outputs -> overriding
output_reserve: reallocating output buffer from size 0.12 MiB to 59.08 MiB
[GIN] 2025/11/12 - 08:54:45 | 200 |  458.876625ms |     192.168.1.3 | POST     "/api/embed"
init: embeddings required but some input tokens were not marked as outputs -> overriding
[GIN] 2025/11/12 - 08:54:45 | 200 |   60.467708ms |     192.168.1.3 | POST     "/api/embed"
init: embeddings required but some input tokens were not marked as outputs -> overriding
[GIN] 2025/11/12 - 08:54:45 | 200 |   56.952416ms |     192.168.1.3 | POST     "/api/embed"
init: embeddings required but some input tokens were not marked as outputs -> overriding
[GIN] 2025/11/12 - 08:54:46 | 200 |   34.189041ms |     192.168.1.3 | POST     "/api/embed"
init: embeddings required but some input tokens were not marked as outputs -> overriding
output_reserve: reallocating output buffer from size 59.08 MiB to 61.11 MiB
init: embeddings required but some input tokens were not marked as outputs -> overriding
SIGTRAP: trace trap
PC=0x181ddab1c m=0 sigcode=0
signal arrived during cgo execution

goroutine 50 gp=0x14000504c40 m=0 mp=0x104444120 [syscall]:
runtime.cgocall(0x103444fbc, 0x14000711b88)
	runtime/cgocall.go:167 +0x44 fp=0x14000711b50 sp=0x14000711b10 pc=0x10294a684
github.com/ollama/ollama/llama._Cfunc_llama_decode(0x104e4f7e0, {0x200, 0x104e566e0, 0x0, 0xaca899000, 0xaca898800, 0xacb5b6800, 0x104e504e0})
	_cgo_gotypes.go:674 +0x30 fp=0x14000711b80 sp=0x14000711b50 pc=0x102c91790
github.com/ollama/ollama/llama.(*Context).Decode.func1(...)
	github.com/ollama/ollama/llama/llama.go:160
github.com/ollama/ollama/llama.(*Context).Decode(0x140003d4c08?, 0x0?)
	github.com/ollama/ollama/llama/llama.go:160 +0xcc fp=0x14000711c70 sp=0x14000711b80 pc=0x102c9394c
github.com/ollama/ollama/runner/llamarunner.(*Server).processBatch(0x14000412320, 0x1400004eb40, 0x14000711f18)
	github.com/ollama/ollama/runner/llamarunner/runner.go:468 +0x1d4 fp=0x14000711ed0 sp=0x14000711c70 pc=0x102d33774
github.com/ollama/ollama/runner/llamarunner.(*Server).run(0x14000412320, {0x103b4fcd0, 0x14000538960})
	github.com/ollama/ollama/runner/llamarunner/runner.go:361 +0x15c fp=0x14000711fa0 sp=0x14000711ed0 pc=0x102d3343c
github.com/ollama/ollama/runner/llamarunner.Execute.gowrap1()
	github.com/ollama/ollama/runner/llamarunner/runner.go:926 +0x2c fp=0x14000711fd0 sp=0x14000711fa0 pc=0x102d3724c
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x14000711fd0 sp=0x14000711fd0 pc=0x102955d04
created by github.com/ollama/ollama/runner/llamarunner.Execute in goroutine 1
	github.com/ollama/ollama/runner/llamarunner/runner.go:926 +0x418

goroutine 1 gp=0x140000021c0 m=nil [IO wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000231720 sp=0x14000231700 pc=0x10294da50
runtime.netpollblock(0x1400052f7b8?, 0x29cfa6c?, 0x1?)
	runtime/netpoll.go:575 +0x150 fp=0x14000231760 sp=0x14000231720 pc=0x102913620
internal/poll.runtime_pollWait(0x12fd60400, 0x72)
	runtime/netpoll.go:351 +0xa0 fp=0x14000231790 sp=0x14000231760 pc=0x10294cc80
internal/poll.(*pollDesc).wait(0x14000718100?, 0x1029d1afc?, 0x0)
	internal/poll/fd_poll_runtime.go:84 +0x28 fp=0x140002317c0 sp=0x14000231790 pc=0x1029cb568
internal/poll.(*pollDesc).waitRead(...)
	internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Accept(0x14000718100)
	internal/poll/fd_unix.go:613 +0x21c fp=0x14000231870 sp=0x140002317c0 pc=0x1029cfb4c
net.(*netFD).accept(0x14000718100)
	net/fd_unix.go:161 +0x28 fp=0x14000231930 sp=0x14000231870 pc=0x102a317f8
net.(*TCPListener).accept(0x140000f1780)
	net/tcpsock_posix.go:159 +0x24 fp=0x14000231980 sp=0x14000231930 pc=0x102a44e64
net.(*TCPListener).Accept(0x140000f1780)
	net/tcpsock.go:380 +0x2c fp=0x140002319c0 sp=0x14000231980 pc=0x102a43f0c
net/http.(*onceCloseListener).Accept(0x14000720090?)
	<autogenerated>:1 +0x2c fp=0x140002319e0 sp=0x140002319c0 pc=0x102c18a1c
net/http.(*Server).Serve(0x140001f4800, {0x103b4d668, 0x140000f1780})
	net/http/server.go:3463 +0x24c fp=0x14000231b10 sp=0x140002319e0 pc=0x102bf3bac
github.com/ollama/ollama/runner/llamarunner.Execute({0x140000341a0, 0x4, 0x4})
	github.com/ollama/ollama/runner/llamarunner/runner.go:947 +0x754 fp=0x14000231ce0 sp=0x14000231b10 pc=0x102d37064
github.com/ollama/ollama/runner.Execute({0x14000034190?, 0x0?, 0x0?})
	github.com/ollama/ollama/runner/runner.go:22 +0x128 fp=0x14000231d10 sp=0x14000231ce0 pc=0x102dae528
github.com/ollama/ollama/cmd.NewCLI.func2(0x1400051ee00?, {0x10369baf0?, 0x4?, 0x10369baf4?})
	github.com/ollama/ollama/cmd/cmd.go:1841 +0x50 fp=0x14000231d40 sp=0x14000231d10 pc=0x1033f5db0
github.com/spf13/cobra.(*Command).execute(0x140004ad508, {0x1400041bf40, 0x4, 0x4})
	github.com/spf13/cobra@v1.7.0/command.go:940 +0x5e0 fp=0x14000231e60 sp=0x14000231d40 pc=0x102a9bf60
github.com/spf13/cobra.(*Command).ExecuteC(0x1400055af08)
	github.com/spf13/cobra@v1.7.0/command.go:1068 +0x2ec fp=0x14000231f20 sp=0x14000231e60 pc=0x102a9c63c
github.com/spf13/cobra.(*Command).Execute(...)
	github.com/spf13/cobra@v1.7.0/command.go:992
github.com/spf13/cobra.(*Command).ExecuteContext(...)
	github.com/spf13/cobra@v1.7.0/command.go:985
main.main()
	github.com/ollama/ollama/main.go:12 +0x54 fp=0x14000231f40 sp=0x14000231f20 pc=0x1033f68d4
runtime.main()
	runtime/proc.go:285 +0x278 fp=0x14000231fd0 sp=0x14000231f40 pc=0x10291a0d8
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x14000231fd0 sp=0x14000231fd0 pc=0x102955d04

goroutine 2 gp=0x14000002c40 m=nil [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x1400006cf90 sp=0x1400006cf70 pc=0x10294da50
runtime.goparkunlock(...)
	runtime/proc.go:466
runtime.forcegchelper()
	runtime/proc.go:373 +0xb4 fp=0x1400006cfd0 sp=0x1400006cf90 pc=0x10291a424
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400006cfd0 sp=0x1400006cfd0 pc=0x102955d04
created by runtime.init.7 in goroutine 1
	runtime/proc.go:361 +0x24

goroutine 18 gp=0x14000102380 m=nil [GC sweep wait]:
runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000068760 sp=0x14000068740 pc=0x10294da50
runtime.goparkunlock(...)
	runtime/proc.go:466
runtime.bgsweep(0x14000098000)
	runtime/mgcsweep.go:323 +0x104 fp=0x140000687b0 sp=0x14000068760 pc=0x102904ee4
runtime.gcenable.gowrap1()
	runtime/mgc.go:212 +0x28 fp=0x140000687d0 sp=0x140000687b0 pc=0x1028f8b38
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x140000687d0 sp=0x140000687d0 pc=0x102955d04
created by runtime.gcenable in goroutine 1
	runtime/mgc.go:212 +0x6c

goroutine 19 gp=0x14000102540 m=nil [GC scavenge wait]:
runtime.gopark(0x10000?, 0x10385a590?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000068f60 sp=0x14000068f40 pc=0x10294da50
runtime.goparkunlock(...)
	runtime/proc.go:466
runtime.(*scavengerState).park(0x104441860)
	runtime/mgcscavenge.go:425 +0x5c fp=0x14000068f90 sp=0x14000068f60 pc=0x1029029fc
runtime.bgscavenge(0x14000098000)
	runtime/mgcscavenge.go:658 +0xac fp=0x14000068fb0 sp=0x14000068f90 pc=0x102902f9c
runtime.gcenable.gowrap2()
	runtime/mgc.go:213 +0x28 fp=0x14000068fd0 sp=0x14000068fb0 pc=0x1028f8ad8
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x14000068fd0 sp=0x14000068fd0 pc=0x102955d04
created by runtime.gcenable in goroutine 1
	runtime/mgc.go:213 +0xac

goroutine 3 gp=0x14000003880 m=nil [finalizer wait]:
runtime.gopark(0x0?, 0x103b39a18?, 0x20?, 0xa0?, 0x1000000010?)
	runtime/proc.go:460 +0xc0 fp=0x1400006c580 sp=0x1400006c560 pc=0x10294da50
runtime.runFinalizers()
	runtime/mfinal.go:210 +0x104 fp=0x1400006c7d0 sp=0x1400006c580 pc=0x1028f7b24
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400006c7d0 sp=0x1400006c7d0 pc=0x102955d04
created by runtime.createfing in goroutine 1
	runtime/mfinal.go:172 +0x78

goroutine 4 gp=0x1400023a380 m=nil [cleanup wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x1400006d740 sp=0x1400006d720 pc=0x10294da50
runtime.goparkunlock(...)
	runtime/proc.go:466
runtime.(*cleanupQueue).dequeue(0x104442740)
	runtime/mcleanup.go:439 +0x110 fp=0x1400006d780 sp=0x1400006d740 pc=0x1028f5010
runtime.runCleanups()
	runtime/mcleanup.go:635 +0x40 fp=0x1400006d7d0 sp=0x1400006d780 pc=0x1028f5820
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400006d7d0 sp=0x1400006d7d0 pc=0x102955d04
created by runtime.(*cleanupQueue).createGs in goroutine 1
	runtime/mcleanup.go:589 +0x108

goroutine 5 gp=0x1400023a8c0 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x1400006df10 sp=0x1400006def0 pc=0x10294da50
runtime.gcBgMarkWorker(0x140000a36c0)
	runtime/mgc.go:1463 +0xe0 fp=0x1400006dfb0 sp=0x1400006df10 pc=0x1028fb1b0
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x1400006dfd0 sp=0x1400006dfb0 pc=0x1028fb098
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400006dfd0 sp=0x1400006dfd0 pc=0x102955d04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 34 gp=0x14000504000 m=nil [GC worker (idle)]:
runtime.gopark(0x104473920?, 0x3?, 0x95?, 0x72?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x1400050a710 sp=0x1400050a6f0 pc=0x10294da50
runtime.gcBgMarkWorker(0x140000a36c0)
	runtime/mgc.go:1463 +0xe0 fp=0x1400050a7b0 sp=0x1400050a710 pc=0x1028fb1b0
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x1400050a7d0 sp=0x1400050a7b0 pc=0x1028fb098
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400050a7d0 sp=0x1400050a7d0 pc=0x102955d04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 20 gp=0x14000102700 m=nil [GC worker (idle)]:
runtime.gopark(0x576c9d4eb22c8?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000069710 sp=0x140000696f0 pc=0x10294da50
runtime.gcBgMarkWorker(0x140000a36c0)
	runtime/mgc.go:1463 +0xe0 fp=0x140000697b0 sp=0x14000069710 pc=0x1028fb1b0
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x140000697d0 sp=0x140000697b0 pc=0x1028fb098
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x140000697d0 sp=0x140000697d0 pc=0x102955d04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 21 gp=0x140001028c0 m=nil [GC worker (idle)]:
runtime.gopark(0x576c9d4ebe835?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000069f10 sp=0x14000069ef0 pc=0x10294da50
runtime.gcBgMarkWorker(0x140000a36c0)
	runtime/mgc.go:1463 +0xe0 fp=0x14000069fb0 sp=0x14000069f10 pc=0x1028fb1b0
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x14000069fd0 sp=0x14000069fb0 pc=0x1028fb098
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x14000069fd0 sp=0x14000069fd0 pc=0x102955d04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 6 gp=0x1400023aa80 m=nil [GC worker (idle)]:
runtime.gopark(0x576c9d4eba64a?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000080f10 sp=0x14000080ef0 pc=0x10294da50
runtime.gcBgMarkWorker(0x140000a36c0)
	runtime/mgc.go:1463 +0xe0 fp=0x14000080fb0 sp=0x14000080f10 pc=0x1028fb1b0
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x14000080fd0 sp=0x14000080fb0 pc=0x1028fb098
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x14000080fd0 sp=0x14000080fd0 pc=0x102955d04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 35 gp=0x140005041c0 m=nil [GC worker (idle)]:
runtime.gopark(0x576c9d4ec447e?, 0x3?, 0x1a?, 0xc5?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x1400050af10 sp=0x1400050aef0 pc=0x10294da50
runtime.gcBgMarkWorker(0x140000a36c0)
	runtime/mgc.go:1463 +0xe0 fp=0x1400050afb0 sp=0x1400050af10 pc=0x1028fb1b0
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x1400050afd0 sp=0x1400050afb0 pc=0x1028fb098
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400050afd0 sp=0x1400050afd0 pc=0x102955d04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 22 gp=0x14000102a80 m=nil [GC worker (idle)]:
runtime.gopark(0x576c9d4ebf370?, 0x3?, 0xf8?, 0x93?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x1400006a710 sp=0x1400006a6f0 pc=0x10294da50
runtime.gcBgMarkWorker(0x140000a36c0)
	runtime/mgc.go:1463 +0xe0 fp=0x1400006a7b0 sp=0x1400006a710 pc=0x1028fb1b0
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x1400006a7d0 sp=0x1400006a7b0 pc=0x1028fb098
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400006a7d0 sp=0x1400006a7d0 pc=0x102955d04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 23 gp=0x14000102c40 m=nil [GC worker (idle)]:
runtime.gopark(0x576c9d4eb88ab?, 0x3?, 0x76?, 0xf8?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x1400006af10 sp=0x1400006aef0 pc=0x10294da50
runtime.gcBgMarkWorker(0x140000a36c0)
	runtime/mgc.go:1463 +0xe0 fp=0x1400006afb0 sp=0x1400006af10 pc=0x1028fb1b0
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x1400006afd0 sp=0x1400006afb0 pc=0x1028fb098
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400006afd0 sp=0x1400006afd0 pc=0x102955d04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 7 gp=0x1400023ac40 m=nil [GC worker (idle)]:
runtime.gopark(0x576c9d4ebb51a?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000713f10 sp=0x14000713ef0 pc=0x10294da50
runtime.gcBgMarkWorker(0x140000a36c0)
	runtime/mgc.go:1463 +0xe0 fp=0x14000713fb0 sp=0x14000713f10 pc=0x1028fb1b0
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x14000713fd0 sp=0x14000713fb0 pc=0x1028fb098
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x14000713fd0 sp=0x14000713fd0 pc=0x102955d04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 36 gp=0x14000504700 m=nil [GC worker (idle)]:
runtime.gopark(0x104473920?, 0x1?, 0x6e?, 0x3b?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000710f10 sp=0x14000710ef0 pc=0x10294da50
runtime.gcBgMarkWorker(0x140000a36c0)
	runtime/mgc.go:1463 +0xe0 fp=0x14000710fb0 sp=0x14000710f10 pc=0x1028fb1b0
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x14000710fd0 sp=0x14000710fb0 pc=0x1028fb098
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x14000710fd0 sp=0x14000710fd0 pc=0x102955d04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 51 gp=0x14000504e00 m=nil [chan receive]:
runtime.gopark(0x14000049868?, 0x10292e594?, 0x98?, 0x98?, 0x10294f9fc?)
	runtime/proc.go:460 +0xc0 fp=0x14000049850 sp=0x14000049830 pc=0x10294da50
runtime.chanrecv(0x140003300e0, 0x14000049a40, 0x1)
	runtime/chan.go:667 +0x428 fp=0x140000498d0 sp=0x14000049850 pc=0x1028e8318
runtime.chanrecv1(0x140000f2030?, 0x14000590000?)
	runtime/chan.go:509 +0x14 fp=0x14000049900 sp=0x140000498d0 pc=0x1028e7eb4
github.com/ollama/ollama/runner/llamarunner.(*Server).embeddings(0x14000412320, {0x103b4d848, 0x1400024a0f0}, 0x140003783c0)
	github.com/ollama/ollama/runner/llamarunner/runner.go:758 +0x5a8 fp=0x14000049a90 sp=0x14000049900 pc=0x102d35728
github.com/ollama/ollama/runner/llamarunner.(*Server).embeddings-fm({0x103b4d848?, 0x1400024a0f0?}, 0x14000049b18?)
	<autogenerated>:1 +0x40 fp=0x14000049ac0 sp=0x14000049a90 pc=0x102d37570
net/http.HandlerFunc.ServeHTTP(0x140000e0000?, {0x103b4d848?, 0x1400024a0f0?}, 0x14000049b00?)
	net/http/server.go:2322 +0x38 fp=0x14000049af0 sp=0x14000049ac0 pc=0x102bf07e8
net/http.(*ServeMux).ServeHTTP(0x10?, {0x103b4d848, 0x1400024a0f0}, 0x140003783c0)
	net/http/server.go:2861 +0x190 fp=0x14000049b40 sp=0x14000049af0 pc=0x102bf2280
net/http.serverHandler.ServeHTTP({0x103b4a2b0?}, {0x103b4d848?, 0x1400024a0f0?}, 0x1?)
	net/http/server.go:3340 +0xb0 fp=0x14000049b70 sp=0x14000049b40 pc=0x102c0ca70
net/http.(*conn).serve(0x14000720090, {0x103b4fc98, 0x1400027af30})
	net/http/server.go:2109 +0x528 fp=0x14000049fa0 sp=0x14000049b70 pc=0x102beebd8
net/http.(*Server).Serve.gowrap3()
	net/http/server.go:3493 +0x2c fp=0x14000049fd0 sp=0x14000049fa0 pc=0x102bf3f0c
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x14000049fd0 sp=0x14000049fd0 pc=0x102955d04
created by net/http.(*Server).Serve in goroutine 1
	net/http/server.go:3493 +0x384

goroutine 67 gp=0x140005056c0 m=nil [IO wait]:
runtime.gopark(0xffffffffffffffff?, 0xffffffffffffffff?, 0x23?, 0x0?, 0x102971600?)
	runtime/proc.go:460 +0xc0 fp=0x14000506580 sp=0x14000506560 pc=0x10294da50
runtime.netpollblock(0x0?, 0x0?, 0x0?)
	runtime/netpoll.go:575 +0x150 fp=0x140005065c0 sp=0x14000506580 pc=0x102913620
internal/poll.runtime_pollWait(0x12fd60200, 0x72)
	runtime/netpoll.go:351 +0xa0 fp=0x140005065f0 sp=0x140005065c0 pc=0x10294cc80
internal/poll.(*pollDesc).wait(0x14000718180?, 0x140000f17e1?, 0x0)
	internal/poll/fd_poll_runtime.go:84 +0x28 fp=0x14000506620 sp=0x140005065f0 pc=0x1029cb568
internal/poll.(*pollDesc).waitRead(...)
	internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0x14000718180, {0x140000f17e1, 0x1, 0x1})
	internal/poll/fd_unix.go:165 +0x1e0 fp=0x140005066c0 sp=0x14000506620 pc=0x1029cc780
net.(*netFD).Read(0x14000718180, {0x140000f17e1?, 0x10437d450?, 0x140000f1894?})
	net/fd_posix.go:68 +0x28 fp=0x14000506710 sp=0x140005066c0 pc=0x102a2fff8
net.(*conn).Read(0x14000532110, {0x140000f17e1?, 0x0?, 0x0?})
	net/net.go:196 +0x34 fp=0x14000506760 sp=0x14000506710 pc=0x102a3c5d4
net/http.(*connReader).backgroundRead(0x140000f17c0)
	net/http/server.go:702 +0x38 fp=0x140005067b0 sp=0x14000506760 pc=0x102be9c48
net/http.(*connReader).startBackgroundRead.gowrap2()
	net/http/server.go:698 +0x28 fp=0x140005067d0 sp=0x140005067b0 pc=0x102be9b38
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x140005067d0 sp=0x140005067d0 pc=0x102955d04
created by net/http.(*connReader).startBackgroundRead in goroutine 51
	net/http/server.go:698 +0xb8

r0      0x1049f4000
r1      0x1049f7c60
r2      0x0
r3      0x1049fb000
r4      0xaca4a1800
r5      0x0
r6      0xffffffffbfc007ff
r7      0xfffff0003ffff800
r8      0xaca4a1800
r9      0x0
r10     0x800380401980
r11     0xbfb99984c02334b7
r12     0x3f140930bf8574a0
r13     0x96c7612865f9de43
r14     0x104a68fb8
r15     0xaca4a0000
r16     0x2821b5e88
r17     0xffffffffb00007ff
r18     0x0
r19     0xc00
r20     0xc0040d232c7
r21     0x0
r22     0x104e4f920
r23     0x0
r24     0x300
r25     0x300
r26     0x200
r27     0x104e4f920
r28     0x0
r29     0x16d52ad00
lr      0x181f6ca78
sp      0x16d52ac90
pc      0x181ddab1c
fault   0x181ddab1c
[GIN] 2025/11/12 - 08:54:46 | 500 |   64.535708ms |     192.168.1.3 | POST     "/api/embed"
llama_model_load_from_file_impl: using device Metal (Apple M2 Pro) (unknown id) - 12123 MiB free
llama_model_loader: loaded meta data with 24 key-value pairs and 112 tensors from /Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = nomic-bert
llama_model_loader: - kv   1:                               general.name str              = nomic-embed-text-v1.5
llama_model_loader: - kv   2:                     nomic-bert.block_count u32              = 12
llama_model_loader: - kv   3:                  nomic-bert.context_length u32              = 2048
llama_model_loader: - kv   4:                nomic-bert.embedding_length u32              = 768
llama_model_loader: - kv   5:             nomic-bert.feed_forward_length u32              = 3072
llama_model_loader: - kv   6:            nomic-bert.attention.head_count u32              = 12
llama_model_loader: - kv   7:    nomic-bert.attention.layer_norm_epsilon f32              = 0.000000
llama_model_loader: - kv   8:                          general.file_type u32              = 1
llama_model_loader: - kv   9:                nomic-bert.attention.causal bool             = false
llama_model_loader: - kv  10:                    nomic-bert.pooling_type u32              = 1
llama_model_loader: - kv  11:                  nomic-bert.rope.freq_base f32              = 1000.000000
llama_model_loader: - kv  12:            tokenizer.ggml.token_type_count u32              = 2
llama_model_loader: - kv  13:                tokenizer.ggml.bos_token_id u32              = 101
llama_model_loader: - kv  14:                tokenizer.ggml.eos_token_id u32              = 102
llama_model_loader: - kv  15:                       tokenizer.ggml.model str              = bert
llama_model_loader: - kv  16:                      tokenizer.ggml.tokens arr[str,30522]   = ["[PAD]", "[unused0]", "[unused1]", "...
llama_model_loader: - kv  17:                      tokenizer.ggml.scores arr[f32,30522]   = [-1000.000000, -1000.000000, -1000.00...
llama_model_loader: - kv  18:                  tokenizer.ggml.token_type arr[i32,30522]   = [3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  19:            tokenizer.ggml.unknown_token_id u32              = 100
llama_model_loader: - kv  20:          tokenizer.ggml.seperator_token_id u32              = 102
llama_model_loader: - kv  21:            tokenizer.ggml.padding_token_id u32              = 0
llama_model_loader: - kv  22:                tokenizer.ggml.cls_token_id u32              = 101
llama_model_loader: - kv  23:               tokenizer.ggml.mask_token_id u32              = 103
llama_model_loader: - type  f32:   51 tensors
llama_model_loader: - type  f16:   61 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = F16
print_info: file size   = 260.86 MiB (16.00 BPW) 
load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
load: printing all EOG tokens:
load:   - 102 ('[SEP]')
load: special tokens cache size = 5
load: token to piece cache size = 0.2032 MB
print_info: arch             = nomic-bert
print_info: vocab_only       = 1
print_info: model type       = ?B
print_info: model params     = 136.73 M
print_info: general.name     = nomic-embed-text-v1.5
print_info: vocab type       = WPM
print_info: n_vocab          = 30522
print_info: n_merges         = 0
print_info: BOS token        = 101 '[CLS]'
print_info: EOS token        = 102 '[SEP]'
print_info: UNK token        = 100 '[UNK]'
print_info: SEP token        = 102 '[SEP]'
print_info: PAD token        = 0 '[PAD]'
print_info: MASK token       = 103 '[MASK]'
print_info: LF token         = 0 '[PAD]'
print_info: EOG token        = 102 '[SEP]'
print_info: max token length = 21
llama_model_load: vocab only - skipping tensors
time=2025-11-12T08:54:46.144-03:00 level=WARN source=server.go:173 msg="requested context size too large for model" num_ctx=8192 n_ctx_train=2048
etc etc etc
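Editor's note: until the regression is resolved, requests can be kept inside the model's trained 2048-token window from the client side. The sketch below is an illustration, not a confirmed fix: `chunk_words` and `embed_payload` are hypothetical helpers, and the word-count cap is only a rough proxy for token count. It builds an `/api/embed` request body that caps `num_ctx` per request via the `options` field and asks the server to `truncate` anything that still overflows:

```python
import json

# Naive pre-chunker: splits on whitespace so each chunk stays well under the
# model's 2048-token training context. Word count only approximates token
# count, so the default margin is deliberately conservative.
def chunk_words(text, max_words=400):
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def embed_payload(chunks, model="nomic-embed-text"):
    # Per-request option asks the runner to stay at the trained window;
    # "truncate" tells the server to clip any chunk that still overflows.
    return json.dumps({
        "model": model,
        "input": chunks,
        "truncate": True,
        "options": {"num_ctx": 2048},
    })

# Example: POST this body to http://IP:11434/api/embed
body = embed_payload(chunk_words("some long document " * 1000))
```

If per-request options are ignored for embedding models, server-side alternatives may include a derived model whose Modelfile sets `PARAMETER num_ctx 2048`, or exporting `OLLAMA_CONTEXT_LENGTH=2048` before `ollama serve` (the log above shows it currently at 4096); whether either path avoids the crash on 0.12.10 is untested here.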
<!-- gh-comment-id:3521579335 --> @smileBeda commented on GitHub (Nov 12, 2025):

@rick-github, the `nomic-embed-text` model I use reports a window of `2048`, yet it only works if I pass at most 512. As for not respecting it: I get that Ollama throws an error (server side, not client side; the client just sees EOF), but that is not what I consider "respecting" the window size. Ollama itself tries to force an `8192` window, as shown in the opening post. The model's own report says `num_ctx 8192`, which seems to be what is used when loading the model. Also, have you noticed @aaronpliu's comment saying that not only does he see the same thing, he confirms it has been happening _since the last releases_, and that rolling back fixes it?

Full log (one of several requests hitting the server; some pass, some fail; this was with 2048-token chunks, and it works with 512-token chunks):

```
ollama serve
time=2025-11-12T08:54:22.205-03:00 level=INFO source=routes.go:1525 msg="server config" env="map[HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/Users/bedas/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false http_proxy: https_proxy: no_proxy:]"
time=2025-11-12T08:54:22.206-03:00 level=INFO source=images.go:522 msg="total blobs: 9"
time=2025-11-12T08:54:22.206-03:00 level=INFO source=images.go:529 msg="total unused blobs removed: 0"
time=2025-11-12T08:54:22.207-03:00 level=INFO source=routes.go:1578 msg="Listening on [::]:11434 (version 0.12.10)"
time=2025-11-12T08:54:22.207-03:00 level=INFO source=runner.go:67 msg="discovering available GPUs..."
time=2025-11-12T08:54:22.208-03:00 level=INFO source=server.go:400 msg="starting runner" cmd="/opt/homebrew/Cellar/ollama/0.12.10/bin/ollama runner --ollama-engine --port 54584"
time=2025-11-12T08:54:22.272-03:00 level=INFO source=types.go:42 msg="inference compute" id=0 filter_id=0 library=Metal compute=0.0 name=Metal description="Apple M2 Pro" libdirs="" driver=0.0 pci_id="" type=discrete total="11.8 GiB" available="11.8 GiB"
time=2025-11-12T08:54:22.272-03:00 level=INFO source=routes.go:1619 msg="entering low vram mode" "total vram"="11.8 GiB" threshold="20.0 GiB"
[GIN] 2025/11/12 - 08:54:45 | 200 |     4.001917ms |     192.168.1.3 | GET      "/api/tags"
ggml_metal_library_init: using embedded metal library
ggml_metal_library_init: loaded in 0.017 sec
ggml_metal_device_init: GPU name: Apple M2 Pro
ggml_metal_device_init: GPU family: MTLGPUFamilyApple8 (1008)
ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_device_init: GPU family: MTLGPUFamilyMetal4 (5002)
ggml_metal_device_init: simdgroup reduction = true
ggml_metal_device_init: simdgroup matrix mul. = true
ggml_metal_device_init: has unified memory = true
ggml_metal_device_init: has bfloat = true
ggml_metal_device_init: use residency sets = true
ggml_metal_device_init: use shared buffers = true
ggml_metal_device_init: recommendedMaxWorkingSetSize = 12713.12 MB
llama_model_load_from_file_impl: using device Metal (Apple M2 Pro) (unknown id) - 12123 MiB free
llama_model_loader: loaded meta data with 24 key-value pairs and 112 tensors from /Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = nomic-bert
llama_model_loader: - kv   1:                               general.name str              = nomic-embed-text-v1.5
llama_model_loader: - kv   2:                     nomic-bert.block_count u32              = 12
llama_model_loader: - kv   3:                  nomic-bert.context_length u32              = 2048
llama_model_loader: - kv   4:                nomic-bert.embedding_length u32              = 768
llama_model_loader: - kv   5:             nomic-bert.feed_forward_length u32              = 3072
llama_model_loader: - kv   6:            nomic-bert.attention.head_count u32              = 12
llama_model_loader: - kv   7:    nomic-bert.attention.layer_norm_epsilon f32              = 0.000000
llama_model_loader: - kv   8:                          general.file_type u32              = 1
llama_model_loader: - kv   9:                nomic-bert.attention.causal bool             = false
llama_model_loader: - kv  10:                    nomic-bert.pooling_type u32              = 1
llama_model_loader: - kv  11:                  nomic-bert.rope.freq_base f32              = 1000.000000
llama_model_loader: - kv  12:            tokenizer.ggml.token_type_count u32              = 2
llama_model_loader: - kv  13:                tokenizer.ggml.bos_token_id u32              = 101
llama_model_loader: - kv  14:                tokenizer.ggml.eos_token_id u32              = 102
llama_model_loader: - kv  15:                       tokenizer.ggml.model str              = bert
llama_model_loader: - kv  16:                      tokenizer.ggml.tokens arr[str,30522]   = ["[PAD]", "[unused0]", "[unused1]", "...
llama_model_loader: - kv  17:                      tokenizer.ggml.scores arr[f32,30522]   = [-1000.000000, -1000.000000, -1000.00...
llama_model_loader: - kv  18:                  tokenizer.ggml.token_type arr[i32,30522]   = [3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  19:            tokenizer.ggml.unknown_token_id u32              = 100
llama_model_loader: - kv  20:          tokenizer.ggml.seperator_token_id u32              = 102
llama_model_loader: - kv  21:            tokenizer.ggml.padding_token_id u32              = 0
llama_model_loader: - kv  22:                tokenizer.ggml.cls_token_id u32              = 101
llama_model_loader: - kv  23:               tokenizer.ggml.mask_token_id u32              = 103
llama_model_loader: - type  f32:   51 tensors
llama_model_loader: - type  f16:   61 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = F16
print_info: file size   = 260.86 MiB (16.00 BPW)
load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
load: printing all EOG tokens:
load:   - 102 ('[SEP]')
load: special tokens cache size = 5
load: token to piece cache size = 0.2032 MB
print_info: arch             = nomic-bert
print_info: vocab_only       = 1
print_info: model type       = ?B
print_info: model params     = 136.73 M
print_info: general.name     = nomic-embed-text-v1.5
print_info: vocab type       = WPM
print_info: n_vocab          = 30522
print_info: n_merges         = 0
print_info: BOS token        = 101 '[CLS]'
print_info: EOS token        = 102 '[SEP]'
print_info: UNK token        = 100 '[UNK]'
print_info: SEP token        = 102 '[SEP]'
print_info: PAD token        = 0 '[PAD]'
print_info: MASK token       = 103 '[MASK]'
print_info: LF token         = 0 '[PAD]'
print_info: EOG token        = 102 '[SEP]'
print_info: max token length = 21
llama_model_load: vocab only - skipping tensors
time=2025-11-12T08:54:45.493-03:00 level=WARN source=server.go:173 msg="requested context size too large for model" num_ctx=8192 n_ctx_train=2048
time=2025-11-12T08:54:45.494-03:00 level=INFO source=server.go:400 msg="starting runner" cmd="/opt/homebrew/Cellar/ollama/0.12.10/bin/ollama runner --model /Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 --port 54587"
time=2025-11-12T08:54:45.496-03:00 level=INFO source=server.go:470 msg="system memory" total="16.0 GiB" free="4.7 GiB" free_swap="0 B"
time=2025-11-12T08:54:45.496-03:00 level=INFO source=memory.go:37 msg="new model will fit in available VRAM across minimum required GPUs, loading" model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 library=Metal parallel=1 required="809.4 MiB" gpus=1
time=2025-11-12T08:54:45.496-03:00 level=INFO source=server.go:522 msg=offload library=Metal layers.requested=-1 layers.model=13 layers.offload=13 layers.split=[13] memory.available="[11.8 GiB]" memory.gpu_overhead="0 B" memory.required.full="809.4 MiB" memory.required.partial="809.4 MiB" memory.required.kv="6.0 MiB" memory.required.allocations="[809.4 MiB]" memory.weights.total="260.9 MiB" memory.weights.repeating="216.1 MiB" memory.weights.nonrepeating="44.7 MiB" memory.graph.full="12.0 MiB" memory.graph.partial="12.0 MiB"
time=2025-11-12T08:54:45.513-03:00 level=INFO source=runner.go:910 msg="starting go runner"
ggml_metal_library_init: using embedded metal library
ggml_metal_library_init: loaded in 0.006 sec
ggml_metal_device_init: GPU name: Apple M2 Pro
ggml_metal_device_init: GPU family: MTLGPUFamilyApple8 (1008)
ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_device_init: GPU family: MTLGPUFamilyMetal4 (5002)
ggml_metal_device_init: simdgroup reduction = true
ggml_metal_device_init: simdgroup matrix mul. = true
ggml_metal_device_init: has unified memory = true
ggml_metal_device_init: has bfloat = true
ggml_metal_device_init: use residency sets = true
ggml_metal_device_init: use shared buffers = true
ggml_metal_device_init: recommendedMaxWorkingSetSize = 12713.12 MB
time=2025-11-12T08:54:45.513-03:00 level=INFO source=ggml.go:104 msg=system Metal.0.EMBED_LIBRARY=1 CPU.0.NEON=1 CPU.0.ARM_FMA=1 CPU.0.FP16_VA=1 CPU.0.DOTPROD=1 CPU.0.LLAMAFILE=1 CPU.0.ACCELERATE=1 compiler=cgo(clang)
time=2025-11-12T08:54:45.537-03:00 level=INFO source=runner.go:946 msg="Server listening on 127.0.0.1:54587"
time=2025-11-12T08:54:45.542-03:00 level=INFO source=runner.go:845 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:false KvSize:2048 KvCacheType: NumThreads:6 GPULayers:13[ID:0 Layers:13(0..12)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:true}"
llama_model_load_from_file_impl: using device Metal (Apple M2 Pro) (unknown id) - 12123 MiB free
time=2025-11-12T08:54:45.542-03:00 level=INFO source=server.go:1251 msg="waiting for llama runner to start responding"
time=2025-11-12T08:54:45.542-03:00 level=INFO source=server.go:1285 msg="waiting for server to become available" status="llm server loading model"
llama_model_loader: loaded meta data with 24 key-value pairs and 112 tensors from /Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = nomic-bert
llama_model_loader: - kv   1:                               general.name str              = nomic-embed-text-v1.5
llama_model_loader: - kv   2:                     nomic-bert.block_count u32              = 12
llama_model_loader: - kv   3:                  nomic-bert.context_length u32              = 2048
llama_model_loader: - kv   4:                nomic-bert.embedding_length u32              = 768
llama_model_loader: - kv   5:             nomic-bert.feed_forward_length u32              = 3072
llama_model_loader: - kv   6:            nomic-bert.attention.head_count u32              = 12
llama_model_loader: - kv   7:    nomic-bert.attention.layer_norm_epsilon f32              = 0.000000
llama_model_loader: - kv   8:                          general.file_type u32              = 1
llama_model_loader: - kv   9:                nomic-bert.attention.causal bool             = false
llama_model_loader: - kv  10:                    nomic-bert.pooling_type u32              = 1
llama_model_loader: - kv  11:                  nomic-bert.rope.freq_base f32              = 1000.000000
llama_model_loader: - kv  12:            tokenizer.ggml.token_type_count u32              = 2
llama_model_loader: - kv  13:                tokenizer.ggml.bos_token_id u32              = 101
llama_model_loader: - kv  14:                tokenizer.ggml.eos_token_id u32              = 102
llama_model_loader: - kv  15:                       tokenizer.ggml.model str              = bert
llama_model_loader: - kv  16:                      tokenizer.ggml.tokens arr[str,30522]   = ["[PAD]", "[unused0]", "[unused1]", "...
llama_model_loader: - kv  17:                      tokenizer.ggml.scores arr[f32,30522]   = [-1000.000000, -1000.000000, -1000.00...
llama_model_loader: - kv  18:                  tokenizer.ggml.token_type arr[i32,30522]   = [3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  19:            tokenizer.ggml.unknown_token_id u32              = 100
llama_model_loader: - kv  20:          tokenizer.ggml.seperator_token_id u32              = 102
llama_model_loader: - kv  21:            tokenizer.ggml.padding_token_id u32              = 0
llama_model_loader: - kv  22:                tokenizer.ggml.cls_token_id u32              = 101
llama_model_loader: - kv  23:               tokenizer.ggml.mask_token_id u32              = 103
llama_model_loader: - type  f32:   51 tensors
llama_model_loader: - type  f16:   61 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = F16
print_info: file size   = 260.86 MiB (16.00 BPW)
load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
load: printing all EOG tokens:
load:   - 102 ('[SEP]')
load: special tokens cache size = 5
load: token to piece cache size = 0.2032 MB
print_info: arch             = nomic-bert
print_info: vocab_only       = 0
print_info: n_ctx_train      = 2048
print_info: n_embd           = 768
print_info: n_layer          = 12
print_info: n_head           = 12
print_info: n_head_kv        = 12
print_info: n_rot            = 64
print_info: n_swa            = 0
print_info: is_swa_any       = 0
print_info: n_embd_head_k    = 64
print_info: n_embd_head_v    = 64
print_info: n_gqa            = 1
print_info: n_embd_k_gqa     = 768
print_info: n_embd_v_gqa     = 768
print_info: f_norm_eps       = 1.0e-12
print_info: f_norm_rms_eps   = 0.0e+00
print_info: f_clamp_kqv      = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale    = 0.0e+00
print_info: f_attn_scale     = 0.0e+00
print_info: n_ff             = 3072
print_info: n_expert         = 0
print_info: n_expert_used    = 0
print_info: causal attn      = 0
print_info: pooling type     = 1
print_info: rope type        = 2
print_info: rope scaling     = linear
print_info: freq_base_train  = 1000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn  = 2048
print_info: rope_finetuned   = unknown
print_info: model type       = 137M
print_info: model params     = 136.73 M
print_info: general.name     = nomic-embed-text-v1.5
print_info: vocab type       = WPM
print_info: n_vocab          = 30522
print_info: n_merges         = 0
print_info: BOS token        = 101 '[CLS]'
print_info: EOS token        = 102 '[SEP]'
print_info: UNK token        = 100 '[UNK]'
print_info: SEP token        = 102 '[SEP]'
print_info: PAD token        = 0 '[PAD]'
print_info: MASK token       = 103 '[MASK]'
print_info: LF token         = 0 '[PAD]'
print_info: EOG token        = 102 '[SEP]'
print_info: max token length = 21
load_tensors: loading model tensors, this can take a while... (mmap = true)
load_tensors: offloading 12 repeating layers to GPU
load_tensors: offloading output layer to GPU
load_tensors: offloaded 13/13 layers to GPU
load_tensors: CPU_Mapped model buffer size = 44.72 MiB
load_tensors: Metal_Mapped model buffer size = 216.14 MiB
llama_init_from_model: model default pooling_type is [1], but [-1] was specified
llama_context: constructing llama_context
llama_context: n_seq_max     = 1
llama_context: n_ctx         = 2048
llama_context: n_ctx_per_seq = 2048
llama_context: n_batch       = 512
llama_context: n_ubatch      = 512
llama_context: causal_attn   = 0
llama_context: flash_attn    = disabled
llama_context: kv_unified    = false
llama_context: freq_base     = 1000.0
llama_context: freq_scale    = 1
ggml_metal_init: allocating
ggml_metal_init: picking default device: Apple M2 Pro
ggml_metal_init: use bfloat         = true
ggml_metal_init: use fusion         = true
ggml_metal_init: use concurrency    = true
ggml_metal_init: use graph optimize = true
llama_context: CPU output buffer size = 0.12 MiB
llama_context: Metal compute buffer size = 23.51 MiB
llama_context: CPU compute buffer size = 4.01 MiB
llama_context: graph nodes  = 371
llama_context: graph splits = 2
time=2025-11-12T08:54:45.793-03:00 level=INFO source=server.go:1289 msg="llama runner started in 0.30 seconds"
time=2025-11-12T08:54:45.793-03:00 level=INFO source=sched.go:500 msg="loaded runners" count=1
time=2025-11-12T08:54:45.793-03:00 level=INFO source=server.go:1251 msg="waiting for llama runner to start responding"
time=2025-11-12T08:54:45.794-03:00 level=INFO source=server.go:1289 msg="llama runner started in 0.30 seconds"
init: embeddings required but some input tokens were not marked as outputs -> overriding
output_reserve: reallocating output buffer from size 0.12 MiB to 59.08 MiB
[GIN] 2025/11/12 - 08:54:45 | 200 |  458.876625ms |     192.168.1.3 | POST     "/api/embed"
init: embeddings required but some input tokens were not marked as outputs -> overriding
[GIN] 2025/11/12 - 08:54:45 | 200 |   60.467708ms |     192.168.1.3 | POST     "/api/embed"
init: embeddings required but some input tokens were not marked as outputs -> overriding
[GIN] 2025/11/12 - 08:54:45 | 200 |   56.952416ms |     192.168.1.3 | POST     "/api/embed"
init: embeddings required but some input tokens were not marked as outputs -> overriding
[GIN] 2025/11/12 - 08:54:46 | 200 |   34.189041ms |     192.168.1.3 | POST     "/api/embed"
init: embeddings required but some input tokens were not marked as outputs -> overriding
output_reserve: reallocating output buffer from size 59.08 MiB to 61.11 MiB
init: embeddings required but some input tokens were not marked as outputs -> overriding
SIGTRAP: trace trap
PC=0x181ddab1c m=0 sigcode=0
signal arrived during cgo execution

goroutine 50 gp=0x14000504c40 m=0 mp=0x104444120 [syscall]:
runtime.cgocall(0x103444fbc, 0x14000711b88)
	runtime/cgocall.go:167 +0x44 fp=0x14000711b50 sp=0x14000711b10 pc=0x10294a684
github.com/ollama/ollama/llama._Cfunc_llama_decode(0x104e4f7e0, {0x200, 0x104e566e0, 0x0, 0xaca899000, 0xaca898800, 0xacb5b6800, 0x104e504e0})
	_cgo_gotypes.go:674 +0x30 fp=0x14000711b80 sp=0x14000711b50 pc=0x102c91790
github.com/ollama/ollama/llama.(*Context).Decode.func1(...)
	github.com/ollama/ollama/llama/llama.go:160
github.com/ollama/ollama/llama.(*Context).Decode(0x140003d4c08?, 0x0?)
	github.com/ollama/ollama/llama/llama.go:160 +0xcc fp=0x14000711c70 sp=0x14000711b80 pc=0x102c9394c
github.com/ollama/ollama/runner/llamarunner.(*Server).processBatch(0x14000412320, 0x1400004eb40, 0x14000711f18)
	github.com/ollama/ollama/runner/llamarunner/runner.go:468 +0x1d4 fp=0x14000711ed0 sp=0x14000711c70 pc=0x102d33774
github.com/ollama/ollama/runner/llamarunner.(*Server).run(0x14000412320, {0x103b4fcd0, 0x14000538960})
	github.com/ollama/ollama/runner/llamarunner/runner.go:361 +0x15c fp=0x14000711fa0 sp=0x14000711ed0 pc=0x102d3343c
github.com/ollama/ollama/runner/llamarunner.Execute.gowrap1()
	github.com/ollama/ollama/runner/llamarunner/runner.go:926 +0x2c fp=0x14000711fd0 sp=0x14000711fa0 pc=0x102d3724c
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x14000711fd0 sp=0x14000711fd0 pc=0x102955d04
created by github.com/ollama/ollama/runner/llamarunner.Execute in goroutine 1
	github.com/ollama/ollama/runner/llamarunner/runner.go:926 +0x418

goroutine 1 gp=0x140000021c0 m=nil [IO wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000231720 sp=0x14000231700 pc=0x10294da50
runtime.netpollblock(0x1400052f7b8?, 0x29cfa6c?, 0x1?)
	runtime/netpoll.go:575 +0x150 fp=0x14000231760 sp=0x14000231720 pc=0x102913620
internal/poll.runtime_pollWait(0x12fd60400, 0x72)
	runtime/netpoll.go:351 +0xa0 fp=0x14000231790 sp=0x14000231760 pc=0x10294cc80
internal/poll.(*pollDesc).wait(0x14000718100?, 0x1029d1afc?, 0x0)
	internal/poll/fd_poll_runtime.go:84 +0x28 fp=0x140002317c0 sp=0x14000231790 pc=0x1029cb568
internal/poll.(*pollDesc).waitRead(...)
	internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Accept(0x14000718100)
	internal/poll/fd_unix.go:613 +0x21c fp=0x14000231870 sp=0x140002317c0 pc=0x1029cfb4c
net.(*netFD).accept(0x14000718100)
	net/fd_unix.go:161 +0x28 fp=0x14000231930 sp=0x14000231870 pc=0x102a317f8
net.(*TCPListener).accept(0x140000f1780)
	net/tcpsock_posix.go:159 +0x24 fp=0x14000231980 sp=0x14000231930 pc=0x102a44e64
net.(*TCPListener).Accept(0x140000f1780)
	net/tcpsock.go:380 +0x2c fp=0x140002319c0 sp=0x14000231980 pc=0x102a43f0c
net/http.(*onceCloseListener).Accept(0x14000720090?)
	<autogenerated>:1 +0x2c fp=0x140002319e0 sp=0x140002319c0 pc=0x102c18a1c
net/http.(*Server).Serve(0x140001f4800, {0x103b4d668, 0x140000f1780})
	net/http/server.go:3463 +0x24c fp=0x14000231b10 sp=0x140002319e0 pc=0x102bf3bac
github.com/ollama/ollama/runner/llamarunner.Execute({0x140000341a0, 0x4, 0x4})
	github.com/ollama/ollama/runner/llamarunner/runner.go:947 +0x754 fp=0x14000231ce0 sp=0x14000231b10 pc=0x102d37064
github.com/ollama/ollama/runner.Execute({0x14000034190?, 0x0?, 0x0?})
	github.com/ollama/ollama/runner/runner.go:22 +0x128 fp=0x14000231d10 sp=0x14000231ce0 pc=0x102dae528
github.com/ollama/ollama/cmd.NewCLI.func2(0x1400051ee00?, {0x10369baf0?, 0x4?, 0x10369baf4?})
	github.com/ollama/ollama/cmd/cmd.go:1841 +0x50 fp=0x14000231d40 sp=0x14000231d10 pc=0x1033f5db0
github.com/spf13/cobra.(*Command).execute(0x140004ad508, {0x1400041bf40, 0x4, 0x4})
	github.com/spf13/cobra@v1.7.0/command.go:940 +0x5e0 fp=0x14000231e60 sp=0x14000231d40 pc=0x102a9bf60
github.com/spf13/cobra.(*Command).ExecuteC(0x1400055af08)
	github.com/spf13/cobra@v1.7.0/command.go:1068 +0x2ec fp=0x14000231f20 sp=0x14000231e60 pc=0x102a9c63c
github.com/spf13/cobra.(*Command).Execute(...)
	github.com/spf13/cobra@v1.7.0/command.go:992
github.com/spf13/cobra.(*Command).ExecuteContext(...)
github.com/spf13/cobra@v1.7.0/command.go:985 main.main() github.com/ollama/ollama/main.go:12 +0x54 fp=0x14000231f40 sp=0x14000231f20 pc=0x1033f68d4 runtime.main() runtime/proc.go:285 +0x278 fp=0x14000231fd0 sp=0x14000231f40 pc=0x10291a0d8 runtime.goexit({}) runtime/asm_arm64.s:1268 +0x4 fp=0x14000231fd0 sp=0x14000231fd0 pc=0x102955d04 goroutine 2 gp=0x14000002c40 m=nil [force gc (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:460 +0xc0 fp=0x1400006cf90 sp=0x1400006cf70 pc=0x10294da50 runtime.goparkunlock(...) runtime/proc.go:466 runtime.forcegchelper() runtime/proc.go:373 +0xb4 fp=0x1400006cfd0 sp=0x1400006cf90 pc=0x10291a424 runtime.goexit({}) runtime/asm_arm64.s:1268 +0x4 fp=0x1400006cfd0 sp=0x1400006cfd0 pc=0x102955d04 created by runtime.init.7 in goroutine 1 runtime/proc.go:361 +0x24 goroutine 18 gp=0x14000102380 m=nil [GC sweep wait]: runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:460 +0xc0 fp=0x14000068760 sp=0x14000068740 pc=0x10294da50 runtime.goparkunlock(...) runtime/proc.go:466 runtime.bgsweep(0x14000098000) runtime/mgcsweep.go:323 +0x104 fp=0x140000687b0 sp=0x14000068760 pc=0x102904ee4 runtime.gcenable.gowrap1() runtime/mgc.go:212 +0x28 fp=0x140000687d0 sp=0x140000687b0 pc=0x1028f8b38 runtime.goexit({}) runtime/asm_arm64.s:1268 +0x4 fp=0x140000687d0 sp=0x140000687d0 pc=0x102955d04 created by runtime.gcenable in goroutine 1 runtime/mgc.go:212 +0x6c goroutine 19 gp=0x14000102540 m=nil [GC scavenge wait]: runtime.gopark(0x10000?, 0x10385a590?, 0x0?, 0x0?, 0x0?) runtime/proc.go:460 +0xc0 fp=0x14000068f60 sp=0x14000068f40 pc=0x10294da50 runtime.goparkunlock(...) 
runtime/proc.go:466 runtime.(*scavengerState).park(0x104441860) runtime/mgcscavenge.go:425 +0x5c fp=0x14000068f90 sp=0x14000068f60 pc=0x1029029fc runtime.bgscavenge(0x14000098000) runtime/mgcscavenge.go:658 +0xac fp=0x14000068fb0 sp=0x14000068f90 pc=0x102902f9c runtime.gcenable.gowrap2() runtime/mgc.go:213 +0x28 fp=0x14000068fd0 sp=0x14000068fb0 pc=0x1028f8ad8 runtime.goexit({}) runtime/asm_arm64.s:1268 +0x4 fp=0x14000068fd0 sp=0x14000068fd0 pc=0x102955d04 created by runtime.gcenable in goroutine 1 runtime/mgc.go:213 +0xac goroutine 3 gp=0x14000003880 m=nil [finalizer wait]: runtime.gopark(0x0?, 0x103b39a18?, 0x20?, 0xa0?, 0x1000000010?) runtime/proc.go:460 +0xc0 fp=0x1400006c580 sp=0x1400006c560 pc=0x10294da50 runtime.runFinalizers() runtime/mfinal.go:210 +0x104 fp=0x1400006c7d0 sp=0x1400006c580 pc=0x1028f7b24 runtime.goexit({}) runtime/asm_arm64.s:1268 +0x4 fp=0x1400006c7d0 sp=0x1400006c7d0 pc=0x102955d04 created by runtime.createfing in goroutine 1 runtime/mfinal.go:172 +0x78 goroutine 4 gp=0x1400023a380 m=nil [cleanup wait]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:460 +0xc0 fp=0x1400006d740 sp=0x1400006d720 pc=0x10294da50 runtime.goparkunlock(...) runtime/proc.go:466 runtime.(*cleanupQueue).dequeue(0x104442740) runtime/mcleanup.go:439 +0x110 fp=0x1400006d780 sp=0x1400006d740 pc=0x1028f5010 runtime.runCleanups() runtime/mcleanup.go:635 +0x40 fp=0x1400006d7d0 sp=0x1400006d780 pc=0x1028f5820 runtime.goexit({}) runtime/asm_arm64.s:1268 +0x4 fp=0x1400006d7d0 sp=0x1400006d7d0 pc=0x102955d04 created by runtime.(*cleanupQueue).createGs in goroutine 1 runtime/mcleanup.go:589 +0x108 goroutine 5 gp=0x1400023a8c0 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) 
runtime/proc.go:460 +0xc0 fp=0x1400006df10 sp=0x1400006def0 pc=0x10294da50 runtime.gcBgMarkWorker(0x140000a36c0) runtime/mgc.go:1463 +0xe0 fp=0x1400006dfb0 sp=0x1400006df10 pc=0x1028fb1b0 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1373 +0x28 fp=0x1400006dfd0 sp=0x1400006dfb0 pc=0x1028fb098 runtime.goexit({}) runtime/asm_arm64.s:1268 +0x4 fp=0x1400006dfd0 sp=0x1400006dfd0 pc=0x102955d04 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1373 +0x140 goroutine 34 gp=0x14000504000 m=nil [GC worker (idle)]: runtime.gopark(0x104473920?, 0x3?, 0x95?, 0x72?, 0x0?) runtime/proc.go:460 +0xc0 fp=0x1400050a710 sp=0x1400050a6f0 pc=0x10294da50 runtime.gcBgMarkWorker(0x140000a36c0) runtime/mgc.go:1463 +0xe0 fp=0x1400050a7b0 sp=0x1400050a710 pc=0x1028fb1b0 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1373 +0x28 fp=0x1400050a7d0 sp=0x1400050a7b0 pc=0x1028fb098 runtime.goexit({}) runtime/asm_arm64.s:1268 +0x4 fp=0x1400050a7d0 sp=0x1400050a7d0 pc=0x102955d04 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1373 +0x140 goroutine 20 gp=0x14000102700 m=nil [GC worker (idle)]: runtime.gopark(0x576c9d4eb22c8?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:460 +0xc0 fp=0x14000069710 sp=0x140000696f0 pc=0x10294da50 runtime.gcBgMarkWorker(0x140000a36c0) runtime/mgc.go:1463 +0xe0 fp=0x140000697b0 sp=0x14000069710 pc=0x1028fb1b0 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1373 +0x28 fp=0x140000697d0 sp=0x140000697b0 pc=0x1028fb098 runtime.goexit({}) runtime/asm_arm64.s:1268 +0x4 fp=0x140000697d0 sp=0x140000697d0 pc=0x102955d04 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1373 +0x140 goroutine 21 gp=0x140001028c0 m=nil [GC worker (idle)]: runtime.gopark(0x576c9d4ebe835?, 0x0?, 0x0?, 0x0?, 0x0?) 
runtime/proc.go:460 +0xc0 fp=0x14000069f10 sp=0x14000069ef0 pc=0x10294da50 runtime.gcBgMarkWorker(0x140000a36c0) runtime/mgc.go:1463 +0xe0 fp=0x14000069fb0 sp=0x14000069f10 pc=0x1028fb1b0 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1373 +0x28 fp=0x14000069fd0 sp=0x14000069fb0 pc=0x1028fb098 runtime.goexit({}) runtime/asm_arm64.s:1268 +0x4 fp=0x14000069fd0 sp=0x14000069fd0 pc=0x102955d04 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1373 +0x140 goroutine 6 gp=0x1400023aa80 m=nil [GC worker (idle)]: runtime.gopark(0x576c9d4eba64a?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:460 +0xc0 fp=0x14000080f10 sp=0x14000080ef0 pc=0x10294da50 runtime.gcBgMarkWorker(0x140000a36c0) runtime/mgc.go:1463 +0xe0 fp=0x14000080fb0 sp=0x14000080f10 pc=0x1028fb1b0 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1373 +0x28 fp=0x14000080fd0 sp=0x14000080fb0 pc=0x1028fb098 runtime.goexit({}) runtime/asm_arm64.s:1268 +0x4 fp=0x14000080fd0 sp=0x14000080fd0 pc=0x102955d04 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1373 +0x140 goroutine 35 gp=0x140005041c0 m=nil [GC worker (idle)]: runtime.gopark(0x576c9d4ec447e?, 0x3?, 0x1a?, 0xc5?, 0x0?) runtime/proc.go:460 +0xc0 fp=0x1400050af10 sp=0x1400050aef0 pc=0x10294da50 runtime.gcBgMarkWorker(0x140000a36c0) runtime/mgc.go:1463 +0xe0 fp=0x1400050afb0 sp=0x1400050af10 pc=0x1028fb1b0 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1373 +0x28 fp=0x1400050afd0 sp=0x1400050afb0 pc=0x1028fb098 runtime.goexit({}) runtime/asm_arm64.s:1268 +0x4 fp=0x1400050afd0 sp=0x1400050afd0 pc=0x102955d04 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1373 +0x140 goroutine 22 gp=0x14000102a80 m=nil [GC worker (idle)]: runtime.gopark(0x576c9d4ebf370?, 0x3?, 0xf8?, 0x93?, 0x0?) 
runtime/proc.go:460 +0xc0 fp=0x1400006a710 sp=0x1400006a6f0 pc=0x10294da50 runtime.gcBgMarkWorker(0x140000a36c0) runtime/mgc.go:1463 +0xe0 fp=0x1400006a7b0 sp=0x1400006a710 pc=0x1028fb1b0 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1373 +0x28 fp=0x1400006a7d0 sp=0x1400006a7b0 pc=0x1028fb098 runtime.goexit({}) runtime/asm_arm64.s:1268 +0x4 fp=0x1400006a7d0 sp=0x1400006a7d0 pc=0x102955d04 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1373 +0x140 goroutine 23 gp=0x14000102c40 m=nil [GC worker (idle)]: runtime.gopark(0x576c9d4eb88ab?, 0x3?, 0x76?, 0xf8?, 0x0?) runtime/proc.go:460 +0xc0 fp=0x1400006af10 sp=0x1400006aef0 pc=0x10294da50 runtime.gcBgMarkWorker(0x140000a36c0) runtime/mgc.go:1463 +0xe0 fp=0x1400006afb0 sp=0x1400006af10 pc=0x1028fb1b0 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1373 +0x28 fp=0x1400006afd0 sp=0x1400006afb0 pc=0x1028fb098 runtime.goexit({}) runtime/asm_arm64.s:1268 +0x4 fp=0x1400006afd0 sp=0x1400006afd0 pc=0x102955d04 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1373 +0x140 goroutine 7 gp=0x1400023ac40 m=nil [GC worker (idle)]: runtime.gopark(0x576c9d4ebb51a?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:460 +0xc0 fp=0x14000713f10 sp=0x14000713ef0 pc=0x10294da50 runtime.gcBgMarkWorker(0x140000a36c0) runtime/mgc.go:1463 +0xe0 fp=0x14000713fb0 sp=0x14000713f10 pc=0x1028fb1b0 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1373 +0x28 fp=0x14000713fd0 sp=0x14000713fb0 pc=0x1028fb098 runtime.goexit({}) runtime/asm_arm64.s:1268 +0x4 fp=0x14000713fd0 sp=0x14000713fd0 pc=0x102955d04 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1373 +0x140 goroutine 36 gp=0x14000504700 m=nil [GC worker (idle)]: runtime.gopark(0x104473920?, 0x1?, 0x6e?, 0x3b?, 0x0?) 
runtime/proc.go:460 +0xc0 fp=0x14000710f10 sp=0x14000710ef0 pc=0x10294da50 runtime.gcBgMarkWorker(0x140000a36c0) runtime/mgc.go:1463 +0xe0 fp=0x14000710fb0 sp=0x14000710f10 pc=0x1028fb1b0 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1373 +0x28 fp=0x14000710fd0 sp=0x14000710fb0 pc=0x1028fb098 runtime.goexit({}) runtime/asm_arm64.s:1268 +0x4 fp=0x14000710fd0 sp=0x14000710fd0 pc=0x102955d04 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1373 +0x140 goroutine 51 gp=0x14000504e00 m=nil [chan receive]: runtime.gopark(0x14000049868?, 0x10292e594?, 0x98?, 0x98?, 0x10294f9fc?) runtime/proc.go:460 +0xc0 fp=0x14000049850 sp=0x14000049830 pc=0x10294da50 runtime.chanrecv(0x140003300e0, 0x14000049a40, 0x1) runtime/chan.go:667 +0x428 fp=0x140000498d0 sp=0x14000049850 pc=0x1028e8318 runtime.chanrecv1(0x140000f2030?, 0x14000590000?) runtime/chan.go:509 +0x14 fp=0x14000049900 sp=0x140000498d0 pc=0x1028e7eb4 github.com/ollama/ollama/runner/llamarunner.(*Server).embeddings(0x14000412320, {0x103b4d848, 0x1400024a0f0}, 0x140003783c0) github.com/ollama/ollama/runner/llamarunner/runner.go:758 +0x5a8 fp=0x14000049a90 sp=0x14000049900 pc=0x102d35728 github.com/ollama/ollama/runner/llamarunner.(*Server).embeddings-fm({0x103b4d848?, 0x1400024a0f0?}, 0x14000049b18?) <autogenerated>:1 +0x40 fp=0x14000049ac0 sp=0x14000049a90 pc=0x102d37570 net/http.HandlerFunc.ServeHTTP(0x140000e0000?, {0x103b4d848?, 0x1400024a0f0?}, 0x14000049b00?) net/http/server.go:2322 +0x38 fp=0x14000049af0 sp=0x14000049ac0 pc=0x102bf07e8 net/http.(*ServeMux).ServeHTTP(0x10?, {0x103b4d848, 0x1400024a0f0}, 0x140003783c0) net/http/server.go:2861 +0x190 fp=0x14000049b40 sp=0x14000049af0 pc=0x102bf2280 net/http.serverHandler.ServeHTTP({0x103b4a2b0?}, {0x103b4d848?, 0x1400024a0f0?}, 0x1?) 
net/http/server.go:3340 +0xb0 fp=0x14000049b70 sp=0x14000049b40 pc=0x102c0ca70 net/http.(*conn).serve(0x14000720090, {0x103b4fc98, 0x1400027af30}) net/http/server.go:2109 +0x528 fp=0x14000049fa0 sp=0x14000049b70 pc=0x102beebd8 net/http.(*Server).Serve.gowrap3() net/http/server.go:3493 +0x2c fp=0x14000049fd0 sp=0x14000049fa0 pc=0x102bf3f0c runtime.goexit({}) runtime/asm_arm64.s:1268 +0x4 fp=0x14000049fd0 sp=0x14000049fd0 pc=0x102955d04 created by net/http.(*Server).Serve in goroutine 1 net/http/server.go:3493 +0x384 goroutine 67 gp=0x140005056c0 m=nil [IO wait]: runtime.gopark(0xffffffffffffffff?, 0xffffffffffffffff?, 0x23?, 0x0?, 0x102971600?) runtime/proc.go:460 +0xc0 fp=0x14000506580 sp=0x14000506560 pc=0x10294da50 runtime.netpollblock(0x0?, 0x0?, 0x0?) runtime/netpoll.go:575 +0x150 fp=0x140005065c0 sp=0x14000506580 pc=0x102913620 internal/poll.runtime_pollWait(0x12fd60200, 0x72) runtime/netpoll.go:351 +0xa0 fp=0x140005065f0 sp=0x140005065c0 pc=0x10294cc80 internal/poll.(*pollDesc).wait(0x14000718180?, 0x140000f17e1?, 0x0) internal/poll/fd_poll_runtime.go:84 +0x28 fp=0x14000506620 sp=0x140005065f0 pc=0x1029cb568 internal/poll.(*pollDesc).waitRead(...) 
internal/poll/fd_poll_runtime.go:89 internal/poll.(*FD).Read(0x14000718180, {0x140000f17e1, 0x1, 0x1}) internal/poll/fd_unix.go:165 +0x1e0 fp=0x140005066c0 sp=0x14000506620 pc=0x1029cc780 net.(*netFD).Read(0x14000718180, {0x140000f17e1?, 0x10437d450?, 0x140000f1894?}) net/fd_posix.go:68 +0x28 fp=0x14000506710 sp=0x140005066c0 pc=0x102a2fff8 net.(*conn).Read(0x14000532110, {0x140000f17e1?, 0x0?, 0x0?}) net/net.go:196 +0x34 fp=0x14000506760 sp=0x14000506710 pc=0x102a3c5d4 net/http.(*connReader).backgroundRead(0x140000f17c0) net/http/server.go:702 +0x38 fp=0x140005067b0 sp=0x14000506760 pc=0x102be9c48 net/http.(*connReader).startBackgroundRead.gowrap2() net/http/server.go:698 +0x28 fp=0x140005067d0 sp=0x140005067b0 pc=0x102be9b38 runtime.goexit({}) runtime/asm_arm64.s:1268 +0x4 fp=0x140005067d0 sp=0x140005067d0 pc=0x102955d04 created by net/http.(*connReader).startBackgroundRead in goroutine 51 net/http/server.go:698 +0xb8 r0 0x1049f4000 r1 0x1049f7c60 r2 0x0 r3 0x1049fb000 r4 0xaca4a1800 r5 0x0 r6 0xffffffffbfc007ff r7 0xfffff0003ffff800 r8 0xaca4a1800 r9 0x0 r10 0x800380401980 r11 0xbfb99984c02334b7 r12 0x3f140930bf8574a0 r13 0x96c7612865f9de43 r14 0x104a68fb8 r15 0xaca4a0000 r16 0x2821b5e88 r17 0xffffffffb00007ff r18 0x0 r19 0xc00 r20 0xc0040d232c7 r21 0x0 r22 0x104e4f920 r23 0x0 r24 0x300 r25 0x300 r26 0x200 r27 0x104e4f920 r28 0x0 r29 0x16d52ad00 lr 0x181f6ca78 sp 0x16d52ac90 pc 0x181ddab1c fault 0x181ddab1c [GIN] 2025/11/12 - 08:54:46 | 500 | 64.535708ms | 192.168.1.3 | POST "/api/embed" llama_model_load_from_file_impl: using device Metal (Apple M2 Pro) (unknown id) - 12123 MiB free llama_model_loader: loaded meta data with 24 key-value pairs and 112 tensors from /Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 (version GGUF V3 (latest)) llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output. 
llama_model_loader: - kv 0: general.architecture str = nomic-bert llama_model_loader: - kv 1: general.name str = nomic-embed-text-v1.5 llama_model_loader: - kv 2: nomic-bert.block_count u32 = 12 llama_model_loader: - kv 3: nomic-bert.context_length u32 = 2048 llama_model_loader: - kv 4: nomic-bert.embedding_length u32 = 768 llama_model_loader: - kv 5: nomic-bert.feed_forward_length u32 = 3072 llama_model_loader: - kv 6: nomic-bert.attention.head_count u32 = 12 llama_model_loader: - kv 7: nomic-bert.attention.layer_norm_epsilon f32 = 0.000000 llama_model_loader: - kv 8: general.file_type u32 = 1 llama_model_loader: - kv 9: nomic-bert.attention.causal bool = false llama_model_loader: - kv 10: nomic-bert.pooling_type u32 = 1 llama_model_loader: - kv 11: nomic-bert.rope.freq_base f32 = 1000.000000 llama_model_loader: - kv 12: tokenizer.ggml.token_type_count u32 = 2 llama_model_loader: - kv 13: tokenizer.ggml.bos_token_id u32 = 101 llama_model_loader: - kv 14: tokenizer.ggml.eos_token_id u32 = 102 llama_model_loader: - kv 15: tokenizer.ggml.model str = bert llama_model_loader: - kv 16: tokenizer.ggml.tokens arr[str,30522] = ["[PAD]", "[unused0]", "[unused1]", "... llama_model_loader: - kv 17: tokenizer.ggml.scores arr[f32,30522] = [-1000.000000, -1000.000000, -1000.00... llama_model_loader: - kv 18: tokenizer.ggml.token_type arr[i32,30522] = [3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... 
llama_model_loader: - kv 19: tokenizer.ggml.unknown_token_id u32 = 100 llama_model_loader: - kv 20: tokenizer.ggml.seperator_token_id u32 = 102 llama_model_loader: - kv 21: tokenizer.ggml.padding_token_id u32 = 0 llama_model_loader: - kv 22: tokenizer.ggml.cls_token_id u32 = 101 llama_model_loader: - kv 23: tokenizer.ggml.mask_token_id u32 = 103 llama_model_loader: - type f32: 51 tensors llama_model_loader: - type f16: 61 tensors print_info: file format = GGUF V3 (latest) print_info: file type = F16 print_info: file size = 260.86 MiB (16.00 BPW) load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect load: printing all EOG tokens: load: - 102 ('[SEP]') load: special tokens cache size = 5 load: token to piece cache size = 0.2032 MB print_info: arch = nomic-bert print_info: vocab_only = 1 print_info: model type = ?B print_info: model params = 136.73 M print_info: general.name = nomic-embed-text-v1.5 print_info: vocab type = WPM print_info: n_vocab = 30522 print_info: n_merges = 0 print_info: BOS token = 101 '[CLS]' print_info: EOS token = 102 '[SEP]' print_info: UNK token = 100 '[UNK]' print_info: SEP token = 102 '[SEP]' print_info: PAD token = 0 '[PAD]' print_info: MASK token = 103 '[MASK]' print_info: LF token = 0 '[PAD]' print_info: EOG token = 102 '[SEP]' print_info: max token length = 21 llama_model_load: vocab only - skipping tensors time=2025-11-12T08:54:46.144-03:00 level=WARN source=server.go:173 msg="requested context size too large for model" num_ctx=8192 n_ctx_train=2048 etc etc etc ```
@rick-github commented on GitHub (Nov 12, 2025):

> @rick-github, the `nomic-embed-text` model I use reports a window of `2048`, yet, it only works if you pass Max 512. As for not respecting it, I get that Ollama throws an error (server side, not client side, client side it just says EOF) - but that is not what I consider "respecting" the window size. It (ollama itself) tries to force `8192` size, as shown in the opening post. The very report itself of the model says: `num_ctx 8192`, which is what seems to be used when loading the model.

ollama does not use a context size of 8192 when loading the model, that's why it emitted the `requested context size too large for model` message. Since the training size of the model (as shown in the output from `ollama show` and also indicated in the `too large` message) is 2048 tokens, that is what ollama used to load the model. This can be seen in the value for `n_ctx` in the log.

> Also, have you noticed @aaronpliu comment saying not only he sees the same but confirms its happening *since last releases*, and rolling back fixes it?

I'm not suggesting there's not a problem. That's why I asked for a full log. Thank you for supplying it.
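As a side note on the clamping behavior: rather than relying on the Modelfile default, a client can ask for a smaller window explicitly. A minimal sketch, assuming the `/api/embed` endpoint honors a per-request `options.num_ctx` as documented in the Ollama API reference (host and input strings here are placeholders):

```python
import json

# Build an /api/embed request body that caps the context window at the
# model's training size (2048) instead of the Modelfile default (8192).
payload = {
    "model": "nomic-embed-text",
    "input": ["first chunk of text", "second chunk of text"],
    "options": {"num_ctx": 2048},  # cap at n_ctx_train
}

body = json.dumps(payload)
print(body)
# Send with any HTTP client, e.g.:
#   curl http://localhost:11434/api/embed -d "$body"
```

Whether this avoids the crash is a separate question, since the server already clamps to 2048 on its own, but it at least silences the `requested context size too large` warning.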

@rick-github commented on GitHub (Nov 12, 2025):

I am unable to replicate this in a Linux system with 0.12.10. Please add `OLLAMA_DEBUG=1` to the server environment and post a log from a failure.

@liaoweiguo commented on GitHub (Nov 12, 2025):

This has existed for a long time; it works on NVIDIA but fails on Metal.

@liaoweiguo commented on GitHub (Nov 12, 2025):

seems no one really cares

@smileBeda commented on GitHub (Nov 12, 2025):

@rick-github
Yes, this is not Linux, as the title says.
Logs are included up to the first failure.

> Since the training size of the model (as shown in the output from `ollama show` and also indicated in the `too large` message) is 2048 tokens,

The reality looks different:

  • it fails with inputs above a 512-token size
  • model info says it uses num_ctx 8192, and that is what I see it trying to pass unless I craft my batches to be 512 tokens or smaller. I am fully aware the model does NOT support 8192 itself; however, it also does not seem to support the stated 2048
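Until the underlying bug is fixed, pre-splitting inputs on the client side so every chunk stays well below the ~512-token threshold observed above is a workable stopgap. A hypothetical helper (not part of Ollama; word count is only a rough proxy for WordPiece token count, so the margin is deliberately generous):

```python
def chunk_words(text: str, max_words: int = 350) -> list[str]:
    """Split text into word-based chunks of at most max_words words.

    Word count only approximates token count (WordPiece usually yields
    more tokens than words), so max_words is kept well under 512.
    """
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]

# Each resulting chunk can then be sent to /api/embed individually.
```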
time=2025-11-12T12:14:53.224-03:00 level=INFO source=routes.go:1525 msg="server config" env="map[HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:DEBUG OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/Users/bedas/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false http_proxy: https_proxy: no_proxy:]"
time=2025-11-12T12:14:53.225-03:00 level=INFO source=images.go:522 msg="total blobs: 9"
time=2025-11-12T12:14:53.225-03:00 level=INFO source=images.go:529 msg="total unused blobs removed: 0"
time=2025-11-12T12:14:53.225-03:00 level=INFO source=routes.go:1578 msg="Listening on [::]:11434 (version 0.12.10)"
time=2025-11-12T12:14:53.225-03:00 level=DEBUG source=sched.go:120 msg="starting llm scheduler"
time=2025-11-12T12:14:53.225-03:00 level=INFO source=runner.go:67 msg="discovering available GPUs..."
time=2025-11-12T12:14:53.226-03:00 level=INFO source=server.go:400 msg="starting runner" cmd="/opt/homebrew/Cellar/ollama/0.12.10/bin/ollama runner --ollama-engine --port 56899"
time=2025-11-12T12:14:53.226-03:00 level=DEBUG source=server.go:401 msg=subprocess PATH="/opt/homebrew/bin:/opt/homebrew/bin:/opt/homebrew/sbin:/Users/bedas/.atuin/bin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin://Applications/Topaz Gigapixel AI.app/Contents/Resources/bin:/Library/Apple/usr/bin:/usr/local/MacGPG2/bin:/Users/bedas/.composer/vendor/bin" OLLAMA_HOST=0.0.0.0 OLLAMA_DEBUG=1 DYLD_LIBRARY_PATH=/opt/homebrew/Cellar/ollama/0.12.10/bin OLLAMA_LIBRARY_PATH=/opt/homebrew/Cellar/ollama/0.12.10/bin
time=2025-11-12T12:14:53.277-03:00 level=DEBUG source=runner.go:415 msg="bootstrap discovery took" duration=51.694ms OLLAMA_LIBRARY_PATH=[/opt/homebrew/Cellar/ollama/0.12.10/bin] extra_envs=map[]
time=2025-11-12T12:14:53.277-03:00 level=DEBUG source=runner.go:113 msg="evluating which if any devices to filter out" initial_count=1
time=2025-11-12T12:14:53.277-03:00 level=DEBUG source=runner.go:172 msg="adjusting filtering IDs" FilterID=0 new_ID=0
time=2025-11-12T12:14:53.277-03:00 level=DEBUG source=runner.go:40 msg="GPU bootstrap discovery took" duration=51.869833ms
time=2025-11-12T12:14:53.277-03:00 level=INFO source=types.go:42 msg="inference compute" id=0 filter_id=0 library=Metal compute=0.0 name=Metal description="Apple M2 Pro" libdirs="" driver=0.0 pci_id="" type=discrete total="11.8 GiB" available="11.8 GiB"
time=2025-11-12T12:14:53.277-03:00 level=INFO source=routes.go:1619 msg="entering low vram mode" "total vram"="11.8 GiB" threshold="20.0 GiB"
[GIN] 2025/11/12 - 12:15:02 | 200 |    6.742583ms |     192.168.1.3 | GET      "/api/tags"
time=2025-11-12T12:15:03.072-03:00 level=DEBUG source=runner.go:40 msg="overall device VRAM discovery took" duration=2.583µs
time=2025-11-12T12:15:03.072-03:00 level=DEBUG source=sched.go:194 msg="updating default concurrency" OLLAMA_MAX_LOADED_MODELS=3 gpu_count=1
time=2025-11-12T12:15:03.076-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32
time=2025-11-12T12:15:03.076-03:00 level=DEBUG source=sched.go:211 msg="loading first model" model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6
ggml_metal_library_init: using embedded metal library
ggml_metal_library_init: loaded in 0.019 sec
ggml_metal_device_init: GPU name:   Apple M2 Pro
ggml_metal_device_init: GPU family: MTLGPUFamilyApple8  (1008)
ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_device_init: GPU family: MTLGPUFamilyMetal4  (5002)
ggml_metal_device_init: simdgroup reduction   = true
ggml_metal_device_init: simdgroup matrix mul. = true
ggml_metal_device_init: has unified memory    = true
ggml_metal_device_init: has bfloat            = true
ggml_metal_device_init: use residency sets    = true
ggml_metal_device_init: use shared buffers    = true
ggml_metal_device_init: recommendedMaxWorkingSetSize  = 12713.12 MB
llama_model_load_from_file_impl: using device Metal (Apple M2 Pro) (unknown id) - 12123 MiB free
llama_model_loader: loaded meta data with 24 key-value pairs and 112 tensors from /Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = nomic-bert
llama_model_loader: - kv   1:                               general.name str              = nomic-embed-text-v1.5
llama_model_loader: - kv   2:                     nomic-bert.block_count u32              = 12
llama_model_loader: - kv   3:                  nomic-bert.context_length u32              = 2048
llama_model_loader: - kv   4:                nomic-bert.embedding_length u32              = 768
llama_model_loader: - kv   5:             nomic-bert.feed_forward_length u32              = 3072
llama_model_loader: - kv   6:            nomic-bert.attention.head_count u32              = 12
llama_model_loader: - kv   7:    nomic-bert.attention.layer_norm_epsilon f32              = 0.000000
llama_model_loader: - kv   8:                          general.file_type u32              = 1
llama_model_loader: - kv   9:                nomic-bert.attention.causal bool             = false
llama_model_loader: - kv  10:                    nomic-bert.pooling_type u32              = 1
llama_model_loader: - kv  11:                  nomic-bert.rope.freq_base f32              = 1000.000000
llama_model_loader: - kv  12:            tokenizer.ggml.token_type_count u32              = 2
llama_model_loader: - kv  13:                tokenizer.ggml.bos_token_id u32              = 101
llama_model_loader: - kv  14:                tokenizer.ggml.eos_token_id u32              = 102
llama_model_loader: - kv  15:                       tokenizer.ggml.model str              = bert
llama_model_loader: - kv  16:                      tokenizer.ggml.tokens arr[str,30522]   = ["[PAD]", "[unused0]", "[unused1]", "...
llama_model_loader: - kv  17:                      tokenizer.ggml.scores arr[f32,30522]   = [-1000.000000, -1000.000000, -1000.00...
llama_model_loader: - kv  18:                  tokenizer.ggml.token_type arr[i32,30522]   = [3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  19:            tokenizer.ggml.unknown_token_id u32              = 100
llama_model_loader: - kv  20:          tokenizer.ggml.seperator_token_id u32              = 102
llama_model_loader: - kv  21:            tokenizer.ggml.padding_token_id u32              = 0
llama_model_loader: - kv  22:                tokenizer.ggml.cls_token_id u32              = 101
llama_model_loader: - kv  23:               tokenizer.ggml.mask_token_id u32              = 103
llama_model_loader: - type  f32:   51 tensors
llama_model_loader: - type  f16:   61 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = F16
print_info: file size   = 260.86 MiB (16.00 BPW) 
init_tokenizer: initializing tokenizer for type 3
load: control token:    100 '[UNK]' is not marked as EOG
load: control token:    101 '[CLS]' is not marked as EOG
load: control token:      0 '[PAD]' is not marked as EOG
load: control token:    102 '[SEP]' is not marked as EOG
load: control token:    103 '[MASK]' is not marked as EOG
load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
load: printing all EOG tokens:
load:   - 102 ('[SEP]')
load: special tokens cache size = 5
load: token to piece cache size = 0.2032 MB
print_info: arch             = nomic-bert
print_info: vocab_only       = 1
print_info: model type       = ?B
print_info: model params     = 136.73 M
print_info: general.name     = nomic-embed-text-v1.5
print_info: vocab type       = WPM
print_info: n_vocab          = 30522
print_info: n_merges         = 0
print_info: BOS token        = 101 '[CLS]'
print_info: EOS token        = 102 '[SEP]'
print_info: UNK token        = 100 '[UNK]'
print_info: SEP token        = 102 '[SEP]'
print_info: PAD token        = 0 '[PAD]'
print_info: MASK token       = 103 '[MASK]'
print_info: LF token         = 0 '[PAD]'
print_info: EOG token        = 102 '[SEP]'
print_info: max token length = 21
llama_model_load: vocab only - skipping tensors
time=2025-11-12T12:15:03.144-03:00 level=WARN source=server.go:173 msg="requested context size too large for model" num_ctx=8192 n_ctx_train=2048
time=2025-11-12T12:15:03.144-03:00 level=INFO source=server.go:400 msg="starting runner" cmd="/opt/homebrew/Cellar/ollama/0.12.10/bin/ollama runner --model /Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 --port 56911"
time=2025-11-12T12:15:03.144-03:00 level=DEBUG source=server.go:401 msg=subprocess PATH="/opt/homebrew/bin:/opt/homebrew/bin:/opt/homebrew/sbin:/Users/bedas/.atuin/bin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin://Applications/Topaz Gigapixel AI.app/Contents/Resources/bin:/Library/Apple/usr/bin:/usr/local/MacGPG2/bin:/Users/bedas/.composer/vendor/bin" OLLAMA_HOST=0.0.0.0 OLLAMA_DEBUG=1 DYLD_LIBRARY_PATH=/opt/homebrew/Cellar/ollama/0.12.10/bin OLLAMA_LIBRARY_PATH=/opt/homebrew/Cellar/ollama/0.12.10/bin
time=2025-11-12T12:15:03.147-03:00 level=INFO source=server.go:470 msg="system memory" total="16.0 GiB" free="4.6 GiB" free_swap="0 B"
time=2025-11-12T12:15:03.147-03:00 level=DEBUG source=memory.go:198 msg=evaluating library=Metal gpu_count=1 available="[11.8 GiB]"
time=2025-11-12T12:15:03.147-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.vision.block_count default=0
time=2025-11-12T12:15:03.147-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default=0
time=2025-11-12T12:15:03.147-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}"
time=2025-11-12T12:15:03.147-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}"
time=2025-11-12T12:15:03.147-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default=0
time=2025-11-12T12:15:03.147-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}"
time=2025-11-12T12:15:03.147-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}"
time=2025-11-12T12:15:03.147-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.key_length default=64
time=2025-11-12T12:15:03.147-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.value_length default=64
time=2025-11-12T12:15:03.147-03:00 level=DEBUG source=ggml.go:611 msg="default cache size estimate" "attention MiB"=6 "attention bytes"=6291456 "recurrent MiB"=0 "recurrent bytes"=0
time=2025-11-12T12:15:03.147-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default=0
time=2025-11-12T12:15:03.147-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}"
time=2025-11-12T12:15:03.147-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}"
time=2025-11-12T12:15:03.147-03:00 level=INFO source=memory.go:37 msg="new model will fit in available VRAM across minimum required GPUs, loading" model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 library=Metal parallel=1 required="809.4 MiB" gpus=1
time=2025-11-12T12:15:03.147-03:00 level=DEBUG source=memory.go:198 msg=evaluating library=Metal gpu_count=1 available="[11.8 GiB]"
time=2025-11-12T12:15:03.147-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.vision.block_count default=0
time=2025-11-12T12:15:03.147-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default=0
time=2025-11-12T12:15:03.147-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}"
time=2025-11-12T12:15:03.147-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}"
time=2025-11-12T12:15:03.147-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default=0
time=2025-11-12T12:15:03.147-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}"
time=2025-11-12T12:15:03.147-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}"
time=2025-11-12T12:15:03.147-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.key_length default=64
time=2025-11-12T12:15:03.147-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.value_length default=64
time=2025-11-12T12:15:03.147-03:00 level=DEBUG source=ggml.go:611 msg="default cache size estimate" "attention MiB"=6 "attention bytes"=6291456 "recurrent MiB"=0 "recurrent bytes"=0
time=2025-11-12T12:15:03.147-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default=0
time=2025-11-12T12:15:03.147-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}"
time=2025-11-12T12:15:03.147-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}"
time=2025-11-12T12:15:03.147-03:00 level=INFO source=server.go:522 msg=offload library=Metal layers.requested=-1 layers.model=13 layers.offload=13 layers.split=[13] memory.available="[11.8 GiB]" memory.gpu_overhead="0 B" memory.required.full="809.4 MiB" memory.required.partial="809.4 MiB" memory.required.kv="6.0 MiB" memory.required.allocations="[809.4 MiB]" memory.weights.total="260.9 MiB" memory.weights.repeating="216.1 MiB" memory.weights.nonrepeating="44.7 MiB" memory.graph.full="12.0 MiB" memory.graph.partial="12.0 MiB"
time=2025-11-12T12:15:03.162-03:00 level=INFO source=runner.go:910 msg="starting go runner"
time=2025-11-12T12:15:03.162-03:00 level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/opt/homebrew/Cellar/ollama/0.12.10/bin
ggml_metal_library_init: using embedded metal library
ggml_metal_library_init: loaded in 0.018 sec
ggml_metal_device_init: GPU name:   Apple M2 Pro
ggml_metal_device_init: GPU family: MTLGPUFamilyApple8  (1008)
ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_device_init: GPU family: MTLGPUFamilyMetal4  (5002)
ggml_metal_device_init: simdgroup reduction   = true
ggml_metal_device_init: simdgroup matrix mul. = true
ggml_metal_device_init: has unified memory    = true
ggml_metal_device_init: has bfloat            = true
ggml_metal_device_init: use residency sets    = true
ggml_metal_device_init: use shared buffers    = true
ggml_metal_device_init: recommendedMaxWorkingSetSize  = 12713.12 MB
time=2025-11-12T12:15:03.162-03:00 level=INFO source=ggml.go:104 msg=system Metal.0.EMBED_LIBRARY=1 CPU.0.NEON=1 CPU.0.ARM_FMA=1 CPU.0.FP16_VA=1 CPU.0.DOTPROD=1 CPU.0.LLAMAFILE=1 CPU.0.ACCELERATE=1 compiler=cgo(clang)
time=2025-11-12T12:15:03.200-03:00 level=INFO source=runner.go:946 msg="Server listening on 127.0.0.1:56911"
time=2025-11-12T12:15:03.202-03:00 level=INFO source=runner.go:845 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:false KvSize:2048 KvCacheType: NumThreads:6 GPULayers:13[ID:0 Layers:13(0..12)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:true}"
llama_model_load_from_file_impl: using device Metal (Apple M2 Pro) (unknown id) - 12123 MiB free
time=2025-11-12T12:15:03.203-03:00 level=INFO source=server.go:1251 msg="waiting for llama runner to start responding"
time=2025-11-12T12:15:03.203-03:00 level=INFO source=server.go:1285 msg="waiting for server to become available" status="llm server loading model"
llama_model_loader: loaded meta data with 24 key-value pairs and 112 tensors from /Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = nomic-bert
llama_model_loader: - kv   1:                               general.name str              = nomic-embed-text-v1.5
llama_model_loader: - kv   2:                     nomic-bert.block_count u32              = 12
llama_model_loader: - kv   3:                  nomic-bert.context_length u32              = 2048
llama_model_loader: - kv   4:                nomic-bert.embedding_length u32              = 768
llama_model_loader: - kv   5:             nomic-bert.feed_forward_length u32              = 3072
llama_model_loader: - kv   6:            nomic-bert.attention.head_count u32              = 12
llama_model_loader: - kv   7:    nomic-bert.attention.layer_norm_epsilon f32              = 0.000000
llama_model_loader: - kv   8:                          general.file_type u32              = 1
llama_model_loader: - kv   9:                nomic-bert.attention.causal bool             = false
llama_model_loader: - kv  10:                    nomic-bert.pooling_type u32              = 1
llama_model_loader: - kv  11:                  nomic-bert.rope.freq_base f32              = 1000.000000
llama_model_loader: - kv  12:            tokenizer.ggml.token_type_count u32              = 2
llama_model_loader: - kv  13:                tokenizer.ggml.bos_token_id u32              = 101
llama_model_loader: - kv  14:                tokenizer.ggml.eos_token_id u32              = 102
llama_model_loader: - kv  15:                       tokenizer.ggml.model str              = bert
llama_model_loader: - kv  16:                      tokenizer.ggml.tokens arr[str,30522]   = ["[PAD]", "[unused0]", "[unused1]", "...
llama_model_loader: - kv  17:                      tokenizer.ggml.scores arr[f32,30522]   = [-1000.000000, -1000.000000, -1000.00...
llama_model_loader: - kv  18:                  tokenizer.ggml.token_type arr[i32,30522]   = [3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  19:            tokenizer.ggml.unknown_token_id u32              = 100
llama_model_loader: - kv  20:          tokenizer.ggml.seperator_token_id u32              = 102
llama_model_loader: - kv  21:            tokenizer.ggml.padding_token_id u32              = 0
llama_model_loader: - kv  22:                tokenizer.ggml.cls_token_id u32              = 101
llama_model_loader: - kv  23:               tokenizer.ggml.mask_token_id u32              = 103
llama_model_loader: - type  f32:   51 tensors
llama_model_loader: - type  f16:   61 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = F16
print_info: file size   = 260.86 MiB (16.00 BPW) 
init_tokenizer: initializing tokenizer for type 3
load: control token:    100 '[UNK]' is not marked as EOG
load: control token:    101 '[CLS]' is not marked as EOG
load: control token:      0 '[PAD]' is not marked as EOG
load: control token:    102 '[SEP]' is not marked as EOG
load: control token:    103 '[MASK]' is not marked as EOG
load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
load: printing all EOG tokens:
load:   - 102 ('[SEP]')
load: special tokens cache size = 5
load: token to piece cache size = 0.2032 MB
print_info: arch             = nomic-bert
print_info: vocab_only       = 0
print_info: n_ctx_train      = 2048
print_info: n_embd           = 768
print_info: n_layer          = 12
print_info: n_head           = 12
print_info: n_head_kv        = 12
print_info: n_rot            = 64
print_info: n_swa            = 0
print_info: is_swa_any       = 0
print_info: n_embd_head_k    = 64
print_info: n_embd_head_v    = 64
print_info: n_gqa            = 1
print_info: n_embd_k_gqa     = 768
print_info: n_embd_v_gqa     = 768
print_info: f_norm_eps       = 1.0e-12
print_info: f_norm_rms_eps   = 0.0e+00
print_info: f_clamp_kqv      = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale    = 0.0e+00
print_info: f_attn_scale     = 0.0e+00
print_info: n_ff             = 3072
print_info: n_expert         = 0
print_info: n_expert_used    = 0
print_info: causal attn      = 0
print_info: pooling type     = 1
print_info: rope type        = 2
print_info: rope scaling     = linear
print_info: freq_base_train  = 1000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn  = 2048
print_info: rope_finetuned   = unknown
print_info: model type       = 137M
print_info: model params     = 136.73 M
print_info: general.name     = nomic-embed-text-v1.5
print_info: vocab type       = WPM
print_info: n_vocab          = 30522
print_info: n_merges         = 0
print_info: BOS token        = 101 '[CLS]'
print_info: EOS token        = 102 '[SEP]'
print_info: UNK token        = 100 '[UNK]'
print_info: SEP token        = 102 '[SEP]'
print_info: PAD token        = 0 '[PAD]'
print_info: MASK token       = 103 '[MASK]'
print_info: LF token         = 0 '[PAD]'
print_info: EOG token        = 102 '[SEP]'
print_info: max token length = 21
load_tensors: loading model tensors, this can take a while... (mmap = true)
load_tensors: layer   0 assigned to device Metal, is_swa = 0
load_tensors: layer   1 assigned to device Metal, is_swa = 0
load_tensors: layer   2 assigned to device Metal, is_swa = 0
load_tensors: layer   3 assigned to device Metal, is_swa = 0
load_tensors: layer   4 assigned to device Metal, is_swa = 0
load_tensors: layer   5 assigned to device Metal, is_swa = 0
load_tensors: layer   6 assigned to device Metal, is_swa = 0
load_tensors: layer   7 assigned to device Metal, is_swa = 0
load_tensors: layer   8 assigned to device Metal, is_swa = 0
load_tensors: layer   9 assigned to device Metal, is_swa = 0
load_tensors: layer  10 assigned to device Metal, is_swa = 0
load_tensors: layer  11 assigned to device Metal, is_swa = 0
load_tensors: layer  12 assigned to device Metal, is_swa = 0
create_tensor: loading tensor token_embd.weight
create_tensor: loading tensor token_types.weight
create_tensor: loading tensor token_embd_norm.weight
create_tensor: loading tensor token_embd_norm.bias
create_tensor: loading tensor blk.0.attn_qkv.weight
create_tensor: loading tensor blk.0.attn_output.weight
create_tensor: loading tensor blk.0.attn_output_norm.weight
create_tensor: loading tensor blk.0.attn_output_norm.bias
create_tensor: loading tensor blk.0.ffn_up.weight
create_tensor: loading tensor blk.0.ffn_down.weight
create_tensor: loading tensor blk.0.ffn_gate.weight
create_tensor: loading tensor blk.0.layer_output_norm.weight
create_tensor: loading tensor blk.0.layer_output_norm.bias
create_tensor: loading tensor blk.1.attn_qkv.weight
create_tensor: loading tensor blk.1.attn_output.weight
create_tensor: loading tensor blk.1.attn_output_norm.weight
create_tensor: loading tensor blk.1.attn_output_norm.bias
create_tensor: loading tensor blk.1.ffn_up.weight
create_tensor: loading tensor blk.1.ffn_down.weight
create_tensor: loading tensor blk.1.ffn_gate.weight
create_tensor: loading tensor blk.1.layer_output_norm.weight
create_tensor: loading tensor blk.1.layer_output_norm.bias
create_tensor: loading tensor blk.2.attn_qkv.weight
create_tensor: loading tensor blk.2.attn_output.weight
create_tensor: loading tensor blk.2.attn_output_norm.weight
create_tensor: loading tensor blk.2.attn_output_norm.bias
create_tensor: loading tensor blk.2.ffn_up.weight
create_tensor: loading tensor blk.2.ffn_down.weight
create_tensor: loading tensor blk.2.ffn_gate.weight
create_tensor: loading tensor blk.2.layer_output_norm.weight
create_tensor: loading tensor blk.2.layer_output_norm.bias
create_tensor: loading tensor blk.3.attn_qkv.weight
create_tensor: loading tensor blk.3.attn_output.weight
create_tensor: loading tensor blk.3.attn_output_norm.weight
create_tensor: loading tensor blk.3.attn_output_norm.bias
create_tensor: loading tensor blk.3.ffn_up.weight
create_tensor: loading tensor blk.3.ffn_down.weight
create_tensor: loading tensor blk.3.ffn_gate.weight
create_tensor: loading tensor blk.3.layer_output_norm.weight
create_tensor: loading tensor blk.3.layer_output_norm.bias
create_tensor: loading tensor blk.4.attn_qkv.weight
create_tensor: loading tensor blk.4.attn_output.weight
create_tensor: loading tensor blk.4.attn_output_norm.weight
create_tensor: loading tensor blk.4.attn_output_norm.bias
create_tensor: loading tensor blk.4.ffn_up.weight
create_tensor: loading tensor blk.4.ffn_down.weight
create_tensor: loading tensor blk.4.ffn_gate.weight
create_tensor: loading tensor blk.4.layer_output_norm.weight
create_tensor: loading tensor blk.4.layer_output_norm.bias
create_tensor: loading tensor blk.5.attn_qkv.weight
create_tensor: loading tensor blk.5.attn_output.weight
create_tensor: loading tensor blk.5.attn_output_norm.weight
create_tensor: loading tensor blk.5.attn_output_norm.bias
create_tensor: loading tensor blk.5.ffn_up.weight
create_tensor: loading tensor blk.5.ffn_down.weight
create_tensor: loading tensor blk.5.ffn_gate.weight
create_tensor: loading tensor blk.5.layer_output_norm.weight
create_tensor: loading tensor blk.5.layer_output_norm.bias
create_tensor: loading tensor blk.6.attn_qkv.weight
create_tensor: loading tensor blk.6.attn_output.weight
create_tensor: loading tensor blk.6.attn_output_norm.weight
create_tensor: loading tensor blk.6.attn_output_norm.bias
create_tensor: loading tensor blk.6.ffn_up.weight
create_tensor: loading tensor blk.6.ffn_down.weight
create_tensor: loading tensor blk.6.ffn_gate.weight
create_tensor: loading tensor blk.6.layer_output_norm.weight
create_tensor: loading tensor blk.6.layer_output_norm.bias
create_tensor: loading tensor blk.7.attn_qkv.weight
create_tensor: loading tensor blk.7.attn_output.weight
create_tensor: loading tensor blk.7.attn_output_norm.weight
create_tensor: loading tensor blk.7.attn_output_norm.bias
create_tensor: loading tensor blk.7.ffn_up.weight
create_tensor: loading tensor blk.7.ffn_down.weight
create_tensor: loading tensor blk.7.ffn_gate.weight
create_tensor: loading tensor blk.7.layer_output_norm.weight
create_tensor: loading tensor blk.7.layer_output_norm.bias
create_tensor: loading tensor blk.8.attn_qkv.weight
create_tensor: loading tensor blk.8.attn_output.weight
create_tensor: loading tensor blk.8.attn_output_norm.weight
create_tensor: loading tensor blk.8.attn_output_norm.bias
create_tensor: loading tensor blk.8.ffn_up.weight
create_tensor: loading tensor blk.8.ffn_down.weight
create_tensor: loading tensor blk.8.ffn_gate.weight
create_tensor: loading tensor blk.8.layer_output_norm.weight
create_tensor: loading tensor blk.8.layer_output_norm.bias
create_tensor: loading tensor blk.9.attn_qkv.weight
create_tensor: loading tensor blk.9.attn_output.weight
create_tensor: loading tensor blk.9.attn_output_norm.weight
create_tensor: loading tensor blk.9.attn_output_norm.bias
create_tensor: loading tensor blk.9.ffn_up.weight
create_tensor: loading tensor blk.9.ffn_down.weight
create_tensor: loading tensor blk.9.ffn_gate.weight
create_tensor: loading tensor blk.9.layer_output_norm.weight
create_tensor: loading tensor blk.9.layer_output_norm.bias
create_tensor: loading tensor blk.10.attn_qkv.weight
create_tensor: loading tensor blk.10.attn_output.weight
create_tensor: loading tensor blk.10.attn_output_norm.weight
create_tensor: loading tensor blk.10.attn_output_norm.bias
create_tensor: loading tensor blk.10.ffn_up.weight
create_tensor: loading tensor blk.10.ffn_down.weight
create_tensor: loading tensor blk.10.ffn_gate.weight
create_tensor: loading tensor blk.10.layer_output_norm.weight
create_tensor: loading tensor blk.10.layer_output_norm.bias
create_tensor: loading tensor blk.11.attn_qkv.weight
create_tensor: loading tensor blk.11.attn_output.weight
create_tensor: loading tensor blk.11.attn_output_norm.weight
create_tensor: loading tensor blk.11.attn_output_norm.bias
create_tensor: loading tensor blk.11.ffn_up.weight
create_tensor: loading tensor blk.11.ffn_down.weight
create_tensor: loading tensor blk.11.ffn_gate.weight
create_tensor: loading tensor blk.11.layer_output_norm.weight
create_tensor: loading tensor blk.11.layer_output_norm.bias
load_tensors: offloading 12 repeating layers to GPU
load_tensors: offloading output layer to GPU
load_tensors: offloaded 13/13 layers to GPU
load_tensors:   CPU_Mapped model buffer size =    44.72 MiB
load_tensors: Metal_Mapped model buffer size =   216.14 MiB
llama_init_from_model: model default pooling_type is [1], but [-1] was specified
llama_context: constructing llama_context
llama_context: n_seq_max     = 1
llama_context: n_ctx         = 2048
llama_context: n_ctx_per_seq = 2048
llama_context: n_batch       = 512
llama_context: n_ubatch      = 512
llama_context: causal_attn   = 0
llama_context: flash_attn    = disabled
llama_context: kv_unified    = false
llama_context: freq_base     = 1000.0
llama_context: freq_scale    = 1
ggml_metal_init: allocating
ggml_metal_init: picking default device: Apple M2 Pro
ggml_metal_init: use bfloat         = true
ggml_metal_init: use fusion         = true
ggml_metal_init: use concurrency    = true
ggml_metal_init: use graph optimize = true
set_abort_callback: call
llama_context:        CPU  output buffer size =     0.12 MiB
llama_context: enumerating backends
llama_context: backend_ptrs.size() = 3
llama_context: max_nodes = 1024
llama_context: worst-case: n_tokens = 512, n_seqs = 1, n_outputs = 1
graph_reserve: reserving a graph for ubatch with n_tokens =  512, n_seqs =  1, n_outputs =  512
graph_reserve: reserving a graph for ubatch with n_tokens =    1, n_seqs =  1, n_outputs =    1
graph_reserve: reserving a graph for ubatch with n_tokens =  512, n_seqs =  1, n_outputs =  512
llama_context:      Metal compute buffer size =    23.51 MiB
llama_context:        CPU compute buffer size =     4.01 MiB
llama_context: graph nodes  = 371
llama_context: graph splits = 2
time=2025-11-12T12:15:03.454-03:00 level=INFO source=server.go:1289 msg="llama runner started in 0.31 seconds"
time=2025-11-12T12:15:03.454-03:00 level=INFO source=sched.go:500 msg="loaded runners" count=1
time=2025-11-12T12:15:03.454-03:00 level=INFO source=server.go:1251 msg="waiting for llama runner to start responding"
time=2025-11-12T12:15:03.454-03:00 level=INFO source=server.go:1289 msg="llama runner started in 0.31 seconds"
time=2025-11-12T12:15:03.454-03:00 level=DEBUG source=sched.go:512 msg="finished setting up" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=99032 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192
time=2025-11-12T12:15:03.457-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32
time=2025-11-12T12:15:03.462-03:00 level=DEBUG source=cache.go:104 msg="loading cache slot" id=0 cache=0 prompt=495 used=0 remaining=495
decode: cannot decode batches with this context (calling encode() instead)
init: embeddings required but some input tokens were not marked as outputs -> overriding
output_reserve: reallocating output buffer from size 0.12 MiB to 59.08 MiB
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_mul_mm_f16_f32', name = 'kernel_mul_mm_f16_f32_bci=0_bco=1'
ggml_metal_library_compile_pipeline: loaded kernel_mul_mm_f16_f32_bci=0_bco=1             0x104bfc720 | th_max =  896 | th_width =   32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_rope_neox_f32', name = 'kernel_rope_neox_f32'
ggml_metal_library_compile_pipeline: loaded kernel_rope_neox_f32                          0x104bf8740 | th_max = 1024 | th_width =   32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_cpy_f32_f32', name = 'kernel_cpy_f32_f32'
ggml_metal_library_compile_pipeline: loaded kernel_cpy_f32_f32                            0x104bfd720 | th_max = 1024 | th_width =   32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_mul_mm_f32_f32', name = 'kernel_mul_mm_f32_f32_bci=0_bco=1'
ggml_metal_library_compile_pipeline: loaded kernel_mul_mm_f32_f32_bci=0_bco=1             0x104bfe8a0 | th_max =  832 | th_width =   32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_soft_max_f32', name = 'kernel_soft_max_f32'
ggml_metal_library_compile_pipeline: loaded kernel_soft_max_f32                           0x104bff0a0 | th_max = 1024 | th_width =   32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_mul_mm_f32_f32', name = 'kernel_mul_mm_f32_f32_bci=1_bco=1'
ggml_metal_library_compile_pipeline: loaded kernel_mul_mm_f32_f32_bci=1_bco=1             0x104bff6a0 | th_max =  832 | th_width =   32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_add_fuse_1', name = 'kernel_add_fuse_1'
ggml_metal_library_compile_pipeline: loaded kernel_add_fuse_1                             0x104bff9a0 | th_max = 1024 | th_width =   32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_norm_mul_add_f32_4', name = 'kernel_norm_mul_add_f32_4'
ggml_metal_library_compile_pipeline: loaded kernel_norm_mul_add_f32_4                     0xb54448000 | th_max = 1024 | th_width =   32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_swiglu_f32', name = 'kernel_swiglu_f32'
ggml_metal_library_compile_pipeline: loaded kernel_swiglu_f32                             0xb54448300 | th_max = 1024 | th_width =   32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_get_rows_f32', name = 'kernel_get_rows_f32'
ggml_metal_library_compile_pipeline: loaded kernel_get_rows_f32                           0xb54448600 | th_max = 1024 | th_width =   32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_mul_mv_f32_f32', name = 'kernel_mul_mv_f32_f32_nsg=4'
ggml_metal_library_compile_pipeline: loaded kernel_mul_mv_f32_f32_nsg=4                   0xb54448900 | th_max = 1024 | th_width =   32
[GIN] 2025/11/12 - 12:15:03 | 200 |  479.809709ms |     192.168.1.3 | POST     "/api/embed"
time=2025-11-12T12:15:03.535-03:00 level=DEBUG source=sched.go:520 msg="context for request finished"
time=2025-11-12T12:15:03.536-03:00 level=DEBUG source=sched.go:290 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=99032 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192 duration=5m0s
time=2025-11-12T12:15:03.536-03:00 level=DEBUG source=sched.go:308 msg="after processing request finished event" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=99032 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192 refCount=0
time=2025-11-12T12:15:03.547-03:00 level=DEBUG source=sched.go:609 msg="evaluating already loaded" model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6
time=2025-11-12T12:15:03.549-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32
time=2025-11-12T12:15:03.550-03:00 level=DEBUG source=cache.go:104 msg="loading cache slot" id=0 cache=495 prompt=491 used=0 remaining=491
decode: cannot decode batches with this context (calling encode() instead)
init: embeddings required but some input tokens were not marked as outputs -> overriding
[GIN] 2025/11/12 - 12:15:03 | 200 |   58.642833ms |     192.168.1.3 | POST     "/api/embed"
time=2025-11-12T12:15:03.597-03:00 level=DEBUG source=sched.go:385 msg="context for request finished" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=99032 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192
time=2025-11-12T12:15:03.597-03:00 level=DEBUG source=sched.go:290 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=99032 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192 duration=5m0s
time=2025-11-12T12:15:03.597-03:00 level=DEBUG source=sched.go:308 msg="after processing request finished event" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=99032 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192 refCount=0
time=2025-11-12T12:15:03.610-03:00 level=DEBUG source=sched.go:609 msg="evaluating already loaded" model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6
time=2025-11-12T12:15:03.613-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32
time=2025-11-12T12:15:03.614-03:00 level=DEBUG source=cache.go:104 msg="loading cache slot" id=0 cache=491 prompt=452 used=0 remaining=452
decode: cannot decode batches with this context (calling encode() instead)
init: embeddings required but some input tokens were not marked as outputs -> overriding
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_soft_max_f32_4', name = 'kernel_soft_max_f32_4'
ggml_metal_library_compile_pipeline: loaded kernel_soft_max_f32_4                         0xb54448c00 | th_max = 1024 | th_width =   32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_mul_mv_f32_f32_4', name = 'kernel_mul_mv_f32_f32_4_nsg=4'
ggml_metal_library_compile_pipeline: loaded kernel_mul_mv_f32_f32_4_nsg=4                 0xb54448f00 | th_max =  768 | th_width =   32
[GIN] 2025/11/12 - 12:15:03 | 200 |   54.334542ms |     192.168.1.3 | POST     "/api/embed"
time=2025-11-12T12:15:03.656-03:00 level=DEBUG source=sched.go:385 msg="context for request finished" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=99032 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192
time=2025-11-12T12:15:03.656-03:00 level=DEBUG source=sched.go:290 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=99032 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192 duration=5m0s
time=2025-11-12T12:15:03.656-03:00 level=DEBUG source=sched.go:308 msg="after processing request finished event" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=99032 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192 refCount=0
time=2025-11-12T12:15:03.666-03:00 level=DEBUG source=sched.go:609 msg="evaluating already loaded" model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6
time=2025-11-12T12:15:03.669-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32
time=2025-11-12T12:15:03.670-03:00 level=DEBUG source=cache.go:104 msg="loading cache slot" id=0 cache=452 prompt=255 used=0 remaining=255
decode: cannot decode batches with this context (calling encode() instead)
init: embeddings required but some input tokens were not marked as outputs -> overriding
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_mul_mv_f32_f32', name = 'kernel_mul_mv_f32_f32_nsg=2'
ggml_metal_library_compile_pipeline: loaded kernel_mul_mv_f32_f32_nsg=2                   0xb54449200 | th_max = 1024 | th_width =   32
[GIN] 2025/11/12 - 12:15:03 | 200 |   33.019959ms |     192.168.1.3 | POST     "/api/embed"
time=2025-11-12T12:15:03.691-03:00 level=DEBUG source=sched.go:385 msg="context for request finished" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=99032 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192
time=2025-11-12T12:15:03.691-03:00 level=DEBUG source=sched.go:290 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=99032 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192 duration=5m0s
time=2025-11-12T12:15:03.691-03:00 level=DEBUG source=sched.go:308 msg="after processing request finished event" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=99032 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192 refCount=0
time=2025-11-12T12:15:03.707-03:00 level=DEBUG source=sched.go:609 msg="evaluating already loaded" model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6
time=2025-11-12T12:15:03.709-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32
time=2025-11-12T12:15:03.710-03:00 level=DEBUG source=cache.go:104 msg="loading cache slot" id=0 cache=255 prompt=1872 used=0 remaining=1872
decode: cannot decode batches with this context (calling encode() instead)
init: embeddings required but some input tokens were not marked as outputs -> overriding
output_reserve: reallocating output buffer from size 59.08 MiB to 61.11 MiB
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_mul_mm_f16_f32', name = 'kernel_mul_mm_f16_f32_bci=0_bco=0'
ggml_metal_library_compile_pipeline: loaded kernel_mul_mm_f16_f32_bci=0_bco=0             0xb54449500 | th_max = 1024 | th_width =   32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_mul_mm_f32_f32', name = 'kernel_mul_mm_f32_f32_bci=0_bco=0'
ggml_metal_library_compile_pipeline: loaded kernel_mul_mm_f32_f32_bci=0_bco=0             0xb54449800 | th_max =  896 | th_width =   32
decode: cannot decode batches with this context (calling encode() instead)
init: embeddings required but some input tokens were not marked as outputs -> overriding
SIGTRAP: trace trap
PC=0x181ddab1c m=4 sigcode=0
signal arrived during cgo execution

goroutine 23 gp=0x14000103500 m=4 mp=0x14000100008 [syscall]:
runtime.cgocall(0x103450fbc, 0x14000085b88)
	runtime/cgocall.go:167 +0x44 fp=0x14000085b50 sp=0x14000085b10 pc=0x102956684
github.com/ollama/ollama/llama._Cfunc_llama_decode(0x104bef650, {0x200, 0x104bf5fe0, 0x0, 0xb5489d000, 0xb5489c800, 0xb5558e800, 0x104bf03f0})
	_cgo_gotypes.go:674 +0x30 fp=0x14000085b80 sp=0x14000085b50 pc=0x102c9d790
github.com/ollama/ollama/llama.(*Context).Decode.func1(...)
	github.com/ollama/ollama/llama/llama.go:160
github.com/ollama/ollama/llama.(*Context).Decode(0x14000482c08?, 0x0?)
	github.com/ollama/ollama/llama/llama.go:160 +0xcc fp=0x14000085c70 sp=0x14000085b80 pc=0x102c9f94c
github.com/ollama/ollama/runner/llamarunner.(*Server).processBatch(0x140003f2320, 0x1400004ebe0, 0x14000085f18)
	github.com/ollama/ollama/runner/llamarunner/runner.go:468 +0x1d4 fp=0x14000085ed0 sp=0x14000085c70 pc=0x102d3f774
github.com/ollama/ollama/runner/llamarunner.(*Server).run(0x140003f2320, {0x103b5bcd0, 0x1400040e0a0})
	github.com/ollama/ollama/runner/llamarunner/runner.go:361 +0x15c fp=0x14000085fa0 sp=0x14000085ed0 pc=0x102d3f43c
github.com/ollama/ollama/runner/llamarunner.Execute.gowrap1()
	github.com/ollama/ollama/runner/llamarunner/runner.go:926 +0x2c fp=0x14000085fd0 sp=0x14000085fa0 pc=0x102d4324c
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x14000085fd0 sp=0x14000085fd0 pc=0x102961d04
created by github.com/ollama/ollama/runner/llamarunner.Execute in goroutine 1
	github.com/ollama/ollama/runner/llamarunner/runner.go:926 +0x418

goroutine 1 gp=0x140000021c0 m=nil [IO wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x1400056d720 sp=0x1400056d700 pc=0x102959a50
runtime.netpollblock(0x1400050f7b8?, 0x29dba6c?, 0x1?)
	runtime/netpoll.go:575 +0x150 fp=0x1400056d760 sp=0x1400056d720 pc=0x10291f620
internal/poll.runtime_pollWait(0x14fcc1e00, 0x72)
	runtime/netpoll.go:351 +0xa0 fp=0x1400056d790 sp=0x1400056d760 pc=0x102958c80
internal/poll.(*pollDesc).wait(0x14000517b00?, 0x1028ff5d0?, 0x0)
	internal/poll/fd_poll_runtime.go:84 +0x28 fp=0x1400056d7c0 sp=0x1400056d790 pc=0x1029d7568
internal/poll.(*pollDesc).waitRead(...)
	internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Accept(0x14000517b00)
	internal/poll/fd_unix.go:613 +0x21c fp=0x1400056d870 sp=0x1400056d7c0 pc=0x1029dbb4c
net.(*netFD).accept(0x14000517b00)
	net/fd_unix.go:161 +0x28 fp=0x1400056d930 sp=0x1400056d870 pc=0x102a3d7f8
net.(*TCPListener).accept(0x140004ae080)
	net/tcpsock_posix.go:159 +0x24 fp=0x1400056d980 sp=0x1400056d930 pc=0x102a50e64
net.(*TCPListener).Accept(0x140004ae080)
	net/tcpsock.go:380 +0x2c fp=0x1400056d9c0 sp=0x1400056d980 pc=0x102a4ff0c
net/http.(*onceCloseListener).Accept(0x14000014000?)
	<autogenerated>:1 +0x2c fp=0x1400056d9e0 sp=0x1400056d9c0 pc=0x102c24a1c
net/http.(*Server).Serve(0x14000270800, {0x103b59668, 0x140004ae080})
	net/http/server.go:3463 +0x24c fp=0x1400056db10 sp=0x1400056d9e0 pc=0x102bffbac
github.com/ollama/ollama/runner/llamarunner.Execute({0x140001801a0, 0x4, 0x4})
	github.com/ollama/ollama/runner/llamarunner/runner.go:947 +0x754 fp=0x1400056dce0 sp=0x1400056db10 pc=0x102d43064
github.com/ollama/ollama/runner.Execute({0x14000180190?, 0x0?, 0x0?})
	github.com/ollama/ollama/runner/runner.go:22 +0x128 fp=0x1400056dd10 sp=0x1400056dce0 pc=0x102dba528
github.com/ollama/ollama/cmd.NewCLI.func2(0x140000a4e00?, {0x1036a7af0?, 0x4?, 0x1036a7af4?})
	github.com/ollama/ollama/cmd/cmd.go:1841 +0x50 fp=0x1400056dd40 sp=0x1400056dd10 pc=0x103401db0
github.com/spf13/cobra.(*Command).execute(0x1400052bb08, {0x14000113a40, 0x4, 0x4})
	github.com/spf13/cobra@v1.7.0/command.go:940 +0x5e0 fp=0x1400056de60 sp=0x1400056dd40 pc=0x102aa7f60
github.com/spf13/cobra.(*Command).ExecuteC(0x140000d8608)
	github.com/spf13/cobra@v1.7.0/command.go:1068 +0x2ec fp=0x1400056df20 sp=0x1400056de60 pc=0x102aa863c
github.com/spf13/cobra.(*Command).Execute(...)
	github.com/spf13/cobra@v1.7.0/command.go:992
github.com/spf13/cobra.(*Command).ExecuteContext(...)
	github.com/spf13/cobra@v1.7.0/command.go:985
main.main()
	github.com/ollama/ollama/main.go:12 +0x54 fp=0x1400056df40 sp=0x1400056df20 pc=0x1034028d4
runtime.main()
	runtime/proc.go:285 +0x278 fp=0x1400056dfd0 sp=0x1400056df40 pc=0x1029260d8
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400056dfd0 sp=0x1400056dfd0 pc=0x102961d04

goroutine 2 gp=0x14000002c40 m=nil [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x1400006cf90 sp=0x1400006cf70 pc=0x102959a50
runtime.goparkunlock(...)
	runtime/proc.go:466
runtime.forcegchelper()
	runtime/proc.go:373 +0xb4 fp=0x1400006cfd0 sp=0x1400006cf90 pc=0x102926424
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400006cfd0 sp=0x1400006cfd0 pc=0x102961d04
created by runtime.init.7 in goroutine 1
	runtime/proc.go:361 +0x24

goroutine 18 gp=0x14000102380 m=nil [GC sweep wait]:
runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000068760 sp=0x14000068740 pc=0x102959a50
runtime.goparkunlock(...)
	runtime/proc.go:466
runtime.bgsweep(0x14000098000)
	runtime/mgcsweep.go:323 +0x104 fp=0x140000687b0 sp=0x14000068760 pc=0x102910ee4
runtime.gcenable.gowrap1()
	runtime/mgc.go:212 +0x28 fp=0x140000687d0 sp=0x140000687b0 pc=0x102904b38
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x140000687d0 sp=0x140000687d0 pc=0x102961d04
created by runtime.gcenable in goroutine 1
	runtime/mgc.go:212 +0x6c

goroutine 19 gp=0x14000102540 m=nil [GC scavenge wait]:
runtime.gopark(0x10000?, 0x103866590?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000068f60 sp=0x14000068f40 pc=0x102959a50
runtime.goparkunlock(...)
	runtime/proc.go:466
runtime.(*scavengerState).park(0x10444d860)
	runtime/mgcscavenge.go:425 +0x5c fp=0x14000068f90 sp=0x14000068f60 pc=0x10290e9fc
runtime.bgscavenge(0x14000098000)
	runtime/mgcscavenge.go:658 +0xac fp=0x14000068fb0 sp=0x14000068f90 pc=0x10290ef9c
runtime.gcenable.gowrap2()
	runtime/mgc.go:213 +0x28 fp=0x14000068fd0 sp=0x14000068fb0 pc=0x102904ad8
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x14000068fd0 sp=0x14000068fd0 pc=0x102961d04
created by runtime.gcenable in goroutine 1
	runtime/mgc.go:213 +0xac

goroutine 34 gp=0x14000186380 m=nil [finalizer wait]:
runtime.gopark(0x0?, 0x103b45a18?, 0xb0?, 0x2?, 0x1000000010?)
	runtime/proc.go:460 +0xc0 fp=0x1400006c580 sp=0x1400006c560 pc=0x102959a50
runtime.runFinalizers()
	runtime/mfinal.go:210 +0x104 fp=0x1400006c7d0 sp=0x1400006c580 pc=0x102903b24
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400006c7d0 sp=0x1400006c7d0 pc=0x102961d04
created by runtime.createfing in goroutine 1
	runtime/mfinal.go:172 +0x78

goroutine 35 gp=0x14000186e00 m=nil [cleanup wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000308740 sp=0x14000308720 pc=0x102959a50
runtime.goparkunlock(...)
	runtime/proc.go:466
runtime.(*cleanupQueue).dequeue(0x10444e740)
	runtime/mcleanup.go:439 +0x110 fp=0x14000308780 sp=0x14000308740 pc=0x102901010
runtime.runCleanups()
	runtime/mcleanup.go:635 +0x40 fp=0x140003087d0 sp=0x14000308780 pc=0x102901820
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x140003087d0 sp=0x140003087d0 pc=0x102961d04
created by runtime.(*cleanupQueue).createGs in goroutine 1
	runtime/mcleanup.go:589 +0x108

goroutine 36 gp=0x14000187180 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000308f10 sp=0x14000308ef0 pc=0x102959a50
runtime.gcBgMarkWorker(0x14000183880)
	runtime/mgc.go:1463 +0xe0 fp=0x14000308fb0 sp=0x14000308f10 pc=0x1029071b0
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x14000308fd0 sp=0x14000308fb0 pc=0x102907098
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x14000308fd0 sp=0x14000308fd0 pc=0x102961d04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 20 gp=0x14000102700 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000069710 sp=0x140000696f0 pc=0x102959a50
runtime.gcBgMarkWorker(0x14000183880)
	runtime/mgc.go:1463 +0xe0 fp=0x140000697b0 sp=0x14000069710 pc=0x1029071b0
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x140000697d0 sp=0x140000697b0 pc=0x102907098
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x140000697d0 sp=0x140000697d0 pc=0x102961d04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 3 gp=0x14000003500 m=nil [GC worker (idle)]:
runtime.gopark(0x581b7f3740f6e?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000173f10 sp=0x14000173ef0 pc=0x102959a50
runtime.gcBgMarkWorker(0x14000183880)
	runtime/mgc.go:1463 +0xe0 fp=0x14000173fb0 sp=0x14000173f10 pc=0x1029071b0
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x14000173fd0 sp=0x14000173fb0 pc=0x102907098
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x14000173fd0 sp=0x14000173fd0 pc=0x102961d04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 37 gp=0x14000187340 m=nil [GC worker (idle)]:
runtime.gopark(0x581b7f36ff69a?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000309710 sp=0x140003096f0 pc=0x102959a50
runtime.gcBgMarkWorker(0x14000183880)
	runtime/mgc.go:1463 +0xe0 fp=0x140003097b0 sp=0x14000309710 pc=0x1029071b0
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x140003097d0 sp=0x140003097b0 pc=0x102907098
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x140003097d0 sp=0x140003097d0 pc=0x102961d04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 38 gp=0x14000187500 m=nil [GC worker (idle)]:
runtime.gopark(0x581b7f37408c2?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000175f10 sp=0x14000175ef0 pc=0x102959a50
runtime.gcBgMarkWorker(0x14000183880)
	runtime/mgc.go:1463 +0xe0 fp=0x14000175fb0 sp=0x14000175f10 pc=0x1029071b0
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x14000175fd0 sp=0x14000175fb0 pc=0x102907098
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x14000175fd0 sp=0x14000175fd0 pc=0x102961d04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 21 gp=0x140001028c0 m=nil [GC worker (idle)]:
runtime.gopark(0x581b7f36ff4a6?, 0x3?, 0xfa?, 0x7d?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000069f10 sp=0x14000069ef0 pc=0x102959a50
runtime.gcBgMarkWorker(0x14000183880)
	runtime/mgc.go:1463 +0xe0 fp=0x14000069fb0 sp=0x14000069f10 pc=0x1029071b0
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x14000069fd0 sp=0x14000069fb0 pc=0x102907098
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x14000069fd0 sp=0x14000069fd0 pc=0x102961d04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 4 gp=0x140000036c0 m=nil [GC worker (idle)]:
runtime.gopark(0x581b7f373f2f3?, 0x3?, 0xb6?, 0xa4?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x1400006df10 sp=0x1400006def0 pc=0x102959a50
runtime.gcBgMarkWorker(0x14000183880)
	runtime/mgc.go:1463 +0xe0 fp=0x1400006dfb0 sp=0x1400006df10 pc=0x1029071b0
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x1400006dfd0 sp=0x1400006dfb0 pc=0x102907098
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400006dfd0 sp=0x1400006dfd0 pc=0x102961d04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 39 gp=0x140001876c0 m=nil [GC worker (idle)]:
runtime.gopark(0x581b7f36fe55a?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x1400030a710 sp=0x1400030a6f0 pc=0x102959a50
runtime.gcBgMarkWorker(0x14000183880)
	runtime/mgc.go:1463 +0xe0 fp=0x1400030a7b0 sp=0x1400030a710 pc=0x1029071b0
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x1400030a7d0 sp=0x1400030a7b0 pc=0x102907098
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400030a7d0 sp=0x1400030a7d0 pc=0x102961d04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 22 gp=0x14000102a80 m=nil [GC worker (idle)]:
runtime.gopark(0x581b7f37047fc?, 0x1?, 0x4c?, 0x5?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x1400006a710 sp=0x1400006a6f0 pc=0x102959a50
runtime.gcBgMarkWorker(0x14000183880)
	runtime/mgc.go:1463 +0xe0 fp=0x1400006a7b0 sp=0x1400006a710 pc=0x1029071b0
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x1400006a7d0 sp=0x1400006a7b0 pc=0x102907098
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400006a7d0 sp=0x1400006a7d0 pc=0x102961d04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 5 gp=0x14000003880 m=nil [GC worker (idle)]:
runtime.gopark(0x581b7f373da89?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x1400006e710 sp=0x1400006e6f0 pc=0x102959a50
runtime.gcBgMarkWorker(0x14000183880)
	runtime/mgc.go:1463 +0xe0 fp=0x1400006e7b0 sp=0x1400006e710 pc=0x1029071b0
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x1400006e7d0 sp=0x1400006e7b0 pc=0x102907098
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400006e7d0 sp=0x1400006e7d0 pc=0x102961d04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 40 gp=0x14000003a40 m=nil [chan receive]:
runtime.gopark(0x1400001b868?, 0x10293a594?, 0x98?, 0xb8?, 0x10295b9fc?)
	runtime/proc.go:460 +0xc0 fp=0x1400001b850 sp=0x1400001b830 pc=0x102959a50
runtime.chanrecv(0x140003bc150, 0x1400001ba40, 0x1)
	runtime/chan.go:667 +0x428 fp=0x1400001b8d0 sp=0x1400001b850 pc=0x1028f4318
runtime.chanrecv1(0x14000168ae0?, 0x1400056e000?)
	runtime/chan.go:509 +0x14 fp=0x1400001b900 sp=0x1400001b8d0 pc=0x1028f3eb4
github.com/ollama/ollama/runner/llamarunner.(*Server).embeddings(0x140003f2320, {0x103b59848, 0x140002b20f0}, 0x140000003c0)
	github.com/ollama/ollama/runner/llamarunner/runner.go:758 +0x5a8 fp=0x1400001ba90 sp=0x1400001b900 pc=0x102d41728
github.com/ollama/ollama/runner/llamarunner.(*Server).embeddings-fm({0x103b59848?, 0x140002b20f0?}, 0x1400001bb18?)
	<autogenerated>:1 +0x40 fp=0x1400001bac0 sp=0x1400001ba90 pc=0x102d43570
net/http.HandlerFunc.ServeHTTP(0x140000b4480?, {0x103b59848?, 0x140002b20f0?}, 0x1400001bb00?)
	net/http/server.go:2322 +0x38 fp=0x1400001baf0 sp=0x1400001bac0 pc=0x102bfc7e8
net/http.(*ServeMux).ServeHTTP(0x10?, {0x103b59848, 0x140002b20f0}, 0x140000003c0)
	net/http/server.go:2861 +0x190 fp=0x1400001bb40 sp=0x1400001baf0 pc=0x102bfe280
net/http.serverHandler.ServeHTTP({0x103b562b0?}, {0x103b59848?, 0x140002b20f0?}, 0x1?)
	net/http/server.go:3340 +0xb0 fp=0x1400001bb70 sp=0x1400001bb40 pc=0x102c18a70
net/http.(*conn).serve(0x14000014000, {0x103b5bc98, 0x140002f8f30})
	net/http/server.go:2109 +0x528 fp=0x1400001bfa0 sp=0x1400001bb70 pc=0x102bfabd8
net/http.(*Server).Serve.gowrap3()
	net/http/server.go:3493 +0x2c fp=0x1400001bfd0 sp=0x1400001bfa0 pc=0x102bfff0c
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400001bfd0 sp=0x1400001bfd0 pc=0x102961d04
created by net/http.(*Server).Serve in goroutine 1
	net/http/server.go:3493 +0x384

goroutine 9 gp=0x14000003c00 m=nil [IO wait]:
runtime.gopark(0xffffffffffffffff?, 0xffffffffffffffff?, 0x23?, 0x0?, 0x10297d600?)
	runtime/proc.go:460 +0xc0 fp=0x14000305580 sp=0x14000305560 pc=0x102959a50
runtime.netpollblock(0x0?, 0x0?, 0x0?)
	runtime/netpoll.go:575 +0x150 fp=0x140003055c0 sp=0x14000305580 pc=0x10291f620
internal/poll.runtime_pollWait(0x14fcc1c00, 0x72)
	runtime/netpoll.go:351 +0xa0 fp=0x140003055f0 sp=0x140003055c0 pc=0x102958c80
internal/poll.(*pollDesc).wait(0x1400024e000?, 0x140000a87e1?, 0x0)
	internal/poll/fd_poll_runtime.go:84 +0x28 fp=0x14000305620 sp=0x140003055f0 pc=0x1029d7568
internal/poll.(*pollDesc).waitRead(...)
	internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0x1400024e000, {0x140000a87e1, 0x1, 0x1})
	internal/poll/fd_unix.go:165 +0x1e0 fp=0x140003056c0 sp=0x14000305620 pc=0x1029d8780
net.(*netFD).Read(0x1400024e000, {0x140000a87e1?, 0x104389450?, 0x140000a8894?})
	net/fd_posix.go:68 +0x28 fp=0x14000305710 sp=0x140003056c0 pc=0x102a3bff8
net.(*conn).Read(0x14000070000, {0x140000a87e1?, 0x0?, 0x0?})
	net/net.go:196 +0x34 fp=0x14000305760 sp=0x14000305710 pc=0x102a485d4
net/http.(*connReader).backgroundRead(0x140000a87c0)
	net/http/server.go:702 +0x38 fp=0x140003057b0 sp=0x14000305760 pc=0x102bf5c48
net/http.(*connReader).startBackgroundRead.gowrap2()
	net/http/server.go:698 +0x28 fp=0x140003057d0 sp=0x140003057b0 pc=0x102bf5b38
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x140003057d0 sp=0x140003057d0 pc=0x102961d04
created by net/http.(*connReader).startBackgroundRead in goroutine 40
	net/http/server.go:698 +0xb8

r0      0x1047e0000
r1      0x1047e3c60
r2      0x0
r3      0x1047e7000
r4      0xb54461800
r5      0x0
r6      0xffffffffbfc007ff
r7      0xfffff0003ffff800
r8      0xb54461800
r9      0x0
r10     0x800380401980
r11     0xbfb99984c02334b7
r12     0x3f140930bf8574a0
r13     0xe0cfed5d694beb4b
r14     0x104858cb8
r15     0xb54460000
r16     0x2821b5e88
r17     0xffffffffb00007ff
r18     0x0
r19     0xc00
r20     0xc0040d262c7
r21     0x0
r22     0x104bef790
r23     0x0
r24     0x300
r25     0x300
r26     0x200
r27     0x104bef790
r28     0x0
r29     0x16ed36cd0
lr      0x181f6ca78
sp      0x16ed36c60
pc      0x181ddab1c
fault   0x181ddab1c
[GIN] 2025/11/12 - 12:15:03 | 500 |   60.695958ms |     192.168.1.3 | POST     "/api/embed"
time=2025-11-12T12:15:03.760-03:00 level=DEBUG source=sched.go:385 msg="context for request finished" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=99032 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192
time=2025-11-12T12:15:03.760-03:00 level=DEBUG source=sched.go:290 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=99032 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192 duration=5m0s
time=2025-11-12T12:15:03.760-03:00 level=DEBUG source=sched.go:308 msg="after processing request finished event" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=99032 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192 refCount=0
time=2025-11-12T12:15:03.772-03:00 level=DEBUG source=sched.go:609 msg="evaluating already loaded" model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6
time=2025-11-12T12:15:03.772-03:00 level=DEBUG source=server.go:1241 msg="server unhealthy" error="llama runner process no longer running: 2 "
time=2025-11-12T12:15:03.772-03:00 level=DEBUG source=sched.go:609 msg="evaluating already loaded" model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6
time=2025-11-12T12:15:03.772-03:00 level=DEBUG source=server.go:1241 msg="server unhealthy" error="llama runner process no longer running: 2 "
time=2025-11-12T12:15:03.772-03:00 level=DEBUG source=sched.go:161 msg=reloading runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=99032 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192
time=2025-11-12T12:15:03.772-03:00 level=DEBUG source=sched.go:236 msg="resetting model to expire immediately to make room" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=99032 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192 refCount=0
time=2025-11-12T12:15:03.772-03:00 level=DEBUG source=sched.go:247 msg="waiting for pending requests to complete and unload to occur" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=99032 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192
time=2025-11-12T12:15:03.772-03:00 level=DEBUG source=sched.go:311 msg="runner expired event received" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=99032 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192
time=2025-11-12T12:15:03.772-03:00 level=DEBUG source=sched.go:326 msg="got lock to unload expired event" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=99032 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192
time=2025-11-12T12:15:03.772-03:00 level=DEBUG source=sched.go:349 msg="starting background wait for VRAM recovery" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=99032 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192
time=2025-11-12T12:15:03.772-03:00 level=DEBUG source=sched.go:657 msg="no need to wait for VRAM recovery" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=99032 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192
time=2025-11-12T12:15:03.773-03:00 level=DEBUG source=server.go:1699 msg="stopping llama server" pid=99032
time=2025-11-12T12:15:03.773-03:00 level=DEBUG source=sched.go:358 msg="runner terminated and removed from list, blocking for VRAM recovery" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=99032 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6
time=2025-11-12T12:15:03.773-03:00 level=DEBUG source=sched.go:361 msg="sending an unloaded event" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=99032 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6
time=2025-11-12T12:15:03.773-03:00 level=DEBUG source=sched.go:253 msg="unload completed" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=99032 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6
time=2025-11-12T12:15:03.773-03:00 level=DEBUG source=runner.go:40 msg="overall device VRAM discovery took" duration=1.125µs
time=2025-11-12T12:15:03.775-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32
time=2025-11-12T12:15:03.776-03:00 level=DEBUG source=sched.go:211 msg="loading first model" model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6
llama_model_load_from_file_impl: using device Metal (Apple M2 Pro) (unknown id) - 12123 MiB free
llama_model_loader: loaded meta data with 24 key-value pairs and 112 tensors from /Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = nomic-bert
llama_model_loader: - kv   1:                               general.name str              = nomic-embed-text-v1.5
llama_model_loader: - kv   2:                     nomic-bert.block_count u32              = 12
llama_model_loader: - kv   3:                  nomic-bert.context_length u32              = 2048
llama_model_loader: - kv   4:                nomic-bert.embedding_length u32              = 768
llama_model_loader: - kv   5:             nomic-bert.feed_forward_length u32              = 3072
llama_model_loader: - kv   6:            nomic-bert.attention.head_count u32              = 12
llama_model_loader: - kv   7:    nomic-bert.attention.layer_norm_epsilon f32              = 0.000000
llama_model_loader: - kv   8:                          general.file_type u32              = 1
llama_model_loader: - kv   9:                nomic-bert.attention.causal bool             = false
llama_model_loader: - kv  10:                    nomic-bert.pooling_type u32              = 1
llama_model_loader: - kv  11:                  nomic-bert.rope.freq_base f32              = 1000.000000
llama_model_loader: - kv  12:            tokenizer.ggml.token_type_count u32              = 2
llama_model_loader: - kv  13:                tokenizer.ggml.bos_token_id u32              = 101
llama_model_loader: - kv  14:                tokenizer.ggml.eos_token_id u32              = 102
llama_model_loader: - kv  15:                       tokenizer.ggml.model str              = bert
llama_model_loader: - kv  16:                      tokenizer.ggml.tokens arr[str,30522]   = ["[PAD]", "[unused0]", "[unused1]", "...
llama_model_loader: - kv  17:                      tokenizer.ggml.scores arr[f32,30522]   = [-1000.000000, -1000.000000, -1000.00...
llama_model_loader: - kv  18:                  tokenizer.ggml.token_type arr[i32,30522]   = [3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  19:            tokenizer.ggml.unknown_token_id u32              = 100
llama_model_loader: - kv  20:          tokenizer.ggml.seperator_token_id u32              = 102
llama_model_loader: - kv  21:            tokenizer.ggml.padding_token_id u32              = 0
llama_model_loader: - kv  22:                tokenizer.ggml.cls_token_id u32              = 101
llama_model_loader: - kv  23:               tokenizer.ggml.mask_token_id u32              = 103
llama_model_loader: - type  f32:   51 tensors
llama_model_loader: - type  f16:   61 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = F16
print_info: file size   = 260.86 MiB (16.00 BPW) 
init_tokenizer: initializing tokenizer for type 3
load: control token:    100 '[UNK]' is not marked as EOG
load: control token:    101 '[CLS]' is not marked as EOG
load: control token:      0 '[PAD]' is not marked as EOG
load: control token:    102 '[SEP]' is not marked as EOG
load: control token:    103 '[MASK]' is not marked as EOG
load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
load: printing all EOG tokens:
load:   - 102 ('[SEP]')
load: special tokens cache size = 5
load: token to piece cache size = 0.2032 MB
print_info: arch             = nomic-bert
print_info: vocab_only       = 1
print_info: model type       = ?B
print_info: model params     = 136.73 M
print_info: general.name     = nomic-embed-text-v1.5
print_info: vocab type       = WPM
print_info: n_vocab          = 30522
print_info: n_merges         = 0
print_info: BOS token        = 101 '[CLS]'
print_info: EOS token        = 102 '[SEP]'
print_info: UNK token        = 100 '[UNK]'
print_info: SEP token        = 102 '[SEP]'
print_info: PAD token        = 0 '[PAD]'
print_info: MASK token       = 103 '[MASK]'
print_info: LF token         = 0 '[PAD]'
print_info: EOG token        = 102 '[SEP]'
print_info: max token length = 21
llama_model_load: vocab only - skipping tensors
time=2025-11-12T12:15:03.797-03:00 level=WARN source=server.go:173 msg="requested context size too large for model" num_ctx=8192 n_ctx_train=2048
time=2025-11-12T12:15:03.798-03:00 level=INFO source=server.go:400 msg="starting runner" cmd="/opt/homebrew/Cellar/ollama/0.12.10/bin/ollama runner --model /Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 --port 56918"
time=2025-11-12T12:15:03.798-03:00 level=DEBUG source=server.go:401 msg=subprocess PATH="/opt/homebrew/bin:/opt/homebrew/bin:/opt/homebrew/sbin:/Users/bedas/.atuin/bin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin://Applications/Topaz Gigapixel AI.app/Contents/Resources/bin:/Library/Apple/usr/bin:/usr/local/MacGPG2/bin:/Users/bedas/.composer/vendor/bin" OLLAMA_HOST=0.0.0.0 OLLAMA_DEBUG=1 DYLD_LIBRARY_PATH=/opt/homebrew/Cellar/ollama/0.12.10/bin OLLAMA_LIBRARY_PATH=/opt/homebrew/Cellar/ollama/0.12.10/bin
time=2025-11-12T12:15:03.800-03:00 level=INFO source=server.go:470 msg="system memory" total="16.0 GiB" free="4.2 GiB" free_swap="0 B"
time=2025-11-12T12:15:03.800-03:00 level=DEBUG source=memory.go:198 msg=evaluating library=Metal gpu_count=1 available="[11.8 GiB]"
time=2025-11-12T12:15:03.800-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.vision.block_count default=0
time=2025-11-12T12:15:03.800-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default=0
time=2025-11-12T12:15:03.800-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}"
time=2025-11-12T12:15:03.800-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}"
time=2025-11-12T12:15:03.800-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default=0
time=2025-11-12T12:15:03.800-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}"
time=2025-11-12T12:15:03.800-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}"
time=2025-11-12T12:15:03.800-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.key_length default=64
time=2025-11-12T12:15:03.800-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.value_length default=64
time=2025-11-12T12:15:03.800-03:00 level=DEBUG source=ggml.go:611 msg="default cache size estimate" "attention MiB"=6 "attention bytes"=6291456 "recurrent MiB"=0 "recurrent bytes"=0
time=2025-11-12T12:15:03.800-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default=0
time=2025-11-12T12:15:03.800-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}"
time=2025-11-12T12:15:03.800-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}"
time=2025-11-12T12:15:03.800-03:00 level=INFO source=memory.go:37 msg="new model will fit in available VRAM across minimum required GPUs, loading" model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 library=Metal parallel=1 required="809.4 MiB" gpus=1
time=2025-11-12T12:15:03.800-03:00 level=DEBUG source=memory.go:198 msg=evaluating library=Metal gpu_count=1 available="[11.8 GiB]"
time=2025-11-12T12:15:03.800-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.vision.block_count default=0
time=2025-11-12T12:15:03.801-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default=0
time=2025-11-12T12:15:03.801-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}"
time=2025-11-12T12:15:03.801-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}"
time=2025-11-12T12:15:03.801-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default=0
time=2025-11-12T12:15:03.801-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}"
time=2025-11-12T12:15:03.801-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}"
time=2025-11-12T12:15:03.801-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.key_length default=64
time=2025-11-12T12:15:03.801-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.value_length default=64
time=2025-11-12T12:15:03.801-03:00 level=DEBUG source=ggml.go:611 msg="default cache size estimate" "attention MiB"=6 "attention bytes"=6291456 "recurrent MiB"=0 "recurrent bytes"=0
time=2025-11-12T12:15:03.801-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default=0
time=2025-11-12T12:15:03.801-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}"
time=2025-11-12T12:15:03.801-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}"
time=2025-11-12T12:15:03.801-03:00 level=INFO source=server.go:522 msg=offload library=Metal layers.requested=-1 layers.model=13 layers.offload=13 layers.split=[13] memory.available="[11.8 GiB]" memory.gpu_overhead="0 B" memory.required.full="809.4 MiB" memory.required.partial="809.4 MiB" memory.required.kv="6.0 MiB" memory.required.allocations="[809.4 MiB]" memory.weights.total="260.9 MiB" memory.weights.repeating="216.1 MiB" memory.weights.nonrepeating="44.7 MiB" memory.graph.full="12.0 MiB" memory.graph.partial="12.0 MiB"
time=2025-11-12T12:15:03.821-03:00 level=INFO source=runner.go:910 msg="starting go runner"
time=2025-11-12T12:15:03.821-03:00 level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/opt/homebrew/Cellar/ollama/0.12.10/bin
ggml_metal_library_init: using embedded metal library
ggml_metal_library_init: loaded in 0.007 sec
ggml_metal_device_init: GPU name:   Apple M2 Pro
ggml_metal_device_init: GPU family: MTLGPUFamilyApple8  (1008)
ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_device_init: GPU family: MTLGPUFamilyMetal4  (5002)
ggml_metal_device_init: simdgroup reduction   = true
ggml_metal_device_init: simdgroup matrix mul. = true
ggml_metal_device_init: has unified memory    = true
ggml_metal_device_init: has bfloat            = true
ggml_metal_device_init: use residency sets    = true
ggml_metal_device_init: use shared buffers    = true
ggml_metal_device_init: recommendedMaxWorkingSetSize  = 12713.12 MB
time=2025-11-12T12:15:03.821-03:00 level=INFO source=ggml.go:104 msg=system Metal.0.EMBED_LIBRARY=1 CPU.0.NEON=1 CPU.0.ARM_FMA=1 CPU.0.FP16_VA=1 CPU.0.DOTPROD=1 CPU.0.LLAMAFILE=1 CPU.0.ACCELERATE=1 compiler=cgo(clang)
time=2025-11-12T12:15:03.845-03:00 level=INFO source=runner.go:946 msg="Server listening on 127.0.0.1:56918"
time=2025-11-12T12:15:03.856-03:00 level=INFO source=runner.go:845 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:false KvSize:2048 KvCacheType: NumThreads:6 GPULayers:13[ID:0 Layers:13(0..12)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:true}"
llama_model_load_from_file_impl: using device Metal (Apple M2 Pro) (unknown id) - 12123 MiB free
time=2025-11-12T12:15:03.856-03:00 level=INFO source=server.go:1251 msg="waiting for llama runner to start responding"
time=2025-11-12T12:15:03.857-03:00 level=INFO source=server.go:1285 msg="waiting for server to become available" status="llm server loading model"
llama_model_loader: loaded meta data with 24 key-value pairs and 112 tensors from /Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = nomic-bert
llama_model_loader: - kv   1:                               general.name str              = nomic-embed-text-v1.5
llama_model_loader: - kv   2:                     nomic-bert.block_count u32              = 12
llama_model_loader: - kv   3:                  nomic-bert.context_length u32              = 2048
llama_model_loader: - kv   4:                nomic-bert.embedding_length u32              = 768
llama_model_loader: - kv   5:             nomic-bert.feed_forward_length u32              = 3072
llama_model_loader: - kv   6:            nomic-bert.attention.head_count u32              = 12
llama_model_loader: - kv   7:    nomic-bert.attention.layer_norm_epsilon f32              = 0.000000
llama_model_loader: - kv   8:                          general.file_type u32              = 1
llama_model_loader: - kv   9:                nomic-bert.attention.causal bool             = false
llama_model_loader: - kv  10:                    nomic-bert.pooling_type u32              = 1
llama_model_loader: - kv  11:                  nomic-bert.rope.freq_base f32              = 1000.000000
llama_model_loader: - kv  12:            tokenizer.ggml.token_type_count u32              = 2
llama_model_loader: - kv  13:                tokenizer.ggml.bos_token_id u32              = 101
llama_model_loader: - kv  14:                tokenizer.ggml.eos_token_id u32              = 102
llama_model_loader: - kv  15:                       tokenizer.ggml.model str              = bert
llama_model_loader: - kv  16:                      tokenizer.ggml.tokens arr[str,30522]   = ["[PAD]", "[unused0]", "[unused1]", "...
llama_model_loader: - kv  17:                      tokenizer.ggml.scores arr[f32,30522]   = [-1000.000000, -1000.000000, -1000.00...
llama_model_loader: - kv  18:                  tokenizer.ggml.token_type arr[i32,30522]   = [3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  19:            tokenizer.ggml.unknown_token_id u32              = 100
llama_model_loader: - kv  20:          tokenizer.ggml.seperator_token_id u32              = 102
llama_model_loader: - kv  21:            tokenizer.ggml.padding_token_id u32              = 0
llama_model_loader: - kv  22:                tokenizer.ggml.cls_token_id u32              = 101
llama_model_loader: - kv  23:               tokenizer.ggml.mask_token_id u32              = 103
llama_model_loader: - type  f32:   51 tensors
llama_model_loader: - type  f16:   61 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = F16
print_info: file size   = 260.86 MiB (16.00 BPW) 
init_tokenizer: initializing tokenizer for type 3
load: control token:    100 '[UNK]' is not marked as EOG
load: control token:    101 '[CLS]' is not marked as EOG
load: control token:      0 '[PAD]' is not marked as EOG
load: control token:    102 '[SEP]' is not marked as EOG
load: control token:    103 '[MASK]' is not marked as EOG
load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
load: printing all EOG tokens:
load:   - 102 ('[SEP]')
load: special tokens cache size = 5
load: token to piece cache size = 0.2032 MB
print_info: arch             = nomic-bert
print_info: vocab_only       = 0
print_info: n_ctx_train      = 2048
print_info: n_embd           = 768
print_info: n_layer          = 12
print_info: n_head           = 12
print_info: n_head_kv        = 12
print_info: n_rot            = 64
print_info: n_swa            = 0
print_info: is_swa_any       = 0
print_info: n_embd_head_k    = 64
print_info: n_embd_head_v    = 64
print_info: n_gqa            = 1
print_info: n_embd_k_gqa     = 768
print_info: n_embd_v_gqa     = 768
print_info: f_norm_eps       = 1.0e-12
print_info: f_norm_rms_eps   = 0.0e+00
print_info: f_clamp_kqv      = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale    = 0.0e+00
print_info: f_attn_scale     = 0.0e+00
print_info: n_ff             = 3072
print_info: n_expert         = 0
print_info: n_expert_used    = 0
print_info: causal attn      = 0
print_info: pooling type     = 1
print_info: rope type        = 2
print_info: rope scaling     = linear
print_info: freq_base_train  = 1000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn  = 2048
print_info: rope_finetuned   = unknown
print_info: model type       = 137M
print_info: model params     = 136.73 M
print_info: general.name     = nomic-embed-text-v1.5
print_info: vocab type       = WPM
print_info: n_vocab          = 30522
print_info: n_merges         = 0
print_info: BOS token        = 101 '[CLS]'
print_info: EOS token        = 102 '[SEP]'
print_info: UNK token        = 100 '[UNK]'
print_info: SEP token        = 102 '[SEP]'
print_info: PAD token        = 0 '[PAD]'
print_info: MASK token       = 103 '[MASK]'
print_info: LF token         = 0 '[PAD]'
print_info: EOG token        = 102 '[SEP]'
print_info: max token length = 21
load_tensors: loading model tensors, this can take a while... (mmap = true)
load_tensors: layer   0 assigned to device Metal, is_swa = 0
load_tensors: layer   1 assigned to device Metal, is_swa = 0
load_tensors: layer   2 assigned to device Metal, is_swa = 0
load_tensors: layer   3 assigned to device Metal, is_swa = 0
load_tensors: layer   4 assigned to device Metal, is_swa = 0
load_tensors: layer   5 assigned to device Metal, is_swa = 0
load_tensors: layer   6 assigned to device Metal, is_swa = 0
load_tensors: layer   7 assigned to device Metal, is_swa = 0
load_tensors: layer   8 assigned to device Metal, is_swa = 0
load_tensors: layer   9 assigned to device Metal, is_swa = 0
load_tensors: layer  10 assigned to device Metal, is_swa = 0
load_tensors: layer  11 assigned to device Metal, is_swa = 0
load_tensors: layer  12 assigned to device Metal, is_swa = 0
create_tensor: loading tensor token_embd.weight
create_tensor: loading tensor token_types.weight
create_tensor: loading tensor token_embd_norm.weight
create_tensor: loading tensor token_embd_norm.bias
create_tensor: loading tensor blk.0.attn_qkv.weight
create_tensor: loading tensor blk.0.attn_output.weight
create_tensor: loading tensor blk.0.attn_output_norm.weight
create_tensor: loading tensor blk.0.attn_output_norm.bias
create_tensor: loading tensor blk.0.ffn_up.weight
create_tensor: loading tensor blk.0.ffn_down.weight
create_tensor: loading tensor blk.0.ffn_gate.weight
create_tensor: loading tensor blk.0.layer_output_norm.weight
create_tensor: loading tensor blk.0.layer_output_norm.bias
create_tensor: loading tensor blk.1.attn_qkv.weight
create_tensor: loading tensor blk.1.attn_output.weight
create_tensor: loading tensor blk.1.attn_output_norm.weight
create_tensor: loading tensor blk.1.attn_output_norm.bias
create_tensor: loading tensor blk.1.ffn_up.weight
create_tensor: loading tensor blk.1.ffn_down.weight
create_tensor: loading tensor blk.1.ffn_gate.weight
create_tensor: loading tensor blk.1.layer_output_norm.weight
create_tensor: loading tensor blk.1.layer_output_norm.bias
create_tensor: loading tensor blk.2.attn_qkv.weight
create_tensor: loading tensor blk.2.attn_output.weight
create_tensor: loading tensor blk.2.attn_output_norm.weight
create_tensor: loading tensor blk.2.attn_output_norm.bias
create_tensor: loading tensor blk.2.ffn_up.weight
create_tensor: loading tensor blk.2.ffn_down.weight
create_tensor: loading tensor blk.2.ffn_gate.weight
create_tensor: loading tensor blk.2.layer_output_norm.weight
create_tensor: loading tensor blk.2.layer_output_norm.bias
create_tensor: loading tensor blk.3.attn_qkv.weight
create_tensor: loading tensor blk.3.attn_output.weight
create_tensor: loading tensor blk.3.attn_output_norm.weight
create_tensor: loading tensor blk.3.attn_output_norm.bias
create_tensor: loading tensor blk.3.ffn_up.weight
create_tensor: loading tensor blk.3.ffn_down.weight
create_tensor: loading tensor blk.3.ffn_gate.weight
create_tensor: loading tensor blk.3.layer_output_norm.weight
create_tensor: loading tensor blk.3.layer_output_norm.bias
create_tensor: loading tensor blk.4.attn_qkv.weight
create_tensor: loading tensor blk.4.attn_output.weight
create_tensor: loading tensor blk.4.attn_output_norm.weight
create_tensor: loading tensor blk.4.attn_output_norm.bias
create_tensor: loading tensor blk.4.ffn_up.weight
create_tensor: loading tensor blk.4.ffn_down.weight
create_tensor: loading tensor blk.4.ffn_gate.weight
create_tensor: loading tensor blk.4.layer_output_norm.weight
create_tensor: loading tensor blk.4.layer_output_norm.bias
create_tensor: loading tensor blk.5.attn_qkv.weight
create_tensor: loading tensor blk.5.attn_output.weight
create_tensor: loading tensor blk.5.attn_output_norm.weight
create_tensor: loading tensor blk.5.attn_output_norm.bias
create_tensor: loading tensor blk.5.ffn_up.weight
create_tensor: loading tensor blk.5.ffn_down.weight
create_tensor: loading tensor blk.5.ffn_gate.weight
create_tensor: loading tensor blk.5.layer_output_norm.weight
create_tensor: loading tensor blk.5.layer_output_norm.bias
create_tensor: loading tensor blk.6.attn_qkv.weight
create_tensor: loading tensor blk.6.attn_output.weight
create_tensor: loading tensor blk.6.attn_output_norm.weight
create_tensor: loading tensor blk.6.attn_output_norm.bias
create_tensor: loading tensor blk.6.ffn_up.weight
create_tensor: loading tensor blk.6.ffn_down.weight
create_tensor: loading tensor blk.6.ffn_gate.weight
create_tensor: loading tensor blk.6.layer_output_norm.weight
create_tensor: loading tensor blk.6.layer_output_norm.bias
create_tensor: loading tensor blk.7.attn_qkv.weight
create_tensor: loading tensor blk.7.attn_output.weight
create_tensor: loading tensor blk.7.attn_output_norm.weight
create_tensor: loading tensor blk.7.attn_output_norm.bias
create_tensor: loading tensor blk.7.ffn_up.weight
create_tensor: loading tensor blk.7.ffn_down.weight
create_tensor: loading tensor blk.7.ffn_gate.weight
create_tensor: loading tensor blk.7.layer_output_norm.weight
create_tensor: loading tensor blk.7.layer_output_norm.bias
create_tensor: loading tensor blk.8.attn_qkv.weight
create_tensor: loading tensor blk.8.attn_output.weight
create_tensor: loading tensor blk.8.attn_output_norm.weight
create_tensor: loading tensor blk.8.attn_output_norm.bias
create_tensor: loading tensor blk.8.ffn_up.weight
create_tensor: loading tensor blk.8.ffn_down.weight
create_tensor: loading tensor blk.8.ffn_gate.weight
create_tensor: loading tensor blk.8.layer_output_norm.weight
create_tensor: loading tensor blk.8.layer_output_norm.bias
create_tensor: loading tensor blk.9.attn_qkv.weight
create_tensor: loading tensor blk.9.attn_output.weight
create_tensor: loading tensor blk.9.attn_output_norm.weight
create_tensor: loading tensor blk.9.attn_output_norm.bias
create_tensor: loading tensor blk.9.ffn_up.weight
create_tensor: loading tensor blk.9.ffn_down.weight
create_tensor: loading tensor blk.9.ffn_gate.weight
create_tensor: loading tensor blk.9.layer_output_norm.weight
create_tensor: loading tensor blk.9.layer_output_norm.bias
create_tensor: loading tensor blk.10.attn_qkv.weight
create_tensor: loading tensor blk.10.attn_output.weight
create_tensor: loading tensor blk.10.attn_output_norm.weight
create_tensor: loading tensor blk.10.attn_output_norm.bias
create_tensor: loading tensor blk.10.ffn_up.weight
create_tensor: loading tensor blk.10.ffn_down.weight
create_tensor: loading tensor blk.10.ffn_gate.weight
create_tensor: loading tensor blk.10.layer_output_norm.weight
create_tensor: loading tensor blk.10.layer_output_norm.bias
create_tensor: loading tensor blk.11.attn_qkv.weight
create_tensor: loading tensor blk.11.attn_output.weight
create_tensor: loading tensor blk.11.attn_output_norm.weight
create_tensor: loading tensor blk.11.attn_output_norm.bias
create_tensor: loading tensor blk.11.ffn_up.weight
create_tensor: loading tensor blk.11.ffn_down.weight
create_tensor: loading tensor blk.11.ffn_gate.weight
create_tensor: loading tensor blk.11.layer_output_norm.weight
create_tensor: loading tensor blk.11.layer_output_norm.bias
load_tensors: offloading 12 repeating layers to GPU
load_tensors: offloading output layer to GPU
load_tensors: offloaded 13/13 layers to GPU
load_tensors:   CPU_Mapped model buffer size =    44.72 MiB
load_tensors: Metal_Mapped model buffer size =   216.14 MiB
llama_init_from_model: model default pooling_type is [1], but [-1] was specified
llama_context: constructing llama_context
llama_context: n_seq_max     = 1
llama_context: n_ctx         = 2048
llama_context: n_ctx_per_seq = 2048
llama_context: n_batch       = 512
llama_context: n_ubatch      = 512
llama_context: causal_attn   = 0
llama_context: flash_attn    = disabled
llama_context: kv_unified    = false
llama_context: freq_base     = 1000.0
llama_context: freq_scale    = 1
ggml_metal_init: allocating
ggml_metal_init: picking default device: Apple M2 Pro
ggml_metal_init: use bfloat         = true
ggml_metal_init: use fusion         = true
ggml_metal_init: use concurrency    = true
ggml_metal_init: use graph optimize = true
set_abort_callback: call
llama_context:        CPU  output buffer size =     0.12 MiB
llama_context: enumerating backends
llama_context: backend_ptrs.size() = 3
llama_context: max_nodes = 1024
llama_context: worst-case: n_tokens = 512, n_seqs = 1, n_outputs = 1
graph_reserve: reserving a graph for ubatch with n_tokens =  512, n_seqs =  1, n_outputs =  512
graph_reserve: reserving a graph for ubatch with n_tokens =    1, n_seqs =  1, n_outputs =    1
graph_reserve: reserving a graph for ubatch with n_tokens =  512, n_seqs =  1, n_outputs =  512
llama_context:      Metal compute buffer size =    23.51 MiB
llama_context:        CPU compute buffer size =     4.01 MiB
llama_context: graph nodes  = 371
llama_context: graph splits = 2
time=2025-11-12T12:15:04.108-03:00 level=INFO source=server.go:1289 msg="llama runner started in 0.31 seconds"
time=2025-11-12T12:15:04.108-03:00 level=INFO source=sched.go:500 msg="loaded runners" count=1
time=2025-11-12T12:15:04.108-03:00 level=INFO source=server.go:1251 msg="waiting for llama runner to start responding"
time=2025-11-12T12:15:04.108-03:00 level=INFO source=server.go:1289 msg="llama runner started in 0.31 seconds"
time=2025-11-12T12:15:04.108-03:00 level=DEBUG source=sched.go:512 msg="finished setting up" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=99034 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192
time=2025-11-12T12:15:04.111-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32
time=2025-11-12T12:15:04.114-03:00 level=DEBUG source=cache.go:104 msg="loading cache slot" id=0 cache=0 prompt=901 used=0 remaining=901
decode: cannot decode batches with this context (calling encode() instead)
init: embeddings required but some input tokens were not marked as outputs -> overriding
output_reserve: reallocating output buffer from size 0.12 MiB to 61.11 MiB
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_mul_mm_f16_f32', name = 'kernel_mul_mm_f16_f32_bci=0_bco=0'
ggml_metal_library_compile_pipeline: loaded kernel_mul_mm_f16_f32_bci=0_bco=0             0x103133da0 | th_max = 1024 | th_width =   32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_rope_neox_f32', name = 'kernel_rope_neox_f32'
ggml_metal_library_compile_pipeline: loaded kernel_rope_neox_f32                          0x1031346e0 | th_max = 1024 | th_width =   32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_cpy_f32_f32', name = 'kernel_cpy_f32_f32'
ggml_metal_library_compile_pipeline: loaded kernel_cpy_f32_f32                            0x1031351e0 | th_max = 1024 | th_width =   32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_mul_mm_f32_f32', name = 'kernel_mul_mm_f32_f32_bci=0_bco=0'
ggml_metal_library_compile_pipeline: loaded kernel_mul_mm_f32_f32_bci=0_bco=0             0x103136360 | th_max =  896 | th_width =   32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_soft_max_f32_4', name = 'kernel_soft_max_f32_4'
ggml_metal_library_compile_pipeline: loaded kernel_soft_max_f32_4                         0x103136b60 | th_max = 1024 | th_width =   32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_add_fuse_1', name = 'kernel_add_fuse_1'
ggml_metal_library_compile_pipeline: loaded kernel_add_fuse_1                             0x103136e60 | th_max = 1024 | th_width =   32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_norm_mul_add_f32_4', name = 'kernel_norm_mul_add_f32_4'
ggml_metal_library_compile_pipeline: loaded kernel_norm_mul_add_f32_4                     0x103137320 | th_max = 1024 | th_width =   32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_swiglu_f32', name = 'kernel_swiglu_f32'
ggml_metal_library_compile_pipeline: loaded kernel_swiglu_f32                             0xaba94c000 | th_max = 1024 | th_width =   32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_get_rows_f32', name = 'kernel_get_rows_f32'
ggml_metal_library_compile_pipeline: loaded kernel_get_rows_f32                           0xaba94c300 | th_max = 1024 | th_width =   32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_mul_mv_f32_f32_4', name = 'kernel_mul_mv_f32_f32_4_nsg=4'
ggml_metal_library_compile_pipeline: loaded kernel_mul_mv_f32_f32_4_nsg=4                 0xaba94c600 | th_max =  768 | th_width =   32
decode: cannot decode batches with this context (calling encode() instead)
init: embeddings required but some input tokens were not marked as outputs -> overriding
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_mul_mm_f16_f32', name = 'kernel_mul_mm_f16_f32_bci=0_bco=1'
ggml_metal_library_compile_pipeline: loaded kernel_mul_mm_f16_f32_bci=0_bco=1             0xaba94c900 | th_max =  896 | th_width =   32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_mul_mm_f32_f32', name = 'kernel_mul_mm_f32_f32_bci=0_bco=1'
ggml_metal_library_compile_pipeline: loaded kernel_mul_mm_f32_f32_bci=0_bco=1             0xaba94cc00 | th_max =  832 | th_width =   32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_soft_max_f32', name = 'kernel_soft_max_f32'
ggml_metal_library_compile_pipeline: loaded kernel_soft_max_f32                           0xaba94cf00 | th_max = 1024 | th_width =   32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_mul_mm_f32_f32', name = 'kernel_mul_mm_f32_f32_bci=1_bco=1'
ggml_metal_library_compile_pipeline: loaded kernel_mul_mm_f32_f32_bci=1_bco=1             0xaba94d200 | th_max =  832 | th_width =   32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_mul_mv_f32_f32', name = 'kernel_mul_mv_f32_f32_nsg=4'
ggml_metal_library_compile_pipeline: loaded kernel_mul_mv_f32_f32_nsg=4                   0xaba94d500 | th_max = 1024 | th_width =   32
SIGTRAP: trace trap
PC=0x181ddab1c m=0 sigcode=0
signal arrived during cgo execution

goroutine 40 gp=0x14000102fc0 m=0 mp=0x1027b0120 [syscall]:
runtime.cgocall(0x1017b0fbc, 0x1400052cb88)
	runtime/cgocall.go:167 +0x44 fp=0x1400052cb50 sp=0x1400052cb10 pc=0x100cb6684
github.com/ollama/ollama/llama._Cfunc_llama_decode(0x103127260, {0x185, 0xabac95000, 0x0, 0x10312d0e0, 0x10312d8e0, 0xaba852800, 0x103128260})
	_cgo_gotypes.go:674 +0x30 fp=0x1400052cb80 sp=0x1400052cb50 pc=0x100ffd790
github.com/ollama/ollama/llama.(*Context).Decode.func1(...)
	github.com/ollama/ollama/llama/llama.go:160
github.com/ollama/ollama/llama.(*Context).Decode(0x140000dec08?, 0x0?)
	github.com/ollama/ollama/llama/llama.go:160 +0xcc fp=0x1400052cc70 sp=0x1400052cb80 pc=0x100fff94c
github.com/ollama/ollama/runner/llamarunner.(*Server).processBatch(0x14000530140, 0x140001f6af0, 0x1400052cf18)
	github.com/ollama/ollama/runner/llamarunner/runner.go:468 +0x1d4 fp=0x1400052ced0 sp=0x1400052cc70 pc=0x10109f774
github.com/ollama/ollama/runner/llamarunner.(*Server).run(0x14000530140, {0x101ebbcd0, 0x1400004e0f0})
	github.com/ollama/ollama/runner/llamarunner/runner.go:361 +0x15c fp=0x1400052cfa0 sp=0x1400052ced0 pc=0x10109f43c
github.com/ollama/ollama/runner/llamarunner.Execute.gowrap1()
	github.com/ollama/ollama/runner/llamarunner/runner.go:926 +0x2c fp=0x1400052cfd0 sp=0x1400052cfa0 pc=0x1010a324c
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400052cfd0 sp=0x1400052cfd0 pc=0x100cc1d04
created by github.com/ollama/ollama/runner/llamarunner.Execute in goroutine 1
	github.com/ollama/ollama/runner/llamarunner/runner.go:926 +0x418

goroutine 1 gp=0x140000021c0 m=nil [IO wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x140002af720 sp=0x140002af700 pc=0x100cb9a50
runtime.netpollblock(0x1400011f7b8?, 0xd3ba6c?, 0x1?)
	runtime/netpoll.go:575 +0x150 fp=0x140002af760 sp=0x140002af720 pc=0x100c7f620
internal/poll.runtime_pollWait(0x12e068000, 0x72)
	runtime/netpoll.go:351 +0xa0 fp=0x140002af790 sp=0x140002af760 pc=0x100cb8c80
internal/poll.(*pollDesc).wait(0x14000250a80?, 0x100d3dafc?, 0x0)
	internal/poll/fd_poll_runtime.go:84 +0x28 fp=0x140002af7c0 sp=0x140002af790 pc=0x100d37568
internal/poll.(*pollDesc).waitRead(...)
	internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Accept(0x14000250a80)
	internal/poll/fd_unix.go:613 +0x21c fp=0x140002af870 sp=0x140002af7c0 pc=0x100d3bb4c
net.(*netFD).accept(0x14000250a80)
	net/fd_unix.go:161 +0x28 fp=0x140002af930 sp=0x140002af870 pc=0x100d9d7f8
net.(*TCPListener).accept(0x140001ed780)
	net/tcpsock_posix.go:159 +0x24 fp=0x140002af980 sp=0x140002af930 pc=0x100db0e64
net.(*TCPListener).Accept(0x140001ed780)
	net/tcpsock.go:380 +0x2c fp=0x140002af9c0 sp=0x140002af980 pc=0x100daff0c
net/http.(*onceCloseListener).Accept(0x1400053c090?)
	<autogenerated>:1 +0x2c fp=0x140002af9e0 sp=0x140002af9c0 pc=0x100f84a1c
net/http.(*Server).Serve(0x140000a6200, {0x101eb9668, 0x140001ed780})
	net/http/server.go:3463 +0x24c fp=0x140002afb10 sp=0x140002af9e0 pc=0x100f5fbac
github.com/ollama/ollama/runner/llamarunner.Execute({0x140001801a0, 0x4, 0x4})
	github.com/ollama/ollama/runner/llamarunner/runner.go:947 +0x754 fp=0x140002afce0 sp=0x140002afb10 pc=0x1010a3064
github.com/ollama/ollama/runner.Execute({0x14000180190?, 0x0?, 0x0?})
	github.com/ollama/ollama/runner/runner.go:22 +0x128 fp=0x140002afd10 sp=0x140002afce0 pc=0x10111a528
github.com/ollama/ollama/cmd.NewCLI.func2(0x14000273400?, {0x101a07af0?, 0x4?, 0x101a07af4?})
	github.com/ollama/ollama/cmd/cmd.go:1841 +0x50 fp=0x140002afd40 sp=0x140002afd10 pc=0x101761db0
github.com/spf13/cobra.(*Command).execute(0x140000bf508, {0x14000113200, 0x4, 0x4})
	github.com/spf13/cobra@v1.7.0/command.go:940 +0x5e0 fp=0x140002afe60 sp=0x140002afd40 pc=0x100e07f60
github.com/spf13/cobra.(*Command).ExecuteC(0x1400027bb08)
	github.com/spf13/cobra@v1.7.0/command.go:1068 +0x2ec fp=0x140002aff20 sp=0x140002afe60 pc=0x100e0863c
github.com/spf13/cobra.(*Command).Execute(...)
	github.com/spf13/cobra@v1.7.0/command.go:992
github.com/spf13/cobra.(*Command).ExecuteContext(...)
	github.com/spf13/cobra@v1.7.0/command.go:985
main.main()
	github.com/ollama/ollama/main.go:12 +0x54 fp=0x140002aff40 sp=0x140002aff20 pc=0x1017628d4
runtime.main()
	runtime/proc.go:285 +0x278 fp=0x140002affd0 sp=0x140002aff40 pc=0x100c860d8
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x140002affd0 sp=0x140002affd0 pc=0x100cc1d04

goroutine 2 gp=0x14000002c40 m=nil [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x1400006cf90 sp=0x1400006cf70 pc=0x100cb9a50
runtime.goparkunlock(...)
	runtime/proc.go:466
runtime.forcegchelper()
	runtime/proc.go:373 +0xb4 fp=0x1400006cfd0 sp=0x1400006cf90 pc=0x100c86424
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400006cfd0 sp=0x1400006cfd0 pc=0x100cc1d04
created by runtime.init.7 in goroutine 1
	runtime/proc.go:361 +0x24

goroutine 18 gp=0x14000102380 m=nil [GC sweep wait]:
runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000068760 sp=0x14000068740 pc=0x100cb9a50
runtime.goparkunlock(...)
	runtime/proc.go:466
runtime.bgsweep(0x14000098000)
	runtime/mgcsweep.go:323 +0x104 fp=0x140000687b0 sp=0x14000068760 pc=0x100c70ee4
runtime.gcenable.gowrap1()
	runtime/mgc.go:212 +0x28 fp=0x140000687d0 sp=0x140000687b0 pc=0x100c64b38
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x140000687d0 sp=0x140000687d0 pc=0x100cc1d04
created by runtime.gcenable in goroutine 1
	runtime/mgc.go:212 +0x6c

goroutine 19 gp=0x14000102540 m=nil [GC scavenge wait]:
runtime.gopark(0x10000?, 0x101bc6590?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000068f60 sp=0x14000068f40 pc=0x100cb9a50
runtime.goparkunlock(...)
	runtime/proc.go:466
runtime.(*scavengerState).park(0x1027ad860)
	runtime/mgcscavenge.go:425 +0x5c fp=0x14000068f90 sp=0x14000068f60 pc=0x100c6e9fc
runtime.bgscavenge(0x14000098000)
	runtime/mgcscavenge.go:658 +0xac fp=0x14000068fb0 sp=0x14000068f90 pc=0x100c6ef9c
runtime.gcenable.gowrap2()
	runtime/mgc.go:213 +0x28 fp=0x14000068fd0 sp=0x14000068fb0 pc=0x100c64ad8
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x14000068fd0 sp=0x14000068fd0 pc=0x100cc1d04
created by runtime.gcenable in goroutine 1
	runtime/mgc.go:213 +0xac

goroutine 34 gp=0x14000186380 m=nil [finalizer wait]:
runtime.gopark(0x0?, 0x101ea5a18?, 0xb0?, 0x22?, 0x1000000010?)
	runtime/proc.go:460 +0xc0 fp=0x1400006c580 sp=0x1400006c560 pc=0x100cb9a50
runtime.runFinalizers()
	runtime/mfinal.go:210 +0x104 fp=0x1400006c7d0 sp=0x1400006c580 pc=0x100c63b24
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400006c7d0 sp=0x1400006c7d0 pc=0x100cc1d04
created by runtime.createfing in goroutine 1
	runtime/mfinal.go:172 +0x78

goroutine 35 gp=0x14000186e00 m=nil [cleanup wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000308740 sp=0x14000308720 pc=0x100cb9a50
runtime.goparkunlock(...)
	runtime/proc.go:466
runtime.(*cleanupQueue).dequeue(0x1027ae740)
	runtime/mcleanup.go:439 +0x110 fp=0x14000308780 sp=0x14000308740 pc=0x100c61010
runtime.runCleanups()
	runtime/mcleanup.go:635 +0x40 fp=0x140003087d0 sp=0x14000308780 pc=0x100c61820
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x140003087d0 sp=0x140003087d0 pc=0x100cc1d04
created by runtime.(*cleanupQueue).createGs in goroutine 1
	runtime/mcleanup.go:589 +0x108

goroutine 36 gp=0x14000187180 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000308f10 sp=0x14000308ef0 pc=0x100cb9a50
runtime.gcBgMarkWorker(0x140001836c0)
	runtime/mgc.go:1463 +0xe0 fp=0x14000308fb0 sp=0x14000308f10 pc=0x100c671b0
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x14000308fd0 sp=0x14000308fb0 pc=0x100c67098
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x14000308fd0 sp=0x14000308fd0 pc=0x100cc1d04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 3 gp=0x14000003500 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x1400006d710 sp=0x1400006d6f0 pc=0x100cb9a50
runtime.gcBgMarkWorker(0x140001836c0)
	runtime/mgc.go:1463 +0xe0 fp=0x1400006d7b0 sp=0x1400006d710 pc=0x100c671b0
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x1400006d7d0 sp=0x1400006d7b0 pc=0x100c67098
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400006d7d0 sp=0x1400006d7d0 pc=0x100cc1d04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 20 gp=0x14000102700 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000069710 sp=0x140000696f0 pc=0x100cb9a50
runtime.gcBgMarkWorker(0x140001836c0)
	runtime/mgc.go:1463 +0xe0 fp=0x140000697b0 sp=0x14000069710 pc=0x100c671b0
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x140000697d0 sp=0x140000697b0 pc=0x100c67098
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x140000697d0 sp=0x140000697d0 pc=0x100cc1d04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 37 gp=0x14000187340 m=nil [GC worker (idle)]:
runtime.gopark(0x581b812bbe909?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000309710 sp=0x140003096f0 pc=0x100cb9a50
runtime.gcBgMarkWorker(0x140001836c0)
	runtime/mgc.go:1463 +0xe0 fp=0x140003097b0 sp=0x14000309710 pc=0x100c671b0
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x140003097d0 sp=0x140003097b0 pc=0x100c67098
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x140003097d0 sp=0x140003097d0 pc=0x100cc1d04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 4 gp=0x140000036c0 m=nil [GC worker (idle)]:
runtime.gopark(0x581b812bcca9f?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x1400006df10 sp=0x1400006def0 pc=0x100cb9a50
runtime.gcBgMarkWorker(0x140001836c0)
	runtime/mgc.go:1463 +0xe0 fp=0x1400006dfb0 sp=0x1400006df10 pc=0x100c671b0
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x1400006dfd0 sp=0x1400006dfb0 pc=0x100c67098
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400006dfd0 sp=0x1400006dfd0 pc=0x100cc1d04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 21 gp=0x140001028c0 m=nil [GC worker (idle)]:
runtime.gopark(0x581b812bcc8fe?, 0x3?, 0x3e?, 0xbf?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000069f10 sp=0x14000069ef0 pc=0x100cb9a50
runtime.gcBgMarkWorker(0x140001836c0)
	runtime/mgc.go:1463 +0xe0 fp=0x14000069fb0 sp=0x14000069f10 pc=0x100c671b0
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x14000069fd0 sp=0x14000069fb0 pc=0x100c67098
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x14000069fd0 sp=0x14000069fd0 pc=0x100cc1d04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 38 gp=0x14000187500 m=nil [GC worker (idle)]:
runtime.gopark(0x581b812bbe986?, 0x1?, 0xed?, 0x17?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000309f10 sp=0x14000309ef0 pc=0x100cb9a50
runtime.gcBgMarkWorker(0x140001836c0)
	runtime/mgc.go:1463 +0xe0 fp=0x14000309fb0 sp=0x14000309f10 pc=0x100c671b0
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x14000309fd0 sp=0x14000309fb0 pc=0x100c67098
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x14000309fd0 sp=0x14000309fd0 pc=0x100cc1d04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 39 gp=0x140001876c0 m=nil [GC worker (idle)]:
runtime.gopark(0x581b812bbd4b0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x1400030a710 sp=0x1400030a6f0 pc=0x100cb9a50
runtime.gcBgMarkWorker(0x140001836c0)
	runtime/mgc.go:1463 +0xe0 fp=0x1400030a7b0 sp=0x1400030a710 pc=0x100c671b0
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x1400030a7d0 sp=0x1400030a7b0 pc=0x100c67098
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400030a7d0 sp=0x1400030a7d0 pc=0x100cc1d04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 22 gp=0x14000102a80 m=nil [GC worker (idle)]:
runtime.gopark(0x581b812bbec4a?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x1400052df10 sp=0x1400052def0 pc=0x100cb9a50
runtime.gcBgMarkWorker(0x140001836c0)
	runtime/mgc.go:1463 +0xe0 fp=0x1400052dfb0 sp=0x1400052df10 pc=0x100c671b0
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x1400052dfd0 sp=0x1400052dfb0 pc=0x100c67098
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400052dfd0 sp=0x1400052dfd0 pc=0x100cc1d04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 5 gp=0x14000003880 m=nil [GC worker (idle)]:
runtime.gopark(0x581b812bc5fd9?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x1400052bf10 sp=0x1400052bef0 pc=0x100cb9a50
runtime.gcBgMarkWorker(0x140001836c0)
	runtime/mgc.go:1463 +0xe0 fp=0x1400052bfb0 sp=0x1400052bf10 pc=0x100c671b0
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x1400052bfd0 sp=0x1400052bfb0 pc=0x100c67098
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400052bfd0 sp=0x1400052bfd0 pc=0x100cc1d04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 41 gp=0x14000103180 m=nil [chan receive]:
runtime.gopark(0x14000049868?, 0x100c9a594?, 0x98?, 0x98?, 0x100cbb9fc?)
	runtime/proc.go:460 +0xc0 fp=0x14000541850 sp=0x14000541830 pc=0x100cb9a50
runtime.chanrecv(0x1400040be30, 0x14000049a40, 0x1)
	runtime/chan.go:667 +0x428 fp=0x140005418d0 sp=0x14000541850 pc=0x100c54318
runtime.chanrecv1(0x14000016030?, 0x140000e6008?)
	runtime/chan.go:509 +0x14 fp=0x14000541900 sp=0x140005418d0 pc=0x100c53eb4
github.com/ollama/ollama/runner/llamarunner.(*Server).embeddings(0x14000530140, {0x101eb9848, 0x140004a6f00}, 0x140004e8780)
	github.com/ollama/ollama/runner/llamarunner/runner.go:758 +0x5a8 fp=0x14000541a90 sp=0x14000541900 pc=0x1010a1728
github.com/ollama/ollama/runner/llamarunner.(*Server).embeddings-fm({0x101eb9848?, 0x140004a6f00?}, 0x14000049b18?)
	<autogenerated>:1 +0x40 fp=0x14000541ac0 sp=0x14000541a90 pc=0x1010a3570
net/http.HandlerFunc.ServeHTTP(0x14000536000?, {0x101eb9848?, 0x140004a6f00?}, 0x14000049b00?)
	net/http/server.go:2322 +0x38 fp=0x14000541af0 sp=0x14000541ac0 pc=0x100f5c7e8
net/http.(*ServeMux).ServeHTTP(0x10?, {0x101eb9848, 0x140004a6f00}, 0x140004e8780)
	net/http/server.go:2861 +0x190 fp=0x14000541b40 sp=0x14000541af0 pc=0x100f5e280
net/http.serverHandler.ServeHTTP({0x101eb62b0?}, {0x101eb9848?, 0x140004a6f00?}, 0x1?)
	net/http/server.go:3340 +0xb0 fp=0x14000541b70 sp=0x14000541b40 pc=0x100f78a70
net/http.(*conn).serve(0x1400053c090, {0x101ebbc98, 0x1400052e360})
	net/http/server.go:2109 +0x528 fp=0x14000541fa0 sp=0x14000541b70 pc=0x100f5abd8
net/http.(*Server).Serve.gowrap3()
	net/http/server.go:3493 +0x2c fp=0x14000541fd0 sp=0x14000541fa0 pc=0x100f5ff0c
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x14000541fd0 sp=0x14000541fd0 pc=0x100cc1d04
created by net/http.(*Server).Serve in goroutine 1
	net/http/server.go:3493 +0x384

goroutine 7 gp=0x14000187dc0 m=nil [IO wait]:
runtime.gopark(0xffffffffffffffff?, 0xffffffffffffffff?, 0x23?, 0x0?, 0x100cdd600?)
	runtime/proc.go:460 +0xc0 fp=0x14000305580 sp=0x14000305560 pc=0x100cb9a50
runtime.netpollblock(0x0?, 0x0?, 0x0?)
	runtime/netpoll.go:575 +0x150 fp=0x140003055c0 sp=0x14000305580 pc=0x100c7f620
internal/poll.runtime_pollWait(0x12e067e00, 0x72)
	runtime/netpoll.go:351 +0xa0 fp=0x140003055f0 sp=0x140003055c0 pc=0x100cb8c80
internal/poll.(*pollDesc).wait(0x14000250b00?, 0x140001ed7e1?, 0x0)
	internal/poll/fd_poll_runtime.go:84 +0x28 fp=0x14000305620 sp=0x140003055f0 pc=0x100d37568
internal/poll.(*pollDesc).waitRead(...)
	internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0x14000250b00, {0x140001ed7e1, 0x1, 0x1})
	internal/poll/fd_unix.go:165 +0x1e0 fp=0x140003056c0 sp=0x14000305620 pc=0x100d38780
net.(*netFD).Read(0x14000250b00, {0x140001ed7e1?, 0x1026e9450?, 0x140001ed894?})
	net/fd_posix.go:68 +0x28 fp=0x14000305710 sp=0x140003056c0 pc=0x100d9bff8
net.(*conn).Read(0x14000140128, {0x140001ed7e1?, 0x0?, 0x0?})
	net/net.go:196 +0x34 fp=0x14000305760 sp=0x14000305710 pc=0x100da85d4
net/http.(*connReader).backgroundRead(0x140001ed7c0)
	net/http/server.go:702 +0x38 fp=0x140003057b0 sp=0x14000305760 pc=0x100f55c48
net/http.(*connReader).startBackgroundRead.gowrap2()
	net/http/server.go:698 +0x28 fp=0x140003057d0 sp=0x140003057b0 pc=0x100f55b38
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x140003057d0 sp=0x140003057d0 pc=0x100cc1d04
created by net/http.(*connReader).startBackgroundRead in goroutine 41
	net/http/server.go:698 +0xb8

r0      0x102f34000
r1      0x102f37cc0
r2      0x0
r3      0x102f3b020
r4      0xabac63000
r5      0x0
r6      0xffffffffbfc007ff
r7      0xfffff0003ffff800
r8      0xabac63000
r9      0x0
r10     0x300100400b00
r11     0xbfb99984c02334b7
r12     0x3f140930bf8574a0
r13     0xdacb4deb4e9a747a
r14     0x102f80cb8
r15     0xabac60000
r16     0x2821b5e88
r17     0xffffffffb00007ff
r18     0x0
r19     0xc00
r20     0xc00405fe2c7
r21     0x0
r22     0x1031273a0
r23     0x0
r24     0x300
r25     0x300
r26     0x185
r27     0x1031273a0
r28     0x0
r29     0x16f1becf0
lr      0x181f6ca78
sp      0x16f1bec80
pc      0x181ddab1c
fault   0x181ddab1c
[GIN] 2025/11/12 - 12:15:04 | 500 |  411.454458ms |     192.168.1.3 | POST     "/api/embed"
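For anyone hitting this before a fix lands, one workaround is to cap the context per request instead of relying on the model's bundled `num_ctx 8192` default. A minimal sketch of the request body (assuming the documented `options` and `truncate` fields of `/api/embed`; the model name and 2048 cap come from the `ollama show` output above, everything else is illustrative):

```python
import json

def embed_payload(text: str, num_ctx: int = 2048) -> str:
    """Build an /api/embed request body that pins num_ctx to the
    model's trained context (2048) and asks the server to truncate
    any input that still exceeds it."""
    return json.dumps({
        "model": "nomic-embed-text",
        "input": text,
        "truncate": True,                 # clip oversized inputs server-side
        "options": {"num_ctx": num_ctx},  # override the Modelfile default of 8192
    })

# POST this body to http://IP:11434/api/embed (e.g. with urllib or curl).
```

Alternatively the cap can be baked in once with a Modelfile variant (`FROM nomic-embed-text` plus `PARAMETER num_ctx 2048`, then `ollama create`), or set server-wide via the `OLLAMA_CONTEXT_LENGTH` environment variable, which is visible in the server-config log below. Whether any of these avoids the SIGTRAP on this build is untested.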
<!-- gh-comment-id:3522461205 --> @smileBeda commented on GitHub (Nov 12, 2025):

@rick-github Yes, this is not Linux, as the title says. Logs included up to the first failure.

> Since the training size of the model (as shown in the output from ollama show and also indicated in the too large message) is 2048 tokens,

The reality looks different:

- it fails above 512 size
- model info says that it uses `num_ctx 8192`, and that is what I see it trying to pass, _unless_ I craft my batches so that they are 512 or smaller.

I am fully aware that the model does NOT support 8192 itself. However, it also does not seem to support the stated 2048.

```OLLAMA_HOST=0.0.0.0 OLLAMA_DEBUG=1 ollama serve
time=2025-11-12T12:14:53.224-03:00 level=INFO source=routes.go:1525 msg="server config" env="map[HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:DEBUG OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/Users/bedas/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false http_proxy: https_proxy: no_proxy:]"
time=2025-11-12T12:14:53.225-03:00 level=INFO source=images.go:522 msg="total blobs: 9"
time=2025-11-12T12:14:53.225-03:00 level=INFO source=images.go:529 msg="total unused blobs removed: 0"
time=2025-11-12T12:14:53.225-03:00 level=INFO source=routes.go:1578 msg="Listening on [::]:11434 (version 0.12.10)"
time=2025-11-12T12:14:53.225-03:00 level=DEBUG 
source=sched.go:120 msg="starting llm scheduler" time=2025-11-12T12:14:53.225-03:00 level=INFO source=runner.go:67 msg="discovering available GPUs..." time=2025-11-12T12:14:53.226-03:00 level=INFO source=server.go:400 msg="starting runner" cmd="/opt/homebrew/Cellar/ollama/0.12.10/bin/ollama runner --ollama-engine --port 56899" time=2025-11-12T12:14:53.226-03:00 level=DEBUG source=server.go:401 msg=subprocess PATH="/opt/homebrew/bin:/opt/homebrew/bin:/opt/homebrew/sbin:/Users/bedas/.atuin/bin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin://Applications/Topaz Gigapixel AI.app/Contents/Resources/bin:/Library/Apple/usr/bin:/usr/local/MacGPG2/bin:/Users/bedas/.composer/vendor/bin" OLLAMA_HOST=0.0.0.0 OLLAMA_DEBUG=1 DYLD_LIBRARY_PATH=/opt/homebrew/Cellar/ollama/0.12.10/bin OLLAMA_LIBRARY_PATH=/opt/homebrew/Cellar/ollama/0.12.10/bin time=2025-11-12T12:14:53.277-03:00 level=DEBUG source=runner.go:415 msg="bootstrap discovery took" duration=51.694ms OLLAMA_LIBRARY_PATH=[/opt/homebrew/Cellar/ollama/0.12.10/bin] extra_envs=map[] time=2025-11-12T12:14:53.277-03:00 level=DEBUG source=runner.go:113 msg="evluating which if any devices to filter out" initial_count=1 time=2025-11-12T12:14:53.277-03:00 level=DEBUG source=runner.go:172 msg="adjusting filtering IDs" FilterID=0 new_ID=0 time=2025-11-12T12:14:53.277-03:00 level=DEBUG source=runner.go:40 msg="GPU bootstrap discovery took" duration=51.869833ms time=2025-11-12T12:14:53.277-03:00 level=INFO source=types.go:42 msg="inference compute" id=0 filter_id=0 library=Metal compute=0.0 name=Metal description="Apple M2 Pro" libdirs="" driver=0.0 pci_id="" type=discrete total="11.8 GiB" available="11.8 GiB" time=2025-11-12T12:14:53.277-03:00 level=INFO source=routes.go:1619 msg="entering 
low vram mode" "total vram"="11.8 GiB" threshold="20.0 GiB" [GIN] 2025/11/12 - 12:15:02 | 200 | 6.742583ms | 192.168.1.3 | GET "/api/tags" time=2025-11-12T12:15:03.072-03:00 level=DEBUG source=runner.go:40 msg="overall device VRAM discovery took" duration=2.583µs time=2025-11-12T12:15:03.072-03:00 level=DEBUG source=sched.go:194 msg="updating default concurrency" OLLAMA_MAX_LOADED_MODELS=3 gpu_count=1 time=2025-11-12T12:15:03.076-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32 time=2025-11-12T12:15:03.076-03:00 level=DEBUG source=sched.go:211 msg="loading first model" model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 ggml_metal_library_init: using embedded metal library ggml_metal_library_init: loaded in 0.019 sec ggml_metal_device_init: GPU name: Apple M2 Pro ggml_metal_device_init: GPU family: MTLGPUFamilyApple8 (1008) ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003) ggml_metal_device_init: GPU family: MTLGPUFamilyMetal4 (5002) ggml_metal_device_init: simdgroup reduction = true ggml_metal_device_init: simdgroup matrix mul. = true ggml_metal_device_init: has unified memory = true ggml_metal_device_init: has bfloat = true ggml_metal_device_init: use residency sets = true ggml_metal_device_init: use shared buffers = true ggml_metal_device_init: recommendedMaxWorkingSetSize = 12713.12 MB llama_model_load_from_file_impl: using device Metal (Apple M2 Pro) (unknown id) - 12123 MiB free llama_model_loader: loaded meta data with 24 key-value pairs and 112 tensors from /Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 (version GGUF V3 (latest)) llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output. 
llama_model_loader: - kv 0: general.architecture str = nomic-bert llama_model_loader: - kv 1: general.name str = nomic-embed-text-v1.5 llama_model_loader: - kv 2: nomic-bert.block_count u32 = 12 llama_model_loader: - kv 3: nomic-bert.context_length u32 = 2048 llama_model_loader: - kv 4: nomic-bert.embedding_length u32 = 768 llama_model_loader: - kv 5: nomic-bert.feed_forward_length u32 = 3072 llama_model_loader: - kv 6: nomic-bert.attention.head_count u32 = 12 llama_model_loader: - kv 7: nomic-bert.attention.layer_norm_epsilon f32 = 0.000000 llama_model_loader: - kv 8: general.file_type u32 = 1 llama_model_loader: - kv 9: nomic-bert.attention.causal bool = false llama_model_loader: - kv 10: nomic-bert.pooling_type u32 = 1 llama_model_loader: - kv 11: nomic-bert.rope.freq_base f32 = 1000.000000 llama_model_loader: - kv 12: tokenizer.ggml.token_type_count u32 = 2 llama_model_loader: - kv 13: tokenizer.ggml.bos_token_id u32 = 101 llama_model_loader: - kv 14: tokenizer.ggml.eos_token_id u32 = 102 llama_model_loader: - kv 15: tokenizer.ggml.model str = bert llama_model_loader: - kv 16: tokenizer.ggml.tokens arr[str,30522] = ["[PAD]", "[unused0]", "[unused1]", "... llama_model_loader: - kv 17: tokenizer.ggml.scores arr[f32,30522] = [-1000.000000, -1000.000000, -1000.00... llama_model_loader: - kv 18: tokenizer.ggml.token_type arr[i32,30522] = [3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... 
llama_model_loader: - kv 19: tokenizer.ggml.unknown_token_id u32 = 100 llama_model_loader: - kv 20: tokenizer.ggml.seperator_token_id u32 = 102 llama_model_loader: - kv 21: tokenizer.ggml.padding_token_id u32 = 0 llama_model_loader: - kv 22: tokenizer.ggml.cls_token_id u32 = 101 llama_model_loader: - kv 23: tokenizer.ggml.mask_token_id u32 = 103 llama_model_loader: - type f32: 51 tensors llama_model_loader: - type f16: 61 tensors print_info: file format = GGUF V3 (latest) print_info: file type = F16 print_info: file size = 260.86 MiB (16.00 BPW) init_tokenizer: initializing tokenizer for type 3 load: control token: 100 '[UNK]' is not marked as EOG load: control token: 101 '[CLS]' is not marked as EOG load: control token: 0 '[PAD]' is not marked as EOG load: control token: 102 '[SEP]' is not marked as EOG load: control token: 103 '[MASK]' is not marked as EOG load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect load: printing all EOG tokens: load: - 102 ('[SEP]') load: special tokens cache size = 5 load: token to piece cache size = 0.2032 MB print_info: arch = nomic-bert print_info: vocab_only = 1 print_info: model type = ?B print_info: model params = 136.73 M print_info: general.name = nomic-embed-text-v1.5 print_info: vocab type = WPM print_info: n_vocab = 30522 print_info: n_merges = 0 print_info: BOS token = 101 '[CLS]' print_info: EOS token = 102 '[SEP]' print_info: UNK token = 100 '[UNK]' print_info: SEP token = 102 '[SEP]' print_info: PAD token = 0 '[PAD]' print_info: MASK token = 103 '[MASK]' print_info: LF token = 0 '[PAD]' print_info: EOG token = 102 '[SEP]' print_info: max token length = 21 llama_model_load: vocab only - skipping tensors time=2025-11-12T12:15:03.144-03:00 level=WARN source=server.go:173 msg="requested context size too large for model" num_ctx=8192 n_ctx_train=2048 time=2025-11-12T12:15:03.144-03:00 level=INFO source=server.go:400 msg="starting runner" cmd="/opt/homebrew/Cellar/ollama/0.12.10/bin/ollama 
runner --model /Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 --port 56911" time=2025-11-12T12:15:03.144-03:00 level=DEBUG source=server.go:401 msg=subprocess PATH="/opt/homebrew/bin:/opt/homebrew/bin:/opt/homebrew/sbin:/Users/bedas/.atuin/bin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin://Applications/Topaz Gigapixel AI.app/Contents/Resources/bin:/Library/Apple/usr/bin:/usr/local/MacGPG2/bin:/Users/bedas/.composer/vendor/bin" OLLAMA_HOST=0.0.0.0 OLLAMA_DEBUG=1 DYLD_LIBRARY_PATH=/opt/homebrew/Cellar/ollama/0.12.10/bin OLLAMA_LIBRARY_PATH=/opt/homebrew/Cellar/ollama/0.12.10/bin time=2025-11-12T12:15:03.147-03:00 level=INFO source=server.go:470 msg="system memory" total="16.0 GiB" free="4.6 GiB" free_swap="0 B" time=2025-11-12T12:15:03.147-03:00 level=DEBUG source=memory.go:198 msg=evaluating library=Metal gpu_count=1 available="[11.8 GiB]" time=2025-11-12T12:15:03.147-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.vision.block_count default=0 time=2025-11-12T12:15:03.147-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default=0 time=2025-11-12T12:15:03.147-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}" time=2025-11-12T12:15:03.147-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}" time=2025-11-12T12:15:03.147-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default=0 time=2025-11-12T12:15:03.147-03:00 level=DEBUG source=ggml.go:276 msg="key 
with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}" time=2025-11-12T12:15:03.147-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}" time=2025-11-12T12:15:03.147-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.key_length default=64 time=2025-11-12T12:15:03.147-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.value_length default=64 time=2025-11-12T12:15:03.147-03:00 level=DEBUG source=ggml.go:611 msg="default cache size estimate" "attention MiB"=6 "attention bytes"=6291456 "recurrent MiB"=0 "recurrent bytes"=0 time=2025-11-12T12:15:03.147-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default=0 time=2025-11-12T12:15:03.147-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}" time=2025-11-12T12:15:03.147-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}" time=2025-11-12T12:15:03.147-03:00 level=INFO source=memory.go:37 msg="new model will fit in available VRAM across minimum required GPUs, loading" model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 library=Metal parallel=1 required="809.4 MiB" gpus=1 time=2025-11-12T12:15:03.147-03:00 level=DEBUG source=memory.go:198 msg=evaluating library=Metal gpu_count=1 available="[11.8 GiB]" time=2025-11-12T12:15:03.147-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.vision.block_count default=0 time=2025-11-12T12:15:03.147-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default=0 time=2025-11-12T12:15:03.147-03:00 level=DEBUG source=ggml.go:276 
msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}" time=2025-11-12T12:15:03.147-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}" time=2025-11-12T12:15:03.147-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default=0 time=2025-11-12T12:15:03.147-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}" time=2025-11-12T12:15:03.147-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}" time=2025-11-12T12:15:03.147-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.key_length default=64 time=2025-11-12T12:15:03.147-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.value_length default=64 time=2025-11-12T12:15:03.147-03:00 level=DEBUG source=ggml.go:611 msg="default cache size estimate" "attention MiB"=6 "attention bytes"=6291456 "recurrent MiB"=0 "recurrent bytes"=0 time=2025-11-12T12:15:03.147-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default=0 time=2025-11-12T12:15:03.147-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}" time=2025-11-12T12:15:03.147-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}" time=2025-11-12T12:15:03.147-03:00 level=INFO source=server.go:522 msg=offload library=Metal layers.requested=-1 layers.model=13 layers.offload=13 layers.split=[13] memory.available="[11.8 GiB]" memory.gpu_overhead="0 B" memory.required.full="809.4 MiB" memory.required.partial="809.4 MiB" memory.required.kv="6.0 
MiB" memory.required.allocations="[809.4 MiB]" memory.weights.total="260.9 MiB" memory.weights.repeating="216.1 MiB" memory.weights.nonrepeating="44.7 MiB" memory.graph.full="12.0 MiB" memory.graph.partial="12.0 MiB" time=2025-11-12T12:15:03.162-03:00 level=INFO source=runner.go:910 msg="starting go runner" time=2025-11-12T12:15:03.162-03:00 level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/opt/homebrew/Cellar/ollama/0.12.10/bin ggml_metal_library_init: using embedded metal library ggml_metal_library_init: loaded in 0.018 sec ggml_metal_device_init: GPU name: Apple M2 Pro ggml_metal_device_init: GPU family: MTLGPUFamilyApple8 (1008) ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003) ggml_metal_device_init: GPU family: MTLGPUFamilyMetal4 (5002) ggml_metal_device_init: simdgroup reduction = true ggml_metal_device_init: simdgroup matrix mul. = true ggml_metal_device_init: has unified memory = true ggml_metal_device_init: has bfloat = true ggml_metal_device_init: use residency sets = true ggml_metal_device_init: use shared buffers = true ggml_metal_device_init: recommendedMaxWorkingSetSize = 12713.12 MB time=2025-11-12T12:15:03.162-03:00 level=INFO source=ggml.go:104 msg=system Metal.0.EMBED_LIBRARY=1 CPU.0.NEON=1 CPU.0.ARM_FMA=1 CPU.0.FP16_VA=1 CPU.0.DOTPROD=1 CPU.0.LLAMAFILE=1 CPU.0.ACCELERATE=1 compiler=cgo(clang) time=2025-11-12T12:15:03.200-03:00 level=INFO source=runner.go:946 msg="Server listening on 127.0.0.1:56911" time=2025-11-12T12:15:03.202-03:00 level=INFO source=runner.go:845 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:false KvSize:2048 KvCacheType: NumThreads:6 GPULayers:13[ID:0 Layers:13(0..12)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:true}" llama_model_load_from_file_impl: using device Metal (Apple M2 Pro) (unknown id) - 12123 MiB free time=2025-11-12T12:15:03.203-03:00 level=INFO source=server.go:1251 msg="waiting for llama runner to start responding" 
time=2025-11-12T12:15:03.203-03:00 level=INFO source=server.go:1285 msg="waiting for server to become available" status="llm server loading model"
llama_model_loader: loaded meta data with 24 key-value pairs and 112 tensors from /Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = nomic-bert
llama_model_loader: - kv 1: general.name str = nomic-embed-text-v1.5
llama_model_loader: - kv 2: nomic-bert.block_count u32 = 12
llama_model_loader: - kv 3: nomic-bert.context_length u32 = 2048
llama_model_loader: - kv 4: nomic-bert.embedding_length u32 = 768
llama_model_loader: - kv 5: nomic-bert.feed_forward_length u32 = 3072
llama_model_loader: - kv 6: nomic-bert.attention.head_count u32 = 12
llama_model_loader: - kv 7: nomic-bert.attention.layer_norm_epsilon f32 = 0.000000
llama_model_loader: - kv 8: general.file_type u32 = 1
llama_model_loader: - kv 9: nomic-bert.attention.causal bool = false
llama_model_loader: - kv 10: nomic-bert.pooling_type u32 = 1
llama_model_loader: - kv 11: nomic-bert.rope.freq_base f32 = 1000.000000
llama_model_loader: - kv 12: tokenizer.ggml.token_type_count u32 = 2
llama_model_loader: - kv 13: tokenizer.ggml.bos_token_id u32 = 101
llama_model_loader: - kv 14: tokenizer.ggml.eos_token_id u32 = 102
llama_model_loader: - kv 15: tokenizer.ggml.model str = bert
llama_model_loader: - kv 16: tokenizer.ggml.tokens arr[str,30522] = ["[PAD]", "[unused0]", "[unused1]", "...
llama_model_loader: - kv 17: tokenizer.ggml.scores arr[f32,30522] = [-1000.000000, -1000.000000, -1000.00...
llama_model_loader: - kv 18: tokenizer.ggml.token_type arr[i32,30522] = [3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 19: tokenizer.ggml.unknown_token_id u32 = 100
llama_model_loader: - kv 20: tokenizer.ggml.seperator_token_id u32 = 102
llama_model_loader: - kv 21: tokenizer.ggml.padding_token_id u32 = 0
llama_model_loader: - kv 22: tokenizer.ggml.cls_token_id u32 = 101
llama_model_loader: - kv 23: tokenizer.ggml.mask_token_id u32 = 103
llama_model_loader: - type f32: 51 tensors
llama_model_loader: - type f16: 61 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type = F16
print_info: file size = 260.86 MiB (16.00 BPW)
init_tokenizer: initializing tokenizer for type 3
load: control token: 100 '[UNK]' is not marked as EOG
load: control token: 101 '[CLS]' is not marked as EOG
load: control token: 0 '[PAD]' is not marked as EOG
load: control token: 102 '[SEP]' is not marked as EOG
load: control token: 103 '[MASK]' is not marked as EOG
load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
load: printing all EOG tokens:
load: - 102 ('[SEP]')
load: special tokens cache size = 5
load: token to piece cache size = 0.2032 MB
print_info: arch = nomic-bert
print_info: vocab_only = 0
print_info: n_ctx_train = 2048
print_info: n_embd = 768
print_info: n_layer = 12
print_info: n_head = 12
print_info: n_head_kv = 12
print_info: n_rot = 64
print_info: n_swa = 0
print_info: is_swa_any = 0
print_info: n_embd_head_k = 64
print_info: n_embd_head_v = 64
print_info: n_gqa = 1
print_info: n_embd_k_gqa = 768
print_info: n_embd_v_gqa = 768
print_info: f_norm_eps = 1.0e-12
print_info: f_norm_rms_eps = 0.0e+00
print_info: f_clamp_kqv = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale = 0.0e+00
print_info: f_attn_scale = 0.0e+00
print_info: n_ff = 3072
print_info: n_expert = 0
print_info: n_expert_used = 0
print_info: causal attn = 0
print_info: pooling type = 1
print_info: rope type = 2
print_info: rope scaling = linear
print_info: freq_base_train = 1000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn = 2048
print_info: rope_finetuned = unknown
print_info: model type = 137M
print_info: model params = 136.73 M
print_info: general.name = nomic-embed-text-v1.5
print_info: vocab type = WPM
print_info: n_vocab = 30522
print_info: n_merges = 0
print_info: BOS token = 101 '[CLS]'
print_info: EOS token = 102 '[SEP]'
print_info: UNK token = 100 '[UNK]'
print_info: SEP token = 102 '[SEP]'
print_info: PAD token = 0 '[PAD]'
print_info: MASK token = 103 '[MASK]'
print_info: LF token = 0 '[PAD]'
print_info: EOG token = 102 '[SEP]'
print_info: max token length = 21
load_tensors: loading model tensors, this can take a while... (mmap = true)
load_tensors: layer 0 assigned to device Metal, is_swa = 0
load_tensors: layer 1 assigned to device Metal, is_swa = 0
load_tensors: layer 2 assigned to device Metal, is_swa = 0
load_tensors: layer 3 assigned to device Metal, is_swa = 0
load_tensors: layer 4 assigned to device Metal, is_swa = 0
load_tensors: layer 5 assigned to device Metal, is_swa = 0
load_tensors: layer 6 assigned to device Metal, is_swa = 0
load_tensors: layer 7 assigned to device Metal, is_swa = 0
load_tensors: layer 8 assigned to device Metal, is_swa = 0
load_tensors: layer 9 assigned to device Metal, is_swa = 0
load_tensors: layer 10 assigned to device Metal, is_swa = 0
load_tensors: layer 11 assigned to device Metal, is_swa = 0
load_tensors: layer 12 assigned to device Metal, is_swa = 0
create_tensor: loading tensor token_embd.weight
create_tensor: loading tensor token_types.weight
create_tensor: loading tensor token_embd_norm.weight
create_tensor: loading tensor token_embd_norm.bias
create_tensor: loading tensor blk.0.attn_qkv.weight
create_tensor: loading tensor blk.0.attn_output.weight
create_tensor: loading tensor blk.0.attn_output_norm.weight
create_tensor: loading tensor blk.0.attn_output_norm.bias
create_tensor: loading tensor blk.0.ffn_up.weight
create_tensor: loading tensor blk.0.ffn_down.weight
create_tensor: loading tensor blk.0.ffn_gate.weight
create_tensor: loading tensor blk.0.layer_output_norm.weight
create_tensor: loading tensor blk.0.layer_output_norm.bias
create_tensor: loading tensor blk.1.attn_qkv.weight
create_tensor: loading tensor blk.1.attn_output.weight
create_tensor: loading tensor blk.1.attn_output_norm.weight
create_tensor: loading tensor blk.1.attn_output_norm.bias
create_tensor: loading tensor blk.1.ffn_up.weight
create_tensor: loading tensor blk.1.ffn_down.weight
create_tensor: loading tensor blk.1.ffn_gate.weight
create_tensor: loading tensor blk.1.layer_output_norm.weight
create_tensor: loading tensor blk.1.layer_output_norm.bias
create_tensor: loading tensor blk.2.attn_qkv.weight
create_tensor: loading tensor blk.2.attn_output.weight
create_tensor: loading tensor blk.2.attn_output_norm.weight
create_tensor: loading tensor blk.2.attn_output_norm.bias
create_tensor: loading tensor blk.2.ffn_up.weight
create_tensor: loading tensor blk.2.ffn_down.weight
create_tensor: loading tensor blk.2.ffn_gate.weight
create_tensor: loading tensor blk.2.layer_output_norm.weight
create_tensor: loading tensor blk.2.layer_output_norm.bias
create_tensor: loading tensor blk.3.attn_qkv.weight
create_tensor: loading tensor blk.3.attn_output.weight
create_tensor: loading tensor blk.3.attn_output_norm.weight
create_tensor: loading tensor blk.3.attn_output_norm.bias
create_tensor: loading tensor blk.3.ffn_up.weight
create_tensor: loading tensor blk.3.ffn_down.weight
create_tensor: loading tensor blk.3.ffn_gate.weight
create_tensor: loading tensor blk.3.layer_output_norm.weight
create_tensor: loading tensor blk.3.layer_output_norm.bias
create_tensor: loading tensor blk.4.attn_qkv.weight
create_tensor: loading tensor blk.4.attn_output.weight
create_tensor: loading tensor blk.4.attn_output_norm.weight
create_tensor: loading tensor blk.4.attn_output_norm.bias
create_tensor: loading tensor blk.4.ffn_up.weight
create_tensor: loading tensor blk.4.ffn_down.weight
create_tensor: loading tensor blk.4.ffn_gate.weight
create_tensor: loading tensor blk.4.layer_output_norm.weight
create_tensor: loading tensor blk.4.layer_output_norm.bias
create_tensor: loading tensor blk.5.attn_qkv.weight
create_tensor: loading tensor blk.5.attn_output.weight
create_tensor: loading tensor blk.5.attn_output_norm.weight
create_tensor: loading tensor blk.5.attn_output_norm.bias
create_tensor: loading tensor blk.5.ffn_up.weight
create_tensor: loading tensor blk.5.ffn_down.weight
create_tensor: loading tensor blk.5.ffn_gate.weight
create_tensor: loading tensor blk.5.layer_output_norm.weight
create_tensor: loading tensor blk.5.layer_output_norm.bias
create_tensor: loading tensor blk.6.attn_qkv.weight
create_tensor: loading tensor blk.6.attn_output.weight
create_tensor: loading tensor blk.6.attn_output_norm.weight
create_tensor: loading tensor blk.6.attn_output_norm.bias
create_tensor: loading tensor blk.6.ffn_up.weight
create_tensor: loading tensor blk.6.ffn_down.weight
create_tensor: loading tensor blk.6.ffn_gate.weight
create_tensor: loading tensor blk.6.layer_output_norm.weight
create_tensor: loading tensor blk.6.layer_output_norm.bias
create_tensor: loading tensor blk.7.attn_qkv.weight
create_tensor: loading tensor blk.7.attn_output.weight
create_tensor: loading tensor blk.7.attn_output_norm.weight
create_tensor: loading tensor blk.7.attn_output_norm.bias
create_tensor: loading tensor blk.7.ffn_up.weight
create_tensor: loading tensor blk.7.ffn_down.weight
create_tensor: loading tensor blk.7.ffn_gate.weight
create_tensor: loading tensor blk.7.layer_output_norm.weight
create_tensor: loading tensor blk.7.layer_output_norm.bias
create_tensor: loading tensor blk.8.attn_qkv.weight
create_tensor: loading tensor blk.8.attn_output.weight
create_tensor: loading tensor blk.8.attn_output_norm.weight
create_tensor: loading tensor blk.8.attn_output_norm.bias
create_tensor: loading tensor blk.8.ffn_up.weight
create_tensor: loading tensor blk.8.ffn_down.weight
create_tensor: loading tensor blk.8.ffn_gate.weight
create_tensor: loading tensor blk.8.layer_output_norm.weight
create_tensor: loading tensor blk.8.layer_output_norm.bias
create_tensor: loading tensor blk.9.attn_qkv.weight
create_tensor: loading tensor blk.9.attn_output.weight
create_tensor: loading tensor blk.9.attn_output_norm.weight
create_tensor: loading tensor blk.9.attn_output_norm.bias
create_tensor: loading tensor blk.9.ffn_up.weight
create_tensor: loading tensor blk.9.ffn_down.weight
create_tensor: loading tensor blk.9.ffn_gate.weight
create_tensor: loading tensor blk.9.layer_output_norm.weight
create_tensor: loading tensor blk.9.layer_output_norm.bias
create_tensor: loading tensor blk.10.attn_qkv.weight
create_tensor: loading tensor blk.10.attn_output.weight
create_tensor: loading tensor blk.10.attn_output_norm.weight
create_tensor: loading tensor blk.10.attn_output_norm.bias
create_tensor: loading tensor blk.10.ffn_up.weight
create_tensor: loading tensor blk.10.ffn_down.weight
create_tensor: loading tensor blk.10.ffn_gate.weight
create_tensor: loading tensor blk.10.layer_output_norm.weight
create_tensor: loading tensor blk.10.layer_output_norm.bias
create_tensor: loading tensor blk.11.attn_qkv.weight
create_tensor: loading tensor blk.11.attn_output.weight
create_tensor: loading tensor blk.11.attn_output_norm.weight
create_tensor: loading tensor blk.11.attn_output_norm.bias
create_tensor: loading tensor blk.11.ffn_up.weight
create_tensor: loading tensor blk.11.ffn_down.weight
create_tensor: loading tensor blk.11.ffn_gate.weight
create_tensor: loading tensor blk.11.layer_output_norm.weight
create_tensor: loading tensor blk.11.layer_output_norm.bias
load_tensors: offloading 12 repeating layers to GPU
load_tensors: offloading output layer to GPU
load_tensors: offloaded 13/13 layers to GPU
load_tensors: CPU_Mapped model buffer size = 44.72 MiB
load_tensors: Metal_Mapped model buffer size = 216.14 MiB
llama_init_from_model: model default pooling_type is [1], but [-1] was specified
llama_context: constructing llama_context
llama_context: n_seq_max = 1
llama_context: n_ctx = 2048
llama_context: n_ctx_per_seq = 2048
llama_context: n_batch = 512
llama_context: n_ubatch = 512
llama_context: causal_attn = 0
llama_context: flash_attn = disabled
llama_context: kv_unified = false
llama_context: freq_base = 1000.0
llama_context: freq_scale = 1
ggml_metal_init: allocating
ggml_metal_init: picking default device: Apple M2 Pro
ggml_metal_init: use bfloat = true
ggml_metal_init: use fusion = true
ggml_metal_init: use concurrency = true
ggml_metal_init: use graph optimize = true
set_abort_callback: call
llama_context: CPU output buffer size = 0.12 MiB
llama_context: enumerating backends
llama_context: backend_ptrs.size() = 3
llama_context: max_nodes = 1024
llama_context: worst-case: n_tokens = 512, n_seqs = 1, n_outputs = 1
graph_reserve: reserving a graph for ubatch with n_tokens = 512, n_seqs = 1, n_outputs = 512
graph_reserve: reserving a graph for ubatch with n_tokens = 1, n_seqs = 1, n_outputs = 1
graph_reserve: reserving a graph for ubatch with n_tokens = 512, n_seqs = 1, n_outputs = 512
llama_context: Metal compute buffer size = 23.51 MiB
llama_context: CPU compute buffer size = 4.01 MiB
llama_context: graph nodes = 371
llama_context: graph splits = 2
time=2025-11-12T12:15:03.454-03:00 level=INFO source=server.go:1289 msg="llama runner started in 0.31 seconds"
time=2025-11-12T12:15:03.454-03:00 level=INFO source=sched.go:500 msg="loaded runners" count=1
time=2025-11-12T12:15:03.454-03:00 level=INFO source=server.go:1251 msg="waiting for llama runner to start responding"
time=2025-11-12T12:15:03.454-03:00 level=INFO source=server.go:1289 msg="llama runner started in 0.31 seconds"
time=2025-11-12T12:15:03.454-03:00 level=DEBUG source=sched.go:512 msg="finished setting up" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=99032 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192
time=2025-11-12T12:15:03.457-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32
time=2025-11-12T12:15:03.462-03:00 level=DEBUG source=cache.go:104 msg="loading cache slot" id=0 cache=0 prompt=495 used=0 remaining=495
decode: cannot decode batches with this context (calling encode() instead)
init: embeddings required but some input tokens were not marked as outputs -> overriding
output_reserve: reallocating output buffer from size 0.12 MiB to 59.08 MiB
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_mul_mm_f16_f32', name = 'kernel_mul_mm_f16_f32_bci=0_bco=1'
ggml_metal_library_compile_pipeline: loaded kernel_mul_mm_f16_f32_bci=0_bco=1 0x104bfc720 | th_max = 896 | th_width = 32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_rope_neox_f32', name = 'kernel_rope_neox_f32'
ggml_metal_library_compile_pipeline: loaded kernel_rope_neox_f32 0x104bf8740 | th_max = 1024 | th_width = 32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_cpy_f32_f32', name = 'kernel_cpy_f32_f32'
ggml_metal_library_compile_pipeline: loaded kernel_cpy_f32_f32 0x104bfd720 | th_max = 1024 | th_width = 32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_mul_mm_f32_f32', name = 'kernel_mul_mm_f32_f32_bci=0_bco=1'
ggml_metal_library_compile_pipeline: loaded kernel_mul_mm_f32_f32_bci=0_bco=1 0x104bfe8a0 | th_max = 832 | th_width = 32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_soft_max_f32', name = 'kernel_soft_max_f32'
ggml_metal_library_compile_pipeline: loaded kernel_soft_max_f32 0x104bff0a0 | th_max = 1024 | th_width = 32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_mul_mm_f32_f32', name = 'kernel_mul_mm_f32_f32_bci=1_bco=1'
ggml_metal_library_compile_pipeline: loaded kernel_mul_mm_f32_f32_bci=1_bco=1 0x104bff6a0 | th_max = 832 | th_width = 32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_add_fuse_1', name = 'kernel_add_fuse_1'
ggml_metal_library_compile_pipeline: loaded kernel_add_fuse_1 0x104bff9a0 | th_max = 1024 | th_width = 32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_norm_mul_add_f32_4', name = 'kernel_norm_mul_add_f32_4'
ggml_metal_library_compile_pipeline: loaded kernel_norm_mul_add_f32_4 0xb54448000 | th_max = 1024 | th_width = 32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_swiglu_f32', name = 'kernel_swiglu_f32'
ggml_metal_library_compile_pipeline: loaded kernel_swiglu_f32 0xb54448300 | th_max = 1024 | th_width = 32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_get_rows_f32', name = 'kernel_get_rows_f32'
ggml_metal_library_compile_pipeline: loaded kernel_get_rows_f32 0xb54448600 | th_max = 1024 | th_width = 32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_mul_mv_f32_f32', name = 'kernel_mul_mv_f32_f32_nsg=4'
ggml_metal_library_compile_pipeline: loaded kernel_mul_mv_f32_f32_nsg=4 0xb54448900 | th_max = 1024 | th_width = 32
[GIN] 2025/11/12 - 12:15:03 | 200 | 479.809709ms | 192.168.1.3 | POST "/api/embed"
time=2025-11-12T12:15:03.535-03:00 level=DEBUG source=sched.go:520 msg="context for request finished"
time=2025-11-12T12:15:03.536-03:00 level=DEBUG source=sched.go:290 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=99032 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192 duration=5m0s
time=2025-11-12T12:15:03.536-03:00 level=DEBUG source=sched.go:308 msg="after processing request finished event" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=99032 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192 refCount=0
time=2025-11-12T12:15:03.547-03:00 level=DEBUG source=sched.go:609 msg="evaluating already loaded" model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6
time=2025-11-12T12:15:03.549-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32
time=2025-11-12T12:15:03.550-03:00 level=DEBUG source=cache.go:104 msg="loading cache slot" id=0 cache=495 prompt=491 used=0 remaining=491
decode: cannot decode batches with this context (calling encode() instead)
init: embeddings required but some input tokens were not marked as outputs -> overriding
[GIN] 2025/11/12 - 12:15:03 | 200 | 58.642833ms | 192.168.1.3 | POST "/api/embed"
time=2025-11-12T12:15:03.597-03:00 level=DEBUG source=sched.go:385 msg="context for request finished" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=99032 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192
time=2025-11-12T12:15:03.597-03:00 level=DEBUG source=sched.go:290 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=99032 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192 duration=5m0s
time=2025-11-12T12:15:03.597-03:00 level=DEBUG source=sched.go:308 msg="after processing request finished event" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=99032 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192 refCount=0
time=2025-11-12T12:15:03.610-03:00 level=DEBUG source=sched.go:609 msg="evaluating already loaded" model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6
time=2025-11-12T12:15:03.613-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32
time=2025-11-12T12:15:03.614-03:00 level=DEBUG source=cache.go:104 msg="loading cache slot" id=0 cache=491 prompt=452 used=0 remaining=452
decode: cannot decode batches with this context (calling encode() instead)
init: embeddings required but some input tokens were not marked as outputs -> overriding
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_soft_max_f32_4', name = 'kernel_soft_max_f32_4'
ggml_metal_library_compile_pipeline: loaded kernel_soft_max_f32_4 0xb54448c00 | th_max = 1024 | th_width = 32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_mul_mv_f32_f32_4', name = 'kernel_mul_mv_f32_f32_4_nsg=4'
ggml_metal_library_compile_pipeline: loaded kernel_mul_mv_f32_f32_4_nsg=4 0xb54448f00 | th_max = 768 | th_width = 32
[GIN] 2025/11/12 - 12:15:03 | 200 | 54.334542ms | 192.168.1.3 | POST "/api/embed"
time=2025-11-12T12:15:03.656-03:00 level=DEBUG source=sched.go:385 msg="context for request finished" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=99032 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192
time=2025-11-12T12:15:03.656-03:00 level=DEBUG source=sched.go:290 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=99032 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192 duration=5m0s
time=2025-11-12T12:15:03.656-03:00 level=DEBUG source=sched.go:308 msg="after processing request finished event" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=99032 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192 refCount=0
time=2025-11-12T12:15:03.666-03:00 level=DEBUG source=sched.go:609 msg="evaluating already loaded" model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6
time=2025-11-12T12:15:03.669-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32
time=2025-11-12T12:15:03.670-03:00 level=DEBUG source=cache.go:104 msg="loading cache slot" id=0 cache=452 prompt=255 used=0 remaining=255
decode: cannot decode batches with this context (calling encode() instead)
init: embeddings required but some input tokens were not marked as outputs -> overriding
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_mul_mv_f32_f32', name = 'kernel_mul_mv_f32_f32_nsg=2'
ggml_metal_library_compile_pipeline: loaded kernel_mul_mv_f32_f32_nsg=2 0xb54449200 | th_max = 1024 | th_width = 32
[GIN] 2025/11/12 - 12:15:03 | 200 | 33.019959ms | 192.168.1.3 | POST "/api/embed"
time=2025-11-12T12:15:03.691-03:00 level=DEBUG source=sched.go:385 msg="context for request finished" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=99032 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192
time=2025-11-12T12:15:03.691-03:00 level=DEBUG source=sched.go:290 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=99032 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192 duration=5m0s
time=2025-11-12T12:15:03.691-03:00 level=DEBUG source=sched.go:308 msg="after processing request finished event" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=99032 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192 refCount=0
time=2025-11-12T12:15:03.707-03:00 level=DEBUG source=sched.go:609 msg="evaluating already loaded" model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6
time=2025-11-12T12:15:03.709-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32
time=2025-11-12T12:15:03.710-03:00 level=DEBUG source=cache.go:104 msg="loading cache slot" id=0 cache=255 prompt=1872 used=0 remaining=1872
decode: cannot decode batches with this context (calling encode() instead)
init: embeddings required but some input tokens were not marked as outputs -> overriding
output_reserve: reallocating output buffer from size 59.08 MiB to 61.11 MiB
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_mul_mm_f16_f32', name = 'kernel_mul_mm_f16_f32_bci=0_bco=0'
ggml_metal_library_compile_pipeline: loaded kernel_mul_mm_f16_f32_bci=0_bco=0 0xb54449500 | th_max = 1024 | th_width = 32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_mul_mm_f32_f32', name = 'kernel_mul_mm_f32_f32_bci=0_bco=0'
ggml_metal_library_compile_pipeline: loaded kernel_mul_mm_f32_f32_bci=0_bco=0 0xb54449800 | th_max = 896 | th_width = 32
decode: cannot decode batches with this context (calling encode() instead)
init: embeddings required but some input tokens were not marked as outputs -> overriding

SIGTRAP: trace trap
PC=0x181ddab1c m=4 sigcode=0
signal arrived during cgo execution

goroutine 23 gp=0x14000103500 m=4 mp=0x14000100008 [syscall]:
runtime.cgocall(0x103450fbc, 0x14000085b88)
        runtime/cgocall.go:167 +0x44 fp=0x14000085b50 sp=0x14000085b10 pc=0x102956684
github.com/ollama/ollama/llama._Cfunc_llama_decode(0x104bef650, {0x200, 0x104bf5fe0, 0x0, 0xb5489d000, 0xb5489c800, 0xb5558e800, 0x104bf03f0})
        _cgo_gotypes.go:674 +0x30 fp=0x14000085b80 sp=0x14000085b50 pc=0x102c9d790
github.com/ollama/ollama/llama.(*Context).Decode.func1(...)
        github.com/ollama/ollama/llama/llama.go:160
github.com/ollama/ollama/llama.(*Context).Decode(0x14000482c08?, 0x0?)
github.com/ollama/ollama/llama/llama.go:160 +0xcc fp=0x14000085c70 sp=0x14000085b80 pc=0x102c9f94c github.com/ollama/ollama/runner/llamarunner.(*Server).processBatch(0x140003f2320, 0x1400004ebe0, 0x14000085f18) github.com/ollama/ollama/runner/llamarunner/runner.go:468 +0x1d4 fp=0x14000085ed0 sp=0x14000085c70 pc=0x102d3f774 github.com/ollama/ollama/runner/llamarunner.(*Server).run(0x140003f2320, {0x103b5bcd0, 0x1400040e0a0}) github.com/ollama/ollama/runner/llamarunner/runner.go:361 +0x15c fp=0x14000085fa0 sp=0x14000085ed0 pc=0x102d3f43c github.com/ollama/ollama/runner/llamarunner.Execute.gowrap1() github.com/ollama/ollama/runner/llamarunner/runner.go:926 +0x2c fp=0x14000085fd0 sp=0x14000085fa0 pc=0x102d4324c runtime.goexit({}) runtime/asm_arm64.s:1268 +0x4 fp=0x14000085fd0 sp=0x14000085fd0 pc=0x102961d04 created by github.com/ollama/ollama/runner/llamarunner.Execute in goroutine 1 github.com/ollama/ollama/runner/llamarunner/runner.go:926 +0x418 goroutine 1 gp=0x140000021c0 m=nil [IO wait]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:460 +0xc0 fp=0x1400056d720 sp=0x1400056d700 pc=0x102959a50 runtime.netpollblock(0x1400050f7b8?, 0x29dba6c?, 0x1?) runtime/netpoll.go:575 +0x150 fp=0x1400056d760 sp=0x1400056d720 pc=0x10291f620 internal/poll.runtime_pollWait(0x14fcc1e00, 0x72) runtime/netpoll.go:351 +0xa0 fp=0x1400056d790 sp=0x1400056d760 pc=0x102958c80 internal/poll.(*pollDesc).wait(0x14000517b00?, 0x1028ff5d0?, 0x0) internal/poll/fd_poll_runtime.go:84 +0x28 fp=0x1400056d7c0 sp=0x1400056d790 pc=0x1029d7568 internal/poll.(*pollDesc).waitRead(...) 
internal/poll/fd_poll_runtime.go:89 internal/poll.(*FD).Accept(0x14000517b00) internal/poll/fd_unix.go:613 +0x21c fp=0x1400056d870 sp=0x1400056d7c0 pc=0x1029dbb4c net.(*netFD).accept(0x14000517b00) net/fd_unix.go:161 +0x28 fp=0x1400056d930 sp=0x1400056d870 pc=0x102a3d7f8 net.(*TCPListener).accept(0x140004ae080) net/tcpsock_posix.go:159 +0x24 fp=0x1400056d980 sp=0x1400056d930 pc=0x102a50e64 net.(*TCPListener).Accept(0x140004ae080) net/tcpsock.go:380 +0x2c fp=0x1400056d9c0 sp=0x1400056d980 pc=0x102a4ff0c net/http.(*onceCloseListener).Accept(0x14000014000?) <autogenerated>:1 +0x2c fp=0x1400056d9e0 sp=0x1400056d9c0 pc=0x102c24a1c net/http.(*Server).Serve(0x14000270800, {0x103b59668, 0x140004ae080}) net/http/server.go:3463 +0x24c fp=0x1400056db10 sp=0x1400056d9e0 pc=0x102bffbac github.com/ollama/ollama/runner/llamarunner.Execute({0x140001801a0, 0x4, 0x4}) github.com/ollama/ollama/runner/llamarunner/runner.go:947 +0x754 fp=0x1400056dce0 sp=0x1400056db10 pc=0x102d43064 github.com/ollama/ollama/runner.Execute({0x14000180190?, 0x0?, 0x0?}) github.com/ollama/ollama/runner/runner.go:22 +0x128 fp=0x1400056dd10 sp=0x1400056dce0 pc=0x102dba528 github.com/ollama/ollama/cmd.NewCLI.func2(0x140000a4e00?, {0x1036a7af0?, 0x4?, 0x1036a7af4?}) github.com/ollama/ollama/cmd/cmd.go:1841 +0x50 fp=0x1400056dd40 sp=0x1400056dd10 pc=0x103401db0 github.com/spf13/cobra.(*Command).execute(0x1400052bb08, {0x14000113a40, 0x4, 0x4}) github.com/spf13/cobra@v1.7.0/command.go:940 +0x5e0 fp=0x1400056de60 sp=0x1400056dd40 pc=0x102aa7f60 github.com/spf13/cobra.(*Command).ExecuteC(0x140000d8608) github.com/spf13/cobra@v1.7.0/command.go:1068 +0x2ec fp=0x1400056df20 sp=0x1400056de60 pc=0x102aa863c github.com/spf13/cobra.(*Command).Execute(...) github.com/spf13/cobra@v1.7.0/command.go:992 github.com/spf13/cobra.(*Command).ExecuteContext(...) 
github.com/spf13/cobra@v1.7.0/command.go:985 main.main() github.com/ollama/ollama/main.go:12 +0x54 fp=0x1400056df40 sp=0x1400056df20 pc=0x1034028d4 runtime.main() runtime/proc.go:285 +0x278 fp=0x1400056dfd0 sp=0x1400056df40 pc=0x1029260d8 runtime.goexit({}) runtime/asm_arm64.s:1268 +0x4 fp=0x1400056dfd0 sp=0x1400056dfd0 pc=0x102961d04 goroutine 2 gp=0x14000002c40 m=nil [force gc (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:460 +0xc0 fp=0x1400006cf90 sp=0x1400006cf70 pc=0x102959a50 runtime.goparkunlock(...) runtime/proc.go:466 runtime.forcegchelper() runtime/proc.go:373 +0xb4 fp=0x1400006cfd0 sp=0x1400006cf90 pc=0x102926424 runtime.goexit({}) runtime/asm_arm64.s:1268 +0x4 fp=0x1400006cfd0 sp=0x1400006cfd0 pc=0x102961d04 created by runtime.init.7 in goroutine 1 runtime/proc.go:361 +0x24 goroutine 18 gp=0x14000102380 m=nil [GC sweep wait]: runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:460 +0xc0 fp=0x14000068760 sp=0x14000068740 pc=0x102959a50 runtime.goparkunlock(...) runtime/proc.go:466 runtime.bgsweep(0x14000098000) runtime/mgcsweep.go:323 +0x104 fp=0x140000687b0 sp=0x14000068760 pc=0x102910ee4 runtime.gcenable.gowrap1() runtime/mgc.go:212 +0x28 fp=0x140000687d0 sp=0x140000687b0 pc=0x102904b38 runtime.goexit({}) runtime/asm_arm64.s:1268 +0x4 fp=0x140000687d0 sp=0x140000687d0 pc=0x102961d04 created by runtime.gcenable in goroutine 1 runtime/mgc.go:212 +0x6c goroutine 19 gp=0x14000102540 m=nil [GC scavenge wait]: runtime.gopark(0x10000?, 0x103866590?, 0x0?, 0x0?, 0x0?) runtime/proc.go:460 +0xc0 fp=0x14000068f60 sp=0x14000068f40 pc=0x102959a50 runtime.goparkunlock(...) 
	runtime/proc.go:466
runtime.(*scavengerState).park(0x10444d860)
	runtime/mgcscavenge.go:425 +0x5c fp=0x14000068f90 sp=0x14000068f60 pc=0x10290e9fc
runtime.bgscavenge(0x14000098000)
	runtime/mgcscavenge.go:658 +0xac fp=0x14000068fb0 sp=0x14000068f90 pc=0x10290ef9c
runtime.gcenable.gowrap2()
	runtime/mgc.go:213 +0x28 fp=0x14000068fd0 sp=0x14000068fb0 pc=0x102904ad8
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x14000068fd0 sp=0x14000068fd0 pc=0x102961d04
created by runtime.gcenable in goroutine 1
	runtime/mgc.go:213 +0xac

goroutine 34 gp=0x14000186380 m=nil [finalizer wait]:
runtime.gopark(0x0?, 0x103b45a18?, 0xb0?, 0x2?, 0x1000000010?)
	runtime/proc.go:460 +0xc0 fp=0x1400006c580 sp=0x1400006c560 pc=0x102959a50
runtime.runFinalizers()
	runtime/mfinal.go:210 +0x104 fp=0x1400006c7d0 sp=0x1400006c580 pc=0x102903b24
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400006c7d0 sp=0x1400006c7d0 pc=0x102961d04
created by runtime.createfing in goroutine 1
	runtime/mfinal.go:172 +0x78

goroutine 35 gp=0x14000186e00 m=nil [cleanup wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000308740 sp=0x14000308720 pc=0x102959a50
runtime.goparkunlock(...)
	runtime/proc.go:466
runtime.(*cleanupQueue).dequeue(0x10444e740)
	runtime/mcleanup.go:439 +0x110 fp=0x14000308780 sp=0x14000308740 pc=0x102901010
runtime.runCleanups()
	runtime/mcleanup.go:635 +0x40 fp=0x140003087d0 sp=0x14000308780 pc=0x102901820
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x140003087d0 sp=0x140003087d0 pc=0x102961d04
created by runtime.(*cleanupQueue).createGs in goroutine 1
	runtime/mcleanup.go:589 +0x108

goroutine 36 gp=0x14000187180 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000308f10 sp=0x14000308ef0 pc=0x102959a50
runtime.gcBgMarkWorker(0x14000183880)
	runtime/mgc.go:1463 +0xe0 fp=0x14000308fb0 sp=0x14000308f10 pc=0x1029071b0
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x14000308fd0 sp=0x14000308fb0 pc=0x102907098
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x14000308fd0 sp=0x14000308fd0 pc=0x102961d04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 20 gp=0x14000102700 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000069710 sp=0x140000696f0 pc=0x102959a50
runtime.gcBgMarkWorker(0x14000183880)
	runtime/mgc.go:1463 +0xe0 fp=0x140000697b0 sp=0x14000069710 pc=0x1029071b0
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x140000697d0 sp=0x140000697b0 pc=0x102907098
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x140000697d0 sp=0x140000697d0 pc=0x102961d04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 3 gp=0x14000003500 m=nil [GC worker (idle)]:
runtime.gopark(0x581b7f3740f6e?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000173f10 sp=0x14000173ef0 pc=0x102959a50
runtime.gcBgMarkWorker(0x14000183880)
	runtime/mgc.go:1463 +0xe0 fp=0x14000173fb0 sp=0x14000173f10 pc=0x1029071b0
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x14000173fd0 sp=0x14000173fb0 pc=0x102907098
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x14000173fd0 sp=0x14000173fd0 pc=0x102961d04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 37 gp=0x14000187340 m=nil [GC worker (idle)]:
runtime.gopark(0x581b7f36ff69a?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000309710 sp=0x140003096f0 pc=0x102959a50
runtime.gcBgMarkWorker(0x14000183880)
	runtime/mgc.go:1463 +0xe0 fp=0x140003097b0 sp=0x14000309710 pc=0x1029071b0
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x140003097d0 sp=0x140003097b0 pc=0x102907098
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x140003097d0 sp=0x140003097d0 pc=0x102961d04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 38 gp=0x14000187500 m=nil [GC worker (idle)]:
runtime.gopark(0x581b7f37408c2?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000175f10 sp=0x14000175ef0 pc=0x102959a50
runtime.gcBgMarkWorker(0x14000183880)
	runtime/mgc.go:1463 +0xe0 fp=0x14000175fb0 sp=0x14000175f10 pc=0x1029071b0
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x14000175fd0 sp=0x14000175fb0 pc=0x102907098
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x14000175fd0 sp=0x14000175fd0 pc=0x102961d04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 21 gp=0x140001028c0 m=nil [GC worker (idle)]:
runtime.gopark(0x581b7f36ff4a6?, 0x3?, 0xfa?, 0x7d?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000069f10 sp=0x14000069ef0 pc=0x102959a50
runtime.gcBgMarkWorker(0x14000183880)
	runtime/mgc.go:1463 +0xe0 fp=0x14000069fb0 sp=0x14000069f10 pc=0x1029071b0
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x14000069fd0 sp=0x14000069fb0 pc=0x102907098
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x14000069fd0 sp=0x14000069fd0 pc=0x102961d04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 4 gp=0x140000036c0 m=nil [GC worker (idle)]:
runtime.gopark(0x581b7f373f2f3?, 0x3?, 0xb6?, 0xa4?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x1400006df10 sp=0x1400006def0 pc=0x102959a50
runtime.gcBgMarkWorker(0x14000183880)
	runtime/mgc.go:1463 +0xe0 fp=0x1400006dfb0 sp=0x1400006df10 pc=0x1029071b0
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x1400006dfd0 sp=0x1400006dfb0 pc=0x102907098
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400006dfd0 sp=0x1400006dfd0 pc=0x102961d04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 39 gp=0x140001876c0 m=nil [GC worker (idle)]:
runtime.gopark(0x581b7f36fe55a?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x1400030a710 sp=0x1400030a6f0 pc=0x102959a50
runtime.gcBgMarkWorker(0x14000183880)
	runtime/mgc.go:1463 +0xe0 fp=0x1400030a7b0 sp=0x1400030a710 pc=0x1029071b0
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x1400030a7d0 sp=0x1400030a7b0 pc=0x102907098
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400030a7d0 sp=0x1400030a7d0 pc=0x102961d04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 22 gp=0x14000102a80 m=nil [GC worker (idle)]:
runtime.gopark(0x581b7f37047fc?, 0x1?, 0x4c?, 0x5?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x1400006a710 sp=0x1400006a6f0 pc=0x102959a50
runtime.gcBgMarkWorker(0x14000183880)
	runtime/mgc.go:1463 +0xe0 fp=0x1400006a7b0 sp=0x1400006a710 pc=0x1029071b0
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x1400006a7d0 sp=0x1400006a7b0 pc=0x102907098
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400006a7d0 sp=0x1400006a7d0 pc=0x102961d04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 5 gp=0x14000003880 m=nil [GC worker (idle)]:
runtime.gopark(0x581b7f373da89?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x1400006e710 sp=0x1400006e6f0 pc=0x102959a50
runtime.gcBgMarkWorker(0x14000183880)
	runtime/mgc.go:1463 +0xe0 fp=0x1400006e7b0 sp=0x1400006e710 pc=0x1029071b0
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x1400006e7d0 sp=0x1400006e7b0 pc=0x102907098
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400006e7d0 sp=0x1400006e7d0 pc=0x102961d04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 40 gp=0x14000003a40 m=nil [chan receive]:
runtime.gopark(0x1400001b868?, 0x10293a594?, 0x98?, 0xb8?, 0x10295b9fc?)
	runtime/proc.go:460 +0xc0 fp=0x1400001b850 sp=0x1400001b830 pc=0x102959a50
runtime.chanrecv(0x140003bc150, 0x1400001ba40, 0x1)
	runtime/chan.go:667 +0x428 fp=0x1400001b8d0 sp=0x1400001b850 pc=0x1028f4318
runtime.chanrecv1(0x14000168ae0?, 0x1400056e000?)
	runtime/chan.go:509 +0x14 fp=0x1400001b900 sp=0x1400001b8d0 pc=0x1028f3eb4
github.com/ollama/ollama/runner/llamarunner.(*Server).embeddings(0x140003f2320, {0x103b59848, 0x140002b20f0}, 0x140000003c0)
	github.com/ollama/ollama/runner/llamarunner/runner.go:758 +0x5a8 fp=0x1400001ba90 sp=0x1400001b900 pc=0x102d41728
github.com/ollama/ollama/runner/llamarunner.(*Server).embeddings-fm({0x103b59848?, 0x140002b20f0?}, 0x1400001bb18?)
	<autogenerated>:1 +0x40 fp=0x1400001bac0 sp=0x1400001ba90 pc=0x102d43570
net/http.HandlerFunc.ServeHTTP(0x140000b4480?, {0x103b59848?, 0x140002b20f0?}, 0x1400001bb00?)
	net/http/server.go:2322 +0x38 fp=0x1400001baf0 sp=0x1400001bac0 pc=0x102bfc7e8
net/http.(*ServeMux).ServeHTTP(0x10?, {0x103b59848, 0x140002b20f0}, 0x140000003c0)
	net/http/server.go:2861 +0x190 fp=0x1400001bb40 sp=0x1400001baf0 pc=0x102bfe280
net/http.serverHandler.ServeHTTP({0x103b562b0?}, {0x103b59848?, 0x140002b20f0?}, 0x1?)
	net/http/server.go:3340 +0xb0 fp=0x1400001bb70 sp=0x1400001bb40 pc=0x102c18a70
net/http.(*conn).serve(0x14000014000, {0x103b5bc98, 0x140002f8f30})
	net/http/server.go:2109 +0x528 fp=0x1400001bfa0 sp=0x1400001bb70 pc=0x102bfabd8
net/http.(*Server).Serve.gowrap3()
	net/http/server.go:3493 +0x2c fp=0x1400001bfd0 sp=0x1400001bfa0 pc=0x102bfff0c
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400001bfd0 sp=0x1400001bfd0 pc=0x102961d04
created by net/http.(*Server).Serve in goroutine 1
	net/http/server.go:3493 +0x384

goroutine 9 gp=0x14000003c00 m=nil [IO wait]:
runtime.gopark(0xffffffffffffffff?, 0xffffffffffffffff?, 0x23?, 0x0?, 0x10297d600?)
	runtime/proc.go:460 +0xc0 fp=0x14000305580 sp=0x14000305560 pc=0x102959a50
runtime.netpollblock(0x0?, 0x0?, 0x0?)
	runtime/netpoll.go:575 +0x150 fp=0x140003055c0 sp=0x14000305580 pc=0x10291f620
internal/poll.runtime_pollWait(0x14fcc1c00, 0x72)
	runtime/netpoll.go:351 +0xa0 fp=0x140003055f0 sp=0x140003055c0 pc=0x102958c80
internal/poll.(*pollDesc).wait(0x1400024e000?, 0x140000a87e1?, 0x0)
	internal/poll/fd_poll_runtime.go:84 +0x28 fp=0x14000305620 sp=0x140003055f0 pc=0x1029d7568
internal/poll.(*pollDesc).waitRead(...)
	internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0x1400024e000, {0x140000a87e1, 0x1, 0x1})
	internal/poll/fd_unix.go:165 +0x1e0 fp=0x140003056c0 sp=0x14000305620 pc=0x1029d8780
net.(*netFD).Read(0x1400024e000, {0x140000a87e1?, 0x104389450?, 0x140000a8894?})
	net/fd_posix.go:68 +0x28 fp=0x14000305710 sp=0x140003056c0 pc=0x102a3bff8
net.(*conn).Read(0x14000070000, {0x140000a87e1?, 0x0?, 0x0?})
	net/net.go:196 +0x34 fp=0x14000305760 sp=0x14000305710 pc=0x102a485d4
net/http.(*connReader).backgroundRead(0x140000a87c0)
	net/http/server.go:702 +0x38 fp=0x140003057b0 sp=0x14000305760 pc=0x102bf5c48
net/http.(*connReader).startBackgroundRead.gowrap2()
	net/http/server.go:698 +0x28 fp=0x140003057d0 sp=0x140003057b0 pc=0x102bf5b38
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x140003057d0 sp=0x140003057d0 pc=0x102961d04
created by net/http.(*connReader).startBackgroundRead in goroutine 40
	net/http/server.go:698 +0xb8

r0      0x1047e0000
r1      0x1047e3c60
r2      0x0
r3      0x1047e7000
r4      0xb54461800
r5      0x0
r6      0xffffffffbfc007ff
r7      0xfffff0003ffff800
r8      0xb54461800
r9      0x0
r10     0x800380401980
r11     0xbfb99984c02334b7
r12     0x3f140930bf8574a0
r13     0xe0cfed5d694beb4b
r14     0x104858cb8
r15     0xb54460000
r16     0x2821b5e88
r17     0xffffffffb00007ff
r18     0x0
r19     0xc00
r20     0xc0040d262c7
r21     0x0
r22     0x104bef790
r23     0x0
r24     0x300
r25     0x300
r26     0x200
r27     0x104bef790
r28     0x0
r29     0x16ed36cd0
lr      0x181f6ca78
sp      0x16ed36c60
pc      0x181ddab1c
fault   0x181ddab1c
[GIN] 2025/11/12 - 12:15:03 | 500 | 60.695958ms | 192.168.1.3 | POST "/api/embed"
time=2025-11-12T12:15:03.760-03:00 level=DEBUG source=sched.go:385 msg="context for request finished" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=99032 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192
time=2025-11-12T12:15:03.760-03:00 level=DEBUG source=sched.go:290 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=99032 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192 duration=5m0s
time=2025-11-12T12:15:03.760-03:00 level=DEBUG source=sched.go:308 msg="after processing request finished event" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=99032 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192 refCount=0
time=2025-11-12T12:15:03.772-03:00 level=DEBUG source=sched.go:609 msg="evaluating already loaded" model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6
time=2025-11-12T12:15:03.772-03:00 level=DEBUG source=server.go:1241 msg="server unhealthy" error="llama runner process no longer running: 2 "
time=2025-11-12T12:15:03.772-03:00 level=DEBUG source=sched.go:609 msg="evaluating already loaded" model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6
time=2025-11-12T12:15:03.772-03:00 level=DEBUG source=server.go:1241 msg="server unhealthy" error="llama runner process no longer running: 2 "
time=2025-11-12T12:15:03.772-03:00 level=DEBUG source=sched.go:161 msg=reloading runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=99032 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192
time=2025-11-12T12:15:03.772-03:00 level=DEBUG source=sched.go:236 msg="resetting model to expire immediately to make room" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=99032 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192 refCount=0
time=2025-11-12T12:15:03.772-03:00 level=DEBUG source=sched.go:247 msg="waiting for pending requests to complete and unload to occur" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=99032 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192
time=2025-11-12T12:15:03.772-03:00 level=DEBUG source=sched.go:311 msg="runner expired event received" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=99032 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192
time=2025-11-12T12:15:03.772-03:00 level=DEBUG source=sched.go:326 msg="got lock to unload expired event" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=99032 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192
time=2025-11-12T12:15:03.772-03:00 level=DEBUG source=sched.go:349 msg="starting background wait for VRAM recovery" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=99032 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192
time=2025-11-12T12:15:03.772-03:00 level=DEBUG source=sched.go:657 msg="no need to wait for VRAM recovery" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=99032 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192
time=2025-11-12T12:15:03.773-03:00 level=DEBUG source=server.go:1699 msg="stopping llama server" pid=99032
time=2025-11-12T12:15:03.773-03:00 level=DEBUG source=sched.go:358 msg="runner terminated and removed from list, blocking for VRAM recovery" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=99032 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6
time=2025-11-12T12:15:03.773-03:00 level=DEBUG source=sched.go:361 msg="sending an unloaded event" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=99032 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6
time=2025-11-12T12:15:03.773-03:00 level=DEBUG source=sched.go:253 msg="unload completed" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=99032 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6
time=2025-11-12T12:15:03.773-03:00 level=DEBUG source=runner.go:40 msg="overall device VRAM discovery took" duration=1.125µs
time=2025-11-12T12:15:03.775-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32
time=2025-11-12T12:15:03.776-03:00 level=DEBUG source=sched.go:211 msg="loading first model" model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6
llama_model_load_from_file_impl: using device Metal (Apple M2 Pro) (unknown id) - 12123 MiB free
llama_model_loader: loaded meta data with 24 key-value pairs and 112 tensors from /Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = nomic-bert
llama_model_loader: - kv 1: general.name str = nomic-embed-text-v1.5
llama_model_loader: - kv 2: nomic-bert.block_count u32 = 12
llama_model_loader: - kv 3: nomic-bert.context_length u32 = 2048
llama_model_loader: - kv 4: nomic-bert.embedding_length u32 = 768
llama_model_loader: - kv 5: nomic-bert.feed_forward_length u32 = 3072
llama_model_loader: - kv 6: nomic-bert.attention.head_count u32 = 12
llama_model_loader: - kv 7: nomic-bert.attention.layer_norm_epsilon f32 = 0.000000
llama_model_loader: - kv 8: general.file_type u32 = 1
llama_model_loader: - kv 9: nomic-bert.attention.causal bool = false
llama_model_loader: - kv 10: nomic-bert.pooling_type u32 = 1
llama_model_loader: - kv 11: nomic-bert.rope.freq_base f32 = 1000.000000
llama_model_loader: - kv 12: tokenizer.ggml.token_type_count u32 = 2
llama_model_loader: - kv 13: tokenizer.ggml.bos_token_id u32 = 101
llama_model_loader: - kv 14: tokenizer.ggml.eos_token_id u32 = 102
llama_model_loader: - kv 15: tokenizer.ggml.model str = bert
llama_model_loader: - kv 16: tokenizer.ggml.tokens arr[str,30522] = ["[PAD]", "[unused0]", "[unused1]", "...
llama_model_loader: - kv 17: tokenizer.ggml.scores arr[f32,30522] = [-1000.000000, -1000.000000, -1000.00...
llama_model_loader: - kv 18: tokenizer.ggml.token_type arr[i32,30522] = [3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 19: tokenizer.ggml.unknown_token_id u32 = 100
llama_model_loader: - kv 20: tokenizer.ggml.seperator_token_id u32 = 102
llama_model_loader: - kv 21: tokenizer.ggml.padding_token_id u32 = 0
llama_model_loader: - kv 22: tokenizer.ggml.cls_token_id u32 = 101
llama_model_loader: - kv 23: tokenizer.ggml.mask_token_id u32 = 103
llama_model_loader: - type f32: 51 tensors
llama_model_loader: - type f16: 61 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type = F16
print_info: file size = 260.86 MiB (16.00 BPW)
init_tokenizer: initializing tokenizer for type 3
load: control token: 100 '[UNK]' is not marked as EOG
load: control token: 101 '[CLS]' is not marked as EOG
load: control token: 0 '[PAD]' is not marked as EOG
load: control token: 102 '[SEP]' is not marked as EOG
load: control token: 103 '[MASK]' is not marked as EOG
load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
load: printing all EOG tokens:
load: - 102 ('[SEP]')
load: special tokens cache size = 5
load: token to piece cache size = 0.2032 MB
print_info: arch = nomic-bert
print_info: vocab_only = 1
print_info: model type = ?B
print_info: model params = 136.73 M
print_info: general.name = nomic-embed-text-v1.5
print_info: vocab type = WPM
print_info: n_vocab = 30522
print_info: n_merges = 0
print_info: BOS token = 101 '[CLS]'
print_info: EOS token = 102 '[SEP]'
print_info: UNK token = 100 '[UNK]'
print_info: SEP token = 102 '[SEP]'
print_info: PAD token = 0 '[PAD]'
print_info: MASK token = 103 '[MASK]'
print_info: LF token = 0 '[PAD]'
print_info: EOG token = 102 '[SEP]'
print_info: max token length = 21
llama_model_load: vocab only - skipping tensors
time=2025-11-12T12:15:03.797-03:00 level=WARN source=server.go:173 msg="requested context size too large for model" num_ctx=8192 n_ctx_train=2048
time=2025-11-12T12:15:03.798-03:00 level=INFO source=server.go:400 msg="starting runner" cmd="/opt/homebrew/Cellar/ollama/0.12.10/bin/ollama runner --model /Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 --port 56918"
time=2025-11-12T12:15:03.798-03:00 level=DEBUG source=server.go:401 msg=subprocess PATH="/opt/homebrew/bin:/opt/homebrew/bin:/opt/homebrew/sbin:/Users/bedas/.atuin/bin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin://Applications/Topaz Gigapixel AI.app/Contents/Resources/bin:/Library/Apple/usr/bin:/usr/local/MacGPG2/bin:/Users/bedas/.composer/vendor/bin" OLLAMA_HOST=0.0.0.0 OLLAMA_DEBUG=1 DYLD_LIBRARY_PATH=/opt/homebrew/Cellar/ollama/0.12.10/bin OLLAMA_LIBRARY_PATH=/opt/homebrew/Cellar/ollama/0.12.10/bin
time=2025-11-12T12:15:03.800-03:00 level=INFO source=server.go:470 msg="system memory" total="16.0 GiB" free="4.2 GiB" free_swap="0 B"
time=2025-11-12T12:15:03.800-03:00 level=DEBUG source=memory.go:198 msg=evaluating library=Metal gpu_count=1 available="[11.8 GiB]"
time=2025-11-12T12:15:03.800-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.vision.block_count default=0
time=2025-11-12T12:15:03.800-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default=0
time=2025-11-12T12:15:03.800-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}"
time=2025-11-12T12:15:03.800-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}"
time=2025-11-12T12:15:03.800-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default=0
time=2025-11-12T12:15:03.800-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}"
time=2025-11-12T12:15:03.800-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}"
time=2025-11-12T12:15:03.800-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.key_length default=64
time=2025-11-12T12:15:03.800-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.value_length default=64
time=2025-11-12T12:15:03.800-03:00 level=DEBUG source=ggml.go:611 msg="default cache size estimate" "attention MiB"=6 "attention bytes"=6291456 "recurrent MiB"=0 "recurrent bytes"=0
time=2025-11-12T12:15:03.800-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default=0
time=2025-11-12T12:15:03.800-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}"
time=2025-11-12T12:15:03.800-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}"
time=2025-11-12T12:15:03.800-03:00 level=INFO source=memory.go:37 msg="new model will fit in available VRAM across minimum required GPUs, loading" model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 library=Metal parallel=1 required="809.4 MiB" gpus=1
time=2025-11-12T12:15:03.800-03:00 level=DEBUG source=memory.go:198 msg=evaluating library=Metal gpu_count=1 available="[11.8 GiB]"
time=2025-11-12T12:15:03.800-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.vision.block_count default=0
time=2025-11-12T12:15:03.801-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default=0
time=2025-11-12T12:15:03.801-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}"
time=2025-11-12T12:15:03.801-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}"
time=2025-11-12T12:15:03.801-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default=0
time=2025-11-12T12:15:03.801-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}"
time=2025-11-12T12:15:03.801-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}"
time=2025-11-12T12:15:03.801-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.key_length default=64
time=2025-11-12T12:15:03.801-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.value_length default=64
time=2025-11-12T12:15:03.801-03:00 level=DEBUG source=ggml.go:611 msg="default cache size estimate" "attention MiB"=6 "attention bytes"=6291456 "recurrent MiB"=0 "recurrent bytes"=0
time=2025-11-12T12:15:03.801-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default=0
time=2025-11-12T12:15:03.801-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}"
time=2025-11-12T12:15:03.801-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}"
time=2025-11-12T12:15:03.801-03:00 level=INFO source=server.go:522 msg=offload library=Metal layers.requested=-1 layers.model=13 layers.offload=13 layers.split=[13] memory.available="[11.8 GiB]" memory.gpu_overhead="0 B" memory.required.full="809.4 MiB" memory.required.partial="809.4 MiB" memory.required.kv="6.0 MiB" memory.required.allocations="[809.4 MiB]" memory.weights.total="260.9 MiB" memory.weights.repeating="216.1 MiB" memory.weights.nonrepeating="44.7 MiB" memory.graph.full="12.0 MiB" memory.graph.partial="12.0 MiB"
time=2025-11-12T12:15:03.821-03:00 level=INFO source=runner.go:910 msg="starting go runner"
time=2025-11-12T12:15:03.821-03:00 level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/opt/homebrew/Cellar/ollama/0.12.10/bin
ggml_metal_library_init: using embedded metal library
ggml_metal_library_init: loaded in 0.007 sec
ggml_metal_device_init: GPU name: Apple M2 Pro
ggml_metal_device_init: GPU family: MTLGPUFamilyApple8 (1008)
ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_device_init: GPU family: MTLGPUFamilyMetal4 (5002)
ggml_metal_device_init: simdgroup reduction = true
ggml_metal_device_init: simdgroup matrix mul. = true
ggml_metal_device_init: has unified memory = true
ggml_metal_device_init: has bfloat = true
ggml_metal_device_init: use residency sets = true
ggml_metal_device_init: use shared buffers = true
ggml_metal_device_init: recommendedMaxWorkingSetSize = 12713.12 MB
time=2025-11-12T12:15:03.821-03:00 level=INFO source=ggml.go:104 msg=system Metal.0.EMBED_LIBRARY=1 CPU.0.NEON=1 CPU.0.ARM_FMA=1 CPU.0.FP16_VA=1 CPU.0.DOTPROD=1 CPU.0.LLAMAFILE=1 CPU.0.ACCELERATE=1 compiler=cgo(clang)
time=2025-11-12T12:15:03.845-03:00 level=INFO source=runner.go:946 msg="Server listening on 127.0.0.1:56918"
time=2025-11-12T12:15:03.856-03:00 level=INFO source=runner.go:845 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:false KvSize:2048 KvCacheType: NumThreads:6 GPULayers:13[ID:0 Layers:13(0..12)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:true}"
llama_model_load_from_file_impl: using device Metal (Apple M2 Pro) (unknown id) - 12123 MiB free
time=2025-11-12T12:15:03.856-03:00 level=INFO source=server.go:1251 msg="waiting for llama runner to start responding"
time=2025-11-12T12:15:03.857-03:00 level=INFO source=server.go:1285 msg="waiting for server to become available" status="llm server loading model"
llama_model_loader: loaded meta data with 24 key-value pairs and 112 tensors from /Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = nomic-bert
llama_model_loader: - kv 1: general.name str = nomic-embed-text-v1.5
llama_model_loader: - kv 2: nomic-bert.block_count u32 = 12
llama_model_loader: - kv 3: nomic-bert.context_length u32 = 2048
llama_model_loader: - kv 4: nomic-bert.embedding_length u32 = 768
llama_model_loader: - kv 5: nomic-bert.feed_forward_length u32 = 3072
llama_model_loader: - kv 6: nomic-bert.attention.head_count u32 = 12
llama_model_loader: - kv 7: nomic-bert.attention.layer_norm_epsilon f32 = 0.000000
llama_model_loader: - kv 8: general.file_type u32 = 1
llama_model_loader: - kv 9: nomic-bert.attention.causal bool = false
llama_model_loader: - kv 10: nomic-bert.pooling_type u32 = 1
llama_model_loader: - kv 11: nomic-bert.rope.freq_base f32 = 1000.000000
llama_model_loader: - kv 12: tokenizer.ggml.token_type_count u32 = 2
llama_model_loader: - kv 13: tokenizer.ggml.bos_token_id u32 = 101
llama_model_loader: - kv 14: tokenizer.ggml.eos_token_id u32 = 102
llama_model_loader: - kv 15: tokenizer.ggml.model str = bert
llama_model_loader: - kv 16: tokenizer.ggml.tokens arr[str,30522] = ["[PAD]", "[unused0]", "[unused1]", "...
llama_model_loader: - kv 17: tokenizer.ggml.scores arr[f32,30522] = [-1000.000000, -1000.000000, -1000.00...
llama_model_loader: - kv 18: tokenizer.ggml.token_type arr[i32,30522] = [3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 19: tokenizer.ggml.unknown_token_id u32 = 100
llama_model_loader: - kv 20: tokenizer.ggml.seperator_token_id u32 = 102
llama_model_loader: - kv 21: tokenizer.ggml.padding_token_id u32 = 0
llama_model_loader: - kv 22: tokenizer.ggml.cls_token_id u32 = 101
llama_model_loader: - kv 23: tokenizer.ggml.mask_token_id u32 = 103
llama_model_loader: - type f32: 51 tensors
llama_model_loader: - type f16: 61 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type = F16
print_info: file size = 260.86 MiB (16.00 BPW)
init_tokenizer: initializing tokenizer for type 3
load: control token: 100 '[UNK]' is not marked as EOG
load: control token: 101 '[CLS]' is not marked as EOG
load: control token: 0 '[PAD]' is not marked as EOG
load: control token: 102 '[SEP]' is not marked as EOG
load: control token: 103 '[MASK]' is not marked as EOG
load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
load: printing all EOG tokens:
load: - 102 ('[SEP]')
load: special tokens cache size = 5
load: token to piece cache size = 0.2032 MB
print_info: arch = nomic-bert
print_info: vocab_only = 0
print_info: n_ctx_train = 2048
print_info: n_embd = 768
print_info: n_layer = 12
print_info: n_head = 12
print_info: n_head_kv = 12
print_info: n_rot = 64
print_info: n_swa = 0
print_info: is_swa_any = 0
print_info: n_embd_head_k = 64
print_info: n_embd_head_v = 64
print_info: n_gqa = 1
print_info: n_embd_k_gqa = 768
print_info: n_embd_v_gqa = 768
print_info: f_norm_eps = 1.0e-12
print_info: f_norm_rms_eps = 0.0e+00
print_info: f_clamp_kqv = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale = 0.0e+00
print_info: f_attn_scale = 0.0e+00
print_info: n_ff = 3072
print_info: n_expert = 0
print_info: n_expert_used = 0
print_info: causal attn = 0
print_info: pooling type = 1
print_info: rope type = 2
print_info: rope scaling = linear
print_info: freq_base_train = 1000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn = 2048
print_info: rope_finetuned = unknown
print_info: model type = 137M
print_info: model params = 136.73 M
print_info: general.name = nomic-embed-text-v1.5
print_info: vocab type = WPM
print_info: n_vocab = 30522
print_info: n_merges = 0
print_info: BOS token = 101 '[CLS]'
print_info: EOS token = 102 '[SEP]'
print_info: UNK token = 100 '[UNK]'
print_info: SEP token = 102 '[SEP]'
print_info: PAD token = 0 '[PAD]'
print_info: MASK token = 103 '[MASK]'
print_info: LF token = 0 '[PAD]'
print_info: EOG token = 102 '[SEP]'
print_info: max token length = 21
load_tensors: loading model tensors, this can take a while... (mmap = true)
load_tensors: layer 0 assigned to device Metal, is_swa = 0
load_tensors: layer 1 assigned to device Metal, is_swa = 0
load_tensors: layer 2 assigned to device Metal, is_swa = 0
load_tensors: layer 3 assigned to device Metal, is_swa = 0
load_tensors: layer 4 assigned to device Metal, is_swa = 0
load_tensors: layer 5 assigned to device Metal, is_swa = 0
load_tensors: layer 6 assigned to device Metal, is_swa = 0
load_tensors: layer 7 assigned to device Metal, is_swa = 0
load_tensors: layer 8 assigned to device Metal, is_swa = 0
load_tensors: layer 9 assigned to device Metal, is_swa = 0
load_tensors: layer 10 assigned to device Metal, is_swa = 0
load_tensors: layer 11 assigned to device Metal, is_swa = 0
load_tensors: layer 12 assigned to device Metal, is_swa = 0
create_tensor: loading tensor token_embd.weight
create_tensor: loading tensor token_types.weight
create_tensor: loading tensor token_embd_norm.weight
create_tensor: loading tensor token_embd_norm.bias
create_tensor: loading tensor blk.0.attn_qkv.weight
create_tensor: loading tensor blk.0.attn_output.weight
create_tensor: loading tensor blk.0.attn_output_norm.weight
create_tensor: loading tensor blk.0.attn_output_norm.bias
create_tensor: loading tensor blk.0.ffn_up.weight
create_tensor: loading tensor blk.0.ffn_down.weight
create_tensor: loading tensor blk.0.ffn_gate.weight
create_tensor: loading tensor blk.0.layer_output_norm.weight
create_tensor: loading tensor blk.0.layer_output_norm.bias
create_tensor: loading tensor blk.1.attn_qkv.weight
create_tensor: loading tensor blk.1.attn_output.weight
create_tensor: loading tensor blk.1.attn_output_norm.weight
create_tensor: loading tensor blk.1.attn_output_norm.bias
create_tensor: loading tensor blk.1.ffn_up.weight
create_tensor: loading tensor blk.1.ffn_down.weight
create_tensor: loading tensor blk.1.ffn_gate.weight
create_tensor: loading tensor blk.1.layer_output_norm.weight
create_tensor: loading tensor blk.1.layer_output_norm.bias
create_tensor: loading tensor blk.2.attn_qkv.weight
create_tensor: loading tensor blk.2.attn_output.weight
create_tensor: loading tensor blk.2.attn_output_norm.weight
create_tensor: loading tensor blk.2.attn_output_norm.bias
create_tensor: loading tensor blk.2.ffn_up.weight
create_tensor: loading tensor blk.2.ffn_down.weight
create_tensor: loading tensor blk.2.ffn_gate.weight
create_tensor: loading tensor blk.2.layer_output_norm.weight
create_tensor: loading tensor blk.2.layer_output_norm.bias
create_tensor: loading tensor blk.3.attn_qkv.weight
create_tensor: loading tensor blk.3.attn_output.weight
create_tensor: loading tensor blk.3.attn_output_norm.weight
create_tensor: loading tensor blk.3.attn_output_norm.bias
create_tensor: loading tensor blk.3.ffn_up.weight
create_tensor: loading tensor blk.3.ffn_down.weight
create_tensor: loading tensor blk.3.ffn_gate.weight
create_tensor: loading tensor blk.3.layer_output_norm.weight
create_tensor: loading tensor blk.3.layer_output_norm.bias
create_tensor: loading tensor blk.4.attn_qkv.weight
create_tensor: loading tensor blk.4.attn_output.weight
create_tensor: loading tensor blk.4.attn_output_norm.weight
create_tensor: loading tensor blk.4.attn_output_norm.bias
create_tensor: loading tensor blk.4.ffn_up.weight
create_tensor: loading tensor blk.4.ffn_down.weight
create_tensor: loading tensor
blk.4.ffn_gate.weight create_tensor: loading tensor blk.4.layer_output_norm.weight create_tensor: loading tensor blk.4.layer_output_norm.bias create_tensor: loading tensor blk.5.attn_qkv.weight create_tensor: loading tensor blk.5.attn_output.weight create_tensor: loading tensor blk.5.attn_output_norm.weight create_tensor: loading tensor blk.5.attn_output_norm.bias create_tensor: loading tensor blk.5.ffn_up.weight create_tensor: loading tensor blk.5.ffn_down.weight create_tensor: loading tensor blk.5.ffn_gate.weight create_tensor: loading tensor blk.5.layer_output_norm.weight create_tensor: loading tensor blk.5.layer_output_norm.bias create_tensor: loading tensor blk.6.attn_qkv.weight create_tensor: loading tensor blk.6.attn_output.weight create_tensor: loading tensor blk.6.attn_output_norm.weight create_tensor: loading tensor blk.6.attn_output_norm.bias create_tensor: loading tensor blk.6.ffn_up.weight create_tensor: loading tensor blk.6.ffn_down.weight create_tensor: loading tensor blk.6.ffn_gate.weight create_tensor: loading tensor blk.6.layer_output_norm.weight create_tensor: loading tensor blk.6.layer_output_norm.bias create_tensor: loading tensor blk.7.attn_qkv.weight create_tensor: loading tensor blk.7.attn_output.weight create_tensor: loading tensor blk.7.attn_output_norm.weight create_tensor: loading tensor blk.7.attn_output_norm.bias create_tensor: loading tensor blk.7.ffn_up.weight create_tensor: loading tensor blk.7.ffn_down.weight create_tensor: loading tensor blk.7.ffn_gate.weight create_tensor: loading tensor blk.7.layer_output_norm.weight create_tensor: loading tensor blk.7.layer_output_norm.bias create_tensor: loading tensor blk.8.attn_qkv.weight create_tensor: loading tensor blk.8.attn_output.weight create_tensor: loading tensor blk.8.attn_output_norm.weight create_tensor: loading tensor blk.8.attn_output_norm.bias create_tensor: loading tensor blk.8.ffn_up.weight create_tensor: loading tensor blk.8.ffn_down.weight create_tensor: loading tensor 
blk.8.ffn_gate.weight create_tensor: loading tensor blk.8.layer_output_norm.weight create_tensor: loading tensor blk.8.layer_output_norm.bias create_tensor: loading tensor blk.9.attn_qkv.weight create_tensor: loading tensor blk.9.attn_output.weight create_tensor: loading tensor blk.9.attn_output_norm.weight create_tensor: loading tensor blk.9.attn_output_norm.bias create_tensor: loading tensor blk.9.ffn_up.weight create_tensor: loading tensor blk.9.ffn_down.weight create_tensor: loading tensor blk.9.ffn_gate.weight create_tensor: loading tensor blk.9.layer_output_norm.weight create_tensor: loading tensor blk.9.layer_output_norm.bias create_tensor: loading tensor blk.10.attn_qkv.weight create_tensor: loading tensor blk.10.attn_output.weight create_tensor: loading tensor blk.10.attn_output_norm.weight create_tensor: loading tensor blk.10.attn_output_norm.bias create_tensor: loading tensor blk.10.ffn_up.weight create_tensor: loading tensor blk.10.ffn_down.weight create_tensor: loading tensor blk.10.ffn_gate.weight create_tensor: loading tensor blk.10.layer_output_norm.weight create_tensor: loading tensor blk.10.layer_output_norm.bias create_tensor: loading tensor blk.11.attn_qkv.weight create_tensor: loading tensor blk.11.attn_output.weight create_tensor: loading tensor blk.11.attn_output_norm.weight create_tensor: loading tensor blk.11.attn_output_norm.bias create_tensor: loading tensor blk.11.ffn_up.weight create_tensor: loading tensor blk.11.ffn_down.weight create_tensor: loading tensor blk.11.ffn_gate.weight create_tensor: loading tensor blk.11.layer_output_norm.weight create_tensor: loading tensor blk.11.layer_output_norm.bias load_tensors: offloading 12 repeating layers to GPU load_tensors: offloading output layer to GPU load_tensors: offloaded 13/13 layers to GPU load_tensors: CPU_Mapped model buffer size = 44.72 MiB load_tensors: Metal_Mapped model buffer size = 216.14 MiB llama_init_from_model: model default pooling_type is [1], but [-1] was specified 
llama_context: constructing llama_context
llama_context: n_seq_max = 1
llama_context: n_ctx = 2048
llama_context: n_ctx_per_seq = 2048
llama_context: n_batch = 512
llama_context: n_ubatch = 512
llama_context: causal_attn = 0
llama_context: flash_attn = disabled
llama_context: kv_unified = false
llama_context: freq_base = 1000.0
llama_context: freq_scale = 1
ggml_metal_init: allocating
ggml_metal_init: picking default device: Apple M2 Pro
ggml_metal_init: use bfloat = true
ggml_metal_init: use fusion = true
ggml_metal_init: use concurrency = true
ggml_metal_init: use graph optimize = true
set_abort_callback: call
llama_context: CPU output buffer size = 0.12 MiB
llama_context: enumerating backends
llama_context: backend_ptrs.size() = 3
llama_context: max_nodes = 1024
llama_context: worst-case: n_tokens = 512, n_seqs = 1, n_outputs = 1
graph_reserve: reserving a graph for ubatch with n_tokens = 512, n_seqs = 1, n_outputs = 512
graph_reserve: reserving a graph for ubatch with n_tokens = 1, n_seqs = 1, n_outputs = 1
graph_reserve: reserving a graph for ubatch with n_tokens = 512, n_seqs = 1, n_outputs = 512
llama_context: Metal compute buffer size = 23.51 MiB
llama_context: CPU compute buffer size = 4.01 MiB
llama_context: graph nodes = 371
llama_context: graph splits = 2
time=2025-11-12T12:15:04.108-03:00 level=INFO source=server.go:1289 msg="llama runner started in 0.31 seconds"
time=2025-11-12T12:15:04.108-03:00 level=INFO source=sched.go:500 msg="loaded runners" count=1
time=2025-11-12T12:15:04.108-03:00 level=INFO source=server.go:1251 msg="waiting for llama runner to start responding"
time=2025-11-12T12:15:04.108-03:00 level=INFO source=server.go:1289 msg="llama runner started in 0.31 seconds"
time=2025-11-12T12:15:04.108-03:00 level=DEBUG source=sched.go:512 msg="finished setting up" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=99034 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192
time=2025-11-12T12:15:04.111-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32
time=2025-11-12T12:15:04.114-03:00 level=DEBUG source=cache.go:104 msg="loading cache slot" id=0 cache=0 prompt=901 used=0 remaining=901
decode: cannot decode batches with this context (calling encode() instead)
init: embeddings required but some input tokens were not marked as outputs -> overriding
output_reserve: reallocating output buffer from size 0.12 MiB to 61.11 MiB
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_mul_mm_f16_f32', name = 'kernel_mul_mm_f16_f32_bci=0_bco=0'
ggml_metal_library_compile_pipeline: loaded kernel_mul_mm_f16_f32_bci=0_bco=0 0x103133da0 | th_max = 1024 | th_width = 32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_rope_neox_f32', name = 'kernel_rope_neox_f32'
ggml_metal_library_compile_pipeline: loaded kernel_rope_neox_f32 0x1031346e0 | th_max = 1024 | th_width = 32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_cpy_f32_f32', name = 'kernel_cpy_f32_f32'
ggml_metal_library_compile_pipeline: loaded kernel_cpy_f32_f32 0x1031351e0 | th_max = 1024 | th_width = 32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_mul_mm_f32_f32', name = 'kernel_mul_mm_f32_f32_bci=0_bco=0'
ggml_metal_library_compile_pipeline: loaded kernel_mul_mm_f32_f32_bci=0_bco=0 0x103136360 | th_max = 896 | th_width = 32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_soft_max_f32_4', name = 'kernel_soft_max_f32_4'
ggml_metal_library_compile_pipeline: loaded kernel_soft_max_f32_4 0x103136b60 | th_max = 1024 | th_width = 32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_add_fuse_1', name = 'kernel_add_fuse_1'
ggml_metal_library_compile_pipeline: loaded kernel_add_fuse_1 0x103136e60 | th_max = 1024 | th_width = 32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_norm_mul_add_f32_4', name = 'kernel_norm_mul_add_f32_4'
ggml_metal_library_compile_pipeline: loaded kernel_norm_mul_add_f32_4 0x103137320 | th_max = 1024 | th_width = 32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_swiglu_f32', name = 'kernel_swiglu_f32'
ggml_metal_library_compile_pipeline: loaded kernel_swiglu_f32 0xaba94c000 | th_max = 1024 | th_width = 32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_get_rows_f32', name = 'kernel_get_rows_f32'
ggml_metal_library_compile_pipeline: loaded kernel_get_rows_f32 0xaba94c300 | th_max = 1024 | th_width = 32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_mul_mv_f32_f32_4', name = 'kernel_mul_mv_f32_f32_4_nsg=4'
ggml_metal_library_compile_pipeline: loaded kernel_mul_mv_f32_f32_4_nsg=4 0xaba94c600 | th_max = 768 | th_width = 32
decode: cannot decode batches with this context (calling encode() instead)
init: embeddings required but some input tokens were not marked as outputs -> overriding
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_mul_mm_f16_f32', name = 'kernel_mul_mm_f16_f32_bci=0_bco=1'
ggml_metal_library_compile_pipeline: loaded kernel_mul_mm_f16_f32_bci=0_bco=1 0xaba94c900 | th_max = 896 | th_width = 32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_mul_mm_f32_f32', name = 'kernel_mul_mm_f32_f32_bci=0_bco=1'
ggml_metal_library_compile_pipeline: loaded kernel_mul_mm_f32_f32_bci=0_bco=1 0xaba94cc00 | th_max = 832 | th_width = 32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_soft_max_f32', name = 'kernel_soft_max_f32'
ggml_metal_library_compile_pipeline: loaded kernel_soft_max_f32 0xaba94cf00 | th_max = 1024 | th_width = 32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_mul_mm_f32_f32', name = 'kernel_mul_mm_f32_f32_bci=1_bco=1'
ggml_metal_library_compile_pipeline: loaded kernel_mul_mm_f32_f32_bci=1_bco=1 0xaba94d200 | th_max = 832 | th_width = 32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_mul_mv_f32_f32', name = 'kernel_mul_mv_f32_f32_nsg=4'
ggml_metal_library_compile_pipeline: loaded kernel_mul_mv_f32_f32_nsg=4 0xaba94d500 | th_max = 1024 | th_width = 32
SIGTRAP: trace trap
PC=0x181ddab1c m=0 sigcode=0
signal arrived during cgo execution

goroutine 40 gp=0x14000102fc0 m=0 mp=0x1027b0120 [syscall]:
runtime.cgocall(0x1017b0fbc, 0x1400052cb88)
	runtime/cgocall.go:167 +0x44 fp=0x1400052cb50 sp=0x1400052cb10 pc=0x100cb6684
github.com/ollama/ollama/llama._Cfunc_llama_decode(0x103127260, {0x185, 0xabac95000, 0x0, 0x10312d0e0, 0x10312d8e0, 0xaba852800, 0x103128260})
	_cgo_gotypes.go:674 +0x30 fp=0x1400052cb80 sp=0x1400052cb50 pc=0x100ffd790
github.com/ollama/ollama/llama.(*Context).Decode.func1(...)
	github.com/ollama/ollama/llama/llama.go:160
github.com/ollama/ollama/llama.(*Context).Decode(0x140000dec08?, 0x0?)
	github.com/ollama/ollama/llama/llama.go:160 +0xcc fp=0x1400052cc70 sp=0x1400052cb80 pc=0x100fff94c
github.com/ollama/ollama/runner/llamarunner.(*Server).processBatch(0x14000530140, 0x140001f6af0, 0x1400052cf18)
	github.com/ollama/ollama/runner/llamarunner/runner.go:468 +0x1d4 fp=0x1400052ced0 sp=0x1400052cc70 pc=0x10109f774
github.com/ollama/ollama/runner/llamarunner.(*Server).run(0x14000530140, {0x101ebbcd0, 0x1400004e0f0})
	github.com/ollama/ollama/runner/llamarunner/runner.go:361 +0x15c fp=0x1400052cfa0 sp=0x1400052ced0 pc=0x10109f43c
github.com/ollama/ollama/runner/llamarunner.Execute.gowrap1()
	github.com/ollama/ollama/runner/llamarunner/runner.go:926 +0x2c fp=0x1400052cfd0 sp=0x1400052cfa0 pc=0x1010a324c
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400052cfd0 sp=0x1400052cfd0 pc=0x100cc1d04
created by github.com/ollama/ollama/runner/llamarunner.Execute in goroutine 1
	github.com/ollama/ollama/runner/llamarunner/runner.go:926 +0x418

goroutine 1 gp=0x140000021c0 m=nil [IO wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x140002af720 sp=0x140002af700 pc=0x100cb9a50
runtime.netpollblock(0x1400011f7b8?, 0xd3ba6c?, 0x1?)
	runtime/netpoll.go:575 +0x150 fp=0x140002af760 sp=0x140002af720 pc=0x100c7f620
internal/poll.runtime_pollWait(0x12e068000, 0x72)
	runtime/netpoll.go:351 +0xa0 fp=0x140002af790 sp=0x140002af760 pc=0x100cb8c80
internal/poll.(*pollDesc).wait(0x14000250a80?, 0x100d3dafc?, 0x0)
	internal/poll/fd_poll_runtime.go:84 +0x28 fp=0x140002af7c0 sp=0x140002af790 pc=0x100d37568
internal/poll.(*pollDesc).waitRead(...)
	internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Accept(0x14000250a80)
	internal/poll/fd_unix.go:613 +0x21c fp=0x140002af870 sp=0x140002af7c0 pc=0x100d3bb4c
net.(*netFD).accept(0x14000250a80)
	net/fd_unix.go:161 +0x28 fp=0x140002af930 sp=0x140002af870 pc=0x100d9d7f8
net.(*TCPListener).accept(0x140001ed780)
	net/tcpsock_posix.go:159 +0x24 fp=0x140002af980 sp=0x140002af930 pc=0x100db0e64
net.(*TCPListener).Accept(0x140001ed780)
	net/tcpsock.go:380 +0x2c fp=0x140002af9c0 sp=0x140002af980 pc=0x100daff0c
net/http.(*onceCloseListener).Accept(0x1400053c090?)
	<autogenerated>:1 +0x2c fp=0x140002af9e0 sp=0x140002af9c0 pc=0x100f84a1c
net/http.(*Server).Serve(0x140000a6200, {0x101eb9668, 0x140001ed780})
	net/http/server.go:3463 +0x24c fp=0x140002afb10 sp=0x140002af9e0 pc=0x100f5fbac
github.com/ollama/ollama/runner/llamarunner.Execute({0x140001801a0, 0x4, 0x4})
	github.com/ollama/ollama/runner/llamarunner/runner.go:947 +0x754 fp=0x140002afce0 sp=0x140002afb10 pc=0x1010a3064
github.com/ollama/ollama/runner.Execute({0x14000180190?, 0x0?, 0x0?})
	github.com/ollama/ollama/runner/runner.go:22 +0x128 fp=0x140002afd10 sp=0x140002afce0 pc=0x10111a528
github.com/ollama/ollama/cmd.NewCLI.func2(0x14000273400?, {0x101a07af0?, 0x4?, 0x101a07af4?})
	github.com/ollama/ollama/cmd/cmd.go:1841 +0x50 fp=0x140002afd40 sp=0x140002afd10 pc=0x101761db0
github.com/spf13/cobra.(*Command).execute(0x140000bf508, {0x14000113200, 0x4, 0x4})
	github.com/spf13/cobra@v1.7.0/command.go:940 +0x5e0 fp=0x140002afe60 sp=0x140002afd40 pc=0x100e07f60
github.com/spf13/cobra.(*Command).ExecuteC(0x1400027bb08)
	github.com/spf13/cobra@v1.7.0/command.go:1068 +0x2ec fp=0x140002aff20 sp=0x140002afe60 pc=0x100e0863c
github.com/spf13/cobra.(*Command).Execute(...)
	github.com/spf13/cobra@v1.7.0/command.go:992
github.com/spf13/cobra.(*Command).ExecuteContext(...)
	github.com/spf13/cobra@v1.7.0/command.go:985
main.main()
	github.com/ollama/ollama/main.go:12 +0x54 fp=0x140002aff40 sp=0x140002aff20 pc=0x1017628d4
runtime.main()
	runtime/proc.go:285 +0x278 fp=0x140002affd0 sp=0x140002aff40 pc=0x100c860d8
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x140002affd0 sp=0x140002affd0 pc=0x100cc1d04

goroutine 2 gp=0x14000002c40 m=nil [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x1400006cf90 sp=0x1400006cf70 pc=0x100cb9a50
runtime.goparkunlock(...)
	runtime/proc.go:466
runtime.forcegchelper()
	runtime/proc.go:373 +0xb4 fp=0x1400006cfd0 sp=0x1400006cf90 pc=0x100c86424
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400006cfd0 sp=0x1400006cfd0 pc=0x100cc1d04
created by runtime.init.7 in goroutine 1
	runtime/proc.go:361 +0x24

goroutine 18 gp=0x14000102380 m=nil [GC sweep wait]:
runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000068760 sp=0x14000068740 pc=0x100cb9a50
runtime.goparkunlock(...)
	runtime/proc.go:466
runtime.bgsweep(0x14000098000)
	runtime/mgcsweep.go:323 +0x104 fp=0x140000687b0 sp=0x14000068760 pc=0x100c70ee4
runtime.gcenable.gowrap1()
	runtime/mgc.go:212 +0x28 fp=0x140000687d0 sp=0x140000687b0 pc=0x100c64b38
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x140000687d0 sp=0x140000687d0 pc=0x100cc1d04
created by runtime.gcenable in goroutine 1
	runtime/mgc.go:212 +0x6c

goroutine 19 gp=0x14000102540 m=nil [GC scavenge wait]:
runtime.gopark(0x10000?, 0x101bc6590?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000068f60 sp=0x14000068f40 pc=0x100cb9a50
runtime.goparkunlock(...)
	runtime/proc.go:466
runtime.(*scavengerState).park(0x1027ad860)
	runtime/mgcscavenge.go:425 +0x5c fp=0x14000068f90 sp=0x14000068f60 pc=0x100c6e9fc
runtime.bgscavenge(0x14000098000)
	runtime/mgcscavenge.go:658 +0xac fp=0x14000068fb0 sp=0x14000068f90 pc=0x100c6ef9c
runtime.gcenable.gowrap2()
	runtime/mgc.go:213 +0x28 fp=0x14000068fd0 sp=0x14000068fb0 pc=0x100c64ad8
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x14000068fd0 sp=0x14000068fd0 pc=0x100cc1d04
created by runtime.gcenable in goroutine 1
	runtime/mgc.go:213 +0xac

goroutine 34 gp=0x14000186380 m=nil [finalizer wait]:
runtime.gopark(0x0?, 0x101ea5a18?, 0xb0?, 0x22?, 0x1000000010?)
	runtime/proc.go:460 +0xc0 fp=0x1400006c580 sp=0x1400006c560 pc=0x100cb9a50
runtime.runFinalizers()
	runtime/mfinal.go:210 +0x104 fp=0x1400006c7d0 sp=0x1400006c580 pc=0x100c63b24
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400006c7d0 sp=0x1400006c7d0 pc=0x100cc1d04
created by runtime.createfing in goroutine 1
	runtime/mfinal.go:172 +0x78

goroutine 35 gp=0x14000186e00 m=nil [cleanup wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000308740 sp=0x14000308720 pc=0x100cb9a50
runtime.goparkunlock(...)
	runtime/proc.go:466
runtime.(*cleanupQueue).dequeue(0x1027ae740)
	runtime/mcleanup.go:439 +0x110 fp=0x14000308780 sp=0x14000308740 pc=0x100c61010
runtime.runCleanups()
	runtime/mcleanup.go:635 +0x40 fp=0x140003087d0 sp=0x14000308780 pc=0x100c61820
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x140003087d0 sp=0x140003087d0 pc=0x100cc1d04
created by runtime.(*cleanupQueue).createGs in goroutine 1
	runtime/mcleanup.go:589 +0x108

goroutine 36 gp=0x14000187180 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000308f10 sp=0x14000308ef0 pc=0x100cb9a50
runtime.gcBgMarkWorker(0x140001836c0)
	runtime/mgc.go:1463 +0xe0 fp=0x14000308fb0 sp=0x14000308f10 pc=0x100c671b0
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x14000308fd0 sp=0x14000308fb0 pc=0x100c67098
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x14000308fd0 sp=0x14000308fd0 pc=0x100cc1d04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 3 gp=0x14000003500 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x1400006d710 sp=0x1400006d6f0 pc=0x100cb9a50
runtime.gcBgMarkWorker(0x140001836c0)
	runtime/mgc.go:1463 +0xe0 fp=0x1400006d7b0 sp=0x1400006d710 pc=0x100c671b0
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x1400006d7d0 sp=0x1400006d7b0 pc=0x100c67098
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400006d7d0 sp=0x1400006d7d0 pc=0x100cc1d04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 20 gp=0x14000102700 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000069710 sp=0x140000696f0 pc=0x100cb9a50
runtime.gcBgMarkWorker(0x140001836c0)
	runtime/mgc.go:1463 +0xe0 fp=0x140000697b0 sp=0x14000069710 pc=0x100c671b0
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x140000697d0 sp=0x140000697b0 pc=0x100c67098
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x140000697d0 sp=0x140000697d0 pc=0x100cc1d04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 37 gp=0x14000187340 m=nil [GC worker (idle)]:
runtime.gopark(0x581b812bbe909?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000309710 sp=0x140003096f0 pc=0x100cb9a50
runtime.gcBgMarkWorker(0x140001836c0)
	runtime/mgc.go:1463 +0xe0 fp=0x140003097b0 sp=0x14000309710 pc=0x100c671b0
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x140003097d0 sp=0x140003097b0 pc=0x100c67098
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x140003097d0 sp=0x140003097d0 pc=0x100cc1d04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 4 gp=0x140000036c0 m=nil [GC worker (idle)]:
runtime.gopark(0x581b812bcca9f?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x1400006df10 sp=0x1400006def0 pc=0x100cb9a50
runtime.gcBgMarkWorker(0x140001836c0)
	runtime/mgc.go:1463 +0xe0 fp=0x1400006dfb0 sp=0x1400006df10 pc=0x100c671b0
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x1400006dfd0 sp=0x1400006dfb0 pc=0x100c67098
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400006dfd0 sp=0x1400006dfd0 pc=0x100cc1d04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 21 gp=0x140001028c0 m=nil [GC worker (idle)]:
runtime.gopark(0x581b812bcc8fe?, 0x3?, 0x3e?, 0xbf?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000069f10 sp=0x14000069ef0 pc=0x100cb9a50
runtime.gcBgMarkWorker(0x140001836c0)
	runtime/mgc.go:1463 +0xe0 fp=0x14000069fb0 sp=0x14000069f10 pc=0x100c671b0
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x14000069fd0 sp=0x14000069fb0 pc=0x100c67098
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x14000069fd0 sp=0x14000069fd0 pc=0x100cc1d04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 38 gp=0x14000187500 m=nil [GC worker (idle)]:
runtime.gopark(0x581b812bbe986?, 0x1?, 0xed?, 0x17?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000309f10 sp=0x14000309ef0 pc=0x100cb9a50
runtime.gcBgMarkWorker(0x140001836c0)
	runtime/mgc.go:1463 +0xe0 fp=0x14000309fb0 sp=0x14000309f10 pc=0x100c671b0
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x14000309fd0 sp=0x14000309fb0 pc=0x100c67098
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x14000309fd0 sp=0x14000309fd0 pc=0x100cc1d04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 39 gp=0x140001876c0 m=nil [GC worker (idle)]:
runtime.gopark(0x581b812bbd4b0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x1400030a710 sp=0x1400030a6f0 pc=0x100cb9a50
runtime.gcBgMarkWorker(0x140001836c0)
	runtime/mgc.go:1463 +0xe0 fp=0x1400030a7b0 sp=0x1400030a710 pc=0x100c671b0
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x1400030a7d0 sp=0x1400030a7b0 pc=0x100c67098
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400030a7d0 sp=0x1400030a7d0 pc=0x100cc1d04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 22 gp=0x14000102a80 m=nil [GC worker (idle)]:
runtime.gopark(0x581b812bbec4a?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x1400052df10 sp=0x1400052def0 pc=0x100cb9a50
runtime.gcBgMarkWorker(0x140001836c0)
	runtime/mgc.go:1463 +0xe0 fp=0x1400052dfb0 sp=0x1400052df10 pc=0x100c671b0
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x1400052dfd0 sp=0x1400052dfb0 pc=0x100c67098
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400052dfd0 sp=0x1400052dfd0 pc=0x100cc1d04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 5 gp=0x14000003880 m=nil [GC worker (idle)]:
runtime.gopark(0x581b812bc5fd9?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x1400052bf10 sp=0x1400052bef0 pc=0x100cb9a50
runtime.gcBgMarkWorker(0x140001836c0)
	runtime/mgc.go:1463 +0xe0 fp=0x1400052bfb0 sp=0x1400052bf10 pc=0x100c671b0
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x1400052bfd0 sp=0x1400052bfb0 pc=0x100c67098
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400052bfd0 sp=0x1400052bfd0 pc=0x100cc1d04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 41 gp=0x14000103180 m=nil [chan receive]:
runtime.gopark(0x14000049868?, 0x100c9a594?, 0x98?, 0x98?, 0x100cbb9fc?)
	runtime/proc.go:460 +0xc0 fp=0x14000541850 sp=0x14000541830 pc=0x100cb9a50
runtime.chanrecv(0x1400040be30, 0x14000049a40, 0x1)
	runtime/chan.go:667 +0x428 fp=0x140005418d0 sp=0x14000541850 pc=0x100c54318
runtime.chanrecv1(0x14000016030?, 0x140000e6008?)
	runtime/chan.go:509 +0x14 fp=0x14000541900 sp=0x140005418d0 pc=0x100c53eb4
github.com/ollama/ollama/runner/llamarunner.(*Server).embeddings(0x14000530140, {0x101eb9848, 0x140004a6f00}, 0x140004e8780)
	github.com/ollama/ollama/runner/llamarunner/runner.go:758 +0x5a8 fp=0x14000541a90 sp=0x14000541900 pc=0x1010a1728
github.com/ollama/ollama/runner/llamarunner.(*Server).embeddings-fm({0x101eb9848?, 0x140004a6f00?}, 0x14000049b18?)
	<autogenerated>:1 +0x40 fp=0x14000541ac0 sp=0x14000541a90 pc=0x1010a3570
net/http.HandlerFunc.ServeHTTP(0x14000536000?, {0x101eb9848?, 0x140004a6f00?}, 0x14000049b00?)
	net/http/server.go:2322 +0x38 fp=0x14000541af0 sp=0x14000541ac0 pc=0x100f5c7e8
net/http.(*ServeMux).ServeHTTP(0x10?, {0x101eb9848, 0x140004a6f00}, 0x140004e8780)
	net/http/server.go:2861 +0x190 fp=0x14000541b40 sp=0x14000541af0 pc=0x100f5e280
net/http.serverHandler.ServeHTTP({0x101eb62b0?}, {0x101eb9848?, 0x140004a6f00?}, 0x1?)
	net/http/server.go:3340 +0xb0 fp=0x14000541b70 sp=0x14000541b40 pc=0x100f78a70
net/http.(*conn).serve(0x1400053c090, {0x101ebbc98, 0x1400052e360})
	net/http/server.go:2109 +0x528 fp=0x14000541fa0 sp=0x14000541b70 pc=0x100f5abd8
net/http.(*Server).Serve.gowrap3()
	net/http/server.go:3493 +0x2c fp=0x14000541fd0 sp=0x14000541fa0 pc=0x100f5ff0c
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x14000541fd0 sp=0x14000541fd0 pc=0x100cc1d04
created by net/http.(*Server).Serve in goroutine 1
	net/http/server.go:3493 +0x384

goroutine 7 gp=0x14000187dc0 m=nil [IO wait]:
runtime.gopark(0xffffffffffffffff?, 0xffffffffffffffff?, 0x23?, 0x0?, 0x100cdd600?)
	runtime/proc.go:460 +0xc0 fp=0x14000305580 sp=0x14000305560 pc=0x100cb9a50
runtime.netpollblock(0x0?, 0x0?, 0x0?)
	runtime/netpoll.go:575 +0x150 fp=0x140003055c0 sp=0x14000305580 pc=0x100c7f620
internal/poll.runtime_pollWait(0x12e067e00, 0x72)
	runtime/netpoll.go:351 +0xa0 fp=0x140003055f0 sp=0x140003055c0 pc=0x100cb8c80
internal/poll.(*pollDesc).wait(0x14000250b00?, 0x140001ed7e1?, 0x0)
	internal/poll/fd_poll_runtime.go:84 +0x28 fp=0x14000305620 sp=0x140003055f0 pc=0x100d37568
internal/poll.(*pollDesc).waitRead(...)
	internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0x14000250b00, {0x140001ed7e1, 0x1, 0x1})
	internal/poll/fd_unix.go:165 +0x1e0 fp=0x140003056c0 sp=0x14000305620 pc=0x100d38780
net.(*netFD).Read(0x14000250b00, {0x140001ed7e1?, 0x1026e9450?, 0x140001ed894?})
	net/fd_posix.go:68 +0x28 fp=0x14000305710 sp=0x140003056c0 pc=0x100d9bff8
net.(*conn).Read(0x14000140128, {0x140001ed7e1?, 0x0?, 0x0?})
	net/net.go:196 +0x34 fp=0x14000305760 sp=0x14000305710 pc=0x100da85d4
net/http.(*connReader).backgroundRead(0x140001ed7c0)
	net/http/server.go:702 +0x38 fp=0x140003057b0 sp=0x14000305760 pc=0x100f55c48
net/http.(*connReader).startBackgroundRead.gowrap2()
	net/http/server.go:698 +0x28 fp=0x140003057d0 sp=0x140003057b0 pc=0x100f55b38
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x140003057d0 sp=0x140003057d0 pc=0x100cc1d04
created by net/http.(*connReader).startBackgroundRead in goroutine 41
	net/http/server.go:698 +0xb8

r0      0x102f34000
r1      0x102f37cc0
r2      0x0
r3      0x102f3b020
r4      0xabac63000
r5      0x0
r6      0xffffffffbfc007ff
r7      0xfffff0003ffff800
r8      0xabac63000
r9      0x0
r10     0x300100400b00
r11     0xbfb99984c02334b7
r12     0x3f140930bf8574a0
r13     0xdacb4deb4e9a747a
r14     0x102f80cb8
r15     0xabac60000
r16     0x2821b5e88
r17     0xffffffffb00007ff
r18     0x0
r19     0xc00
r20     0xc00405fe2c7
r21     0x0
r22     0x1031273a0
r23     0x0
r24     0x300
r25     0x300
r26     0x185
r27     0x1031273a0
r28     0x0
r29     0x16f1becf0
lr      0x181f6ca78
sp      0x16f1bec80
pc      0x181ddab1c
fault   0x181ddab1c

[GIN] 2025/11/12 - 12:15:04 | 500 | 411.454458ms | 192.168.1.3 | POST "/api/embed"
```
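Regarding question 1 above (capping the runner at the model's trained context), one possible mitigation is a derived model whose `num_ctx` matches `n_ctx_train`, so the runner is never launched with 8192. This is only a sketch under the assumption that the published model's baked-in `num_ctx 8192` parameter is what propagates to the runner; the model name `nomic-embed-2k` is made up for illustration:

```
FROM nomic-embed-text
PARAMETER num_ctx 2048
```

Then create and use it with `ollama create nomic-embed-2k -f Modelfile` and point the `/api/embed` requests at `nomic-embed-2k`. Whether this actually avoids the SIGTRAP has not been verified here.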

@rick-github commented on GitHub (Nov 12, 2025):

@rick-github Yes, this is not linux, as the title says.

I am aware. I am confirming that this affects macOS and not other operating systems. I am thereby reducing the scope of the problem to a known subset that are affected.

  • it fails above 512 sie

I am aware. You stated this in the original problem report.

  • model info says that it uses num_ctx 8192 and that is what I see it trying to pass, unless I craft my batches so that they are 512 or smaller. I am totally aware that model does NOT support 8192 itself. However it does seem to also not support the stated 2048

It is not trying to pass 8192 tokens. If the input is larger than the training context, ollama truncates it to the training context size. As you realize, the problem occurs on macOS when the input buffer is larger than 512 tokens. Have you observed this with any other embedding models? Does this happen if you force the model to load on the CPU with `num_gpu:0`?
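
For a per-request test, the `num_gpu` option can be sent in the `options` field of `/api/embed` (the same field also accepts `num_ctx`, which would cap the context for just that request). A minimal sketch of the payload; the helper name and defaults are illustrative, not part of any library:

```python
import json

def build_embed_request(model, inputs, num_ctx=2048, num_gpu=None):
    # Per-request model options: num_gpu=0 forces CPU inference,
    # num_ctx caps the context window for this request only.
    options = {"num_ctx": num_ctx}
    if num_gpu is not None:
        options["num_gpu"] = num_gpu
    return {"model": model, "input": inputs, "options": options}

payload = build_embed_request("nomic-embed-text", ["chunk one", "chunk two"],
                              num_ctx=2048, num_gpu=0)
print(json.dumps(payload))  # body for POST http://IP:11434/api/embed
```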


@smileBeda commented on GitHub (Nov 12, 2025):

> It is not trying to pass 8192 tokens. If the input is greater than the size of the training context, ollama truncates it to the size of the training context. As you realize, the problem occurs on macOS when the size of the input buffer is greater than 512 tokens.

That must be *exactly* what happened up until a given release (not sure which), because it "just worked" before and now fails ungracefully; the behavior changed somewhere along the way. (I am not sure whether macOS updates themselves are related, but I definitely updated Ollama recently, after which this started breaking.)

> Have you observed this with any other embedding models?

The problem was that I couldn't try others (at least not with my existing codebase), because the expected `embedding length` changes across models, and the vector database I use would fail on a change (and need a full re-index).
But for the sake of testing I spun up a curl with a set of large docs using `mxbai-embed-large` ... and sure enough, I get only `[GIN] 2025/11/12 - 13:36:23 | 200 | 63.503208ms | 192.168.1.3 | POST "/api/embed"`, no 500 error.

That is with a max chunk length of 2048, which as far as I understand is well beyond what this model supports, so it probably truncates here, as would be expected from nomic-embed-text too.

I made sure to re-pull nomic-embed-text, which brought no change.

> Does this happen if you force the model to load on the CPU with `num_gpu:0`?

Yes (with the faulty model)
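
Since inputs of 512 tokens or fewer reportedly work, a stopgap on the client side is to split each document into small chunks before embedding. A rough sketch; the whitespace split is only a crude stand-in for the model's real tokenizer, so a safety margin below 512 is advisable:

```python
def chunk_text(text, max_tokens=512):
    # Split into chunks of at most max_tokens whitespace-separated words.
    # Real WordPiece tokenization produces more tokens than words, so
    # treat max_tokens as an upper bound and leave headroom.
    words = text.split()
    return [" ".join(words[i:i + max_tokens])
            for i in range(0, len(words), max_tokens)]

chunks = chunk_text("word " * 1200, max_tokens=512)
# each chunk can then be POSTed to /api/embed individually
```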


@rick-github commented on GitHub (Nov 12, 2025):

Context length for mxbai-embed-large is 512 tokens, so it will not trigger this problem. bge-m3 has a context length of 8192. It's possible that this can be triggered from the CLI, which would make testing easier. What happens if you run:

```
ollama run nomic-embed-text "$(yes | head -1024)"
```

If that fails (i.e. doesn't produce an embedding array), try it with bge-m3:

```
ollama run bge-m3 "$(yes | head -1024)"
```

If the first fails and the second doesn't, then the scope of the problem has been narrowed to nomic-embed-text on macOS with more than 512 tokens.
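
To pin down the exact size at which the runner dies, one could bisect over the input length. A sketch, where `embed_ok` is a hypothetical predicate that shells out to `ollama run nomic-embed-text "$(yes | head -N)"` and reports whether an embedding came back:

```python
def find_threshold(embed_ok, lo=1, hi=2048):
    # Binary search for the largest input size for which embed_ok(n)
    # is True; everything above it is assumed to crash the runner.
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if embed_ok(mid):
            lo = mid
        else:
            hi = mid - 1
    return lo

# e.g. if the runner only survives inputs up to 512 tokens:
print(find_threshold(lambda n: n <= 512))  # -> 512
```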

>> Does this happen if you force the model to load on the CPU with `num_gpu:0`?
>
> Yes (with the faulty model)

Can you provide the log from this failure?


@smileBeda commented on GitHub (Nov 12, 2025):

```
$ ollama run nomic-embed-text "$(yes | head -1024)"
Error: do embedding request: Post "http://127.0.0.1:61135/embedding": EOF

$ ollama run bge-m3 "$(yes | head -1024)"
[-0.0024655391,0.012520963,-0.034661766,0.00979911,0.020126741,-0.0063052033, … ,0.045494065,-0.010330862,0.012248682]
```

> Can you provide the log from this failure?

Log up to the second failure:

OLLAMA_NUM_GPU=0 OLLAMA_HOST=0.0.0.0 OLLAMA_DEBUG=1 ollama serve
time=2025-11-12T14:26:20.612-03:00 level=INFO source=routes.go:1525 msg="server config" env="map[HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:DEBUG OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/Users/bedas/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false http_proxy: https_proxy: no_proxy:]"
time=2025-11-12T14:26:20.613-03:00 level=INFO source=images.go:522 msg="total blobs: 15"
time=2025-11-12T14:26:20.613-03:00 level=INFO source=images.go:529 msg="total unused blobs removed: 0"
time=2025-11-12T14:26:20.613-03:00 level=INFO source=routes.go:1578 msg="Listening on [::]:11434 (version 0.12.10)"
time=2025-11-12T14:26:20.614-03:00 level=DEBUG source=sched.go:120 msg="starting llm scheduler"
time=2025-11-12T14:26:20.614-03:00 level=INFO source=runner.go:67 msg="discovering available GPUs..."
time=2025-11-12T14:26:20.615-03:00 level=INFO source=server.go:400 msg="starting runner" cmd="/opt/homebrew/Cellar/ollama/0.12.10/bin/ollama runner --ollama-engine --port 61459"
time=2025-11-12T14:26:20.615-03:00 level=DEBUG source=server.go:401 msg=subprocess PATH="/opt/homebrew/bin:/opt/homebrew/bin:/opt/homebrew/sbin:/Users/bedas/.atuin/bin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin://Applications/Topaz Gigapixel AI.app/Contents/Resources/bin:/Library/Apple/usr/bin:/usr/local/MacGPG2/bin:/Users/bedas/.composer/vendor/bin" OLLAMA_NUM_GPU=0 OLLAMA_HOST=0.0.0.0 OLLAMA_DEBUG=1 DYLD_LIBRARY_PATH=/opt/homebrew/Cellar/ollama/0.12.10/bin OLLAMA_LIBRARY_PATH=/opt/homebrew/Cellar/ollama/0.12.10/bin
time=2025-11-12T14:26:20.666-03:00 level=DEBUG source=runner.go:415 msg="bootstrap discovery took" duration=52.30125ms OLLAMA_LIBRARY_PATH=[/opt/homebrew/Cellar/ollama/0.12.10/bin] extra_envs=map[]
time=2025-11-12T14:26:20.666-03:00 level=DEBUG source=runner.go:113 msg="evluating which if any devices to filter out" initial_count=1
time=2025-11-12T14:26:20.666-03:00 level=DEBUG source=runner.go:172 msg="adjusting filtering IDs" FilterID=0 new_ID=0
time=2025-11-12T14:26:20.666-03:00 level=DEBUG source=runner.go:40 msg="GPU bootstrap discovery took" duration=52.803875ms
time=2025-11-12T14:26:20.666-03:00 level=INFO source=types.go:42 msg="inference compute" id=0 filter_id=0 library=Metal compute=0.0 name=Metal description="Apple M2 Pro" libdirs="" driver=0.0 pci_id="" type=discrete total="11.8 GiB" available="11.8 GiB"
time=2025-11-12T14:26:20.666-03:00 level=INFO source=routes.go:1619 msg="entering low vram mode" "total vram"="11.8 GiB" threshold="20.0 GiB"
[GIN] 2025/11/12 - 14:26:35 | 200 |    2.496208ms |     192.168.1.3 | GET      "/api/tags"
time=2025-11-12T14:26:35.142-03:00 level=DEBUG source=runner.go:40 msg="overall device VRAM discovery took" duration=1.334µs
time=2025-11-12T14:26:35.142-03:00 level=DEBUG source=sched.go:194 msg="updating default concurrency" OLLAMA_MAX_LOADED_MODELS=3 gpu_count=1
time=2025-11-12T14:26:35.144-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32
time=2025-11-12T14:26:35.145-03:00 level=DEBUG source=sched.go:211 msg="loading first model" model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6
ggml_metal_library_init: using embedded metal library
ggml_metal_library_init: loaded in 0.006 sec
ggml_metal_device_init: GPU name:   Apple M2 Pro
ggml_metal_device_init: GPU family: MTLGPUFamilyApple8  (1008)
ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_device_init: GPU family: MTLGPUFamilyMetal4  (5002)
ggml_metal_device_init: simdgroup reduction   = true
ggml_metal_device_init: simdgroup matrix mul. = true
ggml_metal_device_init: has unified memory    = true
ggml_metal_device_init: has bfloat            = true
ggml_metal_device_init: use residency sets    = true
ggml_metal_device_init: use shared buffers    = true
ggml_metal_device_init: recommendedMaxWorkingSetSize  = 12713.12 MB
llama_model_load_from_file_impl: using device Metal (Apple M2 Pro) (unknown id) - 12123 MiB free
llama_model_loader: loaded meta data with 24 key-value pairs and 112 tensors from /Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = nomic-bert
llama_model_loader: - kv   1:                               general.name str              = nomic-embed-text-v1.5
llama_model_loader: - kv   2:                     nomic-bert.block_count u32              = 12
llama_model_loader: - kv   3:                  nomic-bert.context_length u32              = 2048
llama_model_loader: - kv   4:                nomic-bert.embedding_length u32              = 768
llama_model_loader: - kv   5:             nomic-bert.feed_forward_length u32              = 3072
llama_model_loader: - kv   6:            nomic-bert.attention.head_count u32              = 12
llama_model_loader: - kv   7:    nomic-bert.attention.layer_norm_epsilon f32              = 0.000000
llama_model_loader: - kv   8:                          general.file_type u32              = 1
llama_model_loader: - kv   9:                nomic-bert.attention.causal bool             = false
llama_model_loader: - kv  10:                    nomic-bert.pooling_type u32              = 1
llama_model_loader: - kv  11:                  nomic-bert.rope.freq_base f32              = 1000.000000
llama_model_loader: - kv  12:            tokenizer.ggml.token_type_count u32              = 2
llama_model_loader: - kv  13:                tokenizer.ggml.bos_token_id u32              = 101
llama_model_loader: - kv  14:                tokenizer.ggml.eos_token_id u32              = 102
llama_model_loader: - kv  15:                       tokenizer.ggml.model str              = bert
llama_model_loader: - kv  16:                      tokenizer.ggml.tokens arr[str,30522]   = ["[PAD]", "[unused0]", "[unused1]", "...
llama_model_loader: - kv  17:                      tokenizer.ggml.scores arr[f32,30522]   = [-1000.000000, -1000.000000, -1000.00...
llama_model_loader: - kv  18:                  tokenizer.ggml.token_type arr[i32,30522]   = [3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  19:            tokenizer.ggml.unknown_token_id u32              = 100
llama_model_loader: - kv  20:          tokenizer.ggml.seperator_token_id u32              = 102
llama_model_loader: - kv  21:            tokenizer.ggml.padding_token_id u32              = 0
llama_model_loader: - kv  22:                tokenizer.ggml.cls_token_id u32              = 101
llama_model_loader: - kv  23:               tokenizer.ggml.mask_token_id u32              = 103
llama_model_loader: - type  f32:   51 tensors
llama_model_loader: - type  f16:   61 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = F16
print_info: file size   = 260.86 MiB (16.00 BPW) 
init_tokenizer: initializing tokenizer for type 3
load: control token:    100 '[UNK]' is not marked as EOG
load: control token:    101 '[CLS]' is not marked as EOG
load: control token:      0 '[PAD]' is not marked as EOG
load: control token:    102 '[SEP]' is not marked as EOG
load: control token:    103 '[MASK]' is not marked as EOG
load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
load: printing all EOG tokens:
load:   - 102 ('[SEP]')
load: special tokens cache size = 5
load: token to piece cache size = 0.2032 MB
print_info: arch             = nomic-bert
print_info: vocab_only       = 1
print_info: model type       = ?B
print_info: model params     = 136.73 M
print_info: general.name     = nomic-embed-text-v1.5
print_info: vocab type       = WPM
print_info: n_vocab          = 30522
print_info: n_merges         = 0
print_info: BOS token        = 101 '[CLS]'
print_info: EOS token        = 102 '[SEP]'
print_info: UNK token        = 100 '[UNK]'
print_info: SEP token        = 102 '[SEP]'
print_info: PAD token        = 0 '[PAD]'
print_info: MASK token       = 103 '[MASK]'
print_info: LF token         = 0 '[PAD]'
print_info: EOG token        = 102 '[SEP]'
print_info: max token length = 21
llama_model_load: vocab only - skipping tensors
time=2025-11-12T14:26:35.193-03:00 level=WARN source=server.go:173 msg="requested context size too large for model" num_ctx=8192 n_ctx_train=2048
time=2025-11-12T14:26:35.193-03:00 level=INFO source=server.go:400 msg="starting runner" cmd="/opt/homebrew/Cellar/ollama/0.12.10/bin/ollama runner --model /Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 --port 61462"
time=2025-11-12T14:26:35.193-03:00 level=DEBUG source=server.go:401 msg=subprocess PATH="/opt/homebrew/bin:/opt/homebrew/bin:/opt/homebrew/sbin:/Users/bedas/.atuin/bin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin://Applications/Topaz Gigapixel AI.app/Contents/Resources/bin:/Library/Apple/usr/bin:/usr/local/MacGPG2/bin:/Users/bedas/.composer/vendor/bin" OLLAMA_NUM_GPU=0 OLLAMA_HOST=0.0.0.0 OLLAMA_DEBUG=1 DYLD_LIBRARY_PATH=/opt/homebrew/Cellar/ollama/0.12.10/bin OLLAMA_LIBRARY_PATH=/opt/homebrew/Cellar/ollama/0.12.10/bin
time=2025-11-12T14:26:35.196-03:00 level=INFO source=server.go:470 msg="system memory" total="16.0 GiB" free="4.6 GiB" free_swap="0 B"
time=2025-11-12T14:26:35.196-03:00 level=DEBUG source=memory.go:198 msg=evaluating library=Metal gpu_count=1 available="[11.8 GiB]"
time=2025-11-12T14:26:35.196-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.vision.block_count default=0
time=2025-11-12T14:26:35.196-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default=0
time=2025-11-12T14:26:35.196-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}"
time=2025-11-12T14:26:35.196-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}"
time=2025-11-12T14:26:35.196-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default=0
time=2025-11-12T14:26:35.196-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}"
time=2025-11-12T14:26:35.196-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}"
time=2025-11-12T14:26:35.196-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.key_length default=64
time=2025-11-12T14:26:35.196-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.value_length default=64
time=2025-11-12T14:26:35.196-03:00 level=DEBUG source=ggml.go:611 msg="default cache size estimate" "attention MiB"=6 "attention bytes"=6291456 "recurrent MiB"=0 "recurrent bytes"=0
time=2025-11-12T14:26:35.196-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default=0
time=2025-11-12T14:26:35.196-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}"
time=2025-11-12T14:26:35.196-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}"
time=2025-11-12T14:26:35.196-03:00 level=INFO source=memory.go:37 msg="new model will fit in available VRAM across minimum required GPUs, loading" model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 library=Metal parallel=1 required="809.4 MiB" gpus=1
time=2025-11-12T14:26:35.196-03:00 level=DEBUG source=memory.go:198 msg=evaluating library=Metal gpu_count=1 available="[11.8 GiB]"
time=2025-11-12T14:26:35.196-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.vision.block_count default=0
time=2025-11-12T14:26:35.196-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default=0
time=2025-11-12T14:26:35.196-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}"
time=2025-11-12T14:26:35.196-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}"
time=2025-11-12T14:26:35.196-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default=0
time=2025-11-12T14:26:35.196-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}"
time=2025-11-12T14:26:35.196-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}"
time=2025-11-12T14:26:35.196-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.key_length default=64
time=2025-11-12T14:26:35.196-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.value_length default=64
time=2025-11-12T14:26:35.196-03:00 level=DEBUG source=ggml.go:611 msg="default cache size estimate" "attention MiB"=6 "attention bytes"=6291456 "recurrent MiB"=0 "recurrent bytes"=0
time=2025-11-12T14:26:35.196-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default=0
time=2025-11-12T14:26:35.196-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}"
time=2025-11-12T14:26:35.196-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}"
time=2025-11-12T14:26:35.196-03:00 level=INFO source=server.go:522 msg=offload library=Metal layers.requested=-1 layers.model=13 layers.offload=13 layers.split=[13] memory.available="[11.8 GiB]" memory.gpu_overhead="0 B" memory.required.full="809.4 MiB" memory.required.partial="809.4 MiB" memory.required.kv="6.0 MiB" memory.required.allocations="[809.4 MiB]" memory.weights.total="260.9 MiB" memory.weights.repeating="216.1 MiB" memory.weights.nonrepeating="44.7 MiB" memory.graph.full="12.0 MiB" memory.graph.partial="12.0 MiB"
time=2025-11-12T14:26:35.209-03:00 level=INFO source=runner.go:910 msg="starting go runner"
time=2025-11-12T14:26:35.209-03:00 level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/opt/homebrew/Cellar/ollama/0.12.10/bin
ggml_metal_library_init: using embedded metal library
ggml_metal_library_init: loaded in 0.006 sec
ggml_metal_device_init: GPU name:   Apple M2 Pro
ggml_metal_device_init: GPU family: MTLGPUFamilyApple8  (1008)
ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_device_init: GPU family: MTLGPUFamilyMetal4  (5002)
ggml_metal_device_init: simdgroup reduction   = true
ggml_metal_device_init: simdgroup matrix mul. = true
ggml_metal_device_init: has unified memory    = true
ggml_metal_device_init: has bfloat            = true
ggml_metal_device_init: use residency sets    = true
ggml_metal_device_init: use shared buffers    = true
ggml_metal_device_init: recommendedMaxWorkingSetSize  = 12713.12 MB
time=2025-11-12T14:26:35.210-03:00 level=INFO source=ggml.go:104 msg=system Metal.0.EMBED_LIBRARY=1 CPU.0.NEON=1 CPU.0.ARM_FMA=1 CPU.0.FP16_VA=1 CPU.0.DOTPROD=1 CPU.0.LLAMAFILE=1 CPU.0.ACCELERATE=1 compiler=cgo(clang)
time=2025-11-12T14:26:35.231-03:00 level=INFO source=runner.go:946 msg="Server listening on 127.0.0.1:61462"
time=2025-11-12T14:26:35.240-03:00 level=INFO source=runner.go:845 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:false KvSize:2048 KvCacheType: NumThreads:6 GPULayers:13[ID:0 Layers:13(0..12)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:true}"
llama_model_load_from_file_impl: using device Metal (Apple M2 Pro) (unknown id) - 12123 MiB free
time=2025-11-12T14:26:35.240-03:00 level=INFO source=server.go:1251 msg="waiting for llama runner to start responding"
time=2025-11-12T14:26:35.241-03:00 level=INFO source=server.go:1285 msg="waiting for server to become available" status="llm server loading model"
llama_model_loader: loaded meta data with 24 key-value pairs and 112 tensors from /Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = nomic-bert
llama_model_loader: - kv   1:                               general.name str              = nomic-embed-text-v1.5
llama_model_loader: - kv   2:                     nomic-bert.block_count u32              = 12
llama_model_loader: - kv   3:                  nomic-bert.context_length u32              = 2048
llama_model_loader: - kv   4:                nomic-bert.embedding_length u32              = 768
llama_model_loader: - kv   5:             nomic-bert.feed_forward_length u32              = 3072
llama_model_loader: - kv   6:            nomic-bert.attention.head_count u32              = 12
llama_model_loader: - kv   7:    nomic-bert.attention.layer_norm_epsilon f32              = 0.000000
llama_model_loader: - kv   8:                          general.file_type u32              = 1
llama_model_loader: - kv   9:                nomic-bert.attention.causal bool             = false
llama_model_loader: - kv  10:                    nomic-bert.pooling_type u32              = 1
llama_model_loader: - kv  11:                  nomic-bert.rope.freq_base f32              = 1000.000000
llama_model_loader: - kv  12:            tokenizer.ggml.token_type_count u32              = 2
llama_model_loader: - kv  13:                tokenizer.ggml.bos_token_id u32              = 101
llama_model_loader: - kv  14:                tokenizer.ggml.eos_token_id u32              = 102
llama_model_loader: - kv  15:                       tokenizer.ggml.model str              = bert
llama_model_loader: - kv  16:                      tokenizer.ggml.tokens arr[str,30522]   = ["[PAD]", "[unused0]", "[unused1]", "...
llama_model_loader: - kv  17:                      tokenizer.ggml.scores arr[f32,30522]   = [-1000.000000, -1000.000000, -1000.00...
llama_model_loader: - kv  18:                  tokenizer.ggml.token_type arr[i32,30522]   = [3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  19:            tokenizer.ggml.unknown_token_id u32              = 100
llama_model_loader: - kv  20:          tokenizer.ggml.seperator_token_id u32              = 102
llama_model_loader: - kv  21:            tokenizer.ggml.padding_token_id u32              = 0
llama_model_loader: - kv  22:                tokenizer.ggml.cls_token_id u32              = 101
llama_model_loader: - kv  23:               tokenizer.ggml.mask_token_id u32              = 103
llama_model_loader: - type  f32:   51 tensors
llama_model_loader: - type  f16:   61 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = F16
print_info: file size   = 260.86 MiB (16.00 BPW) 
init_tokenizer: initializing tokenizer for type 3
load: control token:    100 '[UNK]' is not marked as EOG
load: control token:    101 '[CLS]' is not marked as EOG
load: control token:      0 '[PAD]' is not marked as EOG
load: control token:    102 '[SEP]' is not marked as EOG
load: control token:    103 '[MASK]' is not marked as EOG
load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
load: printing all EOG tokens:
load:   - 102 ('[SEP]')
load: special tokens cache size = 5
load: token to piece cache size = 0.2032 MB
print_info: arch             = nomic-bert
print_info: vocab_only       = 0
print_info: n_ctx_train      = 2048
print_info: n_embd           = 768
print_info: n_layer          = 12
print_info: n_head           = 12
print_info: n_head_kv        = 12
print_info: n_rot            = 64
print_info: n_swa            = 0
print_info: is_swa_any       = 0
print_info: n_embd_head_k    = 64
print_info: n_embd_head_v    = 64
print_info: n_gqa            = 1
print_info: n_embd_k_gqa     = 768
print_info: n_embd_v_gqa     = 768
print_info: f_norm_eps       = 1.0e-12
print_info: f_norm_rms_eps   = 0.0e+00
print_info: f_clamp_kqv      = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale    = 0.0e+00
print_info: f_attn_scale     = 0.0e+00
print_info: n_ff             = 3072
print_info: n_expert         = 0
print_info: n_expert_used    = 0
print_info: causal attn      = 0
print_info: pooling type     = 1
print_info: rope type        = 2
print_info: rope scaling     = linear
print_info: freq_base_train  = 1000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn  = 2048
print_info: rope_finetuned   = unknown
print_info: model type       = 137M
print_info: model params     = 136.73 M
print_info: general.name     = nomic-embed-text-v1.5
print_info: vocab type       = WPM
print_info: n_vocab          = 30522
print_info: n_merges         = 0
print_info: BOS token        = 101 '[CLS]'
print_info: EOS token        = 102 '[SEP]'
print_info: UNK token        = 100 '[UNK]'
print_info: SEP token        = 102 '[SEP]'
print_info: PAD token        = 0 '[PAD]'
print_info: MASK token       = 103 '[MASK]'
print_info: LF token         = 0 '[PAD]'
print_info: EOG token        = 102 '[SEP]'
print_info: max token length = 21
load_tensors: loading model tensors, this can take a while... (mmap = true)
load_tensors: layer   0 assigned to device Metal, is_swa = 0
load_tensors: layer   1 assigned to device Metal, is_swa = 0
load_tensors: layer   2 assigned to device Metal, is_swa = 0
load_tensors: layer   3 assigned to device Metal, is_swa = 0
load_tensors: layer   4 assigned to device Metal, is_swa = 0
load_tensors: layer   5 assigned to device Metal, is_swa = 0
load_tensors: layer   6 assigned to device Metal, is_swa = 0
load_tensors: layer   7 assigned to device Metal, is_swa = 0
load_tensors: layer   8 assigned to device Metal, is_swa = 0
load_tensors: layer   9 assigned to device Metal, is_swa = 0
load_tensors: layer  10 assigned to device Metal, is_swa = 0
load_tensors: layer  11 assigned to device Metal, is_swa = 0
load_tensors: layer  12 assigned to device Metal, is_swa = 0
create_tensor: loading tensor token_embd.weight
create_tensor: loading tensor token_types.weight
create_tensor: loading tensor token_embd_norm.weight
create_tensor: loading tensor token_embd_norm.bias
create_tensor: loading tensor blk.0.attn_qkv.weight
create_tensor: loading tensor blk.0.attn_output.weight
create_tensor: loading tensor blk.0.attn_output_norm.weight
create_tensor: loading tensor blk.0.attn_output_norm.bias
create_tensor: loading tensor blk.0.ffn_up.weight
create_tensor: loading tensor blk.0.ffn_down.weight
create_tensor: loading tensor blk.0.ffn_gate.weight
create_tensor: loading tensor blk.0.layer_output_norm.weight
create_tensor: loading tensor blk.0.layer_output_norm.bias
create_tensor: loading tensor blk.1.attn_qkv.weight
create_tensor: loading tensor blk.1.attn_output.weight
create_tensor: loading tensor blk.1.attn_output_norm.weight
create_tensor: loading tensor blk.1.attn_output_norm.bias
create_tensor: loading tensor blk.1.ffn_up.weight
create_tensor: loading tensor blk.1.ffn_down.weight
create_tensor: loading tensor blk.1.ffn_gate.weight
create_tensor: loading tensor blk.1.layer_output_norm.weight
create_tensor: loading tensor blk.1.layer_output_norm.bias
create_tensor: loading tensor blk.2.attn_qkv.weight
create_tensor: loading tensor blk.2.attn_output.weight
create_tensor: loading tensor blk.2.attn_output_norm.weight
create_tensor: loading tensor blk.2.attn_output_norm.bias
create_tensor: loading tensor blk.2.ffn_up.weight
create_tensor: loading tensor blk.2.ffn_down.weight
create_tensor: loading tensor blk.2.ffn_gate.weight
create_tensor: loading tensor blk.2.layer_output_norm.weight
create_tensor: loading tensor blk.2.layer_output_norm.bias
create_tensor: loading tensor blk.3.attn_qkv.weight
create_tensor: loading tensor blk.3.attn_output.weight
create_tensor: loading tensor blk.3.attn_output_norm.weight
create_tensor: loading tensor blk.3.attn_output_norm.bias
create_tensor: loading tensor blk.3.ffn_up.weight
create_tensor: loading tensor blk.3.ffn_down.weight
create_tensor: loading tensor blk.3.ffn_gate.weight
create_tensor: loading tensor blk.3.layer_output_norm.weight
create_tensor: loading tensor blk.3.layer_output_norm.bias
create_tensor: loading tensor blk.4.attn_qkv.weight
create_tensor: loading tensor blk.4.attn_output.weight
create_tensor: loading tensor blk.4.attn_output_norm.weight
create_tensor: loading tensor blk.4.attn_output_norm.bias
create_tensor: loading tensor blk.4.ffn_up.weight
create_tensor: loading tensor blk.4.ffn_down.weight
create_tensor: loading tensor blk.4.ffn_gate.weight
create_tensor: loading tensor blk.4.layer_output_norm.weight
create_tensor: loading tensor blk.4.layer_output_norm.bias
create_tensor: loading tensor blk.5.attn_qkv.weight
create_tensor: loading tensor blk.5.attn_output.weight
create_tensor: loading tensor blk.5.attn_output_norm.weight
create_tensor: loading tensor blk.5.attn_output_norm.bias
create_tensor: loading tensor blk.5.ffn_up.weight
create_tensor: loading tensor blk.5.ffn_down.weight
create_tensor: loading tensor blk.5.ffn_gate.weight
create_tensor: loading tensor blk.5.layer_output_norm.weight
create_tensor: loading tensor blk.5.layer_output_norm.bias
create_tensor: loading tensor blk.6.attn_qkv.weight
create_tensor: loading tensor blk.6.attn_output.weight
create_tensor: loading tensor blk.6.attn_output_norm.weight
create_tensor: loading tensor blk.6.attn_output_norm.bias
create_tensor: loading tensor blk.6.ffn_up.weight
create_tensor: loading tensor blk.6.ffn_down.weight
create_tensor: loading tensor blk.6.ffn_gate.weight
create_tensor: loading tensor blk.6.layer_output_norm.weight
create_tensor: loading tensor blk.6.layer_output_norm.bias
create_tensor: loading tensor blk.7.attn_qkv.weight
create_tensor: loading tensor blk.7.attn_output.weight
create_tensor: loading tensor blk.7.attn_output_norm.weight
create_tensor: loading tensor blk.7.attn_output_norm.bias
create_tensor: loading tensor blk.7.ffn_up.weight
create_tensor: loading tensor blk.7.ffn_down.weight
create_tensor: loading tensor blk.7.ffn_gate.weight
create_tensor: loading tensor blk.7.layer_output_norm.weight
create_tensor: loading tensor blk.7.layer_output_norm.bias
create_tensor: loading tensor blk.8.attn_qkv.weight
create_tensor: loading tensor blk.8.attn_output.weight
create_tensor: loading tensor blk.8.attn_output_norm.weight
create_tensor: loading tensor blk.8.attn_output_norm.bias
create_tensor: loading tensor blk.8.ffn_up.weight
create_tensor: loading tensor blk.8.ffn_down.weight
create_tensor: loading tensor blk.8.ffn_gate.weight
create_tensor: loading tensor blk.8.layer_output_norm.weight
create_tensor: loading tensor blk.8.layer_output_norm.bias
create_tensor: loading tensor blk.9.attn_qkv.weight
create_tensor: loading tensor blk.9.attn_output.weight
create_tensor: loading tensor blk.9.attn_output_norm.weight
create_tensor: loading tensor blk.9.attn_output_norm.bias
create_tensor: loading tensor blk.9.ffn_up.weight
create_tensor: loading tensor blk.9.ffn_down.weight
create_tensor: loading tensor blk.9.ffn_gate.weight
create_tensor: loading tensor blk.9.layer_output_norm.weight
create_tensor: loading tensor blk.9.layer_output_norm.bias
create_tensor: loading tensor blk.10.attn_qkv.weight
create_tensor: loading tensor blk.10.attn_output.weight
create_tensor: loading tensor blk.10.attn_output_norm.weight
create_tensor: loading tensor blk.10.attn_output_norm.bias
create_tensor: loading tensor blk.10.ffn_up.weight
create_tensor: loading tensor blk.10.ffn_down.weight
create_tensor: loading tensor blk.10.ffn_gate.weight
create_tensor: loading tensor blk.10.layer_output_norm.weight
create_tensor: loading tensor blk.10.layer_output_norm.bias
create_tensor: loading tensor blk.11.attn_qkv.weight
create_tensor: loading tensor blk.11.attn_output.weight
create_tensor: loading tensor blk.11.attn_output_norm.weight
create_tensor: loading tensor blk.11.attn_output_norm.bias
create_tensor: loading tensor blk.11.ffn_up.weight
create_tensor: loading tensor blk.11.ffn_down.weight
create_tensor: loading tensor blk.11.ffn_gate.weight
create_tensor: loading tensor blk.11.layer_output_norm.weight
create_tensor: loading tensor blk.11.layer_output_norm.bias
load_tensors: offloading 12 repeating layers to GPU
load_tensors: offloading output layer to GPU
load_tensors: offloaded 13/13 layers to GPU
load_tensors:   CPU_Mapped model buffer size =    44.72 MiB
load_tensors: Metal_Mapped model buffer size =   216.14 MiB
llama_init_from_model: model default pooling_type is [1], but [-1] was specified
llama_context: constructing llama_context
llama_context: n_seq_max     = 1
llama_context: n_ctx         = 2048
llama_context: n_ctx_per_seq = 2048
llama_context: n_batch       = 512
llama_context: n_ubatch      = 512
llama_context: causal_attn   = 0
llama_context: flash_attn    = disabled
llama_context: kv_unified    = false
llama_context: freq_base     = 1000.0
llama_context: freq_scale    = 1
ggml_metal_init: allocating
ggml_metal_init: picking default device: Apple M2 Pro
ggml_metal_init: use bfloat         = true
ggml_metal_init: use fusion         = true
ggml_metal_init: use concurrency    = true
ggml_metal_init: use graph optimize = true
set_abort_callback: call
llama_context:        CPU  output buffer size =     0.12 MiB
llama_context: enumerating backends
llama_context: backend_ptrs.size() = 3
llama_context: max_nodes = 1024
llama_context: worst-case: n_tokens = 512, n_seqs = 1, n_outputs = 1
graph_reserve: reserving a graph for ubatch with n_tokens =  512, n_seqs =  1, n_outputs =  512
graph_reserve: reserving a graph for ubatch with n_tokens =    1, n_seqs =  1, n_outputs =    1
graph_reserve: reserving a graph for ubatch with n_tokens =  512, n_seqs =  1, n_outputs =  512
llama_context:      Metal compute buffer size =    23.51 MiB
llama_context:        CPU compute buffer size =     4.01 MiB
llama_context: graph nodes  = 371
llama_context: graph splits = 2
time=2025-11-12T14:26:35.492-03:00 level=INFO source=server.go:1289 msg="llama runner started in 0.30 seconds"
time=2025-11-12T14:26:35.492-03:00 level=INFO source=sched.go:500 msg="loaded runners" count=1
time=2025-11-12T14:26:35.493-03:00 level=INFO source=server.go:1251 msg="waiting for llama runner to start responding"
time=2025-11-12T14:26:35.493-03:00 level=INFO source=server.go:1289 msg="llama runner started in 0.30 seconds"
time=2025-11-12T14:26:35.493-03:00 level=DEBUG source=sched.go:512 msg="finished setting up" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=11885 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192
time=2025-11-12T14:26:35.498-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32
time=2025-11-12T14:26:35.504-03:00 level=DEBUG source=cache.go:104 msg="loading cache slot" id=0 cache=0 prompt=495 used=0 remaining=495
decode: cannot decode batches with this context (calling encode() instead)
init: embeddings required but some input tokens were not marked as outputs -> overriding
output_reserve: reallocating output buffer from size 0.12 MiB to 59.08 MiB
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_mul_mm_f16_f32', name = 'kernel_mul_mm_f16_f32_bci=0_bco=1'
ggml_metal_library_compile_pipeline: loaded kernel_mul_mm_f16_f32_bci=0_bco=1             0x1044fd660 | th_max =  896 | th_width =   32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_rope_neox_f32', name = 'kernel_rope_neox_f32'
ggml_metal_library_compile_pipeline: loaded kernel_rope_neox_f32                          0x1044fdfa0 | th_max = 1024 | th_width =   32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_cpy_f32_f32', name = 'kernel_cpy_f32_f32'
ggml_metal_library_compile_pipeline: loaded kernel_cpy_f32_f32                            0x1044fe8e0 | th_max = 1024 | th_width =   32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_mul_mm_f32_f32', name = 'kernel_mul_mm_f32_f32_bci=0_bco=1'
ggml_metal_library_compile_pipeline: loaded kernel_mul_mm_f32_f32_bci=0_bco=1             0x1044ffa60 | th_max =  832 | th_width =   32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_soft_max_f32', name = 'kernel_soft_max_f32'
ggml_metal_library_compile_pipeline: loaded kernel_soft_max_f32                           0x104500260 | th_max = 1024 | th_width =   32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_mul_mm_f32_f32', name = 'kernel_mul_mm_f32_f32_bci=1_bco=1'
ggml_metal_library_compile_pipeline: loaded kernel_mul_mm_f32_f32_bci=1_bco=1             0x104500860 | th_max =  832 | th_width =   32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_add_fuse_1', name = 'kernel_add_fuse_1'
ggml_metal_library_compile_pipeline: loaded kernel_add_fuse_1                             0x104500b60 | th_max = 1024 | th_width =   32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_norm_mul_add_f32_4', name = 'kernel_norm_mul_add_f32_4'
ggml_metal_library_compile_pipeline: loaded kernel_norm_mul_add_f32_4                     0xbeef90000 | th_max = 1024 | th_width =   32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_swiglu_f32', name = 'kernel_swiglu_f32'
ggml_metal_library_compile_pipeline: loaded kernel_swiglu_f32                             0xbeef90300 | th_max = 1024 | th_width =   32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_get_rows_f32', name = 'kernel_get_rows_f32'
ggml_metal_library_compile_pipeline: loaded kernel_get_rows_f32                           0xbeef90600 | th_max = 1024 | th_width =   32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_mul_mv_f32_f32', name = 'kernel_mul_mv_f32_f32_nsg=4'
ggml_metal_library_compile_pipeline: loaded kernel_mul_mv_f32_f32_nsg=4                   0xbeef90900 | th_max = 1024 | th_width =   32
[GIN] 2025/11/12 - 14:26:35 | 200 |  451.770458ms |     192.168.1.3 | POST     "/api/embed"
time=2025-11-12T14:26:35.586-03:00 level=DEBUG source=sched.go:520 msg="context for request finished"
time=2025-11-12T14:26:35.586-03:00 level=DEBUG source=sched.go:290 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=11885 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192 duration=5m0s
time=2025-11-12T14:26:35.586-03:00 level=DEBUG source=sched.go:308 msg="after processing request finished event" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=11885 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192 refCount=0
time=2025-11-12T14:26:35.599-03:00 level=DEBUG source=sched.go:609 msg="evaluating already loaded" model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6
time=2025-11-12T14:26:35.602-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32
time=2025-11-12T14:26:35.603-03:00 level=DEBUG source=cache.go:104 msg="loading cache slot" id=0 cache=495 prompt=491 used=0 remaining=491
decode: cannot decode batches with this context (calling encode() instead)
init: embeddings required but some input tokens were not marked as outputs -> overriding
[GIN] 2025/11/12 - 14:26:35 | 200 |   62.181875ms |     192.168.1.3 | POST     "/api/embed"
time=2025-11-12T14:26:35.650-03:00 level=DEBUG source=sched.go:385 msg="context for request finished" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=11885 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192
time=2025-11-12T14:26:35.650-03:00 level=DEBUG source=sched.go:290 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=11885 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192 duration=5m0s
time=2025-11-12T14:26:35.650-03:00 level=DEBUG source=sched.go:308 msg="after processing request finished event" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=11885 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192 refCount=0
time=2025-11-12T14:26:35.660-03:00 level=DEBUG source=sched.go:609 msg="evaluating already loaded" model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6
time=2025-11-12T14:26:35.663-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32
time=2025-11-12T14:26:35.664-03:00 level=DEBUG source=cache.go:104 msg="loading cache slot" id=0 cache=491 prompt=452 used=0 remaining=452
decode: cannot decode batches with this context (calling encode() instead)
init: embeddings required but some input tokens were not marked as outputs -> overriding
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_soft_max_f32_4', name = 'kernel_soft_max_f32_4'
ggml_metal_library_compile_pipeline: loaded kernel_soft_max_f32_4                         0xbeef90c00 | th_max = 1024 | th_width =   32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_mul_mv_f32_f32_4', name = 'kernel_mul_mv_f32_f32_4_nsg=4'
ggml_metal_library_compile_pipeline: loaded kernel_mul_mv_f32_f32_4_nsg=4                 0xbeef90f00 | th_max =  768 | th_width =   32
[GIN] 2025/11/12 - 14:26:35 | 200 |   54.840042ms |     192.168.1.3 | POST     "/api/embed"
time=2025-11-12T14:26:35.707-03:00 level=DEBUG source=sched.go:385 msg="context for request finished" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=11885 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192
time=2025-11-12T14:26:35.707-03:00 level=DEBUG source=sched.go:290 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=11885 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192 duration=5m0s
time=2025-11-12T14:26:35.707-03:00 level=DEBUG source=sched.go:308 msg="after processing request finished event" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=11885 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192 refCount=0
time=2025-11-12T14:26:35.717-03:00 level=DEBUG source=sched.go:609 msg="evaluating already loaded" model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6
time=2025-11-12T14:26:35.719-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32
time=2025-11-12T14:26:35.720-03:00 level=DEBUG source=cache.go:104 msg="loading cache slot" id=0 cache=452 prompt=255 used=0 remaining=255
decode: cannot decode batches with this context (calling encode() instead)
init: embeddings required but some input tokens were not marked as outputs -> overriding
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_mul_mv_f32_f32', name = 'kernel_mul_mv_f32_f32_nsg=2'
ggml_metal_library_compile_pipeline: loaded kernel_mul_mv_f32_f32_nsg=2                   0xbeef91200 | th_max = 1024 | th_width =   32
[GIN] 2025/11/12 - 14:26:35 | 200 |   33.137375ms |     192.168.1.3 | POST     "/api/embed"
time=2025-11-12T14:26:35.741-03:00 level=DEBUG source=sched.go:385 msg="context for request finished" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=11885 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192
time=2025-11-12T14:26:35.741-03:00 level=DEBUG source=sched.go:290 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=11885 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192 duration=5m0s
time=2025-11-12T14:26:35.742-03:00 level=DEBUG source=sched.go:308 msg="after processing request finished event" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=11885 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192 refCount=0
time=2025-11-12T14:26:35.757-03:00 level=DEBUG source=sched.go:609 msg="evaluating already loaded" model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6
time=2025-11-12T14:26:35.759-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32
time=2025-11-12T14:26:35.760-03:00 level=DEBUG source=cache.go:104 msg="loading cache slot" id=0 cache=255 prompt=1872 used=0 remaining=1872
decode: cannot decode batches with this context (calling encode() instead)
init: embeddings required but some input tokens were not marked as outputs -> overriding
output_reserve: reallocating output buffer from size 59.08 MiB to 61.11 MiB
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_mul_mm_f16_f32', name = 'kernel_mul_mm_f16_f32_bci=0_bco=0'
ggml_metal_library_compile_pipeline: loaded kernel_mul_mm_f16_f32_bci=0_bco=0             0xbeef91500 | th_max = 1024 | th_width =   32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_mul_mm_f32_f32', name = 'kernel_mul_mm_f32_f32_bci=0_bco=0'
ggml_metal_library_compile_pipeline: loaded kernel_mul_mm_f32_f32_bci=0_bco=0             0xbeef91800 | th_max =  896 | th_width =   32
decode: cannot decode batches with this context (calling encode() instead)
init: embeddings required but some input tokens were not marked as outputs -> overriding
SIGTRAP: trace trap
PC=0x181ddab1c m=10 sigcode=0
signal arrived during cgo execution

goroutine 38 gp=0x14000102c40 m=10 mp=0x1400054c008 [syscall]:
runtime.cgocall(0x102c00fbc, 0x140004ceb88)
	runtime/cgocall.go:167 +0x44 fp=0x140004ceb50 sp=0x140004ceb10 pc=0x102106684
github.com/ollama/ollama/llama._Cfunc_llama_decode(0x1044ef1d0, {0x200, 0x1044f68f0, 0x0, 0x1044f70f0, 0xbee802000, 0xbef04fc00, 0x1044f0570})
	_cgo_gotypes.go:674 +0x30 fp=0x140004ceb80 sp=0x140004ceb50 pc=0x10244d790
github.com/ollama/ollama/llama.(*Context).Decode.func1(...)
	github.com/ollama/ollama/llama/llama.go:160
github.com/ollama/ollama/llama.(*Context).Decode(0x140004b8008?, 0x0?)
	github.com/ollama/ollama/llama/llama.go:160 +0xcc fp=0x140004cec70 sp=0x140004ceb80 pc=0x10244f94c
github.com/ollama/ollama/runner/llamarunner.(*Server).processBatch(0x1400034e280, 0x14000378550, 0x140004cef18)
	github.com/ollama/ollama/runner/llamarunner/runner.go:468 +0x1d4 fp=0x140004ceed0 sp=0x140004cec70 pc=0x1024ef774
github.com/ollama/ollama/runner/llamarunner.(*Server).run(0x1400034e280, {0x10330bcd0, 0x140003780a0})
	github.com/ollama/ollama/runner/llamarunner/runner.go:361 +0x15c fp=0x140004cefa0 sp=0x140004ceed0 pc=0x1024ef43c
github.com/ollama/ollama/runner/llamarunner.Execute.gowrap1()
	github.com/ollama/ollama/runner/llamarunner/runner.go:926 +0x2c fp=0x140004cefd0 sp=0x140004cefa0 pc=0x1024f324c
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x140004cefd0 sp=0x140004cefd0 pc=0x102111d04
created by github.com/ollama/ollama/runner/llamarunner.Execute in goroutine 1
	github.com/ollama/ollama/runner/llamarunner/runner.go:926 +0x418

goroutine 1 gp=0x140000021c0 m=nil [IO wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x140003c1720 sp=0x140003c1700 pc=0x102109a50
runtime.netpollblock(0x1400011f7b8?, 0x218ba6c?, 0x1?)
	runtime/netpoll.go:575 +0x150 fp=0x140003c1760 sp=0x140003c1720 pc=0x1020cf620
internal/poll.runtime_pollWait(0x12f44ba00, 0x72)
	runtime/netpoll.go:351 +0xa0 fp=0x140003c1790 sp=0x140003c1760 pc=0x102108c80
internal/poll.(*pollDesc).wait(0x1400047c100?, 0x10218dafc?, 0x0)
	internal/poll/fd_poll_runtime.go:84 +0x28 fp=0x140003c17c0 sp=0x140003c1790 pc=0x102187568
internal/poll.(*pollDesc).waitRead(...)
	internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Accept(0x1400047c100)
	internal/poll/fd_unix.go:613 +0x21c fp=0x140003c1870 sp=0x140003c17c0 pc=0x10218bb4c
net.(*netFD).accept(0x1400047c100)
	net/fd_unix.go:161 +0x28 fp=0x140003c1930 sp=0x140003c1870 pc=0x1021ed7f8
net.(*TCPListener).accept(0x1400012a0c0)
	net/tcpsock_posix.go:159 +0x24 fp=0x140003c1980 sp=0x140003c1930 pc=0x102200e64
net.(*TCPListener).Accept(0x1400012a0c0)
	net/tcpsock.go:380 +0x2c fp=0x140003c19c0 sp=0x140003c1980 pc=0x1021fff0c
net/http.(*onceCloseListener).Accept(0x14000568090?)
	<autogenerated>:1 +0x2c fp=0x140003c19e0 sp=0x140003c19c0 pc=0x1023d4a1c
net/http.(*Server).Serve(0x14000562100, {0x103309668, 0x1400012a0c0})
	net/http/server.go:3463 +0x24c fp=0x140003c1b10 sp=0x140003c19e0 pc=0x1023afbac
github.com/ollama/ollama/runner/llamarunner.Execute({0x14000034200, 0x4, 0x4})
	github.com/ollama/ollama/runner/llamarunner/runner.go:947 +0x754 fp=0x140003c1ce0 sp=0x140003c1b10 pc=0x1024f3064
github.com/ollama/ollama/runner.Execute({0x140000341f0?, 0x0?, 0x0?})
	github.com/ollama/ollama/runner/runner.go:22 +0x128 fp=0x140003c1d10 sp=0x140003c1ce0 pc=0x10256a528
github.com/ollama/ollama/cmd.NewCLI.func2(0x140001f5400?, {0x102e57af0?, 0x4?, 0x102e57af4?})
	github.com/ollama/ollama/cmd/cmd.go:1841 +0x50 fp=0x140003c1d40 sp=0x140003c1d10 pc=0x102bb1db0
github.com/spf13/cobra.(*Command).execute(0x140007a3808, {0x1400052fac0, 0x4, 0x4})
	github.com/spf13/cobra@v1.7.0/command.go:940 +0x5e0 fp=0x140003c1e60 sp=0x140003c1d40 pc=0x102257f60
github.com/spf13/cobra.(*Command).ExecuteC(0x14000536908)
	github.com/spf13/cobra@v1.7.0/command.go:1068 +0x2ec fp=0x140003c1f20 sp=0x140003c1e60 pc=0x10225863c
github.com/spf13/cobra.(*Command).Execute(...)
	github.com/spf13/cobra@v1.7.0/command.go:992
github.com/spf13/cobra.(*Command).ExecuteContext(...)
	github.com/spf13/cobra@v1.7.0/command.go:985
main.main()
	github.com/ollama/ollama/main.go:12 +0x54 fp=0x140003c1f40 sp=0x140003c1f20 pc=0x102bb28d4
runtime.main()
	runtime/proc.go:285 +0x278 fp=0x140003c1fd0 sp=0x140003c1f40 pc=0x1020d60d8
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x140003c1fd0 sp=0x140003c1fd0 pc=0x102111d04

goroutine 2 gp=0x14000002c40 m=nil [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x1400006cf90 sp=0x1400006cf70 pc=0x102109a50
runtime.goparkunlock(...)
	runtime/proc.go:466
runtime.forcegchelper()
	runtime/proc.go:373 +0xb4 fp=0x1400006cfd0 sp=0x1400006cf90 pc=0x1020d6424
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400006cfd0 sp=0x1400006cfd0 pc=0x102111d04
created by runtime.init.7 in goroutine 1
	runtime/proc.go:361 +0x24

goroutine 3 gp=0x14000003180 m=nil [GC sweep wait]:
runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x1400006d760 sp=0x1400006d740 pc=0x102109a50
runtime.goparkunlock(...)
	runtime/proc.go:466
runtime.bgsweep(0x14000098000)
	runtime/mgcsweep.go:323 +0x104 fp=0x1400006d7b0 sp=0x1400006d760 pc=0x1020c0ee4
runtime.gcenable.gowrap1()
	runtime/mgc.go:212 +0x28 fp=0x1400006d7d0 sp=0x1400006d7b0 pc=0x1020b4b38
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400006d7d0 sp=0x1400006d7d0 pc=0x102111d04
created by runtime.gcenable in goroutine 1
	runtime/mgc.go:212 +0x6c

goroutine 4 gp=0x14000003340 m=nil [GC scavenge wait]:
runtime.gopark(0x10000?, 0x103016590?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x1400006df60 sp=0x1400006df40 pc=0x102109a50
runtime.goparkunlock(...)
	runtime/proc.go:466
runtime.(*scavengerState).park(0x103bfd860)
	runtime/mgcscavenge.go:425 +0x5c fp=0x1400006df90 sp=0x1400006df60 pc=0x1020be9fc
runtime.bgscavenge(0x14000098000)
	runtime/mgcscavenge.go:658 +0xac fp=0x1400006dfb0 sp=0x1400006df90 pc=0x1020bef9c
runtime.gcenable.gowrap2()
	runtime/mgc.go:213 +0x28 fp=0x1400006dfd0 sp=0x1400006dfb0 pc=0x1020b4ad8
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400006dfd0 sp=0x1400006dfd0 pc=0x102111d04
created by runtime.gcenable in goroutine 1
	runtime/mgc.go:213 +0xac

goroutine 5 gp=0x14000003c00 m=nil [finalizer wait]:
runtime.gopark(0x0?, 0x1032f5a18?, 0xc0?, 0x0?, 0x1000000010?)
	runtime/proc.go:460 +0xc0 fp=0x1400006c580 sp=0x1400006c560 pc=0x102109a50
runtime.runFinalizers()
	runtime/mfinal.go:210 +0x104 fp=0x1400006c7d0 sp=0x1400006c580 pc=0x1020b3b24
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400006c7d0 sp=0x1400006c7d0 pc=0x102111d04
created by runtime.createfing in goroutine 1
	runtime/mfinal.go:172 +0x78

goroutine 6 gp=0x140001e0700 m=nil [cleanup wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x1400006e740 sp=0x1400006e720 pc=0x102109a50
runtime.goparkunlock(...)
	runtime/proc.go:466
runtime.(*cleanupQueue).dequeue(0x103bfe740)
	runtime/mcleanup.go:439 +0x110 fp=0x1400006e780 sp=0x1400006e740 pc=0x1020b1010
runtime.runCleanups()
	runtime/mcleanup.go:635 +0x40 fp=0x1400006e7d0 sp=0x1400006e780 pc=0x1020b1820
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400006e7d0 sp=0x1400006e7d0 pc=0x102111d04
created by runtime.(*cleanupQueue).createGs in goroutine 1
	runtime/mcleanup.go:589 +0x108

goroutine 7 gp=0x140001e0c40 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x1400006ef10 sp=0x1400006eef0 pc=0x102109a50
runtime.gcBgMarkWorker(0x140000a56c0)
	runtime/mgc.go:1463 +0xe0 fp=0x1400006efb0 sp=0x1400006ef10 pc=0x1020b71b0
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x1400006efd0 sp=0x1400006efb0 pc=0x1020b7098
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400006efd0 sp=0x1400006efd0 pc=0x102111d04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 18 gp=0x14000102380 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000068710 sp=0x140000686f0 pc=0x102109a50
runtime.gcBgMarkWorker(0x140000a56c0)
	runtime/mgc.go:1463 +0xe0 fp=0x140000687b0 sp=0x14000068710 pc=0x1020b71b0
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x140000687d0 sp=0x140000687b0 pc=0x1020b7098
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x140000687d0 sp=0x140000687d0 pc=0x102111d04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 34 gp=0x14000504000 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x1400050a710 sp=0x1400050a6f0 pc=0x102109a50
runtime.gcBgMarkWorker(0x140000a56c0)
	runtime/mgc.go:1463 +0xe0 fp=0x1400050a7b0 sp=0x1400050a710 pc=0x1020b71b0
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x1400050a7d0 sp=0x1400050a7b0 pc=0x1020b7098
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400050a7d0 sp=0x1400050a7d0 pc=0x102111d04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 35 gp=0x140005041c0 m=nil [GC worker (idle)]:
runtime.gopark(0x103c2f920?, 0x1?, 0xd4?, 0xad?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x140004c9f10 sp=0x140004c9ef0 pc=0x102109a50
runtime.gcBgMarkWorker(0x140000a56c0)
	runtime/mgc.go:1463 +0xe0 fp=0x140004c9fb0 sp=0x140004c9f10 pc=0x1020b71b0
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x140004c9fd0 sp=0x140004c9fb0 pc=0x1020b7098
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x140004c9fd0 sp=0x140004c9fd0 pc=0x102111d04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 36 gp=0x14000504380 m=nil [GC worker (idle)]:
runtime.gopark(0x588e585b5798a?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x1400050b710 sp=0x1400050b6f0 pc=0x102109a50
runtime.gcBgMarkWorker(0x140000a56c0)
	runtime/mgc.go:1463 +0xe0 fp=0x1400050b7b0 sp=0x1400050b710 pc=0x1020b71b0
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x1400050b7d0 sp=0x1400050b7b0 pc=0x1020b7098
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400050b7d0 sp=0x1400050b7d0 pc=0x102111d04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 8 gp=0x140001e0e00 m=nil [GC worker (idle)]:
runtime.gopark(0x588e585b58954?, 0x3?, 0xa9?, 0xbc?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x1400006f710 sp=0x1400006f6f0 pc=0x102109a50
runtime.gcBgMarkWorker(0x140000a56c0)
	runtime/mgc.go:1463 +0xe0 fp=0x1400006f7b0 sp=0x1400006f710 pc=0x1020b71b0
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x1400006f7d0 sp=0x1400006f7b0 pc=0x1020b7098
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400006f7d0 sp=0x1400006f7d0 pc=0x102111d04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 19 gp=0x14000102540 m=nil [GC worker (idle)]:
runtime.gopark(0x588e585b70054?, 0x3?, 0x5?, 0x6?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000068f10 sp=0x14000068ef0 pc=0x102109a50
runtime.gcBgMarkWorker(0x140000a56c0)
	runtime/mgc.go:1463 +0xe0 fp=0x14000068fb0 sp=0x14000068f10 pc=0x1020b71b0
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x14000068fd0 sp=0x14000068fb0 pc=0x1020b7098
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x14000068fd0 sp=0x14000068fd0 pc=0x102111d04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 37 gp=0x14000504540 m=nil [GC worker (idle)]:
runtime.gopark(0x588e585b57c78?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x1400050bf10 sp=0x1400050bef0 pc=0x102109a50
runtime.gcBgMarkWorker(0x140000a56c0)
	runtime/mgc.go:1463 +0xe0 fp=0x1400050bfb0 sp=0x1400050bf10 pc=0x1020b71b0
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x1400050bfd0 sp=0x1400050bfb0 pc=0x1020b7098
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400050bfd0 sp=0x1400050bfd0 pc=0x102111d04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 20 gp=0x14000102700 m=nil [GC worker (idle)]:
runtime.gopark(0x588e585b71194?, 0x1?, 0x7?, 0xcd?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000083f10 sp=0x14000083ef0 pc=0x102109a50
runtime.gcBgMarkWorker(0x140000a56c0)
	runtime/mgc.go:1463 +0xe0 fp=0x14000083fb0 sp=0x14000083f10 pc=0x1020b71b0
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x14000083fd0 sp=0x14000083fb0 pc=0x1020b7098
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x14000083fd0 sp=0x14000083fd0 pc=0x102111d04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 9 gp=0x140001e0fc0 m=nil [GC worker (idle)]:
runtime.gopark(0x588e585b771c5?, 0x1?, 0x98?, 0x3a?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x140004caf10 sp=0x140004caef0 pc=0x102109a50
runtime.gcBgMarkWorker(0x140000a56c0)
	runtime/mgc.go:1463 +0xe0 fp=0x140004cafb0 sp=0x140004caf10 pc=0x1020b71b0
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x140004cafd0 sp=0x140004cafb0 pc=0x1020b7098
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x140004cafd0 sp=0x140004cafd0 pc=0x102111d04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 39 gp=0x14000102e00 m=nil [chan receive]:
runtime.gopark(0x1400056d868?, 0x1020ea594?, 0x98?, 0xd8?, 0x10210b9fc?)
	runtime/proc.go:460 +0xc0 fp=0x1400056d850 sp=0x1400056d830 pc=0x102109a50
runtime.chanrecv(0x140003182a0, 0x1400056da40, 0x1)
	runtime/chan.go:667 +0x428 fp=0x1400056d8d0 sp=0x1400056d850 pc=0x1020a4318
runtime.chanrecv1(0x1400057e030?, 0x140004d8000?)
	runtime/chan.go:509 +0x14 fp=0x1400056d900 sp=0x1400056d8d0 pc=0x1020a3eb4
github.com/ollama/ollama/runner/llamarunner.(*Server).embeddings(0x1400034e280, {0x103309848, 0x140005424b0}, 0x1400037aa00)
	github.com/ollama/ollama/runner/llamarunner/runner.go:758 +0x5a8 fp=0x1400056da90 sp=0x1400056d900 pc=0x1024f1728
github.com/ollama/ollama/runner/llamarunner.(*Server).embeddings-fm({0x103309848?, 0x140005424b0?}, 0x1400056db18?)
	<autogenerated>:1 +0x40 fp=0x1400056dac0 sp=0x1400056da90 pc=0x1024f3570
net/http.HandlerFunc.ServeHTTP(0x14000796300?, {0x103309848?, 0x140005424b0?}, 0x1400056db00?)
	net/http/server.go:2322 +0x38 fp=0x1400056daf0 sp=0x1400056dac0 pc=0x1023ac7e8
net/http.(*ServeMux).ServeHTTP(0x10?, {0x103309848, 0x140005424b0}, 0x1400037aa00)
	net/http/server.go:2861 +0x190 fp=0x1400056db40 sp=0x1400056daf0 pc=0x1023ae280
net/http.serverHandler.ServeHTTP({0x1033062b0?}, {0x103309848?, 0x140005424b0?}, 0x1?)
	net/http/server.go:3340 +0xb0 fp=0x1400056db70 sp=0x1400056db40 pc=0x1023c8a70
net/http.(*conn).serve(0x14000568090, {0x10330bc98, 0x1400027cea0})
	net/http/server.go:2109 +0x528 fp=0x1400056dfa0 sp=0x1400056db70 pc=0x1023aabd8
net/http.(*Server).Serve.gowrap3()
	net/http/server.go:3493 +0x2c fp=0x1400056dfd0 sp=0x1400056dfa0 pc=0x1023aff0c
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400056dfd0 sp=0x1400056dfd0 pc=0x102111d04
created by net/http.(*Server).Serve in goroutine 1
	net/http/server.go:3493 +0x384

goroutine 52 gp=0x14000102fc0 m=nil [IO wait]:
runtime.gopark(0xffffffffffffffff?, 0xffffffffffffffff?, 0x23?, 0x0?, 0x10212d600?)
	runtime/proc.go:460 +0xc0 fp=0x1400050dd80 sp=0x1400050dd60 pc=0x102109a50
runtime.netpollblock(0x0?, 0x0?, 0x0?)
	runtime/netpoll.go:575 +0x150 fp=0x1400050ddc0 sp=0x1400050dd80 pc=0x1020cf620
internal/poll.runtime_pollWait(0x12f44b800, 0x72)
	runtime/netpoll.go:351 +0xa0 fp=0x1400050ddf0 sp=0x1400050ddc0 pc=0x102108c80
internal/poll.(*pollDesc).wait(0x1400047c180?, 0x1400012a121?, 0x0)
	internal/poll/fd_poll_runtime.go:84 +0x28 fp=0x1400050de20 sp=0x1400050ddf0 pc=0x102187568
internal/poll.(*pollDesc).waitRead(...)
	internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0x1400047c180, {0x1400012a121, 0x1, 0x1})
	internal/poll/fd_unix.go:165 +0x1e0 fp=0x1400050dec0 sp=0x1400050de20 pc=0x102188780
net.(*netFD).Read(0x1400047c180, {0x1400012a121?, 0x103b39450?, 0x1400012a1d4?})
	net/fd_posix.go:68 +0x28 fp=0x1400050df10 sp=0x1400050dec0 pc=0x1021ebff8
net.(*conn).Read(0x14000126030, {0x1400012a121?, 0x0?, 0x0?})
	net/net.go:196 +0x34 fp=0x1400050df60 sp=0x1400050df10 pc=0x1021f85d4
net/http.(*connReader).backgroundRead(0x1400012a100)
	net/http/server.go:702 +0x38 fp=0x1400050dfb0 sp=0x1400050df60 pc=0x1023a5c48
net/http.(*connReader).startBackgroundRead.gowrap2()
	net/http/server.go:698 +0x28 fp=0x1400050dfd0 sp=0x1400050dfb0 pc=0x1023a5b38
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400050dfd0 sp=0x1400050dfd0 pc=0x102111d04
created by net/http.(*connReader).startBackgroundRead in goroutine 39
	net/http/server.go:698 +0xb8

r0      0x104188000
r1      0x10418bcc0
r2      0x0
r3      0x10418f020
r4      0xbeec7f000
r5      0x0
r6      0xffffffffbfc007ff
r7      0xfffff0003ffff800
r8      0xbeec7f000
r9      0x0
r10     0x700300400b00
r11     0xbfb99984c02334b7
r12     0x3f140930bf8574a0
r13     0x238bd17ec34be941
r14     0x1041d4e08
r15     0xbeec7c000
r16     0x2821b5e88
r17     0xffffffffb00007ff
r18     0x0
r19     0xc00
r20     0xc0040b122c7
r21     0x0
r22     0x1044ef310
r23     0x0
r24     0x300
r25     0x300
r26     0x200
r27     0x1044ef310
r28     0x0
r29     0x1725cecd0
lr      0x181f6ca78
sp      0x1725cec60
pc      0x181ddab1c
fault   0x181ddab1c
[GIN] 2025/11/12 - 14:26:35 | 500 |   60.308792ms |     192.168.1.3 | POST     "/api/embed"
time=2025-11-12T14:26:35.810-03:00 level=DEBUG source=sched.go:385 msg="context for request finished" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=11885 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192
time=2025-11-12T14:26:35.810-03:00 level=DEBUG source=sched.go:290 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=11885 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192 duration=5m0s
time=2025-11-12T14:26:35.810-03:00 level=DEBUG source=sched.go:308 msg="after processing request finished event" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=11885 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192 refCount=0
time=2025-11-12T14:26:35.819-03:00 level=DEBUG source=sched.go:609 msg="evaluating already loaded" model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6
time=2025-11-12T14:26:35.819-03:00 level=DEBUG source=server.go:1241 msg="server unhealthy" error="llama runner process no longer running: 2 "
time=2025-11-12T14:26:35.819-03:00 level=DEBUG source=sched.go:609 msg="evaluating already loaded" model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6
time=2025-11-12T14:26:35.819-03:00 level=DEBUG source=server.go:1241 msg="server unhealthy" error="llama runner process no longer running: 2 "
time=2025-11-12T14:26:35.819-03:00 level=DEBUG source=sched.go:161 msg=reloading runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=11885 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192
time=2025-11-12T14:26:35.819-03:00 level=DEBUG source=sched.go:236 msg="resetting model to expire immediately to make room" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=11885 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192 refCount=0
time=2025-11-12T14:26:35.819-03:00 level=DEBUG source=sched.go:247 msg="waiting for pending requests to complete and unload to occur" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=11885 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192
time=2025-11-12T14:26:35.819-03:00 level=DEBUG source=sched.go:311 msg="runner expired event received" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=11885 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192
time=2025-11-12T14:26:35.819-03:00 level=DEBUG source=sched.go:326 msg="got lock to unload expired event" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=11885 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192
time=2025-11-12T14:26:35.819-03:00 level=DEBUG source=sched.go:349 msg="starting background wait for VRAM recovery" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=11885 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192
time=2025-11-12T14:26:35.819-03:00 level=DEBUG source=sched.go:657 msg="no need to wait for VRAM recovery" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=11885 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192
time=2025-11-12T14:26:35.820-03:00 level=DEBUG source=server.go:1699 msg="stopping llama server" pid=11885
time=2025-11-12T14:26:35.820-03:00 level=DEBUG source=sched.go:358 msg="runner terminated and removed from list, blocking for VRAM recovery" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=11885 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6
time=2025-11-12T14:26:35.820-03:00 level=DEBUG source=sched.go:361 msg="sending an unloaded event" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=11885 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6
time=2025-11-12T14:26:35.820-03:00 level=DEBUG source=sched.go:253 msg="unload completed" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=11885 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6
time=2025-11-12T14:26:35.820-03:00 level=DEBUG source=runner.go:40 msg="overall device VRAM discovery took" duration=709ns
time=2025-11-12T14:26:35.822-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32
time=2025-11-12T14:26:35.823-03:00 level=DEBUG source=sched.go:211 msg="loading first model" model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6
llama_model_load_from_file_impl: using device Metal (Apple M2 Pro) (unknown id) - 12123 MiB free
llama_model_loader: loaded meta data with 24 key-value pairs and 112 tensors from /Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = nomic-bert
llama_model_loader: - kv   1:                               general.name str              = nomic-embed-text-v1.5
llama_model_loader: - kv   2:                     nomic-bert.block_count u32              = 12
llama_model_loader: - kv   3:                  nomic-bert.context_length u32              = 2048
llama_model_loader: - kv   4:                nomic-bert.embedding_length u32              = 768
llama_model_loader: - kv   5:             nomic-bert.feed_forward_length u32              = 3072
llama_model_loader: - kv   6:            nomic-bert.attention.head_count u32              = 12
llama_model_loader: - kv   7:    nomic-bert.attention.layer_norm_epsilon f32              = 0.000000
llama_model_loader: - kv   8:                          general.file_type u32              = 1
llama_model_loader: - kv   9:                nomic-bert.attention.causal bool             = false
llama_model_loader: - kv  10:                    nomic-bert.pooling_type u32              = 1
llama_model_loader: - kv  11:                  nomic-bert.rope.freq_base f32              = 1000.000000
llama_model_loader: - kv  12:            tokenizer.ggml.token_type_count u32              = 2
llama_model_loader: - kv  13:                tokenizer.ggml.bos_token_id u32              = 101
llama_model_loader: - kv  14:                tokenizer.ggml.eos_token_id u32              = 102
llama_model_loader: - kv  15:                       tokenizer.ggml.model str              = bert
llama_model_loader: - kv  16:                      tokenizer.ggml.tokens arr[str,30522]   = ["[PAD]", "[unused0]", "[unused1]", "...
llama_model_loader: - kv  17:                      tokenizer.ggml.scores arr[f32,30522]   = [-1000.000000, -1000.000000, -1000.00...
llama_model_loader: - kv  18:                  tokenizer.ggml.token_type arr[i32,30522]   = [3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  19:            tokenizer.ggml.unknown_token_id u32              = 100
llama_model_loader: - kv  20:          tokenizer.ggml.seperator_token_id u32              = 102
llama_model_loader: - kv  21:            tokenizer.ggml.padding_token_id u32              = 0
llama_model_loader: - kv  22:                tokenizer.ggml.cls_token_id u32              = 101
llama_model_loader: - kv  23:               tokenizer.ggml.mask_token_id u32              = 103
llama_model_loader: - type  f32:   51 tensors
llama_model_loader: - type  f16:   61 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = F16
print_info: file size   = 260.86 MiB (16.00 BPW) 
init_tokenizer: initializing tokenizer for type 3
load: control token:    100 '[UNK]' is not marked as EOG
load: control token:    101 '[CLS]' is not marked as EOG
load: control token:      0 '[PAD]' is not marked as EOG
load: control token:    102 '[SEP]' is not marked as EOG
load: control token:    103 '[MASK]' is not marked as EOG
load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
load: printing all EOG tokens:
load:   - 102 ('[SEP]')
load: special tokens cache size = 5
load: token to piece cache size = 0.2032 MB
print_info: arch             = nomic-bert
print_info: vocab_only       = 1
print_info: model type       = ?B
print_info: model params     = 136.73 M
print_info: general.name     = nomic-embed-text-v1.5
print_info: vocab type       = WPM
print_info: n_vocab          = 30522
print_info: n_merges         = 0
print_info: BOS token        = 101 '[CLS]'
print_info: EOS token        = 102 '[SEP]'
print_info: UNK token        = 100 '[UNK]'
print_info: SEP token        = 102 '[SEP]'
print_info: PAD token        = 0 '[PAD]'
print_info: MASK token       = 103 '[MASK]'
print_info: LF token         = 0 '[PAD]'
print_info: EOG token        = 102 '[SEP]'
print_info: max token length = 21
llama_model_load: vocab only - skipping tensors
time=2025-11-12T14:26:35.844-03:00 level=WARN source=server.go:173 msg="requested context size too large for model" num_ctx=8192 n_ctx_train=2048
time=2025-11-12T14:26:35.845-03:00 level=INFO source=server.go:400 msg="starting runner" cmd="/opt/homebrew/Cellar/ollama/0.12.10/bin/ollama runner --model /Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 --port 61468"
time=2025-11-12T14:26:35.845-03:00 level=DEBUG source=server.go:401 msg=subprocess PATH="/opt/homebrew/bin:/opt/homebrew/bin:/opt/homebrew/sbin:/Users/bedas/.atuin/bin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin://Applications/Topaz Gigapixel AI.app/Contents/Resources/bin:/Library/Apple/usr/bin:/usr/local/MacGPG2/bin:/Users/bedas/.composer/vendor/bin" OLLAMA_NUM_GPU=0 OLLAMA_HOST=0.0.0.0 OLLAMA_DEBUG=1 DYLD_LIBRARY_PATH=/opt/homebrew/Cellar/ollama/0.12.10/bin OLLAMA_LIBRARY_PATH=/opt/homebrew/Cellar/ollama/0.12.10/bin
time=2025-11-12T14:26:35.847-03:00 level=INFO source=server.go:470 msg="system memory" total="16.0 GiB" free="4.1 GiB" free_swap="0 B"
time=2025-11-12T14:26:35.847-03:00 level=DEBUG source=memory.go:198 msg=evaluating library=Metal gpu_count=1 available="[11.8 GiB]"
time=2025-11-12T14:26:35.847-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.vision.block_count default=0
time=2025-11-12T14:26:35.847-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default=0
time=2025-11-12T14:26:35.847-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}"
time=2025-11-12T14:26:35.847-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}"
time=2025-11-12T14:26:35.847-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default=0
time=2025-11-12T14:26:35.847-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}"
time=2025-11-12T14:26:35.847-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}"
time=2025-11-12T14:26:35.847-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.key_length default=64
time=2025-11-12T14:26:35.847-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.value_length default=64
time=2025-11-12T14:26:35.847-03:00 level=DEBUG source=ggml.go:611 msg="default cache size estimate" "attention MiB"=6 "attention bytes"=6291456 "recurrent MiB"=0 "recurrent bytes"=0
time=2025-11-12T14:26:35.847-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default=0
time=2025-11-12T14:26:35.847-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}"
time=2025-11-12T14:26:35.847-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}"
time=2025-11-12T14:26:35.847-03:00 level=INFO source=memory.go:37 msg="new model will fit in available VRAM across minimum required GPUs, loading" model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 library=Metal parallel=1 required="809.4 MiB" gpus=1
time=2025-11-12T14:26:35.847-03:00 level=DEBUG source=memory.go:198 msg=evaluating library=Metal gpu_count=1 available="[11.8 GiB]"
time=2025-11-12T14:26:35.847-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.vision.block_count default=0
time=2025-11-12T14:26:35.847-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default=0
time=2025-11-12T14:26:35.847-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}"
time=2025-11-12T14:26:35.847-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}"
time=2025-11-12T14:26:35.847-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default=0
time=2025-11-12T14:26:35.847-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}"
time=2025-11-12T14:26:35.847-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}"
time=2025-11-12T14:26:35.847-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.key_length default=64
time=2025-11-12T14:26:35.847-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.value_length default=64
time=2025-11-12T14:26:35.848-03:00 level=DEBUG source=ggml.go:611 msg="default cache size estimate" "attention MiB"=6 "attention bytes"=6291456 "recurrent MiB"=0 "recurrent bytes"=0
time=2025-11-12T14:26:35.848-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default=0
time=2025-11-12T14:26:35.848-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}"
time=2025-11-12T14:26:35.848-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}"
time=2025-11-12T14:26:35.848-03:00 level=INFO source=server.go:522 msg=offload library=Metal layers.requested=-1 layers.model=13 layers.offload=13 layers.split=[13] memory.available="[11.8 GiB]" memory.gpu_overhead="0 B" memory.required.full="809.4 MiB" memory.required.partial="809.4 MiB" memory.required.kv="6.0 MiB" memory.required.allocations="[809.4 MiB]" memory.weights.total="260.9 MiB" memory.weights.repeating="216.1 MiB" memory.weights.nonrepeating="44.7 MiB" memory.graph.full="12.0 MiB" memory.graph.partial="12.0 MiB"
time=2025-11-12T14:26:35.870-03:00 level=INFO source=runner.go:910 msg="starting go runner"
time=2025-11-12T14:26:35.870-03:00 level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/opt/homebrew/Cellar/ollama/0.12.10/bin
ggml_metal_library_init: using embedded metal library
ggml_metal_library_init: loaded in 0.007 sec
ggml_metal_device_init: GPU name:   Apple M2 Pro
ggml_metal_device_init: GPU family: MTLGPUFamilyApple8  (1008)
ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_device_init: GPU family: MTLGPUFamilyMetal4  (5002)
ggml_metal_device_init: simdgroup reduction   = true
ggml_metal_device_init: simdgroup matrix mul. = true
ggml_metal_device_init: has unified memory    = true
ggml_metal_device_init: has bfloat            = true
ggml_metal_device_init: use residency sets    = true
ggml_metal_device_init: use shared buffers    = true
ggml_metal_device_init: recommendedMaxWorkingSetSize  = 12713.12 MB
time=2025-11-12T14:26:35.871-03:00 level=INFO source=ggml.go:104 msg=system Metal.0.EMBED_LIBRARY=1 CPU.0.NEON=1 CPU.0.ARM_FMA=1 CPU.0.FP16_VA=1 CPU.0.DOTPROD=1 CPU.0.LLAMAFILE=1 CPU.0.ACCELERATE=1 compiler=cgo(clang)
time=2025-11-12T14:26:35.896-03:00 level=INFO source=runner.go:946 msg="Server listening on 127.0.0.1:61468"
time=2025-11-12T14:26:35.902-03:00 level=INFO source=runner.go:845 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:false KvSize:2048 KvCacheType: NumThreads:6 GPULayers:13[ID:0 Layers:13(0..12)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:true}"
llama_model_load_from_file_impl: using device Metal (Apple M2 Pro) (unknown id) - 12123 MiB free
time=2025-11-12T14:26:35.902-03:00 level=INFO source=server.go:1251 msg="waiting for llama runner to start responding"
time=2025-11-12T14:26:35.903-03:00 level=INFO source=server.go:1285 msg="waiting for server to become available" status="llm server loading model"
llama_model_loader: loaded meta data with 24 key-value pairs and 112 tensors from /Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = nomic-bert
llama_model_loader: - kv   1:                               general.name str              = nomic-embed-text-v1.5
llama_model_loader: - kv   2:                     nomic-bert.block_count u32              = 12
llama_model_loader: - kv   3:                  nomic-bert.context_length u32              = 2048
llama_model_loader: - kv   4:                nomic-bert.embedding_length u32              = 768
llama_model_loader: - kv   5:             nomic-bert.feed_forward_length u32              = 3072
llama_model_loader: - kv   6:            nomic-bert.attention.head_count u32              = 12
llama_model_loader: - kv   7:    nomic-bert.attention.layer_norm_epsilon f32              = 0.000000
llama_model_loader: - kv   8:                          general.file_type u32              = 1
llama_model_loader: - kv   9:                nomic-bert.attention.causal bool             = false
llama_model_loader: - kv  10:                    nomic-bert.pooling_type u32              = 1
llama_model_loader: - kv  11:                  nomic-bert.rope.freq_base f32              = 1000.000000
llama_model_loader: - kv  12:            tokenizer.ggml.token_type_count u32              = 2
llama_model_loader: - kv  13:                tokenizer.ggml.bos_token_id u32              = 101
llama_model_loader: - kv  14:                tokenizer.ggml.eos_token_id u32              = 102
llama_model_loader: - kv  15:                       tokenizer.ggml.model str              = bert
llama_model_loader: - kv  16:                      tokenizer.ggml.tokens arr[str,30522]   = ["[PAD]", "[unused0]", "[unused1]", "...
llama_model_loader: - kv  17:                      tokenizer.ggml.scores arr[f32,30522]   = [-1000.000000, -1000.000000, -1000.00...
llama_model_loader: - kv  18:                  tokenizer.ggml.token_type arr[i32,30522]   = [3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  19:            tokenizer.ggml.unknown_token_id u32              = 100
llama_model_loader: - kv  20:          tokenizer.ggml.seperator_token_id u32              = 102
llama_model_loader: - kv  21:            tokenizer.ggml.padding_token_id u32              = 0
llama_model_loader: - kv  22:                tokenizer.ggml.cls_token_id u32              = 101
llama_model_loader: - kv  23:               tokenizer.ggml.mask_token_id u32              = 103
llama_model_loader: - type  f32:   51 tensors
llama_model_loader: - type  f16:   61 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = F16
print_info: file size   = 260.86 MiB (16.00 BPW) 
init_tokenizer: initializing tokenizer for type 3
load: control token:    100 '[UNK]' is not marked as EOG
load: control token:    101 '[CLS]' is not marked as EOG
load: control token:      0 '[PAD]' is not marked as EOG
load: control token:    102 '[SEP]' is not marked as EOG
load: control token:    103 '[MASK]' is not marked as EOG
load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
load: printing all EOG tokens:
load:   - 102 ('[SEP]')
load: special tokens cache size = 5
load: token to piece cache size = 0.2032 MB
print_info: arch             = nomic-bert
print_info: vocab_only       = 0
print_info: n_ctx_train      = 2048
print_info: n_embd           = 768
print_info: n_layer          = 12
print_info: n_head           = 12
print_info: n_head_kv        = 12
print_info: n_rot            = 64
print_info: n_swa            = 0
print_info: is_swa_any       = 0
print_info: n_embd_head_k    = 64
print_info: n_embd_head_v    = 64
print_info: n_gqa            = 1
print_info: n_embd_k_gqa     = 768
print_info: n_embd_v_gqa     = 768
print_info: f_norm_eps       = 1.0e-12
print_info: f_norm_rms_eps   = 0.0e+00
print_info: f_clamp_kqv      = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale    = 0.0e+00
print_info: f_attn_scale     = 0.0e+00
print_info: n_ff             = 3072
print_info: n_expert         = 0
print_info: n_expert_used    = 0
print_info: causal attn      = 0
print_info: pooling type     = 1
print_info: rope type        = 2
print_info: rope scaling     = linear
print_info: freq_base_train  = 1000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn  = 2048
print_info: rope_finetuned   = unknown
print_info: model type       = 137M
print_info: model params     = 136.73 M
print_info: general.name     = nomic-embed-text-v1.5
print_info: vocab type       = WPM
print_info: n_vocab          = 30522
print_info: n_merges         = 0
print_info: BOS token        = 101 '[CLS]'
print_info: EOS token        = 102 '[SEP]'
print_info: UNK token        = 100 '[UNK]'
print_info: SEP token        = 102 '[SEP]'
print_info: PAD token        = 0 '[PAD]'
print_info: MASK token       = 103 '[MASK]'
print_info: LF token         = 0 '[PAD]'
print_info: EOG token        = 102 '[SEP]'
print_info: max token length = 21
load_tensors: loading model tensors, this can take a while... (mmap = true)
load_tensors: layer   0 assigned to device Metal, is_swa = 0
load_tensors: layer   1 assigned to device Metal, is_swa = 0
load_tensors: layer   2 assigned to device Metal, is_swa = 0
load_tensors: layer   3 assigned to device Metal, is_swa = 0
load_tensors: layer   4 assigned to device Metal, is_swa = 0
load_tensors: layer   5 assigned to device Metal, is_swa = 0
load_tensors: layer   6 assigned to device Metal, is_swa = 0
load_tensors: layer   7 assigned to device Metal, is_swa = 0
load_tensors: layer   8 assigned to device Metal, is_swa = 0
load_tensors: layer   9 assigned to device Metal, is_swa = 0
load_tensors: layer  10 assigned to device Metal, is_swa = 0
load_tensors: layer  11 assigned to device Metal, is_swa = 0
load_tensors: layer  12 assigned to device Metal, is_swa = 0
create_tensor: loading tensor token_embd.weight
create_tensor: loading tensor token_types.weight
create_tensor: loading tensor token_embd_norm.weight
create_tensor: loading tensor token_embd_norm.bias
create_tensor: loading tensor blk.0.attn_qkv.weight
create_tensor: loading tensor blk.0.attn_output.weight
create_tensor: loading tensor blk.0.attn_output_norm.weight
create_tensor: loading tensor blk.0.attn_output_norm.bias
create_tensor: loading tensor blk.0.ffn_up.weight
create_tensor: loading tensor blk.0.ffn_down.weight
create_tensor: loading tensor blk.0.ffn_gate.weight
create_tensor: loading tensor blk.0.layer_output_norm.weight
create_tensor: loading tensor blk.0.layer_output_norm.bias
create_tensor: loading tensor blk.1.attn_qkv.weight
create_tensor: loading tensor blk.1.attn_output.weight
create_tensor: loading tensor blk.1.attn_output_norm.weight
create_tensor: loading tensor blk.1.attn_output_norm.bias
create_tensor: loading tensor blk.1.ffn_up.weight
create_tensor: loading tensor blk.1.ffn_down.weight
create_tensor: loading tensor blk.1.ffn_gate.weight
create_tensor: loading tensor blk.1.layer_output_norm.weight
create_tensor: loading tensor blk.1.layer_output_norm.bias
create_tensor: loading tensor blk.2.attn_qkv.weight
create_tensor: loading tensor blk.2.attn_output.weight
create_tensor: loading tensor blk.2.attn_output_norm.weight
create_tensor: loading tensor blk.2.attn_output_norm.bias
create_tensor: loading tensor blk.2.ffn_up.weight
create_tensor: loading tensor blk.2.ffn_down.weight
create_tensor: loading tensor blk.2.ffn_gate.weight
create_tensor: loading tensor blk.2.layer_output_norm.weight
create_tensor: loading tensor blk.2.layer_output_norm.bias
create_tensor: loading tensor blk.3.attn_qkv.weight
create_tensor: loading tensor blk.3.attn_output.weight
create_tensor: loading tensor blk.3.attn_output_norm.weight
create_tensor: loading tensor blk.3.attn_output_norm.bias
create_tensor: loading tensor blk.3.ffn_up.weight
create_tensor: loading tensor blk.3.ffn_down.weight
create_tensor: loading tensor blk.3.ffn_gate.weight
create_tensor: loading tensor blk.3.layer_output_norm.weight
create_tensor: loading tensor blk.3.layer_output_norm.bias
create_tensor: loading tensor blk.4.attn_qkv.weight
create_tensor: loading tensor blk.4.attn_output.weight
create_tensor: loading tensor blk.4.attn_output_norm.weight
create_tensor: loading tensor blk.4.attn_output_norm.bias
create_tensor: loading tensor blk.4.ffn_up.weight
create_tensor: loading tensor blk.4.ffn_down.weight
create_tensor: loading tensor blk.4.ffn_gate.weight
create_tensor: loading tensor blk.4.layer_output_norm.weight
create_tensor: loading tensor blk.4.layer_output_norm.bias
create_tensor: loading tensor blk.5.attn_qkv.weight
create_tensor: loading tensor blk.5.attn_output.weight
create_tensor: loading tensor blk.5.attn_output_norm.weight
create_tensor: loading tensor blk.5.attn_output_norm.bias
create_tensor: loading tensor blk.5.ffn_up.weight
create_tensor: loading tensor blk.5.ffn_down.weight
create_tensor: loading tensor blk.5.ffn_gate.weight
create_tensor: loading tensor blk.5.layer_output_norm.weight
create_tensor: loading tensor blk.5.layer_output_norm.bias
create_tensor: loading tensor blk.6.attn_qkv.weight
create_tensor: loading tensor blk.6.attn_output.weight
create_tensor: loading tensor blk.6.attn_output_norm.weight
create_tensor: loading tensor blk.6.attn_output_norm.bias
create_tensor: loading tensor blk.6.ffn_up.weight
create_tensor: loading tensor blk.6.ffn_down.weight
create_tensor: loading tensor blk.6.ffn_gate.weight
create_tensor: loading tensor blk.6.layer_output_norm.weight
create_tensor: loading tensor blk.6.layer_output_norm.bias
create_tensor: loading tensor blk.7.attn_qkv.weight
create_tensor: loading tensor blk.7.attn_output.weight
create_tensor: loading tensor blk.7.attn_output_norm.weight
create_tensor: loading tensor blk.7.attn_output_norm.bias
create_tensor: loading tensor blk.7.ffn_up.weight
create_tensor: loading tensor blk.7.ffn_down.weight
create_tensor: loading tensor blk.7.ffn_gate.weight
create_tensor: loading tensor blk.7.layer_output_norm.weight
create_tensor: loading tensor blk.7.layer_output_norm.bias
create_tensor: loading tensor blk.8.attn_qkv.weight
create_tensor: loading tensor blk.8.attn_output.weight
create_tensor: loading tensor blk.8.attn_output_norm.weight
create_tensor: loading tensor blk.8.attn_output_norm.bias
create_tensor: loading tensor blk.8.ffn_up.weight
create_tensor: loading tensor blk.8.ffn_down.weight
create_tensor: loading tensor blk.8.ffn_gate.weight
create_tensor: loading tensor blk.8.layer_output_norm.weight
create_tensor: loading tensor blk.8.layer_output_norm.bias
create_tensor: loading tensor blk.9.attn_qkv.weight
create_tensor: loading tensor blk.9.attn_output.weight
create_tensor: loading tensor blk.9.attn_output_norm.weight
create_tensor: loading tensor blk.9.attn_output_norm.bias
create_tensor: loading tensor blk.9.ffn_up.weight
create_tensor: loading tensor blk.9.ffn_down.weight
create_tensor: loading tensor blk.9.ffn_gate.weight
create_tensor: loading tensor blk.9.layer_output_norm.weight
create_tensor: loading tensor blk.9.layer_output_norm.bias
create_tensor: loading tensor blk.10.attn_qkv.weight
create_tensor: loading tensor blk.10.attn_output.weight
create_tensor: loading tensor blk.10.attn_output_norm.weight
create_tensor: loading tensor blk.10.attn_output_norm.bias
create_tensor: loading tensor blk.10.ffn_up.weight
create_tensor: loading tensor blk.10.ffn_down.weight
create_tensor: loading tensor blk.10.ffn_gate.weight
create_tensor: loading tensor blk.10.layer_output_norm.weight
create_tensor: loading tensor blk.10.layer_output_norm.bias
create_tensor: loading tensor blk.11.attn_qkv.weight
create_tensor: loading tensor blk.11.attn_output.weight
create_tensor: loading tensor blk.11.attn_output_norm.weight
create_tensor: loading tensor blk.11.attn_output_norm.bias
create_tensor: loading tensor blk.11.ffn_up.weight
create_tensor: loading tensor blk.11.ffn_down.weight
create_tensor: loading tensor blk.11.ffn_gate.weight
create_tensor: loading tensor blk.11.layer_output_norm.weight
create_tensor: loading tensor blk.11.layer_output_norm.bias
load_tensors: offloading 12 repeating layers to GPU
load_tensors: offloading output layer to GPU
load_tensors: offloaded 13/13 layers to GPU
load_tensors:   CPU_Mapped model buffer size =    44.72 MiB
load_tensors: Metal_Mapped model buffer size =   216.14 MiB
llama_init_from_model: model default pooling_type is [1], but [-1] was specified
llama_context: constructing llama_context
llama_context: n_seq_max     = 1
llama_context: n_ctx         = 2048
llama_context: n_ctx_per_seq = 2048
llama_context: n_batch       = 512
llama_context: n_ubatch      = 512
llama_context: causal_attn   = 0
llama_context: flash_attn    = disabled
llama_context: kv_unified    = false
llama_context: freq_base     = 1000.0
llama_context: freq_scale    = 1
ggml_metal_init: allocating
ggml_metal_init: picking default device: Apple M2 Pro
ggml_metal_init: use bfloat         = true
ggml_metal_init: use fusion         = true
ggml_metal_init: use concurrency    = true
ggml_metal_init: use graph optimize = true
set_abort_callback: call
llama_context:        CPU  output buffer size =     0.12 MiB
llama_context: enumerating backends
llama_context: backend_ptrs.size() = 3
llama_context: max_nodes = 1024
llama_context: worst-case: n_tokens = 512, n_seqs = 1, n_outputs = 1
graph_reserve: reserving a graph for ubatch with n_tokens =  512, n_seqs =  1, n_outputs =  512
graph_reserve: reserving a graph for ubatch with n_tokens =    1, n_seqs =  1, n_outputs =    1
graph_reserve: reserving a graph for ubatch with n_tokens =  512, n_seqs =  1, n_outputs =  512
llama_context:      Metal compute buffer size =    23.51 MiB
llama_context:        CPU compute buffer size =     4.01 MiB
llama_context: graph nodes  = 371
llama_context: graph splits = 2
time=2025-11-12T14:26:36.154-03:00 level=INFO source=server.go:1289 msg="llama runner started in 0.31 seconds"
time=2025-11-12T14:26:36.154-03:00 level=INFO source=sched.go:500 msg="loaded runners" count=1
time=2025-11-12T14:26:36.154-03:00 level=INFO source=server.go:1251 msg="waiting for llama runner to start responding"
time=2025-11-12T14:26:36.154-03:00 level=INFO source=server.go:1289 msg="llama runner started in 0.31 seconds"
time=2025-11-12T14:26:36.154-03:00 level=DEBUG source=sched.go:512 msg="finished setting up" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=11891 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192
time=2025-11-12T14:26:36.158-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32
time=2025-11-12T14:26:36.161-03:00 level=DEBUG source=cache.go:104 msg="loading cache slot" id=0 cache=0 prompt=901 used=0 remaining=901
decode: cannot decode batches with this context (calling encode() instead)
init: embeddings required but some input tokens were not marked as outputs -> overriding
output_reserve: reallocating output buffer from size 0.12 MiB to 61.11 MiB
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_mul_mm_f16_f32', name = 'kernel_mul_mm_f16_f32_bci=0_bco=0'
ggml_metal_library_compile_pipeline: loaded kernel_mul_mm_f16_f32_bci=0_bco=0             0x1069537e0 | th_max = 1024 | th_width =   32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_rope_neox_f32', name = 'kernel_rope_neox_f32'
ggml_metal_library_compile_pipeline: loaded kernel_rope_neox_f32                          0x106954120 | th_max = 1024 | th_width =   32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_cpy_f32_f32', name = 'kernel_cpy_f32_f32'
ggml_metal_library_compile_pipeline: loaded kernel_cpy_f32_f32                            0x106954c20 | th_max = 1024 | th_width =   32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_mul_mm_f32_f32', name = 'kernel_mul_mm_f32_f32_bci=0_bco=0'
ggml_metal_library_compile_pipeline: loaded kernel_mul_mm_f32_f32_bci=0_bco=0             0x106955da0 | th_max =  896 | th_width =   32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_soft_max_f32_4', name = 'kernel_soft_max_f32_4'
ggml_metal_library_compile_pipeline: loaded kernel_soft_max_f32_4                         0x1069565a0 | th_max = 1024 | th_width =   32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_add_fuse_1', name = 'kernel_add_fuse_1'
ggml_metal_library_compile_pipeline: loaded kernel_add_fuse_1                             0x1069568a0 | th_max = 1024 | th_width =   32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_norm_mul_add_f32_4', name = 'kernel_norm_mul_add_f32_4'
ggml_metal_library_compile_pipeline: loaded kernel_norm_mul_add_f32_4                     0x106956d60 | th_max = 1024 | th_width =   32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_swiglu_f32', name = 'kernel_swiglu_f32'
ggml_metal_library_compile_pipeline: loaded kernel_swiglu_f32                             0x904908000 | th_max = 1024 | th_width =   32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_get_rows_f32', name = 'kernel_get_rows_f32'
ggml_metal_library_compile_pipeline: loaded kernel_get_rows_f32                           0x904908300 | th_max = 1024 | th_width =   32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_mul_mv_f32_f32_4', name = 'kernel_mul_mv_f32_f32_4_nsg=4'
ggml_metal_library_compile_pipeline: loaded kernel_mul_mv_f32_f32_4_nsg=4                 0x904908600 | th_max =  768 | th_width =   32
decode: cannot decode batches with this context (calling encode() instead)
init: embeddings required but some input tokens were not marked as outputs -> overriding
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_mul_mm_f16_f32', name = 'kernel_mul_mm_f16_f32_bci=0_bco=1'
ggml_metal_library_compile_pipeline: loaded kernel_mul_mm_f16_f32_bci=0_bco=1             0x904908900 | th_max =  896 | th_width =   32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_mul_mm_f32_f32', name = 'kernel_mul_mm_f32_f32_bci=0_bco=1'
ggml_metal_library_compile_pipeline: loaded kernel_mul_mm_f32_f32_bci=0_bco=1             0x904908c00 | th_max =  832 | th_width =   32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_soft_max_f32', name = 'kernel_soft_max_f32'
ggml_metal_library_compile_pipeline: loaded kernel_soft_max_f32                           0x904908f00 | th_max = 1024 | th_width =   32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_mul_mm_f32_f32', name = 'kernel_mul_mm_f32_f32_bci=1_bco=1'
ggml_metal_library_compile_pipeline: loaded kernel_mul_mm_f32_f32_bci=1_bco=1             0x904909200 | th_max =  832 | th_width =   32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_mul_mv_f32_f32', name = 'kernel_mul_mv_f32_f32_nsg=4'
ggml_metal_library_compile_pipeline: loaded kernel_mul_mv_f32_f32_nsg=4                   0x904909500 | th_max = 1024 | th_width =   32
SIGTRAP: trace trap
PC=0x181ddab1c m=8 sigcode=0
signal arrived during cgo execution

goroutine 50 gp=0x14000003a40 m=8 mp=0x14000600008 [syscall]:
runtime.cgocall(0x105180fbc, 0x14000085b88)
	runtime/cgocall.go:167 +0x44 fp=0x14000085b50 sp=0x14000085b10 pc=0x104686684
github.com/ollama/ollama/llama._Cfunc_llama_decode(0x1069470a0, {0x185, 0x904c95000, 0x0, 0x10694cf20, 0x10694d720, 0x904dee800, 0x1069480a0})
	_cgo_gotypes.go:674 +0x30 fp=0x14000085b80 sp=0x14000085b50 pc=0x1049cd790
github.com/ollama/ollama/llama.(*Context).Decode.func1(...)
	github.com/ollama/ollama/llama/llama.go:160
github.com/ollama/ollama/llama.(*Context).Decode(0x1400039ec08?, 0x0?)
	github.com/ollama/ollama/llama/llama.go:160 +0xcc fp=0x14000085c70 sp=0x14000085b80 pc=0x1049cf94c
github.com/ollama/ollama/runner/llamarunner.(*Server).processBatch(0x14000612140, 0x1400004ea00, 0x14000085f18)
	github.com/ollama/ollama/runner/llamarunner/runner.go:468 +0x1d4 fp=0x14000085ed0 sp=0x14000085c70 pc=0x104a6f774
github.com/ollama/ollama/runner/llamarunner.(*Server).run(0x14000612140, {0x10588bcd0, 0x1400069c870})
	github.com/ollama/ollama/runner/llamarunner/runner.go:361 +0x15c fp=0x14000085fa0 sp=0x14000085ed0 pc=0x104a6f43c
github.com/ollama/ollama/runner/llamarunner.Execute.gowrap1()
	github.com/ollama/ollama/runner/llamarunner/runner.go:926 +0x2c fp=0x14000085fd0 sp=0x14000085fa0 pc=0x104a7324c
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x14000085fd0 sp=0x14000085fd0 pc=0x104691d04
created by github.com/ollama/ollama/runner/llamarunner.Execute in goroutine 1
	github.com/ollama/ollama/runner/llamarunner/runner.go:926 +0x418

goroutine 1 gp=0x140000021c0 m=nil [IO wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000457720 sp=0x14000457700 pc=0x104689a50
runtime.netpollblock(0x140001177b8?, 0x470ba6c?, 0x1?)
	runtime/netpoll.go:575 +0x150 fp=0x14000457760 sp=0x14000457720 pc=0x10464f620
internal/poll.runtime_pollWait(0x131a50200, 0x72)
	runtime/netpoll.go:351 +0xa0 fp=0x14000457790 sp=0x14000457760 pc=0x104688c80
internal/poll.(*pollDesc).wait(0x14000254a80?, 0x10470dafc?, 0x0)
	internal/poll/fd_poll_runtime.go:84 +0x28 fp=0x140004577c0 sp=0x14000457790 pc=0x104707568
internal/poll.(*pollDesc).waitRead(...)
	internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Accept(0x14000254a80)
	internal/poll/fd_unix.go:613 +0x21c fp=0x14000457870 sp=0x140004577c0 pc=0x10470bb4c
net.(*netFD).accept(0x14000254a80)
	net/fd_unix.go:161 +0x28 fp=0x14000457930 sp=0x14000457870 pc=0x10476d7f8
net.(*TCPListener).accept(0x140004ba080)
	net/tcpsock_posix.go:159 +0x24 fp=0x14000457980 sp=0x14000457930 pc=0x104780e64
net.(*TCPListener).Accept(0x140004ba080)
	net/tcpsock.go:380 +0x2c fp=0x140004579c0 sp=0x14000457980 pc=0x10477ff0c
net/http.(*onceCloseListener).Accept(0x14000618090?)
	<autogenerated>:1 +0x2c fp=0x140004579e0 sp=0x140004579c0 pc=0x104954a1c
net/http.(*Server).Serve(0x14000610100, {0x105889668, 0x140004ba080})
	net/http/server.go:3463 +0x24c fp=0x14000457b10 sp=0x140004579e0 pc=0x10492fbac
github.com/ollama/ollama/runner/llamarunner.Execute({0x14000196200, 0x4, 0x4})
	github.com/ollama/ollama/runner/llamarunner/runner.go:947 +0x754 fp=0x14000457ce0 sp=0x14000457b10 pc=0x104a73064
github.com/ollama/ollama/runner.Execute({0x140001961f0?, 0x0?, 0x0?})
	github.com/ollama/ollama/runner/runner.go:22 +0x128 fp=0x14000457d10 sp=0x14000457ce0 pc=0x104aea528
github.com/ollama/ollama/cmd.NewCLI.func2(0x14000277400?, {0x1053d7af0?, 0x4?, 0x1053d7af4?})
	github.com/ollama/ollama/cmd/cmd.go:1841 +0x50 fp=0x14000457d40 sp=0x14000457d10 pc=0x105131db0
github.com/spf13/cobra.(*Command).execute(0x140006bb508, {0x140000ae980, 0x4, 0x4})
	github.com/spf13/cobra@v1.7.0/command.go:940 +0x5e0 fp=0x14000457e60 sp=0x14000457d40 pc=0x1047d7f60
github.com/spf13/cobra.(*Command).ExecuteC(0x14000143208)
	github.com/spf13/cobra@v1.7.0/command.go:1068 +0x2ec fp=0x14000457f20 sp=0x14000457e60 pc=0x1047d863c
github.com/spf13/cobra.(*Command).Execute(...)
	github.com/spf13/cobra@v1.7.0/command.go:992
github.com/spf13/cobra.(*Command).ExecuteContext(...)
	github.com/spf13/cobra@v1.7.0/command.go:985
main.main()
	github.com/ollama/ollama/main.go:12 +0x54 fp=0x14000457f40 sp=0x14000457f20 pc=0x1051328d4
runtime.main()
	runtime/proc.go:285 +0x278 fp=0x14000457fd0 sp=0x14000457f40 pc=0x1046560d8
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x14000457fd0 sp=0x14000457fd0 pc=0x104691d04

goroutine 2 gp=0x14000002c40 m=nil [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x1400006cf90 sp=0x1400006cf70 pc=0x104689a50
runtime.goparkunlock(...)
	runtime/proc.go:466
runtime.forcegchelper()
	runtime/proc.go:373 +0xb4 fp=0x1400006cfd0 sp=0x1400006cf90 pc=0x104656424
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400006cfd0 sp=0x1400006cfd0 pc=0x104691d04
created by runtime.init.7 in goroutine 1
	runtime/proc.go:361 +0x24

goroutine 18 gp=0x14000182380 m=nil [GC sweep wait]:
runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000068760 sp=0x14000068740 pc=0x104689a50
runtime.goparkunlock(...)
	runtime/proc.go:466
runtime.bgsweep(0x14000190000)
	runtime/mgcsweep.go:323 +0x104 fp=0x140000687b0 sp=0x14000068760 pc=0x104640ee4
runtime.gcenable.gowrap1()
	runtime/mgc.go:212 +0x28 fp=0x140000687d0 sp=0x140000687b0 pc=0x104634b38
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x140000687d0 sp=0x140000687d0 pc=0x104691d04
created by runtime.gcenable in goroutine 1
	runtime/mgc.go:212 +0x6c

goroutine 19 gp=0x14000182540 m=nil [GC scavenge wait]:
runtime.gopark(0x10000?, 0x105596590?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000068f60 sp=0x14000068f40 pc=0x104689a50
runtime.goparkunlock(...)
	runtime/proc.go:466
runtime.(*scavengerState).park(0x10617d860)
	runtime/mgcscavenge.go:425 +0x5c fp=0x14000068f90 sp=0x14000068f60 pc=0x10463e9fc
runtime.bgscavenge(0x14000190000)
	runtime/mgcscavenge.go:658 +0xac fp=0x14000068fb0 sp=0x14000068f90 pc=0x10463ef9c
runtime.gcenable.gowrap2()
	runtime/mgc.go:213 +0x28 fp=0x14000068fd0 sp=0x14000068fb0 pc=0x104634ad8
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x14000068fd0 sp=0x14000068fd0 pc=0x104691d04
created by runtime.gcenable in goroutine 1
	runtime/mgc.go:213 +0xac

goroutine 20 gp=0x14000182a80 m=nil [finalizer wait]:
runtime.gopark(0x0?, 0x105875a18?, 0xb0?, 0x62?, 0x1000000010?)
	runtime/proc.go:460 +0xc0 fp=0x1400006c580 sp=0x1400006c560 pc=0x104689a50
runtime.runFinalizers()
	runtime/mfinal.go:210 +0x104 fp=0x1400006c7d0 sp=0x1400006c580 pc=0x104633b24
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400006c7d0 sp=0x1400006c7d0 pc=0x104691d04
created by runtime.createfing in goroutine 1
	runtime/mfinal.go:172 +0x78

goroutine 21 gp=0x14000183500 m=nil [cleanup wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000069740 sp=0x14000069720 pc=0x104689a50
runtime.goparkunlock(...)
	runtime/proc.go:466
runtime.(*cleanupQueue).dequeue(0x10617e740)
	runtime/mcleanup.go:439 +0x110 fp=0x14000069780 sp=0x14000069740 pc=0x104631010
runtime.runCleanups()
	runtime/mcleanup.go:635 +0x40 fp=0x140000697d0 sp=0x14000069780 pc=0x104631820
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x140000697d0 sp=0x140000697d0 pc=0x104691d04
created by runtime.(*cleanupQueue).createGs in goroutine 1
	runtime/mcleanup.go:589 +0x108

goroutine 22 gp=0x14000183880 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000069f10 sp=0x14000069ef0 pc=0x104689a50
runtime.gcBgMarkWorker(0x140001996c0)
	runtime/mgc.go:1463 +0xe0 fp=0x14000069fb0 sp=0x14000069f10 pc=0x1046371b0
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x14000069fd0 sp=0x14000069fb0 pc=0x104637098
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x14000069fd0 sp=0x14000069fd0 pc=0x104691d04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 3 gp=0x14000003500 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x1400006d710 sp=0x1400006d6f0 pc=0x104689a50
runtime.gcBgMarkWorker(0x140001996c0)
	runtime/mgc.go:1463 +0xe0 fp=0x1400006d7b0 sp=0x1400006d710 pc=0x1046371b0
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x1400006d7d0 sp=0x1400006d7b0 pc=0x104637098
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400006d7d0 sp=0x1400006d7d0 pc=0x104691d04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 34 gp=0x14000584000 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x1400058a710 sp=0x1400058a6f0 pc=0x104689a50
runtime.gcBgMarkWorker(0x140001996c0)
	runtime/mgc.go:1463 +0xe0 fp=0x1400058a7b0 sp=0x1400058a710 pc=0x1046371b0
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x1400058a7d0 sp=0x1400058a7b0 pc=0x104637098
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400058a7d0 sp=0x1400058a7d0 pc=0x104691d04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 23 gp=0x14000183a40 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x1400006a710 sp=0x1400006a6f0 pc=0x104689a50
runtime.gcBgMarkWorker(0x140001996c0)
	runtime/mgc.go:1463 +0xe0 fp=0x1400006a7b0 sp=0x1400006a710 pc=0x1046371b0
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x1400006a7d0 sp=0x1400006a7b0 pc=0x104637098
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400006a7d0 sp=0x1400006a7d0 pc=0x104691d04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 4 gp=0x140000036c0 m=nil [GC worker (idle)]:
runtime.gopark(0x588e5a75f37f4?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x1400006df10 sp=0x1400006def0 pc=0x104689a50
runtime.gcBgMarkWorker(0x140001996c0)
	runtime/mgc.go:1463 +0xe0 fp=0x1400006dfb0 sp=0x1400006df10 pc=0x1046371b0
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x1400006dfd0 sp=0x1400006dfb0 pc=0x104637098
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400006dfd0 sp=0x1400006dfd0 pc=0x104691d04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 35 gp=0x140005841c0 m=nil [GC worker (idle)]:
runtime.gopark(0x588e5a75edfbd?, 0x3?, 0xb7?, 0x20?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000704f10 sp=0x14000704ef0 pc=0x104689a50
runtime.gcBgMarkWorker(0x140001996c0)
	runtime/mgc.go:1463 +0xe0 fp=0x14000704fb0 sp=0x14000704f10 pc=0x1046371b0
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x14000704fd0 sp=0x14000704fb0 pc=0x104637098
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x14000704fd0 sp=0x14000704fd0 pc=0x104691d04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 24 gp=0x14000183c00 m=nil [GC worker (idle)]:
runtime.gopark(0x588e5a75f1eba?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000703f10 sp=0x14000703ef0 pc=0x104689a50
runtime.gcBgMarkWorker(0x140001996c0)
	runtime/mgc.go:1463 +0xe0 fp=0x14000703fb0 sp=0x14000703f10 pc=0x1046371b0
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x14000703fd0 sp=0x14000703fb0 pc=0x104637098
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x14000703fd0 sp=0x14000703fd0 pc=0x104691d04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 5 gp=0x14000003880 m=nil [GC worker (idle)]:
runtime.gopark(0x1061af920?, 0x1?, 0x81?, 0x15?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x1400006e710 sp=0x1400006e6f0 pc=0x104689a50
runtime.gcBgMarkWorker(0x140001996c0)
	runtime/mgc.go:1463 +0xe0 fp=0x1400006e7b0 sp=0x1400006e710 pc=0x1046371b0
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x1400006e7d0 sp=0x1400006e7b0 pc=0x104637098
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400006e7d0 sp=0x1400006e7d0 pc=0x104691d04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 36 gp=0x14000584380 m=nil [GC worker (idle)]:
runtime.gopark(0x1061af920?, 0x1?, 0x6b?, 0xfd?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x1400058b710 sp=0x1400058b6f0 pc=0x104689a50
runtime.gcBgMarkWorker(0x140001996c0)
	runtime/mgc.go:1463 +0xe0 fp=0x1400058b7b0 sp=0x1400058b710 pc=0x1046371b0
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x1400058b7d0 sp=0x1400058b7b0 pc=0x104637098
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400058b7d0 sp=0x1400058b7d0 pc=0x104691d04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 25 gp=0x14000183dc0 m=nil [GC worker (idle)]:
runtime.gopark(0x588e5a75d8efd?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000706f10 sp=0x14000706ef0 pc=0x104689a50
runtime.gcBgMarkWorker(0x140001996c0)
	runtime/mgc.go:1463 +0xe0 fp=0x14000706fb0 sp=0x14000706f10 pc=0x1046371b0
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x14000706fd0 sp=0x14000706fb0 pc=0x104637098
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x14000706fd0 sp=0x14000706fd0 pc=0x104691d04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 51 gp=0x14000003c00 m=nil [chan receive]:
runtime.gopark(0x14000049868?, 0x10466a594?, 0x98?, 0x98?, 0x10468b9fc?)
	runtime/proc.go:460 +0xc0 fp=0x14000621850 sp=0x14000621830 pc=0x104689a50
runtime.chanrecv(0x140003da070, 0x14000049a40, 0x1)
	runtime/chan.go:667 +0x428 fp=0x140006218d0 sp=0x14000621850 pc=0x104624318
runtime.chanrecv1(0x140001ffdd0?, 0x140001f4008?)
	runtime/chan.go:509 +0x14 fp=0x14000621900 sp=0x140006218d0 pc=0x104623eb4
github.com/ollama/ollama/runner/llamarunner.(*Server).embeddings(0x14000612140, {0x105889848, 0x140000a80f0}, 0x14000020140)
	github.com/ollama/ollama/runner/llamarunner/runner.go:758 +0x5a8 fp=0x14000621a90 sp=0x14000621900 pc=0x104a71728
github.com/ollama/ollama/runner/llamarunner.(*Server).embeddings-fm({0x105889848?, 0x140000a80f0?}, 0x14000049b18?)
	<autogenerated>:1 +0x40 fp=0x14000621ac0 sp=0x14000621a90 pc=0x104a73570
net/http.HandlerFunc.ServeHTTP(0x140001e4000?, {0x105889848?, 0x140000a80f0?}, 0x14000049b00?)
	net/http/server.go:2322 +0x38 fp=0x14000621af0 sp=0x14000621ac0 pc=0x10492c7e8
net/http.(*ServeMux).ServeHTTP(0x10?, {0x105889848, 0x140000a80f0}, 0x14000020140)
	net/http/server.go:2861 +0x190 fp=0x14000621b40 sp=0x14000621af0 pc=0x10492e280
net/http.serverHandler.ServeHTTP({0x1058862b0?}, {0x105889848?, 0x140000a80f0?}, 0x1?)
	net/http/server.go:3340 +0xb0 fp=0x14000621b70 sp=0x14000621b40 pc=0x104948a70
net/http.(*conn).serve(0x14000618090, {0x10588bc98, 0x14000298360})
	net/http/server.go:2109 +0x528 fp=0x14000621fa0 sp=0x14000621b70 pc=0x10492abd8
net/http.(*Server).Serve.gowrap3()
	net/http/server.go:3493 +0x2c fp=0x14000621fd0 sp=0x14000621fa0 pc=0x10492ff0c
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x14000621fd0 sp=0x14000621fd0 pc=0x104691d04
created by net/http.(*Server).Serve in goroutine 1
	net/http/server.go:3493 +0x384

goroutine 27 gp=0x140001836c0 m=nil [IO wait]:
runtime.gopark(0xffffffffffffffff?, 0xffffffffffffffff?, 0x23?, 0x0?, 0x1046ad600?)
	runtime/proc.go:460 +0xc0 fp=0x1400016dd80 sp=0x1400016dd60 pc=0x104689a50
runtime.netpollblock(0x0?, 0x0?, 0x0?)
	runtime/netpoll.go:575 +0x150 fp=0x1400016ddc0 sp=0x1400016dd80 pc=0x10464f620
internal/poll.runtime_pollWait(0x131a50000, 0x72)
	runtime/netpoll.go:351 +0xa0 fp=0x1400016ddf0 sp=0x1400016ddc0 pc=0x104688c80
internal/poll.(*pollDesc).wait(0x14000254b00?, 0x140004ba0e1?, 0x0)
	internal/poll/fd_poll_runtime.go:84 +0x28 fp=0x1400016de20 sp=0x1400016ddf0 pc=0x104707568
internal/poll.(*pollDesc).waitRead(...)
	internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0x14000254b00, {0x140004ba0e1, 0x1, 0x1})
	internal/poll/fd_unix.go:165 +0x1e0 fp=0x1400016dec0 sp=0x1400016de20 pc=0x104708780
net.(*netFD).Read(0x14000254b00, {0x140004ba0e1?, 0x0?, 0x0?})
	net/fd_posix.go:68 +0x28 fp=0x1400016df10 sp=0x1400016dec0 pc=0x10476bff8
net.(*conn).Read(0x14000070038, {0x140004ba0e1?, 0x0?, 0x0?})
	net/net.go:196 +0x34 fp=0x1400016df60 sp=0x1400016df10 pc=0x1047785d4
net/http.(*connReader).backgroundRead(0x140004ba0c0)
	net/http/server.go:702 +0x38 fp=0x1400016dfb0 sp=0x1400016df60 pc=0x104925c48
net/http.(*connReader).startBackgroundRead.gowrap2()
	net/http/server.go:698 +0x28 fp=0x1400016dfd0 sp=0x1400016dfb0 pc=0x104925b38
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400016dfd0 sp=0x1400016dfd0 pc=0x104691d04
created by net/http.(*connReader).startBackgroundRead in goroutine 51
	net/http/server.go:698 +0xb8

r0      0x10670c000
r1      0x10670fcc0
r2      0x0
r3      0x106713020
r4      0x904c63000
r5      0x0
r6      0xffffffffbfc007ff
r7      0xfffff0003ffff800
r8      0x904c63000
r9      0x0
r10     0x300100400b00
r11     0xbfb99984c02334b7
r12     0x3f140930bf8574a0
r13     0xab575efd22575a47
r14     0x106758cb8
r15     0x904c60000
r16     0x2821b5e88
r17     0xffffffffb00007ff
r18     0x0
r19     0xc00
r20     0xc00414722c7
r21     0x0
r22     0x1069471e0
r23     0x0
r24     0x300
r25     0x300
r26     0x185
r27     0x1069471e0
r28     0x0
r29     0x16f032cd0
lr      0x181f6ca78
sp      0x16f032c60
pc      0x181ddab1c
fault   0x181ddab1c
[GIN] 2025/11/12 - 14:26:36 | 500 |   409.52825ms |     192.168.1.3 | POST     "/api/embed"
<!-- gh-comment-id:3523073822 --> @smileBeda commented on GitHub (Nov 12, 2025):

```
ollama run nomic-embed-text "$(yes | head -1024)"
Error: do embedding request: Post "http://127.0.0.1:61135/embedding": EOF
```

```
ollama run bge-m3 "$(yes | head -1024)"
[-0.0024655391,0.012520963,-0.034661766,0.00979911,0.020126741,-0.0063052033,0.061135337,0.02785952,0.01282014,0.01313768, ... (full embedding vector truncated) ... ,0.045494065,-0.010330862,0.012248682]
```

> Can you provide the log from this failure?
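Regarding question 1, a possible client-side workaround while this is open: the `/api/embed` endpoint accepts a per-request `options` object, so a caller can pin `num_ctx` at or below the model's trained window (2048 here) instead of inheriting the Modelfile default of 8192 that triggers the `requested context size too large for model` warning. A minimal sketch using only the standard library; the host/port are assumptions matching the report, and whether the per-request override avoids this specific SIGTRAP on 0.12.10 is untested:

```python
import json
import urllib.request

OLLAMA_URL = "http://127.0.0.1:11434/api/embed"  # assumed local default


def build_embed_request(texts, num_ctx=2048):
    """Build an /api/embed payload that pins num_ctx to the trained window
    (2048 for nomic-embed-text) rather than the Modelfile's num_ctx=8192."""
    return {
        "model": "nomic-embed-text",
        "input": texts,
        "options": {"num_ctx": num_ctx},
    }


def embed(texts, num_ctx=2048):
    """POST the payload and return the list of embedding vectors."""
    payload = json.dumps(build_embed_request(texts, num_ctx)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["embeddings"]
```

Inputs still need to stay within the chosen `num_ctx` in tokens; as noted above, chunks of roughly 512 tokens are known to work on this setup.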
_Until second failure_

```
OLLAMA_NUM_GPU=0 OLLAMA_HOST=0.0.0.0 OLLAMA_DEBUG=1 ollama serve
time=2025-11-12T14:26:20.612-03:00 level=INFO source=routes.go:1525 msg="server config" env="map[HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:DEBUG OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/Users/bedas/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false http_proxy: https_proxy: no_proxy:]"
time=2025-11-12T14:26:20.613-03:00 level=INFO source=images.go:522 msg="total blobs: 15"
time=2025-11-12T14:26:20.613-03:00 level=INFO source=images.go:529 msg="total unused blobs removed: 0"
time=2025-11-12T14:26:20.613-03:00 level=INFO source=routes.go:1578 msg="Listening on [::]:11434 (version 0.12.10)"
time=2025-11-12T14:26:20.614-03:00 level=DEBUG source=sched.go:120 msg="starting llm scheduler"
time=2025-11-12T14:26:20.614-03:00 level=INFO source=runner.go:67 msg="discovering available GPUs..."
time=2025-11-12T14:26:20.615-03:00 level=INFO source=server.go:400 msg="starting runner" cmd="/opt/homebrew/Cellar/ollama/0.12.10/bin/ollama runner --ollama-engine --port 61459"
time=2025-11-12T14:26:20.615-03:00 level=DEBUG source=server.go:401 msg=subprocess PATH="/opt/homebrew/bin:/opt/homebrew/bin:/opt/homebrew/sbin:/Users/bedas/.atuin/bin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin://Applications/Topaz Gigapixel AI.app/Contents/Resources/bin:/Library/Apple/usr/bin:/usr/local/MacGPG2/bin:/Users/bedas/.composer/vendor/bin" OLLAMA_NUM_GPU=0 OLLAMA_HOST=0.0.0.0 OLLAMA_DEBUG=1 DYLD_LIBRARY_PATH=/opt/homebrew/Cellar/ollama/0.12.10/bin OLLAMA_LIBRARY_PATH=/opt/homebrew/Cellar/ollama/0.12.10/bin
time=2025-11-12T14:26:20.666-03:00 level=DEBUG source=runner.go:415 msg="bootstrap discovery took" duration=52.30125ms OLLAMA_LIBRARY_PATH=[/opt/homebrew/Cellar/ollama/0.12.10/bin] extra_envs=map[]
time=2025-11-12T14:26:20.666-03:00 level=DEBUG source=runner.go:113 msg="evluating which if any devices to filter out" initial_count=1
time=2025-11-12T14:26:20.666-03:00 level=DEBUG source=runner.go:172 msg="adjusting filtering IDs" FilterID=0 new_ID=0
time=2025-11-12T14:26:20.666-03:00 level=DEBUG source=runner.go:40 msg="GPU bootstrap discovery took" duration=52.803875ms
time=2025-11-12T14:26:20.666-03:00 level=INFO source=types.go:42 msg="inference compute" id=0 filter_id=0 library=Metal compute=0.0 name=Metal description="Apple M2 Pro" libdirs="" driver=0.0 pci_id="" type=discrete total="11.8 GiB" available="11.8 GiB"
time=2025-11-12T14:26:20.666-03:00 level=INFO source=routes.go:1619 msg="entering low vram mode" "total vram"="11.8 GiB" threshold="20.0 GiB"
[GIN] 2025/11/12 - 14:26:35 | 200 | 2.496208ms | 192.168.1.3 | GET "/api/tags"
time=2025-11-12T14:26:35.142-03:00 level=DEBUG source=runner.go:40 msg="overall device VRAM discovery took" duration=1.334µs
time=2025-11-12T14:26:35.142-03:00 level=DEBUG source=sched.go:194 msg="updating default concurrency" OLLAMA_MAX_LOADED_MODELS=3 gpu_count=1
time=2025-11-12T14:26:35.144-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32
time=2025-11-12T14:26:35.145-03:00 level=DEBUG source=sched.go:211 msg="loading first model" model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6
ggml_metal_library_init: using embedded metal library
ggml_metal_library_init: loaded in 0.006 sec
ggml_metal_device_init: GPU name: Apple M2 Pro
ggml_metal_device_init: GPU family: MTLGPUFamilyApple8 (1008)
ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_device_init: GPU family: MTLGPUFamilyMetal4 (5002)
ggml_metal_device_init: simdgroup reduction = true
ggml_metal_device_init: simdgroup matrix mul. = true
ggml_metal_device_init: has unified memory = true
ggml_metal_device_init: has bfloat = true
ggml_metal_device_init: use residency sets = true
ggml_metal_device_init: use shared buffers = true
ggml_metal_device_init: recommendedMaxWorkingSetSize = 12713.12 MB
llama_model_load_from_file_impl: using device Metal (Apple M2 Pro) (unknown id) - 12123 MiB free
llama_model_loader: loaded meta data with 24 key-value pairs and 112 tensors from /Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = nomic-bert
llama_model_loader: - kv 1: general.name str = nomic-embed-text-v1.5
llama_model_loader: - kv 2: nomic-bert.block_count u32 = 12
llama_model_loader: - kv 3: nomic-bert.context_length u32 = 2048
llama_model_loader: - kv 4: nomic-bert.embedding_length u32 = 768
llama_model_loader: - kv 5: nomic-bert.feed_forward_length u32 = 3072
llama_model_loader: - kv 6: nomic-bert.attention.head_count u32 = 12
llama_model_loader: - kv 7: nomic-bert.attention.layer_norm_epsilon f32 = 0.000000
llama_model_loader: - kv 8: general.file_type u32 = 1
llama_model_loader: - kv 9: nomic-bert.attention.causal bool = false
llama_model_loader: - kv 10: nomic-bert.pooling_type u32 = 1
llama_model_loader: - kv 11: nomic-bert.rope.freq_base f32 = 1000.000000
llama_model_loader: - kv 12: tokenizer.ggml.token_type_count u32 = 2
llama_model_loader: - kv 13: tokenizer.ggml.bos_token_id u32 = 101
llama_model_loader: - kv 14: tokenizer.ggml.eos_token_id u32 = 102
llama_model_loader: - kv 15: tokenizer.ggml.model str = bert
llama_model_loader: - kv 16: tokenizer.ggml.tokens arr[str,30522] = ["[PAD]", "[unused0]", "[unused1]", "...
llama_model_loader: - kv 17: tokenizer.ggml.scores arr[f32,30522] = [-1000.000000, -1000.000000, -1000.00...
llama_model_loader: - kv 18: tokenizer.ggml.token_type arr[i32,30522] = [3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 19: tokenizer.ggml.unknown_token_id u32 = 100
llama_model_loader: - kv 20: tokenizer.ggml.seperator_token_id u32 = 102
llama_model_loader: - kv 21: tokenizer.ggml.padding_token_id u32 = 0
llama_model_loader: - kv 22: tokenizer.ggml.cls_token_id u32 = 101
llama_model_loader: - kv 23: tokenizer.ggml.mask_token_id u32 = 103
llama_model_loader: - type f32: 51 tensors
llama_model_loader: - type f16: 61 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type = F16
print_info: file size = 260.86 MiB (16.00 BPW)
init_tokenizer: initializing tokenizer for type 3
load: control token: 100 '[UNK]' is not marked as EOG
load: control token: 101 '[CLS]' is not marked as EOG
load: control token: 0 '[PAD]' is not marked as EOG
load: control token: 102 '[SEP]' is not marked as EOG
load: control token: 103 '[MASK]' is not marked as EOG
load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
load: printing all EOG tokens:
load: - 102 ('[SEP]')
load: special tokens cache size = 5
load: token to piece cache size = 0.2032 MB
print_info: arch = nomic-bert
print_info: vocab_only = 1
print_info: model type = ?B
print_info: model params = 136.73 M
print_info: general.name = nomic-embed-text-v1.5
print_info: vocab type = WPM
print_info: n_vocab = 30522
print_info: n_merges = 0
print_info: BOS token = 101 '[CLS]'
print_info: EOS token = 102 '[SEP]'
print_info: UNK token = 100 '[UNK]'
print_info: SEP token = 102 '[SEP]'
print_info: PAD token = 0 '[PAD]'
print_info: MASK token = 103 '[MASK]'
print_info: LF token = 0 '[PAD]'
print_info: EOG token = 102 '[SEP]'
print_info: max token length = 21
llama_model_load: vocab only - skipping tensors
time=2025-11-12T14:26:35.193-03:00 level=WARN source=server.go:173 msg="requested context size too large for model" num_ctx=8192 n_ctx_train=2048
time=2025-11-12T14:26:35.193-03:00 level=INFO source=server.go:400 msg="starting runner" cmd="/opt/homebrew/Cellar/ollama/0.12.10/bin/ollama runner --model /Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 --port 61462"
time=2025-11-12T14:26:35.193-03:00 level=DEBUG source=server.go:401 msg=subprocess PATH="/opt/homebrew/bin:/opt/homebrew/bin:/opt/homebrew/sbin:/Users/bedas/.atuin/bin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin://Applications/Topaz Gigapixel AI.app/Contents/Resources/bin:/Library/Apple/usr/bin:/usr/local/MacGPG2/bin:/Users/bedas/.composer/vendor/bin" OLLAMA_NUM_GPU=0 OLLAMA_HOST=0.0.0.0 OLLAMA_DEBUG=1 DYLD_LIBRARY_PATH=/opt/homebrew/Cellar/ollama/0.12.10/bin OLLAMA_LIBRARY_PATH=/opt/homebrew/Cellar/ollama/0.12.10/bin
time=2025-11-12T14:26:35.196-03:00 level=INFO source=server.go:470 msg="system memory" total="16.0 GiB" free="4.6 GiB" free_swap="0 B"
time=2025-11-12T14:26:35.196-03:00 level=DEBUG source=memory.go:198 msg=evaluating library=Metal gpu_count=1 available="[11.8 GiB]"
time=2025-11-12T14:26:35.196-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.vision.block_count default=0
time=2025-11-12T14:26:35.196-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default=0
time=2025-11-12T14:26:35.196-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}"
time=2025-11-12T14:26:35.196-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}"
time=2025-11-12T14:26:35.196-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default=0
time=2025-11-12T14:26:35.196-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}"
time=2025-11-12T14:26:35.196-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}"
time=2025-11-12T14:26:35.196-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.key_length default=64
time=2025-11-12T14:26:35.196-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.value_length default=64
time=2025-11-12T14:26:35.196-03:00 level=DEBUG source=ggml.go:611 msg="default cache size estimate" "attention MiB"=6 "attention bytes"=6291456 "recurrent MiB"=0 "recurrent bytes"=0
time=2025-11-12T14:26:35.196-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default=0
time=2025-11-12T14:26:35.196-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}"
time=2025-11-12T14:26:35.196-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}"
time=2025-11-12T14:26:35.196-03:00 level=INFO source=memory.go:37 msg="new model will fit in available VRAM across minimum required GPUs, loading" model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 library=Metal parallel=1 required="809.4 MiB" gpus=1
time=2025-11-12T14:26:35.196-03:00 level=DEBUG source=memory.go:198 msg=evaluating library=Metal gpu_count=1 available="[11.8 GiB]"
time=2025-11-12T14:26:35.196-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.vision.block_count default=0
time=2025-11-12T14:26:35.196-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default=0
time=2025-11-12T14:26:35.196-03:00
level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}" time=2025-11-12T14:26:35.196-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}" time=2025-11-12T14:26:35.196-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default=0 time=2025-11-12T14:26:35.196-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}" time=2025-11-12T14:26:35.196-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}" time=2025-11-12T14:26:35.196-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.key_length default=64 time=2025-11-12T14:26:35.196-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.value_length default=64 time=2025-11-12T14:26:35.196-03:00 level=DEBUG source=ggml.go:611 msg="default cache size estimate" "attention MiB"=6 "attention bytes"=6291456 "recurrent MiB"=0 "recurrent bytes"=0 time=2025-11-12T14:26:35.196-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default=0 time=2025-11-12T14:26:35.196-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}" time=2025-11-12T14:26:35.196-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}" time=2025-11-12T14:26:35.196-03:00 level=INFO source=server.go:522 msg=offload library=Metal layers.requested=-1 layers.model=13 layers.offload=13 layers.split=[13] memory.available="[11.8 GiB]" memory.gpu_overhead="0 B" memory.required.full="809.4 MiB" memory.required.partial="809.4 
MiB" memory.required.kv="6.0 MiB" memory.required.allocations="[809.4 MiB]" memory.weights.total="260.9 MiB" memory.weights.repeating="216.1 MiB" memory.weights.nonrepeating="44.7 MiB" memory.graph.full="12.0 MiB" memory.graph.partial="12.0 MiB"
time=2025-11-12T14:26:35.209-03:00 level=INFO source=runner.go:910 msg="starting go runner"
time=2025-11-12T14:26:35.209-03:00 level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/opt/homebrew/Cellar/ollama/0.12.10/bin
ggml_metal_library_init: using embedded metal library
ggml_metal_library_init: loaded in 0.006 sec
ggml_metal_device_init: GPU name: Apple M2 Pro
ggml_metal_device_init: GPU family: MTLGPUFamilyApple8 (1008)
ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_device_init: GPU family: MTLGPUFamilyMetal4 (5002)
ggml_metal_device_init: simdgroup reduction = true
ggml_metal_device_init: simdgroup matrix mul. = true
ggml_metal_device_init: has unified memory = true
ggml_metal_device_init: has bfloat = true
ggml_metal_device_init: use residency sets = true
ggml_metal_device_init: use shared buffers = true
ggml_metal_device_init: recommendedMaxWorkingSetSize = 12713.12 MB
time=2025-11-12T14:26:35.210-03:00 level=INFO source=ggml.go:104 msg=system Metal.0.EMBED_LIBRARY=1 CPU.0.NEON=1 CPU.0.ARM_FMA=1 CPU.0.FP16_VA=1 CPU.0.DOTPROD=1 CPU.0.LLAMAFILE=1 CPU.0.ACCELERATE=1 compiler=cgo(clang)
time=2025-11-12T14:26:35.231-03:00 level=INFO source=runner.go:946 msg="Server listening on 127.0.0.1:61462"
time=2025-11-12T14:26:35.240-03:00 level=INFO source=runner.go:845 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:false KvSize:2048 KvCacheType: NumThreads:6 GPULayers:13[ID:0 Layers:13(0..12)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:true}"
llama_model_load_from_file_impl: using device Metal (Apple M2 Pro) (unknown id) - 12123 MiB free
time=2025-11-12T14:26:35.240-03:00 level=INFO source=server.go:1251 msg="waiting for llama runner
to start responding" time=2025-11-12T14:26:35.241-03:00 level=INFO source=server.go:1285 msg="waiting for server to become available" status="llm server loading model" llama_model_loader: loaded meta data with 24 key-value pairs and 112 tensors from /Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 (version GGUF V3 (latest)) llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output. llama_model_loader: - kv 0: general.architecture str = nomic-bert llama_model_loader: - kv 1: general.name str = nomic-embed-text-v1.5 llama_model_loader: - kv 2: nomic-bert.block_count u32 = 12 llama_model_loader: - kv 3: nomic-bert.context_length u32 = 2048 llama_model_loader: - kv 4: nomic-bert.embedding_length u32 = 768 llama_model_loader: - kv 5: nomic-bert.feed_forward_length u32 = 3072 llama_model_loader: - kv 6: nomic-bert.attention.head_count u32 = 12 llama_model_loader: - kv 7: nomic-bert.attention.layer_norm_epsilon f32 = 0.000000 llama_model_loader: - kv 8: general.file_type u32 = 1 llama_model_loader: - kv 9: nomic-bert.attention.causal bool = false llama_model_loader: - kv 10: nomic-bert.pooling_type u32 = 1 llama_model_loader: - kv 11: nomic-bert.rope.freq_base f32 = 1000.000000 llama_model_loader: - kv 12: tokenizer.ggml.token_type_count u32 = 2 llama_model_loader: - kv 13: tokenizer.ggml.bos_token_id u32 = 101 llama_model_loader: - kv 14: tokenizer.ggml.eos_token_id u32 = 102 llama_model_loader: - kv 15: tokenizer.ggml.model str = bert llama_model_loader: - kv 16: tokenizer.ggml.tokens arr[str,30522] = ["[PAD]", "[unused0]", "[unused1]", "... llama_model_loader: - kv 17: tokenizer.ggml.scores arr[f32,30522] = [-1000.000000, -1000.000000, -1000.00... llama_model_loader: - kv 18: tokenizer.ggml.token_type arr[i32,30522] = [3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... 
llama_model_loader: - kv 19: tokenizer.ggml.unknown_token_id u32 = 100 llama_model_loader: - kv 20: tokenizer.ggml.seperator_token_id u32 = 102 llama_model_loader: - kv 21: tokenizer.ggml.padding_token_id u32 = 0 llama_model_loader: - kv 22: tokenizer.ggml.cls_token_id u32 = 101 llama_model_loader: - kv 23: tokenizer.ggml.mask_token_id u32 = 103 llama_model_loader: - type f32: 51 tensors llama_model_loader: - type f16: 61 tensors print_info: file format = GGUF V3 (latest) print_info: file type = F16 print_info: file size = 260.86 MiB (16.00 BPW) init_tokenizer: initializing tokenizer for type 3 load: control token: 100 '[UNK]' is not marked as EOG load: control token: 101 '[CLS]' is not marked as EOG load: control token: 0 '[PAD]' is not marked as EOG load: control token: 102 '[SEP]' is not marked as EOG load: control token: 103 '[MASK]' is not marked as EOG load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect load: printing all EOG tokens: load: - 102 ('[SEP]') load: special tokens cache size = 5 load: token to piece cache size = 0.2032 MB print_info: arch = nomic-bert print_info: vocab_only = 0 print_info: n_ctx_train = 2048 print_info: n_embd = 768 print_info: n_layer = 12 print_info: n_head = 12 print_info: n_head_kv = 12 print_info: n_rot = 64 print_info: n_swa = 0 print_info: is_swa_any = 0 print_info: n_embd_head_k = 64 print_info: n_embd_head_v = 64 print_info: n_gqa = 1 print_info: n_embd_k_gqa = 768 print_info: n_embd_v_gqa = 768 print_info: f_norm_eps = 1.0e-12 print_info: f_norm_rms_eps = 0.0e+00 print_info: f_clamp_kqv = 0.0e+00 print_info: f_max_alibi_bias = 0.0e+00 print_info: f_logit_scale = 0.0e+00 print_info: f_attn_scale = 0.0e+00 print_info: n_ff = 3072 print_info: n_expert = 0 print_info: n_expert_used = 0 print_info: causal attn = 0 print_info: pooling type = 1 print_info: rope type = 2 print_info: rope scaling = linear print_info: freq_base_train = 1000.0 print_info: freq_scale_train = 1 print_info: 
n_ctx_orig_yarn = 2048 print_info: rope_finetuned = unknown print_info: model type = 137M print_info: model params = 136.73 M print_info: general.name = nomic-embed-text-v1.5 print_info: vocab type = WPM print_info: n_vocab = 30522 print_info: n_merges = 0 print_info: BOS token = 101 '[CLS]' print_info: EOS token = 102 '[SEP]' print_info: UNK token = 100 '[UNK]' print_info: SEP token = 102 '[SEP]' print_info: PAD token = 0 '[PAD]' print_info: MASK token = 103 '[MASK]' print_info: LF token = 0 '[PAD]' print_info: EOG token = 102 '[SEP]' print_info: max token length = 21 load_tensors: loading model tensors, this can take a while... (mmap = true) load_tensors: layer 0 assigned to device Metal, is_swa = 0 load_tensors: layer 1 assigned to device Metal, is_swa = 0 load_tensors: layer 2 assigned to device Metal, is_swa = 0 load_tensors: layer 3 assigned to device Metal, is_swa = 0 load_tensors: layer 4 assigned to device Metal, is_swa = 0 load_tensors: layer 5 assigned to device Metal, is_swa = 0 load_tensors: layer 6 assigned to device Metal, is_swa = 0 load_tensors: layer 7 assigned to device Metal, is_swa = 0 load_tensors: layer 8 assigned to device Metal, is_swa = 0 load_tensors: layer 9 assigned to device Metal, is_swa = 0 load_tensors: layer 10 assigned to device Metal, is_swa = 0 load_tensors: layer 11 assigned to device Metal, is_swa = 0 load_tensors: layer 12 assigned to device Metal, is_swa = 0 create_tensor: loading tensor token_embd.weight create_tensor: loading tensor token_types.weight create_tensor: loading tensor token_embd_norm.weight create_tensor: loading tensor token_embd_norm.bias create_tensor: loading tensor blk.0.attn_qkv.weight create_tensor: loading tensor blk.0.attn_output.weight create_tensor: loading tensor blk.0.attn_output_norm.weight create_tensor: loading tensor blk.0.attn_output_norm.bias create_tensor: loading tensor blk.0.ffn_up.weight create_tensor: loading tensor blk.0.ffn_down.weight create_tensor: loading tensor 
blk.0.ffn_gate.weight create_tensor: loading tensor blk.0.layer_output_norm.weight create_tensor: loading tensor blk.0.layer_output_norm.bias create_tensor: loading tensor blk.1.attn_qkv.weight create_tensor: loading tensor blk.1.attn_output.weight create_tensor: loading tensor blk.1.attn_output_norm.weight create_tensor: loading tensor blk.1.attn_output_norm.bias create_tensor: loading tensor blk.1.ffn_up.weight create_tensor: loading tensor blk.1.ffn_down.weight create_tensor: loading tensor blk.1.ffn_gate.weight create_tensor: loading tensor blk.1.layer_output_norm.weight create_tensor: loading tensor blk.1.layer_output_norm.bias create_tensor: loading tensor blk.2.attn_qkv.weight create_tensor: loading tensor blk.2.attn_output.weight create_tensor: loading tensor blk.2.attn_output_norm.weight create_tensor: loading tensor blk.2.attn_output_norm.bias create_tensor: loading tensor blk.2.ffn_up.weight create_tensor: loading tensor blk.2.ffn_down.weight create_tensor: loading tensor blk.2.ffn_gate.weight create_tensor: loading tensor blk.2.layer_output_norm.weight create_tensor: loading tensor blk.2.layer_output_norm.bias create_tensor: loading tensor blk.3.attn_qkv.weight create_tensor: loading tensor blk.3.attn_output.weight create_tensor: loading tensor blk.3.attn_output_norm.weight create_tensor: loading tensor blk.3.attn_output_norm.bias create_tensor: loading tensor blk.3.ffn_up.weight create_tensor: loading tensor blk.3.ffn_down.weight create_tensor: loading tensor blk.3.ffn_gate.weight create_tensor: loading tensor blk.3.layer_output_norm.weight create_tensor: loading tensor blk.3.layer_output_norm.bias create_tensor: loading tensor blk.4.attn_qkv.weight create_tensor: loading tensor blk.4.attn_output.weight create_tensor: loading tensor blk.4.attn_output_norm.weight create_tensor: loading tensor blk.4.attn_output_norm.bias create_tensor: loading tensor blk.4.ffn_up.weight create_tensor: loading tensor blk.4.ffn_down.weight create_tensor: loading tensor 
blk.4.ffn_gate.weight create_tensor: loading tensor blk.4.layer_output_norm.weight create_tensor: loading tensor blk.4.layer_output_norm.bias create_tensor: loading tensor blk.5.attn_qkv.weight create_tensor: loading tensor blk.5.attn_output.weight create_tensor: loading tensor blk.5.attn_output_norm.weight create_tensor: loading tensor blk.5.attn_output_norm.bias create_tensor: loading tensor blk.5.ffn_up.weight create_tensor: loading tensor blk.5.ffn_down.weight create_tensor: loading tensor blk.5.ffn_gate.weight create_tensor: loading tensor blk.5.layer_output_norm.weight create_tensor: loading tensor blk.5.layer_output_norm.bias create_tensor: loading tensor blk.6.attn_qkv.weight create_tensor: loading tensor blk.6.attn_output.weight create_tensor: loading tensor blk.6.attn_output_norm.weight create_tensor: loading tensor blk.6.attn_output_norm.bias create_tensor: loading tensor blk.6.ffn_up.weight create_tensor: loading tensor blk.6.ffn_down.weight create_tensor: loading tensor blk.6.ffn_gate.weight create_tensor: loading tensor blk.6.layer_output_norm.weight create_tensor: loading tensor blk.6.layer_output_norm.bias create_tensor: loading tensor blk.7.attn_qkv.weight create_tensor: loading tensor blk.7.attn_output.weight create_tensor: loading tensor blk.7.attn_output_norm.weight create_tensor: loading tensor blk.7.attn_output_norm.bias create_tensor: loading tensor blk.7.ffn_up.weight create_tensor: loading tensor blk.7.ffn_down.weight create_tensor: loading tensor blk.7.ffn_gate.weight create_tensor: loading tensor blk.7.layer_output_norm.weight create_tensor: loading tensor blk.7.layer_output_norm.bias create_tensor: loading tensor blk.8.attn_qkv.weight create_tensor: loading tensor blk.8.attn_output.weight create_tensor: loading tensor blk.8.attn_output_norm.weight create_tensor: loading tensor blk.8.attn_output_norm.bias create_tensor: loading tensor blk.8.ffn_up.weight create_tensor: loading tensor blk.8.ffn_down.weight create_tensor: loading tensor 
blk.8.ffn_gate.weight create_tensor: loading tensor blk.8.layer_output_norm.weight create_tensor: loading tensor blk.8.layer_output_norm.bias create_tensor: loading tensor blk.9.attn_qkv.weight create_tensor: loading tensor blk.9.attn_output.weight create_tensor: loading tensor blk.9.attn_output_norm.weight create_tensor: loading tensor blk.9.attn_output_norm.bias create_tensor: loading tensor blk.9.ffn_up.weight create_tensor: loading tensor blk.9.ffn_down.weight create_tensor: loading tensor blk.9.ffn_gate.weight create_tensor: loading tensor blk.9.layer_output_norm.weight create_tensor: loading tensor blk.9.layer_output_norm.bias create_tensor: loading tensor blk.10.attn_qkv.weight create_tensor: loading tensor blk.10.attn_output.weight create_tensor: loading tensor blk.10.attn_output_norm.weight create_tensor: loading tensor blk.10.attn_output_norm.bias create_tensor: loading tensor blk.10.ffn_up.weight create_tensor: loading tensor blk.10.ffn_down.weight create_tensor: loading tensor blk.10.ffn_gate.weight create_tensor: loading tensor blk.10.layer_output_norm.weight create_tensor: loading tensor blk.10.layer_output_norm.bias create_tensor: loading tensor blk.11.attn_qkv.weight create_tensor: loading tensor blk.11.attn_output.weight create_tensor: loading tensor blk.11.attn_output_norm.weight create_tensor: loading tensor blk.11.attn_output_norm.bias create_tensor: loading tensor blk.11.ffn_up.weight create_tensor: loading tensor blk.11.ffn_down.weight create_tensor: loading tensor blk.11.ffn_gate.weight create_tensor: loading tensor blk.11.layer_output_norm.weight create_tensor: loading tensor blk.11.layer_output_norm.bias load_tensors: offloading 12 repeating layers to GPU load_tensors: offloading output layer to GPU load_tensors: offloaded 13/13 layers to GPU load_tensors: CPU_Mapped model buffer size = 44.72 MiB load_tensors: Metal_Mapped model buffer size = 216.14 MiB llama_init_from_model: model default pooling_type is [1], but [-1] was specified 
llama_context: constructing llama_context
llama_context: n_seq_max = 1
llama_context: n_ctx = 2048
llama_context: n_ctx_per_seq = 2048
llama_context: n_batch = 512
llama_context: n_ubatch = 512
llama_context: causal_attn = 0
llama_context: flash_attn = disabled
llama_context: kv_unified = false
llama_context: freq_base = 1000.0
llama_context: freq_scale = 1
ggml_metal_init: allocating
ggml_metal_init: picking default device: Apple M2 Pro
ggml_metal_init: use bfloat = true
ggml_metal_init: use fusion = true
ggml_metal_init: use concurrency = true
ggml_metal_init: use graph optimize = true
set_abort_callback: call
llama_context: CPU output buffer size = 0.12 MiB
llama_context: enumerating backends
llama_context: backend_ptrs.size() = 3
llama_context: max_nodes = 1024
llama_context: worst-case: n_tokens = 512, n_seqs = 1, n_outputs = 1
graph_reserve: reserving a graph for ubatch with n_tokens = 512, n_seqs = 1, n_outputs = 512
graph_reserve: reserving a graph for ubatch with n_tokens = 1, n_seqs = 1, n_outputs = 1
graph_reserve: reserving a graph for ubatch with n_tokens = 512, n_seqs = 1, n_outputs = 512
llama_context: Metal compute buffer size = 23.51 MiB
llama_context: CPU compute buffer size = 4.01 MiB
llama_context: graph nodes = 371
llama_context: graph splits = 2
time=2025-11-12T14:26:35.492-03:00 level=INFO source=server.go:1289 msg="llama runner started in 0.30 seconds"
time=2025-11-12T14:26:35.492-03:00 level=INFO source=sched.go:500 msg="loaded runners" count=1
time=2025-11-12T14:26:35.493-03:00 level=INFO source=server.go:1251 msg="waiting for llama runner to start responding"
time=2025-11-12T14:26:35.493-03:00 level=INFO source=server.go:1289 msg="llama runner started in 0.30 seconds"
time=2025-11-12T14:26:35.493-03:00 level=DEBUG source=sched.go:512 msg="finished setting up" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=11885 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192
time=2025-11-12T14:26:35.498-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32
time=2025-11-12T14:26:35.504-03:00 level=DEBUG source=cache.go:104 msg="loading cache slot" id=0 cache=0 prompt=495 used=0 remaining=495
decode: cannot decode batches with this context (calling encode() instead)
init: embeddings required but some input tokens were not marked as outputs -> overriding
output_reserve: reallocating output buffer from size 0.12 MiB to 59.08 MiB
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_mul_mm_f16_f32', name = 'kernel_mul_mm_f16_f32_bci=0_bco=1'
ggml_metal_library_compile_pipeline: loaded kernel_mul_mm_f16_f32_bci=0_bco=1 0x1044fd660 | th_max = 896 | th_width = 32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_rope_neox_f32', name = 'kernel_rope_neox_f32'
ggml_metal_library_compile_pipeline: loaded kernel_rope_neox_f32 0x1044fdfa0 | th_max = 1024 | th_width = 32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_cpy_f32_f32', name = 'kernel_cpy_f32_f32'
ggml_metal_library_compile_pipeline: loaded kernel_cpy_f32_f32 0x1044fe8e0 | th_max = 1024 | th_width = 32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_mul_mm_f32_f32', name = 'kernel_mul_mm_f32_f32_bci=0_bco=1'
ggml_metal_library_compile_pipeline: loaded kernel_mul_mm_f32_f32_bci=0_bco=1 0x1044ffa60 | th_max = 832 | th_width = 32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_soft_max_f32', name = 'kernel_soft_max_f32'
ggml_metal_library_compile_pipeline: loaded kernel_soft_max_f32 0x104500260 | th_max = 1024 | th_width = 32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_mul_mm_f32_f32', name = 'kernel_mul_mm_f32_f32_bci=1_bco=1'
ggml_metal_library_compile_pipeline: loaded kernel_mul_mm_f32_f32_bci=1_bco=1 0x104500860 | th_max = 832 | th_width = 32 ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_add_fuse_1', name = 'kernel_add_fuse_1' ggml_metal_library_compile_pipeline: loaded kernel_add_fuse_1 0x104500b60 | th_max = 1024 | th_width = 32 ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_norm_mul_add_f32_4', name = 'kernel_norm_mul_add_f32_4' ggml_metal_library_compile_pipeline: loaded kernel_norm_mul_add_f32_4 0xbeef90000 | th_max = 1024 | th_width = 32 ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_swiglu_f32', name = 'kernel_swiglu_f32' ggml_metal_library_compile_pipeline: loaded kernel_swiglu_f32 0xbeef90300 | th_max = 1024 | th_width = 32 ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_get_rows_f32', name = 'kernel_get_rows_f32' ggml_metal_library_compile_pipeline: loaded kernel_get_rows_f32 0xbeef90600 | th_max = 1024 | th_width = 32 ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_mul_mv_f32_f32', name = 'kernel_mul_mv_f32_f32_nsg=4' ggml_metal_library_compile_pipeline: loaded kernel_mul_mv_f32_f32_nsg=4 0xbeef90900 | th_max = 1024 | th_width = 32 [GIN] 2025/11/12 - 14:26:35 | 200 | 451.770458ms | 192.168.1.3 | POST "/api/embed" time=2025-11-12T14:26:35.586-03:00 level=DEBUG source=sched.go:520 msg="context for request finished" time=2025-11-12T14:26:35.586-03:00 level=DEBUG source=sched.go:290 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=11885 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192 duration=5m0s time=2025-11-12T14:26:35.586-03:00 level=DEBUG source=sched.go:308 msg="after 
processing request finished event" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=11885 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192 refCount=0 time=2025-11-12T14:26:35.599-03:00 level=DEBUG source=sched.go:609 msg="evaluating already loaded" model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 time=2025-11-12T14:26:35.602-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32 time=2025-11-12T14:26:35.603-03:00 level=DEBUG source=cache.go:104 msg="loading cache slot" id=0 cache=495 prompt=491 used=0 remaining=491 decode: cannot decode batches with this context (calling encode() instead) init: embeddings required but some input tokens were not marked as outputs -> overriding [GIN] 2025/11/12 - 14:26:35 | 200 | 62.181875ms | 192.168.1.3 | POST "/api/embed" time=2025-11-12T14:26:35.650-03:00 level=DEBUG source=sched.go:385 msg="context for request finished" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=11885 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192 time=2025-11-12T14:26:35.650-03:00 level=DEBUG source=sched.go:290 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=11885 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192 duration=5m0s 
time=2025-11-12T14:26:35.650-03:00 level=DEBUG source=sched.go:308 msg="after processing request finished event" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=11885 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192 refCount=0 time=2025-11-12T14:26:35.660-03:00 level=DEBUG source=sched.go:609 msg="evaluating already loaded" model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 time=2025-11-12T14:26:35.663-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32 time=2025-11-12T14:26:35.664-03:00 level=DEBUG source=cache.go:104 msg="loading cache slot" id=0 cache=491 prompt=452 used=0 remaining=452 decode: cannot decode batches with this context (calling encode() instead) init: embeddings required but some input tokens were not marked as outputs -> overriding ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_soft_max_f32_4', name = 'kernel_soft_max_f32_4' ggml_metal_library_compile_pipeline: loaded kernel_soft_max_f32_4 0xbeef90c00 | th_max = 1024 | th_width = 32 ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_mul_mv_f32_f32_4', name = 'kernel_mul_mv_f32_f32_4_nsg=4' ggml_metal_library_compile_pipeline: loaded kernel_mul_mv_f32_f32_4_nsg=4 0xbeef90f00 | th_max = 768 | th_width = 32 [GIN] 2025/11/12 - 14:26:35 | 200 | 54.840042ms | 192.168.1.3 | POST "/api/embed" time=2025-11-12T14:26:35.707-03:00 level=DEBUG source=sched.go:385 msg="context for request finished" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=11885 
runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192
time=2025-11-12T14:26:35.707-03:00 level=DEBUG source=sched.go:290 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=11885 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192 duration=5m0s
time=2025-11-12T14:26:35.707-03:00 level=DEBUG source=sched.go:308 msg="after processing request finished event" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=11885 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192 refCount=0
time=2025-11-12T14:26:35.717-03:00 level=DEBUG source=sched.go:609 msg="evaluating already loaded" model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6
time=2025-11-12T14:26:35.719-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32
time=2025-11-12T14:26:35.720-03:00 level=DEBUG source=cache.go:104 msg="loading cache slot" id=0 cache=452 prompt=255 used=0 remaining=255
decode: cannot decode batches with this context (calling encode() instead)
init: embeddings required but some input tokens were not marked as outputs -> overriding
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_mul_mv_f32_f32', name = 'kernel_mul_mv_f32_f32_nsg=2'
ggml_metal_library_compile_pipeline: loaded kernel_mul_mv_f32_f32_nsg=2 0xbeef91200 | th_max = 1024 | th_width = 32
[GIN] 2025/11/12 - 14:26:35 | 200 | 33.137375ms | 192.168.1.3 | POST "/api/embed"
time=2025-11-12T14:26:35.741-03:00 level=DEBUG source=sched.go:385 msg="context for request finished" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=11885 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192
time=2025-11-12T14:26:35.741-03:00 level=DEBUG source=sched.go:290 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=11885 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192 duration=5m0s
time=2025-11-12T14:26:35.742-03:00 level=DEBUG source=sched.go:308 msg="after processing request finished event" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=11885 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192 refCount=0
time=2025-11-12T14:26:35.757-03:00 level=DEBUG source=sched.go:609 msg="evaluating already loaded" model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6
time=2025-11-12T14:26:35.759-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32
time=2025-11-12T14:26:35.760-03:00 level=DEBUG source=cache.go:104 msg="loading cache slot" id=0 cache=255 prompt=1872 used=0 remaining=1872
decode: cannot decode batches with this context (calling encode() instead)
init: embeddings required but some input tokens were not marked as outputs -> overriding
output_reserve: reallocating output buffer from size 59.08 MiB to 61.11 MiB
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_mul_mm_f16_f32', name = 'kernel_mul_mm_f16_f32_bci=0_bco=0'
ggml_metal_library_compile_pipeline: loaded kernel_mul_mm_f16_f32_bci=0_bco=0 0xbeef91500 | th_max = 1024 | th_width = 32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_mul_mm_f32_f32', name = 'kernel_mul_mm_f32_f32_bci=0_bco=0'
ggml_metal_library_compile_pipeline: loaded kernel_mul_mm_f32_f32_bci=0_bco=0 0xbeef91800 | th_max = 896 | th_width = 32
decode: cannot decode batches with this context (calling encode() instead)
init: embeddings required but some input tokens were not marked as outputs -> overriding

SIGTRAP: trace trap
PC=0x181ddab1c m=10 sigcode=0
signal arrived during cgo execution

goroutine 38 gp=0x14000102c40 m=10 mp=0x1400054c008 [syscall]:
runtime.cgocall(0x102c00fbc, 0x140004ceb88)
    runtime/cgocall.go:167 +0x44 fp=0x140004ceb50 sp=0x140004ceb10 pc=0x102106684
github.com/ollama/ollama/llama._Cfunc_llama_decode(0x1044ef1d0, {0x200, 0x1044f68f0, 0x0, 0x1044f70f0, 0xbee802000, 0xbef04fc00, 0x1044f0570})
    _cgo_gotypes.go:674 +0x30 fp=0x140004ceb80 sp=0x140004ceb50 pc=0x10244d790
github.com/ollama/ollama/llama.(*Context).Decode.func1(...)
    github.com/ollama/ollama/llama/llama.go:160
github.com/ollama/ollama/llama.(*Context).Decode(0x140004b8008?, 0x0?)
    github.com/ollama/ollama/llama/llama.go:160 +0xcc fp=0x140004cec70 sp=0x140004ceb80 pc=0x10244f94c
github.com/ollama/ollama/runner/llamarunner.(*Server).processBatch(0x1400034e280, 0x14000378550, 0x140004cef18)
    github.com/ollama/ollama/runner/llamarunner/runner.go:468 +0x1d4 fp=0x140004ceed0 sp=0x140004cec70 pc=0x1024ef774
github.com/ollama/ollama/runner/llamarunner.(*Server).run(0x1400034e280, {0x10330bcd0, 0x140003780a0})
    github.com/ollama/ollama/runner/llamarunner/runner.go:361 +0x15c fp=0x140004cefa0 sp=0x140004ceed0 pc=0x1024ef43c
github.com/ollama/ollama/runner/llamarunner.Execute.gowrap1()
    github.com/ollama/ollama/runner/llamarunner/runner.go:926 +0x2c fp=0x140004cefd0 sp=0x140004cefa0 pc=0x1024f324c
runtime.goexit({})
    runtime/asm_arm64.s:1268 +0x4 fp=0x140004cefd0 sp=0x140004cefd0 pc=0x102111d04
created by github.com/ollama/ollama/runner/llamarunner.Execute in goroutine 1
    github.com/ollama/ollama/runner/llamarunner/runner.go:926 +0x418

goroutine 1 gp=0x140000021c0 m=nil [IO wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
    runtime/proc.go:460 +0xc0 fp=0x140003c1720 sp=0x140003c1700 pc=0x102109a50
runtime.netpollblock(0x1400011f7b8?, 0x218ba6c?, 0x1?)
    runtime/netpoll.go:575 +0x150 fp=0x140003c1760 sp=0x140003c1720 pc=0x1020cf620
internal/poll.runtime_pollWait(0x12f44ba00, 0x72)
    runtime/netpoll.go:351 +0xa0 fp=0x140003c1790 sp=0x140003c1760 pc=0x102108c80
internal/poll.(*pollDesc).wait(0x1400047c100?, 0x10218dafc?, 0x0)
    internal/poll/fd_poll_runtime.go:84 +0x28 fp=0x140003c17c0 sp=0x140003c1790 pc=0x102187568
internal/poll.(*pollDesc).waitRead(...)
    internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Accept(0x1400047c100)
    internal/poll/fd_unix.go:613 +0x21c fp=0x140003c1870 sp=0x140003c17c0 pc=0x10218bb4c
net.(*netFD).accept(0x1400047c100)
    net/fd_unix.go:161 +0x28 fp=0x140003c1930 sp=0x140003c1870 pc=0x1021ed7f8
net.(*TCPListener).accept(0x1400012a0c0)
    net/tcpsock_posix.go:159 +0x24 fp=0x140003c1980 sp=0x140003c1930 pc=0x102200e64
net.(*TCPListener).Accept(0x1400012a0c0)
    net/tcpsock.go:380 +0x2c fp=0x140003c19c0 sp=0x140003c1980 pc=0x1021fff0c
net/http.(*onceCloseListener).Accept(0x14000568090?)
    <autogenerated>:1 +0x2c fp=0x140003c19e0 sp=0x140003c19c0 pc=0x1023d4a1c
net/http.(*Server).Serve(0x14000562100, {0x103309668, 0x1400012a0c0})
    net/http/server.go:3463 +0x24c fp=0x140003c1b10 sp=0x140003c19e0 pc=0x1023afbac
github.com/ollama/ollama/runner/llamarunner.Execute({0x14000034200, 0x4, 0x4})
    github.com/ollama/ollama/runner/llamarunner/runner.go:947 +0x754 fp=0x140003c1ce0 sp=0x140003c1b10 pc=0x1024f3064
github.com/ollama/ollama/runner.Execute({0x140000341f0?, 0x0?, 0x0?})
    github.com/ollama/ollama/runner/runner.go:22 +0x128 fp=0x140003c1d10 sp=0x140003c1ce0 pc=0x10256a528
github.com/ollama/ollama/cmd.NewCLI.func2(0x140001f5400?, {0x102e57af0?, 0x4?, 0x102e57af4?})
    github.com/ollama/ollama/cmd/cmd.go:1841 +0x50 fp=0x140003c1d40 sp=0x140003c1d10 pc=0x102bb1db0
github.com/spf13/cobra.(*Command).execute(0x140007a3808, {0x1400052fac0, 0x4, 0x4})
    github.com/spf13/cobra@v1.7.0/command.go:940 +0x5e0 fp=0x140003c1e60 sp=0x140003c1d40 pc=0x102257f60
github.com/spf13/cobra.(*Command).ExecuteC(0x14000536908)
    github.com/spf13/cobra@v1.7.0/command.go:1068 +0x2ec fp=0x140003c1f20 sp=0x140003c1e60 pc=0x10225863c
github.com/spf13/cobra.(*Command).Execute(...)
    github.com/spf13/cobra@v1.7.0/command.go:992
github.com/spf13/cobra.(*Command).ExecuteContext(...)
    github.com/spf13/cobra@v1.7.0/command.go:985
main.main()
    github.com/ollama/ollama/main.go:12 +0x54 fp=0x140003c1f40 sp=0x140003c1f20 pc=0x102bb28d4
runtime.main()
    runtime/proc.go:285 +0x278 fp=0x140003c1fd0 sp=0x140003c1f40 pc=0x1020d60d8
runtime.goexit({})
    runtime/asm_arm64.s:1268 +0x4 fp=0x140003c1fd0 sp=0x140003c1fd0 pc=0x102111d04

goroutine 2 gp=0x14000002c40 m=nil [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
    runtime/proc.go:460 +0xc0 fp=0x1400006cf90 sp=0x1400006cf70 pc=0x102109a50
runtime.goparkunlock(...)
    runtime/proc.go:466
runtime.forcegchelper()
    runtime/proc.go:373 +0xb4 fp=0x1400006cfd0 sp=0x1400006cf90 pc=0x1020d6424
runtime.goexit({})
    runtime/asm_arm64.s:1268 +0x4 fp=0x1400006cfd0 sp=0x1400006cfd0 pc=0x102111d04
created by runtime.init.7 in goroutine 1
    runtime/proc.go:361 +0x24

goroutine 3 gp=0x14000003180 m=nil [GC sweep wait]:
runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?)
    runtime/proc.go:460 +0xc0 fp=0x1400006d760 sp=0x1400006d740 pc=0x102109a50
runtime.goparkunlock(...)
    runtime/proc.go:466
runtime.bgsweep(0x14000098000)
    runtime/mgcsweep.go:323 +0x104 fp=0x1400006d7b0 sp=0x1400006d760 pc=0x1020c0ee4
runtime.gcenable.gowrap1()
    runtime/mgc.go:212 +0x28 fp=0x1400006d7d0 sp=0x1400006d7b0 pc=0x1020b4b38
runtime.goexit({})
    runtime/asm_arm64.s:1268 +0x4 fp=0x1400006d7d0 sp=0x1400006d7d0 pc=0x102111d04
created by runtime.gcenable in goroutine 1
    runtime/mgc.go:212 +0x6c

goroutine 4 gp=0x14000003340 m=nil [GC scavenge wait]:
runtime.gopark(0x10000?, 0x103016590?, 0x0?, 0x0?, 0x0?)
    runtime/proc.go:460 +0xc0 fp=0x1400006df60 sp=0x1400006df40 pc=0x102109a50
runtime.goparkunlock(...)
    runtime/proc.go:466
runtime.(*scavengerState).park(0x103bfd860)
    runtime/mgcscavenge.go:425 +0x5c fp=0x1400006df90 sp=0x1400006df60 pc=0x1020be9fc
runtime.bgscavenge(0x14000098000)
    runtime/mgcscavenge.go:658 +0xac fp=0x1400006dfb0 sp=0x1400006df90 pc=0x1020bef9c
runtime.gcenable.gowrap2()
    runtime/mgc.go:213 +0x28 fp=0x1400006dfd0 sp=0x1400006dfb0 pc=0x1020b4ad8
runtime.goexit({})
    runtime/asm_arm64.s:1268 +0x4 fp=0x1400006dfd0 sp=0x1400006dfd0 pc=0x102111d04
created by runtime.gcenable in goroutine 1
    runtime/mgc.go:213 +0xac

goroutine 5 gp=0x14000003c00 m=nil [finalizer wait]:
runtime.gopark(0x0?, 0x1032f5a18?, 0xc0?, 0x0?, 0x1000000010?)
    runtime/proc.go:460 +0xc0 fp=0x1400006c580 sp=0x1400006c560 pc=0x102109a50
runtime.runFinalizers()
    runtime/mfinal.go:210 +0x104 fp=0x1400006c7d0 sp=0x1400006c580 pc=0x1020b3b24
runtime.goexit({})
    runtime/asm_arm64.s:1268 +0x4 fp=0x1400006c7d0 sp=0x1400006c7d0 pc=0x102111d04
created by runtime.createfing in goroutine 1
    runtime/mfinal.go:172 +0x78

goroutine 6 gp=0x140001e0700 m=nil [cleanup wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
    runtime/proc.go:460 +0xc0 fp=0x1400006e740 sp=0x1400006e720 pc=0x102109a50
runtime.goparkunlock(...)
    runtime/proc.go:466
runtime.(*cleanupQueue).dequeue(0x103bfe740)
    runtime/mcleanup.go:439 +0x110 fp=0x1400006e780 sp=0x1400006e740 pc=0x1020b1010
runtime.runCleanups()
    runtime/mcleanup.go:635 +0x40 fp=0x1400006e7d0 sp=0x1400006e780 pc=0x1020b1820
runtime.goexit({})
    runtime/asm_arm64.s:1268 +0x4 fp=0x1400006e7d0 sp=0x1400006e7d0 pc=0x102111d04
created by runtime.(*cleanupQueue).createGs in goroutine 1
    runtime/mcleanup.go:589 +0x108

goroutine 7 gp=0x140001e0c40 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
    runtime/proc.go:460 +0xc0 fp=0x1400006ef10 sp=0x1400006eef0 pc=0x102109a50
runtime.gcBgMarkWorker(0x140000a56c0)
    runtime/mgc.go:1463 +0xe0 fp=0x1400006efb0 sp=0x1400006ef10 pc=0x1020b71b0
runtime.gcBgMarkStartWorkers.gowrap1()
    runtime/mgc.go:1373 +0x28 fp=0x1400006efd0 sp=0x1400006efb0 pc=0x1020b7098
runtime.goexit({})
    runtime/asm_arm64.s:1268 +0x4 fp=0x1400006efd0 sp=0x1400006efd0 pc=0x102111d04
created by runtime.gcBgMarkStartWorkers in goroutine 1
    runtime/mgc.go:1373 +0x140

goroutine 18 gp=0x14000102380 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
    runtime/proc.go:460 +0xc0 fp=0x14000068710 sp=0x140000686f0 pc=0x102109a50
runtime.gcBgMarkWorker(0x140000a56c0)
    runtime/mgc.go:1463 +0xe0 fp=0x140000687b0 sp=0x14000068710 pc=0x1020b71b0
runtime.gcBgMarkStartWorkers.gowrap1()
    runtime/mgc.go:1373 +0x28 fp=0x140000687d0 sp=0x140000687b0 pc=0x1020b7098
runtime.goexit({})
    runtime/asm_arm64.s:1268 +0x4 fp=0x140000687d0 sp=0x140000687d0 pc=0x102111d04
created by runtime.gcBgMarkStartWorkers in goroutine 1
    runtime/mgc.go:1373 +0x140

goroutine 34 gp=0x14000504000 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
    runtime/proc.go:460 +0xc0 fp=0x1400050a710 sp=0x1400050a6f0 pc=0x102109a50
runtime.gcBgMarkWorker(0x140000a56c0)
    runtime/mgc.go:1463 +0xe0 fp=0x1400050a7b0 sp=0x1400050a710 pc=0x1020b71b0
runtime.gcBgMarkStartWorkers.gowrap1()
    runtime/mgc.go:1373 +0x28 fp=0x1400050a7d0 sp=0x1400050a7b0 pc=0x1020b7098
runtime.goexit({})
    runtime/asm_arm64.s:1268 +0x4 fp=0x1400050a7d0 sp=0x1400050a7d0 pc=0x102111d04
created by runtime.gcBgMarkStartWorkers in goroutine 1
    runtime/mgc.go:1373 +0x140

goroutine 35 gp=0x140005041c0 m=nil [GC worker (idle)]:
runtime.gopark(0x103c2f920?, 0x1?, 0xd4?, 0xad?, 0x0?)
    runtime/proc.go:460 +0xc0 fp=0x140004c9f10 sp=0x140004c9ef0 pc=0x102109a50
runtime.gcBgMarkWorker(0x140000a56c0)
    runtime/mgc.go:1463 +0xe0 fp=0x140004c9fb0 sp=0x140004c9f10 pc=0x1020b71b0
runtime.gcBgMarkStartWorkers.gowrap1()
    runtime/mgc.go:1373 +0x28 fp=0x140004c9fd0 sp=0x140004c9fb0 pc=0x1020b7098
runtime.goexit({})
    runtime/asm_arm64.s:1268 +0x4 fp=0x140004c9fd0 sp=0x140004c9fd0 pc=0x102111d04
created by runtime.gcBgMarkStartWorkers in goroutine 1
    runtime/mgc.go:1373 +0x140

goroutine 36 gp=0x14000504380 m=nil [GC worker (idle)]:
runtime.gopark(0x588e585b5798a?, 0x0?, 0x0?, 0x0?, 0x0?)
    runtime/proc.go:460 +0xc0 fp=0x1400050b710 sp=0x1400050b6f0 pc=0x102109a50
runtime.gcBgMarkWorker(0x140000a56c0)
    runtime/mgc.go:1463 +0xe0 fp=0x1400050b7b0 sp=0x1400050b710 pc=0x1020b71b0
runtime.gcBgMarkStartWorkers.gowrap1()
    runtime/mgc.go:1373 +0x28 fp=0x1400050b7d0 sp=0x1400050b7b0 pc=0x1020b7098
runtime.goexit({})
    runtime/asm_arm64.s:1268 +0x4 fp=0x1400050b7d0 sp=0x1400050b7d0 pc=0x102111d04
created by runtime.gcBgMarkStartWorkers in goroutine 1
    runtime/mgc.go:1373 +0x140

goroutine 8 gp=0x140001e0e00 m=nil [GC worker (idle)]:
runtime.gopark(0x588e585b58954?, 0x3?, 0xa9?, 0xbc?, 0x0?)
    runtime/proc.go:460 +0xc0 fp=0x1400006f710 sp=0x1400006f6f0 pc=0x102109a50
runtime.gcBgMarkWorker(0x140000a56c0)
    runtime/mgc.go:1463 +0xe0 fp=0x1400006f7b0 sp=0x1400006f710 pc=0x1020b71b0
runtime.gcBgMarkStartWorkers.gowrap1()
    runtime/mgc.go:1373 +0x28 fp=0x1400006f7d0 sp=0x1400006f7b0 pc=0x1020b7098
runtime.goexit({})
    runtime/asm_arm64.s:1268 +0x4 fp=0x1400006f7d0 sp=0x1400006f7d0 pc=0x102111d04
created by runtime.gcBgMarkStartWorkers in goroutine 1
    runtime/mgc.go:1373 +0x140

goroutine 19 gp=0x14000102540 m=nil [GC worker (idle)]:
runtime.gopark(0x588e585b70054?, 0x3?, 0x5?, 0x6?, 0x0?)
    runtime/proc.go:460 +0xc0 fp=0x14000068f10 sp=0x14000068ef0 pc=0x102109a50
runtime.gcBgMarkWorker(0x140000a56c0)
    runtime/mgc.go:1463 +0xe0 fp=0x14000068fb0 sp=0x14000068f10 pc=0x1020b71b0
runtime.gcBgMarkStartWorkers.gowrap1()
    runtime/mgc.go:1373 +0x28 fp=0x14000068fd0 sp=0x14000068fb0 pc=0x1020b7098
runtime.goexit({})
    runtime/asm_arm64.s:1268 +0x4 fp=0x14000068fd0 sp=0x14000068fd0 pc=0x102111d04
created by runtime.gcBgMarkStartWorkers in goroutine 1
    runtime/mgc.go:1373 +0x140

goroutine 37 gp=0x14000504540 m=nil [GC worker (idle)]:
runtime.gopark(0x588e585b57c78?, 0x0?, 0x0?, 0x0?, 0x0?)
    runtime/proc.go:460 +0xc0 fp=0x1400050bf10 sp=0x1400050bef0 pc=0x102109a50
runtime.gcBgMarkWorker(0x140000a56c0)
    runtime/mgc.go:1463 +0xe0 fp=0x1400050bfb0 sp=0x1400050bf10 pc=0x1020b71b0
runtime.gcBgMarkStartWorkers.gowrap1()
    runtime/mgc.go:1373 +0x28 fp=0x1400050bfd0 sp=0x1400050bfb0 pc=0x1020b7098
runtime.goexit({})
    runtime/asm_arm64.s:1268 +0x4 fp=0x1400050bfd0 sp=0x1400050bfd0 pc=0x102111d04
created by runtime.gcBgMarkStartWorkers in goroutine 1
    runtime/mgc.go:1373 +0x140

goroutine 20 gp=0x14000102700 m=nil [GC worker (idle)]:
runtime.gopark(0x588e585b71194?, 0x1?, 0x7?, 0xcd?, 0x0?)
    runtime/proc.go:460 +0xc0 fp=0x14000083f10 sp=0x14000083ef0 pc=0x102109a50
runtime.gcBgMarkWorker(0x140000a56c0)
    runtime/mgc.go:1463 +0xe0 fp=0x14000083fb0 sp=0x14000083f10 pc=0x1020b71b0
runtime.gcBgMarkStartWorkers.gowrap1()
    runtime/mgc.go:1373 +0x28 fp=0x14000083fd0 sp=0x14000083fb0 pc=0x1020b7098
runtime.goexit({})
    runtime/asm_arm64.s:1268 +0x4 fp=0x14000083fd0 sp=0x14000083fd0 pc=0x102111d04
created by runtime.gcBgMarkStartWorkers in goroutine 1
    runtime/mgc.go:1373 +0x140

goroutine 9 gp=0x140001e0fc0 m=nil [GC worker (idle)]:
runtime.gopark(0x588e585b771c5?, 0x1?, 0x98?, 0x3a?, 0x0?)
    runtime/proc.go:460 +0xc0 fp=0x140004caf10 sp=0x140004caef0 pc=0x102109a50
runtime.gcBgMarkWorker(0x140000a56c0)
    runtime/mgc.go:1463 +0xe0 fp=0x140004cafb0 sp=0x140004caf10 pc=0x1020b71b0
runtime.gcBgMarkStartWorkers.gowrap1()
    runtime/mgc.go:1373 +0x28 fp=0x140004cafd0 sp=0x140004cafb0 pc=0x1020b7098
runtime.goexit({})
    runtime/asm_arm64.s:1268 +0x4 fp=0x140004cafd0 sp=0x140004cafd0 pc=0x102111d04
created by runtime.gcBgMarkStartWorkers in goroutine 1
    runtime/mgc.go:1373 +0x140

goroutine 39 gp=0x14000102e00 m=nil [chan receive]:
runtime.gopark(0x1400056d868?, 0x1020ea594?, 0x98?, 0xd8?, 0x10210b9fc?)
    runtime/proc.go:460 +0xc0 fp=0x1400056d850 sp=0x1400056d830 pc=0x102109a50
runtime.chanrecv(0x140003182a0, 0x1400056da40, 0x1)
    runtime/chan.go:667 +0x428 fp=0x1400056d8d0 sp=0x1400056d850 pc=0x1020a4318
runtime.chanrecv1(0x1400057e030?, 0x140004d8000?)
    runtime/chan.go:509 +0x14 fp=0x1400056d900 sp=0x1400056d8d0 pc=0x1020a3eb4
github.com/ollama/ollama/runner/llamarunner.(*Server).embeddings(0x1400034e280, {0x103309848, 0x140005424b0}, 0x1400037aa00)
    github.com/ollama/ollama/runner/llamarunner/runner.go:758 +0x5a8 fp=0x1400056da90 sp=0x1400056d900 pc=0x1024f1728
github.com/ollama/ollama/runner/llamarunner.(*Server).embeddings-fm({0x103309848?, 0x140005424b0?}, 0x1400056db18?)
    <autogenerated>:1 +0x40 fp=0x1400056dac0 sp=0x1400056da90 pc=0x1024f3570
net/http.HandlerFunc.ServeHTTP(0x14000796300?, {0x103309848?, 0x140005424b0?}, 0x1400056db00?)
    net/http/server.go:2322 +0x38 fp=0x1400056daf0 sp=0x1400056dac0 pc=0x1023ac7e8
net/http.(*ServeMux).ServeHTTP(0x10?, {0x103309848, 0x140005424b0}, 0x1400037aa00)
    net/http/server.go:2861 +0x190 fp=0x1400056db40 sp=0x1400056daf0 pc=0x1023ae280
net/http.serverHandler.ServeHTTP({0x1033062b0?}, {0x103309848?, 0x140005424b0?}, 0x1?)
    net/http/server.go:3340 +0xb0 fp=0x1400056db70 sp=0x1400056db40 pc=0x1023c8a70
net/http.(*conn).serve(0x14000568090, {0x10330bc98, 0x1400027cea0})
    net/http/server.go:2109 +0x528 fp=0x1400056dfa0 sp=0x1400056db70 pc=0x1023aabd8
net/http.(*Server).Serve.gowrap3()
    net/http/server.go:3493 +0x2c fp=0x1400056dfd0 sp=0x1400056dfa0 pc=0x1023aff0c
runtime.goexit({})
    runtime/asm_arm64.s:1268 +0x4 fp=0x1400056dfd0 sp=0x1400056dfd0 pc=0x102111d04
created by net/http.(*Server).Serve in goroutine 1
    net/http/server.go:3493 +0x384

goroutine 52 gp=0x14000102fc0 m=nil [IO wait]:
runtime.gopark(0xffffffffffffffff?, 0xffffffffffffffff?, 0x23?, 0x0?, 0x10212d600?)
    runtime/proc.go:460 +0xc0 fp=0x1400050dd80 sp=0x1400050dd60 pc=0x102109a50
runtime.netpollblock(0x0?, 0x0?, 0x0?)
    runtime/netpoll.go:575 +0x150 fp=0x1400050ddc0 sp=0x1400050dd80 pc=0x1020cf620
internal/poll.runtime_pollWait(0x12f44b800, 0x72)
    runtime/netpoll.go:351 +0xa0 fp=0x1400050ddf0 sp=0x1400050ddc0 pc=0x102108c80
internal/poll.(*pollDesc).wait(0x1400047c180?, 0x1400012a121?, 0x0)
    internal/poll/fd_poll_runtime.go:84 +0x28 fp=0x1400050de20 sp=0x1400050ddf0 pc=0x102187568
internal/poll.(*pollDesc).waitRead(...)
    internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0x1400047c180, {0x1400012a121, 0x1, 0x1})
    internal/poll/fd_unix.go:165 +0x1e0 fp=0x1400050dec0 sp=0x1400050de20 pc=0x102188780
net.(*netFD).Read(0x1400047c180, {0x1400012a121?, 0x103b39450?, 0x1400012a1d4?})
    net/fd_posix.go:68 +0x28 fp=0x1400050df10 sp=0x1400050dec0 pc=0x1021ebff8
net.(*conn).Read(0x14000126030, {0x1400012a121?, 0x0?, 0x0?})
    net/net.go:196 +0x34 fp=0x1400050df60 sp=0x1400050df10 pc=0x1021f85d4
net/http.(*connReader).backgroundRead(0x1400012a100)
    net/http/server.go:702 +0x38 fp=0x1400050dfb0 sp=0x1400050df60 pc=0x1023a5c48
net/http.(*connReader).startBackgroundRead.gowrap2()
    net/http/server.go:698 +0x28 fp=0x1400050dfd0 sp=0x1400050dfb0 pc=0x1023a5b38
runtime.goexit({})
    runtime/asm_arm64.s:1268 +0x4 fp=0x1400050dfd0 sp=0x1400050dfd0 pc=0x102111d04
created by net/http.(*connReader).startBackgroundRead in goroutine 39
    net/http/server.go:698 +0xb8

r0 0x104188000
r1 0x10418bcc0
r2 0x0
r3 0x10418f020
r4 0xbeec7f000
r5 0x0
r6 0xffffffffbfc007ff
r7 0xfffff0003ffff800
r8 0xbeec7f000
r9 0x0
r10 0x700300400b00
r11 0xbfb99984c02334b7
r12 0x3f140930bf8574a0
r13 0x238bd17ec34be941
r14 0x1041d4e08
r15 0xbeec7c000
r16 0x2821b5e88
r17 0xffffffffb00007ff
r18 0x0
r19 0xc00
r20 0xc0040b122c7
r21 0x0
r22 0x1044ef310
r23 0x0
r24 0x300
r25 0x300
r26 0x200
r27 0x1044ef310
r28 0x0
r29 0x1725cecd0
lr 0x181f6ca78
sp 0x1725cec60
pc 0x181ddab1c
fault 0x181ddab1c

[GIN] 2025/11/12 - 14:26:35 | 500 | 60.308792ms | 192.168.1.3 | POST "/api/embed"
time=2025-11-12T14:26:35.810-03:00 level=DEBUG source=sched.go:385 msg="context for request finished" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=11885 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192
time=2025-11-12T14:26:35.810-03:00 level=DEBUG source=sched.go:290 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=11885 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192 duration=5m0s
time=2025-11-12T14:26:35.810-03:00 level=DEBUG source=sched.go:308 msg="after processing request finished event" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=11885 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192 refCount=0
time=2025-11-12T14:26:35.819-03:00 level=DEBUG source=sched.go:609 msg="evaluating already loaded" model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6
time=2025-11-12T14:26:35.819-03:00 level=DEBUG source=server.go:1241 msg="server unhealthy" error="llama runner process no longer running: 2 "
time=2025-11-12T14:26:35.819-03:00 level=DEBUG source=sched.go:609 msg="evaluating already loaded" model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6
time=2025-11-12T14:26:35.819-03:00 level=DEBUG source=server.go:1241 msg="server unhealthy" error="llama runner process no longer running: 2 "
time=2025-11-12T14:26:35.819-03:00 level=DEBUG source=sched.go:161 msg=reloading runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=11885 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192
time=2025-11-12T14:26:35.819-03:00 level=DEBUG source=sched.go:236 msg="resetting model to expire immediately to make room" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=11885 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192 refCount=0
time=2025-11-12T14:26:35.819-03:00 level=DEBUG source=sched.go:247 msg="waiting for pending requests to complete and unload to occur" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=11885 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192
time=2025-11-12T14:26:35.819-03:00 level=DEBUG source=sched.go:311 msg="runner expired event received" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=11885 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192
time=2025-11-12T14:26:35.819-03:00 level=DEBUG source=sched.go:326 msg="got lock to unload expired event" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=11885 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192
time=2025-11-12T14:26:35.819-03:00 level=DEBUG source=sched.go:349 msg="starting background wait for VRAM recovery" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=11885 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192
time=2025-11-12T14:26:35.819-03:00 level=DEBUG source=sched.go:657 msg="no need to wait for VRAM recovery" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=11885 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192
time=2025-11-12T14:26:35.820-03:00 level=DEBUG source=server.go:1699 msg="stopping llama server" pid=11885
time=2025-11-12T14:26:35.820-03:00 level=DEBUG source=sched.go:358 msg="runner terminated and removed from list, blocking for VRAM recovery" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=11885 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6
time=2025-11-12T14:26:35.820-03:00 level=DEBUG source=sched.go:361 msg="sending an unloaded event" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=11885 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6
time=2025-11-12T14:26:35.820-03:00 level=DEBUG source=sched.go:253 msg="unload completed" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=11885 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6
time=2025-11-12T14:26:35.820-03:00 level=DEBUG source=runner.go:40 msg="overall device VRAM discovery took" duration=709ns
time=2025-11-12T14:26:35.822-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32
time=2025-11-12T14:26:35.823-03:00 level=DEBUG source=sched.go:211 msg="loading first model" model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6
llama_model_load_from_file_impl: using device Metal (Apple M2 Pro) (unknown id) - 12123 MiB free
llama_model_loader: loaded meta data with 24 key-value pairs and 112 tensors from /Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = nomic-bert
llama_model_loader: - kv 1: general.name str = nomic-embed-text-v1.5
llama_model_loader: - kv 2: nomic-bert.block_count u32 = 12
llama_model_loader: - kv 3: nomic-bert.context_length u32 = 2048
llama_model_loader: - kv 4: nomic-bert.embedding_length u32 = 768
llama_model_loader: - kv 5: nomic-bert.feed_forward_length u32 = 3072
llama_model_loader: - kv 6: nomic-bert.attention.head_count u32 = 12
llama_model_loader: - kv 7: nomic-bert.attention.layer_norm_epsilon f32 = 0.000000
llama_model_loader: - kv 8: general.file_type u32 = 1
llama_model_loader: - kv 9: nomic-bert.attention.causal bool = false
llama_model_loader: - kv 10: nomic-bert.pooling_type u32 = 1
llama_model_loader: - kv 11: nomic-bert.rope.freq_base f32 = 1000.000000
llama_model_loader: - kv 12: tokenizer.ggml.token_type_count u32 = 2
llama_model_loader: - kv 13: tokenizer.ggml.bos_token_id u32 = 101
llama_model_loader: - kv 14: tokenizer.ggml.eos_token_id u32 = 102
llama_model_loader: - kv 15: tokenizer.ggml.model str = bert
llama_model_loader: - kv 16: tokenizer.ggml.tokens arr[str,30522] = ["[PAD]", "[unused0]", "[unused1]", "...
llama_model_loader: - kv 17: tokenizer.ggml.scores arr[f32,30522] = [-1000.000000, -1000.000000, -1000.00...
llama_model_loader: - kv 18: tokenizer.ggml.token_type arr[i32,30522] = [3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 19: tokenizer.ggml.unknown_token_id u32 = 100
llama_model_loader: - kv 20: tokenizer.ggml.seperator_token_id u32 = 102
llama_model_loader: - kv 21: tokenizer.ggml.padding_token_id u32 = 0
llama_model_loader: - kv 22: tokenizer.ggml.cls_token_id u32 = 101
llama_model_loader: - kv 23: tokenizer.ggml.mask_token_id u32 = 103
llama_model_loader: - type f32: 51 tensors
llama_model_loader: - type f16: 61 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type = F16
print_info: file size = 260.86 MiB (16.00 BPW)
init_tokenizer: initializing tokenizer for type 3
load: control token: 100 '[UNK]' is not marked as EOG
load: control token: 101 '[CLS]' is not marked as EOG
load: control token: 0 '[PAD]' is not marked as EOG
load: control token: 102 '[SEP]' is not marked as EOG
load: control token: 103 '[MASK]' is not marked as EOG
load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
load: printing all EOG tokens:
load: - 102 ('[SEP]')
load: special tokens cache size = 5
load: token to piece cache size = 0.2032 MB
print_info: arch = nomic-bert
print_info: vocab_only = 1
print_info: model type = ?B
print_info: model params = 136.73 M
print_info: general.name = nomic-embed-text-v1.5
print_info: vocab type = WPM
print_info: n_vocab = 30522
print_info: n_merges = 0
print_info: BOS token = 101 '[CLS]'
print_info: EOS token = 102 '[SEP]'
print_info: UNK token = 100 '[UNK]'
print_info: SEP token = 102 '[SEP]'
print_info: PAD token = 0 '[PAD]'
print_info: MASK token = 103 '[MASK]'
print_info: LF token = 0 '[PAD]'
print_info: EOG token = 102 '[SEP]'
print_info: max token length = 21
llama_model_load: vocab only - skipping tensors
time=2025-11-12T14:26:35.844-03:00 level=WARN source=server.go:173 msg="requested context size too large for model" num_ctx=8192 n_ctx_train=2048
time=2025-11-12T14:26:35.845-03:00 level=INFO source=server.go:400 msg="starting runner" cmd="/opt/homebrew/Cellar/ollama/0.12.10/bin/ollama runner --model /Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 --port 61468"
time=2025-11-12T14:26:35.845-03:00 level=DEBUG source=server.go:401 msg=subprocess PATH="/opt/homebrew/bin:/opt/homebrew/bin:/opt/homebrew/sbin:/Users/bedas/.atuin/bin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin://Applications/Topaz Gigapixel AI.app/Contents/Resources/bin:/Library/Apple/usr/bin:/usr/local/MacGPG2/bin:/Users/bedas/.composer/vendor/bin" OLLAMA_NUM_GPU=0 OLLAMA_HOST=0.0.0.0 OLLAMA_DEBUG=1 DYLD_LIBRARY_PATH=/opt/homebrew/Cellar/ollama/0.12.10/bin OLLAMA_LIBRARY_PATH=/opt/homebrew/Cellar/ollama/0.12.10/bin
time=2025-11-12T14:26:35.847-03:00 level=INFO source=server.go:470 msg="system memory" total="16.0 GiB" free="4.1 GiB" free_swap="0 B"
time=2025-11-12T14:26:35.847-03:00 level=DEBUG source=memory.go:198 msg=evaluating library=Metal gpu_count=1 available="[11.8 GiB]"
time=2025-11-12T14:26:35.847-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.vision.block_count default=0
time=2025-11-12T14:26:35.847-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default=0
time=2025-11-12T14:26:35.847-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}"
time=2025-11-12T14:26:35.847-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}"
time=2025-11-12T14:26:35.847-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default=0
time=2025-11-12T14:26:35.847-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}"
time=2025-11-12T14:26:35.847-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}"
time=2025-11-12T14:26:35.847-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.key_length default=64
time=2025-11-12T14:26:35.847-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.value_length default=64
time=2025-11-12T14:26:35.847-03:00 level=DEBUG source=ggml.go:611 msg="default cache size estimate" "attention MiB"=6 "attention bytes"=6291456 "recurrent MiB"=0 "recurrent bytes"=0
time=2025-11-12T14:26:35.847-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default=0
time=2025-11-12T14:26:35.847-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}"
time=2025-11-12T14:26:35.847-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}"
time=2025-11-12T14:26:35.847-03:00 level=INFO source=memory.go:37 msg="new model will fit in available VRAM across minimum required GPUs, loading" model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 library=Metal parallel=1 required="809.4 MiB" gpus=1
time=2025-11-12T14:26:35.847-03:00 level=DEBUG source=memory.go:198 msg=evaluating library=Metal gpu_count=1 available="[11.8 GiB]"
time=2025-11-12T14:26:35.847-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.vision.block_count default=0
time=2025-11-12T14:26:35.847-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default=0
time=2025-11-12T14:26:35.847-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}"
time=2025-11-12T14:26:35.847-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}"
time=2025-11-12T14:26:35.847-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default=0
time=2025-11-12T14:26:35.847-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}"
time=2025-11-12T14:26:35.847-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}"
time=2025-11-12T14:26:35.847-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.key_length default=64
time=2025-11-12T14:26:35.847-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.value_length default=64
time=2025-11-12T14:26:35.848-03:00 level=DEBUG source=ggml.go:611 msg="default cache size estimate" "attention MiB"=6 "attention bytes"=6291456 "recurrent MiB"=0 "recurrent bytes"=0
time=2025-11-12T14:26:35.848-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default=0
time=2025-11-12T14:26:35.848-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}"
time=2025-11-12T14:26:35.848-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=nomic-bert.attention.head_count_kv default="&{size:0 values:[]}"
time=2025-11-12T14:26:35.848-03:00 level=INFO source=server.go:522 msg=offload library=Metal layers.requested=-1 layers.model=13 layers.offload=13 layers.split=[13] memory.available="[11.8 GiB]" memory.gpu_overhead="0 B" memory.required.full="809.4 MiB" memory.required.partial="809.4 MiB" memory.required.kv="6.0 MiB" memory.required.allocations="[809.4 MiB]" memory.weights.total="260.9 MiB" memory.weights.repeating="216.1 MiB" memory.weights.nonrepeating="44.7 MiB" memory.graph.full="12.0 MiB" memory.graph.partial="12.0 MiB"
time=2025-11-12T14:26:35.870-03:00 level=INFO source=runner.go:910 msg="starting go runner"
time=2025-11-12T14:26:35.870-03:00 level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/opt/homebrew/Cellar/ollama/0.12.10/bin
ggml_metal_library_init: using embedded metal library
ggml_metal_library_init: loaded in 0.007 sec
ggml_metal_device_init: GPU name: Apple M2 Pro
ggml_metal_device_init: GPU family: MTLGPUFamilyApple8 (1008)
ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_device_init: GPU family: MTLGPUFamilyMetal4 (5002)
ggml_metal_device_init: simdgroup reduction = true
ggml_metal_device_init: simdgroup matrix mul.
= true ggml_metal_device_init: has unified memory = true ggml_metal_device_init: has bfloat = true ggml_metal_device_init: use residency sets = true ggml_metal_device_init: use shared buffers = true ggml_metal_device_init: recommendedMaxWorkingSetSize = 12713.12 MB time=2025-11-12T14:26:35.871-03:00 level=INFO source=ggml.go:104 msg=system Metal.0.EMBED_LIBRARY=1 CPU.0.NEON=1 CPU.0.ARM_FMA=1 CPU.0.FP16_VA=1 CPU.0.DOTPROD=1 CPU.0.LLAMAFILE=1 CPU.0.ACCELERATE=1 compiler=cgo(clang) time=2025-11-12T14:26:35.896-03:00 level=INFO source=runner.go:946 msg="Server listening on 127.0.0.1:61468" time=2025-11-12T14:26:35.902-03:00 level=INFO source=runner.go:845 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:false KvSize:2048 KvCacheType: NumThreads:6 GPULayers:13[ID:0 Layers:13(0..12)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:true}" llama_model_load_from_file_impl: using device Metal (Apple M2 Pro) (unknown id) - 12123 MiB free time=2025-11-12T14:26:35.902-03:00 level=INFO source=server.go:1251 msg="waiting for llama runner to start responding" time=2025-11-12T14:26:35.903-03:00 level=INFO source=server.go:1285 msg="waiting for server to become available" status="llm server loading model" llama_model_loader: loaded meta data with 24 key-value pairs and 112 tensors from /Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 (version GGUF V3 (latest)) llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output. 
llama_model_loader: - kv 0: general.architecture str = nomic-bert
llama_model_loader: - kv 1: general.name str = nomic-embed-text-v1.5
llama_model_loader: - kv 2: nomic-bert.block_count u32 = 12
llama_model_loader: - kv 3: nomic-bert.context_length u32 = 2048
llama_model_loader: - kv 4: nomic-bert.embedding_length u32 = 768
llama_model_loader: - kv 5: nomic-bert.feed_forward_length u32 = 3072
llama_model_loader: - kv 6: nomic-bert.attention.head_count u32 = 12
llama_model_loader: - kv 7: nomic-bert.attention.layer_norm_epsilon f32 = 0.000000
llama_model_loader: - kv 8: general.file_type u32 = 1
llama_model_loader: - kv 9: nomic-bert.attention.causal bool = false
llama_model_loader: - kv 10: nomic-bert.pooling_type u32 = 1
llama_model_loader: - kv 11: nomic-bert.rope.freq_base f32 = 1000.000000
llama_model_loader: - kv 12: tokenizer.ggml.token_type_count u32 = 2
llama_model_loader: - kv 13: tokenizer.ggml.bos_token_id u32 = 101
llama_model_loader: - kv 14: tokenizer.ggml.eos_token_id u32 = 102
llama_model_loader: - kv 15: tokenizer.ggml.model str = bert
llama_model_loader: - kv 16: tokenizer.ggml.tokens arr[str,30522] = ["[PAD]", "[unused0]", "[unused1]", "...
llama_model_loader: - kv 17: tokenizer.ggml.scores arr[f32,30522] = [-1000.000000, -1000.000000, -1000.00...
llama_model_loader: - kv 18: tokenizer.ggml.token_type arr[i32,30522] = [3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 19: tokenizer.ggml.unknown_token_id u32 = 100
llama_model_loader: - kv 20: tokenizer.ggml.seperator_token_id u32 = 102
llama_model_loader: - kv 21: tokenizer.ggml.padding_token_id u32 = 0
llama_model_loader: - kv 22: tokenizer.ggml.cls_token_id u32 = 101
llama_model_loader: - kv 23: tokenizer.ggml.mask_token_id u32 = 103
llama_model_loader: - type f32: 51 tensors
llama_model_loader: - type f16: 61 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type = F16
print_info: file size = 260.86 MiB (16.00 BPW)
init_tokenizer: initializing tokenizer for type 3
load: control token: 100 '[UNK]' is not marked as EOG
load: control token: 101 '[CLS]' is not marked as EOG
load: control token: 0 '[PAD]' is not marked as EOG
load: control token: 102 '[SEP]' is not marked as EOG
load: control token: 103 '[MASK]' is not marked as EOG
load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
load: printing all EOG tokens:
load: - 102 ('[SEP]')
load: special tokens cache size = 5
load: token to piece cache size = 0.2032 MB
print_info: arch = nomic-bert
print_info: vocab_only = 0
print_info: n_ctx_train = 2048
print_info: n_embd = 768
print_info: n_layer = 12
print_info: n_head = 12
print_info: n_head_kv = 12
print_info: n_rot = 64
print_info: n_swa = 0
print_info: is_swa_any = 0
print_info: n_embd_head_k = 64
print_info: n_embd_head_v = 64
print_info: n_gqa = 1
print_info: n_embd_k_gqa = 768
print_info: n_embd_v_gqa = 768
print_info: f_norm_eps = 1.0e-12
print_info: f_norm_rms_eps = 0.0e+00
print_info: f_clamp_kqv = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale = 0.0e+00
print_info: f_attn_scale = 0.0e+00
print_info: n_ff = 3072
print_info: n_expert = 0
print_info: n_expert_used = 0
print_info: causal attn = 0
print_info: pooling type = 1
print_info: rope type = 2
print_info: rope scaling = linear
print_info: freq_base_train = 1000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn = 2048
print_info: rope_finetuned = unknown
print_info: model type = 137M
print_info: model params = 136.73 M
print_info: general.name = nomic-embed-text-v1.5
print_info: vocab type = WPM
print_info: n_vocab = 30522
print_info: n_merges = 0
print_info: BOS token = 101 '[CLS]'
print_info: EOS token = 102 '[SEP]'
print_info: UNK token = 100 '[UNK]'
print_info: SEP token = 102 '[SEP]'
print_info: PAD token = 0 '[PAD]'
print_info: MASK token = 103 '[MASK]'
print_info: LF token = 0 '[PAD]'
print_info: EOG token = 102 '[SEP]'
print_info: max token length = 21
load_tensors: loading model tensors, this can take a while... (mmap = true)
load_tensors: layer 0 assigned to device Metal, is_swa = 0
load_tensors: layer 1 assigned to device Metal, is_swa = 0
load_tensors: layer 2 assigned to device Metal, is_swa = 0
load_tensors: layer 3 assigned to device Metal, is_swa = 0
load_tensors: layer 4 assigned to device Metal, is_swa = 0
load_tensors: layer 5 assigned to device Metal, is_swa = 0
load_tensors: layer 6 assigned to device Metal, is_swa = 0
load_tensors: layer 7 assigned to device Metal, is_swa = 0
load_tensors: layer 8 assigned to device Metal, is_swa = 0
load_tensors: layer 9 assigned to device Metal, is_swa = 0
load_tensors: layer 10 assigned to device Metal, is_swa = 0
load_tensors: layer 11 assigned to device Metal, is_swa = 0
load_tensors: layer 12 assigned to device Metal, is_swa = 0
create_tensor: loading tensor token_embd.weight
create_tensor: loading tensor token_types.weight
create_tensor: loading tensor token_embd_norm.weight
create_tensor: loading tensor token_embd_norm.bias
create_tensor: loading tensor blk.0.attn_qkv.weight
create_tensor: loading tensor blk.0.attn_output.weight
create_tensor: loading tensor blk.0.attn_output_norm.weight
create_tensor: loading tensor blk.0.attn_output_norm.bias
create_tensor: loading tensor blk.0.ffn_up.weight
create_tensor: loading tensor blk.0.ffn_down.weight
create_tensor: loading tensor blk.0.ffn_gate.weight
create_tensor: loading tensor blk.0.layer_output_norm.weight
create_tensor: loading tensor blk.0.layer_output_norm.bias
create_tensor: loading tensor blk.1.attn_qkv.weight
create_tensor: loading tensor blk.1.attn_output.weight
create_tensor: loading tensor blk.1.attn_output_norm.weight
create_tensor: loading tensor blk.1.attn_output_norm.bias
create_tensor: loading tensor blk.1.ffn_up.weight
create_tensor: loading tensor blk.1.ffn_down.weight
create_tensor: loading tensor blk.1.ffn_gate.weight
create_tensor: loading tensor blk.1.layer_output_norm.weight
create_tensor: loading tensor blk.1.layer_output_norm.bias
create_tensor: loading tensor blk.2.attn_qkv.weight
create_tensor: loading tensor blk.2.attn_output.weight
create_tensor: loading tensor blk.2.attn_output_norm.weight
create_tensor: loading tensor blk.2.attn_output_norm.bias
create_tensor: loading tensor blk.2.ffn_up.weight
create_tensor: loading tensor blk.2.ffn_down.weight
create_tensor: loading tensor blk.2.ffn_gate.weight
create_tensor: loading tensor blk.2.layer_output_norm.weight
create_tensor: loading tensor blk.2.layer_output_norm.bias
create_tensor: loading tensor blk.3.attn_qkv.weight
create_tensor: loading tensor blk.3.attn_output.weight
create_tensor: loading tensor blk.3.attn_output_norm.weight
create_tensor: loading tensor blk.3.attn_output_norm.bias
create_tensor: loading tensor blk.3.ffn_up.weight
create_tensor: loading tensor blk.3.ffn_down.weight
create_tensor: loading tensor blk.3.ffn_gate.weight
create_tensor: loading tensor blk.3.layer_output_norm.weight
create_tensor: loading tensor blk.3.layer_output_norm.bias
create_tensor: loading tensor blk.4.attn_qkv.weight
create_tensor: loading tensor blk.4.attn_output.weight
create_tensor: loading tensor blk.4.attn_output_norm.weight
create_tensor: loading tensor blk.4.attn_output_norm.bias
create_tensor: loading tensor blk.4.ffn_up.weight
create_tensor: loading tensor blk.4.ffn_down.weight
create_tensor: loading tensor blk.4.ffn_gate.weight
create_tensor: loading tensor blk.4.layer_output_norm.weight
create_tensor: loading tensor blk.4.layer_output_norm.bias
create_tensor: loading tensor blk.5.attn_qkv.weight
create_tensor: loading tensor blk.5.attn_output.weight
create_tensor: loading tensor blk.5.attn_output_norm.weight
create_tensor: loading tensor blk.5.attn_output_norm.bias
create_tensor: loading tensor blk.5.ffn_up.weight
create_tensor: loading tensor blk.5.ffn_down.weight
create_tensor: loading tensor blk.5.ffn_gate.weight
create_tensor: loading tensor blk.5.layer_output_norm.weight
create_tensor: loading tensor blk.5.layer_output_norm.bias
create_tensor: loading tensor blk.6.attn_qkv.weight
create_tensor: loading tensor blk.6.attn_output.weight
create_tensor: loading tensor blk.6.attn_output_norm.weight
create_tensor: loading tensor blk.6.attn_output_norm.bias
create_tensor: loading tensor blk.6.ffn_up.weight
create_tensor: loading tensor blk.6.ffn_down.weight
create_tensor: loading tensor blk.6.ffn_gate.weight
create_tensor: loading tensor blk.6.layer_output_norm.weight
create_tensor: loading tensor blk.6.layer_output_norm.bias
create_tensor: loading tensor blk.7.attn_qkv.weight
create_tensor: loading tensor blk.7.attn_output.weight
create_tensor: loading tensor blk.7.attn_output_norm.weight
create_tensor: loading tensor blk.7.attn_output_norm.bias
create_tensor: loading tensor blk.7.ffn_up.weight
create_tensor: loading tensor blk.7.ffn_down.weight
create_tensor: loading tensor blk.7.ffn_gate.weight
create_tensor: loading tensor blk.7.layer_output_norm.weight
create_tensor: loading tensor blk.7.layer_output_norm.bias
create_tensor: loading tensor blk.8.attn_qkv.weight
create_tensor: loading tensor blk.8.attn_output.weight
create_tensor: loading tensor blk.8.attn_output_norm.weight
create_tensor: loading tensor blk.8.attn_output_norm.bias
create_tensor: loading tensor blk.8.ffn_up.weight
create_tensor: loading tensor blk.8.ffn_down.weight
create_tensor: loading tensor blk.8.ffn_gate.weight
create_tensor: loading tensor blk.8.layer_output_norm.weight
create_tensor: loading tensor blk.8.layer_output_norm.bias
create_tensor: loading tensor blk.9.attn_qkv.weight
create_tensor: loading tensor blk.9.attn_output.weight
create_tensor: loading tensor blk.9.attn_output_norm.weight
create_tensor: loading tensor blk.9.attn_output_norm.bias
create_tensor: loading tensor blk.9.ffn_up.weight
create_tensor: loading tensor blk.9.ffn_down.weight
create_tensor: loading tensor blk.9.ffn_gate.weight
create_tensor: loading tensor blk.9.layer_output_norm.weight
create_tensor: loading tensor blk.9.layer_output_norm.bias
create_tensor: loading tensor blk.10.attn_qkv.weight
create_tensor: loading tensor blk.10.attn_output.weight
create_tensor: loading tensor blk.10.attn_output_norm.weight
create_tensor: loading tensor blk.10.attn_output_norm.bias
create_tensor: loading tensor blk.10.ffn_up.weight
create_tensor: loading tensor blk.10.ffn_down.weight
create_tensor: loading tensor blk.10.ffn_gate.weight
create_tensor: loading tensor blk.10.layer_output_norm.weight
create_tensor: loading tensor blk.10.layer_output_norm.bias
create_tensor: loading tensor blk.11.attn_qkv.weight
create_tensor: loading tensor blk.11.attn_output.weight
create_tensor: loading tensor blk.11.attn_output_norm.weight
create_tensor: loading tensor blk.11.attn_output_norm.bias
create_tensor: loading tensor blk.11.ffn_up.weight
create_tensor: loading tensor blk.11.ffn_down.weight
create_tensor: loading tensor blk.11.ffn_gate.weight
create_tensor: loading tensor blk.11.layer_output_norm.weight
create_tensor: loading tensor blk.11.layer_output_norm.bias
load_tensors: offloading 12 repeating layers to GPU
load_tensors: offloading output layer to GPU
load_tensors: offloaded 13/13 layers to GPU
load_tensors: CPU_Mapped model buffer size = 44.72 MiB
load_tensors: Metal_Mapped model buffer size = 216.14 MiB
llama_init_from_model: model default pooling_type is [1], but [-1] was specified
llama_context: constructing llama_context
llama_context: n_seq_max = 1
llama_context: n_ctx = 2048
llama_context: n_ctx_per_seq = 2048
llama_context: n_batch = 512
llama_context: n_ubatch = 512
llama_context: causal_attn = 0
llama_context: flash_attn = disabled
llama_context: kv_unified = false
llama_context: freq_base = 1000.0
llama_context: freq_scale = 1
ggml_metal_init: allocating
ggml_metal_init: picking default device: Apple M2 Pro
ggml_metal_init: use bfloat = true
ggml_metal_init: use fusion = true
ggml_metal_init: use concurrency = true
ggml_metal_init: use graph optimize = true
set_abort_callback: call
llama_context: CPU output buffer size = 0.12 MiB
llama_context: enumerating backends
llama_context: backend_ptrs.size() = 3
llama_context: max_nodes = 1024
llama_context: worst-case: n_tokens = 512, n_seqs = 1, n_outputs = 1
graph_reserve: reserving a graph for ubatch with n_tokens = 512, n_seqs = 1, n_outputs = 512
graph_reserve: reserving a graph for ubatch with n_tokens = 1, n_seqs = 1, n_outputs = 1
graph_reserve: reserving a graph for ubatch with n_tokens = 512, n_seqs = 1, n_outputs = 512
llama_context: Metal compute buffer size = 23.51 MiB
llama_context: CPU compute buffer size = 4.01 MiB
llama_context: graph nodes = 371
llama_context: graph splits = 2
time=2025-11-12T14:26:36.154-03:00 level=INFO source=server.go:1289 msg="llama runner started in 0.31 seconds"
time=2025-11-12T14:26:36.154-03:00 level=INFO source=sched.go:500 msg="loaded runners" count=1
time=2025-11-12T14:26:36.154-03:00 level=INFO source=server.go:1251 msg="waiting for llama runner to start responding"
time=2025-11-12T14:26:36.154-03:00 level=INFO source=server.go:1289 msg="llama runner started in 0.31 seconds"
time=2025-11-12T14:26:36.154-03:00 level=DEBUG source=sched.go:512 msg="finished setting up" runner.name=registry.ollama.ai/library/nomic-embed-text:latest runner.inference="[{ID:0 Library:Metal}]" runner.size="809.4 MiB" runner.vram="809.4 MiB" runner.parallel=1 runner.pid=11891 runner.model=/Users/bedas/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 runner.num_ctx=8192
time=2025-11-12T14:26:36.158-03:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32
time=2025-11-12T14:26:36.161-03:00 level=DEBUG source=cache.go:104 msg="loading cache slot" id=0 cache=0 prompt=901 used=0 remaining=901
decode: cannot decode batches with this context (calling encode() instead)
init: embeddings required but some input tokens were not marked as outputs -> overriding
output_reserve: reallocating output buffer from size 0.12 MiB to 61.11 MiB
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_mul_mm_f16_f32', name = 'kernel_mul_mm_f16_f32_bci=0_bco=0'
ggml_metal_library_compile_pipeline: loaded kernel_mul_mm_f16_f32_bci=0_bco=0 0x1069537e0 | th_max = 1024 | th_width = 32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_rope_neox_f32', name = 'kernel_rope_neox_f32'
ggml_metal_library_compile_pipeline: loaded kernel_rope_neox_f32 0x106954120 | th_max = 1024 | th_width = 32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_cpy_f32_f32', name = 'kernel_cpy_f32_f32'
ggml_metal_library_compile_pipeline: loaded kernel_cpy_f32_f32 0x106954c20 | th_max = 1024 | th_width = 32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_mul_mm_f32_f32', name = 'kernel_mul_mm_f32_f32_bci=0_bco=0'
ggml_metal_library_compile_pipeline: loaded kernel_mul_mm_f32_f32_bci=0_bco=0 0x106955da0 | th_max = 896 | th_width = 32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_soft_max_f32_4', name = 'kernel_soft_max_f32_4'
ggml_metal_library_compile_pipeline: loaded kernel_soft_max_f32_4 0x1069565a0 | th_max = 1024 | th_width = 32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_add_fuse_1', name = 'kernel_add_fuse_1'
ggml_metal_library_compile_pipeline: loaded kernel_add_fuse_1 0x1069568a0 | th_max = 1024 | th_width = 32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_norm_mul_add_f32_4', name = 'kernel_norm_mul_add_f32_4'
ggml_metal_library_compile_pipeline: loaded kernel_norm_mul_add_f32_4 0x106956d60 | th_max = 1024 | th_width = 32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_swiglu_f32', name = 'kernel_swiglu_f32'
ggml_metal_library_compile_pipeline: loaded kernel_swiglu_f32 0x904908000 | th_max = 1024 | th_width = 32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_get_rows_f32', name = 'kernel_get_rows_f32'
ggml_metal_library_compile_pipeline: loaded kernel_get_rows_f32 0x904908300 | th_max = 1024 | th_width = 32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_mul_mv_f32_f32_4', name = 'kernel_mul_mv_f32_f32_4_nsg=4'
ggml_metal_library_compile_pipeline: loaded kernel_mul_mv_f32_f32_4_nsg=4 0x904908600 | th_max = 768 | th_width = 32
decode: cannot decode batches with this context (calling encode() instead)
init: embeddings required but some input tokens were not marked as outputs -> overriding
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_mul_mm_f16_f32', name = 'kernel_mul_mm_f16_f32_bci=0_bco=1'
ggml_metal_library_compile_pipeline: loaded kernel_mul_mm_f16_f32_bci=0_bco=1 0x904908900 | th_max = 896 | th_width = 32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_mul_mm_f32_f32', name = 'kernel_mul_mm_f32_f32_bci=0_bco=1'
ggml_metal_library_compile_pipeline: loaded kernel_mul_mm_f32_f32_bci=0_bco=1 0x904908c00 | th_max = 832 | th_width = 32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_soft_max_f32', name = 'kernel_soft_max_f32'
ggml_metal_library_compile_pipeline: loaded kernel_soft_max_f32 0x904908f00 | th_max = 1024 | th_width = 32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_mul_mm_f32_f32', name = 'kernel_mul_mm_f32_f32_bci=1_bco=1'
ggml_metal_library_compile_pipeline: loaded kernel_mul_mm_f32_f32_bci=1_bco=1 0x904909200 | th_max = 832 | th_width = 32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_mul_mv_f32_f32', name = 'kernel_mul_mv_f32_f32_nsg=4'
ggml_metal_library_compile_pipeline: loaded kernel_mul_mv_f32_f32_nsg=4 0x904909500 | th_max = 1024 | th_width = 32

SIGTRAP: trace trap
PC=0x181ddab1c m=8 sigcode=0
signal arrived during cgo execution

goroutine 50 gp=0x14000003a40 m=8 mp=0x14000600008 [syscall]:
runtime.cgocall(0x105180fbc, 0x14000085b88)
    runtime/cgocall.go:167 +0x44 fp=0x14000085b50 sp=0x14000085b10 pc=0x104686684
github.com/ollama/ollama/llama._Cfunc_llama_decode(0x1069470a0, {0x185, 0x904c95000, 0x0, 0x10694cf20, 0x10694d720, 0x904dee800, 0x1069480a0})
    _cgo_gotypes.go:674 +0x30 fp=0x14000085b80 sp=0x14000085b50 pc=0x1049cd790
github.com/ollama/ollama/llama.(*Context).Decode.func1(...)
    github.com/ollama/ollama/llama/llama.go:160
github.com/ollama/ollama/llama.(*Context).Decode(0x1400039ec08?, 0x0?)
    github.com/ollama/ollama/llama/llama.go:160 +0xcc fp=0x14000085c70 sp=0x14000085b80 pc=0x1049cf94c
github.com/ollama/ollama/runner/llamarunner.(*Server).processBatch(0x14000612140, 0x1400004ea00, 0x14000085f18)
    github.com/ollama/ollama/runner/llamarunner/runner.go:468 +0x1d4 fp=0x14000085ed0 sp=0x14000085c70 pc=0x104a6f774
github.com/ollama/ollama/runner/llamarunner.(*Server).run(0x14000612140, {0x10588bcd0, 0x1400069c870})
    github.com/ollama/ollama/runner/llamarunner/runner.go:361 +0x15c fp=0x14000085fa0 sp=0x14000085ed0 pc=0x104a6f43c
github.com/ollama/ollama/runner/llamarunner.Execute.gowrap1()
    github.com/ollama/ollama/runner/llamarunner/runner.go:926 +0x2c fp=0x14000085fd0 sp=0x14000085fa0 pc=0x104a7324c
runtime.goexit({})
    runtime/asm_arm64.s:1268 +0x4 fp=0x14000085fd0 sp=0x14000085fd0 pc=0x104691d04
created by github.com/ollama/ollama/runner/llamarunner.Execute in goroutine 1
    github.com/ollama/ollama/runner/llamarunner/runner.go:926 +0x418

goroutine 1 gp=0x140000021c0 m=nil [IO wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
    runtime/proc.go:460 +0xc0 fp=0x14000457720 sp=0x14000457700 pc=0x104689a50
runtime.netpollblock(0x140001177b8?, 0x470ba6c?, 0x1?)
    runtime/netpoll.go:575 +0x150 fp=0x14000457760 sp=0x14000457720 pc=0x10464f620
internal/poll.runtime_pollWait(0x131a50200, 0x72)
    runtime/netpoll.go:351 +0xa0 fp=0x14000457790 sp=0x14000457760 pc=0x104688c80
internal/poll.(*pollDesc).wait(0x14000254a80?, 0x10470dafc?, 0x0)
    internal/poll/fd_poll_runtime.go:84 +0x28 fp=0x140004577c0 sp=0x14000457790 pc=0x104707568
internal/poll.(*pollDesc).waitRead(...)
    internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Accept(0x14000254a80)
    internal/poll/fd_unix.go:613 +0x21c fp=0x14000457870 sp=0x140004577c0 pc=0x10470bb4c
net.(*netFD).accept(0x14000254a80)
    net/fd_unix.go:161 +0x28 fp=0x14000457930 sp=0x14000457870 pc=0x10476d7f8
net.(*TCPListener).accept(0x140004ba080)
    net/tcpsock_posix.go:159 +0x24 fp=0x14000457980 sp=0x14000457930 pc=0x104780e64
net.(*TCPListener).Accept(0x140004ba080)
    net/tcpsock.go:380 +0x2c fp=0x140004579c0 sp=0x14000457980 pc=0x10477ff0c
net/http.(*onceCloseListener).Accept(0x14000618090?)
    <autogenerated>:1 +0x2c fp=0x140004579e0 sp=0x140004579c0 pc=0x104954a1c
net/http.(*Server).Serve(0x14000610100, {0x105889668, 0x140004ba080})
    net/http/server.go:3463 +0x24c fp=0x14000457b10 sp=0x140004579e0 pc=0x10492fbac
github.com/ollama/ollama/runner/llamarunner.Execute({0x14000196200, 0x4, 0x4})
    github.com/ollama/ollama/runner/llamarunner/runner.go:947 +0x754 fp=0x14000457ce0 sp=0x14000457b10 pc=0x104a73064
github.com/ollama/ollama/runner.Execute({0x140001961f0?, 0x0?, 0x0?})
    github.com/ollama/ollama/runner/runner.go:22 +0x128 fp=0x14000457d10 sp=0x14000457ce0 pc=0x104aea528
github.com/ollama/ollama/cmd.NewCLI.func2(0x14000277400?, {0x1053d7af0?, 0x4?, 0x1053d7af4?})
    github.com/ollama/ollama/cmd/cmd.go:1841 +0x50 fp=0x14000457d40 sp=0x14000457d10 pc=0x105131db0
github.com/spf13/cobra.(*Command).execute(0x140006bb508, {0x140000ae980, 0x4, 0x4})
    github.com/spf13/cobra@v1.7.0/command.go:940 +0x5e0 fp=0x14000457e60 sp=0x14000457d40 pc=0x1047d7f60
github.com/spf13/cobra.(*Command).ExecuteC(0x14000143208)
    github.com/spf13/cobra@v1.7.0/command.go:1068 +0x2ec fp=0x14000457f20 sp=0x14000457e60 pc=0x1047d863c
github.com/spf13/cobra.(*Command).Execute(...)
    github.com/spf13/cobra@v1.7.0/command.go:992
github.com/spf13/cobra.(*Command).ExecuteContext(...)
    github.com/spf13/cobra@v1.7.0/command.go:985
main.main()
    github.com/ollama/ollama/main.go:12 +0x54 fp=0x14000457f40 sp=0x14000457f20 pc=0x1051328d4
runtime.main()
    runtime/proc.go:285 +0x278 fp=0x14000457fd0 sp=0x14000457f40 pc=0x1046560d8
runtime.goexit({})
    runtime/asm_arm64.s:1268 +0x4 fp=0x14000457fd0 sp=0x14000457fd0 pc=0x104691d04

goroutine 2 gp=0x14000002c40 m=nil [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
    runtime/proc.go:460 +0xc0 fp=0x1400006cf90 sp=0x1400006cf70 pc=0x104689a50
runtime.goparkunlock(...)
    runtime/proc.go:466
runtime.forcegchelper()
    runtime/proc.go:373 +0xb4 fp=0x1400006cfd0 sp=0x1400006cf90 pc=0x104656424
runtime.goexit({})
    runtime/asm_arm64.s:1268 +0x4 fp=0x1400006cfd0 sp=0x1400006cfd0 pc=0x104691d04
created by runtime.init.7 in goroutine 1
    runtime/proc.go:361 +0x24

goroutine 18 gp=0x14000182380 m=nil [GC sweep wait]:
runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?)
    runtime/proc.go:460 +0xc0 fp=0x14000068760 sp=0x14000068740 pc=0x104689a50
runtime.goparkunlock(...)
    runtime/proc.go:466
runtime.bgsweep(0x14000190000)
    runtime/mgcsweep.go:323 +0x104 fp=0x140000687b0 sp=0x14000068760 pc=0x104640ee4
runtime.gcenable.gowrap1()
    runtime/mgc.go:212 +0x28 fp=0x140000687d0 sp=0x140000687b0 pc=0x104634b38
runtime.goexit({})
    runtime/asm_arm64.s:1268 +0x4 fp=0x140000687d0 sp=0x140000687d0 pc=0x104691d04
created by runtime.gcenable in goroutine 1
    runtime/mgc.go:212 +0x6c

goroutine 19 gp=0x14000182540 m=nil [GC scavenge wait]:
runtime.gopark(0x10000?, 0x105596590?, 0x0?, 0x0?, 0x0?)
    runtime/proc.go:460 +0xc0 fp=0x14000068f60 sp=0x14000068f40 pc=0x104689a50
runtime.goparkunlock(...)
    runtime/proc.go:466
runtime.(*scavengerState).park(0x10617d860)
    runtime/mgcscavenge.go:425 +0x5c fp=0x14000068f90 sp=0x14000068f60 pc=0x10463e9fc
runtime.bgscavenge(0x14000190000)
    runtime/mgcscavenge.go:658 +0xac fp=0x14000068fb0 sp=0x14000068f90 pc=0x10463ef9c
runtime.gcenable.gowrap2()
    runtime/mgc.go:213 +0x28 fp=0x14000068fd0 sp=0x14000068fb0 pc=0x104634ad8
runtime.goexit({})
    runtime/asm_arm64.s:1268 +0x4 fp=0x14000068fd0 sp=0x14000068fd0 pc=0x104691d04
created by runtime.gcenable in goroutine 1
    runtime/mgc.go:213 +0xac

goroutine 20 gp=0x14000182a80 m=nil [finalizer wait]:
runtime.gopark(0x0?, 0x105875a18?, 0xb0?, 0x62?, 0x1000000010?)
    runtime/proc.go:460 +0xc0 fp=0x1400006c580 sp=0x1400006c560 pc=0x104689a50
runtime.runFinalizers()
    runtime/mfinal.go:210 +0x104 fp=0x1400006c7d0 sp=0x1400006c580 pc=0x104633b24
runtime.goexit({})
    runtime/asm_arm64.s:1268 +0x4 fp=0x1400006c7d0 sp=0x1400006c7d0 pc=0x104691d04
created by runtime.createfing in goroutine 1
    runtime/mfinal.go:172 +0x78

goroutine 21 gp=0x14000183500 m=nil [cleanup wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
    runtime/proc.go:460 +0xc0 fp=0x14000069740 sp=0x14000069720 pc=0x104689a50
runtime.goparkunlock(...)
    runtime/proc.go:466
runtime.(*cleanupQueue).dequeue(0x10617e740)
    runtime/mcleanup.go:439 +0x110 fp=0x14000069780 sp=0x14000069740 pc=0x104631010
runtime.runCleanups()
    runtime/mcleanup.go:635 +0x40 fp=0x140000697d0 sp=0x14000069780 pc=0x104631820
runtime.goexit({})
    runtime/asm_arm64.s:1268 +0x4 fp=0x140000697d0 sp=0x140000697d0 pc=0x104691d04
created by runtime.(*cleanupQueue).createGs in goroutine 1
    runtime/mcleanup.go:589 +0x108

goroutine 22 gp=0x14000183880 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
    runtime/proc.go:460 +0xc0 fp=0x14000069f10 sp=0x14000069ef0 pc=0x104689a50
runtime.gcBgMarkWorker(0x140001996c0)
    runtime/mgc.go:1463 +0xe0 fp=0x14000069fb0 sp=0x14000069f10 pc=0x1046371b0
runtime.gcBgMarkStartWorkers.gowrap1()
    runtime/mgc.go:1373 +0x28 fp=0x14000069fd0 sp=0x14000069fb0 pc=0x104637098
runtime.goexit({})
    runtime/asm_arm64.s:1268 +0x4 fp=0x14000069fd0 sp=0x14000069fd0 pc=0x104691d04
created by runtime.gcBgMarkStartWorkers in goroutine 1
    runtime/mgc.go:1373 +0x140

goroutine 3 gp=0x14000003500 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
    runtime/proc.go:460 +0xc0 fp=0x1400006d710 sp=0x1400006d6f0 pc=0x104689a50
runtime.gcBgMarkWorker(0x140001996c0)
    runtime/mgc.go:1463 +0xe0 fp=0x1400006d7b0 sp=0x1400006d710 pc=0x1046371b0
runtime.gcBgMarkStartWorkers.gowrap1()
    runtime/mgc.go:1373 +0x28 fp=0x1400006d7d0 sp=0x1400006d7b0 pc=0x104637098
runtime.goexit({})
    runtime/asm_arm64.s:1268 +0x4 fp=0x1400006d7d0 sp=0x1400006d7d0 pc=0x104691d04
created by runtime.gcBgMarkStartWorkers in goroutine 1
    runtime/mgc.go:1373 +0x140

goroutine 34 gp=0x14000584000 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
    runtime/proc.go:460 +0xc0 fp=0x1400058a710 sp=0x1400058a6f0 pc=0x104689a50
runtime.gcBgMarkWorker(0x140001996c0)
    runtime/mgc.go:1463 +0xe0 fp=0x1400058a7b0 sp=0x1400058a710 pc=0x1046371b0
runtime.gcBgMarkStartWorkers.gowrap1()
    runtime/mgc.go:1373 +0x28 fp=0x1400058a7d0 sp=0x1400058a7b0 pc=0x104637098
runtime.goexit({})
    runtime/asm_arm64.s:1268 +0x4 fp=0x1400058a7d0 sp=0x1400058a7d0 pc=0x104691d04
created by runtime.gcBgMarkStartWorkers in goroutine 1
    runtime/mgc.go:1373 +0x140

goroutine 23 gp=0x14000183a40 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
    runtime/proc.go:460 +0xc0 fp=0x1400006a710 sp=0x1400006a6f0 pc=0x104689a50
runtime.gcBgMarkWorker(0x140001996c0)
    runtime/mgc.go:1463 +0xe0 fp=0x1400006a7b0 sp=0x1400006a710 pc=0x1046371b0
runtime.gcBgMarkStartWorkers.gowrap1()
    runtime/mgc.go:1373 +0x28 fp=0x1400006a7d0 sp=0x1400006a7b0 pc=0x104637098
runtime.goexit({})
    runtime/asm_arm64.s:1268 +0x4 fp=0x1400006a7d0 sp=0x1400006a7d0 pc=0x104691d04
created by runtime.gcBgMarkStartWorkers in goroutine 1
    runtime/mgc.go:1373 +0x140

goroutine 4 gp=0x140000036c0 m=nil [GC worker (idle)]:
runtime.gopark(0x588e5a75f37f4?, 0x0?, 0x0?, 0x0?, 0x0?)
    runtime/proc.go:460 +0xc0 fp=0x1400006df10 sp=0x1400006def0 pc=0x104689a50
runtime.gcBgMarkWorker(0x140001996c0)
    runtime/mgc.go:1463 +0xe0 fp=0x1400006dfb0 sp=0x1400006df10 pc=0x1046371b0
runtime.gcBgMarkStartWorkers.gowrap1()
    runtime/mgc.go:1373 +0x28 fp=0x1400006dfd0 sp=0x1400006dfb0 pc=0x104637098
runtime.goexit({})
    runtime/asm_arm64.s:1268 +0x4 fp=0x1400006dfd0 sp=0x1400006dfd0 pc=0x104691d04
created by runtime.gcBgMarkStartWorkers in goroutine 1
    runtime/mgc.go:1373 +0x140

goroutine 35 gp=0x140005841c0 m=nil [GC worker (idle)]:
runtime.gopark(0x588e5a75edfbd?, 0x3?, 0xb7?, 0x20?, 0x0?)
    runtime/proc.go:460 +0xc0 fp=0x14000704f10 sp=0x14000704ef0 pc=0x104689a50
runtime.gcBgMarkWorker(0x140001996c0)
    runtime/mgc.go:1463 +0xe0 fp=0x14000704fb0 sp=0x14000704f10 pc=0x1046371b0
runtime.gcBgMarkStartWorkers.gowrap1()
    runtime/mgc.go:1373 +0x28 fp=0x14000704fd0 sp=0x14000704fb0 pc=0x104637098
runtime.goexit({})
    runtime/asm_arm64.s:1268 +0x4 fp=0x14000704fd0 sp=0x14000704fd0 pc=0x104691d04
created by runtime.gcBgMarkStartWorkers in goroutine 1
    runtime/mgc.go:1373 +0x140

goroutine 24 gp=0x14000183c00 m=nil [GC worker (idle)]:
runtime.gopark(0x588e5a75f1eba?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:460 +0xc0 fp=0x14000703f10 sp=0x14000703ef0 pc=0x104689a50 runtime.gcBgMarkWorker(0x140001996c0) runtime/mgc.go:1463 +0xe0 fp=0x14000703fb0 sp=0x14000703f10 pc=0x1046371b0 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1373 +0x28 fp=0x14000703fd0 sp=0x14000703fb0 pc=0x104637098 runtime.goexit({}) runtime/asm_arm64.s:1268 +0x4 fp=0x14000703fd0 sp=0x14000703fd0 pc=0x104691d04 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1373 +0x140 goroutine 5 gp=0x14000003880 m=nil [GC worker (idle)]: runtime.gopark(0x1061af920?, 0x1?, 0x81?, 0x15?, 0x0?) runtime/proc.go:460 +0xc0 fp=0x1400006e710 sp=0x1400006e6f0 pc=0x104689a50 runtime.gcBgMarkWorker(0x140001996c0) runtime/mgc.go:1463 +0xe0 fp=0x1400006e7b0 sp=0x1400006e710 pc=0x1046371b0 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1373 +0x28 fp=0x1400006e7d0 sp=0x1400006e7b0 pc=0x104637098 runtime.goexit({}) runtime/asm_arm64.s:1268 +0x4 fp=0x1400006e7d0 sp=0x1400006e7d0 pc=0x104691d04 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1373 +0x140 goroutine 36 gp=0x14000584380 m=nil [GC worker (idle)]: runtime.gopark(0x1061af920?, 0x1?, 0x6b?, 0xfd?, 0x0?) runtime/proc.go:460 +0xc0 fp=0x1400058b710 sp=0x1400058b6f0 pc=0x104689a50 runtime.gcBgMarkWorker(0x140001996c0) runtime/mgc.go:1463 +0xe0 fp=0x1400058b7b0 sp=0x1400058b710 pc=0x1046371b0 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1373 +0x28 fp=0x1400058b7d0 sp=0x1400058b7b0 pc=0x104637098 runtime.goexit({}) runtime/asm_arm64.s:1268 +0x4 fp=0x1400058b7d0 sp=0x1400058b7d0 pc=0x104691d04 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1373 +0x140 goroutine 25 gp=0x14000183dc0 m=nil [GC worker (idle)]: runtime.gopark(0x588e5a75d8efd?, 0x0?, 0x0?, 0x0?, 0x0?) 
runtime/proc.go:460 +0xc0 fp=0x14000706f10 sp=0x14000706ef0 pc=0x104689a50 runtime.gcBgMarkWorker(0x140001996c0) runtime/mgc.go:1463 +0xe0 fp=0x14000706fb0 sp=0x14000706f10 pc=0x1046371b0 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1373 +0x28 fp=0x14000706fd0 sp=0x14000706fb0 pc=0x104637098 runtime.goexit({}) runtime/asm_arm64.s:1268 +0x4 fp=0x14000706fd0 sp=0x14000706fd0 pc=0x104691d04 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1373 +0x140 goroutine 51 gp=0x14000003c00 m=nil [chan receive]: runtime.gopark(0x14000049868?, 0x10466a594?, 0x98?, 0x98?, 0x10468b9fc?) runtime/proc.go:460 +0xc0 fp=0x14000621850 sp=0x14000621830 pc=0x104689a50 runtime.chanrecv(0x140003da070, 0x14000049a40, 0x1) runtime/chan.go:667 +0x428 fp=0x140006218d0 sp=0x14000621850 pc=0x104624318 runtime.chanrecv1(0x140001ffdd0?, 0x140001f4008?) runtime/chan.go:509 +0x14 fp=0x14000621900 sp=0x140006218d0 pc=0x104623eb4 github.com/ollama/ollama/runner/llamarunner.(*Server).embeddings(0x14000612140, {0x105889848, 0x140000a80f0}, 0x14000020140) github.com/ollama/ollama/runner/llamarunner/runner.go:758 +0x5a8 fp=0x14000621a90 sp=0x14000621900 pc=0x104a71728 github.com/ollama/ollama/runner/llamarunner.(*Server).embeddings-fm({0x105889848?, 0x140000a80f0?}, 0x14000049b18?) <autogenerated>:1 +0x40 fp=0x14000621ac0 sp=0x14000621a90 pc=0x104a73570 net/http.HandlerFunc.ServeHTTP(0x140001e4000?, {0x105889848?, 0x140000a80f0?}, 0x14000049b00?) net/http/server.go:2322 +0x38 fp=0x14000621af0 sp=0x14000621ac0 pc=0x10492c7e8 net/http.(*ServeMux).ServeHTTP(0x10?, {0x105889848, 0x140000a80f0}, 0x14000020140) net/http/server.go:2861 +0x190 fp=0x14000621b40 sp=0x14000621af0 pc=0x10492e280 net/http.serverHandler.ServeHTTP({0x1058862b0?}, {0x105889848?, 0x140000a80f0?}, 0x1?) 
net/http/server.go:3340 +0xb0 fp=0x14000621b70 sp=0x14000621b40 pc=0x104948a70 net/http.(*conn).serve(0x14000618090, {0x10588bc98, 0x14000298360}) net/http/server.go:2109 +0x528 fp=0x14000621fa0 sp=0x14000621b70 pc=0x10492abd8 net/http.(*Server).Serve.gowrap3() net/http/server.go:3493 +0x2c fp=0x14000621fd0 sp=0x14000621fa0 pc=0x10492ff0c runtime.goexit({}) runtime/asm_arm64.s:1268 +0x4 fp=0x14000621fd0 sp=0x14000621fd0 pc=0x104691d04 created by net/http.(*Server).Serve in goroutine 1 net/http/server.go:3493 +0x384 goroutine 27 gp=0x140001836c0 m=nil [IO wait]: runtime.gopark(0xffffffffffffffff?, 0xffffffffffffffff?, 0x23?, 0x0?, 0x1046ad600?) runtime/proc.go:460 +0xc0 fp=0x1400016dd80 sp=0x1400016dd60 pc=0x104689a50 runtime.netpollblock(0x0?, 0x0?, 0x0?) runtime/netpoll.go:575 +0x150 fp=0x1400016ddc0 sp=0x1400016dd80 pc=0x10464f620 internal/poll.runtime_pollWait(0x131a50000, 0x72) runtime/netpoll.go:351 +0xa0 fp=0x1400016ddf0 sp=0x1400016ddc0 pc=0x104688c80 internal/poll.(*pollDesc).wait(0x14000254b00?, 0x140004ba0e1?, 0x0) internal/poll/fd_poll_runtime.go:84 +0x28 fp=0x1400016de20 sp=0x1400016ddf0 pc=0x104707568 internal/poll.(*pollDesc).waitRead(...) 
internal/poll/fd_poll_runtime.go:89 internal/poll.(*FD).Read(0x14000254b00, {0x140004ba0e1, 0x1, 0x1}) internal/poll/fd_unix.go:165 +0x1e0 fp=0x1400016dec0 sp=0x1400016de20 pc=0x104708780 net.(*netFD).Read(0x14000254b00, {0x140004ba0e1?, 0x0?, 0x0?}) net/fd_posix.go:68 +0x28 fp=0x1400016df10 sp=0x1400016dec0 pc=0x10476bff8 net.(*conn).Read(0x14000070038, {0x140004ba0e1?, 0x0?, 0x0?}) net/net.go:196 +0x34 fp=0x1400016df60 sp=0x1400016df10 pc=0x1047785d4 net/http.(*connReader).backgroundRead(0x140004ba0c0) net/http/server.go:702 +0x38 fp=0x1400016dfb0 sp=0x1400016df60 pc=0x104925c48 net/http.(*connReader).startBackgroundRead.gowrap2() net/http/server.go:698 +0x28 fp=0x1400016dfd0 sp=0x1400016dfb0 pc=0x104925b38 runtime.goexit({}) runtime/asm_arm64.s:1268 +0x4 fp=0x1400016dfd0 sp=0x1400016dfd0 pc=0x104691d04 created by net/http.(*connReader).startBackgroundRead in goroutine 51 net/http/server.go:698 +0xb8 r0 0x10670c000 r1 0x10670fcc0 r2 0x0 r3 0x106713020 r4 0x904c63000 r5 0x0 r6 0xffffffffbfc007ff r7 0xfffff0003ffff800 r8 0x904c63000 r9 0x0 r10 0x300100400b00 r11 0xbfb99984c02334b7 r12 0x3f140930bf8574a0 r13 0xab575efd22575a47 r14 0x106758cb8 r15 0x904c60000 r16 0x2821b5e88 r17 0xffffffffb00007ff r18 0x0 r19 0xc00 r20 0xc00414722c7 r21 0x0 r22 0x1069471e0 r23 0x0 r24 0x300 r25 0x300 r26 0x185 r27 0x1069471e0 r28 0x0 r29 0x16f032cd0 lr 0x181f6ca78 sp 0x16f032c60 pc 0x181ddab1c fault 0x181ddab1c [GIN] 2025/11/12 - 14:26:36 | 500 | 409.52825ms | 192.168.1.3 | POST "/api/embed" ```

@rick-github commented on GitHub (Nov 12, 2025):

`OLLAMA_NUM_GPU` is not an Ollama configuration variable, so in this case the model is still running on the GPU. Try this instead:

curl -s localhost:11434/api/embed -d '{"model":"nomic-embed-text","options":{"num_gpu":0},"input":"'"$(yes | head -1024 | tr \\n ' ')"'"}'
<!-- gh-comment-id:3523109075 -->
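For anyone scripting this test instead of using curl, here is a minimal sketch of the same request. The `/api/embed` endpoint and the `options.num_gpu` field come from the public Ollama REST API; the helper names and defaults are ours, not Ollama's.

```python
import json
import urllib.request

def build_embed_payload(text, model="nomic-embed-text", num_gpu=None):
    """Build the JSON body for POST /api/embed; num_gpu=0 forces CPU-only."""
    payload = {"model": model, "input": text}
    if num_gpu is not None:
        payload["options"] = {"num_gpu": num_gpu}
    return payload

def embed(text, host="http://localhost:11434", **kwargs):
    """Send the request to a running Ollama server (not executed here)."""
    req = urllib.request.Request(
        host + "/api/embed",
        data=json.dumps(build_embed_payload(text, **kwargs)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Equivalent of the curl one-liner above (requires a local Ollama server):
# embed("y " * 1024, num_gpu=0)
```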

@smileBeda commented on GitHub (Nov 12, 2025):

Uhmmm ok - in this case:

curl -s localhost:11434/api/embed -d '{"model":"nomic-embed-text","options":{"num_gpu":0},"input":"'"$(yes | head -1024 | tr \\n ' ')"'"}'
{"model":"nomic-embed-text","embeddings":[[0.021686183,-0.019075416,-0.08084915,0.046185914,-0.013433088,0.0034744788,-0.009352537,-0.03835853,0.052491013,0.023707729,0.01777299,0.01824284,0.058592543,0.014298118,-0.0148073565,-0.04226909,0.071865134,-0.02081744,0.07402428,0.047630727,-0.035703592,-0.061972808,-0.025439903,0.029358461,0.13932094,0.029959764,0.008890659,0.055962462,-0.040253285,0.055738572,0.023032077,0.027151333,0.048234127,0.024731524,0.040766396,0.02059236,-0.011554242,0.003746063,0.04557921,0.002551454,-0.026286239,0.053431354,0.03796014,0.037456088,-0.0127428025,0.020931019,0.0014215623,-0.040728625,0.0071615926,0.019677326,-0.00016294623,0.05451718,-0.04591856,-0.003011821,0.075811304,0.0032525528,0.023912014,-0.021694973,-0.05343476,0.02303965,0.009487864,0.06414517,-0.03941804,0.00847457,0.02839105,-0.057757292,-0.042644575,-0.012965547,0.016992878,-0.0152737545,0.04317819,-0.016121816,0.007725548,0.0017134714,-0.04855355,0.013031644,0.057297707,0.018767038,-0.05227413,-0.021235187,-0.051099427,-0.01794608,0.09555547,0.04080336,0.0033400136,-0.012623584,0.025734171,0.02822867,-0.02935033,0.008740868,-0.020500278,0.0345513,0.039487172,0.038882088,-0.04920274,0.017994575,-0.042839777,-0.030968646,-0.007905659,-0.0026594969,-0.0065685664,-0.02126081,0.026138514,-0.026490662,0.10591035,0.012755773,-0.0068796934,-0.047791924,-0.024277285,0.017529499,-0.03310422,0.090151444,0.049970638,-0.045384873,-0.0056034126,-0.028291563,0.0031464777,-0.036383282,0.0402807,0.019436417,-0.013684183,-0.018630262,-0.07858707,-0.0072106086,0.008105802,-0.02971634,-0.053113725,-0.04665898,0.0044327183,0.037047036,-0.022794677,0.02021023,-0.0601057,-0.0057190326,-0.021340989,-0.025705835,-0.021358663,-0.022228397,0.04684543,0.04607339,0.027008768,-0.028202662,0.0034202125,-0.0471792,-0.061992332,-0.0051967567,0.040850088,-0.0024919366,-0.04056746,-0.019759843,0.016079975,0.09639304,0.051422298,0.022808686,-0.009716571,-0.06855191,-0.023506738,0.030357841,0.044496626,
-0.051363133,0.032271896,0.015534277,0.020483406,0.0016665284,-0.063900396,-0.025207968,-0.046492346,0.015482503,-0.015348717,0.103639685,-0.005517729,-0.047375068,-0.04089083,-0.037740417,0.0076540615,-0.019980198,-0.001330645,0.02783084,0.002560613,-0.03460203,0.006611639,-0.01852568,-0.026192183,-0.072958335,0.013951939,-0.08045076,0.027723024,0.006414844,-0.07839159,0.0038343705,0.010379564,0.026131876,-0.0018594388,-0.050311957,-0.023675995,0.015751258,0.046607446,0.028146273,0.010697382,-0.016766312,-0.06496338,-0.03277741,0.019242188,-0.041225147,-0.09072078,0.03691292,-0.042609725,-0.032949124,-0.041525368,0.021689268,0.03020935,-0.056459293,0.057957932,-0.010038808,0.019697005,-0.046555217,-0.022095392,-0.003560141,0.014076218,-0.033721812,-0.004732759,0.023110555,0.04749636,0.038056158,-0.011409013,0.031094052,0.021581084,0.012823222,0.021493573,0.02342016,-0.02345297,-0.029092252,0.04903069,-0.005707408,0.0025186331,-0.011744213,-0.047379926,0.014004199,-0.023883512,-0.04105765,0.019632658,0.027518265,0.018094705,0.06457233,0.0024730836,0.018916942,0.026531907,-0.046211712,-0.022285229,0.021573773,-0.013419937,-0.03455596,-0.09191394,0.006868248,-0.0074618882,-0.0036054577,-0.011943719,0.054376084,-0.023703573,0.0062128394,-0.0011602545,-0.034951344,-0.0027187609,-0.061549872,-0.005690091,-0.003928343,-0.031891953,-0.08489665,0.009549399,-0.040205255,-0.033712078,-0.020124912,0.013518412,-0.019645879,0.004544032,0.05125325,-0.043481413,-0.00819612,-0.037473537,0.033948384,0.03443079,-0.01904758,-0.017521713,0.036012836,0.039729983,0.04016558,0.026435697,-0.01900826,-0.02292816,0.016070291,-0.0035369154,-0.024110485,0.0038215688,0.057592604,0.029590948,0.0071622715,0.02379542,0.009109867,-0.06997849,-0.041678693,0.056863043,0.053877287,0.0534399,-0.0042019654,0.0048725526,-0.051756836,0.014207478,0.001239464,-0.016997792,-0.012972334,0.0043350216,-0.010455633,-0.06252787,0.0056886543,-0.04634297,-0.012747424,0.016325515,-0.0011880386,0.009256547,0.01844590
7,0.093901336,0.0032362181,0.036879685,0.03898592,-0.0017730877,0.0084034735,0.020843638,0.00012466834,0.03648086,-0.020137697,0.010239599,0.034984734,0.048634406,0.009181588,-0.017981991,0.0341379,0.0069884867,-0.037323523,-0.013165663,0.01828296,0.10198325,-0.01322182,0.04200674,-0.017133482,-0.04823288,-0.023230165,-0.012041545,0.039270226,0.047154948,-0.03833253,0.016359722,0.006687654,0.009697709,0.027688542,0.018698134,0.018184038,0.083178155,-0.008737084,-0.0221613,-0.016088588,0.018888673,0.036256406,-0.011139573,-0.028507024,0.0048806933,0.08796526,0.087164365,0.0017190895,0.055111267,-0.029558316,-0.026093286,-0.03513178,-0.001955272,0.016355641,-0.038865004,-0.04278645,-0.016241442,0.016694512,0.029632216,0.036559317,0.0074990178,-0.027446846,0.026351007,-0.014120773,-0.0028301547,-0.031107536,-0.023011928,-0.01390782,0.010659927,-0.0010016179,0.019410174,-0.022966182,-0.04033363,0.019179517,0.008030088,-0.062301833,0.04377396,-0.0095559545,-0.05824732,-0.055477094,-0.00073785806,0.008106026,0.025866156,0.006177542,-0.04722947,0.0479828,-0.0050166636,-0.000563004,0.03293222,-0.024315206,-0.019643135,-0.0044638272,-0.039113633,-0.04050051,0.017827738,0.054445498,0.08128559,-0.010175632,-0.0075262473,0.036429584,0.09462373,0.05923723,0.06024173,0.017778952,0.0011750376,0.056077678,0.005126308,0.044036042,0.02955781,-0.029093454,-0.0005067129,-0.03964037,-0.00272547,0.01486924,-0.019214861,0.001671789,0.015784303,-0.01886978,-0.048071902,0.0568134,0.0656898,-0.03506794,0.029272802,-0.006944527,-0.0143931005,0.041246418,0.025522089,0.0010183979,-0.0054674605,0.017614638,-0.059203632,0.022599941,0.06559491,-0.040254146,0.07594877,-0.013017859,-0.0037439996,-0.031746235,0.0419938,0.10224012,0.067910805,-0.058072705,-0.013629509,0.0074107726,0.022738233,-0.024427887,-0.030576007,-0.05286132,0.027219392,0.07018184,-0.072518066,0.061472982,0.008000156,0.07016992,-0.045264628,-0.0045131734,-0.010343464,-0.03781142,0.03482338,-0.00040026402,-0.042967003,0.0034719815
,-0.02626769,-0.011086819,-0.004766106,0.037381247,0.000103722574,0.021916747,-0.05167537,-0.00132617,-0.009572417,-0.009425646,0.037652317,-0.0057214363,-0.093880065,-0.03364398,0.01840425,-0.09699545,-0.029650358,0.021758681,-0.00019601939,0.001798711,0.006651571,-0.0027565758,0.019624304,0.0033517152,-0.0021955771,0.0069242218,-0.07163115,-0.03282917,0.02885534,-0.054206476,0.013943535,0.041023366,0.047308676,-0.015385463,-0.04626191,0.016062548,-0.0147262495,0.02499899,0.048379168,0.01441045,-0.027214805,-0.0025033941,-0.039530385,0.034421377,-0.04003829,-0.0420297,-0.01782141,-0.017828116,-0.0117866015,-0.050140034,-0.049780894,-0.044062153,-0.0049346397,0.0072679194,0.0033092557,-0.023582848,-0.026398756,0.037671007,-0.00985491,-0.040940713,0.00027156656,0.05968527,0.011502466,-0.04156381,-0.05325467,0.023848912,0.018052131,-0.015945133,0.00685342,-0.013844726,0.05296898,-0.006917853,-0.022049213,-0.023141906,0.0129875075,0.010701747,-0.06344161,-0.041602444,0.005573286,0.03274333,-0.015899876,-0.012204048,0.05062888,0.020073386,-0.00201493,-0.011889087,-0.0030152905,0.03937736,0.028480224,-0.03426437,0.025448287,-0.0348186,-0.01899614,0.031518877,0.042536896,0.029132357,-0.0070573185,-0.01738459,0.049294688,-0.036723215,-0.038572576,0.0071683135,-0.038969625,0.037439983,-0.0017382645,-0.06817503,-0.0060464684,-0.04483543,-0.005090633,0.01416351,0.0070704212,0.037871495,-0.029571362,-0.010826985,-0.012624524,0.11096962,0.023774981,-0.06727741,-0.04794441,-0.01632399,0.10217753,-0.043636844,-0.029781265,0.023843026,-0.010395858,0.043904353,-0.022760343,-0.05062468,0.018510994,0.019039785,0.012164014,-0.028493334,-0.007211876,-0.023033062,-0.015655043,0.04444586,0.0135373715,-0.07732589,-0.028173676,-0.03711684,0.006997713,0.003112476,0.013158137,0.0033843294,-0.03786328,0.0025156725,-0.008304768,0.014516092,0.0187173,-0.028482845,0.0076532774,-0.0069246422,-0.0033282607,-0.00021157203,-0.020454971,0.03290957,0.0188229,0.005858269,0.06623579,-0.0047679693,0.0342
81466,0.01660904,0.041005354,-0.081362806,-0.026192991,0.009393615,-0.001276406,-0.003977486,0.02229323,0.053764395,-0.032457035,-0.024147298,0.005305748,0.033774894,-0.00601022,-0.0025842565,0.011640188,-0.007762979,-0.00928128,0.008298095,-0.014858036,0.027766164,0.015023938,0.0071566817,-0.038882636,0.0069395625,-0.056028824,0.026073955,-0.01701176,-0.008454133,-0.037241224,-0.050372347,-0.015120472,0.020027528,0.001240695,-0.026284391,0.08324433,0.0070742825,-0.061718702,0.047395356,-0.05504478,-0.011403008,-0.04701833,-0.015377757,0.074295945,-0.028195063,-0.0699167,-0.06365188,-0.010389164,-0.059649818,0.042751353,0.029899709,-0.035607744,-0.021836322,0.057011824,0.016018704,-0.06891442,0.05407251,0.032165393,-0.019952746,-0.035503045,0.021520425,-0.025386892,0.020795086,-0.030297859,-0.04978729,0.03572664,-0.01997003,-0.034393944,0.035613995,-0.0076516597,-0.019534158,0.037733212,0.047553454,-0.011319463,-0.025111137,-0.025001481,-0.029580059,0.010800015,0.0377503,0.04133556,-0.038590226,0.026377652,-0.044148196,0.05241729,-0.05216622,0.036611874,-0.025294192,0.05463115,0.006853497,-0.051690135,-0.0339763,-0.0335185,0.021476023,-0.0059319953,0.008567571,-0.02239574,0.010827056,0.025126077,0.02252043,0.010323782,-0.0048508123,0.038753286,0.009434406,0.010941266,0.048907187,0.04210256,-0.02345035,-0.0051358747,0.046302903,-0.08639835,0.033185463,0.00733457,0.010599788,-0.036217347,-0.04257358,0.012305713,0.009866975,-0.017283777,-0.010904702,0.008157945,-0.016731942,-0.014050226,0.004177265,-0.011041926,0.006446419]],"total_duration":690168709,"load_duration":366171667,"prompt_eval_count":1024}

but...

curl -s localhost:11434/api/embed -d '{"model":"nomic-embed-text","input":"'"$(yes | head -1024 | tr \\n ' ')"'"}'         
{"model":"nomic-embed-text","embeddings":[[0.021686183,-0.019075416,-0.08084915,0.046185914,-0.013433088,0.0034744788,-0.009352537,-0.03835853,0.052491013,0.023707729,0.01777299,0.01824284,0.058592543,0.014298118,-0.0148073565,-0.04226909,0.071865134,-0.02081744,0.07402428,0.047630727,-0.035703592,-0.061972808,-0.025439903,0.029358461,0.13932094,0.029959764,0.008890659,0.055962462,-0.040253285,0.055738572,0.023032077,0.027151333,0.048234127,0.024731524,0.040766396,0.02059236,-0.011554242,0.003746063,0.04557921,0.002551454,-0.026286239,0.053431354,0.03796014,0.037456088,-0.0127428025,0.020931019,0.0014215623,-0.040728625,0.0071615926,0.019677326,-0.00016294623,0.05451718,-0.04591856,-0.003011821,0.075811304,0.0032525528,0.023912014,-0.021694973,-0.05343476,0.02303965,0.009487864,0.06414517,-0.03941804,0.00847457,0.02839105,-0.057757292,-0.042644575,-0.012965547,0.016992878,-0.0152737545,0.04317819,-0.016121816,0.007725548,0.0017134714,-0.04855355,0.013031644,0.057297707,0.018767038,-0.05227413,-0.021235187,-0.051099427,-0.01794608,0.09555547,0.04080336,0.0033400136,-0.012623584,0.025734171,0.02822867,-0.02935033,0.008740868,-0.020500278,0.0345513,0.039487172,0.038882088,-0.04920274,0.017994575,-0.042839777,-0.030968646,-0.007905659,-0.0026594969,-0.0065685664,-0.02126081,0.026138514,-0.026490662,0.10591035,0.012755773,-0.0068796934,-0.047791924,-0.024277285,0.017529499,-0.03310422,0.090151444,0.049970638,-0.045384873,-0.0056034126,-0.028291563,0.0031464777,-0.036383282,0.0402807,0.019436417,-0.013684183,-0.018630262,-0.07858707,-0.0072106086,0.008105802,-0.02971634,-0.053113725,-0.04665898,0.0044327183,0.037047036,-0.022794677,0.02021023,-0.0601057,-0.0057190326,-0.021340989,-0.025705835,-0.021358663,-0.022228397,0.04684543,0.04607339,0.027008768,-0.028202662,0.0034202125,-0.0471792,-0.061992332,-0.0051967567,0.040850088,-0.0024919366,-0.04056746,-0.019759843,0.016079975,0.09639304,0.051422298,0.022808686,-0.009716571,-0.06855191,-0.023506738,0.030357841,0.044496626,
-0.051363133,0.032271896,0.015534277,0.020483406,0.0016665284,-0.063900396,-0.025207968,-0.046492346,0.015482503,-0.015348717,0.103639685,-0.005517729,-0.047375068,-0.04089083,-0.037740417,0.0076540615,-0.019980198,-0.001330645,0.02783084,0.002560613,-0.03460203,0.006611639,-0.01852568,-0.026192183,-0.072958335,0.013951939,-0.08045076,0.027723024,0.006414844,-0.07839159,0.0038343705,0.010379564,0.026131876,-0.0018594388,-0.050311957,-0.023675995,0.015751258,0.046607446,0.028146273,0.010697382,-0.016766312,-0.06496338,-0.03277741,0.019242188,-0.041225147,-0.09072078,0.03691292,-0.042609725,-0.032949124,-0.041525368,0.021689268,0.03020935,-0.056459293,0.057957932,-0.010038808,0.019697005,-0.046555217,-0.022095392,-0.003560141,0.014076218,-0.033721812,-0.004732759,0.023110555,0.04749636,0.038056158,-0.011409013,0.031094052,0.021581084,0.012823222,0.021493573,0.02342016,-0.02345297,-0.029092252,0.04903069,-0.005707408,0.0025186331,-0.011744213,-0.047379926,0.014004199,-0.023883512,-0.04105765,0.019632658,0.027518265,0.018094705,0.06457233,0.0024730836,0.018916942,0.026531907,-0.046211712,-0.022285229,0.021573773,-0.013419937,-0.03455596,-0.09191394,0.006868248,-0.0074618882,-0.0036054577,-0.011943719,0.054376084,-0.023703573,0.0062128394,-0.0011602545,-0.034951344,-0.0027187609,-0.061549872,-0.005690091,-0.003928343,-0.031891953,-0.08489665,0.009549399,-0.040205255,-0.033712078,-0.020124912,0.013518412,-0.019645879,0.004544032,0.05125325,-0.043481413,-0.00819612,-0.037473537,0.033948384,0.03443079,-0.01904758,-0.017521713,0.036012836,0.039729983,0.04016558,0.026435697,-0.01900826,-0.02292816,0.016070291,-0.0035369154,-0.024110485,0.0038215688,0.057592604,0.029590948,0.0071622715,0.02379542,0.009109867,-0.06997849,-0.041678693,0.056863043,0.053877287,0.0534399,-0.0042019654,0.0048725526,-0.051756836,0.014207478,0.001239464,-0.016997792,-0.012972334,0.0043350216,-0.010455633,-0.06252787,0.0056886543,-0.04634297,-0.012747424,0.016325515,-0.0011880386,0.009256547,0.01844590
7,0.093901336,0.0032362181,0.036879685,0.03898592,-0.0017730877,0.0084034735,0.020843638,0.00012466834,0.03648086,-0.020137697,0.010239599,0.034984734,0.048634406,0.009181588,-0.017981991,0.0341379,0.0069884867,-0.037323523,-0.013165663,0.01828296,0.10198325,-0.01322182,0.04200674,-0.017133482,-0.04823288,-0.023230165,-0.012041545,0.039270226,0.047154948,-0.03833253,0.016359722,0.006687654,0.009697709,0.027688542,0.018698134,0.018184038,0.083178155,-0.008737084,-0.0221613,-0.016088588,0.018888673,0.036256406,-0.011139573,-0.028507024,0.0048806933,0.08796526,0.087164365,0.0017190895,0.055111267,-0.029558316,-0.026093286,-0.03513178,-0.001955272,0.016355641,-0.038865004,-0.04278645,-0.016241442,0.016694512,0.029632216,0.036559317,0.0074990178,-0.027446846,0.026351007,-0.014120773,-0.0028301547,-0.031107536,-0.023011928,-0.01390782,0.010659927,-0.0010016179,0.019410174,-0.022966182,-0.04033363,0.019179517,0.008030088,-0.062301833,0.04377396,-0.0095559545,-0.05824732,-0.055477094,-0.00073785806,0.008106026,0.025866156,0.006177542,-0.04722947,0.0479828,-0.0050166636,-0.000563004,0.03293222,-0.024315206,-0.019643135,-0.0044638272,-0.039113633,-0.04050051,0.017827738,0.054445498,0.08128559,-0.010175632,-0.0075262473,0.036429584,0.09462373,0.05923723,0.06024173,0.017778952,0.0011750376,0.056077678,0.005126308,0.044036042,0.02955781,-0.029093454,-0.0005067129,-0.03964037,-0.00272547,0.01486924,-0.019214861,0.001671789,0.015784303,-0.01886978,-0.048071902,0.0568134,0.0656898,-0.03506794,0.029272802,-0.006944527,-0.0143931005,0.041246418,0.025522089,0.0010183979,-0.0054674605,0.017614638,-0.059203632,0.022599941,0.06559491,-0.040254146,0.07594877,-0.013017859,-0.0037439996,-0.031746235,0.0419938,0.10224012,0.067910805,-0.058072705,-0.013629509,0.0074107726,0.022738233,-0.024427887,-0.030576007,-0.05286132,0.027219392,0.07018184,-0.072518066,0.061472982,0.008000156,0.07016992,-0.045264628,-0.0045131734,-0.010343464,-0.03781142,0.03482338,-0.00040026402,-0.042967003,0.0034719815
,-0.02626769,-0.011086819,-0.004766106,0.037381247,0.000103722574,0.021916747,-0.05167537,-0.00132617,-0.009572417,-0.009425646,0.037652317,-0.0057214363,-0.093880065,-0.03364398,0.01840425,-0.09699545,-0.029650358,0.021758681,-0.00019601939,0.001798711,0.006651571,-0.0027565758,0.019624304,0.0033517152,-0.0021955771,0.0069242218,-0.07163115,-0.03282917,0.02885534,-0.054206476,0.013943535,0.041023366,0.047308676,-0.015385463,-0.04626191,0.016062548,-0.0147262495,0.02499899,0.048379168,0.01441045,-0.027214805,-0.0025033941,-0.039530385,0.034421377,-0.04003829,-0.0420297,-0.01782141,-0.017828116,-0.0117866015,-0.050140034,-0.049780894,-0.044062153,-0.0049346397,0.0072679194,0.0033092557,-0.023582848,-0.026398756,0.037671007,-0.00985491,-0.040940713,0.00027156656,0.05968527,0.011502466,-0.04156381,-0.05325467,0.023848912,0.018052131,-0.015945133,0.00685342,-0.013844726,0.05296898,-0.006917853,-0.022049213,-0.023141906,0.0129875075,0.010701747,-0.06344161,-0.041602444,0.005573286,0.03274333,-0.015899876,-0.012204048,0.05062888,0.020073386,-0.00201493,-0.011889087,-0.0030152905,0.03937736,0.028480224,-0.03426437,0.025448287,-0.0348186,-0.01899614,0.031518877,0.042536896,0.029132357,-0.0070573185,-0.01738459,0.049294688,-0.036723215,-0.038572576,0.0071683135,-0.038969625,0.037439983,-0.0017382645,-0.06817503,-0.0060464684,-0.04483543,-0.005090633,0.01416351,0.0070704212,0.037871495,-0.029571362,-0.010826985,-0.012624524,0.11096962,0.023774981,-0.06727741,-0.04794441,-0.01632399,0.10217753,-0.043636844,-0.029781265,0.023843026,-0.010395858,0.043904353,-0.022760343,-0.05062468,0.018510994,0.019039785,0.012164014,-0.028493334,-0.007211876,-0.023033062,-0.015655043,0.04444586,0.0135373715,-0.07732589,-0.028173676,-0.03711684,0.006997713,0.003112476,0.013158137,0.0033843294,-0.03786328,0.0025156725,-0.008304768,0.014516092,0.0187173,-0.028482845,0.0076532774,-0.0069246422,-0.0033282607,-0.00021157203,-0.020454971,0.03290957,0.0188229,0.005858269,0.06623579,-0.0047679693,0.0342
81466,0.01660904,0.041005354,-0.081362806,-0.026192991,0.009393615,-0.001276406,-0.003977486,0.02229323,0.053764395,-0.032457035,-0.024147298,0.005305748,0.033774894,-0.00601022,-0.0025842565,0.011640188,-0.007762979,-0.00928128,0.008298095,-0.014858036,0.027766164,0.015023938,0.0071566817,-0.038882636,0.0069395625,-0.056028824,0.026073955,-0.01701176,-0.008454133,-0.037241224,-0.050372347,-0.015120472,0.020027528,0.001240695,-0.026284391,0.08324433,0.0070742825,-0.061718702,0.047395356,-0.05504478,-0.011403008,-0.04701833,-0.015377757,0.074295945,-0.028195063,-0.0699167,-0.06365188,-0.010389164,-0.059649818,0.042751353,0.029899709,-0.035607744,-0.021836322,0.057011824,0.016018704,-0.06891442,0.05407251,0.032165393,-0.019952746,-0.035503045,0.021520425,-0.025386892,0.020795086,-0.030297859,-0.04978729,0.03572664,-0.01997003,-0.034393944,0.035613995,-0.0076516597,-0.019534158,0.037733212,0.047553454,-0.011319463,-0.025111137,-0.025001481,-0.029580059,0.010800015,0.0377503,0.04133556,-0.038590226,0.026377652,-0.044148196,0.05241729,-0.05216622,0.036611874,-0.025294192,0.05463115,0.006853497,-0.051690135,-0.0339763,-0.0335185,0.021476023,-0.0059319953,0.008567571,-0.02239574,0.010827056,0.025126077,0.02252043,0.010323782,-0.0048508123,0.038753286,0.009434406,0.010941266,0.048907187,0.04210256,-0.02345035,-0.0051358747,0.046302903,-0.08639835,0.033185463,0.00733457,0.010599788,-0.036217347,-0.04257358,0.012305713,0.009866975,-0.017283777,-0.010904702,0.008157945,-0.016731942,-0.014050226,0.004177265,-0.011041926,0.006446419]],"total_duration":322109250,"load_duration":12898125,"prompt_eval_count":1024}

The above also works 🤷
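Until the regression is fixed, long inputs can be split client-side down to the size the reporter found safe (512). A minimal sketch, with the assumption (ours, not Ollama's) that whitespace-separated words approximate tokens; the model's own tokenizer would be more accurate:

```python
def chunk_words(text, max_tokens=512):
    """Split text into chunks of at most max_tokens whitespace-separated
    words, as a rough stand-in for the model's token count (assumption)."""
    words = text.split()
    return [" ".join(words[i:i + max_tokens])
            for i in range(0, len(words), max_tokens)]

# 2048 pseudo-tokens -> four 512-word chunks, each under the
# 2048-token training context reported by `ollama show`
chunks = chunk_words("y " * 2048)
```

Each chunk would then be embedded separately via `/api/embed`; this works around the crash but changes embedding granularity, so downstream retrieval code must handle multiple vectors per document.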

<!-- gh-comment-id:3523117834 --> @smileBeda commented on GitHub (Nov 12, 2025): Uhmmm ok - in this case: ``` localhost:11434/api/embed -d '{"model":"nomic-embed-text","options":{"num_gpu":0},"input":"'"$(yes | head -1024 | tr \\n ' ')"'"}' {"model":"nomic-embed-text","embeddings":[[0.021686183,-0.019075416,-0.08084915,0.046185914,-0.013433088,0.0034744788,-0.009352537,-0.03835853,0.052491013,0.023707729,0.01777299,0.01824284,0.058592543,0.014298118,-0.0148073565,-0.04226909,0.071865134,-0.02081744,0.07402428,0.047630727,-0.035703592,-0.061972808,-0.025439903,0.029358461,0.13932094,0.029959764,0.008890659,0.055962462,-0.040253285,0.055738572,0.023032077,0.027151333,0.048234127,0.024731524,0.040766396,0.02059236,-0.011554242,0.003746063,0.04557921,0.002551454,-0.026286239,0.053431354,0.03796014,0.037456088,-0.0127428025,0.020931019,0.0014215623,-0.040728625,0.0071615926,0.019677326,-0.00016294623,0.05451718,-0.04591856,-0.003011821,0.075811304,0.0032525528,0.023912014,-0.021694973,-0.05343476,0.02303965,0.009487864,0.06414517,-0.03941804,0.00847457,0.02839105,-0.057757292,-0.042644575,-0.012965547,0.016992878,-0.0152737545,0.04317819,-0.016121816,0.007725548,0.0017134714,-0.04855355,0.013031644,0.057297707,0.018767038,-0.05227413,-0.021235187,-0.051099427,-0.01794608,0.09555547,0.04080336,0.0033400136,-0.012623584,0.025734171,0.02822867,-0.02935033,0.008740868,-0.020500278,0.0345513,0.039487172,0.038882088,-0.04920274,0.017994575,-0.042839777,-0.030968646,-0.007905659,-0.0026594969,-0.0065685664,-0.02126081,0.026138514,-0.026490662,0.10591035,0.012755773,-0.0068796934,-0.047791924,-0.024277285,0.017529499,-0.03310422,0.090151444,0.049970638,-0.045384873,-0.0056034126,-0.028291563,0.0031464777,-0.036383282,0.0402807,0.019436417,-0.013684183,-0.018630262,-0.07858707,-0.0072106086,0.008105802,-0.02971634,-0.053113725,-0.04665898,0.0044327183,0.037047036,-0.022794677,0.02021023,-0.0601057,-0.0057190326,-0.021340989,-0.025705835,-0.021358663,-0.022228397,0.04684543,0.0460
7339,0.027008768,-0.028202662,0.0034202125,-0.0471792,-0.061992332,-0.0051967567,0.040850088,-0.0024919366,-0.04056746,-0.019759843,0.016079975,0.09639304,0.051422298,0.022808686,-0.009716571,-0.06855191,-0.023506738,0.030357841,0.044496626,-0.051363133,0.032271896,0.015534277,0.020483406,0.0016665284,-0.063900396,-0.025207968,-0.046492346,0.015482503,-0.015348717,0.103639685,-0.005517729,-0.047375068,-0.04089083,-0.037740417,0.0076540615,-0.019980198,-0.001330645,0.02783084,0.002560613,-0.03460203,0.006611639,-0.01852568,-0.026192183,-0.072958335,0.013951939,-0.08045076,0.027723024,0.006414844,-0.07839159,0.0038343705,0.010379564,0.026131876,-0.0018594388,-0.050311957,-0.023675995,0.015751258,0.046607446,0.028146273,0.010697382,-0.016766312,-0.06496338,-0.03277741,0.019242188,-0.041225147,-0.09072078,0.03691292,-0.042609725,-0.032949124,-0.041525368,0.021689268,0.03020935,-0.056459293,0.057957932,-0.010038808,0.019697005,-0.046555217,-0.022095392,-0.003560141,0.014076218,-0.033721812,-0.004732759,0.023110555,0.04749636,0.038056158,-0.011409013,0.031094052,0.021581084,0.012823222,0.021493573,0.02342016,-0.02345297,-0.029092252,0.04903069,-0.005707408,0.0025186331,-0.011744213,-0.047379926,0.014004199,-0.023883512,-0.04105765,0.019632658,0.027518265,0.018094705,0.06457233,0.0024730836,0.018916942,0.026531907,-0.046211712,-0.022285229,0.021573773,-0.013419937,-0.03455596,-0.09191394,0.006868248,-0.0074618882,-0.0036054577,-0.011943719,0.054376084,-0.023703573,0.0062128394,-0.0011602545,-0.034951344,-0.0027187609,-0.061549872,-0.005690091,-0.003928343,-0.031891953,-0.08489665,0.009549399,-0.040205255,-0.033712078,-0.020124912,0.013518412,-0.019645879,0.004544032,0.05125325,-0.043481413,-0.00819612,-0.037473537,0.033948384,0.03443079,-0.01904758,-0.017521713,0.036012836,0.039729983,0.04016558,0.026435697,-0.01900826,-0.02292816,0.016070291,-0.0035369154,-0.024110485,0.0038215688,0.057592604,0.029590948,0.0071622715,0.02379542,0.009109867,-0.06997849,-0.041678693,0.05686
… (embedding vector truncated) …]],"total_duration":690168709,"load_duration":366171667,"prompt_eval_count":1024} ``` but...
``` curl -s localhost:11434/api/embed -d '{"model":"nomic-embed-text","input":"'"$(yes | head -1024 | tr \\n ' ')"'"}' {"model":"nomic-embed-text","embeddings":[[0.021686183,-0.019075416, … (embedding vector truncated) …,0.006446419]],"total_duration":322109250,"load_duration":12898125,"prompt_eval_count":1024} ``` The above also works 🤷
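Until the runner stops crashing on long inputs, one practical workaround mentioned above is keeping each request well under the model's trained context (2048 for nomic-embed-text). A minimal client-side chunking sketch, assuming a whitespace-word count as a rough stand-in for the real tokenizer (actual token counts can be higher, so the cap below is deliberately conservative; the endpoint URL is the `ollama serve` default):

```python
import json
from urllib import request

EMBED_URL = "http://localhost:11434/api/embed"  # default ollama serve endpoint
MAX_WORDS = 400  # conservative proxy for the 2048-token trained context


def chunk_words(text: str, max_words: int = MAX_WORDS) -> list[str]:
    """Split text into whitespace-word chunks of at most max_words each."""
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]


def embed_chunks(text: str) -> list[list[float]]:
    """POST all chunks in one /api/embed call (the endpoint accepts a list)."""
    payload = json.dumps({"model": "nomic-embed-text",
                          "input": chunk_words(text)}).encode()
    req = request.Request(EMBED_URL, data=payload,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.load(resp)["embeddings"]
```

Whether per-chunk embeddings are averaged or stored individually is up to the caller; the point is only that no single input exceeds the trained window.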

@rick-github commented on GitHub (Nov 12, 2025):

Is jq available on macOS? If so, what's the output of:

curl -s localhost:11434/api/embed -d '{"model":"nomic-embed-text","input":'$(yes | head -1024 | jq -sR)'}' 

And what's the output from ollama ps after this runs.

<!-- gh-comment-id:3523135593 -->
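The `jq -sR` in the command above just slurps the raw input (including the trailing newlines from `yes`) and emits it as one JSON-escaped string, so it can be spliced into the request body safely. The same escaping can be sketched without jq; `embed_payload` below is a hypothetical helper, not part of Ollama:

```python
import json


def embed_payload(model: str, raw_text: str) -> str:
    """Build the /api/embed request body with raw_text JSON-escaped,
    equivalent to: '{"model":"...","input":'$(... | jq -sR)'}'."""
    return json.dumps({"model": model, "input": raw_text})


# 1024 lines of "y", as produced by: yes | head -1024
body = embed_payload("nomic-embed-text", "y\n" * 1024)
```

The difference from the author's original `tr \\n ' '` invocation is that newlines survive here, which is exactly the input shape that triggered the EOF in the next comment.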

@smileBeda commented on GitHub (Nov 12, 2025):

 curl -s localhost:11434/api/embed -d '{"model":"nomic-embed-text","input":'$(yes | head -1024 | jq -sR)'}' 

{"error":"do embedding request: Post \"http://127.0.0.1:62975/embedding\": EOF"}
ollama ps
NAME                       ID              SIZE      PROCESSOR    CONTEXT    UNTIL              
nomic-embed-text:latest    0a109f422b47    848 MB    100% GPU     8192       4 minutes from now 
<!-- gh-comment-id:3523206123 -->

@rick-github commented on GitHub (Nov 12, 2025):

What's the output of:

for i in {0..13} ; do echo -n "$i $(curl -s localhost:11434/api/embed -d '{"model":"nomic-embed-text","options":{"num_gpu":'$i'},"input":'$(yes | head -1024 | jq -sR)'}' | jq | grep -q embeddings && echo good || echo bad) " ; ollama ps | grep nomic-embed-text ; done
<!-- gh-comment-id:3523259361 -->

@smileBeda commented on GitHub (Nov 12, 2025):

0 good nomic-embed-text:latest    0a109f422b47    279 MB    100% CPU     8192       4 minutes from now    
1 bad nomic-embed-text:latest    0a109f422b47    848 MB    31%/69% CPU/GPU    8192       4 minutes from now    
2 bad nomic-embed-text:latest    0a109f422b47    848 MB    28%/72% CPU/GPU    8192       4 minutes from now    
3 bad nomic-embed-text:latest    0a109f422b47    848 MB    26%/74% CPU/GPU    8192       4 minutes from now    
4 bad nomic-embed-text:latest    0a109f422b47    848 MB    24%/76% CPU/GPU    8192       4 minutes from now    
5 bad nomic-embed-text:latest    0a109f422b47    848 MB    22%/78% CPU/GPU    8192       4 minutes from now    
6 bad nomic-embed-text:latest    0a109f422b47    848 MB    19%/81% CPU/GPU    8192       4 minutes from now    
7 bad nomic-embed-text:latest    0a109f422b47    848 MB    17%/83% CPU/GPU    8192       4 minutes from now    
8 bad nomic-embed-text:latest    0a109f422b47    848 MB    15%/85% CPU/GPU    8192       4 minutes from now    
9 bad nomic-embed-text:latest    0a109f422b47    848 MB    12%/88% CPU/GPU    8192       4 minutes from now    
10 bad nomic-embed-text:latest    0a109f422b47    848 MB    10%/90% CPU/GPU    8192       4 minutes from now    
11 bad nomic-embed-text:latest    0a109f422b47    848 MB    8%/92% CPU/GPU    8192       4 minutes from now    
12 bad nomic-embed-text:latest    0a109f422b47    848 MB    6%/94% CPU/GPU    8192       4 minutes from now    
13 bad nomic-embed-text:latest    0a109f422b47    848 MB    100% GPU     8192       4 minutes from now
<!-- gh-comment-id:3523889701 -->
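The sweep above is telling: only `num_gpu: 0` (pure CPU) succeeds; any GPU offload at all fails. That points to the Metal path rather than the context size as the immediate trigger. As a per-request workaround until that is fixed, the `options` field can force CPU inference. A sketch of the payload (the hypothetical helper name is mine; `num_gpu` is a standard per-request Ollama option):

```python
import json


def cpu_only_embed_payload(model: str, text: str) -> str:
    """Request embeddings with zero layers offloaded to the GPU,
    sidestepping the GPU-offload SIGTRAP seen in this thread."""
    return json.dumps({
        "model": model,
        "input": text,
        "options": {"num_gpu": 0},  # 0 offloaded layers -> 100% CPU
    })
```

Note the trade-off: the CPU run in the sweep loaded only 279 MB but will embed more slowly than the GPU path.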

@plockcdev commented on GitHub (Nov 15, 2025):

On macOS I don't see the issue in 0.12.3, but it appears in every release through 0.12.10.

<!-- gh-comment-id:3535422329 -->

@bmustata commented on GitHub (Nov 18, 2025):

0.12.11 is also not working.

<!-- gh-comment-id:3546658964 -->

@smileBeda commented on GitHub (Nov 21, 2025):

This just got much worse with the latest release, 0.13.0.
Now even the model that previously worked fails constantly with SIGTRAPs, and instead of accepting the supported number of tokens it sometimes has to be run at 512 tokens, sometimes at 1024.

In other words, it is now unusable on an Apple silicon Mac.

<!-- gh-comment-id:3563624109 -->

@smileBeda commented on GitHub (Nov 21, 2025):

By now it seems to happen with any BERT-based model.
Gemma works.

<!-- gh-comment-id:3563807066 -->

@stevewillett commented on GitHub (Dec 4, 2025):

I am getting this in 1.13.1 (latest) on Ubuntu 22.04 using nomic-embed-text:latest. I rolled back to 12.1 and the error is still there, but Ollama doesn't panic the way it does in 1.13.1.

<!-- gh-comment-id:3614285031 -->

@lcy19930619 commented on GitHub (Dec 9, 2025):

Is there any plan to fix it? I'm also having the same problem.

<!-- gh-comment-id:3629885930 -->
Reference: github-starred/ollama#55162