[GH-ISSUE #10411] Ollama exception inside docker - ggml-alloc.c:819: GGML_ASSERT(talloc->buffer_id >= 0) failed #68901

Closed
opened 2026-05-04 15:36:52 -05:00 by GiteaMirror · 3 comments
Owner

Originally created by @Loremaster on GitHub (Apr 25, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/10411

What is the issue?

Hey, I am using Ollama together with Gemma 3. Locally on my Mac it works well; however, when I run Ollama inside Docker on Linux (together with LangChain), it immediately raises the error `GGML_ASSERT(talloc->buffer_id >= 0) failed`.

My code looks roughly like this:

from langchain_ollama import ChatOllama

llm = ChatOllama(model='gemma3', keep_alive=0, max_tokens=512, temperature=0.3)
response = llm.invoke(prompt)

I'd appreciate any help with this. Thanks!
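For reference, here is a minimal sketch of the raw `/api/chat` request that a call like the one above roughly corresponds to (field names taken from Ollama's REST API; `num_predict` is Ollama's native name for the generation token limit). The payload is only constructed here, not sent, since sending it assumes a running Ollama server at the default port:

```python
import json

# Roughly the request body a chat call sends to Ollama's /api/chat endpoint.
payload = {
    "model": "gemma3",
    "messages": [{"role": "user", "content": "Hello"}],
    "keep_alive": 0,          # unload the model immediately after the request
    "options": {
        "temperature": 0.3,
        "num_predict": 512,   # generation token limit
    },
    "stream": False,
}

# Sending it would look like:
#   requests.post("http://localhost:11434/api/chat", json=payload)
print(json.dumps(payload, indent=2))
```

Reproducing the crash with a plain request like this (e.g. via `curl`) helps rule out the LangChain layer and confirm the failure is in the Ollama runner itself.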

Relevant log output

ollama                  | time=2025-04-25T15:57:29.408Z level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
ollama                  | time=2025-04-25T15:57:29.410Z level=WARN source=types.go:570 msg="invalid option provided" option=tfs_z
ollama                  | time=2025-04-25T15:57:29.417Z level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
ollama                  | time=2025-04-25T15:57:29.423Z level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
ollama                  | time=2025-04-25T15:57:29.424Z level=INFO source=server.go:105 msg="system memory" total="5.8 GiB" free="2.6 GiB" free_swap="3.3 GiB"
ollama                  | time=2025-04-25T15:57:29.424Z level=WARN source=ggml.go:152 msg="key not found" key=nomic-bert.vision.block_count default=0
ollama                  | time=2025-04-25T15:57:29.424Z level=WARN source=ggml.go:152 msg="key not found" key=nomic-bert.attention.head_count_kv default=1
ollama                  | time=2025-04-25T15:57:29.425Z level=WARN source=ggml.go:152 msg="key not found" key=nomic-bert.attention.key_length default=64
ollama                  | time=2025-04-25T15:57:29.425Z level=WARN source=ggml.go:152 msg="key not found" key=nomic-bert.attention.value_length default=64
ollama                  | time=2025-04-25T15:57:29.425Z level=WARN source=ggml.go:152 msg="key not found" key=nomic-bert.attention.head_count_kv default=1
ollama                  | time=2025-04-25T15:57:29.426Z level=INFO source=server.go:138 msg=offload library=cpu layers.requested=-1 layers.model=13 layers.offload=0 layers.split="" memory.available="[2.6 GiB]" memory.gpu_overhead="0 B" memory.required.full="352.9 MiB" memory.required.partial="0 B" memory.required.kv="24.0 MiB" memory.required.allocations="[352.9 MiB]" memory.weights.total="260.9 MiB" memory.weights.repeating="216.1 MiB" memory.weights.nonrepeating="44.7 MiB" memory.graph.full="48.0 MiB" memory.graph.partial="48.0 MiB"
ollama                  | llama_model_loader: loaded meta data with 24 key-value pairs and 112 tensors from /root/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 (version GGUF V3 (latest))
ollama                  | llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
ollama                  | llama_model_loader: - kv   0:                       general.architecture str              = nomic-bert
ollama                  | llama_model_loader: - kv   1:                               general.name str              = nomic-embed-text-v1.5
ollama                  | llama_model_loader: - kv   2:                     nomic-bert.block_count u32              = 12
ollama                  | llama_model_loader: - kv   3:                  nomic-bert.context_length u32              = 2048
ollama                  | llama_model_loader: - kv   4:                nomic-bert.embedding_length u32              = 768
ollama                  | llama_model_loader: - kv   5:             nomic-bert.feed_forward_length u32              = 3072
ollama                  | llama_model_loader: - kv   6:            nomic-bert.attention.head_count u32              = 12
ollama                  | llama_model_loader: - kv   7:    nomic-bert.attention.layer_norm_epsilon f32              = 0.000000
ollama                  | llama_model_loader: - kv   8:                          general.file_type u32              = 1
ollama                  | llama_model_loader: - kv   9:                nomic-bert.attention.causal bool             = false
ollama                  | llama_model_loader: - kv  10:                    nomic-bert.pooling_type u32              = 1
ollama                  | llama_model_loader: - kv  11:                  nomic-bert.rope.freq_base f32              = 1000.000000
ollama                  | llama_model_loader: - kv  12:            tokenizer.ggml.token_type_count u32              = 2
ollama                  | llama_model_loader: - kv  13:                tokenizer.ggml.bos_token_id u32              = 101
ollama                  | llama_model_loader: - kv  14:                tokenizer.ggml.eos_token_id u32              = 102
ollama                  | llama_model_loader: - kv  15:                       tokenizer.ggml.model str              = bert
ollama                  | llama_model_loader: - kv  16:                      tokenizer.ggml.tokens arr[str,30522]   = ["[PAD]", "[unused0]", "[unused1]", "...
ollama                  | llama_model_loader: - kv  17:                      tokenizer.ggml.scores arr[f32,30522]   = [-1000.000000, -1000.000000, -1000.00...
ollama                  | llama_model_loader: - kv  18:                  tokenizer.ggml.token_type arr[i32,30522]   = [3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
ollama                  | llama_model_loader: - kv  19:            tokenizer.ggml.unknown_token_id u32              = 100
ollama                  | llama_model_loader: - kv  20:          tokenizer.ggml.seperator_token_id u32              = 102
ollama                  | llama_model_loader: - kv  21:            tokenizer.ggml.padding_token_id u32              = 0
ollama                  | llama_model_loader: - kv  22:                tokenizer.ggml.cls_token_id u32              = 101
ollama                  | llama_model_loader: - kv  23:               tokenizer.ggml.mask_token_id u32              = 103
ollama                  | llama_model_loader: - type  f32:   51 tensors
ollama                  | llama_model_loader: - type  f16:   61 tensors
ollama                  | print_info: file format = GGUF V3 (latest)
ollama                  | print_info: file type   = F16
ollama                  | print_info: file size   = 260.86 MiB (16.00 BPW) 
ollama                  | load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
ollama                  | load: special tokens cache size = 5
ollama                  | load: token to piece cache size = 0.2032 MB
ollama                  | print_info: arch             = nomic-bert
ollama                  | print_info: vocab_only       = 1
ollama                  | print_info: model type       = ?B
ollama                  | print_info: model params     = 136.73 M
ollama                  | print_info: general.name     = nomic-embed-text-v1.5
ollama                  | print_info: vocab type       = WPM
ollama                  | print_info: n_vocab          = 30522
ollama                  | print_info: n_merges         = 0
ollama                  | print_info: BOS token        = 101 '[CLS]'
ollama                  | print_info: EOS token        = 102 '[SEP]'
ollama                  | print_info: UNK token        = 100 '[UNK]'
ollama                  | print_info: SEP token        = 102 '[SEP]'
ollama                  | print_info: PAD token        = 0 '[PAD]'
ollama                  | print_info: MASK token       = 103 '[MASK]'
ollama                  | print_info: LF token         = 0 '[PAD]'
ollama                  | print_info: EOG token        = 102 '[SEP]'
ollama                  | print_info: max token length = 21
ollama                  | llama_model_load: vocab only - skipping tensors
ollama                  | time=2025-04-25T15:57:29.506Z level=INFO source=server.go:405 msg="starting llama server" cmd="/usr/bin/ollama runner --model /root/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 --ctx-size 8192 --batch-size 512 --threads 10 --no-mmap --parallel 1 --port 36271"
ollama                  | time=2025-04-25T15:57:29.511Z level=INFO source=sched.go:451 msg="loaded runners" count=1
ollama                  | time=2025-04-25T15:57:29.512Z level=INFO source=server.go:580 msg="waiting for llama runner to start responding"
ollama                  | time=2025-04-25T15:57:29.519Z level=INFO source=server.go:614 msg="waiting for server to become available" status="llm server error"
ollama                  | time=2025-04-25T15:57:29.664Z level=INFO source=runner.go:853 msg="starting go runner"
ollama                  | time=2025-04-25T15:57:29.685Z level=INFO source=ggml.go:109 msg=system CPU.0.LLAMAFILE=1 compiler=cgo(gcc)
ollama                  | time=2025-04-25T15:57:29.698Z level=INFO source=runner.go:913 msg="Server listening on 127.0.0.1:36271"
ollama                  | llama_model_loader: loaded meta data with 24 key-value pairs and 112 tensors from /root/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 (version GGUF V3 (latest))
ollama                  | llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
ollama                  | llama_model_loader: - kv   0:                       general.architecture str              = nomic-bert
ollama                  | llama_model_loader: - kv   1:                               general.name str              = nomic-embed-text-v1.5
ollama                  | llama_model_loader: - kv   2:                     nomic-bert.block_count u32              = 12
ollama                  | llama_model_loader: - kv   3:                  nomic-bert.context_length u32              = 2048
ollama                  | llama_model_loader: - kv   4:                nomic-bert.embedding_length u32              = 768
ollama                  | llama_model_loader: - kv   5:             nomic-bert.feed_forward_length u32              = 3072
ollama                  | llama_model_loader: - kv   6:            nomic-bert.attention.head_count u32              = 12
ollama                  | llama_model_loader: - kv   7:    nomic-bert.attention.layer_norm_epsilon f32              = 0.000000
ollama                  | llama_model_loader: - kv   8:                          general.file_type u32              = 1
ollama                  | llama_model_loader: - kv   9:                nomic-bert.attention.causal bool             = false
ollama                  | llama_model_loader: - kv  10:                    nomic-bert.pooling_type u32              = 1
ollama                  | llama_model_loader: - kv  11:                  nomic-bert.rope.freq_base f32              = 1000.000000
ollama                  | llama_model_loader: - kv  12:            tokenizer.ggml.token_type_count u32              = 2
ollama                  | llama_model_loader: - kv  13:                tokenizer.ggml.bos_token_id u32              = 101
ollama                  | llama_model_loader: - kv  14:                tokenizer.ggml.eos_token_id u32              = 102
ollama                  | llama_model_loader: - kv  15:                       tokenizer.ggml.model str              = bert
ollama                  | llama_model_loader: - kv  16:                      tokenizer.ggml.tokens arr[str,30522]   = ["[PAD]", "[unused0]", "[unused1]", "...
ollama                  | llama_model_loader: - kv  17:                      tokenizer.ggml.scores arr[f32,30522]   = [-1000.000000, -1000.000000, -1000.00...
ollama                  | llama_model_loader: - kv  18:                  tokenizer.ggml.token_type arr[i32,30522]   = [3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
ollama                  | llama_model_loader: - kv  19:            tokenizer.ggml.unknown_token_id u32              = 100
ollama                  | llama_model_loader: - kv  20:          tokenizer.ggml.seperator_token_id u32              = 102
ollama                  | llama_model_loader: - kv  21:            tokenizer.ggml.padding_token_id u32              = 0
ollama                  | llama_model_loader: - kv  22:                tokenizer.ggml.cls_token_id u32              = 101
ollama                  | llama_model_loader: - kv  23:               tokenizer.ggml.mask_token_id u32              = 103
ollama                  | llama_model_loader: - type  f32:   51 tensors
ollama                  | llama_model_loader: - type  f16:   61 tensors
ollama                  | print_info: file format = GGUF V3 (latest)
ollama                  | print_info: file type   = F16
ollama                  | print_info: file size   = 260.86 MiB (16.00 BPW) 
ollama                  | load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
ollama                  | load: special tokens cache size = 5
ollama                  | load: token to piece cache size = 0.2032 MB
ollama                  | print_info: arch             = nomic-bert
ollama                  | print_info: vocab_only       = 0
ollama                  | print_info: n_ctx_train      = 2048
ollama                  | print_info: n_embd           = 768
ollama                  | print_info: n_layer          = 12
ollama                  | print_info: n_head           = 12
ollama                  | print_info: n_head_kv        = 12
ollama                  | print_info: n_rot            = 64
ollama                  | print_info: n_swa            = 0
ollama                  | print_info: n_swa_pattern    = 1
ollama                  | print_info: n_embd_head_k    = 64
ollama                  | print_info: n_embd_head_v    = 64
ollama                  | print_info: n_gqa            = 1
ollama                  | print_info: n_embd_k_gqa     = 768
ollama                  | print_info: n_embd_v_gqa     = 768
ollama                  | print_info: f_norm_eps       = 1.0e-12
ollama                  | print_info: f_norm_rms_eps   = 0.0e+00
ollama                  | print_info: f_clamp_kqv      = 0.0e+00
ollama                  | print_info: f_max_alibi_bias = 0.0e+00
ollama                  | print_info: f_logit_scale    = 0.0e+00
ollama                  | print_info: f_attn_scale     = 0.0e+00
ollama                  | print_info: n_ff             = 3072
ollama                  | print_info: n_expert         = 0
ollama                  | print_info: n_expert_used    = 0
ollama                  | print_info: causal attn      = 0
ollama                  | print_info: pooling type     = 1
ollama                  | print_info: rope type        = 2
ollama                  | print_info: rope scaling     = linear
ollama                  | print_info: freq_base_train  = 1000.0
ollama                  | print_info: freq_scale_train = 1
ollama                  | print_info: n_ctx_orig_yarn  = 2048
ollama                  | print_info: rope_finetuned   = unknown
ollama                  | print_info: ssm_d_conv       = 0
ollama                  | print_info: ssm_d_inner      = 0
ollama                  | print_info: ssm_d_state      = 0
ollama                  | print_info: ssm_dt_rank      = 0
ollama                  | print_info: ssm_dt_b_c_rms   = 0
ollama                  | print_info: model type       = 137M
ollama                  | print_info: model params     = 136.73 M
ollama                  | print_info: general.name     = nomic-embed-text-v1.5
ollama                  | print_info: vocab type       = WPM
ollama                  | print_info: n_vocab          = 30522
ollama                  | print_info: n_merges         = 0
ollama                  | print_info: BOS token        = 101 '[CLS]'
ollama                  | print_info: EOS token        = 102 '[SEP]'
ollama                  | print_info: UNK token        = 100 '[UNK]'
ollama                  | print_info: SEP token        = 102 '[SEP]'
ollama                  | print_info: PAD token        = 0 '[PAD]'
ollama                  | print_info: MASK token       = 103 '[MASK]'
ollama                  | print_info: LF token         = 0 '[PAD]'
ollama                  | print_info: EOG token        = 102 '[SEP]'
ollama                  | print_info: max token length = 21
ollama                  | load_tensors: loading model tensors, this can take a while... (mmap = false)
ollama                  | load_tensors:          CPU model buffer size =   260.86 MiB
ollama                  | time=2025-04-25T15:57:29.788Z level=INFO source=server.go:614 msg="waiting for server to become available" status="llm server loading model"
ollama                  | llama_context: constructing llama_context
ollama                  | llama_context: n_seq_max     = 1
ollama                  | llama_context: n_ctx         = 8192
ollama                  | llama_context: n_ctx_per_seq = 8192
ollama                  | llama_context: n_batch       = 512
ollama                  | llama_context: n_ubatch      = 512
ollama                  | llama_context: causal_attn   = 0
ollama                  | llama_context: flash_attn    = 0
ollama                  | llama_context: freq_base     = 1000.0
ollama                  | llama_context: freq_scale    = 1
ollama                  | llama_context: n_ctx_pre_seq (8192) > n_ctx_train (2048) -- possible training context overflow
ollama                  | llama_context:        CPU  output buffer size =     0.00 MiB
ollama                  | init: kv_size = 8192, offload = 1, type_k = 'f16', type_v = 'f16', n_layer = 12, can_shift = 1
ollama                  | init:        CPU KV buffer size =   288.00 MiB
ollama                  | llama_context: KV self size  =  288.00 MiB, K (f16):  144.00 MiB, V (f16):  144.00 MiB
ollama                  | llama_context:        CPU compute buffer size =    24.50 MiB
ollama                  | llama_context: graph nodes  = 441
ollama                  | llama_context: graph splits = 1
ollama                  | time=2025-04-25T15:57:30.800Z level=INFO source=server.go:619 msg="llama runner started in 1.29 seconds"
ollama                  | time=2025-04-25T15:57:30.808Z level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
ollama                  | [GIN] 2025/04/25 - 15:57:31 | 200 |  1.957762501s |      172.22.0.2 | POST     "/api/embed"
ollama                  | time=2025-04-25T15:57:31.493Z level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
ollama                  | time=2025-04-25T15:57:31.544Z level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
ollama                  | time=2025-04-25T15:57:31.597Z level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
ollama                  | time=2025-04-25T15:57:31.667Z level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
ollama                  | time=2025-04-25T15:57:31.719Z level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
ollama                  | time=2025-04-25T15:57:31.720Z level=INFO source=server.go:105 msg="system memory" total="5.8 GiB" free="2.6 GiB" free_swap="3.3 GiB"
ollama                  | time=2025-04-25T15:57:31.722Z level=INFO source=server.go:138 msg=offload library=cpu layers.requested=-1 layers.model=35 layers.offload=0 layers.split="" memory.available="[2.6 GiB]" memory.gpu_overhead="0 B" memory.required.full="3.0 GiB" memory.required.partial="0 B" memory.required.kv="1.3 GiB" memory.required.allocations="[0 B]" memory.weights.total="2.3 GiB" memory.weights.repeating="1.8 GiB" memory.weights.nonrepeating="525.0 MiB" memory.graph.full="721.1 MiB" memory.graph.partial="1.0 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB"
ollama                  | time=2025-04-25T15:57:31.844Z level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
ollama                  | time=2025-04-25T15:57:31.848Z level=WARN source=ggml.go:152 msg="key not found" key=tokenizer.ggml.add_eot_token default=false
ollama                  | time=2025-04-25T15:57:31.857Z level=WARN source=ggml.go:152 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07
ollama                  | time=2025-04-25T15:57:31.858Z level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.local.freq_base default=10000
ollama                  | time=2025-04-25T15:57:31.858Z level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06
ollama                  | time=2025-04-25T15:57:31.858Z level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.freq_scale default=1
ollama                  | time=2025-04-25T15:57:31.858Z level=WARN source=ggml.go:152 msg="key not found" key=gemma3.mm_tokens_per_image default=256
ollama                  | time=2025-04-25T15:57:31.858Z level=INFO source=server.go:405 msg="starting llama server" cmd="/usr/bin/ollama runner --ollama-engine --model /root/.ollama/models/blobs/sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 40000 --batch-size 512 --threads 10 --no-mmap --parallel 4 --port 33995"
ollama                  | time=2025-04-25T15:57:31.861Z level=INFO source=sched.go:451 msg="loaded runners" count=1
ollama                  | time=2025-04-25T15:57:31.861Z level=INFO source=server.go:580 msg="waiting for llama runner to start responding"
ollama                  | time=2025-04-25T15:57:31.862Z level=INFO source=server.go:614 msg="waiting for server to become available" status="llm server error"
ollama                  | time=2025-04-25T15:57:31.978Z level=INFO source=runner.go:866 msg="starting ollama engine"
ollama                  | time=2025-04-25T15:57:31.991Z level=INFO source=runner.go:929 msg="Server listening on 127.0.0.1:33995"
ollama                  | time=2025-04-25T15:57:32.090Z level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
ollama                  | time=2025-04-25T15:57:32.092Z level=WARN source=ggml.go:152 msg="key not found" key=general.name default=""
ollama                  | time=2025-04-25T15:57:32.092Z level=WARN source=ggml.go:152 msg="key not found" key=general.description default=""
ollama                  | time=2025-04-25T15:57:32.092Z level=INFO source=ggml.go:72 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=36
ollama                  | time=2025-04-25T15:57:32.100Z level=INFO source=ggml.go:109 msg=system CPU.0.LLAMAFILE=1 compiler=cgo(gcc)
ollama                  | time=2025-04-25T15:57:32.111Z level=INFO source=ggml.go:298 msg="model weights" buffer=CPU size="3.6 GiB"
ollama                  | time=2025-04-25T15:57:32.125Z level=INFO source=server.go:614 msg="waiting for server to become available" status="llm server loading model"
ollama                  | time=2025-04-25T15:57:34.555Z level=WARN source=ggml.go:152 msg="key not found" key=tokenizer.ggml.add_eot_token default=false
ollama                  | time=2025-04-25T15:57:34.587Z level=WARN source=ggml.go:152 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07
ollama                  | time=2025-04-25T15:57:34.587Z level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.local.freq_base default=10000
ollama                  | time=2025-04-25T15:57:34.587Z level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06
ollama                  | time=2025-04-25T15:57:34.587Z level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.freq_scale default=1
ollama                  | time=2025-04-25T15:57:34.587Z level=WARN source=ggml.go:152 msg="key not found" key=gemma3.mm_tokens_per_image default=256
ollama                  | time=2025-04-25T15:57:38.031Z level=INFO source=ggml.go:556 msg="compute graph" backend=CPU buffer_type=CPU size="634.0 MiB"
ollama                  | time=2025-04-25T15:57:38.186Z level=INFO source=server.go:619 msg="llama runner started in 6.32 seconds"
ollama                  | ggml-alloc.c:819: GGML_ASSERT(talloc->buffer_id >= 0) failed
ollama                  | /usr/bin/ollama(+0x11021a8)[0x5555566561a8]
ollama                  | /usr/bin/ollama(+0x1102526)[0x555556656526]
ollama                  | /usr/bin/ollama(+0x10ef8f5)[0x5555566438f5]
ollama                  | /usr/bin/ollama(+0x10f101b)[0x55555664501b]
ollama                  | /usr/bin/ollama(+0x1116005)[0x55555666a005]
ollama                  | /usr/bin/ollama(+0x111645b)[0x55555666a45b]
ollama                  | /usr/bin/ollama(+0x117071b)[0x5555566c471b]
ollama                  | /usr/bin/ollama(+0x334801)[0x555555888801]
ollama                  | SIGABRT: abort
ollama                  | PC=0x7fffff27f00b m=3 sigcode=18446744073709551610
ollama                  | signal arrived during cgo execution
ollama                  | 
ollama                  | goroutine 10 gp=0xc0006028c0 m=3 mp=0xc000075008 [syscall]:
ollama                  | runtime.cgocall(0x5555566c4700, 0xc00026baf8)
ollama                  |       runtime/cgocall.go:167 +0x4b fp=0xc00026bad0 sp=0xc00026ba98 pc=0x55555587e14b
ollama                  | github.com/ollama/ollama/ml/backend/ggml._Cfunc_ggml_backend_sched_graph_compute_async(0x7ffe54000d40, 0x7fffa80190a0)
ollama                  |       _cgo_gotypes.go:516 +0x4a fp=0xc00026baf8 sp=0xc00026bad0 pc=0x555555c7b6aa
ollama                  | github.com/ollama/ollama/ml/backend/ggml.(*Context).Compute.func1(...)
ollama                  |       github.com/ollama/ollama/ml/backend/ggml/ggml.go:529
ollama                  | github.com/ollama/ollama/ml/backend/ggml.(*Context).Compute(0xc00338e4e0, {0xc002d38740, 0x1, 0x0?})
ollama                  |       github.com/ollama/ollama/ml/backend/ggml/ggml.go:529 +0x96 fp=0xc00026bb88 sp=0xc00026baf8 pc=0x555555c84956
ollama                  | github.com/ollama/ollama/model.Forward({0x555556b840b0, 0xc00338e4e0}, {0x555556b7aa90, 0xc00320c180}, {0xc0030d7800, 0x200, 0x200}, {{0x555556b8cab0, 0xc000283320}, {0x0, ...}, ...})
ollama                  |       github.com/ollama/ollama/model/model.go:313 +0x2b8 fp=0xc00026bc70 sp=0xc00026bb88 pc=0x555555cb27d8
ollama                  | github.com/ollama/ollama/runner/ollamarunner.(*Server).processBatch(0xc000128b40)
ollama                  |       github.com/ollama/ollama/runner/ollamarunner/runner.go:478 +0x476 fp=0xc00026bf98 sp=0xc00026bc70 pc=0x555555d34ab6
ollama                  | github.com/ollama/ollama/runner/ollamarunner.(*Server).run(0xc000128b40, {0x555556b7bdf0, 0xc0002eaa00})
ollama                  |       github.com/ollama/ollama/runner/ollamarunner/runner.go:364 +0x4e fp=0xc00026bfb8 sp=0xc00026bf98 pc=0x555555d345ee
ollama                  | github.com/ollama/ollama/runner/ollamarunner.Execute.gowrap2()
ollama                  |       github.com/ollama/ollama/runner/ollamarunner/runner.go:906 +0x28 fp=0xc00026bfe0 sp=0xc00026bfb8 pc=0x555555d390e8
ollama                  | runtime.goexit({})
ollama                  |       runtime/asm_amd64.s:1700 +0x1 fp=0xc00026bfe8 sp=0xc00026bfe0 pc=0x555555888b81
ollama                  | created by github.com/ollama/ollama/runner/ollamarunner.Execute in goroutine 1
ollama                  |       github.com/ollama/ollama/runner/ollamarunner/runner.go:906 +0xb37
ollama                  | 
ollama                  | goroutine 1 gp=0xc000002380 m=nil [IO wait]:
ollama                  | runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
ollama                  |       runtime/proc.go:435 +0xce fp=0xc00026d628 sp=0xc00026d608 pc=0x55555588144e
ollama                  | runtime.netpollblock(0xc00026d678?, 0x5581ac06?, 0x55?)
ollama                  |       runtime/netpoll.go:575 +0xf7 fp=0xc00026d660 sp=0xc00026d628 pc=0x555555846237
ollama                  | internal/poll.runtime_pollWait(0x7fffb7f6deb0, 0x72)
ollama                  |       runtime/netpoll.go:351 +0x85 fp=0xc00026d680 sp=0xc00026d660 pc=0x555555880665
ollama                  | internal/poll.(*pollDesc).wait(0xc0002aa980?, 0x900000036?, 0x0)
ollama                  |       internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc00026d6a8 sp=0xc00026d680 pc=0x5555559079c7
ollama                  | internal/poll.(*pollDesc).waitRead(...)
ollama                  |       internal/poll/fd_poll_runtime.go:89
ollama                  | internal/poll.(*FD).Accept(0xc0002aa980)
ollama                  |       internal/poll/fd_unix.go:620 +0x295 fp=0xc00026d750 sp=0xc00026d6a8 pc=0x55555590cd95
ollama                  | net.(*netFD).accept(0xc0002aa980)
ollama                  |       net/fd_unix.go:172 +0x29 fp=0xc00026d808 sp=0xc00026d750 pc=0x55555597fba9
ollama                  | net.(*TCPListener).accept(0xc000122580)
ollama                  |       net/tcpsock_posix.go:159 +0x1b fp=0xc00026d858 sp=0xc00026d808 pc=0x55555599555b
ollama                  | net.(*TCPListener).Accept(0xc000122580)
ollama                  |       net/tcpsock.go:380 +0x30 fp=0xc00026d888 sp=0xc00026d858 pc=0x555555994410
ollama                  | net/http.(*onceCloseListener).Accept(0xc0000f0630?)
ollama                  |       <autogenerated>:1 +0x24 fp=0xc00026d8a0 sp=0xc00026d888 pc=0x555555baba44
ollama                  | net/http.(*Server).Serve(0xc000288f00, {0x555556b79af8, 0xc000122580})
ollama                  |       net/http/server.go:3424 +0x30c fp=0xc00026d9d0 sp=0xc00026d8a0 pc=0x555555b8330c
ollama                  | github.com/ollama/ollama/runner/ollamarunner.Execute({0xc000034130, 0xd, 0xd})
ollama                  |       github.com/ollama/ollama/runner/ollamarunner/runner.go:930 +0xec9 fp=0xc00026dd08 sp=0xc00026d9d0 pc=0x555555d38e49
ollama                  | github.com/ollama/ollama/runner.Execute({0xc000034110?, 0x0?, 0x0?})
ollama                  |       github.com/ollama/ollama/runner/runner.go:20 +0xc9 fp=0xc00026dd30 sp=0xc00026dd08 pc=0x555555d39ac9
ollama                  | github.com/ollama/ollama/cmd.NewCLI.func2(0xc000289300?, {0x5555566e0055?, 0x4?, 0x5555566e0059?})
ollama                  |       github.com/ollama/ollama/cmd/cmd.go:1365 +0x45 fp=0xc00026dd58 sp=0xc00026dd30 pc=0x555556488be5
ollama                  | github.com/spf13/cobra.(*Command).execute(0xc000114f08, {0xc000302380, 0xe, 0xe})
ollama                  |       github.com/spf13/cobra@v1.7.0/command.go:940 +0x85c fp=0xc00026de78 sp=0xc00026dd58 pc=0x5555559f91fc
ollama                  | github.com/spf13/cobra.(*Command).ExecuteC(0xc0001d4908)
ollama                  |       github.com/spf13/cobra@v1.7.0/command.go:1068 +0x3a5 fp=0xc00026df30 sp=0xc00026de78 pc=0x5555559f9a45
ollama                  | github.com/spf13/cobra.(*Command).Execute(...)
ollama                  |       github.com/spf13/cobra@v1.7.0/command.go:992
ollama                  | github.com/spf13/cobra.(*Command).ExecuteContext(...)
ollama                  |       github.com/spf13/cobra@v1.7.0/command.go:985
ollama                  | main.main()
ollama                  |       github.com/ollama/ollama/main.go:12 +0x4d fp=0xc00026df50 sp=0xc00026df30 pc=0x555556488f4d
ollama                  | runtime.main()
ollama                  |       runtime/proc.go:283 +0x29d fp=0xc00026dfe0 sp=0xc00026df50 pc=0x55555584d83d
ollama                  | runtime.goexit({})
ollama                  |       runtime/asm_amd64.s:1700 +0x1 fp=0xc00026dfe8 sp=0xc00026dfe0 pc=0x555555888b81
ollama                  | 
ollama                  | goroutine 2 gp=0xc000002e00 m=nil [force gc (idle)]:
ollama                  | runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
ollama                  |       runtime/proc.go:435 +0xce fp=0xc00006efa8 sp=0xc00006ef88 pc=0x55555588144e
ollama                  | runtime.goparkunlock(...)
ollama                  |       runtime/proc.go:441
ollama                  | runtime.forcegchelper()
ollama                  |       runtime/proc.go:348 +0xb8 fp=0xc00006efe0 sp=0xc00006efa8 pc=0x55555584db78
ollama                  | runtime.goexit({})
ollama                  |       runtime/asm_amd64.s:1700 +0x1 fp=0xc00006efe8 sp=0xc00006efe0 pc=0x555555888b81
ollama                  | created by runtime.init.7 in goroutine 1
ollama                  |       runtime/proc.go:336 +0x1a
ollama                  | 
ollama                  | goroutine 3 gp=0xc000003340 m=nil [GC sweep wait]:
ollama                  | runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?)
ollama                  |       runtime/proc.go:435 +0xce fp=0xc00006f780 sp=0xc00006f760 pc=0x55555588144e
ollama                  | runtime.goparkunlock(...)
ollama                  |       runtime/proc.go:441
ollama                  | runtime.bgsweep(0xc00009a000)
ollama                  |       runtime/mgcsweep.go:316 +0xdf fp=0xc00006f7c8 sp=0xc00006f780 pc=0x55555583823f
ollama                  | runtime.gcenable.gowrap1()
ollama                  |       runtime/mgc.go:204 +0x25 fp=0xc00006f7e0 sp=0xc00006f7c8 pc=0x55555582c625
ollama                  | runtime.goexit({})
ollama                  |       runtime/asm_amd64.s:1700 +0x1 fp=0xc00006f7e8 sp=0xc00006f7e0 pc=0x555555888b81
ollama                  | created by runtime.gcenable in goroutine 1
ollama                  |       runtime/mgc.go:204 +0x66
ollama                  | 
ollama                  | goroutine 4 gp=0xc000003500 m=nil [GC scavenge wait]:
ollama                  | runtime.gopark(0x10000?, 0x5555568978c8?, 0x0?, 0x0?, 0x0?)
ollama                  |       runtime/proc.go:435 +0xce fp=0xc00006ff78 sp=0xc00006ff58 pc=0x55555588144e
ollama                  | runtime.goparkunlock(...)
ollama                  |       runtime/proc.go:441
ollama                  | runtime.(*scavengerState).park(0x5555573e2300)
ollama                  |       runtime/mgcscavenge.go:425 +0x49 fp=0xc00006ffa8 sp=0xc00006ff78 pc=0x555555835c89
ollama                  | runtime.bgscavenge(0xc00009a000)
ollama                  |       runtime/mgcscavenge.go:658 +0x59 fp=0xc00006ffc8 sp=0xc00006ffa8 pc=0x555555836219
ollama                  | runtime.gcenable.gowrap2()
ollama                  |       runtime/mgc.go:205 +0x25 fp=0xc00006ffe0 sp=0xc00006ffc8 pc=0x55555582c5c5
ollama                  | runtime.goexit({})
ollama                  |       runtime/asm_amd64.s:1700 +0x1 fp=0xc00006ffe8 sp=0xc00006ffe0 pc=0x555555888b81
ollama                  | created by runtime.gcenable in goroutine 1
ollama                  |       runtime/mgc.go:205 +0xa5
ollama                  | 
ollama                  | goroutine 5 gp=0xc000003dc0 m=nil [finalizer wait]:
ollama                  | runtime.gopark(0x1b8?, 0x55555584fde9?, 0x1?, 0x23?, 0xc00006e688?)
ollama                  |       runtime/proc.go:435 +0xce fp=0xc00006e630 sp=0xc00006e610 pc=0x55555588144e
ollama                  | runtime.runfinq()
ollama                  |       runtime/mfinal.go:196 +0x107 fp=0xc00006e7e0 sp=0xc00006e630 pc=0x55555582b5e7
ollama                  | runtime.goexit({})
ollama                  |       runtime/asm_amd64.s:1700 +0x1 fp=0xc00006e7e8 sp=0xc00006e7e0 pc=0x555555888b81
ollama                  | created by runtime.createfing in goroutine 1
ollama                  |       runtime/mfinal.go:166 +0x3d
ollama                  | 
ollama                  | goroutine 6 gp=0xc0003b01c0 m=nil [chan receive]:
ollama                  | runtime.gopark(0xc0000bb2c0?, 0xc0035c16f8?, 0x60?, 0x7?, 0x5555559668e8?)
ollama                  |       runtime/proc.go:435 +0xce fp=0xc000070718 sp=0xc0000706f8 pc=0x55555588144e
ollama                  | runtime.chanrecv(0xc0000a82a0, 0x0, 0x1)
ollama                  |       runtime/chan.go:664 +0x445 fp=0xc000070790 sp=0xc000070718 pc=0x55555581d7e5
ollama                  | runtime.chanrecv1(0x0?, 0x0?)
ollama                  |       runtime/chan.go:506 +0x12 fp=0xc0000707b8 sp=0xc000070790 pc=0x55555581d372
ollama                  | runtime.unique_runtime_registerUniqueMapCleanup.func2(...)
ollama                  |       runtime/mgc.go:1796
ollama                  | runtime.unique_runtime_registerUniqueMapCleanup.gowrap1()
ollama                  |       runtime/mgc.go:1799 +0x2f fp=0xc0000707e0 sp=0xc0000707b8 pc=0x55555582f7cf
ollama                  | runtime.goexit({})
ollama                  |       runtime/asm_amd64.s:1700 +0x1 fp=0xc0000707e8 sp=0xc0000707e0 pc=0x555555888b81
ollama                  | created by unique.runtime_registerUniqueMapCleanup in goroutine 1
ollama                  |       runtime/mgc.go:1794 +0x85
ollama                  | 
ollama                  | goroutine 7 gp=0xc0003b0380 m=nil [GC worker (idle)]:
ollama                  | runtime.gopark(0xc398a91bbfe?, 0x0?, 0x0?, 0x0?, 0x0?)
ollama                  |       runtime/proc.go:435 +0xce fp=0xc000070f38 sp=0xc000070f18 pc=0x55555588144e
ollama                  | runtime.gcBgMarkWorker(0xc0000a9500)
ollama                  |       runtime/mgc.go:1423 +0xe9 fp=0xc000070fc8 sp=0xc000070f38 pc=0x55555582eae9
ollama                  | runtime.gcBgMarkStartWorkers.gowrap1()
ollama                  |       runtime/mgc.go:1339 +0x25 fp=0xc000070fe0 sp=0xc000070fc8 pc=0x55555582e9c5
ollama                  | runtime.goexit({})
ollama                  |       runtime/asm_amd64.s:1700 +0x1 fp=0xc000070fe8 sp=0xc000070fe0 pc=0x555555888b81
ollama                  | created by runtime.gcBgMarkStartWorkers in goroutine 1
ollama                  |       runtime/mgc.go:1339 +0x105
ollama                  | 
ollama                  | goroutine 18 gp=0xc0001828c0 m=nil [GC worker (idle)]:
ollama                  | runtime.gopark(0xc398a838d22?, 0x3?, 0x6d?, 0xd?, 0x0?)
ollama                  |       runtime/proc.go:435 +0xce fp=0xc00006a738 sp=0xc00006a718 pc=0x55555588144e
ollama                  | runtime.gcBgMarkWorker(0xc0000a9500)
ollama                  |       runtime/mgc.go:1423 +0xe9 fp=0xc00006a7c8 sp=0xc00006a738 pc=0x55555582eae9
ollama                  | runtime.gcBgMarkStartWorkers.gowrap1()
ollama                  |       runtime/mgc.go:1339 +0x25 fp=0xc00006a7e0 sp=0xc00006a7c8 pc=0x55555582e9c5
ollama                  | runtime.goexit({})
ollama                  |       runtime/asm_amd64.s:1700 +0x1 fp=0xc00006a7e8 sp=0xc00006a7e0 pc=0x555555888b81
ollama                  | created by runtime.gcBgMarkStartWorkers in goroutine 1
ollama                  |       runtime/mgc.go:1339 +0x105
ollama                  | 
ollama                  | goroutine 19 gp=0xc000182a80 m=nil [GC worker (idle)]:
ollama                  | runtime.gopark(0xc398a9a7441?, 0x3?, 0xbf?, 0xb3?, 0x0?)
ollama                  |       runtime/proc.go:435 +0xce fp=0xc00006af38 sp=0xc00006af18 pc=0x55555588144e
ollama                  | runtime.gcBgMarkWorker(0xc0000a9500)
ollama                  |       runtime/mgc.go:1423 +0xe9 fp=0xc00006afc8 sp=0xc00006af38 pc=0x55555582eae9
ollama                  | runtime.gcBgMarkStartWorkers.gowrap1()
ollama                  |       runtime/mgc.go:1339 +0x25 fp=0xc00006afe0 sp=0xc00006afc8 pc=0x55555582e9c5
ollama                  | runtime.goexit({})
ollama                  |       runtime/asm_amd64.s:1700 +0x1 fp=0xc00006afe8 sp=0xc00006afe0 pc=0x555555888b81
ollama                  | created by runtime.gcBgMarkStartWorkers in goroutine 1
ollama                  |       runtime/mgc.go:1339 +0x105
ollama                  | 
ollama                  | goroutine 20 gp=0xc000182c40 m=nil [GC worker (idle)]:
ollama                  | runtime.gopark(0x555557490b40?, 0x3?, 0x35?, 0x61?, 0x0?)
ollama                  |       runtime/proc.go:435 +0xce fp=0xc00006b738 sp=0xc00006b718 pc=0x55555588144e
ollama                  | runtime.gcBgMarkWorker(0xc0000a9500)
ollama                  |       runtime/mgc.go:1423 +0xe9 fp=0xc00006b7c8 sp=0xc00006b738 pc=0x55555582eae9
ollama                  | runtime.gcBgMarkStartWorkers.gowrap1()
ollama                  |       runtime/mgc.go:1339 +0x25 fp=0xc00006b7e0 sp=0xc00006b7c8 pc=0x55555582e9c5
ollama                  | runtime.goexit({})
ollama                  |       runtime/asm_amd64.s:1700 +0x1 fp=0xc00006b7e8 sp=0xc00006b7e0 pc=0x555555888b81
ollama                  | created by runtime.gcBgMarkStartWorkers in goroutine 1
ollama                  |       runtime/mgc.go:1339 +0x105
ollama                  | 
ollama                  | goroutine 21 gp=0xc000182e00 m=nil [GC worker (idle)]:
ollama                  | runtime.gopark(0xc398aadbf61?, 0x1?, 0x5a?, 0x50?, 0x0?)
ollama                  |       runtime/proc.go:435 +0xce fp=0xc00006bf38 sp=0xc00006bf18 pc=0x55555588144e
ollama                  | runtime.gcBgMarkWorker(0xc0000a9500)
ollama                  |       runtime/mgc.go:1423 +0xe9 fp=0xc00006bfc8 sp=0xc00006bf38 pc=0x55555582eae9
ollama                  | runtime.gcBgMarkStartWorkers.gowrap1()
ollama                  |       runtime/mgc.go:1339 +0x25 fp=0xc00006bfe0 sp=0xc00006bfc8 pc=0x55555582e9c5
ollama                  | runtime.goexit({})
ollama                  |       runtime/asm_amd64.s:1700 +0x1 fp=0xc00006bfe8 sp=0xc00006bfe0 pc=0x555555888b81
ollama                  | created by runtime.gcBgMarkStartWorkers in goroutine 1
ollama                  |       runtime/mgc.go:1339 +0x105
ollama                  | 
ollama                  | goroutine 22 gp=0xc000182fc0 m=nil [GC worker (idle)]:
ollama                  | runtime.gopark(0xc398a839064?, 0x3?, 0x2c?, 0x45?, 0x0?)
ollama                  |       runtime/proc.go:435 +0xce fp=0xc00006c738 sp=0xc00006c718 pc=0x55555588144e
ollama                  | runtime.gcBgMarkWorker(0xc0000a9500)
ollama                  |       runtime/mgc.go:1423 +0xe9 fp=0xc00006c7c8 sp=0xc00006c738 pc=0x55555582eae9
ollama                  | runtime.gcBgMarkStartWorkers.gowrap1()
ollama                  |       runtime/mgc.go:1339 +0x25 fp=0xc00006c7e0 sp=0xc00006c7c8 pc=0x55555582e9c5
ollama                  | runtime.goexit({})
ollama                  |       runtime/asm_amd64.s:1700 +0x1 fp=0xc00006c7e8 sp=0xc00006c7e0 pc=0x555555888b81
ollama                  | created by runtime.gcBgMarkStartWorkers in goroutine 1
ollama                  |       runtime/mgc.go:1339 +0x105
ollama                  | 
ollama                  | goroutine 34 gp=0xc00031e1c0 m=nil [GC worker (idle)]:
ollama                  | runtime.gopark(0xc398a8bf83f?, 0x3?, 0x73?, 0x87?, 0x0?)
ollama                  |       runtime/proc.go:435 +0xce fp=0xc00032a738 sp=0xc00032a718 pc=0x55555588144e
ollama                  | runtime.gcBgMarkWorker(0xc0000a9500)
ollama                  |       runtime/mgc.go:1423 +0xe9 fp=0xc00032a7c8 sp=0xc00032a738 pc=0x55555582eae9
ollama                  | runtime.gcBgMarkStartWorkers.gowrap1()
ollama                  |       runtime/mgc.go:1339 +0x25 fp=0xc00032a7e0 sp=0xc00032a7c8 pc=0x55555582e9c5
ollama                  | runtime.goexit({})
ollama                  |       runtime/asm_amd64.s:1700 +0x1 fp=0xc00032a7e8 sp=0xc00032a7e0 pc=0x555555888b81
ollama                  | created by runtime.gcBgMarkStartWorkers in goroutine 1
ollama                  |       runtime/mgc.go:1339 +0x105
ollama                  | 
ollama                  | goroutine 8 gp=0xc0003b0540 m=nil [GC worker (idle)]:
ollama                  | runtime.gopark(0xc398a8025b7?, 0x3?, 0x35?, 0xc?, 0x0?)
ollama                  |       runtime/proc.go:435 +0xce fp=0xc000071738 sp=0xc000071718 pc=0x55555588144e
ollama                  | runtime.gcBgMarkWorker(0xc0000a9500)
ollama                  |       runtime/mgc.go:1423 +0xe9 fp=0xc0000717c8 sp=0xc000071738 pc=0x55555582eae9
ollama                  | runtime.gcBgMarkStartWorkers.gowrap1()
ollama                  |       runtime/mgc.go:1339 +0x25 fp=0xc0000717e0 sp=0xc0000717c8 pc=0x55555582e9c5
ollama                  | runtime.goexit({})
ollama                  |       runtime/asm_amd64.s:1700 +0x1 fp=0xc0000717e8 sp=0xc0000717e0 pc=0x555555888b81
ollama                  | created by runtime.gcBgMarkStartWorkers in goroutine 1
ollama                  |       runtime/mgc.go:1339 +0x105
ollama                  | 
ollama                  | goroutine 23 gp=0xc000183180 m=nil [GC worker (idle)]:
ollama                  | runtime.gopark(0x555557490b40?, 0x3?, 0x28?, 0xf3?, 0x0?)
ollama                  |       runtime/proc.go:435 +0xce fp=0xc00006cf38 sp=0xc00006cf18 pc=0x55555588144e
ollama                  | runtime.gcBgMarkWorker(0xc0000a9500)
ollama                  |       runtime/mgc.go:1423 +0xe9 fp=0xc00006cfc8 sp=0xc00006cf38 pc=0x55555582eae9
ollama                  | runtime.gcBgMarkStartWorkers.gowrap1()
ollama                  |       runtime/mgc.go:1339 +0x25 fp=0xc00006cfe0 sp=0xc00006cfc8 pc=0x55555582e9c5
ollama                  | runtime.goexit({})
ollama                  |       runtime/asm_amd64.s:1700 +0x1 fp=0xc00006cfe8 sp=0xc00006cfe0 pc=0x555555888b81
ollama                  | created by runtime.gcBgMarkStartWorkers in goroutine 1
ollama                  |       runtime/mgc.go:1339 +0x105
ollama                  | 
ollama                  | goroutine 35 gp=0xc00031e380 m=nil [GC worker (idle)]:
ollama                  | runtime.gopark(0xc398a9c4596?, 0x1?, 0xa0?, 0xf?, 0x0?)
ollama                  |       runtime/proc.go:435 +0xce fp=0xc00032af38 sp=0xc00032af18 pc=0x55555588144e
ollama                  | runtime.gcBgMarkWorker(0xc0000a9500)
ollama                  |       runtime/mgc.go:1423 +0xe9 fp=0xc00032afc8 sp=0xc00032af38 pc=0x55555582eae9
ollama                  | runtime.gcBgMarkStartWorkers.gowrap1()
ollama                  |       runtime/mgc.go:1339 +0x25 fp=0xc00032afe0 sp=0xc00032afc8 pc=0x55555582e9c5
ollama                  | runtime.goexit({})
ollama                  |       runtime/asm_amd64.s:1700 +0x1 fp=0xc00032afe8 sp=0xc00032afe0 pc=0x555555888b81
ollama                  | created by runtime.gcBgMarkStartWorkers in goroutine 1
ollama                  |       runtime/mgc.go:1339 +0x105
ollama                  | 
ollama                  | goroutine 886 gp=0xc000102e00 m=nil [select]:
ollama                  | runtime.gopark(0xc000045a28?, 0x2?, 0x0?, 0xe4?, 0xc000045894?)
ollama                  |       runtime/proc.go:435 +0xce fp=0xc0000456a8 sp=0xc000045688 pc=0x55555588144e
ollama                  | runtime.selectgo(0xc000045a28, 0xc000045890, 0x4b4?, 0x0, 0x4?, 0x1)
ollama                  |       runtime/select.go:351 +0x837 fp=0xc0000457e0 sp=0xc0000456a8 pc=0x55555585fd37
ollama                  | github.com/ollama/ollama/runner/ollamarunner.(*Server).completion(0xc000128b40, {0x555556b79cd8, 0xc00021e700}, 0xc00014ec80)
ollama                  |       github.com/ollama/ollama/runner/ollamarunner/runner.go:677 +0xb05 fp=0xc000045ac0 sp=0xc0000457e0 pc=0x555555d36dc5
ollama                  | github.com/ollama/ollama/runner/ollamarunner.(*Server).completion-fm({0x555556b79cd8?, 0xc00021e700?}, 0xc000271b40?)
ollama                  |       <autogenerated>:1 +0x36 fp=0xc000045af0 sp=0xc000045ac0 pc=0x555555d39936
ollama                  | net/http.HandlerFunc.ServeHTTP(0xc00019c240?, {0x555556b79cd8?, 0xc00021e700?}, 0xc000271b60?)
ollama                  |       net/http/server.go:2294 +0x29 fp=0xc000045b18 sp=0xc000045af0 pc=0x555555b7f949
ollama                  | net/http.(*ServeMux).ServeHTTP(0x555555825b05?, {0x555556b79cd8, 0xc00021e700}, 0xc00014ec80)
ollama                  |       net/http/server.go:2822 +0x1c4 fp=0xc000045b68 sp=0xc000045b18 pc=0x555555b81844
ollama                  | net/http.serverHandler.ServeHTTP({0x555556b763b0?}, {0x555556b79cd8?, 0xc00021e700?}, 0x1?)
ollama                  |       net/http/server.go:3301 +0x8e fp=0xc000045b98 sp=0xc000045b68 pc=0x555555b9f2ce
ollama                  | net/http.(*conn).serve(0xc0000f0630, {0x555556b7bdb8, 0xc00021d5f0})
ollama                  |       net/http/server.go:2102 +0x625 fp=0xc000045fb8 sp=0xc000045b98 pc=0x555555b7de45
ollama                  | net/http.(*Server).Serve.gowrap3()
ollama                  |       net/http/server.go:3454 +0x28 fp=0xc000045fe0 sp=0xc000045fb8 pc=0x555555b83708
ollama                  | runtime.goexit({})
ollama                  |       runtime/asm_amd64.s:1700 +0x1 fp=0xc000045fe8 sp=0xc000045fe0 pc=0x555555888b81
ollama                  | created by net/http.(*Server).Serve in goroutine 1
ollama                  |       net/http/server.go:3454 +0x485
ollama                  | 
ollama                  | goroutine 938 gp=0xc0003b1180 m=nil [IO wait]:
ollama                  | runtime.gopark(0x1b0000?, 0x31332e6b6c620000?, 0x2e?, 0x70?, 0xb?)
ollama                  |       runtime/proc.go:435 +0xce fp=0xc000332dd8 sp=0xc000332db8 pc=0x55555588144e
ollama                  | runtime.netpollblock(0x5555558a47b8?, 0x5581ac06?, 0x55?)
ollama                  |       runtime/netpoll.go:575 +0xf7 fp=0xc000332e10 sp=0xc000332dd8 pc=0x555555846237
ollama                  | internal/poll.runtime_pollWait(0x7fffb7f6dd98, 0x72)
ollama                  |       runtime/netpoll.go:351 +0x85 fp=0xc000332e30 sp=0xc000332e10 pc=0x555555880665
ollama                  | internal/poll.(*pollDesc).wait(0xc0027a2180?, 0xc0001249a1?, 0x0)
ollama                  |       internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc000332e58 sp=0xc000332e30 pc=0x5555559079c7
ollama                  | internal/poll.(*pollDesc).waitRead(...)
ollama                  |       internal/poll/fd_poll_runtime.go:89
ollama                  | internal/poll.(*FD).Read(0xc0027a2180, {0xc0001249a1, 0x1, 0x1})
ollama                  |       internal/poll/fd_unix.go:165 +0x27a fp=0xc000332ef0 sp=0xc000332e58 pc=0x555555908cba
ollama                  | net.(*netFD).Read(0xc0027a2180, {0xc0001249a1?, 0xc0001225d8?, 0xc000332f70?})
ollama                  |       net/fd_posix.go:55 +0x25 fp=0xc000332f38 sp=0xc000332ef0 pc=0x55555597dc05
ollama                  | net.(*conn).Read(0xc003110030, {0xc0001249a1?, 0x2746867?, 0xa?})
ollama                  |       net/net.go:194 +0x45 fp=0xc000332f80 sp=0xc000332f38 pc=0x55555598bfc5
ollama                  | net/http.(*connReader).backgroundRead(0xc000124990)
ollama                  |       net/http/server.go:690 +0x37 fp=0xc000332fc8 sp=0xc000332f80 pc=0x555555b77d17
ollama                  | net/http.(*connReader).startBackgroundRead.gowrap2()
ollama                  |       net/http/server.go:686 +0x25 fp=0xc000332fe0 sp=0xc000332fc8 pc=0x555555b77c45
ollama                  | runtime.goexit({})
ollama                  |       runtime/asm_amd64.s:1700 +0x1 fp=0xc000332fe8 sp=0xc000332fe0 pc=0x555555888b81
ollama                  | created by net/http.(*connReader).startBackgroundRead in goroutine 886
ollama                  |       net/http/server.go:686 +0xb6
ollama                  | 
ollama                  | rax    0x0
ollama                  | rbx    0x7fffb7f65700
ollama                  | rcx    0x2c
ollama                  | rdx    0x0
ollama                  | rdi    0x2
ollama                  | rsi    0x7fffb7f64930
ollama                  | rbp    0x5555568b535d
ollama                  | rsp    0x7fffb7f64930
ollama                  | r8     0x0
ollama                  | r9     0x7fffb7f64930
ollama                  | r10    0x8
ollama                  | r11    0x7fffff25e72e
ollama                  | r12    0x5555568db0ab
ollama                  | r13    0x333
ollama                  | r14    0x5cf
ollama                  | r15    0x7ffe5402a2a0
ollama                  | rip    0x7fffff27f00b
ollama                  | rflags 0x246
ollama                  | cs     0x33
ollama                  | fs     0x0
ollama                  | gs     0x0
ollama                  | time=2025-04-25T15:57:38.645Z level=ERROR source=server.go:449 msg="llama runner terminated" error="exit status 2"
ollama                  | [GIN] 2025/04/25 - 15:57:38 | 200 |  7.225590128s |      172.22.0.2 | POST     "/api/chat"

OS

Linux

GPU

No response

CPU

No response

Ollama version

0.6.6

Originally created by @Loremaster on GitHub (Apr 25, 2025). Original GitHub issue: https://github.com/ollama/ollama/issues/10411 ### What is the issue? Hey, I am using ollama together with gemma 3. Locally on my mac it works well however when I try to use ollama inside docker using linuex (together with longchain) it immediately raises an error `GGML_ASSERT(talloc->buffer_id >= 0) failed` My code goes something along this: ```python llm = ChatOllama(model='gemma3', keep_alive=0, max_tokens=512, temperature=0.3) response = llm.invoke(prompt) ``` Will appreciate any help with that. Thanks! ### Relevant log output ```shell ollama | time=2025-04-25T15:57:29.408Z level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 ollama | time=2025-04-25T15:57:29.410Z level=WARN source=types.go:570 msg="invalid option provided" option=tfs_z ollama | time=2025-04-25T15:57:29.417Z level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 ollama | time=2025-04-25T15:57:29.423Z level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 ollama | time=2025-04-25T15:57:29.424Z level=INFO source=server.go:105 msg="system memory" total="5.8 GiB" free="2.6 GiB" free_swap="3.3 GiB" ollama | time=2025-04-25T15:57:29.424Z level=WARN source=ggml.go:152 msg="key not found" key=nomic-bert.vision.block_count default=0 ollama | time=2025-04-25T15:57:29.424Z level=WARN source=ggml.go:152 msg="key not found" key=nomic-bert.attention.head_count_kv default=1 ollama | time=2025-04-25T15:57:29.425Z level=WARN source=ggml.go:152 msg="key not found" key=nomic-bert.attention.key_length default=64 ollama | time=2025-04-25T15:57:29.425Z level=WARN source=ggml.go:152 msg="key not found" key=nomic-bert.attention.value_length default=64 ollama | time=2025-04-25T15:57:29.425Z level=WARN source=ggml.go:152 msg="key not found" key=nomic-bert.attention.head_count_kv default=1 ollama | time=2025-04-25T15:57:29.426Z level=INFO 
source=server.go:138 msg=offload library=cpu layers.requested=-1 layers.model=13 layers.offload=0 layers.split="" memory.available="[2.6 GiB]" memory.gpu_overhead="0 B" memory.required.full="352.9 MiB" memory.required.partial="0 B" memory.required.kv="24.0 MiB" memory.required.allocations="[352.9 MiB]" memory.weights.total="260.9 MiB" memory.weights.repeating="216.1 MiB" memory.weights.nonrepeating="44.7 MiB" memory.graph.full="48.0 MiB" memory.graph.partial="48.0 MiB" ollama | llama_model_loader: loaded meta data with 24 key-value pairs and 112 tensors from /root/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 (version GGUF V3 (latest)) ollama | llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output. ollama | llama_model_loader: - kv 0: general.architecture str = nomic-bert ollama | llama_model_loader: - kv 1: general.name str = nomic-embed-text-v1.5 ollama | llama_model_loader: - kv 2: nomic-bert.block_count u32 = 12 ollama | llama_model_loader: - kv 3: nomic-bert.context_length u32 = 2048 ollama | llama_model_loader: - kv 4: nomic-bert.embedding_length u32 = 768 ollama | llama_model_loader: - kv 5: nomic-bert.feed_forward_length u32 = 3072 ollama | llama_model_loader: - kv 6: nomic-bert.attention.head_count u32 = 12 ollama | llama_model_loader: - kv 7: nomic-bert.attention.layer_norm_epsilon f32 = 0.000000 ollama | llama_model_loader: - kv 8: general.file_type u32 = 1 ollama | llama_model_loader: - kv 9: nomic-bert.attention.causal bool = false ollama | llama_model_loader: - kv 10: nomic-bert.pooling_type u32 = 1 ollama | llama_model_loader: - kv 11: nomic-bert.rope.freq_base f32 = 1000.000000 ollama | llama_model_loader: - kv 12: tokenizer.ggml.token_type_count u32 = 2 ollama | llama_model_loader: - kv 13: tokenizer.ggml.bos_token_id u32 = 101 ollama | llama_model_loader: - kv 14: tokenizer.ggml.eos_token_id u32 = 102 ollama | llama_model_loader: - kv 15: tokenizer.ggml.model 
str = bert ollama | llama_model_loader: - kv 16: tokenizer.ggml.tokens arr[str,30522] = ["[PAD]", "[unused0]", "[unused1]", "... ollama | llama_model_loader: - kv 17: tokenizer.ggml.scores arr[f32,30522] = [-1000.000000, -1000.000000, -1000.00... ollama | llama_model_loader: - kv 18: tokenizer.ggml.token_type arr[i32,30522] = [3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... ollama | llama_model_loader: - kv 19: tokenizer.ggml.unknown_token_id u32 = 100 ollama | llama_model_loader: - kv 20: tokenizer.ggml.seperator_token_id u32 = 102 ollama | llama_model_loader: - kv 21: tokenizer.ggml.padding_token_id u32 = 0 ollama | llama_model_loader: - kv 22: tokenizer.ggml.cls_token_id u32 = 101 ollama | llama_model_loader: - kv 23: tokenizer.ggml.mask_token_id u32 = 103 ollama | llama_model_loader: - type f32: 51 tensors ollama | llama_model_loader: - type f16: 61 tensors ollama | print_info: file format = GGUF V3 (latest) ollama | print_info: file type = F16 ollama | print_info: file size = 260.86 MiB (16.00 BPW) ollama | load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect ollama | load: special tokens cache size = 5 ollama | load: token to piece cache size = 0.2032 MB ollama | print_info: arch = nomic-bert ollama | print_info: vocab_only = 1 ollama | print_info: model type = ?B ollama | print_info: model params = 136.73 M ollama | print_info: general.name = nomic-embed-text-v1.5 ollama | print_info: vocab type = WPM ollama | print_info: n_vocab = 30522 ollama | print_info: n_merges = 0 ollama | print_info: BOS token = 101 '[CLS]' ollama | print_info: EOS token = 102 '[SEP]' ollama | print_info: UNK token = 100 '[UNK]' ollama | print_info: SEP token = 102 '[SEP]' ollama | print_info: PAD token = 0 '[PAD]' ollama | print_info: MASK token = 103 '[MASK]' ollama | print_info: LF token = 0 '[PAD]' ollama | print_info: EOG token = 102 '[SEP]' ollama | print_info: max token length = 21 ollama | llama_model_load: vocab only - skipping tensors ollama | 
time=2025-04-25T15:57:29.506Z level=INFO source=server.go:405 msg="starting llama server" cmd="/usr/bin/ollama runner --model /root/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 --ctx-size 8192 --batch-size 512 --threads 10 --no-mmap --parallel 1 --port 36271" ollama | time=2025-04-25T15:57:29.511Z level=INFO source=sched.go:451 msg="loaded runners" count=1 ollama | time=2025-04-25T15:57:29.512Z level=INFO source=server.go:580 msg="waiting for llama runner to start responding" ollama | time=2025-04-25T15:57:29.519Z level=INFO source=server.go:614 msg="waiting for server to become available" status="llm server error" ollama | time=2025-04-25T15:57:29.664Z level=INFO source=runner.go:853 msg="starting go runner" ollama | time=2025-04-25T15:57:29.685Z level=INFO source=ggml.go:109 msg=system CPU.0.LLAMAFILE=1 compiler=cgo(gcc) ollama | time=2025-04-25T15:57:29.698Z level=INFO source=runner.go:913 msg="Server listening on 127.0.0.1:36271" ollama | llama_model_loader: loaded meta data with 24 key-value pairs and 112 tensors from /root/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 (version GGUF V3 (latest)) ollama | llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output. 
ollama | llama_model_loader: - kv 0: general.architecture str = nomic-bert ollama | llama_model_loader: - kv 1: general.name str = nomic-embed-text-v1.5 ollama | llama_model_loader: - kv 2: nomic-bert.block_count u32 = 12 ollama | llama_model_loader: - kv 3: nomic-bert.context_length u32 = 2048 ollama | llama_model_loader: - kv 4: nomic-bert.embedding_length u32 = 768 ollama | llama_model_loader: - kv 5: nomic-bert.feed_forward_length u32 = 3072 ollama | llama_model_loader: - kv 6: nomic-bert.attention.head_count u32 = 12 ollama | llama_model_loader: - kv 7: nomic-bert.attention.layer_norm_epsilon f32 = 0.000000 ollama | llama_model_loader: - kv 8: general.file_type u32 = 1 ollama | llama_model_loader: - kv 9: nomic-bert.attention.causal bool = false ollama | llama_model_loader: - kv 10: nomic-bert.pooling_type u32 = 1 ollama | llama_model_loader: - kv 11: nomic-bert.rope.freq_base f32 = 1000.000000 ollama | llama_model_loader: - kv 12: tokenizer.ggml.token_type_count u32 = 2 ollama | llama_model_loader: - kv 13: tokenizer.ggml.bos_token_id u32 = 101 ollama | llama_model_loader: - kv 14: tokenizer.ggml.eos_token_id u32 = 102 ollama | llama_model_loader: - kv 15: tokenizer.ggml.model str = bert ollama | llama_model_loader: - kv 16: tokenizer.ggml.tokens arr[str,30522] = ["[PAD]", "[unused0]", "[unused1]", "... ollama | llama_model_loader: - kv 17: tokenizer.ggml.scores arr[f32,30522] = [-1000.000000, -1000.000000, -1000.00... ollama | llama_model_loader: - kv 18: tokenizer.ggml.token_type arr[i32,30522] = [3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... 
ollama | llama_model_loader: - kv 19: tokenizer.ggml.unknown_token_id u32 = 100 ollama | llama_model_loader: - kv 20: tokenizer.ggml.seperator_token_id u32 = 102 ollama | llama_model_loader: - kv 21: tokenizer.ggml.padding_token_id u32 = 0 ollama | llama_model_loader: - kv 22: tokenizer.ggml.cls_token_id u32 = 101 ollama | llama_model_loader: - kv 23: tokenizer.ggml.mask_token_id u32 = 103 ollama | llama_model_loader: - type f32: 51 tensors ollama | llama_model_loader: - type f16: 61 tensors ollama | print_info: file format = GGUF V3 (latest) ollama | print_info: file type = F16 ollama | print_info: file size = 260.86 MiB (16.00 BPW) ollama | load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect ollama | load: special tokens cache size = 5 ollama | load: token to piece cache size = 0.2032 MB ollama | print_info: arch = nomic-bert ollama | print_info: vocab_only = 0 ollama | print_info: n_ctx_train = 2048 ollama | print_info: n_embd = 768 ollama | print_info: n_layer = 12 ollama | print_info: n_head = 12 ollama | print_info: n_head_kv = 12 ollama | print_info: n_rot = 64 ollama | print_info: n_swa = 0 ollama | print_info: n_swa_pattern = 1 ollama | print_info: n_embd_head_k = 64 ollama | print_info: n_embd_head_v = 64 ollama | print_info: n_gqa = 1 ollama | print_info: n_embd_k_gqa = 768 ollama | print_info: n_embd_v_gqa = 768 ollama | print_info: f_norm_eps = 1.0e-12 ollama | print_info: f_norm_rms_eps = 0.0e+00 ollama | print_info: f_clamp_kqv = 0.0e+00 ollama | print_info: f_max_alibi_bias = 0.0e+00 ollama | print_info: f_logit_scale = 0.0e+00 ollama | print_info: f_attn_scale = 0.0e+00 ollama | print_info: n_ff = 3072 ollama | print_info: n_expert = 0 ollama | print_info: n_expert_used = 0 ollama | print_info: causal attn = 0 ollama | print_info: pooling type = 1 ollama | print_info: rope type = 2 ollama | print_info: rope scaling = linear ollama | print_info: freq_base_train = 1000.0 ollama | print_info: freq_scale_train = 1 
ollama | print_info: n_ctx_orig_yarn = 2048 ollama | print_info: rope_finetuned = unknown ollama | print_info: ssm_d_conv = 0 ollama | print_info: ssm_d_inner = 0 ollama | print_info: ssm_d_state = 0 ollama | print_info: ssm_dt_rank = 0 ollama | print_info: ssm_dt_b_c_rms = 0 ollama | print_info: model type = 137M ollama | print_info: model params = 136.73 M ollama | print_info: general.name = nomic-embed-text-v1.5 ollama | print_info: vocab type = WPM ollama | print_info: n_vocab = 30522 ollama | print_info: n_merges = 0 ollama | print_info: BOS token = 101 '[CLS]' ollama | print_info: EOS token = 102 '[SEP]' ollama | print_info: UNK token = 100 '[UNK]' ollama | print_info: SEP token = 102 '[SEP]' ollama | print_info: PAD token = 0 '[PAD]' ollama | print_info: MASK token = 103 '[MASK]' ollama | print_info: LF token = 0 '[PAD]' ollama | print_info: EOG token = 102 '[SEP]' ollama | print_info: max token length = 21 ollama | load_tensors: loading model tensors, this can take a while... (mmap = false) ollama | load_tensors: CPU model buffer size = 260.86 MiB ollama | time=2025-04-25T15:57:29.788Z level=INFO source=server.go:614 msg="waiting for server to become available" status="llm server loading model" ollama | llama_context: constructing llama_context ollama | llama_context: n_seq_max = 1 ollama | llama_context: n_ctx = 8192 ollama | llama_context: n_ctx_per_seq = 8192 ollama | llama_context: n_batch = 512 ollama | llama_context: n_ubatch = 512 ollama | llama_context: causal_attn = 0 ollama | llama_context: flash_attn = 0 ollama | llama_context: freq_base = 1000.0 ollama | llama_context: freq_scale = 1 ollama | llama_context: n_ctx_pre_seq (8192) > n_ctx_train (2048) -- possible training context overflow ollama | llama_context: CPU output buffer size = 0.00 MiB ollama | init: kv_size = 8192, offload = 1, type_k = 'f16', type_v = 'f16', n_layer = 12, can_shift = 1 ollama | init: CPU KV buffer size = 288.00 MiB ollama | llama_context: KV self size = 288.00 MiB, K 
(f16): 144.00 MiB, V (f16): 144.00 MiB
ollama | llama_context: CPU compute buffer size = 24.50 MiB
ollama | llama_context: graph nodes = 441
ollama | llama_context: graph splits = 1
ollama | time=2025-04-25T15:57:30.800Z level=INFO source=server.go:619 msg="llama runner started in 1.29 seconds"
ollama | time=2025-04-25T15:57:30.808Z level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
ollama | [GIN] 2025/04/25 - 15:57:31 | 200 | 1.957762501s | 172.22.0.2 | POST "/api/embed"
ollama | time=2025-04-25T15:57:31.493Z level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
ollama | time=2025-04-25T15:57:31.544Z level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
ollama | time=2025-04-25T15:57:31.597Z level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
ollama | time=2025-04-25T15:57:31.667Z level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
ollama | time=2025-04-25T15:57:31.719Z level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
ollama | time=2025-04-25T15:57:31.720Z level=INFO source=server.go:105 msg="system memory" total="5.8 GiB" free="2.6 GiB" free_swap="3.3 GiB"
ollama | time=2025-04-25T15:57:31.722Z level=INFO source=server.go:138 msg=offload library=cpu layers.requested=-1 layers.model=35 layers.offload=0 layers.split="" memory.available="[2.6 GiB]" memory.gpu_overhead="0 B" memory.required.full="3.0 GiB" memory.required.partial="0 B" memory.required.kv="1.3 GiB" memory.required.allocations="[0 B]" memory.weights.total="2.3 GiB" memory.weights.repeating="1.8 GiB" memory.weights.nonrepeating="525.0 MiB" memory.graph.full="721.1 MiB" memory.graph.partial="1.0 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB"
ollama | time=2025-04-25T15:57:31.844Z level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
ollama | time=2025-04-25T15:57:31.848Z level=WARN source=ggml.go:152 msg="key not found" key=tokenizer.ggml.add_eot_token default=false
ollama | time=2025-04-25T15:57:31.857Z level=WARN source=ggml.go:152 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07
ollama | time=2025-04-25T15:57:31.858Z level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.local.freq_base default=10000
ollama | time=2025-04-25T15:57:31.858Z level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06
ollama | time=2025-04-25T15:57:31.858Z level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.freq_scale default=1
ollama | time=2025-04-25T15:57:31.858Z level=WARN source=ggml.go:152 msg="key not found" key=gemma3.mm_tokens_per_image default=256
ollama | time=2025-04-25T15:57:31.858Z level=INFO source=server.go:405 msg="starting llama server" cmd="/usr/bin/ollama runner --ollama-engine --model /root/.ollama/models/blobs/sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 40000 --batch-size 512 --threads 10 --no-mmap --parallel 4 --port 33995"
ollama | time=2025-04-25T15:57:31.861Z level=INFO source=sched.go:451 msg="loaded runners" count=1
ollama | time=2025-04-25T15:57:31.861Z level=INFO source=server.go:580 msg="waiting for llama runner to start responding"
ollama | time=2025-04-25T15:57:31.862Z level=INFO source=server.go:614 msg="waiting for server to become available" status="llm server error"
ollama | time=2025-04-25T15:57:31.978Z level=INFO source=runner.go:866 msg="starting ollama engine"
ollama | time=2025-04-25T15:57:31.991Z level=INFO source=runner.go:929 msg="Server listening on 127.0.0.1:33995"
ollama | time=2025-04-25T15:57:32.090Z level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
ollama | time=2025-04-25T15:57:32.092Z level=WARN source=ggml.go:152 msg="key not found" key=general.name default=""
ollama | time=2025-04-25T15:57:32.092Z level=WARN source=ggml.go:152 msg="key not found" key=general.description default=""
ollama | time=2025-04-25T15:57:32.092Z level=INFO source=ggml.go:72 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=36
ollama | time=2025-04-25T15:57:32.100Z level=INFO source=ggml.go:109 msg=system CPU.0.LLAMAFILE=1 compiler=cgo(gcc)
ollama | time=2025-04-25T15:57:32.111Z level=INFO source=ggml.go:298 msg="model weights" buffer=CPU size="3.6 GiB"
ollama | time=2025-04-25T15:57:32.125Z level=INFO source=server.go:614 msg="waiting for server to become available" status="llm server loading model"
ollama | time=2025-04-25T15:57:34.555Z level=WARN source=ggml.go:152 msg="key not found" key=tokenizer.ggml.add_eot_token default=false
ollama | time=2025-04-25T15:57:34.587Z level=WARN source=ggml.go:152 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07
ollama | time=2025-04-25T15:57:34.587Z level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.local.freq_base default=10000
ollama | time=2025-04-25T15:57:34.587Z level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06
ollama | time=2025-04-25T15:57:34.587Z level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.freq_scale default=1
ollama | time=2025-04-25T15:57:34.587Z level=WARN source=ggml.go:152 msg="key not found" key=gemma3.mm_tokens_per_image default=256
ollama | time=2025-04-25T15:57:38.031Z level=INFO source=ggml.go:556 msg="compute graph" backend=CPU buffer_type=CPU size="634.0 MiB"
ollama | time=2025-04-25T15:57:38.186Z level=INFO source=server.go:619 msg="llama runner started in 6.32 seconds"
ollama | ggml-alloc.c:819: GGML_ASSERT(talloc->buffer_id >= 0) failed
ollama | /usr/bin/ollama(+0x11021a8)[0x5555566561a8]
ollama | /usr/bin/ollama(+0x1102526)[0x555556656526]
ollama | /usr/bin/ollama(+0x10ef8f5)[0x5555566438f5]
ollama | /usr/bin/ollama(+0x10f101b)[0x55555664501b]
ollama | /usr/bin/ollama(+0x1116005)[0x55555666a005]
ollama | /usr/bin/ollama(+0x111645b)[0x55555666a45b]
ollama | /usr/bin/ollama(+0x117071b)[0x5555566c471b]
ollama | /usr/bin/ollama(+0x334801)[0x555555888801]
ollama | SIGABRT: abort
ollama | PC=0x7fffff27f00b m=3 sigcode=18446744073709551610
ollama | signal arrived during cgo execution
ollama |
ollama | goroutine 10 gp=0xc0006028c0 m=3 mp=0xc000075008 [syscall]:
ollama | runtime.cgocall(0x5555566c4700, 0xc00026baf8)
ollama | 	runtime/cgocall.go:167 +0x4b fp=0xc00026bad0 sp=0xc00026ba98 pc=0x55555587e14b
ollama | github.com/ollama/ollama/ml/backend/ggml._Cfunc_ggml_backend_sched_graph_compute_async(0x7ffe54000d40, 0x7fffa80190a0)
ollama | 	_cgo_gotypes.go:516 +0x4a fp=0xc00026baf8 sp=0xc00026bad0 pc=0x555555c7b6aa
ollama | github.com/ollama/ollama/ml/backend/ggml.(*Context).Compute.func1(...)
ollama | 	github.com/ollama/ollama/ml/backend/ggml/ggml.go:529
ollama | github.com/ollama/ollama/ml/backend/ggml.(*Context).Compute(0xc00338e4e0, {0xc002d38740, 0x1, 0x0?})
ollama | 	github.com/ollama/ollama/ml/backend/ggml/ggml.go:529 +0x96 fp=0xc00026bb88 sp=0xc00026baf8 pc=0x555555c84956
ollama | github.com/ollama/ollama/model.Forward({0x555556b840b0, 0xc00338e4e0}, {0x555556b7aa90, 0xc00320c180}, {0xc0030d7800, 0x200, 0x200}, {{0x555556b8cab0, 0xc000283320}, {0x0, ...}, ...})
ollama | 	github.com/ollama/ollama/model/model.go:313 +0x2b8 fp=0xc00026bc70 sp=0xc00026bb88 pc=0x555555cb27d8
ollama | github.com/ollama/ollama/runner/ollamarunner.(*Server).processBatch(0xc000128b40)
ollama | 	github.com/ollama/ollama/runner/ollamarunner/runner.go:478 +0x476 fp=0xc00026bf98 sp=0xc00026bc70 pc=0x555555d34ab6
ollama | github.com/ollama/ollama/runner/ollamarunner.(*Server).run(0xc000128b40, {0x555556b7bdf0, 0xc0002eaa00})
ollama | 	github.com/ollama/ollama/runner/ollamarunner/runner.go:364 +0x4e fp=0xc00026bfb8 sp=0xc00026bf98 pc=0x555555d345ee
ollama | github.com/ollama/ollama/runner/ollamarunner.Execute.gowrap2()
ollama | 
github.com/ollama/ollama/runner/ollamarunner/runner.go:906 +0x28 fp=0xc00026bfe0 sp=0xc00026bfb8 pc=0x555555d390e8 ollama | runtime.goexit({}) ollama | runtime/asm_amd64.s:1700 +0x1 fp=0xc00026bfe8 sp=0xc00026bfe0 pc=0x555555888b81 ollama | created by github.com/ollama/ollama/runner/ollamarunner.Execute in goroutine 1 ollama | github.com/ollama/ollama/runner/ollamarunner/runner.go:906 +0xb37 ollama | ollama | goroutine 1 gp=0xc000002380 m=nil [IO wait]: ollama | runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) ollama | runtime/proc.go:435 +0xce fp=0xc00026d628 sp=0xc00026d608 pc=0x55555588144e ollama | runtime.netpollblock(0xc00026d678?, 0x5581ac06?, 0x55?) ollama | runtime/netpoll.go:575 +0xf7 fp=0xc00026d660 sp=0xc00026d628 pc=0x555555846237 ollama | internal/poll.runtime_pollWait(0x7fffb7f6deb0, 0x72) ollama | runtime/netpoll.go:351 +0x85 fp=0xc00026d680 sp=0xc00026d660 pc=0x555555880665 ollama | internal/poll.(*pollDesc).wait(0xc0002aa980?, 0x900000036?, 0x0) ollama | internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc00026d6a8 sp=0xc00026d680 pc=0x5555559079c7 ollama | internal/poll.(*pollDesc).waitRead(...) ollama | internal/poll/fd_poll_runtime.go:89 ollama | internal/poll.(*FD).Accept(0xc0002aa980) ollama | internal/poll/fd_unix.go:620 +0x295 fp=0xc00026d750 sp=0xc00026d6a8 pc=0x55555590cd95 ollama | net.(*netFD).accept(0xc0002aa980) ollama | net/fd_unix.go:172 +0x29 fp=0xc00026d808 sp=0xc00026d750 pc=0x55555597fba9 ollama | net.(*TCPListener).accept(0xc000122580) ollama | net/tcpsock_posix.go:159 +0x1b fp=0xc00026d858 sp=0xc00026d808 pc=0x55555599555b ollama | net.(*TCPListener).Accept(0xc000122580) ollama | net/tcpsock.go:380 +0x30 fp=0xc00026d888 sp=0xc00026d858 pc=0x555555994410 ollama | net/http.(*onceCloseListener).Accept(0xc0000f0630?) 
ollama | <autogenerated>:1 +0x24 fp=0xc00026d8a0 sp=0xc00026d888 pc=0x555555baba44 ollama | net/http.(*Server).Serve(0xc000288f00, {0x555556b79af8, 0xc000122580}) ollama | net/http/server.go:3424 +0x30c fp=0xc00026d9d0 sp=0xc00026d8a0 pc=0x555555b8330c ollama | github.com/ollama/ollama/runner/ollamarunner.Execute({0xc000034130, 0xd, 0xd}) ollama | github.com/ollama/ollama/runner/ollamarunner/runner.go:930 +0xec9 fp=0xc00026dd08 sp=0xc00026d9d0 pc=0x555555d38e49 ollama | github.com/ollama/ollama/runner.Execute({0xc000034110?, 0x0?, 0x0?}) ollama | github.com/ollama/ollama/runner/runner.go:20 +0xc9 fp=0xc00026dd30 sp=0xc00026dd08 pc=0x555555d39ac9 ollama | github.com/ollama/ollama/cmd.NewCLI.func2(0xc000289300?, {0x5555566e0055?, 0x4?, 0x5555566e0059?}) ollama | github.com/ollama/ollama/cmd/cmd.go:1365 +0x45 fp=0xc00026dd58 sp=0xc00026dd30 pc=0x555556488be5 ollama | github.com/spf13/cobra.(*Command).execute(0xc000114f08, {0xc000302380, 0xe, 0xe}) ollama | github.com/spf13/cobra@v1.7.0/command.go:940 +0x85c fp=0xc00026de78 sp=0xc00026dd58 pc=0x5555559f91fc ollama | github.com/spf13/cobra.(*Command).ExecuteC(0xc0001d4908) ollama | github.com/spf13/cobra@v1.7.0/command.go:1068 +0x3a5 fp=0xc00026df30 sp=0xc00026de78 pc=0x5555559f9a45 ollama | github.com/spf13/cobra.(*Command).Execute(...) ollama | github.com/spf13/cobra@v1.7.0/command.go:992 ollama | github.com/spf13/cobra.(*Command).ExecuteContext(...) ollama | github.com/spf13/cobra@v1.7.0/command.go:985 ollama | main.main() ollama | github.com/ollama/ollama/main.go:12 +0x4d fp=0xc00026df50 sp=0xc00026df30 pc=0x555556488f4d ollama | runtime.main() ollama | runtime/proc.go:283 +0x29d fp=0xc00026dfe0 sp=0xc00026df50 pc=0x55555584d83d ollama | runtime.goexit({}) ollama | runtime/asm_amd64.s:1700 +0x1 fp=0xc00026dfe8 sp=0xc00026dfe0 pc=0x555555888b81 ollama | ollama | goroutine 2 gp=0xc000002e00 m=nil [force gc (idle)]: ollama | runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) 
ollama | runtime/proc.go:435 +0xce fp=0xc00006efa8 sp=0xc00006ef88 pc=0x55555588144e ollama | runtime.goparkunlock(...) ollama | runtime/proc.go:441 ollama | runtime.forcegchelper() ollama | runtime/proc.go:348 +0xb8 fp=0xc00006efe0 sp=0xc00006efa8 pc=0x55555584db78 ollama | runtime.goexit({}) ollama | runtime/asm_amd64.s:1700 +0x1 fp=0xc00006efe8 sp=0xc00006efe0 pc=0x555555888b81 ollama | created by runtime.init.7 in goroutine 1 ollama | runtime/proc.go:336 +0x1a ollama | ollama | goroutine 3 gp=0xc000003340 m=nil [GC sweep wait]: ollama | runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?) ollama | runtime/proc.go:435 +0xce fp=0xc00006f780 sp=0xc00006f760 pc=0x55555588144e ollama | runtime.goparkunlock(...) ollama | runtime/proc.go:441 ollama | runtime.bgsweep(0xc00009a000) ollama | runtime/mgcsweep.go:316 +0xdf fp=0xc00006f7c8 sp=0xc00006f780 pc=0x55555583823f ollama | runtime.gcenable.gowrap1() ollama | runtime/mgc.go:204 +0x25 fp=0xc00006f7e0 sp=0xc00006f7c8 pc=0x55555582c625 ollama | runtime.goexit({}) ollama | runtime/asm_amd64.s:1700 +0x1 fp=0xc00006f7e8 sp=0xc00006f7e0 pc=0x555555888b81 ollama | created by runtime.gcenable in goroutine 1 ollama | runtime/mgc.go:204 +0x66 ollama | ollama | goroutine 4 gp=0xc000003500 m=nil [GC scavenge wait]: ollama | runtime.gopark(0x10000?, 0x5555568978c8?, 0x0?, 0x0?, 0x0?) ollama | runtime/proc.go:435 +0xce fp=0xc00006ff78 sp=0xc00006ff58 pc=0x55555588144e ollama | runtime.goparkunlock(...) 
ollama | runtime/proc.go:441 ollama | runtime.(*scavengerState).park(0x5555573e2300) ollama | runtime/mgcscavenge.go:425 +0x49 fp=0xc00006ffa8 sp=0xc00006ff78 pc=0x555555835c89 ollama | runtime.bgscavenge(0xc00009a000) ollama | runtime/mgcscavenge.go:658 +0x59 fp=0xc00006ffc8 sp=0xc00006ffa8 pc=0x555555836219 ollama | runtime.gcenable.gowrap2() ollama | runtime/mgc.go:205 +0x25 fp=0xc00006ffe0 sp=0xc00006ffc8 pc=0x55555582c5c5 ollama | runtime.goexit({}) ollama | runtime/asm_amd64.s:1700 +0x1 fp=0xc00006ffe8 sp=0xc00006ffe0 pc=0x555555888b81 ollama | created by runtime.gcenable in goroutine 1 ollama | runtime/mgc.go:205 +0xa5 ollama | ollama | goroutine 5 gp=0xc000003dc0 m=nil [finalizer wait]: ollama | runtime.gopark(0x1b8?, 0x55555584fde9?, 0x1?, 0x23?, 0xc00006e688?) ollama | runtime/proc.go:435 +0xce fp=0xc00006e630 sp=0xc00006e610 pc=0x55555588144e ollama | runtime.runfinq() ollama | runtime/mfinal.go:196 +0x107 fp=0xc00006e7e0 sp=0xc00006e630 pc=0x55555582b5e7 ollama | runtime.goexit({}) ollama | runtime/asm_amd64.s:1700 +0x1 fp=0xc00006e7e8 sp=0xc00006e7e0 pc=0x555555888b81 ollama | created by runtime.createfing in goroutine 1 ollama | runtime/mfinal.go:166 +0x3d ollama | ollama | goroutine 6 gp=0xc0003b01c0 m=nil [chan receive]: ollama | runtime.gopark(0xc0000bb2c0?, 0xc0035c16f8?, 0x60?, 0x7?, 0x5555559668e8?) ollama | runtime/proc.go:435 +0xce fp=0xc000070718 sp=0xc0000706f8 pc=0x55555588144e ollama | runtime.chanrecv(0xc0000a82a0, 0x0, 0x1) ollama | runtime/chan.go:664 +0x445 fp=0xc000070790 sp=0xc000070718 pc=0x55555581d7e5 ollama | runtime.chanrecv1(0x0?, 0x0?) ollama | runtime/chan.go:506 +0x12 fp=0xc0000707b8 sp=0xc000070790 pc=0x55555581d372 ollama | runtime.unique_runtime_registerUniqueMapCleanup.func2(...) 
ollama | runtime/mgc.go:1796 ollama | runtime.unique_runtime_registerUniqueMapCleanup.gowrap1() ollama | runtime/mgc.go:1799 +0x2f fp=0xc0000707e0 sp=0xc0000707b8 pc=0x55555582f7cf ollama | runtime.goexit({}) ollama | runtime/asm_amd64.s:1700 +0x1 fp=0xc0000707e8 sp=0xc0000707e0 pc=0x555555888b81 ollama | created by unique.runtime_registerUniqueMapCleanup in goroutine 1 ollama | runtime/mgc.go:1794 +0x85 ollama | ollama | goroutine 7 gp=0xc0003b0380 m=nil [GC worker (idle)]: ollama | runtime.gopark(0xc398a91bbfe?, 0x0?, 0x0?, 0x0?, 0x0?) ollama | runtime/proc.go:435 +0xce fp=0xc000070f38 sp=0xc000070f18 pc=0x55555588144e ollama | runtime.gcBgMarkWorker(0xc0000a9500) ollama | runtime/mgc.go:1423 +0xe9 fp=0xc000070fc8 sp=0xc000070f38 pc=0x55555582eae9 ollama | runtime.gcBgMarkStartWorkers.gowrap1() ollama | runtime/mgc.go:1339 +0x25 fp=0xc000070fe0 sp=0xc000070fc8 pc=0x55555582e9c5 ollama | runtime.goexit({}) ollama | runtime/asm_amd64.s:1700 +0x1 fp=0xc000070fe8 sp=0xc000070fe0 pc=0x555555888b81 ollama | created by runtime.gcBgMarkStartWorkers in goroutine 1 ollama | runtime/mgc.go:1339 +0x105 ollama | ollama | goroutine 18 gp=0xc0001828c0 m=nil [GC worker (idle)]: ollama | runtime.gopark(0xc398a838d22?, 0x3?, 0x6d?, 0xd?, 0x0?) ollama | runtime/proc.go:435 +0xce fp=0xc00006a738 sp=0xc00006a718 pc=0x55555588144e ollama | runtime.gcBgMarkWorker(0xc0000a9500) ollama | runtime/mgc.go:1423 +0xe9 fp=0xc00006a7c8 sp=0xc00006a738 pc=0x55555582eae9 ollama | runtime.gcBgMarkStartWorkers.gowrap1() ollama | runtime/mgc.go:1339 +0x25 fp=0xc00006a7e0 sp=0xc00006a7c8 pc=0x55555582e9c5 ollama | runtime.goexit({}) ollama | runtime/asm_amd64.s:1700 +0x1 fp=0xc00006a7e8 sp=0xc00006a7e0 pc=0x555555888b81 ollama | created by runtime.gcBgMarkStartWorkers in goroutine 1 ollama | runtime/mgc.go:1339 +0x105 ollama | ollama | goroutine 19 gp=0xc000182a80 m=nil [GC worker (idle)]: ollama | runtime.gopark(0xc398a9a7441?, 0x3?, 0xbf?, 0xb3?, 0x0?) 
ollama | runtime/proc.go:435 +0xce fp=0xc00006af38 sp=0xc00006af18 pc=0x55555588144e ollama | runtime.gcBgMarkWorker(0xc0000a9500) ollama | runtime/mgc.go:1423 +0xe9 fp=0xc00006afc8 sp=0xc00006af38 pc=0x55555582eae9 ollama | runtime.gcBgMarkStartWorkers.gowrap1() ollama | runtime/mgc.go:1339 +0x25 fp=0xc00006afe0 sp=0xc00006afc8 pc=0x55555582e9c5 ollama | runtime.goexit({}) ollama | runtime/asm_amd64.s:1700 +0x1 fp=0xc00006afe8 sp=0xc00006afe0 pc=0x555555888b81 ollama | created by runtime.gcBgMarkStartWorkers in goroutine 1 ollama | runtime/mgc.go:1339 +0x105 ollama | ollama | goroutine 20 gp=0xc000182c40 m=nil [GC worker (idle)]: ollama | runtime.gopark(0x555557490b40?, 0x3?, 0x35?, 0x61?, 0x0?) ollama | runtime/proc.go:435 +0xce fp=0xc00006b738 sp=0xc00006b718 pc=0x55555588144e ollama | runtime.gcBgMarkWorker(0xc0000a9500) ollama | runtime/mgc.go:1423 +0xe9 fp=0xc00006b7c8 sp=0xc00006b738 pc=0x55555582eae9 ollama | runtime.gcBgMarkStartWorkers.gowrap1() ollama | runtime/mgc.go:1339 +0x25 fp=0xc00006b7e0 sp=0xc00006b7c8 pc=0x55555582e9c5 ollama | runtime.goexit({}) ollama | runtime/asm_amd64.s:1700 +0x1 fp=0xc00006b7e8 sp=0xc00006b7e0 pc=0x555555888b81 ollama | created by runtime.gcBgMarkStartWorkers in goroutine 1 ollama | runtime/mgc.go:1339 +0x105 ollama | ollama | goroutine 21 gp=0xc000182e00 m=nil [GC worker (idle)]: ollama | runtime.gopark(0xc398aadbf61?, 0x1?, 0x5a?, 0x50?, 0x0?) 
ollama | runtime/proc.go:435 +0xce fp=0xc00006bf38 sp=0xc00006bf18 pc=0x55555588144e ollama | runtime.gcBgMarkWorker(0xc0000a9500) ollama | runtime/mgc.go:1423 +0xe9 fp=0xc00006bfc8 sp=0xc00006bf38 pc=0x55555582eae9 ollama | runtime.gcBgMarkStartWorkers.gowrap1() ollama | runtime/mgc.go:1339 +0x25 fp=0xc00006bfe0 sp=0xc00006bfc8 pc=0x55555582e9c5 ollama | runtime.goexit({}) ollama | runtime/asm_amd64.s:1700 +0x1 fp=0xc00006bfe8 sp=0xc00006bfe0 pc=0x555555888b81 ollama | created by runtime.gcBgMarkStartWorkers in goroutine 1 ollama | runtime/mgc.go:1339 +0x105 ollama | ollama | goroutine 22 gp=0xc000182fc0 m=nil [GC worker (idle)]: ollama | runtime.gopark(0xc398a839064?, 0x3?, 0x2c?, 0x45?, 0x0?) ollama | runtime/proc.go:435 +0xce fp=0xc00006c738 sp=0xc00006c718 pc=0x55555588144e ollama | runtime.gcBgMarkWorker(0xc0000a9500) ollama | runtime/mgc.go:1423 +0xe9 fp=0xc00006c7c8 sp=0xc00006c738 pc=0x55555582eae9 ollama | runtime.gcBgMarkStartWorkers.gowrap1() ollama | runtime/mgc.go:1339 +0x25 fp=0xc00006c7e0 sp=0xc00006c7c8 pc=0x55555582e9c5 ollama | runtime.goexit({}) ollama | runtime/asm_amd64.s:1700 +0x1 fp=0xc00006c7e8 sp=0xc00006c7e0 pc=0x555555888b81 ollama | created by runtime.gcBgMarkStartWorkers in goroutine 1 ollama | runtime/mgc.go:1339 +0x105 ollama | ollama | goroutine 34 gp=0xc00031e1c0 m=nil [GC worker (idle)]: ollama | runtime.gopark(0xc398a8bf83f?, 0x3?, 0x73?, 0x87?, 0x0?) 
ollama | runtime/proc.go:435 +0xce fp=0xc00032a738 sp=0xc00032a718 pc=0x55555588144e ollama | runtime.gcBgMarkWorker(0xc0000a9500) ollama | runtime/mgc.go:1423 +0xe9 fp=0xc00032a7c8 sp=0xc00032a738 pc=0x55555582eae9 ollama | runtime.gcBgMarkStartWorkers.gowrap1() ollama | runtime/mgc.go:1339 +0x25 fp=0xc00032a7e0 sp=0xc00032a7c8 pc=0x55555582e9c5 ollama | runtime.goexit({}) ollama | runtime/asm_amd64.s:1700 +0x1 fp=0xc00032a7e8 sp=0xc00032a7e0 pc=0x555555888b81 ollama | created by runtime.gcBgMarkStartWorkers in goroutine 1 ollama | runtime/mgc.go:1339 +0x105 ollama | ollama | goroutine 8 gp=0xc0003b0540 m=nil [GC worker (idle)]: ollama | runtime.gopark(0xc398a8025b7?, 0x3?, 0x35?, 0xc?, 0x0?) ollama | runtime/proc.go:435 +0xce fp=0xc000071738 sp=0xc000071718 pc=0x55555588144e ollama | runtime.gcBgMarkWorker(0xc0000a9500) ollama | runtime/mgc.go:1423 +0xe9 fp=0xc0000717c8 sp=0xc000071738 pc=0x55555582eae9 ollama | runtime.gcBgMarkStartWorkers.gowrap1() ollama | runtime/mgc.go:1339 +0x25 fp=0xc0000717e0 sp=0xc0000717c8 pc=0x55555582e9c5 ollama | runtime.goexit({}) ollama | runtime/asm_amd64.s:1700 +0x1 fp=0xc0000717e8 sp=0xc0000717e0 pc=0x555555888b81 ollama | created by runtime.gcBgMarkStartWorkers in goroutine 1 ollama | runtime/mgc.go:1339 +0x105 ollama | ollama | goroutine 23 gp=0xc000183180 m=nil [GC worker (idle)]: ollama | runtime.gopark(0x555557490b40?, 0x3?, 0x28?, 0xf3?, 0x0?) 
ollama | runtime/proc.go:435 +0xce fp=0xc00006cf38 sp=0xc00006cf18 pc=0x55555588144e ollama | runtime.gcBgMarkWorker(0xc0000a9500) ollama | runtime/mgc.go:1423 +0xe9 fp=0xc00006cfc8 sp=0xc00006cf38 pc=0x55555582eae9 ollama | runtime.gcBgMarkStartWorkers.gowrap1() ollama | runtime/mgc.go:1339 +0x25 fp=0xc00006cfe0 sp=0xc00006cfc8 pc=0x55555582e9c5 ollama | runtime.goexit({}) ollama | runtime/asm_amd64.s:1700 +0x1 fp=0xc00006cfe8 sp=0xc00006cfe0 pc=0x555555888b81 ollama | created by runtime.gcBgMarkStartWorkers in goroutine 1 ollama | runtime/mgc.go:1339 +0x105 ollama | ollama | goroutine 35 gp=0xc00031e380 m=nil [GC worker (idle)]: ollama | runtime.gopark(0xc398a9c4596?, 0x1?, 0xa0?, 0xf?, 0x0?) ollama | runtime/proc.go:435 +0xce fp=0xc00032af38 sp=0xc00032af18 pc=0x55555588144e ollama | runtime.gcBgMarkWorker(0xc0000a9500) ollama | runtime/mgc.go:1423 +0xe9 fp=0xc00032afc8 sp=0xc00032af38 pc=0x55555582eae9 ollama | runtime.gcBgMarkStartWorkers.gowrap1() ollama | runtime/mgc.go:1339 +0x25 fp=0xc00032afe0 sp=0xc00032afc8 pc=0x55555582e9c5 ollama | runtime.goexit({}) ollama | runtime/asm_amd64.s:1700 +0x1 fp=0xc00032afe8 sp=0xc00032afe0 pc=0x555555888b81 ollama | created by runtime.gcBgMarkStartWorkers in goroutine 1 ollama | runtime/mgc.go:1339 +0x105 ollama | ollama | goroutine 886 gp=0xc000102e00 m=nil [select]: ollama | runtime.gopark(0xc000045a28?, 0x2?, 0x0?, 0xe4?, 0xc000045894?) 
ollama | runtime/proc.go:435 +0xce fp=0xc0000456a8 sp=0xc000045688 pc=0x55555588144e ollama | runtime.selectgo(0xc000045a28, 0xc000045890, 0x4b4?, 0x0, 0x4?, 0x1) ollama | runtime/select.go:351 +0x837 fp=0xc0000457e0 sp=0xc0000456a8 pc=0x55555585fd37 ollama | github.com/ollama/ollama/runner/ollamarunner.(*Server).completion(0xc000128b40, {0x555556b79cd8, 0xc00021e700}, 0xc00014ec80) ollama | github.com/ollama/ollama/runner/ollamarunner/runner.go:677 +0xb05 fp=0xc000045ac0 sp=0xc0000457e0 pc=0x555555d36dc5 ollama | github.com/ollama/ollama/runner/ollamarunner.(*Server).completion-fm({0x555556b79cd8?, 0xc00021e700?}, 0xc000271b40?) ollama | <autogenerated>:1 +0x36 fp=0xc000045af0 sp=0xc000045ac0 pc=0x555555d39936 ollama | net/http.HandlerFunc.ServeHTTP(0xc00019c240?, {0x555556b79cd8?, 0xc00021e700?}, 0xc000271b60?) ollama | net/http/server.go:2294 +0x29 fp=0xc000045b18 sp=0xc000045af0 pc=0x555555b7f949 ollama | net/http.(*ServeMux).ServeHTTP(0x555555825b05?, {0x555556b79cd8, 0xc00021e700}, 0xc00014ec80) ollama | net/http/server.go:2822 +0x1c4 fp=0xc000045b68 sp=0xc000045b18 pc=0x555555b81844 ollama | net/http.serverHandler.ServeHTTP({0x555556b763b0?}, {0x555556b79cd8?, 0xc00021e700?}, 0x1?) ollama | net/http/server.go:3301 +0x8e fp=0xc000045b98 sp=0xc000045b68 pc=0x555555b9f2ce ollama | net/http.(*conn).serve(0xc0000f0630, {0x555556b7bdb8, 0xc00021d5f0}) ollama | net/http/server.go:2102 +0x625 fp=0xc000045fb8 sp=0xc000045b98 pc=0x555555b7de45 ollama | net/http.(*Server).Serve.gowrap3() ollama | net/http/server.go:3454 +0x28 fp=0xc000045fe0 sp=0xc000045fb8 pc=0x555555b83708 ollama | runtime.goexit({}) ollama | runtime/asm_amd64.s:1700 +0x1 fp=0xc000045fe8 sp=0xc000045fe0 pc=0x555555888b81 ollama | created by net/http.(*Server).Serve in goroutine 1 ollama | net/http/server.go:3454 +0x485 ollama | ollama | goroutine 938 gp=0xc0003b1180 m=nil [IO wait]: ollama | runtime.gopark(0x1b0000?, 0x31332e6b6c620000?, 0x2e?, 0x70?, 0xb?) 
ollama | runtime/proc.go:435 +0xce fp=0xc000332dd8 sp=0xc000332db8 pc=0x55555588144e ollama | runtime.netpollblock(0x5555558a47b8?, 0x5581ac06?, 0x55?) ollama | runtime/netpoll.go:575 +0xf7 fp=0xc000332e10 sp=0xc000332dd8 pc=0x555555846237 ollama | internal/poll.runtime_pollWait(0x7fffb7f6dd98, 0x72) ollama | runtime/netpoll.go:351 +0x85 fp=0xc000332e30 sp=0xc000332e10 pc=0x555555880665 ollama | internal/poll.(*pollDesc).wait(0xc0027a2180?, 0xc0001249a1?, 0x0) ollama | internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc000332e58 sp=0xc000332e30 pc=0x5555559079c7 ollama | internal/poll.(*pollDesc).waitRead(...) ollama | internal/poll/fd_poll_runtime.go:89 ollama | internal/poll.(*FD).Read(0xc0027a2180, {0xc0001249a1, 0x1, 0x1}) ollama | internal/poll/fd_unix.go:165 +0x27a fp=0xc000332ef0 sp=0xc000332e58 pc=0x555555908cba ollama | net.(*netFD).Read(0xc0027a2180, {0xc0001249a1?, 0xc0001225d8?, 0xc000332f70?}) ollama | net/fd_posix.go:55 +0x25 fp=0xc000332f38 sp=0xc000332ef0 pc=0x55555597dc05 ollama | net.(*conn).Read(0xc003110030, {0xc0001249a1?, 0x2746867?, 0xa?}) ollama | net/net.go:194 +0x45 fp=0xc000332f80 sp=0xc000332f38 pc=0x55555598bfc5 ollama | net/http.(*connReader).backgroundRead(0xc000124990) ollama | net/http/server.go:690 +0x37 fp=0xc000332fc8 sp=0xc000332f80 pc=0x555555b77d17 ollama | net/http.(*connReader).startBackgroundRead.gowrap2() ollama | net/http/server.go:686 +0x25 fp=0xc000332fe0 sp=0xc000332fc8 pc=0x555555b77c45 ollama | runtime.goexit({}) ollama | runtime/asm_amd64.s:1700 +0x1 fp=0xc000332fe8 sp=0xc000332fe0 pc=0x555555888b81 ollama | created by net/http.(*connReader).startBackgroundRead in goroutine 886 ollama | net/http/server.go:686 +0xb6 ollama | ollama | rax 0x0 ollama | rbx 0x7fffb7f65700 ollama | rcx 0x2c ollama | rdx 0x0 ollama | rdi 0x2 ollama | rsi 0x7fffb7f64930 ollama | rbp 0x5555568b535d ollama | rsp 0x7fffb7f64930 ollama | r8 0x0 ollama | r9 0x7fffb7f64930 ollama | r10 0x8 ollama | r11 0x7fffff25e72e ollama | r12 0x5555568db0ab 
ollama | r13 0x333
ollama | r14 0x5cf
ollama | r15 0x7ffe5402a2a0
ollama | rip 0x7fffff27f00b
ollama | rflags 0x246
ollama | cs 0x33
ollama | fs 0x0
ollama | gs 0x0
ollama | time=2025-04-25T15:57:38.645Z level=ERROR source=server.go:449 msg="llama runner terminated" error="exit status 2"
ollama | [GIN] 2025/04/25 - 15:57:38 | 200 | 7.225590128s | 172.22.0.2 | POST "/api/chat"
```

### OS

Linux

### GPU

_No response_

### CPU

_No response_

### Ollama version

0.6.6
GiteaMirror added the bug label 2026-05-04 15:36:52 -05:00

@stephen304 commented on GitHub (Apr 27, 2025):

I get the same error on 0.6.6 and 0.6.7-rc0. Downgrading to 0.6.5 fixes the issue so I can do CPU inference again with gemma3:4b with 32K context (num_ctx 32768).
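Until the underlying bug is fixed, pinning the container to the last working release is a practical stopgap. The compose fragment below is a sketch based on the versions reported in this thread; the service layout, volume name, and port mapping are illustrative assumptions, and only the `image:` tag matters here:

```yaml
services:
  ollama:
    # Pin to 0.6.5, the last release reported to work in this thread.
    image: ollama/ollama:0.6.5
    volumes:
      # Persist models across container restarts (volume name is an example).
      - ollama_data:/root/.ollama
    ports:
      - "11434:11434"

volumes:
  ollama_data:
```

After editing the tag, recreate the container (e.g. `docker compose up -d`) so the pinned image is actually pulled.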

<!-- gh-comment-id:2833137799 -->

@Loremaster commented on GitHub (Apr 28, 2025):

@stephen304 Thank you! Downgrading indeed helped (to 0.6.5)

<!-- gh-comment-id:2834723436 -->

@jessegross commented on GitHub (Apr 28, 2025):

This looks like #10410 - I'm going to close this one so we can focus comments there.

<!-- gh-comment-id:2836007618 -->
Reference: github-starred/ollama#68901