[GH-ISSUE #10681] Notable degradation in response quality when using CPU inference compared to GPU inference #69079

Closed
opened 2026-05-04 17:06:53 -05:00 by GiteaMirror · 12 comments

Originally created by @kripper on GitHub (May 13, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/10681

What is the issue?

I'm comparing the response quality of qwen3:14b on CPU-only vs. GPU inference, and I've found a notable quality degradation with CPU-only inference.

Is this normal? Why does this happen?
Is ollama/llama.cpp doing some optimization on CPU inference?
Can it be disabled?

This is the model I'm testing:

  Model
    architecture        qwen3
    parameters          14.8B
    context length      40960
    embedding length    5120
    quantization        Q4_K_M

  Capabilities
    completion
    tools

  Parameters
    repeat_penalty    1
    stop              "<|im_start|>"
    stop              "<|im_end|>"
    temperature       0.6
    top_k             20
    top_p             0.95

  License
    Apache License
    Version 2.0, January 2004

The CPU-only hardware is an Intel(R) Xeon(R) CPU E5-2630 v2 @ 2.60GHz, which supports only AVX (no AVX2).

Test case

See: https://ai.imatronix.com/a/tmp/prompt.txt
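
For a more controlled comparison, the sampler can be pinned so each backend decodes repeatably. A minimal sketch against the local Ollama HTTP API (the seed value is arbitrary, the prompt placeholder stands in for the linked prompt.txt, and `"num_gpu": 0` forces CPU-only on a machine that also has a GPU):

```shell
# CPU-only run: temperature 0 + fixed seed for repeatable decoding
curl -s http://localhost:11434/api/generate -d '{
  "model": "qwen3:14b",
  "prompt": "<contents of prompt.txt>",
  "stream": false,
  "options": {"temperature": 0, "seed": 42, "num_gpu": 0}
}'

# GPU run: identical request without the num_gpu override
curl -s http://localhost:11434/api/generate -d '{
  "model": "qwen3:14b",
  "prompt": "<contents of prompt.txt>",
  "stream": false,
  "options": {"temperature": 0, "seed": 42}
}'
```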

Relevant log output

May 13 11:13:24   [GIN] 2025/05/13 - 11:13:24 | 200 | 84.255µs | 127.0.0.1 | HEAD "/"
May 13 11:13:24   time=2025-05-13T11:13:24.285-04:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
May 13 11:13:24   time=2025-05-13T11:13:24.349-04:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
May 13 11:13:24   [GIN] 2025/05/13 - 11:13:24 | 200 | 133.819676ms | 127.0.0.1 | POST "/api/show"
May 13 11:13:24   time=2025-05-13T11:13:24.426-04:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
May 13 11:13:24   time=2025-05-13T11:13:24.489-04:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
May 13 11:13:24   time=2025-05-13T11:13:24.551-04:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
May 13 11:13:24   time=2025-05-13T11:13:24.554-04:00 level=INFO source=server.go:105 msg="system memory" total="377.3 GiB" free="269.4 GiB" free_swap="0 B"
May 13 11:13:24   time=2025-05-13T11:13:24.554-04:00 level=WARN source=ggml.go:152 msg="key not found" key=qwen3.vision.block_count default=0
May 13 11:13:24   time=2025-05-13T11:13:24.555-04:00 level=INFO source=server.go:138 msg=offload library=cpu layers.requested=-1 layers.model=41 layers.offload=0 layers.split="" memory.available="[269.4 GiB]" memory.gpu_overhead="0 B" memory.required.full="10.8 GiB" memory.required.partial="0 B" memory.required.kv="1.2 GiB" memory.required.allocations="[10.8 GiB]" memory.weights.total="8.2 GiB" memory.weights.repeating="7.6 GiB" memory.weights.nonrepeating="608.6 MiB" memory.graph.full="1.0 GiB" memory.graph.partial="1.0 GiB"
May 13 11:13:24   llama_model_loader: loaded meta data with 27 key-value pairs and 443 tensors from /usr/share/ollama/.ollama/models/blobs/sha256-a8cc1361f3145dc01f6d77c6c82c9116b9ffe3c97b34716fe20418455876c40e (version GGUF V3 (latest))
May 13 11:13:24   llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
May 13 11:13:24   llama_model_loader: - kv 0: general.architecture str = qwen3
May 13 11:13:24   llama_model_loader: - kv 1: general.type str = model
May 13 11:13:24   llama_model_loader: - kv 2: general.name str = Qwen3 14B
May 13 11:13:24   llama_model_loader: - kv 3: general.basename str = Qwen3
May 13 11:13:24   llama_model_loader: - kv 4: general.size_label str = 14B
May 13 11:13:24   llama_model_loader: - kv 5: qwen3.block_count u32 = 40
May 13 11:13:24   llama_model_loader: - kv 6: qwen3.context_length u32 = 40960
May 13 11:13:24   llama_model_loader: - kv 7: qwen3.embedding_length u32 = 5120
May 13 11:13:24   llama_model_loader: - kv 8: qwen3.feed_forward_length u32 = 17408
May 13 11:13:24   llama_model_loader: - kv 9: qwen3.attention.head_count u32 = 40
May 13 11:13:24   llama_model_loader: - kv 10: qwen3.attention.head_count_kv u32 = 8
May 13 11:13:24   llama_model_loader: - kv 11: qwen3.rope.freq_base f32 = 1000000.000000
May 13 11:13:24   llama_model_loader: - kv 12: qwen3.attention.layer_norm_rms_epsilon f32 = 0.000001
May 13 11:13:24   llama_model_loader: - kv 13: qwen3.attention.key_length u32 = 128
May 13 11:13:24   llama_model_loader: - kv 14: qwen3.attention.value_length u32 = 128
May 13 11:13:24   llama_model_loader: - kv 15: tokenizer.ggml.model str = gpt2
May 13 11:13:24   llama_model_loader: - kv 16: tokenizer.ggml.pre str = qwen2
May 13 11:13:24   llama_model_loader: - kv 17: tokenizer.ggml.tokens arr[str,151936] = ["!", "\"", "#", "$", "%", "&", "'", ...
May 13 11:13:24   llama_model_loader: - kv 18: tokenizer.ggml.token_type arr[i32,151936] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
May 13 11:13:24   llama_model_loader: - kv 19: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
May 13 11:13:24   llama_model_loader: - kv 20: tokenizer.ggml.eos_token_id u32 = 151645
May 13 11:13:24   llama_model_loader: - kv 21: tokenizer.ggml.padding_token_id u32 = 151643
May 13 11:13:24   llama_model_loader: - kv 22: tokenizer.ggml.bos_token_id u32 = 151643
May 13 11:13:24   llama_model_loader: - kv 23: tokenizer.ggml.add_bos_token bool = false
May 13 11:13:24   llama_model_loader: - kv 24: tokenizer.chat_template str = {%- if tools %}\n {{- '<|im_start|>...
May 13 11:13:24   llama_model_loader: - kv 25: general.quantization_version u32 = 2
May 13 11:13:24   llama_model_loader: - kv 26: general.file_type u32 = 15
May 13 11:13:24   llama_model_loader: - type f32: 161 tensors
May 13 11:13:24   llama_model_loader: - type f16: 40 tensors
May 13 11:13:24   llama_model_loader: - type q4_K: 221 tensors
May 13 11:13:24   llama_model_loader: - type q6_K: 21 tensors
May 13 11:13:24   print_info: file format = GGUF V3 (latest)
May 13 11:13:24   print_info: file type = Q4_K - Medium
May 13 11:13:24   print_info: file size = 8.63 GiB (5.02 BPW)
May 13 11:13:24   load: special tokens cache size = 26
May 13 11:13:25   load: token to piece cache size = 0.9311 MB
May 13 11:13:25   print_info: arch = qwen3
May 13 11:13:25   print_info: vocab_only = 1
May 13 11:13:25   print_info: model type = ?B
May 13 11:13:25   print_info: model params = 14.77 B
May 13 11:13:25   print_info: general.name = Qwen3 14B
May 13 11:13:25   print_info: vocab type = BPE
May 13 11:13:25   print_info: n_vocab = 151936
May 13 11:13:25   print_info: n_merges = 151387
May 13 11:13:25   print_info: BOS token = 151643 '<|endoftext|>'
May 13 11:13:25   print_info: EOS token = 151645 '<|im_end|>'
May 13 11:13:25   print_info: EOT token = 151643 '<|endoftext|>'
May 13 11:13:25   print_info: PAD token = 151643 '<|endoftext|>'
May 13 11:13:25   print_info: LF token = 198 'Ċ'
May 13 11:13:25   print_info: FIM PRE token = 151659 '<|fim_prefix|>'
May 13 11:13:25   print_info: FIM SUF token = 151661 '<|fim_suffix|>'
May 13 11:13:25   print_info: FIM MID token = 151660 '<|fim_middle|>'
May 13 11:13:25   print_info: FIM PAD token = 151662 '<|fim_pad|>'
May 13 11:13:25   print_info: FIM REP token = 151663 '<|repo_name|>'
May 13 11:13:25   print_info: FIM SEP token = 151664 '<|file_sep|>'
May 13 11:13:25   print_info: EOG token = 151643 '<|endoftext|>'
May 13 11:13:25   print_info: EOG token = 151645 '<|im_end|>'
May 13 11:13:25   print_info: EOG token = 151662 '<|fim_pad|>'
May 13 11:13:25   print_info: EOG token = 151663 '<|repo_name|>'
May 13 11:13:25   print_info: EOG token = 151664 '<|file_sep|>'
May 13 11:13:25   print_info: max token length = 256
May 13 11:13:25   llama_model_load: vocab only - skipping tensors
May 13 11:13:25   time=2025-05-13T11:13:25.052-04:00 level=INFO source=server.go:405 msg="starting llama server" cmd="/usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-a8cc1361f3145dc01f6d77c6c82c9116b9ffe3c97b34716fe20418455876c40e --ctx-size 8192 --batch-size 512 --threads 12 --no-mmap --parallel 4 --port 38897"
May 13 11:13:25   time=2025-05-13T11:13:25.053-04:00 level=INFO source=sched.go:451 msg="loaded runners" count=1
May 13 11:13:25   time=2025-05-13T11:13:25.053-04:00 level=INFO source=server.go:580 msg="waiting for llama runner to start responding"
May 13 11:13:25   time=2025-05-13T11:13:25.054-04:00 level=INFO source=server.go:614 msg="waiting for server to become available" status="llm server error"
May 13 11:13:25   time=2025-05-13T11:13:25.081-04:00 level=INFO source=runner.go:853 msg="starting go runner"
May 13 11:13:25   load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-sandybridge.so
May 13 11:13:25   time=2025-05-13T11:13:25.119-04:00 level=INFO source=ggml.go:109 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
May 13 11:13:25   time=2025-05-13T11:13:25.120-04:00 level=INFO source=runner.go:913 msg="Server listening on 127.0.0.1:38897"
May 13 11:13:25   llama_model_loader: loaded meta data with 27 key-value pairs and 443 tensors from /usr/share/ollama/.ollama/models/blobs/sha256-a8cc1361f3145dc01f6d77c6c82c9116b9ffe3c97b34716fe20418455876c40e (version GGUF V3 (latest))
May 13 11:13:25   llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
May 13 11:13:25   llama_model_loader: - kv 0: general.architecture str = qwen3
May 13 11:13:25   llama_model_loader: - kv 1: general.type str = model
May 13 11:13:25   llama_model_loader: - kv 2: general.name str = Qwen3 14B
May 13 11:13:25   llama_model_loader: - kv 3: general.basename str = Qwen3
May 13 11:13:25   llama_model_loader: - kv 4: general.size_label str = 14B
May 13 11:13:25   llama_model_loader: - kv 5: qwen3.block_count u32 = 40
May 13 11:13:25   llama_model_loader: - kv 6: qwen3.context_length u32 = 40960
May 13 11:13:25   llama_model_loader: - kv 7: qwen3.embedding_length u32 = 5120
May 13 11:13:25   llama_model_loader: - kv 8: qwen3.feed_forward_length u32 = 17408
May 13 11:13:25   llama_model_loader: - kv 9: qwen3.attention.head_count u32 = 40
May 13 11:13:25   llama_model_loader: - kv 10: qwen3.attention.head_count_kv u32 = 8
May 13 11:13:25   llama_model_loader: - kv 11: qwen3.rope.freq_base f32 = 1000000.000000
May 13 11:13:25   llama_model_loader: - kv 12: qwen3.attention.layer_norm_rms_epsilon f32 = 0.000001
May 13 11:13:25   llama_model_loader: - kv 13: qwen3.attention.key_length u32 = 128
May 13 11:13:25   llama_model_loader: - kv 14: qwen3.attention.value_length u32 = 128
May 13 11:13:25   llama_model_loader: - kv 15: tokenizer.ggml.model str = gpt2
May 13 11:13:25   llama_model_loader: - kv 16: tokenizer.ggml.pre str = qwen2
May 13 11:13:25   llama_model_loader: - kv 17: tokenizer.ggml.tokens arr[str,151936] = ["!", "\"", "#", "$", "%", "&", "'", ...
May 13 11:13:25   time=2025-05-13T11:13:25.306-04:00 level=INFO source=server.go:614 msg="waiting for server to become available" status="llm server loading model"
May 13 11:13:25   llama_model_loader: - kv 18: tokenizer.ggml.token_type arr[i32,151936] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
May 13 11:13:25   llama_model_loader: - kv 19: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
May 13 11:13:25   llama_model_loader: - kv 20: tokenizer.ggml.eos_token_id u32 = 151645
May 13 11:13:25   llama_model_loader: - kv 21: tokenizer.ggml.padding_token_id u32 = 151643
May 13 11:13:25   llama_model_loader: - kv 22: tokenizer.ggml.bos_token_id u32 = 151643
May 13 11:13:25   llama_model_loader: - kv 23: tokenizer.ggml.add_bos_token bool = false
May 13 11:13:25   llama_model_loader: - kv 24: tokenizer.chat_template str = {%- if tools %}\n {{- '<|im_start|>...
May 13 11:13:25   llama_model_loader: - kv 25: general.quantization_version u32 = 2
May 13 11:13:25   llama_model_loader: - kv 26: general.file_type u32 = 15
May 13 11:13:25   llama_model_loader: - type f32: 161 tensors
May 13 11:13:25   llama_model_loader: - type f16: 40 tensors
May 13 11:13:25   llama_model_loader: - type q4_K: 221 tensors
May 13 11:13:25   llama_model_loader: - type q6_K: 21 tensors
May 13 11:13:25   print_info: file format = GGUF V3 (latest)
May 13 11:13:25   print_info: file type = Q4_K - Medium
May 13 11:13:25   print_info: file size = 8.63 GiB (5.02 BPW)
May 13 11:13:25   load: special tokens cache size = 26
May 13 11:13:25   load: token to piece cache size = 0.9311 MB
May 13 11:13:25   print_info: arch = qwen3
May 13 11:13:25   print_info: vocab_only = 0
May 13 11:13:25   print_info: n_ctx_train = 40960
May 13 11:13:25   print_info: n_embd = 5120
May 13 11:13:25   print_info: n_layer = 40
May 13 11:13:25   print_info: n_head = 40
May 13 11:13:25   print_info: n_head_kv = 8
May 13 11:13:25   print_info: n_rot = 128
May 13 11:13:25   print_info: n_swa = 0
May 13 11:13:25   print_info: n_swa_pattern = 1
May 13 11:13:25   print_info: n_embd_head_k = 128
May 13 11:13:25   print_info: n_embd_head_v = 128
May 13 11:13:25   print_info: n_gqa = 5
May 13 11:13:25   print_info: n_embd_k_gqa = 1024
May 13 11:13:25   print_info: n_embd_v_gqa = 1024
May 13 11:13:25   print_info: f_norm_eps = 0.0e+00
May 13 11:13:25   print_info: f_norm_rms_eps = 1.0e-06
May 13 11:13:25   print_info: f_clamp_kqv = 0.0e+00
May 13 11:13:25   print_info: f_max_alibi_bias = 0.0e+00
May 13 11:13:25   print_info: f_logit_scale = 0.0e+00
May 13 11:13:25   print_info: f_attn_scale = 0.0e+00
May 13 11:13:25   print_info: n_ff = 17408
May 13 11:13:25   print_info: n_expert = 0
May 13 11:13:25   print_info: n_expert_used = 0
May 13 11:13:25   print_info: causal attn = 1
May 13 11:13:25   print_info: pooling type = 0
May 13 11:13:25   print_info: rope type = 2
May 13 11:13:25   print_info: rope scaling = linear
May 13 11:13:25   print_info: freq_base_train = 1000000.0
May 13 11:13:25   print_info: freq_scale_train = 1
May 13 11:13:25   print_info: n_ctx_orig_yarn = 40960
May 13 11:13:25   print_info: rope_finetuned = unknown
May 13 11:13:25   print_info: ssm_d_conv = 0
May 13 11:13:25   print_info: ssm_d_inner = 0
May 13 11:13:25   print_info: ssm_d_state = 0
May 13 11:13:25   print_info: ssm_dt_rank = 0
May 13 11:13:25   print_info: ssm_dt_b_c_rms = 0
May 13 11:13:25   print_info: model type = ?B
May 13 11:13:25   print_info: model params = 14.77 B
May 13 11:13:25   print_info: general.name = Qwen3 14B
May 13 11:13:25   print_info: vocab type = BPE
May 13 11:13:25   print_info: n_vocab = 151936
May 13 11:13:25   print_info: n_merges = 151387
May 13 11:13:25   print_info: BOS token = 151643 '<|endoftext|>'
May 13 11:13:25   print_info: EOS token = 151645 '<|im_end|>'
May 13 11:13:25   print_info: EOT token = 151643 '<|endoftext|>'
May 13 11:13:25   print_info: PAD token = 151643 '<|endoftext|>'
May 13 11:13:25   print_info: LF token = 198 'Ċ'
May 13 11:13:25   print_info: FIM PRE token = 151659 '<|fim_prefix|>'
May 13 11:13:25   print_info: FIM SUF token = 151661 '<|fim_suffix|>'
May 13 11:13:25   print_info: FIM MID token = 151660 '<|fim_middle|>'
May 13 11:13:25   print_info: FIM PAD token = 151662 '<|fim_pad|>'
May 13 11:13:25   print_info: FIM REP token = 151663 '<|repo_name|>'
May 13 11:13:25   print_info: FIM SEP token = 151664 '<|file_sep|>'
May 13 11:13:25   print_info: EOG token = 151643 '<|endoftext|>'
May 13 11:13:25   print_info: EOG token = 151645 '<|im_end|>'
May 13 11:13:25   print_info: EOG token = 151662 '<|fim_pad|>'
May 13 11:13:25   print_info: EOG token = 151663 '<|repo_name|>'
May 13 11:13:25   print_info: EOG token = 151664 '<|file_sep|>'
May 13 11:13:25   print_info: max token length = 256
May 13 11:13:25   load_tensors: loading model tensors, this can take a while... (mmap = false)
May 13 11:13:25   load_tensors: CPU model buffer size = 8840.78 MiB
May 13 11:13:31   llama_context: constructing llama_context
May 13 11:13:31   llama_context: n_seq_max = 4
May 13 11:13:31   llama_context: n_ctx = 8192
May 13 11:13:31   llama_context: n_ctx_per_seq = 2048
May 13 11:13:31   llama_context: n_batch = 2048
May 13 11:13:31   llama_context: n_ubatch = 512
May 13 11:13:31   llama_context: causal_attn = 1
May 13 11:13:31   llama_context: flash_attn = 0
May 13 11:13:31   llama_context: freq_base = 1000000.0
May 13 11:13:31   llama_context: freq_scale = 1
May 13 11:13:31   llama_context: n_ctx_per_seq (2048) < n_ctx_train (40960) -- the full capacity of the model will not be utilized
May 13 11:13:31   llama_context: CPU output buffer size = 2.40 MiB
May 13 11:13:31   init: kv_size = 8192, offload = 1, type_k = 'f16', type_v = 'f16', n_layer = 40, can_shift = 1
May 13 11:13:32   init: CPU KV buffer size = 1280.00 MiB
May 13 11:13:32   llama_context: KV self size = 1280.00 MiB, K (f16): 640.00 MiB, V (f16): 640.00 MiB
May 13 11:13:32   llama_context: CPU compute buffer size = 696.01 MiB
May 13 11:13:32   llama_context: graph nodes = 1526
May 13 11:13:32   llama_context: graph splits = 1
May 13 11:13:33   time=2025-05-13T11:13:33.092-04:00 level=INFO source=server.go:619 msg="llama runner started in 8.04 seconds"
May 13 11:13:33   [GIN] 2025/05/13 - 11:13:33 | 200 | 8.733650993s | 127.0.0.1 | POST "/api/generate"


May 13 11:13:57   time=2025-05-13T11:13:57.191-04:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
May 13 11:13:57   time=2025-05-13T11:13:57.236-04:00 level=WARN source=runner.go:131 msg="truncating input prompt" limit=2048 prompt=2065 keep=4 new=2048
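
One detail worth flagging in the log above: the runner was started with `--ctx-size 8192 --parallel 4`, so each request slot gets 8192 / 4 = 2048 tokens of context, and the 2065-token prompt is truncated (the final `truncating input prompt` warning). That truncation is worth ruling out when comparing backends. Raising the per-request context avoids it, e.g. in the REPL (the value 8192 is illustrative):

```console
>>> /set parameter num_ctx 8192
Set parameter 'num_ctx' to '8192'
```

or via the API with `"options": {"num_ctx": 8192}`.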
GiteaMirror added the bug label 2026-05-04 17:06:53 -05:00

@rick-github commented on GitHub (May 13, 2025):

How are you quantifying degradation? [Server logs](https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#how-to-troubleshoot-issues) may aid in debugging.

@kripper commented on GitHub (May 13, 2025):

> How are you quantifying degradation? [Server logs](https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#how-to-troubleshoot-issues) may aid in debugging.

I'm giving it a list of numbered rules and asking it to describe rule number 15.

On CPU, it always fails.
On GPU (split), it always works.

*always = I only tried 5 times
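
To put a number on "always", the trials can be scripted. A rough sketch, assuming a local server, the linked prompt.txt as input, `jq` for JSON quoting and extraction, and a hypothetical expected substring to grade against:

```shell
# Run 20 CPU-only trials and count how many answers contain the expected text.
PASS=0
for i in $(seq 1 20); do
  OUT=$(curl -s http://localhost:11434/api/generate -d "{
    \"model\": \"qwen3:14b\",
    \"prompt\": $(jq -Rs . < prompt.txt),
    \"stream\": false,
    \"options\": {\"num_gpu\": 0}
  }" | jq -r .response)
  echo "$OUT" | grep -qi "expected rule 15 text" && PASS=$((PASS+1))
done
echo "passed $PASS/20 CPU-only trials"
```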

@rick-github commented on GitHub (May 13, 2025):

$ ollama run qwen3:14b
>>> """ Tell me what rule 15 is in the following list:
... 1. **Be responsible for your actions** – Take ownership of decisions and their consequences.  
... 2. **Communicate clearly and respectfully** – Express ideas openly and listen actively to others.  
... 3. **Respect others’ time and boundaries** – Avoid interruptions and value personal space.  
... 4. **Prioritize honesty and integrity** – Always act with transparency and ethical principles.  
... 5. **Strive for continuous learning** – Seek knowledge and improve skills regularly.  
... 6. **Maintain physical and mental well-being** – Prioritize health through exercise, rest, and self-care.  
... 7. **Follow safety protocols** – Adhere to rules designed to protect people and property.  
... 8. **Adapt to change** – Stay flexible and embrace new challenges or opportunities.  
... 9. **Practice patience and perseverance** – Work through difficulties without giving up.  
... 10. **Hold yourself accountable** – Reflect on mistakes and take steps to improve.  
... 11. **Express gratitude** – Acknowledge the contributions and kindness of others.  
... 12. **Avoid assumptions** – Clarify misunderstandings before reacting.  
... 13. **Stay humble** – Recognize your limitations and learn from others.  
... 14. **Protect the environment** – Reduce waste, conserve resources, and minimize harm to nature.  
... 15. **Act with kindness** – Treat others with empathy and compassion.  
... 16. **Be fair and just** – Treat everyone equally and make unbiased decisions.  
... 17. **Think before you speak** – Avoid harmful words and choose your language carefully.  
... 18. **Collaborate effectively** – Work with others toward shared goals.  
... 19. **Celebrate progress, not just perfection** – Value effort and growth over flawless results.  
... 20. **Help others when needed** – Offer support and contribute to the well-being of your community.  
... """
<think>
Okay, let's see. The user is asking about Rule 15 in the list they provided. Let me check the list again.

Looking at the numbered list, Rule 15 is "Act with kindness – Treat others with empathy and compassion." The user just wants confirmation of that. But maybe they want 
more than just the statement. Perhaps they want an explanation or examples of how to apply it. Wait, the original question was straightforward: "Tell me what rule 15 is 
in the following list." So the answer is simply stating Rule 15 as given. But maybe the user wants to know the significance or context. However, since the list is 
provided in the question, the answer is directly from the list. I should make sure there's no confusion. Let me count again to confirm the numbering. Yes, Rule 15 is 
indeed the one about kindness. I should present that clearly.
</think>

Rule 15 in the list is:  
**"Act with kindness – Treat others with empathy and compassion."**  

This rule emphasizes the importance of showing care, understanding, and consideration toward others in all interactions.

>>> /clear
Cleared session context
>>> /set parameter num_gpu 0
Set parameter 'num_gpu' to '0'
>>> """ Tell me what rule 15 is in the following list:
... 1. **Be responsible for your actions** – Take ownership of decisions and their consequences.  
... 2. **Communicate clearly and respectfully** – Express ideas openly and listen actively to others.  
... 3. **Respect others’ time and boundaries** – Avoid interruptions and value personal space.  
... 4. **Prioritize honesty and integrity** – Always act with transparency and ethical principles.  
... 5. **Strive for continuous learning** – Seek knowledge and improve skills regularly.  
... 6. **Maintain physical and mental well-being** – Prioritize health through exercise, rest, and self-care.  
... 7. **Follow safety protocols** – Adhere to rules designed to protect people and property.  
... 8. **Adapt to change** – Stay flexible and embrace new challenges or opportunities.  
... 9. **Practice patience and perseverance** – Work through difficulties without giving up.  
... 10. **Hold yourself accountable** – Reflect on mistakes and take steps to improve.  
... 11. **Express gratitude** – Acknowledge the contributions and kindness of others.  
... 12. **Avoid assumptions** – Clarify misunderstandings before reacting.  
... 13. **Stay humble** – Recognize your limitations and learn from others.  
... 14. **Protect the environment** – Reduce waste, conserve resources, and minimize harm to nature.  
... 15. **Act with kindness** – Treat others with empathy and compassion.  
... 16. **Be fair and just** – Treat everyone equally and make unbiased decisions.  
... 17. **Think before you speak** – Avoid harmful words and choose your language carefully.  
... 18. **Collaborate effectively** – Work with others toward shared goals.  
... 19. **Celebrate progress, not just perfection** – Value effort and growth over flawless results.  
... 20. **Help others when needed** – Offer support and contribute to the well-being of your community.  
... """
<think>
Okay, let's see. The user is asking about Rule 15 in the list they provided. The list has 20 rules, each with a number and a description. Let me check the list again to 
make sure I have the right one.

Looking at the list, the rules are numbered from 1 to 20. The user is specifically asking for Rule 15. Let me count them again to confirm. 

1. Be responsible for your actions
2. Communicate clearly and respectfully
3. Respect others’ time and boundaries
4. Prioritize honesty and integrity
5. Strive for continuous learning
6. Maintain physical and mental well-being
7. Follow safety protocols
8. Adapt to change
9. Practice patience and perseverance
10. Hold yourself accountable
11. Express gratitude
12. Avoid assumptions
13. Stay humble
14. Protect the environment
15. Act with kindness
16. Be fair and just
17. Think before you speak
18. Collaborate effectively
19. Celebrate progress, not just perfection
20. Help others when needed

Yes, Rule 15 is "Act with kindness – Treat others with empathy and compassion." The user probably wants a clear answer confirming that. They might be checking if they 
have the correct rule or maybe they need more details. But the question seems straightforward, so the answer is just stating Rule 15 as given. No need for extra 
information unless the user asks for examples or how to apply it, which they didn't. So the answer is straightforward.
</think>

Rule 15 in the list is:  
**"Act with kindness – Treat others with empathy and compassion."**  

This rule emphasizes the importance of showing care, understanding, and generosity toward others, fostering positive relationships and a supportive environment.

>>> /bye
$ ollama ps
NAME         ID              SIZE      PROCESSOR    UNTIL   
qwen3:14b    7d7da67570e2    9.6 GB    100% CPU     Forever    

@kripper commented on GitHub (May 13, 2025):

> $ ollama run qwen3:14b
> >>> """ Tell me what rule 15 is in the following list:

My prompt is not that straightforward. I can remove sensitive info and provide a test case if necessary; I just wanted to check first whether this is known behaviour.

@kripper commented on GitHub (May 13, 2025):

I added my logs from the CPU-only machine.

@kripper commented on GitHub (May 13, 2025):

I added a test case to reproduce the problem.

@rick-github commented on GitHub (May 13, 2025):

What version of ollama is running on the CPU-only machine?

@rick-github commented on GitHub (May 13, 2025):

$ ollama -v
ollama version is 0.6.8
$ ollama run qwen3:14b --verbose
>>> """
... ### Instructions
... 
... Eres un bot de soporte técnico que procesa mails de clientes recibidos en la casilla de soporte.
...
... Que dice la regla 15?
... """
<think>
Okay, the user is asking about rule 15. Let me check the instructions again.

Looking through the rules, rule 15 says: If the client asks about the invoices that ServicioX issues to the client for the monthly service of ServicioX, or about the debts they have with ServicioX, then generate the action to send the liquidation with the client's ID.

So the main points here are that when a client inquires about their invoices or debts related to ServicioX's monthly service, the action is to send a liquidation. The action includes the client's ID. The user probably wants a clear summary of this rule. I should make sure to mention the trigger (invoices or debts inquiry) and the resulting action 
(send-liquidacion with the ID). Also, note that the ID is required. No other details are needed since the rule is straightforward.
</think>

La **regla 15** establece lo siguiente:

> **Si el cliente consulta por las facturas que emite ServicioX al cliente por el servicio mensual de ServicioX, o por las deudas que mantiene con ServicioX**, debes generar la acción:
> ```json
> {
>     "action": "send-liquidacion",
>     "ID": <ID del contribuyente cliente de ServicioX>
> }
> ```

### Detalles clave:
- **Condiciones de activación**: El cliente pregunta sobre:
  - Facturas emitidas por ServicioX relacionadas con el servicio mensual.
  - Deudas pendientes con ServicioX.
- **Acción requerida**: Generar una acción para enviar una **liquidación** (probablemente relacionada con facturación o saldos pendientes).
- **Argumento obligatorio**: El **ID del contribuyente cliente de ServicioX** (no se acepta valor `null` si no se proporciona).

No se requiere información adicional para cumplir con esta regla.

total duration:       11.051470043s
load duration:        276.136914ms
prompt eval count:    1973 token(s)
prompt eval duration: 1.10856585s
prompt eval rate:     1779.78 tokens/s
eval count:           418 token(s)
eval duration:        9.665861322s
eval rate:            43.24 tokens/s
... 
>>> /clear
Cleared session context
>>> /set parameter num_gpu 0
Set parameter 'num_gpu' to '0'
>>> """
... ### Instructions
...
... Que dice la regla 15?
... """
<think>
Okay, the user is asking about rule 15. Let me check the instructions again to make sure I get it right.

Looking through the list, rule 15 says: If the client asks about the invoices that ServicioX issues to the client for the monthly service of ServicioX, or about the debts they have with ServicioX, then generate the action "send-liquidacion" with the client's ID.

So the main points here are when the client is inquiring about their invoices or debts related to the monthly service. The action required is to send a liquidation, which involves the client's ID. The user probably wants a clear explanation of this rule, maybe for applying it correctly when handling such emails. They might be a support technician needing to 
know exactly what to do when a client mentions invoices or debts. I should make sure to highlight the trigger (invoices or debts) and the action (send-liquidacion with the ID). Also, note that the ID is a required argument here. No optional parameters mentioned for this action. Just need to confirm that the rule is correctly understood and that the action is 
generated only when the specific conditions are met.
</think>

La **regla 15** establece lo siguiente:

> **Si el cliente consulta por las facturas que emite ServicioX al cliente por el servicio mensual de ServicioX, o por las deudas que mantiene con ServicioX, debes generar la acción:**
> 
> ```json
> {
>     "action": "send-liquidacion",
>     "ID": <ID del contribuyente cliente de ServicioX>
> }
> ```

### Interpretación:
- **Condición de activación**: El cliente pregunta sobre **facturas mensuales** de ServicioX o **deudas pendientes** con la plataforma.
- **Acción requerida**: Enviar una liquidación (facturación o resumen de deudas) al cliente, proporcionando su **ID de contribuyente**.
- **Argumento obligatorio**: El `ID` del contribuyente (no es opcional). Si no se proporciona, **no se genera la acción** y se debe solicitar esa información al cliente.

### Ejemplo:
- **Correo del cliente**: *"¿Por qué no he recibido la factura mensual de ServicioX?"*
- **Respuesta**: Se genera la acción `send-liquidacion` con el `ID` del contribuyente (si se proporciona en el correo). Si no se incluye el `ID`, se debe pedir al cliente que lo proporcione.

total duration:       3m46.23970599s
load duration:        4.669283655s
prompt eval count:    1973 token(s)
prompt eval duration: 2m4.871704903s
prompt eval rate:     15.80 tokens/s
eval count:           547 token(s)
eval duration:        1m36.695757422s
eval rate:            5.66 tokens/s
>>> /bye
$ ollama ps
NAME         ID              SIZE      PROCESSOR    UNTIL   
qwen3:14b    7d7da67570e2    9.6 GB    100% CPU     Forever    

@kripper commented on GitHub (May 14, 2025):

This is the expected answer, but I'm not getting it with CPU-only.
Are you sure you used CPU-only?

@rick-github commented on GitHub (May 14, 2025):

> Are you sure you used CPU-only?

NAME         ID              SIZE      PROCESSOR    UNTIL   
qwen3:14b    7d7da67570e2    9.6 GB    100% CPU     Forever

What version of ollama is running on the CPU-only machine?

@kripper commented on GitHub (May 14, 2025):

> What version of ollama is running on the CPU-only machine?

ollama version is 0.6.6
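
For reference, on Linux the standard Ollama install script also upgrades an existing installation in place:

```shell
curl -fsSL https://ollama.com/install.sh | sh
```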

@kripper commented on GitHub (May 14, 2025):

I upgraded ollama to 0.6.8 and it fixed the problem.
I also tested qwen3:30b-a3b.
Thanks.

Reference: github-starred/ollama#69079