[GH-ISSUE #10681] Notable degradation in response quality when using CPU inference compared to GPU inference #69079

Closed
opened 2026-05-04 17:06:53 -05:00 by GiteaMirror · 12 comments

Originally created by @kripper on GitHub (May 13, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/10681

What is the issue?

I'm comparing the response quality of qwen3:14b on CPU-only vs. GPU inference, and I've found a notable quality degradation with CPU-only inference.

Is this normal? Why does this happen?
Is ollama/llama.cpp doing some optimization on CPU inference?
Can it be disabled?

This is the model I'm testing:

  Model
    architecture        qwen3
    parameters          14.8B
    context length      40960
    embedding length    5120
    quantization        Q4_K_M

  Capabilities
    completion
    tools

  Parameters
    repeat_penalty    1
    stop              "<|im_start|>"
    stop              "<|im_end|>"
    temperature       0.6
    top_k             20
    top_p             0.95

  License
    Apache License
    Version 2.0, January 2004

The CPU-only hardware is an Intel(R) Xeon(R) CPU E5-2630 v2 @ 2.60GHz, which supports only AVX (no AVX2).

Test case

See: https://ai.imatronix.com/a/tmp/prompt.txt
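
For a more controlled comparison, the sampler can be pinned so each backend decodes repeatably. A minimal sketch against the local Ollama HTTP API (the seed value is arbitrary, the prompt placeholder stands in for the linked prompt.txt, and `"num_gpu": 0` forces CPU-only on a machine that also has a GPU):

```shell
# CPU-only run: temperature 0 + fixed seed for repeatable decoding
curl -s http://localhost:11434/api/generate -d '{
  "model": "qwen3:14b",
  "prompt": "<contents of prompt.txt>",
  "stream": false,
  "options": {"temperature": 0, "seed": 42, "num_gpu": 0}
}'

# GPU run: identical request without the num_gpu override
curl -s http://localhost:11434/api/generate -d '{
  "model": "qwen3:14b",
  "prompt": "<contents of prompt.txt>",
  "stream": false,
  "options": {"temperature": 0, "seed": 42}
}'
```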

Relevant log output

May 13 11:13:24   [GIN] 2025/05/13 - 11:13:24 | 200 | 84.255µs | 127.0.0.1 | HEAD "/"
May 13 11:13:24   time=2025-05-13T11:13:24.285-04:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
May 13 11:13:24   time=2025-05-13T11:13:24.349-04:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
May 13 11:13:24   [GIN] 2025/05/13 - 11:13:24 | 200 | 133.819676ms | 127.0.0.1 | POST "/api/show"
May 13 11:13:24   time=2025-05-13T11:13:24.426-04:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
May 13 11:13:24   time=2025-05-13T11:13:24.489-04:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
May 13 11:13:24   time=2025-05-13T11:13:24.551-04:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
May 13 11:13:24   time=2025-05-13T11:13:24.554-04:00 level=INFO source=server.go:105 msg="system memory" total="377.3 GiB" free="269.4 GiB" free_swap="0 B"
May 13 11:13:24   time=2025-05-13T11:13:24.554-04:00 level=WARN source=ggml.go:152 msg="key not found" key=qwen3.vision.block_count default=0
May 13 11:13:24   time=2025-05-13T11:13:24.555-04:00 level=INFO source=server.go:138 msg=offload library=cpu layers.requested=-1 layers.model=41 layers.offload=0 layers.split="" memory.available="[269.4 GiB]" memory.gpu_overhead="0 B" memory.required.full="10.8 GiB" memory.required.partial="0 B" memory.required.kv="1.2 GiB" memory.required.allocations="[10.8 GiB]" memory.weights.total="8.2 GiB" memory.weights.repeating="7.6 GiB" memory.weights.nonrepeating="608.6 MiB" memory.graph.full="1.0 GiB" memory.graph.partial="1.0 GiB"
May 13 11:13:24   llama_model_loader: loaded meta data with 27 key-value pairs and 443 tensors from /usr/share/ollama/.ollama/models/blobs/sha256-a8cc1361f3145dc01f6d77c6c82c9116b9ffe3c97b34716fe20418455876c40e (version GGUF V3 (latest))
May 13 11:13:24   llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
May 13 11:13:24   llama_model_loader: - kv 0: general.architecture str = qwen3
May 13 11:13:24   llama_model_loader: - kv 1: general.type str = model
May 13 11:13:24   llama_model_loader: - kv 2: general.name str = Qwen3 14B
May 13 11:13:24   llama_model_loader: - kv 3: general.basename str = Qwen3
May 13 11:13:24   llama_model_loader: - kv 4: general.size_label str = 14B
May 13 11:13:24   llama_model_loader: - kv 5: qwen3.block_count u32 = 40
May 13 11:13:24   llama_model_loader: - kv 6: qwen3.context_length u32 = 40960
May 13 11:13:24   llama_model_loader: - kv 7: qwen3.embedding_length u32 = 5120
May 13 11:13:24   llama_model_loader: - kv 8: qwen3.feed_forward_length u32 = 17408
May 13 11:13:24   llama_model_loader: - kv 9: qwen3.attention.head_count u32 = 40
May 13 11:13:24   llama_model_loader: - kv 10: qwen3.attention.head_count_kv u32 = 8
May 13 11:13:24   llama_model_loader: - kv 11: qwen3.rope.freq_base f32 = 1000000.000000
May 13 11:13:24   llama_model_loader: - kv 12: qwen3.attention.layer_norm_rms_epsilon f32 = 0.000001
May 13 11:13:24   llama_model_loader: - kv 13: qwen3.attention.key_length u32 = 128
May 13 11:13:24   llama_model_loader: - kv 14: qwen3.attention.value_length u32 = 128
May 13 11:13:24   llama_model_loader: - kv 15: tokenizer.ggml.model str = gpt2
May 13 11:13:24   llama_model_loader: - kv 16: tokenizer.ggml.pre str = qwen2
May 13 11:13:24   llama_model_loader: - kv 17: tokenizer.ggml.tokens arr[str,151936] = ["!", "\"", "#", "$", "%", "&", "'", ...
May 13 11:13:24   llama_model_loader: - kv 18: tokenizer.ggml.token_type arr[i32,151936] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
May 13 11:13:24   llama_model_loader: - kv 19: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
May 13 11:13:24   llama_model_loader: - kv 20: tokenizer.ggml.eos_token_id u32 = 151645
May 13 11:13:24   llama_model_loader: - kv 21: tokenizer.ggml.padding_token_id u32 = 151643
May 13 11:13:24   llama_model_loader: - kv 22: tokenizer.ggml.bos_token_id u32 = 151643
May 13 11:13:24   llama_model_loader: - kv 23: tokenizer.ggml.add_bos_token bool = false
May 13 11:13:24   llama_model_loader: - kv 24: tokenizer.chat_template str = {%- if tools %}\n {{- '<|im_start|>...
May 13 11:13:24   llama_model_loader: - kv 25: general.quantization_version u32 = 2
May 13 11:13:24   llama_model_loader: - kv 26: general.file_type u32 = 15
May 13 11:13:24   llama_model_loader: - type f32: 161 tensors
May 13 11:13:24   llama_model_loader: - type f16: 40 tensors
May 13 11:13:24   llama_model_loader: - type q4_K: 221 tensors
May 13 11:13:24   llama_model_loader: - type q6_K: 21 tensors
May 13 11:13:24   print_info: file format = GGUF V3 (latest)
May 13 11:13:24   print_info: file type = Q4_K - Medium
May 13 11:13:24   print_info: file size = 8.63 GiB (5.02 BPW)
May 13 11:13:24   load: special tokens cache size = 26
May 13 11:13:25   load: token to piece cache size = 0.9311 MB
May 13 11:13:25   print_info: arch = qwen3
May 13 11:13:25   print_info: vocab_only = 1
May 13 11:13:25   print_info: model type = ?B
May 13 11:13:25   print_info: model params = 14.77 B
May 13 11:13:25   print_info: general.name = Qwen3 14B
May 13 11:13:25   print_info: vocab type = BPE
May 13 11:13:25   print_info: n_vocab = 151936
May 13 11:13:25   print_info: n_merges = 151387
May 13 11:13:25   print_info: BOS token = 151643 '<|endoftext|>'
May 13 11:13:25   print_info: EOS token = 151645 '<|im_end|>'
May 13 11:13:25   print_info: EOT token = 151643 '<|endoftext|>'
May 13 11:13:25   print_info: PAD token = 151643 '<|endoftext|>'
May 13 11:13:25   print_info: LF token = 198 'Ċ'
May 13 11:13:25   print_info: FIM PRE token = 151659 '<|fim_prefix|>'
May 13 11:13:25   print_info: FIM SUF token = 151661 '<|fim_suffix|>'
May 13 11:13:25   print_info: FIM MID token = 151660 '<|fim_middle|>'
May 13 11:13:25   print_info: FIM PAD token = 151662 '<|fim_pad|>'
May 13 11:13:25   print_info: FIM REP token = 151663 '<|repo_name|>'
May 13 11:13:25   print_info: FIM SEP token = 151664 '<|file_sep|>'
May 13 11:13:25   print_info: EOG token = 151643 '<|endoftext|>'
May 13 11:13:25   print_info: EOG token = 151645 '<|im_end|>'
May 13 11:13:25   print_info: EOG token = 151662 '<|fim_pad|>'
May 13 11:13:25   print_info: EOG token = 151663 '<|repo_name|>'
May 13 11:13:25   print_info: EOG token = 151664 '<|file_sep|>'
May 13 11:13:25   print_info: max token length = 256
May 13 11:13:25   llama_model_load: vocab only - skipping tensors
May 13 11:13:25   time=2025-05-13T11:13:25.052-04:00 level=INFO source=server.go:405 msg="starting llama server" cmd="/usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-a8cc1361f3145dc01f6d77c6c82c9116b9ffe3c97b34716fe20418455876c40e --ctx-size 8192 --batch-size 512 --threads 12 --no-mmap --parallel 4 --port 38897"
May 13 11:13:25   time=2025-05-13T11:13:25.053-04:00 level=INFO source=sched.go:451 msg="loaded runners" count=1
May 13 11:13:25   time=2025-05-13T11:13:25.053-04:00 level=INFO source=server.go:580 msg="waiting for llama runner to start responding"
May 13 11:13:25   time=2025-05-13T11:13:25.054-04:00 level=INFO source=server.go:614 msg="waiting for server to become available" status="llm server error"
May 13 11:13:25   time=2025-05-13T11:13:25.081-04:00 level=INFO source=runner.go:853 msg="starting go runner"
May 13 11:13:25   load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-sandybridge.so
May 13 11:13:25   time=2025-05-13T11:13:25.119-04:00 level=INFO source=ggml.go:109 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
May 13 11:13:25   time=2025-05-13T11:13:25.120-04:00 level=INFO source=runner.go:913 msg="Server listening on 127.0.0.1:38897"
May 13 11:13:25   llama_model_loader: loaded meta data with 27 key-value pairs and 443 tensors from /usr/share/ollama/.ollama/models/blobs/sha256-a8cc1361f3145dc01f6d77c6c82c9116b9ffe3c97b34716fe20418455876c40e (version GGUF V3 (latest))
May 13 11:13:25   llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
May 13 11:13:25   llama_model_loader: - kv 0: general.architecture str = qwen3
May 13 11:13:25   llama_model_loader: - kv 1: general.type str = model
May 13 11:13:25   llama_model_loader: - kv 2: general.name str = Qwen3 14B
May 13 11:13:25   llama_model_loader: - kv 3: general.basename str = Qwen3
May 13 11:13:25   llama_model_loader: - kv 4: general.size_label str = 14B
May 13 11:13:25   llama_model_loader: - kv 5: qwen3.block_count u32 = 40
May 13 11:13:25   llama_model_loader: - kv 6: qwen3.context_length u32 = 40960
May 13 11:13:25   llama_model_loader: - kv 7: qwen3.embedding_length u32 = 5120
May 13 11:13:25   llama_model_loader: - kv 8: qwen3.feed_forward_length u32 = 17408
May 13 11:13:25   llama_model_loader: - kv 9: qwen3.attention.head_count u32 = 40
May 13 11:13:25   llama_model_loader: - kv 10: qwen3.attention.head_count_kv u32 = 8
May 13 11:13:25   llama_model_loader: - kv 11: qwen3.rope.freq_base f32 = 1000000.000000
May 13 11:13:25   llama_model_loader: - kv 12: qwen3.attention.layer_norm_rms_epsilon f32 = 0.000001
May 13 11:13:25   llama_model_loader: - kv 13: qwen3.attention.key_length u32 = 128
May 13 11:13:25   llama_model_loader: - kv 14: qwen3.attention.value_length u32 = 128
May 13 11:13:25   llama_model_loader: - kv 15: tokenizer.ggml.model str = gpt2
May 13 11:13:25   llama_model_loader: - kv 16: tokenizer.ggml.pre str = qwen2
May 13 11:13:25   llama_model_loader: - kv 17: tokenizer.ggml.tokens arr[str,151936] = ["!", "\"", "#", "$", "%", "&", "'", ...
May 13 11:13:25   time=2025-05-13T11:13:25.306-04:00 level=INFO source=server.go:614 msg="waiting for server to become available" status="llm server loading model"
May 13 11:13:25   llama_model_loader: - kv 18: tokenizer.ggml.token_type arr[i32,151936] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
May 13 11:13:25   llama_model_loader: - kv 19: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
May 13 11:13:25   llama_model_loader: - kv 20: tokenizer.ggml.eos_token_id u32 = 151645
May 13 11:13:25   llama_model_loader: - kv 21: tokenizer.ggml.padding_token_id u32 = 151643
May 13 11:13:25   llama_model_loader: - kv 22: tokenizer.ggml.bos_token_id u32 = 151643
May 13 11:13:25   llama_model_loader: - kv 23: tokenizer.ggml.add_bos_token bool = false
May 13 11:13:25   llama_model_loader: - kv 24: tokenizer.chat_template str = {%- if tools %}\n {{- '<|im_start|>...
May 13 11:13:25   llama_model_loader: - kv 25: general.quantization_version u32 = 2
May 13 11:13:25   llama_model_loader: - kv 26: general.file_type u32 = 15
May 13 11:13:25   llama_model_loader: - type f32: 161 tensors
May 13 11:13:25   llama_model_loader: - type f16: 40 tensors
May 13 11:13:25   llama_model_loader: - type q4_K: 221 tensors
May 13 11:13:25   llama_model_loader: - type q6_K: 21 tensors
May 13 11:13:25   print_info: file format = GGUF V3 (latest)
May 13 11:13:25   print_info: file type = Q4_K - Medium
May 13 11:13:25   print_info: file size = 8.63 GiB (5.02 BPW)
May 13 11:13:25   load: special tokens cache size = 26
May 13 11:13:25   load: token to piece cache size = 0.9311 MB
May 13 11:13:25   print_info: arch = qwen3
May 13 11:13:25   print_info: vocab_only = 0
May 13 11:13:25   print_info: n_ctx_train = 40960
May 13 11:13:25   print_info: n_embd = 5120
May 13 11:13:25   print_info: n_layer = 40
May 13 11:13:25   print_info: n_head = 40
May 13 11:13:25   print_info: n_head_kv = 8
May 13 11:13:25   print_info: n_rot = 128
May 13 11:13:25   print_info: n_swa = 0
May 13 11:13:25   print_info: n_swa_pattern = 1
May 13 11:13:25   print_info: n_embd_head_k = 128
May 13 11:13:25   print_info: n_embd_head_v = 128
May 13 11:13:25   print_info: n_gqa = 5
May 13 11:13:25   print_info: n_embd_k_gqa = 1024
May 13 11:13:25   print_info: n_embd_v_gqa = 1024
May 13 11:13:25   print_info: f_norm_eps = 0.0e+00
May 13 11:13:25   print_info: f_norm_rms_eps = 1.0e-06
May 13 11:13:25   print_info: f_clamp_kqv = 0.0e+00
May 13 11:13:25   print_info: f_max_alibi_bias = 0.0e+00
May 13 11:13:25   print_info: f_logit_scale = 0.0e+00
May 13 11:13:25   print_info: f_attn_scale = 0.0e+00
May 13 11:13:25   print_info: n_ff = 17408
May 13 11:13:25   print_info: n_expert = 0
May 13 11:13:25   print_info: n_expert_used = 0
May 13 11:13:25   print_info: causal attn = 1
May 13 11:13:25   print_info: pooling type = 0
May 13 11:13:25   print_info: rope type = 2
May 13 11:13:25   print_info: rope scaling = linear
May 13 11:13:25   print_info: freq_base_train = 1000000.0
May 13 11:13:25   print_info: freq_scale_train = 1
May 13 11:13:25   print_info: n_ctx_orig_yarn = 40960
May 13 11:13:25   print_info: rope_finetuned = unknown
May 13 11:13:25   print_info: ssm_d_conv = 0
May 13 11:13:25   print_info: ssm_d_inner = 0
May 13 11:13:25   print_info: ssm_d_state = 0
May 13 11:13:25   print_info: ssm_dt_rank = 0
May 13 11:13:25   print_info: ssm_dt_b_c_rms = 0
May 13 11:13:25   print_info: model type = ?B
May 13 11:13:25   print_info: model params = 14.77 B
May 13 11:13:25   print_info: general.name = Qwen3 14B
May 13 11:13:25   print_info: vocab type = BPE
May 13 11:13:25   print_info: n_vocab = 151936
May 13 11:13:25   print_info: n_merges = 151387
May 13 11:13:25   print_info: BOS token = 151643 '<|endoftext|>'
May 13 11:13:25   print_info: EOS token = 151645 '<|im_end|>'
May 13 11:13:25   print_info: EOT token = 151643 '<|endoftext|>'
May 13 11:13:25   print_info: PAD token = 151643 '<|endoftext|>'
May 13 11:13:25   print_info: LF token = 198 'Ċ'
May 13 11:13:25   print_info: FIM PRE token = 151659 '<|fim_prefix|>'
May 13 11:13:25   print_info: FIM SUF token = 151661 '<|fim_suffix|>'
May 13 11:13:25   print_info: FIM MID token = 151660 '<|fim_middle|>'
May 13 11:13:25   print_info: FIM PAD token = 151662 '<|fim_pad|>'
May 13 11:13:25   print_info: FIM REP token = 151663 '<|repo_name|>'
May 13 11:13:25   print_info: FIM SEP token = 151664 '<|file_sep|>'
May 13 11:13:25   print_info: EOG token = 151643 '<|endoftext|>'
May 13 11:13:25   print_info: EOG token = 151645 '<|im_end|>'
May 13 11:13:25   print_info: EOG token = 151662 '<|fim_pad|>'
May 13 11:13:25   print_info: EOG token = 151663 '<|repo_name|>'
May 13 11:13:25   print_info: EOG token = 151664 '<|file_sep|>'
May 13 11:13:25   print_info: max token length = 256
May 13 11:13:25   load_tensors: loading model tensors, this can take a while... (mmap = false)
May 13 11:13:25   load_tensors: CPU model buffer size = 8840.78 MiB
May 13 11:13:31   llama_context: constructing llama_context
May 13 11:13:31   llama_context: n_seq_max = 4
May 13 11:13:31   llama_context: n_ctx = 8192
May 13 11:13:31   llama_context: n_ctx_per_seq = 2048
May 13 11:13:31   llama_context: n_batch = 2048
May 13 11:13:31   llama_context: n_ubatch = 512
May 13 11:13:31   llama_context: causal_attn = 1
May 13 11:13:31   llama_context: flash_attn = 0
May 13 11:13:31   llama_context: freq_base = 1000000.0
May 13 11:13:31   llama_context: freq_scale = 1
May 13 11:13:31   llama_context: n_ctx_per_seq (2048) < n_ctx_train (40960) -- the full capacity of the model will not be utilized
May 13 11:13:31   llama_context: CPU output buffer size = 2.40 MiB
May 13 11:13:31   init: kv_size = 8192, offload = 1, type_k = 'f16', type_v = 'f16', n_layer = 40, can_shift = 1
May 13 11:13:32   init: CPU KV buffer size = 1280.00 MiB
May 13 11:13:32   llama_context: KV self size = 1280.00 MiB, K (f16): 640.00 MiB, V (f16): 640.00 MiB
May 13 11:13:32   llama_context: CPU compute buffer size = 696.01 MiB
May 13 11:13:32   llama_context: graph nodes = 1526
May 13 11:13:32   llama_context: graph splits = 1
May 13 11:13:33   time=2025-05-13T11:13:33.092-04:00 level=INFO source=server.go:619 msg="llama runner started in 8.04 seconds"
May 13 11:13:33   [GIN] 2025/05/13 - 11:13:33 | 200 | 8.733650993s | 127.0.0.1 | POST "/api/generate"


May 13 11:13:57   time=2025-05-13T11:13:57.191-04:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
May 13 11:13:57   time=2025-05-13T11:13:57.236-04:00 level=WARN source=runner.go:131 msg="truncating input prompt" limit=2048 prompt=2065 keep=4 new=2048
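
One detail worth flagging in the log above: the runner was started with `--ctx-size 8192 --parallel 4`, so each request slot gets 8192 / 4 = 2048 tokens of context, and the 2065-token prompt is truncated (the final `truncating input prompt` warning). That truncation is worth ruling out when comparing backends. Raising the per-request context avoids it, e.g. in the REPL (the value 8192 is illustrative):

```console
>>> /set parameter num_ctx 8192
Set parameter 'num_ctx' to '8192'
```

or via the API with `"options": {"num_ctx": 8192}`.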
GiteaMirror added the bug label 2026-05-04 17:06:53 -05:00

@rick-github commented on GitHub (May 13, 2025):

How are you quantifying degradation? [Server logs](https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#how-to-troubleshoot-issues) may aid in debugging.

@kripper commented on GitHub (May 13, 2025):

> How are you quantifying degradation? [Server logs](https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#how-to-troubleshoot-issues) may aid in debugging.

I'm giving it a list of numbered rules and asking it to describe rule number 15.

On CPU, it always fails.
On GPU (split), it always works.

*always = I only tried 5 times
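
To put a number on "always", the trials can be scripted. A rough sketch, assuming a local server, the linked prompt.txt as input, `jq` for JSON quoting and extraction, and a hypothetical expected substring to grade against:

```shell
# Run 20 CPU-only trials and count how many answers contain the expected text.
PASS=0
for i in $(seq 1 20); do
  OUT=$(curl -s http://localhost:11434/api/generate -d "{
    \"model\": \"qwen3:14b\",
    \"prompt\": $(jq -Rs . < prompt.txt),
    \"stream\": false,
    \"options\": {\"num_gpu\": 0}
  }" | jq -r .response)
  echo "$OUT" | grep -qi "expected rule 15 text" && PASS=$((PASS+1))
done
echo "passed $PASS/20 CPU-only trials"
```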

@rick-github commented on GitHub (May 13, 2025):

$ ollama run qwen3:14b
>>> """ Tell me what rule 15 is in the following list:
... 1. **Be responsible for your actions** – Take ownership of decisions and their consequences.  
... 2. **Communicate clearly and respectfully** – Express ideas openly and listen actively to others.  
... 3. **Respect others’ time and boundaries** – Avoid interruptions and value personal space.  
... 4. **Prioritize honesty and integrity** – Always act with transparency and ethical principles.  
... 5. **Strive for continuous learning** – Seek knowledge and improve skills regularly.  
... 6. **Maintain physical and mental well-being** – Prioritize health through exercise, rest, and self-care.  
... 7. **Follow safety protocols** – Adhere to rules designed to protect people and property.  
... 8. **Adapt to change** – Stay flexible and embrace new challenges or opportunities.  
... 9. **Practice patience and perseverance** – Work through difficulties without giving up.  
... 10. **Hold yourself accountable** – Reflect on mistakes and take steps to improve.  
... 11. **Express gratitude** – Acknowledge the contributions and kindness of others.  
... 12. **Avoid assumptions** – Clarify misunderstandings before reacting.  
... 13. **Stay humble** – Recognize your limitations and learn from others.  
... 14. **Protect the environment** – Reduce waste, conserve resources, and minimize harm to nature.  
... 15. **Act with kindness** – Treat others with empathy and compassion.  
... 16. **Be fair and just** – Treat everyone equally and make unbiased decisions.  
... 17. **Think before you speak** – Avoid harmful words and choose your language carefully.  
... 18. **Collaborate effectively** – Work with others toward shared goals.  
... 19. **Celebrate progress, not just perfection** – Value effort and growth over flawless results.  
... 20. **Help others when needed** – Offer support and contribute to the well-being of your community.  
... """
<think>
Okay, let's see. The user is asking about Rule 15 in the list they provided. Let me check the list again.

Looking at the numbered list, Rule 15 is "Act with kindness – Treat others with empathy and compassion." The user just wants confirmation of that. But maybe they want 
more than just the statement. Perhaps they want an explanation or examples of how to apply it. Wait, the original question was straightforward: "Tell me what rule 15 is 
in the following list." So the answer is simply stating Rule 15 as given. But maybe the user wants to know the significance or context. However, since the list is 
provided in the question, the answer is directly from the list. I should make sure there's no confusion. Let me count again to confirm the numbering. Yes, Rule 15 is 
indeed the one about kindness. I should present that clearly.
</think>

Rule 15 in the list is:  
**"Act with kindness – Treat others with empathy and compassion."**  

This rule emphasizes the importance of showing care, understanding, and consideration toward others in all interactions.

>>> /clear
Cleared session context
>>> /set parameter num_gpu 0
Set parameter 'num_gpu' to '0'
>>> """ Tell me what rule 15 is in the following list:
... 1. **Be responsible for your actions** – Take ownership of decisions and their consequences.  
... 2. **Communicate clearly and respectfully** – Express ideas openly and listen actively to others.  
... 3. **Respect others’ time and boundaries** – Avoid interruptions and value personal space.  
... 4. **Prioritize honesty and integrity** – Always act with transparency and ethical principles.  
... 5. **Strive for continuous learning** – Seek knowledge and improve skills regularly.  
... 6. **Maintain physical and mental well-being** – Prioritize health through exercise, rest, and self-care.  
... 7. **Follow safety protocols** – Adhere to rules designed to protect people and property.  
... 8. **Adapt to change** – Stay flexible and embrace new challenges or opportunities.  
... 9. **Practice patience and perseverance** – Work through difficulties without giving up.  
... 10. **Hold yourself accountable** – Reflect on mistakes and take steps to improve.  
... 11. **Express gratitude** – Acknowledge the contributions and kindness of others.  
... 12. **Avoid assumptions** – Clarify misunderstandings before reacting.  
... 13. **Stay humble** – Recognize your limitations and learn from others.  
... 14. **Protect the environment** – Reduce waste, conserve resources, and minimize harm to nature.  
... 15. **Act with kindness** – Treat others with empathy and compassion.  
... 16. **Be fair and just** – Treat everyone equally and make unbiased decisions.  
... 17. **Think before you speak** – Avoid harmful words and choose your language carefully.  
... 18. **Collaborate effectively** – Work with others toward shared goals.  
... 19. **Celebrate progress, not just perfection** – Value effort and growth over flawless results.  
... 20. **Help others when needed** – Offer support and contribute to the well-being of your community.  
... """
<think>
Okay, let's see. The user is asking about Rule 15 in the list they provided. The list has 20 rules, each with a number and a description. Let me check the list again to 
make sure I have the right one.

Looking at the list, the rules are numbered from 1 to 20. The user is specifically asking for Rule 15. Let me count them again to confirm. 

1. Be responsible for your actions
2. Communicate clearly and respectfully
3. Respect others’ time and boundaries
4. Prioritize honesty and integrity
5. Strive for continuous learning
6. Maintain physical and mental well-being
7. Follow safety protocols
8. Adapt to change
9. Practice patience and perseverance
10. Hold yourself accountable
11. Express gratitude
12. Avoid assumptions
13. Stay humble
14. Protect the environment
15. Act with kindness
16. Be fair and just
17. Think before you speak
18. Collaborate effectively
19. Celebrate progress, not just perfection
20. Help others when needed

Yes, Rule 15 is "Act with kindness – Treat others with empathy and compassion." The user probably wants a clear answer confirming that. They might be checking if they 
have the correct rule or maybe they need more details. But the question seems straightforward, so the answer is just stating Rule 15 as given. No need for extra 
information unless the user asks for examples or how to apply it, which they didn't. So the answer is straightforward.
</think>

Rule 15 in the list is:  
**"Act with kindness – Treat others with empathy and compassion."**  

This rule emphasizes the importance of showing care, understanding, and generosity toward others, fostering positive relationships and a supportive environment.

>>> /bye
$ ollama ps
NAME         ID              SIZE      PROCESSOR    UNTIL   
qwen3:14b    7d7da67570e2    9.6 GB    100% CPU     Forever    

@kripper commented on GitHub (May 13, 2025):

> $ ollama run qwen3:14b
> >>> """ Tell me what rule 15 is in the following list:

My prompt is not that straightforward. I can remove sensitive info and provide a test case if necessary; I just wanted to check first whether this is known behaviour.

@kripper commented on GitHub (May 13, 2025):

I added my logs from the CPU-only machine.

@kripper commented on GitHub (May 13, 2025):

I added a test case to reproduce the problem.

@rick-github commented on GitHub (May 13, 2025):

What version of ollama is running on the CPU-only machine?

@rick-github commented on GitHub (May 13, 2025):

$ ollama -v
ollama version is 0.6.8
$ ollama run qwen3:14b --verbose
>>> """
... ### Instructions
... 
... Eres un bot de soporte técnico que procesa mails de clientes recibidos en la casilla de soporte.
...
... Que dice la regla 15?
... """
<think>
Okay, the user is asking about rule 15. Let me check the instructions again.

Looking through the rules, rule 15 says: If the client asks about the invoices that ServicioX issues to the client for the monthly service of ServicioX, or about the debts they have with ServicioX, then generate the action to send the liquidation with the client's ID.

So the main points here are that when a client inquires about their invoices or debts related to ServicioX's monthly service, the action is to send a liquidation. The action includes the client's ID. The user probably wants a clear summary of this rule. I should make sure to mention the trigger (invoices or debts inquiry) and the resulting action 
(send-liquidacion with the ID). Also, note that the ID is required. No other details are needed since the rule is straightforward.
</think>

La **regla 15** establece lo siguiente:

> **Si el cliente consulta por las facturas que emite ServicioX al cliente por el servicio mensual de ServicioX, o por las deudas que mantiene con ServicioX**, debes generar la acción:
> ```json
> {
>     "action": "send-liquidacion",
>     "ID": <ID del contribuyente cliente de ServicioX>
> }
> ```

### Detalles clave:
- **Condiciones de activación**: El cliente pregunta sobre:
  - Facturas emitidas por ServicioX relacionadas con el servicio mensual.
  - Deudas pendientes con ServicioX.
- **Acción requerida**: Generar una acción para enviar una **liquidación** (probablemente relacionada con facturación o saldos pendientes).
- **Argumento obligatorio**: El **ID del contribuyente cliente de ServicioX** (no se acepta valor `null` si no se proporciona).

No se requiere información adicional para cumplir con esta regla.

total duration:       11.051470043s
load duration:        276.136914ms
prompt eval count:    1973 token(s)
prompt eval duration: 1.10856585s
prompt eval rate:     1779.78 tokens/s
eval count:           418 token(s)
eval duration:        9.665861322s
eval rate:            43.24 tokens/s
... 
>>> /clear
Cleared session context
>>> /set parameter num_gpu 0
Set parameter 'num_gpu' to '0'
>>> """
... ### Instructions
...
... Que dice la regla 15?
... """
<think>
Okay, the user is asking about rule 15. Let me check the instructions again to make sure I get it right.

Looking through the list, rule 15 says: If the client asks about the invoices that ServicioX issues to the client for the monthly service of ServicioX, or about the debts they have with ServicioX, then generate the action "send-liquidacion" with the client's ID.

So the main points here are when the client is inquiring about their invoices or debts related to the monthly service. The action required is to send a liquidation, which involves the client's ID. The user probably wants a clear explanation of this rule, maybe for applying it correctly when handling such emails. They might be a support technician needing to 
know exactly what to do when a client mentions invoices or debts. I should make sure to highlight the trigger (invoices or debts) and the action (send-liquidacion with the ID). Also, note that the ID is a required argument here. No optional parameters mentioned for this action. Just need to confirm that the rule is correctly understood and that the action is 
generated only when the specific conditions are met.
</think>

La **regla 15** establece lo siguiente:

> **Si el cliente consulta por las facturas que emite ServicioX al cliente por el servicio mensual de ServicioX, o por las deudas que mantiene con ServicioX, debes generar la acción:**
> 
> ```json
> {
>     "action": "send-liquidacion",
>     "ID": <ID del contribuyente cliente de ServicioX>
> }
> ```

### Interpretación:
- **Condición de activación**: El cliente pregunta sobre **facturas mensuales** de ServicioX o **deudas pendientes** con la plataforma.
- **Acción requerida**: Enviar una liquidación (facturación o resumen de deudas) al cliente, proporcionando su **ID de contribuyente**.
- **Argumento obligatorio**: El `ID` del contribuyente (no es opcional). Si no se proporciona, **no se genera la acción** y se debe solicitar esa información al cliente.

### Ejemplo:
- **Correo del cliente**: *"¿Por qué no he recibido la factura mensual de ServicioX?"*
- **Respuesta**: Se genera la acción `send-liquidacion` con el `ID` del contribuyente (si se proporciona en el correo). Si no se incluye el `ID`, se debe pedir al cliente que lo proporcione.

total duration:       3m46.23970599s
load duration:        4.669283655s
prompt eval count:    1973 token(s)
prompt eval duration: 2m4.871704903s
prompt eval rate:     15.80 tokens/s
eval count:           547 token(s)
eval duration:        1m36.695757422s
eval rate:            5.66 tokens/s
>>> /bye
$ ollama ps
NAME         ID              SIZE      PROCESSOR    UNTIL   
qwen3:14b    7d7da67570e2    9.6 GB    100% CPU     Forever    

@kripper commented on GitHub (May 14, 2025):

This is the expected answer, but I'm not getting it with CPU-only.
Are you sure you used CPU-only?

@rick-github commented on GitHub (May 14, 2025):

> Are you sure you used CPU-only?

NAME         ID              SIZE      PROCESSOR    UNTIL   
qwen3:14b    7d7da67570e2    9.6 GB    100% CPU     Forever

What version of ollama is running on the CPU-only machine?

@kripper commented on GitHub (May 14, 2025):

> What version of ollama is running on the CPU-only machine?

ollama version is 0.6.6
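
For reference, on Linux the standard Ollama install script also upgrades an existing installation in place:

```shell
curl -fsSL https://ollama.com/install.sh | sh
```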

@kripper commented on GitHub (May 14, 2025):

I upgraded ollama to 0.6.8 and it fixed the problem.
I also tested qwen3:30b-a3b.
Thanks.

Reference: github-starred/ollama#69079