[GH-ISSUE #11043] Please keep Q6_K quantization support in Ollama #33044

Open
opened 2026-04-22 15:13:12 -05:00 by GiteaMirror · 10 comments

Originally created by @Burnarz on GitHub (Jun 11, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/11043

I understand the desire to reduce and streamline quantization support, as stated in this comment: https://github.com/ollama/ollama/pull/10647#issuecomment-2873563847
But in my humble opinion, dropping support for Q6_K is not the best move.

The idea is:

Reintroduce support for the Q6_K quantization format in Ollama, either as a first-class option or as an advanced override for model loading.

My use case:

Many developers (including myself) run Ollama models on consumer GPUs with 24GB of VRAM, like the RTX 3090 or 4090.

While Q4_K is great for memory efficiency, Q6_K hits a perfect sweet spot between performance and output quality. It leverages available VRAM more effectively, giving us:

  • Noticeably better generation quality than Q4_K, especially in long-form or nuanced outputs.
  • Still lightweight enough to run fast, with acceptable token speeds on 24GB GPUs.
  • Avoids the performance and memory overhead of full FP16 or Q8_0 (see the rough footprint numbers below).
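
To put rough numbers on that sweet spot: assuming the nominal llama.cpp bit widths (about 6.56 bits per weight for Q6_K, 8.5 for Q8_0, 16 for F16) and ignoring context/KV-cache memory, a 27B-parameter model works out to roughly the following. These are back-of-the-envelope estimates, not measured Ollama download sizes.

  • Q6_K: 27 B weights × 6.56 bits ÷ 8 ≈ 22 GB, which fits a 24 GB card with a little headroom.
  • Q8_0: 27 B weights × 8.5 bits ÷ 8 ≈ 29 GB, which already spills past 24 GB.
  • F16: 27 B weights × 16 bits ÷ 8 ≈ 54 GB.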

Why it's important:

  • Ollama aims to make local AI practical and efficient — and Q6_K is one of the best quant formats for high-end consumer setups.
  • The current quant choices leave a gap: they are either too light (Q4_K) or too heavy (Q8_0 / F16).
  • Users with capable hardware aren't fully benefiting from the potential performance/quality ratio that Q6_K provides.

Resources:

  • Q6_K support was available in older GGUF builds and has proven to work well.
  • Several models (like LLaMA, Mistral, Mixtral, etc.) had high-quality Q6_K variants.

Are you willing to help?

Happy to test and benchmark Q6_K versions on 24GB hardware and share results with the community.

GiteaMirror added the feature request label 2026-04-22 15:13:12 -05:00

@LarsKort commented on GitHub (Jun 17, 2025):

Why not use the Hugging Face Hub to get Q6_K models?
Example:
`ollama pull hf.co/unsloth/Qwen3-30B-A3B-GGUF:Q6_K`
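
A minimal sketch of that workflow, assuming the target repository on Hugging Face publishes a Q6_K GGUF file (the unsloth repo and tag are just the example from the comment above):

```bash
# Pull a Q6_K GGUF directly from Hugging Face; the tag after the colon selects the quantization file.
ollama pull hf.co/unsloth/Qwen3-30B-A3B-GGUF:Q6_K

# The pulled model can then be used like any other local model.
ollama run hf.co/unsloth/Qwen3-30B-A3B-GGUF:Q6_K "Summarize the tradeoffs between Q4_K_M and Q6_K."
```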


@Burnarz commented on GitHub (Jun 17, 2025):

My bad... I thought I had read somewhere a discussion among the Ollama team saying that even that wouldn't work, but I haven't tried it.
Thank you.


@laniakea64 commented on GitHub (Jun 18, 2025):

> I thought I had read somewhere a discussion between the ollama team, saying that even that wouldn't work...

@Burnarz were you thinking of https://github.com/ollama/ollama/pull/10647#issuecomment-2873563847 ? That comment seems like a valid reason for this issue to stay open.


I happened on this because I had heard about quantization and was curious what practical difference it might make, so I compared a few quantization levels to see how well the model could answer questions in areas where I have a nuanced understanding:

  • Default quantization from ollama pull <model> (IIUC this is q4_K_M for very recent models and q4_0 for models that have been available for a while): This was my baseline, and even with the perspective from this testing it still seems ok.
  • q6_K: This is indeed a "sweet spot" between quality and speed (even though only 8 GB VRAM here), producing noticeably better generation quality than q4_K_M.
  • q8_0: Surprisingly, this quantization level was not just slower, but also produced worse-quality answers than both the default quantization and q6_K?? 👀 It seemed like the model knew about more aspects, but was much more prone to conflating things that shouldn't be conflated, making the overall results worse. That didn't happen at other quantization levels of the same model.
  • fp16: Only one model had enough latitude in speed on my hardware to try this quantization level, and in that one case, it produced reasonable quality output with significantly more fine-tuned nuance than the rest.

At first I didn't believe what I was seeing at q8_0, so I made sure to test multiple different models at each quantization level (other than fp16) and to do multiple runs each with the exact same input prompt. The same type of degradation occurred across the board, and only at q8_0. This was with Ollama 0.9.0, with the only non-default server setting being OLLAMA_NUM_PARALLEL=1.
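
For anyone who wants to repeat this kind of comparison, here is a minimal sketch of the setup described above. The model name, quantization tags, and prompt are placeholders rather than a definitive recipe; substitute tags that actually exist for your model.

```bash
#!/usr/bin/env bash
# Minimal sketch: run the exact same prompt several times at each quantization
# level of one model, so the only variable is the quantization.
MODEL="llama3.1:8b-instruct"      # illustrative base tag
PROMPT="A factual question in a domain you know well."

for quant in q4_K_M q6_K q8_0; do
  ollama pull "${MODEL}-${quant}"
  ollama show "${MODEL}-${quant}"   # confirms which quantization was actually pulled
  for run in 1 2 3; do
    echo "=== ${MODEL}-${quant}, run ${run} ==="
    ollama run "${MODEL}-${quant}" "$PROMPT"
  done
done
```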

It seems from the above linked comment by dhiltgen that the rationale for phasing out q6_K support was the same as for phasing out other quantizations: supporting too many quantizations was creating too large a maintenance burden. But given that q8_0 appears to have counterintuitive caveats, and given how many users both here and in the linked PR report such positive results with q6_K specifically, is it possible that q6_K support might have more value than q8_0 support?


@Burnarz commented on GitHub (Jun 18, 2025):

Thanks @laniakea64,
That was the one.
Reopening and renaming.


@jake1271 commented on GitHub (Aug 23, 2025):

Odd that Q6 isn't yet supported natively, considering it fits nicely into consumer GPU VRAM amounts, which I would think covers the majority of Ollama users. It should at least be supported for technically focused models like qwen3-coder; for the general-purpose ones it's maybe not as important.


@hveigz commented on GitHub (Oct 21, 2025):

Any news about this?


@SuperUserNameMan commented on GitHub (Oct 21, 2025):

Given that the generated text is random, what is the method to compare the output quality of q4_0 vs q4_k vs q6_k vs q8_0 in an objective (non-subjective) manner?


@laniakea64 commented on GitHub (Oct 22, 2025):

> Given that the generated text is random, what is the method to compare the output quality of q4_0 vs q4_k vs q6_k vs q8_0 in an objective (non-subjective) manner?

Here's how I tried to do that (there might be a better way, as I'm not an expert in AI quality testing):

Ask the AI factual question(s) in an area where you have nuanced domain-specific knowledge and understanding, and possibly also some experience that helps you appreciate even more nuance. If any of the information in the AI's response seems weird or "off" to you, fact-check it.

To account for the text being random, perform multiple runs at each quantization level, using exactly the same input and context for every run across all quantization levels of the model. I would say at least 3 runs per quantization level. It's easiest if your chat context is just the one message that contains your question.
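
One way to keep the input, context, and sampling options identical across runs is to go through the Ollama HTTP API, where everything is explicit in the request. The model tag below is only an illustrative example; the fields shown are standard generate-request options.

```bash
# Send the identical request to each quantization of the model; only the "model" field changes between runs.
curl -s http://localhost:11434/api/generate -d '{
  "model": "llama3.1:8b-instruct-q6_K",
  "prompt": "Your domain-specific factual question here",
  "stream": false,
  "options": { "temperature": 0.8, "num_predict": 512 }
}'
```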


@chigkim commented on GitHub (Nov 3, 2025):

A while ago, I ran the [MMLU Pro benchmark](https://arxiv.org/html/2406.01574v4) with different quants of Gemma2 9b-instruct and 27b-instruct using [chigkim/Ollama-MMLU-Pro](https://github.com/chigkim/Ollama-MMLU-Pro/) and Ollama.

| Model | Size | overall | biology | business | chemistry | computer science | economics | engineering | health | history | law | math | philosophy | physics | psychology | other |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9b-q2_K | 3.8GB | 42.02 | 64.99 | 44.36 | 35.16 | 37.07 | 55.09 | 22.50 | 43.28 | 48.56 | 29.25 | 41.52 | 39.28 | 36.26 | 59.27 | 48.16 |
| 9b-q3_K_S | 4.3GB | 44.92 | 65.27 | 52.09 | 38.34 | 42.68 | 61.02 | 22.08 | 46.21 | 51.71 | 31.34 | 44.49 | 41.28 | 38.49 | 62.53 | 50.00 |
| 9b-q3_K_M | 4.8GB | 46.43 | 60.53 | 50.44 | 42.49 | 41.95 | 63.74 | 23.63 | 49.02 | 54.33 | 32.43 | 46.85 | 40.28 | 41.72 | 62.91 | 53.14 |
| 9b-q3_K_L | 5.1GB | 46.95 | 63.18 | 52.09 | 42.31 | 45.12 | 62.80 | 23.74 | 51.22 | 50.92 | 33.15 | 46.26 | 43.89 | 40.34 | 63.91 | 54.65 |
| 9b-q4_0 | 5.4GB | 47.94 | 64.44 | 53.61 | 45.05 | 42.93 | 61.14 | 24.25 | 53.91 | 53.81 | 33.51 | 47.45 | 43.49 | 42.80 | 64.41 | 54.44 |
| 9b-q4_K_S | 5.5GB | 48.31 | 66.67 | 53.74 | 45.58 | 43.90 | 61.61 | 25.28 | 51.10 | 53.02 | 34.70 | 47.37 | 43.69 | 43.65 | 64.66 | 54.87 |
| 9b-q4_K_M | 5.8GB | 47.73 | 64.44 | 53.74 | 44.61 | 43.90 | 61.97 | 24.46 | 51.22 | 54.07 | 31.61 | 47.82 | 43.29 | 42.73 | 63.78 | 55.52 |
| 9b-q4_1 | 6.0GB | 48.58 | 66.11 | 53.61 | 43.55 | 47.07 | 61.49 | 24.87 | 56.36 | 54.59 | 33.06 | 49.00 | 47.70 | 42.19 | 66.17 | 53.35 |
| 9b-q5_0 | 6.5GB | 49.23 | 68.62 | 55.13 | 45.67 | 45.61 | 63.15 | 25.59 | 55.87 | 51.97 | 34.79 | 48.56 | 45.49 | 43.49 | 64.79 | 54.98 |
| 9b-q5_K_S | 6.5GB | 48.99 | 70.01 | 55.01 | 45.76 | 45.61 | 63.51 | 24.77 | 55.87 | 53.81 | 32.97 | 47.22 | 47.70 | 42.03 | 64.91 | 55.52 |
| 9b-q5_K_M | 6.6GB | 48.99 | 68.76 | 55.39 | 46.82 | 45.61 | 62.32 | 24.05 | 56.60 | 53.54 | 32.61 | 46.93 | 46.69 | 42.57 | 65.16 | 56.60 |
| 9b-q5_1 | 7.0GB | 49.17 | 71.13 | 56.40 | 43.90 | 44.63 | 61.73 | 25.08 | 55.50 | 53.54 | 34.24 | 48.78 | 45.69 | 43.19 | 64.91 | 55.84 |
| 9b-q6_K | 7.6GB | 48.99 | 68.90 | 54.25 | 45.41 | 47.32 | 61.85 | 25.59 | 55.75 | 53.54 | 32.97 | 47.52 | 45.69 | 43.57 | 64.91 | 55.95 |
| 9b-q8_0 | 9.8GB | 48.55 | 66.53 | 54.50 | 45.23 | 45.37 | 60.90 | 25.70 | 54.65 | 52.23 | 32.88 | 47.22 | 47.29 | 43.11 | 65.66 | 54.87 |
| 9b-fp16 | 18GB | 48.89 | 67.78 | 54.25 | 46.47 | 44.63 | 62.09 | 26.21 | 54.16 | 52.76 | 33.15 | 47.45 | 47.09 | 42.65 | 65.41 | 56.28 |
| 27b-q2_K | 10GB | 44.63 | 72.66 | 48.54 | 35.25 | 43.66 | 59.83 | 19.81 | 51.10 | 48.56 | 32.97 | 41.67 | 42.89 | 35.95 | 62.91 | 51.84 |
| 27b-q3_K_S | 12GB | 54.14 | 77.68 | 57.41 | 50.18 | 53.90 | 67.65 | 31.06 | 60.76 | 59.06 | 39.87 | 50.04 | 50.50 | 49.42 | 71.43 | 58.66 |
| 27b-q3_K_M | 13GB | 53.23 | 75.17 | 61.09 | 48.67 | 51.95 | 68.01 | 27.66 | 61.12 | 59.06 | 38.51 | 48.70 | 47.90 | 48.19 | 71.18 | 58.23 |
| 27b-q3_K_L | 15GB | 54.06 | 76.29 | 61.72 | 49.03 | 52.68 | 68.13 | 27.76 | 61.25 | 54.07 | 40.42 | 50.33 | 51.10 | 48.88 | 72.56 | 59.96 |
| 27b-q4_0 | 16GB | 55.38 | 77.55 | 60.08 | 51.15 | 53.90 | 69.19 | 32.20 | 63.33 | 57.22 | 41.33 | 50.85 | 52.51 | 51.35 | 71.43 | 60.61 |
| 27b-q4_K_S | 16GB | 54.85 | 76.15 | 61.85 | 48.85 | 55.61 | 68.13 | 32.30 | 62.96 | 56.43 | 39.06 | 51.89 | 50.90 | 49.73 | 71.80 | 60.93 |
| 27b-q4_K_M | 17GB | 54.80 | 76.01 | 60.71 | 50.35 | 54.63 | 70.14 | 30.96 | 62.59 | 59.32 | 40.51 | 50.78 | 51.70 | 49.11 | 70.93 | 59.74 |
| 27b-q4_1 | 17GB | 55.59 | 78.38 | 60.96 | 51.33 | 57.07 | 69.79 | 30.86 | 62.96 | 57.48 | 40.15 | 52.63 | 52.91 | 50.73 | 72.31 | 60.17 |
| 27b-q5_0 | 19GB | 56.46 | 76.29 | 61.09 | 52.39 | 55.12 | 70.73 | 31.48 | 63.08 | 59.58 | 41.24 | 55.22 | 53.71 | 51.50 | 73.18 | 62.66 |
| 27b-q5_K_S | 19GB | 56.14 | 77.41 | 63.37 | 50.71 | 57.07 | 70.73 | 31.99 | 64.43 | 58.27 | 42.87 | 53.15 | 50.70 | 51.04 | 72.31 | 59.85 |
| 27b-q5_K_M | 19GB | 55.97 | 77.41 | 63.37 | 51.94 | 56.10 | 69.79 | 30.34 | 64.06 | 58.79 | 41.14 | 52.55 | 52.30 | 51.35 | 72.18 | 60.93 |
| 27b-q5_1 | 21GB | 57.09 | 77.41 | 63.88 | 53.89 | 56.83 | 71.56 | 31.27 | 63.69 | 58.53 | 42.05 | 56.48 | 51.70 | 51.35 | 74.44 | 61.80 |
| 27b-q6_K | 22GB | 56.85 | 77.82 | 63.50 | 52.39 | 56.34 | 71.68 | 32.51 | 63.33 | 58.53 | 40.96 | 54.33 | 53.51 | 51.81 | 73.56 | 63.20 |
| 27b-q8_0 | 29GB | 56.96 | 77.27 | 63.88 | 52.83 | 58.05 | 71.09 | 32.61 | 64.06 | 59.32 | 42.14 | 54.48 | 52.10 | 52.66 | 72.81 | 61.47 |

@SuperUserNameMan commented on GitHub (Nov 3, 2025):

@chigkim: thanks for these data.

Here is my colorful interpretation of them:

![Image](https://github.com/user-attachments/assets/8558f7a9-6a16-44a6-9be1-b26e06618276)
Reference: github-starred/ollama#33044