[GH-ISSUE #11043] Please keep Q6_K quantization support in Ollama #69343

GiteaMirror · 2026-05-04T17:51:19-05:00

GiteaMirror commented

2026-05-04 17:51:19 -05:00

Originally created by @Burnarz on GitHub (Jun 11, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/11043

I understand the desire to reduce and streamline quantization support, as stated in this comment (https://github.com/ollama/ollama/pull/10647#issuecomment-2873563847), but in my humble opinion, dropping support for Q6_K is not the best move.

The idea is:

Reintroduce support for the Q6_K quantization format in Ollama, either as a first-class option or as an advanced override for model loading.

My use case:

Many developers (including myself) run Ollama models on consumer GPUs with 24 GB of VRAM, such as the RTX 3090 or 4090.

While Q4_K is great for memory efficiency, Q6_K hits a perfect sweet spot between performance and output quality. It leverages available VRAM more effectively, giving us:

Noticeably better generation quality than Q4_K, especially in long-form or nuanced outputs.
Still lightweight enough to run fast, with acceptable token speeds on 24GB GPUs.
Avoids the performance and memory overhead of full FP16 or Q8_0.
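To make the VRAM argument concrete, here is a rough sketch that estimates the size of the weights from the bits per weight of each GGUF block format (derived from llama.cpp's block layouts; for example, Q6_K stores 256 weights in 210 bytes). It is an approximation, not how Ollama itself budgets memory: it assumes every tensor uses the same type, and the flat `overhead_gb` for KV cache and runtime buffers is a hypothetical placeholder. The helper names are illustrative.

```python
"""Rough VRAM estimate for quantized model weights (a sketch, not Ollama's own logic)."""

# Bits per weight for common GGUF block formats, computed as
# (bytes per block * 8) / (weights per block), from the llama.cpp block layouts.
BITS_PER_WEIGHT = {
    "Q4_0": 18 * 8 / 32,    # 32 weights: 2-byte scale + 16 bytes of 4-bit values
    "Q4_K": 144 * 8 / 256,  # 256-weight super-block -> 4.5 bits per weight
    "Q5_K": 176 * 8 / 256,  # 256-weight super-block -> 5.5 bits per weight
    "Q6_K": 210 * 8 / 256,  # 256-weight super-block -> 6.5625 bits per weight
    "Q8_0": 34 * 8 / 32,    # 32 weights: 2-byte scale + 32 bytes of 8-bit values
    "F16": 16.0,
}


def weights_gb(n_params: float, quant: str) -> float:
    """Approximate size of the weights alone, in GB (10**9 bytes).

    Assumes every tensor uses `quant`; real *_K_M / *_K_S files mix tensor
    types, so actual files are usually somewhat larger.
    """
    return n_params * BITS_PER_WEIGHT[quant] / 8 / 1e9


def fits(n_params: float, quant: str, vram_gb: float = 24.0,
         overhead_gb: float = 2.0) -> bool:
    """True if the weights plus a flat overhead fit in `vram_gb`.

    `overhead_gb` is a hypothetical placeholder for KV cache and runtime
    buffers; the real figure depends on context length and backend.
    """
    return weights_gb(n_params, quant) + overhead_gb <= vram_gb


if __name__ == "__main__":
    for n_params in (14e9, 27e9):
        for quant in ("Q4_K", "Q6_K", "Q8_0", "F16"):
            size = weights_gb(n_params, quant)
            verdict = "fits" if fits(n_params, quant) else "does not fit"
            print(f"{n_params / 1e9:.0f}B {quant}: ~{size:.1f} GB weights, {verdict} in 24 GB")
```

With the placeholder 2 GB overhead, a 27B model at Q6_K (about 22 GB of weights) lands just over a 24 GB budget, so in practice the headroom depends heavily on context length.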

Why it's important:

Ollama aims to make local AI practical and efficient — and Q6_K is one of the best quant formats for high-end consumer setups.
Current quant choices feel like a gap: either too light (Q4_K) or too heavy (Q8_0 / F16).
Users with capable hardware aren't fully benefiting from the potential performance/quality ratio that Q6_K provides.

Resources:

Q6_K support was available in older GGUF builds and proven to work well.
Several models (like LLaMA, Mistral, Mixtral, etc.) had high-quality Q6_K variants.

Are you willing to help?

Happy to test and benchmark Q6_K versions on 24GB hardware and share results with the community.

GiteaMirror added the feature request label 2026-05-04 17:51:19 -05:00

@LarsKort commented on GitHub (Jun 17, 2025):

Why not use the Hugging Face Hub to get Q6_K models?
Example:
ollama pull hf.co/unsloth/Qwen3-30B-A3B-GGUF:Q6_K
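The same `hf.co/<user>/<repo>:<quant>` pattern can be scripted to fetch several quantization levels of one repository for side-by-side testing. A minimal sketch, assuming the `ollama` CLI is on `PATH` and that the repository actually publishes the listed tags; `hf_reference` and `pull_all` are hypothetical helpers.

```python
"""Pull several quantizations of one Hugging Face GGUF repo for comparison (sketch)."""
import subprocess


def hf_reference(repo_id: str, quant: str) -> str:
    """Build an Ollama model reference of the form hf.co/<user>/<repo>:<quant>."""
    return f"hf.co/{repo_id}:{quant}"


def pull_all(repo_id: str, quants: list[str], dry_run: bool = False) -> list[list[str]]:
    """Run `ollama pull` for each quantization tag and return the commands used.

    With dry_run=True the commands are only built, not executed.
    """
    commands = [["ollama", "pull", hf_reference(repo_id, q)] for q in quants]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return commands


if __name__ == "__main__":
    # Assumes the repository publishes these quantization tags.
    for cmd in pull_all("unsloth/Qwen3-30B-A3B-GGUF", ["Q4_K_M", "Q6_K", "Q8_0"], dry_run=True):
        print(" ".join(cmd))
```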

@Burnarz commented on GitHub (Jun 17, 2025):

My bad... I thought I had read a discussion among the Ollama team somewhere saying that even that wouldn't work, but I haven't tried it.
Thank you.

@laniakea64 commented on GitHub (Jun 18, 2025):

> I thought I had read somewhere a discussion between the ollama team, saying that even that wouldn't work...

@Burnarz were you thinking of https://github.com/ollama/ollama/pull/10647#issuecomment-2873563847 ? That comment seems like a valid reason for this issue to be open?

I happened on this because I had heard about quantization and was curious what practical difference it might make, so I compared a few quantization levels to see how well the model could answer questions in areas where I have a nuanced understanding:

Default quantization from ollama pull <model> (IIUC this is q4_K_M for very recent models and q4_0 for models that have been available for a while): This was my baseline, and even with the perspective from this testing it still seems ok.
q6_K: This is indeed a "sweet spot" between quality and speed (even though only 8 GB VRAM here), producing noticeably better generation quality than q4_K_M.
q8_0: Surprisingly, this quantization level was not only slower but also gave worse-quality answers than both the default quantization and q6_K?? 👀 It seemed like the model knew about more aspects, but was much more prone to conflating things that shouldn't be conflated, making for worse results overall. That didn't happen at other quantization levels of the same model.
fp16: Only one model had enough latitude in speed on my hardware to try this quantization level, and in that one case, it produced reasonable quality output with significantly more fine-tuned nuance than the rest.

At first I didn't believe what I was seeing at q8_0, so I made sure to test multiple different models at each quantization level (other than fp16) and to do multiple runs of each with the exact same input prompt. The same type of degradation, only at q8_0, occurred across the board. This was with Ollama 0.9.0, and the only non-default server setting was OLLAMA_NUM_PARALLEL=1.

It seems from the above linked comment by dhiltgen that the rationale for phasing out q6_K support was the same as the rationale for phasing out other quantizations: supporting too many quantizations was making too big a maintenance burden. But given that q8_0 appears to have counterintuitive caveats, and given how many users both here and in the linked PR find such positive results with q6_K specifically, is it possible that q6_K support might have more value than q8_0 support?

@Burnarz commented on GitHub (Jun 18, 2025):

Thanks @laniakea64, this was the one.
Reopening and renaming.

@jake1271 commented on GitHub (Aug 23, 2025):

It's odd that q6 isn't supported natively yet, considering it fits nicely within consumer GPU VRAM amounts, and I would think consumer GPUs account for the majority of Ollama users. At the very least it should be supported on technically focused models like qwen3-coder; for the general-purpose ones it is maybe not as important.

@hveigz commented on GitHub (Oct 21, 2025):

any news about this?

@SuperUserNameMan commented on GitHub (Oct 21, 2025):

Given that the generated text is random, what is the method to compare the output quality of q4_0 vs q4_k vs q6_k vs q8_0 in an objective (non-subjective) manner?

@laniakea64 commented on GitHub (Oct 22, 2025):

> Given that the generated text is random, what is the method to compare the output quality of q4_0 vs q4_k vs q6_k vs q8_0 in an objective (non-subjective) manner?

Here's how I tried to do that (there might be a better way, as I'm not an expert in AI quality testing):

Ask the AI factual question(s) in an area where you have nuanced domain-specific knowledge and understanding, and possibly also some experience that helps you grasp even more nuance. If any information in the AI's response seems weird or "off" to you, fact-check it.

To account for the text being random, perform multiple runs at each quantization level, with every run, across all quantization levels of the model, using exactly the same input and context. I would say at least 3 runs per quantization level. It's easiest if your chat context is just the one message containing your question.
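The procedure above (an identical single-message prompt, several runs per quantization level) can be automated against a local Ollama server's `/api/generate` endpoint, leaving only the fact-checking to do by hand. A sketch, assuming the default address `http://localhost:11434`; `collect_runs` and `ollama_generate` are hypothetical helpers, and varying `seed` per run is one choice for sampling the spread of outputs, not something the method above prescribes.

```python
"""Repeat the same prompt across quantization tags via Ollama's /api/generate (sketch)."""
import json
import urllib.request
from typing import Callable

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed default local address


def ollama_generate(model: str, prompt: str, seed: int) -> str:
    """Request one non-streaming completion from a local Ollama server."""
    body = json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"seed": seed},
    }).encode("utf-8")
    req = urllib.request.Request(OLLAMA_URL, data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]


def collect_runs(models: list[str], prompt: str, runs: int = 3,
                 generate: Callable[[str, str, int], str] = ollama_generate
                 ) -> dict[str, list[str]]:
    """Send the identical prompt `runs` times to each model tag.

    Every call is a fresh single-message context. Seeds differ between runs
    but are the same set for every tag, so only the model varies per column.
    """
    results: dict[str, list[str]] = {}
    for model in models:
        results[model] = [generate(model, prompt, seed) for seed in range(runs)]
    return results
```

For example, `collect_runs(["gemma2:9b-instruct-q4_K_M", "gemma2:9b-instruct-q6_K", "gemma2:9b-instruct-q8_0"], "your question")` returns three answers per tag (these tag names are illustrative); reading them side by side holds the prompt and seeds constant across tags.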

@chigkim commented on GitHub (Nov 3, 2025):

A while ago, I ran the MMLU Pro benchmark (https://arxiv.org/html/2406.01574v4) with different quants of Gemma2 9b-instruct and 27b-instruct using chigkim/Ollama-MMLU-Pro (https://github.com/chigkim/Ollama-MMLU-Pro/) and Ollama.

Model	Size	overall	biology	business	chemistry	computer science	economics	engineering	health	history	law	math	philosophy	physics	psychology	other
9b-q2_K	3.8GB	42.02	64.99	44.36	35.16	37.07	55.09	22.50	43.28	48.56	29.25	41.52	39.28	36.26	59.27	48.16
9b-q3_K_S	4.3GB	44.92	65.27	52.09	38.34	42.68	61.02	22.08	46.21	51.71	31.34	44.49	41.28	38.49	62.53	50.00
9b-q3_K_M	4.8GB	46.43	60.53	50.44	42.49	41.95	63.74	23.63	49.02	54.33	32.43	46.85	40.28	41.72	62.91	53.14
9b-q3_K_L	5.1GB	46.95	63.18	52.09	42.31	45.12	62.80	23.74	51.22	50.92	33.15	46.26	43.89	40.34	63.91	54.65
9b-q4_0	5.4GB	47.94	64.44	53.61	45.05	42.93	61.14	24.25	53.91	53.81	33.51	47.45	43.49	42.80	64.41	54.44
9b-q4_K_S	5.5GB	48.31	66.67	53.74	45.58	43.90	61.61	25.28	51.10	53.02	34.70	47.37	43.69	43.65	64.66	54.87
9b-q4_K_M	5.8GB	47.73	64.44	53.74	44.61	43.90	61.97	24.46	51.22	54.07	31.61	47.82	43.29	42.73	63.78	55.52
9b-q4_1	6.0GB	48.58	66.11	53.61	43.55	47.07	61.49	24.87	56.36	54.59	33.06	49.00	47.70	42.19	66.17	53.35
9b-q5_0	6.5GB	49.23	68.62	55.13	45.67	45.61	63.15	25.59	55.87	51.97	34.79	48.56	45.49	43.49	64.79	54.98
9b-q5_K_S	6.5GB	48.99	70.01	55.01	45.76	45.61	63.51	24.77	55.87	53.81	32.97	47.22	47.70	42.03	64.91	55.52
9b-q5_K_M	6.6GB	48.99	68.76	55.39	46.82	45.61	62.32	24.05	56.60	53.54	32.61	46.93	46.69	42.57	65.16	56.60
9b-q5_1	7.0GB	49.17	71.13	56.40	43.90	44.63	61.73	25.08	55.50	53.54	34.24	48.78	45.69	43.19	64.91	55.84
9b-q6_K	7.6GB	48.99	68.90	54.25	45.41	47.32	61.85	25.59	55.75	53.54	32.97	47.52	45.69	43.57	64.91	55.95
9b-q8_0	9.8GB	48.55	66.53	54.50	45.23	45.37	60.90	25.70	54.65	52.23	32.88	47.22	47.29	43.11	65.66	54.87
9b-fp16	18GB	48.89	67.78	54.25	46.47	44.63	62.09	26.21	54.16	52.76	33.15	47.45	47.09	42.65	65.41	56.28
27b-q2_K	10GB	44.63	72.66	48.54	35.25	43.66	59.83	19.81	51.10	48.56	32.97	41.67	42.89	35.95	62.91	51.84
27b-q3_K_S	12GB	54.14	77.68	57.41	50.18	53.90	67.65	31.06	60.76	59.06	39.87	50.04	50.50	49.42	71.43	58.66
27b-q3_K_M	13GB	53.23	75.17	61.09	48.67	51.95	68.01	27.66	61.12	59.06	38.51	48.70	47.90	48.19	71.18	58.23
27b-q3_K_L	15GB	54.06	76.29	61.72	49.03	52.68	68.13	27.76	61.25	54.07	40.42	50.33	51.10	48.88	72.56	59.96
27b-q4_0	16GB	55.38	77.55	60.08	51.15	53.90	69.19	32.20	63.33	57.22	41.33	50.85	52.51	51.35	71.43	60.61
27b-q4_K_S	16GB	54.85	76.15	61.85	48.85	55.61	68.13	32.30	62.96	56.43	39.06	51.89	50.90	49.73	71.80	60.93
27b-q4_K_M	17GB	54.80	76.01	60.71	50.35	54.63	70.14	30.96	62.59	59.32	40.51	50.78	51.70	49.11	70.93	59.74
27b-q4_1	17GB	55.59	78.38	60.96	51.33	57.07	69.79	30.86	62.96	57.48	40.15	52.63	52.91	50.73	72.31	60.17
27b-q5_0	19GB	56.46	76.29	61.09	52.39	55.12	70.73	31.48	63.08	59.58	41.24	55.22	53.71	51.50	73.18	62.66
27b-q5_K_S	19GB	56.14	77.41	63.37	50.71	57.07	70.73	31.99	64.43	58.27	42.87	53.15	50.70	51.04	72.31	59.85
27b-q5_K_M	19GB	55.97	77.41	63.37	51.94	56.10	69.79	30.34	64.06	58.79	41.14	52.55	52.30	51.35	72.18	60.93
27b-q5_1	21GB	57.09	77.41	63.88	53.89	56.83	71.56	31.27	63.69	58.53	42.05	56.48	51.70	51.35	74.44	61.80
27b-q6_K	22GB	56.85	77.82	63.50	52.39	56.34	71.68	32.51	63.33	58.53	40.96	54.33	53.51	51.81	73.56	63.20
27b-q8_0	29GB	56.96	77.27	63.88	52.83	58.05	71.09	32.61	64.06	59.32	42.14	54.48	52.10	52.66	72.81	61.47
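One way to answer the earlier question about objective comparison is to put the differences in this table next to their sampling noise. The sketch below copies a few 27B rows, computes each quant's GB saved and score change relative to q8_0, and uses the binomial standard error sqrt(p(1-p)/n) as a rough yardstick. It assumes all 12,032 MMLU-Pro questions were scored (with fewer, the error grows) and ignores that both runs answer the same questions; the helper names are hypothetical.

```python
"""Compare quant scores from the 27B rows of the MMLU-Pro table (sketch)."""
import math

# (size in GB, overall MMLU-Pro score) for Gemma2 27b-instruct, copied from the table.
RESULTS_27B = {
    "q4_0": (16, 55.38),
    "q4_K_M": (17, 54.80),
    "q5_K_M": (19, 55.97),
    "q6_K": (22, 56.85),
    "q8_0": (29, 56.96),
}

N_QUESTIONS = 12_032  # assumption: the full MMLU-Pro test set was scored


def standard_error_pct(score_pct: float, n: int = N_QUESTIONS) -> float:
    """Binomial standard error of an accuracy, in percentage points."""
    p = score_pct / 100
    return 100 * math.sqrt(p * (1 - p) / n)


def compare_to(reference: str,
               results: dict[str, tuple[float, float]]) -> dict[str, tuple[float, float]]:
    """Return {quant: (GB saved, score delta)} relative to `reference`."""
    ref_size, ref_score = results[reference]
    return {q: (ref_size - size, score - ref_score)
            for q, (size, score) in results.items() if q != reference}


if __name__ == "__main__":
    se = standard_error_pct(RESULTS_27B["q8_0"][1])
    print(f"standard error of one score: ~{se:.2f} points")
    for quant, (saved, delta) in compare_to("q8_0", RESULTS_27B).items():
        print(f"{quant}: saves {saved} GB, score {delta:+.2f} ({delta / se:+.1f} SE)")
```

With these numbers the q6_K vs q8_0 gap (-0.11 points) is about a quarter of one standard error (~0.45 points), while q6_K is 7 GB smaller.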

@SuperUserNameMan commented on GitHub (Nov 3, 2025):

@chigkim: thanks for these data.

Here is my colorful interpretation of them:

[Image: https://github.com/user-attachments/assets/8558f7a9-6a16-44a6-9be1-b26e06618276]

Reference: github-starred/ollama#69343
