[GH-ISSUE #10749] Regression in 0.7.0 : can't quantize to Q2_K #7061

Closed
opened 2026-04-12 18:58:43 -05:00 by GiteaMirror · 3 comments
Owner

Originally created by @lowlyocean on GitHub (May 16, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/10749

What is the issue?

This was working in 0.6.x but now seems to be unsupported in the latest version.

Relevant log output

```shell
Error: unsupported quantization type Q2_K - supported types are F32, F16, Q4_K_S, Q4_K_M, Q8_0
```

OS

No response

GPU

No response

CPU

No response

Ollama version

0.7.0

GiteaMirror added the bug label 2026-04-12 18:58:43 -05:00

@rick-github commented on GitHub (May 16, 2025):

ollama [only supports](https://github.com/ollama/ollama/pull/10647#issuecomment-2873563847) Q4 and Q8 quantizations during model creation.
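A minimal reproduction sketch of that restriction, assuming the standard `ollama create --quantize` flow; the model and Modelfile names are illustrative:

```shell
# Quantizing at model-creation time in ollama 0.7.0 (names are illustrative).
# Q2_K is rejected; a Q4/Q8 type from the supported list is accepted.
ollama create mymodel-q2 --quantize q2_K -f Modelfile
# => Error: unsupported quantization type Q2_K - supported types are F32, F16, Q4_K_S, Q4_K_M, Q8_0

ollama create mymodel-q4 --quantize q4_K_M -f Modelfile   # accepted
```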


@lowlyocean commented on GitHub (May 16, 2025):

That's not sufficiently small for my use case. Per the response to your question on that thread, it sounds like we will be phased out of using Q2_K over time (even though for now we can get away with using llama.cpp for quantization). That's disappointing.
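A rough sketch of that llama.cpp workaround, assuming llama.cpp's `llama-quantize` tool and an existing F16 GGUF; the file paths and model name are illustrative:

```shell
# Quantize to Q2_K with llama.cpp, outside of ollama (paths are illustrative).
./llama-quantize ./model-f16.gguf ./model-q2_k.gguf Q2_K

# Import the pre-quantized GGUF into ollama; no quantization is requested at
# create time, so the supported-types restriction does not apply.
cat > Modelfile <<'EOF'
FROM ./model-q2_k.gguf
EOF
ollama create mymodel-q2k -f Modelfile
```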


@rick-github commented on GitHub (May 16, 2025):

There are other projects that will likely continue to support quantization levels that ollama doesn't. You probably know these already: [llama.cpp](https://github.com/ggml-org/llama.cpp) (of course), [LMStudio](https://lmstudio.ai/), [Mistral.rs](https://github.com/EricLBuehler/mistral.rs). [vLLM](https://github.com/vllm-project/vllm) supports GGUF, and while I don't know if they support very low quant levels, they have their own [LLM compressor](https://github.com/vllm-project/llm-compressor).
