[GH-ISSUE #14335] Set scaling factors to have negative values in quantization #87013

Open
opened 2026-05-10 04:39:24 -05:00 by GiteaMirror · 2 comments

Originally created by @g-lizhang on GitHub (Feb 20, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/14335

I have a custom quantization algorithm that uses negative values for scaling factors in model quantization. Does Ollama allow using negative values for scaling factors? Any potential issues?
I asked AI engines (e.g. Gemini) and got "yes" as the answer.
Would love to get verification from human experts here.
Thank you!

GiteaMirror added the question label 2026-05-10 04:39:24 -05:00

@joeVenner commented on GitHub (Apr 14, 2026):

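Ollama uses the GGUF format for quantized models, which is based on the GGML tensor format. The per-block scaling factors in GGUF are stored as floating-point values (16-bit floats in common block formats such as Q4_0 and Q8_0), so the stored field itself can carry a sign.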
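As a quick sanity check of the storage point, the following Python sketch round-trips a negative scale through IEEE 754 half- and single-precision encodings with the standard `struct` module. It only illustrates that a floating-point scale field preserves its sign. The helper name `roundtrip` and the sample value are assumptions made for this example, and the code does not read or write actual GGUF files.

```python
import struct

def roundtrip(value, fmt):
    """Pack a float into the given little-endian format and unpack it again."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]

scale = -0.125  # hypothetical negative scale, exactly representable in fp16

half = roundtrip(scale, "<e")    # 16-bit float
single = roundtrip(scale, "<f")  # 32-bit float
print(half, single)
```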

Can scaling factors be negative? Technically yes — the GGUF spec doesn't enforce positive-only scaling. However, there are practical considerations:

  1. Numerical stability: Many quantization schemes (Q4_K_M, Q5_K_S, etc.) are designed assuming positive scale factors. Negative scales could break optimized SIMD kernels (AVX, NEON) that expect positive multipliers.

  2. Ollama's GGUF loader: Uses llama.cpp under the hood. The dequantization formula is:

output = quantized_value * scale + minimum

Negative scales would invert the value range, which might cause unexpected behavior in attention layers.

  3. Testing: If you're creating custom quantizations, test with a small model first. Ollama will load it, but inference quality depends on whether the underlying ops handle negative scales correctly.
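To make the "inverted value range" point concrete, here is a minimal Python sketch of the formula output = quantized_value * scale + minimum with a positive and a negative scale. It is a toy 4-bit affine scheme written for illustration, not llama.cpp's actual kernel or any real GGUF block layout; the function names `quantize` and `dequantize`, the 16-level range, and the sample values are all assumptions.

```python
def quantize(values, scale, minimum, levels=16):
    """Map floats to integer codes in [0, levels - 1] (toy 4-bit affine scheme)."""
    codes = []
    for v in values:
        q = round((v - minimum) / scale)
        codes.append(max(0, min(levels - 1, q)))  # clamp to the code range
    return codes

def dequantize(codes, scale, minimum):
    """Apply output = quantized_value * scale + minimum to each code."""
    return [q * scale + minimum for q in codes]

values = [-1.0, -0.3, 0.2, 0.7, 1.0]
step = 2.0 / 15  # spacing of 16 levels across [-1, 1]

# Positive scale: "minimum" is the smallest representable value.
pos_codes = quantize(values, scale=step, minimum=-1.0)
# Negative scale: "minimum" is now the largest value and the codes run backwards.
neg_codes = quantize(values, scale=-step, minimum=1.0)

pos_out = dequantize(pos_codes, step, -1.0)
neg_out = dequantize(neg_codes, -step, 1.0)

print(pos_codes, neg_codes)
print(max(abs(a - b) for a, b in zip(pos_out, neg_out)))
```

In this toy setting the codes are mirrored but the reconstructed values agree, which suggests the arithmetic itself is indifferent to the sign of the scale as long as the quantizer and dequantizer use the same convention. It says nothing about a format that packs its scales as unsigned integers, or about how a particular optimized kernel is written, so testing against the real loader is still needed.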

Recommendation: Try it and check the output quality. GGUF/llama.cpp won't error out, but correctness isn't guaranteed. Consider using absmax (symmetric) quantization instead if you need signed ranges.


@joeVenner commented on GitHub (Apr 14, 2026):

@g-lizhang FYI


Reference: github-starred/ollama#87013