[GH-ISSUE #15908] CRITICAL: GGUF Models Have Corrupted F32 Norm Weights (upstream llama.cpp bug) #72192

Closed
opened 2026-05-05 03:36:56 -05:00 by GiteaMirror · 1 comment

Originally created by @ssfdre38 on GitHub (Apr 30, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/15908

Upstream Bug Affecting All Ollama Models

Upstream Issue: https://github.com/ggml-org/llama.cpp/issues/22565

Summary

We discovered that F32 norm weights in GGUF files are corrupted during SafeTensors→GGUF conversion. This affects all Ollama models because Ollama uses GGUF format.

Confirmed Affected Models

  • Qwen2.5-3B: Norm weights 1.18x-4x off from SafeTensors
  • Gemma 4-E4B: Norm weights 0.31x-9.58x off from SafeTensors

Likely affects all models in Ollama library.

Impact

  • Models run without crashing
  • Significantly degraded output quality (token predictions 3-10x off)
  • Users blame 'model quality' when it's actually file corruption

Reproduction

See full details in upstream issue: https://github.com/ggml-org/llama.cpp/issues/22565

We compared GGUF F32 weights to original SafeTensors numerically and found systematic corruption in RMSNorm weights.
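A comparison of this kind can be sketched as below. This is a minimal illustration, not the script used for this report: `ratio_range` and `tensors_match` are hypothetical helpers, both tensors are assumed to be already loaded as flat lists of floats, and the tolerances are arbitrary. The result only means something if both inputs are the same logical tensor.

```python
import math


def ratio_range(reference, candidate, eps=1e-12):
    """Return (min, max) of candidate[i] / reference[i].

    Elements whose reference value is close to zero are skipped so the
    ratio stays well defined.
    """
    if len(reference) != len(candidate):
        raise ValueError("tensors differ in length; they cannot be the same tensor")
    ratios = [c / r for r, c in zip(reference, candidate) if abs(r) > eps]
    if not ratios:
        raise ValueError("no usable reference values to form a ratio")
    return min(ratios), max(ratios)


def tensors_match(reference, candidate, rel_tol=1e-2, abs_tol=1e-6):
    """True when every element agrees within the (assumed) tolerances."""
    if len(reference) != len(candidate):
        return False
    return all(
        math.isclose(r, c, rel_tol=rel_tol, abs_tol=abs_tol)
        for r, c in zip(reference, candidate)
    )
```

A ratio range far from 1.0 means the two inputs differ. It does not say whether the file is corrupt or whether two different tensors were paired.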

Recommended Actions

  1. Acknowledge issue - Users should know current models may be corrupted
  2. Monitor upstream fix - llama.cpp team needs to fix conversion
  3. Re-publish models - Once fixed, regenerate all GGUFs

Priority

P0 CRITICAL - Affects entire Ollama library and millions of users.


Discovered while debugging inference engine. Expected logit 19.46, got 6.44. Root cause: GGUF weights don't match ground truth.


@ssfdre38 commented on GitHub (Apr 30, 2026):

False alarm - this was a tensor name mapping error in my validation script, not a GGUF bug.

What happened:

  • I compared the wrong tensors between SafeTensors and GGUF
  • GGUF is storing exactly what's in SafeTensors (verified with manual BF16 decode)
  • Gemma 4's complex multimodal architecture has 7 norm types per layer that I didn't map correctly
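One way to guard against this kind of mix-up is to require an explicit name mapping and fail loudly when any tensor is unmapped, rather than pairing by position or by partial name match. The sketch below is hypothetical: `pair_tensors` is not part of any library, and the tensor names in the usage note are illustrative only and are not the Gemma 4 mapping.

```python
def pair_tensors(source_names, target_names, mapping):
    """Pair each source tensor name with its target name.

    Raises KeyError listing every problem instead of silently guessing:
    a missing mapping, a mapped name absent from the target file, or two
    source tensors mapped to the same target.
    """
    target_set = set(target_names)
    pairs = {}
    used_targets = set()
    problems = []
    for name in source_names:
        mapped = mapping.get(name)
        if mapped is None:
            problems.append(f"no mapping for {name}")
        elif mapped not in target_set:
            problems.append(f"{name} maps to missing tensor {mapped}")
        elif mapped in used_targets:
            problems.append(f"{mapped} is mapped from more than one source tensor")
        else:
            pairs[name] = mapped
            used_targets.add(mapped)
    if problems:
        raise KeyError("; ".join(problems))
    return pairs
```

For example, with an assumed mapping of `model.layers.0.input_layernorm.weight` to `blk.0.attn_norm.weight`, any further norm tensor in the same layer that lacks an entry raises an error before any numbers are compared. A shape check alone may not catch the mistake, since several norm vectors in a layer can have the same length.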

Verification:

SafeTensors (manual BF16): L2 norm = 562.24
SafeTensors (PyTorch):     L2 norm = 562.24
GGUF F32:                  L2 norm = 562.39
Match: ✓ (0.03% difference)
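The check above can be reproduced roughly as follows. This is a sketch under assumptions, not the exact script: it assumes the BF16 payload is little-endian raw bytes, and `bf16_to_f32`, `l2_norm`, and `percent_difference` are hypothetical helper names. A BF16 value is the upper 16 bits of an IEEE 754 float32, so decoding shifts each 16-bit word left by 16 and reinterprets the result as a float32.

```python
import math
import struct


def bf16_to_f32(raw):
    """Decode little-endian BF16 bytes into a list of Python floats."""
    if len(raw) % 2:
        raise ValueError("BF16 data must have an even number of bytes")
    count = len(raw) // 2
    words = struct.unpack(f"<{count}H", raw)
    # Place each 16-bit word in the high half of a 32-bit pattern,
    # then reinterpret that pattern as float32.
    return [struct.unpack("<f", struct.pack("<I", w << 16))[0] for w in words]


def l2_norm(values):
    """Euclidean norm of a flat sequence of floats."""
    return math.sqrt(math.fsum(v * v for v in values))


def percent_difference(reference, candidate):
    """Relative difference of candidate from reference, in percent."""
    return abs(candidate - reference) / abs(reference) * 100.0
```

Applied to the figures above, `percent_difference(562.24, 562.39)` is about 0.027, which rounds to the 0.03% reported.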

No action needed - GGUF conversion is working correctly. Apologies for the noise.

Filed and retracted: https://github.com/ggml-org/llama.cpp/issues/22565


Reference: github-starred/ollama#72192