[PR #11508] fix: quantization on non-macos systems #24100

Open
opened 2026-04-19 17:23:07 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/11508
Author: @mxyng
Created: 7/23/2025
Status: 🔄 Open

Base: main ← Head: mxyng/quant


📝 Commits (3)

2a03498 iter quant
362bf09 cleanup quantization
c8b1f9e fix quantization

📊 Changes

4 files changed (+95 additions, -33 deletions)

View changed files

📝 ml/backend/ggml/ggml/src/ggml.go (+1 -0)
📝 ml/backend/ggml/quantization.go (+27 -29)
📝 server/quantization.go (+13 -4)
📝 server/quantization_test.go (+54 -0)

📄 Description

This fixes quantization on non-Darwin systems and adds a test for it. Previously, quantizing on Linux and Windows would produce corrupt files because values were mapped incorrectly.

Also update Quantize to return an iter.Seq[[]byte]. This lets the tensor be quantized one portion at a time (which it was already doing before), which reduces overall memory usage.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-19 17:23:07 -05:00
Reference: github-starred/ollama#24100