[PR #10363] Move quantization to new backend #13221

Closed
opened 2026-04-13 00:21:19 -05:00 by GiteaMirror · 0 comments
Owner

Original Pull Request: https://github.com/ollama/ollama/pull/10363

State: closed
Merged: Yes


This change converts the quantization support for model creation to use the new ml backend via CGO, and implements the model-specific type adjustments in Go. The algorithm balances parallelism against CPU/memory load to reduce conversion time. This also makes some changes to the fs/ggml module to support round-trip serialization, and exposes the supported quantization types. Tensors can now be written from a channel, enabling more efficient memory usage during conversion.

This enables removing one of our carried patches.
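The parallelism/memory trade-off described above can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: `tensor`, `quantize`, and `quantizeAll` are hypothetical names, and the real quantization happens via CGO calls into the ml backend rather than the placeholder math here. The key idea is a bounded worker pool feeding a channel, so only a few tensors are in flight at once while the writer consumes results.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// tensor is a hypothetical stand-in for a model tensor; the real
// types live in ollama's fs/ggml package.
type tensor struct {
	name string
	data []float32
}

// quantize is a placeholder for the CGO call into the ml backend;
// it just halves values to keep the sketch runnable.
func quantize(t tensor) tensor {
	out := make([]float32, len(t.data))
	for i, v := range t.data {
		out[i] = v / 2
	}
	return tensor{name: t.name, data: out}
}

// quantizeAll fans tensors out to a bounded pool of workers and
// streams results through a channel, so memory use stays roughly
// proportional to the pool size rather than the model size.
func quantizeAll(in []tensor) <-chan tensor {
	out := make(chan tensor)
	sem := make(chan struct{}, runtime.NumCPU()) // caps CPU/memory load
	var wg sync.WaitGroup
	for _, t := range in {
		wg.Add(1)
		sem <- struct{}{} // acquire a worker slot
		go func(t tensor) {
			defer wg.Done()
			out <- quantize(t)
			<-sem // release the slot
		}(t)
	}
	go func() {
		wg.Wait()
		close(out)
	}()
	return out
}

func main() {
	src := []tensor{
		{name: "blk.0.attn_q.weight", data: []float32{2, 4}},
		{name: "blk.0.attn_k.weight", data: []float32{6, 8}},
	}
	n := 0
	for t := range quantizeAll(src) {
		n++
		fmt.Println("wrote", t.name)
	}
	fmt.Println("tensors written:", n)
}
```

A consumer ranging over the returned channel acts as the tensor writer, which matches the channel-based write path the description mentions.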

New CLI UX:

```
% cat test.modelfile
FROM llava:34b-v1.6-fp16
% ./ollama create test4 -f ./test.modelfile -q Q4_0
gathering model components
quantizing F16 model to Q4_0 100% ▕████████████████████████████████████▏  68 GB
verifying conversion
creating new layer sha256:8156890c443d9809ba8e00c2990638462a1ad038c53a93bc0c9dd94fa94c3031
using existing layer sha256:83720bd8438ccdc910deba5efbdc3340820b29258d94a7a60d1addc9a1b5f095
using existing layer sha256:43070e2d4e532684de521b885f385d0841030efa2b1a20bafb76133a5e1379c1
using existing layer sha256:a47b02e00552cd7022ea700b1abf8c572bb26c9bc8c1a37e01b566f2344df5dc
using existing layer sha256:f02dd72bb2423204352eabc5637b44d79d17f109fdb510a7c51455892aa2d216
writing manifest
success
```

Reference: github-starred/ollama#13221