[PR #10806] server: improve tensor quantization fallback logic #13371

Closed
opened 2026-04-13 00:25:14 -05:00 by GiteaMirror · 0 comments
Owner

Original Pull Request: https://github.com/ollama/ollama/pull/10806

State: closed
Merged: Yes


Fall back to alternative quantization types when a tensor's dimensions aren't divisible by the block size required by the originally requested quantization type. If the alternative types also fail, the system ultimately falls back to F16 (half-precision floating point), which has a block size of 1 and can therefore handle any tensor dimension.

resolves #10729
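A minimal sketch of the fallback idea described above, in Go. The function name `fallbackQuant`, the preference list, and the block-size table are illustrative assumptions, not ollama's actual API; the real block sizes live in the GGML quantization definitions (the values below mirror common GGML block sizes).

```go
package main

import "fmt"

// Hypothetical block sizes per quantization type (assumption for
// illustration; ollama reads these from its GGML layer).
var blockSize = map[string]int{
	"Q4_K": 256,
	"Q4_0": 32,
	"F16":  1,
}

// fallbackQuant walks a preference list and returns the first type whose
// block size evenly divides the tensor's row length. If none fits, it
// returns F16, whose block size of 1 divides any dimension.
func fallbackQuant(rowLen int, prefs []string) string {
	for _, q := range prefs {
		if bs, ok := blockSize[q]; ok && rowLen%bs == 0 {
			return q
		}
	}
	return "F16"
}

func main() {
	fmt.Println(fallbackQuant(4096, []string{"Q4_K", "Q4_0"})) // 4096 % 256 == 0 -> Q4_K
	fmt.Println(fallbackQuant(96, []string{"Q4_K", "Q4_0"}))   // 96 % 256 != 0, 96 % 32 == 0 -> Q4_0
	fmt.Println(fallbackQuant(80, []string{"Q4_K", "Q4_0"}))   // neither divides 80 -> F16
}
```

The key design point from the PR is that the chain always terminates: F16's block size of 1 guarantees a valid result for any tensor shape.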

GiteaMirror added the pull-request label 2026-04-13 00:25:14 -05:00

Reference: github-starred/ollama#13371