[GH-ISSUE #15189] TurboQuant-MoE: SOTA 8.5x KV-cache compression with Residual Correction #71783

Open
opened 2026-05-05 02:29:35 -05:00 by GiteaMirror · 0 comments

Originally created by @RemizovDenis on GitHub (Apr 1, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/15189

Hi, I am the author of the TurboQuant-MoE repository. I noticed the ongoing work on TurboQuant integration (PR 15090). We have a production-ready implementation that achieves 8.5x compression with high fidelity through a specialized Residual Correction stage.

As mentioned in the PR discussions, quality at 3-bit/4-bit is a priority. Our implementation with randomized Hadamard rotations and residual correction dramatically improves accuracy for tq3/tq4 formats compared to basic rotation methods. I would be happy to contribute our residual correction logic to the Ollama backend to bring the implementation to industrial-grade quality and performance.
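For readers unfamiliar with the terms, here is a minimal sketch of the general idea: rotate a vector with a randomized Hadamard transform, quantize it to a few bits, then store a coarse correction for the quantization residual. It is not the TurboQuant-MoE implementation or the code in the Ollama PR. The function names, the uniform quantizer, and the 1-bit sign-plus-scale residual are all assumptions chosen for illustration.

```python
import math
import random


def fwht(vec):
    """Orthonormal fast Walsh-Hadamard transform (length must be a power of two)."""
    out = list(vec)
    n = len(out)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [x * scale for x in out]


def quantize(vec, bits):
    """Uniform symmetric quantize-then-dequantize (hypothetical quantizer, bits >= 2)."""
    levels = 2 ** (bits - 1) - 1
    peak = max(abs(x) for x in vec) or 1.0
    step = peak / levels
    return [round(x / step) * step for x in vec]


def compress_roundtrip(vec, bits, signs, correct_residual):
    """Rotate, quantize, optionally correct the residual, then rotate back."""
    # Randomized Hadamard rotation: random sign flips followed by the transform.
    rotated = fwht([s * x for s, x in zip(signs, vec)])
    approx = quantize(rotated, bits)
    if correct_residual:
        # Assumed residual stage: keep only the sign of each residual element
        # plus one shared scale (the mean absolute residual).
        residual = [r - a for r, a in zip(rotated, approx)]
        scale = sum(abs(r) for r in residual) / len(residual)
        approx = [a + (scale if r >= 0 else -scale) for a, r in zip(approx, residual)]
    # The orthonormal Hadamard matrix is its own inverse; undo the sign flips last.
    return [s * x for s, x in zip(signs, fwht(approx))]


def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


rng = random.Random(0)
dim = 64
signs = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
vec = [rng.gauss(0.0, 1.0) for _ in range(dim)]
vec[5] = 12.0  # an outlier, the case rotations are meant to spread out

plain = compress_roundtrip(vec, 3, signs, correct_residual=False)
corrected = compress_roundtrip(vec, 3, signs, correct_residual=True)
print("MSE without residual correction:", mse(vec, plain))
print("MSE with residual correction:   ", mse(vec, corrected))
```

In this sketch the correction costs one extra bit per element plus one scale per vector, so the comparison is between a 3-bit code and an effectively 4-bit code. A real implementation would have to account for that overhead when quoting compression ratios.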

GiteaMirror added the feature request label 2026-05-05 02:29:35 -05:00

Reference: github-starred/ollama#71783