[GH-ISSUE #10295] 🤗 Use 8 or 16 bit integers for internal representation for integer quantized models instead of floats. #6760

Closed
opened 2026-04-12 18:31:00 -05:00 by GiteaMirror · 0 comments

Originally created by @FieldMouse-AI on GitHub (Apr 16, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/10295

It seems that integer-quantized models, while smaller on disk, are handled using floats when executed in ollama.

Could there be an option to run them with integer arithmetic instead?

That would give a great performance boost on weaker hardware.
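
To make the request concrete, here is a minimal sketch of what "running on integers" could mean for a single dot product: both vectors are quantized to 8-bit integers, the inner loop accumulates in integers only, and the float scales are applied once at the end. The helpers `quantize` and `int_dot` are hypothetical names made up for illustration, and the symmetric per-vector scaling is an assumption. This is not how ollama or `llama.cpp` actually store or execute quantized tensors.

```python
# Illustrative sketch only: symmetric per-vector integer quantization.
# quantize() and int_dot() are hypothetical helpers, not ollama or llama.cpp APIs.

def quantize(values, bits=8):
    """Map floats to signed integers plus one float scale (symmetric scheme)."""
    qmax = 2 ** (bits - 1) - 1            # 127 for 8-bit, 32767 for 16-bit
    peak = max(abs(v) for v in values)
    scale = peak / qmax if peak else 1.0  # avoid dividing by zero for all-zero input
    return [round(v / scale) for v in values], scale


def int_dot(qa, scale_a, qb, scale_b):
    """Accumulate in integers; apply the two float scales once at the end."""
    acc = sum(x * y for x, y in zip(qa, qb))  # integer-only inner loop
    return acc * scale_a * scale_b


weights = [0.5, -1.0, 0.25, 0.75]
activations = [1.0, 2.0, -0.5, 0.0]

qw, sw = quantize(weights)
qx, sx = quantize(activations)

exact = sum(w * x for w, x in zip(weights, activations))  # float reference
approx = int_dot(qw, sw, qx, sx)                           # integer path

print(f"float result: {exact}, integer-path result: {approx}")
```

The two results should agree up to a small quantization error. Passing `bits=16` to `quantize` gives the 16-bit variant mentioned in the title. Whether this is actually faster depends on the hardware having efficient integer multiply-accumulate instructions.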

I believe I've seen this implemented in the llama.cpp project.

Please share your thoughts on this. 🤗

Thanks!

GiteaMirror added the feature request label 2026-04-12 18:31:00 -05:00

Reference: github-starred/ollama#6760