[GH-ISSUE #6341] Llama 3.1 70B high-quality HQQ quantized model - 99%+ quality of fp16 #3980

Open
opened 2026-04-12 14:51:07 -05:00 by GiteaMirror · 2 comments
Owner

Originally created by @gileneusz on GitHub (Aug 13, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/6341

I'm not really sure if that's possible, but adding this to Ollama could really improve the 4-bit quant option:

99%+ relative performance to FP16 on all lm-eval benchmarks, with inference speed similar to FP16.

url:
https://huggingface.co/mobiuslabsgmbh/Llama-3.1-70b-instruct_4bitgs64_hqq

Screenshot 2024-08-13 at 19 03 57

also this:

https://huggingface.co/ModelCloud/Meta-Llama-3.1-70B-Instruct-gptq-4bit

Screenshot 2024-08-13 at 19 07 18
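To make the appeal of a 4-bit 70B model concrete, a back-of-envelope memory estimate (the parameter count and fp16 scale/zero metadata per group are assumptions for illustration):

```python
# Rough memory estimate for a 70B-parameter model at 4-bit with group size 64.
# Assumes one fp16 scale + one fp16 zero point stored per group of 64 weights.

PARAMS = 70e9
BITS = 4
GROUP_SIZE = 64
META_BITS = 16 + 16  # fp16 scale + fp16 zero point per group

weight_bytes = PARAMS * BITS / 8
meta_bytes = (PARAMS / GROUP_SIZE) * META_BITS / 8
total_gb = (weight_bytes + meta_bytes) / 1e9
fp16_gb = PARAMS * 16 / 8 / 1e9

print(f"4-bit gs64: {total_gb:.1f} GB vs FP16: {fp16_gb:.1f} GB")
```

That is roughly 39 GB for weights plus quantization metadata, versus about 140 GB for FP16, which is the difference between fitting on a single high-memory GPU and needing a multi-GPU node.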
GiteaMirror added the model label 2026-04-12 14:51:07 -05:00
Author
Owner

@rick-github commented on GitHub (Aug 13, 2024):

Requires support in llama.cpp. There have been issues raised (https://github.com/ggerganov/llama.cpp/issues/6368, https://github.com/ggerganov/llama.cpp/issues/4782) but unfortunately no progress.

<!-- gh-comment-id:2286873852 -->
Author
Owner

@charlesrwest commented on GitHub (Aug 21, 2024):

I would also be really interested in seeing support added for this. Would pay $100 bounty, if that helps.

<!-- gh-comment-id:2301914632 -->
Reference: github-starred/ollama#3980