[GH-ISSUE #4730] llama3:8b-instruct performs much worse than llama3-8b-8192 on groq #2983

opened 2026-04-12 13:22:14 -05:00 by GiteaMirror · 8 comments

Originally created by @mitar on GitHub (May 30, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/4730

What is the issue?

I am running the same prompt (around 4K tokens long) against both Ollama and Groq. I tested with `llama3:8b-instruct-q4_0`, `llama3:8b-instruct-q6_K`, and `llama3:8b-instruct-q8_0`, and the results are much worse (around 22% accuracy on my test data) than when I run the same prompts against `llama3-8b-8192` on Groq (around 66% accuracy on the same test data). I do not understand how this is possible. It should be the same model.

Using `llama3:70b-instruct-q4_0` behaves similarly to `llama3-70b-8192`.

In both cases I set `num_ctx` to 8K.
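(For readers trying to reproduce this: below is a minimal sketch of one way to set `num_ctx` per request through Ollama's standard `/api/generate` REST endpoint. The prompt is a placeholder, and the original reporter may have configured the context window differently, e.g. in a Modelfile.)

```python
# Minimal sketch: one request with an 8K context window via Ollama's
# /api/generate endpoint (standard API; the prompt is a placeholder).
import json
import urllib.request

payload = {
    "model": "llama3:8b-instruct-q4_0",
    "prompt": "<your ~4K-token prompt here>",
    "stream": False,
    "options": {"num_ctx": 8192},  # 8K context, as described above
}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```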

OS

Linux

GPU

Nvidia

CPU

Intel

Ollama version

0.1.39

GiteaMirror added the bug label 2026-04-12 13:22:14 -05:00

@amvalero10 commented on GitHub (Jun 6, 2024):

I have the same problem :(


@amvalero10 commented on GitHub (Jun 6, 2024):

You can try this model:

https://ollama.com/koesn/llama3-8b-instruct

It has given me good results.
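(If you want to try the suggestion above, here is a minimal sketch using the official `ollama` Python client, assuming `pip install ollama` and a local Ollama server; the prompt is a placeholder.)

```python
# Sketch: pull the suggested community model and run a test prompt
# with the official ollama Python client.
import ollama

ollama.pull("koesn/llama3-8b-instruct")
reply = ollama.generate(
    model="koesn/llama3-8b-instruct",
    prompt="<your test prompt here>",
)
print(reply["response"])
```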


@mitar commented on GitHub (Jun 6, 2024):

@amvalero10 `koesn/llama3-8b-instruct` performs even worse than `llama3:8b-instruct-q4_0` for us.


@alexchenyu commented on GitHub (Jun 22, 2024):

I am seeing a similar situation here: the quality and accuracy of Groq's `llama3-70b-8192` model is much better than my `llama3:70b-instruct-fp16` powered by Ollama. I have no clue why. I thought it might be a precision issue, so I tried fp16, but it is still the same.


@mitar commented on GitHub (Jun 23, 2024):

@alexchenyu How large are your prompts? Ours are around 3.5K.


@alexchenyu commented on GitHub (Jun 25, 2024):

> @alexchenyu How large are your prompts? Ours are around 3.5K.

My prompts are quite long, over 4K. I think maybe that's the reason. After I switched to vLLM, it is as smart as huggingface/groq/meta.ai.
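(For context, a minimal sketch of the kind of vLLM setup being described, assuming the `meta-llama/Meta-Llama-3-8B-Instruct` checkpoint from Hugging Face; the commenter did not say exactly which checkpoint or serving mode they used.)

```python
# Sketch: offline inference with vLLM's Python API.
# Note: instruct checkpoints expect the Llama 3 chat template; the raw
# prompt below is only a placeholder.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct", max_model_len=8192)
params = SamplingParams(temperature=0.0, max_tokens=256)

outputs = llm.generate(["<your test prompt here>"], params)
print(outputs[0].outputs[0].text)
```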


@iris0329 commented on GitHub (Sep 2, 2024):

Same here; `llama3-8b-8192` on Groq performs a lot better.


@petebytes commented on GitHub (Feb 24, 2025):

Same issue for me. I tried fp16 and also tried increasing the context length (`num_ctx`) to 65536. Changing the model and the context length gave good improvements.
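(A minimal sketch of the per-request version of that context-length change, again with the official `ollama` Python client; the model tag here is just an example from this thread.)

```python
# Sketch: overriding num_ctx per request, mirroring the 65536 value above.
import ollama

reply = ollama.generate(
    model="llama3:70b-instruct-fp16",  # example tag from this thread
    prompt="<your test prompt here>",
    options={"num_ctx": 65536},
)
print(reply["response"])
```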
