[GH-ISSUE #5945] Llama3.1 405b q1 q2 q5 q6 q8 fp16 #65748

Closed
opened 2026-05-03 22:31:48 -05:00 by GiteaMirror · 12 comments

Originally created by @Llamadouble999q on GitHub (Jul 25, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/5945

Would these quantizations (q1, q2, q5, q6, q8, fp16) be supported?

GiteaMirror added the model label 2026-05-03 22:31:48 -05:00

@kozuch commented on GitHub (Jul 28, 2024):

Related issue: https://github.com/ollama/ollama/issues/5889


@gileneusz commented on GitHub (Aug 2, 2024):

It's quite concerning that the ollama team ignores GPU-rich users 🤑😁


@igorschlum commented on GitHub (Aug 6, 2024):

Hello, they are all there under the instruct tags:
https://ollama.com/library/llama3.1/tags
405b-instruct-q2_K      811dcd740cc3 • 151GB • Updated 2 hours ago
405b-instruct-q3_K_L    98babcc85c29 • 215GB • Updated 28 minutes ago
405b-instruct-q3_K_M    2d60e04cc717 • 197GB • Updated 56 minutes ago
405b-instruct-q3_K_S    8efe69b31081 • 177GB • Updated 2 hours ago
405b-instruct-q4_0      78f97162a617 • 231GB • Updated 5 hours ago
405b-instruct-q4_1      5db145603842 • 257GB • Updated 4 hours ago
405b-instruct-q5_0      1fb70cf0a02b • 282GB • Updated 4 hours ago
405b-instruct-q5_1      c8ba45da8139 • 308GB • Updated 3 hours ago
405b-instruct-q8_0      f5ac28d40d17 • 436GB • Updated 3 hours ago

@Llamadouble999q could you please close the issue to keep the issue count under 1000 :-)
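
For reference, pulling and running one of these tags with the Ollama CLI looks like the sketch below (using the q2_K tag from the list above; substitute whichever quantization fits your hardware):

```sh
# Download the quantized model from the Ollama library
ollama pull llama3.1:405b-instruct-q2_K

# Start an interactive session with it
ollama run llama3.1:405b-instruct-q2_K
```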


@kozuch commented on GitHub (Aug 6, 2024):

Q1 is still missing...


@alvincho commented on GitHub (Aug 6, 2024):

I tried q2 but got an error: wrong number of tensors; expected 1138, got 1137


@kozuch commented on GitHub (Aug 6, 2024):

> I tried q2 but got an error: wrong number of tensors; expected 1138, got 1137

@alvincho The same problem was reported at https://github.com/ollama/ollama/issues/5889. Please report it as a bug in a separate issue.


@igorschlum commented on GitHub (Aug 11, 2024):

Hi @Llamadouble999q, I tried 405b-instruct-q2_K and it is less powerful than 70b-instruct-q8_0 at solving math and logic questions.
I don't think q1 would be effective. Do you have a link to Llama3.1:405b-instruct-q1 anywhere on the internet? I searched and found nothing.


@MaxJa4 commented on GitHub (Aug 14, 2024):

@Llamadouble999q 405B was uploaded a few days ago; can this issue be closed?


@igorschlum commented on GitHub (Aug 15, 2024):

@alvincho @kozuch 405b models have been updated several times in the library without versioning. It's possible that this occurred while you were pulling your model. Can you confirm that you have since been able to pull them successfully?


@alvincho commented on GitHub (Aug 15, 2024):

I updated to a new version of ollama and it works now.
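
For anyone hitting the same tensor-count error, the fix that worked here amounts to upgrading Ollama and re-pulling the model. A minimal sketch, assuming the Linux install script (macOS and Windows users should reinstall from ollama.com):

```sh
# The official install script also upgrades an existing install on Linux
curl -fsSL https://ollama.com/install.sh | sh

# Confirm the new version, then re-pull the affected model
ollama --version
ollama pull llama3.1:405b-instruct-q2_K
```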


@igorschlum commented on GitHub (Aug 15, 2024):

Hi @Llamadouble999q,

For an extremely large model, such as a 405-billion-parameter one, a Q1 quantization level is not effective, which is why it is not offered in the library.

Quantization reduces the computational load and memory footprint of LLMs by approximating the model's weights, which reduces the precision of its answers. This holds at any model size, but especially for very large models.

The minimum recommended quantization level is Q2. Ideally, aim for Q4 and move up toward Q8 if you have the computational resources and time; each step up gives higher precision and better answers.
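
To put the trade-off in rough numbers (a back-of-the-envelope estimate using approximate llama.cpp bits-per-weight figures, not exact file sizes): a GGUF file weighs about params × bits-per-weight / 8 bytes, so 405e9 × 4.5 / 8 ≈ 228 GB for q4_0 (the library lists 231GB) and 405e9 × 8.5 / 8 ≈ 430 GB for q8_0 (listed: 436GB). A ~1.6-bit quant such as llama.cpp's IQ1_S would shrink that to roughly 80 GB, but at that level so much per-weight precision is lost that answer quality drops sharply, consistent with the q2_K vs 70b-q8_0 comparison above.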

Could you please close the issue if you agree?


@pdevine commented on GitHub (Aug 28, 2024):

Closing since this is supported now. You can find the tags here: https://ollama.com/library/llama3.1/tags

Reference: github-starred/ollama#65748