[GH-ISSUE #5957] Llama 3.1 base models for text completion #3723

Closed
opened 2026-04-12 14:32:01 -05:00 by GiteaMirror · 10 comments
Owner

Originally created by @kaetemi on GitHub (Jul 25, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/5957

Currently only the instruct models appear to be in the library, the text completion models would be appreciated too. Thanks! :)

GiteaMirror added the model label 2026-04-12 14:32:01 -05:00

@d-kleine commented on GitHub (Aug 6, 2024):

@kaetemi @sala91 @alexOarga @gwillen @chigkim
Llama 3.1 text completion models are now available:
https://ollama.com/library/llama3.1
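The `text` tags are base models with no chat template, so they are typically queried through Ollama's `/api/generate` endpoint with `"raw": true`, which sends the prompt to the model verbatim. A minimal sketch — the model tag below is illustrative; check the library page for the tags actually published:

```python
import json

# Build the request body for a raw text completion against Ollama's
# /api/generate endpoint. Base ("text") models have no chat template,
# so "raw": True tells the server not to wrap the prompt in one.
# The default tag here is an assumption for illustration only.
def build_raw_completion_request(prompt: str,
                                 model: str = "llama3.1:8b-text-q4_0") -> bytes:
    payload = {
        "model": model,
        "prompt": prompt,
        "raw": True,      # skip any prompt template
        "stream": False,  # return a single JSON object instead of a stream
    }
    return json.dumps(payload).encode("utf-8")

# Usage sketch: assumes a local Ollama server on the default port 11434
# with the model already pulled.
def complete(prompt: str, host: str = "http://localhost:11434") -> str:
    from urllib.request import Request, urlopen
    req = Request(host + "/api/generate",
                  data=build_raw_completion_request(prompt),
                  headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

The helper only constructs the request body; `complete()` performs the actual HTTP call and is kept separate so the payload shape can be inspected without a running server.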

@chigkim commented on GitHub (Aug 6, 2024):

Thanks, but can we also have 405b? Currently there are only 405b quants for instruct.

@d-kleine commented on GitHub (Aug 6, 2024):

You are right, only the 8b and 70b text models have been added. I have raised a request for the 405b text model here:

https://github.com/ollama/ollama/issues/6060#issuecomment-2271938799

@d-kleine commented on GitHub (Aug 6, 2024):

Just saw that the 405b text generation models are being added too; it might take a few more hours because the files to upload are quite large.

@chigkim commented on GitHub (Aug 10, 2024):

It seems like Meta updated the 405b?

https://huggingface.co/collections/meta-llama/llama-31-669fc079a0c406a149a5738f

"Without explanation, Meta changed the number of KV heads from 16 to 8 (which now matches the whitepaper) for the 405B model. This is not just a config change, the whole model has been updated 😵"

https://www.reddit.com/r/LocalLLaMA/comments/1eoin62/meta_just_pushed_a_new_llama_31_405b_to_hf/

@d-kleine commented on GitHub (Aug 10, 2024):

Oh wow 😓

@igorschlum Would it be possible to update the Llama 3.1 405b models (both text and instruct are affected) in Ollama Models again?

@chigkim commented on GitHub (Aug 10, 2024):

Arthur Zucker: "The recent update to Llama checkpoints is from a good catch by @vllm_project, allows ~20% memory reduction because the 16 heads were already 8 heads copied 2 times. When you compute attention you expand (shared mem) to 32. Now it’s 8x4 instead of 16x2."

https://x.com/art_zucker/status/1822243183889105368
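The effect on the KV cache is easy to check: with grouped-query attention, cache size scales linearly with the number of KV heads, so dropping from 16 to 8 halves it. A rough sketch of the arithmetic using the published Llama 3.1 405B config (126 layers, head dim 128); the ~20% figure in the tweet refers to overall memory, so treat this as an illustration of the KV-cache term only:

```python
# KV-cache size per token, in bytes, for a grouped-query-attention model.
# Layer count and head dim are from the published Llama 3.1 405B config;
# an fp16 cache (2 bytes per value) is assumed for illustration.
def kv_cache_bytes_per_token(n_kv_heads: int,
                             n_layers: int = 126,
                             head_dim: int = 128,
                             bytes_per_value: int = 2) -> int:
    # 2x for the separate K and V tensors cached at every layer.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value

old = kv_cache_bytes_per_token(16)  # config before Meta's update
new = kv_cache_bytes_per_token(8)   # config after the update
print(old, new, new / old)          # the per-token cache is exactly halved
```

Since the 16 KV heads were duplicates of 8, the update changes the stored checkpoint and cache layout but not the computed attention output.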

@d-kleine commented on GitHub (Aug 10, 2024):

#6303

@igorschlum commented on GitHub (Aug 11, 2024):

@d-kleine Llama3.1:405b is in the process of being updated on the [library](https://ollama.com/library/llama3.1/tags) page. I don’t know who is responsible for updating the models on the Ollama team, as I’m just a user of Ollama, like you.

@kaetemi As this issue has been addressed, could you please close it so we can focus on other issues to solve?

@kaetemi commented on GitHub (Aug 11, 2024):

Seems to be resolved now.

Reference: github-starred/ollama#3723