[GH-ISSUE #6303] Llama 3.1 405B fix-update #3952

Closed
opened 2026-04-12 14:49:41 -05:00 by GiteaMirror · 3 comments
Owner

Originally created by @gileneusz on GitHub (Aug 10, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/6303

The update reduces memory usage while maintaining the same quality as the previous version. 🎉

[Screenshot 2024-08-10 at 20 04 09]

Link to updated model:

https://huggingface.co/meta-llama/Meta-Llama-3.1-405B

[Screenshot 2024-08-10 at 20 04 21]
GiteaMirror added the model label 2026-04-12 14:49:41 -05:00
Author
Owner

@igorschlum commented on GitHub (Aug 11, 2024):

Hi @gileneusz,
I don't see lower memory usage for the new Llama3.1:405b models. I think it's more about performance and bug fixes.
Can you close this issue, now that the new models are in the Ollama library (https://ollama.com/library/llama3.1/tags)?

Author
Owner

@FellowTraveler commented on GitHub (Aug 12, 2024):

> Hi gileneusz, I don't see lower memory usage for the new Llama3.1:405b models.

If the number of KV heads was changed from 16 to 8 and vLLM was able to get a 20% memory reduction, shouldn't Ollama see the same benefit from that?
Can you confirm that the new version is the one in Ollama now? It seems the model was updated in Ollama 2 hours ago, so I assume so.
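For context on why halving the KV-head count matters, here is a back-of-envelope KV-cache size calculation. The model-shape numbers (126 layers, head dimension 128) are assumptions taken from the published Meta-Llama-3.1-405B config, not values confirmed in this thread:

```python
# Rough KV-cache size for Llama 3.1 405B at fp16.
# Assumed model shape (from the published config): 126 layers, head_dim 128.

def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2) -> int:
    """Bytes held by the K and V caches for one sequence (fp16 by default)."""
    # Factor of 2 for the separate K and V tensors.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

LAYERS, HEAD_DIM, SEQ_LEN = 126, 128, 8192

old = kv_cache_bytes(LAYERS, 16, HEAD_DIM, SEQ_LEN)  # 16 KV heads
new = kv_cache_bytes(LAYERS, 8, HEAD_DIM, SEQ_LEN)   # 8 KV heads (GQA)

print(f"16 KV heads: {old / 2**30:.2f} GiB")         # ~7.88 GiB at 8k context
print(f" 8 KV heads: {new / 2**30:.2f} GiB")         # ~3.94 GiB
print(f"KV-cache reduction: {1 - new / old:.0%}")    # 50% of the cache itself
```

Note this halves only the KV cache; since the quantized weights dominate total memory for a 405B model, the overall saving is much smaller than 50%, which is consistent with the ~20% figure reported for vLLM depending on context length and batch size.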

Author
Owner

@gileneusz commented on GitHub (Aug 12, 2024):

Thanks for updating the model. I'll close the issue now, and I'll test the fixed model to see if the memory usage is lower.


Reference: github-starred/ollama#3952