[GH-ISSUE #2393] Inquiry on Optimal CPU and GPU Configurations for LLaMA 2 (70B) #1392

Closed
opened 2026-04-12 11:13:22 -05:00 by GiteaMirror · 2 comments

Originally created by @gautam-fairpe on GitHub (Feb 7, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/2393

Originally assigned to: @dhiltgen on GitHub.

I am currently exploring the capabilities of LLaMA 2 for various NLP tasks and am in the process of setting up the necessary hardware environment to ensure optimal performance. Given the complexity and resource-intensive nature of LLaMA 2 (70B), I am seeking advice on the most suitable CPU and GPU configurations that can deliver the best performance for training and inference tasks with this model.

Specifically, I am interested in:

  • Recommendations for CPU and GPU models that are known to work well with LLaMA 2, considering both performance and cost-efficiency.
  • Any available estimation charts or benchmarks that illustrate the performance of LLaMA 2 with different hardware configurations. This information would be incredibly helpful for planning hardware investments and understanding the expected model throughput and latency (a rough self-measured benchmarking sketch follows below).
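
One practical way to get throughput and latency numbers for a specific machine is to measure them directly against a local Ollama instance rather than relying on published charts. The sketch below is a rough probe, not an official benchmark: it assumes Ollama is running on its default local port and that `llama2:70b` has already been pulled, and it reads the timing fields returned by the `/api/generate` endpoint.

```python
import json
import urllib.request

# Rough throughput probe against a locally running Ollama server.
# Assumes Ollama is listening on the default port and that the
# llama2:70b model has already been pulled (`ollama pull llama2:70b`).
OLLAMA_URL = "http://localhost:11434/api/generate"

def measure(prompt: str, model: str = "llama2:70b") -> None:
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(OLLAMA_URL, data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)

    # Durations in the response are reported in nanoseconds.
    prompt_tps = data["prompt_eval_count"] / (data["prompt_eval_duration"] / 1e9)
    gen_tps = data["eval_count"] / (data["eval_duration"] / 1e9)
    print(f"prompt processing: {prompt_tps:.1f} tokens/s")
    print(f"generation:        {gen_tps:.1f} tokens/s")
    print(f"total latency:     {data['total_duration'] / 1e9:.1f} s")

if __name__ == "__main__":
    measure("Explain the difference between a CPU and a GPU in two sentences.")
```

Running this with a few representative prompts on candidate hardware gives comparable tokens-per-second figures without depending on third-party benchmark charts.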

@igorschlum commented on GitHub (Feb 7, 2024):

Hi @gautam-fairpe, I bought a Mac Studio with the maximum memory configuration (192GB), so I can run all of the LLMs locally and simultaneously.

It's quick, and I know I will always be able to sell it if I'm no longer happy with it.


@jmorganca commented on GitHub (May 7, 2024):

For Llama 2/3 70B you'd want at least 48GB of VRAM, if not closer to 64GB, to fit the model and the context.
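
As a back-of-the-envelope check on that figure, memory use is roughly the quantized weight size plus the KV cache for the chosen context length, plus some runtime overhead. The sketch below uses Llama 2 70B's published architecture (80 layers, 8 grouped-query KV heads, head dimension 128) with a 16-bit KV cache; the bits-per-weight figures for the quantization formats and the overhead constant are rough assumptions, not measured values.

```python
# Back-of-the-envelope memory estimate for Llama 2 70B inference.
# Architecture constants from the Llama 2 paper; quantization sizes
# and the overhead factor are rough assumptions, not measurements.
PARAMS = 70e9
N_LAYERS = 80          # transformer blocks
N_KV_HEADS = 8         # grouped-query attention KV heads
HEAD_DIM = 128
KV_BYTES = 2           # fp16 KV cache

def weights_gb(bits_per_weight: float) -> float:
    return PARAMS * bits_per_weight / 8 / 1e9

def kv_cache_gb(context_tokens: int) -> float:
    # K and V, per layer, per token
    per_token = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * KV_BYTES
    return per_token * context_tokens / 1e9

for name, bits in [("q4_0 (~4.5 bpw)", 4.5), ("q8_0 (~8.5 bpw)", 8.5), ("fp16", 16)]:
    total = weights_gb(bits) + kv_cache_gb(4096) + 1.5  # ~1.5 GB runtime overhead (guess)
    print(f"{name:18s} ~ {total:5.1f} GB for a 4096-token context")
```

At roughly 4-bit quantization this lands a little above 40GB before any headroom, which is consistent with the 48–64GB guidance above.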


Reference: github-starred/ollama#1392