[GH-ISSUE #14257] Built‑in Hardware Benchmark Tool for Model Compatibility #55796

Open
opened 2026-04-29 09:44:55 -05:00 by GiteaMirror · 1 comment
Owner

Originally created by @xkinkykongx on GitHub (Feb 14, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/14257

🚀 Feature Request: Built‑in Hardware Benchmark Tool for Model Compatibility

Summary

Ollama allows users to run many different models, but it’s often unclear which models are suitable for a given system. Users frequently run into slow performance, failed loads, or crashes because they don’t know the hardware requirements. A built‑in benchmark tool would solve this problem.

Proposal

Introduce a command such as:

ollama benchmark

This command would:

  • Detect system hardware (CPU, GPU, VRAM, RAM, architecture)
  • Run a short standardized performance test
  • Produce a compatibility report showing:
    • Recommended models for this system
    • Borderline models
    • Models not suitable for the hardware
    • Expected tokens/sec for each category
    • Suggested quantization levels (Q4, Q5, Q8, etc.)
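To make the "compatibility report" step concrete, here is a minimal sketch of how the categorization could work: compare detected VRAM against a table of approximate per-model memory requirements and bucket each model as recommended, borderline, or not recommended. The model names, sizes, and thresholds below are illustrative assumptions, not real Ollama internals.

```go
package main

import "fmt"

// modelReq holds illustrative memory requirements for one model:
// minGB is the smallest usable quantization (e.g. Q2),
// comfyGB is a comfortable quantization (e.g. Q4 and up).
type modelReq struct {
	name    string
	minGB   float64
	comfyGB float64
}

// categorize buckets models by comparing available VRAM to each
// model's requirements.
func categorize(vramGB float64, models []modelReq) (rec, border, no []string) {
	for _, m := range models {
		switch {
		case vramGB >= m.comfyGB:
			rec = append(rec, m.name)
		case vramGB >= m.minGB:
			border = append(border, m.name)
		default:
			no = append(no, m.name)
		}
	}
	return
}

func main() {
	// Hypothetical requirement table; real numbers would come from
	// the model manifests Ollama already ships.
	models := []modelReq{
		{"llama3.1:8b", 4, 6},
		{"llama3.1:70b", 26, 40},
		{"mistral:7b", 3.5, 5},
	}
	rec, border, no := categorize(12, models) // e.g. a 12GB GPU
	fmt.Println("Recommended:", rec)
	fmt.Println("Borderline:", border)
	fmt.Println("Not recommended:", no)
}
```

The same three-way split maps directly onto the report sections in the example output below.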

Why This Matters

  • Reduces user frustration and guesswork
  • Improves onboarding for new users
  • Prevents unnecessary support issues
  • Helps users choose the right model and quantization
  • Fits Ollama’s philosophy of simplicity and local‑first design

Example Output (Concept)

Ollama Benchmark Results

Hardware:

  • GPU: NVIDIA RTX 3060 (12GB VRAM)
  • CPU: AMD Ryzen 7 5800X
  • RAM: 32GB

Recommended Models:

  • Llama 3.1 8B (Q4, Q5, Q8)
  • Mistral 7B (all quantizations)
  • Phi-3 Mini

Borderline:

  • Llama 3.1 70B (Q2 only, slow)
  • Mixtral 8x22B (Q2 only, slow)

Not Recommended:

  • Llama 405B
  • Any model requiring >12GB VRAM

Estimated Performance:

  • Llama 3.1 8B Q4: ~42 tokens/sec
  • Mistral 7B Q5: ~38 tokens/sec

Optional Enhancements

  • `ollama benchmark --quick` for a fast test
  • `ollama benchmark --full` for a detailed test
  • JSON export for tooling
  • Integration with `ollama run` to warn users before loading an unsuitable model

Closing Thoughts

A built‑in benchmark tool would make Ollama more user‑friendly, reduce friction, and help users get the best performance out of their hardware. Thanks for considering this feature!

GiteaMirror added the feature request label 2026-04-29 09:44:55 -05:00
Author
Owner

@rick-github commented on GitHub (Feb 14, 2026):

Related: #9774


Reference: github-starred/ollama#55796