[GH-ISSUE #4809] Add information on RAM and VRAM requirements of model quantization in library #3035

Open
opened 2026-04-12 13:27:21 -05:00 by GiteaMirror · 2 comments
Owner

Originally created by @geroldmeisinger on GitHub (Jun 4, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/4809

this would help to decide which model can run
a) at all
b) at the highest quality, although slowly
c) at the highest quality at GPU speed
before downloading huge files

I think one of the great pluses of Ollama is the curated model library; adding memory information would make it even more convenient.

thank you for this great tool!
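As a rough illustration of the kind of information the library could surface, here is a back-of-the-envelope estimator (not Ollama's internal graph prediction): memory is dominated by parameter count times bits per weight for the chosen quantization, plus an overhead for the KV cache and compute buffers that grows with context length. The quantization names and bits-per-weight figures below are approximate averages for common GGUF types, included purely as assumptions for illustration.

```python
# Illustrative back-of-the-envelope estimate only -- NOT Ollama's internal
# graph prediction. Bits-per-weight values are approximate averages for
# common GGUF quantization types.
BITS_PER_WEIGHT = {
    "f16": 16.0,
    "q8_0": 8.5,
    "q6_K": 6.6,
    "q5_K_M": 5.7,
    "q4_K_M": 4.8,
    "q2_K": 3.4,
}

def estimate_model_gb(params_billion: float, quant: str, overhead_gb: float = 1.5) -> float:
    """Memory to hold the weights plus a fixed overhead (KV cache, compute
    buffers). The real overhead depends on context length, so 1.5 GB is
    just an assumed placeholder."""
    weight_gb = params_billion * 1e9 * BITS_PER_WEIGHT[quant] / 8 / 1024**3
    return weight_gb + overhead_gb

# Example: an 8B-parameter model at Q4_K_M lands somewhere around 6 GB.
print(f"{estimate_model_gb(8, 'q4_K_M'):.1f} GB")
```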

GiteaMirror added the feature request label 2026-04-12 13:27:21 -05:00
Author
Owner

@pdevine commented on GitHub (Jul 11, 2024):

@geroldmeisinger sorry for the slow response. Ollama actually does graph prediction to see whether or not a model can be loaded into memory; however, the graph is different depending on whether you're loading it onto the GPU or the CPU (or split between the two), so it's somewhat difficult to know ahead of time unless we know the specs of your system.

I love the idea in principle though. It's really frustrating to download a model and then find out it's not going to run.
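To make the "depends on your system" point concrete, here is a hedged sketch that compares an estimated model size against the free VRAM reported by nvidia-smi and picks a coarse placement. It assumes an NVIDIA GPU with nvidia-smi on the PATH; Ollama's real scheduler does a much finer per-layer graph estimate and supports splitting layers between GPU and CPU, so this is only an approximation of the decision.

```python
# Illustrative "will it fit?" check -- not how Ollama's scheduler works.
# Assumes an NVIDIA GPU with nvidia-smi on the PATH; memory.free is in MiB.
import subprocess

def free_vram_gb() -> float:
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.free", "--format=csv,noheader,nounits"],
        text=True,
    )
    # Sum free memory across all reported GPUs and convert MiB -> GiB.
    return sum(float(line) for line in out.splitlines() if line.strip()) / 1024

def placement(model_gb: float) -> str:
    if model_gb <= free_vram_gb():
        return "should fit fully on the GPU"
    return "needs system RAM (partial or full CPU offload), so it will run slower"

# 6.0 GB is the rough figure for an 8B Q4_K_M model from the sketch above.
print(placement(6.0))
```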

Author
Owner

@geroldmeisinger commented on GitHub (Jul 11, 2024):

thanks for the answer. is it really that different from system to system, though? most GPUs come with 8 GB, 12 GB, 16 GB, or 24 GB of VRAM, so knowing the ±2 GB bracket would be close enough.
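Following that suggestion, a trivial sketch of what a bracket-based hint could look like; the 8/12/16/24 GB list comes from the comment above, and the example figures reuse the rough estimate sketched earlier, so all numbers here are assumptions for illustration.

```python
# Illustrative bracket check using the common VRAM sizes from the comment.
COMMON_VRAM_GB = [8, 12, 16, 24]

def smallest_bracket(model_gb: float) -> int | None:
    """Smallest common VRAM size that would hold the model, or None."""
    return next((v for v in COMMON_VRAM_GB if model_gb <= v), None)

print(smallest_bracket(6.0))   # -> 8    (rough 8B Q4_K_M estimate from above)
print(smallest_bracket(42.0))  # -> None (too big for any single common card)
```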


Reference: github-starred/ollama#3035