[GH-ISSUE #3166] Please add the estimated memory requirement (RAM) for CPU runs and the VRAM requirement for GPU runs for each model in the model list. #1950

Open
opened 2026-04-12 12:05:49 -05:00 by GiteaMirror · 4 comments
Owner

Originally created by @JerryYao75 on GitHub (Mar 15, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/3166

What are you trying to do?

I want to know whether my computer can run a given model, but currently there is no way to tell.

How should we solve this?

Add the RAM needed for each model tag when run on CPU.
Add the VRAM needed for each model tag when run on GPU (a rough sketch of such an estimate is shown below).
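
As a rough illustration of what such an estimate could look like (a back-of-the-envelope sketch only, not Ollama's actual estimator): weight memory is approximately parameter count times bytes per weight for the chosen quantization, and the KV cache, activation buffers, and runtime overhead come on top of that. The bytes-per-weight figures below are assumed approximations for common GGUF quantizations.

```python
# Back-of-the-envelope estimate of the memory needed just to hold model weights.
# This is NOT Ollama's internal estimator; it ignores KV cache, activations,
# and runtime overhead, all of which add more on top.

# Approximate bytes per weight for common GGUF quantizations (assumed values).
BYTES_PER_WEIGHT = {
    "f16": 2.0,
    "q8_0": 1.0625,   # ~8.5 bits/weight
    "q5_K_M": 0.71,   # ~5.7 bits/weight
    "q4_K_M": 0.60,   # ~4.8 bits/weight
}

def weight_memory_gib(param_count_billion: float, quant: str) -> float:
    """Approximate GiB required to hold the weights of a model."""
    total_bytes = param_count_billion * 1e9 * BYTES_PER_WEIGHT[quant]
    return total_bytes / (1024 ** 3)

if __name__ == "__main__":
    # Example: a 7B model at ~4.8-bit quantization needs roughly 4 GiB for
    # weights alone; plan extra headroom for context and overhead.
    for quant in ("q4_K_M", "q8_0", "f16"):
        print(f"7B @ {quant}: ~{weight_memory_gib(7, quant):.1f} GiB")
```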

What is the impact of not solving this?

Every user has to download each model and test it themselves, which is a big waste of resources.

Anything else?

No response

GiteaMirror added the feature request label 2026-04-12 12:05:49 -05:00
Author
Owner

@mkos11 commented on GitHub (Mar 15, 2024):

To provide memory (RAM) and video memory (VRAM) requests for each model in a model list, I'll assume you are referring to machine learning or deep learning models commonly used in AI applications. Keep in mind that these are approximate values and actual resource requirements may vary based on specific implementations, dataset sizes, and hardware configurations. Also, GPU memory requirements can vary based on the batch size used during training.

Here is an example list of models along with their approximate memory and VRAM requests:

  1. ResNet50:

    • CPU Memory Request: Around 4-8 GB RAM
    • GPU VRAM Request: Around 2-4 GB VRAM
  2. MobileNetV2:

    • CPU Memory Request: Around 2-4 GB RAM
    • GPU VRAM Request: Around 1-2 GB VRAM
  3. BERT (Base):

    • CPU Memory Request: Around 8-16 GB RAM
    • GPU VRAM Request: Around 6-12 GB VRAM
  4. YOLOv3:

    • CPU Memory Request: Around 4-8 GB RAM
    • GPU VRAM Request: Around 4-8 GB VRAM
  5. GPT-3 (Small):

    • CPU Memory Request: Around 16-32 GB RAM
    • GPU VRAM Request: Around 8-16 GB VRAM
  6. InceptionV3:

    • CPU Memory Request: Around 4-8 GB RAM
    • GPU VRAM Request: Around 2-4 GB VRAM
  7. VGG16:

    • CPU Memory Request: Around 8-16 GB RAM
    • GPU VRAM Request: Around 4-8 GB VRAM
  8. LSTM (Long Short-Term Memory):

    • CPU Memory Request: Around 8-16 GB RAM
    • GPU VRAM Request: Around 4-8 GB VRAM

These are rough estimates and can vary based on factors such as batch size, input data size, model complexity, and specific hardware configurations. It's always a good practice to monitor resource usage during model training to optimize resource allocation and performance.
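
One way to follow that advice in practice, assuming an NVIDIA GPU and the `nvidia-smi` tool on PATH, is to sample used VRAM while the model is loaded. Note this observes usage after the fact rather than predicting the requirement up front:

```python
# Minimal sketch of checking GPU memory use while a model is loaded,
# via nvidia-smi (NVIDIA GPUs only).
import subprocess

def gpu_memory_used_mib() -> list[int]:
    """Return the currently used VRAM (MiB) for each visible NVIDIA GPU."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"],
        text=True,
    )
    return [int(line) for line in out.splitlines() if line.strip()]

if __name__ == "__main__":
    print("VRAM used per GPU (MiB):", gpu_memory_used_mib())
```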

Author
Owner

@santiago-afonso commented on GitHub (Apr 16, 2024):

Some integration with an LLM VRAM calculator like this https://huggingface.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator would be very helpful. A simple indication of whether one can run a model on GPU or CPU on a personal device would be enough.

(Btw the ollama readme does have some model size estimates https://github.com/ollama/ollama, but the VRAM calc seems more complete.)

Author
Owner

@robertvazan commented on GitHub (Nov 2, 2024):

Context (KV cache) memory estimates would be particularly useful, because models have wildly differing KV cache sizes, which cannot be guessed from model size. This should include information about the effective context size, which is usually the default (see #1005) but can be modified in the Modelfile. In order to set the context size correctly in front-ends, it would be useful to know how many tokens fit in 1 GB. Currently, one has to attempt to load the model and then search the Ollama log for the KV cache allocation info.
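
For reference, the standard transformer KV-cache arithmetic can be sketched as below. The layer/head/dimension numbers in the example are illustrative placeholders; the real values come from the model's metadata (e.g. the GGUF header), and GQA/MQA models use far fewer KV heads than attention heads, which is exactly why cache size cannot be guessed from model size alone.

```python
# Hedged sketch of the usual transformer KV-cache size formula:
# per token, each layer stores keys and values of shape (n_kv_heads, head_dim).

def kv_cache_bytes_per_token(n_layers: int, n_kv_heads: int,
                             head_dim: int, bytes_per_elem: int = 2) -> int:
    """Bytes of KV cache per token: keys + values across all layers."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem

def tokens_per_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                   bytes_per_elem: int = 2) -> int:
    """How many context tokens fit in 1 GiB of KV cache."""
    per_token = kv_cache_bytes_per_token(n_layers, n_kv_heads, head_dim, bytes_per_elem)
    return (1024 ** 3) // per_token

if __name__ == "__main__":
    # Illustrative Llama-2-7B-like shape: 32 layers, 32 KV heads, head dim 128,
    # fp16 cache -> 512 KiB per token, i.e. ~2048 tokens per GiB.
    print(tokens_per_gib(n_layers=32, n_kv_heads=32, head_dim=128))
```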

Author
Owner

@tdbe commented on GitHub (Mar 23, 2026):

At least tell the user, after the fact, what allocation size was attempted and failed when this annoying message appears: "Error: 500 Internal Server Error: memory layout cannot be allocated"

(If you have access to the Ollama server and to `server.log` or the server's command-line output, you can look for a line like `time=2026-03-23T18:12:18.431+02:00 level=INFO source=device.go:272 msg="total memory" size="75.8 GiB"`.)
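
As a stopgap, the reported values can be pulled out of the log programmatically. A minimal sketch, assuming the log has been saved to a local `server.log` (the actual path depends on the platform and on how the server was started):

```python
# Extract the size="..." values from Ollama log lines containing msg="total memory".
import re
from pathlib import Path

LOG_PATH = Path("server.log")  # assumption: adjust to where your server log lives

def total_memory_values(log_path: Path) -> list[str]:
    """Return the size reported by every msg="total memory" log line."""
    pattern = re.compile(r'msg="total memory"\s+size="([^"]+)"')
    return [m.group(1)
            for line in log_path.read_text(errors="replace").splitlines()
            if (m := pattern.search(line))]

if __name__ == "__main__":
    print("Reported total memory:", total_memory_values(LOG_PATH))
```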
