[GH-ISSUE #1303] Memory required to run differs from expectation #47185

Closed
opened 2026-04-28 03:25:05 -05:00 by GiteaMirror · 1 comment

Originally created by @technovangelist on GitHub (Nov 28, 2023).
Original GitHub issue: https://github.com/ollama/ollama/issues/1303

After discussing internally, it was suggested that as long as we have enough total memory across RAM and VRAM, the model should load. Layers are loaded into main memory and then offloaded into VRAM. So I tried with different memory sizes and numbers of attached T4 cards, each with roughly 16 GB of VRAM.

When there is 16 GB RAM and 4x T4, adding up to roughly 76 GB total, I get a timeout: `Error: timed out waiting for llama runner to start`.

I get the same error with 30 GB RAM and with 60 GB RAM. It's not until I go to the next threshold (100 GB) with the 4x T4 that it loads correctly.

We need to clarify how much memory is required to run models. This was easy on Apple Silicon, where we started, because unified memory gives a single number; it is more complicated on NVIDIA, where system RAM and per-GPU VRAM are separate pools.
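To make the failure above concrete, here is a minimal sketch (in Go, since that is Ollama's language, but this is not Ollama's actual loader code; `memInfo`, `totalAvailable`, and `fits` are invented for illustration) of the naive "sum RAM and VRAM" check implied by the internal suggestion. A combined total of ~76 GB passes this check, yet the runner still times out, presumably because each offloaded layer must fit on a single GPU and the runner needs extra working memory (KV cache, compute buffers), so a simple sum overestimates what is usable.

```go
// Hypothetical sketch, not Ollama's real code: a naive "does it fit in
// RAM + VRAM combined?" check of the kind discussed in this issue.
package main

import "fmt"

type memInfo struct {
	systemRAM uint64   // bytes of free system RAM
	vramFree  []uint64 // bytes of free VRAM per GPU
}

// totalAvailable sums system RAM and the free VRAM of every GPU.
func totalAvailable(m memInfo) uint64 {
	total := m.systemRAM
	for _, v := range m.vramFree {
		total += v
	}
	return total
}

// fits is the naive check: it ignores per-GPU layer placement and the
// runner's working memory, so it can say "yes" when the load still fails.
func fits(modelBytes uint64, m memInfo) bool {
	return modelBytes <= totalAvailable(m)
}

func main() {
	// Roughly the failing configuration: 16 GiB RAM plus 4x T4 with
	// about 15 GiB usable VRAM each.
	m := memInfo{
		systemRAM: 16 << 30,
		vramFree:  []uint64{15 << 30, 15 << 30, 15 << 30, 15 << 30},
	}
	fmt.Printf("total available: %d GiB, 40 GiB model fits (naively): %v\n",
		totalAvailable(m)>>30, fits(40<<30, m))
}
```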


@phalexo commented on GitHub (Dec 14, 2023):

@technovangelist The function that computes NVIDIA GPU VRAM ignores the CUDA_VISIBLE_DEVICES variable and counts VRAM from all GPUs, even those that cannot actually be used.
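A hedged sketch of the behavior @phalexo is asking for, again in Go with invented names (`usableVRAM`, `gpuFreeVRAM` would come from NVML or nvidia-smi in real code): filter the per-GPU free-memory list by CUDA_VISIBLE_DEVICES before summing. For simplicity it handles only numeric device indices, not the GPU UUIDs the real variable also accepts.

```go
// Hypothetical illustration: only count VRAM on GPUs that CUDA is
// actually allowed to use, per CUDA_VISIBLE_DEVICES.
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// visibleIndices parses CUDA_VISIBLE_DEVICES into a set of device indices.
// Unset means all devices are visible; an empty or unparsable value means none.
func visibleIndices(numDevices int) map[int]bool {
	val, set := os.LookupEnv("CUDA_VISIBLE_DEVICES")
	visible := map[int]bool{}
	if !set {
		for i := 0; i < numDevices; i++ {
			visible[i] = true
		}
		return visible
	}
	for _, tok := range strings.Split(val, ",") {
		if idx, err := strconv.Atoi(strings.TrimSpace(tok)); err == nil {
			visible[idx] = true
		}
	}
	return visible
}

// usableVRAM sums free VRAM only for visible devices.
func usableVRAM(gpuFreeVRAM []uint64) uint64 {
	visible := visibleIndices(len(gpuFreeVRAM))
	var total uint64
	for i, free := range gpuFreeVRAM {
		if visible[i] {
			total += free
		}
	}
	return total
}

func main() {
	// Example: 4 GPUs reported by the driver, but only device 0 visible to CUDA.
	os.Setenv("CUDA_VISIBLE_DEVICES", "0")
	fmt.Printf("usable VRAM: %d GiB\n",
		usableVRAM([]uint64{15 << 30, 15 << 30, 15 << 30, 15 << 30})>>30)
}
```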


Reference: github-starred/ollama#47185