[GH-ISSUE #13622] poor vram allocations #34724

Closed
opened 2026-04-22 18:32:51 -05:00 by GiteaMirror · 1 comment

Originally created by @wOvAN on GitHub (Jan 4, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/13622

What is the issue?

![Image](https://github.com/user-attachments/assets/b820ff14-9f50-4062-905b-cca2a0f21103)

As you can see, a lot of VRAM is not used at all while the model gets offloaded to RAM. For example, with llama.cpp I can run a larger context and higher quants on the same system.

It would also be good to load model layers onto multiple GPUs in parallel, for cases where the model storage is faster than a single GPU's PCIe link speed.

Relevant log output


OS

Linux, Docker

GPU

Nvidia

CPU

No response

Ollama version

0.13.5

GiteaMirror added the bug label 2026-04-22 18:32:51 -05:00

@rick-github commented on GitHub (Jan 5, 2026):

GLM is not yet supported in the new ollama engine, so it uses the older memory estimation logic, which is sometimes inaccurate. More VRAM can be used by explicitly setting [`num_gpu`](https://github.com/ollama/ollama/issues/6950#issuecomment-2373663650).
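A minimal sketch of that workaround using a Modelfile (the model name `glm4` and the layer count 99 are illustrative assumptions; set the count too high and the runner may fail with an out-of-memory error, so lower it until the model loads):

```
FROM glm4
# Offload up to 99 layers to the GPU(s) instead of relying on the
# automatic memory estimate.
PARAMETER num_gpu 99
```

Create and run it with `ollama create glm4-gpu -f Modelfile` followed by `ollama run glm4-gpu`. The same option can also be passed per-request through the API's `options.num_gpu` field without building a new model.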

Reference: github-starred/ollama#34724