[GH-ISSUE #9720] Loading of models into main memory limited to 1-3GB even from fast storage media #68409

Open
opened 2026-05-04 13:51:21 -05:00 by GiteaMirror · 0 comments

Originally created by @m-schenker on GitHub (Mar 13, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/9720

What is the issue?

I am on a Fedora 41 system with an mdadm array that can be read at 14 GB/s by a single thread or 22 GB/s by multiple threads, but ollama loads models into main memory at only about 1 to 3 GB/s. This is especially bothersome because loading larger models takes minutes. It happens with both the native installation and the ollama container image run under podman.
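For anyone who wants to compare raw storage speed against the observed load rate, here is a minimal sketch of a single-threaded sequential read benchmark. It is not how ollama loads models. The function name `measure_read_throughput` and the 16 MiB block size are assumptions chosen for illustration, and the figure will be inflated if the file is already in the page cache.

```python
import time


def measure_read_throughput(path, block_size=16 * 1024 * 1024):
    """Read `path` sequentially with one thread.

    Returns (bytes_read, seconds, gigabytes_per_second).
    The block size is an arbitrary illustrative choice, not a tuned value.
    """
    buf = bytearray(block_size)
    view = memoryview(buf)
    total = 0
    start = time.perf_counter()
    # buffering=0 gives an unbuffered file object, so each readinto()
    # maps to a read on the underlying file descriptor.
    with open(path, "rb", buffering=0) as f:
        while True:
            n = f.readinto(view)
            if not n:  # 0 at end of file
                break
            total += n
    elapsed = time.perf_counter() - start
    rate = total / elapsed / 1e9 if elapsed > 0 else float("inf")
    return total, elapsed, rate
```

Calling `measure_read_throughput("/path/to/model/blob")` (a placeholder path) on a model file and comparing the result with the time ollama takes to load the same model should show whether the gap is in the storage stack or in the loader.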

Relevant log output


OS

Linux

GPU

Intel

CPU

AMD

Ollama version

0.6.0

GiteaMirror added the bug label 2026-05-04 13:51:21 -05:00
Reference: github-starred/ollama#68409