[GH-ISSUE #3386] Loading the model on VM from attached volumes is extremely slow #64119

Closed
opened 2026-05-03 16:19:44 -05:00 by GiteaMirror · 3 comments

Originally created by @levy42 on GitHub (Mar 28, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/3386

Originally assigned to: @dhiltgen on GitHub.

What is the issue?

Pulling a model and running it for the first time works fine.
However, after deallocating the VM and starting it again (re-attaching a permanent disk that holds the downloaded Ollama models), loading any large model takes more than 20 minutes.
It seems to load the model into CPU memory first, at about 100 MB per second.
This doesn't happen when I download a new model with "ollama pull" && "ollama run", only with models that were already on the attached disk.

What did you expect to see?

Same loading time as after downloading the model

Steps to reproduce

  • Install Ollama on VM Ubuntu 22.04
  • ollama pull llama2:70b
  • ollama run llama2:70b --> loads fast
  • restart VM (deallocate)
  • ollama run llama2:70b --> takes 20x longer to start
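
To put numbers on the "loads fast" and "takes 20x longer" steps above, the start-up time can be measured the same way on both boots. The following is a minimal sketch, not part of the original report: `time_command` is a hypothetical helper that only measures wall-clock time for an arbitrary command.

```python
import subprocess
import sys
import time


def time_command(cmd):
    """Run cmd to completion and return (elapsed_seconds, exit_code)."""
    start = time.perf_counter()
    # Output is discarded; only the wall-clock duration matters here.
    result = subprocess.run(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return time.perf_counter() - start, result.returncode
```

For example, `time_command(["ollama", "run", "llama2:70b", "hi"])` could be run once right after the pull and once after the VM restart. This assumes that passing a short prompt makes `ollama run` load the model, answer once, and exit, so the measured time also includes a small amount of generation.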

Are there any recent changes that introduced the issue?

No response

OS

Linux

Architecture

x86

Platform

No response

Ollama version

0.1.28

GPU

Nvidia

GPU info

Nvidia A100

CPU

No response

Other software

No response

GiteaMirror added the bug label 2026-05-03 16:19:44 -05:00

@mxyng commented on GitHub (Mar 28, 2024):

Is it on a cloud platform, such as AWS or GCP, or self-hosted? If self-hosted, what hypervisor? What kind of disk is attached to the VM? Please provide as much detail as you can.


@OPDEV001 commented on GitHub (Apr 2, 2024):

x86? I hope that was just a typo and you meant x86-64, because the memory available on x86 is too low to run any model with Ollama.

And what disk type do you have? Local SSD or iSCSI?

It is easy to test disk performance on Ubuntu. You can run any disk benchmark twice: once before starting Ollama the first time, and again before starting it after the restart.
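
As one possible way to run such a test without extra tools, the sketch below reads a file sequentially and reports throughput in MB/s. It is only an illustration: `read_throughput_mb_s` is a hypothetical helper, and the file to read (for example a model blob on the attached disk) depends on the setup.

```python
import time


def read_throughput_mb_s(path, chunk_size=16 * 1024 * 1024):
    """Read path sequentially and return (bytes_read, MB_per_second)."""
    total = 0
    start = time.perf_counter()
    # buffering=0 skips Python's own buffer; the OS page cache still applies,
    # so only the first read after a boot reflects the disk itself.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    # Guard against a zero interval on very small files.
    return total, total / 1e6 / max(elapsed, 1e-9)
```

Running this once right after the VM restart and once more immediately afterwards would show whether the first (cold) read is close to the reported 100 MB per second while the repeated read is much faster.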

Thanks,


@dhiltgen commented on GitHub (Jun 1, 2024):

@levy42 if you're still having performance problems, can you share more information about your setup? I'll reopen if so.

This feels more like a hypervisor/virtualization platform performance issue and less likely to be Ollama causing the problem; we just do a lot of I/O on large files. For example, if your virtualization layer is doing a disk replication from one server to another when you restart/recreate the VM, that could be causing slow I/O. Please share more information about what hypervisor you're using, where the virtual disks are stored, whether there is a storage array, what the mechanism is for re-creating the VM, and whether multiple hosts are involved or this is all on a single host. Anything that helps us understand how I/O works in your setup is useful, as that feels like the most likely cause of the poor performance.

Reference: github-starred/ollama#64119