[GH-ISSUE #8330] Using Ollama 0.5.4 causes the pull progress to decrease instead of increase #51849

Closed
opened 2026-04-28 21:04:00 -05:00 by GiteaMirror · 4 comments

Originally created by @leoho0722 on GitHub (Jan 7, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/8330

What is the issue?

Hi! I created a CPU Instance on HPC-AI.com to pull llama3.3:70b-instruct-fp16 and store it in Shared HighSpeedStorage for subsequent inference in the GPU Instance.

The Ollama version installed in the CPU Instance is 0.5.4, as shown below:

![Screenshot 2025-01-07 13 50 02](https://github.com/user-attachments/assets/0367b5ad-fcda-4372-844a-72fb5eae9145)

However, when I pulled the model in the CPU Instance, I found that the pull progress decreased instead of increasing, as shown in the following two figures.

![Screenshot 2024-12-26 14 06 11](https://github.com/user-attachments/assets/b6787d06-3e4c-42a8-a8c5-94363affe9d7)
![Screenshot 2024-12-26 14 19 26](https://github.com/user-attachments/assets/1f8da0cf-70c9-4a0c-a8f2-0d2a635b46dc)

After discussing this with HPC-AI.com's technical advisors, they said there were no issues with their infrastructure, as shown below:

![Screenshot 2025-01-07 13 44 04](https://github.com/user-attachments/assets/7d2a9dd1-d54e-4da0-9fe8-5e21f08f1b37)
![Screenshot 2025-01-07 13 44 45](https://github.com/user-attachments/assets/d72402c3-942f-4fde-8b3f-60b40cb1a6b5)
![Screenshot 2025-01-07 13 45 14](https://github.com/user-attachments/assets/f93b96d3-3b03-4c07-a1a9-7f887a9dd555)
![Screenshot 2025-01-07 13 45 28](https://github.com/user-attachments/assets/0bab93cd-ac56-4a68-a37e-ee6e4276ce64)

I tried pulling llama3.3:70b-instruct-fp16 locally. The pull progress was normal and kept increasing, with no decreases, as shown in the following two figures.

![Screenshot 2024-12-31 11 28 45](https://github.com/user-attachments/assets/0c7472dd-d392-432b-81cb-8b962568e41f)
![Screenshot 2024-12-31 11 29 17](https://github.com/user-attachments/assets/d3f4ac10-725a-41ef-978b-3b1c0d2a1bf3)

However, the Ollama version installed locally is 0.3.14:

![Screenshot 2025-01-07 13 49 13](https://github.com/user-attachments/assets/cda54846-389f-4a62-8672-28596559234e)

I'm wondering if this is a known bug in Ollama 0.5.4?

Looking forward to your reply, thanks.

OS

Linux

GPU

Nvidia

CPU

Intel

Ollama version

0.5.4

GiteaMirror added the networking and bug labels 2026-04-28 21:04:01 -05:00

@rick-github commented on GitHub (Jan 7, 2025):

Ollama downloads models in parallel chunks. In the case of llama3.3:70b-instruct-fp16 these chunks are 1 GB each, so Ollama creates 142 parallel connections to retrieve the model data. Ollama has a stall detector: when data for a chunk hasn't been received for 5 seconds, that chunk is marked as stalled and is restarted. This is probably why the percentage complete goes down: a chunk was partially downloaded (say 200 MB of 1 GB) but then stalled, so Ollama deletes the partially downloaded chunk and tries again. In your screenshot, after the first stall the downloaded total went from 2.5 GB to 2.3 GB, so 200 MB was discarded for the chunk restart.

The root cause here is that downloads started from the HPC-AI.com instance are stalling. It's not clear whether the stall is related to disk or network bandwidth. You mention HighSpeedStorage, so I'm going to assume it's not disk. I'm also assuming that HPC-AI.com is well connected and not throttling parallel connections, and that the Cloudflare CDN serving the model data is working properly. So the likely problem is an intermediate network issue: something between your instance and the Cloudflare server is causing problems (throttling, packet drops, etc.). You can try probing the network from your instance (traceroute, tcpdump, ping, etc.) to see if there's anything notable (packet drops, latency spikes, network loops, etc.), but probably the only thing you can do is retry the download. In a pinch, you could download the model to your local machine and then copy it to your HPC-AI.com instance.
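For illustration, here is a minimal Go sketch of the stall-and-restart behaviour described above: a chunk is fetched with an HTTP Range request, a watchdog aborts the transfer if no bytes arrive for 5 seconds, and the caller is expected to throw away the partial data and retry the chunk. The package and function names, the use of net/http's default client, and the 32 KB read buffer are assumptions for the example, not Ollama's actual implementation; only the 5-second stall timeout follows the description above.

```go
package pull

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"time"
)

// stallTimeout mirrors the 5-second stall detector described above.
// Everything in this file is an illustrative sketch, not Ollama's code.
const stallTimeout = 5 * time.Second

// downloadChunk fetches one byte range [start, end]. A watchdog cancels the
// request if no data arrives for stallTimeout; the caller then discards the
// partial chunk and retries it from the beginning, which is why the reported
// "downloaded" total can drop.
func downloadChunk(ctx context.Context, url string, start, end int64) ([]byte, error) {
	ctx, cancel := context.WithCancel(ctx)
	defer cancel()

	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Range", fmt.Sprintf("bytes=%d-%d", start, end))

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	// progress is signalled on every successful read; the watchdog cancels
	// the request when the gap between reads exceeds stallTimeout.
	progress := make(chan struct{}, 1)
	go func() {
		timer := time.NewTimer(stallTimeout)
		defer timer.Stop()
		for {
			select {
			case <-progress:
				if !timer.Stop() {
					<-timer.C
				}
				timer.Reset(stallTimeout)
			case <-timer.C:
				cancel() // stall detected: abort so the chunk can restart
				return
			case <-ctx.Done():
				return
			}
		}
	}()

	buf := make([]byte, 0, end-start+1)
	part := make([]byte, 32*1024)
	for {
		n, err := resp.Body.Read(part)
		if n > 0 {
			buf = append(buf, part[:n]...)
			select {
			case progress <- struct{}{}:
			default:
			}
		}
		if err == io.EOF {
			return buf, nil
		}
		if err != nil {
			return nil, fmt.Errorf("chunk %d-%d stalled or failed: %w", start, end, err)
		}
	}
}
```

In this model every stall costs whatever had already been received for that chunk, which matches the drop in the "downloaded" total seen in the screenshots.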

@dpkirchner commented on GitHub (Jan 22, 2025):

5 seconds is a pretty low timeout for 142 parallel downloads IMO -- in my experience I'd estimate it increases the pull time by at least 1000%, because all of those requests overload the network. Is there a way to configure that timeout or decrease the number of chunks that are downloaded concurrently? Ideally we'd be able to limit the downloads to one chunk at a time.
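At the time of this comment Ollama did not appear to expose such a setting. Purely as an illustration, the sketch below (in the same hypothetical Go package as above) shows the usual buffered-channel semaphore pattern a downloader could use to cap the number of chunks in flight; maxConcurrent, downloadAll, and fetch are made-up names for the example, not Ollama configuration options.

```go
package pull

import "sync"

// maxConcurrent is a hypothetical cap on simultaneous chunk downloads;
// it is not an Ollama setting, just a parameter of this sketch.
const maxConcurrent = 4

// downloadAll runs fetch for each chunk index, allowing at most maxConcurrent
// downloads to be in flight at once via a buffered-channel semaphore.
func downloadAll(numChunks int, fetch func(chunk int) error) []error {
	sem := make(chan struct{}, maxConcurrent)
	errs := make([]error, numChunks)

	var wg sync.WaitGroup
	for i := 0; i < numChunks; i++ {
		wg.Add(1)
		sem <- struct{}{} // blocks while maxConcurrent downloads are in flight
		go func(i int) {
			defer wg.Done()
			defer func() { <-sem }()
			errs[i] = fetch(i)
		}(i)
	}
	wg.Wait()
	return errs
}
```

With maxConcurrent set to 1, this degenerates to the one-chunk-at-a-time behaviour asked for above.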

@Sergsw74 commented on GitHub (Feb 5, 2025):

It would also be good to have an option to configure the number of parallel downloads.

@rick-github commented on GitHub (Mar 4, 2025):

This should be mitigated as of 0.5.8 by #8831, and #9294 provides an overhaul of model pulling, so I'm closing this; feel free to add updates if you are still having issues.

Reference: github-starred/ollama#51849