[PR #10133] [MERGED] Create a new file descriptor for each goroutine during GGML model loading to improve loading from network filesystems. #39025


📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/10133
Author: @danhipke
Created: 4/4/2025
Status: Merged
Merged: 4/5/2025
Merged by: @jmorganca

Base: main ← Head: newfd


📝 Commits (3)

  • 4a5c608 Try creating a new fd for each reader.
  • c37533a New approach of opening the file multiple times to get a new FD (cloning reuses the same fd).
  • 0fe6d2d refactor and add comments

📊 Changes

1 file changed (+8 additions, -1 deletions)


📝 ml/backend/ggml/ggml.go (+8 -1)

📄 Description

This fixes #9691

Ollama currently reuses the same file descriptor across all goroutines reading the model. GCS Fuse (and often other network-based filesystems) keeps only one open download stream per FD, since you cannot seek within a stream. Every read at a non-sequential offset therefore has to close the existing download and start a new one at the desired offset, which is slow, and this happens constantly because each goroutine reads from a different offset on the same FD. The error in #9691 occurs because the model cannot be loaded within the timeout.
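
For illustration, here is a minimal Go sketch of the shared-FD pattern described above. The names (tensor, tensors, modelPath) are hypothetical and this is not the actual ggml.go code; imports of io, log, os, and sync are assumed. All goroutines issue ReadAt against one shared *os.File:

// Before: one shared *os.File for every goroutine.
f, err := os.Open(modelPath)
if err != nil {
	log.Fatalln("open:", err)
}
defer f.Close()

var wg sync.WaitGroup
for _, t := range tensors {
	wg.Add(1)
	go func(t tensor) {
		defer wg.Done()
		buf := make([]byte, t.size)
		// Concurrent ReadAt calls hit the shared FD at scattered offsets,
		// so GCS Fuse keeps tearing down and restarting its single
		// per-FD download stream.
		if _, err := f.ReadAt(buf, t.offset); err != nil {
			log.Println("read:", err)
		}
	}(t)
}
wg.Wait()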

This fixes the issue by creating a new FD per goroutine, from which each goroutine reads sequentially.
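
And a sketch of the fix under the same assumptions: the loop is unchanged except that each goroutine now opens the file itself, getting a private FD that it reads sequentially:

var wg sync.WaitGroup
for _, t := range tensors {
	wg.Add(1)
	go func(t tensor) {
		defer wg.Done()
		// After: os.Open per goroutine. A fresh open(2) yields a new open
		// file description, and hence a new download stream on GCS Fuse.
		// dup(2)-style cloning would not help, since a duplicate shares the
		// open file description (this is why the commits open the file
		// again instead of cloning the FD).
		f, err := os.Open(modelPath)
		if err != nil {
			log.Println("open:", err)
			return
		}
		defer f.Close()
		if _, err := f.Seek(t.offset, io.SeekStart); err != nil {
			log.Println("seek:", err)
			return
		}
		// One seek, then a purely sequential read of this goroutine's range.
		buf := make([]byte, t.size)
		if _, err := io.ReadFull(f, buf); err != nil {
			log.Println("read:", err)
		}
	}(t)
}
wg.Wait()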

On Cloud Run with a Cloud Storage volume (powered by GCS Fuse), this fixes the issue where model loading simply times out: gemma3:4b loads in 12s and gemma3:27b loads in 32s (measured by model loading time with OLLAMA_DEBUG=1).

I also tested on Google Compute Engine on an n2-standard-8 (8 vCPUs, 32 GB memory) with the model stored on both HDD (Standard PD) and local SSD to make sure there are no regressions there (the two branches are expected to perform the same):

% git clone https://github.com/danhipke/ollama.git
% cd ollama
% git checkout newfd
% go build

# test script (new fd per goroutine)
% OLLAMA_MODELS=/mnt/disks/pd OLLAMA_DEBUG=1 ./ollama serve
% OLLAMA_HOST=http://localhost:11434 ~/ollama/ollama run gemma3  # 32s
% OLLAMA_HOST=http://localhost:11434 ~/ollama/ollama run gemma3:27b  # 2min55
% OLLAMA_MODELS=/mnt/disks/localssd OLLAMA_DEBUG=1 ./ollama serve
% OLLAMA_HOST=http://localhost:11434 ~/ollama/ollama run gemma3 # 5s
% OLLAMA_HOST=http://localhost:11434 ~/ollama/ollama run gemma3:27b  # 17s

% git checkout main
% go build
# repeat test above

% OLLAMA_MODELS=/mnt/disks/pd OLLAMA_DEBUG=1 ./ollama serve 
% OLLAMA_HOST=http://localhost:11434 ~/ollama/ollama run gemma3  # 32s
% OLLAMA_HOST=http://localhost:11434 ~/ollama/ollama run gemma3:27b  # 2min55
% OLLAMA_MODELS=/mnt/disks/localssd OLLAMA_DEBUG=1 ./ollama serve
% OLLAMA_HOST=http://localhost:11434 ~/ollama/ollama run gemma3 # 5s 
% OLLAMA_HOST=http://localhost:11434 ~/ollama/ollama run gemma3:27b # 17s

Let me know if there's any other testing needed.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

Reference: github-starred/ollama#39025