[PR #3895] [MERGED] Move ggml loading to when attempting to fit #11312

opened 2026-04-12 23:27:31 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/3895
Author: @brycereitano
Created: 4/24/2024
Status: Merged
Merged: 4/25/2024
Merged by: @dhiltgen

Base: main ← Head: shiftloading


📝 Commits (3)

  • 284e02b Move ggml loading to when we attempt fitting
  • ceb0e26 Provide variable ggml for TestLoad
  • 36a6dac Restructure loading conditional chain

📊 Changes

2 files changed (+45 additions, -36 deletions)

View changed files

📝 server/sched.go (+29 -25)
📝 server/sched_test.go (+16 -11)

📄 Description

Fixes #3860

This defers loading the model until we check whether it can fit into memory. I decided not to pull it out into a separate function, as I don't think the four lines warrant it, especially after moving some of the logic around. I kept that change in a separate commit so it can easily be rolled back if the preference is to keep the steps separated in the main conditional flow.

  • Tested on a device without a GPU, loading a single model and making concurrent requests to multiple models.
  • Tested on a device with a dedicated GPU, limiting it to a single model as well as loading multiple models and making concurrent requests.

Additionally, fixed a panic in the tests when running with -race.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-12 23:27:31 -05:00

Reference: github-starred/ollama#11312