[PR #10003] server: prevent model thrashing from unset API fields #44363

Open
opened 2026-04-24 23:52:19 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/10003
Author: @rick-github
Created: 3/26/2025
Status: 🔄 Open

Base: main ← Head: model-default


📝 Commits (6)

  • ae3d77b server: prevent model thrashing from unset API fields
  • 58ebe97 fix typo
  • 0d12c08 Merge branch 'main' into model-default
  • 0d8b5ae remove deprecated option use_mlock
  • 2512543 Merge branch 'main' into model-default
  • b8181de fix formatting

📊 Changes

3 files changed (+47 additions, -2 deletions)


📝 llm/server.go (+12 -0)
📝 server/routes.go (+33 -2)
📝 server/sched_test.go (+2 -0)

📄 Description

TL;DR: a model shouldn't be evicted because API field values differ if the client doesn't care about those fields.

This PR supersedes #8029, which only dealt with num_ctx. It turns out other fields have the same effect.

Client A loads a model with a context window different from the default or from the value configured in the Modelfile:

$ curl localhost:11434/api/generate -d '{"model":"llama3.2","options":{"num_ctx":65536}}'
$ ollama ps
NAME               ID              SIZE     PROCESSOR    UNTIL   
llama3.2:latest    a80c4f17acd5    13 GB    100% GPU     Forever 

Client B does a completion but doesn't specify a context window, so the default value of 2048 is used, resulting in eviction and immediate reload of the model.

$ curl localhost:11434/api/generate -d '{"model":"llama3.2"}'
$ ollama ps
NAME               ID              SIZE      PROCESSOR    UNTIL   
llama3.2:latest    a80c4f17acd5    3.1 GB    100% GPU     Forever    

Client A sends another completion with the large context, causing yet another eviction and reload.

$ curl localhost:11434/api/generate -d '{"model":"llama3.2","options":{"num_ctx":65536}}'
$ ollama ps
NAME               ID              SIZE     PROCESSOR    UNTIL   
llama3.2:latest    a80c4f17acd5    13 GB    100% GPU     Forever    

If client B is not concerned about the context window, it shouldn't cause the eviction of an already loaded model. This is particularly noticeable when sharing a model between the ollama and OpenAI endpoints: since the OpenAI endpoint can't set a context window, a model loaded via the ollama endpoint with a custom context window gets evicted by the next OpenAI request.
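The fix can be sketched in Go: decide "set vs unset" by unmarshalling the raw request options into a map, then compare only the keys the client actually sent against the options the running model was loaded with. This is a minimal illustration of the idea, not the PR's actual implementation; setFields and needsReload are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// setFields reports which option keys the client explicitly included
// in the request body, by unmarshalling the raw JSON into a map.
// Keys absent from the map were never set by the client.
func setFields(rawOptions []byte) (map[string]any, error) {
	var m map[string]any
	if err := json.Unmarshal(rawOptions, &m); err != nil {
		return nil, err
	}
	return m, nil
}

// needsReload compares only the fields the new request actually set
// against the options the running model was loaded with. Unset fields
// are skipped, so they can no longer force an eviction.
func needsReload(loaded, requested map[string]any) bool {
	for k, v := range requested {
		if !reflect.DeepEqual(loaded[k], v) {
			return true
		}
	}
	return false
}

func main() {
	// Model loaded by client A with a large context window.
	loaded := map[string]any{"num_ctx": float64(65536)}

	a, _ := setFields([]byte(`{"num_ctx":65536}`)) // client A: same value
	b, _ := setFields([]byte(`{}`))                // client B: sets nothing

	fmt.Println(needsReload(loaded, a)) // false: value unchanged
	fmt.Println(needsReload(loaded, b)) // false: no fields set, keep the model
}
```

Under this scheme, client B's bare request compares zero fields and reuses the loaded model, while a request that explicitly asks for a different num_ctx still triggers a reload.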

Thrashing can also occur when a client makes secondary completions after a primary completion, e.g. open-webui's auto-complete feature (see https://github.com/ollama/ollama/issues/7919#issuecomment-2560465774), or when a model is used for both completion and embedding (https://github.com/ollama/ollama/issues/6148#issuecomment-2568402497).

This also happens with other fields, e.g. use_mlock: #8903, #8922.

Fixes: #8903
Fixes: #8922

I messed up the rebase, see #8935 for previous discussion.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.


Reference: github-starred/ollama#44363