[PR #8935] [CLOSED] Prevent model thrashing from unset API fields. #44067

opened 2026-04-24 23:36:17 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/8935
Author: @rick-github
Created: 2/8/2025
Status: Closed

Base: `main` ← Head: `model-default`


📄 Description

TLDR: a model shouldn't be evicted due to differently valued API fields if the client doesn't care about those fields.

This is a superseding PR to #8029, which only dealt with `num_ctx`. It turns out there are other fields that have the same effect.

Client A loads a model with a context window different to the default or the value configured in the Modelfile:

```console
$ curl localhost:11434/api/generate -d '{"model":"llama3.2","options":{"num_ctx":65536}}'
$ ollama ps
NAME               ID              SIZE     PROCESSOR    UNTIL
llama3.2:latest    a80c4f17acd5    13 GB    100% GPU     Forever
```

Client B does a completion but doesn't specify a context window, causing the default value of 2048 to be used and resulting in eviction and immediate reload of the model (the smaller 3.1 GB footprint below reflects the much smaller KV cache at a 2048 context).

```console
$ curl localhost:11434/api/generate -d '{"model":"llama3.2"}'
$ ollama ps
NAME               ID              SIZE      PROCESSOR    UNTIL
llama3.2:latest    a80c4f17acd5    3.1 GB    100% GPU     Forever
```

Client A sends another completion with the large context, causing another eviction and reload.

```console
$ curl localhost:11434/api/generate -d '{"model":"llama3.2","options":{"num_ctx":65536}}'
$ ollama ps
NAME               ID              SIZE     PROCESSOR    UNTIL
llama3.2:latest    a80c4f17acd5    13 GB    100% GPU     Forever
```

If client B is not concerned about the context window, it shouldn't cause the eviction of an already loaded model. This is particularly noticeable when sharing a model between ollama and OpenAI endpoints: since the OpenAI endpoint can't set a context window, a model loaded via the ollama endpoint with a custom context window gets evicted by the next OpenAI request.
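
For illustration (the payload here is hypothetical, but the OpenAI-compatible endpoint has no field for the context window at all), a request like this resets the model to the 2048 default:

```console
$ curl localhost:11434/v1/chat/completions -H "Content-Type: application/json" \
    -d '{"model":"llama3.2","messages":[{"role":"user","content":"hi"}]}'
$ ollama ps    # the 65536-ctx load from client A has been replaced by a 2048-ctx load
```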

Thrashing can also occur when a client makes secondary completions after a primary completion, e.g. open-webui's auto-complete feature (see https://github.com/ollama/ollama/issues/7919#issuecomment-2560465774), or when a model is used for both completion and embedding (https://github.com/ollama/ollama/issues/6148#issuecomment-2568402497), as sketched below.
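
A minimal sketch of the completion-plus-embedding case (model and input are illustrative):

```console
$ curl localhost:11434/api/generate -d '{"model":"llama3.2","options":{"num_ctx":65536}}'
$ curl localhost:11434/api/embed -d '{"model":"llama3.2","input":"hello"}'
$ ollama ps    # reloaded again: the embed request left num_ctx unset
```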

This also happens with other fields, e.g. `use_mlock`: #8903, #8922.
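
Conceptually, the fix is to treat fields the client did not set as "don't care" when deciding whether a loaded runner still matches. Below is a minimal sketch of that idea, not the actual patch; `needsReload` and its types are illustrative. It inspects the raw request JSON to see which option keys the client actually sent, and only compares those:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// needsReload reports whether any option the client explicitly set differs
// from the value the running model was loaded with. Options the client left
// unset are ignored, rather than being filled with defaults and compared.
// (A real implementation would need deep comparison for slice-valued
// options such as stop sequences.)
func needsReload(loadedOpts map[string]any, reqBody []byte) (bool, error) {
	var req struct {
		Options map[string]any `json:"options"`
	}
	if err := json.Unmarshal(reqBody, &req); err != nil {
		return false, err
	}
	for k, v := range req.Options { // only keys the client actually sent
		if loaded, ok := loadedOpts[k]; !ok || loaded != v {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	loaded := map[string]any{"num_ctx": float64(65536)}

	// Client B: no options at all -> keep the 65536-ctx load.
	reload, _ := needsReload(loaded, []byte(`{"model":"llama3.2"}`))
	fmt.Println(reload) // false

	// Client C: explicitly asks for a different context -> reload.
	reload, _ = needsReload(loaded, []byte(`{"model":"llama3.2","options":{"num_ctx":2048}}`))
	fmt.Println(reload) // true
}
```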

Fixes: #8903
Fixes: #8922


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-24 23:36:17 -05:00