[GH-ISSUE #12288] feat: context_length model setting estimator #16535

2026-04-19T22:26:02-05:00

GiteaMirror commented

2026-04-19 22:26:02 -05:00

Originally created by @TheSpaceGod on GitHub (Apr 1, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/12288

Check Existing Issues

I have searched the existing issues and discussions.

Problem Description

After trying to find the largest context length I can fit for several models within my specific GPU's VRAM capacity, I am wondering if there's an easier way to do this than monitoring Ollama log output and using trial and error.

Desired Solution you'd like

If a user enters their known VRAM capacity, could a conservative estimate be made of the context length that can be set AUTOMATICALLY for a model, once the model's quantized size has been factored in? Any mechanism would be better than just defaulting to a context length of 2048; I'm sure many users have no idea their context is being truncated.
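To make the request concrete, here is a minimal sketch of the kind of estimate being asked for. It is not how Ollama or Open WebUI computes anything; the function names, the fixed overhead, the safety factor, and the example model shape are all assumptions for illustration. It relies only on the standard sizing of a transformer KV cache: two tensors (keys and values) per layer, each of `n_kv_heads * head_dim` elements per token.

```python
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass
class ModelShape:
    """Hypothetical description of a model's attention layout."""
    n_layers: int
    n_kv_heads: int
    head_dim: int
    kv_bytes_per_element: int = 2  # assumes an fp16 KV cache


def kv_bytes_per_token(shape: ModelShape) -> int:
    # Keys and values (factor of 2) are cached for every layer.
    return 2 * shape.n_layers * shape.n_kv_heads * shape.head_dim * shape.kv_bytes_per_element


def estimate_max_context(
    vram_bytes: int,
    model_bytes: int,
    shape: ModelShape,
    overhead_bytes: int = 1 * GIB,  # assumed allowance for buffers and runtime
    safety: float = 0.9,            # assumed margin to keep the estimate conservative
    step: int = 1024,               # round down to a multiple of this many tokens
) -> int:
    """Return a conservative context length, or 0 if the model does not fit."""
    budget = vram_bytes * safety - model_bytes - overhead_bytes
    if budget <= 0:
        return 0
    tokens = int(budget // kv_bytes_per_token(shape))
    return (tokens // step) * step


if __name__ == "__main__":
    # Made-up example: 12 GiB of VRAM, a 5 GiB quantized model file,
    # 32 layers, 8 KV heads, head dimension 128.
    shape = ModelShape(n_layers=32, n_kv_heads=8, head_dim=128)
    print(estimate_max_context(12 * GIB, 5 * GIB, shape))
```

A real estimator would also need to account for compute buffers that grow with context length and for partial CPU offload, so the result should be treated as a starting point rather than a guarantee.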

Alternatives Considered

No response

Additional Context

Could be related to #573

GiteaMirror closed this issue

2026-04-19 22:26:03 -05:00

GiteaMirror commented

2026-04-19 22:26:04 -05:00

@TheSpaceGod commented on GitHub (Apr 1, 2025):

Just found this estimator tool. I wonder if something like it could be integrated for RAG context length estimation.
https://huggingface.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator

GiteaMirror commented

2026-04-19 22:26:05 -05:00

@TheSpaceGod commented on GitHub (Apr 1, 2025):

After reading more issue tickets in the Ollama repo, this seems more apt to solve in the Ollama project itself, and it doesn't seem fair to put on Open WebUI. Closing this issue.

GiteaMirror commented

2026-04-19 22:26:06 -05:00

@TheSpaceGod commented on GitHub (Apr 1, 2025):

Looks like this Ollama issue tracks this problem: https://github.com/ollama/ollama/issues/1005

Reference: github-starred/open-webui#16535