mirror of
https://github.com/open-webui/open-webui.git
synced 2026-06-03 23:38:13 -05:00
[GH-ISSUE #12288] feat: context_length model setting estimator #16535
Originally created by @TheSpaceGod on GitHub (Apr 1, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/12288
Problem Description
After experimenting to find the largest context length that several models can run with on my GPU's VRAM, I am wondering whether there is an easier way to do this than monitoring Ollama log output and using trial and error.
Desired Solution you'd like
If a user enters their known VRAM capacity, could a conservative estimate be made of the context length that could be set AUTOMATICALLY for a model, once the model's quantized size has been factored in? Any mechanism would be better than just defaulting to a context length of 2048; I'm sure many users have no idea their context is being truncated.
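To make the idea concrete, here is a rough sketch of how such an estimate might be computed. It is not an existing Open WebUI or Ollama feature. It assumes the KV cache dominates per-token memory and grows linearly with context, at roughly 2 (keys and values) × layers × KV heads × head dimension × bytes per element. The function name `estimate_max_context`, the 1 GiB overhead reserve, and the example model dimensions are all hypothetical values chosen for illustration.

```python
GIB = 1024 ** 3


def estimate_max_context(vram_gib, model_gib, n_layers, n_kv_heads, head_dim,
                         kv_bytes_per_elem=2, overhead_gib=1.0, round_to=256):
    """Conservative guess at the context length that fits in VRAM.

    Assumption: memory left after the quantized weights and a fixed
    overhead reserve is spent entirely on the KV cache.
    """
    # Bytes of KV cache per token: keys + values for every layer.
    kv_bytes_per_token = 2 * n_layers * n_kv_heads * head_dim * kv_bytes_per_elem

    # VRAM left once the weights and the (assumed) overhead are accounted for.
    free_bytes = (vram_gib - model_gib - overhead_gib) * GIB
    if free_bytes <= 0:
        return 0  # the model does not fit with the requested reserve

    max_tokens = int(free_bytes // kv_bytes_per_token)
    # Round down to a multiple of round_to to stay on the conservative side.
    return (max_tokens // round_to) * round_to


# Illustrative numbers only: 12 GiB GPU, 4.5 GiB quantized model,
# 32 layers, 8 KV heads, head dimension 128, fp16 KV cache.
print(estimate_max_context(12, 4.5, 32, 8, 128))  # 53248
```

A real estimator would also have to account for compute buffers, other processes using the GPU, and the model's own trained context limit, so the result should be treated as an upper bound to verify rather than a guaranteed setting.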
Alternatives Considered
No response
Additional Context
Could be related to #573
@TheSpaceGod commented on GitHub (Apr 1, 2025):
Just found this estimator tool. I wonder if something like it could be integrated for RAG context length estimation.
https://huggingface.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
@TheSpaceGod commented on GitHub (Apr 1, 2025):
After reading more issue tickets in the Ollama repo, this seems more appropriate to solve in the Ollama project itself, and it doesn't seem fair to put it on Open WebUI. Closing this issue.
@TheSpaceGod commented on GitHub (Apr 1, 2025):
Looks like this Ollama issue tracks this problem: https://github.com/ollama/ollama/issues/1005