[GH-ISSUE #1268] feat: smart context length management #51086
Originally created by @tjbck on GitHub (Mar 22, 2024).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/1268
e.g. `messages.length > 10`, slice
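In Python terms, that hint amounts to something like the sketch below (the threshold of 10 comes from the example above; the helper name is illustrative):

```python
MAX_MESSAGES = 10  # threshold from the `messages.length > 10` example

def clip_messages(messages: list[dict]) -> list[dict]:
    # Once the conversation grows past the limit, drop the oldest
    # messages and keep only the most recent slice.
    if len(messages) > MAX_MESSAGES:
        return messages[-MAX_MESSAGES:]
    return messages
```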
@ghost commented on GitHub (Mar 26, 2024):
Great idea. I think it would be beneficial to cache this litellm file anyway, which contains useful information, including max_tokens:
https://raw.githubusercontent.com/BerriAI/litellm/main/model_prices_and_context_window.json
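Caching that file and looking up max_tokens takes only a few lines (a minimal sketch; the cache path and helper name are assumptions, and the field names follow the litellm file):

```python
import json
import urllib.request

LITELLM_URL = (
    "https://raw.githubusercontent.com/BerriAI/litellm/main/"
    "model_prices_and_context_window.json"
)

def load_model_info(cache_path: str = "model_context_cache.json") -> dict:
    # Serve from the local cache when present; otherwise fetch once and store.
    try:
        with open(cache_path) as f:
            return json.load(f)
    except FileNotFoundError:
        with urllib.request.urlopen(LITELLM_URL) as resp:
            data = json.load(resp)
        with open(cache_path, "w") as f:
            json.dump(data, f)
        return data

info = load_model_info()
print(info["gpt-4"]["max_tokens"])
```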
While it may not be useful for local Ollama models, the Ollama Modelfile syntax supports the num_ctx parameter, which can be queried via the API. A good strategy may be to leverage the litellm JSON data for external models like OpenAI, and to presume that every Ollama model uses the Ollama default context length of 2048 unless the Modelfile sets num_ctx for that model. It's still beneficial to retain configurability, though, for cases where you don't need the maximum context or where the information is absent.
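A sketch of that fallback against Ollama's /api/show endpoint (parsing the parameters block as plain "key value" lines is an assumption about its format):

```python
import json
import urllib.request

def ollama_num_ctx(model: str, host: str = "http://localhost:11434") -> int:
    # /api/show returns Modelfile details, including a "parameters" block.
    req = urllib.request.Request(
        f"{host}/api/show",
        data=json.dumps({"name": model}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        info = json.load(resp)
    # The parameters block is plain text with one "key value" pair per line.
    for line in info.get("parameters", "").splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "num_ctx":
            return int(parts[1])
    return 2048  # the Ollama default assumed above
```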
@VertexMachine commented on GitHub (Jun 5, 2024):
Let me add some more information. This is very much needed if you use APIs, and not only the OpenAI API but also others like OpenRouter or Infermatics. Some model+endpoint combinations simply fail when you exceed the context length (returning error 400), while others incur massive cost for the user (cost grows with context size, so truncation might be a good option in those cases). Unfortunately, the problem is that there is no standardized tokenization endpoint defined in the OpenAI-compatible API; OpenAI recommends using https://github.com/openai/tiktoken on the client side.
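For OpenAI models, that client-side counting takes only a few lines with tiktoken:

```python
import tiktoken

# Look up the encoding that matches the model, then count tokens
# client-side before sending the request.
enc = tiktoken.encoding_for_model("gpt-4")
prompt = "How many tokens is this?"
print(len(enc.encode(prompt)))
```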
As a workaround, I use AutoTokenizer from Transformers (`from transformers import AutoTokenizer`) to calculate token counts in my apps. This is the function I've written (feel free to incorporate it in your code). I made it generic, as sometimes I don't want the BOS/EOS tokens counted (HF tokenizers add BOS by default, but not EOS).
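A minimal sketch of such a generic token counter (a reconstruction, not the commenter's original snippet; the model-name-to-tokenizer mapping is hypothetical):

```python
from transformers import AutoTokenizer

# Hypothetical mapping from API model names to Hugging Face tokenizer
# repos ("the above mapping" referenced below); extend as needed.
MODEL_TO_TOKENIZER = {
    "meta-llama/Meta-Llama-3-8B-Instruct": "meta-llama/Meta-Llama-3-8B-Instruct",
}

def count_tokens(text: str, model: str, add_special_tokens: bool = False) -> int:
    # Fall back to the gpt2 tokenizer when the model is unknown.
    repo = MODEL_TO_TOKENIZER.get(model, "gpt2")
    tokenizer = AutoTokenizer.from_pretrained(repo, legacy=False)
    # add_special_tokens=False leaves BOS/EOS out of the count.
    return len(tokenizer.encode(text, add_special_tokens=add_special_tokens))
```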
There are a few issues here: `tokenizer = AutoTokenizer.from_pretrained("model_name", legacy=False)` will download the appropriate tokenizer files. Also, feel free to use the above mapping as a starting point. As a last-resort fallback I simply use the gpt2 tokenizer: `AutoTokenizer.from_pretrained("gpt2")`.
@tjbck commented on GitHub (Jun 19, 2024):
The Filter function from #3247 will resolve this. You can essentially write your own custom middleware and install it with Functions.
@tjbck commented on GitHub (Jun 30, 2024):
https://openwebui.com/f/hub/context_clip_filter
Feedback wanted here!
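For reference, a hedged sketch of the shape such a filter takes (the Filter class with Valves and an inlet hook follows Open WebUI's Functions convention; the max_turns valve is an assumed parameter, and the linked context_clip_filter remains the canonical version):

```python
from pydantic import BaseModel

class Filter:
    class Valves(BaseModel):
        max_turns: int = 8  # assumed configurable limit

    def __init__(self):
        self.valves = self.Valves()

    def inlet(self, body: dict, __user__: dict | None = None) -> dict:
        # Keep the system prompt, then only the most recent turns.
        messages = body.get("messages", [])
        system = [m for m in messages if m.get("role") == "system"]
        rest = [m for m in messages if m.get("role") != "system"]
        body["messages"] = system + rest[-self.valves.max_turns:]
        return body
```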