[GH-ISSUE #1952] feat: context warning message #12695
Originally created by @notasquid1938 on GitHub (May 3, 2024).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/1952
I decided to test Gradient's 1-million-context Llama 3 model by adjusting the context parameter accordingly. However, as can be seen in this server log, I ran out of memory trying to store all that context:
The issue I had was that Open WebUI didn't show or explain this error. It just tries to generate the response forever before erroring out with a message about connection issues to Ollama. It would be very helpful if the UI could pop up a message indicating that the context length caused an out-of-memory issue, preferably with the amount of memory it was trying to allocate, to make it easy for users to tune how much context their system can handle.
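
A minimal sketch of what the requested behaviour could look like, assuming the backend can see Ollama's raw error text; the substring patterns, the message wording, and the friendly_ollama_error function name are illustrative, not actual Open WebUI code:

```python
import re

# Substring patterns that commonly appear in out-of-memory errors from
# Ollama / llama.cpp; the exact set is an assumption, not taken from Open WebUI.
OOM_PATTERNS = [
    r"out of memory",
    r"failed to allocate",
    r"insufficient memory",
]

def friendly_ollama_error(raw_error: str) -> str:
    """Map a raw Ollama error string to a user-facing message."""
    for pattern in OOM_PATTERNS:
        if re.search(pattern, raw_error, re.IGNORECASE):
            # Pull the attempted allocation size out of the message when
            # present, e.g. "failed to allocate 495.0 GiB".
            size = re.search(r"\d+(?:\.\d+)?\s*(?:GiB|MiB)", raw_error)
            hint = f" (tried to allocate {size.group(0)})" if size else ""
            return (
                "The model ran out of memory while reserving the context"
                f"{hint}. Try lowering the Context Length (num_ctx) setting."
            )
    return f"Ollama error: {raw_error}"

# Example: the kind of line that ends up in the server log.
print(friendly_ollama_error(
    "llama runner: failed to allocate 495.0 GiB for KV cache: out of memory"
))
```
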
@edwardochoaphd commented on GitHub (May 9, 2024):
Could there be a bug with Settings > Advanced Parameters > Context Length?
I'm also getting errors, possibly from running out of GPU memory. In both of these examples, while testing my own LLM pipeline, an extra zero appears to have been added by OpenWebUI...?
Could an extra zero be added to the user's setting when it is sent from OpenWebUI? (i.e., not user error, but possibly a bug in OpenWebUI?)
Note: the user above wants to test with 1M context, but the extra zero changes their setting to 10M?...
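
One way to narrow down where the extra zero comes from is to bypass the UI and send num_ctx straight to Ollama's API, then compare the context size reported in the Ollama server log against what appears when the same value is set through the UI. The /api/generate endpoint and the options.num_ctx field are part of Ollama's public API; the model name is a placeholder:

```python
import requests

# Ollama's generate endpoint accepts per-request options, including num_ctx.
OLLAMA_URL = "http://localhost:11434/api/generate"

payload = {
    "model": "llama3",                   # placeholder; use the model under test
    "prompt": "Hello",
    "stream": False,
    "options": {"num_ctx": 1_000_000},   # the value you actually intend to set
}

resp = requests.post(OLLAMA_URL, json=payload, timeout=600)
resp.raise_for_status()
print(resp.json().get("response", ""))

# If the Ollama server log now shows n_ctx=1000000 rather than 10000000,
# the inflation would have to be happening upstream, in the UI's request.
```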

@justinh-rahb commented on GitHub (May 9, 2024):
I am in favour of having an environment variable to disable the ability for users to change the num_ctx parameter from default.
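
A minimal sketch of how such a lockdown could work, assuming a hypothetical ALLOW_USER_NUM_CTX environment variable; no such setting exists in this thread, so the name and behaviour are illustrative only:

```python
import os

# Hypothetical switch: nothing with this name exists in Open WebUI; it only
# illustrates the proposal above.
ALLOW_USER_NUM_CTX = os.environ.get("ALLOW_USER_NUM_CTX", "true").lower() == "true"

def sanitize_options(user_options: dict) -> dict:
    """Drop a user-supplied num_ctx so the model's default is used instead."""
    if not ALLOW_USER_NUM_CTX:
        return {k: v for k, v in user_options.items() if k != "num_ctx"}
    return user_options

# With ALLOW_USER_NUM_CTX=false, the oversized num_ctx never reaches Ollama.
print(sanitize_options({"num_ctx": 10_000_000, "temperature": 0.7}))
```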

@Cdddo commented on GitHub (Sep 30, 2024):
The UI should definitely show how much context is available. Maybe as a percentage, if not in tokens.
@justinh-rahb commented on GitHub (Sep 30, 2024):
Not practical to just add, or it would have been done already. Every model uses a different tokenizer, and it's not always possible to reliably determine the tokenizer from the model name alone.
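
For the Ollama backend specifically, the non-streamed response reports prompt_eval_count and eval_count as token counts, so a rough usage percentage could be computed without a client-side tokenizer; whether that generalizes to other backends is exactly the problem raised above. A sketch, with placeholder model name and display format:

```python
import requests

NUM_CTX = 8192  # the context window requested for this conversation

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3",  # placeholder
        "prompt": "Summarize the plot of Hamlet.",
        "stream": False,
        "options": {"num_ctx": NUM_CTX},
    },
    timeout=600,
).json()

# prompt_eval_count / eval_count are token counts reported by Ollama itself,
# so no tokenizer lookup is needed for this backend.
used = resp.get("prompt_eval_count", 0) + resp.get("eval_count", 0)
print(f"Context used: {used}/{NUM_CTX} tokens ({100 * used / NUM_CTX:.1f}%)")
```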