Mirror of https://github.com/open-webui/open-webui.git (synced 2026-05-06 10:58:17 -05:00)
[GH-ISSUE #8541] OpenWebUI-Ollama does not fully utilize NVIDIA GPU when context length or parallel sessions increase #15162
Originally created by @rpaGuyai on GitHub (Jan 14, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/8541
Bug Report
Important Notes
Before submitting a bug report: Please check the Issues or Discussions section to see if a similar issue or feature request has already been posted. It's likely we're already tracking it! If you’re unsure, start a discussion post first. This will help us efficiently focus on improving the project.
Collaborate respectfully: We value a constructive attitude, so please be mindful of your communication. If negativity is part of your approach, our capacity to engage may be limited. We’re here to help if you’re open to learning and communicating positively. Remember, Open WebUI is a volunteer-driven project managed by a single maintainer and supported by contributors who also have full-time jobs. We appreciate your time and ask that you respect ours.
Contributing: If you encounter an issue, we highly encourage you to submit a pull request or fork the project. We actively work to prevent contributor burnout to maintain the quality and continuity of Open WebUI.
Bug reproducibility: If a bug cannot be reproduced with a `:main` or `:dev` Docker setup, or a pip install with Python 3.11, it may require additional help from the community. In such cases, we will move it to the "issues" Discussions section due to our limited resources. We encourage the community to assist with these issues. Remember, it’s not that the issue doesn’t exist; we need your help!
Note: Please remove the notes above when submitting your post. Thank you for your understanding and support!
Installation Method
[Describe the method you used to install the project, e.g., git clone, Docker, pip, etc.]
Environment
Open WebUI Version: [e.g., v0.3.11]
Ollama (if applicable): [e.g., v0.2.0, v0.1.32-rc1]
Operating System: [e.g., Windows 10, macOS Big Sur, Ubuntu 20.04]
Browser (if applicable): [e.g., Chrome 100.0, Firefox 98.0]
Confirmation:
Expected Behavior:
[Describe what you expected to happen.]
Actual Behavior:
[Describe what actually happened.]
Description
Bug Summary:
I am hosting Open WebUI on my server (specs: AWS g4dn.12xlarge, 192 GB RAM, 4 × NVIDIA Tesla T4 GPUs, 16 GB each, 64 GB GPU memory in total).
Issues and scenarios:
I have found a sweet spot for optimized results: when the context length is set to 11000 and Environment="OLLAMA_NUM_PARALLEL=10" is set in the ollama.service file, it works well, utilizing all 4 GPUs with minimal CPU load.
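For reference, the parallel-session setting above is typically applied as a systemd drop-in rather than by editing ollama.service directly. A minimal sketch, assuming a systemd-managed Ollama install; the value 10 is the reporter's sweet spot, and `OLLAMA_SCHED_SPREAD` is an optional extra setting (assumption: supported by your Ollama version) that spreads the model across all GPUs instead of packing it into as few as possible:

```ini
# /etc/systemd/system/ollama.service.d/override.conf
# Create with: sudo systemctl edit ollama
[Service]
Environment="OLLAMA_NUM_PARALLEL=10"
# Optional (assumption): force spreading layers across all 4 GPUs.
Environment="OLLAMA_SCHED_SPREAD=1"
```

After editing, apply with `sudo systemctl daemon-reload && sudo systemctl restart ollama`.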
However, if I increase either the context length to, say, 15000 or num parallel to, say, 15, the speed drops drastically and the load is shared roughly 50/50 between CPU and GPU. The GPU is not fully utilized, causing slow responses with just 5-6 concurrent sessions.
If I further increase either the context length to 20K or num parallel to 20, then in such cases and beyond, Ollama stops using the GPU and the load shifts entirely to the CPU, which kills the speed completely.
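To confirm whether the model has been partially or fully offloaded to CPU at a given setting, the GPU/CPU split can be observed directly (a diagnostic sketch; column layout may vary by Ollama version):

```shell
# Show loaded models and how much of each is resident on GPU vs CPU.
# The PROCESSOR column reads e.g. "100% GPU" or "52%/48% CPU/GPU"
# when layers have spilled into system RAM.
ollama ps

# Watch per-GPU memory and utilization every 2 seconds while
# concurrent sessions are running.
nvidia-smi --query-gpu=index,memory.used,memory.total,utilization.gpu \
           --format=csv -l 2
```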
Need help from experts, please: is this due to some configuration in Ollama or Open WebUI, or is it because each T4 is only 16 GB? Do we need the entire GPU memory on a single GPU to fully utilize it, given that in my g4dn.12xlarge the 64 GB (16 × 4) is split across 4 GPUs?
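One plausible explanation for the behavior above is that KV-cache memory grows linearly with both context length and the number of parallel slots, so the two settings multiply. The sketch below is a back-of-envelope estimate for a hypothetical Llama-3-8B-class model (the actual model is not stated in the report); all architecture numbers (layers, KV heads, head dimension, fp16 cache) are assumptions, not measurements:

```python
# Rough KV-cache VRAM estimate for an assumed Llama-3-8B-style model:
# 32 layers, 8 KV heads, head dim 128, fp16 (2-byte) cache entries.
def kv_cache_bytes(num_ctx: int, num_parallel: int,
                   n_layers: int = 32, n_kv_heads: int = 8,
                   head_dim: int = 128, bytes_per_elem: int = 2) -> int:
    """Per token the cache stores K and V (factor 2) for every layer and
    KV head; total batched context is num_ctx * num_parallel tokens."""
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return per_token * num_ctx * num_parallel

gib = 1024 ** 3
for ctx, par in [(11_000, 10), (15_000, 15), (20_000, 20)]:
    print(f"ctx={ctx:>6} parallel={par:>2} -> "
          f"KV cache ~{kv_cache_bytes(ctx, par) / gib:.1f} GiB")
```

Under these assumptions the reported sweet spot (11000 × 10) needs roughly 13 GiB of KV cache, while 20000 × 20 needs roughly 49 GiB before model weights and runtime overhead are counted, which approaches the 64 GB total split across 16 GB cards and could plausibly push Ollama into partial or full CPU offload.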
Reproduction Details
Steps to Reproduce:
[Outline the steps to reproduce the bug. Be as detailed as possible.]
Logs and Screenshots
Browser Console Logs:
[Include relevant browser console logs, if applicable]
Docker Container Logs:
[Include relevant Docker container logs, if applicable]
Screenshots/Screen Recordings (if applicable):
[Attach any relevant screenshots to help illustrate the issue]
Additional Information
[Include any additional details that may help in understanding and reproducing the issue. This could include specific configurations, error messages, or anything else relevant to the bug.]
Note
If the bug report is incomplete or does not follow the provided instructions, it may not be addressed. Please ensure that you have followed the steps outlined in the README.md and troubleshooting.md documents, and provide all necessary information for us to reproduce and address the issue. Thank you!