[GH-ISSUE #17217] feat: support video in chat messages #18210
Originally created by @wei0623kb on GitHub (Sep 5, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/17217
Check Existing Issues
Installation Method
Pip Install
Open WebUI Version
v0.6.26
Ollama Version (if applicable)
Not applicable — we use vLLM, version 0.10.1.1
Operating System
Ubuntu 22.04
Browser (if applicable)
Chrome
Confirmation
Expected Behavior
The model should be able to analyze the visual content in the video and generate descriptive responses relevant to the actual content of the video.
Actual Behavior
The model's response does not match the video content at all, presenting random and meaningless responses without demonstrating an understanding of the video content.
Steps to Reproduce
pip install vllm==0.10.1.1 flash-attn transformers sentencepiece
pip install open-webui
nohup python -m vllm.entrypoints.openai.api_server \
    --model /media/vlm_model/Qwen2.5-VL-72B-Instruct \
    --served-model-name Qwen2.5-VL-72B \
    --tensor-parallel-size 8 \
    --dtype bfloat16 \
    --host 0.0.0.0 \
    --port 8888 \
    --max-model-len 128000 \
    --swap-space 16 \
    --enable-auto-tool-choice \
    --tool-call-parser hermes &
export OPENAI_API_BASE_URL="http://localhost:8888/v1"
export DEFAULT_MODELS="Qwen2.5-VL-72B"
source ~/.bashrc
open-webui serve
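To isolate whether the failure is in Open WebUI or in the model/server, it can help to query the vLLM endpoint directly, bypassing Open WebUI entirely. Below is a minimal sketch of such a request, assuming the running vLLM build accepts a "video_url" content part for Qwen2.5-VL (support for this part type varies by vLLM version); the video path and prompt are placeholders, not real assets:

```python
import json

def build_video_chat_request(model, video_url, prompt):
    """Build an OpenAI-style chat.completions payload with a video part.

    The "video_url" part type is an assumption about the backend's
    multimodal support; it is not accepted by every server/version.
    """
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "video_url", "video_url": {"url": video_url}},
                    {"type": "text", "text": prompt},
                ],
            }
        ],
    }

payload = build_video_chat_request(
    "Qwen2.5-VL-72B",
    "file:///tmp/example.mp4",  # placeholder path
    "Describe what happens in this video.",
)
print(json.dumps(payload, indent=2))

# To POST this against the server started above (uncomment when running):
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:8888/v1/chat/completions",
#     data=json.dumps(payload).encode(),
#     headers={"Content-Type": "application/json"},
# )
# print(urllib.request.urlopen(req).read().decode())
```

If a direct request like this produces a sensible description while the same video through Open WebUI does not, the problem is in what Open WebUI forwards, not in the model.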
Logs & Screenshots
https://github.com/user-attachments/assets/25b0ceb6-fc6d-4710-ab08-08f8e7c4522f
er-attachments/assets/173190c1-41de-4a4a-a8ae-54e35cd864c0
Additional Information
I have deployed Qwen2.5-VL-72B on vLLM behind Open WebUI, aiming to support understanding and analysis of uploaded video files. In actual use, however, when a video file is uploaded and the model is asked to interpret it, the model fails to understand the video: its responses are unrelated to the actual content and appear randomized.
@tjbck commented on GitHub (Sep 5, 2025):
The video gets uploaded to Open WebUI and processed there; it does not get sent to the inference engine. This should be considered a feature request.
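A hypothetical sketch of what forwarding the upload could look like: instead of processing the video inside Open WebUI, embed the raw file as a base64 data URL inside a "video_url" content part for the backend. The part type, the data-URL form, and the hardcoded `video/mp4` media type are all assumptions for illustration; backend support varies:

```python
import base64
from pathlib import Path

def video_file_to_content_part(path):
    """Encode a local video file as an OpenAI-style content part.

    Assumes an mp4 file and a backend that accepts "video_url" parts
    with data URLs; both are illustrative assumptions, not confirmed
    Open WebUI or vLLM behavior.
    """
    data = base64.b64encode(Path(path).read_bytes()).decode()
    return {
        "type": "video_url",
        "video_url": {"url": f"data:video/mp4;base64,{data}"},
    }
```

Such a part would then be appended to the user message's `content` list alongside the text prompt, mirroring how image uploads are typically forwarded.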
@AlvinNorin commented on GitHub (Sep 14, 2025):
I second this very hard
@warshanks commented on GitHub (Sep 21, 2025):
Direct video support would be amazing