[PR #22247] [CLOSED] fix: video upload flow for multimodal vLLM chat #42201

Closed
opened 2026-04-25 14:11:30 -05:00 by GiteaMirror · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/open-webui/open-webui/pull/22247
Author: @shihanqu
Created: 3/4/2026
Status: Closed

Base: mainHead: fix/video-upload-vllm-multimodal


📝 Commits (1)

  • f0a1ad8 Fix multimodal video uploads for vLLM chat flow

📊 Changes

2 files changed (+54 additions, -14 deletions)

View changed files

📝 backend/open_webui/routers/files.py (+12 -3)
📝 backend/open_webui/utils/middleware.py (+42 -11)

📄 Description

Pull Request Checklist

  • Target branch: Verify that the pull request targets the dev branch. PRs targeting main will be immediately closed.
  • Description: Provide a concise description of the changes made in this pull request down below.
  • Changelog: Ensure a changelog entry following the format of Keep a Changelog is added at the bottom of the PR description.
  • Documentation: Add docs in Open WebUI Docs Repository. Document user-facing behavior, environment variables, public APIs/interfaces, or deployment steps.
  • Dependencies: Are there any new or upgraded dependencies? If so, explain why, update the changelog/docs, and include any compatibility notes. Actually run the code/function that uses updated library to ensure it doesn't crash.
  • Testing: Perform manual tests to verify the implemented fix/feature works as intended AND does not break any other functionality. Include reproducible steps to demonstrate the issue before the fix. Test edge cases (URL encoding, HTML entities, types). Take this as an opportunity to make screenshots of the feature/fix and include them in the PR description.
  • Agentic AI Code: Confirm this Pull Request has gone through additional manual review AND manual testing.
  • Code review: Have you performed a self-review of your code, addressing any coding standard issues and ensuring adherence to the project's coding standards?
  • Design & Architecture: Prefer smart defaults over adding new settings; use local state for ephemeral UI logic. Open a Discussion for major architectural or UX changes.
  • Git Hygiene: Keep PRs atomic (one logical change). Clean up commits and rebase on dev to ensure no unrelated commits (e.g. from main) are included. Push updates to the existing PR branch instead of closing and reopening.
  • Title Prefix: PR title uses the fix: prefix.

Changelog Entry

Description

  • Fixes OpenAI-compatible multimodal video upload flow in chat by ensuring uploaded video/* files are injected into the outgoing message payload as video_url parts and media URLs are converted to base64 for both images and videos when needed.
  • Removes misleading upload-processing failure for video/mp4 by treating video uploads as completed for multimodal chat usage rather than forcing retrieval/text extraction processing.

Added

  • Support in process_chat_payload for injecting uploaded video files as {"type":"video_url", "video_url":{"url":...}} content parts.
  • Support in media URL conversion for both image_url and video_url items.

Changed

  • Renamed convert_url_images_to_base64 to convert_url_media_to_base64 and generalized handling from image-only to image+video.

Deprecated

  • None.

Removed

  • None.

Fixed

  • Fixed upload-time warning/error path where valid video/mp4 chat uploads were marked as unsupported for processing.
  • Fixed missing propagation of uploaded videos into OpenAI-compatible multimodal request content.

Security

  • No security behavior changes.

Breaking Changes

  • BREAKING CHANGE: None.

Additional Information

  • No new dependencies introduced.
  • Manual validation performed in a real Docker deployment with vLLM backend:
    1. Upload /home/shihan/Downloads/N1cdUjctpG8.mp4 in Open WebUI chat.
    2. Verify no upload failure toast for video/mp4 processing.
    3. Send prompt to vLLM-backed model.
    4. Confirm model returns accurate video interpretation.
  • "No sources found" may still appear for video-only turns because no RAG citations are attached; this is expected and unchanged by this PR.

Screenshots or Videos

  • Verified manually in local environment; screenshots/video evidence can be provided if maintainers request artifacts in-thread.

Contributor License Agreement

By submitting this pull request, I confirm that I have read and fully agree to the Contributor License Agreement (CLA), and I am providing my contributions under its terms.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

## 📋 Pull Request Information **Original PR:** https://github.com/open-webui/open-webui/pull/22247 **Author:** [@shihanqu](https://github.com/shihanqu) **Created:** 3/4/2026 **Status:** ❌ Closed **Base:** `main` ← **Head:** `fix/video-upload-vllm-multimodal` --- ### 📝 Commits (1) - [`f0a1ad8`](https://github.com/open-webui/open-webui/commit/f0a1ad864dc263705715d23cbd14b770744ec1e0) Fix multimodal video uploads for vLLM chat flow ### 📊 Changes **2 files changed** (+54 additions, -14 deletions) <details> <summary>View changed files</summary> 📝 `backend/open_webui/routers/files.py` (+12 -3) 📝 `backend/open_webui/utils/middleware.py` (+42 -11) </details> ### 📄 Description # Pull Request Checklist - [x] **Target branch:** Verify that the pull request targets the `dev` branch. **PRs targeting `main` will be immediately closed.** - [x] **Description:** Provide a concise description of the changes made in this pull request down below. - [x] **Changelog:** Ensure a changelog entry following the format of [Keep a Changelog](https://keepachangelog.com/) is added at the bottom of the PR description. - [ ] **Documentation:** Add docs in [Open WebUI Docs Repository](https://github.com/open-webui/docs). Document user-facing behavior, environment variables, public APIs/interfaces, or deployment steps. - [x] **Dependencies:** Are there any new or upgraded dependencies? If so, explain why, update the changelog/docs, and include any compatibility notes. Actually run the code/function that uses updated library to ensure it doesn't crash. - [x] **Testing:** Perform manual tests to **verify the implemented fix/feature works as intended AND does not break any other functionality**. Include reproducible steps to demonstrate the issue before the fix. Test edge cases (URL encoding, HTML entities, types). Take this as an opportunity to **make screenshots of the feature/fix and include them in the PR description**. - [x] **Agentic AI Code:** Confirm this Pull Request has gone through additional manual review AND manual testing. - [x] **Code review:** Have you performed a self-review of your code, addressing any coding standard issues and ensuring adherence to the project's coding standards? - [x] **Design & Architecture:** Prefer smart defaults over adding new settings; use local state for ephemeral UI logic. Open a Discussion for major architectural or UX changes. - [x] **Git Hygiene:** Keep PRs atomic (one logical change). Clean up commits and rebase on `dev` to ensure no unrelated commits (e.g. from `main`) are included. Push updates to the existing PR branch instead of closing and reopening. - [x] **Title Prefix:** PR title uses the `fix:` prefix. # Changelog Entry ### Description - Fixes OpenAI-compatible multimodal video upload flow in chat by ensuring uploaded `video/*` files are injected into the outgoing message payload as `video_url` parts and media URLs are converted to base64 for both images and videos when needed. - Removes misleading upload-processing failure for `video/mp4` by treating video uploads as completed for multimodal chat usage rather than forcing retrieval/text extraction processing. ### Added - Support in `process_chat_payload` for injecting uploaded video files as `{"type":"video_url", "video_url":{"url":...}}` content parts. - Support in media URL conversion for both `image_url` and `video_url` items. ### Changed - Renamed `convert_url_images_to_base64` to `convert_url_media_to_base64` and generalized handling from image-only to image+video. ### Deprecated - None. ### Removed - None. ### Fixed - Fixed upload-time warning/error path where valid `video/mp4` chat uploads were marked as unsupported for processing. - Fixed missing propagation of uploaded videos into OpenAI-compatible multimodal request content. ### Security - No security behavior changes. ### Breaking Changes - **BREAKING CHANGE**: None. --- ### Additional Information - No new dependencies introduced. - Manual validation performed in a real Docker deployment with vLLM backend: 1. Upload `/home/shihan/Downloads/N1cdUjctpG8.mp4` in Open WebUI chat. 2. Verify no upload failure toast for `video/mp4` processing. 3. Send prompt to vLLM-backed model. 4. Confirm model returns accurate video interpretation. - `"No sources found"` may still appear for video-only turns because no RAG citations are attached; this is expected and unchanged by this PR. ### Screenshots or Videos - Verified manually in local environment; screenshots/video evidence can be provided if maintainers request artifacts in-thread. ### Contributor License Agreement By submitting this pull request, I confirm that I have read and fully agree to the [Contributor License Agreement (CLA)](https://github.com/open-webui/open-webui/blob/main/CONTRIBUTOR_LICENSE_AGREEMENT), and I am providing my contributions under its terms. --- <sub>🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.</sub>
GiteaMirror added the pull-request label 2026-04-25 14:11:30 -05:00
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: github-starred/open-webui#42201