[GH-ISSUE #16938] issue: response latency increases by 6s when any file is attached (captions, pasted files, etc.) #33632

Closed
opened 2026-04-25 07:32:00 -05:00 by GiteaMirror · 6 comments
Owner

Originally created by @lucyknada on GitHub (Aug 26, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/16938

Check Existing Issues

  • I have searched the existing issues and discussions.
  • I am using the latest version of Open WebUI.

Installation Method

Docker

Open WebUI Version

v0.6.25

Ollama Version (if applicable)

No response

Operating System

debian 13

Browser (if applicable)

No response

Confirmation

  • I have read and followed all instructions in README.md.
  • I am using the latest version of both Open WebUI and Ollama.
  • I have included the browser console logs.
  • I have included the Docker container logs.
  • I have provided every relevant configuration, setting, and environment variable used in my setup.
  • I have clearly listed every relevant configuration, custom setting, environment variable, and command-line option that influences my setup (such as Docker Compose overrides, .env values, browser settings, authentication configurations, etc).
  • I have documented step-by-step reproduction instructions that are precise, sequential, and leave nothing to interpretation. My steps:
  • Start with the initial platform/version/OS and dependencies used,
  • Specify exact install/launch/configure commands,
  • List URLs visited, user input (incl. example values/emails/passwords if needed),
  • Describe all options and toggles enabled or changed,
  • Include any files or environmental changes,
  • Identify the expected and actual result at each stage,
  • Ensure any reasonably skilled user can follow and hit the same issue.

Expected Behavior

Queries should be near-instant with or without file attachments, similar to when pasting the content directly into the prompt.

Actual Behavior

As soon as any file is attached (a .txt file, a YouTube caption file, etc.), each query takes ~6 seconds, even for simple prompts. Without files, responses are instant.

Steps to Reproduce

  1. Start with a clean install of Open WebUI using Docker.
  2. Connect to a local backend, e.g. tabby.
  3. Create a new chat.
  4. Paste a YouTube transcript (with "paste as file" disabled).
  5. Ask a simple question (response is instant).
  6. Create a new chat.
  7. Upload any file (e.g. plain text or an auto-generated transcript).
  8. Ask the same or any other question; every turn now takes ~6s to respond.
  9. Remove the file / disallow file uploads; the issue disappears.
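For step 1, the clean Docker install can be sketched with the standard command from the Open WebUI README (port mapping and the `main` image tag are the documented defaults; adjust as needed):

```shell
# Clean install of Open WebUI via Docker, per the project's standard instructions
docker run -d -p 3000:8080 \
  -v open-webui:/app/backend/data \
  --name open-webui \
  ghcr.io/open-webui/open-webui:main
```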

Logs & Screenshots

  • No errors visible in the browser console
  • Latency is consistently around 6s with files attached, but the delay does not show up in websocket or network timing

Additional Information

  • Tried disabling citations and enabling full-context mode; no change.
  • Only disabling file uploads fully resolves the issue.
  • Happens consistently with any file type, including very small text files.
GiteaMirror added the bug label 2026-04-25 07:32:00 -05:00
Author
Owner

@tjbck commented on GitHub (Aug 26, 2025):

File processing will take extra time.

Author
Owner

@lucyknada commented on GitHub (Aug 26, 2025):

@tjbck the file has already been uploaded; does it have to be re-processed on every single turn?

Author
Owner

@Classic298 commented on GitHub (Aug 27, 2025):

Yes

Author
Owner

@lucyknada commented on GitHub (Aug 27, 2025):

@Classic298 what's causing this to take 6 seconds though? The 22M embedding model can't take that long on modern hardware, especially on tiny amounts of text. Is it the reranking?

Author
Owner

@Classic298 commented on GitHub (Aug 27, 2025):

It depends on the file.
For me it rarely takes 6 seconds.

  1. Uploading the file
  2. Content extraction (takes longer the larger the file)
  3. Embedding (---)
  4. Insertion into the vector DB

How long content extraction takes depends on the file type and file size.

Author
Owner

@rgaricano commented on GitHub (Aug 27, 2025):

Measured on my end (low & slow hardware) in dev & DEBUG mode. (DEBUG causes all embeds to be logged; JSON files generate thousands of entries, which also slows down the process and makes the logs hard to read because the journal bundles those excessive lines.)

```
### backend, upload file process time:

m4a     3.8 MB  - File upload and processing completed in 23.13s
json    104 KB  - File upload and processing completed in 0.39s (_No Content_; with a 4 MB JSON the time log was lost)
pdf     3.3 MB  - File upload and processing completed in 1.36s
zip   628.4 KB  - File upload and processing completed in 3.23s
```

(Note: the zip file contained repo code; embedding is shown in the logs, but the FileModal shows _No Content_.)
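On the re-processing question raised earlier in the thread: if per-turn latency came from re-embedding the same file content, keying the embedding by a content hash would let repeat turns reuse the stored vector. This is a hypothetical sketch of the idea only, not how Open WebUI is implemented:

```python
from functools import lru_cache
import hashlib

# Hypothetical: cache embeddings by content hash so repeat turns
# skip the (slow) embedding call. The embedding itself is a stand-in.
@lru_cache(maxsize=128)
def embed_cached(content_hash: str) -> tuple:
    # stand-in for a real embedding call, derived from the hash bytes
    return tuple(float(b) for b in bytes.fromhex(content_hash)[:4])

def file_key(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

data = b"uploaded transcript"
v1 = embed_cached(file_key(data))  # first turn: computes
v2 = embed_cached(file_key(data))  # later turns: cache hit
print(embed_cached.cache_info().hits)  # → 1
```

If caching like this eliminated the delay, it would confirm that per-turn re-embedding (rather than upload-time processing) is the bottleneck.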

Reference: github-starred/open-webui#33632