[GH-ISSUE #17194] issue: The LLM fails to recognize large documents correctly on the first attempt. #56868

Closed
opened 2026-05-05 20:11:11 -05:00 by GiteaMirror · 0 comments

Originally created by @Cyp9715 on GitHub (Sep 4, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/17194

Check Existing Issues

  • [x] I have searched the existing issues and discussions.
  • [x] I am using the latest version of Open WebUI.

Installation Method

Docker

Open WebUI Version

v0.6.26

Ollama Version (if applicable)

No response

Operating System

Windows 11

Browser (if applicable)

Firefox 142.0.1

Confirmation

  • [x] I have read and followed all instructions in README.md.
  • [x] I am using the latest version of both Open WebUI and Ollama.
  • [ ] I have included the browser console logs.
  • [x] I have included the Docker container logs.
  • [x] I have provided every relevant configuration, setting, and environment variable used in my setup.
  • [x] I have clearly listed every relevant configuration, custom setting, environment variable, and command-line option that influences my setup (such as Docker Compose overrides, .env values, browser settings, authentication configurations, etc.).
  • [x] I have documented step-by-step reproduction instructions that are precise, sequential, and leave nothing to interpretation. My steps:
    • Start with the initial platform/version/OS and dependencies used,
    • Specify exact install/launch/configure commands,
    • List URLs visited, user input (incl. example values/emails/passwords if needed),
    • Describe all options and toggles enabled or changed,
    • Include any files or environmental changes,
    • Identify the expected and actual result at each stage,
    • Ensure any reasonably skilled user can follow and hit the same issue.

Expected Behavior

LLM models must correctly recognize all documents.

Actual Behavior

LLM models do not correctly recognize all attached documents on the first attempt.

Steps to Reproduce

Please check the video: https://www.youtube.com/watch?v=30V0Gs6F7B4

  1. Install the following on Ubuntu 24.04:
  • vLLM (v0.10.1.1)
    • Main model: GPT-OSS 120B
    • Embedding model: bge-m3
  • OpenWebUI (v0.6.26)
  • Qdrant (v1.15.4)

This happens even when using the built-in vector DB, without Qdrant!

  2. Start the Docker containers using commands like the ones below.
docker run --name vllm-gpt-oss \
           --runtime nvidia --gpus all \
           -v ~/.cache/huggingface:/root/.cache/huggingface \
           -p 8000:8000 \
           --shm-size=2g \
           --entrypoint /bin/bash \
           vllm/vllm-openai:latest \
           -c "pip install --force-reinstall --no-deps nvidia-nccl-cu12==2.27.7 && \
               python3 -m vllm.entrypoints.openai.api_server \
               --model openai/gpt-oss-120b"
docker run --name vllm-bge-m3 \
           --runtime nvidia --gpus all \
           -v ~/.cache/huggingface:/root/.cache/huggingface \
           -p 8001:8000 \
           --shm-size=2g \
           --entrypoint /bin/bash \
           vllm/vllm-openai:latest \
           -c "pip install --force-reinstall --no-deps nvidia-nccl-cu12==2.27.7 && \
               python3 -m vllm.entrypoints.openai.api_server \
               --model BAAI/bge-m3 \
               --task embed"
docker run --name qdrant \
  -p 6333:6333 -p 6334:6334 \
  --restart unless-stopped \
  -v qdrant_storage:/qdrant/storage \
  qdrant/qdrant:latest
  3. Integrate the Docker containers (vLLM) with OpenWebUI (a quick endpoint check is sketched after this list).
  4. Attach 5 documents, including one large document (>5 MB).
  5. Immediately after the loading bar inside the document icon finishes, ask the question: "How many files can be identified?"
  6. The number of files (5) is not correctly recognized on the first question.
  7. Ask the same question again in the same chat session without attaching any files.
  8. The number of files is now correctly recognized.
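
As a quick preliminary to step 3, the snippet below is one way to confirm that both vLLM endpoints are up before pointing OpenWebUI at them (a minimal sketch; it assumes the port mappings from the docker run commands above and that the vLLM servers run without an API key).

# Sanity check: each vLLM OpenAI-compatible server lists its served model
# under /v1/models. Ports follow the docker run commands above
# (8000 = GPT-OSS 120B, 8001 = bge-m3).
import requests

for name, base_url in [("gpt-oss", "http://localhost:8000/v1"),
                       ("bge-m3", "http://localhost:8001/v1")]:
    resp = requests.get(f"{base_url}/models", timeout=10)
    resp.raise_for_status()
    served = [model["id"] for model in resp.json()["data"]]
    print(f"{name}: serving {served}")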

Logs & Screenshots

OpenWebUI Logs (Full version).txt: https://github.com/user-attachments/files/22133845/OpenWebUI.Logs.Debug.txt
OpenWebUI Logs (File content removed).txt: https://github.com/user-attachments/files/22135014/OpenWebUI.Logs.Debug.txt
The log starts at the beginning of the video (https://www.youtube.com/watch?v=30V0Gs6F7B4) and ends at the same time as the video.

Some of the personal information in the logs has been redacted.

Additional Information

I believe this is a problem on the embedding side: if I add a time delay through Pipelines (a sketch of such a filter follows), all files are correctly embedded and delivered without a single failure.
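
For reference, a filter along the following lines reproduces that workaround (a minimal sketch assuming the standard Pipelines filter interface; the valve name delay_seconds and the 5-second default are illustrative, not built-in settings).

# Sketch of a Pipelines filter that delays each request so that
# large-file embeddings can finish before the LLM is called.
import asyncio
from typing import List, Optional

from pydantic import BaseModel


class Pipeline:
    class Valves(BaseModel):
        pipelines: List[str] = ["*"]   # apply the filter to all models
        priority: int = 0              # run order among filters
        delay_seconds: float = 5.0     # illustrative default; tune to file size

    def __init__(self):
        self.type = "filter"
        self.name = "Embedding Delay Filter"
        self.valves = self.Valves()

    async def inlet(self, body: dict, user: Optional[dict] = None) -> dict:
        # Hold the request before it is forwarded to the model, giving the
        # embedding backend time to write large files to the vector DB.
        await asyncio.sleep(self.valves.delay_seconds)
        return body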

The presumed cause of the issue is as follows:

  1. Content extraction begins.
  2. The embedding process starts, but the OpenWebUI Send button is activated before it completes.
  3. The LLM receives embedding information for only some of the files (the smaller ones).
  4. The LLM answers based on the small files' embeddings. (The embedding model is still processing the large files in the backend.)
  5. By the time a second question is asked, the embedding model has already finished embedding the large file (the data is saved to the vector DB).
  6. The same question, asked again without attaching files, correctly recognizes all 5 files (by referencing the information stored in the vector DB!).

This is the process I suspect is happening (or perhaps it is caused by asynchronous indexing in the DB). A toy illustration of this race follows.
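
The toy script below reproduces that sequence outside Open WebUI (all file names, timings, and the in-memory "vector store" are made up for illustration).

# Toy illustration of the suspected race: the first query fires as soon as
# the UI unlocks, before the slow (large-file) embedding job has written to
# the vector store; the second query runs after everything has landed.
import asyncio

vector_store = set()  # stand-in for the vector DB collection


async def embed(filename: str, seconds: float) -> None:
    await asyncio.sleep(seconds)   # embedding time grows with file size
    vector_store.add(filename)     # written to the vector DB on completion


async def main() -> None:
    files = {"small-1.txt": 0.1, "small-2.txt": 0.1, "large.pdf": 2.0}
    jobs = [asyncio.create_task(embed(f, t)) for f, t in files.items()]

    await asyncio.sleep(0.3)  # Send unlocks after only the small files finish
    print("first question sees: ", sorted(vector_store))  # large.pdf missing

    await asyncio.gather(*jobs)  # by the second question, all jobs are done
    print("second question sees:", sorted(vector_store))


asyncio.run(main())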

Also, this issue appears even when the embedding model is not deployed with vLLM and the Sentence Transformers model built into OpenWebUI is used instead.
Finally, the issue occurs not only when uploading multiple files but also when uploading a single large file (>=5 MB).

GiteaMirror added the bug label 2026-05-05 20:11:11 -05:00