mirror of
https://github.com/open-webui/open-webui.git
synced 2026-05-06 02:48:13 -05:00
[GH-ISSUE #19900] issue: ExternalReranker blocks event loop causing application freeze during RAG queries #34561
Originally created by @joshrenshaw12 on GitHub (Dec 12, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/19900
Check Existing Issues
Installation Method
Git Clone
Open WebUI Version
v0.6.41
Ollama Version (if applicable)
No response
Operating System
macOS Tahoe
Browser (if applicable)
Chrome
Confirmation
Expected Behavior
When using an external reranker (via RAG_RERANKING_MODEL_TRUST_REMOTE_CODE=true with an external reranking endpoint), the application should remain responsive while waiting for reranking API responses. Other users should be able to continue using the application normally.
Actual Behavior
The entire Open WebUI application freezes and becomes unresponsive while waiting for the external reranker HTTP response. All users experience a complete outage until the reranking request completes. In testing, this caused outages of 9-36+ seconds per reranking call.
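The freeze described above is what a blocking call inside an async handler typically looks like. The following is a minimal sketch, not Open WebUI code: `time.sleep` stands in for the slow reranker HTTP request, and `slow_rerank`, `heartbeat`, and `measure` are hypothetical names chosen for illustration. It counts how often a background task gets to run while the "rerank" call is in flight, first with the call made directly on the event loop and then offloaded with `asyncio.to_thread()` (Python 3.9+).

```python
import asyncio
import time


def slow_rerank(delay: float = 0.3) -> str:
    """Stand-in for a synchronous HTTP call; blocks the calling thread."""
    time.sleep(delay)
    return "scores"


async def heartbeat(ticks: list, interval: float = 0.05) -> None:
    """Simulates other requests: records a tick whenever the loop is free."""
    while True:
        ticks.append(time.monotonic())
        await asyncio.sleep(interval)


async def measure(offload: bool) -> int:
    """Return how many heartbeat ticks ran while the rerank call was in flight."""
    ticks: list = []
    task = asyncio.create_task(heartbeat(ticks))
    await asyncio.sleep(0)  # yield once so the heartbeat records its first tick
    before = len(ticks)
    if offload:
        # Runs in a worker thread; the event loop keeps serving other tasks.
        await asyncio.to_thread(slow_rerank)
    else:
        # Runs on the event loop thread; nothing else can run until it returns.
        slow_rerank()
    during = len(ticks) - before
    task.cancel()
    return during


blocked_ticks = asyncio.run(measure(offload=False))
offloaded_ticks = asyncio.run(measure(offload=True))
print("ticks while blocked:", blocked_ticks)
print("ticks while offloaded:", offloaded_ticks)
```

In the blocked run the heartbeat records no ticks during the call, while in the offloaded run it keeps ticking, which mirrors the difference between the frozen and responsive behavior reported in this issue.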
Steps to Reproduce
1. Set RAG_RERANKING_MODEL to use an external model
2. Set RAG_RERANKING_MODEL_TRUST_REMOTE_CODE=true
3. Set RAG_EXTERNAL_RERANKER_URL to point to a reranking endpoint (e.g., a Bedrock Cohere rerank endpoint via API gateway)
Logs & Screenshots
BEFORE FIX - Application freezes during reranking
Note the timestamps showing a ~9 second gap between the reranker call (01:13:23.547) and the next log (01:13:32.582), and a ~23 second gap between subsequent reranker calls. During these periods, the application was completely unresponsive:
AFTER FIX - Application remains responsive during reranking
Note how HTTP requests continue to be processed while reranking occurs in the background:
Additional Information
Root Cause Analysis
The ExternalReranker.predict() method in open_webui/retrieval/models/external.py uses synchronous requests.post(), which is called directly from an async FastAPI endpoint without being offloaded to a thread pool. This blocks the Python event loop.
Proposed Fix
Two changes are required:
1. Wrap the reranking call with asyncio.to_thread() in retrieval/utils.py:
2. Add a timeout to requests.post() in retrieval/models/external.py:
This approach is consistent with existing patterns in the codebase (e.g., asyncio.to_thread() is used for embedding operations on line 790 of retrieval/utils.py).
Example .env
Ollama URL for the backend to connect
The path '/ollama' will be redirected to the specified backend URL
OLLAMA_BASE_URL='http://localhost:11434'
OpenAI Configuration
OPENAI_API_BASE_URL='http://host.docker.internal:4000/v1'
OPENAI_API_KEY='xxx'
AUTOMATIC1111_BASE_URL='http://localhost:7860'
Application Configuration
ENV='dev'
ENABLE_PERSISTENT_CONFIG=false
DEFAULT_MODELS='gpt-5'
TASK_MODEL_EXTERNAL='gpt-5-nano'
ENABLE_COMMUNITY_SHARING=false
ENABLE_API_KEY=true
ENABLE_OLLAMA_API=false
ENABLE_DIRECT_CONNECTIONS=false
WEBUI_URL='http://localhost:3000'
GLOBAL_LOG_LEVEL='INFO'
RESET_CONFIG_ON_START=true
RAG Configuration
PDF_EXTRACT_IMAGES=true
ENABLE_RAG_HYBRID_SEARCH=true
RAG_TOP_K=5
RAG_EMBEDDING_ENGINE='openai'
RAG_EMBEDDING_MODEL='text-embedding-3-small'
RAG_OPENAI_API_BASE_URL='http://host.docker.internal:4000/v1'
RAG_OPENAI_API_KEY='xxx'
RAG_FILE_MAX_SIZE=9
RAG_EMBEDDING_BATCH_SIZE=2048
VECTOR_DB='pgvector'
ENABLE_AUTOCOMPLETE_GENERATION=true
RAG_RERANKING_ENGINE='external'
RAG_RERANKING_MODEL='bedrock-cohere-rerank-english-v3.0'
RAG_EXTERNAL_RERANKER_URL='http://custom-text-litellm:4000/v1/rerank'
RAG_EXTERNAL_RERANKER_API_KEY='xxx'
Web Search Configuration
ENABLE_WEB_SEARCH=true
ENABLE_SEARCH_QUERY_GENERATION=true
WEB_SEARCH_ENGINE='searxng'
WEB_SEARCH_CONCURRENT_REQUESTS=3
WEB_SEARCH_RESULT_COUNT=1
SEARXNG_QUERY_URL='http://searxng:8080/search?q='
Audio Configuration
AUDIO_STT_ENGINE='azure'
AUDIO_STT_AZURE_REGION='xxx'
AUDIO_STT_AZURE_LOCALES='en-US,en-GB'
AUDIO_TTS_ENGINE='azure'
AUDIO_TTS_OPENAI_API_BASE_URL='xxx'
AUDIO_TTS_AZURE_SPEECH_REGION='xxx'
AUDIO_TTS_AZURE_SPEECH_OUTPUT_FORMAT='ogg-48khz-16bit-mono-opus'
AUDIO_TTS_VOICE='en-AU-NatashaNeural'
WHISPER_MODEL_AUTO_UPDATE=true
Database Configuration
DATABASE_URL='xxx'
PGVECTOR_DB_URL='xxx'
Tool Server Connections (JSON configuration)
TOOL_SERVER_CONNECTIONS='[]'
Authentication & OAuth Configuration
ENABLE_LOGIN_FORM=false
ENABLE_OAUTH_SIGNUP=true
DEFAULT_USER_ROLE='user'
JWT_EXPIRES_IN='12h'
OAUTH_MERGE_ACCOUNTS_BY_EMAIL=true
OPENID_PROVIDER_URL='xxx'
MICROSOFT_REDIRECT_URI='http://localhost:3000/oauth/microsoft/callback'
OAUTH_GROUP_CLAIM='groups'
ENABLE_OAUTH_GROUP_CREATION=true
ENABLE_OAUTH_GROUP_MANAGEMENT=true
Session & Cookie Configuration
WEBUI_SESSION_COOKIE_SECURE=true
WEBUI_SESSION_COOKIE_SAME_SITE='lax'
WEBUI_AUTH_COOKIE_SAME_SITE='lax'
OneDrive Integration
ENABLE_ONEDRIVE_INTEGRATION=true
ENABLE_ONEDRIVE_BUSINESS=true
ENABLE_ONEDRIVE_PERSONAL=false
ONEDRIVE_SHAREPOINT_URL='xxx'
Content Extraction
CONTENT_EXTRACTION_ENGINE='tika'
TIKA_SERVER_URL='http://tika:9998'
Image Generation
ENABLE_IMAGE_GENERATION=true
IMAGE_GENERATION_ENGINE='openai'
IMAGES_OPENAI_API_BASE_URL='http://litellm-image-models:4001/v1'
IMAGES_OPENAI_API_KEY='xxx'
IMAGE_GENERATION_MODEL='gpt-image-1'
IMAGE_SIZE='1536x1024'
IMAGE_STEPS=24
Storage Configuration
STORAGE_PROVIDER='s3'
S3_REGION_NAME='xxx'
S3_BUCKET_NAME='xxx'
For production, you should only need one host as
fastapi serves the svelte-kit built frontend and backend from the same host and port.
To test with CORS locally, you can set something like
CORS_ALLOW_ORIGIN='http://localhost:5173;http://localhost:8080'
CORS_ALLOW_ORIGIN='*'
For production you should set this to match the proxy configuration (127.0.0.1)
FORWARDED_ALLOW_IPS='*'
DO NOT TRACK
SCARF_NO_ANALYTICS=true
DO_NOT_TRACK=true
ANONYMIZED_TELEMETRY=false
DEV OWUI Secret Key
WEBUI_SECRET_KEY='xxx'
AWS Secrets
AWS_ACCESS_KEY_ID='xxx'
AWS_SECRET_ACCESS_KEY='xxx'
AWS_SESSION_TOKEN='xxx'
Azure Secrets
AZURE_API_KEY='xxx'
AZURE_API_IMAGE_KEY='xxx'
AUDIO_TTS_API_KEY='xxx'
AUDIO_STT_AZURE_API_KEY='xxx'
Entra Secrets
MICROSOFT_CLIENT_ID='xxx'
MICROSOFT_CLIENT_SECRET='xxx'
ONEDRIVE_CLIENT_ID_BUSINESS='xxx'
MICROSOFT_CLIENT_TENANT_ID='xxx'
ONEDRIVE_SHAREPOINT_TENANT_ID='xxx'
Example docker compose
services:
  postgres:
    image: postgres:17
    container_name: postgres
    volumes:
      - postgres-data:/var/lib/postgresql/data
    ports:
      - '5432:5432'
    environment:
      - POSTGRES_DB=${POSTGRES_DB}
      - POSTGRES_USER=${POSTGRES_USER}
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
    restart: unless-stopped
volumes:
  open-webui: {}
  postgres-data: {}
  vectorpg-data: {}
@owui-terminator[bot] commented on GitHub (Dec 12, 2025):
🔍 Similar Issues Found
I found some existing issues that might be related to this one. Please check if any of these are duplicates or contain helpful solutions:
#19861 issue:
by QuitHub • Dec 10, 2025 • bug
#19698 issue: .41 web based search and webpages - RAG - are not fixed
by frenzybiscuit • Dec 02, 2025 • bug
#19877 issue:
by dotmobo • Dec 11, 2025 • bug
#19864 issue:
by Haervwe • Dec 10, 2025 • bug
#15986 issue: Reranker does not release RAM!
by frenzybiscuit • Jul 24, 2025 • bug
#19281 issue: RAG Template applied with "Bypass Embedding and Retrieval" enabled
by lucyknada • Nov 19, 2025 • bug
#14463 issue: regression on v0.6.12 with RAG
by bb-chris • May 29, 2025 • bug
#19563 issue:
by naruto7g • Nov 28, 2025 • bug
#19496 issue: 500 internal server error appears in v0.6.40
by cloudtuotuo • Nov 26, 2025 • bug
#19417 issue: v0.6.37 SQL Error
by AKHYP • Nov 24, 2025 • bug
This comment was generated automatically by a bot. Please react with a 👍 if this comment was helpful, or a 👎 if it was not.
@Classic298 commented on GitHub (Dec 12, 2025):
@silentoplayz can you repro?
@silentoplayz commented on GitHub (Dec 13, 2025):
I'm unable to test this issue out. I've never used an external reranker model in Open WebUI with an external reranking endpoint before.
@adhusch commented on GitHub (Dec 17, 2025):
Do you guys have a new internal policy of end-to-end reproducing everything? :-) I mean, @joshrenshaw12 already did a nice root cause analysis and showed a clear bug in the code (a synchronous call that should under all circumstances be asynchronous) and even gave a fix that looks nice (apart from the hardcoded timeout=60; hardcoding timeouts is risky), all of which is independent of reproducing the overall scenario. It seems that simply fixing this and then testing afterwards whether the end-to-end bug has disappeared too would be faster than any additional testing now.
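One way to address the hardcoded-timeout concern is to read the value from the environment and fall back to a default. This is a minimal sketch under stated assumptions: `RERANKER_TIMEOUT_SECONDS` and `get_reranker_timeout` are hypothetical names for illustration, not the variable or helper used in Open WebUI.

```python
import os

# Fallback matching the timeout=60 mentioned in this thread.
DEFAULT_TIMEOUT_SECONDS = 60.0


def get_reranker_timeout(env=None) -> float:
    """Read a timeout in seconds from a hypothetical env var.

    Falls back to the default when the variable is unset, not a number,
    or not a positive value.
    """
    env = os.environ if env is None else env
    raw = env.get("RERANKER_TIMEOUT_SECONDS", "")
    try:
        value = float(raw)
    except ValueError:
        return DEFAULT_TIMEOUT_SECONDS
    return value if value > 0 else DEFAULT_TIMEOUT_SECONDS
```

The returned value would then be passed as the `timeout` argument of `requests.post()`, so operators with slow reranking endpoints can raise it without a code change.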
@Classic298 commented on GitHub (Dec 17, 2025):
@adhusch We try to filter out real issues from non-real issues.
We don't know if this is a real issue if we didn't reproduce it.
And lately, a lot of those seemingly well written and well-intentioned issue posts are heavily assisted by AI written text and often just based on assumptions about the code base that end up being wrong. So better we reproduce it, and once we did (or someone else reproduced it) then we can dig deeper.
It's basically a form of basic triage.
Most posts with a nice root cause analysis lately unfortunately are based off of wrong information or assumptions.
Nothing against joshrenshaw12, maybe he indeed found the core issue, but neither me nor anyone else had the time yet to dig deeper into it, to verify it.
@adhusch commented on GitHub (Dec 17, 2025):
Hi @Classic298. Well, he showed that an external API call is synchronous, which is nearly always unintended. I think there is no way to get a deeper understanding of that than from this fact in the code; no end-to-end test will reveal more in this case. Overall I understand your approach and it makes sense 👍, but the code is the ultimate truth. Even if his overall case turns out to be non-reproducible, he revealed this pretty clear bug.
@Classic298 commented on GitHub (Dec 20, 2025):
Fixed in dev. A new env var was introduced that you should configure.