[GH-ISSUE #23418] feat: Performance Bottleneck - 3-Minute Latency with 400K+ Token Context in Roleplay Scenarios #35506
Originally created by @a86582751 on GitHub (Apr 5, 2026).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/23418
Performance Bottleneck: 3-Minute Latency with 400K+ Token Context in Roleplay Scenarios
Bug Description
When using Open WebUI for long-form roleplay conversations (400K+ tokens, ~36MB chat data), the response latency becomes unbearable (2-3 minutes) despite the model itself being responsive.
Key Observation: The delay occurs before the request reaches the LLM backend.
Environment
host.docker.internal:8317

Steps to Reproduce
Root Cause Analysis
Current Architecture (Problematic)
Processing Flow per Message:
1. `SELECT chat FROM chat WHERE id = ?` → read 36MB (41ms)
2. `json.loads()` → parse into a Python dict (5-10s)
3. `json.dumps()` → serialize back to a string (5-10s)
4. `UPDATE chat SET chat = ?` → write 37MB (10s)

Total: 2-3 minutes, with 80%+ of the time spent on JSON serialization.
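The JSON round-trip cost (steps 2 and 3) can be checked in isolation. Below is a minimal benchmark sketch, assuming a synthetic payload of ~4,000 messages at ~9KB each (≈36MB of JSON, matching the size above); the message shape is an assumption, since only the total payload size matters here:

```python
import json
import time

# Synthetic chat roughly matching the reported size: ~4,000 messages of
# ~9KB each is about 36MB of JSON. The message shape is an assumption;
# only the total payload size matters for this measurement.
chat = {
    "messages": [
        {
            "id": str(i),
            "role": "user" if i % 2 == 0 else "assistant",
            "content": "x" * 9000,
        }
        for i in range(4000)
    ]
}

t0 = time.perf_counter()
blob = json.dumps(chat)   # full serialization, as done on every message save
t1 = time.perf_counter()
json.loads(blob)          # full parse, as done on every message read
t2 = time.perf_counter()

print(f"payload: {len(blob) / 1e6:.1f} MB")
print(f"dumps: {t1 - t0:.3f}s  loads: {t2 - t1:.3f}s")
```

Running this gives a per-machine baseline for the serialization steps that can be compared against the timings above.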
Comparison with Efficient Architectures
ChatGPT/Claude approach:
Cherry Studio approach:
Benchmark Data
Proposed Solutions
Option 1: Message Table Normalization (Recommended)
Split the monolithic JSON into normalized tables:
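A minimal sketch of one possible layout, using SQLAlchemy; the `Message` model, table name, and columns below are illustrative assumptions, not the actual Open WebUI schema:

```python
from sqlalchemy import BigInteger, Column, ForeignKey, Index, String, Text
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Message(Base):
    """One row per chat message, instead of one monolithic JSON blob per chat."""
    __tablename__ = "message"

    id = Column(String, primary_key=True)
    chat_id = Column(String, ForeignKey("chat.id"), nullable=False)
    parent_id = Column(String, nullable=True)  # supports branched conversations
    role = Column(String, nullable=False)      # "user" / "assistant" / ...
    content = Column(Text, nullable=False)
    created_at = Column(BigInteger, nullable=False)

# Composite index so "latest N messages of a chat" is a cheap range scan
# instead of a 36MB blob read.
Index("idx_message_chat_created", Message.chat_id, Message.created_at)
```

Appending a message then becomes a single-row `INSERT` rather than a read-modify-write of the entire chat.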
Benefits:
- Pagination via `LIMIT 50 OFFSET ?`
- Existing `chat` table kept as a fallback

Option 2: Lazy Loading API
Add new endpoint that returns paginated messages:
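A sketch of what such an endpoint could look like, FastAPI-style; the route path, query parameters, and `get_messages_page` helper are hypothetical, not an existing API:

```python
from fastapi import APIRouter, Query
from pydantic import BaseModel

router = APIRouter()

class MessageOut(BaseModel):
    id: str
    role: str
    content: str
    created_at: int

async def get_messages_page(chat_id: str, limit: int, offset: int) -> list[MessageOut]:
    # Hypothetical helper: against the normalized table from Option 1 this
    # would run roughly
    #   SELECT id, role, content, created_at FROM message
    #   WHERE chat_id = ? ORDER BY created_at DESC LIMIT ? OFFSET ?
    raise NotImplementedError

@router.get("/api/v1/chats/{chat_id}/messages", response_model=list[MessageOut])
async def get_chat_messages(
    chat_id: str,
    limit: int = Query(50, le=200),  # page size, capped
    offset: int = Query(0, ge=0),    # 0 = newest page
):
    return await get_messages_page(chat_id, limit=limit, offset=offset)
```

The client would then fetch older pages on scroll instead of receiving the entire 36MB history up front.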
Option 3: Context Compression (Related to PR #22681)
Related Issues
Additional Context
This issue particularly affects:
The current architecture forces users to choose between:
Workarounds Currently Used
Request
Consider prioritizing storage layer refactoring for v2.0 to support:
This would make Open WebUI truly competitive for professional/long-form use cases.
Labels:
performance, enhancement, database, v2.0

@pr-validator-bot commented on GitHub (Apr 5, 2026):
⚠️ Missing Issue Title Prefix
@a86582751, your issue title is missing a prefix (e.g., `bug:`, `feat:`, `docs:`). Please update your issue title to include one of the following prefixes:
Example:
`bug: Login fails when using special characters in password`

@Classic298 commented on GitHub (Apr 8, 2026):
Did you time / measure this?
I have much longer and larger chats and I don't even come close to these values.
A 3-minute latency for the chat to be sent to the provider is definitely not an Open WebUI fault; parsing a 36MB JSON doesn't take 3 minutes.
The related issues you referenced are not related.
If you load a long chat initially, only 20 messages are fetched to begin with. So why are you proposing to add what is already in place?
Slop