mirror of
https://github.com/open-webui/open-webui.git
synced 2026-05-06 10:58:17 -05:00
[GH-ISSUE #18933] issue: Critical SQLAlchemy Session Bug: "Could not refresh instance" #57387
Originally created by @aidenpearce001 on GitHub (Nov 4, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/18933
Check Existing Issues
Installation Method
Git Clone
Open WebUI Version
v0.6.34
Ollama Version (if applicable)
No response
Operating System
Ubuntu 22.04
Browser (if applicable)
No response
Confirmation
Expected Behavior
When creating users via the API endpoint /api/v1/auths/add or creating chats via /api/v1/chats/new, the operations should succeed consistently, with a near-100% success rate under normal load conditions.
Expected:
Actual Behavior
Database write operations fail randomly, with a 30-60% failure rate, returning:
Observed failure rates across multiple test runs:
Error in logs:
This makes OpenWebUI unreliable for:
Steps to Reproduce
Environment Setup
OpenWebUI Configuration:
Database Infrastructure:
Reproduction Steps
Test 1: User Creation (Simplest)
Expected Result: 10 successful user creations
Actual Result: 3-6 successful, 4-7 failures (30-60% success rate)
Test 2: Concurrent User Creation (Stress)
Expected Result: 10 successful users
Actual Result: 1-3 successful, 7-9 failures (10-30% success rate)
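The original test scripts did not survive in this copy of the issue. As a rough sketch of how Tests 1 and 2 could be re-run and tallied, the harness below counts successes and failures for a caller-supplied function. `run_trials` and `create_one` are hypothetical names, not part of Open WebUI; `create_one` is assumed to wrap whatever HTTP call is used against /api/v1/auths/add and to return a truthy value on success.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def run_trials(create_one, count=10, max_workers=10):
    """Call create_one(i) for i in range(count) and tally the outcomes.

    create_one is a hypothetical callable supplied by the tester. A falsy
    return value or a raised exception is counted as a failure.
    Use max_workers=1 for a sequential run (Test 1) and a higher value
    for a concurrent run (Test 2).
    """

    def attempt(index):
        try:
            return "ok" if create_one(index) else "failed"
        except Exception:
            return "failed"

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        outcomes = list(pool.map(attempt, range(count)))

    tally = Counter(outcomes)
    return {
        "ok": tally["ok"],
        "failed": tally["failed"],
        "success_rate": tally["ok"] / count if count else 0.0,
    }
```

Recording the per-run dictionary for both the sequential and the concurrent setting would make the reported success ranges directly comparable across environments.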
Test 3: Chat Creation
Expected Result: 5 chats created
Actual Result: 0-2 chats created, rest fail
Key Observations
Logs & Screenshots
User Creation Failure (from OpenWebUI logs):
Chat Creation Failure (with full stack trace):
Database Connection Termination (PostgreSQL logs):
(Note: This happened when we tried setting idle_session_timeout to fix the issue - it made it worse)
Related Error (LiteLLM on same infrastructure):
Additional Information
Based on extensive testing and log analysis, the issue appears to be in SQLAlchemy session management:
Suspected code pattern (pseudocode from logs):
Why it fails:
After commit(), SQLAlchemy may close the session or detach objects
refresh() on a detached object raises InvalidRequestError
Infrastructure Changes Tested (All Failed to Fix)
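On the refresh() reasoning under "Why it fails": the sketch below is a minimal, standalone reproduction of two ways `Session.refresh()` can raise `InvalidRequestError`, using an in-memory SQLite database. It is not Open WebUI code; `DemoUser` and the helper functions are hypothetical names made up for illustration. The first case refreshes a detached object, as suspected in the report. The second refreshes an object whose row can no longer be found by the SELECT that refresh() issues, which is the path that produces the "Could not refresh instance" text quoted in the issue title. Which of the two applies to the failures reported here is not established by the report.

```python
from sqlalchemy import Column, Integer, String, create_engine, delete
from sqlalchemy.exc import InvalidRequestError
from sqlalchemy.orm import Session, declarative_base
from sqlalchemy.pool import StaticPool

Base = declarative_base()


class DemoUser(Base):  # hypothetical table, for illustration only
    __tablename__ = "demo_user"
    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)


def make_demo_engine():
    # StaticPool keeps one shared connection so the in-memory DB persists.
    engine = create_engine("sqlite://", poolclass=StaticPool)
    Base.metadata.create_all(engine)
    return engine


def refresh_detached(engine):
    """Case 1: the object was detached (session closed) before refresh()."""
    session = Session(engine)
    user = DemoUser(name="detached")
    session.add(user)
    session.commit()
    session.close()  # expunges all objects; user is now detached
    try:
        session.refresh(user)
    except InvalidRequestError as exc:
        return str(exc)
    finally:
        session.close()
    return None


def refresh_missing_row(engine):
    """Case 2: the object is still attached, but its row is not found."""
    with Session(engine) as session:
        user = DemoUser(name="missing")
        session.add(user)
        session.commit()
        # Remove the row behind the session's back.
        with engine.begin() as conn:
            conn.execute(delete(DemoUser))
        try:
            session.refresh(user)
        except InvalidRequestError as exc:
            return str(exc)
    return None
```

Comparing the exact message in the application logs against these two cases may help narrow down whether the object was detached or whether the follow-up SELECT simply did not see the newly written row.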
To rule out infrastructure issues, we extensively tested:
1. Connection Pool Optimization:
DATABASE_POOL_SIZE from 5 → 40
DATABASE_POOL_MAX_OVERFLOW from 10 → 20
DATABASE_POOL_RECYCLE to 240s (4 minutes)
DATABASE_POOL_PRE_PING to test connections
2. PgPool Configuration:
connection_life_time to 600s (was 0/infinite)
client_idle_limit to 300s (was 0/infinite)
3. PostgreSQL Timeouts:
idle_in_transaction_session_timeout: 5min (killed stuck transactions)
statement_timeout: 1min (killed long queries)
idle_session_timeout: 10min → Made it WORSE (killed connections mid-use)
idle_session_timeout: 30min → Still failed
idle_session_timeout: 0 (disabled) → Still failed
4. Scaling Tests:
5. Fresh Connection Tests:
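For reference, the connection-pool settings listed under item 1 correspond to standard SQLAlchemy engine arguments. The sketch below shows one way the DATABASE_POOL_* variables named in this report could be mapped onto `create_engine()`. It illustrates the SQLAlchemy side only: `pool_kwargs_from_env` and `make_pooled_engine` are hypothetical helpers, and how Open WebUI itself reads these variables is not shown in the report. The fallback values are SQLAlchemy's own defaults.

```python
from sqlalchemy import create_engine
from sqlalchemy.pool import QueuePool


def pool_kwargs_from_env(env):
    """Translate DATABASE_POOL_* settings into create_engine() keyword args.

    env is any mapping of strings (for example os.environ). Missing keys
    fall back to SQLAlchemy's defaults for a QueuePool.
    """
    return {
        "pool_size": int(env.get("DATABASE_POOL_SIZE", 5)),
        "max_overflow": int(env.get("DATABASE_POOL_MAX_OVERFLOW", 10)),
        # -1 means "never recycle"; the report used 240 seconds.
        "pool_recycle": int(env.get("DATABASE_POOL_RECYCLE", -1)),
        # Pre-ping tests each connection on checkout before it is used.
        "pool_pre_ping": str(env.get("DATABASE_POOL_PRE_PING", "false")).lower() == "true",
    }


def make_pooled_engine(url, env):
    """Create an engine with an explicit QueuePool and the settings above."""
    return create_engine(url, poolclass=QueuePool, **pool_kwargs_from_env(env))
```

With the values from the report, `make_pooled_engine(url, {"DATABASE_POOL_SIZE": "40", "DATABASE_POOL_MAX_OVERFLOW": "20", "DATABASE_POOL_RECYCLE": "240", "DATABASE_POOL_PRE_PING": "true"})` would allow up to 60 connections per process, which is worth checking against the PgPool and PostgreSQL connection limits.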
@tjbck commented on GitHub (Nov 4, 2025):
Unable to reproduce, community input wanted here.