[GH-ISSUE #18412] feat: Add Automatic Memory Management and Garbage Collection for Vector Embedding Tasks #57255
Originally created by @Baireinhold on GitHub (Oct 18, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/18412
Problem Description
Currently, Open WebUI's vector embedding and retrieval pipeline lacks an automatic memory cleanup mechanism. During batch embedding or retrieval tasks, the Python process accumulates memory without releasing temporary objects, buffers, or cached data after task completion. This makes batch operations impractical and degrades system performance over time.
Environment & Platform:
OS: macOS (native Python process)
Setup: Ollama deployed locally on macOS
Python Version: (Please specify your Python version)
Open WebUI Version: (Please specify your version)
Dependencies: Ollama vector model, requests library
Installation & Configuration Steps:
1. Install Ollama on macOS
Download from: https://ollama.ai
2. Pull a vector embedding model
ollama pull bge-m3
3. Verify Ollama is running (a quick Python check is sketched after this list)
curl http://localhost:11434/api/tags
4. Install Open WebUI
pip install open-webui
5. Start Open WebUI
open-webui serve
6. Access via browser
http://localhost:8080
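Once the install steps succeed, the snippet below is one way to confirm the embedding model itself responds before starting the reproduction. It is a sketch, assuming the default Ollama port, the /api/embeddings endpoint, and the requests library listed under Dependencies:
```python
# Optional sanity check (an illustration, not part of Open WebUI): request one
# embedding from the locally pulled bge-m3 model before running the repro.
import requests

resp = requests.post(
    "http://localhost:11434/api/embeddings",  # Ollama's embeddings endpoint
    json={"model": "bge-m3", "prompt": "hello world"},
    timeout=60,
)
resp.raise_for_status()
embedding = resp.json()["embedding"]
print(f"got a {len(embedding)}-dimensional embedding")
```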
Exact Steps to Reproduce:
1. Open Activity Monitor (Cmd+Space → Activity Monitor)
2. Search for the Python process running Open WebUI
3. Note the initial memory usage (baseline); a scripted monitor is sketched after this list
4. Navigate to Open WebUI → Documents/Files
5. Upload a text document (for example, any .txt or .pdf file)
6. Click "Embed and Store" or perform a vector embedding task
7. Wait for task completion
8. Observe: memory usage increases by X MB
9. Repeat Steps 6-8 five times with different documents
10. Observe: memory increases cumulatively (~X × 5 MB or more)
11. Check the Python process in Activity Monitor → memory is NOT released
12. Trigger a manual memory pressure release → Python process memory stays the same
13. Kill the Python process in Activity Monitor → all memory is instantly freed
14. Restart the tasks → memory begins accumulating again (same cycle)
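For anyone who prefers numbers over watching Activity Monitor, here is a minimal monitoring sketch; it assumes psutil is installed (pip install psutil) and that you pass the PID of the open-webui serve process as the first argument:
```python
# Minimal RSS monitor (a sketch, not from the original report). Run it while
# repeating steps 6-8 above to capture the growth over baseline; Ctrl-C stops it.
import sys
import time

import psutil

pid = int(sys.argv[1])            # PID of the `open-webui serve` process
proc = psutil.Process(pid)
baseline = proc.memory_info().rss

while True:
    rss = proc.memory_info().rss
    print(f"RSS: {rss / 2**20:.1f} MB "
          f"(+{(rss - baseline) / 2**20:.1f} MB over baseline)")
    time.sleep(5)
```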
Expected Behavior:
Each embedding/retrieval task should complete with ALL temporary objects garbage collected
Memory footprint should stabilize after task completion (return to baseline ± 5%); a hypothetical check is sketched after this list
Running 50+ embedding tasks should NOT cause system degradation
Memory should not accumulate across multiple task cycles
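One way to pin the first two criteria down as a regression test is sketched below; the driver callable and thresholds are placeholders, not existing Open WebUI code:
```python
# Hypothetical regression check for the criteria above. `run_embedding_task`
# is a placeholder for whatever exercises the pipeline (API call, UI driver).
import time

import psutil

def assert_memory_stabilizes(pid, run_embedding_task, tasks=50,
                             tolerance=0.05, settle_seconds=30):
    proc = psutil.Process(pid)
    baseline = proc.memory_info().rss
    for _ in range(tasks):
        run_embedding_task()
    time.sleep(settle_seconds)  # give GC and the allocator time to release
    final = proc.memory_info().rss
    assert final <= baseline * (1 + tolerance), (
        f"RSS grew from {baseline / 2**20:.1f} MB to {final / 2**20:.1f} MB "
        f"after {tasks} tasks"
    )
```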
Actual Behavior:
Memory accumulates monotonically: baseline + (X × task_count)
After 10-50 embedding tasks, system memory becomes critical
After 50-100 tasks, system becomes noticeably slow or freezes
Restarting tasks after manual memory release triggers the same accumulation cycle
Only workaround: Kill and restart the entire Python process
Desired Solution you'd like
Implement automatic memory management for the vector embedding pipeline:
Suggested Implementation Areas:
· File: /app/backend/open_webui/retrieval/utils.py - Function: generate_ollama_batch_embeddings()
· File: /app/backend/open_webui/retrieval/ - All embedding-related functions
· Add explicit cleanup in task completion handlers (a sketch follows below)
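As a rough illustration of what "explicit cleanup" could look like (a sketch under assumptions, not the actual Open WebUI code), each task could run inside a scope that forces a collection on exit. The malloc_trim call applies to glibc/Linux builds and is skipped on macOS, where the allocator differs:
```python
# Illustrative sketch only: scope each embedding task so temporaries are
# dropped and collected when the task finishes.
import contextlib
import ctypes
import gc

@contextlib.contextmanager
def embedding_task_scope():
    """Force cleanup of task-local objects when an embedding task finishes."""
    try:
        yield
    finally:
        gc.collect()  # reclaim cyclic garbage left behind by the task
        with contextlib.suppress(OSError, AttributeError):
            # Return freed heap pages to the OS (glibc only; absent on macOS).
            ctypes.CDLL("libc.so.6").malloc_trim(0)

# Hypothetical usage inside a batch embedding helper:
# def generate_ollama_batch_embeddings(...):
#     with embedding_task_scope():
#         ...compute and persist embeddings...
```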
Alternatives Considered
1. User Workaround: Manually restart the Python process periodically
   · Issue: Interrupts workflow; not practical for production use
2. Configuration Option: Allow users to set a maximum memory per task
   · Issue: Doesn't solve the core problem, just masks it
3. Batch Size Limitation: Enforce small batch sizes to reduce per-task memory
   · Issue: Reduces functionality and doesn't address the leak
4. Process Isolation: Run embedding tasks in separate processes (sketched after this list)
   · Issue: High overhead, complex to implement, doesn't fix the root cause
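For completeness, the process-isolation alternative could look roughly like the sketch below (names are placeholders; max_tasks_per_child requires Python 3.11+). The leak dies with the worker process, but every batch pays process startup cost, matching the overhead concern above:
```python
# Rough sketch of alternative 4 (illustrative only; embed_batch is a stub).
from concurrent.futures import ProcessPoolExecutor

def embed_batch(texts):
    """Placeholder for the real embedding call (e.g., an Ollama API request)."""
    raise NotImplementedError

def embed_in_isolated_process(texts):
    # Recycle the worker after every batch: whatever memory the task leaked
    # is reclaimed by the OS when the child process exits.
    with ProcessPoolExecutor(max_workers=1, max_tasks_per_child=1) as pool:
        return pool.submit(embed_batch, texts).result()
```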