[GH-ISSUE #18412] feat: Add Automatic Memory Management and Garbage Collection for Vector Embedding Tasks #34118

Closed
opened 2026-04-25 08:03:02 -05:00 by GiteaMirror · 0 comments

Originally created by @Baireinhold on GitHub (Oct 18, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/18412

Check Existing Issues

  • I have searched all existing open AND closed issues and discussions for similar requests. I have found none that is comparable to my request.

Problem Description

Currently, Open WebUI's vector embedding and retrieval pipeline lacks an automatic memory cleanup mechanism. When performing batch embedding or retrieval tasks, the Python process steadily accumulates memory, never releasing temporary objects, buffers, or cached data after a task completes. This makes batch operations impractical and degrades system performance over time.
Environment & Platform:

  • OS: macOS (native Python process)
  • Setup: Ollama deployed locally on macOS
  • Python Version: (Please specify your Python version)
  • Open WebUI Version: (Please specify your version)
  • Dependencies: Ollama embedding model (bge-m3), requests library
Installation & Configuration Steps:

1. Install Ollama on macOS (download from https://ollama.ai)

2. Pull an embedding model:

   ollama pull bge-m3

3. Verify Ollama is running (a direct embedding call is sketched after these steps):

   curl http://localhost:11434/api/tags

4. Install Open WebUI:

   pip install open-webui

5. Start Open WebUI:

   open-webui serve

6. Access it in a browser at http://localhost:8080
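
For reference, once steps 1-3 are complete, the embedding endpoint can be exercised directly, independent of Open WebUI. A minimal sketch using the requests library and Ollama's /api/embeddings route (the default port 11434 and the bge-m3 model from step 2 are assumptions based on the setup above):

```python
# Minimal sketch: call Ollama's embedding endpoint directly.
# Assumes Ollama is listening on its default port 11434 and that
# bge-m3 has been pulled (step 2 above).
import requests

resp = requests.post(
    "http://localhost:11434/api/embeddings",
    json={"model": "bge-m3", "prompt": "hello world"},
    timeout=30,
)
resp.raise_for_status()
embedding = resp.json()["embedding"]  # a list of floats
print(f"embedding dimension: {len(embedding)}")
```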

Exact Steps to Reproduce:

1. Open Activity Monitor (Cmd+Space → Activity Monitor)
2. Search for the Python process running Open WebUI
3. Note the initial memory usage (baseline)
4. Navigate to Open WebUI → Documents/Files
5. Upload a text document (e.g., any .txt or .pdf file)
6. Click "Embed and Store" or perform a vector embedding task
7. Wait for task completion
8. Observe: memory usage increases by X MB (a scripted way to capture these numbers is sketched after this list)
9. Repeat Steps 6-8 five times with different documents
10. Observe: memory increases cumulatively (~X × 5 MB or more)
11. Check the Python process in Activity Monitor → memory is NOT released
12. Trigger a manual memory pressure release → Python process memory stays the same
13. Kill the Python process in Activity Monitor → all memory is instantly freed
14. Restart the tasks → memory begins accumulating again (same cycle)
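
To put concrete numbers on the "X MB" deltas above, the process footprint can be sampled from a script instead of Activity Monitor. A minimal sketch, assuming psutil is installed and the Open WebUI process PID is passed on the command line (e.g., found via pgrep -f open-webui); neither is part of the original report:

```python
# Minimal sketch: sample the resident set size (RSS) of the Open WebUI
# process once per second while embedding tasks run in the UI.
# Usage: python sample_rss.py <pid>
import sys
import time

import psutil

proc = psutil.Process(int(sys.argv[1]))
baseline = proc.memory_info().rss
print(f"baseline: {baseline / 1e6:.1f} MB")

while True:
    rss = proc.memory_info().rss
    print(f"rss: {rss / 1e6:.1f} MB (delta {(rss - baseline) / 1e6:+.1f} MB)")
    time.sleep(1)
```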
Expected Behavior:

  • Each embedding/retrieval task should complete with ALL temporary objects garbage collected
  • Memory footprint should stabilize after task completion (return to baseline ± 5%; a concrete check is sketched below)
  • Running 50+ embedding tasks should NOT cause system degradation
  • Memory should not accumulate across multiple task cycles

Actual Behavior:

  • Memory accumulates monotonically: baseline + (X × task_count)
  • After 10-50 embedding tasks, system memory becomes critical
  • After 50-100 tasks, the system becomes noticeably slow or freezes
  • Restarting tasks after a manual memory release triggers the same accumulation cycle
  • Only workaround: kill and restart the entire Python process
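
The "baseline ± 5%" criterion above can be turned into a concrete check. A minimal sketch, again assuming psutil; run_embedding_task is a hypothetical callable standing in for one UI-triggered embedding task:

```python
# Sketch of the acceptance criterion: after one task, RSS should be
# within 5% of the pre-task baseline. run_embedding_task is hypothetical.
import psutil


def returns_to_baseline(pid: int, run_embedding_task, tolerance: float = 0.05) -> bool:
    proc = psutil.Process(pid)
    baseline = proc.memory_info().rss
    run_embedding_task()
    after = proc.memory_info().rss
    print(f"baseline={baseline / 1e6:.1f} MB, after={after / 1e6:.1f} MB")
    return after <= baseline * (1 + tolerance)
```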

Desired Solution you'd like

Implement automatic memory management for the vector embedding pipeline (items 1-3 are sketched after this list):

  1. Explicit Garbage Collection - Call gc.collect() after each embedding task completes
  2. Context Managers - Use context managers (with statements) to ensure proper resource cleanup
  3. Connection Pool Management - Properly close HTTP connections/sessions after API calls to Ollama
  4. Batch Processing Cleanup - Clear temporary arrays, buffers, and cached embeddings after batch processing
  5. Memory Profiling - Add optional logging to track memory usage per task (debug mode)
  6. Documentation - Add best practices for batch operations to reduce memory impact

Suggested Implementation Areas:

  • File: /app/backend/open_webui/retrieval/utils.py - Function: generate_ollama_batch_embeddings()
  • File: /app/backend/open_webui/retrieval/ - All embedding-related functions
  • Add explicit cleanup in task completion handlers
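
A minimal sketch of how items 1-3 could fit together, wrapping a batch call so the HTTP session is always closed and a collection pass runs on exit. embed_batch and its payload are illustrative assumptions, not the actual generate_ollama_batch_embeddings() code:

```python
# Hypothetical sketch of items 1-3: explicit GC, a context manager,
# and HTTP session cleanup around a batch embedding call.
# embed_batch() is illustrative, NOT Open WebUI's actual
# generate_ollama_batch_embeddings() implementation.
import gc
from contextlib import contextmanager

import requests

OLLAMA_URL = "http://localhost:11434/api/embeddings"  # assumed local default


@contextmanager
def embedding_session():
    """Yield a pooled HTTP session, then close it and force a GC pass."""
    session = requests.Session()
    try:
        yield session
    finally:
        session.close()   # item 3: release pooled connections to Ollama
        gc.collect()      # item 1: collect cyclic garbage left by the task


def embed_batch(texts, model="bge-m3"):
    """Embed a batch of texts, one request per text (illustrative only)."""
    embeddings = []
    with embedding_session() as session:  # item 2: with-statement cleanup
        for text in texts:
            resp = session.post(
                OLLAMA_URL,
                json={"model": model, "prompt": text},
                timeout=60,
            )
            resp.raise_for_status()
            embeddings.append(resp.json()["embedding"])
    return embeddings
```

One caveat worth covering under item 6: gc.collect() only reclaims Python-level reference cycles. If the growth comes from C-level buffers held by an HTTP client, tokenizer, or vector store, RSS will not shrink, which is exactly what the per-task profiling from item 5 would reveal.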

Alternatives Considered

  1. User Workaround: Manually restart the Python process periodically
     • Issue: Interrupts workflow; not practical for production use
  2. Configuration Option: Allow users to set max memory per task
     • Issue: Doesn't solve the core problem, just masks it
  3. Batch Size Limitation: Enforce small batch sizes to reduce per-task memory
     • Issue: Reduces functionality and doesn't address the leak
  4. Process Isolation: Run embedding tasks in separate processes (sketched after this list)
     • Issue: High overhead, complex to implement, doesn't fix the root cause
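
For completeness, a minimal sketch of the process-isolation alternative (item 4), using the standard library's multiprocessing.Pool with maxtasksperchild=1 so each worker exits after one task and the OS reclaims all of its memory. embed_document() is a hypothetical stand-in for a real embedding task:

```python
# Sketch of alternative 4 (process isolation): each embedding task runs
# in a worker process that is recycled after one task, so the OS frees
# all of its memory. High overhead, but leaks cannot accumulate.
# embed_document() is a hypothetical stand-in for a real embedding task.
from multiprocessing import Pool


def embed_document(path: str) -> int:
    # ... load, chunk, and embed the document here ...
    return 0  # e.g., number of chunks embedded


if __name__ == "__main__":
    docs = ["a.txt", "b.pdf", "c.txt"]
    # maxtasksperchild=1 recycles the worker after every task.
    with Pool(processes=1, maxtasksperchild=1) as pool:
        for result in pool.imap(embed_document, docs):
            print("task done:", result)
```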


Reference: github-starred/open-webui#34118