feat: Support memory retrieval reranking for improved context personalization #6001

Open
opened 2025-11-11 16:41:58 -06:00 by GiteaMirror · 4 comments

Originally created by @longzanxi on GitHub (Aug 8, 2025).

Check Existing Issues

  • I have searched the existing issues and discussions.

Problem Description

Currently, the Open WebUI user memory retrieval only supports vector similarity search and does not include a reranking step. As a result, some long-term user preferences or factual memories—though highly relevant to the current conversation—may not be surfaced, or may be overshadowed by less relevant results. This impacts context accuracy and reduces the personalization capabilities of the chatbot.

Desired Solution

I propose adding an optional reranking step to the memory retrieval pipeline, similar to what is available for knowledge base RAG retrieval. Specifically:

  • After the initial vector search, apply reranking logic to select the top-k most relevant memories for context injection.
  • Allow configuration for enabling/disabling memory reranking, setting top-k values, reranker model choice, and relevance thresholds (e.g., ENABLE_MEMORY_RERANK, MEMORY_TOP_K, MEMORY_TOP_K_RERANKER, MEMORY_RELEVANCE_THRESHOLD).
  • Make reranking an optional feature, defaulting to off but configurable for advanced users or developers.
  • Update documentation with usage and configuration instructions for memory rerank.

Implementation suggestion: in memory_handler.py / chat_memory_handler.py, apply reranking after VECTOR_DB_CLIENT.search, reusing the reranking logic already used for knowledge-base RAG retrieval.
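A minimal sketch of that flow, assuming the config names proposed above and a generic `(query, text) -> float` relevance scorer (e.g. a cross-encoder's predict). The function names, the `"content"` key on memory hits, and the wrapper around the vector search are hypothetical illustrations, not Open WebUI's actual internals:

```python
import os

# Config names come from this proposal; the defaults here are assumptions.
ENABLE_MEMORY_RERANK = os.getenv("ENABLE_MEMORY_RERANK", "false").lower() == "true"
MEMORY_TOP_K = int(os.getenv("MEMORY_TOP_K", "10"))  # initial vector-search k
MEMORY_TOP_K_RERANKER = int(os.getenv("MEMORY_TOP_K_RERANKER", "3"))
MEMORY_RELEVANCE_THRESHOLD = float(os.getenv("MEMORY_RELEVANCE_THRESHOLD", "0.0"))


def rerank_memories(query, memories, score_fn,
                    top_k=MEMORY_TOP_K_RERANKER,
                    threshold=MEMORY_RELEVANCE_THRESHOLD):
    """Re-score vector-search hits and keep the best ones.

    `memories` is assumed to be a list of dicts with a "content" key,
    i.e. the hits returned by the vector search. `score_fn` is any
    (query, text) -> float relevance scorer.
    """
    scored = [(score_fn(query, m["content"]), m) for m in memories]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable sort, best first
    return [m for score, m in scored[:top_k] if score >= threshold]


def retrieve_memories(query, vector_search, score_fn):
    # Hypothetical wrapper: vector search first, optional rerank after,
    # mirroring the knowledge-base RAG pipeline.
    hits = vector_search(query, k=MEMORY_TOP_K)
    if ENABLE_MEMORY_RERANK:
        hits = rerank_memories(query, hits, score_fn)
    return hits
```

Because `rerank_memories` only depends on a scorer callable, the reranker model choice stays pluggable, and disabling `ENABLE_MEMORY_RERANK` leaves current behavior untouched.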

Alternatives Considered

  • Key information can be stored in the knowledge base and injected via RAG with reranking, but this is inflexible for dynamic user memory.
  • Alternatively, reranking could be manually implemented in a proxy layer or frontend after /memories/query, but this is less efficient and not integrated.

Additional Context

Related issue: [RAG Hybrid Search still broken (#15915)](https://github.com/open-webui/open-webui/issues/15915)

No public discussions or issues currently address reranking for user memory. This feature would improve context accuracy and long-term personalization for advanced users and complex scenarios.

If more technical details are needed or collaboration is welcome, I'm happy to provide further input.

@onestardao commented on GitHub (Aug 9, 2025):

This is essentially the vectorstore ranking drift problem — after retrieval, the ranking phase can drift away from the most relevant context, especially in long-term memory personalization.

We’ve documented this as No. 7 in our AI Failure Problem Map, along with reproducible cases and tested fixes. If you want the write-up and implementation pattern, I can share it.


@longzanxi commented on GitHub (Aug 9, 2025):

@onestardao Thanks! Yes, I’d really appreciate it if you could share the Problem Map No.7 write-up and the implementation pattern (repro cases, metrics, fixes).


@onestardao commented on GitHub (Aug 9, 2025):

Yes — this is exactly the vectorstore ranking drift problem we documented as No.7 in the WFGY Problem Map.
The write-up includes reproducible cases, metrics, and the exact fix pattern.

Full details & implementation steps:
https://github.com/onestardao/WFGY/blob/main/ProblemMap/README.md


@i-iooi-i commented on GitHub (Sep 21, 2025):

Despite storing numerous memories, I've observed that the AI consistently accesses only a small, fixed subset, with the rest remaining unreachable. I suspect this limitation might stem from the embedding model. The current memory feature feels quite underdeveloped, and I eagerly anticipate its full implementation. For now, I am relying on system prompts to manage critical information.

Reference: github-starred/open-webui#6001