Mirror of https://github.com/open-webui/open-webui.git (synced 2026-05-07 03:18:23 -05:00)
[GH-ISSUE #22384] issue: [RETRIEVAL_HINT] Leaks into Model Output with Knowledge Base Enabled (Structured JSON Responses) #58377
Originally created by @tcox1969 on GitHub (Mar 8, 2026).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/22384
Installation Method
Pip Install
Open WebUI Version
0.8.8.8
Ollama Version (if applicable)
0.13.1 and 0.17.7
Operating System
Ubuntu 24.04.4 LTS - AMD64
Browser (if applicable)
Any browser (makes no difference)
Expected Behavior
Knowledge Base (RAG) retrieval completes its extraction before the system responds, so that the retrieved chunks can be injected into the response as requested.
Actual Behavior
Summary
When using OpenWebUI ≥0.8.x with a Knowledge Base attached to a model that returns deterministic structured JSON output, the internal placeholder token RETRIEVAL_HINT frequently leaks into the final response instead of being replaced by retrieved KB content.
This appears to be a race condition between generation and KB retrieval, where the model completes its output before the retrieval system injects the retrieved documents.
The behavior did not occur in OpenWebUI 0.6.41 using the same models and knowledge base.
Environment
OpenWebUI: 0.8.8.8
Ollama: 0.13.1
Model: custom model built with ollama create
Hardware: NVIDIA GPU system (VRAM configuration not relevant to issue)
Model parameters:
```
PARAMETER temperature 0.0
PARAMETER top_p 1.0
PARAMETER top_k 0
PARAMETER repeat_penalty 1.0
PARAMETER num_predict 256
```
Capabilities:
Built-in Tools: OFF
All model capabilities: OFF
Knowledge Base: attached
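For reference, the parameters above correspond to a Modelfile along these lines (the FROM base and the absence of a SYSTEM prompt are placeholders — the original Modelfile is not shown in this report):

```
# Placeholder base model; the actual base used is not stated in this issue.
FROM llama3
PARAMETER temperature 0.0
PARAMETER top_p 1.0
PARAMETER top_k 0
PARAMETER repeat_penalty 1.0
PARAMETER num_predict 256
```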
Steps to Reproduce
1. Create a model via Ollama:

   ```shell
   ollama create mymodel:03072026 -f Modelfile
   ```

2. Promote the model for production:

   ```shell
   ollama cp mymodel:03072026 mymodel:prod
   ```

3. In OpenWebUI:
   - Disable all model capabilities
   - Attach a Knowledge Base
4. Query the model.
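The steps above can be driven programmatically. The sketch below queries OpenWebUI's OpenAI-compatible chat endpoint and flags a leaked placeholder; `BASE_URL`, `API_KEY`, and the exact endpoint path are assumptions to adjust for your installation:

```python
# Minimal harness to reproduce the leak via the OpenWebUI API.
# BASE_URL, API_KEY, and MODEL are placeholders for your installation;
# the endpoint path follows OpenWebUI's OpenAI-compatible chat API.
import json
import urllib.request

BASE_URL = "http://localhost:3000"   # placeholder
API_KEY = "sk-..."                   # placeholder
MODEL = "mymodel:prod"

def has_leaked_hint(reply_text: str) -> bool:
    """True if the internal RETRIEVAL_HINT placeholder leaked into the reply."""
    try:
        data = json.loads(reply_text)
    except json.JSONDecodeError:
        return "RETRIEVAL_HINT" in reply_text
    if not isinstance(data, dict):
        return "RETRIEVAL_HINT" in reply_text
    return "RETRIEVAL_HINT" in data.get("rule_weights_triggered", [])

def classify(prompt: str) -> str:
    """Send one chat request and return the model's reply text."""
    body = json.dumps({
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    req = urllib.request.Request(
        f"{BASE_URL}/api/chat/completions",
        data=body,
        headers={"Authorization": f"Bearer {API_KEY}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        reply = json.load(resp)
    return reply["choices"][0]["message"]["content"]

if __name__ == "__main__":
    text = classify("Article Title: ...")
    print("leaked" if has_leaked_hint(text) else "ok", text)
```

Running this in a loop makes the ~99% leak rate easy to quantify.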
Prompt example (classifier):
Article Title:
The model is designed to output JSON:
```json
{
  "polarity": "...",
  "score": ...,
  "fud_ciw": "...",
  "confidence": ...,
  "rule_weights_triggered": [...]
}
```
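For automated pipelines, the shape above can be checked with a small stdlib-only validator. The field names come from the schema in this report; the expected Python types are assumptions inferred from the example values:

```python
# Stdlib-only shape check for the classifier output above.
# Field names come from the schema in this report; the expected
# Python types are assumptions based on the example values.
import json

EXPECTED_TYPES = {
    "polarity": str,
    "score": (int, float),
    "fud_ciw": str,
    "confidence": (int, float),
    "rule_weights_triggered": list,
}

def validate_output(raw: str) -> list[str]:
    """Return a list of problems; an empty list means the reply matches."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    problems = []
    for field, typ in EXPECTED_TYPES.items():
        if field not in data:
            problems.append(f"missing field: {field}")
        elif not isinstance(data[field], typ):
            problems.append(f"wrong type for {field}: {type(data[field]).__name__}")
    return problems
```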
Expected Result
Example expected output:
```json
{
  "polarity": "negative",
  "score": -0.8,
  "fud_ciw": "TRUE",
  "confidence": 0.95,
  "rule_weights_triggered": [
    "rule 1",
    "rule 10",
    "rule 17",
    "etc"
  ]
}
```
Actual Result
Frequently returns:
```json
{
  "polarity": "negative",
  "score": -0.8,
  "fud_ciw": "TRUE",
  "confidence": 0.95,
  "rule_weights_triggered": [
    "RETRIEVAL_HINT",
    "RETRIEVAL_HINT"
  ]
}
```
Occasionally the correct rule names appear, but ~99% of requests return the placeholder token instead.
Additional Observations
- Occurs through both the OpenWebUI API and the UI.
- Does not occur when running the model directly via `ollama run`.
- Appears only when the Knowledge Base is attached.
- The rest of the JSON response remains correct, suggesting the model still uses KB context but the retrieval hint is not replaced.
Likely Cause
The placeholder token RETRIEVAL_HINT appears to be emitted during generation when the KB retrieval system triggers. However, for short deterministic responses (e.g., JSON classifiers), generation finishes before the retrieval system replaces the placeholder.
This results in the internal retrieval marker leaking into the final output.
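The suspected timing can be illustrated with a toy asyncio sketch. This is illustrative only, not OpenWebUI's actual implementation: a short deterministic reply finishes before the (slower) retrieval task, so the placeholder is never replaced:

```python
# Toy model of the suspected race: generation emits a placeholder and
# finishes before retrieval produces the text that should replace it.
# Illustrative only; not OpenWebUI's actual code.
import asyncio

async def retrieve_from_kb() -> str:
    await asyncio.sleep(0.05)          # retrieval is comparatively slow
    return "rule 1"

async def generate_reply() -> str:
    await asyncio.sleep(0.01)          # a short deterministic reply finishes fast
    return '{"rule_weights_triggered": ["RETRIEVAL_HINT"]}'

async def respond() -> str:
    retrieval = asyncio.create_task(retrieve_from_kb())
    reply = await generate_reply()     # generation completes first...
    if retrieval.done():               # ...so this replacement never happens
        reply = reply.replace("RETRIEVAL_HINT", retrieval.result())
    return reply

leaked = asyncio.run(respond())
print(leaked)   # the placeholder survives because retrieval had not finished
```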
Impact
This breaks deterministic API use cases such as:
- classification systems
- structured JSON outputs
- automated pipelines
The behavior makes OpenWebUI unsuitable for production inference in these cases.
Regression
OpenWebUI 0.6.41 with the same model, parameters, and KB does not exhibit this behavior.
Suggested Direction
Ensure KB retrieval occurs before generation begins, rather than relying on mid-generation retrieval hints, particularly when structured output modes are used.
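The suggested ordering can be sketched in the same toy terms (hypothetical helper names, not OpenWebUI's actual API): retrieval is awaited to completion and its result injected into the prompt before generation starts, so there is no mid-generation placeholder to leak:

```python
# Sketch of the suggested ordering: finish KB retrieval first, then
# inject the retrieved text into the prompt before generation begins.
# Hypothetical helper names; not OpenWebUI's actual API.
import asyncio

async def retrieve_from_kb(query: str) -> str:
    await asyncio.sleep(0.05)              # simulated retrieval latency
    return "rule 1"

async def generate(prompt: str) -> str:
    await asyncio.sleep(0.01)              # simulated generation
    # Echo the injected KB context back, standing in for a model that
    # uses the retrieved text in its structured answer.
    return f'{{"rule_weights_triggered": ["{prompt.split(": ")[-1]}"]}}'

async def respond(query: str) -> str:
    context = await retrieve_from_kb(query)     # block until retrieval is done
    prompt = f"{query}\nKB context: {context}"  # inject context up front
    return await generate(prompt)

result = asyncio.run(respond("Article Title: ..."))
```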
Minimal Reproduction
- Deterministic model (temperature=0)
- Structured JSON output
- Knowledge Base attached
- Short responses
Notes
This issue appears specifically when the model produces short structured outputs, where generation completes before KB retrieval injection finishes.
Logs & Screenshots
No logs are necessary; the output itself contains the leaked RETRIEVAL_HINT marker in place of the retrieved KB content.
Additional Information
All relevant observations are included above.