[GH-ISSUE #22384] issue: [RETRIEVAL_HINT] Leaks into Model Output with Knowledge Base Enabled (Structured JSON Responses) #58377

Closed
opened 2026-05-05 23:04:45 -05:00 by GiteaMirror · 0 comments
Owner

Originally created by @tcox1969 on GitHub (Mar 8, 2026).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/22384

Installation Method

Pip Install

Open WebUI Version

0.8.8.8

Ollama Version (if applicable)

0.13.1 and 0.17.7

Operating System

Ubuntu 24.04.4 LTS - AMD64

Browser (if applicable)

Makes No Difference

Expected Behavior

RAG Tooling Upgrades complete their extraction before the system responds so that relevant chunks retrieved can be displayed in the response as requested.

Actual Behavior

Summary

When using OpenWebUI ≥0.8.x with a Knowledge Base attached to a model that returns deterministic structured JSON output, the internal placeholder token RETRIEVAL_HINT frequently leaks into the final response instead of being replaced by retrieved KB content.

This appears to be a race condition between generation and KB retrieval, where the model completes its output before the retrieval system injects the retrieved documents.

The behavior did not occur in OpenWebUI 0.6.41 using the same models and knowledge base.

Environment

OpenWebUI: 0.8.8.8
Ollama: 0.13.1
Model: custom model built with ollama create
Hardware: NVIDIA GPU system (VRAM configuration not relevant to issue)

Model parameters:

PARAMETER temperature 0.0
PARAMETER top_p 1.0
PARAMETER top_k 0
PARAMETER repeat_penalty 1.0
PARAMETER num_predict 256

Capabilities:

Built-in Tools: OFF
All model capabilities: OFF
Knowledge Base: attached

Steps to Reproduce

Create a model via Ollama:

ollama create mymodel:03072026 -f Modelfile

Promote model for production:

ollama cp mymodel:03072026 mymodel:prod

In OpenWebUI:

Disable all model capabilities

Attach a Knowledge Base

Query the model.

Prompt example (classifier):

Article Title:

The model is designed to output JSON:

{
"polarity": "...",
"score": ...,
"fud_ciw": "...",
"confidence": ...,
"rule_weights_triggered": [...]
}
Expected Result

Example expected output:

{
"polarity": "negative",
"score": -0.8,
"fud_ciw": "TRUE",
"confidence": 0.95,
"rule_weights_triggered": [
"rule 1",
"rule 10",
"rule 17",
"etc"
]
}
Actual Result

Frequently returns:

{
"polarity": "negative",
"score": -0.8,
"fud_ciw": "TRUE",
"confidence": 0.95,
"rule_weights_triggered": [
"RETRIEVAL_HINT",
"RETRIEVAL_HINT"
]
}

Occasionally the correct rule names appear, but ~99% of requests return the placeholder token instead.

Additional Observations

Occurs through both OpenWebUI API and UI interface.

Does not occur when running the model directly via ollama run.

Appears only when the Knowledge Base is attached.

The rest of the JSON response remains correct, suggesting the model still uses KB context but the retrieval hint is not replaced.

Likely Cause

The placeholder token RETRIEVAL_HINT appears to be emitted during generation when the KB retrieval system triggers. However, for short deterministic responses (e.g., JSON classifiers), generation finishes before the retrieval system replaces the placeholder.

This results in the internal retrieval marker leaking into the final output.

Impact

This breaks deterministic API use cases such as:

classification systems

structured JSON outputs

automated pipelines

The behavior makes OpenWebUI unsuitable for production inference in these cases.

Regression

OpenWebUI 0.6.41 with the same model, parameters, and KB does not exhibit this behavior.

Suggested Direction

Ensure KB retrieval occurs before generation begins, rather than relying on mid-generation retrieval hints, particularly when structured output modes are used.

Minimal Reproduction

Deterministic model (temperature=0)

Structured JSON output

Knowledge Base attached

Short responses

Notes

This issue appears specifically when the model produces short structured outputs, where generation completes before KB retrieval injection finishes.

Steps to Reproduce

Steps to Reproduce

Create a model via Ollama:

ollama create mymodel:03072026 -f Modelfile

Promote model for production:

ollama cp mymodel:03072026 mymodel:prod

In OpenWebUI:

Disable all model capabilities

Attach a Knowledge Base

Query the model.

Prompt example (classifier):

Article Title:

The model is designed to output JSON:

{
"polarity": "...",
"score": ...,
"fud_ciw": "...",
"confidence": ...,
"rule_weights_triggered": [...]
}
Expected Result

Example expected output:

{
"polarity": "negative",
"score": -0.8,
"fud_ciw": "TRUE",
"confidence": 0.95,
"rule_weights_triggered": [
"rule 1",
"rule 10",
"rule 17",
"etc"
]
}
Actual Result

Frequently returns:

{
"polarity": "negative",
"score": -0.8,
"fud_ciw": "TRUE",
"confidence": 0.95,
"rule_weights_triggered": [
"RETRIEVAL_HINT",
"RETRIEVAL_HINT"
]
}

Occasionally the correct rule names appear, but ~99% of requests return the placeholder token instead.

Logs & Screenshots

No logs are necessary, the output contains the leaked RETRIEVAL_HINT indicator instead of RAG pulled info

Additional Information

Every observation already included

Originally created by @tcox1969 on GitHub (Mar 8, 2026). Original GitHub issue: https://github.com/open-webui/open-webui/issues/22384 ### Installation Method Pip Install ### Open WebUI Version 0.8.8.8 ### Ollama Version (if applicable) 0.13.1 and 0.17.7 ### Operating System Ubuntu 24.04.4 LTS - AMD64 ### Browser (if applicable) Makes No Difference ### Expected Behavior RAG Tooling Upgrades complete their extraction before the system responds so that relevant chunks retrieved can be displayed in the response as requested. ### Actual Behavior **Summary** When using OpenWebUI ≥0.8.x with a Knowledge Base attached to a model that returns deterministic structured JSON output, the internal placeholder token RETRIEVAL_HINT frequently leaks into the final response instead of being replaced by retrieved KB content. This appears to be a race condition between generation and KB retrieval, where the model completes its output before the retrieval system injects the retrieved documents. The behavior did not occur in OpenWebUI 0.6.41 using the same models and knowledge base. **Environment** OpenWebUI: 0.8.8.8 Ollama: 0.13.1 Model: custom model built with ollama create Hardware: NVIDIA GPU system (VRAM configuration not relevant to issue) **Model parameters:** PARAMETER temperature 0.0 PARAMETER top_p 1.0 PARAMETER top_k 0 PARAMETER repeat_penalty 1.0 PARAMETER num_predict 256 **Capabilities:** Built-in Tools: OFF All model capabilities: OFF Knowledge Base: attached **Steps to Reproduce** Create a model via Ollama: ollama create mymodel:03072026 -f Modelfile Promote model for production: ollama cp mymodel:03072026 mymodel:prod In OpenWebUI: Disable all model capabilities Attach a Knowledge Base Query the model. Prompt example (classifier): Article Title: <Some Article Title Appropriate to Trained Model for Classification> The model is designed to output JSON: { "polarity": "...", "score": ..., "fud_ciw": "...", "confidence": ..., "rule_weights_triggered": [...] } Expected Result Example expected output: { "polarity": "negative", "score": -0.8, "fud_ciw": "TRUE", "confidence": 0.95, "rule_weights_triggered": [ "rule 1", "rule 10", "rule 17", "etc" ] } Actual Result Frequently returns: { "polarity": "negative", "score": -0.8, "fud_ciw": "TRUE", "confidence": 0.95, "rule_weights_triggered": [ "RETRIEVAL_HINT", "RETRIEVAL_HINT" ] } Occasionally the correct rule names appear, but ~99% of requests return the placeholder token instead. **Additional Observations** Occurs through both OpenWebUI API and UI interface. Does not occur when running the model directly via ollama run. Appears only when the Knowledge Base is attached. The rest of the JSON response remains correct, suggesting the model still uses KB context but the retrieval hint is not replaced. **Likely Cause** The placeholder token RETRIEVAL_HINT appears to be emitted during generation when the KB retrieval system triggers. However, for short deterministic responses (e.g., JSON classifiers), generation finishes before the retrieval system replaces the placeholder. This results in the internal retrieval marker leaking into the final output. **Impact** This breaks deterministic API use cases such as: classification systems structured JSON outputs automated pipelines The behavior makes OpenWebUI unsuitable for production inference in these cases. **Regression** OpenWebUI 0.6.41 with the same model, parameters, and KB does not exhibit this behavior. **Suggested Direction** Ensure KB retrieval occurs before generation begins, rather than relying on mid-generation retrieval hints, particularly when structured output modes are used. Minimal Reproduction Deterministic model (temperature=0) Structured JSON output Knowledge Base attached Short responses **Notes** This issue appears specifically when the model produces short structured outputs, where generation completes before KB retrieval injection finishes. ### Steps to Reproduce **Steps to Reproduce** Create a model via Ollama: ollama create mymodel:03072026 -f Modelfile Promote model for production: ollama cp mymodel:03072026 mymodel:prod In OpenWebUI: Disable all model capabilities Attach a Knowledge Base Query the model. Prompt example (classifier): Article Title: <Some Article Title Appropriate to Trained Model for Classification> The model is designed to output JSON: { "polarity": "...", "score": ..., "fud_ciw": "...", "confidence": ..., "rule_weights_triggered": [...] } Expected Result Example expected output: { "polarity": "negative", "score": -0.8, "fud_ciw": "TRUE", "confidence": 0.95, "rule_weights_triggered": [ "rule 1", "rule 10", "rule 17", "etc" ] } Actual Result Frequently returns: { "polarity": "negative", "score": -0.8, "fud_ciw": "TRUE", "confidence": 0.95, "rule_weights_triggered": [ "RETRIEVAL_HINT", "RETRIEVAL_HINT" ] } Occasionally the correct rule names appear, but ~99% of requests return the placeholder token instead. ### Logs & Screenshots No logs are necessary, the output contains the leaked RETRIEVAL_HINT indicator instead of RAG pulled info ### Additional Information Every observation already included
GiteaMirror added the bug label 2026-05-05 23:04:45 -05:00
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: github-starred/open-webui#58377