[GH-ISSUE #23256] issue: Reranker: raw prompt deprecation warning with vLLM ≥ 0.18 (InputProcessor raw prompts removed) #58599

New Issue

GiteaMirror · 2026-05-05T23:30:55-05:00

GiteaMirror commented

2026-05-05 23:30:55 -05:00

Originally created by @danialkhatib on GitHub (Mar 31, 2026).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/23256

Check Existing Issues

I have searched for any existing and/or related issues.
I have searched for any existing and/or related discussions.
I have also searched in the CLOSED issues AND CLOSED discussions and found no related items (your issue might already be addressed on the development branch!).
I am using the latest version of Open WebUI.

Installation Method

Docker

Open WebUI Version

v0.8.12

Ollama Version (if applicable)

No response

Operating System

Red Hat Enterprise Linux release 9.6

Browser (if applicable)

No response

Confirmation

I have read and followed all instructions in README.md.
I am using the latest version of both Open WebUI and Ollama.
I have included the browser console logs.
I have included the Docker container logs.
I have provided every relevant configuration, setting, and environment variable used in my setup.
I have clearly listed every relevant configuration, custom setting, environment variable, and command-line option that influences my setup (such as Docker Compose overrides, .env values, browser settings, authentication configurations, etc).
I have documented step-by-step reproduction instructions that are precise, sequential, and leave nothing to interpretation. My steps:
Start with the initial platform/version/OS and dependencies used,
Specify exact install/launch/configure commands,
List URLs visited, user input (incl. example values/emails/passwords if needed),
Describe all options and toggles enabled or changed,
Include any files or environmental changes,
Identify the expected and actual result at each stage,
Ensure any reasonably skilled user can follow and hit the same issue.

Expected Behavior

Bug Report: Reranker sends raw prompts to vLLM `InputProcessor` (deprecated in v0.18)

Labels: bug, rag / reranker, vllm

Environment


Open WebUI	v0.8.12
vLLM	0.18.1
Reranker model	zeroentropy/zerank-2
vLLM runner	pooling
Endpoint	POST /v1/rerank
dtype	bfloat16

Expected Behavior

Rerank requests from Open WebUI should be formatted in a way that is compatible with vLLM's InputProcessor. Prompts should be passed through vLLM's Renderer.render_cmpl() or Renderer.render_chat() pipeline before being submitted to the /v1/rerank endpoint, producing no deprecation warnings and ensuring forward compatibility with vLLM ≥ 0.18.

Actual Behavior

On every rerank request, vLLM 0.18.1 logs the following deprecation warning:

WARNING [input_processor.py:227] Passing raw prompts to InputProcessor is deprecated
and will be removed in v0.18. You should instead pass the outputs of
Renderer.render_cmpl() or Renderer.render_chat().

Open WebUI is sending raw prompt strings directly to /v1/rerank instead of using the rendered prompt format that vLLM now expects. Requests currently return 200 OK, but this is likely relying on a fallback that may be fully removed in a future patch, which would break reranking silently or with an unhandled error.

Relevant vLLM Logs

(APIServer pid=1) WARNING 03-31 10:28:02 [input_processor.py:227]
Passing raw prompts to InputProcessor is deprecated and will be
removed in v0.18. You should instead pass the outputs of
Renderer.render_cmpl() or Renderer.render_chat().

(APIServer pid=1) INFO 10.51.2.4:57568 - "POST /v1/rerank HTTP/1.1" 200 OK

Additional Context

The fix needs to be applied in Open WebUI's reranker HTTP client code — specifically wherever it constructs the request body for POST /v1/rerank. The prompt strings need to be passed through vLLM's Renderer before being submitted, or the payload format needs to match what vLLM's InputProcessor now expects as a non-raw-prompt input.

vLLM issue reference: https://github.com/vllm-project/vllm/blob/main/vllm/v1/engine/input_processor.py

Steps to Reproduce

Deploy a vLLM reranker with --runner pooling and model zeroentropy/zerank-2 on vLLM 0.18.1
Configure Open WebUI v0.8.12 to use the reranker at /v1/rerank
Trigger a RAG query that invokes the reranker
Observe the InputProcessor deprecation warning in vLLM logs on every request

Logs & Screenshots

(APIServer pid=1) WARNING 03-31 10:28:02 [input_processor.py:227] Passing raw prompts to InputProcessor is deprecated and will be removed in v0.18. You should instead pass the outputs of Renderer.render_cmpl() or Renderer.render_chat().

Additional Information

Vllm Config

      - "zeroentropy/zerank-2"
      - "--runner"
      - "pooling"
      - "--hf-overrides"
      - '{"architectures":["Qwen3ForSequenceClassification"],"classifier_from_token":["no","yes"],"is_original_qwen3_reranker":true}'
      - "--trust-remote-code"
      - "--dtype"
      - "bfloat16"
      - "--enable-prefix-caching"
      - "--max-model-len"
      - "8192"
      - "--max-num-batched-tokens"
      - "24576"
      - "--max-num-seqs"
      -  "192"
      - "--gpu-memory-utilization"
      - "0.95"
      - "--host"
      - "0.0.0.0"
      - "--port"
      - "2020"

Originally created by @danialkhatib on GitHub (Mar 31, 2026). Original GitHub issue: https://github.com/open-webui/open-webui/issues/23256 ### Check Existing Issues - [x] I have searched for any existing and/or related issues. - [x] I have searched for any existing and/or related discussions. - [x] I have also searched in the CLOSED issues AND CLOSED discussions and found no related items (your issue might already be addressed on the development branch!). - [x] I am using the latest version of Open WebUI. ### Installation Method Docker ### Open WebUI Version v0.8.12 ### Ollama Version (if applicable) _No response_ ### Operating System Red Hat Enterprise Linux release 9.6 ### Browser (if applicable) _No response_ ### Confirmation - [x] I have read and followed all instructions in `README.md`. - [x] I am using the latest version of **both** Open WebUI and Ollama. - [x] I have included the browser console logs. - [x] I have included the Docker container logs. - [x] I have **provided every relevant configuration, setting, and environment variable used in my setup.** - [x] I have clearly **listed every relevant configuration, custom setting, environment variable, and command-line option that influences my setup** (such as Docker Compose overrides, .env values, browser settings, authentication configurations, etc). - [x] I have documented **step-by-step reproduction instructions that are precise, sequential, and leave nothing to interpretation**. My steps: - Start with the initial platform/version/OS and dependencies used, - Specify exact install/launch/configure commands, - List URLs visited, user input (incl. example values/emails/passwords if needed), - Describe all options and toggles enabled or changed, - Include any files or environmental changes, - Identify the expected and actual result at each stage, - Ensure any reasonably skilled user can follow and hit the same issue. ### Expected Behavior ## Bug Report: Reranker sends raw prompts to vLLM `InputProcessor` (deprecated in v0.18) **Labels:** `bug`, `rag / reranker`, `vllm` --- ### Environment | | | |---|---| | Open WebUI | v0.8.12 | | vLLM | 0.18.1 | | Reranker model | zeroentropy/zerank-2 | | vLLM runner | pooling | | Endpoint | POST /v1/rerank | | dtype | bfloat16 | --- ### Expected Behavior Rerank requests from Open WebUI should be formatted in a way that is compatible with vLLM's `InputProcessor`. Prompts should be passed through vLLM's `Renderer.render_cmpl()` or `Renderer.render_chat()` pipeline before being submitted to the `/v1/rerank` endpoint, producing no deprecation warnings and ensuring forward compatibility with vLLM ≥ 0.18. --- ### Actual Behavior On every rerank request, vLLM 0.18.1 logs the following deprecation warning: ``` WARNING [input_processor.py:227] Passing raw prompts to InputProcessor is deprecated and will be removed in v0.18. You should instead pass the outputs of Renderer.render_cmpl() or Renderer.render_chat(). ``` Open WebUI is sending raw prompt strings directly to `/v1/rerank` instead of using the rendered prompt format that vLLM now expects. Requests currently return `200 OK`, but this is likely relying on a fallback that may be fully removed in a future patch, which would break reranking silently or with an unhandled error. --- ### Relevant vLLM Logs ``` (APIServer pid=1) WARNING 03-31 10:28:02 [input_processor.py:227] Passing raw prompts to InputProcessor is deprecated and will be removed in v0.18. You should instead pass the outputs of Renderer.render_cmpl() or Renderer.render_chat(). (APIServer pid=1) INFO 10.51.2.4:57568 - "POST /v1/rerank HTTP/1.1" 200 OK ``` --- ### Additional Context The fix needs to be applied in Open WebUI's reranker HTTP client code — specifically wherever it constructs the request body for `POST /v1/rerank`. The prompt strings need to be passed through vLLM's `Renderer` before being submitted, or the payload format needs to match what vLLM's `InputProcessor` now expects as a non-raw-prompt input. vLLM issue reference: https://github.com/vllm-project/vllm/blob/main/vllm/v1/engine/input_processor.py ### Steps to Reproduce ### Steps to Reproduce 1. Deploy a vLLM reranker with `--runner pooling` and model `zeroentropy/zerank-2` on vLLM 0.18.1 2. Configure Open WebUI v0.8.12 to use the reranker at `/v1/rerank` 3. Trigger a RAG query that invokes the reranker 4. Observe the `InputProcessor` deprecation warning in vLLM logs on every request ### Logs & Screenshots ``` (APIServer pid=1) WARNING 03-31 10:28:02 [input_processor.py:227] Passing raw prompts to InputProcessor is deprecated and will be removed in v0.18. You should instead pass the outputs of Renderer.render_cmpl() or Renderer.render_chat(). ``` ### Additional Information Vllm Config ``` - "zeroentropy/zerank-2" - "--runner" - "pooling" - "--hf-overrides" - '{"architectures":["Qwen3ForSequenceClassification"],"classifier_from_token":["no","yes"],"is_original_qwen3_reranker":true}' - "--trust-remote-code" - "--dtype" - "bfloat16" - "--enable-prefix-caching" - "--max-model-len" - "8192" - "--max-num-batched-tokens" - "24576" - "--max-num-seqs" - "192" - "--gpu-memory-utilization" - "0.95" - "--host" - "0.0.0.0" - "--port" - "2020" ```

GiteaMirror added the bug label 2026-05-05 23:30:55 -05:00

GiteaMirror closed this issue

2026-05-05 23:30:57 -05:00

Sign in to join this conversation.

Branches Tags

1 Participants

Notifications

Due Date

No due date set.

Dependencies

No dependencies set.

Reference: github-starred/open-webui#58599

[GH-ISSUE #23256] issue: Reranker: raw prompt deprecation warning with vLLM ≥ 0.18 (InputProcessor raw prompts removed) #58599

Check Existing Issues

Installation Method

Open WebUI Version

Ollama Version (if applicable)

Operating System

Browser (if applicable)

Confirmation

Expected Behavior

Bug Report: Reranker sends raw prompts to vLLM InputProcessor (deprecated in v0.18)

Environment

Expected Behavior

Actual Behavior

Relevant vLLM Logs

Additional Context

Steps to Reproduce

Steps to Reproduce

Logs & Screenshots

Additional Information

Bug Report: Reranker sends raw prompts to vLLM `InputProcessor` (deprecated in v0.18)