issue: upstream headers are dropped for non-streaming chat completions request #5674

Closed
opened 2025-11-11 16:28:43 -06:00 by GiteaMirror · 13 comments
Owner

Originally created by @Simon-Stone on GitHub (Jul 1, 2025).

Check Existing Issues

  • I have searched the existing issues and discussions.
  • I am using the latest version of Open WebUI.

Installation Method

Git Clone

Open WebUI Version

v0.6.15

Ollama Version (if applicable)

No response

Operating System

Any

Browser (if applicable)

Any

Confirmation

  • I have read and followed all instructions in README.md.
  • I am using the latest version of both Open WebUI and Ollama.
  • I have included the browser console logs.
  • I have included the Docker container logs.
  • I have provided every relevant configuration, setting, and environment variable used in my setup.
  • I have clearly listed every relevant configuration, custom setting, environment variable, and command-line option that influences my setup (such as Docker Compose overrides, .env values, browser settings, authentication configurations, etc).
  • I have documented step-by-step reproduction instructions that are precise, sequential, and leave nothing to interpretation. My steps:
  • Start with the initial platform/version/OS and dependencies used,
  • Specify exact install/launch/configure commands,
  • List URLs visited, user input (incl. example values/emails/passwords if needed),
  • Describe all options and toggles enabled or changed,
  • Include any files or environmental changes,
  • Identify the expected and actual result at each stage,
  • Ensure any reasonably skilled user can follow and hit the same issue.

Expected Behavior

When making requests against /api/chat/completions, headers from the upstream APIs (OpenAI, Anthropic, LiteLLM) should be forwarded.

Actual Behavior

Headers are currently only passed through from the upstream for streaming requests, but not for non-streaming requests, because of this bit of code: https://github.com/open-webui/open-webui/blob/de018f091260ae757a61e9e4d8691be6442e3ea6/backend/open_webui/routers/openai.py#L849

The else branch only returns the actual payload of the response, not the headers.
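To illustrate what forwarding would involve, here is a minimal, self-contained sketch of copying upstream headers onto a proxy's non-streaming reply. This is not the Open WebUI code; the hop-by-hop filter set follows RFC 9110, but which additional headers to strip (e.g. lengths recomputed by the server) is an assumption.

```python
# Sketch: forwarding upstream response headers on a non-streaming proxy reply.
# Hop-by-hop headers must not be re-emitted (RFC 9110); Content-Length and
# Content-Encoding are also dropped here because the proxy re-serializes
# the body -- that extra filtering is an assumption, not part of any spec.

HOP_BY_HOP = {
    "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
    "te", "trailers", "transfer-encoding", "upgrade",
    "content-length", "content-encoding",  # recomputed when re-serializing
}

def forwardable_headers(upstream_headers: dict) -> dict:
    """Return the subset of upstream headers safe to copy onto our response."""
    return {
        name: value
        for name, value in upstream_headers.items()
        if name.lower() not in HOP_BY_HOP
    }

# Example: an upstream reply carrying rate-limit metadata.
upstream = {
    "Content-Type": "application/json",
    "Content-Length": "469",
    "Connection": "keep-alive",
    "x-ratelimit-remaining-requests": "99",
}
print(forwardable_headers(upstream))
# {'Content-Type': 'application/json', 'x-ratelimit-remaining-requests': '99'}
```

In a FastAPI route, the filtered dict could then be passed as the `headers=` argument when constructing the JSON response.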

Steps to Reproduce

This can be confirmed with the following two curl commands:

curl -v --location '<your_owui_url>/api/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer YOUR_API_KEY' \
--data '{
  "model": <model_id>,
  "stream": true,
  "messages": [
    {
      "role": "user",
      "content": "what llm are you"
    }
  ]
}'

versus

curl -v --location '<your_owui_url>/api/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer YOUR_API_KEY' \
--data '{
  "model": <model_id>,
  "messages": [
    {
      "role": "user",
      "content": "what llm are you"
    }
  ]
}'

The first call returns all headers from the upstream API, the second does not.

Logs & Screenshots

No relevant logs

Additional Information

No response

GiteaMirror added the bug label 2025-11-11 16:28:43 -06:00
Author
Owner

@jackthgu commented on GitHub (Jul 2, 2025):

Hello, @Simon-Stone

Thank you for your thoughtful check.

I have verified everything, and confirmed that when stream is set to false, the endpoint does not return information such as tokens.

While these details could certainly be useful internally, I'm wondering in what situations they would actually be needed at the endpoint itself.

If you know of any good use cases, we’d love to hear them.

Thank you!

Author
Owner

@rgaricano commented on GitHub (Jul 2, 2025):

I think this is intentional behavior,
for an SSE endpoint (Server-Sent Events, continuous data streams), the headers have to be checked for status in each streamed message;
for non-streaming requests, the headers are on the response.

Author
Owner

@Simon-Stone commented on GitHub (Jul 2, 2025):

There are all sorts of scenarios where having access to the response headers might be useful.

Two examples that come to mind would be rate limits, which are communicated in the headers by Anthropic and OpenAI (and I'm sure others, too) and response cost, which is included as a header by LiteLLM.
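The two use cases above can be sketched as a small client-side helper. The header names follow the patterns OpenAI/Anthropic (x-ratelimit-*) and LiteLLM (x-litellm-response-cost) document, but treat the exact names as assumptions and check the providers' docs.

```python
# Sketch: what a client could do if upstream headers were forwarded.
# Header names are illustrative examples of the providers' documented
# patterns, not guaranteed to match every deployment.

def parse_usage_headers(headers: dict) -> dict:
    """Pull rate-limit and cost metadata out of response headers, if present."""
    lowered = {k.lower(): v for k, v in headers.items()}
    info = {}
    if "x-ratelimit-remaining-requests" in lowered:
        info["remaining_requests"] = int(lowered["x-ratelimit-remaining-requests"])
    if "x-litellm-response-cost" in lowered:
        info["cost_usd"] = float(lowered["x-litellm-response-cost"])
    return info

print(parse_usage_headers({
    "X-RateLimit-Remaining-Requests": "4999",
    "x-litellm-response-cost": "0.00042",
}))
# {'remaining_requests': 4999, 'cost_usd': 0.00042}
```

Without the headers forwarded, a client behind Open WebUI has no way to recover this metadata from a non-streaming call.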

Author
Owner

@rgaricano commented on GitHub (Jul 2, 2025):

But for non-streaming requests, the function returns r.json(), which I suppose includes the headers, doesn't it?
https://github.com/open-webui/open-webui/blob/de018f091260ae757a61e9e4d8691be6442e3ea6/backend/open_webui/routers/openai.py#L862

Author
Owner

@Simon-Stone commented on GitHub (Jul 2, 2025):

No, that only returns the response body decoded as JSON. If the headers were included, we would see them in the second, non-streaming curl command.

Author
Owner

@rgaricano commented on GitHub (Jul 2, 2025):

sure??

ricardo@ricardo-PC:/mnt/IAI/open-webui$ python3
Python 3.12.3 (main, Jun 18 2025, 17:59:45) [GCC 13.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import requests
>>> import json
>>> from pprint import pprint
>>> 
>>> url = 'https://httpbin.org/post'
>>> data = {'user':'me@example.com'}
>>> 
>>> # as payload
>>> response = requests.post(url, data=json.dumps(data))
>>> 
>>> result = response.json()
>>> pprint(result)
{'args': {},
 'data': '{"user": "me@example.com"}',
 'files': {},
 'form': {},
 'headers': {'Accept': '*/*',
             'Accept-Encoding': 'gzip, deflate, br, zstd',
             'Content-Length': '26',
             'Host': 'httpbin.org',
             'User-Agent': 'python-requests/2.32.4',
             'X-Amzn-Trace-Id': 'Root=1-6865101e-16425f772afe997d2e880bf1'},
 'json': {'user': 'me@example.com'},
 'origin': '79.116.70.243',
 'url': 'https://httpbin.org/post'}
>>> 
Author
Owner

@Simon-Stone commented on GitHub (Jul 2, 2025):

I believe those are the request headers. Take a look at response.headers in your example. Those are the headers returned by the server and those are not included in the JSON.

Author
Owner

@rgaricano commented on GitHub (Jul 2, 2025):

This is result, the response body (a JSON object):

{'args': {},
 'data': '{"user": "me@example.com"}',
 'files': {},
 'form': {},
 'headers': {'Accept': '*/*',
             'Accept-Encoding': 'gzip, deflate, br, zstd',
             'Content-Length': '26',
             'Host': 'httpbin.org',
             'User-Agent': 'python-requests/2.32.4',
             'X-Amzn-Trace-Id': 'Root=1-6865101e-16425f772afe997d2e880bf1'},
 'json': {'user': 'me@example.com'},
 'origin': '79.116.70.243',
 'url': 'https://httpbin.org/post'}

and this is result['json'] (another JSON object: the JSON-formatted request content):

{'user': 'me@example.com'}
Author
Owner

@Simon-Stone commented on GitHub (Jul 2, 2025):

Not sure what you mean. Take a look at this:

import requests
import json
from pprint import pprint

url = 'https://httpbin.org/post'
data = {'user':'me@example.com'}

# as payload
response = requests.post(url, data=json.dumps(data))

result = response.json()
pprint(result)
print("\n\n---\n\n")
pprint(response.headers)

Outputs:

{'args': {},
 'data': '{"user": "me@example.com"}',
 'files': {},
 'form': {},
 'headers': {'Accept': '*/*',
             'Accept-Encoding': 'gzip, deflate, br, zstd',
             'Content-Length': '26',
             'Host': 'httpbin.org',
             'User-Agent': 'python-requests/2.32.4',
             'X-Amzn-Trace-Id': 'Root=1-68654538-58b14609178b2d8424257389'},
 'json': {'user': 'me@example.com'},
 'origin': '35.245.33.20',
 'url': 'https://httpbin.org/post'}


---


{'Date': 'Wed, 02 Jul 2025 14:42:02 GMT', 'Content-Type': 'application/json', 'Content-Length': '469', 'Connection': 'keep-alive', 'Server': 'gunicorn/19.9.0', 'Access-Control-Allow-Origin': '*', 'Access-Control-Allow-Credentials': 'true'}

The second part is the response headers, the first part is the request headers (as seen by the server, which may be affected by load balancers and such).

Author
Owner

@rgaricano commented on GitHub (Jul 2, 2025):

yes, you are right:

>>> import requests
>>> import json
>>> from pprint import pprint
>>> 
>>> url = 'https://httpbin.org/post'
>>> data = {'user':'me@example.com'}
>>> headers = { 'Content-Length': '58',
...              'Host': 'httpbin.org.net',
...              'Date': date.strftime("%c"),
...              'Access-Control-Allow-Credentials': 'false'}
>>> 
>>> # as payload
>>> response = requests.post(url, data=json.dumps(data), headers=headers)
>>> 
>>> result = response.json()
>>> pprint(result)
{'args': {},
 'data': '{"user": "me@example.com"}',
 'files': {},
 'form': {},
 'headers': {'Accept': '*/*',
             'Accept-Encoding': 'gzip, deflate, br, zstd',
             'Access-Control-Allow-Credentials': 'false',
             'Content-Length': '26',
             'Date': 'Wed Jul  2 17:46:07 2025',
             'Host': 'httpbin.org.net',
             'User-Agent': 'python-requests/2.32.4',
             'X-Amzn-Trace-Id': 'Root=1-686554bc-0ffb58ca54d80c1610688a21'},
 'json': {'user': 'me@example.com'},
 'origin': '79.116.70.243',
 'url': 'https://httpbin.org.net/post'}
>>> print("\n---\n")

---

>>> pprint(response.headers)
{'Date': 'Wed, 02 Jul 2025 15:48:12 GMT', 'Content-Type': 'application/json', 'Content-Length': '569', 'Connection': 'keep-alive', 'Server': 'gunicorn/19.9.0', 'Access-Control-Allow-Origin': '*', 'Access-Control-Allow-Credentials': 'true'}
>>> 
Author
Owner

@rgaricano commented on GitHub (Jul 2, 2025):

Then Simon, what do you propose?

Return just r (the session.request result) and modify the conversion?

I was looking at these; I didn't do a deep search, just curious:

(for reference)
session & requests:
https://github.com/open-webui/open-webui/blob/de018f091260ae757a61e9e4d8691be6442e3ea6/backend/open_webui/routers/openai.py#L837-L887
https://github.com/open-webui/open-webui/blob/59ba21bdf8eb791a412db869a13ff76c6135b651/backend/open_webui/utils/chat.py#L262-L284

response conversion:
(add an iterator to convert_response_ollama_to_openai similar to the one in convert_streaming_response_ollama_to_openai below, and reassign the object content?)

https://github.com/open-webui/open-webui/blob/59ba21bdf8eb791a412db869a13ff76c6135b651/backend/open_webui/utils/response.py#L103-L130

request object & methods:
https://www.w3schools.com/python/module_requests.asp
https://fastapi.tiangolo.com/reference/parameters/
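The proposal above (return the raw response and adapt the conversion) can be sketched as a wrapper that converts the body while carrying the upstream headers alongside it, so the router can reattach them. convert_body below is a placeholder standing in for a helper like convert_response_ollama_to_openai, not the real function.

```python
# Sketch: convert the payload but keep the headers for the outgoing response.
# ProxiedResponse and convert_body are hypothetical names for illustration.

from dataclasses import dataclass

@dataclass
class ProxiedResponse:
    body: dict
    headers: dict

def convert_body(body: dict) -> dict:
    # Placeholder conversion: real code would reshape an Ollama reply
    # into OpenAI's chat-completion format.
    return {"choices": [{"message": {"content": body.get("response", "")}}]}

def proxy_non_streaming(upstream_body: dict, upstream_headers: dict) -> ProxiedResponse:
    """Return both the converted payload and the upstream headers."""
    return ProxiedResponse(
        body=convert_body(upstream_body),
        headers=dict(upstream_headers),
    )

result = proxy_non_streaming(
    {"response": "hello"},
    {"x-litellm-response-cost": "0.0001"},
)
print(result.body["choices"][0]["message"]["content"])   # hello
print(result.headers["x-litellm-response-cost"])         # 0.0001
```

The router would then build its JSON response from result.body and pass result.headers through, mirroring what the streaming branch already does.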

Author
Owner

@Simon-Stone commented on GitHub (Jul 2, 2025):

I've opened a PR regarding this: https://github.com/open-webui/open-webui/pull/15412. Feedback much appreciated!

Author
Owner

@tjbck commented on GitHub (Sep 6, 2025):

Won't be added; this would require a major refactor of all the openai, ollama, and function routers and would break too many existing Functions.

Reference: github-starred/open-webui#5674