[GH-ISSUE #24190] issue: Azure OpenAI - Open WebUI: Server Connection Error #58892

Closed
opened 2026-05-06 00:21:50 -05:00 by GiteaMirror · 9 comments
Owner

Originally created by @burkhat on GitHub (Apr 28, 2026).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/24190

Check Existing Issues

  • I have searched for any existing and/or related issues.
  • I have searched for any existing and/or related discussions.
  • I have also searched in the CLOSED issues AND CLOSED discussions and found no related items (your issue might already be addressed on the development branch!).
  • I am using the latest version of Open WebUI.

Installation Method

Other

Open WebUI Version

v0.9.2

Ollama Version (if applicable)

Operating System

Openshift 4.20.x

Browser (if applicable)

Chrome Version 146.0.7680.154

Confirmation

  • I have read and followed all instructions in README.md.
  • I am using the latest version of both Open WebUI and Ollama.
  • I have included the browser console logs.
  • I have included the Docker container logs.
  • I have provided every relevant configuration, setting, and environment variable used in my setup.
  • I have clearly listed every relevant configuration, custom setting, environment variable, and command-line option that influences my setup (such as Docker Compose overrides, .env values, browser settings, authentication configurations, etc).
  • I have documented step-by-step reproduction instructions that are precise, sequential, and leave nothing to interpretation. My steps:
  • Start with the initial platform/version/OS and dependencies used,
  • Specify exact install/launch/configure commands,
  • List URLs visited, user input (incl. example values/emails/passwords if needed),
  • Describe all options and toggles enabled or changed,
  • Include any files or environmental changes,
  • Identify the expected and actual result at each stage,
  • Ensure any reasonably skilled user can follow and hit the same issue.

Expected Behavior

We want to use the gpt-5.4 model from Azure OpenAI with Private Endpoints and Reasoning Effort set to xhigh, and receive a reply from the model after roughly 8 minutes.

Actual Behavior

At the moment we get a Server Connection Error after roughly 4 minutes.

We receive a TCP RESET from the Azure resource. Is it possible to implement keep-alive messages for OpenAI?

Steps to Reproduce

  1. Install the newest Open WebUI with Helm in OpenShift.
  2. Set AIOHTTP_CLIENT_TIMEOUT=1800 for Open WebUI.
  3. Create an Azure OpenAI resource with a Private Endpoint in Sweden Central.
  4. Deploy the gpt-5.4 model to the Azure OpenAI resource.
  5. Add the Azure OpenAI endpoint and model to Open WebUI.
  6. Create a new chat.
  7. Set Reasoning Effort to xhigh.
  8. Upload, for example, two large C++ files.
  9. Use the following prompt: "This is a C++ class of my mock that creates, beside other things, spectrum data. What could be improved to make the “realistic” spectrum even more realistic? Think hard."
  10. After roughly 4 minutes the error occurs.
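Step 2 relies on the backend actually picking up the environment variable. A minimal sketch of how such a variable might be read, with a fallback (the helper name and default are hypothetical; the real Open WebUI parsing may differ):

```python
import os

def get_client_timeout(default: int = 300) -> int:
    """Read AIOHTTP_CLIENT_TIMEOUT from the environment, falling back
    to `default` when the variable is unset or not an integer."""
    raw = os.environ.get("AIOHTTP_CLIENT_TIMEOUT", "")
    try:
        return int(raw)
    except ValueError:
        return default

os.environ["AIOHTTP_CLIENT_TIMEOUT"] = "1800"
print(get_client_timeout())  # 1800, matching the value set in step 2
```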

Logs & Screenshots

(Screenshot of the error attached.) [OpenWebUI-Log.txt](https://github.com/user-attachments/files/27151680/OpenWebUI-Log.txt)

Additional Information

We have created a network dump and can see that the RESET comes from the Azure OpenAI resource.
It looks like the idle timeout for VNETs within Azure is set to 4 minutes; see https://www.reddit.com/r/AZURE/comments/1mzn1hg/tcp_idle_timeout_in_azure_vnets/?tl=de&rdt=44769
In the dump we can see that Open WebUI does not send keep-alive messages.

I've created a small Python script with the newest OpenAI pip module, and there we can see that a keep-alive message is sent every 30 seconds.

At the moment it is only possible to configure keep-alive messages for Ollama, not for OpenAI, within Open WebUI.
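One way to keep an idle connection from being reset by an intermediary is OS-level TCP keepalive, sketched below. This is not wired into Open WebUI; the constants `TCP_KEEPIDLE`/`TCP_KEEPINTVL` are Linux names, and whether aiohttp exposes this per connection is a separate question:

```python
import socket

def enable_tcp_keepalive(sock: socket.socket, idle: int = 30) -> None:
    """Turn on OS-level TCP keepalive probes for a socket so an
    otherwise-idle connection still carries traffic."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    if hasattr(socket, "TCP_KEEPIDLE"):   # seconds of idle before the first probe
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):  # seconds between subsequent probes
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, idle)

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_tcp_keepalive(s)
print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE))  # 1
```

Note that TCP keepalive probes are empty segments below the HTTP layer; whether Azure's VNET idle timer counts them as activity would need to be verified against the dump.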

GiteaMirror added the bug label 2026-05-06 00:21:50 -05:00
Author
Owner

@PHclaw commented on GitHub (Apr 28, 2026):

Empty title issue — if title is null or empty, the frontend should fall back to the first ~50 chars of the content. Check the React component rendering the title field and add:

const displayTitle = title || content?.substring(0, 50) || 'Untitled';

Also add a DB-level NOT NULL constraint with a default of the first content chars to prevent nulls at the storage layer.
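The same fallback idea, expressed as a small Python helper purely for illustration (the function name and cutoff are hypothetical, not Open WebUI code):

```python
def display_title(title, content, max_len=50):
    """Fall back to a content snippet, then a placeholder, when title is empty."""
    return title or (content or "")[:max_len] or "Untitled"

print(display_title(None, "Azure OpenAI - Open WebUI: Server Connection Error"))
```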

Author
Owner

@PHclaw commented on GitHub (Apr 28, 2026):

Azure OpenAI Server Connection Error - this is likely a model name mismatch. Azure OpenAI uses deployment names, not model names. The format should be:

base_url = 'https://YOUR_RESOURCE.openai.azure.com'
model = 'YOUR_DEPLOYMENT_NAME'  # NOT 'gpt-4o'

Also check that your Azure OpenAI resource has the correct API version and that CORS is enabled for the webui origin. The error 'Server Connection Error' typically means the /v1/models endpoint is returning a non-200 status.
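For reference, Azure OpenAI's REST route embeds the deployment name in the path rather than the model name in the body. A helper that builds the documented URL shape (resource/deployment names below are placeholders):

```python
def azure_chat_url(resource: str, deployment: str, api_version: str) -> str:
    """Build the Azure OpenAI chat-completions URL; note that the
    deployment name (not the model name) appears in the path."""
    return (
        f"https://{resource}.openai.azure.com/openai/deployments/"
        f"{deployment}/chat/completions?api-version={api_version}"
    )

print(azure_chat_url("myres", "my-gpt-deployment", "2024-06-01"))
```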

Author
Owner

@burkhat commented on GitHub (Apr 28, 2026):

@PHclaw Every other chat with the same model works. The problem only occurs if we change the Reasoning Effort to "xhigh" and add some files to the prompt.
If we use "xhigh" with just the prompt "Hello how are you?" we get a response.
We can see in the dump that Azure sends an RST after 4 minutes.
My Python script works with the same base URL and model as in Open WebUI.

Author
Owner

@PHclaw commented on GitHub (Apr 28, 2026):

Interesting! If it works fine with the same model normally but fails when switching 'Reasoning effort' to high, that points to the reasoning model (o1/o3/o4) being invoked differently.

Reasoning effort settings in Azure OpenAI map to different model deployments:

  • 'low' = gpt-4o (standard)
  • 'high' = o1-preview or o3-mini (separate endpoint)

Check your Azure OpenAI studio for which model deployment is used for reasoning. The 'high' setting likely hits a different deployment that either:

  1. Has a different API version requirement (2024-06-01 vs 2024-08-01-preview)
  2. Uses a different authentication method
  3. Has stricter rate limits

Look at the Azure portal -> Your resource -> Model deployments. If 'o1-preview' or 'o3' is in a separate deployment, make sure that deployment name is in your Open WebUI config.

Author
Owner

@PHclaw commented on GitHub (Apr 28, 2026):

Following up: if the error only happens when 'Reasoning effort' is set to high, the issue is that high reasoning effort routes to a different model (o1-preview/o3-mini) in Azure OpenAI. These models require different API parameters:

  1. No system messages (o1-series ignores them)
  2. max_completion_tokens instead of max_tokens
  3. No temperature parameter (always 1 for o1)

Check your Open WebUI backend code for the model routing logic. When reasoning_effort=high, it likely sends o1-compatible parameters to gpt-4o, causing the 400 error. The fix is to conditionally apply API parameters based on model family:

if model.startswith('o1') or model.startswith('o3'):
    params = {k: v for k, v in params.items() if k not in ['temperature', 'system']}
    params['max_completion_tokens'] = params.pop('max_tokens', 8192)
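That filtering logic as a self-contained, testable helper (a sketch only; the actual Open WebUI routing code may differ, and the 8192 default is this comment's assumption):

```python
def adapt_params_for_reasoning(model: str, params: dict) -> dict:
    """Translate standard chat parameters for o1/o3-style models:
    drop `temperature`, rename `max_tokens` -> `max_completion_tokens`."""
    out = dict(params)
    if model.startswith(("o1", "o3")):
        out.pop("temperature", None)
        out["max_completion_tokens"] = out.pop("max_tokens", 8192)
    return out

print(adapt_params_for_reasoning("o1-preview", {"temperature": 0.7, "max_tokens": 1024}))
# {'max_completion_tokens': 1024}
```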
Author
Owner

@PHclaw commented on GitHub (Apr 28, 2026):

For the DuckDuckGo search AttributeError:

This is likely a version incompatibility. duckduckgo-search updated its API and removed the ddg() function; the current entry point is the DDGS class. The fix:

# Old (broken)
from duckduckgo_search import ddg
results = ddg(query, max_results=5)

# New (fixed) - use the DDGS class
from duckduckgo_search import DDGS
results = DDGS().text(query, max_results=5)

Or pin the working version:

pip install duckduckgo-search==6.3.3

Check the current installed version and compare with what the code expects.
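Comparing the installed version against a known-good one can be done with a small helper; note that naive string comparison gets multi-digit components wrong, so parse into tuples (a generic sketch, not duckduckgo-search-specific):

```python
def version_tuple(v: str) -> tuple:
    """Parse a dotted version string like '6.3.3' into a comparable tuple,
    so that e.g. 10.0.0 correctly sorts above 9.9.9."""
    return tuple(int(part) for part in v.split("."))

print(version_tuple("6.3.3"))  # (6, 3, 3)
```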

Author
Owner

@burkhat commented on GitHub (Apr 28, 2026):

@PHclaw The problem has nothing to do with the model or with tokens, but with how the prompt is queried.
When I send the same prompt via a Python script to the same endpoint it works without any issues; the error only occurs with Open WebUI.

In the network dump you can see that Open WebUI does not send any keep‑alive messages, and after about four minutes Azure OpenAI sends an RST, which then triggers the error.

The Python script sends a request every 30 seconds, so no RST is emitted by OpenAI.

In my opinion, Open WebUI should be modified to send keep‑alive messages in order to work around the idle‑timeout.

[oai_prompt.py](https://github.com/user-attachments/files/27161510/oai_prompt.py)
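The "send something every 30 seconds while waiting" pattern described above can be sketched as a generic asyncio wrapper (names and structure are illustrative, not the attached script or Open WebUI code; the demo uses toy intervals):

```python
import asyncio

async def run_with_keepalive(work, ping, interval: float = 30.0):
    """Await `work` while invoking `ping()` every `interval` seconds,
    so the connection keeps seeing traffic until the reply arrives."""
    async def pinger():
        while True:
            await asyncio.sleep(interval)
            await ping()
    task = asyncio.create_task(pinger())
    try:
        return await work
    finally:
        task.cancel()

# demo: a "slow model call" plus frequent pings recorded in a list
pings = []
async def main():
    async def slow_reply():
        await asyncio.sleep(0.05)
        return "done"
    async def ping():
        pings.append(1)
    return await run_with_keepalive(slow_reply(), ping, interval=0.01)

print(asyncio.run(main()), len(pings) >= 1)
```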

Author
Owner

@Classic298 commented on GitHub (May 1, 2026):

@PHclaw stop your clanker and stop it commenting with totally random comments that have NOTHING to do with this here. SPAM

Author
Owner

@Classic298 commented on GitHub (May 1, 2026):

@burkhat nginx settings? Reverse proxy? Timeouts there? How did you add the model? Responses API?

Cannot reproduce this; more details needed, please.


Reference: github-starred/open-webui#58892