[GH-ISSUE #9155] bug/feature/performance: HTTP connections to Ollama HTTP endpoint are not re-used (performance overhead / penalty) #15404

Closed
opened 2026-04-19 21:37:15 -05:00 by GiteaMirror · 0 comments
Owner

Originally created by @Kami on GitHub (Jan 31, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/9155

Feature Request / Improvement

Description

I deployed Open WebUI behind a reverse proxy (nginx). The same goes for Ollama (it is also deployed behind an nginx reverse proxy which adds authentication and TLS).

I noticed that connections from Open WebUI -> the Ollama HTTP API endpoint don't seem to be re-used (i.e. persistent TCP connections with keep-alive).

I originally thought it might be my proxy configuration or Ollama, but I tested it with my other proxied services and against Ollama directly, and keep-alive works fine:

curl -k -I --http1.1 --keepalive-time 60 https://ollama.local:8080 -H "Authorization: Bearer foobar"
HTTP/1.1 200 OK
Server: nginx/1.27.3
Date: Fri, 31 Jan 2025 09:12:26 GMT
Content-Type: text/plain; charset=utf-8
Content-Length: 17
Connection: keep-alive

See the `Connection: keep-alive` header returned by the reverse proxy, which indicates that long-lived connections are supported and working.

Here are logs from the Open WebUI reverse proxy where you can see that connections are not re-used (a different connection id means a new connection is established for each outgoing request to Ollama):

nginx-1  | 172.20.0.2 - GET /api/tags HTTP/1.1 upstream_response_time=0.009 upstream_status=200 connection_id=21 
nginx-1  | 172.20.0.2 - GET /api/version HTTP/1.1 upstream_response_time=0.002 upstream_status=200 connection_id=23 
nginx-1  | 172.20.0.2 - GET /api/tags HTTP/1.1 upstream_response_time=0.007 upstream_status=200 connection_id=24 
nginx-1  | 172.20.0.2 - GET /api/version HTTP/1.1 upstream_response_time=0.001 upstream_status=200 connection_id=25 
nginx-1  | 172.20.0.2 - GET /api/tags HTTP/1.1 upstream_response_time=0.006 upstream_status=200 connection_id=26 
nginx-1  | 172.20.0.2 - GET /api/version HTTP/1.1 upstream_response_time=0.001 upstream_status=200 connection_id=27 

After that, I started digging into the code and I noticed this pattern:

* https://github.com/open-webui/open-webui/blob/b72150c881955721a63ae7f4ea1b9ea293816fc1/backend/open_webui/routers/ollama.py#L105
* https://github.com/open-webui/open-webui/blob/b72150c881955721a63ae7f4ea1b9ea293816fc1/backend/open_webui/routers/ollama.py#L190
* etc.

In short, the code always creates a new `aiohttp.ClientSession`. So even though `aiohttp` enables and supports keep-alive by default, it won't work, because the code always obtains a new isolated session object.
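To illustrate the pattern being described (a simplified sketch with illustrative names, not the actual Open WebUI code): a brand-new `ClientSession` per call means a brand-new connection pool, so the TCP/TLS connection is torn down when the session closes and keep-alive never carries over to the next request.

```python
import aiohttp

# Sketch of the anti-pattern: every call opens its own session (and pool),
# so no connection survives to be re-used by the next request.
async def get_tags(base_url: str) -> dict:
    async with aiohttp.ClientSession() as session:  # new pool on every call
        async with session.get(f"{base_url}/api/tags") as response:
            return await response.json()
```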

I believe that, to make this work correctly, we would need to re-use the same `ClientSession` for all outgoing requests. So something like:

import aiohttp

AIOHTTP_SESSION = aiohttp.ClientSession()

async with AIOHTTP_SESSION.get("https://....") as response:
    ...
async with AIOHTTP_SESSION.get("https://....") as response:
    ...
This is similar to `requests.Session` - if you want to re-use the underlying TCP connections (to avoid the overhead of the TCP + TLS handshake, etc.), you need to use the same `requests.Session` instance for all outgoing requests.
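For comparison, the equivalent re-use pattern in `requests` looks like this (a sketch; the host and bearer token are taken from the curl example above):

```python
import requests

# One Session holds a urllib3 connection pool, so repeated calls to the same
# host can re-use a single TCP (and TLS) connection instead of opening a new
# one per request.
session = requests.Session()
session.headers.update({"Authorization": "Bearer foobar"})

# Each of these calls would go through the same pooled connection:
# session.get("https://ollama.local:8080/api/tags")
# session.get("https://ollama.local:8080/api/version")
```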

Proposed Improvement

I propose refactoring the code to re-use the same (global?) `aiohttp.ClientSession` instance for all outgoing requests to Ollama. This should result in lower latency, better overall performance, and a better end-user experience.
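A minimal sketch of that proposal, with hypothetical names (a real implementation would also need to close the session on application shutdown, e.g. in the FastAPI lifespan handler):

```python
from typing import Optional

import aiohttp

_shared_session: Optional[aiohttp.ClientSession] = None

async def get_shared_session() -> aiohttp.ClientSession:
    """Return one process-wide ClientSession, creating it lazily.

    Because every caller shares the same session (and therefore the same
    connection pool), idle TCP connections to Ollama survive between
    requests and get re-used.
    """
    global _shared_session
    if _shared_session is None or _shared_session.closed:
        _shared_session = aiohttp.ClientSession()
    return _shared_session
```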

Additionally, it may be good to expose some of the underlying `aiohttp` connector keep-alive options via environment variables (e.g. `keepalive_timeout`, `limit`, `limit_per_host`).
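For example, the shared session could be built from an `aiohttp.TCPConnector` configured via the environment. The environment variable names below are hypothetical, just to illustrate the idea; the connector parameters themselves are real `aiohttp` options.

```python
import os

import aiohttp

# Hypothetical environment variable names (not existing Open WebUI settings).
KEEPALIVE_TIMEOUT = float(os.environ.get("AIOHTTP_KEEPALIVE_TIMEOUT", "15"))
POOL_LIMIT = int(os.environ.get("AIOHTTP_POOL_LIMIT", "100"))
POOL_LIMIT_PER_HOST = int(os.environ.get("AIOHTTP_POOL_LIMIT_PER_HOST", "10"))

def make_session() -> aiohttp.ClientSession:
    # The TCPConnector owns the keep-alive connection pool.
    connector = aiohttp.TCPConnector(
        keepalive_timeout=KEEPALIVE_TIMEOUT,  # seconds an idle socket is kept
        limit=POOL_LIMIT,                     # total simultaneous connections
        limit_per_host=POOL_LIMIT_PER_HOST,   # per (host, port, ssl) endpoint
    )
    return aiohttp.ClientSession(connector=connector)
```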

P.S. I imagine a similar problem may exist with the other external model connectors (e.g. OpenAI), but I didn't dig in.

Links, References

* aiohttp docs - https://docs.aiohttp.org/en/stable/client_reference.html
* Similar concept in requests (which is not used here, but just as a reference) - https://requests.readthedocs.io/en/latest/user/advanced/#session-objects

Reference: github-starred/open-webui#15404