Mirror of https://github.com/open-webui/open-webui.git (synced 2026-05-06 19:08:59 -05:00)
[GH-ISSUE #9155] bug/feature/performance: HTTP connections to Ollama HTTP endpoint are not re-used (performance overhead / penalty) #15404
Originally created by @Kami on GitHub (Jan 31, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/9155
Feature Request / Improvement
Description
I deployed Open WebUI behind a reverse proxy (nginx). The same goes for Ollama (it is also deployed behind an nginx reverse proxy, which adds authentication and TLS).

I noticed that connections from Open WebUI to the Ollama HTTP API endpoint don't seem to be re-used (i.e. no persistent TCP connections with keep-alive).

I originally thought it might be my proxy configuration or Ollama, but I tested it with my other proxied services and with Ollama directly, and it works fine:
See the `Connection: keep-alive` header returned by the reverse proxy, which indicates long-lived connections are supported and working.

Here are logs from the Open WebUI reverse proxy where you can see that connections are not re-used (a different connection id, i.e. a new connection is established for each outgoing request to Ollama):
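The behaviour being checked here can be demonstrated with a short, self-contained stdlib script (my own illustration, not Open WebUI code): with HTTP/1.1 keep-alive, one TCP connection serves several requests, so the server sees the same client port every time. A new connection per request would show a different client port each time, which is exactly what the proxy logs above indicate.

```python
# Stdlib-only sketch: a keep-alive connection re-used across requests.
import http.client
import http.server
import threading

client_ports = []  # remote (client) port the server observes per request

class Handler(http.server.BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # HTTP/1.1 => keep-alive by default

    def do_GET(self):
        client_ports.append(self.client_address[1])
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence per-request logging
        pass

server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# One http.client.HTTPConnection == one persistent TCP connection.
conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
for _ in range(3):
    conn.request("GET", "/")
    conn.getresponse().read()  # drain the body so the socket can be re-used
conn.close()
server.shutdown()

print(len(client_ports), len(set(client_ports)))  # 3 requests, 1 connection
```

Three requests arrive from a single client port, i.e. over one re-used TCP connection.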
After that, I started digging into the code and I noticed this pattern:
b72150c881/backend/open_webui/routers/ollama.py (L105)
b72150c881/backend/open_webui/routers/ollama.py (L190)

In short, the code always creates a new `aiohttp.ClientSession`. So even though `aiohttp` enables and supports keep-alive by default, it won't work, because the code always obtains a new, isolated session object.

I believe that to make it work correctly, we would need to re-use the same `ClientSession` for all outgoing requests.

This is similar to `requests.Session` - if you want to re-use the underlying TCP connections (to avoid the overhead of the TCP + TLS handshake, etc.), you need to use the same `requests.Session` object for all outgoing requests.

Proposed Improvement
I propose refactoring the code to re-use the same (global?) `aiohttp.ClientSession` instance for all outgoing requests to Ollama. This should result in lower latency and better overall performance and end-user experience.

Additionally, it may be good to expose some of the underlying `aiohttp` connector keep-alive options (e.g. `keepalive_timeout`, `limit`, `limit_per_host`) via environment variables.

P.S. I imagine a similar problem may exist with other external model connectors (e.g. OpenAI), but I didn't dig in.
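The proposal above could look roughly like this (a hedged sketch: `get_ollama_session`, `ollama_get`, and the connector numbers are my own placeholders, not the actual open-webui code) - one lazily-created, process-wide `aiohttp.ClientSession`, so `aiohttp`'s connection pool can keep sockets open between requests:

```python
# Sketch: a single shared aiohttp.ClientSession for all Ollama requests.
from typing import Optional

import aiohttp

_session: Optional[aiohttp.ClientSession] = None

async def get_ollama_session() -> aiohttp.ClientSession:
    """Return the shared session, (re)creating it if missing or closed."""
    global _session
    if _session is None or _session.closed:
        _session = aiohttp.ClientSession(
            connector=aiohttp.TCPConnector(
                limit=100,             # max pooled connections overall
                limit_per_host=10,     # max pooled connections per host
                keepalive_timeout=30,  # seconds an idle socket stays open
            ),
            trust_env=True,  # honour HTTP(S)_PROXY environment variables
        )
    return _session

async def ollama_get(url: str) -> bytes:
    """Example call site: the connection returns to the pool afterwards."""
    session = await get_ollama_session()
    async with session.get(url) as resp:
        return await resp.read()
```

Every call site would then go through `get_ollama_session()` instead of constructing its own `ClientSession`, and the session would be closed once on application shutdown.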
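For the environment-variable part, a small mapper could translate optional variables into `aiohttp.TCPConnector` keyword arguments. The variable names below are hypothetical (they do not exist in Open WebUI today):

```python
# Hypothetical env var names mapped onto aiohttp.TCPConnector kwargs.
import os

_ENV_TO_KWARG = {
    "OLLAMA_CLIENT_KEEPALIVE_TIMEOUT": "keepalive_timeout",
    "OLLAMA_CLIENT_LIMIT": "limit",
    "OLLAMA_CLIENT_LIMIT_PER_HOST": "limit_per_host",
}

def connector_kwargs(env=None):
    """Build TCPConnector kwargs from whichever of the env vars are set."""
    env = os.environ if env is None else env
    return {
        kwarg: int(env[name])
        for name, kwarg in _ENV_TO_KWARG.items()
        if name in env
    }
```

The session factory would then use `aiohttp.TCPConnector(**connector_kwargs())`, so unset variables simply fall back to aiohttp's defaults.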