BUG No completion from Ollama-served model proxied behind LiteLLM #751

Closed
opened 2025-11-11 14:30:28 -06:00 by GiteaMirror · 2 comments

Originally created by @corticalstack on GitHub (Apr 28, 2024).

Bug Report

Description

Configured a proxied LiteLLM model in open-webui. Cannot get a response completion from the model; the UI just shows a blinking awaiting-response indicator.

Note I do get a completion from the same model exposed via LiteLLM when calling LiteLLM directly, with the example curl below:

curl --location 'http://192.168.1.12:4000/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer sk-1234' \
--data ' {
      "model": "ollama/llama3",
      "messages": [
        {
          "role": "user",
          "content": "What is entropy theory"
        }
      ]
    }
'
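One easy-to-miss cause of a failed completion is a malformed request body: trailing commas, for example, are not valid in strict JSON even though curl will happily send them. A quick way to validate a payload before POSTing it (a minimal sketch using `python3` from the shell; the payload mirrors the request above) is:

```shell
# Validate a chat-completions payload as strict JSON before sending it.
# Hypothetical payload matching the request above; python3 assumed available.
payload='{
  "model": "ollama/llama3",
  "messages": [
    {"role": "user", "content": "What is entropy theory"}
  ]
}'
echo "$payload" | python3 -m json.tool > /dev/null && echo "payload OK"
```

If the body is invalid, `json.tool` prints the parse error and the check fails instead of echoing `payload OK`.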

When configuring the ollama/llama3 model in open-webui LiteLLM settings, I have tried both http://192.168.1.12:4000 and http://192.168.1.12:4000/chat/completions as API base URLs.

Also note I do get completions from other LLM endpoints proxied behind LiteLLM, such as OpenAI GPT-3.5-Turbo and Groq Llama3-70B.

Thanks for any help.

Environment

  • Open WebUI Version: [e.g., 0.1.121]

  • Ollama: 0.1.32

  • LiteLLM: 1.35.29 OAS 3.1

  • Operating System: Ubuntu 22.04.3 LTS

  • Browser (if applicable): [e.g., Chrome 100.0, Firefox 98.0]


@justinh-rahb commented on GitHub (Apr 28, 2024):

Base URLs should be that of your Ollama server, not the LiteLLM proxy:

[Screenshot 2024-04-28 at 12 40 18 PM]

@corticalstack commented on GitHub (Apr 28, 2024):

That defeats my objective, which is to have EVERY LLM made available for selection behind a proxy, for purposes of load balancing, capturing token consumption, etc.

If I wanted open-webui to have direct access to ollama runner, I'd just configure the OLLAMA_BASE_URL.
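For reference, the direct-Ollama route mentioned above amounts to a single environment variable in a Docker deployment. A sketch (the host IP and Ollama's default port 11434 are assumptions here, not values from this thread):

```shell
# Sketch: point Open WebUI directly at an Ollama runner via OLLAMA_BASE_URL.
# Host IP and default Ollama port 11434 are assumed values.
docker run -d -p 3000:8080 \
  -e OLLAMA_BASE_URL=http://192.168.1.12:11434 \
  --name open-webui \
  ghcr.io/open-webui/open-webui:main
```

The point of the objection above is precisely that this bypasses the LiteLLM proxy, so proxy-side load balancing and token accounting would not apply.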


Reference: github-starred/open-webui#751