400 error in UI on context overflow #1847

Closed
opened 2025-11-11 14:54:45 -06:00 by GiteaMirror · 1 comment

Originally created by @ddzina on GitHub (Aug 22, 2024).

Bug Report

Installation Method

Open WebUI via Docker image, run with Docker Compose alongside a LiteLLM container

services:

  open-webui:
    image: open-webui:0.3.11
    container_name: open-webui
    volumes:
      - ./data:/app/backend/data
      - ./config.json:/app/backend/data/config.json
    ports:
      - 3000:8080
    env_file:
      - .env
    extra_hosts:
      - "host.docker.internal:host-gateway"
    restart: unless-stopped
  litellm:
    image: litellm:main-v1.40.17
    container_name: litellm
    volumes:
      - ./litellm/config.yaml:/app/config.yaml
    ports:
      - 4000:4000
    env_file:
      - .env
    restart: unless-stopped
    command: ["--config", "/app/config.yaml", "--port", "4000"]

Environment

  • Open WebUI Version: v0.3.11

  • LiteLLM: v1.40.17

  • Operating System: Ubuntu 22.04

Confirmation:

  • [x] I have read and followed all the instructions provided in the README.md.
  • [ ] I am on the latest version of both Open WebUI and Ollama.
  • [x] I have included the browser console logs.
  • [x] I have included the Docker container logs.
  • [x] I have provided the exact steps to reproduce the bug in the "Steps to Reproduce" section below.

Expected Behavior:

If a user message exceeds the model's context limit, the user should be told so explicitly, not shown an error that gives no indication of what went wrong.

Actual Behavior:

If the user sends a large message and the context overflows, the request fails with a generic 400 error that leaves the user confused.

Description

Bug Summary:
The user receives an uninformative 400 error when the LLM's context window overflows.

Reproduction Details

Steps to Reproduce:

  1. Add any LLM via the Connections tab, using either a direct API connection or LiteLLM.
  2. Send any text large enough to exceed the context window of the selected model (a reproduction sketch follows this list).
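
As a minimal reproduction sketch in Python (the URL, API key, and oversized prompt are placeholders; it assumes an Open WebUI instance exposing the /api/chat/completions endpoint seen in the logs below, with an API key that can access the model):

import requests  # pip install requests

OPENWEBUI_URL = "http://localhost:3000/api/chat/completions"  # placeholder instance
API_KEY = "sk-..."                                            # placeholder API key

# A prompt clearly larger than an 8192-token context window.
oversized_prompt = "lorem ipsum " * 20000

resp = requests.post(
    OPENWEBUI_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "LLaMA 3 70B Nexus",
        "messages": [{"role": "user", "content": oversized_prompt}],
    },
)
print(resp.status_code)  # 400
print(resp.text)         # generic "Bad Request" with no mention of context length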

Logs and Screenshots

Browser Console Logs:

Chat.svelte:491 submitPrompt 
Chat.svelte:642 modelId LLaMA 3 70B Nexus
+layout.svelte:130 usage {models: Array(1)}
+layout.svelte:130 usage {models: Array(1)}
index.ts:281 POST https://*url-hidden*/api/chat/completions 400 (Bad Request)
window.fetch @ fetcher.js:72
A @ index.ts:281
ut @ Chat.svelte:1042
await in ut (async)
(anonymous) @ Chat.svelte:693
await in (anonymous) (async)
ft @ Chat.svelte:641
await in ft (async)
_t @ Chat.svelte:564
await in _t (async)
bt @ MessageInput.svelte:635
Chat.svelte:1279 {detail: "External: 400, message='Bad Request', url=URL('htt…//host.docker.internal:4000/v1/chat/completions')"}
Et @ Chat.svelte:1279
await in Et (async)
ut @ Chat.svelte:1228
await in ut (async)
(anonymous) @ Chat.svelte:693
await in (anonymous) (async)
ft @ Chat.svelte:641
await in ft (async)
_t @ Chat.svelte:564
await in _t (async)
bt @ MessageInput.svelte:635
+layout.svelte:130 usage {models: Array(1)}
+layout.svelte:130 usage {models: Array(1)}
+layout.svelte:130 usage {models: Array(1)}
+layout.svelte:130 usage {models: Array(1)}
+layout.svelte:130 usage {models: Array(1)}models: ['LLaMA 3 70B Nexus'][[Prototype]]: Object
+layout.svelte:130 usage {models: Array(0)}

Docker Container Logs:
OpenWebUI

DEBUG [main] request.url.path: /api/chat/completions
DEBUG [apps.openai.main] {"stream": true, "model": "LLaMA 3 70B Nexus", "messages": [{"role": "user", "content": "*Here was passed whole scientific article exceeding context length of the model*"}]}
ERROR [apps.openai.main] 400, message='Bad Request', url=URL('http://host.docker.internal:4000/v1/chat/completions')
Traceback (most recent call last):
  File "/app/backend/apps/openai/main.py", line 484, in generate_chat_completion
    r.raise_for_status()
  File "/usr/local/lib/python3.11/site-packages/aiohttp/client_reqrep.py", line 1070, in raise_for_status
    raise ClientResponseError(
aiohttp.client_exceptions.ClientResponseError: 400, message='Bad Request', url=URL('http://host.docker.internal:4000/v1/chat/completions')
DEBUG [main] Commit session after request
INFO:     31.44.1.122:0 - "POST /api/chat/completions HTTP/1.1" 400 Bad Request
DEBUG [main] Commit session after request

LiteLLM

openai.BadRequestError: Error code: 400 - {'code': 400, 'message': "This model's maximum context length is 8192 tokens. However, you requested 12934 tokens (8838 in the messages, 4096 in the completion). Please reduce the length of the messages or completion.", 'object': 'error', 'param': None, 'type': 'BadRequestError'}

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/litellm/main.py", line 378, in acompletion
    response = await init_response
               ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/litellm/llms/openai.py", line 991, in async_streaming
    raise OpenAIError(status_code=e.status_code, message=str(e))
litellm.llms.openai.OpenAIError: Error code: 400 - {'code': 400, 'message': "This model's maximum context length is 8192 tokens. However, you requested 12934 tokens (8838 in the messages, 4096 in the completion). Please reduce the length of the messages or completion.", 'object': 'error', 'param': None, 'type': 'BadRequestError'}

11:31:30 - LiteLLM Proxy:ERROR: proxy_server.py:3137 - litellm.proxy.proxy_server.chat_completion(): Exception occured - litellm.BadRequestError: litellm.ContextWindowExceededError: ContextWindowExceededError: OpenAIException - Error code: 400 - {'code': 400, 'message': "This model's maximum context length is 8192 tokens. However, you requested 12934 tokens (8838 in the messages, 4096 in the completion). Please reduce the length of the messages or completion.", 'object': 'error', 'param': None, 'type': 'BadRequestError'} LiteLLM Retried: 1 times, LiteLLM Max Retries: 2
11:31:30 - LiteLLM Proxy:ERROR: _common.py:120 - Giving up chat_completion(...) after 1 tries (litellm.proxy._types.ProxyException)
INFO:     172.20.0.1:43356 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
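
For reference, the arithmetic behind the LiteLLM message above, with numbers taken straight from the log (the 4096 appears to be the completion budget the request reserves, which is why the limit can be exceeded even when the prompt alone would fit):

context_window = 8192      # model's maximum context length
prompt_tokens = 8838       # tokens counted in the messages, per the log
completion_budget = 4096   # max_tokens reserved for the reply

requested = prompt_tokens + completion_budget
print(requested)                   # 12934
print(requested > context_window)  # True -> the provider rejects with a 400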

Screenshots/Screen Recordings (if applicable):
Screenshot: https://github.com/user-attachments/assets/d23206d6-bc0f-4b83-bc2a-bac056e331ce

Additional Information

Although the logs above are from the LiteLLM setup, the behaviour is identical with a direct OpenAI connection via the OpenAI API, so this is not an issue on the LiteLLM side. The same error also occurs when the user sends several large messages that each fit within the context individually but together exceed the limit. Since LiteLLM already raises a distinct ContextWindowExceededError, I ask that logic be implemented to handle this error and inform the user.
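
As a hedged sketch of what such handling could look like (this is not the actual Open WebUI code; the function name, signature, and assumed error-body shape are illustrative only), the backend could inspect the upstream response before raising, so the provider's context-length explanation reaches the UI instead of a bare 400:

import aiohttp
from fastapi import HTTPException

async def forward_chat_completion(
    session: aiohttp.ClientSession, url: str, payload: dict
) -> dict:
    """Forward a chat completion upstream and relay any 4xx error detail."""
    async with session.post(url, json=payload) as r:
        if r.status >= 400:
            try:
                body = await r.json(content_type=None)
            except Exception:
                body = {}
            err = body.get("error", body) if isinstance(body, dict) else {}
            message = err.get("message") if isinstance(err, dict) else None
            # Relay the provider's own explanation, e.g.
            # "This model's maximum context length is 8192 tokens. ..."
            raise HTTPException(status_code=r.status, detail=message or "Bad Request")
        return await r.json()

Alternatively, the frontend could recognize LiteLLM's ContextWindowExceededError and show a dedicated "message too long for this model" notice instead of the raw error string.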


@tjbck commented on GitHub (Aug 23, 2024):

If you could provide a detailed guide on how to reproduce with OpenAI (not LiteLLM) API, I'd greatly appreciate it.

Reference: github-starred/open-webui#1847