[GH-ISSUE #1327] Dynamic Behavior of Maximum Tokens (max_tokens) in Claude 3 models using LiteLLM #27976

Closed
opened 2026-04-25 02:44:53 -05:00 by GiteaMirror · 1 comment

Originally created by @EnzoAndree on GitHub (Mar 27, 2024).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/1327

Bug Report

Description

Bug Summary:
The maximum output token count appears to decrease dynamically, even though Claude 3 models support a context window of 200,000 tokens and the max_tokens (Max output) parameter is set to 4096.

Steps to Reproduce:

  1. Open the Open-WebUI application.
  2. Start a new conversation with any Claude 3 model using LiteLLM.
  3. Observe the initial max_tokens value in the log; it should be 4096.
  4. Continue the conversation by sending additional messages.
  5. Notice that the max_tokens value decreases with each subsequent message.

Expected Behavior:
The max_tokens parameter should remain constant at 4096 throughout the conversation. It should only be reduced if the conversation approaches the model's maximum context length (200,000 tokens).

Actual Behavior:
The max_tokens value decreases as the conversation progresses, even though the context window for the Claude model is 200,000 tokens.
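
For context, the pattern in the logs below is exactly what you would get if something upstream subtracted the prompt's token count from the configured output cap on every request. A minimal sketch of that kind of adjustment, purely hypothetical (this is not Open WebUI's or LiteLLM's actual code):

```python
# Hypothetical sketch of the adjustment that would produce the observed
# behavior; NOT Open WebUI's or LiteLLM's actual code.
def effective_max_tokens(configured_max: int, prompt_tokens: int) -> int:
    # Subtracting the prompt size from the output cap makes max_tokens
    # shrink on every turn as the conversation history grows.
    return configured_max - prompt_tokens
```

For the Anthropic Messages API, max_tokens caps only the generated output, so a subtraction like this should only be needed once prompt plus output approaches the 200,000-token context window.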

Environment

  • Operating System: macOS 14.4
  • Browser (if applicable): Google Chrome

Reproduction Details

Confirmation:

  • [x] I have read and followed all the instructions provided in the README.md.
  • [x] I am on the latest version of both Open WebUI and Ollama.
  • [x] I have included the browser console logs.
  • [ ] I have included the Docker container logs.

Logs and Screenshots

INFO:     127.0.0.1:56583 - "POST /api/v1/chats/new HTTP/1.1" 200 OK
INFO:     127.0.0.1:56583 - "GET /api/v1/chats/ HTTP/1.1" 200 OK
02:29:38 - LiteLLM:INFO: 

POST Request Sent from LiteLLM:
curl -X POST \
https://api.anthropic.com/v1/messages \
-H 'accept: application/json' -H 'anthropic-version: 2023-06-01' -H 'content-type: application/json' -H 'x-api-key: ********************' \
-d '{'model': 'claude-3-haiku-20240307', 'messages': [{'role': 'user', 'content': 'hello'}], 'stream': True, 'max_tokens': 4085}'


INFO:LiteLLM:

POST Request Sent from LiteLLM:
curl -X POST \
https://api.anthropic.com/v1/messages \
-H 'accept: application/json' -H 'anthropic-version: 2023-06-01' -H 'content-type: application/json' -H 'x-api-key: ********************' \
-d '{'model': 'claude-3-haiku-20240307', 'messages': [{'role': 'user', 'content': 'hello'}], 'stream': True, 'max_tokens': 4085}'


02:29:39 - LiteLLM Router:INFO: litellm.acompletion(model=claude-3-haiku-20240307) 200 OK
INFO:LiteLLM Router:litellm.acompletion(model=claude-3-haiku-20240307) 200 OK
INFO:     127.0.0.1:56583 - "POST /litellm/api/v1/chat/completions HTTP/1.1" 200 OK
INFO:     127.0.0.1:56583 - "POST /api/v1/chats/aa4b137f-2d5a-48f0-810c-0b2567c295bd HTTP/1.1" 200 OK
INFO:     127.0.0.1:56583 - "GET /api/v1/chats/ HTTP/1.1" 200 OK
INFO:     127.0.0.1:56583 - "POST /api/v1/chats/aa4b137f-2d5a-48f0-810c-0b2567c295bd HTTP/1.1" 200 OK
INFO:     127.0.0.1:56583 - "GET /api/v1/chats/ HTTP/1.1" 200 OK
INFO:     127.0.0.1:56583 - "GET /api/v1/chats/ HTTP/1.1" 200 OK
02:30:15 - LiteLLM:INFO: 

POST Request Sent from LiteLLM:
curl -X POST \
https://api.anthropic.com/v1/messages \
-H 'accept: application/json' -H 'anthropic-version: 2023-06-01' -H 'content-type: application/json' -H 'x-api-key: ********************' \
-d '{'model': 'claude-3-haiku-20240307', 'messages': [{'role': 'user', 'content': 'hello'}, {'role': 'assistant', 'content': 'Hello! How can I assist you today?'}, {'role': 'user', 'content': 'this is a test of your max_tokens, please say hello'}], 'stream': True, 'max_tokens': 4063}'


INFO:LiteLLM:

POST Request Sent from LiteLLM:
curl -X POST \
https://api.anthropic.com/v1/messages \
-H 'accept: application/json' -H 'anthropic-version: 2023-06-01' -H 'content-type: application/json' -H 'x-api-key: ********************' \
-d '{'model': 'claude-3-haiku-20240307', 'messages': [{'role': 'user', 'content': 'hello'}, {'role': 'assistant', 'content': 'Hello! How can I assist you today?'}, {'role': 'user', 'content': 'this is a test of your max_tokens, please say hello'}], 'stream': True, 'max_tokens': 4063}'


02:30:16 - LiteLLM Router:INFO: litellm.acompletion(model=claude-3-haiku-20240307) 200 OK
INFO:LiteLLM Router:litellm.acompletion(model=claude-3-haiku-20240307) 200 OK
INFO:     127.0.0.1:56607 - "POST /litellm/api/v1/chat/completions HTTP/1.1" 200 OK
INFO:     127.0.0.1:56607 - "POST /api/v1/chats/aa4b137f-2d5a-48f0-810c-0b2567c295bd HTTP/1.1" 200 OK
INFO:     127.0.0.1:56607 - "GET /api/v1/chats/ HTTP/1.1" 200 OK
INFO:     127.0.0.1:56607 - "GET /api/v1/chats/ HTTP/1.1" 200 OK
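
Note how the deltas in the logs track the prompt size: the first single-message request is sent with 4096 - 4085 = 11 tokens shaved off the configured cap, and the second request, which carries three messages of history, with 4096 - 4063 = 33. The reduction grows with the conversation, which matches the subtraction sketched above.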

Installation Method

Manual installation


@tjbck commented on GitHub (Mar 27, 2024):

This might be a LiteLLM issue, so you might want to try testing LiteLLM in isolation. Keep us updated!
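
In case it helps with that isolation test, a minimal script along these lines should show whether LiteLLM itself alters max_tokens (assumes litellm is installed and ANTHROPIC_API_KEY is set in the environment):

```python
# Minimal LiteLLM isolation test: call the Anthropic model directly,
# bypassing Open WebUI, and check the outgoing max_tokens in the logs.
# Assumes `pip install litellm` and ANTHROPIC_API_KEY in the environment.
import litellm

litellm.set_verbose = True  # prints the outgoing request, including max_tokens

response = litellm.completion(
    model="claude-3-haiku-20240307",
    messages=[{"role": "user", "content": "hello"}],
    max_tokens=4096,
)
print(response.choices[0].message.content)
```

If the logged request still shows max_tokens: 4096 here, the subtraction is happening in Open WebUI before the request reaches LiteLLM.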


Reference: github-starred/open-webui#27976