400 Bad Request Error with Groq's llama-3.1-70b Model Due to max_tokens Parameter in Open WebUI v0.3.19 #2002
Originally created by @pppyyyccc on GitHub (Sep 6, 2024).
Bug Report
Installation Method
I use Docker to deploy Open WebUI together with one-api to connect all of my LLM providers.
Environment
Open WebUI Version: v0.3.19
Operating System: Windows 11
Confirmation:
Expected Behavior:
The model should produce output without any errors when using the default Max Tokens (num_predict) value.
Actual Behavior:
When attempting to generate output with Groq's llama-3.1-70b model, I receive a 400 Bad Request error. The OneAPI logs show a relay error with the message: relay error (channel #1(Groq)): Provider API error: max_tokens for 'llama-3.1-70b-versatile' must be less than or equal to 8000. The error occurs even with the default Max Tokens (num_predict) value, and adjusting the value anywhere from 128 to 7000 does not help.
Description
Bug Summary:
When using the latest version (v0.3.19) of Open WebUI to generate output with Groq's llama-3.1-70b model, a 400 Bad Request error occurs. The OneAPI logs indicate that max_tokens for llama-3.1-70b-versatile must be less than or equal to 8000, even though the default Max Tokens (num_predict) value is being used. Adjusting the value anywhere from 128 to 7000 does not resolve the issue.
Reproduction Details
Steps to Reproduce:
Install the latest version (v0.3.19) of Open WebUI by updating the Docker deployment.
Attempt to generate output with Groq's llama-3.1-70b model using the default Max Tokens (num_predict) value.
Observe the 400 Bad Request error.
Check the OneAPI logs for the relay error message.
Adjust the Max Tokens (num_predict) value from 128 to 7000 and attempt to output again.
Observe that the error persists.
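To help isolate whether the oversized max_tokens value comes from Open WebUI or from OneAPI itself, the same endpoint can be called directly with an explicit, in-range max_tokens. This is a minimal sketch, not part of the original report: the base URL, API key, and prompt are placeholders, and the payload simply follows the standard OpenAI-compatible chat completions format.

```python
# Minimal direct reproduction against the OneAPI relay (placeholders marked).
# If this request succeeds with max_tokens=7000 while Open WebUI still gets a 400,
# the oversized value is likely being injected somewhere in the Open WebUI request.
import requests

ONEAPI_BASE_URL = "https://oneapi.example.com"  # placeholder for the real relay URL
API_KEY = "sk-..."                              # placeholder API key

payload = {
    "model": "llama-3.1-70b-versatile",
    "messages": [{"role": "user", "content": "Say hello."}],
    "max_tokens": 7000,  # within Groq's documented 8000 limit for this model
}

resp = requests.post(
    f"{ONEAPI_BASE_URL}/v1/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=60,
)
print(resp.status_code)
print(resp.text)
```

Repeating the call with max_tokens omitted entirely is also worth trying, since the provider applies its own default in that case.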
Logs and Screenshots
ERROR [open_webui.apps.openai.main] 400, message='Bad Request', url='https://oneapi.xxxx.xxx/v1/chat/completions'
Traceback (most recent call last):
File "/app/backend/open_webui/apps/openai/main.py", line 411, in generate_chat_completion
r.raise_for_status()
File "/usr/local/lib/python3.11/site-packages/aiohttp/client_reqrep.py", line 1093, in raise_for_status
raise ClientResponseError(
aiohttp.client_exceptions.ClientResponseError: 400, message='Bad Request', url='https://oneapi.xxxx.xxx/v1/chat/completions'
INFO: 172.104.184.104:0 - "POST /api/chat/completions HTTP/1.1" 400 Bad Request
INFO: 172.104.184.104:0 - "POST /api/v1/chats/f997cf9c-6774-4437-97e8-d6ed0b673c2c HTTP/1.1" 200 OK
INFO: 172.104.184.104:0 - "GET /api/v1/chats/?page=1 HTTP/1.1" 200 OK
INFO: 172.104.184.104:0 - "GET /ws/socket.io/?EIO=4&transport=polling&t=P764LlG.0&sid=F4FhXzgP9rcXRLiHAAAD HTTP/1.1" 200 OK
ERROR [open_webui.apps.openai.main] 400, message='Bad Request', url='https://oneapi.xxxx.xxx/v1/chat/completions'
Traceback (most recent call last):
File "/app/backend/open_webui/apps/openai/main.py", line 411, in generate_chat_completion
r.raise_for_status()
File "/usr/local/lib/python3.11/site-packages/aiohttp/client_reqrep.py", line 1093, in raise_for_status
raise ClientResponseError(
aiohttp.client_exceptions.ClientResponseError: 400, message='Bad Request', url='https://oneapi.xxxx.xxx/v1/chat/completions'
one-api log
2024/09/06 - 14:41:57 ERROR logger/logger.go:172 20240906144157627289298yCBnLlJf | relay error (channel #1(Groq)): Provider API error: max_tokens for llama-3.1-70b-versatile must be less than or equal to 8000
2024/09/06 - 14:41:58 ERROR logger/logger.go:172 202409061441588 | relay error happen, status code is 400, won't retry in this case
2024/09/06 - 14:41:58 INFO middleware/logger.go:52 GIN request {"status": 400, "request_id": "2024090614415883 ", "method": "POST", "path": "/v1/chat/completions", "query": "", "ip": "ip", "user-agent": "Python/3.11 aiohttp/3.10.5", "latency": "46.887428ms", "user_id": 1, "original_model": "llama-3.1-70b-versatile", "new_model": "", "token_id": 2, "token_name": " ", "channel_id": 1}
2024/09/06 - 14:41:58 ERROR logger/logger.go:172 20240906144158835710277C4YQE93k | relay error (channel #1(Groq)): Provider API error: max_tokens for llama-3.1-70b-versatile must be less than or equal to 8000
Additional Information
The issue seems to be related to the max_tokens parameter for the llama-3.1-70b-versatile model. Despite using the default value and trying various other values within the acceptable range, the error persists. This suggests a potential bug in how the max_tokens parameter is handled for this specific model. I also tried other Groq models such as Mixtral-8x7b, and they work as expected.
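As a stopgap while the root cause is investigated, one possible client-side workaround is to clamp max_tokens to the model's limit before the request is relayed. The helper below is purely illustrative and not part of Open WebUI or OneAPI; the limit table and function name are assumptions based only on the error message above.

```python
# Hypothetical client-side clamp for per-model max_tokens limits.
# The limits dict is an assumption derived from the error message above,
# not an official list from Groq or OneAPI.
MODEL_MAX_TOKENS = {
    "llama-3.1-70b-versatile": 8000,
}

def clamp_max_tokens(payload: dict) -> dict:
    """Return a copy of the request payload with max_tokens capped at the model's limit."""
    limit = MODEL_MAX_TOKENS.get(payload.get("model"))
    requested = payload.get("max_tokens")
    if limit is not None and isinstance(requested, int) and requested > limit:
        payload = {**payload, "max_tokens": limit}
    return payload

# Example: a payload asking for more tokens than the model allows gets capped to 8000.
print(clamp_max_tokens({"model": "llama-3.1-70b-versatile", "max_tokens": 16000}))
```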
Note
If the bug report is incomplete or does not follow the provided instructions, it may not be addressed. Please ensure that you have followed the steps outlined in the README.md and troubleshooting.md documents, and provide all necessary information for us to reproduce and address the issue. Thank you!