400 Bad Request Error with Groq's llama-3.1-70b Model Due to max_tokens Parameter in Open WebUI v0.3.19 #2002
Originally created by @pppyyyccc on GitHub (Sep 6, 2024).
Bug Report
Installation Method
I use Docker to deploy Open WebUI together with one-api to connect all of my LLM providers.
Environment
Open WebUI Version: v0.3.19
Operating System: Windows 11
Confirmation:
Expected Behavior:
The model should produce output without any errors when using the default Max Tokens (num_predict) value.
Actual Behavior:
When attempting to generate output with Groq's llama-3.1-70b model, I receive a 400 Bad Request error. The OneAPI logs show a relay error with the message: relay error (channel #1(Groq)): Provider API error: max_tokens for 'llama-3.1-70b-versatile' must be less than or equal to 8000. The error occurs even with the default Max Tokens (num_predict) value, and adjusting the value anywhere from 128 to 7000 does not help.
Description
Bug Summary:
When using the latest version (v0.3.19) of Open WebUI to generate output with Groq's llama-3.1-70b model, a 400 Bad Request error occurs. The OneAPI logs indicate that max_tokens for llama-3.1-70b-versatile must be less than or equal to 8000, even though the default Max Tokens (num_predict) value is being used. Adjusting the value anywhere from 128 to 7000 does not resolve the issue.
Reproduction Details
Steps to Reproduce:
Install the latest version (v0.3.19) of Open WebUI by updating the Docker deployment.
Attempt to generate output with Groq's llama-3.1-70b model using the default Max Tokens (num_predict) value.
Observe the 400 Bad Request error.
Check the OneAPI logs for the relay error message.
Adjust the Max Tokens (num_predict) value from 128 to 7000 and attempt to output again.
Observe that the error persists.
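To help isolate whether the oversized max_tokens value comes from Open WebUI or from OneAPI itself, the same endpoint can be called directly with an explicit, in-range max_tokens. This is a minimal sketch, not part of the original report: the base URL, API key, and prompt are placeholders, and the payload simply follows the standard OpenAI-compatible chat completions format.

```python
# Minimal direct reproduction against the OneAPI relay (placeholders marked).
# If this request succeeds with max_tokens=7000 while Open WebUI still gets a 400,
# the oversized value is likely being injected somewhere in the Open WebUI request.
import requests

ONEAPI_BASE_URL = "https://oneapi.example.com"  # placeholder for the real relay URL
API_KEY = "sk-..."                              # placeholder API key

payload = {
    "model": "llama-3.1-70b-versatile",
    "messages": [{"role": "user", "content": "Say hello."}],
    "max_tokens": 7000,  # within Groq's documented 8000 limit for this model
}

resp = requests.post(
    f"{ONEAPI_BASE_URL}/v1/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=60,
)
print(resp.status_code)
print(resp.text)
```

Repeating the call with max_tokens omitted entirely is also worth trying, since the provider applies its own default in that case.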
Logs and Screenshots
ERROR [open_webui.apps.openai.main] 400, message='Bad Request', url='https://oneapi.xxxx.xxx/v1/chat/completions'
Traceback (most recent call last):
File "/app/backend/open_webui/apps/openai/main.py", line 411, in generate_chat_completion
r.raise_for_status()
File "/usr/local/lib/python3.11/site-packages/aiohttp/client_reqrep.py", line 1093, in raise_for_status
raise ClientResponseError(
aiohttp.client_exceptions.ClientResponseError: 400, message='Bad Request', url='https://oneapi.xxxx.xxx/v1/chat/completions'
INFO: 172.104.184.104:0 - "POST /api/chat/completions HTTP/1.1" 400 Bad Request
INFO: 172.104.184.104:0 - "POST /api/v1/chats/f997cf9c-6774-4437-97e8-d6ed0b673c2c HTTP/1.1" 200 OK
INFO: 172.104.184.104:0 - "GET /api/v1/chats/?page=1 HTTP/1.1" 200 OK
INFO: 172.104.184.104:0 - "GET /ws/socket.io/?EIO=4&transport=polling&t=P764LlG.0&sid=F4FhXzgP9rcXRLiHAAAD HTTP/1.1" 200 OK
ERROR [open_webui.apps.openai.main] 400, message='Bad Request', url='https://oneapi.xxxx.xxx/v1/chat/completions'
Traceback (most recent call last):
File "/app/backend/open_webui/apps/openai/main.py", line 411, in generate_chat_completion
r.raise_for_status()
File "/usr/local/lib/python3.11/site-packages/aiohttp/client_reqrep.py", line 1093, in raise_for_status
raise ClientResponseError(
aiohttp.client_exceptions.ClientResponseError: 400, message='Bad Request', url='https://oneapi.xxxx.xxx/v1/chat/completions'
one-api log
2024/09/06 - 14:41:57 ERROR logger/logger.go:172 20240906144157627289298yCBnLlJf | relay error (channel #1(Groq)): Provider API error: max_tokens for llama-3.1-70b-versatile must be less than or equal to 8000
2024/09/06 - 14:41:58 ERROR logger/logger.go:172 202409061441588 | relay error happen, status code is 400, won't retry in this case
2024/09/06 - 14:41:58 INFO middleware/logger.go:52 GIN request {"status": 400, "request_id": "2024090614415883 ", "method": "POST", "path": "/v1/chat/completions", "query": "", "ip": "ip", "user-agent": "Python/3.11 aiohttp/3.10.5", "latency": "46.887428ms", "user_id": 1, "original_model": "llama-3.1-70b-versatile", "new_model": "", "token_id": 2, "token_name": " ", "channel_id": 1}
2024/09/06 - 14:41:58 ERROR logger/logger.go:172 20240906144158835710277C4YQE93k | relay error (channel #1(Groq)): Provider API error: max_tokens for llama-3.1-70b-versatile must be less than or equal to 8000
Additional Information
The issue seems to be related to the max_tokens parameter for the llama-3.1-70b-versatile model. Despite using the default value and trying various other values within the acceptable range, the error persists. This suggests a potential bug in how the max_tokens parameter is handled for this specific model. I also tried other Groq models such as Mixtral-8x7b, and they work as expected.
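As a stopgap while the root cause is investigated, one possible client-side workaround is to clamp max_tokens to the model's limit before the request is relayed. The helper below is purely illustrative and not part of Open WebUI or OneAPI; the limit table and function name are assumptions based only on the error message above.

```python
# Hypothetical client-side clamp for per-model max_tokens limits.
# The limits dict is an assumption derived from the error message above,
# not an official list from Groq or OneAPI.
MODEL_MAX_TOKENS = {
    "llama-3.1-70b-versatile": 8000,
}

def clamp_max_tokens(payload: dict) -> dict:
    """Return a copy of the request payload with max_tokens capped at the model's limit."""
    limit = MODEL_MAX_TOKENS.get(payload.get("model"))
    requested = payload.get("max_tokens")
    if limit is not None and isinstance(requested, int) and requested > limit:
        payload = {**payload, "max_tokens": limit}
    return payload

# Example: a payload asking for more tokens than the model allows gets capped to 8000.
print(clamp_max_tokens({"model": "llama-3.1-70b-versatile", "max_tokens": 16000}))
```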
Note
If the bug report is incomplete or does not follow the provided instructions, it may not be addressed. Please ensure that you have followed the steps outlined in the README.md and troubleshooting.md documents, and provide all necessary information for us to reproduce and address the issue. Thank you!