Mirror of https://github.com/open-webui/open-webui.git (synced 2026-03-10 15:54:15 -05:00)
Issue: Open WebUI not giving all tokens provided by VLLM #4540
Originally created by @taltoris on GitHub (Mar 24, 2025).
Check Existing Issues
Installation Method
Docker
Open WebUI Version
v5.20
Ollama Version (if applicable)
No response
Operating System
Ubuntu
Browser (if applicable)
Firefox 136.0.1 and Chrome 134.0.6998.165
Confirmation
Expected Behavior
Using QwQ served by VLLM, Open WebUI does not display all of the output generated by VLLM.
For this model, I expect some reasoning enclosed in tags, followed by a summary.
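The expected split can be illustrated with a short sketch. This is my assumption of the shape, not taken from Open WebUI's code, and the `<think>…</think>` tag name is assumed; the sample string is made up for illustration:

```python
# Hypothetical QwQ-style output: reasoning wrapped in tags, then a summary.
sample = "<think>Short wavelengths scatter more.</think>The sky is blue because of Rayleigh scattering."

def split_reasoning(text: str) -> tuple[str, str]:
    """Split model output into (reasoning, summary); the tag name is assumed."""
    if "</think>" in text:
        reasoning, summary = text.split("</think>", 1)
        return reasoning.removeprefix("<think>").strip(), summary.strip()
    return "", text.strip()

reasoning, summary = split_reasoning(sample)
print(reasoning)  # Short wavelengths scatter more.
print(summary)    # The sky is blue because of Rayleigh scattering.
```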
Actual Behavior
Open WebUI is not returning the full response given by VLLM.
First, I love Open WebUI. When it works, it's the best.
But, I'm having some trouble.
Here's my setup:
Then I launch Open WebUI and select QwQ from my model drop-down.
Then, I ask it a question:
"Why is the sky blue?"
Thinking...
Then... nothing. It just stops outputting tokens! The thinking still appears to be working, but OWU isn't outputting anything.
However, if I attempt to get VLLM to answer the question directly... it works!
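The direct request was something like the following sketch (stdlib only). The host port 8007 and the model path come from the vLLM compose file below; the payload shape is the standard OpenAI chat-completions format, which vLLM's OpenAI-compatible server accepts:

```python
import json
from urllib import request

VLLM_URL = "http://localhost:8007/v1/chat/completions"  # host port from the compose file below
MODEL = "/app/models/Qwen-QwQ-AWQ"

def build_payload(question: str) -> dict:
    """Standard OpenAI-style chat-completions payload."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": question}],
        "stream": False,
    }

def ask(question: str) -> str:
    """POST the question to the vLLM server and return the answer text."""
    req = request.Request(
        VLLM_URL,
        data=json.dumps(build_payload(question)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(ask("Why is the sky blue?"))
```

Running this against the container returns the full response shown below.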
{"id":"chatcmpl-b3197cc3aae9402d9c70249460b6a91b","object":"chat.completion","created":1742787780,"model":"/app/models/Qwen-QwQ-AWQ","choices":[{"index":0,"message":{"role":"assistant","reasoning_content":null,"content":"\n\nOkay, so I need to explain why the sky is blue. Let me start by recalling what I know about light and scattering.
...
\n\nThe sky appears blue due to Rayleigh scattering, a process involving how sunlight interacts with Earth's atmosphere. Here’s a breakdown:\n\n### 1. Sunlight Composition \n - Sunlight seems "white" but contains all colors of the visible spectrum (red, orange, yellow, green, blue, indigo, violet). These colors correspond to specific wavelengths—blue/violet being shortest (~400–500 nm), and red/yellow longest (~620–750 nm).\n\n---\n\n### 2. Interaction with Atmospheric Molecules \n - As sunlight passes through the atmosphere, its photons collide with molecules (like nitrogen and oxygen) and tiny particles. \n - Shorter-wavelength blue and violet light scatter far more easily than longer-wavelength red/orange light. ...}
So, VLLM has output that never gets displayed by OWU.
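Where the text actually lives in that response can be checked with a short sketch. The trimmed JSON below mirrors the response above (content abbreviated); the field names are taken straight from it:

```python
import json

# Trimmed copy of the vLLM response shown above (content abbreviated).
raw = json.dumps({
    "id": "chatcmpl-b3197cc3aae9402d9c70249460b6a91b",
    "object": "chat.completion",
    "model": "/app/models/Qwen-QwQ-AWQ",
    "choices": [{
        "index": 0,
        "message": {
            "role": "assistant",
            "reasoning_content": None,
            "content": "\n\nOkay, so I need to explain why the sky is blue. ...",
        },
    }],
})

msg = json.loads(raw)["choices"][0]["message"]
# The whole answer sits in `content`; `reasoning_content` is null.
print(msg["reasoning_content"])  # None
print(len(msg["content"]) > 0)   # True
```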
Steps to Reproduce
Logs & Screenshots
WARNING: CORS_ALLOW_ORIGIN IS SET TO '*' - NOT RECOMMENDED FOR PRODUCTION DEPLOYMENTS.
INFO [open_webui.env] 'WEBUI_BANNERS' loaded from the latest database entry
INFO [open_webui.env] 'SHOW_ADMIN_DETAILS' loaded from the latest database entry
INFO [open_webui.env] 'TASK_MODEL' loaded from the latest database entry
INFO [open_webui.env] 'TASK_MODEL_EXTERNAL' loaded from the latest database entry
INFO [open_webui.env] 'TITLE_GENERATION_PROMPT_TEMPLATE' loaded from the latest database entry
INFO [open_webui.env] 'TAGS_GENERATION_PROMPT_TEMPLATE' loaded from the latest database entry
INFO [open_webui.env] 'IMAGE_PROMPT_GENERATION_PROMPT_TEMPLATE' loaded from the latest database entry
INFO [open_webui.env] 'ENABLE_TAGS_GENERATION' loaded from the latest database entry
INFO [open_webui.env] 'ENABLE_SEARCH_QUERY_GENERATION' loaded from the latest database entry
INFO [open_webui.env] 'ENABLE_RETRIEVAL_QUERY_GENERATION' loaded from the latest database entry
INFO [open_webui.env] 'QUERY_GENERATION_PROMPT_TEMPLATE' loaded from the latest database entry
INFO [open_webui.env] 'ENABLE_AUTOCOMPLETE_GENERATION' loaded from the latest database entry
INFO [open_webui.env] 'AUTOCOMPLETE_GENERATION_INPUT_MAX_LENGTH' loaded from the latest database entry
INFO [open_webui.env] 'TOOLS_FUNCTION_CALLING_PROMPT_TEMPLATE' loaded from the latest database entry
INFO [open_webui.env] 'ENABLE_GOOGLE_DRIVE_INTEGRATION' loaded from the latest database entry
INFO [open_webui.env] 'ENABLE_RAG_WEB_LOADER_SSL_VERIFICATION' loaded from the latest database entry
INFO [open_webui.env] 'PDF_EXTRACT_IMAGES' loaded from the latest database entry
INFO [open_webui.env] Embedding model set: sentence-transformers/all-MiniLM-L6-v2
INFO [open_webui.env] 'YOUTUBE_LOADER_LANGUAGE' loaded from the latest database entry
INFO [open_webui.env] 'YOUTUBE_LOADER_PROXY_URL' loaded from the latest database entry
INFO [open_webui.env] 'ENABLE_RAG_WEB_SEARCH' loaded from the latest database entry
INFO [open_webui.env] 'RAG_WEB_SEARCH_ENGINE' loaded from the latest database entry
INFO [open_webui.env] 'SEARXNG_QUERY_URL' loaded from the latest database entry
INFO [open_webui.env] 'GOOGLE_PSE_API_KEY' loaded from the latest database entry
INFO [open_webui.env] 'GOOGLE_PSE_ENGINE_ID' loaded from the latest database entry
INFO [open_webui.env] 'BRAVE_SEARCH_API_KEY' loaded from the latest database entry
INFO [open_webui.env] 'KAGI_SEARCH_API_KEY' loaded from the latest database entry
INFO [open_webui.env] 'MOJEEK_SEARCH_API_KEY' loaded from the latest database entry
INFO [open_webui.env] 'SERPSTACK_API_KEY' loaded from the latest database entry
INFO [open_webui.env] 'SERPSTACK_HTTPS' loaded from the latest database entry
INFO [open_webui.env] 'SERPER_API_KEY' loaded from the latest database entry
INFO [open_webui.env] 'SERPLY_API_KEY' loaded from the latest database entry
INFO [open_webui.env] 'TAVILY_API_KEY' loaded from the latest database entry
INFO [open_webui.env] 'JINA_API_KEY' loaded from the latest database entry
INFO [open_webui.env] 'SEARCHAPI_API_KEY' loaded from the latest database entry
INFO [open_webui.env] 'SEARCHAPI_ENGINE' loaded from the latest database entry
INFO [open_webui.env] 'BING_SEARCH_V7_ENDPOINT' loaded from the latest database entry
INFO [open_webui.env] 'BING_SEARCH_V7_SUBSCRIPTION_KEY' loaded from the latest database entry
INFO [open_webui.env] 'EXA_API_KEY' loaded from the latest database entry
INFO [open_webui.env] 'RAG_WEB_SEARCH_RESULT_COUNT' loaded from the latest database entry
INFO [open_webui.env] 'RAG_WEB_SEARCH_CONCURRENT_REQUESTS' loaded from the latest database entry
WARNI [langchain_community.utils.user_agent] USER_AGENT environment variable not set, consider setting it to identify your requests.
Fetching 30 files: 100%|██████████| 30/30 [00:00<00:00, 45540.76it/s]
[Open WebUI ASCII-art banner]
v0.5.10 - building the best open-source AI user interface.
https://github.com/open-webui/open-webui
INFO: Started server process [1]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)
INFO [open_webui.models.auths] authenticate_user: doug@douglamaster.com
INFO: 192.168.0.126:54745 - "POST /api/v1/auths/signin HTTP/1.1" 200 OK
INFO: 192.168.0.126:54745 - "GET /api/config HTTP/1.1" 200 OK
INFO: 192.168.0.126:54745 - "GET /api/changelog HTTP/1.1" 200 OK
INFO: 192.168.0.126:54746 - "GET /api/v1/users/user/settings HTTP/1.1" 200 OK
INFO [open_webui.routers.openai] get_all_models()
INFO [open_webui.utils.plugin] Loaded module: function_gemini
INFO: 192.168.0.126:54745 - "GET /api/models HTTP/1.1" 200 OK
INFO: 192.168.0.126:54745 - "GET /api/v1/configs/banners HTTP/1.1" 200 OK
INFO: 192.168.0.126:54745 - "GET /api/v1/tools/ HTTP/1.1" 200 OK
INFO: 192.168.0.126:54746 - "GET /ollama/api/version HTTP/1.1" 200 OK
INFO: 192.168.0.126:54747 - "GET /static/favicon.png HTTP/1.1" 200 OK
INFO: 192.168.0.126:54745 - "GET /api/v1/channels/ HTTP/1.1" 200 OK
INFO: 192.168.0.126:54746 - "GET /api/v1/users/user/settings HTTP/1.1" 200 OK
INFO: 192.168.0.126:54745 - "GET /api/v1/chats/all/tags HTTP/1.1" 200 OK
INFO: 192.168.0.126:54745 - "GET /api/v1/chats/pinned HTTP/1.1" 200 OK
INFO: 192.168.0.126:54747 - "GET /api/v1/chats/?page=1 HTTP/1.1" 200 OK
INFO: 192.168.0.126:54745 - "GET /api/v1/folders/ HTTP/1.1" 200 OK
INFO: ('192.168.0.126', 54748) - "WebSocket /ws/socket.io/?EIO=4&transport=websocket" [accepted]
INFO: connection open
INFO: ('131.241.21.70', 0) - "WebSocket /ws/socket.io/?EIO=4&transport=websocket" [accepted]
INFO: connection open
INFO: ('131.241.21.70', 0) - "WebSocket /ws/socket.io/?EIO=4&transport=websocket" [accepted]
INFO: connection open
INFO: 192.168.0.126:54749 - "POST /api/v1/chats/new HTTP/1.1" 200 OK
INFO: 192.168.0.126:54749 - "GET /api/v1/chats/?page=1 HTTP/1.1" 200 OK
INFO: ('131.241.21.70', 0) - "WebSocket /ws/socket.io/?EIO=4&transport=websocket" [accepted]
INFO: connection open
INFO: 192.168.0.126:54749 - "POST /api/v1/chats/cdeab16d-ddf0-4c9f-b7fb-b1ec3a5c373f HTTP/1.1" 200 OK
INFO: 192.168.0.126:54749 - "GET /api/v1/chats/?page=1 HTTP/1.1" 200 OK
INFO [open_webui.routers.openai] get_all_models()
INFO: 192.168.0.126:54750 - "POST /api/v1/tasks/auto/completions HTTP/1.1" 400 Bad Request
INFO: 192.168.0.126:54749 - "POST /api/chat/completions HTTP/1.1" 200 OK
INFO: 192.168.0.126:54749 - "GET /api/v1/chats/?page=1 HTTP/1.1" 200 OK
Additional Information
#########
Open Webui docker-compose.yml
#########
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:v0.5.10
    container_name: open-webui
    ports:
      - "8282:8080"
    environment:
      - OLLAMA_API_BASE_URL=http://ollama:11434
      - AIOHTTP_CLIENT_TIMEOUT=10
    volumes:
      - open-webui:/app/backend/data
    restart: unless-stopped
    networks:
      - open-webui-network
      - llm
  pipelines:
    image: ghcr.io/open-webui/pipelines:main
    container_name: pipelines
    restart: always
    ports:
      - "9099:9099"
    volumes:
      - pipelines:/app/pipelines
    extra_hosts:
      - "host.docker.internal:host-gateway"
    networks:
      - open-webui-network
      - llm
networks:
  open-webui-network:
    external: true
  llm:
    external: true
volumes:
  pipelines:
  open-webui:
#########
VLLM docker-compose.yml
#########
services:
  vllm:
    image: vllm/vllm-openai:latest
    runtime: nvidia
    container_name: vllm
    ipc: host
    volumes:
      - ~/.cache/huggingface:/root/.cache/huggingface
      - ./models:/app/models
      - ./model_config.yml:/app/model_config.yml
    environment:
      - HUGGING_FACE_HUB_TOKEN=hf_stuff
      - CUDA_VISIBLE_DEVICES=0,1
      - VLLM_USE_CUDA_EXTENSION=1
      - VLLM_ALLOW_LONG_MAX_MODEL_LEN=1
    ports:
      - "8007:8000"
    shm_size: "10gb"
    command: [
      "--model=/app/models/Qwen-QwQ-AWQ",
      "--tensor-parallel-size=2",
      "--quantization=awq",
      "--max-model-len=32768",
      "--trust-remote-code",
      "--host=0.0.0.0",
      "--gpu-memory-utilization=0.97",
      "--enforce-eager",
      "--uvicorn-log-level=debug"
    ]
    networks:
      - llm
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
networks:
  llm:
    external: true