In the streaming process, a crash may occur when the output tokens reach thousands or tens of thousands #2863

Closed
opened 2025-11-11 15:15:59 -06:00 by GiteaMirror · 0 comments

Originally created by @Nikoyyy on GitHub (Nov 28, 2024).

Installation Method

[docker run -d -p 3001:8080 --security-opt=seccomp=unconfined --privileged --mount=type=bind,source=/sys/fs/cgroup,target=/sys/fs/cgroup,readonly=false --mount=type=bind,source=/proc,target=/proc2,readonly=false,bind-recursive=disabled -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main]

Environment

  • Open WebUI Version: [v0.4.6]

  • Ollama (if applicable): []

  • Operating System: [Windows 10]

  • Browser (if applicable): [Version 131.0.6778.86 (Official Build) (64-bit)]

Confirmation:

  • [x] I have read and followed all the instructions provided in the README.md.
  • [x] I am on the latest version of both Open WebUI and Ollama.
  • [x] I have included the browser console logs.
  • [x] I have included the Docker container logs.
  • [x] I have provided the exact steps to reproduce the bug in the "Steps to Reproduce" section below.

Expected Behavior:

The streaming output should reach thousands or tens of thousands of tokens without crashing.

Actual Behavior:

When streaming reaches thousands or tens of thousands of tokens, crashes may occur.

Description

Bug Summary:
When streaming reaches thousands or tens of thousands of tokens, crashes may occur.

Reproduction Details

Steps to Reproduce:
Using the QwQ-32B-Preview model, input: There exist real numbers x and y, both greater than 1, such that log_x(y^x) = log_y(x^(4y)) = 10. Find xy.
The output is very likely to reach thousands or tens of thousands of tokens; as it grows, the interface becomes noticeably slower until the tab eventually crashes.
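Since the slowdown grows with output length, it can help to check whether the backend stream itself survives past a few thousand tokens, which would isolate the crash to the web UI's rendering rather than the model or server. Below is a minimal sketch that streams a deliberately long generation directly from Ollama's `/api/generate` NDJSON endpoint, bypassing Open WebUI; the base URL and model name are placeholders for the local setup, not part of this report.

```python
import json
import urllib.request


def accumulate_stream(ndjson_lines):
    """Accumulate Ollama-style NDJSON stream chunks into one string.

    Each line is expected to be a JSON object carrying a "response"
    text fragment, with "done": true on the final chunk.
    Returns (full_text, chunk_count).
    """
    parts = []
    count = 0
    for line in ndjson_lines:
        if not line.strip():
            continue
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        count += 1
        if chunk.get("done"):
            break
    return "".join(parts), count


def probe_long_stream(base_url="http://localhost:11434", model="qwq"):
    """Stream a long generation directly from Ollama (placeholder
    base_url/model) to check whether the backend stream completes."""
    req = urllib.request.Request(
        base_url + "/api/generate",
        data=json.dumps({
            "model": model,
            "prompt": "Count from 1 to 5000, one number per line.",
            "stream": True,
        }).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # The response body is newline-delimited JSON, one chunk per line.
        return accumulate_stream(raw.decode() for raw in resp)
```

If `probe_long_stream` finishes cleanly while the browser tab still crashes on the same prompt, the problem is likely the frontend re-rendering an ever-growing message, not the stream itself.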

Logs and Screenshots

Screenshots/Screen Recordings (if applicable):
![44bb4023fd3db5cffebd2a50b00c5f54](https://github.com/user-attachments/assets/0daabdd1-d9f1-49a5-b9fb-9077102d4cd6)


Reference: github-starred/open-webui#2863