[GH-ISSUE #20173] issue: STT local whisper issue (Please don't turn this to discussion) #57778
Originally created by @ALIENvsROBOT on GitHub (Dec 25, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/20173
Check Existing Issues
Installation Method
Docker
Open WebUI Version
v0.6.43 CUDA docker image
Ollama Version (if applicable)
No response
Operating System
Ubuntu 24
Browser (if applicable)
No response
Confirmation
I have read and followed all instructions in README.md.
Expected Behavior
Local Whisper STT should transcribe audio successfully on CUDA, as it did on v0.6.41.
Actual Behavior
Transcription fails with the int8 compute_type ValueError shown in the logs below.
Steps to Reproduce
Run with Docker CUDA support and try to use local Whisper. See the "Additional Information" section.
Logs & Screenshots
File "/usr/local/lib/python3.11/site-packages/starlette/_utils.py", line 85, in collapse_excgroups
File "/usr/local/lib/python3.11/site-packages/starlette/middleware/base.py", line 193, in call
File "/app/backend/open_webui/main.py", line 1316, in dispatch
File "/usr/local/lib/python3.11/site-packages/starlette/middleware/base.py", line 168, in call_next
File "/usr/local/lib/python3.11/site-packages/starlette/middleware/base.py", line 144, in coro
File "/usr/local/lib/python3.11/site-packages/starlette/middleware/base.py", line 191, in call
File "/usr/local/lib/python3.11/contextlib.py", line 158, in exit
File "/usr/local/lib/python3.11/site-packages/starlette/_utils.py", line 85, in collapse_excgroups
File "/usr/local/lib/python3.11/site-packages/starlette/middleware/base.py", line 193, in call
File "/app/backend/open_webui/utils/security_headers.py", line 11, in dispatch
File "/usr/local/lib/python3.11/site-packages/starlette/middleware/base.py", line 168, in call_next
File "/usr/local/lib/python3.11/site-packages/starlette/middleware/base.py", line 144, in coro
File "/usr/local/lib/python3.11/site-packages/starlette/middleware/base.py", line 191, in call
File "/usr/local/lib/python3.11/contextlib.py", line 158, in exit
File "/usr/local/lib/python3.11/site-packages/starlette/_utils.py", line 85, in collapse_excgroups
File "/usr/local/lib/python3.11/site-packages/starlette/middleware/base.py", line 193, in call
File "/app/backend/open_webui/main.py", line 1272, in dispatch
File "/usr/local/lib/python3.11/site-packages/starlette/middleware/base.py", line 168, in call_next
File "/usr/local/lib/python3.11/site-packages/starlette/middleware/base.py", line 144, in coro
File "/usr/local/lib/python3.11/site-packages/starlette_compress/init.py", line 94, in call
File "/usr/local/lib/python3.11/site-packages/starlette_compress/_brotli.py", line 106, in call
File "/usr/local/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 63, in call
File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
File "/usr/local/lib/python3.11/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in call
File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 716, in call
File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 736, in app
File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 290, in handle
File "/usr/local/lib/python3.11/site-packages/fastapi/routing.py", line 117, in app
File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
File "/usr/local/lib/python3.11/site-packages/fastapi/routing.py", line 103, in app
File "/usr/local/lib/python3.11/site-packages/fastapi/routing.py", line 424, in app
File "/usr/local/lib/python3.11/site-packages/fastapi/routing.py", line 310, in run_endpoint_function
File "/app/backend/open_webui/routers/audio.py", line 274, in update_audio_config
File "/app/backend/open_webui/routers/audio.py", line 144, in set_faster_whisper_model
File "/usr/local/lib/python3.11/site-packages/faster_whisper/transcribe.py", line 689, in init
ValueError: Requested int8 compute type, but the target device or backend do not support efficient int8 computation.
Additional Information
Likely root cause
Open WebUI builds the faster-whisper model args roughly like:
device = "cuda"when CUDA is enabled (viaUSE_CUDA_DOCKER→DEVICE_TYPE="cuda"logic)compute_typeis hard-coded to"int8"regardless of deviceThat combination can fail on some CUDA/backends, because CTranslate2 may not support efficient int8 on the active GPU/backend (float16 is typically the safe CUDA default). This is documented upstream in faster-whisper issues as the same exception when using
compute_type="int8"on CUDA.So: CUDA path + forced int8 => hard failure.
Why this shows up specifically now (0.6.41 → 0.6.43)
v0.6.42 explicitly warns about dependency version bumps, which likely include the faster-whisper/ctranslate2 wheels and can change backend capability checks and behavior.
v0.6.43 includes STT refactors (e.g. the MIME/content-type handling fix in commit 4ab917c), which may have changed the request flow so that more uploads reliably reach the model init path, but the underlying CUDA+int8 incompatibility remains.
Recommended fix (maintainer-friendly)
Stop forcing `compute_type="int8"` on CUDA by default:
- `device == "cuda"` → default `compute_type="float16"` (or `int8_float16` if you want partial-quant speedups)
- `device == "cpu"` → default `compute_type="int8"` (current behavior is fine)

Add an override env var so users can control it without patching source: `WHISPER_COMPUTE_TYPE` (or `AUDIO_STT_COMPUTE_TYPE`), accepting `float16`, `int8_float16`, `int8`, etc.
Optional robustness: if `WhisperModel` init raises the "Requested int8 compute type…" ValueError, automatically retry with `float16` and log a warning, as in the sketch below.
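A minimal sketch of that combination, assuming `WHISPER_COMPUTE_TYPE` as the proposed (not yet existing) env var; `load_whisper_model` is a hypothetical helper, not Open WebUI's actual code:

```python
import logging
import os

from faster_whisper import WhisperModel

log = logging.getLogger(__name__)


def load_whisper_model(model: str, device: str, download_root: str) -> WhisperModel:
    # Default: float16 on CUDA, int8 on CPU (current CPU behavior).
    default = "float16" if device == "cuda" else "int8"
    # Proposed override env var (hypothetical).
    compute_type = os.environ.get("WHISPER_COMPUTE_TYPE", default)
    try:
        return WhisperModel(
            model, device=device, compute_type=compute_type, download_root=download_root
        )
    except ValueError as err:
        # CTranslate2 raises ValueError when the backend lacks efficient
        # int8 kernels; retry once with the safe CUDA default.
        if "int8" in str(err) and compute_type != "float16":
            log.warning(
                "compute_type=%s unsupported (%s); retrying with float16",
                compute_type,
                err,
            )
            return WhisperModel(
                model, device=device, compute_type="float16", download_root=download_root
            )
        raise
```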
References
- 4ab917c (fix/refac: stt default content type)
- Upstream faster-whisper issue reporting the same ValueError with `compute_type="int8"` on CUDA and recommending float16.

@owui-terminator[bot] commented on GitHub (Dec 25, 2025):
🔍 Similar Issues Found
I found some existing issues that might be related to this one. Please check if any of these are duplicates or contain helpful solutions:
#20125 issue: STT local whisper failure
by ALIENvsROBOT • Dec 22, 2025 • bug
#20019 issue:
by j63440490 • Dec 17, 2025 • bug
#19777 issue:
by Yaute7 • Dec 05, 2025 • bug
#20092 issue:
by VideoRyan • Dec 22, 2025 • bug
#19864 issue:
by Haervwe • Dec 10, 2025 • bug
#19877 issue:
by dotmobo • Dec 11, 2025 • bug
#20046 issue:
by pierrelouisbescond • Dec 19, 2025 • bug
#19861 issue:
by QuitHub • Dec 10, 2025 • bug
#19563 issue:
by naruto7g • Nov 28, 2025 • bug
#19211 issue:
by Byrnes9 • Nov 16, 2025 • bug
This comment was generated automatically by a bot. Please react with a 👍 if this comment was helpful, or a 👎 if it was not.
@ALIENvsROBOT commented on GitHub (Dec 26, 2025):
@tjbck Could you please reproduce this bug?
@ALIENvsROBOT commented on GitHub (Dec 27, 2025):
I really want someone to fix this issue ASAP. Whisper is not working with my GPU on versions 0.6.42 and 0.6.43, but it worked in 0.6.41. Please provide a fix!
@Classic298 commented on GitHub (Dec 27, 2025):
@ALIENvsROBOT don't ping Tim on issues; he will check them when he has time.
If it is this urgent to you, feel free to work on a PR.
@ALIENvsROBOT commented on GitHub (Dec 27, 2025):
Really sorry. I need this for my prototype, which relies heavily on Whisper transcription.
@ALIENvsROBOT commented on GitHub (Dec 28, 2025):
The crash is caused by the hardcoded `compute_type="int8"` default interacting with recent faster-whisper updates, which now strictly reject int8 on Linux/Podman setups (like my NVIDIA Blackwell GPU) where the required kernels aren't available.
Unlike Windows (which handles this via driver emulation), the Linux backend throws a ValueError immediately when the hardware/driver combo doesn't natively support the requested int8 precision.
My testing confirms that float16 is the only universally stable compute type across CUDA architectures, whereas int8 support is inconsistent and hardware-dependent.
To fix this, I updated the logic to default CUDA devices to float16 (the safe standard) and implemented a dynamic fallback chain that only attempts int8 if float16 is unavailable, as sketched below.
This ensures the container initializes reliably on all GPU generations without crashing due to strict quantization requirements.
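For illustration only, a minimal version of such a fallback chain might look like this (`init_with_fallback` is a hypothetical helper, not the actual patch):

```python
from faster_whisper import WhisperModel


def init_with_fallback(model: str, device: str, **kwargs) -> WhisperModel:
    # Try float16 first on CUDA; only attempt int8 variants if the
    # preceding compute type is rejected by the backend.
    chain = ["float16", "int8_float16", "int8"] if device == "cuda" else ["int8"]
    last_err: Exception | None = None
    for compute_type in chain:
        try:
            return WhisperModel(model, device=device, compute_type=compute_type, **kwargs)
        except ValueError as err:  # unsupported compute type on this backend
            last_err = err
    raise last_err
```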
@dohabandit commented on GitHub (Feb 25, 2026):
I have the same issue, although switching the compute type to float16 or float32 didn't work with my Tesla V100S. I get the same errors when it attempts to process chunks of audio:
I decided to switch the whisper STT to CPU, but there wasn't an easy way to do that.
I made changes to audio.py and env.py to add an env variable "WHISPER_DEVICE_TYPE", which I set to "cpu".
```
# whisper is broken with V100S and CUDA12 libraries, use CPU
WHISPER_DEVICE_TYPE=cpu
WHISPER_COMPUTE_TYPE=int8
WHISPER_MODEL_AUTO_UPDATE=false
# cpu,cuda: float16 int8 int8_float16 float32
```
Also, I am running OUI locally in a venv, not in a Docker container. The env var that tells OUI to use CUDA is counterintuitive and I think it needs to be changed, because USE_CUDA_DOCKER must be set to true to use CUDA even when you are not using Docker:
```
# use the V100S
DEVICE_TYPE=cuda
USE_CUDA_DOCKER=true
CUDA_VISIBLE_DEVICES=0
```

Change to env.py:

```python
# device type embedding models - "cpu" (default), "cuda" (nvidia gpu required) or "mps" (apple silicon)
# - choosing this right can lead to better performance
USE_CUDA = os.environ.get("USE_CUDA_DOCKER", "false")
WHISPER_DEVICE_TYPE = os.environ.get("WHISPER_DEVICE_TYPE")
```

Change to audio.py:

```python
faster_whisper_kwargs = {
    "model_size_or_path": model,
    "device": "cuda" if WHISPER_DEVICE_TYPE == "cuda" else "cpu",
    "compute_type": WHISPER_COMPUTE_TYPE,
    "download_root": WHISPER_MODEL_DIR,
    "local_files_only": not auto_update,
}
```

This lets me set DEVICE_TYPE to cuda for everything else that OUI does, but have whisper/ctranslate2 use the CPU instead.
faster-whisper==1.2.1
ctranslate2==4.6.2
nvidia-cudnn-cu12==9.10.2.21
PyTorch: 2.9.1+cu128
CUDA available: True
CUDA version: 12.8
cuDNN version: 91002
ctranslate2: 4.6.2
GPU: Tesla V100S-PCIE-32GB