mirror of
https://github.com/open-webui/open-webui.git
synced 2026-05-06 02:48:13 -05:00
[PR #20220] [CLOSED] Fix/whisper cuda compute type #25509
📋 Pull Request Information
Original PR: https://github.com/open-webui/open-webui/pull/20220
Author: @ALIENvsROBOT
Created: 12/28/2025
Status: ❌ Closed
Base: dev ← Head: fix/whisper-cuda-compute-type
📝 Commits (3)
- 9355184 Fix: auto-select whisper compute type for CUDA
- 7600cb3 Fix: force float16 for CUDA whisper compute type
- 82c7be1 Fix: add CUDA compute type fallbacks for whisper
📊 Changes
3 files changed (+131 additions, -15 deletions)
View changed files
📝 backend/open_webui/config.py (+6 -0)
📝 backend/open_webui/main.py (+2 -0)
📝 backend/open_webui/routers/audio.py (+123 -15)
📄 Description
Pull Request Checklist
Note to first-time contributors: Please open a discussion post in Discussions to discuss your idea/fix with the community before creating a pull request, and describe your changes before submitting a pull request.
This is to ensure that large feature PRs are discussed with the community before work starts. If the community does not want a feature, or it is not relevant for Open WebUI as a project, that can be identified in the discussion before work begins and the PR is submitted.
Before submitting, make sure you've checked the following:
- Target branch: dev
- Change type: fix
Description (Detailed)
Problem
CUDA builds fail Whisper initialization because compute_type was hard-coded to "int8". Faster-whisper uses CTranslate2, and CTranslate2 does not always support int8 or int8_float16 on every CUDA build / GPU capability. This causes a ValueError and breaks Whisper on GPU images.
Fixes: #20173
Why
Why float16 works but int8/int8_float16 fail: float16 is broadly supported on modern NVIDIA GPUs, while int8 and int8_float16 require specific int8 kernels and may be unsupported (especially on newer GPUs / recent architectures); when a type is unsupported, CTranslate2 raises a ValueError.
Reference: https://opennmt.net/CTranslate2/quantization.html
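Since an unsupported type surfaces as a ValueError at load time, a fallback can be driven by catching it. A minimal pure-Python sketch of that pattern (the `init_with_fallback` helper and `fake_loader` are illustrative stand-ins, not faster-whisper's real API):

```python
# Illustrative fallback-on-ValueError pattern; `load_model` is any callable
# that raises ValueError for an unsupported compute type (as CTranslate2 does).

def init_with_fallback(load_model, compute_types):
    """Try each compute type in order; return (model, type) for the first that loads."""
    last_error = None
    for ct in compute_types:
        try:
            return load_model(compute_type=ct), ct
        except ValueError as exc:
            last_error = exc  # remember why this type failed, try the next one
    raise last_error  # every candidate was rejected

# Stand-in loader: pretend this build only supports int8_float16.
def fake_loader(compute_type):
    if compute_type != "int8_float16":
        raise ValueError(f"unsupported compute type: {compute_type}")
    return object()

model, chosen = init_with_fallback(fake_loader, ["float16", "int8_float16", "int8"])
print(chosen)  # int8_float16
```

With a real faster-whisper model the loader would wrap the model constructor instead of `fake_loader`.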
Fix (What Changed)
This PR makes Whisper compute-type selection CUDA-safe and deterministic, with fallbacks:
- CUDA mapping to float16: int8 or int8_float16 on CUDA → force float16.
- Device-aware default: float16 on CUDA, int8 on CPU.
- CUDA fallback chain: float16 → int8_float16 → int8.
- Config guard.
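The mapping, device-aware default, and fallback chain above can be sketched in pure Python; the function name `select_compute_type` and the `supported` set are illustrative assumptions, not the PR's actual code:

```python
# Sketch of the compute-type selection described above; names are illustrative.
from typing import Optional, Set

CUDA_FALLBACK_CHAIN = ["float16", "int8_float16", "int8"]

def select_compute_type(device: str, requested: Optional[str], supported: Set[str]) -> str:
    """Pick a CTranslate2 compute type that is safe for the given device."""
    if device != "cuda":
        # Device-aware default: CPU keeps int8.
        return requested or "int8"
    # CUDA mapping: int8 / int8_float16 (and no request) are forced to float16.
    if requested in (None, "int8", "int8_float16"):
        requested = "float16"
    # CUDA fallback chain: try the requested type, then the chain in order.
    for candidate in [requested] + [t for t in CUDA_FALLBACK_CHAIN if t != requested]:
        if candidate in supported:
            return candidate
    return "float16"  # last resort: the most broadly supported type

print(select_compute_type("cpu", None, set()))                   # int8
print(select_compute_type("cuda", "int8", {"float16"}))          # float16
print(select_compute_type("cuda", "float16", {"int8_float16"}))  # int8_float16
```

In practice the `supported` set could come from probing the runtime rather than being passed in.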
Files Touched
- backend/open_webui/routers/audio.py: forces float16 for CUDA (int8/int8_float16); CPU default stays int8.
Changelog Entry
Description
Added
- CUDA compute-type mapping: int8/int8_float16 → float16.
- CUDA fallback chain: float16 → int8_float16 → int8.
Changed
- Default Whisper compute type: float16 on CUDA and int8 on CPU.
Deprecated
Removed
Fixed
- ValueError on CUDA caused by an unsupported compute_type (#20173).
Security
Breaking Changes
Testing (Manual)
- ghcr.io/open-webui/open-webui:cuda booted successfully.
- No ValueError during Whisper initialization.
- Whisper initialized with float16.
- Setting int8/int8_float16 on CUDA does not crash (forced to float16 or falls back).
Screenshots or Videos
Contributor License Agreement
By submitting this pull request, I confirm that I have read and fully agree to the Contributor License Agreement (CLA), and I am providing my contributions under its terms.
🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.