[PR #20220] [CLOSED] Fix/whisper cuda compute type #48557

Opened 2026-04-30 00:34:30 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/open-webui/open-webui/pull/20220
Author: @ALIENvsROBOT
Created: 12/28/2025
Status: Closed

Base: dev ← Head: fix/whisper-cuda-compute-type


📝 Commits (3)

  • 9355184 Fix: auto-select whisper compute type for CUDA
  • 7600cb3 Fix: force float16 for CUDA whisper compute type
  • 82c7be1 Fix: add CUDA compute type fallbacks for whisper

📊 Changes

3 files changed (+131 additions, -15 deletions)


📝 backend/open_webui/config.py (+6 -0)
📝 backend/open_webui/main.py (+2 -0)
📝 backend/open_webui/routers/audio.py (+123 -15)

📄 Description

Pull Request Checklist

Note to first-time contributors: Please open a discussion post in Discussions to discuss your idea/fix with the community, and describe your changes there, before creating a pull request.

This is to ensure large feature PRs are discussed with the community first, before work starts on them. If the community does not want a feature, or it is not relevant for Open WebUI as a project, this can be identified in the discussion before the feature is built and the PR submitted.

Before submitting, make sure you've checked the following:

  • Target branch: PR targets dev.
  • Description: Detailed description included below.
  • Changelog: Added below in Keep a Changelog format.
  • Documentation: No docs update required (no new public env vars).
  • Dependencies: No new dependencies.
  • Testing: Manual tests performed (see Testing section).
  • Agentic AI Code: Code reviewed and manually tested by a human.
  • Code review: Self-review completed.
  • Title Prefix: fix.

Description (Detailed)

Problem

CUDA builds fail Whisper init because compute_type was hard-coded to "int8". Faster-whisper uses CTranslate2, and CTranslate2 does not always support int8 or int8_float16 on every CUDA build / GPU capability. This causes a ValueError and breaks Whisper on GPU images.

Fixes: #20173

Why float16 works but int8 / int8_float16 fail

  • faster-whisper relies on CTranslate2 for inference.
  • CTranslate2 only supports certain compute types depending on GPU capability and compiled kernels.
  • float16 is broadly supported on modern NVIDIA GPUs, while int8 and int8_float16 require specific int8 kernels and may be unsupported (especially on newer GPUs / recent architectures).
  • When unsupported, CTranslate2 rejects the compute type at model init, throwing a ValueError.

Reference: https://opennmt.net/CTranslate2/quantization.html
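The rejection described above can be illustrated with a small sketch. The names here (SUPPORTED_BY_DEVICE, check_compute_type) are hypothetical stand-ins for CTranslate2's internal capability check, not its actual API; they only model the behavior of a CUDA build that lacks int8 kernels:

```python
# Hypothetical sketch of CTranslate2's compute-type check; these names
# are illustrative and do not exist in the real library.
SUPPORTED_BY_DEVICE = {
    # A GPU/build without compiled int8 kernels supports only float types.
    "cuda": {"float32", "float16"},
    # CPU builds generally ship int8 kernels.
    "cpu": {"float32", "int8"},
}


def check_compute_type(device: str, compute_type: str) -> str:
    """Reject an unsupported compute type at model init, as CTranslate2 does."""
    if compute_type not in SUPPORTED_BY_DEVICE[device]:
        raise ValueError(
            f"Requested {compute_type} compute type, but the target device "
            f"or backend do not support efficient {compute_type} computation."
        )
    return compute_type
```

With this model, `check_compute_type("cuda", "int8")` raises ValueError while `check_compute_type("cuda", "float16")` succeeds, which matches the failure mode seen in #20173.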

Fix (What Changed)

This PR makes Whisper compute-type selection CUDA-safe and deterministic, with fallbacks:

  1. CUDA mapping to float16

    • If compute type is int8 or int8_float16 on CUDA → force float16.
  2. Device-aware default

    • CUDA → default float16
    • CPU → default int8
  3. CUDA fallback chain

    • If CUDA init fails, automatically retry in order: float16 → int8_float16 → int8.
  4. Config guard

    • Any invalid CUDA compute type is normalized before persisting.
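The four steps above can be sketched as follows. This is a minimal standalone approximation of the logic, not the actual code in audio.py; the loader callable stands in for the real WhisperModel constructor (e.g. `lambda ct: WhisperModel(name, device=device, compute_type=ct)`):

```python
from typing import Any, Callable, Optional

# Retry order when CUDA init fails (step 3).
CUDA_FALLBACK_CHAIN = ["float16", "int8_float16", "int8"]


def normalize_compute_type(device: str, compute_type: Optional[str]) -> str:
    """Steps 1-2: CUDA-safe mapping plus device-aware defaults."""
    if device == "cuda":
        # int8 / int8_float16 need int8 CUDA kernels that some builds lack,
        # so map them (and a missing value) to the broadly supported float16.
        if compute_type in (None, "int8", "int8_float16"):
            return "float16"
        return compute_type
    # CPU keeps the previous default of int8.
    return compute_type or "int8"


def load_with_fallback(
    device: str,
    compute_type: Optional[str],
    loader: Callable[[str], Any],
) -> Any:
    """Try the normalized type first; on CUDA, retry down the fallback chain."""
    first = normalize_compute_type(device, compute_type)
    candidates = [first]
    if device == "cuda":
        candidates += [c for c in CUDA_FALLBACK_CHAIN if c != first]
    last_error: Optional[ValueError] = None
    for candidate in candidates:
        try:
            return loader(candidate)
        except ValueError as exc:  # CTranslate2 rejects unsupported types this way
            last_error = exc
    raise last_error
```

Because only ValueError triggers a retry, genuine failures (missing model files, CUDA OOM) still surface immediately rather than being masked by the fallback loop.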

Files Touched

  • backend/open_webui/routers/audio.py
    • Normalize compute type to float16 for CUDA (int8/int8_float16).
    • Preserve CPU behavior (int8).
    • Add CUDA fallback chain and retry logic.

Changelog Entry

Description

  • Fixes Whisper CUDA initialization by forcing a stable compute type and adding a CUDA-safe fallback chain (#20173).

Added

  • CUDA mapping: int8 / int8_float16 → float16.
  • CUDA fallback chain: float16 → int8_float16 → int8.

Changed

  • Whisper compute type now defaults to float16 on CUDA and int8 on CPU.
  • CUDA failures retry with fallback compute types instead of crashing.

Deprecated

  • None.

Removed

  • None.

Fixed

  • CUDA Whisper crash caused by invalid/unstable compute_type (#20173).

Security

  • None.

Breaking Changes

  • BREAKING CHANGE: None.

Testing (Manual)

  • CUDA image: ghcr.io/open-webui/open-webui:cuda booted successfully.
  • Whisper init: no ValueError.
  • CUDA compute: resolves to float16.
  • Regression: int8 / int8_float16 on CUDA does not crash (forced to float16 or falls back).

Screenshots or Videos

  • N/A (backend fix)

Contributor License Agreement

By submitting this pull request, I confirm that I have read and fully agree to the Contributor License Agreement (CLA), and I am providing my contributions under its terms.

Note

Deleting the CLA section will lead to immediate closure of your PR and it will not be merged in.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-30 00:34:30 -05:00

Reference: github-starred/open-webui#48557