[PR #20215] [CLOSED] Fix: auto-select whisper compute type for CUDA #25508

Closed
opened 2026-04-20 05:58:08 -05:00 by GiteaMirror · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/open-webui/open-webui/pull/20215
Author: @ALIENvsROBOT
Created: 12/27/2025
Status: Closed

Base: dev ← Head: fix/whisper-cuda-compute-type


📝 Commits (1)

  • 9355184 Fix: auto-select whisper compute type for CUDA

📊 Changes

3 files changed (+109 additions, -15 deletions)


📝 backend/open_webui/config.py (+6 -0)
📝 backend/open_webui/main.py (+2 -0)
📝 backend/open_webui/routers/audio.py (+101 -15)

📄 Description

Pull Request Checklist

Note to first-time contributors: Please open a discussion post in Discussions (https://github.com/open-webui/open-webui/discussions) to discuss your idea/fix with the community, and describe your changes, before creating a pull request.

This ensures large feature PRs are discussed with the community before work begins. If the community does not want a feature, or it is not relevant to Open WebUI as a project, that can be identified in the discussion before work starts and the PR is submitted.

Before submitting, make sure you've checked the following:

  • [x] Target branch: PR targets dev (required).
  • [x] Description: Detailed description included below.
  • [x] Changelog: Added below in Keep a Changelog format.
  • [ ] Documentation: No new external env vars introduced; docs update not required.
  • [ ] Dependencies: No new dependencies added.
  • [x] Testing: Manual tests performed (see Testing section).
  • [x] Agentic AI Code: Code reviewed + manually tested by a human before submission.
  • [x] Code review: Self‑review completed; changes follow existing code style.
  • [x] Title Prefix: fix.

———

Description (Detailed)

Problem

CUDA builds fail Whisper initialization because the compute type was hard‑coded to "int8" in set_faster_whisper_model(). faster‑whisper on CUDA does not accept int8 directly; it expects float16 or int8_float16. This raises a ValueError and breaks Whisper on GPU images.

Fixes: #20173

Fix (What Changed)

This PR makes compute‑type selection device‑aware and adds a safe fallback:

  1. Auto compute‑type selection
    • If device is CUDA → default float16
    • If device is CPU → default int8
  2. CUDA‑safe mapping
    • If user/config provides int8 while device is CUDA → map to int8_float16
  3. CUDA fallback
    • If faster‑whisper throws ValueError on CUDA with chosen compute type → retry with float16
  4. Config validation at save time
    • WHISPER_COMPUTE_TYPE normalized during update_audio_config() to prevent invalid CUDA values from persisting

Files Touched

  • backend/open_webui/routers/audio.py
    • Added compute type normalization + selection
    • Updated model init to use device‑appropriate compute type with fallback
    • Config update maps invalid CUDA values
  • backend/open_webui/config.py
    • Added persistent config entry for audio.stt.whisper_compute_type
  • backend/open_webui/main.py
    • Wires WHISPER_COMPUTE_TYPE into app state
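The CUDA fallback (step 3 above) can be sketched as a generic retry wrapper; names here are hypothetical, and the real code in audio.py would call faster_whisper.WhisperModel directly rather than take a factory:

```python
from typing import Callable, TypeVar

T = TypeVar("T")

def init_with_cuda_fallback(
    factory: Callable[[str], T], device: str, compute_type: str
) -> T:
    """Build the model with the chosen compute type; on CUDA, retry once
    with float16 if the backend rejects the value with a ValueError."""
    try:
        return factory(compute_type)
    except ValueError:
        if device == "cuda" and compute_type != "float16":
            return factory("float16")
        raise
```

In audio.py the factory would be something like `lambda ct: WhisperModel(model_name, device=device, compute_type=ct)`, so a bad persisted value still yields a working model instead of a crash.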

Unaffected Areas

  • External STT engines: OpenAI, Deepgram, Azure, Mistral (untouched)
  • TTS engines: OpenAI, ElevenLabs, Azure (untouched)
  • Non‑audio services: no changes

———

Changelog Entry

Description

  • Fixes Whisper CUDA initialization by selecting valid compute types and adding safe fallback behavior (#20173).

Added

  • Device‑aware compute‑type resolution (CUDA → float16, CPU → int8).
  • CUDA mapping for int8 → int8_float16.

Changed

  • Faster‑whisper model init now resolves compute type from device/config.
  • CUDA init failures retry with float16.

Deprecated

  • None.

Removed

  • None.

Fixed

  • CUDA Whisper crash caused by invalid compute_type=int8 (#20173).

Security

  • None.

Breaking Changes

  • BREAKING CHANGE: None.

———

Testing (Manual)

  • CUDA build: ghcr.io/open-webui/open-webui:cuda starts successfully.
  • Whisper load: model initializes without ValueError.
  • Default CUDA compute: resolves to float16.
  • Regression test: forced compute type int8 on CUDA maps to int8_float16 (no crash).

Screenshots or Videos

  • N/A (backend fix)

Contributor License Agreement

By submitting this pull request, I confirm that I have read and fully agree to the Contributor License Agreement (CLA), and I am providing my contributions under its terms.

Note

Deleting the CLA section will lead to immediate closure of your PR and it will not be merged in.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.


Reference: github-starred/open-webui#25508