[GH-ISSUE #20173] issue: STT local whisper issue (Please don't turn this to discussion) #57778
Originally created by @ALIENvsROBOT on GitHub (Dec 25, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/20173
Check Existing Issues
Installation Method
Docker
Open WebUI Version
v0.6.43 CUDA docker image
Ollama Version (if applicable)
No response
Operating System
Ubuntu 24
Browser (if applicable)
No response
Confirmation
I have read and followed all instructions in README.md.
Expected Behavior
Local Whisper STT should transcribe audio successfully on CUDA, as it did on v0.6.41.
Actual Behavior
Transcription fails with the int8 compute_type ValueError shown in the logs below.
Steps to Reproduce
Run with Docker CUDA support and try to use local Whisper. See the "Additional Information" section.
Logs & Screenshots
File "/usr/local/lib/python3.11/site-packages/starlette/_utils.py", line 85, in collapse_excgroups
File "/usr/local/lib/python3.11/site-packages/starlette/middleware/base.py", line 193, in call
File "/app/backend/open_webui/main.py", line 1316, in dispatch
File "/usr/local/lib/python3.11/site-packages/starlette/middleware/base.py", line 168, in call_next
File "/usr/local/lib/python3.11/site-packages/starlette/middleware/base.py", line 144, in coro
File "/usr/local/lib/python3.11/site-packages/starlette/middleware/base.py", line 191, in call
File "/usr/local/lib/python3.11/contextlib.py", line 158, in exit
File "/usr/local/lib/python3.11/site-packages/starlette/_utils.py", line 85, in collapse_excgroups
File "/usr/local/lib/python3.11/site-packages/starlette/middleware/base.py", line 193, in call
File "/app/backend/open_webui/utils/security_headers.py", line 11, in dispatch
File "/usr/local/lib/python3.11/site-packages/starlette/middleware/base.py", line 168, in call_next
File "/usr/local/lib/python3.11/site-packages/starlette/middleware/base.py", line 144, in coro
File "/usr/local/lib/python3.11/site-packages/starlette/middleware/base.py", line 191, in call
File "/usr/local/lib/python3.11/contextlib.py", line 158, in exit
File "/usr/local/lib/python3.11/site-packages/starlette/_utils.py", line 85, in collapse_excgroups
File "/usr/local/lib/python3.11/site-packages/starlette/middleware/base.py", line 193, in call
File "/app/backend/open_webui/main.py", line 1272, in dispatch
File "/usr/local/lib/python3.11/site-packages/starlette/middleware/base.py", line 168, in call_next
File "/usr/local/lib/python3.11/site-packages/starlette/middleware/base.py", line 144, in coro
File "/usr/local/lib/python3.11/site-packages/starlette_compress/init.py", line 94, in call
File "/usr/local/lib/python3.11/site-packages/starlette_compress/_brotli.py", line 106, in call
File "/usr/local/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 63, in call
File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
File "/usr/local/lib/python3.11/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in call
File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 716, in call
File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 736, in app
File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 290, in handle
File "/usr/local/lib/python3.11/site-packages/fastapi/routing.py", line 117, in app
File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
File "/usr/local/lib/python3.11/site-packages/fastapi/routing.py", line 103, in app
File "/usr/local/lib/python3.11/site-packages/fastapi/routing.py", line 424, in app
File "/usr/local/lib/python3.11/site-packages/fastapi/routing.py", line 310, in run_endpoint_function
File "/app/backend/open_webui/routers/audio.py", line 274, in update_audio_config
File "/app/backend/open_webui/routers/audio.py", line 144, in set_faster_whisper_model
File "/usr/local/lib/python3.11/site-packages/faster_whisper/transcribe.py", line 689, in init
ValueError: Requested int8 compute type, but the target device or backend do not support efficient int8 computation.
Additional Information
Likely root cause
Open WebUI builds the faster-whisper model args roughly like:
device = "cuda"when CUDA is enabled (viaUSE_CUDA_DOCKER→DEVICE_TYPE="cuda"logic)compute_typeis hard-coded to"int8"regardless of deviceThat combination can fail on some CUDA/backends, because CTranslate2 may not support efficient int8 on the active GPU/backend (float16 is typically the safe CUDA default). This is documented upstream in faster-whisper issues as the same exception when using
compute_type="int8"on CUDA.So: CUDA path + forced int8 => hard failure.
Why this shows up specifically now (0.6.41 → 0.6.43)
v0.6.42 explicitly warns about dependency version bumps, which likely include the faster-whisper/ctranslate2 wheels and can change backend capability checks and behavior.
v0.6.43 includes STT refactors (e.g. the MIME/content-type handling fix in commit 4ab917c), which may have changed the request flow so that more uploads reliably reach the model init path, but the underlying CUDA+int8 incompatibility remains.
Recommended fix (maintainer-friendly)
Stop forcing `compute_type="int8"` on CUDA by default:
- `device == "cuda"` → default `compute_type="float16"` (or `int8_float16` if you want partial-quant speedups)
- `device == "cpu"` → default `compute_type="int8"` (current behavior is fine)

Add an override env var so users can control it without patching source: `WHISPER_COMPUTE_TYPE` (or `AUDIO_STT_COMPUTE_TYPE`), accepting `float16`, `int8_float16`, `int8`, etc.
Optional robustness: if `WhisperModel` init raises the "Requested int8 compute type…" ValueError, automatically retry with `float16` and log a warning, as in the sketch below.
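A minimal sketch of that combination, assuming `WHISPER_COMPUTE_TYPE` as the proposed (not yet existing) env var; `load_whisper_model` is a hypothetical helper, not Open WebUI's actual code:

```python
import logging
import os

from faster_whisper import WhisperModel

log = logging.getLogger(__name__)


def load_whisper_model(model: str, device: str, download_root: str) -> WhisperModel:
    # Default: float16 on CUDA, int8 on CPU (current CPU behavior).
    default = "float16" if device == "cuda" else "int8"
    # Proposed override env var (hypothetical).
    compute_type = os.environ.get("WHISPER_COMPUTE_TYPE", default)
    try:
        return WhisperModel(
            model, device=device, compute_type=compute_type, download_root=download_root
        )
    except ValueError as err:
        # CTranslate2 raises ValueError when the backend lacks efficient
        # int8 kernels; retry once with the safe CUDA default.
        if "int8" in str(err) and compute_type != "float16":
            log.warning(
                "compute_type=%s unsupported (%s); retrying with float16",
                compute_type,
                err,
            )
            return WhisperModel(
                model, device=device, compute_type="float16", download_root=download_root
            )
        raise
```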
References
- 4ab917c (fix/refac: stt default content type)
- Upstream faster-whisper issue reporting the same ValueError with `compute_type="int8"` on CUDA and recommending float16.

@owui-terminator[bot] commented on GitHub (Dec 25, 2025):
🔍 Similar Issues Found
I found some existing issues that might be related to this one. Please check if any of these are duplicates or contain helpful solutions:
#20125 issue: STT local whisper failure
by ALIENvsROBOT • Dec 22, 2025 • bug
#20019 issue:
by j63440490 • Dec 17, 2025 • bug
#19777 issue:
by Yaute7 • Dec 05, 2025 • bug
#20092 issue:
by VideoRyan • Dec 22, 2025 • bug
#19864 issue:
by Haervwe • Dec 10, 2025 • bug
#19877 issue:
by dotmobo • Dec 11, 2025 • bug
#20046 issue:
by pierrelouisbescond • Dec 19, 2025 • bug
#19861 issue:
by QuitHub • Dec 10, 2025 • bug
#19563 issue:
by naruto7g • Nov 28, 2025 • bug
#19211 issue:
by Byrnes9 • Nov 16, 2025 • bug
This comment was generated automatically by a bot. Please react with a 👍 if this comment was helpful, or a 👎 if it was not.
@ALIENvsROBOT commented on GitHub (Dec 26, 2025):
@tjbck Could you please reproduce this bug?
@ALIENvsROBOT commented on GitHub (Dec 27, 2025):
I really want someone to fix this issue ASAP. Whisper is not working with my GPU on versions 0.6.42 and 0.6.43, but it worked in 0.6.41. Please provide a fix!
@Classic298 commented on GitHub (Dec 27, 2025):
@ALIENvsROBOT don't ping Tim on issues; he will check them when he has time.
If it is this urgent to you, feel free to work on a PR.
@ALIENvsROBOT commented on GitHub (Dec 27, 2025):
Really sorry. I need this for my prototype, which relies heavily on Whisper transcription.
@ALIENvsROBOT commented on GitHub (Dec 28, 2025):
The crash is caused by the hardcoded `compute_type="int8"` default interacting with recent faster-whisper updates, which now strictly reject int8 on Linux/Podman setups (like my NVIDIA Blackwell GPU) where the required kernels aren't available.
Unlike Windows (which handles this via driver emulation), the Linux backend throws a ValueError immediately when the hardware/driver combo doesn't natively support the requested int8 precision.
My testing confirms that float16 is the only universally stable compute type across CUDA architectures, whereas int8 support is inconsistent and hardware-dependent.
To fix this, I updated the logic to default CUDA devices to float16 (the safe standard) and implemented a dynamic fallback chain that only attempts int8 if float16 is unavailable, as sketched below.
This ensures the container initializes reliably on all GPU generations without crashing due to strict quantization requirements.
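For illustration only, a minimal version of such a fallback chain might look like this (`init_with_fallback` is a hypothetical helper, not the actual patch):

```python
from faster_whisper import WhisperModel


def init_with_fallback(model: str, device: str, **kwargs) -> WhisperModel:
    # Try float16 first on CUDA; only attempt int8 variants if the
    # preceding compute type is rejected by the backend.
    chain = ["float16", "int8_float16", "int8"] if device == "cuda" else ["int8"]
    last_err: Exception | None = None
    for compute_type in chain:
        try:
            return WhisperModel(model, device=device, compute_type=compute_type, **kwargs)
        except ValueError as err:  # unsupported compute type on this backend
            last_err = err
    raise last_err
```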
@dohabandit commented on GitHub (Feb 25, 2026):
I have the same issue, although switching the compute type to float16 or float32 didn't work with my Tesla V100S. I get the same errors when it attempts to process chunks of audio:
I decided to switch the whisper STT to CPU, but there wasn't an easy way to do that.
I made changes to audio.py and env.py to add an env variable "WHISPER_DEVICE_TYPE", which I set to "cpu".
```
# whisper is broken with V100S and CUDA12 libraries, use CPU
WHISPER_DEVICE_TYPE=cpu
WHISPER_COMPUTE_TYPE=int8
WHISPER_MODEL_AUTO_UPDATE=false
# cpu,cuda: float16 int8 int8_float16 float32
```
Also, I am running OUI locally in a venv, not in a Docker container. The env var that tells OUI to use CUDA is counterintuitive and I think it needs to be changed, because USE_CUDA_DOCKER must be set to true to use CUDA even when you are not using Docker:
```
# use the V100S
DEVICE_TYPE=cuda
USE_CUDA_DOCKER=true
CUDA_VISIBLE_DEVICES=0
```

Change to env.py:

```python
# device type embedding models - "cpu" (default), "cuda" (nvidia gpu required) or "mps" (apple silicon)
# - choosing this right can lead to better performance
USE_CUDA = os.environ.get("USE_CUDA_DOCKER", "false")
WHISPER_DEVICE_TYPE = os.environ.get("WHISPER_DEVICE_TYPE")
```

Change to audio.py:

```python
faster_whisper_kwargs = {
    "model_size_or_path": model,
    "device": "cuda" if WHISPER_DEVICE_TYPE == "cuda" else "cpu",
    "compute_type": WHISPER_COMPUTE_TYPE,
    "download_root": WHISPER_MODEL_DIR,
    "local_files_only": not auto_update,
}
```

This lets me set DEVICE_TYPE to cuda for everything else that OUI does, but have whisper/ctranslate2 use the CPU instead.
faster-whisper==1.2.1
ctranslate2==4.6.2
nvidia-cudnn-cu12==9.10.2.21
PyTorch: 2.9.1+cu128
CUDA available: True
CUDA version: 12.8
cuDNN version: 91002
ctranslate2: 4.6.2
GPU: Tesla V100S-PCIE-32GB