[PR #12944] [CLOSED] Introducing Custom TTS Engine Support! Better than using OPENAI endpoint #23057

Closed
opened 2026-04-20 04:36:04 -05:00 by GiteaMirror · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/open-webui/open-webui/pull/12944
Author: @RedsAnalysis
Created: 4/16/2025
Status: Closed

Base: dev ← Head: customtts_v1


📝 Commits (4)

  • 9b9fc90 added a front-end element to show CustomTTS on the admin settings page and also changed the getvoices function to respond to the /audio/voices endpoint
  • 74e8df6 This commit allows users to integrate their own external TTS provider by:
  • d42fd9a END OF THIS BRANCH: Added CustomTTS support with user friendly interface, allowing users to select models and voice from a drop down.
  • 241fcce ADDED CORS_ALLOW_ORIGIN=http://localhost:5173 to the backend dev.sh script since i was running into a CORS error

📊 Changes

6 files changed (+450 additions, -46 deletions)

View changed files

📝 backend/dev.sh (+1 -1)
📝 backend/open_webui/config.py (+15 -0)
📝 backend/open_webui/main.py (+4 -0)
📝 backend/open_webui/routers/audio.py (+197 -0)
📝 src/lib/apis/audio/index.ts (+47 -23)
📝 src/lib/components/admin/Settings/Audio.svelte (+186 -22)

📄 Description

Approach 1: Using the Existing "OpenAI" Engine Setting (with Custom URL)

Pros:

  • No Backend Code Change Needed (Initially): For basic synthesis, if the custom server perfectly mimics the OpenAI /audio/speech endpoint and payload, it might work without modifying Open WebUI's backend code.
  • Simple Setup (if compatible): Only requires changing the API Base URL field.

Cons:

  • No Dynamic Voice/Model Discovery: Open WebUI won't attempt to fetch voice or model lists from the custom URL. Users see hardcoded OpenAI defaults (alloy, tts-1, etc.) or nothing in dropdowns/datalists.
  • Manual Input Required: Users must manually type the exact Voice ID and Model ID required by the custom server into the text fields, with no validation or selection assistance. Highly error-prone.
  • Poor User Experience: Difficult configuration, lack of guidance, and the risk of entering incorrect or non-existent voices/models.
  • Misleading Configuration: The UI indicates "OpenAI" is selected even though it is pointing to a different service, causing confusion.
  • Inflexible: Assumes the custom server uses exactly the same paths and behaviors as the official OpenAI API for synthesis; it cannot adapt if the custom server has slightly different requirements. (CustomTTS currently uses the same fixed paths; see the TODO below.)
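To make the constraint behind Approach 1 concrete, here is a minimal sketch of what "mimicking OpenAI" means in practice: the request body and the `/audio/speech` path are both fixed, so a server with a different route or payload shape simply fails. Function names are illustrative, not part of the PR's code.

```python
# Hypothetical sketch of Approach 1's assumption: the custom server must
# accept exactly the OpenAI-style text-to-speech request.

def build_speech_payload(model: str, voice: str, text: str) -> dict:
    """Build an OpenAI-compatible /audio/speech request body."""
    return {"model": model, "voice": voice, "input": text}


def speech_url(base_url: str) -> str:
    """The path is hardcoded; a server exposing a different route cannot work."""
    return base_url.rstrip("/") + "/audio/speech"
```

For example, pointing the base URL at `http://localhost:8880/v1` yields `http://localhost:8880/v1/audio/speech` with no way to override the suffix.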

Approach 2: Using the New "Custom TTS" Engine Setting (Your Implementation)

Pros:

  • Dynamic Voice Discovery: Actively fetches the voice list from the configured custom server's /audio/voices endpoint and populates a dropdown/select list.
  • Dynamic Model Discovery: Actively fetches the model list from the configured custom server's /models endpoint and populates a dropdown/select list.
  • Improved User Experience: Users can easily see and select the actual available voices and models from their specific custom server via intuitive dropdowns.
  • Accurate Configuration: Clearly indicates that a custom, non-standard engine is being used.
  • Reduced Errors: Selecting from a list prevents typos and ensures valid voice/model IDs are sent.
  • Clear Separation of Logic: Keeps the specific logic for handling custom/external servers separate from the standard OpenAI implementation.
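The discovery step described above can be sketched as two small parsers matching the response shapes the PR quotes in its TODO section (`{"voices": ["string", ...]}` for voices, `{"data": [{"id": ..., "name": ...}]}` for models). These helpers are illustrative, not the PR's actual code in `routers/audio.py`.

```python
# Sketch of parsing the custom server's discovery responses, assuming the
# two response shapes described in the PR. Function names are hypothetical.

def parse_voices(payload: dict) -> list[str]:
    """The custom /audio/voices endpoint returns {"voices": ["alloy", ...]}."""
    return list(payload.get("voices", []))


def parse_models(payload: dict) -> list[dict]:
    """The custom /models endpoint returns {"data": [{"id": ..., "name": ...}]};
    fall back to the id when a model entry has no display name."""
    return [
        {"id": m["id"], "name": m.get("name", m["id"])}
        for m in payload.get("data", [])
    ]
```

The frontend can then populate its dropdowns directly from these lists, which is what eliminates the manual-typing errors of Approach 1.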

TODO / Future Plans:

  • Enhance Custom TTS Flexibility & Compatibility:
    • Current Status: The Custom TTS engine is currently designed based on the response formats observed from an example server (specifically {"voices": ["string", ...]} for the voices list and {"data": [{"id": ..., "name": ...}]} for the models list) and assumes fixed relative API paths (/models, /audio/voices, /audio/speech).
    • Planned Enhancement: To achieve crucial compatibility with a wider range of external TTS APIs, the next step is to allow users to specify the exact API paths used for fetching models, voices, and synthesizing speech directly in the frontend settings.
    • Example: Instead of hardcoding /audio/speech, a user could input /generate/speech/v2 or /speak if that's what their specific external service requires.
    • Benefit: This will make the Custom TTS feature significantly more adaptable and powerful, enabling integration with many more diverse external services beyond those strictly following the initial conventions.
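One way the planned path-override enhancement could look is a small settings object whose fields default to the current hardcoded paths. This is a sketch under assumptions only; the field names, defaults, and API are not from the PR.

```python
from dataclasses import dataclass

# Hypothetical sketch of the planned enhancement: user-overridable endpoint
# paths instead of the fixed /models, /audio/voices, /audio/speech routes.

@dataclass
class CustomTTSPaths:
    models: str = "/models"
    voices: str = "/audio/voices"
    speech: str = "/audio/speech"

    def url(self, base_url: str, which: str) -> str:
        """Join the base URL with the configured path for 'models',
        'voices', or 'speech'."""
        return base_url.rstrip("/") + getattr(self, which)


# A user whose server exposes /speak would override only that one path:
paths = CustomTTSPaths(speech="/speak")
```

Keeping the standard paths as defaults means existing configurations keep working while servers like the `/generate/speech/v2` example become reachable.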

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

Reference: github-starred/open-webui#23057