[GH-ISSUE #1437] feat: Tier 1 support for LocalAI #12497

Closed
opened 2026-04-19 19:25:59 -05:00 by GiteaMirror · 2 comments
Owner

Originally created by @lee-b on GitHub (Apr 5, 2024).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/1437

Originally assigned to: @tjbck on GitHub.

LocalAI is similar to Ollama, but aims to be a complete replacement for OpenAI, already including TTS, STT, and image generation (all of which Open-WebUI uses). It would be great to have tier 1 support for this.

GiteaMirror added the enhancement, good first issue, core labels 2026-04-19 19:25:59 -05:00
Author
Owner

@lenaxia commented on GitHub (Oct 19, 2024):

What does Tier 1 support entail in this case?

Author
Owner

@lee-b commented on GitHub (Oct 19, 2024):

Primarily, support for its named models and for asking it to pull down new models, as Ollama supports those things; but also support for recognising that it does TTS, STT, and image generation, so that you don't have to know and set the various URL permutations across different tabs and fields in the Open-WebUI admin pages to make it all work. Ideally its features would "just work" when you add the URL and tell Open-WebUI that it's a LocalAI server, much as adding an Ollama URL "just works".
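To illustrate what "recognising that it does TTS, STT, and image-gen" could look like, here is a minimal sketch that classifies the models an OpenAI-compatible server reports from its `GET /v1/models` response. The `"id"` field follows the OpenAI models schema; the keyword heuristics and function name are purely illustrative assumptions, not LocalAI's real tagging mechanism.

```python
# Hypothetical sketch: guess capabilities of a LocalAI-style server from the
# model list its OpenAI-compatible /v1/models endpoint returns. The keyword
# heuristics below are illustrative assumptions only.

def classify_capabilities(models_response: dict) -> dict:
    """Map each model id in a /v1/models-shaped response to a guessed capability."""
    keyword_caps = {
        "whisper": "speech-to-text",
        "tts": "text-to-speech",
        "stablediffusion": "text-to-image",
    }
    caps = {}
    for model in models_response.get("data", []):
        model_id = model["id"]
        capability = "text-generation"  # default assumption for unknown names
        for keyword, cap in keyword_caps.items():
            if keyword in model_id.lower():
                capability = cap
                break
        caps[model_id] = capability
    return caps
```

With a payload like `{"data": [{"id": "whisper-1"}, {"id": "gpt-4"}]}`, this would tag `whisper-1` as speech-to-text and `gpt-4` as text-generation. In practice LocalAI could report real capability tags directly, which would be far more robust than name sniffing.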

However, more broadly, here is what I observe in the UX, and what I propose instead:

### Current UX

![image](https://github.com/user-attachments/assets/a64afdff-2386-474c-9daa-1321d02ba9ff)

![image](https://github.com/user-attachments/assets/0f3c4581-9fa5-4610-8fc1-6dba41fcf242)

![image](https://github.com/user-attachments/assets/6531432f-2c68-431c-b811-959fe8565735)

![image](https://github.com/user-attachments/assets/fa39294a-cfc1-42ad-b491-5820cc2c10b1)

There is still a lot of focus on Ollama here, and a conceptual separation of Ollama from other providers, despite the renaming from ollama-webui to open-webui and the addition of generic OpenAI server support, audio (TTS/STT), and image support.

These all need to be configured separately, and Ollama is the only service that really feels like it has first-class support; the others feel tacked on, with features like model-name support missing or awkward. For example, with a llama.cpp server the currently running model's name always appears, but if you run a different model on the same server, all of your derived models in the workspace break, because the model name has changed.

Now, like Ollama, LocalAI supports multiple text-generation models, each with its own name, as well as pulling down new models. Additionally, it supports other types of models too, each with names and tags to identify their use cases / modalities:

![image](https://github.com/user-attachments/assets/074b2db3-28ac-4fb3-bf53-a5108f459c70)

![image](https://github.com/user-attachments/assets/2b476822-a61e-446f-a0f6-70411b338cb4)

AND, unlike Ollama, it supports this for other types of models: text-to-speech, speech-to-text, image generation. But none of this is directly supported or recognised in Open-WebUI in the way that Ollama's multiple-model support is recognised and used.

Additionally, multimodal models that can do end-to-end speech, speech-to-text, and text-to-speech (among other modalities, such as image-to-text) are a thing.

So it seems to me that, currently (and understandably), Ollama is the only provider truly supported as "tier 1" in Open-WebUI, but that expanding support for other providers would improve both the user experience and Open-WebUI's core design.

### Proposed UX changes

Genericizing the configuration for a more LocalAI-like approach, via a generic plugin system, would be beneficial, such that:

i) any server plugin can provide 1...N named models (e.g., ollama, LocalAI)
ii) each provided model can have 1...N modality/capability tags (text-gen, text-to-speech, speech-to-text, speech-to-speech, text-embedding, text-to-image, image-to-text, text-to-video, text-only-ocr, ocr-with-table-reflow, etc.)
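The two properties above could be captured in a small data model. Here is a sketch; every name in it (`ServerPlugin`, `ProvidedModel`, `models_with_tag`) is hypothetical and nothing like it exists in Open-WebUI today:

```python
# Hypothetical sketch of the proposed plugin data model: a server plugin
# exposes 1..N named models, each carrying 1..N modality/capability tags.
from dataclasses import dataclass, field


@dataclass
class ProvidedModel:
    name: str
    tags: list[str]  # e.g. ["text-gen"], ["speech-to-text"], ["text-to-image"]


@dataclass
class ServerPlugin:
    provider: str   # e.g. "ollama", "localai"
    base_url: str
    models: list[ProvidedModel] = field(default_factory=list)

    def models_with_tag(self, tag: str) -> list[str]:
        """Let the UI find, say, every text-to-speech model a provider offers."""
        return [m.name for m in self.models if tag in m.tags]
```

The audio tab, for instance, could then populate its voice/model dropdown from `models_with_tag("text-to-speech")` across all registered plugins, instead of needing its own URL fields.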

And so (with apologies for the quick mock-up), I think the admin UI should work a bit more like this, both to allow for alternative providers now and to better position Open-WebUI for all of the multimodal and competing-provider developments that are either here or on their way:

![image](https://github.com/user-attachments/assets/fd2ebaa8-53c9-42b5-9896-abb95e641c36)

Then the models tab could be a list of all available models, with options to pull down new ones. The audio/images tabs would not be cluttered with URL details, but would provide more specialised options: which voice to use by default, whether to use a dedicated "ocr" type of model or whether an "image-to-text" model is allowed for images, whether you approve local or remote text-to-speech, etc.


Reference: github-starred/open-webui#12497