[GH-ISSUE #1270] MLC-LLM Quant./Backend Support #12422

Closed
opened 2026-04-19 19:20:50 -05:00 by GiteaMirror · 1 comment

Originally created by @BuildBackBuehler on GitHub (Mar 23, 2024).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/1270

**Is your feature request related to a problem? Please describe.**

I'm always frustrated when...I'm forced to use Ollama as a Mac user 🤪
Just to follow the format, hah. MLC-LLM is an oddly obscure project that offers relatively easy quantization (with impressive results) and is platform agnostic -- more than can be said of any other quantization format!

https://github.com/mlc-ai/mlc-llm

On top of that, it uses TVM to provide a backend that can be run as a server. They do have a web-based UI, but it's as barebones as can be.

**Describe the solution you'd like**

So I'd love to be able to easily run those quants/their backend and plug 'er into Open-WebUI seamlessly. I imagine I can run it via its API, but does that come with limitations? I've had issues getting that to work in the past; I'm not sure whether that was with OWUI.

**Describe alternatives you've considered**

I feel like a dolt who should be able to figure this out...but I'm unsure whether API routing would come at a speed cost, or leave me unable to reap the features/enhancements WebUI provides for one's LLM. Still, it's an alternate solution. (Well, I looked at the code, and it seems Ollama is API-based anyway.)

**Additional context**
Add any other context or screenshots about the feature request here.


@justinh-rahb commented on GitHub (Mar 23, 2024):

MLC-LLM uses the standard OpenAI API spec, so this is easily added to Open WebUI:
https://llm.mlc.ai/docs/deploy/rest.html
https://docs.openwebui.com/tutorial/openai
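
To make that concrete, here's a minimal sketch of talking to MLC-LLM's OpenAI-compatible REST endpoint directly. It assumes the server is already running and listening on `127.0.0.1:8000` (a common default; check the MLC-LLM REST docs linked above for your setup), and the model id string is a hypothetical placeholder for whatever model your server actually loads:

```python
import requests

# Assumed default: MLC-LLM REST server on 127.0.0.1:8000
# (see https://llm.mlc.ai/docs/deploy/rest.html); adjust to your setup.
BASE_URL = "http://127.0.0.1:8000/v1"

resp = requests.post(
    f"{BASE_URL}/chat/completions",
    json={
        # Hypothetical model id; use whatever model your server serves.
        "model": "Llama-3-8B-Instruct-q4f16_1-MLC",
        "messages": [{"role": "user", "content": "Hello from Open WebUI!"}],
        "stream": False,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```

Open WebUI connects the same way: add `http://127.0.0.1:8000/v1` as an OpenAI-compatible API connection in the admin settings, or set the `OPENAI_API_BASE_URL` environment variable, per the Open WebUI tutorial linked above.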

