[GH-ISSUE #1270] MLC-LLM Quant./Backend Support #12422

Closed
opened 2026-04-19 19:20:50 -05:00 by GiteaMirror · 1 comment

Originally created by @BuildBackBuehler on GitHub (Mar 23, 2024).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/1270

**Is your feature request related to a problem? Please describe.**

I'm always frustrated when...I'm forced to use Ollama as a Mac user 🤪
Just to follow the format, hah. MLC-LLM is an oddly obscure project that offers relatively easy quantization (with impressive results) and is platform agnostic -- more than can be said of any other quantization format!

https://github.com/mlc-ai/mlc-llm

On top of that, it uses TVM to provide a backend that can be run as a server. They do have a web-based UI, but it's as barebones as can be.

**Describe the solution you'd like**

So I'd love to be able to easily run those quants/their backend and plug 'er into Open-WebUI seamlessly. I imagine I can run it via its API, but does that come with limitations? I've had issues getting that to work in the past; I'm not sure whether that was with OWUI.

**Describe alternatives you've considered**

I feel like a dolt who should be able to figure this out...but I'm unsure whether API routing would come at a speed cost, or leave me unable to reap the features/enhancements WebUI provides for one's LLM. Still, it's an alternate solution. (Well, I looked at the code, and it seems Ollama is API-based anyway.)

**Additional context**
Add any other context or screenshots about the feature request here.


@justinh-rahb commented on GitHub (Mar 23, 2024):

MLC-LLM uses the standard OpenAI API spec, so this is easily added to Open WebUI:
https://llm.mlc.ai/docs/deploy/rest.html
https://docs.openwebui.com/tutorial/openai
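
To make that concrete, here's a minimal sketch of talking to MLC-LLM's OpenAI-compatible REST endpoint directly. It assumes the server is already running and listening on `127.0.0.1:8000` (a common default; check the MLC-LLM REST docs linked above for your setup), and the model id string is a hypothetical placeholder for whatever model your server actually loads:

```python
import requests

# Assumed default: MLC-LLM REST server on 127.0.0.1:8000
# (see https://llm.mlc.ai/docs/deploy/rest.html); adjust to your setup.
BASE_URL = "http://127.0.0.1:8000/v1"

resp = requests.post(
    f"{BASE_URL}/chat/completions",
    json={
        # Hypothetical model id; use whatever model your server serves.
        "model": "Llama-3-8B-Instruct-q4f16_1-MLC",
        "messages": [{"role": "user", "content": "Hello from Open WebUI!"}],
        "stream": False,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```

Open WebUI connects the same way: add `http://127.0.0.1:8000/v1` as an OpenAI-compatible API connection in the admin settings, or set the `OPENAI_API_BASE_URL` environment variable, per the Open WebUI tutorial linked above.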

