[GH-ISSUE #7353] Does Ollama have plans to support other model types, such as TTS, graphics, video, etc.? #30433

Open
opened 2026-04-22 10:03:06 -05:00 by GiteaMirror · 7 comments
Owner

Originally created by @E218PQ on GitHub (Oct 25, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/7353

We deeply appreciate the convenience, speed, and power of Ollama. To serve more application scenarios, we hope Ollama can add support for other model categories, such as text-to-speech, text-to-image, and text-to-video. With the rapid development of AI, demand for these capabilities will only grow. We hope you will consider this carefully.

GiteaMirror added the model label 2026-04-22 10:03:06 -05:00

@AncientMystic commented on GitHub (Oct 30, 2024):

This would be interesting to see; it could be incorporated into text generation as a way to speak with the AI, use it as an assistant, visualise data/responses, or even create a video assistant.

It would certainly take a LOT of work and computing power currently to achieve any of this.

While AI is advancing rapidly, I think this might be putting the cart before the horse at this stage.

AI performance is still really limited; results are not great, and it takes a LOT of computing power to get there. But it would be cool to see more options.

That's just my personal take on the matter, though.


@E218PQ commented on GitHub (Nov 5, 2024):

> This would be interesting to see; it could be incorporated into text generation as a way to speak with the AI, use it as an assistant, visualise data/responses, or even create a video assistant.
>
> It would certainly take a LOT of work and computing power currently to achieve any of this.
>
> While AI is advancing rapidly, I think this might be putting the cart before the horse at this stage.
>
> AI performance is still really poor; results are not great, and it takes a LOT of computing power to get there. But it would be cool to see more options.
>
> That's just my personal take on the matter, though.

Yes, various applications of AI currently require high hardware performance, but we hope Ollama can still expand its support to other types of models, such as RAG, TTS, etc. In many scenarios that would remove the need to deploy Ollama and xinference side by side, which would simplify the environment and lower the barrier for beginners.


@jcc10 commented on GitHub (Feb 26, 2025):

I would appreciate it specifically for ensuring that I don't wind up having multiple systems competing for the same GPU. Having Ollama manage the GPU would be preferable to my current bodge of having to make requests to Ollama, then forcing the model to unload before manually loading the TTS model. (That, and it would make it easier to see who is responsible for the GPU load.)
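As an aside for anyone using the same workaround until native support lands: Ollama's generate endpoint accepts a `keep_alive` parameter, and sending a request with `keep_alive: 0` and no prompt asks the server to evict the model from memory immediately. A minimal sketch of that eviction call (the `llama3.2` model name and default port are assumptions, not part of this thread):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default listen address


def unload_payload(model: str) -> dict:
    """Request body asking Ollama to evict a model immediately.

    A generate request with no prompt and keep_alive=0 tells Ollama to
    unload the model from (V)RAM, freeing the GPU for other workloads
    such as a separate TTS server.
    """
    return {"model": model, "keep_alive": 0}


def unload(model: str) -> None:
    """POST the eviction request to the local Ollama server."""
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=json.dumps(unload_payload(model)).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)  # returns once the request is accepted


# unload("llama3.2")  # example model name; then start the TTS process
```

This only evicts the model; it does not stop the Ollama server itself, so subsequent requests will simply reload the model on demand.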


@AncientMystic commented on GitHub (Feb 26, 2025):

> I would appreciate it specifically for ensuring that I don't wind up having multiple systems competing for the same GPU. Having Ollama manage the GPU would be preferable to my current bodge of having to make requests to Ollama, then forcing the model to unload before loading the TTS model in manually. (That and it would make it easier to see who is responsible for the GPU load.)

If it helps any, and while it's not the point of this request: for TTS at least, I found that kokoro-fastapi with the CPU model works great and is plenty fast, as long as you aren't loading entire pages or books into TTS at once, and it doesn't compete for the GPU at all that way. Even somewhat long messages do not take too long to process with the CPU version.


@E218PQ commented on GitHub (Feb 26, 2025):

Today Wan2.1, a video generation model with low GPU requirements, was open-sourced. I wonder whether Ollama has considered expanding its support to other model types?


@jcc10 commented on GitHub (Jun 20, 2025):

> If it helps any, while its not the point of this request, for tts at least, i found kokoro-fastapi TTS with the CPU model works great and is plenty fast enough, as long as you aren't loading entire pages or books at once into tts it is plenty fast enough and doesn't compete for the GPU at all that way. Even somewhat long messages do not take too long to process with the cpu version.

So, while that's great for more basic TTS, I am trying to get something like Parler running, which allows a text description of voices and is needed for the project I am attempting to set up.


@codewithtyler commented on GitHub (Aug 11, 2025):

I'd love to see support for video models. In my use case I'm looking to use Skyreels V2, and it would be great if I could use Ollama's API to build a text-to-video automation with this model.

Reference: github-starred/ollama#30433