[GH-ISSUE #3265] Does ollama also plan to support the sound models? #64048

Open
opened 2026-05-03 15:58:31 -05:00 by GiteaMirror · 4 comments

Originally created by @insooneelife on GitHub (Mar 20, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/3265

What are you trying to do?

Currently, Ollama supports most LLMs, and I know that it also supports vision models. I thought it would be nice if sound models could also be used through Ollama. I wonder if there are any plans for this.

How should we solve this?

I know that there are currently sound models released on Hugging Face, including several open-source TTS and STT models. I think these models could also be supported with model serving and an API, like LLMs.

What is the impact of not solving this?

No response

Anything else?

No response

GiteaMirror added the feature request label 2026-05-03 15:58:31 -05:00

@BruceMacD commented on GitHub (Mar 25, 2024):

Hi @insooneelife, thanks for opening the issue. Text-to-speech and speech-to-text models are on our radar, but they aren't on the roadmap yet. When that changes, we will update this issue.


@shyeetsao commented on GitHub (Dec 25, 2024):

FYI, llama.cpp added TTS support last week: https://github.com/ggerganov/llama.cpp/pull/10784


@olumolu commented on GitHub (Feb 19, 2025):

https://huggingface.co/stepfun-ai/Step-Audio-Chat
+1 for me


@Amoghk04 commented on GitHub (Mar 17, 2026):

I have developed an application that supports sound models alongside existing Ollama features. I will make it public in a couple of weeks. It also includes configurable RAG pipelines.

Reference: github-starred/ollama#64048