[GH-ISSUE #1168] Support WhisperForConditionalGeneration #26353

Open
opened 2026-04-22 02:35:52 -05:00 by GiteaMirror · 20 comments

Originally created by @OpenWaygate on GitHub (Nov 17, 2023).
Original GitHub issue: https://github.com/ollama/ollama/issues/1168

Hi, it would be great if Ollama could run openai/whisper, so that we can chain voice and text. Is there a roadmap for this?

GiteaMirror added the model, feature request labels 2026-04-22 02:35:52 -05:00

@grigio commented on GitHub (Nov 18, 2023):

+1, but I'd prefer whisper.cpp, which also works on CPU.


@OpenWaygate commented on GitHub (Nov 19, 2023):

> +1, but I'd prefer whisper.cpp, which also works on CPU.

That would be better, since I do not have a GPU.

@oliverbob commented on GitHub (Nov 22, 2023):

Any update on this yet? Do we have native or LangChain support for this already?


@nickplennox commented on GitHub (Dec 1, 2023):

+1 over here, I'd love to be able to run whisper, especially on GPU!

I don't think I know enough about all this to create my own Modelfile from a PyTorch model.


@einarpersson commented on GitHub (Dec 6, 2023):

If possible, I would like a smaller model as well. I don't know that much about this subject, but shouldn't a smaller model be enough to detect keywords such as "Hi Alfred", and then pipe the rest to something like Whisper?


@djmaze commented on GitHub (Dec 22, 2023):

[faster-whisper](https://github.com/SYSTRAN/faster-whisper) would be even more interesting IMO. Since it is, uhm, faster.

@lasseedfast commented on GitHub (Jan 28, 2024):

As both Ollama and whisper.cpp are somehow related to llama.cpp, maybe whisper.cpp could be a good starting point. This is only me guessing, maybe @wookayin has some input as someone who seems to be into both Ollama and llama.cpp?


@royjhan commented on GitHub (Aug 15, 2024):

Hey everyone, we're trying to get an initial feel for what Whisper would look like in Ollama. It's a super rough POC, but feel free to take a look at the demo and leave any high-level feedback! #6241


@ryzxxn commented on GitHub (Dec 23, 2024):

Can we please have a WebSocket integration where we can stream audio in and get a stream of transcribed text out?
(Applications: real-time voice assistants, more context-aware applications, multimodal processing of live video and audio feeds)
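
For illustration only, here is a minimal sketch of the client side of such a stream: it slices raw 16-bit mono PCM audio into fixed-duration frames that a client could send one at a time over a WebSocket. Ollama does not expose a streaming audio endpoint as of this thread; the function name `pcm_frames`, the 16 kHz sample rate, and the 20 ms frame size are all assumptions, and the socket transport itself is deliberately left out.

```python
from typing import Iterator


def pcm_frames(pcm: bytes, sample_rate: int = 16_000, frame_ms: int = 20,
               bytes_per_sample: int = 2) -> Iterator[bytes]:
    """Yield fixed-duration frames from raw mono PCM audio.

    Hypothetical helper: the defaults (16 kHz, 16-bit, 20 ms) are
    illustrative assumptions, not values required by any real server.
    The last frame may be shorter if the audio does not divide evenly.
    """
    # Samples per frame, then converted to a byte count.
    frame_bytes = (sample_rate * frame_ms // 1000) * bytes_per_sample
    if frame_bytes <= 0:
        raise ValueError("frame size must cover at least one sample")
    for start in range(0, len(pcm), frame_bytes):
        yield pcm[start:start + frame_bytes]
```

With these defaults, one second of audio (32,000 bytes) yields 50 frames of 640 bytes each; each frame would be sent as one binary WebSocket message while transcribed text is read back on the same connection.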


@peng456 commented on GitHub (Dec 23, 2024):

Is anyone interested in implementing this feature?


@peng456 commented on GitHub (Dec 23, 2024):

> Is anyone interested in implementing this feature?

Integrating whisper.cpp into Ollama

@LuisMalhadas commented on GitHub (Dec 23, 2024):

yes... ;)


@alexander-potemkin commented on GitHub (Dec 25, 2024):

> [faster-whisper](https://github.com/SYSTRAN/faster-whisper) would be even more interesting IMO. Since it is, uhm, faster.

It is indeed!

@alexander-potemkin commented on GitHub (Dec 25, 2024):

> yes... ;)

I don't mean to be rude, but I'm planning to re-implement a project with faster-whisper under the hood, and I would absolutely love to replace the library with an Ollama implementation. So, do you have any idea when you might have that ready?

And do you plan to support [whisper-turbo](https://huggingface.co/openai/whisper-large-v3-turbo)? And/or faster-whisper?

Thanks in advance either way!

@peng456 commented on GitHub (Dec 30, 2024):

> yes... ;)

Do we need to create a new repository or submit a new branch?

@bioshazard commented on GitHub (Jan 27, 2025):

> yes... ;)

Oh nice, thanks! I am torn between going all in on Ollama and going back to LocalAI for its broad modality support. Extremely interested in Whisper on Ollama!

@theshyPika commented on GitHub (Feb 11, 2025):

It would be so elegant if ollama could provide a speech-to-text API interface compatible with OpenWebUI. For example: "/api/audio/transcriptions".
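
As a rough sketch of what a client call to such an endpoint might involve: OpenAI's transcription API takes a `multipart/form-data` upload with `file` and `model` fields, so a compatible Ollama route would presumably accept the same shape. The code below only builds that request body with the standard library. The helper name `build_transcription_request` and the model name `whisper` are hypothetical, and the `/api/audio/transcriptions` route is the one proposed above, not an existing Ollama endpoint.

```python
import uuid
from typing import Dict, Optional, Tuple


def build_transcription_request(audio: bytes, filename: str, model: str,
                                boundary: Optional[str] = None
                                ) -> Tuple[bytes, Dict[str, str]]:
    """Build a multipart/form-data body with `model` and `file` fields.

    Hypothetical helper mirroring the field names of OpenAI's
    transcription API; no Ollama endpoint accepts this today.
    """
    boundary = boundary or uuid.uuid4().hex
    # Text part ("model") followed by the headers of the binary part ("file").
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    # Closing delimiter that ends the multipart body.
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    body = head + audio + tail
    headers = {
        "Content-Type": f"multipart/form-data; boundary={boundary}",
        "Content-Length": str(len(body)),
    }
    return body, headers
```

The returned `body` and `headers` could then be passed to `urllib.request.Request(url, data=body, headers=headers, method="POST")`, with `url` pointing at wherever the proposed route ends up living.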


@mahimairaja commented on GitHub (Mar 4, 2025):

Is this issue related to building an open-source version of OpenAI's Realtime API?


@JohnGalt1717 commented on GitHub (May 8, 2025):

I would love to see an equivalent to OpenAI's Realtime API that we could drop in and, as an example, use with Steve Sanderson's C# code to carry on real-time AI conversations. He informs me that the OpenAI models do speech recognition directly rather than using Whisper, but tooling Whisper into Ollama to create an emulation would be excellent, and I would suggest that parity should be the goal.


@privacy-advo commented on GitHub (May 17, 2025):

@jmorganca @ParthSareen
Do you maintain a public roadmap?
Are audio features planned for an upcoming release?


Reference: github-starred/ollama#26353