[GH-ISSUE #2815] Adding Whisper by creating Modelfile #48220

Closed
opened 2026-04-28 07:13:58 -05:00 by GiteaMirror · 1 comment

Originally created by @serban-razvan-termene on GitHub (Feb 28, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/2815

Ollama uses a format for the Modelfile that is described [here](https://github.com/ollama/ollama/blob/main/docs/modelfile.md). The Modelfile only allows text input to be passed to the resulting model.

There is one multimodal model (LLaVA) that you guys added to the model hub that takes input other than text, but some modifications to Ollama and to the model [were needed](https://github.com/ollama/ollama/issues/746) to support it. As far as I understand, the model takes the encoded image as a text parameter.
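
To illustrate the point about the encoded image, here is a minimal sketch, assuming the request goes through Ollama's REST `/api/generate` endpoint, which accepts base64-encoded images in an `images` list next to the text prompt. The helper name `build_generate_payload` and the placeholder bytes are hypothetical, and the snippet only builds the request body; it does not send it.

```python
import base64
import json


def build_generate_payload(model: str, prompt: str, image_bytes: bytes) -> dict:
    """Build a JSON-serializable body for a multimodal generate request."""
    # The image travels as a base64 string inside the "images" list,
    # alongside the ordinary text prompt.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "prompt": prompt,
        "images": [encoded],
        "stream": False,
    }


# Placeholder bytes stand in for the contents of a real image file.
fake_image = b"\x89PNG\r\n\x1a\n placeholder"
payload = build_generate_payload("llava", "What is in this picture?", fake_image)
body = json.dumps(payload)  # this string would be the HTTP request body
```

Audio for a model like Whisper could in principle be carried the same way, but that is only an assumption: no such field exists for audio in Ollama.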

Would it be possible for me to write a Modelfile and modify the base Ollama repository so that it accepts audio as input for the Whisper model, which takes audio and outputs the text heard in it?


@jmorganca commented on GitHub (May 10, 2024):

Hi @serban-razvan-termene, love this idea! However, it isn't possible yet since Ollama doesn't support audio models. I'll merge this with #1168.

Reference: github-starred/ollama#48220