[PR #6241] [CLOSED] Speech Prototype #17331

opened 2026-04-16 05:59:41 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/6241
Author: @royjhan
Created: 8/7/2024
Status: Closed

Base: main ← Head: royh/whisper


📝 Commits (10+)

📊 Changes

11 files changed (+596 additions, -10 deletions)

📝 .gitmodules (+4 -1)
📝 api/types.go (+19 -0)
📝 cmd/cmd.go (+39 -0)
📝 cmd/interactive.go (+35 -0)
➕ docs/speech.md (+83 -0)
📝 go.mod (+1 -0)
📝 go.sum (+2 -0)
➕ llm/whisper.cpp (+1 -0)
➕ recorder/recorder.go (+137 -0)
📝 server/routes.go (+256 -0)
📝 server/sched.go (+19 -9)

📄 Description

whisper.cpp: custom ggml, WAV audio.

Instructions for running are in docs/speech.md.

As of now, models would require conversion to ggml format to run inference. We would wait to see the general momentum surrounding speech-to-text models as bigger players release foundational models.
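
Since the prototype takes WAV audio, a pre-flight check on the input file may be useful before handing it to whisper.cpp. The following is a minimal sketch, not code from this PR: the names `wavInfo`, `readWAVInfo`, `buildWAVHeader`, and `matchesWhisperInput` are hypothetical, and the 16 kHz / 16-bit requirement reflects what upstream whisper.cpp's example tooling expects, which this branch may or may not enforce.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
	"io"
)

// wavInfo holds the "fmt " chunk fields relevant to speech input.
// (Hypothetical type; not part of the PR.)
type wavInfo struct {
	Channels      uint16
	SampleRate    uint32
	BitsPerSample uint16
}

// matchesWhisperInput reports whether the audio is 16-bit at 16 kHz,
// the format assumed here for whisper.cpp input.
func (w wavInfo) matchesWhisperInput() bool {
	return w.SampleRate == 16000 && w.BitsPerSample == 16
}

// readWAVInfo validates the RIFF/WAVE preamble, then walks the chunks
// until it finds "fmt " and decodes its first 16 bytes.
func readWAVInfo(r io.Reader) (wavInfo, error) {
	var info wavInfo
	var riff [12]byte
	if _, err := io.ReadFull(r, riff[:]); err != nil {
		return info, err
	}
	if string(riff[0:4]) != "RIFF" || string(riff[8:12]) != "WAVE" {
		return info, errors.New("not a RIFF/WAVE file")
	}
	for {
		var hdr [8]byte
		if _, err := io.ReadFull(r, hdr[:]); err != nil {
			return info, fmt.Errorf("fmt chunk not found: %w", err)
		}
		size := binary.LittleEndian.Uint32(hdr[4:8])
		if string(hdr[0:4]) == "fmt " {
			if size < 16 {
				return info, errors.New("fmt chunk too short")
			}
			var body [16]byte
			if _, err := io.ReadFull(r, body[:]); err != nil {
				return info, err
			}
			// Layout: format(2) channels(2) rate(4) byteRate(4) blockAlign(2) bits(2).
			info.Channels = binary.LittleEndian.Uint16(body[2:4])
			info.SampleRate = binary.LittleEndian.Uint32(body[4:8])
			info.BitsPerSample = binary.LittleEndian.Uint16(body[14:16])
			return info, nil
		}
		// Skip any other chunk; chunk bodies are padded to an even length.
		skip := int64(size) + int64(size&1)
		if _, err := io.CopyN(io.Discard, r, skip); err != nil {
			return info, err
		}
	}
}

// buildWAVHeader returns a canonical 44-byte PCM header with an empty
// data chunk, handy for exercising readWAVInfo without a real file.
func buildWAVHeader(channels uint16, rate uint32, bits uint16) *bytes.Buffer {
	buf := new(bytes.Buffer)
	le := binary.LittleEndian
	blockAlign := channels * bits / 8
	buf.WriteString("RIFF")
	binary.Write(buf, le, uint32(36)) // RIFF size: header only, no samples
	buf.WriteString("WAVE")
	buf.WriteString("fmt ")
	binary.Write(buf, le, uint32(16)) // fmt chunk size for PCM
	binary.Write(buf, le, uint16(1))  // audio format 1 = PCM
	binary.Write(buf, le, channels)
	binary.Write(buf, le, rate)
	binary.Write(buf, le, rate*uint32(blockAlign)) // byte rate
	binary.Write(buf, le, blockAlign)
	binary.Write(buf, le, bits)
	buf.WriteString("data")
	binary.Write(buf, le, uint32(0)) // empty data chunk
	return buf
}
```

To check a real recording, open it with `os.Open` and pass the file to `readWAVInfo`; a file that parses but fails `matchesWhisperInput` would need resampling or re-encoding first.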


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

📝 Commits (10+)

- [`1ac92ea`](https://github.com/ollama/ollama/commit/1ac92eae7ce056e6aecf4da790a574e0f735366e) submodule
- [`6548318`](https://github.com/ollama/ollama/commit/65483180b9988ee76a68b5c3ecf0f077fbe5374a) working poc
- [`97d9dff`](https://github.com/ollama/ollama/commit/97d9dffa804849a8a02cbf6e32e33be563131a9e) err check
- [`17f9dc6`](https://github.com/ollama/ollama/commit/17f9dc6d086a2eb6a491700890a8f7c8831edd20) save whisper port
- [`e4d3519`](https://github.com/ollama/ollama/commit/e4d35198a23efe47890583d837881cc532e993cf) transcribe
- [`2a9feb0`](https://github.com/ollama/ollama/commit/2a9feb07072d7fa53f97d06e13c7d410bda5f377) model flexibility
- [`a5181a8`](https://github.com/ollama/ollama/commit/a5181a8c511ec04b6a8ae062baf5f29970486e33) error handling
- [`75ad630`](https://github.com/ollama/ollama/commit/75ad6309b46ef33d102ee1ec6d72341d88a6ff33) chat support
- [`8ccf543`](https://github.com/ollama/ollama/commit/8ccf543c53f1183b1b7cb3890e83b41ca221f572) chat doc
- [`d503f04`](https://github.com/ollama/ollama/commit/d503f04b3274431b1c3ceb7fe9f4004ee4a87db8) expiration
GiteaMirror added the pull-request label 2026-04-16 05:59:41 -05:00

Reference: github-starred/ollama#17331