[PR #12801] [CLOSED] Added STT inputs And TTS Output Check Readme File for the updates #39829

Closed
opened 2026-04-23 00:49:54 -05:00 by GiteaMirror · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/12801
Author: @atulenv
Created: 10/28/2025
Status: Closed

Base: main ← Head: test/cli-run-help-and-args


📝 Commits (1)

  • 4d729a7 Added STT inputs And TTS Output Check Readme File for the updates

📊 Changes

4 files changed (+241 additions, -16 deletions)

View changed files

📝 README.md (+24 -0)
cmd.go (+0 -0)
📝 cmd/cmd.go (+199 -6)
📝 go.mod (+18 -10)

📄 Description

Title: Feat: Add speak command for voice interaction with models

This pull request introduces a new speak command to the Ollama CLI, enabling users to interact with
models using their voice. This feature significantly improves accessibility and provides a more
natural and intuitive way to communicate with large language models, moving beyond text-based
interaction.

Description

The speak command facilitates a complete voice-to-voice conversation with any model running on
Ollama. The workflow is as follows:

  1. The user invokes ollama speak <model-name>.
  2. The CLI records a 5-second audio clip from the user's microphone.
  3. The recorded audio is transcribed into text using a local speech-to-text engine.
  4. The transcribed text is then sent to the specified Ollama model as a prompt.
  5. The model's response is captured and converted back into speech, which is then played back to the
    user.
The implementation relies on three new dependencies:

  • portaudio: A cross-platform audio I/O library used for recording audio from the microphone. It was
    chosen for its wide platform support and stability.
  • go-whisper: A Go implementation of OpenAI's Whisper model for high-quality, local speech-to-text
    transcription. This avoids the need for cloud-based speech recognition services and API keys.
  • espeak: A compact, open-source software speech synthesizer for text-to-speech. It was chosen for
    its simplicity and ease of integration, as it can be called directly from the command line.
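
The five workflow steps above can be sketched as a small orchestration loop. This is an illustrative sketch, not the PR's actual code: the stage signatures (Record, Transcribe, Ask, Speak) are assumptions standing in for the concrete portaudio, go-whisper, and espeak plumbing that the PR wires up in cmd/cmd.go.

```go
package main

import (
	"fmt"
	"strings"
)

// VoiceLoop models the workflow as pluggable stages so each piece
// (audio capture, STT, model call, TTS) can be tested in isolation.
type VoiceLoop struct {
	Record     func() (wavPath string, err error)          // capture 5 s from the mic
	Transcribe func(wavPath string) (string, error)        // speech-to-text
	Ask        func(model, prompt string) (string, error)  // send prompt to the model
	Speak      func(text string) error                     // text-to-speech playback
}

// Run executes one record → transcribe → prompt → speak round trip.
func (v VoiceLoop) Run(model string) (string, error) {
	wav, err := v.Record()
	if err != nil {
		return "", fmt.Errorf("record: %w", err)
	}
	prompt, err := v.Transcribe(wav)
	if err != nil {
		return "", fmt.Errorf("transcribe: %w", err)
	}
	// Whisper output often carries leading/trailing whitespace.
	prompt = strings.TrimSpace(prompt)
	reply, err := v.Ask(model, prompt)
	if err != nil {
		return "", fmt.Errorf("generate: %w", err)
	}
	return reply, v.Speak(reply)
}

func main() {
	// Fake stages stand in for the real audio/ML plumbing.
	loop := VoiceLoop{
		Record:     func() (string, error) { return "/tmp/clip.wav", nil },
		Transcribe: func(string) (string, error) { return " hello ", nil },
		Ask:        func(m, p string) (string, error) { return "hi from " + m, nil },
		Speak:      func(string) error { return nil },
	}
	reply, err := loop.Run("llama3.2")
	fmt.Println(reply, err)
}
```

Structuring the handler this way would also make the "push-to-talk" and configurable-duration ideas listed under Future Improvements easier to slot in, since only the Record stage changes.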

Changes Made

  • cmd/cmd.go:

    • Added a new speakCmd to the Cobra command structure.
    • Implemented the speakHandler function, which orchestrates the entire speak functionality, from
      recording to transcription and text-to-speech.
    • Added the transcribeAudio helper function, which uses the go-whisper library to convert the
      recorded audio file to text.
    • Added the speakText helper function, which uses the espeak command-line tool to convert the
      model's response to speech.
  • go.mod and go.sum:

    • Added the following new dependencies to support the speak command:
      • github.com/go-audio/audio
      • github.com/go-audio/wav
      • github.com/gordonklaus/portaudio
      • github.com/mutablelogic/go-whisper
  • README.md:

    • Added a new "Speak to a model" section under the "CLI Reference" to document the new speak
      command.
    • This new section, contributed by Atul Sahu (atulenv), explains the feature, lists the required
      dependencies with installation instructions, and provides a clear example of how to use the
      command.
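
As a rough idea of what the speakText helper might look like, here is a minimal sketch that shells out to espeak via os/exec. The -s speed flag and its value of 175 words per minute are illustrative assumptions, not necessarily what the PR uses; passing the text as a single argument (rather than through a shell) avoids quoting and injection issues.

```go
package main

import (
	"fmt"
	"os/exec"
)

// espeakCommand builds the espeak invocation for a piece of text.
// Kept separate from execution so the arguments can be inspected
// and tested without espeak installed.
func espeakCommand(text string) *exec.Cmd {
	return exec.Command("espeak", "-s", "175", text)
}

// speakText runs espeak and waits for playback to finish.
// It fails if the espeak binary is not on PATH.
func speakText(text string) error {
	return espeakCommand(text).Run()
}

func main() {
	cmd := espeakCommand("Hello from Ollama")
	fmt.Println(cmd.Args) // [espeak -s 175 Hello from Ollama]
}
```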

How to Use

To use the speak command, you first need to install a few dependencies:

  1. PortAudio: A cross-platform audio I/O library.

    • macOS: brew install portaudio
    • Debian/Ubuntu: sudo apt-get install portaudio19-dev
  2. eSpeak: A text-to-speech synthesizer.

    • macOS: brew install espeak
    • Debian/Ubuntu: sudo apt-get install espeak
  3. Whisper Model: A pre-trained whisper.cpp model for speech-to-text.

    1. Download a model from https://huggingface.co/ggerganov/whisper.cpp/tree/main
       (e.g., ggml-base.en.bin).
    2. Place the model file in the same directory where you run the ollama command.
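
Since the model must sit in the working directory, a startup check with a helpful error message saves users a confusing failure later. The findWhisperModel helper below is a hypothetical sketch (not in the PR); it only uses the standard library.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// findWhisperModel verifies that the whisper.cpp weights are present
// in the current working directory, mirroring step 2 above.
func findWhisperModel(name string) (string, error) {
	path, err := filepath.Abs(name)
	if err != nil {
		return "", err
	}
	if _, err := os.Stat(path); err != nil {
		return "", fmt.Errorf("whisper model %s not found: download one from "+
			"https://huggingface.co/ggerganov/whisper.cpp/tree/main and place "+
			"it next to the ollama binary: %w", name, err)
	}
	return path, nil
}

func main() {
	if _, err := findWhisperModel("ggml-base.en.bin"); err != nil {
		fmt.Println(err)
	}
}
```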

Once you have installed the dependencies, you can use the speak command:

ollama speak <model-name>

Testing

To test the new speak command, please follow these steps:

  1. Ensure you have installed all the required dependencies (PortAudio, eSpeak, and a whisper.cpp
    model).
  2. Build the ollama binary with the new changes.
  3. Run the speak command with a model of your choice:
    ./ollama speak llama3.2
  4. The command will print "Recording for 5 seconds...". Speak a prompt into your microphone.
  5. After 5 seconds, the recording will stop, and the transcribed text will be printed to the console.
  6. The model's response will then be spoken back to you.
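
For anyone verifying the recording step, it helps to know what the intermediate WAV file should look like: whisper.cpp expects 16 kHz, 16-bit mono PCM, so a 5-second clip holds 80,000 samples (160,000 data bytes) behind a standard 44-byte RIFF header. The sketch below builds that header with the standard library only; the constants reflect whisper.cpp's input format, not code taken from the PR.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

const (
	sampleRate    = 16000 // whisper.cpp expects 16 kHz mono PCM
	seconds       = 5     // fixed clip length used by the speak command
	channels      = 1
	bitsPerSample = 16
)

// wavHeader builds the standard 44-byte RIFF/WAVE header for a clip
// of dataLen bytes of 16-bit mono PCM.
func wavHeader(dataLen int) []byte {
	var b bytes.Buffer
	byteRate := sampleRate * channels * bitsPerSample / 8
	blockAlign := channels * bitsPerSample / 8
	b.WriteString("RIFF")
	binary.Write(&b, binary.LittleEndian, uint32(36+dataLen))
	b.WriteString("WAVEfmt ")
	binary.Write(&b, binary.LittleEndian, uint32(16)) // PCM fmt chunk size
	binary.Write(&b, binary.LittleEndian, uint16(1))  // audio format: PCM
	binary.Write(&b, binary.LittleEndian, uint16(channels))
	binary.Write(&b, binary.LittleEndian, uint32(sampleRate))
	binary.Write(&b, binary.LittleEndian, uint32(byteRate))
	binary.Write(&b, binary.LittleEndian, uint16(blockAlign))
	binary.Write(&b, binary.LittleEndian, uint16(bitsPerSample))
	b.WriteString("data")
	binary.Write(&b, binary.LittleEndian, uint32(dataLen))
	return b.Bytes()
}

func main() {
	samples := sampleRate * seconds // 80000 int16 samples
	dataLen := samples * 2          // two bytes per sample
	hdr := wavHeader(dataLen)
	fmt.Println(len(hdr), dataLen) // 44 160000
}
```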

Future Improvements

This implementation provides a solid foundation for voice interaction in Ollama. Here are some
potential future improvements:

  • Higher-Quality TTS: Integrate a more advanced text-to-speech engine for more natural-sounding voice
    responses.
  • Configurable Recording: Allow the user to configure the recording duration or use a "push-to-talk"
    style of interaction.
  • Visual Feedback: Add a visual indicator to show when the microphone is actively recording.

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-23 00:49:54 -05:00

Reference: github-starred/ollama#39829