[PR #15243] api: add audios field for audio input in multimodal models #25630

Opened 2026-04-19 18:19:16 -05:00 by GiteaMirror

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/15243
Author: @harsha-gouru
Created: 4/2/2026
Status: 🔄 Open

Base: main ← Head: feat/audio-api-field


📝 Commits (1)

  • af45c40 api: add audios field for audio input in multimodal models

📊 Changes

15 files changed (+495 additions, -27 deletions)


📝 api/types.go (+6 -1)
📝 cmd/cmd.go (+10 -1)
📝 cmd/cmd_test.go (+9 -0)
📝 cmd/interactive.go (+8 -5)
📝 cmd/interactive_test.go (+5 -3)
📝 integration/audio_test.go (+2 -2)
📝 model/renderers/gemma4.go (+4 -3)
📝 openai/openai.go (+5 -2)
📝 openai/openai_test.go (+201 -0)
📝 server/prompt.go (+9 -3)
📝 server/prompt_test.go (+26 -0)
📝 server/routes.go (+12 -6)
📝 server/routes_debug_test.go (+35 -0)
📝 server/routes_generate_test.go (+157 -0)
📝 template/template.go (+6 -1)

📄 Description

Add a dedicated Audios field to Message and GenerateRequest so that API consumers can send audio data without stuffing it into the Images field. Audio bytes are merged into the existing multimodal pipeline at the server layer, since models already detect the WAV format by its magic bytes.

Changes:

  • Add Audios []ImageData to GenerateRequest and Message structs
  • Merge audios into the images pipeline in prompt.go and routes.go
  • Update Gemma4 renderer and template rendering to count audios
  • Route OpenAI input_audio content to the new Audios field
  • Validate input_audio format and reject non-WAV with a clear error
  • Fix audio-only generate requests being treated as load-only requests
  • Separate WAV files from images in CLI extractFileData
  • Add server, OpenAI, prompt, and debug-render tests for audio

Backward compatible: the images field continues to accept audio data.
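Backward compatibility follows if the server simply concatenates both fields before handing them to the multimodal pipeline, which is how this sketch models "merge audios into the images pipeline". The helpers `mediaInputs` and `isLoadOnly`, the trimmed GenerateRequest, and the images-then-audios ordering are all hypothetical. The load-only check illustrates the audio-only fix: a request counts as "just load the model" only when it has no prompt and no media of either kind.

```go
package main

import "fmt"

// ImageData stands in for the raw-bytes media type (assumed named []byte).
type ImageData []byte

// GenerateRequest is a trimmed, hypothetical view of the request type.
type GenerateRequest struct {
	Prompt string
	Images []ImageData
	Audios []ImageData
}

// mediaInputs merges both fields into the single slice fed to the model.
// Clients that still send audio in Images are unaffected; models tell WAV
// apart from images by magic bytes downstream. Ordering is an assumption.
func mediaInputs(req GenerateRequest) []ImageData {
	merged := make([]ImageData, 0, len(req.Images)+len(req.Audios))
	merged = append(merged, req.Images...)
	merged = append(merged, req.Audios...)
	return merged
}

// isLoadOnly reports whether the request should only load the model.
// Checking Audios too keeps audio-only requests from short-circuiting.
func isLoadOnly(req GenerateRequest) bool {
	return req.Prompt == "" && len(req.Images) == 0 && len(req.Audios) == 0
}

func main() {
	wav := ImageData("RIFF....WAVE")
	legacy := GenerateRequest{Images: []ImageData{wav}}    // audio sent the old way
	audioOnly := GenerateRequest{Audios: []ImageData{wav}} // audio sent the new way

	fmt.Println(len(mediaInputs(legacy)), isLoadOnly(legacy))       // 1 false
	fmt.Println(len(mediaInputs(audioOnly)), isLoadOnly(audioOnly)) // 1 false
	fmt.Println(isLoadOnly(GenerateRequest{}))                      // true
}
```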

Fixes #11798


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-19 18:19:16 -05:00
Reference: github-starred/ollama#25630