[GH-ISSUE #11432] voxtral #54059

Open
opened 2026-04-29 05:09:54 -05:00 by GiteaMirror · 5 comments

Originally created by @rlindskog on GitHub (Jul 15, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/11432

GiteaMirror added the model label 2026-04-29 05:09:54 -05:00

@SuperPat45 commented on GitHub (Jul 15, 2025):

This model needs audio input support to be added to the Ollama APIs.
For /chat/completions, the Mistral Doc (https://docs.mistral.ai/capabilities/audio/) gives this example:

curl --location https://api.mistral.ai/v1/chat/completions \
  --header "Authorization: Bearer $MISTRAL_API_KEY" \
  --header "Content-Type: application/json" \
  --data '{
    "model": "voxtral-mini-2507",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "input_audio",
            "input_audio": {
              "data": "https://download.samplelib.com/mp3/sample-15s.mp3",
              "format": "mp3"
            }
          },
          {
            "type": "text",
            "text": "What'\''s in this file?"
          }
        ]
      }
    ]
  }'

Would it not also be possible to implement the “standard” /audio/transcriptions API with this model?
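For context, the “standard” transcription endpoint referenced here follows OpenAI's audio API shape: a multipart POST carrying the audio file and a model name. A minimal sketch of what that could look like against Ollama, assuming Ollama exposed such an endpoint under its OpenAI-compatible /v1 prefix (it does not today); the URL and model name are placeholders:

import requests

# Hypothetical endpoint: Ollama does not currently expose /v1/audio/transcriptions.
# The request shape below follows OpenAI's audio transcription API.
with open("sample-15s.mp3", "rb") as f:
    resp = requests.post(
        "http://localhost:11434/v1/audio/transcriptions",  # assumed URL
        files={"file": ("sample-15s.mp3", f, "audio/mpeg")},
        data={"model": "voxtral-mini"},  # placeholder model name
    )
print(resp.json().get("text"))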


@MyButtermilk commented on GitHub (Jul 21, 2025):

I would rather use voxtral-small-latest, as it is more accurate and thus competitive with the SOTA. Would it be possible to use it in chat-completions mode but only return the prompted transcript as JSON?

from mistralai import Mistral

client = Mistral(api_key=API_KEY)

messages = [{
    "role": "user",
    "content": [
        {"type": "input_audio",
         "input_audio": {"data": audio_url, "format": audio_format}},
        {"type": "text",
         "text": (
             "Transcribe the speech verbatim. "
             "No translation. No summary. "
             'Return JSON {"text": "<transcript>"}. '
         )},
    ],
}]

resp = client.chat.complete(
    model="voxtral-small-latest",
    prompt_mode=None,
    temperature=0.0,
    response_format={"type": "json_object"},
    messages=messages,
    max_tokens=4096,
)
transcript = resp.choices[0].message.content  # parse JSON -> transcript
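Parsing the constrained output is then a one-liner; a minimal sketch, assuming the model honors the prompted {"text": ...} shape:

import json

# response_format={"type": "json_object"} constrains the reply to valid JSON,
# so the message content can be parsed directly.
payload = json.loads(resp.choices[0].message.content)
transcript = payload["text"]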


@eschmidbauer commented on GitHub (Jul 29, 2025):

Voxtral support was recently added to llama.cpp: https://github.com/ggml-org/llama.cpp/commit/00fa15fedc79263fa0285e6a3bbb0cfb3e3878a2


@SEVENID commented on GitHub (Nov 24, 2025):

Is this even planned? llama.cpp received Voxtral support more than three months ago.


@SuperPat45 commented on GitHub (Feb 4, 2026):

Mistral just released a new Voxtral Mini realtime open-weight model:
https://huggingface.co/mistralai/Voxtral-Mini-4B-Realtime-2602
https://mistral.ai/news/voxtral-transcribe-2

Reference: github-starred/ollama#54059