[GH-ISSUE #11236] Trying to get Ollama models to work with Macbook Pro M4 Pro, but keep getting response: {"done_reason":"load"} #7401

GiteaMirror · 2026-04-12T19:29:23-05:00

GiteaMirror commented

2026-04-12 19:29:23 -05:00

Originally created by @Pbot64 on GitHub (Jun 29, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/11236

What is the issue?

Hi,

I'm trying to get this project working on my MacBook Pro M4 Pro (macOS 15.5, build 24F74): https://github.com/asiff00/On-Device-Speech-to-Speech-Conversational-AI?tab=readme-ov-file

I've tried phi3, mistral, and llama3.2, and I keep getting the same response: done_reason:"load".
I've used the CLI version and the macOS app version 0.9.3, and even tried the v0.9.4 pre-release for macOS (https://github.com/ollama/ollama/releases/tag/v0.9.4-rc2).

Here is my entire .env (besides my Hugging Face token):
VOICE_NAME=af_heart
SPEED=0.9
export ESPEAK_PATH=/usr/local/bin/espeak

# LLM settings

LM_STUDIO_URL=http://localhost:11434/v1
OLLAMA_URL=http://localhost:11434/api/generate
DEFAULT_SYSTEM_PROMPT=You are a friendly, helpful, and intelligent assistant. Begin your responses with phrases like 'Umm,' 'So,' or similar. Focus on the user query and reply directly to the user in the first person ('I'), responding promptly and naturally. Do not include any additional information or context in your responses.
MAX_TOKENS=512
NUM_THREADS=2
LLM_TEMPERATURE=0.9
LLM_STREAM=true
LLM_RETRY_DELAY=0.5
MAX_RETRIES=3

# Model names

VAD_MODEL=pyannote/segmentation-3.0
WHISPER_MODEL=openai/whisper-tiny.en
LLM_MODEL=mistral
TTS_MODEL=kokoro.pth

# VAD settings

VAD_MIN_DURATION_ON=0.1
VAD_MIN_DURATION_OFF=0.1

# Audio settings

CHUNK=256
FORMAT=pyaudio.paFloat32
CHANNELS=1
RATE=16000
OUTPUT_SAMPLE_RATE=24000
RECORD_DURATION=5
SILENCE_THRESHOLD=0.001
INTERRUPTION_THRESHOLD=0.01
MAX_SILENCE_DURATION=1
SPEECH_CHECK_TIMEOUT=0.1
SPEECH_CHECK_THRESHOLD=0.02
ROLLING_BUFFER_TIME=0.5
TARGET_SIZE = 25
PLAYBACK_DELAY = 0.001
FIRST_SENTENCE_SIZE = 2

I am stuck. Totally and completely stuck. I feel like I've tried everything to get this kokoro conversational project working, but I'm stuck at the LLM integration. Help would be very much appreciated! Copilot says it's a bug with Ollama trying to work with Apple Metal. Please...

Relevant log output

curl http://localhost:11434/api/generate -d '{
  "model": "phi3",    
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Say hello!"}
  ],
  "stream": false
}'
{"model":"phi3","created_at":"2025-06-29T20:25:26.312542Z","response":"","done":true,"done_reason":"load"}%

OS

macOS

GPU

Apple

CPU

Apple

Ollama version

0.9.3


GiteaMirror added the bug label 2026-04-12 19:29:23 -05:00

GiteaMirror closed this issue

2026-04-12 19:29:23 -05:00

GiteaMirror commented

2026-04-12 19:29:24 -05:00

@rick-github commented on GitHub (Jun 29, 2025):

If you want to use `messages[]`, you need to use the `/api/chat` [endpoint](https://github.com/ollama/ollama/blob/main/docs/api.md#generate-a-chat-completion).
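
To make the distinction concrete, here is a minimal Python sketch of the two request bodies. It assumes the documented shapes: `/api/generate` takes a single `prompt` string (plus an optional `system` string), while `/api/chat` takes a `messages` list. The helper names (`generate_payload`, `chat_payload`, `endpoint_for`) are hypothetical and not part of Ollama or the project; nothing is sent over the network. A body posted to `/api/generate` without a `prompt` appears to only load the model, which would be consistent with the empty `done_reason: "load"` reply in the original report.

```python
import json


def generate_payload(model, prompt, system=None, stream=False):
    """Build a body for POST /api/generate: one prompt string, optional system prompt."""
    body = {"model": model, "prompt": prompt, "stream": stream}
    if system is not None:
        body["system"] = system
    return body


def chat_payload(model, messages, stream=False):
    """Build a body for POST /api/chat: a list of {"role", "content"} messages."""
    return {"model": model, "messages": list(messages), "stream": stream}


def endpoint_for(body):
    """Hypothetical helper: choose the endpoint path that matches the body shape."""
    return "/api/chat" if "messages" in body else "/api/generate"


if __name__ == "__main__":
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Say hello!"},
    ]
    for body in (
        chat_payload("phi3", messages),
        generate_payload("phi3", "Say hello!", system="You are a helpful assistant."),
    ):
        # Show which path each body belongs to, then the JSON that would be posted.
        print(endpoint_for(body))
        print(json.dumps(body, indent=2))
```

Either body can then be posted with the HTTP client of your choice; the point is only that the body shape and the endpoint path have to agree.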

GiteaMirror commented

2026-04-12 19:29:25 -05:00

@Pbot64 commented on GitHub (Jun 29, 2025):

Tried using that endpoint. Still not working. I get this "cursh>", which is strange.

GiteaMirror commented

2026-04-12 19:29:25 -05:00

@rick-github commented on GitHub (Jun 29, 2025):

$ curl http://localhost:11434/api/chat -d '{
  "model": "phi3",    
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Say hello!"}
  ],
  "stream": false
}'
{"model":"phi3","created_at":"2025-06-29T21:48:31.650657348Z","message":{"role":"assistant","content":"Hello there! How can I help you today?"},"done_reason":"stop","done":true,"total_duration":39757122491,"load_duration":39529064468,"prompt_eval_count":25,"prompt_eval_duration":153800265,"eval_count":11,"eval_duration":73081385}


GiteaMirror commented

2026-04-12 19:29:25 -05:00

@Pbot64 commented on GitHub (Jun 29, 2025):

Appreciate the help. I was just serving the wrong model, but I still can't generate any sound. This is what it gives me:

eSpeak path set to: /opt/homebrew/bin/espeak
✅ This is the kokoro.py you’re actually running

Initializing Whisper model...

Initializing Voice Activity Detection...
Registered checkpoint save hook for _speechbrain_save
Registered checkpoint load hook for _speechbrain_load
Registered checkpoint save hook for save
Registered checkpoint load hook for load
Registered checkpoint save hook for _save
Registered checkpoint load hook for _recover

=== Voice Chat Bot Initializing ===
Device being used: cpu

Initializing voice generator...
Loaded voice: af_heart

Warming up the LLM model...
[DEBUG] Checking Ollama health at http://localhost:11434
[DEBUG] Ollama health status: 200
[DEBUG] Calling get_ai_response for warmup...
[DEBUG] Warmup get_ai_response returned: <generator object get_ai_response at 0x1921a1770>

=== Voice Chat Bot Ready ===
The bot is now listening for speech.
Just start speaking, and I'll respond automatically!
You can interrupt me anytime by starting to speak.
[DEBUG] About to record audio...

Listening... (Press Ctrl+C to stop)

Potential speech detected...
Processing speech segment...
🔈 Captured audio: <class 'numpy.ndarray'> (93184,)
📊 Audio stats: -0.10005563 0.15666205 -5.5727873e-05
[DEBUG] Audio data is not None, proceeding to VAD...
[DEBUG] Detecting speech segments...
[DEBUG] speech_segments: tensor([ 8.9648e-05, 4.7134e-05, 1.0209e-04, ..., -6.8553e-04,
-6.7169e-04, -5.9594e-04])
[DEBUG] VAD found speech segments, proceeding to transcription...

Transcribing detected speech...
🎛️ Input features shape: torch.Size([1, 80, 3000])
🧠 Generated token IDs: tensor([[18435, 11, 314, 716, 257, 1692, 852, 13, 1867, 318,
534, 1438, 30]])
📝 Final transcription: Hello, I am a human being. What is your name?
🔊 Audio data shape: (65070,)
📝 Final transcription list: Hello, I am a human being. What is your name?
[DEBUG] Transcribed user_input: Hello, I am a human being. What is your name?
[DEBUG] Transcription not empty, sending to LLM...
🧠 Sending to LLM: Hello, I am a human being. What is your name?
You (voice): Hello, I am a human being. What is your name?

Thinking...
[DEBUG] Calling process_input from voice input...
[DEBUG] process_input called with user_input: Hello, I am a human being. What is your name?
[DEBUG] messages before append: [{'role': 'system', 'content': "You are a friendly, helpful, and intelligent assistant. Begin your responses with phrases like 'Umm,' 'So,' or similar. Focus on the user query and reply directly to the user in the first person ('I'), responding promptly and naturally. Do not include any additional information or context in your responses."}, {'role': 'user', 'content': ' Hello, I am a human being. What is your name?'}]
[DEBUG] messages after append: [{'role': 'system', 'content': "You are a friendly, helpful, and intelligent assistant. Begin your responses with phrases like 'Umm,' 'So,' or similar. Focus on the user query and reply directly to the user in the first person ('I'), responding promptly and naturally. Do not include any additional information or context in your responses."}, {'role': 'user', 'content': ' Hello, I am a human being. What is your name?'}, {'role': 'user', 'content': ' Hello, I am a human being. What is your name?'}]

Thinking...
[DEBUG] Calling get_ai_response with model: phi3, url: http://localhost:11434/api/chat
[DEBUG] messages_to_send: [{'role': 'system', 'content': "You are a friendly, helpful, and intelligent assistant. Begin your responses with phrases like 'Umm,' 'So,' or similar. Focus on the user query and reply directly to the user in the first person ('I'), responding promptly and naturally. Do not include any additional information or context in your responses."}, {'role': 'user', 'content': ' Hello, I am a human being. What is your name?'}]
[DEBUG] get_ai_response returned: <generator object get_ai_response at 0x193431e70>
[DEBUG][get_ai_response] Payload to LLM: {'model': 'phi3', 'messages': [{'role': 'system', 'content': "You are a friendly, helpful, and intelligent assistant. Begin your responses with phrases like 'Umm,' 'So,' or similar. Focus on the user query and reply directly to the user in the first person ('I'), responding promptly and naturally. Do not include any additional information or context in your responses."}, {'role': 'user', 'content': ' Hello, I am a human being. What is your name?'}], 'options': {'num_ctx': 1024, 'num_thread': 2}, 'stream': False}
[DEBUG][get_ai_response] Raw response status: 200
[DEBUG][get_ai_response] Non-streaming JSON: {'model': 'phi3', 'created_at': '2025-06-29T22:32:32.485973Z', 'message': {'role': 'assistant', 'content': 'Umm, my designation as an AI assistant doesn't have a personal name like humans do, but you can call me "Aiden" if that helps! How may I assist you today? So tell me about yourself without revealing any sensitive information.'}, 'done_reason': 'stop', 'done': True, 'total_duration': 1538489125, 'load_duration': 538388042, 'prompt_eval_count': 95, 'prompt_eval_duration': 286457208, 'eval_count': 55, 'eval_duration': 710626375}
[DEBUG][get_ai_response] No response text in non-streaming reply.

Audio Generation Complete - Processed: 0, Generated: 0, Failed: 0

Timing Chart:
Event | Time (s) | Δ+

User stopped speaking | 0.00 | 0.00
VAD started | 0.00 | 0.00
End-to-end response | 2.41 | 2.41
[DEBUG] process_input completed successfully.
[DEBUG] process_input returned: was_interrupted=False, speech_data=None
✅ LLM and TTS response completed. was_interrupted = False
[DEBUG] About to record audio...

Listening... (Press Ctrl+C to stop)


GiteaMirror commented

2026-04-12 19:29:26 -05:00

@rick-github commented on GitHub (Jun 29, 2025):

> [DEBUG][get_ai_response] Non-streaming JSON: {'model': 'phi3', 'created_at': '2025-06-29T22:32:32.485973Z', 'message': {'role': 'assistant', 'content': 'Umm, my designation as an AI assistant doesn't have a personal name like humans do, but you can call me "Aiden" if that helps! How may I assist you today? So tell me about yourself without revealing any sensitive information.'}, 'done_reason': 'stop', 'done': True, 'total_duration': 1538489125, 'load_duration': 538388042, 'prompt_eval_count': 95, 'prompt_eval_duration': 286457208, 'eval_count': 55, 'eval_duration': 710626375}
> [DEBUG][get_ai_response] No response text in non-streaming reply.

Ollama is returning a response, but the project is not recognizing it. You probably need to follow up on https://github.com/asiff00/On-Device-Speech-to-Speech-Conversational-AI/issues/12; so far this doesn't look like an Ollama issue.
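
As an illustration of what "not recognizing it" could mean: in the log, the text sits under `message.content` (the `/api/chat` reply shape), whereas a `/api/generate` reply carries it under `response`. The sketch below is an assumption about the kind of fix the project might need, not its actual code; `extract_text` is a hypothetical helper that accepts either shape.

```python
def extract_text(reply):
    """Return the assistant text from a non-streaming Ollama reply.

    Handles both shapes seen in this thread:
      /api/chat     -> {"message": {"role": "assistant", "content": "..."}}
      /api/generate -> {"response": "..."}
    Returns an empty string when neither field carries text.
    """
    message = reply.get("message")
    if isinstance(message, dict) and message.get("content"):
        return message["content"]
    return reply.get("response") or ""


if __name__ == "__main__":
    # Abbreviated replies in the two shapes shown earlier in the thread.
    chat_reply = {
        "model": "phi3",
        "message": {"role": "assistant", "content": "Hello there! How can I help you today?"},
        "done_reason": "stop",
        "done": True,
    }
    load_reply = {"model": "phi3", "response": "", "done": True, "done_reason": "load"}
    print(repr(extract_text(chat_reply)))
    print(repr(extract_text(load_reply)))
```

A client that only reads `response` would see an empty string for the chat-shaped reply, which would match the "No response text in non-streaming reply" message in the log.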


GiteaMirror commented

2026-04-12 19:29:26 -05:00

@Pbot64 commented on GitHub (Jun 29, 2025):

Now I'm getting this:
Warming up the LLM model...
[DEBUG] Checking Ollama health at http://localhost:11434
[DEBUG] Ollama health status: 200
[DEBUG] Calling get_ai_response for warmup...
[LLM ERROR] Failed to get response from Ollama: Extra data: line 2 column 1 (char 118)
[DEBUG] Warmup get_ai_response returned: None
Failed to initialized the AI model!
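
For readers hitting the same message: `Extra data: line 2 column 1` is what Python's `json.loads` reports when a valid JSON document is followed by more data. That is what you would likely see if a streaming reply (Ollama sends one JSON object per line when `stream` is true or omitted) is parsed as a single document. Whether that is the cause here is an assumption; the sketch below only shows the line-by-line alternative. `parse_ndjson` and `join_stream_text` are hypothetical helpers, and the sample body is made up for the demonstration.

```python
import json


def parse_ndjson(text):
    """Parse newline-delimited JSON: one json.loads call per non-empty line."""
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if line:
            chunks.append(json.loads(line))
    return chunks


def join_stream_text(chunks):
    """Concatenate text from streamed chunks in either chat or generate shape."""
    parts = []
    for chunk in chunks:
        message = chunk.get("message") or {}
        parts.append(message.get("content") or chunk.get("response") or "")
    return "".join(parts)


if __name__ == "__main__":
    # Made-up two-chunk streaming body, for illustration only.
    body = (
        '{"model":"phi3","message":{"role":"assistant","content":"Hel"},"done":false}\n'
        '{"model":"phi3","message":{"role":"assistant","content":"lo"},"done":true}\n'
    )
    try:
        json.loads(body)  # parsing the whole body as one document fails
    except json.JSONDecodeError as err:
        print("single-document parse failed:", err)
    print(join_stream_text(parse_ndjson(body)))
```

The other option is to send `"stream": false` and keep a single `json.loads` call, as in the non-streaming requests earlier in the thread.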


GiteaMirror referenced this issue

2026-04-22 10:06:20 -05:00

[GH-ISSUE #7401] Configure docker image to start with some models installed #30467

GiteaMirror referenced this issue

2026-04-28 18:56:24 -05:00

[GH-ISSUE #7401] Configure docker image to start with some models installed #51218

GiteaMirror referenced this issue

2026-05-04 08:07:25 -05:00

[GH-ISSUE #7401] Configure docker image to start with some models installed #66763

GiteaMirror referenced this issue

2026-05-09 14:00:54 -05:00

[GH-ISSUE #7401] Configure docker image to start with some models installed #82387


Reference: github-starred/ollama#7401