[GH-ISSUE #8887] The stream mode doesn't work with Function Calling #5762

Closed
opened 2026-04-12 17:05:12 -05:00 by GiteaMirror · 21 comments

Originally created by @dickens88 on GitHub (Feb 6, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/8887

Originally assigned to: @ParthSareen on GitHub.

What is the issue?

Hi,

I'm trying to use Function Calling with stream mode, and I just realized that when the function list is included in the request, the response always uses non-stream mode, even when the input doesn't need a function.

I used the OpenAI SDK to connect to the Ollama API. My testing code looks like this:

final_tool_calls = {}

response = self.openai.chat.completions.create(
    model="qwen2.5:14b",
    # messages must be a list of chat messages, not a bare string
    messages=[{"role": "user", "content": "please write me a 800 words post about AI"}],
    tools=registry.list_functions(),
    temperature=0.2,
    stream=True
)

for chunk in response:
    for tool_call in chunk.choices[0].delta.tool_calls or []:
        index = tool_call.index
        if index not in final_tool_calls:
            final_tool_calls[index] = tool_call
        if self.is_openai_model(self.model):
            # OpenAI models stream the arguments in fragments, so accumulate them
            final_tool_calls[index].function.arguments += tool_call.function.arguments

    if chunk.choices[0].delta.content is not None:
        # wrap the chunk with {"text": ...} and yield it as JSON
        yield json.dumps({"text": chunk.choices[0].delta.content}, ensure_ascii=False)

If I remove `tools=registry.list_functions(),`, the output looks like a stream: there are multiple chunks, each wrapped in JSON.

{"text": "Certainly"}{"text": "!"}{"text": " Here"}{"text": "'s"}{"text": " an"}{"text": " engaging"}{"text": " and"}{"text": " informative"}{"text": " blog"}

But after I add `tools=registry.list_functions(),` back, the output no longer looks like a stream; all the content arrives in a single chunk:

{"text": "Creating an 800-word article on AI is quite extensive for this format, but I can certainly provide you with a detailed ..."}

I'm not sure if the Ollama API really works fine with stream mode.

Relevant log output


OS

Linux

GPU

Nvidia

CPU

Intel

Ollama version

0.5.7

GiteaMirror added the bug label 2026-04-12 17:05:12 -05:00

@rick-github commented on GitHub (Feb 6, 2025):

https://github.com/ollama/ollama/issues/7886


@ParthSareen commented on GitHub (Feb 6, 2025):

I'm not sure what the .list_functions() does, but assuming it is giving the schema of the tool, if the LLM does not respond with a tool then yes, Ollama will send the content back as a single chunk. See: https://github.com/ollama/ollama/issues/5796#issuecomment-2508764074

The current workaround is to only add tools when you're expecting to use them.
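
To illustrate, here is a minimal sketch of that workaround (not official guidance): the OpenAI SDK pointed at Ollama's OpenAI-compatible endpoint, with a hypothetical `wants_tools()` heuristic deciding per request whether to attach the schemas from `registry.list_functions()`.

```python
from openai import OpenAI

# Sketch only: Ollama's OpenAI-compatible endpoint on localhost is assumed,
# and wants_tools() is a hypothetical predicate you would implement yourself.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

def chat(messages, tools, wants_tools):
    kwargs = dict(model="qwen2.5:14b", messages=messages, stream=True)
    if wants_tools(messages):
        # Attach tool schemas only when a tool call is actually expected;
        # otherwise the response keeps streaming token by token.
        kwargs["tools"] = tools
    return client.chat.completions.create(**kwargs)
```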


@ParthSareen commented on GitHub (Feb 6, 2025):

Happy to open this again if needed :)


@dickens88 commented on GitHub (Feb 7, 2025):

@ParthSareen Thank you so much for the reply. If we compare with the stream mode implemented by OpenAI, we can see that even with all the tool schemas added, the reply still streams, which means the response message is split into multiple chunks. With stream mode the frontend feels faster and smoother.

Therefore, I think the current Ollama stream mode does not work in the function-calling scenario. Please consider whether we can reopen the ticket.


@ParthSareen commented on GitHub (Feb 7, 2025):

> @ParthSareen Thank you so much for the reply. If we compare with the stream mode implemented by OpenAI, we can see that even with all the tool schemas added, the reply still streams, which means the response message is split into multiple chunks. With stream mode the frontend feels faster and smoother.
>
> Therefore, I think the current Ollama stream mode does not work in the function-calling scenario. Please consider whether we can reopen the ticket.

If you're using tool calling, it really shouldn't matter what the split chunks are if you're expecting a function call. The current design still returns fully parsed tool calls *as a stream*, which is a better design: you know which tools to call. If each chunk were streamed back, you'd still have to wait until it was recognized as a full tool call.
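
To illustrate the client-side effect of that design (a hedged sketch, not documented Ollama behavior): `response` below is the streaming result of the `create(..., tools=..., stream=True)` call from the original report, and each tool call is assumed to arrive already fully parsed rather than as argument fragments.

```python
# Sketch only: assumes each streamed tool call arrives fully parsed,
# as described above, so no delta re-assembly is needed.
for chunk in response:
    delta = chunk.choices[0].delta
    for tool_call in delta.tool_calls or []:
        # name and arguments should already be complete in this chunk
        print(tool_call.function.name, tool_call.function.arguments)
    if delta.content:
        print(delta.content, end="", flush=True)
```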


@dickens88 commented on GitHub (Feb 8, 2025):

> If you're using tool calling, it really shouldn't matter what the split chunks are if you're expecting a function call. The current design still returns fully parsed tool calls *as a stream*, which is a better design: you know which tools to call. If each chunk were streamed back, you'd still have to wait until it was recognized as a full tool call.

You are right, but tool calling is not only about calling tools; the function result also has to go back into the context for the next round of chat. Right now both the tool call and the final chat are in non-stream mode. The [OpenAI streaming function calling](https://platform.openai.com/docs/guides/function-calling?lang=curl&strict-mode=enabled) docs show how to collect a function call in stream mode, and the SDK code looks simple:

final_tool_calls = {}

# `stream` is the response from chat.completions.create(..., stream=True)
for chunk in stream:
    for tool_call in chunk.choices[0].delta.tool_calls or []:
        index = tool_call.index

        if index not in final_tool_calls:
            final_tool_calls[index] = tool_call

        final_tool_calls[index].function.arguments += tool_call.function.arguments

At the same time, the final answer, which includes the result of the function call, can still be output in stream mode. In short, OpenAI's API fully supports streaming both for function calling and for the follow-up chat with the function result. It uses the same API and lets the model decide when to call a function and when to return text.
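
To make that round trip concrete, here is a hedged sketch against the standard OpenAI chat-completions API (nothing Ollama-specific is assumed beyond the endpoint): execute the tool calls assembled above, append their results to the conversation, and stream the final answer. `run_tool()` is a hypothetical dispatcher.

```python
import json

# Sketch only: final_tool_calls is the dict assembled from the stream above,
# and run_tool() is a hypothetical function-name -> result dispatcher.
def finish_after_tools(client, model, messages, final_tool_calls, run_tool):
    # Echo the assistant's tool calls back into the conversation...
    messages.append({
        "role": "assistant",
        "tool_calls": [
            {
                "id": tc.id,
                "type": "function",
                "function": {"name": tc.function.name,
                             "arguments": tc.function.arguments},
            }
            for tc in final_tool_calls.values()
        ],
    })
    # ...then add one "tool" message per executed result.
    for tc in final_tool_calls.values():
        result = run_tool(tc.function.name, json.loads(tc.function.arguments or "{}"))
        messages.append({"role": "tool", "tool_call_id": tc.id,
                         "content": json.dumps(result)})

    # Second request: the tool-informed final answer can stream normally.
    stream = client.chat.completions.create(model=model, messages=messages, stream=True)
    for chunk in stream:
        content = chunk.choices[0].delta.content
        if content:
            yield content
```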


@UlrikWKoren commented on GitHub (Mar 12, 2025):

Please FIX This, we need to be able to stream while we have given tools. Please!


@jbcallaghan commented on GitHub (Mar 22, 2025):

I can understand that when a tool is called there wouldn't be much benefit to streaming in chunks, but what about when the LLM doesn't require a tool and responds directly? I have never managed to get that part to stream in chunks.


@ParthSareen commented on GitHub (Mar 25, 2025):

Planning to fix this in the coming weeks, folks: we can check whether tool calls are coming down or not and then stream back the result! Sorry for the wait 🙏🏽


@RakeshReddyKondeti commented on GitHub (Mar 26, 2025):

Hi @ParthSareen

I'm working on an application where I'd like users to see responses in real-time, but I also need function calling capabilities. Is there any quick and dirty workaround that might help in the interim?


@ParthSareen commented on GitHub (Mar 26, 2025):

@RakeshReddyKondeti in the meantime you can just have two clients: one streaming without tools passed in, and one with tools for function calling. Let me know how that goes.
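
One way to read that interim suggestion, as a hedged sketch (endpoint and model name are assumptions): a non-streaming request path with tools attached for when a function call is wanted, and a separate streaming path without tools for plain chat.

```python
from openai import OpenAI

# Sketch only: Ollama's OpenAI-compatible endpoint on localhost is assumed.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

def tool_request(messages, tools):
    # Tools attached: the reply may contain fully parsed tool calls (no streaming).
    return client.chat.completions.create(
        model="qwen2.5:14b", messages=messages, tools=tools)

def chat_request(messages):
    # No tools attached: the reply streams token by token.
    return client.chat.completions.create(
        model="qwen2.5:14b", messages=messages, stream=True)
```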


@RakeshReddyKondeti commented on GitHub (Mar 26, 2025):

Thanks for the response. I appreciate the suggested workaround, but I think it doesn't quite address my specific use case.

My application requires giving the LLM access to tools, but letting the model itself decide whether to use them for each query. The current behavior forces me to choose between:

  1. Providing tools but losing streaming entirely (even when the LLM chooses not to use tools)
  2. Having streaming but removing the LLM's ability to use tools when needed

The suggested approach of having two separate clients would require me to predict in advance whether the LLM will need tools for a given query, which defeats the purpose of letting the LLM make that decision during inference.

I'll wait for the update (hopefully soon), as this functionality is important for my use case. Thanks for working on this issue.


@NasonZ commented on GitHub (Apr 2, 2025):

> Thanks for the response. I appreciate the suggested workaround, but I think it doesn't quite address my specific use case.
>
> My application requires giving the LLM access to tools, but letting the model itself decide whether to use them for each query. The current behavior forces me to choose between:
>
> 1. Providing tools but losing streaming entirely (even when the LLM chooses not to use tools)
> 2. Having streaming but removing the LLM's ability to use tools when needed
>
> The suggested approach of having two separate clients would require me to predict in advance whether the LLM will need tools for a given query, which defeats the purpose of letting the LLM make that decision during inference.
>
> I'll wait for the update (hopefully soon), as this functionality is important for my use case. Thanks for working on this issue.

I also have the same use case. Commenting to keep an eye out for updates


@smileyboy2019 commented on GitHub (Apr 25, 2025):

@ParthSareen @dickens88 When will this be resolved so that content can stream back when tools are provided?


@ParthSareen commented on GitHub (Apr 25, 2025):

Working on it right now! @smileyboy2019


@danny-avila commented on GitHub (May 7, 2025):

Waiting for this :)


@anyon17 commented on GitHub (May 10, 2025):

Is this issue resolved? With MCP tools this issue is even more important, because MCP servers are pre-registered with the LLM. Right now I am getting a single chunk when tools are added, even if the response does not include any tool call. Can anyone help with this issue?


@ghassenbenghorbal commented on GitHub (May 17, 2025):

Any news?


@ParthSareen commented on GitHub (May 17, 2025):

Almost done folks. Just some last bit of cleanup left. Going to break this massive PR into some smaller chunks.

If you want to try it out: https://github.com/ollama/ollama/pull/10415


@danny-avila commented on GitHub (May 25, 2025):

> Almost done folks. Just some last bit of cleanup left. Going to break this massive PR into some smaller chunks.
>
> If you want to try it out: #10415

Looking forward to it!


@fwq418233640 commented on GitHub (Jun 3, 2025):

> Almost done folks. Just some last bit of cleanup left. Going to break this massive PR into some smaller chunks.
>
> If you want to try it out: #10415

Thank you very much and I am looking forward to the release of this feature!

Reference: github-starred/ollama#5762