[GH-ISSUE #6002] JSON Schema conformity using Llama.cpp Grammar generation for Tool Calling #65790

Closed
opened 2026-05-03 22:42:13 -05:00 by GiteaMirror · 17 comments

Originally created by @marcnnn on GitHub (Jul 26, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/6002

Originally assigned to: @ParthSareen on GitHub.

First, thanks to the ollama team; it's a pleasure to use!

I was looking into the topic of schemas and grammars from the tools perspective.

I assume:

  • the tool arguments JSON schema is only inserted into the prompt by ollama
  • no grammar is used to enforce the JSON schema for tool arguments (only the generic JSON grammar is used)
  • the JSON schema is not used for validation after parsing either

Since multiple pull requests on the topic of JSON schemas and grammars are open, I would like to address that by suggesting closing them.
For the few people who actually need raw grammar support, using llama.cpp without ollama seems appropriate, since using grammars directly is already very low level.

ollama could instead focus on supporting tool calling with JSON Schema conformity.

One would need to combine all possible tools into one JSON Schema,
a bit like in this example: https://github.com/ggerganov/llama.cpp/issues/7703
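
For illustration, a minimal sketch of what that combination could look like, assuming OpenAI-style tool definitions; `combine_tools_into_schema` is a made-up helper, not an ollama API:

```python
# Hypothetical sketch: merge several tool definitions into one JSON Schema
# whose anyOf branches each pin a tool name to its argument schema.
def combine_tools_into_schema(tools: list[dict]) -> dict:
    branches = []
    for tool in tools:
        fn = tool["function"]  # OpenAI-style tool definition (assumption)
        branches.append({
            "type": "object",
            "properties": {
                "name": {"const": fn["name"]},
                "arguments": fn["parameters"],
            },
            "required": ["name", "arguments"],
        })
    return {"anyOf": branches}
```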

Then the model can be constrained to comply with the correct tool use.
The problem is that the exact tool-call output format depends on the model:

https://github.com/ollama/ollama/blob/f5e3939220e9cd3d7a636708bc9df031ebfd4854/server/testdata/tools/llama3-groq-tool-use.out#L14

vs

https://github.com/ollama/ollama/blob/f5e3939220e9cd3d7a636708bc9df031ebfd4854/server/testdata/tools/firefunction.out#L17

vs

https://github.com/meta-llama/llama-models/blob/e51c73ac639a38877da9bdfaecb4cb07dc8ba6d0/models/llama3_1/api/tool_utils.py#L16

How to deal with:

  • "functools" in firefunction
  • the "<tool_call>" token in llama3-groq
  • "<function=****>" for llama 3.1

is not clear to me yet, since I think it would be best to leverage the JSON-Schema-to-grammar functionality in llama.cpp (a sketch of the idea follows).
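
To make that concrete, a rough sketch: convert the tool schema to a grammar, then wrap it in the model-specific markers. Everything here is hypothetical; `json_schema_to_gbnf` stands in for llama.cpp's JSON-schema-to-grammar conversion (there is no such Python binding), and the wrapper strings are only loosely taken from the files linked above:

```python
def json_schema_to_gbnf(schema: dict) -> str:
    # Stand-in for llama.cpp's JSON-schema-to-grammar conversion;
    # returns a trivial rule so the sketch runs end to end.
    return 'schema-root ::= "{" [^}]* "}"'

# (prefix, suffix) markers per model family -- assumptions, not verified.
WRAPPERS = {
    "llama3-groq-tool-use": ("<tool_call>", "</tool_call>"),
    "firefunction": (" functools", ""),
}

def tool_call_grammar(model: str, schema: dict) -> str:
    prefix, suffix = WRAPPERS[model]
    body = json_schema_to_gbnf(schema)
    # Quote the literal markers in GBNF and splice in the schema grammar.
    return f'root ::= "{prefix}" schema-root "{suffix}"\n{body}'
```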

@mxyng What do you think about that?

Should I spend time working on that, or is there no chance it would be merged into ollama?

This should address:

https://github.com/ollama/ollama/issues/5976
https://github.com/ollama/ollama/issues/1507
https://github.com/ollama/ollama/pull/5348
https://github.com/ollama/ollama/pull/830
https://github.com/ollama/ollama/pull/565
https://github.com/ollama/ollama/pull/1606
https://github.com/ollama/ollama/pull/4525
https://github.com/ollama/ollama/pull/2404

on the consumer side:
https://github.com/thmsmlr/instructor_ex/issues/11


@NeuralNotwerk commented on GitHub (Jul 30, 2024):

I'd love to see this feature added. I don't understand why it isn't just passed straight through to ggml via llama.cpp. We need arbitrary grammars in GBNF format.


@Kinglord commented on GitHub (Aug 7, 2024):

Hey all, I know there's an automated ping here, but just to better align everyone, please check out and comment on my new call to the Ollama team for clarity. As always, please be civil and stay on topic! 😄 - https://github.com/ollama/ollama/issues/6237


@marcnnn commented on GitHub (Aug 7, 2024):

@Kinglord Thanks for creating a place for that discussion.
I tried to get around that discussion here.

Tool use with models that are trained specifically for that, like llama3.1,
comes with its own challenges for the grammar:
just using the JSON-schema-to-grammar conversion in llama.cpp will not be enough, as explained above.

A way to template the grammar generation in the model file could be a solution that I am thinking about.


@ParthSareen commented on GitHub (Dec 5, 2024):

Hi! Going to close this out as we're supporting structured outputs through https://github.com/ollama/ollama/pull/7900

Left a comment with some background as well: https://github.com/ollama/ollama/issues/6237#issuecomment-2518836758


@allenporter commented on GitHub (Dec 9, 2024):

@ParthSareen I interpreted this request as actually using the provided tool-calling schema for the grammar, which I believe is slightly different from supporting structured outputs in the response. Did this change actually support grammars for tool calls? I am assuming not, because it appears to pass in the request format only, without looking at the tool format.

I want to make sure this was intentional and not a misunderstanding. Having tool calls follow the schema would be a huge quality win for smaller models that don't always produce correct tool-call outputs.


@ParthSareen commented on GitHub (Dec 9, 2024):

Thanks for the ping @allenporter. Seems like I misinterpreted on my first run-through of this. Reopening the issue for now. Going to think a bit about how we can support this, what extensibility looks like, and whether it makes sense for the stage of the project we're in. I'm also not sure what the exact interface would look like, but am going to think through this one. I do think this could be really cool and improve accuracy. Just a bit worried about the interface, as there are big updates to the engine incoming.

Will keep you all posted! Thanks!


@allenporter commented on GitHub (Dec 9, 2024):

Ok thanks, I'm happy to help contribute (I'm familiar with some of how the tool parsing code works), but I also know it's a bit tricky, since it depends on model-specific formats as described above by the reporter. And as you say, if the tech direction is shifting here, that adds a bit more to the requirements...

Happy to keep discussing.

Just to motivate this a bit: the primary use case I'm pushing for is to improve tool-use quality for Home Assistant device control, where llama3.1 sometimes gets the tool params wrong. (We have some benchmarks tracking this.)


@ParthSareen commented on GitHub (Dec 9, 2024):

@allenporter If you'd like, you can take a crack at it for fun and see how far you get. There are a couple of things I need to get to in the meantime, but I can pick it up from wherever you leave off. We can coauthor it if you're interested 😄 For a starting point I'd dig into the ChatHandler around here: https://github.com/ollama/ollama/blob/da09488fbfc437c55a94bc5374b0850d935ea09f/server/routes.go#L1467-L1530

We're also currently using go templates for the tool parsing - something that I potentially want to refactor too. Would be out of scope for the PR but important to keep in mind with whatever you prototype. If you do choose to pick this up just open a draft PR and tag me!

Thanks!


@allenporter commented on GitHub (Dec 10, 2024):

> We're also currently using go templates for the tool parsing - something that I potentially want to refactor too.

Yeah, when I was looking at this, it seemed a little difficult to specify up front as a grammar since:
(1) the tool response format depends on the model / template
(2) tool responses are optional

My impression is the flow is something like this to prepare for a tool call:

  • https://github.com/ollama/ollama/blob/900f64e6be859f52350c25032ff5b11f10509c7e/server/prompt.go#L25 appends any present tool calls into the request
  • The completion request point you cited will have the tool calls in the request
  • The response is parsed at https://github.com/ollama/ollama/blob/900f64e6be859f52350c25032ff5b11f10509c7e/server/model.go#L303 using the template.
  • Response parsing is a little more creative. It assumes that the output must contain json inside whatever is specified in the template (works for most models, but not all):
    • It instantiates the template with a fake tool call using placeholder variable names
    • It then brute-force parses the output as json at https://github.com/ollama/ollama/blob/900f64e6be859f52350c25032ff5b11f10509c7e/server/model.go#L277, iterating through each character and parsing objects as it goes
    • Then it reverse-engineers which fields contain the tool names and arguments from the placeholder names
    • Then it repeats the same process for the real output, collecting the tool call objects (see the sketch after this list)
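
A minimal Python sketch of just the brute-force JSON scan, to illustrate the idea (the real implementation is Go in server/model.go; this helper name is made up):

```python
import json

def scan_json_objects(text: str) -> list:
    """Attempt to decode a JSON object at every '{' in the text, skipping
    past any span that parses successfully -- mirrors the character-by-
    character scan described above."""
    decoder = json.JSONDecoder()
    objects, i = [], 0
    while i < len(text):
        if text[i] == "{":
            try:
                obj, end = decoder.raw_decode(text[i:])
                objects.append(obj)
                i += end  # jump past the parsed object
                continue
            except ValueError:
                pass  # not valid JSON starting here; keep scanning
        i += 1
    return objects

# e.g. scan_json_objects('<tool_call>{"name": "f", "arguments": {"x": 1}}')
# -> [{'name': 'f', 'arguments': {'x': 1}}]
```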

Proposed approach to get started:

  • Maybe we start simple with a model like llama3, which has a very straightforward format: either it's responding with text, or it's responding with json, with no other wrapping characters.
  • Define a grammar where the structure is optional: unless it starts producing json it's free text, and once it does, it must follow a schema where the `parameters` match the tool calls in the request (see the grammar sketch after this list).
  • It looks possible to even try this out with the current request API by making assumptions about the model format, except for the part where it's optional
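
Roughly, the top of such a grammar might look like this; a hand-written GBNF sketch (embedded in a Python string for readability), not output from any existing converter, with the schema-derived rules elided:

```python
# GBNF sketch where a tool call is one *option*, not mandatory: the model
# may emit free text, or a JSON object pinned to the tool's schema.
# "tool-args" would be generated from the tool's JSON schema (elided here).
OPTIONAL_TOOL_GRAMMAR = r'''
root      ::= text | tool-call
text      ::= [^{]+
tool-call ::= "{" ws "\"name\"" ws ":" ws "\"get_weather\"" ws ","
              ws "\"parameters\"" ws ":" ws tool-args ws "}"
ws        ::= [ \t\n]*
'''
```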

@ParthSareen commented on GitHub (Dec 13, 2024):

@allenporter I think this sounds like a good start - would be cool to get a prototype working! Keep me posted!


@allenporter commented on GitHub (Dec 13, 2024):

I made a simple notebook to call `llama3.1` w/ tools while also passing the json schema of the tools to the `format` parameter as an `anyOf`:
https://github.com/allenporter/ml-papers/blob/main/function-calling/ollama/json-schema.ipynb -- the notebook generates the schema by iterating over the tools.

It works well (it's easier to tell if it's working by changing around the tools and mismatching the questions, since you can see it force tool calls it would not otherwise make).

This does make tool calling mandatory when using this approach. Llama generally biases towards calling tools when they are provided, but when you apply the format it will *always* call tools, which is not always desired.
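
For reference, the shape of that experiment as a hedged sketch using the ollama Python client; the tool, question, and schema construction here are illustrative, not taken from the notebook:

```python
import ollama

weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

# One anyOf branch per tool, pinning the name to its argument schema.
schema = {
    "anyOf": [{
        "type": "object",
        "properties": {
            "name": {"const": "get_weather"},
            "arguments": weather_tool["function"]["parameters"],
        },
        "required": ["name", "arguments"],
    }]
}

response = ollama.chat(
    model="llama3.1",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=[weather_tool],
    format=schema,  # constrain the raw output to the combined tool schema
)
print(response["message"]["content"])
```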


@allenporter commented on GitHub (Dec 13, 2024):

I've applied this technique to the Home Assistant `assist-mini` [benchmark](https://github.com/allenporter/home-assistant-datasets/tree/main/reports) using the naive approach above, and `llama3.1:8b` appears to improve from 83% to 91% before/after, confirming that using a tool schema is a worthwhile improvement to explore.


@allenporter commented on GitHub (Dec 13, 2024):

As a next step, I'll try to:

  • get a grammar working that forces a json schema inferred from the tool schema (always)
  • make tools optional in the grammar
  • see if it's possible to create a grammar from the template output

@davidgeorgewilliams commented on GitHub (Dec 13, 2024):

+1, structured outputs from tool calls would be extremely helpful for adoption.


@ParthSareen commented on GitHub (Dec 13, 2024):

The preliminary results look promising @allenporter - the next steps sound good too. Just toss whatever work you get done on a draft PR and tag me. Thanks for working on this! 🙏🏽


@allenporter commented on GitHub (Feb 3, 2025):

The native tool calling support in `llama.cpp` implements this feature: https://github.com/ggerganov/llama.cpp/pull/9639 Quoting:

> Any tool_calls field returned by llama-server should always conform to the JSON schema (to the extent that it uses [supported features of JSON schemas](https://github.com/ggerganov/llama.cpp/tree/master/grammars#json-schemas--gbnf)), so there's no need to use any post-processor.

I could see a world where ollama instead uses the native tool calling to get this feature, assuming template support in llama.cpp can match what ollama has.


@ParthSareen commented on GitHub (Feb 3, 2025):

@allenporter We're working on a new engine and I've been working on sampling, so we won't be directly using llama.cpp for that. We're already doing partial JSON parsing on the tools output, so it makes sense that we'll derive the json schema from the tools. I think we can close this issue out for now, and I'll open something for myself to get to after we have some of the new engine work merged in :) Really appreciate you digging in and validating this!
