[GH-ISSUE #11768] Access Ollama Turbo through the local Ollama API #33557

Closed
opened 2026-04-22 16:24:20 -05:00 by GiteaMirror · 14 comments

Originally created by @owenzhao on GitHub (Aug 7, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/11768

Originally assigned to: @pdevine on GitHub.

Ollama currently offers two access methods: the locally deployed Ollama, which can be accessed without an API key, and Ollama Turbo, which requires an API key. My suggestion is to add a way for the locally deployed Ollama, still accessed without an API key, to automatically call Ollama Turbo when needed. The reasons are as follows:

  1. A unified API lets third-party developers support all Ollama features with minimal code changes.
  2. Not requiring API keys improves security for third-party applications, avoiding unsafe practices such as developers hard-coding API keys.
  3. It increases Ollama's installation rate and usability. Adoption by third-party applications will drive installations of Ollama itself, because even users who don't download local models still need a local Ollama instance to act as a proxy when using Turbo. In the future, Ollama could offer a minimal out-of-the-box experience by pre-installing a small open-source local model, such as Qwen3 at 1.7B or 4B parameters.
  4. It better serves developers. Under current App Store guidelines, macOS applications that require users to enter API keys are likely to be rejected during Apple's review, while iOS applications do not face this limitation. With this approach, developers wouldn't need macOS users to enter API keys. Previously, macOS developers had to relay requests through their own web servers, which is costlier and demands more technical capability.

The specific method I'm proposing: Ollama should determine whether the model the user calls is available locally. If it is not, and the user has Turbo access, Ollama should automatically call the Turbo model. That way, users don't need to enter Ollama Turbo credentials in third-party applications and can use them directly. Ollama could then provide settings that let users control whether a specific application is allowed to access Turbo; the detailed design can be left to the Ollama team.
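
To make the proposal concrete, here is a rough sketch of the kind of fallback being described, seen from outside the daemon. The Turbo host, the token handling, and the `has_local_model` helper are illustrative assumptions, not Ollama's actual implementation:

```python
# Rough sketch of the proposed local-first / Turbo-fallback routing.
# LOCAL/TURBO hosts, token handling, and has_local_model() are assumptions.
import requests

LOCAL = "http://localhost:11434"
TURBO = "https://ollama.com"  # assumed upstream for Turbo

def has_local_model(name: str) -> bool:
    """Check the local model list via the documented /api/tags endpoint."""
    tags = requests.get(f"{LOCAL}/api/tags", timeout=5).json()
    return any(m["name"] == name for m in tags.get("models", []))

def chat(model: str, messages: list, turbo_token: str | None = None) -> dict:
    """Use the local server if the model is present; otherwise fall back to Turbo."""
    if has_local_model(model) or turbo_token is None:
        url, headers = f"{LOCAL}/api/chat", {}
    else:
        url, headers = f"{TURBO}/api/chat", {"Authorization": f"Bearer {turbo_token}"}
    body = {"model": model, "messages": messages, "stream": False}
    return requests.post(url, json=body, headers=headers, timeout=120).json()
```

In the proposal this decision would live inside the Ollama daemon itself, so applications would only ever talk to the local endpoint.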

GiteaMirror added the cloud and feature request labels 2026-04-22 16:24:20 -05:00

@BumpyClock commented on GitHub (Aug 7, 2025):

+1. I feel pretty dumb subbing to Turbo; I took the access via the GitHub CLI to mean that it would also be accessible via the local Ollama API. Without that I don't quite see the point of Turbo, since that's the most common use case for me. I know I can use it with the Python library, but then I have to migrate all my code away from the OpenAI SDK instead of just switching the baseURL to Ollama.
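
For reference, the baseURL switch in question is the usual pattern against the local daemon's OpenAI-compatible endpoint (the model name here is just an example):

```python
# Unmodified OpenAI SDK code pointed at local Ollama's OpenAI-compatible endpoint.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")  # key required by the SDK, ignored locally
resp = client.chat.completions.create(
    model="gpt-oss:20b",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```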


@pdevine commented on GitHub (Aug 7, 2025):

Hey guys, this is something I've already been looking at and have a working prototype. I'll see if I can get it into shippable shape soon. The cool thing is you can just use a Modelfile and set whatever parameters you want so you can tweak the default Turbo mode settings for the model.


@LivioGama commented on GitHub (Aug 7, 2025):

Great @pdevine!
I also wrote this to one of your colleagues who reached out by email:

> After thinking about it, that's actually inaccurate. SSH keys cannot be used to communicate directly over HTTP; the protocol isn't made for that.
> Since you have probably faced this issue, I decided to write to you, because there is still a clever way:
> It's a bit of code, but since users are required to send their SSH key, just like in the tutorial (https://github.com/ollama/ollama/blob/main/docs/turbo.md), you could assume that this key (or part of it) IS the first bearer, and set it in the profile. The Ollama client then simply needs an update to treat the SSH key as the actual bearer and challenge auth with it.
> It's creative, but not perfect. There is still the question of expiration and rotation, but the current token implementation doesn't solve that either, anyway :)


@pdevine commented on GitHub (Aug 7, 2025):

@LivioGama The way that the ed25519 keys work is that you use your private key to sign the request, and then the pubkey and the signature are sent as the bearer token. Your pubkey is then matched and the signature is verified against the request being made. This is between your local ollama server and ollama.com.

I have another change which implements this same method between the local client and the local server. It uses an `authorized_keys` file with simple RBAC for any of the API endpoints. The draft PR is up at #11574.
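
A rough sketch of the signing flow described here; the exact payload that gets signed and the "pubkey:signature" bearer encoding are assumptions, not Ollama's real wire format:

```python
# Illustrative: sign the request with the ed25519 private key, send
# pubkey + signature as the bearer. Payload and encoding are assumed.
import base64
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

private_key = Ed25519PrivateKey.generate()  # in practice: the user's existing Ollama key
public_raw = private_key.public_key().public_bytes(
    encoding=serialization.Encoding.Raw, format=serialization.PublicFormat.Raw)

payload = b"POST /api/chat <nonce or body hash>"  # assumed canonical request form
signature = private_key.sign(payload)

bearer = base64.b64encode(public_raw).decode() + ":" + base64.b64encode(signature).decode()
headers = {"Authorization": f"Bearer {bearer}"}
# The server matches the pubkey (e.g. against an authorized_keys entry) and
# verifies the signature over the same payload before serving the request.
```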


@LivioGama commented on GitHub (Aug 8, 2025):

Thank you very much, I really appreciate your responsiveness.
I can see from your answer that you know what you're doing. Could you explain concisely, then, what went wrong with this release, such that the remote Turbo models don't show up? From what I understand it should have worked with the ed25519 key. Unless you are actually talking about the implementation you recently prepared to fix the issue?


@pdevine commented on GitHub (Aug 8, 2025):

@LivioGama Yes, the feature isn't ready yet. Instead, what was released was the Ollama API running on ollama.com. Using the CLI you can run `OLLAMA_HOST=ollama.com ollama ls` and see each of the Turbo models, and then use `OLLAMA_HOST=ollama.com ollama run gpt-oss:120b` to run the 120b model remotely.
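
The same host switch also works from the Python library; the API-key bearer header below is an assumption about how Turbo keys are supplied, so treat this as a sketch rather than official usage:

```python
# Point the Ollama Python client at ollama.com instead of localhost.
# The Authorization header value is a placeholder for a Turbo API key.
from ollama import Client

client = Client(
    host="https://ollama.com",
    headers={"Authorization": "Bearer <your ollama.com API key>"},
)
resp = client.chat(
    model="gpt-oss:120b",
    messages=[{"role": "user", "content": "Hello from the remote API"}],
)
print(resp["message"]["content"])
```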


@LivioGama commented on GitHub (Aug 8, 2025):

Alright, so for now the OpenAI-compatible route I found is the only solution for IDEs 😊 https://github.com/LivioGama/gpt-oss-120b-MAX


@BumpyClock commented on GitHub (Aug 8, 2025):

What a legend! Thank you for sharing that


@BumpyClock commented on GitHub (Aug 11, 2025):

> @LivioGama Yes, the feature isn't ready yet. Instead, what was released was the Ollama API running on ollama.com. Using the CLI you can run `OLLAMA_HOST=ollama.com ollama ls` and see each of the Turbo models, and then use `OLLAMA_HOST=ollama.com ollama run gpt-oss:120b` to run the 120b model remotely.

The problem with doing

`OLLAMA_HOST=ollama.com ollama run gpt-oss:120b`

is that it's not really accessible via the local API that way. To me that was the biggest appeal of Turbo: being able to just point at Ollama for local testing, use a larger model, and see how that works. That not being available in Turbo is the biggest bummer. I know you're working on it. Looking forward to it; until then I'm using the workaround that @LivioGama posted and it's working okay so far.


@LivioGama commented on GitHub (Aug 13, 2025):

@BumpyClock I have phenomenally good news!
It turns out my implementation was only working in "non stream" mode. I reworked it completely to support streaming, as well as tool calling, etc. Now it works fully with RooCode/KiloCode and other LLM tools! Just pull and you're good to go.

![Image](https://github.com/user-attachments/assets/4505ef80-b33b-40a0-8099-69a1435db638)
![Image](https://github.com/user-attachments/assets/a89e69d6-2af9-4e10-b2fa-c85b2eebd1d7)

@BumpyClock commented on GitHub (Aug 14, 2025):

Love it! In my local branch I made it a transparent proxy for the Ollama endpoints so I could use it with OpenWebUI, and it worked in streaming mode. I'll check out the latest. Appreciate it!

Edit: @LivioGama [here's my fork](https://github.com/BumpyClock/gpt-oss-120b-MAX); I (well, Claude) updated it to be a transparent proxy that combines the local and cloud models.


@LivioGama commented on GitHub (Aug 15, 2025):

@BumpyClock I also did it in my project!
I run both:

  • Ollama proxy on localhost:3305
  • OpenAI compatible on localhost:3304

What I noticed:

  • [Roo Code will not work with the Ollama proxy](https://discord.com/channels/1128867683291627614/1128867684130508875/1405352237081038899) but does with OpenAI; I filed a bug and it was auto-fixed by a bot and auto-reviewed by the same bot 😅 https://github.com/RooCodeInc/Roo-Code/issues/7070
  • Kilo Code runs correctly with both approaches (which is funny, since Kilo Code is a fork of Roo Code)
  • JetBrains AI will not work with Ollama, but [does with OpenAI](https://discord.com/channels/1318600112242561076/1318600112695677031/1405538818979008634)
  • [Cline supports the bearer header](https://discord.com/channels/1128867683291627614/1128867684130508875/1405356128778457219) directly, which means no need for this proxy. [Example video](https://discord.com/channels/1318600112242561076/1318600112695677031/1405681117159358584)

Don't hesitate to share if you have feedback from your side too; curious to see how this evolves.

PS: Still getting some nasty errors:

Unexpected API Response: The language model did not provide any assistant
messages. This may indicate an issue with the API or the model's output.

It's really that the tools expect a non-empty assistant message, which is not always provided...

![Image](https://github.com/user-attachments/assets/9b385da6-38c5-424a-9b9c-988f871ed34f)
I tried to enforce it with a rule/guideline:

ALWAYS ALWAYS INCLUDE A NON EMPTY MESSAGE FROM THE ROLE ASSISTANT IN ALL OF OUR INTERACTIONS AS THE FIRST MESSAGE OF YOUR ANSWER. FAILING TO COMPLY TO THIS RULE IS ACTUALLY THE WORSE MISTAKE TO DO, IT BREAKS EVERYTHING AND I GET "Unexpected API Response: The language model did not provide any assistant messages. This may indicate an issue with the API or the model's output."

It's a bit better, but not a complete fix... Maybe there is a way to enforce a non-empty assistant message on the proxy side, but that seems complicated with streaming...
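
One possible proxy-side mitigation, sketched below under the assumption that the proxy relays Ollama-style streaming chunks; the chunk shapes and the `patch_stream` helper are illustrative, not part of any shipped proxy:

```python
# Watch the streamed chunks; if the model never produced assistant text
# (e.g. it only emitted tool calls), inject a minimal non-empty assistant
# delta before the final "done" chunk so strict clients don't error out.
import json

def patch_stream(upstream_lines):
    """upstream_lines yields newline-delimited JSON chunks from the upstream server."""
    saw_text = False
    for line in upstream_lines:
        chunk = json.loads(line)
        if chunk.get("message", {}).get("content"):
            saw_text = True
        if chunk.get("done") and not saw_text:
            filler = dict(chunk, done=False,
                          message={"role": "assistant", "content": "(no text response)"})
            yield json.dumps(filler)  # placeholder assistant delta
        yield json.dumps(chunk)
```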

🎁 As a bonus if you read until here (thanks 🎉): I managed to make cline a bit more "agentic" and take initiative like cursor, crafting this rule/guideline:

Try to take initiative, don't assume I only want the task I asked for, guess what I would like after that (in term of actions non related to code), and run them. For example, if I ask for code, it's very likely that I don't only want code, I wanted to be from up to date libraries using mcp context7. I also want you to check that it's compiling. And I also want you to run the program if possible. Then the more important is that you need to analyze the output of that run in order to detect possible errors and fix them.

🎁 As bonus 2: I also added local file logging, and it is very interesting to see how Roo Code / Kilo Code were built: they simply created their own language with tags on top of the LLM:

![Image](https://github.com/user-attachments/assets/531e4fc1-902e-468c-a9e3-32305f091c42)

Enjoy 🎉


@mdlmarkham commented on GitHub (Sep 3, 2025):

I was able to connect OpenWebUI to Ollama Turbo following the docs... and then use OpenWebUI to proxy both of the GPT-OSS models locally through the /api/v1 endpoint. GPT-OSS works with tools in n8n.


@jmorganca commented on GitHub (Sep 21, 2025):

This is now possible with [cloud models](https://ollama.com/blog/cloud-models)! With Ollama 0.12.0, you can now run:

`ollama pull qwen3-coder:480b-cloud`

And then sign in by running:

`ollama signin`

Then refer to `qwen3-coder:480b-cloud` in the API or other tools.

Let me know if you have any trouble getting up and running.
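
A minimal example of what "refer to it in the API" looks like with the Python library, assuming the pull and sign-in above have already been done:

```python
# The cloud model is addressed through the ordinary local API,
# exactly like any locally downloaded model.
import ollama

resp = ollama.chat(
    model="qwen3-coder:480b-cloud",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(resp["message"]["content"])
```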

Reference: github-starred/ollama#33557