[GH-ISSUE #2109] Support loading multiple models at the same time #26968

Closed
opened 2026-04-22 03:46:26 -05:00 by GiteaMirror · 18 comments

Originally created by @Picaso2 on GitHub (Jan 20, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/2109

Originally assigned to: @dhiltgen on GitHub.

Is it possible to create one model from multiple models? Or even load multiple models?


@Dbone29 commented on GitHub (Jan 22, 2024):

You can merge 2 models with another tool. https://huggingface.co/Undi95 does this with some models. After that you can create a GGUF file of the merged model and use it in Ollama whenever you want. Ollama on its own isn't able to combine 2 models.


@cmndcntrlcyber commented on GitHub (Mar 2, 2024):

> You can merge 2 models with another tool. https://huggingface.co/Undi95 does this with some models. After that you can create a GGUF file of the merged model and use it in Ollama whenever you want. Ollama on its own isn't able to combine 2 models.

Do you happen to have a link or name of the tool?


@Dbone29 commented on GitHub (Mar 2, 2024):

There are many tools for this task, but unfortunately, I am not familiar enough to say which one is the best or what the differences between them are. However, here's an example of a tool that I came across last year:

https://github.com/arcee-ai/mergekit
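For anyone wanting a concrete starting point, the rough flow is: merge with mergekit, convert the merged weights to GGUF, then import into Ollama. The sketch below is illustrative only; the config file name, output paths, and the llama.cpp conversion script name are assumptions to verify against the mergekit and llama.cpp docs for your versions.

```
# Merge two Hugging Face models, convert the result to GGUF, import into Ollama.
pip install mergekit

# merge-config.yml lists the source models and merge method
# (see the mergekit README for the exact schema).
mergekit-yaml merge-config.yml ./merged-model

# Convert the merged HF checkpoint to GGUF with llama.cpp's converter
# (the script name varies between llama.cpp versions).
python llama.cpp/convert_hf_to_gguf.py ./merged-model --outfile merged.gguf

# Register the GGUF with Ollama via a minimal Modelfile.
echo "FROM ./merged.gguf" > Modelfile
ollama create my-merged-model -f Modelfile
ollama run my-merged-model
```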


@pdevine commented on GitHub (Mar 11, 2024):

@Picaso2 other than the multimodal models we don't yet support loading multiple models into memory simultaneously. What is the use case you're trying to do?


@mofanke commented on GitHub (Mar 12, 2024):

> @Picaso2 other than the multimodal models we don't _yet_ support loading multiple models into memory simultaneously. What is the use case you're trying to do?

I encountered a similar requirement, and I want to implement a RAG (Retrieval-Augmented Generation) system. It requires using both an embedding model and a chat model separately. Currently, the implementation with Ollama requires constantly switching between models, which slows down the process. It would be much more efficient if there was a way to use them simultaneously.
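For context, the two calls such a RAG loop alternates between look roughly like the sketch below (the model names are just examples; whether both models stay resident at once depends on the Ollama version and its keep-alive settings). With only one model loaded at a time, Ollama swaps models between the two requests, which is the slowdown described here.

```
# 1. Embed a document chunk with an embedding model.
curl http://localhost:11434/api/embeddings -d '{
  "model": "nomic-embed-text",
  "prompt": "Paris is the capital of France."
}'

# 2. Answer with a chat model, using the retrieved context in the prompt.
curl http://localhost:11434/api/chat -d '{
  "model": "mistral",
  "messages": [
    {"role": "user", "content": "Context: Paris is the capital of France. Question: where is the Eiffel Tower?"}
  ]
}'
```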


@Picaso2 commented on GitHub (Mar 12, 2024):

Ultimately I would like to have a system that I can have a conversation with on various topics, from science to politics to math.

@Fruetel commented on GitHub (Mar 16, 2024):

> @Picaso2 other than the multimodal models we don't _yet_ support loading multiple models into memory simultaneously. What is the use case you're trying to do?

I also have a use case for this.

I'm using Crew.ai with Ollama. I have agents that need to use tools, such as search or document retrieval, and other agents that work on the data provided by the tool-using agents. For the tool-using agents I use Hermes-2-Pro-Mistral, which is optimized for tool usage but, at 7 billion parameters, not that smart. It would be awesome to be able to load a smarter Mixtral model for the thinking agents in parallel with Hermes for the tool-using ones.


@dizzyriver commented on GitHub (Mar 16, 2024):

Same. I'd like llava for image to text and mixtral for language reasoning


@alfi4000 commented on GitHub (Mar 19, 2024):

> Same. I'd like llava for image to text and mixtral for language reasoning

Same.


@alfi4000 commented on GitHub (Mar 19, 2024):

Would it be possible to run several models at once, with one on the GPU and the others on CPU and RAM? I want to be able to run several models at the same time, so that if one of my family members is using Ollama through Open WebUI at the same time as me, one request runs on the CPU and the other on the GPU.


@oldgithubman commented on GitHub (Mar 19, 2024):

I have a rig with three graphics cards that I would like to run three separate models on simultaneously and have them group chat


@alfi4000 commented on GitHub (Mar 23, 2024):

> I have a rig with three graphics cards that I would like to run three separate models on simultaneously and have them group chat

Try running this, editing the IP address 127.0.0.1 to your rig's IP address (or just leaving it as it is), and then add each IP address and port as a separate connection in your front end. For example, I used Open WebUI, added those 3 connections, and it managed one connection per chat window, so it should drive all 3 graphics cards, but only if you run it like this:

Linux (I tested on Ubuntu):

`OLLAMA_HOST=127.0.0.1:11435 ollama serve & OLLAMA_HOST=127.0.0.1:11436 ollama serve & OLLAMA_HOST=127.0.0.1:11437 ollama serve`

This is just an example; you can use different ports, but each connection drives only one GPU and one LLM, not several, otherwise the requests are handled one after another (first, then second, then third).

To stop it (Linux, tested on Ubuntu):

Command: `pgrep ollama`
Output:
1828
2883
1284
Command: `kill 1284 & kill 1828 & kill 2883`

If that doesn't work, try to kill each process manually:

`kill 1828`
`kill 2883`
`kill 1284`


@oldgithubman commented on GitHub (Mar 30, 2024):

That's what I'm currently doing (loosely), but you also have to map each instance to a specific GPU. It works, but it's very clunky to set up. A GUI would be nice.
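For what it's worth, the per-instance GPU mapping can be done with environment variables rather than a GUI; a minimal sketch for NVIDIA GPUs, assuming the same three-port scheme as the earlier comment, is to restrict each `ollama serve` process to a single device with `CUDA_VISIBLE_DEVICES`:

```
# Pin each Ollama instance to one GPU by limiting which device it can see.
CUDA_VISIBLE_DEVICES=0 OLLAMA_HOST=127.0.0.1:11435 ollama serve &
CUDA_VISIBLE_DEVICES=1 OLLAMA_HOST=127.0.0.1:11436 ollama serve &
CUDA_VISIBLE_DEVICES=2 OLLAMA_HOST=127.0.0.1:11437 ollama serve &
```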


@leporel commented on GitHub (Apr 6, 2024):

Run them in Docker, pinning the containers separately to gpu1, gpu2, or CPU only; open-webui can work with multiple ollama instances:

```
version: '3.8'

services:
  # GPU-backed Ollama instance (pinned to NVIDIA device 0).
  ollama:
    volumes:
      - type: bind
        source: C:\MyPrograms\ollama\data
        target: /root/.ollama
      - type: bind
        source: C:\MyPrograms\ollama\models
        target: /models
    container_name: ollama
    pull_policy: always
    tty: true
    restart: unless-stopped
    image: ollama/ollama:latest
    ports:
      - ${OPEN_WEBUI_PORT-11434}:11434
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ['0']
              capabilities:
                - gpu

  # CPU-only Ollama instance (no GPU reservation).
  ollama-cpu:
    volumes:
      - type: bind
        source: C:\MyPrograms\ollama\data
        target: /root/.ollama
      - type: bind
        source: C:\MyPrograms\ollama\models
        target: /models
    container_name: ollama-cpu
    pull_policy: always
    tty: true
    restart: unless-stopped
    image: ollama/ollama:latest
    ports:
      - ${OPEN_WEBUI_PORT2-11435}:11434

  # Open WebUI front end, pointed at the GPU-backed instance by default.
  open-webui:
    image: ghcr.io/open-webui/open-webui:latest
    container_name: open-webui
    volumes:
      - type: bind
        source: C:\MyPrograms\ollama\web
        target: /app/backend/data
    depends_on:
      - ollama
    ports:
      - ${OPEN_WEBUI_PORT-3000}:8080
    environment:
      - 'OLLAMA_BASE_URL=http://ollama:${OPEN_WEBUI_PORT-11434}'
      - 'WEBUI_SECRET_KEY='
    extra_hosts:
      - host.docker.internal:host-gateway
    restart: unless-stopped
```
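As a quick usage note for the compose file above (port numbers assume the defaults shown): start the stack, then verify both Ollama instances respond before adding the second one in Open WebUI's connection settings.

```
docker compose up -d

curl http://localhost:11434/api/tags   # GPU-backed instance
curl http://localhost:11435/api/tags   # CPU-only instance
```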

@oldgithubman commented on GitHub (Apr 6, 2024):

> Run them in Docker, pinning the containers separately to gpu1, gpu2, or CPU only; open-webui can work with multiple ollama instances: […]

No offense, but that's even clunkier. You don't need to use Docker in the first place.


@oldgithubman commented on GitHub (Apr 23, 2024):

Can we have control over which model is run on which GPU?


@dhiltgen commented on GitHub (Apr 23, 2024):

> Can we have control over which model is run on which GPU?

This is something we can look at adding incrementally as this feature matures. Feel free to file a new issue and capture how you'd like it to work.
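As a side note for later readers: newer Ollama releases added scheduler-level concurrency settings, so if your version supports them, something like the following keeps several models resident at once (the variable names assume the documented OLLAMA_MAX_LOADED_MODELS and OLLAMA_NUM_PARALLEL settings; check the release notes for your build):

```
# Allow up to 3 models loaded at once and 4 parallel requests per model,
# if the running Ollama version supports these settings.
OLLAMA_MAX_LOADED_MODELS=3 OLLAMA_NUM_PARALLEL=4 ollama serve
```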


@dougy83 commented on GitHub (Jun 9, 2024):

> Can we have control over which model is run on which GPU?

You can create a new CPU-only model name using the following (e.g. for the phi3 model): `ollama show phi3 --modelfile > phi3-cpuonly.modelfile`, editing that file to include `PARAMETER num_gpu 0` and updating the FROM section as it describes, and then running `ollama create -f phi3-cpuonly.modelfile phi3-cpuonly`.
You then just reference phi3-cpuonly, and it loads into system RAM. You can call the file and model whatever you want.
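Spelled out end to end, the procedure above looks roughly like this (phi3 is just an example model; the flag and parameter names assume a recent Ollama CLI, so double-check against `ollama show --help`):

```
# Export the existing Modelfile, force CPU-only inference by offloading
# zero layers to the GPU, and register the result under a new name.
ollama show phi3 --modelfile > phi3-cpuonly.modelfile

# Append the CPU-only parameter (and adjust the FROM line as the exported
# file's comments describe, if needed).
echo "PARAMETER num_gpu 0" >> phi3-cpuonly.modelfile

ollama create phi3-cpuonly -f phi3-cpuonly.modelfile
ollama run phi3-cpuonly   # loads into system RAM instead of VRAM
```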


Reference: github-starred/ollama#26968