[GH-ISSUE #1500] GPU MIG not supported in Kubernetes #47323

Closed
opened 2026-04-28 03:35:25 -05:00 by GiteaMirror · 17 comments

Originally created by @duhow on GitHub (Dec 13, 2023).
Original GitHub issue: https://github.com/ollama/ollama/issues/1500

Originally assigned to: @dhiltgen on GitHub.

https://github.com/jmorganca/ollama/blob/7db5bcf73bf7026970e988f56126db8f370f1b11/llm/llama.go#L238

Getting the GPU information (full-GPU memory) is not possible here: the call above returns `Insufficient Permissions`, because the container is assigned only a part of the GPU via MIG (Multi-Instance GPU).

However, the container can actually see the MIG devices, and `ollama` should be able to use them.

```
root@ollama-0:/# nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.108.03   Driver Version: 510.108.03   CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100 80G...  Off  | 00000000:05:00.0 Off |                   On |
| N/A   35C    P0    43W / 300W |                  N/A |     N/A      Default |
|                               |                      |              Enabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| MIG devices:                                                                |
+------------------+----------------------+-----------+-----------------------+
| GPU  GI  CI  MIG |         Memory-Usage |        Vol|         Shared        |
|      ID  ID  Dev |           BAR1-Usage | SM     Unc| CE  ENC  DEC  OFA  JPG|
|                  |                      |        ECC|                       |
|==================+======================+===========+=======================|
|  0    7   0   0  |      6MiB /  9728MiB | 14      0 |  1   0    0    0    0 |
|                  |      0MiB / 16383MiB |           |                       |
+------------------+----------------------+-----------+-----------------------+
|  0    8   0   1  |      6MiB /  9728MiB | 14      0 |  1   0    0    0    0 |
|                  |      0MiB / 16383MiB |           |                       |
+------------------+----------------------+-----------+-----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
```
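
For reference, a minimal sketch of what MIG-aware discovery might look like with NVIDIA's go-nvml bindings: if the parent device's memory query fails (as it does here), enumerate the MIG instances the container can see instead. The fallback policy is an illustrative assumption, not ollama's actual detection code:

```go
package main

import (
	"fmt"
	"log"

	"github.com/NVIDIA/go-nvml/pkg/nvml"
)

func main() {
	if ret := nvml.Init(); ret != nvml.SUCCESS {
		log.Fatalf("nvml init failed: %v", ret)
	}
	defer nvml.Shutdown()

	count, _ := nvml.DeviceGetCount()
	for i := 0; i < count; i++ {
		dev, _ := nvml.DeviceGetHandleByIndex(i)

		// On a MIG-partitioned GPU inside a container, the parent
		// device's memory query fails with NVML_ERROR_NO_PERMISSION.
		if mem, ret := dev.GetMemoryInfo(); ret == nvml.SUCCESS {
			fmt.Printf("GPU %d: %d MiB total\n", i, mem.Total/1024/1024)
			continue
		}

		// Fall back to enumerating the visible MIG instances.
		current, _, ret := dev.GetMigMode()
		if ret != nvml.SUCCESS || current != nvml.DEVICE_MIG_ENABLE {
			continue
		}
		maxMig, _ := dev.GetMaxMigDeviceCount()
		for m := 0; m < maxMig; m++ {
			mig, ret := dev.GetMigDeviceHandleByIndex(m)
			if ret != nvml.SUCCESS {
				continue
			}
			if mem, ret := mig.GetMemoryInfo(); ret == nvml.SUCCESS {
				fmt.Printf("GPU %d MIG %d: %d MiB total\n", i, m, mem.Total/1024/1024)
			}
		}
	}
}
```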
GiteaMirror added the nvidia and feature request labels 2026-04-28 03:35:26 -05:00

@duhow commented on GitHub (Jan 18, 2024):

Still not working in v0.1.20.

```
2024/01/18 10:33:28 routes.go:930: Listening on [::]:11434 (version 0.1.20)
2024/01/18 10:33:29 shim_ext_server.go:142: Dynamic LLM variants [cuda]
2024/01/18 10:33:29 gpu.go:88: Detecting GPU type
2024/01/18 10:33:29 gpu.go:203: Searching for GPU management library libnvidia-ml.so
2024/01/18 10:33:29 gpu.go:248: Discovered GPU libraries: [/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.510.108.03]
2024/01/18 10:33:29 gpu.go:94: Nvidia GPU detected
2024/01/18 10:33:29 gpu.go:125: error looking up CUDA GPU memory: device memory info lookup failure 0: 4
2024/01/18 10:33:29 routes.go:953: no GPU detected
```

(NVML return code 4 is `NVML_ERROR_NO_PERMISSION`, consistent with the MIG permission restriction described above.)

@dhiltgen commented on GitHub (Jan 27, 2024):

If I'm understanding correctly, in this environment we will not be able to use the management library to discover the available GPU memory. That's unfortunate, given that we really need to know that information before we try to load the model; otherwise we may over- (or under-) allocate VRAM and, in the over-allocation case, crash.

Similar to #1979 we might be able to refine the GPU discovery algorithm to allow you to specify how much memory we can use via an env var override, and then force the CUDA library to be used even though we couldn't perform the management library calls.
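
Something along the lines of this sketch could work. The variable name `OLLAMA_VRAM_OVERRIDE` is purely hypothetical, chosen for illustration, not a real ollama setting:

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strconv"
)

// usableVRAM implements the override idea: if the NVML memory lookup
// fails (as it does under MIG), let the user declare usable VRAM via
// an env var instead of disabling GPU support entirely.
func usableVRAM(nvmlBytes uint64, nvmlErr error) (uint64, error) {
	if nvmlErr == nil {
		return nvmlBytes, nil // normal path: trust NVML
	}
	if s := os.Getenv("OLLAMA_VRAM_OVERRIDE"); s != "" {
		mib, err := strconv.ParseUint(s, 10, 64)
		if err != nil {
			return 0, fmt.Errorf("bad OLLAMA_VRAM_OVERRIDE %q: %w", s, err)
		}
		return mib << 20, nil // value interpreted as MiB
	}
	return 0, nvmlErr // no override: surface the original failure
}

func main() {
	// Simulate the MIG case: NVML lookup failed with code 4 (NO_PERMISSION).
	vram, err := usableVRAM(0, errors.New("device memory info lookup failure 0: 4"))
	fmt.Println(vram, err)
}
```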


@waTeim commented on GitHub (Feb 6, 2024):

Say, maybe check out my PR: what testing beyond what I've done (if any) is needed?


@Defiler226 commented on GitHub (Feb 19, 2024):

Great work, this solved my issue with MIG on Kubernetes! Hope we can get this into the main branch.


@northcode7 commented on GitHub (Feb 26, 2024):

Solved the issue with MIG here as well. Works great on K8s. Please merge to main.


@dhiltgen commented on GitHub (Apr 15, 2024):

Once #3418 merges, we'll be relying solely on the cudart library (no more management library), so that will help move us toward resolving this feature request.


@dhiltgen commented on GitHub (May 21, 2024):

I'm curious whether the recent transition over to the Driver API has had any impact on MIG support. Could people who have MIG configurations try out the latest ollama image builds and report back whether they work properly, or whether we still need a rebased/refined version of #2264 (or equivalent) merged to enable this use case?


@dasantonym commented on GitHub (May 24, 2024):

Hey @dhiltgen, I can confirm MIG is now working for us with the latest image and the GPU is detected. Thanks a lot!


@dhiltgen commented on GitHub (May 25, 2024):

That's great to hear!


@jonasmock commented on GitHub (May 31, 2024):

@dasantonym I have a question regarding MIG. We have 2 x A100 80GB and plan to use the single MIG strategy with 14 x 10GB slices in our Kubernetes cluster.

I want to assign 2 of the slices to the Ollama pod. Is Ollama able to use both of the slices, or will it just use one of them?


@mhoehl05 commented on GitHub (Aug 1, 2024):

> @dasantonym I have a question regarding MIG. We have 2 x A100 80GB and plan to use the single MIG strategy with 14 x 10GB slices in our Kubernetes cluster.
>
> I want to assign 2 of the slices to the Ollama pod. Is Ollama able to use both of the slices, or will it just use one of them?

I've been testing the same setup with 1x H100 and 20GB slices for a proof of concept and am running into the same issue. Ollama only utilizes 1 of 3 passed MIGs:

![image](https://github.com/user-attachments/assets/b96b3364-97a2-4e8f-b6ae-773c68b714ac)


@dasantonym commented on GitHub (Aug 1, 2024):

Hey @mhoehl05 and @jonasmock, unfortunately I have no clue about this. Is this even supposed to be supported by the architecture? My instinct would be to just run multiple instances with one GPU (or slice) each, then load-balance between them.

🤷


@mhoehl05 commented on GitHub (Aug 1, 2024):

@dasantonym Using Kubernetes, you will not be able to run multiple ollama instances on one GPU, since you need to pass the GPU into the container, making it available only to that one container.

You can load multiple models on one ollama instance, but that kind of kills the purpose of Kubernetes. Slicing the GPU into MIGs via the mig-manager used by the nvidia-operator in Kubernetes would be a better solution: create dedicated ollama instances for each model and pass MIG slices of the GPU via your configured MIG strategy (you might even want to use the ollama-operator for that). That way you can precisely orchestrate how many resources are available to each model.

Of course, you can skip Kubernetes entirely and deploy a beefy VM that can hold many models. But that might cause trouble loading models: each model tries to reserve part of the video memory, and if you run out of resources, models keep getting loaded and unloaded, which hurts performance.

I think it's important to have control over your GPU's resources, and with Kubernetes you could dynamically allocate GPU resources to your models. But MIGs need to be supported by ollama for that to work.
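
For reference, requesting a MIG slice for a dedicated per-model instance looks roughly like this under the NVIDIA device plugin's mixed strategy. The pod name, image tag, and profile size below are examples and depend on how your cluster's GPUs are sliced:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: ollama-llama3
spec:
  containers:
    - name: ollama
      image: ollama/ollama
      resources:
        limits:
          # The mixed strategy exposes one resource per MIG profile;
          # with the single strategy this would be nvidia.com/gpu instead.
          nvidia.com/mig-2g.20gb: 1
```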


@dasantonym commented on GitHub (Aug 1, 2024):

Sorry, I guess this was badly phrased. I meant 1 Pod -> 1 Model / Instance -> 1 exclusive GPU or slice, then scale that up and load-balance between the Pods. MIG is working, so you should be fine. We're using it like this, at least it works for us.


@mhoehl05 commented on GitHub (Aug 2, 2024):

Oh yes, of course that's a valid solution. But since we have models that require different VRAM capacities, we would need to use the mixed strategy and slice up according to the models we use. That raises the problem that each model needs a "MIG tier" that can be used to scale. For instance, we might have 20GB slices and 50GB slices across the cluster, and a model like llama3.1:70b could only utilize the 50GB slices (or rather, only perform decently on those slices).

Also, since a GPU can only be sliced while not in use, you'd need to find a strategy that suits you best; otherwise you would have to drain a node and re-slice its GPU(s).

I think a more suitable approach would be to create multiple smaller slices that can be consumed by larger models, but are sufficient to run smaller models.


@waTeim commented on GitHub (Aug 2, 2024):

Check me if I'm wrong on this, but you can't just add up the VRAM (e.g. use 10 x 5G instances); each MIG instance must have at least as much VRAM as the layers to be loaded onto it require -- at least judging from the limited debugging messages I saw.
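
For intuition, here's a minimal sketch of that per-slice fit constraint. The greedy first-fit placement and all sizes are illustrative assumptions, not ollama's actual scheduler:

```go
package main

import "fmt"

// placeLayers greedily assigns fixed-size layers to MIG slices.
// A layer is never split across slices, so each slice must hold at
// least one whole layer -- adding more small slices does not help
// once a layer exceeds the per-slice capacity.
func placeLayers(layerSizeMiB, numLayers int, sliceCapMiB []int) (placed int) {
	free := append([]int(nil), sliceCapMiB...)
	for l := 0; l < numLayers; l++ {
		for i := range free {
			if free[i] >= layerSizeMiB {
				free[i] -= layerSizeMiB
				placed++
				break
			}
		}
	}
	return placed
}

func main() {
	tenSlices := []int{5120, 5120, 5120, 5120, 5120, 5120, 5120, 5120, 5120, 5120}
	// Hypothetical model: 80 layers at 500 MiB each (40 GiB total)
	// fits across ten 5 GiB slices, since each layer fits a slice.
	fmt.Println(placeLayers(500, 80, tenSlices)) // 80: all placed
	// But a single 6 GiB buffer cannot be placed on any 5 GiB slice,
	// no matter how many slices there are.
	fmt.Println(placeLayers(6144, 1, tenSlices)) // 0: does not fit
}
```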


@mhoehl05 commented on GitHub (Aug 5, 2024):

From my test you can see that I have been able to run llama3.1:70b on a single 20G MIG instance. Ollama estimates a requirement of 40GB of VRAM for the model (https://ollama.com/library/llama3.1:70b). When testing the model, it was really slow. I am not sure what the mechanism for running larger models with less VRAM is. The ollama home page tells us to use different tags for different quantizations, so I guess that's not it, but please correct me if I am wrong.

From what I've seen in previous tests, ollama does utilize multiple GPUs if needed. I am hoping to achieve the same behavior with MIGs.
