[GH-ISSUE #7686] Swap Disk Safeguard #51418

Closed
opened 2026-04-28 19:58:14 -05:00 by GiteaMirror · 13 comments

Originally created by @unclemusclez on GitHub (Nov 15, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/7686

What is the issue?

If Ollama and a model are bound as a startup process, there is a potential for Ollama to use swap memory on start and cause an incredibly slow system or a system hang.

If you compile Ollama with CPU capabilities, and the GPU driver does not load for some reason or was uninstalled without Ollama's startup process being disabled, AND you have a SWAP DISK that is large enough to hold a model that is bound to a startup process, Ollama will automatically load the model into the swap disk, on CPU.

I think the only remedy would be to physically go to the server and reboot into safe mode, or revert to a previous snapshot. This could be system-breaking.
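
For reference, a quick way to confirm that a sluggish host is actually swapping, assuming you can still reach a shell:

```console
$ free -h         # compare used RAM against used swap
$ swapon --show   # list active swap devices and how full they are
$ vmstat 1 5      # non-zero si/so columns mean pages are moving to/from swap
```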

OS

Linux

GPU

Nvidia, AMD

CPU

Intel, AMD

Ollama version

No response

GiteaMirror added the bug label 2026-04-28 19:58:14 -05:00

@rick-github commented on GitHub (Nov 15, 2024):

More of a sysadmin issue than an ollama issue. There are multiple ways to manage resources on a Linux system, e.g. add swap, or modify the ollama service file and add resource limits:

```
ExecStart=bash -c 'exec prlimit --data=$[500 * 1024 * 1024] /usr/local/bin/ollama serve'
```

Windows systems can use [process-governor](https://github.com/lowleveldesign/process-governor); I'm sure similar utilities are available for macOS.
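
A minimal sketch of applying that override as a systemd drop-in (`sudo systemctl edit ollama`), assuming the service is named `ollama.service`; note that `ExecStart=` must be cleared before it can be replaced:

```
[Service]
ExecStart=
ExecStart=bash -c 'exec prlimit --data=$[500 * 1024 * 1024] /usr/local/bin/ollama serve'
```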


@unclemusclez commented on GitHub (Nov 15, 2024):

> More of a sysadmin issue than an ollama issue. There are multiple ways to manage resources on a Linux system, e.g. add swap, or modify the ollama service file and add resource limits:
>
> ```
> ExecStart=bash -c 'exec prlimit --data=$[500 * 1024 * 1024] /usr/local/bin/ollama serve'
> ```
>
> Windows systems can use [process-governor](https://github.com/lowleveldesign/process-governor); I'm sure similar utilities are available for macOS.

I propose a `--swap` flag to enable swap memory utilization. This isn't the same thing as shared memory. For the overwhelming majority of users, I highly doubt swap memory is being utilized for model hosting.

The reason I came across this was that I had to reinstall the ROCm drivers. The systemctl service I had created triggered the model load. Because I was trying to remove the old driver and install fresh drivers on a rebooted system, the systemctl process launched without being able to use ROCm, causing a serious system hang.

I agree this is a very specific situation, but I was lucky I knew what to look for.

There is no real way to SSH into a system that is running this slowly unless you let the entire model load first.

And yes, I am familiar with this behavior only because of my personal system specifications. If I didn't know that the reason my system was hanging was Ollama trying to load 32GB onto a swap disk with the CPU, I would probably think my machine was melting.


@rick-github commented on GitHub (Nov 15, 2024):

Perhaps [MemorySwapMax](https://www.freedesktop.org/software/systemd/man/latest/systemd.resource-control.html#MemorySwapMax=bytes) might be better for your use case.
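
A minimal sketch of that approach as a systemd drop-in (`sudo systemctl edit ollama`); `MemorySwapMax=0` denies the unit any swap at all (cgroup v2 required), while a byte value caps it instead:

```
[Service]
MemorySwapMax=0
```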


@unclemusclez commented on GitHub (Nov 15, 2024):

> Perhaps [MemorySwapMax](https://www.freedesktop.org/software/systemd/man/latest/systemd.resource-control.html#MemorySwapMax=bytes) might be better for your use case.

I think you are misunderstanding the severity. My machine is working as expected.

If, for any reason, a GPU driver that is linked to the runner becomes unavailable, Ollama will automatically use available swap memory if it is large enough to hold the model (and if dedicated system memory isn't enough). Ollama will load a model with the CPU from your SSD/HDD into **ANOTHER** SSD/HDD. This can potentially be a very long and undesired process.

Ollama, for most purposes, should never utilize swap memory.

For example:

- My GPU is 32GB
- My system memory is 16GB
- My swap memory is 140GB

This is a very unique case, but it exposes the CPU functionality of Ollama when resources are mismanaged. This can happen due to a simple `apt update` given the right conditions.


@rick-github commented on GitHub (Nov 15, 2024):

If you limit the data segment, swap will not grow larger than that size. This will prevent the "very long and undesired process".

```console
$ systemctl cat ollama | grep "^ExecStart"
ExecStart=bash -c 'exec prlimit --data=$[500 * 1024 * 1024] /usr/local/bin/ollama serve'
$ ollama list dolphin-2.7-mixtral-8x7b:f16
NAME                            ID              SIZE     MODIFIED
dolphin-2.7-mixtral-8x7b:f16    f8df7041a6a8    93 GB    8 months ago
$ echo 3 | sudo tee /proc/sys/vm/drop_caches
3
$ time curl localhost:11343/api/generate -d '{"model":"dolphin-2.7-mixtral-8x7b:f16","options":{"num_gpu":0}}'
{"error":"llama runner process has terminated: error loading model: unable to allocate backend buffer\nllama_load_model_from_file: failed to load model"}
real	0m0.421s
user	0m0.005s
sys	0m0.017s
$ time curl localhost:11343/api/generate -d '{"model":"qwen2.5:14b","options":{"num_gpu":0}}'
{"error":"llama runner process has terminated: error loading model: unable to allocate backend buffer\nllama_load_model_from_file: failed to load model"}
real	0m0.679s
user	0m0.008s
sys	0m0.005s
$ time curl localhost:11343/api/generate -d '{"model":"qwen2.5:14b"}'
{"model":"qwen2.5:14b","created_at":"2024-11-15T16:47:49.760112712Z","response":"","done":true,"done_reason":"load"}
real	1m19.054s
user	0m0.006s
sys	0m0.009s
```

There are solutions to your problem that don't require an extra flag that will very rarely be used.


@unclemusclez commented on GitHub (Nov 15, 2024):

None of these commands would even be possible if the system is halted.


@rick-github commented on GitHub (Nov 15, 2024):

Neither would `ollama serve --no-swap`. That's why this is a sysadmin issue, not an ollama issue.


@unclemusclez commented on GitHub (Nov 15, 2024):

> Neither would `ollama serve --no-swap`. That's why this is a sysadmin issue, not an ollama issue.

No, `--swap` would be to enable swap partitions for use. It should be disabled by default.

On top of this, yes, it would get triggered during a systemctl call; that's how the system gets hung up in the first place. If ollama is run with a systemctl script, it will run essentially exactly how you just stated it wouldn't.

If there is a secondary process in which a model is activated on start, whether by API or by direct client commands/scripts, this will effectively halt the system.


@rick-github commented on GitHub (Nov 15, 2024):

I think the premise that swap should be off by default is where we are not seeing eye-to-eye. Many ollama users like to experiment with models that do not fit in a combination of VRAM and RAM, which is why ollama considers free swap when it's computing memory requirements for loading a model.

In the case where a system absolutely must not use swap, options have been presented. So it then comes down to configuration. In the swap-off-by-default case, anybody who wants to experiment with large models needs to modify their configuration to add `--swap`. In the swap-on-by-default case, anybody who wants to limit swap needs to modify their configuration to add `prlimit`. So it's a sysadmin issue.

If ollama has been configured in the swap-on-by-default case with `prlimit` and a secondary process activates a model on start, the system will not be halted.


@unclemusclez commented on GitHub (Nov 15, 2024):

> I think the premise that swap should be off by default is where we are not seeing eye-to-eye. Many ollama users like to experiment with models that do not fit in a combination of VRAM and RAM, which is why ollama considers free swap when it's computing memory requirements for loading a model.
>
> In the case where a system absolutely must not use swap, options have been presented. So it then comes down to configuration. In the swap-off-by-default case, anybody who wants to experiment with large models needs to modify their configuration to add `--swap`. In the swap-on-by-default case, anybody who wants to limit swap needs to modify their configuration to add `prlimit`. So it's a sysadmin issue.
>
> If ollama has been configured in the swap-on-by-default case with `prlimit` and a secondary process activates a model on start, the system will not be halted.

I'm not married to the idea of a flag; however, I absolutely triggered this scenario. It would be a rare case, but it is definitely possible. I did it.

Luckily I knew what to look for. However, I can imagine someone not realizing what they did, ending up having to reinstall the operating system, and potentially triggering it again because they never understood what the issue was in the first place.

Just to reiterate: if Ollama and the model being loaded are startup processes (with a `systemctl` service script, for example) + **there is not enough system RAM** + **the GPU driver does not load** + **the swap memory is sufficient**, then **the swap memory will be utilized**, potentially **causing a system hang until the model is fully loaded**.

Fully loading the model could take a considerable amount of time depending on model size and swap disk speed. For example, my system has a low-RPM drive loading 27GB. This takes about 20-40 minutes.

Because memory is maxed out during this window, the system will most likely be inaccessible until the model is unloaded from system memory, if the system ever gets the chance to do that. This means you can't SSH into the machine or kill a process. The machine will be too slow for practical use.


@rick-github commented on GitHub (Nov 16, 2024):

I'm not anti-flag; my position is that there are already mechanisms available that can protect a system from this event. If you would like to add additional mechanisms, file a PR and let the developers choose whether to integrate it. The typical configuration method in ollama is via [environment variables](https://github.com/ollama/ollama/blob/d875e99e4639dc07af90b2e3ea0d175e2e692efb/envconfig/config.go#L235).
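
For reference, a minimal sketch of setting such a variable for the service via a systemd drop-in (`sudo systemctl edit ollama`); the existing `OLLAMA_MAX_LOADED_MODELS` variable is used purely to illustrate the pattern:

```
[Service]
Environment="OLLAMA_MAX_LOADED_MODELS=1"
```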


@unclemusclez commented on GitHub (May 21, 2025):

I think this actually existed as a flag for llama.cpp:
`--mlock    force system to keep model in RAM rather than swapping or compressing`

I believe if there is not enough memory when using this flag, it just won't start.
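
For anyone testing this with llama.cpp directly, a minimal sketch, assuming a recent build with the `llama-cli` binary on the PATH and a local GGUF file:

```console
$ llama-cli -m ./model.gguf --mlock -p "hello"
```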

best wishes!


@rick-github commented on GitHub (May 21, 2025):

`mlock` is not currently supported. Even when it was, it was advisory - if the server failed to mlock the entirety of the model, it would emit a warning to the log and continue.
