[GH-ISSUE #1600] Is there any option to unload a model from memory? #62922

Closed
opened 2026-05-03 10:49:38 -05:00 by GiteaMirror · 25 comments

Originally created by @DanielMazurkiewicz on GitHub (Dec 19, 2023).
Original GitHub issue: https://github.com/ollama/ollama/issues/1600

As in the title: I want to unload a model. Is there any option for it?


@zach030 commented on GitHub (Dec 19, 2023):

No support yet; you can only shut down the server to close the model.


@technovangelist commented on GitHub (Dec 19, 2023):

The model gets automatically unloaded after 5 minutes. It sounds like you want it unloaded in less time than that? Or are you saying it's taking longer than 5 minutes?


@igorschlum commented on GitHub (Dec 19, 2023):

I think it could be interesting to have a parameter for the 5 minutes. Sometimes you want less or more, depending on how you use Ollama.


@mattbisme commented on GitHub (Dec 20, 2023):

There are some use cases where it would be nice to have it unload almost immediately (single-turn Siri request). But other times where it would be better to have it loaded indefinitely (a chatbot that is prompted occasionally where a fast response provides a better experience).

Being able to configure this parameter at model runtime (especially via API) would be great!


@chymian commented on GitHub (Dec 27, 2023):

+1 for the API- and Modelfile-tunable unload parameter, especially a keep-infinite option.


@dennisorlando commented on GitHub (Jan 6, 2024):

Something as simple as `ollama load` and `ollama unload` would suffice.


@olekse commented on GitHub (Jan 6, 2024):

^^^
Edit: btw, for me it always stays in RAM (Windows 10/WSL with https://github.com/ollama-webui/ollama-webui)


@pdevine commented on GitHub (Jan 28, 2024):

#2146 adds this, which will be available in `0.1.23`. Going to go ahead and close this. You can set `keep_alive` to `-1` when calling the chat API and it will leave the model loaded in memory.
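
For reference, a minimal sketch of such a request (the model name is just a placeholder):

```shell
# Keep the model resident indefinitely after this request (keep_alive: -1)
curl http://localhost:11434/api/chat -d '{
  "model": "llama2",
  "messages": [{"role": "user", "content": "hello"}],
  "keep_alive": -1
}'
```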


@nathanleclaire commented on GitHub (Feb 7, 2024):

@pdevine For what it's worth, I would still like the ability to manually evict a model from VRAM through an API and CLI command. The keepalive functionality is nice, but on my Linux box (will have to double-check later to make sure it's the latest version, but installed very recently) after a chat session the model just sits there in VRAM and I have to restart ollama to get it out if something else wants to use the GPU. Which, of course, is a shame because I would also like to `ollama pull` things at the same time :)


@pdevine commented on GitHub (Feb 7, 2024):

@nathanleclaire I've been thinking about adding an `OLLAMA_KEEP_ALIVE` env variable to be able to change the default timeout. I don't want to go too extreme here though, because ideally there would be much richer controls (e.g. access controls/policies/multiple models/etc.)
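
For illustration only, assuming such a variable were added, it would presumably be read by the server at startup, something like:

```shell
# Hypothetical at the time of this comment: raise the default unload timeout to 24 hours
OLLAMA_KEEP_ALIVE=24h ollama serve
```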


@OliChase404 commented on GitHub (Feb 21, 2024):

This might help, from the faq.md file:

How do I keep a model loaded in memory or make it unload immediately?

By default, models are kept in memory for 5 minutes before being unloaded. This allows for quicker response times if you are making numerous requests to the LLM. You may, however, want to free up the memory before the 5 minutes have elapsed or keep the model loaded indefinitely. Use the `keep_alive` parameter with either the `/api/generate` or `/api/chat` API endpoint to control how long the model is left in memory.

The `keep_alive` parameter can be set to:

  • a duration string (such as "10m" or "24h")
  • a number in seconds (such as 3600)
  • any negative number, which will keep the model loaded in memory (e.g. -1 or "-1m")
  • '0', which will unload the model immediately after generating a response

For example, to preload a model and leave it in memory, use:

```shell
curl http://localhost:11434/api/generate -d '{"model": "llama2", "keep_alive": -1}'
```

To unload the model and free up memory, use:

```shell
curl http://localhost:11434/api/generate -d '{"model": "llama2", "keep_alive": 0}'
```

@OliChase404 commented on GitHub (Feb 21, 2024):

So to immediately unload a model and free your VRAM, run:

```shell
curl http://localhost:11434/api/generate -d '{"model": "MODELNAME", "keep_alive": 0}'
```

Replace MODELNAME with the name of the model currently loaded.


@olekse commented on GitHub (Mar 18, 2024):

What about removing the model from the RAM? It still stays in RAM when VRAM is cleared.


@pdevine commented on GitHub (Mar 18, 2024):

@olekse it's the same.


@wenhui01 commented on GitHub (Apr 15, 2024):

This parameter doesn't work with embedding models.


@Divelix commented on GitHub (May 5, 2024):

A CLI command to immediately unload a model from memory is a must-have for me. Most of the time 5 minutes is fine, but occasionally I have to shut it down right away to use the full VRAM for something else. Please implement what @dennisorlando suggested above.


@VfBfoerst commented on GitHub (May 8, 2024):

> curl http://localhost:11434/api/generate -d '{"model": "MODELNAME", "keep_alive": 0}'

worked for me :)


@davidearlyoung commented on GitHub (Jun 2, 2024):

Since the terminal command `ollama ps` is a thing now, as well as loading multiple models, I feel this should be revisited: reconsider adding the ability to target an already loaded model for unloading from the command line, rather than sending a client request with the `keep_alive` parameter set to 0 whenever I want to evict a particular model from VRAM/RAM. The `keep_alive` method of targeting a single model for unloading is a bit cumbersome, especially from a server admin's perspective.


@eav-solution commented on GitHub (Jun 19, 2024):

Vote


@Dement242 commented on GitHub (Aug 16, 2024):

Here is a script that will list all running models on the given IPs (edit the script to use your own IP addresses) and unload the model you select.

```bash
#!/bin/bash

# Define the IP addresses
ips=("192.168.1.40" "192.168.1.42")

# Function to fetch models from a given IP address
fetch_models() {
    local ip=$1
    json_response=$(curl -s http://$ip:11434/api/ps)
    echo "$json_response" | jq -r --arg ip "$ip" '.models[] | "\($ip) \(.model)"'
}

# Step 1: Fetch models from all IP addresses and display as a numbered list
declare -a models_list
index=1
echo "Available models:"
for ip in "${ips[@]}"; do
    echo "Server $ip:"
    models=$(fetch_models $ip)
    IFS=$'\n' read -rd '' -a models_array <<< "$models"
    for model in "${models_array[@]}"; do
        models_list+=("$model")
        echo "  $index) ${model#* }"
        ((index++))
    done
done

# Step 2: Let the user select a model by number
echo -e "\nEnter the model number (or 0 to exit):"
read model_num
[[ $model_num -eq 0 ]] && exit 0

# Step 3: Get the selected IP and model
selected_entry="${models_list[$((model_num-1))]}"
selected_ip="${selected_entry%% *}"
selected_model="${selected_entry#* }"

# Step 4: Run the final curl command with the selected IP and model
curl http://$selected_ip:11434/api/generate -d "{\"model\": \"$selected_model\", \"keep_alive\": 0}"
echo
```

@MoreColors123 commented on GitHub (Aug 25, 2024):

I was battling with full VRAM while using Ollama and FLUX simultaneously. Here is a version for anyone looking for a solution for Windows. It was written with ChatGPT, so I don't know how, but it works. Just adjust the IP that Ollama is running on in the first line. Below I also included a .bat script to execute this .ps1.

Put this in a file called unload.ps1:

```powershell
# Define the IP addresses
$ips = @("127.0.0.1")

# Function to fetch models from a given IP address
function Fetch-Models {
    param (
        [string]$ip
    )
    $json_response = Invoke-RestMethod -Uri "http://${ip}:11434/api/ps"
    $models = $json_response.models | ForEach-Object {
        "$ip $($_.model)"
    }
    return $models
}

# Step 1: Fetch models from all IP addresses and display as a numbered list
$models_list = @()
$index = 1
Write-Host "Available models:"
foreach ($ip in $ips) {
    Write-Host "Server ${ip}:"
    $models = Fetch-Models -ip $ip
    foreach ($model in $models) {
        $models_list += $model
        Write-Host "  $index) $(($model -split ' ')[1])"
        $index++
    }
}

# Step 2: Let the user select a model by number
$model_num = Read-Host "`nEnter the model number (or 0 to exit)"
if ($model_num -eq 0) {
    exit
}

# Step 3: Get the selected IP and model
$selected_entry = $models_list[$model_num - 1].Trim()
$selected_ip = ($selected_entry -split ' ')[0].Trim()
$selected_model = ($selected_entry -split ' ')[1].Trim()

# Debug output
Write-Host "Selected IP: $selected_ip"
Write-Host "Selected Model: $selected_model"

# Step 4: Run the final Invoke-RestMethod command with the selected IP and model
$body = @{ model = $selected_model; keep_alive = 0 } | ConvertTo-Json
Invoke-RestMethod -Uri "http://${selected_ip}:11434/api/generate" -Method Post -Body $body -ContentType "application/json"
Write-Host "Model Unloaded"
```

Put this in a .bat file:

```bat
@echo off
PowerShell -NoProfile -ExecutionPolicy Bypass -File "unload.ps1"
pause
```

Then double-click the .bat file to choose the Ollama model to unload.


@pdevine commented on GitHub (Nov 17, 2024):

To unload a model, just use `ollama stop <model>`. To keep it in memory you can use `ollama run --keepalive -1s <model>`.
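
A quick sketch of that workflow (llama3 is just a placeholder model name):

```shell
ollama ps                            # list models currently loaded in memory
ollama stop llama3                   # unload the model immediately
ollama run --keepalive -1s llama3    # load the model and keep it resident indefinitely
```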


@somera commented on GitHub (Feb 27, 2025):

Instead of

```bash
for ip in "${ips[@]}"; do
    echo "Server $ip:"
    models=$(fetch_models $ip)
    IFS=$'\n' read -rd '' -a models_array <<< "$models"
    for model in "${models_array[@]}"; do
        models_list+=("$model")
        echo "  $index) ${model#* }"
        ((index++))
    done
done
```

it works for me with:

```bash
for ip in "${ips[@]}"; do
    echo "Server $ip:"
    models=$(fetch_models "$ip")

    # Read the models into an array
    mapfile -t models_array <<< "$models"

    for model in "${models_array[@]}"; do
        models_list+=("$model")
        echo "  $index) ${model#* }"
        ((index++))
    done
done
```

@mbylstra commented on GitHub (Mar 12, 2025):

> To unload a model, just use `ollama stop <model>`. To keep it in memory you can use `ollama run --keepalive -1s <model>`

I don't think that's possible from the REST API (or the Python client). It'd be great if there was a straightforward endpoint to immediately unload a model. As I understand it, it is possible with `/api/generate -d '{"model": "MODELNAME", "keep_alive": 0}'`, but as mentioned previously it is a cumbersome and counterintuitive way to do it. I don't think it'd hurt to add a specific endpoint for this.

A common use case is sharing a GPU between image generation and text generation (and needing to unload from memory when switching from text generation to image generation if the GPU does not have enough VRAM for both).


@dstadulis commented on GitHub (Sep 12, 2025):

Offered is a solution which:

  • Creates a JSON object from each line of `ollama ps` output
  • Selects the model name
  • Passes these names as inputs to a GNU parallel command
    • Which sets ollama's `keepalive` value to 0, effectively purging the model from memory

```zsh
# Unload all models listed by ollama ps -- by setting keepalive value to 0
parallel "ollama run --keepalive 0s {}" ::: $(ollama ps | awk '
  NR == 1 {
    split($0, headers)
  }
  NR > 1 {
    obj = "{"
    for (i=1; i<=NF; i++) {
      gsub(/[[:space:]]+$/, "", $i) # Trim trailing spaces
      if (i < NF) {
        obj = sprintf("%s\"%s\": \"%s\", ", obj, headers[i], $i)
      } else {
        obj = sprintf("%s\"%s\": \"%s\"}", obj, headers[i], $i)
      }
    }
    print obj
  }
' | jq -r '.NAME')
```