[GH-ISSUE #3645] keep_alive doesn't work for OpenAI API #64285

Closed
opened 2026-05-03 16:56:24 -05:00 by GiteaMirror · 2 comments

Originally created by @longcw on GitHub (Apr 15, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/3645

What is the issue?

keep_alive doesn't work for the OpenAI-compatible API. When I set keep_alive to 0 in a call to http://localhost:11434/v1/chat/completions, the model was not unloaded after the call.

What did you expect to see?

The model should be unloaded after the call when keep_alive is set to 0.

Steps to reproduce

curl http://localhost:11434/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "llama2",
        "keep_alive": 0,
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant."
            },
            {
                "role": "user",
                "content": "Hello!"
            }
        ]
    }'

The following call to the native /api/chat endpoint works correctly:

curl http://localhost:11434/api/chat \
    -H "Content-Type: application/json" \
    -d '{
        "model": "llama2",
        "keep_alive": 0,
        "stream": false,
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant."
            },
            {
                "role": "user",
                "content": "Hello!"
            }
        ]
    }'
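Until the OpenAI-compatible endpoint honors keep_alive, one possible workaround (a sketch, not an official fix) is to follow the /v1/chat/completions call with a native /api/generate request that carries only the model name and "keep_alive": 0, which the Ollama FAQ documents as a way to unload a model. The helper names unload_payload and unload_model, and the OLLAMA_URL variable, are hypothetical and exist only for this sketch.

```bash
# Sketch of a workaround: unload a model with a native Ollama request
# after using the OpenAI-compatible endpoint.
# OLLAMA_URL is a hypothetical variable for this sketch, not an Ollama setting.
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"

# Build the JSON body: the model name plus keep_alive 0 and no prompt.
# Assumes the model name contains no double quotes or backslashes.
unload_payload() {
  printf '{"model": "%s", "keep_alive": 0}' "$1"
}

# Send the unload request to the native /api/generate endpoint.
unload_model() {
  curl -s "$OLLAMA_URL/api/generate" \
    -H "Content-Type: application/json" \
    -d "$(unload_payload "$1")"
}
```

Usage: run unload_model llama2 right after the /v1/chat/completions request. Keeping the payload builder separate from the curl call makes the request body easy to inspect without a running server.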

Are there any recent changes that introduced the issue?

No response

OS

Linux

Architecture

amd64

Platform

No response

Ollama version

0.1.27

GPU

Nvidia

GPU info

No response

CPU

No response

Other software

No response

GiteaMirror added the bug label 2026-05-03 16:56:24 -05:00

@jmorganca commented on GitHub (Apr 15, 2024):

Thanks for the issue! Merging with #2963.


@aaricantto commented on GitHub (Jun 10, 2024):

I'm getting this issue as well: memory only frees up when I run wsl --shutdown.

To elaborate: when I send a generate request with "keep_alive": 0, I can then query http://localhost:11434/api/ps and see an empty output:

{
"models": []
}
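The empty list above is what the /api/ps endpoint returns when no model is loaded. A minimal way to script that check with only a shell and curl (no JSON parser) could look like the hypothetical ps_is_empty helper below; it is a naive string match on the compacted response, not real JSON parsing.

```bash
# Return success (0) if an /api/ps response lists no loaded models.
# Naive check: strip all whitespace, then look for "models":[] in the text.
ps_is_empty() {
  _compact=$(printf '%s' "$1" | tr -d ' \t\n\r')
  case "$_compact" in
    *'"models":[]'*) return 0 ;;
    *) return 1 ;;
  esac
}
```

Usage: ps_is_empty "$(curl -s http://localhost:11434/api/ps)" && echo "no models loaded".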

But running "top" in my WSL shell shows that the ollama process is indeed consuming a significant amount of memory!

217 ollama    20   0   68.6g 151152  12412 S   0.0   0.5   0:09.13 ollama
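When reading a top line like the one above, it helps to separate the VIRT column (virtual address space mapped by the process) from RES (memory actually resident in RAM). The sketch below assumes top's default column order (PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND) and its default KiB units, with an optional m/g/t suffix when a value is scaled; the helper names to_mib and top_virt_res are hypothetical.

```bash
# Convert one memory field from top (KiB by default, or with an
# m/g/t suffix) to MiB, printed with one decimal place.
to_mib() {
  printf '%s\n' "$1" | awk '{
    v = $1; u = substr(v, length(v))
    if (u == "m") mib = v + 0            # already MiB
    else if (u == "g") mib = v * 1024    # GiB -> MiB
    else if (u == "t") mib = v * 1024 * 1024
    else mib = v / 1024                  # plain number: KiB -> MiB
    printf "%.1f\n", mib
  }'
}

# Print VIRT and RES in MiB from one line of top's default process listing.
top_virt_res() {
  set -- $1   # word-split the line into positional fields
  printf 'VIRT=%s MiB RES=%s MiB\n' "$(to_mib "$5")" "$(to_mib "$6")"
}
```

For the line above, top_virt_res prints VIRT=70246.4 MiB RES=147.6 MiB, so the 68.6g figure is virtual address space while the resident set is about 148 MiB.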
Reference: github-starred/ollama#64285