[GH-ISSUE #2508] OLLAMA_KEEP_ALIVE ENV feature #63505

Closed
opened 2026-05-03 13:53:10 -05:00 by GiteaMirror · 2 comments
Owner

Originally created by @uxfion on GitHub (Feb 15, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/2508

Does anyone know how to set `keep_alive` in the OpenAI-compatible API? It seems this feature is not supported there.

It would be better if we could set `OLLAMA_KEEP_ALIVE` as an environment variable, since the `/v1/chat/completions` endpoint makes it difficult to pass custom parameters.

https://github.com/ollama/ollama/pull/2146#issue-2094810743
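The workaround discussed in this thread relies on the native `/api/generate` endpoint, which does accept a `keep_alive` field (unlike `/v1/chat/completions`). A minimal sketch of building such a request body, assuming the default Ollama address `http://localhost:11434` (the helper names here are hypothetical, not part of Ollama):

```python
import json

# Default Ollama server address (assumption; adjust for your setup).
OLLAMA_GENERATE_URL = "http://localhost:11434/api/generate"


def build_keep_alive_payload(model: str, keep_alive: str = "5m") -> dict:
    """Build the JSON body for a prompt-less /api/generate call.

    Sending only "model" (no prompt) loads the model into memory;
    "keep_alive" is a duration string such as "5m" or "1h". A negative
    value asks the server to keep the model loaded past the default timeout.
    """
    return {"model": model, "keep_alive": keep_alive}


def keep_alive_request_body(model: str, keep_alive: str = "5m") -> bytes:
    """Serialized request body, ready to POST with any HTTP client."""
    return json.dumps(build_keep_alive_payload(model, keep_alive)).encode("utf-8")
```

The resulting body can be POSTed with `requests.post(OLLAMA_GENERATE_URL, data=keep_alive_request_body("llama2"))` or the equivalent `curl` call shown in the comments below.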

GiteaMirror added the feature request label 2026-05-03 13:53:10 -05:00

@jukofyork commented on GitHub (Feb 16, 2024):

Not sure if it helps, but I've been keeping it alive by sending this every 4.5 minutes:

> If an empty prompt is provided, the model will be loaded into memory.

```
curl http://localhost:11434/api/generate -d '{
  "model": "llama2"
}'
```

From: https://github.com/ollama/ollama/blob/main/docs/api.md


@uxfion commented on GitHub (Feb 16, 2024):

I also wrote a script to keep it alive, but it's still a bit clunky. **We urgently need an intelligent scheduling system.**

```python
import requests
import time
from datetime import datetime
import argparse


def get_current_time_str():
    return datetime.now().strftime("%Y-%m-%d %H:%M:%S")


def call_api(model):
    url = "http://127.0.0.1:11434/api/generate"
    headers = {"Content-Type": "application/json"}
    payload = {"model": model, "keep_alive": "-3m"}

    try:
        start_time = datetime.now()
        print(f"\n\n[{start_time}] Trying to call the API...")
        response = requests.post(url, json=payload, headers=headers, timeout=60)
        end_time = datetime.now()
        duration = (end_time - start_time).total_seconds()

        current_time = get_current_time_str()
        if response.status_code == 200:
            print(f"[{current_time}] API call successful. Duration: {duration} seconds")
            print(response.text)
        else:
            print(
                f"[{current_time}] API call failed with status code: {response.status_code}. Duration: {duration} seconds"
            )
    except Exception as e:
        # Compute the duration here too: if the request itself raised,
        # the assignment inside the try block never ran.
        duration = (datetime.now() - start_time).total_seconds()
        current_time = get_current_time_str()
        print(f"[{current_time}] An error occurred: {e}. Duration: {duration} seconds")


def main():
    parser = argparse.ArgumentParser(description="Call API with a model parameter")
    parser.add_argument("model", type=str, help="Model name to call API with")
    args = parser.parse_args()

    interval = 270  # 4 minutes and 30 seconds in seconds
    while True:
        call_api(args.model)
        time.sleep(interval)


if __name__ == "__main__":
    main()
```
Run with `python keep_alive.py llama2`.
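The `keep_alive` field in the script above is a duration string ("5m", "1h30m", "-3m"). A small parser sketch for such strings, limited to the hour/minute/second units used in this thread (the function is hypothetical, for illustration only, and not part of Ollama):

```python
import re

# Seconds per unit for the h/m/s duration suffixes (simplifying assumption:
# the full Go-style syntax also allows ms/us/ns, which is omitted here).
_UNITS = {"h": 3600, "m": 60, "s": 1}


def parse_keep_alive(value: str) -> float:
    """Parse a duration string like "4m30s" or "-3m" into seconds.

    A leading "-" makes the result negative, matching the convention in the
    script above where a negative keep_alive extends how long the model stays
    loaded.
    """
    value = value.strip()
    sign = -1 if value.startswith("-") else 1
    body = value.lstrip("+-")
    parts = re.findall(r"(\d+(?:\.\d+)?)([hms])", body)
    # Reject strings with leftover characters the regex did not consume.
    if not parts or "".join(n + u for n, u in parts) != body:
        raise ValueError(f"unrecognized duration: {value!r}")
    return sign * sum(float(n) * _UNITS[u] for n, u in parts)
```

For example, `parse_keep_alive("4m30s")` yields the same 270 seconds used as the polling interval in the script.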

Reference: github-starred/ollama#63505