[GH-ISSUE #11002] Add Environment Variable to Override API Parameter keep_alive Value #7254

Closed
opened 2026-04-12 19:18:01 -05:00 by GiteaMirror · 5 comments
Owner

Originally created by @kkrick-sdsu on GitHub (Jun 6, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/11002

Add Environment Variable to Override API Parameter keep_alive Value

This is a feature request to add a new environment variable to Ollama, "OLLAMA_KEEP_ALIVE_OVERRIDE".

This proposed environment variable would allow the value of the existing environment variable "OLLAMA_KEEP_ALIVE" to override the API parameter "keep_alive", falling back to the default value from envconfig.KeepAlive() when OLLAMA_KEEP_ALIVE is not set.

Explain the problem you are trying to solve, not what you are trying to do

I want a way to prevent individual users from changing the keep alive duration with their API calls. When I set the environment variable OLLAMA_KEEP_ALIVE, e.g. to -1 to keep models loaded 'forever', I want that setting to be honored over user API requests. Without such a feature, models can be unloaded inadvertently when they are expected to be always available.

I am trying to solve this problem in a multi-user environment where multiple users have API access, both directly via API requests and via UI layers calling the API on their behalf. Some users provide the keep_alive API parameter in their Ollama API requests. Some UI tools supply the keep_alive API parameter by default, using the default value from Ollama's API docs.

I would prefer to have control over this functionality directly in Ollama as opposed to educating users and encouraging changes for separate UI providers.

Explain why the change is important

Providing administrators with the ability to enforce a consistent keep alive policy across all API usage is essential for maintaining high availability of specific models in shared environments. This is particularly important in production contexts where predictability and stability are priorities.

Explain how the change will be used

Administrators could set a new environment variable, "OLLAMA_KEEP_ALIVE_OVERRIDE". It would default to false to avoid breaking changes for existing deployments.

When the proposed OLLAMA_KEEP_ALIVE_OVERRIDE is set to true:

  • If OLLAMA_KEEP_ALIVE is set, then the value for OLLAMA_KEEP_ALIVE overrides any keep_alive value included in an API request.
  • Else if OLLAMA_KEEP_ALIVE is not set, then the default value from envconfig.KeepAlive() would override any keep_alive value provided.

Explain how the change will be tested

A test would be added to envconfig/config_test.go similar to envconfig.TestKeepAlive().

A manual test procedure would look something like this:

1. Set OLLAMA_KEEP_ALIVE to -1:

   ```bash
   export OLLAMA_KEEP_ALIVE=-1
   ```

2. Set OLLAMA_KEEP_ALIVE_OVERRIDE to true:

   ```bash
   export OLLAMA_KEEP_ALIVE_OVERRIDE=true
   ```

3. Run Ollama:

   ```bash
   ollama serve
   ```

4. Load a small model, e.g. gemma3:1b:

   ```bash
   ollama run gemma3:1b
   ```

5. Verify the model is loaded 'Forever':

   ```bash
   ollama ps
   ```

6. Run the following request with the keep_alive parameter:

   ```bash
   curl http://localhost:11434/api/generate -d '{
     "model": "gemma3:1b",
     "prompt": "Why is the sky blue?",
     "keep_alive": 0
   }'
   ```

7. Verify the model is still loaded 'Forever':

   ```bash
   ollama ps
   ```
GiteaMirror added the feature request label 2026-04-12 19:18:01 -05:00

@rick-github commented on GitHub (Jun 6, 2025):

You could have the clients use the OpenAI compatibility endpoint. That doesn't allow setting keep_alive.


@kkrick-sdsu commented on GitHub (Jun 9, 2025):

Yeah, that is definitely an option and that will be my recommended default.

With that said, the Ollama API has gained popularity, and some users and apps prefer it to OpenAI's API. With that in mind, it would be great to have an option to exercise control over keep_alive. In a mixed-use environment with some users/apps on the OpenAI API and others on the Ollama API, it only takes one API call with a keep_alive to introduce unexpected behavior.


@rick-github commented on GitHub (Jun 9, 2025):

The problem is that the ollama /api endpoint is really a mix of management and functional endpoints. If you can't trust the clients not to set keep_alive, then giving them an endpoint that allows them to delete models is not optimal. If a client really wants to use the ollama endpoint, the usual way to control access is a proxy that can authenticate the various endpoints and restrict fields in the requests. So far it's been a general policy that if there's a tool external to ollama that can perform a function, then using that tool is recommended over adding code to ollama. Some users push back on that, understandably, as it adds components and requires management, but that's where we are at the moment.


@kkrick-sdsu commented on GitHub (Jun 10, 2025):

Thanks for the additional context. I'll look for an appropriate tool that meets the need. We can close this feature request then.


@rick-github commented on GitHub (Jun 10, 2025):

mitmproxy is useful for managing request contents.

```yaml
services:
  ollama-backend:
    image: ollama/ollama:${OLLAMA_DOCKER_TAG-latest}
    volumes:
      - ${OLLAMA_HOME-./ollama}:/root/.ollama
    environment:
      - OLLAMA_KEEP_ALIVE=${OLLAMA_KEEP_ALIVE--1}
      - OLLAMA_DEBUG=${OLLAMA_DEBUG-2}

  ollama-mitmproxy:
    image: mitmproxy-modify
    build:
      dockerfile_inline: |
        FROM mitmproxy/mitmproxy
        WORKDIR /home/mitmproxy
        RUN cat > filter.py <<"EOF"
        import json
        from mitmproxy import http

        def responseheaders(flow):
            # Stream responses through unmodified.
            flow.response.stream = True

        def request(flow: http.HTTPFlow) -> None:
            # Strip any keep_alive field from JSON request bodies so the
            # server-side OLLAMA_KEEP_ALIVE setting always applies.
            try:
                payload = json.loads(flow.request.text)
                payload.pop("keep_alive", None)
                flow.request.text = json.dumps(payload)
                flow.request.headers["content-length"] = str(len(flow.request.text))
            except (ValueError, TypeError):
                # Non-JSON bodies (or no body) pass through untouched.
                pass
        EOF
    command: [ "mitmdump", "--mode", "reverse:http://ollama-backend:11434", "-s", "filter.py" ]
    ports:
      - 11434:8080
```
I've used docker here for environment management, but mitmproxy is cross-platform and can be run from a command line.


Reference: github-starred/ollama#7254