[PR #2146] [MERGED] add keep_alive to generate/chat/embedding api endpoints #73093

Closed
opened 2026-05-05 04:45:20 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/2146
Author: @pdevine
Created: 1/22/2024
Status: Merged
Merged: 1/26/2024
Merged by: @pdevine

Base: main ← Head: keepalive


📝 Commits (3)

  • 57cadbc add keep_alive to /api/generate
  • 7bb6ccb fix lint warning
  • 0cf5815 fix parsed duration + add to chat/embed endpoints

📊 Changes

2 files changed (+48 additions, -20 deletions)

📝 api/types.go (+25 -17)
📝 server/routes.go (+23 -3)

📄 Description

This change adds a new keep_alive parameter to /api/generate which controls how long a model stays loaded in memory. There are three cases:

  1. if keep_alive is not set, the model will stay loaded for the default value (5 minutes);
  2. if keep_alive is set to a positive duration (e.g. "20m"), it will stay loaded for that duration;
  3. if keep_alive is set to a negative duration (e.g. "-1m"), it will stay loaded indefinitely.

If you wish the model to be unloaded immediately after generation, you can set it to "0m", or even just 0. Also, maybe most importantly, subsequent calls to /api/generate will change the load duration, so even if you called it once with a negative value and the next caller omits it, the model will only stay in memory for 5 minutes after the second call.

Note that this change only applies to /api/generate. We can either layer the changes for /api/chat on top of this change, or push them as a separate PR.

resolves #1339


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-05-05 04:45:20 -05:00
Reference: github-starred/ollama#73093