issue: The "Keep Alive" setting has no effect #4428

Closed
opened 2025-11-11 15:53:52 -06:00 by GiteaMirror · 7 comments
Owner

Originally created by @FlippingBinary on GitHub (Mar 14, 2025).

Check Existing Issues

  • I have searched the existing issues and discussions.
  • I am using the latest version of Open WebUI.

Installation Method

Docker

Open WebUI Version

v0.5.20

Ollama Version (if applicable)

v0.6.0

Operating System

Windows 11

Browser (if applicable)

Vivaldi 7.1.3570.39 (Stable channel) (64-bit)

Confirmation

  • I have read and followed all instructions in README.md.
  • I am using the latest version of both Open WebUI and Ollama.
  • I have included the browser console logs.
  • I have included the Docker container logs.
  • I have listed steps to reproduce the bug in detail.

Expected Behavior

Settings -> General -> Advanced Parameters -> Show -> Keep Alive -> Custom -> 60m should cause open-webui to send a keep_alive parameter in requests to the /api/generate and /api/chat endpoints.
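Concretely, the expected outgoing request body would carry the setting like this (an illustrative sketch; the model name and message are placeholders, not a captured request):

```python
import json

# Hypothetical /api/chat request body Open WebUI should send when
# Keep Alive is set to "60m" in Settings -> Advanced Parameters.
expected_body = {
    "model": "gemma3:12b",
    "messages": [{"role": "user", "content": "hello"}],
    "stream": True,
    "keep_alive": "60m",  # the parameter this issue is about
}

print(json.dumps(expected_body, indent=2))
```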

Actual Behavior

The parameter is absent from requests and does not modify Ollama's behavior. This can be verified by examining the request body in the network tab of the browser's developer tools, and from the CLI by running ollama ps, which shows the time to live for any loaded models.

Steps to Reproduce

  1. Log in to a user (I'll assume it's an admin).
  2. Click the user menu in the lower left of the interface.
  3. Click "Settings"
  4. Click "General"
  5. Click "Show" next to "Advanced Parameters"
  6. Scroll to the bottom.
  7. Click "Custom" next to "Keep Alive"
  8. Set a valid keep alive to something other than the default value, like 60m or -1.
  9. Save and close the settings.
  10. Generate a chat response.

Logs & Screenshots

Even though Keep Alive is set to 60m and the response is actively being generated while calling ollama ps:

NAME          ID              SIZE     PROCESSOR    UNTIL
gemma3:12b    6fd036cefda5    11 GB    100% GPU     4 minutes from now

The network tab shows the request payload:

![Image](https://github.com/user-attachments/assets/efc860d1-5252-4262-8c0c-82863403b442)

Additional Information

Before opening this issue, I searched the repository for the code that sends the parameter to Ollama because I was trying to understand why it's not working. I quickly found pull request #721, which seems to add the capability in its initial commit 7053f2f67d. It modified generatePrompt to take a body object and pass it to Ollama. It also modified src/lib/components/chat/MessageInput/Models.svelte to set the parameter in the body when calling generatePrompt.

In that same pull request, 3057bfe5a0 reverted those changes, and I can't find any reason why it was reverted. It looks like that's the reason the keep alive setting has never changed Ollama's behavior since that PR added the setting to open-webui's interface.

After #721 was merged, @jupiterbjy, @LoadingCode233, @SteamNimmersatt, and @tinglion all wrote comments about it not working for them.

Would a PR fixing this be welcome? I feel like I'm missing something; maybe there are bigger changes in the works.

GiteaMirror added the bug label 2025-11-11 15:53:52 -06:00

@foraxe commented on GitHub (Mar 14, 2025):

I just created a discussion about this issue before making a pull request:
https://github.com/open-webui/open-webui/discussions/11690
The problem with the current code is that the keep_alive config is not correctly sent to Ollama.
I am now using a uv-installed owui. A temporary fix would be to manually modify the payload code.


@FlippingBinary commented on GitHub (Mar 14, 2025):

Ahh, sorry @foraxe. I didn't notice you created a discussion while I was writing up this issue. Now I see the call to /api/chat/completions does send params.keep_alive in the request payload, according to the browser.

![Image](https://github.com/user-attachments/assets/b0314173-c971-4806-a12c-f1a70ffad219)

So that should just need to be moved up a level like you wrote in your discussion, but I'm not sure where it's getting set.


@FlippingBinary commented on GitHub (Mar 14, 2025):

Wait, that's what's getting sent to open-webui, which is okay the way it is. The problem is open-webui isn't passing that parameter on to the Ollama backend. It's not going to be visible in the browser tools.
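Since the backend-to-Ollama hop isn't visible in browser devtools, one way to check it would be temporary instrumentation on the server side; a sketch with a hypothetical helper, not existing Open WebUI code:

```python
def describe_keep_alive(payload: dict) -> str:
    # Summarize whether keep_alive survived into the payload that is about
    # to be forwarded to Ollama. Browser devtools only show the
    # frontend -> Open WebUI request, not this second hop, so logging
    # this string server-side reveals what Ollama actually receives.
    return f"keep_alive={payload.get('keep_alive')!r}"

# With the bug, the forwarded payload lacks the key:
print(describe_keep_alive({"model": "gemma3:12b"}))  # keep_alive=None
```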


@foraxe commented on GitHub (Mar 14, 2025):

Hi @FlippingBinary, I am still testing the fix. As a temporary workaround, though, you can modify backend/open_webui/routers/ollama.py: in the generate_chat_completion handler (the one decorated with @router.post("/api/chat") and @router.post("/api/chat/{url_idx}")), add payload["keep_alive"] = -1 just before the return await send_post_request(...) call, like so:

    @router.post("/api/chat")
    @router.post("/api/chat/{url_idx}")
    async def generate_chat_completion(
        request: Request,
        form_data: dict,
        ...
    ):
        ...
        payload["keep_alive"] = -1  # keep alive forever
        return await send_post_request(
            url=f"{url}/api/chat",
            payload=json.dumps(payload),
            stream=form_data.stream,
            key=get_api_key(url_idx, url, request.app.state.config.OLLAMA_API_CONFIGS),
            content_type="application/x-ndjson",
            user=user,
        )
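Rather than hardcoding -1, the same spot could forward the user's configured value. A minimal sketch of that idea (the function and parameter names here are hypothetical, not Open WebUI's actual API):

```python
def apply_keep_alive(payload: dict, params: dict) -> dict:
    """Forward a user-configured keep_alive into the Ollama payload.

    `params` stands in for the per-chat advanced parameters. When no
    value is configured, the payload is left untouched so Ollama falls
    back to OLLAMA_KEEP_ALIVE or its built-in 5-minute default.
    """
    keep_alive = params.get("keep_alive")
    if keep_alive is not None:
        payload["keep_alive"] = keep_alive
    return payload

# "60m" from the Keep Alive setting ends up in the outgoing payload:
print(apply_keep_alive({"model": "gemma3:12b"}, {"keep_alive": "60m"}))
```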


@FlippingBinary commented on GitHub (Mar 14, 2025):

Setting environment variables for Ollama is another way to set a more useful default than 5 minutes. I imagine most people are using a workaround like this:

        OLLAMA_KEEP_ALIVE: "60m"
        OLLAMA_NUM_PARALLEL: 2
        OLLAMA_MAX_LOADED_MODELS: 2

It would be nice if the keep alive parameter could be set on a per-model basis, though, especially if the administrator can set boundaries on upper and lower values. That would meet the need described in #3284.


@foraxe commented on GitHub (Mar 15, 2025):

> Setting environment variables for Ollama is another way to set a more useful default than 5 minutes. I imagine most people are using a workaround like this:
>
>         OLLAMA_KEEP_ALIVE: "60m"
>         OLLAMA_NUM_PARALLEL: 2
>         OLLAMA_MAX_LOADED_MODELS: 2
>
> It would be nice if the keep alive parameter could be set on a per-model basis, though, especially if the administrator can set boundaries on upper and lower values. That would meet the need described in #3284.

  1. The environment variables for Ollama are easy to override. From the Ollama FAQ (https://github.com/ollama/ollama/blob/ef378ad673a3f01382add316835957b1d4184177/docs/faq.md?plain=1#L253):

     > The keep_alive API parameter with the /api/generate and /api/chat API endpoints will override the OLLAMA_KEEP_ALIVE setting.

  2. In the issue, OLLAMA_KEEP_ALIVE was overridden by the POST message from owui, so using environment variables as a workaround here is not effective.
  3. OLLAMA_KEEP_ALIVE can also be overridden by other programs' behavior; handling such scenarios would add more complexity to owui.
  4. I saw the earlier issue https://github.com/open-webui/open-webui/issues/3284. Maybe some work can be done in the frontend to hide some options from 'user' roles, and setting boundaries is an interesting idea.
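The override precedence described above (request parameter beats environment variable beats built-in default) can be sketched as follows; this is illustrative only, not Ollama's actual code, and the environment mapping is passed in explicitly for clarity:

```python
def effective_keep_alive(request_payload: dict, env: dict) -> str:
    # Per-request keep_alive wins over the server-wide OLLAMA_KEEP_ALIVE,
    # which in turn wins over Ollama's built-in 5-minute default.
    if "keep_alive" in request_payload:
        return str(request_payload["keep_alive"])
    return env.get("OLLAMA_KEEP_ALIVE", "5m")

print(effective_keep_alive({"keep_alive": -1}, {"OLLAMA_KEEP_ALIVE": "60m"}))  # -1
print(effective_keep_alive({}, {"OLLAMA_KEEP_ALIVE": "60m"}))  # 60m
print(effective_keep_alive({}, {}))  # 5m
```

This is why item 2 above holds: once Open WebUI sends any keep_alive in the POST body, the server-wide environment variable no longer matters for that request.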

@tjbck commented on GitHub (Mar 15, 2025):

PR merged, Thanks!

Reference: github-starred/open-webui#4428