[GH-ISSUE #9577] keep_alive parameter not taken into account #68302

Closed
opened 2026-05-04 13:11:06 -05:00 by GiteaMirror · 8 comments

Originally created by @nicho2 on GitHub (Mar 7, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/9577

What is the issue?

Hello,

I send a request with keep_alive = 8m, but Ollama releases the model after 5m:

time start = 2025-03-07T12:30:09
time_release = 2025-03-07T12:35:00

Yet the log shows duration=8m0s.

![Image](https://github.com/user-attachments/assets/e2873b61-4932-4bb0-97e4-f168e1971208)

Relevant log output

time=2025-03-07T12:30:09.671Z level=DEBUG source=routes.go:1501 msg="chat request" images=0 prompt="<|start_header_id|>system<|end_header_id|>\n\n\n<i-----------------2<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"

time=2025-03-07T12:30:09.752Z level=DEBUG source=cache.go:104 msg="loading cache slot" id=0 cache=0 prompt=71199 used=0 remaining=71199

time=2025-03-07T12:35:00.529Z level=DEBUG source=sched.go:467 msg="context for request finished"

time=2025-03-07T12:35:00.529Z level=DEBUG source=sched.go:340 msg="runner with non-zero duration has gone idle, adding timer" modelPath=/root/.ollama/models/blobs/sha256-4824460d29f2058aaf6e1118a63a7a197a09bed509f0e7d4e2efb1ee273b447d duration=8m0s

time=2025-03-07T12:35:00.530Z level=DEBUG source=sched.go:358 msg="after processing request finished event" modelPath=/root/.ollama/models/blobs/sha256-4824460d29f2058aaf6e1118a63a7a197a09bed509f0e7d4e2efb1ee273b447d refCount=0

[GIN] 2025/03/07 - 12:35:00 | 200 |          5m0s |      172.21.0.1 | POST     "/api/chat"

OS

Docker

GPU

Nvidia

CPU

Intel

Ollama version

0.5.13

GiteaMirror added the bug, needs more info labels 2026-05-04 13:11:07 -05:00

@rick-github commented on GitHub (Mar 7, 2025):

The model hasn't been unloaded. The completion took 5 minutes, and then the 8 minute timeout started. If another request is not received, the model will be unloaded at 12:43.
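The timing rick-github describes can be sketched as follows: the keep_alive timer starts when the runner goes idle (the "context for request finished" event), not when the request arrives. A minimal sketch using the timestamps from the log above:

```python
from datetime import datetime, timedelta

def unload_time(request_finished: datetime, keep_alive: timedelta) -> datetime:
    # The keep_alive countdown begins only once the request completes
    # and the runner becomes idle ("adding timer" in the sched.go log).
    return request_finished + keep_alive

# "context for request finished" was logged at 12:35:00
finished = datetime(2025, 3, 7, 12, 35, 0)
print(unload_time(finished, timedelta(minutes=8)))  # 2025-03-07 12:43:00
```

So the 5-minute gap in the log is the generation time, and the 8m duration is the idle timeout that starts afterwards.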


@nicho2 commented on GitHub (Mar 7, 2025):

The completion took 5 minutes, and I got no answer from the LLM.

Ollama just sent:
HTTP/1.1 200 OK
Content-Type: application/x-ndjson
Date: Fri, 07 Mar 2025 13:25:38 GMT
Transfer-Encoding: chunked

{"error":"POST predict: Post \"http://127.0.0.1:35039/completion\": context canceled"}


@rick-github commented on GitHub (Mar 7, 2025):

Does your client have a 5 minute timeout?
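The question points at a common cause of a "context canceled" error at exactly the 5-minute mark: if the client's read timeout is shorter than the generation time, the client drops the connection and the server cancels the generation context. A minimal stdlib sketch of a client with an explicit long timeout (the endpoint and model name are assumptions for illustration):

```python
import json
import urllib.request

def build_chat_payload(prompt: str, keep_alive: str = "8m") -> dict:
    # keep_alive uses Go-style durations ("8m", "1h"); it controls how long
    # the model stays loaded AFTER the response, not the request timeout.
    return {
        "model": "llama3",  # hypothetical model name
        "messages": [{"role": "user", "content": prompt}],
        "keep_alive": keep_alive,
        "stream": False,
    }

def chat(prompt: str, timeout_s: float = 600.0) -> dict:
    # timeout_s is the client-side limit; if it expires mid-generation,
    # the connection closes and Ollama logs "context canceled".
    req = urllib.request.Request(
        "http://localhost:11434/api/chat",
        data=json.dumps(build_chat_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return json.load(resp)
```

Note that keep_alive and the client timeout are independent knobs: the first governs model residency on the server, the second governs how long the client waits for a reply.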


@nicho2 commented on GitHub (Mar 7, 2025):

No, I don't see one.

Could it be linked to OLLAMA_LOAD_TIMEOUT?
OLLAMA_LOAD_TIMEOUT: how long to allow model loads to stall before giving up (default "5m")


@rick-github commented on GitHub (Mar 7, 2025):

No, the model is already loaded.


@nicho2 commented on GitHub (Mar 7, 2025):

Sometimes when I regenerate from the chat, I get the answer immediately and the cache is not empty:

time=2025-03-07T13:49:10.358Z level=DEBUG source=cache.go:104 msg="loading cache slot" id=0 cache=107963 prompt=95938 used=95888 remaining=50


@rick-github commented on GitHub (Mar 7, 2025):

Yes, that's the point of a cache.


@pdevine commented on GitHub (Mar 12, 2025):

OK, I'm not quite sure how to help here. @nicho2 it seems like Ollama is behaving as intended?


Reference: github-starred/ollama#68302