[GH-ISSUE #6401] embeddings models keep_alive #50532

Closed
opened 2026-04-28 16:13:57 -05:00 by GiteaMirror · 2 comments
Owner

Originally created by @Abdulrahman392011 on GitHub (Aug 17, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/6401

I use embedding models a lot, and every request loads the model, computes the vectors, and then unloads it immediately. When I try to keep it alive with this command

$ curl http://localhost:11434/api/generate -d '{"model": "mxbai-embed-large:latest", "keep_alive": -1}'

it tells me that this model isn't a generative model and refuses to keep it alive. Please add support for this to decrease the latency: it copies the 600 megabytes every time and then frees them, which adds a couple of seconds to an operation that should take only one.

GiteaMirror added the feature request label 2026-04-28 16:13:57 -05:00

@rick-github commented on GitHub (Aug 17, 2024):

You need to load an embedding model via the embedding API endpoint:

$ curl http://localhost:11434/api/embed -d '{"model": "mxbai-embed-large:latest", "keep_alive": -1}'
$ ollama ps
NAME                            ID              SIZE    PROCESSOR       UNTIL   
mxbai-embed-large:latest        468836162de7    1.2 GB  100% GPU        Forever
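As a sketch (not from the original thread), the preload request above can also be built programmatically. The field names follow Ollama's documented /api/embed convention: keep_alive accepts -1 (keep loaded indefinitely), 0 (unload immediately after the request), or a duration string such as "5m".

```python
import json

# Build the /api/embed request body that preloads an embedding model
# and pins it in memory. POSTing this to http://localhost:11434/api/embed
# (e.g. with urllib.request or curl) loads the model once; subsequent
# embedding calls then skip the multi-second load/unload cycle.
payload = {
    "model": "mxbai-embed-large:latest",
    "keep_alive": -1,  # -1 = keep resident until the server is restarted
}
body = json.dumps(payload)
```

A request with no "input" field, as shown, just loads the model; later calls that include "input" reuse the already-resident weights.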

@Abdulrahman392011 commented on GitHub (Aug 17, 2024):

Thank you. I tried it and it works.


Reference: github-starred/ollama#50532