[PR #4327] [MERGED] Ollama ps command for showing currently loaded models #73754

Closed
opened 2026-05-05 05:41:26 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/4327
Author: @pdevine
Created: 5/10/2024
Status: Merged
Merged: 5/14/2024
Merged by: @pdevine

Base: main ← Head: pdevine/ps


📝 Commits (6)

8d95c9b  add ollama ps command
334fdc7  humantime forever
a8e6033  add keepalive to ollama run
dd38c7e  show cpu/gpu percentages
bc03ad8  fix sched unit tests
d94da46  feed the linter

📊 Changes

10 files changed (+193 additions, -50 deletions)

📝 api/client.go (+9 -0)
📝 api/types.go (+3 -1)
📝 cmd/cmd.go (+75 -0)
📝 cmd/interactive.go (+5 -0)
📝 format/time.go (+3 -1)
📝 format/time_test.go (+10 -0)
📝 llm/server.go (+5 -0)
📝 server/routes.go (+29 -0)
📝 server/sched.go (+42 -38)
📝 server/sched_test.go (+12 -10)

📄 Description

This change adds a rudimentary ps command which makes use of the new scheduler changes in the server.

The UX for this depends on whether you're using the CPU, GPU, or a hybrid of both and looks like:

```
NAME            ID              SIZE    PROCESSOR        UNTIL
mistral:latest  61e88e884507    5.4 GB  100% GPU         28 seconds from now

NAME            ID              SIZE    PROCESSOR        UNTIL
mistral:latest  61e88e884507    5.4 GB  48%/52% CPU/GPU  28 seconds from now

NAME            ID              SIZE    PROCESSOR        UNTIL
mistral:latest  61e88e884507    5.4 GB  100% CPU         28 seconds from now
```

Additionally, there is a new --keepalive flag in the REPL which can be used to set how long you want the model to stay resident in memory after it has finished inference. It takes a duration string (e.g. 3m30s); we could also switch this to accept integers, similar to the API.
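For reference, the example value 3m30s matches Go's duration-string syntax, so a reasonable assumption (not something this description confirms) is that the flag is parsed with time.ParseDuration. A minimal sketch of the accepted format:

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// "3m30s" is the duration-string form used in the --keepalive example;
	// Go's time.ParseDuration also accepts values like "300ms", "45s", "1.5h".
	d, err := time.ParseDuration("3m30s")
	if err != nil {
		panic(err)
	}
	fmt.Println(d)           // 3m30s
	fmt.Println(d.Seconds()) // 210
}
```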

This also introduces a new /api/ps endpoint which returns a response similar to the /api/tags endpoint, albeit with additional information. The size of a running model will not match the size reported by /api/tags for the same model, since a model can take additional memory when loaded onto the GPU or split between CPU and GPU.
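For illustration, here is a minimal Go sketch that queries the new endpoint directly. The response field names used below (models, name, size, expires_at) are assumptions modelled on the /api/tags schema and the description above, not taken verbatim from this PR:

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

// runningModel holds a subset of the assumed /api/ps response;
// the JSON field names are illustrative, not confirmed by this PR.
type runningModel struct {
	Name      string    `json:"name"`
	Size      int64     `json:"size"`
	ExpiresAt time.Time `json:"expires_at"`
}

type psResponse struct {
	Models []runningModel `json:"models"`
}

func main() {
	// Assumes a local Ollama server on the default port.
	resp, err := http.Get("http://localhost:11434/api/ps")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var ps psResponse
	if err := json.NewDecoder(resp.Body).Decode(&ps); err != nil {
		panic(err)
	}
	for _, m := range ps.Models {
		fmt.Printf("%s\t%d bytes\tuntil %s\n", m.Name, m.Size, m.ExpiresAt)
	}
}
```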

Partially addresses #3902
Fixes #4013
Replaces #2359


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.
