[PR #4153] [MERGED] Add GPU usage #11401

Closed
opened 2026-04-12 23:29:27 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/4153
Author: @dhiltgen
Created: 5/4/2024
Status: Merged
Merged: 5/8/2024
Merged by: @dhiltgen

Base: main ← Head: gpu_verbose_response


📝 Commits (1)

  • bee2f4a Record GPU usage information

📊 Changes

3 files changed (+40 additions, -20 deletions)


📝 format/bytes.go (+2 -0)
📝 llm/memory.go (+12 -12)
📝 llm/server.go (+26 -8)

📄 Description

Help users understand how much of the model fits into their GPU without having to inspect the server log.

A few examples from different systems and models:

eval rate:            4.40 tokens/s
gpu usage:            1 GPU (14/27 layers) 3.2 GB (2.0 GB GPU)

eval rate:            6.64 tokens/s
gpu usage:            1 GPU (27/27 layers) 3.2 GB

eval rate:            18.44 tokens/s
gpu usage:            2 GPUs (27/33 layers) 27 GB (24 GB GPU)

eval rate:            19.58 tokens/s
gpu usage:            CPU (0/27 layers) 3.2 GB
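
The examples above follow one pattern: a location ("CPU", "1 GPU", or "N GPUs"), the offloaded layer count out of the total, the model's overall memory footprint, and, only when the model is split between CPU and GPU, the GPU-resident share in parentheses. The sketch below illustrates that formatting logic; the function name, signature, and byte formatting are illustrative assumptions, not the actual code from this PR.

```go
package main

import "fmt"

// formatGPUUsage is a hypothetical sketch of how the "gpu usage" line
// could be assembled. It is NOT the actual ollama implementation; names
// and signature are invented for illustration.
func formatGPUUsage(gpuCount, layersOnGPU, totalLayers int, totalBytes, gpuBytes uint64) string {
	// human renders a byte count as a decimal (SI) size string.
	human := func(b uint64) string {
		const gb = 1000 * 1000 * 1000
		const mb = 1000 * 1000
		if b >= gb {
			return fmt.Sprintf("%.1f GB", float64(b)/gb)
		}
		return fmt.Sprintf("%.1f MB", float64(b)/mb)
	}

	// Pick the location label based on how many GPUs hold layers.
	var where string
	switch {
	case layersOnGPU == 0:
		where = "CPU"
	case gpuCount == 1:
		where = "1 GPU"
	default:
		where = fmt.Sprintf("%d GPUs", gpuCount)
	}

	s := fmt.Sprintf("%s (%d/%d layers) %s", where, layersOnGPU, totalLayers, human(totalBytes))
	// Only a partial offload reports the GPU-resident share separately.
	if layersOnGPU > 0 && layersOnGPU < totalLayers {
		s += fmt.Sprintf(" (%s GPU)", human(gpuBytes))
	}
	return s
}

func main() {
	fmt.Println(formatGPUUsage(1, 14, 27, 3_200_000_000, 2_000_000_000))
	fmt.Println(formatGPUUsage(1, 27, 27, 3_200_000_000, 3_200_000_000))
	fmt.Println(formatGPUUsage(0, 0, 27, 3_200_000_000, 0))
}
```

Run as-is, this prints the partial-offload, full-offload, and CPU-only variants matching the first, second, and fourth examples above.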

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-12 23:29:27 -05:00

Reference: github-starred/ollama#11401