[GH-ISSUE #3144] add /metrics endpoint #27694

Open
opened 2026-04-22 05:13:41 -05:00 by GiteaMirror · 46 comments
Owner

Originally created by @codearranger on GitHub (Mar 14, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/3144

Originally assigned to: @ParthSareen on GitHub.

It would be nice if ollama had a /metrics endpoint for collecting metrics for Prometheus or other monitoring tools.

https://prometheus.io/docs/guides/go-application/

Some metrics to include might be GPU utilization, memory utilization, CPU utilization, layers used, request counts, etc.

GiteaMirror added the feature request and api labels 2026-04-22 05:13:41 -05:00

@amila-ku commented on GitHub (Mar 16, 2024):

I would like to work on this one. I have worked on several Prometheus metrics integrations on Go apps before.


@aliirz commented on GitHub (Mar 17, 2024):

+1


@yuliyantsvetkov commented on GitHub (Mar 21, 2024):

I can help with cardinality exploration, sizing of labels, reviews, but I haven't opened the full code base to check where we can add the metric counters.

By default the Prometheus Go module exports some runtime metrics such as GC, memory, and goroutines, but that only covers the application baseline.

Let me know if I can help with the reviews.


@amila-ku commented on GitHub (Mar 22, 2024):

> I can help with cardinality exploration, sizing of labels, reviews, but I haven't opened the full code base to check where we can add the metric counters.
>
> By default the Prometheus Go module exports some runtime metrics such as GC, memory, and goroutines, but that only covers the application baseline.
>
> Let me know if I can help with the reviews.

Great, I have already started by adding the metrics endpoint and am trying to add a few custom metrics. I will share which metrics I'm adding initially and what they would generally look like.


@amila-ku commented on GitHub (Apr 7, 2024):

I added a metrics endpoint with custom metrics for request counts.

Example:

# curl http://127.0.0.1:11434/metrics | grep -i ollama
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  6664    0  6664    0     0   519k      0 --:--:-- --:--:-- --:--:--  542k
# HELP ollama_model_list_requests_total The total number of model list requets that have been attempted.
# TYPE ollama_model_list_requests_total counter
ollama_model_list_requests_total{action="list",status="OK",status_code="200"} 1
# HELP ollama_model_pull_requests_total The total number of model pulls that have been attempted.
# TYPE ollama_model_pull_requests_total counter
ollama_model_pull_requests_total{action="pull",status="OK",status_code="200"} 1
# HELP ollama_model_requests_total The total number of requests on all endpoints.
# TYPE ollama_model_requests_total counter
ollama_model_requests_total{action="all",status="OK",status_code="200"} 6

I will not add more in the first PR, to keep it simple.
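
For anyone who wants to consume output like the above without a full Prometheus setup, here is a minimal sketch of a parser for the text exposition format. It is an illustration only: the `parse_metrics` helper and the `SAMPLE` constant are hypothetical names, the sample lines are copied from the curl output in this comment, and the parser ignores escaped quotes and braces inside label values.

```python
import re

# Metric lines copied from the example output above (HELP/TYPE lines included).
SAMPLE = """\
# HELP ollama_model_list_requests_total The total number of model list requets that have been attempted.
# TYPE ollama_model_list_requests_total counter
ollama_model_list_requests_total{action="list",status="OK",status_code="200"} 1
# HELP ollama_model_pull_requests_total The total number of model pulls that have been attempted.
# TYPE ollama_model_pull_requests_total counter
ollama_model_pull_requests_total{action="pull",status="OK",status_code="200"} 1
# HELP ollama_model_requests_total The total number of requests on all endpoints.
# TYPE ollama_model_requests_total counter
ollama_model_requests_total{action="all",status="OK",status_code="200"} 6
"""

# name, optional {labels}, then the sample value.
SAMPLE_LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)'
)
LABEL = re.compile(r'(\w+)="([^"]*)"')


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples, skipping comments."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or HELP/TYPE comment
        match = SAMPLE_LINE.match(line)
        if match is None:
            continue
        try:
            value = float(match.group("value"))
        except ValueError:
            continue  # not a numeric sample; ignore the line
        labels = dict(LABEL.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, value))
    return samples


if __name__ == "__main__":
    for name, labels, value in parse_metrics(SAMPLE):
        print(name, labels, value)
```

In practice the text would come from `http://127.0.0.1:11434/metrics` on a build that includes the endpoint; the sketch keeps the input inline so it runs on its own.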


@sammcj commented on GitHub (Apr 9, 2024):

This is nice!


@joshcarp commented on GitHub (May 1, 2024):

So I've got a lot of thoughts about this.
I think metrics and traces need to be added, but it would be nice to add OpenTelemetry instead of Prometheus clients. This would also have the added benefit of traces, which would be invaluable for debugging issues.

There's an [open PR](https://github.com/open-telemetry/semantic-conventions/issues/327) on adding semantic conventions for LLM applications, but its focus is more on the API side of things, and I think ollama could provide a pretty good use case for standardizing telemetry data for the internal nitty-gritty of LLMs; think perplexity, predicted token loss, etc.

@amanagarwal042 commented on GitHub (May 22, 2024):

Hey folks, I am the maintainer of the OpenLIT project.

We built OpenTelemetry tracing and metrics for the Python Ollama client (following the OTel semantic conventions for LLMs) in the OpenLIT SDK. This gives tracing and metrics for the API side of things: prompts, responses, tokens, and some request and response metadata.

You can check it out here: https://github.com/openlit/openlit. Let me know if you have any thoughts on this, as I feel it is closely related to this open issue as well.

We are evaluating how the same library could extract GPU metrics (possibly using nvidia-smi). If anyone has thoughts on that, we would love to hear them.


@nsankar commented on GitHub (Jun 4, 2024):

@patcher9 you can export different GPU metrics using the NVIDIA DCGM exporter.


@Agash commented on GitHub (Jun 6, 2024):

> So I've got a lot of thoughts about this. I think metrics and traces need to be added, but it would be nice to add OpenTelemetry instead of prometheus clients, this would also have the added benefit of Traces which would be invaluable for debugging issues.

If metrics get integrated, I think one should definitely go with OpenTelemetry and not with a product-specific client such as Prometheus. I use Prometheus as well, but I can just hook it up with OTLP. Tracing is also an important aspect.

As nice as OpenLIT seems, having the data come from the inference server in a distributed application is far more beneficial in my use case, and its absence is keeping me from using Ollama.


@kennethwolters commented on GitHub (Jun 14, 2024):

queue length would be an important metric to me.


@amanagarwal042 commented on GitHub (Jun 16, 2024):

> @patcher9 you can export different GPU metrics using the NVIDIA DCGM exporter.

Thanks @nsankar. We were looking for something that is OpenTelemetry-native (DCGM Exporter metrics are Prometheus-style), so we built an OpenTelemetry variant of the DCGM exporter ourselves.

https://github.com/openlit/openlit/tree/main/otel-gpu-collector


@maher-naija-pro commented on GitHub (Aug 18, 2024):

It would be interesting to have these metrics for production deployments.

![807_514_1 25](https://github.com/user-attachments/assets/733964d7-4d6f-46a6-87a0-707a6095c6b3)
![815_565_1 25](https://github.com/user-attachments/assets/e360c285-d6b2-4b14-bacd-166be3b50f75)
![988_589_1 25](https://github.com/user-attachments/assets/4a6e1ad7-214b-427d-bfbf-074466873135)

@nstogner commented on GitHub (Aug 22, 2024):

Thoughts on the approach of defining and implementing a basic set of metrics first and then splitting additional metrics / tracing into other issues? Seems like that might be a good way to move this one forward.

+1 to: https://github.com/ollama/ollama/issues/3144#issuecomment-2041644151


@francescor commented on GitHub (Sep 25, 2024):

We would appreciate prometheus /metrics, too


@ragonneau commented on GitHub (Dec 9, 2024):

+1


@df-cgdm commented on GitHub (Dec 10, 2024):

+1


@arnesund commented on GitHub (Dec 18, 2024):

+1 for metrics, preferably in Prometheus format


@ill-yes commented on GitHub (Jan 7, 2025):

Any updates on this?


@killbee26 commented on GitHub (Jan 7, 2025):

Need updates on this feature implementation!!


@JusefPol commented on GitHub (Jan 24, 2025):

The most critical metrics are the ones whose absence can break everything. GPU VRAM usage and available VRAM are critical to avoid failures when trying to run models bigger than what's available, and the same goes for system RAM. Drive capacity is also critical, to avoid downloading a model without enough space on the drive. I believe any third-party tool needs these basics to present to the user in order to avoid hangs and reboots.
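
To make the kind of pre-flight check described here concrete, a rough sketch follows. It is not something ollama provides: `fits` and `disk_has_room` are hypothetical helper names, the 10% headroom is an arbitrary assumption, and the free VRAM or RAM figure would have to come from whatever metrics source ends up being available.

```python
import shutil


def fits(needed_bytes, free_bytes, headroom=0.10):
    """True if needed_bytes plus a safety margin fits into free_bytes.

    headroom is a fraction of needed_bytes kept in reserve; 0.10 is an
    arbitrary illustrative default, not a recommended value.
    """
    required = int(needed_bytes * (1 + headroom))
    return free_bytes >= required


def disk_has_room(model_bytes, path=".", headroom=0.10):
    """Check free space on the filesystem holding `path` before a download."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return fits(model_bytes, usage.free, headroom)


if __name__ == "__main__":
    four_gib = 4 * 1024**3
    print("room for a 4 GiB download here:", disk_has_room(four_gib))
```

The same `fits` check could be applied to VRAM or system RAM once those numbers are exposed; only the disk case can be answered from the standard library alone.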


@robsonfelix commented on GitHub (Feb 4, 2025):

+1


@AlessandroSpallina commented on GitHub (Feb 12, 2025):

+1


@hadisalem commented on GitHub (Feb 18, 2025):

+1


@kushteppalwarappzen commented on GitHub (Mar 11, 2025):

+1


@Leeaandrob commented on GitHub (Mar 18, 2025):

+1


@deep-space-explorer commented on GitHub (Mar 31, 2025):

+1


@bobanj commented on GitHub (Apr 4, 2025):

+1


@leviskay commented on GitHub (Apr 24, 2025):

+1


@lweighall commented on GitHub (May 20, 2025):

+1


@jonigl commented on GitHub (Jun 2, 2025):

+1


@ckuethe commented on GitHub (Jun 2, 2025):

rocm-smi has a json output mode, and here's an example of what it says about my GPU:

{
   "card0": {
      "Device Name": "Radeon RX 7900 XT",
      "Device ID": "0x744c",
      "Device Rev": "0xcc",
      "Subsystem ID": "0x471e",
      "GUID": "21031",
      "Unique ID": "0xd8607413b0e3a90b",
      "VBIOS version": "113-D70401XT-P11",
      "Temperature (Sensor edge) (C)": "38.0",
      "Temperature (Sensor junction) (C)": "47.0",
      "Temperature (Sensor memory) (C)": "48.0",
      "dcefclk clock speed:": "(1563Mhz)",
      "dcefclk clock level:": "1",
      "fclk clock speed:": "(2301Mhz)",
      "fclk clock level:": "7",
      "mclk clock speed:": "(456Mhz)",
      "mclk clock level:": "1",
      "sclk clock speed:": "(43Mhz)",
      "sclk clock level:": "1",
      "socclk clock speed:": "(1500Mhz)",
      "socclk clock level:": "1",
      "pcie clock level": "2 (8.0GT/s x8)",
      "Performance Level": "auto",
      "Max Graphics Package Power (W)": "265.0",
      "Average Graphics Package Power (W)": "55.0",
      "GPU use (%)": "0",
      "GPU Memory Allocated (VRAM%)": "0",
      "GPU Memory Read/Write Activity (%)": "0",
      "Memory Activity": "N/A",
      "Avg. Memory Bandwidth": "0",
      "GPU memory vendor": "samsung",
      "PCIe Replay Count": "0",
      "Serial Number": "N/A",
      "Voltage (mV)": "712",
      "PCI Bus": "0000:0B:00.0",
      "ASD firmware version": "0x210000e3",
      "ME firmware version": "2280",
      "MEC firmware version": "2460",
      "MES firmware version": "0x0000006a",
      "MES KIQ firmware version": "0x00000100",
      "PFP firmware version": "2370",
      "RLC firmware version": "128",
      "SDMA firmware version": "24",
      "SDMA2 firmware version": "24",
      "SMC firmware version": "00.78.127.00",
      "SOS firmware version": "0x00310032",
      "TA RAS firmware version": "27.00.02.05",
      "VCN firmware version": "0x09116006",
      "Card Series": "Radeon RX 7900 XT",
      "Card Model": "0x744c",
      "Card Vendor": "Advanced Micro Devices, Inc. [AMD/ATI]",
      "Card SKU": "D70401XT",
      "Node ID": "1",
      "GFX Version": "gfx1100",
      "Energy counter": "0",
      "Accumulated Energy (uJ)": "0.0",
      "Metric Version and Size (Bytes)": "1.3 120",
      "temperature_edge (C)": "38",
      "temperature_hotspot (C)": "47",
      "temperature_mem (C)": "48",
      "temperature_vrgfx (C)": "41",
      "temperature_vrsoc (C)": "39",
      "temperature_vrmem (C)": "43",
      "average_gfx_activity (%)": "8",
      "average_umc_activity (%)": "0",
      "average_mm_activity (%)": "0",
      "average_socket_power (W)": "55",
      "energy_accumulator (15.259uJ (2^-16))": "0",
      "system_clock_counter (ns)": "186394264281836",
      "average_gfxclk_frequency (MHz)": "42",
      "average_socclk_frequency (MHz)": "N/A",
      "average_uclk_frequency (MHz)": "909",
      "average_vclk0_frequency (MHz)": "2933",
      "average_dclk0_frequency (MHz)": "2199",
      "average_vclk1_frequency (MHz)": "2933",
      "average_dclk1_frequency (MHz)": "2199",
      "current_gfxclk (MHz)": "42",
      "current_socclk (MHz)": "1500",
      "current_uclk (MHz)": "456",
      "current_vclk0 (MHz)": "2933",
      "current_dclk0 (MHz)": "2200",
      "current_vclk1 (MHz)": "2933",
      "current_dclk1 (MHz)": "2200",
      "throttle_status": "2",
      "current_fan_speed (rpm)": "0",
      "pcie_link_width (Lanes)": "8",
      "pcie_link_speed (0.1 GT/s)": "80",
      "gfx_activity_acc (%)": "N/A",
      "mem_activity_acc (%)": "N/A",
      "temperature_hbm (C)": "['N/A', 'N/A', 'N/A', 'N/A']",
      "firmware_timestamp (10ns resolution)": "18446744073709551606",
      "voltage_soc (mV)": "1170",
      "voltage_gfx (mV)": "713",
      "voltage_mem (mV)": "705",
      "indep_throttle_status": "68719476736",
      "current_socket_power (W)": "N/A",
      "vcn_activity (%)": "[0, 'N/A', 'N/A', 'N/A']",
      "gfxclk_lock_status": "N/A",
      "xgmi_link_width": "N/A",
      "xgmi_link_speed (Gbps)": "N/A",
      "pcie_bandwidth_acc (GB/s)": "N/A",
      "pcie_bandwidth_inst (GB/s)": "N/A",
      "pcie_l0_to_recov_count_acc (Count)": "N/A",
      "pcie_replay_count_acc (Count)": "N/A",
      "pcie_replay_rover_count_acc (Count)": "N/A",
      "xgmi_read_data_acc (kB)": "['N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']",
      "xgmi_write_data_acc (kB)": "['N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']",
      "current_gfxclks (MHz)": "[42, 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']",
      "current_socclks (MHz)": "[1500, 'N/A', 'N/A', 'N/A']",
      "current_vclk0s (MHz)": "[2933, 'N/A', 'N/A', 'N/A']",
      "current_dclk0s (MHz)": "[2200, 'N/A', 'N/A', 'N/A']",
      "jpeg_activity (%)": "['N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']",
      "pcie_nak_sent_count_acc (Count)": "N/A",
      "pcie_nak_rcvd_count_acc (Count)": "N/A",
      "accumulation_counter (Count)": "N/A",
      "prochot_residency_acc (Count)": "N/A",
      "ppt_residency_acc (Count)": "N/A",
      "socket_thm_residency_acc (Count)": "N/A",
      "vr_thm_residency_acc (Count)": "N/A",
      "hbm_thm_residency_acc (Count)": "N/A",
      "pcie_lc_perf_other_end_recovery (Count)": "N/A",
      "vram_max_bandwidth (GB/s)": "N/A",
      "xgmi_link_status (Up/Down)": "['N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']",
      "num_partition": "N/A",
      "xcp_stats.gfx_busy_inst (%)": "['N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']",
      "xcp_stats.jpeg_busy (%)": "['N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']",
      "xcp_stats.vcn_busy (%)": "['N/A', 'N/A', 'N/A', 'N/A']",
      "xcp_stats.gfx_busy_acc (Count)": "['N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']",
      "xcp_stats.gfx_below_host_limit_acc (%)": "['N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']"
    },
    "system": {
      "Driver version": "6.10.5",
      "PID2405": "ollama, 0, 0, 0, unknown"
    }
  }

amdgpu_top (not part of ROCm) also has a json output mode and it looks like this:

{
      "ASIC Name": "GFX1100/Navi31",
      "CU per Shader Array": {
        "max": 8,
        "min": 6
      },
      "Chip Class": "GFX11",
      "DeviceID": 29772,
      "DeviceName": "AMD Radeon RX 7900 XT",
      "GL1 Cache per Shader Array": 262144,
      "GPU Clock": {
        "max": 2075,
        "min": 500
      },
      "GPU Family": "GC 11.0.0",
      "GPU Type": "dGPU",
      "GTT Size": 33668050944,
      "L1 Cache per CU": 32768,
      "L2 Cache": 6291456,
      "L3 Cache": 83886080,
      "Memory Clock": {
        "max": 1249,
        "min": 96
      },
      "NPU": null,
      "PCI": "0000:0b:00.0",
      "PCIe Link": {
        "max_dpm_link": {
          "gen": 3,
          "width": 8
        },
        "max_gpu_link": {
          "gen": 4,
          "width": 16
        },
        "max_system_link": {
          "gen": 3,
          "width": 8
        },
        "min_dpm_link": {
          "gen": 1,
          "width": 1
        }
      },
      "Power Cap": {
        "current": 265,
        "max": 290,
        "min": 238
      },
      "Power Profiles": [
        "BOOTUP_DEFAULT",
        "3D_FULL_SCREEN",
        "POWER_SAVING",
        "VIDEO",
        "VR",
        "COMPUTE",
        "CUSTOM",
        "WINDOW_3D"
      ],
      "RenderBackend": 24,
      "RenderBackend Type": "RB Plus",
      "ResizableBAR": false,
      "RevisionID": 204,
      "Sensors": {
        "Average Power": {
          "unit": "W",
          "value": 25
        },
        "CPU Tctl": null,
        "Edge Temperature": {
          "unit": "C",
          "value": 33
        },
        "Fan": {
          "unit": "RPM",
          "value": 0
        },
        "Fan Max": {
          "unit": "RPM",
          "value": 3600
        },
        "GFX Power": {
          "unit": "W",
          "value": 25
        },
        "GFX_MCLK": {
          "unit": "MHz",
          "value": 1249
        },
        "GFX_SCLK": {
          "unit": "MHz",
          "value": 168
        },
        "Input Power": null,
        "Junction Temperature": {
          "unit": "C",
          "value": 41
        },
        "Memory Temperature": {
          "unit": "C",
          "value": 46
        },
        "PCI Power State": "D0",
        "PCIe Link Speed": {
          "gen": 3,
          "width": 8
        },
        "Power Profile": "COMPUTE",
        "VDDGFX": {
          "unit": "mV",
          "value": 119
        },
        "VDDNB": null
      },
      "Shader Array per Shader Engine": 2,
      "Shader Engine": 6,
      "Total Compute Unit": 84,
      "Total ROP": 192,
      "VBIOS": {
        "date": "2023/03/23 04:26",
        "name": "EXT-80680 PPTable#3892-21152 TGP265w",
        "pn": "113-D70401XT-P11",
        "ver_str": "022.001.002.024.000001"
      },
      "VRAM": {
        "Total GTT": {
          "unit": "MiB",
          "value": 32108
        },
        "Total GTT Usage": {
          "unit": "MiB",
          "value": 15
        },
        "Total VRAM": {
          "unit": "MiB",
          "value": 20464
        },
        "Total VRAM Usage": {
          "unit": "MiB",
          "value": 11963
        }
      },
      "VRAM Bit width": 320,
      "VRAM Size": 21458059264,
      "VRAM Type": "GDDR6",
      "Video Caps": {
        "AV1": {
          "Decode": {
            "height": 4352,
            "width": 8192
          },
          "Encode": {
            "height": 4352,
            "width": 8192
          }
        },
        "HEVC": {
          "Decode": {
            "height": 4352,
            "width": 8192
          },
          "Encode": {
            "height": 4352,
            "width": 8192
          }
        },
        "JPEG": {
          "Decode": {
            "height": 16384,
            "width": 16384
          },
          "Encode": null
        },
        "MPEG2": {
          "Decode": null,
          "Encode": null
        },
        "MPEG4": {
          "Decode": null,
          "Encode": null
        },
        "MPEG4_AVC": {
          "Decode": {
            "height": 4096,
            "width": 4096
          },
          "Encode": {
            "height": 2304,
            "width": 4096
          }
        },
        "VC1": {
          "Decode": null,
          "Encode": null
        },
        "VP9": {
          "Decode": {
            "height": 4352,
            "width": 8192
          },
          "Encode": null
        }
      },
      "amdgpu_top_version": {
        "major": 0,
        "minor": 10,
        "patch": 4
      },
      "drm_version": {
        "major": 3,
        "minor": 59,
        "patchlevel": 0
      },
      "gfx_target_version": "gfx1100",
      "gpu_activity": {
        "GFX": {
          "unit": "%",
          "value": 100
        },
        "MediaEngine": {
          "unit": "%",
          "value": 0
        },
        "Memory": {
          "unit": "%",
          "value": 59
        }
      },
      "gpu_metrics": {
        "Throttle Status": [
          "PPT0",
          "PPT1",
          "TEMP_HOTSPOT"
        ],
        "average_core_power": null,
        "average_cpu_current": null,
        "average_cpu_power": null,
        "average_cpu_voltage": null,
        "average_dclk1_frequency": 25,
        "average_dclk_frequency": 25,
        "average_fclk_frequency": null,
        "average_gfx_current": null,
        "average_gfx_power": null,
        "average_gfx_voltage": null,
        "average_gfxclk_frequency": 2872,
        "average_soc_current": null,
        "average_soc_power": null,
        "average_soc_voltage": null,
        "average_socclk_frequency": null,
        "average_socket_power": 308,
        "average_temperature_core": null,
        "average_temperature_l3": null,
        "average_uclk_frequency": 2499,
        "average_vclk1_frequency": 25,
        "average_vclk_frequency": 25,
        "current_coreclk": null,
        "current_dclk": 25,
        "current_dclk1": 25,
        "current_fclk": null,
        "current_gfxclk": 2872,
        "current_l3clk": null,
        "current_socclk": 1285,
        "current_uclk": 1249,
        "current_vclk": 25,
        "current_vclk1": 25,
        "fan_pwm": null,
        "header": {
          "content_revision": 3,
          "format_revision": 1,
          "structure_size": 120
        },
        "pcie_link_speed": 80,
        "pcie_link_width": 8,
        "system_clock_counter": 177245504434575,
        "temperature_core": null,
        "temperature_edge": 43,
        "temperature_gfx": null,
        "temperature_hotspot": 62,
        "temperature_l3": null,
        "temperature_mem": 60,
        "temperature_soc": null,
        "temperature_vrgfx": 43,
        "temperature_vrmem": 43,
        "temperature_vrsoc": 41,
        "voltage_gfx": 1026,
        "voltage_mem": 803,
        "voltage_soc": 938
      }
    }

One of the things I was looking at for fun was power-consumption-per-LLM-query; it seems ROCm may report energy use in micro-joules, though my card or drivers don't seem to support that. I'm currently polling a smart power meter but I'd like to be able to poll ollama to know when my GPU is actually computing/inferring, rather than when I'm doing something else on the CPU.
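
As a rough illustration of the power-per-query idea, polled power readings can be integrated over the time a query runs. This is only a sketch under assumptions: `energy_joules` and `energy_per_query` are hypothetical helpers, the readings in the example are made up, and the query start and end times would have to come from somewhere such as the notional metrics endpoint.

```python
def energy_joules(samples):
    """Integrate (timestamp_seconds, watts) samples with the trapezoidal rule.

    samples must be sorted by timestamp; the result is in joules (W * s).
    """
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        total += (p0 + p1) / 2.0 * (t1 - t0)
    return total


def energy_per_query(samples, idle_watts, start, end):
    """Energy above the idle baseline between start and end (seconds)."""
    window = [(t, w) for t, w in samples if start <= t <= end]
    if len(window) < 2:
        return 0.0  # not enough readings to integrate
    duration = window[-1][0] - window[0][0]
    return energy_joules(window) - idle_watts * duration


if __name__ == "__main__":
    # Made-up readings: idle at 55 W, busy at 300 W for about two seconds.
    readings = [(0, 55.0), (1, 300.0), (2, 300.0), (3, 55.0)]
    print(energy_joules(readings))
    print(energy_per_query(readings, idle_watts=55.0, start=0, end=3))
```

The coarser the polling interval, the rougher the estimate; a driver-side energy accumulator, where supported, would avoid the integration step entirely.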

I'm not terribly concerned about plotting GPU temperature or fan speed - the drivers seem to do a decent job of keeping the card within the appropriate thermal envelope - but I can see how others might want to track that in their dashboards.

I would not have a problem if the notional metrics endpoint just included the rocm-smi blob in the output and let me make sense of it.
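
If the blob were passed through as-is, turning it into gauge lines on the consumer side could look roughly like this. It is a crude sketch, not a proposed format: the `rocm_` prefix, the `card` label, and the helper names are all assumptions, and it blindly emits every field that parses as a number (so IDs and firmware versions that happen to be numeric come through too).

```python
import json
import re


def to_float(raw):
    """Return raw as a float, or None for values like 'N/A' or '0x744c'."""
    try:
        return float(raw)
    except (TypeError, ValueError):
        return None


def metric_name(key):
    """'Temperature (Sensor edge) (C)' -> 'rocm_temperature_sensor_edge_c'."""
    slug = re.sub(r"[^a-z0-9]+", "_", key.lower()).strip("_")
    return "rocm_" + slug  # hypothetical prefix, not an established convention


def rocm_smi_to_prometheus(blob):
    """Convert a rocm-smi JSON document into Prometheus-style sample lines."""
    lines = []
    for card, fields in json.loads(blob).items():
        if not card.startswith("card"):
            continue  # skip the "system" section
        for key, raw in fields.items():
            value = to_float(raw)
            if value is None:
                continue  # non-numeric field; drop it
            lines.append(f'{metric_name(key)}{{card="{card}"}} {value}')
    return lines


if __name__ == "__main__":
    # A few fields taken from the rocm-smi output above.
    blob = json.dumps({
        "card0": {
            "Temperature (Sensor edge) (C)": "38.0",
            "Device ID": "0x744c",
            "Memory Activity": "N/A",
            "GPU use (%)": "0",
        },
        "system": {"Driver version": "6.10.5"},
    })
    print("\n".join(rocm_smi_to_prometheus(blob)))
```

A real exporter would want an explicit allow-list of fields with proper units rather than this catch-all conversion.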

I don't have example output from nvidia-smi, but I assume the NVIDIA drivers also offer a way to query similar information.


@Blakdawn commented on GitHub (Jun 13, 2025):

@ckuethe
Close: you can get XML from `nvidia-smi -q -x`, which produces this enormous file containing every supported clock speed (snipped for brevity):

<?xml version="1.0" ?>
<!DOCTYPE nvidia_smi_log SYSTEM "nvsmi_device_v12.dtd">
<nvidia_smi_log>
        <timestamp>Fri Jun 13 21:52:52 2025</timestamp>
        <driver_version>570.133.20</driver_version>
        <cuda_version>12.8</cuda_version>
        <attached_gpus>2</attached_gpus>
        <gpu id="00000000:01:00.0">
                <product_name>NVIDIA GeForce RTX 4060</product_name>
                <product_brand>GeForce</product_brand>
                <product_architecture>Ada Lovelace</product_architecture>
                <display_mode>Disabled</display_mode>
                <display_active>Disabled</display_active>
                <persistence_mode>Disabled</persistence_mode>
                <addressing_mode>None</addressing_mode>
                <mig_mode>
                        <current_mig>N/A</current_mig>
                        <pending_mig>N/A</pending_mig>
                </mig_mode>
                <mig_devices>
                        None
                </mig_devices>
                <accounting_mode>Disabled</accounting_mode>
                <accounting_mode_buffer_size>4000</accounting_mode_buffer_size>
                <driver_model>
                        <current_dm>N/A</current_dm>
                        <pending_dm>N/A</pending_dm>
                </driver_model>
                <serial>N/A</serial>
                <uuid>GPU-c1eea051-dc74-dce9-b83b-462d775d490a</uuid>
                <minor_number>0</minor_number>
                <vbios_version>95.07.36.40.0B</vbios_version>
                <multigpu_board>No</multigpu_board>
                <board_id>0x100</board_id>
                <board_part_number>N/A</board_part_number>
                <gpu_part_number>2882-400-A1</gpu_part_number>
                <gpu_fru_part_number>N/A</gpu_fru_part_number>
                <platformInfo>
                        <chassis_serial_number>N/A</chassis_serial_number>
                        <slot_number>N/A</slot_number>
                        <tray_index>N/A</tray_index>
                        <host_id>N/A</host_id>
                        <peer_type>N/A</peer_type>
                        <module_id>1</module_id>
                        <gpu_fabric_guid>N/A</gpu_fabric_guid>
                </platformInfo>
                <inforom_version>
                        <img_version>G002.0000.00.03</img_version>
                        <oem_object>2.0</oem_object>
                        <ecc_object>N/A</ecc_object>
                        <pwr_object>N/A</pwr_object>
                </inforom_version>
                <inforom_bbx_flush>
                        <latest_timestamp>N/A</latest_timestamp>
                        <latest_duration>N/A</latest_duration>
                </inforom_bbx_flush>
                <gpu_operation_mode>
                        <current_gom>N/A</current_gom>
                        <pending_gom>N/A</pending_gom>
                </gpu_operation_mode>
                <c2c_mode>N/A</c2c_mode>
                <gpu_virtualization_mode>
                        <virtualization_mode>None</virtualization_mode>
                        <host_vgpu_mode>N/A</host_vgpu_mode>
                        <vgpu_heterogeneous_mode>N/A</vgpu_heterogeneous_mode>
                </gpu_virtualization_mode>
                <gpu_reset_status>
                        <reset_required>Requested functionality has been deprecated</reset_required>
                        <drain_and_reset_recommended>Requested functionality has been deprecated</drain_and_reset_recommended>
                </gpu_reset_status>
                <gpu_recovery_action>None</gpu_recovery_action>
                <gsp_firmware_version>570.133.20</gsp_firmware_version>
                <ibmnpu>
                        <relaxed_ordering_mode>N/A</relaxed_ordering_mode>
                </ibmnpu>
                <pci>
                        <pci_bus>01</pci_bus>
                        <pci_device>00</pci_device>
                        <pci_domain>0000</pci_domain>
                        <pci_base_class>3</pci_base_class>
                        <pci_sub_class>0</pci_sub_class>
                        <pci_device_id>288210DE</pci_device_id>
                        <pci_bus_id>00000000:01:00.0</pci_bus_id>
                        <pci_sub_system_id>895E1043</pci_sub_system_id>
                        <pci_gpu_link_info>
                                <pcie_gen>
                                        <max_link_gen>3</max_link_gen>
                                        <current_link_gen>3</current_link_gen>
                                        <device_current_link_gen>3</device_current_link_gen>
                                        <max_device_link_gen>4</max_device_link_gen>
                                        <max_host_link_gen>3</max_host_link_gen>
                                </pcie_gen>
                                <link_widths>
                                        <max_link_width>8x</max_link_width>
                                        <current_link_width>8x</current_link_width>
                                </link_widths>
                        </pci_gpu_link_info>
                        <pci_bridge_chip>
                                <bridge_chip_type>N/A</bridge_chip_type>
                                <bridge_chip_fw>N/A</bridge_chip_fw>
                        </pci_bridge_chip>
                        <replay_counter>0</replay_counter>
                        <replay_rollover_counter>0</replay_rollover_counter>
                        <tx_util>450 KB/s</tx_util>
                        <rx_util>350 KB/s</rx_util>
                        <atomic_caps_outbound>N/A</atomic_caps_outbound>
                        <atomic_caps_inbound>N/A</atomic_caps_inbound>
                </pci>
                <fan_speed>31 %</fan_speed>
                <performance_state>P0</performance_state>
                <clocks_event_reasons>
                        <clocks_event_reason_gpu_idle>Not Active</clocks_event_reason_gpu_idle>
                        <clocks_event_reason_applications_clocks_setting>Not Active</clocks_event_reason_applications_clocks_setting>
                        <clocks_event_reason_sw_power_cap>Not Active</clocks_event_reason_sw_power_cap>
                        <clocks_event_reason_hw_slowdown>Not Active</clocks_event_reason_hw_slowdown>
                        <clocks_event_reason_hw_thermal_slowdown>Not Active</clocks_event_reason_hw_thermal_slowdown>
                        <clocks_event_reason_hw_power_brake_slowdown>Not Active</clocks_event_reason_hw_power_brake_slowdown>
                        <clocks_event_reason_sync_boost>Not Active</clocks_event_reason_sync_boost>
                        <clocks_event_reason_sw_thermal_slowdown>Not Active</clocks_event_reason_sw_thermal_slowdown>
                        <clocks_event_reason_display_clocks_setting>Not Active</clocks_event_reason_display_clocks_setting>
                </clocks_event_reasons>
                <sparse_operation_mode>N/A</sparse_operation_mode>
                <fb_memory_usage>
                        <total>8188 MiB</total>
                        <reserved>374 MiB</reserved>
                        <used>7474 MiB</used>
                        <free>342 MiB</free>
                </fb_memory_usage>
                <bar1_memory_usage>
                        <total>256 MiB</total>
                        <used>6 MiB</used>
                        <free>250 MiB</free>
                </bar1_memory_usage>
                <cc_protected_memory_usage>
                        <total>0 MiB</total>
                        <used>0 MiB</used>
                        <free>0 MiB</free>
                </cc_protected_memory_usage>
                <compute_mode>Default</compute_mode>
                <utilization>
                        <gpu_util>0 %</gpu_util>
                        <memory_util>0 %</memory_util>
                        <encoder_util>0 %</encoder_util>
                        <decoder_util>0 %</decoder_util>
                        <jpeg_util>0 %</jpeg_util>
                        <ofa_util>0 %</ofa_util>
                </utilization>
                <encoder_stats>
                        <session_count>0</session_count>
                        <average_fps>0</average_fps>
                        <average_latency>0</average_latency>
                </encoder_stats>
                <fbc_stats>
                        <session_count>0</session_count>
                        <average_fps>0</average_fps>
                        <average_latency>0</average_latency>
                </fbc_stats>
                <dram_encryption_mode>
                        <current_dram_encryption>N/A</current_dram_encryption>
                        <pending_dram_encryption>N/A</pending_dram_encryption>
                </dram_encryption_mode>
                <ecc_mode>
                        <current_ecc>N/A</current_ecc>
                        <pending_ecc>N/A</pending_ecc>
                </ecc_mode>
                <ecc_errors>
                        <volatile>
                                <sram_correctable>N/A</sram_correctable>
                                <sram_uncorrectable_parity>N/A</sram_uncorrectable_parity>
                                <sram_uncorrectable_secded>N/A</sram_uncorrectable_secded>
                                <dram_correctable>N/A</dram_correctable>
                                <dram_uncorrectable>N/A</dram_uncorrectable>
                        </volatile>
                        <aggregate>
                                <sram_correctable>N/A</sram_correctable>
                                <sram_uncorrectable_parity>N/A</sram_uncorrectable_parity>
                                <sram_uncorrectable_secded>N/A</sram_uncorrectable_secded>
                                <dram_correctable>N/A</dram_correctable>
                                <dram_uncorrectable>N/A</dram_uncorrectable>
                                <sram_threshold_exceeded>N/A</sram_threshold_exceeded>
                        </aggregate>
                        <aggregate_uncorrectable_sram_sources>
                                <sram_l2>N/A</sram_l2>
                                <sram_sm>N/A</sram_sm>
                                <sram_microcontroller>N/A</sram_microcontroller>
                                <sram_pcie>N/A</sram_pcie>
                                <sram_other>N/A</sram_other>
                        </aggregate_uncorrectable_sram_sources>
                </ecc_errors>
                <retired_pages>
                        <multiple_single_bit_retirement>
                                <retired_count>N/A</retired_count>
                                <retired_pagelist>N/A</retired_pagelist>
                        </multiple_single_bit_retirement>
                        <double_bit_retirement>
                                <retired_count>N/A</retired_count>
                                <retired_pagelist>N/A</retired_pagelist>
                        </double_bit_retirement>
                        <pending_blacklist>N/A</pending_blacklist>
                        <pending_retirement>N/A</pending_retirement>
                </retired_pages>
                <remapped_rows>
                        <remapped_row_corr>0</remapped_row_corr>
                        <remapped_row_unc>0</remapped_row_unc>
                        <remapped_row_pending>No</remapped_row_pending>
                        <remapped_row_failure>No</remapped_row_failure>
                        <row_remapper_histogram>
                                <row_remapper_histogram_max>64 bank(s)</row_remapper_histogram_max>
                                <row_remapper_histogram_high>0 bank(s)</row_remapper_histogram_high>
                                <row_remapper_histogram_partial>0 bank(s)</row_remapper_histogram_partial>
                                <row_remapper_histogram_low>0 bank(s)</row_remapper_histogram_low>
                                <row_remapper_histogram_none>0 bank(s)</row_remapper_histogram_none>
                        </row_remapper_histogram>
                </remapped_rows>
                <temperature>
                        <gpu_temp>53 C</gpu_temp>
                        <gpu_temp_tlimit>31 C</gpu_temp_tlimit>
                        <gpu_temp_max_tlimit_threshold>-7 C</gpu_temp_max_tlimit_threshold>
                        <gpu_temp_slow_tlimit_threshold>-2 C</gpu_temp_slow_tlimit_threshold>
                        <gpu_temp_max_gpu_tlimit_threshold>0 C</gpu_temp_max_gpu_tlimit_threshold>
                        <gpu_target_temperature>83 C</gpu_target_temperature>
                        <memory_temp>N/A</memory_temp>
                        <gpu_temp_max_mem_tlimit_threshold>N/A</gpu_temp_max_mem_tlimit_threshold>
                </temperature>
                <supported_gpu_target_temp>
                        <gpu_target_temp_min>65 C</gpu_target_temp_min>
                        <gpu_target_temp_max>88 C</gpu_target_temp_max>
                </supported_gpu_target_temp>
                <gpu_power_readings>
                        <power_state>P0</power_state>
                        <average_power_draw>N/A</average_power_draw>
                        <instant_power_draw>N/A</instant_power_draw>
                        <current_power_limit>115.00 W</current_power_limit>
                        <requested_power_limit>115.00 W</requested_power_limit>
                        <default_power_limit>115.00 W</default_power_limit>
                        <min_power_limit>90.00 W</min_power_limit>
                        <max_power_limit>138.00 W</max_power_limit>
                </gpu_power_readings>
                <gpu_memory_power_readings>
                        <average_power_draw>N/A</average_power_draw>
                        <instant_power_draw>N/A</instant_power_draw>
                </gpu_memory_power_readings>
                <module_power_readings>
                        <power_state>P0</power_state>
                        <average_power_draw>N/A</average_power_draw>
                        <instant_power_draw>N/A</instant_power_draw>
                        <current_power_limit>N/A</current_power_limit>
                        <requested_power_limit>N/A</requested_power_limit>
                        <default_power_limit>N/A</default_power_limit>
                        <min_power_limit>N/A</min_power_limit>
                        <max_power_limit>N/A</max_power_limit>
                </module_power_readings>
                <power_smoothing>N/A</power_smoothing>
                <power_profiles>
                        <power_profile_requested_profiles>N/A</power_profile_requested_profiles>
                        <power_profile_enforced_profiles>N/A</power_profile_enforced_profiles>
                </power_profiles>
                <clocks>
                        <graphics_clock>2505 MHz</graphics_clock>
                        <sm_clock>2505 MHz</sm_clock>
                        <mem_clock>8501 MHz</mem_clock>
                        <video_clock>2070 MHz</video_clock>
                </clocks>
                <applications_clocks>
                        <graphics_clock>N/A</graphics_clock>
                        <mem_clock>N/A</mem_clock>
                </applications_clocks>
                <default_applications_clocks>
                        <graphics_clock>N/A</graphics_clock>
                        <mem_clock>N/A</mem_clock>
                </default_applications_clocks>
                <deferred_clocks>
                        <mem_clock>N/A</mem_clock>
                </deferred_clocks>
                <max_clocks>
                        <graphics_clock>3105 MHz</graphics_clock>
                        <sm_clock>3105 MHz</sm_clock>
                        <mem_clock>8501 MHz</mem_clock>
                        <video_clock>2415 MHz</video_clock>
                </max_clocks>
                <max_customer_boost_clocks>
                        <graphics_clock>N/A</graphics_clock>
                </max_customer_boost_clocks>
                <clock_policy>
                        <auto_boost>N/A</auto_boost>
                        <auto_boost_default>N/A</auto_boost_default>
                </clock_policy>
                <voltage>
                        <graphics_volt>N/A</graphics_volt>
                </voltage>
                <fabric>
                        <state>N/A</state>
                        <status>N/A</status>
                        <cliqueId>N/A</cliqueId>
                        <clusterUuid>N/A</clusterUuid>
                        <health>
                                <bandwidth>N/A</bandwidth>
                                <route_recovery_in_progress>N/A</route_recovery_in_progress>
                                <route_unhealthy>N/A</route_unhealthy>
                                <access_timeout_recovery>N/A</access_timeout_recovery>
                        </health>
                </fabric>
                <supported_clocks>
                        <supported_mem_clock>
                                <value>8501 MHz</value>
                                <supported_graphics_clock>3105 MHz</supported_graphics_clock>
                                <supported_graphics_clock>3090 MHz</supported_graphics_clock>
                                <supported_graphics_clock>3075 MHz</supported_graphics_clock>
                                SNIP
                                <supported_graphics_clock>210 MHz</supported_graphics_clock>
                        </supported_mem_clock>
                        <supported_mem_clock>
                                <value>8251 MHz</value>
                                <supported_graphics_clock>3105 MHz</supported_graphics_clock>
                                <supported_graphics_clock>3090 MHz</supported_graphics_clock>
                                <supported_graphics_clock>3075 MHz</supported_graphics_clock>
                                SNIP
                                <supported_graphics_clock>210 MHz</supported_graphics_clock>
                        </supported_mem_clock>
                        <supported_mem_clock>
                                <value>5001 MHz</value>
                                <supported_graphics_clock>3105 MHz</supported_graphics_clock>
                                <supported_graphics_clock>3090 MHz</supported_graphics_clock>
                                <supported_graphics_clock>3075 MHz</supported_graphics_clock>
                                SNIP
                                <supported_graphics_clock>210 MHz</supported_graphics_clock>
                        </supported_mem_clock>
                        <supported_mem_clock>
                                <value>810 MHz</value>
                                <supported_graphics_clock>3105 MHz</supported_graphics_clock>
                                <supported_graphics_clock>3090 MHz</supported_graphics_clock>
                                <supported_graphics_clock>3075 MHz</supported_graphics_clock>
                                SNIP
                                <supported_graphics_clock>210 MHz</supported_graphics_clock>
                        </supported_mem_clock>
                        <supported_mem_clock>
                                <value>405 MHz</value>
                                <supported_graphics_clock>405 MHz</supported_graphics_clock>
                                <supported_graphics_clock>390 MHz</supported_graphics_clock>
                                <supported_graphics_clock>375 MHz</supported_graphics_clock>
                                <supported_graphics_clock>360 MHz</supported_graphics_clock>
                                <supported_graphics_clock>345 MHz</supported_graphics_clock>
                                <supported_graphics_clock>330 MHz</supported_graphics_clock>
                                <supported_graphics_clock>315 MHz</supported_graphics_clock>
                                <supported_graphics_clock>300 MHz</supported_graphics_clock>
                                <supported_graphics_clock>285 MHz</supported_graphics_clock>
                                <supported_graphics_clock>270 MHz</supported_graphics_clock>
                                <supported_graphics_clock>255 MHz</supported_graphics_clock>
                                <supported_graphics_clock>240 MHz</supported_graphics_clock>
                                <supported_graphics_clock>225 MHz</supported_graphics_clock>
                                <supported_graphics_clock>210 MHz</supported_graphics_clock>
                        </supported_mem_clock>
                </supported_clocks>
                <processes>
                        <process_info>
                                <gpu_instance_id>N/A</gpu_instance_id>
                                <compute_instance_id>N/A</compute_instance_id>
                                <pid>9082</pid>
                                <type>G</type>
                                <process_name>/usr/lib/xorg/Xorg</process_name>
                                <used_memory>9 MiB</used_memory>
                        </process_info>
                        <process_info>
                                <gpu_instance_id>N/A</gpu_instance_id>
                                <compute_instance_id>N/A</compute_instance_id>
                                <pid>9687</pid>
                                <type>G</type>
                                <process_name>/usr/bin/gnome-shell</process_name>
                                <used_memory>3 MiB</used_memory>
                        </process_info>
                        <process_info>
                                <gpu_instance_id>N/A</gpu_instance_id>
                                <compute_instance_id>N/A</compute_instance_id>
                                <pid>265214</pid>
                                <type>C</type>
                                <process_name>/usr/bin/ollama</process_name>
                                <used_memory>7434 MiB</used_memory>
                        </process_info>
                </processes>
                <accounted_processes>
                </accounted_processes>
                <capabilities>
                        <egm>disabled</egm>
                </capabilities>
        </gpu>
</nvidia_smi_log>
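
Most of that file is static or "N/A", so a collector would probably only want a handful of fields. As a rough sketch (not anything ollama does today), the Python standard library can pull out utilization, framebuffer memory, and per-process memory. The element names come from the output above; `summarize_gpus` and `parse_quantity` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET


def parse_quantity(text):
    """Return the leading number of a value like '7474 MiB', or None for 'N/A'."""
    try:
        return float(text.split()[0])
    except (AttributeError, IndexError, ValueError):
        return None


def summarize_gpus(xml_text: str) -> list[dict]:
    """Extract a few fields per <gpu> element from `nvidia-smi -q -x` output."""
    root = ET.fromstring(xml_text)
    gpus = []
    for gpu in root.iter("gpu"):
        gpus.append({
            "id": gpu.get("id"),
            "name": gpu.findtext("product_name"),
            "gpu_util_percent": parse_quantity(gpu.findtext("utilization/gpu_util")),
            "fb_used_mib": parse_quantity(gpu.findtext("fb_memory_usage/used")),
            "fb_total_mib": parse_quantity(gpu.findtext("fb_memory_usage/total")),
            # Keyed by process name for brevity; processes sharing a name
            # would overwrite each other, so key by pid in real use.
            "process_mib": {
                p.findtext("process_name"): parse_quantity(p.findtext("used_memory"))
                for p in gpu.iterfind("processes/process_info")
            },
        })
    return gpus


# Trimmed-down sample using values from the output posted above.
sample = """<nvidia_smi_log>
  <gpu id="00000000:01:00.0">
    <product_name>NVIDIA GeForce RTX 4060</product_name>
    <fb_memory_usage><total>8188 MiB</total><used>7474 MiB</used></fb_memory_usage>
    <utilization><gpu_util>0 %</gpu_util></utilization>
    <processes>
      <process_info>
        <process_name>/usr/bin/ollama</process_name>
        <used_memory>7434 MiB</used_memory>
      </process_info>
    </processes>
  </gpu>
</nvidia_smi_log>"""
print(summarize_gpus(sample))
```

Values such as "N/A" come back as `None`, which keeps missing sensors distinguishable from a genuine reading of zero.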

@lapo-luchini commented on GitHub (Jul 1, 2025):

I'm working on some Prometheus-compatible `/metrics` exporting in #11159.


@mikygit commented on GitHub (Sep 4, 2025):

+1


@fire833 commented on GitHub (Sep 6, 2025):

+1


@baditaflorin commented on GitHub (Sep 7, 2025):

This would be perfect!


@volodinaleksey commented on GitHub (Sep 9, 2025):

+1


@xxnuo commented on GitHub (Sep 12, 2025):

Will input and output token counts be supported?
I observed that vLLM has this feature, and I'd like to track how many tokens are being used.
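
Until something like this is built in, the counts can be collected from the outside: the final Ollama response for a generation (the one with `"done": true`) carries `prompt_eval_count` and `eval_count`. Below is a minimal Go sketch, standard library only, that sums those two fields per model and prints them in the Prometheus text format. The type names, method names, and metric names (`ollama_input_tokens_total`, `ollama_output_tokens_total`) are assumptions for illustration only, not something Ollama or PR #11159 is known to expose, and the token values in `main` are made up.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"os"
	"sort"
	"sync"
)

// generateResponse holds the token fields of a final Ollama response.
type generateResponse struct {
	Model           string `json:"model"`
	Done            bool   `json:"done"`
	PromptEvalCount int    `json:"prompt_eval_count"`
	EvalCount       int    `json:"eval_count"`
}

// tokenTotals accumulates input and output tokens per model.
type tokenTotals struct {
	mu     sync.Mutex
	input  map[string]int
	output map[string]int
}

func newTokenTotals() *tokenTotals {
	return &tokenTotals{input: map[string]int{}, output: map[string]int{}}
}

// observe parses one response body and adds its counts to the totals.
// Chunks with done=false are ignored, so only the final one is counted.
func (t *tokenTotals) observe(body []byte) error {
	var r generateResponse
	if err := json.Unmarshal(body, &r); err != nil {
		return err
	}
	if !r.Done {
		return nil
	}
	t.mu.Lock()
	defer t.mu.Unlock()
	t.input[r.Model] += r.PromptEvalCount
	t.output[r.Model] += r.EvalCount
	return nil
}

// write emits the totals in Prometheus text format, sorted by model name.
func (t *tokenTotals) write(w io.Writer) {
	t.mu.Lock()
	defer t.mu.Unlock()
	models := make([]string, 0, len(t.input))
	for m := range t.input {
		models = append(models, m)
	}
	sort.Strings(models)
	fmt.Fprintln(w, "# TYPE ollama_input_tokens_total counter")
	for _, m := range models {
		fmt.Fprintf(w, "ollama_input_tokens_total{model=%q} %d\n", m, t.input[m])
	}
	fmt.Fprintln(w, "# TYPE ollama_output_tokens_total counter")
	for _, m := range models {
		fmt.Fprintf(w, "ollama_output_tokens_total{model=%q} %d\n", m, t.output[m])
	}
}

func main() {
	totals := newTokenTotals()
	// Made-up response bodies standing in for real traffic.
	bodies := []string{
		`{"model":"llama3","done":false,"response":"Hi"}`,
		`{"model":"llama3","done":true,"prompt_eval_count":26,"eval_count":298}`,
		`{"model":"llama3","done":true,"prompt_eval_count":10,"eval_count":40}`,
	}
	for _, b := range bodies {
		if err := totals.observe([]byte(b)); err != nil {
			panic(err)
		}
	}
	totals.write(os.Stdout)
}
```

In a real setup `observe` would be fed from a proxy or client wrapper that sees each response, and `write` would back an HTTP handler for the scraper.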


@leooamaral commented on GitHub (Sep 17, 2025):

+1


@Lesaloon commented on GitHub (Oct 4, 2025):

+1


@ill-yes commented on GitHub (Oct 9, 2025):

FYI, we're waiting on PR https://github.com/ollama/ollama/pull/11159 and an approval from @pdevine.


@andreys42 commented on GitHub (Nov 10, 2025):

Hi folks, any updates regarding the PR?


@drascom commented on GitHub (Dec 20, 2025):

Hey, I made a small Go app to follow responses, here: https://github.com/drascom/ollama_metrics_dashboard.git, in case anyone needs it.

![Image](https://github.com/user-attachments/assets/d7a901d2-3b5c-4c62-aeea-a9ec08e0b6d9)

@abate commented on GitHub (Feb 21, 2026):

One important metric for me (though I guess this is already being considered) would be the number of model layers offloaded to the GPU, along with information about the KV cache. Looking forward to checking this out. My poor ollama server is in dire need of optimization, and without data that is sometimes complicated.
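
As a rough stand-in for the layer count, the `/api/ps` response reports `size` and `size_vram` for each loaded model, and the ratio of the two shows how much of a model sits in GPU memory. The Go sketch below computes that ratio from a response body. It is only a proxy: it does not give the number of offloaded layers or anything about the KV cache. The type names, function names, and the metric name `ollama_model_vram_fraction` are assumptions for illustration, and the byte values in `main` are made up.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// psResponse mirrors the parts of an /api/ps response used here.
type psResponse struct {
	Models []psModel `json:"models"`
}

type psModel struct {
	Name     string `json:"name"`
	Size     int64  `json:"size"`      // total bytes used by the loaded model
	SizeVRAM int64  `json:"size_vram"` // bytes of that total held in GPU memory
}

// parsePS decodes an /api/ps response body.
func parsePS(body []byte) ([]psModel, error) {
	var r psResponse
	if err := json.Unmarshal(body, &r); err != nil {
		return nil, err
	}
	return r.Models, nil
}

// vramFraction returns the share of a loaded model that sits in GPU memory,
// or 0 when the reported size is missing or not positive.
func vramFraction(m psModel) float64 {
	if m.Size <= 0 {
		return 0
	}
	return float64(m.SizeVRAM) / float64(m.Size)
}

func main() {
	// Made-up body; a real one would come from GET /api/ps on the server.
	body := []byte(`{"models":[{"name":"example:latest","size":8000000000,"size_vram":6000000000}]}`)
	models, err := parsePS(body)
	if err != nil {
		panic(err)
	}
	for _, m := range models {
		fmt.Printf("ollama_model_vram_fraction{model=%q} %.2f\n", m.Name, vramFraction(m))
	}
}
```

A fraction below 1 means part of the model is being served from system memory, which is usually the first thing to look at when tuning a slow server.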


@elliotfehr commented on GitHub (Apr 11, 2026):

+1 to a native metrics endpoint! I made https://github.com/elliotfehr/ollama-metrics-proxy as a quick workaround

Reference: github-starred/ollama#27694