[GH-ISSUE #8247] Enhanced System Observability for Multi-Server Environments (Unified Endpoints?) #51779

Closed
opened 2026-04-28 20:56:03 -05:00 by GiteaMirror · 5 comments

Originally created by @dezoito on GitHub (Dec 26, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/8247

As Ollama adoption grows, the lack of comprehensive system metrics makes it challenging to meet standard operational requirements (monitoring, alerting, and capacity planning) across development, staging, and production environments.

This may also hinder wider adoption in commercial and production applications.

While the current endpoints (`/api/version`, `/api/tags`, `/api/ps`) provide basic information, consolidating and expanding them into a single observability endpoint would significantly improve monitoring and management capabilities.

Proposed: a new `/api/info` endpoint returning unified system metrics, such as:

```json
{
  "version": "0.1.16",
  "system": {
    "cpu": {
      "cores": 16,
      "threads": 32,
      "usage_percent": 45.2,
      ...
    },
    "memory": {
      "total_bytes": 34359738368,
      "used_bytes": 28859738368,
      ...
    },
    "gpus": [
      {
        "name": "NVIDIA GeForce RTX 4090",
        "memory": {
          "total_bytes": 25769803776,
          "used_bytes": 16106127360,
          ...
        }
      }
    ]
  },
  "models": {
    "loaded": [...],
    "available": [...]
  }
}
```
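As a rough illustration of the "combine the existing endpoints" idea, the Python sketch below builds the `version` and `models` parts of the proposed payload from `/api/version`, `/api/tags` and `/api/ps`. It assumes a local Ollama server at its default address `http://localhost:11434`; the helper names `fetch_json`, `build_info` and `get_info` are hypothetical. CPU, memory and GPU figures are not returned by these three endpoints, so the `system` section is left out.

```python
import json
import urllib.request

# Assumption: a local Ollama server on its default address.
OLLAMA_URL = "http://localhost:11434"


def fetch_json(path: str, base_url: str = OLLAMA_URL) -> dict:
    """GET an Ollama API path and decode the JSON response body."""
    with urllib.request.urlopen(f"{base_url}{path}", timeout=5) as resp:
        return json.load(resp)


def build_info(version: dict, tags: dict, ps: dict) -> dict:
    """Merge /api/version, /api/tags and /api/ps responses into one document.

    Only the "version" and "models" sections of the proposal can be filled
    this way; hardware utilisation is not exposed by these endpoints.
    """
    return {
        "version": version.get("version"),
        "models": {
            # /api/ps lists models currently loaded in memory.
            "loaded": [m.get("name") for m in ps.get("models", [])],
            # /api/tags lists models available locally.
            "available": [m.get("name") for m in tags.get("models", [])],
        },
    }


def get_info(base_url: str = OLLAMA_URL) -> dict:
    """Hypothetical client-side stand-in for the proposed /api/info."""
    return build_info(
        fetch_json("/api/version", base_url),
        fetch_json("/api/tags", base_url),
        fetch_json("/api/ps", base_url),
    )
```

Against a running server, `print(json.dumps(get_info(), indent=2))` should print a trimmed-down version of the JSON above. Keeping `build_info` free of network calls makes it easy to test with canned responses.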

This enhancement would:

  • Enable proper monitoring and alerting in production environments
  • Simplify capacity planning across deployments
  • Allow integration with standard observability tools

To the community:

  • What other metrics or stats would be useful to have?
  • How do you currently monitor or observe your running Ollama instances?
GiteaMirror added the feature request label 2026-04-28 20:56:03 -05:00

@maglore9900 commented on GitHub (Dec 26, 2024):

Agreed. This would be very helpful for enterprise-type deployments, and also for power users and potentially for troubleshooting.


@kth8 commented on GitHub (Dec 26, 2024):

Prometheus and Grafana are widely used in commercial and production environments, and the standard endpoint for this is `/metrics`. For Prometheus to scrape that endpoint, the format should be like [this](https://demo.promlabs.com/metrics) instead of JSON.


@dezoito commented on GitHub (Dec 26, 2024):

> In order for Prometheus to scrape that endpoint, the format should be like [this](https://demo.promlabs.com/metrics) instead of json.

Thank you for the feedback!

That looks way more detailed, of course.

I still think a simple JSON response could add a ton of value (and would be easier to implement by combining the existing endpoints).

Someone cue the "why not both" meme, please...


@rick-github commented on GitHub (Dec 26, 2024):

https://github.com/ollama/ollama/issues/3144


@rick-github commented on GitHub (Jan 13, 2025):

closing as dupe #3144

Reference: github-starred/ollama#51779