[GH-ISSUE #9254] Add support for opentelemetry tracing #31792

Closed
opened 2026-04-22 12:33:53 -05:00 by GiteaMirror · 4 comments

Originally created by @ibl-g on GitHub (Feb 20, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/9254

Facilitate troubleshooting of key server-side latency phases by instrumenting and exporting API request trace spans.

This is complementary to #3144, which adds prometheus metric export, and there is significant overlap in how the two can be implemented.

High level design for the tracing MVP

  • Use opentelemetry-go-contrib/exporters/autoexport to initialise the trace exporter based on the established opentelemetry environment variables (a setup sketch follows this list).

    • This will add support for otlp, console/stdout, and noop exporters by default. It could be extended easily by adding additional dependencies on exporters from the opentelemetry registry.
    • Note that the same library can be used to initialise a metric reader and prometheus /metrics endpoint for #3144, as well as otlp, noop, and console/stdout readers.
  • Use opentelemetry-go-contrib/instrumentation/github.com/gin-gonic/gin/otelgin to add routing middleware in server/routes.go. This provides both traces and metrics, depending on which exporters have been initialized (the middleware is included in the setup sketch after this list).

  • Use opentelemetry's default trace context propagation to nest spans under a parent trace if one is present in the API request. This would be the case if client libraries use trace propagation and/or ollama is running in an environment that initiates tracing for downstream requests (like Cloud Run). This is the out-of-the-box behaviour of the opentelemetry golang libraries.

  • Extend handlers in server/routes.go to capture MVP spans using opentelemetry-go. I'd suggest starting with phases already reported in API responses, like load_duration and eval_duration, and expanding from there (see the handler sketch after this list).

    • Use SpanRecorder to verify that the expected spans have been recorded within routes_test.go (see the test sketch after this list).
    • Alternatively (or in addition), register an InMemoryExporter with autoexport and verify it captured expected spans after exercising some routes.
  • Document how to enable trace exports under https://github.com/ollama/ollama/tree/main/docs. We could also create a full example, either with the console exporter or exporting to a second container running an opentelemetry-collector.
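
To make the first two bullets concrete, here is a minimal, self-contained sketch of the exporter initialisation and routing middleware. The setupTracing helper name and the /api/version handler are placeholders of mine; autoexport.NewSpanExporter and otelgin.Middleware are the actual library calls:

```go
package main

import (
	"context"
	"log"

	"github.com/gin-gonic/gin"
	"go.opentelemetry.io/contrib/exporters/autoexport"
	"go.opentelemetry.io/contrib/instrumentation/github.com/gin-gonic/gin/otelgin"
	"go.opentelemetry.io/otel"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

// setupTracing (hypothetical name) picks a span exporter from the standard
// OTEL_* environment variables (otlp by default, console, or none) and
// installs a global tracer provider. The returned shutdown func flushes
// pending spans on exit.
func setupTracing(ctx context.Context) (func(context.Context) error, error) {
	exporter, err := autoexport.NewSpanExporter(ctx)
	if err != nil {
		return nil, err
	}
	tp := sdktrace.NewTracerProvider(sdktrace.WithBatcher(exporter))
	otel.SetTracerProvider(tp)
	return tp.Shutdown, nil
}

func main() {
	ctx := context.Background()
	shutdown, err := setupTracing(ctx)
	if err != nil {
		log.Fatal(err)
	}
	defer shutdown(ctx)

	r := gin.New()
	// otelgin opens a server span per request and extracts any incoming
	// W3C trace context, so spans nest under an upstream trace when present.
	r.Use(otelgin.Middleware("ollama"))
	r.GET("/api/version", func(c *gin.Context) {
		c.JSON(200, gin.H{"version": "0.0.0"})
	})
	log.Fatal(r.Run(":11434"))
}
```

With this in place, OTEL_TRACES_EXPORTER=console prints spans to stdout, and pointing OTEL_EXPORTER_OTLP_ENDPOINT at a collector enables otlp export, with no code changes either way.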
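
A sketch of the handler-side span structure. loadModel and evalPrompt are hypothetical stand-ins for the real code paths behind load_duration and eval_duration, so treat this as shape rather than implementation:

```go
package server

import (
	"context"
	"time"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
)

var tracer = otel.Tracer("github.com/ollama/ollama/server")

// generate sketches only the span structure around the phases already
// reported in API responses.
func generate(ctx context.Context, prompt string) string {
	ctx, span := tracer.Start(ctx, "api.generate")
	defer span.End()

	_, loadSpan := tracer.Start(ctx, "model.load")
	loadModel() // loadSpan's duration lines up with load_duration
	loadSpan.End()

	_, evalSpan := tracer.Start(ctx, "model.eval")
	out := evalPrompt(prompt) // evalSpan's duration lines up with eval_duration
	evalSpan.SetAttributes(attribute.Int("eval_count", len(out)))
	evalSpan.End()
	return out
}

// Hypothetical stand-ins for the actual load and eval paths.
func loadModel()                 { time.Sleep(10 * time.Millisecond) }
func evalPrompt(p string) string { return p }
```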
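
And the SpanRecorder-based test could look roughly like this, reusing the hypothetical generate function from the previous sketch:

```go
package server

import (
	"context"
	"testing"

	"go.opentelemetry.io/otel"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
	"go.opentelemetry.io/otel/sdk/trace/tracetest"
)

// TestGenerateSpans registers a SpanRecorder on a test-local TracerProvider
// and asserts the expected spans ended after exercising the handler.
func TestGenerateSpans(t *testing.T) {
	sr := tracetest.NewSpanRecorder()
	tp := sdktrace.NewTracerProvider(sdktrace.WithSpanProcessor(sr))
	// The package-level tracer resolves through the global provider, so
	// spans recorded by generate land in the recorder.
	otel.SetTracerProvider(tp)

	generate(context.Background(), "hello")

	seen := map[string]bool{}
	for _, s := range sr.Ended() {
		seen[s.Name()] = true
	}
	for _, name := range []string{"api.generate", "model.load", "model.eval"} {
		if !seen[name] {
			t.Errorf("expected span %q was not recorded", name)
		}
	}
}
```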

Future follow up

  • Set trace span attributes per semantic conventions for GenAI spans
  • Use instrumentation/net/http/otelhttp to add child trace spans for outbound HTTP requests. This could be used to capture the interactions with the llama server from llm/server.go, or to instrument the model download in /api/pull (a client-wrapping sketch follows this list).
  • Continue to add useful child spans and span attributes.
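
A sketch of the client wrapping for that first follow-up; the target URL is only a placeholder:

```go
package main

import (
	"context"
	"net/http"

	"go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
)

// newTracedClient wraps the default transport so every outbound request
// (e.g. to the llama runner from llm/server.go, or a registry download in
// /api/pull) emits a child client span and propagates the trace context.
func newTracedClient() *http.Client {
	return &http.Client{Transport: otelhttp.NewTransport(http.DefaultTransport)}
}

func main() {
	client := newTracedClient()
	// The request context must carry the active span for the client span
	// to nest under it; the URL here is a placeholder.
	req, err := http.NewRequestWithContext(context.Background(),
		http.MethodGet, "http://127.0.0.1:8080/health", nil)
	if err != nil {
		return
	}
	if resp, err := client.Do(req); err == nil {
		resp.Body.Close()
	}
}
```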

Next steps

I have a working proof of concept of the MVP outlined above running on Cloud Run, with an ollama container exporting via otlp to a sidecar, which in turn exports to Google Cloud Trace: https://services.google.com/fh/files/misc/ollama-tracing-poc.png (note the first /api/generate span is from Cloud Run and the second from otelgin). If you're happy with the proposed design, I'll clean up my PoC and send you a PR.

GiteaMirror added the feature request label 2026-04-22 12:33:54 -05:00

@bmizerany commented on GitHub (Feb 27, 2025):

Thank you for the ticket. We're not ready to support OT at this time. There is current work being done to refactor large amounts of the server, and we'll be considering lite tracing during its development.

I'm closing this and the accompanying PR since we can't support it at the moment.

Thank you again.


@ibl-g commented on GitHub (Feb 28, 2025):

No worries. Let me know if you'd like any assistance with setting it up down the line and good luck on the refactoring!


@frzifus commented on GitHub (Feb 11, 2026):

Any objections to adding trace support to ollama while noting that it's not officially supported? e.g. adding a simple log message so users are informed it's experimental and may not work?

I am currently observing my llm-d, llamastack, and custom agent setup just fine; the only thing missing is ollama. Basic instrumentation would also help with generating RED metrics on my end.


@frzifus commented on GitHub (Feb 18, 2026):

wdyt @bmizerany?
