Mirror of https://github.com/go-gitea/gitea.git, synced 2026-03-22 06:24:14 -05:00
Support for OpenTelemetry #13833
Open · opened 2025-11-02 10:54:26 -06:00 by GiteaMirror · 39 comments
Labels: type/proposal
Originally created by @TheFox0x7 on GitHub (Dec 16, 2024).
Feature Description
OpenTelemetry is an open standard designed to unify the signals coming from an application. It allows tracing requests and operations, shipping application logs, and gathering and exporting Prometheus metrics.
Potentially it would allow replacing Prometheus with OpenTelemetry (the Go library does have a setting for a Prometheus-style endpoint, though I've yet to test it) and would make it quicker to find long-running operations via tracing.
I'd like to work on this (though it might take a while), and I have a "demo" of such an integration here, with only tracing support.
@lunny commented on GitHub (Dec 16, 2024):
Please have more discussion here before proposing a pull request.
Traditionally, I would like to collect logs from the disk log file rather than sending logs to the remote directly. I haven't read all the documentation of OpenTelemetry but I want to know how it works for logs.
@wxiaoguang commented on GitHub (Dec 17, 2024):
I think it needs more discussion.
If there is a chance, I'd like to totally decouple third-party dependencies to avoid bloating the Gitea binary (it is large enough now, and it is executed by every git operation .....)
There are various "tracing" and other tools, and there will be more; if we add this one, should or shouldn't we add the others?
So before starting the work, I'd like to clarify:
- meter: use it to replace the existing `modules/metrics/collector.go` (Prometheus)? I do not see why we should keep different, duplicate collectors together.
- tracer: I agree that Gitea needs better performance tracing, but the question is how to introduce it. Gitea used to be an all-in-one app; in most cases we need end users to report their performance problems directly, not ask the users to set up OpenTelemetry to collect a performance profile.
- logger: well, maybe Gitea needs to refactor its logger system first? Or just output JSON logs?

@TheFox0x7 commented on GitHub (Dec 17, 2024):
For starters, I hoped for some discussion before continuing forward - I have a few outstanding issues (mainly how to handle ssh connections and how to register the exporters with the manager, because there's probably a way better solution for that) which I want to discuss.
Firstly, keep in mind that all 3 of the components are optional by design; you can have noop versions of all 3 with 0 code change.
As for your questions:
I haven't yet connected logs to it in any app besides Python, but in general it doesn't stop you from gathering logs on disk, to stdout/stderr, or to syslog. What it does is provide a vendor-independent way to export logs (which it likewise gathers on the fly) to systems supporting the OpenTelemetry Protocol (OTLP), via gRPC or HTTP.
Logging in the Go SDK is at an experimental stage.
The idea with opentelemetry is to provide a single standard for signals, so it shouldn't be required to add more.
A roadmap probably would be useful, yeah. As for the usefulness, it might vary from case to case, but I'll refer you to grafana's blog post about it, which I think shows pretty well why it can be useful.
That's the idea, and it shouldn't be a breaking change, since the Go SDK supports Prometheus-style export aside from OTLP.
Tracing is mainly for operations (cross-app or in-app) - for example, if you select the branch view and it takes a while to load, you might be interested why: was it the template taking weirdly long, git being slow for some reason, nothing being cached, or maybe the database taking a while to respond?
With traces you can instrument all of those parts and get a timed route with metadata, similar to a Gantt chart, to see where the issue was.
I don't mean to introduce it as a mandatory step for reporting a performance issue. I think it would be easier if someone who has such a setup could see the trace durations and point at, say, a DB query as the thing that took the longest (say 2 seconds) despite normally being quick (0.2 seconds).
Logging, as mentioned before, is experimental in the Go SDK, and I'm still kind of lost as to how it actually works (I've mainly never tried it). Migrating to `log/slog` would be nice, but that's not something I can do without help, and I'd have to refresh on how logging in the SDK is configured to say anything for sure.
That is basically how the current log system can be gathered (except the JSON part) if one wants. So yes, it is possible - with a system very similar to Prometheus. JSON logs for that would be something of interest to me.
@wxiaoguang commented on GitHub (Dec 17, 2024):
Thank you for the details. I also have some experience with "app performance tracing" (I used to build a web app serving millions of users per day, with QPS around the thousands); I have used some tracing tools and built some tracing frameworks, so I can fully agree with the ideas behind OpenTelemetry.
I also used GitLab for a long time. GitLab packs a whole Grafana into their deployment, and they have their own diagnosis tools.
So if let me design a roadmap for Gitea, I think the key problem is how to benefit most end users if we introduce a new mechanism.
... `tracer` calls correctly. Then the question is: how to satisfy these requirements? Gitea needs tracing tools, and end users need to report performance problems conveniently. In my mind, the roadmap could be like this:
@TheFox0x7 commented on GitHub (Dec 17, 2024):
I agree it's not feasible to expect someone who wanted a simple git server to set up Prometheus (metrics), loki/logstash (logs) and tempo/jaeger (traces), plus possibly pyroscope (pprof). This feature mostly targets people who want an observable application when using multiple components, or who are looking to diagnose such issues.
Collecting traces is fairly simple but I know that's not helping much.
But I sort of think you're misunderstanding the idea. The entire thing is optional by design - so you COULD use it, but it's not mandatory. If you want a system which works without any collectors and so on, it could be done by adding an exporter (either a dedicated one or the debug console one).
It's very similar to how metrics work - you can turn it on and scrape it but you can also not do it and there's no loss.
As for gitlab comparison - the minimal possible deployment won't change. The protocol is already a "shim" of sorts, as you can add custom exporters to it or just ingest data in your own way.
I don't want to introduce entire grafana stack into gitea or to try and go the gitlab way of trying to add dashboards and metric collection into it.
I'm not sure I understand your concerns properly - are you worried that it's a smaller feature which won't be used by a lot of people and will be hard to take advantage of?
A quick demo of how it works, in case someone wants to see it.
It's just tracing, copy-pasted from the getting-started page and trimmed down a bit. On the first run you should get a number from the dice roll, then a trace JSON should be printed. The second thing you can try is commenting out every part of the setup except the roll, and seeing that it still works but the entire trace is a noop.
@wxiaoguang commented on GitHub (Dec 17, 2024):
To prevent misunderstanding, let's take some examples.
Your proposal: Suppose we have OpenTelemetry in Gitea:
My proposal: do not introduce OpenTelemetry in Gitea at the moment, but only implement the necessary tracing features in Gitea and provide some shim interfaces.
@TheFox0x7 commented on GitHub (Dec 17, 2024):
Okay, I get it now, thanks a lot. I hadn't considered an embedded use case.
In my case it would be slightly more complicated (`podman run --rm -p 16686:16686 -p 4317:4317 docker.io/jaegertracing/all-in-one:latest` would cover the minimum version), which is far from the ease of having it available from the admin panel.
On the other hand, I have no clue what it would look like. It's not like one could (to my knowledge) enable it from the admin panel and then get one particular endpoint traced end to end without the propagation which the standard does provide. Also, I think making an interface for it would be harder to maintain and integrate with later. So I still think it would be better to have native support and think of a way to make it simple to configure and gather as intended, or make a custom exporter which provides the use case you're talking about.
Regardless it is an interesting use case, I'll admit.
I thought about having an OTel collector embedded to use as reporting for Actions - since CI monitoring is a really nice use case for traces - but I scrapped the idea as it feels more like bloat, and there's the storage side of it to figure out. Maybe it would work for this? Throwing it out there in case you find it interesting.
Full disclosure: I'm obviously biased to having it natively implemented instead of having to shim support onto base.
@wxiaoguang commented on GitHub (Dec 18, 2024):
After 1.23 gets a stable release (maybe in 1 or 2 weeks), I think we could start some initial work on performance tracing. Then we could try to get OpenTelemetry support, either source-code-level optional (build from source code/tag), config-level optional (enabled by a config option), or mandatory.
@TheFox0x7 commented on GitHub (Dec 18, 2024):
I would suggest making it a configuration feature. It would be always "on" but just default to noop.
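For what it's worth, the OpenTelemetry Go API already works this way: `otel.Tracer(...)` hands out a no-op tracer until a real `TracerProvider` is registered. A stdlib-only sketch of the same "always on, defaults to noop" pattern (all names here are illustrative, not Gitea's):

```go
package main

import "fmt"

// Tracer is a minimal tracing hook; Span is what a started operation returns.
type Tracer interface {
	Start(name string) Span
}
type Span interface{ End() }

// noopTracer/noopSpan do nothing - the default when tracing is not configured.
type noopTracer struct{}
type noopSpan struct{}

func (noopTracer) Start(string) Span { return noopSpan{} }
func (noopSpan) End()                {}

// recordingTracer counts started spans, standing in for a real exporter-backed tracer.
type recordingTracer struct{ started int }
type recordingSpan struct{}

func (r *recordingTracer) Start(string) Span { r.started++; return recordingSpan{} }
func (recordingSpan) End()                   {}

// globalTracer is always usable; instrumented code never has to nil-check it.
var globalTracer Tracer = noopTracer{}

func doWork() {
	span := globalTracer.Start("doWork")
	defer span.End()
	// ... actual work ...
}

func main() {
	doWork() // noop by default: no config, no output, near-zero cost
	rec := &recordingTracer{}
	globalTracer = rec // "configured" via settings at startup
	doWork()
	fmt.Println("recorded spans:", rec.started) // prints "recorded spans: 1"
}
```

The point is that the instrumentation calls stay in the code unconditionally; only the configured backend decides whether they do anything.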
Making it mandatory is already out, as it needs to be configured - the default `localhost:4317` address is pretty useless on its own.

I took a really quick glance at the std `runtime/trace` version. I think that would indeed cover your use case better, as it lets the user write the trace to a file, and I think it works globally, so you could just add a trigger for it in the admin panel - maybe even an on/off switch.

@wxiaoguang commented on GitHub (Dec 22, 2024):
I started some initial work here: https://github.com/wxiaoguang/gitea/commits/support-trace/
Quite simple, it introduces our shim-layer and builtin tracer. The interface is designed to be OpenTelemetry-like.
Will try to make some improvements and make it really work before proposing a PR: the real challenge is how to correctly trace the performance problems and show useful results to end users (especially in the db and git packages). Maybe it will take about one week.
Update: PR: Support performance trace #32973
@TheFox0x7 commented on GitHub (Dec 22, 2024):
It looks like it would work with the standard. As long as the semantic conventions are followed for metadata keys/values and I'm able to replace the tracer with the otel one without breaking stuff, I don't think I'll have objections.
The db would probably be best handled in xorm via otel as well (seeing that grafana uses xorm, they might consider helping there), but semconv exists for the client side, and it might be doable with either session or struct events - I haven't looked that deep.
For git I modified the run command, which worked surprisingly well as a quick "catch all" option, with the downside that spans lack good context (it required diving into the span and seeing which command was called), which I started to solve by having the general command be traced too:
Side note so I won't forget about it: calls to the `/internal` endpoint should have span headers, since they aren't standalone operations but part of a larger one. The caller of that endpoint is the parent.

@TheFox0x7 commented on GitHub (Dec 22, 2024):
FYI: I've published what I've made to date on my fork.
To turn it on:
and

`docker/podman run --rm -p 16686:16686 -p 4318:4318 -d --name jaeger docker.io/jaegertracing/all-in-one:latest` should suffice. The container should listen on `localhost:4318`, as that's the default.

As for the actual demo - visit any page in gitea, view a file, etc., then go to localhost:16686 (jaeger) and pick `gitea` from the service menu. At the very least you should have a git trace similar to this one:

@wxiaoguang commented on GitHub (Dec 22, 2024):
Hmm ... it seems somewhat cumbersome to add a lot of `ctx, span := tracer.Start(repo.Ctx, "CommitTree")`. I think I have some ideas about how to improve it. At least, with my recent " Refactor pprof labels and process desc #32909 ", `git.Command.Run` is able to know its caller.

ps: I am still working on some refactoring PRs. I think we could get enough preparation done first; then the following work will be easier.
Update: in Refactor request context #32956, legacy hacky code has been removed and now we have a clear RequestContext; this context is worth tracing.
@techknowlogick commented on GitHub (Dec 24, 2024):
Thank you for this issue. As for the related code you linked to, due to its licensing, I have a personal policy of not looking at that repo to ensure that I don't violate any of the licenses.
It's funny that you opened this issue: as part of some performance work, and some chance encounters with folks from honeycomb, I have been investigating this for some time, and I have some (less than stellar) code that I put together as a proof of concept to show that it could be done. I've pushed my code so that others can look at it too, and perhaps get this in.
I share the same feelings as @wxiaoguang about the cumbersome boilerplate of wrapping many things with tracer.start/etc., but sadly, other than some ebpf auto-instrumentation things that exist for otel, that seems to be the approach for adding otel to Go programs. Thankfully the two libraries I've chosen as middleware for db & HTTP alleviate that. Sadly, for our git module we will likely have to do the wrapping ourselves.
@wxiaoguang commented on GitHub (Dec 24, 2024):
I am still working on some preparations, so please review and approve my recent refactoring PRs; then we will have the builtin performance tracer soon. And then, at least, advanced users could introduce their own tracers at the source-code level.
@TheFox0x7 commented on GitHub (Dec 24, 2024):
@techknowlogick You can ignore it. I'm the author of the entire thing, and I did pick the EUPL (since I wanted to ensure the telemetry part stays public), but seeing as I want to add it to gitea and work on this as well, I dropped that requirement.
If you want I can swap the header so it's on paper without having to rely on my word.
It's also less than stellar - the main things which work well are settings and basic tracing; the actual service start works, but it's probably misusing the graceful service system somewhat.
@wxiaoguang commented on GitHub (Dec 24, 2024):
I am sure we need to do it by ourselves: https://github.com/go-gitea/gitea/issues/32866#issuecomment-2547981203
If you read my recent PRs, then you'll see: nothing sad, and no need to make it complicated. I have a complete design to make the system right and lightweight, but I need to do some refactorings first.
@wxiaoguang commented on GitHub (Dec 24, 2024):
I have pushed my work for "builtin tracer": WIP: Support performance trace #32973 .
It has initial http/db/git support, and a simple list in admin UI
@techknowlogick commented on GitHub (Dec 24, 2024):
@TheFox0x7 thanks! I'll check out your code in the link above. I'm sure your statement here is sufficient, and enough people get emails from this tracker that there is some record of your statement should something catastrophic happen to GitHub (knock on wood it doesn't).
@wxiaoguang thanks! I hadn't checked out your code yet. The code in my PR was stashed in my local repo for a while and figured it'd be better to push it up since it can be added to this discussion.
@TheFox0x7 commented on GitHub (Dec 25, 2024):
I've looked through the builtin tracer PR and I have a few issues I think are worth discussing. Posting them here since I don't know the conventions too well - some people don't like draft reviews. Plus they aren't that specific to it.
First, circling back a bit - why not implement the SpanExporter interface as the internal backend?
I see the following upsides to that:
I've tried to do it, and the biggest issue I've had so far is connecting spans into one trace: spans only know their parent, so I can't easily build a string the way it did.

I did however manage to rewrite parts of it so it works on my PR (without parent child relation):
If we manage to solve that without re-implementing half of jaeger, I think we'd have it covered quite easily between the three solutions.
@wxiaoguang commented on GitHub (Dec 25, 2024):
Sorry but I didn't get the point .... could you elaborate?
I think our shim layer won't implement any "real" OpenTelemetry interfaces, because OpenTelemetry's interfaces have quite heavy and complex dependencies. So by the current design, users could implement the "traceStarter" and register it to operate on our spans directly.
If you think "SpanExporter" is useful, could you provide some pseudocode to describe how to use it?
@wxiaoguang commented on GitHub (Dec 25, 2024):
In my mind the otel integration could be like this:
@TheFox0x7 commented on GitHub (Dec 25, 2024):
Putting it into words turns out to be way harder than I thought, but:
The way I see it right now is that we could make taillog into an exporter, making the integration more natural IMO, with the yet-to-be-solved part of assembling a trace from spans. I might be stuck a bit on this idea though, as mentioned earlier.
Unfortunately I agree that the SDK isn't the lightest or simplest; compared to the main branch, the end binary is ~3MB larger - which is quite a lot for a feature that might not even be used.
I think that the list of concerns I have is as follows:
As a side note - caller context in git command is really useful, so thanks for that :)

The SpanExporter interface consists of two methods:

- `ExportSpans(context.Context, []trace.ReadOnlySpan) error`, which is called by the relevant OTel system (a SpanProcessor here, usually a buffer so the application isn't exporting one span at a time). Its job is to export spans (ones that have ended) to any place you want.
- `Shutdown(context.Context) error`, which is the cleanup method, after which the exporter should not process anything.

So I was thinking that we could have an internal one which would export spans into memory if the feature is on, similarly to what you proposed in the draft, but with the regular library.
So putting it into a flow: span -> Processor -> Exporter -> Memory -> `/perftrace` page
I hope I made myself clearer... even if a small bit.
My hacky plug
Full code on: https://github.com/TheFox0x7/gitea/tree/otel-mods
@TheFox0x7 commented on GitHub (Dec 29, 2024):
I added the prometheus exporter with otelchi-generated metrics as a port test, and as fun as otelchi is for a quick start, I don't think this:

`gitea_response_size_bytes_count{http_method="GET",http_scheme="http",net_host_name="Gitea",net_host_port="3000",net_protocol_version="1.1",net_sock_peer_addr="127.0.0.1",net_sock_peer_port="57834",otel_scope_name="github.com/riandyrn/otelchi/metric",otel_scope_version="0.1.0",user_agent_original="Mozilla/5.0 (X11; Linux x86_64; rv:133.0) Gecko/20100101 Firefox/133.0"} 13`

is something we're interested in (the user agent addition is... interesting).
But it is doable, and it is fairly simple to port the current metrics to it.
Unfortunately this doesn't remove the prometheus package like I thought it would, so there goes my "removing a dependency" argument.
@TheFox0x7 commented on GitHub (Jan 12, 2025):
A bit of an update:
My current idea is to use otel for general tracing and export spans either via http/grpc to a remote backend, or with some built-in exporter to have some insight without pulling in jaeger or the like. Some smaller open threads with this approach:
And a rather large one:
How do we make traces work for issue reports? That's a rather large and core issue here. There isn't a set JSON format for a full trace, so every backend (I tested 2) makes its own for download.
The way I hoped to make this work is to have an admin UI for tracing which would display the traces; if someone wants to add one to a report, they could download the JSON and attach it. But if this could only be uploaded to a single backend (or two, if we pick one to align to), that's kind of pointless.
In theory, raw OTLP output could be saved to a .json file and replayed to upload it, but I've yet to see anything that can read that output. The collector has a receiver for this, and otel-tui uses it.
What I currently have is something like this:

But it doesn't preserve hierarchy and is not nice to look at. Also, I've found the following projects which might be of interest - for reference or for the UI side.
https://github.com/CtrlSpice/otel-desktop-viewer
https://github.com/ymtdzzz/otel-tui
I've also discovered that otel traces use `runtime/trace.Task` underneath, which is nice. The bad part being that (at least to me) it's not in any way more readable or helpful.

@wxiaoguang commented on GitHub (Jan 13, 2025):
By how many bytes would it make the binary larger? We removed many dependencies in " Gitea binary size #33023 " to reduce the binary size by many megabytes.
@TheFox0x7 commented on GitHub (Jan 13, 2025):
About 2.4-2.6MB for the basic version (with external exporters only), and up from there - most of it coming from grpc. The hacky UI version is 2.9MB larger than main.
Though it turns out using the collector's code is probably off the table. It pulls in twice that size, and I haven't done anything useful with it. It also pulls in a deprecated library which takes 1.1MB in the final build.
@techknowlogick commented on GitHub (Jan 13, 2025):
Which library is the deprecated one?
Is this even with the `-s -w` ldflags, to trim down the binary, on the final build?

@TheFox0x7 commented on GitHub (Jan 13, 2025):
`github.com/gogo/protobuf`, which gets pulled in by `go.opentelemetry.io/collector/pdata/ptrace`. To be fair, I haven't made any use of it yet, so I'm not even sure a regular collector would end up pulling it in. I pulled it since I wanted to base the local UI on otel-tui, but so far I've managed without.

Yes.
@TheFox0x7 commented on GitHub (Jan 14, 2025):
I figured that before trying to work out the span assembly, I'd work on some routes to trace, so there would be a starting point, and well... this will be more complicated than I originally thought - mainly due to the fact that the `context.Context` is wrapped, so the normal propagation system won't work as usual...
No idea how to deal with this for now. I think the proper thing to do is to refactor the context to be the standard-library one, per the Go wiki, but that's a lot of work and I'm not sure how doable it is in the first place.
Unless I'm missing an easy way to do it. I'm not that advanced in go.
@wxiaoguang commented on GitHub (Jan 15, 2025):
For this problem: we should use the chi router's pattern path, I have used that in the demo #32973
https://github.com/go-gitea/gitea/pull/32973/files#diff-6c413ec64ebadd1b4f2778a41d4cf9d60b8682a8adf62f957e3b0f12aad5ac14R54-R58
(it's also not complete yet, need to cover some edge cases)
By the way, I still have some concerns about introducing the opentelemetry dependency at the moment:
(ps: there are some high priority TODOs on my side, so I haven't made new progress for #32973)
@TheFox0x7 commented on GitHub (Jan 15, 2025):
This does mostly what otelchi does, and it doesn't solve the problem completely. It makes the entire route a root span and allows operations in that route to be part of that trace - which is what we want.

It doesn't make grouping of operations possible though. I ended up with a root span from the route with multiple child spans inside it, all labeled as some git operation or database call. Now let's say I want to group them by which operations are part of the `ViewIssue` function. I cannot do that, because it requires context to be passed down. See the image; note that the hierarchy is flat. So I end up with the meaningless information that `ViewIssue` is part of the route and took some time - not why it took that time.

Fully understandable; I'm not too happy about it either, but there's not much I can do about it, unfortunately, unless an alternative implementation of the SDK exists, which I haven't found. Most of the size comes from `google.golang.org/grpc` (+946kB), `go.opentelemetry.io/otel` (+651kB) and .rodata (+576kB).

This can be done regardless of the tracer though? Just mark the feature as beta until we figure out all the paths to trace.

I thought about it too. Traces can be exported as JSON, but it's not standardized, so that's kind of a dead end. Beyond using `runtime/trace` and writing the output to a diagnostics file, or writing the OTLP to a JSON file, I have no good ideas for now.

@wxiaoguang commented on GitHub (Jan 15, 2025):
If you'd like to trace every middleware & endpoint handler's name, there is `routing.GetFuncInfo`, which is used to trace all function names, and `routing.UpdateFuncInfo`, which is used to record the names; these names will be shown in logs. I could propose a demo later.

@TheFox0x7 commented on GitHub (Jan 15, 2025):
I'm not sure how this would help. The span names are fine (except for the database, but that's a different issue). The fundamental issue is about relations: the parent span cannot be derived from the context if the context is custom.
In the above image I've tried to work with the span-carrying context to show the propagation issue. What you can gather from it is:
Lack of propagation makes the db tracing kind of pointless since they aren't getting linked to any operation.
Of course I could go down the code manually and see which function ends up calling git but at that point I might as well just print down times to console.
Hopefully I made the issue clearer.
I'll go work on prometheus metrics in the meantime because, as far as I can see, there's no way to avoid having to decouple `stdcCtx.Context` from `context.Context`.

@wxiaoguang commented on GitHub (Jan 15, 2025):
It helps. See my new commit `94816bf306`:

@TheFox0x7 commented on GitHub (Jan 15, 2025):
It didn't really want to trace any endpoint aside from `/user/events`, but I finally got a `/{username}/{reponame}/{type:pulls}/{index}` to show up.

The first few nestings are fine, but then you hit the opposite issue (at least the data looks like you hit it): `GetNotificationCount` and `GetActiveStopwatch` should be on the same level, as one doesn't call the other. No idea how accurate it is further down.

However, I still don't see how this helps with tracking spans via context, which is the issue.

If the overall consensus here is that otel is too big to be included by default, can we at least agree to use the general API of `go.opentelemetry.io/otel` (+827kB with my starting point)? (Never mind - this doesn't guarantee the same behavior, so an implementation working on the internal spans might not work on the otel ones.)

The route this is taking starts to be orthogonal to my proposal, and I think it'll make it harder to include the distributed version, which would be an issue for me.
Also related to the binary size: https://github.com/open-telemetry/opentelemetry-go/issues/2579
@wxiaoguang commented on GitHub (Jan 16, 2025):
No, it is not the opposite issue; it is what the code really does, because `GetNotificationCount`/`GetActiveStopwatch` are middlewares. Every middleware is traced.

But `GetNotificationCount` is more special, because it defers the "ctx" call to a tmpl function; you can read its code to see why.

In short: the new trace code is right. If it looks ugly (or unexpected), it is the existing code's problem.
@wxiaoguang commented on GitHub (Jan 16, 2025):
I do not see which part starts to be orthogonal; if you use our `gtprof` tracer framework, everything should work with the OpenTelemetry SDK. Just register your tracers with `globalTraceStarters = append(globalTraceStarters, &traceOpenTelemetryStarter{})`.

@TheFox0x7 commented on GitHub (Jan 16, 2025):
You're correct. I looked at them and figured that since they are next to each other they shouldn't be included in each other, ignoring that they are middlewares despite having written one before. I'm sorry for that.
See my review of your proposed system. It does work, but it's limited and the propagation issue remains.
I understand the size concern; it's completely valid. I understand the need for it to work for local users too, and I fully agree that it is of use. But this doesn't cover the opposite end: users who are willing to start a single jaeger instance and collect data.
Why not standardize on https://pkg.go.dev/go.opentelemetry.io/otel/trace? That way we at least have a common API and can swap implementations easily.
The otel exporters can be put behind a compilation flag to avoid bloat, but at least it's a standard and portable implementation which works the same for both systems.
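A sketch of what "behind a compilation flag" could look like (the `traceStarter`/`globalTraceStarters` names follow the shim discussed earlier in this thread, but this exact layout is an assumption, not existing Gitea code):

```go
package main

import "fmt"

// traceStarter is the shim hook a tracer implementation registers itself through.
type traceStarter interface{ Name() string }

// globalTraceStarters always exists; default builds leave it empty,
// so tracing stays a noop.
var globalTraceStarters []traceStarter

type otelStarter struct{}

func (otelStarter) Name() string { return "opentelemetry" }

// In a real layout this registration would sit in a separate file guarded by a
// `//go:build otel` constraint, so `go build` without `-tags otel` never
// compiles it and never links the otel SDK into the binary.
func registerOtel() {
	globalTraceStarters = append(globalTraceStarters, otelStarter{})
}

func main() {
	fmt.Println("starters in a default build:", len(globalTraceStarters))
	registerOtel() // only happens when built with -tags otel
	fmt.Println("starters with -tags otel:", len(globalTraceStarters))
}
```

The appeal of this split is that the shared API stays in every build, while the multi-megabyte exporter dependencies are paid for only by users who opt in at compile time.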
Here's a version without any exporters. It does nothing until you configure a tracer. otelchi is only used for simplicity, since it has route naming figured out properly, while it would take me a while to do that myself.