Languages Statusbar #890

Closed
opened 2025-11-02 03:40:27 -06:00 by GiteaMirror · 26 comments

Originally created by @TimerWolf on GitHub (Jul 14, 2017).

Are there any plans for gitea to get a status bar that shows how much of each language is used in a repository?

And when you click on it, you would see the different languages that are used and what percentage of the code is written in each?

It's a really neat feature that I really miss!

Screenshot: https://snag.gy/gil631.jpg
GiteaMirror added the type/feature label 2025-11-02 03:40:27 -06:00

@tonivj5 commented on GitHub (Jul 19, 2017):

Here is the PR to gogs (it was not merged) implementing that feature: https://github.com/gogits/gogs/pull/2135. It could be reused to add it to gitea 😉

@lunny commented on GitHub (Jul 20, 2017):

@xxxtonixxx maybe someone could send it to Gitea.


@tonivj5 commented on GitHub (Jul 21, 2017):

If @generaltso wants to, he could do it! If not, I think I could attempt it 😅

@dayvonjersen commented on GitHub (Jul 21, 2017):

https://github.com/gogits/gogs/pull/2135 is certainly out of date by now, as the gogs codebase has probably changed and the API for linguist (https://github.com/generaltso/linguist) has definitely changed.

Of course anyone is more than welcome to use my library, but implementing this feature isn't going to be a copy/paste job.

That said, I copied and pasted the CSS I whipped up for those screenshots into a codepen: https://codepen.io/anon/pen/PjMdBy

-tso

@lafriks commented on GitHub (Jul 21, 2017):

And I don't think it can be accepted in that form: stats should be generated and cached once, when the repository's default branch changes, not on every page load.
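A minimal sketch of that caching idea, assuming the stats are keyed by repository ID and by the commit the default branch currently points at. All names here (`StatsCache`, `LanguageStats`, the `compute` callback) are hypothetical and not taken from gitea; the in-memory map stands in for whatever storage a real implementation would use.

```go
package main

import (
	"fmt"
	"sync"
)

// LanguageStats maps a language name to the number of bytes attributed to it.
type LanguageStats map[string]int64

// cacheEntry remembers which commit the stored stats were computed for.
type cacheEntry struct {
	commit string
	stats  LanguageStats
}

// StatsCache (hypothetical) recomputes stats only when the default branch head changes.
type StatsCache struct {
	mu       sync.Mutex
	entries  map[int64]cacheEntry
	compute  func(repoID int64, commit string) LanguageStats
	computed int // number of times compute actually ran
}

func NewStatsCache(compute func(repoID int64, commit string) LanguageStats) *StatsCache {
	return &StatsCache{entries: make(map[int64]cacheEntry), compute: compute}
}

// Get returns the cached stats when headCommit matches the stored commit,
// and otherwise recomputes and stores them.
func (c *StatsCache) Get(repoID int64, headCommit string) LanguageStats {
	c.mu.Lock()
	defer c.mu.Unlock()
	if e, ok := c.entries[repoID]; ok && e.commit == headCommit {
		return e.stats
	}
	stats := c.compute(repoID, headCommit)
	c.computed++
	c.entries[repoID] = cacheEntry{commit: headCommit, stats: stats}
	return stats
}

// Computed reports how many times the expensive computation ran.
func (c *StatsCache) Computed() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.computed
}

func main() {
	cache := NewStatsCache(func(repoID int64, commit string) LanguageStats {
		// Placeholder for the expensive repository scan.
		return LanguageStats{"Go": 1200, "CSS": 300}
	})
	cache.Get(1, "abc123") // computes
	cache.Get(1, "abc123") // served from the cache
	cache.Get(1, "def456") // head moved, recomputes
	fmt.Println("computations:", cache.Computed())
}
```

Page loads would then only ever call `Get`, so the scan cost is paid once per change of the default branch rather than once per request.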


@dayvonjersen commented on GitHub (Jul 21, 2017):

@lafriks yes, exactly. probably best to have like a post-receive hook that runs in the background and stores the result in the db.
and have a setting to disable it entirely for those concerned about server resource usage
it would also be cool if the classifier could then be retrained on real-world code samples but I'm probably jumping the gun here >_>

-tso
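One way the background idea above might look, as a rough sketch only: a post-receive handler enqueues the repository ID, a single worker goroutine does the analysis, and a setting turns the whole thing off. `Settings`, `StatsQueue`, and `OnPostReceive` are made-up names for illustration; how gitea actually wires up hooks is not covered here.

```go
package main

import (
	"fmt"
	"sync"
)

// Settings is a hypothetical configuration struct; Enabled stands in for
// the opt-out switch for admins worried about server resource usage.
type Settings struct {
	Enabled bool
}

// StatsQueue (hypothetical) runs analysis jobs on one background goroutine.
type StatsQueue struct {
	settings Settings
	jobs     chan int64
	wg       sync.WaitGroup
	analyse  func(repoID int64)
}

func NewStatsQueue(s Settings, analyse func(repoID int64)) *StatsQueue {
	q := &StatsQueue{settings: s, jobs: make(chan int64, 16), analyse: analyse}
	q.wg.Add(1)
	go func() {
		defer q.wg.Done()
		for id := range q.jobs {
			q.analyse(id) // would scan the repo and store the result in the db
		}
	}()
	return q
}

// OnPostReceive is what a post-receive handler would call after a push.
// It returns false when the feature is disabled or the queue is full,
// so the push itself is never blocked by the analysis.
func (q *StatsQueue) OnPostReceive(repoID int64) bool {
	if !q.settings.Enabled {
		return false
	}
	select {
	case q.jobs <- repoID:
		return true
	default:
		return false
	}
}

// Close stops accepting jobs and waits for the queued ones to finish.
func (q *StatsQueue) Close() {
	close(q.jobs)
	q.wg.Wait()
}

func main() {
	var done []int64
	q := NewStatsQueue(Settings{Enabled: true}, func(id int64) {
		done = append(done, id)
	})
	q.OnPostReceive(7)
	q.OnPostReceive(9)
	q.Close() // after Close returns, the worker has finished
	fmt.Println("analysed repositories:", done)
}
```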


@OmarAssadi commented on GitHub (Oct 11, 2017):

I put up a small ($5) bounty on this one: https://www.bountysource.com/issues/47234720-statusbar. Miss this feature!

EDIT: The current pledge amount is shown on the bounty page. If anyone else feels like contributing, feel free!

@dayvonjersen commented on GitHub (Oct 19, 2017):

Hm, now my interest is piqued ;)

I could take another stab at it maybe tomorrow evening (I'm in EST). But be forewarned, my preliminary attempt will probably be an awful hack job. I will need to rely on the rest of the community's advice to do it right.

-tso


@OmarAssadi commented on GitHub (Oct 19, 2017):

Sounds great! In addition to caching, the final version should probably also be limited by file size. Maybe an adjustable setting?


@dayvonjersen commented on GitHub (Oct 19, 2017):

@54 Hm so you mean don't try to classify individual files that are larger than, e.g. 1MB? Most of the time what linguist does is it goes by file extension but hm, yes I see what you mean just thinking aloud... Good idea :)


@OmarAssadi commented on GitHub (Oct 19, 2017):

@generaltso Yeah, I just figure it'd kinda suck if someone uploaded some monstrous set of files that the server had to analyze. But, I haven't looked at your linguist library. Does it ever actually do some content analysis or is it pretty much entirely based on extension?

If it is purely based on the extension, then I don't think it's necessary to add that particular limitation.


@dayvonjersen commented on GitHub (Oct 19, 2017):

well it can do either.

in the reference implementation (https://github.com/generaltso/linguist/tree/master/cmd/l), after being filtered by linguist.ShouldIgnoreFilename() the file extension is passed to linguist.LanguageHints().

if there is more than one possible language for an extension (e.g. .php could be either PHP or Facebook's "Hack" language) then it first checks if the file is a binary blob with linguist.ShouldIgnoreContents() and then uses a bayesian classifier which has been trained on the same dataset as github/linguist to analyse the text (using a tokenizer which could use some improvement) and determine the language (the function is called linguist.Analyse())

a pretty straightforward process imo but I'm a tiny bit biased since I wrote it :p

it might be more convenient to encapsulate all the nuance into a single package-level function instead of requiring all of those steps for the typical use-case, I welcome any input in improving the library for users as well if you or anyone else have any suggestions :)

-tso
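The steps described above, wrapped into a single function, could be sketched roughly as follows. Every field of `Detector` is a stand-in with a signature invented for this sketch; they mirror the roles of the linguist functions named above but are NOT the library's actual API. The `MaxBytes` limit reflects the file-size suggestion made earlier in the thread, and skipping files with no candidate language is an assumption.

```go
package main

import (
	"bytes"
	"fmt"
	"path/filepath"
	"strings"
)

// Detector bundles the pipeline steps. All signatures are hypothetical.
type Detector struct {
	IgnoreName func(name string) bool                            // e.g. vendored paths
	Hints      func(name string) []string                        // candidates by file name
	IsBinary   func(contents []byte) bool                        // binary blob check
	Classify   func(contents []byte, candidates []string) string // content-based tie-break
	MaxBytes   int                                               // 0 means no limit
}

// Detect returns the language for one file, or "" when the file is skipped.
func (d Detector) Detect(name string, contents []byte) string {
	if d.IgnoreName(name) {
		return ""
	}
	candidates := d.Hints(name)
	switch len(candidates) {
	case 0:
		return "" // assumption: unknown extensions are skipped
	case 1:
		return candidates[0] // unambiguous, no need to read the contents
	}
	if d.MaxBytes > 0 && len(contents) > d.MaxBytes {
		return "" // too large to classify by content
	}
	if d.IsBinary(contents) {
		return ""
	}
	return d.Classify(contents, candidates)
}

// toyDetector wires in deliberately naive stand-ins so the sketch runs on its own.
func toyDetector() Detector {
	return Detector{
		IgnoreName: func(name string) bool { return strings.HasPrefix(name, "vendor/") },
		Hints: func(name string) []string {
			switch filepath.Ext(name) {
			case ".go":
				return []string{"Go"}
			case ".php":
				return []string{"PHP", "Hack"}
			}
			return nil
		},
		IsBinary: func(b []byte) bool { return bytes.IndexByte(b, 0) >= 0 },
		Classify: func(b []byte, candidates []string) string {
			// A real classifier would be statistical; this only checks the opening tag.
			if bytes.HasPrefix(b, []byte("<?hh")) {
				return "Hack"
			}
			return "PHP"
		},
		MaxBytes: 1 << 20, // 1MB, the example size from the thread
	}
}

func main() {
	d := toyDetector()
	fmt.Println(d.Detect("main.go", nil))
	fmt.Println(d.Detect("index.php", []byte("<?php echo 1;")))
}
```

Because the size limit is only checked once the extension turns out to be ambiguous, large files with an unambiguous extension still get counted without being read.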


@OmarAssadi commented on GitHub (Oct 19, 2017):

Ah, thanks for the clarification! By the way, #2108 appears to include its own submenu—minus the linguist functionality, though.

Might be worth waiting for that to get merged?


@dayvonjersen commented on GitHub (Oct 19, 2017):

@54 that would seem to make the most sense in the grand scheme of things, but I'm just gonna get to hacking away at something on a separate branch based off master, since I just reinstalled MySQL and set up gitea and I'm ready to go, just to get the ball moving <insert more metaphors here>

I'll update here with screenshots and code and stuff in a couple hours *makes coffee* ☕ 😁

@OmarAssadi commented on GitHub (Oct 19, 2017):

Yeah, of course. I just meant since there is some overlap, probably best not to make something super polished just yet 😁

Good luck, @generaltso!


@dayvonjersen commented on GitHub (Oct 19, 2017):

OK, I got my feet wet in the code base and basically just implemented the design I had previously come up with, using some dummy placeholder results for now.

I have lots of questions but I think I made some decent progress for 1.5 hours of work

you can view my commits:
https://github.com/generaltso/go-sdk/commit/41cabf2d856c55b2202c02394777c905bd113c1f
https://github.com/generaltso/gitea/commits/feature/language-statistics

these should be probably squashed with a rebase if/when it's ready for a PR

NOTE: I had to cp -r my-local-fork-of-gitea-sdk gitea/vendor/ manually. I don't fully understand how to use vendoring with go build in this project atm

First thing I did was add a table to the DB (even though I'm not using it just yet)
https://github.com/generaltso/gitea/blob/00427a9095590b78ce85cca8c41d174b5bdf5db4/language_statistics.sql

ALTERNATIVE TO THIS would be a single field in the repository table with JSON data containing all the language stats.
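To make the JSON alternative concrete, here is a small sketch of what could be written to and read back from such a field. The `LanguageStat` type and its JSON keys are assumptions for illustration, not a schema from the linked commits.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// LanguageStat is a hypothetical record; one slice of these would be
// serialized into a single text column on the repository row.
type LanguageStat struct {
	Language string `json:"language"`
	Bytes    int64  `json:"bytes"`
}

// EncodeStats produces the string that would be stored in the field.
func EncodeStats(stats []LanguageStat) (string, error) {
	b, err := json.Marshal(stats)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

// DecodeStats parses the stored field; an empty field means "no stats yet".
func DecodeStats(field string) ([]LanguageStat, error) {
	var stats []LanguageStat
	if field == "" {
		return stats, nil
	}
	err := json.Unmarshal([]byte(field), &stats)
	return stats, err
}

func main() {
	field, err := EncodeStats([]LanguageStat{{"Go", 1200}, {"CSS", 300}})
	if err != nil {
		panic(err)
	}
	fmt.Println(field)

	stats, err := DecodeStats(field)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(stats), "languages decoded")
}
```

The trade-off is the usual one: a JSON field keeps reads to a single row, while a separate table makes it possible to query across repositories by language.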

Here's the Commits/Branches/Releases/Contributors bar (screenshot: https://u.teknik.io/ZtmBo.png)

to complete this I need to know how to

  • get the number of branches for a repo
  • get the number of unique GIT_AUTHOR's for a repo
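One possible way to get both numbers, offered as an assumption rather than how gitea does it: run `git for-each-ref --format='%(refname:short)' refs/heads` for the branch list and `git log --format=%ae` for one author email per commit, then count lines. The sketch below only parses such output, so it runs without a repository; the helper names are made up, and treating emails case-insensitively is a simplification.

```go
package main

import (
	"fmt"
	"strings"
)

// countLines returns the number of non-empty lines,
// e.g. one line per branch name.
func countLines(output string) int {
	n := 0
	for _, line := range strings.Split(output, "\n") {
		if strings.TrimSpace(line) != "" {
			n++
		}
	}
	return n
}

// countUnique returns the number of distinct non-empty lines,
// compared case-insensitively, e.g. one author email per commit.
func countUnique(output string) int {
	seen := make(map[string]struct{})
	for _, line := range strings.Split(output, "\n") {
		key := strings.ToLower(strings.TrimSpace(line))
		if key != "" {
			seen[key] = struct{}{}
		}
	}
	return len(seen)
}

func main() {
	branches := "master\nfeature/language-statistics\n"
	authors := "a@example.com\nb@example.com\nA@example.com\n"
	fmt.Println("branches:", countLines(branches))
	fmt.Println("unique authors:", countUnique(authors))
}
```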

here's the language bar (screenshot: https://u.teknik.io/9DBGr.png)

It's already using linguist.LanguageColor() to use the "proper" github colors...

to complete this I need:

  • to get the complete file list of the repo
  • (maybe not right now but eventually) get the corresponding file sizes
  • read data (file contents) from the repo
  • store those results in the db
  • read those results back from the db in models/repo.go or models/language_statistics.go
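Once the file list, sizes, and per-file languages are available, turning them into the percentages shown in the bar is a small aggregation step. A sketch under the assumption that each language is weighted by file size in bytes; `FileResult`, `Share`, and `Percentages` are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
)

// FileResult is one classified file; Language is "" for skipped files.
type FileResult struct {
	Path     string
	Language string
	Size     int64
}

// Share is one segment of the language bar.
type Share struct {
	Language string
	Percent  float64
}

// Percentages sums bytes per language and returns the shares,
// largest first, with ties broken by language name.
func Percentages(files []FileResult) []Share {
	totals := make(map[string]int64)
	var sum int64
	for _, f := range files {
		if f.Language == "" || f.Size <= 0 {
			continue
		}
		totals[f.Language] += f.Size
		sum += f.Size
	}
	shares := make([]Share, 0, len(totals))
	for lang, n := range totals {
		shares = append(shares, Share{lang, float64(n) * 100 / float64(sum)})
	}
	sort.Slice(shares, func(i, j int) bool {
		if shares[i].Percent != shares[j].Percent {
			return shares[i].Percent > shares[j].Percent
		}
		return shares[i].Language < shares[j].Language
	})
	return shares
}

func main() {
	shares := Percentages([]FileResult{
		{"main.go", "Go", 500},
		{"models/repo.go", "Go", 250},
		{"public/css/index.css", "CSS", 250},
		{"logo.png", "", 4096}, // skipped: no language
	})
	for _, s := range shares {
		fmt.Printf("%s %.1f%%\n", s.Language, s.Percent)
	}
}
```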

my biggest questions I need guidance with now are:

  • how to work with the db?
  • how to add hooks for e.g. post-receive? (so that language statistics can be populated and kept up-to-date in the background)

ADDITIONALLY and this is me getting way ahead of myself here but for the future:

  • user data (aka user code stored with a gitea instance) should be run through the classifier again just like github does with all the code on this (incredible treasure trove and modern day library of alexandria of a) website <3 <3 <3 :octocat: ❤️ 💖
  • language statistics should be visible on other pages for example the repository list and search results

again my commits are viewable here:
https://github.com/generaltso/go-sdk/commit/41cabf2d856c55b2202c02394777c905bd113c1f
https://github.com/generaltso/gitea/commits/feature/language-statistics

I appreciate any further guidance but I think I'm going to make dinner now and relax a bit.

<3

-tso

EDIT: also note there is a transition for the commits/etc bar -> language %'s. It looks like this codepen of a 3D cube flip effect (https://codepen.io/rachsmith/pen/cojza) because I took it directly from that and changed the timing function (see also https://github.com/generaltso/gitea/blob/fdcd6b8b9ebfb8098ecfa81f92352c20ed58f270/templates/repo/home.tmpl#L77)


@lunny commented on GitHub (Oct 20, 2017):

@genedna all DB-related operations live in the models sub-module.

@genedna commented on GitHub (Oct 20, 2017):

@lunny @genedna -> @generaltso ?


@lunny commented on GitHub (Oct 20, 2017):

@genedna sorry for wrong mention. :) @generaltso


@dayvonjersen commented on GitHub (Oct 20, 2017):

@lunny thanks I'll have to read over the code in models more in-depth then; will probably work on it some more later today :)

any other comments/feedback on what I did so far? or does it look ok


@kolaente commented on GitHub (Apr 29, 2018):

@generaltso liking it so far! Any progress update?


@OmarAssadi commented on GitHub (May 10, 2018):

Bumped the bounty to $20. If anyone else would like to see this, feel free to contribute to the bounty: https://www.bountysource.com/issues/47234720-statusbar

@alexanderadam commented on GitHub (May 11, 2018):

I just want to add that I would love to see the repository size in it (like this extension does for GitHub: https://chrome.google.com/webstore/detail/github-repository-size/apnjnioapinblneaedefcnopcjepgkci).
And this is probably also easier than language recognition.

EDIT: created a new issue for it as requested: https://github.com/go-gitea/gitea/issues/3953

@OmarAssadi commented on GitHub (May 11, 2018):

@alexanderadam Good idea as well! As far as the language recognition goes, there is already a linguist port to Go.


@kolaente commented on GitHub (May 11, 2018):

@alexanderadam mind opening a separate issue for that?

@OmarAssadi commented on GitHub (Nov 18, 2018):

Looks like @lafriks started work on this a little while back 👍 Figured it'd be worth linking the PR and issue - #4824.

Reference: github-starred/gitea#890