mirror of
https://github.com/go-gitea/gitea.git
synced 2026-05-06 02:02:32 -05:00
Prune hook_task Table #5069
Closed
opened 2025-11-02 06:13:10 -06:00 by GiteaMirror
·
19 comments
Originally created by @jag3773 on GitHub (Mar 16, 2020).
Description
The hook_task table doesn't appear to be pruned at any point, so on an active site this table can grow to be very large, which makes loading the edit webhook page quite slow. I'm not sure how the action table relates, but that may need to be pruned also?
Possible Solution
One solution I thought of is to allow administrators to set a max number of deliveries to retain per webhook.
@lafriks commented on GitHub (Mar 16, 2020):
Action table can't really be pruned as it contains valuable activity information, while I agree webhooks could be cleared up by some policy
@lunny commented on GitHub (Mar 17, 2020):
As a first step, we can add a button to the repository webhook management UI to delete old webhook deliveries.
@jag3773 commented on GitHub (Mar 17, 2020):
@lunny If I understand your suggestion, I don't think that solves the problem. That's only helpful if you have a handful of repositories, in which case you would never have this problem in the first place. Imagine having to click that button on 25,000 repositories!
The only realistic solution in that case is to have a global setting that the admin can control.
@lunny commented on GitHub (Mar 17, 2020):
@jag3773 I also think there should be a button on the admin panel, but that doesn't conflict with my idea. The button on the admin panel would delete old webhook deliveries for all the repositories in the Gitea instance, and for a public instance we should let users choose whether to delete them.
@bhalbright commented on GitHub (Apr 14, 2020):
Just posting a note I am looking into this issue, thanks!
@stale[bot] commented on GitHub (Jun 13, 2020):
This issue has been automatically marked as stale because it has not had recent activity. I am here to help clear issues left open even if solved or waiting for more insight. This issue will be closed if no further activity occurs during the next 2 weeks. If the issue is still valid just add a comment to keep it alive. Thank you for your contributions.
@lunny commented on GitHub (Jun 13, 2020):
A configuration item could be added to keep webhook deliveries from the most recent year (or longer). A background goroutine could clean up older ones every day.
@bhalbright commented on GitHub (Jun 15, 2020):
@lunny thanks, I had submitted a PR for an implementation based on deleting all but x webhooks (can be set per repo). Do you have any thoughts there...that was the implementation that worked best for our use case but I can understand if you'd rather have something a little different for general usage.
@lunny commented on GitHub (Jun 18, 2020):
Then we could have two choices. One is to delete old finished records, another is to keep some recent unfinished records.
@jag3773 commented on GitHub (Jun 18, 2020):
Sounds like you are both saying similar things. My request is to limit the number of items kept in the history, whether it is date based or count based is not so important.
The advantage of "count based" is that every repo will retain a certain number of recent events. In a date based system, it's likely you would not see any history for a repo that hasn't been used recently.
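To make the count-based variant concrete, here is a minimal sketch under stated assumptions: Delivery and pruneKeepRecent are hypothetical names, and the data lives in a slice rather than in the hook_task table. It shows that a webhook that has been idle for a long time still keeps its last deliveries, however old they are.

```go
package main

import "sort"

// Delivery is a hypothetical, simplified view of one hook_task row.
type Delivery struct {
	ID     int64
	HookID int64
	Unix   int64 // delivery time, seconds since the epoch
}

// pruneKeepRecent returns the IDs to delete so that every webhook keeps
// at most `keep` of its most recent deliveries. The result is sorted by ID.
func pruneKeepRecent(ds []Delivery, keep int) []int64 {
	if keep < 0 {
		keep = 0
	}
	// Group deliveries by webhook.
	byHook := make(map[int64][]Delivery)
	for _, d := range ds {
		byHook[d.HookID] = append(byHook[d.HookID], d)
	}
	var del []int64
	for _, group := range byHook {
		// Newest first, then everything past the first `keep` entries goes.
		sort.Slice(group, func(i, j int) bool { return group[i].Unix > group[j].Unix })
		if len(group) > keep {
			for _, d := range group[keep:] {
				del = append(del, d.ID)
			}
		}
	}
	sort.Slice(del, func(i, j int) bool { return del[i] < del[j] })
	return del
}
```

With three deliveries on a busy webhook and a single very old delivery on an idle one, pruneKeepRecent(ds, 2) selects only the oldest delivery of the busy webhook. A date-based cutoff would have removed the idle webhook's only entry instead.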
@bhalbright commented on GitHub (Jun 21, 2020):
I guess @lunny you are saying we should give the user the option to either purge by "older than x days" or an option like we had suggested "delete all but most recent x entries"? Which should be the default option?
@lunny commented on GitHub (Jun 21, 2020):
@bhalbright Yes. That's what I meant.
@jgkirschbaum commented on GitHub (Jul 16, 2020):
@lunny @lafriks If I see this correctly, the action table could also be pruned. In my opinion the only function of the action table is to provide data for the dashboard. The timespan for the dashboard is currently 1 year, so data older than one year could be pruned. Our action table is about 1 GB in size after 1 year usage of gitea, which makes up 90% of the total database size.
So it would be perfect if someone could please implement the following features:
[...] dashboard_display_period).
[...] action_historization_period (default is dashboard_display_period)).
[...] action_historization_period.
I'm sorry I can't support coding, but I don't speak go.
@lunny commented on GitHub (Jul 16, 2020):
I think we could also have a button on admin panel to clean the two tables.
@jgkirschbaum commented on GitHub (Jul 16, 2020):
Yes, would be a first step.
@zeripath commented on GitHub (Jul 28, 2020):
So partially this is a problem of database management. When you get into large enough systems those running Gitea are going to need to actually do some DB management themselves and not rely on the ORM creating a perfect DB.
For example here if you were using a postgres 10+ DB backend you could simply PARTITION the action table etc. Similarly for other DB systems.
Throwing away action data is a decision for server managers - I'm not sure that we at Gitea should be running anything that deletes data by default.
However, we can and should do a few things here.
Another option is whether we can store these actions on disk as a sort of hybrid db. However, we are then getting into the situation of effectively being a DBMS, and it's the job of the DB to decide how to partition and manage big tables. I'm not certain how GH or Facebook manage their big tables. I know some people advocate throwing this sort of stuff into a non-relational/NoSQL db depending on the inherent structure within these, and given we mostly don't use the relational features of this data, that could work.
One thing we should additionally look at is why this data is updated and not immutable - if it can be made immutable then the hybrid approach may make more sense.
@bhalbright commented on GitHub (Jul 29, 2020):
@zeripath regarding the cron to delete old actions being OFF by default, would you expect the same for a cron job to delete from hook_task? In the PR I had submitted it was turned on by default globally and then you could turn on/off by repo in the UI.
@jgkirschbaum commented on GitHub (Jul 29, 2020):
@zeripath
You are right, that would be simple and effective, but I would not recommend this approach to Gitea admins, because then your DDL differs from the one delivered with Gitea.
That's great and I appreciate that.
IMHO I would advise you not to use plain text files and create a hybrid db of your own. Plain text files are a mess, and in a container environment they are a mess and a pain. I think the pressure on this topic isn't so high, so making things configurable, paired with a few simple db jobs inside Gitea as you recommended, would be sufficient.
Which data is updated? As far as I know, the data in both the action table and the hook_task table is immutable and is not updated. But as I mentioned before, I would not go with a hybrid approach; this introduces unneeded complexity.
Thank you for your efforts.
@zeripath commented on GitHub (Jul 29, 2020):
@bhalbright in regards to the hook_task table - in some ways that could be argued as being just user visible logging. Actions though are the unique behaviours of the users - it's a bit more than logging.