mirror of
https://github.com/go-gitea/gitea.git
synced 2026-03-21 14:06:26 -05:00
Index Page Incredibly Slow to Load - Slow SQL Query #13523
Closed
opened 2025-11-02 10:45:02 -06:00 by GiteaMirror
·
7 comments
No Branch/Tag Specified
main
release/v1.25
release/v1.24
release/v1.23
release/v1.22
release/v1.21
release/v1.20
release/v1.19
release/v1.18
release/v1.17
release/v1.16
release/v1.15
release/v1.14
release/v1.13
release/v1.12
release/v1.11
release/v1.10
release/v1.9
release/v1.8
v1.25.3
v1.25.2
v1.25.1
v1.25.0
v1.24.7
v1.25.0-rc0
v1.26.0-dev
v1.24.6
v1.24.5
v1.24.4
v1.24.3
v1.24.2
v1.24.1
v1.24.0
v1.23.8
v1.24.0-rc0
v1.25.0-dev
v1.23.7
v1.23.6
v1.23.5
v1.23.4
v1.23.3
v1.23.2
v1.23.1
v1.23.0
v1.23.0-rc0
v1.24.0-dev
v1.22.6
v1.22.5
v1.22.4
v1.22.3
v1.22.2
v1.22.1
v1.22.0
v1.23.0-dev
v1.22.0-rc1
v1.21.11
v1.22.0-rc0
v1.21.10
v1.21.9
v1.21.8
v1.21.7
v1.21.6
v1.21.5
v1.21.4
v1.21.3
v1.21.2
v1.20.6
v1.21.1
v1.21.0
v1.21.0-rc2
v1.21.0-rc1
v1.20.5
v1.22.0-dev
v1.21.0-rc0
v1.20.4
v1.20.3
v1.20.2
v1.20.1
v1.20.0
v1.19.4
v1.21.0-dev
v1.20.0-rc2
v1.20.0-rc1
v1.20.0-rc0
v1.19.3
v1.19.2
v1.19.1
v1.19.0
v1.19.0-rc1
v1.20.0-dev
v1.19.0-rc0
v1.18.5
v1.18.4
v1.18.3
v1.18.2
v1.18.1
v1.18.0
v1.17.4
v1.18.0-rc1
v1.19.0-dev
v1.18.0-rc0
v1.17.3
v1.17.2
v1.17.1
v1.17.0
v1.17.0-rc2
v1.16.9
v1.17.0-rc1
v1.18.0-dev
v1.16.8
v1.16.7
v1.16.6
v1.16.5
v1.16.4
v1.16.3
v1.16.2
v1.16.1
v1.16.0
v1.15.11
v1.17.0-dev
v1.16.0-rc1
v1.15.10
v1.15.9
v1.15.8
v1.15.7
v1.15.6
v1.15.5
v1.15.4
v1.15.3
v1.15.2
v1.15.1
v1.14.7
v1.15.0
v1.15.0-rc3
v1.14.6
v1.15.0-rc2
v1.14.5
v1.16.0-dev
v1.15.0-rc1
v1.14.4
v1.14.3
v1.14.2
v1.14.1
v1.14.0
v1.13.7
v1.14.0-rc2
v1.13.6
v1.13.5
v1.14.0-rc1
v1.15.0-dev
v1.13.4
v1.13.3
v1.13.2
v1.13.1
v1.13.0
v1.12.6
v1.13.0-rc2
v1.14.0-dev
v1.13.0-rc1
v1.12.5
v1.12.4
v1.12.3
v1.12.2
v1.12.1
v1.11.8
v1.12.0
v1.11.7
v1.12.0-rc2
v1.11.6
v1.12.0-rc1
v1.13.0-dev
v1.11.5
v1.11.4
v1.11.3
v1.10.6
v1.12.0-dev
v1.11.2
v1.10.5
v1.11.1
v1.10.4
v1.11.0
v1.11.0-rc2
v1.10.3
v1.11.0-rc1
v1.10.2
v1.10.1
v1.10.0
v1.9.6
v1.9.5
v1.10.0-rc2
v1.11.0-dev
v1.10.0-rc1
v1.9.4
v1.9.3
v1.9.2
v1.9.1
v1.9.0
v1.9.0-rc2
v1.10.0-dev
v1.9.0-rc1
v1.8.3
v1.8.2
v1.8.1
v1.8.0
v1.8.0-rc3
v1.7.6
v1.8.0-rc2
v1.7.5
v1.8.0-rc1
v1.9.0-dev
v1.7.4
v1.7.3
v1.7.2
v1.7.1
v1.7.0
v1.7.0-rc3
v1.6.4
v1.7.0-rc2
v1.6.3
v1.7.0-rc1
v1.7.0-dev
v1.6.2
v1.6.1
v1.6.0
v1.6.0-rc2
v1.5.3
v1.6.0-rc1
v1.6.0-dev
v1.5.2
v1.5.1
v1.5.0
v1.5.0-rc2
v1.5.0-rc1
v1.5.0-dev
v1.4.3
v1.4.2
v1.4.1
v1.4.0
v1.4.0-rc3
v1.4.0-rc2
v1.3.3
v1.4.0-rc1
v1.3.2
v1.3.1
v1.3.0
v1.3.0-rc2
v1.3.0-rc1
v1.2.3
v1.2.2
v1.2.1
v1.2.0
v1.2.0-rc3
v1.2.0-rc2
v1.1.4
v1.2.0-rc1
v1.1.3
v1.1.2
v1.1.1
v1.1.0
v1.0.2
v1.0.1
v1.0.0
v0.9.99
Labels
Clear labels
$20
$250
$50
$500
backport/done
💎 Bounty
docs-update-needed
good first issue
hacktoberfest
issue/bounty
issue/confirmed
issue/critical
issue/duplicate
issue/needs-feedback
issue/not-a-bug
issue/regression
issue/stale
issue/workaround
lgtm/need 2
modifies/api
modifies/translation
outdated/backport/v1.18
outdated/theme/markdown
outdated/theme/timetracker
performance/bigrepo
performance/cpu
performance/memory
performance/speed
pr/breaking
proposal/accepted
proposal/rejected
pr/wip
pull-request
reviewed/wontfix
💰 Rewarded
skip-changelog
status/blocked
topic/accessibility
topic/api
topic/authentication
topic/build
topic/code-linting
topic/commit-signing
topic/content-rendering
topic/deployment
topic/distribution
topic/federation
topic/gitea-actions
topic/issues
topic/lfs
topic/mobile
topic/moderation
topic/packages
topic/pr
topic/projects
topic/repo
topic/repo-migration
topic/security
topic/theme
topic/ui
topic/ui-interaction
topic/ux
topic/webhooks
topic/wiki
type/bug
type/deprecation
type/docs
type/enhancement
type/feature
type/miscellaneous
type/proposal
type/question
type/refactoring
type/summary
type/testing
type/upstream
Mirrored from GitHub Pull Request
Milestone
No items
No Milestone
Projects
Clear projects
No project
No Assignees
Notifications
Due Date
No due date set.
Dependencies
No dependencies set.
Reference: github-starred/gitea#13523
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.
Delete Branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Originally created by @davidsenk on GitHub (Sep 23, 2024).
Description
Recently on my Gitea instance, loading the index page started taking an absolute eternity, 30 seconds+ quite often. This is a bug report / analysis as to why.
I took a look in the logs for the container and noticed this:
Hmm... that's odd. Why would that query be so slow?
Running similar query against database using my actual user_id=2 and is_deleted = 0 returns in 0.006 seconds
Strange... similar query gets results quite quickly.
Let's try removing the where...
Wow! That makes the query take significantly longer!
Output from the actual select statement:
Changing the select to remove the inner join:
From reviewing the source code, the columns for
repositoryare intentionally ignored.e1f0598c8f/models/activities/action.go (L456-L459)With this knowledge, I was able to build a significantly faster query that achieves the same thing (only getting actions for repositories that exist in the
repositorytable):Look at that! 0.028 seconds! Much better! But wait? What in the world is that query? Let me break it down a bit:
The original query is completing an "inner join" for the apparently sole purpose of determing if the actions belong to repositories that currently exist.
The first thing I tried was to change it from an inner join to a subquery:
Unfortunately... this is also just as slow...
So thinking about it... how do you make it faster? What are we trying to do? Exclude actions that have a repo_id that is not in repositories right?
Let's take a look at the queries we just used above!
Ok, both of those look to be pretty similar, but what's happening is MariaDB needs to compare all of the ids against the "select id from repository" part of the subquery.
This is likely the "same thing" that MariaDB is doing when completing the above inner join.
So how do you make this faster?
Let's flip it around and only get the repo_ids of actions that are not in the repository table!
Neat! For me it's an empty set! (and it generally should be; however, repo_id is not a foreign key in the current schema)
Now, when you do a select against that subquery, it is much faster to execute as there is "less" that needs to be scanned for each row.
Bringing us back to this...
That is where the odd looking subquery comes from; however, it leads to significantly faster queries that still maintain the "same" restrictions
as the existing inner join!
Finally, let's add user_id=2 and is_deleted=0 back just for a "apples to apples" comparison with original query:
Neat! Just as fast (0.002 seconds), I'm omitting the actual output from this query as it's just an absolute load of json in the
contentcolumn, but otherwise looks similar to above samples.I'd test changing the query in the source code; however, I do not have a Go development environment currently setup. I do hope this knowledge can be incorporated into the next version update for faster index page loads!
Thanks!
Gitea Version
1.22.2
Can you reproduce the bug on the Gitea demo site?
No
Log Gist
No response
Screenshots
No response
Git Version
No response
Operating System
AlmaLinux 9.4 - Official Gitea container
How are you running Gitea?
Official gitea container, running on podman:
Database
MySQL/MariaDB
@KazzmanK commented on GitHub (Sep 23, 2024):
Any time we need to teach dumb database engine to execute simple queries with such a mess, it leads to "wtf is happening here???" in the feature.
Have you tried to update statistics/rebuild indexes?
@davidsenk commented on GitHub (Sep 23, 2024):
The indexes are already there, the query that is being ran is just not optimized to use the indexes appropriately.
The tl;dr is the because of the inner join, every row on actions needs to be compared
repo_id == repository.id?Limiting the number of comparisons (by only comparing against the set of repo_id that do not exist in repository.id) massively increases performance. In my example, I have ~70 repositories and ~86k actions.
Each of the 86k actions needs to be compared against ~70 repository.id values, leading to 86k * 70 = ~6 million comparisons. Reducing that to... 0 (in my case) means that the query can run quite a lot faster. It's making the database do less work.
Instead of asking "get me the set of actions for the repo_ids in repository.id" ask "get me the set of actions with a repo_id not in the set of ids that do not exist in repository.id"
i.e. it's a "double negative" to get back to the same place, but let's the database figure it out a lot faster.
Ideally action.repo_id should be a foreign key to repository.id, then the database would already know if the repository exists in the repository table already. Without it being a foreign key, it needs to do all this computation every time (or use the "not not" trick I showed above).
@KazzmanK commented on GitHub (Sep 24, 2024):
inner join is not that straightforward. It leads to inner loop in case of bad planning or, for example, because of query hint (inner loop join for MSSQL). Mostly it is a main tool for queries and highly optimized. No 6 mils comparisons in loops.
Imagine your tables (or indexes) are sorted by join key. You need two pointers to walk through tables. you move left or right pointer when you need to advance to next row or to next join predicate value. if join keys are same it is okay, if not - advance required number of rows on sorted list and you`ll determine a new join pair or determine that there are no join pairs.
It is one-time pass. There is a number of other optimizations, like hash, bitmap etc helper structures to make that basic kind of joins run extremely fast. It this case with double negative you put db engine into state that it cannot chouse wrong execution plan. That is great in terms of fine-tuning (i'm doing that twice a day all the time) but if not great in general. It is better to find the roots of issue. First - reindex, reanalyse, garbage collect and so on, to make db optimizations awake.
@lunny commented on GitHub (Sep 24, 2024):
Maybe we can remove the inner join directly and provide a doctor command to clean the dirty records on action table.
@davidsenk commented on GitHub (Sep 24, 2024):
@lunny that would probably work pretty well, maybe have it be one of the "cronjobs" where it removes action entries if they refer to repos that do not exist in the repository table. This could be done using the select unique(xxx) not in (select id) query example.
The select without the join is plenty fast, it's just the comparison between tables that's slow.
Ideally, I believe that action.repo_id should be a foreign key to the repository table and then a delete cascade can be added to automatically "clean up" the action table as well.
@somera commented on GitHub (Sep 25, 2024):
On the index page (and on pagging)
On which hardware do you run gitea?
After #31752 the whole start page need ~3-5 seconds. It run on an NUC with an Intel Pentium Silver J5005 and 32GB RAM and PostgreSQL.
In my case the problem is this
which is made on every page. All other stuff is fast now.
@lunny commented on GitHub (Sep 25, 2024):
@somera #32127 will make it more faster after #31752 .