Many database indexes are irrelevant and or obsolete #3873

Open
opened 2025-11-02 05:29:07 -06:00 by GiteaMirror · 2 comments
Owner

Originally created by @guillep2k on GitHub (Aug 29, 2019).

  • Gitea version (or commit ref): v1.9

Description

A quick check through the indexes created on a Gitea database (in this case, MSSQL), shows that there are many indexes that should not exist.

The following observations are based on the fact that in all cases the indexes are normal B-Trees (e.g. not BITMAP). B-Trees cannot be combined together, so the database must choose only one of the indexes for each particular query. For example, if user_id and repo_id are indexed separately for the table issues, the statement SELECT * FROM issues WHERE user_id = ? AND repo_id = ? will force the database to choose between the user_id and the repo_id indexes, it cannot use both.

Some examples to consider are:

Table Index name Columns Observations
action IDX_action_is_deleted is_deleted Very low cardinality
action IDX_action_is_private is_private Very low cardinality
collaboration UQE_collaboration_s repo_id, user_id Makes IDX_collaboration_repo_id obsolete
collaboration IDX_collaboration_repo_id repo_id Obsoleted by UQE_collaboration_s
commit_status IDX_commit_status_index index Unlikely to query by index alone; requires sha and repo_id
deleted_branch UQE_deleted_branch_s repo_id, name, commit Makes IDX_deleted_branch_repo_id obsolete
deleted_branch IDX_deleted_branch_repo_id repo_id Obsoleted by IDX_deleted_branch_repo_id
deploy_key UQE_deploy_key_s key_id, repo_id Makes IDX_deploy_key_key_id obsolete
deploy_key IDX_deploy_key_key_id key_id Obsoleted by UQE_deploy_key_s
issue IDX_issue_is_closed is_closed Very low cardinality
issue IDX_issue_is_pull is_pull Very low cardinality
issue UQE_issue_repo_index repo_id, index Makes IDX_issue_repo_id obsolete
issue IDX_issue_repo_id repo_id Obsoleted by UQE_issue_repo_index
lfs_meta_object UQE_lfs_meta_object_s oid, repository_id Makes IDX_lfs_meta_object_oid obsolete
lfs_meta_object IDX_lfs_meta_object_oid oid Obsoleted by UQE_lfs_meta_object_s

(And the list goes on...)

Indexes that are not useful for querying the database hurt INSERT and UPDATE performance, as the database needs to maintain their data even if they will not be used later.

The two cases detected are:

  • Very low cardinality: since there are very few possible values (e.g. true, false), there's no chance that the database will choose the index for a query. There will be other indexes more relevant for the condition. Even if is_private = 1 is the only condition for the query, in terms of I/O the database will be better off doing a full table scan than using the index.

  • Superseded by another index: if two indexes share the same prefix, only one of them is needed. For example, if one index has the columns a, b and another has a, b, c, the database can use the second for any queries with conditions on a, b columns, even if the column c is not used.

Before making a PR about this, I'd like to hear you guys take on this matter.

Originally created by @guillep2k on GitHub (Aug 29, 2019). - Gitea version (or commit ref): v1.9 ## Description A quick check through the indexes created on a Gitea database (in this case, MSSQL), shows that there are many indexes that should not exist. The following observations are based on the fact that in all cases the indexes are normal B-Trees (e.g. not BITMAP). B-Trees cannot be combined together, so the database must choose only one of the indexes for each particular query. For example, if `user_id` and `repo_id` are indexed separately for the table `issues`, the statement `SELECT * FROM issues WHERE user_id = ? AND repo_id = ?` will force the database to choose between the `user_id` and the `repo_id` indexes, it cannot use both. Some examples to consider are: | Table | Index name | Columns | Observations | |-----------------|----------------------------|-----------------------|------------------------------------------------------------| | action | IDX_action_is_deleted | is_deleted | Very low cardinality | | action | IDX_action_is_private | is_private | Very low cardinality | | collaboration | UQE_collaboration_s | repo_id, user_id | Makes IDX_collaboration_repo_id obsolete | | collaboration | IDX_collaboration_repo_id | repo_id | Obsoleted by UQE_collaboration_s | | commit_status | IDX_commit_status_index | index | Unlikely to query by index alone; requires sha and repo_id | | deleted_branch | UQE_deleted_branch_s | repo_id, name, commit | Makes IDX_deleted_branch_repo_id obsolete | | deleted_branch | IDX_deleted_branch_repo_id | repo_id | Obsoleted by IDX_deleted_branch_repo_id | | deploy_key | UQE_deploy_key_s | key_id, repo_id | Makes IDX_deploy_key_key_id obsolete | | deploy_key | IDX_deploy_key_key_id | key_id | Obsoleted by UQE_deploy_key_s | | issue | IDX_issue_is_closed | is_closed | Very low cardinality | | issue | IDX_issue_is_pull | is_pull | Very low cardinality | | issue | UQE_issue_repo_index | repo_id, index | Makes IDX_issue_repo_id obsolete | | issue | IDX_issue_repo_id | repo_id | Obsoleted by UQE_issue_repo_index | | lfs_meta_object | UQE_lfs_meta_object_s | oid, repository_id | Makes IDX_lfs_meta_object_oid obsolete | | lfs_meta_object | IDX_lfs_meta_object_oid | oid | Obsoleted by UQE_lfs_meta_object_s | (And the list goes on...) Indexes that are not useful for querying the database hurt INSERT and UPDATE performance, as the database needs to maintain their data even if they will not be used later. The two cases detected are: * **Very low cardinality**: since there are very few possible values (e.g. `true`, `false`), there's no chance that the database will choose the index for a query. There will be other indexes more relevant for the condition. Even if `is_private = 1` is the only condition for the query, in terms of I/O the database will be better off doing a full table scan than using the index. * **Superseded by another index**: if two indexes share the same prefix, only one of them is needed. For example, if one index has the columns `a, b` and another has `a, b, c`, the database can use the second for any queries with conditions on `a, b` columns, even if the column `c` is not used. Before making a PR about this, I'd like to hear you guys take on this matter.
GiteaMirror added the type/proposalissue/confirmed labels 2025-11-02 05:29:07 -06:00
Author
Owner

@lafriks commented on GitHub (Aug 29, 2019):

I agree that these needs to be cleaned up! 👍

@lafriks commented on GitHub (Aug 29, 2019): I agree that these needs to be cleaned up! :+1:
Author
Owner

@stale[bot] commented on GitHub (Oct 28, 2019):

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs during the next 2 weeks. Thank you for your contributions.

@stale[bot] commented on GitHub (Oct 28, 2019): This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs during the next 2 weeks. Thank you for your contributions.
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: github-starred/gitea#3873