[PR #3452] [MERGED] Reduce repo indexer disk usage #16945

Closed
opened 2025-11-02 12:22:42 -06:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/go-gitea/gitea/pull/3452
Author: @ethantkoenig
Created: 2/3/2018
Status: Merged
Merged: 2/5/2018
Merged by: @lafriks

Base: master ← Head: repo_indexer_disk_usage


📝 Commits (1)

  • 55a3db8 Reduce repo indexer disk usage

📊 Changes

14 files changed (+704 additions, -97 deletions)

📝 models/issue_indexer.go (+2 -2)
📝 models/repo_indexer.go (+10 -6)
📝 modules/indexer/indexer.go (+30 -29)
📝 modules/indexer/issue.go (+32 -27)
📝 modules/indexer/repo.go (+43 -33)
➕ vendor/github.com/blevesearch/bleve/analysis/token/unique/unique.go (+53 -0)
➕ vendor/github.com/ethantkoenig/rupture/Gopkg.lock (+173 -0)
➕ vendor/github.com/ethantkoenig/rupture/Gopkg.toml (+34 -0)
➕ vendor/github.com/ethantkoenig/rupture/LICENSE (+21 -0)
➕ vendor/github.com/ethantkoenig/rupture/README.md (+13 -0)
➕ vendor/github.com/ethantkoenig/rupture/flushing_batch.go (+67 -0)
➕ vendor/github.com/ethantkoenig/rupture/metadata.go (+68 -0)
➕ vendor/github.com/ethantkoenig/rupture/sharded_index.go (+146 -0)
📝 vendor/vendor.json (+12 -0)

📄 Description

Reduces disk usage of the repo (i.e. code) indexer:

  • Disables bleve's _all field (with it enabled, we were previously storing everything twice)
  • Uses the bleve unique token filter (https://github.com/blevesearch/bleve/pull/739), since we only display the first occurrence of the search term.
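
To illustrate why a unique token filter shrinks the index, here is a minimal standard-library sketch of the idea: keep only the first occurrence of each term in a document's token stream. It is not bleve's implementation or API; the function name `uniqueTokens` and the whitespace tokenization are assumptions made for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// uniqueTokens keeps the first occurrence of each token and drops later
// repeats, preserving the original order. (Hypothetical helper, not part
// of bleve.)
func uniqueTokens(tokens []string) []string {
	seen := make(map[string]struct{}, len(tokens))
	out := make([]string, 0, len(tokens))
	for _, tok := range tokens {
		if _, dup := seen[tok]; dup {
			continue // already recorded for this document
		}
		seen[tok] = struct{}{}
		out = append(out, tok)
	}
	return out
}

func main() {
	// Source code repeats keywords heavily, so deduplication removes
	// many entries that would otherwise be stored per occurrence.
	tokens := strings.Fields("func main func init return nil return err")
	fmt.Println("before:", len(tokens), "after:", len(uniqueTokens(tokens)))
}
```

Because only the first match is shown in search results, dropping the later occurrences loses nothing the UI relies on.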

I saw a roughly 3x (1.5GB -> 500MB) reduction in disk usage as a result of these changes (of course, mileage will vary depending on what type of text/code you are indexing).

Also introduces migration-like versions to the issue and repo indexers to facilitate changes (which will typically require rebuilding the index).

Yes, this PR shamelessly pulls in https://github.com/ethantkoenig/rupture as a dependency to facilitate tracking indexer versions and migrations; I am aware of no other alternatives.
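
The version-tracking idea can be sketched with the standard library alone: store a version number next to the index, and rebuild whenever the stored version is missing or differs from the one the code expects. This is only an illustration of the concept, not rupture's API; the names `indexMeta`, `needsRebuild`, `resetIndex`, `runDemo`, and the file name `index_meta.json` are all hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// indexMeta is a hypothetical metadata record stored beside the index data.
type indexMeta struct {
	Version int `json:"version"`
}

// metaPath returns the (assumed) location of the metadata file.
func metaPath(indexDir string) string {
	return filepath.Join(indexDir, "index_meta.json")
}

// needsRebuild reports whether the index is missing, has unreadable
// metadata, or was written by a different indexer version.
func needsRebuild(indexDir string, want int) bool {
	data, err := os.ReadFile(metaPath(indexDir))
	if err != nil {
		return true
	}
	var meta indexMeta
	if err := json.Unmarshal(data, &meta); err != nil {
		return true
	}
	return meta.Version != want
}

// resetIndex wipes indexDir and records the version that will populate it.
func resetIndex(indexDir string, version int) error {
	if err := os.RemoveAll(indexDir); err != nil {
		return err
	}
	if err := os.MkdirAll(indexDir, 0o755); err != nil {
		return err
	}
	data, err := json.Marshal(indexMeta{Version: version})
	if err != nil {
		return err
	}
	return os.WriteFile(metaPath(indexDir), data, 0o644)
}

// runDemo exercises the check in a temporary directory: before any index
// exists, after creating one at version 1, and after bumping to version 2.
func runDemo() (missing, same, bumped bool, err error) {
	dir, err := os.MkdirTemp("", "indexer-sketch")
	if err != nil {
		return false, false, false, err
	}
	defer os.RemoveAll(dir)

	indexDir := filepath.Join(dir, "repos.bleve")
	missing = needsRebuild(indexDir, 1)
	if err = resetIndex(indexDir, 1); err != nil {
		return false, false, false, err
	}
	same = needsRebuild(indexDir, 1)
	bumped = needsRebuild(indexDir, 2)
	return missing, same, bumped, nil
}

func main() {
	missing, same, bumped, err := runDemo()
	if err != nil {
		panic(err)
	}
	fmt.Println(missing, same, bumped)
}
```

With this shape, a change to the index mapping only requires bumping the expected version constant; the next startup detects the mismatch and repopulates the index.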


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2025-11-02 12:22:42 -06:00
Reference: github-starred/gitea#16945