Support better search engines for indexing (elasticsearch, mysql/pgsql/sqlite fulltext or something else) #3200

Closed
opened 2025-11-02 05:03:40 -06:00 by GiteaMirror · 8 comments
Originally created by @vitalif on GitHub (Apr 16, 2019).

  • Gitea 1.7.6
  • Git 2.11
  • Operating system: Linux
  • Database: all
  • Can you reproduce the bug at https://try.gitea.io:
    • Not relevant

Description

The Bleve indexer is very inefficient: it uses a lot of disk space and a lot of memory. It also keeps the whole index mmap'ed all the time, which makes Gitea crash when I enable it on my 32-bit server: with just 1.4 GB of git repositories, it crashes after generating 2 GB of index.

There are many more popular and efficient full-text search engines, starting with the ones built into Postgres / MySQL / SQLite (MySQL's is not the most efficient, but it still works). Then there's Elasticsearch and so on. Index sizes are much smaller in Elasticsearch and Postgres (relative to the size of the indexed data).

GiteaMirror added the type/proposal label 2025-11-02 05:03:40 -06:00

@lunny commented on GitHub (Apr 16, 2019):

We are refactoring the issue indexer; after that, we will start refactoring the code indexer. You can find some of the PRs, e.g. https://github.com/go-gitea/gitea/pull/6150

@adamcavendish commented on GitHub (May 1, 2019):

Hi, I've also seen in the configuration files that there are two ISSUE_INDEXER_TYPE values available. What are the differences between "db" and "bleve"? Is it safe to change from "bleve" to "db" in the config file and then simply restart?

@lunny commented on GitHub (May 6, 2019):

@adamcavendish db will use the database's `LIKE` to search issues. Your operations are safe. But both types are inefficient.
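
For readers unfamiliar with what a `LIKE`-based search implies, here is a minimal sketch using Python's built-in `sqlite3` module. The `issue` table, its columns, and the `search_issues_like` helper are hypothetical names chosen for illustration; they are not Gitea's actual schema or code. The pattern starts with a wildcard, so the database generally cannot use an ordinary index and has to scan every row, which is one plausible reason this approach is called inefficient above.

```python
import sqlite3


def search_issues_like(conn, keyword):
    """Return ids of issues whose title or content contains keyword.

    Hypothetical helper: a plain substring match via SQL LIKE.
    """
    # Escape LIKE wildcards so the keyword is matched literally.
    escaped = (
        keyword.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    )
    pattern = f"%{escaped}%"  # leading % forces a scan of every row
    rows = conn.execute(
        "SELECT id FROM issue "
        "WHERE title LIKE ? ESCAPE '\\' OR content LIKE ? ESCAPE '\\' "
        "ORDER BY id",
        (pattern, pattern),
    )
    return [row[0] for row in rows]


# Illustrative in-memory data, not taken from any real instance.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE issue (id INTEGER PRIMARY KEY, title TEXT, content TEXT)"
)
conn.executemany(
    "INSERT INTO issue (id, title, content) VALUES (?, ?, ?)",
    [
        (1, "Indexer crashes on 32-bit", "Bleve mmap uses too much memory"),
        (2, "Add Elasticsearch backend", "Code search should scale"),
        (3, "Docs typo", "Fix spelling in README"),
    ],
)

print(search_issues_like(conn, "mmap"))
```

Note that this only finds literal substrings; it does no tokenization, stemming, or ranking.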

@alexanderadam commented on GitHub (Jul 3, 2019):

Proper search support could fix things like #5694, #5277, #3448, #2967, #2434, #8366, #8386, #7825, #10147 and #10764 if implemented properly. Those might not be the "same", but their cause is [probably] the current indexing/searching implementation.

And I guess it would also help to lower the number of memory-related bugs (e.g. #4807).

@jeffliu27 commented on GitHub (Jul 5, 2019):

Would love to help with the implementation for the elasticsearch code search backend! @lunny @jeblair

@rcarmo commented on GitHub (May 24, 2020):

Just a note that SQLite has FTS indexing, and that it is quite efficient (I have gigabytes of plain text files indexed that way)
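
As a rough illustration of the SQLite full-text indexing mentioned here, the sketch below uses FTS5 (one of SQLite's FTS modules) through Python's built-in `sqlite3` module. It assumes the underlying SQLite library was compiled with FTS5 enabled, which is common but not guaranteed. The `docs` table, its columns, the sample rows, and the `search_docs` helper are all hypothetical and are not part of Gitea.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Virtual table backed by a full-text index.
# Assumption: this SQLite build includes the FTS5 extension.
conn.execute("CREATE VIRTUAL TABLE docs USING fts5(path, body)")
conn.executemany(
    "INSERT INTO docs (path, body) VALUES (?, ?)",
    [
        ("indexer/bleve.go", "open the bleve index and keep it mapped in memory"),
        ("indexer/db.go", "search issues with a database query"),
        ("README.md", "installation notes"),
    ],
)


def search_docs(conn, query):
    """Return paths of documents matching an FTS5 query, best match first."""
    rows = conn.execute(
        "SELECT path FROM docs WHERE docs MATCH ? ORDER BY rank",
        (query,),
    )
    return [row[0] for row in rows]


print(search_docs(conn, "database query"))
```

Unlike a `LIKE` scan, the `MATCH` query is answered from the token index, and multiple bare terms are combined with an implicit AND.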

@rcarmo commented on GitHub (Sep 11, 2020):

So no built-in, single binary SQLite search?

@lunny commented on GitHub (Sep 11, 2020):

@rcarmo The issue indexer supports built-in SQLite search; the code indexer supports built-in bleve search.

Reference: github-starred/gitea#3200