Search Functionality Issues with Bleve Engine #13242

Closed
opened 2025-11-02 10:35:56 -06:00 by GiteaMirror · 34 comments

Originally created by @amix307 on GitHub (Jul 5, 2024).

Description

Hi,

I'm following up on my previous ticket: https://github.com/go-gitea/gitea/issues/30064. I always keep the service updated to the latest version, and the search problem persists even on version 1.22.1. I'm using the built-in Bleve engine.

Here’s another example of the issue:
When searching for the string "services.gradle.org", there is one file in the repository where this string should be found. However, the Exact search method does not find the result, and the Fuzzy search hangs for about 2-3 minutes without finding anything.

To give more context about my instance: I have about 300 organizations and 3000 repositories, but overall the size is small as I don't have heavy files. I know for sure that there are many occurrences of the search string across default branches, likely several hundred. Despite this, the Exact search method returns only about 10 results and does so instantly. The Fuzzy search, however, hangs indefinitely. On version 1.21.11, I even encountered a 500 error and the Gitea service restarted.

Manual re-indexing does not help. I would like to resolve the search issues, have more transparent ways to understand how the code is indexed, and have more flexible control over Bleve settings.

Thanks in advance for your help. I've attached some screenshots for reference.

Gitea Version

1.22.1

Can you reproduce the bug on the Gitea demo site?

No

Log Gist

No response

Screenshots

2024-07-05_11-35-26
2024-07-05_11-37-40
2024-07-05_11-38-37
2024-07-05_11-41-55

Git Version

2.31.1

Operating System

CentOS Stream 9

How are you running Gitea?

Self-Hosted from dl.gitea.org

Database

PostgreSQL


@silverwind commented on GitHub (Jul 5, 2024):

> the Fuzzy search hangs for about 2-3 minutes without finding anything.

I also noticed this search hanging starting with v1.22. If kept running, gitea continuously consumes more and more memory up until the point where it exhausts the system resources. It's only for certain search terms, not all of them.


@amix307 commented on GitHub (Jul 5, 2024):

> > the Fuzzy search hangs for about 2-3 minutes without finding anything.
>
> I also noticed this search hanging starting with v1.22. If kept running, gitea continuously consumes more and more memory up until the point where it exhausts the system resources. It's only for certain search terms, not all of them.

Yes, yesterday my instance hung and the Gitea service was restarted after an OOM, on a VM with 64 GB of RAM.


@MICCustomsSolutions commented on GitHub (Jul 5, 2024):

Same issue here. Eats all the RAM.
Disabling the index fixes the issue.


@carobme commented on GitHub (Jul 9, 2024):

I can confirm this issue. Using the fuzzy code search with v1.22.1 (self-hosted) results in Gitea memory usage growing until the process runs out of memory.

Rebuilding the index (by stopping Gitea, rm -rf indexers/* and starting again) doesn't make a difference.


@techknowlogick commented on GitHub (Jul 9, 2024):

Pinging @6543


@makar112233 commented on GitHub (Jul 10, 2024):

We also had the same issue.


@smartEBL commented on GitHub (Jul 17, 2024):

We are affected by this issue as well. A temporary fix for us is changing

[indexer]
REPO_INDEXER_ENABLED = true

to

[indexer]
REPO_INDEXER_ENABLED = false

Code search within a single repository still works then; global search does not (of course). So it would be really nice to see that fixed. Is there anything we could provide for debugging?


@amix307 commented on GitHub (Jul 19, 2024):

@smartEBL I have now switched to a single-node Elasticsearch running in a container on the same VM, and it works well, but I still have the same issue with the non-transparent indexing mechanism (not everything is searched).


@nekdan commented on GitHub (Jul 22, 2024):

We also encountered this issue and disabled the search functionality to ensure normal operation. However, this is a poor solution because we need the search functionality for our work.


@lunny commented on GitHub (Jul 22, 2024):

Can anyone reproduce it in the development version? I think it's related to bleve versions.


@kemzeb commented on GitHub (Jul 22, 2024):

> Can anyone reproduce it in the development version? I think it's related to bleve versions.

I am not yet familiar with how the code search indexing works nor am I familiar with bleve, but this could be the case after looking into the recent bleve releases.

In Gitea v1.21 we use bleve v2.3.10. Gitea v1.22 and onwards uses bleve v2.4.0.

bleve v2.4.1 adds a fix for a memory leak associated with their "vector query path" (more specifically, it was a problem in an indirect dependency called blevesearch/go-faiss).

I am not sure if this "path" is something that our code will eventually execute (or if it is only used by the vector indexing feature introduced in bleve v2.4.0), but I wish to bring this up to those that are maybe more familiar.


@silverwind commented on GitHub (Jul 23, 2024):

According to go.mod, gitea v1.22 still uses bleve v2.3.10, so exact same version as v1.21:

https://github.com/go-gitea/gitea/blob/release/v1.22/go.mod#L22
https://github.com/go-gitea/gitea/blob/release/v1.21/go.mod#L21

I think the issue must lie in first-party code.
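
The go.mod comparison above can be repeated for any release branch. The following is a minimal sketch, assuming the go.mod text has already been downloaded; the helper name `module_version` and the sample excerpt are hypothetical and only illustrate the check.

```python
import re


def module_version(go_mod_text, module):
    """Return the version pinned for `module` in go.mod text, or None."""
    # Matches "require <module> vX.Y.Z" as well as entries inside a
    # "require ( ... )" block, where the line starts with the module path.
    pattern = re.compile(
        r"^\s*(?:require\s+)?" + re.escape(module) + r"\s+(v\S+)",
        re.MULTILINE,
    )
    match = pattern.search(go_mod_text)
    return match.group(1) if match else None


# Hypothetical excerpt; a real go.mod lists many more modules.
SAMPLE_GO_MOD = """\
module code.gitea.io/gitea

require (
	github.com/blevesearch/bleve/v2 v2.3.10
	github.com/example/other v1.0.0 // indirect
)
"""

if __name__ == "__main__":
    print(module_version(SAMPLE_GO_MOD, "github.com/blevesearch/bleve/v2"))
```

Running the same function against the go.mod of two branches and comparing the results shows whether the bleve dependency changed between them.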


@kemzeb commented on GitHub (Jul 27, 2024):

After doing the following when firing up my dev Gitea instance:

  • Create an empty repo
  • Push Gitea's entire commit history to this empty repo (chose Gitea since it has a large history)
  • Execute a fuzzy code search with the query "merge.go"

I was able to notice the huge memory cost. Here is my generated pprof graph for reference.

I also checked out snapshot 1262ff6734 (as this was before major changes were made to code search fuzzing) and I did not notice a performance impact on memory when observing my heap usage in the admin dashboard.

Don't have time to dig into this further yet, but thought this could be helpful in some way.
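
For anyone repeating the steps above, a small probe can compare exact and fuzzy search latency. This is a minimal sketch, not the method used in this comment: it only measures response time, not heap usage, and the function names and the `http://localhost:3000` address are assumptions. The `/explore/code?q=...&fuzzy=...` URL shape is the one reported elsewhere in this thread. Point it at a disposable dev instance only, since the fuzzy request may trigger the memory growth described here.

```python
import time
import urllib.error
import urllib.request
from urllib.parse import urlencode


def code_search_url(base_url, query, fuzzy):
    """Build the global code search URL for a Gitea instance."""
    params = urlencode({"q": query, "fuzzy": "true" if fuzzy else "false"})
    return f"{base_url.rstrip('/')}/explore/code?{params}"


def time_search(base_url, query, fuzzy, timeout=30.0):
    """Return elapsed seconds, or None if the request failed or timed out.

    `timeout` applies to each blocking socket operation, so it is only an
    approximate cap on the total request time.
    """
    url = code_search_url(base_url, query, fuzzy)
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()
    except (urllib.error.URLError, TimeoutError, OSError):
        return None
    return time.monotonic() - start


# Example (assumed dev instance address):
#   for fuzzy in (False, True):
#       print(fuzzy, time_search("http://localhost:3000", "merge.go", fuzzy))
```

A `None` result for the fuzzy request while the exact request returns quickly would match the behaviour reported in this issue.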


@lunny commented on GitHub (Aug 16, 2024):

Bleve has been upgraded to v2.4.2 via #31762, but I don't know whether this issue has been resolved.


@amix307 commented on GitHub (Aug 20, 2024):

> Bleve has been upgraded to v2.4.2 via #31762, but I don't know whether this issue has been resolved.

I can try to test on the 1.22 nightly.


@silverwind commented on GitHub (Aug 29, 2024):

> @smartEBL I have now switched to a single-node Elasticsearch running in a container on the same VM, and it works well, but I still have the same issue with the non-transparent indexing mechanism (not everything is searched).

I switched to elasticsearch repo indexer as well now. It seems to be more efficient regarding disk usage than bleve, likely because of missing compression of the bleve index files. I noticed one bug so far in that exact search does not work correctly with elasticsearch, it's always fuzzy.


@smartEBL commented on GitHub (Sep 2, 2024):

@amix307 @silverwind
What do your setups look like now? How did you change the app.ini to achieve that?


@amix307 commented on GitHub (Sep 2, 2024):

> @amix307 @silverwind What do your setups look like now? How did you change the app.ini to achieve that?

[indexer]

; before:

;ISSUE_INDEXER_NAME = gitea_issues
;ISSUE_INDEXER_TYPE = bleve
;MAX_FILE_SIZE = 1048576
;REPO_INDEXER_ENABLED = false
;REPO_INDEXER_EXCLUDE = resources/bin/**
;REPO_INDEXER_INCLUDE =
;REPO_INDEXER_PATH = indexers/repos.bleve
;REPO_INDEXER_TYPE = bleve
;STARTUP_TIMEOUT = 30s

; now:

REPO_INDEXER_ENABLED = true
REPO_INDEXER_REPO_TYPES = sources,forks,mirrors,templates
REPO_INDEXER_TYPE = elasticsearch
REPO_INDEXER_CONN_STR = http://elastic:changeme@localhost:9200
REPO_INDEXER_NAME = gitea_codes
REPO_INDEXER_INCLUDE =
REPO_INDEXER_EXCLUDE =
MAX_FILE_SIZE = 1048576
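
To confirm which indexer settings are actually active after such a change (commented-out `;` lines are ignored), the `[indexer]` section can be read with Python's standard `configparser`. This is a minimal sketch under stated assumptions: the helper name `indexer_settings`, the `[__root__]` placeholder section, and the shortened sample are hypothetical, and Gitea's own parser may treat edge cases differently.

```python
import configparser


def indexer_settings(app_ini_text):
    """Return the active keys of the [indexer] section as a plain dict."""
    parser = configparser.ConfigParser(
        interpolation=None,   # values such as URLs are taken literally
        strict=False,         # tolerate repeated sections or keys
        delimiters=("=",),    # only "=" separates key and value
    )
    # Keep key case as written (configparser lowercases keys by default).
    parser.optionxform = str
    # app.ini may begin with keys outside any section; a placeholder
    # section header lets configparser accept such a file.
    parser.read_string("[__root__]\n" + app_ini_text)
    if not parser.has_section("indexer"):
        return {}
    return dict(parser["indexer"])


# Shortened, hypothetical sample in the style of the config above.
SAMPLE_APP_INI = """\
[indexer]
; before:
;REPO_INDEXER_TYPE = bleve
; now:
REPO_INDEXER_ENABLED = true
REPO_INDEXER_TYPE = elasticsearch
REPO_INDEXER_NAME = gitea_codes
REPO_INDEXER_EXCLUDE =
"""

if __name__ == "__main__":
    for key, value in indexer_settings(SAMPLE_APP_INI).items():
        print(f"{key} = {value!r}")
```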

@smartEBL commented on GitHub (Sep 2, 2024):

> > @amix307 @silverwind What do your setups look like now? How did you change the app.ini to achieve that?
>
> [indexer]
>
> ; before:
>
> ;ISSUE_INDEXER_NAME = gitea_issues
> ;ISSUE_INDEXER_TYPE = bleve
> ;MAX_FILE_SIZE = 1048576
> ;REPO_INDEXER_ENABLED = false
> ;REPO_INDEXER_EXCLUDE = resources/bin/**
> ;REPO_INDEXER_INCLUDE =
> ;REPO_INDEXER_PATH = indexers/repos.bleve
> ;REPO_INDEXER_TYPE = bleve
> ;STARTUP_TIMEOUT = 30s
>
> ; now:
>
> REPO_INDEXER_ENABLED = true
> REPO_INDEXER_REPO_TYPES = sources,forks,mirrors,templates
> REPO_INDEXER_TYPE = elasticsearch
> REPO_INDEXER_CONN_STR = http://elastic:changeme@localhost:9200
> REPO_INDEXER_NAME = gitea_codes
> REPO_INDEXER_INCLUDE =
> REPO_INDEXER_EXCLUDE =
> MAX_FILE_SIZE = 1048576

Cool, thanks for the fast reply. And how complicated is the setup/configuration of Elasticsearch? Did you also just spawn it inside a Docker container? Or were there any big showstoppers you came across?


@silverwind commented on GitHub (Sep 2, 2024):

We're getting off-topic, but it's pretty trivial, for example with docker run:

docker run --name="elasticsearch" \
  --volume="/opt/elasticsearch:/usr/share/elasticsearch/data" \
  --volume="/usr/share/zoneinfo:/usr/share/zoneinfo:ro" \
  --volume="/etc/localtime:/etc/localtime:ro" \
  --env="discovery.type=single-node" \
  --env="xpack.security.enabled=false" \
  --env="ingest.geoip.downloader.enabled=false" \
  --env="ELASTIC_USERNAME=elastic" \
  --env="ELASTIC_PASSWORD=changeme" \
  --publish="0.0.0.0:9200:9200" \
  --restart="always" \
  --detach="true" \
  "elasticsearch:7.17.23"

@planbnet commented on GitHub (Sep 6, 2024):

I just had this problem with a simple search on our local instance running Gitea 1.22.1.
The installation has a few hundred repos and Bleve configured like this:

[indexer]
REPO_INDEXER_ENABLED = true
REPO_INDEXER_PATH = indexers/repos.bleve
MAX_FILE_SIZE = 1048576
REPO_INDEXER_INCLUDE =
REPO_INDEXER_EXCLUDE = client/bower_components/**,bower_components/**,node-modules/**,client/node-modules/**

16 GB of memory are slowly eaten up by the following goroutine (taken from Gitea's monitoring/stacktrace view). Clicking the trash can icon to kill the process does nothing.

GET: /explore/code?q=build.yaml&fuzzy=true

  • [...until here, there are always different function calls listed with each reload of the monitoring view, but the following stacktrace items will always be the same..]

  • github.com/blevesearch/bleve/v2/search/searcher.(*PhraseSearcher).Next
    /go/pkg/mod/github.com/blevesearch/bleve/v2@v2.3.10/search/searcher/search_phrase.go:229

  • github.com/blevesearch/bleve/v2/search/collector.(*TopNCollector).Collect
    /go/pkg/mod/github.com/blevesearch/bleve/v2@v2.3.10/search/collector/topn.go:228

  • github.com/blevesearch/bleve/v2.(*indexImpl).SearchInContext
    /go/pkg/mod/github.com/blevesearch/bleve/v2@v2.3.10/index_impl.go:580

  • code.gitea.io/gitea/modules/indexer/code/bleve.(*Indexer).Search
    /source/modules/indexer/code/bleve/bleve.go:285

  • code.gitea.io/gitea/modules/indexer/code.PerformSearch
    /source/modules/indexer/code/search.go:138

  • code.gitea.io/gitea/routers/web/explore.Code
    /source/routers/web/explore/code.go:80


@johanvdw commented on GitHub (Sep 6, 2024):

> Bleve has been upgraded to v2.4.2 via #31762, but I don't know whether this issue has been resolved.

This is not solved in Gitea 1.22.2.


@planbnet commented on GitHub (Sep 6, 2024):

Just tested it with 1.22.2, still the exact same issue.

To prevent unintended DoS attacks on our server, I've disabled fuzzy search with a small nginx hack:

location /explore/code {
  if ($arg_fuzzy = 'true') {
    rewrite ^/explore/code /explore/code?q=$arg_q&fuzzy=false last;
  }
  proxy_pass http://localhost:3000;
}

If there's a cleaner workaround to keep the indexer, but disable fuzzy search, please let me know.
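
The intent of the rewrite rule above can be sketched outside nginx, for example to test the logic or port it to another reverse proxy. This is an approximation under stated assumptions, not the nginx semantics themselves: the function name `force_exact_search` is hypothetical, it rewrites any `fuzzy=true` parameter rather than only the first one, and it keeps all other query parameters.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def force_exact_search(url):
    """Turn fuzzy=true into fuzzy=false on /explore/code URLs."""
    parts = urlsplit(url)
    # Same prefix match as the nginx "location /explore/code" block.
    if not parts.path.startswith("/explore/code"):
        return url
    params = parse_qsl(parts.query, keep_blank_values=True)
    if ("fuzzy", "true") not in params:
        return url
    rewritten = [(k, "false" if k == "fuzzy" else v) for k, v in params]
    return urlunsplit(parts._replace(query=urlencode(rewritten)))


if __name__ == "__main__":
    print(force_exact_search("/explore/code?q=build.yaml&fuzzy=true"))
```

Requests that do not ask for fuzzy search, or that target other paths, pass through unchanged.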


@jpraet commented on GitHub (Oct 17, 2024):

I also got hit by this unpleasant surprise of OOM crashes after upgrading to gitea 1.22.

When such a significant problem is discovered in a new release, it would be nice if a warning were added to the release notes and blog posts.

Are we sure this is an upstream issue? Given https://github.com/go-gitea/gitea/issues/31565#issuecomment-2244027712


@jpraet commented on GitHub (Oct 18, 2024):

Perhaps this one is related? https://github.com/go-gitea/gitea/pull/29706


@lunny commented on GitHub (Oct 18, 2024):

> Perhaps this one is related? #29706

Do you mean it only happens with fuzzy search?


@jpraet commented on GitHub (Oct 18, 2024):

Yes, exact search seems to be working fine, hence https://github.com/go-gitea/gitea/issues/31565#issuecomment-2333563036.


@jpraet commented on GitHub (Nov 21, 2024):

I want to draw some attention to this issue, @go-gitea/maintainers, as it causes OOM crashes of Gitea instances.


@johanvdw commented on GitHub (Nov 21, 2024):

yes, it is a security issue.


@lunny commented on GitHub (Nov 21, 2024):

> Just tested it with 1.22.2, still the exact same issue.
>
> To prevent unintended DoS attacks on our server, I've disabled fuzzy search with a small nginx hack:
>
> location /explore/code {
>   if ($arg_fuzzy = 'true') {
>     rewrite ^/explore/code /explore/code?q=$arg_q&fuzzy=false last;
>   }
>   proxy_pass http://localhost:3000;
> }
>
> If there's a cleaner workaround to keep the indexer, but disable fuzzy search, please let me know.

Maybe, as a first step, we can add an option to disable fuzzy search.


@wxiaoguang commented on GitHub (Nov 22, 2024):

I can see some bleve-related fuzzy search changes in 1.22: Determine fuzziness of bleve indexer by keyword length #29706, Expose fuzzy search for issues/pulls #29701

Just a guess: would reverting #29706 resolve the problem? Also cc @6543


@6543 commented on GitHub (Nov 22, 2024):

👀


@lunny commented on GitHub (Jan 2, 2025):

We can temporarily disable fuzzy search for the bleve search engine in v1.23.


@wxiaoguang commented on GitHub (Jan 2, 2025):

> I can see some bleve-related fuzzy search changes in 1.22: Determine fuzziness of bleve indexer by keyword length #29706, Expose fuzzy search for issues/pulls #29701
>
> Just a guess: would reverting #29706 resolve the problem? Also cc @6543

@6543 then we need another dirty patch to the indexer system: Fix bleve fuzziness search #33078

Reference: github-starred/gitea#13242