Elasticsearch: Request Entity Too Large #12044

Closed
opened 2025-11-02 09:55:54 -06:00 by GiteaMirror · 9 comments

Originally created by @markusamshove on GitHub (Nov 19, 2023).

Description

I've tried to enable code indexing in our instance using Elasticsearch, but I get the following error for a lot of repositories:

workergroup.go:102:doWorkerHandle() [E] Queue "code_indexer" failed to handle batch of 20 items, backoff for a few seconds
indexer.go:128:func2() [E] Codes indexer handler: index error for repo 713: elastic: Error 413 (Request Entity Too Large)
workergroup.go:102:doWorkerHandle() [E] Queue "code_indexer" failed to handle batch of 1 items, backoff for a few seconds

I've changed the setting http.max_content_length in the Elasticsearch config to the maximum possible value, 2147483647b, but the error still comes up.

This also comes up for a lot of repositories, not just our biggest ones.

I'm unsure how the indexer works: does it take the whole source code of a branch and pump it into Elasticsearch? Is some kind of batching per x files needed?

Gitea Version

1.21.0

Can you reproduce the bug on the Gitea demo site?

No

Log Gist

No response

Screenshots

No response

Git Version

No response

Operating System

No response

How are you running Gitea?

Running Gitea on Linux amd64 with the official binary and Elasticsearch within Docker

Database

None

GiteaMirror added the type/bug label 2025-11-02 09:55:54 -06:00

@markusamshove commented on GitHub (Nov 19, 2023):

The sizes (as reported in the Gitea web UI) of some repositories that I picked out of the log are:

  • 1.1 MiB
  • 7.8 MiB
  • 2 MiB
  • 598 MiB
  • 154 MiB
  • 2.8 MiB

That makes me wonder whether the small repositories are batched together with the big ones, so that the combined request exceeds the limit.

Reducing the max file size to MAX_FILE_SIZE=10000 does not seem to resolve the issue.
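
One possible explanation for why a per-file cap does not help: if all files of a repository end up in a single request, the request size grows with the number of files, not with the size of the largest one. The Go sketch below only illustrates that arithmetic. The function name `estimateBulkBytes` is hypothetical, and it assumes that files above the cap are skipped entirely, which may not match how Gitea actually treats MAX_FILE_SIZE.

```go
package main

// estimateBulkBytes sums the sizes of all files that pass a per-file cap.
// Assumption (not confirmed by the thread): files larger than maxFileSize
// are skipped entirely and contribute nothing to the request body.
// JSON and metadata overhead are ignored, so this is a lower bound.
func estimateBulkBytes(fileSizes []int64, maxFileSize int64) int64 {
	var total int64
	for _, size := range fileSizes {
		if size > maxFileSize {
			continue // file exceeds the cap and is left out of the request
		}
		total += size
	}
	return total
}
```

With made-up numbers, 300,000 files of 10,000 bytes each would already add up to 3,000,000,000 bytes, more than the 2147483647b ceiling mentioned above, even though every single file respects the cap.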


@inferno-umar commented on GitHub (Feb 4, 2024):

I'm having the same issue.
My repository, which is 319 MiB according to the UI, is not being indexed.


@wxiaoguang commented on GitHub (Feb 4, 2024):

to inferno-umar : Does this answer help?

https://stackoverflow.com/questions/58490210/the-remote-server-returned-an-error-413-request-entity-too-large-elasticsear


update: MarkusAmshove's report said that they have tried http.max_content_length, I am wondering whether these problems are the same.


@wxiaoguang commented on GitHub (Feb 4, 2024):

Unfortunately, after a quick look, I think your guess is right ... maybe Gitea does put everything into one request and send it to elasticsearch, since the first elasticsearch PR: #10273

https://github.com/go-gitea/gitea/blob/688d4a1f719d2df4d2626453f4bc042c1874a375/modules/indexer/code/elasticsearch/elasticsearch.go#L182-L188


@inferno-umar commented on GitHub (Feb 4, 2024):

> Unfortunately, after a quick look, I think your guess is right ... maybe Gitea does put everything into one request and send it to elasticsearch, since the first elasticsearch PR: #10273
>
> https://github.com/go-gitea/gitea/blob/688d4a1f719d2df4d2626453f4bc042c1874a375/modules/indexer/code/elasticsearch/elasticsearch.go#L182-L188

Yes, you're right: Gitea puts everything into one request before sending it to Elasticsearch instead of batching it, as my error logs below show:

....
...exer/code/indexer.go:128:func2() [E] Codes indexer handler: index error for repo 28: elastic: Error 413 (Request Entity Too Large)
...queue/workergroup.go:102:doWorkerHandle() [E] Queue "code_indexer" failed to handle batch of 1 items, backoff for a few seconds
.....

@inferno-umar commented on GitHub (Feb 4, 2024):

After finding this out, I raised my Elasticsearch maximum limit to 2147483647b, and now I get the following error 429 (Too Many Requests):

 ...exer/code/indexer.go:128:func2() [E] Codes indexer handler: index error for repo 28: elastic: Error 429 (Too Many Requests): [in_flight_requests] Data too large, data for [<http_request>] would be  [1272753908/1.1gb], which is larger than the limit of [1073741824/1gb] [type=circuit_breaking_exception]
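
The figures in this log line can be turned into a rough estimate of how many smaller requests the same data would need. The Go sketch below is plain ceiling division and nothing more. `requestsNeeded` is a hypothetical helper, and the 100 MiB figure in the usage note is the commonly documented default for http.max_content_length, which should be treated as an assumption for any particular cluster. Note also that the in-flight figure in the log may cover more than this one request.

```go
package main

// requestsNeeded returns how many requests of at most limitBytes are
// required to carry totalBytes, using ceiling division.
// It returns 0 when either argument is not positive.
func requestsNeeded(totalBytes, limitBytes int64) int64 {
	if totalBytes <= 0 || limitBytes <= 0 {
		return 0
	}
	return (totalBytes + limitBytes - 1) / limitBytes
}
```

For the 1272753908 bytes reported above, `requestsNeeded(1272753908, 1073741824)` yields 2 against the 1gb limit from the log, and `requestsNeeded(1272753908, 100<<20)` yields 13 against an assumed 100 MiB limit.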

@inferno-umar commented on GitHub (Feb 5, 2024):

I'm trying to fix this issue in the code by batching the requests.

> Unfortunately, after a quick look, I think your guess is right ... maybe Gitea does put everything into one request and send it to elasticsearch, since the first elasticsearch PR: #10273
>
> https://github.com/go-gitea/gitea/blob/688d4a1f719d2df4d2626453f4bc042c1874a375/modules/indexer/code/elasticsearch/elasticsearch.go#L182-L188
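
The batching idea mentioned in this comment could look roughly like the Go sketch below. It is not the code from Gitea or from any pull request referenced in this thread. `splitBySize` is a hypothetical helper that groups already-encoded documents into batches whose combined size stays under a byte budget, so that each batch can be sent as its own request. A single document larger than the budget still gets a batch of its own, so the caller would have to handle that case separately.

```go
package main

// splitBySize groups docs into consecutive batches so that the summed
// length of each batch does not exceed maxBytes. Order is preserved.
// A document that alone exceeds maxBytes is placed in its own batch;
// this sketch does not try to split or drop it.
func splitBySize(docs [][]byte, maxBytes int) [][][]byte {
	var batches [][][]byte
	var current [][]byte
	currentBytes := 0

	for _, doc := range docs {
		// Close the running batch before it would exceed the budget.
		if len(current) > 0 && currentBytes+len(doc) > maxBytes {
			batches = append(batches, current)
			current = nil
			currentBytes = 0
		}
		current = append(current, doc)
		currentBytes += len(doc)
	}

	// Flush whatever is left after the loop.
	if len(current) > 0 {
		batches = append(batches, current)
	}
	return batches
}
```

Each returned batch would then be sent in a separate request. Choosing the budget well below the server's request size limit leaves room for per-document metadata that this sketch does not count.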


@lunny commented on GitHub (Feb 19, 2024):

Fixed by #29075


@github-actions[bot] commented on GitHub (Mar 1, 2024):

Automatically locked because of our CONTRIBUTING guidelines

Reference: github-starred/gitea#12044