mirror of
https://github.com/go-gitea/gitea.git
synced 2026-05-21 11:55:31 -05:00
OOM caused by numerous crawls #14106
Closed
opened 2025-11-02 11:02:59 -06:00 by GiteaMirror · 24 comments
issue/needs-feedback
Reference: github-starred/gitea#14106
Originally created by @H0llyW00dzZ on GitHub (Feb 6, 2025).
Description
In the latest versions, 1.23.2 and 1.23.3, memory leaks occur. (Update: see below; this is not a memory leak and not a regression.) These OOMs are caused by numerous crawls, such as those run by Facebook Inc. (Meta), Amazon (AWS), and other entities that fetch data excessively for AI training.
My Gitea self-hosted configuration:
Screenshots
The logs exemplify how these companies use crawls for their AI.
Essentially, memory leaks occur when there are many fetch requests, leading to crashes due to excessive memory consumption (the pod gets OOM-killed by Kubernetes).
@wxiaoguang commented on GitHub (Feb 6, 2025):
Could you download a diagnosis report from "admin panel -> monitor -> trace" when the memory goes high?
The report contains heap dump (no sensitive data) and could help to locate the problem.
@H0llyW00dzZ commented on GitHub (Feb 6, 2025):
Here is the system notice:
This is the system status page, which shows inconsistent values, as I mentioned earlier in #33311.
@wxiaoguang commented on GitHub (Feb 6, 2025):
Could you download a diagnosis report from "admin panel -> monitor -> trace" when the memory goes high?
The report contains heap dump (no sensitive data) and could help to locate the problem.
@wxiaoguang commented on GitHub (Feb 6, 2025):
If the memory is not consumed by the Gitea process, then maybe you need to figure out which process consumes it, for example a git process or some other command.
@H0llyW00dzZ commented on GitHub (Feb 6, 2025):
I can't capture the memory usage when it spikes via the trace admin panel because every time memory consumption goes high (e.g., 7 GiB), the pod crashes after being OOM-killed by Kubernetes.
@wxiaoguang commented on GitHub (Feb 6, 2025):
Is it clear which process consumes that much memory? The Gitea web server process itself, or other processes like "ssh" or "git" or "gitea serve/hook"?
@wxiaoguang commented on GitHub (Feb 6, 2025):
If the OOM is caused by crawls, then it isn't a regression: each request consumes memory, and requests for large repos/files consume more, so if there are a lot of requests, they consume a lot of memory in total and can lead to OOM. Maybe you could try to stop the crawls and/or require sign-in for your instance.
So I think we need to make the problem clearer.
@H0llyW00dzZ commented on GitHub (Feb 6, 2025):
Most likely, it's from Git because the stack trace shows this:
When there are many requests, such as GET requests to view repositories from crawls, memory consumption goes high, and it crashes due to being OOM killed by Kubernetes.
@H0llyW00dzZ commented on GitHub (Feb 6, 2025):
Also, I've now rolled back to version 1.23.1 and reduced the cache for last commit messages from 10K to 5K in the app.ini configuration. Let's see if it still crashes.
@wxiaoguang commented on GitHub (Feb 6, 2025):
TBH, I do not see any related change between 1.23.1 and 1.23.3:
https://github.com/go-gitea/gitea/compare/v1.23.1...v1.23.3
@H0llyW00dzZ commented on GitHub (Feb 6, 2025):
Well, it worked fine for me previously, with an uptime of over a month without crashing due to high memory consumption.
And now, after rolling back, it still crashes.
@wxiaoguang commented on GitHub (Feb 6, 2025):
Well, as I said above: it can't be a regression, it can't be related to the new version.
There are just more crawls now. If you do not have enough resources to support the crawls, maybe you need to block them.
@H0llyW00dzZ commented on GitHub (Feb 6, 2025):
For now, I've enabled REQUIRE_SIGNIN_VIEW to shut out the crawls used by companies like Facebook (Meta) and Amazon (AWS) for training their AI. It seems they are overusing (abusing) crawls for AI purposes.
Blocking these crawls by IP is ineffective because their IPs frequently change.
@H0llyW00dzZ commented on GitHub (Feb 6, 2025):
@wxiaoguang
The problem was solved by blocking their ASN, likely used for abusive AI training (e.g., Facebook Inc. (Meta), Amazon (AWS)). Now, only crawls from Google, used for indexing in their search engine, are allowed via Kubernetes Ingress Nginx. However, I believe it would be beneficial to expand the admin panel with additional features to block crawls based on IPs, User-Agent, and ASN. This would help prevent high memory consumption, likely due to memory leaks, which can cause crashes.
@H0llyW00dzZ commented on GitHub (Feb 6, 2025):
Here is the proof that blocking the bad crawls used by Facebook Inc. (Meta) and Amazon (AWS) for AI training has effectively solved the memory usage issue; the instance was previously being crawled excessively for their profit.
@H0llyW00dzZ commented on GitHub (Feb 15, 2025):
@wxiaoguang I've resolved this problem by increasing the Redis cache pool size to 500 and switching the session storage from files to Redis, using the same pool size of 500. This results in a total pool size of 1000.
The Stats:
Redis:
Pods:
However, this solution is only temporary because, without Redis, memory consumption becomes excessive again.
@wxiaoguang commented on GitHub (Apr 9, 2025):
In 1.23.7, we have this:
Add a config option to block "expensive" pages for anonymous users (#34024) (#34071)
@H0llyW00dzZ commented on GitHub (Apr 9, 2025):
@wxiaoguang, I've been trying that configuration option, but it seems similar to REQUIRE_SIGNIN_VIEW = true, which may not be ideal for open-source repositories. I think it would be more effective to implement a rate limiter based on IP addresses or user agents, or both, for areas that consume a lot of memory (e.g., example.com/repo/commit/sha1commit). This could reduce resource usage, such as memory, especially since many AI crawlers use the same IPs and user agents when crawling a site.
@wxiaoguang commented on GitHub (Apr 9, 2025):
For "open source public site", my proposal is
https://github.com/go-gitea/gitea/pull/33951#discussion_r2032324964
I don't run a public site, so I can't comment too much on this problem.
@H0llyW00dzZ commented on GitHub (Apr 9, 2025):
@wxiaoguang, I run a public site primarily for mirroring repositories. Also, the implementation of #33951 could indeed help reduce resource usage. It's quite similar to a rate limiter, which would be beneficial in managing resource consumption effectively.
@wxiaoguang commented on GitHub (Apr 20, 2025):
#33951 has been merged, does it work for your case?
@H0llyW00dzZ commented on GitHub (Apr 20, 2025):
@wxiaoguang I haven't tried it yet. My git site is using Gitea 1.23.7, not the nightly build, as I prefer long-term stability because it runs on k8s.
@H0llyW00dzZ commented on GitHub (May 1, 2025):
@wxiaoguang I've been using version 1.24.0-rc0. The performance is better now, unlike previously when memory usage increased a lot.
However, I'm not sure yet if it's fixed, as my Gitea self-hosted site currently shows no crawling detected. I might update you later if crawling is detected.
@GiteaBot commented on GitHub (Jun 2, 2025):
We close issues that need feedback from the author if there were no new comments for a month. 🍵