mirror of
https://github.com/go-gitea/gitea.git
synced 2026-03-12 10:39:38 -05:00
Repo indexer - Max file size #1501
Closed
Opened 2025-11-02 04:02:39 -06:00 by GiteaMirror · 22 comments
Reference: github-starred/gitea#1501
Originally created by @ghost on GitHub (Feb 5, 2018).
gitea v1.4.0-rc1
I am trying to change the repo indexer max file size. I set it to 5242880, but the repo index file size stays fixed at:
It seems that the setting isn't loaded correctly:

I also tried changing the setting and deleting the repo folder, but the size still seems to be fixed.
Regards
@lafriks commented on GitHub (Feb 5, 2018):
MAX_FILE_SIZE is not for the index size; it sets the maximum size of the files that get indexed.
@ghost commented on GitHub (Feb 5, 2018):
Thank you for your fast answer, @lafriks. Then what about the fixed "store" file size? I am trying to index a big project, and some of the files were indexed, but it seems that not all of them were, and I supposed that this may depend on the "store" file size.
@lafriks commented on GitHub (Feb 5, 2018):
How big is the file that was not indexed?
@ghost commented on GitHub (Feb 5, 2018):
Only 9 KB, and it is a Java file.
@lafriks commented on GitHub (Feb 5, 2018):
Do you see any errors in the log file? And why do you think it was not indexed?
@ghost commented on GitHub (Feb 5, 2018):
No errors in the log file. I am sure that the file was not indexed because I also searched for other strings inside this file, and no results appear.
@ghost commented on GitHub (Feb 5, 2018):
I just reproduced the problem. If I create a new repository containing only this file, the file is indexed correctly. I think the problem is the maximum "store" file size.
@lafriks commented on GitHub (Feb 5, 2018):
I don't think there is a maximum size for the store file.
@ghost commented on GitHub (Feb 5, 2018):
You can try to reproduce the problem by cloning the repo https://github.com/Xilinx/u-boot-xlnx
and searching for the string "dm_test_adc_bind" (which is present in the file "test/dm/adc.c"). In this case too, the store file grows to 1,048,576 KB and then stops working (it stops growing and stops indexing files).
@lafriks commented on GitHub (Feb 5, 2018):
@ethantkoenig can you check this?
@ghost commented on GitHub (Feb 5, 2018):
@ethantkoenig @lafriks Thank you. I also want to add one more detail: with the same repository, an access violation also occurred during repo indexing: Exception
@ethantkoenig commented on GitHub (Feb 6, 2018):
In my experience, bleve does not do well once index files approach or exceed 1 GB (although it may be different on other machines and OSes). I do recall seeing similar behavior at some point (where an index file remained exactly the same size as more content was added to it), but I wasn't able to figure out what was going on.
I have found that using sharded indices (not natively supported by bleve) allows bleve to scale more gracefully. Perhaps this is something to consider using in gitea.
@ghost commented on GitHub (Feb 6, 2018):
Thank you for your answer, @ethantkoenig. Is there any workaround to avoid this behavior?
@ghost commented on GitHub (Feb 6, 2018):
I also want to add some external links about boltdb:
https://github.com/boltdb/bolt/issues/266
https://github.com/boltdb/bolt/issues/308
https://github.com/boltdb/bolt/blob/master/db.go (check the comments at line 17 and starting from line 305)
Are those useful?
@ghost commented on GitHub (Feb 8, 2018):
@ethantkoenig one more detail: your latest commits on master changed the store max size to 524,288 KB, whereas on v1.4.0-rc1 the size is fixed at 1,048,576 KB.
@ghost commented on GitHub (Feb 10, 2018):
@ethantkoenig @lafriks After hours of reverse engineering, I think I have found the cause of the max store size problem in gitea v1.4.0-rc1. On Monday I will test my hack on our production gitea instance, and I will report back here.
@lafriks commented on GitHub (Feb 10, 2018):
@giudon that's great to hear
@ghost commented on GitHub (Feb 12, 2018):
@ethantkoenig @lafriks Now I can confirm my analysis: there is a problem in boltdb. It happens in one particular case: when the store file needs to be remapped beyond 1 GB. The problem is still present in the master version, because I saw that gitea uses this db.go version:



ccd680d8c1/db.go (this commit was made by @lunny, and after it there is only one more commit, which doesn't change the remap logic in db.go). Here are the technical details (before starting, note that the OS page size on my 64-bit Windows machine is 4,096 bytes):
When boltdb needs to allocate a new "chunk" in the store file, this method is called:
As you can see, when the store file needs to grow, the "mmap" method is called with a minimum-size parameter. Inside "mmap", the "mmapSize" method is called, which calculates the new store file size based on the minimum size passed as the parameter:
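The screenshots of this code are not preserved, so here is a rough Go sketch of the remap trigger described above. It assumes that, when no free page is available, the allocator computes the minimum size as `(highWaterPgid + count + 1) * pageSize` and requests a remap once that value reaches the currently mapped size. This formula is modeled on our reading of bolt's allocator and should be checked against db.go. `remapTarget` is a hypothetical helper, not a bolt function.

```go
package main

import "fmt"

// remapTarget (hypothetical) returns the minimum mapping size needed to
// allocate `count` pages at the current high-water page ID, and whether that
// size reaches the currently mapped size (i.e. a remap is requested).
// The (pgid + count + 1) * pageSize formula is an assumption modeled on bolt.
func remapTarget(highWaterPgid, count, pageSize, mappedSize int) (minsz int, remap bool) {
	minsz = (highWaterPgid + count + 1) * pageSize
	return minsz, minsz >= mappedSize
}

func main() {
	const pageSize = 4096 // page size reported in the thread (64-bit Windows)
	const oneGB = 1 << 30

	// With the mapping at exactly 1 GB, the requested minimum can land
	// exactly on 1 GB rather than above it.
	pgid := oneGB/pageSize - 2
	minsz, remap := remapTarget(pgid, 1, pageSize, oneGB)
	fmt.Println(minsz, remap) // 1073741824 true
}
```

This illustrates how a remap can be requested with a minimum of exactly 1,073,741,824 bytes, which is the value the analysis below identifies as the trigger.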
As you can see from my red comments, there are two different calculation methods inside "mmapSize". The first one picks the new store size from the following sequence, returning the smallest value that is at least the minimum size parameter:
2^15 = 32,768 bytes
2^16 = 65,536 bytes
2^17 = 131,072 bytes
2^18 = 262,144 bytes
2^19 = 524,288 bytes
2^20 = 1,048,576 bytes
2^21 = 2,097,152 bytes
2^22 = 4,194,304 bytes
2^23 = 8,388,608 bytes
2^24 = 16,777,216 bytes
2^25 = 33,554,432 bytes
2^26 = 67,108,864 bytes
2^27 = 134,217,728 bytes
2^28 = 268,435,456 bytes
2^29 = 536,870,912 bytes
2^30 = 1,073,741,824 bytes
and this method works well.
The problem starts at the point where the second calculation method should take over: when 1 GB is reached, the minimum size passed as the parameter is exactly 1,073,741,824, and not, as I expected, a value greater than 1,073,741,824.
So when 1 GB is reached, mmapSize simply returns the same minimum size passed as the parameter (1,073,741,824), because the following lines:
are never reached.
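To make the boundary case concrete, here is a Go sketch of the two-stage sizing rule as described in this thread. The constants `maxMmapStep` (assumed 1 GB growth step) and `maxMapSize` (an illustrative upper bound) are assumptions, and this is a reconstruction for illustration rather than a copy of bolt's db.go. It shows that a request of exactly 2^30 bytes is answered by the first loop with the same value, so the 1 GB step logic never runs, while a request one byte larger would grow the mapping to 2 GB.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	maxMmapStep = 1 << 30 // assumed: grow in 1 GB steps beyond 1 GB
	maxMapSize  = 1 << 40 // assumed upper bound, illustrative value only
)

// mmapSize sketches the two-stage rule: power-of-two doubling up to 1 GB,
// then rounding up to a 1 GB step and to a page-size multiple.
func mmapSize(size, pageSize int64) (int64, error) {
	// Stage 1: smallest 2^i (15 <= i <= 30) that is >= size.
	for i := uint(15); i <= 30; i++ {
		if size <= 1<<i {
			return 1 << i, nil // size == 2^30 returns here, unchanged
		}
	}
	if size > maxMapSize {
		return 0, errors.New("mmap too large")
	}
	// Stage 2: round up to the next multiple of maxMmapStep...
	sz := size
	if rem := sz % maxMmapStep; rem > 0 {
		sz += maxMmapStep - rem
	}
	// ...then to a multiple of the page size.
	if sz%pageSize != 0 {
		sz = (sz/pageSize + 1) * pageSize
	}
	if sz > maxMapSize {
		sz = maxMapSize
	}
	return sz, nil
}

func main() {
	const pageSize = 4096
	for _, req := range []int64{1 << 20, 1 << 30, 1<<30 + 1} {
		got, _ := mmapSize(req, pageSize)
		fmt.Println(req, "->", got)
	}
	// 1048576 -> 1048576
	// 1073741824 -> 1073741824
	// 1073741825 -> 2147483648
}
```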
My small hack to make it work is to comment out the following (red) lines inside the "mmapSize" method:


With that change, once 1 GB is reached, the allocation size is increased by os.Getpagesize() on every remap.
Now repo indexing works fine:
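The exact lines commented out are not preserved, so the following Go sketch only models the resulting behavior described above: past 1 GB, each remap advances the mapping to the next OS page boundary. `pageStep` and `remapsToGrow` are hypothetical helpers used to estimate how many remaps this costs, assuming a 4,096-byte page as on the machine in the thread.

```go
package main

import (
	"fmt"
	"os"
)

// pageStep returns the next page boundary strictly above size, modeling the
// "grow by one OS page per remap" behavior of the workaround.
func pageStep(size, pageSize int64) int64 {
	return (size/pageSize + 1) * pageSize
}

// remapsToGrow counts how many remaps it takes to grow a mapping from `from`
// to at least `to` bytes when each remap advances by one page.
func remapsToGrow(from, to, pageSize int64) int {
	n := 0
	for size := from; size < to; size = pageStep(size, pageSize) {
		n++
	}
	return n
}

func main() {
	const oneGB = int64(1 << 30)
	fmt.Println(pageStep(oneGB, 4096))              // 1073745920
	fmt.Println(remapsToGrow(oneGB, 2*oneGB, 4096)) // 262144
	fmt.Println("this machine's page size:", os.Getpagesize())
}
```

The arithmetic shows the trade-off. With 4 KiB pages, each additional gigabyte takes about 262,144 page-sized remaps, compared with a single remap when growing in 1 GB steps.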
What do you think about it guys?
@lafriks commented on GitHub (Feb 12, 2018):
@giudon great bug hunting :) can you report this upstream to boltdb?
@ghost commented on GitHub (Feb 12, 2018):
@lafriks did it
@guillep2k commented on GitHub (Aug 1, 2019):
Bolt has been archived and is unmaintained. The proposed alternative seems to be https://github.com/etcd-io/bbolt. Perhaps it's worth taking a look at it?
@lunny commented on GitHub (Mar 27, 2024):
It's outdated and now the latest version depends on go.etcd.io/bbolt indirectly.