[Feature] cron.update_mirrors - LIMIT_SIZE config parameter #7815

Closed
opened 2025-11-02 07:37:36 -06:00 by GiteaMirror · 12 comments

Originally created by @somera on GitHub (Sep 7, 2021).

Description

The update_mirrors cron updates all mirrors (where updated_unix is ...) in one go. In my case I run it once per day, and the cron needs ~2.5h to update all >6000 mirrors. If Gitea updates too many repos in one go, GitHub blocks Gitea for some minutes. In this case Gitea gets an HTTP 503 error:

2021/09/07 04:09:27 ...irror/mirror_pull.go:176:runSync() [E] Failed to update mirror repository &{394 67 OpenAPITools <nil> openapi-generator-cli openapi-generator-cli   2 https://github.com/OpenAPITools/openapi-generator-cli.git master 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 false false false true <nil> [] 0 map[] map[] [] <nil> false 0 <nil> false 0 <nil> 54835310 <nil> <nil> false false [openapi openapi3 npm openapi-generator openapi2] default  1615665794 1630844437}:
        Stdout: Fetching origin

        Stderr: fatal: unable to access 'https://github.com/OpenAPITools/openapi-generator-cli.git/': The requested URL returned error: 503
        error: Could not fetch origin

        Err: exit status 1
        /source/services/mirror/mirror_pull.go:176 (0x2006e8a)
        /source/services/mirror/mirror_pull.go:276 (0x2008e9e)
        /source/services/mirror/mirror.go:79 (0x2004545)
        /source/modules/graceful/manager.go:139 (0xc5c565)
        /usr/local/go/src/runtime/asm_amd64.s:1371 (0x47aa40)

In this case I would prefer some config parameter where I can set the batch size for the update_mirrors cron.

; Update mirrors
[cron.update_mirrors]
; SCHEDULE = @every 24h
SCHEDULE = 0 0 * * * *
LIMIT_SIZE = 50

In this case the update_mirrors cron will run every hour and call the update for the 50 oldest mirrors (select * from repository where is_mirror = true order by updated_unix asc limit 50). And if LIMIT_SIZE is not set, then it will get all mirrors in the right order.
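The selection proposed here can be sketched in a few lines. This is only an illustration of the idea, not Gitea code: `LIMIT_SIZE` is the parameter being requested, and the `Mirror` class and `pick_batch` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Mirror:
    repo_id: int
    updated_unix: int  # last update, seconds since the epoch


def pick_batch(mirrors, limit_size=None):
    """Return the mirrors to sync in this run, oldest first.

    limit_size=None models an unset LIMIT_SIZE: every mirror is returned.
    """
    ordered = sorted(mirrors, key=lambda m: m.updated_unix)
    return ordered if limit_size is None else ordered[:limit_size]


mirrors = [Mirror(1, 300), Mirror(2, 100), Mirror(3, 200)]
print([m.repo_id for m in pick_batch(mirrors, limit_size=2)])  # [2, 3]
```

With an hourly schedule, each run would take only the next slice of least recently updated mirrors, so the requests to GitHub are spread over the day.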

Or is this already possible?

There is a MIRROR_QUEUE_LENGTH config parameter, but I didn't find its usage in the code.

GiteaMirror added the type/proposal label 2025-11-02 07:37:36 -06:00

@lunny commented on GitHub (Sep 8, 2021):

Worker number and queue number should help to control that but it's not direct.


@zeripath commented on GitHub (Nov 5, 2021):

In 1.16 the mirror queue will be a proper queue and we would recommend that you use TYPE=level or TYPE=redis queue if you have a lot of mirrors.


@somera commented on GitHub (Nov 5, 2021):

> In 1.16 the mirror queue will be a proper queue and we would recommend that you use TYPE=level or TYPE=redis queue if you have a lot of mirrors.

Sounds good. Is there a config example? I can't find anything in

https://github.com/go-gitea/gitea/blob/main/custom/conf/app.example.ini


@zeripath commented on GitHub (Nov 5, 2021):

If you're happy to run redis for all your queues it would be as simple as:

[queue]
TYPE=redis
CONN_STR=; as per docs

To specifically make the mirror queue and the pull request task queue level queues, it's:

[queue.pr_patch_checker]
TYPE=level

[queue.mirror]
TYPE=level

That should do it.


@somera commented on GitHub (Nov 6, 2021):

@zeripath currently I'm running my Gitea on a mini-server with 8GB RAM, alongside PostgreSQL, Memcached, Nexus, ... . If I want to optimize my mirror process, then I need Redis. In that case I could replace Memcached with Redis. That could be possible.

Perhaps it would be better to reduce the number of different tools which can be used with Gitea. ;) Because not every tool can be used for every operation. And the development process would be easier.

But if I set

[queue]
...
LENGTH = 2000

and my mirror cron runs every 24 hours, then only the 2000 oldest mirrors will be updated. Right? This means that if I have 6000 mirrors, after 3 days all mirrors will be up to date. Right?


@zeripath commented on GitHub (Nov 6, 2021):

@somera if you don't want to use redis just use the level queue which is built into gitea itself.

The problem with using a persistable-channel queue for the mirror queue is if you have 2001 mirrors and 2000 are queued, the 2001st request to push to the mirror queue will block.

It is this blocking that is likely the cause of your repeated issues with opened or stuck processes. Not every call to queue.Push() is async'd with go queue.Push(...).
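The blocking behaviour can be illustrated with any bounded queue. The sketch below uses Python's standard `queue.Queue` as an analogy only; it is not how Gitea implements its queues, and `LENGTH` and `try_push` are hypothetical stand-ins.

```python
import queue

LENGTH = 2  # stand-in for the queue LENGTH setting (2000 in the example above)

mirror_queue = queue.Queue(maxsize=LENGTH)


def try_push(repo_id):
    """Attempt a non-blocking push and report whether the item fit."""
    try:
        mirror_queue.put_nowait(repo_id)
        return True
    except queue.Full:
        # A plain put() would block here until a worker removes an item.
        return False


results = [try_push(repo_id) for repo_id in (1, 2, 3)]
print(results)  # [True, True, False]
```

The third push is the analogue of the 2001st mirror: with a blocking push and no worker draining the queue, the caller simply waits.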


@somera commented on GitHub (Nov 6, 2021):

OK. But what about this question? I'm trying to understand this new functionality. Is this what I "wanted" in my initial post?

If I set

[queue]
...
LENGTH = 2000

and my mirror cron runs every 24 hours, then only the 2000 oldest mirrors will be updated. Right? This means that if I have 6000 mirrors, after 3 days all mirrors will be up to date. Right?


@zeripath commented on GitHub (Nov 6, 2021):

Ah I think I now understand what you mean - you'd prefer to limit the number of mirrors added to the queue by cron.update_mirrors.

OK let me take a look at that now.


@somera commented on GitHub (Nov 6, 2021):

@zeripath right. I don't want to update all 6000 mirrors in one go. This takes ~3h at the moment, and I get blocked by GitHub (too many requests in xxx minutes). I want to split them up.

On every update_mirrors cron run, Gitea should update only the xxxx least recently updated mirrors.


@somera commented on GitHub (Nov 23, 2021):

@zeripath thx. If I then set (perhaps in 1.16.0)

PULL_LIMIT=1000

then on each update_mirrors cron run only the 1000 oldest mirrors will be updated?


@zeripath commented on GitHub (Nov 23, 2021):

Each time the update_mirrors task is run only the oldest PULL_LIMIT pull mirrors and oldest PUSH_LIMIT push mirrors will be added to the queue.

If the mirror is already in the queue it will not count towards the limit.

So if the task limit is 3, say, and you have repos A-N waiting to be updated and in increasing staleness, then if A-E are already in the queue, F, G and H will be added.
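As a rough model of that rule (not the actual Gitea implementation), the selection can be written as below. `select_for_queue` is a hypothetical name, the candidate list is assumed to be in the order the task visits the repos, and treating a negative limit as "no limit" is an assumption.

```python
def select_for_queue(candidates, already_queued, limit):
    """Pick up to `limit` mirrors to enqueue, skipping ones already queued.

    `candidates` is assumed to be in the order the task visits them.
    A negative limit is treated as "no limit" (an assumption of this sketch).
    """
    added = []
    for repo in candidates:
        if repo in already_queued:
            continue  # already in the queue: does not count towards the limit
        if limit >= 0 and len(added) >= limit:
            break
        added.append(repo)
    return added


repos = [chr(c) for c in range(ord("A"), ord("N") + 1)]  # "A" .. "N"
print(select_for_queue(repos, already_queued=set("ABCDE"), limit=3))
# ['F', 'G', 'H']
```

Because queued repos are skipped rather than counted, a run always adds up to the limit in new work, as long as enough unqueued mirrors remain.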


@somera commented on GitHub (Feb 5, 2022):

@zeripath after upgrading to 1.16.0 the update mirror process no longer works like it did in 1.15.x. See #18607

And I don't understand the new process.

I made ~9000 curl API calls to Gitea:

curl -X 'POST' 'http://nuc-mini-celeron:3000/api/v1/repos/gaphor/in-app-notification-demo/mirror-sync' -H 'accept: application/json' -H 'Authorization: token xxxxx' -d ''

If I repeat the curl calls I see this

2022/02/05 01:04:08 ...ces/mirror/mirror.go:161:func1() [E] Unable to push sync request for to the queue for push mirror repo[6616]: Error: already in queue
        /source/services/mirror/mirror.go:161 (0x1cd87c9)
        /usr/local/go/src/runtime/asm_amd64.s:1581 (0x471520)

in the logs.

And I set this:

[cron.update_mirrors]
SCHEDULE = 0 0 4 * * 5
PULL_LIMIT = -1
PUSH_LIMIT = -1

But Gitea is not updating all the repos.

Why? When will Gitea update all the repos where the mirror.updated_unix date is older than one day?
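For reference, the mirror-sync call shown in the curl command above could be scripted roughly as follows. This sketch only builds the request and does not send it; `mirror_sync_request` is a hypothetical helper, and the host, repo and token are the placeholder values from the curl command.

```python
import urllib.request


def mirror_sync_request(base_url, owner, repo, token):
    """Build (but do not send) a POST request for the mirror-sync endpoint."""
    url = f"{base_url}/api/v1/repos/{owner}/{repo}/mirror-sync"
    return urllib.request.Request(
        url,
        data=b"",
        method="POST",
        headers={"accept": "application/json", "Authorization": f"token {token}"},
    )


req = mirror_sync_request(
    "http://nuc-mini-celeron:3000", "gaphor", "in-app-notification-demo", "xxxxx"
)
print(req.get_method(), req.full_url)
# Sending is left to the caller, e.g. urllib.request.urlopen(req)
```

Looping this over ~9000 repos would queue one sync request per repo, which is where the "already in queue" errors above come from when the calls are repeated.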

Reference: github-starred/gitea#7815