[Proposal][Discuss] Gitea Cluster #6426
Open · opened 2025-11-02 06:55:25 -06:00 by GiteaMirror · 22 comments
Originally created by @lunny on GitHub (Dec 2, 2020).
How does a Gitea deployment scale? Gitea cluster should resolve part of it.
Currently, when running several Gitea instances that share a database and git storage, there are still some things that need to be resolved.
Comment by @wxiaoguang:
The `ExclusivePool` pool, which is also in-process now: #31813 (based on #31908)
@6543 commented on GitHub (Dec 2, 2020):
for cron I propose: cron only creates tasks, which are represented in the DB (like it's done with migration tasks)
for tasks: each instance should have a unique ID (GUID); when an instance fetches a task from the DB, it alters its state by changing the status to running and adding its GUID & PID to the table (a sketch follows after this comment)
there must be some way Gitea instances can speak to each other, using the GUID as identifier, to:
propose a heartbeat to recover & clean up tasks of crashed Gitea instances:
the modules/tasks task will need to be refactored to have an easy interface:
task.Signal(task.CANCEL, guid, pid) <- if the guid is not that of the running instance, send it to the specific one ...
task.Run(t *task)
...
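The claim-and-heartbeat idea above can be made concrete with a conditional UPDATE against the shared database. A minimal sketch in Go, assuming a hypothetical `task` table with `status`, `owner_guid`, `owner_pid` and `heartbeat` columns (this is not Gitea's actual schema, and placeholder syntax varies by driver):

```go
package taskqueue

import (
	"database/sql"
	"time"
)

// ClaimTask atomically marks one queued task as running and stamps it with
// this instance's GUID and PID. Because the UPDATE is conditional on
// status = 'queued', only one instance can win the claim.
func ClaimTask(db *sql.DB, taskID int64, guid string, pid int) (bool, error) {
	res, err := db.Exec(
		`UPDATE task
		    SET status = 'running', owner_guid = ?, owner_pid = ?, heartbeat = ?
		  WHERE id = ? AND status = 'queued'`,
		guid, pid, time.Now().Unix(), taskID,
	)
	if err != nil {
		return false, err
	}
	n, err := res.RowsAffected()
	return n == 1, err
}

// Heartbeat refreshes the claim. A janitor job can requeue tasks whose
// heartbeat is older than some timeout, recovering tasks of crashed instances.
func Heartbeat(db *sql.DB, taskID int64, guid string) error {
	_, err := db.Exec(
		`UPDATE task SET heartbeat = ? WHERE id = ? AND owner_guid = ?`,
		time.Now().Unix(), taskID, guid,
	)
	return err
}
```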
@lafriks commented on GitHub (Dec 2, 2020):
Some kind of git storage layer would be needed IMHO (something like GitLab has)
@6543 commented on GitHub (Dec 2, 2020):
I would focus on tasks, since git data via shared storage works quite well at the moment
@lunny commented on GitHub (Dec 3, 2020):
It is, but in fact it's expensive. So a distributed git data storage layer will still be a necessary feature of Gitea in the future.
@Codeberg-org commented on GitHub (Dec 3, 2020):
+1
Safe distributed/concurrent Gitea is surely the highest priority from a user's point of view, since off-the-shelf options for distributed SQL databases and distributed file systems are readily available.
@6543 commented on GitHub (Feb 3, 2021):
Roadmap:
master election
done by the DBMS: whoever gets the SQL select-update query in first (see the sketch below)
~7 message types
message communication
some sort of https://nats.io/, https://activemq.apache.org/cross-language-clients, ... over DB, Redis, ... ?
sidenotes
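The "master election done by the DBMS" item can be sketched the same way as task claiming: every instance periodically races a conditional UPDATE on a single-row lease table, and whoever lands first holds the lease. A minimal sketch, assuming a hypothetical pre-seeded `leader` table with `holder_guid` and `expires` columns:

```go
package election

import (
	"database/sql"
	"time"
)

const leaseDuration = 30 * time.Second

// TryAcquire returns true if this instance now holds (or has renewed) the
// leadership lease. The `leader` table is assumed to contain exactly one row;
// the WHERE clause lets an instance renew its own lease or take an expired one.
func TryAcquire(db *sql.DB, guid string) (bool, error) {
	now := time.Now().Unix()
	res, err := db.Exec(
		`UPDATE leader
		    SET holder_guid = ?, expires = ?
		  WHERE holder_guid = ? OR expires < ?`,
		guid, now+int64(leaseDuration.Seconds()), guid, now,
	)
	if err != nil {
		return false, err
	}
	n, err := res.RowsAffected()
	return n == 1, err
}
```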
@gary-mazz commented on GitHub (Mar 18, 2021):
Interesting discussion. I think this started back in 2017 #2959.
There needs to be recognition of two cluster use cases, load balancing and high availability (HA), with two types of location configurations: local and remote.
The more distant the cluster participants, the more data shifts from synchronous (near real-time) to delayed, creating a spectrum of data synchronization quality levels from highly consistent to eventually consistent.
The technologies picked should be able to operate at a distance as well as on local premises without reconfiguration. Secure communication via tunneling and certificate-based authentication between nodes should also be considered.
The "tricky part" is figuring out where to put the replication. Since Gitea supports multiple databases, and each employs a different and incompatible replication mechanism, a formalized middleware layer is likely required to replicate data. Middle-layer replication also allows different DB backend configurations (e.g. PostgreSQL and MySQL) to provide transparent replication.
Replication will need some type of lockout strategy for check-in/check-out and zip operations during replication activity. The options are:
With remote-site load balancing, it is possible to have check-in collisions causing inconsistencies. The use cases that cause these conditions:
I hope this helps some of your design decisions.
PS: don't forget config file change pushes.
@lafriks commented on GitHub (Mar 22, 2021):
We will probably also need some kind of git repository access layer so that repositories could be distributed across the cluster with local storage.
@imacks commented on GitHub (Jun 3, 2021):
Just want to contribute my own experience using Gitea for the last couple of years.
Our first attempt was to run dockerized Gitea in kube, with the storage back end provided by NFS. We rely on the kube health check to restart an unresponsive Gitea instance, which can run on any tainted host managed by kube. This solves the reliability issue somewhat, though there will be a period of unavailability while the container restarts.
Our v2 setup swaps out NFS for Ceph CSI in kube. R/W performance improves dramatically. We also use the S3 compatibility layer in Ceph to store LFS data.
My most pressing desire for v3 is HA. We can be less ambitious and work on a single local cluster first. There could be a dedicated pod for running cron tasks, so Gitea can concentrate on doing git and webserver work. We could also use S3 exclusively for storage, for its sync capabilities.
@viceice commented on GitHub (Jan 4, 2023):
Do you have any hints on moving from NFS to Ceph CSI? I'd like to test out the performance. I already use S3 (MinIO) for all other Gitea storage.
@piamo commented on GitHub (Mar 10, 2023):
Will there be concurrency problems when using Ceph CSI, since there is no file-lock protection?
@imacks commented on GitHub (Mar 11, 2023):
@piamo No. Only a single instance of Gitea runs at any one time, so no locking is necessary. The appropriate Ceph volume is auto-mounted on whichever host the Gitea container runs on. So yeah, my setup is not HA, just resilient to host failure.
@piamo commented on GitHub (Apr 21, 2023):
@imacks But if two or more concurrent requests try to change the same repo, a lock is still necessary.
@harryzcy commented on GitHub (May 2, 2023):
I think one immediate step for Gitea would be to allow limiting an instance to read-only operations and disabling cron, to somewhat achieve high availability. Many parts can already be deployed in an HA way:
What we need right now is to allow disabling cron jobs; then Gitea can be deployed in a cluster with ReadWriteMany storage for git objects. To support ReadWriteOnce storage, the files need to be replicated by Gitea instead of the storage provider. Then Gitea must have a read-only mode, and those replicas need to pull changes from the master instance. In this case, the read-only operations should be identifiable so that a load balancer can route traffic properly (a routing sketch follows this comment).
After we have done the above, we could try to find a leader election protocol so that a replica can be promoted to master if the master goes down. This would be the second step.
Only after we have done that can we start to split cron jobs across multiple instances. I think this is more complicated than the first two steps above.
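The routing idea above could look something like the middleware below: a minimal sketch, not Gitea's actual code, that rejects mutating requests on a read-only replica so a load balancer can fall back to the master. Note that method-based filtering is a simplification; for example, Git smart-HTTP pushes arrive as POSTs to `git-receive-pack`, so a real implementation would need route-aware rules.

```go
package readonly

import "net/http"

// Middleware serves safe HTTP methods locally and answers everything else
// with 503 when readOnly is true, signalling the load balancer to retry
// against the writable master instance.
func Middleware(readOnly bool, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if readOnly {
			switch r.Method {
			case http.MethodGet, http.MethodHead, http.MethodOptions:
				// read-only requests fall through to the normal handler
			default:
				http.Error(w, "this replica is read-only", http.StatusServiceUnavailable)
				return
			}
		}
		next.ServeHTTP(w, r)
	})
}
```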
@pat-s commented on GitHub (May 2, 2023):
Just FYI, we have an active WIP for a Gitea-HA setup in the helm-chart going on right now: https://gitea.com/gitea/helm-chart/pulls/437
It is based on Postgres-HA, an RWX file system and redis-cluster.
I think that using RWX storage solves some part of the leader-election logic with respect to tasks and communication.
The only remaining true issue is duplicated cron executions. The biggest problem would be both instances doing the same thing at the exact same moment and crashing as a result.
I haven't tested it in practice yet, though.
Maybe implementing a random offset/sleep could help in the first place, to at least ensure proper functionality (see the sketch below)? Even if all jobs were still executed redundantly, it would at least let us make some initial progress.
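The random offset could be as simple as the sketch below; it makes simultaneous execution unlikely but does not prevent duplicate runs, so it is only a stopgap until a shared lock exists.

```go
package cronjitter

import (
	"math/rand"
	"time"
)

// RunWithJitter sleeps a random duration up to maxJitter before running the
// job, so two replicas are unlikely to fire at the exact same moment.
func RunWithJitter(maxJitter time.Duration, job func()) {
	time.Sleep(time.Duration(rand.Int63n(int64(maxJitter))))
	job()
}
```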
@lunny commented on GitHub (May 3, 2023):
Besides cron, there are in fact still some locks that need to be refactored; see #22176
@wxiaoguang commented on GitHub (May 15, 2023):
@pat-s commented on GitHub (May 15, 2023):
I don't know what the "docker's duplicate insert bug" is here, and all the other points are also somewhat unclear in terms of severity. I think we need to check and find out in the end.
And to test all of them, we need a (functional) HA cluster first to test on.
I can provide an instance for testing if needed. Are you interested @wxiaoguang @lunny? I could also give you access to the k8s namespace so you can explore the pods yourself.
On the other hand, I wonder if this could also be set up and tested using the project funds? A Terraform setup which destroys everything again after testing is not a big deal. And the helm-chart logic for an HA setup is ready.
@lunny commented on GitHub (May 16, 2023):
I think most problems here are obvious at the code level; maybe we will find more when we start testing. Thank you for your idea about the testing infrastructure; when we need it, we can discuss it. But for now, there are so many problems that maybe we should begin by starting some discussions or sending some PRs.
@wxiaoguang commented on GitHub (May 16, 2023):
Context:
I am interested; however, I have quite a long TODO list and many new PRs:
So I don't think I have the bandwidth at the moment.
@prskr commented on GitHub (Nov 9, 2023):
I didn't check everything in the code so far, but I think something like https://github.com/hibiken/asynq could help with the cron issues?
For the shared repo access, I was actually wondering why not try to abstract that, e.g. with an S3-compatible storage, and use something like redlock to synchronize access to repositories (a sketch follows below). I'd even assume concurrent reads should be fine? It's only about consistency when writing to a repository (presumably)?
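A minimal sketch of that redlock idea, using go-redsync/redsync as one available Go implementation (the comment only says "something like redlock", so the library choice and key naming here are assumptions): writes to a repository take a distributed mutex keyed by the repo, while reads skip the lock entirely.

```go
package repolock

import (
	"context"

	"github.com/go-redsync/redsync/v4"
	"github.com/go-redsync/redsync/v4/redis/goredis/v9"
	goredislib "github.com/redis/go-redis/v9"
)

// WriteRepo runs the write callback while holding a distributed lock on the
// repository, so only one node mutates a given repo at a time.
func WriteRepo(ctx context.Context, repoKey string, write func() error) error {
	client := goredislib.NewClient(&goredislib.Options{Addr: "localhost:6379"})
	rs := redsync.New(goredis.NewPool(client))

	mutex := rs.NewMutex("repo-lock:" + repoKey)
	if err := mutex.LockContext(ctx); err != nil {
		return err
	}
	defer mutex.UnlockContext(ctx)

	return write()
}
```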
@anbraten commented on GitHub (Oct 7, 2024):
In #28958 I've started a distributed implementation for the internal notifier. Thereby, events such as `issue was deleted` would be broadcast across all nodes.
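Such a broadcast could ride on any shared transport; below is a minimal sketch over Redis pub/sub (the linked PR may use a different mechanism, and the channel name here is made up):

```go
package notify

import (
	"context"

	"github.com/redis/go-redis/v9"
)

const channel = "gitea-events"

// Publish sends an event (e.g. "issue was deleted") to every node.
func Publish(ctx context.Context, rdb *redis.Client, event string) error {
	return rdb.Publish(ctx, channel, event).Err()
}

// Listen invokes handle for every event broadcast by any node, including
// this one, until the subscription is closed.
func Listen(ctx context.Context, rdb *redis.Client, handle func(event string)) {
	sub := rdb.Subscribe(ctx, channel)
	defer sub.Close()
	for msg := range sub.Channel() {
		handle(msg.Payload)
	}
}
```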