Register act_runner as ephemeral to Gitea #13692

Closed
opened 2025-11-02 10:50:31 -06:00 by GiteaMirror · 12 comments
Owner

Originally created by @ChristopherHX on GitHub (Nov 9, 2024).

Originally assigned to: @ChristopherHX on GitHub.

Feature Description

My idea is to

  • [x] add an Ephemeral field to the database structure of the Runner
  • [x] when the Ephemeral field is true, let FetchTask return no task without an error if the assigned task is in progress
  • [x] when the Ephemeral field is true, let FetchTask return no task with an error if the assigned task is done, and remove the runner from the database
  • [x] scope UpdateTask and UpdateLog access to the runner ID
  • [x] update the runnerv1 protocol to have an Ephemeral field as well during registration
  • [x] update act_runner to have an ephemeral flag, which implies run-once

This proposal would make it possible to securely deploy a registered host-mode act_runner in a VM and reset it after the job exits.

Parts of this idea have been sketched in https://github.com/ChristopherHX/gitea/tree/ephemeral-runners.

Protocol proposal here: https://gitea.com/gitea/actions-proto-def/pulls/14

Read more here: https://gitea.com/gitea/act_runner/issues/19#issuecomment-932880

Screenshots

No response

GiteaMirror added the topic/gitea-actions and type/proposal labels 2025-11-02 10:50:31 -06:00

@wolfogre commented on GitHub (Nov 11, 2024):

I have a wait-and-see attitude towards this proposal.

Regarding running safely in host mode, my first instinct is:

  • Do not register a new act_runner every time.
  • The concurrency of tasks for the registered runner can only be 1, so during the execution of a task, the runner will not fetch a second one. (This is my biggest concern about the proposal; when the runner is not suitable for receiving new tasks, it should not FetchTask, rather than letting the FetchTask function decide whether to assign tasks.)
  • When executing tasks, once the execution is successful, do not immediately fetch the next task. Instead, clean up the env or even rebuild the virtual machine.
  • When starting a new runner in a new env, reuse the local state file of the previously registered runner so that it can be recognized and accepted by Gitea without needing to register again.

To clarify, in the current design, Gitea does not actively assign tasks to runners; it only attempts to assign a new task when a runner requests one (if available). The purpose of this design is to allow the runner to decide for itself whether it is ready to receive more tasks, while Gitea only determines if there are new tasks to assign.


@ChristopherHX commented on GitHub (Nov 11, 2024):

Regarding running safely in host mode, my first instinct is:

This matches with the new run once mode that has been recently merged: https://gitea.com/gitea/act_runner/pulls/598

Anyone can fetch new jobs with the current runner state files, which could be uploaded to a bad actor's server when the runner is in host mode.

The same applies to GitHub Actions runners as long as they are not ephemeral.

My proposal would optionally invalidate the token, as in GitHub Actions.

Making FetchTask always fail after a job has been fetched is another idea that depends on the once mode of the runner.


@wolfogre commented on GitHub (Nov 11, 2024):

It cannot be a reason that "#598 has been merged so we should keep going that way." I just want to discuss whether we have a better design to handle this.


@ChristopherHX commented on GitHub (Nov 11, 2024):

Yes, let's discuss; this is still a pure proposal, except for the hardening change.

I only wanted to clarify the problems of the approach you described, and that in my opinion it resembles the once flag.

In GitHub Actions there is an alternative for long-lived runner state files as well:

  • make the SYSTEM_RUNTIME_TOKEN writable for logs and task state (GitHub Actions has had this since 2019)
  • allow spawning a script that executes a fetched task by piping stdio and sending updates directly to Gitea (my custom github-act-runner has this, but only GitHub Actions can be used for direct communication)

Adding more native act runtimes seems to be a mess.


@ChristopherHX commented on GitHub (Nov 24, 2024):

This proposal aims to implement https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners/autoscaling-with-self-hosted-runners#using-ephemeral-runners-for-autoscaling.

While I agree that Gitea doesn't assign jobs to a specific runner, I still want the stronger security you get from this feature of GitHub Actions in Gitea as well.

Once my main post gets 10 or more upvotes, I will create a Pull Request (a ping/message in this issue may eventually be required); until then, I will wait for alternative proposals that bring similar security to act_runner in host mode.

I will create another proposal for uploading logs and job state via the ACTIONS_RUNTIME_TOKEN, so an act_runner can act as an autoscaler for single-job act_runners.


@KhaineBOT commented on GitHub (Jan 28, 2025):

The challenge is how to manage the ephemeral runners: you need to spin one up, register it, run the job, and then tear it down. Either you need a third-party piece of software to do this orchestration, or it gets built into Gitea. I think it would make sense to define a lightweight orchestration protocol. Third parties could use it to create complex orchestrations (i.e. running across many disparate hosts/platforms), and Gitea could implement simple orchestration on the local host. For many users, a simple host/Docker-only orchestration solution built into Gitea would be highly beneficial.

We should make it as easy as possible for ephemeral runners to be used. Ideally, they should be the preferred method, given the risks posed by long-running runners (https://johnstawinski.com/2024/01/11/playing-with-fire-how-we-executed-a-critical-supply-chain-attack-on-pytorch/, https://adnanthekhan.com/2023/12/20/one-supply-chain-attack-to-rule-them-all/).


@gabriel-samfira commented on GitHub (Feb 8, 2025):

I am willing to add support for Gitea in GARM (https://github.com/cloudbase/garm) if it comes somewhat close to how ephemeral runners and webhooks work in GitHub. I could then abstract away the code-forge part of GARM and create an implementation for Gitea. That would help with the ephemeral runner management part, and it would unlock the plethora of IaaS providers that GARM offers (https://github.com/cloudbase/garm?tab=readme-ov-file#installing-external-providers) for Gitea as well.

Edit: someone already opened a request for this feature here: https://github.com/cloudbase/garm/issues/323


@ChristopherHX commented on GitHub (Feb 8, 2025):

From your referenced request, it might make sense for you to additionally create a proposal for the following item, which is not covered at all right now, and reference this proposal:

can send webhooks with job runs so we can schedule runners to a pool

This proposal seems to have hit 10 upvotes, so I will look into making a Pull Request for the ephemeral runner part soon.


@ChristopherHX commented on GitHub (Feb 23, 2025):

An experimental workflow_job webhook is in https://github.com/go-gitea/gitea/pull/33694 as an independent feature.

It can be cherry-picked on top of the ephemeral runners PR (https://github.com/go-gitea/gitea/pull/33570) to be tested together.

I'm using this branch with both changes to test things locally: https://github.com/ChristopherHX/gitea/tree/workflow_job_webhook. Beware: it contains database migrations that are not part of nightly, so backups are important.


@cobak78 commented on GitHub (Mar 12, 2025):

I think this issue https://codeberg.org/forgejo/discussions/issues/241 is slightly related and explores another solution on this topic by moving the poller to an external scaler.


@ChristopherHX commented on GitHub (Mar 19, 2025):

Hello @cobak78,

I skimmed over your discussion. I'm planning to only implement the workflow_run + workflow_job endpoints defined by GitHub in Gitea, instead of brewing our own like you seem to have done for Forgejo. This means, for example, using the github_runner scaler of KEDA instead of creating a Gitea-specific one; my initial test seems to show the queue metric can be calculated from my POC branch without webhooks or code changes to KEDA.

If you trust everyone who can queue jobs on your Forgejo instance, please ignore the rest of my post.

I would suggest that you audit the security of your Forgejo runner's long-lived credentials when using labels with :host, since, without the feature discussed here, only the :docker:// labels can protect the credential file, based on my knowledge of the Gitea runner and nektos/act.

My goal with ephemeral runners in Gitea is that, even if the long-lived runner credentials are exposed to a non-trusted job, they become invalid for accessing additional write tokens + secrets by fetching additional tasks, which anybody with read access to the runner credential file can do (you make use of the reusability of this file's content as a feature).

I saw inside your gist a Kubernetes-internal DNS name for the runner credentials to Forgejo. This means that exploiting this outside of Kubernetes requires knowing the public endpoint of Gitea, if it is not already leaked inside the job via GITHUB_SERVER_URL; afaik, the original registration domain is not recorded on the server to block access.

For a private or proof-of-concept instance your approach is perfectly fine 👍 .


@cobak78 commented on GitHub (Jun 5, 2025):

Hi @ChristopherHX ,
thanks for your comment. As you say, and as the Forgejo documentation on the self-hosted runner type also notes, there is no boundary limit on that type.
I will add a disclaimer explaining this potential misuse.
In my own solution I only offer runners with Docker-type labels, so it's not really a problem.

Reference: github-starred/gitea#13692