[GH-ISSUE #769] Errors when batch-deploying multiple stacks using the same repo #7483

Open
opened 2026-04-27 21:14:50 -05:00 by GiteaMirror · 8 comments
Owner

Originally created by @chrschorn on GitHub (Aug 26, 2025).
Original GitHub issue: https://github.com/moghtech/komodo/issues/769

Hey, I'm using many stacks (20+) configured from a central git repo with Komodo v1.19.1. The idea is to use this simple procedure to update all changed stacks whenever a new commit is made to the repo:

```toml
[[procedure]]
name = "git-push-redeploy"
config.schedule = "Run every day at 1:01 am"

[[procedure.config.stage]]
name = "Batch Pull"
enabled = true
executions = [
  { execution.type = "BatchPullStack", execution.params.pattern = "*", enabled = true }
]

[[procedure.config.stage]]
name = "Batch Redeploy"
enabled = true
executions = [
  { execution.type = "BatchDeployStackIfChanged", execution.params.pattern = "*", enabled = true }
]
```

Deploying the stacks individually works fine. However, I'm running into multiple issues when trying to batch deploy:

  1. When configuring the repo on each stack individually (no shared "Repo" resource), `BatchDeployStackIfChanged` won't recognize that anything has changed. Additionally, the `compose.yaml` editor typically doesn't show up unless I recently used "Redeploy" (see screenshot).

[Screenshot: stack view without the compose.yaml editor]
  2. When configuring the repo as a shared resource and running the same procedure, I get this kind of error on most (but not all) of the stacks:

```
ERROR: Failed at PullStack

TRACE:
	1: 500 Internal Server Error
	2: Missing compose file at compose.yaml
```

Any advice on how to properly configure this? Or perhaps I'm looking at a bug here? I have a suspicion that Komodo is trying to clone the repo many times to the same location, overwriting/deleting files in race-condition-like fashion.
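To illustrate the suspicion, here is a minimal, hypothetical sketch (not Komodo's actual code, and the paths/function names are made up) of how a delete-then-re-clone into a shared checkout leaves a window where another stack's pull sees no compose file:

```python
# Hypothetical model of the suspected race: several stacks share one repo
# checkout, and a naive "fresh clone" (wipe, then re-create) by one task
# leaves a window where another task finds no compose file.
import shutil
import tempfile
from pathlib import Path

def fresh_clone_start(repo_dir: Path) -> None:
    # Step 1 of a naive fresh clone: wipe the old checkout.
    shutil.rmtree(repo_dir, ignore_errors=True)
    repo_dir.mkdir(parents=True)

def fresh_clone_finish(repo_dir: Path) -> None:
    # Step 2: files reappear only once the clone completes.
    (repo_dir / "stacks" / "immich").mkdir(parents=True)
    (repo_dir / "stacks" / "immich" / "compose.yaml").write_text("services: {}\n")

def read_compose(repo_dir: Path, subfolder: str) -> str:
    f = repo_dir / subfolder / "compose.yaml"
    if not f.exists():
        raise FileNotFoundError(f"Missing compose file at {f}")
    return f.read_text()

repo = Path(tempfile.mkdtemp()) / "shared-repo"
fresh_clone_start(repo)
fresh_clone_finish(repo)
print(read_compose(repo, "stacks/immich"))  # fine once the clone is complete

# Interleave: stack B pulls while stack A is mid-re-clone.
fresh_clone_start(repo)
try:
    read_compose(repo, "stacks/immich")
except FileNotFoundError as e:
    print("race window:", e)  # matches the "Missing compose file" error
fresh_clone_finish(repo)
```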


@mbecker20 commented on GitHub (Aug 26, 2025):

It sounds like Komodo Core is having trouble cloning the repos. It does this in addition to the Periphery clone, so it can safely pull the repo and check for newer files compared to the Periphery clone. Is there a networking reason why your Periphery servers would be able to clone the repos, but the Komodo Core container wouldn't?


@chrschorn commented on GitHub (Aug 26, 2025):

Thank you for pointing me in the network direction! I was able to solve issue 1, but 2 still persists.

In case anyone runs into the same networking issue as me: because Komodo was running on the same VM as my reverse proxy, it was communicating with the reverse proxy via IPv6. I had configured the reverse proxy to only accept private IPv4 (but not IPv6) ranges for internal network communication. That led to Komodo not being able to access the git server at all. It wasn't very clear that Komodo was unable to clone the repo, but ultimately it was a setup issue on my end.
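If you want to check for this class of problem, a small Python sketch (not Komodo-specific; swap in your git server's hostname) shows which address families a hostname resolves to. If it resolves to IPv6 and the client prefers that address while only IPv4 is actually reachable, connections fail:

```python
# Diagnostic sketch: list the address families a hostname resolves to.
# A client may pick the IPv6 address even when the reverse proxy only
# accepts private IPv4 ranges, so the connection silently fails.
import socket

def address_families(host: str, port: int = 443) -> set:
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    return {
        "IPv6" if info[0] == socket.AF_INET6 else "IPv4"
        for info in infos
        if info[0] in (socket.AF_INET, socket.AF_INET6)
    }

# Replace "localhost" with your git server's hostname.
print(address_families("localhost"))
```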

As for issue 2: when I use more than 5-10 stacks with a single shared "Repo" resource and use `BatchPullStack`, most (but not all) stacks fail with this 500 error. The same works correctly using an individually configured repo on each stack.

[Screenshot: batch pull results showing the failing stacks]

@mbecker20 commented on GitHub (Aug 26, 2025):

@chrschorn there is a lot of config here. The error you see is letting you know that when you move to a repo, some other configuration is wrong, like the run directory or file path.


@Elekam commented on GitHub (Aug 28, 2025):

I have the same issue. I'm quite confused because I have 50+ stacks running, only 14 fail, and those 14 stacks are very similar to all the others. The run directory is configured identically, using "./application_name" with the compose.yaml inside that folder inside the repo.

I don't really see why these stacks fail but the others don't. The only differences between them are that they point at different directories (in the exact same way) and use different env vars, which shouldn't be related to this.

I can also redeploy these stacks manually and it works fine, but they don't work when triggered through the Global Update Schedule.

Edit: I also checked the permissions of all my volume mounts; they all seem fine and are identical between the failing and non-failing stacks.


@mbecker20 commented on GitHub (Aug 30, 2025):

@chrschorn @Elekam I understand these issues are frustrating. The next step is to see if this issue can be reproduced on my side so I can figure out what might be happening. Do you guys have any steps to reproduce the issue you can provide?


@chrschorn commented on GitHub (Aug 31, 2025):

Steps that lead to the issue on my end:

  • Set up a "Repo" resource.
  • Set up at least ~10 stacks that all reference the repo.
    • Each stack references a different subfolder in the repo (e.g. `stacks/immich`, `stacks/paperless`, etc.). Not sure if this is important.
  • Run `BatchPullStack` with target `*` in a procedure.
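For context, the stack side of this setup looks roughly like the following resource-sync TOML. This is an example only: the resource names are placeholders, and the field names are from memory, so they may differ by Komodo version.

```toml
# Example only: one of the ~10+ stacks pointing at a shared "Repo" resource.
# Resource and field names are illustrative and may vary by Komodo version.
[[stack]]
name = "immich"
[stack.config]
server = "server-01"                 # hypothetical server name
linked_repo = "shared-stacks-repo"   # the shared "Repo" resource
run_directory = "stacks/immich"      # subfolder containing compose.yaml
```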

@joeknock90 commented on GitHub (Oct 6, 2025):

Just wanted to throw in that I'm also experiencing this issue, with more or less the same setup as above: ~20 stacks across 4 servers. Individually pulling the stacks works great, but the procedure fails with the same 500 error.


@durandguru commented on GitHub (Oct 20, 2025):

I have the same issue. 4 servers. On the two Intel servers, the Batch Pull Stack procedure always works. I have two servers running in Oracle Cloud with Ubuntu on arm64; those two always fail with the Batch Pull Stack procedure. Manually pulling stacks works, but only when I select a few at a time and not all at once (one server has 12 stacks, the other 22).

Running on 1.19.5.

#### Error in Batch Pull Stack

```
ERROR: Failed stage 'Stage 1' execution after 59.99791408s

TRACE:
1: ERROR: Failed on PullStack(PullStack { stack: "starbase80-911", services: [] })
2: ERROR: execution not successful. see update '68f62c72353a32f10d52ba27'
```

#### Error in stack

```
ERROR: Failed at PullStack

TRACE:
1: 500 Internal Server Error
2: Failed to validate run directory on host after stack write (canonicalize error)
3: No such file or directory (os error 2)
```

#### Other error (the compose file is in the shared git repo)

```
ERROR: Failed at PullStack

TRACE:
1: 500 Internal Server Error
2: Missing compose file at compose.yaml
```

Doing a manual pull works without a problem.

Reference: github-starred/komodo#7483