Recurring freezes on BSD #8318

Closed
opened 2025-11-02 08:02:05 -06:00 by GiteaMirror · 34 comments

Originally created by @phryk on GitHub (Jan 4, 2022).

Gitea Version

1.15.8

Git Version

2.34.1

Operating System

FreeBSD

How are you running Gitea?

Installed from a custom package repository built by local poudriere.
The same problem occurred with versions from the official FreeBSD pkg repo.

Gitea is running within a jail, with an nginx instance running in the same jail in front of it as a transparent proxy.

Database

PostgreSQL

Can you reproduce the bug on the Gitea demo site?

No

Log Gist

I'm not sure what that even means. Do you want a paste?

Description

I've been experiencing recurring freezes, where only `killall -9 gitea` helps, for about a year now.
This happens anywhere from a couple of hours to a couple of weeks after the service is started.

I cannot reproduce this issue at will, but if I wait long enough it always shows up.

I have previously tried setting the log level to Debug, but haven't seen anything that
would tell me what the actual issue is. As this happened last summer, I don't have
those particular logs anymore, but have re-escalated my log level to Debug to be able
to attach a log when this bug next hits.

My latest attempt in figuring out what's going on was to enable gitea's metrics and have
them graphed in Grafana, but every single metric just shows a straight line up to the point
when gitea freezes, which is when data stops coming in.

In the meantime till I can attach a log: Any hunches on what might trigger this behavior?
Any other data I can supply to help triage this issue?

Best wishes,

phryk

Screenshots

No response

GiteaMirror added the issue/needs-feedback and type/upstream labels 2025-11-02 08:02:05 -06:00

@lunny commented on GitHub (Jan 4, 2022):

How did you compile Gitea yourself?
And could you catch the CPU, disk, memory usage when Gitea recurring freezes?


@phryk commented on GitHub (Jan 4, 2022):

It's built by poudriere, which is a bulk package builder for FreeBSD that
uses FreeBSD's ports system to do the actual building.
You can find out more about it here: https://github.com/freebsd/poudriere/wiki

Actual disk usage I sadly don't have data for, but here are graphs for the other metrics, as well as gitea and nginx (the green one is gitea, or rather the transparent proxy for it):

![combined](https://user-images.githubusercontent.com/1021513/148078327-bff22987-23c6-4a64-bd4e-a50ce9acb788.png)


@phryk commented on GitHub (Jan 4, 2022):

I should probably also mention that no other services seem to be affected by this, so it looks like it's not hogging any system resources (or rather not any used by anything else) when it freezes – it just stops doing stuff.


@zeripath commented on GitHub (Jan 4, 2022):

This sounds like a deadlock happening somewhere, but postgres isn't usually a DB that suffers from them. (SQLite is usually the way we detect these.) Apart from DB deadlocks I'm not sure there's any other obvious thing that would cause a deadlock.

One problem in 1.15.7 and earlier was that `git cat-file --batch` wouldn't fail fast if the git repo is broken, but #17992 should prevent that - is this definitely still happening on 1.15.8?

If so, a few thoughts:

  • Do you have pprof enabled? (Be careful with this, as on 1.15 pprof will write files to `PPROF_DATA_PATH` every time people ssh to your server.) It may be helpful to poll the CPU profile to see if we can catch the problem as it happens.
  • Have you been able to rule out a network port exhaustion problem? Or open files problem?
  • In the router logs have you seen any common requests that aren't complete?
  • I worry that what we're seeing might be a bug in go on FreeBSD - but I just don't know how to prove that.

@phryk commented on GitHub (Jan 4, 2022):

  • Other services using PostgreSQL (with other databases) are unaffected, not sure if that means a DB deadlock can be excluded.
  • Yes, definitely happening on 1.15.8
  • Well, now I have pprof enabled, will report back when the bug hits again
  • I'm not quite sure what you mean with router logs and how I would tell a request didn't complete
    • `grep -i error gitea.log` shows nothing of interest; the only legitimate errors are from gitea being unable
      to update a mirror whose source repo got DMCA'd…

@zeripath commented on GitHub (Jan 7, 2022):

The logs appear somewhat confusing:

```
2022/01/04 23:11:01 ...s/context/context.go:185:HTML() [D] Template: repo/commits
2022/01/04 23:11:01 models/repo.go:504:ComposeMetas() [I] [SQL] SELECT team.lower_name FROM "team_repo" INNER JOIN "team" ON team.id = team_repo.team_id WHERE (team_repo.repo_id = $1) ORDER BY team.lower_name [1] - 268.794µs
2022/01/04 23:11:02 ...s/context/context.go:755:1() [D] Session ID: <crawler>
2022/01/04 23:11:02 ...s/context/context.go:756:1() [D] CSRF Token: <crawler>
2022/01/04 23:11:02 ...rm.io/xorm/engine.go:1139:Get() [I] [SQL] SELECT "id", "lower_name", "name", "full_name", "email", "keep_email_private", "email_notifications_preference", "passwd", "passwd_hash_algo", "must_change_password", "login_type", "login_source", "login_name", "type", "location", "website", "rands", "salt", "language", "description", "created_unix", "updated_unix", "last_login_unix", "last_repo_visibility", "max_repo_creation", "is_active", "is_admin", "is_restricted", "allow_git_hook", "allow_import_local", "allow_create_organization", "prohibit_login", "avatar", "avatar_email", "use_custom_avatar", "num_followers", "num_following", "num_stars", "num_repos", "num_teams", "num_members", "visibility", "repo_admin_change_team_access", "diff_view_style", "theme", "keep_activity_private" FROM "user" WHERE "lower_name"=$1 LIMIT 1 [owner] - 463.192µs
2022/01/04 23:11:02 ...rm.io/xorm/engine.go:1139:Get() [I] [SQL] SELECT "id", "owner_id", "owner_name", "lower_name", "name", "description", "website", "original_service_type", "original_url", "default_branch", "num_watches", "num_stars", "num_forks", "num_issues", "num_closed_issues", "num_pulls", "num_closed_pulls", "num_milestones", "num_closed_milestones", "num_projects", "num_closed_projects", "is_private", "is_empty", "is_archived", "is_mirror", "status", "is_fork", "fork_id", "is_template", "template_id", "size", "is_fsck_enabled", "close_issues_via_commit_in_any_branch", "topics", "trust_model", "avatar", "created_unix", "updated_unix" FROM "repository" WHERE "owner_id"=$1 AND "lower_name"=$2 LIMIT 1 [2 repo] - 413.763µs
2022/01/04 23:11:02 models/repo_unit.go:210:getUnitsByRepoID() [I] [SQL] SELECT "id", "repo_id", "type", "config", "created_unix" FROM "repo_unit" WHERE (repo_id = $1) [1] - 183.034µs
2022/01/04 23:11:02 ...s/repo_pushmirror.go:97:GetPushMirrorsByRepoID() [I] [SQL] SELECT "id", "repo_id", "remote_name", "interval", "created_unix", "last_update", "last_error" FROM "push_mirror" WHERE (repo_id=$1) [1] - 162.754µs
2022/01/04 23:11:02 ...ules/context/repo.go:470:RepoAssignment() [I] [SQL] SELECT count(*) FROM "release" WHERE repo_id=$1 AND is_draft=$2 [1 false] - 194.067µs
2022/01/04 23:11:02 ...ules/context/repo.go:477:RepoAssignment() [I] [SQL] SELECT count(*) FROM "release" WHERE repo_id=$1 AND is_draft=$2 AND is_tag=$3 [1 false false] - 196.112µs
2022/01/04 23:11:02 ...dules/git/command.go:120:RunInDirTimeoutEnvFullPipelineFunc() [D] /var/db/gitea/gitea-repositories/owner/repo.git: /usr/local/bin/git -c credential.helper= -c protocol.version=2 rev-parse
2022/01/04 23:11:02 ...dules/git/command.go:120:RunInDirTimeoutEnvFullPipelineFunc() [D] /var/db/gitea/gitea-repositories/owner/repo.git: /usr/local/bin/git -c credential.helper= -c protocol.version=2 show-ref --tags
2022/01/04 23:11:02 ...dules/git/command.go:120:RunInDirTimeoutEnvFullPipelineFunc() [D] /var/db/gitea/gitea-repositories/owner/repo.git: /usr/local/bin/git -c credential.helper= -c protocol.version=2 cat-file --batch
2022/01/04 23:11:02 ...dules/git/command.go:120:RunInDirTimeoutEnvFullPipelineFunc() [D] /var/db/gitea/gitea-repositories/owner/repo.git: /usr/local/bin/git -c credential.helper= -c protocol.version=2 cat-file --batch-check
```

This implies that the problem has occurred somewhere in:

https://github.com/go-gitea/gitea/blob/b25a571bc979d1a7483f488be7e342299837637b/modules/context/repo.go#L551-L558

But the problem is that there isn't really any place for a deadlock to occur in there.

My thoughts for progressing this further are to apply the following:

```patch
diff --git a/modules/git/command.go b/modules/git/command.go
index d83c42fdc..273bea632 100644
--- a/modules/git/command.go
+++ b/modules/git/command.go
@@ -120,6 +120,8 @@ func (c *Command) RunInDirTimeoutEnvFullPipelineFunc(env []string, timeout time.
 		log.Debug("%s: %v", dir, c)
 	}
 
+	defer log.Debug("Done %s: %v", dir, c)
+
 	ctx, cancel := context.WithTimeout(c.parentContext, timeout)
 	defer cancel()
 
diff --git a/modules/git/repo_branch_nogogit.go b/modules/git/repo_branch_nogogit.go
index 666ca81c1..b9a7a483f 100644
--- a/modules/git/repo_branch_nogogit.go
+++ b/modules/git/repo_branch_nogogit.go
@@ -68,6 +68,9 @@ func (repo *Repository) GetBranches(skip, limit int) ([]string, int, error) {
 
 // callShowRef return refs, if limit = 0 it will not limit
 func callShowRef(repoPath, prefix, arg string, skip, limit int) (branchNames []string, countAll int, err error) {
+	log.Debug("callShowRef %s %s %s %d %d", repoPath, prefix, arg, skip, limit)
+	defer log.Debug("done: callShowRef %s %s %s %d %d", repoPath, prefix, arg, skip, limit)
+
 	stdoutReader, stdoutWriter := io.Pipe()
 	defer func() {
 		_ = stdoutReader.Close()
diff --git a/modules/process/manager.go b/modules/process/manager.go
index e42e38a0f..7ea6a73e6 100644
--- a/modules/process/manager.go
+++ b/modules/process/manager.go
@@ -14,6 +14,8 @@ import (
 	"sort"
 	"sync"
 	"time"
+
+	"code.gitea.io/gitea/modules/log"
 )
 
 // TODO: This packages still uses a singleton for the Manager.
@@ -56,6 +58,7 @@ func GetManager() *Manager {
 
 // Add a process to the ProcessManager and returns its PID.
 func (pm *Manager) Add(description string, cancel context.CancelFunc) int64 {
+	log.Debug("Add(%s)", description)
 	pm.mutex.Lock()
 	pid := pm.counter + 1
 	pm.processes[pid] = &Process{
@@ -67,6 +70,7 @@ func (pm *Manager) Add(description string, cancel context.CancelFunc) int64 {
 	pm.counter = pid
 	pm.mutex.Unlock()
 
+	log.Debug("Done Add(%s) PID: %d", description, pid)
 	return pid
 }
 
```
 

That would help us to see if there is deadlock somewhere.

One final random thought is maybe the problem is in log/file.go - I guess it's worth another review.


@zeripath commented on GitHub (Jan 15, 2022):

Have you seen any further freezes?


@phryk commented on GitHub (Jan 16, 2022):

Yes and no. The bug seems to have triggered again, but this time the process actually managed to die.
Logs are on their way to you.


@zeripath commented on GitHub (Jan 16, 2022):

OK, so somewhat reassuringly the problem is not in `process.GetManager().Add()`, and the fact that the error has occurred in the same place suggests that logging is not to blame.

The problem lies somewhere in:

https://github.com/go-gitea/gitea/blob/6cb5069bf6679e0def94aabb7c52f9fecd4b3009/modules/git/command.go#L123-L155

The only places left therefore are:

  1. line 123: `ctx, cancel := context.WithTimeout(c.parentContext, timeout)` - is this deadlocking somehow?
  2. line 126: `cmd := exec.CommandContext(ctx, c.name, c.args...)` - this requires stat'ing the filesystem. Is the filesystem locking for some reason?
  3. line 148: `if err := cmd.Start(); err != nil {` - this involves starting the process itself, so maybe there is some issue with BSD process creation here.

None of these are looking easily soluble; I guess adding some logging to this section of code is the only thing to do to move closer to working out the possible reason.


```patch
diff --git a/modules/git/command.go b/modules/git/command.go
index d83c42fdc..71c398aae 100644
--- a/modules/git/command.go
+++ b/modules/git/command.go
@@ -120,9 +120,13 @@ func (c *Command) RunInDirTimeoutEnvFullPipelineFunc(env []string, timeout time.
 		log.Debug("%s: %v", dir, c)
 	}
 
+	defer log.Debug("Done %s: %v", dir, c)
+
 	ctx, cancel := context.WithTimeout(c.parentContext, timeout)
 	defer cancel()
 
+	log.Debug("%s: %v created context", dir, c)
+
 	cmd := exec.CommandContext(ctx, c.name, c.args...)
 	if env == nil {
 		cmd.Env = os.Environ()
@@ -130,6 +134,8 @@ func (c *Command) RunInDirTimeoutEnvFullPipelineFunc(env []string, timeout time.
 		cmd.Env = env
 	}
 
+	log.Debug("%s: %v created CommandContext", dir, c)
+
 	cmd.Env = append(
 		cmd.Env,
 		fmt.Sprintf("LC_ALL=%s", DefaultLocale),
@@ -149,6 +155,8 @@ func (c *Command) RunInDirTimeoutEnvFullPipelineFunc(env []string, timeout time.
 		return err
 	}
 
+	log.Debug("%s: %v started", dir, c)
+
 	desc := c.desc
 	if desc == "" {
 		desc = fmt.Sprintf("%s %s %s [repo_path: %s]", GitExecutable, c.name, strings.Join(c.args, " "), dir)
diff --git a/modules/git/repo_branch_nogogit.go b/modules/git/repo_branch_nogogit.go
index 666ca81c1..b9a7a483f 100644
--- a/modules/git/repo_branch_nogogit.go
+++ b/modules/git/repo_branch_nogogit.go
@@ -68,6 +68,9 @@ func (repo *Repository) GetBranches(skip, limit int) ([]string, int, error) {
 
 // callShowRef return refs, if limit = 0 it will not limit
 func callShowRef(repoPath, prefix, arg string, skip, limit int) (branchNames []string, countAll int, err error) {
+	log.Debug("callShowRef %s %s %s %d %d", repoPath, prefix, arg, skip, limit)
+	defer log.Debug("done: callShowRef %s %s %s %d %d", repoPath, prefix, arg, skip, limit)
+
 	stdoutReader, stdoutWriter := io.Pipe()
 	defer func() {
 		_ = stdoutReader.Close()
diff --git a/modules/process/manager.go b/modules/process/manager.go
index e42e38a0f..7ea6a73e6 100644
--- a/modules/process/manager.go
+++ b/modules/process/manager.go
@@ -14,6 +14,8 @@ import (
 	"sort"
 	"sync"
 	"time"
+
+	"code.gitea.io/gitea/modules/log"
 )
 
 // TODO: This packages still uses a singleton for the Manager.
@@ -56,6 +58,7 @@ func GetManager() *Manager {
 
 // Add a process to the ProcessManager and returns its PID.
 func (pm *Manager) Add(description string, cancel context.CancelFunc) int64 {
+	log.Debug("Add(%s)", description)
 	pm.mutex.Lock()
 	pid := pm.counter + 1
 	pm.processes[pid] = &Process{
@@ -67,6 +70,7 @@ func (pm *Manager) Add(description string, cancel context.CancelFunc) int64 {
 	pm.counter = pid
 	pm.mutex.Unlock()
 
+	log.Debug("Done Add(%s) PID: %d", description, pid)
 	return pid
 }
 
```
 

(Remember that GitHub likes to pretend the final empty line doesn't exist, so if you copy this, add a terminating empty line.)


@phryk commented on GitHub (Jan 16, 2022):

Patched, deployed and will get back next time the bug strikes. Thanks for the assistance. :)


@thearchivalone commented on GitHub (Jan 20, 2022):

@phryk thanks for posting this bug report. I'm having the same issue running Gitea in Bastille, with postgres running in the same container. I'm currently updating to the latest release, but did notice that the hang happens the same way as yours, and I have to fully restart the server when I need to update a setting or two (I'm still getting Gitea moved over and configured properly from a Linux server).

Definitely seeing this issue with 1.15.10 on my end.

The easiest way to get it to trigger for me was to just spam `service gitea restart` until it freezes at the "Waiting for PIDS" message. Sending this to the port maintainer.


@zeripath commented on GitHub (Jan 20, 2022):

@bedwardly-down as I said above, this isn't looking like a Gitea bug per se. The three points of possible deadlock are all deep in Go standard library code, and likely at system calls.

My greatest suspicion unfortunately falls on the `context.WithTimeout` call. If it's there, then that's a serious problem and working around it will not be easy (although we could simply drop `WithTimeout`, assuming `WithCancel` is unaffected).


@thearchivalone commented on GitHub (Jan 20, 2022):

@zeripath thanks for responding. I'm waiting on the FreeBSD port maintainer to get back to me on it, but Go apps having this kind of issue is not uncommon on FreeBSD, according to other maintainers I've been interacting with in the IRC and official Discord channels. I ran a test yesterday that kind of goes along with your assumptions, too.

I'm primarily involved with the Node.js ecosystem right now, so I use the PM2 process manager and Nodemon pretty regularly. Running Gitea through PM2, strapped to FreeBSD's init system, showed that the port was attempting to start and stop at the exact same time in various places, causing it to deadlock. I had a similar occurrence with Caddy and a few other Go apps on my old Gentoo server (it uses a modified version of FreeBSD's init system, with most of the core functionality being exactly the same between the two). Running Gitea and Caddy directly rather than through init seemed to work fine in several of my tests.


@phryk commented on GitHub (Jan 20, 2022):

Bug hit again. Logs will shortly be on the way to @zeripath, last 3 lines from gitea.log look like this:

2022/01/20 03:24:11 ...dules/git/command.go:120:RunInDirTimeoutEnvFullPipelineFunc() [D] /var/db/gitea/gitea-repositories/mirror/foo.git: /usr/local/bin/git -c credential.helper= -c protocol.version=2 rev-list --count <somehash>
2022/01/20 03:24:11 ...dules/git/command.go:128:RunInDirTimeoutEnvFullPipelineFunc() [D] /var/db/gitea/gitea-repositories/mirror/foo.git: /usr/local/bin/git -c credential.helper= -c protocol.version=2 rev-list --count <somehash> created context
2022/01/20 03:24:11 ...dules/git/command.go:137:RunInDirTimeoutEnvFullPipelineFunc() [D] /var/db/gitea/gitea-repositories/mirror/foo.git: /usr/local/bin/git -c credential.helper= -c protocol.version=2 rev-list --count <somehash> created CommandContext

@zeripath commented on GitHub (Jan 20, 2022):

OK well we've established that the problem is in starting the process:

https://github.com/go-gitea/gitea/blob/6cb5069bf6679e0def94aabb7c52f9fecd4b3009/modules/git/command.go#L148

I think this is likely to be an os/jail problem - @bedwardly-down's comment suggests that perhaps the issue might be some deadlock in PM2 with processes being created at exactly the same time. If so, there's nothing we as gitea can do.

You could try to use the gogit variant - as this will create a lot fewer calls to git - which might reduce the issue?


@thearchivalone commented on GitHub (Jan 22, 2022):

> OK well we've established that the problem is in starting the process:
>
> https://github.com/go-gitea/gitea/blob/6cb5069bf6679e0def94aabb7c52f9fecd4b3009/modules/git/command.go#L148
>
> I think this is likely to be an os/jail problem - @bedwardly-down's comment suggests that perhaps the issue might be some deadlock in PM2 with processes being created at exactly the same time. If so, there's nothing we as gitea can do.
>
> You could try to use the gogit variant - as this will create a lot fewer calls to git - which might reduce the issue?

To further clarify, PM2 was not how I normally ran Gitea. It was a test to see what’s actually happening behind the scenes with a tool that has a built in monitor that prints some basic but useful information. It also doesn’t seem to have any issues with Sqlite3.


@thearchivalone commented on GitHub (Jan 23, 2022):

@phryk how dependent are you on getting your git up and running? A large chunk of my daily needs are built around git and version control, so this definitely was an inconvenience for me. I hope it gets fixed pretty quickly upstream.


@lunny commented on GitHub (Jan 23, 2022):

Have you caught the issue with pprof enabled? If so, could you upload the pprof report?


@thearchivalone commented on GitHub (Jan 23, 2022):

I haven’t tried or heard of that. I’ll have to tinker with that later.


@phryk commented on GitHub (Jan 23, 2022):

@lunny I've tried that before, but with the process frozen, pprof doesn't answer anymore either.
@zeripath recommended periodically polling pprof and saving the last couple results. I might get to that in a couple days.

It would be easier if the info collected by pprof also went into the openmetrics output, as that's already being polled. :P

@bedwardly-down I'm running a private gitea instance which I use for all my projects, so it is a bit of an inconvenience for me personally, but I don't have anyone else depending on the service.


@zeripath commented on GitHub (Jan 23, 2022):

I'm really not sure pprof is going to help much. The problem appears to be in os/exec/exec.go within go's std library itself.

os/exec/exec.go:(*cmd).Start()

Walking through the code of Start in there points to:

	c.Process, err = os.StartProcess(c.Path, c.argv(), &os.ProcAttr{
		Dir:   c.Dir,
		Files: c.childFiles,
		Env:   addCriticalEnv(dedupEnv(envv)),
		Sys:   c.SysProcAttr,
	})

As the place where the problem is.

syscall/exec_unix.go:os.StartProcess(...)

Which on bsd, unix and linux all call forkExec in syscall/exec_unix.go. A cursory glance of this code shows:

...
	// Acquire the fork lock so that no other threads
	// create new fds that are not yet close-on-exec
	// before we fork.
	ForkLock.Lock()

	// Allocate child status pipe close on exec.
	if err = forkExecPipe(p[:]); err != nil {
		ForkLock.Unlock()
		return 0, err
	}

	// Kick off child.
	pid, err1 = forkAndExecInChild(argv0p, argvp, envvp, chroot, dir, attr, sys, p[1])
	if err1 != 0 {
		Close(p[0])
		Close(p[1])
		ForkLock.Unlock()
		return 0, Errno(err1)
	}
	ForkLock.Unlock()
...

Now if there is a panic in there the ForkLock could end up being left locked but the panic should be seen in our logs and I see no evidence of this.

Which leads me to think that either forkExecPipe or forkAndExecInChild are blocking.

forkExecPipe

Now forkExecPipe in linux is somewhat more complex than that on bsd:

BSD (syscall/forkpipe2.go):

func forkExecPipe(p []int) error {
	return Pipe2(p, O_CLOEXEC)
}

Linux (exec_linux.go):

// Try to open a pipe with O_CLOEXEC set on both file descriptors.
func forkExecPipe(p []int) (err error) {
	err = Pipe2(p, O_CLOEXEC)
	// pipe2 was added in 2.6.27 and our minimum requirement is 2.6.23, so it
	// might not be implemented.
	if err == ENOSYS {
		if err = Pipe(p); err != nil {
			return
		}
		if _, err = fcntl(p[0], F_SETFD, FD_CLOEXEC); err != nil {
			return
		}
		_, err = fcntl(p[1], F_SETFD, FD_CLOEXEC)
	}
	return
}

But the complexity here is simply falling back. So if there is a deadlock here it's in the syscall pipe2. https://www.freebsd.org/cgi/man.cgi?query=pipe2&sektion=2&format=html

The man page does not indicate that this could block but dtrace and ktrace will capture these calls.

forkAndExecInChild

The implementations between linux and bsd are substantially different here and it's getting into the deep systems programming level of starting processes that I'm afraid I know little to nothing about. The file is syscall/exec_bsd.go

	// About to call fork.
	// No more allocation or calls of non-assembly functions.
	runtime_BeforeFork()
	r1, _, err1 = RawSyscall(SYS_FORK, 0, 0, 0)
	if err1 != 0 {
		runtime_AfterFork()
		return 0, err1
	}

	if r1 != 0 {
		// parent; return PID
		runtime_AfterFork()
		return int(r1), 0
	}

Fundamentally there is an assembly call into runtime_BeforeFork which calls systemstack(beforeFork) which tells the system stack to block signals. (Could the blocking of signals be causing a problem? Does your jail send up a signal if Fork is blocked?)

Then https://www.freebsd.org/cgi/man.cgi?fork(2), and we check the pid returned - and if we're the parent or an error occurs returns the pid or error after running runtime_AfterFork() which calls systemstack(afterFork) reversing the changes of fork.

Remaining part of forkExec after ForkLock.Unlock()

	// Read child error status from pipe.
	Close(p[1])
	for {
		n, err = readlen(p[0], (*byte)(unsafe.Pointer(&err1)), int(unsafe.Sizeof(err1)))
		if err != EINTR {
			break
		}
	}
	Close(p[0])
	if err != nil || n != 0 {
		if n == int(unsafe.Sizeof(err1)) {
			err = Errno(err1)
		}
		if err == nil {
			err = EPIPE
		}

		// Child failed; wait for it to exit, to make sure
		// the zombies don't accumulate.
		_, err1 := Wait4(pid, &wstatus, 0, nil)
		for err1 == EINTR {
			_, err1 = Wait4(pid, &wstatus, 0, nil)
		}
		return 0, err
	}

	// Read got EOF, so pipe closed on exec, so exec succeeded.
	return pid, nil

I guess this could block - but I don't understand why it would take down the whole system.

Summary

  • The blocking has been traced to os/exec/exec.go:(*cmd).Start()
  • It's likely to be in syscall/exec_unix.go:os.StartProcess(...) and hence forkExec in syscall/exec_unix.go.
  • The reports of the entire go process halting make me suspicious about something happening around forkAndExecInChild in syscall/exec_bsd.go.
  • In particular around the call to fork(2) either at that call or at the systemstack calls
  • This should be dtraceable or ktraceable.
  • Alternatively something happening around pipe(2) in forkAndExecInChild could be blocking but again this should be dtraceable/ktraceable.
  • After the ForkLock has unlocked there is some code around reading the error from the child pipe but this shouldn't kill the whole go process.

So we're likely looking at a bug in go's runtime, either due to some weird resource limit handling of the jail and the way it reports issues to go, or, even more difficult to fix, some bug in the OS around creating processes after some limit has been reached.

Maybe this https://github.com/golang/go/issues/43873 is related?

You've never told us the version of FreeBSD you're running or given us any information about how you have set up the jail. Could the parameters of your jail be responsible?


@thearchivalone commented on GitHub (Jan 23, 2022):

@zeripath Thanks for looking into this as much as you have. For mine:

  • FreeBSD 13.0 p6.0 (latest stable patches and kernel)
  • Jail - BastilleBSD latest 0.9 patches
  • Postgres 13.5
  • Intel Xeon 3cpu 2.6GHz per core
  • 4 GB DDR4
  • 75GB NVMe SSD

@phryk I just had a thought: are you running postgres in the same jail as Gitea instead of separately and interacting with it through a port? To get it to work, it needed some modifications to its main config file to allow it to use more RAM and other system resources than most BSD jails initially allocate. I haven’t tried postgres in a separate jail yet since many online sources recommended encapsulating DB and its software in a single unit to allow easier transfer and backup. I wonder if the deadlock is caused because of that.


@thearchivalone commented on GitHub (Jan 23, 2022):

The issue wasn’t solved by moving postgres to a separate container for me. Even running postgres through a port and connecting that way caused gitea to hang. Removing the app.ini and letting it try to generate a new one also caused it to hang even with the /usr/local/etc/gitea/conf directory permissions set to 755 or 777. So far, it looks like SQLite is the definitive way to run it right now.

Can I get anyone else to try it with Sqlite? So far, that works in my tests but isn’t ideal if you have big repositories


@zeripath commented on GitHub (Jan 23, 2022):

@bedwardly-down are you sure that you have the same problem as @phryk? It would be useful to double-check where your hangs are happening. I cannot see why SQLite would be better for a problem relating to forking.

If you're finding that SQLite is better then could you try connecting to postgres over a unix socket instead of a tcp port, as it might be that you're suffering tcp port exhaustion instead.
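For reference, a sketch of the relevant app.ini fragment - the socket path here is an assumption (FreeBSD's postgres packages commonly default to /tmp; check unix_socket_directories in postgresql.conf):

```ini
; [database] section of app.ini - giving HOST a filesystem path makes Gitea
; connect to postgres over its unix socket rather than a tcp port.
[database]
DB_TYPE = postgres
HOST    = /tmp/   ; assumed socket directory; verify on your system
NAME    = gitea
USER    = gitea
```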


Thinking again about this blocking problem relating to fork, I wonder if the block is happening because a page fault is occurring in fork and the signal cannot be handled?

@phryk what version of go are you building Gitea with? Please ensure that go is the most recent version. It might help to set allow.mlock in the jail.


@thearchivalone commented on GitHub (Jan 23, 2022):

@zeripath honestly, it may be different. There’s not enough info from the original poster to get a clear picture and the only common thread we have is postgres and I’m grasping straws using my own limited knowledge of how gitea and its database support works.


@zeripath commented on GitHub (Jan 23, 2022):

@bedwardly-down could you apply the patch in https://github.com/go-gitea/gitea/issues/18180#issuecomment-1013916215 and check your logs for when the deadlock occurs to see if it's at the same place.

If you find that the final log line is the same as in @phryk's logs, you're hitting the same problem. If not - well then we get to find another bug.

If switching to sqlite helps then that means port exhaustion is more likely. Similarly, if using a unix socket for postgres helps then it's far more likely to be a port exhaustion problem.


@thearchivalone commented on GitHub (Jan 23, 2022):

Didn’t apply the patch. Running it straight through the unix socket inside the same container finally solved my issue. Looking at the system logs from when I was previously connecting through a port (whether from a separate container or a local one), there was no indication that my permissions within Postgres were wrong. I fixed them and can now run Gitea with no issues. So, I’ll have to agree that mine was probably port exhaustion as you suggested.


@zeripath commented on GitHub (Jan 23, 2022):

If you're using an http/https proxy like nginx you should also be able to make gitea run as http+unix, which would then eliminate port exhaustion (except through migrations) as a cause for problems. (Well, I say "eliminate" - I mean stop Gitea from being the cause - the port exhaustion could still happen in nginx.)


Actually it looks like Go is now setting SO_REUSEADDR so our server listening shouldn't be affected by port exhaustion - perhaps the DB connection sockets aren't set with SO_REUSEADDR?
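For anyone wanting to try the http+unix suggestion, a sketch of the [server] fragment - the socket path and permission are assumptions, adjust for your layout:

```ini
; [server] section of app.ini - serve Gitea itself over a unix socket
[server]
PROTOCOL               = http+unix
HTTP_ADDR              = /var/run/gitea/gitea.sock  ; assumed path
UNIX_SOCKET_PERMISSION = 666                        ; so the proxy can connect
```

nginx can then reach it with `proxy_pass http://unix:/var/run/gitea/gitea.sock:;` (note the trailing colon in nginx's unix-socket syntax).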


@thearchivalone commented on GitHub (Jan 23, 2022):

I’m using Caddy in a separate container and reverse proxying straight from the local container ports. I don't believe Bastille supports unix sockets right now but that's possibly in the works. I have under 10 containers so far and caddy is only currently serving 3 while the rest are completely internal. That’s not good. I’ll have to research a bit on how to wrangle the containers into a single socket. Thanks.


@tsowa commented on GitHub (Jan 25, 2022):

I have been using gitea for more than a year and have never had such problems. My configuration:

$ uname -srm
FreeBSD 13.0-STABLE amd64

Jails were created with make distribution and make installworld from /usr/src; actually I am using one jail as a template and then only tar -x to the specific location. The main system (host) is built from source; after the host is built, the jails are updated with the same binaries: I am using a custom script which does mergemaster, make installworld, and make delete-old(libs) to the specific jail location with the -D and DESTDIR parameters. Packages are built in a different jail (I am not using poudriere) and are available to other jails through a custom pkg repository.

From pf (packet filter) I am making a redirect to a jail with nginx:

rdr on $ext_if proto tcp from any to {$main_ext_ip, $main_ext_ip2} port {80, 443} tag X51REDIRECT -> $jail_nginx_ip
rdr on lo0 proto tcp from {$main_ext_ip, $main_ext_ip2, 192.168.1.0/24} to {$main_ext_ip, $main_ext_ip2} port {80, 443} tag X51REDIRECT -> $jail_nginx_ip

pass in quick on $ext_if proto tcp from any to $jail_nginx_ip port { 80, 443 } flags S/SA keep state (if-bound) queue (www_out, ack_out)
pass in quick on lo0 proto tcp from {$main_ext_ip, $main_ext_ip2} to $jail_nginx_ip port { 80, 443 } flags S/SA keep state (if-bound) queue (www_out, ack_out)
pass out quick on lo0 proto tcp from 192.168.1.0/24 to {$main_ext_ip, $main_ext_ip2} port { 80, 443 } flags S/SA keep state (if-bound)
pass in  quick on lo0 proto tcp from 192.168.1.0/24 to $jail_nginx_ip port { 80, 443 } flags S/SA keep state (if-bound) queue (www_out, ack_out)

In the jail the nginx config looks like:

   server {
        listen      192.168.1.61:80;
        listen      192.168.1.61:443 ssl;

        server_name gitea.ttmath.org;
        client_max_body_size 300M;

        ssl_certificate     /etc/ssl/ttmath.org/certs.pem;
        ssl_certificate_key /etc/ssl/ttmath.org/privatekey.pem;
        ssl_dhparam         /etc/ssl/ttmath.org/dhparam.pem;

        location / {
            proxy_pass       http://192.168.1.93:3002;
            proxy_read_timeout 360s;
            include          proxy-headers.conf;

            if ($scheme = http) {
               return 301 https://$host$request_uri;
            }
        }

        include location-well-known.conf;
        include ssl-params.conf;
    }

proxy-headers.conf:

proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;

ssl-params.conf:

ssl_protocols TLSv1 TLSv1.1 TLSv1.2;
ssl_prefer_server_ciphers on;
ssl_session_cache    shared:SSL:1m;
ssl_session_timeout  5m;
ssl_ciphers  HIGH:!aNULL:!MD5;

Nginx makes a proxy to the jail where gitea is installed together with postgresql.

$ pkg info -x gitea postgresql
gitea-1.15.10
postgresql13-client-13.3
postgresql13-server-13.4_1

My jails config /etc/jail.conf:

nginx {
        exec.start = "/bin/sh /etc/rc";
        exec.stop = "/bin/sh /etc/rc.shutdown";
        exec.clean;
        mount.devfs;
        host.hostname = "nginx.ttmath.org";
        
        path = "/home/jails/nginx";
        ip4.addr = "192.168.1.61";
        allow.mount = "true";
        allow.raw_sockets="true";
        allow.sysvipc="true";

        linux = "new";
        sysvmsg = "new";
        sysvsem = "new";
        sysvshm = "new";
}

gitea {
        exec.start = "/bin/sh /etc/rc";
        exec.stop = "/bin/sh /etc/rc.shutdown";
        exec.clean;
        mount.devfs;

        host.hostname = "gitea.ttmath.org";
        
        path = "/2/jails/gitea";
        ip4.addr = "192.168.1.93";
        allow.mount = "true";
        allow.raw_sockets="true";
        allow.sysvipc="true";
        allow.chflags="true";

        linux = "new";
        sysvmsg = "new";
        sysvsem = "new";
        sysvshm = "new";
}

Make sure you have sysv* set in your config if you are using postgresql.

My limits for the gitea jail:

# rctl -h jail:gitea
jail:gitea:maxproc:deny=500
jail:gitea:openfiles:deny=49K
jail:gitea:memoryuse:deny=6144M

and the average resource use:

# rctl -h -u jail:gitea
cputime=39K
datasize=13M
stacksize=0
coredumpsize=0
memoryuse=929M
memorylocked=0
maxproc=28
openfiles=8968
vmemoryuse=5357M
pseudoterminals=1
swapuse=0
nthr=111
msgqqueued=0
msgqsize=0
nmsgq=0
nsem=0
nsemop=0
nshm=1
shmsize=4096
wallclock=21M
pcpu=0
readbps=0
writebps=0
readiops=0
writeiops=0

In fact, there are three separate instances of gitea in the jail, each with its own startup script. The scripts are based on the standard /usr/local/etc/rc.d/gitea, though, so this should not be the problem.

Someone said above that this error can be provoked by a service restart, so I tried:

sh -c 'n=1000; while [ $n -ne 0 ] ; do service gitea3 restart ; n=`expr $n - 1` ; done'

but nothing went wrong. Maybe this depends on the kind of git repository. If you wish, I can prepare an empty gitea instance for you to interact with, to see whether the problem occurs there.


@tsowa commented on GitHub (Jan 25, 2022):

For completeness: I am not using ZFS but UFS with soft-updates:

$ mount | grep gm1s1a
/dev/mirror/gm1s1a on /2 (ufs, local, noatime, soft-updates)

In the past I used soft-update journaling, but there was a bug in it that caused the file system to lock up (https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224292), so I changed the filesystem configuration so that UFS now uses plain soft-updates.


@phryk commented on GitHub (Jan 27, 2022):

My setup looks a bit different. I'm not really sure how much
(if any) of this is useful to know, but here you go:

  • Intel Core i7-2600
  • 16G non-ECC RAM
  • 2x 3TB HDD
  • 128GB SSD
  • some RealTek NIC

Set up on this is a custom FreeBSD (12.2) install with redundant ZFS pools
and a hybrid HDD/SSD UFS mirror (which is where database-specific data lives).
All drives use 256 bit AES-XTS encryption via geom_eli, accelerated with AESNI.

All currently deployed packages are built by a poudriere run on the host OS,
but this is a relatively recent change on this setup and the bug already happened
way before it.

I'm running a custom thinjail setup using /etc/jail.conf in which I have just one
base-system (/jail/base) that's being mounted read-only into all jail roots with
nullfs and another read-write nullfs mount of /jail/rw/<jailname> for most
other stuff on top – the Postgres' data directory being a notable exception as
that's yet another read-write nullfs mount.

All jails are on an extra loopback interface, lo1, so communication to and
from them can't reach the internet without going through the firewall (pf).
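The pf side of such a setup might look roughly like this sketch; the interface name and macros are my assumptions, not taken from the comment:

```
ext_if = "re0"            # assumed external interface
jail_net = "lo1:network"  # the jail loopback network

nat on $ext_if from $jail_net to any -> ($ext_if)  # jails reach out via NAT only
block in on lo1 all                                # default-deny jail traffic
pass in on lo1 proto tcp from $jail_net to any port { 80, 443 } keep state
```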

I use nginx as a reverse proxy, with gitea running on TCP port 3000 in the
same jail (http). PostgreSQL, however, runs in a separate jail (database) and
is only exposed through a UNIX socket in a directory that is nullfs-mounted
into http so that services in that jail can access the database (this offers
pretty neat and granular access control).
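In jail.conf terms, that kind of socket hand-off can be sketched with a nullfs mount entry; the paths here are hypothetical, not the actual ones:

```
http {
        # expose the database jail's PostgreSQL socket directory to this jail
        mount += "/jail/rw/database/pgsocket /jail/rw/http/var/run/postgresql nullfs rw 0 0";
}
```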

I think I already said this to @zeripath on IRC, but it might be worth
reiterating: I get the feeling that the root cause of my problem might be
buried somewhere in nullfs.


@feld commented on GitHub (Jan 27, 2022):

I also use the nullfs trick to mount postgres, mysql, and syslogd sockets into jails, but have never encountered issues with it, FWIW.

edit: what is your SSD? Some have small buffers that cause the entire device to choke if you fill it completely with writes


@phryk commented on GitHub (Mar 29, 2022):

30 days ago, I did a (minor) FreeBSD update which contained a new kernel
version and updated gitea to 1.15.10. Go might also have been updated,
though I'm not sure about that – it's now at 1.17.6. I have seen no freezes since.

I'm closing this issue for now – feel free to reopen if you think this isn't warranted.

I would of course prefer to know what the underlying issue was, but it
sounds like that would require investing a huge amount of time. :P

Reference: github-starred/gitea#8318