Recurring freezes on BSD #8318

@zeripath commented on GitHub (Jan 4, 2022):

This sounds like a deadlock happening somewhere but postgres isn't usually a DB that suffers from them. (SQLite is usually the way we detect these.) Apart from DB deadlocks I'm not sure there's any other obvious thing that would cause a deadlock.

One problem in 1.15.7- was that git cat-file --batch won't fail fast if the git repo is broken but #17992 should prevent that - Is this definitely still happening on 1.15.8?

If so, a few thoughts:

Do you have pprof enabled? (be careful with this as on 1.15 pprof will write files to PPROF_DATA_PATH every time people ssh to your server.) It may be helpful to poll the cpuprofile to see if we can catch the problem as it happens.
Have you been able to rule out a network port exhaustion problem? Or open files problem?
In the router logs have you seen any common requests that aren't complete?
I worry that what we're seeing might be a bug in go on FreeBSD - but I just don't know how to prove that.

@zeripath commented on GitHub (Jan 4, 2022): This sounds like a deadlock happening somewhere but postgres isn't usually a DB that suffers from them. (SQLite is usually the way we detect these.) Apart from DB deadlocks I'm not sure there's any other obvious thing that would cause a deadlock. One problem in 1.15.7- was that `git cat-file --batch` won't fail fast if the git repo is broken but #17992 should prevent that - Is this definitely still happening on 1.15.8? If so, a few thoughts: * Do you have pprof enabled? (be careful with this as on 1.15 pprof will write files to PPROF_DATA_PATH every time people ssh to your server.) It may be helpful to poll the cpuprofile to see if we can catch the problem as it happens. * Have you been able to rule out a network port exhaustion problem? Or open files problem? * In the router logs have you seen any common requests that aren't complete? * I worry that what we're seeing might be a bug in go on FreeBSD - but I just don't know how to prove that.

GiteaMirror commented

@phryk commented on GitHub (Jan 4, 2022):

Other services using PostgreSQL (with other databases) are unaffected, not sure if that means a DB deadlock can be excluded.
Yes, definitely happening on 1.15.8
Well, now I have pprof enabled, will report back when the bug hits again
I'm not quite sure what you mean with router logs and how I would tell a request didn't complete
- grep -i error gitea.log shows nothing of interest, the only legitimate errors are from gitea being unable
  to update a mirror whose source repo got DMCA'd…

@phryk commented on GitHub (Jan 4, 2022): * Other services using PostgreSQL (with other databases) are unaffected, not sure if that means a DB deadlock can be excluded. * Yes, definitely happening on 1.15.8 * Well, *now* I have pprof enabled, will report back when the bug hits again * I'm not quite sure what you mean with router logs and how I would tell a request didn't complete * `grep -i error gitea.log` shows nothing of interest, the only legitimate errors are from gitea being unable to update a mirror whose source repo got DMCA'd…

GiteaMirror commented

b25a571bc9/modules/context/repo.go (L551-L558)

@zeripath commented on GitHub (Jan 7, 2022):

The logs appear somewhat confusing:

2022/01/04 23:11:01 ...s/context/context.go:185:HTML() [D] Template: repo/commits
2022/01/04 23:11:01 models/repo.go:504:ComposeMetas() [I] [SQL] SELECT team.lower_name FROM "team_repo" INNER JOIN "team" ON team.id = team_repo.team_id WHERE (team_repo.repo_id = $1) ORDER BY team.lower_name [1] - 268.794µs
2022/01/04 23:11:02 ...s/context/context.go:755:1() [D] Session ID: <crawler>
2022/01/04 23:11:02 ...s/context/context.go:756:1() [D] CSRF Token: <crawler>
2022/01/04 23:11:02 ...rm.io/xorm/engine.go:1139:Get() [I] [SQL] SELECT "id", "lower_name", "name", "full_name", "email", "keep_email_private", "email_notifications_preference", "passwd", "passwd_hash_algo", "must_change_password", "login_type", "login_source", "login_name", "type", "location", "website", "rands", "salt", "language", "description", "created_unix", "updated_unix", "last_login_unix", "last_repo_visibility", "max_repo_creation", "is_active", "is_admin", "is_restricted", "allow_git_hook", "allow_import_local", "allow_create_organization", "prohibit_login", "avatar", "avatar_email", "use_custom_avatar", "num_followers", "num_following", "num_stars", "num_repos", "num_teams", "num_members", "visibility", "repo_admin_change_team_access", "diff_view_style", "theme", "keep_activity_private" FROM "user" WHERE "lower_name"=$1 LIMIT 1 [owner] - 463.192µs
2022/01/04 23:11:02 ...rm.io/xorm/engine.go:1139:Get() [I] [SQL] SELECT "id", "owner_id", "owner_name", "lower_name", "name", "description", "website", "original_service_type", "original_url", "default_branch", "num_watches", "num_stars", "num_forks", "num_issues", "num_closed_issues", "num_pulls", "num_closed_pulls", "num_milestones", "num_closed_milestones", "num_projects", "num_closed_projects", "is_private", "is_empty", "is_archived", "is_mirror", "status", "is_fork", "fork_id", "is_template", "template_id", "size", "is_fsck_enabled", "close_issues_via_commit_in_any_branch", "topics", "trust_model", "avatar", "created_unix", "updated_unix" FROM "repository" WHERE "owner_id"=$1 AND "lower_name"=$2 LIMIT 1 [2 repo] - 413.763µs
2022/01/04 23:11:02 models/repo_unit.go:210:getUnitsByRepoID() [I] [SQL] SELECT "id", "repo_id", "type", "config", "created_unix" FROM "repo_unit" WHERE (repo_id = $1) [1] - 183.034µs
2022/01/04 23:11:02 ...s/repo_pushmirror.go:97:GetPushMirrorsByRepoID() [I] [SQL] SELECT "id", "repo_id", "remote_name", "interval", "created_unix", "last_update", "last_error" FROM "push_mirror" WHERE (repo_id=$1) [1] - 162.754µs
2022/01/04 23:11:02 ...ules/context/repo.go:470:RepoAssignment() [I] [SQL] SELECT count(*) FROM "release" WHERE repo_id=$1 AND is_draft=$2 [1 false] - 194.067µs
2022/01/04 23:11:02 ...ules/context/repo.go:477:RepoAssignment() [I] [SQL] SELECT count(*) FROM "release" WHERE repo_id=$1 AND is_draft=$2 AND is_tag=$3 [1 false false] - 196.112µs
2022/01/04 23:11:02 ...dules/git/command.go:120:RunInDirTimeoutEnvFullPipelineFunc() [D] /var/db/gitea/gitea-repositories/owner/repo.git: /usr/local/bin/git -c credential.helper= -c protocol.version=2 rev-parse
2022/01/04 23:11:02 ...dules/git/command.go:120:RunInDirTimeoutEnvFullPipelineFunc() [D] /var/db/gitea/gitea-repositories/owner/repo.git: /usr/local/bin/git -c credential.helper= -c protocol.version=2 show-ref --tags
2022/01/04 23:11:02 ...dules/git/command.go:120:RunInDirTimeoutEnvFullPipelineFunc() [D] /var/db/gitea/gitea-repositories/owner/repo.git: /usr/local/bin/git -c credential.helper= -c protocol.version=2 cat-file --batch
2022/01/04 23:11:02 ...dules/git/command.go:120:RunInDirTimeoutEnvFullPipelineFunc() [D] /var/db/gitea/gitea-repositories/owner/repo.git: /usr/local/bin/git -c credential.helper= -c protocol.version=2 cat-file --batch-check

This implies that the problem has occurred somewhere in:

But the problem is that there isn't really any place for a deadlock to occur in there.

My thoughts for progressing this further are to apply the following:

diff --git a/modules/git/command.go b/modules/git/command.go
index d83c42fdc..273bea632 100644
--- a/modules/git/command.go
+++ b/modules/git/command.go
@@ -120,6 +120,8 @@ func (c *Command) RunInDirTimeoutEnvFullPipelineFunc(env []string, timeout time.
 		log.Debug("%s: %v", dir, c)
 	}
 
+	defer log.Debug("Done %s: %v", dir, c)
+
 	ctx, cancel := context.WithTimeout(c.parentContext, timeout)
 	defer cancel()
 
diff --git a/modules/git/repo_branch_nogogit.go b/modules/git/repo_branch_nogogit.go
index 666ca81c1..b9a7a483f 100644
--- a/modules/git/repo_branch_nogogit.go
+++ b/modules/git/repo_branch_nogogit.go
@@ -68,6 +68,9 @@ func (repo *Repository) GetBranches(skip, limit int) ([]string, int, error) {
 
 // callShowRef return refs, if limit = 0 it will not limit
 func callShowRef(repoPath, prefix, arg string, skip, limit int) (branchNames []string, countAll int, err error) {
+	log.Debug("callShowRef %s %s %s %d %d", repoPath, prefix, arg, skip, limit)
+	defer log.Debug("done: callShowRef %s %s %s %d %d", repoPath, prefix, arg, skip, limit)
+
 	stdoutReader, stdoutWriter := io.Pipe()
 	defer func() {
 		_ = stdoutReader.Close()
diff --git a/modules/process/manager.go b/modules/process/manager.go
index e42e38a0f..7ea6a73e6 100644
--- a/modules/process/manager.go
+++ b/modules/process/manager.go
@@ -14,6 +14,8 @@ import (
 	"sort"
 	"sync"
 	"time"
+
+	"code.gitea.io/gitea/modules/log"
 )
 
 // TODO: This packages still uses a singleton for the Manager.
@@ -56,6 +58,7 @@ func GetManager() *Manager {
 
 // Add a process to the ProcessManager and returns its PID.
 func (pm *Manager) Add(description string, cancel context.CancelFunc) int64 {
+	log.Debug("Add(%s)", description)
 	pm.mutex.Lock()
 	pid := pm.counter + 1
 	pm.processes[pid] = &Process{
@@ -67,6 +70,7 @@ func (pm *Manager) Add(description string, cancel context.CancelFunc) int64 {
 	pm.counter = pid
 	pm.mutex.Unlock()
 
+	log.Debug("Done Add(%s) PID: %d", description, pid)
 	return pid
 }

That would help us to see if there is deadlock somewhere.

One final random thought is maybe the problem is in log/file.go - I guess it's worth another review.

@zeripath commented on GitHub (Jan 7, 2022): The logs appear somewhat confusing: ``` 2022/01/04 23:11:01 ...s/context/context.go:185:HTML() [D] Template: repo/commits 2022/01/04 23:11:01 models/repo.go:504:ComposeMetas() [I] [SQL] SELECT team.lower_name FROM "team_repo" INNER JOIN "team" ON team.id = team_repo.team_id WHERE (team_repo.repo_id = $1) ORDER BY team.lower_name [1] - 268.794µs 2022/01/04 23:11:02 ...s/context/context.go:755:1() [D] Session ID: <crawler> 2022/01/04 23:11:02 ...s/context/context.go:756:1() [D] CSRF Token: <crawler> 2022/01/04 23:11:02 ...rm.io/xorm/engine.go:1139:Get() [I] [SQL] SELECT "id", "lower_name", "name", "full_name", "email", "keep_email_private", "email_notifications_preference", "passwd", "passwd_hash_algo", "must_change_password", "login_type", "login_source", "login_name", "type", "location", "website", "rands", "salt", "language", "description", "created_unix", "updated_unix", "last_login_unix", "last_repo_visibility", "max_repo_creation", "is_active", "is_admin", "is_restricted", "allow_git_hook", "allow_import_local", "allow_create_organization", "prohibit_login", "avatar", "avatar_email", "use_custom_avatar", "num_followers", "num_following", "num_stars", "num_repos", "num_teams", "num_members", "visibility", "repo_admin_change_team_access", "diff_view_style", "theme", "keep_activity_private" FROM "user" WHERE "lower_name"=$1 LIMIT 1 [owner] - 463.192µs 2022/01/04 23:11:02 ...rm.io/xorm/engine.go:1139:Get() [I] [SQL] SELECT "id", "owner_id", "owner_name", "lower_name", "name", "description", "website", "original_service_type", "original_url", "default_branch", "num_watches", "num_stars", "num_forks", "num_issues", "num_closed_issues", "num_pulls", "num_closed_pulls", "num_milestones", "num_closed_milestones", "num_projects", "num_closed_projects", "is_private", "is_empty", "is_archived", "is_mirror", "status", "is_fork", "fork_id", "is_template", "template_id", "size", "is_fsck_enabled", "close_issues_via_commit_in_any_branch", "topics", "trust_model", "avatar", "created_unix", "updated_unix" FROM "repository" WHERE "owner_id"=$1 AND "lower_name"=$2 LIMIT 1 [2 repo] - 413.763µs 2022/01/04 23:11:02 models/repo_unit.go:210:getUnitsByRepoID() [I] [SQL] SELECT "id", "repo_id", "type", "config", "created_unix" FROM "repo_unit" WHERE (repo_id = $1) [1] - 183.034µs 2022/01/04 23:11:02 ...s/repo_pushmirror.go:97:GetPushMirrorsByRepoID() [I] [SQL] SELECT "id", "repo_id", "remote_name", "interval", "created_unix", "last_update", "last_error" FROM "push_mirror" WHERE (repo_id=$1) [1] - 162.754µs 2022/01/04 23:11:02 ...ules/context/repo.go:470:RepoAssignment() [I] [SQL] SELECT count(*) FROM "release" WHERE repo_id=$1 AND is_draft=$2 [1 false] - 194.067µs 2022/01/04 23:11:02 ...ules/context/repo.go:477:RepoAssignment() [I] [SQL] SELECT count(*) FROM "release" WHERE repo_id=$1 AND is_draft=$2 AND is_tag=$3 [1 false false] - 196.112µs 2022/01/04 23:11:02 ...dules/git/command.go:120:RunInDirTimeoutEnvFullPipelineFunc() [D] /var/db/gitea/gitea-repositories/owner/repo.git: /usr/local/bin/git -c credential.helper= -c protocol.version=2 rev-parse 2022/01/04 23:11:02 ...dules/git/command.go:120:RunInDirTimeoutEnvFullPipelineFunc() [D] /var/db/gitea/gitea-repositories/owner/repo.git: /usr/local/bin/git -c credential.helper= -c protocol.version=2 show-ref --tags 2022/01/04 23:11:02 ...dules/git/command.go:120:RunInDirTimeoutEnvFullPipelineFunc() [D] /var/db/gitea/gitea-repositories/owner/repo.git: /usr/local/bin/git -c credential.helper= -c protocol.version=2 cat-file --batch 2022/01/04 23:11:02 ...dules/git/command.go:120:RunInDirTimeoutEnvFullPipelineFunc() [D] /var/db/gitea/gitea-repositories/owner/repo.git: /usr/local/bin/git -c credential.helper= -c protocol.version=2 cat-file --batch-check ``` This implies that the problem has occurred somewhere in: https://github.com/go-gitea/gitea/blob/b25a571bc979d1a7483f488be7e342299837637b/modules/context/repo.go#L551-L558 But the problem is that there isn't really any place for a deadlock to occur in there. My thoughts for progressing this further are to apply the following: ```patch diff --git a/modules/git/command.go b/modules/git/command.go index d83c42fdc..273bea632 100644 --- a/modules/git/command.go +++ b/modules/git/command.go @@ -120,6 +120,8 @@ func (c *Command) RunInDirTimeoutEnvFullPipelineFunc(env []string, timeout time. log.Debug("%s: %v", dir, c) } + defer log.Debug("Done %s: %v", dir, c) + ctx, cancel := context.WithTimeout(c.parentContext, timeout) defer cancel() diff --git a/modules/git/repo_branch_nogogit.go b/modules/git/repo_branch_nogogit.go index 666ca81c1..b9a7a483f 100644 --- a/modules/git/repo_branch_nogogit.go +++ b/modules/git/repo_branch_nogogit.go @@ -68,6 +68,9 @@ func (repo *Repository) GetBranches(skip, limit int) ([]string, int, error) { // callShowRef return refs, if limit = 0 it will not limit func callShowRef(repoPath, prefix, arg string, skip, limit int) (branchNames []string, countAll int, err error) { + log.Debug("callShowRef %s %s %s %d %d", repoPath, prefix, arg, skip, limit) + defer log.Debug("done: callShowRef %s %s %s %d %d", repoPath, prefix, arg, skip, limit) + stdoutReader, stdoutWriter := io.Pipe() defer func() { _ = stdoutReader.Close() diff --git a/modules/process/manager.go b/modules/process/manager.go index e42e38a0f..7ea6a73e6 100644 --- a/modules/process/manager.go +++ b/modules/process/manager.go @@ -14,6 +14,8 @@ import ( "sort" "sync" "time" + + "code.gitea.io/gitea/modules/log" ) // TODO: This packages still uses a singleton for the Manager. @@ -56,6 +58,7 @@ func GetManager() *Manager { // Add a process to the ProcessManager and returns its PID. func (pm *Manager) Add(description string, cancel context.CancelFunc) int64 { + log.Debug("Add(%s)", description) pm.mutex.Lock() pid := pm.counter + 1 pm.processes[pid] = &Process{ @@ -67,6 +70,7 @@ func (pm *Manager) Add(description string, cancel context.CancelFunc) int64 { pm.counter = pid pm.mutex.Unlock() + log.Debug("Done Add(%s) PID: %d", description, pid) return pid } ``` That would help us to see if there is deadlock somewhere. One final random thought is maybe the problem is in log/file.go - I guess it's worth another review.

GiteaMirror commented

@zeripath commented on GitHub (Jan 15, 2022):

Have you seen any further freezes?

@zeripath commented on GitHub (Jan 15, 2022): Have you seen any further freezes?

GiteaMirror commented

@phryk commented on GitHub (Jan 16, 2022):

Yes and no. The bug seems to have triggered again, but this time the process actually managed to die.
Logs are on their way to you.

@phryk commented on GitHub (Jan 16, 2022): Yes and no. The bug seems to have triggered again, but this time the process actually managed to die. Logs are on their way to you.

GiteaMirror commented

6cb5069bf6/modules/git/command.go (L123-L155)

@zeripath commented on GitHub (Jan 16, 2022):

OK so somewhat relievingly the problem is not in process.GetManager().Add() and that the error has occurred in the same place suggests that logging is not to blame.

The problem lies somewhere in:

The only places left therefore are:

line 123: ctx, cancel := context.WithTimeout(c.parentContext, timeout) is this deadlocking somehow?
line 126: cmd := exec.CommandContext(ctx, c.name, c.args...) this requires stating the filesystem. Is the filesystem locking for some reason?
line 148: if err := cmd.Start(); err != nil { this involves starting the process itself so maybe there is some issue with bsd process creation here.

None of these are looking greatly soluble, I guess adding some logging to this section of code would be the only thing to do to move things closer to working out what the possible reason is.

diff --git a/modules/git/command.go b/modules/git/command.go
index d83c42fdc..71c398aae 100644
--- a/modules/git/command.go
+++ b/modules/git/command.go
@@ -120,9 +120,13 @@ func (c *Command) RunInDirTimeoutEnvFullPipelineFunc(env []string, timeout time.
 		log.Debug("%s: %v", dir, c)
 	}
 
+	defer log.Debug("Done %s: %v", dir, c)
+
 	ctx, cancel := context.WithTimeout(c.parentContext, timeout)
 	defer cancel()
 
+	log.Debug("%s: %v created context", dir, c)
+
 	cmd := exec.CommandContext(ctx, c.name, c.args...)
 	if env == nil {
 		cmd.Env = os.Environ()
@@ -130,6 +134,8 @@ func (c *Command) RunInDirTimeoutEnvFullPipelineFunc(env []string, timeout time.
 		cmd.Env = env
 	}
 
+	log.Debug("%s: %v created CommandContext", dir, c)
+
 	cmd.Env = append(
 		cmd.Env,
 		fmt.Sprintf("LC_ALL=%s", DefaultLocale),
@@ -149,6 +155,8 @@ func (c *Command) RunInDirTimeoutEnvFullPipelineFunc(env []string, timeout time.
 		return err
 	}
 
+	log.Debug("%s: %v started", dir, c)
+
 	desc := c.desc
 	if desc == "" {
 		desc = fmt.Sprintf("%s %s %s [repo_path: %s]", GitExecutable, c.name, strings.Join(c.args, " "), dir)
diff --git a/modules/git/repo_branch_nogogit.go b/modules/git/repo_branch_nogogit.go
index 666ca81c1..b9a7a483f 100644
--- a/modules/git/repo_branch_nogogit.go
+++ b/modules/git/repo_branch_nogogit.go
@@ -68,6 +68,9 @@ func (repo *Repository) GetBranches(skip, limit int) ([]string, int, error) {
 
 // callShowRef return refs, if limit = 0 it will not limit
 func callShowRef(repoPath, prefix, arg string, skip, limit int) (branchNames []string, countAll int, err error) {
+	log.Debug("callShowRef %s %s %s %d %d", repoPath, prefix, arg, skip, limit)
+	defer log.Debug("done: callShowRef %s %s %s %d %d", repoPath, prefix, arg, skip, limit)
+
 	stdoutReader, stdoutWriter := io.Pipe()
 	defer func() {
 		_ = stdoutReader.Close()
diff --git a/modules/process/manager.go b/modules/process/manager.go
index e42e38a0f..7ea6a73e6 100644
--- a/modules/process/manager.go
+++ b/modules/process/manager.go
@@ -14,6 +14,8 @@ import (
 	"sort"
 	"sync"
 	"time"
+
+	"code.gitea.io/gitea/modules/log"
 )
 
 // TODO: This packages still uses a singleton for the Manager.
@@ -56,6 +58,7 @@ func GetManager() *Manager {
 
 // Add a process to the ProcessManager and returns its PID.
 func (pm *Manager) Add(description string, cancel context.CancelFunc) int64 {
+	log.Debug("Add(%s)", description)
 	pm.mutex.Lock()
 	pid := pm.counter + 1
 	pm.processes[pid] = &Process{
@@ -67,6 +70,7 @@ func (pm *Manager) Add(description string, cancel context.CancelFunc) int64 {
 	pm.counter = pid
 	pm.mutex.Unlock()
 
+	log.Debug("Done Add(%s) PID: %d", description, pid)
 	return pid
 }

(remember github likes to pretend the final empty line doesn't exist so if you copy this add a terminal empty line.)

@zeripath commented on GitHub (Jan 16, 2022): OK so somewhat relievingly the problem is not in process.GetManager().Add() and that the error has occurred in the same place suggests that logging is not to blame. The problem lies somewhere in: https://github.com/go-gitea/gitea/blob/6cb5069bf6679e0def94aabb7c52f9fecd4b3009/modules/git/command.go#L123-L155 The only places left therefore are: 1. line 123: `ctx, cancel := context.WithTimeout(c.parentContext, timeout)` is this deadlocking somehow? 2. line 126: `cmd := exec.CommandContext(ctx, c.name, c.args...)` this requires stating the filesystem. Is the filesystem locking for some reason? 3. line 148:` if err := cmd.Start(); err != nil {` this involves starting the process itself so maybe there is some issue with bsd process creation here. None of these are looking greatly soluble, I guess adding some logging to this section of code would be the only thing to do to move things closer to working out what the possible reason is. --- ```patch diff --git a/modules/git/command.go b/modules/git/command.go index d83c42fdc..71c398aae 100644 --- a/modules/git/command.go +++ b/modules/git/command.go @@ -120,9 +120,13 @@ func (c *Command) RunInDirTimeoutEnvFullPipelineFunc(env []string, timeout time. log.Debug("%s: %v", dir, c) } + defer log.Debug("Done %s: %v", dir, c) + ctx, cancel := context.WithTimeout(c.parentContext, timeout) defer cancel() + log.Debug("%s: %v created context", dir, c) + cmd := exec.CommandContext(ctx, c.name, c.args...) if env == nil { cmd.Env = os.Environ() @@ -130,6 +134,8 @@ func (c *Command) RunInDirTimeoutEnvFullPipelineFunc(env []string, timeout time. cmd.Env = env } + log.Debug("%s: %v created CommandContext", dir, c) + cmd.Env = append( cmd.Env, fmt.Sprintf("LC_ALL=%s", DefaultLocale), @@ -149,6 +155,8 @@ func (c *Command) RunInDirTimeoutEnvFullPipelineFunc(env []string, timeout time. return err } + log.Debug("%s: %v started", dir, c) + desc := c.desc if desc == "" { desc = fmt.Sprintf("%s %s %s [repo_path: %s]", GitExecutable, c.name, strings.Join(c.args, " "), dir) diff --git a/modules/git/repo_branch_nogogit.go b/modules/git/repo_branch_nogogit.go index 666ca81c1..b9a7a483f 100644 --- a/modules/git/repo_branch_nogogit.go +++ b/modules/git/repo_branch_nogogit.go @@ -68,6 +68,9 @@ func (repo *Repository) GetBranches(skip, limit int) ([]string, int, error) { // callShowRef return refs, if limit = 0 it will not limit func callShowRef(repoPath, prefix, arg string, skip, limit int) (branchNames []string, countAll int, err error) { + log.Debug("callShowRef %s %s %s %d %d", repoPath, prefix, arg, skip, limit) + defer log.Debug("done: callShowRef %s %s %s %d %d", repoPath, prefix, arg, skip, limit) + stdoutReader, stdoutWriter := io.Pipe() defer func() { _ = stdoutReader.Close() diff --git a/modules/process/manager.go b/modules/process/manager.go index e42e38a0f..7ea6a73e6 100644 --- a/modules/process/manager.go +++ b/modules/process/manager.go @@ -14,6 +14,8 @@ import ( "sort" "sync" "time" + + "code.gitea.io/gitea/modules/log" ) // TODO: This packages still uses a singleton for the Manager. @@ -56,6 +58,7 @@ func GetManager() *Manager { // Add a process to the ProcessManager and returns its PID. func (pm *Manager) Add(description string, cancel context.CancelFunc) int64 { + log.Debug("Add(%s)", description) pm.mutex.Lock() pid := pm.counter + 1 pm.processes[pid] = &Process{ @@ -67,6 +70,7 @@ func (pm *Manager) Add(description string, cancel context.CancelFunc) int64 { pm.counter = pid pm.mutex.Unlock() + log.Debug("Done Add(%s) PID: %d", description, pid) return pid } ``` (remember github likes to pretend the final empty line doesn't exist so if you copy this add a terminal empty line.)

GiteaMirror commented

@phryk commented on GitHub (Jan 16, 2022):

Patched, deployed and will get back next time the bug strikes. Thanks for the assistance. :)

@phryk commented on GitHub (Jan 16, 2022): Patched, deployed and will get back next time the bug strikes. Thanks for the assistance. :)

GiteaMirror commented

@thearchivalone commented on GitHub (Jan 20, 2022):

@phryk thanks for posting this bug report. I'm having the same issue running Gitea in Bastille running postgres in the same container. I'm currently updating to the latest release but did notice that the hang happens the same way as you and have to fully restart the server when I need to update a setting or two (still getting Gitea moved over and configured properly from a Linux server).

Definitely seeing this issue with 1.15.10 on my end.

The easiest way to get it to trigger for me was to just spam service gitea restart until it freezes at Waiting for PIDS message. Sending this to the port maintainer.

@thearchivalone commented on GitHub (Jan 20, 2022): @phryk thanks for posting this bug report. I'm having the same issue running Gitea in Bastille running postgres in the same container. I'm currently updating to the latest release but did notice that the hang happens the same way as you and have to fully restart the server when I need to update a setting or two (still getting Gitea moved over and configured properly from a Linux server). Definitely seeing this issue with 1.15.10 on my end. The easiest way to get it to trigger for me was to just spam `service gitea restart` until it freezes at Waiting for PIDS message. Sending this to the port maintainer.

GiteaMirror commented

2025-11-02 08:02:09 -06:00

@zeripath commented on GitHub (Jan 20, 2022):

@bedwardly-down as I said above this isn't looking like a Gitea bug per se. The three points of possible deadlock are all deep in go std library code and likely at system calls.

My greatest suspicion is unfortunately falling at the context.WithTimeout call. If it's there then that's a serious problem and working around it will not be easy (although we could simply drop the WithTimeout assuming WithCancel is unaffected.)

@zeripath commented on GitHub (Jan 20, 2022): @bedwardly-down as I said above this isn't looking like a Gitea bug per se. The three points of possible deadlock are all deep in go std library code and likely at system calls. My greatest suspicion is unfortunately falling at the context.WithTimeout call. If it's there then that's a serious problem and working around it will not be easy (although we could simply drop the WithTimeout assuming WithCancel is unaffected.)

GiteaMirror commented

@thearchivalone commented on GitHub (Jan 20, 2022):

@zeripath thanks for responding. I'm waiting on the FreeBSD port maintainer to get back to me on it but go apps having this kind of issue is not uncommon on FreeBSD, according to other maintainers I've been interacting with in the IRC and official Discord channels. I ran a test yesterday that kind of goes along with your assumptions too.

I'm primarily involved with the Nodejs ecosystem right now, so I use the PM2 process manager and Nodemon pretty regularly. Running Gitea through PM2 and having that strapped to FreeBSD's init system showed that the port was attempting to start and stop at the exact same time in various places causing it to deadlock. I had a similar occurrence with Caddy and a few other go apps on my old Gentoo server (it uses a modified version of FreeBSD's init system with most of the core functionality being exactly the same between the two). Running Gitea and Caddy outside of init but instead directly seemed to work fine in several of my tests.

@thearchivalone commented on GitHub (Jan 20, 2022): @zeripath thanks for responding. I'm waiting on the FreeBSD port maintainer to get back to me on it but go apps having this kind of issue is not uncommon on FreeBSD, according to other maintainers I've been interacting with in the IRC and official Discord channels. I ran a test yesterday that kind of goes along with your assumptions too. I'm primarily involved with the Nodejs ecosystem right now, so I use the PM2 process manager and Nodemon pretty regularly. Running Gitea through PM2 and having that strapped to FreeBSD's init system showed that the port was attempting to start and stop at the exact same time in various places causing it to deadlock. I had a similar occurrence with Caddy and a few other go apps on my old Gentoo server (it uses a modified version of FreeBSD's init system with most of the core functionality being exactly the same between the two). Running Gitea and Caddy outside of init but instead directly seemed to work fine in several of my tests.

GiteaMirror commented

2025-11-02 08:02:09 -06:00

@phryk commented on GitHub (Jan 20, 2022):

Bug hit again. Logs will shortly be on the way to @zeripath, last 3 lines from gitea.log look like this:

2022/01/20 03:24:11 ...dules/git/command.go:120:RunInDirTimeoutEnvFullPipelineFunc() [D] /var/db/gitea/gitea-repositories/mirror/foo.git: /usr/local/bin/git -c credential.helper= -c protocol.version=2 rev-list --count <somehash>
2022/01/20 03:24:11 ...dules/git/command.go:128:RunInDirTimeoutEnvFullPipelineFunc() [D] /var/db/gitea/gitea-repositories/mirror/foo.git: /usr/local/bin/git -c credential.helper= -c protocol.version=2 rev-list --count <somehash> created context
2022/01/20 03:24:11 ...dules/git/command.go:137:RunInDirTimeoutEnvFullPipelineFunc() [D] /var/db/gitea/gitea-repositories/mirror/foo.git: /usr/local/bin/git -c credential.helper= -c protocol.version=2 rev-list --count <somehash> created CommandContext

@phryk commented on GitHub (Jan 20, 2022): Bug hit again. Logs will shortly be on the way to @zeripath, last 3 lines from `gitea.log` look like this: ``` 2022/01/20 03:24:11 ...dules/git/command.go:120:RunInDirTimeoutEnvFullPipelineFunc() [D] /var/db/gitea/gitea-repositories/mirror/foo.git: /usr/local/bin/git -c credential.helper= -c protocol.version=2 rev-list --count <somehash> 2022/01/20 03:24:11 ...dules/git/command.go:128:RunInDirTimeoutEnvFullPipelineFunc() [D] /var/db/gitea/gitea-repositories/mirror/foo.git: /usr/local/bin/git -c credential.helper= -c protocol.version=2 rev-list --count <somehash> created context 2022/01/20 03:24:11 ...dules/git/command.go:137:RunInDirTimeoutEnvFullPipelineFunc() [D] /var/db/gitea/gitea-repositories/mirror/foo.git: /usr/local/bin/git -c credential.helper= -c protocol.version=2 rev-list --count <somehash> created CommandContext ```

GiteaMirror commented

2025-11-02 08:02:09 -06:00

@zeripath commented on GitHub (Jan 20, 2022):

OK well we've established that the problem is in starting the process:

6cb5069bf6/modules/git/command.go (L148)

I think this is likely to be an os/jail problem - @bedwardly-down 's comment suggests that perhaps the issue might some deadlock in PM2 with processes being created at exactly the same time. If so, there's nothing we as gitea can do.

You could try to use the gogit variant - as this will create a lot fewer calls to git - which might reduce the issue?

@zeripath commented on GitHub (Jan 20, 2022): OK well we've established that the problem is in starting the process: https://github.com/go-gitea/gitea/blob/6cb5069bf6679e0def94aabb7c52f9fecd4b3009/modules/git/command.go#L148 I think this is likely to be an os/jail problem - @bedwardly-down 's comment suggests that perhaps the issue might some deadlock in PM2 with processes being created at exactly the same time. If so, there's nothing we as gitea can do. You could try to use the gogit variant - as this will create a lot fewer calls to git - which might reduce the issue?

GiteaMirror commented

@thearchivalone commented on GitHub (Jan 22, 2022):

OK well we've established that the problem is in starting the process:

6cb5069bf6/modules/git/command.go (L148)

I think this is likely to be an os/jail problem - @bedwardly-down 's comment suggests that perhaps the issue might some deadlock in PM2 with processes being created at exactly the same time. If so, there's nothing we as gitea can do.

You could try to use the gogit variant - as this will create a lot fewer calls to git - which might reduce the issue?

To further clarify, PM2 was not how I normally ran Gitea. It was a test to see what’s actually happening behind the scenes with a tool that has a built in monitor that prints some basic but useful information. It also doesn’t seem to have any issues with Sqlite3.

@thearchivalone commented on GitHub (Jan 22, 2022): > OK well we've established that the problem is in starting the process: > > https://github.com/go-gitea/gitea/blob/6cb5069bf6679e0def94aabb7c52f9fecd4b3009/modules/git/command.go#L148 > > I think this is likely to be an os/jail problem - @bedwardly-down 's comment suggests that perhaps the issue might some deadlock in PM2 with processes being created at exactly the same time. If so, there's nothing we as gitea can do. > > You could try to use the gogit variant - as this will create a lot fewer calls to git - which might reduce the issue? To further clarify, PM2 was not how I normally ran Gitea. It was a test to see what’s actually happening behind the scenes with a tool that has a built in monitor that prints some basic but useful information. It also doesn’t seem to have any issues with Sqlite3.

GiteaMirror commented

@thearchivalone commented on GitHub (Jan 23, 2022):

@phryk how dependent are you on getting your git up and running? A large chunk of my daily needs are built around git and version control, so this definitely was an inconvenience for me. I hope it gets fixed pretty quickly upstream.

@thearchivalone commented on GitHub (Jan 23, 2022): @phryk how dependent are you on getting your git up and running? A large chunk of my daily needs are built around git and version control, so this definitely was an inconvenience for me. I hope it gets fixed pretty quickly upstream.

GiteaMirror commented

@lunny commented on GitHub (Jan 23, 2022):

Have you catched the issue when enable pprof? If that, could you upload pprof report?

@lunny commented on GitHub (Jan 23, 2022): Have you catched the issue when enable pprof? If that, could you upload pprof report?

GiteaMirror commented

@thearchivalone commented on GitHub (Jan 23, 2022):

I haven’t tried or heard of that. I’ll have to tinker with that later.

@thearchivalone commented on GitHub (Jan 23, 2022): I haven’t tried or heard of that. I’ll have to tinker with that later.

GiteaMirror commented

@phryk commented on GitHub (Jan 23, 2022):

@łunny I've tried that before, but with the process frozen, pprof doesn't answer anymore either.
@zeripath recommended periodically polling pprof and saving the last couple results. I might get to that in a couple days.

Would be easier, if the info collected by pprof would also go into the openmetrics output as that's already being polled. :P

@bedwardly-down I'm running a private gitea instance which I use for all my projects, so it is a bit of an inconvenience for me personally, but I don't have anyone else depending on the service.

@phryk commented on GitHub (Jan 23, 2022): @łunny I've tried that before, but with the process frozen, pprof doesn't answer anymore either. @zeripath recommended periodically polling pprof and saving the last couple results. I might get to that in a couple days. Would be easier, if the info collected by pprof would also go into the openmetrics output as that's already being polled. :P @bedwardly-down I'm running a private gitea instance which I use for all my projects, so it is a bit of an inconvenience for me personally, but I don't have anyone else depending on the service.

GiteaMirror commented

@zeripath commented on GitHub (Jan 23, 2022):

I'm really not sure pprof if going to help much. The problem appears to be in os/exec/exec.go within go's std library itself.

`os/exec/exec.go:(*cmd).Start()`

Walking through the code of Start in there points to:

	c.Process, err = os.StartProcess(c.Path, c.argv(), &os.ProcAttr{
		Dir:   c.Dir,
		Files: c.childFiles,
		Env:   addCriticalEnv(dedupEnv(envv)),
		Sys:   c.SysProcAttr,
	})

As the place where the problem is.

`syscall/exec_unix.go:os.StartProcess(...)`

Which on bsd, unix and linux all call forkExec in syscall/exec_unix.go. A cursory glance of this code shows:

...
	// Acquire the fork lock so that no other threads
	// create new fds that are not yet close-on-exec
	// before we fork.
	ForkLock.Lock()

	// Allocate child status pipe close on exec.
	if err = forkExecPipe(p[:]); err != nil {
		ForkLock.Unlock()
		return 0, err
	}

	// Kick off child.
	pid, err1 = forkAndExecInChild(argv0p, argvp, envvp, chroot, dir, attr, sys, p[1])
	if err1 != 0 {
		Close(p[0])
		Close(p[1])
		ForkLock.Unlock()
		return 0, Errno(err1)
	}
	ForkLock.Unlock()
...

Now if there is a panic in there the ForkLock could end up being left locked but the panic should be seen in our logs and I see no evidence of this.

Which leads me to think that either forkExecPipe or forkAndExecInChild are blocking.

`forkExecPipe`

Now forkExecPipe in linux is somewhat more complex than that on bsd:

BSD (syscall/forkpipe2.go):

func forkExecPipe(p []int) error {
	return Pipe2(p, O_CLOEXEC)
}

Linux (exec_linux,go):

// Try to open a pipe with O_CLOEXEC set on both file descriptors.
func forkExecPipe(p []int) (err error) {
	err = Pipe2(p, O_CLOEXEC)
	// pipe2 was added in 2.6.27 and our minimum requirement is 2.6.23, so it
	// might not be implemented.
	if err == ENOSYS {
		if err = Pipe(p); err != nil {
			return
		}
		if _, err = fcntl(p[0], F_SETFD, FD_CLOEXEC); err != nil {
			return
		}
		_, err = fcntl(p[1], F_SETFD, FD_CLOEXEC)
	}
	return
}

But the complexity here is simply falling back. So if there is a deadlock here it's in the syscall pipe2. https://www.freebsd.org/cgi/man.cgi?query=pipe2&sektion=2&format=html

The man page does not indicate that this could block but dtrace and ktrace will capture these calls.

`forkAndExecInChild`

The implementations between linux and bsd are substantially different here and it's getting into the deep systems programming level of starting processes that I'm afraid I know little to nothing about. The file is syscall/exec_bsd.go

	// About to call fork.
	// No more allocation or calls of non-assembly functions.
	runtime_BeforeFork()
	r1, _, err1 = RawSyscall(SYS_FORK, 0, 0, 0)
	if err1 != 0 {
		runtime_AfterFork()
		return 0, err1
	}

	if r1 != 0 {
		// parent; return PID
		runtime_AfterFork()
		return int(r1), 0
	}

Fundamentally there is an assembly call into runtime_BeforeFork which calls systemstack(beforeFork) which tells the system stack to block signals. (Could the blocking of signals be causing a problem? Does your jail send up a signal if Fork is blocked?)

Then https://www.freebsd.org/cgi/man.cgi?fork(2), and we check the pid returned - and if we're the parent or an error occurs returns the pid or error after running runtime_AfterFork() which calls systemstack(afterFork) reversing the changes of fork.

Remaining part of `forkExec` after `ForkLock.Unlock()`

	// Read child error status from pipe.
	Close(p[1])
	for {
		n, err = readlen(p[0], (*byte)(unsafe.Pointer(&err1)), int(unsafe.Sizeof(err1)))
		if err != EINTR {
			break
		}
	}
	Close(p[0])
	if err != nil || n != 0 {
		if n == int(unsafe.Sizeof(err1)) {
			err = Errno(err1)
		}
		if err == nil {
			err = EPIPE
		}

		// Child failed; wait for it to exit, to make sure
		// the zombies don't accumulate.
		_, err1 := Wait4(pid, &wstatus, 0, nil)
		for err1 == EINTR {
			_, err1 = Wait4(pid, &wstatus, 0, nil)
		}
		return 0, err
	}

	// Read got EOF, so pipe closed on exec, so exec succeeded.
	return pid, nil

I guess this could block - but I don't understand why it would take down the whole system.

Summary

The blocking has been traced to os/exec/exec.go:(*cmd).Start()
It's likely to be in syscall/exec_unix.go:os.StartProcess(...) and hence forkExec in syscall/exec_unix.go.
The reports of the entire go process halting make me suspicious about something happening around forkAndExecInChild in syscall/exec_bsd.go.
In particular around the call to fork(2) either at that call or at the systemstack calls
This should be dtraceable or ktraceable.
Alternatively something happening around pipe(2) in forkAndExecInChild could be blocking but again this should be dtraceable/ktraceable.
After the ForkLock has unlocked there is some code around reading the error from the child pipe but this shouldn't kill the whole go process.

So we're likely looking at a bug in go's runtime either due to some weird resource limit handling of the jail and the way it reports issues to go, or even more difficult to fix some bug in the OS about creating processes after some limit has been reached.

Maybe this https://github.com/golang/go/issues/43873 is related?

You've never told us the version of FreeBSD you're running or given us any information about how you have set up the jail. Could the parameters of your jail responsible?

@zeripath commented on GitHub (Jan 23, 2022): I'm really not sure pprof if going to help much. The problem appears to be in `os/exec/exec.go` within go's std library itself. ## `os/exec/exec.go:(*cmd).Start()` Walking through the code of Start in there points to: ```go c.Process, err = os.StartProcess(c.Path, c.argv(), &os.ProcAttr{ Dir: c.Dir, Files: c.childFiles, Env: addCriticalEnv(dedupEnv(envv)), Sys: c.SysProcAttr, }) ``` As the place where the problem is. ## `syscall/exec_unix.go:os.StartProcess(...)` Which on bsd, unix and linux all call `forkExec` in syscall/exec_unix.go. A cursory glance of this code shows: ```go ... // Acquire the fork lock so that no other threads // create new fds that are not yet close-on-exec // before we fork. ForkLock.Lock() // Allocate child status pipe close on exec. if err = forkExecPipe(p[:]); err != nil { ForkLock.Unlock() return 0, err } // Kick off child. pid, err1 = forkAndExecInChild(argv0p, argvp, envvp, chroot, dir, attr, sys, p[1]) if err1 != 0 { Close(p[0]) Close(p[1]) ForkLock.Unlock() return 0, Errno(err1) } ForkLock.Unlock() ... ``` Now if there is a panic in there the `ForkLock` could end up being left locked but the panic **should** be seen in our logs and I see no evidence of this. Which leads me to think that either `forkExecPipe` or `forkAndExecInChild` are blocking. ## `forkExecPipe` Now `forkExecPipe` in linux is somewhat more complex than that on bsd: BSD (`syscall/forkpipe2.go`): ```go func forkExecPipe(p []int) error { return Pipe2(p, O_CLOEXEC) } ``` Linux (`exec_linux,go`): ```go // Try to open a pipe with O_CLOEXEC set on both file descriptors. func forkExecPipe(p []int) (err error) { err = Pipe2(p, O_CLOEXEC) // pipe2 was added in 2.6.27 and our minimum requirement is 2.6.23, so it // might not be implemented. if err == ENOSYS { if err = Pipe(p); err != nil { return } if _, err = fcntl(p[0], F_SETFD, FD_CLOEXEC); err != nil { return } _, err = fcntl(p[1], F_SETFD, FD_CLOEXEC) } return } ``` But the complexity here is simply falling back. So if there is a deadlock here it's in the syscall pipe2. <https://www.freebsd.org/cgi/man.cgi?query=pipe2&sektion=2&format=html> The man page does not indicate that this could block but dtrace and ktrace will capture these calls. ## ``forkAndExecInChild`` The implementations between linux and bsd are substantially different here and it's getting into the deep systems programming level of starting processes that I'm afraid I know little to nothing about. The file is `syscall/exec_bsd.go` ``` // About to call fork. // No more allocation or calls of non-assembly functions. runtime_BeforeFork() r1, _, err1 = RawSyscall(SYS_FORK, 0, 0, 0) if err1 != 0 { runtime_AfterFork() return 0, err1 } if r1 != 0 { // parent; return PID runtime_AfterFork() return int(r1), 0 } ``` Fundamentally there is an assembly call into `runtime_BeforeFork` which calls `systemstack(beforeFork)` which tells the system stack to block signals. (Could the blocking of signals be causing a problem? Does your jail send up a signal if Fork is blocked?) Then <https://www.freebsd.org/cgi/man.cgi?fork(2)>, and we check the pid returned - and if we're the parent or an error occurs returns the pid or error after running `runtime_AfterFork()` which calls `systemstack(afterFork)` reversing the changes of fork. ## Remaining part of `forkExec` after `ForkLock.Unlock()` ```go // Read child error status from pipe. Close(p[1]) for { n, err = readlen(p[0], (*byte)(unsafe.Pointer(&err1)), int(unsafe.Sizeof(err1))) if err != EINTR { break } } Close(p[0]) if err != nil || n != 0 { if n == int(unsafe.Sizeof(err1)) { err = Errno(err1) } if err == nil { err = EPIPE } // Child failed; wait for it to exit, to make sure // the zombies don't accumulate. _, err1 := Wait4(pid, &wstatus, 0, nil) for err1 == EINTR { _, err1 = Wait4(pid, &wstatus, 0, nil) } return 0, err } // Read got EOF, so pipe closed on exec, so exec succeeded. return pid, nil ``` I guess this could block - but I don't understand why it would take down the whole system. ## Summary * The blocking has been traced to `os/exec/exec.go:(*cmd).Start()` * It's likely to be in `syscall/exec_unix.go:os.StartProcess(...)` and hence `forkExec` in `syscall/exec_unix.go`. * The reports of the entire go process halting make me suspicious about something happening around `forkAndExecInChild` in `syscall/exec_bsd.go`. * In particular around the call to `fork(2)` either at that call or at the `systemstack` calls * This should be dtraceable or ktraceable. * Alternatively something happening around `pipe(2)` in `forkAndExecInChild` could be blocking but again this should be dtraceable/ktraceable. * After the `ForkLock` has unlocked there is some code around reading the error from the child pipe but this shouldn't kill the whole go process. So we're likely looking at a bug in go's runtime either due to some weird resource limit handling of the jail and the way it reports issues to go, or even more difficult to fix some bug in the OS about creating processes after some limit has been reached. Maybe this https://github.com/golang/go/issues/43873 is related? You've never told us the version of FreeBSD you're running or given us any information about how you have set up the jail. Could the parameters of your jail responsible?

GiteaMirror commented

@thearchivalone commented on GitHub (Jan 23, 2022):

@zeripath Thanks for looking into this as much as you have. For mine:

FreeBSD 13.0 p6.0 (latest stable patches and kernel)
Jail - BastilleBSD latest 0.9 patches
Postgres 13.5
Intel Xeon 3cpu 2.6GHz per core
4 GB DDR4
75GB NVMe SSD

@phryk I just had a thought: are you running postgres in the same jail as Gitea instead of separately and interacting with it through a port? To get it to work, it needed some modifications to its main config file to allow it to use more RAM and other system resources than most BSD jails initially allocate. I haven’t tried postgres in a separate jail yet since many online sources recommended encapsulating DB and its software in a single unit to allow easier transfer and backup. I wonder if the deadlock is caused because of that.

@thearchivalone commented on GitHub (Jan 23, 2022): @zeripath Thanks for looking into this as much as you have. For mine: - FreeBSD 13.0 p6.0 (latest stable patches and kernel) - Jail - BastilleBSD latest 0.9 patches - Postgres 13.5 - Intel Xeon 3cpu 2.6GHz per core - 4 GB DDR4 - 75GB NVMe SSD @phryk I just had a thought: are you running postgres in the same jail as Gitea instead of separately and interacting with it through a port? To get it to work, it needed some modifications to its main config file to allow it to use more RAM and other system resources than most BSD jails initially allocate. I haven’t tried postgres in a separate jail yet since many online sources recommended encapsulating DB and its software in a single unit to allow easier transfer and backup. I wonder if the deadlock is caused because of that.

GiteaMirror commented

@thearchivalone commented on GitHub (Jan 23, 2022):

The issue wasn’t solved by moving postgres to a separate container for me. Even running postgres through a port and connecting that way caused gitea to hang. Removing the app.ini and letting it try to generate a new one also caused it to hang even with the /usr/local/etc/gitea/conf directory permissions set to 755 or 777. So far, it looks like SQLite is the definitive way to run it right now.

Can I get anyone else to try it with Sqlite? So far, that works in my tests but isn’t ideal if you have big repositories

@thearchivalone commented on GitHub (Jan 23, 2022): The issue wasn’t solved by moving postgres to a separate container for me. Even running postgres through a port and connecting that way caused gitea to hang. Removing the app.ini and letting it try to generate a new one also caused it to hang even with the /usr/local/etc/gitea/conf directory permissions set to 755 or 777. So far, it looks like SQLite is the definitive way to run it right now. Can I get anyone else to try it with Sqlite? So far, that works in my tests but isn’t ideal if you have big repositories

GiteaMirror commented

@zeripath commented on GitHub (Jan 23, 2022):

@bedwardly-down are you sure that you have the same problem as @phryk ? It would useful to double check to see where your hangs are happening. I cannot see why SQLite would be better for a problem relating to forking.

If you're finding that SQLite is better then could you try connecting to postgres over a unix port instead tcp port as it might be that you're suffering tcp port-exhaustion instead.

Thinking again to this blocking problem relating to fork I wonder if the block is happening is because a page fault is occurring in fork and the signal cannot be handled?

@phyrk what version of go are you building Gitea with? Please ensure that go is the most recent version. It might help to allow.mlock in the jail.

@zeripath commented on GitHub (Jan 23, 2022): @bedwardly-down are you sure that you have the same problem as @phryk ? It would useful to double check to see where your hangs are happening. I cannot see why SQLite would be better for a problem relating to forking. If you're finding that SQLite is better then could you try connecting to postgres over a unix port instead tcp port as it might be that you're suffering tcp port-exhaustion instead. --- Thinking again to this blocking problem relating to fork I wonder if the block is happening is because a page fault is occurring in fork and the signal cannot be handled? @phyrk what version of go are you building Gitea with? Please ensure that go is the most recent version. It might help to `allow.mlock` in the jail.

GiteaMirror commented

@thearchivalone commented on GitHub (Jan 23, 2022):

@zeripath honestly, it may be different. There’s not enough info from the original poster to get a clear picture and the only common thread we have is postgres and I’m grasping straws using my own limited knowledge of how gitea and its database support works.

@thearchivalone commented on GitHub (Jan 23, 2022): @zeripath honestly, it may be different. There’s not enough info from the original poster to get a clear picture and the only common thread we have is postgres and I’m grasping straws using my own limited knowledge of how gitea and its database support works.

GiteaMirror commented

@zeripath commented on GitHub (Jan 23, 2022):

@bedwardly-down could you apply the patch in https://github.com/go-gitea/gitea/issues/18180#issuecomment-1013916215 and check your logs for when then deadlock occurs to see if it's at the same place.

If you find that the final log line is the same as in @phryk you're hitting the same problem. If not - well then we get to find another bug.

If switching to sqlite helps then that means port exhaustion is more likely. Similarly if using a unix port for postgres helps then it's far more likely to be a port exhaustion problem.

@zeripath commented on GitHub (Jan 23, 2022): @bedwardly-down could you apply the patch in https://github.com/go-gitea/gitea/issues/18180#issuecomment-1013916215 and check your logs for when then deadlock occurs to see if it's at the same place. If you find that the final log line is the same as in @phryk you're hitting the same problem. If not - well then we get to find another bug. If switching to sqlite helps then that means port exhaustion is more likely. Similarly if using a unix port for postgres helps then it's far more likely to be a port exhaustion problem.

GiteaMirror commented

@thearchivalone commented on GitHub (Jan 23, 2022):

Didn’t apply the patch. Running it straight through the unix socket inside the same container finally solved my issue. Looking at the system logs, there was no indication that anything was happening prior running through a port from either a separate or local port that my permissions were wrong within Postgres. I fixed them and can now run Gitea with no issues. So, I’ll have to agree that mine was probably port exhaustion as you suggested.

@thearchivalone commented on GitHub (Jan 23, 2022): Didn’t apply the patch. Running it straight through the unix socket inside the same container finally solved my issue. Looking at the system logs, there was no indication that anything was happening prior running through a port from either a separate or local port that my permissions were wrong within Postgres. I fixed them and can now run Gitea with no issues. So, I’ll have to agree that mine was probably port exhaustion as you suggested.

GiteaMirror commented