Queues not terminating gracefully #10315

Closed
opened 2025-11-02 09:04:02 -06:00 by GiteaMirror · 19 comments
Owner

Originally created by @brechtvl on GitHub (Feb 21, 2023).

Description

We are running into an issue where, if a Gitea restart happens during merge conflict checking, PRs remain stuck in the conflict state forever. I think there are two distinct issues here: one is that these don't get unstuck on new pushes, and perhaps that's best left for another report.

However, the reason things get into this state in the first place seems to be a problem in shutdown: all the queues are terminated immediately instead of respecting the hammer time.

When starting `./gitea web` and then running `killall gitea`, I see the following in the log:

2023/02/21 19:43:26 ...eful/manager_unix.go:149:handleSignals() [W] [63f510cd-4] PID 1500309. Received SIGTERM. Shutting down...
2023/02/21 19:43:26 cmd/web.go:271:listen() [I] [63f510cd-48] HTTP Listener: 0.0.0.0:3000 Closed
2023/02/21 19:43:26 ...eue/queue_channel.go:127:func1() [W] ChannelQueue: task-channel Terminated before completed flushing
2023/02/21 19:43:26 .../graceful/manager.go:205:doHammerTime() [W] Setting Hammer condition
2023/02/21 19:43:26 ...eue/queue_channel.go:127:func1() [W] ChannelQueue: notification-service-channel Terminated before completed flushing
2023/02/21 19:43:26 ...eue/queue_channel.go:127:func1() [W] ChannelQueue: push_update-channel Terminated before completed flushing
2023/02/21 19:43:26 ...eful/server_hooks.go:46:doShutdown() [I] [63f510cd-48] PID: 1500309 Listener ([::]:3000) closed.
2023/02/21 19:43:27 .../graceful/manager.go:224:doTerminate() [W] Terminating
2023/02/21 19:43:27 ...er/issues/indexer.go:201:2() [I] PID: 1500309 Issue Indexer closed
2023/02/21 19:43:27 ...eful/manager_unix.go:157:handleSignals() [W] PID: 1500309. Background context for manager closed - context canceled - Shutting down...
2023/02/21 19:43:27 cmd/web.go:183:runWeb() [I] PID: 1500309 Gitea Web Finished

This log is from my local test instance with configuration set to defaults as much as possible.

On our production instance, that `Terminated before completed flushing` is leading to a lot of different errors, as workers get terminated in the middle of what they're doing.

I can provide more detailed logs if needed, but this may be easy to reproduce on any instance.

Gitea Version

main (43405c3)

Can you reproduce the bug on the Gitea demo site?

No

Log Gist

No response

Screenshots

No response

Git Version

No response

Operating System

No response

How are you running Gitea?

Own build from main.

Database

None

GiteaMirror added the type/bug label 2025-11-02 09:04:02 -06:00

@techknowlogick commented on GitHub (Feb 21, 2023):

For debugging purposes: `[W] [63f40381] cannot get cache context when getting data: &{context.Background.WithCancel.WithCancel.WithValue(type pprof.labelContextKey, val {"graceful-lifecycle":"with-hammer"})` was also found in the logs, which traces back to the following function:
https://sourcegraph.com/github.com/go-gitea/gitea/-/blob/modules/cache/context.go?L58

Edit: this is also showing up on gitea.com, although not during shutdowns.


@brechtvl commented on GitHub (Feb 21, 2023):

I'm also not seeing "graceful-lifecycle":"with-hammer" messages when simply starting and stopping gitea. But when I for example do a push that causes conflict checking workers to be run for PRs, then they start showing up.

So I guess there is some problem with cache context in the workers?


@zeripath commented on GitHub (Feb 21, 2023):

If I had to guess this is going to be a misreport and there's going to be a non-deterministic hierarchy of cancelled contexts - the wrong one of which is being picked and reported. I'd need to spend a good few hours looking through the code thinking about the concurrency again which I won't have time to look at until the weekend.


@lunny commented on GitHub (Feb 21, 2023):

The context cache should currently only be used at the HTTP request level. Do you have a Redis cache server?


@techknowlogick commented on GitHub (Feb 21, 2023):

@lunny they are using memcached for cache


@lunny commented on GitHub (Feb 21, 2023):

> @lunny they are using memcached for cache

This may occur when Gitea starts up.


@brechtvl commented on GitHub (Feb 21, 2023):

I see these Terminated before completed flushing warning messages also in v1.16.9 (had some problems testing earlier). So I'm not sure if this is a problem that has existed for a while, or if these messages are not as serious as they sound and the problem is elsewhere.

Here is the part of the log from the production server where it got the shutdown signal and then ran into errors, if it helps:
https://gist.github.com/brechtvl/e4778293c1c919a4ec896a31e418677c


@techknowlogick commented on GitHub (Feb 21, 2023):

@lunny the same ctx error can be seen in Gitea.com logs and occurs at times other than startup and shutdown


@lunny commented on GitHub (Feb 22, 2023):

I have sent PR #23054, but I don't think it will actually fix this problem. It only removes the unnecessary warning logs.


@lunny commented on GitHub (Feb 22, 2023):

> I see these Terminated before completed flushing warning messages also in v1.16.9 (had some problems testing earlier). So I'm not sure if this is a problem that has existed for a while, or if these messages are not as serious as they sound and the problem is elsewhere.
>
> Here is the part of the log from the production server where it got the shutdown signal and then ran into errors, if it helps: https://gist.github.com/brechtvl/e4778293c1c919a4ec896a31e418677c

From the logs, this looks like a normal termination process.


@brechtvl commented on GitHub (Feb 22, 2023):

Everything seems to shut down within 1s, while GRACEFUL_HAMMER_TIME was left at the default 60s. I was expecting it to wait up to 60 seconds for any ongoing work to complete. Or is that only meant for requests, not queues?

If it's not meant for queues, is the mechanism that work gets added back to the queue on such errors, to be redone after restart? It wasn't doing that as far as I could tell, but I may well have missed something.


@lunny commented on GitHub (Feb 22, 2023):

I think the logic should be: if all workers shut down in less than 60s, the program exits immediately. If some workers time out while shutting down (exceeding 60s), then the hammer shutdown is executed and the program exits at that point.


@brechtvl commented on GitHub (Feb 22, 2023):

I found more details (and a very hacky workaround) for pull requests getting stuck in conflict checking.

  • After shutdown in the middle of pr_patch_checker work, a persistent queue of remaining PRs to be checked is saved to disk.
  • On restart these are loaded into the persistent queue of prPatchCheckerQueue, but this seemingly does not cause any work to start.
  • Also on restart, InitializePullRequests will put any PRs in need of checking in the channel queue of prPatchCheckerQueue and start working on them.
  • However checkAndUpdateStatus never updates the PR status because prPatchCheckerQueue.Has returns true, due to this same PR still being in the persistent queue of prPatchCheckerQueue.

The persistent queue in this case seems unnecessary as InitializePullRequests can already restart conflict checking for any PRs that need it.


@brechtvl commented on GitHub (Feb 22, 2023):

The steps to reproduce this are:

  • Run Gitea with default configuration
  • Migrate Blender repository from http://github.com/blender/blender
  • Create about 10 pull requests (e.g. by editing a single file in the UI and creating a PR)
  • Edit another file through the UI and commit to main directly
  • killall gitea quickly while conflict checking is going on
  • After restart, some pull requests will remain stuck in conflict checking no matter if you do more commits on main or the PR head branch

@wxiaoguang commented on GitHub (Feb 23, 2023):

This is a test for this problem

The results are quite surprising.

TestPullRequest_QueueStuck completed1=1, executed2=0, has2=20, executed3=100, has3=19
TestPullRequest_QueueStuck completed1=4, executed2=0, has2=20, executed3=100, has3=19
TestPullRequest_QueueStuck completed1=4, executed2=0, has2=20, executed3=101, has3=19
func TestPullRequest_QueueStuck(t *testing.T) {
	setting_module.AppWorkPath = "/tmp"
	_ = util.RemoveAll("/tmp/data")
	unittest.PrepareTestEnv(t)
	setting_module.InitProviderAndLoadCommonSettingsForTest()
	setting_module.LoadQueueSettings()

	q1 := func() (completedTasks []string) {
		startWhen100Ready := make(chan struct{}) // only start consuming data when all 100 tasks are pushed into the queue
		stopAt20Shutdown := make(chan struct{})  // stop and shutdown at the 20th item

		testHandler := func(data ...queue.Data) []queue.Data {
			<-startWhen100Ready
			time.Sleep(100 * time.Millisecond)
			for _, datum := range data {
				s := datum.(string)
				completedTasks = append(completedTasks, s)
				if s == "task-20" {
					close(stopAt20Shutdown)
					return nil
				}
			}
			return nil
		}

		q := queue.CreateUniqueQueue("pr_patch_checker_test", testHandler, "")
		q.Run(func(atShutdown func()) { go func() { <-stopAt20Shutdown; atShutdown() }() }, func(atTerminate func()) {})

		// add 100 tasks to the queue
		for i := 0; i < 100; i++ {
			_ = q.Push("task-" + strconv.Itoa(i))
		}

		close(startWhen100Ready)
		<-stopAt20Shutdown
		return
	}

	q2 := func() (executedTasks []string, hasTasks []string) {
		stop := make(chan struct{})

		// collect the tasks that have been executed
		testHandler := func(data ...queue.Data) []queue.Data {
			for _, datum := range data {
				executedTasks = append(executedTasks, datum.(string))
			}
			return nil
		}

		q := queue.CreateUniqueQueue("pr_patch_checker_test", testHandler, "")
		q.Run(func(atShutdown func()) { go func() { <-stop; atShutdown() }() }, func(atTerminate func()) {})

		// wait for a while to see whether there are tasks to get executed.
		time.Sleep(1 * time.Second)

		// check whether the tasks are still in the queue
		for i := 0; i < 100; i++ {
			if has, _ := q.Has("task-" + strconv.Itoa(i)); has {
				hasTasks = append(hasTasks, "task-"+strconv.Itoa(i))
			}
		}
		close(stop)
		return
	}

	q3 := func() (executedTasks []string, hasTasks []string) {
		stop := make(chan struct{})

		testHandler := func(data ...queue.Data) []queue.Data {
			for _, datum := range data {
				executedTasks = append(executedTasks, datum.(string))
			}
			return nil
		}

		q := queue.CreateUniqueQueue("pr_patch_checker_test", testHandler, "")
		q.Run(func(atShutdown func()) { go func() { <-stop; atShutdown() }() }, func(atTerminate func()) {})

		// re-run all tasks
		for i := 0; i < 100; i++ {
			_ = q.Push("task-" + strconv.Itoa(i))
		}

		// wait for a while
		time.Sleep(1 * time.Second)

		// check whether the tasks are still in the queue
		for i := 0; i < 100; i++ {
			if has, _ := q.Has("task-" + strconv.Itoa(i)); has {
				hasTasks = append(hasTasks, "task-"+strconv.Itoa(i))
			}
		}
		close(stop)
		return
	}

	completedTasks1 := q1() // run some tasks and shutdown at an intermediate point
	time.Sleep(time.Second)
	executedTasks2, hasTasks2 := q2() // restart the queue to check the tasks in it
	time.Sleep(time.Second)
	executedTasks3, hasTasks3 := q3() // try to re-run all tasks

	log.Error("TestPullRequest_QueueStuck completed1=%v, executed2=%v, has2=%v, executed3=%v, has3=%v",
		len(completedTasks1), len(executedTasks2), len(hasTasks2), len(executedTasks3), len(hasTasks3))
}

@zeripath commented on GitHub (Feb 25, 2023):

@wxiaoguang Thanks for your testcase. It's not quite right but it's pointed me in the correct direction.

Queues are extremely concurrent, and the handler in q1 is going to drop a lot of data - you cannot expect that, as soon as you tell a queue to shut down, no more data will be handled. In fact, you can almost guarantee that at least a few more items will be handled.

I'm working through it to make it properly concurrency-safe, and I'll place it in modules/queue once I've slightly rewritten it. (I'm not sure what q3 is supposed to reliably do.)


Now to address the Warning - if you switch on Trace logging for modules/queue:

gitea manager logging add console --name traceconsole --level TRACE --expression modules/queue
[log]
MODE = ..., traceconsole
...

[log.traceconsole]
LEVEL=trace
MODE=console
EXPRESSION=modules/queue

You would see that the internal channel queue is explicitly drained at Terminate. There should be no loss of data - the warning is not the cause of the issue you're investigating.


I assume you're actually investigating things not being emptied on startup - and that's a real bug.

If you look at:

https://github.com/go-gitea/gitea/blob/8540fc45b11eff9a73753ca139f8ea5c38509bf5/modules/queue/queue_disk_channel.go#L175-L197

and compare with:

https://github.com/go-gitea/gitea/blob/8540fc45b11eff9a73753ca139f8ea5c38509bf5/modules/queue/unique_queue_disk_channel.go#L212-L226

I think you'll see the real bug.


@wxiaoguang commented on GitHub (Feb 25, 2023):

Actually, I do not care about whether the queue is reliable or not. And in most cases, the queue doesn't need to be reliable.

But has3=19 seems to cause the bug described by this issue: these tasks are just stuck in the queue.


@zeripath commented on GitHub (Feb 25, 2023):

As I said, your testcase is somewhat incorrect in form. The reason you're seeing somewhat inconsistent behaviour is partially because of this but also because of the bug that I mentioned above.

Thank you for the testcase though because whilst it's not quite right it made me realise that the bug is the one I've mentioned above - which I've fixed in the attached PR. The testcase has also been fixed.


@wxiaoguang commented on GitHub (Feb 26, 2023):

After applying #23154, I can still see some strange results (just strange; I don't know whether they are right or wrong):

  • The q2 always outputs executed2=0, has2=20. Does this mean that if there is no Push after startup, the tasks already in the queue will never be executed?
  • If q3 uses Has to limit tasks to being executed only once (the code below, maybe like the checkAndUpdateStatus in the real code?), the output becomes executed3=107, has3=0, which means many tasks are still re-executed even with a Has check. I know there is concurrency involved, but is this clear to future developers? Does existing code like checkAndUpdateStatus, which relies on Has, work as expected in all cases?

Are these two behaviors by design? If yes, I think some comments would be very useful. If no, fix or not?

Feel free to ignore my comment if you think they are unrelated.

ps: I don't get what you mean by "incorrect" or "not quite right". That test code is just used to probe the queue's behavior; there is no assertion in it, and I didn't say anything about "correct" or "right" before.

		// q3 test handler: use `Has` to execute each task only once
		testHandler := func(data ...queue.Data) []queue.Data {
			for _, datum := range data {
				s := datum.(string)
				if has, _ := q.Has(s); !has {
					executedTasks = append(executedTasks, s)
				}
			}
			return nil
		}

Reference: github-starred/gitea#10315