Migrations permanently stuck if gitea is restarted during the migration #6293

Closed
opened 2025-11-02 06:51:15 -06:00 by GiteaMirror · 19 comments
Owner

Originally created by @Qix- on GitHub (Nov 11, 2020).

  • Gitea version (or commit ref): 1.12.5
  • Git version: 2.20.1
  • Operating system: Debian 10, used the "getting started on linux" instructions from the main site
  • Database (use `[x]`):
    • [ ] PostgreSQL
    • [ ] MySQL
    • [ ] MSSQL
    • [x] SQLite
  • Can you reproduce the bug at https://try.gitea.io:
    • [ ] Yes (provide example URL)
    • [x] No, cannot restart try.gitea.io manually
  • Log gist:

Description

https://github.com/go-gitea/gitea/issues/8812#issuecomment-549700212

Same as mentioned there. Forcefully restarting gitea while a migration is happening will cause any unfinished/pending migrations to hang indefinitely. Manually running the cron tasks in the administration panel does nothing.

I just spent about 5 hours scouring the web for clone links for a bunch of dependencies we need to mirror, I would really prefer not to have to do that again.
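
For anyone in the same stuck state, one way to see whether migration tasks are still marked pending is to look at the database directly. The sketch below is illustrative only: the `task` table name, its columns, and the numeric status values are assumptions for the example, not Gitea's actual schema (check your version's `models` package before relying on any of this):

```python
import sqlite3

# Assumed, simplified bookkeeping: one `task` row per migration, whose
# `status` records whether it is queued/running/finished. The values
# below are placeholders for the example, not Gitea's real constants.
QUEUED, RUNNING, FAILED, FINISHED = 0, 1, 3, 4

def find_stuck_tasks(db_path):
    """Return (id, repo_id, status) rows still marked queued or running."""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute(
            "SELECT id, repo_id, status FROM task WHERE status IN (?, ?)",
            (QUEUED, RUNNING),
        ).fetchall()
    finally:
        conn.close()
```

Rows returned by such a query after a forced restart would be exactly the migrations that hung and never got rescheduled.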

Screenshots

![image](https://user-images.githubusercontent.com/885648/98761030-d34cd980-23d4-11eb-9f70-c22bf1bd6885.png)

GiteaMirror added the type/bug label 2025-11-02 06:51:15 -06:00

@Qix- commented on GitHub (Nov 11, 2020):

This definitely should get fixed permanently but I'm also open for any manual workarounds that don't involve deleting each of the 74 mirrors I just created and re-initializing all of them manually...


@lunny commented on GitHub (Nov 11, 2020):

Just delete the repository from admin panel and then migrate it again.


@Qix- commented on GitHub (Nov 11, 2020):

> don't involve deleting each of the 74 mirrors I just created and re-initializing all of them manually

I will spend another 5 hours re-initializing all of them >.> is there a way to kick off the migrations manually?


@zeripath commented on GitHub (Nov 11, 2020):

You could use the API?
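
The API route suggested here is `POST /api/v1/repos/migrate`, which can re-submit each mirror from a script instead of re-creating them by hand in the UI. The helper below only builds the request body; the field names follow the Gitea 1.x migrate endpoint but should be verified against your instance's Swagger docs, and the actual POST (with an `Authorization: token <api-token>` header) is left as a comment:

```python
import json

def build_migrate_payload(clone_addr, repo_name, uid, mirror=True):
    """Build the JSON body for Gitea's POST /api/v1/repos/migrate endpoint."""
    return {
        "clone_addr": clone_addr,  # upstream clone URL to mirror
        "repo_name": repo_name,    # name for the new repository
        "uid": uid,                # numeric id of the owning user/org
        "mirror": mirror,          # register as a mirror, not a one-off clone
    }

# One would then send it with any HTTP client, e.g.:
#   requests.post(f"{base_url}/api/v1/repos/migrate", json=payload,
#                 headers={"Authorization": f"token {api_token}"})
payload = build_migrate_payload("https://github.com/example/dep.git", "dep", 1)
print(json.dumps(payload))
```

Looping that over a saved list of clone URLs would avoid re-entering all 74 mirrors manually.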


@zeripath commented on GitHub (Nov 11, 2020):

why is your gitea being restarted so much?


@zeripath commented on GitHub (Nov 11, 2020):

(This also leads to the question as to why the migration isn't being cancelled when the machine is restarted, and why the migration stuff isn't restartable...)


@Qix- commented on GitHub (Nov 11, 2020):

Well, it froze and prevented anyone from SSHing in and had to be force killed. 🙃 So 0 for 2 now.

Why is gitea not fault tolerant is a question-as-a-response, lol.


@zeripath commented on GitHub (Nov 11, 2020):

OK so you're running SQLite in production and you've hit #13271.

That was fixed by: #13505 and should be fixed in 1.13 by #13507.

Bugs happen. No-one is paying any of us to work on Gitea.


@Qix- commented on GitHub (Nov 11, 2020):

> Bugs happen. No-one is paying any of us to work on Gitea.

Yes, I'm fully aware how OSS works (check my profile). The sort of silly question why I'm restarting gitea (faults happen in production...) deserved an answer in-kind. Gitea is not critical for us, I'm not demanding anything, etc.

Thank you for the links, I'll patiently await 1.13 then 🙂


@6543 commented on GitHub (Nov 11, 2020):

@Qix- gitea is trying to be tolerant - it's just that SQLite is very limited ... so if you don't use it for just ~5 repos but instead mirror 74+ repos and more, you really should consider moving to MySQL


@Qix- commented on GitHub (Nov 11, 2020):

@6543 Why? SQLite is very robust if used correctly. It's been around for decades and is used successfully in production (see: android) every day by billions of users.

That's a weak argument. I'm not trying to debate here, I was simply reporting a bug. There's no reason, however, to insinuate that lack of fault tolerance is somehow my fault. It's a bug, it's nobody's fault, and I'm grateful for the project of course.

I was simply filing a bug.


@6543 commented on GitHub (Nov 11, 2020):

I have nothing against you, I just want to point out that SQLite easily deadlocks when it is used by multiple actors (yes we are trying to get rid of it).

And thanks for the bug report - without reports like this we would not be aware of many bugs 👍


@zeripath commented on GitHub (Nov 11, 2020):

@Qix- I'm sorry if you thought that: https://github.com/go-gitea/gitea/issues/13513#issuecomment-725354404 was an inappropriate question

It isn't inappropriate, because repos should get deleted if the migration is cancelled because of shutdown. The deadlock explains why they weren't and is the root cause of the problems you are seeing.


@Qix- commented on GitHub (Nov 11, 2020):

I merely insinuated that a web service would be more robust if it could survive unexpected shutdowns. Gitea being force-killed put it into a corrupted state that cannot be resumed or error-corrected, which is a dist-sys problem.

I'm a dist-sys architect; asking me "why are you restarting [a web service]" is like asking me "why did you make your server's power go out during a thunderstorm?". I didn't *want* that to happen, but it happens. A robust service would be fault tolerant of that.

With a single instance running, I highly doubt this is purely SQLite's fault (there are not multiple actors here). Perhaps I'm missing implementation details, but it seems like maybe something could be improved to increase the robustness against failures.

That's all I was implying. 🙂 I wasn't trying to put anyone down, but I didn't see how the question fit the bug report at all.


@zeripath commented on GitHub (Nov 11, 2020):

(@Qix- your replies are reading very aggressively - I'm sorry if mine are reading in the same way. I'm not trying to be aggressive or defensive here.)

There already is code to clean up a migration if it fails or gitea is shutdown during a migration - however, this relies on the db not being totally deadlocked at that point.

Clearly that is not a completely robust solution - assuming that the connection to the db is OK at shutdown is probably not something we can rely on, and instead we need something that can look at in-progress tasks and allow them to be cancelled. It's worth noting, however, that if SQLite has gone down like this we're in serious trouble - the goroutines block until the db context is killed at hammer, by which time all git operations have to die too. The migration as a whole could and should have a context which is cancelled at shutdown, but xorm does not provide a way for us to make a db request with a specific context (AFAIK) so I don't think there is a way. <- **OK it looks like this is actually possible, we just need to set the session context - this would mean propagating the context down to the models package**

Sequencing these things is not simple - and the answer is that sqlite deadlocks are IMO critical security issues to be solved as soon as possible.

Now it would be helpful to provide some way of cancelling migrations - which has been discussed on a different issue and is also not simple. Tasks can run on different gitea instances so the request to cancel a migration would have to be published somewhere - and then caught by the reading gitea and before being cancelled. But of course that would not solve the issue you were having as it was due to a deadlock.


I hope that now you see why asking why you were stopping and starting gitea so much is relevant. If you're having to stop and start a web service constantly because of a problem with it - the bug that is forcing you to restart may be the actual cause of the problem you're seeing.
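
The "migration as a whole could have a context which is cancelled at shutdown" idea above is the standard cooperative-cancellation pattern. A minimal, language-neutral sketch (this is illustrative Python, not Gitea code; Gitea's real implementation uses Go contexts and xorm sessions):

```python
import threading

def run_migration(steps, shutdown: threading.Event):
    """Run `steps` (callables) in order; stop early if shutdown is signalled.

    The long-running loop checks the shutdown signal between steps, so a
    restart can cancel it cleanly and mark the task stopped instead of
    leaving it stuck forever. Returns (completed_count, cancelled_flag).
    """
    done = 0
    for step in steps:
        if shutdown.is_set():
            return done, True  # cancelled: caller can clean up / mark stopped
        step()
        done += 1
    return done, False
```

The key property is that cancellation never depends on the database being healthy: the signal is checked in-process, and cleanup can be retried later.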


@Qix- commented on GitHub (Nov 11, 2020):

I'm not being aggressive, I just seem to have a different viewpoint than you about software robustness.

A [fault tolerant](https://en.wikipedia.org/wiki/Fault_tolerance) web service has the property that, in the event of a failure of any kind, it is able to error-correct and resume operations without manual intervention.

There could be a new cron-job; pseudo-code:

```
IF (number_of_migrations_running < migration_concurrency)
AND (query_number_of_unstarted_migrations > 0)
THEN
    start_migration
END
```
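
Rendered as runnable code, that recovery check could look like the following (hypothetical function and parameter names; nothing here reflects Gitea's actual scheduler):

```python
def resume_pending_migrations(running_count, pending_ids, concurrency,
                              start_migration):
    """Start queued migrations until the concurrency limit is reached.

    `running_count` is the number of migrations already in flight,
    `pending_ids` the queued-but-unstarted task ids, and `start_migration`
    a callback that kicks one off. Returns the ids actually started.
    """
    started = []
    for task_id in pending_ids:
        if running_count + len(started) >= concurrency:
            break  # at the limit; remaining tasks wait for the next cron tick
        start_migration(task_id)
        started.append(task_id)
    return started
```

Run periodically, this picks up any task left behind by a crash, which is exactly the self-healing behaviour being requested.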

I don't see how what I'm saying is "aggressive", I apologize if you've perceived it that way. However, I'm not going to pretend the current behavior is correct or that it's not a bug. If you're not interested in fixing it, that's fine - I can find another solution, it's not a problem. However, I wanted to let you know that this is indeed an issue and that I simply wanted to express that the two responses - "why are you restarting?" and "It's SQLite's problem" - don't make much sense to me as they do not address the fault tolerance point.

If SQLite makes it easy for gitea to fail, then gitea should probably have error-correcting logic to correct any errors SQLite might cause.

That's all.


@6543 commented on GitHub (Nov 11, 2020):

@Qix- Since what you suggest is a new topic I have created a new issue ... #13515

keep bugs and requests separated ...


@Qix- commented on GitHub (Nov 11, 2020):

> If you're having to stop and start a web service constantly because of a problem with it

I had to restart it once. I don't know where you got the idea that I was just constantly bringing it up and down. It froze the entire external sshd instance *once* and that was enough for it to ignore all of the migrations.


@techknowlogick commented on GitHub (Nov 11, 2020):

Locking, as this issue has been closed and whenever a comment is made, 400+ subscribers get an email.


Reference: github-starred/gitea#6293