Gitea dump duplicates repositories on windows #9768

Open
opened 2025-11-02 08:49:01 -06:00 by GiteaMirror · 11 comments

Originally created by @eeyrjmr on GitHub (Nov 2, 2022).

Description

When comparing the archives produced by "gitea dump" on Windows and Linux, the Windows archive is twice as large.

It appears the bare repositories are duplicated in two locations:

gitea-dump-####.zip
    custom
    data
        gitea-repositories
            repo1
            repo2
    repos
        repo1
        repo2

Gitea Version

1.17.2

Can you reproduce the bug on the Gitea demo site?

No

Log Gist

No response

Screenshots

No response

Git Version

2.34

Operating System

windows, linux

How are you running Gitea?

service

Database

MySQL

GiteaMirror added the type/bug label 2025-11-02 08:49:01 -06:00

@lunny commented on GitHub (Nov 2, 2022):

So which one is not in the right place?


@wxiaoguang commented on GitHub (Nov 3, 2022):

Maybe it's related to a long-standing bug: you shouldn't run gitea dump in the Gitea directory.

* https://github.com/go-gitea/gitea/issues/19533

@eeyrjmr commented on GitHub (Nov 21, 2022):

> Maybe it's related to a long-standing bug: you shouldn't run gitea dump in the Gitea directory.
>
> * Migrating gitea repo to gitea repo leads to really huge repository #19533 (https://github.com/go-gitea/gitea/issues/19533)

Apologies for the delay... Interesting bug. I have just tried this and the result is the same.

> So which one is not in the right place?

Very good question :) I suspect it is Linux, but it is likely due to a subtle difference in the on-disk file structure. Looking at the restore part of the docs: https://docs.gitea.io/en-us/backup-and-restore/#restore-command-restore

unzip gitea-dump-1610949662.zip
cd gitea-dump-1610949662
mv data/conf/app.ini /etc/gitea/conf/app.ini
mv data/* /var/lib/gitea/data/
mv log/* /var/lib/gitea/log/
mv repos/* /var/lib/gitea/repositories/
chown -R gitea:gitea /etc/gitea/conf/app.ini /var/lib/gitea 

The repositories are meant to be in the root of the Gitea working directory, as this is where the restore sequence instructs the user to act.

Looking at the dump generated from Gitea running in an Alpine VE, I see the structure aligns with this:

  1. repos directory in the root of the zip containing the repos/orgs
  2. no additional repos stored within the data directory of the zip

Looking at the dump generated from a Gitea running in a Windows MS, I see a subtle difference:

  1. repos directory in the root of the zip containing the repos/orgs
  2. a gitea-repositories directory under the data directory of the zip

I noticed this oddity some months ago, when the backup zip was larger than the on-disk structure, but I didn't look into it. I recently pushed some older git repos to the instance running on Windows and the recent backups are growing:

on-disk = 708Meg
gitea-dump-1668736800.zip = 1,411Meg

The SQL dump (the only thing that should be different) is 1Meg in size. I spent a bit of time looking over the dump code, but I haven't managed to get my head around how it works well enough to understand what it is trying to dump, let alone why it is making this additional directory, and only on Windows.
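
To see where the extra size in a dump comes from, one option is to total the uncompressed bytes per leading directory inside the zip. The sketch below is only an illustration: the function name `size_by_prefix` and the `depth` parameter are made up for this example and are not part of Gitea.

```python
import zipfile
from collections import Counter


def size_by_prefix(archive, depth=2):
    """Sum uncompressed bytes per leading directory prefix in a zip archive.

    `archive` may be a path or a file-like object; `depth` is how many
    leading directory components form each bucket (hypothetical helper).
    """
    totals = Counter()
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            # Normalise separators in case the archive holds backslashes.
            parts = info.filename.replace("\\", "/").split("/")
            # Bucket by directory components only, never by the file name.
            dirs = parts[:-1][:depth]
            prefix = "/".join(dirs) or "."
            totals[prefix] += info.file_size
    return dict(totals)
```

If the repositories are stored twice, buckets such as `data/gitea-repositories` and the ones under `repos` should show roughly the same totals.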


@eeyrjmr commented on GitHub (Nov 21, 2022):

I do have this in my app.ini

[repository]
ROOT = D:/gitea/data/gitea-repositories

Now thinking about this... could this be related? Looking at:
https://docs.gitea.io/en-us/config-cheat-sheet/#repository-repository
ROOT: %(APP_DATA_PATH)s/gitea-repositories: Root path for storing all repository data. A relative path is interpreted as AppWorkPath/%(ROOT)s.

So I set this "just in case", based on the "windows as a service" docs, to include the full path:
https://docs.gitea.io/en-us/windows-service/

So a running Gitea is correctly reading this location. Now the backup... the backup code does two things:

  1. copies the repositories
  2. backs up ./data

Since I have the repositories in the data subdirectory, they are getting archived twice.

So in theory I should be able to comment out the [repository] section and move D:/gitea/data/gitea-repositories to D:/gitea/gitea-repositories; Gitea should keep working, and the gitea dump should be roughly the on-disk size.
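
A quick way to check whether a configured repository ROOT sits inside the data directory is a path containment test. This is a minimal sketch with a hypothetical helper name (`is_inside`); it only compares path strings and does not read app.ini or touch the file system.

```python
from pathlib import PureWindowsPath


def is_inside(child: str, parent: str) -> bool:
    """Return True if `child` equals `parent` or lies somewhere beneath it.

    PureWindowsPath is used so Windows-style paths can be compared on any OS.
    """
    c, p = PureWindowsPath(child), PureWindowsPath(parent)
    return c == p or p in c.parents


# The ROOT from the app.ini above is inside the data directory ...
print(is_inside("D:/gitea/data/gitea-repositories", "D:/gitea/data"))  # True
# ... while the proposed new location is not.
print(is_inside("D:/gitea/gitea-repositories", "D:/gitea/data"))       # False
```

Comparing path components rather than raw string prefixes avoids false positives such as `D:/gitea/database` being treated as inside `D:/gitea/data`.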


@lunny commented on GitHub (Nov 21, 2022):

So should you move repositories out of data or should Gitea check whether the repositories directory is under ./data?


@eeyrjmr commented on GitHub (Nov 21, 2022):

> So should you move repositories out of data or should Gitea check whether the repositories directory is under ./data?

Good question :)
For consistency I should move the repositories out of data, as that way following the restore-from-backup instructions makes sense.

Should Gitea check if the repositories are under ./data? Looking at the issue @wxiaoguang linked, there is some commonality, as the migration also put the repositories under ./data. It's extra logic to check and skip.


@eeyrjmr commented on GitHub (Nov 22, 2022):

OK, it's a bit more involved than that...

I commented out the [repository] entry and ran gitea dump to test:

2022/11/22 08:41:00 ...les/storage/local.go:46:NewLocalStorage() [I] Creating new Local Storage at D:\gitea\data\packages
Failed to include repositories: open D:\gitea\data\gitea-repositories: The system cannot find the file specified.
2022/11/22 08:41:00 cmd/dump.go:241:runDump() [I] Dumping local repositories... D:\gitea\data\gitea-repositories
2022/11/22 08:41:00 cmd/dump.go:159:fatal() [F] Failed to include repositories: open D:\gitea\data\gitea-repositories: The system cannot find the file specified.

That aside, the archive is back to the expected size:

Screenshot: https://user-images.githubusercontent.com/4564448/203267327-96e50c93-84f6-4296-9bdb-dc8155a288b2.png

@techknowlogick commented on GitHub (Jul 25, 2023):

Re-opening, as we've received a similar report via chat.


@Kalyxt commented on GitHub (Jul 30, 2023):

I'll post additional info here.

Screenshot (giteasize): https://github.com/go-gitea/gitea/assets/26333327/74d54917-755d-4715-b348-32e2d45884a3

The first line is the zipped gitea folder, which contains the entire data; the second line is the dump created by the gitea CLI (1.20.1).

I browsed the dump file and there are duplicated repositories at gitea-dump-1690312222.zip\data\gitea-repositories and gitea-dump-1690312222.zip\repos.

@hesseldijk commented on GitHub (Oct 10, 2023):

Hi,

Any more information on this? I'm having the same problem (1.20.2)


@wxiaoguang commented on GitHub (Apr 3, 2024):

While writing #30240, I think I came to understand more about the problems (the "dump" code wasn't written by me, so it really takes a lot of time to understand what it is doing ....)

The root problem is that some directories overlap. For example: Gitea expects to back up PathA and PathB. But if PathA=C:\git\data and PathB=C:\git\data\sub, then the dumped file contains duplicate files.

At the moment I don't have a clear plan for a complete rewrite, and I can see that the "dump" command has a lot of problems. So a workaround could be "manually copy the data directory and dump the database", which is more flexible and controllable.
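
The overlap described above can be illustrated by pruning a list of backup paths so that no path is nested inside another. This is a sketch of the idea only, not how Gitea's dump command works or how it would be fixed; `drop_nested` is a hypothetical name.

```python
from pathlib import PureWindowsPath


def drop_nested(paths):
    """Keep only the paths that are not contained in another listed path.

    Exact duplicates are reduced to their first occurrence.
    """
    parsed = [PureWindowsPath(p) for p in paths]
    kept = []
    for i, p in enumerate(parsed):
        covered = any(
            # q is an ancestor of p, or an earlier identical entry
            q in p.parents or (q == p and j < i)
            for j, q in enumerate(parsed)
            if j != i
        )
        if not covered:
            kept.append(paths[i])
    return kept


# PathB lies under PathA, so archiving both would store its files twice.
print(drop_nested([r"C:\git\data", r"C:\git\data\sub", r"C:\git\log"]))
```

A real fix would also have to decide where the pruned files should appear in the archive, since the restore instructions expect a separate repos directory.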

Reference: github-starred/gitea#9768