Record storage objects metadata in database #13379

Open
opened 2025-11-02 10:40:32 -06:00 by GiteaMirror · 4 comments

Originally created by @wolfogre on GitHub (Aug 8, 2024).

Feature Description

This idea has been on my mind for weeks.

Right now it's almost impossible to inspect storage usage in Gitea, such as how many objects there are and how much space they take. To find out, I have to write a tool that iterates over all objects on disk/S3/blob storage.

However, things would be much easier if Gitea maintained this info in the database whenever objects are added to or removed from storage. It could record the path and size of each object, so I could search by path prefix (which is database-friendly) and get the list, the object count, and the total size.

Is it difficult to implement?

Not really. Gitea already has an interface for object storage:

```go
// ObjectStorage represents an object storage to handle a bucket and files
type ObjectStorage interface {
	Open(path string) (Object, error)
	// Save store a object, if size is unknown set -1
	Save(path string, r io.Reader, size int64) (int64, error)
	Stat(path string) (os.FileInfo, error)
	Delete(path string) error
	URL(path, name string) (*url.URL, error)
	IterateObjects(path string, iterator func(path string, obj Object) error) error
}
```

So we can add a new implementation which wraps the inner one and writes to the database after `Save` and `Delete`.

Then, add a new page in the admin area, maybe "Code Assets" > "Objects".

What to do with the existing instances?

Add a cron job that supports manual triggering only. When a user triggers it, it calls `IterateObjects` to sync the data into the database.
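
The sync step could be sketched roughly as below. The assumptions are loud ones: `object` is a simplified stand-in that exposes its size directly (the real `Object` type does not look like this), the `path` argument is treated as a plain prefix, and a map plays the role of the database table. `syncMetadata` is an illustrative name.

```go
package main

import (
	"fmt"
	"strings"
)

// object is a simplified, hypothetical stand-in for a stored object.
type object struct{ size int64 }

// iterable mirrors the shape of IterateObjects, using the simplified
// object type above.
type iterable interface {
	IterateObjects(path string, iterator func(path string, obj object) error) error
}

// memBucket is an in-memory storage used only to make the sketch runnable.
type memBucket map[string]object

func (m memBucket) IterateObjects(path string, iterator func(string, object) error) error {
	for p, obj := range m {
		if !strings.HasPrefix(p, path) {
			continue
		}
		if err := iterator(p, obj); err != nil {
			return err
		}
	}
	return nil
}

// syncMetadata walks the storage and rebuilds path -> size from scratch,
// so records for objects that no longer exist are dropped as well.
func syncMetadata(s iterable, path string) (map[string]int64, error) {
	meta := map[string]int64{}
	err := s.IterateObjects(path, func(p string, obj object) error {
		meta[p] = obj.size
		return nil
	})
	if err != nil {
		return nil, err
	}
	return meta, nil
}

func main() {
	bucket := memBucket{
		"lfs/aa/x":   {size: 100},
		"lfs/bb/y":   {size: 250},
		"avatars/u1": {size: 7},
	}
	meta, err := syncMetadata(bucket, "")
	if err != nil {
		fmt.Println("sync failed:", err)
		return
	}
	fmt.Println(len(meta), meta["lfs/bb/y"])
}
```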

What is it used for?

TBH, the only use I can think of is inspection. Managing the objects manually (removing them, for example) is risky. But maybe new uses will come up in the future.

What if I don't need this and want to save database space?

Add a new config option to disable it; when it is disabled, the "Objects" page is hidden in the admin area.

GiteaMirror added the type/proposal label 2025-11-02 10:40:32 -06:00

@lunny commented on GitHub (Aug 8, 2024):

Another benefit is we can have tags for LFS files on the tree UI.


@wolfogre commented on GitHub (Aug 9, 2024):

> Another benefit is we can have tags for LFS files on the tree UI.

Hmm... I am not sure how to do this. The path of an LFS object in storage is different from its path in the work tree, like `images/a.png` vs `lfs/fe/33/cd16d18ed2009e1aa561a67e6800394a57196225b2cbd8a9ee2a1340ea4e`.

And if any database table could help with that, it would be `LFSMetaObject`.


@lunny commented on GitHub (Aug 14, 2024):

Maybe this would benefit #31162


@wolfogre commented on GitHub (Aug 15, 2024):

> Maybe this would benefit #31162

That should still be `LFSMetaObject`.

Reference: github-starred/gitea#13379