Add Maintenance Operation to Garbage Collect Orphaned Attachments #14703

Closed
opened 2025-11-02 11:20:31 -06:00 by GiteaMirror · 14 comments

Originally created by @smartYSC on GitHub (Jul 4, 2025).

Feature Description

Over time, orphaned attachments can accumulate in Gitea. Attachments can become orphaned, e.g. when a user starts creating a PR/issue/comment/release, uploads an attachment, and then decides not to post it; see https://github.com/go-gitea/gitea/issues/16783.

In our case we have a bot which posts build results to PRs. It overwrites existing comments to avoid clutter. We do this by deleting the existing comment and creating a new one. However, the attachments of that comment are not cleaned up.

It would be nice if Gitea either automatically deleted those unreferenced attachments or added a Maintenance Operation that checks the attachments table and removes all entries that have no reference.

Screenshots

No response

GiteaMirror added the type/proposal and issue/needs-feedback labels 2025-11-02 11:20:32 -06:00

@silverwind commented on GitHub (Jul 4, 2025):

I agree this would be nice to have. If possible, the mechanism should also check the comment edit history so that attachments used in an older version of a comment are not deleted.


@wxiaoguang commented on GitHub (Jul 4, 2025):

> It would be nice if Gitea either automatically deleted those unreferenced attachments or added a Maintenance Operation that checks the attachments table and removes all entries that have no reference.

It's almost impossible to correctly detect "unused" attachments: users could copy the attachment link to other places (issues, PRs) and even wiki pages.


@smartYSC commented on GitHub (Jul 4, 2025):

You can already edit the attachments of a comment manually and delete them. They are actually deleted from disk, so any links to them in other places already become invalid at that point.

The only issue I am facing right now is that if you delete the comment without deleting the attachments first, the attachments survive. You can see this by checking the `attachments` table and looking up whether the referenced `comment_id` still exists.

To be extra clear: If you first delete the attachments one-by-one and then delete the comment, everything is gone.
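
The check described above can be sketched in Go as follows. This is a minimal in-memory illustration, not Gitea's actual code: the `Attachment` struct, the `DanglingAttachments` function, and the sample data are all hypothetical, and a real implementation would query the database rather than walk slices and maps.

```go
package main

import "fmt"

// Attachment is a hypothetical, simplified stand-in for a row in the
// attachments table; only the fields needed for this check are modeled.
type Attachment struct {
	ID        int64
	CommentID int64 // 0 means the attachment is not linked to any comment
	Name      string
}

// DanglingAttachments returns the attachments whose CommentID points at a
// comment that no longer exists. Attachments that were never linked to a
// comment (CommentID == 0) are skipped, since they are a separate case.
func DanglingAttachments(attachments []Attachment, existingComments map[int64]bool) []Attachment {
	var dangling []Attachment
	for _, a := range attachments {
		if a.CommentID != 0 && !existingComments[a.CommentID] {
			dangling = append(dangling, a)
		}
	}
	return dangling
}

func main() {
	attachments := []Attachment{
		{ID: 1, CommentID: 10, Name: "build-log.txt"},
		{ID: 2, CommentID: 11, Name: "report.html"},
		{ID: 3, CommentID: 0, Name: "draft.png"},
	}
	// Assume comment 10 was deleted and only comment 11 still exists.
	existing := map[int64]bool{11: true}

	for _, a := range DanglingAttachments(attachments, existing) {
		fmt.Printf("attachment %d (%s) references missing comment %d\n", a.ID, a.Name, a.CommentID)
	}
}
```

The sketch only reports candidates. Whether they are safe to delete is a separate question, since a link to the attachment may have been copied elsewhere.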


@wxiaoguang commented on GitHub (Jul 4, 2025):

That's the problem:

> It's almost impossible to correctly detect "unused" attachments: users could copy the attachment link to other places (issues, PRs) and even wiki pages.

What if the doer has used the attachment links in other places and wants to keep the attachment, deleting only the comment?


@smartYSC commented on GitHub (Jul 4, 2025):

OK, but taking this further: you could also post a link to a comment somewhere. When you delete that comment, that link returns a 404 as well...


@silverwind commented on GitHub (Jul 4, 2025):

Attachments could be considered for deletion if all of these conditions are met:

  • They are not referenced in any comment
  • They are not referenced in any comment edit history
  • They have not been accessed in the last 12 months (configurable)

@silverwind commented on GitHub (Jul 4, 2025):

An even better approach would be to not create garbage on the server in the first place. Currently, attachments are uploaded immediately when they are added, but ideally they should only be uploaded when the comment is actually saved. So we could keep them stored on the client side and only send them when the comment is saved.

This will require a major rewrite of the attachment code, but I think it is ripe for it anyway.


@delvh commented on GitHub (Jul 4, 2025):

@silverwind While this would be feasible in theory, I'm already worried about the cron job that would clean it up.
It sounds like a pretty long-running task: in the current architecture, you need multiple trips across the DB.
So yeah, it would be possible if we completely overhaul the entire attachment mechanism.


@lunny commented on GitHub (Jul 4, 2025):

> An even better approach would be to not create garbage on the server in the first place. Currently, attachments are uploaded immediately when they are added, but ideally they should only be uploaded when the comment is actually saved. So we could keep them stored on the client side and only send them when the comment is saved.
>
> This will require a major rewrite of the attachment code, but I think it is ripe for it anyway.

How should it work when pasting an image if not uploading it first?


@silverwind commented on GitHub (Jul 4, 2025):

> How should it work when pasting an image if not uploading it first?

You can store `File` objects in memory in JS that can be attached to the `FormData` on submit.


@wxiaoguang commented on GitHub (Jul 5, 2025):

> > How should it work when pasting an image if not uploading it first?
>
> You can store `File` objects in memory in JS that can be attached to the `FormData` on submit.

Then how to preview the markdown content with uploaded images?


@silverwind commented on GitHub (Jul 7, 2025):

> Then how to preview the markdown content with uploaded images?

You use [URL.createObjectURL](https://developer.mozilla.org/en-US/docs/Web/API/URL/createObjectURL_static), which creates an in-memory image URL using a random hash. The markdown code could display this URL as-is.

@silverwind commented on GitHub (Jul 7, 2025):

BTW I think we should overhaul the frontend attachment code to remove Dropzone and make it work like GitHub, where attachments only exist in the markdown source (as temporary object URLs until saved once), i.e. no display of attachments outside the textarea.

Example: [test1.txt](https://github.com/user-attachments/files/21099835/test1.txt)

@GiteaBot commented on GitHub (Aug 6, 2025):

We close issues that need feedback from the author if there were no new comments for a month. 🍵

Reference: github-starred/gitea#14703