Proposal: User <-> repo unit cross table for easier permission checking. #4597

Open
opened 2025-11-02 05:55:40 -06:00 by GiteaMirror · 4 comments
Owner

Originally created by @guillep2k on GitHub (Jan 5, 2020).

I propose to create a new table (user_repo_units) to summarize all permissions a user requires to access any repository; this table should be derived from all the possible permission sources (e.g. team membership, admin status, repo visibility, repo active units, etc.). Any changes in the system that modifies the user's accessibility to any repo should update said table. Then, many queries can use the table for user access checking instead of a complex set of golang-side code.

The table would have a structure like:

type UserRepoUnits struct {
	UserID      int64       `xorm:"pk"`
	RepoID      int64       `xorm:"pk INDEX"`
	Type        UnitType
	Mode        AccessMode
}

Units accessible for all users (e.g. UnitTypeCode on public repositories) will not have a record for each user but one for the user 0 instead (may be -1 is better?). This should reduce the number of records in the table considerably.

Advantages

  • Access checks would be centralized in a few functions, much easier to maintain.
  • Most queries can be simplified considerably, especially for the home page.
  • It will improve the performance of the more frequent actions (e.g. home page, issue list, etc), rather than the infrequent permission modifications.
  • Bulk processes (like notifications) can retrieve a valid list of users with a single query instead of multiple complex steps.
  • Simplified access checks means less likely to leak information inadvertently.

Usage

The table can then be used in queries. For example, to get all open issues (UnitTypeIssues = 2) the user 1234 can see (AccessModeRead = 1), we could do:

select issue.*
from issue
where (select max(mode) from user_repo_units
        where user_repo_units.repository_id = issue.repository_id
          and user_repo_units.user_id in (1234, 0)
          and user_repo_units.type = 2) >= 1;
  and issue.is_closed = false;

No team membership check required, no ownership or admin check required. All in the same (pretty much standardized) query.

Alternatives

  • We could use a column for each unit type instead of separate records. This would reduce the number of records (and speed up the queries even more) at a little additional cost on maintainability.
Originally created by @guillep2k on GitHub (Jan 5, 2020). I propose to create a new table (`user_repo_units`) to summarize all permissions a user requires to access any repository; this table should be derived from all the possible permission sources (e.g. team membership, admin status, repo visibility, repo active units, etc.). Any changes in the system that modifies the user's accessibility to any repo should update said table. Then, many queries can use the table for user access checking instead of a complex set of golang-side code. The table would have a structure like: ``` type UserRepoUnits struct { UserID int64 `xorm:"pk"` RepoID int64 `xorm:"pk INDEX"` Type UnitType Mode AccessMode } ``` Units accessible for all users (e.g. `UnitTypeCode` on public repositories) will not have a record for each user but one for the user `0` instead (may be `-1` is better?). This should reduce the number of records in the table considerably. ### Advantages * Access checks would be centralized in a few functions, much easier to maintain. * Most queries can be simplified considerably, especially for the home page. * It will improve the performance of the more frequent actions (e.g. home page, issue list, etc), rather than the infrequent permission modifications. * Bulk processes (like notifications) can retrieve a valid list of users with a single query instead of multiple complex steps. * Simplified access checks means less likely to leak information inadvertently. ### Usage The table can then be used in queries. For example, to get all open issues (`UnitTypeIssues` = 2) the user `1234` can see (`AccessModeRead` = 1), we could do: ``` select issue.* from issue where (select max(mode) from user_repo_units where user_repo_units.repository_id = issue.repository_id and user_repo_units.user_id in (1234, 0) and user_repo_units.type = 2) >= 1; and issue.is_closed = false; ``` No team membership check required, no ownership or admin check required. All in the same (pretty much standardized) query. ### Alternatives * We could use a column for each unit type instead of separate records. This would reduce the number of records (and speed up the queries even more) at a little additional cost on maintainability.
GiteaMirror added the type/proposal label 2025-11-02 05:55:40 -06:00
Author
Owner

@lunny commented on GitHub (Jan 6, 2020):

And we could also support collabrators unit permissions if this. Currently only organization's repositories could have unit permissions, but not individual's.

And this may result in the inconsistent between team_unit and user_repo_units. But even that, I also support to add the extra table to do that. We have to be careful of the consistent of database.

@lunny commented on GitHub (Jan 6, 2020): And we could also support collabrators unit permissions if this. Currently only organization's repositories could have unit permissions, but not individual's. And this may result in the inconsistent between team_unit and user_repo_units. But even that, I also support to add the extra table to do that. We have to be careful of the consistent of database.
Author
Owner

@sapk commented on GitHub (Jan 6, 2020):

Couldn't we define a view wich will be optimize by the SQL engine instead of a table that need to be tightly kept uptodate ?

@sapk commented on GitHub (Jan 6, 2020): Couldn't we define a view wich will be optimize by the SQL engine instead of a table that need to be tightly kept uptodate ?
Author
Owner

@guillep2k commented on GitHub (Jan 7, 2020):

@sapk That would be a very complex view and will not likely be much optimizable. Views only translate as subqueries; they don't have any kind of pre-processing.

There's however the concept of a materialized view (a.k.a. indexed view), which is a read-only table automatically maintained by the rdbms based on a view definition. Those would be very convenient here, but I don't think they're available on all of our supported databases (not in sqlite3) and they have some important limitations and caveats as well, like not being immediately updated (e.g. in PostgreSQL a REFRESH MATERIALIZED VIEW mymatview; statement must be ran in order to update its contents).

This SO answer explains how views can improve a query performance, but only if they are indexed (materialized) views.

@guillep2k commented on GitHub (Jan 7, 2020): @sapk That would be a very complex view and will not likely be much optimizable. [Views only translate as subqueries](https://www.sqlservercentral.com/blogs/poor-little-misunderstood-views); they don't have any kind of pre-processing. There's however the concept of a [materialized view](https://en.wikipedia.org/wiki/Materialized_view) (a.k.a. indexed view), which is a read-only table automatically maintained by the rdbms based on a view definition. Those would be very convenient here, but I don't think they're available on all of our supported databases (not in sqlite3) and they have some [important limitations](https://docs.microsoft.com/en-us/previous-versions/sql/sql-server-2008/dd171921(v=sql.100)?redirectedfrom=MSDN#additional-requirements) and [caveats](https://www.fromdual.com/mysql-materialized-views#refreshing) as well, like not being immediately updated (e.g. in PostgreSQL a `REFRESH MATERIALIZED VIEW mymatview;` statement must be ran in order to update its contents). [This SO answer](https://stackoverflow.com/questions/439056/is-a-view-faster-than-a-simple-query) explains how views can improve a query performance, but only if they are indexed (materialized) views.
Author
Owner

@sapk commented on GitHub (Jan 7, 2020):

@guillep2k The optimization (cache) could be done on gitea part like using xorm cache over the view and clearing it when rights are changed. I am not against the table, I just fear that it would become a hazardous code to maintain so if it could be a more simple solution it would be better. In fact, we can go first with the table and later move to a view and just remove the code that updated the table.

@sapk commented on GitHub (Jan 7, 2020): @guillep2k The optimization (cache) could be done on gitea part like using xorm cache over the view and clearing it when rights are changed. I am not against the table, I just fear that it would become a hazardous code to maintain so if it could be a more simple solution it would be better. In fact, we can go first with the table and later move to a view and just remove the code that updated the table.
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: github-starred/gitea#4597