[GH-ISSUE #23323] feat: built-in per-user and per-group token/message usage limits #58614

GiteaMirror · 2026-05-05T23:33:12-05:00

Originally created by @smorello87 on GitHub (Apr 1, 2026).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/23323

Check Existing Issues

I have searched the existing issues and discussions.

Related: #6692 (broad tracking/limit discussion), #21675 (analytics enhancements)

Problem Description

Open WebUI v0.8.0+ introduced the chat_message table with per-message token usage tracking, and the analytics dashboard already aggregates usage by user and by model via get_token_usage_by_user and get_token_usage_by_model. However, there's currently no built-in way to enforce usage limits — admins can see how many tokens each user consumed, but can't cap it.

For institutions and teams deploying Open WebUI, this is a critical gap. Without built-in limits, admins must do one of the following:

- Use external proxies (LiteLLM, custom middleware) to enforce budgets, which adds operational complexity
- Use pipe functions for tracking, which can't block requests at the platform level
- Simply trust users not to overconsume, which doesn't scale

The data to enforce limits is already being collected in chat_message.usage. What's missing is the enforcement layer.

Desired Solution

Add an admin-configurable usage limit system that leverages the existing chat_message analytics data:

Per-user limits (Admin → Users or Settings):

- Max tokens per day/week/month (input, output, or total)
- Max messages per day/week/month
- When limit is reached: block new requests with a clear error message
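
As a rough illustration of the day/week/month windows above, a limit could be stored as a (metric, period, maximum) record, with the current window computed from the clock. This is only a sketch: `UsageLimit`, `period_start`, UTC boundaries, and Monday as the start of the week are assumptions for illustration, not anything Open WebUI defines today.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class UsageLimit:
    """Hypothetical limit record; field names are illustrative only."""
    metric: str   # "input_tokens", "output_tokens", "total_tokens", or "messages"
    period: str   # "day", "week", or "month"
    maximum: int  # cap for the metric within one period


def period_start(now: datetime, period: str) -> datetime:
    """Return the start of the current day, week (Monday), or month in UTC.

    `now` must be timezone-aware.
    """
    now = now.astimezone(timezone.utc)
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    if period == "day":
        return midnight
    if period == "week":
        # weekday() is 0 for Monday, so this steps back to the latest Monday.
        return midnight - timedelta(days=midnight.weekday())
    if period == "month":
        return midnight.replace(day=1)
    raise ValueError(f"unknown period: {period!r}")
```

Usage counted since `period_start(now, limit.period)` would then be compared against `limit.maximum`. Whether windows should follow UTC, the server timezone, or a rolling interval is an open design question.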

Per-group limits (Admin → Groups):

- Same token/message limits, but applied at the group level
- Users inherit the limit from their highest-priority group (consistent with existing group permission model)
- Group limits could serve as defaults, with per-user overrides
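
The inheritance order described above (per-user override first, then the highest-priority group, then a global default) could be resolved roughly as follows. This is a sketch under assumptions: `GroupLimit`, its integer `priority` field (higher wins), and the rule that groups without a configured limit are skipped are all hypothetical; how Open WebUI actually ranks groups is not specified here.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class GroupLimit:
    """Hypothetical per-group limit; names are illustrative only."""
    group_id: str
    priority: int              # assumption: a higher number means higher priority
    max_tokens: Optional[int]  # None means the group sets no limit


def resolve_limit(
    user_override: Optional[int],
    groups: Sequence[GroupLimit],
    default: Optional[int],
) -> Optional[int]:
    """Return the effective token limit, or None if the user is unlimited."""
    # 1. An explicit per-user override always wins.
    if user_override is not None:
        return user_override
    # 2. Otherwise use the highest-priority group that defines a limit.
    limited = [g for g in groups if g.max_tokens is not None]
    if limited:
        return max(limited, key=lambda g: g.priority).max_tokens
    # 3. Fall back to the instance-wide default (which may itself be None).
    return default
```

For example, a user in a "students" group (priority 1, 100,000 tokens) and a "faculty" group (priority 5, 500,000 tokens) with no override would resolve to 500,000.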

Admin UI:

- Settings page to configure default limits and per-group limits
- Visual indicator on the analytics dashboard showing users approaching their limits
- Option to set limits as "soft" (warn) or "hard" (block)
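
The soft/hard distinction and the "approaching the limit" indicator could reduce to two small pure functions. This is a minimal sketch: the `Mode` and `Decision` enums and the 80% warning threshold are assumptions chosen for illustration.

```python
from enum import Enum


class Mode(Enum):
    SOFT = "soft"  # warn only
    HARD = "hard"  # block the request


class Decision(Enum):
    ALLOW = "allow"
    WARN = "warn"
    BLOCK = "block"


def decide(used: int, limit: int, mode: Mode) -> Decision:
    """Map current usage onto allow / warn / block for the given mode."""
    if used < limit:
        return Decision.ALLOW
    return Decision.BLOCK if mode is Mode.HARD else Decision.WARN


def is_approaching(used: int, limit: int, threshold: float = 0.8) -> bool:
    """True when usage is at or above `threshold` of the limit but still under it."""
    return threshold * limit <= used < limit
```

Keeping the decision separate from the data lookup would let the dashboard and the request path share the same rules.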

User-facing:

- Clear indication of remaining quota (e.g., in the UI or via an API endpoint)
- Informative error message when limit is reached, not a generic failure
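
One possible shape for the remaining-quota payload and the user-facing error text is sketched below. The `QuotaStatus` fields and the message wording are hypothetical; no such endpoint or schema exists in Open WebUI as far as this issue is concerned.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class QuotaStatus:
    """Hypothetical response body for a quota endpoint or UI widget."""
    limit: int
    used: int
    remaining: int
    resets_at: str  # ISO 8601 timestamp of the next window start


def quota_status(used: int, limit: int, resets_at: datetime) -> QuotaStatus:
    """Summarise usage against a limit; remaining never goes below zero."""
    return QuotaStatus(
        limit=limit,
        used=used,
        remaining=max(limit - used, 0),
        resets_at=resets_at.isoformat(),
    )


def limit_reached_message(status: QuotaStatus) -> str:
    """Build an explicit error message instead of a generic failure."""
    return (
        f"Usage limit reached: {status.used:,} of {status.limit:,} tokens used. "
        f"Your quota resets at {status.resets_at}."
    )
```

The same `QuotaStatus` could be serialised with `dataclasses.asdict` for an API response and reused to render the error shown in chat.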

Why Built-In?

The chat_message table already has everything needed:

- get_token_usage_by_user() can check current period usage
- get_message_count_by_user() can check message counts
- Group membership is already tracked

A lightweight check before each model call (query current period usage, compare against limit) would be straightforward to implement and wouldn't require any external dependencies.
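
The pre-call check described above (query current period usage, compare against the limit) might look like the sketch below. The usage lookup is injected as a plain callable because the real signature of `get_token_usage_by_user` is not given here; `check_before_call`, `UsageLimitExceeded`, and the `(user_id, since)` argument order are assumptions.

```python
from datetime import datetime, timezone
from typing import Callable

# Stand-in type for a usage query: (user_id, window_start) -> tokens used.
UsageLookup = Callable[[str, datetime], int]


class UsageLimitExceeded(Exception):
    """Raised when a user has no quota left in the current period."""

    def __init__(self, user_id: str, used: int, limit: int) -> None:
        super().__init__(f"user {user_id} used {used} of {limit} tokens")
        self.user_id = user_id
        self.used = used
        self.limit = limit


def check_before_call(
    user_id: str, limit: int, since: datetime, get_usage: UsageLookup
) -> int:
    """Return remaining tokens, or raise if the limit is already reached."""
    used = get_usage(user_id, since)
    if used >= limit:
        raise UsageLimitExceeded(user_id, used, limit)
    return limit - used


if __name__ == "__main__":
    # Fake lookup standing in for a query over chat_message usage data.
    fake_usage = {"alice": 900, "bob": 1200}
    since = datetime(2026, 4, 1, tzinfo=timezone.utc)
    print(check_before_call("alice", 1000, since, lambda u, s: fake_usage[u]))
```

Because the check only sees usage that has already been recorded, a user could overshoot the cap by up to one request; whether that is acceptable, or whether output tokens should be capped per request as well, would need to be decided.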

Alternatives Considered

- LiteLLM proxy: We currently use this in our dev environment. It works but adds significant operational complexity (separate database, sidecar container, config sync, user sync scripts). For teams that just need basic limits, this is overkill.
- Pipe functions (e.g., openwebui-token-tracking, https://github.com/dartmouth/openwebui-token-tracking): Good for credit-based tracking but can't enforce hard limits at the platform level, and requires per-environment UI configuration.
- External API gateway: Same complexity issue as LiteLLM.

A built-in solution would cover the 80% use case (simple token/message caps) without requiring external infrastructure, while users with more complex needs (cost-based billing, per-model pricing) could continue using LiteLLM or similar tools.

Additional Context

We run Open WebUI for ~500 users at CUNY (City University of New York) and this is our most-requested admin feature. We've been working around it with LiteLLM sidecar + group budget sync scripts, but a native solution would dramatically simplify our deployment.

GiteaMirror closed this issue

2026-05-05 23:33:13 -05:00

GiteaMirror commented

2026-05-05 23:33:16 -05:00

@tjbck commented on GitHub (Apr 2, 2026):

#21675
