issue: Performance bottleneck in /api/models: excessive DB queries in get_all_models and access filtering #5420

Closed
opened 2025-11-11 16:20:32 -06:00 by GiteaMirror · 0 comments
Owner

Originally created by @escargotbuffed on GitHub (Jun 2, 2025).

Check Existing Issues

  • I have searched the existing issues and discussions.
  • I am using the latest version of Open WebUI.

Installation Method

Docker

Open WebUI Version

v0.6.13

Ollama Version (if applicable)

No response

Operating System

Ubuntu 22.04

Browser (if applicable)

No response

Confirmation

  • I have read and followed all instructions in README.md.
  • I am using the latest version of both Open WebUI and Ollama.
  • I have included the browser console logs.
  • I have included the Docker container logs.
  • I have provided every relevant configuration, setting, and environment variable used in my setup.
  • I have clearly listed every relevant configuration, custom setting, environment variable, and command-line option that influences my setup (such as Docker Compose overrides, .env values, browser settings, authentication configurations, etc).
  • I have documented step-by-step reproduction instructions that are precise, sequential, and leave nothing to interpretation. My steps:
    • Start with the initial platform/version/OS and dependencies used,
    • Specify exact install/launch/configure commands,
    • List URLs visited, user input (incl. example values/emails/passwords if needed),
    • Describe all options and toggles enabled or changed,
    • Include any files or environmental changes,
    • Identify the expected and actual result at each stage,
    • Ensure any reasonably skilled user can follow and hit the same issue.

Expected Behavior

  • Calling GET /api/models should return a filtered list of models within a few hundred milliseconds at most.
  • The endpoint should retrieve all available models (from OpenAI/functions) and apply access-control filtering without issuing dozens of individual database queries.

Actual Behavior

  • The response time for GET /api/models often exceeds 7–8 seconds.

  • Profiling shows two main hotspots:

    1. get_all_models (≈ 4.5 s) — fetching/merging every model from external sources
    2. Access-filtering loop (≈ 3.3 s) — for each model without arena=true, the code calls Models.get_model_by_id(model_id) (one DB query per model) and then, if the user is not the owner, has_access(...), which in turn calls Groups.get_groups_by_member_id(user_id) (another DB query per model)

Because there are roughly 40–50 models, the combined effect is tens of sequential database round-trips, causing a multi-second delay before any data is returned to the client.

Even though Redis is configured, the final filtered model list is not being cached. Every request re-executes all database queries against the remote Postgres instance.
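For reference, the two per-model queries could in principle collapse into two bulk reads: all model rows in one query, and the user's group IDs in another. Below is a minimal sketch of the filtering half, assuming that bulk data has already been fetched; the helper name and the access_control shape (`{"read": {"user_ids": [...], "group_ids": [...]}}`) are assumptions for illustration, not the project's confirmed schema:

```python
# Sketch only: the N+1 pattern above collapsed into pure in-memory filtering.
# Callers are assumed to have issued exactly two queries up front: one for all
# model rows (model_info_by_id) and one for the user's group memberships.

def filter_models_bulk(models, user_id, model_info_by_id, user_group_ids):
    """Filter models using pre-fetched data instead of per-model DB round-trips."""
    group_ids = set(user_group_ids)
    filtered = []
    for model in models:
        info = model_info_by_id.get(model["id"])
        if info is None:
            continue  # no DB row for this model
        if info["user_id"] == user_id:
            filtered.append(model)  # owners always see their own models
            continue
        # NOTE: a None access_control may mean "public" upstream; it is
        # treated as private here to keep the sketch simple.
        read = (info.get("access_control") or {}).get("read", {})
        if user_id in read.get("user_ids", []) or group_ids & set(
            read.get("group_ids", [])
        ):
            filtered.append(model)
    return filtered
```

With this shape, the cost of the access-filtering stage becomes two round-trips total instead of two per model.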

Steps to Reproduce

  1. Ensure the application is pointed at a remote (non-local) PostgreSQL database instance.

  2. Authenticate as a normal user (role = “user”) who owns or has permission on a subset of models.

  3. Issue a request to the /api/models endpoint:

    GET https://<your-domain>/api/models
    Authorization: Bearer <valid token>
    
  4. Observe the response time in your browser’s network tab (it will typically take around 8 seconds).
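The latency in step 4 can also be measured from a script instead of the browser's network tab. A stdlib-only sketch (the URL and token are placeholders, and the `opener` parameter exists only to make the helper testable):

```python
import time
import urllib.request

def time_get(url, token, opener=urllib.request.urlopen):
    """Time a single authenticated GET and return the elapsed milliseconds."""
    req = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}
    )
    t0 = time.perf_counter()
    with opener(req) as resp:
        resp.read()  # include body transfer in the measurement
    return (time.perf_counter() - t0) * 1000.0

# e.g. time_get("https://<your-domain>/api/models", "<valid token>")
```

Running this a few times in a row makes it easy to confirm the multi-second response times independently of browser caching.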

Logs & Screenshots

2025-06-02 00:19:59.509 | DEBUG    | open_webui.main:get_models:1198 
    Times (ms): get_all_models=4484.3, tags_processing=0.1, sorting=0.1, access_filtering=3355.3, total=7839.7  
2025-06-02 00:19:59.512 | INFO     | uvicorn.protocols.http.httptools_impl:send:476 
    177.39.9.158:0 - "GET /api/models HTTP/1.1" 200 - {}  

@app.get("/api/models")
async def get_models(request: Request, user=Depends(get_verified_user)):
    t0 = time.perf_counter()
    def get_filtered_models(models, user):
        filtered_models = []
        for model in models:
            if model.get("arena"):
                if has_access(
                    user.id,
                    type="read",
                    access_control=model.get("info", {})
                    .get("meta", {})
                    .get("access_control", {}),
                ):
                    filtered_models.append(model)
                continue

            model_info = Models.get_model_by_id(model["id"])
            if model_info:
                if user.id == model_info.user_id or has_access(
                    user.id, type="read", access_control=model_info.access_control
                ):
                    filtered_models.append(model)

        return filtered_models

    t1 = time.perf_counter()
    all_models = await get_all_models(request, user=user)
    t2 = time.perf_counter()

    models = []
    for model in all_models:
        # Filter out filter pipelines
        if "pipeline" in model and model["pipeline"].get("type", None) == "filter":
            continue

        try:
            model_tags = [
                tag.get("name")
                for tag in model.get("info", {}).get("meta", {}).get("tags", [])
            ]
            tags = [tag.get("name") for tag in model.get("tags", [])]

            tags = list(set(model_tags + tags))
            model["tags"] = [{"name": tag} for tag in tags]
        except Exception as e:
            log.debug(f"Error processing model tags: {e}")
            model["tags"] = []

        models.append(model)
    t3 = time.perf_counter()

    model_order_list = request.app.state.config.MODEL_ORDER_LIST
    if model_order_list:
        model_order_dict = {model_id: i for i, model_id in enumerate(model_order_list)}
        # Sort models by order list priority, with fallback for those not in the list
        models.sort(
            key=lambda x: (model_order_dict.get(x["id"], float("inf")), x["name"])
        )

    t4 = time.perf_counter()
    # Filter out models that the user does not have access to
    if user.role == "user" and not BYPASS_MODEL_ACCESS_CONTROL:
        models = get_filtered_models(models, user)

    t5 = time.perf_counter()
    log.debug(
        f"/api/models returned filtered models accessible to the user: {json.dumps([model['id'] for model in models])}"
    )

    log.debug(
        f"Times (ms): get_all_models={(t2 - t1) * 1000:.1f}, "
        f"tags_processing={(t3 - t2) * 1000:.1f}, "
        f"sorting={(t4 - t3) * 1000:.1f}, "
        f"access_filtering={(t5 - t4) * 1000:.1f}, "
        f"total={(t5 - t0) * 1000:.1f}"
    )
    
    return {"data": models}

Additional Information

  • Database & Cache Configuration

    • Postgres is running on a separate host (e.g., postgres://user:pass@db.example.com:5432/open_webui).
    • Redis is running and reachable, but the code never writes the final filtered-model list or intermediate permission results into Redis. Consequently, Redis is effectively unused for /api/models.
  • Why This Matters

    • Because the database is not local, each SQL call adds network latency (often 20–50 ms or more). With ~45 models, that quickly multiplies into multiple seconds.
    • Redis is already configured in the project, but no code path writes the merged/filtered model list into Redis. Even if Redis is available, every request misses the cache and repeats all remote queries.
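One way to close that gap is a short-TTL cache keyed per user. A minimal sketch; the key format and TTL are assumptions, and `cache` can be any client exposing redis-py's get/set interface (redis-py's `set()` accepts an `ex=` expiry in seconds):

```python
import json

def cached_filtered_models(cache, user_id, ttl_seconds, compute):
    """Serve the filtered model list from cache, recomputing only on a miss.

    `cache` needs only get/set; a redis.Redis client fits that shape.
    `compute` is a zero-argument callable that runs the expensive
    fetch-and-filter path and returns a JSON-serializable list.
    """
    key = f"models:filtered:{user_id}"  # hypothetical key format
    raw = cache.get(key)
    if raw is not None:
        return json.loads(raw)  # cache hit: no DB round-trips
    models = compute()
    cache.set(key, json.dumps(models), ex=ttl_seconds)
    return models
```

Any write path that changes model permissions would also need to delete the key, otherwise a user can see a stale list for up to the TTL.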
GiteaMirror added the bug label 2025-11-11 16:20:32 -06:00
Reference: github-starred/open-webui#5420