[GH-ISSUE #12058] issue: Cached tokens out of nowhere #16454

Closed
opened 2026-04-19 22:22:12 -05:00 by GiteaMirror · 0 comments

Originally created by @davidvpe on GitHub (Mar 25, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/12058

Check Existing Issues

  • I have searched the existing issues and discussions.
  • I am using the latest version of Open WebUI.

Installation Method

Docker

Open WebUI Version

v0.5.20

Ollama Version (if applicable)

No response

Operating System

Ubuntu 24.04

Browser (if applicable)

Brave

Confirmation

  • I have read and followed all instructions in README.md.
  • I am using the latest version of both Open WebUI and Ollama.
  • I have included the browser console logs.
  • I have included the Docker container logs.
  • I have listed steps to reproduce the bug in detail.

Expected Behavior

When sending a first message such as "hello" in a new chat, the token count should be minimal.

Actual Behavior

Token usage is very high, and some tokens are already reported as cached, yet nothing in the logs explains it.

Steps to Reproduce

I am using Open WebUI with LiteLLM.

  • Set up LiteLLM as an OpenAI connection (not a direct connection) in Open WebUI.
  • Set up models from LiteLLM (I am currently using Bedrock models; I'm not sure if that's relevant, and I use all default settings).
  • Start more than one conversation until token usage reaches 2191 (request + response).
  • From then on, every request is reported with that token count. (A sketch for replaying the request outside Open WebUI follows below.)
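
To isolate whether the inflated usage comes from Open WebUI or from the proxy itself, it can help to replay the same minimal request directly against LiteLLM and compare the reported usage. A minimal sketch, assuming a LiteLLM proxy at http://localhost:4000 and a key in the LITELLM_API_KEY environment variable (both hypothetical placeholders):

```python
import os

from openai import OpenAI  # LiteLLM exposes an OpenAI-compatible endpoint

# Hypothetical base URL and key; adjust to your proxy setup.
client = OpenAI(
    base_url="http://localhost:4000/v1",
    api_key=os.environ["LITELLM_API_KEY"],
)

# The same payload Open WebUI sent, minus streaming, so the
# usage block comes back on the single response object.
resp = client.chat.completions.create(
    model="bedrock-claude-sonnet-3.7",
    messages=[
        {
            "role": "system",
            "content": "Be concise and to the point when answering, "
                       "unless someone asks for more detail.",
        },
        {"role": "user", "content": "hello"},
    ],
)
print(resp.usage)  # compare prompt_tokens/cached_tokens with the values below
```

If the direct call reports a few dozen prompt tokens while the same request through Open WebUI reports 1799, the extra tokens are being introduced somewhere on the Open WebUI side.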

Logs & Screenshots

These are the request/response/metadata captured from LiteLLM. I can see from the browser logs that the request doesn't contain any history, so I am a bit puzzled.

Request:

```json
{
  "stream": true,
  "model": "bedrock-claude-sonnet-3.7",
  "messages": [
    {
      "role": "system",
      "content": "Be concise and to the point when answering, unless someone asks for more detail."
    },
    {
      "role": "user",
      "content": "hello"
    }
  ]
}
```
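
As a rough sanity check, the visible payload above can be tokenized locally. Claude models use Anthropic's own tokenizer, so tiktoken's cl100k_base encoding is only an approximation, but it is enough to show the order of magnitude:

```python
import tiktoken  # OpenAI tokenizer; only an approximation for Claude models

enc = tiktoken.get_encoding("cl100k_base")
system = (
    "Be concise and to the point when answering, "
    "unless someone asks for more detail."
)
user = "hello"

# Content tokens plus a small allowance for per-message framing overhead.
approx = len(enc.encode(system)) + len(enc.encode(user)) + 8
print(approx)  # on the order of 30 tokens, nowhere near the reported 1799
```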

Response:

```json
{
  "id": "chatcmpl-9233222f-67db-413c-ad24-4507fb635050",
  "model": "us.anthropic.claude-3-7-sonnet-20250219-v1:0",
  "usage": {
    "total_tokens": 2191,
    "prompt_tokens": 1799,
    "completion_tokens": 392,
    "prompt_tokens_details": {
      "audio_tokens": null,
      "cached_tokens": 1796
    },
    "cache_read_input_tokens": 1796,
    "completion_tokens_details": null,
    "cache_creation_input_tokens": 0
  },
  "object": "chat.completion",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I assist you today?",
        "tool_calls": null,
        "function_call": null
      },
      "finish_reason": "stop"
    }
  ],
  "created": 1742940082,
  "system_fingerprint": null
}
```
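
The numbers themselves highlight the anomaly: of the 1799 prompt tokens, 1796 are reported as cached, leaving only 3 uncached, and 1799 + 392 = 2191 matches the total. Yet the visible payload should tokenize to a few dozen tokens at most. A small helper for flagging this pattern, sketched against the field names in the response above:

```python
def flag_suspicious_usage(usage: dict, visible_estimate: int) -> None:
    """Warn when reported prompt tokens dwarf the visible payload.

    `usage` is the usage object from an OpenAI-style response;
    `visible_estimate` is a rough token count of the messages actually sent.
    """
    prompt = usage["prompt_tokens"]
    cached = (usage.get("prompt_tokens_details") or {}).get("cached_tokens") or 0
    print(f"prompt={prompt} cached={cached} uncached={prompt - cached}")
    if prompt > 5 * visible_estimate:
        print("suspicious: reported prompt tokens far exceed the visible payload")


# With the figures from the response above and a ~30-token estimate:
flag_suspicious_usage(
    {"prompt_tokens": 1799, "prompt_tokens_details": {"cached_tokens": 1796}},
    visible_estimate=30,
)
```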

Metadata:

```json
{
  "batch_models": null,
  "user_api_key": "xxxxx",
  "applied_guardrails": [],
  "user_api_key_alias": "OpenWebUI",
  "user_api_key_org_id": null,
  "requester_ip_address": "",
  "user_api_key_team_id": "yyy",
  "user_api_key_user_id": "zzz",
  "additional_usage_values": {
    "prompt_tokens_details": {
      "text_tokens": null,
      "audio_tokens": null,
      "image_tokens": null,
      "cached_tokens": 1796
    },
    "cache_read_input_tokens": 1796,
    "completion_tokens_details": null,
    "cache_creation_input_tokens": 0
  },
  "user_api_key_team_alias": "Home"
}
```
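
For Anthropic models on Bedrock, cache_read_input_tokens counts prompt tokens served from an existing prompt cache, while cache_creation_input_tokens counts tokens newly written to it; a read requires that some earlier request with a matching prefix created the cache entry. A read of 1796 with zero cache creation on what should be the first message of a chat therefore implies an earlier request, not visible in this chat, already cached that prefix. A tiny interpreter for these two counters, as a sketch:

```python
def classify_cache_usage(usage: dict) -> str:
    """Interpret Anthropic-style prompt-cache counters from a usage block."""
    read = usage.get("cache_read_input_tokens") or 0
    wrote = usage.get("cache_creation_input_tokens") or 0
    if read and not wrote:
        return "prompt served from a cache created by an EARLIER request"
    if wrote and not read:
        return "cache created now; reads would show up on later requests"
    if read and wrote:
        return "partial hit: some tokens read, some newly cached"
    return "no prompt caching involved"


print(classify_cache_usage(
    {"cache_read_input_tokens": 1796, "cache_creation_input_tokens": 0}
))  # -> prompt served from a cache created by an EARLIER request
```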

![Image](https://github.com/user-attachments/assets/985e2f42-2872-48f7-8717-325316c39faf)

Additional Information

Just to mention that I am using LiteLLM from other clients as well (Telegram bots) and I don't see the same issue there. So I think Open WebUI is doing something behind the scenes to cache some tokens and reuse them in the same conversation, because sometimes it randomly mentions things I said in different chats.
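
One way to test that suspicion is to list what actually reaches the proxy: each proxied call, including any background requests a client makes, shows up as its own entry in LiteLLM's spend logs. A sketch, assuming spend logging is enabled, that this LiteLLM version exposes GET /spend/logs, and that it returns a JSON list of entries (endpoint and field names may vary by version; check the proxy docs):

```python
import os

import requests

# Hypothetical base URL; the master key is assumed to be in the environment.
BASE = "http://localhost:4000"
headers = {"Authorization": f"Bearer {os.environ['LITELLM_MASTER_KEY']}"}

logs = requests.get(f"{BASE}/spend/logs", headers=headers, timeout=10).json()
for entry in logs[-10:]:
    # Hidden background calls (if any) would appear here as separate
    # entries alongside the visible chat request.
    print(entry.get("request_id"), entry.get("model"), entry.get("total_tokens"))
```

If more entries show up than messages you actually sent, the "extra" tokens likely come from those additional requests.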

GiteaMirror added the bug label 2026-04-19 22:22:12 -05:00