[GH-ISSUE #23204] issue: Anthropic direct connection - Prompt caching not supported #19918

Closed
opened 2026-04-20 02:28:31 -05:00 by GiteaMirror · 8 comments
Owner

Originally created by @Lyhtande on GitHub (Mar 29, 2026).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/23204

Check Existing Issues

  • I have searched for any existing and/or related issues.
  • I have searched for any existing and/or related discussions.
  • I have also searched in the CLOSED issues AND CLOSED discussions and found no related items (your issue might already be addressed on the development branch!).
  • I am using the latest version of Open WebUI.

Installation Method

Docker

Open WebUI Version

v0.8.12

Ollama Version (if applicable)

No response

Operating System

Ubuntu 22.04.5 LTS

Browser (if applicable)

No response

Confirmation

  • I have read and followed all instructions in README.md.
  • I am using the latest version of both Open WebUI and Ollama.
  • I have included the browser console logs.
  • I have included the Docker container logs.
  • I have provided every relevant configuration, setting, and environment variable used in my setup.
  • I have clearly listed every relevant configuration, custom setting, environment variable, and command-line option that influences my setup (such as Docker Compose overrides, .env values, browser settings, authentication configurations, etc).
  • I have documented step-by-step reproduction instructions that are precise, sequential, and leave nothing to interpretation. My steps:
  • Start with the initial platform/version/OS and dependencies used,
  • Specify exact install/launch/configure commands,
  • List URLs visited, user input (incl. example values/emails/passwords if needed),
  • Describe all options and toggles enabled or changed,
  • Include any files or environmental changes,
  • Identify the expected and actual result at each stage,
  • Ensure any reasonably skilled user can follow and hit the same issue.

Expected Behavior

The native Anthropic integration should support Anthropic's prompt caching mechanism, so that:

  • Repeated context / system prompts are cached
  • API costs are reduced significantly
  • TTS continues to work as expected

Actual Behavior

When using Anthropic Claude models via the direct connection in Open WebUI, prompt caching is not being utilized. This results in significantly higher API costs, especially with large context models like Claude Sonnet and Opus.

Anthropic supports prompt caching via cache_control markers on content blocks in the API request body. However, the current native integration does not implement this feature.
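
For reference, a minimal sketch of where that marker sits in a native Messages API request (the model ID and prompt text below are placeholders):

# Sketch of a native Anthropic Messages API request body with prompt caching.
# cache_control is attached to individual content blocks (here the system
# prompt), not sent as an HTTP header; model ID and text are placeholders.
request_body = {
    "model": "claude-sonnet-4-5",
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": "<large system prompt>",
            "cache_control": {"type": "ephemeral"},
        }
    ],
    "messages": [{"role": "user", "content": "Hello"}],
}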

Steps to Reproduce

  1. Add Anthropic as a native model connection
  2. Start a conversation with a large system prompt
  3. Observe API usage – no cache hits, full token cost every request

Logs & Screenshots

Image: https://github.com/user-attachments/assets/1511c585-9752-431f-a897-f0fbb9c2ae4b

Additional Information

No response

GiteaMirror added the bug label 2026-04-20 02:28:31 -05:00
Author
Owner

@Classic298 commented on GitHub (Mar 29, 2026):

You can do it via an advanced parameter and adding the cache_control header there, or via a filter.

Both are easy options.

And this is not only for direct connections but global ones too.
But yeah, not an issue, you just need to configure it.

Author
Owner

@Lyhtande commented on GitHub (Mar 29, 2026):

> You can do it via an advanced parameter and adding the cache_control header there, or via a filter.
>
> Both are easy options.
>
> And this is not only for direct connections but global ones too. But yeah, not an issue, you just need to configure it.

@Classic298 Thanks for the suggestions! I tried both approaches – custom params and an inlet filter setting cache_control: {"type": "ephemeral"} at the top level. In both cases, Cache Read and Cache Write remain 0 tokens in the Anthropic console. It seems Open WebUI strips unknown keys when transforming the request to Anthropic's format. Do you have a specific implementation in mind that actually works?

What I tried:

Filter:

from pydantic import BaseModel


class Filter:
    class Valves(BaseModel):
        pass

    def __init__(self):
        self.valves = self.Valves()

    def inlet(self, body: dict, __user__: dict = None) -> dict:
        # Add cache_control at the top level of the outgoing request body.
        body["cache_control"] = {"type": "ephemeral"}
        return body

Advanced Param:
cache_control Value: {"type": "ephemeral"}

This results in:

"custom_params": {
      "cache_control": "{\"type\": \"ephemeral\"}"
    }
Author
Owner

@Classic298 commented on GitHub (Mar 29, 2026):

You're right, I read the Anthropic docs again.

The reason this doesn't work (and can't work via filters/advanced params either) is that Open WebUI communicates with Anthropic through their OpenAI-compatible /v1/chat/completions endpoint, and that endpoint doesn't have caching support – even if you add the parameter (and you did add it correctly).

Prompt caching (cache_control) is a feature of Anthropic's native Messages API only; the OpenAI-compatible endpoint doesn't support it.

Open WebUI doesn't have outgoing support for Anthropic's native Messages API format; it only has an inbound /api/v1/messages endpoint for compatibility when using Open WebUI as an LLM proxy. Supporting prompt caching would require adding native Anthropic Messages API support on the outgoing request side, which would be a feature request rather than a bug.

What you CAN do is add a pipe which implements Anthropic as a provider, use that for your models, and enjoy caching through it.

Reference: https://openwebui.com/posts/anthropic_60984ebf
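
For illustration, here is a minimal sketch of such a pipe, assuming the common Open WebUI pipe-function pattern (a Pipe class with pipes()/pipe()); the valve name, model ID, and max_tokens value are placeholders, and the referenced community post has a fuller, maintained implementation:

import requests
from pydantic import BaseModel


class Pipe:
    class Valves(BaseModel):
        # Placeholder valve; set your real key in the function's valves UI.
        ANTHROPIC_API_KEY: str = ""

    def __init__(self):
        self.valves = self.Valves()

    def pipes(self):
        # Example model ID exposed to Open WebUI; adjust to the model you use.
        return [{"id": "claude-sonnet-4-5", "name": "Claude Sonnet 4.5 (cached)"}]

    def pipe(self, body: dict) -> str:
        messages = body.get("messages", [])
        # Move the system prompt into Anthropic's native "system" blocks and
        # mark it with cache_control so repeated requests hit the prompt cache.
        system_blocks = [
            {
                "type": "text",
                "text": m["content"],
                "cache_control": {"type": "ephemeral"},
            }
            for m in messages
            if m["role"] == "system"
        ]
        chat = [m for m in messages if m["role"] != "system"]

        payload = {
            "model": "claude-sonnet-4-5",
            "max_tokens": 1024,
            "messages": chat,
        }
        if system_blocks:
            payload["system"] = system_blocks

        resp = requests.post(
            "https://api.anthropic.com/v1/messages",
            headers={
                "x-api-key": self.valves.ANTHROPIC_API_KEY,
                "anthropic-version": "2023-06-01",
                "content-type": "application/json",
            },
            json=payload,
            timeout=120,
        )
        resp.raise_for_status()
        return resp.json()["content"][0]["text"]

This sketch is non-streaming and hardcodes the model ID for brevity; a real pipe would also handle streaming responses and multimodal message content.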

Author
Owner

@Lyhtande commented on GitHub (Mar 29, 2026):

@Classic298 Thanks for the detailed explanation! I'll go with a custom pipe for now. Could you please convert this to a feature request? Native caching support via the Anthropic Messages API would be a great addition.

Author
Owner

@Classic298 commented on GitHub (Mar 29, 2026):

@Lyhtande there have been dozens of feature requests (many of them duplicates) in the past about native Anthropic Messages API support. The answer was and is: no. No native support for the Messages API will be added, as it doesn't fit Open WebUI's stance: providers should support universal or de-facto universal API standards rather than invent their own (e.g. Anthropic's Messages, Google's "interactions", and many other examples).

And in case it wasn't clear: there is no Anthropic Messages API support, and therefore no prompt caching, because Anthropic only supports it via that API and not via chat completions.

Author
Owner

@Classic298 commented on GitHub (Mar 29, 2026):

https://docs.openwebui.com/faq#q-why-doesnt-open-webui-natively-support-provider-xs-proprietary-api

Author
Owner

@Lyhtande commented on GitHub (Mar 29, 2026):

@Classic298 understood, and thanks for the clarification. To be fair though – in your earlier comment you yourself described this as 'a feature request rather than a bug', which is why I asked for the label change. Anyway, good to know this is a deliberate design decision. I'll stick with my custom pipe.

Author
Owner

@Classic298 commented on GitHub (Mar 29, 2026):

Yeah, maybe my wording wasn't clear.

I meant to say it WOULD be a feature request and not a bug.
That's it, haha.
I didn't mean to imply you should ask for Messages API support – but that's on me. Reading my sentence again, one can indeed take it as an implication to request Messages API support.

Reference: github-starred/open-webui#19918