[PR #7237] [CLOSED] feat: add cache_prompt option for llama-server completions #8828

opened 2025-11-11 18:07:03 -06:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/open-webui/open-webui/pull/7237
Author: @drunnells
Created: 11/22/2024
Status: Closed

Base: dev ← Head: main


📝 Commits (1)

  • 75d13a0 feat: add cache_prompt option for llama-server completions

📊 Changes

3 files changed (+46 additions, -0 deletions)

View changed files

📝 backend/open_webui/apps/openai/main.py (+4 -0)
📝 backend/open_webui/utils/payload.py (+1 -0)
📝 src/lib/components/chat/Settings/Advanced/AdvancedParams.svelte (+41 -0)

📄 Description

Pull Request Checklist

Note to first-time contributors: Please open a discussion post in Discussions and describe your changes before submitting a pull request.

Before submitting, make sure you've checked the following:

  • Target branch: Please verify that the pull request targets the dev branch.
  • Description: Provide a concise description of the changes made in this pull request.
  • Changelog: Ensure a changelog entry following the format of Keep a Changelog is added at the bottom of the PR description.
  • Documentation: Have you updated the relevant documentation (Open WebUI Docs) or other documentation sources?
  • Dependencies: Are there any new dependencies? Have you updated the dependency versions in the documentation?
  • Testing: Have you written and run sufficient tests for validating the changes?
  • Code review: Have you performed a self-review of your code, addressing any coding standard issues and ensuring adherence to the project's coding standards?
  • Prefix: To clearly categorize this pull request, prefix the pull request title with one of the following:
    • BREAKING CHANGE: Significant changes that may affect compatibility
    • build: Changes that affect the build system or external dependencies
    • ci: Changes to our continuous integration processes or workflows
    • chore: Refactor, cleanup, or other non-functional code changes
    • docs: Documentation update or addition
    • feat: Introduces a new feature or enhancement to the codebase
    • fix: Bug fix or error correction
    • i18n: Internationalization or localization changes
    • perf: Performance improvement
    • refactor: Code restructuring for better maintainability, readability, or scalability
    • style: Changes that do not affect the meaning of the code (white-space, formatting, missing semi-colons, etc.)
    • test: Adding missing tests or correcting existing tests
    • WIP: Work in progress, a temporary label for incomplete or ongoing work

Changelog Entry

Description

  • Adds the cache_prompt option for llama-server completions, enabling users to reuse the KV cache when supported, improving completion speed in long conversations when using llama.cpp server.

Added

  • Backend: Support for the cache_prompt parameter in the payload for llama-server completions.
  • Frontend: Toggle to enable or disable cache_prompt in Advanced Settings.
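As a rough sketch of how such a flag might be forwarded on the backend (the actual change lives in backend/open_webui/utils/payload.py; the function name below is hypothetical and for illustration only):

```python
def apply_cache_prompt(form_data: dict, payload: dict) -> dict:
    """Copy the optional cache_prompt flag from the user's advanced
    settings into the outgoing completion payload, if present.

    Illustrative sketch, not the actual Open WebUI implementation.
    """
    if form_data.get("cache_prompt") is not None:
        payload["cache_prompt"] = bool(form_data["cache_prompt"])
    return payload


# The flag is simply passed through to the downstream server; when the
# toggle is unset, the payload is left untouched.
payload = apply_cache_prompt({"cache_prompt": True}, {"prompt": "Hello"})
print(payload)  # {'prompt': 'Hello', 'cache_prompt': True}
```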

Changed

Deprecated

Removed

Fixed

Security

Breaking Changes


Additional Information

  • Started discussion thread regarding this change: https://github.com/open-webui/open-webui/discussions/7137
  • More information about the cache_prompt parameter is in the llama.cpp server README: https://github.com/ggerganov/llama.cpp/blob/master/examples/server/README.md#post-completion-given-a-prompt-it-returns-the-predicted-completion
  • Tested locally for cases where cache_prompt is enabled and disabled.
  • Screenshots attached for frontend changes.
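For reference, a completion request with prompt caching enabled might look like the following. The POST /completion endpoint and its cache_prompt field are documented in the llama.cpp server README linked above; the host and prompt here are illustrative.

```python
import json

# Example request body for llama-server's POST /completion endpoint.
# With cache_prompt enabled, the server can reuse its KV cache across
# requests that share a common prompt prefix, which speeds up long,
# multi-turn conversations.
body = {
    "prompt": "You are a helpful assistant.\nUser: Hello!\nAssistant:",
    "n_predict": 128,
    "cache_prompt": True,  # the option this PR exposes in Advanced Settings
}

print(json.dumps(body, indent=2))

# To actually send it you would need a running llama-server, e.g.:
#   urllib.request.urlopen("http://localhost:8080/completion",
#                          data=json.dumps(body).encode())
```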

Screenshots or Videos

Screenshot 2024-11-20 at 9 44 39 PM: https://github.com/user-attachments/assets/a27924f2-a639-471b-a153-b37b7332b284


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2025-11-11 18:07:03 -06:00

Reference: github-starred/open-webui#8828