[PR #22373] [CLOSED] feat: add utilities and deployment scaffolding for external MedGemma model server #49689

Closed
opened 2026-04-30 01:59:35 -05:00 by GiteaMirror · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/open-webui/open-webui/pull/22373
Author: @cadeferg
Created: 3/7/2026
Status: Closed

Base: dev ← Head: medgemma-runpod-setup


📝 Commits (10+)

📊 Changes

14 files changed (+280 additions, -25 deletions)


📝 .env.example (+27 -1)
📝 Dockerfile (+1 -0)
📝 README.md (+7 -0)
📝 backend/open_webui/config.py (+13 -13)
📝 backend/open_webui/main.py (+5 -0)
➕ backend/open_webui/prompts/medical_simplification.txt (+22 -0)
➕ backend/open_webui/utils/document_chunking.py (+21 -0)
➕ backend/open_webui/utils/runpod_idle_shutdown.py (+126 -0)
➕ deploy/runpod-model/.env.example (+4 -0)
➕ deploy/runpod-model/README.md (+11 -0)
➕ deploy/runpod-model/run_vllm.sh (+12 -0)
➕ desktop.ini (+2 -0)
➕ docker-compose.medgemma-dev.yml (+27 -0)
📝 docker-compose.yaml (+2 -11)

📄 Description

Pull Request Checklist

Note to first-time contributors: Please open a discussion post in Discussions to discuss your idea/fix with the community before creating a pull request.

Before submitting, make sure you've checked the following:

  • [x] Target branch: This pull request targets the dev branch.
  • [x] Description: A concise description of the changes is provided below.
  • [x] Changelog: A changelog entry has been included.
  • [ ] Documentation: No user-facing behavior changes yet; documentation updates may follow depending on maintainers' feedback.
  • [x] Dependencies: No new dependencies were introduced.
  • [x] Testing: Manual testing performed to ensure the backend starts and builds successfully with the added utilities.
  • [x] Agentic AI Code: This PR has been reviewed and manually validated before submission.
  • [x] Code review: Self-review performed to ensure adherence to project conventions.
  • [x] Design & Architecture: Changes are isolated utility additions and deployment scaffolding without altering existing runtime behavior.
  • [x] Git Hygiene: This PR is atomic and focuses on deployment preparation utilities.
  • [x] Title Prefix: chore

Description

This PR introduces backend utilities and deployment scaffolding to support running Open WebUI with an external HuggingFace-hosted MedGemma model server (e.g., served via vLLM on infrastructure such as RunPod).

The goal is to prepare the repository for integration with external LLM inference services rather than assuming local Ollama-based models.

These changes are non-breaking and primarily add optional utilities and configuration scaffolding that can be used in external deployments.

Key motivations:

  • enable Open WebUI deployments that rely on external model inference servers
  • prepare infrastructure for MedGemma-based medical document simplification workflows
  • introduce optional idle-shutdown support for GPU infrastructure (e.g., RunPod)
  • provide utilities for document chunking and prompt templating for long medical texts

None of these additions modify existing model providers or affect default Open WebUI behavior.


Changelog Entry

Description

Adds backend utilities and deployment scaffolding to support external model serving (e.g., MedGemma via vLLM) and infrastructure-aware deployments such as GPU spot instances.

Added

  • runpod_idle_shutdown.py utility to support optional automatic shutdown of idle GPU pods
  • document_chunking.py helper for splitting long medical documents into manageable chunks
  • medical document simplification prompt template
  • deployment scaffolding for running a MedGemma model server via vLLM
  • environment variable placeholders for external model server configuration
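
The chunking helper listed above could look like the following sketch; the function name, signature, and defaults are assumptions for illustration, not the PR's actual document_chunking.py implementation. Overlapping character windows ensure context shared at chunk boundaries is not lost:

```python
# Hypothetical sketch of a helper like document_chunking.py (names assumed).
def chunk_document(text: str, chunk_size: int = 2000, overlap: int = 200) -> list[str]:
    """Split text into overlapping fixed-size character chunks."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # advance by the non-overlapping portion
    return chunks
```

A token-aware splitter would be more precise for model context limits, but a character-based window like this is a common, dependency-free starting point.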

Changed

  • Extended .env.example with optional configuration variables for external model servers and GPU deployment environments.
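
For illustration, the optional variables might look like the fragment below. The RUNPOD_* names are hypothetical placeholders; OPENAI_API_BASE_URL and OPENAI_API_KEY are Open WebUI's existing settings for pointing at an OpenAI-compatible server such as vLLM:

```shell
# External model server (OpenAI-compatible endpoint, e.g. vLLM)
OPENAI_API_BASE_URL=https://your-model-server.example.com/v1
OPENAI_API_KEY=sk-placeholder

# Optional idle shutdown for GPU pods (hypothetical variable names)
RUNPOD_IDLE_SHUTDOWN_ENABLED=false
RUNPOD_IDLE_TIMEOUT_MINUTES=15
```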

Deprecated

  • None

Removed

  • None

Fixed

  • None

Security

  • No security-related changes.

Breaking Changes

  • None

Additional Information

These additions are intended for deployments where Open WebUI interacts with an external model inference server (e.g., HuggingFace models served through vLLM).

The utilities are optional and do not affect existing Open WebUI functionality.
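
A minimal run_vllm.sh along these lines could assemble the serve command. The model id, port, and flag choices below are assumptions, and the script echoes the command rather than launching it; a real deployment would `exec` it instead:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of deploy/runpod-model/run_vllm.sh; names and defaults
# are assumptions, not the PR's actual script.
set -euo pipefail

MODEL_ID="${MODEL_ID:-google/medgemma-4b-it}"  # assumed Hugging Face model id
PORT="${PORT:-8000}"
MAX_MODEL_LEN="${MAX_MODEL_LEN:-8192}"

# vLLM exposes an OpenAI-compatible server via `vllm serve`.
CMD=(vllm serve "$MODEL_ID" --host 0.0.0.0 --port "$PORT" --max-model-len "$MAX_MODEL_LEN")

# A real deployment would run: exec "${CMD[@]}"
echo "${CMD[*]}"
```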

Future work may include:

  • official configuration support for external HuggingFace model providers
  • deployment guides for GPU-hosted inference backends

Screenshots or Videos

N/A — backend utilities and deployment scaffolding only.


Contributor License Agreement

By submitting this pull request, I confirm that I have read and fully agree to the Contributor License Agreement (CLA), and I am providing my contributions under its terms.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

📝 Commits (10+)

  • fe6783c Merge pull request #19030 from open-webui/dev
  • fc05e0a Merge pull request #19405 from open-webui/dev
  • e3faec6 Merge pull request #19416 from open-webui/dev
  • 9899293 Merge pull request #19448 from open-webui/dev
  • 140605e Merge pull request #19462 from open-webui/dev
  • 6f1486f Merge pull request #19466 from open-webui/dev
  • d95f533 Merge pull request #19729 from open-webui/dev
  • a727153 0.6.43 (#20093)
  • 6adde20 Merge pull request #20394 from open-webui/dev
  • f9b0534 Merge pull request #20522 from open-webui/dev
GiteaMirror added the pull-request label 2026-04-30 01:59:35 -05:00

Reference: github-starred/open-webui#49689