[PR #21165] [MERGED] fix: bundle NLTK punkt_tab in Docker image for airgapped environments #25955

Closed
opened 2026-04-20 06:13:58 -05:00 by GiteaMirror · 0 comments
📋 Pull Request Information

Original PR: https://github.com/open-webui/open-webui/pull/21165
Author: @Classic298
Created: 2/4/2026
Status: Merged
Merged: 2/5/2026
Merged by: @tjbck

Base: dev ← Head: ntlk


📝 Commits (1)

  • 435e283 fix: bundle NLTK punkt_tab in Docker image for airgapped environments

📊 Changes

1 file changed (+2 additions, -0 deletions)

📝 Dockerfile (+2 -0)

📄 Description

Pre-download NLTK punkt_tab during Docker build instead of at runtime. This fixes document extraction failures in offline/airgapped environments where the container cannot download the tokenizer data after restarts.

Fixes #21150
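
For readers who want to reproduce the idea outside this PR, here is a minimal Python sketch of a build-time pre-download step. It is not the actual two-line Dockerfile change, which this page does not show. The helper names `find_punkt_tab` and `ensure_punkt_tab` are hypothetical, and the sketch assumes NLTK unpacks the resource under `tokenizers/punkt_tab` inside its data directory.

```python
from pathlib import Path


def find_punkt_tab(search_dirs):
    """Return the first unpacked punkt_tab directory found, or None.

    Assumes the NLTK layout <data_dir>/tokenizers/punkt_tab.
    """
    for base in search_dirs:
        candidate = Path(base) / "tokenizers" / "punkt_tab"
        if candidate.is_dir():
            return candidate
    return None


def ensure_punkt_tab(download_dir):
    """Fetch punkt_tab into download_dir unless it is already there.

    Hypothetical helper intended to run at image build time, while
    network access is still available.
    """
    found = find_punkt_tab([download_dir])
    if found is not None:
        return found

    # Imported lazily so the presence check above works without NLTK.
    import nltk

    ok = nltk.download("punkt_tab", download_dir=str(download_dir), quiet=True)
    if not ok:
        raise RuntimeError("punkt_tab download failed; is the network reachable?")
    return find_punkt_tab([download_dir])
```

A build step could call `ensure_punkt_tab` with whatever directory NLTK is configured to search (for example one listed in the `NLTK_DATA` environment variable), so that an offline container finds the data already on disk after a restart.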

Contributor License Agreement

By submitting this pull request, I confirm that I have read and fully agree to the Contributor License Agreement (CLA), and I am providing my contributions under its terms.

Note

Deleting the CLA section will lead to immediate closure of your PR and it will not be merged in.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-20 06:13:58 -05:00

Reference: github-starred/open-webui#25955