[GH-ISSUE #21150] issue: 'punkt_tab' not bundled with OWUI Container, results in Offline Mode / Airgapped system issues #58066

Closed
opened 2026-05-05 22:15:58 -05:00 by GiteaMirror · 7 comments

Originally created by @TomTheWise on GitHub (Feb 4, 2026).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/21150

Check Existing Issues

  • I have searched for any existing and/or related issues.
  • I have searched for any existing and/or related discussions.
  • I have also searched in the CLOSED issues AND CLOSED discussions and found no related items (your issue might already be addressed on the development branch!).
  • I am using the latest version of Open WebUI.

Installation Method

Docker

Open WebUI Version

Tested with both 0.7.2 and 0.6.42; same behaviour in each.

Ollama Version (if applicable)

No response

Operating System

Debian 12 (Podman 4.x) and Debian 13 (Podman 5.x)

Browser (if applicable)

No response

Confirmation

  • I have read and followed all instructions in README.md.
  • I am using the latest version of both Open WebUI and Ollama.
  • I have included the browser console logs.
  • I have included the Docker container logs.
  • I have provided every relevant configuration, setting, and environment variable used in my setup.
  • I have clearly listed every relevant configuration, custom setting, environment variable, and command-line option that influences my setup (such as Docker Compose overrides, .env values, browser settings, authentication configurations, etc).
  • I have documented step-by-step reproduction instructions that are precise, sequential, and leave nothing to interpretation. My steps:
  • Start with the initial platform/version/OS and dependencies used,
  • Specify exact install/launch/configure commands,
  • List URLs visited, user input (incl. example values/emails/passwords if needed),
  • Describe all options and toggles enabled or changed,
  • Include any files or environmental changes,
  • Identify the expected and actual result at each stage,
  • Ensure any reasonably skilled user can follow and hit the same issue.

Expected Behavior

In air-gapped mode, everything needed should be installed and included in the image, so that the container can be restarted without issues after all connections to the internet are closed.
Uploading files such as xlsx, docx, and pdf in air-gapped mode should work exactly the same as when online.

Actual Behavior

Some, but not all, xlsx and pdf files fail to upload because 'punkt_tab' is missing. A yellow error message stating that NLTK can't find punkt_tab is shown.

[Screenshot]

OWUI apparently uses NLTK to download 'punkt_tab' when the container starts; it is not bundled with the OWUI Docker image.
After a restart it is gone again, because it is only downloaded into a temporary directory.

So on air-gapped systems, restarting OWUI severely impairs file uploads. Strangely, not all xlsx, pdf, or docx files are affected: some upload without issues and can be used.
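NLTK resolves data files by searching a list of directories, including any listed in the `NLTK_DATA` environment variable, and `punkt_tab` lives under `tokenizers/punkt_tab` inside such a directory. A minimal, stdlib-only sketch of a startup check that reports whether the tokenizer is already present (so an air-gapped restart can fail loudly instead of uploads failing later) might look like this. It assumes that layout and only checks `NLTK_DATA` plus `~/nltk_data`, which is a subset of NLTK's real search path; the function names are hypothetical and not part of Open WebUI.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional


def nltk_search_paths(env: Optional[Mapping[str, str]] = None) -> List[Path]:
    """Return the directories from NLTK_DATA (os.pathsep-separated) plus ~/nltk_data.

    This is only a subset of the locations NLTK itself searches, but it covers
    the common case where NLTK_DATA is set explicitly for a container.
    """
    env = os.environ if env is None else env
    paths = [Path(p) for p in env.get("NLTK_DATA", "").split(os.pathsep) if p]
    paths.append(Path.home() / "nltk_data")
    return paths


def find_nltk_resource(resource: str, paths: List[Path]) -> Optional[Path]:
    """Return the first unpacked directory or .zip archive holding `resource`, else None."""
    for base in paths:
        unpacked = base / resource
        if unpacked.is_dir():
            return unpacked
        zipped = base / (resource + ".zip")
        if zipped.is_file():
            return zipped
    return None


def check_punkt_tab(env: Optional[Mapping[str, str]] = None) -> bool:
    """True if tokenizers/punkt_tab can be found locally, without any download."""
    return find_nltk_resource("tokenizers/punkt_tab", nltk_search_paths(env)) is not None
```

Where NLTK itself is importable, `nltk.data.find("tokenizers/punkt_tab")` performs the authoritative lookup and raises `LookupError` when the resource is missing.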

Steps to Reproduce

  1. Install Open WebUI as a container (we use Podman, but the behaviour should be exactly the same on Docker).
  2. Air-gap the system with a firewall. Optionally (this is not necessary to reproduce the bug), enable OWUI offline mode.
  3. Restart the container or the whole system.
  4. Upload a variety of PDF, docx, xlsx, and other files to Knowledge. Some will upload successfully; others will fail because punkt_tab is missing. Strangely, not ALL files are affected. It happens more often with xlsx files, but not with all of them.

Logs & Screenshots

[Screenshot]

errorLog_punkt_tab_missing.txt

Additional Information

The air-gapped setup is probably the explanation for the identical issue reported in https://github.com/open-webui/open-webui/discussions/17694
That issue could not be reproduced (I guess the information about the air gap was missing), so it was turned into a discussion.
Since both current and older versions are affected, this is most likely the same issue as the one back then.

The xlsx file that Tim provided and used to reproduce the issue works even when NLTK has not downloaded 'punkt_tab'.
Sadly, I'm not able to provide our internal example xlsx files.

GiteaMirror added the bug label 2026-05-05 22:15:58 -05:00

@owui-terminator[bot] commented on GitHub (Feb 4, 2026):

🔍 Similar Issues Found

I found some existing issues that might be related to this one. Please check if any of these are duplicates or contain helpful solutions:

  1. #21034 issue: arena doesnt work properly
    by thekinez • Jan 29, 2026 • bug

  2. #20327 issue: Unable to use any Open WebUI version newer than 0.6.25 due to hybrid search performance
    by galvanoid • Jan 02, 2026 • bug

  3. #20994 issue: ollama version in open-webui:ollama docker container is stuck at 0.13.5
    by wyattearp • Jan 28, 2026 • bug

  4. #19987 issue: There is a lack of visual consistency between the home page and the chat interface.
    by i-iooi-i • Dec 16, 2025 • bug

  5. #19438 issue: Icon loading regression
    by JoelShepard • Nov 24, 2025 • bug


💡 Tips:

  • If this is a duplicate, please consider closing this issue and adding any additional details to the existing one
  • If you found a solution in any of these issues, please share it here to help others

This comment was generated automatically by a bot. Please react with a 👍 if this comment was helpful, or a 👎 if it was not.

@TomTheWise commented on GitHub (Feb 4, 2026):

I can also add that this issue happens regardless of the text splitting method: it occurs with all three available settings (Chars, Tokens, and Markdown headings).

This is something I don't really understand, as some files upload and get split without issues and apparently don't require punkt_tab.

@Classic298 commented on GitHub (Feb 4, 2026):

@TomTheWise did you try other document extraction engines, as I recommended?

The default built-in one is not ideal for many documents, especially xlsx.

It's just a very minimal document extraction engine that "just works" for beginners who download Open WebUI, want to try it out, and see it working initially.

For professional use cases you should use a different extraction engine.
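For anyone following this suggestion with Apache Tika: Open WebUI can be pointed at a Tika server in its document settings (to our understanding also via the `CONTENT_EXTRACTION_ENGINE=tika` and `TIKA_SERVER_URL` environment variables; please verify against the Open WebUI docs for your version). Tika Server answers `GET /tika` with a short greeting, so a minimal stdlib sketch of a reachability check, run inside the air-gapped network before switching engines, could look like the following. The function name and the default timeout are illustrative assumptions.

```python
import urllib.error
import urllib.request


def tika_is_reachable(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if a Tika server at base_url answers GET /tika with HTTP 200."""
    url = base_url.rstrip("/") + "/tika"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, or an HTTP error status (>= 400).
        return False
```

For example, `tika_is_reachable("http://localhost:9998")` (9998 being Tika Server's default port) returns `False` instead of raising when the server is down, so it can gate a startup script.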

@TomTheWise commented on GitHub (Feb 4, 2026):

@Classic298 not yet. Honestly, I never looked into that, as the default has been surprisingly successful until now. But in the next few days I will look into which option is best for us.

@TomTheWise commented on GitHub (Feb 4, 2026):

@Classic298 got Tika running: super fast and super easy when Podman is already set up for OWUI!
This solves the issue and is a much better approach. Thank you very much!

However, bundling whatever is currently downloaded via NLTK into the container image would probably still be a good fix, especially for air-gapped environments (I'm worried it might even include more important resources).
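The bundling idea could be sketched as a build-time step that downloads the required NLTK resources into a fixed directory inside the image, with `NLTK_DATA` pointing at that directory at runtime so nothing has to be fetched after a restart. This is a hypothetical helper, not the change made in the linked PR; the directory and function name are assumptions. It calls `nltk.download(resource, download_dir=..., quiet=True)`, which returns `True` on success, and the downloader is injectable so the logic can be exercised without network access.

```python
from pathlib import Path
from typing import Callable, Iterable, List, Optional

# Hypothetical path baked into the image (deliberately outside any mounted data
# volume, which would hide it); set NLTK_DATA to the same value at runtime.
BUNDLED_NLTK_DIR = "/opt/nltk_data"


def bundle_nltk_resources(
    resources: Iterable[str],
    target_dir: str = BUNDLED_NLTK_DIR,
    downloader: Optional[Callable[..., bool]] = None,
) -> List[str]:
    """Download each NLTK resource into target_dir and return the ones that failed."""
    if downloader is None:
        import nltk  # imported lazily: only needed when actually fetching at build time
        downloader = nltk.download
    Path(target_dir).mkdir(parents=True, exist_ok=True)
    failed = []
    for resource in resources:
        # nltk.download returns True on success and False on failure.
        if not downloader(resource, download_dir=target_dir, quiet=True):
            failed.append(resource)
    return failed
```

During an image build this might run as `bundle_nltk_resources(["punkt_tab"])`, failing the build if the returned list is non-empty, so a broken download surfaces at build time rather than on an air-gapped restart.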

@Classic298 commented on GitHub (Feb 4, 2026):

I'll take a look. If it is easily fixable, we will fix it; if not, it's probably not worth fixing.

Glad you enjoy Tika. Tika can be very powerful if configured correctly.

@Classic298 commented on GitHub (Feb 4, 2026):

https://github.com/open-webui/open-webui/pull/21165

Reference: github-starred/open-webui#58066