mirror of
https://github.com/open-webui/open-webui.git
synced 2026-05-08 04:16:03 -05:00
[GH-ISSUE #21150] issue: 'punkt_tab' not bundled with OWUI Container, results in Offline Mode / Airgapped system issues #34929
Originally created by @TomTheWise on GitHub (Feb 4, 2026).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/21150
Check Existing Issues
Installation Method
Docker
Open WebUI Version
Tested with both 0.7.2 and 0.6.42; same behaviour.
Ollama Version (if applicable)
No response
Operating System
Debian 12 (Podman 4.X) and Debian 13 (Podman 5.X)
Browser (if applicable)
No response
Confirmation
Expected Behavior
In air-gapped mode everything should be installed and included in the image, so that the container can be restarted without issues after all connections to the internet are closed.
Uploading files such as xlsx, docx, PDFs and so on should still work exactly the same in air-gapped mode.
Actual Behavior
Some, but not all, xlsx and PDF files fail to upload due to the missing 'punkt_tab'. A yellow error message is shown stating that NLTK can't find punkt_tab.
NLTK is apparently used by OWUI to download 'punkt_tab' when the container starts. It is not bundled with the OWUI Docker image.
After a restart it is gone again, as it is only downloaded into a temporary directory.
So on air-gapped systems, restarting OWUI results in heavily impaired file uploads. Strangely, not all xlsx, PDF or docx files are affected: some upload without issues and can be used.
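To check whether the resource is actually present before cutting the network, something like the following might help. It is only a sketch: it uses the standard library alone, assumes the usual NLTK layout of `tokenizers/punkt_tab` under a data directory, and looks only in the `NLTK_DATA` environment variable and `~/nltk_data`, which is a subset of the locations NLTK itself searches. The function names are made up for illustration, and where OWUI actually stores the download inside the container is not confirmed here.

```python
import os
from pathlib import Path
from typing import Iterable, Optional


def candidate_nltk_dirs(env: Optional[dict] = None) -> list[Path]:
    """Directories to search: NLTK_DATA entries first, then ~/nltk_data.

    Assumption: this is only a subset of the paths NLTK really searches.
    """
    env = os.environ if env is None else env
    dirs = [Path(p) for p in env.get("NLTK_DATA", "").split(os.pathsep) if p]
    dirs.append(Path.home() / "nltk_data")
    return dirs


def find_punkt_tab(dirs: Iterable[Path]) -> Optional[Path]:
    """Return the first existing tokenizers/punkt_tab directory, else None."""
    for base in dirs:
        target = Path(base) / "tokenizers" / "punkt_tab"
        if target.is_dir():
            return target
    return None


if __name__ == "__main__":
    found = find_punkt_tab(candidate_nltk_dirs())
    print(f"punkt_tab found at {found}" if found else "punkt_tab missing")
```

Running this inside the container once while online and once after an offline restart should show whether the data survives the restart.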
Steps to Reproduce
Logs & Screenshots
errorLog_punkt_tab_missing.txt
Additional Information
Air-gapped mode is probably the explanation for this exact same issue: https://github.com/open-webui/open-webui/discussions/17694
However, as that issue could not be reproduced (I guess the information about the air-gapped setup was missing), it was turned into a discussion.
As both current and older versions are affected, it is likely that this is exactly the same issue as back then.
The xlsx file that Tim provided and used when trying to reproduce it works even without 'punkt_tab' having been downloaded by NLTK.
Sadly, I'm not able to provide internal example xlsx files.
@owui-terminator[bot] commented on GitHub (Feb 4, 2026):
🔍 Similar Issues Found
I found some existing issues that might be related to this one. Please check if any of these are duplicates or contain helpful solutions:
#21034 issue: arena doesnt work properly
by thekinez • Jan 29, 2026 • bug
#20327 issue: Unable to use any Open WebUI version newer than 0.6.25 due to hybrid search performance
by galvanoid • Jan 02, 2026 • bug
#20994 issue: ollama version in open-webui:ollama docker container is stuck at 0.13.5
by wyattearp • Jan 28, 2026 • bug
#19987 issue: There is a lack of visual consistency between the home page and the chat interface.
by i-iooi-i • Dec 16, 2025 • bug
#19438 issue: Icon loading regression
by JoelShepard • Nov 24, 2025 • bug
This comment was generated automatically by a bot. Please react with a 👍 if this comment was helpful, or a 👎 if it was not.
@TomTheWise commented on GitHub (Feb 4, 2026):
I can also add that this issue happens no matter which text splitting method is set. It happens with all 3 available settings: Chars, Tokens or Markdown headings.
This is something I don't really understand, as some files upload and get split without issue, and apparently don't require punkt_tab.
@Classic298 commented on GitHub (Feb 4, 2026):
@TomTheWise did you try other document extraction engines, as I recommended?
The default built-in one is not ideal for many documents, especially xlsx.
It's just a very minimal doc extraction engine that "just works" for beginners who download Open WebUI and want to see it working initially.
For professional use cases you should use a different extraction engine.
@TomTheWise commented on GitHub (Feb 4, 2026):
@Classic298 not yet. Honestly, I never looked into that, as the default was surprisingly successful until now. But over the next few days I will look into which option is best for us.
@TomTheWise commented on GitHub (Feb 4, 2026):
@Classic298 got Tika running super fast; it is super easy when Podman is already set up for OWUI!
This solves the issue and is a much better approach. Thank you very much!
However, bundling the data that is currently loaded via NLTK into the container image would probably still be a very good fix, especially for air-gapped environments. (I'm worried that NLTK may be downloading other, even more important, data as well.)
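As a rough illustration of what "bundling" could look like, the sketch below pre-fetches punkt_tab into a fixed directory while the network is still available, for example at image build time. This is an assumption about one possible approach, not what the project does: `prefetch_punkt_tab` and the target path are hypothetical names, and the only real library call is `nltk.download(..., download_dir=...)`, which is imported lazily so the snippet loads without NLTK installed.

```python
from pathlib import Path
from typing import Callable, Optional


def prefetch_punkt_tab(target_dir: str,
                       downloader: Optional[Callable[..., bool]] = None) -> Path:
    """Download punkt_tab into target_dir while network access still exists.

    `downloader` defaults to nltk.download; it is injectable so the function
    can be exercised without NLTK or a network connection.
    """
    if downloader is None:
        import nltk  # lazy import: only needed for a real download
        downloader = nltk.download
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    # nltk.download returns True on success and False on failure
    ok = downloader("punkt_tab", download_dir=str(target))
    if not ok:
        raise RuntimeError("punkt_tab download failed")
    return target


if __name__ == "__main__":
    # Hypothetical persistent location; adjust to the actual deployment.
    print(prefetch_punkt_tab("/opt/nltk_data"))
```

For NLTK to find the data later, the container would then need `NLTK_DATA` set to that directory (or the directory mounted at one of NLTK's default search paths).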
@Classic298 commented on GitHub (Feb 4, 2026):
I'll take a look. If it is easily fixable, we will fix it; if not, it's probably not worth fixing.
Glad you enjoy Tika. Tika can be very powerful if configured correctly.
@Classic298 commented on GitHub (Feb 4, 2026):
https://github.com/open-webui/open-webui/pull/21165