bug: upload md file, but got error message 'Resource punkt_tab not found.' #1792

Closed
opened 2025-11-11 14:53:22 -06:00 by GiteaMirror · 1 comment
Owner

Originally created by @qaz-t on GitHub (Aug 16, 2024).

Bug Report

Installation Method

docker

Environment

  • Open WebUI Version: v0.3.13

  • Operating System: Ubuntu 22.04

  • Browser (if applicable): Chrome 127

Confirmation:

  • [x] I have read and followed all the instructions provided in the README.md.
  • [x] I am on the latest version of both Open WebUI and Ollama.
  • [ ] I have included the browser console logs.
  • [x] I have included the Docker container logs.
  • [x] I have provided the exact steps to reproduce the bug in the "Steps to Reproduce" section below.

Expected Behavior:

The file should upload successfully.

Actual Behavior:

I got the error message below:

Something went wrong :/

**********************************************************************
  Resource punkt_tab not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('punkt_tab')

  For more information see: https://www.nltk.org/data.html

  Attempted to load tokenizers/punkt_tab/english/

  Searched in:
    - '/root/nltk_data'
    - '/usr/local/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/local/lib/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
**********************************************************************

Description

Bug Summary:
Uploading a Markdown (.md) file fails with 'Resource punkt_tab not found.'

Reproduction Details

Steps to Reproduce:
Upload a .md file.

Logs and Screenshots

Browser Console Logs:
[Include relevant browser console logs, if applicable]

Docker Container Logs:

Traceback (most recent call last):
  File "/app/backend/apps/rag/main.py", line 1248, in process_doc
    data = loader.load()
           ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/langchain_core/document_loaders/base.py", line 30, in load
    return list(self.lazy_load())
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/langchain_community/document_loaders/unstructured.py", line 89, in lazy_load
    elements = self._get_elements()
               ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/langchain_community/document_loaders/markdown.py", line 45, in _get_elements
    return partition_md(filename=self.file_path, **self.unstructured_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/unstructured/documents/elements.py", line 593, in wrapper
    elements = func(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/unstructured/file_utils/filetype.py", line 429, in wrapper
    elements = func(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/unstructured/file_utils/filetype.py", line 385, in wrapper
    elements = func(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/unstructured/chunking/dispatch.py", line 74, in wrapper
    elements = func(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/unstructured/partition/md.py", line 111, in partition_md
    return partition_html(
           ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/unstructured/documents/elements.py", line 593, in wrapper
    elements = func(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/unstructured/file_utils/filetype.py", line 429, in wrapper
    elements = func(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/unstructured/file_utils/filetype.py", line 385, in wrapper
    elements = func(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/unstructured/chunking/dispatch.py", line 74, in wrapper
    elements = func(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/unstructured/partition/html/partition.py", line 103, in partition_html
    elements = list(
               ^^^^^
  File "/usr/local/lib/python3.11/site-packages/unstructured/partition/lang.py", line 475, in apply_lang_metadata
    elements = list(elements)
               ^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/unstructured/partition/html/partition.py", line 222, in iter_elements
    yield from cls(opts)._iter_elements()
  File "/usr/local/lib/python3.11/site-packages/unstructured/partition/html/partition.py", line 229, in _iter_elements
    for e in self._main.iter_elements():
  File "/usr/local/lib/python3.11/site-packages/unstructured/partition/html/parser.py", line 359, in iter_elements
    yield from block_item.iter_elements()
  File "/usr/local/lib/python3.11/site-packages/unstructured/partition/html/parser.py", line 354, in iter_elements
    yield from self._element_from_text_or_tail(self.text or "", q, self._ElementCls)
  File "/usr/local/lib/python3.11/site-packages/unstructured/partition/html/parser.py", line 384, in _element_from_text_or_tail
    yield from element_accum.flush(ElementCls)
  File "/usr/local/lib/python3.11/site-packages/unstructured/partition/html/parser.py", line 252, in flush
    ElementCls = derive_element_type_from_text(normalized_text)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/unstructured/partition/html/parser.py", line 884, in derive_element_type_from_text
    if is_possible_narrative_text(text):
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/unstructured/partition/text_type.py", line 80, in is_possible_narrative_text
    if exceeds_cap_ratio(text, threshold=cap_threshold):
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/unstructured/partition/text_type.py", line 276, in exceeds_cap_ratio
    if sentence_count(text, 3) > 1:
       ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/unstructured/partition/text_type.py", line 225, in sentence_count
    sentences = sent_tokenize(text)
                ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/unstructured/nlp/tokenize.py", line 137, in sent_tokenize
    return _sent_tokenize(text)
           ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/nltk/tokenize/__init__.py", line 106, in sent_tokenize
    tokenizer = PunktTokenizer(language)
                ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/nltk/tokenize/punkt.py", line 1744, in __init__
    self.load_lang(lang)
  File "/usr/local/lib/python3.11/site-packages/nltk/tokenize/punkt.py", line 1749, in load_lang
    lang_dir = find(f"tokenizers/punkt_tab/{lang}/")
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/nltk/data.py", line 582, in find
    raise LookupError(resource_not_found)
LookupError:
**********************************************************************
  Resource punkt_tab not found.
  Please use the NLTK Downloader to obtain the resource:
  >>> import nltk
  >>> nltk.download('punkt_tab')
  
  For more information see: https://www.nltk.org/data.html
  Attempted to load tokenizers/punkt_tab/english/
  Searched in:
    - '/root/nltk_data'
    - '/usr/local/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/local/lib/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'

Screenshots/Screen Recordings (if applicable):
[Attach any relevant screenshots to help illustrate the issue]

Additional Information

Possibly related to:
https://github.com/nltk/nltk/issues/3293
https://github.com/Unstructured-IO/unstructured/issues/3511
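
Until a fix ships, one possible workaround (a sketch, not the project's official fix) is to pre-fetch the missing tokenizer data in a derived image. The base image tag below is an assumption; `/usr/local/nltk_data` is chosen because it is one of the directories NLTK searches, per the log above:

```dockerfile
# Hypothetical derived image that pre-downloads the punkt_tab data
# so NLTK's sent_tokenize can find it at runtime.
FROM ghcr.io/open-webui/open-webui:main
RUN python -c "import nltk; nltk.download('punkt_tab', download_dir='/usr/local/nltk_data')"
```

Alternatively, running `docker exec <container> python -c "import nltk; nltk.download('punkt_tab')"` in the live container downloads the data into `/root/nltk_data` (also on the search path), but that copy does not survive container recreation.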

Note

If the bug report is incomplete or does not follow the provided instructions, it may not be addressed. Please ensure that you have followed the steps outlined in the README.md and troubleshooting.md documents, and provide all necessary information for us to reproduce and address the issue. Thank you!


@tjbck commented on GitHub (Aug 17, 2024):

Fixed on dev!

Reference: github-starred/open-webui#1792