[PR #9328] [MERGED] fix: increment unstructured package version to fix UnstructuredExcelLoader #22359

Closed
opened 2026-04-20 04:05:40 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/open-webui/open-webui/pull/9328
Author: @kahghi
Created: 2/4/2025
Status: Merged
Merged: 2/4/2025
Merged by: @tjbck

Base: dev ← Head: fix-unstructured-pkg-version


📝 Commits (1)

  • 1868cf0 increment unstructured package version

📊 Changes

3 files changed (+160 additions, -163 deletions)

📝 backend/requirements.txt (+1 -1)
📝 pyproject.toml (+1 -1)
📝 uv.lock (+158 -161)

📄 Description

Pull Request Checklist

Note to first-time contributors: Please open a discussion post in Discussions and describe your changes before submitting a pull request.

Before submitting, make sure you've checked the following:

  • [x] Target branch: Please verify that the pull request targets the dev branch.
  • [x] Description: Provide a concise description of the changes made in this pull request.
  • [x] Changelog: Ensure a changelog entry following the format of Keep a Changelog is added at the bottom of the PR description.
  • [ ] Documentation: Have you updated the relevant documentation (Open WebUI Docs or other documentation sources)?
  • [ ] Dependencies: Are there any new dependencies? Have you updated the dependency versions in the documentation?
  • [ ] Testing: Have you written and run sufficient tests for validating the changes?
  • [ ] Code review: Have you performed a self-review of your code, addressing any coding standard issues and ensuring adherence to the project's coding standards?
  • [x] Prefix: To clearly categorize this pull request, prefix the pull request title with one of the following:
    • BREAKING CHANGE: Significant changes that may affect compatibility
    • build: Changes that affect the build system or external dependencies
    • ci: Changes to our continuous integration processes or workflows
    • chore: Refactor, cleanup, or other non-functional code changes
    • docs: Documentation update or addition
    • feat: Introduces a new feature or enhancement to the codebase
    • fix: Bug fix or error correction
    • i18n: Internationalization or localization changes
    • perf: Performance improvement
    • refactor: Code restructuring for better maintainability, readability, or scalability
    • style: Changes that do not affect the meaning of the code (white-space, formatting, missing semi-colons, etc.)
    • test: Adding missing tests or correcting existing tests
    • WIP: Work in progress, a temporary label for incomplete or ongoing work

Changelog Entry

Description

  • TL;DR: unstructured 0.15.9 downloads its NLTK data from an AWS S3 bucket that returns a permission-denied error. Bumping the package version fixes the issue.
    Error: [screenshots]

  • Details: when an .xls file is uploaded, UnstructuredExcelLoader fails inside partition_xlsx. The failing call chain is partition_xlsx → is_possible_narrative_text → exceeds_cap_ratio → sentence_count → sent_tokenize → _download_nltk_packages_if_not_present → download_nltk_packages. The last function uses urllib to fetch the NLTK data from the AWS S3 bucket.

The updated version downloads the NLTK data through NLTK itself instead: [screenshot]
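The newer release appears to follow a check-then-download pattern. It checks for each NLTK resource and fetches any missing one with NLTK's own downloader. A minimal sketch of that pattern follows; it is not unstructured's code.

Assumptions:
- The helper names `missing_resources` and `ensure_nltk_resources` are hypothetical.
- The resource names `averaged_perceptron_tagger_eng` and `punkt_tab` are a guess at what recent unstructured releases look for.

Running something like this once at image build time might also work around the problem where the package cannot be upgraded. Whether it does depends on which resources 0.15.9's presence check actually looks for.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# Assumption: the NLTK resources recent unstructured releases check for.
# Verify these names against the unstructured version you actually run.
REQUIRED_RESOURCES: List[Tuple[str, str]] = [
    ("taggers", "averaged_perceptron_tagger_eng"),
    ("tokenizers", "punkt_tab"),
]


def missing_resources(
    find: Callable[[str], object],
    resources: Iterable[Tuple[str, str]] = REQUIRED_RESOURCES,
) -> List[str]:
    """Return the names of the resources that `find` cannot locate."""
    missing = []
    for category, name in resources:
        try:
            # nltk.data.find raises LookupError when a resource is absent.
            find(f"{category}/{name}")
        except LookupError:
            missing.append(name)
    return missing


def ensure_nltk_resources(
    find: Optional[Callable[[str], object]] = None,
    download: Optional[Callable[..., object]] = None,
    resources: Iterable[Tuple[str, str]] = REQUIRED_RESOURCES,
) -> List[str]:
    """Fetch missing resources with NLTK's own downloader (no raw S3 URL).

    Returns the names that were missing before the download attempt.
    """
    if find is None or download is None:
        import nltk  # imported lazily so test doubles can be injected

        find = find or nltk.data.find
        download = download or nltk.download
    missing = missing_resources(find, resources)
    for name in missing:
        download(name, quiet=True)
    return missing


if __name__ == "__main__":
    print("fetched:", ensure_nltk_resources() or "nothing, all present")
```

The `find` and `download` parameters exist only so the logic can be exercised without network access. By default the sketch calls `nltk.data.find` and `nltk.download`.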

Fixed

  • UnstructuredExcelLoader failing on .xls files because unstructured 0.15.9 could not download its NLTK data. Bumping the package version fixes the issue.
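After pulling this change, you can confirm whether an environment still carries the affected release with the standard library's `importlib.metadata`. A minimal sketch follows.

Assumptions:
- The only version to flag is the 0.15.9 named above. The PR does not state which release the bump moves to, so the sketch does not check for a specific fixed version.
- `is_affected` and `describe_installed` are hypothetical helper names.

```python
from importlib.metadata import PackageNotFoundError, version

# The release reported as broken in this PR. Nearby releases may share the
# same S3 download path; this check flags only the exact version named here.
AFFECTED_VERSION = "0.15.9"


def is_affected(installed: str) -> bool:
    """Return True if `installed` is the unstructured release reported as broken."""
    return installed.strip() == AFFECTED_VERSION


def describe_installed(package: str = "unstructured") -> str:
    """Summarize whether the installed package is the affected release."""
    try:
        installed = version(package)
    except PackageNotFoundError:
        return f"{package} is not installed"
    if is_affected(installed):
        return f"{package} {installed} is the affected release; upgrade it"
    return f"{package} {installed} is not the affected release"


if __name__ == "__main__":
    print(describe_installed())
```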

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-20 04:05:40 -05:00
Reference: github-starred/open-webui#22359