[PR #2591] [CLOSED] chore(deps): bump unstructured from 0.11.8 to 0.14.2 in /backend #21039

opened 2026-04-20 03:19:00 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/open-webui/open-webui/pull/2591
Author: @dependabot[bot]
Created: 5/27/2024
Status: Closed

Base: main ← Head: dependabot/pip/backend/unstructured-0.14.2


📝 Commits (1)

  • c0c04f8 chore(deps): bump unstructured from 0.11.8 to 0.14.2 in /backend

📊 Changes

1 file changed (+1 addition, -1 deletion)

📝 backend/requirements.txt (+1 -1)
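The single-line change to `backend/requirements.txt` is the version pin itself. The following is a minimal sketch for checking which `unstructured` version a requirements file pins, and whether an upgrade crosses 0.14.0, the release whose notes list a breaking change. It assumes the pin is an exact `==` specifier on its own line with a plain numeric version. The helper names and the sample file contents are hypothetical.

```python
import re
from typing import Optional, Tuple

Version = Tuple[int, ...]

# The release whose notes list a breaking change (table extraction off by default).
BREAKING_RELEASE: Version = (0, 14, 0)


def pinned_version(requirements_text: str, package: str) -> Optional[Version]:
    """Return the exact `==` pin for `package` as an int tuple, or None if absent.

    Assumes plain numeric versions such as 0.14.2 (no pre-release suffixes);
    optional extras like `package[all-docs]` and trailing comments are tolerated.
    """
    pattern = re.compile(
        rf"^[ \t]*{re.escape(package)}(?:\[[^\]]*\])?[ \t]*==[ \t]*(\d+(?:\.\d+)*)[ \t]*(?:#.*)?$",
        re.IGNORECASE | re.MULTILINE,
    )
    match = pattern.search(requirements_text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def crosses(old: Version, new: Version, boundary: Version = BREAKING_RELEASE) -> bool:
    """True if upgrading from `old` to `new` moves past `boundary`."""
    return old < boundary <= new


# Hypothetical before/after contents; the real file's other pins are not shown here.
before = "# other pins omitted\nunstructured==0.11.8\n"
after = "# other pins omitted\nunstructured==0.14.2\n"
old, new = pinned_version(before, "unstructured"), pinned_version(after, "unstructured")
print(old, new, crosses(old, new))  # (0, 11, 8) (0, 14, 2) True
```

Comparing integer tuples rather than raw strings avoids the usual pitfall where "0.9" sorts after "0.14".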

📄 Description

Bumps unstructured from 0.11.8 to 0.14.2.

Release notes

Sourced from unstructured's releases.

0.14.2

Enhancements

  • Bump unstructured-inference==0.7.33.

Features

  • Add attribution to the pinecone connector.

0.14.1

Enhancements

  • Refactor code related to embedded text extraction. The embedded text extraction code is moved from unstructured-inference to unstructured.

Features

  • Large improvements to the ingest process:
    • Support for multiprocessing and async, with limits for both.
    • Streamlined the process of mapping CLI invocations to the underlying code
    • More granular steps introduced to give better control over the process (e.g. a dedicated step to uncompress files already in the local filesystem, and a new optional staging step before upload)
    • Use the python client when calling the unstructured api for partitioning or chunking
    • Saving the final content is now a dedicated destination connector (local) set as the default if none are provided. Avoids adding new files locally if uploading elsewhere.
    • Leverage last modified date when deciding if new files should be downloaded and reprocessed.
    • Add attribution to the pinecone connector
  • Add support for Python 3.12. unstructured now works with Python 3.12!
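Two of the ingest ideas listed above, bounded async concurrency and skipping files whose last-modified time has not changed, can be illustrated with a small standard-library sketch. This is not unstructured's ingest code and covers only the async side: `process` is a hypothetical stand-in for a partition or chunk call, and the in-memory `seen` dict stands in for whatever record the real pipeline keeps.

```python
import asyncio
from typing import Dict, Iterable, List, Tuple


async def process(path: str) -> str:
    """Hypothetical stand-in for a partition/chunk call on one file."""
    await asyncio.sleep(0)  # yield control, as real I/O would
    return f"processed:{path}"


def needs_reprocessing(path: str, modified: float, seen: Dict[str, float]) -> bool:
    """A file is (re)processed if it is new or its last-modified time moved forward."""
    previous = seen.get(path)
    return previous is None or modified > previous


async def ingest(
    files: Iterable[Tuple[str, float]],
    seen: Dict[str, float],
    max_concurrency: int = 4,
) -> List[str]:
    """Process changed files with at most `max_concurrency` running at once."""
    limit = asyncio.Semaphore(max_concurrency)

    async def worker(path: str, modified: float) -> str:
        async with limit:  # caps how many process() calls are in flight
            result = await process(path)
        seen[path] = modified  # record the timestamp only after success
        return result

    todo = [(p, m) for p, m in files if needs_reprocessing(p, m, seen)]
    return list(await asyncio.gather(*(worker(p, m) for p, m in todo)))


seen = {"a.pdf": 100.0}
print(asyncio.run(ingest([("a.pdf", 100.0), ("b.pdf", 50.0)], seen)))
# ['processed:b.pdf']
```

Recording the timestamp only after a file is processed means a failed run leaves that file eligible for the next attempt.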

0.14.0

BREAKING CHANGES

  • Turn table extraction for PDFs and images off by default. Reverting the default behavior for table extraction to "off" for PDFs and images. A number of users didn't realize we made the change and were impacted by slower processing times due to the extra model call for table extraction.
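With table extraction for PDFs and images off by default from 0.14.0, callers who want table elements need to request them explicitly. The following is a minimal sketch of one way to do that when calling `partition_pdf()` directly. It assumes that function's `strategy` and `infer_table_structure` keyword arguments, and that table-structure inference is tied to the model-based `"hi_res"` strategy. The helper name is hypothetical, and the real call is left commented out so the snippet runs without `unstructured` installed.

```python
from typing import Any, Dict


def pdf_partition_kwargs(extract_tables: bool) -> Dict[str, Any]:
    """Hypothetical helper: keyword arguments for unstructured's partition_pdf().

    Assumes partition_pdf() accepts `strategy` and `infer_table_structure`,
    and that table-structure inference requires the "hi_res" strategy.
    """
    if not extract_tables:
        # Pass nothing extra; the library defaults apply (per 0.14.0, tables off).
        return {}
    return {"strategy": "hi_res", "infer_table_structure": True}


# Usage with unstructured installed ("report.pdf" is a placeholder path):
# from unstructured.partition.pdf import partition_pdf
# elements = partition_pdf(filename="report.pdf", **pdf_partition_kwargs(True))
print(pdf_partition_kwargs(True))  # {'strategy': 'hi_res', 'infer_table_structure': True}
```

Opting in per call keeps the extra model call, which the release note cites as the cause of slower processing, limited to the documents that actually need tables.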

Enhancements

  • Skip unnecessary element sorting in partition_pdf(). Skip element sorting when determining whether embedded text can be extracted.
  • Faster evaluation. Support for concurrent processing of documents during evaluation.
  • Add strategy parameter to partition_docx(). Behavior of future enhancements may be sensitive to the partitioning strategy. Add this parameter so partition_docx() is aware of the requested strategy.
  • Add GLOBAL_WORKING_DIR and GLOBAL_WORKING_PROCESS_DIR configuration parameters to control temporary storage.

Features

  • Add form extraction basics (document elements and placeholder code in partition). This lays the groundwork for future work. Form extraction models are not currently available in the library, so an attempt to use this functionality will raise a NotImplementedError.

Fixes

  • Add missing starting_page_num param to partition_image
  • Make the filename and file params for partition_image and partition_pdf match the other partitioners
  • Fix include_slide_notes and include_page_breaks params in partition_ppt
  • Re-apply: skip accuracy calculation feature. It had been overwritten by mistake.
  • Fix type hint for paragraph_grouper param. paragraph_grouper can be set to False, but the type hint previously did not reflect this.
  • Remove links param from partition_pdf. links is extracted during partitioning and is not needed as a parameter in partition_pdf.
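The fix adding `starting_page_num` to `partition_image` brings it in line with the other partitioners. As we understand the parameter, it sets the page number assigned to the first page, which helps when a file is one part of a larger document. The following stand-in sketch shows that numbering only; it is not unstructured's code, and the function name is hypothetical.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def number_pages(pages: Sequence[T], starting_page_num: int = 1) -> List[Tuple[int, T]]:
    """Pair each page with its page number, counting up from starting_page_num."""
    return [(starting_page_num + offset, page) for offset, page in enumerate(pages)]


# A three-page scan that is really pages 5-7 of a larger report:
print(number_pages(["p1.png", "p2.png", "p3.png"], starting_page_num=5))
# [(5, 'p1.png'), (6, 'p2.png'), (7, 'p3.png')]
```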

... (truncated)

Changelog

Sourced from unstructured's changelog.

0.14.2

Enhancements

  • Bump unstructured-inference==0.7.33.

Features

  • Add attribution to the pinecone connector.

Fixes

0.14.1

Enhancements

  • Refactor code related to embedded text extraction. The embedded text extraction code is moved from unstructured-inference to unstructured.

Features

  • Large improvements to the ingest process:
    • Support for multiprocessing and async, with limits for both.
    • Streamlined the process of mapping CLI invocations to the underlying code
    • More granular steps introduced to give better control over the process (e.g. a dedicated step to uncompress files already in the local filesystem, and a new optional staging step before upload)
    • Use the python client when calling the unstructured api for partitioning or chunking
    • Saving the final content is now a dedicated destination connector (local) set as the default if none are provided. Avoids adding new files locally if uploading elsewhere.
    • Leverage last modified date when deciding if new files should be downloaded and reprocessed.
    • Add attribution to the pinecone connector
    • Add support for Python 3.12. unstructured now works with Python 3.12!

Fixes

0.14.0

BREAKING CHANGES

  • Turn table extraction for PDFs and images off by default. Reverting the default behavior for table extraction to "off" for PDFs and images. A number of users didn't realize we made the change and were impacted by slower processing times due to the extra model call for table extraction.

Enhancements

  • Skip unnecessary element sorting in partition_pdf(). Skip element sorting when determining whether embedded text can be extracted.
  • Faster evaluation. Support for concurrent processing of documents during evaluation.
  • Add strategy parameter to partition_docx(). Behavior of future enhancements may be sensitive to the partitioning strategy. Add this parameter so partition_docx() is aware of the requested strategy.
  • Add GLOBAL_WORKING_DIR and GLOBAL_WORKING_PROCESS_DIR configuration parameters to control temporary storage.

Features

  • Add form extraction basics (document elements and placeholder code in partition). This lays the groundwork for future work. Form extraction models are not currently available in the library, so an attempt to use this functionality will raise a NotImplementedError.

Fixes

... (truncated)

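Commits

  • 18428f2 chore: bump unstructured-inference 0.7.33 (#3074)
  • 30e5a0c rfctr(docx): organize docx tests (#3070)
  • 7832dfc feat: add attribution for pinecone (#3067)
  • b0d8a77 feat: partiton_pdf() set inferred elements text (#3061)
  • 059fc64 build: apk add libreoffice24 (#3065)
  • 3eaf65a feat: refactor ingest (#3009)
  • 73739b3 docs: redirect to docs.unstructured.io on github pages (#3054)
  • acda4d0 fix: set skip_infer_tables explicitly in test_partition_via_api_with_no_st...
  • 6066a26 fix: update container link in README.md (#2889)
  • 60f10fe Updated Weaviate Docker image url (auto PR by bot) (#2659)
  • Additional commits viewable in compare view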

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-20 03:19:00 -05:00

Reference: github-starred/open-webui#21039