[PR #22044] [CLOSED] chore(deps): bump unstructured from 0.18.31 to 0.21.5 #26437

Closed
opened 2026-04-20 06:29:40 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/open-webui/open-webui/pull/22044
Author: @dependabot[bot]
Created: 3/1/2026
Status: Closed

Base: dev ← Head: dependabot/uv/dev/unstructured-0.21.5


📝 Commits (1)

  • 923f0b9 chore(deps): bump unstructured from 0.18.31 to 0.21.5

📊 Changes

3 files changed (+1763 additions, -1092 deletions)

📝 backend/requirements.txt (+1 -1)
📝 pyproject.toml (+1 -1)
📝 uv.lock (+1761 -1090)

📄 Description

Bumps unstructured from 0.18.31 to 0.21.5.

Release notes

Sourced from unstructured's releases.

0.21.5

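What's Changed

  • feat: custom fallback for language detection by @claytonlin1110 in Unstructured-IO/unstructured#4238
  • Add Github action for time regressions by @aadland6 in Unstructured-IO/unstructured#4261
  • fix: relax lower bound for pdfminer.six by @badGarnet in Unstructured-IO/unstructured#4262

New Contributors

  • @claytonlin1110 made their first contribution in Unstructured-IO/unstructured#4238
  • @aadland6 made their first contribution in Unstructured-IO/unstructured#4261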

Full Changelog: https://github.com/Unstructured-IO/unstructured/compare/0.21.2...0.21.5

0.21.2

No release notes provided.

0.21.1

What's Changed

  • bump version by @badGarnet in Unstructured-IO/unstructured#4257

Full Changelog: https://github.com/Unstructured-IO/unstructured/compare/0.21.0...0.21.1

0.21.0

Fixes

  • Replace NLTK with spaCy to remediate CVE-2025-14009: NLTK's downloader uses zipfile.extractall() without path validation, enabling RCE via malicious packages (CVSS 10.0, no patch available). spaCy models install as pip packages, eliminating the vulnerable downloader entirely.
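The vulnerability class named above (often called "zip slip") arises when archive member names such as `../x` or absolute paths are joined onto the destination directory without checking where they resolve. The sketch below shows the kind of path validation the entry says was missing: every member's resolved target is checked against the destination before anything is written. It is a defense-in-depth illustration under our own assumptions, not NLTK's or unstructured's code, and `safe_extractall` is a hypothetical helper name. CPython's `zipfile` documentation states that the module already attempts to strip such components, so treat the explicit check as an extra guard rather than the only one.

```python
import os
import zipfile
from typing import List


def safe_extractall(archive_path: str, dest_dir: str) -> List[str]:
    """Extract archive_path into dest_dir, refusing members that would escape it.

    Hypothetical helper: all members are validated before any file is written.
    """
    dest_real = os.path.realpath(dest_dir)
    extracted: List[str] = []
    with zipfile.ZipFile(archive_path) as zf:
        members = zf.infolist()
        # Pass 1: reject the whole archive if any member resolves outside dest_real.
        for member in members:
            target = os.path.realpath(os.path.join(dest_real, member.filename))
            if os.path.commonpath([dest_real, target]) != dest_real:
                raise ValueError(f"unsafe archive member: {member.filename!r}")
        # Pass 2: only now extract, since every path has been checked.
        for member in members:
            extracted.append(zf.extract(member, dest_real))
    return extracted
```

Validating in a separate first pass means a malicious archive is rejected before any of its files land on disk, rather than partway through extraction.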

0.20.8

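What's Changed

  • fix: set max decompressed size for elements JSON by @qued in Unstructured-IO/unstructured#4244
  • fix: update depdencies by @badGarnet in Unstructured-IO/unstructured#4247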

Full Changelog: https://github.com/Unstructured-IO/unstructured/compare/0.20.6...0.20.8

0.20.6

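What's Changed

  • Automate pypi publishing by @PastelStorm in Unstructured-IO/unstructured#4239
  • fix: remove duplicate characters caused by fake bold rendering in PDFs by @bittoby in Unstructured-IO/unstructured#4215
  • Improve fast partition cold start by @CyMule in Unstructured-IO/unstructured#4242
  • fix: gracefully handle invalide html string during chunking by @badGarnet in Unstructured-IO/unstructured#4243
  • fix: remap parent id after hashing by @badGarnet in Unstructured-IO/unstructured#4245

New Contributors

  • @bittoby made their first contribution in Unstructured-IO/unstructured#4215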

Full Changelog: https://github.com/Unstructured-IO/unstructured/compare/0.20.1...0.20.6

0.20.2

Release 0.20.2

... (truncated)

Changelog

Sourced from unstructured's changelog.

0.21.5

Fixes

  • Lower the requirement for pdfminer.six to >=20251230
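The 0.21.5 bound can be checked against an existing environment before upgrading. This sketch assumes pdfminer.six keeps its date-style `YYYYMMDD` version scheme and compares only the leading numeric segment; the helper names are hypothetical, and for general version specifiers `packaging.version` would be the more robust choice.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Lower bound quoted in the 0.21.5 changelog entry.
MIN_PDFMINER = "20251230"


def meets_lower_bound(installed: str, minimum: str = MIN_PDFMINER) -> bool:
    """Compare date-style (YYYYMMDD) versions by their leading numeric segment."""
    match = re.match(r"\d+", installed)
    if match is None:
        raise ValueError(f"unexpected version string: {installed!r}")
    return int(match.group()) >= int(minimum)


def installed_pdfminer_ok() -> Optional[bool]:
    """Return None if pdfminer.six is not installed, else whether it meets the bound."""
    try:
        return meets_lower_bound(version("pdfminer.six"))
    except PackageNotFoundError:
        return None
```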

0.21.4

Enhancements

  • Add a GitHub action for testing time regressions

0.21.3

Enhancements

  • Custom fallback for language detection (fixes #4091): Add optional language_fallback callable for short ASCII text (e.g. when detection would default to English). Callable receives the text and may return a list of ISO 639-3 codes or None to leave language unspecified; return value is validated and invalid entries are filtered out. language_fallback is passed through partition(), PDF/image partitioners, and partition_html; partition_md now accepts languages (use [""] to disable detection). Language-related parameters (languages, detect_language_per_element, language_fallback) are documented as top-level options and exposed explicitly on partition_html.
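Per the entry above, `language_fallback` is a callable that receives the text and returns either a list of ISO 639-3 codes or `None`. A minimal sketch of such a callable follows. The keyword heuristic, the `_HINTS` table, and the `filter_codes` helper are our own assumptions; `filter_codes` is only a simplified stand-in for the validation the changelog says unstructured performs, which presumably checks against real ISO 639-3 codes rather than just their shape.

```python
import re
from typing import Iterable, List, Optional

# Hypothetical keyword table; a real deployment would choose codes for its own corpus.
_HINTS = {
    "deu": {"und", "der", "die", "das", "nicht"},
    "fra": {"le", "les", "est", "une"},
}


def my_language_fallback(text: str) -> Optional[List[str]]:
    """Guess a language for short ASCII text, or return None to leave it unspecified."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    for code, hints in _HINTS.items():
        if words & hints:
            return [code]
    return None  # leave the language unspecified instead of defaulting to English


def filter_codes(codes: Optional[Iterable[str]]) -> Optional[List[str]]:
    """Simplified stand-in for validation: keep only 3-letter lowercase strings."""
    if codes is None:
        return None
    return [c for c in codes if isinstance(c, str) and re.fullmatch(r"[a-z]{3}", c)]
```

According to the changelog, such a callable would be supplied through the `language_fallback` keyword of `partition()`, the PDF/image partitioners, or `partition_html`; confirm the exact signature in the unstructured documentation for the installed version.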

0.21.2

Fixes

  • Self-install pinned spaCy model at runtime with SHA256 verification: Replace the en-core-web-sm direct URL dependency in pyproject.toml with the installer library. The spaCy model is now downloaded and installed on first use with hash verification, removing the need for [tool.uv.sources] and making the install more portable.
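The pattern described in the 0.21.2 entry (download a pinned artifact on first use and verify its SHA256 before installing it) can be sketched with the standard library alone. The function names are hypothetical, and the actual download and install steps, which the entry says use the `installer` library, are deliberately left out.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so a large model wheel never has to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected_sha256: str) -> None:
    """Raise if the file does not match the pinned hash; install only after this passes."""
    actual = sha256_of_file(path)
    if actual != expected_sha256.lower():
        raise ValueError(
            f"SHA256 mismatch for {path}: expected {expected_sha256}, got {actual}"
        )
```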

0.21.1

  • Bump version to create a new release

0.21.0

Fixes

  • Replace NLTK with spaCy to remediate CVE-2025-14009: NLTK's downloader uses zipfile.extractall() without path validation, enabling RCE via malicious packages (CVSS 10.0, no patch available). spaCy models install as pip packages, eliminating the vulnerable downloader entirely.

0.20.8

Fixes

  • downgrade wrapt so it is compatible with opentelemetry-instrumentation-httpx
  • resolve lock issue with Windows and Python 3.13

0.20.7

Fixes

  • Cap size when decompressing elements JSON file: Prevents situations where decompression can consume an arbitrarily large portion in memory and on the filesystem.
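One common way to cap decompressed size is to pass a `max_length` to `zlib`'s incremental decompressor and reject anything that would exceed it. This sketch assumes the payload is gzip-compressed; the entry does not say which codec or limit unstructured uses, and `bounded_gunzip` is a hypothetical name.

```python
import zlib


def bounded_gunzip(data: bytes, max_size: int) -> bytes:
    """Decompress gzip data, refusing to produce more than max_size bytes."""
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decomp = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)
    # Request one byte more than allowed: receiving it proves the limit is exceeded
    # without ever materialising the full (possibly enormous) output.
    out = decomp.decompress(data, max_size + 1)
    if len(out) > max_size:
        raise ValueError(f"decompressed size exceeds limit of {max_size} bytes")
    if not decomp.eof:
        raise ValueError("truncated or incomplete gzip stream")
    return out
```

The result can then be handed to `json.loads`; the memory spent never exceeds roughly `max_size` bytes of output, however small and highly compressed the input is.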

0.20.6

Fixes

  • fix: remap parent id after hashing to preserve right reference
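The bug class behind this fix is easy to reproduce: if element IDs are rewritten (for example, replaced with content hashes) but `parent_id` references are not updated, children point at IDs that no longer exist. The sketch below re-IDs elements and remaps parents in a second pass. The dict layout and the hashing recipe (SHA-256 of index and text) are assumptions for illustration, not unstructured's actual scheme.

```python
import hashlib
from typing import Dict, List


def rehash_ids(elements: List[dict]) -> List[dict]:
    """Replace element_id with a content hash and keep parent_id links consistent."""
    # Pass 1: compute every new ID first, so parents listed after a child are covered.
    id_map: Dict[str, str] = {}
    for index, element in enumerate(elements):
        seed = f"{index}:{element['text']}".encode("utf-8")
        id_map[element["element_id"]] = hashlib.sha256(seed).hexdigest()[:32]

    # Pass 2: rewrite both the element's own ID and its parent reference.
    remapped: List[dict] = []
    for element in elements:
        updated = dict(element)  # shallow copy; the input list is not mutated
        updated["element_id"] = id_map[element["element_id"]]
        parent = element.get("parent_id")
        # Unknown parents (and None) are kept as-is rather than silently dropped.
        updated["parent_id"] = id_map.get(parent, parent)
        remapped.append(updated)
    return remapped
```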

0.20.5

Fixes

  • Gracefully handle invalid text_as_html during chunking: _TableChunker now catches parse errors (e.g. lxml.etree.ParserError when text_as_html contains a markdown code-fence like ```html\n) and returns None instead of raising, allowing chunking to continue using plain-text fallback. A WARNING log is emitted with a truncated preview of the offending value.
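The fallback behaviour described above can be sketched as follows. unstructured uses `lxml` inside its own `_TableChunker`; to stay dependency-free, this example swaps in the standard library's `xml.etree.ElementTree`, which raises `ParseError` on the same kind of input (stray text such as a code fence before the root element). The function names and the 40-character preview length are hypothetical.

```python
import logging
import xml.etree.ElementTree as ET
from typing import Optional

logger = logging.getLogger(__name__)


def parse_table_html(text_as_html: str, preview_len: int = 40) -> Optional[ET.Element]:
    """Parse table markup, returning None (and logging a WARNING) instead of raising."""
    try:
        return ET.fromstring(text_as_html)
    except ET.ParseError:
        logger.warning(
            "Could not parse text_as_html; falling back to plain text. Preview: %r",
            text_as_html[:preview_len],
        )
        return None


def table_cells_or_text(text: str, text_as_html: str) -> str:
    """Join cell texts when the markup parses; otherwise use the plain-text fallback."""
    root = parse_table_html(text_as_html)
    if root is None:
        return text
    return " ".join(cell.text.strip() for cell in root.iter("td") if cell.text)
```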

... (truncated)

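Commits

  • 5302352 fix: relax lower bound for pdfminer.six (#4262)
  • 16482f9 Add Github action for time regressions (#4261)
  • afbda95 feat: custom fallback for language detection (#4238)
  • 4a77a8c fix: self-install pinned spaCy model at runtime with SHA256 verification (#4258)
  • 47b8b5e bump version (#4257)
  • 3db7b4f Fix: replace nltk with spacy CVE 2025 14009 (#4255)
  • a8f14ba fix: update depdencies (#4247)
  • c6c7462 fix: set max decompressed size for elements JSON (#4244)
  • e2d8b7a fix: remap parent id after hashing (#4245)
  • c1f819c fix: gracefully handle invalide html string during chunking (#4243)
  • Additional commits viewable in compare view: https://github.com/Unstructured-IO/unstructured/compare/0.18.31...0.21.5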

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-20 06:29:40 -05:00
Reference: github-starred/open-webui#26437