[PR #5066] [MERGED] build(deps): bump unstructured from 0.15.7 to 0.15.9 in /backend #8414

Closed
opened 2025-11-11 17:55:57 -06:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/open-webui/open-webui/pull/5066
Author: @dependabot[bot]
Created: 9/1/2024
Status: Merged
Merged: 9/3/2024
Merged by: @tjbck

Base: dev ← Head: dependabot/pip/backend/dev/unstructured-0.15.9


📝 Commits (1)

  • 92488c2 build(deps): bump unstructured from 0.15.7 to 0.15.9 in /backend

📊 Changes

1 file changed (+1 additions, -1 deletions)

📝 backend/requirements.txt (+1 -1)

📄 Description

Bumps unstructured from 0.15.7 to 0.15.9.
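
To confirm the bump after merging, the pinned version can be read back out of the requirements file. This is a minimal sketch: it assumes the file pins the package with `==`, which this page does not show (it only reports one changed line). The helper `pinned_version` and the sample text are hypothetical.

```python
import re
from typing import Optional


def pinned_version(requirements_text: str, package: str) -> Optional[str]:
    """Return the version pinned with `==` for `package`, or None if absent."""
    pattern = re.compile(
        # Package name, optional extras such as [all-docs], then "== <version>".
        rf"^\s*{re.escape(package)}(?:\[[^\]]*\])?\s*==\s*([^\s;#]+)",
        re.IGNORECASE | re.MULTILINE,
    )
    match = pattern.search(requirements_text)
    return match.group(1) if match else None


# Hypothetical excerpt; the real backend/requirements.txt has other lines.
sample = "some-other-package==1.2.3\nunstructured==0.15.9\n"
print(pinned_version(sample, "unstructured"))
```

The pattern anchors on the start of a line and requires `==` (or an extras bracket) right after the name, so a differently named package that merely starts with `unstructured` is not matched.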

Release notes

Sourced from unstructured's releases.

0.15.9

Enhancements

Features

  • Add support for encoding parameter in partition_csv
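
The following sketch shows why an explicit encoding matters for CSV input. The first part uses only the standard library: a Latin-1 file fails to decode as UTF-8 but reads cleanly once the encoding is named. The wrapper `partition_csv_with_encoding` assumes the new parameter is passed as the keyword `encoding` alongside `filename`, as the release note suggests; it needs `unstructured` 0.15.9 or later and is not exercised here. File and function names are illustrative.

```python
import csv
import tempfile
from pathlib import Path


def read_rows(path: Path, encoding: str) -> list[list[str]]:
    """Read a CSV file with an explicit text encoding."""
    with path.open(newline="", encoding=encoding) as handle:
        return list(csv.reader(handle))


def partition_csv_with_encoding(path: Path, encoding: str):
    """Assumed usage of the new parameter; requires unstructured >= 0.15.9."""
    # Imported lazily so the rest of this example runs without unstructured.
    from unstructured.partition.csv import partition_csv

    return partition_csv(filename=str(path), encoding=encoding)


def demo() -> tuple[bool, list[list[str]]]:
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "cities.csv"
        # Write non-ASCII text as Latin-1 rather than UTF-8.
        path.write_bytes("city,country\nMálaga,España\n".encode("latin-1"))
        try:
            read_rows(path, "utf-8")
            utf8_failed = False
        except UnicodeDecodeError:
            utf8_failed = True
        return utf8_failed, read_rows(path, "latin-1")


utf8_failed, rows = demo()
print(utf8_failed, rows)
```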

0.15.8

Enhancements

  • Bump unstructured.paddleocr to 2.8.1.0.

Features

  • Add MixedbreadAI embedder Adds MixedbreadAI embeddings to support embedding via Mixedbread AI.

Fixes

  • Replace pillow-heif with pi-heif. Replaces pillow-heif with pi-heif due to more permissive licensing on the wheel for pi-heif.
  • Minify text_as_html from DOCX. Previously .metadata.text_as_html for DOCX tables was "bloated" with whitespace and noise elements introduced by tabulate that produced over-chunking and lower "semantic density" of elements. Reduce HTML to minimum character count without preserving all text.
  • Fall back to filename extension-based file-type detection for unidentified OLE files. Resolves a problem where a DOC file that could not be detected as such by filetype was incorrectly identified as a MSG file.
Changelog

Sourced from unstructured's changelog.

0.15.9

Enhancements

Features

  • Add support for encoding parameter in partition_csv

Fixes

  • Check storage contents for OLE file type detection Updates detect_filetype to check the content of OLE files to more reliably differentiate DOC, PPT, XLS, and MSG files. As part of this, the "msg" extra was removed because the python-oxmsg package is now a base dependency.
  • Fix disk space leaks and Windows errors when accessing file.name on a NamedTemporaryFile Uses of NamedTemporaryFile(..., delete=False) and/or uses of file.name of NamedTemporaryFiles have been replaced with TemporaryFileDirectory to avoid a known issue: https://docs.python.org/3/library/tempfile.html#tempfile.NamedTemporaryFile
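
The temporary-file fix can be illustrated with the standard library alone. A file created with `NamedTemporaryFile(..., delete=False)` stays on disk until something removes it, whereas a file written inside a `tempfile.TemporaryDirectory` disappears with the directory when the context exits. This is a sketch of the pattern, not the code from `unstructured`; the changelog names the replacement "TemporaryFileDirectory", and treating that as the stdlib `TemporaryDirectory` is an assumption. The function name is hypothetical.

```python
import os
import tempfile
from pathlib import Path


def convert_in_temp_dir(data: bytes) -> tuple[bytes, str]:
    """Round-trip data through a scratch file that is cleaned up automatically."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        scratch = Path(tmp_dir) / "input.bin"
        # write_bytes closes the file, so reopening it by path below does not
        # depend on the platform allowing a second open of a still-open file.
        scratch.write_bytes(data)
        # Stand-in for a converter that needs a real filesystem path.
        result = scratch.read_bytes()
    # The directory and everything in it are gone at this point.
    return result, tmp_dir


result, used_dir = convert_in_temp_dir(b"hello")
print(result, os.path.exists(used_dir))
```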

0.15.8

Enhancements

  • Bump unstructured.paddleocr to 2.8.1.0.

Features

  • Add MixedbreadAI embedder Adds MixedbreadAI embeddings to support embedding via Mixedbread AI.

Fixes

  • Replace pillow-heif with pi-heif. Replaces pillow-heif with pi-heif due to more permissive licensing on the wheel for pi-heif.
  • Minify text_as_html from DOCX. Previously .metadata.text_as_html for DOCX tables was "bloated" with whitespace and noise elements introduced by tabulate that produced over-chunking and lower "semantic density" of elements. Reduce HTML to minimum character count without preserving all text.
  • Fall back to filename extension-based file-type detection for unidentified OLE files. Resolves a problem where a DOC file that could not be detected as such by filetype was incorrectly identified as a MSG file.
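
The extension fallback described in the last fix can be sketched as follows. OLE (Compound File Binary) containers share one 8-byte signature, so the signature alone cannot tell DOC, PPT, XLS, and MSG apart; when nothing better is known, the filename extension is the remaining hint. This is an illustration under those assumptions, not the `detect_filetype` implementation, and the names `OLE_EXTENSIONS` and `guess_ole_subtype` are hypothetical.

```python
from pathlib import Path
from typing import Optional

# Signature at the start of every Compound File Binary (OLE) container.
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")

# Hypothetical mapping used only for this illustration.
OLE_EXTENSIONS = {".doc": "doc", ".ppt": "ppt", ".xls": "xls", ".msg": "msg"}


def guess_ole_subtype(filename: str, head: bytes) -> Optional[str]:
    """Return a subtype for an OLE file based on its extension, else None."""
    if not head.startswith(OLE_MAGIC):
        return None  # not an OLE container at all
    # The signature matched but says nothing about the subtype,
    # so fall back to the (case-insensitive) filename extension.
    return OLE_EXTENSIONS.get(Path(filename).suffix.lower())


print(guess_ole_subtype("report.DOC", OLE_MAGIC + b"\x00" * 8))
```

Returning `None` for an unknown extension, rather than defaulting to one subtype, avoids the kind of misidentification the fix describes, where a DOC file was reported as MSG.
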
Commits

Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

Commits (unstructured, 0.15.7 to 0.15.9)

  • 6ba8135 fix: check ole storage content to differentiate filetypes (#3581)
  • ddb6cb6 chore: remove minimum version pins for pins older than 6 mo (#3577)
  • f440eb4 feat: Support encoding parameter in partition_csv (#3564)
  • f21c853 bug: fix file_conversion disk leak (#3562)
  • 4194a07 build(deps): replace pillow-heif with pi-heif (#3571)
  • ddba928 Potter/mixedbread embedder (#3513)
  • affd997 refactor(ci): remove unused environment variables (#3568)
  • 09d84bc build(deps): version bumps for 2024-08-26 (#3567)
  • ac10ba4 build(deps): bump unstructured.paddleocr to 2.8.1.0 (#3561)
  • 32bb77a fix(file): no default OLE subtype (#3516)
  • Additional commits viewable in compare view: https://github.com/Unstructured-IO/unstructured/compare/0.15.7...0.15.9
GiteaMirror added the pull-request label 2025-11-11 17:55:57 -06:00
Reference: github-starred/open-webui#8414