web UI *.doc(word97 - 2003) Something went wrong :/ "There is no item named 'word/document.xml' in the archive" #1093

Closed
opened 2025-11-11 14:37:06 -06:00 by GiteaMirror · 0 comments
Owner

Originally created by @tqangxl on GitHub (Jun 2, 2024).

Bug Report

*.doc(word97 - 2003) Something went wrong :/ "There is no item named 'word/document.xml' in the archive"

Description

Uploading a *.doc file (Word 97-2003 format) in the web UI fails with: Something went wrong :/ "There is no item named 'word/document.xml' in the archive"
image

Bug Summary:
[Provide a brief but clear summary of the bug]

Steps to Reproduce:
Save a .docx file as .doc (save type: Word 97-2003).
Upload the *.doc file (Word 97-2003) from the chat.
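
The error text itself matches the `KeyError` message that Python's standard `zipfile` module raises when a requested member is missing from an archive. As a minimal sketch, assuming only the standard library, the same message can be produced without Word or Open WebUI by reading `word/document.xml` from any ZIP archive that lacks it. The member name `placeholder.txt` is made up for illustration, and whether the uploaded .doc file reaches this exact code path inside Open WebUI is not confirmed here.

```python
import io
import zipfile

# Build an in-memory ZIP archive that has no 'word/document.xml' member.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("placeholder.txt", "not a Word document part")

buffer.seek(0)
message = None
with zipfile.ZipFile(buffer) as archive:
    try:
        # A .docx reader would ask for this member; here it does not exist.
        archive.read("word/document.xml")
    except KeyError as exc:
        message = exc.args[0]

print(message)
```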

Expected Behavior:
[Describe what you expected to happen.]
from:
https://blog.csdn.net/qq_50000922/article/details/129270999
https://blog.csdn.net/weixin_42521211/article/details/106428503
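Method 1: the python-docx module
With the python-docx module, the Document function reads a Word document. The attributes and methods of the resulting document object can then be used to retrieve information from the document or to edit it.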

'add_heading',
'add_page_break',
'add_paragraph',
'add_picture',
'add_section',
'add_table',
'core_properties',
'element',
'inline_shapes',
'paragraphs',
'part',
'save',
'sections',
'settings',
'styles',
'tables'

Simple example
from docx import Document
input_document = Document(filename)  # read the Word file
tables = input_document.tables  # get all tables in the file

KeyError: "There is no item named 'word/NULL' in the archive"

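Method 2: unzip and parse
Based on the overview of Word above and on the format of Word (.docx) files, the body text can be extracted with the following steps:

Unzip the .docx file.
Parse word/document.xml with BeautifulSoup to extract the body text.
Example code: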

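from zipfile import ZipFile
from bs4 import BeautifulSoup

# A .docx file is a ZIP archive; the body text lives in word/document.xml.
with ZipFile(r'test.docx') as document:
    xml = document.read("word/document.xml")

# Name the parser explicitly; html.parser ships with Python.
wordObj = BeautifulSoup(xml.decode("utf-8"), "html.parser")
texts = wordObj.find_all("w:t")  # each w:t element holds a run of text
for text in texts:
    print(text.text)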
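
The unzip-and-parse steps above can also be sketched with the standard library alone, with checks that run before `word/document.xml` is read. This is a hedged illustration, not how Open WebUI handles uploads: the function name `extract_docx_text` and its error messages are hypothetical. It relies on two known facts: legacy Word 97-2003 files use the OLE2 compound file container, which starts with the bytes `D0 CF 11 E0 A1 B1 1A E1`, and .docx files are ZIP archives whose body is stored in `word/document.xml`.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Signature of the OLE2 compound file container used by Word 97-2003 .doc files.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
# WordprocessingML namespace, in ElementTree's {uri}tag notation.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def extract_docx_text(data: bytes) -> str:
    """Hypothetical helper: return paragraph text from .docx bytes.

    Raises ValueError with a readable message instead of a bare KeyError.
    """
    if data.startswith(OLE2_MAGIC):
        raise ValueError("legacy .doc (OLE2) file, not a .docx package")
    if not zipfile.is_zipfile(io.BytesIO(data)):
        raise ValueError("not a ZIP archive")
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        if "word/document.xml" not in archive.namelist():
            raise ValueError("ZIP archive has no word/document.xml")
        root = ET.fromstring(archive.read("word/document.xml"))
    # Join the w:t text runs of each w:p paragraph, one paragraph per line.
    paragraphs = []
    for paragraph in root.iter(W_NS + "p"):
        runs = (t.text or "" for t in paragraph.iter(W_NS + "t"))
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)


# Demo: build a tiny in-memory package that contains only the body part.
sample = io.BytesIO()
with zipfile.ZipFile(sample, "w") as archive:
    archive.writestr(
        "word/document.xml",
        '<w:document xmlns:w="http://schemas.openxmlformats.org/'
        'wordprocessingml/2006/main"><w:body><w:p><w:r><w:t>Hello</w:t>'
        "</w:r></w:p></w:body></w:document>",
    )
print(extract_docx_text(sample.getvalue()))
```

Checking the signature first lets a caller report "this is a legacy .doc file" to the user rather than surfacing the archive-level error.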

Actual Behavior:
[Describe what actually happened.]

Environment

  • Open WebUI Version: [e.g., 0.1.120]

  • Ollama (if applicable): [e.g., 0.1.30, 0.1.32-rc1]

  • Operating System: [e.g., Windows 10, macOS Big Sur, Ubuntu 20.04]

  • Browser (if applicable): [e.g., Chrome 100.0, Firefox 98.0]

Reproduction Details

Confirmation:

  • I have read and followed all the instructions provided in the README.md.
  • I am on the latest version of both Open WebUI and Ollama.
  • I have included the browser console logs.
  • I have included the Docker container logs.

Logs and Screenshots

image

Browser Console Logs:
"Something went wrong :/\n\"There is no item named 'word/document.xml' in the archive\""
[Include relevant browser console logs, if applicable]

Docker Container Logs:
[Include relevant Docker container logs, if applicable]

Screenshots (if applicable):
[Attach any relevant screenshots to help illustrate the issue]

Installation Method

[Describe the method you used to install the project, e.g., manual installation, Docker, package manager, etc.]

Additional Information

[Include any additional details that may help in understanding and reproducing the issue. This could include specific configurations, error messages, or anything else relevant to the bug.]

Note

If the bug report is incomplete or does not follow the provided instructions, it may not be addressed. Please ensure that you have followed the steps outlined in the README.md and troubleshooting.md documents, and provide all necessary information for us to reproduce and address the issue. Thank you!


Reference: github-starred/open-webui#1093