mirror of
https://github.com/open-webui/open-webui.git
synced 2026-05-07 03:18:23 -05:00
web UI *.doc(word97 - 2003) Something went wrong :/ "There is no item named 'word/document.xml' in the archive" #1093
Originally created by @tqangxl on GitHub (Jun 2, 2024).
Bug Report
Description
Uploading a *.doc file (Word 97-2003) from the chat fails with: Something went wrong :/ "There is no item named 'word/document.xml' in the archive"

Bug Summary:
[Provide a brief but clear summary of the bug]
Steps to Reproduce:
Save a .docx file as .doc (Save as type: Word 97-2003 Document).
Upload the resulting *.doc file (Word 97-2003) from the chat.
Expected Behavior:
[Describe what you expected to happen.]
from:
https://blog.csdn.net/qq_50000922/article/details/129270999
https://blog.csdn.net/weixin_42521211/article/details/106428503
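Method 1: the python-docx module
With the python-docx module, the Document function reads a Word document. The attributes and methods of the resulting document object can then be used to extract information from the document or to edit it.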
'add_heading',
'add_page_break',
'add_paragraph',
'add_picture',
'add_section',
'add_table',
'core_properties',
'element',
'inline_shapes',
'paragraphs',
'part',
'save',
'sections',
'settings',
'styles',
'tables'
Simple example
from docx import Document
filename = "test.docx"  # path to a .docx file
input_document = Document(filename)  # read the Word file
tables = input_document.tables  # get all tables in the document
KeyError: "There is no item named 'word/NULL' in the archive"
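A KeyError of this kind is what Python's zipfile raises when a requested member is missing from an archive. A Word 97-2003 .doc file normally uses the OLE2 compound file format rather than the ZIP-based .docx layout, so a .docx reader cannot be expected to find word/document.xml in it. As a minimal sketch (not Open WebUI's actual loader logic), the container type could be checked before a parser is chosen. The function name `sniff_word_format` and its return labels are hypothetical.

```python
import io
import zipfile

# Signature found at the start of OLE2 compound files (legacy .doc, .xls, .ppt).
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def sniff_word_format(data: bytes) -> str:
    """Classify raw file bytes by container type (hypothetical helper).

    Returns "doc" for an OLE2 compound file, "docx" for a ZIP archive that
    contains word/document.xml, "zip-other" for any other ZIP archive, and
    "unknown" for everything else.
    """
    # Check the OLE2 signature first: zipfile can also accept files that
    # merely have ZIP data embedded near the end.
    if data.startswith(OLE2_MAGIC):
        return "doc"
    buffer = io.BytesIO(data)
    if zipfile.is_zipfile(buffer):
        with zipfile.ZipFile(buffer) as archive:
            if "word/document.xml" in archive.namelist():
                return "docx"
        return "zip-other"
    return "unknown"
```

With a check like this, an upload classified as "doc" could be routed to a converter or rejected with a clearer message instead of surfacing the raw KeyError.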
Method 2: unzip and parse
Based on the format of Word (.docx) files, the body text can be extracted with the following steps:
Unzip the .docx file.
Parse word/document.xml with BeautifulSoup to extract the body text.
Example code:
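from zipfile import ZipFile
from bs4 import BeautifulSoup
# A .docx file is a ZIP archive; the body text is stored in word/document.xml
document = ZipFile(r"test.docx")
xml = document.read("word/document.xml")
# Name the parser explicitly; html.parser keeps "w:t" as a single tag name
wordObj = BeautifulSoup(xml.decode("utf-8"), "html.parser")
texts = wordObj.find_all("w:t")
for text in texts:
    print(text.text)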
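The same unzip-and-parse idea can be sketched with only the standard library, which avoids the BeautifulSoup dependency and groups the text by paragraph. This is an illustration under stated assumptions, not the method from the linked posts: the helper name `docx_paragraphs` is hypothetical, and it only handles .docx input, so a legacy .doc file would still fail at the ZipFile step.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the w: prefix in word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_paragraphs(source) -> list[str]:
    """Return the text of each w:p element in a .docx file (hypothetical helper).

    `source` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as archive:
        # Raises KeyError if the archive has no word/document.xml member.
        root = ET.fromstring(archive.read("word/document.xml"))

    paragraphs = []
    for paragraph in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is split across runs; join every w:t inside it.
        runs = (node.text or "" for node in paragraph.iter(f"{{{W_NS}}}t"))
        paragraphs.append("".join(runs))
    return paragraphs
```

Calling `docx_paragraphs("test.docx")` would return one string per paragraph. Text inside tables is included because table cells also contain w:p elements.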
Actual Behavior:
[Describe what actually happened.]
Environment
Open WebUI Version: [e.g., 0.1.120]
Ollama (if applicable): [e.g., 0.1.30, 0.1.32-rc1]
Operating System: [e.g., Windows 10, macOS Big Sur, Ubuntu 20.04]
Browser (if applicable): [e.g., Chrome 100.0, Firefox 98.0]
Reproduction Details
Confirmation:
Logs and Screenshots
Browser Console Logs:
"Something went wrong :/\n"There is no item named 'word/document.xml' in the archive""
[Include relevant browser console logs, if applicable]
Docker Container Logs:
[Include relevant Docker container logs, if applicable]
Screenshots (if applicable):
[Attach any relevant screenshots to help illustrate the issue]
Installation Method
[Describe the method you used to install the project, e.g., manual installation, Docker, package manager, etc.]
Additional Information
[Include any additional details that may help in understanding and reproducing the issue. This could include specific configurations, error messages, or anything else relevant to the bug.]
Note
If the bug report is incomplete or does not follow the provided instructions, it may not be addressed. Please ensure that you have followed the steps outlined in the README.md and troubleshooting.md documents, and provide all necessary information for us to reproduce and address the issue. Thank you!