mirror of
https://github.com/open-webui/open-webui.git
synced 2026-05-06 19:08:59 -05:00
[GH-ISSUE #13729] issue: /api/v1/files/ upload succeeds but files are not processed (data.content is missing, hash is null) #32541
Originally created by @LosaLosSantos on GitHub (May 9, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/13729
Installation Method
Docker
Open WebUI Version
v0.6.7
Ollama Version (if applicable)
No response
Operating System
Server (OpenWebUI backend): Linux (Docker container)
Browser (if applicable)
No response
Expected Behavior
When uploading a file via the /api/v1/files/ endpoint and adding it to a knowledge base via /api/v1/knowledge/{knowledge_id}/file/add, the file should:
Be correctly parsed and chunked by the backend
Populate data.content and hash fields in the files table
Be usable immediately for retrieval-augmented generation (RAG)
Actual Behavior
The file upload via /api/v1/files/ appears to succeed (returns 200 OK and file ID), but:
The file record in the database has data: {} and hash: null
No chunking or content extraction occurs
Attempting to associate the file with a knowledge base results in:
{"detail": "Extracted content is not available for this file. Please ensure that the file is processed before proceeding."}
Thus, the document is not usable in the chat or completions API
Steps to Reproduce
Upload a file via the /api/v1/files/ endpoint.
Confirm that the upload returns a valid file ID.
Check the file record in the DB (files table): fields data.content and hash are missing.
Try to add the file to a knowledge collection via /api/v1/knowledge/{knowledge_id}/file/add. The request fails with:
{"detail": "Extracted content is not available for this file. Please ensure that the file is processed before proceeding."}
Logs & Screenshots
Relevant backend logs:
2025-05-09 10:47:58.848 | ERROR | open_webui.routers.files:upload_file:160 - Error processing file: d16de88a-e0d4-4370-9534-60199f3a3849 - {}
2025-05-09 10:47:58.849 | INFO | "POST /api/v1/files/ HTTP/1.1" 200 -
2025-05-09 10:47:58.889 | INFO | "POST /api/v1/knowledge/.../file/add HTTP/1.1" 400 -
This error happens silently (returns 200), and causes downstream issues in RAG workflows, where files appear uploaded but are never chunked or usable in collections.
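Because the upload returns 200 even when processing fails, a client can check the returned record itself before calling the knowledge endpoint. The following is a minimal sketch, assuming the JSON returned by the upload mirrors the data.content and hash fields of the files table described above; the helper name is_processed and the "healthy" record are illustrative only.

```python
from typing import Any, Mapping


def is_processed(file_record: Mapping[str, Any]) -> bool:
    """Return True only if the record carries extracted content and a hash."""
    # "data" may be missing, null, or an empty object when processing failed.
    data = file_record.get("data") or {}
    has_content = bool(data.get("content"))
    has_hash = bool(file_record.get("hash"))
    return has_content and has_hash


# The failing state described in this report: data is {} and hash is null.
broken = {"id": "d16de88a-e0d4-4370-9534-60199f3a3849", "data": {}, "hash": None}
# A hypothetical healthy record, for contrast (values are made up).
healthy = {"id": "example-id", "data": {"content": "extracted text"}, "hash": "abc123"}

print(is_processed(broken))   # False
print(is_processed(healthy))  # True
```

A client could refuse to call /api/v1/knowledge/{knowledge_id}/file/add when this check fails, rather than relying on the HTTP status of the upload.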
Additional Information
No response
@zhizhi-name commented on GitHub (May 9, 2025):
I ran into the same bug when calling the upload-file API and using file chat.
2025-05-09 21:46:05.181 | INFO | open_webui.routers.files:upload_file:91 - file.content_type: None - {}
2025-05-09 21:46:05.194 | ERROR | open_webui.routers.files:upload_file:159 - 'NoneType' object has no attribute 'startswith' - {}
Traceback (most recent call last):
File "C:\Users-\AppData\Local\Programs\Python\Python311\Lib\threading.py", line 1002, in _bootstrap
self._bootstrap_inner()
│ └ <function Thread._bootstrap_inner at 0x000001BDB7DB5A80>
└ <WorkerThread(AnyIO worker thread, started 37656)>
File "C:\Users-\AppData\Local\Programs\Python\Python311\Lib\threading.py", line 1045, in _bootstrap_inner
self.run()
│ └ <function WorkerThread.run at 0x000001BD9793FBA0>
└ <WorkerThread(AnyIO worker thread, started 37656)>
File "C:\Users-\AppData\Local\Programs\Python\Python311\Lib\site-packages\anyio\_backends\_asyncio.py", line 962, in run
result = context.run(func, *args)
│ │ │ └ ()
│ │ └ functools.partial(<function upload_file at 0x000001BDEAF5CAE0>, user=UserModel(id='-', nam...
│ └ <method 'run' of '_contextvars.Context' objects>
└ <_contextvars.Context object at 0x000001BD979B61C0>
AttributeError: 'NoneType' object has no attribute 'startswith'
2025-05-09 21:46:05.199 | ERROR | open_webui.routers.files:upload_file:160 - Error processing file: - - {}
2025-05-09 21:46:05.201 | INFO | uvicorn.protocols.http.httptools_impl:send:476 - 192.168.1.-:17099 - "POST /api/v1/files/ HTTP/1.1" 200 - {}
@feder-cr commented on GitHub (May 9, 2025):
Yes, I also have the same bug. It has been there since the previous version and has not been fixed. 🤬
@zhizhi-name commented on GitHub (May 9, 2025):
0.6.5 was good. I didn't try 0.6.6, though.
@LosaLosSantos commented on GitHub (May 9, 2025):
The issue was introduced in 0.6.6 [ #13600 ]
@tjbck commented on GitHub (May 10, 2025):
What files are you uploading for this?
@LosaLosSantos commented on GitHub (May 10, 2025):
.pdf and .docx in my case.
@athoik commented on GitHub (May 10, 2025):
Change Content Extraction Engine to docling and enter, e.g., a localhost IP, with no docling server listening (up and running) on localhost.
That will force an error: no data is returned from docling, and an exception is raised when posting to the REST API endpoint.
The file will still be uploaded to the filesystem and the DB, but it will have a null hash!
@zhizhi-name commented on GitHub (May 10, 2025):
In my case, I upload a PDF file, but file.content_type returns None.
The malfunction still exists in 0.6.8 and 0.6.9
@athoik commented on GitHub (May 11, 2025):
Hi,
I tried several uploads with the docling service down.
The file was never inserted in the Knowledge Base, and that's correct.
However, the file was inserted multiple times in the DB and in the uploads folder.
That happens because a new UUID is created on each upload.
0cef844168/backend/open_webui/routers/files.py (L99)
I believe we need to hash files using SHA-256 (or SHA-512) and include the Knowledge Base Id on files.
That would make files unique per Knowledge Base and resolve the above issue as well.
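The proposal above can be sketched as a deduplication key built from the file's SHA-256 digest and the Knowledge Base Id. This is only an illustration of the idea, not code from the project: the function name file_dedup_key and the "id:digest" key layout are assumptions.

```python
import hashlib


def file_dedup_key(content: bytes, knowledge_id: str) -> str:
    """Build a key that is stable for the same bytes within one knowledge base."""
    # SHA-256 of the raw bytes: identical uploads yield the identical digest,
    # unlike a fresh UUID generated per upload.
    digest = hashlib.sha256(content).hexdigest()
    # Hypothetical layout: scope the digest by knowledge base id.
    return f"{knowledge_id}:{digest}"


payload = b"same file contents"
first = file_dedup_key(payload, "kb-1")
second = file_dedup_key(payload, "kb-1")
other_kb = file_dedup_key(payload, "kb-2")

print(first == second)    # True: repeated uploads map to one key
print(first == other_kb)  # False: the same file may exist once per knowledge base
```

With such a key, a retried upload after a failed extraction could be matched to the existing row instead of creating another copy in the DB and uploads folder.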
@LosaLosSantos commented on GitHub (May 14, 2025):
Is there any update?
@g3rard-j commented on GitHub (May 14, 2025):
For me it worked when I explicitly passed the content type with the request.
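A minimal sketch of passing the content type explicitly with the requests library, using its (filename, file object, content type) tuple form for multipart uploads. The form field name "file", the Bearer token header, and the helper names build_file_part and upload_file are assumptions for illustration; only the /api/v1/files/ path comes from this thread.

```python
import mimetypes
from pathlib import Path
from typing import BinaryIO, Tuple

# Fallback used when the extension is unknown, so a content type is always sent.
DEFAULT_CONTENT_TYPE = "application/octet-stream"


def build_file_part(path: str, handle: BinaryIO) -> Tuple[str, BinaryIO, str]:
    """Return a (filename, fileobj, content_type) tuple for a multipart upload."""
    guessed, _encoding = mimetypes.guess_type(path)
    return (Path(path).name, handle, guessed or DEFAULT_CONTENT_TYPE)


def upload_file(base_url: str, token: str, path: str) -> dict:
    """Upload one file with an explicit content type (hypothetical helper)."""
    import requests  # third-party; imported lazily so the helper above stays stdlib-only

    with open(path, "rb") as handle:
        response = requests.post(
            f"{base_url}/api/v1/files/",
            headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
            files={"file": build_file_part(path, handle)},  # field name is an assumption
            timeout=60,
        )
    response.raise_for_status()
    return response.json()
```

Calling upload_file("http://localhost:3000", token, "report.pdf") would then send application/pdf for the file part; the base URL here is a placeholder.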
I realized when I used the curl command from the docs that it was working fine, with the response showing "content_type":"application/pdf" (for a PDF document); however, the response from the request made with the requests library shows 'content_type': None.
Upon further research I found out that requests will guess the filename from the file object, then use the mimetypes module to pick a content type, unless it can't figure one out, in which case it silently omits the header.
@zhizhi-name commented on GitHub (May 14, 2025):
Does it solve the RAG issue after you upload the PDF?
In my case, I changed "file.content_type.startswith" back to "file.content_type in".
This could solve AttributeError: 'NoneType' object has no attribute 'startswith'
However, the RAG still does not work as the document is still not processed.
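The AttributeError itself comes from calling startswith on a content type that is None. A None-safe prefix check can be sketched as below; the prefixes in MEDIA_PREFIXES and the function name has_media_type are hypothetical, since the thread does not show which values the server compares against. As noted above, such a guard only avoids the exception and does not by itself make the document get processed.

```python
from typing import Optional, Tuple

# Hypothetical prefixes, for illustration only.
MEDIA_PREFIXES: Tuple[str, ...] = ("audio/", "video/")


def has_media_type(content_type: Optional[str]) -> bool:
    """Prefix check that tolerates a missing (None or empty) content type."""
    # bool(None) and bool("") are False, so startswith is never called on None.
    return bool(content_type) and content_type.startswith(MEDIA_PREFIXES)


print(has_media_type(None))               # False, no AttributeError
print(has_media_type("audio/mpeg"))       # True
print(has_media_type("application/pdf"))  # False
```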
@tjbck commented on GitHub (May 14, 2025):
Should be addressed with 32ea31144e in dev, testing wanted here!
@zhizhi-name commented on GitHub (May 19, 2025):
The problem is addressed. However, the chat/completions API returns a new response about sources, so the code needs to be changed to process source information.
@bilalnazirraja commented on GitHub (May 20, 2025):
Has the problem been resolved? I just updated to the latest release (0.6.10) and still face the same error, even when I explicitly specify the file type I am uploading.
@LosaLosSantos commented on GitHub (May 20, 2025):
The problem is still there
@zhizhi-name commented on GitHub (May 20, 2025):
You may refer to my case:
The sources response looks like this:
{
  "sources": [
    {
      "source": {
        "type": "string",
        "id": "string"
      },
      "document": [
        "string",
        ...
      ],
      "metadata": [
        {
          "author": "string",
          "comments": "string",
          "company": "string",
          "created_by": "string",
          "creationdate": "string (ISO 8601 datetime)",
          "creator": "string",
          "embedding_config": "string (JSON-encoded)",
          "file_id": "string",
          "hash": "string",
          "keywords": "string",
          "moddate": "string (ISO 8601 datetime)",
          "name": "string",
          "page": "integer",
          "page_label": "string or integer",
          "producer": "string",
          "source": "string",
          "sourcemodified": "string",
          "start_index": "integer",
          "subject": "string",
          "title": "string",
          "total_pages": "integer",
          "trapped": "string"
        },
        ...
      ]
    }
  ]
}
After this sources response, you will get the normal stream chat delta responses.
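Client code that consumes the stream therefore has to tell the sources object apart from the chat deltas. A rough sketch follows, assuming the stream arrives as "data: {json}" lines terminated by "data: [DONE]" and that deltas use the OpenAI-style choices[].delta.content layout; both are assumptions not confirmed in this thread, and split_stream is a hypothetical helper.

```python
import json
from typing import Any, Dict, Iterable, List, Tuple


def split_stream(lines: Iterable[str]) -> Tuple[List[Dict[str, Any]], str]:
    """Separate the leading sources event from the streamed text deltas."""
    sources: List[Dict[str, Any]] = []
    text_parts: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and anything unexpected
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        event = json.loads(payload)
        if "sources" in event:
            # The sources object described above: collect it, emit no text.
            sources.extend(event["sources"])
            continue
        for choice in event.get("choices", []):
            content = (choice.get("delta") or {}).get("content")
            if content:
                text_parts.append(content)
    return sources, "".join(text_parts)


# Made-up sample stream, shaped like the response described above.
sample = [
    'data: {"sources": [{"source": {"type": "file", "id": "f1"}, "document": ["chunk"], "metadata": [{"page": 1}]}]}',
    'data: {"choices": [{"delta": {"content": "Hello"}}]}',
    'data: {"choices": [{"delta": {"content": " world"}}]}',
    "data: [DONE]",
]
found_sources, answer = split_stream(sample)
print(len(found_sources), answer)
```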