mirror of
https://github.com/open-webui/open-webui.git
synced 2026-05-23 10:02:10 -05:00
[GH-ISSUE #8800] KeyError: '/Filter' during Embedding with Potential Memory Leak #69869
Originally created by @Jehyun97 on GitHub (Jan 23, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/8800
## Description
Embedding fails with certain PDF files. These PDFs appear to cause issues because the LLM cannot properly respond to their content. Whenever the error is triggered, there also seems to be a memory leak: RAM usage keeps increasing and does not decrease over time.
The issue is observed while running the embedding process in the environment listed below.
## Steps to Reproduce
1. Install OpenWebUI using `pip install open-webui`.
2. Set the embedding model to llava:34B in the configuration.
3. Attempt to embed a PDF file that contains complex or unsupported content.
4. Monitor for errors during the embedding process and observe RAM usage after the error occurs.
## Expected Behavior
The embedding process should gracefully handle PDF files, even if the LLM cannot respond to certain content.
RAM usage should remain stable and decrease once the embedding process completes.
## Actual Behavior
The embedding process fails with specific PDF files, producing an error.
RAM usage increases significantly and does not decrease after the error occurs, suggesting a memory leak.
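One way to narrow down the suspected leak is to check whether memory held by Python objects grows across repeated failing calls. The sketch below uses the standard-library `tracemalloc` module; `traced_growth` and the callable passed to it are hypothetical names, not part of Open WebUI. Note that `tracemalloc` only sees allocations made through Python's allocator, so growth inside native libraries or in the separate Ollama process would not show up here.

```python
import gc
import tracemalloc


def traced_growth(fn, repeats=5):
    """Call fn repeatedly and return the traced bytes in use after each call.

    Exceptions raised by fn are swallowed on purpose: the point is to see
    whether memory is still held after a failing call, not to handle the error.
    """
    tracemalloc.start()
    try:
        usage = []
        for _ in range(repeats):
            try:
                fn()
            except Exception:
                pass  # the failure is expected; only memory is of interest
            gc.collect()  # drop unreachable objects before measuring
            current, _peak = tracemalloc.get_traced_memory()
            usage.append(current)
        return usage
    finally:
        tracemalloc.stop()
```

If the returned values climb steadily when `fn` wraps the failing upload, something in the Python process is retaining references; if they stay flat while the process RSS still grows, the growth is more likely outside the Python heap.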
## Environment
- Hardware: NVIDIA RTX 4090, 128 GB RAM
- Software:
  - Ollama (Local)
  - OpenWebUI (installed via pip)
- Embedding Model: llava:34B
## Error log
INFO: Started server process [24284]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)
INFO: 127.0.0.1:50572 - "GET /workspace/knowledge HTTP/1.1" 200 OK
INFO: 127.0.0.1:50572 - "GET /static/splash.png HTTP/1.1" 200 OK
INFO: 127.0.0.1:50572 - "GET /api/config HTTP/1.1" 200 OK
INFO: 127.0.0.1:50573 - "GET /static/favicon.png HTTP/1.1" 200 OK
INFO: 127.0.0.1:50572 - "GET /api/v1/auths/ HTTP/1.1" 200 OK
INFO: 127.0.0.1:50572 - "GET /api/config HTTP/1.1" 200 OK
INFO: 127.0.0.1:50572 - "GET /api/changelog HTTP/1.1" 200 OK
INFO: 127.0.0.1:50572 - "GET /api/v1/users/user/settings HTTP/1.1" 200 OK
INFO [open_webui.routers.openai] get_all_models()
INFO [open_webui.routers.ollama] get_all_models()
INFO: ('127.0.0.1', 50580) - "WebSocket /ws/socket.io/?EIO=4&transport=websocket" [accepted]
INFO: connection open
INFO: 127.0.0.1:50572 - "GET /api/models HTTP/1.1" 200 OK
INFO: 127.0.0.1:50572 - "GET /api/v1/configs/banners HTTP/1.1" 200 OK
INFO: 127.0.0.1:50572 - "GET /api/v1/tools/ HTTP/1.1" 200 OK
INFO: 127.0.0.1:50572 - "GET /api/v1/channels/ HTTP/1.1" 200 OK
INFO: 127.0.0.1:50578 - "GET /static/favicon.png HTTP/1.1" 304 Not Modified
INFO: 127.0.0.1:50573 - "GET /api/v1/knowledge/ HTTP/1.1" 200 OK
INFO: 127.0.0.1:50573 - "GET /api/v1/chats/all/tags HTTP/1.1" 200 OK
INFO: 127.0.0.1:50573 - "GET /api/v1/chats/pinned HTTP/1.1" 200 OK
INFO: 127.0.0.1:50578 - "GET /api/v1/knowledge/list HTTP/1.1" 200 OK
INFO: 127.0.0.1:50572 - "GET /api/v1/chats/?page=1 HTTP/1.1" 200 OK
INFO: 127.0.0.1:50573 - "GET /api/v1/folders/ HTTP/1.1" 200 OK
INFO: 127.0.0.1:50573 - "GET /api/v1/chats/?page=2 HTTP/1.1" 200 OK
INFO: 127.0.0.1:50573 - "GET /api/v1/chats/61b0c8ad-e11f-42a7-8a2c-746afee17f6e HTTP/1.1" 200 OK
INFO: 127.0.0.1:50572 - "GET /api/v1/chats/61b0c8ad-e11f-42a7-8a2c-746afee17f6e HTTP/1.1" 200 OK
INFO: 127.0.0.1:50572 - "GET /api/v1/chats/2dae32f7-2d1a-420e-9241-1d7285eebea5 HTTP/1.1" 200 OK
INFO: 127.0.0.1:50572 - "GET /api/v1/chats/be92c1a2-6999-415a-b442-c935348d5e72 HTTP/1.1" 200 OK
INFO: 127.0.0.1:50572 - "GET /api/v1/chats/49142232-612c-4716-8a22-5d9bedab8cd7 HTTP/1.1" 200 OK
INFO: 127.0.0.1:50572 - "GET /api/v1/chats/791474d2-5fca-41db-995a-1519753f6ccb HTTP/1.1" 200 OK
INFO: 127.0.0.1:50572 - "GET /static/favicon.png HTTP/1.1" 304 Not Modified
INFO: 127.0.0.1:50578 - "GET /static/favicon.png HTTP/1.1" 304 Not Modified
INFO: 127.0.0.1:50572 - "GET /api/v1/models/ HTTP/1.1" 200 OK
INFO: 127.0.0.1:50572 - "GET /api/v1/groups/ HTTP/1.1" 200 OK
INFO [open_webui.routers.openai] get_all_models()
INFO [open_webui.routers.ollama] get_all_models()
INFO: 127.0.0.1:50578 - "GET /static/favicon.png HTTP/1.1" 304 Not Modified
INFO: 127.0.0.1:50578 - "GET /api/v1/users/user/settings HTTP/1.1" 200 OK
INFO: 127.0.0.1:50578 - "GET /static/favicon.png HTTP/1.1" 304 Not Modified
INFO: 127.0.0.1:50578 - "GET /api/v1/users/user/settings HTTP/1.1" 200 OK
INFO: 127.0.0.1:50573 - "GET /api/models HTTP/1.1" 200 OK
INFO: 127.0.0.1:50572 - "GET /ollama/api/version HTTP/1.1" 200 OK
INFO: 127.0.0.1:50572 - "POST /api/v1/chats/new HTTP/1.1" 200 OK
INFO: 127.0.0.1:50572 - "GET /api/v1/chats/?page=1 HTTP/1.1" 200 OK
INFO: 127.0.0.1:50573 - "GET /static/favicon.png HTTP/1.1" 304 Not Modified
INFO: 127.0.0.1:50572 - "POST /api/v1/chats/87394621-d56a-47a6-85fb-5da516999dea HTTP/1.1" 200 OK
INFO: 127.0.0.1:50573 - "GET /static/favicon.png HTTP/1.1" 304 Not Modified
INFO: 127.0.0.1:50572 - "GET /api/v1/chats/?page=1 HTTP/1.1" 200 OK
INFO: 127.0.0.1:50572 - "POST /api/chat/completions HTTP/1.1" 200 OK
INFO: 127.0.0.1:50572 - "GET /api/v1/chats/?page=1 HTTP/1.1" 200 OK
INFO: 127.0.0.1:50572 - "POST /api/chat/completed HTTP/1.1" 200 OK
INFO: 127.0.0.1:50572 - "POST /api/v1/chats/87394621-d56a-47a6-85fb-5da516999dea HTTP/1.1" 200 OK
INFO: 127.0.0.1:50572 - "GET /api/v1/chats/?page=1 HTTP/1.1" 200 OK
INFO: 127.0.0.1:50572 - "GET /api/v1/chats/?page=1 HTTP/1.1" 200 OK
INFO: 127.0.0.1:50625 - "GET /api/v1/chats/87394621-d56a-47a6-85fb-5da516999dea HTTP/1.1" 200 OK
INFO: 127.0.0.1:50625 - "GET /api/v1/chats/all/tags HTTP/1.1" 200 OK
INFO: 127.0.0.1:50649 - "GET /api/v1/chats/0a24d42a-3bbf-4eaf-95a3-dc3a461c8564 HTTP/1.1" 200 OK
INFO: 127.0.0.1:50650 - "GET /api/v1/chats/61b0c8ad-e11f-42a7-8a2c-746afee17f6e HTTP/1.1" 200 OK
INFO: 127.0.0.1:50649 - "GET /_app/immutable/nodes/14.BbMNt56o.js HTTP/1.1" 304 Not Modified
INFO: 127.0.0.1:50649 - "GET /static/favicon.png HTTP/1.1" 304 Not Modified
INFO: 127.0.0.1:50650 - "GET /api/v1/users/user/settings HTTP/1.1" 200 OK
INFO: 127.0.0.1:50650 - "GET /static/favicon.png HTTP/1.1" 304 Not Modified
INFO [open_webui.routers.openai] get_all_models()
INFO: 127.0.0.1:50649 - "GET /api/v1/models/ HTTP/1.1" 200 OK
INFO: 127.0.0.1:50649 - "GET /api/v1/groups/ HTTP/1.1" 200 OK
INFO [open_webui.routers.ollama] get_all_models()
INFO: 127.0.0.1:50650 - "GET /api/models HTTP/1.1" 200 OK
INFO: 127.0.0.1:50649 - "GET /api/v1/knowledge/ HTTP/1.1" 200 OK
INFO: 127.0.0.1:50650 - "GET /api/v1/knowledge/list HTTP/1.1" 200 OK
INFO: 127.0.0.1:50650 - "GET /api/v1/chats/4a909505-cf4a-4276-9910-9c7050f96fe9 HTTP/1.1" 200 OK
INFO: 127.0.0.1:50650 - "GET /_app/immutable/nodes/27.BGknEtjO.js HTTP/1.1" 304 Not Modified
INFO: 127.0.0.1:50650 - "GET /_app/immutable/nodes/23.DVn2870Y.js HTTP/1.1" 200 OK
INFO: 127.0.0.1:50650 - "GET /api/v1/groups/ HTTP/1.1" 200 OK
INFO: 127.0.0.1:50651 - "POST /api/v1/knowledge/create HTTP/1.1" 200 OK
INFO: 127.0.0.1:50651 - "GET /api/v1/knowledge/ HTTP/1.1" 200 OK
INFO: 127.0.0.1:50651 - "GET /_app/immutable/nodes/22.NzlK6pcY.js HTTP/1.1" 304 Not Modified
INFO: 127.0.0.1:50652 - "GET /_app/immutable/chunks/AccessControlModal.BTOy_yWu.js HTTP/1.1" 304 Not Modified
INFO: 127.0.0.1:50651 - "GET /_app/immutable/assets/22.wEbTgpRj.css HTTP/1.1" 304 Not Modified
INFO: 127.0.0.1:50651 - "GET /api/v1/knowledge/db8d92c4-f0f4-4178-9a90-457180b21f40 HTTP/1.1" 200 OK
WARNI [python_multipart.multipart] Skipping data after last boundary
INFO [open_webui.routers.files] file.content_type: application/pdf
Filter type: /FlateDecode
Filter type: /FlateDecode
Filter type: /FlateDecode
Filter type: /FlateDecode
Filter type: /FlateDecode
Filter type: /FlateDecode
Filter type: /FlateDecode
Filter type: /FlateDecode
Filter type: /FlateDecode
INFO [open_webui.routers.retrieval] save_docs_to_vector_db: document 2014_07-30_NASA-STD-3001-Vol-1-Rev-A_published.pdf file-45a51dad-b6ac-4f76-a82f-34a35068caa0
Collection file-45a51dad-b6ac-4f76-a82f-34a35068caa0 does not exist.
INFO [open_webui.routers.retrieval] adding to collection file-45a51dad-b6ac-4f76-a82f-34a35068caa0
INFO: 127.0.0.1:50658 - "POST /api/v1/files/ HTTP/1.1" 200 OK
INFO [open_webui.routers.retrieval] save_docs_to_vector_db: document 2014_07-30_NASA-STD-3001-Vol-1-Rev-A_published.pdf db8d92c4-f0f4-4178-9a90-457180b21f40
Collection db8d92c4-f0f4-4178-9a90-457180b21f40 does not exist.
INFO [open_webui.routers.retrieval] adding to collection db8d92c4-f0f4-4178-9a90-457180b21f40
INFO: 127.0.0.1:50658 - "POST /api/v1/knowledge/db8d92c4-f0f4-4178-9a90-457180b21f40/file/add HTTP/1.1" 200 OK
WARNI [python_multipart.multipart] Skipping data after last boundary
INFO [open_webui.routers.files] file.content_type: application/pdf
Filter type: /FlateDecode
INFO [open_webui.routers.retrieval] save_docs_to_vector_db: document 2022-01-05-NASA-STD-3001-Vol1-Rev-B-Final-Draft-Signature-010522.pdf file-ed659d2e-50bd-40ea-ba53-48f9107aea18
Collection file-ed659d2e-50bd-40ea-ba53-48f9107aea18 does not exist.
INFO [open_webui.routers.retrieval] adding to collection file-ed659d2e-50bd-40ea-ba53-48f9107aea18
INFO: 127.0.0.1:50658 - "POST /api/v1/files/ HTTP/1.1" 200 OK
INFO [open_webui.routers.retrieval] save_docs_to_vector_db: document 2022-01-05-NASA-STD-3001-Vol1-Rev-B-Final-Draft-Signature-010522.pdf db8d92c4-f0f4-4178-9a90-457180b21f40
INFO [open_webui.routers.retrieval] collection db8d92c4-f0f4-4178-9a90-457180b21f40 already exists
INFO [open_webui.routers.retrieval] adding to collection db8d92c4-f0f4-4178-9a90-457180b21f40
INFO: 127.0.0.1:50658 - "POST /api/v1/knowledge/db8d92c4-f0f4-4178-9a90-457180b21f40/file/add HTTP/1.1" 200 OK
WARNI [python_multipart.multipart] Skipping data after last boundary
INFO [open_webui.routers.files] file.content_type: application/pdf
Filter type: /FlateDecode
Filter type: /FlateDecode
ERROR [open_webui.routers.retrieval] '/Filter'
Traceback (most recent call last):
File "D:\PYTHON.venv\Lib\site-packages\open_webui\routers\retrieval.py", line 884, in process_file
docs = loader.load(
^^^^^^^^^^^^
File "D:\PYTHON.venv\Lib\site-packages\open_webui\retrieval\loaders\main.py", line 127, in load
docs = loader.load()
^^^^^^^^^^^^^
File "D:\PYTHON.venv\Lib\site-packages\langchain_core\document_loaders\base.py", line 31, in load
return list(self.lazy_load())
^^^^^^^^^^^^^^^^^^^^^^
File "D:\PYTHON.venv\Lib\site-packages\langchain_community\document_loaders\pdf.py", line 257, in lazy_load
yield from self.parser.parse(blob)
^^^^^^^^^^^^^^^^^^^^^^^
File "D:\PYTHON.venv\Lib\site-packages\langchain_core\document_loaders\base.py", line 127, in parse
return list(self.lazy_parse(blob))
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\PYTHON.venv\Lib\site-packages\langchain_community\document_loaders\parsers\pdf.py", line 125, in lazy_parse
yield from [
^
File "D:\PYTHON.venv\Lib\site-packages\langchain_community\document_loaders\parsers\pdf.py", line 128, in <listcomp>
+ self._extract_images_from_page(page),
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\PYTHON.venv\Lib\site-packages\langchain_community\document_loaders\parsers\pdf.py", line 143, in _extract_images_from_page
if xObject[obj]["/Filter"][1:] in _PDF_FILTER_WITHOUT_LOSS:
~~~~~~~~~~~~^^^^^^^^^^^
File "D:\PYTHON.venv\Lib\site-packages\pypdf\generic\_data_structures.py", line 417, in __getitem__
return dict.__getitem__(self, key).get_object()
^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: '/Filter'
ERROR [open_webui.routers.files] 400: '/Filter'
Traceback (most recent call last):
File "D:\PYTHON.venv\Lib\site-packages\open_webui\routers\retrieval.py", line 884, in process_file
docs = loader.load(
^^^^^^^^^^^^
File "D:\PYTHON.venv\Lib\site-packages\open_webui\retrieval\loaders\main.py", line 127, in load
docs = loader.load()
^^^^^^^^^^^^^
File "D:\PYTHON.venv\Lib\site-packages\langchain_core\document_loaders\base.py", line 31, in load
return list(self.lazy_load())
^^^^^^^^^^^^^^^^^^^^^^
File "D:\PYTHON.venv\Lib\site-packages\langchain_community\document_loaders\pdf.py", line 257, in lazy_load
yield from self.parser.parse(blob)
^^^^^^^^^^^^^^^^^^^^^^^
File "D:\PYTHON.venv\Lib\site-packages\langchain_core\document_loaders\base.py", line 127, in parse
return list(self.lazy_parse(blob))
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\PYTHON.venv\Lib\site-packages\langchain_community\document_loaders\parsers\pdf.py", line 125, in lazy_parse
yield from [
^
File "D:\PYTHON.venv\Lib\site-packages\langchain_community\document_loaders\parsers\pdf.py", line 128, in <listcomp>
+ self._extract_images_from_page(page),
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\PYTHON.venv\Lib\site-packages\langchain_community\document_loaders\parsers\pdf.py", line 143, in _extract_images_from_page
if xObject[obj]["/Filter"][1:] in _PDF_FILTER_WITHOUT_LOSS:
~~~~~~~~~~~~^^^^^^^^^^^
File "D:\PYTHON.venv\Lib\site-packages\pypdf\generic\_data_structures.py", line 417, in __getitem__
return dict.__getitem__(self, key).get_object()
^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: '/Filter'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "D:\PYTHON.venv\Lib\site-packages\open_webui\routers\files.py", line 74, in upload_file
process_file(request, ProcessFileForm(file_id=id))
File "D:\PYTHON.venv\Lib\site-packages\open_webui\routers\retrieval.py", line 962, in process_file
raise HTTPException(
fastapi.exceptions.HTTPException: 400: '/Filter'
ERROR [open_webui.routers.files] Error processing file: 366d6cd4-b7c7-4d0b-9ed8-1b82b105f854
INFO: 127.0.0.1:50658 - "POST /api/v1/files/ HTTP/1.1" 200 OK
INFO: 127.0.0.1:50658 - "POST /api/v1/knowledge/db8d92c4-f0f4-4178-9a90-457180b21f40/file/add HTTP/1.1" 400 Bad Request
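The traceback ends at `xObject[obj]["/Filter"]`, which raises `KeyError` when an image XObject has no `/Filter` entry. In PDF, `/Filter` is optional on a stream dictionary (an unfiltered stream simply omits it), and when present it may be a single name or an array of names. A minimal sketch of a tolerant lookup follows, using plain dicts as stand-ins for pypdf objects. The helper names and the contents of `LOSSLESS_FILTERS` are assumptions for illustration; this is not the code in `langchain_community`.

```python
# Illustrative set only; the real _PDF_FILTER_WITHOUT_LOSS in
# langchain_community may contain different entries.
LOSSLESS_FILTERS = {"FlateDecode", "LZWDecode", "ASCII85Decode"}


def filter_names(xobj):
    """Return the stream's filter names without the leading '/'.

    Handles three cases: no /Filter key, a single name, or a list of names.
    """
    raw = xobj.get("/Filter")  # .get avoids the KeyError seen in the log
    if raw is None:
        return []
    if isinstance(raw, str):
        raw = [raw]
    return [str(name).lstrip("/") for name in raw]


def classify_image(xobj):
    """Classify an image stream as 'unfiltered', 'lossless', or 'other'."""
    names = filter_names(xobj)
    if not names:
        return "unfiltered"
    if all(name in LOSSLESS_FILTERS for name in names):
        return "lossless"
    return "other"
```

With a lookup like this, a page containing an unfiltered image could be skipped or handled separately instead of aborting the whole upload with a 400 response.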
@Classic298 commented on GitHub (Jan 23, 2025):
Well... LLaVA-34B is quite a large embedding model, with RAM usage going beyond 24 GB. That is the bare minimum requirement for low quantization; if you use high quantization, you are way below the minimum requirements needed to run the model.
Even though you have a 4090, for such large embedding models it is recommended to use multiple GPUs or much larger GPUs with more VRAM.
I think you should try to scale down the embedding model a bit.
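As a rough back-of-the-envelope check on the size concern, the memory needed just to hold the weights can be estimated from the parameter count and the bits stored per weight. The sketch below assumes a flat 34 billion parameters and ignores activations, KV cache, and runtime overhead, so real usage will be higher; `weight_gib` is a hypothetical helper.

```python
def weight_gib(params_billion, bits_per_weight):
    """Approximate memory for model weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


# Compare a few common precisions for a 34B-parameter model.
for bits in (4, 8, 16):
    print(f"{bits}-bit: {weight_gib(34, bits):.1f} GiB")
```

Comparing these figures with the 24 GB of VRAM on an RTX 4090 shows why parts of a model this size may spill into system RAM.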