[GH-ISSUE #13198] RAG not working with Hybrid Search (Reranking sometimes returns empty results) #55507

Closed
opened 2026-05-05 17:36:36 -05:00 by GiteaMirror · 5 comments
Owner

Originally created by @MikeNatC on GitHub (Apr 24, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/13198

Check Existing Issues

  • I have searched the existing issues and discussions.
  • I am using the latest version of Open WebUI.

Installation Method

Docker

Open WebUI Version

v0.6.5

Ollama Version (if applicable)

0.6.6

Operating System

Unraid 7.0.1

Browser (if applicable)

Chrome 135.0.7049.115

Confirmation

  • I have read and followed all instructions in README.md.
  • I am using the latest version of both Open WebUI and Ollama.
  • I have included the browser console logs.
  • I have included the Docker container logs.
  • I have listed steps to reproduce the bug in detail.

Expected Behavior

When a document from the knowledge base is loaded in a chat, I expect the model's response to take the document into account as part of its context.

Actual Behavior

The LLM responds that no context was provided when I use a prompt like "What is the context about?"

However, if I follow up with a second prompt that references something contained in the document, it suddenly recognises the existence of the context.

Steps to Reproduce

  1. Go to Knowledge Base, create a new project, and upload documents (sample document attached: [UNCITRAL Model Law.txt](https://github.com/user-attachments/files/19890569/UNCITRAL.Model.Law.txt)).
  2. Start a new chat and select a model.
  3. Ask the model "what is the context about?"
  4. Observe that the response indicates the context cannot be found.
  5. Follow up with a different question referencing something contained in the document.
  6. The response now recognises the existence of the context.

Logs & Screenshots

Screenshot of the Chat

![Image](https://github.com/user-attachments/assets/1e400c79-fac8-4c31-8077-22ac2273ded4)

Screenshot of Settings

Screenshots of Document Settings

![Image](https://github.com/user-attachments/assets/0624dcd6-98a3-4058-a443-b0311c9cb2b0)

![Image](https://github.com/user-attachments/assets/0392c4e4-bb72-4f1e-bc03-c342704f6d5a)

![Image](https://github.com/user-attachments/assets/b2b17e20-2058-459e-8d2a-cd5c8a9fda1e)

![Image](https://github.com/user-attachments/assets/f8485195-8323-4850-833b-c23a859395e6)

Open WebUI Docker Logs

2025-04-24 19:21:56.954 | INFO     | uvicorn.protocols.http.httptools_impl:send:476 - 10.10.4.1:0 - "GET /api/v1/prompts/ HTTP/1.1" 200 - {}
2025-04-24 19:21:56.980 | INFO     | uvicorn.protocols.http.httptools_impl:send:476 - 10.10.4.1:0 - "GET /api/v1/knowledge/ HTTP/1.1" 200 - {}
2025-04-24 19:22:05.015 | INFO     | uvicorn.protocols.http.httptools_impl:send:476 - 10.10.4.1:0 - "POST /api/v1/chats/new HTTP/1.1" 200 - {}
2025-04-24 19:22:05.046 | INFO     | uvicorn.protocols.http.httptools_impl:send:476 - 10.10.4.1:0 - "GET /api/v1/chats/?page=1 HTTP/1.1" 200 - {}
2025-04-24 19:22:05.085 | INFO     | uvicorn.protocols.http.httptools_impl:send:476 - 10.10.4.1:0 - "POST /api/v1/chats/09bfb55e-251d-4d70-9de6-7fb8b9f9cdc4 HTTP/1.1" 200 - {}
2025-04-24 19:22:05.086 | INFO     | uvicorn.protocols.http.httptools_impl:send:476 - 10.10.4.1:0 - "GET /static/favicon.png HTTP/1.1" 304 - {}
2025-04-24 19:22:05.117 | INFO     | uvicorn.protocols.http.httptools_impl:send:476 - 10.10.4.1:0 - "GET /api/v1/chats/?page=1 HTTP/1.1" 200 - {}
2025-04-24 19:22:05.140 | INFO     | uvicorn.protocols.http.httptools_impl:send:476 - 10.10.4.1:0 - "POST /api/v1/users/user/info/update HTTP/1.1" 200 - {}
2025-04-24 19:22:05.158 | INFO     | uvicorn.protocols.http.httptools_impl:send:476 - 10.10.4.1:0 - "POST /api/v1/users/user/info/update HTTP/1.1" 200 - {}
2025-04-24 19:22:11.765 | INFO     | uvicorn.protocols.http.httptools_impl:send:476 - 10.10.4.1:0 - "GET /api/v1/chats/8cc5bb09-8807-4e67-95de-0f40eb17a726 HTTP/1.1" 200 - {}
2025-04-24 19:22:20.625 | INFO     | open_webui.retrieval.utils:query_collection_with_hybrid_search:310 - Starting hybrid search for 3 queries in 1 collections... - {}
2025-04-24 19:22:23.665 | WARNING  | chromadb.segment.impl.vector.local_persistent_hnsw:query_vectors:423 - Number of requested results 40 is greater than number of elements in index 10, updating n_results = 10 - {}
2025-04-24 19:22:23.665 | WARNING  | chromadb.segment.impl.vector.local_persistent_hnsw:query_vectors:423 - Number of requested results 40 is greater than number of elements in index 10, updating n_results = 10 - {}
2025-04-24 19:22:23.669 | WARNING  | chromadb.segment.impl.vector.local_persistent_hnsw:query_vectors:423 - Number of requested results 40 is greater than number of elements in index 10, updating n_results = 10 - {}
2025-04-24 19:22:33.253 | INFO     | open_webui.retrieval.utils:query_doc_with_hybrid_search:168 - query_doc_with_hybrid_search:result [[]] [[]] - {}
2025-04-24 19:22:33.273 | INFO     | open_webui.retrieval.utils:query_doc_with_hybrid_search:168 - query_doc_with_hybrid_search:result [[]] [[]] - {}
2025-04-24 19:22:33.285 | INFO     | open_webui.retrieval.utils:query_doc_with_hybrid_search:168 - query_doc_with_hybrid_search:result [[]] [[]] - {}
2025-04-24 19:22:33.287 | INFO     | open_webui.routers.openai:get_all_models:389 - get_all_models() - {}
2025-04-24 19:22:33.328 | INFO     | uvicorn.protocols.http.httptools_impl:send:476 - 10.10.4.1:0 - "POST /api/chat/completions HTTP/1.1" 200 - {}
2025-04-24 19:22:33.376 | INFO     | uvicorn.protocols.http.httptools_impl:send:476 - 10.10.4.1:0 - "GET /api/v1/chats/?page=1 HTTP/1.1" 200 - {}
2025-04-24 19:22:35.167 | INFO     | uvicorn.protocols.http.httptools_impl:send:476 - 10.10.4.1:0 - "POST /api/chat/completed HTTP/1.1" 200 - {}
2025-04-24 19:22:35.170 | INFO     | uvicorn.protocols.http.httptools_impl:send:476 - 10.10.4.1:0 - "GET /static/favicon.png HTTP/1.1" 304 - {}
2025-04-24 19:22:35.205 | INFO     | uvicorn.protocols.http.httptools_impl:send:476 - 10.10.4.1:0 - "POST /api/v1/chats/09bfb55e-251d-4d70-9de6-7fb8b9f9cdc4 HTTP/1.1" 200 - {}
2025-04-24 19:22:35.226 | INFO     | uvicorn.protocols.http.httptools_impl:send:476 - 10.10.4.1:0 - "GET /api/v1/chats/?page=1 HTTP/1.1" 200 - {}
2025-04-24 19:22:36.059 | INFO     | uvicorn.protocols.http.httptools_impl:send:476 - 10.10.4.1:0 - "GET /api/v1/chats/?page=1 HTTP/1.1" 200 - {}
2025-04-24 19:22:36.089 | INFO     | uvicorn.protocols.http.httptools_impl:send:476 - 10.10.4.1:0 - "GET /api/v1/chats/?page=1 HTTP/1.1" 200 - {}
2025-04-24 19:22:36.101 | INFO     | uvicorn.protocols.http.httptools_impl:send:476 - 10.10.4.1:0 - "GET /api/v1/chats/?page=1 HTTP/1.1" 200 - {}
2025-04-24 19:22:36.113 | INFO     | uvicorn.protocols.http.httptools_impl:send:476 - 10.10.4.1:0 - "GET /api/v1/chats/?page=1 HTTP/1.1" 200 - {}
2025-04-24 19:22:36.798 | INFO     | uvicorn.protocols.http.httptools_impl:send:476 - 10.10.4.1:0 - "GET /api/v1/chats/09bfb55e-251d-4d70-9de6-7fb8b9f9cdc4 HTTP/1.1" 200 - {}
2025-04-24 19:22:36.805 | INFO     | uvicorn.protocols.http.httptools_impl:send:476 - 10.10.4.1:0 - "GET /api/v1/chats/all/tags HTTP/1.1" 200 - {}
2025-04-24 19:22:36.816 | INFO     | uvicorn.protocols.http.httptools_impl:send:476 - 10.10.4.1:0 - "GET /api/v1/chats/all/tags HTTP/1.1" 200 - {}
2025-04-24 19:22:36.827 | INFO     | uvicorn.protocols.http.httptools_impl:send:476 - 10.10.4.1:0 - "GET /api/v1/chats/all/tags HTTP/1.1" 200 - {}
2025-04-24 19:22:36.839 | INFO     | uvicorn.protocols.http.httptools_impl:send:476 - 10.10.4.1:0 - "GET /api/v1/chats/all/tags HTTP/1.1" 200 - {}
2025-04-24 19:23:25.780 | INFO     | uvicorn.protocols.http.httptools_impl:send:476 - 10.10.4.1:0 - "GET /api/v1/chats/09bfb55e-251d-4d70-9de6-7fb8b9f9cdc4 HTTP/1.1" 200 - {}
2025-04-24 19:24:06.278 | INFO     | uvicorn.protocols.http.httptools_impl:send:476 - 10.10.4.1:0 - "GET /api/v1/chats/bb517a35-33d5-4fa2-8302-6fe3255a81cf HTTP/1.1" 200 - {}
2025-04-24 19:24:23.456 | INFO     | uvicorn.protocols.http.httptools_impl:send:476 - 10.10.4.1:0 - "POST /api/v1/chats/09bfb55e-251d-4d70-9de6-7fb8b9f9cdc4 HTTP/1.1" 200 - {}
2025-04-24 19:24:23.456 | INFO     | uvicorn.protocols.http.httptools_impl:send:476 - 10.10.4.1:0 - "GET /static/favicon.png HTTP/1.1" 304 - {}
2025-04-24 19:24:23.484 | INFO     | uvicorn.protocols.http.httptools_impl:send:476 - 10.10.4.1:0 - "GET /api/v1/chats/?page=1 HTTP/1.1" 200 - {}
2025-04-24 19:24:23.512 | INFO     | uvicorn.protocols.http.httptools_impl:send:476 - 10.10.4.1:0 - "POST /api/v1/users/user/info/update HTTP/1.1" 200 - {}
2025-04-24 19:24:23.526 | INFO     | uvicorn.protocols.http.httptools_impl:send:476 - 10.10.4.1:0 - "POST /api/v1/users/user/info/update HTTP/1.1" 200 - {}
2025-04-24 19:24:24.609 | INFO     | uvicorn.protocols.http.httptools_impl:send:476 - 10.10.4.1:0 - "GET /api/v1/chats/516dc587-d2b1-4d17-9e1b-b06017014a89 HTTP/1.1" 200 - {}
2025-04-24 19:24:24.636 | INFO     | uvicorn.protocols.http.httptools_impl:send:476 - 10.10.4.1:0 - "GET /api/v1/chats/516dc587-d2b1-4d17-9e1b-b06017014a89 HTTP/1.1" 200 - {}
2025-04-24 19:24:24.648 | INFO     | uvicorn.protocols.http.httptools_impl:send:476 - 10.10.4.1:0 - "GET /api/v1/chats/21945f55-a115-4887-9ae8-2c27baa3fe2e HTTP/1.1" 200 - {}
2025-04-24 19:24:24.718 | INFO     | uvicorn.protocols.http.httptools_impl:send:476 - 10.10.4.1:0 - "GET /api/v1/chats/7c3f9738-6c89-4205-8329-4d0713050071 HTTP/1.1" 200 - {}
2025-04-24 19:24:25.011 | INFO     | open_webui.retrieval.utils:query_collection_with_hybrid_search:310 - Starting hybrid search for 3 queries in 1 collections... - {}
2025-04-24 19:24:25.091 | WARNING  | chromadb.segment.impl.vector.local_persistent_hnsw:query_vectors:423 - Number of requested results 40 is greater than number of elements in index 10, updating n_results = 10 - {}
2025-04-24 19:24:25.101 | WARNING  | chromadb.segment.impl.vector.local_persistent_hnsw:query_vectors:423 - Number of requested results 40 is greater than number of elements in index 10, updating n_results = 10 - {}
Batches:   0%|          | 0/1 [00:00<?, ?it/s]
2025-04-24 19:24:25.118 | WARNING  | chromadb.segment.impl.vector.local_persistent_hnsw:query_vectors:423 - Number of requested results 40 is greater than number of elements in index 10, updating n_results = 10 - {}
Batches: 100%|██████████| 1/1 [00:09<00:00,  9.33s/it]
2025-04-24 19:24:34.426 | INFO     | open_webui.retrieval.utils:query_doc_with_hybrid_search:168 - query_doc_with_hybrid_search:result [[{'created_by': '21568af8-e7dd-4175-a686-d692319e2e85', 'embedding_config': '{"engine": "ollama", "model": "bge-m3:latest"}', 'file_id': '72cf79e1-a1d3-4e6e-a088-45623a449073', 'hash': '533285e1f38ff39b3b3bd043931bce435e2cc87ffb416de10bf6cb72f263e04e', 'name': 'UNCITRAL Model Law.txt', 'source': 'UNCITRAL Model Law.txt', 'start_index': -1, 'score': 0.6334028244018555}, {'created_by': '21568af8-e7dd-4175-a686-d692319e2e85', 'embedding_config': '{"engine": "ollama", "model": "bge-m3:latest"}', 'file_id': '72cf79e1-a1d3-4e6e-a088-45623a449073', 'hash': '533285e1f38ff39b3b3bd043931bce435e2cc87ffb416de10bf6cb72f263e04e', 'name': 'UNCITRAL Model Law.txt', 'source': 'UNCITRAL Model Law.txt', 'start_index': -1, 'score': 0.43488165736198425}, {'created_by': '21568af8-e7dd-4175-a686-d692319e2e85', 'embedding_config': '{"engine": "ollama", "model": "bge-m3:latest"}', 'file_id': '72cf79e1-a1d3-4e6e-a088-45623a449073', 'hash': '533285e1f38ff39b3b3bd043931bce435e2cc87ffb416de10bf6cb72f263e04e', 'name': 'UNCITRAL Model Law.txt', 'source': 'UNCITRAL Model Law.txt', 'start_index': 0, 'score': 0.35497820377349854}, {'created_by': '21568af8-e7dd-4175-a686-d692319e2e85', 'embedding_config': '{"engine": "ollama", "model": "bge-m3:latest"}', 'file_id': '72cf79e1-a1d3-4e6e-a088-45623a449073', 'hash': '533285e1f38ff39b3b3bd043931bce435e2cc87ffb416de10bf6cb72f263e04e', 'name': 'UNCITRAL Model Law.txt', 'source': 'UNCITRAL Model Law.txt', 'start_index': -1, 'score': 0.3421934247016907}, {'created_by': '21568af8-e7dd-4175-a686-d692319e2e85', 'embedding_config': '{"engine": "ollama", "model": "bge-m3:latest"}', 'file_id': '72cf79e1-a1d3-4e6e-a088-45623a449073', 'hash': '533285e1f38ff39b3b3bd043931bce435e2cc87ffb416de10bf6cb72f263e04e', 'name': 'UNCITRAL Model Law.txt', 'source': 'UNCITRAL Model Law.txt', 'start_index': 6625, 'score': 0.30291610956192017}, 
{'created_by': '21568af8-e7dd-4175-a686-d692319e2e85', 'embedding_config': '{"engine": "ollama", "model": "bge-m3:latest"}', 'file_id': '72cf79e1-a1d3-4e6e-a088-45623a449073', 'hash': '533285e1f38ff39b3b3bd043931bce435e2cc87ffb416de10bf6cb72f263e04e', 'name': 'UNCITRAL Model Law.txt', 'source': 'UNCITRAL Model Law.txt', 'start_index': -1, 'score': 0.29727572202682495}, {'created_by': '21568af8-e7dd-4175-a686-d692319e2e85', 'embedding_config': '{"engine": "ollama", "model": "bge-m3:latest"}', 'file_id': '72cf79e1-a1d3-4e6e-a088-45623a449073', 'hash': '533285e1f38ff39b3b3bd043931bce435e2cc87ffb416de10bf6cb72f263e04e', 'name': 'UNCITRAL Model Law.txt', 'source': 'UNCITRAL Model Law.txt', 'start_index': -1, 'score': 0.27738073468208313}, {'created_by': '21568af8-e7dd-4175-a686-d692319e2e85', 'embedding_config': '{"engine": "ollama", "model": "bge-m3:latest"}', 'file_id': '72cf79e1-a1d3-4e6e-a088-45623a449073', 'hash': '533285e1f38ff39b3b3bd043931bce435e2cc87ffb416de10bf6cb72f263e04e', 'name': 'UNCITRAL Model Law.txt', 'source': 'UNCITRAL Model Law.txt', 'start_index': 28604, 'score': 0.21438729763031006}]] [[0.6334028244018555, 0.43488165736198425, 0.35497820377349854, 0.3421934247016907, 0.30291610956192017, 0.29727572202682495, 0.27738073468208313, 0.21438729763031006]] - {}
Batches: 100%|██████████| 1/1 [00:09<00:00,  9.35s/it]
2025-04-24 19:24:34.457 | INFO     | open_webui.retrieval.utils:query_doc_with_hybrid_search:168 - query_doc_with_hybrid_search:result [[{'created_by': '21568af8-e7dd-4175-a686-d692319e2e85', 'embedding_config': '{"engine": "ollama", "model": "bge-m3:latest"}', 'file_id': '72cf79e1-a1d3-4e6e-a088-45623a449073', 'hash': '533285e1f38ff39b3b3bd043931bce435e2cc87ffb416de10bf6cb72f263e04e', 'name': 'UNCITRAL Model Law.txt', 'source': 'UNCITRAL Model Law.txt', 'start_index': -1, 'score': 0.6205747723579407}, {'created_by': '21568af8-e7dd-4175-a686-d692319e2e85', 'embedding_config': '{"engine": "ollama", "model": "bge-m3:latest"}', 'file_id': '72cf79e1-a1d3-4e6e-a088-45623a449073', 'hash': '533285e1f38ff39b3b3bd043931bce435e2cc87ffb416de10bf6cb72f263e04e', 'name': 'UNCITRAL Model Law.txt', 'source': 'UNCITRAL Model Law.txt', 'start_index': -1, 'score': 0.4414413869380951}, {'created_by': '21568af8-e7dd-4175-a686-d692319e2e85', 'embedding_config': '{"engine": "ollama", "model": "bge-m3:latest"}', 'file_id': '72cf79e1-a1d3-4e6e-a088-45623a449073', 'hash': '533285e1f38ff39b3b3bd043931bce435e2cc87ffb416de10bf6cb72f263e04e', 'name': 'UNCITRAL Model Law.txt', 'source': 'UNCITRAL Model Law.txt', 'start_index': -1, 'score': 0.29927703738212585}, {'created_by': '21568af8-e7dd-4175-a686-d692319e2e85', 'embedding_config': '{"engine": "ollama", "model": "bge-m3:latest"}', 'file_id': '72cf79e1-a1d3-4e6e-a088-45623a449073', 'hash': '533285e1f38ff39b3b3bd043931bce435e2cc87ffb416de10bf6cb72f263e04e', 'name': 'UNCITRAL Model Law.txt', 'source': 'UNCITRAL Model Law.txt', 'start_index': 0, 'score': 0.25361162424087524}, {'created_by': '21568af8-e7dd-4175-a686-d692319e2e85', 'embedding_config': '{"engine": "ollama", "model": "bge-m3:latest"}', 'file_id': '72cf79e1-a1d3-4e6e-a088-45623a449073', 'hash': '533285e1f38ff39b3b3bd043931bce435e2cc87ffb416de10bf6cb72f263e04e', 'name': 'UNCITRAL Model Law.txt', 'source': 'UNCITRAL Model Law.txt', 'start_index': -1, 'score': 0.22711896896362305}, 
{'created_by': '21568af8-e7dd-4175-a686-d692319e2e85', 'embedding_config': '{"engine": "ollama", "model": "bge-m3:latest"}', 'file_id': '72cf79e1-a1d3-4e6e-a088-45623a449073', 'hash': '533285e1f38ff39b3b3bd043931bce435e2cc87ffb416de10bf6cb72f263e04e', 'name': 'UNCITRAL Model Law.txt', 'source': 'UNCITRAL Model Law.txt', 'start_index': 28604, 'score': 0.22512435913085938}, {'created_by': '21568af8-e7dd-4175-a686-d692319e2e85', 'embedding_config': '{"engine": "ollama", "model": "bge-m3:latest"}', 'file_id': '72cf79e1-a1d3-4e6e-a088-45623a449073', 'hash': '533285e1f38ff39b3b3bd043931bce435e2cc87ffb416de10bf6cb72f263e04e', 'name': 'UNCITRAL Model Law.txt', 'source': 'UNCITRAL Model Law.txt', 'start_index': -1, 'score': 0.22068163752555847}]] [[0.6205747723579407, 0.4414413869380951, 0.29927703738212585, 0.25361162424087524, 0.22711896896362305, 0.22512435913085938, 0.22068163752555847]] - {}
Batches: 100%|██████████| 1/1 [00:09<00:00,  9.38s/it]
2025-04-24 19:24:34.519 | INFO     | open_webui.retrieval.utils:query_doc_with_hybrid_search:168 - query_doc_with_hybrid_search:result [[]] [[]] - {}
Batches: 100%|██████████| 1/1 [00:09<00:00,  9.38s/it]
2025-04-24 19:24:34.526 | INFO     | open_webui.routers.openai:get_all_models:389 - get_all_models() - {}
2025-04-24 19:24:34.572 | INFO     | uvicorn.protocols.http.httptools_impl:send:476 - 10.10.4.1:0 - "POST /api/chat/completions HTTP/1.1" 200 - {}
2025-04-24 19:24:34.627 | INFO     | uvicorn.protocols.http.httptools_impl:send:476 - 10.10.4.1:0 - "GET /api/v1/chats/?page=1 HTTP/1.1" 200 - {}
2025-04-24 19:24:37.639 | INFO     | uvicorn.protocols.http.httptools_impl:send:476 - 10.10.4.1:0 - "POST /api/chat/completed HTTP/1.1" 200 - {}
2025-04-24 19:24:37.682 | INFO     | uvicorn.protocols.http.httptools_impl:send:476 - 10.10.4.1:0 - "POST /api/v1/chats/09bfb55e-251d-4d70-9de6-7fb8b9f9cdc4 HTTP/1.1" 200 - {}
2025-04-24 19:24:37.710 | INFO     | uvicorn.protocols.http.httptools_impl:send:476 - 10.10.4.1:0 - "GET /api/v1/chats/?page=1 HTTP/1.1" 200 - {}
2025-04-24 19:24:55.959 | INFO     | uvicorn.protocols.http.httptools_impl:send:476 - 10.10.4.1:0 - "GET /api/v1/chats/?page=2 HTTP/1.1" 200 - {}
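Note on the log above: the Chroma warnings are benign (they just clamp `n_results` from 40 down to the 10 chunks actually in the index), but the `query_doc_with_hybrid_search:result [[]] [[]]` lines show the hybrid search returning nothing for the first, vague prompt while later, more specific prompts return scored chunks. That pattern is consistent with every reranked candidate being dropped by a relevance-score threshold. A minimal, self-contained sketch of that failure mode (the names `Candidate` and `rerank_and_filter` are hypothetical, not Open WebUI's actual code):

```python
# Hypothetical sketch of post-rerank threshold filtering; illustrative only,
# not Open WebUI's real implementation.

from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    score: float  # relevance score assigned by the reranker


def rerank_and_filter(candidates, threshold):
    """Keep candidates scoring at or above the threshold, best first."""
    kept = [c for c in candidates if c.score >= threshold]
    return sorted(kept, key=lambda c: c.score, reverse=True)


# A vague query like "what is the context about?" can leave every chunk
# with a low score, so any non-zero threshold empties the result list and
# the prompt is built with no context at all. A specific query scores high
# enough to survive (scores mirror the ones seen in the logs above).
vague = [Candidate("chunk A", 0.04), Candidate("chunk B", 0.02)]
specific = [Candidate("chunk A", 0.63), Candidate("chunk B", 0.43)]

print([c.text for c in rerank_and_filter(vague, threshold=0.1)])     # []
print([c.text for c in rerank_and_filter(specific, threshold=0.1)])  # ['chunk A', 'chunk B']
```

If this is the cause, lowering (or zeroing) the relevance threshold in the document settings would make the first query return chunks again; that is a hypothesis to test, not a confirmed diagnosis.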

Ollama Docker Logs

llama_model_loader: - kv  26:          tokenizer.ggml.seperator_token_id u32              = 2
llama_model_loader: - kv  27:            tokenizer.ggml.padding_token_id u32              = 1
llama_model_loader: - kv  28:                tokenizer.ggml.cls_token_id u32              = 0
llama_model_loader: - kv  29:               tokenizer.ggml.mask_token_id u32              = 250001
llama_model_loader: - kv  30:               tokenizer.ggml.add_bos_token bool             = true
llama_model_loader: - kv  31:               tokenizer.ggml.add_eos_token bool             = true
llama_model_loader: - kv  32:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:  244 tensors
llama_model_loader: - type  f16:  145 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = F16
print_info: file size   = 1.07 GiB (16.25 BPW) 
load: model vocab missing newline token, using special_pad_id instead
load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
load: special tokens cache size = 4
load: token to piece cache size = 2.1668 MB
print_info: arch             = bert
print_info: vocab_only       = 1
print_info: model type       = ?B
print_info: model params     = 566.70 M
print_info: general.name     = n/a
print_info: vocab type       = UGM
print_info: n_vocab          = 250002
print_info: n_merges         = 0
print_info: BOS token        = 0 '<s>'
print_info: EOS token        = 2 '</s>'
print_info: UNK token        = 3 '<unk>'
print_info: SEP token        = 2 '</s>'
print_info: PAD token        = 1 '<pad>'
print_info: MASK token       = 250001 '[PAD250000]'
print_info: LF token         = 0 '<s>'
print_info: EOG token        = 2 '</s>'
print_info: max token length = 48
llama_model_load: vocab only - skipping tensors
time=2025-04-24T11:22:21.697Z level=INFO source=server.go:405 msg="starting llama server" cmd="/usr/bin/ollama runner --model /root/.ollama/models/blobs/sha256-daec91ffb5dd0c27411bd71f29932917c49cf529a641d0168496c3a501e3062c --ctx-size 2048 --batch-size 512 --n-gpu-layers 25 --threads 12 --parallel 1 --port 43699"
time=2025-04-24T11:22:21.697Z level=INFO source=sched.go:451 msg="loaded runners" count=2
time=2025-04-24T11:22:21.697Z level=INFO source=server.go:580 msg="waiting for llama runner to start responding"
time=2025-04-24T11:22:21.697Z level=INFO source=server.go:614 msg="waiting for server to become available" status="llm server error"
time=2025-04-24T11:22:21.707Z level=INFO source=runner.go:853 msg="starting go runner"
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
load_backend: loaded CUDA backend from /usr/lib/ollama/cuda_v12/libggml-cuda.so
load_backend: loaded CPU backend from /usr/lib/ollama/libggml-cpu-icelake.so
time=2025-04-24T11:22:21.756Z level=INFO source=ggml.go:109 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.AVX512=1 CPU.0.AVX512_VBMI=1 CPU.0.AVX512_VNNI=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc)
time=2025-04-24T11:22:21.759Z level=INFO source=runner.go:913 msg="Server listening on 127.0.0.1:43699"
llama_model_load_from_file_impl: using device CUDA0 (NVIDIA GeForce RTX 3090) - 6445 MiB free
llama_model_loader: loaded meta data with 33 key-value pairs and 389 tensors from /root/.ollama/models/blobs/sha256-daec91ffb5dd0c27411bd71f29932917c49cf529a641d0168496c3a501e3062c (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = bert
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                         general.size_label str              = 567M
llama_model_loader: - kv   3:                            general.license str              = mit
llama_model_loader: - kv   4:                               general.tags arr[str,4]       = ["sentence-transformers", "feature-ex...
llama_model_loader: - kv   5:                           bert.block_count u32              = 24
llama_model_loader: - kv   6:                        bert.context_length u32              = 8192
llama_model_loader: - kv   7:                      bert.embedding_length u32              = 1024
llama_model_loader: - kv   8:                   bert.feed_forward_length u32              = 4096
llama_model_loader: - kv   9:                  bert.attention.head_count u32              = 16
llama_model_loader: - kv  10:          bert.attention.layer_norm_epsilon f32              = 0.000010
llama_model_loader: - kv  11:                          general.file_type u32              = 1
llama_model_loader: - kv  12:                      bert.attention.causal bool             = false
llama_model_loader: - kv  13:                          bert.pooling_type u32              = 2
llama_model_loader: - kv  14:                       tokenizer.ggml.model str              = t5
llama_model_loader: - kv  15:                         tokenizer.ggml.pre str              = default
llama_model_loader: - kv  16:                      tokenizer.ggml.tokens arr[str,250002]  = ["<s>", "<pad>", "</s>", "<unk>", ","...
llama_model_loader: - kv  17:                      tokenizer.ggml.scores arr[f32,250002]  = [0.000000, 0.000000, 0.000000, 0.0000...
time=2025-04-24T11:22:21.949Z level=INFO source=server.go:614 msg="waiting for server to become available" status="llm server loading model"
llama_model_loader: - kv  18:                  tokenizer.ggml.token_type arr[i32,250002]  = [3, 3, 3, 2, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  19:            tokenizer.ggml.add_space_prefix bool             = true
llama_model_loader: - kv  20:            tokenizer.ggml.token_type_count u32              = 1
llama_model_loader: - kv  21:    tokenizer.ggml.remove_extra_whitespaces bool             = true
llama_model_loader: - kv  22:        tokenizer.ggml.precompiled_charsmap arr[u8,237539]   = [0, 180, 2, 0, 0, 132, 0, 0, 0, 0, 0,...
llama_model_loader: - kv  23:                tokenizer.ggml.bos_token_id u32              = 0
llama_model_loader: - kv  24:                tokenizer.ggml.eos_token_id u32              = 2
llama_model_loader: - kv  25:            tokenizer.ggml.unknown_token_id u32              = 3
llama_model_loader: - kv  26:          tokenizer.ggml.seperator_token_id u32              = 2
llama_model_loader: - kv  27:            tokenizer.ggml.padding_token_id u32              = 1
llama_model_loader: - kv  28:                tokenizer.ggml.cls_token_id u32              = 0
llama_model_loader: - kv  29:               tokenizer.ggml.mask_token_id u32              = 250001
llama_model_loader: - kv  30:               tokenizer.ggml.add_bos_token bool             = true
llama_model_loader: - kv  31:               tokenizer.ggml.add_eos_token bool             = true
llama_model_loader: - kv  32:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:  244 tensors
llama_model_loader: - type  f16:  145 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = F16
print_info: file size   = 1.07 GiB (16.25 BPW) 
load: model vocab missing newline token, using special_pad_id instead
load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
load: special tokens cache size = 4
load: token to piece cache size = 2.1668 MB
print_info: arch             = bert
print_info: vocab_only       = 0
print_info: n_ctx_train      = 8192
print_info: n_embd           = 1024
print_info: n_layer          = 24
print_info: n_head           = 16
print_info: n_head_kv        = 16
print_info: n_rot            = 64
print_info: n_swa            = 0
print_info: n_swa_pattern    = 1
print_info: n_embd_head_k    = 64
print_info: n_embd_head_v    = 64
print_info: n_gqa            = 1
print_info: n_embd_k_gqa     = 1024
print_info: n_embd_v_gqa     = 1024
print_info: f_norm_eps       = 1.0e-05
print_info: f_norm_rms_eps   = 0.0e+00
print_info: f_clamp_kqv      = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale    = 0.0e+00
print_info: f_attn_scale     = 0.0e+00
print_info: n_ff             = 4096
print_info: n_expert         = 0
print_info: n_expert_used    = 0
print_info: causal attn      = 0
print_info: pooling type     = 2
print_info: rope type        = 2
print_info: rope scaling     = linear
print_info: freq_base_train  = 10000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn  = 8192
print_info: rope_finetuned   = unknown
print_info: ssm_d_conv       = 0
print_info: ssm_d_inner      = 0
print_info: ssm_d_state      = 0
print_info: ssm_dt_rank      = 0
print_info: ssm_dt_b_c_rms   = 0
print_info: model type       = 335M
print_info: model params     = 566.70 M
print_info: general.name     = n/a
print_info: vocab type       = UGM
print_info: n_vocab          = 250002
print_info: n_merges         = 0
print_info: BOS token        = 0 '<s>'
print_info: EOS token        = 2 '</s>'
print_info: UNK token        = 3 '<unk>'
print_info: SEP token        = 2 '</s>'
print_info: PAD token        = 1 '<pad>'
print_info: MASK token       = 250001 '[PAD250000]'
print_info: LF token         = 0 '<s>'
print_info: EOG token        = 2 '</s>'
print_info: max token length = 48
load_tensors: loading model tensors, this can take a while... (mmap = true)
load_tensors: offloading 24 repeating layers to GPU
load_tensors: offloading output layer to GPU
load_tensors: offloaded 25/25 layers to GPU
load_tensors:        CUDA0 model buffer size =   577.22 MiB
load_tensors:   CPU_Mapped model buffer size =   520.30 MiB
llama_context: constructing llama_context
llama_context: n_seq_max     = 1
llama_context: n_ctx         = 2048
llama_context: n_ctx_per_seq = 2048
llama_context: n_batch       = 512
llama_context: n_ubatch      = 512
llama_context: causal_attn   = 0
llama_context: flash_attn    = 0
llama_context: freq_base     = 10000.0
llama_context: freq_scale    = 1
llama_context: n_ctx_per_seq (2048) < n_ctx_train (8192) -- the full capacity of the model will not be utilized
llama_context:  CUDA_Host  output buffer size =     0.00 MiB
init: kv_size = 2048, offload = 1, type_k = 'f16', type_v = 'f16', n_layer = 24, can_shift = 1
init:      CUDA0 KV buffer size =   192.00 MiB
llama_context: KV self size  =  192.00 MiB, K (f16):   96.00 MiB, V (f16):   96.00 MiB
llama_context:      CUDA0 compute buffer size =    27.01 MiB
llama_context:  CUDA_Host compute buffer size =     5.01 MiB
llama_context: graph nodes  = 825
llama_context: graph splits = 4 (with bs=512), 2 (with bs=1)
time=2025-04-24T11:22:23.203Z level=INFO source=server.go:619 msg="llama runner started in 1.51 seconds"
time=2025-04-24T11:22:23.544Z level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-04-24T11:22:23.545Z level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-04-24T11:22:23.545Z level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/04/24 - 11:22:23 | 200 |   3.01056943s |      172.18.0.1 | POST     "/api/embed"
[GIN] 2025/04/24 - 11:22:23 | 200 |  3.020382832s |      172.18.0.1 | POST     "/api/embed"
[GIN] 2025/04/24 - 11:22:23 | 200 |  3.027106525s |      172.18.0.1 | POST     "/api/embed"
time=2025-04-24T11:22:35.217Z level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/04/24 - 11:22:36 | 200 |  864.455719ms |      172.18.0.1 | POST     "/api/chat"
time=2025-04-24T11:22:36.091Z level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/04/24 - 11:22:36 | 200 |   702.61302ms |      172.18.0.1 | POST     "/api/chat"
time=2025-04-24T11:24:23.583Z level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/04/24 - 11:24:25 | 200 |  1.467083742s |      172.18.0.1 | POST     "/api/chat"
time=2025-04-24T11:24:25.049Z level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-04-24T11:24:25.049Z level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-04-24T11:24:25.049Z level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-04-24T11:24:25.075Z level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-04-24T11:24:25.076Z level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-04-24T11:24:25.076Z level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/04/24 - 11:24:25 | 200 |   71.799347ms |      172.18.0.1 | POST     "/api/embed"
[GIN] 2025/04/24 - 11:24:25 | 200 |   79.555892ms |      172.18.0.1 | POST     "/api/embed"
[GIN] 2025/04/24 - 11:24:25 | 200 |   85.187121ms |      172.18.0.1 | POST     "/api/embed"

Browser console logs are omitted: they contained no error messages and did not appear relevant.

Additional Information

No response

GiteaMirror added the bug label 2026-05-05 17:36:36 -05:00

@FooleanBool commented on GitHub (Apr 24, 2025):

Works for me.
Docling extraction.
Veritas prompt.
Model: Llama-3.1-8B-UltraLong-1M-Instruct.Q8_0:latest

![Image](https://github.com/user-attachments/assets/8ef11eda-b66b-4c91-84d0-33372ed95187)


@FooleanBool commented on GitHub (Apr 24, 2025):

When added to a purpose made knowledge base.

![Image](https://github.com/user-attachments/assets/cf3c5902-d927-4e3c-a90c-b6f74e2891f5)


@MikeNatC commented on GitHub (Apr 24, 2025):

I noticed that in my Docker logs, when I restart Open WebUI, I get this warning message, which I think refers to the documents in my Knowledge Base. It seems to suggest that the 11 files in my knowledge base were processed instantaneously `[00:00<00:00, 136906.07it/s]` without properly having been chunked.

WARNI [langchain_community.utils.user_agent] USER_AGENT environment variable not set, consider setting it to identify your requests.
Fetching 11 files: 100%|██████████| 11/11 [00:00<00:00, 136906.07it/s]
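For what it's worth, a very high rate like `136906.07it/s` on a `Fetching N files` progress bar usually indicates a cache hit (each iteration does no I/O because the files, presumably the embedding/reranker model shards, are already on disk) rather than documents being skipped. A toy timing sketch with a hypothetical `fetch` helper, stdlib only:

```python
import time

def fetch(files, cached=True):
    # Returns the effective files-per-second rate, analogous to tqdm's "it/s".
    start = time.perf_counter()
    for _ in files:
        if not cached:
            time.sleep(0.05)  # simulate an actual download
        # cache hit: nothing to do, so the loop finishes almost instantly
    elapsed = time.perf_counter() - start
    return len(files) / max(elapsed, 1e-9)

fetch([f"model-shard-{i}" for i in range(11)])  # huge it/s, like the log line
```

So the startup log line is likely unrelated to whether the knowledge-base documents were chunked correctly.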

@MikeNatC commented on GitHub (Apr 24, 2025):

I think I figured out the issue: my reranking model was not working and would sometimes just not generate any text. Once I changed the reranking model, the issue went away. I think the model must have gotten corrupted somehow.

Can anyone tell me how to delete and re-download the reranking model? I understand it is in the SQLite database, but is there a way to access that with Adminer?
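On the deletion question: the reranker weights themselves are typically not stored in the SQLite database (that holds configuration); downloaded sentence-transformers models are cached on disk inside the container. A hedged sketch, assuming a cache layout of `data/cache/embedding/models` (this path is an assumption and may differ between Open WebUI versions; verify it in your install before deleting anything):

```python
import os
import shutil

def reset_reranker_cache(data_dir="/app/backend/data", dry_run=True):
    # Assumed cache location for downloaded sentence-transformers models;
    # the exact path may differ between Open WebUI versions.
    cache_dir = os.path.join(data_dir, "cache", "embedding", "models")
    if not os.path.isdir(cache_dir):
        return []
    removed = []
    for entry in os.listdir(cache_dir):
        path = os.path.join(cache_dir, entry)
        removed.append(path)
        if not dry_run:  # only delete once you've reviewed the dry-run output
            shutil.rmtree(path, ignore_errors=True)
    return removed

reset_reranker_cache()  # dry run: lists what would be removed
```

After removing the cached model and restarting the container, Open WebUI should re-download it on first use.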


@MikeNatC commented on GitHub (Apr 25, 2025):

Nope, the issue came back. It is back to showing no context.

This is after I managed to get it to work again yesterday by changing the embedding and reranking models.

![image](https://github.com/user-attachments/assets/90a436e3-9560-4f9f-8c60-38f9aadfdab0) ![image](https://github.com/user-attachments/assets/4d7ccc31-c83e-4443-86ae-8c0d38e63af8)

For context, when I switched off `hybrid search`, the problem went away.
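That workaround fits the reranking hypothesis: without hybrid search, retrieval is plain top-k vector similarity with no rerank stage or score cutoff, so it returns the nearest chunks even when every similarity is low. A rough illustration of the non-hybrid path (assumed behavior, not the actual Open WebUI implementation):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def vector_search(query_vec, index, top_k=5):
    # Plain top-k nearest neighbours: no reranker, no threshold, so the
    # result is non-empty whenever the index itself is non-empty.
    ranked = sorted(index, key=lambda d: cosine(query_vec, d["vec"]), reverse=True)
    return ranked[:top_k]

index = [{"id": 1, "vec": [1.0, 0.0]}, {"id": 2, "vec": [0.0, 1.0]}]
vector_search([1.0, 0.1], index)  # both chunks, closest first
```

Under this reading, disabling hybrid search masks the symptom rather than fixing the reranker.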

Reference: github-starred/open-webui#55507