[GH-ISSUE #13614] issue: Nvidia V100 GPU not supported in main-cuda docker image (capability 7.0; message says all below 7.5 are unsupported) #16969

Closed
opened 2026-04-19 22:46:21 -05:00 by GiteaMirror · 7 comments
Owner

Originally created by @ER-EPR on GitHub (May 7, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/13614

Check Existing Issues

  • I have searched the existing issues and discussions.
  • I am using the latest version of Open WebUI.

Installation Method

Docker

Open WebUI Version

6.7

Ollama Version (if applicable)

6.8

Operating System

Ubuntu 22.04

Browser (if applicable)

edge

Confirmation

  • I have read and followed all instructions in README.md.
  • I am using the latest version of both Open WebUI and Ollama.
  • I have included the browser console logs.
  • I have included the Docker container logs.
  • I have listed steps to reproduce the bug in detail.

Expected Behavior

Reranking works.

Actual Behavior

Reranking does not work.

The log shows the following when the container starts:

Important

Found GPU0 Tesla V100-SXM3-32GB which is of cuda capability 7.0.

PyTorch no longer supports this GPU because it is too old.

The minimum cuda capability supported by this library is 7.5.
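This warning reflects PyTorch's startup check: a CUDA wheel ships compiled kernels only for a fixed list of `sm_*` architectures (reported by `torch.cuda.get_arch_list()`), and a device whose compute capability falls below the lowest of them has no usable kernel image. A minimal sketch of that logic (helper name hypothetical, not PyTorch's actual code):

```python
def has_kernels_for(capability: tuple[int, int], arch_list: list[str]) -> bool:
    """Rough version of PyTorch's capability check.

    `capability` is (major, minor) as returned by
    torch.cuda.get_device_capability(), e.g. (7, 0) for a V100.
    `arch_list` mirrors torch.cuda.get_arch_list(), e.g. ['sm_75', 'sm_80'].
    A device is usable only if its capability is at least the lowest
    sm_* architecture the wheel was built for.
    """
    cc = capability[0] * 10 + capability[1]
    sms = sorted(int(a.split("_")[1]) for a in arch_list if a.startswith("sm_"))
    return bool(sms) and cc >= sms[0]
```

With an arch list starting at `sm_75`, a V100's `(7, 0)` fails this check, which is exactly what the warning reports.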

Steps to Reproduce

1. Deploy the main-cuda image on a machine with an Nvidia V100 GPU.
2. Use rerank in RAG knowledge or hybrid search.

Logs & Screenshots

2025-05-07 04:29:26.715 | INFO | open_webui.utils.plugin:load_tool_module_by_id:103 - Loaded module: tool_default - {}

2025-05-07 04:29:39.766 | INFO | open_webui.retrieval.utils:query_collection_with_hybrid_search:334 - Starting hybrid search for 1 queries in 1 collections... - {}

Batches: 0%| | 0/1 [00:00<?, ?it/s]
Batches: 0%| | 0/1 [00:00<?, ?it/s]

2025-05-07 04:29:43.814 | ERROR | open_webui.retrieval.utils:query_doc_with_hybrid_search:174 - Error querying doc file-7dc76e48-426e-47c3-ad14-f7d7250acb58 with hybrid search: CUDA error: no kernel image is available for execution on the device

CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.

For debugging consider passing CUDA_LAUNCH_BLOCKING=1

Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

  • {}

Traceback (most recent call last):

File "/usr/local/lib/python3.11/threading.py", line 1002, in _bootstrap

self._bootstrap_inner()

│    └ <function Thread._bootstrap_inner at 0x7fbaa3d74860>

└ <Thread(ThreadPoolExecutor-3_0, started 140427797698240)>

File "/usr/local/lib/python3.11/threading.py", line 1045, in _bootstrap_inner

self.run()

│    └ <function Thread.run at 0x7fbaa3d74540>

└ <Thread(ThreadPoolExecutor-3_0, started 140427797698240)>

File "/usr/local/lib/python3.11/threading.py", line 982, in run

self._target(*self._args, **self._kwargs)

│    │        │    │        │    └ {}

│    │        │    │        └ <Thread(ThreadPoolExecutor-3_0, started 140427797698240)>

│    │        │    └ (<weakref at 0x7fb888271c10; to 'ThreadPoolExecutor' at 0x7fb888128150>, <_queue.SimpleQueue object at 0x7fb8881423e0>, None,...

│    │        └ <Thread(ThreadPoolExecutor-3_0, started 140427797698240)>

│    └ <function _worker at 0x7fbaa2c84180>

└ <Thread(ThreadPoolExecutor-3_0, started 140427797698240)>

File "/usr/local/lib/python3.11/concurrent/futures/thread.py", line 83, in _worker

work_item.run()

│         └ <function _WorkItem.run at 0x7fbaa2c842c0>

└ <concurrent.futures.thread._WorkItem object at 0x7fb8883979d0>

File "/usr/local/lib/python3.11/concurrent/futures/thread.py", line 58, in run

result = self.fn(*self.args, **self.kwargs)

         │    │   │    │       │    └ {}

         │    │   │    │       └ <concurrent.futures.thread._WorkItem object at 0x7fb8883979d0>

         │    │   │    └ ('file-7dc76e48-426e-47c3-ad14-f7d7250acb58', '这些人在说什么')

         │    │   └ <concurrent.futures.thread._WorkItem object at 0x7fb8883979d0>

         │    └ <function query_collection_with_hybrid_search.<locals>.process_query at 0x7fb889ebcd60>

         └ <concurrent.futures.thread._WorkItem object at 0x7fb8883979d0>

File "/app/backend/open_webui/retrieval/utils.py", line 340, in process_query

result = query_doc_with_hybrid_search(

         └ <function query_doc_with_hybrid_search at 0x7fb91b88b7e0>

File "/app/backend/open_webui/retrieval/utils.py", line 148, in query_doc_with_hybrid_search

result = compression_retriever.invoke(query)

         │                     │      └ '这些人在说什么'

         │                     └ <function BaseRetriever.invoke at 0x7fb91bdc96c0>

         └ ContextualCompressionRetriever(base_compressor=RerankCompressor(embedding_function=<function chat_completion_files_handler.<l...

File "/usr/local/lib/python3.11/site-packages/langchain_core/retrievers.py", line 258, in invoke

result = self._get_relevant_documents(

         │    └ <function ContextualCompressionRetriever._get_relevant_documents at 0x7fb91bdc87c0>

         └ ContextualCompressionRetriever(base_compressor=RerankCompressor(embedding_function=<function chat_completion_files_handler.<l...

File "/usr/local/lib/python3.11/site-packages/langchain/retrievers/contextual_compression.py", line 48, in _get_relevant_documents

compressed_docs = self.base_compressor.compress_documents(

                  │    │               └ <function RerankCompressor.compress_documents at 0x7fb91b8acc20>

                  │    └ RerankCompressor(embedding_function=<function chat_completion_files_handler.<locals>.<lambda>.<locals>.<lambda> at 0x7fb889eb...

                  └ ContextualCompressionRetriever(base_compressor=RerankCompressor(embedding_function=<function chat_completion_files_handler.<l...

File "/app/backend/open_webui/retrieval/utils.py", line 809, in compress_documents

scores = self.reranking_function.predict(

         │    │                  └ <function CrossEncoder.predict at 0x7fb88c72b1a0>

         │    └ CrossEncoder(

         │        (model): XLMRobertaForSequenceClassification(

         │          (roberta): XLMRobertaModel(

         │            (embeddings): XLMRobertaE...

         └ RerankCompressor(embedding_function=<function chat_completion_files_handler.<locals>.<lambda>.<locals>.<lambda> at 0x7fb889eb...

File "/usr/local/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context

return func(*args, **kwargs)

       │     │       └ {}

       │     └ (CrossEncoder(

       │         (model): XLMRobertaForSequenceClassification(

       │           (roberta): XLMRobertaModel(

       │             (embeddings): XLMRoberta...

       └ <function CrossEncoder.predict at 0x7fb88c72b100>

File "/usr/local/lib/python3.11/site-packages/sentence_transformers/cross_encoder/util.py", line 68, in wrapper

return func(self, *args, **kwargs)

       │    │      │       └ {}

       │    │      └ ([('这些人在说什么', "Thank you, Pratik, for setting the scene about the. Needs and especially the burden of hypoglycemia and the be...

       │    └ CrossEncoder(

       │        (model): XLMRobertaForSequenceClassification(

       │          (roberta): XLMRobertaModel(

       │            (embeddings): XLMRobertaE...

       └ <function CrossEncoder.predict at 0x7fb88c72b060>

File "/usr/local/lib/python3.11/site-packages/sentence_transformers/cross_encoder/CrossEncoder.py", line 651, in predict

model_predictions = self.model(**features, return_dict=True)

                    │            └ <unprintable BatchEncoding object>

                    └ CrossEncoder(

                        (model): XLMRobertaForSequenceClassification(

                          (roberta): XLMRobertaModel(

                            (embeddings): XLMRobertaE...

File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl

return self._call_impl(*args, **kwargs)

       │    │           │       └ <unprintable dict object>

       │    │           └ ()

       │    └ <function Module._call_impl at 0x7fb9395589a0>

       └ XLMRobertaForSequenceClassification(

           (roberta): XLMRobertaModel(

             (embeddings): XLMRobertaEmbeddings(

               (word_embedd...

File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl

return forward_call(*args, **kwargs)

       │             │       └ <unprintable dict object>

       │             └ ()

       └ <bound method XLMRobertaForSequenceClassification.forward of XLMRobertaForSequenceClassification(

           (roberta): XLMRobertaMode...

File "/usr/local/lib/python3.11/site-packages/transformers/models/xlm_roberta/modeling_xlm_roberta.py", line 1331, in forward

outputs = self.roberta(

          └ XLMRobertaForSequenceClassification(

              (roberta): XLMRobertaModel(

                (embeddings): XLMRobertaEmbeddings(

                  (word_embedd...

File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl

return self._call_impl(*args, **kwargs)

       │    │           │       └ <unprintable dict object>

       │    │           └ <unprintable tuple object>

       │    └ <function Module._call_impl at 0x7fb9395589a0>

       └ XLMRobertaModel(

           (embeddings): XLMRobertaEmbeddings(

             (word_embeddings): Embedding(250002, 1024, padding_idx=1)

             (pos...

File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl

return forward_call(*args, **kwargs)

       │             │       └ <unprintable dict object>

       │             └ <unprintable tuple object>

       └ <bound method XLMRobertaModel.forward of XLMRobertaModel(

           (embeddings): XLMRobertaEmbeddings(

             (word_embeddings): Embedd...

File "/usr/local/lib/python3.11/site-packages/transformers/models/xlm_roberta/modeling_xlm_roberta.py", line 915, in forward

embedding_output = self.embeddings(

                   └ XLMRobertaModel(

                       (embeddings): XLMRobertaEmbeddings(

                         (word_embeddings): Embedding(250002, 1024, padding_idx=1)

                         (pos...

File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl

return self._call_impl(*args, **kwargs)

       │    │           │       └ <unprintable dict object>

       │    │           └ ()

       │    └ <function Module._call_impl at 0x7fb9395589a0>

       └ XLMRobertaEmbeddings(

           (word_embeddings): Embedding(250002, 1024, padding_idx=1)

           (position_embeddings): Embedding(8194, 10...

File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl

return forward_call(*args, **kwargs)

       │             │       └ <unprintable dict object>

       │             └ ()

       └ <bound method XLMRobertaEmbeddings.forward of XLMRobertaEmbeddings(

           (word_embeddings): Embedding(250002, 1024, padding_idx=...

File "/usr/local/lib/python3.11/site-packages/transformers/models/xlm_roberta/modeling_xlm_roberta.py", line 100, in forward

position_ids = create_position_ids_from_input_ids(input_ids, self.padding_idx, past_key_values_length)

               │                                  │          │    │            └ 0

               │                                  │          │    └ 1

               │                                  │          └ XLMRobertaEmbeddings(

               │                                  │              (word_embeddings): Embedding(250002, 1024, padding_idx=1)

               │                                  │              (position_embeddings): Embedding(8194, 10...

               │                                  └ <unprintable Tensor object>

               └ <function create_position_ids_from_input_ids at 0x7fb88c39b880>

File "/usr/local/lib/python3.11/site-packages/transformers/models/xlm_roberta/modeling_xlm_roberta.py", line 1700, in create_position_ids_from_input_ids

mask = input_ids.ne(padding_idx).int()

       │         │  └ 1

       │         └ <method 'ne' of 'torch._C.TensorBase' objects>

       └ <unprintable Tensor object>

RuntimeError: CUDA error: no kernel image is available for execution on the device

CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.

For debugging consider passing CUDA_LAUNCH_BLOCKING=1

Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

2025-05-07 04:29:43.855 | ERROR | open_webui.retrieval.utils:process_query:352 - Error when querying the collection with hybrid_search: CUDA error: no kernel image is available for execution on the device

CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.

For debugging consider passing CUDA_LAUNCH_BLOCKING=1

Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

  • {}

Traceback (most recent call last):

File "/usr/local/lib/python3.11/threading.py", line 1002, in _bootstrap

self._bootstrap_inner()

│    └ <function Thread._bootstrap_inner at 0x7fbaa3d74860>

└ <Thread(ThreadPoolExecutor-3_0, started 140427797698240)>

File "/usr/local/lib/python3.11/threading.py", line 1045, in _bootstrap_inner

self.run()

│    └ <function Thread.run at 0x7fbaa3d74540>

└ <Thread(ThreadPoolExecutor-3_0, started 140427797698240)>

File "/usr/local/lib/python3.11/threading.py", line 982, in run

self._target(*self._args, **self._kwargs)

│    │        │    │        │    └ {}

│    │        │    │        └ <Thread(ThreadPoolExecutor-3_0, started 140427797698240)>

│    │        │    └ (<weakref at 0x7fb888271c10; to 'ThreadPoolExecutor' at 0x7fb888128150>, <_queue.SimpleQueue object at 0x7fb8881423e0>, None,...

│    │        └ <Thread(ThreadPoolExecutor-3_0, started 140427797698240)>

│    └ <function _worker at 0x7fbaa2c84180>

└ <Thread(ThreadPoolExecutor-3_0, started 140427797698240)>

File "/usr/local/lib/python3.11/concurrent/futures/thread.py", line 83, in _worker

work_item.run()

│         └ <function _WorkItem.run at 0x7fbaa2c842c0>

└ <concurrent.futures.thread._WorkItem object at 0x7fb8883979d0>

File "/usr/local/lib/python3.11/concurrent/futures/thread.py", line 58, in run

result = self.fn(*self.args, **self.kwargs)

         │    │   │    │       │    └ {}

         │    │   │    │       └ <concurrent.futures.thread._WorkItem object at 0x7fb8883979d0>

         │    │   │    └ ('file-7dc76e48-426e-47c3-ad14-f7d7250acb58', '这些人在说什么')

         │    │   └ <concurrent.futures.thread._WorkItem object at 0x7fb8883979d0>

         │    └ <function query_collection_with_hybrid_search.<locals>.process_query at 0x7fb889ebcd60>

         └ <concurrent.futures.thread._WorkItem object at 0x7fb8883979d0>

File "/app/backend/open_webui/retrieval/utils.py", line 340, in process_query

result = query_doc_with_hybrid_search(

         └ <function query_doc_with_hybrid_search at 0x7fb91b88b7e0>

File "/app/backend/open_webui/retrieval/utils.py", line 175, in query_doc_with_hybrid_search

raise e

File "/app/backend/open_webui/retrieval/utils.py", line 148, in query_doc_with_hybrid_search

result = compression_retriever.invoke(query)

         │                     │      └ '这些人在说什么'

         │                     └ <function BaseRetriever.invoke at 0x7fb91bdc96c0>

         └ ContextualCompressionRetriever(base_compressor=RerankCompressor(embedding_function=<function chat_completion_files_handler.<l...

File "/usr/local/lib/python3.11/site-packages/langchain_core/retrievers.py", line 258, in invoke

result = self._get_relevant_documents(

         │    └ <function ContextualCompressionRetriever._get_relevant_documents at 0x7fb91bdc87c0>

         └ ContextualCompressionRetriever(base_compressor=RerankCompressor(embedding_function=<function chat_completion_files_handler.<l...

File "/usr/local/lib/python3.11/site-packages/langchain/retrievers/contextual_compression.py", line 48, in _get_relevant_documents

compressed_docs = self.base_compressor.compress_documents(

                  │    │               └ <function RerankCompressor.compress_documents at 0x7fb91b8acc20>

                  │    └ RerankCompressor(embedding_function=<function chat_completion_files_handler.<locals>.<lambda>.<locals>.<lambda> at 0x7fb889eb...

                  └ ContextualCompressionRetriever(base_compressor=RerankCompressor(embedding_function=<function chat_completion_files_handler.<l...

File "/app/backend/open_webui/retrieval/utils.py", line 809, in compress_documents

scores = self.reranking_function.predict(

         │    │                  └ <function CrossEncoder.predict at 0x7fb88c72b1a0>

         │    └ CrossEncoder(

         │        (model): XLMRobertaForSequenceClassification(

         │          (roberta): XLMRobertaModel(

         │            (embeddings): XLMRobertaE...

         └ RerankCompressor(embedding_function=<function chat_completion_files_handler.<locals>.<lambda>.<locals>.<lambda> at 0x7fb889eb...

File "/usr/local/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context

return func(*args, **kwargs)

       │     │       └ {}

       │     └ (CrossEncoder(

       │         (model): XLMRobertaForSequenceClassification(

       │           (roberta): XLMRobertaModel(

       │             (embeddings): XLMRoberta...

       └ <function CrossEncoder.predict at 0x7fb88c72b100>

File "/usr/local/lib/python3.11/site-packages/sentence_transformers/cross_encoder/util.py", line 68, in wrapper

return func(self, *args, **kwargs)

       │    │      │       └ {}

       │    │      └ ([('这些人在说什么', "Thank you, Pratik, for setting the scene about the. Needs and especially the burden of hypoglycemia and the be...

       │    └ CrossEncoder(

       │        (model): XLMRobertaForSequenceClassification(

       │          (roberta): XLMRobertaModel(

       │            (embeddings): XLMRobertaE...

       └ <function CrossEncoder.predict at 0x7fb88c72b060>

File "/usr/local/lib/python3.11/site-packages/sentence_transformers/cross_encoder/CrossEncoder.py", line 651, in predict

model_predictions = self.model(**features, return_dict=True)

                    │            └ <unprintable BatchEncoding object>

                    └ CrossEncoder(

                        (model): XLMRobertaForSequenceClassification(

                          (roberta): XLMRobertaModel(

                            (embeddings): XLMRobertaE...

File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl

return self._call_impl(*args, **kwargs)

       │    │           │       └ <unprintable dict object>

       │    │           └ ()

       │    └ <function Module._call_impl at 0x7fb9395589a0>

       └ XLMRobertaForSequenceClassification(

           (roberta): XLMRobertaModel(

             (embeddings): XLMRobertaEmbeddings(

               (word_embedd...

File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl

return forward_call(*args, **kwargs)

       │             │       └ <unprintable dict object>

       │             └ ()

       └ <bound method XLMRobertaForSequenceClassification.forward of XLMRobertaForSequenceClassification(

           (roberta): XLMRobertaMode...

File "/usr/local/lib/python3.11/site-packages/transformers/models/xlm_roberta/modeling_xlm_roberta.py", line 1331, in forward

outputs = self.roberta(

          └ XLMRobertaForSequenceClassification(

              (roberta): XLMRobertaModel(

                (embeddings): XLMRobertaEmbeddings(

                  (word_embedd...

File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl

return self._call_impl(*args, **kwargs)

       │    │           │       └ <unprintable dict object>

       │    │           └ <unprintable tuple object>

       │    └ <function Module._call_impl at 0x7fb9395589a0>

       └ XLMRobertaModel(

           (embeddings): XLMRobertaEmbeddings(

             (word_embeddings): Embedding(250002, 1024, padding_idx=1)

             (pos...

File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl

return forward_call(*args, **kwargs)

       │             │       └ <unprintable dict object>

       │             └ <unprintable tuple object>

       └ <bound method XLMRobertaModel.forward of XLMRobertaModel(

           (embeddings): XLMRobertaEmbeddings(

             (word_embeddings): Embedd...

File "/usr/local/lib/python3.11/site-packages/transformers/models/xlm_roberta/modeling_xlm_roberta.py", line 915, in forward

embedding_output = self.embeddings(

                   └ XLMRobertaModel(

                       (embeddings): XLMRobertaEmbeddings(

                         (word_embeddings): Embedding(250002, 1024, padding_idx=1)

                         (pos...

File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl

return self._call_impl(*args, **kwargs)

       │    │           │       └ <unprintable dict object>

       │    │           └ ()

       │    └ <function Module._call_impl at 0x7fb9395589a0>

       └ XLMRobertaEmbeddings(

           (word_embeddings): Embedding(250002, 1024, padding_idx=1)

           (position_embeddings): Embedding(8194, 10...

File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl

return forward_call(*args, **kwargs)

       │             │       └ <unprintable dict object>

       │             └ ()

       └ <bound method XLMRobertaEmbeddings.forward of XLMRobertaEmbeddings(

           (word_embeddings): Embedding(250002, 1024, padding_idx=...

File "/usr/local/lib/python3.11/site-packages/transformers/models/xlm_roberta/modeling_xlm_roberta.py", line 100, in forward

position_ids = create_position_ids_from_input_ids(input_ids, self.padding_idx, past_key_values_length)

               │                                  │          │    │            └ 0

               │                                  │          │    └ 1

               │                                  │          └ XLMRobertaEmbeddings(

               │                                  │              (word_embeddings): Embedding(250002, 1024, padding_idx=1)

               │                                  │              (position_embeddings): Embedding(8194, 10...

               │                                  └ <unprintable Tensor object>

               └ <function create_position_ids_from_input_ids at 0x7fb88c39b880>

File "/usr/local/lib/python3.11/site-packages/transformers/models/xlm_roberta/modeling_xlm_roberta.py", line 1700, in create_position_ids_from_input_ids

mask = input_ids.ne(padding_idx).int()

       │         │  └ 1

       │         └ <method 'ne' of 'torch._C.TensorBase' objects>

       └ <unprintable Tensor object>

RuntimeError: CUDA error: no kernel image is available for execution on the device

CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.

For debugging consider passing CUDA_LAUNCH_BLOCKING=1

Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

2025-05-07 04:29:44.024 | INFO | open_webui.retrieval.utils:query_doc:88 - query_doc:result '77565fac-2892-4df3-87d0-051b16a18d55', '46ce05e6-8db3-41cd-9007-e387cb846bab', '54a86116-cd4d-4010-8494-4e4c0e1b27d3', 'e54d84d9-2ff2-497b-b74f-e94f51833f13', '0914de01-3734-4a3a-a621-8a49263527be' {'Content-Type': 'application/vnd.openxmlformats-officedocument.wordprocessingml.document', 'created_by': '3705809b-2508-413e-acec-8f42176cee07', 'embedding_config': '{"engine": "ollama", "model": "bge-m3"}', 'file_id': '7dc76e48-426e-47c3-ad14-f7d7250acb58', 'hash': '95f23eee27b2e9846a1e64e8e760991aa4eb1935e08dd86393d9d024284b828e', 'name': 'Roche ATTD 录音稿.docx', 'source': 'Roche ATTD 录音稿.docx', 'start_index': 25522}, {'Content-Type': 'application/vnd.openxmlformats-officedocument.wordprocessingml.document', 'created_by': '3705809b-2508-413e-acec-8f42176cee07', 'embedding_config': '{"engine": "ollama", "model": "bge-m3"}', 'file_id': '7dc76e48-426e-47c3-ad14-f7d7250acb58', 'hash': '95f23eee27b2e9846a1e64e8e760991aa4eb1935e08dd86393d9d024284b828e', 'name': 'Roche ATTD 录音稿.docx', 'source': 'Roche ATTD 录音稿.docx', 'start_index': 25813}, {'Content-Type': 'application/vnd.openxmlformats-officedocument.wordprocessingml.document', 'created_by': '3705809b-2508-413e-acec-8f42176cee07', 'embedding_config': '{"engine": "ollama", "model": "bge-m3"}', 'file_id': '7dc76e48-426e-47c3-ad14-f7d7250acb58', 'hash': '95f23eee27b2e9846a1e64e8e760991aa4eb1935e08dd86393d9d024284b828e', 'name': 'Roche ATTD 录音稿.docx', 'source': 'Roche ATTD 录音稿.docx', 'start_index': 17323}, {'Content-Type': 'application/vnd.openxmlformats-officedocument.wordprocessingml.document', 'created_by': '3705809b-2508-413e-acec-8f42176cee07', 'embedding_config': '{"engine": "ollama", "model": "bge-m3"}', 'file_id': '7dc76e48-426e-47c3-ad14-f7d7250acb58', 'hash': '95f23eee27b2e9846a1e64e8e760991aa4eb1935e08dd86393d9d024284b828e', 'name': 'Roche ATTD 录音稿.docx', 'source': 'Roche ATTD 录音稿.docx', 'start_index': 36428}, {'Content-Type': 
'application/vnd.openxmlformats-officedocument.wordprocessingml.document', 'created_by': '3705809b-2508-413e-acec-8f42176cee07', 'embedding_config': '{"engine": "ollama", "model": "bge-m3"}', 'file_id': '7dc76e48-426e-47c3-ad14-f7d7250acb58', 'hash': '95f23eee27b2e9846a1e64e8e760991aa4eb1935e08dd86393d9d024284b828e', 'name': 'Roche ATTD 录音稿.docx', 'source': 'Roche ATTD 录音稿.docx', 'start_index': 34979} - {}

Additional Information

https://github.com/open-webui/open-webui/issues/13186
In the recent PyTorch upgrade, cu128 with the PyTorch 2.7 test build is used; I think this build may have dropped support for compute capability below 7.5, though I can't find the exact statement in the PyTorch repo. I did see that the current stable release is PyTorch 2.7 with cu126. Is there a guide for building the main-cuda image myself?

File "/usr/local/lib/python3.11/site-packages/transformers/models/xlm_roberta/modeling_xlm_roberta.py", line 100, in forward position_ids = create_position_ids_from_input_ids(input_ids, self.padding_idx, past_key_values_length) │ │ │ │ └ 0 │ │ │ └ 1 │ │ └ XLMRobertaEmbeddings( │ │ (word_embeddings): Embedding(250002, 1024, padding_idx=1) │ │ (position_embeddings): Embedding(8194, 10... │ └ <unprintable Tensor object> └ <function create_position_ids_from_input_ids at 0x7fb88c39b880> File "/usr/local/lib/python3.11/site-packages/transformers/models/xlm_roberta/modeling_xlm_roberta.py", line 1700, in create_position_ids_from_input_ids mask = input_ids.ne(padding_idx).int() │ │ └ 1 │ └ <method 'ne' of 'torch._C.TensorBase' objects> └ <unprintable Tensor object> RuntimeError: CUDA error: no kernel image is available for execution on the device CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1 Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions. 2025-05-07 04:29:43.855 | ERROR | open_webui.retrieval.utils:process_query:352 - Error when querying the collection with hybrid_search: CUDA error: no kernel image is available for execution on the device CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1 Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions. 
- {} Traceback (most recent call last): File "/usr/local/lib/python3.11/threading.py", line 1002, in _bootstrap self._bootstrap_inner() │ └ <function Thread._bootstrap_inner at 0x7fbaa3d74860> └ <Thread(ThreadPoolExecutor-3_0, started 140427797698240)> File "/usr/local/lib/python3.11/threading.py", line 1045, in _bootstrap_inner self.run() │ └ <function Thread.run at 0x7fbaa3d74540> └ <Thread(ThreadPoolExecutor-3_0, started 140427797698240)> File "/usr/local/lib/python3.11/threading.py", line 982, in run self._target(*self._args, **self._kwargs) │ │ │ │ │ └ {} │ │ │ │ └ <Thread(ThreadPoolExecutor-3_0, started 140427797698240)> │ │ │ └ (<weakref at 0x7fb888271c10; to 'ThreadPoolExecutor' at 0x7fb888128150>, <_queue.SimpleQueue object at 0x7fb8881423e0>, None,... │ │ └ <Thread(ThreadPoolExecutor-3_0, started 140427797698240)> │ └ <function _worker at 0x7fbaa2c84180> └ <Thread(ThreadPoolExecutor-3_0, started 140427797698240)> File "/usr/local/lib/python3.11/concurrent/futures/thread.py", line 83, in _worker work_item.run() │ └ <function _WorkItem.run at 0x7fbaa2c842c0> └ <concurrent.futures.thread._WorkItem object at 0x7fb8883979d0> File "/usr/local/lib/python3.11/concurrent/futures/thread.py", line 58, in run result = self.fn(*self.args, **self.kwargs) │ │ │ │ │ └ {} │ │ │ │ └ <concurrent.futures.thread._WorkItem object at 0x7fb8883979d0> │ │ │ └ ('file-7dc76e48-426e-47c3-ad14-f7d7250acb58', '这些人在说什么') │ │ └ <concurrent.futures.thread._WorkItem object at 0x7fb8883979d0> │ └ <function query_collection_with_hybrid_search.<locals>.process_query at 0x7fb889ebcd60> └ <concurrent.futures.thread._WorkItem object at 0x7fb8883979d0> > File "/app/backend/open_webui/retrieval/utils.py", line 340, in process_query result = query_doc_with_hybrid_search( └ <function query_doc_with_hybrid_search at 0x7fb91b88b7e0> File "/app/backend/open_webui/retrieval/utils.py", line 175, in query_doc_with_hybrid_search raise e File "/app/backend/open_webui/retrieval/utils.py", line 148, in 
query_doc_with_hybrid_search result = compression_retriever.invoke(query) │ │ └ '这些人在说什么' │ └ <function BaseRetriever.invoke at 0x7fb91bdc96c0> └ ContextualCompressionRetriever(base_compressor=RerankCompressor(embedding_function=<function chat_completion_files_handler.<l... File "/usr/local/lib/python3.11/site-packages/langchain_core/retrievers.py", line 258, in invoke result = self._get_relevant_documents( │ └ <function ContextualCompressionRetriever._get_relevant_documents at 0x7fb91bdc87c0> └ ContextualCompressionRetriever(base_compressor=RerankCompressor(embedding_function=<function chat_completion_files_handler.<l... File "/usr/local/lib/python3.11/site-packages/langchain/retrievers/contextual_compression.py", line 48, in _get_relevant_documents compressed_docs = self.base_compressor.compress_documents( │ │ └ <function RerankCompressor.compress_documents at 0x7fb91b8acc20> │ └ RerankCompressor(embedding_function=<function chat_completion_files_handler.<locals>.<lambda>.<locals>.<lambda> at 0x7fb889eb... └ ContextualCompressionRetriever(base_compressor=RerankCompressor(embedding_function=<function chat_completion_files_handler.<l... File "/app/backend/open_webui/retrieval/utils.py", line 809, in compress_documents scores = self.reranking_function.predict( │ │ └ <function CrossEncoder.predict at 0x7fb88c72b1a0> │ └ CrossEncoder( │ (model): XLMRobertaForSequenceClassification( │ (roberta): XLMRobertaModel( │ (embeddings): XLMRobertaE... └ RerankCompressor(embedding_function=<function chat_completion_files_handler.<locals>.<lambda>.<locals>.<lambda> at 0x7fb889eb... File "/usr/local/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context return func(*args, **kwargs) │ │ └ {} │ └ (CrossEncoder( │ (model): XLMRobertaForSequenceClassification( │ (roberta): XLMRobertaModel( │ (embeddings): XLMRoberta... 
└ <function CrossEncoder.predict at 0x7fb88c72b100> File "/usr/local/lib/python3.11/site-packages/sentence_transformers/cross_encoder/util.py", line 68, in wrapper return func(self, *args, **kwargs) │ │ │ └ {} │ │ └ ([('这些人在说什么', "Thank you, Pratik, for setting the scene about the. Needs and especially the burden of hypoglycemia and the be... │ └ CrossEncoder( │ (model): XLMRobertaForSequenceClassification( │ (roberta): XLMRobertaModel( │ (embeddings): XLMRobertaE... └ <function CrossEncoder.predict at 0x7fb88c72b060> File "/usr/local/lib/python3.11/site-packages/sentence_transformers/cross_encoder/CrossEncoder.py", line 651, in predict model_predictions = self.model(**features, return_dict=True) │ └ <unprintable BatchEncoding object> └ CrossEncoder( (model): XLMRobertaForSequenceClassification( (roberta): XLMRobertaModel( (embeddings): XLMRobertaE... File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl return self._call_impl(*args, **kwargs) │ │ │ └ <unprintable dict object> │ │ └ () │ └ <function Module._call_impl at 0x7fb9395589a0> └ XLMRobertaForSequenceClassification( (roberta): XLMRobertaModel( (embeddings): XLMRobertaEmbeddings( (word_embedd... File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl return forward_call(*args, **kwargs) │ │ └ <unprintable dict object> │ └ () └ <bound method XLMRobertaForSequenceClassification.forward of XLMRobertaForSequenceClassification( (roberta): XLMRobertaMode... File "/usr/local/lib/python3.11/site-packages/transformers/models/xlm_roberta/modeling_xlm_roberta.py", line 1331, in forward outputs = self.roberta( └ XLMRobertaForSequenceClassification( (roberta): XLMRobertaModel( (embeddings): XLMRobertaEmbeddings( (word_embedd... 
File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl return self._call_impl(*args, **kwargs) │ │ │ └ <unprintable dict object> │ │ └ <unprintable tuple object> │ └ <function Module._call_impl at 0x7fb9395589a0> └ XLMRobertaModel( (embeddings): XLMRobertaEmbeddings( (word_embeddings): Embedding(250002, 1024, padding_idx=1) (pos... File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl return forward_call(*args, **kwargs) │ │ └ <unprintable dict object> │ └ <unprintable tuple object> └ <bound method XLMRobertaModel.forward of XLMRobertaModel( (embeddings): XLMRobertaEmbeddings( (word_embeddings): Embedd... File "/usr/local/lib/python3.11/site-packages/transformers/models/xlm_roberta/modeling_xlm_roberta.py", line 915, in forward embedding_output = self.embeddings( └ XLMRobertaModel( (embeddings): XLMRobertaEmbeddings( (word_embeddings): Embedding(250002, 1024, padding_idx=1) (pos... File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl return self._call_impl(*args, **kwargs) │ │ │ └ <unprintable dict object> │ │ └ () │ └ <function Module._call_impl at 0x7fb9395589a0> └ XLMRobertaEmbeddings( (word_embeddings): Embedding(250002, 1024, padding_idx=1) (position_embeddings): Embedding(8194, 10... File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl return forward_call(*args, **kwargs) │ │ └ <unprintable dict object> │ └ () └ <bound method XLMRobertaEmbeddings.forward of XLMRobertaEmbeddings( (word_embeddings): Embedding(250002, 1024, padding_idx=... 
File "/usr/local/lib/python3.11/site-packages/transformers/models/xlm_roberta/modeling_xlm_roberta.py", line 100, in forward position_ids = create_position_ids_from_input_ids(input_ids, self.padding_idx, past_key_values_length) │ │ │ │ └ 0 │ │ │ └ 1 │ │ └ XLMRobertaEmbeddings( │ │ (word_embeddings): Embedding(250002, 1024, padding_idx=1) │ │ (position_embeddings): Embedding(8194, 10... │ └ <unprintable Tensor object> └ <function create_position_ids_from_input_ids at 0x7fb88c39b880> File "/usr/local/lib/python3.11/site-packages/transformers/models/xlm_roberta/modeling_xlm_roberta.py", line 1700, in create_position_ids_from_input_ids mask = input_ids.ne(padding_idx).int() │ │ └ 1 │ └ <method 'ne' of 'torch._C.TensorBase' objects> └ <unprintable Tensor object> RuntimeError: CUDA error: no kernel image is available for execution on the device CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1 Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions. 
```
2025-05-07 04:29:44.024 | INFO | open_webui.retrieval.utils:query_doc:88 - query_doc:result
[['77565fac-2892-4df3-87d0-051b16a18d55', '46ce05e6-8db3-41cd-9007-e387cb846bab',
  '54a86116-cd4d-4010-8494-4e4c0e1b27d3', 'e54d84d9-2ff2-497b-b74f-e94f51833f13',
  '0914de01-3734-4a3a-a621-8a49263527be']]
[[{'Content-Type': 'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
   'created_by': '3705809b-2508-413e-acec-8f42176cee07',
   'embedding_config': '{"engine": "ollama", "model": "bge-m3"}',
   'file_id': '7dc76e48-426e-47c3-ad14-f7d7250acb58',
   'hash': '95f23eee27b2e9846a1e64e8e760991aa4eb1935e08dd86393d9d024284b828e',
   'name': 'Roche ATTD 录音稿.docx', 'source': 'Roche ATTD 录音稿.docx',
   'start_index': 25522},
  {... same metadata with 'start_index': 25813},
  {... same metadata with 'start_index': 17323},
  {... same metadata with 'start_index': 36428},
  {... same metadata with 'start_index': 34979}]] - {}
```

### Additional Information

https://github.com/open-webui/open-webui/issues/13186

In the recent PyTorch upgrade, cu128 with a test-channel PyTorch 2.7 was adopted. I think this version may have dropped support for architectures below 7.5, but I can't find the exact description in the PyTorch repo. However, I see that the current stable release is PyTorch 2.7 with cu126. Is there a guide for building the main-cuda image myself?
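To see concretely why the kernels are missing: a PyTorch wheel ships binaries (`sm_XY`) and sometimes PTX (`compute_XY`) for a fixed set of architectures, readable at runtime via `torch.cuda.get_arch_list()`, which you can compare against `torch.cuda.get_device_capability(0)`. Here is a minimal sketch of that comparison; the helper `build_supports` and the example arch lists are illustrative, not the actual contents of any official wheel.

```python
def build_supports(arch_list, capability):
    """Return True if a build compiled for `arch_list` can run on a device
    of the given (major, minor) compute capability.

    Simplified from the CUDA compatibility model:
      - a cubin for sm_XY runs on devices of the same major version with
        minor >= Y (binary compatibility);
      - PTX for compute_XY can be JIT-compiled on any device with
        capability >= X.Y (forward compatibility).
    """
    dev_major, dev_minor = capability
    for arch in arch_list:
        kind, _, num = arch.partition("_")
        major, minor = divmod(int(num), 10)
        if kind == "sm" and major == dev_major and minor <= dev_minor:
            return True
        if kind == "compute" and (major, minor) <= (dev_major, dev_minor):
            return True
    return False

# A V100 is capability 7.0; a build whose oldest arch is sm_75 cannot run on it:
print(build_supports(["sm_75", "sm_80", "sm_90"], (7, 0)))  # False
print(build_supports(["sm_70", "sm_75", "sm_80"], (7, 0)))  # True
```

On a real install, replace the literal lists with `torch.cuda.get_arch_list()` and the tuple with `torch.cuda.get_device_capability(0)`.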
GiteaMirror added the bug label 2026-04-19 22:46:21 -05:00
@Mister-Hope commented on GitHub (May 7, 2025):

I have a repo under my personal account; check the cu128 branch and revert to an older CUDA version in the same way.

You should understand that, in order to support newly released hardware, older hardware will gradually be dropped from support.

@ER-EPR commented on GitHub (May 7, 2025):

> I have a repo under my personal account; check the cu128 branch and revert to an older CUDA version in the same way.
>
> You should understand that, in order to support newly released hardware, older hardware will gradually be dropped from support.

Hi, I understand. But among GPUs with ≥32 GB VRAM, the H100 and A100 are still too expensive. I have cu128 installed; the problem is that the PyTorch in the latest image is missing support for CUDA capability 7.0. Is it possible to use a PyTorch build compiled with backward compatibility down to at least 7.0?

Is there a guide for building the WebUI CUDA image? I see the pull request only changes the CUDA version; what should I do if I want to test other combinations of the CUDA and PyTorch build args?

@Mister-Hope commented on GitHub (May 7, 2025):

Hi, edit the `Dockerfile`:

```diff
- ARG USE_CUDA_VER=cu128
+ ARG USE_CUDA_VER=cu121
```

Then build the image yourself; that should get you past this. This is the only change I contributed.

You should also be aware that, as time passes, the minimum CUDA version that PyTorch requires may change; it is PyTorch 2.7 now and was previously 2.6, so you may need to stop upgrading at some future version as well.
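For reference, build args can also be overridden at build time without editing the file at all. A sketch, assuming the Dockerfile exposes `USE_CUDA` and `USE_CUDA_VER` as `ARG`s (verify the names in your checkout):

```shell
# Clone the repo and build a CUDA image against an older wheel index.
# --build-arg overrides the ARG defaults declared in the Dockerfile.
git clone https://github.com/open-webui/open-webui.git
cd open-webui
docker build \
  --build-arg USE_CUDA=true \
  --build-arg USE_CUDA_VER=cu121 \
  -t open-webui:cuda-cu121 .
```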

@ProjectMoon commented on GitHub (May 7, 2025):

I have a similar problem, but I require CUDA 11 for an NVIDIA GTX 970. Hybrid search was working some versions ago.

I am now building my own image. When I tried installing cu117, there was an error importing `transformers.modeling_utils` because the `compiler` attribute is missing from torch. I suspect another dependency needs to be version-locked for this case? Possibly `transformers` itself?

@ProjectMoon commented on GitHub (May 7, 2025):

I have had some apparent success with version-locking `transformers` to 4.48.3 and downgrading `sentence-transformers` back to 3.3.1. There was a commit that upgraded sentence-transformers to a newer version: `3ec6652f990f0314062498492f799b58ddc550d6`

Still testing, as I'm not ENTIRELY sure it's running on my GPU yet.
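For anyone reproducing this, the pins above amount to a constraint fragment like the following, applied before the image's `pip install` step. The versions are the ones reported in this comment; whether they suit other setups is untested.

```text
# Pins reported to work around the torch `compiler` import error above;
# apply before installing the rest of the requirements.
transformers==4.48.3
sentence-transformers==3.3.1
```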

@ivanbaldo commented on GitHub (May 7, 2025):

Maybe running the re-ranker on the CPU is not too slow?

Also, there's a PR for running the re-ranker externally via a Cohere API, which I guess is similar to an API provided by Ollama and others that could be implemented; see https://github.com/open-webui/open-webui/issues/8478 .
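One low-effort way to try the CPU path without rebuilding anything is to hide the GPU from the container: with `CUDA_VISIBLE_DEVICES` set to an empty string (a standard CUDA environment variable), `torch.cuda.is_available()` returns `False` and the reranker falls back to CPU. A sketch; the container name is a placeholder and the image is whatever tag you already deploy:

```shell
# Hide all GPUs from the container so torch falls back to CPU.
docker run -d --name open-webui-cpu-test \
  -e CUDA_VISIBLE_DEVICES="" \
  -p 3000:8080 \
  <your-open-webui-cuda-image>
```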

@ProjectMoon commented on GitHub (May 7, 2025):

> Maybe running the re-ranker on the CPU is not too slow?
>
> Also, there's a PR for running the re-ranker externally via a Cohere API, which I guess is similar to an API provided by Ollama and others that could be implemented; see #8478 .

Well, normally when I run reranking on the CPU, all the CPU cores spin to 100% and my desktop sounds like a sad jet engine. That doesn't happen here. Also, anything involving transformers (hybrid search, speech-to-text) executes too quickly to be CPU-only. But what I DO NOT see is the model being loaded onto the NVIDIA GPU in `nvidia-smi`. Yet if I run Python in the container and call `torch.cuda.is_available()`, I get `True`, and it reports the device is the GTX 970.

So... the evidence heavily points to "running on GPU."
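The checks described above can be run from the host in two commands; both use standard tooling, and the container name `open-webui` is a placeholder:

```shell
# Does torch inside the container see the GPU, and which one?
docker exec open-webui python -c \
  "import torch; print(torch.cuda.is_available(), torch.cuda.get_device_name(0) if torch.cuda.is_available() else None)"

# Is any process actually resident on the GPU right now?
nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv
```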

Reference: github-starred/open-webui#16969