Mirror of https://github.com/open-webui/open-webui.git, synced 2026-05-08 21:09:41 -05:00
[GH-ISSUE #13614] issue: Nvidia V100 GPU not supported in main-cuda docker image(with capability 7.0, message say all below 7.5 are not supported) #16969
Originally created by @ER-EPR on GitHub (May 7, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/13614
Check Existing Issues
Installation Method
Docker
Open WebUI Version
0.6.7
Ollama Version (if applicable)
0.6.8
Operating System
Ubuntu 22.04
Browser (if applicable)
edge
Confirmation
I have read and followed all the instructions provided in the README.md file.
Expected Behavior
Reranking works.
Actual Behavior
Reranking does not work. The log shows the following when the container starts:
Steps to Reproduce
Deploy the main-cuda image on a machine with an Nvidia V100 GPU.
Use reranking in RAG knowledge or hybrid search.
Logs & Screenshots
2025-05-07 04:29:26.715 | INFO | open_webui.utils.plugin:load_tool_module_by_id:103 - Loaded module: tool_default - {}
2025-05-07 04:29:39.766 | INFO | open_webui.retrieval.utils:query_collection_with_hybrid_search:334 - Starting hybrid search for 1 queries in 1 collections... - {}
Batches: 0%| | 0/1 [00:00<?, ?it/s]
Batches: 0%| | 0/1 [00:00<?, ?it/s]
2025-05-07 04:29:43.814 | ERROR | open_webui.retrieval.utils:query_doc_with_hybrid_search:174 - Error querying doc file-7dc76e48-426e-47c3-ad14-f7d7250acb58 with hybrid search: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
Traceback (most recent call last):
File "/usr/local/lib/python3.11/threading.py", line 1002, in _bootstrap
File "/usr/local/lib/python3.11/threading.py", line 1045, in _bootstrap_inner
File "/usr/local/lib/python3.11/threading.py", line 982, in run
File "/usr/local/lib/python3.11/concurrent/futures/thread.py", line 83, in _worker
File "/usr/local/lib/python3.11/concurrent/futures/thread.py", line 58, in run
File "/app/backend/open_webui/retrieval/utils.py", line 340, in process_query
File "/usr/local/lib/python3.11/site-packages/langchain_core/retrievers.py", line 258, in invoke
File "/usr/local/lib/python3.11/site-packages/langchain/retrievers/contextual_compression.py", line 48, in _get_relevant_documents
File "/app/backend/open_webui/retrieval/utils.py", line 809, in compress_documents
File "/usr/local/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
File "/usr/local/lib/python3.11/site-packages/sentence_transformers/cross_encoder/util.py", line 68, in wrapper
File "/usr/local/lib/python3.11/site-packages/sentence_transformers/cross_encoder/CrossEncoder.py", line 651, in predict
File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
File "/usr/local/lib/python3.11/site-packages/transformers/models/xlm_roberta/modeling_xlm_roberta.py", line 1331, in forward
File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
File "/usr/local/lib/python3.11/site-packages/transformers/models/xlm_roberta/modeling_xlm_roberta.py", line 915, in forward
File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
File "/usr/local/lib/python3.11/site-packages/transformers/models/xlm_roberta/modeling_xlm_roberta.py", line 100, in forward
File "/usr/local/lib/python3.11/site-packages/transformers/models/xlm_roberta/modeling_xlm_roberta.py", line 1700, in create_position_ids_from_input_ids
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
2025-05-07 04:29:43.855 | ERROR | open_webui.retrieval.utils:process_query:352 - Error when querying the collection with hybrid_search: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
Traceback (most recent call last):
File "/usr/local/lib/python3.11/threading.py", line 1002, in _bootstrap
File "/usr/local/lib/python3.11/threading.py", line 1045, in _bootstrap_inner
File "/usr/local/lib/python3.11/threading.py", line 982, in run
File "/usr/local/lib/python3.11/concurrent/futures/thread.py", line 83, in _worker
File "/usr/local/lib/python3.11/concurrent/futures/thread.py", line 58, in run
File "/app/backend/open_webui/retrieval/utils.py", line 175, in query_doc_with_hybrid_search
File "/app/backend/open_webui/retrieval/utils.py", line 148, in query_doc_with_hybrid_search
File "/usr/local/lib/python3.11/site-packages/langchain_core/retrievers.py", line 258, in invoke
File "/usr/local/lib/python3.11/site-packages/langchain/retrievers/contextual_compression.py", line 48, in _get_relevant_documents
File "/app/backend/open_webui/retrieval/utils.py", line 809, in compress_documents
File "/usr/local/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
File "/usr/local/lib/python3.11/site-packages/sentence_transformers/cross_encoder/util.py", line 68, in wrapper
File "/usr/local/lib/python3.11/site-packages/sentence_transformers/cross_encoder/CrossEncoder.py", line 651, in predict
File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
File "/usr/local/lib/python3.11/site-packages/transformers/models/xlm_roberta/modeling_xlm_roberta.py", line 1331, in forward
File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
File "/usr/local/lib/python3.11/site-packages/transformers/models/xlm_roberta/modeling_xlm_roberta.py", line 915, in forward
File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
File "/usr/local/lib/python3.11/site-packages/transformers/models/xlm_roberta/modeling_xlm_roberta.py", line 100, in forward
File "/usr/local/lib/python3.11/site-packages/transformers/models/xlm_roberta/modeling_xlm_roberta.py", line 1700, in create_position_ids_from_input_ids
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
2025-05-07 04:29:44.024 | INFO | open_webui.retrieval.utils:query_doc:88 - query_doc:result '77565fac-2892-4df3-87d0-051b16a18d55', '46ce05e6-8db3-41cd-9007-e387cb846bab', '54a86116-cd4d-4010-8494-4e4c0e1b27d3', 'e54d84d9-2ff2-497b-b74f-e94f51833f13', '0914de01-3734-4a3a-a621-8a49263527be' {'Content-Type': 'application/vnd.openxmlformats-officedocument.wordprocessingml.document', 'created_by': '3705809b-2508-413e-acec-8f42176cee07', 'embedding_config': '{"engine": "ollama", "model": "bge-m3"}', 'file_id': '7dc76e48-426e-47c3-ad14-f7d7250acb58', 'hash': '95f23eee27b2e9846a1e64e8e760991aa4eb1935e08dd86393d9d024284b828e', 'name': 'Roche ATTD 录音稿.docx', 'source': 'Roche ATTD 录音稿.docx', 'start_index': 25522}, {'Content-Type': 'application/vnd.openxmlformats-officedocument.wordprocessingml.document', 'created_by': '3705809b-2508-413e-acec-8f42176cee07', 'embedding_config': '{"engine": "ollama", "model": "bge-m3"}', 'file_id': '7dc76e48-426e-47c3-ad14-f7d7250acb58', 'hash': '95f23eee27b2e9846a1e64e8e760991aa4eb1935e08dd86393d9d024284b828e', 'name': 'Roche ATTD 录音稿.docx', 'source': 'Roche ATTD 录音稿.docx', 'start_index': 25813}, {'Content-Type': 'application/vnd.openxmlformats-officedocument.wordprocessingml.document', 'created_by': '3705809b-2508-413e-acec-8f42176cee07', 'embedding_config': '{"engine": "ollama", "model": "bge-m3"}', 'file_id': '7dc76e48-426e-47c3-ad14-f7d7250acb58', 'hash': '95f23eee27b2e9846a1e64e8e760991aa4eb1935e08dd86393d9d024284b828e', 'name': 'Roche ATTD 录音稿.docx', 'source': 'Roche ATTD 录音稿.docx', 'start_index': 17323}, {'Content-Type': 'application/vnd.openxmlformats-officedocument.wordprocessingml.document', 'created_by': '3705809b-2508-413e-acec-8f42176cee07', 'embedding_config': '{"engine": "ollama", "model": "bge-m3"}', 'file_id': '7dc76e48-426e-47c3-ad14-f7d7250acb58', 'hash': '95f23eee27b2e9846a1e64e8e760991aa4eb1935e08dd86393d9d024284b828e', 'name': 'Roche ATTD 录音稿.docx', 'source': 'Roche ATTD 录音稿.docx', 'start_index': 36428}, {'Content-Type': 'application/vnd.openxmlformats-officedocument.wordprocessingml.document', 'created_by': '3705809b-2508-413e-acec-8f42176cee07', 'embedding_config': '{"engine": "ollama", "model": "bge-m3"}', 'file_id': '7dc76e48-426e-47c3-ad14-f7d7250acb58', 'hash': '95f23eee27b2e9846a1e64e8e760991aa4eb1935e08dd86393d9d024284b828e', 'name': 'Roche ATTD 录音稿.docx', 'source': 'Roche ATTD 录音稿.docx', 'start_index': 34979} - {}
Additional Information
https://github.com/open-webui/open-webui/issues/13186
In the recent PyTorch upgrade, cu128 with a test (pre-release) build of PyTorch 2.7 was used; I think this version may have dropped support for architectures below 7.5, but I can't find an exact statement in the PyTorch repo. I do see that the current stable release is PyTorch 2.7 with cu126. Is there a guide for building the main-cuda image myself?
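The "no kernel image is available for execution on the device" error means the installed PyTorch wheel ships no compiled binaries for the V100's compute capability (sm_70). Inside a running container, `torch.cuda.get_arch_list()` and `torch.cuda.get_device_capability()` show what the build supports versus what the device is. As a rough illustration (a simplified sketch, not PyTorch's actual loader logic), the compatibility rule looks like this:

```python
# Sketch of the CUDA binary-compatibility rule behind the
# "no kernel image is available" error. Illustrative only; in a real
# container you would compare torch.cuda.get_device_capability()
# against torch.cuda.get_arch_list().

def parse_arch(arch: str) -> tuple[str, int, int]:
    """Split e.g. 'sm_70' or 'compute_75' into (kind, major, minor)."""
    kind, _, digits = arch.partition("_")
    return kind, int(digits[:-1]), int(digits[-1])

def can_run(device_capability: tuple[int, int], arch_list: list[str]) -> bool:
    """True if a wheel built for `arch_list` has code the device can execute.

    Simplified rule: a binary 'sm_XY' entry must match the device's
    capability exactly; a 'compute_XY' (PTX) entry can be JIT-compiled
    on any device with capability >= X.Y.
    """
    for arch in arch_list:
        kind, major, minor = parse_arch(arch)
        if kind == "sm" and (major, minor) == device_capability:
            return True
        if kind == "compute" and device_capability >= (major, minor):
            return True
    return False

# A V100 is compute capability 7.0.
v100 = (7, 0)
# Hypothetical arch list of a wheel whose binaries start at sm_75:
print(can_run(v100, ["sm_75", "sm_80", "sm_86", "sm_90"]))  # False -> no kernel image
# A wheel that still ships sm_70 binaries (or compute_70 PTX) would work:
print(can_run(v100, ["sm_70", "sm_75", "sm_80"]))           # True
```

If `get_arch_list()` on the image contains neither an `sm_70` binary nor a `compute_70` (or lower) PTX entry, any CUDA kernel launch on a V100 fails exactly as in the traceback above.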
@Mister-Hope commented on GitHub (May 7, 2025):
I have a repo under my personal account; check the cu128 branch and revert to an older CUDA version in the same way.
You should understand that, to support newly released hardware, older hardware gradually falls out of support.
@ER-EPR commented on GitHub (May 7, 2025):
Hi, I understand. But among GPUs with ≥32 GB VRAM, the H100 and A100 are still too expensive. I have cu128 installed; the problem is that PyTorch in the latest image is missing support for CUDA compute capability 7.0. Is it possible to use a PyTorch build compiled with backward compatibility at least down to 7.0?
Also, is there a guide for building the WebUI CUDA image? I see the pull request only changes the CUDA version; what should I do if I want to test other combinations of CUDA and PyTorch build args?
@Mister-Hope commented on GitHub (May 7, 2025):
Hi, edit the Dockerfile, then build the image yourself; that should bypass this. This is the only change I contributed.
You should also be aware that, as time passes, the minimum CUDA version that PyTorch supports may change (PyTorch is currently 2.7, previously 2.6), so you may need to stop upgrading at some future version as well.
@ProjectMoon commented on GitHub (May 7, 2025):
I have a similar problem, but I require CUDA 11 for an NVIDIA GTX 970. Hybrid search was working some versions ago.
I am now building my own image. When I tried installing cu117, there was an error importing transformers.modeling_utils because the compiler attribute is missing from torch. I suspect another dependency needs to be version-locked for this case? Possibly transformers itself?
@ProjectMoon commented on GitHub (May 7, 2025):
Have experienced some apparent success with version-locking transformers to 4.48.3 and downgrading sentence-transformers back to 3.3.1. There was a commit that upgraded sentence-transformers to a new version: 3ec6652f990f0314062498492f799b58ddc550d6. Still testing, as I'm not ENTIRELY sure it's running on my GPU yet.
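For reference, that combination can be expressed as pins in a requirements/constraints file. These versions come from the comment above, not from any officially tested matrix, so treat them as a starting point for experimentation:

```
# Pins reported to work with an older torch/cu117 build (unofficial):
transformers==4.48.3
sentence-transformers==3.3.1
```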
@ivanbaldo commented on GitHub (May 7, 2025):
Maybe running the re-ranker on the CPU is not too slow?
Also, there's a PR for running the re-ranker externally with a Cohere API; I guess something similar could be implemented against an API provided by Ollama and others, see https://github.com/open-webui/open-webui/issues/8478 .
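Since the failure here surfaces mid-query as a RuntimeError whose message starts with "CUDA error:", one way a CPU fallback could work is a wrapper that catches that error and retries on CPU. The sketch below is hypothetical: the `rerank_gpu`/`rerank_cpu` callables stand in for something like `CrossEncoder.predict` bound to each device, and this is not Open WebUI's actual code:

```python
# Hypothetical GPU-to-CPU fallback around a reranker call. The callables
# stand in for e.g. CrossEncoder.predict on each device; names and wiring
# are illustrative, not Open WebUI's implementation.

def rerank_with_fallback(rerank_gpu, rerank_cpu, pairs):
    """Try the GPU reranker; on a CUDA kernel error, retry on CPU."""
    try:
        return rerank_gpu(pairs)
    except RuntimeError as exc:
        # "no kernel image is available" is raised as a plain RuntimeError
        # whose message begins with "CUDA error:".
        if "CUDA error" in str(exc):
            return rerank_cpu(pairs)
        raise

# Demo with stand-ins simulating an unsupported GPU:
def broken_gpu(pairs):
    raise RuntimeError(
        "CUDA error: no kernel image is available for execution on the device"
    )

def cpu_scores(pairs):
    return [0.5 for _ in pairs]

print(rerank_with_fallback(broken_gpu, cpu_scores, [("q", "doc1"), ("q", "doc2")]))
# -> [0.5, 0.5]
```

The trade-off is the one noted above: the CPU path keeps hybrid search functional on unsupported GPUs, at the cost of much slower reranking.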
@ProjectMoon commented on GitHub (May 7, 2025):
Well, normally when I run the reranking on the CPU, all the CPU cores spin to 100% and my desktop sounds like a sad jet engine. That doesn't happen. Also, it seems to execute anything involving transformers (hybrid search, speech-to-text) too quickly to be CPU-only. But what I DO NOT see is the model being loaded onto the NVIDIA GPU in nvidia-smi. Yet if I run Python in the container and call torch.cuda.is_available(), I get True, and it reports the device as the GTX 970. So... the evidence heavily points to "running on GPU."