[GH-ISSUE #13077] HuggingFace Models Fail Image Processing While Official Ollama Versions Work
Originally created by @VooDisss on GitHub (Apr 20, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/13077
Check Existing Issues
Installation Method
Docker
Open WebUI Version
0.6.5
Ollama Version (if applicable)
0.6.5
Operating System
Windows 10
Browser (if applicable)
Vivaldi 7.3.36535.9
Confirmation
Expected Behavior
Models should either process attached images correctly, as the official Ollama versions of the same models do, or at least keep responding to text prompts.
Actual Behavior
When using the gemma3:12b and gemma3:27b models from HuggingFace on Windows 10 IoT Enterprise, the models become unresponsive when images are present. This occurs specifically within Open WebUI through Ollama.
For example:
hf.co/bartowski/mlabonne_gemma-3-27b-it-abliterated-GGUF:Q4_K_M
hf.co/bartowski/mlabonne_gemma-3-12b-it-abliterated-GGUF:Q6_K
hf.co/bartowski/mistralai_Mistral-Small-3.1-24B-Instruct-2503-GGUF:Q6_K
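For reference, such builds are pulled straight from Hugging Face by their hf.co path, with the tag selecting the quantization, e.g.:

```
ollama pull hf.co/bartowski/mlabonne_gemma-3-27b-it-abliterated-GGUF:Q4_K_M
```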
Steps to Reproduce
3.1 Try to send any message with an attached image.
4.1 The model fails to respond or generate any output.
Alternative:
3.2 Send a message with an attached image using an official gemma3:27b or gemma3:12b model; this succeeds.
3.2.1 Switch to one of the HuggingFace models mentioned above and send anything (even without an attached image).
4.2 The model fails to respond or generate any output.
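To rule out Open WebUI itself, the same failure should be reproducible directly against Ollama's /api/chat endpoint. A minimal sketch, where the model tag and image file are placeholders and the image is sent as base64:

```
# Send a chat request with an inline base64 image straight to Ollama
curl http://localhost:11434/api/chat -d '{
  "model": "hf.co/bartowski/mlabonne_gemma-3-12b-it-abliterated-GGUF:Q6_K",
  "messages": [{
    "role": "user",
    "content": "What is in this picture?",
    "images": ["'"$(base64 -w0 test.png)"'"]
  }]
}'
```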
Logs & Screenshots
msg="starting llama server" cmd="C:\\Users\\Username\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --model C:\\Users\\Username\\.ollama\\models\\blobs\\sha256-0d7afea4b1889c113f4a8ec1855d23bee71b3e3bedcb1fad84f9c9ffcdfe07d0 --ctx-size 15000 --batch-size 512 --n-gpu-layers 63 --threads 6 --no-mmap --parallel 1 --port 11725" time=2025-04-20T14:35:43.651+03:00 level=INFO source=sched.go:451 msg="loaded runners" count=1 time=2025-04-20T14:35:43.651+03:00 level=INFO source=server.go:580 msg="waiting for llama runner to start responding" time=2025-04-20T14:35:43.677+03:00 level=INFO source=runner.go:816 msg="starting ollama engine" time=2025-04-20T14:35:43.679+03:00 level=INFO source=runner.go:879 msg="Server listening on 127.0.0.1:11725" time=2025-04-20T14:35:43.712+03:00 level=INFO source=server.go:614 msg="waiting for server to become available" status="llm server error" time=2025-04-20T14:35:43.720+03:00 level=WARN source=ggml.go:152 msg="key not found" key=general.description default="" time=2025-04-20T14:35:43.720+03:00 level=INFO source=ggml.go:67 msg="" architecture=gemma3 file_type=Q4_K_M name="Gemma 3 27b It Abliterated" description="" num_tensors=808 num_key_values=45 time=2025-04-20T14:35:43.749+03:00 level=WARN source=sched.go:648 msg="gpu VRAM usage didn't recover within timeout" seconds=5.260531 model=C:\Users\Username\.ollama\models\blobs\sha256-ef951e8da2afca377da0c429e6c3314a5ee1aeac5b1ddff48918c43ae86878aa ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 1 CUDA devices: Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes load_backend: loaded CUDA backend from C:\Users\Username\AppData\Local\Programs\Ollama\lib\ollama\cuda_v12\ggml-cuda.dll load_backend: loaded CPU backend from C:\Users\Username\AppData\Local\Programs\Ollama\lib\ollama\ggml-cpu-haswell.dll time=2025-04-20T14:35:43.817+03:00 level=INFO source=ggml.go:109 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(clang) time=2025-04-20T14:35:43.963+03:00 level=INFO source=server.go:614 msg="waiting for server to become available" status="llm server loading model" time=2025-04-20T14:35:43.969+03:00 level=INFO source=ggml.go:289 msg="model weights" buffer=CUDA0 size="15.4 GiB" time=2025-04-20T14:35:43.969+03:00 level=INFO source=ggml.go:289 msg="model weights" buffer=CPU size="1.1 GiB" time=2025-04-20T14:35:43.999+03:00 level=WARN source=sched.go:648 msg="gpu VRAM usage didn't recover within timeout" seconds=5.5108764 model=C:\Users\Username\.ollama\models\blobs\sha256-ef951e8da2afca377da0c429e6c3314a5ee1aeac5b1ddff48918c43ae86878aa time=2025-04-20T14:35:49.040+03:00 level=INFO source=ggml.go:388 msg="compute graph" backend=CUDA0 buffer_type=CUDA0 time=2025-04-20T14:35:49.040+03:00 level=INFO source=ggml.go:388 msg="compute graph" backend=CPU buffer_type=CUDA_Host time=2025-04-20T14:35:49.044+03:00 level=WARN source=ggml.go:152 msg="key not found" key=tokenizer.ggml.add_eot_token default=false time=2025-04-20T14:35:49.047+03:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.image_size default=0 time=2025-04-20T14:35:49.047+03:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.patch_size default=0 time=2025-04-20T14:35:49.047+03:00 level=WARN source=ggml.go:152 msg="key not found" 
key=gemma3.vision.num_channels default=0 time=2025-04-20T14:35:49.047+03:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.block_count default=0 time=2025-04-20T14:35:49.047+03:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.embedding_length default=0 time=2025-04-20T14:35:49.047+03:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.attention.head_count default=0 time=2025-04-20T14:35:49.047+03:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.image_size default=0 time=2025-04-20T14:35:49.047+03:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.patch_size default=0 time=2025-04-20T14:35:49.047+03:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.attention.layer_norm_epsilon default=0 time=2025-04-20T14:35:49.056+03:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.local.freq_base default=10000 time=2025-04-20T14:35:49.056+03:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06 time=2025-04-20T14:35:49.056+03:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.freq_scale default=1 time=2025-04-20T14:35:49.056+03:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.mm_tokens_per_image default=256 time=2025-04-20T14:35:49.226+03:00 level=INFO source=server.go:619 msg="llama runner started in 5.58 seconds" time=2025-04-20T14:35:49.332+03:00 level=INFO source=server.go:789 msg="llm predict error: Failed to create new sequence: failed to process inputs: this model is missing data required for image input" [GIN] 2025/04/20 - 14:35:49 | 200 | 10.9533448s | 127.0.0.1 | POST "/api/chat" time=2025-04-20T14:36:15.868+03:00 level=INFO source=server.go:789 msg="llm predict error: Failed to create new sequence: failed to process inputs: this model is missing data required for image input" [GIN] 2025/04/20 - 14:36:15 | 200 | 89.9969ms | 127.0.0.1 | POST "/api/chat" time=2025-04-20T14:36:26.040+03:00 level=INFO source=server.go:789 msg="llm predict error: Failed to create new sequence: failed to process inputs: this model is missing data required for image input" [GIN] 2025/04/20 - 14:36:26 | 200 | 96.0007ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/04/20 - 14:37:47 | 200 | 0s | 127.0.0.1 | GET "/api/version" **time=2025-04-20T14:37:59.756+03:00 level=INFO source=server.go:789 msg="llm predict error: Failed to create new sequence: failed to process inputs: this model is missing data required for image input"** [GIN] 2025/04/20 - 14:37:59 | 200 | 52.9984ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/04/20 - 14:38:01 | 200 | 1.3285373s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/04/20 - 14:38:02 | 200 | 883.4985ms | 127.0.0.1 | POST "/api/chat" **time=2025-04-20T14:38:49.731+03:00 level=INFO source=server.go:789 msg="llm predict error: Failed to create new sequence: failed to process inputs: this model is missing data required for image input"** [GIN] 2025/04/20 - 14:38:49 | 200 | 30.9938ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/04/20 - 14:38:50 | 200 | 953.9996ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/04/20 - 14:38:51 | 200 | 848.4977ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/04/20 - 14:40:20 | 200 | 0s | 127.0.0.1 | GET "/api/version"Additional Information
All official Ollama models work as expected with images.
The issue is specific to the HuggingFace versions of these models.
Even after disabling vision, the model remains unresponsive when an image was previously present in the chat.
The Ollama server log shows errors; the Docker Open WebUI container logs show none.
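The repeated "key not found" warnings for gemma3.vision.* keys in the log above suggest the downloaded GGUF lacks vision metadata altogether. One way to verify, sketched under the assumption that llama.cpp's gguf Python package (which ships a gguf-dump tool) is installed; the blob path is copied from the log:

```
# Install llama.cpp's GGUF inspection tooling
pip install gguf
# Dump the blob's metadata and look for vision keys
gguf-dump "C:\Users\Username\.ollama\models\blobs\sha256-0d7afea4b1889c113f4a8ec1855d23bee71b3e3bedcb1fad84f9c9ffcdfe07d0" | findstr "vision"
# A vision-capable GGUF should report keys like gemma3.vision.image_size;
# no output here means the file carries no projector data.
```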
@VooDisss commented on GitHub (Apr 20, 2025):
I got the suggestion to use an official Ollama model from another user on Discord, Movalis. So the problem is not just on my end, and we have to resort to official Ollama models if we want any images in the chat.
@EntropyYue commented on GitHub (Apr 24, 2025):
Have you added a visual projector to the modelfile?
@VooDisss commented on GitHub (Apr 26, 2025):
@EntropyYue no, I didn't add a visual projector, and I'm unfamiliar with the process. It appears the issue lies with Ollama's download process: it doesn't automatically include the components needed for image support. The workaround requires manually downloading the full model and then quantizing it. I've seen similar discussions about this on the Ollama GitHub.
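The manual route alluded to here would look roughly like the following with llama.cpp's tooling. This is a sketch only: the checkout directory and output file names are placeholders, and the build layout of the llama-quantize binary varies by platform.

```
# Convert a full (safetensors) Hugging Face checkout to GGUF, then quantize it
git clone https://github.com/ggml-org/llama.cpp
pip install -r llama.cpp/requirements.txt
python llama.cpp/convert_hf_to_gguf.py ./gemma-3-12b-it --outfile gemma-3-12b-f16.gguf
# llama-quantize is built from the llama.cpp sources
llama-quantize gemma-3-12b-f16.gguf gemma-3-12b-Q6_K.gguf Q6_K
```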
@EntropyYue commented on GitHub (Apr 26, 2025):
You should download the visual projector and include it in the modelfile
@VooDisss commented on GitHub (Apr 26, 2025):
@EntropyYue Easier said than done... didn't I mention that I use Ollama and NOT LM Studio (where it really is as easy as dropping the mmproj projector file into the same folder as the model's GGUF file)? (I did)
@EntropyYue commented on GitHub (Apr 26, 2025):
Import the model and projector separately, like this:
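A minimal sketch of that approach, assuming Ollama's GGUF import accepts a second FROM line pointing at an mmproj vision projector file (the file names are placeholders, and the quant repo must actually provide an mmproj file):

```
# Write a Modelfile that imports the model weights and the mmproj
# vision projector as separate FROM lines
cat > Modelfile <<'EOF'
FROM ./gemma-3-12b-it-Q6_K.gguf
FROM ./mmproj-gemma-3-12b-it-f16.gguf
EOF

# Build a local model from it, then test with an image attached
ollama create gemma3-12b-vision -f Modelfile
ollama run gemma3-12b-vision
```

If the repo offers no mmproj file, the conversion route sketched earlier in the thread is the fallback.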