[GH-ISSUE #4332] Difference in performance between liuhaotian/llava-v1.6-34b and Ollama's llava:34b-v1.6 #2696

Closed
opened 2026-04-12 13:01:07 -05:00 by GiteaMirror · 4 comments
Owner

Originally created by @EricWiener on GitHub (May 11, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/4332

What is the issue?

When using the demo [here](https://llava.hliu.cc/) I get much better results from their `llava:34b-v1.6` than I do via `ollama run llava:34b-v1.6` when using the same prompt followed by the image. Example of how I'm prompting:

```
ollama run llava:34b-v1.6 --verbose
>>> Is there a dog in this picture and if so what is it doing? /data/dog.png
```

Is there any reason for this and can I somehow match the performance of the demo?
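For reference, the same prompt can be sent through Ollama's HTTP API (a sketch, assuming the Ollama server is running locally on the default port; `/data/dog.png` is the example image from above, which the API expects base64-encoded):

```
# Same prompt via Ollama's REST API; images are passed as base64 strings.
curl http://localhost:11434/api/generate -d "{
  \"model\": \"llava:34b-v1.6\",
  \"prompt\": \"Is there a dog in this picture and if so what is it doing?\",
  \"images\": [\"$(base64 -w0 /data/dog.png)\"],
  \"stream\": false
}"
```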

OS

Linux

GPU

Nvidia

CPU

No response

Ollama version

0.1.34

GiteaMirror added the bug label 2026-04-12 13:01:07 -05:00
Author
Owner

@lizaibeim commented on GitHub (May 13, 2024):

It is not a bug. The demo uses fp16 precision, while `llava:34b-v1.6` is 4-bit quantized, so the performance is not comparable.
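One way to close the gap (a suggestion, assuming you have enough VRAM, roughly 69G; the fp16 tag is available in the Ollama library linked below) is to run the unquantized build:

```
# Pull and run the fp16 build instead of the default 4-bit quantization.
ollama run llava:34b-v1.6-fp16 --verbose
```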

Author
Owner

@EricWiener commented on GitHub (May 13, 2024):

Thanks for the response @lizaibeim! Do you know whether the demo is fp16 or bf16? I believe the demo might be bf16, but I only see fp16 as an option through Ollama.

Author
Owner

@lizaibeim commented on GitHub (May 13, 2024):

Sorry, I couldn't confirm whether it is bf16 or fp16. But based on the recommended VRAM (80G for [LLaVA](https://github.com/haotian-liu/LLaVA), versus 69G for [Ollama's fp16 tag](https://ollama.com/library/llava:34b-v1.6-fp16)), it might be bf16.
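As a rough cross-check (my own back-of-envelope, weights only): 34B parameters at 2 bytes per parameter for either fp16 or bf16 comes to about 68 GB, which lines up with the 69G figure for Ollama's fp16 tag. You can also confirm which variants are pulled locally:

```
# List local llava models and their on-disk sizes
# (a 4-bit build is roughly a quarter the size of the fp16 build).
ollama list | grep llava
```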

Author
Owner

@EricWiener commented on GitHub (May 13, 2024):

Thanks @lizaibeim !
