[GH-ISSUE #6255] Update LLaVA to LLaVA OneVision #65952

Open
opened 2026-05-03 23:18:34 -05:00 by GiteaMirror · 4 comments

Originally created by @alexrah on GitHub (Aug 8, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/6255

LLaVA has a new version called OneVision, which was released on 2024/08/06.

HuggingFace: https://huggingface.co/collections/lmms-lab/llava-onevision-66a259c3526e15166d6bba37
GitHub: https://github.com/LLaVA-VL/LLaVA-NeXT/
Release Notes: https://llava-vl.github.io/blog/2024-08-05-llava-onevision/

Key features:

  • Supports various input resolutions up to 2304 * 2304 pixels.
  • A single image input is represented by at most 729 * (9+1) tokens under anyres_max_9 mode.
  • Supports multi-image and video inputs. Multi-image input is represented by 729 tokens per image, and video input is represented by 196 tokens per frame.
  • Available in three sizes (0.5B, 7B and 72B parameters) to fit different memory and inference latency requirements.
  • Better support for Set-of-mark prompting.
  • And more...
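The token counts quoted in the feature list translate into a rough visual-token budget per request. The sketch below only restates that arithmetic; the function and constant names are hypothetical and are not part of LLaVA or ollama, and reading the "+1" as one extra 729-token grid on top of the 9 anyres tiles is an assumption.

```python
# Visual token counts taken from the feature list in this issue.
# All names here are illustrative only (not a LLaVA or ollama API).
TOKENS_PER_GRID = 729          # tokens per image (or per tile)
MAX_ANYRES_TILES = 9           # anyres_max_9 mode
TOKENS_PER_VIDEO_FRAME = 196   # tokens per video frame


def single_image_max_tokens() -> int:
    """Upper bound for one image under anyres_max_9: 729 * (9 + 1)."""
    # Assumption: the "+1" is one additional 729-token grid.
    return TOKENS_PER_GRID * (MAX_ANYRES_TILES + 1)


def multi_image_tokens(num_images: int) -> int:
    """Multi-image input: 729 tokens for each image."""
    return TOKENS_PER_GRID * num_images


def video_tokens(num_frames: int) -> int:
    """Video input: 196 tokens for each frame."""
    return TOKENS_PER_VIDEO_FRAME * num_frames


if __name__ == "__main__":
    print(single_image_max_tokens())  # 7290
    print(multi_image_tokens(4))      # 2916
    print(video_tokens(32))           # 6272
```

These figures count visual tokens only; the text prompt and any generated output would consume additional context on top of them.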
GiteaMirror added the model label 2026-05-03 23:18:34 -05:00

@evolu8 commented on GitHub (Aug 31, 2024):

A Q4 quantization of the Qwen 72B version is a world apart from all the other open models I've tested, yet it is workable on the hardware budgets of small businesses and many (not all) private users: it fits on two large GPUs or one very large GPU and runs at a decent speed. It might not be getting the press, but this is a game changer for many. Please let us know if there are challenges the community can help with.


@ChieF-TroN commented on GitHub (Sep 5, 2024):

Showing my support for adding LLaVA-OneVision. It is leagues better than any previous LLaVA model.


@jignnsd commented on GitHub (Sep 5, 2024):

The vision models in ollama have very limited capabilities. It would be very useful to add better vision models like this one, so that we keep up with the latest developments.
Thanks


@blacklig commented on GitHub (Sep 12, 2024):

Yes, please. At least the 7B SI model.

Reference: github-starred/ollama#65952