[GH-ISSUE #10248] InternVL3 Series with Vision, Tools Support, and Quantized Versions #6724

Closed
opened 2026-04-12 18:28:28 -05:00 by GiteaMirror · 17 comments
Owner

Originally created by @zytoh0 on GitHub (Apr 12, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/10248

Please add InternVL3 series with both vision and tools support.

Model:
https://huggingface.co/collections/OpenGVLab/internvl3-67f7f690be79c2fe9d74fe9d

InternVL3 is a strong multimodal model with tool-using capabilities, ideal for vision agents and perception-based workflows.

Requesting:

  • Vision and tools/function calling support

  • A wide range of quantized versions to support different deployment scenarios, as done with qwen2.5-coder, is highly useful.

Thank you!

Originally created by @zytoh0 on GitHub (Apr 12, 2025). Original GitHub issue: https://github.com/ollama/ollama/issues/10248 Please add InternVL3 series with both vision and tools support. Model: https://huggingface.co/collections/OpenGVLab/internvl3-67f7f690be79c2fe9d74fe9d InternVL3 is a strong multimodal model with tool-using capabilities, ideal for vision agents and perception-based workflows. Requesting: - Vision and tools/function calling support - A wide range of quantized versions to support different deployment scenarios, as done with [qwen2.5-coder](https://ollama.com/library/qwen2.5-coder/tags), is highly useful. Thank you!
GiteaMirror added the model label 2026-04-12 18:28:28 -05:00
Author
Owner

@3unnycheung commented on GitHub (Apr 14, 2025):

a great model

<!-- gh-comment-id:2801412358 --> @3unnycheung commented on GitHub (Apr 14, 2025): a great model
Author
Owner

@gongysh2004 commented on GitHub (Apr 17, 2025):

+1

<!-- gh-comment-id:2811395018 --> @gongysh2004 commented on GitHub (Apr 17, 2025): +1
Author
Owner

@ayang commented on GitHub (Apr 20, 2025):

+1

<!-- gh-comment-id:2817217660 --> @ayang commented on GitHub (Apr 20, 2025): +1
Author
Owner

@MickeyMi commented on GitHub (Apr 21, 2025):

+1

<!-- gh-comment-id:2818488151 --> @MickeyMi commented on GitHub (Apr 21, 2025): +1
Author
Owner

@RenzoZS commented on GitHub (Apr 23, 2025):

+1

<!-- gh-comment-id:2825272412 --> @RenzoZS commented on GitHub (Apr 23, 2025): +1
Author
Owner

@webzone commented on GitHub (Apr 30, 2025):

+1

<!-- gh-comment-id:2840799619 --> @webzone commented on GitHub (Apr 30, 2025): +1
Author
Owner

@VooDisss commented on GitHub (May 2, 2025):

+1

<!-- gh-comment-id:2847597433 --> @VooDisss commented on GitHub (May 2, 2025): +1
Author
Owner

@ghost commented on GitHub (May 13, 2025):

The latest ollama pre-release v0.7.0-rc0 can actually load and run the gguf versions from this collection:
https://huggingface.co/collections/ggml-org/internvl-3-and-internvl-25-681f412ab9b6f40dc20ac926

I guess this is because llama.cpp recently added support for multi modal including internvl3 series (https://github.com/ggml-org/llama.cpp/blob/master/docs/multimodal.md).

But vision accuracy seems completely off right now in my tests compared to llama.cpp standalone.

<!-- gh-comment-id:2876225917 --> @ghost commented on GitHub (May 13, 2025): The latest ollama pre-release `v0.7.0-rc0` can actually load and run the gguf versions from this collection: https://huggingface.co/collections/ggml-org/internvl-3-and-internvl-25-681f412ab9b6f40dc20ac926 I guess this is because llama.cpp recently added support for multi modal including internvl3 series (https://github.com/ggml-org/llama.cpp/blob/master/docs/multimodal.md). But vision accuracy seems completely off right now in my tests compared to llama.cpp standalone.
Author
Owner

@maximilianbehr commented on GitHub (May 14, 2025):

+1

<!-- gh-comment-id:2881216331 --> @maximilianbehr commented on GitHub (May 14, 2025): +1
Author
Owner

@luguoyixiazi commented on GitHub (May 16, 2025):

The latest ollama pre-release v0.7.0-rc0 can actually load and run the gguf versions from this collection: https://huggingface.co/collections/ggml-org/internvl-3-and-internvl-25-681f412ab9b6f40dc20ac926

I guess this is because llama.cpp recently added support for multi modal including internvl3 series (https://github.com/ggml-org/llama.cpp/blob/master/docs/multimodal.md).

But vision accuracy seems completely off right now in my tests compared to llama.cpp standalone.

I tried in lmstudio (8b-fp16), much worse than using transfomers (8b-hf), have you compare them? I guess load image may make difference.

<!-- gh-comment-id:2887235690 --> @luguoyixiazi commented on GitHub (May 16, 2025): > The latest ollama pre-release `v0.7.0-rc0` can actually load and run the gguf versions from this collection: https://huggingface.co/collections/ggml-org/internvl-3-and-internvl-25-681f412ab9b6f40dc20ac926 > > I guess this is because llama.cpp recently added support for multi modal including internvl3 series (https://github.com/ggml-org/llama.cpp/blob/master/docs/multimodal.md). > > But vision accuracy seems completely off right now in my tests compared to llama.cpp standalone. I tried in lmstudio (8b-fp16), much worse than using transfomers (8b-hf), have you compare them? I guess load image may make difference.
Author
Owner

@kaci commented on GitHub (May 17, 2025):

+1

<!-- gh-comment-id:2888536988 --> @kaci commented on GitHub (May 17, 2025): +1
Author
Owner

@ShenYangLin93 commented on GitHub (Jun 13, 2025):

+1

<!-- gh-comment-id:2968738070 --> @ShenYangLin93 commented on GitHub (Jun 13, 2025): +1
Author
Owner

@ghzgod commented on GitHub (Jun 19, 2025):

+1

<!-- gh-comment-id:2988783713 --> @ghzgod commented on GitHub (Jun 19, 2025): +1
Author
Owner
<!-- gh-comment-id:3225733293 --> @Elgokoo commented on GitHub (Aug 26, 2025): it doesn"t support https://huggingface.co/bartowski/OpenGVLab_InternVL3_5-14B-GGUF/blob/main/OpenGVLab_InternVL3_5-14B-Q6_K_L.gguf
Author
Owner

@billbillbilly commented on GitHub (Aug 28, 2025):

+1

<!-- gh-comment-id:3235129061 --> @billbillbilly commented on GitHub (Aug 28, 2025): +1
Author
Owner

@Deathproof76 commented on GitHub (Sep 7, 2025):

+1

<!-- gh-comment-id:3263684783 --> @Deathproof76 commented on GitHub (Sep 7, 2025): +1
Author
Owner

@rick-github commented on GitHub (Dec 24, 2025):

$ ollama-run.py hf.co/unsloth/InternVL3-8B-Instruct-GGUF:Q4_K_M what is the time? and what is in this image: ./image1.jpg --tool get_datetime
Added picture ./image1.jpg

calling get_datetime({'timezone_name': 'UTC'})
The current time is 00:56. 

As for the image you provided (image1.jpg), it features a cute puppy sitting on a stone
surface with a red collar and bell around its neck, looking to the side. The background
appears blurred, suggesting an outdoor setting.

The template in the bartowski model doesn't include tool support but it should be possible to merge it with the template from the unsloth model.

$ ollama run hf.co/bartowski/OpenGVLab_InternVL3_5-14B-GGUF:Q6_K_L what is in this image: ./image1.jpg 
Added image './image1.jpg'
The image shows a small white puppy sitting on stone steps. The puppy is wearing a red collar with 
gold-colored studs and a bell attached to it. The background appears to be an outdoor setting with 
blurred elements, possibly indicating a patio or garden area. The puppy looks alert and curious, gazing 
off to the side.
<!-- gh-comment-id:3688326140 --> @rick-github commented on GitHub (Dec 24, 2025): ```console $ ollama-run.py hf.co/unsloth/InternVL3-8B-Instruct-GGUF:Q4_K_M what is the time? and what is in this image: ./image1.jpg --tool get_datetime Added picture ./image1.jpg calling get_datetime({'timezone_name': 'UTC'}) The current time is 00:56. As for the image you provided (image1.jpg), it features a cute puppy sitting on a stone surface with a red collar and bell around its neck, looking to the side. The background appears blurred, suggesting an outdoor setting. ``` The template in the bartowski model doesn't include tool support but it should be possible to merge it with the template from the unsloth model. ```console $ ollama run hf.co/bartowski/OpenGVLab_InternVL3_5-14B-GGUF:Q6_K_L what is in this image: ./image1.jpg Added image './image1.jpg' The image shows a small white puppy sitting on stone steps. The puppy is wearing a red collar with gold-colored studs and a bell attached to it. The background appears to be an outdoor setting with blurred elements, possibly indicating a patio or garden area. The puppy looks alert and curious, gazing off to the side. ```
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: github-starred/ollama#6724