[GH-ISSUE #4257] Support for InternVL-Chat-V1.5 #49169

Closed
opened 2026-04-28 10:53:19 -05:00 by GiteaMirror · 9 comments

Originally created by @wwjCMP on GitHub (May 8, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/4257

https://huggingface.co/OpenGVLab/InternVL-Chat-V1-5

We introduce InternVL 1.5, an open-source multimodal large language model (MLLM) to bridge the capability gap between open-source and proprietary commercial models in multimodal understanding. We introduce three simple designs:

Strong Vision Encoder: we explored a continuous learning strategy for the large-scale vision foundation model, InternViT-6B, boosting its visual understanding capabilities so that it can be transferred and reused in different LLMs.
Dynamic High-Resolution: we divide each image into 1 to 40 tiles of 448 × 448 pixels, according to the aspect ratio and resolution of the input image, which supports input up to 4K resolution.
High-Quality Bilingual Dataset: we carefully collected a high-quality bilingual dataset covering common scenes and document images, annotated with English and Chinese question-answer pairs, which significantly enhances performance on OCR- and Chinese-related tasks.
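
As a rough illustration of the Dynamic High-Resolution item above, the sketch below picks a grid of 448 × 448 tiles for a given image size. It is not InternVL's actual preprocessing code: the function name `choose_tile_grid`, the `max_tiles` parameter, and the tie-breaking rule (prefer the grid whose pixel area is closest to the image's) are assumptions made for illustration. Only the tile size and the 1 to 40 tile range come from the description.

```python
TILE = 448  # tile side in pixels, as stated in the description above


def choose_tile_grid(width, height, max_tiles=40):
    """Pick (cols, rows) with 1 <= cols * rows <= max_tiles.

    Hypothetical helper: chooses the grid whose aspect ratio is closest
    to the image's, breaking ties by closeness in total pixel area.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")

    target = width / height
    best, best_key = None, None
    for cols in range(1, max_tiles + 1):
        # rows is capped so that cols * rows never exceeds max_tiles
        for rows in range(1, max_tiles // cols + 1):
            ratio_err = abs(cols / rows - target)
            area_err = abs(cols * rows * TILE * TILE - width * height)
            key = (ratio_err, area_err)
            if best_key is None or key < best_key:
                best, best_key = (cols, rows), key
    return best


if __name__ == "__main__":
    for size in [(448, 448), (896, 448), (3840, 2160)]:
        cols, rows = choose_tile_grid(*size)
        print(size, "->", cols, "x", rows, "tiles,", cols * rows, "total")
```

Under these assumptions the image would then be resized to `cols * 448` by `rows * 448` pixels and cut into tiles; how a runtime such as ollama or llama.cpp would feed those tiles to the vision encoder is outside what this sketch covers.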

GiteaMirror added the model label 2026-04-28 10:53:19 -05:00

@wwjCMP commented on GitHub (May 8, 2024):

https://github.com/ggerganov/llama.cpp/issues/6803


@kong1414 commented on GitHub (Jun 20, 2024):

How is this adaptation work progressing?


@DuckyBlender commented on GitHub (Jul 26, 2024):

This is still one of the best open-source vision models. When will it be available on ollama?


@samyan commented on GitHub (Aug 12, 2024):

When will it be available?


@gaborcselle commented on GitHub (Sep 25, 2024):

Hi all, I was wondering when this might be available in ollama?


@cenap commented on GitHub (Dec 15, 2024):

+1


@benjamin-ebert commented on GitHub (Jan 28, 2025):

+1


@James4Ever0 commented on GitHub (Jan 28, 2025):

Have a look here:

https://github.com/ggerganov/llama.cpp/pull/9403

https://github.com/qlylangyu/llama.cpp/pull/1


@rick-github commented on GitHub (Jan 8, 2026):

Superseded by internvl3 https://github.com/ollama/ollama/issues/10248


Reference: github-starred/ollama#49169