[GH-ISSUE #8422] Support for llamaindex/vdr-2b-multi-v1: Multilingual Visual Document Retrieval Model #5411

Open
opened 2026-04-12 16:38:55 -05:00 by GiteaMirror · 1 comment

Originally created by @JPC612 on GitHub (Jan 14, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/8422

vdr-2b-multi-v1 is a cutting-edge multilingual embedding model designed for visual document retrieval across various languages and domains. The model encodes document page screenshots into dense single-vector representations, allowing efficient search and querying of visually rich multilingual documents without OCR or data extraction pipelines.

https://huggingface.co/llamaindex/vdr-2b-multi-v1
https://huggingface.co/blog/vdr-2b-multilingual

Highlights:

  • Multilingual Training: Trained on a dataset of 500k high-quality samples in Italian, Spanish, English, French, and German.
  • Low VRAM and Faster Inference: 3x faster inference with only 30% of the image tokens used by its base model.
  • Cross-Lingual Retrieval: Search German documents using Italian queries with superior accuracy.
  • Matryoshka Representation Learning (MRL): Enables dimensional reduction while maintaining embedding quality, optimizing both retrieval speed and storage.
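As a rough illustration of the MRL and single-vector retrieval points above, here is a minimal sketch of how truncated embeddings might be used for search. It assumes the embeddings are already available as plain lists of floats; the helper names `truncate_embedding` and `cosine_rank` and the toy 4-dimensional vectors are hypothetical and are not part of the model's or Ollama's API.

```python
import math

def truncate_embedding(vec, dim):
    """Keep the first `dim` components and L2-normalize the result."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return list(head)
    return [x / norm for x in head]

def cosine_rank(query, pages):
    """Return page indices sorted by descending dot product with the query.

    Vectors are assumed to be unit-length, so the dot product equals
    the cosine similarity.
    """
    scores = [sum(q * p for q, p in zip(query, page)) for page in pages]
    return sorted(range(len(pages)), key=lambda i: scores[i], reverse=True)

# Toy vectors standing in for real model output (made up for illustration).
query = [0.9, 0.1, 0.3, 0.2]
pages = [
    [0.1, 0.9, 0.2, 0.4],
    [0.8, 0.2, 0.1, 0.5],
]

dim = 2  # hypothetical reduced dimension
q_small = truncate_embedding(query, dim)
p_small = [truncate_embedding(p, dim) for p in pages]
ranking = cosine_rank(q_small, p_small)
print(ranking)
```

With an MRL-trained model, the leading components are expected to carry most of the signal, so a shorter prefix can be stored and searched at lower cost. Which dimensions work well in practice would have to be checked against the model card.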

Why Include This Model?

  • Multilingual Applications: Especially beneficial for regions like Europe, where multilingual documents are prevalent.
  • Performance and Efficiency: Reported to improve on earlier models in speed, memory efficiency, and retrieval accuracy.
  • Open Source Contributions: Accompanied by the largest open-source multilingual dataset for visual document retrieval (vdr-multilingual-train).
GiteaMirror added the model label 2026-04-12 16:38:55 -05:00

@DewiarQR commented on GitHub (Feb 13, 2025):

I support this. I would like to see this model in the ollama ecosystem.

Reference: github-starred/ollama#5411