ModernVBERT/colmodernvbert

Note

This is a remotely-sourced model from the colmodernvbert plugin, maintained by the community. It is not part of FiftyOne core and may have special installation requirements. Please review the plugin documentation and license before use.

The ModernVBERT suite is a family of compact 250M-parameter vision-language encoders. ColModernVBERT is the late-interaction variant, fine-tuned for visual document retrieval, and is the most performant model in the suite on this task.
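To make "late interaction" concrete, the sketch below shows the MaxSim scoring commonly used by ColBERT-style retrievers: the query and the document page are each represented by several vectors, every query vector is matched to its most similar page vector, and those best matches are summed. This is an illustration of the general technique only, not the plugin's implementation. The function names, the use of cosine similarity, and the toy 2-D vectors are all assumptions made for this example.

```python
import math


def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def maxsim_score(query_vecs, doc_vecs):
    """Late-interaction (MaxSim) score between two sets of vectors.

    For each query vector, take the highest cosine similarity against
    any document vector, then sum those maxima over the query vectors.
    """
    q = [normalize(v) for v in query_vecs]
    d = [normalize(v) for v in doc_vecs]
    total = 0.0
    for qv in q:
        total += max(sum(a * b for a, b in zip(qv, dv)) for dv in d)
    return total


# Toy vectors invented for illustration; real models emit one
# higher-dimensional vector per query token and per image patch.
query = [[1.0, 0.0], [0.0, 1.0]]
page_a = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]  # has a close match for each query vector
page_b = [[1.0, 1.0]]                          # only a partial match for each

print(maxsim_score(query, page_a))
print(maxsim_score(query, page_b))
```

Here `page_a` scores higher than `page_b` because each query vector finds an exact directional match among its vectors, which is the behavior that lets a late-interaction model reward pages containing fine-grained matches for individual query terms.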

Details

Model name: ModernVBERT/colmodernvbert
Model source: https://huggingface.co/ModernVBERT/colmodernvbert
Model author: Paul Teiletche et al.
Model license: MIT
Exposes embeddings? yes
Tags: classification, logits, embeddings, torch, visual-document-retrieval, zero-shot

Requirements

Packages: huggingface-hub, transformers, torch, torchvision, colpali-engine
CPU support: yes
GPU support: yes
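Before loading the model, it may help to confirm that the listed packages are installed. The following is a minimal sketch that uses only the Python standard library; the helper name `missing_packages` is hypothetical, and the package names are copied from the Packages line above. It checks only that each distribution is present, not that its version is compatible.

```python
from importlib import metadata

# Distribution names copied from the "Packages" line above.
REQUIRED = [
    "huggingface-hub",
    "transformers",
    "torch",
    "torchvision",
    "colpali-engine",
]


def missing_packages(names):
    """Return the names that have no installed distribution."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages: " + ", ".join(missing))
    print("Try: pip install " + " ".join(missing))
else:
    print("All listed packages are installed.")
```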

Example usage

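import fiftyone as fo
import fiftyone.zoo as foz

# Register the community plugin repository as a remote zoo model source
foz.register_zoo_model_source("https://github.com/harpreetsahota204/colmodernvbert")

# Load 50 randomly chosen samples from the COCO-2017 validation split
dataset = foz.load_zoo_dataset(
    "coco-2017",
    split="validation",
    dataset_name=fo.get_default_dataset_name(),
    max_samples=50,
    shuffle=True,
)

# Download (if necessary) and load the model from the registered source
model = foz.load_zoo_model("ModernVBERT/colmodernvbert")

# Run the model on every sample and store its output in the "predictions" field
dataset.apply_model(model, label_field="predictions")

# Open the FiftyOne App to inspect the results
session = fo.launch_app(dataset)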