OpenCLIP Integration

FiftyOne integrates natively with the OpenCLIP library, an open source implementation of OpenAI’s CLIP (Contrastive Language-Image Pre-training) model that you can use to run inference on your FiftyOne datasets with a few lines of code!

Setup

To get started with OpenCLIP, install the open_clip_torch package:

pip install open_clip_torch

# Some models may also require an up-to-date version of timm
pip install timm --upgrade
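To confirm that the packages above are importable from your Python environment, you can run a quick check like the minimal sketch below. It assumes that open_clip_torch is imported under the module name open_clip and that timm is imported as timm; the helper name check_openclip_deps is hypothetical and not part of FiftyOne.

```python
import importlib.util


def check_openclip_deps(modules=("open_clip", "timm")):
    """Return a mapping of module name -> whether it can be found.

    Hypothetical helper: "open_clip" is the import name of the
    open_clip_torch package, and "timm" is the optional companion package.
    """
    # find_spec() locates a top-level module without importing it
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, ok in check_openclip_deps().items():
        print(f"{name}: {'installed' if ok else 'missing'}")
```

If either entry reports "missing", rerun the corresponding pip command in the same environment that runs FiftyOne.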

Model zoo

You can load the original ViT-B-32 OpenAI pretrained model from the FiftyOne Model Zoo as follows:

import fiftyone.zoo as foz

model = foz.load_zoo_model("open-clip-torch")

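You can also specify different model architectures and pretrained weights via the optional clip_model and pretrained parameters. Pretrained models can be loaded directly from OpenCLIP or from the Hugging Face Hub; for Hub models, prefix the model name with hf-hub: and pass an empty pretrained value: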

rn50 = foz.load_zoo_model(
    "open-clip-torch",
    clip_model="RN50",
    pretrained="cc12m",
)

meta_clip = foz.load_zoo_model(
    "open-clip-torch",
    clip_model="ViT-B-32-quickgelu",
    pretrained="metaclip_400m",
)

eva_clip = foz.load_zoo_model(
    "open-clip-torch",
    clip_model="EVA02-B-16",
    pretrained="merged2b_s8b_b131k",
)

clipa = foz.load_zoo_model(
    "open-clip-torch",
    clip_model="hf-hub:UCSC-VLAA/ViT-L-14-CLIPA-datacomp1B",
    pretrained="",
)

siglip = foz.load_zoo_model(
    "open-clip-torch",
    clip_model="hf-hub:timm/ViT-B-16-SigLIP",
    pretrained="",
)
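The examples above follow two conventions: models that ship with OpenCLIP are named by an architecture (clip_model) plus a pretrained weights tag (pretrained), while Hugging Face Hub models use an hf-hub: prefix with pretrained="". The minimal sketch below encodes those conventions in a hypothetical helper, describe_model_source, that you could use to sanity-check parameters before loading a model. It is not part of FiftyOne or OpenCLIP, and the checks it performs reflect only the examples above.

```python
HF_HUB_PREFIX = "hf-hub:"


def describe_model_source(clip_model, pretrained):
    """Summarize where weights for a (clip_model, pretrained) pair come from.

    Hypothetical helper that mirrors the conventions used in the examples above.
    """
    if clip_model.startswith(HF_HUB_PREFIX):
        # Hub models in the examples above are loaded with pretrained=""
        if pretrained:
            raise ValueError("expected pretrained='' for an hf-hub: model")
        return "Hugging Face Hub repo " + clip_model[len(HF_HUB_PREFIX):]
    # OpenCLIP architectures in the examples above are paired with a weights tag
    if not pretrained:
        raise ValueError(f"expected a pretrained tag for {clip_model!r}")
    return f"OpenCLIP {clip_model} with {pretrained} weights"


for clip_model, pretrained in [
    ("RN50", "cc12m"),
    ("ViT-B-32-quickgelu", "metaclip_400m"),
    ("hf-hub:timm/ViT-B-16-SigLIP", ""),
]:
    print(describe_model_source(clip_model, pretrained))
```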

Inference

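When running inference with OpenCLIP, you can optionally provide a text_prompt that is prepended to each class name when building the text queries (for example, "A photo of a" becomes "A photo of a dog"), as well as a list of classes that restricts the candidate labels considered during zero-shot classification.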
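Conceptually, zero-shot classification with CLIP-style models turns each class into a text query, embeds the queries and the image, and picks the class whose text embedding is most similar to the image embedding. The plain-Python sketch below illustrates that idea; it is not FiftyOne's implementation. It assumes the prompt is formed by joining text_prompt and the class name with a space, uses toy 3-dimensional vectors in place of real encoder outputs, and applies a softmax over cosine similarities scaled by 100 (the cap on CLIP's learned logit scale). The function names are hypothetical.

```python
import math


def build_prompts(text_prompt, classes):
    # Assumption: the prompt and class name are joined with a single space
    return [f"{text_prompt} {c}" for c in classes]


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_classify(image_embedding, text_embeddings, classes, logit_scale=100.0):
    """Return (label, confidence) for a single image.

    text_embeddings[i] is assumed to be the embedding of the prompt for classes[i].
    """
    logits = [logit_scale * cosine_similarity(image_embedding, t) for t in text_embeddings]
    # Numerically stable softmax over the class logits
    max_logit = max(logits)
    exps = [math.exp(z - max_logit) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(classes)), key=probs.__getitem__)
    return classes[best], probs[best]


classes = ["person", "dog", "cat"]
print(build_prompts("A photo of a", classes))

# Toy vectors standing in for real text and image encoder outputs
text_embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
image_embedding = [0.1, 0.9, 0.2]
label, confidence = zero_shot_classify(image_embedding, text_embeddings, classes)
print(label, round(confidence, 3))
```

Restricting classes shrinks the set of candidate text queries, so the softmax only distributes probability among the labels you care about.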

Note

While OpenCLIP models are typically set to train mode by default, the FiftyOne integration sets the model to eval mode before running inference.
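The mode matters because some layers compute different things during training and evaluation. As a toy illustration (not OpenCLIP code), the pure-Python class below mimics an inverted-dropout layer: in train mode it randomly zeroes entries and rescales the survivors, so repeated calls on the same input can disagree, while in eval mode it passes inputs through unchanged. The class name ToyDropout is hypothetical; it only borrows the training flag and eval() naming from PyTorch modules.

```python
import random


class ToyDropout:
    """A toy inverted-dropout layer with a train/eval switch (illustrative only)."""

    def __init__(self, p=0.5, seed=None):
        self.p = p
        self.training = True  # starts in train mode, like a freshly built model
        self._rng = random.Random(seed)

    def eval(self):
        self.training = False
        return self

    def __call__(self, values):
        if not self.training:
            # Eval mode: identity, so outputs are deterministic
            return list(values)
        keep = 1.0 - self.p
        # Train mode: randomly zero entries and rescale survivors by 1/keep
        return [v / keep if self._rng.random() < keep else 0.0 for v in values]


layer = ToyDropout(p=0.5, seed=0)
x = [1.0, 2.0, 3.0, 4.0]
print("train:", layer(x), layer(x))  # masks are resampled on every call
layer.eval()
print("eval:", layer(x), layer(x))  # identical outputs, equal to the input
```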

For example, you can run zero-shot classification on the quickstart dataset as follows:

import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart")

model = foz.load_zoo_model(
    "open-clip-torch",
    text_prompt="A photo of a",
    classes=["person", "dog", "cat", "bird", "car", "tree", "chair"],
)

dataset.apply_model(model, label_field="clip_predictions")

session = fo.launch_app(dataset)
[Figure: zero-shot classification example]

Embeddings

Another application of OpenCLIP is computing embeddings that you can visualize with the FiftyOne Brain.

For example, let’s compare the embeddings of the original OpenAI CLIP model to those of MetaCLIP. We’ll also run a quick zero-shot classification with each model so that we can color the embeddings by predicted label:

import fiftyone.brain as fob
import fiftyone.zoo as foz

# Assumes `dataset` is the quickstart dataset loaded in the previous example

# MetaCLIP: zero-shot classification, then a 2D embeddings visualization
meta_clip = foz.load_zoo_model(
    "open-clip-torch",
    clip_model="ViT-B-32-quickgelu",
    pretrained="metaclip_400m",
    text_prompt="A photo of a",
)

dataset.apply_model(meta_clip, label_field="meta_clip_classification")

fob.compute_visualization(
    dataset,
    model=meta_clip,
    brain_key="meta_clip",
)

# Original OpenAI CLIP (the zoo model's default weights): same steps
openai_clip = foz.load_zoo_model(
    "open-clip-torch",
    text_prompt="A photo of a",
)

dataset.apply_model(openai_clip, label_field="openai_clip_classifications")

fob.compute_visualization(
    dataset,
    model=openai_clip,
    brain_key="openai_clip",
)
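fob.compute_visualization() reduces each model's high-dimensional embeddings to 2D points for plotting (with UMAP by default). As a rough illustration of that dimensionality reduction step, here is a minimal pure-Python sketch that swaps in PCA, found by power iteration on the covariance matrix with Gram-Schmidt orthogonalization, and projects toy vectors onto their top two principal components. The function name pca_2d and the toy data are hypothetical; use FiftyOne's own methods for real embeddings.

```python
import math
import random


def pca_2d(points, iters=100, seed=0):
    """Project points (a list of equal-length vectors) onto their top two
    principal components, found by power iteration on the covariance matrix."""
    n, d = len(points), len(points[0])
    # Center the data so the components describe variance around the mean
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - mean[j] for j in range(d)] for p in points]
    # Covariance matrix (d x d)
    cov = [
        [sum(c[i] * c[j] for c in centered) / n for j in range(d)]
        for i in range(d)
    ]

    def orthonormalize(v, basis):
        # Gram-Schmidt: remove components along already-found directions
        for b in basis:
            dot = sum(x * y for x, y in zip(v, b))
            v = [x - dot * y for x, y in zip(v, b)]
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v] if norm > 1e-12 else None

    rng = random.Random(seed)
    components = []
    for _ in range(min(2, d)):
        v = orthonormalize([rng.random() + 0.1 for _ in range(d)], components)
        for _ in range(iters):
            # Power iteration step: multiply by the covariance matrix
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            w = orthonormalize(w, components)
            if w is None:  # no variance left in the remaining directions
                break
            v = w
        components.append(v)

    # Coordinates of each centered point along the components
    return [[sum(x * y for x, y in zip(c, comp)) for comp in components] for c in centered]


# Toy 4-D vectors standing in for real CLIP embeddings
embeddings = [
    [0.9, 0.1, 0.0, 0.2],
    [0.8, 0.2, 0.1, 0.1],
    [0.1, 0.9, 0.3, 0.0],
    [0.0, 0.8, 0.4, 0.1],
]
for point in pca_2d(embeddings):
    print([round(x, 3) for x in point])
```

Each model produces its own embedding space, so the two projections are computed independently; this is why the example above stores them under separate brain keys.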

Here is the final result!

[Figure: CLIP vs. MetaCLIP embeddings comparison]