OpenCLIP Integration
FiftyOne integrates natively with the OpenCLIP library, an open source implementation of OpenAI’s CLIP (Contrastive Language-Image Pre-training) model that you can use to run inference on your FiftyOne datasets with a few lines of code!
Setup

To get started with OpenCLIP, install the open_clip_torch package:
```shell
pip install open_clip_torch

# May also be needed
pip install timm --upgrade
```
Model zoo
You can load the original ViT-B-32 OpenAI pretrained model from the FiftyOne Model Zoo as follows:
```python
import fiftyone.zoo as foz

model = foz.load_zoo_model("open-clip-torch")
```
You can also specify different model architectures and pretrained weights by passing in optional parameters. Pretrained models can be loaded directly from OpenCLIP or from Hugging Face’s Model Hub:
```python
rn50 = foz.load_zoo_model(
    "open-clip-torch",
    clip_model="RN50",
    pretrained="cc12m",
)

meta_clip = foz.load_zoo_model(
    "open-clip-torch",
    clip_model="ViT-B-32-quickgelu",
    pretrained="metaclip_400m",
)

eva_clip = foz.load_zoo_model(
    "open-clip-torch",
    clip_model="EVA02-B-16",
    pretrained="merged2b_s8b_b131k",
)

clipa = foz.load_zoo_model(
    "open-clip-torch",
    clip_model="hf-hub:UCSC-VLAA/ViT-L-14-CLIPA-datacomp1B",
    pretrained="",
)

siglip = foz.load_zoo_model(
    "open-clip-torch",
    clip_model="hf-hub:timm/ViT-B-16-SigLIP",
    pretrained="",
)
```
Inference
When running inference with OpenCLIP, you can specify a text prompt to help guide the model towards a solution, and you can restrict zero-shot classification to a specific set of classes.

For example, you can run inference as follows:
```python
import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart")

model = foz.load_zoo_model(
    "open-clip-torch",
    text_prompt="A photo of a",
    classes=["person", "dog", "cat", "bird", "car", "tree", "chair"],
)

dataset.apply_model(model, label_field="clip_predictions")

session = fo.launch_app(dataset)
```
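Under the hood, zero-shot classification embeds one prompt per class ("A photo of a person", "A photo of a dog", etc.) and scores the image by cosine similarity against those text embeddings. Here is a minimal sketch of that scoring step, using random vectors as stand-ins for real CLIP embeddings (the class list mirrors the example above; the logit scale of 100 is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

classes = ["person", "dog", "cat", "bird", "car", "tree", "chair"]
prompts = [f"A photo of a {c}" for c in classes]  # text_prompt + class name

# Stand-ins for real CLIP embeddings (in practice, the model produces these)
image_emb = rng.normal(size=512)
text_embs = rng.normal(size=(len(classes), 512))

# L2-normalize, then score each class by cosine similarity
image_emb = image_emb / np.linalg.norm(image_emb)
text_embs = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)

logits = 100.0 * (text_embs @ image_emb)  # CLIP applies a learned logit scale

# Softmax over classes gives the zero-shot class probabilities
probs = np.exp(logits - logits.max())
probs = probs / probs.sum()

prediction = classes[int(np.argmax(probs))]
```

The prediction stored in `clip_predictions` is the class whose prompt embedding is most similar to the image embedding.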
Embeddings
Another application of OpenCLIP is embeddings visualization.
For example, let’s compare the embeddings of the original OpenAI CLIP model to MetaCLIP. We’ll also perform a quick zero-shot classification to color the embeddings:
```python
import fiftyone.brain as fob

meta_clip = foz.load_zoo_model(
    "open-clip-torch",
    clip_model="ViT-B-32-quickgelu",
    pretrained="metaclip_400m",
    text_prompt="A photo of a",
)

dataset.apply_model(meta_clip, label_field="meta_clip_classification")

fob.compute_visualization(
    dataset,
    model=meta_clip,
    brain_key="meta_clip",
)

openai_clip = foz.load_zoo_model(
    "open-clip-torch",
    text_prompt="A photo of a",
)

dataset.apply_model(openai_clip, label_field="openai_clip_classifications")

fob.compute_visualization(
    dataset,
    model=openai_clip,
    brain_key="openai_clip",
)
```
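The core of what compute_visualization() does is reduce the high-dimensional CLIP embeddings to 2D points that can be plotted in the App. Here is a rough sketch of that reduction step using a plain PCA projection on random stand-in embeddings (FiftyOne Brain defaults to UMAP, but the idea is the same):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for 512-dim CLIP image embeddings of 200 samples
embeddings = rng.normal(size=(200, 512))

# PCA via SVD: project onto the two directions of highest variance
centered = embeddings - embeddings.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
points = centered @ vt[:2].T  # one 2D point per sample, ready to scatter-plot
```

Each resulting 2D point corresponds to one sample, and coloring the points by a label field (such as the zero-shot classifications above) reveals how the model clusters the data.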
Text similarity search
OpenCLIP can also be used for text similarity search.
To use a specific model-checkpoint pair for text similarity search, pass these in as a dictionary via the model_kwargs argument to compute_similarity().
For example, for MetaCLIP, we can do the following:
```python
import fiftyone as fo
import fiftyone.zoo as foz
import fiftyone.brain as fob

dataset = foz.load_zoo_dataset("quickstart")

model_kwargs = {
    "clip_model": "ViT-B-32-quickgelu",
    "pretrained": "metaclip_400m",
    "text_prompt": "A photo of a",
}

fob.compute_similarity(
    dataset,
    model="open-clip-torch",
    model_kwargs=model_kwargs,
    brain_key="sim_metaclip",
)
```

You can then search by text similarity in Python via the sort_by_similarity() stage as follows:
```python
query = "kites flying in the sky"

view = dataset.sort_by_similarity(query, k=25, brain_key="sim_metaclip")
```
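Conceptually, this query embeds the text with the same OpenCLIP model and ranks samples by cosine similarity to it. A minimal sketch of that ranking, with random stand-ins for the query and image embeddings (k=25 matches the example above):

```python
import numpy as np

rng = np.random.default_rng(0)

query_emb = rng.normal(size=512)          # stand-in for the embedded query text
image_embs = rng.normal(size=(100, 512))  # stand-ins for indexed image embeddings

# Normalize so that dot products are cosine similarities
query_emb = query_emb / np.linalg.norm(query_emb)
image_embs = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)

scores = image_embs @ query_emb   # one similarity score per image
top_k = np.argsort(-scores)[:25]  # indices of the 25 most similar samples
```

The returned view contains the samples corresponding to the top-k indices, ordered from most to least similar.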
Note
Did you know? You can also perform text similarity queries directly in the App!