FiftyOne Model Zoo¶
FiftyOne provides a Model Zoo containing a collection of pre-trained models that you can download and use to run inference on your FiftyOne datasets via a few simple commands.
Note
Zoo models may require additional packages such as TensorFlow or PyTorch (or specific versions of them) in order to be used. See this section for more information on viewing/installing package requirements for models.
If you try to load a zoo model without the proper packages installed, you will receive an error message that will explain what you need to install.
Depending on your compute environment, some package requirement failures may be erroneous. In such cases, you can suppress error messages.
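For example, you can relax the error handling when loading a model by passing the error_level parameter to load_zoo_model() (0 raises errors, 1 logs warnings, 2 ignores unsatisfied requirements):

import fiftyone.zoo as foz

# Downgrade any package requirement errors to warnings when loading
# the model (error_level: 0 = raise, 1 = warn, 2 = ignore)
model = foz.load_zoo_model(
    "faster-rcnn-resnet50-fpn-coco-torch", error_level=1
)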
Available models¶
The Model Zoo contains over 70 pre-trained models that you can apply to your datasets with a few simple commands. Check out the available models page to see all of the models in the zoo!
Note
Did you know? You can also pass custom models to methods like apply_model() and compute_embeddings()!
API reference¶
Check out the API reference for complete instructions for using the Model Zoo library.
Basic recipe¶
Methods for working with the Model Zoo are conveniently exposed via the Python library and the CLI. The basic recipe is that you load a model from the zoo and then apply it to a dataset (or a subset of the dataset specified by a DatasetView) using methods such as apply_model() and compute_embeddings().
Prediction¶
The Model Zoo provides a number of convenient methods for generating predictions with zoo models for your datasets.
For example, the code sample below shows a self-contained example of loading a Faster R-CNN PyTorch model from the model zoo and adding its predictions to the COCO-2017 dataset from the Dataset Zoo:
import fiftyone as fo
import fiftyone.zoo as foz

# List available zoo models
model_names = foz.list_zoo_models()
print(model_names)

#
# Load zoo model
#
# This will download the model from the web, if necessary, and ensure
# that any required packages are installed
#
model = foz.load_zoo_model("faster-rcnn-resnet50-fpn-coco-torch")

#
# Load some samples from the COCO-2017 validation split
#
# This will download the dataset from the web, if necessary
#
dataset = foz.load_zoo_dataset(
    "coco-2017",
    split="validation",
    dataset_name="coco-2017-validation-sample",
    max_samples=50,
    shuffle=True,
)

#
# Choose some samples to process. This can be the entire dataset, or a
# subset of the dataset. In this case, we'll choose some samples at
# random
#
samples = dataset.take(25)

#
# Generate predictions for each sample and store the results in the
# `faster_rcnn` field of the dataset, discarding all predictions with
# confidence below 0.5
#
samples.apply_model(model, label_field="faster_rcnn", confidence_thresh=0.5)
print(samples)

# Visualize predictions in the App
session = fo.launch_app(view=samples)
Logits¶
Many classifiers in the Model Zoo can optionally store logits for their predictions.
Note
Storing logits for predictions enables you to run Brain methods such as label mistakes and sample hardness on your datasets!
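For example, once logits are stored you can run Brain methods like the following (a sketch; assumes a samples collection with logits stored in a predictions field and ground truth labels in a ground_truth field):

import fiftyone.brain as fob

# Estimate which ground truth annotations are likely mistakes, using
# the logits stored on the `predictions` field
fob.compute_mistakenness(samples, "predictions", label_field="ground_truth")

# Estimate how difficult each sample is for the model
fob.compute_hardness(samples, "predictions")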
You can check whether a model exposes logits via its has_logits property:
import fiftyone.zoo as foz

# Load zoo model
model = foz.load_zoo_model("inception-v3-imagenet-torch")

# Check if model has logits
print(model.has_logits)  # True
For models that expose logits, you can store logits for all predictions generated by apply_model() by passing the optional store_logits=True argument:
import fiftyone.zoo as foz

# Load zoo model
model = foz.load_zoo_model("inception-v3-imagenet-torch")
print(model.has_logits)  # True

# Load zoo dataset
dataset = foz.load_zoo_dataset("imagenet-sample")

# Select some samples to process
samples = dataset.take(10)

# Generate predictions and populate their `logits` fields
samples.apply_model(model, store_logits=True)
Embeddings¶
Many models in the Model Zoo expose embeddings for their predictions:
import fiftyone.zoo as foz

# Load zoo model
model = foz.load_zoo_model("inception-v3-imagenet-torch")

# Check if model exposes embeddings
print(model.has_embeddings)  # True
For models that expose embeddings, you can generate embeddings for all samples in a dataset (or a subset of it specified by a DatasetView) by calling compute_embeddings():
import fiftyone.zoo as foz

# Load zoo model
model = foz.load_zoo_model("inception-v3-imagenet-torch")
print(model.has_embeddings)  # True

# Load zoo dataset
dataset = foz.load_zoo_dataset("imagenet-sample")

# Select some samples to process
samples = dataset.take(10)

#
# Option 1: Generate embeddings for each sample and return them in a
# `num_samples x dim` array
#
embeddings = samples.compute_embeddings(model)

#
# Option 2: Generate embeddings for each sample and store them in an
# `embeddings` field of the dataset
#
samples.compute_embeddings(model, embeddings_field="embeddings")
You can also use compute_patch_embeddings() to generate embeddings for image patches defined by another label field, e.g., the detections generated by a detection model.
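For example (a sketch; assumes your samples have a ground_truth field containing Detections, as in the quickstart dataset):

import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart")

model = foz.load_zoo_model("inception-v3-imagenet-torch")

# Generate embeddings for the image patches defined by the
# `ground_truth` detections and store them on each object
dataset.compute_patch_embeddings(
    model, "ground_truth", embeddings_field="patch_embeddings"
)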
Design overview¶
All models in the FiftyOne Model Zoo are instances of the Model class, which defines a common interface for loading models and generating predictions with defined input and output data formats.
Note
The following sections describe the interface that all models in the Model Zoo implement. If you write a wrapper for your custom model that implements the Model interface, then you can pass your models to builtin methods like apply_model() and compute_embeddings() too!

FiftyOne provides classes that make it easy to deploy models in custom frameworks. For example, if you have a PyTorch model that processes images, you can likely use TorchImageModel to run it using FiftyOne.
Prediction¶
Inside builtin methods like apply_model(), predictions of a Model instance are generated using the following pattern:
import numpy as np
from PIL import Image

import fiftyone as fo


def read_rgb_image(path):
    """Utility function that loads an image as an RGB numpy array."""
    return np.asarray(Image.open(path).convert("RGB"))


# Load a `Model` instance that processes images
model = ...

# Load a FiftyOne dataset
dataset = fo.load_dataset(...)

# A sample field in which to store the predictions
label_field = "predictions"

# Perform prediction on all images in the dataset
with model:
    for sample in dataset:
        # Load image
        img = read_rgb_image(sample.filepath)

        # Perform prediction
        labels = model.predict(img)

        # Save labels
        sample.add_labels(labels, label_field=label_field)
        sample.save()
import eta.core.video as etav

import fiftyone as fo

# Load a `Model` instance that processes videos
model = ...

# Load a FiftyOne dataset
dataset = fo.load_dataset(...)

# A sample field in which to store the predictions
label_field = "predictions"

# Perform prediction on all videos in the dataset
with model:
    for sample in dataset:
        # Perform prediction
        with etav.FFmpegVideoReader(sample.filepath) as video_reader:
            labels = model.predict(video_reader)

        # Save labels
        sample.add_labels(labels, label_field=label_field)
        sample.save()
By convention, Model instances must implement the context manager interface, which handles any necessary setup and teardown required to use the model.

Predictions are generated via the Model.predict() interface method, which takes an image/video as input and returns the predictions.
In order to be compatible with builtin methods like apply_model(), models should support the following basic signature of running inference and storing the output labels:
labels = model.predict(arg)
sample.add_labels(labels, label_field=label_field)
where the model should, at minimum, support arg values that are:

Image models: uint8 numpy arrays (HWC)

Video models: eta.core.video.VideoReader instances

and the output labels can be any of the following:
A Label instance, in which case the labels are directly saved in the specified label_field of the sample:
# Single sample-level label
sample[label_field] = labels
A dict mapping keys to Label instances. In this case, the labels are added as follows:
# Multiple sample-level labels
for key, value in labels.items():
    sample[label_key(key)] = value
A dict mapping frame numbers to Label instances. In this case, the provided labels are interpreted as frame-level labels that should be added as follows:
# Single set of per-frame labels
sample.frames.merge(
    {
        frame_number: {label_field: label}
        for frame_number, label in labels.items()
    }
)
A dict mapping frame numbers to dicts mapping keys to Label instances. In this case, the provided labels are interpreted as frame-level labels that should be added as follows:
# Multiple per-frame labels
sample.frames.merge(
    {
        frame_number: {label_key(k): v for k, v in frame_dict.items()}
        for frame_number, frame_dict in labels.items()
    }
)
In the above snippets, the label_key function maps label dict keys to field names, and is defined from label_field as follows:
if isinstance(label_field, dict):
    label_key = lambda k: label_field.get(k, k)
elif label_field is not None:
    label_key = lambda k: label_field + "_" + k
else:
    label_key = lambda k: k
For models that support batching, the Model interface also provides a predict_all() method that can be implemented to perform efficient inference on a batch of data.
Note

Builtin methods like apply_model() provide a batch_size parameter that can be used to control the batch size used when performing inference with models that support efficient batching.
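For example (assuming the model supports batching):

# Process samples in batches of 16 during inference
samples.apply_model(model, label_field="predictions", batch_size=16)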
Note

PyTorch models can implement the TorchModelMixin mixin, in which case DataLoaders are used to efficiently feed data to the models during inference.
Logits¶
Models that generate logits for their predictions can expose them to FiftyOne by implementing the LogitsMixin mixin.

Inside builtin methods like apply_model(), if the user requests logits, the model's store_logits property is set to indicate that the model should store logits in the Label instances that it produces during inference.
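For illustration, a classifier might honor store_logits along these lines (a minimal sketch; a real model would subclass Model and LogitsMixin, and the _forward_pass() helper and classes attribute here are hypothetical):

import numpy as np

import fiftyone as fo


def softmax(x):
    # Numerically stable softmax over a 1D array of logits
    e = np.exp(x - x.max())
    return e / e.sum()


class MyClassifier:
    def __init__(self, classes):
        self.classes = classes
        self.store_logits = False  # set by builtin methods when requested

    def predict(self, img):
        logits = self._forward_pass(img)  # hypothetical inference helper

        label = fo.Classification(
            label=self.classes[int(logits.argmax())],
            confidence=float(softmax(logits).max()),
        )

        if self.store_logits:
            # The user requested logits, so store them on the label
            label.logits = logits

        return label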
Embeddings¶
Models that can compute embeddings for their input data can expose this capability by implementing the EmbeddingsMixin mixin.

Inside builtin methods like compute_embeddings(), embeddings for a collection of samples are generated using a pattern analogous to the prediction code shown above, except that the embeddings are generated using Model.embed() in place of Model.predict().

By convention, Model.embed() should return a numpy array containing the embedding.
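For instance, adapting the image prediction pattern shown earlier, a manual embedding loop might look like this (a sketch; read_rgb_image() is the helper defined above):

import fiftyone as fo

# Load a `Model` instance that exposes embeddings
model = ...

# Load a FiftyOne dataset
dataset = fo.load_dataset(...)

# Generate an embedding for each image in the dataset
with model:
    for sample in dataset:
        img = read_rgb_image(sample.filepath)

        # Returns a numpy array containing the embedding
        embedding = model.embed(img)

        sample["embedding"] = embedding
        sample.save()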
Note
Sample embeddings are typically 1D vectors, but this is not strictly required.
For models that support batching, the EmbeddingsMixin interface also provides an embed_all() method that can be implemented to efficiently embed a batch of data.
Using custom models¶
FiftyOne provides a TorchImageModel class that you can use to load your own custom Torch model and pass it to builtin methods like apply_model() and compute_embeddings().

For example, the snippet below loads a pretrained model from torchvision and uses it both as a classifier and to generate image embeddings:
import os

import eta

import fiftyone as fo
import fiftyone.zoo as foz
import fiftyone.utils.torch as fout

dataset = foz.load_zoo_dataset("quickstart")

labels_path = os.path.join(
    eta.constants.RESOURCES_DIR, "imagenet-labels-no-background.txt"
)

config = fout.TorchImageModelConfig(
    {
        "entrypoint_fcn": "torchvision.models.mobilenet.mobilenet_v2",
        "entrypoint_args": {"weights": "MobileNet_V2_Weights.DEFAULT"},
        "output_processor_cls": "fiftyone.utils.torch.ClassifierOutputProcessor",
        "labels_path": labels_path,
        "image_min_dim": 224,
        "image_max_dim": 2048,
        "image_mean": [0.485, 0.456, 0.406],
        "image_std": [0.229, 0.224, 0.225],
        "embeddings_layer": "<classifier.1",
    }
)

model = fout.TorchImageModel(config)

dataset.apply_model(model, label_field="imagenet")

embeddings = dataset.compute_embeddings(model)
The necessary configuration is provided via the TorchImageModelConfig class, which exposes a number of builtin mechanisms for defining the model to load and any necessary preprocessing and post-processing.
Under the hood, the torch model is loaded via:

torch_model = entrypoint_fcn(**entrypoint_args)

which is assumed to return a torch.nn.Module whose __call__() method directly accepts Torch tensors (NCHW) as input.
The TorchImageModelConfig class provides a number of builtin mechanisms for specifying the required preprocessing for your model, such as resizing and normalization. In the above example, image_min_dim, image_max_dim, image_mean, and image_std are used.
The output_processor_cls parameter of TorchImageModelConfig must be set to the fully-qualified class name of an OutputProcessor subclass that defines how to translate the model's raw output into suitable FiftyOne Label types, and is instantiated as follows:
output_processor = output_processor_cls(classes=classes, **output_processor_args)
where your model's classes can be specified via any of the classes, labels_string, or labels_path parameters of TorchImageModelConfig.
A number of builtin output processors are available for common tasks, such as the ClassifierOutputProcessor used in the example above, or you can write your own OutputProcessor subclass.
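For illustration, a custom output processor for a classifier might look something like the following sketch. Note that the __call__() signature and the classes attribute shown here are assumptions about the OutputProcessor interface; consult the API reference for your FiftyOne version before relying on them:

import torch

import fiftyone as fo
import fiftyone.utils.torch as fout


class MyClassifierOutputProcessor(fout.OutputProcessor):
    # Sketch: converts a `(batch, num_classes)` tensor of logits into
    # one fo.Classification per image (interface details assumed)
    def __call__(self, output, frame_size, confidence_thresh=None):
        probs = torch.softmax(output, dim=1)
        confidences, predictions = probs.max(dim=1)

        labels = []
        for conf, idx in zip(confidences, predictions):
            if confidence_thresh is not None and conf < confidence_thresh:
                # Discard low-confidence predictions
                labels.append(None)
                continue

            labels.append(
                fo.Classification(
                    label=self.classes[int(idx)],
                    confidence=float(conf),
                )
            )

        return labels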
Finally, if you would like to pass your custom model to methods like compute_embeddings(), set the embeddings_layer parameter to the name of a layer whose output to expose as embeddings (or prepend < to use the input tensor instead).
Note
Did you know? You can also register your custom model under a name of your choice so that it can be loaded and used as follows:
model = foz.load_zoo_model("your-custom-model")
dataset.apply_model(model, label_field="predictions")