Built-In Zoo Models

This page lists all of the natively available models in the FiftyOne Model Zoo.

See the API reference for complete instructions on using the Model Zoo.
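As a rough illustration of how the entries on this page can be used, the sketch below copies three name/tag pairs from the list into a small dictionary and filters them by tag. The `CATALOG` dictionary and the `parse_tags`, `models_with_tags`, and `load_by_name` helpers are hypothetical names introduced here for illustration; only `fiftyone.zoo.load_zoo_model` belongs to FiftyOne, and it is imported lazily so the filtering logic runs without FiftyOne installed.

```python
# Hypothetical mini-catalog: names and tag lines copied from three entries on this page.
CATALOG = {
    "alexnet-imagenet-torch": "Classification,Embeddings,Logits,Imagenet,PyTorch,Alexnet",
    "centernet-hg104-512-coco-tf2": "Detection,Coco,TensorFlow-2,Centernet",
    "clip-vit-base32-torch": "Classification,Logits,Embeddings,PyTorch,Clip,Zero-shot",
}


def parse_tags(tag_line):
    """Split a comma-separated tag line into a set of lowercase tags."""
    return {tag.strip().lower() for tag in tag_line.split(",") if tag.strip()}


def models_with_tags(catalog, required):
    """Return the sorted model names whose tags include every required tag."""
    wanted = {tag.lower() for tag in required}
    return sorted(
        name
        for name, tag_line in catalog.items()
        if wanted <= parse_tags(tag_line)
    )


def load_by_name(name):
    """Load a zoo model by name (needs FiftyOne; may download model weights)."""
    import fiftyone.zoo as foz  # lazy import keeps the helpers above stdlib-only

    return foz.load_zoo_model(name)


if __name__ == "__main__":
    # Models on this page that expose embeddings and run on PyTorch
    print(models_with_tags(CATALOG, ["Embeddings", "PyTorch"]))
```

Passing one of the returned names to `load_by_name` should yield a model object that, in a typical FiftyOne workflow, is then applied to a dataset with `dataset.apply_model(model, label_field="predictions")`; the field name here is only an example.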

alexnet-imagenet-torch

AlexNet model architecture from "One weird trick for parallelizing convolutional neural networks" trained on ImageNet

Classification,Embeddings,Logits,Imagenet,PyTorch,Alexnet

centernet-hg104-1024-coco-tf2

CenterNet model from "Objects as Points" with the Hourglass-104 backbone trained on COCO resized to 1024x1024

Detection,Coco,TensorFlow-2,Centernet

centernet-hg104-512-coco-tf2

CenterNet model from "Objects as Points" with the Hourglass-104 backbone trained on COCO resized to 512x512

Detection,Coco,TensorFlow-2,Centernet

centernet-mobilenet-v2-fpn-512-coco-tf2

CenterNet model from "Objects as Points" with the MobileNetV2 backbone + FPN trained on COCO resized to 512x512

Detection,Coco,TensorFlow-2,Centernet,Mobilenet

centernet-resnet101-v1-fpn-512-coco-tf2

CenterNet model from "Objects as Points" with the ResNet-101 v1 backbone + FPN trained on COCO resized to 512x512

Detection,Coco,TensorFlow-2,Centernet,Resnet

centernet-resnet50-v1-fpn-512-coco-tf2

CenterNet model from "Objects as Points" with the ResNet-50 v1 backbone + FPN trained on COCO resized to 512x512

Detection,Coco,TensorFlow-2,Centernet,Resnet

centernet-resnet50-v2-512-coco-tf2

CenterNet model from "Objects as Points" with the ResNet-50 v2 backbone trained on COCO resized to 512x512

Detection,Coco,TensorFlow-2,Centernet,Resnet

classification-transformer-torch

Hugging Face Transformers model for image classification

Classification,Logits,Embeddings,PyTorch,Transformers

clip-vit-base32-torch

CLIP text/image encoder from "Learning Transferable Visual Models From Natural Language Supervision" trained on 400M text-image pairs

Classification,Logits,Embeddings,PyTorch,Clip,Zero-shot

deeplabv3-cityscapes-tf

DeepLabv3+ semantic segmentation model from "Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation" with Xception backbone trained on the Cityscapes dataset

Segmentation,Cityscapes,TensorFlow,Deeplabv3

deeplabv3-mnv2-cityscapes-tf

DeepLabv3+ semantic segmentation model from "Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation" with MobileNetV2 backbone trained on the Cityscapes dataset

Segmentation,Cityscapes,TensorFlow,Deeplabv3

deeplabv3-resnet101-coco-torch

DeepLabV3 model from "Rethinking Atrous Convolution for Semantic Image Segmentation" with ResNet-101 backbone trained on COCO

Segmentation,Coco,PyTorch,Resnet,Deeplabv3

deeplabv3-resnet50-coco-torch

DeepLabV3 model from "Rethinking Atrous Convolution for Semantic Image Segmentation" with ResNet-50 backbone trained on COCO

Segmentation,Coco,PyTorch,Resnet,Deeplabv3

densenet121-imagenet-torch

DenseNet-121 model from "Densely Connected Convolutional Networks" trained on ImageNet

Classification,Embeddings,Logits,Imagenet,PyTorch,Densenet

densenet161-imagenet-torch

DenseNet-161 model from "Densely Connected Convolutional Networks" trained on ImageNet

Classification,Embeddings,Logits,Imagenet,PyTorch,Densenet

densenet169-imagenet-torch

DenseNet-169 model from "Densely Connected Convolutional Networks" trained on ImageNet

Classification,Embeddings,Logits,Imagenet,PyTorch,Densenet

densenet201-imagenet-torch

DenseNet-201 model from "Densely Connected Convolutional Networks" trained on ImageNet

Classification,Embeddings,Logits,Imagenet,PyTorch,Densenet

depth-estimation-transformer-torch

Hugging Face Transformers model for monocular depth estimation

Depth,PyTorch,Transformers

detection-transformer-torch

Hugging Face Transformers model for object detection

Detection,Logits,Embeddings,PyTorch,Transformers

dinov2-vitb14-reg-torch

DINOv2 ViT-B/14 distilled model with registers from "DINOv2: Learning Robust Visual Features without Supervision"

Embeddings,PyTorch,Dinov2

dinov2-vitb14-torch

DINOv2 ViT-B/14 distilled model from "DINOv2: Learning Robust Visual Features without Supervision"

Embeddings,PyTorch,Dinov2

dinov2-vitg14-reg-torch

DINOv2 ViT-g/14 model with registers from "DINOv2: Learning Robust Visual Features without Supervision"

Embeddings,PyTorch,Dinov2

dinov2-vitg14-torch

DINOv2 ViT-g/14 model from "DINOv2: Learning Robust Visual Features without Supervision"

Embeddings,PyTorch,Dinov2

dinov2-vitl14-reg-torch

DINOv2 ViT-L/14 distilled model with registers from "DINOv2: Learning Robust Visual Features without Supervision"

Embeddings,PyTorch,Dinov2

dinov2-vitl14-torch

DINOv2 ViT-L/14 distilled model from "DINOv2: Learning Robust Visual Features without Supervision"

Embeddings,PyTorch,Dinov2

dinov2-vits14-reg-torch

DINOv2 ViT-S/14 distilled model with registers from "DINOv2: Learning Robust Visual Features without Supervision"

Embeddings,PyTorch,Dinov2

dinov2-vits14-torch

DINOv2 ViT-S/14 distilled model from "DINOv2: Learning Robust Visual Features without Supervision"

Embeddings,PyTorch,Dinov2

efficientdet-d0-512-coco-tf2

EfficientDet-D0 model from "EfficientDet: Scalable and Efficient Object Detection" trained on COCO resized to 512x512

Detection,Coco,TensorFlow-2,Efficientdet

efficientdet-d0-coco-tf1

EfficientDet-D0 model from "EfficientDet: Scalable and Efficient Object Detection" trained on COCO

Detection,Coco,TensorFlow-1,Efficientdet

efficientdet-d1-640-coco-tf2

EfficientDet-D1 model from "EfficientDet: Scalable and Efficient Object Detection" trained on COCO resized to 640x640

Detection,Coco,TensorFlow-2,Efficientdet

efficientdet-d1-coco-tf1

EfficientDet-D1 model from "EfficientDet: Scalable and Efficient Object Detection" trained on COCO

Detection,Coco,TensorFlow-1,Efficientdet

efficientdet-d2-768-coco-tf2

EfficientDet-D2 model from "EfficientDet: Scalable and Efficient Object Detection" trained on COCO resized to 768x768

Detection,Coco,TensorFlow-2,Efficientdet

efficientdet-d2-coco-tf1

EfficientDet-D2 model from "EfficientDet: Scalable and Efficient Object Detection" trained on COCO

Detection,Coco,TensorFlow-1,Efficientdet

efficientdet-d3-896-coco-tf2

EfficientDet-D3 model from "EfficientDet: Scalable and Efficient Object Detection" trained on COCO resized to 896x896

Detection,Coco,TensorFlow-2,Efficientdet

efficientdet-d3-coco-tf1

EfficientDet-D3 model from "EfficientDet: Scalable and Efficient Object Detection" trained on COCO

Detection,Coco,TensorFlow-1,Efficientdet

efficientdet-d4-1024-coco-tf2

EfficientDet-D4 model from "EfficientDet: Scalable and Efficient Object Detection" trained on COCO resized to 1024x1024

Detection,Coco,TensorFlow-2,Efficientdet

efficientdet-d4-coco-tf1

EfficientDet-D4 model from "EfficientDet: Scalable and Efficient Object Detection" trained on COCO

Detection,Coco,TensorFlow-1,Efficientdet

efficientdet-d5-1280-coco-tf2

EfficientDet-D5 model from "EfficientDet: Scalable and Efficient Object Detection" trained on COCO resized to 1280x1280

Detection,Coco,TensorFlow-2,Efficientdet

efficientdet-d5-coco-tf1

EfficientDet-D5 model from "EfficientDet: Scalable and Efficient Object Detection" trained on COCO

Detection,Coco,TensorFlow-1,Efficientdet

efficientdet-d6-1280-coco-tf2

EfficientDet-D6 model from "EfficientDet: Scalable and Efficient Object Detection" trained on COCO resized to 1280x1280

Detection,Coco,TensorFlow-2,Efficientdet

efficientdet-d6-coco-tf1

EfficientDet-D6 model from "EfficientDet: Scalable and Efficient Object Detection" trained on COCO

Detection,Coco,TensorFlow-1,Efficientdet

efficientdet-d7-1536-coco-tf2

EfficientDet-D7 model from "EfficientDet: Scalable and Efficient Object Detection" trained on COCO resized to 1536x1536

Detection,Coco,TensorFlow-2,Efficientdet

faster-rcnn-inception-resnet-atrous-v2-coco-tf

Faster R-CNN model from "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks" atrous version with Inception-ResNet-v2 backbone trained on COCO

Detection,Coco,TensorFlow,Faster-rcnn,Inception,Resnet

faster-rcnn-inception-resnet-atrous-v2-lowproposals-coco-tf

Faster R-CNN model from "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks" atrous version with low-proposals and Inception-ResNet-v2 backbone trained on COCO

Detection,Coco,TensorFlow,Faster-rcnn,Inception,Resnet

faster-rcnn-inception-v2-coco-tf

Faster R-CNN model from "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks" with Inception v2 backbone trained on COCO

Detection,Coco,TensorFlow,Faster-rcnn,Inception

faster-rcnn-nas-coco-tf

Faster R-CNN model from "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks" with NAS-net backbone trained on COCO

Detection,Coco,TensorFlow,Faster-rcnn

faster-rcnn-nas-lowproposals-coco-tf

Faster R-CNN model from "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks" with low-proposals and NAS-net backbone trained on COCO

Detection,Coco,TensorFlow,Faster-rcnn

faster-rcnn-resnet101-coco-tf

Faster R-CNN model from "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks" with ResNet-101 backbone trained on COCO

Detection,Coco,TensorFlow,Faster-rcnn,Resnet

faster-rcnn-resnet101-lowproposals-coco-tf

Faster R-CNN model from "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks" with low-proposals and ResNet-101 backbone trained on COCO

Detection,Coco,TensorFlow,Faster-rcnn,Resnet

faster-rcnn-resnet50-coco-tf

Faster R-CNN model from "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks" with ResNet-50 backbone trained on COCO

Detection,Coco,TensorFlow,Faster-rcnn,Resnet

faster-rcnn-resnet50-fpn-coco-torch

Faster R-CNN model from "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks" with ResNet-50 FPN backbone trained on COCO

Detection,Coco,PyTorch,Faster-rcnn,Resnet

faster-rcnn-resnet50-lowproposals-coco-tf

Faster R-CNN model from "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks" with low-proposals and ResNet-50 backbone trained on COCO

Detection,Coco,TensorFlow,Faster-rcnn,Resnet

fcn-resnet101-coco-torch

FCN model from "Fully Convolutional Networks for Semantic Segmentation" with ResNet-101 backbone trained on COCO

Segmentation,Coco,PyTorch,Fcn,Resnet

fcn-resnet50-coco-torch

FCN model from "Fully Convolutional Networks for Semantic Segmentation" with ResNet-50 backbone trained on COCO

Segmentation,Coco,PyTorch,Fcn,Resnet

googlenet-imagenet-torch

GoogLeNet (Inception v1) model from "Going Deeper with Convolutions" trained on ImageNet

Classification,Embeddings,Logits,Imagenet,PyTorch,Googlenet

group-vit-segmentation-transformer-torch

Hugging Face Transformers model for zero-shot semantic segmentation

Segmentation,Embeddings,PyTorch,Transformers,Zero-shot

inception-resnet-v2-imagenet-tf1

Inception-ResNet-v2 model from "Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning" trained on ImageNet

Classification,Embeddings,Logits,Imagenet,TensorFlow-1,Inception,Resnet

inception-v3-imagenet-torch

Inception v3 model from "Rethinking the Inception Architecture for Computer Vision" trained on ImageNet

Classification,Embeddings,Logits,Imagenet,PyTorch,Inception

inception-v4-imagenet-tf1

Inception v4 model from "Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning" trained on ImageNet

Classification,Embeddings,Logits,Imagenet,TensorFlow-1,Inception

keypoint-rcnn-resnet50-fpn-coco-torch

Keypoint R-CNN model from "Mask R-CNN" with ResNet-50 FPN backbone trained on COCO

Keypoints,Coco,PyTorch,Keypoint-rcnn,Resnet

mask-rcnn-inception-resnet-v2-atrous-coco-tf

Mask R-CNN model from "Mask R-CNN" atrous version with Inception-ResNet-v2 backbone trained on COCO

Instances,Coco,TensorFlow,Mask-rcnn,Inception,Resnet

mask-rcnn-inception-v2-coco-tf

Mask R-CNN model from "Mask R-CNN" with Inception v2 backbone trained on COCO

Instances,Coco,TensorFlow,Mask-rcnn,Inception

mask-rcnn-resnet101-atrous-coco-tf

Mask R-CNN model from "Mask R-CNN" atrous version with ResNet-101 backbone trained on COCO

Instances,Coco,TensorFlow,Mask-rcnn,Resnet

mask-rcnn-resnet50-atrous-coco-tf

Mask R-CNN model from "Mask R-CNN" atrous version with ResNet-50 backbone trained on COCO

Instances,Coco,TensorFlow,Mask-rcnn,Resnet

mask-rcnn-resnet50-fpn-coco-torch

Mask R-CNN model from "Mask R-CNN" with ResNet-50 FPN backbone trained on COCO

Instances,Coco,PyTorch,Mask-rcnn,Resnet

med-sam-2-video-torch

Fine-tuned SAM2-hiera-tiny model from "Medical SAM 2 - Segment Medical Images as Video via Segment Anything Model 2"

Segment-anything,PyTorch,Zero-shot,Video,Med-sam

mnasnet0.5-imagenet-torch

MNASNet model from "MnasNet: Platform-Aware Neural Architecture Search for Mobile" with depth multiplier of 0.5 trained on ImageNet

Classification,Embeddings,Logits,Imagenet,PyTorch,Mnasnet

mnasnet1.0-imagenet-torch

MNASNet model from "MnasNet: Platform-Aware Neural Architecture Search for Mobile" with depth multiplier of 1.0 trained on ImageNet

Classification,Embeddings,Logits,Imagenet,PyTorch,Mnasnet

mobilenet-v2-imagenet-tf1

MobileNetV2 model from "MobileNetV2: Inverted Residuals and Linear Bottlenecks" trained on ImageNet

Classification,Embeddings,Logits,Imagenet,TensorFlow-1,Mobilenet

mobilenet-v2-imagenet-torch

MobileNetV2 model from "MobileNetV2: Inverted Residuals and Linear Bottlenecks" trained on ImageNet

Classification,Embeddings,Logits,Imagenet,PyTorch,Mobilenet

omdet-turbo-swin-tiny-torch

Hugging Face Transformers OmDet-Turbo model for zero-shot object detection

Detection,Logits,Embeddings,PyTorch,Transformers,Zero-shot

open-clip-torch

OpenCLIP text/image encoder from "Learning Transferable Visual Models From Natural Language Supervision" trained on 400M text-image pairs

Classification,Logits,Embeddings,PyTorch,Clip,Zero-shot

owlvit-base-patch16-torch

Hugging Face Transformers OWL-ViT model for zero-shot object detection

Detection,Logits,Embeddings,PyTorch,Transformers,Zero-shot

resnet-v1-50-imagenet-tf1

ResNet-50 v1 model from "Deep Residual Learning for Image Recognition" trained on ImageNet

Classification,Embeddings,Logits,Imagenet,TensorFlow-1,Resnet

resnet-v2-50-imagenet-tf1

ResNet-50 v2 model from "Identity Mappings in Deep Residual Networks" trained on ImageNet

Classification,Embeddings,Logits,Imagenet,TensorFlow-1,Resnet

resnet101-imagenet-torch

ResNet-101 model from "Deep Residual Learning for Image Recognition" trained on ImageNet

Classification,Embeddings,Logits,Imagenet,PyTorch,Resnet

resnet152-imagenet-torch

ResNet-152 model from "Deep Residual Learning for Image Recognition" trained on ImageNet

Classification,Embeddings,Logits,Imagenet,PyTorch,Resnet

resnet18-imagenet-torch

ResNet-18 model from "Deep Residual Learning for Image Recognition" trained on ImageNet

Classification,Embeddings,Logits,Imagenet,PyTorch,Resnet

resnet34-imagenet-torch

ResNet-34 model from "Deep Residual Learning for Image Recognition" trained on ImageNet

Classification,Embeddings,Logits,Imagenet,PyTorch,Resnet

resnet50-imagenet-torch

ResNet-50 model from "Deep Residual Learning for Image Recognition" trained on ImageNet

Classification,Embeddings,Logits,Imagenet,PyTorch,Resnet

resnext101-32x8d-imagenet-torch

ResNeXt-101 32x8d model from "Aggregated Residual Transformations for Deep Neural Networks" trained on ImageNet

Classification,Embeddings,Logits,Imagenet,PyTorch,Resnext

resnext50-32x4d-imagenet-torch

ResNeXt-50 32x4d model from "Aggregated Residual Transformations for Deep Neural Networks" trained on ImageNet

Classification,Embeddings,Logits,Imagenet,PyTorch,Resnext

retinanet-resnet50-fpn-coco-torch

RetinaNet model from "Focal Loss for Dense Object Detection" with ResNet-50 FPN backbone trained on COCO

Detection,Coco,PyTorch,Retinanet,Resnet

rfcn-resnet101-coco-tf

R-FCN object detection model from "R-FCN: Object Detection via Region-based Fully Convolutional Networks" with ResNet-101 backbone trained on COCO

Detection,Coco,TensorFlow,Rfcn,Resnet

rtdetr-l-coco-torch

RT-DETR-l model trained on COCO

Detection,Coco,PyTorch,Transformer,Rtdetr
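
The model names listed above all end in a suffix (`-torch`, `-tf`, `-tf1`, `-tf2`) that lines up with the framework tag shown for each entry. The sketch below encodes that observed convention; the `FRAMEWORK_BY_SUFFIX` mapping and the `framework_of` helper are hypothetical names, inferred from this page rather than taken from the FiftyOne API.

```python
# Suffix-to-framework mapping inferred from the names and tags on this page.
# This is an observed naming convention, not an official FiftyOne API.
FRAMEWORK_BY_SUFFIX = {
    "torch": "PyTorch",
    "tf": "TensorFlow",
    "tf1": "TensorFlow-1",
    "tf2": "TensorFlow-2",
}


def framework_of(model_name):
    """Return the framework tag implied by the name's final suffix, or None."""
    suffix = model_name.rsplit("-", 1)[-1]
    return FRAMEWORK_BY_SUFFIX.get(suffix)


if __name__ == "__main__":
    for name in ("alexnet-imagenet-torch", "efficientdet-d0-coco-tf1"):
        print(name, "->", framework_of(name))
```

A check like this can help decide which framework packages need to be installed before loading a given model, though the tags on each entry remain the authoritative source.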