Loading data into FiftyOne

The first step to using FiftyOne is to load your data into a dataset. FiftyOne supports automatic loading of datasets stored in various common formats. If your dataset is stored in a custom format, don’t worry: FiftyOne also makes it easy to load datasets in custom formats.

Check out the sections below to see which import pattern is the best fit for your data.

Note

Did you know? You can import media and/or labels from within the FiftyOne App by installing the @voxel51/io plugin!

Note

When you create a Dataset, its samples and all of their fields (metadata, labels, custom fields, etc.) are written to FiftyOne’s backing database.

Important: Samples only store the filepath to the media, not the raw media itself. FiftyOne does not create duplicate copies of your data!

Common formats

If your data is stored on disk in one of the many common formats supported natively by FiftyOne, then you can automatically load your data into a Dataset via the following simple pattern:

import fiftyone as fo

# A name for the dataset
name = "my-dataset"

# The directory containing the dataset to import
dataset_dir = "/path/to/dataset"

# The type of the dataset being imported
dataset_type = fo.types.COCODetectionDataset  # for example

dataset = fo.Dataset.from_dir(
    dataset_dir=dataset_dir,
    dataset_type=dataset_type,
    name=name,
)

Note

See the documentation on loading datasets from disk for more details about the supported common formats!

Custom formats

The simplest and most flexible approach to loading your data into FiftyOne is to iterate over your data in a simple Python loop, create a Sample for each data + label(s) pair, and then add those samples to a Dataset.

FiftyOne provides label types for common tasks such as classification, detection, segmentation, and many more. The examples below show the basic workflow for three tasks: image classification, object detection, and labeled videos.

import glob
import fiftyone as fo

images_patt = "/path/to/images/*"

# Ex: your custom label format
annotations = {
    "/path/to/images/000001.jpg": "dog",
    # ...
}

# Create samples for your data
samples = []
for filepath in glob.glob(images_patt):
    sample = fo.Sample(filepath=filepath)

    # Store classification in a field name of your choice
    label = annotations[filepath]
    sample["ground_truth"] = fo.Classification(label=label)

    samples.append(sample)

# Create dataset
dataset = fo.Dataset("my-classification-dataset")
dataset.add_samples(samples)

import glob
import fiftyone as fo

images_patt = "/path/to/images/*"

# Ex: your custom label format
annotations = {
    "/path/to/images/000001.jpg": [
        {"bbox": ..., "label": ...},
        ...
    ],
    # ...
}

# Create samples for your data
samples = []
for filepath in glob.glob(images_patt):
    sample = fo.Sample(filepath=filepath)

    # Convert detections to FiftyOne format
    detections = []
    for obj in annotations[filepath]:
        label = obj["label"]

        # Bounding box coordinates should be relative values
        # in [0, 1] in the following format:
        # [top-left-x, top-left-y, width, height]
        bounding_box = obj["bbox"]

        detections.append(
            fo.Detection(label=label, bounding_box=bounding_box)
        )

    # Store detections in a field name of your choice
    sample["ground_truth"] = fo.Detections(detections=detections)

    samples.append(sample)

# Create dataset
dataset = fo.Dataset("my-detection-dataset")
dataset.add_samples(samples)
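The detection example above expects boxes that are already relative. If your annotations store absolute pixel corners instead, they need converting first. The sketch below assumes a hypothetical `[x_min, y_min, x_max, y_max]` pixel layout and known image dimensions; `to_relative_bbox` is an illustrative helper, not part of FiftyOne.

```python
def to_relative_bbox(box, img_width, img_height):
    """Convert pixel corners [x_min, y_min, x_max, y_max] into
    [top-left-x, top-left-y, width, height] with values in [0, 1].
    """
    x_min, y_min, x_max, y_max = box
    return [
        x_min / img_width,
        y_min / img_height,
        (x_max - x_min) / img_width,
        (y_max - y_min) / img_height,
    ]


# A 200x100 pixel box whose top-left corner is at (100, 50) in a 400x200 image
print(to_relative_bbox([100, 50, 300, 150], 400, 200))
# [0.25, 0.25, 0.5, 0.5]
```

The returned list can then be passed as the `bounding_box` of a `fo.Detection`, as in the loop above.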

import fiftyone as fo

video_path = "/path/to/video.mp4"

# Ex: your custom label format
frame_labels = {
    1: {
        "weather": "sunny",
        "objects": [
            {
                "label": ...,
                "bbox": ...,
            },
            ...
        ]
    },
    # ...
}

# Create video sample with frame labels
sample = fo.Sample(filepath=video_path)
for frame_number, labels in frame_labels.items():
    frame = fo.Frame()

    # Store a frame classification
    weather = labels["weather"]
    frame["weather"] = fo.Classification(label=weather)

    # Convert detections to FiftyOne format
    detections = []
    for obj in labels["objects"]:
        label = obj["label"]

        # Bounding box coordinates should be relative values
        # in [0, 1] in the following format:
        # [top-left-x, top-left-y, width, height]
        bounding_box = obj["bbox"]

        detections.append(
            fo.Detection(label=label, bounding_box=bounding_box)
        )

    # Store object detections
    frame["objects"] = fo.Detections(detections=detections)

    # Add frame to sample
    sample.frames[frame_number] = frame

# Create dataset
dataset = fo.Dataset("my-labeled-video-dataset")
dataset.add_sample(sample)
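The video example keys `frame_labels` by frame number starting at 1, which matches FiftyOne's 1-based frame numbering. If your annotation tool counts frames from 0 (an assumption about your data, not something the example requires), a shift along these lines may be needed before assigning to `sample.frames`. The helper name `to_one_based` is hypothetical.

```python
def to_one_based(frame_labels):
    """Shift 0-based frame indices to 1-based frame numbers."""
    return {index + 1: labels for index, labels in frame_labels.items()}


# Hypothetical labels keyed by 0-based frame index
zero_based = {0: {"weather": "sunny"}, 1: {"weather": "cloudy"}}
one_based = to_one_based(zero_based)

print(sorted(one_based))
# [1, 2]
```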

Note that using Dataset.add_samples() to add batches of samples to your datasets can be significantly more efficient than adding samples one-by-one via Dataset.add_sample().

Note

If you use the same custom data format frequently in your workflows, then writing a custom dataset importer is a great way to abstract and streamline the loading of your data into FiftyOne.

Loading images

If you’re just getting started with a project and all you have is a bunch of image files, you can easily load them into a FiftyOne dataset and start visualizing them in the App.

You can use the Dataset.from_images(), Dataset.from_images_dir(), and Dataset.from_images_patt() factory methods to load your images into FiftyOne:

import fiftyone as fo

# Create a dataset from a list of images
dataset = fo.Dataset.from_images(
    ["/path/to/image1.jpg", "/path/to/image2.jpg", ...]
)

# Create a dataset from a directory of images
dataset = fo.Dataset.from_images_dir("/path/to/images")

# Create a dataset from a glob pattern of images
dataset = fo.Dataset.from_images_patt("/path/to/images/*.jpg")

session = fo.launch_app(dataset)

You can also use Dataset.add_images(), Dataset.add_images_dir(), and Dataset.add_images_patt() to add images to an existing dataset.

You can use the fiftyone app view command from the CLI to quickly browse images in the App without creating a (persistent) FiftyOne dataset:

# View a glob pattern of images in the App
fiftyone app view --images-patt '/path/to/images/*.jpg'

# View a directory of images in the App
fiftyone app view --images-dir '/path/to/images'

Loading videos

If you’re just getting started with a project and all you have is a bunch of video files, you can easily load them into a FiftyOne dataset and start visualizing them in the App.

You can use the Dataset.from_videos(), Dataset.from_videos_dir(), and Dataset.from_videos_patt() factory methods to load your videos into FiftyOne:

import fiftyone as fo

# Create a dataset from a list of videos
dataset = fo.Dataset.from_videos(
    ["/path/to/video1.mp4", "/path/to/video2.mp4", ...]
)

# Create a dataset from a directory of videos
dataset = fo.Dataset.from_videos_dir("/path/to/videos")

# Create a dataset from a glob pattern of videos
dataset = fo.Dataset.from_videos_patt("/path/to/videos/*.mp4")

session = fo.launch_app(dataset)

You can also use Dataset.add_videos(), Dataset.add_videos_dir(), and Dataset.add_videos_patt() to add videos to an existing dataset.

You can use the fiftyone app view command from the CLI to quickly browse videos in the App without creating a (persistent) FiftyOne dataset:

# View a glob pattern of videos in the App
fiftyone app view --videos-patt '/path/to/videos/*.mp4'

# View a directory of videos in the App
fiftyone app view --videos-dir '/path/to/videos'

Model predictions

Once you’ve created a dataset and ground truth labels, you can easily add model predictions to take advantage of FiftyOne’s evaluation capabilities.

If you have model predictions stored in COCO format, then you can use add_coco_labels() to conveniently add the labels to an existing dataset.

The example below demonstrates a round-trip export and then re-import of both images-and-labels and labels-only data in COCO format:

import fiftyone as fo
import fiftyone.zoo as foz
import fiftyone.utils.coco as fouc

dataset = foz.load_zoo_dataset("quickstart")
classes = dataset.distinct("predictions.detections.label")

# Export images and ground truth labels to disk
dataset.export(
    export_dir="/tmp/coco",
    dataset_type=fo.types.COCODetectionDataset,
    label_field="ground_truth",
    classes=classes,
)

# Export predictions
dataset.export(
    dataset_type=fo.types.COCODetectionDataset,
    labels_path="/tmp/coco/predictions.json",
    label_field="predictions",
    classes=classes,
)

# Now load ground truth labels into a new dataset
dataset2 = fo.Dataset.from_dir(
    dataset_dir="/tmp/coco",
    dataset_type=fo.types.COCODetectionDataset,
    label_field="ground_truth",
    label_types="detections",
)

# And add model predictions
fouc.add_coco_labels(
    dataset2,
    "predictions",
    "/tmp/coco/predictions.json",
    classes,
)

# Verify that ground truth and predictions were imported as expected
print(dataset.count("ground_truth.detections"))
print(dataset2.count("ground_truth.detections"))
print(dataset.count("predictions.detections"))
print(dataset2.count("predictions.detections"))

Note

See add_coco_labels() for a complete description of the available syntaxes for loading COCO-formatted predictions to an existing dataset.

If you have model predictions stored in YOLO format, then you can use add_yolo_labels() to conveniently add the labels to an existing dataset.

The example below demonstrates a round-trip export and then re-import of both images-and-labels and labels-only data in YOLO format:

import fiftyone as fo
import fiftyone.zoo as foz
import fiftyone.utils.yolo as fouy

dataset = foz.load_zoo_dataset("quickstart")
classes = dataset.distinct("predictions.detections.label")

# Export images and ground truth labels to disk
dataset.export(
    export_dir="/tmp/yolov4",
    dataset_type=fo.types.YOLOv4Dataset,
    label_field="ground_truth",
    classes=classes,
)

# Export predictions
dataset.export(
    dataset_type=fo.types.YOLOv4Dataset,
    labels_path="/tmp/yolov4/predictions",
    label_field="predictions",
    classes=classes,
)

# Now load ground truth labels into a new dataset
dataset2 = fo.Dataset.from_dir(
    dataset_dir="/tmp/yolov4",
    dataset_type=fo.types.YOLOv4Dataset,
    label_field="ground_truth",
)

# And add model predictions
fouy.add_yolo_labels(
    dataset2,
    "predictions",
    "/tmp/yolov4/predictions",
    classes,
)

# Verify that ground truth and predictions were imported as expected
print(dataset.count("ground_truth.detections"))
print(dataset2.count("ground_truth.detections"))
print(dataset.count("predictions.detections"))
print(dataset2.count("predictions.detections"))

Note

See add_yolo_labels() for a complete description of the available syntaxes for loading YOLO-formatted predictions to an existing dataset.
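add_yolo_labels() handles the parsing for you, so the following is only a sketch of how a YOLO label row relates to the `[top-left-x, top-left-y, width, height]` convention used elsewhere on this page. It assumes each row has the form `<class-index> <x-center> <y-center> <width> <height>` in relative coordinates, optionally followed by a confidence; `parse_yolo_row` is a hypothetical helper, not a FiftyOne function.

```python
def parse_yolo_row(row, classes):
    """Parse one row of a YOLO label file.

    Assumes "<class-index> <x-center> <y-center> <width> <height>" with an
    optional trailing confidence, all coordinates relative in [0, 1].
    """
    parts = row.split()
    label = classes[int(parts[0])]
    xc, yc, w, h = (float(v) for v in parts[1:5])
    confidence = float(parts[5]) if len(parts) > 5 else None

    # Shift from the box center to its top-left corner
    bounding_box = [xc - w / 2, yc - h / 2, w, h]
    return label, bounding_box, confidence


print(parse_yolo_row("1 0.5 0.5 0.5 0.25 0.9", ["cat", "dog"]))
# ('dog', [0.25, 0.375, 0.5, 0.25], 0.9)
```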

Model predictions stored in other formats can always be loaded iteratively through a simple Python loop.

The example below shows how to add object detection predictions to a dataset, but many other label types are also supported.

import fiftyone as fo

# Ex: your custom predictions format
predictions = {
    "/path/to/images/000001.jpg": [
        {"bbox": ..., "label": ..., "score": ...},
        ...
    ],
    # ...
}

# Add predictions to your samples
for sample in dataset:
    filepath = sample.filepath

    # Convert predictions to FiftyOne format
    detections = []
    for obj in predictions[filepath]:
        label = obj["label"]
        confidence = obj["score"]

        # Bounding box coordinates should be relative values
        # in [0, 1] in the following format:
        # [top-left-x, top-left-y, width, height]
        bounding_box = obj["bbox"]

        detections.append(
            fo.Detection(
                label=label,
                bounding_box=bounding_box,
                confidence=confidence,
            )
        )

    # Store detections in a field name of your choice
    sample["predictions"] = fo.Detections(detections=detections)

    sample.save()
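The loop above looks up predictions by `sample.filepath`, so the dictionary keys must match those paths exactly. As a minimal sketch, assuming `sample.filepath` holds an absolute path and that your predictions may be keyed by relative paths, you could normalize the keys first and use `.get()` so that samples without predictions do not raise a `KeyError`. The helper `normalize_keys` and the sample data are hypothetical.

```python
import os


def normalize_keys(predictions):
    """Re-key a predictions dict by absolute, normalized path."""
    return {os.path.abspath(path): objs for path, objs in predictions.items()}


# Hypothetical predictions keyed by relative paths
raw = {"images/000001.jpg": [{"label": "dog", "score": 0.9}]}
predictions = normalize_keys(raw)

# Inside the loop, `.get()` returns an empty list for unmatched samples
objs = predictions.get(os.path.abspath("images/000002.jpg"), [])
print(objs)
# []
```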

Note

If you need a model to run on your dataset, check out the FiftyOne Model Zoo or the Lightning Flash integration.

Need data?

The FiftyOne Dataset Zoo contains dozens of popular public datasets that you can load into FiftyOne in a single line of code:

import fiftyone.zoo as foz

# List available datasets
print(foz.list_zoo_datasets())
# ['coco-2014', ...,  'kitti', ..., 'voc-2012', ...]

# Load a split of a zoo dataset
dataset = foz.load_zoo_dataset("cifar10", split="train")

Note

Check out the available zoo datasets!