Loading data into FiftyOne
The first step to using FiftyOne is to load your data into a dataset. FiftyOne can automatically load datasets stored in many common formats. If your dataset is stored in a custom format, FiftyOne also makes it easy to load it yourself.
Check out the sections below to see which import pattern is the best fit for your data.
Note
Did you know? You can import media and/or labels from within the FiftyOne App by installing the @voxel51/io plugin!
Note
When you create a Dataset, its samples and all of their fields (metadata, labels, custom fields, etc.) are written to FiftyOne’s backing database.
Important: Samples only store the filepath to the media, not the raw media itself. FiftyOne does not create duplicate copies of your data!
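Because samples reference media by path only, it can help to confirm that the files exist before building a dataset. The following is a minimal sketch using only the standard library; the helper name `find_missing_media` and the example paths are hypothetical and not part of FiftyOne.

```python
import os


def find_missing_media(filepaths):
    """Return the paths in `filepaths` that do not exist on disk."""
    return [path for path in filepaths if not os.path.exists(path)]


# Hypothetical paths, for illustration only
candidates = ["/path/to/images/000001.jpg", "/path/to/images/000002.jpg"]
missing = find_missing_media(candidates)
print(f"{len(missing)} of {len(candidates)} media files are missing")
```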
Common formats
If your data is stored on disk in one of the many common formats supported natively by FiftyOne, then you can automatically load your data into a Dataset via the following simple pattern:

```python
import fiftyone as fo

# A name for the dataset
name = "my-dataset"

# The directory containing the dataset to import
dataset_dir = "/path/to/dataset"

# The type of the dataset being imported
dataset_type = fo.types.COCODetectionDataset  # for example

dataset = fo.Dataset.from_dir(
    dataset_dir=dataset_dir,
    dataset_type=dataset_type,
    name=name,
)
```
Note
Check out this page for more details about loading datasets from disk in common formats!
Custom formats
The simplest and most flexible approach to loading your data into FiftyOne is to iterate over your data in a simple Python loop, create a Sample for each data + label(s) pair, and then add those samples to a Dataset.
FiftyOne provides label types for common tasks such as classification, detection, segmentation, and many more. The examples below give you a sense of the basic workflow for a few tasks:
Image classification:

```python
import glob

import fiftyone as fo

images_patt = "/path/to/images/*"

# Ex: your custom label format
annotations = {
    "/path/to/images/000001.jpg": "dog",
    # ...
}

# Create samples for your data
samples = []
for filepath in glob.glob(images_patt):
    sample = fo.Sample(filepath=filepath)

    # Store classification in a field name of your choice
    label = annotations[filepath]
    sample["ground_truth"] = fo.Classification(label=label)

    samples.append(sample)

# Create dataset
dataset = fo.Dataset("my-classification-dataset")
dataset.add_samples(samples)
```
Object detection:

```python
import glob

import fiftyone as fo

images_patt = "/path/to/images/*"

# Ex: your custom label format
annotations = {
    "/path/to/images/000001.jpg": [
        {"bbox": ..., "label": ...},
        ...
    ],
    # ...
}

# Create samples for your data
samples = []
for filepath in glob.glob(images_patt):
    sample = fo.Sample(filepath=filepath)

    # Convert detections to FiftyOne format
    detections = []
    for obj in annotations[filepath]:
        label = obj["label"]

        # Bounding box coordinates should be relative values
        # in [0, 1] in the following format:
        # [top-left-x, top-left-y, width, height]
        bounding_box = obj["bbox"]

        detections.append(
            fo.Detection(label=label, bounding_box=bounding_box)
        )

    # Store detections in a field name of your choice
    sample["ground_truth"] = fo.Detections(detections=detections)

    samples.append(sample)

# Create dataset
dataset = fo.Dataset("my-detection-dataset")
dataset.add_samples(samples)
```
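The detection example expects bounding boxes as relative values in [0, 1], ordered [top-left-x, top-left-y, width, height]. If your annotations store absolute pixel corners instead, a conversion along the following lines may be useful. This is a sketch: the helper name `to_relative_bbox` and the (x1, y1, x2, y2) input layout are assumptions about your own format, not part of FiftyOne.

```python
def to_relative_bbox(x1, y1, x2, y2, img_width, img_height):
    """Convert absolute pixel corners to relative [x, y, width, height]."""
    if img_width <= 0 or img_height <= 0:
        raise ValueError("image dimensions must be positive")

    return [
        x1 / img_width,           # top-left x, relative to image width
        y1 / img_height,          # top-left y, relative to image height
        (x2 - x1) / img_width,    # box width
        (y2 - y1) / img_height,   # box height
    ]


bbox = to_relative_bbox(64, 48, 320, 240, img_width=640, img_height=480)
print(bbox)  # [0.1, 0.1, 0.4, 0.4]
```

The resulting list can be passed as the `bounding_box` argument of `fo.Detection`.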
Labeled video:

```python
import fiftyone as fo

video_path = "/path/to/video.mp4"

# Ex: your custom label format
frame_labels = {
    1: {
        "weather": "sunny",
        "objects": [
            {
                "label": ...,
                "bbox": ...
            },
            ...
        ]
    },
    # ...
}

# Create video sample with frame labels
sample = fo.Sample(filepath=video_path)
for frame_number, labels in frame_labels.items():
    frame = fo.Frame()

    # Store a frame classification
    weather = labels["weather"]
    frame["weather"] = fo.Classification(label=weather)

    # Convert detections to FiftyOne format
    detections = []
    for obj in labels["objects"]:
        label = obj["label"]

        # Bounding box coordinates should be relative values
        # in [0, 1] in the following format:
        # [top-left-x, top-left-y, width, height]
        bounding_box = obj["bbox"]

        detections.append(
            fo.Detection(label=label, bounding_box=bounding_box)
        )

    # Store object detections
    frame["objects"] = fo.Detections(detections=detections)

    # Add frame to sample
    sample.frames[frame_number] = frame

# Create dataset
dataset = fo.Dataset("my-labeled-video-dataset")
dataset.add_sample(sample)
```
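The video example keys its frame labels starting at 1, which matches FiftyOne's 1-based frame numbers. If your own format counts frames from 0, you may need to shift the keys first. A minimal sketch, assuming a dict keyed by 0-based integer indices (the helper name `to_one_based` is hypothetical):

```python
def to_one_based(frame_labels):
    """Shift 0-based frame indices to 1-based frame numbers."""
    return {index + 1: labels for index, labels in frame_labels.items()}


zero_based = {0: {"weather": "sunny"}, 1: {"weather": "cloudy"}}
print(to_one_based(zero_based))
# {1: {'weather': 'sunny'}, 2: {'weather': 'cloudy'}}
```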
Note that using Dataset.add_samples() to add batches of samples to your datasets can be significantly more efficient than adding samples one-by-one via Dataset.add_sample().
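For very large collections you may not want to hold every sample in one list before calling `Dataset.add_samples()`. One option is to add samples in fixed-size batches. The sketch below shows only the batching logic in plain Python; the helper name `batched` and the batch size are illustrative choices, not FiftyOne settings.

```python
from itertools import islice


def batched(iterable, batch_size):
    """Yield successive lists of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


for batch in batched(range(5), 2):
    # With FiftyOne, each batch would hold samples and this line
    # would be: dataset.add_samples(batch)
    print(batch)
```

Each yielded batch is a plain list, so it can be handed to `Dataset.add_samples()` directly.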
Note
If you use the same custom data format frequently in your workflows, then writing a custom dataset importer is a great way to abstract and streamline the loading of your data into FiftyOne.
Loading images
If you’re just getting started with a project and all you have is a bunch of image files, you can easily load them into a FiftyOne dataset and start visualizing them in the App.
You can use the Dataset.from_images(), Dataset.from_images_dir(), and Dataset.from_images_patt() factory methods to load your images into FiftyOne:

```python
import fiftyone as fo

# Create a dataset from a list of images
dataset = fo.Dataset.from_images(
    ["/path/to/image1.jpg", "/path/to/image2.jpg", ...]
)

# Create a dataset from a directory of images
dataset = fo.Dataset.from_images_dir("/path/to/images")

# Create a dataset from a glob pattern of images
dataset = fo.Dataset.from_images_patt("/path/to/images/*.jpg")

session = fo.launch_app(dataset)
```
You can also use Dataset.add_images(), Dataset.add_images_dir(), and Dataset.add_images_patt() to add images to an existing dataset.
You can use the fiftyone app view command from the CLI to quickly browse images in the App without creating a (persistent) FiftyOne dataset:
```shell
# View a glob pattern of images in the App
fiftyone app view --images-patt '/path/to/images/*.jpg'

# View a directory of images in the App
fiftyone app view --images-dir '/path/to/images'
```
Loading videos
If you’re just getting started with a project and all you have is a bunch of video files, you can easily load them into a FiftyOne dataset and start visualizing them in the App.
You can use the Dataset.from_videos(), Dataset.from_videos_dir(), and Dataset.from_videos_patt() factory methods to load your videos into FiftyOne:

```python
import fiftyone as fo

# Create a dataset from a list of videos
dataset = fo.Dataset.from_videos(
    ["/path/to/video1.mp4", "/path/to/video2.mp4", ...]
)

# Create a dataset from a directory of videos
dataset = fo.Dataset.from_videos_dir("/path/to/videos")

# Create a dataset from a glob pattern of videos
dataset = fo.Dataset.from_videos_patt("/path/to/videos/*.mp4")

session = fo.launch_app(dataset)
```
You can also use Dataset.add_videos(), Dataset.add_videos_dir(), and Dataset.add_videos_patt() to add videos to an existing dataset.
You can use the fiftyone app view command from the CLI to quickly browse videos in the App without creating a (persistent) FiftyOne dataset:
```shell
# View a glob pattern of videos in the App
fiftyone app view --videos-patt '/path/to/videos/*.mp4'

# View a directory of videos in the App
fiftyone app view --videos-dir '/path/to/videos'
```
Model predictions
Once you’ve created a dataset and ground truth labels, you can easily add model predictions to take advantage of FiftyOne’s evaluation capabilities.
If you have model predictions stored in COCO format, then you can use add_coco_labels() to conveniently add the labels to an existing dataset.
The example below demonstrates a round-trip export and then re-import of both images-and-labels and labels-only data in COCO format:
```python
import fiftyone as fo
import fiftyone.zoo as foz
import fiftyone.utils.coco as fouc

dataset = foz.load_zoo_dataset("quickstart")
classes = dataset.distinct("predictions.detections.label")

# Export images and ground truth labels to disk
dataset.export(
    export_dir="/tmp/coco",
    dataset_type=fo.types.COCODetectionDataset,
    label_field="ground_truth",
    classes=classes,
)

# Export predictions
dataset.export(
    dataset_type=fo.types.COCODetectionDataset,
    labels_path="/tmp/coco/predictions.json",
    label_field="predictions",
    classes=classes,
)

# Now load ground truth labels into a new dataset
dataset2 = fo.Dataset.from_dir(
    dataset_dir="/tmp/coco",
    dataset_type=fo.types.COCODetectionDataset,
    label_field="ground_truth",
    label_types="detections",
)

# And add model predictions
fouc.add_coco_labels(
    dataset2,
    "predictions",
    "/tmp/coco/predictions.json",
    classes,
)

# Verify that ground truth and predictions were imported as expected
print(dataset.count("ground_truth.detections"))
print(dataset2.count("ground_truth.detections"))
print(dataset.count("predictions.detections"))
print(dataset2.count("predictions.detections"))
```
Note
See add_coco_labels() for a complete description of the available syntaxes for loading COCO-formatted predictions to an existing dataset.
If you have model predictions stored in YOLO format, then you can use add_yolo_labels() to conveniently add the labels to an existing dataset.
The example below demonstrates a round-trip export and then re-import of both images-and-labels and labels-only data in YOLO format:
```python
import fiftyone as fo
import fiftyone.zoo as foz
import fiftyone.utils.yolo as fouy

dataset = foz.load_zoo_dataset("quickstart")
classes = dataset.distinct("predictions.detections.label")

# Export images and ground truth labels to disk
dataset.export(
    export_dir="/tmp/yolov4",
    dataset_type=fo.types.YOLOv4Dataset,
    label_field="ground_truth",
    classes=classes,
)

# Export predictions
dataset.export(
    dataset_type=fo.types.YOLOv4Dataset,
    labels_path="/tmp/yolov4/predictions",
    label_field="predictions",
    classes=classes,
)

# Now load ground truth labels into a new dataset
dataset2 = fo.Dataset.from_dir(
    dataset_dir="/tmp/yolov4",
    dataset_type=fo.types.YOLOv4Dataset,
    label_field="ground_truth",
)

# And add model predictions
fouy.add_yolo_labels(
    dataset2,
    "predictions",
    "/tmp/yolov4/predictions",
    classes,
)

# Verify that ground truth and predictions were imported as expected
print(dataset.count("ground_truth.detections"))
print(dataset2.count("ground_truth.detections"))
print(dataset.count("predictions.detections"))
print(dataset2.count("predictions.detections"))
```
Note
See add_yolo_labels() for a complete description of the available syntaxes for loading YOLO-formatted predictions to an existing dataset.
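For orientation, YOLO label files conventionally hold one object per line in the form `<class-id> <x-center> <y-center> <width> <height>`, with all coordinates relative to the image size. The sketch below parses one such line and converts it to the [top-left-x, top-left-y, width, height] layout used elsewhere on this page. The helper name `parse_yolo_line` is hypothetical, and treating an optional sixth column as a confidence score is an assumption about how your predictions were written.

```python
def parse_yolo_line(line):
    """Parse one YOLO label line into (class_id, bbox, confidence).

    The returned bbox is [top-left-x, top-left-y, width, height] in
    relative coordinates. Confidence is None when no sixth column exists.
    """
    parts = line.split()
    if len(parts) not in (5, 6):
        raise ValueError(f"expected 5 or 6 columns, got {len(parts)}")

    class_id = int(parts[0])
    x_center, y_center, width, height = (float(v) for v in parts[1:5])

    # Assumption: a sixth column, if present, is the prediction confidence
    confidence = float(parts[5]) if len(parts) == 6 else None

    # Shift from box center to top-left corner
    bbox = [x_center - width / 2, y_center - height / 2, width, height]
    return class_id, bbox, confidence


print(parse_yolo_line("2 0.5 0.5 0.25 0.5"))
# (2, [0.375, 0.25, 0.25, 0.5], None)
```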
Model predictions stored in other formats can always be loaded iteratively through a simple Python loop.
The example below shows how to add object detection predictions to a dataset, but many other label types are also supported.
```python
import fiftyone as fo

# Ex: your custom predictions format
predictions = {
    "/path/to/images/000001.jpg": [
        {"bbox": ..., "label": ..., "score": ...},
        ...
    ],
    # ...
}

# Add predictions to the samples of an existing dataset
for sample in dataset:
    filepath = sample.filepath

    # Convert predictions to FiftyOne format
    detections = []
    for obj in predictions[filepath]:
        label = obj["label"]
        confidence = obj["score"]

        # Bounding box coordinates should be relative values
        # in [0, 1] in the following format:
        # [top-left-x, top-left-y, width, height]
        bounding_box = obj["bbox"]

        detections.append(
            fo.Detection(
                label=label,
                bounding_box=bounding_box,
                confidence=confidence,
            )
        )

    # Store detections in a field name of your choice
    sample["predictions"] = fo.Detections(detections=detections)

    sample.save()
```
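The loop above looks up `predictions[filepath]` directly, which raises a `KeyError` when an image has no predictions or when the dictionary keys are written differently from `sample.filepath`. A possible safeguard, sketched below, is to normalize the keys to absolute paths and fall back to an empty list. This assumes that `sample.filepath` holds an absolute path; the helper name `normalize_keys` and the sample data are hypothetical.

```python
import os


def normalize_keys(predictions):
    """Return a copy of `predictions` keyed by absolute, expanded paths."""
    return {
        os.path.abspath(os.path.expanduser(path)): objects
        for path, objects in predictions.items()
    }


# Hypothetical predictions keyed by a relative path
raw = {
    "images/000001.jpg": [
        {"bbox": [0.1, 0.1, 0.4, 0.4], "label": "dog", "score": 0.9},
    ],
}
predictions = normalize_keys(raw)

# Stand-in for sample.filepath
filepath = os.path.abspath("images/000001.jpg")

# .get() with a default avoids a KeyError for images without predictions
print(len(predictions.get(filepath, [])))  # 1
print(len(predictions.get(os.path.abspath("images/unlabeled.jpg"), [])))  # 0
```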
Note
If you are in need of a model to run on your dataset, check out the FiftyOne Model Zoo or the Lightning Flash integration.
Need data?
The FiftyOne Dataset Zoo contains dozens of popular public datasets that you can load into FiftyOne in a single line of code:
```python
import fiftyone.zoo as foz

# List available datasets
print(foz.list_zoo_datasets())
# ['coco-2014', ..., 'kitti', ..., 'voc-2012', ...]

# Load a split of a zoo dataset
dataset = foz.load_zoo_dataset("cifar10", split="train")
```
Note
Check out the available zoo datasets!