Run in Google Colab | View source on GitHub | Download notebook |
Writing Custom Dataset Exporters¶
This recipe demonstrates how to write a custom DatasetExporter and use it to export a FiftyOne dataset to disk in your custom format.
Setup¶
If you haven’t already, install FiftyOne:
[ ]:
!pip install fiftyone
In this recipe we’ll use the FiftyOne Dataset Zoo to download the CIFAR-10 dataset to use as sample data to feed our custom exporter.
Behind the scenes, FiftyOne uses either the TensorFlow Datasets or TorchVision Datasets libraries to wrangle the datasets, depending on which ML library you have installed.
You can, for example, install PyTorch as follows:
[ ]:
!pip install torch torchvision
Writing a DatasetExporter¶
FiftyOne provides a DatasetExporter interface that defines how it exports datasets to disk when methods such as Dataset.export() are used.
DatasetExporter
itself is an abstract interface; the concrete interface that you should implement is determined by the type of dataset that you are exporting. See writing a custom DatasetExporter for full details.
In this recipe, we’ll write a custom LabeledImageDatasetExporter that can export an image classification dataset to disk in the following format:
<dataset_dir>/
data/
<filename1>.<ext>
<filename2>.<ext>
...
labels.csv
where labels.csv
is a CSV file that contains the image metadata and associated labels in the following format:
filepath,size_bytes,mime_type,width,height,num_channels,label
<filepath>,<size_bytes>,<mime_type>,<width>,<height>,<num_channels>,<label>
<filepath>,<size_bytes>,<mime_type>,<width>,<height>,<num_channels>,<label>
...
Here’s the complete definition of the DatasetExporter
:
[1]:
import csv
import os
import fiftyone as fo
import fiftyone.utils.data as foud
class CSVImageClassificationDatasetExporter(foud.LabeledImageDatasetExporter):
"""Exporter for image classification datasets whose labels and image
metadata are stored on disk in a CSV file.
Datasets of this type are exported in the following format:
<dataset_dir>/
data/
<filename1>.<ext>
<filename2>.<ext>
...
labels.csv
where ``labels.csv`` is a CSV file in the following format::
filepath,size_bytes,mime_type,width,height,num_channels,label
<filepath>,<size_bytes>,<mime_type>,<width>,<height>,<num_channels>,<label>
<filepath>,<size_bytes>,<mime_type>,<width>,<height>,<num_channels>,<label>
...
Args:
export_dir: the directory to write the export
"""
def __init__(self, export_dir):
super().__init__(export_dir=export_dir)
self._data_dir = None
self._labels_path = None
self._labels = None
self._image_exporter = None
@property
def requires_image_metadata(self):
"""Whether this exporter requires
:class:`fiftyone.core.metadata.ImageMetadata` instances for each sample
being exported.
"""
return True
@property
def label_cls(self):
"""The :class:`fiftyone.core.labels.Label` class(es) exported by this
exporter.
This can be any of the following:
- a :class:`fiftyone.core.labels.Label` class. In this case, the
exporter directly exports labels of this type
- a list or tuple of :class:`fiftyone.core.labels.Label` classes. In
this case, the exporter can export a single label field of any of
these types
- a dict mapping keys to :class:`fiftyone.core.labels.Label` classes.
In this case, the exporter can handle label dictionaries with
value-types specified by this dictionary. Not all keys need be
present in the exported label dicts
- ``None``. In this case, the exporter makes no guarantees about the
labels that it can export
"""
return fo.Classification
def setup(self):
"""Performs any necessary setup before exporting the first sample in
the dataset.
This method is called when the exporter's context manager interface is
entered, :func:`DatasetExporter.__enter__`.
"""
self._data_dir = os.path.join(self.export_dir, "data")
self._labels_path = os.path.join(self.export_dir, "labels.csv")
self._labels = []
# The `ImageExporter` utility class provides an `export()` method
# that exports images to an output directory with automatic handling
# of things like name conflicts
self._image_exporter = foud.ImageExporter(
True, export_path=self._data_dir, default_ext=".jpg",
)
self._image_exporter.setup()
def export_sample(self, image_or_path, label, metadata=None):
"""Exports the given sample to the dataset.
Args:
image_or_path: an image or the path to the image on disk
label: an instance of :meth:`label_cls`, or a dictionary mapping
field names to :class:`fiftyone.core.labels.Label` instances,
or ``None`` if the sample is unlabeled
metadata (None): a :class:`fiftyone.core.metadata.ImageMetadata`
instance for the sample. Only required when
:meth:`requires_image_metadata` is ``True``
"""
out_image_path, _ = self._image_exporter.export(image_or_path)
if metadata is None:
metadata = fo.ImageMetadata.build_for(image_or_path)
self._labels.append((
out_image_path,
metadata.size_bytes,
metadata.mime_type,
metadata.width,
metadata.height,
metadata.num_channels,
label.label, # here, `label` is a `Classification` instance
))
def close(self, *args):
"""Performs any necessary actions after the last sample has been
exported.
This method is called when the exporter's context manager interface is
exited, :func:`DatasetExporter.__exit__`.
Args:
*args: the arguments to :func:`DatasetExporter.__exit__`
"""
# Ensure the base output directory exists
basedir = os.path.dirname(self._labels_path)
if basedir and not os.path.isdir(basedir):
os.makedirs(basedir)
# Write the labels CSV file
with open(self._labels_path, "w") as f:
writer = csv.writer(f)
writer.writerow([
"filepath",
"size_bytes",
"mime_type",
"width",
"height",
"num_channels",
"label",
])
for row in self._labels:
writer.writerow(row)
Generating a sample dataset¶
In order to use CSVImageClassificationDatasetExporter
, we need some labeled image samples to work with.
Let’s use some samples from the test split of CIFAR-10:
[2]:
import fiftyone.zoo as foz
num_samples = 1000
#
# Load `num_samples` from CIFAR-10
#
# This command will download the test split of CIFAR-10 from the web the first
# time it is executed, if necessary
#
cifar10_test = foz.load_zoo_dataset("cifar10", split="test")
samples = cifar10_test.limit(num_samples)
Split 'test' already downloaded
Loading 'cifar10' split 'test'
100% |███| 10000/10000 [4.4s elapsed, 0s remaining, 2.2K samples/s]
[3]:
# Print summary information about the samples
print(samples)
Dataset: cifar10-test
Num samples: 1000
Tags: ['test']
Sample fields:
filepath: fiftyone.core.fields.StringField
tags: fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)
metadata: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.Metadata)
ground_truth: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Classification)
Pipeline stages:
1. Limit(limit=1000)
[4]:
# Print a sample
print(samples.first())
<Sample: {
'dataset_name': 'cifar10-test',
'id': '5f0e6d7f503bf2b87254061c',
'filepath': '~/fiftyone/cifar10/test/data/000001.jpg',
'tags': BaseList(['test']),
'metadata': None,
'ground_truth': <Classification: {'label': 'cat', 'confidence': None, 'logits': None}>,
}>
Exporting a dataset¶
With our samples and DatasetExporter
in-hand, exporting the samples to disk in our custom format is as simple as follows:
[5]:
export_dir = "/tmp/fiftyone/custom-dataset-exporter"
# Export the dataset
print("Exporting %d samples to '%s'" % (len(samples), export_dir))
exporter = CSVImageClassificationDatasetExporter(export_dir)
samples.export(dataset_exporter=exporter)
Exporting 1000 samples to '/tmp/fiftyone/custom-dataset-exporter'
100% |█████| 1000/1000 [1.0s elapsed, 0s remaining, 1.0K samples/s]
Let’s inspect the contents of the exported dataset to verify that it was written in the correct format:
[9]:
!ls -lah /tmp/fiftyone/custom-dataset-exporter
total 168
drwxr-xr-x 4 voxel51 wheel 128B Jul 14 22:46 .
drwxr-xr-x 3 voxel51 wheel 96B Jul 14 22:46 ..
drwxr-xr-x 1002 voxel51 wheel 31K Jul 14 22:46 data
-rw-r--r-- 1 voxel51 wheel 83K Jul 14 22:46 labels.csv
[10]:
!ls -lah /tmp/fiftyone/custom-dataset-exporter/data | head -n 10
total 8000
drwxr-xr-x 1002 voxel51 wheel 31K Jul 14 22:46 .
drwxr-xr-x 4 voxel51 wheel 128B Jul 14 22:46 ..
-rw-r--r-- 1 voxel51 wheel 1.4K Jul 14 22:46 000001.jpg
-rw-r--r-- 1 voxel51 wheel 1.3K Jul 14 22:46 000002.jpg
-rw-r--r-- 1 voxel51 wheel 1.2K Jul 14 22:46 000003.jpg
-rw-r--r-- 1 voxel51 wheel 1.2K Jul 14 22:46 000004.jpg
-rw-r--r-- 1 voxel51 wheel 1.4K Jul 14 22:46 000005.jpg
-rw-r--r-- 1 voxel51 wheel 1.3K Jul 14 22:46 000006.jpg
-rw-r--r-- 1 voxel51 wheel 1.4K Jul 14 22:46 000007.jpg
[11]:
!head -n 10 /tmp/fiftyone/custom-dataset-exporter/labels.csv
filepath,size_bytes,mime_type,width,height,num_channels,label
/tmp/fiftyone/custom-dataset-exporter/data/000001.jpg,1422,image/jpeg,32,32,3,cat
/tmp/fiftyone/custom-dataset-exporter/data/000002.jpg,1285,image/jpeg,32,32,3,ship
/tmp/fiftyone/custom-dataset-exporter/data/000003.jpg,1258,image/jpeg,32,32,3,ship
/tmp/fiftyone/custom-dataset-exporter/data/000004.jpg,1244,image/jpeg,32,32,3,airplane
/tmp/fiftyone/custom-dataset-exporter/data/000005.jpg,1388,image/jpeg,32,32,3,frog
/tmp/fiftyone/custom-dataset-exporter/data/000006.jpg,1311,image/jpeg,32,32,3,frog
/tmp/fiftyone/custom-dataset-exporter/data/000007.jpg,1412,image/jpeg,32,32,3,automobile
/tmp/fiftyone/custom-dataset-exporter/data/000008.jpg,1218,image/jpeg,32,32,3,frog
/tmp/fiftyone/custom-dataset-exporter/data/000009.jpg,1262,image/jpeg,32,32,3,cat
Cleanup¶
You can cleanup the files generated by this recipe by running:
[12]:
!rm -rf /tmp/fiftyone