fiftyone.utils.data.ingestors

Dataset ingestors.

Copyright 2017-2025, Voxel51, Inc.

Classes:

ImageIngestor(dataset_dir[, image_format])

Mixin for fiftyone.utils.data.importers.DatasetImporter instances that ingest images into the provided dataset_dir during import.

UnlabeledImageDatasetIngestor(dataset_dir, ...)

Dataset importer that ingests unlabeled images into the provided dataset_dir during import.

LabeledImageDatasetIngestor(dataset_dir, ...)

Dataset importer that ingests labeled images into the provided dataset_dir during import.

VideoIngestor(dataset_dir)

Mixin for fiftyone.utils.data.importers.DatasetImporter instances that ingest videos into the provided dataset_dir during import.

UnlabeledVideoDatasetIngestor(dataset_dir, ...)

Dataset importer that ingests unlabeled videos into the provided dataset_dir during import.

LabeledVideoDatasetIngestor(dataset_dir, ...)

Dataset importer that ingests labeled videos into the provided dataset_dir during import.

class fiftyone.utils.data.ingestors.ImageIngestor(dataset_dir, image_format=None)

Bases: object

Mixin for fiftyone.utils.data.importers.DatasetImporter instances that ingest images into the provided dataset_dir during import.

Parameters:
  • dataset_dir – the directory into which input images will be ingested

  • image_format (None) – the image format to use when writing in-memory images to disk. By default, fiftyone.config.default_image_ext is used

class fiftyone.utils.data.ingestors.UnlabeledImageDatasetIngestor(dataset_dir, samples, sample_parser, image_format=None, max_samples=None)

Bases: UnlabeledImageDatasetImporter, ImageIngestor

Dataset importer that ingests unlabeled images into the provided dataset_dir during import.

The source images are parsed from the provided samples using the provided fiftyone.utils.data.parsers.UnlabeledImageSampleParser.

If an image path is available via fiftyone.utils.data.parsers.UnlabeledImageSampleParser.get_image_path(), then the image is directly copied from its source location into dataset_dir. In this case, the original filename is maintained, unless a name conflict would occur, in which case an index of the form "-%d" % count is appended to the base filename.
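The conflict rule above can be sketched in a few lines. This is a minimal illustration, not FiftyOne's implementation: the helpers `make_unique_namer` and `unique_filename` are hypothetical, and giving the first duplicate the count 2 is an assumption. The documentation only states that an index of the form "-%d" % count is appended to the base filename.

```python
import os
from collections import defaultdict


def make_unique_namer():
    """Returns a function that maps source paths to conflict-free filenames.

    Hypothetical helper: starting the count at 2 for the first duplicate is
    an assumption, not something the documentation specifies.
    """
    counts = defaultdict(int)

    def unique_filename(src_path):
        filename = os.path.basename(src_path)
        base, ext = os.path.splitext(filename)
        counts[filename] += 1
        count = counts[filename]
        if count == 1:
            return filename  # no conflict: keep the original filename
        return base + "-%d" % count + ext  # conflict: append an index

    return unique_filename


namer = make_unique_namer()
sources = ["/a/cat.jpg", "/b/cat.jpg", "/c/dog.png", "/d/cat.jpg"]
names = [namer(path) for path in sources]
print(names)
```

Note that the index is inserted before the extension, so the ingested file keeps its original format.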

If no image path is available, the image is read in-memory via fiftyone.utils.data.parsers.UnlabeledImageSampleParser.get_image() and written to dataset_dir in the following format:

<dataset_dir>/<image_count><image_format>

where image_count is the number of files in dataset_dir.
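As a rough sketch of this naming pattern, the snippet below builds output paths from the current number of files in the directory. The helper `next_image_path` is hypothetical, the ".jpg" default is a placeholder for fiftyone.config.default_image_ext, and both the absence of zero-padding and counting before the write are assumptions.

```python
import os
import tempfile


def next_image_path(dataset_dir, image_format=".jpg"):
    """Builds <dataset_dir>/<image_count><image_format>.

    Hypothetical helper: image_count is taken as the number of files
    currently in dataset_dir, with no zero-padding (an assumption).
    """
    image_count = len(os.listdir(dataset_dir))
    return os.path.join(dataset_dir, "%d%s" % (image_count, image_format))


with tempfile.TemporaryDirectory() as dataset_dir:
    written = []
    for _ in range(3):
        path = next_image_path(dataset_dir, ".png")
        with open(path, "wb") as f:
            f.write(b"")  # stand-in for encoded image bytes
        written.append(os.path.basename(path))

print(written)
```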

Parameters:
  • dataset_dir – the directory into which input images will be ingested

  • samples – an iterable of samples that can be parsed by sample_parser

  • sample_parser – a fiftyone.utils.data.parsers.UnlabeledImageSampleParser to use to parse the samples

  • image_format (None) – the image format to use when writing in-memory images to disk. By default, fiftyone.config.default_image_ext is used

  • max_samples (None) – a maximum number of samples to import. By default, all samples are imported
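Because samples may be any iterable, including a generator with no length, a limit such as max_samples is naturally applied lazily. The following is a minimal sketch of that idea using only the standard library; `limit_samples` is a hypothetical helper and does not claim to mirror how the ingestor applies the limit internally.

```python
from itertools import islice


def limit_samples(samples, max_samples=None):
    """Yields at most max_samples items; None means no limit."""
    return islice(samples, max_samples)


# A generator has no len(), so the limit must be applied while iterating
samples = ("sample-%d" % i for i in range(10))
first_three = list(limit_samples(samples, max_samples=3))

# With max_samples=None, every sample passes through
everything = list(limit_samples(range(5)))

print(first_three, everything)
```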

Attributes:

has_dataset_info

Whether this importer produces a dataset info dictionary.

has_image_metadata

Whether this importer produces fiftyone.core.metadata.ImageMetadata instances for each image.

Methods:

setup()

Performs any necessary setup before importing the first sample in the dataset.

close(*args)

Performs any necessary actions after the last sample has been imported.

get_dataset_info()

Returns the dataset info for the dataset.

property has_dataset_info

Whether this importer produces a dataset info dictionary.

property has_image_metadata

Whether this importer produces fiftyone.core.metadata.ImageMetadata instances for each image.

setup()

Performs any necessary setup before importing the first sample in the dataset.

This method is called when the importer’s context manager interface is entered, DatasetImporter.__enter__().

close(*args)

Performs any necessary actions after the last sample has been imported.

This method is called when the importer’s context manager interface is exited, DatasetImporter.__exit__().

Parameters:

*args – the arguments to DatasetImporter.__exit__()
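The relationship between setup(), close() and the context manager interface can be illustrated with a small stand-in class. `SketchImporter` is hypothetical and is not a FiftyOne class; it only shows the calling order described above, assuming __enter__() delegates to setup() and __exit__() forwards its arguments to close().

```python
class SketchImporter:
    """Hypothetical stand-in for an importer; not a FiftyOne class."""

    def __init__(self):
        self.events = []

    def __enter__(self):
        self.setup()
        return self

    def __exit__(self, *args):
        self.close(*args)

    def setup(self):
        self.events.append("setup")

    def close(self, *args):
        # args is (exc_type, exc_value, traceback); all None on a clean exit
        self.events.append(("close", args))


importer = SketchImporter()
with importer:
    importer.events.append("import samples")

print(importer.events)
```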

get_dataset_info()

Returns the dataset info for the dataset.

By convention, this method should be called after all samples in the dataset have been imported.

Returns:

a dict of dataset info

class fiftyone.utils.data.ingestors.LabeledImageDatasetIngestor(dataset_dir, samples, sample_parser, image_format=None, max_samples=None)

Bases: LabeledImageDatasetImporter, ImageIngestor

Dataset importer that ingests labeled images into the provided dataset_dir during import.

The source images and labels are parsed from the provided samples using the provided fiftyone.utils.data.parsers.LabeledImageSampleParser.

If an image path is available via fiftyone.utils.data.parsers.LabeledImageSampleParser.get_image_path(), then the image is directly copied from its source location into dataset_dir. In this case, the original filename is maintained, unless a name conflict would occur, in which case an index of the form "-%d" % count is appended to the base filename.

If no image path is available, the image is read in-memory via fiftyone.utils.data.parsers.LabeledImageSampleParser.get_image() and written to dataset_dir in the following format:

<dataset_dir>/<image_count><image_format>

where image_count is the number of files in dataset_dir.

Parameters:
  • dataset_dir – the directory into which input images will be ingested

  • samples – an iterable of samples that can be parsed by sample_parser

  • sample_parser – a fiftyone.utils.data.parsers.LabeledImageSampleParser to use to parse the samples

  • image_format (None) – the image format to use when writing in-memory images to disk. By default, fiftyone.config.default_image_ext is used

  • max_samples (None) – a maximum number of samples to import. By default, all samples are imported

Attributes:

has_dataset_info

Whether this importer produces a dataset info dictionary.

has_image_metadata

Whether this importer produces fiftyone.core.metadata.ImageMetadata instances for each image.

label_cls

The fiftyone.core.labels.Label class(es) returned by this importer.

Methods:

setup()

Performs any necessary setup before importing the first sample in the dataset.

close(*args)

Performs any necessary actions after the last sample has been imported.

get_dataset_info()

Returns the dataset info for the dataset.

property has_dataset_info

Whether this importer produces a dataset info dictionary.

property has_image_metadata

Whether this importer produces fiftyone.core.metadata.ImageMetadata instances for each image.

property label_cls

The fiftyone.core.labels.Label class(es) returned by this importer.

This can be any of the following:

  • a fiftyone.core.labels.Label class. In this case, the importer is guaranteed to return labels of this type

  • a list or tuple of fiftyone.core.labels.Label classes. In this case, the importer can produce a single label field of any of these types

  • a dict mapping keys to fiftyone.core.labels.Label classes. In this case, the importer will return label dictionaries with keys and value-types specified by this dictionary. Not all keys need be present in the imported labels

  • None. In this case, the importer makes no guarantees about the labels that it may return
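The four forms listed above can be read as a small type specification. The sketch below checks a label against such a specification. The classes are placeholders rather than the real fiftyone.core.labels classes, `label_matches` is a hypothetical helper, and rejecting keys that are absent from a dict specification is an assumption.

```python
class Label:
    """Placeholder for fiftyone.core.labels.Label."""


class Classification(Label):
    pass


class Detections(Label):
    pass


def label_matches(label, label_cls):
    """Checks a label against a label_cls specification (hypothetical)."""
    if label_cls is None:
        return True  # no guarantees are made, so there is nothing to check
    if isinstance(label_cls, dict):
        # Keys may be missing from the label dict; extra keys are rejected
        # here, which is an assumption of this sketch
        return isinstance(label, dict) and all(
            key in label_cls and isinstance(value, label_cls[key])
            for key, value in label.items()
        )
    if isinstance(label_cls, (list, tuple)):
        return isinstance(label, tuple(label_cls))  # any one of the types
    return isinstance(label, label_cls)  # a single guaranteed type


spec = {"ground_truth": Classification, "predictions": Detections}
print(label_matches({"ground_truth": Classification()}, spec))
```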

setup()

Performs any necessary setup before importing the first sample in the dataset.

This method is called when the importer’s context manager interface is entered, DatasetImporter.__enter__().

close(*args)

Performs any necessary actions after the last sample has been imported.

This method is called when the importer’s context manager interface is exited, DatasetImporter.__exit__().

Parameters:

*args – the arguments to DatasetImporter.__exit__()

get_dataset_info()

Returns the dataset info for the dataset.

By convention, this method should be called after all samples in the dataset have been imported.

Returns:

a dict of dataset info

class fiftyone.utils.data.ingestors.VideoIngestor(dataset_dir)

Bases: object

Mixin for fiftyone.utils.data.importers.DatasetImporter instances that ingest videos into the provided dataset_dir during import.

Parameters:

dataset_dir – the directory into which input videos will be ingested

class fiftyone.utils.data.ingestors.UnlabeledVideoDatasetIngestor(dataset_dir, samples, sample_parser, max_samples=None)

Bases: UnlabeledVideoDatasetImporter, VideoIngestor

Dataset importer that ingests unlabeled videos into the provided dataset_dir during import.

The source videos are parsed from the provided samples using the provided fiftyone.utils.data.parsers.UnlabeledVideoSampleParser.

The source videos are directly copied from their source locations into dataset_dir, maintaining the original filenames, unless a name conflict would occur, in which case an index of the form "-%d" % count is appended to the base filename.
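To make the copy step concrete, the sketch below copies files into a target directory and renames on conflict by checking what already exists on disk. It uses only the standard library. `ingest_file` is a hypothetical helper, and counting upward from 2 until a free name is found is an assumption about how the "-%d" % count index is chosen.

```python
import os
import shutil
import tempfile


def ingest_file(src_path, dataset_dir):
    """Copies src_path into dataset_dir, renaming on a name conflict."""
    base, ext = os.path.splitext(os.path.basename(src_path))
    dst_path = os.path.join(dataset_dir, base + ext)
    count = 1
    while os.path.exists(dst_path):
        count += 1  # assumption: the first duplicate gets index 2
        dst_path = os.path.join(dataset_dir, base + "-%d" % count + ext)
    shutil.copy(src_path, dst_path)
    return dst_path


with tempfile.TemporaryDirectory() as root:
    dataset_dir = os.path.join(root, "dataset")
    os.makedirs(dataset_dir)

    # Two source videos with the same filename in different folders
    sources = []
    for folder in ("cam1", "cam2"):
        os.makedirs(os.path.join(root, folder))
        src = os.path.join(root, folder, "clip.mp4")
        with open(src, "wb") as f:
            f.write(folder.encode())  # stand-in for video bytes
        sources.append(src)

    copied = [os.path.basename(ingest_file(s, dataset_dir)) for s in sources]
    ingested = sorted(os.listdir(dataset_dir))

print(copied)
```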

Parameters:
  • dataset_dir – the directory into which input videos will be ingested

  • samples – an iterable of samples that can be parsed by sample_parser

  • sample_parser – a fiftyone.utils.data.parsers.UnlabeledVideoSampleParser to use to parse the samples

  • max_samples (None) – a maximum number of samples to import. By default, all samples are imported

Attributes:

has_dataset_info

Whether this importer produces a dataset info dictionary.

has_video_metadata

Whether this importer produces fiftyone.core.metadata.VideoMetadata instances for each video.

Methods:

setup()

Performs any necessary setup before importing the first sample in the dataset.

close(*args)

Performs any necessary actions after the last sample has been imported.

get_dataset_info()

Returns the dataset info for the dataset.

property has_dataset_info

Whether this importer produces a dataset info dictionary.

property has_video_metadata

Whether this importer produces fiftyone.core.metadata.VideoMetadata instances for each video.

setup()

Performs any necessary setup before importing the first sample in the dataset.

This method is called when the importer’s context manager interface is entered, DatasetImporter.__enter__().

close(*args)

Performs any necessary actions after the last sample has been imported.

This method is called when the importer’s context manager interface is exited, DatasetImporter.__exit__().

Parameters:

*args – the arguments to DatasetImporter.__exit__()

get_dataset_info()

Returns the dataset info for the dataset.

By convention, this method should be called after all samples in the dataset have been imported.

Returns:

a dict of dataset info

class fiftyone.utils.data.ingestors.LabeledVideoDatasetIngestor(dataset_dir, samples, sample_parser, max_samples=None)

Bases: LabeledVideoDatasetImporter, VideoIngestor

Dataset importer that ingests labeled videos into the provided dataset_dir during import.

The source videos and labels are parsed from the provided samples using the provided fiftyone.utils.data.parsers.LabeledVideoSampleParser.

The source videos are directly copied from their source locations into dataset_dir, maintaining the original filenames, unless a name conflict would occur, in which case an index of the form "-%d" % count is appended to the base filename.

Parameters:
  • dataset_dir – the directory into which input videos will be ingested

  • samples – an iterable of samples that can be parsed by sample_parser

  • sample_parser – a fiftyone.utils.data.parsers.LabeledVideoSampleParser to use to parse the samples

  • max_samples (None) – a maximum number of samples to import. By default, all samples are imported

Attributes:

has_dataset_info

Whether this importer produces a dataset info dictionary.

has_video_metadata

Whether this importer produces fiftyone.core.metadata.VideoMetadata instances for each video.

label_cls

The fiftyone.core.labels.Label class(es) returned by this importer within the sample-level labels that it produces.

frame_labels_cls

The fiftyone.core.labels.Label class(es) returned by this importer within the frame labels that it produces.

Methods:

setup()

Performs any necessary setup before importing the first sample in the dataset.

close(*args)

Performs any necessary actions after the last sample has been imported.

get_dataset_info()

Returns the dataset info for the dataset.

property has_dataset_info

Whether this importer produces a dataset info dictionary.

property has_video_metadata

Whether this importer produces fiftyone.core.metadata.VideoMetadata instances for each video.

property label_cls

The fiftyone.core.labels.Label class(es) returned by this importer within the sample-level labels that it produces.

This can be any of the following:

  • a fiftyone.core.labels.Label class. In this case, the importer is guaranteed to return sample-level labels of this type

  • a list or tuple of fiftyone.core.labels.Label classes. In this case, the importer can produce a single sample-level label field of any of these types

  • a dict mapping keys to fiftyone.core.labels.Label classes. In this case, the importer will return sample-level label dictionaries with keys and value-types specified by this dictionary. Not all keys need be present in the imported labels

  • None. In this case, the importer makes no guarantees about the sample-level labels that it may return

property frame_labels_cls

The fiftyone.core.labels.Label class(es) returned by this importer within the frame labels that it produces.

This can be any of the following:

  • a fiftyone.core.labels.Label class. In this case, the importer is guaranteed to return frame labels of this type

  • a list or tuple of fiftyone.core.labels.Label classes. In this case, the importer can produce a single frame label field of any of these types

  • a dict mapping keys to fiftyone.core.labels.Label classes. In this case, the importer will return frame label dictionaries with keys and value-types specified by this dictionary. Not all keys need be present in each frame

  • None. In this case, the importer makes no guarantees about the frame labels that it may return
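For the dict form, the point that not every key must appear in each frame can be shown with a small check. The sketch assumes frame labels arrive as a dict mapping frame numbers to dicts of label fields, which this page does not state. The classes are placeholders and `invalid_frames` is a hypothetical helper.

```python
class Label:
    """Placeholder for fiftyone.core.labels.Label."""


class Detections(Label):
    pass


class Keypoints(Label):
    pass


def invalid_frames(frame_labels, frame_labels_cls):
    """Returns the frame numbers whose labels violate a dict specification."""
    bad = []
    for frame_number, labels in frame_labels.items():
        ok = all(
            field in frame_labels_cls
            and isinstance(label, frame_labels_cls[field])
            for field, label in labels.items()
        )
        if not ok:
            bad.append(frame_number)
    return bad


spec = {"objects": Detections, "pose": Keypoints}
frame_labels = {
    1: {"objects": Detections(), "pose": Keypoints()},
    2: {"objects": Detections()},  # "pose" is absent, which is allowed
    3: {"pose": Detections()},  # wrong label type for "pose"
}
bad_frames = invalid_frames(frame_labels, spec)
print(bad_frames)
```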

setup()

Performs any necessary setup before importing the first sample in the dataset.

This method is called when the importer’s context manager interface is entered, DatasetImporter.__enter__().

close(*args)

Performs any necessary actions after the last sample has been imported.

This method is called when the importer’s context manager interface is exited, DatasetImporter.__exit__().

Parameters:

*args – the arguments to DatasetImporter.__exit__()

get_dataset_info()

Returns the dataset info for the dataset.

By convention, this method should be called after all samples in the dataset have been imported.

Returns:

a dict of dataset info