fiftyone.zoo.datasets

Module contents

The FiftyOne Dataset Zoo.

This package defines a collection of open source datasets made available for download via FiftyOne.

Copyright 2017-2025, Voxel51, Inc.

Functions:

list_zoo_datasets([tags, source, license])

Lists the available datasets in the FiftyOne Dataset Zoo.

list_zoo_dataset_sources()

Returns the list of available zoo dataset sources.

list_downloaded_zoo_datasets()

Returns information about the zoo datasets that have been downloaded.

download_zoo_dataset(name_or_url[, split, ...])

Downloads the specified dataset from the FiftyOne Dataset Zoo.

load_zoo_dataset(name_or_url[, split, ...])

Loads the specified dataset from the FiftyOne Dataset Zoo.

find_zoo_dataset(name_or_url[, split])

Returns the directory containing the given zoo dataset.

load_zoo_dataset_info(name_or_url)

Loads the ZooDatasetInfo for the specified zoo dataset.

get_zoo_dataset(name_or_url[, overwrite])

Returns the ZooDataset instance for the given dataset.

delete_zoo_dataset(name_or_url[, split])

Deletes the zoo dataset from local disk, if necessary.

Classes:

ZooDatasetInfo(zoo_dataset, dataset_type, ...)

Class containing info about a dataset in the FiftyOne Dataset Zoo.

ZooDatasetSplitInfo(split, num_samples)

Class containing info about a split of a dataset in the FiftyOne Dataset Zoo.

ZooDataset()

Base class for datasets made available in the FiftyOne Dataset Zoo.

RemoteZooDataset(dataset_dir[, url])

Class for working with remotely-sourced datasets that are compatible with the FiftyOne Dataset Zoo.

DeprecatedZooDataset()

Class representing a zoo dataset that no longer exists in the FiftyOne Dataset Zoo.

fiftyone.zoo.datasets.list_zoo_datasets(tags=None, source=None, license=None)

Lists the available datasets in the FiftyOne Dataset Zoo.

Also includes any remotely-sourced zoo datasets that you’ve downloaded.

Example usage:

import fiftyone as fo
import fiftyone.zoo as foz

#
# List all zoo datasets
#

names = foz.list_zoo_datasets()
print(names)

#
# List all zoo datasets with (both of) the specified tags
#

names = foz.list_zoo_datasets(tags=["image", "detection"])
print(names)

#
# List all zoo datasets available via the given source
#

names = foz.list_zoo_datasets(source="torch")
print(names)

Parameters:
  • tags (None) – only include datasets that have the specified tag or list of tags

  • source (None) – only include datasets available via the given source or list of sources

  • license (None) – only include datasets that are distributed under the specified license or any of the specified list of licenses. Run the CLI command fiftyone zoo datasets list to see the available licenses

Returns:

a sorted list of dataset names
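The filtering rules above can be modeled in a few lines of plain Python. This is a hypothetical sketch, not FiftyOne's implementation: REGISTRY, its entries, and filter_datasets() are illustrative names. It assumes that a dataset must carry every requested tag (as the "(both of) the specified tags" example suggests), that matching any requested source or license is enough, and that a license value may be a comma-separated string of licenses.

```python
# Hypothetical sketch of the filtering rules of list_zoo_datasets().
# REGISTRY, its entries, and filter_datasets() are illustrative only.

REGISTRY = {
    "dataset-a": {"tags": {"image", "detection"}, "sources": {"torch"}, "license": "license-1"},
    "dataset-b": {"tags": {"image", "classification"}, "sources": {"torch", "tensorflow"}, "license": "license-1,license-2"},
    "dataset-c": {"tags": {"video", "detection"}, "sources": {"base"}, "license": None},
}


def _as_set(value):
    """Normalizes None, a single string, or a list of strings."""
    if value is None:
        return None
    if isinstance(value, str):
        return {value}
    return set(value)


def _licenses(entry):
    """Splits a comma-separated license string into a set of licenses."""
    if not entry["license"]:
        return set()
    return {lic.strip() for lic in entry["license"].split(",")}


def filter_datasets(registry, tags=None, source=None, license=None):
    tags, sources, licenses = _as_set(tags), _as_set(source), _as_set(license)
    names = []
    for name, entry in registry.items():
        # ALL requested tags must be present
        if tags is not None and not tags <= entry["tags"]:
            continue
        # ANY requested source is enough
        if sources is not None and not sources & entry["sources"]:
            continue
        # ANY requested license is enough
        if licenses is not None and not licenses & _licenses(entry):
            continue
        names.append(name)
    return sorted(names)
```

For example, filter_datasets(REGISTRY, tags=["image", "detection"]) keeps only "dataset-a", mirroring the two-tag call in the example above.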

fiftyone.zoo.datasets.list_zoo_dataset_sources()

Returns the list of available zoo dataset sources.

Returns:

a list of sources

fiftyone.zoo.datasets.list_downloaded_zoo_datasets()

Returns information about the zoo datasets that have been downloaded.

Returns:

a dict mapping dataset names to (dataset_dir, ZooDatasetInfo) tuples

fiftyone.zoo.datasets.download_zoo_dataset(name_or_url, split=None, splits=None, overwrite=False, cleanup=True, **kwargs)

Downloads the specified dataset from the FiftyOne Dataset Zoo.

Any dataset splits that have already been downloaded are not re-downloaded unless overwrite=True is specified.

Note

To download from a private GitHub repository that you have access to, provide your GitHub personal access token by setting the GITHUB_TOKEN environment variable.

Parameters:
  • name_or_url

    the name of the zoo dataset to download, or the remote source to download it from, which can be:

    • a GitHub repo URL like https://github.com/<user>/<repo>

    • a GitHub ref like https://github.com/<user>/<repo>/tree/<branch> or https://github.com/<user>/<repo>/commit/<commit>

    • a GitHub ref string like <user>/<repo>[/<ref>]

    • a publicly accessible URL of an archive (eg zip or tar) file

  • split (None) – a split to download, if applicable. Typical values are ("train", "validation", "test"). If neither split nor splits are provided, all available splits are downloaded. Consult the documentation for the ZooDataset you specified to see the supported splits

  • splits (None) – a list of splits to download, if applicable. Typical values are ("train", "validation", "test"). If neither split nor splits are provided, all available splits are downloaded. Consult the documentation for the ZooDataset you specified to see the supported splits

  • overwrite (False) – whether to overwrite any existing files

  • cleanup (True) – whether to cleanup any temporary files generated during download

  • **kwargs – optional arguments for the ZooDataset constructor or the remote dataset’s download_and_prepare() method

Returns:

a tuple of

  • info: the ZooDatasetInfo for the dataset

  • dataset_dir: the directory containing the dataset
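The split-selection rules above can be sketched as two small helpers. Both are hypothetical and only model the documented behavior: when neither split nor splits is given, every supported split is selected, and splits that are already downloaded are skipped unless overwrite=True. How FiftyOne combines split and splits when both are passed is not documented here; this sketch assumes they are merged into one de-duplicated list.

```python
def resolve_splits(split=None, splits=None, supported_splits=None):
    """Returns the list of splits to act on, or None for the full dataset.

    Hypothetical helper. ASSUMPTION: if both split and splits are given,
    they are merged into one de-duplicated list.
    """
    if split is None and splits is None:
        # Neither given: all supported splits (or the whole dataset)
        return list(supported_splits) if supported_splits is not None else None

    requested = ([split] if split is not None else []) + list(splits or [])
    requested = list(dict.fromkeys(requested))  # de-duplicate, keep order

    if supported_splits is not None:
        unsupported = [s for s in requested if s not in supported_splits]
        if unsupported:
            raise ValueError(
                f"Unsupported split(s) {unsupported}; supported: {list(supported_splits)}"
            )
    return requested


def splits_to_download(requested, downloaded, overwrite=False):
    """Skips splits that are already on disk unless overwrite=True."""
    if overwrite:
        return list(requested)
    return [s for s in requested if s not in downloaded]
```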

fiftyone.zoo.datasets.load_zoo_dataset(name_or_url, split=None, splits=None, label_field=None, dataset_name=None, download_if_necessary=True, drop_existing_dataset=False, persistent=False, overwrite=False, cleanup=True, progress=None, **kwargs)

Loads the specified dataset from the FiftyOne Dataset Zoo.

By default, the dataset will be downloaded if necessary.

Note

To download from a private GitHub repository that you have access to, provide your GitHub personal access token by setting the GITHUB_TOKEN environment variable.

If you do not specify a custom dataset_name and you have previously loaded the same zoo dataset and split(s) into FiftyOne, the existing dataset will be returned.

Parameters:
  • name_or_url

    the name of the zoo dataset to load, or the remote source to load it from, which can be:

    • a GitHub repo URL like https://github.com/<user>/<repo>

    • a GitHub ref like https://github.com/<user>/<repo>/tree/<branch> or https://github.com/<user>/<repo>/commit/<commit>

    • a GitHub ref string like <user>/<repo>[/<ref>]

    • a publicly accessible URL of an archive (eg zip or tar) file

  • split (None) – a split to load, if applicable. Typical values are ("train", "validation", "test"). If neither split nor splits are provided, all available splits are loaded. Consult the documentation for the ZooDataset you specified to see the supported splits

  • splits (None) – a list of splits to load, if applicable. Typical values are ("train", "validation", "test"). If neither split nor splits are provided, all available splits are loaded. Consult the documentation for the ZooDataset you specified to see the supported splits

  • label_field (None) – the label field (or prefix, if the dataset contains multiple label fields) in which to store the dataset’s labels. By default, this is "ground_truth" if the dataset contains a single label field. If the dataset contains multiple label fields and this value is not provided, the labels will be stored under dataset-specific field names

  • dataset_name (None) – an optional name to give the returned fiftyone.core.dataset.Dataset. By default, a name will be constructed based on the dataset and split(s) you are loading

  • download_if_necessary (True) – whether to download the dataset if it is not found in the specified dataset directory

  • drop_existing_dataset (False) – whether to drop an existing dataset with the same name if it exists

  • persistent (False) – whether the dataset should persist in the database after the session terminates

  • overwrite (False) – whether to overwrite any existing files if the dataset is to be downloaded

  • cleanup (True) – whether to cleanup any temporary files generated during download

  • progress (None) – whether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead

  • **kwargs – optional arguments to pass to the fiftyone.utils.data.importers.DatasetImporter constructor or the remote dataset’s load_dataset() method. If download_if_necessary == True, then kwargs can also contain arguments for download_zoo_dataset()

Returns:

a fiftyone.core.dataset.Dataset

fiftyone.zoo.datasets.find_zoo_dataset(name_or_url, split=None)

Returns the directory containing the given zoo dataset.

If a split is provided, the path to the dataset split is returned; otherwise, the path to the root directory is returned.

The dataset must be downloaded. Use download_zoo_dataset() to download datasets.

Parameters:
  • name_or_url

    the name of the zoo dataset or its remote source, which can be:

    • a GitHub repo URL like https://github.com/<user>/<repo>

    • a GitHub ref like https://github.com/<user>/<repo>/tree/<branch> or https://github.com/<user>/<repo>/commit/<commit>

    • a GitHub ref string like <user>/<repo>[/<ref>]

    • a publicly accessible URL of an archive (eg zip or tar) file

  • split (None) – a specific split to locate

Returns:

the directory containing the dataset or split

Raises:

ValueError – if the dataset or split does not exist or has not been downloaded
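A sketch of the lookup described above, under an assumed on-disk layout in which each split lives in a subdirectory of the dataset directory named after the split. find_local_dataset() is a hypothetical helper, not part of FiftyOne; with a real installation, call find_zoo_dataset() itself.

```python
import os


def find_local_dataset(dataset_dir, split=None):
    """Hypothetical model of find_zoo_dataset().

    ASSUMPTION: split directories are named after the split and live
    directly under dataset_dir.
    """
    path = os.path.join(dataset_dir, split) if split is not None else dataset_dir
    if not os.path.isdir(path):
        what = f"Split '{split}'" if split is not None else "Dataset"
        raise ValueError(f"{what} not found at '{path}'; has it been downloaded?")
    return path
```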

fiftyone.zoo.datasets.load_zoo_dataset_info(name_or_url)

Loads the ZooDatasetInfo for the specified zoo dataset.

The dataset must be downloaded. Use download_zoo_dataset() to download datasets.

Parameters:
  • name_or_url

    the name of the zoo dataset or its remote source, which can be:

    • a GitHub repo URL like https://github.com/<user>/<repo>

    • a GitHub ref like https://github.com/<user>/<repo>/tree/<branch> or https://github.com/<user>/<repo>/commit/<commit>

    • a GitHub ref string like <user>/<repo>[/<ref>]

    • a publicly accessible URL of an archive (eg zip or tar) file

Returns:

the ZooDatasetInfo for the dataset

Raises:

ValueError – if the dataset has not been downloaded

fiftyone.zoo.datasets.get_zoo_dataset(name_or_url, overwrite=False, **kwargs)

Returns the ZooDataset instance for the given dataset.

If the dataset is available from multiple sources, the default source is used.

Parameters:
  • name_or_url

    the name of the zoo dataset, or its remote source, which can be:

    • a GitHub repo URL like https://github.com/<user>/<repo>

    • a GitHub ref like https://github.com/<user>/<repo>/tree/<branch> or https://github.com/<user>/<repo>/commit/<commit>

    • a GitHub ref string like <user>/<repo>[/<ref>]

    • a publicly accessible URL of an archive (eg zip or tar) file

  • overwrite (False) – whether to overwrite existing metadata if it has already been downloaded. Only applicable when name_or_url is a remote source

  • **kwargs – optional arguments for ZooDataset

Returns:

the ZooDataset instance

fiftyone.zoo.datasets.delete_zoo_dataset(name_or_url, split=None)

Deletes the zoo dataset from local disk, if necessary.

If a split is provided, only that split is deleted.

Parameters:
  • name_or_url

    the name of the zoo dataset, or its remote source, which can be:

    • a GitHub repo URL like https://github.com/<user>/<repo>

    • a GitHub ref like https://github.com/<user>/<repo>/tree/<branch> or https://github.com/<user>/<repo>/commit/<commit>

    • a GitHub ref string like <user>/<repo>[/<ref>]

    • a publicly accessible URL of an archive (eg zip or tar) file

  • split (None) – a specific split to delete. If omitted, the entire dataset is deleted

class fiftyone.zoo.datasets.ZooDatasetInfo(zoo_dataset, dataset_type, num_samples, downloaded_splits=None, parameters=None, classes=None)

Bases: Serializable

Class containing info about a dataset in the FiftyOne Dataset Zoo.

Parameters:
  • zoo_dataset – the ZooDataset instance for the dataset

  • dataset_type – the fiftyone.types.Dataset type of the dataset

  • num_samples – the total number of samples in all downloaded splits of the dataset

  • downloaded_splits (None) – a dict of ZooDatasetSplitInfo instances describing the downloaded splits of the dataset, if applicable

  • parameters (None) – a dict of parameters for the dataset

  • classes (None) – a list of class label strings

Attributes:

name

The name of the dataset.

zoo_dataset

The fully-qualified class string for the ZooDataset of the dataset.

dataset_type

The fully-qualified class string of the fiftyone.types.Dataset type, if any.

supported_splits

A tuple of supported splits for the dataset, or None if the dataset does not have splits.

url

The dataset's URL, or None if it is not remotely-sourced.

Methods:

get_zoo_dataset()

Returns the ZooDataset instance for the dataset.

get_dataset_type()

Returns the fiftyone.types.Dataset type instance for the dataset.

is_split_downloaded(split)

Whether the given dataset split is downloaded.

add_split(split_info)

Adds the split to the dataset.

remove_split(split)

Removes the split from the dataset.

attributes()

Returns a list of class attributes to be serialized.

from_dict(d)

Loads a ZooDatasetInfo from a JSON dictionary.

from_json(json_path[, zoo_dataset, upgrade, ...])

Loads a ZooDatasetInfo from a JSON file on disk.

copy()

Returns a deep copy of the object.

custom_attributes([dynamic, private])

Returns a customizable list of class attributes.

from_str(s, *args, **kwargs)

Constructs a Serializable object from a JSON string.

get_class_name()

Returns the fully-qualified class name string of this object.

serialize([reflective])

Serializes the object into a dictionary.

to_str([pretty_print])

Returns a string representation of this object.

write_json(path[, pretty_print])

Serializes the object and writes it to disk.

property name

The name of the dataset.

property zoo_dataset

The fully-qualified class string for the ZooDataset of the dataset.

property dataset_type

The fully-qualified class string of the fiftyone.types.Dataset type, if any.

property supported_splits

A tuple of supported splits for the dataset, or None if the dataset does not have splits.

property url

The dataset’s URL, or None if it is not remotely-sourced.

get_zoo_dataset()

Returns the ZooDataset instance for the dataset.

Returns:

a ZooDataset instance

get_dataset_type()

Returns the fiftyone.types.Dataset type instance for the dataset.

Returns:

a fiftyone.types.Dataset instance

is_split_downloaded(split)

Whether the given dataset split is downloaded.

Parameters:

split – the dataset split

Returns:

True/False

add_split(split_info)

Adds the split to the dataset.

Parameters:

split_info – a ZooDatasetSplitInfo

remove_split(split)

Removes the split from the dataset.

Parameters:

split – the name of the split
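The split bookkeeping methods above (is_split_downloaded(), add_split(), remove_split()) can be modeled with a dict of split infos. The classes below are hypothetical stand-ins, not FiftyOne's ZooDatasetInfo or ZooDatasetSplitInfo. They assume downloaded_splits is keyed by split name, and they derive num_samples from the downloaded splits, whereas the documented class receives num_samples in its constructor.

```python
class SplitInfo:
    """Hypothetical stand-in for ZooDatasetSplitInfo."""

    def __init__(self, split, num_samples):
        self.split = split
        self.num_samples = num_samples


class DatasetInfoSketch:
    """Hypothetical model of ZooDatasetInfo's split bookkeeping.

    ASSUMPTIONS: downloaded_splits is keyed by split name, and num_samples
    is derived here (the real class takes num_samples in its constructor).
    """

    def __init__(self, name, downloaded_splits=None):
        self.name = name
        self.downloaded_splits = dict(downloaded_splits or {})

    @property
    def num_samples(self):
        # Total number of samples across all downloaded splits
        return sum(info.num_samples for info in self.downloaded_splits.values())

    def is_split_downloaded(self, split):
        return split in self.downloaded_splits

    def add_split(self, split_info):
        self.downloaded_splits[split_info.split] = split_info

    def remove_split(self, split):
        # Ignores unknown splits; the real method's behavior may differ
        self.downloaded_splits.pop(split, None)
```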

attributes()

Returns a list of class attributes to be serialized.

Returns:

a list of class attributes

classmethod from_dict(d)

Loads a ZooDatasetInfo from a JSON dictionary.

Parameters:

d – a JSON dictionary

Returns:

a ZooDatasetInfo

classmethod from_json(json_path, zoo_dataset=None, upgrade=False, warn_deprecated=False)

Loads a ZooDatasetInfo from a JSON file on disk.

Parameters:
  • json_path – path to JSON file

  • zoo_dataset (None) – an existing ZooDataset instance

  • upgrade (False) – whether to upgrade the JSON file on disk if any migrations were necessary

  • warn_deprecated (False) – whether to issue a warning if the dataset has a deprecated format

Returns:

a ZooDatasetInfo

copy()

Returns a deep copy of the object.

Returns:

a Serializable instance

custom_attributes(dynamic=False, private=False)

Returns a customizable list of class attributes.

By default, all attributes in vars(self) are returned, minus private attributes (those starting with “_”).

Parameters:
  • dynamic – whether to include dynamic properties, e.g., those defined by getter/setter methods or the @property decorator. By default, this is False

  • private – whether to include private properties, i.e., those starting with “_”. By default, this is False

Returns:

a list of class attributes
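The selection rule described for custom_attributes() (instance attributes from vars(self), optionally plus properties, minus names starting with "_" unless private=True) can be sketched as a mixin. AttributesSketch is a hypothetical class written for illustration; it models the documented behavior and is not the Serializable implementation.

```python
class AttributesSketch:
    """Hypothetical model of custom_attributes() selection rules."""

    def custom_attributes(self, dynamic=False, private=False):
        # Instance attributes, in insertion order
        attrs = list(vars(self))
        if dynamic:
            # Add properties defined on the class (e.g., via @property)
            cls = type(self)
            attrs += [n for n in dir(cls) if isinstance(getattr(cls, n), property)]
        if not private:
            # Drop private names, i.e., those starting with "_"
            attrs = [a for a in attrs if not a.startswith("_")]
        return attrs


class Example(AttributesSketch):
    def __init__(self):
        self.split = "train"
        self.num_samples = 5
        self._cache = {}

    @property
    def label(self):
        return f"{self.split}:{self.num_samples}"
```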

classmethod from_str(s, *args, **kwargs)

Constructs a Serializable object from a JSON string.

Subclasses may override this method, but, by default, this method simply parses the string and calls from_dict(), which subclasses must implement.

Parameters:
  • s – a JSON string representation of a Serializable object

  • *args – optional positional arguments for self.from_dict()

  • **kwargs – optional keyword arguments for self.from_dict()

Returns:

an instance of the Serializable class

classmethod get_class_name()

Returns the fully-qualified class name string of this object.

serialize(reflective=False)

Serializes the object into a dictionary.

Serialization is applied recursively to all attributes in the object, including element-wise serialization of lists and dictionary values.

Parameters:

reflective – whether to include reflective attributes when serializing the object. By default, this is False

Returns:

a JSON dictionary representation of the object

to_str(pretty_print=True, **kwargs)

Returns a string representation of this object.

Parameters:
  • pretty_print – whether to render the JSON in human readable format with newlines and indentations. By default, this is True

  • **kwargs – optional keyword arguments for self.serialize()

Returns:

a string representation of the object

write_json(path, pretty_print=False, **kwargs)

Serializes the object and writes it to disk.

Parameters:
  • path – the output path

  • pretty_print – whether to render the JSON in human readable format with newlines and indentations. By default, this is False

  • **kwargs – optional keyword arguments for self.serialize()

class fiftyone.zoo.datasets.ZooDatasetSplitInfo(split, num_samples)

Bases: Serializable

Class containing info about a split of a dataset in the FiftyOne Dataset Zoo.

Parameters:
  • split – the name of the split

  • num_samples – the number of samples in the split

Methods:

attributes()

Returns a list of class attributes to be serialized.

from_dict(d)

Loads a ZooDatasetSplitInfo from a JSON dictionary.

copy()

Returns a deep copy of the object.

custom_attributes([dynamic, private])

Returns a customizable list of class attributes.

from_json(path, *args, **kwargs)

Constructs a Serializable object from a JSON file.

from_str(s, *args, **kwargs)

Constructs a Serializable object from a JSON string.

get_class_name()

Returns the fully-qualified class name string of this object.

serialize([reflective])

Serializes the object into a dictionary.

to_str([pretty_print])

Returns a string representation of this object.

write_json(path[, pretty_print])

Serializes the object and writes it to disk.

attributes()

Returns a list of class attributes to be serialized.

Returns:

a list of class attributes

classmethod from_dict(d)

Loads a ZooDatasetSplitInfo from a JSON dictionary.

Parameters:

d – a JSON dictionary

Returns:

a ZooDatasetSplitInfo

copy()

Returns a deep copy of the object.

Returns:

a Serializable instance

custom_attributes(dynamic=False, private=False)

Returns a customizable list of class attributes.

By default, all attributes in vars(self) are returned, minus private attributes (those starting with “_”).

Parameters:
  • dynamic – whether to include dynamic properties, e.g., those defined by getter/setter methods or the @property decorator. By default, this is False

  • private – whether to include private properties, i.e., those starting with “_”. By default, this is False

Returns:

a list of class attributes

classmethod from_json(path, *args, **kwargs)

Constructs a Serializable object from a JSON file.

Subclasses may override this method, but, by default, this method simply reads the JSON and calls from_dict(), which subclasses must implement.

Parameters:
  • path – the path to the JSON file on disk

  • *args – optional positional arguments for self.from_dict()

  • **kwargs – optional keyword arguments for self.from_dict()

Returns:

an instance of the Serializable class

classmethod from_str(s, *args, **kwargs)

Constructs a Serializable object from a JSON string.

Subclasses may override this method, but, by default, this method simply parses the string and calls from_dict(), which subclasses must implement.

Parameters:
  • s – a JSON string representation of a Serializable object

  • *args – optional positional arguments for self.from_dict()

  • **kwargs – optional keyword arguments for self.from_dict()

Returns:

an instance of the Serializable class

classmethod get_class_name()

Returns the fully-qualified class name string of this object.

serialize(reflective=False)

Serializes the object into a dictionary.

Serialization is applied recursively to all attributes in the object, including element-wise serialization of lists and dictionary values.

Parameters:

reflective – whether to include reflective attributes when serializing the object. By default, this is False

Returns:

a JSON dictionary representation of the object

to_str(pretty_print=True, **kwargs)

Returns a string representation of this object.

Parameters:
  • pretty_print – whether to render the JSON in human readable format with newlines and indentations. By default, this is True

  • **kwargs – optional keyword arguments for self.serialize()

Returns:

a string representation of the object

write_json(path, pretty_print=False, **kwargs)

Serializes the object and writes it to disk.

Parameters:
  • path – the output path

  • pretty_print – whether to render the JSON in human readable format with newlines and indentations. By default, this is False

  • **kwargs – optional keyword arguments for self.serialize()
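The serialization methods above form a round trip: serialize() builds a dict, to_str() and write_json() render it as JSON, and from_str() and from_json() parse the JSON and hand the dict to from_dict(). SplitInfoSketch is a hypothetical, standard-library-only class with the same two fields as ZooDatasetSplitInfo; it illustrates the pattern and does not reproduce the real Serializable output format.

```python
import json


class SplitInfoSketch:
    """Hypothetical stand-in illustrating the Serializable round trip."""

    def __init__(self, split, num_samples):
        self.split = split
        self.num_samples = num_samples

    def attributes(self):
        # Attributes to serialize, in order
        return ["split", "num_samples"]

    def serialize(self):
        return {a: getattr(self, a) for a in self.attributes()}

    def to_str(self, pretty_print=True):
        return json.dumps(self.serialize(), indent=4 if pretty_print else None)

    def write_json(self, path, pretty_print=False):
        with open(path, "w") as f:
            f.write(self.to_str(pretty_print=pretty_print))

    @classmethod
    def from_dict(cls, d):
        return cls(d["split"], d["num_samples"])

    @classmethod
    def from_str(cls, s):
        # Parse the string, then delegate to from_dict()
        return cls.from_dict(json.loads(s))

    @classmethod
    def from_json(cls, path):
        # Read the file, then delegate to from_dict()
        with open(path) as f:
            return cls.from_dict(json.load(f))
```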

class fiftyone.zoo.datasets.ZooDataset

Bases: object

Base class for datasets made available in the FiftyOne Dataset Zoo.

Attributes:

name

The name of the dataset.

is_remote

Whether the dataset is remotely-sourced.

license

The license (or comma-separated list of licenses) under which the dataset is distributed, or None if unknown.

tags

A tuple of tags for the dataset.

has_tags

Whether the dataset has tags.

parameters

An optional dict of parameters describing the configuration of the zoo dataset when it was downloaded.

supported_splits

A tuple of supported splits for the dataset, or None if the dataset does not have splits.

has_splits

Whether the dataset has splits.

has_patches

Whether the dataset has patches that may need to be applied to already downloaded files.

supports_partial_downloads

Whether the dataset supports downloading partial subsets of its splits.

requires_manual_download

Whether this dataset requires some files to be manually downloaded by the user before the dataset can be loaded.

importer_kwargs

A dict of default kwargs to pass to this dataset's fiftyone.utils.data.importers.DatasetImporter.

Methods:

has_tag(tag)

Whether the dataset has the given tag.

has_split(split)

Whether the dataset has the given split.

get_split_dir(dataset_dir, split)

Returns the directory for the given split of the dataset.

has_info(dataset_dir)

Determines whether the directory contains ZooDatasetInfo.

load_info(dataset_dir[, upgrade, ...])

Loads the ZooDatasetInfo from the given dataset directory.

get_info_path(dataset_dir)

Returns the path to the ZooDatasetInfo for the dataset.

download_and_prepare(dataset_dir[, split, ...])

Downloads the dataset and prepares it for use.

property name

The name of the dataset.

property is_remote

Whether the dataset is remotely-sourced.

property license

The license (or comma-separated list of licenses) under which the dataset is distributed, or None if unknown.

property tags

A tuple of tags for the dataset.

property has_tags

Whether the dataset has tags.

property parameters

An optional dict of parameters describing the configuration of the zoo dataset when it was downloaded.

property supported_splits

A tuple of supported splits for the dataset, or None if the dataset does not have splits.

property has_splits

Whether the dataset has splits.

property has_patches

Whether the dataset has patches that may need to be applied to already downloaded files.

property supports_partial_downloads

Whether the dataset supports downloading partial subsets of its splits.

property requires_manual_download

Whether this dataset requires some files to be manually downloaded by the user before the dataset can be loaded.

property importer_kwargs

A dict of default kwargs to pass to this dataset’s fiftyone.utils.data.importers.DatasetImporter.

has_tag(tag)

Whether the dataset has the given tag.

Parameters:

tag – the tag

Returns:

True/False

has_split(split)

Whether the dataset has the given split.

Parameters:

split – the dataset split

Returns:

True/False
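The boolean helpers above follow from the tags and supported_splits properties. ZooDatasetSketch is a hypothetical class, not FiftyOne's ZooDataset; it assumes has_tags means "tags is non-empty" and has_splits means "supported_splits is not None", as the property descriptions suggest.

```python
class ZooDatasetSketch:
    """Hypothetical model of ZooDataset's tag/split helpers."""

    tags = ("image", "detection")
    supported_splits = ("train", "validation", "test")

    @property
    def has_tags(self):
        # Assumed: True when at least one tag is defined
        return bool(self.tags)

    def has_tag(self, tag):
        return self.has_tags and tag in self.tags

    @property
    def has_splits(self):
        # Assumed: datasets without splits report supported_splits = None
        return self.supported_splits is not None

    def has_split(self, split):
        return self.has_splits and split in self.supported_splits
```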

get_split_dir(dataset_dir, split)

Returns the directory for the given split of the dataset.

Parameters:
  • dataset_dir – the dataset directory

  • split – the dataset split

Returns:

the directory that will/does hold the specified split

static has_info(dataset_dir)

Determines whether the directory contains ZooDatasetInfo.

Parameters:

dataset_dir – the dataset directory

Returns:

True/False

static load_info(dataset_dir, upgrade=True, warn_deprecated=False)

Loads the ZooDatasetInfo from the given dataset directory.

Parameters:
  • dataset_dir – the dataset directory

  • upgrade (True) – whether to upgrade the JSON file on disk if any migrations were necessary

  • warn_deprecated (False) – whether to issue a warning if the dataset has a deprecated format

Returns:

the ZooDatasetInfo for the dataset

static get_info_path(dataset_dir)

Returns the path to the ZooDatasetInfo for the dataset.

Parameters:

dataset_dir – the dataset directory

Returns:

the path to the ZooDatasetInfo

download_and_prepare(dataset_dir, split=None, splits=None, cleanup=True)

Downloads the dataset and prepares it for use.

If the requested splits have already been downloaded, they are not re-downloaded.

Parameters:
  • dataset_dir – the directory in which to construct the dataset

  • split (None) – a split to download, if applicable. If neither split nor splits are provided, the full dataset is downloaded

  • splits (None) – a list of splits to download, if applicable. If neither split nor splits are provided, the full dataset is downloaded

  • cleanup (True) – whether to cleanup any temporary files generated during download

Returns:

the ZooDatasetInfo for the dataset

class fiftyone.zoo.datasets.RemoteZooDataset(dataset_dir, url=None, **kwargs)

Bases: ZooDataset

Class for working with remotely-sourced datasets that are compatible with the FiftyOne Dataset Zoo.

Parameters:
  • dataset_dir – the dataset’s local directory, which must contain a valid dataset YAML file

  • url (None) –

    the dataset’s remote source, which can be:

    • a GitHub repo URL like https://github.com/<user>/<repo>

    • a GitHub ref like https://github.com/<user>/<repo>/tree/<branch> or https://github.com/<user>/<repo>/commit/<commit>

    • a GitHub ref string like <user>/<repo>[/<ref>]

    • a publicly accessible URL of an archive (eg zip or tar) file

    This is explicitly provided rather than relying on the YAML file’s url property in case the caller has specified a particular branch or commit

  • **kwargs – optional keyword arguments for the dataset’s download_and_prepare() and/or load_dataset() methods
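The remote-source formats listed above (also accepted by the name_or_url arguments of the module-level functions) can be told apart with a few string checks. classify_source() is a hypothetical helper, not a FiftyOne API. It assumes that plain zoo dataset names contain no "/" and that archive URLs end in a recognizable extension; FiftyOne's own parsing may differ.

```python
from urllib.parse import urlparse

# ASSUMPTION: archive URLs are recognized by their file extension
ARCHIVE_SUFFIXES = (".zip", ".tar", ".tar.gz", ".tgz", ".tar.bz2")


def classify_source(name_or_url):
    """Returns a (kind, details) tuple describing name_or_url (hypothetical)."""
    parsed = urlparse(name_or_url)
    if parsed.scheme in ("http", "https"):
        if parsed.netloc == "github.com":
            parts = [p for p in parsed.path.split("/") if p]
            if len(parts) == 2:
                # https://github.com/<user>/<repo>
                return "github_repo", {"user": parts[0], "repo": parts[1], "ref": None}
            if len(parts) >= 4 and parts[2] in ("tree", "commit"):
                # .../tree/<branch> or .../commit/<commit>
                ref = "/".join(parts[3:])
                return "github_ref", {"user": parts[0], "repo": parts[1], "ref": ref}
            raise ValueError(f"Unsupported GitHub URL: {name_or_url}")
        if parsed.path.lower().endswith(ARCHIVE_SUFFIXES):
            return "archive_url", {"url": name_or_url}
        raise ValueError(f"Unsupported URL: {name_or_url}")

    parts = name_or_url.split("/")
    if len(parts) >= 2:
        # <user>/<repo>[/<ref>]
        ref = "/".join(parts[2:]) or None
        return "github_ref_string", {"user": parts[0], "repo": parts[1], "ref": ref}

    # ASSUMPTION: anything else is a zoo dataset name
    return "zoo_name", {"name": name_or_url}
```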

Attributes:

metadata

name

The name of the dataset.

url

is_remote

Whether the dataset is remotely-sourced.

author

version

source

license

The license (or comma-separated list of licenses) under which the dataset is distributed, or None if unknown.

description

fiftyone_version

tags

A tuple of tags for the dataset.

supported_splits

A tuple of supported splits for the dataset, or None if the dataset does not have splits.

supports_partial_downloads

Whether the dataset supports downloading partial subsets of its splits.

size_samples

has_patches

Whether the dataset has patches that may need to be applied to already downloaded files.

has_splits

Whether the dataset has splits.

has_tags

Whether the dataset has tags.

importer_kwargs

A dict of default kwargs to pass to this dataset's fiftyone.utils.data.importers.DatasetImporter.

parameters

An optional dict of parameters describing the configuration of the zoo dataset when it was downloaded.

requires_manual_download

Whether this dataset requires some files to be manually downloaded by the user before the dataset can be loaded.

Methods:

download_and_prepare(dataset_dir[, split, ...])

Downloads the dataset and prepares it for use.

get_info_path(dataset_dir)

Returns the path to the ZooDatasetInfo for the dataset.

get_split_dir(dataset_dir, split)

Returns the directory for the given split of the dataset.

has_info(dataset_dir)

Determines whether the directory contains ZooDatasetInfo.

has_split(split)

Whether the dataset has the given split.

has_tag(tag)

Whether the dataset has the given tag.

load_info(dataset_dir[, upgrade, ...])

Loads the ZooDatasetInfo from the given dataset directory.

property metadata

property name

The name of the dataset.

property url

property is_remote

Whether the dataset is remotely-sourced.

property author

property version

property source

property license

The license (or comma-separated list of licenses) under which the dataset is distributed, or None if unknown.

property description

property fiftyone_version

property tags

A tuple of tags for the dataset.

property supported_splits

A tuple of supported splits for the dataset, or None if the dataset does not have splits.

property supports_partial_downloads

Whether the dataset supports downloading partial subsets of its splits.

property size_samples

download_and_prepare(dataset_dir, split=None, splits=None, cleanup=True)

Downloads the dataset and prepares it for use.

If the requested splits have already been downloaded, they are not re-downloaded.

Parameters:
  • dataset_dir – the directory in which to construct the dataset

  • split (None) – a split to download, if applicable. If neither split nor splits are provided, the full dataset is downloaded

  • splits (None) – a list of splits to download, if applicable. If neither split nor splits are provided, the full dataset is downloaded

  • cleanup (True) – whether to cleanup any temporary files generated during download

Returns:

the ZooDatasetInfo for the dataset

static get_info_path(dataset_dir)

Returns the path to the ZooDatasetInfo for the dataset.

Parameters:

dataset_dir – the dataset directory

Returns:

the path to the ZooDatasetInfo

get_split_dir(dataset_dir, split)

Returns the directory for the given split of the dataset.

Parameters:
  • dataset_dir – the dataset directory

  • split – the dataset split

Returns:

the directory that will/does hold the specified split

static has_info(dataset_dir)

Determines whether the directory contains ZooDatasetInfo.

Parameters:

dataset_dir – the dataset directory

Returns:

True/False

property has_patches

Whether the dataset has patches that may need to be applied to already downloaded files.

has_split(split)

Whether the dataset has the given split.

Parameters:

split – the dataset split

Returns:

True/False

property has_splits

Whether the dataset has splits.

has_tag(tag)

Whether the dataset has the given tag.

Parameters:

tag – the tag

Returns:

True/False

property has_tags

Whether the dataset has tags.

property importer_kwargs

A dict of default kwargs to pass to this dataset’s fiftyone.utils.data.importers.DatasetImporter.

static load_info(dataset_dir, upgrade=True, warn_deprecated=False)

Loads the ZooDatasetInfo from the given dataset directory.

Parameters:
  • dataset_dir – the dataset directory

  • upgrade (True) – whether to upgrade the JSON file on disk if any migrations were necessary

  • warn_deprecated (False) – whether to issue a warning if the dataset has a deprecated format

Returns:

the ZooDatasetInfo for the dataset

property parameters

An optional dict of parameters describing the configuration of the zoo dataset when it was downloaded.

property requires_manual_download

Whether this dataset requires some files to be manually downloaded by the user before the dataset can be loaded.

class fiftyone.zoo.datasets.DeprecatedZooDataset

Bases: ZooDataset

Class representing a zoo dataset that no longer exists in the FiftyOne Dataset Zoo.

Attributes:

name

The name of the dataset.

supported_splits

A tuple of supported splits for the dataset, or None if the dataset does not have splits.

has_patches

Whether the dataset has patches that may need to be applied to already downloaded files.

has_splits

Whether the dataset has splits.

has_tags

Whether the dataset has tags.

importer_kwargs

A dict of default kwargs to pass to this dataset’s fiftyone.utils.data.importers.DatasetImporter.

is_remote

Whether the dataset is remotely-sourced.

license

The license (or comma-separated list of licenses) under which the dataset is distributed, or None if unknown.

parameters

An optional dict of parameters describing the configuration of the zoo dataset when it was downloaded.

requires_manual_download

Whether this dataset requires some files to be manually downloaded by the user before the dataset can be loaded.

supports_partial_downloads

Whether the dataset supports downloading partial subsets of its splits.

tags

A tuple of tags for the dataset.

Methods:

download_and_prepare(dataset_dir[, split, ...])

Downloads the dataset and prepares it for use.

get_info_path(dataset_dir)

Returns the path to the ZooDatasetInfo for the dataset.

get_split_dir(dataset_dir, split)

Returns the directory for the given split of the dataset.

has_info(dataset_dir)

Determines whether the directory contains a ZooDatasetInfo.

has_split(split)

Whether the dataset has the given split.

has_tag(tag)

Whether the dataset has the given tag.

load_info(dataset_dir[, upgrade, ...])

Loads the ZooDatasetInfo from the given dataset directory.

property name#

The name of the dataset.

property supported_splits#

A tuple of supported splits for the dataset, or None if the dataset does not have splits.

download_and_prepare(dataset_dir, split=None, splits=None, cleanup=True)#

Downloads the dataset and prepares it for use.

If the requested splits have already been downloaded, they are not re-downloaded.

Parameters:
  • dataset_dir – the directory in which to construct the dataset

  • split (None) – a split to download, if applicable. If neither split nor splits are provided, the full dataset is downloaded

  • splits (None) – a list of splits to download, if applicable. If neither split nor splits are provided, the full dataset is downloaded

  • cleanup (True) – whether to clean up any temporary files generated during download

Returns:

the ZooDatasetInfo for the dataset
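The split-selection rules of download_and_prepare() can be sketched as a planning step. Two behaviors come from the text: requesting neither split nor splits means the full dataset, and already-downloaded splits are skipped. How split and splits combine when both are given (here, concatenated with duplicates dropped) is an assumption, and plan_download() is a hypothetical helper, not part of FiftyOne.

```python
def plan_download(supported_splits, downloaded, split=None, splits=None):
    """Hypothetical sketch: returns the splits that still need downloading.

    supported_splits: tuple of split names the dataset offers
    downloaded: set of split names already on disk
    """
    if split is None and splits is None:
        # Neither argument given: the full dataset is requested
        requested = list(supported_splits)
    else:
        requested = []
        if split is not None:
            requested.append(split)
        if splits is not None:
            requested.extend(splits)

    # Preserve order, drop duplicates, validate, and skip downloaded splits
    todo = []
    for s in requested:
        if s not in supported_splits:
            raise ValueError("Unsupported split '%s'" % s)
        if s not in downloaded and s not in todo:
            todo.append(s)
    return todo


print(plan_download(("train", "validation", "test"), {"train"}))
# ['validation', 'test']
print(plan_download(("train", "validation", "test"), {"train"}, split="train"))
# []
```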

static get_info_path(dataset_dir)#

Returns the path to the ZooDatasetInfo for the dataset.

Parameters:

dataset_dir – the dataset directory

Returns:

the path to the ZooDatasetInfo

get_split_dir(dataset_dir, split)#

Returns the directory for the given split of the dataset.

Parameters:
  • dataset_dir – the dataset directory

  • split – the dataset split

Returns:

the directory that holds, or will hold, the specified split

static has_info(dataset_dir)#

Determines whether the directory contains a ZooDatasetInfo.

Parameters:

dataset_dir – the dataset directory

Returns:

True/False

property has_patches#

Whether the dataset has patches that may need to be applied to already downloaded files.

has_split(split)#

Whether the dataset has the given split.

Parameters:

split – the dataset split

Returns:

True/False

property has_splits#

Whether the dataset has splits.

has_tag(tag)#

Whether the dataset has the given tag.

Parameters:

tag – the tag

Returns:

True/False

property has_tags#

Whether the dataset has tags.

property importer_kwargs#

A dict of default kwargs to pass to this dataset’s fiftyone.utils.data.importers.DatasetImporter.

property is_remote#

Whether the dataset is remotely-sourced.

property license#

The license (or comma-separated list of licenses) under which the dataset is distributed, or None if unknown.
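Assuming multiple licenses are encoded as a single comma-separated string, a caller that needs individual license identifiers could normalize the value as below. parse_licenses() is hypothetical and not part of FiftyOne; the license identifiers are illustrative.

```python
def parse_licenses(license_value):
    """Hypothetical helper: normalize a license value to a list of strings.

    Assumes the value is None (unknown), a single license string, or a
    comma-separated string such as "CC-BY-4.0,MIT".
    """
    if license_value is None:
        return []  # license unknown
    return [part.strip() for part in license_value.split(",") if part.strip()]


print(parse_licenses("CC-BY-4.0"))       # ['CC-BY-4.0']
print(parse_licenses("CC-BY-4.0, MIT"))  # ['CC-BY-4.0', 'MIT']
print(parse_licenses(None))              # []
```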

static load_info(dataset_dir, upgrade=True, warn_deprecated=False)#

Loads the ZooDatasetInfo from the given dataset directory.

Parameters:
  • dataset_dir – the dataset directory from which to load the info

  • upgrade (True) – whether to upgrade the JSON file on disk if any migrations were necessary

  • warn_deprecated (False) – whether to issue a warning if the dataset has a deprecated format

Returns:

the ZooDatasetInfo for the dataset

property parameters#

An optional dict of parameters describing the configuration of the zoo dataset when it was downloaded.

property requires_manual_download#

Whether this dataset requires some files to be manually downloaded by the user before the dataset can be loaded.

property supports_partial_downloads#

Whether the dataset supports downloading partial subsets of its splits.

property tags#

A tuple of tags for the dataset.