fiftyone.zoo.datasets
Module contents
The FiftyOne Dataset Zoo.
This package defines a collection of open source datasets made available for download via FiftyOne.
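Functions:
list_zoo_datasets – Returns the list of available datasets in the FiftyOne Dataset Zoo.
list_downloaded_zoo_datasets – Returns information about the zoo datasets that have been downloaded.
download_zoo_dataset – Downloads the dataset of the given name from the FiftyOne Dataset Zoo.
load_zoo_dataset – Loads the dataset of the given name from the FiftyOne Dataset Zoo as a fiftyone.core.dataset.Dataset.
find_zoo_dataset – Returns the directory containing the given zoo dataset.
load_zoo_dataset_info – Loads the ZooDatasetInfo for the specified zoo dataset.
get_zoo_dataset – Returns the ZooDataset instance for the dataset with the given name.
delete_zoo_dataset – Deletes the zoo dataset from local disk, if necessary.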
Classes:
ZooDatasetInfo – Class containing info about a dataset in the FiftyOne Dataset Zoo.
ZooDatasetSplitInfo – Class containing info about a split of a dataset in the FiftyOne Dataset Zoo.
ZooDataset – Base class for datasets made available in the FiftyOne Dataset Zoo.
DeprecatedZooDataset – Class representing a zoo dataset that no longer exists in the FiftyOne Dataset Zoo.
-
fiftyone.zoo.datasets.list_zoo_datasets(tags=None, source=None)
Returns the list of available datasets in the FiftyOne Dataset Zoo.
Example usage:
    import fiftyone as fo
    import fiftyone.zoo as foz

    #
    # List all zoo datasets
    #

    names = foz.list_zoo_datasets()
    print(names)

    #
    # List all zoo datasets with (both of) the specified tags
    #

    names = foz.list_zoo_datasets(tags=["image", "detection"])
    print(names)

    #
    # List all zoo datasets available via the given source
    #

    names = foz.list_zoo_datasets(source="torch")
    print(names)
- Parameters
tags (None) – only include datasets that have the specified tag, or all of the tags in the specified list
source (None) – only include datasets available via the given source or list of sources
- Returns
a sorted list of dataset names
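Per the example comment above, a dataset must carry every requested tag to be listed. The sketch below models that filtering in plain Python over a hypothetical in-memory registry. The registry contents, the `filter_datasets` helper, and the assumption that a list of sources matches a dataset available via any one of them are illustrative only, not the library's implementation.

```python
# A minimal model of the tag/source filtering described for
# list_zoo_datasets(). REGISTRY and filter_datasets() are hypothetical;
# they do not reflect the real zoo catalog or its implementation.

REGISTRY = {
    # name: (tags, sources)
    "dataset-a": ({"image", "classification"}, {"base"}),
    "dataset-b": ({"image", "detection"}, {"base", "torch"}),
    "dataset-c": ({"video", "detection"}, {"torch"}),
}


def _as_set(value):
    """Normalizes None, a single string, or an iterable of strings."""
    if value is None:
        return None
    if isinstance(value, str):
        return {value}
    return set(value)


def filter_datasets(tags=None, source=None):
    """Returns the sorted names that have ALL tags and ANY of the sources."""
    tags = _as_set(tags)
    sources = _as_set(source)
    names = []
    for name, (ds_tags, ds_sources) in REGISTRY.items():
        if tags is not None and not tags <= ds_tags:
            continue  # missing at least one requested tag
        if sources is not None and not sources & ds_sources:
            continue  # not available via any requested source
        names.append(name)
    return sorted(names)


print(filter_datasets())                             # every dataset
print(filter_datasets(tags=["image", "detection"]))  # both tags required
print(filter_datasets(source="torch"))
```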
-
fiftyone.zoo.datasets.list_downloaded_zoo_datasets(base_dir=None)
Returns information about the zoo datasets that have been downloaded.
- Parameters
base_dir (None) – the base directory to search for downloaded datasets. By default, fo.config.dataset_zoo_dir is used
- Returns
a dict mapping dataset names to (dataset dir, ZooDatasetInfo) tuples
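The return value pairs each downloaded dataset's directory with its ZooDatasetInfo. As a rough, self-contained illustration of that shape, the sketch below scans a base directory and reads one JSON info file per dataset. It assumes one subdirectory per dataset and an `info.json` file name, and returns plain dicts rather than ZooDatasetInfo objects; the `list_downloaded` helper is hypothetical.

```python
import json
import os
import tempfile


def list_downloaded(base_dir):
    """Maps each subdirectory name of base_dir that contains an info.json
    file to a (dataset_dir, info_dict) tuple. The one-directory-per-dataset
    layout and the info.json file name are assumptions of this sketch."""
    downloaded = {}
    if not os.path.isdir(base_dir):
        return downloaded  # nothing has been downloaded yet
    for name in sorted(os.listdir(base_dir)):
        dataset_dir = os.path.join(base_dir, name)
        info_path = os.path.join(dataset_dir, "info.json")
        if not os.path.isfile(info_path):
            continue  # not a downloaded dataset
        with open(info_path) as f:
            downloaded[name] = (dataset_dir, json.load(f))
    return downloaded


with tempfile.TemporaryDirectory() as base_dir:
    os.makedirs(os.path.join(base_dir, "dataset-a"))
    with open(os.path.join(base_dir, "dataset-a", "info.json"), "w") as f:
        json.dump({"num_samples": 200}, f)
    os.makedirs(os.path.join(base_dir, "scratch"))  # no info.json: ignored
    result = list_downloaded(base_dir)

print({name: info for name, (_, info) in result.items()})
```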
-
fiftyone.zoo.datasets.download_zoo_dataset(name, split=None, splits=None, dataset_dir=None, overwrite=False, cleanup=True, **kwargs)
Downloads the dataset of the given name from the FiftyOne Dataset Zoo.
Any dataset splits that already exist in the specified directory are not re-downloaded unless overwrite == True is specified.
- Parameters
name – the name of the zoo dataset to download. Call list_zoo_datasets() to see the available datasets
split (None) – a split to download, if applicable. Typical values are ("train", "validation", "test"). If neither split nor splits are provided, all available splits are downloaded. Consult the documentation for the ZooDataset you specified to see the supported splits
splits (None) – a list of splits to download, if applicable. Typical values are ("train", "validation", "test"). If neither split nor splits are provided, all available splits are downloaded. Consult the documentation for the ZooDataset you specified to see the supported splits
dataset_dir (None) – the directory into which to download the dataset. By default, it is downloaded to a subdirectory of fiftyone.config.dataset_zoo_dir
overwrite (False) – whether to overwrite any existing files
cleanup (True) – whether to clean up any temporary files generated during download
**kwargs – optional arguments for the ZooDataset constructor
- Returns
a tuple of
info: the ZooDatasetInfo for the dataset
dataset_dir: the directory containing the dataset
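The `split`/`splits` rules above amount to a small normalization step. The hypothetical `resolve_splits` helper below sketches one way to express them; merging `split` and `splits` when both are given, and rejecting splits outside the supported tuple, are assumptions of this sketch rather than documented behavior.

```python
def resolve_splits(supported_splits, split=None, splits=None):
    """Returns the splits to act on. `split` and `splits` are merged (an
    assumption of this sketch); if neither is given, every supported split
    is used, or None when the dataset has no splits at all."""
    requested = []
    if split is not None:
        requested.append(split)
    if splits is not None:
        requested.extend(splits)
    requested = list(dict.fromkeys(requested))  # de-duplicate, keep order

    if not requested:
        return list(supported_splits) if supported_splits else None

    if not supported_splits:
        raise ValueError("This dataset does not have splits")

    unknown = [s for s in requested if s not in supported_splits]
    if unknown:
        raise ValueError(
            "Unsupported split(s) %s; supported splits are %s"
            % (unknown, list(supported_splits))
        )
    return requested


SPLITS = ("train", "validation", "test")
print(resolve_splits(SPLITS))  # all supported splits
print(resolve_splits(SPLITS, split="test"))
print(resolve_splits(SPLITS, splits=["train", "validation"]))
```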
-
fiftyone.zoo.datasets.load_zoo_dataset(name, split=None, splits=None, label_field=None, dataset_name=None, dataset_dir=None, download_if_necessary=True, drop_existing_dataset=False, persistent=False, overwrite=False, cleanup=True, progress=None, **kwargs)
Loads the dataset of the given name from the FiftyOne Dataset Zoo as a fiftyone.core.dataset.Dataset.
By default, the dataset will be downloaded if it does not already exist in the specified directory.
If you do not specify a custom dataset_name and you have previously loaded the same zoo dataset and split(s) into FiftyOne, the existing fiftyone.core.dataset.Dataset will be returned.
- Parameters
name – the name of the zoo dataset to load. Call list_zoo_datasets() to see the available datasets
split (None) – a split to load, if applicable. Typical values are ("train", "validation", "test"). If neither split nor splits are provided, all available splits are loaded. Consult the documentation for the ZooDataset you specified to see the supported splits
splits (None) – a list of splits to load, if applicable. Typical values are ("train", "validation", "test"). If neither split nor splits are provided, all available splits are loaded. Consult the documentation for the ZooDataset you specified to see the supported splits
label_field (None) – the label field (or prefix, if the dataset contains multiple label fields) in which to store the dataset’s labels. By default, this is "ground_truth" if the dataset contains a single label field. If the dataset contains multiple label fields and this value is not provided, the labels will be stored under dataset-specific field names
dataset_name (None) – an optional name to give the returned fiftyone.core.dataset.Dataset. By default, a name will be constructed based on the dataset and split(s) you are loading
dataset_dir (None) – the directory in which the dataset is stored or will be downloaded. By default, the dataset will be located in fiftyone.config.dataset_zoo_dir
download_if_necessary (True) – whether to download the dataset if it is not found in the specified dataset directory
drop_existing_dataset (False) – whether to drop an existing dataset with the same name if it exists
persistent (False) – whether the dataset should persist in the database after the session terminates
overwrite (False) – whether to overwrite any existing files if the dataset is to be downloaded
cleanup (True) – whether to clean up any temporary files generated during download
progress (None) – whether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead
**kwargs – optional arguments to pass to the fiftyone.utils.data.importers.DatasetImporter constructor. If download_if_necessary == True, then kwargs can also contain arguments for download_zoo_dataset()
- Returns
a fiftyone.core.dataset.Dataset
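The reuse rule for load_zoo_dataset() can be modeled with a dict standing in for the database. In the sketch below, the hypothetical `default_dataset_name` helper only approximates "a name constructed based on the dataset and split(s)" (the real naming scheme may differ), and raising an error when a custom-named dataset already exists is an assumption; only the reuse and `drop_existing_dataset` paths follow the documentation above.

```python
def default_dataset_name(zoo_name, splits=None):
    """Hypothetical naming rule: the zoo name plus the sorted split names."""
    if not splits:
        return zoo_name
    return "-".join([zoo_name] + sorted(splits))


def load_or_reuse(existing, zoo_name, splits=None, dataset_name=None,
                  drop_existing_dataset=False):
    """Returns (dataset, created). `existing` is a plain dict standing in
    for the database of datasets, keyed by dataset name."""
    name = dataset_name or default_dataset_name(zoo_name, splits)
    if name in existing:
        if drop_existing_dataset:
            del existing[name]  # drop it, then rebuild below
        elif dataset_name is None:
            return existing[name], False  # documented reuse behavior
        else:
            # Assumption: this sketch refuses to clobber a custom-named dataset
            raise ValueError("Dataset '%s' already exists" % name)

    dataset = {"name": name, "zoo_name": zoo_name, "splits": splits}
    existing[name] = dataset
    return dataset, True


db = {}
first, created = load_or_reuse(db, "my-zoo-dataset", splits=["validation"])
again, created_again = load_or_reuse(db, "my-zoo-dataset", splits=["validation"])
print(first["name"], created, created_again, again is first)
```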
-
fiftyone.zoo.datasets.find_zoo_dataset(name, split=None)
Returns the directory containing the given zoo dataset.
If a split is provided, the path to the dataset split is returned; otherwise, the path to the root directory is returned.
The dataset must be downloaded. Use download_zoo_dataset() to download datasets.
- Parameters
name – the name of the zoo dataset
split (None) – a dataset split to locate
- Returns
the directory containing the dataset
- Raises
ValueError – if the dataset or split does not exist or has not been downloaded
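A minimal sketch of the lookup and error behavior described above, assuming (hypothetically) that each split lives in a subdirectory named after it. Unlike find_zoo_dataset(), the illustrative `find_dataset_dir` helper takes a directory rather than a zoo dataset name.

```python
import os
import tempfile


def find_dataset_dir(dataset_dir, split=None):
    """Returns dataset_dir, or dataset_dir/<split> when a split is given.
    The per-split subdirectory layout is an assumption of this sketch."""
    path = dataset_dir if split is None else os.path.join(dataset_dir, split)
    if not os.path.isdir(path):
        raise ValueError("'%s' does not exist or has not been downloaded" % path)
    return path


with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "train"))
    print(find_dataset_dir(root, split="train"))
    try:
        find_dataset_dir(root, split="test")
    except ValueError as e:
        print("ValueError:", e)
```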
-
fiftyone.zoo.datasets.load_zoo_dataset_info(name, dataset_dir=None)
Loads the ZooDatasetInfo for the specified zoo dataset.
The dataset must be downloaded. Use download_zoo_dataset() to download datasets.
- Parameters
name – the name of the zoo dataset
dataset_dir (None) – the directory in which the dataset is stored. By default, the dataset is located in fiftyone.config.dataset_zoo_dir
- Returns
the ZooDatasetInfo for the dataset
- Raises
ValueError – if the dataset has not been downloaded
-
fiftyone.zoo.datasets.get_zoo_dataset(name, **kwargs)
Returns the ZooDataset instance for the dataset with the given name.
If the dataset is available from multiple sources, the default source is used.
- Parameters
name – the name of the zoo dataset
**kwargs – optional arguments for ZooDataset
- Returns
the ZooDataset instance
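When a dataset is offered by several sources, one source must win. The sketch below assumes a fixed preference order to play the role of "the default source". The registry, the `DEFAULT_SOURCE_ORDER` tuple, and the `ZooDatasetStub` class are all hypothetical and only illustrate how **kwargs are forwarded to the selected dataset's constructor.

```python
# Hypothetical registry: dataset name -> sources it is available from.
AVAILABLE_SOURCES = {
    "dataset-a": ("base",),
    "dataset-b": ("tensorflow", "torch"),
}

# Assumed precedence used to pick the default source in this sketch.
DEFAULT_SOURCE_ORDER = ("base", "torch", "tensorflow")


class ZooDatasetStub:
    """Stand-in for a ZooDataset instance; records its source and kwargs."""

    def __init__(self, name, source, **kwargs):
        self.name = name
        self.source = source
        self.kwargs = kwargs


def get_dataset(name, **kwargs):
    if name not in AVAILABLE_SOURCES:
        raise ValueError("Dataset '%s' not found in the zoo" % name)
    sources = AVAILABLE_SOURCES[name]
    for source in DEFAULT_SOURCE_ORDER:
        if source in sources:
            # forward any constructor kwargs to the chosen dataset
            return ZooDatasetStub(name, source, **kwargs)
    raise ValueError("No default source available for '%s'" % name)


ds = get_dataset("dataset-b")
print(ds.name, ds.source)
```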
-
fiftyone.zoo.datasets.delete_zoo_dataset(name, split=None)
Deletes the zoo dataset from local disk, if necessary.
If a split is provided, only that split is deleted.
- Parameters
name – the name of the zoo dataset
split (None) – a split to delete
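A minimal sketch of the on-disk effect described above, assuming each split is stored in its own subdirectory. The hypothetical `delete_dataset` helper only removes files; it does not update any ZooDatasetInfo bookkeeping, which a complete implementation would presumably also need to handle.

```python
import os
import shutil
import tempfile


def delete_dataset(dataset_dir, split=None):
    """Deletes dataset_dir/<split> if a split is given, else all of
    dataset_dir. Paths that do not exist are silently skipped."""
    path = dataset_dir if split is None else os.path.join(dataset_dir, split)
    if os.path.isdir(path):
        shutil.rmtree(path)


with tempfile.TemporaryDirectory() as base:
    root = os.path.join(base, "dataset-a")
    for s in ("train", "test"):
        os.makedirs(os.path.join(root, s))

    delete_dataset(root, split="train")
    print(sorted(os.listdir(root)))  # ['test']

    delete_dataset(root)
    print(os.path.exists(root))  # False

    delete_dataset(root)  # already gone: no error
```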
-
class fiftyone.zoo.datasets.ZooDatasetInfo(zoo_dataset, dataset_type, num_samples, downloaded_splits=None, parameters=None, classes=None)
Bases: eta.core.serial.Serializable
Class containing info about a dataset in the FiftyOne Dataset Zoo.
- Parameters
zoo_dataset – the ZooDataset instance for the dataset
dataset_type – the fiftyone.types.Dataset type of the dataset
num_samples – the total number of samples in all downloaded splits of the dataset
downloaded_splits (None) – a dict of ZooDatasetSplitInfo instances describing the downloaded splits of the dataset, if applicable
parameters (None) – a dict of parameters for the dataset
classes (None) – a list of class label strings
Attributes:
name – The name of the dataset.
zoo_dataset – The fully-qualified class string for the ZooDataset of the dataset.
dataset_type – The fully-qualified class string of the fiftyone.types.Dataset type.
supported_splits – A tuple of supported splits for the dataset, or None if the dataset does not have splits.
Methods:
get_zoo_dataset() – Returns the ZooDataset instance for the dataset.
get_dataset_type() – Returns the fiftyone.types.Dataset type instance for the dataset.
is_split_downloaded(split) – Whether the given dataset split is downloaded.
add_split(split_info) – Adds the split to the dataset.
remove_split(split) – Removes the split from the dataset.
attributes() – Returns a list of class attributes to be serialized.
from_dict(d) – Loads a ZooDatasetInfo from a JSON dictionary.
from_json(json_path[, upgrade, warn_deprecated]) – Loads a ZooDatasetInfo from a JSON file on disk.
copy() – Returns a deep copy of the object.
custom_attributes([dynamic, private]) – Returns a customizable list of class attributes.
from_str(s, *args, **kwargs) – Constructs a Serializable object from a JSON string.
get_class_name() – Returns the fully-qualified class name string of this object.
serialize([reflective]) – Serializes the object into a dictionary.
to_str([pretty_print]) – Returns a string representation of this object.
write_json(path[, pretty_print]) – Serializes the object and writes it to disk.
-
property name
The name of the dataset.
-
property zoo_dataset
The fully-qualified class string for the ZooDataset of the dataset.
-
property dataset_type
The fully-qualified class string of the fiftyone.types.Dataset type.
-
property supported_splits
A tuple of supported splits for the dataset, or None if the dataset does not have splits.
-
get_zoo_dataset()
Returns the ZooDataset instance for the dataset.
- Returns
a ZooDataset instance
-
get_dataset_type()
Returns the fiftyone.types.Dataset type instance for the dataset.
- Returns
a fiftyone.types.Dataset instance
-
is_split_downloaded(split)
Whether the given dataset split is downloaded.
- Parameters
split – the dataset split
- Returns
True/False
-
add_split(split_info)
Adds the split to the dataset.
- Parameters
split_info – a ZooDatasetSplitInfo
-
remove_split(split)
Removes the split from the dataset.
- Parameters
split – the name of the split
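The split bookkeeping methods above can be modeled with two small dataclasses. This sketch assumes that num_samples is recomputed as the total over all downloaded splits whenever a split is added or removed, which matches the num_samples parameter description; `SplitInfo` and `InfoModel` are hypothetical stand-ins, not the actual classes.

```python
from dataclasses import dataclass, field


@dataclass
class SplitInfo:
    """Stand-in for ZooDatasetSplitInfo."""
    split: str
    num_samples: int


@dataclass
class InfoModel:
    """Simplified stand-in for ZooDatasetInfo's split bookkeeping."""
    name: str
    downloaded_splits: dict = field(default_factory=dict)
    num_samples: int = 0

    def _recount(self):
        # num_samples is the total over all downloaded splits
        self.num_samples = sum(
            s.num_samples for s in self.downloaded_splits.values()
        )

    def is_split_downloaded(self, split):
        return split in self.downloaded_splits

    def add_split(self, split_info):
        # adding a split that already exists replaces its entry
        self.downloaded_splits[split_info.split] = split_info
        self._recount()

    def remove_split(self, split):
        self.downloaded_splits.pop(split, None)
        self._recount()


info = InfoModel("dataset-a")
info.add_split(SplitInfo("train", 100))
info.add_split(SplitInfo("validation", 20))
print(info.num_samples)  # 120
info.remove_split("train")
print(info.num_samples, info.is_split_downloaded("train"))  # 20 False
```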
-
attributes()
Returns a list of class attributes to be serialized.
- Returns
a list of class attributes
-
classmethod from_dict(d)
Loads a ZooDatasetInfo from a JSON dictionary.
- Parameters
d – a JSON dictionary
- Returns
a ZooDatasetInfo
-
classmethod from_json(json_path, upgrade=False, warn_deprecated=False)
Loads a ZooDatasetInfo from a JSON file on disk.
- Parameters
json_path – path to JSON file
upgrade (False) – whether to upgrade the JSON file on disk if any migrations were necessary
warn_deprecated (False) – whether to issue a warning if the dataset has a deprecated format
- Returns
a ZooDatasetInfo
copy
()¶ Returns a deep copy of the object.
- Returns
a Serializable instance
-
custom_attributes
(dynamic=False, private=False)¶ Returns a customizable list of class attributes.
By default, all attributes in vars(self) are returned, minus private attributes (those starting with “_”).
- Parameters
dynamic – whether to include dynamic properties, e.g., those defined by getter/setter methods or the @property decorator. By default, this is False
private – whether to include private properties, i.e., those starting with “_”. By default, this is False
- Returns
a list of class attributes
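The filtering rules of custom_attributes() can be sketched as follows. This is a simplified model, not the eta implementation; in particular, treating "dynamic properties" as names defined with @property in the class hierarchy is an assumption.

```python
def custom_attributes(obj, dynamic=False, private=False):
    """Simplified model of Serializable.custom_attributes()."""
    attrs = list(vars(obj))  # instance attributes, in assignment order
    if dynamic:
        # also include names defined with @property anywhere in the MRO
        for klass in type(obj).__mro__:
            for attr_name, value in vars(klass).items():
                if isinstance(value, property) and attr_name not in attrs:
                    attrs.append(attr_name)
    if not private:
        attrs = [a for a in attrs if not a.startswith("_")]
    return attrs


class Example:
    def __init__(self):
        self.split = "train"
        self._cache = {}

    @property
    def upper_split(self):
        return self.split.upper()


obj = Example()
print(custom_attributes(obj))                # ['split']
print(custom_attributes(obj, dynamic=True))  # ['split', 'upper_split']
print(custom_attributes(obj, private=True))  # ['split', '_cache']
```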
-
classmethod from_str(s, *args, **kwargs)
Constructs a Serializable object from a JSON string.
Subclasses may override this method, but, by default, this method simply parses the string and calls from_dict(), which subclasses must implement.
- Parameters
s – a JSON string representation of a Serializable object
*args – optional positional arguments for self.from_dict()
**kwargs – optional keyword arguments for self.from_dict()
- Returns
an instance of the Serializable class
-
classmethod get_class_name()
Returns the fully-qualified class name string of this object.
-
serialize(reflective=False)
Serializes the object into a dictionary.
Serialization is applied recursively to all attributes in the object, including element-wise serialization of lists and dictionary values.
- Parameters
reflective – whether to include reflective attributes when serializing the object. By default, this is False
- Returns
a JSON dictionary representation of the object
-
to_str(pretty_print=True, **kwargs)
Returns a string representation of this object.
- Parameters
pretty_print – whether to render the JSON in human-readable format with newlines and indentation. By default, this is True
**kwargs – optional keyword arguments for self.serialize()
- Returns
a string representation of the object
-
write_json(path, pretty_print=False, **kwargs)
Serializes the object and writes it to disk.
- Parameters
path – the output path
pretty_print – whether to render the JSON in human-readable format with newlines and indentation. By default, this is False
**kwargs – optional keyword arguments for self.serialize()
-
class fiftyone.zoo.datasets.ZooDatasetSplitInfo(split, num_samples)
Bases: eta.core.serial.Serializable
Class containing info about a split of a dataset in the FiftyOne Dataset Zoo.
- Parameters
split – the name of the split
num_samples – the number of samples in the split
Methods:
attributes() – Returns a list of class attributes to be serialized.
from_dict(d) – Loads a ZooDatasetSplitInfo from a JSON dictionary.
copy() – Returns a deep copy of the object.
custom_attributes([dynamic, private]) – Returns a customizable list of class attributes.
from_json(path, *args, **kwargs) – Constructs a Serializable object from a JSON file.
from_str(s, *args, **kwargs) – Constructs a Serializable object from a JSON string.
get_class_name() – Returns the fully-qualified class name string of this object.
serialize([reflective]) – Serializes the object into a dictionary.
to_str([pretty_print]) – Returns a string representation of this object.
write_json(path[, pretty_print]) – Serializes the object and writes it to disk.
-
attributes()
Returns a list of class attributes to be serialized.
- Returns
a list of class attributes
-
classmethod from_dict(d)
Loads a ZooDatasetSplitInfo from a JSON dictionary.
- Parameters
d – a JSON dictionary
- Returns
a ZooDatasetSplitInfo
-
copy()
Returns a deep copy of the object.
- Returns
a Serializable instance
-
custom_attributes(dynamic=False, private=False)
Returns a customizable list of class attributes.
By default, all attributes in vars(self) are returned, minus private attributes (those starting with “_”).
- Parameters
dynamic – whether to include dynamic properties, e.g., those defined by getter/setter methods or the @property decorator. By default, this is False
private – whether to include private properties, i.e., those starting with “_”. By default, this is False
- Returns
a list of class attributes
-
classmethod from_json(path, *args, **kwargs)
Constructs a Serializable object from a JSON file.
Subclasses may override this method, but, by default, this method simply reads the JSON and calls from_dict(), which subclasses must implement.
- Parameters
path – the path to the JSON file on disk
*args – optional positional arguments for self.from_dict()
**kwargs – optional keyword arguments for self.from_dict()
- Returns
an instance of the Serializable class
-
classmethod from_str(s, *args, **kwargs)
Constructs a Serializable object from a JSON string.
Subclasses may override this method, but, by default, this method simply parses the string and calls from_dict(), which subclasses must implement.
- Parameters
s – a JSON string representation of a Serializable object
*args – optional positional arguments for self.from_dict()
**kwargs – optional keyword arguments for self.from_dict()
- Returns
an instance of the Serializable class
-
classmethod get_class_name()
Returns the fully-qualified class name string of this object.
-
serialize(reflective=False)
Serializes the object into a dictionary.
Serialization is applied recursively to all attributes in the object, including element-wise serialization of lists and dictionary values.
- Parameters
reflective – whether to include reflective attributes when serializing the object. By default, this is False
- Returns
a JSON dictionary representation of the object
-
to_str(pretty_print=True, **kwargs)
Returns a string representation of this object.
- Parameters
pretty_print – whether to render the JSON in human-readable format with newlines and indentation. By default, this is True
**kwargs – optional keyword arguments for self.serialize()
- Returns
a string representation of the object
-
write_json(path, pretty_print=False, **kwargs)
Serializes the object and writes it to disk.
- Parameters
path – the output path
pretty_print – whether to render the JSON in human-readable format with newlines and indentation. By default, this is False
**kwargs – optional keyword arguments for self.serialize()
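ZooDatasetSplitInfo's serialization follows the Serializable pattern documented above: attributes() names the fields that serialize() writes, and from_str() parses JSON and delegates to from_dict(). The class below is a hypothetical stand-in written with Python's json module, assuming the serialized keys match the constructor parameters split and num_samples; it is not the eta implementation.

```python
import json


class SplitInfoSketch:
    """Hypothetical stand-in for ZooDatasetSplitInfo's serialization."""

    def __init__(self, split, num_samples):
        self.split = split
        self.num_samples = num_samples

    def attributes(self):
        # the attributes that serialize() writes, in order
        return ["split", "num_samples"]

    def serialize(self):
        return {a: getattr(self, a) for a in self.attributes()}

    def to_str(self, pretty_print=True):
        return json.dumps(self.serialize(), indent=4 if pretty_print else None)

    @classmethod
    def from_dict(cls, d):
        return cls(d["split"], d["num_samples"])

    @classmethod
    def from_str(cls, s):
        # mirrors the documented default: parse the string, then from_dict()
        return cls.from_dict(json.loads(s))


original = SplitInfoSketch("validation", 5000)
s = original.to_str(pretty_print=False)
print(s)  # {"split": "validation", "num_samples": 5000}
restored = SplitInfoSketch.from_str(s)
print(restored.split, restored.num_samples)  # validation 5000
```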
-
class fiftyone.zoo.datasets.ZooDataset
Bases: object
Base class for datasets made available in the FiftyOne Dataset Zoo.
Attributes:
The name of the dataset.
A tuple of tags for the dataset.
Whether the dataset has tags.
An optional dict of parameters describing the configuration of the zoo dataset when it was downloaded.
A tuple of supported splits for the dataset, or None if the dataset does not have splits.
Whether the dataset has splits.
Whether the dataset has patches that may need to be applied to already downloaded files.
Whether the dataset supports downloading partial subsets of its splits.
Whether this dataset requires some files to be manually downloaded by the user before the dataset can be loaded.
A dict of default kwargs to pass to this dataset’s fiftyone.utils.data.importers.DatasetImporter.
Methods:
has_tag(tag) – Whether the dataset has the given tag.
has_split(split) – Whether the dataset has the given split.
get_split_dir(dataset_dir, split) – Returns the directory for the given split of the dataset.
load_info(dataset_dir[, upgrade, …]) – Loads the ZooDatasetInfo from the given dataset directory.
get_info_path(dataset_dir) – Returns the path to the ZooDatasetInfo for the dataset.
download_and_prepare([dataset_dir, split, …]) – Downloads the dataset and prepares it for use.
-
property name
The name of the dataset.
A tuple of tags for the dataset.
Whether the dataset has tags.
-
property parameters
An optional dict of parameters describing the configuration of the zoo dataset when it was downloaded.
-
property supported_splits
A tuple of supported splits for the dataset, or None if the dataset does not have splits.
-
property has_splits
Whether the dataset has splits.
-
property has_patches
Whether the dataset has patches that may need to be applied to already downloaded files.
-
property supports_partial_downloads
Whether the dataset supports downloading partial subsets of its splits.
-
property requires_manual_download
Whether this dataset requires some files to be manually downloaded by the user before the dataset can be loaded.
-
property importer_kwargs
A dict of default kwargs to pass to this dataset’s fiftyone.utils.data.importers.DatasetImporter.
-
has_tag(tag)
Whether the dataset has the given tag.
- Parameters
tag – the tag
- Returns
True/False
-
has_split(split)
Whether the dataset has the given split.
- Parameters
split – the dataset split
- Returns
True/False
-
get_split_dir(dataset_dir, split)
Returns the directory for the given split of the dataset.
- Parameters
dataset_dir – the dataset directory
split – the dataset split
- Returns
the directory that holds, or will hold, the specified split
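A simplified sketch of how a ZooDataset subclass might answer has_tag(), has_split(), and get_split_dir(). The `MiniZooDataset` and `ExampleDataset` classes are hypothetical, and the one-subdirectory-per-split layout and the ValueError for unknown splits are assumptions of this sketch.

```python
import os


class MiniZooDataset:
    """Simplified stand-in for ZooDataset's tag and split helpers."""

    tags = None              # a tuple of tags, or None
    supported_splits = None  # a tuple of splits, or None

    @property
    def has_splits(self):
        return self.supported_splits is not None

    def has_tag(self, tag):
        return self.tags is not None and tag in self.tags

    def has_split(self, split):
        return self.has_splits and split in self.supported_splits

    def get_split_dir(self, dataset_dir, split):
        # Assumption: each split is stored in its own subdirectory
        if not self.has_split(split):
            raise ValueError(
                "Invalid split '%s'; supported splits are %s"
                % (split, self.supported_splits)
            )
        return os.path.join(dataset_dir, split)


class ExampleDataset(MiniZooDataset):
    tags = ("image", "classification")
    supported_splits = ("train", "test")


ds = ExampleDataset()
print(ds.has_tag("image"), ds.has_split("validation"))  # True False
print(ds.get_split_dir("/tmp/example-dataset", "train"))
```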
-
static load_info(dataset_dir, upgrade=True, warn_deprecated=False)
Loads the ZooDatasetInfo from the given dataset directory.
- Parameters
dataset_dir – the directory in which to construct the dataset
upgrade (True) – whether to upgrade the JSON file on disk if any migrations were necessary
warn_deprecated (False) – whether to issue a warning if the dataset has a deprecated format
- Returns
the ZooDatasetInfo for the dataset
-
static get_info_path(dataset_dir)
Returns the path to the ZooDatasetInfo for the dataset.
- Parameters
dataset_dir – the dataset directory
- Returns
the path to the ZooDatasetInfo
-
download_and_prepare(dataset_dir=None, split=None, splits=None, overwrite=False, cleanup=True)
Downloads the dataset and prepares it for use.
If the requested splits have already been downloaded, they are not re-downloaded unless overwrite is True.
- Parameters
dataset_dir (None) – the directory in which to construct the dataset. By default, it is written to a subdirectory of fiftyone.config.dataset_zoo_dir
split (None) – a split to download, if applicable. If neither split nor splits are provided, the full dataset is downloaded
splits (None) – a list of splits to download, if applicable. If neither split nor splits are provided, the full dataset is downloaded
overwrite (False) – whether to overwrite any existing files
cleanup (True) – whether to clean up any temporary files generated during download
- Returns
a tuple of
info: the ZooDatasetInfo for the dataset
dataset_dir: the directory containing the dataset
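The skip-unless-overwrite rule for already-downloaded splits reduces to a simple partition, sketched below with a hypothetical `plan_downloads` helper that performs no actual downloading.

```python
def plan_downloads(requested_splits, downloaded_splits, overwrite=False):
    """Partitions the requested splits into (to_download, skipped): splits
    already on disk are skipped unless overwrite=True."""
    to_download, skipped = [], []
    for split in requested_splits:
        if split in downloaded_splits and not overwrite:
            skipped.append(split)
        else:
            to_download.append(split)
    return to_download, skipped


already_on_disk = {"train"}
print(plan_downloads(["train", "test"], already_on_disk))
# (['test'], ['train'])
print(plan_downloads(["train", "test"], already_on_disk, overwrite=True))
# (['train', 'test'], [])
```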
-
class fiftyone.zoo.datasets.DeprecatedZooDataset
Bases: fiftyone.zoo.datasets.ZooDataset
Class representing a zoo dataset that no longer exists in the FiftyOne Dataset Zoo.
Attributes:
The name of the dataset.
A tuple of supported splits for the dataset, or None if the dataset does not have splits.
Whether the dataset has patches that may need to be applied to already downloaded files.
Whether the dataset has splits.
Whether the dataset has tags.
A dict of default kwargs to pass to this dataset’s fiftyone.utils.data.importers.DatasetImporter.
An optional dict of parameters describing the configuration of the zoo dataset when it was downloaded.
Whether this dataset requires some files to be manually downloaded by the user before the dataset can be loaded.
Whether the dataset supports downloading partial subsets of its splits.
A tuple of tags for the dataset.
Methods:
download_and_prepare([dataset_dir, split, …]) – Downloads the dataset and prepares it for use.
get_info_path(dataset_dir) – Returns the path to the ZooDatasetInfo for the dataset.
get_split_dir(dataset_dir, split) – Returns the directory for the given split of the dataset.
has_split(split) – Whether the dataset has the given split.
has_tag(tag) – Whether the dataset has the given tag.
load_info(dataset_dir[, upgrade, …]) – Loads the ZooDatasetInfo from the given dataset directory.
-
property name
The name of the dataset.
-
property supported_splits
A tuple of supported splits for the dataset, or None if the dataset does not have splits.
-
download_and_prepare(dataset_dir=None, split=None, splits=None, overwrite=False, cleanup=True)
Downloads the dataset and prepares it for use.
If the requested splits have already been downloaded, they are not re-downloaded unless overwrite is True.
- Parameters
dataset_dir (None) – the directory in which to construct the dataset. By default, it is written to a subdirectory of fiftyone.config.dataset_zoo_dir
split (None) – a split to download, if applicable. If neither split nor splits are provided, the full dataset is downloaded
splits (None) – a list of splits to download, if applicable. If neither split nor splits are provided, the full dataset is downloaded
overwrite (False) – whether to overwrite any existing files
cleanup (True) – whether to clean up any temporary files generated during download
- Returns
a tuple of
info: the ZooDatasetInfo for the dataset
dataset_dir: the directory containing the dataset
-
static get_info_path(dataset_dir)
Returns the path to the ZooDatasetInfo for the dataset.
- Parameters
dataset_dir – the dataset directory
- Returns
the path to the ZooDatasetInfo
-
get_split_dir(dataset_dir, split)
Returns the directory for the given split of the dataset.
- Parameters
dataset_dir – the dataset directory
split – the dataset split
- Returns
the directory that holds, or will hold, the specified split
-
property has_patches
Whether the dataset has patches that may need to be applied to already downloaded files.
-
has_split(split)
Whether the dataset has the given split.
- Parameters
split – the dataset split
- Returns
True/False
-
property has_splits
Whether the dataset has splits.
-
has_tag(tag)
Whether the dataset has the given tag.
- Parameters
tag – the tag
- Returns
True/False
Whether the dataset has tags.
-
property importer_kwargs
A dict of default kwargs to pass to this dataset’s fiftyone.utils.data.importers.DatasetImporter.
-
static load_info(dataset_dir, upgrade=True, warn_deprecated=False)
Loads the ZooDatasetInfo from the given dataset directory.
- Parameters
dataset_dir – the directory in which to construct the dataset
upgrade (True) – whether to upgrade the JSON file on disk if any migrations were necessary
warn_deprecated (False) – whether to issue a warning if the dataset has a deprecated format
- Returns
the ZooDatasetInfo for the dataset
-
property parameters
An optional dict of parameters describing the configuration of the zoo dataset when it was downloaded.
-
property requires_manual_download
Whether this dataset requires some files to be manually downloaded by the user before the dataset can be loaded.
-
property supports_partial_downloads
Whether the dataset supports downloading partial subsets of its splits.
A tuple of tags for the dataset.
-
property