fiftyone.utils.csv
CSV utilities.
Classes:
CSVDatasetImporter: A flexible CSV importer that represents slice(s) of field values of a dataset as columns of a CSV file.
CSVDatasetExporter: A flexible CSV exporter that represents slice(s) of field values of a dataset as columns of a CSV file.
- class fiftyone.utils.csv.CSVDatasetImporter(dataset_dir=None, data_path=None, labels_path=None, media_field='filepath', fields=None, skip_missing_media=False, include_all_data=False, shuffle=False, seed=None, max_samples=None)
Bases: GenericSampleDatasetImporter, ImportPathsMixin
A flexible CSV importer that represents slice(s) of field values of a dataset as columns of a CSV file.
See this page for format details.
- Parameters:
dataset_dir (None) – the dataset directory. If omitted, data_path and/or labels_path must be provided
data_path (None) – an optional parameter that enables explicit control over the location of the media. Can be any of the following:
  - a folder name like "data" or "data/" specifying a subfolder of dataset_dir where the media files reside
  - an absolute directory path where the media files reside. In this case, dataset_dir has no effect on the location of the data
  - a filename like "data.json" specifying the filename of the JSON data manifest file in dataset_dir
  - an absolute filepath specifying the location of the JSON data manifest. In this case, dataset_dir has no effect on the location of the data
  - a dict mapping filenames to absolute filepaths
  If None, this parameter will default to whichever of data/ or data.json exists in the dataset directory
labels_path (None) – an optional parameter that enables explicit control over the location of the labels. Can be any of the following:
  - a filename like "labels.csv" specifying the location of the labels in dataset_dir
  - an absolute filepath to the labels. In this case, dataset_dir has no effect on the location of the labels
  If None, the parameter will default to labels.csv
media_field ("filepath") – the name of the column containing the media path for each sample. The media paths in this column may be:
  - filenames or relative paths to media files in data_path
  - absolute media paths, in which case data_path has no effect
fields (None) – an optional parameter that specifies the columns to read and parse from the CSV file. Can be any of the following:
  - an iterable of column names to parse as strings
  - a dict mapping column names to functions that parse the column values into the appropriate type. Any keys with None values in this case are directly loaded as strings
  If not provided, all columns are parsed as strings
skip_missing_media (False) – whether to skip (True) or raise an error (False) when rows with no media_field are encountered
include_all_data (False) – whether to generate samples for all media in the data directory (True) rather than only creating samples for media with CSV rows (False)
shuffle (False) – whether to randomly shuffle the order in which the samples are imported
seed (None) – a random seed to use when shuffling
max_samples (None) – a maximum number of samples to import. By default, all samples are imported
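The fields parameter described above can be illustrated with a minimal sketch. The CSV contents, the column names (weather, score), and the helpers parse_rows() and load_csv_dataset() are hypothetical and not part of FiftyOne. The first helper uses only the standard library to mimic the documented semantics of a dict of parser functions; it is not the importer's actual implementation. The second shows one plausible way to pass the same dict to CSVDatasetImporter, assuming FiftyOne is installed and the dataset directory contains a labels.csv file.

```python
import csv
import io

# Hypothetical CSV contents; the column names are illustrative only
CSV_TEXT = """filepath,weather,score
images/001.jpg,sunny,0.9
images/002.jpg,rainy,0.4
"""

# Columns mapped to None are kept as strings; the others are parsed
# by the given function
FIELDS = {"filepath": None, "weather": None, "score": float}


def parse_rows(csv_text, fields):
    """Mimics the documented `fields` semantics using only the stdlib."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        parsed = {}
        for column, parser in fields.items():
            value = row[column]
            parsed[column] = value if parser is None else parser(value)
        rows.append(parsed)
    return rows


def load_csv_dataset(dataset_dir):
    """Sketch of an import via CSVDatasetImporter; requires FiftyOne."""
    import fiftyone as fo
    import fiftyone.utils.csv as fouc

    importer = fouc.CSVDatasetImporter(
        dataset_dir=dataset_dir,
        media_field="filepath",
        fields=FIELDS,
    )
    return fo.Dataset.from_importer(importer)


rows = parse_rows(CSV_TEXT, FIELDS)
print(rows[0])
```

Keeping the parsers in a single dict makes it easy to check the type conversions on a few rows before running a full import.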
Attributes:
has_sample_field_schema: Whether this importer produces a sample field schema.
has_dataset_info: Whether this importer produces a dataset info dictionary.
Methods:
setup(): Performs any necessary setup before importing the first sample in the dataset.
close(*args): Performs any necessary actions after the last sample has been imported.
get_dataset_info(): Returns the dataset info for the dataset.
get_sample_field_schema(): Returns a dictionary describing the field schema of the samples loaded by this importer.
- property has_sample_field_schema
Whether this importer produces a sample field schema.
- property has_dataset_info
Whether this importer produces a dataset info dictionary.
- setup()
Performs any necessary setup before importing the first sample in the dataset.
This method is called when the importer’s context manager interface is entered,
DatasetImporter.__enter__().
- close(*args)
Performs any necessary actions after the last sample has been imported.
This method is called when the importer’s context manager interface is exited,
DatasetImporter.__exit__().
- Parameters:
*args – the arguments to DatasetImporter.__exit__()
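The relationship between setup(), close(), and the context manager interface described above can be sketched with a small stand-in class. SketchImporter is hypothetical and only mirrors the protocol as documented here (setup() on entry, close(*args) on exit); it does not reproduce the actual DatasetImporter code.

```python
class SketchImporter:
    """Hypothetical stand-in that mirrors the setup()/close() protocol."""

    def __init__(self):
        self.events = []

    def setup(self):
        # Called once, before the first sample is imported
        self.events.append("setup")

    def close(self, *args):
        # *args receives (exc_type, exc_value, traceback) from __exit__()
        self.events.append(("close", len(args)))

    def __enter__(self):
        self.setup()
        return self

    def __exit__(self, *args):
        self.close(*args)


with SketchImporter() as importer:
    importer.events.append("importing")

events = importer.events
```

Because close() receives the arguments of __exit__(), an implementation can inspect them to tell whether the import ended normally or with an exception.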
- get_dataset_info()
Returns the dataset info for the dataset.
By convention, this method should be called after all samples in the dataset have been imported.
- Returns:
a dict of dataset info
- get_sample_field_schema()
Returns a dictionary describing the field schema of the samples loaded by this importer.
- Returns:
a dict mapping field names to fiftyone.core.fields.Field instances or str(field) representations of them
- class fiftyone.utils.csv.CSVDatasetExporter(export_dir=None, data_path=None, labels_path=None, export_media=None, rel_dir=None, abs_paths=False, media_field='filepath', fields=None)
Bases: BatchDatasetExporter, ExportPathsMixin
A flexible CSV exporter that represents slice(s) of field values of a dataset as columns of a CSV file.
See this page for exporting datasets of this type.
- Parameters:
export_dir (None) – the directory to write the export. This has no effect if data_path and labels_path are absolute paths
data_path (None) – an optional parameter that enables explicit control over the location of the exported media. Can be any of the following:
  - a folder name like "data" or "data/" specifying a subfolder of export_dir in which to export the media
  - an absolute directory path in which to export the media. In this case, the export_dir has no effect on the location of the data
  - a JSON filename like "data.json" specifying the filename of the manifest file in export_dir generated when export_media is "manifest"
  - an absolute filepath specifying the location to write the JSON manifest file when export_media is "manifest". In this case, export_dir has no effect on the location of the data
  If None, the default value of this parameter will be chosen based on the value of the export_media parameter
labels_path (None) – an optional parameter that enables explicit control over the location of the exported labels. Can be any of the following:
  - a filename like "labels.csv" specifying the location in export_dir in which to export the labels
  - an absolute filepath to which to export the labels. In this case, the export_dir has no effect on the location of the labels
  If None, the labels will be exported into export_dir using the default filename
export_media (None) – controls how to export the raw media. The supported values are:
  - True: copy all media files into the output directory
  - False: don’t export media
  - "move": move all media files into the output directory
  - "symlink": create symlinks to the media files in the output directory
  - "manifest": create a data.json in the output directory that maps UUIDs used in the labels files to the filepaths of the source media, rather than exporting the actual media
  If None, the default value of this parameter will be chosen based on the value of the data_path parameter
rel_dir (None) – an optional relative directory to strip from each input filepath to generate a unique identifier for each media. When exporting media, this identifier is joined with data_path to generate an output path for each exported media. This argument allows for populating nested subdirectories that match the shape of the input paths. The path is converted to an absolute path (if necessary) via fiftyone.core.storage.normalize_path()
abs_paths (False) – whether to store absolute paths to the media in the exported labels
media_field ("filepath") – the name of the field containing the media to export for each sample
fields (None) – an optional argument specifying the fields or embedding.field.names to include as columns in the exported CSV. Can be:
  - a field or iterable of fields
  - a dict mapping field names to column names
  By default, only the media_field is exported
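Two of the parameters above lend themselves to a short sketch. The helpers media_identifier(), output_path(), and export_csv(), the example paths, and the weather.label field are all hypothetical. The first two helpers use only the standard library to illustrate the documented effect of rel_dir (strip it from the input filepath, then join the result with data_path); they are not the exporter's actual implementation. The third shows one plausible way to pass a dict of field names to column names, assuming FiftyOne is installed. The text above does not say whether the media_field column is added automatically when fields is given, so the sketch lists it explicitly.

```python
import os

# Hypothetical mapping of field names to CSV column names
FIELDS = {"filepath": "filepath", "weather.label": "weather"}


def media_identifier(filepath, rel_dir):
    """Strips rel_dir from filepath, as described for the rel_dir parameter."""
    return os.path.relpath(os.path.abspath(filepath), os.path.abspath(rel_dir))


def output_path(filepath, rel_dir, data_path):
    """Joins the identifier with data_path to form the exported media path."""
    return os.path.join(data_path, media_identifier(filepath, rel_dir))


def export_csv(dataset, export_dir):
    """Sketch of an export via CSVDatasetExporter; requires FiftyOne."""
    import fiftyone.utils.csv as fouc

    exporter = fouc.CSVDatasetExporter(
        export_dir=export_dir,
        export_media=False,  # write only the CSV, leave the media in place
        abs_paths=True,
        fields=FIELDS,
    )
    dataset.export(dataset_exporter=exporter)


identifier = media_identifier("/datasets/src/train/cat/001.jpg", "/datasets/src")
exported = output_path(
    "/datasets/src/train/cat/001.jpg", "/datasets/src", "/exports/data"
)
```

Stripping a common prefix rather than using bare filenames keeps files with the same name in different subdirectories from colliding in the export.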
Methods:
setup(): Performs any necessary setup before exporting the first sample in the dataset.
export_samples(sample_collection[, progress]): Exports the given sample collection.
close(*args): Performs any necessary actions after the last sample has been exported.
export_sample(*args, **kwargs): Exports the given sample to the dataset.
log_collection(sample_collection): Logs any relevant information about the fiftyone.core.collections.SampleCollection whose samples will be exported.
- setup()
Performs any necessary setup before exporting the first sample in the dataset.
This method is called when the exporter’s context manager interface is entered,
DatasetExporter.__enter__().
- export_samples(sample_collection, progress=None)
Exports the given sample collection.
- Parameters:
sample_collection – a fiftyone.core.collections.SampleCollection
progress (None) – whether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead
- close(*args)
Performs any necessary actions after the last sample has been exported.
This method is called when the exporter’s context manager interface is exited,
DatasetExporter.__exit__().
- Parameters:
*args – the arguments to DatasetExporter.__exit__()
- export_sample(*args, **kwargs)
Exports the given sample to the dataset.
- Parameters:
*args – subclass-specific positional arguments
**kwargs – subclass-specific keyword arguments
- log_collection(sample_collection)
Logs any relevant information about the fiftyone.core.collections.SampleCollection whose samples will be exported.
Subclasses can optionally implement this method if their export format can record information such as the fiftyone.core.collections.SampleCollection.info() of the collection being exported.
By convention, this method must be optional; i.e., if it is not called before the first call to export_sample(), then the exporter must make do without any information about the fiftyone.core.collections.SampleCollection (which may not be available, for example, if the samples being exported are not stored in a collection).
- Parameters:
sample_collection – the fiftyone.core.collections.SampleCollection whose samples will be exported