fiftyone.utils.data.base#
Data utilities.
Functions:
|
Maps the values in the given field to new values for each sample in the collection. |
|
Parses the contents of the given directory of images. |
|
Parses the contents of the given directory of videos. |
|
Parses the contents of the given image classification dataset directory tree, which should have the following format. |
|
Downloads the classification dataset specified by the given CSV file, which should have the following format. |
|
Downloads the images from the given URLs. |
- fiftyone.utils.data.base.map_values(sample_collection, path, map, progress=False)#
Maps the values in the given field to new values for each sample in the collection.
This function performs the same operation as
map_values()
but it immediately saves the mapped values to the database rather than creating a view.Examples:
import random import fiftyone as fo import fiftyone.zoo as foz import fiftyone.utils.data as foud from fiftyone import ViewField as F ANIMALS = [ "bear", "bird", "cat", "cow", "dog", "elephant", "giraffe", "horse", "sheep", "zebra" ] dataset = foz.load_zoo_dataset("quickstart") values = [random.choice(ANIMALS) for _ in range(len(dataset))] dataset.set_values("str_field", values) dataset.set_values("list_field", [[v] for v in values]) dataset.set_field("ground_truth.detections.tags", [F("label")]).save() # Map all animals to string "animal" mapping = {a: "animal" for a in ANIMALS} # # Map values in top-level fields # foud.map_values(dataset, "str_field", mapping) print(dataset.count_values("str_field")) # {"animal": 200} foud.map_values(dataset, "list_field", mapping) print(dataset.count_values("list_field")) # {"animal": 200} # # Map values in nested fields # foud.map_values(dataset, "ground_truth.detections.label", mapping) print(dataset.count_values("ground_truth.detections.label")) # {"animal": 183, ...} foud.map_values(dataset, "ground_truth.detections.tags", mapping) print(dataset.count_values("ground_truth.detections.tags")) # {"animal": 183, ...}
- Parameters:
sample_collection – a
fiftyone.core.collections.SampleCollection
path – the field or
embedded.field.name
to mapmap – a dict mapping values to new values
progress (False) – whether to render a progress bar (True/False), use the default value
fiftyone.config.show_progress_bars
(None), or a progress callback function to invoke instead
- fiftyone.utils.data.base.parse_images_dir(dataset_dir, recursive=True)#
Parses the contents of the given directory of images.
- Parameters:
dataset_dir – the dataset directory
recursive (True) – whether to recursively traverse subdirectories
- Returns:
a list of image paths
- fiftyone.utils.data.base.parse_videos_dir(dataset_dir, recursive=True)#
Parses the contents of the given directory of videos.
- Parameters:
dataset_dir – the dataset directory
recursive (True) – whether to recursively traverse subdirectories
- Returns:
a list of video paths
- fiftyone.utils.data.base.parse_image_classification_dir_tree(dataset_dir)#
Parses the contents of the given image classification dataset directory tree, which should have the following format:
<dataset_dir>/ <classA>/ <image1>.<ext> <image2>.<ext> ... <classB>/ <image1>.<ext> <image2>.<ext> ...
- Parameters:
dataset_dir – the dataset directory
- Returns:
a list of
(image_path, target)
pairs classes: a list of class label strings- Return type:
samples
- fiftyone.utils.data.base.download_image_classification_dataset(csv_path, dataset_dir, classes=None, num_workers=None)#
Downloads the classification dataset specified by the given CSV file, which should have the following format:
<label1>,<image_url1> <label2>,<image_url2> ...
The image filenames are the basenames of the URLs, which are assumed to be unique.
The dataset is written to disk in
fiftyone.types.FiftyOneImageClassificationDataset
format.- Parameters:
csv_path – a CSV file containing the labels and image URLs
dataset_dir – the directory to write the dataset
classes (None) – an optional list of classes. By default, this will be inferred from the contents of
csv_path
num_workers (None) – a suggested number of threads to use to download images
- fiftyone.utils.data.base.download_images(image_urls, output_dir, num_workers=None)#
Downloads the images from the given URLs.
The filenames in
output_dir
are the basenames of the URLs, which are assumed to be unique.- Parameters:
image_urls – a list of image URLs to download
output_dir – the directory to write the images
num_workers (None) – a suggested number of threads to use
- Returns:
the list of downloaded image paths