fiftyone.utils.data.base

Data utilities.

Copyright 2017-2025, Voxel51, Inc.

Functions:

map_values(sample_collection, path, map[, ...])

Maps the values in the given field to new values for each sample in the collection.

parse_images_dir(dataset_dir[, recursive])

Parses the contents of the given directory of images.

parse_videos_dir(dataset_dir[, recursive])

Parses the contents of the given directory of videos.

parse_image_classification_dir_tree(dataset_dir)

Parses the contents of the given image classification dataset directory tree.

download_image_classification_dataset(...[, ...])

Downloads the classification dataset specified by the given CSV file of labels and image URLs.

download_images(image_urls, output_dir[, ...])

Downloads the images from the given URLs.

fiftyone.utils.data.base.map_values(sample_collection, path, map, progress=False)

Maps the values in the given field to new values for each sample in the collection.

This function performs the same operation as fiftyone.core.collections.SampleCollection.map_values(), but it immediately saves the mapped values to the database rather than creating a view.

Examples:

import random

import fiftyone as fo
import fiftyone.zoo as foz
import fiftyone.utils.data as foud
from fiftyone import ViewField as F

ANIMALS = [
    "bear", "bird", "cat", "cow", "dog", "elephant", "giraffe",
    "horse", "sheep", "zebra"
]

dataset = foz.load_zoo_dataset("quickstart")

values = [random.choice(ANIMALS) for _ in range(len(dataset))]
dataset.set_values("str_field", values)
dataset.set_values("list_field", [[v] for v in values])

dataset.set_field("ground_truth.detections.tags", [F("label")]).save()

# Map all animals to string "animal"
mapping = {a: "animal" for a in ANIMALS}

#
# Map values in top-level fields
#

foud.map_values(dataset, "str_field", mapping)

print(dataset.count_values("str_field"))
# {"animal": 200}

foud.map_values(dataset, "list_field", mapping)

print(dataset.count_values("list_field"))
# {"animal": 200}

#
# Map values in nested fields
#

foud.map_values(dataset, "ground_truth.detections.label", mapping)

print(dataset.count_values("ground_truth.detections.label"))
# {"animal": 183, ...}

foud.map_values(dataset, "ground_truth.detections.tags", mapping)

print(dataset.count_values("ground_truth.detections.tags"))
# {"animal": 183, ...}

Parameters:
  • sample_collection – a fiftyone.core.collections.SampleCollection

  • path – the field or embedded.field.name to map

  • map – a dict mapping values to new values

  • progress (False) – whether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead
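
The effect of the map argument can be illustrated on plain Python values, without a dataset. The sketch below is not the library's implementation: apply_mapping is a hypothetical helper, and it assumes that values absent from the mapping are left unchanged, which is consistent with the "..." entries in the example output above.

```python
def apply_mapping(value, mapping):
    """Hypothetical helper: maps a scalar or a list of scalars through `mapping`.

    Assumption: values that are not keys of `mapping` are kept as-is.
    """
    if isinstance(value, list):
        # List fields are mapped element by element
        return [mapping.get(v, v) for v in value]

    return mapping.get(value, value)


mapping = {"cat": "animal", "dog": "animal"}

scalar_result = apply_mapping("cat", mapping)
list_result = apply_mapping(["dog", "car"], mapping)
unmapped_result = apply_mapping("car", mapping)
```

Here scalar_result stands in for a field like str_field and list_result for a field like list_field in the example above.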

fiftyone.utils.data.base.parse_images_dir(dataset_dir, recursive=True)

Parses the contents of the given directory of images.

Parameters:
  • dataset_dir – the dataset directory

  • recursive (True) – whether to recursively traverse subdirectories

Returns:

a list of image paths
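
To make the role of the recursive flag concrete, here is a minimal standard-library sketch of a directory scan. It is an illustration only, not how parse_images_dir() is implemented: list_image_paths is a hypothetical name, and detecting images via mimetypes and returning the paths sorted are both assumptions.

```python
import mimetypes
import os
import tempfile


def list_image_paths(dataset_dir, recursive=True):
    """Hypothetical stand-in: returns the sorted paths of image files in a directory."""
    paths = []
    for root, _dirs, files in os.walk(dataset_dir):
        for name in files:
            # Assumption: a file counts as an image if its guessed MIME type is image/*
            mime_type, _ = mimetypes.guess_type(name)
            if mime_type is not None and mime_type.startswith("image/"):
                paths.append(os.path.join(root, name))

        if not recursive:
            # os.walk yields the top-level directory first, so stop after it
            break

    return sorted(paths)


# Build a small throwaway directory to scan
with tempfile.TemporaryDirectory() as tmp_dir:
    os.makedirs(os.path.join(tmp_dir, "sub"))
    for rel_path in ["a.jpg", "notes.txt", os.path.join("sub", "b.png")]:
        open(os.path.join(tmp_dir, rel_path), "w").close()

    all_images = [
        os.path.relpath(p, tmp_dir) for p in list_image_paths(tmp_dir)
    ]
    top_level_images = [
        os.path.relpath(p, tmp_dir)
        for p in list_image_paths(tmp_dir, recursive=False)
    ]
```

With recursive=False only the files directly inside dataset_dir are considered; non-image files are skipped in both cases.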

fiftyone.utils.data.base.parse_videos_dir(dataset_dir, recursive=True)

Parses the contents of the given directory of videos.

Parameters:
  • dataset_dir – the dataset directory

  • recursive (True) – whether to recursively traverse subdirectories

Returns:

a list of video paths

fiftyone.utils.data.base.parse_image_classification_dir_tree(dataset_dir)

Parses the contents of the given image classification dataset directory tree, which should have the following format:

<dataset_dir>/
    <classA>/
        <image1>.<ext>
        <image2>.<ext>
        ...
    <classB>/
        <image1>.<ext>
        <image2>.<ext>
        ...

Parameters:
  • dataset_dir – the dataset directory

Returns:

a tuple of (samples, classes), where samples is a list of (image_path, target) pairs and classes is a list of class label strings
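
The relationship between the directory layout and the returned samples and classes can be sketched with the standard library. This is an illustrative reconstruction, not the library's code: parse_classification_tree is a hypothetical name, and this page does not say whether target is a class name or an index into classes, so the sketch assumes an index.

```python
import os
import tempfile


def parse_classification_tree(dataset_dir):
    """Hypothetical helper: returns (samples, classes) for a <class>/<image> tree."""
    # Each immediate subdirectory is treated as one class
    classes = sorted(
        d
        for d in os.listdir(dataset_dir)
        if os.path.isdir(os.path.join(dataset_dir, d))
    )

    samples = []
    for target, class_name in enumerate(classes):
        class_dir = os.path.join(dataset_dir, class_name)
        for filename in sorted(os.listdir(class_dir)):
            # Assumption: `target` is the index of the class in `classes`
            samples.append((os.path.join(class_dir, filename), target))

    return samples, classes


# Build a tiny tree: <tmp>/cat/1.jpg and <tmp>/dog/2.jpg
with tempfile.TemporaryDirectory() as tmp_dir:
    for class_name, filename in [("cat", "1.jpg"), ("dog", "2.jpg")]:
        os.makedirs(os.path.join(tmp_dir, class_name))
        open(os.path.join(tmp_dir, class_name, filename), "w").close()

    samples, classes = parse_classification_tree(tmp_dir)
    labeled = [(os.path.basename(p), classes[t]) for p, t in samples]
```

Under the index assumption, classes[target] recovers the label string for each sample.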

fiftyone.utils.data.base.download_image_classification_dataset(csv_path, dataset_dir, classes=None, num_workers=None)

Downloads the classification dataset specified by the given CSV file, which should have the following format:

<label1>,<image_url1>
<label2>,<image_url2>
...

The image filenames are the basenames of the URLs, which are assumed to be unique.

The dataset is written to disk in fiftyone.types.FiftyOneImageClassificationDataset format.

Parameters:
  • csv_path – a CSV file containing the labels and image URLs

  • dataset_dir – the directory in which to write the dataset

  • classes (None) – an optional list of classes. By default, this will be inferred from the contents of csv_path

  • num_workers (None) – a suggested number of threads to use to download images
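
As a rough illustration of the CSV format and of how filenames follow from URL basenames, the sketch below parses label/URL rows with the standard library. It downloads nothing and is not the library's implementation: parse_label_url_csv and the example.com URLs are hypothetical, and inferring classes as the sorted set of labels is an assumption about what "inferred from the contents of csv_path" means.

```python
import csv
import io
import posixpath
from urllib.parse import urlparse


def parse_label_url_csv(csv_file):
    """Hypothetical helper: returns (rows, classes, filenames) for a label,url CSV."""
    # Skip blank lines; each remaining row is (label, image_url)
    rows = [(row[0], row[1]) for row in csv.reader(csv_file) if row]

    # Assumption: classes are inferred as the sorted set of labels
    classes = sorted({label for label, _ in rows})

    # The filename of each image is the basename of its URL path
    filenames = [posixpath.basename(urlparse(url).path) for _, url in rows]

    return rows, classes, filenames


# Placeholder data in the <label>,<image_url> format
csv_text = (
    "cat,https://example.com/images/001.jpg\n"
    "dog,https://example.com/images/002.jpg\n"
    "cat,https://example.com/images/003.jpg\n"
)

rows, classes, filenames = parse_label_url_csv(io.StringIO(csv_text))
has_unique_names = len(set(filenames)) == len(filenames)
```

The uniqueness check mirrors the stated assumption that URL basenames are unique; two URLs ending in the same filename would otherwise collide on disk.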

fiftyone.utils.data.base.download_images(image_urls, output_dir, num_workers=None)

Downloads the images from the given URLs.

The filenames in output_dir are the basenames of the URLs, which are assumed to be unique.

Parameters:
  • image_urls – a list of image URLs to download

  • output_dir – the directory in which to write the images

  • num_workers (None) – a suggested number of threads to use

Returns:

the list of downloaded image paths
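
The behavior described above (one output file per URL basename, fetched by a pool of threads) can be sketched with concurrent.futures. This is a minimal illustration under stated assumptions, not the library's implementation: download_all, fetch_bytes, and the injectable fetch argument are hypothetical, and the example uses a fake fetcher so that it runs without network access.

```python
import os
import posixpath
import tempfile
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlparse
from urllib.request import urlopen


def fetch_bytes(url):
    """Default fetcher: reads the response body for `url`."""
    with urlopen(url) as response:
        return response.read()


def download_all(image_urls, output_dir, num_workers=None, fetch=fetch_bytes):
    """Hypothetical helper: writes each URL to <output_dir>/<basename of URL>."""
    os.makedirs(output_dir, exist_ok=True)

    paths = [
        os.path.join(output_dir, posixpath.basename(urlparse(url).path))
        for url in image_urls
    ]

    # Basenames are assumed to be unique; fail early rather than overwrite
    if len(set(paths)) != len(paths):
        raise ValueError("URL basenames must be unique")

    def _download(url, path):
        with open(path, "wb") as f:
            f.write(fetch(url))
        return path

    # max_workers=None lets the executor choose a default thread count
    with ThreadPoolExecutor(max_workers=num_workers) as executor:
        return list(executor.map(_download, image_urls, paths))


# Offline demonstration with a fake fetcher and placeholder URLs
with tempfile.TemporaryDirectory() as tmp_dir:
    urls = ["https://example.com/a.jpg", "https://example.com/b.png"]
    downloaded = download_all(
        urls, tmp_dir, num_workers=2, fetch=lambda url: url.encode()
    )
    downloaded_names = [os.path.basename(p) for p in downloaded]
    all_exist = all(os.path.exists(p) for p in downloaded)
```

Because executor.map preserves input order, the returned paths line up with image_urls regardless of which download finishes first.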