fiftyone.core.dataset

FiftyOne datasets.

Exceptions:

- DatasetNotFoundError: Exception raised when a dataset is not found.

Functions:

- list_datasets(): Lists the available FiftyOne datasets.
- dataset_exists(): Checks if the dataset exists.
- load_dataset(): Loads the FiftyOne dataset with the given name.
- get_default_dataset_name(): Returns a default dataset name based on the current time.
- make_unique_dataset_name(): Makes a unique dataset name with the given root name.
- get_default_dataset_dir(): Returns the default dataset directory for the dataset with the given name.
- delete_dataset(): Deletes the FiftyOne dataset with the given name.
- delete_datasets(): Deletes all FiftyOne datasets whose names match the given glob pattern.
- delete_non_persistent_datasets(): Deletes all non-persistent datasets.

Classes:

- Dataset: A FiftyOne dataset.
exception fiftyone.core.dataset.DatasetNotFoundError(name)

Bases: ValueError

Exception raised when a dataset is not found.

- args
- with_traceback(tb): Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
fiftyone.core.dataset.list_datasets(glob_patt=None, tags=None, info=False)

Lists the available FiftyOne datasets.

Parameters:
- glob_patt (None) – an optional glob pattern of names to return
- tags (None) – only include datasets that have the specified tag or list of tags
- info (False) – whether to return info dicts describing each dataset rather than just their names

Returns:
a list of dataset names or info dicts
fiftyone.core.dataset.dataset_exists(name)

Checks if the dataset exists.

Parameters:
- name – the name of the dataset

Returns:
True/False
fiftyone.core.dataset.load_dataset(name, create_if_necessary=False)

Loads the FiftyOne dataset with the given name.

To create a new dataset, use the Dataset constructor.

Note: Dataset instances are singletons keyed by their name, so all calls to this method with a given dataset name in a program will return the same object.

Parameters:
- name – the name of the dataset
- create_if_necessary (False) – if no dataset exists, create an empty one

Raises:
DatasetNotFoundError – if the dataset does not exist and create_if_necessary is False

Returns:
a Dataset
fiftyone.core.dataset.get_default_dataset_name()

Returns a default dataset name based on the current time.

Returns:
a dataset name
fiftyone.core.dataset.make_unique_dataset_name(root)

Makes a unique dataset name with the given root name.

Parameters:
- root – the root name for the dataset

Returns:
the dataset name
fiftyone.core.dataset.get_default_dataset_dir(name)

Returns the default dataset directory for the dataset with the given name.

Parameters:
- name – the dataset name

Returns:
the default directory for the dataset
fiftyone.core.dataset.delete_dataset(name, verbose=False)

Deletes the FiftyOne dataset with the given name.

Parameters:
- name – the name of the dataset
- verbose (False) – whether to log the name of the deleted dataset
fiftyone.core.dataset.delete_datasets(glob_patt, verbose=False)

Deletes all FiftyOne datasets whose names match the given glob pattern.

Parameters:
- glob_patt – a glob pattern of datasets to delete
- verbose (False) – whether to log the names of deleted datasets
fiftyone.core.dataset.delete_non_persistent_datasets(verbose=False)

Deletes all non-persistent datasets.

Parameters:
- verbose (False) – whether to log the names of deleted datasets
class fiftyone.core.dataset.Dataset(name=None, _create=True, *args, **kwargs)

Bases: fiftyone.core.collections.SampleCollection

A FiftyOne dataset.

Datasets represent an ordered collection of fiftyone.core.sample.Sample instances that describe a particular type of raw media (e.g., images or videos) together with a user-defined set of fields.

FiftyOne datasets ingest and store the labels for all samples internally; raw media is stored on disk and the dataset provides paths to the data.

See this page for an overview of working with FiftyOne datasets.

Parameters:
- name (None) – the name of the dataset. By default, get_default_dataset_name() is used
- persistent (False) – whether the dataset should persist in the database after the session terminates
- overwrite (False) – whether to overwrite an existing dataset of the same name
Attributes:

- media_type: The media type of the dataset.
- group_field: The group field of the dataset, or None if the dataset is not grouped.
- group_slice: The current group slice of the dataset, or None if the dataset is not grouped.
- group_slices: The list of group slices of the dataset, or None if the dataset is not grouped.
- group_media_types: A dict mapping group slices to media types, or None if the dataset is not grouped.
- default_group_slice: The default group slice of the dataset, or None if the dataset is not grouped.
- version: The version of the fiftyone package for which the dataset is formatted.
- name: The name of the dataset.
- slug: The slug of the dataset.
- created_at: The datetime that the dataset was created.
- last_modified_at: The datetime that the dataset was last modified.
- last_loaded_at: The datetime that the dataset was last loaded.
- persistent: Whether the dataset persists in the database after a session is terminated.
- tags: A list of tags on the dataset.
- description: A string description on the dataset.
- info: A user-facing dictionary of information about the dataset.
- app_config: A fiftyone.core.odm.dataset.DatasetAppConfig that customizes how this dataset is visualized in the FiftyOne App.
- classes: A dict mapping field names to lists of class label strings for the corresponding fields of the dataset.
- default_classes: A list of class label strings for all fiftyone.core.labels.Label fields of this dataset that do not have customized classes defined in classes.
- mask_targets: A dict mapping field names to mask target dicts, each of which defines a mapping between pixel values (2D masks) or RGB hex strings (3D masks) and label strings for the segmentation masks in the corresponding field of the dataset.
- default_mask_targets: A dict defining a default mapping between pixel values (2D masks) or RGB hex strings (3D masks) and label strings for the segmentation masks of all fiftyone.core.labels.Segmentation fields of this dataset that do not have customized mask targets defined in mask_targets.
- skeletons: A dict mapping field names to fiftyone.core.odm.dataset.KeypointSkeleton instances, each of which defines the semantic labels and point connectivity for the fiftyone.core.labels.Keypoint instances in the corresponding field of the dataset.
- default_skeleton: A default fiftyone.core.odm.dataset.KeypointSkeleton defining the semantic labels and point connectivity for all fiftyone.core.labels.Keypoint fields of this dataset that do not have customized skeletons defined in skeletons.
- deleted: Whether the dataset is deleted.
- has_saved_views: Whether this dataset has any saved views.
- has_workspaces: Whether this dataset has any saved workspaces.
- has_annotation_runs: Whether this collection has any annotation runs.
- has_brain_runs: Whether this collection has any brain runs.
- has_evaluations: Whether this collection has any evaluation results.
- has_runs: Whether this collection has any runs.
Methods:

- summary(): Returns a string summary of the dataset.
- stats([include_media, include_indexes, …]): Returns stats about the dataset on disk.
- first(): Returns the first sample in the dataset.
- last(): Returns the last sample in the dataset.
- head([num_samples]): Returns a list of the first few samples in the dataset.
- tail([num_samples]): Returns a list of the last few samples in the dataset.
- one(expr[, exact]): Returns a single sample in this dataset matching the expression.
- view(): Returns a fiftyone.core.view.DatasetView containing the entire dataset.
- get_field_schema([ftype, embedded_doc_type, …]): Returns a schema dictionary describing the fields of the samples in the dataset.
- get_frame_field_schema([ftype, …]): Returns a schema dictionary describing the fields of the frames of the samples in the dataset.
- add_sample_field(field_name, ftype[, …]): Adds a new sample field or embedded field to the dataset, if necessary.
- add_dynamic_sample_fields([fields, …]): Adds all dynamic sample fields to the dataset’s schema.
- add_frame_field(field_name, ftype[, …]): Adds a new frame-level field or embedded field to the dataset, if necessary.
- list_summary_fields(): Lists the summary fields on the dataset.
- create_summary_field(path[, field_name, …]): Populates a sample-level field that records the unique values or numeric ranges that appear in the specified field on each sample in the dataset.
- check_summary_fields(): Returns a list of summary fields that may need to be updated.
- update_summary_field(field_name): Updates the summary field based on the current values of its source field.
- delete_summary_field(field_name[, error_level]): Deletes the summary field from all samples in the dataset.
- delete_summary_fields(field_names[, error_level]): Deletes the summary fields from all samples in the dataset.
- add_dynamic_frame_fields([fields, …]): Adds all dynamic frame fields to the dataset’s schema.
- add_group_field(field_name[, default, …]): Adds a group field to the dataset, if necessary.
- rename_sample_field(field_name, new_field_name): Renames the sample field to the given new name.
- rename_sample_fields(field_mapping): Renames the sample fields to the given new names.
- rename_frame_field(field_name, new_field_name): Renames the frame-level field to the given new name.
- rename_frame_fields(field_mapping): Renames the frame-level fields to the given new names.
- clone_sample_field(field_name, new_field_name): Clones the given sample field into a new field of the dataset.
- clone_sample_fields(field_mapping): Clones the given sample fields into new fields of the dataset.
- clone_frame_field(field_name, new_field_name): Clones the frame-level field into a new field.
- clone_frame_fields(field_mapping): Clones the frame-level fields into new fields.
- clear_sample_field(field_name): Clears the values of the field from all samples in the dataset.
- clear_sample_fields(field_names): Clears the values of the fields from all samples in the dataset.
- clear_frame_field(field_name): Clears the values of the frame-level field from all samples in the dataset.
- clear_frame_fields(field_names): Clears the values of the frame-level fields from all samples in the dataset.
- delete_sample_field(field_name[, error_level]): Deletes the field from all samples in the dataset.
- delete_sample_fields(field_names[, error_level]): Deletes the fields from all samples in the dataset.
- remove_dynamic_sample_field(field_name[, …]): Removes the dynamic embedded sample field from the dataset’s schema.
- remove_dynamic_sample_fields(field_names[, …]): Removes the dynamic embedded sample fields from the dataset’s schema.
- delete_frame_field(field_name[, error_level]): Deletes the frame-level field from all samples in the dataset.
- delete_frame_fields(field_names[, error_level]): Deletes the frame-level fields from all samples in the dataset.
- remove_dynamic_frame_field(field_name[, …]): Removes the dynamic embedded frame field from the dataset’s schema.
- remove_dynamic_frame_fields(field_names[, …]): Removes the dynamic embedded frame fields from the dataset’s schema.
- add_group_slice(name, media_type): Adds a group slice with the given media type to the dataset, if necessary.
- rename_group_slice(name, new_name): Renames the group slice with the given name.
- delete_group_slice(name): Deletes all samples in the given group slice from the dataset.
- iter_samples([progress, autosave, …]): Returns an iterator over the samples in the dataset.
- iter_groups([group_slices, progress, …]): Returns an iterator over the groups in the dataset.
- get_group(group_id[, group_slices]): Returns a dict containing the samples for the given group ID.
- add_sample(sample[, expand_schema, dynamic, …]): Adds the given sample to the dataset.
- add_samples(samples[, expand_schema, …]): Adds the given samples to the dataset.
- add_collection(sample_collection[, …]): Adds the contents of the given collection to the dataset.
- merge_sample(sample[, key_field, …]): Merges the fields of the given sample into this dataset.
- merge_samples(samples[, key_field, key_fcn, …]): Merges the given samples into this dataset.
- delete_samples(samples_or_ids): Deletes the given sample(s) from the dataset.
- delete_frames(frames_or_ids): Deletes the given frame(s) from the dataset.
- delete_groups(groups_or_ids): Deletes the given group(s) from the dataset.
- delete_labels([labels, ids, tags, view, fields]): Deletes the specified labels from the dataset.
- save(): Saves the dataset to the database.
- has_saved_view(name): Whether this dataset has a saved view with the given name.
- list_saved_views([info]): Lists the saved views on this dataset.
- save_view(name, view[, description, color, …]): Saves the given view into this dataset under the given name so it can be loaded later via load_saved_view().
- get_saved_view_info(name): Loads the editable information about the saved view with the given name.
- update_saved_view_info(name, info): Updates the editable information for the saved view with the given name.
- load_saved_view(name): Loads the saved view with the given name.
- delete_saved_view(name): Deletes the saved view with the given name.
- delete_saved_views(): Deletes all saved views from this dataset.
- has_workspace(name): Whether this dataset has a saved workspace with the given name.
- list_workspaces([info]): Lists the saved workspaces on this dataset.
- save_workspace(name, workspace[, …]): Saves a workspace into this dataset under the given name so it can be loaded later via load_workspace().
- load_workspace(name): Loads the saved workspace with the given name.
- get_workspace_info(name): Gets the information about the workspace with the given name.
- update_workspace_info(name, info): Updates the editable information for the saved workspace with the given name.
- delete_workspace(name): Deletes the saved workspace with the given name.
- delete_workspaces(): Deletes all saved workspaces from this dataset.
- clone([name, persistent]): Creates a copy of the dataset.
- clear(): Removes all samples from the dataset.
- clear_frames(): Removes all frame labels from the dataset.
- ensure_frames(): Ensures that the video dataset contains frame instances for every frame of each sample’s source video.
- delete(): Deletes the dataset.
- add_dir([dataset_dir, dataset_type, …]): Adds the contents of the given directory to the dataset.
- merge_dir([dataset_dir, dataset_type, …]): Merges the contents of the given directory into the dataset.
- add_archive(archive_path[, dataset_type, …]): Adds the contents of the given archive to the dataset.
- merge_archive(archive_path[, dataset_type, …]): Merges the contents of the given archive into the dataset.
- add_importer(dataset_importer[, …]): Adds the samples from the given fiftyone.utils.data.importers.DatasetImporter to the dataset.
- merge_importer(dataset_importer[, …]): Merges the samples from the given fiftyone.utils.data.importers.DatasetImporter into the dataset.
- add_images(paths_or_samples[, …]): Adds the given images to the dataset.
- add_labeled_images(samples, sample_parser[, …]): Adds the given labeled images to the dataset.
- add_images_dir(images_dir[, tags, …]): Adds the given directory of images to the dataset.
- add_images_patt(images_patt[, tags, progress]): Adds the given glob pattern of images to the dataset.
- ingest_images(paths_or_samples[, …]): Ingests the given iterable of images into the dataset.
- ingest_labeled_images(samples, sample_parser): Ingests the given iterable of labeled image samples into the dataset.
- add_videos(paths_or_samples[, …]): Adds the given videos to the dataset.
- add_labeled_videos(samples, sample_parser[, …]): Adds the given labeled videos to the dataset.
- add_videos_dir(videos_dir[, tags, …]): Adds the given directory of videos to the dataset.
- add_videos_patt(videos_patt[, tags, progress]): Adds the given glob pattern of videos to the dataset.
- ingest_videos(paths_or_samples[, …]): Ingests the given iterable of videos into the dataset.
- ingest_labeled_videos(samples, sample_parser): Ingests the given iterable of labeled video samples into the dataset.
- from_dir([dataset_dir, dataset_type, …]): Creates a Dataset from the contents of the given directory.
- from_archive(archive_path[, dataset_type, …]): Creates a Dataset from the contents of the given archive.
- from_importer(dataset_importer[, name, …]): Creates a Dataset by importing the samples in the given fiftyone.utils.data.importers.DatasetImporter.
- from_images(paths_or_samples[, …]): Creates a Dataset from the given images.
- from_labeled_images(samples, sample_parser): Creates a Dataset from the given labeled images.
- from_images_dir(images_dir[, name, …]): Creates a Dataset from the given directory of images.
- from_images_patt(images_patt[, name, …]): Creates a Dataset from the given glob pattern of images.
- from_videos(paths_or_samples[, …]): Creates a Dataset from the given videos.
- from_labeled_videos(samples, sample_parser): Creates a Dataset from the given labeled videos.
- from_videos_dir(videos_dir[, name, …]): Creates a Dataset from the given directory of videos.
- from_videos_patt(videos_patt[, name, …]): Creates a Dataset from the given glob pattern of videos.
- from_dict(d[, name, persistent, overwrite, …]): Loads a Dataset from a JSON dictionary generated by fiftyone.core.collections.SampleCollection.to_dict().
- from_json(path_or_str[, name, persistent, …]): Loads a Dataset from JSON generated by fiftyone.core.collections.SampleCollection.write_json() or fiftyone.core.collections.SampleCollection.to_json().
- reload(): Reloads the dataset and any in-memory samples from the database.
- clear_cache(): Clears the dataset’s in-memory cache.
- add_stage(stage): Applies the given fiftyone.core.stages.ViewStage to the collection.
- aggregate(aggregations): Aggregates one or more fiftyone.core.aggregations.Aggregation instances.
- annotate(anno_key[, label_schema, …]): Exports the samples and optional label field(s) in this collection to the given annotation backend.
- apply_model(model[, label_field, …]): Applies the model to the samples in the collection.
- bounds(field_or_expr[, expr, safe]): Computes the bounds of a numeric field of the collection.
- compute_embeddings(model[, …]): Computes embeddings for the samples in the collection using the given model.
- compute_metadata([overwrite, num_workers, …]): Populates the metadata field of all samples in the collection.
- compute_patch_embeddings(model, patches_field): Computes embeddings for the image patches defined by patches_field of the samples in the collection using the given model.
- concat(samples): Concatenates the contents of the given SampleCollection to this collection.
- count([field_or_expr, expr, safe]): Counts the number of field values in the collection.
- count_label_tags([label_fields]): Counts the occurrences of all label tags in the specified label field(s) of this collection.
- count_sample_tags(): Counts the occurrences of sample tags in this collection.
- count_values(field_or_expr[, expr, safe]): Counts the occurrences of field values in the collection.
- create_index(field_or_spec[, unique, wait]): Creates an index on the given field or with the given specification, if necessary.
- delete_annotation_run(anno_key): Deletes the annotation run with the given key from this collection.
- delete_annotation_runs(): Deletes all annotation runs from this collection.
- delete_brain_run(brain_key): Deletes the brain method run with the given key from this collection.
- delete_brain_runs(): Deletes all brain method runs from this collection.
- delete_evaluation(eval_key): Deletes the evaluation results associated with the given evaluation key from this collection.
- delete_evaluations(): Deletes all evaluation results from this collection.
- delete_run(run_key): Deletes the run with the given key from this collection.
- delete_runs(): Deletes all runs from this collection.
- distinct(field_or_expr[, expr, safe]): Computes the distinct values of a field in the collection.
- draw_labels(output_dir[, rel_dir, …]): Renders annotated versions of the media in the collection with the specified label data overlaid to the given directory.
- drop_index(field_or_name): Drops the index for the given field or name, if necessary.
- evaluate_classifications(pred_field[, …]): Evaluates the classification predictions in this collection with respect to the specified ground truth labels.
- evaluate_detections(pred_field[, gt_field, …]): Evaluates the specified predicted detections in this collection with respect to the specified ground truth detections.
- evaluate_regressions(pred_field[, gt_field, …]): Evaluates the regression predictions in this collection with respect to the specified ground truth values.
- evaluate_segmentations(pred_field[, …]): Evaluates the specified semantic segmentation masks in this collection with respect to the specified ground truth masks.
- exclude(sample_ids): Excludes the samples with the given IDs from the collection.
- exclude_by(field, values): Excludes the samples with the given field values from the collection.
- exclude_fields([field_names, meta_filter, …]): Excludes the fields with the given names from the samples in the collection.
- exclude_frames(frame_ids[, omit_empty]): Excludes the frames with the given IDs from the video collection.
- exclude_groups(group_ids): Excludes the groups with the given IDs from the grouped collection.
- exclude_labels([labels, ids, tags, fields, …]): Excludes the specified labels from the collection.
- exists(field[, bool]): Returns a view containing the samples in the collection that have (or do not have) a non-None value for the given field or embedded field.
- export([export_dir, dataset_type, …]): Exports the samples in the collection to disk.
- filter_field(field, filter[, only_matches]): Filters the values of a field or embedded field of each sample in the collection.
- filter_keypoints(field[, filter, labels, …]): Filters the individual fiftyone.core.labels.Keypoint.points elements in the specified keypoints field of each sample in the collection.
- filter_labels(field, filter[, only_matches, …]): Filters the fiftyone.core.labels.Label field of each sample in the collection.
- flatten([stages]): Returns a flattened view that contains all samples in the dynamic grouped collection.
- geo_near(point[, location_field, …]): Sorts the samples in the collection by their proximity to a specified geolocation.
- geo_within(boundary[, location_field, strict]): Filters the samples in this collection to only include samples whose geolocation is within a specified boundary.
- get_annotation_info(anno_key): Returns information about the annotation run with the given key on this collection.
- get_brain_info(brain_key): Returns information about the brain method run with the given key on this collection.
- get_classes(field): Gets the classes list for the given field, or None if no classes are available.
- get_dynamic_field_schema([fields, recursive]): Returns a schema dictionary describing the dynamic fields of the samples in the collection.
- get_dynamic_frame_field_schema([fields, …]): Returns a schema dictionary describing the dynamic fields of the frames in the collection.
- get_evaluation_info(eval_key): Returns information about the evaluation with the given key on this collection.
- get_field(path[, ftype, embedded_doc_type, …]): Returns the field instance of the provided path, or None if one does not exist.
- get_index_information([include_stats]): Returns a dictionary of information about the indexes on this collection.
- get_mask_targets(field): Gets the mask targets for the given field, or None if no mask targets are available.
- get_run_info(run_key): Returns information about the run with the given key on this collection.
- get_skeleton(field): Gets the keypoint skeleton for the given field, or None if no skeleton is available.
- group_by(field_or_expr[, order_by, reverse, …]): Creates a view that groups the samples in the collection by a specified field or expression.
- has_annotation_run(anno_key): Whether this collection has an annotation run with the given key.
- has_brain_run(brain_key): Whether this collection has a brain method run with the given key.
- has_classes(field): Determines whether this collection has a classes list for the given field.
- has_evaluation(eval_key): Whether this collection has an evaluation with the given key.
- has_field(path): Determines whether the collection has a field with the given name.
- has_frame_field(path): Determines whether the collection has a frame-level field with the given name.
- has_mask_targets(field): Determines whether this collection has mask targets for the given field.
- has_run(run_key): Whether this collection has a run with the given key.
- has_sample_field(path): Determines whether the collection has a sample field with the given name.
- has_skeleton(field): Determines whether this collection has a keypoint skeleton for the given field.
- histogram_values(field_or_expr[, expr, …]): Computes a histogram of the field values in the collection.
- init_run(**kwargs): Initializes a config instance for a new run.
- init_run_results(run_key, **kwargs): Initializes a results instance for the run with the given key.
- limit(limit): Returns a view with at most the given number of samples.
- limit_labels(field, limit): Limits the number of fiftyone.core.labels.Label instances in the specified labels list field of each sample in the collection.
- list_aggregations(): Returns a list of all available methods on this collection that apply fiftyone.core.aggregations.Aggregation operations to this collection.
- list_annotation_runs([type, method]): Returns a list of annotation keys on this collection.
- list_brain_runs([type, method]): Returns a list of brain keys on this collection.
- list_evaluations([type, method]): Returns a list of evaluation keys on this collection.
- list_indexes(): Returns the list of index names on this collection.
- list_runs(**kwargs): Returns a list of run keys on this collection.
- list_schema(field_or_expr[, expr]): Extracts the value type(s) in a specified list field across all samples in the collection.
- list_view_stages(): Returns a list of all available methods on this collection that apply fiftyone.core.stages.ViewStage operations to this collection.
- load_annotation_results(anno_key[, cache]): Loads the results for the annotation run with the given key on this collection.
- load_annotation_view(anno_key[, select_fields]): Loads the fiftyone.core.view.DatasetView on which the specified annotation run was performed on this collection.
- load_annotations(anno_key[, dest_field, …]): Downloads the labels from the given annotation run from the annotation backend and merges them into this collection.
- load_brain_results(brain_key[, cache, load_view]): Loads the results for the brain method run with the given key on this collection.
- load_brain_view(brain_key[, select_fields]): Loads the fiftyone.core.view.DatasetView on which the specified brain method run was performed on this collection.
- load_evaluation_results(eval_key[, cache]): Loads the results for the evaluation with the given key on this collection.
- load_evaluation_view(eval_key[, select_fields]): Loads the fiftyone.core.view.DatasetView on which the specified evaluation was performed on this collection.
- load_run_results(run_key[, cache, load_view]): Loads the results for the run with the given key on this collection.
- load_run_view(run_key[, select_fields]): Loads the fiftyone.core.view.DatasetView on which the specified run was performed on this collection.
- make_unique_field_name([root]): Makes a unique field name with the given root name for the collection.
- map_labels(field, map): Maps the label values of a fiftyone.core.labels.Label field to new values for each sample in the collection.
- match(filter): Filters the samples in the collection by the given filter.
- match_frames(filter[, omit_empty]): Filters the frames in the video collection by the given filter.
- match_labels([labels, ids, tags, filter, …]): Selects the samples from the collection that contain (or do not contain) at least one label that matches the specified criteria.
- match_tags(tags[, bool, all]): Returns a view containing the samples in the collection that have or don’t have any/all of the given tag(s).
- mean(field_or_expr[, expr, safe]): Computes the arithmetic mean of the field values of the collection.
- merge_labels(in_field, out_field): Merges the labels from the given input field into the given output field of the collection.
- mongo(pipeline[, _needs_frames, _group_slices]): Adds a view stage defined by a raw MongoDB aggregation pipeline.
- quantiles(field_or_expr, quantiles[, expr, safe]): Computes the quantile(s) of the field values of a collection.
- register_run(run_key, config[, results, …]): Registers a run under the given key on this collection.
- rename_annotation_run(anno_key, new_anno_key): Replaces the key for the given annotation run with a new key.
- rename_brain_run(brain_key, new_brain_key): Replaces the key for the given brain run with a new key.
- rename_evaluation(eval_key, new_eval_key): Replaces the key for the given evaluation with a new key.
- rename_run(run_key, new_run_key): Replaces the key for the given run with a new key.
- save_context([batch_size, batching_strategy]): Returns a context that can be used to save samples from this collection according to a configurable batching strategy.
- save_run_results(run_key, results[, …]): Saves run results for the run with the given key.
- schema(field_or_expr[, expr, dynamic_only, …]): Extracts the names and types of the attributes of a specified embedded document field across all samples in the collection.
- select(sample_ids[, ordered]): Selects the samples with the given IDs from the collection.
- select_by(field, values[, ordered]): Selects the samples with the given field values from the collection.
- select_fields([field_names, meta_filter, …]): Selects only the fields with the given names from the samples in the collection.
- select_frames(frame_ids[, omit_empty]): Selects the frames with the given IDs from the video collection.
- select_group_slices([slices, media_type, …]): Selects the samples in the group collection from the given slice(s).
- select_groups(group_ids[, ordered]): Selects the groups with the given IDs from the grouped collection.
- select_labels([labels, ids, tags, fields, …]): Selects only the specified labels from the collection.
- set_field(field, expr[, _allow_missing]): Sets a field or embedded field on each sample in a collection by evaluating the given expression.
- set_label_values(field_name, values[, …]): Sets the fields of the specified labels in the collection to the given values.
- set_values(field_name, values[, key_field, …]): Sets the field or embedded field on each sample or frame in the collection to the given values.
- shuffle([seed]): Randomly shuffles the samples in the collection.
- skip(skip): Omits the given number of samples from the head of the collection.
- sort_by(field_or_expr[, reverse, create_index]): Sorts the samples in the collection by the given field(s) or expression(s).
- sort_by_similarity(query[, k, reverse, …]): Sorts the collection by similarity to a specified query.
- split_labels(in_field, out_field[, filter]): Splits the labels from the given input field into the given output field of the collection.
- std(field_or_expr[, expr, safe, sample]): Computes the standard deviation of the field values of the collection.
- sum(field_or_expr[, expr, safe]): Computes the sum of the field values of the collection.
- sync_last_modified_at([include_frames]): Syncs the last_modified_at property(s) of the dataset.
- tag_labels(tags[, label_fields]): Adds the tag(s) to all labels in the specified label field(s) of this collection, if necessary.
- tag_samples(tags): Adds the tag(s) to all samples in this collection, if necessary.
- take(size[, seed]): Randomly samples the given number of samples from the collection.
- to_clips(field_or_expr, **kwargs): Creates a view that contains one sample per clip defined by the given field or expression in the video collection.
- to_dict([rel_dir, include_private, …]): Returns a JSON dictionary representation of the collection.
- to_evaluation_patches(eval_key, **kwargs): Creates a view based on the results of the evaluation with the given key that contains one sample for each true positive, false positive, and false negative example in the collection, respectively.
- to_frames(**kwargs): Creates a view that contains one sample per frame in the video collection.
- to_json([rel_dir, include_private, …]): Returns a JSON string representation of the collection.
- to_patches(field, **kwargs): Creates a view that contains one sample per object patch in the specified field of the collection.
- to_trajectories(field, **kwargs): Creates a view that contains one clip for each unique object trajectory defined by their (label, index) in a frame-level field of a video collection.
- untag_labels(tags[, label_fields]): Removes the tag(s) from all labels in the specified label field(s) of this collection, if necessary.
- untag_samples(tags): Removes the tag(s) from all samples in this collection, if necessary.
- update_run_config(run_key, config): Updates the run config for the run with the given key.
- validate_field_type(path[, ftype, …]): Validates that the collection has a field of the given type.
- validate_fields_exist(fields[, include_private]): Validates that the collection has field(s) with the given name(s).
- values(field_or_expr[, expr, missing_value, …]): Extracts the values of a field from all samples in the collection.
- write_json(json_path[, rel_dir, …]): Writes the collection to disk in JSON format.
-
property
media_type
¶ The media type of the dataset.
-
property
group_field
¶ The group field of the dataset, or None if the dataset is not grouped.
Examples:
import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart-groups")

print(dataset.group_field)  # group
-
property
group_slice
¶ The current group slice of the dataset, or None if the dataset is not grouped.
Examples:
import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart-groups")

print(dataset.group_slices)  # ['left', 'right', 'pcd']
print(dataset.group_slice)  # left

# Change the current group slice
dataset.group_slice = "right"

print(dataset.group_slice)  # right
-
property
group_slices
¶ The list of group slices of the dataset, or None if the dataset is not grouped.
Examples:
import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart-groups")

print(dataset.group_slices)  # ['left', 'right', 'pcd']
-
property
group_media_types
¶ A dict mapping group slices to media types, or None if the dataset is not grouped.
Examples:
import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart-groups")

print(dataset.group_media_types)
# {'left': 'image', 'right': 'image', 'pcd': 'point-cloud'}
-
property
default_group_slice
¶ The default group slice of the dataset, or None if the dataset is not grouped.
Examples:
import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart-groups")

print(dataset.default_group_slice)  # left

# Change the default group slice
dataset.default_group_slice = "right"

print(dataset.default_group_slice)  # right
-
property
version
¶ The version of the
fiftyone
package for which the dataset is formatted.
-
property
name
¶ The name of the dataset.
-
property
slug
¶ The slug of the dataset.
-
property
created_at
¶ The datetime that the dataset was created.
-
property
last_modified_at
¶ The datetime that the dataset was last modified.
-
property
last_loaded_at
¶ The datetime that the dataset was last loaded.
-
property
persistent
¶ Whether the dataset persists in the database after a session is terminated.
-
property
tags
¶ A list of tags on the dataset.
Examples:
import fiftyone as fo

dataset = fo.Dataset()

# Add some tags
dataset.tags = ["test", "projectA"]

# Edit the tags
dataset.tags.pop()
dataset.tags.append("projectB")
dataset.save()  # must save after edits
-
property
description
¶ A string description on the dataset.
Examples:
import fiftyone as fo

dataset = fo.Dataset()

# Store a description on the dataset
dataset.description = "Your description here"
-
property
info
¶ A user-facing dictionary of information about the dataset.
Examples:
import fiftyone as fo

dataset = fo.Dataset()

# Store a class list in the dataset's info
dataset.info = {"classes": ["cat", "dog"]}

# Edit the info
dataset.info["other_classes"] = ["bird", "plane"]
dataset.save()  # must save after edits
-
property
app_config
¶ A
fiftyone.core.odm.dataset.DatasetAppConfig
that customizes how this dataset is visualized in the FiftyOne App.
Examples:
import fiftyone as fo
import fiftyone.utils.image as foui
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart")

# View the dataset's current App config
print(dataset.app_config)

# Generate some thumbnail images
foui.transform_images(
    dataset,
    size=(-1, 32),
    output_field="thumbnail_path",
    output_dir="/tmp/thumbnails",
)

# Modify the dataset's App config
dataset.app_config.media_fields = ["filepath", "thumbnail_path"]
dataset.app_config.grid_media_field = "thumbnail_path"
dataset.save()  # must save after edits

session = fo.launch_app(dataset)
-
property
classes
¶ A dict mapping field names to list of class label strings for the corresponding fields of the dataset.
Examples:
import fiftyone as fo

dataset = fo.Dataset()

# Set classes for the `ground_truth` and `predictions` fields
dataset.classes = {
    "ground_truth": ["cat", "dog"],
    "predictions": ["cat", "dog", "other"],
}

# Edit an existing classes list
dataset.classes["ground_truth"].append("other")
dataset.save()  # must save after edits
-
property
default_classes
¶ A list of class label strings for all
fiftyone.core.labels.Label
fields of this dataset that do not have customized classes defined in classes().
Examples:
import fiftyone as fo

dataset = fo.Dataset()

# Set default classes
dataset.default_classes = ["cat", "dog"]

# Edit the default classes
dataset.default_classes.append("rabbit")
dataset.save()  # must save after edits
-
property
mask_targets
¶ A dict mapping field names to mask target dicts, each of which defines a mapping between pixel values (2D masks) or RGB hex strings (3D masks) and label strings for the segmentation masks in the corresponding field of the dataset.
Examples:
import fiftyone as fo

#
# 2D masks
#

dataset = fo.Dataset()

# Set mask targets for the `ground_truth` and `predictions` fields
dataset.mask_targets = {
    "ground_truth": {1: "cat", 2: "dog"},
    "predictions": {1: "cat", 2: "dog", 255: "other"},
}

# Or, for RGB mask targets
dataset.mask_targets = {
    "segmentations": {"#3f0a44": "road", "#eeffee": "building", "#ffffff": "other"}
}

# Edit an existing mask target
dataset.mask_targets["ground_truth"][255] = "other"
dataset.save()  # must save after edits

#
# 3D masks
#

dataset = fo.Dataset()

# Set mask targets for the `ground_truth` and `predictions` fields
dataset.mask_targets = {
    "ground_truth": {"#499CEF": "cat", "#6D04FF": "dog"},
    "predictions": {
        "#499CEF": "cat",
        "#6D04FF": "dog",
        "#FF6D04": "person",
    },
}

# Edit an existing mask target
dataset.mask_targets["ground_truth"]["#FF6D04"] = "person"
dataset.save()  # must save after edits
-
property
default_mask_targets
¶ A dict defining a default mapping between pixel values (2D masks) or RGB hex strings (3D masks) and label strings for the segmentation masks of all
fiftyone.core.labels.Segmentation
fields of this dataset that do not have customized mask targets defined in mask_targets().
Examples:
import fiftyone as fo

#
# 2D masks
#

dataset = fo.Dataset()

# Set default mask targets
dataset.default_mask_targets = {1: "cat", 2: "dog"}

# Or, for RGB mask targets
dataset.default_mask_targets = {"#3f0a44": "road", "#eeffee": "building", "#ffffff": "other"}

# Edit the default mask targets
dataset.default_mask_targets[255] = "other"
dataset.save()  # must save after edits

#
# 3D masks
#

dataset = fo.Dataset()

# Set default mask targets
dataset.default_mask_targets = {"#499CEF": "cat", "#6D04FF": "dog"}

# Edit the default mask targets
dataset.default_mask_targets["#FF6D04"] = "person"
dataset.save()  # must save after edits
-
property
skeletons
¶ A dict mapping field names to
fiftyone.core.odm.dataset.KeypointSkeleton
instances, each of which defines the semantic labels and point connectivity for the fiftyone.core.labels.Keypoint
instances in the corresponding field of the dataset.
Examples:
import fiftyone as fo

dataset = fo.Dataset()

# Set keypoint skeleton for the `ground_truth` field
dataset.skeletons = {
    "ground_truth": fo.KeypointSkeleton(
        labels=[
            "left hand",
            "left shoulder",
            "right shoulder",
            "right hand",
            "left eye",
            "right eye",
            "mouth",
        ],
        edges=[[0, 1, 2, 3], [4, 5, 6]],
    )
}

# Edit an existing skeleton
dataset.skeletons["ground_truth"].labels[-1] = "lips"
dataset.save()  # must save after edits
-
property
default_skeleton
¶ A default
fiftyone.core.odm.dataset.KeypointSkeleton
defining the semantic labels and point connectivity for all fiftyone.core.labels.Keypoint
fields of this dataset that do not have customized skeletons defined in skeletons().
Examples:
import fiftyone as fo

dataset = fo.Dataset()

# Set default keypoint skeleton
dataset.default_skeleton = fo.KeypointSkeleton(
    labels=[
        "left hand",
        "left shoulder",
        "right shoulder",
        "right hand",
        "left eye",
        "right eye",
        "mouth",
    ],
    edges=[[0, 1, 2, 3], [4, 5, 6]],
)

# Edit the default skeleton
dataset.default_skeleton.labels[-1] = "lips"
dataset.save()  # must save after edits
-
property
deleted
¶ Whether the dataset is deleted.
-
summary
()¶ Returns a string summary of the dataset.
- Returns
a string summary
-
stats
(include_media=False, include_indexes=False, compressed=False)¶ Returns stats about the dataset on disk.
The samples keys refer to the sample documents stored in the database.
For video datasets, the frames keys refer to the frame documents stored in the database.
The media keys refer to the raw media associated with each sample on disk.
The index[es] keys refer to the indexes associated with the dataset.
Note that dataset-level metadata such as annotation runs are not included in this computation.
- Parameters
include_media (False) – whether to include stats about the size of the raw media in the dataset
include_indexes (False) – whether to include stats on the dataset’s indexes
compressed (False) – whether to return the sizes of collections in their compressed form on disk (True) or the logical uncompressed size of the collections (False)
- Returns
a stats dict
-
first
()¶ Returns the first sample in the dataset.
- Returns
a fiftyone.core.sample.Sample
-
last
()¶ Returns the last sample in the dataset.
- Returns
a fiftyone.core.sample.Sample
-
head
(num_samples=3)¶ Returns a list of the first few samples in the dataset.
If fewer than
num_samples
samples are in the dataset, only the available samples are returned.
- Parameters
num_samples (3) – the number of samples
- Returns
a list of
fiftyone.core.sample.Sample
objects
-
tail
(num_samples=3)¶ Returns a list of the last few samples in the dataset.
If fewer than
num_samples
samples are in the dataset, only the available samples are returned.
- Parameters
num_samples (3) – the number of samples
- Returns
a list of
fiftyone.core.sample.Sample
objects
-
one
(expr, exact=False)¶ Returns a single sample in this dataset matching the expression.
Examples:
import fiftyone as fo
import fiftyone.zoo as foz
from fiftyone import ViewField as F

dataset = foz.load_zoo_dataset("quickstart")

#
# Get a sample by filepath
#

# A random filepath in the dataset
filepath = dataset.take(1).first().filepath

# Get sample by filepath
sample = dataset.one(F("filepath") == filepath)

#
# Dealing with multiple matches
#

# Get a sample whose image is JPEG
sample = dataset.one(F("filepath").ends_with(".jpg"))

# Raises an error since there are multiple JPEGs
dataset.one(F("filepath").ends_with(".jpg"), exact=True)
- Parameters
expr – a
fiftyone.core.expressions.ViewExpression
or MongoDB expression that evaluates to True
for the sample to match
exact (False) – whether to raise an error if multiple samples match the expression
- Returns
a fiftyone.core.sample.Sample
-
view
()¶ Returns a
fiftyone.core.view.DatasetView
containing the entire dataset.
- Returns
a fiftyone.core.view.DatasetView
-
get_field_schema
(ftype=None, embedded_doc_type=None, read_only=None, info_keys=None, created_after=None, include_private=False, flat=False, mode=None)¶ Returns a schema dictionary describing the fields of the samples in the dataset.
- Parameters
ftype (None) – an optional field type or iterable of types to which to restrict the returned schema. Must be subclass(es) of
fiftyone.core.fields.Field
embedded_doc_type (None) – an optional embedded document type or iterable of types to which to restrict the returned schema. Must be subclass(es) of
fiftyone.core.odm.BaseEmbeddedDocument
read_only (None) – whether to restrict to (True) or exclude (False) read-only fields. By default, all fields are included
info_keys (None) – an optional key or list of keys that must be in the field’s
info
dict
created_after (None) – an optional datetime specifying a minimum creation date
include_private (False) – whether to include fields that start with _ in the returned schema
flat (False) – whether to return a flattened schema where all embedded document fields are included as top-level keys
mode (None) – whether to apply the above constraints before and/or after flattening the schema. Only applicable when flat is True. Supported values are ("before", "after", "both"). The default is "after"
- Returns
a dict mapping field names to
fiftyone.core.fields.Field
instances
-
get_frame_field_schema
(ftype=None, embedded_doc_type=None, read_only=None, info_keys=None, created_after=None, include_private=False, flat=False, mode=None)¶ Returns a schema dictionary describing the fields of the frames of the samples in the dataset.
Only applicable for datasets that contain videos.
- Parameters
ftype (None) – an optional field type or iterable of types to which to restrict the returned schema. Must be subclass(es) of
fiftyone.core.fields.Field
embedded_doc_type (None) – an optional embedded document type or iterable of types to which to restrict the returned schema. Must be subclass(es) of
fiftyone.core.odm.BaseEmbeddedDocument
read_only (None) – whether to restrict to (True) or exclude (False) read-only fields. By default, all fields are included
info_keys (None) – an optional key or list of keys that must be in the field’s
info
dict
created_after (None) – an optional datetime specifying a minimum creation date
include_private (False) – whether to include fields that start with _ in the returned schema
flat (False) – whether to return a flattened schema where all embedded document fields are included as top-level keys
mode (None) – whether to apply the above constraints before and/or after flattening the schema. Only applicable when flat is True. Supported values are ("before", "after", "both"). The default is "after"
- Returns
a dict mapping field names to
fiftyone.core.fields.Field
instances, or None
if the dataset does not contain videos
-
add_sample_field
(field_name, ftype, embedded_doc_type=None, subfield=None, fields=None, description=None, info=None, read_only=False, **kwargs)¶ Adds a new sample field or embedded field to the dataset, if necessary.
- Parameters
field_name – the field name or
embedded.field.name
ftype – the field type to create. Must be a subclass of
fiftyone.core.fields.Field
embedded_doc_type (None) – the
fiftyone.core.odm.BaseEmbeddedDocument
type of the field. Only applicable when ftype is fiftyone.core.fields.EmbeddedDocumentField
subfield (None) – the fiftyone.core.fields.Field type of the contained field. Only applicable when ftype is fiftyone.core.fields.ListField or fiftyone.core.fields.DictField
fields (None) – a list of fiftyone.core.fields.Field instances defining embedded document attributes. Only applicable when ftype is fiftyone.core.fields.EmbeddedDocumentField
description (None) – an optional description
info (None) – an optional info dict
read_only (False) – whether the field should be read-only
- Raises
ValueError – if a field of the same name already exists and it is not compliant with the specified values
-
add_dynamic_sample_fields
(fields=None, recursive=True, add_mixed=False)¶ Adds all dynamic sample fields to the dataset’s schema.
Dynamic fields are embedded document fields with at least one non-None value that have not been declared on the dataset’s schema.
- Parameters
fields (None) – an optional field or iterable of fields for which to add dynamic fields. By default, all fields are considered
recursive (True) – whether to recursively inspect nested lists and embedded documents for dynamic fields
add_mixed (False) – whether to declare fields that contain values of mixed types as generic
fiftyone.core.fields.Field
instances (True) or to skip such fields (False)
-
add_frame_field
(field_name, ftype, embedded_doc_type=None, subfield=None, fields=None, description=None, info=None, read_only=False, **kwargs)¶ Adds a new frame-level field or embedded field to the dataset, if necessary.
Only applicable to datasets that contain videos.
- Parameters
field_name – the field name or
embedded.field.name
ftype – the field type to create. Must be a subclass of
fiftyone.core.fields.Field
embedded_doc_type (None) – the
fiftyone.core.odm.BaseEmbeddedDocument
type of the field. Only applicable when ftype is fiftyone.core.fields.EmbeddedDocumentField
subfield (None) – the fiftyone.core.fields.Field type of the contained field. Only applicable when ftype is fiftyone.core.fields.ListField or fiftyone.core.fields.DictField
fields (None) – a list of fiftyone.core.fields.Field instances defining embedded document attributes. Only applicable when ftype is fiftyone.core.fields.EmbeddedDocumentField
description (None) – an optional description
info (None) – an optional info dict
read_only (False) – whether the field should be read-only
- Raises
ValueError – if a field of the same name already exists and it is not compliant with the specified values
-
list_summary_fields
()¶ Lists the summary fields on the dataset.
Use
create_summary_field()
to create summary fields, and use delete_summary_field()
to delete them.
- Returns
a list of summary field names
-
create_summary_field
(path, field_name=None, sidebar_group=None, include_counts=False, group_by=None, read_only=True, create_index=True)¶ Populates a sample-level field that records the unique values or numeric ranges that appear in the specified field on each sample in the dataset.
This method is particularly useful for summarizing frame-level fields of video datasets, in which case the sample-level field records the unique values or numeric ranges that appear in the specified frame-level field across all frames of that sample. This summary field can then be efficiently queried to retrieve samples that contain specific values of interest in at least one frame.
Examples:
import fiftyone as fo
import fiftyone.zoo as foz
from fiftyone import ViewField as F

dataset = foz.load_zoo_dataset("quickstart-video")
dataset.set_field("frames.detections.detections.confidence", F.rand()).save()

# Generate a summary field for object labels
dataset.create_summary_field("frames.detections.detections.label")

# Generate a summary field for [min, max] confidences
dataset.create_summary_field("frames.detections.detections.confidence")

# Generate a summary field for object labels and counts
dataset.create_summary_field(
    "frames.detections.detections.label",
    field_name="frames_detections_label2",
    include_counts=True,
)

# Generate a summary field for per-label [min, max] confidences
dataset.create_summary_field(
    "frames.detections.detections.confidence",
    field_name="frames_detections_confidence2",
    group_by="label",
)

print(dataset.list_summary_fields())
- Parameters
path – an input field path
field_name (None) – the sample-level field in which to store the summary data. By default, a suitable name is derived from the given
path
sidebar_group (None) – the name of an App sidebar group to which to add the summary field. By default, all summary fields are added to a "summaries" group. You can pass False to skip sidebar group modification
include_counts (False) – whether to include per-value counts when summarizing categorical fields
group_by (None) – an optional attribute to group by when path is a numeric field to generate per-attribute [min, max] ranges. This may either be an absolute path or an attribute name that is interpreted relative to path
create_index (True) – whether to create database index(es) for the summary field
- Returns
the summary field name
-
check_summary_fields
()¶ Returns a list of summary fields that may need to be updated.
Summary fields may need to be updated whenever there have been modifications to the dataset’s samples since the summaries were last generated.
Note that inclusion in this list is only a heuristic, as any sample modifications may not have affected the summary’s source field.
- Returns
list of summary field names
-
update_summary_field
(field_name)¶ Updates the summary field based on the current values of its source field.
- Parameters
field_name – the summary field
-
delete_summary_field
(field_name, error_level=0)¶ Deletes the summary field from all samples in the dataset.
- Parameters
field_name – the summary field
error_level (0) – the error level to use. Valid values are:
0 (-) – raise error if a summary field cannot be deleted
1 (-) – log warning if a summary field cannot be deleted
2 (-) – ignore summary fields that cannot be deleted
-
delete_summary_fields
(field_names, error_level=0)¶ Deletes the summary fields from all samples in the dataset.
- Parameters
field_names – the summary field or iterable of summary fields
error_level (0) – the error level to use. Valid values are:
0 (-) – raise error if a summary field cannot be deleted
1 (-) – log warning if a summary field cannot be deleted
2 (-) – ignore summary fields that cannot be deleted
-
add_dynamic_frame_fields
(fields=None, recursive=True, add_mixed=False)¶ Adds all dynamic frame fields to the dataset’s schema.
Dynamic fields are embedded document fields with at least one non-None value that have not been declared on the dataset’s schema.
- Parameters
fields (None) – an optional field or iterable of fields for which to add dynamic fields. By default, all fields are considered
recursive (True) – whether to recursively inspect nested lists and embedded documents for dynamic fields
add_mixed (False) – whether to declare fields that contain values of mixed types as generic
fiftyone.core.fields.Field
instances (True) or to skip such fields (False)
-
add_group_field
(field_name, default=None, description=None, info=None, read_only=False)¶ Adds a group field to the dataset, if necessary.
- Parameters
field_name – the field name
default (None) – a default group slice for the field
description (None) – an optional description
info (None) – an optional info dict
read_only (False) – whether the field should be read-only
- Raises
ValueError – if a group field with another name already exists
-
rename_sample_field
(field_name, new_field_name)¶ Renames the sample field to the given new name.
You can use dot notation (
embedded.field.name
) to rename embedded fields.
- Parameters
field_name – the field name or
embedded.field.name
new_field_name – the new field name or
embedded.field.name
-
rename_sample_fields
(field_mapping)¶ Renames the sample fields to the given new names.
You can use dot notation (
embedded.field.name
) to rename embedded fields.
- Parameters
field_mapping – a dict mapping field names to new field names
-
rename_frame_field
(field_name, new_field_name)¶ Renames the frame-level field to the given new name.
You can use dot notation (
embedded.field.name
) to rename embedded frame fields.
Only applicable to datasets that contain videos.
- Parameters
field_name – the field name or
embedded.field.name
new_field_name – the new field name or
embedded.field.name
-
rename_frame_fields
(field_mapping)¶ Renames the frame-level fields to the given new names.
You can use dot notation (
embedded.field.name
) to rename embedded frame fields.
- Parameters
field_mapping – a dict mapping field names to new field names
-
clone_sample_field
(field_name, new_field_name)¶ Clones the given sample field into a new field of the dataset.
You can use dot notation (
embedded.field.name
) to clone embedded fields.
- Parameters
field_name – the field name or
embedded.field.name
new_field_name – the new field name or
embedded.field.name
-
clone_sample_fields
(field_mapping)¶ Clones the given sample fields into new fields of the dataset.
You can use dot notation (
embedded.field.name
) to clone embedded fields.
- Parameters
field_mapping – a dict mapping field names to new field names into which to clone each field
-
clone_frame_field
(field_name, new_field_name)¶ Clones the frame-level field into a new field.
You can use dot notation (
embedded.field.name
) to clone embedded frame fields.
Only applicable to datasets that contain videos.
- Parameters
field_name – the field name or
embedded.field.name
new_field_name – the new field name or
embedded.field.name
-
clone_frame_fields
(field_mapping)¶ Clones the frame-level fields into new fields.
You can use dot notation (
embedded.field.name
) to clone embedded frame fields.
Only applicable to datasets that contain videos.
- Parameters
field_mapping – a dict mapping field names to new field names into which to clone each field
-
clear_sample_field
(field_name)¶ Clears the values of the field from all samples in the dataset.
The field will remain in the dataset’s schema, and all samples will have the value
None
for the field.
You can use dot notation (
embedded.field.name
) to clear embedded fields.
- Parameters
field_name – the field name or
embedded.field.name
-
clear_sample_fields
(field_names)¶ Clears the values of the fields from all samples in the dataset.
The field will remain in the dataset’s schema, and all samples will have the value
None
for the field.
You can use dot notation (
embedded.field.name
) to clear embedded fields.
- Parameters
field_names – the field name or iterable of field names
-
clear_frame_field
(field_name)¶ Clears the values of the frame-level field from all samples in the dataset.
The field will remain in the dataset’s frame schema, and all frames will have the value
None
for the field.
You can use dot notation (
embedded.field.name
) to clear embedded frame fields.
Only applicable to datasets that contain videos.
- Parameters
field_name – the field name or
embedded.field.name
-
clear_frame_fields
(field_names)¶ Clears the values of the frame-level fields from all samples in the dataset.
The fields will remain in the dataset’s frame schema, and all frames will have the value
None
for the field.
You can use dot notation (
embedded.field.name
) to clear embedded frame fields.
Only applicable to datasets that contain videos.
- Parameters
field_names – the field name or iterable of field names
-
delete_sample_field
(field_name, error_level=0)¶ Deletes the field from all samples in the dataset.
You can use dot notation (
embedded.field.name
) to delete embedded fields.
- Parameters
field_name – the field name or
embedded.field.name
error_level (0) – the error level to use. Valid values are:
0 (-) – raise error if a top-level field cannot be deleted
1 (-) – log warning if a top-level field cannot be deleted
2 (-) – ignore top-level fields that cannot be deleted
-
delete_sample_fields
(field_names, error_level=0)¶ Deletes the fields from all samples in the dataset.
You can use dot notation (
embedded.field.name
) to delete embedded fields.
- Parameters
field_names – the field name or iterable of field names
error_level (0) – the error level to use. Valid values are:
0 (-) – raise error if a top-level field cannot be deleted
1 (-) – log warning if a top-level field cannot be deleted
2 (-) – ignore top-level fields that cannot be deleted
-
remove_dynamic_sample_field
(field_name, error_level=0)¶ Removes the dynamic embedded sample field from the dataset’s schema.
The underlying data is not deleted from the samples.
- Parameters
field_name – the
embedded.field.name
error_level (0) – the error level to use. Valid values are:
0 (-) – raise error if a top-level field cannot be removed
1 (-) – log warning if a top-level field cannot be removed
2 (-) – ignore top-level fields that cannot be removed
-
remove_dynamic_sample_fields
(field_names, error_level=0)¶ Removes the dynamic embedded sample fields from the dataset’s schema.
The underlying data is not deleted from the samples.
- Parameters
field_names – the
embedded.field.name
or iterable of field names
error_level (0) – the error level to use. Valid values are:
0 (-) – raise error if a top-level field cannot be removed
1 (-) – log warning if a top-level field cannot be removed
2 (-) – ignore top-level fields that cannot be removed
-
delete_frame_field
(field_name, error_level=0)¶ Deletes the frame-level field from all samples in the dataset.
You can use dot notation (
embedded.field.name
) to delete embedded frame fields.
Only applicable to datasets that contain videos.
- Parameters
field_name – the field name or
embedded.field.name
error_level (0) – the error level to use. Valid values are:
0 (-) – raise error if a top-level field cannot be deleted
1 (-) – log warning if a top-level field cannot be deleted
2 (-) – ignore top-level fields that cannot be deleted
-
delete_frame_fields
(field_names, error_level=0)¶ Deletes the frame-level fields from all samples in the dataset.
You can use dot notation (
embedded.field.name
) to delete embedded frame fields.
Only applicable to datasets that contain videos.
- Parameters
field_names – a field name or iterable of field names
error_level (0) – the error level to use. Valid values are:
0 (-) – raise error if a top-level field cannot be deleted
1 (-) – log warning if a top-level field cannot be deleted
2 (-) – ignore top-level fields that cannot be deleted
-
remove_dynamic_frame_field
(field_name, error_level=0)¶ Removes the dynamic embedded frame field from the dataset’s schema.
The underlying data is not deleted from the frames.
- Parameters
field_name – the
embedded.field.name
error_level (0) – the error level to use. Valid values are:
0 (-) – raise error if a top-level field cannot be removed
1 (-) – log warning if a top-level field cannot be removed
2 (-) – ignore top-level fields that cannot be removed
-
remove_dynamic_frame_fields
(field_names, error_level=0)¶ Removes the dynamic embedded frame fields from the dataset’s schema.
The underlying data is not deleted from the frames.
- Parameters
field_names – the
embedded.field.name
or iterable of field names
error_level (0) – the error level to use. Valid values are:
0 (-) – raise error if a top-level field cannot be removed
1 (-) – log warning if a top-level field cannot be removed
2 (-) – ignore top-level fields that cannot be removed
-
add_group_slice
(name, media_type)¶ Adds a group slice with the given media type to the dataset, if necessary.
- Parameters
name – a group slice name
media_type – the media type of the slice
-
rename_group_slice
(name, new_name)¶ Renames the group slice with the given name.
- Parameters
name – the group slice name
new_name – the new group slice name
-
delete_group_slice
(name)¶ Deletes all samples in the given group slice from the dataset.
- Parameters
name – a group slice name
-
iter_samples
(progress=False, autosave=False, batch_size=None, batching_strategy=None)¶ Returns an iterator over the samples in the dataset.
Examples:
import random as r
import string as s

import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("cifar10", split="test")

def make_label():
    return "".join(r.choice(s.ascii_letters) for i in range(10))

# No save context
for sample in dataset.iter_samples(progress=True):
    sample.ground_truth.label = make_label()
    sample.save()

# Save using default batching strategy
for sample in dataset.iter_samples(progress=True, autosave=True):
    sample.ground_truth.label = make_label()

# Save in batches of 10
for sample in dataset.iter_samples(
    progress=True, autosave=True, batch_size=10
):
    sample.ground_truth.label = make_label()

# Save every 0.5 seconds
for sample in dataset.iter_samples(
    progress=True, autosave=True, batch_size=0.5
):
    sample.ground_truth.label = make_label()
- Parameters
progress (False) – whether to render a progress bar (True/False), use the default value
fiftyone.config.show_progress_bars
(None), or a progress callback function to invoke instead
autosave (False) – whether to automatically save changes to samples emitted by this iterator
batch_size (None) – the batch size to use when autosaving samples. If a batching_strategy is provided, this parameter configures the strategy as described below. If no batching_strategy is provided, this can either be an integer specifying the number of samples to save in a batch (in which case batching_strategy is implicitly set to "static") or a float number of seconds between batched saves (in which case batching_strategy is implicitly set to "latency")
batching_strategy (None) – the batching strategy to use for each save operation when autosaving samples. Supported values are:
"static": a fixed sample batch size for each save
"size": a target batch size, in bytes, for each save
"latency": a target latency, in seconds, between saves
By default, fo.config.default_batcher is used
- Returns
an iterator over
fiftyone.core.sample.Sample
instances
-
iter_groups
(group_slices=None, progress=False, autosave=False, batch_size=None, batching_strategy=None)¶ Returns an iterator over the groups in the dataset.
Examples:
import random as r
import string as s

import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart-groups")

def make_label():
    return "".join(r.choice(s.ascii_letters) for i in range(10))

# No save context
for group in dataset.iter_groups(progress=True):
    for sample in group.values():
        sample["test"] = make_label()
        sample.save()

# Save using default batching strategy
for group in dataset.iter_groups(progress=True, autosave=True):
    for sample in group.values():
        sample["test"] = make_label()

# Save in batches of 10
for group in dataset.iter_groups(
    progress=True, autosave=True, batch_size=10
):
    for sample in group.values():
        sample["test"] = make_label()

# Save every 0.5 seconds
for group in dataset.iter_groups(
    progress=True, autosave=True, batch_size=0.5
):
    for sample in group.values():
        sample["test"] = make_label()
- Parameters
group_slices (None) – an optional subset of group slices to load
progress (False) – whether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead
autosave (False) – whether to automatically save changes to samples emitted by this iterator
batch_size (None) – the batch size to use when autosaving samples. If a batching_strategy is provided, this parameter configures the strategy as described below. If no batching_strategy is provided, this can either be an integer specifying the number of samples to save in a batch (in which case batching_strategy is implicitly set to "static") or a float number of seconds between batched saves (in which case batching_strategy is implicitly set to "latency")
batching_strategy (None) –
the batching strategy to use for each save operation when autosaving samples. Supported values are:
"static": a fixed sample batch size for each save
"size": a target batch size, in bytes, for each save
"latency": a target latency, in seconds, between saves
By default, fo.config.default_batcher is used
- Returns
an iterator that emits dicts mapping group slice names to
fiftyone.core.sample.Sample
instances, one per group
-
get_group
(group_id, group_slices=None)¶ Returns a dict containing the samples for the given group ID.
Examples:
import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart-groups")

group_id = dataset.take(1).first().group.id

group = dataset.get_group(group_id)
print(group.keys())
# ['left', 'right', 'pcd']
- Parameters
group_id – a group ID
group_slices (None) – an optional subset of group slices to load
- Returns
a dict mapping group names to fiftyone.core.sample.Sample instances
- Raises
KeyError – if the group ID is not found
-
add_sample
(sample, expand_schema=True, dynamic=False, validate=True)¶ Adds the given sample to the dataset.
If the sample instance does not belong to a dataset, it is updated in-place to reflect its membership in this dataset. If the sample instance belongs to another dataset, it is not modified.
- Parameters
sample – a
fiftyone.core.sample.Sample
expand_schema (True) – whether to dynamically add new sample fields encountered to the dataset schema. If False, an error is raised if the sample’s schema is not a subset of the dataset schema
dynamic (False) – whether to declare dynamic attributes of embedded document fields that are encountered
validate (True) – whether to validate that the fields of the sample are compliant with the dataset schema before adding it
- Returns
the ID of the sample in the dataset
-
add_samples
(samples, expand_schema=True, dynamic=False, validate=True, progress=None, num_samples=None)¶ Adds the given samples to the dataset.
Any sample instances that do not belong to a dataset are updated in-place to reflect membership in this dataset. Any sample instances that belong to other datasets are not modified.
- Parameters
samples – an iterable of
fiftyone.core.sample.Sample
instances or afiftyone.core.collections.SampleCollection
expand_schema (True) – whether to dynamically add new sample fields encountered to the dataset schema. If False, an error is raised if a sample’s schema is not a subset of the dataset schema
dynamic (False) – whether to declare dynamic attributes of embedded document fields that are encountered
validate (True) – whether to validate that the fields of each sample are compliant with the dataset schema before adding it
progress (None) – whether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead
num_samples (None) – the number of samples in samples. If not provided, this is computed (if possible) via len(samples) if needed for progress tracking
- Returns
a list of IDs of the samples in the dataset
-
add_collection
(sample_collection, include_info=True, overwrite_info=False, new_ids=False, progress=None)¶ Adds the contents of the given collection to the dataset.
This method is a special case of Dataset.merge_samples() that adds samples with new IDs to this dataset and omits any samples with existing IDs (the latter would only happen in rare cases).
Use Dataset.merge_samples() if you have multiple datasets whose samples refer to the same source media.
- Parameters
sample_collection – a fiftyone.core.collections.SampleCollection
include_info (True) – whether to merge dataset-level information such as info and classes
overwrite_info (False) – whether to overwrite existing dataset-level information. Only applicable when include_info is True
new_ids (False) – whether to generate new sample/frame/group IDs. By default, the IDs of the input collection are retained
progress (None) – whether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead
- Returns
a list of IDs of the samples that were added to this dataset
-
merge_sample
(sample, key_field='filepath', skip_existing=False, insert_new=True, fields=None, omit_fields=None, merge_lists=True, overwrite=True, expand_schema=True, validate=True, dynamic=False)¶ Merges the fields of the given sample into this dataset.
By default, the sample is merged with an existing sample with the same absolute filepath, if one exists. Otherwise a new sample is inserted. You can customize this behavior via the key_field, skip_existing, and insert_new parameters.
The behavior of this method is highly customizable. By default, all top-level fields from the provided sample are merged in, overwriting any existing values for those fields, with the exception of list fields (e.g., tags) and label list fields (e.g., fiftyone.core.labels.Detections fields), in which case the elements of the lists themselves are merged. In the case of label list fields, labels with the same id in both samples are updated rather than duplicated.
To avoid confusion between missing fields and fields whose value is None, None-valued fields are always treated as missing while merging.
This method can be configured in numerous ways, including:
Whether new fields can be added to the dataset schema
Whether list fields should be treated as ordinary fields and merged as a whole rather than merging their elements
Whether to merge only specific fields, or all but certain fields
Mapping input sample fields to different field names of this sample
- Parameters
sample – a
fiftyone.core.sample.Sample
key_field ("filepath") – the sample field to use to decide whether to join with an existing sample
skip_existing (False) – whether to skip existing samples (True) or merge them (False)
insert_new (True) – whether to insert new samples (True) or skip them (False)
fields (None) – an optional field or iterable of fields to which to restrict the merge. May contain frame fields for video samples. This can also be a dict mapping field names of the input sample to field names of this dataset
omit_fields (None) – an optional field or iterable of fields to exclude from the merge. May contain frame fields for video samples
merge_lists (True) – whether to merge the elements of list fields (e.g., tags) and label list fields (e.g., fiftyone.core.labels.Detections fields) rather than merging the entire top-level field like other field types. For label list fields, existing fiftyone.core.labels.Label elements are either replaced (when overwrite is True) or kept (when overwrite is False) when their id matches a label from the provided sample
overwrite (True) – whether to overwrite (True) or skip (False) existing fields and label elements
expand_schema (True) – whether to dynamically add new fields encountered to the dataset schema. If False, an error is raised if any fields are not in the dataset schema
validate (True) – whether to validate values for existing fields
dynamic (False) – whether to declare dynamic embedded document fields
-
merge_samples
(samples, key_field='filepath', key_fcn=None, skip_existing=False, insert_new=True, fields=None, omit_fields=None, merge_lists=True, overwrite=True, expand_schema=True, dynamic=False, include_info=True, overwrite_info=False, progress=None, num_samples=None)¶ Merges the given samples into this dataset.
Note
This method requires the ability to create unique indexes on the key_field of each collection.
See add_collection() if you want to add samples from one collection to another dataset without a uniqueness constraint.
By default, samples with the same absolute filepath are merged, but you can customize this behavior via the key_field and key_fcn parameters. For example, you could set key_fcn = lambda sample: os.path.basename(sample.filepath) to merge samples with the same base filename.
The behavior of this method is highly customizable. By default, all top-level fields from the provided samples are merged in, overwriting any existing values for those fields, with the exception of list fields (e.g., tags) and label list fields (e.g., fiftyone.core.labels.Detections fields), in which case the elements of the lists themselves are merged. In the case of label list fields, labels with the same id in both collections are updated rather than duplicated.
To avoid confusion between missing fields and fields whose value is None, None-valued fields are always treated as missing while merging.
This method can be configured in numerous ways, including:
Whether existing samples should be modified or skipped
Whether new samples should be added or omitted
Whether new fields can be added to the dataset schema
Whether list fields should be treated as ordinary fields and merged as a whole rather than merging their elements
Whether to merge only specific fields, or all but certain fields
Mapping input fields to different field names of this dataset
- Parameters
samples – a fiftyone.core.collections.SampleCollection or iterable of fiftyone.core.sample.Sample instances
key_field ("filepath") – the sample field to use to decide whether to join with an existing sample
key_fcn (None) – a function that accepts a fiftyone.core.sample.Sample instance and computes a key to decide if two samples should be merged. If a key_fcn is provided, key_field is ignored
skip_existing (False) – whether to skip existing samples (True) or merge them (False)
insert_new (True) – whether to insert new samples (True) or skip them (False)
fields (None) – an optional field or iterable of fields to which to restrict the merge. If provided, fields other than these are omitted from samples when merging or adding samples. One exception is that filepath is always included when adding new samples, since the field is required. This can also be a dict mapping field names of the input collection to field names of this dataset
omit_fields (None) – an optional field or iterable of fields to exclude from the merge. If provided, these fields are omitted from samples, if present, when merging or adding samples. One exception is that filepath is always included when adding new samples, since the field is required
merge_lists (True) – whether to merge the elements of list fields (e.g., tags) and label list fields (e.g., fiftyone.core.labels.Detections fields) rather than merging the entire top-level field like other field types. For label list fields, existing fiftyone.core.labels.Label elements are either replaced (when overwrite is True) or kept (when overwrite is False) when their id matches a label from the provided samples
overwrite (True) – whether to overwrite (True) or skip (False) existing fields and label elements
expand_schema (True) – whether to dynamically add new fields encountered to the dataset schema. If False, an error is raised if a sample’s schema is not a subset of the dataset schema
dynamic (False) – whether to declare dynamic attributes of embedded document fields that are encountered. Only applicable when samples is not a fiftyone.core.collections.SampleCollection
include_info (True) – whether to merge dataset-level information such as info and classes. Only applicable when samples is a fiftyone.core.collections.SampleCollection
overwrite_info (False) – whether to overwrite existing dataset-level information. Only applicable when samples is a fiftyone.core.collections.SampleCollection and include_info is True
progress (None) – whether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead
num_samples (None) – the number of samples in samples. If not provided, this is computed (if possible) via len(samples) if needed for progress tracking
-
delete_samples
(samples_or_ids)¶ Deletes the given sample(s) from the dataset.
If a reference to a sample exists in memory, the sample will be updated such that sample.in_dataset is False.
- Parameters
samples_or_ids –
the sample(s) to delete. Can be any of the following:
a sample ID
an iterable of sample IDs
a fiftyone.core.sample.Sample or fiftyone.core.sample.SampleView
an iterable of fiftyone.core.sample.Sample or fiftyone.core.sample.SampleView instances
-
delete_frames
(frames_or_ids)¶ Deletes the given frame(s) from the dataset.
If a reference to a frame exists in memory, the frame will be updated such that frame.in_dataset is False.
- Parameters
frames_or_ids –
the frame(s) to delete. Can be any of the following:
a frame ID
an iterable of frame IDs
a fiftyone.core.frame.Frame or fiftyone.core.frame.FrameView
a fiftyone.core.sample.Sample or fiftyone.core.sample.SampleView whose frames to delete
an iterable of fiftyone.core.frame.Frame or fiftyone.core.frame.FrameView instances
an iterable of fiftyone.core.sample.Sample or fiftyone.core.sample.SampleView instances whose frames to delete
a fiftyone.core.collections.SampleCollection whose frames to delete
-
delete_groups
(groups_or_ids)¶ Deletes the given group(s) from the dataset.
If a reference to a sample exists in memory, the sample will be updated such that sample.in_dataset is False.
- Parameters
groups_or_ids –
the group(s) to delete. Can be any of the following:
a group ID
an iterable of group IDs
a fiftyone.core.sample.Sample or fiftyone.core.sample.SampleView
a group dict returned by get_group()
an iterable of fiftyone.core.sample.Sample or fiftyone.core.sample.SampleView instances
an iterable of group dicts returned by get_group()
-
delete_labels
(labels=None, ids=None, tags=None, view=None, fields=None)¶ Deletes the specified labels from the dataset.
You can specify the labels to delete via any of the following methods:
Provide the labels argument, which should contain a list of dicts in the format returned by fiftyone.core.session.Session.selected_labels
Provide the ids or tags arguments to specify the labels to delete via their IDs and/or tags
Provide the view argument to delete all of the labels in a view into this dataset. This syntax is useful if you have constructed a fiftyone.core.view.DatasetView defining the labels to delete
Additionally, you can specify the fields argument to restrict deletion to specific field(s), either for efficiency or to ensure that labels from other fields are not deleted if their contents are included in the other arguments.
- Parameters
labels (None) – a list of dicts specifying the labels to delete in the format returned by fiftyone.core.session.Session.selected_labels
ids (None) – an ID or iterable of IDs of the labels to delete
tags (None) – a tag or iterable of tags of the labels to delete
view (None) – a fiftyone.core.view.DatasetView into this dataset containing the labels to delete
fields (None) – a field or iterable of fields from which to delete labels
-
save
()¶ Saves the dataset to the database.
This only needs to be called when dataset-level information such as its
Dataset.info()
is modified.
-
property
has_saved_views
¶ Whether this dataset has any saved views.
-
has_saved_view
(name)¶ Whether this dataset has a saved view with the given name.
- Parameters
name – a saved view name
- Returns
True/False
-
list_saved_views
(info=False)¶ List saved views on this dataset.
- Parameters
info (False) – whether to return info dicts describing each saved view rather than just their names
- Returns
a list of saved view names or info dicts
-
save_view
(name, view, description=None, color=None, overwrite=False)¶ Saves the given view into this dataset under the given name so it can be loaded later via load_saved_view().
Examples:
import fiftyone as fo
import fiftyone.zoo as foz
from fiftyone import ViewField as F

dataset = foz.load_zoo_dataset("quickstart")

view = dataset.filter_labels("ground_truth", F("label") == "cat")

dataset.save_view("cats", view)

also_view = dataset.load_saved_view("cats")
assert view == also_view
- Parameters
name – a name for the saved view
view – a
fiftyone.core.view.DatasetView
description (None) – an optional string description
color (None) – an optional RGB hex string like
'#FF6D04'
overwrite (False) – whether to overwrite an existing saved view with the same name
-
get_saved_view_info
(name)¶ Loads the editable information about the saved view with the given name.
Examples:
import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart")

view = dataset.limit(10)
dataset.save_view("test", view)

print(dataset.get_saved_view_info("test"))
- Parameters
name – the name of a saved view
- Returns
a dict of editable info
-
update_saved_view_info
(name, info)¶ Updates the editable information for the saved view with the given name.
Examples:
import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart")

view = dataset.limit(10)
dataset.save_view("test", view)

# Update the saved view's name and add a description
info = dict(
    name="a new name",
    description="a description",
)
dataset.update_saved_view_info("test", info)
- Parameters
name – the name of a saved view
info – a dict whose keys are a subset of the keys returned by
get_saved_view_info()
-
load_saved_view
(name)¶ Loads the saved view with the given name.
Examples:
import fiftyone as fo
import fiftyone.zoo as foz
from fiftyone import ViewField as F

dataset = foz.load_zoo_dataset("quickstart")

view = dataset.filter_labels("ground_truth", F("label") == "cat")

dataset.save_view("cats", view)

also_view = dataset.load_saved_view("cats")
assert view == also_view
- Parameters
name – the name of a saved view
- Returns
a fiftyone.core.view.DatasetView
-
delete_saved_view
(name)¶ Deletes the saved view with the given name.
- Parameters
name – the name of a saved view
-
delete_saved_views
()¶ Deletes all saved views from this dataset.
-
property
has_workspaces
¶ Whether this dataset has any saved workspaces.
-
has_workspace
(name)¶ Whether this dataset has a saved workspace with the given name.
- Parameters
name – a saved workspace name
- Returns
True/False
-
list_workspaces
(info=False)¶ List saved workspaces on this dataset.
- Parameters
info (False) – whether to return info dicts describing each saved workspace rather than just their names
- Returns
a list of saved workspace names or info dicts
-
save_workspace
(name, workspace, description=None, color=None, overwrite=False)¶ Saves a workspace into this dataset under the given name so it can be loaded later via load_workspace().
Examples:
import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart")

embeddings_panel = fo.Panel(
    type="Embeddings",
    state=dict(
        brainResult="img_viz", colorByField="metadata.size_bytes"
    ),
)
workspace = fo.Space(children=[embeddings_panel])

workspace_name = "embeddings-workspace"
description = "Show embeddings only"

dataset.save_workspace(
    workspace_name, workspace, description=description
)
assert dataset.has_workspace(workspace_name)

also_workspace = dataset.load_workspace(workspace_name)
assert workspace == also_workspace
- Parameters
name – a name for the saved workspace
workspace – a
fiftyone.core.odm.workspace.Space
description (None) – an optional string description
color (None) – an optional RGB hex string like
'#FF6D04'
overwrite (False) – whether to overwrite an existing workspace with the same name
- Raises
ValueError – if overwrite==False and a workspace with name already exists
-
load_workspace
(name)¶ Loads the saved workspace with the given name.
Examples:
import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart")

embeddings_panel = fo.Panel(
    type="Embeddings",
    state=dict(brainResult="img_viz", colorByField="metadata.size_bytes"),
)
workspace = fo.Space(children=[embeddings_panel])

workspace_name = "embeddings-workspace"
dataset.save_workspace(workspace_name, workspace)

# Some time later ... load the workspace
loaded_workspace = dataset.load_workspace(workspace_name)
assert workspace == loaded_workspace

# Launch app with the loaded workspace!
session = fo.launch_app(dataset, spaces=loaded_workspace)

# Or set via session later on
session.spaces = loaded_workspace
- Parameters
name – the name of a saved workspace
- Returns
a fiftyone.core.odm.workspace.Space
- Raises
ValueError – if name is not a saved workspace
-
get_workspace_info
(name)¶ Gets the information about the workspace with the given name.
Examples:
import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart")

workspace = fo.Space()
description = "A really cool (apparently empty?) workspace"
dataset.save_workspace("test", workspace, description=description)

print(dataset.get_workspace_info("test"))
- Parameters
name – the name of a saved workspace
- Returns
a dict of editable info
-
update_workspace_info
(name, info)¶ Updates the editable information for the saved workspace with the given name.
Examples:
import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart")

workspace = fo.Space()
dataset.save_workspace("test", workspace)

# Update the workspace's name and add a description, color
info = dict(
    name="a new name",
    color="#FF6D04",
    description="a description",
)
dataset.update_workspace_info("test", info)
- Parameters
name – the name of a saved workspace
info – a dict whose keys are a subset of the keys returned by
get_workspace_info()
-
delete_workspace
(name)¶ Deletes the saved workspace with the given name.
- Parameters
name – the name of a saved workspace
- Raises
ValueError – if
name
is not a saved workspace
-
delete_workspaces
()¶ Deletes all saved workspaces from this dataset.
-
clone
(name=None, persistent=False)¶ Creates a copy of the dataset.
Dataset clones contain deep copies of all samples and dataset-level information in the source dataset. The source media files, however, are not copied.
- Parameters
name (None) – a name for the cloned dataset. By default, get_default_dataset_name() is used
persistent (False) – whether the cloned dataset should be persistent
- Returns
the new
Dataset
-
clear
()¶ Removes all samples from the dataset.
If a reference to a sample exists in memory, the sample will be updated such that sample.in_dataset is False.
-
clear_frames
()¶ Removes all frame labels from the dataset.
If a reference to a frame exists in memory, the frame will be updated such that frame.in_dataset is False.
-
ensure_frames
()¶ Ensures that the video dataset contains frame instances for every frame of each sample’s source video.
Empty frames will be inserted for missing frames, and already existing frames are left unchanged.
-
delete
()¶ Deletes the dataset.
Once deleted, only the name and deleted attributes of a dataset may be accessed.
If a reference to a sample exists in memory, the sample will be updated such that sample.in_dataset is False.
-
add_dir
(dataset_dir=None, dataset_type=None, data_path=None, labels_path=None, label_field=None, tags=None, expand_schema=True, dynamic=False, add_info=True, progress=None, **kwargs)¶ Adds the contents of the given directory to the dataset.
You can perform imports with this method via the following basic patterns:
Provide dataset_dir and dataset_type to import the contents of a directory that is organized in the default layout for the dataset type as documented in this guide
Provide dataset_type along with data_path, labels_path, or other type-specific parameters to perform a customized import. This syntax provides the flexibility to, for example, perform labels-only imports or imports where the source media lies in a different location than the labels
In either workflow, the remaining parameters of this method can be provided to further configure the import.
See this guide for example usages of this method and descriptions of the available dataset types.
- Parameters
dataset_dir (None) – the dataset directory. This can be omitted for certain dataset formats if you provide arguments such as data_path and labels_path
dataset_type (None) – the fiftyone.types.Dataset type of the dataset
data_path (None) –
an optional parameter that enables explicit control over the location of the media for certain dataset types. Can be any of the following:
a folder name like "data" or "data/" specifying a subfolder of dataset_dir in which the media lies
an absolute directory path in which the media lies. In this case, the dataset_dir has no effect on the location of the data
a filename like "data.json" specifying the filename of a JSON manifest file in dataset_dir that maps UUIDs to media filepaths. Files of this format are generated when passing the export_media="manifest" option to fiftyone.core.collections.SampleCollection.export()
an absolute filepath to a JSON manifest file. In this case, dataset_dir has no effect on the location of the data
a dict mapping filenames to absolute filepaths
By default, it is assumed that the data can be located in the default location within dataset_dir for the dataset type
labels_path (None) –
an optional parameter that enables explicit control over the location of the labels. Only applicable when importing certain labeled dataset formats. Can be any of the following:
a type-specific folder name like "labels" or "labels/" or a filename like "labels.json" or "labels.xml" specifying the location in dataset_dir of the labels file(s)
an absolute directory or filepath containing the labels file(s). In this case, dataset_dir has no effect on the location of the labels
For labeled datasets, this parameter defaults to the location in dataset_dir of the labels for the default layout of the dataset type being imported
label_field (None) – controls the field(s) in which imported labels are stored. Only applicable if dataset_importer is a fiftyone.utils.data.importers.LabeledImageDatasetImporter or fiftyone.utils.data.importers.LabeledVideoDatasetImporter. If the importer produces a single fiftyone.core.labels.Label instance per sample/frame, this argument specifies the name of the field to use; the default is "ground_truth". If the importer produces a dictionary of labels per sample, this argument can be either a string prefix to prepend to each label key or a dict mapping label keys to field names; the default in this case is to directly use the keys of the imported label dictionaries as field names
tags (None) – an optional tag or iterable of tags to attach to each sample
expand_schema (True) – whether to dynamically add new sample fields encountered to the dataset schema. If False, an error is raised if a sample’s schema is not a subset of the dataset schema
dynamic (False) – whether to declare dynamic attributes of embedded document fields that are encountered
add_info (True) – whether to add dataset info from the importer (if any) to the dataset’s info
progress (None) – whether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead
**kwargs – optional keyword arguments to pass to the constructor of the fiftyone.utils.data.importers.DatasetImporter for the specified dataset_type
- Returns
a list of IDs of the samples that were added to the dataset
-
merge_dir
(dataset_dir=None, dataset_type=None, data_path=None, labels_path=None, label_field=None, tags=None, key_field='filepath', key_fcn=None, skip_existing=False, insert_new=True, fields=None, omit_fields=None, merge_lists=True, overwrite=True, expand_schema=True, dynamic=False, add_info=True, progress=None, **kwargs)¶ Merges the contents of the given directory into the dataset.
Note
This method requires the ability to create unique indexes on the key_field of each collection.
See add_dir() if you want to add samples without a uniqueness constraint.
You can perform imports with this method via the following basic patterns:
Provide dataset_dir and dataset_type to import the contents of a directory that is organized in the default layout for the dataset type as documented in this guide
Provide dataset_type along with data_path, labels_path, or other type-specific parameters to perform a customized import. This syntax provides the flexibility to, for example, perform labels-only imports or imports where the source media lies in a different location than the labels
In either workflow, the remaining parameters of this method can be provided to further configure the import.
See this guide for example usages of this method and descriptions of the available dataset types.
By default, samples with the same absolute filepath are merged, but you can customize this behavior via the key_field and key_fcn parameters. For example, you could set key_fcn = lambda sample: os.path.basename(sample.filepath) to merge samples with the same base filename.
The behavior of this method is highly customizable. By default, all top-level fields from the imported samples are merged in, overwriting any existing values for those fields, with the exception of list fields (e.g., tags) and label list fields (e.g., fiftyone.core.labels.Detections fields), in which case the elements of the lists themselves are merged. In the case of label list fields, labels with the same id in both collections are updated rather than duplicated.
To avoid confusion between missing fields and fields whose value is None, None-valued fields are always treated as missing while merging.
This method can be configured in numerous ways, including:
Whether existing samples should be modified or skipped
Whether new samples should be added or omitted
Whether new fields can be added to the dataset schema
Whether list fields should be treated as ordinary fields and merged as a whole rather than merging their elements
Whether to merge only specific fields, or all but certain fields
Mapping input fields to different field names of this dataset
- Parameters
dataset_dir (None) – the dataset directory. This can be omitted for certain dataset formats if you provide arguments such as data_path and labels_path
dataset_type (None) – the fiftyone.types.Dataset type of the dataset
data_path (None) – an optional parameter that enables explicit control over the location of the media for certain dataset types. Can be any of the following:
a folder name like "data" or "data/" specifying a subfolder of dataset_dir in which the media lies
an absolute directory path in which the media lies. In this case, the dataset_dir has no effect on the location of the data
a filename like "data.json" specifying the filename of a JSON manifest file in dataset_dir that maps UUIDs to media filepaths. Files of this format are generated when passing the export_media="manifest" option to fiftyone.core.collections.SampleCollection.export()
an absolute filepath to a JSON manifest file. In this case, dataset_dir has no effect on the location of the data
a dict mapping filenames to absolute filepaths
By default, it is assumed that the data can be located in the default location within dataset_dir for the dataset type
labels_path (None) – an optional parameter that enables explicit control over the location of the labels. Only applicable when importing certain labeled dataset formats. Can be any of the following:
a type-specific folder name like "labels" or "labels/" or a filename like "labels.json" or "labels.xml" specifying the location in dataset_dir of the labels file(s)
an absolute directory or filepath containing the labels file(s). In this case, dataset_dir has no effect on the location of the labels
For labeled datasets, this parameter defaults to the location in dataset_dir of the labels for the default layout of the dataset type being imported
label_field (None) – controls the field(s) in which imported labels are stored. Only applicable if dataset_importer is a fiftyone.utils.data.importers.LabeledImageDatasetImporter or fiftyone.utils.data.importers.LabeledVideoDatasetImporter. If the importer produces a single fiftyone.core.labels.Label instance per sample/frame, this argument specifies the name of the field to use; the default is "ground_truth". If the importer produces a dictionary of labels per sample, this argument can be either a string prefix to prepend to each label key or a dict mapping label keys to field names; the default in this case is to directly use the keys of the imported label dictionaries as field names
tags (None) – an optional tag or iterable of tags to attach to each sample
key_field ("filepath") – the sample field to use to decide whether to join with an existing sample
key_fcn (None) – a function that accepts a fiftyone.core.sample.Sample instance and computes a key to decide if two samples should be merged. If a key_fcn is provided, key_field is ignored
skip_existing (False) – whether to skip existing samples (True) or merge them (False)
insert_new (True) – whether to insert new samples (True) or skip them (False)
fields (None) – an optional field or iterable of fields to which to restrict the merge. If provided, fields other than these are omitted from samples when merging or adding samples. One exception is that filepath is always included when adding new samples, since the field is required. This can also be a dict mapping field names of the input collection to field names of this dataset
omit_fields (None) – an optional field or iterable of fields to exclude from the merge. If provided, these fields are omitted from imported samples, if present. One exception is that filepath is always included when adding new samples, since the field is required
merge_lists (True) – whether to merge the elements of list fields (e.g., tags) and label list fields (e.g., fiftyone.core.labels.Detections fields) rather than merging the entire top-level field like other field types. For label list fields, existing fiftyone.core.labels.Label elements are either replaced (when overwrite is True) or kept (when overwrite is False) when their id matches a label from the provided samples
overwrite (True) – whether to overwrite (True) or skip (False) existing fields and label elements
expand_schema (True) – whether to dynamically add new fields encountered to the dataset schema. If False, an error is raised if a sample’s schema is not a subset of the dataset schema
dynamic (False) – whether to declare dynamic attributes of embedded document fields that are encountered
add_info (True) – whether to add dataset info from the importer (if any) to the dataset
progress (None) – whether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead
**kwargs – optional keyword arguments to pass to the constructor of the fiftyone.utils.data.importers.DatasetImporter for the specified dataset_type
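The key_fcn customization described above can be sketched in plain Python. The stand-in sample object and paths below are illustrative; a real call would receive fiftyone.core.sample.Sample instances, and the commented merge_dir() call assumes a hypothetical dataset directory and type:

```python
import os
from types import SimpleNamespace

# Merge samples that share a base filename rather than an absolute filepath
key_fcn = lambda sample: os.path.basename(sample.filepath)

# Stand-in for a fiftyone.core.sample.Sample (illustrative only)
demo = SimpleNamespace(filepath="/data/train/img1.jpg")
key = key_fcn(demo)  # "img1.jpg"

# dataset.merge_dir(
#     dataset_dir="/path/to/dataset",  # hypothetical path
#     dataset_type=fo.types.FiftyOneDataset,
#     key_fcn=key_fcn,
# )
```

With this key function, two samples whose media live in different directories but share a filename are treated as the same sample during the merge.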
-
add_archive
(archive_path, dataset_type=None, data_path=None, labels_path=None, label_field=None, tags=None, expand_schema=True, dynamic=False, add_info=True, cleanup=True, progress=None, **kwargs)¶ Adds the contents of the given archive to the dataset.
If a directory with the same root name as archive_path exists, it is assumed that this directory contains the extracted contents of the archive, and thus the archive is not re-extracted.
See this guide for example usages of this method and descriptions of the available dataset types.
Note
The following archive formats are explicitly supported: .zip, .tar, .tar.gz, .tgz, .tar.bz, .tbz
If an archive not in the above list is found, extraction will be attempted via the patool package, which supports many formats but may require that additional system packages be installed.
- Parameters
archive_path – the path to an archive of a dataset directory
dataset_type (None) – the fiftyone.types.Dataset type of the dataset in archive_path
data_path (None) – an optional parameter that enables explicit control over the location of the media for certain dataset types. Can be any of the following:
a folder name like "data" or "data/" specifying a subfolder of dataset_dir in which the media lies
an absolute directory path in which the media lies. In this case, the archive_path has no effect on the location of the data
a filename like "data.json" specifying the filename of a JSON manifest file in archive_path that maps UUIDs to media filepaths. Files of this format are generated when passing the export_media="manifest" option to fiftyone.core.collections.SampleCollection.export()
an absolute filepath to a JSON manifest file. In this case, archive_path has no effect on the location of the data
a dict mapping filenames to absolute filepaths
By default, it is assumed that the data can be located in the default location within archive_path for the dataset type
labels_path (None) – an optional parameter that enables explicit control over the location of the labels. Only applicable when importing certain labeled dataset formats. Can be any of the following:
a type-specific folder name like "labels" or "labels/" or a filename like "labels.json" or "labels.xml" specifying the location in archive_path of the labels file(s)
an absolute directory or filepath containing the labels file(s). In this case, archive_path has no effect on the location of the labels
For labeled datasets, this parameter defaults to the location in archive_path of the labels for the default layout of the dataset type being imported
label_field (None) – controls the field(s) in which imported labels are stored. Only applicable if dataset_importer is a fiftyone.utils.data.importers.LabeledImageDatasetImporter or fiftyone.utils.data.importers.LabeledVideoDatasetImporter. If the importer produces a single fiftyone.core.labels.Label instance per sample/frame, this argument specifies the name of the field to use; the default is "ground_truth". If the importer produces a dictionary of labels per sample, this argument can be either a string prefix to prepend to each label key or a dict mapping label keys to field names; the default in this case is to directly use the keys of the imported label dictionaries as field names
tags (None) – an optional tag or iterable of tags to attach to each sample
expand_schema (True) – whether to dynamically add new sample fields encountered to the dataset schema. If False, an error is raised if a sample’s schema is not a subset of the dataset schema
dynamic (False) – whether to declare dynamic attributes of embedded document fields that are encountered
add_info (True) – whether to add dataset info from the importer (if any) to the dataset’s info
cleanup (True) – whether to delete the archive after extracting it
progress (None) – whether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead
**kwargs – optional keyword arguments to pass to the constructor of the fiftyone.utils.data.importers.DatasetImporter for the specified dataset_type
- Returns
a list of IDs of the samples that were added to the dataset
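The supported-format note above can be checked in plain Python before calling the method. The fiftyone calls are sketched in comments with hypothetical paths and dataset names; only the extension check below is executable here:

```python
# Formats that the note above lists as natively supported
SUPPORTED_NATIVELY = (".zip", ".tar", ".tar.gz", ".tgz", ".tar.bz", ".tbz")

def needs_patool(archive_path):
    """Returns True if extraction would fall back to the patool package"""
    return not archive_path.endswith(SUPPORTED_NATIVELY)

# import fiftyone as fo
# dataset = fo.load_dataset("my-dataset")  # hypothetical name
# sample_ids = dataset.add_archive(
#     "/path/to/export.zip",
#     dataset_type=fo.types.FiftyOneDataset,
#     cleanup=False,  # keep the archive after extraction
# )
```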
-
merge_archive
(archive_path, dataset_type=None, data_path=None, labels_path=None, label_field=None, tags=None, key_field='filepath', key_fcn=None, skip_existing=False, insert_new=True, fields=None, omit_fields=None, merge_lists=True, overwrite=True, expand_schema=True, dynamic=False, add_info=True, cleanup=True, progress=None, **kwargs)¶ Merges the contents of the given archive into the dataset.
Note
This method requires the ability to create unique indexes on the key_field of each collection.
See add_archive() if you want to add samples without a uniqueness constraint.
If a directory with the same root name as archive_path exists, it is assumed that this directory contains the extracted contents of the archive, and thus the archive is not re-extracted.
See this guide for example usages of this method and descriptions of the available dataset types.
Note
The following archive formats are explicitly supported: .zip, .tar, .tar.gz, .tgz, .tar.bz, .tbz
If an archive not in the above list is found, extraction will be attempted via the patool package, which supports many formats but may require that additional system packages be installed.
By default, samples with the same absolute filepath are merged, but you can customize this behavior via the key_field and key_fcn parameters. For example, you could set key_fcn = lambda sample: os.path.basename(sample.filepath) to merge samples with the same base filename.
The behavior of this method is highly customizable. By default, all top-level fields from the imported samples are merged in, overwriting any existing values for those fields, with the exception of list fields (e.g., tags) and label list fields (e.g., fiftyone.core.labels.Detections fields), in which case the elements of the lists themselves are merged. In the case of label list fields, labels with the same id in both collections are updated rather than duplicated.
To avoid confusion between missing fields and fields whose value is None, None-valued fields are always treated as missing while merging.
This method can be configured in numerous ways, including:
Whether existing samples should be modified or skipped
Whether new samples should be added or omitted
Whether new fields can be added to the dataset schema
Whether list fields should be treated as ordinary fields and merged as a whole rather than merging their elements
Whether to merge only specific fields, or all but certain fields
Mapping input fields to different field names of this dataset
- Parameters
archive_path – the path to an archive of a dataset directory
dataset_type (None) – the fiftyone.types.Dataset type of the dataset in archive_path
data_path (None) – an optional parameter that enables explicit control over the location of the media for certain dataset types. Can be any of the following:
a folder name like "data" or "data/" specifying a subfolder of dataset_dir in which the media lies
an absolute directory path in which the media lies. In this case, the archive_path has no effect on the location of the data
a filename like "data.json" specifying the filename of a JSON manifest file in archive_path that maps UUIDs to media filepaths. Files of this format are generated when passing the export_media="manifest" option to fiftyone.core.collections.SampleCollection.export()
an absolute filepath to a JSON manifest file. In this case, archive_path has no effect on the location of the data
a dict mapping filenames to absolute filepaths
By default, it is assumed that the data can be located in the default location within archive_path for the dataset type
labels_path (None) – an optional parameter that enables explicit control over the location of the labels. Only applicable when importing certain labeled dataset formats. Can be any of the following:
a type-specific folder name like "labels" or "labels/" or a filename like "labels.json" or "labels.xml" specifying the location in archive_path of the labels file(s)
an absolute directory or filepath containing the labels file(s). In this case, archive_path has no effect on the location of the labels
For labeled datasets, this parameter defaults to the location in archive_path of the labels for the default layout of the dataset type being imported
label_field (None) – controls the field(s) in which imported labels are stored. Only applicable if dataset_importer is a fiftyone.utils.data.importers.LabeledImageDatasetImporter or fiftyone.utils.data.importers.LabeledVideoDatasetImporter. If the importer produces a single fiftyone.core.labels.Label instance per sample/frame, this argument specifies the name of the field to use; the default is "ground_truth". If the importer produces a dictionary of labels per sample, this argument can be either a string prefix to prepend to each label key or a dict mapping label keys to field names; the default in this case is to directly use the keys of the imported label dictionaries as field names
tags (None) – an optional tag or iterable of tags to attach to each sample
key_field ("filepath") – the sample field to use to decide whether to join with an existing sample
key_fcn (None) – a function that accepts a fiftyone.core.sample.Sample instance and computes a key to decide if two samples should be merged. If a key_fcn is provided, key_field is ignored
skip_existing (False) – whether to skip existing samples (True) or merge them (False)
insert_new (True) – whether to insert new samples (True) or skip them (False)
fields (None) – an optional field or iterable of fields to which to restrict the merge. If provided, fields other than these are omitted from samples when merging or adding samples. One exception is that filepath is always included when adding new samples, since the field is required. This can also be a dict mapping field names of the input collection to field names of this dataset
omit_fields (None) – an optional field or iterable of fields to exclude from the merge. If provided, these fields are omitted from imported samples, if present. One exception is that filepath is always included when adding new samples, since the field is required
merge_lists (True) – whether to merge the elements of list fields (e.g., tags) and label list fields (e.g., fiftyone.core.labels.Detections fields) rather than merging the entire top-level field like other field types. For label list fields, existing fiftyone.core.labels.Label elements are either replaced (when overwrite is True) or kept (when overwrite is False) when their id matches a label from the provided samples
overwrite (True) – whether to overwrite (True) or skip (False) existing fields and label elements
expand_schema (True) – whether to dynamically add new fields encountered to the dataset schema. If False, an error is raised if a sample’s schema is not a subset of the dataset schema
dynamic (False) – whether to declare dynamic attributes of embedded document fields that are encountered
add_info (True) – whether to add dataset info from the importer (if any) to the dataset
cleanup (True) – whether to delete the archive after extracting it
progress (None) – whether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead
**kwargs – optional keyword arguments to pass to the constructor of the fiftyone.utils.data.importers.DatasetImporter for the specified dataset_type
-
add_importer
(dataset_importer, label_field=None, tags=None, expand_schema=True, dynamic=False, add_info=True, progress=None)¶ Adds the samples from the given fiftyone.utils.data.importers.DatasetImporter to the dataset.
See this guide for more details about importing datasets in custom formats by defining your own DatasetImporter.
- Parameters
dataset_importer – a fiftyone.utils.data.importers.DatasetImporter
label_field (None) – controls the field(s) in which imported labels are stored. Only applicable if dataset_importer is a fiftyone.utils.data.importers.LabeledImageDatasetImporter or fiftyone.utils.data.importers.LabeledVideoDatasetImporter. If the importer produces a single fiftyone.core.labels.Label instance per sample/frame, this argument specifies the name of the field to use; the default is "ground_truth". If the importer produces a dictionary of labels per sample, this argument can be either a string prefix to prepend to each label key or a dict mapping label keys to field names; the default in this case is to directly use the keys of the imported label dictionaries as field names
tags (None) – an optional tag or iterable of tags to attach to each sample
expand_schema (True) – whether to dynamically add new sample fields encountered to the dataset schema. If False, an error is raised if a sample’s schema is not a subset of the dataset schema
dynamic (False) – whether to declare dynamic attributes of embedded document fields that are encountered
add_info (True) – whether to add dataset info from the importer (if any) to the dataset’s info
progress (None) – whether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead
- Returns
a list of IDs of the samples that were added to the dataset
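A custom importer is fundamentally an iterator that yields one sample per step. The plain-Python stand-in below only illustrates that protocol shape; a real importer would subclass fiftyone.utils.data.importers.LabeledImageDatasetImporter (which additionally declares metadata properties), and the CSV-style input format here is hypothetical:

```python
class CSVImageImporter:
    """Hypothetical importer-like iterator over (filepath, label) rows"""

    def __init__(self, rows):
        self._rows = rows
        self._iter = None

    def __iter__(self):
        self._iter = iter(self._rows)
        return self

    def __next__(self):
        filepath, label = next(self._iter)
        # A real LabeledImageDatasetImporter yields
        # (image_path, image_metadata, label) tuples
        return filepath, None, label

importer = CSVImageImporter([("/tmp/a.jpg", "cat"), ("/tmp/b.jpg", "dog")])
samples = list(importer)
```

Once defined as a proper subclass, the importer would be passed directly to dataset.add_importer(importer).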
-
merge_importer
(dataset_importer, label_field=None, tags=None, key_field='filepath', key_fcn=None, skip_existing=False, insert_new=True, fields=None, omit_fields=None, merge_lists=True, overwrite=True, expand_schema=True, dynamic=False, add_info=True, progress=None)¶ Merges the samples from the given fiftyone.utils.data.importers.DatasetImporter into the dataset.
Note
This method requires the ability to create unique indexes on the key_field of each collection.
See add_importer() if you want to add samples without a uniqueness constraint.
See this guide for more details about importing datasets in custom formats by defining your own DatasetImporter.
By default, samples with the same absolute filepath are merged, but you can customize this behavior via the key_field and key_fcn parameters. For example, you could set key_fcn = lambda sample: os.path.basename(sample.filepath) to merge samples with the same base filename.
The behavior of this method is highly customizable. By default, all top-level fields from the imported samples are merged in, overwriting any existing values for those fields, with the exception of list fields (e.g., tags) and label list fields (e.g., fiftyone.core.labels.Detections fields), in which case the elements of the lists themselves are merged. In the case of label list fields, labels with the same id in both collections are updated rather than duplicated.
To avoid confusion between missing fields and fields whose value is None, None-valued fields are always treated as missing while merging.
This method can be configured in numerous ways, including:
Whether existing samples should be modified or skipped
Whether new samples should be added or omitted
Whether new fields can be added to the dataset schema
Whether list fields should be treated as ordinary fields and merged as a whole rather than merging their elements
Whether to merge only specific fields, or all but certain fields
Mapping input fields to different field names of this dataset
- Parameters
dataset_importer – a fiftyone.utils.data.importers.DatasetImporter
label_field (None) – controls the field(s) in which imported labels are stored. Only applicable if dataset_importer is a fiftyone.utils.data.importers.LabeledImageDatasetImporter or fiftyone.utils.data.importers.LabeledVideoDatasetImporter. If the importer produces a single fiftyone.core.labels.Label instance per sample/frame, this argument specifies the name of the field to use; the default is "ground_truth". If the importer produces a dictionary of labels per sample, this argument can be either a string prefix to prepend to each label key or a dict mapping label keys to field names; the default in this case is to directly use the keys of the imported label dictionaries as field names
tags (None) – an optional tag or iterable of tags to attach to each sample
key_field ("filepath") – the sample field to use to decide whether to join with an existing sample
key_fcn (None) – a function that accepts a fiftyone.core.sample.Sample instance and computes a key to decide if two samples should be merged. If a key_fcn is provided, key_field is ignored
skip_existing (False) – whether to skip existing samples (True) or merge them (False)
insert_new (True) – whether to insert new samples (True) or skip them (False)
fields (None) – an optional field or iterable of fields to which to restrict the merge. If provided, fields other than these are omitted from samples when merging or adding samples. One exception is that filepath is always included when adding new samples, since the field is required. This can also be a dict mapping field names of the input collection to field names of this dataset
omit_fields (None) – an optional field or iterable of fields to exclude from the merge. If provided, these fields are omitted from imported samples, if present. One exception is that filepath is always included when adding new samples, since the field is required
merge_lists (True) – whether to merge the elements of list fields (e.g., tags) and label list fields (e.g., fiftyone.core.labels.Detections fields) rather than merging the entire top-level field like other field types. For label list fields, existing fiftyone.core.labels.Label elements are either replaced (when overwrite is True) or kept (when overwrite is False) when their id matches a label from the provided samples
overwrite (True) – whether to overwrite (True) or skip (False) existing fields and label elements
expand_schema (True) – whether to dynamically add new fields encountered to the dataset schema. If False, an error is raised if a sample’s schema is not a subset of the dataset schema
dynamic (False) – whether to declare dynamic attributes of embedded document fields that are encountered
add_info (True) – whether to add dataset info from the importer (if any) to the dataset
progress (None) – whether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead
-
add_images
(paths_or_samples, sample_parser=None, tags=None, progress=None)¶ Adds the given images to the dataset.
This operation does not read the images.
See this guide for more details about adding images to a dataset by defining your own UnlabeledImageSampleParser.
- Parameters
paths_or_samples – an iterable of data. If no sample_parser is provided, this must be an iterable of image paths. If a sample_parser is provided, this can be an arbitrary iterable whose elements can be parsed by the sample parser
sample_parser (None) – a fiftyone.utils.data.parsers.UnlabeledImageSampleParser instance to use to parse the samples
tags (None) – an optional tag or iterable of tags to attach to each sample
progress (None) – whether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead
- Returns
a list of IDs of the samples that were added to the dataset
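With no sample_parser, paths_or_samples is simply an iterable of image paths. A minimal sketch of collecting such paths (the directory and filenames below are created on the fly for illustration; the fiftyone calls are commented):

```python
import glob
import os
import tempfile

# Build a throwaway directory with two empty "images" for illustration
tmp_dir = tempfile.mkdtemp()
for name in ("001.jpg", "002.jpg"):
    open(os.path.join(tmp_dir, name), "w").close()

# Collect the image paths to pass to add_images()
image_paths = sorted(glob.glob(os.path.join(tmp_dir, "*.jpg")))

# import fiftyone as fo
# dataset = fo.Dataset()
# sample_ids = dataset.add_images(image_paths, tags=["unlabeled"])
```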
-
add_labeled_images
(samples, sample_parser, label_field=None, tags=None, expand_schema=True, dynamic=False, progress=None)¶ Adds the given labeled images to the dataset.
This operation will iterate over all provided samples, but the images will not be read (unless the sample parser requires it in order to compute image metadata).
See this guide for more details about adding labeled images to a dataset by defining your own LabeledImageSampleParser.
- Parameters
samples – an iterable of data
sample_parser – a fiftyone.utils.data.parsers.LabeledImageSampleParser instance to use to parse the samples
label_field (None) – controls the field(s) in which imported labels are stored. If the parser produces a single fiftyone.core.labels.Label instance per sample, this argument specifies the name of the field to use; the default is "ground_truth". If the parser produces a dictionary of labels per sample, this argument can be either a string prefix to prepend to each label key or a dict mapping label keys to field names; the default in this case is to directly use the keys of the imported label dictionaries as field names
tags (None) – an optional tag or iterable of tags to attach to each sample
expand_schema (True) – whether to dynamically add new sample fields encountered to the dataset schema. If False, an error is raised if a sample’s schema is not a subset of the dataset schema
dynamic (False) – whether to declare dynamic attributes of embedded document fields that are encountered
progress (None) – whether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead
- Returns
a list of IDs of the samples that were added to the dataset
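A sample parser sits between your raw sample format and the dataset. The plain-Python stand-in below only sketches the shape of that protocol for a hypothetical (filepath, label) tuple format; a real parser would subclass fiftyone.utils.data.parsers.LabeledImageSampleParser and return fiftyone.core.labels.Label instances from get_label():

```python
class TupleParser:
    """Illustrative parser stand-in for (filepath, label) tuples"""

    def __init__(self):
        self._current = None

    def with_sample(self, sample):
        # Called once per element of the input iterable
        self._current = sample

    def get_image_path(self):
        return self._current[0]

    def get_label(self):
        return self._current[1]

parser = TupleParser()
parser.with_sample(("/tmp/a.jpg", "cat"))
```

The dataset would then consume the iterable via dataset.add_labeled_images(samples, parser).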
-
add_images_dir
(images_dir, tags=None, recursive=True, progress=None)¶ Adds the given directory of images to the dataset.
See fiftyone.types.ImageDirectory for format details. In particular, note that files with non-image MIME types are omitted.
This operation does not read the images.
- Parameters
images_dir – a directory of images
tags (None) – an optional tag or iterable of tags to attach to each sample
recursive (True) – whether to recursively traverse subdirectories
progress (None) – whether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead
- Returns
a list of IDs of the samples in the dataset
-
add_images_patt
(images_patt, tags=None, progress=None)¶ Adds the given glob pattern of images to the dataset.
This operation does not read the images.
- Parameters
images_patt – a glob pattern of images like /path/to/images/*.jpg
tags (None) – an optional tag or iterable of tags to attach to each sample
progress (None) – whether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead
- Returns
a list of IDs of the samples in the dataset
-
ingest_images
(paths_or_samples, sample_parser=None, tags=None, dataset_dir=None, image_format=None, progress=None)¶ Ingests the given iterable of images into the dataset.
The images are read in-memory and written to dataset_dir.
See this guide for more details about ingesting images into a dataset by defining your own UnlabeledImageSampleParser.
- Parameters
paths_or_samples – an iterable of data. If no sample_parser is provided, this must be an iterable of image paths. If a sample_parser is provided, this can be an arbitrary iterable whose elements can be parsed by the sample parser
sample_parser (None) – a fiftyone.utils.data.parsers.UnlabeledImageSampleParser instance to use to parse the samples
tags (None) – an optional tag or iterable of tags to attach to each sample
dataset_dir (None) – the directory in which the images will be written. By default, get_default_dataset_dir() is used
image_format (None) – the image format to use to write the images to disk. By default, fiftyone.config.default_image_ext is used
progress (None) – whether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead
- Returns
a list of IDs of the samples in the dataset
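The key difference from add_images() is that ingest_images() reads each image and writes a copy into dataset_dir, so the dataset owns its media. A minimal pure-Python analogy of that copy step (filenames and bytes below are illustrative stand-ins, not a real image; the fiftyone call is commented):

```python
import os
import shutil
import tempfile

src_dir = tempfile.mkdtemp()
dataset_dir = tempfile.mkdtemp()

# Create a stand-in source "image"
src = os.path.join(src_dir, "frame.jpg")
with open(src, "wb") as f:
    f.write(b"\xff\xd8\xff")  # placeholder bytes, not a valid JPEG

# Ingestion copies the media into dataset_dir
shutil.copy(src, os.path.join(dataset_dir, "frame.jpg"))

# import fiftyone as fo
# dataset = fo.Dataset()
# dataset.ingest_images([src], dataset_dir=dataset_dir)
```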
-
ingest_labeled_images
(samples, sample_parser, label_field=None, tags=None, expand_schema=True, dynamic=False, dataset_dir=None, image_format=None, progress=None)¶ Ingests the given iterable of labeled image samples into the dataset.
The images are read in-memory and written to dataset_dir.
See this guide for more details about ingesting labeled images into a dataset by defining your own LabeledImageSampleParser.
- Parameters
samples – an iterable of data
sample_parser – a fiftyone.utils.data.parsers.LabeledImageSampleParser instance to use to parse the samples
label_field (None) – controls the field(s) in which imported labels are stored. If the parser produces a single fiftyone.core.labels.Label instance per sample, this argument specifies the name of the field to use; the default is "ground_truth". If the parser produces a dictionary of labels per sample, this argument can be either a string prefix to prepend to each label key or a dict mapping label keys to field names; the default in this case is to directly use the keys of the imported label dictionaries as field names
tags (None) – an optional tag or iterable of tags to attach to each sample
expand_schema (True) – whether to dynamically add new sample fields encountered to the dataset schema. If False, an error is raised if the sample’s schema is not a subset of the dataset schema
dynamic (False) – whether to declare dynamic attributes of embedded document fields that are encountered
dataset_dir (None) – the directory in which the images will be written. By default, get_default_dataset_dir() is used
image_format (None) – the image format to use to write the images to disk. By default, fiftyone.config.default_image_ext is used
progress (None) – whether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead
- Returns
a list of IDs of the samples in the dataset
-
add_videos
(paths_or_samples, sample_parser=None, tags=None, progress=None)¶ Adds the given videos to the dataset.
This operation does not read the videos.
See this guide for more details about adding videos to a dataset by defining your own UnlabeledVideoSampleParser.
- Parameters
paths_or_samples – an iterable of data. If no sample_parser is provided, this must be an iterable of video paths. If a sample_parser is provided, this can be an arbitrary iterable whose elements can be parsed by the sample parser
sample_parser (None) – a fiftyone.utils.data.parsers.UnlabeledVideoSampleParser instance to use to parse the samples
tags (None) – an optional tag or iterable of tags to attach to each sample
progress (None) – whether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead
- Returns
a list of IDs of the samples that were added to the dataset
-
add_labeled_videos
(samples, sample_parser, label_field=None, tags=None, expand_schema=True, dynamic=False, progress=None)¶ Adds the given labeled videos to the dataset.
This operation will iterate over all provided samples, but the videos will not be read/decoded/etc.
See this guide for more details about adding labeled videos to a dataset by defining your own LabeledVideoSampleParser.
- Parameters
samples – an iterable of data
sample_parser – a fiftyone.utils.data.parsers.LabeledVideoSampleParser instance to use to parse the samples
label_field (None) – controls the field(s) in which imported labels are stored. If the parser produces a single fiftyone.core.labels.Label instance per sample/frame, this argument specifies the name of the field to use; the default is "ground_truth". If the parser produces a dictionary of labels per sample/frame, this argument can be either a string prefix to prepend to each label key or a dict mapping label keys to field names; the default in this case is to directly use the keys of the imported label dictionaries as field names
tags (None) – an optional tag or iterable of tags to attach to each sample
expand_schema (True) – whether to dynamically add new sample fields encountered to the dataset schema. If False, an error is raised if a sample’s schema is not a subset of the dataset schema
dynamic (False) – whether to declare dynamic attributes of embedded document fields that are encountered
progress (None) – whether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead
- Returns
a list of IDs of the samples that were added to the dataset
-
add_videos_dir
(videos_dir, tags=None, recursive=True, progress=None)¶ Adds the given directory of videos to the dataset.
See
fiftyone.types.VideoDirectory
for format details. In particular, note that files with non-video MIME types are omitted.
This operation does not read/decode the videos.
- Parameters
videos_dir – a directory of videos
tags (None) – an optional tag or iterable of tags to attach to each sample
recursive (True) – whether to recursively traverse subdirectories
progress (None) – whether to render a progress bar (True/False), use the default value
fiftyone.config.show_progress_bars
(None), or a progress callback function to invoke instead
- Returns
a list of IDs of the samples in the dataset
-
add_videos_patt
(videos_patt, tags=None, progress=None)¶ Adds the given glob pattern of videos to the dataset.
This operation does not read/decode the videos.
- Parameters
videos_patt – a glob pattern of videos like
/path/to/videos/*.mp4
tags (None) – an optional tag or iterable of tags to attach to each sample
progress (None) – whether to render a progress bar (True/False), use the default value
fiftyone.config.show_progress_bars
(None), or a progress callback function to invoke instead
- Returns
a list of IDs of the samples in the dataset
-
ingest_videos
(paths_or_samples, sample_parser=None, tags=None, dataset_dir=None, progress=None)¶ Ingests the given iterable of videos into the dataset.
The videos are copied to
dataset_dir
.
See this guide for more details about ingesting videos into a dataset by defining your own
UnlabeledVideoSampleParser
.- Parameters
paths_or_samples – an iterable of data. If no
sample_parser
is provided, this must be an iterable of video paths. If a sample_parser
is provided, this can be an arbitrary iterable whose elements can be parsed by the sample parser
sample_parser (None) – a
fiftyone.utils.data.parsers.UnlabeledVideoSampleParser
instance to use to parse the samples
tags (None) – an optional tag or iterable of tags to attach to each sample
dataset_dir (None) – the directory in which the videos will be written. By default,
get_default_dataset_dir()
is used
progress (None) – whether to render a progress bar (True/False), use the default value
fiftyone.config.show_progress_bars
(None), or a progress callback function to invoke instead
- Returns
a list of IDs of the samples in the dataset
-
ingest_labeled_videos
(samples, sample_parser, tags=None, expand_schema=True, dynamic=False, dataset_dir=None, progress=None)¶ Ingests the given iterable of labeled video samples into the dataset.
The videos are copied to
dataset_dir
.
See this guide for more details about ingesting labeled videos into a dataset by defining your own
LabeledVideoSampleParser
.- Parameters
samples – an iterable of data
sample_parser – a
fiftyone.utils.data.parsers.LabeledVideoSampleParser
instance to use to parse the samples
tags (None) – an optional tag or iterable of tags to attach to each sample
expand_schema (True) – whether to dynamically add new sample fields encountered to the dataset schema. If False, an error is raised if the sample’s schema is not a subset of the dataset schema
dynamic (False) – whether to declare dynamic attributes of embedded document fields that are encountered
dataset_dir (None) – the directory in which the videos will be written. By default,
get_default_dataset_dir()
is used
progress (None) – whether to render a progress bar (True/False), use the default value
fiftyone.config.show_progress_bars
(None), or a progress callback function to invoke instead
- Returns
a list of IDs of the samples in the dataset
-
classmethod
from_dir
(dataset_dir=None, dataset_type=None, data_path=None, labels_path=None, name=None, persistent=False, overwrite=False, label_field=None, tags=None, dynamic=False, progress=None, **kwargs)¶ Creates a
Dataset
from the contents of the given directory.
You can create datasets with this method via the following basic patterns:
Provide
dataset_dir
anddataset_type
to import the contents of a directory that is organized in the default layout for the dataset type as documented in this guide
Provide
dataset_type
along withdata_path
,labels_path
, or other type-specific parameters to perform a customized import. This syntax provides the flexibility to, for example, perform labels-only imports or imports where the source media lies in a different location than the labels
In either workflow, the remaining parameters of this method can be provided to further configure the import.
See this guide for example usages of this method and descriptions of the available dataset types.
- Parameters
dataset_dir (None) – the dataset directory. This can be omitted if you provide arguments such as
data_path
andlabels_path
dataset_type (None) – the
fiftyone.types.Dataset
type of the dataset
data_path (None) –
an optional parameter that enables explicit control over the location of the media for certain dataset types. Can be any of the following:
a folder name like "data" or "data/" specifying a subfolder of dataset_dir in which the media lies
an absolute directory path in which the media lies. In this case, the dataset_dir has no effect on the location of the data
a filename like "data.json" specifying the filename of a JSON manifest file in dataset_dir that maps UUIDs to media filepaths. Files of this format are generated when passing the export_media="manifest" option to fiftyone.core.collections.SampleCollection.export()
an absolute filepath to a JSON manifest file. In this case, dataset_dir has no effect on the location of the data
a dict mapping filenames to absolute filepaths
By default, it is assumed that the data can be located in the default location within
dataset_dir
for the dataset type
labels_path (None) –
an optional parameter that enables explicit control over the location of the labels. Only applicable when importing certain labeled dataset formats. Can be any of the following:
a type-specific folder name like "labels" or "labels/" or a filename like "labels.json" or "labels.xml" specifying the location in dataset_dir of the labels file(s)
an absolute directory or filepath containing the labels file(s). In this case, dataset_dir has no effect on the location of the labels
For labeled datasets, this parameter defaults to the location in
dataset_dir
of the labels for the default layout of the dataset type being imported
name (None) – a name for the dataset. By default,
get_default_dataset_name()
is used
persistent (False) – whether the dataset should persist in the database after the session terminates
overwrite (False) – whether to overwrite an existing dataset of the same name
label_field (None) – controls the field(s) in which imported labels are stored. Only applicable if dataset_importer is a fiftyone.utils.data.importers.LabeledImageDatasetImporter or fiftyone.utils.data.importers.LabeledVideoDatasetImporter. If the importer produces a single fiftyone.core.labels.Label instance per sample/frame, this argument specifies the name of the field to use; the default is "ground_truth". If the importer produces a dictionary of labels per sample, this argument can be either a string prefix to prepend to each label key or a dict mapping label keys to field names; the default in this case is to directly use the keys of the imported label dictionaries as field names
tags (None) – an optional tag or iterable of tags to attach to each sample
dynamic (False) – whether to declare dynamic attributes of embedded document fields that are encountered
progress (None) – whether to render a progress bar (True/False), use the default value
fiftyone.config.show_progress_bars
(None), or a progress callback function to invoke instead
**kwargs – optional keyword arguments to pass to the constructor of the
fiftyone.utils.data.importers.DatasetImporter
for the specified dataset_type
- Returns
a
Dataset
-
classmethod
from_archive
(archive_path, dataset_type=None, data_path=None, labels_path=None, name=None, persistent=False, overwrite=False, label_field=None, tags=None, dynamic=False, cleanup=True, progress=None, **kwargs)¶ Creates a
Dataset
from the contents of the given archive.If a directory with the same root name as
archive_path
exists, it is assumed that this directory contains the extracted contents of the archive, and thus the archive is not re-extracted.
See this guide for example usages of this method and descriptions of the available dataset types.
Note
The following archive formats are explicitly supported:
.zip, .tar, .tar.gz, .tgz, .tar.bz, .tbz
If an archive not in the above list is found, extraction will be attempted via the
patool
package, which supports many formats but may require that additional system packages be installed.- Parameters
archive_path – the path to an archive of a dataset directory
dataset_type (None) – the
fiftyone.types.Dataset
type of the dataset in archive_path
data_path (None) –
an optional parameter that enables explicit control over the location of the media for certain dataset types. Can be any of the following:
a folder name like "data" or "data/" specifying a subfolder of dataset_dir in which the media lies
an absolute directory path in which the media lies. In this case, the archive_path has no effect on the location of the data
a filename like "data.json" specifying the filename of a JSON manifest file in archive_path that maps UUIDs to media filepaths. Files of this format are generated when passing the export_media="manifest" option to fiftyone.core.collections.SampleCollection.export()
an absolute filepath to a JSON manifest file. In this case, archive_path has no effect on the location of the data
a dict mapping filenames to absolute filepaths
By default, it is assumed that the data can be located in the default location within
archive_path
for the dataset type
labels_path (None) –
an optional parameter that enables explicit control over the location of the labels. Only applicable when importing certain labeled dataset formats. Can be any of the following:
a type-specific folder name like "labels" or "labels/" or a filename like "labels.json" or "labels.xml" specifying the location in archive_path of the labels file(s)
an absolute directory or filepath containing the labels file(s). In this case, archive_path has no effect on the location of the labels
For labeled datasets, this parameter defaults to the location in
archive_path
of the labels for the default layout of the dataset type being imported
name (None) – a name for the dataset. By default,
get_default_dataset_name()
is used
persistent (False) – whether the dataset should persist in the database after the session terminates
overwrite (False) – whether to overwrite an existing dataset of the same name
label_field (None) – controls the field(s) in which imported labels are stored. Only applicable if dataset_importer is a fiftyone.utils.data.importers.LabeledImageDatasetImporter or fiftyone.utils.data.importers.LabeledVideoDatasetImporter. If the importer produces a single fiftyone.core.labels.Label instance per sample/frame, this argument specifies the name of the field to use; the default is "ground_truth". If the importer produces a dictionary of labels per sample, this argument can be either a string prefix to prepend to each label key or a dict mapping label keys to field names; the default in this case is to directly use the keys of the imported label dictionaries as field names
tags (None) – an optional tag or iterable of tags to attach to each sample
dynamic (False) – whether to declare dynamic attributes of embedded document fields that are encountered
cleanup (True) – whether to delete the archive after extracting it
progress (None) – whether to render a progress bar (True/False), use the default value
fiftyone.config.show_progress_bars
(None), or a progress callback function to invoke instead
**kwargs – optional keyword arguments to pass to the constructor of the
fiftyone.utils.data.importers.DatasetImporter
for the specified dataset_type
- Returns
a
Dataset
-
classmethod
from_importer
(dataset_importer, name=None, persistent=False, overwrite=False, label_field=None, tags=None, dynamic=False, progress=None)¶ Creates a
Dataset
by importing the samples in the given fiftyone.utils.data.importers.DatasetImporter
.
See this guide for more details about providing a custom
DatasetImporter
to import datasets into FiftyOne.- Parameters
dataset_importer – a
fiftyone.utils.data.importers.DatasetImporter
name (None) – a name for the dataset. By default,
get_default_dataset_name()
is used
persistent (False) – whether the dataset should persist in the database after the session terminates
overwrite (False) – whether to overwrite an existing dataset of the same name
label_field (None) – controls the field(s) in which imported labels are stored. Only applicable if dataset_importer is a fiftyone.utils.data.importers.LabeledImageDatasetImporter or fiftyone.utils.data.importers.LabeledVideoDatasetImporter. If the importer produces a single fiftyone.core.labels.Label instance per sample/frame, this argument specifies the name of the field to use; the default is "ground_truth". If the importer produces a dictionary of labels per sample, this argument can be either a string prefix to prepend to each label key or a dict mapping label keys to field names; the default in this case is to directly use the keys of the imported label dictionaries as field names
tags (None) – an optional tag or iterable of tags to attach to each sample
dynamic (False) – whether to declare dynamic attributes of embedded document fields that are encountered
progress (None) – whether to render a progress bar (True/False), use the default value
fiftyone.config.show_progress_bars
(None), or a progress callback function to invoke instead
- Returns
a
Dataset
-
classmethod
from_images
(paths_or_samples, sample_parser=None, name=None, persistent=False, overwrite=False, tags=None, progress=None)¶ Creates a
Dataset
from the given images.
This operation does not read the images.
See this guide for more details about providing a custom
UnlabeledImageSampleParser
to load image samples into FiftyOne.- Parameters
paths_or_samples – an iterable of data. If no
sample_parser
is provided, this must be an iterable of image paths. If a sample_parser
is provided, this can be an arbitrary iterable whose elements can be parsed by the sample parser
sample_parser (None) – a
fiftyone.utils.data.parsers.UnlabeledImageSampleParser
instance to use to parse the samples
name (None) – a name for the dataset. By default,
get_default_dataset_name()
is used
persistent (False) – whether the dataset should persist in the database after the session terminates
overwrite (False) – whether to overwrite an existing dataset of the same name
tags (None) – an optional tag or iterable of tags to attach to each sample
progress (None) – whether to render a progress bar (True/False), use the default value
fiftyone.config.show_progress_bars
(None), or a progress callback function to invoke instead
- Returns
a
Dataset
-
classmethod
from_labeled_images
(samples, sample_parser, name=None, persistent=False, overwrite=False, label_field=None, tags=None, dynamic=False, progress=None)¶ Creates a
Dataset
from the given labeled images.
This operation will iterate over all provided samples, but the images will not be read.
See this guide for more details about providing a custom
LabeledImageSampleParser
to load labeled image samples into FiftyOne.- Parameters
samples – an iterable of data
sample_parser – a
fiftyone.utils.data.parsers.LabeledImageSampleParser
instance to use to parse the samples
name (None) – a name for the dataset. By default,
get_default_dataset_name()
is used
persistent (False) – whether the dataset should persist in the database after the session terminates
overwrite (False) – whether to overwrite an existing dataset of the same name
label_field (None) – controls the field(s) in which imported labels are stored. If the parser produces a single
fiftyone.core.labels.Label
instance per sample, this argument specifies the name of the field to use; the default is "ground_truth"
. If the parser produces a dictionary of labels per sample, this argument can be either a string prefix to prepend to each label key or a dict mapping label keys to field names; the default in this case is to directly use the keys of the imported label dictionaries as field names
tags (None) – an optional tag or iterable of tags to attach to each sample
dynamic (False) – whether to declare dynamic attributes of embedded document fields that are encountered
progress (None) – whether to render a progress bar (True/False), use the default value
fiftyone.config.show_progress_bars
(None), or a progress callback function to invoke instead
- Returns
a
Dataset
-
classmethod
from_images_dir
(images_dir, name=None, persistent=False, overwrite=False, tags=None, recursive=True, progress=None)¶ Creates a
Dataset
from the given directory of images.
This operation does not read the images.
- Parameters
images_dir – a directory of images
name (None) – a name for the dataset. By default,
get_default_dataset_name()
is used
persistent (False) – whether the dataset should persist in the database after the session terminates
overwrite (False) – whether to overwrite an existing dataset of the same name
tags (None) – an optional tag or iterable of tags to attach to each sample
recursive (True) – whether to recursively traverse subdirectories
progress (None) – whether to render a progress bar (True/False), use the default value
fiftyone.config.show_progress_bars
(None), or a progress callback function to invoke instead
- Returns
a
Dataset
-
classmethod
from_images_patt
(images_patt, name=None, persistent=False, overwrite=False, tags=None, progress=None)¶ Creates a
Dataset
from the given glob pattern of images.
This operation does not read the images.
- Parameters
images_patt – a glob pattern of images like
/path/to/images/*.jpg
name (None) – a name for the dataset. By default,
get_default_dataset_name()
is used
persistent (False) – whether the dataset should persist in the database after the session terminates
overwrite (False) – whether to overwrite an existing dataset of the same name
tags (None) – an optional tag or iterable of tags to attach to each sample
progress (None) – whether to render a progress bar (True/False), use the default value
fiftyone.config.show_progress_bars
(None), or a progress callback function to invoke instead
- Returns
a
Dataset
-
classmethod
from_videos
(paths_or_samples, sample_parser=None, name=None, persistent=False, overwrite=False, tags=None, progress=None)¶ Creates a
Dataset
from the given videos.
This operation does not read/decode the videos.
See this guide for more details about providing a custom
UnlabeledVideoSampleParser
to load video samples into FiftyOne.- Parameters
paths_or_samples – an iterable of data. If no
sample_parser
is provided, this must be an iterable of video paths. If a sample_parser
is provided, this can be an arbitrary iterable whose elements can be parsed by the sample parser
sample_parser (None) – a
fiftyone.utils.data.parsers.UnlabeledVideoSampleParser
instance to use to parse the samples
name (None) – a name for the dataset. By default,
get_default_dataset_name()
is used
persistent (False) – whether the dataset should persist in the database after the session terminates
overwrite (False) – whether to overwrite an existing dataset of the same name
tags (None) – an optional tag or iterable of tags to attach to each sample
progress (None) – whether to render a progress bar (True/False), use the default value
fiftyone.config.show_progress_bars
(None), or a progress callback function to invoke instead
- Returns
a
Dataset
-
classmethod
from_labeled_videos
(samples, sample_parser, name=None, persistent=False, overwrite=False, label_field=None, tags=None, dynamic=False, progress=None)¶ Creates a
Dataset
from the given labeled videos.
This operation will iterate over all provided samples, but the videos will not be read/decoded/etc.
See this guide for more details about providing a custom
LabeledVideoSampleParser
to load labeled video samples into FiftyOne.- Parameters
samples – an iterable of data
sample_parser – a
fiftyone.utils.data.parsers.LabeledVideoSampleParser
instance to use to parse the samples
name (None) – a name for the dataset. By default,
get_default_dataset_name()
is used
persistent (False) – whether the dataset should persist in the database after the session terminates
overwrite (False) – whether to overwrite an existing dataset of the same name
label_field (None) – controls the field(s) in which imported labels are stored. If the parser produces a single
fiftyone.core.labels.Label
instance per sample/frame, this argument specifies the name of the field to use; the default is "ground_truth"
. If the parser produces a dictionary of labels per sample/frame, this argument can be either a string prefix to prepend to each label key or a dict mapping label keys to field names; the default in this case is to directly use the keys of the imported label dictionaries as field names
tags (None) – an optional tag or iterable of tags to attach to each sample
dynamic (False) – whether to declare dynamic attributes of embedded document fields that are encountered
progress (None) – whether to render a progress bar (True/False), use the default value
fiftyone.config.show_progress_bars
(None), or a progress callback function to invoke instead
- Returns
a
Dataset
-
classmethod
from_videos_dir
(videos_dir, name=None, persistent=False, overwrite=False, tags=None, recursive=True, progress=None)¶ Creates a
Dataset
from the given directory of videos.
This operation does not read/decode the videos.
- Parameters
videos_dir – a directory of videos
name (None) – a name for the dataset. By default,
get_default_dataset_name()
is used
persistent (False) – whether the dataset should persist in the database after the session terminates
overwrite (False) – whether to overwrite an existing dataset of the same name
tags (None) – an optional tag or iterable of tags to attach to each sample
recursive (True) – whether to recursively traverse subdirectories
progress (None) – whether to render a progress bar (True/False), use the default value
fiftyone.config.show_progress_bars
(None), or a progress callback function to invoke instead
- Returns
a
Dataset
-
classmethod
from_videos_patt
(videos_patt, name=None, persistent=False, overwrite=False, tags=None, progress=None)¶ Creates a
Dataset
from the given glob pattern of videos.
This operation does not read/decode the videos.
- Parameters
videos_patt – a glob pattern of videos like
/path/to/videos/*.mp4
name (None) – a name for the dataset. By default,
get_default_dataset_name()
is used
persistent (False) – whether the dataset should persist in the database after the session terminates
overwrite (False) – whether to overwrite an existing dataset of the same name
tags (None) – an optional tag or iterable of tags to attach to each sample
progress (None) – whether to render a progress bar (True/False), use the default value
fiftyone.config.show_progress_bars
(None), or a progress callback function to invoke instead
- Returns
a
Dataset
-
classmethod
from_dict
(d, name=None, persistent=False, overwrite=False, rel_dir=None, frame_labels_dir=None, progress=None)¶ Loads a
Dataset
from a JSON dictionary generated by fiftyone.core.collections.SampleCollection.to_dict().
The JSON dictionary can contain an export of any
fiftyone.core.collections.SampleCollection
, e.g., Dataset or fiftyone.core.view.DatasetView
.- Parameters
d – a JSON dictionary
name (None) – a name for the new dataset
persistent (False) – whether the dataset should persist in the database after the session terminates
overwrite (False) – whether to overwrite an existing dataset of the same name
rel_dir (None) – a relative directory to prepend to the
filepath
of each sample if the filepath is not absolute (begins with a path separator). The path is converted to an absolute path (if necessary) viafiftyone.core.storage.normalize_path()
frame_labels_dir (None) – a directory of per-sample JSON files containing the frame labels for video samples. If omitted, it is assumed that the frame labels are included directly in the provided JSON dict. Only applicable to datasets that contain videos
progress (None) – whether to render a progress bar (True/False), use the default value
fiftyone.config.show_progress_bars
(None), or a progress callback function to invoke instead
- Returns
a
Dataset
-
classmethod
from_json
(path_or_str, name=None, persistent=False, overwrite=False, rel_dir=None, frame_labels_dir=None, progress=None)¶ Loads a
Dataset
from JSON generated by fiftyone.core.collections.SampleCollection.write_json() or fiftyone.core.collections.SampleCollection.to_json().
The JSON file can contain an export of any
fiftyone.core.collections.SampleCollection
, e.g., Dataset or fiftyone.core.view.DatasetView
.- Parameters
path_or_str – the path to a JSON file on disk or a JSON string
name (None) – a name for the new dataset
persistent (False) – whether the dataset should persist in the database after the session terminates
overwrite (False) – whether to overwrite an existing dataset of the same name
rel_dir (None) – a relative directory to prepend to the
filepath
of each sample, if the filepath is not absolute (begins with a path separator). The path is converted to an absolute path (if necessary) viafiftyone.core.storage.normalize_path()
progress (None) – whether to render a progress bar (True/False), use the default value
fiftyone.config.show_progress_bars
(None), or a progress callback function to invoke instead
- Returns
a
Dataset
-
reload
()¶ Reloads the dataset and any in-memory samples from the database.
-
clear_cache
()¶ Clears the dataset’s in-memory cache.
Dataset caches may contain sample/frame singletons and annotation/brain/evaluation/custom runs.
-
add_stage
(stage)¶ Applies the given
fiftyone.core.stages.ViewStage
to the collection.- Parameters
stage – a
fiftyone.core.stages.ViewStage
- Returns
a fiftyone.core.view.DatasetView
-
aggregate
(aggregations)¶ Aggregates one or more
fiftyone.core.aggregations.Aggregation
instances.Note that it is best practice to group aggregations into a single call to
aggregate()
, as this will be more efficient than performing multiple aggregations in series.- Parameters
aggregations – an
fiftyone.core.aggregations.Aggregation
or iterable offiftyone.core.aggregations.Aggregation
instances- Returns
an aggregation result or list of aggregation results corresponding to the input aggregation(s)
-
annotate
(anno_key, label_schema=None, label_field=None, label_type=None, classes=None, attributes=True, mask_targets=None, allow_additions=True, allow_deletions=True, allow_label_edits=True, allow_index_edits=True, allow_spatial_edits=True, media_field='filepath', backend=None, launch_editor=False, **kwargs)¶ Exports the samples and optional label field(s) in this collection to the given annotation backend.
The
backend
parameter controls which annotation backend to use. Depending on the backend you use, you may want/need to provide extra keyword arguments to this function for the constructor of the backend’s
fiftyone.utils.annotations.AnnotationBackendConfig
class.
The natively provided backends and their associated config classes are:
"labelstudio"
:fiftyone.utils.labelstudio.LabelStudioBackendConfig
"labelbox"
:fiftyone.utils.labelbox.LabelboxBackendConfig
See this page for more information about using this method, including how to define label schemas and how to configure login credentials for your annotation provider.
- Parameters
anno_key – a string key to use to refer to this annotation run
label_schema (None) – a dictionary defining the label schema to use. If this argument is provided, it takes precedence over the other schema-related arguments
label_field (None) – a string indicating a new or existing label field to annotate
label_type (None) –
a string indicating the type of labels to annotate. The possible values are:
"classification"
: a single classification stored infiftyone.core.labels.Classification
fields"classifications"
: multilabel classifications stored infiftyone.core.labels.Classifications
fields"detections"
: object detections stored infiftyone.core.labels.Detections
fields"instances"
: instance segmentations stored infiftyone.core.labels.Detections
fields with theirmask
attributes populated"polylines"
: polylines stored infiftyone.core.labels.Polylines
fields with theirfilled
attributes set toFalse
"polygons"
: polygons stored infiftyone.core.labels.Polylines
fields with theirfilled
attributes set toTrue
"keypoints"
: keypoints stored infiftyone.core.labels.Keypoints
fields"segmentation"
: semantic segmentations stored infiftyone.core.labels.Segmentation
fields"scalar"
: scalar labels stored infiftyone.core.fields.IntField
,fiftyone.core.fields.FloatField
,fiftyone.core.fields.StringField
, orfiftyone.core.fields.BooleanField
fields
All new label fields must have their type specified via this argument or in
label_schema
. Note that annotation backends may not support all label types
classes (None) – a list of strings indicating the class options for
label_field
or all fields in label_schema
without classes specified. All new label fields must have a class list provided via one of the supported methods. For existing label fields, if classes are not provided by this argument nor label_schema
, they are retrieved from get_classes()
if possible, or else the observed labels on your dataset are used
attributes (True) –
specifies the label attributes of each label field to include (other than their
label
, which is always included) in the annotation export. Can be any of the following:
True: export all label attributes
False: don’t export any custom label attributes
a list of label attributes to export
a dict mapping attribute names to dicts specifying the type, values, and default for each attribute
If a
label_schema
is also provided, this parameter determines which attributes are included for all fields that do not explicitly define their per-field attributes (in addition to any per-class attributes)
mask_targets (None) – a dict mapping pixel values to semantic label strings. Only applicable when annotating semantic segmentations
allow_additions (True) – whether to allow new labels to be added. Only applicable when editing existing label fields
allow_deletions (True) – whether to allow labels to be deleted. Only applicable when editing existing label fields
allow_label_edits (True) – whether to allow the
label
attribute of existing labels to be modified. Only applicable when editing existing fields with label attributes
allow_index_edits (True) – whether to allow the
index
attribute of existing video tracks to be modified. Only applicable when editing existing frame fields with index attributes
allow_spatial_edits (True) – whether to allow edits to the spatial properties (bounding boxes, vertices, keypoints, masks, etc) of labels. Only applicable when editing existing spatial label fields
media_field ("filepath") – the field containing the paths to the media files to upload
backend (None) – the annotation backend to use. The supported values are
fiftyone.annotation_config.backends.keys()
and the default is fiftyone.annotation_config.default_backend
launch_editor (False) – whether to launch the annotation backend’s editor after uploading the samples
**kwargs – keyword arguments for the
fiftyone.utils.annotations.AnnotationBackendConfig
- Returns
an
fiftyone.utils.annotations.AnnotationResults
-
apply_model
(model, label_field='predictions', confidence_thresh=None, store_logits=False, batch_size=None, num_workers=None, skip_failures=True, output_dir=None, rel_dir=None, progress=None, **kwargs)¶ Applies the model to the samples in the collection.
This method supports all of the following cases:
Applying an image model to an image collection
Applying an image model to the frames of a video collection
Applying a video model to a video collection
- Parameters
model – a fiftyone.core.models.Model, Hugging Face Transformers model, Ultralytics model, SuperGradients model, or Lightning Flash model
label_field ("predictions") – the name of the field in which to store the model predictions. When performing inference on video frames, the "frames." prefix is optional
confidence_thresh (None) – an optional confidence threshold to apply to any applicable labels generated by the model
store_logits (False) – whether to store logits for the model predictions. This is only supported when the provided model has logits, model.has_logits == True
batch_size (None) – an optional batch size to use, if the model supports batching
num_workers (None) – the number of workers for the torch.utils.data.DataLoader to use. Only applicable for Torch-based models
skip_failures (True) – whether to gracefully continue without raising an error if predictions cannot be generated for a sample. Only applicable to fiftyone.core.models.Model instances
output_dir (None) – an optional output directory in which to write segmentation images. Only applicable if the model generates segmentations. If none is provided, the segmentations are stored in the database
rel_dir (None) – an optional relative directory to strip from each input filepath to generate a unique identifier that is joined with output_dir to generate an output path for each segmentation image. This argument allows for populating nested subdirectories in output_dir that match the shape of the input paths. The path is converted to an absolute path (if necessary) via fiftyone.core.storage.normalize_path()
progress (None) – whether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead
**kwargs – optional model-specific keyword arguments passed through to the underlying inference implementation
-
bounds(field_or_expr, expr=None, safe=False)¶
Computes the bounds of a numeric field of the collection.
None-valued fields are ignored.
This aggregation is typically applied to numeric field types (or lists of such types):
Examples:
import fiftyone as fo
from fiftyone import ViewField as F

dataset = fo.Dataset()
dataset.add_samples(
    [
        fo.Sample(
            filepath="/path/to/image1.png",
            numeric_field=1.0,
            numeric_list_field=[1, 2, 3],
        ),
        fo.Sample(
            filepath="/path/to/image2.png",
            numeric_field=4.0,
            numeric_list_field=[1, 2],
        ),
        fo.Sample(
            filepath="/path/to/image3.png",
            numeric_field=None,
            numeric_list_field=None,
        ),
    ]
)

#
# Compute the bounds of a numeric field
#

bounds = dataset.bounds("numeric_field")
print(bounds)  # (min, max)

#
# Compute the bounds of a numeric list field
#

bounds = dataset.bounds("numeric_list_field")
print(bounds)  # (min, max)

#
# Compute the bounds of a transformation of a numeric field
#

bounds = dataset.bounds(2 * (F("numeric_field") + 1))
print(bounds)  # (min, max)
- Parameters
field_or_expr – a field name, embedded.field.name, fiftyone.core.expressions.ViewExpression, or MongoDB expression defining the field or expression to aggregate. This can also be a list or tuple of such arguments, in which case a tuple of corresponding aggregation results (each receiving the same additional keyword arguments, if any) will be returned
expr (None) – a fiftyone.core.expressions.ViewExpression or MongoDB expression to apply to field_or_expr (which must be a field) before aggregating
safe (False) – whether to ignore nan/inf values when dealing with floating point values
- Returns
the (min, max) bounds
-
compute_embeddings(model, embeddings_field=None, batch_size=None, num_workers=None, skip_failures=True, progress=None, **kwargs)¶
Computes embeddings for the samples in the collection using the given model.
This method supports all the following cases:
Using an image model to compute embeddings for an image collection
Using an image model to compute frame embeddings for a video collection
Using a video model to compute embeddings for a video collection
The model must expose embeddings, i.e., fiftyone.core.models.Model.has_embeddings() must return True.
If an embeddings_field is provided, the embeddings are saved to the samples; otherwise, the embeddings are returned in-memory.
- Parameters
model – a fiftyone.core.models.Model, Hugging Face Transformers model, Ultralytics model, SuperGradients model, or Lightning Flash model
embeddings_field (None) – the name of a field in which to store the embeddings. When computing video frame embeddings, the "frames." prefix is optional
batch_size (None) – an optional batch size to use, if the model supports batching
num_workers (None) – the number of workers for the torch.utils.data.DataLoader to use. Only applicable for Torch-based models
skip_failures (True) – whether to gracefully continue without raising an error if embeddings cannot be generated for a sample. Only applicable to fiftyone.core.models.Model instances
progress (None) – whether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead
**kwargs – optional model-specific keyword arguments passed through to the underlying inference implementation
- Returns
None, if an embeddings_field is provided
a num_samples x num_dim array of embeddings, when computing embeddings for image/video collections with image/video models, respectively, and no embeddings_field is provided. If skip_failures is True and any errors are detected, a list of length num_samples is returned instead containing all successfully computed embedding vectors along with None entries for samples for which embeddings could not be computed
a dictionary mapping sample IDs to num_frames x num_dim arrays of embeddings, when computing frame embeddings for video collections using an image model. If skip_failures is True and any errors are detected, the values of this dictionary will contain arrays of embeddings for all frames 1, 2, … until the error occurred, or None if no embeddings were computed at all
- Return type
one of the following
-
compute_metadata(overwrite=False, num_workers=None, skip_failures=True, warn_failures=False, progress=None)¶
Populates the metadata field of all samples in the collection.
Any samples with existing metadata are skipped, unless overwrite == True.
- Parameters
overwrite (False) – whether to overwrite existing metadata
num_workers (None) – a suggested number of threads to use
skip_failures (True) – whether to gracefully continue without raising an error if metadata cannot be computed for a sample
warn_failures (False) – whether to log a warning if metadata cannot be computed for a sample
progress (None) – whether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead
-
compute_patch_embeddings(model, patches_field, embeddings_field=None, force_square=False, alpha=None, handle_missing='skip', batch_size=None, num_workers=None, skip_failures=True, progress=None)¶
Computes embeddings for the image patches defined by patches_field of the samples in the collection using the given model.
This method supports all the following cases:
Using an image model to compute patch embeddings for an image collection
Using an image model to compute frame patch embeddings for a video collection
The model must expose embeddings, i.e., fiftyone.core.models.Model.has_embeddings() must return True.
If an embeddings_field is provided, the embeddings are saved to the samples; otherwise, the embeddings are returned in-memory.
- Parameters
model – a fiftyone.core.models.Model, Hugging Face Transformers model, Ultralytics model, SuperGradients model, or Lightning Flash model
patches_field – the name of the field defining the image patches in each sample to embed. Must be of type fiftyone.core.labels.Detection, fiftyone.core.labels.Detections, fiftyone.core.labels.Polyline, or fiftyone.core.labels.Polylines. When computing video frame embeddings, the "frames." prefix is optional
embeddings_field (None) – the name of a label attribute in which to store the embeddings
force_square (False) – whether to minimally manipulate the patch bounding boxes into squares prior to extraction
alpha (None) – an optional expansion/contraction to apply to the patches before extracting them, in [-1, inf). If provided, the length and width of the box are expanded (or contracted, when alpha < 0) by (100 * alpha)%. For example, set alpha = 0.1 to expand the boxes by 10%, and set alpha = -0.1 to contract the boxes by 10%
handle_missing ("skip") –
how to handle images with no patches. Supported values are:
"skip": skip the image and assign its embedding as None
"image": use the whole image as a single patch
"error": raise an error
batch_size (None) – an optional batch size to use, if the model supports batching
num_workers (None) – the number of workers for the torch.utils.data.DataLoader to use. Only applicable for Torch-based models
skip_failures (True) – whether to gracefully continue without raising an error if embeddings cannot be generated for a sample
progress (None) – whether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead
- Returns
None, if an embeddings_field is provided
a dict mapping sample IDs to num_patches x num_dim arrays of patch embeddings, when computing patch embeddings for image collections and no embeddings_field is provided. If skip_failures is True and any errors are detected, this dictionary will contain None values for any samples for which embeddings could not be computed
a dict of dicts mapping sample IDs to frame numbers to num_patches x num_dim arrays of patch embeddings, when computing patch embeddings for the frames of video collections and no embeddings_field is provided. If skip_failures is True and any errors are detected, this nested dict will contain missing or None values to indicate uncomputable embeddings
- Return type
one of the following
-
concat(samples)¶
Concatenates the contents of the given SampleCollection to this collection.
Examples:
import fiftyone as fo
import fiftyone.zoo as foz
from fiftyone import ViewField as F

dataset = foz.load_zoo_dataset("quickstart")

#
# Concatenate two views
#

view1 = dataset.match(F("uniqueness") < 0.2)
view2 = dataset.match(F("uniqueness") > 0.7)

view = view1.concat(view2)

print(view1)
print(view2)
print(view)

#
# Concatenate two patches views
#

gt_objects = dataset.to_patches("ground_truth")

patches1 = gt_objects[:50]
patches2 = gt_objects[-50:]

patches = patches1.concat(patches2)

print(patches1)
print(patches2)
print(patches)
- Parameters
samples – a SampleCollection whose contents to append to this collection
- Returns
a fiftyone.core.view.DatasetView
-
count(field_or_expr=None, expr=None, safe=False)¶
Counts the number of field values in the collection.
None-valued fields are ignored.
If no field is provided, the samples themselves are counted.
Examples:
import fiftyone as fo
from fiftyone import ViewField as F

dataset = fo.Dataset()
dataset.add_samples(
    [
        fo.Sample(
            filepath="/path/to/image1.png",
            predictions=fo.Detections(
                detections=[
                    fo.Detection(label="cat"),
                    fo.Detection(label="dog"),
                ]
            ),
        ),
        fo.Sample(
            filepath="/path/to/image2.png",
            predictions=fo.Detections(
                detections=[
                    fo.Detection(label="cat"),
                    fo.Detection(label="rabbit"),
                    fo.Detection(label="squirrel"),
                ]
            ),
        ),
        fo.Sample(
            filepath="/path/to/image3.png",
            predictions=None,
        ),
    ]
)

#
# Count the number of samples in the dataset
#

count = dataset.count()
print(count)  # the count

#
# Count the number of samples with `predictions`
#

count = dataset.count("predictions")
print(count)  # the count

#
# Count the number of objects in the `predictions` field
#

count = dataset.count("predictions.detections")
print(count)  # the count

#
# Count the number of objects in samples with > 2 predictions
#

count = dataset.count(
    (F("predictions.detections").length() > 2).if_else(
        F("predictions.detections"), None
    )
)
print(count)  # the count
- Parameters
field_or_expr (None) –
a field name, embedded.field.name, fiftyone.core.expressions.ViewExpression, or MongoDB expression defining the field or expression to aggregate. If neither field_or_expr nor expr is provided, the samples themselves are counted. This can also be a list or tuple of such arguments, in which case a tuple of corresponding aggregation results (each receiving the same additional keyword arguments, if any) will be returned
expr (None) – a fiftyone.core.expressions.ViewExpression or MongoDB expression to apply to field_or_expr (which must be a field) before aggregating
safe (False) – whether to ignore nan/inf values when dealing with floating point values
- Returns
the count
Counts the occurrences of all label tags in the specified label field(s) of this collection.
- Parameters
label_fields (None) – an optional name or iterable of names of fiftyone.core.labels.Label fields. By default, all label fields are used
- Returns
a dict mapping tags to counts
Counts the occurrences of sample tags in this collection.
- Returns
a dict mapping tags to counts
-
count_values(field_or_expr, expr=None, safe=False)¶
Counts the occurrences of field values in the collection.
This aggregation is typically applied to countable field types (or lists of such types):
Examples:
import fiftyone as fo
from fiftyone import ViewField as F

dataset = fo.Dataset()
dataset.add_samples(
    [
        fo.Sample(
            filepath="/path/to/image1.png",
            tags=["sunny"],
            predictions=fo.Detections(
                detections=[
                    fo.Detection(label="cat"),
                    fo.Detection(label="dog"),
                ]
            ),
        ),
        fo.Sample(
            filepath="/path/to/image2.png",
            tags=["cloudy"],
            predictions=fo.Detections(
                detections=[
                    fo.Detection(label="cat"),
                    fo.Detection(label="rabbit"),
                ]
            ),
        ),
        fo.Sample(
            filepath="/path/to/image3.png",
            predictions=None,
        ),
    ]
)

#
# Compute the tag counts in the dataset
#

counts = dataset.count_values("tags")
print(counts)  # dict mapping values to counts

#
# Compute the predicted label counts in the dataset
#

counts = dataset.count_values("predictions.detections.label")
print(counts)  # dict mapping values to counts

#
# Compute the predicted label counts after some normalization
#

counts = dataset.count_values(
    F("predictions.detections.label").map_values(
        {"cat": "pet", "dog": "pet"}
    ).upper()
)
print(counts)  # dict mapping values to counts
- Parameters
field_or_expr –
a field name, embedded.field.name, fiftyone.core.expressions.ViewExpression, or MongoDB expression defining the field or expression to aggregate. This can also be a list or tuple of such arguments, in which case a tuple of corresponding aggregation results (each receiving the same additional keyword arguments, if any) will be returned
expr (None) – a fiftyone.core.expressions.ViewExpression or MongoDB expression to apply to field_or_expr (which must be a field) before aggregating
safe (False) – whether to treat nan/inf values as None when dealing with floating point values
- Returns
a dict mapping values to counts
-
create_index(field_or_spec, unique=False, wait=True, **kwargs)¶
Creates an index on the given field or with the given specification, if necessary.
Indexes enable efficient sorting, merging, and other such operations.
Frame-level fields can be indexed by prepending "frames." to the field name.
If an index with the same field(s) but different order(s) already exists, no new index will be created.
Use drop_index() to drop an existing index first if you wish to replace an existing index with new properties.
Note
If you are indexing a single field and it already has a unique constraint, it will be retained regardless of the unique value you specify. Conversely, if the given field already has a non-unique index but you requested a unique index, the existing index will be replaced with a unique index.
Use drop_index() to drop an existing index first if you wish to replace an existing index with new properties.
- Parameters
field_or_spec – the field name, embedded.field.name, or index specification list. See pymongo.collection.Collection.create_index() for supported values
unique (False) – whether to add a uniqueness constraint to the index
wait (True) – whether to wait for index creation to finish
**kwargs – optional keyword arguments for pymongo.collection.Collection.create_index()
- Returns
the name of the index
-
delete_annotation_run(anno_key)¶
Deletes the annotation run with the given key from this collection.
Calling this method only deletes the record of the annotation run from the collection; it will not delete any annotations loaded onto your dataset via load_annotations(), nor will it delete any associated information from the annotation backend.
Use load_annotation_results() to programmatically manage/delete a run from the annotation backend.
- Parameters
anno_key – an annotation key
-
delete_annotation_runs()¶
Deletes all annotation runs from this collection.
Calling this method only deletes the records of the annotation runs from this collection; it will not delete any annotations loaded onto your dataset via load_annotations(), nor will it delete any associated information from the annotation backend.
Use load_annotation_results() to programmatically manage/delete runs in the annotation backend.
-
delete_brain_run(brain_key)¶
Deletes the brain method run with the given key from this collection.
- Parameters
brain_key – a brain key
-
delete_brain_runs()¶
Deletes all brain method runs from this collection.
-
delete_evaluation(eval_key)¶
Deletes the evaluation results associated with the given evaluation key from this collection.
- Parameters
eval_key – an evaluation key
-
delete_evaluations()¶
Deletes all evaluation results from this collection.
-
delete_run(run_key)¶
Deletes the run with the given key from this collection.
- Parameters
run_key – a run key
-
delete_runs()¶
Deletes all runs from this collection.
-
distinct(field_or_expr, expr=None, safe=False)¶
Computes the distinct values of a field in the collection.
None-valued fields are ignored.
This aggregation is typically applied to countable field types (or lists of such types):
Examples:
import fiftyone as fo
from fiftyone import ViewField as F

dataset = fo.Dataset()
dataset.add_samples(
    [
        fo.Sample(
            filepath="/path/to/image1.png",
            tags=["sunny"],
            predictions=fo.Detections(
                detections=[
                    fo.Detection(label="cat"),
                    fo.Detection(label="dog"),
                ]
            ),
        ),
        fo.Sample(
            filepath="/path/to/image2.png",
            tags=["sunny", "cloudy"],
            predictions=fo.Detections(
                detections=[
                    fo.Detection(label="cat"),
                    fo.Detection(label="rabbit"),
                ]
            ),
        ),
        fo.Sample(
            filepath="/path/to/image3.png",
            predictions=None,
        ),
    ]
)

#
# Get the distinct tags in a dataset
#

values = dataset.distinct("tags")
print(values)  # list of distinct values

#
# Get the distinct predicted labels in a dataset
#

values = dataset.distinct("predictions.detections.label")
print(values)  # list of distinct values

#
# Get the distinct predicted labels after some normalization
#

values = dataset.distinct(
    F("predictions.detections.label").map_values(
        {"cat": "pet", "dog": "pet"}
    ).upper()
)
print(values)  # list of distinct values
- Parameters
field_or_expr –
a field name, embedded.field.name, fiftyone.core.expressions.ViewExpression, or MongoDB expression defining the field or expression to aggregate. This can also be a list or tuple of such arguments, in which case a tuple of corresponding aggregation results (each receiving the same additional keyword arguments, if any) will be returned
expr (None) – a fiftyone.core.expressions.ViewExpression or MongoDB expression to apply to field_or_expr (which must be a field) before aggregating
safe (False) – whether to ignore nan/inf values when dealing with floating point values
- Returns
a sorted list of distinct values
-
draw_labels(output_dir, rel_dir=None, label_fields=None, overwrite=False, config=None, progress=None, **kwargs)¶
Renders annotated versions of the media in the collection with the specified label data overlaid to the given directory.
The filenames of the sample media are maintained, unless a name conflict would occur in output_dir, in which case an index of the form "-%d" % count is appended to the base filename.
Images are written in format fo.config.default_image_ext, and videos are written in format fo.config.default_video_ext.
output_dir – the directory to write the annotated media
rel_dir (None) – an optional relative directory to strip from each input filepath to generate a unique identifier that is joined with
output_dir
to generate an output path for each annotated media. This argument allows for populating nested subdirectories inoutput_dir
that match the shape of the input paths. The path is converted to an absolute path (if necessary) viafiftyone.core.storage.normalize_path()
label_fields (None) – a label field or list of label fields to render. By default, all
fiftyone.core.labels.Label
fields are drawnoverwrite (False) – whether to delete
output_dir
if it exists before renderingconfig (None) – an optional
fiftyone.utils.annotations.DrawConfig
configuring how to draw the labelsprogress (None) – whether to render a progress bar (True/False), use the default value
fiftyone.config.show_progress_bars
(None), or a progress callback function to invoke instead**kwargs – optional keyword arguments specifying parameters of the default
fiftyone.utils.annotations.DrawConfig
to override
- Returns
the list of paths to the rendered media
-
drop_index(field_or_name)¶
Drops the index for the given field or name, if necessary.
- Parameters
field_or_name – a field name, embedded.field.name, or compound index name. Use list_indexes() to see the available indexes
-
evaluate_classifications(pred_field, gt_field='ground_truth', eval_key=None, classes=None, missing=None, method=None, progress=None, **kwargs)¶
Evaluates the classification predictions in this collection with respect to the specified ground truth labels.
By default, this method simply compares the ground truth and prediction for each sample, but other strategies such as binary evaluation and top-k matching can be configured via the method parameter.
You can customize the evaluation method by passing additional parameters for the method's config class as kwargs.
The natively provided method values and their associated configs are:
"simple": fiftyone.utils.eval.classification.SimpleEvaluationConfig
"top-k": fiftyone.utils.eval.classification.TopKEvaluationConfig
"binary": fiftyone.utils.eval.classification.BinaryEvaluationConfig
If an eval_key is specified, then this method will record some statistics on each sample:
When evaluating sample-level fields, an eval_key field will be populated on each sample recording whether that sample's prediction is correct.
When evaluating frame-level fields, an eval_key field will be populated on each frame recording whether that frame's prediction is correct. In addition, an eval_key field will be populated on each sample that records the average accuracy of the frame predictions of the sample.
- Parameters
pred_field – the name of the field containing the predicted fiftyone.core.labels.Classification instances
gt_field ("ground_truth") – the name of the field containing the ground truth fiftyone.core.labels.Classification instances
eval_key (None) – a string key to use to refer to this evaluation
classes (None) – the list of possible classes. If not provided, the observed ground truth/predicted labels are used
missing (None) – a missing label string. Any None-valued labels are given this label for results purposes
method (None) – a string specifying the evaluation method to use. The supported values are fo.evaluation_config.classification_backends.keys() and the default is fo.evaluation_config.classification_default_backend
progress (None) – whether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead
**kwargs – optional keyword arguments for the constructor of the fiftyone.utils.eval.classification.ClassificationEvaluationConfig being used
- Returns
a fiftyone.utils.eval.classification.ClassificationResults
-
evaluate_detections(pred_field, gt_field='ground_truth', eval_key=None, classes=None, missing=None, method=None, iou=0.5, use_masks=False, use_boxes=False, classwise=True, dynamic=True, progress=None, **kwargs)¶
Evaluates the specified predicted detections in this collection with respect to the specified ground truth detections.
This method supports evaluating the following spatial data types:
Object detections in fiftyone.core.labels.Detections format
Instance segmentations in fiftyone.core.labels.Detections format with their mask attributes populated
Polygons in fiftyone.core.labels.Polylines format
Keypoints in fiftyone.core.labels.Keypoints format
Temporal detections in fiftyone.core.labels.TemporalDetections format
For spatial object detection evaluation, this method uses COCO-style evaluation by default.
When evaluating keypoints, “IoUs” are computed via object keypoint similarity.
For temporal segment detection, this method uses ActivityNet-style evaluation by default.
You can use the method parameter to select a different method, and you can optionally customize the method by passing additional parameters for the method's config class as kwargs.
The natively provided method values and their associated configs are:
"open-images": fiftyone.utils.eval.openimages.OpenImagesEvaluationConfig
"activitynet": fiftyone.utils.eval.activitynet.ActivityNetEvaluationConfig
If an eval_key is provided, a number of fields are populated at the object- and sample-level recording the results of the evaluation:
True positive (TP), false positive (FP), and false negative (FN) counts for each sample are saved in top-level fields of each sample:
TP: sample.<eval_key>_tp
FP: sample.<eval_key>_fp
FN: sample.<eval_key>_fn
In addition, when evaluating frame-level objects, TP/FP/FN counts are recorded for each frame:
TP: frame.<eval_key>_tp
FP: frame.<eval_key>_fp
FN: frame.<eval_key>_fn
The fields listed below are populated on each individual object; these fields tabulate the TP/FP/FN status of the object, the ID of the matching object (if any), and the matching IoU:
TP/FP/FN: object.<eval_key>
ID: object.<eval_key>_id
IoU: object.<eval_key>_iou
- Parameters
pred_field – the name of the field containing the predicted fiftyone.core.labels.Detections, fiftyone.core.labels.Polylines, fiftyone.core.labels.Keypoints, or fiftyone.core.labels.TemporalDetections
gt_field ("ground_truth") – the name of the field containing the ground truth fiftyone.core.labels.Detections, fiftyone.core.labels.Polylines, fiftyone.core.labels.Keypoints, or fiftyone.core.labels.TemporalDetections
eval_key (None) – a string key to use to refer to this evaluation
classes (None) – the list of possible classes. If not provided, the observed ground truth/predicted labels are used
missing (None) – a missing label string. Any unmatched objects are given this label for results purposes
method (None) – a string specifying the evaluation method to use. The supported values are fo.evaluation_config.detection_backends.keys() and the default is fo.evaluation_config.detection_default_backend
iou (0.50) – the IoU threshold to use to determine matches
use_masks (False) – whether to compute IoUs using the instance masks in the mask attribute of the provided objects, which must be fiftyone.core.labels.Detection instances
use_boxes (False) – whether to compute IoUs using the bounding boxes of the provided fiftyone.core.labels.Polyline instances rather than using their actual geometries
classwise (True) – whether to only match objects with the same class label (True) or allow matches between classes (False)
dynamic (True) – whether to declare the dynamic object-level attributes that are populated on the dataset’s schema
progress (None) – whether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead
**kwargs – optional keyword arguments for the constructor of the fiftyone.utils.eval.detection.DetectionEvaluationConfig being used
- Returns
a fiftyone.utils.eval.detection.DetectionResults
-
evaluate_regressions(pred_field, gt_field='ground_truth', eval_key=None, missing=None, method=None, progress=None, **kwargs)¶
Evaluates the regression predictions in this collection with respect to the specified ground truth values.
You can customize the evaluation method by passing additional parameters for the method's config class as kwargs.
The natively provided method values and their associated configs are:
If an eval_key is specified, then this method will record some statistics on each sample:
When evaluating sample-level fields, an eval_key field will be populated on each sample recording the error of that sample's prediction.
When evaluating frame-level fields, an eval_key field will be populated on each frame recording the error of that frame's prediction. In addition, an eval_key field will be populated on each sample that records the average error of the frame predictions of the sample.
- Parameters
pred_field – the name of the field containing the predicted fiftyone.core.labels.Regression instances
gt_field ("ground_truth") – the name of the field containing the ground truth fiftyone.core.labels.Regression instances
eval_key (None) – a string key to use to refer to this evaluation
missing (None) – a missing value. Any None-valued regressions are given this value for results purposes
method (None) – a string specifying the evaluation method to use. The supported values are fo.evaluation_config.regression_backends.keys() and the default is fo.evaluation_config.regression_default_backend
progress (None) – whether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead
**kwargs – optional keyword arguments for the constructor of the fiftyone.utils.eval.regression.RegressionEvaluationConfig being used
- Returns
a fiftyone.utils.eval.regression.RegressionResults
-
evaluate_segmentations(pred_field, gt_field='ground_truth', eval_key=None, mask_targets=None, method=None, progress=None, **kwargs)¶
Evaluates the specified semantic segmentation masks in this collection with respect to the specified ground truth masks.
If the size of a predicted mask does not match the ground truth mask, it is resized to match the ground truth.
By default, this method simply performs pixelwise evaluation of the full masks, but other strategies such as boundary-only evaluation can be configured by passing additional parameters for the method's config class as kwargs.
The natively provided method values and their associated configs are:
If an eval_key is provided, the accuracy, precision, and recall of each sample is recorded in top-level fields of each sample:
Accuracy: sample.<eval_key>_accuracy
Precision: sample.<eval_key>_precision
Recall: sample.<eval_key>_recall
In addition, when evaluating frame-level masks, the accuracy, precision, and recall of each frame is recorded in the following frame-level fields:
Accuracy: frame.<eval_key>_accuracy
Precision: frame.<eval_key>_precision
Recall: frame.<eval_key>_recall
Note
The mask values 0 and #000000 are treated as a background class for the purposes of computing evaluation metrics like precision and recall.
- Parameters
pred_field – the name of the field containing the predicted fiftyone.core.labels.Segmentation instances
gt_field ("ground_truth") – the name of the field containing the ground truth fiftyone.core.labels.Segmentation instances
eval_key (None) – a string key to use to refer to this evaluation
mask_targets (None) – a dict mapping pixel values or RGB hex strings to labels. If not provided, the observed values are used as labels
method (None) – a string specifying the evaluation method to use. The supported values are
fo.evaluation_config.segmentation_backends.keys()
and the default is
fo.evaluation_config.segmentation_default_backend
progress (None) – whether to render a progress bar (True/False), use the default value
fiftyone.config.show_progress_bars
(None), or a progress callback function to invoke instead
**kwargs – optional keyword arguments for the constructor of the
fiftyone.utils.eval.segmentation.SegmentationEvaluationConfig
being used
- Returns
-
exclude
(sample_ids)¶ Excludes the samples with the given IDs from the collection.
Examples:
import fiftyone as fo

dataset = fo.Dataset()
dataset.add_samples(
    [
        fo.Sample(filepath="/path/to/image1.png"),
        fo.Sample(filepath="/path/to/image2.png"),
        fo.Sample(filepath="/path/to/image3.png"),
    ]
)

#
# Exclude the first sample from the dataset
#

sample_id = dataset.first().id
view = dataset.exclude(sample_id)

#
# Exclude the first and last samples from the dataset
#

sample_ids = [dataset.first().id, dataset.last().id]
view = dataset.exclude(sample_ids)
- Parameters
sample_ids –
the samples to exclude. Can be any of the following:
a sample ID
an iterable of sample IDs
a
fiftyone.core.sample.Sample
orfiftyone.core.sample.SampleView
an iterable of
fiftyone.core.sample.Sample
orfiftyone.core.sample.SampleView
instances
- Returns
-
exclude_by
(field, values)¶ Excludes the samples with the given field values from the collection.
This stage is typically used to work with categorical fields (strings, ints, and bools). If you want to exclude samples based on floating point fields, use
match()
.Examples:
import fiftyone as fo

dataset = fo.Dataset()
dataset.add_samples(
    [
        fo.Sample(filepath="image%d.jpg" % i, int=i, str=str(i))
        for i in range(10)
    ]
)

#
# Create a view excluding samples whose `int` field have the given
# values
#

view = dataset.exclude_by("int", [1, 9, 3, 7, 5])
print(view.head(5))

#
# Create a view excluding samples whose `str` field have the given
# values
#

view = dataset.exclude_by("str", ["1", "9", "3", "7", "5"])
print(view.head(5))
- Parameters
field – a field or
embedded.field.name
values – a value or iterable of values to exclude by
- Returns
-
exclude_fields
(field_names=None, meta_filter=None, _allow_missing=False)¶ Excludes the fields with the given names from the samples in the collection.
Note that default fields cannot be excluded.
Examples:
import fiftyone as fo dataset = fo.Dataset() dataset.add_samples( [ fo.Sample( filepath="/path/to/image1.png", ground_truth=fo.Classification(label="cat"), predictions=fo.Classification( label="cat", confidence=0.9, mood="surly", ), ), fo.Sample( filepath="/path/to/image2.png", ground_truth=fo.Classification(label="dog"), predictions=fo.Classification( label="dog", confidence=0.8, mood="happy", ), ), fo.Sample( filepath="/path/to/image3.png", ), ] ) # # Exclude the `predictions` field from all samples # view = dataset.exclude_fields("predictions") # # Exclude the `mood` attribute from all classifications in the # `predictions` field # view = dataset.exclude_fields("predictions.mood")
- Parameters
field_names (None) – a field name or iterable of field names to exclude. May contain
embedded.field.name
as well
meta_filter (None) –
a filter that dynamically excludes fields in the collection’s schema according to the specified rule, which can be matched against the field’s
name
,type
,description
, and/orinfo
. For example:
Use
meta_filter="2023"
or
meta_filter={"any": "2023"}
to exclude fields that have the string “2023” anywhere in their name, type, description, or info
Use
meta_filter={"type": "StringField"}
or
meta_filter={"type": "Classification"}
to exclude all string or classification fields, respectively
Use
meta_filter={"description": "my description"}
to exclude fields whose description contains the string “my description”
Use
meta_filter={"info": "2023"}
to exclude fields that have the string “2023” anywhere in their info
Use
meta_filter={"info.key": "value"}
to exclude fields that have a specific key/value pair in their info
Include
meta_filter={"include_nested_fields": True, ...}
in your meta filter to include all nested fields in the filter
- Returns
-
exclude_frames
(frame_ids, omit_empty=True)¶ Excludes the frames with the given IDs from the video collection.
Examples:
import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart-video")

#
# Exclude some specific frames
#

frame_ids = [
    dataset.first().frames.first().id,
    dataset.last().frames.last().id,
]

view = dataset.exclude_frames(frame_ids)

print(dataset.count("frames"))
print(view.count("frames"))
- Parameters
frame_ids –
the frames to exclude. Can be any of the following:
a frame ID
an iterable of frame IDs
a
fiftyone.core.frame.Frame
orfiftyone.core.frame.FrameView
an iterable of
fiftyone.core.frame.Frame
orfiftyone.core.frame.FrameView
instancesa
fiftyone.core.collections.SampleCollection
whose frames to exclude
omit_empty (True) – whether to omit samples that have no frames after excluding the specified frames
- Returns
-
exclude_groups
(group_ids)¶ Excludes the groups with the given IDs from the grouped collection.
Examples:
import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart-groups")

#
# Exclude some specific groups by ID
#

view = dataset.take(2)
group_ids = view.values("group.id")

other_groups = dataset.exclude_groups(group_ids)
assert len(set(group_ids) & set(other_groups.values("group.id"))) == 0
- Parameters
group_ids –
the groups to exclude. Can be any of the following:
a group ID
an iterable of group IDs
a
fiftyone.core.sample.Sample
orfiftyone.core.sample.SampleView
a group dict returned by
get_group()
an iterable of
fiftyone.core.sample.Sample
orfiftyone.core.sample.SampleView
instancesan iterable of group dicts returned by
get_group()
- Returns
-
exclude_labels
(labels=None, ids=None, tags=None, fields=None, omit_empty=True)¶ Excludes the specified labels from the collection.
The returned view will omit samples, sample fields, and individual labels that do not match the specified selection criteria.
You can perform an exclusion via one or more of the following methods:
Provide the
labels
argument, which should contain a list of dicts in the format returned byfiftyone.core.session.Session.selected_labels
, to exclude specific labelsProvide the
ids
argument to exclude labels with specific IDsProvide the
tags
argument to exclude labels with specific tags
If multiple criteria are specified, labels must match all of them in order to be excluded.
By default, the exclusion is applied to all
fiftyone.core.labels.Label
fields, but you can provide thefields
argument to explicitly define the field(s) in which to exclude.Examples:
import fiftyone as fo import fiftyone.zoo as foz dataset = foz.load_zoo_dataset("quickstart") # # Exclude the labels currently selected in the App # session = fo.launch_app(dataset) # Select some labels in the App... view = dataset.exclude_labels(labels=session.selected_labels) # # Exclude labels with the specified IDs # # Grab some label IDs ids = [ dataset.first().ground_truth.detections[0].id, dataset.last().predictions.detections[0].id, ] view = dataset.exclude_labels(ids=ids) print(dataset.count("ground_truth.detections")) print(view.count("ground_truth.detections")) print(dataset.count("predictions.detections")) print(view.count("predictions.detections")) # # Exclude labels with the specified tags # # Grab some label IDs ids = [ dataset.first().ground_truth.detections[0].id, dataset.last().predictions.detections[0].id, ] # Give the labels a "test" tag dataset = dataset.clone() # create copy since we're modifying data dataset.select_labels(ids=ids).tag_labels("test") print(dataset.count_values("ground_truth.detections.tags")) print(dataset.count_values("predictions.detections.tags")) # Exclude the labels via their tag view = dataset.exclude_labels(tags="test") print(dataset.count("ground_truth.detections")) print(view.count("ground_truth.detections")) print(dataset.count("predictions.detections")) print(view.count("predictions.detections"))
- Parameters
labels (None) – a list of dicts specifying the labels to exclude in the format returned by
fiftyone.core.session.Session.selected_labels
ids (None) – an ID or iterable of IDs of the labels to exclude
tags (None) – a tag or iterable of tags of labels to exclude
fields (None) – a field or iterable of fields from which to exclude
omit_empty (True) – whether to omit samples that have no labels after filtering
- Returns
-
exists
(field, bool=None)¶ Returns a view containing the samples in the collection that have (or do not have) a non-
None
value for the given field or embedded field.Examples:
import fiftyone as fo dataset = fo.Dataset() dataset.add_samples( [ fo.Sample( filepath="/path/to/image1.png", ground_truth=fo.Classification(label="cat"), predictions=fo.Classification(label="cat", confidence=0.9), ), fo.Sample( filepath="/path/to/image2.png", ground_truth=fo.Classification(label="dog"), predictions=fo.Classification(label="dog", confidence=0.8), ), fo.Sample( filepath="/path/to/image3.png", ground_truth=fo.Classification(label="dog"), predictions=fo.Classification(label="dog"), ), fo.Sample( filepath="/path/to/image4.png", ground_truth=None, predictions=None, ), fo.Sample(filepath="/path/to/image5.png"), ] ) # # Only include samples that have a value in their `predictions` # field # view = dataset.exists("predictions") # # Only include samples that do NOT have a value in their # `predictions` field # view = dataset.exists("predictions", False) # # Only include samples that have prediction confidences # view = dataset.exists("predictions.confidence")
- Parameters
field – the field name or
embedded.field.name
bool (None) – whether to check if the field exists (None or True) or does not exist (False)
- Returns
-
export
(export_dir=None, dataset_type=None, data_path=None, labels_path=None, export_media=None, rel_dir=None, dataset_exporter=None, label_field=None, frame_labels_field=None, overwrite=False, progress=None, **kwargs)¶ Exports the samples in the collection to disk.
You can perform exports with this method via the following basic patterns:
Provide
export_dir
anddataset_type
to export the content to a directory in the default layout for the specified format, as documented in this pageProvide
dataset_type
along withdata_path
,labels_path
, and/orexport_media
to directly specify where to export the source media and/or labels (if applicable) in your desired format. This syntax provides the flexibility to, for example, perform workflows like labels-only exportsProvide a
dataset_exporter
to which to feed samples to perform a fully-customized export
In all workflows, the remaining parameters of this method can be provided to further configure the export.
See this page for more information about the available export formats and examples of using this method.
See this guide for more details about exporting datasets in custom formats by defining your own
fiftyone.utils.data.exporters.DatasetExporter
.This method will automatically coerce the data to match the requested export in the following cases:
When exporting in either an unlabeled image or image classification format, if a spatial label field is provided (
fiftyone.core.labels.Detection
,fiftyone.core.labels.Detections
,fiftyone.core.labels.Polyline
, orfiftyone.core.labels.Polylines
), then the image patches of the provided samples will be exportedWhen exporting in labeled image dataset formats that expect list-type labels (
fiftyone.core.labels.Classifications
,fiftyone.core.labels.Detections
,fiftyone.core.labels.Keypoints
, orfiftyone.core.labels.Polylines
), if a label field contains labels in non-list format (e.g.,fiftyone.core.labels.Classification
), the labels will be automatically upgraded to single-label listsWhen exporting in labeled image dataset formats that expect
fiftyone.core.labels.Detections
labels, if afiftyone.core.labels.Classification
field is provided, the labels will be automatically upgraded to detections that span the entire image
- Parameters
export_dir (None) –
the directory to which to export the samples in format
dataset_type
. This parameter may be omitted if you have provided appropriate values for thedata_path
and/orlabels_path
parameters. Alternatively, this can also be an archive path with one of the following extensions:.zip, .tar, .tar.gz, .tgz, .tar.bz, .tbz
If an archive path is specified, the export is performed in a directory of the same name (minus extension), which is then automatically archived and deleted
dataset_type (None) – the
fiftyone.types.Dataset
type to write. If not specified, the default type forlabel_field
is useddata_path (None) –
an optional parameter that enables explicit control over the location of the exported media for certain export formats. Can be any of the following:
a folder name like
"data"
or"data/"
specifying a subfolder ofexport_dir
in which to export the mediaan absolute directory path in which to export the media. In this case, the
export_dir
has no effect on the location of the dataa filename like
"data.json"
specifying the filename of a JSON manifest file inexport_dir
generated whenexport_media
is"manifest"
an absolute filepath specifying the location to write the JSON manifest file when
export_media
is"manifest"
. In this case,export_dir
has no effect on the location of the data
If None, a default value of this parameter will be chosen based on the value of the
export_media
parameter. Note that this parameter is not applicable to certain export formats such as binary types like TF recordslabels_path (None) –
an optional parameter that enables explicit control over the location of the exported labels. Only applicable when exporting in certain labeled dataset formats. Can be any of the following:
a type-specific folder name like
"labels"
or"labels/"
or a filename like"labels.json"
or"labels.xml"
specifying the location inexport_dir
in which to export the labelsan absolute directory or filepath in which to export the labels. In this case, the
export_dir
has no effect on the location of the labels
For labeled datasets, the default value of this parameter will be chosen based on the export format so that the labels will be exported into
export_dir
export_media (None) –
controls how to export the raw media. The supported values are:
True
: copy all media files into the output directory
False
: don’t export media. This option is only useful when exporting labeled datasets whose label format stores sufficient information to locate the associated media
"move"
: move all media files into the output directory
"symlink"
: create symlinks to the media files in the output directory
"manifest"
: create a
data.json
in the output directory that maps UUIDs used in the labels files to the filepaths of the source media, rather than exporting the actual media
If None, an appropriate default value of this parameter will be chosen based on the value of the
data_path
parameter. Note that some dataset formats may not support certain values for this parameter (e.g., when exporting in binary formats such as TF records, “symlink” is not an option)rel_dir (None) – an optional relative directory to strip from each input filepath to generate a unique identifier for each media. When exporting media, this identifier is joined with
data_path
to generate an output path for each exported media. This argument allows for populating nested subdirectories that match the shape of the input paths. The path is converted to an absolute path (if necessary) viafiftyone.core.storage.normalize_path()
dataset_exporter (None) – a
fiftyone.utils.data.exporters.DatasetExporter
to use to export the samples. When provided, parameters such asexport_dir
,dataset_type
,data_path
, andlabels_path
have no effectlabel_field (None) –
controls the label field(s) to export. Only applicable to labeled datasets. Can be any of the following:
the name of a label field to export
a glob pattern of label field(s) to export
a list or tuple of label field(s) to export
a dictionary mapping label field names to keys to use when constructing the label dictionaries to pass to the exporter
Note that multiple fields can only be specified when the exporter used can handle dictionaries of labels. By default, the first field of compatible type for the exporter is used. When exporting labeled video datasets, this argument may contain frame fields prefixed by
"frames."
frame_labels_field (None) –
controls the frame label field(s) to export. The
"frames."
prefix is optional. Only applicable to labeled video datasets. Can be any of the following:the name of a frame label field to export
a glob pattern of frame label field(s) to export
a list or tuple of frame label field(s) to export
a dictionary mapping frame label field names to keys to use when constructing the frame label dictionaries to pass to the exporter
Note that multiple fields can only be specified when the exporter used can handle dictionaries of frame labels. By default, the first field of compatible type for the exporter is used
overwrite (False) – whether to delete existing directories before performing the export (True) or to merge the export with existing files and directories (False)
progress (None) – whether to render a progress bar (True/False), use the default value
fiftyone.config.show_progress_bars
(None), or a progress callback function to invoke instead
**kwargs – optional keyword arguments to pass to the dataset exporter’s constructor. If you are exporting image patches, this can also contain keyword arguments for
fiftyone.utils.patches.ImagePatchesExtractor
-
filter_field
(field, filter, only_matches=True)¶ Filters the values of a field or embedded field of each sample in the collection.
Values of
field
for whichfilter
returnsFalse
are replaced withNone
.Examples:
import fiftyone as fo from fiftyone import ViewField as F dataset = fo.Dataset() dataset.add_samples( [ fo.Sample( filepath="/path/to/image1.png", ground_truth=fo.Classification(label="cat"), predictions=fo.Classification(label="cat", confidence=0.9), numeric_field=1.0, ), fo.Sample( filepath="/path/to/image2.png", ground_truth=fo.Classification(label="dog"), predictions=fo.Classification(label="dog", confidence=0.8), numeric_field=-1.0, ), fo.Sample( filepath="/path/to/image3.png", ground_truth=None, predictions=None, numeric_field=None, ), ] ) # # Only include classifications in the `predictions` field # whose `label` is "cat" # view = dataset.filter_field("predictions", F("label") == "cat") # # Only include samples whose `numeric_field` value is positive # view = dataset.filter_field("numeric_field", F() > 0)
- Parameters
field – the field name or
embedded.field.name
filter –
a
fiftyone.core.expressions.ViewExpression
or MongoDB expression that returns a boolean describing the filter to applyonly_matches (True) – whether to only include samples that match the filter (True) or include all samples (False)
- Returns
-
filter_keypoints
(field, filter=None, labels=None, only_matches=True)¶ Filters the individual
fiftyone.core.labels.Keypoint.points
elements in the specified keypoints field of each sample in the collection.Note
Use
filter_labels()
if you simply want to filter entirefiftyone.core.labels.Keypoint
objects in a field.Examples:
import fiftyone as fo
from fiftyone import ViewField as F

dataset = fo.Dataset()
dataset.add_samples(
    [
        fo.Sample(
            filepath="/path/to/image1.png",
            predictions=fo.Keypoints(
                keypoints=[
                    fo.Keypoint(
                        label="person",
                        points=[(0.1, 0.1), (0.1, 0.9), (0.9, 0.9), (0.9, 0.1)],
                        confidence=[0.7, 0.8, 0.95, 0.99],
                    )
                ]
            )
        ),
        fo.Sample(filepath="/path/to/image2.png"),
    ]
)

dataset.default_skeleton = fo.KeypointSkeleton(
    labels=["nose", "left eye", "right eye", "left ear", "right ear"],
    edges=[[0, 1, 2, 0], [0, 3], [0, 4]],
)

#
# Only include keypoints in the `predictions` field whose
# `confidence` is greater than 0.9
#

view = dataset.filter_keypoints(
    "predictions", filter=F("confidence") > 0.9
)

#
# Only include keypoints in the `predictions` field whose skeleton
# label is "left eye" or "right eye"
#

view = dataset.filter_keypoints(
    "predictions", labels=["left eye", "right eye"]
)
- Parameters
field – the
fiftyone.core.labels.Keypoint
orfiftyone.core.labels.Keypoints
field to filterfilter (None) –
a
fiftyone.core.expressions.ViewExpression
or MongoDB expression that returns a boolean, likeF("confidence") > 0.5
orF("occluded") == False
, to apply elementwise to the specified field, which must be a list of same length asfiftyone.core.labels.Keypoint.points
labels (None) – a label or iterable of keypoint skeleton labels to keep
only_matches (True) – whether to only include keypoints/samples with at least one point after filtering (True) or include all keypoints/samples (False)
- Returns
-
filter_labels
(field, filter, only_matches=True, trajectories=False)¶ Filters the
fiftyone.core.labels.Label
field of each sample in the collection.If the specified
field
is a singlefiftyone.core.labels.Label
type, fields for whichfilter
returnsFalse
are replaced withNone
:If the specified
field
is afiftyone.core.labels.Label
list type, the label elements for whichfilter
returnsFalse
are omitted from the view:Classifications Examples:
import fiftyone as fo from fiftyone import ViewField as F dataset = fo.Dataset() dataset.add_samples( [ fo.Sample( filepath="/path/to/image1.png", predictions=fo.Classification(label="cat", confidence=0.9), ), fo.Sample( filepath="/path/to/image2.png", predictions=fo.Classification(label="dog", confidence=0.8), ), fo.Sample( filepath="/path/to/image3.png", predictions=fo.Classification(label="rabbit"), ), fo.Sample( filepath="/path/to/image4.png", predictions=None, ), ] ) # # Only include classifications in the `predictions` field whose # `confidence` is greater than 0.8 # view = dataset.filter_labels("predictions", F("confidence") > 0.8) # # Only include classifications in the `predictions` field whose # `label` is "cat" or "dog" # view = dataset.filter_labels( "predictions", F("label").is_in(["cat", "dog"]) )
Detections Examples:
import fiftyone as fo from fiftyone import ViewField as F dataset = fo.Dataset() dataset.add_samples( [ fo.Sample( filepath="/path/to/image1.png", predictions=fo.Detections( detections=[ fo.Detection( label="cat", bounding_box=[0.1, 0.1, 0.5, 0.5], confidence=0.9, ), fo.Detection( label="dog", bounding_box=[0.2, 0.2, 0.3, 0.3], confidence=0.8, ), ] ), ), fo.Sample( filepath="/path/to/image2.png", predictions=fo.Detections( detections=[ fo.Detection( label="cat", bounding_box=[0.5, 0.5, 0.4, 0.4], confidence=0.95, ), fo.Detection(label="rabbit"), ] ), ), fo.Sample( filepath="/path/to/image3.png", predictions=fo.Detections( detections=[ fo.Detection( label="squirrel", bounding_box=[0.25, 0.25, 0.5, 0.5], confidence=0.5, ), ] ), ), fo.Sample( filepath="/path/to/image4.png", predictions=None, ), ] ) # # Only include detections in the `predictions` field whose # `confidence` is greater than 0.8 # view = dataset.filter_labels("predictions", F("confidence") > 0.8) # # Only include detections in the `predictions` field whose `label` # is "cat" or "dog" # view = dataset.filter_labels( "predictions", F("label").is_in(["cat", "dog"]) ) # # Only include detections in the `predictions` field whose bounding # box area is smaller than 0.2 # # Bboxes are in [top-left-x, top-left-y, width, height] format bbox_area = F("bounding_box")[2] * F("bounding_box")[3] view = dataset.filter_labels("predictions", bbox_area < 0.2)
Polylines Examples:
import fiftyone as fo from fiftyone import ViewField as F dataset = fo.Dataset() dataset.add_samples( [ fo.Sample( filepath="/path/to/image1.png", predictions=fo.Polylines( polylines=[ fo.Polyline( label="lane", points=[[(0.1, 0.1), (0.1, 0.6)]], filled=False, ), fo.Polyline( label="road", points=[[(0.2, 0.2), (0.5, 0.5), (0.2, 0.5)]], filled=True, ), ] ), ), fo.Sample( filepath="/path/to/image2.png", predictions=fo.Polylines( polylines=[ fo.Polyline( label="lane", points=[[(0.4, 0.4), (0.9, 0.4)]], filled=False, ), fo.Polyline( label="road", points=[[(0.6, 0.6), (0.9, 0.9), (0.6, 0.9)]], filled=True, ), ] ), ), fo.Sample( filepath="/path/to/image3.png", predictions=None, ), ] ) # # Only include polylines in the `predictions` field that are filled # view = dataset.filter_labels("predictions", F("filled") == True) # # Only include polylines in the `predictions` field whose `label` # is "lane" # view = dataset.filter_labels("predictions", F("label") == "lane") # # Only include polylines in the `predictions` field with at least # 3 vertices # num_vertices = F("points").map(F().length()).sum() view = dataset.filter_labels("predictions", num_vertices >= 3)
Keypoints Examples:
import fiftyone as fo from fiftyone import ViewField as F dataset = fo.Dataset() dataset.add_samples( [ fo.Sample( filepath="/path/to/image1.png", predictions=fo.Keypoint( label="house", points=[(0.1, 0.1), (0.1, 0.9), (0.9, 0.9), (0.9, 0.1)], ), ), fo.Sample( filepath="/path/to/image2.png", predictions=fo.Keypoint( label="window", points=[(0.4, 0.4), (0.5, 0.5), (0.6, 0.6)], ), ), fo.Sample( filepath="/path/to/image3.png", predictions=None, ), ] ) # # Only include keypoints in the `predictions` field whose `label` # is "house" # view = dataset.filter_labels("predictions", F("label") == "house") # # Only include keypoints in the `predictions` field with less than # four points # view = dataset.filter_labels("predictions", F("points").length() < 4)
- Parameters
field – the label field to filter
filter –
a
fiftyone.core.expressions.ViewExpression
or MongoDB expression that returns a boolean describing the filter to applyonly_matches (True) – whether to only include samples with at least one label after filtering (True) or include all samples (False)
trajectories (False) – whether to match entire object trajectories for which the object matches the given filter on at least one frame. Only applicable to datasets that contain videos and frame-level label fields whose objects have their
index
attributes populated
- Returns
-
flatten
(stages=None)¶ Returns a flattened view that contains all samples in the dynamic grouped collection.
Examples:
import fiftyone as fo
import fiftyone.zoo as foz
from fiftyone import ViewField as F

dataset = foz.load_zoo_dataset("cifar10", split="test")

# Group samples by ground truth label
grouped_view = dataset.take(1000).group_by("ground_truth.label")
print(len(grouped_view))  # 10

# Return a flat view that contains 10 samples from each class
flat_view = grouped_view.flatten(fo.Limit(10))
print(len(flat_view))  # 100
- Parameters
stages (None) – a
fiftyone.core.stages.ViewStage
or list offiftyone.core.stages.ViewStage
instances to apply to each group’s samples while flattening- Returns
-
geo_near
(point, location_field=None, min_distance=None, max_distance=None, query=None)¶ Sorts the samples in the collection by their proximity to a specified geolocation.
Note
This stage must be the first stage in any
fiftyone.core.view.DatasetView
in which it appears.Examples:
import fiftyone as fo import fiftyone.zoo as foz TIMES_SQUARE = [-73.9855, 40.7580] dataset = foz.load_zoo_dataset("quickstart-geo") # # Sort the samples by their proximity to Times Square # view = dataset.geo_near(TIMES_SQUARE) # # Sort the samples by their proximity to Times Square, and only # include samples within 5km # view = dataset.geo_near(TIMES_SQUARE, max_distance=5000) # # Sort the samples by their proximity to Times Square, and only # include samples that are in Manhattan # import fiftyone.utils.geojson as foug in_manhattan = foug.geo_within( "location.point", [ [ [-73.949701, 40.834487], [-73.896611, 40.815076], [-73.998083, 40.696534], [-74.031751, 40.715273], [-73.949701, 40.834487], ] ] ) view = dataset.geo_near( TIMES_SQUARE, location_field="location", query=in_manhattan )
- Parameters
point –
the reference point to compute distances to. Can be any of the following:
A
[longitude, latitude]
listA GeoJSON dict with
Point
typeA
fiftyone.core.labels.GeoLocation
instance whosepoint
attribute contains the point
location_field (None) –
the location data of each sample to use. Can be any of the following:
The name of a
fiftyone.core.fields.GeoLocation
field whosepoint
attribute to use as location dataAn
embedded.field.name
containing GeoJSON data to use as location dataNone
, in which case there must be a singlefiftyone.core.fields.GeoLocation
field on the samples, which is used by default
min_distance (None) – filter samples that are less than this distance (in meters) from
point
max_distance (None) – filter samples that are greater than this distance (in meters) from
point
query (None) – an optional dict defining a MongoDB read query that samples must match in order to be included in this view
- Returns
-
geo_within
(boundary, location_field=None, strict=True)¶ Filters the samples in this collection to only include samples whose geolocation is within a specified boundary.
Examples:
import fiftyone as fo
import fiftyone.zoo as foz

MANHATTAN = [
    [
        [-73.949701, 40.834487],
        [-73.896611, 40.815076],
        [-73.998083, 40.696534],
        [-74.031751, 40.715273],
        [-73.949701, 40.834487],
    ]
]

dataset = foz.load_zoo_dataset("quickstart-geo")

#
# Create a view that only contains samples in Manhattan
#

view = dataset.geo_within(MANHATTAN)
- Parameters
boundary – a
fiftyone.core.labels.GeoLocation
,fiftyone.core.labels.GeoLocations
, GeoJSON dict, or list of coordinates that define aPolygon
orMultiPolygon
to search withinlocation_field (None) –
the location data of each sample to use. Can be any of the following:
The name of a
fiftyone.core.fields.GeoLocation
field whosepoint
attribute to use as location dataAn
embedded.field.name
that directly contains the GeoJSON location data to useNone
, in which case there must be a singlefiftyone.core.fields.GeoLocation
field on the samples, which is used by default
strict (True) – whether a sample’s location data must strictly fall within boundary (True) in order to match, or whether any intersection suffices (False)
- Returns
-
get_annotation_info
(anno_key)¶ Returns information about the annotation run with the given key on this collection.
- Parameters
anno_key – an annotation key
- Returns
-
get_brain_info
(brain_key)¶ Returns information about the brain method run with the given key on this collection.
- Parameters
brain_key – a brain key
- Returns
-
get_classes
(field)¶ Gets the classes list for the given field, or None if no classes are available.
Classes are first retrieved from
classes()
if they exist, otherwise fromdefault_classes()
.- Parameters
field – a field name
- Returns
a list of classes, or None
-
get_dynamic_field_schema
(fields=None, recursive=True)¶ Returns a schema dictionary describing the dynamic fields of the samples in the collection.
Dynamic fields are embedded document fields with at least one non-None value that have not been declared on the dataset’s schema.
- Parameters
fields (None) – an optional field or iterable of fields for which to return dynamic fields. By default, all fields are considered
recursive (True) – whether to recursively inspect nested lists and embedded documents
- Returns
a dict mapping field paths to
fiftyone.core.fields.Field
instances or lists of them
-
get_dynamic_frame_field_schema
(fields=None, recursive=True)¶ Returns a schema dictionary describing the dynamic fields of the frames in the collection.
Dynamic fields are embedded document fields with at least one non-None value that have not been declared on the dataset’s schema.
- Parameters
fields (None) – an optional field or iterable of fields for which to return dynamic fields. By default, all fields are considered
recursive (True) – whether to recursively inspect nested lists and embedded documents
- Returns
a dict mapping field paths to
fiftyone.core.fields.Field
instances or lists of them, orNone
if the collection does not contain videos
-
get_evaluation_info
(eval_key)¶ Returns information about the evaluation with the given key on this collection.
- Parameters
eval_key – an evaluation key
- Returns
-
get_field
(path, ftype=None, embedded_doc_type=None, read_only=None, include_private=False, leaf=False)¶ Returns the field instance of the provided path, or
None
if one does not exist.
- Parameters
path – a field path
ftype (None) – an optional field type to enforce. Must be a subclass of
fiftyone.core.fields.Field
embedded_doc_type (None) – an optional embedded document type to enforce. Must be a subclass of
fiftyone.core.odm.BaseEmbeddedDocument
read_only (None) – whether to optionally enforce that the field is read-only (True) or not read-only (False)
include_private (False) – whether to include fields that start with
_
in the returned schema
leaf (False) – whether to return the subfield of list fields
- Returns
a
fiftyone.core.fields.Field
instance orNone
- Raises
ValueError – if the field does not match provided constraints
-
get_index_information
(include_stats=False)¶ Returns a dictionary of information about the indexes on this collection.
See
pymongo.collection.Collection.index_information()
for details on the structure of this dictionary.
- Parameters
include_stats (False) – whether to include the size and build status of each index
- Returns
a dict mapping index names to info dicts
-
get_mask_targets
(field)¶ Gets the mask targets for the given field, or None if no mask targets are available.
Mask targets are first retrieved from
mask_targets()
if they exist, otherwise from default_mask_targets().
- Parameters
field – a field name
- Returns
the mask targets, or None
-
get_run_info
(run_key)¶ Returns information about the run with the given key on this collection.
- Parameters
run_key – a run key
- Returns
-
get_skeleton
(field)¶ Gets the keypoint skeleton for the given field, or None if no skeleton is available.
Skeletons are first retrieved from
skeletons()
if they exist, otherwise from default_skeleton().
- Parameters
field – a field name
- Returns
the keypoint skeleton, or None
-
group_by
(field_or_expr, order_by=None, reverse=False, flat=False, match_expr=None, sort_expr=None, create_index=True)¶ Creates a view that groups the samples in the collection by a specified field or expression.
Examples:
import fiftyone as fo
import fiftyone.zoo as foz
from fiftyone import ViewField as F

dataset = foz.load_zoo_dataset("cifar10", split="test")

#
# Take 1000 samples at random and group them by ground truth label
#

view = dataset.take(1000).group_by("ground_truth.label")

for group in view.iter_dynamic_groups():
    group_value = group.first().ground_truth.label
    print("%s: %d" % (group_value, len(group)))

#
# Variation of above operation that arranges the groups in
# decreasing order of size and immediately flattens them
#

from itertools import groupby

view = dataset.take(1000).group_by(
    "ground_truth.label",
    flat=True,
    sort_expr=F().length(),
    reverse=True,
)

rle = lambda v: [(k, len(list(g))) for k, g in groupby(v)]
for label, count in rle(view.values("ground_truth.label")):
    print("%s: %d" % (label, count))
- Parameters
field_or_expr – the field or
embedded.field.name
to group by, or a list of field names defining a compound group key, or a fiftyone.core.expressions.ViewExpression
or MongoDB aggregation expression that defines the value to group by
order_by (None) – an optional field by which to order the samples in each group
reverse (False) – whether to return the results in descending order. Applies both to
order_by
and sort_expr
flat (False) – whether to return a grouped collection (False) or a flattened collection (True)
match_expr (None) –
an optional
fiftyone.core.expressions.ViewExpression
or MongoDB aggregation expression that defines which groups to include in the output view. If provided, this expression will be evaluated on the list of samples in each group. Only applicable when flat=True
sort_expr (None) –
an optional
fiftyone.core.expressions.ViewExpression
or MongoDB aggregation expression that defines how to sort the groups in the output view. If provided, this expression will be evaluated on the list of samples in each group. Only applicable when flat=True
create_index (True) – whether to create an index, if necessary, to optimize the grouping. Only applicable when grouping by field(s), not expressions
- Returns
-
has_annotation_run
(anno_key)¶ Whether this collection has an annotation run with the given key.
- Parameters
anno_key – an annotation key
- Returns
True/False
-
property
has_annotation_runs
¶ Whether this collection has any annotation runs.
-
has_brain_run
(brain_key)¶ Whether this collection has a brain method run with the given key.
- Parameters
brain_key – a brain key
- Returns
True/False
-
property
has_brain_runs
¶ Whether this collection has any brain runs.
-
has_classes
(field)¶ Determines whether this collection has a classes list for the given field.
Classes may be defined either in
classes()
or default_classes().
- Parameters
field – a field name
- Returns
True/False
-
has_evaluation
(eval_key)¶ Whether this collection has an evaluation with the given key.
- Parameters
eval_key – an evaluation key
- Returns
True/False
-
property
has_evaluations
¶ Whether this collection has any evaluation results.
-
has_field
(path)¶ Determines whether the collection has a field with the given name.
- Parameters
path – the field name or
embedded.field.name
- Returns
True/False
-
has_frame_field
(path)¶ Determines whether the collection has a frame-level field with the given name.
- Parameters
path – the field name or
embedded.field.name
- Returns
True/False
-
has_mask_targets
(field)¶ Determines whether this collection has mask targets for the given field.
Mask targets may be defined either in
mask_targets()
or default_mask_targets().
- Parameters
field – a field name
- Returns
True/False
-
has_run
(run_key)¶ Whether this collection has a run with the given key.
- Parameters
run_key – a run key
- Returns
True/False
-
property
has_runs
¶ Whether this collection has any runs.
-
has_sample_field
(path)¶ Determines whether the collection has a sample field with the given name.
- Parameters
path – the field name or
embedded.field.name
- Returns
True/False
-
has_skeleton
(field)¶ Determines whether this collection has a keypoint skeleton for the given field.
Keypoint skeletons may be defined either in
skeletons()
or default_skeleton().
- Parameters
field – a field name
- Returns
True/False
-
histogram_values
(field_or_expr, expr=None, bins=None, range=None, auto=False)¶ Computes a histogram of the field values in the collection.
This aggregation is typically applied to numeric field types (or lists of such types):
Examples:
import numpy as np
import matplotlib.pyplot as plt

import fiftyone as fo
from fiftyone import ViewField as F

samples = []
for idx in range(100):
    samples.append(
        fo.Sample(
            filepath="/path/to/image%d.png" % idx,
            numeric_field=np.random.randn(),
            numeric_list_field=list(np.random.randn(10)),
        )
    )

dataset = fo.Dataset()
dataset.add_samples(samples)

def plot_hist(counts, edges):
    counts = np.asarray(counts)
    edges = np.asarray(edges)
    left_edges = edges[:-1]
    widths = edges[1:] - edges[:-1]
    plt.bar(left_edges, counts, width=widths, align="edge")

#
# Compute a histogram of a numeric field
#

counts, edges, other = dataset.histogram_values(
    "numeric_field", bins=50, range=(-4, 4)
)

plot_hist(counts, edges)
plt.show(block=False)

#
# Compute the histogram of a numeric list field
#

counts, edges, other = dataset.histogram_values(
    "numeric_list_field", bins=50
)

plot_hist(counts, edges)
plt.show(block=False)

#
# Compute the histogram of a transformation of a numeric field
#

counts, edges, other = dataset.histogram_values(
    2 * (F("numeric_field") + 1), bins=50
)

plot_hist(counts, edges)
plt.show(block=False)
- Parameters
field_or_expr –
a field name,
embedded.field.name
, fiftyone.core.expressions.ViewExpression
, or MongoDB expression defining the field or expression to aggregate. This can also be a list or tuple of such arguments, in which case a tuple of corresponding aggregation results (each receiving the same additional keyword arguments, if any) will be returned
expr (None) –
a
fiftyone.core.expressions.ViewExpression
or MongoDB expression to apply to field_or_expr
(which must be a field) before aggregating
bins (None) – can be either an integer number of bins to generate or a monotonically increasing sequence specifying the bin edges to use. By default, 10 bins are created. If
bins
is an integer and no range
is specified, bin edges are automatically distributed in an attempt to evenly distribute the counts in each bin
range (None) – a
(lower, upper)
tuple specifying a range in which to generate equal-width bins. Only applicable when bins
is an integer
auto (False) – whether to automatically choose bin edges in an attempt to evenly distribute the counts in each bin. If this option is chosen,
bins
will only be used if it is an integer, and the range
parameter is ignored
- Returns
a tuple of
counts: a list of counts in each bin
edges: an increasing list of bin edges of length
len(counts) + 1
. Note that each bin is treated as having an inclusive lower boundary and exclusive upper boundary, [lower, upper)
, including the rightmost bin
other: the number of items outside the bins
-
init_run
(**kwargs)¶ Initializes a config instance for a new run.
- Parameters
**kwargs – JSON serializable config parameters
- Returns
-
init_run_results
(run_key, **kwargs)¶ Initializes a results instance for the run with the given key.
- Parameters
run_key – a run key
**kwargs – JSON serializable data
- Returns
-
limit
(limit)¶ Returns a view with at most the given number of samples.
Examples:
import fiftyone as fo

dataset = fo.Dataset()
dataset.add_samples(
    [
        fo.Sample(
            filepath="/path/to/image1.png",
            ground_truth=fo.Classification(label="cat"),
        ),
        fo.Sample(
            filepath="/path/to/image2.png",
            ground_truth=fo.Classification(label="dog"),
        ),
        fo.Sample(
            filepath="/path/to/image3.png",
            ground_truth=None,
        ),
    ]
)

#
# Only include the first 2 samples in the view
#

view = dataset.limit(2)
- Parameters
limit – the maximum number of samples to return. If a non-positive number is provided, an empty view is returned
- Returns
-
limit_labels
(field, limit)¶ Limits the number of
fiftyone.core.labels.Label
instances in the specified labels list field of each sample in the collection.
The specified
field
must be one of the following types:
Examples:
import fiftyone as fo
from fiftyone import ViewField as F

dataset = fo.Dataset()
dataset.add_samples(
    [
        fo.Sample(
            filepath="/path/to/image1.png",
            predictions=fo.Detections(
                detections=[
                    fo.Detection(
                        label="cat",
                        bounding_box=[0.1, 0.1, 0.5, 0.5],
                        confidence=0.9,
                    ),
                    fo.Detection(
                        label="dog",
                        bounding_box=[0.2, 0.2, 0.3, 0.3],
                        confidence=0.8,
                    ),
                ]
            ),
        ),
        fo.Sample(
            filepath="/path/to/image2.png",
            predictions=fo.Detections(
                detections=[
                    fo.Detection(
                        label="cat",
                        bounding_box=[0.5, 0.5, 0.4, 0.4],
                        confidence=0.95,
                    ),
                    fo.Detection(label="rabbit"),
                ]
            ),
        ),
        fo.Sample(
            filepath="/path/to/image4.png",
            predictions=None,
        ),
    ]
)

#
# Only include the first detection in the `predictions` field of
# each sample
#

view = dataset.limit_labels("predictions", 1)
- Parameters
field – the labels list field to filter
limit – the maximum number of labels to include in each labels list. If a non-positive number is provided, all lists will be empty
- Returns
-
classmethod
list_aggregations
()¶ Returns a list of all available methods on this collection that apply
fiftyone.core.aggregations.Aggregation
operations to this collection.
- Returns
a list of
SampleCollection
method names
-
list_annotation_runs
(type=None, method=None, **kwargs)¶ Returns a list of annotation keys on this collection.
- Parameters
type (None) –
a specific annotation run type to match, which can be:
a string
fiftyone.core.annotations.AnnotationMethodConfig.type
a
fiftyone.core.annotations.AnnotationMethod
class or its fully-qualified class name string
method (None) – a specific
fiftyone.core.annotations.AnnotationMethodConfig.method
string to match
**kwargs – optional config parameters to match
- Returns
a list of annotation keys
-
list_brain_runs
(type=None, method=None, **kwargs)¶ Returns a list of brain keys on this collection.
- Parameters
type (None) –
a specific brain run type to match, which can be:
a
fiftyone.core.brain.BrainMethod
class or its fully-qualified class name string
method (None) – a specific
fiftyone.core.brain.BrainMethodConfig.method
string to match
**kwargs – optional config parameters to match
- Returns
a list of brain keys
-
list_evaluations
(type=None, method=None, **kwargs)¶ Returns a list of evaluation keys on this collection.
- Parameters
type (None) –
a specific evaluation type to match, which can be:
a string
fiftyone.core.evaluations.EvaluationMethodConfig.type
a
fiftyone.core.evaluations.EvaluationMethod
class or its fully-qualified class name string
method (None) – a specific
fiftyone.core.evaluations.EvaluationMethodConfig.method
string to match
**kwargs – optional config parameters to match
- Returns
a list of evaluation keys
-
list_indexes
()¶ Returns the list of index names on this collection.
Single-field indexes are referenced by their field name, while compound indexes are referenced by more complicated strings. See
pymongo.collection.Collection.index_information()
for details on the compound format.
- Returns
the list of index names
-
list_runs
(**kwargs)¶ Returns a list of run keys on this collection.
- Parameters
**kwargs – optional config parameters to match
- Returns
a list of run keys
-
list_schema
(field_or_expr, expr=None)¶ Extracts the value type(s) in a specified list field across all samples in the collection.
Examples:
from datetime import datetime

import fiftyone as fo

dataset = fo.Dataset()

sample1 = fo.Sample(
    filepath="image1.png",
    ground_truth=fo.Classification(
        label="cat",
        info=[
            fo.DynamicEmbeddedDocument(
                task="initial_annotation",
                author="Alice",
                timestamp=datetime(1970, 1, 1),
                notes=["foo", "bar"],
            ),
            fo.DynamicEmbeddedDocument(
                task="editing_pass",
                author="Bob",
                timestamp=datetime.utcnow(),
            ),
        ],
    ),
)

sample2 = fo.Sample(
    filepath="image2.png",
    ground_truth=fo.Classification(
        label="dog",
        info=[
            fo.DynamicEmbeddedDocument(
                task="initial_annotation",
                author="Bob",
                timestamp=datetime(2018, 10, 18),
                notes=["spam", "eggs"],
            ),
        ],
    ),
)

dataset.add_samples([sample1, sample2])

# Determine that `ground_truth.info` contains embedded documents
print(dataset.list_schema("ground_truth.info"))
# fo.EmbeddedDocumentField

# Determine the fields of the embedded documents in the list
print(dataset.schema("ground_truth.info[]"))
# {'task': StringField, ..., 'notes': ListField}

# Determine the type of the values in the nested `notes` list field
# Since `ground_truth.info` is not yet declared on the dataset's
# schema, we must manually include `[]` to unwind the info lists
print(dataset.list_schema("ground_truth.info[].notes"))
# fo.StringField

# Declare the `ground_truth.info` field
dataset.add_sample_field(
    "ground_truth.info",
    fo.ListField,
    subfield=fo.EmbeddedDocumentField,
    embedded_doc_type=fo.DynamicEmbeddedDocument,
)

# Now we can inspect the nested `notes` field without unwinding
print(dataset.list_schema("ground_truth.info.notes"))
# fo.StringField
- Parameters
field_or_expr –
a field name,
embedded.field.name
, fiftyone.core.expressions.ViewExpression
, or MongoDB expression defining the field or expression to aggregate
expr (None) –
a
fiftyone.core.expressions.ViewExpression
or MongoDB expression to apply to field_or_expr
(which must be a field) before aggregating
- Returns
a
fiftyone.core.fields.Field
or list of fiftyone.core.fields.Field
instances describing the value type(s) in the list
-
classmethod
list_view_stages
()¶ Returns a list of all available methods on this collection that apply
fiftyone.core.stages.ViewStage
operations to this collection.- Returns
a list of
SampleCollection
method names
-
load_annotation_results
(anno_key, cache=True, **kwargs)¶ Loads the results for the annotation run with the given key on this collection.
The
fiftyone.utils.annotations.AnnotationResults
object returned by this method will provide a variety of backend-specific methods allowing you to perform actions such as checking the status and deleting this run from the annotation backend.
Use
load_annotations()
to load the labels from an annotation run onto your FiftyOne dataset.
- Parameters
anno_key – an annotation key
cache (True) – whether to cache the results on the collection
**kwargs – keyword arguments for the run’s
fiftyone.core.annotation.AnnotationMethodConfig.load_credentials()
method
- Returns
-
load_annotation_view
(anno_key, select_fields=False)¶ Loads the
fiftyone.core.view.DatasetView
on which the specified annotation run was performed on this collection.
- Parameters
anno_key – an annotation key
select_fields (False) – whether to exclude fields involved in other annotation runs
- Returns
-
load_annotations
(anno_key, dest_field=None, unexpected='prompt', cleanup=False, progress=None, **kwargs)¶ Downloads the labels from the given annotation run from the annotation backend and merges them into this collection.
See this page for more information about using this method to import annotations that you have scheduled by calling
annotate().
- Parameters
anno_key – an annotation key
dest_field (None) – an optional name of a new destination field into which to load the annotations, or a dict mapping field names in the run’s label schema to new destination field names
unexpected ("prompt") –
how to deal with any unexpected labels that don’t match the run’s label schema when importing. The supported values are:
"prompt"
: present an interactive prompt to direct/discard unexpected labels
"ignore"
: automatically ignore any unexpected labels
"keep"
: automatically keep all unexpected labels in a field whose name matches the label type
"return"
: return a dict containing all unexpected labels, or None
if there aren’t any
cleanup (False) – whether to delete any information regarding this run from the annotation backend after loading the annotations
progress (None) – whether to render a progress bar (True/False), use the default value
fiftyone.config.show_progress_bars
(None), or a progress callback function to invoke instead**kwargs – keyword arguments for the run’s
fiftyone.core.annotation.AnnotationMethodConfig.load_credentials()
method
- Returns
None
, unless unexpected=="return"
and unexpected labels are found, in which case a dict containing the extra labels is returned
-
load_brain_results
(brain_key, cache=True, load_view=True, **kwargs)¶ Loads the results for the brain method run with the given key on this collection.
- Parameters
brain_key – a brain key
cache (True) – whether to cache the results on the collection
load_view (True) – whether to load the view on which the results were computed (True) or the full dataset (False)
**kwargs – keyword arguments for the run’s
fiftyone.core.brain.BrainMethodConfig.load_credentials()
method
- Returns
-
load_brain_view
(brain_key, select_fields=False)¶ Loads the
fiftyone.core.view.DatasetView
on which the specified brain method run was performed on this collection.
- Parameters
brain_key – a brain key
select_fields (False) – whether to exclude fields involved in other brain method runs
- Returns
-
load_evaluation_results
(eval_key, cache=True, **kwargs)¶ Loads the results for the evaluation with the given key on this collection.
- Parameters
eval_key – an evaluation key
cache (True) – whether to cache the results on the collection
**kwargs – keyword arguments for the run’s
fiftyone.core.evaluation.EvaluationMethodConfig.load_credentials()
method
- Returns
-
load_evaluation_view
(eval_key, select_fields=False)¶ Loads the
fiftyone.core.view.DatasetView
on which the specified evaluation was performed on this collection.
- Parameters
eval_key – an evaluation key
select_fields (False) – whether to exclude fields involved in other evaluations
- Returns
-
load_run_results
(run_key, cache=True, load_view=True, **kwargs)¶ Loads the results for the run with the given key on this collection.
- Parameters
run_key – a run key
cache (True) – whether to cache the results on the collection
load_view (True) – whether to load the view on which the results were computed (True) or the full dataset (False)
**kwargs – keyword arguments for the run’s
fiftyone.core.runs.RunConfig.load_credentials()
method
- Returns
-
load_run_view
(run_key, select_fields=False)¶ Loads the
fiftyone.core.view.DatasetView
on which the specified run was performed on this collection.
- Parameters
run_key – a run key
select_fields (False) – whether to exclude fields involved in other runs
- Returns
-
make_unique_field_name
(root='')¶ Makes a unique field name with the given root name for the collection.
- Parameters
root – an optional root for the output field name
- Returns
the field name
-
map_labels
(field, map)¶ Maps the
label
values of a fiftyone.core.labels.Label
field to new values for each sample in the collection.
Examples:
import fiftyone as fo
from fiftyone import ViewField as F

dataset = fo.Dataset()
dataset.add_samples(
    [
        fo.Sample(
            filepath="/path/to/image1.png",
            weather=fo.Classification(label="sunny"),
            predictions=fo.Detections(
                detections=[
                    fo.Detection(
                        label="cat",
                        bounding_box=[0.1, 0.1, 0.5, 0.5],
                        confidence=0.9,
                    ),
                    fo.Detection(
                        label="dog",
                        bounding_box=[0.2, 0.2, 0.3, 0.3],
                        confidence=0.8,
                    ),
                ]
            ),
        ),
        fo.Sample(
            filepath="/path/to/image2.png",
            weather=fo.Classification(label="cloudy"),
            predictions=fo.Detections(
                detections=[
                    fo.Detection(
                        label="cat",
                        bounding_box=[0.5, 0.5, 0.4, 0.4],
                        confidence=0.95,
                    ),
                    fo.Detection(label="rabbit"),
                ]
            ),
        ),
        fo.Sample(
            filepath="/path/to/image3.png",
            weather=fo.Classification(label="partly cloudy"),
            predictions=fo.Detections(
                detections=[
                    fo.Detection(
                        label="squirrel",
                        bounding_box=[0.25, 0.25, 0.5, 0.5],
                        confidence=0.5,
                    ),
                ]
            ),
        ),
        fo.Sample(
            filepath="/path/to/image4.png",
            predictions=None,
        ),
    ]
)

#
# Map the "partly cloudy" weather label to "cloudy"
#

view = dataset.map_labels("weather", {"partly cloudy": "cloudy"})

#
# Map "rabbit" and "squirrel" predictions to "other"
#

view = dataset.map_labels(
    "predictions", {"rabbit": "other", "squirrel": "other"}
)
- Parameters
field – the labels field to map
map – a dict mapping label values to new label values
- Returns
-
match
(filter)¶ Filters the samples in the collection by the given filter.
Examples:
import fiftyone as fo
from fiftyone import ViewField as F

dataset = fo.Dataset()
dataset.add_samples(
    [
        fo.Sample(
            filepath="/path/to/image1.png",
            weather=fo.Classification(label="sunny"),
            predictions=fo.Detections(
                detections=[
                    fo.Detection(
                        label="cat",
                        bounding_box=[0.1, 0.1, 0.5, 0.5],
                        confidence=0.9,
                    ),
                    fo.Detection(
                        label="dog",
                        bounding_box=[0.2, 0.2, 0.3, 0.3],
                        confidence=0.8,
                    ),
                ]
            ),
        ),
        fo.Sample(
            filepath="/path/to/image2.jpg",
            weather=fo.Classification(label="cloudy"),
            predictions=fo.Detections(
                detections=[
                    fo.Detection(
                        label="cat",
                        bounding_box=[0.5, 0.5, 0.4, 0.4],
                        confidence=0.95,
                    ),
                    fo.Detection(label="rabbit"),
                ]
            ),
        ),
        fo.Sample(
            filepath="/path/to/image3.png",
            weather=fo.Classification(label="partly cloudy"),
            predictions=fo.Detections(
                detections=[
                    fo.Detection(
                        label="squirrel",
                        bounding_box=[0.25, 0.25, 0.5, 0.5],
                        confidence=0.5,
                    ),
                ]
            ),
        ),
        fo.Sample(
            filepath="/path/to/image4.jpg",
            predictions=None,
        ),
    ]
)

#
# Only include samples whose `filepath` ends with ".jpg"
#

view = dataset.match(F("filepath").ends_with(".jpg"))

#
# Only include samples whose `weather` field is "sunny"
#

view = dataset.match(F("weather").label == "sunny")

#
# Only include samples with at least 2 objects in their
# `predictions` field
#

view = dataset.match(F("predictions").detections.length() >= 2)

#
# Only include samples whose `predictions` field contains at least
# one object with area smaller than 0.2
#

# Bboxes are in [top-left-x, top-left-y, width, height] format
bbox = F("bounding_box")
bbox_area = bbox[2] * bbox[3]

small_boxes = F("predictions.detections").filter(bbox_area < 0.2)
view = dataset.match(small_boxes.length() > 0)
- Parameters
filter –
a
fiftyone.core.expressions.ViewExpression
or MongoDB expression that returns a boolean describing the filter to apply
- Returns
-
match_frames
(filter, omit_empty=True)¶ Filters the frames in the video collection by the given filter.
Examples:
import fiftyone as fo
import fiftyone.zoo as foz
from fiftyone import ViewField as F

dataset = foz.load_zoo_dataset("quickstart-video")

#
# Match frames with at least 10 detections
#

num_objects = F("detections.detections").length()
view = dataset.match_frames(num_objects > 10)

print(dataset.count())
print(view.count())

print(dataset.count("frames"))
print(view.count("frames"))
- Parameters
filter –
a
fiftyone.core.expressions.ViewExpression
or MongoDB aggregation expression that returns a boolean describing the filter to apply
omit_empty (True) – whether to omit samples with no frame labels after filtering
- Returns
-
match_labels
(labels=None, ids=None, tags=None, filter=None, fields=None, bool=None)¶ Selects the samples from the collection that contain (or do not contain) at least one label that matches the specified criteria.
Note that, unlike
select_labels()
and filter_labels()
, this stage will not filter the labels themselves; it only selects the corresponding samples.
You can perform a selection via one or more of the following methods:
Provide the
labels
argument, which should contain a list of dicts in the format returned by fiftyone.core.session.Session.selected_labels
, to match specific labels
Provide the
ids
argument to match labels with specific IDs
Provide the
tags
argument to match labels with specific tags
Provide the
filter
argument to match labels based on a boolean fiftyone.core.expressions.ViewExpression
that is applied to each individual fiftyone.core.labels.Label
element
Pass
bool=False
to negate the operation and instead match samples that do not contain at least one label matching the specified criteria
If multiple criteria are specified, labels must match all of them in order to trigger a sample match.
By default, the selection is applied to all
fiftyone.core.labels.Label
fields, but you can provide the fields
argument to explicitly define the field(s) in which to search.
Examples:
import fiftyone as fo
import fiftyone.zoo as foz
from fiftyone import ViewField as F

dataset = foz.load_zoo_dataset("quickstart")

#
# Only show samples whose labels are currently selected in the App
#

session = fo.launch_app(dataset)

# Select some labels in the App...

view = dataset.match_labels(labels=session.selected_labels)

#
# Only include samples that contain labels with the specified IDs
#

# Grab some label IDs
ids = [
    dataset.first().ground_truth.detections[0].id,
    dataset.last().predictions.detections[0].id,
]

view = dataset.match_labels(ids=ids)

print(len(view))
print(view.count("ground_truth.detections"))
print(view.count("predictions.detections"))

#
# Only include samples that contain labels with the specified tags
#

# Grab some label IDs
ids = [
    dataset.first().ground_truth.detections[0].id,
    dataset.last().predictions.detections[0].id,
]

# Give the labels a "test" tag
dataset = dataset.clone()  # create copy since we're modifying data
dataset.select_labels(ids=ids).tag_labels("test")

print(dataset.count_values("ground_truth.detections.tags"))
print(dataset.count_values("predictions.detections.tags"))

# Retrieve the labels via their tag
view = dataset.match_labels(tags="test")

print(len(view))
print(view.count("ground_truth.detections"))
print(view.count("predictions.detections"))

#
# Only include samples that contain labels matching a filter
#

filter = F("confidence") > 0.99
view = dataset.match_labels(filter=filter, fields="predictions")

print(len(view))
print(view.count("ground_truth.detections"))
print(view.count("predictions.detections"))
- Parameters
labels (None) – a list of dicts specifying the labels to select in the format returned by
fiftyone.core.session.Session.selected_labels
ids (None) – an ID or iterable of IDs of the labels to select
tags (None) – a tag or iterable of tags of labels to select
filter (None) –
a
fiftyone.core.expressions.ViewExpression
or MongoDB aggregation expression that returns a boolean describing whether to select a given label. In the case of list fields likefiftyone.core.labels.Detections
, the filter is applied to the list elements, not the root fieldfields (None) – a field or iterable of fields from which to select
bool (None) – whether to match samples that have (None or True) or do not have (False) at least one label that matches the specified criteria
- Returns
-
match_tags
(tags, bool=None, all=False)¶ Returns a view containing the samples in the collection that have or don’t have any/all of the given tag(s).
Examples:
import fiftyone as fo

dataset = fo.Dataset()
dataset.add_samples(
    [
        fo.Sample(filepath="image1.png", tags=["train"]),
        fo.Sample(filepath="image2.png", tags=["test"]),
        fo.Sample(filepath="image3.png", tags=["train", "test"]),
        fo.Sample(filepath="image4.png"),
    ]
)

#
# Only include samples that have the "test" tag
#

view = dataset.match_tags("test")

#
# Only include samples that do not have the "test" tag
#

view = dataset.match_tags("test", bool=False)

#
# Only include samples that have the "test" or "train" tags
#

view = dataset.match_tags(["test", "train"])

#
# Only include samples that have the "test" and "train" tags
#

view = dataset.match_tags(["test", "train"], all=True)

#
# Only include samples that do not have the "test" or "train" tags
#

view = dataset.match_tags(["test", "train"], bool=False)

#
# Only include samples that do not have the "test" and "train" tags
#

view = dataset.match_tags(["test", "train"], bool=False, all=True)
- Parameters
tags – the tag or iterable of tags to match
bool (None) – whether to match samples that have (None or True) or do not have (False) the given tags
all (False) – whether to match samples that have (or don’t have) all (True) or any (False) of the given tags
- Returns
-
mean
(field_or_expr, expr=None, safe=False)¶ Computes the arithmetic mean of the field values of the collection.
None
-valued fields are ignored.
This aggregation is typically applied to numeric field types (or lists of such types):
Examples:
import fiftyone as fo
from fiftyone import ViewField as F

dataset = fo.Dataset()
dataset.add_samples(
    [
        fo.Sample(
            filepath="/path/to/image1.png",
            numeric_field=1.0,
            numeric_list_field=[1, 2, 3],
        ),
        fo.Sample(
            filepath="/path/to/image2.png",
            numeric_field=4.0,
            numeric_list_field=[1, 2],
        ),
        fo.Sample(
            filepath="/path/to/image3.png",
            numeric_field=None,
            numeric_list_field=None,
        ),
    ]
)

#
# Compute the mean of a numeric field
#

mean = dataset.mean("numeric_field")
print(mean)  # the mean

#
# Compute the mean of a numeric list field
#

mean = dataset.mean("numeric_list_field")
print(mean)  # the mean

#
# Compute the mean of a transformation of a numeric field
#

mean = dataset.mean(2 * (F("numeric_field") + 1))
print(mean)  # the mean
- Parameters
field_or_expr –
a field name,
embedded.field.name
, fiftyone.core.expressions.ViewExpression
, or MongoDB expression defining the field or expression to aggregate. This can also be a list or tuple of such arguments, in which case a tuple of corresponding aggregation results (each receiving the same additional keyword arguments, if any) will be returned
expr (None) –
a
fiftyone.core.expressions.ViewExpression
or MongoDB expression to apply to field_or_expr
(which must be a field) before aggregating
safe (False) – whether to ignore nan/inf values when dealing with floating point values
- Returns
the mean
-
merge_labels
(in_field, out_field)¶ Merges the labels from the given input field into the given output field of the collection.
If this collection is a dataset, the input field is deleted after the merge.
If this collection is a view, the input field will still exist on the underlying dataset but will only contain the labels not present in this view.
- Parameters
in_field – the name of the input label field
out_field – the name of the output label field, which will be created if necessary
-
mongo
(pipeline, _needs_frames=None, _group_slices=None)¶ Adds a view stage defined by a raw MongoDB aggregation pipeline.
See MongoDB aggregation pipelines for more details.
Examples:
import fiftyone as fo

dataset = fo.Dataset()
dataset.add_samples(
    [
        fo.Sample(
            filepath="/path/to/image1.png",
            predictions=fo.Detections(
                detections=[
                    fo.Detection(
                        label="cat",
                        bounding_box=[0.1, 0.1, 0.5, 0.5],
                        confidence=0.9,
                    ),
                    fo.Detection(
                        label="dog",
                        bounding_box=[0.2, 0.2, 0.3, 0.3],
                        confidence=0.8,
                    ),
                ]
            ),
        ),
        fo.Sample(
            filepath="/path/to/image2.png",
            predictions=fo.Detections(
                detections=[
                    fo.Detection(
                        label="cat",
                        bounding_box=[0.5, 0.5, 0.4, 0.4],
                        confidence=0.95,
                    ),
                    fo.Detection(label="rabbit"),
                ]
            ),
        ),
        fo.Sample(
            filepath="/path/to/image3.png",
            predictions=fo.Detections(
                detections=[
                    fo.Detection(
                        label="squirrel",
                        bounding_box=[0.25, 0.25, 0.5, 0.5],
                        confidence=0.5,
                    ),
                ]
            ),
        ),
        fo.Sample(
            filepath="/path/to/image4.png",
            predictions=None,
        ),
    ]
)

#
# Extract a view containing the second and third samples in the
# dataset
#

view = dataset.mongo([{"$skip": 1}, {"$limit": 2}])

#
# Sort by the number of objects in the `predictions` field
#

view = dataset.mongo([
    {
        "$addFields": {
            "_sort_field": {
                "$size": {"$ifNull": ["$predictions.detections", []]}
            }
        }
    },
    {"$sort": {"_sort_field": -1}},
    {"$project": {"_sort_field": False}},
])
- Parameters
pipeline – a MongoDB aggregation pipeline (list of dicts)
- Returns
-
quantiles
(field_or_expr, quantiles, expr=None, safe=False)¶ Computes the quantile(s) of the field values of a collection.
None
-valued fields are ignored. This aggregation is typically applied to numeric field types (or lists of such types):
Examples:
import fiftyone as fo
from fiftyone import ViewField as F

dataset = fo.Dataset()
dataset.add_samples(
    [
        fo.Sample(
            filepath="/path/to/image1.png",
            numeric_field=1.0,
            numeric_list_field=[1, 2, 3],
        ),
        fo.Sample(
            filepath="/path/to/image2.png",
            numeric_field=4.0,
            numeric_list_field=[1, 2],
        ),
        fo.Sample(
            filepath="/path/to/image3.png",
            numeric_field=None,
            numeric_list_field=None,
        ),
    ]
)

#
# Compute the quantiles of a numeric field
#

quantiles = dataset.quantiles("numeric_field", [0.1, 0.5, 0.9])
print(quantiles)  # the quantiles

#
# Compute the quantiles of a numeric list field
#

quantiles = dataset.quantiles("numeric_list_field", [0.1, 0.5, 0.9])
print(quantiles)  # the quantiles

#
# Compute the quantiles of a transformation of a numeric field
#

quantiles = dataset.quantiles(2 * (F("numeric_field") + 1), [0.1, 0.5, 0.9])
print(quantiles)  # the quantiles
- Parameters
field_or_expr – a field name, embedded.field.name, fiftyone.core.expressions.ViewExpression, or MongoDB expression defining the field or expression to aggregate
quantiles – the quantile or iterable of quantiles to compute. Each quantile must be a numeric value in [0, 1]
expr (None) – a fiftyone.core.expressions.ViewExpression or MongoDB expression to apply to field_or_expr (which must be a field) before aggregating
safe (False) – whether to ignore nan/inf values when dealing with floating point values
- Returns
the quantile or list of quantiles
-
register_run
(run_key, config, results=None, overwrite=False, cleanup=True, cache=True)¶ Registers a run under the given key on this collection.
- Parameters
run_key – a run key
config – a
fiftyone.core.runs.RunConfig
results (None) – an optional
fiftyone.core.runs.RunResults
overwrite (False) – whether to allow overwriting an existing run of the same type
cleanup (True) – whether to execute an existing run’s
fiftyone.core.runs.Run.cleanup()
method when overwriting it
cache (True) – whether to cache the results on the collection
-
rename_annotation_run
(anno_key, new_anno_key)¶ Replaces the key for the given annotation run with a new key.
- Parameters
anno_key – an annotation key
new_anno_key – a new annotation key
-
rename_brain_run
(brain_key, new_brain_key)¶ Replaces the key for the given brain run with a new key.
- Parameters
brain_key – a brain key
new_brain_key – a new brain key
-
rename_evaluation
(eval_key, new_eval_key)¶ Replaces the key for the given evaluation with a new key.
- Parameters
eval_key – an evaluation key
new_eval_key – a new evaluation key
-
rename_run
(run_key, new_run_key)¶ Replaces the key for the given run with a new key.
- Parameters
run_key – a run key
new_run_key – a new run key
-
save_context
(batch_size=None, batching_strategy=None)¶ Returns a context that can be used to save samples from this collection according to a configurable batching strategy.
Examples:
import random as r
import string as s

import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("cifar10", split="test")

def make_label():
    return "".join(r.choice(s.ascii_letters) for i in range(10))

# No save context
for sample in dataset.iter_samples(progress=True):
    sample.ground_truth.label = make_label()
    sample.save()

# Save using default batching strategy
with dataset.save_context() as context:
    for sample in dataset.iter_samples(progress=True):
        sample.ground_truth.label = make_label()
        context.save(sample)

# Save in batches of 10
with dataset.save_context(batch_size=10) as context:
    for sample in dataset.iter_samples(progress=True):
        sample.ground_truth.label = make_label()
        context.save(sample)

# Save every 0.5 seconds
with dataset.save_context(batch_size=0.5) as context:
    for sample in dataset.iter_samples(progress=True):
        sample.ground_truth.label = make_label()
        context.save(sample)
- Parameters
batch_size (None) – the batch size to use. If a batching_strategy is provided, this parameter configures the strategy as described below. If no batching_strategy is provided, this can either be an integer specifying the number of samples to save in a batch (in which case batching_strategy is implicitly set to "static") or a float number of seconds between batched saves (in which case batching_strategy is implicitly set to "latency")
batching_strategy (None) – the batching strategy to use for each save operation. Supported values are:
"static": a fixed sample batch size for each save
"size": a target batch size, in bytes, for each save
"latency": a target latency, in seconds, between saves
By default, fo.config.default_batcher is used
- Returns
a
SaveContext
-
save_run_results
(run_key, results, overwrite=True, cache=True)¶ Saves run results for the run with the given key.
- Parameters
run_key – a run key
results – a
fiftyone.core.runs.RunResults
overwrite (True) – whether to overwrite an existing result with the same key
cache (True) – whether to cache the results on the collection
-
schema
(field_or_expr, expr=None, dynamic_only=False, _doc_type=None, _include_private=False)¶ Extracts the names and types of the attributes of a specified embedded document field across all samples in the collection.
Schema aggregations are useful for detecting the presence and types of dynamic attributes of
fiftyone.core.labels.Label
fields across a collection.
Examples:
import fiftyone as fo

dataset = fo.Dataset()

sample1 = fo.Sample(
    filepath="image1.png",
    ground_truth=fo.Detections(
        detections=[
            fo.Detection(
                label="cat",
                bounding_box=[0.1, 0.1, 0.4, 0.4],
                foo="bar",
                hello=True,
            ),
            fo.Detection(
                label="dog",
                bounding_box=[0.5, 0.5, 0.4, 0.4],
                hello=None,
            ),
        ]
    ),
)

sample2 = fo.Sample(
    filepath="image2.png",
    ground_truth=fo.Detections(
        detections=[
            fo.Detection(
                label="rabbit",
                bounding_box=[0.1, 0.1, 0.4, 0.4],
                foo=None,
            ),
            fo.Detection(
                label="squirrel",
                bounding_box=[0.5, 0.5, 0.4, 0.4],
                hello="there",
            ),
        ]
    ),
)

dataset.add_samples([sample1, sample2])

#
# Get schema of all dynamic attributes on the detections in a
# `Detections` field
#

print(dataset.schema("ground_truth.detections", dynamic_only=True))
# {'foo': StringField, 'hello': [BooleanField, StringField]}
- Parameters
field_or_expr – a field name, embedded.field.name, fiftyone.core.expressions.ViewExpression, or MongoDB expression defining the field or expression to aggregate
expr (None) – a fiftyone.core.expressions.ViewExpression or MongoDB expression to apply to field_or_expr (which must be a field) before aggregating
dynamic_only (False) – whether to only include dynamically added attributes
- Returns
a dict mapping field names to
fiftyone.core.fields.Field
instances. If a field's values take multiple non-None types, the list of observed types will be returned
-
select
(sample_ids, ordered=False)¶ Selects the samples with the given IDs from the collection.
Examples:
import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart")

#
# Create a view containing the currently selected samples in the App
#

session = fo.launch_app(dataset)

# Select samples in the App...

view = dataset.select(session.selected)
- Parameters
sample_ids –
the samples to select. Can be any of the following:
a sample ID
an iterable of sample IDs
an iterable of booleans of same length as the collection encoding which samples to select
a fiftyone.core.sample.Sample or fiftyone.core.sample.SampleView
an iterable of fiftyone.core.sample.Sample or fiftyone.core.sample.SampleView instances
ordered (False) – whether to sort the samples in the returned view to match the order of the provided IDs
- Returns
-
select_by
(field, values, ordered=False)¶ Selects the samples with the given field values from the collection.
This stage is typically used to work with categorical fields (strings, ints, and bools). If you want to select samples based on floating point fields, use
match()
.
Examples:
import fiftyone as fo

dataset = fo.Dataset()
dataset.add_samples(
    [
        fo.Sample(filepath="image%d.jpg" % i, int=i, str=str(i))
        for i in range(100)
    ]
)

#
# Create a view containing samples whose `int` field have the given
# values
#

view = dataset.select_by("int", [1, 51, 11, 41, 21, 31])
print(view.head(6))

#
# Create a view containing samples whose `str` field have the given
# values, in order
#

view = dataset.select_by(
    "str", ["1", "51", "11", "41", "21", "31"], ordered=True
)
print(view.head(6))
- Parameters
field – a field or
embedded.field.name
values – a value or iterable of values to select by
ordered (False) – whether to sort the samples in the returned view to match the order of the provided values
- Returns
-
select_fields
(field_names=None, meta_filter=None, _allow_missing=False)¶ Selects only the fields with the given names from the samples in the collection. All other fields are excluded.
Note that default sample fields are always selected.
Examples:
import fiftyone as fo

dataset = fo.Dataset()
dataset.add_samples(
    [
        fo.Sample(
            filepath="/path/to/image1.png",
            uniqueness=1.0,
            ground_truth=fo.Detections(
                detections=[
                    fo.Detection(
                        label="cat",
                        bounding_box=[0.1, 0.1, 0.5, 0.5],
                        mood="surly",
                        age=51,
                    ),
                    fo.Detection(
                        label="dog",
                        bounding_box=[0.2, 0.2, 0.3, 0.3],
                        mood="happy",
                        age=52,
                    ),
                ]
            ),
        ),
        fo.Sample(
            filepath="/path/to/image2.png",
            uniqueness=0.0,
        ),
        fo.Sample(
            filepath="/path/to/image3.png",
        ),
    ]
)

#
# Include only the default fields on each sample
#

view = dataset.select_fields()

#
# Include only the `uniqueness` field (and the default fields) on
# each sample
#

view = dataset.select_fields("uniqueness")

#
# Include only the `mood` attribute (and the default attributes) of
# each `Detection` in the `ground_truth` field
#

view = dataset.select_fields("ground_truth.detections.mood")
- Parameters
field_names (None) – a field name or iterable of field names to select. May contain embedded.field.name as well
meta_filter (None) – a filter that dynamically selects fields in the collection's schema according to the specified rule, which can be matched against the field's name, type, description, and/or info. For example:
Use meta_filter="2023" or meta_filter={"any": "2023"} to select fields that have the string "2023" anywhere in their name, type, description, or info
Use meta_filter={"type": "StringField"} or meta_filter={"type": "Classification"} to select all string or classification fields, respectively
Use meta_filter={"description": "my description"} to select fields whose description contains the string "my description"
Use meta_filter={"info": "2023"} to select fields that have the string "2023" anywhere in their info
Use meta_filter={"info.key": "value"} to select fields that have a specific key/value pair in their info
Include meta_filter={"include_nested_fields": True, ...} in your meta filter to include all nested fields in the filter
- Returns
-
select_frames
(frame_ids, omit_empty=True)¶ Selects the frames with the given IDs from the video collection.
Examples:
import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart-video")

#
# Select some specific frames
#

frame_ids = [
    dataset.first().frames.first().id,
    dataset.last().frames.last().id,
]

view = dataset.select_frames(frame_ids)

print(dataset.count())
print(view.count())

print(dataset.count("frames"))
print(view.count("frames"))
- Parameters
frame_ids –
the frames to select. Can be any of the following:
a frame ID
an iterable of frame IDs
a fiftyone.core.frame.Frame or fiftyone.core.frame.FrameView
an iterable of fiftyone.core.frame.Frame or fiftyone.core.frame.FrameView instances
a fiftyone.core.collections.SampleCollection whose frames to select
omit_empty (True) – whether to omit samples that have no frames after selecting the specified frames
- Returns
-
select_group_slices
(slices=None, media_type=None, _allow_mixed=False, _force_mixed=False)¶ Selects the samples in the group collection from the given slice(s).
The returned view is a flattened non-grouped view containing only the slice(s) of interest.
Note
This stage performs a
$lookup
that pulls the requested slice(s) for each sample in the input collection from the source dataset. As a result, this stage always emits unfiltered samples.
Examples:
import fiftyone as fo

dataset = fo.Dataset()
dataset.add_group_field("group", default="ego")

group1 = fo.Group()
group2 = fo.Group()

dataset.add_samples(
    [
        fo.Sample(
            filepath="/path/to/left-image1.jpg",
            group=group1.element("left"),
        ),
        fo.Sample(
            filepath="/path/to/video1.mp4",
            group=group1.element("ego"),
        ),
        fo.Sample(
            filepath="/path/to/right-image1.jpg",
            group=group1.element("right"),
        ),
        fo.Sample(
            filepath="/path/to/left-image2.jpg",
            group=group2.element("left"),
        ),
        fo.Sample(
            filepath="/path/to/video2.mp4",
            group=group2.element("ego"),
        ),
        fo.Sample(
            filepath="/path/to/right-image2.jpg",
            group=group2.element("right"),
        ),
    ]
)

#
# Retrieve the samples from the "ego" group slice
#

view = dataset.select_group_slices("ego")

#
# Retrieve the samples from the "left" or "right" group slices
#

view = dataset.select_group_slices(["left", "right"])

#
# Retrieve all image samples
#

view = dataset.select_group_slices(media_type="image")
- Parameters
slices (None) – a group slice or iterable of group slices to select. If neither argument is provided, a flattened list of all samples is returned
media_type (None) – a media type whose slice(s) to select
- Returns
-
select_groups
(group_ids, ordered=False)¶ Selects the groups with the given IDs from the grouped collection.
Examples:
import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart-groups")

#
# Select some specific groups by ID
#

group_ids = dataset.take(10).values("group.id")

view = dataset.select_groups(group_ids)

assert set(view.values("group.id")) == set(group_ids)

view = dataset.select_groups(group_ids, ordered=True)

assert view.values("group.id") == group_ids
- Parameters
group_ids –
the groups to select. Can be any of the following:
a group ID
an iterable of group IDs
a fiftyone.core.sample.Sample or fiftyone.core.sample.SampleView
a group dict returned by get_group()
an iterable of fiftyone.core.sample.Sample or fiftyone.core.sample.SampleView instances
an iterable of group dicts returned by get_group()
ordered (False) – whether to sort the groups in the returned view to match the order of the provided IDs
- Returns
-
select_labels
(labels=None, ids=None, tags=None, fields=None, omit_empty=True)¶ Selects only the specified labels from the collection.
The returned view will omit samples, sample fields, and individual labels that do not match the specified selection criteria.
You can perform a selection via one or more of the following methods:
Provide the labels argument, which should contain a list of dicts in the format returned by fiftyone.core.session.Session.selected_labels, to select specific labels
Provide the ids argument to select labels with specific IDs
Provide the tags argument to select labels with specific tags
If multiple criteria are specified, labels must match all of them in order to be selected.
By default, the selection is applied to all fiftyone.core.labels.Label fields, but you can provide the fields argument to explicitly define the field(s) in which to select.
Examples:
import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart")

#
# Only include the labels currently selected in the App
#

session = fo.launch_app(dataset)

# Select some labels in the App...

view = dataset.select_labels(labels=session.selected_labels)

#
# Only include labels with the specified IDs
#

# Grab some label IDs
ids = [
    dataset.first().ground_truth.detections[0].id,
    dataset.last().predictions.detections[0].id,
]

view = dataset.select_labels(ids=ids)

print(view.count("ground_truth.detections"))
print(view.count("predictions.detections"))

#
# Only include labels with the specified tags
#

# Grab some label IDs
ids = [
    dataset.first().ground_truth.detections[0].id,
    dataset.last().predictions.detections[0].id,
]

# Give the labels a "test" tag
dataset = dataset.clone()  # create copy since we're modifying data
dataset.select_labels(ids=ids).tag_labels("test")

print(dataset.count_label_tags())

# Retrieve the labels via their tag
view = dataset.select_labels(tags="test")

print(view.count("ground_truth.detections"))
print(view.count("predictions.detections"))
- Parameters
labels (None) – a list of dicts specifying the labels to select in the format returned by
fiftyone.core.session.Session.selected_labels
ids (None) – an ID or iterable of IDs of the labels to select
tags (None) – a tag or iterable of tags of labels to select
fields (None) – a field or iterable of fields from which to select
omit_empty (True) – whether to omit samples that have no labels after filtering
- Returns
-
set_field
(field, expr, _allow_missing=False)¶ Sets a field or embedded field on each sample in a collection by evaluating the given expression.
This method can process embedded list fields. To do so, simply append [] to any list component(s) of the field path.
Note
There are two cases where FiftyOne will automatically unwind array fields without requiring you to explicitly specify this via the [] syntax:
Top-level lists: when you specify a field path that refers to a top-level list field of a dataset; i.e., list_field is automatically coerced to list_field[], if necessary.
List fields: When you specify a field path that refers to the list field of a Label class, such as the Detections.detections attribute; i.e., ground_truth.detections.label is automatically coerced to ground_truth.detections[].label, if necessary.
See the examples below for demonstrations of this behavior.
The provided expr is interpreted relative to the document on which the embedded field is being set. For example, if you are setting a nested field field="embedded.document.field", then the expression expr you provide will be applied to the embedded.document document. Note that you can override this behavior by defining an expression that is bound to the root document by prepending "$" to any field name(s) in the expression.
See the examples below for more information.
Note
Note that you cannot set a non-existing top-level field using this stage, since doing so would violate the dataset’s schema. You can, however, first declare a new field via
fiftyone.core.dataset.Dataset.add_sample_field()
and then populate it in a view via this stage.Examples:
import fiftyone as fo
import fiftyone.zoo as foz
from fiftyone import ViewField as F

dataset = foz.load_zoo_dataset("quickstart")

#
# Replace all values of the `uniqueness` field that are less than
# 0.5 with `None`
#

view = dataset.set_field(
    "uniqueness",
    (F("uniqueness") >= 0.5).if_else(F("uniqueness"), None)
)
print(view.bounds("uniqueness"))

#
# Lower bound all object confidences in the `predictions` field at
# 0.5
#

view = dataset.set_field(
    "predictions.detections.confidence", F("confidence").max(0.5)
)
print(view.bounds("predictions.detections.confidence"))

#
# Add a `num_predictions` property to the `predictions` field that
# contains the number of objects in the field
#

view = dataset.set_field(
    "predictions.num_predictions",
    F("$predictions.detections").length(),
)
print(view.bounds("predictions.num_predictions"))

#
# Set an `is_animal` field on each object in the `predictions` field
# that indicates whether the object is an animal
#

ANIMALS = [
    "bear", "bird", "cat", "cow", "dog", "elephant", "giraffe",
    "horse", "sheep", "zebra"
]

view = dataset.set_field(
    "predictions.detections.is_animal", F("label").is_in(ANIMALS)
)
print(view.count_values("predictions.detections.is_animal"))
- Parameters
field – the field or embedded.field.name to set
expr – a fiftyone.core.expressions.ViewExpression or MongoDB expression that defines the field value to set
- Returns
-
set_label_values
(field_name, values, dynamic=False, skip_none=False, validate=True, progress=False)¶ Sets the fields of the specified labels in the collection to the given values.
Note
This method is appropriate when you have the IDs of the labels you wish to modify. See set_values() and set_field() if your updates are not keyed by label ID.
Examples:
import fiftyone as fo
import fiftyone.zoo as foz
from fiftyone import ViewField as F

dataset = foz.load_zoo_dataset("quickstart")

#
# Populate a new boolean attribute on all high confidence labels
#

view = dataset.filter_labels("predictions", F("confidence") > 0.99)

label_ids = view.values("predictions.detections.id", unwind=True)
values = {_id: True for _id in label_ids}

dataset.set_label_values("predictions.detections.high_conf", values)

print(dataset.count("predictions.detections"))
print(len(label_ids))
print(dataset.count_values("predictions.detections.high_conf"))
- Parameters
field_name – a field or
embedded.field.name
values – a dict mapping label IDs to values
skip_none (False) – whether to treat None data in
values
as missing data that should not be set
dynamic (False) – whether to declare dynamic attributes of embedded document fields that are encountered
validate (True) – whether to validate that the values are compliant with the dataset schema before adding them
progress (False) – whether to render a progress bar (True/False), use the default value
fiftyone.config.show_progress_bars
(None), or a progress callback function to invoke instead
-
set_values
(field_name, values, key_field=None, skip_none=False, expand_schema=True, dynamic=False, validate=True, progress=False, _allow_missing=False, _sample_ids=None, _frame_ids=None)¶ Sets the field or embedded field on each sample or frame in the collection to the given values.
When setting a sample field
embedded.field.name
, this function is an efficient implementation of the following loop:
for sample, value in zip(sample_collection, values):
    sample.embedded.field.name = value
    sample.save()
When setting an embedded field that contains an array, say
embedded.array.field.name
, this function is an efficient implementation of the following loop:
for sample, array_values in zip(sample_collection, values):
    for doc, value in zip(sample.embedded.array, array_values):
        doc.field.name = value

    sample.save()
When setting a frame field
frames.embedded.field.name
, this function is an efficient implementation of the following loop:
for sample, frame_values in zip(sample_collection, values):
    for frame, value in zip(sample.frames.values(), frame_values):
        frame.embedded.field.name = value

    sample.save()
When setting an embedded frame field that contains an array, say
frames.embedded.array.field.name
, this function is an efficient implementation of the following loop:
for sample, frame_values in zip(sample_collection, values):
    for frame, array_values in zip(sample.frames.values(), frame_values):
        for doc, value in zip(frame.embedded.array, array_values):
            doc.field.name = value

    sample.save()
When
values
is a dict mapping keys in key_field to values, then this function is an efficient implementation of the following loop:
for key, value in values.items():
    sample = sample_collection.one(F(key_field) == key)
    sample.embedded.field.name = value
    sample.save()
When setting frame fields using the dict
values
syntax, each value in values may either be a list corresponding to the frames of the sample matching the given key, or each value may itself be a dict mapping frame numbers to values. In the latter case, this function is an efficient implementation of the following loop:
for key, frame_values in values.items():
    sample = sample_collection.one(F(key_field) == key)
    for frame_number, value in frame_values.items():
        frame = sample[frame_number]
        frame.embedded.field.name = value

    sample.save()
You can also update list fields using the dict
values
syntax, in which case this method is an efficient implementation of the natural nested list modifications of the above sample/frame loops.
The dual function of
set_values()
isvalues()
, which can be used to efficiently extract the values of a field or embedded field of all samples in a collection as lists of values in the same structure expected by this method.Note
If the values you are setting can be described by a
fiftyone.core.expressions.ViewExpression
applied to the existing dataset contents, then consider usingset_field()
+save()
for an even more efficient alternative to explicitly iterating over the dataset or callingvalues()
+set_values()
to perform the update in-memory.Examples:
import random import fiftyone as fo import fiftyone.zoo as foz from fiftyone import ViewField as F dataset = foz.load_zoo_dataset("quickstart") # # Create a new sample field # values = [random.random() for _ in range(len(dataset))] dataset.set_values("random", values) print(dataset.bounds("random")) # # Add a tag to all low confidence labels # view = dataset.filter_labels("predictions", F("confidence") < 0.06) detections = view.values("predictions.detections") for sample_detections in detections: for detection in sample_detections: detection.tags.append("low_confidence") view.set_values("predictions.detections", detections) print(dataset.count_label_tags())
- Parameters
field_name – a field or
embedded.field.name
values – an iterable of values, one for each sample in the collection. When setting frame fields, each element can either be an iterable of values (one for each existing frame of the sample) or a dict mapping frame numbers to values. If
field_name
contains array fields, the corresponding elements of values must be arrays of the same lengths. This argument can also be a dict mapping keys to values (each value as described previously), in which case the keys are used to match samples by their key_field
key_field (None) – a key field to use when choosing which samples to update when
values
is a dict
skip_none (False) – whether to treat None data in
values
as missing data that should not be set
expand_schema (True) – whether to dynamically add new sample/frame fields encountered to the dataset schema. If False, an error is raised if the root
field_name
does not exist
dynamic (False) – whether to declare dynamic attributes of embedded document fields that are encountered
validate (True) – whether to validate that the values are compliant with the dataset schema before adding them
progress (False) – whether to render a progress bar (True/False), use the default value
fiftyone.config.show_progress_bars
(None), or a progress callback function to invoke instead
-
shuffle
(seed=None)¶ Randomly shuffles the samples in the collection.
Examples:
import fiftyone as fo

dataset = fo.Dataset()
dataset.add_samples(
    [
        fo.Sample(
            filepath="/path/to/image1.png",
            ground_truth=fo.Classification(label="cat"),
        ),
        fo.Sample(
            filepath="/path/to/image2.png",
            ground_truth=fo.Classification(label="dog"),
        ),
        fo.Sample(
            filepath="/path/to/image3.png",
            ground_truth=None,
        ),
    ]
)

#
# Return a view that contains a randomly shuffled version of the
# samples in the dataset
#

view = dataset.shuffle()

#
# Shuffle the samples with a fixed random seed
#

view = dataset.shuffle(seed=51)
- Parameters
seed (None) – an optional random seed to use when shuffling the samples
- Returns
-
skip
(skip)¶ Omits the given number of samples from the head of the collection.
Examples:
import fiftyone as fo

dataset = fo.Dataset()
dataset.add_samples(
    [
        fo.Sample(
            filepath="/path/to/image1.png",
            ground_truth=fo.Classification(label="cat"),
        ),
        fo.Sample(
            filepath="/path/to/image2.png",
            ground_truth=fo.Classification(label="dog"),
        ),
        fo.Sample(
            filepath="/path/to/image3.png",
            ground_truth=fo.Classification(label="rabbit"),
        ),
        fo.Sample(
            filepath="/path/to/image4.png",
            ground_truth=None,
        ),
    ]
)

#
# Omit the first two samples from the dataset
#

view = dataset.skip(2)
- Parameters
skip – the number of samples to skip. If a non-positive number is provided, no samples are omitted
- Returns
-
sort_by
(field_or_expr, reverse=False, create_index=True)¶ Sorts the samples in the collection by the given field(s) or expression(s).
Examples:
import fiftyone as fo
import fiftyone.zoo as foz
from fiftyone import ViewField as F

dataset = foz.load_zoo_dataset("quickstart")

#
# Sort the samples by their `uniqueness` field in ascending order
#

view = dataset.sort_by("uniqueness", reverse=False)

#
# Sorts the samples in descending order by the number of detections
# in their `predictions` field whose bounding box area is less than
# 0.2
#

# Bboxes are in [top-left-x, top-left-y, width, height] format
bbox = F("bounding_box")
bbox_area = bbox[2] * bbox[3]

small_boxes = F("predictions.detections").filter(bbox_area < 0.2)
view = dataset.sort_by(small_boxes.length(), reverse=True)

#
# Performs a compound sort where samples are first sorted in
# descending order by number of detections and then in ascending
# order of uniqueness for samples with the same number of predictions
#

view = dataset.sort_by(
    [
        (F("predictions.detections").length(), -1),
        ("uniqueness", 1),
    ]
)

num_objects, uniqueness = view[:5].values(
    [F("predictions.detections").length(), "uniqueness"]
)
print(list(zip(num_objects, uniqueness)))
- Parameters
field_or_expr –
the field(s) or expression(s) to sort by. This can be any of the following:
a field to sort by
an embedded.field.name to sort by
a fiftyone.core.expressions.ViewExpression or a MongoDB aggregation expression that defines the quantity to sort by
a list of (field_or_expr, order) tuples defining a compound sort criteria, where field_or_expr is a field or expression as defined above, and order can be 1 or any string starting with "a" for ascending order, or -1 or any string starting with "d" for descending order
reverse (False) – whether to return the results in descending order
create_index (True) – whether to create an index, if necessary, to optimize the sort. Only applicable when sorting by field(s), not expressions
- Returns
-
sort_by_similarity
(query, k=None, reverse=False, dist_field=None, brain_key=None)¶ Sorts the collection by similarity to a specified query.
In order to use this stage, you must first use
fiftyone.brain.compute_similarity()
to index your dataset by similarity.
Examples:
import fiftyone as fo
import fiftyone.brain as fob
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart")

fob.compute_similarity(
    dataset, model="clip-vit-base32-torch", brain_key="clip"
)

#
# Sort samples by their similarity to a sample by its ID
#

query_id = dataset.first().id

view = dataset.sort_by_similarity(query_id, k=5)

#
# Sort samples by their similarity to a manually computed vector
#

model = foz.load_zoo_model("clip-vit-base32-torch")
embeddings = dataset.take(2, seed=51).compute_embeddings(model)
query = embeddings.mean(axis=0)

view = dataset.sort_by_similarity(query, k=5)

#
# Sort samples by their similarity to a text prompt
#

query = "kites high in the air"

view = dataset.sort_by_similarity(query, k=5)
- Parameters
query –
the query, which can be any of the following:
    - an ID or iterable of IDs
    - a num_dims vector or num_queries x num_dims array of vectors
    - a prompt or iterable of prompts (if supported by the index)
k (None) – the number of matches to return. By default, the entire collection is sorted
reverse (False) – whether to sort by least similarity (True) or greatest similarity (False). Some backends may not support least similarity
dist_field (None) – the name of a float field in which to store the distance of each example to the specified query. The field is created if necessary
brain_key (None) – the brain key of an existing
fiftyone.brain.compute_similarity()
run on the dataset. If not specified, the dataset must have an applicable run, which will be used by default
- Returns
-
split_labels
(in_field, out_field, filter=None)¶ Splits the labels from the given input field into the given output field of the collection.
This method is typically invoked on a view that has filtered the contents of the specified input field, so that the labels in the view are moved to the output field and the remaining labels are left in-place.
Alternatively, you can provide a
filter
expression that selects the labels of interest to move in this collection.
- Parameters
in_field – the name of the input label field
out_field – the name of the output label field, which will be created if necessary
filter (None) – a boolean
fiftyone.core.expressions.ViewExpression
to apply to each label in the input field to determine whether to move it (True) or leave it (False)
-
std
(field_or_expr, expr=None, safe=False, sample=False)¶ Computes the standard deviation of the field values of the collection.
None-valued fields are ignored.
This aggregation is typically applied to numeric field types (or lists of such types):
Examples:
import fiftyone as fo
from fiftyone import ViewField as F

dataset = fo.Dataset()
dataset.add_samples(
    [
        fo.Sample(
            filepath="/path/to/image1.png",
            numeric_field=1.0,
            numeric_list_field=[1, 2, 3],
        ),
        fo.Sample(
            filepath="/path/to/image2.png",
            numeric_field=4.0,
            numeric_list_field=[1, 2],
        ),
        fo.Sample(
            filepath="/path/to/image3.png",
            numeric_field=None,
            numeric_list_field=None,
        ),
    ]
)

#
# Compute the standard deviation of a numeric field
#

std = dataset.std("numeric_field")
print(std)  # the standard deviation

#
# Compute the standard deviation of a numeric list field
#

std = dataset.std("numeric_list_field")
print(std)  # the standard deviation

#
# Compute the standard deviation of a transformation of a numeric field
#

std = dataset.std(2 * (F("numeric_field") + 1))
print(std)  # the standard deviation
- Parameters
field_or_expr –
a field name, embedded.field.name, fiftyone.core.expressions.ViewExpression, or MongoDB expression defining the field or expression to aggregate. This can also be a list or tuple of such arguments, in which case a tuple of corresponding aggregation results (each receiving the same additional keyword arguments, if any) will be returned
expr (None) – a fiftyone.core.expressions.ViewExpression or MongoDB expression to apply to field_or_expr (which must be a field) before aggregating
safe (False) – whether to ignore nan/inf values when dealing with floating point values
sample (False) – whether to compute the sample standard deviation rather than the population standard deviation
- Returns
the standard deviation
-
sum
(field_or_expr, expr=None, safe=False)¶ Computes the sum of the field values of the collection.
None-valued fields are ignored.
This aggregation is typically applied to numeric field types (or lists of such types):
Examples:
import fiftyone as fo
from fiftyone import ViewField as F

dataset = fo.Dataset()
dataset.add_samples(
    [
        fo.Sample(
            filepath="/path/to/image1.png",
            numeric_field=1.0,
            numeric_list_field=[1, 2, 3],
        ),
        fo.Sample(
            filepath="/path/to/image2.png",
            numeric_field=4.0,
            numeric_list_field=[1, 2],
        ),
        fo.Sample(
            filepath="/path/to/image3.png",
            numeric_field=None,
            numeric_list_field=None,
        ),
    ]
)

#
# Compute the sum of a numeric field
#

total = dataset.sum("numeric_field")
print(total)  # the sum

#
# Compute the sum of a numeric list field
#

total = dataset.sum("numeric_list_field")
print(total)  # the sum

#
# Compute the sum of a transformation of a numeric field
#

total = dataset.sum(2 * (F("numeric_field") + 1))
print(total)  # the sum
- Parameters
field_or_expr –
a field name, embedded.field.name, fiftyone.core.expressions.ViewExpression, or MongoDB expression defining the field or expression to aggregate. This can also be a list or tuple of such arguments, in which case a tuple of corresponding aggregation results (each receiving the same additional keyword arguments, if any) will be returned
expr (None) – a fiftyone.core.expressions.ViewExpression or MongoDB expression to apply to field_or_expr (which must be a field) before aggregating
safe (False) – whether to ignore nan/inf values when dealing with floating point values
- Returns
the sum
-
sync_last_modified_at
(include_frames=True)¶ Syncs the last_modified_at property(s) of the dataset.
Updates the last_modified_at property of the dataset if necessary to incorporate any modification timestamps to its samples.
If include_frames==True, the last_modified_at property of each video sample is first updated if necessary to incorporate any modification timestamps to its frames.
- Parameters
include_frames (True) – whether to update the
last_modified_at
property of video samples. Only applicable to datasets that contain videos
-
tag_labels
(tags, label_fields=None)¶ Adds the tag(s) to all labels in the specified label field(s) of this collection, if necessary.
- Parameters
tags – a tag or iterable of tags
label_fields (None) – an optional name or iterable of names of
fiftyone.core.labels.Label
fields. By default, all label fields are used
-
tag_samples
(tags)¶ Adds the tag(s) to all samples in this collection, if necessary.
- Parameters
tags – a tag or iterable of tags
-
take
(size, seed=None)¶ Randomly samples the given number of samples from the collection.
Examples:
import fiftyone as fo

dataset = fo.Dataset()
dataset.add_samples(
    [
        fo.Sample(
            filepath="/path/to/image1.png",
            ground_truth=fo.Classification(label="cat"),
        ),
        fo.Sample(
            filepath="/path/to/image2.png",
            ground_truth=fo.Classification(label="dog"),
        ),
        fo.Sample(
            filepath="/path/to/image3.png",
            ground_truth=fo.Classification(label="rabbit"),
        ),
        fo.Sample(
            filepath="/path/to/image4.png",
            ground_truth=None,
        ),
    ]
)

#
# Take two random samples from the dataset
#

view = dataset.take(2)

#
# Take two random samples from the dataset with a fixed seed
#

view = dataset.take(2, seed=51)
- Parameters
size – the number of samples to return. If a non-positive number is provided, an empty view is returned
seed (None) – an optional random seed to use when selecting the samples
- Returns
-
to_clips
(field_or_expr, **kwargs)¶ Creates a view that contains one sample per clip defined by the given field or expression in the video collection.
The returned view will contain:
    - A sample_id field that records the sample ID from which each clip was taken
    - A support field that records the [first, last] frame support of each clip
    - All frame-level information from the underlying dataset of the input collection
Refer to
fiftyone.core.clips.make_clips_dataset()
to see the available configuration options for generating clips.
Note
The clip generation logic will respect any frame-level modifications defined in the input collection, but the output clips will always contain all frame-level labels.
Examples:
import fiftyone as fo
import fiftyone.zoo as foz
from fiftyone import ViewField as F

dataset = foz.load_zoo_dataset("quickstart-video")

#
# Create a clips view that contains one clip for each contiguous
# segment that contains at least one road sign in every frame
#

clips = (
    dataset
    .filter_labels("frames.detections", F("label") == "road sign")
    .to_clips("frames.detections")
)
print(clips)

#
# Create a clips view that contains one clip for each contiguous
# segment that contains at least two road signs in every frame
#

signs = F("detections.detections").filter(F("label") == "road sign")
clips = dataset.to_clips(signs.length() >= 2)
print(clips)
- Parameters
field_or_expr –
can be any of the following:
    - a fiftyone.core.labels.TemporalDetection, fiftyone.core.labels.TemporalDetections, fiftyone.core.fields.FrameSupportField, or list of fiftyone.core.fields.FrameSupportField field
    - a frame-level label list field of any of the following types:
    - a fiftyone.core.expressions.ViewExpression that returns a boolean to apply to each frame of the input collection to determine if the frame should be clipped
    - a list of [(first1, last1), (first2, last2), ...] lists defining the frame numbers of the clips to extract from each sample
other_fields (None) – controls whether sample fields other than the default sample fields are included. Can be any of the following:
    - a field or list of fields to include
    - True to include all other fields
    - None/False to include no other fields
tol (0) – the maximum number of false frames that can be overlooked when generating clips. Only applicable when field_or_expr is a frame-level list field or expression
min_len (0) – the minimum allowable length of a clip, in frames. Only applicable when field_or_expr is a frame-level list field or an expression
trajectories (False) – whether to create clips for each unique object trajectory defined by their (label, index). Only applicable when field_or_expr is a frame-level field
- Returns
-
to_dict
(rel_dir=None, include_private=False, include_frames=False, frame_labels_dir=None, pretty_print=False, progress=None)¶ Returns a JSON dictionary representation of the collection.
- Parameters
rel_dir (None) – a relative directory to remove from the filepath of each sample, if possible. The path is converted to an absolute path (if necessary) via fiftyone.core.storage.normalize_path(). The typical use case for this argument is that your source data lives in a single directory and you wish to serialize relative, rather than absolute, paths to the data within that directory
include_private (False) – whether to include private fields
include_frames (False) – whether to include the frame labels for video samples
frame_labels_dir (None) – a directory in which to write per-sample JSON files containing the frame labels for video samples. If omitted, frame labels will be included directly in the returned JSON dict (which can be quite large for video datasets containing many frames). Only applicable to datasets that contain videos when include_frames is True
pretty_print (False) – whether to render frame labels JSON in human readable format with newlines and indentations. Only applicable to datasets that contain videos when a frame_labels_dir is provided
progress (None) – whether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead
- Returns
a JSON dict
-
to_evaluation_patches
(eval_key, **kwargs)¶ Creates a view based on the results of the evaluation with the given key that contains one sample for each true positive, false positive, and false negative example in the collection, respectively.
True positive examples will result in samples with both their ground truth and predicted fields populated, while false positive/negative examples will only have one of their corresponding predicted/ground truth fields populated, respectively.
If multiple predictions are matched to a ground truth object (e.g., if the evaluation protocol includes a crowd attribute), then all matched predictions will be stored in the single sample along with the ground truth object.
The returned dataset will also have top-level type and iou fields populated based on the evaluation results for that example, as well as a sample_id field recording the sample ID of the example, and a crowd field if the evaluation protocol defines a crowd attribute.
Note
The returned view will contain patches for the contents of this collection, which may differ from the view on which the
eval_key
evaluation was performed. This may exclude some labels that were evaluated and/or include labels that were not evaluated.
If you would like to see patches for the exact view on which an evaluation was performed, first call
load_evaluation_view()
to load the view and then convert to patches.
Examples:
import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart")
dataset.evaluate_detections("predictions", eval_key="eval")

session = fo.launch_app(dataset)

#
# Create a patches view for the evaluation results
#

view = dataset.to_evaluation_patches("eval")
print(view)

session.view = view
- Parameters
eval_key – an evaluation key that corresponds to the evaluation of ground truth/predicted fields that are of type fiftyone.core.labels.Detections, fiftyone.core.labels.Polylines, or fiftyone.core.labels.Keypoints
other_fields (None) – controls whether fields other than the ground truth/predicted fields and the default sample fields are included. Can be any of the following:
    - a field or list of fields to include
    - True to include all other fields
    - None/False to include no other fields
- Returns
-
to_frames
(**kwargs)¶ Creates a view that contains one sample per frame in the video collection.
The returned view will contain all frame-level fields and the tags of each video as sample-level fields, as well as a sample_id field that records the IDs of the parent sample for each frame.
By default, sample_frames is False and this method assumes that the frames of the input collection have filepath fields populated pointing to each frame image. Any frames without a filepath populated will be omitted from the returned view.
When sample_frames is True, this method samples each video in the collection into a directory of per-frame images and stores the filepaths in the filepath frame field of the source dataset. By default, each folder of images is written using the same basename as the input video. For example, if frames_patt = "%%06d.jpg", then videos with the following paths:

/path/to/video1.mp4
/path/to/video2.mp4
...

would be sampled as follows:

/path/to/video1/
    000001.jpg
    000002.jpg
    ...
/path/to/video2/
    000001.jpg
    000002.jpg
    ...
However, you can use the optional output_dir and rel_dir parameters to customize the location and shape of the sampled frame folders. For example, if output_dir = "/tmp" and rel_dir = "/path/to", then videos with the following paths:

/path/to/folderA/video1.mp4
/path/to/folderA/video2.mp4
/path/to/folderB/video3.mp4
...

would be sampled as follows:

/tmp/folderA/
    video1/
        000001.jpg
        000002.jpg
        ...
    video2/
        000001.jpg
        000002.jpg
        ...
/tmp/folderB/
    video3/
        000001.jpg
        000002.jpg
        ...
By default, samples will be generated for every video frame at full resolution, but this method provides a variety of parameters that can be used to customize the sampling behavior.
Note
If this method is run multiple times with sample_frames set to True, existing frames will not be resampled unless you set force_sample to True.
Examples:
import fiftyone as fo
import fiftyone.zoo as foz
from fiftyone import ViewField as F

dataset = foz.load_zoo_dataset("quickstart-video")

session = fo.launch_app(dataset)

#
# Create a frames view for an entire video dataset
#

frames = dataset.to_frames(sample_frames=True)
print(frames)

session.view = frames

#
# Create a frames view that only contains frames with at least 10
# objects, sampled at a maximum frame rate of 1fps
#

num_objects = F("detections.detections").length()
view = dataset.match_frames(num_objects > 10)

frames = view.to_frames(max_fps=1)
print(frames)

session.view = frames
- Parameters
sample_frames (False) – whether to assume that the frame images have already been sampled at locations stored in the filepath field of each frame (False), or whether to sample the video frames now according to the specified parameters (True)
fps (None) – an optional frame rate at which to sample each video’s frames
max_fps (None) – an optional maximum frame rate at which to sample. Videos with frame rate exceeding this value are downsampled
size (None) – an optional (width, height) at which to sample frames. A dimension can be -1, in which case the aspect ratio is preserved. Only applicable when sample_frames=True
min_size (None) – an optional minimum (width, height) for each frame. A dimension can be -1 if no constraint should be applied. The frames are resized (aspect-preserving) if necessary to meet this constraint. Only applicable when sample_frames=True
max_size (None) – an optional maximum (width, height) for each frame. A dimension can be -1 if no constraint should be applied. The frames are resized (aspect-preserving) if necessary to meet this constraint. Only applicable when sample_frames=True
sparse (False) – whether to only sample frame images for frame numbers for which fiftyone.core.frame.Frame instances exist in the input collection. This parameter has no effect when sample_frames==False, since frames must always exist in order to have filepath information
output_dir (None) – an optional output directory in which to write the sampled frames. By default, the frames are written in folders with the same basename of each video
rel_dir (None) – a relative directory to remove from the filepath of each video, if possible. The path is converted to an absolute path (if necessary) via fiftyone.core.storage.normalize_path(). This argument can be used in conjunction with output_dir to cause the sampled frames to be written in a nested directory structure within output_dir matching the shape of the input video’s folder structure
frames_patt (None) – a pattern specifying the filename/format to use to write or check for existing sampled frames, e.g., "%%06d.jpg". The default value is fiftyone.config.default_sequence_idx + fiftyone.config.default_image_ext
force_sample (False) – whether to resample videos whose sampled frames already exist. Only applicable when sample_frames=True
skip_failures (True) – whether to gracefully continue without raising an error if a video cannot be sampled
verbose (False) – whether to log information about the frames that will be sampled, if any
- Returns
-
to_json
(rel_dir=None, include_private=False, include_frames=False, frame_labels_dir=None, pretty_print=False)¶ Returns a JSON string representation of the collection.
The samples will be written as a list in a top-level samples field of the returned dictionary.
- Parameters
rel_dir (None) – a relative directory to remove from the filepath of each sample, if possible. The path is converted to an absolute path (if necessary) via fiftyone.core.storage.normalize_path(). The typical use case for this argument is that your source data lives in a single directory and you wish to serialize relative, rather than absolute, paths to the data within that directory
include_private (False) – whether to include private fields
include_frames (False) – whether to include the frame labels for video samples
frame_labels_dir (None) – a directory in which to write per-sample JSON files containing the frame labels for video samples. If omitted, frame labels will be included directly in the returned JSON dict (which can be quite large for video datasets containing many frames). Only applicable to datasets that contain videos when include_frames is True
pretty_print (False) – whether to render the JSON in human readable format with newlines and indentations
- Returns
a JSON string
-
to_patches
(field, **kwargs)¶ Creates a view that contains one sample per object patch in the specified field of the collection.
Fields other than field and the default sample fields will not be included in the returned view. A sample_id field will be added that records the sample ID from which each patch was taken.
Examples:
import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart")

session = fo.launch_app(dataset)

#
# Create a view containing the ground truth patches
#

view = dataset.to_patches("ground_truth")
print(view)

session.view = view
- Parameters
field – the patches field, which must be of type fiftyone.core.labels.Detections, fiftyone.core.labels.Polylines, or fiftyone.core.labels.Keypoints
other_fields (None) – controls whether fields other than field and the default sample fields are included. Can be any of the following:
    - a field or list of fields to include
    - True to include all other fields
    - None/False to include no other fields
keep_label_lists (False) – whether to store the patches in label list fields of the same type as the input collection rather than using their single label variants
- Returns
-
to_trajectories
(field, **kwargs)¶ Creates a view that contains one clip for each unique object trajectory defined by their (label, index) in a frame-level field of a video collection.
The returned view will contain:
    - A sample_id field that records the sample ID from which each clip was taken
    - A support field that records the [first, last] frame support of each clip
    - A sample-level label field that records the label and index of each trajectory
Examples:
import fiftyone as fo
import fiftyone.zoo as foz
from fiftyone import ViewField as F

dataset = foz.load_zoo_dataset("quickstart-video")

#
# Create a trajectories view for the vehicles in the dataset
#

trajectories = (
    dataset
    .filter_labels("frames.detections", F("label") == "vehicle")
    .to_trajectories("frames.detections")
)
print(trajectories)
- Parameters
field –
a frame-level label list field of any of the following types:
**kwargs – optional keyword arguments for
fiftyone.core.clips.make_clips_dataset()
specifying how to perform the conversion
- Returns
-
untag_labels
(tags, label_fields=None)¶ Removes the tag(s) from all labels in the specified label field(s) of this collection, if necessary.
- Parameters
tags – a tag or iterable of tags
label_fields (None) – an optional name or iterable of names of
fiftyone.core.labels.Label
fields. By default, all label fields are used
-
untag_samples
(tags)¶ Removes the tag(s) from all samples in this collection, if necessary.
- Parameters
tags – a tag or iterable of tags
-
update_run_config
(run_key, config)¶ Updates the run config for the run with the given key.
- Parameters
run_key – a run key
config – a
fiftyone.core.runs.RunConfig
-
validate_field_type
(path, ftype=None, embedded_doc_type=None)¶ Validates that the collection has a field of the given type.
- Parameters
path – a field name or
embedded.field.name
ftype (None) – an optional field type to enforce. Must be a subclass of
fiftyone.core.fields.Field
embedded_doc_type (None) – an optional embedded document type or iterable of types to enforce. Must be a subclass(es) of
fiftyone.core.odm.BaseEmbeddedDocument
- Raises
ValueError – if the field does not exist or does not have the expected type
-
validate_fields_exist
(fields, include_private=False)¶ Validates that the collection has field(s) with the given name(s).
If embedded field names are provided, only the root field is checked.
- Parameters
fields – a field name or iterable of field names
include_private (False) – whether to include private fields when checking for existence
- Raises
ValueError – if one or more of the fields do not exist
-
values
(field_or_expr, expr=None, missing_value=None, unwind=False, _allow_missing=False, _big_result=True, _raw=False, _field=None)¶ Extracts the values of a field from all samples in the collection.
Values aggregations are useful for efficiently extracting a slice of field or embedded field values across all samples in a collection. See the examples below for more details.
The dual function of values() is set_values(), which can be used to efficiently set a field or embedded field of all samples in a collection by providing lists of values of the same structure returned by this aggregation.
Note
Unlike other aggregations, values() does not automatically unwind list fields, which ensures that the returned values match the potentially-nested structure of the documents.
You can opt-in to unwinding specific list fields using the [] syntax, or you can pass the optional unwind=True parameter to unwind all supported list fields. See Aggregating list fields for more information.
Examples:
import fiftyone as fo
import fiftyone.zoo as foz
from fiftyone import ViewField as F

dataset = fo.Dataset()
dataset.add_samples(
    [
        fo.Sample(
            filepath="/path/to/image1.png",
            numeric_field=1.0,
            numeric_list_field=[1, 2, 3],
        ),
        fo.Sample(
            filepath="/path/to/image2.png",
            numeric_field=4.0,
            numeric_list_field=[1, 2],
        ),
        fo.Sample(
            filepath="/path/to/image3.png",
            numeric_field=None,
            numeric_list_field=None,
        ),
    ]
)

#
# Get all values of a field
#

values = dataset.values("numeric_field")
print(values)  # [1.0, 4.0, None]

#
# Get all values of a list field
#

values = dataset.values("numeric_list_field")
print(values)  # [[1, 2, 3], [1, 2], None]

#
# Get all values of transformed field
#

values = dataset.values(2 * (F("numeric_field") + 1))
print(values)  # [4.0, 10.0, None]

#
# Get values from a label list field
#

dataset = foz.load_zoo_dataset("quickstart")

# list of `Detections`
detections = dataset.values("ground_truth")

# list of lists of `Detection` instances
detections = dataset.values("ground_truth.detections")

# list of lists of detection labels
labels = dataset.values("ground_truth.detections.label")
- Parameters
field_or_expr – a field name, embedded.field.name, fiftyone.core.expressions.ViewExpression, or MongoDB expression defining the field or expression to aggregate. This can also be a list or tuple of such arguments, in which case a tuple of corresponding aggregation results (each receiving the same additional keyword arguments, if any) will be returned
expr (None) – a fiftyone.core.expressions.ViewExpression or MongoDB expression to apply to field_or_expr (which must be a field) before aggregating
missing_value (None) – a value to insert for missing or None-valued fields
unwind (False) – whether to automatically unwind all recognized list fields (True) or unwind all list fields except the top-level sample field (-1)
- Returns
the list of values
-
write_json
(json_path, rel_dir=None, include_private=False, include_frames=False, frame_labels_dir=None, pretty_print=False)¶ Writes the collection to disk in JSON format.
- Parameters
json_path – the path to write the JSON
rel_dir (None) – a relative directory to remove from the filepath of each sample, if possible. The path is converted to an absolute path (if necessary) via fiftyone.core.storage.normalize_path(). The typical use case for this argument is that your source data lives in a single directory and you wish to serialize relative, rather than absolute, paths to the data within that directory
include_private (False) – whether to include private fields
include_frames (False) – whether to include the frame labels for video samples
frame_labels_dir (None) – a directory in which to write per-sample JSON files containing the frame labels for video samples. If omitted, frame labels will be included directly in the returned JSON dict (which can be quite large for video datasets containing many frames). Only applicable to datasets that contain videos when include_frames is True
pretty_print (False) – whether to render the JSON in human readable format with newlines and indentations