fiftyone.core.sample

Dataset samples.

Copyright 2017-2025, Voxel51, Inc.

Functions:

get_default_sample_fields([include_private, ...])

Returns the default fields present on all samples.

Classes:

Sample(filepath[, tags, metadata])

A sample in a fiftyone.core.dataset.Dataset.

SampleView(doc, view[, selected_fields, ...])

A view into a Sample in a dataset.

fiftyone.core.sample.get_default_sample_fields(include_private=False, use_db_fields=False)

Returns the default fields present on all samples.

Parameters:
  • include_private (False) – whether to include fields starting with _

  • use_db_fields (False) – whether to return database fields rather than user-facing fields, when applicable

Returns:

a tuple of field names

class fiftyone.core.sample.Sample(filepath, tags=None, metadata=None, **kwargs)

Bases: _SampleMixin, Document

A sample in a fiftyone.core.dataset.Dataset.

Samples store all information associated with a particular piece of data in a dataset, including basic metadata about the data, one or more sets of labels (ground truth, user-provided, or FiftyOne-generated), and additional features associated with subsets of the data and/or label sets.

Note

Sample instances that are in datasets are singletons, i.e., dataset[sample_id] will always return the same Sample instance.

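Parameters:
  • filepath – the path to the data on disk

  • tags (None) – a list of tags for the sample

  • metadata (None) – a fiftyone.core.metadata.Metadata instance for the sample

  • **kwargs – additional fields to dynamically set on the sample
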
Methods:

reload([hard, include_frames])

Reloads the sample from the database.

save()

Saves the sample to the database.

from_frame(frame[, filepath])

Creates a sample from the given frame.

from_doc(doc[, dataset])

Creates a sample backed by the given document.

from_dict(d)

Loads the sample from a JSON dictionary.

add_labels(labels[, label_field, ...])

Adds the given labels to the sample.

clear_field(field_name)

Clears the value of a field of the document.

compute_metadata([overwrite, skip_failures])

Populates the metadata field of the sample.

copy([fields, omit_fields])

Returns a deep copy of the sample that has not been added to the database.

from_json(s)

Loads the document from a JSON string.

get_field(field_name)

Gets the value of a field of the document.

has_field(field_name)

Determines whether the document has the given field.

iter_fields([include_id, include_timestamps])

Returns an iterator over the (name, value) pairs of the public fields of the document.

merge(sample[, fields, omit_fields, ...])

Merges the fields of the given sample into this sample.

set_field(field_name, value[, create, ...])

Sets the value of a field of the document.

to_dict([include_frames, include_private])

Serializes the sample to a JSON dictionary.

to_json([pretty_print])

Serializes the document to a JSON string.

to_mongo_dict([include_id])

Serializes the document to a BSON dictionary equivalent to the representation that would be stored in the database.

update_fields(fields_dict[, expand_schema, ...])

Sets the dictionary of fields on the document.

Attributes:

dataset

The dataset to which this document belongs, or None if it has not been added to a dataset.

dataset_id

field_names

An ordered tuple of the public field names of this document.

filename

The basename of the media's filepath.

in_dataset

Whether the document has been added to a dataset.

media_type

The media type of the sample.

reload(hard=False, include_frames=True)

Reloads the sample from the database.

Parameters:
  • hard (False) – whether to reload the sample’s schema in addition to its field values. This is necessary if new fields may have been added to the dataset schema

  • include_frames (True) – whether to reload any in-memory frames of video samples

save()

Saves the sample to the database.

classmethod from_frame(frame, filepath=None)

Creates a sample from the given frame.

Parameters:
  • frame – a fiftyone.core.frame.Frame

  • filepath (None) – the path to the corresponding image frame on disk; only needed if the frame does not already have a filepath

Returns:

a Sample

classmethod from_doc(doc, dataset=None)

Creates a sample backed by the given document.

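Parameters:
  • doc – a fiftyone.core.odm.mixins.DatasetSampleDocument

  • dataset (None) – the fiftyone.core.dataset.Dataset that the sample belongs to

Returns: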

a Sample

classmethod from_dict(d)

Loads the sample from a JSON dictionary.

The returned sample will not belong to a dataset.

Parameters:

d – a JSON dictionary

Returns:

a Sample

add_labels(labels, label_field=None, confidence_thresh=None, expand_schema=True, validate=True, dynamic=False)

Adds the given labels to the sample.

The provided labels can be any of the following:

  • A fiftyone.core.labels.Label instance, in which case the labels are directly saved in the specified label_field

  • A dict mapping keys to fiftyone.core.labels.Label instances. In this case, the labels are added as follows:

    for key, value in labels.items():
        sample[label_key(key)] = value
    
  • A dict mapping frame numbers to fiftyone.core.labels.Label instances. In this case, the provided labels are interpreted as frame-level labels that should be added as follows:

    sample.frames.merge(
        {
            frame_number: {label_field: label}
            for frame_number, label in labels.items()
        }
    )
    
  • A dict mapping frame numbers to dicts mapping keys to fiftyone.core.labels.Label instances. In this case, the provided labels are interpreted as frame-level labels that should be added as follows:

    sample.frames.merge(
        {
            frame_number: {
                label_key(key): value
                for key, value in frame_dict.items()
            }
            for frame_number, frame_dict in labels.items()
        }
    )
    

In the above, the label_key function maps label dict keys to field names, and is defined from label_field as follows:

    if isinstance(label_field, dict):
        label_key = lambda k: label_field.get(k, k)
    elif label_field is not None:
        label_key = lambda k: label_field + "_" + k
    else:
        label_key = lambda k: k

Parameters:
  • labels – a fiftyone.core.labels.Label or dict of labels per the description above

  • label_field (None) – the sample field, prefix, or dict defining in which field(s) to save the labels

  • confidence_thresh (None) – an optional confidence threshold to apply to any applicable labels before saving them

  • expand_schema (True) – whether to dynamically add new fields encountered to the dataset schema. If False, an error is raised if any fields are not in the dataset schema

  • validate (True) – whether to validate values for existing fields

  • dynamic (False) – whether to declare dynamic attributes
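The label_key rules above can be exercised without a dataset. The following plain-Python sketch wraps them in a hypothetical make_label_key() helper (not part of FiftyOne) and uses placeholder strings in place of real fiftyone.core.labels.Label instances, to show which field names each form of label_field produces:

```python
def make_label_key(label_field=None):
    """Hypothetical helper: returns the label_key function for label_field."""
    if isinstance(label_field, dict):
        # Explicit mapping; keys without an entry keep their own name
        return lambda k: label_field.get(k, k)
    if label_field is not None:
        # String prefix: "<label_field>_<key>"
        return lambda k: label_field + "_" + k
    # No label_field: dict keys are used as field names verbatim
    return lambda k: k


# Placeholder strings stand in for fiftyone.core.labels.Label instances
labels = {"cats": "<Detections>", "dogs": "<Detections>"}

for label_field in (None, "model", {"cats": "felines"}):
    label_key = make_label_key(label_field)
    print(label_field, "->", [label_key(k) for k in labels])

# None -> ['cats', 'dogs']
# model -> ['model_cats', 'model_dogs']
# {'cats': 'felines'} -> ['felines', 'dogs']
```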

clear_field(field_name)

Clears the value of a field of the document.

Parameters:

field_name – the name of the field to clear

Raises:

AttributeError – if the field does not exist

compute_metadata(overwrite=False, skip_failures=False)

Populates the metadata field of the sample.

Parameters:
  • overwrite (False) – whether to overwrite existing metadata

  • skip_failures (False) – whether to gracefully continue without raising an error if metadata cannot be computed

copy(fields=None, omit_fields=None)

Returns a deep copy of the sample that has not been added to the database.

Parameters:
  • fields (None) – an optional field or iterable of fields to which to restrict the copy. This can also be a dict mapping existing field names to new field names

  • omit_fields (None) – an optional field or iterable of fields to exclude from the copy

Returns:

a Sample
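A plain-dict model of how fields and omit_fields restrict a copy may help clarify the renaming form. The copy_fields() helper below is hypothetical, treats the sample as an ordinary dict, and ignores any special handling the real method may apply to default fields such as filepath:

```python
import copy


def copy_fields(fields_dict, fields=None, omit_fields=None):
    """Hypothetical model of copy()'s fields/omit_fields handling."""
    # A single field name may be passed as a plain string
    if isinstance(fields, str):
        fields = [fields]
    if isinstance(omit_fields, str):
        omit_fields = [omit_fields]

    if fields is None:
        mapping = {name: name for name in fields_dict}  # keep everything
    elif isinstance(fields, dict):
        mapping = dict(fields)  # existing name -> new name
    else:
        mapping = {name: name for name in fields}

    omit = set(omit_fields or ())

    # deepcopy so that mutating the copy never affects the original
    return {
        new: copy.deepcopy(fields_dict[old])
        for old, new in mapping.items()
        if old not in omit
    }


original = {"filepath": "/data/a.jpg", "tags": ["train"], "gt": {"label": "cat"}}

renamed = copy_fields(original, fields={"gt": "ground_truth"})
print(renamed)  # {'ground_truth': {'label': 'cat'}}

trimmed = copy_fields(original, omit_fields="gt")
trimmed["tags"].append("reviewed")
print(original["tags"])  # ['train']
```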

property dataset

The dataset to which this document belongs, or None if it has not been added to a dataset.

property dataset_id
property field_names

An ordered tuple of the public field names of this document.

property filename

The basename of the media’s filepath.

classmethod from_json(s)

Loads the document from a JSON string.

The returned document will not belong to a dataset.

Parameters:

s – the JSON string

Returns:

a Document

get_field(field_name)

Gets the value of a field of the document.

Parameters:

field_name – the field name

Returns:

the field value

Raises:

AttributeError – if the field does not exist

has_field(field_name)

Determines whether the document has the given field.

Parameters:

field_name – the field name

Returns:

True/False

property in_dataset

Whether the document has been added to a dataset.

iter_fields(include_id=False, include_timestamps=False)

Returns an iterator over the (name, value) pairs of the public fields of the document.

Parameters:
  • include_id (False) – whether to include the id field

  • include_timestamps (False) – whether to include the created_at and last_modified_at fields

Returns:

an iterator that emits (name, value) tuples
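As a rough model of the filtering that iter_fields() performs, the hypothetical helper below treats a document as a dict and skips underscore-prefixed private fields, plus id and the timestamp fields unless they are requested. It illustrates the documented parameters and is not FiftyOne's implementation:

```python
def iter_public_fields(fields, include_id=False, include_timestamps=False):
    """Hypothetical model of iter_fields() over a plain dict of fields."""
    timestamps = {"created_at", "last_modified_at"}
    for name, value in fields.items():
        if name.startswith("_"):
            continue  # private fields are never emitted
        if name == "id" and not include_id:
            continue
        if name in timestamps and not include_timestamps:
            continue
        yield name, value


fields = {
    "id": "abc123",
    "_dataset_id": "xyz",
    "filepath": "/data/a.jpg",
    "tags": ["train"],
    "created_at": "2025-01-01T00:00:00",
}

print(list(iter_public_fields(fields)))
# [('filepath', '/data/a.jpg'), ('tags', ['train'])]
```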

property media_type

The media type of the sample.

merge(sample, fields=None, omit_fields=None, merge_lists=True, merge_embedded_docs=False, overwrite=True, expand_schema=True, validate=True, dynamic=False)

Merges the fields of the given sample into this sample.

The behavior of this method is highly customizable. By default, all top-level fields from the provided sample are merged in, overwriting any existing values for those fields, with the exception of list fields (e.g., tags) and label list fields (e.g., fiftyone.core.labels.Detections fields), in which case the elements of the lists themselves are merged. In the case of label list fields, labels with the same id in both samples are updated rather than duplicated.

To avoid confusion between missing fields and fields whose value is None, None-valued fields are always treated as missing while merging.

This method can be configured in numerous ways, including:

  • Whether new fields can be added to the dataset schema

  • Whether list fields should be treated as ordinary fields and merged as a whole rather than merging their elements

  • Whether to merge only specific fields, or all but certain fields

  • Mapping input sample fields to different field names of this sample

Parameters:
  • sample – a fiftyone.core.sample.Sample

  • fields (None) – an optional field or iterable of fields to which to restrict the merge. May contain frame fields for video samples. This can also be a dict mapping field names of the input sample to field names of this sample

  • omit_fields (None) – an optional field or iterable of fields to exclude from the merge. May contain frame fields for video samples

  • merge_lists (True) – whether to merge the elements of list fields (e.g., tags) and label list fields (e.g., fiftyone.core.labels.Detections fields) rather than merging the entire top-level field like other field types. For label list fields, existing fiftyone.core.labels.Label elements are either replaced (when overwrite is True) or kept (when overwrite is False) when their id matches a label from the provided sample

  • merge_embedded_docs (False) – whether to merge the attributes of embedded documents (True) rather than merging the entire top-level field (False)

  • overwrite (True) – whether to overwrite (True) or skip (False) existing fields and label elements

  • expand_schema (True) – whether to dynamically add new fields encountered to the dataset schema. If False, an error is raised if any fields are not in the dataset schema

  • validate (True) – whether to validate values for existing fields

  • dynamic (False) – whether to declare dynamic embedded document fields
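The default merge rules can be approximated with plain dicts. This sketch is a simplified model, not FiftyOne's code: a label list field is represented as a top-level list of dicts carrying an "id" key (real Detections fields nest their labels inside a document), and list elements not already present are appended, which is an assumption about the exact element-matching rule:

```python
import copy


def merge_fields(target, source, merge_lists=True, overwrite=True):
    """Hypothetical plain-dict model of the default merge() rules."""
    for name, value in source.items():
        if value is None:
            continue  # None-valued fields are treated as missing

        current = target.get(name)
        if current is None:
            target[name] = copy.deepcopy(value)  # new (or None) field
        elif merge_lists and isinstance(current, list) and isinstance(value, list):
            _merge_list(current, value, overwrite)  # merge element-wise
        elif overwrite:
            target[name] = copy.deepcopy(value)  # replace the whole field
        # otherwise keep the existing value

    return target


def _merge_list(dst, src, overwrite):
    # Labels are modeled as dicts with an "id" key
    positions = {
        item["id"]: i for i, item in enumerate(dst) if isinstance(item, dict)
    }
    for item in src:
        if isinstance(item, dict) and item["id"] in positions:
            # Same id in both samples: update (or keep) rather than duplicate
            if overwrite:
                dst[positions[item["id"]]] = copy.deepcopy(item)
        elif item not in dst:
            dst.append(copy.deepcopy(item))


sample = {"tags": ["train"], "gt": [{"id": "a", "label": "cat"}], "score": 0.5}
other = {
    "tags": ["train", "reviewed"],
    "gt": [{"id": "a", "label": "dog"}, {"id": "b", "label": "bird"}],
    "score": None,
}

merge_fields(sample, other)
print(sample["tags"])   # ['train', 'reviewed']
print(sample["gt"])     # [{'id': 'a', 'label': 'dog'}, {'id': 'b', 'label': 'bird'}]
print(sample["score"])  # 0.5 (None in `other` is treated as missing)
```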

set_field(field_name, value, create=True, validate=True, dynamic=False)

Sets the value of a field of the document.

Parameters:
  • field_name – the field name

  • value – the field value

  • create (True) – whether to create the field if it does not exist

  • validate (True) – whether to validate values for existing fields

  • dynamic (False) – whether to declare dynamic embedded document fields

Raises:
  • ValueError – if field_name is not an allowed field name

  • AttributeError – if the field does not exist and create == False

to_dict(include_frames=False, include_private=False)

Serializes the sample to a JSON dictionary.

Parameters:
  • include_frames (False) – whether to include the frame labels for video samples

  • include_private (False) – whether to include private fields

Returns:

a JSON dict

to_json(pretty_print=False)

Serializes the document to a JSON string.

The document ID and private fields are excluded in this representation.

Parameters:

pretty_print (False) – whether to render the JSON in a human-readable format with newlines and indentation

Returns:

a JSON string

to_mongo_dict(include_id=False)

Serializes the document to a BSON dictionary equivalent to the representation that would be stored in the database.

Parameters:

include_id (False) – whether to include the document ID

Returns:

a BSON dict

update_fields(fields_dict, expand_schema=True, validate=True, dynamic=False)

Sets the dictionary of fields on the document.

Parameters:
  • fields_dict – a dict mapping field names to values

  • expand_schema (True) – whether to dynamically add new fields encountered to the document schema. If False, an error is raised if any fields are not in the document schema

  • validate (True) – whether to validate values for existing fields

  • dynamic (False) – whether to declare dynamic embedded document fields

Raises:

AttributeError – if expand_schema == False and a field does not exist
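The expand_schema check can be modeled with a dict whose keys stand in for the document schema. The update_fields_model() function below is a hypothetical sketch; in particular, checking every name before writing anything is a design choice of this model, and the real method may behave differently on partial failure:

```python
def update_fields_model(doc, fields_dict, expand_schema=True):
    """Hypothetical model of update_fields(); doc's keys act as the schema."""
    if not expand_schema:
        # Reject unknown names up front so a failure leaves doc unchanged
        missing = sorted(set(fields_dict) - set(doc))
        if missing:
            raise AttributeError("Fields %s do not exist" % missing)

    # With expand_schema=True, unknown names simply become new fields
    doc.update(fields_dict)
    return doc


doc = {"filepath": "/data/a.jpg", "score": None}
update_fields_model(doc, {"score": 0.5}, expand_schema=False)  # existing field

try:
    update_fields_model(doc, {"reviewed": True}, expand_schema=False)
except AttributeError as e:
    print(e)  # Fields ['reviewed'] do not exist

update_fields_model(doc, {"reviewed": True})  # adds the new field
```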

class fiftyone.core.sample.SampleView(doc, view, selected_fields=None, excluded_fields=None, filtered_fields=None)

Bases: _SampleMixin, DocumentView

A view into a Sample in a dataset.

Like Sample instances, the fields of a SampleView instance can be modified, new fields can be created, and any changes can be saved to the database.

SampleView instances differ from Sample instances in the following ways:

  • A sample view may contain only a subset of the fields of its source sample, either by selecting and/or excluding specific fields

  • A sample view may contain array fields or embedded array fields that have been filtered, thus containing only a subset of the array elements from the source sample

  • Excluded fields of a sample view may not be accessed or modified

Note

Sample views should never be created manually; they are generated when accessing the samples in a fiftyone.core.view.DatasetView.

Parameters:
  • doc – a fiftyone.core.odm.mixins.DatasetSampleDocument

  • view – the fiftyone.core.view.DatasetView that the sample belongs to

  • selected_fields (None) – a set of field names that this sample view is restricted to, if any

  • excluded_fields (None) – a set of field names that are excluded from this sample view, if any

  • filtered_fields (None) – a set of field names of list fields that are filtered in this sample view, if any
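To make the selected/excluded distinction concrete, the hypothetical helper below computes the field names visible through a sample view from plain sets. It is only a model: FiftyOne's view machinery may, for example, always retain certain default fields even when they are not explicitly selected:

```python
def visible_field_names(field_names, selected_fields=None, excluded_fields=None):
    """Hypothetical model of which fields a sample view exposes."""
    names = list(field_names)
    if selected_fields is not None:
        # Restrict to the explicitly selected fields
        names = [n for n in names if n in selected_fields]
    if excluded_fields is not None:
        # Drop the explicitly excluded fields
        names = [n for n in names if n not in excluded_fields]
    return tuple(names)


all_fields = ("id", "filepath", "tags", "ground_truth", "predictions")

print(visible_field_names(all_fields, excluded_fields={"predictions"}))
# ('id', 'filepath', 'tags', 'ground_truth')

print(visible_field_names(all_fields, selected_fields={"id", "filepath", "ground_truth"}))
# ('id', 'filepath', 'ground_truth')
```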

Methods:

to_dict([include_frames, include_private])

Serializes the sample view to a JSON dictionary.

save()

Saves the sample view to the database.

add_labels(labels[, label_field, ...])

Adds the given labels to the sample.

clear_field(field_name)

Clears the value of a field of the document.

compute_metadata([overwrite, skip_failures])

Populates the metadata field of the sample.

copy([fields, omit_fields])

Returns a deep copy of the sample that has not been added to the database.

get_field(field_name)

Gets the value of a field of the document.

has_field(field_name)

Determines whether the document has the given field.

iter_fields([include_id, include_timestamps])

Returns an iterator over the (name, value) pairs of the public fields of the document.

merge(sample[, fields, omit_fields, ...])

Merges the fields of the given sample into this sample.

set_field(field_name, value[, create, ...])

Sets the value of a field of the document.

to_json([pretty_print])

Serializes the document to a JSON string.

to_mongo_dict([include_id])

Serializes the document to a BSON dictionary equivalent to the representation that would be stored in the database.

update_fields(fields_dict[, expand_schema, ...])

Sets the dictionary of fields on the document.

Attributes:

dataset

The dataset to which this document belongs, or None if it has not been added to a dataset.

dataset_id

excluded_field_names

The set of field names that are excluded on this document view, or None if no fields are explicitly excluded.

field_names

An ordered tuple of field names of this document view.

filename

The basename of the media's filepath.

filtered_field_names

The set of field names or embedded.field.names that have been filtered on this document view, or None if no fields are filtered.

in_dataset

Whether the document has been added to a dataset.

media_type

The media type of the sample.

selected_field_names

The set of field names that are selected on this document view, or None if no fields are explicitly selected.

to_dict(include_frames=False, include_private=False)

Serializes the sample view to a JSON dictionary.

Parameters:
  • include_frames (False) – whether to include the frame labels for video samples

  • include_private (False) – whether to include private fields

Returns:

a JSON dict

save()

Saves the sample view to the database.

Warning

This will permanently delete any omitted or filtered contents from the source dataset.
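The reason for this warning can be illustrated with plain dicts standing in for the stored sample and the sample view. This is a conceptual model, not FiftyOne code: saving writes the view's field values back as the full stored values, so labels hidden by a filter are lost:

```python
import copy

# The stored record: a label list field holding two labels
stored = {
    "ground_truth": [
        {"id": "a", "label": "cat"},
        {"id": "b", "label": "dog"},
    ]
}

# A view filtered to "cat" labels only sees part of the list
view_sample = copy.deepcopy(stored)
view_sample["ground_truth"] = [
    d for d in view_sample["ground_truth"] if d["label"] == "cat"
]

# Saving writes the view's values back as the full field values,
# so the filtered-out "dog" label disappears from the stored record
stored.update(view_sample)

print(stored["ground_truth"])  # [{'id': 'a', 'label': 'cat'}]
```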

add_labels(labels, label_field=None, confidence_thresh=None, expand_schema=True, validate=True, dynamic=False)

Adds the given labels to the sample.

The provided labels can be any of the following:

  • A fiftyone.core.labels.Label instance, in which case the labels are directly saved in the specified label_field

  • A dict mapping keys to fiftyone.core.labels.Label instances. In this case, the labels are added as follows:

    for key, value in labels.items():
        sample[label_key(key)] = value
    
  • A dict mapping frame numbers to fiftyone.core.labels.Label instances. In this case, the provided labels are interpreted as frame-level labels that should be added as follows:

    sample.frames.merge(
        {
            frame_number: {label_field: label}
            for frame_number, label in labels.items()
        }
    )
    
  • A dict mapping frame numbers to dicts mapping keys to fiftyone.core.labels.Label instances. In this case, the provided labels are interpreted as frame-level labels that should be added as follows:

    sample.frames.merge(
        {
            frame_number: {
                label_key(key): value
                for key, value in frame_dict.items()
            }
            for frame_number, frame_dict in labels.items()
        }
    )
    

In the above, the label_key function maps label dict keys to field names, and is defined from label_field as follows:

    if isinstance(label_field, dict):
        label_key = lambda k: label_field.get(k, k)
    elif label_field is not None:
        label_key = lambda k: label_field + "_" + k
    else:
        label_key = lambda k: k

Parameters:
  • labels – a fiftyone.core.labels.Label or dict of labels per the description above

  • label_field (None) – the sample field, prefix, or dict defining in which field(s) to save the labels

  • confidence_thresh (None) – an optional confidence threshold to apply to any applicable labels before saving them

  • expand_schema (True) – whether to dynamically add new fields encountered to the dataset schema. If False, an error is raised if any fields are not in the dataset schema

  • validate (True) – whether to validate values for existing fields

  • dynamic (False) – whether to declare dynamic attributes
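The two frame-level cases above can be modeled by a function that builds the dict that would be passed to sample.frames.merge(). The build_frame_updates() helper is hypothetical, decides the case frame by frame (the real method may inspect its input differently), and uses placeholder strings for fiftyone.core.labels.Label instances:

```python
def build_frame_updates(labels, label_field=None):
    """Hypothetical model of add_labels() for frame-level inputs.

    Returns the dict that would be passed to sample.frames.merge().
    """
    # Same label_key rules as described above
    if isinstance(label_field, dict):
        label_key = lambda k: label_field.get(k, k)
    elif label_field is not None:
        label_key = lambda k: label_field + "_" + k
    else:
        label_key = lambda k: k

    updates = {}
    for frame_number, value in labels.items():
        if isinstance(value, dict):
            # Frame number -> {key: label}: map each key via label_key
            updates[frame_number] = {label_key(k): v for k, v in value.items()}
        else:
            # Frame number -> label: store it directly in label_field
            if not isinstance(label_field, str):
                raise ValueError("This model needs a field name for bare labels")
            updates[frame_number] = {label_field: value}

    return updates


# Placeholder strings stand in for fiftyone.core.labels.Label instances
print(build_frame_updates({1: "<Classification>"}, label_field="events"))
# {1: {'events': '<Classification>'}}

print(build_frame_updates({1: {"cats": "<Detections>"}}, label_field="model"))
# {1: {'model_cats': '<Detections>'}}
```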

clear_field(field_name)

Clears the value of a field of the document.

Parameters:

field_name – the name of the field to clear

Raises:

AttributeError – if the field does not exist

compute_metadata(overwrite=False, skip_failures=False)

Populates the metadata field of the sample.

Parameters:
  • overwrite (False) – whether to overwrite existing metadata

  • skip_failures (False) – whether to gracefully continue without raising an error if metadata cannot be computed

copy(fields=None, omit_fields=None)

Returns a deep copy of the sample that has not been added to the database.

Parameters:
  • fields (None) – an optional field or iterable of fields to which to restrict the copy. This can also be a dict mapping existing field names to new field names

  • omit_fields (None) – an optional field or iterable of fields to exclude from the copy

Returns:

a Sample

property dataset

The dataset to which this document belongs, or None if it has not been added to a dataset.

property dataset_id
property excluded_field_names

The set of field names that are excluded on this document view, or None if no fields are explicitly excluded.

property field_names

An ordered tuple of field names of this document view.

This may be a subset of all fields of the document if fields have been selected or excluded.

property filename

The basename of the media’s filepath.

property filtered_field_names

The set of field names or embedded.field.names that have been filtered on this document view, or None if no fields are filtered.

get_field(field_name)

Gets the value of a field of the document.

Parameters:

field_name – the field name

Returns:

the field value

Raises:

AttributeError – if the field does not exist

has_field(field_name)

Determines whether the document has the given field.

Parameters:

field_name – the field name

Returns:

True/False

property in_dataset

Whether the document has been added to a dataset.

iter_fields(include_id=False, include_timestamps=False)

Returns an iterator over the (name, value) pairs of the public fields of the document.

Parameters:
  • include_id (False) – whether to include the id field

  • include_timestamps (False) – whether to include the created_at and last_modified_at fields

Returns:

an iterator that emits (name, value) tuples

property media_type

The media type of the sample.

merge(sample, fields=None, omit_fields=None, merge_lists=True, merge_embedded_docs=False, overwrite=True, expand_schema=True, validate=True, dynamic=False)

Merges the fields of the given sample into this sample.

The behavior of this method is highly customizable. By default, all top-level fields from the provided sample are merged in, overwriting any existing values for those fields, with the exception of list fields (e.g., tags) and label list fields (e.g., fiftyone.core.labels.Detections fields), in which case the elements of the lists themselves are merged. In the case of label list fields, labels with the same id in both samples are updated rather than duplicated.

To avoid confusion between missing fields and fields whose value is None, None-valued fields are always treated as missing while merging.

This method can be configured in numerous ways, including:

  • Whether new fields can be added to the dataset schema

  • Whether list fields should be treated as ordinary fields and merged as a whole rather than merging their elements

  • Whether to merge only specific fields, or all but certain fields

  • Mapping input sample fields to different field names of this sample

Parameters:
  • sample – a fiftyone.core.sample.Sample

  • fields (None) – an optional field or iterable of fields to which to restrict the merge. May contain frame fields for video samples. This can also be a dict mapping field names of the input sample to field names of this sample

  • omit_fields (None) – an optional field or iterable of fields to exclude from the merge. May contain frame fields for video samples

  • merge_lists (True) – whether to merge the elements of list fields (e.g., tags) and label list fields (e.g., fiftyone.core.labels.Detections fields) rather than merging the entire top-level field like other field types. For label list fields, existing fiftyone.core.labels.Label elements are either replaced (when overwrite is True) or kept (when overwrite is False) when their id matches a label from the provided sample

  • merge_embedded_docs (False) – whether to merge the attributes of embedded documents (True) rather than merging the entire top-level field (False)

  • overwrite (True) – whether to overwrite (True) or skip (False) existing fields and label elements

  • expand_schema (True) – whether to dynamically add new fields encountered to the dataset schema. If False, an error is raised if any fields are not in the dataset schema

  • validate (True) – whether to validate values for existing fields

  • dynamic (False) – whether to declare dynamic embedded document fields
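The difference between merge_embedded_docs=False and True can be sketched with dicts standing in for embedded documents. The merge_embedded() helper is hypothetical, and treating None-valued attributes as missing is an assumption carried over from the top-level rule described above:

```python
def merge_embedded(current, incoming, merge_embedded_docs=False, overwrite=True):
    """Hypothetical model of merging one embedded-document field."""
    if current is None:
        return dict(incoming)  # missing field: take the incoming document

    if not merge_embedded_docs:
        # The top-level field is merged as a whole
        return dict(incoming) if overwrite else dict(current)

    # Attribute-level merge of the two documents
    merged = dict(current)
    for key, value in incoming.items():
        if value is None:
            continue  # assumption: None-valued attributes count as missing
        if overwrite or merged.get(key) is None:
            merged[key] = value

    return merged


current = {"label": "cat", "confidence": 0.4}
incoming = {"label": "dog", "confidence": None, "reviewed": True}

print(merge_embedded(current, incoming))
# {'label': 'dog', 'confidence': None, 'reviewed': True}

print(merge_embedded(current, incoming, merge_embedded_docs=True))
# {'label': 'dog', 'confidence': 0.4, 'reviewed': True}
```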

property selected_field_names

The set of field names that are selected on this document view, or None if no fields are explicitly selected.

set_field(field_name, value, create=True, validate=True, dynamic=False)

Sets the value of a field of the document.

Parameters:
  • field_name – the field name

  • value – the field value

  • create (True) – whether to create the field if it does not exist

  • validate (True) – whether to validate values for existing fields

  • dynamic (False) – whether to declare dynamic embedded document fields

Raises:
  • ValueError – if field_name is not an allowed field name

  • AttributeError – if the field does not exist and create == False
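The create flag's contract can be illustrated with a minimal stand-in class. FieldStore is hypothetical and models only the create behavior and the AttributeError cases documented for set_field() and get_field(); it does not model field-name validation (the ValueError case):

```python
class FieldStore:
    """Hypothetical stand-in for a document's fields (not FiftyOne code)."""

    def __init__(self, **fields):
        self._fields = dict(fields)

    def has_field(self, field_name):
        return field_name in self._fields

    def get_field(self, field_name):
        if field_name not in self._fields:
            raise AttributeError("Document has no field '%s'" % field_name)
        return self._fields[field_name]

    def set_field(self, field_name, value, create=True):
        # With create=False, only fields that already exist may be set
        if not create and field_name not in self._fields:
            raise AttributeError("Document has no field '%s'" % field_name)
        self._fields[field_name] = value


doc = FieldStore(filepath="/data/a.jpg")
doc.set_field("score", 0.9)  # create=True adds the new field
print(doc.has_field("score"), doc.get_field("score"))  # True 0.9

try:
    doc.set_field("reviewed", True, create=False)
except AttributeError as e:
    print(e)  # Document has no field 'reviewed'
```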

to_json(pretty_print=False)

Serializes the document to a JSON string.

The document ID and private fields are excluded in this representation.

Parameters:

pretty_print (False) – whether to render the JSON in a human-readable format with newlines and indentation

Returns:

a JSON string

to_mongo_dict(include_id=False)

Serializes the document to a BSON dictionary equivalent to the representation that would be stored in the database.

Parameters:

include_id (False) – whether to include the document ID

Returns:

a BSON dict

update_fields(fields_dict, expand_schema=True, validate=True, dynamic=False)

Sets the dictionary of fields on the document.

Parameters:
  • fields_dict – a dict mapping field names to values

  • expand_schema (True) – whether to dynamically add new fields encountered to the document schema. If False, an error is raised if any fields are not in the document schema

  • validate (True) – whether to validate values for existing fields

  • dynamic (False) – whether to declare dynamic embedded document fields

Raises:

AttributeError – if expand_schema == False and a field does not exist