fiftyone.brain.similarity¶

Similarity interface.

Copyright 2017-2025, Voxel51, Inc.
voxel51.com

Functions:

compute_similarity(samples, patches_field, …)

See fiftyone/brain/__init__.py.

Classes:

`SimilarityConfig`([embeddings_field, model, …])	Similarity configuration.
`Similarity`(config)	Base class for similarity factories.
`SimilarityIndex`(samples, config, brain_key)	Base class for similarity indexes.
`DuplicatesMixin`()	Mixin for `SimilarityIndex` instances that support duplicate detection operations.

fiftyone.brain.similarity.compute_similarity(samples, patches_field, roi_field, embeddings, brain_key, model, model_kwargs, force_square, alpha, batch_size, num_workers, skip_failures, progress, backend, **kwargs)¶: See fiftyone/brain/__init__.py.

class fiftyone.brain.similarity.SimilarityConfig(embeddings_field=None, model=None, model_kwargs=None, patches_field=None, roi_field=None, supports_prompts=None, **kwargs)¶

Bases: fiftyone.core.brain.BrainMethodConfig

Similarity configuration.

Parameters

embeddings_field (None) – the sample field containing the embeddings, if one was provided
model (None) – the fiftyone.core.models.Model or name of the zoo model that was used to compute embeddings, if known
model_kwargs (None) – a dictionary of optional keyword arguments to pass to the model’s Config when a model name is provided
patches_field (None) – the sample field defining the patches being analyzed, if any
roi_field (None) – the sample field defining a region of interest within each image to use to compute embeddings, if any
supports_prompts (None) – whether this run supports prompt queries

Attributes:

`type`	The type of run.
`method`	The name of the similarity backend.
`max_k`	A maximum k value for nearest neighbor queries, or None if there is no limit.
`supports_least_similarity`	Whether this backend supports least similarity queries.
`supported_aggregations`	A tuple of supported values for the `aggregation` parameter of the backend’s `sort_by_similarity()` and `_kneighbors()` methods.
`cls`	The fully-qualified name of this `BaseRunConfig` class.
`run_cls`	The `BaseRun` class associated with this config.

Methods:

`load_credentials`(**kwargs)	Loads any necessary credentials from the given keyword arguments or the relevant FiftyOne config.
`attributes`()	Returns the list of class attributes that will be serialized by `serialize()`.
`base_config_cls`(type)	Returns the config class for the given run type.
`build`()	Builds the `BaseRun` instance associated with this config.
`builder`()	Returns a ConfigBuilder instance for this class.
`copy`()	Returns a deep copy of the object.
`custom_attributes`([dynamic, private])	Returns a customizable list of class attributes.
`default`()	Returns the default config instance.
`from_dict`(d)	Constructs a `BaseRunConfig` from a serialized JSON dict representation of it.
`from_json`(path, *args, **kwargs)	Constructs a Serializable object from a JSON file.
`from_kwargs`(**kwargs)	Constructs a Config object from keyword arguments.
`from_str`(s, *args, **kwargs)	Constructs a Serializable object from a JSON string.
`get_class_name`()	Returns the fully-qualified class name string of this object.
`load_default`()	Loads the default config instance from file.
`parse_array`(d, key[, default])	Parses a raw array attribute.
`parse_bool`(d, key[, default])	Parses a boolean value.
`parse_categorical`(d, key, choices[, default])	Parses a categorical JSON field, which must take a value from among the given choices.
`parse_dict`(d, key[, default])	Parses a dictionary attribute.
`parse_int`(d, key[, default])	Parses an integer attribute.
`parse_mutually_exclusive_fields`(fields)	Parses a mutually exclusive dictionary of pre-parsed fields, which must contain exactly one field with a truthy value.
`parse_number`(d, key[, default])	Parses a number attribute.
`parse_object`(d, key, cls[, default])	Parses an object attribute.
`parse_object_array`(d, key, cls[, default])	Parses an array of objects.
`parse_object_dict`(d, key, cls[, default])	Parses a dictionary whose values are objects.
`parse_path`(d, key[, default])	Parses a path attribute.
`parse_raw`(d, key[, default])	Parses a raw (arbitrary) JSON field.
`parse_string`(d, key[, default])	Parses a string attribute.
`serialize`([reflective])	Serializes the object into a dictionary.
`to_str`([pretty_print])	Returns a string representation of this object.
`validate_all_or_nothing_fields`(fields)	Validates a dictionary of pre-parsed fields checking that either all or none of the fields have a truthy value.
`write_json`(path[, pretty_print])	Serializes the object and writes it to disk.

property type¶: The type of run.

property method¶: The name of the similarity backend.

property max_k¶: A maximum k value for nearest neighbor queries, or None if there is no limit.

property supports_least_similarity¶: Whether this backend supports least similarity queries.

property supported_aggregations¶: A tuple of supported values for the aggregation parameter of the backend’s sort_by_similarity() and _kneighbors() methods.

load_credentials(**kwargs)¶

Loads any necessary credentials from the given keyword arguments or the relevant FiftyOne config.

Parameters: **kwargs – subclass-specific credentials

attributes()¶

Returns the list of class attributes that will be serialized by serialize().

Returns: a list of attributes

static base_config_cls(type)¶

Returns the config class for the given run type.

Parameters: type – a BaseRunConfig.type
Returns: a BaseRunConfig subclass

build()¶

Builds the BaseRun instance associated with this config.

Returns: a BaseRun instance

classmethod builder()¶: Returns a ConfigBuilder instance for this class.

property cls¶: The fully-qualified name of this BaseRunConfig class.

copy()¶

Returns a deep copy of the object.

Returns: a Serializable instance

custom_attributes(dynamic=False, private=False)¶

Returns a customizable list of class attributes.

By default, all attributes in vars(self) are returned, minus private attributes (those starting with “_”).

Parameters

dynamic – whether to include dynamic properties, e.g., those defined by getter/setter methods or the @property decorator. By default, this is False
private – whether to include private properties, i.e., those starting with “_”. By default, this is False

Returns

a list of class attributes
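
The default filtering described above can be illustrated with a small stand-in. This is only a sketch, not the library's implementation: `Example` and `public_attributes` are hypothetical names, and the `dynamic` option is deliberately left out.

```python
class Example:
    """Hypothetical object with one public and one private attribute."""

    def __init__(self):
        self.model = "clip"
        self._cache = {}


def public_attributes(obj, private=False):
    # Start from vars(obj) and drop names beginning with "_" unless
    # private attributes were explicitly requested
    return [
        name
        for name in vars(obj)
        if private or not name.startswith("_")
    ]


print(public_attributes(Example()))
print(public_attributes(Example(), private=True))
```

The first call lists only `model`; the second also includes `_cache`.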

classmethod default()¶

Returns the default config instance.

By default, this method instantiates the class from an empty dictionary, which will only succeed if all attributes are optional. Otherwise, subclasses should override this method to provide the desired default configuration.

classmethod from_dict(d)¶

Constructs a BaseRunConfig from a serialized JSON dict representation of it.

Parameters: d – a JSON dict
Returns: a BaseRunConfig

classmethod from_json(path, *args, **kwargs)¶

Constructs a Serializable object from a JSON file.

Subclasses may override this method, but, by default, this method simply reads the JSON and calls from_dict(), which subclasses must implement.

Parameters

path – the path to the JSON file on disk
*args – optional positional arguments for self.from_dict()
**kwargs – optional keyword arguments for self.from_dict()

Returns

an instance of the Serializable class

classmethod from_kwargs(**kwargs)¶

Constructs a Config object from keyword arguments.

Parameters: **kwargs – keyword arguments that define the fields expected by cls
Returns: an instance of cls

classmethod from_str(s, *args, **kwargs)¶

Constructs a Serializable object from a JSON string.

Subclasses may override this method, but, by default, this method simply parses the string and calls from_dict(), which subclasses must implement.

Parameters

s – a JSON string representation of a Serializable object
*args – optional positional arguments for self.from_dict()
**kwargs – optional keyword arguments for self.from_dict()

Returns

an instance of the Serializable class

classmethod get_class_name()¶: Returns the fully-qualified class name string of this object.

classmethod load_default()¶

Loads the default config instance from file.

Subclasses must implement this method if they intend to support default instances.

static parse_array(d, key, default=<eta.core.config.NoDefault object>)¶

Parses a raw array attribute.

Parameters

d – a JSON dictionary
key – the key to parse
default – a default list to return if key is not present

Returns

a list of raw (untouched) values

Raises

ConfigError – if the field value was the wrong type or no default value was provided and the key was not found in the dictionary

static parse_bool(d, key, default=<eta.core.config.NoDefault object>)¶

Parses a boolean value.

Parameters

d – a JSON dictionary
key – the key to parse
default – a default bool to return if key is not present

Returns

True/False

Raises

ConfigError – if the field value was the wrong type or no default value was provided and the key was not found in the dictionary

static parse_categorical(d, key, choices, default=<eta.core.config.NoDefault object>)¶

Parses a categorical JSON field, which must take a value from among the given choices.

Parameters

d – a JSON dictionary
key – the key to parse
choices – either an iterable of possible values or an enum-like class whose attributes define the possible values
default – a default value to return if key is not present

Returns

the raw (untouched) value of the given field, which is equal to a value from choices

Raises

ConfigError – if the key was present in the dictionary but its value was not an allowed choice, or if no default value was provided and the key was not found in the dictionary
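
The contract above can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: `parse_categorical_sketch`, `_NO_DEFAULT`, and the local `ConfigError` are hypothetical stand-ins, and reading an enum-like class through its public attributes is an assumption about how such classes are interpreted.

```python
class ConfigError(Exception):
    """Local stand-in for the library's ConfigError (hypothetical)."""


_NO_DEFAULT = object()  # sentinel meaning "no default was provided"


def parse_categorical_sketch(d, key, choices, default=_NO_DEFAULT):
    # Assumption: an enum-like class exposes its choices as public attributes
    if isinstance(choices, type):
        allowed = [
            v for k, v in vars(choices).items() if not k.startswith("_")
        ]
    else:
        allowed = list(choices)

    if key not in d:
        if default is _NO_DEFAULT:
            raise ConfigError("Expected key '%s'" % key)
        return default

    value = d[key]
    if value not in allowed:
        raise ConfigError("'%s' is not one of %s" % (value, allowed))

    return value  # returned untouched
```

Note that a default is only consulted when the key is absent; a present but disallowed value always raises.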

static parse_dict(d, key, default=<eta.core.config.NoDefault object>)¶

Parses a dictionary attribute.

Parameters

d – a JSON dictionary
key – the key to parse
default – a default dict to return if key is not present

Returns

a dictionary

Raises

ConfigError – if the field value was the wrong type or no default value was provided and the key was not found in the dictionary

static parse_int(d, key, default=<eta.core.config.NoDefault object>)¶

Parses an integer attribute.

Parameters

d – a JSON dictionary
key – the key to parse
default – a default integer value to return if key is not present

Returns

an int

Raises

ConfigError – if the field value was the wrong type or no default value was provided and the key was not found in the dictionary

static parse_mutually_exclusive_fields(fields)¶

Parses a mutually exclusive dictionary of pre-parsed fields, which must contain exactly one field with a truthy value.

Parameters: fields – a dictionary of pre-parsed fields
Returns: the (field, value) that was set
Raises: ConfigError – if zero or more than one truthy value was found
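
As a rough illustration of the "exactly one truthy value" rule, the following sketch mirrors the documented behavior. The names `parse_mutually_exclusive_fields_sketch` and the local `ConfigError` are hypothetical; the real method may differ in its details.

```python
class ConfigError(Exception):
    """Local stand-in for the library's ConfigError (hypothetical)."""


def parse_mutually_exclusive_fields_sketch(fields):
    # Collect every (field, value) pair whose value is truthy
    truthy = [(name, value) for name, value in fields.items() if value]

    if len(truthy) != 1:
        raise ConfigError(
            "Expected exactly one truthy field, found %d" % len(truthy)
        )

    return truthy[0]


field, value = parse_mutually_exclusive_fields_sketch(
    {"embeddings_field": None, "model": "clip"}
)
print(field, value)
```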

static parse_number(d, key, default=<eta.core.config.NoDefault object>)¶

Parses a number attribute.

Parameters

d – a JSON dictionary
key – the key to parse
default – a default numeric value to return if key is not present

Returns

a number (e.g. int, float)

Raises

ConfigError – if the field value was the wrong type or no default value was provided and the key was not found in the dictionary

static parse_object(d, key, cls, default=<eta.core.config.NoDefault object>)¶

Parses an object attribute.

The value of d[key] can be either an instance of cls or a serialized dict from an instance of cls.

Parameters

d – a JSON dictionary
key – the key to parse
cls – the class of d[key]
default – a default cls instance to return if key is not present

Returns

an instance of cls

Raises

ConfigError – if the field value was the wrong type or no default value was provided and the key was not found in the dictionary
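
The "instance or serialized dict" rule can be pictured with the sketch below. It is not the library's implementation: `Point` and `parse_object_sketch` are hypothetical, and building the instance through a `from_dict()` classmethod is an assumption made for illustration. Default handling and error checking are omitted.

```python
class Point:
    """Hypothetical class that can be rebuilt from a serialized dict."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    @classmethod
    def from_dict(cls, d):
        return cls(d["x"], d["y"])


def parse_object_sketch(d, key, cls):
    value = d[key]

    # Already an instance: return it unchanged
    if isinstance(value, cls):
        return value

    # Otherwise treat the value as a serialized dict (assumed constructor)
    return cls.from_dict(value)
```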

static parse_object_array(d, key, cls, default=<eta.core.config.NoDefault object>)¶

Parses an array of objects.

The values in d[key] can be either instances of cls or serialized dicts from instances of cls.

Parameters

d – a JSON dictionary
key – the key to parse
cls – the class of the elements of list d[key]
default – the default list to return if key is not present

Returns

a list of cls instances

Raises

ConfigError – if the field value was the wrong type or no default value was provided and the key was not found in the dictionary

static parse_object_dict(d, key, cls, default=<eta.core.config.NoDefault object>)¶

Parses a dictionary whose values are objects.

The values in d[key] can be either instances of cls or serialized dicts from instances of cls.

Parameters

d – a JSON dictionary
key – the key to parse
cls – the class of the values of dictionary d[key]
default – the default dict of cls instances to return if key is not present

Returns

a dictionary whose values are cls instances

Raises

ConfigError – if the field value was the wrong type or no default value was provided and the key was not found in the dictionary

static parse_path(d, key, default=<eta.core.config.NoDefault object>)¶

Parses a path attribute.

The path is converted to an absolute path if necessary via os.path.abspath(os.path.expanduser(value)).

Parameters

d – a JSON dictionary
key – the key to parse
default – a default string to return if key is not present

Returns

a path string

Raises

ConfigError – if the field value was the wrong type or no default value was provided and the key was not found in the dictionary
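
The path conversion mentioned above uses only the standard library, so it can be tried in isolation. The helper name `normalize_path` is hypothetical; the expression inside it is the one quoted in the description.

```python
import os


def normalize_path(value):
    # Expand a leading "~" to the user's home directory, then make the
    # result absolute relative to the current working directory
    return os.path.abspath(os.path.expanduser(value))


print(normalize_path("~/configs/similarity.json"))
print(normalize_path("relative/index.json"))
```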

static parse_raw(d, key, default=<eta.core.config.NoDefault object>)¶

Parses a raw (arbitrary) JSON field.

Parameters

d – a JSON dictionary
key – the key to parse
default – a default value to return if key is not present

Returns

the raw (untouched) value of the given field

Raises

ConfigError – if no default value was provided and the key was not found in the dictionary

static parse_string(d, key, default=<eta.core.config.NoDefault object>)¶

Parses a string attribute.

Parameters

d – a JSON dictionary
key – the key to parse
default – a default string to return if key is not present

Returns

a string

Raises

ConfigError – if the field value was the wrong type or no default value was provided and the key was not found in the dictionary

property run_cls¶: The BaseRun class associated with this config.

serialize(reflective=False)¶

Serializes the object into a dictionary.

Serialization is applied recursively to all attributes in the object, including element-wise serialization of lists and dictionary values.

Parameters: reflective – whether to include reflective attributes when serializing the object. By default, this is False
Returns: a JSON dictionary representation of the object

to_str(pretty_print=True, **kwargs)¶

Returns a string representation of this object.

Parameters

pretty_print – whether to render the JSON in human readable format with newlines and indentations. By default, this is True
**kwargs – optional keyword arguments for self.serialize()

Returns

a string representation of the object

static validate_all_or_nothing_fields(fields)¶

Validates a dictionary of pre-parsed fields checking that either all or none of the fields have a truthy value.

Parameters: fields – a dictionary of pre-parsed fields
Raises: ConfigError – if some values are truthy and some are not
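
A minimal sketch of the all-or-nothing check, assuming nothing beyond the description above. `validate_all_or_nothing_sketch` and the local `ConfigError` are hypothetical names, not the library's code.

```python
class ConfigError(Exception):
    """Local stand-in for the library's ConfigError (hypothetical)."""


def validate_all_or_nothing_sketch(fields):
    flags = [bool(value) for value in fields.values()]

    # Valid only when every value is truthy or every value is falsy
    if any(flags) and not all(flags):
        raise ConfigError("Expected all or none of the fields to be set")


validate_all_or_nothing_sketch({"patches_field": "gt", "roi_field": "roi"})
validate_all_or_nothing_sketch({"patches_field": None, "roi_field": None})
```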

write_json(path, pretty_print=False, **kwargs)¶

Serializes the object and writes it to disk.

Parameters

path – the output path
pretty_print – whether to render the JSON in human readable format with newlines and indentations. By default, this is False
**kwargs – optional keyword arguments for self.serialize()

class fiftyone.brain.similarity.Similarity(config)¶

Bases: fiftyone.core.brain.BrainMethod

Base class for similarity factories.

Parameters: config – a SimilarityConfig

Methods:

`initialize`(samples, brain_key)	Initializes a similarity index.
`get_fields`(samples, brain_key)	Gets the fields that were involved in the given run.
`cleanup`(samples, key)	Cleans up the results of the run with the given key from the collection.
`delete_run`(samples, key[, cleanup])	Deletes the results associated with the given run key from the collection.
`delete_runs`(samples[, cleanup])	Deletes all runs from the collection.
`ensure_requirements`()	Ensures that any necessary packages to execute this run are installed.
`ensure_usage_requirements`()	Ensures that any necessary packages to use existing results for this run are installed.
`from_config`(config)	Instantiates a Configurable class from a <cls>Config instance.
`from_dict`(d)	Instantiates a Configurable class from a <cls>Config dict.
`from_json`(json_path)	Instantiates a Configurable class from a <cls>Config JSON file.
`from_kwargs`(**kwargs)	Instantiates a Configurable class from keyword arguments defining the attributes of a <cls>Config.
`get_run_info`(samples, key)	Gets the `BaseRunInfo` for the given key on the collection.
`has_cached_run_results`(samples, key)	Determines whether `BaseRunResults` for the given key are cached on the collection.
`list_runs`(samples[, type, method])	Returns the list of run keys on the given collection.
`load_run_results`(samples, key[, cache, …])	Loads the `BaseRunResults` for the given key on the collection.
`load_run_view`(samples, key[, select_fields])	Loads the view on which the specified run was performed.
`parse`(class_name[, module_name])	Parses a Configurable subclass name string.
`register_run`(samples, key[, overwrite, cleanup])	Registers a run of this method under the given key on the given collection.
`rename`(samples, key, new_key)	Performs any necessary operations required to rename this run’s key.
`run_info_cls`()	The `BaseRunInfo` class associated with this class.
`save_run_info`(samples, run_info[, …])	Saves the run information on the collection.
`save_run_results`(samples, key, run_results)	Saves the run results on the collection.
`update_run_config`(samples, key, config)	Updates the `BaseRunConfig` for the given run on the collection.
`update_run_key`(samples, key, new_key)	Replaces the key for the given run with a new key.
`validate`(config)	Validates that the given config is an instance of <cls>Config.
`validate_run`(samples, key[, overwrite])	Validates that the collection can accept this run.

initialize(samples, brain_key)¶

Initializes a similarity index.

Parameters

samples – a fiftyone.core.collections.SampleCollection
brain_key – the brain key

Returns

a SimilarityIndex

get_fields(samples, brain_key)¶

Gets the fields that were involved in the given run.

Parameters

samples – a fiftyone.core.collections.SampleCollection
brain_key – the brain key

Returns

a list of fields

cleanup(samples, key)¶

Cleans up the results of the run with the given key from the collection.

Parameters

samples – a fiftyone.core.collections.SampleCollection
key – a run key

classmethod delete_run(samples, key, cleanup=True)¶

Deletes the results associated with the given run key from the collection.

Parameters

samples – a fiftyone.core.collections.SampleCollection
key – a run key
cleanup (True) – whether to execute the run’s BaseRun.cleanup() method

classmethod delete_runs(samples, cleanup=True)¶

Deletes all runs from the collection.

Parameters

samples – a fiftyone.core.collections.SampleCollection
cleanup (True) – whether to execute the run’s BaseRun.cleanup() methods

ensure_requirements()¶

Ensures that any necessary packages to execute this run are installed.

Runs should respect fiftyone.config.requirement_error_level when handling errors.

ensure_usage_requirements()¶

Ensures that any necessary packages to use existing results for this run are installed.

Runs should respect fiftyone.config.requirement_error_level when handling errors.

classmethod from_config(config)¶: Instantiates a Configurable class from a <cls>Config instance.

classmethod from_dict(d)¶

Instantiates a Configurable class from a <cls>Config dict.

Parameters: d – a dict to construct a <cls>Config
Returns: an instance of cls

classmethod from_json(json_path)¶

Instantiates a Configurable class from a <cls>Config JSON file.

Parameters: json_path – path to a JSON file for type <cls>Config
Returns: an instance of cls

classmethod from_kwargs(**kwargs)¶

Instantiates a Configurable class from keyword arguments defining the attributes of a <cls>Config.

Parameters: **kwargs – keyword arguments that define the fields of a <cls>Config dict
Returns: an instance of cls

classmethod get_run_info(samples, key)¶

Gets the BaseRunInfo for the given key on the collection.

Parameters

samples – a fiftyone.core.collections.SampleCollection
key – a run key

Returns

a BaseRunInfo

classmethod has_cached_run_results(samples, key)¶

Determines whether BaseRunResults for the given key are cached on the collection.

Parameters

samples – a fiftyone.core.collections.SampleCollection
key – a run key

Returns

True/False

classmethod list_runs(samples, type=None, method=None, **kwargs)¶

Returns the list of run keys on the given collection.

Parameters

samples – a fiftyone.core.collections.SampleCollection
type (None) –
a specific run type to match, which can be:
- a string fiftyone.core.runs.BaseRunConfig.type
- a fiftyone.core.runs.BaseRun class or its fully-qualified class name string
method (None) – a specific fiftyone.core.runs.BaseRunConfig.method string to match
**kwargs – optional config parameters to match

Returns

a list of run keys

classmethod load_run_results(samples, key, cache=True, load_view=True, **kwargs)¶

Loads the BaseRunResults for the given key on the collection.

Parameters

samples – a fiftyone.core.collections.SampleCollection
key – a run key
cache (True) – whether to cache the results on the collection
load_view (True) – whether to load the run view in the results (True) or the full dataset (False)
**kwargs – keyword arguments for the run’s BaseRunConfig.load_credentials() method

Returns

a BaseRunResults, or None if the run did not save results

classmethod load_run_view(samples, key, select_fields=False)¶

Loads the view on which the specified run was performed.

Parameters

samples – a fiftyone.core.collections.SampleCollection
key – a run key
select_fields (False) – whether to exclude fields involved in other runs of the same type

Returns

a fiftyone.core.collections.SampleCollection

static parse(class_name, module_name=None)¶

Parses a Configurable subclass name string.

Assumes both the Configurable class and the Config class are defined in the same module. The module containing the classes will be loaded if necessary.

Parameters

class_name – a string containing the name of the Configurable class, e.g. “ClassName”, or a fully-qualified class name, e.g. “eta.core.config.ClassName”
module_name – a string containing the fully-qualified module name, e.g. “eta.core.config”, or None if class_name includes the module name. Set module_name = __name__ to load a class from the calling module

Returns

cls – the Configurable class
config_cls – the Config class associated with cls

register_run(samples, key, overwrite=True, cleanup=True)¶

Registers a run of this method under the given key on the given collection.

Parameters

samples – a fiftyone.core.collections.SampleCollection
key – a run key
overwrite (True) – whether to allow overwriting an existing run of the same type
cleanup (True) – whether to execute an existing run’s BaseRun.cleanup() method when overwriting it

rename(samples, key, new_key)¶

Performs any necessary operations required to rename this run’s key.

Parameters

samples – a fiftyone.core.collections.SampleCollection
key – a run key
new_key – a new run key

classmethod run_info_cls()¶: The BaseRunInfo class associated with this class.

classmethod save_run_info(samples, run_info, overwrite=True, cleanup=True)¶

Saves the run information on the collection.

Parameters

samples – a fiftyone.core.collections.SampleCollection
run_info – a BaseRunInfo
overwrite (True) – whether to overwrite an existing run with the same key
cleanup (True) – whether to execute an existing run’s BaseRun.cleanup() method when overwriting it

classmethod save_run_results(samples, key, run_results, overwrite=True, cache=True)¶

Saves the run results on the collection.

Parameters

samples – a fiftyone.core.collections.SampleCollection
key – a run key
run_results – a BaseRunResults, or None
overwrite (True) – whether to overwrite an existing result with the same key
cache (True) – whether to cache the results on the collection

classmethod update_run_config(samples, key, config)¶

Updates the BaseRunConfig for the given run on the collection.

Parameters

samples – a fiftyone.core.collections.SampleCollection
key – a run key
config – a BaseRunConfig

classmethod update_run_key(samples, key, new_key)¶

Replaces the key for the given run with a new key.

Parameters

samples – a fiftyone.core.collections.SampleCollection
key – a run key
new_key – a new run key

classmethod validate(config)¶

Validates that the given config is an instance of <cls>Config.

Raises: ConfigurableError – if config is not an instance of <cls>Config

validate_run(samples, key, overwrite=True)¶

Validates that the collection can accept this run.

The run may be invalid if, for example, a run of a different type has already been run under the same key, so overwriting it would make it ambiguous how to clean up the results.

Parameters

samples – a fiftyone.core.collections.SampleCollection
key – a run key
overwrite (True) – whether to allow overwriting an existing run of the same type

Raises

ValueError – if the run is invalid

class fiftyone.brain.similarity.SimilarityIndex(samples, config, brain_key, backend=None)¶

Bases: fiftyone.core.brain.BrainResults

Base class for similarity indexes.

Parameters

samples – the fiftyone.core.collections.SampleCollection used
config – the SimilarityConfig used
brain_key – the brain key
backend (None) – a Similarity backend

Attributes:

`config`	The `SimilarityConfig` for these results.
`is_external`	Whether this similarity index manages its own embeddings (True) or loads them directly from the `embeddings_field` of the dataset (False).
`sample_ids`	The sample IDs of the full index, or `None` if not supported.
`label_ids`	The label IDs of the full index, or `None` if not applicable or not supported.
`total_index_size`	The total number of data points in the index.
`has_view`	Whether the index is currently restricted to a view.
`view`	The `fiftyone.core.collections.SampleCollection` against which results are currently being generated.
`current_sample_ids`	The sample IDs of the currently active data points in the index.
`current_label_ids`	The label IDs of the currently active data points in the index, or `None` if not applicable.
`index_size`	The number of active data points in the index.
`missing_size`	The total number of data points in `view()` that are missing from this index, or `None` if unknown.
`backend`	The `BaseRun` for these results.
`cls`	The fully-qualified name of this `BaseRunResults` class.
`key`	The run key for these results.
`samples`	The `fiftyone.core.collections.SampleCollection` associated with these results.

Methods:

`add_to_index`(embeddings, sample_ids[, …])	Adds the given embeddings to the index.
`remove_from_index`([sample_ids, label_ids, …])	Removes the specified embeddings from the index.
`get_embeddings`([sample_ids, label_ids, …])	Retrieves the embeddings for the given IDs from the index.
`use_view`(samples[, allow_missing, warn_missing])	Restricts the index to the provided view.
`clear_view`()	Clears the view set by `use_view()`, if any.
`reload`()	Reloads the index for the current view.
`cleanup`()	Deletes the similarity index from the backend.
`values`(path_or_expr)	Extracts a flat list of values from the given field or expression corresponding to the current `view()`.
`sort_by_similarity`(query[, k, reverse, …])	Returns a view that sorts the samples/labels in `view()` by similarity to the specified query.
`get_model`()	Returns the stored model for this index.
`compute_embeddings`(samples[, model, …])	Computes embeddings for the given samples using this backend’s model.
`attributes`()	Returns the list of class attributes that will be serialized by `serialize()`.
`base_results_cls`(type)	Returns the results class for the given run type.
`copy`()	Returns a deep copy of the object.
`custom_attributes`([dynamic, private])	Returns a customizable list of class attributes.
`from_dict`(d, samples, config, key)	Builds a `BaseRunResults` from a JSON dict representation of it.
`from_json`(path, *args, **kwargs)	Constructs a Serializable object from a JSON file.
`from_str`(s, *args, **kwargs)	Constructs a Serializable object from a JSON string.
`get_class_name`()	Returns the fully-qualified class name string of this object.
`save`()	Saves the results to the database.
`save_config`()	Saves the config for these results to the database.
`serialize`([reflective])	Serializes the object into a dictionary.
`to_str`([pretty_print])	Returns a string representation of this object.
`write_json`(path[, pretty_print])	Serializes the object and writes it to disk.

property config¶: The SimilarityConfig for these results.

property is_external¶: Whether this similarity index manages its own embeddings (True) or loads them directly from the embeddings_field of the dataset (False).

property sample_ids¶: The sample IDs of the full index, or None if not supported.

property label_ids¶: The label IDs of the full index, or None if not applicable or not supported.

property total_index_size¶

The total number of data points in the index.

If use_view() has been called to restrict the index, this value may be larger than the current index_size().

property has_view¶

Whether the index is currently restricted to a view.

Use use_view() to restrict the index to a view, and use clear_view() to reset to the full index.

property view¶

The fiftyone.core.collections.SampleCollection against which results are currently being generated.

If use_view() has been called, this view may be different than the collection on which the full index was generated.

property current_sample_ids¶

The sample IDs of the currently active data points in the index.

If use_view() has been called, this may be a subset of the full index.

If the index does not support full sample ID lists (i.e., if sample_ids is None), then this will be all sample IDs in the current view, regardless of whether all samples are indexed.

property current_label_ids¶

The label IDs of the currently active data points in the index, or None if not applicable.

If use_view() has been called, this may be a subset of the full index.

If the index does not support full label ID lists (i.e., if label_ids is None), then this will be all label IDs in the current view, regardless of whether all labels are indexed.

property index_size¶

The number of active data points in the index.

If use_view() has been called to restrict the index, this property will reflect the size of the active index.

property missing_size¶

The total number of data points in view() that are missing from this index, or None if unknown.

This property is only applicable when use_view() has been called, and it will be None if no data points are missing or when the backend does not support it.
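
To make the relationship between total_index_size, index_size, and missing_size concrete, here is a toy model that treats the index and the view as plain sets of IDs. It is only a sketch of the documented semantics: `view_sizes` is a hypothetical helper, and real backends compute these values differently.

```python
def view_sizes(index_ids, view_ids):
    """Computes the documented size properties for a set-based toy index."""
    index_ids = set(index_ids)
    view_ids = set(view_ids)

    active = index_ids & view_ids   # indexed points that are in the view
    missing = view_ids - index_ids  # view points that were never indexed

    return {
        "total_index_size": len(index_ids),
        "index_size": len(active),
        # Per the description, None when no data points are missing
        "missing_size": len(missing) or None,
    }


print(view_sizes(["a", "b", "c", "d"], ["b", "c", "e"]))
```

In this toy model, restricting a four-point index to a view of three points, one of which was never indexed, leaves two active points and one missing point.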

add_to_index(embeddings, sample_ids, label_ids=None, overwrite=True, allow_existing=True, warn_existing=False, reload=True)¶

Adds the given embeddings to the index.

Parameters

embeddings – a num_embeddings x num_dims array of embeddings
sample_ids – a num_embeddings array of sample IDs
label_ids (None) – a num_embeddings array of label IDs, if applicable
overwrite (True) – whether to replace (True) or ignore (False) existing embeddings with the same sample/label IDs
allow_existing (True) – whether to ignore (True) or raise an error (False) when overwrite is False and a provided ID already exists in the index
warn_existing (False) – whether to log a warning if an embedding is not added to the index because its ID already exists
reload (True) – whether to call reload() to refresh the current view after the update
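
The interaction between overwrite, allow_existing, and warn_existing can be sketched with a dict-based toy index. This is an illustration of the parameter descriptions above, not the behavior of any particular backend: `add_to_toy_index` is hypothetical, and raising `ValueError` is an assumption since the error type is not stated.

```python
import logging

logger = logging.getLogger(__name__)


def add_to_toy_index(
    index,
    embeddings,
    ids,
    overwrite=True,
    allow_existing=True,
    warn_existing=False,
):
    """Adds embeddings to a dict-based toy index keyed by ID."""
    existing = [_id for _id in ids if _id in index]

    if existing and not overwrite:
        if not allow_existing:
            # Assumed error type; the real backends may raise differently
            raise ValueError("IDs already in index: %s" % existing)

        if warn_existing:
            logger.warning("Ignoring %d existing IDs", len(existing))

    for _id, embedding in zip(ids, embeddings):
        if _id in index and not overwrite:
            continue  # keep the stored embedding

        index[_id] = embedding

    return index
```

With overwrite=False and the default allow_existing=True, existing IDs are silently kept and only new IDs are added.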

remove_from_index(sample_ids=None, label_ids=None, allow_missing=True, warn_missing=False, reload=True)¶

Removes the specified embeddings from the index.

Parameters

sample_ids (None) – an array of sample IDs
label_ids (None) – an array of label IDs, if applicable
allow_missing (True) – whether to allow the index to not contain IDs that you provide (True) or whether to raise an error in this case (False)
warn_missing (False) – whether to log a warning if the index does not contain IDs that you provide
reload (True) – whether to call reload() to refresh the current view after the update

get_embeddings(sample_ids=None, label_ids=None, allow_missing=True, warn_missing=False)¶

Retrieves the embeddings for the given IDs from the index.

If no IDs are provided, the entire index is returned.

Parameters

sample_ids (None) – a sample ID or list of sample IDs for which to retrieve embeddings
label_ids (None) – a label ID or list of label IDs for which to retrieve embeddings
allow_missing (True) – whether to allow the index to not contain IDs that you provide (True) or whether to raise an error in this case (False)
warn_missing (False) – whether to log a warning if the index does not contain IDs that you provide

Returns

a tuple of:

- a num_embeddings x num_dims array of embeddings
- a num_embeddings array of sample IDs
- a num_embeddings array of label IDs, if applicable, or else None

use_view(samples, allow_missing=True, warn_missing=False)¶

Restricts the index to the provided view.

Subsequent calls to methods on this instance will only contain results from the specified view rather than the full index.

Use clear_view() to reset to the full index. Equivalently, use the context manager interface demonstrated below, which automatically resets the view when the context exits.

Example usage:

import fiftyone as fo
import fiftyone.brain as fob
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart")

results = fob.compute_similarity(dataset)
print(results.index_size)  # 200

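# visualize_unique() requires a visualization that covers the view
viz = fob.compute_visualization(dataset)

view = dataset.take(50)

with results.use_view(view):
    print(results.index_size)  # 50

    results.find_unique(10)
    print(results.unique_ids)

    plot = results.visualize_unique(viz)
    plot.show()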

Parameters

samples – a fiftyone.core.collections.SampleCollection
allow_missing (True) – whether to allow the provided collection to contain data points that this index does not contain (True) or whether to raise an error in this case (False)
warn_missing (False) – whether to log a warning if the provided collection contains data points that this index does not contain

Returns

self
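
Because use_view() returns self, the returned object can double as a context manager. The pattern can be sketched with a toy class; ToyIndex and its attributes are hypothetical stand-ins, and the real index may implement the protocol differently.

```python
class ToyIndex:
    """Hypothetical stand-in for an index whose use_view() returns self."""

    def __init__(self, ids):
        self._all_ids = list(ids)
        self._view_ids = None  # None means "use the full index"

    @property
    def index_size(self):
        ids = self._all_ids if self._view_ids is None else self._view_ids
        return len(ids)

    def use_view(self, ids):
        wanted = set(ids)
        self._view_ids = [i for i in self._all_ids if i in wanted]
        return self  # returning self enables `with index.use_view(...)`

    def clear_view(self):
        self._view_ids = None

    def __enter__(self):
        return self

    def __exit__(self, *args):
        self.clear_view()  # reset to the full index when the block exits


index = ToyIndex(range(200))

with index.use_view(range(50)):
    size_in_view = index.index_size

size_after = index.index_size
print(size_in_view, size_after)
```

Calling use_view() without a with statement leaves the restriction in place until clear_view() is called explicitly.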

clear_view()¶

Clears the view set by use_view(), if any.

Subsequent operations will be performed on the full index.

reload()¶

Reloads the index for the current view.

Subclasses may override this method, but by default this method simply passes the current view() back into use_view(), which updates the index’s current ID set based on any changes to the view since the index was last loaded.

cleanup()¶: Deletes the similarity index from the backend.

values(path_or_expr)¶

Extracts a flat list of values from the given field or expression corresponding to the current view().

This method always returns values in the same order as the current_sample_ids and current_label_ids properties.

Parameters

path_or_expr –

the values to extract, which can be:

the name of a sample field or embedded.field.name from which to extract numeric or string values
a fiftyone.core.expressions.ViewExpression defining numeric or string values to compute via fiftyone.core.collections.SampleCollection.values()

Returns

a list of values

sort_by_similarity(query, k=None, reverse=False, aggregation='mean', dist_field=None, _mongo=False)¶

Returns a view that sorts the samples/labels in view() by similarity to the specified query.

When querying by IDs, the query can be any ID(s) in the full index of this instance, even if the current view() contains a subset of the full index.

Parameters

query –
the query, which can be any of the following:
- an ID or iterable of IDs
- a num_dims vector or num_queries x num_dims array of vectors
- a prompt or iterable of prompts (if supported by the index)
k (None) – the number of matches to return. Some backends may support None, in which case all samples will be sorted
reverse (False) – whether to sort by least similarity (True) or greatest similarity (False). Some backends may not support least similarity
aggregation ("mean") – the aggregation method to use when multiple queries are provided. The default is "mean", which means that the query vectors are averaged prior to searching. Some backends may support additional options
dist_field (None) – the name of a float field in which to store the distance of each example to the specified query. The field is created if necessary

Returns

a fiftyone.core.view.DatasetView
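
The "mean" aggregation can be illustrated with a small pure-Python sketch: the query vectors are averaged first, and the index is then ranked against that single vector. The helper name sort_by_mean_query is hypothetical, and Euclidean distance is an assumption here; the metric actually used depends on the backend.

```python
import math


def sort_by_mean_query(embeddings, queries, k=None, reverse=False):
    """Ranks IDs by Euclidean distance to the mean of the query vectors.

    `embeddings` maps ID -> vector (hypothetical layout for illustration).
    """
    num_dims = len(queries[0])
    mean_query = [
        sum(q[d] for q in queries) / len(queries) for d in range(num_dims)
    ]

    dists = {_id: math.dist(vec, mean_query) for _id, vec in embeddings.items()}

    # reverse=True sorts by least similarity (largest distance first)
    ranked = sorted(dists, key=dists.get, reverse=reverse)

    if k is not None:
        ranked = ranked[:k]

    return [(_id, dists[_id]) for _id in ranked]


embeddings = {"a": [0.0, 0.0], "b": [1.0, 0.0], "c": [5.0, 5.0]}
queries = [[0.0, 0.0], [2.0, 0.0]]  # the mean query is [1.0, 0.0]

print(sort_by_mean_query(embeddings, queries, k=2))
```

The distances returned alongside each ID play the role of the values that dist_field would store.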

get_model()¶

Returns the stored model for this index.

Returns: a fiftyone.core.models.Model

compute_embeddings(samples, model=None, batch_size=None, num_workers=None, skip_failures=True, skip_existing=False, warn_existing=False, force_square=False, alpha=None, progress=None)¶

Computes embeddings for the given samples using this backend’s model.

Parameters

samples – a fiftyone.core.collections.SampleCollection
model (None) – a fiftyone.core.models.Model to apply. If not provided, these results must have been created with a stored model, which will be used by default
batch_size (None) – an optional batch size to use when computing embeddings. Only applicable when a model is provided
num_workers (None) – the number of workers to use when loading images. Only applicable when a Torch-based model is being used to compute embeddings
skip_failures (True) – whether to gracefully continue without raising an error if embeddings cannot be generated for a sample
skip_existing (False) – whether to skip generating embeddings for sample/label IDs that are already in the index
warn_existing (False) – whether to log a warning if any IDs already exist in the index
force_square (False) – whether to minimally manipulate the patch bounding boxes into squares prior to extraction. Only applicable when a model and patches_field are specified
alpha (None) – an optional expansion/contraction to apply to the patches before extracting them, in [-1, inf). If provided, the length and width of the box are expanded (or contracted, when alpha < 0) by (100 * alpha)%. For example, set alpha = 0.1 to expand the boxes by 10%, and set alpha = -0.1 to contract the boxes by 10%. Only applicable when a model and patches_field are specified
progress (None) – whether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead
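
The arithmetic behind alpha can be worked through on a single box. This sketch assumes an [x, y, width, height] box with a top-left origin and assumes the expansion is applied about the box center; expand_box is a hypothetical helper, and clipping to the image bounds is omitted.

```python
def expand_box(box, alpha):
    """Expands (or contracts) an [x, y, w, h] box by (100 * alpha)%.

    Hypothetical helper: the box center is assumed to stay fixed.
    """
    if alpha < -1:
        raise ValueError("alpha must be in [-1, inf)")

    x, y, w, h = box
    new_w = w * (1 + alpha)
    new_h = h * (1 + alpha)

    # Shift the top-left corner so that the center does not move
    new_x = x - (new_w - w) / 2
    new_y = y - (new_h - h) / 2

    return [new_x, new_y, new_w, new_h]


box = [0.25, 0.25, 0.5, 0.5]

print(expand_box(box, 0.5))   # width and height grow by 50%
print(expand_box(box, -0.5))  # width and height shrink by 50%
```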

Returns

a tuple of:

- a num_embeddings x num_dims array of embeddings
- a num_embeddings array of sample IDs
- a num_embeddings array of label IDs, if applicable, or else None

attributes()¶

Returns the list of class attributes that will be serialized by serialize().

Returns: a list of attributes

property backend¶: The BaseRun for these results.

static base_results_cls(type)¶

Returns the results class for the given run type.

Parameters: type – a BaseRunConfig.type
Returns: a BaseRunResults subclass

property cls¶: The fully-qualified name of this BaseRunResults class.

copy()¶

Returns a deep copy of the object.

Returns: a Serializable instance

custom_attributes(dynamic=False, private=False)¶

Returns a customizable list of class attributes.

By default, all attributes in vars(self) are returned, minus private attributes (those starting with “_”).

Parameters

dynamic – whether to include dynamic properties, e.g., those defined by getter/setter methods or the @property decorator. By default, this is False
private – whether to include private properties, i.e., those starting with “_”. By default, this is False

Returns

a list of class attributes

classmethod from_dict(d, samples, config, key)¶

Builds a BaseRunResults from a JSON dict representation of it.

Parameters

d – a JSON dict
samples – the fiftyone.core.collections.SampleCollection for the run
config – the BaseRunConfig for the run
key – the run key

Returns

a BaseRunResults

classmethod from_json(path, *args, **kwargs)¶

Constructs a Serializable object from a JSON file.

Subclasses may override this method, but, by default, this method simply reads the JSON and calls from_dict(), which subclasses must implement.

Parameters

path – the path to the JSON file on disk
*args – optional positional arguments for self.from_dict()
**kwargs – optional keyword arguments for self.from_dict()

Returns

an instance of the Serializable class

classmethod from_str(s, *args, **kwargs)¶

Constructs a Serializable object from a JSON string.

Subclasses may override this method, but, by default, this method simply parses the string and calls from_dict(), which subclasses must implement.

Parameters

s – a JSON string representation of a Serializable object
*args – optional positional arguments for self.from_dict()
**kwargs – optional keyword arguments for self.from_dict()

Returns

an instance of the Serializable class

classmethod get_class_name()¶: Returns the fully-qualified class name string of this object.

property key¶: The run key for these results.

property samples¶: The fiftyone.core.collections.SampleCollection associated with these results.

save()¶: Saves the results to the database.

save_config()¶: Saves the config for these results to the database.

serialize(reflective=False)¶

Serializes the object into a dictionary.

Serialization is applied recursively to all attributes in the object, including element-wise serialization of lists and dictionary values.

Parameters: reflective – whether to include reflective attributes when serializing the object. By default, this is False
Returns: a JSON dictionary representation of the object

to_str(pretty_print=True, **kwargs)¶

Returns a string representation of this object.

Parameters

pretty_print – whether to render the JSON in human readable format with newlines and indentations. By default, this is True
**kwargs – optional keyword arguments for self.serialize()

Returns

a string representation of the object

write_json(path, pretty_print=False, **kwargs)¶

Serializes the object and writes it to disk.

Parameters

path – the output path
pretty_print – whether to render the JSON in human readable format with newlines and indentations. By default, this is False
**kwargs – optional keyword arguments for self.serialize()

class fiftyone.brain.similarity.DuplicatesMixin¶

Bases: object

Mixin for SimilarityIndex instances that support duplicate detection operations.

Similarity backends can expose this mixin simply by implementing _radius_neighbors().

Attributes:

`thresh`	The threshold used by the last call to `find_duplicates()` or `find_unique()`.
`unique_ids`	A list of unique IDs from the last call to `find_duplicates()` or `find_unique()`.
`duplicate_ids`	A list of duplicate IDs from the last call to `find_duplicates()` or `find_unique()`.
`neighbors_map`	A dictionary mapping IDs to lists of `(dup_id, dist)` tuples from the last call to `find_duplicates()`.

Methods:

`find_duplicates`([thresh, fraction])	Queries the index to find near-duplicate examples based on the provided parameters.
`find_unique`(count)	Queries the index to select a subset of examples of the specified size that are maximally unique with respect to each other.
`plot_distances`([bins, log, backend])	Plots a histogram of the distance between each example and its nearest neighbor.
`duplicates_view`([type_field, id_field, …])	Returns a view that contains only the duplicate examples and their corresponding nearest non-duplicate examples generated by the last call to `find_duplicates()`.
`unique_view`()	Returns a view that contains only the unique examples generated by the last call to `find_duplicates()` or `find_unique()`.
`visualize_duplicates`(visualization[, backend])	Generates an interactive scatterplot of the results generated by the last call to `find_duplicates()`.
`visualize_unique`(visualization[, backend])	Generates an interactive scatterplot of the results generated by the last call to `find_unique()`.

property thresh¶: The threshold used by the last call to find_duplicates() or find_unique().

property unique_ids¶: A list of unique IDs from the last call to find_duplicates() or find_unique().

property duplicate_ids¶: A list of duplicate IDs from the last call to find_duplicates() or find_unique().

property neighbors_map¶: A dictionary mapping IDs to lists of (dup_id, dist) tuples from the last call to find_duplicates().

find_duplicates(thresh=None, fraction=None)¶

Queries the index to find near-duplicate examples based on the provided parameters.

Calling this method populates the unique_ids, duplicate_ids, neighbors_map, and thresh properties of this object with the results of the query.

Use duplicates_view() and visualize_duplicates() to analyze the results generated by this method.

Parameters

thresh (None) – a distance threshold to use to determine duplicates. If specified, the non-duplicate set will be the (approximately) largest set such that all pairwise distances between non-duplicate examples are greater than this threshold
fraction (None) – a desired fraction of images/patches to tag as duplicates, in [0, 1]. In this case thresh is automatically tuned to achieve the desired fraction of duplicates
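
The role of thresh can be illustrated with a greedy pass over a toy set of embeddings. This is only a sketch of the idea that non-duplicates end up pairwise farther apart than the threshold; it is not the algorithm the library uses, and toy_find_duplicates and the Euclidean metric are assumptions.

```python
import math


def toy_find_duplicates(embeddings, thresh):
    """Greedily splits IDs into unique and duplicate sets (hypothetical).

    An example is kept as unique only if it is farther than `thresh` from
    every example kept so far. Otherwise it is recorded as a duplicate of
    its nearest kept example.
    """
    unique_ids = []
    duplicate_ids = []
    neighbors_map = {}  # unique ID -> list of (dup_id, dist)

    for _id, vec in embeddings.items():
        candidates = [(math.dist(vec, embeddings[u]), u) for u in unique_ids]
        nearby = [(dist, u) for dist, u in candidates if dist <= thresh]

        if nearby:
            dist, nearest = min(nearby)
            duplicate_ids.append(_id)
            neighbors_map.setdefault(nearest, []).append((_id, dist))
        else:
            unique_ids.append(_id)

    return unique_ids, duplicate_ids, neighbors_map


embeddings = {
    "a": [0.0, 0.0],
    "b": [0.1, 0.0],  # close to "a"
    "c": [3.0, 4.0],
    "d": [3.0, 4.2],  # close to "c"
}

unique_ids, duplicate_ids, neighbors_map = toy_find_duplicates(embeddings, 0.5)
print(unique_ids)
print(duplicate_ids)
```

Raising thresh moves more examples into the duplicate set, which is the knob that a fraction-based search would tune.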

find_unique(count)¶

Queries the index to select a subset of examples of the specified size that are maximally unique with respect to each other.

Calling this method populates the unique_ids, duplicate_ids, and thresh properties of this object with the results of the query.

Use unique_view() and visualize_unique() to analyze the results generated by this method.

Parameters: count – the desired number of unique examples

plot_distances(bins=100, log=False, backend='plotly', **kwargs)¶

Plots a histogram of the distance between each example and its nearest neighbor.

If find_duplicates() or find_unique() has been executed, the threshold used is also indicated on the plot.

Parameters

bins (100) – the number of bins to use
log (False) – whether to use a log scale y-axis
backend ("plotly") – the plotting backend to use. Supported values are ("plotly", "matplotlib")
**kwargs – keyword arguments for the backend plotting method
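
The quantity being plotted can be sketched without any plotting backend: compute each example's distance to its nearest other example, then count those distances into equal-width bins. Both helpers are hypothetical, and Euclidean distance is an assumption.

```python
import math


def nearest_neighbor_distances(embeddings):
    """Returns each ID's distance to its nearest other example."""
    dists = {}
    for _id, vec in embeddings.items():
        dists[_id] = min(
            math.dist(vec, other)
            for other_id, other in embeddings.items()
            if other_id != _id
        )
    return dists


def histogram(values, bins):
    """Counts values in `bins` equal-width bins spanning [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid a zero width when all values match
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # the max lands in the last bin
        counts[idx] += 1
    return counts


embeddings = {"a": [0.0, 0.0], "b": [1.0, 0.0], "c": [4.0, 0.0]}

nn_dists = nearest_neighbor_distances(embeddings)
print(nn_dists)
print(histogram(list(nn_dists.values()), bins=2))
```

A threshold from find_duplicates() or find_unique() would appear as a vertical marker on the same distance axis.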

Returns

one of the following:

- a fiftyone.core.plots.plotly.PlotlyNotebookPlot, if you are working in a notebook context and the plotly backend is used
- a plotly or matplotlib figure, otherwise

duplicates_view(type_field=None, id_field=None, dist_field=None, sort_by='distance', reverse=False)¶

Returns a view that contains only the duplicate examples and their corresponding nearest non-duplicate examples generated by the last call to find_duplicates().

If you are analyzing patches, the returned view will be a fiftyone.core.patches.PatchesView.

The examples are organized so that each non-duplicate is immediately followed by all duplicate(s) that are nearest to it.

Parameters

type_field (None) – the name of a string field in which to store "nearest" and "duplicate" labels. The field is created if necessary
id_field (None) – the name of a string field in which to store the ID of the nearest non-duplicate for each example in the view. The field is created if necessary
dist_field (None) – the name of a float field in which to store the distance of each example to its nearest non-duplicate example. The field is created if necessary
sort_by ("distance") –
specifies how to sort the groups of duplicate examples. The supported values are:
- "distance": sort the groups by the distance between the non-duplicate and its (nearest, if multiple) duplicate
- "count": sort the groups by the number of duplicate examples
reverse (False) – whether to sort in descending order

Returns

a fiftyone.core.view.DatasetView
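
The ordering described above, with each non-duplicate immediately followed by its duplicates, can be sketched from a neighbors_map-style dictionary. The helper order_duplicate_groups is hypothetical, and it assumes the dictionary keys are the non-duplicate IDs and that every list is non-empty.

```python
def order_duplicate_groups(neighbors_map, sort_by="distance", reverse=False):
    """Flattens groups so each non-duplicate precedes its duplicates.

    `neighbors_map` maps a non-duplicate ID to a list of (dup_id, dist)
    tuples (assumed layout for illustration).
    """
    if sort_by == "distance":
        # Distance between the non-duplicate and its nearest duplicate
        key = lambda item: min(dist for _, dist in item[1])
    elif sort_by == "count":
        key = lambda item: len(item[1])
    else:
        raise ValueError("Unsupported sort_by=%r" % sort_by)

    groups = sorted(neighbors_map.items(), key=key, reverse=reverse)

    ordered = []
    for nearest_id, dups in groups:
        ordered.append((nearest_id, "nearest"))
        ordered.extend((dup_id, "duplicate") for dup_id, _ in dups)

    return ordered


neighbors_map = {
    "a": [("b", 0.3), ("c", 0.1)],
    "d": [("e", 0.05)],
}

print(order_duplicate_groups(neighbors_map, sort_by="distance"))
print(order_duplicate_groups(neighbors_map, sort_by="count", reverse=True))
```

The second element of each pair corresponds to the "nearest" and "duplicate" labels that type_field would store.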

unique_view()¶

Returns a view that contains only the unique examples generated by the last call to find_duplicates() or find_unique().

If you are analyzing patches, the returned view will be a fiftyone.core.patches.PatchesView.

Returns: a fiftyone.core.view.DatasetView

visualize_duplicates(visualization, backend='plotly', **kwargs)¶

Generates an interactive scatterplot of the results generated by the last call to find_duplicates().

The visualization argument can be any visualization computed on the same dataset (or subset of it) as long as it contains every sample/object in the view whose results you are visualizing.

The points are colored based on the following partition:

“duplicate”: duplicate example

“nearest”: nearest neighbor of a duplicate example

“unique”: the remaining unique examples

Edges are also drawn between each duplicate and its nearest non-duplicate neighbor.

You can attach plots generated by this method to an App session via its fiftyone.core.session.Session.plots attribute, which will automatically sync the session’s view with the currently selected points in the plot.

Parameters

visualization – a fiftyone.brain.visualization.VisualizationResults instance to use to visualize the results
backend ("plotly") – the plotting backend to use. Supported values are ("plotly", "matplotlib")
**kwargs –
keyword arguments for the backend plotting method:
- "plotly" backend: fiftyone.core.plots.plotly.scatterplot()
- "matplotlib" backend: fiftyone.core.plots.matplotlib.scatterplot()

Returns

a fiftyone.core.plots.base.InteractivePlot

visualize_unique(visualization, backend='plotly', **kwargs)¶

Generates an interactive scatterplot of the results generated by the last call to find_unique().

The visualization argument can be any visualization computed on the same dataset (or subset of it) as long as it contains every sample/object in the view whose results you are visualizing.

The points are colored based on the following partition:

“unique”: the unique examples

“other”: the other examples

You can attach plots generated by this method to an App session via its fiftyone.core.session.Session.plots attribute, which will automatically sync the session’s view with the currently selected points in the plot.

Parameters

visualization – a fiftyone.brain.visualization.VisualizationResults instance to use to visualize the results
backend ("plotly") – the plotting backend to use. Supported values are ("plotly", "matplotlib")
**kwargs –
keyword arguments for the backend plotting method:
- "plotly" backend: fiftyone.core.plots.plotly.scatterplot()
- "matplotlib" backend: fiftyone.core.plots.matplotlib.scatterplot()

Returns

a fiftyone.core.plots.base.InteractivePlot