fiftyone.brain.similarity

Similarity interface.

Copyright 2017-2025, Voxel51, Inc.

Functions:

compute_similarity(samples, patches_field, ...)

Implementation of fiftyone.brain.compute_similarity(); see fiftyone/brain/__init__.py for its full documentation.

Classes:

SimilarityConfig([embeddings_field, model, ...])

Similarity configuration.

Similarity(config)

Base class for similarity factories.

SimilarityIndex(samples, config, brain_key)

Base class for similarity indexes.

DuplicatesMixin()

Mixin for SimilarityIndex instances that support duplicate detection operations.

fiftyone.brain.similarity.compute_similarity(samples, patches_field, roi_field, embeddings, brain_key, model, model_kwargs, force_square, alpha, batch_size, num_workers, skip_failures, progress, backend, **kwargs)

Implementation of fiftyone.brain.compute_similarity(); see fiftyone/brain/__init__.py for its full documentation.
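compute_similarity() builds an index that can answer nearest-neighbor queries. The stdlib-only sketch below illustrates what such a query computes, assuming cosine similarity as the metric and a plain {sample_id: embedding} dict as the "index". Both assumptions, and the helper names cosine() and sort_by_similarity(), are hypothetical and do not describe how any FiftyOne backend is implemented.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def sort_by_similarity(embeddings, query, k=None, reverse=False):
    """Hypothetical brute-force query over a {sample_id: vector} dict.

    Returns up to k IDs, most similar first, or least similar first if
    reverse is True.
    """
    scored = [(cosine(vec, query), sid) for sid, vec in embeddings.items()]
    # Descending score means most similar first; reverse flips the order
    scored.sort(key=lambda pair: pair[0], reverse=not reverse)
    ids = [sid for _, sid in scored]
    return ids if k is None else ids[:k]


index = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
print(sort_by_similarity(index, [1.0, 0.0], k=2))  # ['a', 'b']
print(sort_by_similarity(index, [1.0, 0.0], k=1, reverse=True))  # ['c']
```

A real backend would avoid scoring every vector on each query; brute force is used here only because it makes the ranking easy to follow.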

class fiftyone.brain.similarity.SimilarityConfig(embeddings_field=None, model=None, model_kwargs=None, patches_field=None, roi_field=None, supports_prompts=None, **kwargs)

Bases: BrainMethodConfig

Similarity configuration.

Parameters:
  • embeddings_field (None) – the sample field containing the embeddings, if one was provided

  • model (None) – the fiftyone.core.models.Model or name of the zoo model that was used to compute embeddings, if known

  • model_kwargs (None) – a dictionary of optional keyword arguments to pass to the model’s Config when a model name is provided

  • patches_field (None) – the sample field defining the patches being analyzed, if any

  • roi_field (None) – the sample field defining a region of interest within each image to use to compute embeddings, if any

  • supports_prompts (False) – whether this run supports prompt queries

Attributes:

type

The type of run.

method

The name of the similarity backend.

max_k

A maximum k value for nearest neighbor queries, or None if there is no limit.

supports_least_similarity

Whether this backend supports least similarity queries.

supported_aggregations

A tuple of supported values for the aggregation parameter of the backend's sort_by_similarity() and _kneighbors() methods.

cls

The fully-qualified name of this BaseRunConfig class.

run_cls

The BaseRun class associated with this config.

Methods:

load_credentials(**kwargs)

Loads any necessary credentials from the given keyword arguments or the relevant FiftyOne config.

attributes()

Returns the list of class attributes that will be serialized by serialize().

base_config_cls(type)

Returns the config class for the given run type.

build()

Builds the BaseRun instance associated with this config.

builder()

Returns a ConfigBuilder instance for this class.

copy()

Returns a deep copy of the object.

custom_attributes([dynamic, private])

Returns a customizable list of class attributes.

default()

Returns the default config instance.

from_dict(d)

Constructs a BaseRunConfig from a serialized JSON dict representation of it.

from_json(path, *args, **kwargs)

Constructs a Serializable object from a JSON file.

from_kwargs(**kwargs)

Constructs a Config object from keyword arguments.

from_str(s, *args, **kwargs)

Constructs a Serializable object from a JSON string.

get_class_name()

Returns the fully-qualified class name string of this object.

load_default()

Loads the default config instance from file.

parse_array(d, key[, default])

Parses a raw array attribute.

parse_bool(d, key[, default])

Parses a boolean value.

parse_categorical(d, key, choices[, default])

Parses a categorical JSON field, which must take a value from among the given choices.

parse_dict(d, key[, default])

Parses a dictionary attribute.

parse_int(d, key[, default])

Parses an integer attribute.

parse_mutually_exclusive_fields(fields)

Parses a mutually exclusive dictionary of pre-parsed fields, which must contain exactly one field with a truthy value.

parse_number(d, key[, default])

Parses a number attribute.

parse_object(d, key, cls[, default])

Parses an object attribute.

parse_object_array(d, key, cls[, default])

Parses an array of objects.

parse_object_dict(d, key, cls[, default])

Parses a dictionary whose values are objects.

parse_path(d, key[, default])

Parses a path attribute.

parse_raw(d, key[, default])

Parses a raw (arbitrary) JSON field.

parse_string(d, key[, default])

Parses a string attribute.

serialize([reflective])

Serializes the object into a dictionary.

to_str([pretty_print])

Returns a string representation of this object.

validate_all_or_nothing_fields(fields)

Validates a dictionary of pre-parsed fields checking that either all or none of the fields have a truthy value.

write_json(path[, pretty_print])

Serializes the object and writes it to disk.

property type

The type of run.

property method

The name of the similarity backend.

property max_k

A maximum k value for nearest neighbor queries, or None if there is no limit.

property supports_least_similarity

Whether this backend supports least similarity queries.

property supported_aggregations

A tuple of supported values for the aggregation parameter of the backend’s sort_by_similarity() and _kneighbors() methods.
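The max_k, supports_least_similarity, and supported_aggregations properties describe what a backend can do. As a hypothetical sketch, a caller could validate a query against them as follows; BackendCapabilities, check_query(), and the example aggregation value "mean" are illustrative assumptions, not FiftyOne APIs.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class BackendCapabilities:
    """Hypothetical mirror of the SimilarityConfig capability properties."""

    max_k: Optional[int]  # None means there is no limit
    supports_least_similarity: bool
    supported_aggregations: Tuple[str, ...]


def check_query(caps, k=None, reverse=False, aggregation=None):
    """Raises ValueError if a query exceeds the backend's capabilities."""
    if caps.max_k is not None and k is not None and k > caps.max_k:
        raise ValueError(f"k={k} exceeds max_k={caps.max_k}")
    if reverse and not caps.supports_least_similarity:
        raise ValueError("least similarity queries are not supported")
    if aggregation is not None and aggregation not in caps.supported_aggregations:
        raise ValueError(f"unsupported aggregation: {aggregation!r}")


# Example values are made up for illustration
caps = BackendCapabilities(
    max_k=1000, supports_least_similarity=False, supported_aggregations=("mean",)
)
check_query(caps, k=10)  # within limits: no error
try:
    check_query(caps, k=10, reverse=True)
except ValueError as e:
    print(e)  # least similarity queries are not supported
```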

load_credentials(**kwargs)

Loads any necessary credentials from the given keyword arguments or the relevant FiftyOne config.

Parameters:

**kwargs – subclass-specific credentials

attributes()

Returns the list of class attributes that will be serialized by serialize().

Returns:

a list of attributes

static base_config_cls(type)

Returns the config class for the given run type.

Parameters:

type – a BaseRunConfig.type

Returns:

a BaseRunConfig subclass

build()

Builds the BaseRun instance associated with this config.

Returns:

a BaseRun instance

classmethod builder()

Returns a ConfigBuilder instance for this class.

property cls

The fully-qualified name of this BaseRunConfig class.

copy()

Returns a deep copy of the object.

Returns:

a Serializable instance

custom_attributes(dynamic=False, private=False)

Returns a customizable list of class attributes.

By default, all attributes in vars(self) are returned, minus private attributes (those starting with “_”).

Parameters:
  • dynamic – whether to include dynamic properties, e.g., those defined by getter/setter methods or the @property decorator. By default, this is False

  • private – whether to include private properties, i.e., those starting with “_”. By default, this is False

Returns:

a list of class attributes
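To make the dynamic and private flags concrete, here is a hypothetical stdlib re-implementation of the behavior described above. It is not ETA's code, and the Example class exists only for illustration.

```python
import inspect


def custom_attributes(obj, dynamic=False, private=False):
    """Hypothetical sketch of the documented custom_attributes() behavior."""
    attrs = list(vars(obj))  # instance attributes, in definition order
    if dynamic:
        # Add names of @property attributes defined on the class or its bases
        props = [
            name
            for name, _ in inspect.getmembers(
                type(obj), lambda value: isinstance(value, property)
            )
        ]
        attrs.extend(p for p in props if p not in attrs)
    if not private:
        attrs = [a for a in attrs if not a.startswith("_")]
    return attrs


class Example:
    def __init__(self):
        self.method = "sklearn"
        self._cache = None

    @property
    def type(self):
        return "similarity"


ex = Example()
print(custom_attributes(ex))  # ['method']
print(custom_attributes(ex, dynamic=True))  # ['method', 'type']
print(custom_attributes(ex, private=True))  # ['method', '_cache']
```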

classmethod default()

Returns the default config instance.

By default, this method instantiates the class from an empty dictionary, which will only succeed if all attributes are optional. Otherwise, subclasses should override this method to provide the desired default configuration.

classmethod from_dict(d)

Constructs a BaseRunConfig from a serialized JSON dict representation of it.

Parameters:

d – a JSON dict

Returns:

a BaseRunConfig

classmethod from_json(path, *args, **kwargs)

Constructs a Serializable object from a JSON file.

Subclasses may override this method, but, by default, this method simply reads the JSON and calls from_dict(), which subclasses must implement.

Parameters:
  • path – the path to the JSON file on disk

  • *args – optional positional arguments for self.from_dict()

  • **kwargs – optional keyword arguments for self.from_dict()

Returns:

an instance of the Serializable class

classmethod from_kwargs(**kwargs)

Constructs a Config object from keyword arguments.

Parameters:

**kwargs – keyword arguments that define the fields expected by cls

Returns:

an instance of cls

classmethod from_str(s, *args, **kwargs)

Constructs a Serializable object from a JSON string.

Subclasses may override this method, but, by default, this method simply parses the string and calls from_dict(), which subclasses must implement.

Parameters:
  • s – a JSON string representation of a Serializable object

  • *args – optional positional arguments for self.from_dict()

  • **kwargs – optional keyword arguments for self.from_dict()

Returns:

an instance of the Serializable class

classmethod get_class_name()

Returns the fully-qualified class name string of this object.

classmethod load_default()

Loads the default config instance from file.

Subclasses must implement this method if they intend to support default instances.

static parse_array(d, key, default=<eta.core.config.NoDefault object>)

Parses a raw array attribute.

Parameters:
  • d – a JSON dictionary

  • key – the key to parse

  • default – a default list to return if key is not present

Returns:

a list of raw (untouched) values

Raises:

ConfigError – if the field value was the wrong type or no default value was provided and the key was not found in the dictionary

static parse_bool(d, key, default=<eta.core.config.NoDefault object>)

Parses a boolean value.

Parameters:
  • d – a JSON dictionary

  • key – the key to parse

  • default – a default bool to return if key is not present

Returns:

True/False

Raises:

ConfigError – if the field value was the wrong type or no default value was provided and the key was not found in the dictionary

static parse_categorical(d, key, choices, default=<eta.core.config.NoDefault object>)

Parses a categorical JSON field, which must take a value from among the given choices.

Parameters:
  • d – a JSON dictionary

  • key – the key to parse

  • choices – either an iterable of possible values or an enum-like class whose attributes define the possible values

  • default – a default value to return if key is not present

Returns:

the raw (untouched) value of the given field, which is equal to a value from choices

Raises:

ConfigError – if the key was present in the dictionary but its value was not an allowed choice, or if no default value was provided and the key was not found in the dictionary
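A minimal sketch of the parse_categorical() contract follows, assuming a sentinel object stands in for NoDefault and a local ConfigError class stands in for ETA's. The Metric class is a hypothetical enum-like class whose public attribute values are the allowed choices.

```python
_NO_DEFAULT = object()  # stand-in for eta.core.config.NoDefault


class ConfigError(Exception):
    """Local stand-in for ETA's ConfigError."""


def parse_categorical(d, key, choices, default=_NO_DEFAULT):
    if key not in d:
        if default is _NO_DEFAULT:
            raise ConfigError(f"Expected key {key!r} was not found")
        return default
    if isinstance(choices, type):
        # Enum-like class: its public attribute values are the allowed choices
        allowed = {v for k, v in vars(choices).items() if not k.startswith("_")}
    else:
        allowed = set(choices)
    value = d[key]
    if value not in allowed:
        raise ConfigError(f"Unsupported value {value!r} for key {key!r}")
    return value  # returned untouched


class Metric:
    COSINE = "cosine"
    EUCLIDEAN = "euclidean"


print(parse_categorical({"metric": "cosine"}, "metric", Metric))  # cosine
print(parse_categorical({}, "metric", ["cosine"], default="cosine"))  # cosine
```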

static parse_dict(d, key, default=<eta.core.config.NoDefault object>)

Parses a dictionary attribute.

Parameters:
  • d – a JSON dictionary

  • key – the key to parse

  • default – a default dict to return if key is not present

Returns:

a dictionary

Raises:

ConfigError – if the field value was the wrong type or no default value was provided and the key was not found in the dictionary

static parse_int(d, key, default=<eta.core.config.NoDefault object>)

Parses an integer attribute.

Parameters:
  • d – a JSON dictionary

  • key – the key to parse

  • default – a default integer value to return if key is not present

Returns:

an int

Raises:

ConfigError – if the field value was the wrong type or no default value was provided and the key was not found in the dictionary

static parse_mutually_exclusive_fields(fields)

Parses a mutually exclusive dictionary of pre-parsed fields, which must contain exactly one field with a truthy value.

Parameters:

fields – a dictionary of pre-parsed fields

Returns:

the (field, value) that was set

Raises:

ConfigError – if zero or more than one truthy value was found
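The exactly-one-truthy rule can be sketched as follows. This is an illustrative stand-in with a local ConfigError class and hypothetical field names, not the library implementation.

```python
class ConfigError(Exception):
    """Local stand-in for ETA's ConfigError."""


def parse_mutually_exclusive_fields(fields):
    """Returns the single (field, value) pair whose value is truthy."""
    truthy = [(name, value) for name, value in fields.items() if value]
    if len(truthy) != 1:
        raise ConfigError(f"Expected exactly one truthy field; found {len(truthy)}")
    return truthy[0]


# Hypothetical field names: exactly one source may be specified
print(parse_mutually_exclusive_fields({"local_path": None, "remote_url": "s3://bucket/x"}))
# ('remote_url', 's3://bucket/x')
```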

static parse_number(d, key, default=<eta.core.config.NoDefault object>)

Parses a number attribute.

Parameters:
  • d – a JSON dictionary

  • key – the key to parse

  • default – a default numeric value to return if key is not present

Returns:

a number (e.g. int, float)

Raises:

ConfigError – if the field value was the wrong type or no default value was provided and the key was not found in the dictionary

static parse_object(d, key, cls, default=<eta.core.config.NoDefault object>)

Parses an object attribute.

The value of d[key] can be either an instance of cls or a serialized dict from an instance of cls.

Parameters:
  • d – a JSON dictionary

  • key – the key to parse

  • cls – the class of d[key]

  • default – a default cls instance to return if key is not present

Returns:

an instance of cls

Raises:

ConfigError – if the field value was the wrong type or no default value was provided and the key was not found in the dictionary

static parse_object_array(d, key, cls, default=<eta.core.config.NoDefault object>)

Parses an array of objects.

The values in d[key] can be either instances of cls or serialized dicts from instances of cls.

Parameters:
  • d – a JSON dictionary

  • key – the key to parse

  • cls – the class of the elements of list d[key]

  • default – the default list to return if key is not present

Returns:

a list of cls instances

Raises:

ConfigError – if the field value was the wrong type or no default value was provided and the key was not found in the dictionary

static parse_object_dict(d, key, cls, default=<eta.core.config.NoDefault object>)

Parses a dictionary whose values are objects.

The values in d[key] can be either instances of cls or serialized dicts from instances of cls.

Parameters:
  • d – a JSON dictionary

  • key – the key to parse

  • cls – the class of the values of dictionary d[key]

  • default – the default dict of cls instances to return if key is not present

Returns:

a dictionary whose values are cls instances

Raises:

ConfigError – if the field value was the wrong type or no default value was provided and the key was not found in the dictionary

static parse_path(d, key, default=<eta.core.config.NoDefault object>)

Parses a path attribute.

The path is converted to an absolute path if necessary via os.path.abspath(os.path.expanduser(value)).

Parameters:
  • d – a JSON dictionary

  • key – the key to parse

  • default – a default string to return if key is not present

Returns:

a path string

Raises:

ConfigError – if the field value was the wrong type or no default value was provided and the key was not found in the dictionary
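The path normalization described above can be sketched with os.path. The sentinel, the local ConfigError class, and the choice to return the default unchanged are assumptions of this sketch rather than documented behavior.

```python
import os

_NO_DEFAULT = object()  # stand-in for eta.core.config.NoDefault


class ConfigError(Exception):
    """Local stand-in for ETA's ConfigError."""


def parse_path(d, key, default=_NO_DEFAULT):
    if key not in d:
        if default is _NO_DEFAULT:
            raise ConfigError(f"Expected key {key!r} was not found")
        return default  # assumption: the default is returned unchanged
    value = d[key]
    if not isinstance(value, str):
        raise ConfigError(f"Expected {key!r} to be a string; found {type(value)}")
    # Expand "~" and make the path absolute, as documented above
    return os.path.abspath(os.path.expanduser(value))


print(parse_path({"path": "~/data/index.json"}, "path"))
```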

static parse_raw(d, key, default=<eta.core.config.NoDefault object>)

Parses a raw (arbitrary) JSON field.

Parameters:
  • d – a JSON dictionary

  • key – the key to parse

  • default – a default value to return if key is not present

Returns:

the raw (untouched) value of the given field

Raises:

ConfigError – if no default value was provided and the key was not found in the dictionary

static parse_string(d, key, default=<eta.core.config.NoDefault object>)

Parses a string attribute.

Parameters:
  • d – a JSON dictionary

  • key – the key to parse

  • default – a default string to return if key is not present

Returns:

a string

Raises:

ConfigError – if the field value was the wrong type or no default value was provided and the key was not found in the dictionary

property run_cls

The BaseRun class associated with this config.

serialize(reflective=False)

Serializes the object into a dictionary.

Serialization is applied recursively to all attributes in the object, including element-wise serialization of lists and dictionary values.

Parameters:

reflective – whether to include reflective attributes when serializing the object. By default, this is False

Returns:

a JSON dictionary representation of the object

to_str(pretty_print=True, **kwargs)

Returns a string representation of this object.

Parameters:
  • pretty_print – whether to render the JSON in human readable format with newlines and indentations. By default, this is True

  • **kwargs – optional keyword arguments for self.serialize()

Returns:

a string representation of the object
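The serialize(), to_str(), and from_str() methods form a JSON round trip. The hypothetical ExampleConfig class below sketches that round trip with the json module; it is not a FiftyOne or ETA class, and it skips the recursive serialization of nested objects that serialize() performs.

```python
import json


class ExampleConfig:
    """Hypothetical config class illustrating the Serializable round trip."""

    def __init__(self, embeddings_field=None, patches_field=None):
        self.embeddings_field = embeddings_field
        self.patches_field = patches_field

    def serialize(self):
        # Public instance attributes only, like the default custom_attributes()
        return {k: v for k, v in vars(self).items() if not k.startswith("_")}

    def to_str(self, pretty_print=True):
        return json.dumps(self.serialize(), indent=4 if pretty_print else None)

    @classmethod
    def from_dict(cls, d):
        return cls(**d)

    @classmethod
    def from_str(cls, s):
        return cls.from_dict(json.loads(s))


config = ExampleConfig(embeddings_field="clip_embeddings")
s = config.to_str(pretty_print=False)
print(s)  # {"embeddings_field": "clip_embeddings", "patches_field": null}
restored = ExampleConfig.from_str(s)
print(restored.embeddings_field)  # clip_embeddings
```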

static validate_all_or_nothing_fields(fields)

Validates a dictionary of pre-parsed fields checking that either all or none of the fields have a truthy value.

Parameters:

fields – a dictionary of pre-parsed fields

Raises:

ConfigError – if some values are truthy and some are not
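A hypothetical sketch of the all-or-nothing check follows, using a local ConfigError class in place of ETA's; the credential-style field names are illustrative.

```python
class ConfigError(Exception):
    """Local stand-in for ETA's ConfigError."""


def validate_all_or_nothing_fields(fields):
    """Raises ConfigError unless all values are truthy or all are falsy."""
    flags = [bool(value) for value in fields.values()]
    if any(flags) and not all(flags):
        raise ConfigError(f"Expected all or none of {sorted(fields)} to be set")


validate_all_or_nothing_fields({"username": "u", "password": "p"})  # all set
validate_all_or_nothing_fields({"username": None, "password": None})  # none set
try:
    validate_all_or_nothing_fields({"username": "u", "password": None})
except ConfigError as e:
    print(e)  # Expected all or none of ['password', 'username'] to be set
```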

write_json(path, pretty_print=False, **kwargs)

Serializes the object and writes it to disk.

Parameters:
  • path – the output path

  • pretty_print – whether to render the JSON in human readable format with newlines and indentations. By default, this is False

  • **kwargs – optional keyword arguments for self.serialize()

class fiftyone.brain.similarity.Similarity(config)

Bases: BrainMethod

Base class for similarity factories.

Parameters:

config – a SimilarityConfig

Methods:

initialize(samples, brain_key)

Initializes a similarity index.

get_fields(samples, brain_key)

Gets the fields that were involved in the given run.

cleanup(samples, key)

Cleans up the results of the run with the given key from the collection.

delete_run(samples, key[, cleanup])

Deletes the results associated with the given run key from the collection.

delete_runs(samples[, cleanup])

Deletes all runs from the collection.

ensure_requirements()

Ensures that any necessary packages to execute this run are installed.

ensure_usage_requirements()

Ensures that any necessary packages to use existing results for this run are installed.

from_config(config)

Instantiates a Configurable class from a <cls>Config instance.

from_dict(d)

Instantiates a Configurable class from a <cls>Config dict.

from_json(json_path)

Instantiates a Configurable class from a <cls>Config JSON file.

from_kwargs(**kwargs)

Instantiates a Configurable class from keyword arguments defining the attributes of a <cls>Config.

get_run_info(samples, key)

Gets the BaseRunInfo for the given key on the collection.

has_cached_run_results(samples, key)

Determines whether BaseRunResults for the given key are cached on the collection.

list_runs(samples[, type, method])

Returns the list of run keys on the given collection.

load_run_results(samples, key[, cache, ...])

Loads the BaseRunResults for the given key on the collection.

load_run_view(samples, key[, select_fields])

Loads the view on which the specified run was performed.

parse(class_name[, module_name])

Parses a Configurable subclass name string.

register_run(samples, key[, overwrite, cleanup])

Registers a run of this method under the given key on the given collection.

rename(samples, key, new_key)

Performs any necessary operations required to rename this run's key.

run_info_cls()

The BaseRunInfo class associated with this class.

save_run_info(samples, run_info[, ...])

Saves the run information on the collection.

save_run_results(samples, key, run_results)

Saves the run results on the collection.

update_run_config(samples, key, config)

Updates the BaseRunConfig for the given run on the collection.

update_run_key(samples, key, new_key)

Replaces the key for the given run with a new key.

validate(config)

Validates that the given config is an instance of <cls>Config.

validate_run(samples, key[, overwrite])

Validates that the collection can accept this run.

initialize(samples, brain_key)

Initializes a similarity index.

Parameters:
  • samples – a fiftyone.core.collections.SampleCollection

  • brain_key – the brain key

Returns:

a SimilarityIndex

get_fields(samples, brain_key)

Gets the fields that were involved in the given run.

Parameters:
  • samples – a fiftyone.core.collections.SampleCollection

  • brain_key – the brain key

Returns:

a list of fields

cleanup(samples, key)

Cleans up the results of the run with the given key from the collection.

Parameters:
  • samples – a fiftyone.core.collections.SampleCollection

  • key – a run key

classmethod delete_run(samples, key, cleanup=True)

Deletes the results associated with the given run key from the collection.

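Parameters:
  • samples – a fiftyone.core.collections.SampleCollection

  • key – a run key

  • cleanup (True) – whether to execute the run’s BaseRun.cleanup() method
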
classmethod delete_runs(samples, cleanup=True)

Deletes all runs from the collection.

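Parameters:
  • samples – a fiftyone.core.collections.SampleCollection

  • cleanup (True) – whether to execute the BaseRun.cleanup() method of each run
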
ensure_requirements()

Ensures that any necessary packages to execute this run are installed.

Runs should respect fiftyone.config.requirement_error_level when handling errors.

ensure_usage_requirements()

Ensures that any necessary packages to use existing results for this run are installed.

Runs should respect fiftyone.config.requirement_error_level when handling errors.

classmethod from_config(config)

Instantiates a Configurable class from a <cls>Config instance.

classmethod from_dict(d)

Instantiates a Configurable class from a <cls>Config dict.

Parameters:

d – a dict to construct a <cls>Config

Returns:

an instance of cls

classmethod from_json(json_path)

Instantiates a Configurable class from a <cls>Config JSON file.

Parameters:

json_path – path to a JSON file for type <cls>Config

Returns:

an instance of cls

classmethod from_kwargs(**kwargs)

Instantiates a Configurable class from keyword arguments defining the attributes of a <cls>Config.

Parameters:

**kwargs – keyword arguments that define the fields of a <cls>Config dict

Returns:

an instance of cls

classmethod get_run_info(samples, key)

Gets the BaseRunInfo for the given key on the collection.

Parameters:
  • samples – a fiftyone.core.collections.SampleCollection

  • key – a run key

Returns:

a BaseRunInfo

classmethod has_cached_run_results(samples, key)

Determines whether BaseRunResults for the given key are cached on the collection.

Parameters:
  • samples – a fiftyone.core.collections.SampleCollection

  • key – a run key

Returns:

True/False

classmethod list_runs(samples, type=None, method=None, **kwargs)

Returns the list of run keys on the given collection.

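Parameters:
  • samples – a fiftyone.core.collections.SampleCollection

  • type (None) – a specific run type to match

  • method (None) – a specific BaseRunConfig.method string to match

  • **kwargs – optional config parameters to match
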
Returns:

a list of run keys

classmethod load_run_results(samples, key, cache=True, load_view=True, **kwargs)

Loads the BaseRunResults for the given key on the collection.

Parameters:
  • samples – a fiftyone.core.collections.SampleCollection

  • key – a run key

  • cache (True) – whether to cache the results on the collection

  • load_view (True) – whether to load the run view in the results (True) or the full dataset (False)

  • **kwargs – keyword arguments for the run’s BaseRunConfig.load_credentials() method

Returns:

a BaseRunResults, or None if the run did not save results

classmethod load_run_view(samples, key, select_fields=False)

Loads the view on which the specified run was performed.

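Parameters:
  • samples – a fiftyone.core.collections.SampleCollection

  • key – a run key

  • select_fields (False) – whether to exclude fields involved in other runs of the same type
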
Returns:

a fiftyone.core.collections.SampleCollection

static parse(class_name, module_name=None)

Parses a Configurable subclass name string.

Assumes both the Configurable class and the Config class are defined in the same module. The module containing the classes will be loaded if necessary.

Parameters:
  • class_name – a string containing the name of the Configurable class, e.g. “ClassName”, or a fully-qualified class name, e.g. “eta.core.config.ClassName”

  • module_name – a string containing the fully-qualified module name, e.g. “eta.core.config”, or None if class_name includes the module name. Set module_name = __name__ to load a class from the calling module

Returns:

a (cls, config_cls) tuple, where:
  • cls – the Configurable class

  • config_cls – the Config class associated with cls
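The lookup that parse() performs can be sketched with importlib, assuming the "<cls>Config" naming convention stated above. The toy_backends module is a throwaway module registered in sys.modules purely so the example runs without extra files; it and the Toy/ToyConfig classes are hypothetical.

```python
import importlib
import sys
import types


def parse(class_name, module_name=None):
    """Hypothetical sketch: resolve a class and its <cls>Config class."""
    if module_name is None:
        # "package.module.ClassName" -> ("package.module", "ClassName")
        module_name, _, class_name = class_name.rpartition(".")
    module = importlib.import_module(module_name)  # loads it if necessary
    cls = getattr(module, class_name)
    config_cls = getattr(module, class_name + "Config")
    return cls, config_cls


class Toy:
    pass


class ToyConfig:
    pass


# Register a throwaway module so the lookup needs no files on disk
toy_module = types.ModuleType("toy_backends")
toy_module.Toy, toy_module.ToyConfig = Toy, ToyConfig
sys.modules["toy_backends"] = toy_module

cls, config_cls = parse("toy_backends.Toy")
print(cls.__name__, config_cls.__name__)  # Toy ToyConfig
```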

register_run(samples, key, overwrite=True, cleanup=True)

Registers a run of this method under the given key on the given collection.

Parameters:
  • samples – a fiftyone.core.collections.SampleCollection

  • key – a run key

  • overwrite (True) – whether to allow overwriting an existing run of the same type

  • cleanup (True) – whether to execute an existing run’s BaseRun.cleanup() method when overwriting it

rename(samples, key, new_key)

Performs any necessary operations required to rename this run’s key.

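Parameters:
  • samples – a fiftyone.core.collections.SampleCollection

  • key – a run key

  • new_key – a new run key
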
classmethod run_info_cls()

The BaseRunInfo class associated with this class.

classmethod save_run_info(samples, run_info, overwrite=True, cleanup=True)

Saves the run information on the collection.

Parameters:
  • samples – a fiftyone.core.collections.SampleCollection

  • run_info – a BaseRunInfo

  • overwrite (True) – whether to overwrite an existing run with the same key

  • cleanup (True) – whether to execute an existing run’s BaseRun.cleanup() method when overwriting it

classmethod save_run_results(samples, key, run_results, overwrite=True, cache=True)

Saves the run results on the collection.

Parameters:
  • samples – a fiftyone.core.collections.SampleCollection

  • key – a run key

  • run_results – a BaseRunResults, or None

  • overwrite (True) – whether to overwrite an existing result with the same key

  • cache (True) – whether to cache the results on the collection

classmethod update_run_config(samples, key, config)

Updates the BaseRunConfig for the given run on the collection.

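Parameters:
  • samples – a fiftyone.core.collections.SampleCollection

  • key – a run key

  • config – a BaseRunConfig
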
classmethod update_run_key(samples, key, new_key)

Replaces the key for the given run with a new key.

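Parameters:
  • samples – a fiftyone.core.collections.SampleCollection

  • key – a run key

  • new_key – a new run key
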
classmethod validate(config)

Validates that the given config is an instance of <cls>Config.

Raises:

ConfigurableError – if config is not an instance of <cls>Config

validate_run(samples, key, overwrite=True)

Validates that the collection can accept this run.

The run may be invalid if, for example, a run of a different type has already been run under the same key, so overwriting it would make it ambiguous how to clean up the results.

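Parameters:
  • samples – a fiftyone.core.collections.SampleCollection

  • key – a run key

  • overwrite (True) – whether to allow overwriting an existing run of the same type
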
Raises:

ValueError – if the run is invalid
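The rule above, that a key may not be reused by a run of a different type, can be sketched with a plain dict mapping run keys to run types. The registry layout, the type strings, and the treatment of overwrite=False are hypothetical choices for this illustration.

```python
def validate_run(registry, key, run_type, overwrite=True):
    """Hypothetical sketch: raises ValueError if the run would be invalid."""
    existing = registry.get(key)
    if existing is None:
        return  # unused key: always allowed
    if existing != run_type:
        # A different run type under the same key makes cleanup ambiguous
        raise ValueError(
            f"Cannot overwrite {existing!r} run {key!r} with a {run_type!r} run"
        )
    if not overwrite:
        raise ValueError(f"Run {key!r} already exists")


runs = {"img_sim": "similarity"}
validate_run(runs, "img_sim", "similarity")  # same type: allowed
validate_run(runs, "new_key", "visualization")  # unused key: allowed
try:
    validate_run(runs, "img_sim", "visualization")
except ValueError as e:
    print(e)  # Cannot overwrite 'similarity' run 'img_sim' with a 'visualization' run
```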

class fiftyone.brain.similarity.SimilarityIndex(samples, config, brain_key, backend=None)

Bases: BrainResults

Base class for similarity indexes.

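Parameters:
  • samples – the fiftyone.core.collections.SampleCollection used

  • config – the SimilarityConfig used

  • brain_key – the brain key

  • backend (None) – a Similarity backend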

Attributes:

config

The SimilarityConfig for these results.

is_external

Whether this similarity index manages its own embeddings (True) or loads them directly from the embeddings_field of the dataset (False).

sample_ids

The sample IDs of the full index, or None if not supported.

label_ids

The label IDs of the full index, or None if not applicable or not supported.

total_index_size

The total number of data points in the index.

has_view

Whether the index is currently restricted to a view.

view

The fiftyone.core.collections.SampleCollection against which results are currently being generated.

current_sample_ids

The sample IDs of the currently active data points in the index.

current_label_ids

The label IDs of the currently active data points in the index, or None if not applicable.

index_size

The number of active data points in the index.

missing_size

The total number of data points in view() that are missing from this index, or None if unknown.

backend

The BaseRun for these results.

cls

The fully-qualified name of this BaseRunResults class.

key

The run key for these results.

samples

The fiftyone.core.collections.SampleCollection associated with these results.

Methods:

add_to_index(embeddings, sample_ids[, ...])

Adds the given embeddings to the index.

remove_from_index([sample_ids, label_ids, ...])

Removes the specified embeddings from the index.

get_embeddings([sample_ids, label_ids, ...])

Retrieves the embeddings for the given IDs from the index.

use_view(samples[, allow_missing, warn_missing])

Restricts the index to the provided view.

clear_view()

Clears the view set by use_view(), if any.

reload()

Reloads the index for the current view.

cleanup()

Deletes the similarity index from the backend.

values(path_or_expr)

Extracts a flat list of values from the given field or expression corresponding to the current view().

sort_by_similarity(query[, k, reverse, ...])

Returns a view that sorts the samples/labels in view() by similarity to the specified query.

get_model()

Returns the stored model for this index.

compute_embeddings(samples[, model, ...])

Computes embeddings for the given samples using this backend's model.

attributes()

Returns the list of class attributes that will be serialized by serialize().

base_results_cls(type)

Returns the results class for the given run type.

copy()

Returns a deep copy of the object.

custom_attributes([dynamic, private])

Returns a customizable list of class attributes.

from_dict(d, samples, config, key)

Builds a BaseRunResults from a JSON dict representation of it.

from_json(path, *args, **kwargs)

Constructs a Serializable object from a JSON file.

from_str(s, *args, **kwargs)

Constructs a Serializable object from a JSON string.

get_class_name()

Returns the fully-qualified class name string of this object.

save()

Saves the results to the database.

save_config()

Saves the config of these results to the database.

serialize([reflective])

Serializes the object into a dictionary.

to_str([pretty_print])

Returns a string representation of this object.

write_json(path[, pretty_print])

Serializes the object and writes it to disk.

property config

The SimilarityConfig for these results.

property is_external

Whether this similarity index manages its own embeddings (True) or loads them directly from the embeddings_field of the dataset (False).

property sample_ids

The sample IDs of the full index, or None if not supported.

property label_ids

The label IDs of the full index, or None if not applicable or not supported.

property total_index_size

The total number of data points in the index.

If use_view() has been called to restrict the index, this value may be larger than the current index_size().

property has_view

Whether the index is currently restricted to a view.

Use use_view() to restrict the index to a view, and use clear_view() to reset to the full index.

property view

The fiftyone.core.collections.SampleCollection against which results are currently being generated.

If use_view() has been called, this view may be different from the collection on which the full index was generated.

property current_sample_ids

The sample IDs of the currently active data points in the index.

If use_view() has been called, this may be a subset of the full index.

If the index does not support full sample ID lists (i.e., if sample_ids() is None), then this will be all sample IDs in the current view() regardless of whether all samples are indexed.

property current_label_ids

The label IDs of the currently active data points in the index, or None if not applicable.

If use_view() has been called, this may be a subset of the full index.

If the index does not support full label ID lists (i.e., if label_ids() is None), then this will be all label IDs in the current view() regardless of whether all labels are indexed.

property index_size

The number of active data points in the index.

If use_view() has been called to restrict the index, this property will reflect the size of the active index.

property missing_size

The total number of data points in view() that are missing from this index, or None if unknown.

This property is only applicable when use_view() has been called, and it will be None if no data points are missing or when the backend does not support it.

add_to_index(embeddings, sample_ids, label_ids=None, overwrite=True, allow_existing=True, warn_existing=False, reload=True)

Adds the given embeddings to the index.

Parameters:
  • embeddings – a num_embeddings x num_dims array of embeddings

  • sample_ids – a num_embeddings array of sample IDs

  • label_ids (None) – a num_embeddings array of label IDs, if applicable

  • overwrite (True) – whether to replace (True) or ignore (False) existing embeddings with the same sample/label IDs

  • allow_existing (True) – whether to ignore (True) or raise an error (False) when overwrite is False and a provided ID already exists in the index

  • warn_existing (False) – whether to log a warning if an embedding is not added to the index because its ID already exists

  • reload (True) – whether to call reload() to refresh the current view after the update
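To clarify how overwrite, allow_existing, and warn_existing interact, here is a hypothetical sketch that stores embeddings in a dict keyed by sample ID. Real backends store embeddings differently, and label IDs and reload are omitted.

```python
import logging

logger = logging.getLogger(__name__)


def add_to_index(index, embeddings, sample_ids, overwrite=True,
                 allow_existing=True, warn_existing=False):
    """Hypothetical dict-based sketch of the add_to_index() flags."""
    existing = [sid for sid in sample_ids if sid in index]
    # Validate before mutating so an error leaves the index untouched
    if existing and not overwrite and not allow_existing:
        raise ValueError(f"IDs already in the index: {existing}")

    for embedding, sid in zip(embeddings, sample_ids):
        if sid in index and not overwrite:
            if warn_existing:
                logger.warning("Skipping ID %r; it is already indexed", sid)
            continue  # keep the existing embedding
        index[sid] = list(embedding)


index = {"s1": [0.0, 1.0]}
add_to_index(index, [[1.0, 0.0], [0.5, 0.5]], ["s1", "s2"], overwrite=False)
print(index)  # {'s1': [0.0, 1.0], 's2': [0.5, 0.5]}
```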

remove_from_index(sample_ids=None, label_ids=None, allow_missing=True, warn_missing=False, reload=True)

Removes the specified embeddings from the index.

Parameters:
  • sample_ids (None) – an array of sample IDs

  • label_ids (None) – an array of label IDs, if applicable

  • allow_missing (True) – whether to allow the index to not contain IDs that you provide (True) or whether to raise an error in this case (False)

  • warn_missing (False) – whether to log a warning if the index does not contain IDs that you provide

  • reload (True) – whether to call reload() to refresh the current view after the update
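A hypothetical sketch of the allow_missing and warn_missing semantics, using a plain dict keyed by sample ID as a stand-in for the index (label IDs and reload are omitted):

```python
import logging

logger = logging.getLogger(__name__)


def remove_from_index(index, sample_ids, allow_missing=True, warn_missing=False):
    """Hypothetical dict-based sketch of the remove_from_index() flags."""
    missing = [sid for sid in sample_ids if sid not in index]
    if missing:
        if not allow_missing:
            raise ValueError(f"IDs not in the index: {missing}")
        if warn_missing:
            logger.warning("Ignoring %d IDs that are not indexed", len(missing))
    for sid in sample_ids:
        index.pop(sid, None)  # no-op for IDs that are not indexed


index = {"s1": [0.0, 1.0], "s2": [1.0, 0.0]}
remove_from_index(index, ["s1", "s3"])  # "s3" is silently ignored
print(sorted(index))  # ['s2']
```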

get_embeddings(sample_ids=None, label_ids=None, allow_missing=True, warn_missing=False)

Retrieves the embeddings for the given IDs from the index.

If no IDs are provided, the entire index is returned.

Parameters:
  • sample_ids (None) – a sample ID or list of sample IDs for which to retrieve embeddings

  • label_ids (None) – a label ID or list of label IDs for which to retrieve embeddings

  • allow_missing (True) – whether to allow the index to not contain IDs that you provide (True) or whether to raise an error in this case (False)

  • warn_missing (False) – whether to log a warning if the index does not contain IDs that you provide

Returns:

a tuple of:

  • a num_embeddings x num_dims array of embeddings

  • a num_embeddings array of sample IDs

  • a num_embeddings array of label IDs, if applicable, or else None
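A sketch of the documented return shape, using a hypothetical dict-backed index: a single ID or a list of IDs is accepted, omitting the IDs returns everything, missing IDs are dropped when allow_missing is True, and the label IDs slot is None for a sample-level index.

```python
import numpy as np


def get_embeddings(index, sample_ids=None, allow_missing=True):
    """Hypothetical helper mirroring the documented (embeddings, sample_ids, label_ids) tuple."""
    if sample_ids is None:
        sample_ids = list(index)  # no IDs provided: return the entire index
    elif isinstance(sample_ids, str):
        sample_ids = [sample_ids]  # a single ID is also accepted
    found = [sid for sid in sample_ids if sid in index]
    if len(found) < len(sample_ids) and not allow_missing:
        raise ValueError("Some of the provided IDs are not in the index")
    embeddings = np.array([index[sid] for sid in found], dtype=float)
    return embeddings, np.array(found), None  # no label IDs in a sample-level index


index = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
embeddings, sample_ids, label_ids = get_embeddings(index, ["b", "missing"])
```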

use_view(samples, allow_missing=True, warn_missing=False)#

Restricts the index to the provided view.

Subsequent calls to methods on this instance will only contain results from the specified view rather than the full index.

Use clear_view() to reset to the full index. Or, equivalently, use the context manager interface as demonstrated below to automatically reset the view when the context exits.

Example usage:

import fiftyone as fo
import fiftyone.brain as fob
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart")

results = fob.compute_similarity(dataset)
print(results.index_size)  # 200

# visualize_unique() requires a visualization that covers the view
viz_results = fob.compute_visualization(dataset)

view = dataset.take(50)

with results.use_view(view):
    print(results.index_size)  # 50

    results.find_unique(10)
    print(results.unique_ids)

    plot = results.visualize_unique(viz_results)
    plot.show()

Parameters:
  • samples – a fiftyone.core.collections.SampleCollection

  • allow_missing (True) – whether to allow the provided collection to contain data points that this index does not contain (True) or whether to raise an error in this case (False)

  • warn_missing (False) – whether to log a warning if the provided collection contains data points that this index does not contain

Returns:

self

clear_view()#

Clears the view set by use_view(), if any.

Subsequent operations will be performed on the full index.
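The use_view()/clear_view() pattern works as a context manager because use_view() returns self. The class below is a hypothetical sketch of that pattern only; it is not SimilarityIndex, and it restricts a plain list of IDs rather than a real index.

```python
class ToyViewableIndex:
    """Hypothetical sketch of the use_view()/clear_view() pattern."""

    def __init__(self, ids):
        self._all_ids = list(ids)
        self._view_ids = None  # None means "use the full index"

    @property
    def current_ids(self):
        if self._view_ids is None:
            return self._all_ids
        return [i for i in self._all_ids if i in self._view_ids]

    @property
    def index_size(self):
        return len(self.current_ids)

    def use_view(self, ids):
        self._view_ids = set(ids)
        return self  # returning self enables `with index.use_view(...):`

    def clear_view(self):
        self._view_ids = None

    def __enter__(self):
        return self

    def __exit__(self, *args):
        self.clear_view()  # reset to the full index when the block exits


index = ToyViewableIndex(range(200))
with index.use_view(range(50)):
    inside = index.index_size
outside = index.index_size
```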

reload()#

Reloads the index for the current view.

Subclasses may override this method, but by default this method simply passes the current view() back into use_view(), which updates the index’s current ID set based on any changes to the view since the index was last loaded.

cleanup()#

Deletes the similarity index from the backend.

values(path_or_expr)#

Extracts a flat list of values from the given field or expression corresponding to the current view().

This method always returns values in the same order as current_sample_ids() and current_label_ids().

Parameters:

path_or_expr –

the values to extract, which can be:

Returns:

a list of values

sort_by_similarity(query, k=None, reverse=False, aggregation='mean', dist_field=None, _mongo=False)#

Returns a view that sorts the samples/labels in view() by similarity to the specified query.

When querying by IDs, the query can be any ID(s) in the full index of this instance, even if the current view() contains a subset of the full index.

Parameters:
  • query –

    the query, which can be any of the following:

    • an ID or iterable of IDs

    • a num_dims vector or num_queries x num_dims array of vectors

    • a prompt or iterable of prompts (if supported by the index)

  • k (None) – the number of matches to return. Some backends may support None, in which case all samples will be sorted

  • reverse (False) – whether to sort by least similarity (True) or greatest similarity (False). Some backends may not support least similarity

  • aggregation ("mean") – the aggregation method to use when multiple queries are provided. The default is "mean", which means that the query vectors are averaged prior to searching. Some backends may support additional options

  • dist_field (None) – the name of a float field in which to store the distance of each example to the specified query. The field is created if necessary

Returns:

a fiftyone.core.view.DatasetView
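The query semantics above (an aggregated query vector, a top-k cut, and reverse for least similarity) can be sketched with NumPy. This assumes cosine similarity purely for illustration; actual backends may use other metrics, report distances rather than similarities, and support prompts or other aggregations, none of which is shown here. The function name and arguments are hypothetical.

```python
import numpy as np


def sort_by_similarity(embeddings, ids, query, k=None, reverse=False):
    """Hypothetical sketch: ranks IDs by cosine similarity to a query.

    `query` is a num_dims vector or a num_queries x num_dims array; multiple
    query vectors are averaged first (the "mean" aggregation).
    """
    X = np.asarray(embeddings, dtype=float)
    q = np.atleast_2d(np.asarray(query, dtype=float)).mean(axis=0)
    # Cosine similarity between every embedding and the aggregated query
    sims = X @ q / (np.linalg.norm(X, axis=1) * np.linalg.norm(q))
    order = np.argsort(sims)  # ascending, i.e. least similar first
    if not reverse:
        order = order[::-1]  # most similar first
    if k is not None:
        order = order[:k]
    return [ids[i] for i in order], sims[order]


embeddings = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
ids = ["a", "b", "c"]
top_ids, scores = sort_by_similarity(embeddings, ids, [1.0, 0.1], k=2)
```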

get_model()#

Returns the stored model for this index.

Returns:

a fiftyone.core.models.Model

compute_embeddings(samples, model=None, batch_size=None, num_workers=None, skip_failures=True, skip_existing=False, warn_existing=False, force_square=False, alpha=None, progress=None)#

Computes embeddings for the given samples using this backend’s model.

Parameters:
  • samples – a fiftyone.core.collections.SampleCollection

  • model (None) – a fiftyone.core.models.Model to apply. If not provided, these results must have been created with a stored model, which will be used by default

  • batch_size (None) – an optional batch size to use when computing embeddings. Only applicable when a model is provided

  • num_workers (None) – the number of workers to use when loading images. Only applicable when a Torch-based model is being used to compute embeddings

  • skip_failures (True) – whether to gracefully continue without raising an error if embeddings cannot be generated for a sample

  • skip_existing (False) – whether to skip generating embeddings for sample/label IDs that are already in the index

  • warn_existing (False) – whether to log a warning if any IDs already exist in the index

  • force_square (False) – whether to minimally manipulate the patch bounding boxes into squares prior to extraction. Only applicable when a model and patches_field are specified

  • alpha (None) – an optional expansion/contraction to apply to the patches before extracting them, in [-1, inf). If provided, the length and width of the box are expanded (or contracted, when alpha < 0) by (100 * alpha)%. For example, set alpha = 0.1 to expand the boxes by 10%, and set alpha = -0.1 to contract the boxes by 10%. Only applicable when a model and patches_field are specified

  • progress (None) – whether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead

Returns:

a tuple of:

  • a num_embeddings x num_dims array of embeddings

  • a num_embeddings array of sample IDs

  • a num_embeddings array of label IDs, if applicable, or else None
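The alpha and force_square arithmetic for patches can be worked through on an [x, y, width, height] box. This is a sketch under stated assumptions: the box is scaled about its center, squaring grows the shorter side to match the longer one, squaring happens before alpha is applied, and no clipping to image bounds is done. The adjust_box() helper is hypothetical, and FiftyOne's exact order of operations and clipping may differ.

```python
def adjust_box(box, alpha=None, force_square=False):
    """Hypothetical sketch of alpha scaling / squaring of an [x, y, w, h] box.

    Assumes the box is scaled about its center and that squaring grows the
    shorter side to match the longer one. Coordinates are not clipped.
    """
    x, y, w, h = box
    cx, cy = x + w / 2, y + h / 2  # box center stays fixed
    if force_square:
        w = h = max(w, h)
    if alpha is not None:
        if alpha < -1:
            raise ValueError("alpha must be in [-1, inf)")
        # Expand (alpha > 0) or contract (alpha < 0) each side by (100 * alpha)%
        w *= 1 + alpha
        h *= 1 + alpha
    return [cx - w / 2, cy - h / 2, w, h]


expanded = adjust_box([100, 100, 100, 50], alpha=0.5)
squared = adjust_box([100, 100, 100, 50], force_square=True)
```

For alpha = 0.5, a 100 x 50 box centered at (150, 125) becomes 150 x 75 with the same center, so its top-left corner moves to (75, 87.5).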

attributes()#

Returns the list of class attributes that will be serialized by serialize().

Returns:

a list of attributes

property backend#

The BaseRun for these results.

static base_results_cls(type)#

Returns the results class for the given run type.

Parameters:

type – a BaseRunConfig.type

Returns:

a BaseRunResults subclass

property cls#

The fully-qualified name of this BaseRunResults class.

copy()#

Returns a deep copy of the object.

Returns:

a Serializable instance

custom_attributes(dynamic=False, private=False)#

Returns a customizable list of class attributes.

By default, all attributes in vars(self) are returned, minus private attributes (those starting with “_”).

Parameters:
  • dynamic – whether to include dynamic properties, e.g., those defined by getter/setter methods or the @property decorator. By default, this is False

  • private – whether to include private properties, i.e., those starting with “_”. By default, this is False

Returns:

a list of class attributes
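The dynamic and private flags can be illustrated with a standalone function. This is a hypothetical sketch of the described filtering, not the library's code; it only inspects properties defined directly on the object's own class, not inherited ones.

```python
class Example:
    """Hypothetical class used to illustrate custom_attributes()."""

    def __init__(self):
        self.name = "demo"
        self._cache = {}

    @property
    def size(self):
        return 3


def custom_attributes(obj, dynamic=False, private=False):
    """Sketch: lists attributes as described for custom_attributes()."""
    attrs = list(vars(obj))  # instance attributes, in definition order
    if dynamic:
        # Properties live on the class, so they are not in vars(obj)
        attrs += [
            name for name, value in vars(type(obj)).items()
            if isinstance(value, property)
        ]
    if not private:
        attrs = [a for a in attrs if not a.startswith("_")]
    return attrs
```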

classmethod from_dict(d, samples, config, key)#

Builds a BaseRunResults from a JSON dict representation of it.

Parameters:
Returns:

a BaseRunResults

classmethod from_json(path, *args, **kwargs)#

Constructs a Serializable object from a JSON file.

Subclasses may override this method, but, by default, this method simply reads the JSON and calls from_dict(), which subclasses must implement.

Parameters:
  • path – the path to the JSON file on disk

  • *args – optional positional arguments for self.from_dict()

  • **kwargs – optional keyword arguments for self.from_dict()

Returns:

an instance of the Serializable class

classmethod from_str(s, *args, **kwargs)#

Constructs a Serializable object from a JSON string.

Subclasses may override this method, but, by default, this method simply parses the string and calls from_dict(), which subclasses must implement.

Parameters:
  • s – a JSON string representation of a Serializable object

  • *args – optional positional arguments for self.from_dict()

  • **kwargs – optional keyword arguments for self.from_dict()

Returns:

an instance of the Serializable class

classmethod get_class_name()#

Returns the fully-qualified class name string of this object.

property key#

The run key for these results.

property samples#

The fiftyone.core.collections.SampleCollection associated with these results.

save()#

Saves the results to the database.

save_config()#

Saves the config of these results to the database.

serialize(reflective=False)#

Serializes the object into a dictionary.

Serialization is applied recursively to all attributes in the object, including element-wise serialization of lists and dictionary values.

Parameters:

reflective – whether to include reflective attributes when serializing the object. By default, this is False

Returns:

a JSON dictionary representation of the object
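Recursive serialization, with element-wise handling of lists and dict values, can be sketched as follows. The serialize() function and Point class are hypothetical; the final json.dumps() call with indent plays the role of pretty_print in to_str().

```python
import json


def serialize(value):
    """Hypothetical sketch of recursive serialization into JSON-compatible values."""
    if hasattr(value, "serialize"):
        return value.serialize()  # delegate to nested serializable objects
    if isinstance(value, (list, tuple)):
        return [serialize(v) for v in value]  # element-wise for lists
    if isinstance(value, dict):
        return {k: serialize(v) for k, v in value.items()}  # dict values
    return value  # primitives are returned unchanged


class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def serialize(self):
        return {"x": self.x, "y": self.y}


data = serialize({"points": [Point(1, 2), Point(3, 4)], "name": "demo"})
text = json.dumps(data, indent=4)  # human-readable, like pretty_print=True
```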

to_str(pretty_print=True, **kwargs)#

Returns a string representation of this object.

Parameters:
  • pretty_print – whether to render the JSON in human readable format with newlines and indentations. By default, this is True

  • **kwargs – optional keyword arguments for self.serialize()

Returns:

a string representation of the object

write_json(path, pretty_print=False, **kwargs)#

Serializes the object and writes it to disk.

Parameters:
  • path – the output path

  • pretty_print – whether to render the JSON in human readable format with newlines and indentations. By default, this is False

  • **kwargs – optional keyword arguments for self.serialize()

class fiftyone.brain.similarity.DuplicatesMixin#

Bases: object

Mixin for SimilarityIndex instances that support duplicate detection operations.

Similarity backends can expose this mixin simply by implementing _radius_neighbors().

Attributes:

thresh

The threshold used by the last call to find_duplicates() or find_unique().

unique_ids

A list of unique IDs from the last call to find_duplicates() or find_unique().

duplicate_ids

A list of duplicate IDs from the last call to find_duplicates() or find_unique().

neighbors_map

A dictionary mapping IDs to lists of (dup_id, dist) tuples from the last call to find_duplicates().

Methods:

find_duplicates([thresh, fraction])

Queries the index to find near-duplicate examples based on the provided parameters.

find_unique(count)

Queries the index to select a subset of examples of the specified size that are maximally unique with respect to each other.

plot_distances([bins, log, backend])

Plots a histogram of the distance between each example and its nearest neighbor.

duplicates_view([type_field, id_field, ...])

Returns a view that contains only the duplicate examples and their corresponding nearest non-duplicate examples generated by the last call to find_duplicates().

unique_view()

Returns a view that contains only the unique examples generated by the last call to find_duplicates() or find_unique().

visualize_duplicates(visualization[, backend])

Generates an interactive scatterplot of the results generated by the last call to find_duplicates().

visualize_unique(visualization[, backend])

Generates an interactive scatterplot of the results generated by the last call to find_unique().

property thresh#

The threshold used by the last call to find_duplicates() or find_unique().

property unique_ids#

A list of unique IDs from the last call to find_duplicates() or find_unique().

property duplicate_ids#

A list of duplicate IDs from the last call to find_duplicates() or find_unique().

property neighbors_map#

A dictionary mapping IDs to lists of (dup_id, dist) tuples from the last call to find_duplicates().

find_duplicates(thresh=None, fraction=None)#

Queries the index to find near-duplicate examples based on the provided parameters.

Calling this method populates the unique_ids, duplicate_ids, neighbors_map, and thresh properties of this object with the results of the query.

Use duplicates_view() and visualize_duplicates() to analyze the results generated by this method.

Parameters:
  • thresh (None) – a distance threshold to use to determine duplicates. If specified, the non-duplicate set will be the (approximately) largest set such that all pairwise distances between non-duplicate examples are greater than this threshold

  • fraction (None) – a desired fraction of images/patches to tag as duplicates, in [0, 1]. In this case thresh is automatically tuned to achieve the desired fraction of duplicates
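The thresh semantics (a non-duplicate set whose pairwise distances all exceed the threshold) can be approximated with a simple greedy pass. This is a hypothetical sketch using Euclidean distance. It is not necessarily the algorithm FiftyOne uses, which relies on each backend's _radius_neighbors(), and it does not show how fraction tunes thresh.

```python
import numpy as np


def find_duplicates(embeddings, ids, thresh):
    """Hypothetical greedy sketch of threshold-based duplicate detection.

    Each example is kept if it is farther than `thresh` from every example
    kept so far; otherwise it is marked a duplicate of its nearest kept one.
    """
    X = np.asarray(embeddings, dtype=float)
    unique_ids, duplicate_ids = [], []
    kept = []  # row indices of the non-duplicate examples
    neighbors_map = {}  # non-duplicate ID -> list of (dup_id, dist) tuples
    for i, sample_id in enumerate(ids):
        if kept:
            dists = np.linalg.norm(X[kept] - X[i], axis=1)
            j = int(np.argmin(dists))
            if dists[j] <= thresh:
                duplicate_ids.append(sample_id)
                neighbors_map.setdefault(ids[kept[j]], []).append(
                    (sample_id, float(dists[j]))
                )
                continue
        kept.append(i)
        unique_ids.append(sample_id)
    return unique_ids, duplicate_ids, neighbors_map


embeddings = [[0, 0], [0.05, 0], [1, 0], [1, 0.02], [5, 5]]
unique_ids, duplicate_ids, neighbors_map = find_duplicates(
    embeddings, ["a", "b", "c", "d", "e"], thresh=0.1
)
```

Because each kept example is farther than thresh from every previously kept one, all pairwise distances among the kept examples exceed thresh, matching the documented guarantee.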

find_unique(count)#

Queries the index to select a subset of examples of the specified size that are maximally unique with respect to each other.

Calling this method populates the unique_ids, duplicate_ids, and thresh properties of this object with the results of the query.

Use unique_view() and visualize_unique() to analyze the results generated by this method.

Parameters:

count – the desired number of unique examples
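One common way to pick a subset of examples that are maximally spread out is greedy farthest-point sampling. This technique is swapped in for illustration and is not claimed to be what find_unique() does internally. The function below is hypothetical, uses Euclidean distance, and assumes count >= 1.

```python
import numpy as np


def find_unique(embeddings, ids, count):
    """Greedy farthest-point sampling (illustrative, not FiftyOne's method).

    Starts from the first example, then repeatedly adds the example whose
    distance to its nearest already-selected example is largest.
    Assumes count >= 1.
    """
    X = np.asarray(embeddings, dtype=float)
    selected = [0]
    # Distance from every example to its nearest selected example
    min_dists = np.linalg.norm(X - X[0], axis=1)
    min_dists[0] = -np.inf  # never re-select a chosen example
    while len(selected) < min(count, len(X)):
        nxt = int(np.argmax(min_dists))
        selected.append(nxt)
        min_dists = np.minimum(min_dists, np.linalg.norm(X - X[nxt], axis=1))
        min_dists[nxt] = -np.inf
    unique_ids = [ids[i] for i in selected]
    chosen = set(unique_ids)
    duplicate_ids = [i for i in ids if i not in chosen]
    return unique_ids, duplicate_ids


points = [[0, 0], [0.1, 0], [10, 0], [0, 10], [10, 10]]
unique_ids, duplicate_ids = find_unique(points, ["a", "b", "c", "d", "e"], 3)
```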

plot_distances(bins=100, log=False, backend='plotly', **kwargs)#

Plots a histogram of the distance between each example and its nearest neighbor.

If find_duplicates() or find_unique() has been executed, the threshold used is also indicated on the plot.

Parameters:
  • bins (100) – the number of bins to use

  • log (False) – whether to use a log scale y-axis

  • backend ("plotly") – the plotting backend to use. Supported values are ("plotly", "matplotlib")

  • **kwargs – keyword arguments for the backend plotting method

Returns:

Return type:

one of the following
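The quantity being plotted (each example's distance to its nearest neighbor) can be computed by brute force and binned with numpy.histogram(). This is a hypothetical sketch assuming Euclidean distance and small inputs, since it builds the full pairwise distance matrix; a plotting backend would then draw counts against edges.

```python
import numpy as np


def nearest_neighbor_distances(embeddings):
    """Distance from each example to its nearest other example (brute force)."""
    X = np.asarray(embeddings, dtype=float)
    # Pairwise Euclidean distances, num_embeddings x num_embeddings
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)  # ignore each example's distance to itself
    return dists.min(axis=1)


nn_dists = nearest_neighbor_distances([[0, 0], [0, 1], [3, 0], [3, 2]])
counts, edges = np.histogram(nn_dists, bins=100)  # bins=100 mirrors the default
```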

duplicates_view(type_field=None, id_field=None, dist_field=None, sort_by='distance', reverse=False)#

Returns a view that contains only the duplicate examples and their corresponding nearest non-duplicate examples generated by the last call to find_duplicates().

If you are analyzing patches, the returned view will be a fiftyone.core.patches.PatchesView.

The examples are organized so that each non-duplicate is immediately followed by all duplicate(s) that are nearest to it.

Parameters:
  • type_field (None) – the name of a string field in which to store "nearest" and "duplicate" labels. The field is created if necessary

  • id_field (None) – the name of a string field in which to store the ID of the nearest non-duplicate for each example in the view. The field is created if necessary

  • dist_field (None) – the name of a float field in which to store the distance of each example to its nearest non-duplicate example. The field is created if necessary

  • sort_by ("distance") –

    specifies how to sort the groups of duplicate examples. The supported values are:

    • "distance": sort the groups by the distance between the non-duplicate and its (nearest, if multiple) duplicate

    • "count": sort the groups by the number of duplicate examples

  • reverse (False) – whether to sort in descending order

Returns:

a fiftyone.core.view.DatasetView
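The documented ordering (each non-duplicate immediately followed by its duplicates, with groups sorted by "distance" or "count") can be sketched from a neighbors_map. The function is hypothetical. Recording each non-duplicate's own nearest ID as itself with distance 0.0 is an assumption made for this sketch, not documented behavior.

```python
def _min_dist(item):
    # Distance between a non-duplicate and its nearest duplicate
    return min(dist for _, dist in item[1])


def _count(item):
    # Number of duplicates in the group
    return len(item[1])


def order_duplicate_groups(neighbors_map, sort_by="distance", reverse=False):
    """Hypothetical sketch of the documented duplicates_view() ordering.

    Returns (id, type, nearest_id, dist) rows in which each non-duplicate is
    immediately followed by its duplicates.
    """
    keys = {"distance": _min_dist, "count": _count}
    if sort_by not in keys:
        raise ValueError(f"Unsupported sort_by value '{sort_by}'")

    rows = []
    groups = sorted(neighbors_map.items(), key=keys[sort_by], reverse=reverse)
    for nearest_id, dups in groups:
        rows.append((nearest_id, "nearest", nearest_id, 0.0))  # assumed values
        for dup_id, dist in sorted(dups, key=lambda d: d[1]):
            rows.append((dup_id, "duplicate", nearest_id, dist))
    return rows


neighbors_map = {"a": [("b", 0.3)], "c": [("d", 0.1), ("e", 0.2)]}
rows = order_duplicate_groups(neighbors_map)
```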

unique_view()#

Returns a view that contains only the unique examples generated by the last call to find_duplicates() or find_unique().

If you are analyzing patches, the returned view will be a fiftyone.core.patches.PatchesView.

Returns:

a fiftyone.core.view.DatasetView

visualize_duplicates(visualization, backend='plotly', **kwargs)#

Generates an interactive scatterplot of the results generated by the last call to find_duplicates().

The visualization argument can be any visualization computed on the same dataset (or subset of it) as long as it contains every sample/object in the view whose results you are visualizing.

The points are colored based on the following partition:

  • “duplicate”: duplicate example

  • “nearest”: nearest neighbor of a duplicate example

  • “unique”: the remaining unique examples

Edges are also drawn between each duplicate and its nearest non-duplicate neighbor.

You can attach plots generated by this method to an App session via its fiftyone.core.session.Session.plots attribute, which will automatically sync the session’s view with the currently selected points in the plot.

Parameters:
Returns:

a fiftyone.core.plots.base.InteractivePlot
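The color partition and edges described for visualize_duplicates() can be derived from a neighbors_map as follows. This hypothetical sketch only computes the labels and edge pairs; it does not build the interactive plot.

```python
def partition_labels(all_ids, neighbors_map):
    """Sketch: assigns the documented color categories and duplicate edges."""
    nearest = set(neighbors_map)
    duplicates = {dup_id for dups in neighbors_map.values() for dup_id, _ in dups}
    labels = {}
    for sample_id in all_ids:
        if sample_id in duplicates:
            labels[sample_id] = "duplicate"
        elif sample_id in nearest:
            labels[sample_id] = "nearest"
        else:
            labels[sample_id] = "unique"
    # One edge between each duplicate and its nearest non-duplicate neighbor
    edges = [
        (dup_id, nearest_id)
        for nearest_id, dups in neighbors_map.items()
        for dup_id, _ in dups
    ]
    return labels, edges


labels, edges = partition_labels(["a", "b", "c"], {"a": [("b", 0.05)]})
```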

visualize_unique(visualization, backend='plotly', **kwargs)#

Generates an interactive scatterplot of the results generated by the last call to find_unique().

The visualization argument can be any visualization computed on the same dataset (or subset of it) as long as it contains every sample/object in the view whose results you are visualizing.

The points are colored based on the following partition:

  • “unique”: the unique examples

  • “other”: the other examples

You can attach plots generated by this method to an App session via its fiftyone.core.session.Session.plots attribute, which will automatically sync the session’s view with the currently selected points in the plot.

Parameters:
Returns:

a fiftyone.core.plots.base.InteractivePlot