fiftyone.brain.similarity
Similarity interface.
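Functions:
compute_similarity(samples, patches_field, …) | See fiftyone/brain/__init__.py.
Classes:
SimilarityConfig([embeddings_field, model, …]) | Similarity configuration.
Similarity(config) | Base class for similarity factories.
SimilarityIndex(samples, config, brain_key[, backend]) | Base class for similarity indexes.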
Mixin for |
-
fiftyone.brain.similarity.
compute_similarity
(samples, patches_field, embeddings, brain_key, model, model_kwargs, force_square, alpha, batch_size, num_workers, skip_failures, progress, backend, **kwargs)
See fiftyone/brain/__init__.py.
-
class
fiftyone.brain.similarity.
SimilarityConfig
(embeddings_field=None, model=None, model_kwargs=None, patches_field=None, supports_prompts=None, **kwargs)
Bases:
fiftyone.core.brain.BrainMethodConfig
Similarity configuration.
- Parameters
embeddings_field (None) – the sample field containing the embeddings, if one was provided
model (None) – the fiftyone.core.models.Model or name of the zoo model that was used to compute embeddings, if known
model_kwargs (None) – a dictionary of optional keyword arguments to pass to the model’s Config when a model name is provided
patches_field (None) – the sample field defining the patches being analyzed, if any
supports_prompts (False) – whether this run supports prompt queries
Attributes:
type | The type of run.
method | The name of the similarity backend.
max_k | A maximum k value for nearest neighbor queries, or None if there is no limit.
supports_least_similarity | Whether this backend supports least similarity queries.
supported_aggregations | A tuple of supported values for the aggregation parameter of the backend’s sort_by_similarity() and _kneighbors() methods.
cls | The fully-qualified name of this BaseRunConfig class.
run_cls | The BaseRun class associated with this config.
Methods:
load_credentials(**kwargs) | Loads any necessary credentials from the given keyword arguments or the relevant FiftyOne config.
attributes() | Returns the list of class attributes that will be serialized by serialize().
base_config_cls(type) | Returns the config class for the given run type.
build() | Builds the BaseRun instance associated with this config.
builder() | Returns a ConfigBuilder instance for this class.
copy() | Returns a deep copy of the object.
custom_attributes([dynamic, private]) | Returns a customizable list of class attributes.
default() | Returns the default config instance.
from_dict(d) | Constructs a BaseRunConfig from a serialized JSON dict representation of it.
from_json(path, *args, **kwargs) | Constructs a Serializable object from a JSON file.
from_kwargs(**kwargs) | Constructs a Config object from keyword arguments.
from_str(s, *args, **kwargs) | Constructs a Serializable object from a JSON string.
get_class_name() | Returns the fully-qualified class name string of this object.
load_default() | Loads the default config instance from file.
parse_array(d, key[, default]) | Parses a raw array attribute.
parse_bool(d, key[, default]) | Parses a boolean value.
parse_categorical(d, key, choices[, default]) | Parses a categorical JSON field, which must take a value from among the given choices.
parse_dict(d, key[, default]) | Parses a dictionary attribute.
parse_int(d, key[, default]) | Parses an integer attribute.
parse_mutually_exclusive_fields(fields) | Parses a mutually exclusive dictionary of pre-parsed fields, which must contain exactly one field with a truthy value.
parse_number(d, key[, default]) | Parses a number attribute.
parse_object(d, key, cls[, default]) | Parses an object attribute.
parse_object_array(d, key, cls[, default]) | Parses an array of objects.
parse_object_dict(d, key, cls[, default]) | Parses a dictionary whose values are objects.
parse_path(d, key[, default]) | Parses a path attribute.
parse_raw(d, key[, default]) | Parses a raw (arbitrary) JSON field.
parse_string(d, key[, default]) | Parses a string attribute.
serialize([reflective]) | Serializes the object into a dictionary.
to_str([pretty_print]) | Returns a string representation of this object.
validate_all_or_nothing_fields(fields) | Validates a dictionary of pre-parsed fields, checking that either all or none of the fields have a truthy value.
write_json(path[, pretty_print]) | Serializes the object and writes it to disk.
-
property
type
The type of run.
-
property
method
The name of the similarity backend.
-
property
max_k
A maximum k value for nearest neighbor queries, or None if there is no limit.
-
property
supports_least_similarity
Whether this backend supports least similarity queries.
-
property
supported_aggregations
A tuple of supported values for the aggregation parameter of the backend’s sort_by_similarity() and _kneighbors()
methods.
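The three capability properties above constrain what a similarity query may ask for. The minimal pure-Python sketch below shows one way a brute-force query could enforce them before ranking stored embeddings. It is not the implementation of any FiftyOne backend: the helper name brute_force_sort, the use of cosine similarity, and the default ("mean", "min", "max") aggregation tuple are assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def brute_force_sort(index, queries, k=None, reverse=False, aggregation="mean",
                     max_k=None, supports_least_similarity=True,
                     supported_aggregations=("mean", "min", "max")):
    """Hypothetical helper: ranks IDs in `index` ({id: vector}) by similarity
    to `queries` (a list of vectors), most similar first unless reverse=True.
    """
    # Reject requests that the (assumed) backend capabilities do not allow
    if max_k is not None and (k is None or k > max_k):
        raise ValueError("This backend requires k <= %d" % max_k)
    if reverse and not supports_least_similarity:
        raise ValueError("This backend does not support least similarity queries")
    if aggregation not in supported_aggregations:
        raise ValueError("Unsupported aggregation '%s'" % aggregation)

    agg = {"mean": lambda s: sum(s) / len(s), "min": min, "max": max}[aggregation]

    # Collapse each item's similarities to all query vectors into one score
    scores = {_id: agg([cosine(vec, q) for q in queries]) for _id, vec in index.items()}

    # Most similar first by default; least similar first when reverse=True
    ranked = sorted(scores, key=scores.get, reverse=not reverse)
    return ranked[:k] if k is not None else ranked
```

In this sketch, reverse=True requests a least-similarity ranking, which is refused when supports_least_similarity is False, mirroring the property description above.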
-
load_credentials
(**kwargs)
Loads any necessary credentials from the given keyword arguments or the relevant FiftyOne config.
- Parameters
**kwargs – subclass-specific credentials
-
attributes
()
Returns the list of class attributes that will be serialized by serialize().
- Returns
a list of attributes
-
static
base_config_cls
(type)
Returns the config class for the given run type.
- Parameters
type – a
BaseRunConfig.type
- Returns
a
BaseRunConfig
subclass
-
build
()
Builds the BaseRun instance associated with this config.
- Returns
a
BaseRun
instance
-
classmethod
builder
()
Returns a ConfigBuilder instance for this class.
-
property
cls
The fully-qualified name of this BaseRunConfig class.
-
copy
()
Returns a deep copy of the object.
- Returns
a Serializable instance
-
custom_attributes
(dynamic=False, private=False)
Returns a customizable list of class attributes.
By default, all attributes in vars(self) are returned, minus private attributes (those starting with “_”).
- Parameters
dynamic – whether to include dynamic properties, e.g., those defined by getter/setter methods or the @property decorator. By default, this is False
private – whether to include private properties, i.e., those starting with “_”. By default, this is False
- Returns
a list of class attributes
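The filtering rule described above can be approximated in a few lines. This is a simplified sketch, not the library's code: it only detects properties defined directly on the class (inherited properties are ignored), and the Example class and its fields are hypothetical.

```python
class Example:
    """Hypothetical class used to illustrate custom_attributes() filtering."""

    def __init__(self):
        self.name = "demo"
        self.size = 3
        self._cache = {}  # private: excluded unless private=True

    @property
    def area(self):  # dynamic: excluded unless dynamic=True
        return self.size ** 2

    def custom_attributes(self, dynamic=False, private=False):
        # Start from the instance attributes, as vars(self) does
        attrs = list(vars(self))

        if dynamic:
            # Add the names of @property attributes defined on the class
            attrs += [
                name for name, value in vars(type(self)).items()
                if isinstance(value, property)
            ]

        if not private:
            # Drop attributes whose names start with "_"
            attrs = [a for a in attrs if not a.startswith("_")]

        return attrs
```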
-
classmethod
default
()
Returns the default config instance.
By default, this method instantiates the class from an empty dictionary, which will only succeed if all attributes are optional. Otherwise, subclasses should override this method to provide the desired default configuration.
-
classmethod
from_dict
(d)
Constructs a BaseRunConfig from a serialized JSON dict representation of it.
- Parameters
d – a JSON dict
- Returns
a
BaseRunConfig
-
classmethod
from_json
(path, *args, **kwargs)
Constructs a Serializable object from a JSON file.
Subclasses may override this method, but, by default, this method simply reads the JSON and calls from_dict(), which subclasses must implement.
- Parameters
path – the path to the JSON file on disk
*args – optional positional arguments for self.from_dict()
**kwargs – optional keyword arguments for self.from_dict()
- Returns
an instance of the Serializable class
-
classmethod
from_kwargs
(**kwargs)
Constructs a Config object from keyword arguments.
- Parameters
**kwargs – keyword arguments that define the fields expected by cls
- Returns
an instance of cls
-
classmethod
from_str
(s, *args, **kwargs)
Constructs a Serializable object from a JSON string.
Subclasses may override this method, but, by default, this method simply parses the string and calls from_dict(), which subclasses must implement.
- Parameters
s – a JSON string representation of a Serializable object
*args – optional positional arguments for self.from_dict()
**kwargs – optional keyword arguments for self.from_dict()
- Returns
an instance of the Serializable class
-
classmethod
get_class_name
()
Returns the fully-qualified class name string of this object.
-
classmethod
load_default
()
Loads the default config instance from file.
Subclasses must implement this method if they intend to support default instances.
-
static
parse_array
(d, key, default=<eta.core.config.NoDefault object>)
Parses a raw array attribute.
- Parameters
d – a JSON dictionary
key – the key to parse
default – a default list to return if key is not present
- Returns
a list of raw (untouched) values
- Raises
ConfigError – if the field value was the wrong type or no default value was provided and the key was not found in the dictionary
-
static
parse_bool
(d, key, default=<eta.core.config.NoDefault object>)
Parses a boolean value.
- Parameters
d – a JSON dictionary
key – the key to parse
default – a default bool to return if key is not present
- Returns
True/False
- Raises
ConfigError – if the field value was the wrong type or no default value was provided and the key was not found in the dictionary
-
static
parse_categorical
(d, key, choices, default=<eta.core.config.NoDefault object>)
Parses a categorical JSON field, which must take a value from among the given choices.
- Parameters
d – a JSON dictionary
key – the key to parse
choices – either an iterable of possible values or an enum-like class whose attributes define the possible values
default – a default value to return if key is not present
- Returns
the raw (untouched) value of the given field, which is equal to a value from choices
- Raises
ConfigError – if the key was present in the dictionary but its value was not an allowed choice, or if no default value was provided and the key was not found in the dictionary
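The categorical rules above (an iterable or enum-like class of choices, a no-default sentinel, and the two error cases) can be sketched as follows. This is an illustrative reimplementation, not eta's code: ConfigError is a local stand-in for the error named above, and NO_DEFAULT is a hypothetical sentinel playing the role of the NoDefault object in the signature.

```python
class ConfigError(Exception):
    """Local stand-in for the ConfigError raised by the real parser."""


# Hypothetical sentinel meaning "no default value was provided"
NO_DEFAULT = object()


def parse_categorical(d, key, choices, default=NO_DEFAULT):
    if key not in d:
        if default is NO_DEFAULT:
            raise ConfigError("Expected key '%s' of %s" % (key, d))
        return default

    value = d[key]

    # choices may be an iterable of values, or an enum-like class whose
    # public attributes define the allowed values
    if isinstance(choices, type):
        allowed = [v for k, v in vars(choices).items() if not k.startswith("_")]
    else:
        allowed = list(choices)

    if value not in allowed:
        raise ConfigError(
            "Unsupported value %r for key '%s'; choices are %s" % (value, key, allowed)
        )

    return value  # returned untouched
```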
-
static
parse_dict
(d, key, default=<eta.core.config.NoDefault object>)
Parses a dictionary attribute.
- Parameters
d – a JSON dictionary
key – the key to parse
default – a default dict to return if key is not present
- Returns
a dictionary
- Raises
ConfigError – if the field value was the wrong type or no default value was provided and the key was not found in the dictionary
-
static
parse_int
(d, key, default=<eta.core.config.NoDefault object>)
Parses an integer attribute.
- Parameters
d – a JSON dictionary
key – the key to parse
default – a default integer value to return if key is not present
- Returns
an int
- Raises
ConfigError – if the field value was the wrong type or no default value was provided and the key was not found in the dictionary
-
static
parse_mutually_exclusive_fields
(fields)
Parses a mutually exclusive dictionary of pre-parsed fields, which must contain exactly one field with a truthy value.
- Parameters
fields – a dictionary of pre-parsed fields
- Returns
the (field, value) that was set
- Raises
ConfigError – if zero or more than one truthy value was found
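The "exactly one truthy field" rule can be expressed directly. This is a minimal sketch, not eta's implementation; ConfigError is a local stand-in, and the field names in the test ("model", "embeddings_field") are only illustrative inputs.

```python
class ConfigError(Exception):
    """Local stand-in for the ConfigError raised by the real parser."""


def parse_mutually_exclusive_fields(fields):
    # Keep only the fields whose pre-parsed value is truthy
    truthy = [(name, value) for name, value in fields.items() if value]

    if len(truthy) != 1:
        raise ConfigError(
            "Expected exactly one truthy field among %s; found %d"
            % (sorted(fields), len(truthy))
        )

    return truthy[0]  # the (field, value) pair that was set
```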
-
static
parse_number
(d, key, default=<eta.core.config.NoDefault object>)
Parses a number attribute.
- Parameters
d – a JSON dictionary
key – the key to parse
default – a default numeric value to return if key is not present
- Returns
a number (e.g. int, float)
- Raises
ConfigError – if the field value was the wrong type or no default value was provided and the key was not found in the dictionary
-
static
parse_object
(d, key, cls, default=<eta.core.config.NoDefault object>)
Parses an object attribute.
The value of d[key] can be either an instance of cls or a serialized dict from an instance of cls.
- Parameters
d – a JSON dictionary
key – the key to parse
cls – the class of d[key]
default – a default cls instance to return if key is not present
- Returns
an instance of cls
- Raises
ConfigError – if the field value was the wrong type or no default value was provided and the key was not found in the dictionary
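The "instance of cls or serialized dict" behavior can be sketched as below. This is a hypothetical reimplementation: ConfigError and NO_DEFAULT are local stand-ins, and the Point class with its from_dict() constructor is invented for the demo.

```python
class ConfigError(Exception):
    """Local stand-in for the ConfigError raised by the real parser."""


NO_DEFAULT = object()  # hypothetical "no default provided" sentinel


class Point:
    """Hypothetical serializable class with a from_dict() constructor."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    @classmethod
    def from_dict(cls, d):
        return cls(d["x"], d["y"])


def parse_object(d, key, cls, default=NO_DEFAULT):
    if key not in d:
        if default is NO_DEFAULT:
            raise ConfigError("Expected key '%s'" % key)
        return default

    value = d[key]
    if isinstance(value, cls):
        return value  # already an instance: use it as-is
    if isinstance(value, dict):
        return cls.from_dict(value)  # serialized form: rebuild the object

    raise ConfigError("Expected key '%s' to be a %s or a dict" % (key, cls.__name__))
```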
-
static
parse_object_array
(d, key, cls, default=<eta.core.config.NoDefault object>)
Parses an array of objects.
The values in d[key] can be either instances of cls or serialized dicts from instances of cls.
- Parameters
d – a JSON dictionary
key – the key to parse
cls – the class of the elements of list d[key]
default – the default list to return if key is not present
- Returns
a list of cls instances
- Raises
ConfigError – if the field value was the wrong type or no default value was provided and the key was not found in the dictionary
-
static
parse_object_dict
(d, key, cls, default=<eta.core.config.NoDefault object>)
Parses a dictionary whose values are objects.
The values in d[key] can be either instances of cls or serialized dicts from instances of cls.
- Parameters
d – a JSON dictionary
key – the key to parse
cls – the class of the values of dictionary d[key]
default – the default dict of cls instances to return if key is not present
- Returns
a dictionary whose values are cls instances
- Raises
ConfigError – if the field value was the wrong type or no default value was provided and the key was not found in the dictionary
-
static
parse_path
(d, key, default=<eta.core.config.NoDefault object>)
Parses a path attribute.
The path is converted to an absolute path if necessary via os.path.abspath(os.path.expanduser(value)).
- Parameters
d – a JSON dictionary
key – the key to parse
default – a default string to return if key is not present
- Returns
a path string
- Raises
ConfigError – if the field value was the wrong type or no default value was provided and the key was not found in the dictionary
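The path normalization described above uses only the standard library, so it is easy to reproduce. In this sketch, ConfigError and NO_DEFAULT are local stand-ins, and returning the default unchanged (without expanding it) is an assumption rather than documented behavior.

```python
import os


class ConfigError(Exception):
    """Local stand-in for the ConfigError raised by the real parser."""


NO_DEFAULT = object()  # hypothetical "no default provided" sentinel


def parse_path(d, key, default=NO_DEFAULT):
    if key not in d:
        if default is NO_DEFAULT:
            raise ConfigError("Expected key '%s'" % key)
        return default  # assumption: defaults are returned unchanged

    value = d[key]
    if not isinstance(value, str):
        raise ConfigError("Expected key '%s' to be a string" % key)

    # Expand "~" and make relative paths absolute
    return os.path.abspath(os.path.expanduser(value))
```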
-
static
parse_raw
(d, key, default=<eta.core.config.NoDefault object>)
Parses a raw (arbitrary) JSON field.
- Parameters
d – a JSON dictionary
key – the key to parse
default – a default value to return if key is not present
- Returns
the raw (untouched) value of the given field
- Raises
ConfigError – if no default value was provided and the key was not found in the dictionary
-
static
parse_string
(d, key, default=<eta.core.config.NoDefault object>)
Parses a string attribute.
- Parameters
d – a JSON dictionary
key – the key to parse
default – a default string to return if key is not present
- Returns
a string
- Raises
ConfigError – if the field value was the wrong type or no default value was provided and the key was not found in the dictionary
-
property
run_cls
The BaseRun class associated with this config.
-
serialize
(reflective=False)
Serializes the object into a dictionary.
Serialization is applied recursively to all attributes in the object, including element-wise serialization of lists and dictionary values.
- Parameters
reflective – whether to include reflective attributes when serializing the object. By default, this is False
- Returns
a JSON dictionary representation of the object
-
to_str
(pretty_print=True, **kwargs)
Returns a string representation of this object.
- Parameters
pretty_print – whether to render the JSON in human readable format with newlines and indentations. By default, this is True
**kwargs – optional keyword arguments for self.serialize()
- Returns
a string representation of the object
-
static
validate_all_or_nothing_fields
(fields)
Validates a dictionary of pre-parsed fields, checking that either all or none of the fields have a truthy value.
- Parameters
fields – a dictionary of pre-parsed fields
- Raises
ConfigError – if some values are truthy and some are not
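The all-or-nothing check amounts to counting truthy values. This is a minimal sketch, not eta's code; ConfigError is a local stand-in, and the credential-style field names in the test are hypothetical.

```python
class ConfigError(Exception):
    """Local stand-in for the ConfigError raised by the real validator."""


def validate_all_or_nothing_fields(fields):
    # Count how many of the pre-parsed values are truthy
    num_truthy = sum(1 for value in fields.values() if value)

    # Valid only when none are set or all are set
    if 0 < num_truthy < len(fields):
        raise ConfigError("Expected all or none of %s to be set" % sorted(fields))
```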
-
write_json
(path, pretty_print=False, **kwargs)
Serializes the object and writes it to disk.
- Parameters
path – the output path
pretty_print – whether to render the JSON in human readable format with newlines and indentations. By default, this is False
**kwargs – optional keyword arguments for self.serialize()
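Taken together, serialize(), write_json(), from_json(), and from_dict() form a JSON round trip. The sketch below mimics that contract with a hypothetical ToyConfig class using only the standard library; it is not the eta Serializable implementation, and its two fields merely echo SimilarityConfig parameter names.

```python
import json
import os
import tempfile


class ToyConfig:
    """Hypothetical stand-in for a Serializable config."""

    def __init__(self, embeddings_field=None, patches_field=None):
        self.embeddings_field = embeddings_field
        self.patches_field = patches_field

    def serialize(self):
        # Public instance attributes only, like the default attribute rule
        return {k: v for k, v in vars(self).items() if not k.startswith("_")}

    def write_json(self, path, pretty_print=False):
        with open(path, "w") as f:
            json.dump(self.serialize(), f, indent=4 if pretty_print else None)

    @classmethod
    def from_dict(cls, d):
        return cls(**d)

    @classmethod
    def from_json(cls, path):
        # Read the JSON and defer to from_dict(), as described above
        with open(path) as f:
            return cls.from_dict(json.load(f))


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "config.json")
    ToyConfig(embeddings_field="embeddings").write_json(path, pretty_print=True)
    restored = ToyConfig.from_json(path)
```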
-
class
fiftyone.brain.similarity.
Similarity
(config)
Bases:
fiftyone.core.brain.BrainMethod
Base class for similarity factories.
- Parameters
config – a
SimilarityConfig
Methods:
initialize(samples, brain_key) | Initializes a similarity index.
get_fields(samples, brain_key) | Gets the fields that were involved in the given run.
cleanup(samples, key) | Cleans up the results of the run with the given key from the collection.
delete_run(samples, key[, cleanup]) | Deletes the results associated with the given run key from the collection.
delete_runs(samples[, cleanup]) | Deletes all runs from the collection.
ensure_requirements() | Ensures that any necessary packages to execute this run are installed.
ensure_usage_requirements() | Ensures that any necessary packages to use existing results for this run are installed.
from_config(config) | Instantiates a Configurable class from a <cls>Config instance.
from_dict(d) | Instantiates a Configurable class from a <cls>Config dict.
from_json(json_path) | Instantiates a Configurable class from a <cls>Config JSON file.
from_kwargs(**kwargs) | Instantiates a Configurable class from keyword arguments defining the attributes of a <cls>Config.
get_run_info(samples, key) | Gets the BaseRunInfo for the given key on the collection.
has_cached_run_results(samples, key) | Determines whether BaseRunResults for the given key are cached on the collection.
list_runs(samples[, type, method]) | Returns the list of run keys on the given collection.
load_run_results(samples, key[, cache, …]) | Loads the BaseRunResults for the given key on the collection.
load_run_view(samples, key[, select_fields]) | Loads the fiftyone.core.view.DatasetView on which the specified run was performed.
parse(class_name[, module_name]) | Parses a Configurable subclass name string.
register_run(samples, key[, overwrite, cleanup]) | Registers a run of this method under the given key on the given collection.
rename(samples, key, new_key) | Performs any necessary operations required to rename this run’s key.
run_info_cls() | The BaseRunInfo class associated with this class.
save_run_info(samples, run_info[, …]) | Saves the run information on the collection.
save_run_results(samples, key, run_results) | Saves the run results on the collection.
update_run_config(samples, key, config) | Updates the BaseRunConfig for the given run on the collection.
update_run_key(samples, key, new_key) | Replaces the key for the given run with a new key.
validate(config) | Validates that the given config is an instance of <cls>Config.
validate_run(samples, key[, overwrite]) | Validates that the collection can accept this run.
-
initialize
(samples, brain_key)
Initializes a similarity index.
- Parameters
samples – a fiftyone.core.collections.SampleCollection
brain_key – the brain key
- Returns
a SimilarityIndex
-
get_fields
(samples, brain_key)
Gets the fields that were involved in the given run.
- Parameters
samples – a fiftyone.core.collections.SampleCollection
brain_key – the brain key
- Returns
a list of fields
-
cleanup
(samples, key)
Cleans up the results of the run with the given key from the collection.
- Parameters
samples – a
fiftyone.core.collections.SampleCollection
key – a run key
-
classmethod
delete_run
(samples, key, cleanup=True)
Deletes the results associated with the given run key from the collection.
- Parameters
samples – a
fiftyone.core.collections.SampleCollection
key – a run key
cleanup (True) – whether to execute the run’s
BaseRun.cleanup()
method
-
classmethod
delete_runs
(samples, cleanup=True)
Deletes all runs from the collection.
- Parameters
samples – a
fiftyone.core.collections.SampleCollection
cleanup (True) – whether to execute each run’s BaseRun.cleanup() method
-
ensure_requirements
()
Ensures that any necessary packages to execute this run are installed.
Runs should respect
fiftyone.config.requirement_error_level
when handling errors.
-
ensure_usage_requirements
()
Ensures that any necessary packages to use existing results for this run are installed.
Runs should respect
fiftyone.config.requirement_error_level
when handling errors.
-
classmethod
from_config
(config)
Instantiates a Configurable class from a <cls>Config instance.
-
classmethod
from_dict
(d)
Instantiates a Configurable class from a <cls>Config dict.
- Parameters
d – a dict to construct a <cls>Config
- Returns
an instance of cls
-
classmethod
from_json
(json_path)
Instantiates a Configurable class from a <cls>Config JSON file.
- Parameters
json_path – path to a JSON file for type <cls>Config
- Returns
an instance of cls
-
classmethod
from_kwargs
(**kwargs)
Instantiates a Configurable class from keyword arguments defining the attributes of a <cls>Config.
- Parameters
**kwargs – keyword arguments that define the fields of a <cls>Config dict
- Returns
an instance of cls
-
classmethod
get_run_info
(samples, key)
Gets the BaseRunInfo for the given key on the collection.
- Parameters
samples – a
fiftyone.core.collections.SampleCollection
key – a run key
- Returns
a
BaseRunInfo
-
classmethod
has_cached_run_results
(samples, key)
Determines whether BaseRunResults for the given key are cached on the collection.
- Parameters
samples – a
fiftyone.core.collections.SampleCollection
key – a run key
- Returns
True/False
-
classmethod
list_runs
(samples, type=None, method=None, **kwargs)
Returns the list of run keys on the given collection.
- Parameters
samples – a
fiftyone.core.collections.SampleCollection
type (None) –
a specific run type to match, which can be:
a
fiftyone.core.runs.BaseRun
class or its fully-qualified class name string
method (None) – a specific fiftyone.core.runs.BaseRunConfig.method string to match
**kwargs – optional config parameters to match
- Returns
a list of run keys
-
classmethod
load_run_results
(samples, key, cache=True, load_view=True, **kwargs)
Loads the BaseRunResults for the given key on the collection.
- Parameters
samples – a
fiftyone.core.collections.SampleCollection
key – a run key
cache (True) – whether to cache the results on the collection
load_view (True) – whether to load the run view in the results (True) or the full dataset (False)
**kwargs – keyword arguments for the run’s
BaseRunConfig.load_credentials()
method
- Returns
a
BaseRunResults
, or None if the run did not save results
-
classmethod
load_run_view
(samples, key, select_fields=False)
Loads the fiftyone.core.view.DatasetView on which the specified run was performed.
- Parameters
samples – a
fiftyone.core.collections.SampleCollection
key – a run key
select_fields (False) – whether to exclude fields involved in other runs of the same type
- Returns
a fiftyone.core.view.DatasetView
-
static
parse
(class_name, module_name=None)
Parses a Configurable subclass name string.
Assumes both the Configurable class and the Config class are defined in the same module. The module containing the classes will be loaded if necessary.
- Parameters
class_name – a string containing the name of the Configurable class, e.g. “ClassName”, or a fully-qualified class name, e.g. “eta.core.config.ClassName”
module_name – a string containing the fully-qualified module name, e.g. “eta.core.config”, or None if class_name includes the module name. Set module_name = __name__ to load a class from the calling module
- Returns
a (cls, config_cls) tuple, where cls is the Configurable class and config_cls is the Config class associated with cls
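The name-resolution steps above (split off the module if needed, load it, then look up the class and its same-module Config class) can be sketched with importlib. This is a hypothetical illustration rather than eta's parse(): the "<ClassName>Config" lookup follows the <cls>Config convention used on this page, and the demo_plugins module with its Widget classes is invented and registered in memory so the example runs without any files.

```python
import importlib
import sys
import types


def parse(class_name, module_name=None):
    """Hypothetical sketch: resolve a Configurable class and its Config class."""
    if module_name is None:
        # "pkg.module.ClassName" -> ("pkg.module", "ClassName")
        module_name, class_name = class_name.rsplit(".", 1)

    # import_module() returns the already-loaded module or imports it
    module = importlib.import_module(module_name)

    cls = getattr(module, class_name)
    # Assumes the Config class lives in the same module as "<ClassName>Config"
    config_cls = getattr(module, class_name + "Config")
    return cls, config_cls


# Register a throwaway in-memory module so the demo needs no files on disk
demo = types.ModuleType("demo_plugins")


class Widget:
    pass


class WidgetConfig:
    pass


demo.Widget = Widget
demo.WidgetConfig = WidgetConfig
sys.modules["demo_plugins"] = demo

cls, config_cls = parse("demo_plugins.Widget")
```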
-
register_run
(samples, key, overwrite=True, cleanup=True)
Registers a run of this method under the given key on the given collection.
- Parameters
samples – a
fiftyone.core.collections.SampleCollection
key – a run key
overwrite (True) – whether to allow overwriting an existing run of the same type
cleanup (True) – whether to execute an existing run’s
BaseRun.cleanup()
method when overwriting it
-
rename
(samples, key, new_key)
Performs any necessary operations required to rename this run’s key.
- Parameters
samples – a
fiftyone.core.collections.SampleCollection
key – a run key
new_key – a new run key
-
classmethod
run_info_cls
()
The BaseRunInfo class associated with this class.
-
classmethod
save_run_info
(samples, run_info, overwrite=True, cleanup=True)
Saves the run information on the collection.
- Parameters
samples – a
fiftyone.core.collections.SampleCollection
run_info – a
BaseRunInfo
overwrite (True) – whether to overwrite an existing run with the same key
cleanup (True) – whether to execute an existing run’s
BaseRun.cleanup()
method when overwriting it
-
classmethod
save_run_results
(samples, key, run_results, overwrite=True, cache=True)
Saves the run results on the collection.
- Parameters
samples – a
fiftyone.core.collections.SampleCollection
key – a run key
run_results – a BaseRunResults, or None
overwrite (True) – whether to overwrite an existing result with the same key
cache (True) – whether to cache the results on the collection
-
classmethod
update_run_config
(samples, key, config)
Updates the BaseRunConfig for the given run on the collection.
- Parameters
samples – a
fiftyone.core.collections.SampleCollection
key – a run key
config – a
BaseRunConfig
-
classmethod
update_run_key
(samples, key, new_key)
Replaces the key for the given run with a new key.
- Parameters
samples – a
fiftyone.core.collections.SampleCollection
key – a run key
new_key – a new run key
-
classmethod
validate
(config)
Validates that the given config is an instance of <cls>Config.
- Raises
ConfigurableError – if config is not an instance of <cls>Config
-
validate_run
(samples, key, overwrite=True)
Validates that the collection can accept this run.
The run may be invalid if, for example, a run of a different type has already been run under the same key and thus overwriting it would cause ambiguity on how to clean up the results.
- Parameters
samples – a
fiftyone.core.collections.SampleCollection
key – a run key
overwrite (True) – whether to allow overwriting an existing run of the same type
- Raises
ValueError – if the run is invalid
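The validation rule above can be modeled with an in-memory mapping from run key to run type. RunRegistry is a hypothetical class (runs are really stored on the collection), and the run type names in the test are illustrative strings only.

```python
class RunRegistry:
    """Hypothetical in-memory stand-in for the run keys stored on a collection."""

    def __init__(self):
        self._runs = {}  # run key -> run type name

    def validate_run(self, key, run_type, overwrite=True):
        existing = self._runs.get(key)
        if existing is None:
            return  # an unused key is always valid

        if not overwrite:
            raise ValueError("Run key '%s' already exists" % key)

        if existing != run_type:
            # Overwriting a different run type would make cleanup ambiguous
            raise ValueError(
                "Cannot overwrite '%s' run '%s' with a '%s' run"
                % (existing, key, run_type)
            )

    def register_run(self, key, run_type, overwrite=True):
        self.validate_run(key, run_type, overwrite=overwrite)
        self._runs[key] = run_type
```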
-
class
fiftyone.brain.similarity.
SimilarityIndex
(samples, config, brain_key, backend=None)
Bases:
fiftyone.core.brain.BrainResults
Base class for similarity indexes.
- Parameters
samples – the fiftyone.core.collections.SampleCollection used
config – the SimilarityConfig used
brain_key – the brain key
backend (None) – a Similarity backend
Attributes:
config | The SimilarityConfig for these results.
is_external | Whether this similarity index manages its own embeddings (True) or loads them directly from the embeddings_field of the dataset (False).
sample_ids | The sample IDs of the full index, or None if not supported.
label_ids | The label IDs of the full index, or None if not applicable or not supported.
total_index_size | The total number of data points in the index.
has_view | Whether the index is currently restricted to a view.
view | The fiftyone.core.collections.SampleCollection against which results are currently being generated.
current_sample_ids | The sample IDs of the currently active data points in the index.
current_label_ids | The label IDs of the currently active data points in the index, or None if not applicable.
index_size | The number of active data points in the index.
missing_size | The total number of data points in view() that are missing from this index, or None if unknown.
The BaseRun for these results.
The fully-qualified name of this BaseRunResults class.
The run key for these results.
The fiftyone.core.collections.SampleCollection associated with these results.
Methods:
add_to_index(embeddings, sample_ids[, …]) | Adds the given embeddings to the index.
remove_from_index([sample_ids, label_ids, …]) | Removes the specified embeddings from the index.
get_embeddings([sample_ids, label_ids, …]) | Retrieves the embeddings for the given IDs from the index.
use_view(samples[, allow_missing, warn_missing]) | Restricts the index to the provided view.
clear_view() | Clears the view set by use_view(), if any.
reload() | Reloads the index for the current view.
cleanup() | Deletes the similarity index from the backend.
values(path_or_expr) | Extracts a flat list of values from the given field or expression corresponding to the current view().
sort_by_similarity(query[, k, reverse, …]) | Returns a view that sorts the samples/labels in view() by similarity to the specified query.
Returns the stored model for this index.
compute_embeddings(samples[, model, …]) | Computes embeddings for the given samples using this backend’s model.
attributes() | Returns the list of class attributes that will be serialized by serialize().
base_results_cls(type) | Returns the results class for the given run type.
copy() | Returns a deep copy of the object.
custom_attributes([dynamic, private]) | Returns a customizable list of class attributes.
from_dict(d, samples, config, key) | Builds a BaseRunResults from a JSON dict representation of it.
from_json(path, *args, **kwargs) | Constructs a Serializable object from a JSON file.
from_str(s, *args, **kwargs) | Constructs a Serializable object from a JSON string.
get_class_name() | Returns the fully-qualified class name string of this object.
save() | Saves the results to the database.
Saves these results config to the database.
serialize([reflective]) | Serializes the object into a dictionary.
to_str([pretty_print]) | Returns a string representation of this object.
write_json(path[, pretty_print]) | Serializes the object and writes it to disk.
-
property
config
The SimilarityConfig for these results.
-
property
is_external
Whether this similarity index manages its own embeddings (True) or loads them directly from the embeddings_field of the dataset (False).
-
property
sample_ids
The sample IDs of the full index, or None if not supported.
-
property
label_ids
The label IDs of the full index, or None if not applicable or not supported.
-
property
total_index_size
The total number of data points in the index.
If use_view() has been called to restrict the index, this value may be larger than the current index_size().
-
property
has_view
Whether the index is currently restricted to a view.
Use use_view() to restrict the index to a view, and use clear_view() to reset to the full index.
-
property
view
The fiftyone.core.collections.SampleCollection against which results are currently being generated.
If use_view() has been called, this view may be different from the collection on which the full index was generated.
-
property
current_sample_ids
The sample IDs of the currently active data points in the index.
If use_view() has been called, this may be a subset of the full index.
-
property
current_label_ids
The label IDs of the currently active data points in the index, or None if not applicable.
If use_view() has been called, this may be a subset of the full index.
-
property
index_size
The number of active data points in the index.
If use_view() has been called to restrict the index, this property will reflect the size of the active index.
-
property
missing_size
The total number of data points in view() that are missing from this index, or None if unknown.
This property is only applicable when use_view() has been called, and it will be None if no data points are missing or when the backend does not support it.
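The relationship between total_index_size, index_size, and missing_size after use_view() can be illustrated with plain sets of IDs. index_stats is a hypothetical helper that models the documented semantics (including None when nothing is missing); it does not query a real backend.

```python
def index_stats(index_ids, view_ids):
    """Hypothetical helper: index_ids are the IDs in the full index and
    view_ids are the IDs in the view passed to use_view()."""
    index_set = set(index_ids)

    # Active data points: IDs in the view that the index actually contains
    current = [_id for _id in view_ids if _id in index_set]

    # IDs in the view that the index does not contain
    missing = sum(1 for _id in view_ids if _id not in index_set)

    return {
        "total_index_size": len(index_ids),
        "index_size": len(current),
        "missing_size": missing or None,  # None when nothing is missing
    }
```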
-
add_to_index
(embeddings, sample_ids, label_ids=None, overwrite=True, allow_existing=True, warn_existing=False, reload=True)
Adds the given embeddings to the index.
- Parameters
embeddings – a num_embeddings x num_dims array of embeddings
sample_ids – a num_embeddings array of sample IDs
label_ids (None) – a num_embeddings array of label IDs, if applicable
overwrite (True) – whether to replace (True) or ignore (False) existing embeddings with the same sample/label IDs
allow_existing (True) – whether to ignore (True) or raise an error (False) when overwrite is False and a provided ID already exists in the index
warn_existing (False) – whether to log a warning if an embedding is not added to the index because its ID already exists
reload (True) – whether to call
reload()
to refresh the current view after the update
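The interplay of overwrite, allow_existing, and warn_existing can be seen in a dict-backed toy index. ToyIndex is hypothetical: it keys embeddings by sample ID only, omits label IDs and the reload() step, and stores vectors as lists rather than arrays.

```python
import logging

logger = logging.getLogger(__name__)


class ToyIndex:
    """Hypothetical dict-backed index illustrating the add_to_index() flags."""

    def __init__(self):
        self.embeddings = {}  # sample ID -> embedding

    def add_to_index(self, embeddings, sample_ids, overwrite=True,
                     allow_existing=True, warn_existing=False):
        existing = [_id for _id in sample_ids if _id in self.embeddings]

        # With overwrite=False, existing IDs are either an error or skipped
        if existing and not overwrite:
            if not allow_existing:
                # Fail before modifying anything
                raise ValueError("%d IDs already exist in the index" % len(existing))
            if warn_existing:
                logger.warning("Skipping %d IDs that already exist", len(existing))

        for embedding, _id in zip(embeddings, sample_ids):
            if _id in self.embeddings and not overwrite:
                continue  # keep the stored embedding
            self.embeddings[_id] = list(embedding)  # insert or replace
```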
-
remove_from_index
(sample_ids=None, label_ids=None, allow_missing=True, warn_missing=False, reload=True)
Removes the specified embeddings from the index.
- Parameters
sample_ids (None) – an array of sample IDs
label_ids (None) – an array of label IDs, if applicable
allow_missing (True) – whether to allow the index to not contain IDs that you provide (True) or whether to raise an error in this case (False)
warn_missing (False) – whether to log a warning if the index does not contain IDs that you provide
reload (True) – whether to call
reload()
to refresh the current view after the update
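The allow_missing and warn_missing flags can be modeled the same way. remove_from_index below is a hypothetical helper over a plain dict of sample ID to embedding, not the backend method; it checks for missing IDs before deleting anything.

```python
import logging

logger = logging.getLogger(__name__)


def remove_from_index(index, sample_ids, allow_missing=True, warn_missing=False):
    """Hypothetical helper: removes `sample_ids` from a dict-backed index."""
    missing = [_id for _id in sample_ids if _id not in index]

    if missing and not allow_missing:
        raise ValueError("%d IDs are not in the index" % len(missing))

    if missing and warn_missing:
        logger.warning("Ignoring %d IDs that are not in the index", len(missing))

    for _id in sample_ids:
        index.pop(_id, None)  # no-op for IDs that are not present
```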
-
get_embeddings
(sample_ids=None, label_ids=None, allow_missing=True, warn_missing=False)
Retrieves the embeddings for the given IDs from the index.
If no IDs are provided, the entire index is returned.
- Parameters
sample_ids (None) – a sample ID or list of sample IDs for which to retrieve embeddings
label_ids (None) – a label ID or list of label IDs for which to retrieve embeddings
allow_missing (True) – whether to allow the index to not contain IDs that you provide (True) or whether to raise an error in this case (False)
warn_missing (False) – whether to log a warning if the index does not contain IDs that you provide
- Returns
a tuple of:
a num_embeddings x num_dims array of embeddings
a num_embeddings array of sample IDs
a num_embeddings array of label IDs, if applicable, or else None
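The three-part return value can be sketched over a dict-backed index. get_embeddings here is a hypothetical helper that returns Python lists instead of arrays and always returns None for label IDs because the toy index has none.

```python
def get_embeddings(index, sample_ids=None, allow_missing=True):
    """Hypothetical helper: returns (embeddings, sample_ids, label_ids) from a
    dict-backed index of sample ID -> embedding."""
    if sample_ids is None:
        sample_ids = list(index)  # no IDs given: return the entire index
    elif isinstance(sample_ids, str):
        sample_ids = [sample_ids]  # a single ID is also accepted

    missing = [_id for _id in sample_ids if _id not in index]
    if missing and not allow_missing:
        raise ValueError("%d IDs are not in the index" % len(missing))

    found = [_id for _id in sample_ids if _id in index]
    embeddings = [index[_id] for _id in found]

    return embeddings, found, None  # label IDs are not applicable here
```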
-
use_view
(samples, allow_missing=True, warn_missing=False)
Restricts the index to the provided view.
Subsequent calls to methods on this instance will only contain results from the specified view rather than the full index.
Use clear_view() to reset to the full index. Or, equivalently, use the context manager interface as demonstrated below to automatically reset the view when the context exits.
Example usage:

    import fiftyone as fo
    import fiftyone.brain as fob
    import fiftyone.zoo as foz

    dataset = foz.load_zoo_dataset("quickstart")
    results = fob.compute_similarity(dataset)
    print(results.index_size)  # 200

    view = dataset.take(50)

    with results.use_view(view):
        print(results.index_size)  # 50

        results.find_unique(10)
        print(results.unique_ids)

        plot = results.visualize_unique()
        plot.show()

- Parameters
samples – a
fiftyone.core.collections.SampleCollection
allow_missing (True) – whether to allow the provided collection to contain data points that this index does not contain (True) or whether to raise an error in this case (False)
warn_missing (False) – whether to log a warning if the provided collection contains data points that this index does not contain
- Returns
self
-
clear_view()
Clears the view set by use_view(), if any.
Subsequent operations will be performed on the full index.
-
reload()
Reloads the index for the current view.
Subclasses may override this method, but by default this method simply passes the current view() back into use_view(), which updates the index’s current ID set based on any changes to the view since the index was last loaded.
-
cleanup()
Deletes the similarity index from the backend.
-
values(path_or_expr)
Extracts a flat list of values from the given field or expression corresponding to the current view().
This method always returns values in the same order as current_sample_ids() and current_label_ids().
- Parameters
path_or_expr – the values to extract, which can be:
the name of a sample field or embedded.field.name from which to extract numeric or string values
a fiftyone.core.expressions.ViewExpression defining numeric or string values to compute via fiftyone.core.collections.SampleCollection.values()
- Returns
a list of values
-
sort_by_similarity(query, k=None, reverse=False, aggregation='mean', dist_field=None, _mongo=False)
Returns a view that sorts the samples/labels in view() by similarity to the specified query.
When querying by IDs, the query can be any ID(s) in the full index of this instance, even if the current view() contains a subset of the full index.
- Parameters
query – the query, which can be any of the following:
an ID or iterable of IDs
a num_dims vector or num_queries x num_dims array of vectors
a prompt or iterable of prompts (if supported by the index)
k (None) – the number of matches to return. Some backends may support None, in which case all samples will be sorted
reverse (False) – whether to sort by least similarity (True) or greatest similarity (False). Some backends may not support least similarity
aggregation ("mean") – the aggregation method to use when multiple queries are provided. The default is "mean", which means that the query vectors are averaged prior to searching. Some backends may support additional options
dist_field (None) – the name of a float field in which to store the distance of each example to the specified query. The field is created if necessary
- Returns
a fiftyone.core.view.DatasetView
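The following rough sketch shows what the default "mean" aggregation means in practice. Multiple query vectors are averaged into one vector, and the index is then ranked against that vector. For illustration only, the sketch assumes Euclidean distance and a dict-based index. Real backends may use other metrics, and this sketch does not handle ID or prompt queries.

```python
import math


def sort_by_similarity(index, query, k=None, reverse=False):
    """Hypothetical ranking of ``index`` (dict: ID -> vector) against a query.

    ``query`` is one vector or a list of vectors. Multiple vectors are
    averaged first ("mean" aggregation), then IDs are ranked by Euclidean
    distance to the averaged vector (an assumption; backends may differ).
    """
    queries = [query] if isinstance(query[0], (int, float)) else list(query)

    # "mean" aggregation: average the query vectors before searching
    num_dims = len(queries[0])
    mean_query = [sum(q[d] for q in queries) / len(queries) for d in range(num_dims)]

    dists = {_id: math.dist(vec, mean_query) for _id, vec in index.items()}

    # Most similar (smallest distance) first; reverse=True puts least similar first
    ranked = sorted(dists, key=dists.get, reverse=reverse)
    return ranked if k is None else ranked[:k]


index = {"a": [0.0, 0.0], "b": [1.0, 0.0], "c": [5.0, 5.0]}
print(sort_by_similarity(index, [[0.0, 0.0], [2.0, 0.0]], k=1))  # ['b']
```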
-
get_model()
Returns the stored model for this index.
- Returns
a fiftyone.core.models.Model
-
compute_embeddings(samples, model=None, batch_size=None, num_workers=None, skip_failures=True, skip_existing=False, warn_existing=False, force_square=False, alpha=None, progress=None)
Computes embeddings for the given samples using this backend’s model.
- Parameters
samples – a fiftyone.core.collections.SampleCollection
model (None) – a fiftyone.core.models.Model to apply. If not provided, these results must have been created with a stored model, which will be used by default
batch_size (None) – an optional batch size to use when computing embeddings. Only applicable when a model is provided
num_workers (None) – the number of workers to use when loading images. Only applicable when a Torch-based model is being used to compute embeddings
skip_failures (True) – whether to gracefully continue without raising an error if embeddings cannot be generated for a sample
skip_existing (False) – whether to skip generating embeddings for sample/label IDs that are already in the index
warn_existing (False) – whether to log a warning if any IDs already exist in the index
force_square (False) – whether to minimally manipulate the patch bounding boxes into squares prior to extraction. Only applicable when a model and patches_field are specified
alpha (None) – an optional expansion/contraction to apply to the patches before extracting them, in [-1, inf). If provided, the length and width of the box are expanded (or contracted, when alpha < 0) by (100 * alpha)%. For example, set alpha = 0.1 to expand the boxes by 10%, and set alpha = -0.1 to contract the boxes by 10%. Only applicable when a model and patches_field are specified
progress (None) – whether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead
- Returns
a tuple of:
a num_embeddings x num_dims array of embeddings
a num_embeddings array of sample IDs
a num_embeddings array of label IDs, if applicable, or else None
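The force_square and alpha adjustments can be illustrated on a single box. FiftyOne stores bounding boxes as [top-left-x, top-left-y, width, height] in relative coordinates. The rest of the sketch rests on assumptions made here for illustration:
- the box is squared by growing its shorter side;
- both adjustments keep the box center fixed;
- nothing is clipped to the image bounds.

adjust_patch() is a hypothetical helper, not a FiftyOne function.

```python
def adjust_patch(box, img_w, img_h, force_square=False, alpha=None):
    """Hypothetical patch-box adjustment before cropping.

    ``box`` is [x, y, w, h] in relative coordinates (top-left corner, width,
    height). Returns [x, y, w, h] in pixels. Assumptions: squaring grows the
    shorter side, the center stays fixed, and no clipping is applied.
    """
    x, y = box[0] * img_w, box[1] * img_h
    w, h = box[2] * img_w, box[3] * img_h
    cx, cy = x + w / 2, y + h / 2  # box center in pixels

    if force_square:
        # Square up in pixel space by growing the shorter side
        w = h = max(w, h)

    if alpha is not None:
        if alpha < -1:
            raise ValueError("alpha must be in [-1, inf)")
        # Scale by (100 * alpha)%: alpha=0.1 -> +10%, alpha=-0.1 -> -10%
        w *= 1 + alpha
        h *= 1 + alpha

    return [cx - w / 2, cy - h / 2, w, h]


print(adjust_patch([0.25, 0.25, 0.5, 0.5], 100, 100, alpha=0.1))
```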
-
attributes()
Returns the list of class attributes that will be serialized by serialize().
- Returns
a list of attributes
-
property backend
The BaseRun for these results.
-
static base_results_cls(type)
Returns the results class for the given run type.
- Parameters
type – a BaseRunConfig.type
- Returns
a BaseRunResults subclass
-
property cls
The fully-qualified name of this BaseRunResults class.
-
copy()
Returns a deep copy of the object.
- Returns
a Serializable instance
-
custom_attributes(dynamic=False, private=False)
Returns a customizable list of class attributes.
By default, all attributes in vars(self) are returned, minus private attributes (those starting with “_”).
- Parameters
dynamic – whether to include dynamic properties, e.g., those defined by getter/setter methods or the @property decorator. By default, this is False
private – whether to include private properties, i.e., those starting with “_”. By default, this is False
- Returns
a list of class attributes
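The default attribute-selection rule described above can be mimicked in a few lines. This is a hypothetical sketch, not the library's implementation. In particular, the dynamic branch only looks for property objects defined directly on the object's own class.

```python
class Example:
    @property
    def area(self):  # a dynamic property defined with @property
        return self.width * self.height

    def __init__(self):
        self.width = 2
        self.height = 3
        self._cache = None  # private: starts with "_"


def custom_attributes(obj, dynamic=False, private=False):
    """Hypothetical sketch of the default attribute-selection rule."""
    attrs = list(vars(obj))  # instance attributes, in assignment order
    if dynamic:
        # Properties live on the class, so they never appear in vars(obj)
        attrs += [n for n, v in vars(type(obj)).items() if isinstance(v, property)]
    if not private:
        attrs = [a for a in attrs if not a.startswith("_")]
    return attrs


print(custom_attributes(Example()))  # ['width', 'height']
```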
-
classmethod from_dict(d, samples, config, key)
Builds a BaseRunResults from a JSON dict representation of it.
- Parameters
d – a JSON dict
samples – the fiftyone.core.collections.SampleCollection for the run
config – the BaseRunConfig for the run
key – the run key
- Returns
a BaseRunResults
-
classmethod from_json(path, *args, **kwargs)
Constructs a Serializable object from a JSON file.
Subclasses may override this method, but, by default, this method simply reads the JSON and calls from_dict(), which subclasses must implement.
- Parameters
path – the path to the JSON file on disk
*args – optional positional arguments for self.from_dict()
**kwargs – optional keyword arguments for self.from_dict()
- Returns
an instance of the Serializable class
-
classmethod from_str(s, *args, **kwargs)
Constructs a Serializable object from a JSON string.
Subclasses may override this method, but, by default, this method simply parses the string and calls from_dict(), which subclasses must implement.
- Parameters
s – a JSON string representation of a Serializable object
*args – optional positional arguments for self.from_dict()
**kwargs – optional keyword arguments for self.from_dict()
- Returns
an instance of the Serializable class
-
classmethod get_class_name()
Returns the fully-qualified class name string of this object.
-
property key
The run key for these results.
-
property samples
The fiftyone.core.collections.SampleCollection associated with these results.
-
save()
Saves the results to the database.
-
save_config()
Saves the config for these results to the database.
-
serialize(reflective=False)
Serializes the object into a dictionary.
Serialization is applied recursively to all attributes in the object, including element-wise serialization of lists and dictionary values.
- Parameters
reflective – whether to include reflective attributes when serializing the object. By default, this is False
- Returns
a JSON dictionary representation of the object
-
to_str(pretty_print=True, **kwargs)
Returns a string representation of this object.
- Parameters
pretty_print – whether to render the JSON in human-readable format with newlines and indentation. By default, this is True
**kwargs – optional keyword arguments for self.serialize()
- Returns
a string representation of the object
-
write_json(path, pretty_print=False, **kwargs)
Serializes the object and writes it to disk.
- Parameters
path – the output path
pretty_print – whether to render the JSON in human-readable format with newlines and indentation. By default, this is False
**kwargs – optional keyword arguments for self.serialize()
-
class fiftyone.brain.similarity.DuplicatesMixin
Bases: object
Mixin for SimilarityIndex instances that support duplicate detection operations.
Similarity backends can expose this mixin simply by implementing _radius_neighbors().
Attributes:
thresh – The threshold used by the last call to find_duplicates() or find_unique().
unique_ids – A list of unique IDs from the last call to find_duplicates() or find_unique().
duplicate_ids – A list of duplicate IDs from the last call to find_duplicates() or find_unique().
neighbors_map – A dictionary mapping IDs to lists of (dup_id, dist) tuples from the last call to find_duplicates().
Methods:
find_duplicates([thresh, fraction]) – Queries the index to find near-duplicate examples based on the provided parameters.
find_unique(count) – Queries the index to select a subset of examples of the specified size that are maximally unique with respect to each other.
plot_distances([bins, log, backend]) – Plots a histogram of the distance between each example and its nearest neighbor.
duplicates_view([type_field, id_field, …]) – Returns a view that contains only the duplicate examples and their corresponding nearest non-duplicate examples generated by the last call to find_duplicates().
unique_view() – Returns a view that contains only the unique examples generated by the last call to find_duplicates() or find_unique().
visualize_duplicates(visualization[, backend]) – Generates an interactive scatterplot of the results generated by the last call to find_duplicates().
visualize_unique(visualization[, backend]) – Generates an interactive scatterplot of the results generated by the last call to find_unique().
.-
property thresh
The threshold used by the last call to find_duplicates() or find_unique().
-
property unique_ids
A list of unique IDs from the last call to find_duplicates() or find_unique().
-
property duplicate_ids
A list of duplicate IDs from the last call to find_duplicates() or find_unique().
-
property neighbors_map
A dictionary mapping IDs to lists of (dup_id, dist) tuples from the last call to find_duplicates().
-
find_duplicates(thresh=None, fraction=None)
Queries the index to find near-duplicate examples based on the provided parameters.
Calling this method populates the unique_ids, duplicate_ids, neighbors_map, and thresh properties of this object with the results of the query.
Use duplicates_view() and visualize_duplicates() to analyze the results generated by this method.
- Parameters
thresh (None) – a distance threshold to use to determine duplicates. If specified, the non-duplicate set will be the (approximately) largest set such that all pairwise distances between non-duplicate examples are greater than this threshold
fraction (None) – a desired fraction of images/patches to tag as duplicates, in [0, 1]. In this case thresh is automatically tuned to achieve the desired fraction of duplicates
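One way to picture the thresh semantics is a greedy pass that keeps an example only if it is farther than thresh from every example kept so far. A bisection over thresh can then approximate a target fraction. Both functions are hypothetical sketches that use Euclidean distance. The library's actual algorithm may differ, and greedy selection only approximates the largest such set.

```python
import math


def find_duplicates(points, thresh):
    """Greedy sketch: keep an example only if it is farther than ``thresh``
    from every example kept so far; all others are duplicates.

    ``points`` maps ID -> vector. Returns (unique_ids, duplicate_ids).
    """
    unique_ids, duplicate_ids = [], []
    for _id, vec in points.items():
        if all(math.dist(vec, points[u]) > thresh for u in unique_ids):
            unique_ids.append(_id)
        else:
            duplicate_ids.append(_id)
    return unique_ids, duplicate_ids


def tune_thresh(points, fraction, iters=50):
    """Bisect ``thresh`` toward the smallest value that tags at least
    ``fraction`` of the examples as duplicates (approximate, since the
    greedy duplicate count is not guaranteed to be monotonic)."""
    vecs = list(points.values())
    lo, hi = 0.0, max(math.dist(a, b) for a in vecs for b in vecs)
    for _ in range(iters):
        mid = (lo + hi) / 2
        num_dups = len(find_duplicates(points, mid)[1])
        if num_dups / len(points) < fraction:
            lo = mid  # too few duplicates: raise the threshold
        else:
            hi = mid  # enough duplicates: try a smaller threshold
    return hi


points = {"a": [0.0, 0.0], "b": [0.1, 0.0], "c": [5.0, 0.0], "d": [5.05, 0.0]}
print(find_duplicates(points, 0.5))  # (['a', 'c'], ['b', 'd'])
```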
-
find_unique(count)
Queries the index to select a subset of examples of the specified size that are maximally unique with respect to each other.
Calling this method populates the unique_ids, duplicate_ids, and thresh properties of this object with the results of the query.
Use unique_view() and visualize_unique() to analyze the results generated by this method.
- Parameters
count – the desired number of unique examples
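To illustrate how a maximally unique subset might be selected, the sketch below uses farthest-point (greedy max-min) sampling with Euclidean distance. This technique was chosen for clarity and is not necessarily how find_unique() works internally. The sketch also omits the thresh value that the real method records.

```python
import math


def find_unique(points, count):
    """Farthest-point sampling sketch over ``points`` (dict: ID -> vector).

    Starts from the first ID, then repeatedly adds the example whose
    distance to its nearest already-selected example is largest.
    """
    ids = list(points)
    if count <= 0 or not ids:
        return []
    selected = [ids[0]]
    # Distance from every example to its nearest selected example so far
    nearest = {i: math.dist(points[i], points[ids[0]]) for i in ids}
    while len(selected) < min(count, len(ids)):
        candidates = [i for i in ids if i not in selected]
        best = max(candidates, key=nearest.get)
        selected.append(best)
        for i in ids:
            nearest[i] = min(nearest[i], math.dist(points[i], points[best]))
    return selected


points = {"a": [0.0, 0.0], "b": [0.1, 0.0], "c": [5.0, 0.0], "d": [0.0, 5.0]}
print(find_unique(points, 3))  # ['a', 'c', 'd']
```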
-
plot_distances(bins=100, log=False, backend='plotly', **kwargs)
Plots a histogram of the distance between each example and its nearest neighbor.
If find_duplicates() or find_unique() has been executed, the threshold used is also indicated on the plot.
- Parameters
bins (100) – the number of bins to use
log (False) – whether to use a log scale y-axis
backend ("plotly") – the plotting backend to use. Supported values are ("plotly", "matplotlib")
**kwargs – keyword arguments for the backend plotting method
- Returns
one of the following:
a fiftyone.core.plots.plotly.PlotlyNotebookPlot, if you are working in a notebook context and the plotly backend is used
a plotly or matplotlib figure, otherwise
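The data behind this histogram can be sketched in two steps. First, compute a brute-force nearest-neighbor distance for each example. Second, bin those distances into equal-width bins. This is a hypothetical illustration that assumes Euclidean distance and omits plotting. The actual method delegates rendering to the plotly or matplotlib backend.

```python
import math


def nearest_neighbor_distances(points):
    """Distance from each example to its nearest other example (brute force)."""
    ids = list(points)
    return {
        i: min(math.dist(points[i], points[j]) for j in ids if j != i)
        for i in ids
    }


def histogram(values, bins=100):
    """Count values into ``bins`` equal-width bins spanning [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid zero width when all values are equal
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # put v == max in the last bin
        counts[idx] += 1
    return counts


points = {"a": [0.0], "b": [1.0], "c": [3.0]}
nn = nearest_neighbor_distances(points)
print(nn)  # {'a': 1.0, 'b': 1.0, 'c': 2.0}
print(histogram(list(nn.values()), bins=2))  # [2, 1]
```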
-
duplicates_view(type_field=None, id_field=None, dist_field=None, sort_by='distance', reverse=False)
Returns a view that contains only the duplicate examples and their corresponding nearest non-duplicate examples generated by the last call to find_duplicates().
If you are analyzing patches, the returned view will be a fiftyone.core.patches.PatchesView.
The examples are organized so that each non-duplicate is immediately followed by all duplicate(s) that are nearest to it.
- Parameters
type_field (None) – the name of a string field in which to store "nearest" and "duplicate" labels. The field is created if necessary
id_field (None) – the name of a string field in which to store the ID of the nearest non-duplicate for each example in the view. The field is created if necessary
dist_field (None) – the name of a float field in which to store the distance of each example to its nearest non-duplicate example. The field is created if necessary
sort_by ("distance") – specifies how to sort the groups of duplicate examples. The supported values are:
"distance": sort the groups by the distance between the non-duplicate and its (nearest, if multiple) duplicate
"count": sort the groups by the number of duplicate examples
reverse (False) – whether to sort in descending order
- Returns
a fiftyone.core.view.DatasetView
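The group ordering described above can be sketched from a dictionary shaped like neighbors_map, in which each non-duplicate ID maps to a list of (dup_id, dist) tuples. order_duplicates() is a hypothetical helper, and it makes two assumptions. First, duplicates within a group are listed by increasing distance. Second, each non-duplicate row records itself as its nearest ID with distance 0.

```python
def order_duplicates(neighbors_map, sort_by="distance", reverse=False):
    """Hypothetical ordering: each non-duplicate followed by its duplicates.

    Returns a list of (id, type, nearest_id, dist) rows.
    """
    def group_key(item):
        _, dups = item
        if sort_by == "distance":
            # Distance to the nearest duplicate in the group
            return min((dist for _, dist in dups), default=float("inf"))
        if sort_by == "count":
            return len(dups)
        raise ValueError("Unsupported sort_by: %r" % sort_by)

    rows = []
    for nearest_id, dups in sorted(neighbors_map.items(), key=group_key, reverse=reverse):
        rows.append((nearest_id, "nearest", nearest_id, 0.0))  # assumption: self, dist 0
        for dup_id, dist in sorted(dups, key=lambda d: d[1]):
            rows.append((dup_id, "duplicate", nearest_id, dist))
    return rows


neighbors_map = {"u1": [("d1", 0.3), ("d2", 0.1)], "u2": [("d3", 0.05)]}
for row in order_duplicates(neighbors_map):
    print(row)
```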
-
unique_view()
Returns a view that contains only the unique examples generated by the last call to find_duplicates() or find_unique().
If you are analyzing patches, the returned view will be a fiftyone.core.patches.PatchesView.
- Returns
a fiftyone.core.view.DatasetView
-
visualize_duplicates(visualization, backend='plotly', **kwargs)
Generates an interactive scatterplot of the results generated by the last call to find_duplicates().
The visualization argument can be any visualization computed on the same dataset (or a subset of it) as long as it contains every sample/object in the view whose results you are visualizing.
The points are colored based on the following partition:
“duplicate”: duplicate example
“nearest”: nearest neighbor of a duplicate example
“unique”: the remaining unique examples
Edges are also drawn between each duplicate and its nearest non-duplicate neighbor.
You can attach plots generated by this method to an App session via its fiftyone.core.session.Session.plots attribute, which will automatically sync the session’s view with the currently selected points in the plot.
- Parameters
visualization – a fiftyone.brain.visualization.VisualizationResults instance to use to visualize the results
backend ("plotly") – the plotting backend to use. Supported values are ("plotly", "matplotlib")
**kwargs – keyword arguments for the backend plotting method:
"plotly" backend: fiftyone.core.plots.plotly.scatterplot()
"matplotlib" backend: fiftyone.core.plots.matplotlib.scatterplot()
- Returns
a fiftyone.core.plots.base.InteractivePlot
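The three-way coloring and the edges can be derived from the same kind of neighbors_map dictionary. partition_labels() is a hypothetical helper that only computes the labels and the edge pairs. It does not draw anything.

```python
def partition_labels(all_ids, neighbors_map):
    """Hypothetical "duplicate" / "nearest" / "unique" labels plus edges.

    ``neighbors_map`` maps a non-duplicate ID to a list of (dup_id, dist)
    tuples. Returns (labels, edges), where each edge is a
    (duplicate ID, nearest non-duplicate ID) pair.
    """
    labels = {_id: "unique" for _id in all_ids}  # default: remaining examples
    edges = []
    for nearest_id, dups in neighbors_map.items():
        labels[nearest_id] = "nearest"
        for dup_id, _ in dups:
            labels[dup_id] = "duplicate"
            edges.append((dup_id, nearest_id))
    return labels, edges


labels, edges = partition_labels(["a", "b", "c"], {"a": [("b", 0.1)]})
print(labels)  # {'a': 'nearest', 'b': 'duplicate', 'c': 'unique'}
print(edges)  # [('b', 'a')]
```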
-
visualize_unique(visualization, backend='plotly', **kwargs)
Generates an interactive scatterplot of the results generated by the last call to find_unique().
The visualization argument can be any visualization computed on the same dataset (or a subset of it) as long as it contains every sample/object in the view whose results you are visualizing.
The points are colored based on the following partition:
“unique”: the unique examples
“other”: the other examples
You can attach plots generated by this method to an App session via its fiftyone.core.session.Session.plots attribute, which will automatically sync the session’s view with the currently selected points in the plot.
- Parameters
visualization – a fiftyone.brain.visualization.VisualizationResults instance to use to visualize the results
backend ("plotly") – the plotting backend to use. Supported values are ("plotly", "matplotlib")
**kwargs – keyword arguments for the backend plotting method:
"plotly" backend: fiftyone.core.plots.plotly.scatterplot()
"matplotlib" backend: fiftyone.core.plots.matplotlib.scatterplot()
- Returns
a fiftyone.core.plots.base.InteractivePlot
-
property