fiftyone.brain
Module contents
The brains behind FiftyOne: a powerful package for dataset curation, analysis, and visualization.
See https://github.com/voxel51/fiftyone for more information.
Functions:
compute_hardness() – Adds a hardness field to each sample scoring the difficulty that the specified label field observed in classifying the sample.
compute_mistakenness() – Computes the mistakenness (likelihood of being incorrect) of the labels in label_field based on the predicted labels in pred_field.
compute_uniqueness() – Adds a uniqueness field to each sample scoring how unique it is with respect to the rest of the samples.
compute_representativeness() – Adds a representativeness field to each sample scoring how representative of nearby samples it is.
compute_visualization() – Computes a low-dimensional representation of the samples’ media or their patches that can be interactively visualized.
compute_similarity() – Uses embeddings to index the samples or their patches so that you can query/sort by similarity.
compute_exact_duplicates() – Detects duplicate media in a sample collection.
-
fiftyone.brain.compute_hardness(samples, label_field, hardness_field='hardness', progress=None)
Adds a hardness field to each sample scoring the difficulty that the specified label field observed in classifying the sample.
Hardness is computed from the model's prediction output (its logits) and summarizes how uncertain the model was about the sample. Because hardness is quantitative, it can be used to detect things like hard samples, annotation errors during noisy training, and more.
All classifications must have their logits attributes populated in order to use this method.
Note
Runs of this method can be referenced later via brain key hardness_field.
- Parameters
samples – a fiftyone.core.collections.SampleCollection
label_field – the fiftyone.core.labels.Classification or fiftyone.core.labels.Classifications field to use from each sample
hardness_field ("hardness") – the field name to use to store the hardness value for each sample
progress (None) – whether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead
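As a rough illustration of how logits can be summarized into an uncertainty score, the sketch below computes the entropy of the softmax distribution over the logits. This is an assumption made for illustration only: this page does not state the exact formula that compute_hardness() uses, and the entropy_from_logits helper is hypothetical.

```python
import math


def entropy_from_logits(logits):
    """Hypothetical helper: Shannon entropy of softmax(logits)."""
    # Subtract the max logit for numerical stability
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Higher entropy means the model was less certain about the sample
    return -sum(p * math.log(p) for p in probs if p > 0)


# A peaked prediction versus a completely flat one
confident = entropy_from_logits([10.0, 0.0, 0.0])
uncertain = entropy_from_logits([1.0, 1.0, 1.0])
```

Under this assumed measure, the flat prediction scores higher than the peaked one, which matches the intuition that harder samples are the ones the model is unsure about.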
-
fiftyone.brain.compute_mistakenness(samples, pred_field, label_field, mistakenness_field='mistakenness', missing_field='possible_missing', spurious_field='possible_spurious', use_logits=False, copy_missing=False, progress=None)
Computes the mistakenness (likelihood of being incorrect) of the labels in label_field based on the predicted labels in pred_field.
Mistakenness is measured based on either the confidence or logits of the predictions in pred_field. This measure can be used to detect things like annotation errors and unusually hard samples.
For classifications, a mistakenness_field field is populated on each sample that quantifies the likelihood that the label in the label_field of that sample is incorrect.
For objects (detections, polylines, keypoints, etc), the mistakenness of each object in label_field is computed, using fiftyone.core.collections.SampleCollection.evaluate_detections() to locate corresponding objects in pred_field. Three types of mistakes are identified:
(Mistakes) Objects in label_field with a match in pred_field are assigned a mistakenness value in their mistakenness_field that captures the likelihood that the class label of the object in label_field is a mistake. A mistakenness_field + "_loc" field is also populated that captures the likelihood that the object in label_field is a mistake due to its localization (bounding box).
(Missing) Objects in pred_field with no matches in label_field but which are likely to be correct will have their missing_field attribute set to True. In addition, if copy_missing is True, copies of these objects are added to the ground truth label_field.
(Spurious) Objects in label_field with no matches in pred_field but which are likely to be incorrect will have their spurious_field attribute set to True.
In addition, for objects, the following sample-level fields are populated:
(Mistakes) The mistakenness_field of each sample is populated with the maximum mistakenness of the objects in label_field.
(Missing) The missing_field of each sample is populated with the number of objects that were deemed missing from label_field.
(Spurious) The spurious_field of each sample is populated with the number of objects in label_field that were deemed spurious.
Note
Runs of this method can be referenced later via brain key mistakenness_field.
- Parameters
samples – a fiftyone.core.collections.SampleCollection
pred_field – the name of the predicted label field to use from each sample. Can be of type fiftyone.core.labels.Classification, fiftyone.core.labels.Classifications, fiftyone.core.labels.Detections, fiftyone.core.labels.Polylines, fiftyone.core.labels.Keypoints, or fiftyone.core.labels.TemporalDetections
label_field – the name of the “ground truth” label field that you want to test for mistakes with respect to the predictions in pred_field. Must have the same type as pred_field
mistakenness_field ("mistakenness") – the field name to use to store the mistakenness value for each sample
missing_field ("possible_missing") – the field in which to store per-sample counts of potential missing objects
spurious_field ("possible_spurious") – the field in which to store per-sample counts of potential spurious objects
use_logits (False) – whether to use logits (True) or confidence (False) to compute mistakenness. Logits typically yield better results when they are available
copy_missing (False) – whether to copy predicted objects that were deemed to be missing into label_field
progress (None) – whether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead
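The sample-level fields that compute_mistakenness() populates for objects are aggregates of the per-object results: a maximum and two counts. The sketch below mimics that aggregation on plain dictionaries. The dictionary layout and the summarize_sample helper are hypothetical stand-ins for illustration, not FiftyOne data structures.

```python
def summarize_sample(gt_objects, pred_objects):
    """Hypothetical helper aggregating per-object results for one sample.

    gt_objects: dicts with optional "mistakenness" and "possible_spurious"
    pred_objects: dicts with optional "possible_missing"
    """
    scores = [o["mistakenness"] for o in gt_objects if "mistakenness" in o]
    return {
        # Maximum mistakenness over the matched ground truth objects
        "mistakenness": max(scores) if scores else None,
        # Number of predicted objects flagged as missing from ground truth
        "possible_missing": sum(
            1 for o in pred_objects if o.get("possible_missing")
        ),
        # Number of ground truth objects flagged as spurious
        "possible_spurious": sum(
            1 for o in gt_objects if o.get("possible_spurious")
        ),
    }


summary = summarize_sample(
    gt_objects=[
        {"label": "cat", "mistakenness": 0.2},
        {"label": "dog", "mistakenness": 0.9},
        {"label": "bird", "possible_spurious": True},
    ],
    pred_objects=[
        {"label": "cat"},
        {"label": "horse", "possible_missing": True},
    ],
)
```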
-
fiftyone.brain.compute_uniqueness(samples, uniqueness_field='uniqueness', roi_field=None, embeddings=None, model=None, model_kwargs=None, force_square=False, alpha=None, batch_size=None, num_workers=None, skip_failures=True, progress=None)
Adds a uniqueness field to each sample scoring how unique it is with respect to the rest of the samples.
This function only uses the pixel data and can therefore process labeled or unlabeled samples.
If no embeddings or model is provided, a default model is used to generate embeddings.
Note
Runs of this method can be referenced later via brain key uniqueness_field.
- Parameters
samples – a fiftyone.core.collections.SampleCollection
uniqueness_field ("uniqueness") – the field name to use to store the uniqueness value for each sample
roi_field (None) – an optional fiftyone.core.labels.Detection, fiftyone.core.labels.Detections, fiftyone.core.labels.Polyline, or fiftyone.core.labels.Polylines field defining a region of interest within each image to use to compute uniqueness
embeddings (None) – if no model is provided, this argument specifies pre-computed embeddings to use, which can be any of the following:
    a num_samples x num_dims array of embeddings
    if roi_field is specified, a dict mapping sample IDs to num_patches x num_dims arrays of patch embeddings
    the name of a dataset field containing the embeddings to use
    If a model is provided, this argument specifies the name of a field in which to store the computed embeddings. In either case, when working with patch embeddings, you can provide either the fully-qualified path to the patch embeddings or just the name of the label attribute in roi_field
model (None) – a fiftyone.core.models.Model or the name of a model from the FiftyOne Model Zoo to use to generate embeddings. The model must expose embeddings (model.has_embeddings = True)
model_kwargs (None) – a dictionary of optional keyword arguments to pass to the model’s Config when a model name is provided
force_square (False) – whether to minimally manipulate the patch bounding boxes into squares prior to extraction. Only applicable when a model and roi_field are specified
alpha (None) – an optional expansion/contraction to apply to the patches before extracting them, in [-1, inf). If provided, the length and width of the box are expanded (or contracted, when alpha < 0) by (100 * alpha)%. For example, set alpha = 0.1 to expand the boxes by 10%, and set alpha = -0.1 to contract the boxes by 10%. Only applicable when a model and roi_field are specified
batch_size (None) – a batch size to use when computing embeddings. Only applicable when a model is provided
num_workers (None) – the number of workers to use when loading images. Only applicable when a Torch-based model is being used to compute embeddings
skip_failures (True) – whether to gracefully continue without raising an error if embeddings cannot be generated for a sample
progress (None) – whether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead
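To make the alpha parameter concrete, here is a minimal sketch of growing or shrinking a box by (100 * alpha)%. It assumes boxes are given as [x, y, width, height] and are scaled about their center. Both the box layout and the expand_box helper are assumptions for illustration, not the library's implementation.

```python
def expand_box(box, alpha):
    """Hypothetical helper: scale a [x, y, w, h] box about its center."""
    if alpha < -1:
        raise ValueError("alpha must be in [-1, inf)")
    x, y, w, h = box
    # Width and height grow (or shrink, when alpha < 0) by (100 * alpha)%
    new_w = w * (1 + alpha)
    new_h = h * (1 + alpha)
    # Shift the top-left corner so that the center stays fixed
    new_x = x - (new_w - w) / 2
    new_y = y - (new_h - h) / 2
    return [new_x, new_y, new_w, new_h]


expanded = expand_box([0.4, 0.4, 0.2, 0.2], alpha=0.1)
contracted = expand_box([0.4, 0.4, 0.2, 0.2], alpha=-0.1)
```

Under these assumptions, alpha = 0.1 widens a box of width 0.2 to 0.22, alpha = -0.1 narrows it to 0.18, and alpha = -1 collapses the box to zero size, which is why -1 is the lower bound.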
-
fiftyone.brain.compute_representativeness(samples, representativeness_field='representativeness', method='cluster-center', roi_field=None, embeddings=None, model=None, model_kwargs=None, force_square=False, alpha=None, batch_size=None, num_workers=None, skip_failures=True, progress=None)
Adds a representativeness field to each sample scoring how representative of nearby samples it is.
This function only uses the pixel data and can therefore process labeled or unlabeled samples.
If no embeddings or model is provided, a default model is used to generate embeddings.
Note
Runs of this method can be referenced later via brain key representativeness_field.
- Parameters
samples – a fiftyone.core.collections.SampleCollection
representativeness_field ("representativeness") – the field name to use to store the representativeness value for each sample
method ("cluster-center") – the name of the method to use to compute the representativeness. The supported values are ["cluster-center", "cluster-center-downweight"]. "cluster-center" will make a sample's representativeness proportional to its proximity to cluster centers, while "cluster-center-downweight" will ensure more diversity in representative samples
roi_field (None) – an optional fiftyone.core.labels.Detection, fiftyone.core.labels.Detections, fiftyone.core.labels.Polyline, or fiftyone.core.labels.Polylines field defining a region of interest within each image to use to compute representativeness
embeddings (None) – if no model is provided, this argument specifies pre-computed embeddings to use, which can be any of the following:
    a num_samples x num_dims array of embeddings
    if roi_field is specified, a dict mapping sample IDs to num_patches x num_dims arrays of patch embeddings
    the name of a dataset field containing the embeddings to use
    If a model is provided, this argument specifies the name of a field in which to store the computed embeddings. In either case, when working with patch embeddings, you can provide either the fully-qualified path to the patch embeddings or just the name of the label attribute in roi_field
model (None) – a fiftyone.core.models.Model or the name of a model from the FiftyOne Model Zoo to use to generate embeddings. The model must expose embeddings (model.has_embeddings = True)
model_kwargs (None) – a dictionary of optional keyword arguments to pass to the model’s Config when a model name is provided
force_square (False) – whether to minimally manipulate the patch bounding boxes into squares prior to extraction. Only applicable when a model and roi_field are specified
alpha (None) – an optional expansion/contraction to apply to the patches before extracting them, in [-1, inf). If provided, the length and width of the box are expanded (or contracted, when alpha < 0) by (100 * alpha)%. For example, set alpha = 0.1 to expand the boxes by 10%, and set alpha = -0.1 to contract the boxes by 10%. Only applicable when a model and roi_field are specified
batch_size (None) – a batch size to use when computing embeddings. Only applicable when a model is provided
num_workers (None) – the number of workers to use when loading images. Only applicable when a Torch-based model is being used to compute embeddings
skip_failures (True) – whether to gracefully continue without raising an error if embeddings cannot be generated for a sample
progress (None) – whether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead
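A typical use of the representativeness score is to pull out the highest-scoring samples. The sketch below wraps that in a small function. The most_representative helper and its k argument are hypothetical, samples is assumed to be a FiftyOne dataset or view, and the import is deferred so the function can be defined without FiftyOne installed.

```python
def most_representative(samples, k=10, method="cluster-center"):
    """Hypothetical helper: view of the k most representative samples.

    `samples` is assumed to be a FiftyOne dataset or view.
    """
    # Imported lazily so this sketch can be defined without FiftyOne
    import fiftyone.brain as fob

    # Populates the "representativeness" field on each sample
    fob.compute_representativeness(
        samples,
        representativeness_field="representativeness",
        method=method,
    )

    # Highest scores first, keeping only the top k
    return samples.sort_by("representativeness", reverse=True).limit(k)
```

Passing method="cluster-center-downweight" instead should favor a more diverse selection, per the parameter description above.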
-
fiftyone.brain.compute_visualization(samples, patches_field=None, embeddings=None, points=None, brain_key=None, num_dims=2, method=None, model=None, model_kwargs=None, force_square=False, alpha=None, batch_size=None, num_workers=None, skip_failures=True, progress=None, **kwargs)
Computes a low-dimensional representation of the samples’ media or their patches that can be interactively visualized.
The representation can be visualized by calling the visualize() method of the returned fiftyone.brain.visualization.VisualizationResults object.
If no embeddings or model is provided, the following default model is used to generate embeddings:
    import fiftyone.zoo as foz
    model = foz.load_zoo_model("mobilenet-v2-imagenet-torch")
You can use the method parameter to select the dimensionality reduction method to use, and you can optionally customize the method by passing additional parameters for the method’s fiftyone.brain.visualization.VisualizationConfig class as kwargs.
The builtin method values and their associated config classes are:
    "umap": fiftyone.brain.visualization.UMAPVisualizationConfig
    "tsne": fiftyone.brain.visualization.TSNEVisualizationConfig
    "manual": fiftyone.brain.visualization.ManualVisualizationConfig
- Parameters
samples – a fiftyone.core.collections.SampleCollection
patches_field (None) – a sample field defining the image patches in each sample that have been/will be embedded. Must be of type fiftyone.core.labels.Detection, fiftyone.core.labels.Detections, fiftyone.core.labels.Polyline, or fiftyone.core.labels.Polylines
embeddings (None) – if no model is provided, this argument specifies pre-computed embeddings to use, which can be any of the following:
    a dict mapping sample IDs to embedding vectors
    a num_samples x num_embedding_dims array of embeddings corresponding to the samples in samples
    if patches_field is specified, a dict mapping label IDs to embedding vectors
    if patches_field is specified, a dict mapping sample IDs to num_patches x num_embedding_dims arrays of patch embeddings
    the name of a dataset field containing the embeddings to use
    a fiftyone.brain.similarity.SimilarityIndex from which to retrieve embeddings for all samples/patches in samples
    If a model is provided, this argument specifies the name of a field in which to store the computed embeddings. In either case, when working with patch embeddings, you can provide either the fully-qualified path to the patch embeddings or just the name of the label attribute in patches_field
points (None) – a pre-computed low-dimensional representation to use. If provided, no embeddings will be used/computed. Can be any of the following:
    a dict mapping sample IDs to points vectors
    a num_samples x num_dims array of points corresponding to the samples in samples
    if patches_field is specified, a dict mapping label IDs to points vectors
    if patches_field is specified, a num_patches x num_dims array of points whose rows correspond to the flattened list of patches whose IDs are shown below:
        # The list of patch IDs that the rows of `points` must match
        _, id_field = samples._get_label_field_path(patches_field, "id")
        patch_ids = samples.values(id_field, unwind=True)
brain_key (None) – a brain key under which to store the results of this method
num_dims (2) – the dimension of the visualization space
method (None) – the dimensionality reduction method to use. The supported values are fiftyone.brain.brain_config.visualization_methods.keys() and the default is fiftyone.brain.brain_config.default_visualization_method
model (None) – a fiftyone.core.models.Model or the name of a model from the FiftyOne Model Zoo to use to generate embeddings. The model must expose embeddings (model.has_embeddings = True)
model_kwargs (None) – a dictionary of optional keyword arguments to pass to the model’s Config when a model name is provided
force_square (False) – whether to minimally manipulate the patch bounding boxes into squares prior to extraction. Only applicable when a model and patches_field are specified
alpha (None) – an optional expansion/contraction to apply to the patches before extracting them, in [-1, inf). If provided, the length and width of the box are expanded (or contracted, when alpha < 0) by (100 * alpha)%. For example, set alpha = 0.1 to expand the boxes by 10%, and set alpha = -0.1 to contract the boxes by 10%. Only applicable when a model and patches_field are specified
batch_size (None) – an optional batch size to use when computing embeddings. Only applicable when a model is provided
num_workers (None) – the number of workers to use when loading images. Only applicable when a Torch-based model is being used to compute embeddings
skip_failures (True) – whether to gracefully continue without raising an error if embeddings cannot be generated for a sample
progress (None) – whether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead
**kwargs – optional keyword arguments for the constructor of the fiftyone.brain.visualization.VisualizationConfig being used
- Returns
a fiftyone.brain.visualization.VisualizationResults
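When patches_field is specified and points is passed as a single array, its rows must follow the flattened list of patch IDs. The sketch below shows what that flattening means using plain lists: per-sample patch ID lists are concatenated in sample order, and row i of points belongs to the i-th flattened ID. The flatten_patch_ids helper and the example IDs are hypothetical and only stand in for the real patch ID list of a collection.

```python
def flatten_patch_ids(patch_ids_per_sample):
    """Hypothetical helper: flatten per-sample patch ID lists in order."""
    flat = []
    for ids in patch_ids_per_sample:
        flat.extend(ids)
    return flat


# Three samples with 2, 1, and 2 patches respectively
patch_ids = flatten_patch_ids([["a1", "a2"], ["b1"], ["c1", "c2"]])

# One 2D point per patch, in the same order as patch_ids
points = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8], [0.9, 1.0]]

# Row i of points corresponds to patch_ids[i]
point_for_patch = dict(zip(patch_ids, points))
```

If the number of rows in points does not equal the number of flattened patch IDs, the correspondence is ambiguous, so it is worth checking the lengths before passing the array.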
-
fiftyone.brain.compute_similarity(samples, patches_field=None, embeddings=None, brain_key=None, model=None, model_kwargs=None, force_square=False, alpha=None, batch_size=None, num_workers=None, skip_failures=True, progress=None, backend=None, **kwargs)
Uses embeddings to index the samples or their patches so that you can query/sort by similarity.
Calling this method only creates the index. You can then call the methods exposed on the returned fiftyone.brain.similarity.SimilarityIndex object to perform the following operations:
    sort_by_similarity(): Sort the samples in the collection by similarity to a specific example or example(s)
All indexes support querying by image similarity by passing sample IDs to sort_by_similarity(). In addition, if you pass the name of a model from the FiftyOne Model Zoo like model="clip-vit-base32-torch" that can embed prompts to this method, then you can query the index by text similarity as well.
In addition, if the backend supports it, you can call the following duplicate detection methods:
    find_duplicates(): Query the index to find all examples with near-duplicates in the collection
    find_unique(): Query the index to select a subset of examples of a specified size that are maximally unique with respect to each other
If no embeddings or model is provided, the following default model is used to generate embeddings:
    import fiftyone.zoo as foz
    model = foz.load_zoo_model("mobilenet-v2-imagenet-torch")
- Parameters
samples – a fiftyone.core.collections.SampleCollection
patches_field (None) – a sample field defining the image patches in each sample that have been/will be embedded. Must be of type fiftyone.core.labels.Detection, fiftyone.core.labels.Detections, fiftyone.core.labels.Polyline, or fiftyone.core.labels.Polylines
embeddings (None) – embeddings to feed the index. This argument’s behavior depends on whether a model is provided, as described below.
    If no model is provided, this argument specifies pre-computed embeddings to use:
        a num_samples x num_dims array of embeddings
        if patches_field is specified, a dict mapping sample IDs to num_patches x num_dims arrays of patch embeddings
        the name of a dataset field from which to load embeddings
        None: use the default model to compute embeddings
        False: do not compute embeddings right now
    If a model is provided, this argument specifies where to store the model’s embeddings:
        the name of a field in which to store the computed embeddings
        False: do not compute embeddings right now
    In either case, when working with patch embeddings, you can provide either the fully-qualified path to the patch embeddings or just the name of the label attribute in patches_field
brain_key (None) – a brain key under which to store the results of this method
model (None) – a fiftyone.core.models.Model or the name of a model from the FiftyOne Model Zoo to use, or that was already used, to generate embeddings. The model must expose embeddings (model.has_embeddings = True)
model_kwargs (None) – a dictionary of optional keyword arguments to pass to the model’s Config when a model name is provided
force_square (False) – whether to minimally manipulate the patch bounding boxes into squares prior to extraction. Only applicable when a model and patches_field are specified
alpha (None) – an optional expansion/contraction to apply to the patches before extracting them, in [-1, inf). If provided, the length and width of the box are expanded (or contracted, when alpha < 0) by (100 * alpha)%. For example, set alpha = 0.1 to expand the boxes by 10%, and set alpha = -0.1 to contract the boxes by 10%. Only applicable when a model and patches_field are specified
batch_size (None) – an optional batch size to use when computing embeddings. Only applicable when a model is provided
num_workers (None) – the number of workers to use when loading images. Only applicable when a Torch-based model is being used to compute embeddings
skip_failures (True) – whether to gracefully continue without raising an error if embeddings cannot be generated for a sample
progress (None) – whether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead
backend (None) – the similarity backend to use. The supported values are fiftyone.brain.brain_config.similarity_backends.keys() and the default is fiftyone.brain.brain_config.default_similarity_backend
**kwargs – keyword arguments for the fiftyone.brain.SimilarityConfig subclass of the backend being used
- Returns
a fiftyone.brain.similarity.SimilarityIndex
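Conceptually, sorting by similarity means ranking every embedding by its closeness to a query embedding. The sketch below does this by brute force with cosine similarity over a small dict of vectors. The metric, the brute-force search, and the sort_by_cosine_similarity helper are all assumptions for illustration; real similarity backends may use different metrics and index structures.

```python
import math


def sort_by_cosine_similarity(embeddings, query_id):
    """Hypothetical brute-force helper.

    embeddings: dict mapping sample IDs to embedding vectors
    Returns all sample IDs ordered from most to least similar to the query.
    """

    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm_u = math.sqrt(sum(a * a for a in u))
        norm_v = math.sqrt(sum(b * b for b in v))
        return dot / (norm_u * norm_v)

    query = embeddings[query_id]
    return sorted(
        embeddings,
        key=lambda sample_id: cosine(embeddings[sample_id], query),
        reverse=True,
    )


embeddings = {
    "s1": [1.0, 0.0],
    "s2": [0.9, 0.1],
    "s3": [0.0, 1.0],
}
ranked = sort_by_cosine_similarity(embeddings, "s1")
```

The query itself always ranks first here, since its similarity to itself is 1.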
-
fiftyone.brain.compute_exact_duplicates(samples, num_workers=None, skip_failures=True, progress=None)
Detects duplicate media in a sample collection.
This method detects exact duplicates with the same filehash. Use compute_similarity() to detect near-duplicate images.
If duplicates are found, the first instance in samples will be the key in the returned dictionary, while the subsequent duplicates will be the values in the corresponding list.
- Parameters
samples – a fiftyone.core.collections.SampleCollection
num_workers (None) – an optional number of processes to use
skip_failures (True) – whether to gracefully ignore samples whose filehash cannot be computed
progress (None) – whether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead
- Returns
a dictionary mapping IDs of samples with exact duplicates to lists of IDs of the duplicates for the corresponding sample
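The shape of the returned dictionary can be illustrated with a small standalone sketch that groups files by a hash of their contents. The find_exact_duplicates helper is hypothetical, and the choice of SHA-256 is an assumption; this page does not say which hash function the library uses.

```python
import hashlib
import os
import tempfile


def find_exact_duplicates(paths_by_id):
    """Hypothetical helper: group IDs whose files have identical bytes.

    paths_by_id: dict mapping sample IDs to file paths, in collection order
    """
    first_seen = {}  # content hash -> ID of the first sample with that hash
    duplicates = {}  # first sample ID -> list of later duplicate IDs
    for sample_id, path in paths_by_id.items():
        with open(path, "rb") as f:
            digest = hashlib.sha256(f.read()).hexdigest()
        if digest in first_seen:
            duplicates.setdefault(first_seen[digest], []).append(sample_id)
        else:
            first_seen[digest] = sample_id
    return duplicates


# Write four small files, three of which share the same contents
with tempfile.TemporaryDirectory() as tmp:
    contents = {"id1": b"cat", "id2": b"dog", "id3": b"cat", "id4": b"cat"}
    paths = {}
    for sample_id, data in contents.items():
        paths[sample_id] = os.path.join(tmp, sample_id + ".bin")
        with open(paths[sample_id], "wb") as f:
            f.write(data)
    result = find_exact_duplicates(paths)
```

As in the description above, the first instance ("id1") is the key and the later duplicates are its values, while samples without duplicates ("id2") do not appear at all.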