Removing Duplicate Objects
This recipe demonstrates a simple workflow for finding and removing duplicate objects in your FiftyOne datasets using intersection over union (IoU).
Specifically, it covers:
Using the compute_max_ious() utility to compute overlap between spatial objects
Using the App’s tagging UI to review and delete duplicate labels
Using FiftyOne’s CVAT integration to edit duplicate labels
Using the find_duplicates() utility to automatically detect duplicate objects
Also, check out our blog post for more information about using IoU to evaluate your object detection models.
Load a dataset
In this recipe, we’ll work with the validation split of the COCO dataset, which is conveniently available for download via the FiftyOne Dataset Zoo.
The snippet below downloads and loads a subset of the validation split into FiftyOne:
[1]:
import fiftyone as fo
import fiftyone.zoo as foz
dataset = foz.load_zoo_dataset("coco-2017", split="validation", max_samples=1000)
Downloading split 'validation' to '/Users/Brian/fiftyone/coco-2017/validation' if necessary
Found annotations at '/Users/Brian/fiftyone/coco-2017/raw/instances_val2017.json'
Sufficient images already downloaded
Existing download of split 'validation' is sufficient
Loading 'coco-2017' split 'validation'
100% |███████████████| 1000/1000 [4.9s elapsed, 0s remaining, 216.7 samples/s]
Dataset 'coco-2017-validation-1000' created
Let’s print the dataset to see what we downloaded:
[2]:
print(dataset)
Name: coco-2017-validation-1000
Media type: image
Num samples: 1000
Persistent: False
Tags: ['validation']
Sample fields:
id: fiftyone.core.fields.ObjectIdField
filepath: fiftyone.core.fields.StringField
tags: fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)
metadata: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.Metadata)
ground_truth: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)
Finding duplicate objects
Now let’s use the compute_max_ious() utility to compute, for each object in the ground_truth field, the maximum IoU with any other object of the same class (classwise=True) within the same image.
The max IoU will be stored in a max_iou attribute of each object. The idea is that a duplicate object will necessarily have a high IoU with another object.
[3]:
import fiftyone.utils.iou as foui
foui.compute_max_ious(dataset, "ground_truth", iou_attr="max_iou", classwise=True)
print("Max IoU range: (%f, %f)" % dataset.bounds("ground_truth.detections.max_iou"))
100% |███████████████| 1000/1000 [3.2s elapsed, 0s remaining, 348.2 samples/s]
Max IoU range: (0.000000, 0.951640)
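To make the max_iou attribute more concrete, here is a minimal pure-Python sketch of the idea. It assumes boxes in the [top-left-x, top-left-y, width, height] relative format that FiftyOne uses for detections. The helper names box_iou and max_ious and the example objects are hypothetical, written for illustration only; this is not how compute_max_ious() is implemented internally.

```python
def box_iou(box_a, box_b):
    """IoU of two boxes given as [x, y, width, height] (top-left corner)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b

    # Width and height of the overlapping rectangle (zero if disjoint)
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h

    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def max_ious(objects, classwise=True):
    """For each object, the largest IoU with any *other* object in the image."""
    scores = []
    for i, (label_i, box_i) in enumerate(objects):
        best = 0.0
        for j, (label_j, box_j) in enumerate(objects):
            # Skip the object itself and, if classwise, objects of other classes
            if i == j or (classwise and label_i != label_j):
                continue
            best = max(best, box_iou(box_i, box_j))
        scores.append(best)
    return scores


# Hypothetical objects from a single image: (label, [x, y, w, h])
objects = [
    ("cat", [0.10, 0.10, 0.40, 0.40]),
    ("cat", [0.11, 0.10, 0.40, 0.40]),  # near copy of the first cat
    ("dog", [0.10, 0.10, 0.40, 0.40]),  # same box as the first cat, other class
    ("cat", [0.70, 0.70, 0.20, 0.20]),  # far away from everything else
]

scores = max_ious(objects)
for (label, _), score in zip(objects, scores):
    print("%s: %.3f" % (label, score))
```

In this sketch both near-identical cat boxes receive a high score, since each is the other's best match, while the dog box scores zero because classwise matching ignores objects with a different label.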
Note that compute_max_ious() provides an optional other_field parameter if you would like to compute IoUs between objects in different fields instead.
In any case, let’s create a view that contains only labels with a max IoU > 0.75:
[4]:
from fiftyone import ViewField as F
# Retrieve detections that overlap above a chosen threshold
dups_view = dataset.filter_labels("ground_truth", F("max_iou") > 0.75)
print(dups_view)
Dataset: coco-2017-validation-1000
Media type: image
Num samples: 7
Tags: ['validation']
Sample fields:
id: fiftyone.core.fields.ObjectIdField
filepath: fiftyone.core.fields.StringField
tags: fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)
metadata: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.Metadata)
ground_truth: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)
View stages:
1. FilterLabels(field='ground_truth', filter={'$gt': ['$$this.max_iou', 0.75]}, only_matches=True, trajectories=False)
Now load the view in the App:
[8]:
session = fo.launch_app(view=dups_view)
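The effect of the filter used to build this view can be sketched in plain Python. The snippet below is only an illustration under stated assumptions: the samples dictionary, its file names, and the filter_by_max_iou helper are hypothetical stand-ins for a dataset whose labels already carry a max_iou value. It keeps labels above the threshold and drops images that are left with no labels, which mirrors the only_matches=True setting shown in the view stage above.

```python
# Hypothetical per-image labels, each already carrying a max_iou value
samples = {
    "img_001.jpg": [
        {"label": "cat", "max_iou": 0.95},
        {"label": "cat", "max_iou": 0.95},
        {"label": "dog", "max_iou": 0.10},
    ],
    "img_002.jpg": [
        {"label": "car", "max_iou": 0.30},
    ],
}


def filter_by_max_iou(samples, threshold=0.75):
    """Keep labels above the threshold and drop images left with none."""
    filtered = {}
    for filepath, labels in samples.items():
        kept = [label for label in labels if label["max_iou"] > threshold]
        if kept:  # analogous to only_matches=True
            filtered[filepath] = kept
    return filtered


matches = filter_by_max_iou(samples)
for filepath, labels in matches.items():
    print(filepath, [label["label"] for label in labels])
```

This is why the view contains far fewer samples than the full dataset: only images with at least one label above the threshold remain.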