Removing Duplicate Objects#

This recipe demonstrates a simple workflow for finding and removing duplicate objects in your FiftyOne datasets using intersection over union (IoU).

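Specifically, it covers loading a dataset, finding likely duplicate objects by their IoU, and reviewing and removing them in the App.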

Also, check out our blog post for more information about using IoU to evaluate your object detection models.

Setup#

If you haven’t already, install FiftyOne:

[ ]:
!pip install fiftyone

Load a dataset#

In this recipe, we’ll work with the validation split of the COCO dataset, which is conveniently available for download via the FiftyOne Dataset Zoo.

The snippet below downloads and loads a subset of the validation split into FiftyOne:

[1]:
import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("coco-2017", split="validation", max_samples=1000)
Downloading split 'validation' to '/Users/Brian/fiftyone/coco-2017/validation' if necessary
Found annotations at '/Users/Brian/fiftyone/coco-2017/raw/instances_val2017.json'
Sufficient images already downloaded
Existing download of split 'validation' is sufficient
Loading 'coco-2017' split 'validation'
 100% |███████████████| 1000/1000 [4.9s elapsed, 0s remaining, 216.7 samples/s]
Dataset 'coco-2017-validation-1000' created

Let’s print the dataset to see what we downloaded:

[2]:
print(dataset)
Name:        coco-2017-validation-1000
Media type:  image
Num samples: 1000
Persistent:  False
Tags:        ['validation']
Sample fields:
    id:           fiftyone.core.fields.ObjectIdField
    filepath:     fiftyone.core.fields.StringField
    tags:         fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)
    metadata:     fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.Metadata)
    ground_truth: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)

Finding duplicate objects#

Now let’s use the compute_max_ious() utility to compute, for each object in the ground_truth field, the maximum IoU between that object and any other object of the same class (classwise=True) within the same image.

The max IoU will be stored in a max_iou attribute of each object. The idea is that a duplicate object will necessarily have a high IoU with another object.

[3]:
import fiftyone.utils.iou as foui

foui.compute_max_ious(dataset, "ground_truth", iou_attr="max_iou", classwise=True)
print("Max IoU range: (%f, %f)" % dataset.bounds("ground_truth.detections.max_iou"))
 100% |███████████████| 1000/1000 [3.2s elapsed, 0s remaining, 348.2 samples/s]
Max IoU range: (0.000000, 0.951640)
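
For intuition, here is a minimal pure-Python sketch of the quantity being computed. It is not FiftyOne's implementation. It assumes boxes in the [top-left-x, top-left-y, width, height] relative format that FiftyOne uses for bounding boxes, and the helper names box_iou and max_ious, as well as the sample objects, are hypothetical:

```python
def box_iou(box_a, box_b):
    """IoU of two boxes given as [x, y, width, height]."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b

    # Width and height of the overlapping rectangle (zero if the boxes are disjoint)
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h

    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def max_ious(detections, classwise=True):
    """For each (label, box) pair, the highest IoU with any other object."""
    result = [0.0] * len(detections)
    for i, (label_i, box_i) in enumerate(detections):
        for j, (label_j, box_j) in enumerate(detections):
            if i == j:
                continue  # never compare an object with itself
            if classwise and label_i != label_j:
                continue  # only compare objects of the same class
            result[i] = max(result[i], box_iou(box_i, box_j))
    return result


# Hypothetical objects in one image: two nearly identical "cat" boxes and a "dog"
objects = [
    ("cat", [0.10, 0.10, 0.40, 0.40]),
    ("cat", [0.10, 0.10, 0.40, 0.38]),
    ("dog", [0.10, 0.10, 0.40, 0.40]),
]
print(max_ious(objects))
```

In this sketch the two "cat" boxes each get a max IoU of roughly 0.95, while the "dog" box gets 0 because classwise=True ignores objects of other classes, even ones that overlap it completely.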

Note that compute_max_ious() provides an optional other_field parameter if you would like to compute IoUs between objects in different fields instead.

In any case, let’s create a view that contains only labels with a max IoU > 0.75:

[4]:
from fiftyone import ViewField as F

# Retrieve detections that overlap above a chosen threshold
dups_view = dataset.filter_labels("ground_truth", F("max_iou") > 0.75)
print(dups_view)
Dataset:     coco-2017-validation-1000
Media type:  image
Num samples: 7
Tags:        ['validation']
Sample fields:
    id:           fiftyone.core.fields.ObjectIdField
    filepath:     fiftyone.core.fields.StringField
    tags:         fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)
    metadata:     fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.Metadata)
    ground_truth: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)
View stages:
    1. FilterLabels(field='ground_truth', filter={'$gt': ['$$this.max_iou', 0.75]}, only_matches=True, trajectories=False)
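
The view keeps only 7 of the 1000 samples because, as the only_matches=True setting in the stage above indicates, samples with no matching labels are omitted. The following pure-Python sketch imitates that behavior on plain dictionaries. It is an illustration only: the keep_high_iou helper and the sample data are hypothetical and not part of FiftyOne:

```python
def keep_high_iou(samples, threshold):
    """Keep labels whose max_iou exceeds threshold; drop samples left empty."""
    view = []
    for sample in samples:
        kept = [d for d in sample["detections"] if d["max_iou"] > threshold]
        if kept:  # mirrors only_matches=True: skip samples with no matches
            view.append({"filepath": sample["filepath"], "detections": kept})
    return view


# Hypothetical samples with precomputed max_iou values
samples = [
    {
        "filepath": "a.jpg",
        "detections": [
            {"label": "cat", "max_iou": 0.95},
            {"label": "cat", "max_iou": 0.95},
            {"label": "dog", "max_iou": 0.0},
        ],
    },
    {"filepath": "b.jpg", "detections": [{"label": "car", "max_iou": 0.30}]},
]

view = keep_high_iou(samples, 0.75)
print([(s["filepath"], len(s["detections"])) for s in view])
```

Like a FiftyOne view, the sketch leaves the underlying samples untouched. Only the returned list is filtered.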

Now load the view in the App:

[8]:
session = fo.launch_app(view=dups_view)

Removing duplicates in the App#

One simple approach to removing the duplicate labels is to review them in the App and assign label tags to the labels that we deem to be duplicates:

[9]:
session.show()