Dataset Curation

End-to-end curation pipeline: inspect quality, audit annotations, find duplicates, explore embeddings, and build curated splits.

Install

curl -sL skil.sh | sh -s -- voxel51/fiftyone-skills

When prompted, select fiftyone-dataset-curation from the menu.

Requirements

Usage

Load a dataset in FiftyOne, then ask your AI assistant:

"Curate my dataset: check quality, find duplicates, and build a clean training split"
"Audit the annotations in my detection dataset"
"Analyze class distribution and flag imbalanced classes"
"Create a stratified train/val/test split"

The skill runs each phase sequentially and presents findings before making any changes.
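To make the class-distribution and stratified-split prompts concrete, here is a minimal pure-Python sketch of the idea. It is not the skill's implementation: the function names `flag_imbalanced` and `stratified_split`, the 10% imbalance cutoff, the 70/15/15 ratios, and the toy labels are all assumptions chosen for illustration.

```python
import random
from collections import Counter, defaultdict


def flag_imbalanced(labels, min_fraction=0.1):
    """Return classes whose share of all labels is below min_fraction."""
    counts = Counter(labels)
    total = sum(counts.values())
    return sorted(c for c, n in counts.items() if n / total < min_fraction)


def stratified_split(labels, fractions=(0.7, 0.15), seed=51):
    """Assign sample indices to train/val/test one class at a time.

    fractions gives the train and val shares; test receives the remainder,
    so every index lands in exactly one split.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    splits = {"train": [], "val": [], "test": []}
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_train = round(len(indices) * fractions[0])
        n_val = round(len(indices) * fractions[1])
        splits["train"] += indices[:n_train]
        splits["val"] += indices[n_train:n_train + n_val]
        splits["test"] += indices[n_train + n_val:]
    return splits


# Toy labels standing in for one label per sample (hypothetical data)
labels = ["cat"] * 60 + ["dog"] * 35 + ["bird"] * 5
rare = flag_imbalanced(labels)
splits = stratified_split(labels)
print("Under-represented classes:", rare)
print({name: len(idxs) for name, idxs in splits.items()})
```

Splitting per class keeps each split's class proportions close to the full dataset's, which matters most for rare classes. With very small classes, rounding can leave a split with no samples of that class, so review the per-split counts before training.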

Example

import fiftyone as fo
import fiftyone.zoo as foz

# Download (on first use) and load the small "quickstart" zoo dataset
dataset = foz.load_zoo_dataset("quickstart")

Then ask your assistant:

"Curate the quickstart dataset: check for quality issues and near-duplicates"
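For intuition about what a near-duplicate check involves, the sketch below compares embedding vectors by cosine similarity and reports pairs above a threshold. This is an illustrative assumption, not the method the skill or FiftyOne uses: the `find_near_duplicates` helper, the 0.98 threshold, and the toy three-dimensional embeddings are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def find_near_duplicates(embeddings, threshold=0.98):
    """Return (i, j, similarity) for every pair at or above threshold."""
    pairs = []
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            sim = cosine_similarity(embeddings[i], embeddings[j])
            if sim >= threshold:
                pairs.append((i, j, sim))
    return pairs


# Toy embeddings standing in for per-image feature vectors (hypothetical)
embeddings = [
    [1.0, 0.0, 0.0],
    [0.99, 0.05, 0.0],  # points in almost the same direction as sample 0
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
duplicates = find_near_duplicates(embeddings)
for i, j, sim in duplicates:
    print(f"samples {i} and {j} look like near-duplicates (similarity {sim:.4f})")
```

The all-pairs loop is quadratic in the number of samples, so it only suits small datasets; larger collections generally need an approximate nearest-neighbor index instead.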

See also