Dataset Curation
End-to-end curation pipeline: inspect quality, audit annotations, find duplicates, explore embeddings, and build curated splits.
Install
curl -sL skil.sh | sh -s -- voxel51/fiftyone-skills
When prompted, select fiftyone-dataset-curation from the menu.
Requirements
Usage
Load a dataset in FiftyOne, then ask your AI assistant:
"Curate my dataset: check quality, find duplicates, and build a clean training split"
"Audit the annotations in my detection dataset"
"Analyze class distribution and flag imbalanced classes"
"Create a stratified train/val/test split"
The skill runs each phase sequentially and presents findings before making any changes.
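To make "audit the annotations" concrete, here is a minimal sketch of the kind of checks such an audit might run on detection labels. It is not the skill's code, and its checks are not documented here. The helper name `audit_detections`, the plain-dict sample layout, and the `min_area` threshold are hypothetical. The only assumption taken from FiftyOne is its box convention, where `bounding_box` is `[top-left x, top-left y, width, height]` in coordinates relative to the image, so valid values lie in [0, 1].

```python
from typing import Dict, List


# Hypothetical helper: illustrates typical annotation-audit checks.
# Assumes FiftyOne's relative box format [x, y, width, height], values in [0, 1].
def audit_detections(samples: List[Dict], min_area: float = 1e-4) -> List[Dict]:
    """Return one issue record per suspicious sample or detection."""
    issues = []
    for sample in samples:
        dets = sample.get("detections", [])
        if not dets:
            # An unlabeled image may be a true negative or a missed annotation.
            issues.append({"id": sample["id"], "issue": "no_labels"})
        for i, det in enumerate(dets):
            x, y, w, h = det["bounding_box"]
            if not det.get("label"):
                issues.append({"id": sample["id"], "det": i, "issue": "missing_label"})
            if w <= 0 or h <= 0:
                issues.append({"id": sample["id"], "det": i, "issue": "degenerate_box"})
            elif w * h < min_area:
                # min_area is an illustrative threshold, not a FiftyOne default.
                issues.append({"id": sample["id"], "det": i, "issue": "tiny_box"})
            if x < 0 or y < 0 or x + w > 1 or y + h > 1:
                issues.append({"id": sample["id"], "det": i, "issue": "out_of_bounds"})
    return issues


if __name__ == "__main__":
    samples = [
        {"id": "a", "detections": [{"label": "cat", "bounding_box": [0.1, 0.1, 0.5, 0.5]}]},
        {"id": "b", "detections": []},
        {"id": "c", "detections": [{"label": "", "bounding_box": [0.8, 0.2, 0.3, 0.1]}]},
    ]
    for issue in audit_detections(samples):
        print(issue)
```

Flagged items are candidates for review rather than automatic deletions, which matches the skill presenting findings before it changes anything.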
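A class-distribution check amounts to counting labels and comparing each class with the largest one. The sketch below shows one simple rule: flag any class with fewer than `ratio` times the count of the most common class. The function names and the 0.1 default are illustrative assumptions, not the skill's criteria.

```python
from collections import Counter
from typing import Iterable, List


def class_distribution(labels: Iterable[str]) -> Counter:
    """Count how many times each class label occurs."""
    return Counter(labels)


def flag_imbalanced(counts: Counter, ratio: float = 0.1) -> List[str]:
    """Flag classes with fewer than `ratio` x the count of the largest class.

    The 0.1 default is an illustrative assumption, not a standard cutoff.
    """
    if not counts:
        return []
    largest = max(counts.values())
    return sorted(label for label, n in counts.items() if n < ratio * largest)


if __name__ == "__main__":
    labels = ["car"] * 50 + ["person"] * 40 + ["bus"] * 5 + ["bike"] * 3
    counts = class_distribution(labels)
    print(counts.most_common())
    print(flag_imbalanced(counts))  # ['bike']
```

On a FiftyOne detection dataset such as quickstart, `dataset.count_values("ground_truth.detections.label")` returns a label-to-count dict that could be wrapped in `Counter` and passed to `flag_imbalanced`.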
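A stratified split divides each class across train/val/test in the same proportions, so rare classes are not left out of a split by chance. The following sketch shows the technique under the assumption that every sample has a single stratification label. That holds for classification, but a detection dataset would first need a per-image label, for example its most frequent class. That choice, the function name, and the seed are hypothetical, and this is not the skill's implementation.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def stratified_split(
    items: Sequence[Tuple[str, str]],  # (sample_id, stratification label)
    fractions: Dict[str, float],       # e.g. {"train": 0.7, "val": 0.2, "test": 0.1}
    seed: int = 0,
) -> Dict[str, List[str]]:
    """Split sample IDs so each label is divided across splits in the given proportions."""
    if abs(sum(fractions.values()) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")

    # Group sample IDs by label.
    by_label = defaultdict(list)
    for sample_id, label in items:
        by_label[label].append(sample_id)

    rng = random.Random(seed)  # seeded, so the split is reproducible
    names = list(fractions)
    splits: Dict[str, List[str]] = {name: [] for name in names}
    for label in sorted(by_label):
        ids = by_label[label]
        rng.shuffle(ids)
        start = 0
        for k, name in enumerate(names):
            if k == len(names) - 1:
                end = len(ids)  # last split takes the remainder lost to rounding
            else:
                end = min(len(ids), start + round(fractions[name] * len(ids)))
            splits[name].extend(ids[start:end])
            start = end
    return splits


if __name__ == "__main__":
    items = [(f"cat{i}", "cat") for i in range(10)] + [(f"dog{i}", "dog") for i in range(20)]
    splits = stratified_split(items, {"train": 0.7, "val": 0.2, "test": 0.1})
    print({name: len(ids) for name, ids in splits.items()})
```

Giving the remainder to the last split guarantees every sample lands in exactly one split, at the cost of that split absorbing rounding error for very small classes.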
Example
import fiftyone as fo
import fiftyone.zoo as foz
# Download (on first use) and load the small "quickstart" dataset from the FiftyOne Dataset Zoo
dataset = foz.load_zoo_dataset("quickstart")
Then ask your assistant:
"Curate the quickstart dataset: check for quality issues and near-duplicates"
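Near-duplicate detection is commonly done by comparing image embeddings. The sketch below illustrates that idea with pairwise cosine similarity. It assumes you already have one embedding vector per sample, for example from an image model. How the skill computes embeddings, and which threshold it uses, are not described here. The function names and the 0.95 threshold are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def near_duplicate_pairs(
    embeddings: Dict[str, Sequence[float]], threshold: float = 0.95
) -> List[Tuple[str, str, float]]:
    """Return (id_a, id_b, similarity) for every pair at or above `threshold`."""
    ids = sorted(embeddings)
    pairs = []
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            sim = cosine(embeddings[a], embeddings[b])
            if sim >= threshold:
                pairs.append((a, b, sim))
    return pairs


def duplicates_to_drop(pairs: List[Tuple[str, str, float]]) -> Set[str]:
    """Greedy rule: keep the first sample of each pair and drop the second,
    unless the first has already been marked for dropping."""
    drop: Set[str] = set()
    for a, b, _ in pairs:
        if a not in drop:
            drop.add(b)
    return drop


if __name__ == "__main__":
    embeddings = {
        "img1": [1.0, 0.0, 0.0],
        "img2": [0.99, 0.01, 0.0],
        "img3": [0.0, 1.0, 0.0],
    }
    pairs = near_duplicate_pairs(embeddings)
    print(pairs)
    print(duplicates_to_drop(pairs))
```

The all-pairs loop is O(n²) and only suits small datasets. At scale, a nearest-neighbor index (for example one built with the FiftyOne Brain's similarity tooling) would be the more practical choice.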