Note

This is a Hugging Face dataset. Learn how to load datasets from the Hub in the Hugging Face integration docs.

Dataset Card for Homework Training Set for Coursera MOOC - Hands-on Data Centric Visual AI#

This dataset is the training dataset for the homework assignments of the Hands-on Data Centric AI Coursera course.

This is a FiftyOne dataset with 18287 samples.

Installation#

If you haven’t already, install FiftyOne:

pip install -U fiftyone

Usage#

import fiftyone as fo
import fiftyone.utils.huggingface as fouh

# Load the dataset
# Note: other available arguments include 'max_samples', etc
dataset = fouh.load_from_hub("Voxel51/Coursera_homework_dataset_train")

# Launch the App
session = fo.launch_app(dataset)

Dataset Details#

Dataset Description#

This dataset is a modified subset of the LVIS dataset.

This dataset contains only detections, some of which have been artificially perturbed to demonstrate data centric AI techniques for the course.

This dataset has the following labels:

  • ‘bolt’

  • ‘knob’

  • ‘tag’

  • ‘button’

  • ‘bottle_cap’

  • ‘belt’

  • ‘strap’

  • ‘necktie’

  • ‘shirt’

  • ‘sweater’

  • ‘streetlight’

  • ‘pole’

  • ‘reflector’

  • ‘headlight’

  • ‘taillight’

  • ‘traffic_light’

  • ‘rearview_mirror’
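As a rough sketch, the class list above can be kept as a Python constant and used to tally labels or to catch unexpected class names once labels have been pulled out of the dataset. The helper name `count_labels` and the `sample_labels` list are hypothetical and introduced here for illustration only; the sample values are not drawn from the dataset.

```python
from collections import Counter

# The 17 class names listed in this dataset card.
CLASSES = [
    "bolt", "knob", "tag", "button", "bottle_cap",
    "belt", "strap", "necktie", "shirt", "sweater",
    "streetlight", "pole", "reflector", "headlight",
    "taillight", "traffic_light", "rearview_mirror",
]


def count_labels(labels):
    """Tally known labels and collect any label not in CLASSES."""
    known = set(CLASSES)
    counts = Counter(label for label in labels if label in known)
    unexpected = sorted({label for label in labels if label not in known})
    return counts, unexpected


# Hypothetical labels for illustration only; not taken from the dataset.
sample_labels = ["shirt", "sweater", "shirt", "pole", "lamp"]
counts, unexpected = count_labels(sample_labels)
print(counts.most_common())
print(unexpected)
```

With a loaded FiftyOne dataset, a similar tally can likely be obtained directly with `dataset.count_values(...)` on the label path of the detections field. This card does not state that field's name, so it would need to be looked up in the dataset's schema first.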

Dataset Sources#

  • Repository: https://www.lvisdataset.org/

  • Paper: https://arxiv.org/abs/1908.03195

Uses#

The labels in this dataset have been perturbed to illustrate data centric AI techniques for the Hands-on Data Centric AI Coursera MOOC.

Dataset Structure#

Each image in the dataset comes with detailed annotations in FiftyOne detection format. A typical annotation looks like this:

<Detection: {
    'id': '66a2f24cce2f9d11d98d3a21',
    'attributes': {},
    'tags': [],
    'label': 'shirt',
    'bounding_box': [
        0.25414,
        0.35845238095238097,
        0.041960000000000004,
        0.051011904761904765,
    ],
    'mask': None,
    'confidence': None,
    'index': None,
}>
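FiftyOne stores `bounding_box` as `[top-left-x, top-left-y, width, height]` in coordinates relative to the image dimensions, so each value lies in `[0, 1]`. The sketch below converts the box from the example above to pixel units. The 640x480 image size and the helper name `to_pixel_box` are assumptions made for illustration, since the card does not give the image's dimensions.

```python
def to_pixel_box(bounding_box, image_width, image_height):
    """Convert a relative [x, y, w, h] box to pixel units (x, y, w, h)."""
    x, y, w, h = bounding_box
    return (
        x * image_width,
        y * image_height,
        w * image_width,
        h * image_height,
    )


# Box copied from the example detection above.
bbox = [
    0.25414,
    0.35845238095238097,
    0.041960000000000004,
    0.051011904761904765,
]

# Hypothetical image size; in practice read it from the image or its metadata.
px_x, px_y, px_w, px_h = to_pixel_box(bbox, image_width=640, image_height=480)
print(f"top-left=({px_x:.1f}, {px_y:.1f}) size=({px_w:.1f} x {px_h:.1f})")
```

Keeping boxes in relative coordinates means the same annotation stays valid if the image is resized, which is why the conversion takes the image size as an explicit argument.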

Dataset Creation#

Curation Rationale#

These labels were selected because the objects they describe can be confusing to a model, which makes them a good choice for demonstrating data centric AI techniques.

Source Data#

This is a subset of the LVIS dataset.

Citation#

BibTeX:

@inproceedings{gupta2019lvis,
  title={{LVIS}: A Dataset for Large Vocabulary Instance Segmentation},
  author={Gupta, Agrim and Dollar, Piotr and Girshick, Ross},
  booktitle={Proceedings of the {IEEE} Conference on Computer Vision and Pattern Recognition},
  year={2019}
}