Note
This is a Hugging Face dataset. Learn how to load datasets from the Hub in the Hugging Face integration docs.
Dataset Card for Egocentric_10K_Evaluation#

This is a FiftyOne dataset with 30,000 samples.
Installation#
If you haven’t already, install FiftyOne:
pip install -U fiftyone
Usage#
import fiftyone as fo
from fiftyone.utils.huggingface import load_from_hub
# Load the dataset
# Note: other available arguments include 'max_samples', etc
dataset = load_from_hub("Voxel51/Egocentric_10K_Evaluation")
# Launch the App
session = fo.launch_app(dataset)
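
For quick exploration, you can load just a subset of the 30,000 samples; the max_samples argument mentioned in the comment above limits how much is downloaded:

import fiftyone as fo
from fiftyone.utils.huggingface import load_from_hub

# Load only the first 1,000 samples for a quick look
dataset = load_from_hub(
    "Voxel51/Egocentric_10K_Evaluation",
    max_samples=1000,
)

# Launch the App on the subset
session = fo.launch_app(dataset)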
Dataset Details#
Dataset Description#
Egocentric-10K-Evaluation is a benchmark evaluation set and analysis protocol for large-scale egocentric (first-person) video datasets. It measures hand visibility and active manipulation in real-world, in-the-wild scenarios, which is especially relevant for robotics, computer vision, and training AI agents on manipulation tasks.[1][2][3]
Curated by: builddotai
Shared by: builddotai
License: Apache 2.0
Dataset Sources#
Repository: https://huggingface.co/datasets/builddotai/Egocentric-10K-Evaluation
Uses#
Direct Use#
This dataset is intended for benchmarking egocentric video data with respect to hand presence and active object manipulation, enabling standardized analysis, dataset comparison, and the development/evaluation of perception and robotics models centered on real-world human skill tasks.
Dataset Structure#
Egocentric-10K-Evaluation consists of 10,000 frames sampled from factory egocentric video, plus comparable samples from other major egocentric datasets (Ego4D, EPIC-KITCHENS). Each sample includes JSON metadata, a hand-count annotation (0, 1, or 2 visible hands), and a binary label for the presence or absence of active manipulation. The splits are standardized, and additional metadata includes dataset, worker, and video index references.
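As a quick sanity check on this structure, you can print the dataset's schema and inspect one sample. The exact FiftyOne field names that hold the hand count and manipulation label are not documented here, so confirm them from the printed schema rather than assuming specific names:

import fiftyone as fo
from fiftyone.utils.huggingface import load_from_hub

dataset = load_from_hub("Voxel51/Egocentric_10K_Evaluation", max_samples=100)

# Print the schema to see which fields store the hand-count and
# manipulation labels described above
print(dataset)

# Inspect the metadata attached to a single sample
sample = dataset.first()
print(sample)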
Dataset Creation#
Curation Rationale#
To create a standardized benchmark for hand visibility and manipulation, facilitating research on manipulation-heavy tasks in robotics and AI using real industrial and skill-focused footage.
Source Data#
Data Collection and Processing#
The evaluation set comprises frames drawn from the primary Egocentric-10K dataset (real-world factory footage collected via head-mounted cameras), along with standardized samples from the open egocentric datasets Ego4D and EPIC-KITCHENS for comparison. Data is provided as 1080p, 30 FPS H.265 MP4 video with structured JSON metadata and hand/manipulation annotations.
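As a rough sketch for comparing the sources, assuming the per-sample metadata exposes the originating dataset under a field named "source" (a hypothetical name; check the schema for the real one), you could count and filter samples per source:

import fiftyone as fo
from fiftyone import ViewField as F
from fiftyone.utils.huggingface import load_from_hub

dataset = load_from_hub("Voxel51/Egocentric_10K_Evaluation")

# "source" is an assumed field name for the originating dataset
# (Egocentric-10K, Ego4D, or EPIC-KITCHENS); adjust to the actual schema
print(dataset.count_values("source"))

# Restrict to frames drawn from EPIC-KITCHENS, for example
epic_view = dataset.match(F("source") == "EPIC-KITCHENS")
print(len(epic_view))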
Who are the source data producers?#
Egocentric-10K’s original video data was produced by real factory workers wearing head-mounted cameras while performing natural work-line activities. Annotation followed the strict guidelines described in the evaluation schema.
Annotations#
Annotation process#
Each sampled frame is annotated with the number of visible hands (0/1/2, following detailed rules) and whether the hands are engaged in active manipulation (“yes”/“no” per an explicit definition). The annotation schema and rules are detailed in the benchmark documentation.
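A minimal sketch of summarizing these annotations, again assuming hypothetical field names ("hand_count" for the 0/1/2 label and "active_manipulation" for the yes/no label) that should be replaced with the dataset's actual field names:

import fiftyone as fo
from fiftyone import ViewField as F
from fiftyone.utils.huggingface import load_from_hub

dataset = load_from_hub("Voxel51/Egocentric_10K_Evaluation")

# Distribution of visible-hand counts (field name is an assumption)
print(dataset.count_values("hand_count"))

# Fraction of frames labeled as active manipulation (field name is an assumption)
manipulating = dataset.match(F("active_manipulation") == "yes")
print(len(manipulating) / len(dataset))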