Note: This is a Hugging Face dataset. For large datasets, ensure `huggingface_hub>=1.1.3` is installed to avoid rate limits. Learn more in the Hugging Face integration docs.
CholecT50 Dataset (FiftyOne Format)#
This is a FiftyOne dataset version of the CholecT50 dataset.
CholecT50 is a dataset of laparoscopic cholecystectomy surgeries, annotated with surgical action triplets. It is the first public dataset to provide action triplet annotations for surgical videos, enabling research in fine-grained surgical activity recognition.

Dataset Description#
Repository: CAMMA-public/cholect50
Paper: Rendition of CholecT50: A dataset for surgical action triplet recognition
Point of Contact: Chinedu Innocent Nwoye
Dataset Summary#
CholecT50 consists of 50 videos of laparoscopic cholecystectomy surgeries. Each frame is annotated with action triplets of the form <Instrument, Verb, Target>.
Instruments: 7 categories (e.g., grasper, bipolar, hook)
Verbs: 10 categories (e.g., grasp, retract, cauterize)
Targets: 15 categories (e.g., gallbladder, cystic duct, liver)
The dataset contains approximately 100,000 annotated frames, providing a rich source for training and evaluating models for surgical video understanding.
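An action triplet pairs one instrument with one verb and one target. As a minimal illustration (this record type is hypothetical, not part of the dataset's API), a triplet can be modeled with a small dataclass, using category names from the examples above:

```python
from dataclasses import dataclass

# Hypothetical record type for illustration only; the dataset itself
# stores triplets as FiftyOne label fields, not this class
@dataclass(frozen=True)
class Triplet:
    instrument: str
    verb: str
    target: str

# Example built from the category names listed above
t = Triplet(instrument="grasper", verb="grasp", target="gallbladder")
print(t.instrument, t.verb, t.target)
```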
Installation#
If you haven’t already, install FiftyOne:
```
pip install -U fiftyone
```
Usage#
You can load the dataset from the Hugging Face Hub using FiftyOne:
```python
import fiftyone as fo
import fiftyone.utils.huggingface as fouh

# Load the dataset
# Note: other available arguments include 'max_samples', etc.
dataset = fouh.load_from_hub("Voxel51/cholect50")

# Launch the App
session = fo.launch_app(dataset)
```
Dataset Structure#
The dataset is organized as a FiftyOne dataset. Each sample represents a video frame and includes the following fields:
filepath: The path to the frame image.
video_id: The ID of the video the frame belongs to.
frame_number: The frame number within the video.
instruments: Classifications for the instruments present in the frame.
verbs: Classifications for the verbs (actions) being performed.
targets: Classifications for the targets of the actions.
triplets: The full action triplet annotations.
clip_embeddings: 512-dimensional embeddings generated using a CLIP model (ViT-B/32).
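The clip_embeddings field stores 512-dimensional CLIP (ViT-B/32) vectors, which can be compared with cosine similarity for tasks like frame retrieval. A minimal, self-contained sketch, using synthetic vectors in place of real embeddings:

```python
import math
import random

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Synthetic stand-ins for the dataset's 512-dimensional CLIP embeddings
random.seed(0)
emb_a = [random.gauss(0, 1) for _ in range(512)]
emb_b = [random.gauss(0, 1) for _ in range(512)]

print(cosine_similarity(emb_a, emb_a))  # identical vectors: close to 1.0
print(cosine_similarity(emb_a, emb_b))  # unrelated vectors: near 0.0
```

In practice you would read the stored vectors from each sample's clip_embeddings field; FiftyOne also provides built-in similarity/brain methods that operate on embeddings directly.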
Citation#
If you use the CholecT50 dataset in your research, please cite the following paper:
```bibtex
@article{nwoye2022rendition,
  title={Rendition of CholecT50: A dataset for surgical action triplet recognition},
  author={Nwoye, Chinedu Innocent and Yu, Tong and Zahid, Anwar and Al Hajj, Hassan and Mutschler, Cristopher and Padoy, Nicolas},
  journal={arXiv preprint arXiv:2202.05324},
  year={2022}
}
```
License#
The CholecT50 dataset is released under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license.