> **Note:** This is a Hugging Face dataset. Learn how to load datasets from the Hub in the Hugging Face integration docs.
# Dataset Card for harpreetsahota/sku110k_test

This is a FiftyOne dataset with 2936 samples.
## Installation

If you haven't already, install FiftyOne:

```shell
pip install -U fiftyone
```
## Usage

```python
import fiftyone as fo
from fiftyone.utils.huggingface import load_from_hub

# Load the dataset
# Note: other available arguments include 'max_samples', etc.
dataset = load_from_hub("Voxel51/sku110k_test")

# Launch the App
session = fo.launch_app(dataset)
```
# Dataset Card for SKU-110K (test split)

## Dataset Details

### Dataset Description
The SKU-110K dataset is a large-scale benchmark for object detection in densely packed retail scenes. It consists of 11,762 images of retail shelves from thousands of supermarkets worldwide, encompassing diverse geographic locations including the United States, Europe, and East Asia. The dataset contains over 1.73 million bounding box annotations, with an average of approximately 147 objects per image. All images have been resized to a resolution of one million pixels.
The dataset addresses the challenge of precise detection in densely packed scenes where objects are closely positioned, often overlapping, and typically oriented within a range of [-15°, 15°]. This makes it particularly valuable for developing and evaluating object detection algorithms for real-world retail applications where traditional detection methods often struggle due to extreme object density and occlusion.
- **Curated by:** Eran Goldman, Roei Herzig, Aviv Eisenschtat, Jacob Goldberger, and Tal Hassner
- **Funded by:** Trax (based on license information)
- **Shared by:** Research team from Bar-Ilan University and Trax
- **Language(s) (NLP):** Not applicable (computer vision dataset)
- **License:** Academic and non-commercial use only (proprietary license held by Trax)
### Dataset Sources

- **Repository:** https://github.com/eg4000/SKU110K_CVPR19
- **Paper:** "Precise Detection in Densely Packed Scenes" (CVPR 2019)
- **ArXiv:** https://arxiv.org/abs/1904.00853
## Uses

### Direct Use
The SKU-110K dataset is designed for the following use cases:
- **Object Detection Research:** Training and evaluating object detection models, particularly for densely packed scenes
- **Retail Analytics:** Developing algorithms for automated shelf monitoring, inventory management, and planogram compliance
- **Benchmark Evaluation:** Comparing the performance of detection algorithms in challenging, high-density scenarios
- **Dense Object Detection:** Research on handling extreme object density, occlusion, and scale variation
- **Academic Research:** Educational purposes and non-commercial research projects
The dataset is particularly suitable for:
- Studying detection performance in scenes with 50-200+ objects per image
- Developing algorithms robust to varying lighting conditions, viewpoints, and scales
- Research on handling closely packed objects with minimal spacing
## Dataset Structure

The dataset is organized into three splits with CSV annotation files:

### Split Statistics

| Split | Images | Annotations | Avg. Objects/Image |
|---|---|---|---|
| Train | 8,233 | 1,208,482 | ~147 |
| Validation | 588 | 90,968 | ~155 |
| Test | 2,941 | 431,546 | ~147 |
| Total | 11,762 | 1,730,996 | ~147 |
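As a quick consistency check, the "Avg. Objects/Image" column follows directly from the image and annotation counts (a small Python sketch; the figures are copied from the table above, not recomputed from the raw CSVs):

```python
# Per-split (images, annotations) counts, as reported in the table above.
splits = {
    "train": (8_233, 1_208_482),
    "validation": (588, 90_968),
    "test": (2_941, 431_546),
}

for name, (images, annotations) in splits.items():
    print(f"{name}: ~{annotations / images:.0f} objects/image")

total_images = sum(i for i, _ in splits.values())
total_boxes = sum(a for _, a in splits.values())
print(f"total: {total_images:,} images, {total_boxes:,} annotations")
# → total: 11,762 images, 1,730,996 annotations
```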
### Annotation Format

The CSV annotation files contain the following columns:

- `image_name`: Filename of the image (e.g., "test_0.jpg")
- `x1`: X-coordinate of the top-left corner of the bounding box (pixels)
- `y1`: Y-coordinate of the top-left corner of the bounding box (pixels)
- `x2`: X-coordinate of the bottom-right corner of the bounding box (pixels)
- `y2`: Y-coordinate of the bottom-right corner of the bounding box (pixels)
- `class`: Class label (all objects labeled as "object"; no fine-grained categories)
- `image_width`: Width of the image in pixels
- `image_height`: Height of the image in pixels
Note: Each annotation appears on a separate line in the CSV file, meaning images with multiple objects have multiple rows.
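A minimal sketch for grouping the per-line annotations by image (standard-library Python; the column order is assumed from the list above, the rows are assumed headerless, and the file path in any call is a placeholder, so check your local copy of the CSVs):

```python
import csv
from collections import defaultdict

# Column order as listed above; rows are assumed to have no header line.
COLUMNS = ["image_name", "x1", "y1", "x2", "y2",
           "class", "image_width", "image_height"]

def load_annotations(path):
    """Group bounding boxes by image: {image_name: [(x1, y1, x2, y2), ...]}."""
    boxes = defaultdict(list)
    with open(path, newline="") as f:
        for row in csv.reader(f):
            rec = dict(zip(COLUMNS, row))
            boxes[rec["image_name"]].append(
                (int(rec["x1"]), int(rec["y1"]),
                 int(rec["x2"]), int(rec["y2"]))
            )
    return dict(boxes)
```

Because each annotation occupies its own CSV row, images with many objects simply yield longer lists under the same key.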
### FiftyOne Dataset Structure

The dataset has been converted to FiftyOne format with the following enhancements:

#### Base Structure

- **Dataset Name:** `sku110k_test` (test split)
- **Sample Structure:** Each sample represents one image with its associated detections
- **Image Path:** `SKU110K_fixed/images/{image_name}`
- **Detection Field:** `ground_truth` (a FiftyOne `Detections` object)
#### Bounding Box Format

Bounding boxes are stored in FiftyOne's normalized `[x, y, width, height]` format, where all values are in the range [0, 1]:

- `x`: Normalized x-coordinate of the top-left corner (`x1 / image_width`)
- `y`: Normalized y-coordinate of the top-left corner (`y1 / image_height`)
- `width`: Normalized width (`(x2 - x1) / image_width`)
- `height`: Normalized height (`(y2 - y1) / image_height`)
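The corner-to-normalized conversion described above can be sketched as follows (plain Python; the function name is illustrative, not part of the dataset tooling):

```python
def to_fiftyone_box(x1, y1, x2, y2, image_width, image_height):
    """Convert absolute corner coordinates (pixels) to FiftyOne's
    normalized [x, y, width, height] format (all values in [0, 1])."""
    return [
        x1 / image_width,
        y1 / image_height,
        (x2 - x1) / image_width,
        (y2 - y1) / image_height,
    ]

# A 100 x 50 pixel box with its top-left corner at (200, 100)
# in an 800 x 600 image:
box = to_fiftyone_box(200, 100, 300, 150, 800, 600)
print(box)  # → [0.25, 0.16666..., 0.125, 0.08333...]
```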
#### Enriched Fields

The FiftyOne dataset includes the following enrichments:

- **Bounding Box Areas** (`area` field on each detection)
  - Computed as `width × height` (in normalized coordinates)
  - Range: [0, 1], representing the proportion of the image covered
- **Detection Counts** (`num_detections` field at the sample level)
  - Integer count of objects detected in each image
  - Useful for filtering and analyzing image complexity
- **RADIO Embeddings** (`radio_embeddings` field at the sample level)
  - Global semantic features extracted using the C-RADIO v3-h model
  - High-dimensional vectors capturing visual semantics
  - Enable similarity search and clustering
- **UMAP Visualization** (brain key: `radio_viz`)
  - 2D projection of the RADIO embeddings for visualization
  - Allows exploration of visual similarity patterns
  - Interactive visualization in the FiftyOne App
- **Attention Heatmaps** (`radio_heatmap` field at the sample level)
  - Spatial attention maps from the C-RADIO v3-h model
  - Generated with smoothing (sigma = 0.51)
  - Format: NCHW (channels first)
  - Highlight salient regions in each image
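The first two enrichments are simple functions of the normalized boxes. A dependency-free sketch of how such values can be derived (the field names follow the description above; this is not the code that was used to build the dataset):

```python
def enrich(normalized_boxes):
    """Given one sample's boxes in normalized [x, y, width, height]
    format, return the per-box areas and the detection count."""
    areas = [w * h for _, _, w, h in normalized_boxes]
    num_detections = len(normalized_boxes)
    return areas, num_detections

# Two small boxes covering 2% and 1% of the image, respectively:
areas, count = enrich([[0.1, 0.1, 0.2, 0.1], [0.5, 0.5, 0.1, 0.1]])
```

With `num_detections` stored on each sample, dense scenes can then be isolated in FiftyOne with a match stage such as `dataset.match(F("num_detections") > 150)` (where `F` is `fiftyone.ViewField`).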
## Dataset Creation

### Curation Rationale
The SKU-110K dataset was created to address a critical gap in object detection research: the lack of large-scale datasets for densely packed scenes. While existing datasets like COCO and Pascal VOC contain object detection annotations, they typically feature relatively sparse scenes with well-separated objects. Real-world retail environments present unique challenges:
- **Extreme Density:** Shelves contain 50-200+ products in close proximity
- **Heavy Occlusion:** Objects frequently overlap and obscure one another
- **Scale Variation:** Products vary greatly in size within the same scene
- **Orientation Patterns:** Most objects are aligned within the [-15°, 15°] range
The dataset enables research on precise localization and detection algorithms capable of handling these challenging conditions, with applications in automated retail analytics, inventory management, and planogram compliance.
### Source Data

#### Data Collection and Processing
- **Collection Method:** Images captured in thousands of supermarket stores worldwide
- **Geographic Diversity:** United States, Europe, and East Asia
- **Scene Variation:** Diverse scales, viewpoints, lighting conditions, and noise levels
- **Image Processing:** All images resized to one million pixels for consistency
- **Quality Control:** Images selected to represent challenging, densely packed scenarios
- **Annotation Tool:** Manual annotation using bounding box annotation software
- **Format:** CSV files with one annotation per line
The dataset focuses on “in-the-wild” conditions with natural variations in:
- Camera angles and distances
- Lighting (fluorescent, natural, mixed)
- Shelf arrangements and product placement
- Image quality and noise levels
#### Who are the source data producers?
The source images were captured from retail stores operated by various supermarket chains across multiple continents. The images represent real retail environments and were collected through Trax, a retail technology company specializing in computer vision solutions for in-store execution.
### Annotations

#### Annotation process
- **Annotation Type:** Manual bounding box annotation
- **Annotation Guidelines:** Annotators were instructed to draw tight bounding boxes around each visible product on retail shelves
- **Class Labels:** All objects labeled uniformly as "object" (no product-level categorization)
- **Annotation Density:** An average of 147 bounding boxes per image, with some images containing 200+ annotations
- **Quality Assurance:** Manual review and validation to ensure annotation accuracy
- **Tools Used:** Professional annotation tools for computer vision tasks
- **Completeness:** All visible products in each image were annotated
Note: The dataset does not include fine-grained product categories or SKU-level identification. All objects are labeled with a single “object” class, making this a class-agnostic detection task focused on localization precision rather than classification.
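Because every box carries the same "object" label, evaluating detections on this dataset reduces to geometric matching between predicted and ground-truth boxes, typically via intersection over union (IoU). A minimal sketch in pixel coordinates:

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)
    corner coordinates in pixels."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# Two 10x10 boxes overlapping by half give IoU = 50 / 150 ≈ 0.33:
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))
```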
#### Who are the annotators?
The annotations were created by trained professional annotators working with the research team. Specific demographic information about the annotators is not publicly available. The annotation process was conducted with quality control measures to ensure consistency and accuracy across the large annotation volume (1.7M+ bounding boxes).
### Personal and Sensitive Information
The dataset consists of images of retail shelf scenes containing packaged products. The images do not intentionally capture or focus on people. However, users should be aware that:
- Retail environments are public spaces where incidental capture of individuals may occur
- Product brands and packaging visible in the images are proprietary to their respective manufacturers
- Store layouts and product arrangements may be considered proprietary information
The dataset is provided with restrictions on redistribution and commercial use to protect potential proprietary interests.
## Citation

### BibTeX

```bibtex
@inproceedings{goldman2019dense,
  author    = {Eran Goldman and Roei Herzig and Aviv Eisenschtat and Jacob Goldberger and Tal Hassner},
  title     = {Precise Detection in Densely Packed Scenes},
  booktitle = {Proc. Conf. Comput. Vision Pattern Recognition (CVPR)},
  year      = {2019}
}
```
### APA
Goldman, E., Herzig, R., Eisenschtat, A., Goldberger, J., & Hassner, T. (2019). Precise Detection in Densely Packed Scenes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 5227-5236).
## More Information

### Additional Resources

- **GitHub Repository:** https://github.com/eg4000/SKU110K_CVPR19
- **ArXiv Paper:** https://arxiv.org/abs/1904.00853
- **FiftyOne Documentation:** https://docs.voxel51.com/
- **RADIO Model:** https://github.com/harpreetsahota204/NVLabs_CRADIOV3