Note
This is a Hugging Face dataset. For large datasets, ensure huggingface_hub>=1.1.3 to avoid rate limits. Learn more in the Hugging Face integration docs.
Dataset Card for STONE#

STONE is a large-scale multi-modal dataset for off-road 3D traversability prediction, collected by autonomous ground vehicles across four outdoor environments in South Korea. Each keyframe provides surround-view imagery from 6 cameras (1904×1200), a 128-channel LiDAR scan (about 230K points), and voxel-level traversability annotations that classify terrain as free, traversable, potentially traversable, or non-traversable. Following the nuScenes format, the dataset also includes 3D obstacle bounding boxes, ego-pose trajectories, and synchronized multi-sensor data at ~10 Hz. This FiftyOne version contains a stratified sample of 35 scenes (200 frames each, 7,000 keyframes in total) from the full 279-scene collection, organized as grouped samples with 7 slices per keyframe (6 cameras + 1 LiDAR 3D scene).
This is a FiftyOne dataset with 7000 samples.
Installation#
If you haven’t already, install FiftyOne:
pip install -U fiftyone
Usage#
import fiftyone as fo
from fiftyone.utils.huggingface import load_from_hub
# Load the dataset
# Note: other available arguments include 'max_samples', etc
dataset = load_from_hub("Voxel51/STONE")
# Launch the App
session = fo.launch_app(dataset)
STONE — FiftyOne Dataset Card#
STONE is a large-scale multi-modal dataset for off-road 3D traversability prediction, collected by an autonomous unmanned ground vehicle (UGV) across four outdoor environments in South Korea. The dataset follows the nuScenes format and provides surround-view camera imagery, 128-channel LiDAR scans, and voxel-level traversability annotations.
Paper: Park et al., “STONE: A Scalable Multi-Modal Surround-View 3D Traversability Dataset for Off-Road Robot Navigation”, ICRA 2026
arXiv: https://arxiv.org/abs/2603.09175
License: CC BY-NC-ND 4.0 (dataset) · Apache 2.0 (code)
Format: nuScenes / Occ3D-nuScenes
Project Page: https://konyul.github.io/STONE-dataset/
FiftyOne Dataset Structure#
The dataset is a grouped dataset — one group per keyframe, with seven slices:
| Slice | Media type | Content |
|---|---|---|
|  |  | 1904 × 1200 JPEG, front-facing camera |
|  |  | 1904 × 1200 JPEG |
|  |  | 1904 × 1200 JPEG |
|  |  | 1904 × 1200 JPEG |
|  |  | 1904 × 1200 JPEG |
|  |  | 1904 × 1200 JPEG |
|  |  |  |
Sample Fields#
These fields are present on every sample across all seven slices.
Identity & Provenance#
| Field | Type | Description |
|---|---|---|
|  |  | Sensor name: |
|  |  | nuScenes sample token (shared across all 7 slices in a group) |
|  |  | nuScenes scene token |
|  |  | Human-readable scene ID, e.g. |
|  |  | Recording site: |
|  |  | Vehicle ID: |
|  |  | Unix timestamp in microseconds |
nuScenes Metadata (matching the official nuScenes guide)#
| Field | Type | Description |
|---|---|---|
|  |  |  |
|  |  | Token into |
|  |  | Token into |
|  |  | Always |
|  |  | Previous |
|  |  | Next |
|  |  | Previous nuScenes sample token in the scene |
|  |  | Next nuScenes sample token in the scene |
Labels#
| Field | Type | Slices | Description |
|---|---|---|---|
|  |  | LIDAR_TOP | 3D obstacle annotations. Each |
|  |  | cameras | 3D bounding boxes projected onto each camera as wireframe outlines using |
|  |  | cameras | Flat 2D bounding boxes from the pre-computed |
|  |  | all | Dominant traversability class in the frame’s voxel grid. |
|  |  | cameras | Projected path of the next 30 ego-pose waypoints (~3 seconds ahead) into the camera image plane. Present on ~83% of frames (absent near scene end) |
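The camera-slice path label is described above as a projection of future ego-pose waypoints into the image plane. A minimal sketch of that kind of projection follows, assuming an ideal pinhole camera with no lens distortion and waypoints already expressed in the camera frame (x right, y down, z forward). The function name `project_to_image` and the intrinsics values are hypothetical and are not taken from the dataset's calibration files; the dataset's own projection may differ.

```python
from typing import List, Optional, Tuple

Point3D = Tuple[float, float, float]
Pixel = Tuple[float, float]


def project_to_image(
    points_cam: List[Point3D],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
    width: int = 1904,
    height: int = 1200,
) -> List[Optional[Pixel]]:
    """Project camera-frame points to pixels with an ideal pinhole model.

    Returns None for points behind the camera or outside the image bounds.
    """
    pixels: List[Optional[Pixel]] = []
    for x, y, z in points_cam:
        if z <= 0:  # behind the image plane, so it cannot be drawn
            pixels.append(None)
            continue
        u = fx * x / z + cx
        v = fy * y / z + cy
        inside = 0 <= u < width and 0 <= v < height
        pixels.append((u, v) if inside else None)
    return pixels


# Hypothetical intrinsics and waypoints, for illustration only
path = [(0.0, 1.5, 5.0), (0.5, 1.5, 10.0), (0.0, 0.0, -1.0)]
pixels = project_to_image(path, fx=1000.0, fy=1000.0, cx=952.0, cy=600.0)
print(pixels)
```

In this sketch, waypoints that fall behind the camera or outside the 1904 × 1200 frame are returned as `None`, so a polyline drawn from the result would skip them.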
Traversability Fractions#
These fields are on all slices, derived from gts/<scene>/<token>/labels.npz.
| Field | Type | Description |
|---|---|---|
|  |  | Fraction of labeled voxels classified as Free (class 0) |
|  |  | Fraction classified as Traversable (class 1) |
|  |  | Fraction classified as Potentially Traversable (class 2) |
|  |  | Fraction classified as Non-Traversable (class 3) |
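To make the definition of these fractions concrete, here is a minimal sketch of how they could be computed from a flat sequence of voxel class IDs. It assumes that "labeled voxels" means every voxel whose value is not 255 (unoccupied). The function name and the dictionary keys are hypothetical and do not correspond to the dataset's actual field names, and the array key inside `labels.npz` is not stated in this card.

```python
from collections import Counter
from typing import Dict, Iterable

# Class IDs as listed in this card; the string keys are illustrative only
CLASS_NAMES = {
    0: "free",
    1: "traversable",
    2: "potentially_traversable",
    3: "non_traversable",
}
UNOCCUPIED = 255  # voxels with this value are excluded from the denominator


def traversability_fractions(voxel_labels: Iterable[int]) -> Dict[str, float]:
    """Return the share of labeled voxels that falls into each class."""
    counts = Counter(int(v) for v in voxel_labels if int(v) != UNOCCUPIED)
    total = sum(counts[c] for c in CLASS_NAMES)
    if total == 0:  # no labeled voxels at all
        return {name: 0.0 for name in CLASS_NAMES.values()}
    return {name: counts[c] / total for c, name in CLASS_NAMES.items()}


# Tiny stand-in for a flattened voxel grid
labels = [0, 0, 1, 3, 255, 255]
fractions = traversability_fractions(labels)
print(fractions)
```

A NumPy voxel grid could be passed to this sketch as `grid.ravel().tolist()`.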
LIDAR_TOP .fo3d Scene#
Each LIDAR_TOP sample points to a .fo3d scene file containing three stacked point cloud layers:
| Layer | Shading | Source | Description |
|---|---|---|---|
|  |  |  | 230,400-point raw scan from Hesai OT128. Points coloured by Z elevation via the viridis colorscale |
|  |  |  | ~140K points from the same scan, coloured by traversability class. Each point’s class is looked up from the voxel grid after transforming from LiDAR sensor frame to ego frame |
|  |  |  | All 200 ego-pose waypoints for the scene, transformed to the current frame’s LiDAR sensor frame. Blue = past · White = current · Yellow = future |
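The traversability layer relies on moving points from the LiDAR sensor frame into the ego frame. The sketch below shows one way such a rigid transform can be written, assuming the nuScenes convention in which a sensor's extrinsics are a unit quaternion `(w, x, y, z)` plus a translation that map sensor coordinates to ego coordinates. The helper names and the example extrinsics are hypothetical and are not values from the dataset.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def rotate_by_quaternion(q: Sequence[float], v: Vec3) -> Vec3:
    """Rotate vector v by the unit quaternion q = (w, x, y, z)."""
    w, x, y, z = q
    vx, vy, vz = v
    # t = 2 * cross(q_vec, v)
    tx = 2.0 * (y * vz - z * vy)
    ty = 2.0 * (z * vx - x * vz)
    tz = 2.0 * (x * vy - y * vx)
    # v' = v + w * t + cross(q_vec, t)
    return (
        vx + w * tx + (y * tz - z * ty),
        vy + w * ty + (z * tx - x * tz),
        vz + w * tz + (x * ty - y * tx),
    )


def sensor_to_ego(point: Vec3, rotation: Sequence[float], translation: Vec3) -> Vec3:
    """Apply the sensor-to-ego rigid transform: rotate first, then translate."""
    rx, ry, rz = rotate_by_quaternion(rotation, point)
    tx, ty, tz = translation
    return (rx + tx, ry + ty, rz + tz)


# Hypothetical extrinsics: sensor yawed 90 degrees, mounted 1.8 m above the ego origin
half_angle = math.radians(90.0) / 2.0
rotation = (math.cos(half_angle), 0.0, 0.0, math.sin(half_angle))
translation = (0.0, 0.0, 1.8)

ego_point = sensor_to_ego((1.0, 0.0, 0.0), rotation, translation)
print(ego_point)
```

The inverse direction, used for the ego-pose waypoint layer, would subtract the translation and then rotate by the conjugate quaternion `(w, -x, -y, -z)`.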
Camera configuration: defaultCameraPosition = {x: -15, y: 0, z: 10} (15 m behind, 10 m above), up = "Z" (nuScenes Z-up convention), set via dataset.app_config.plugins["3d"].
Traversability Classes#
| Class ID | Label | Colour in viewer |
|---|---|---|
| 0 | Free | green |
| 1 | Traversable | yellow |
| 2 | Potentially Traversable | orange |
| 3 | Non-Traversable | red |
The voxel grid has shape (200, 200, 16) — a 40 m × 40 m × 3.2 m volume centred on the vehicle at 0.2 m resolution. Value 255 = unoccupied.
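The grid geometry above implies a simple mapping between ego-frame coordinates and voxel indices. The sketch below illustrates it under two assumptions that this card does not confirm: the array axes are ordered (x, y, z), and the grid is centred vertically as well as horizontally, giving a z range of -1.6 m to 1.6 m. The constant and function names are hypothetical.

```python
import math
from typing import Optional, Tuple

GRID_SHAPE = (200, 200, 16)  # voxels along x, y, z (axis order is an assumption)
VOXEL_SIZE = 0.2             # metres per voxel edge
# Minimum corner of the grid in the ego frame. The x/y values follow from the
# centred 40 m extent; the z value assumes vertical centring, which is a guess.
GRID_MIN = (-20.0, -20.0, -1.6)


def point_to_voxel(x: float, y: float, z: float) -> Optional[Tuple[int, ...]]:
    """Return the voxel index containing an ego-frame point, or None if outside."""
    idx = tuple(
        math.floor((c - lo) / VOXEL_SIZE) for c, lo in zip((x, y, z), GRID_MIN)
    )
    if all(0 <= i < n for i, n in zip(idx, GRID_SHAPE)):
        return idx
    return None


def voxel_center(i: int, j: int, k: int) -> Tuple[float, ...]:
    """Return the ego-frame coordinates of a voxel's centre."""
    return tuple(lo + (n + 0.5) * VOXEL_SIZE for n, lo in zip((i, j, k), GRID_MIN))


inside = point_to_voxel(0.1, -0.1, 0.1)
outside = point_to_voxel(25.0, 0.0, 0.0)
center = voxel_center(100, 99, 8)
print(inside, outside, center)
```

With an index in hand, a point's class could be read from the grid at that position, treating a value of 255 as unoccupied.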
Citation#
@inproceedings{park2026stone,
title={STONE: A Scalable Multi-Modal Surround-View 3D Traversability Dataset for Off-Road Robot Navigation},
author={Park, Konyul and Kim, Daehun and Oh, Jiyong and Yu, Seunghoon and Park, Junseo
and Park, Jaehyun and Shin, Hongjae and Cho, Hyungchan and Kim, Jungho and Choi, Jun Won},
booktitle={Proceedings of the IEEE International Conference on Robotics and Automation (ICRA)},
year={2026}
}