Note
This is a Hugging Face dataset. For large datasets, ensure huggingface_hub>=1.1.3 to avoid rate limits. Learn more in the Hugging Face integration docs.
TartanRGBT Dataset Card#

TartanRGBT is a hardware-synchronized RGB–thermal robotics dataset from CMU AirLab’s AnyThermal project (ICRA 2026). It features co-registered stereo RGB and thermal images across indoor, urban, park, and off-road environments.
This subset:
15 trajectories
5,952 timesteps
1 Hz sampling
23,808 FiftyOne samples
This is a FiftyOne dataset with 5952 samples.
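As a quick sanity check on the counts above, the sketch below divides the sample total by the timestep total. The even split into four samples per timestep is an inference from the listed numbers, not something the card states. The duration estimate assumes the 1 Hz sampling is uniform with no gaps.

```python
# Counts quoted in this card
timesteps = 5_952
samples = 23_808
sampling_hz = 1.0

# Samples per timestep; the remainder is zero only if every timestep
# contributes the same number of samples (an assumption).
samples_per_timestep, remainder = divmod(samples, timesteps)

# Approximate total recording time, assuming uniform 1 Hz sampling.
total_minutes = timesteps / sampling_hz / 60

print(samples_per_timestep, remainder)  # 4 0
print(round(total_minutes, 1))          # 99.2
```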
Installation#
If you haven’t already, install FiftyOne:
pip install -U fiftyone
Usage#
import fiftyone as fo
from fiftyone.utils.huggingface import load_from_hub
# Load the dataset
# Note: other available arguments include 'max_samples', etc
dataset = load_from_hub("Voxel51/TartanRGBT")
# Launch the App
session = fo.launch_app(dataset)
Dataset Sources#
Source: theairlabcmu/TartanRGBT
Paper: arXiv:2602.06203
License: BSD-3-Clause-Clear
Data Streams#
Each timestep provides 4 synchronized camera streams:
| Stream | Resolution | Notes |
|---|---|---|
| RGB in thermal frame | 640 × 512 | ZED RGB reprojected to thermal grid. Pixel-aligned with left thermal. Primary RGB–thermal pair. |
| Left thermal | 640 × 512 | FLIR Boson 640+, 8-bit grayscale |
| Right thermal | 640 × 512 | FLIR Boson 640+, 8-bit grayscale |
| ZED left RGB | 960 × 540 | Native rectified stereo |
| ZED right RGB | 960 × 540 | Native rectified stereo |
| Stereo depth | 960 × 540 | Dense depth map (meters), aligned with ZED left |
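Because the RGB-in-thermal-frame stream is pixel-aligned with the left thermal stream, the two can be combined channel-wise without any warping. The following is a minimal sketch of that idea using synthetic arrays in place of real images. The helper name `stack_rgbt` is hypothetical and not part of the dataset or of FiftyOne.

```python
import numpy as np

# Synthetic stand-ins for one timestep; real images would be read from disk.
# A 640 x 512 (width x height) image is a (512, 640) array in row-major order.
height, width = 512, 640
rng = np.random.default_rng(0)
rgb_in_thermal = rng.integers(0, 256, size=(height, width, 3), dtype=np.uint8)
left_thermal = rng.integers(0, 256, size=(height, width), dtype=np.uint8)


def stack_rgbt(rgb, thermal):
    """Stack a pixel-aligned RGB image and a thermal image into H x W x 4."""
    if rgb.shape[:2] != thermal.shape[:2]:
        raise ValueError("RGB and thermal images must share height and width")
    return np.dstack([rgb, thermal])


rgbt = stack_rgbt(rgb_in_thermal, left_thermal)
print(rgbt.shape, rgbt.dtype)  # (512, 640, 4) uint8
```

The shape check matters because the native ZED streams are 960 × 540 and cannot be stacked with the thermal images without reprojection.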
Scenes#
15 scenes across 5 collection days covering indoor, urban, park, and off-road terrain.
FiftyOne Structure#
Type: Grouped dataset
Default slice: rgb_in_thermal
Groups: 5,952
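In a grouped dataset, each group holds one sample per slice, so it can be useful to confirm how many samples each slice contains. The helper below is a small sketch with a hypothetical name, `count_samples_per_slice`. It relies only on FiftyOne's `group_slices` property and `select_group_slices()` method, so it accepts any object that exposes those two members.

```python
def count_samples_per_slice(dataset):
    """Return {slice_name: sample_count} for a grouped FiftyOne dataset.

    Uses only `dataset.group_slices` and `dataset.select_group_slices()`.
    """
    return {
        name: len(dataset.select_group_slices(name))
        for name in dataset.group_slices
    }


# Intended usage (requires fiftyone and a download of the dataset):
#   from fiftyone.utils.huggingface import load_from_hub
#   dataset = load_from_hub("Voxel51/TartanRGBT")
#   print(dataset.default_group_slice)
#   print(count_samples_per_slice(dataset))
```

To browse a different slice in the App, set `dataset.group_slice` to one of the names in `dataset.group_slices` before launching.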
Key Fields#
| Field | Type | Description |
|---|---|---|
| | str | |
| | str | e.g. |
| | int | Frame index (0, 10, 20, …) |
| | float | Unix time (seconds) |
| | bool | Exclude from training if |
| | float | Position (meters, from stereo odometry) |
| | float | Orientation (quaternion) |
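Timestamps are stored as Unix time in seconds, so they can be converted to calendar time and checked against the stated 1 Hz sampling. The sketch below works on a plain list of floats because the timestamp field name is not shown in this card. The function name and example values are made up for illustration.

```python
from datetime import datetime, timezone
from statistics import median


def summarize_timestamps(unix_seconds):
    """Return (UTC start time, median interval in seconds) for sorted timestamps."""
    if len(unix_seconds) < 2:
        raise ValueError("need at least two timestamps")
    start = datetime.fromtimestamp(unix_seconds[0], tz=timezone.utc)
    intervals = [b - a for a, b in zip(unix_seconds, unix_seconds[1:])]
    return start, median(intervals)


# Made-up values for illustration, spaced one second apart
example = [1_700_000_000.0, 1_700_000_001.0, 1_700_000_002.0]
start, interval = summarize_timestamps(example)
print(start.year, interval)  # 2023 1.0
```

A median interval close to 1.0 is what 1 Hz sampling would produce. Larger values would point to gaps within a trajectory.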
Labels#
thermal (rgb_in_thermal slice): Heatmap overlay of thermal intensity on RGB
depth (zed_left slice): Display-optimized depth (masked, percentile-normalized)
depth_gt (zed_left slice): Raw depth visualization (min/max normalized)
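The two depth labels differ in how metric depth is mapped to a display range. The sketch below contrasts min/max normalization with masked, percentile-based normalization. The card does not specify the masking rule or the percentiles used, so the validity mask (finite, positive depths) and the 2nd and 98th percentiles are assumptions, and the function names are hypothetical.

```python
import numpy as np


def _valid_mask(depth):
    # Assumption: invalid pixels are non-finite or non-positive depths.
    return np.isfinite(depth) & (depth > 0)


def normalize_minmax(depth):
    """Map valid depths linearly so the minimum is 0 and the maximum is 1."""
    valid = _valid_mask(depth)
    out = np.zeros(depth.shape, dtype=np.float32)
    if not valid.any():
        return out
    lo, hi = depth[valid].min(), depth[valid].max()
    out[valid] = (depth[valid] - lo) / max(hi - lo, 1e-6)
    return out


def normalize_percentile(depth, low=2.0, high=98.0):
    """Map valid depths between two percentiles to [0, 1], clipping the tails."""
    valid = _valid_mask(depth)
    out = np.zeros(depth.shape, dtype=np.float32)
    if not valid.any():
        return out
    lo, hi = np.percentile(depth[valid], [low, high])
    out[valid] = np.clip((depth[valid] - lo) / max(hi - lo, 1e-6), 0.0, 1.0)
    return out


# Tiny synthetic depth map in meters with one far outlier and one invalid pixel
depth = np.array([[1.0, 2.0, 3.0], [4.0, 100.0, np.nan]], dtype=np.float32)
minmax = normalize_minmax(depth)
pct = normalize_percentile(depth)
```

Percentile normalization clips extreme values, which generally keeps a few very distant pixels from compressing the rest of the display range.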
Use Cases#
Intended:
RGB–thermal representation learning & knowledge distillation
Cross-modal place recognition
Monocular thermal depth estimation
Multi-environment thermal features
Out of scope:
Odometry or metric depth benchmarking (stereo-derived, not ground truth)
GPS-based localization
Citation#
@misc{maheshwari2026anythermallearninguniversalrepresentations,
title={AnyThermal: Towards Learning Universal Representations for Thermal Perception},
author={Parv Maheshwari and Jay Karhade and Yogesh Chawla and Isaiah Adu and Florian Heisen
and Andrew Porco and Andrew Jong and Yifei Liu and Santosh Pitla
and Sebastian Scherer and Wenshan Wang},
year={2026},
eprint={2602.06203},
archivePrefix={arXiv},
primaryClass={cs.CV}
}