Note: This is a Hugging Face dataset. For large datasets, ensure huggingface_hub>=1.1.3 to avoid rate limits. Learn more in the Hugging Face integration docs.
Dataset Card for toon3d

This is a FiftyOne grouped dataset with 92 samples organized into 12 groups, one per scene.
Installation
If you haven’t already, install FiftyOne:
```shell
pip install -U fiftyone
```
Usage
```python
import fiftyone as fo
from huggingface_hub import snapshot_download

# Download the dataset snapshot to the current working directory
snapshot_download(
    repo_id="Voxel51/toon3d",
    local_dir=".",
    repo_type="dataset",
)

# Load dataset from current directory using FiftyOne's native format
dataset = fo.Dataset.from_dir(
    dataset_dir=".",  # Current directory contains the dataset files
    dataset_type=fo.types.FiftyOneDataset,  # Specify FiftyOne dataset format
    name="toon3d",  # Assign a name to the dataset for identification
)
```
Dataset Details
Dataset Description
Toon3D is a multi-view cartoon scene reconstruction dataset introduced in the paper “Toon3D: Seeing Cartoons from New Perspectives” (Weber et al., arXiv:2405.10320, 2024). It contains 12 scenes from popular hand-drawn cartoons and anime, each comprising 5–12 frames that depict the same environment from geometrically inconsistent viewpoints.
This FiftyOne dataset wraps the Toon3D scenes together with the outputs of the authors’ custom Structure-from-Motion (SfM) pipeline, which recovers camera poses and dense 3D point clouds from the geometrically inconsistent input images.
Curated by: Ethan Weber, Riley Peterlinz, Rohan Mathur, Frederik Warburg, Alexei A. Efros, Angjoo Kanazawa (UC Berkeley / Teton.ai)
Paper: arXiv:2405.10320
Project page: https://toon3d.studio/
Dataset repository: https://github.com/ethanweber/toon3d-dataset
FiftyOne Dataset Structure
Overview
| Property | Value |
|---|---|
| Dataset name | toon3d |
| FiftyOne media type | group |
| Number of groups | 12 |
| Number of samples | 92 |
| Default group slice | frame_00 |
| Total named slices | 13 |
Groups
Each group corresponds to one cartoon scene. There are 12 groups, one per scene:
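| Group (scene tag) | Frame slices | Total samples (frames + 3D) |
|---|---|---|
| | frame_00–frame_07 | 9 |
| | frame_00–frame_06 | 8 |
| | frame_00–frame_11 | 13 |
| | frame_00–frame_06 | 8 |
| | frame_00–frame_05 | 7 |
| | frame_00–frame_08 | 10 |
| | frame_00–frame_04 | 6 |
| | frame_00–frame_05 | 7 |
| | frame_00–frame_04 | 6 |
| | frame_00–frame_04 | 6 |
| | frame_00–frame_03 | 5 |
| | frame_00–frame_05 | 7 |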
Groups populate a variable number of frame slices (4–12). Slice names are zero-padded to
two digits (frame_00–frame_11) and are consistent across all groups; scenes with
fewer frames simply have no sample in the higher-indexed slices.
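The slice layout and the counts in the tables can be cross-checked with a few lines of plain Python. This is a sketch, not part of the build script. The helper occupied_frame_slices is hypothetical, and GROUP_TOTALS simply copies the per-group totals from the table above.

```python
# Slice names shared by every group: frame_00 ... frame_11, plus scene_3d.
FRAME_SLICES = [f"frame_{i:02d}" for i in range(12)]
ALL_SLICES = FRAME_SLICES + ["scene_3d"]

# Total samples per group (frames + one scene_3d sample), copied from the table above.
GROUP_TOTALS = [9, 8, 13, 8, 7, 10, 6, 7, 6, 6, 5, 7]


def occupied_frame_slices(num_frames: int) -> list[str]:
    """Frame slices that hold a sample for a scene with num_frames frames.

    Frames fill the lowest-indexed slices; the remaining slices stay empty.
    """
    if not 0 <= num_frames <= len(FRAME_SLICES):
        raise ValueError(f"expected 0-{len(FRAME_SLICES)} frames, got {num_frames}")
    return FRAME_SLICES[:num_frames]


frames_per_group = [total - 1 for total in GROUP_TOTALS]  # drop the scene_3d sample

print(len(ALL_SLICES))                               # 13
print(sum(GROUP_TOTALS), sum(frames_per_group))      # 92 80
print(min(frames_per_group), max(frames_per_group))  # 4 12
print(occupied_frame_slices(4))  # ['frame_00', 'frame_01', 'frame_02', 'frame_03']
```

The totals add up to the 92 samples in the overview: 80 frame samples plus one scene_3d sample per group.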
Slices
Frame slices: frame_00 through frame_11
Each frame slice sample represents one original cartoon image from the scene.
filepath points to the original cartoon PNG (toon3d-dataset/<scene>/images/<NNNNN>.png).
Images are resized to a maximum of 960 × 720 px during preprocessing.
Fields:
| Field | Type | Description |
|---|---|---|
| group | | Group membership: id is shared across all slices of the same scene, and name is the slice identifier |
| tags | | Single-element list containing the scene name |
| | | Marigold monocular depth estimate visualised as a heatmap |
| | | Human-annotated 2D sparse correspondences from the Toon3D Labeler, stored as a single … |
| | | Scene name |
| | | Zero-based frame index within the scene (matches the SfM frame order after filtering invalid images) |
| | | Total number of valid frames in this scene |
| camera | dict | Camera parameters recovered by the Toon3D SfM pipeline (see below) |
camera dict schema:

| Key | Type | Description |
|---|---|---|
| | | Learned focal length in pixels, x-axis |
| | | Learned focal length in pixels, y-axis |
| | | Principal point x (half image width) |
| | | Principal point y (half image height) |
| | | Image width in pixels |
| | | Image height in pixels |
| transform_matrix | | 4×4 camera-to-world matrix in OpenCV / Nerfstudio convention. Camera 0 is placed at the world origin; all other cameras are expressed relative to it. |
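The intrinsic entries above form a standard pinhole matrix. The sketch below assembles one and projects camera-space points to pixels. The argument names (focal_x, focal_y, width, height) are hypothetical and are not the dict's keys. The 800 px focal length is an illustrative value, and the projection assumes OpenCV-style camera axes (x right, y down, z forward).

```python
import numpy as np


def intrinsics_matrix(focal_x: float, focal_y: float, width: int, height: int) -> np.ndarray:
    """3x3 pinhole intrinsics with the principal point at the image centre."""
    cx, cy = width / 2.0, height / 2.0  # principal point = half image size
    return np.array([
        [focal_x, 0.0, cx],
        [0.0, focal_y, cy],
        [0.0, 0.0, 1.0],
    ])


def project(K: np.ndarray, points_cam: np.ndarray) -> np.ndarray:
    """Project Nx3 camera-space points (x right, y down, z forward) to Nx2 pixels."""
    uvw = points_cam @ K.T            # homogeneous pixel coordinates, one row per point
    return uvw[:, :2] / uvw[:, 2:3]   # perspective divide by depth


K = intrinsics_matrix(focal_x=800.0, focal_y=800.0, width=960, height=720)
pixels = project(K, np.array([[0.0, 0.0, 2.0], [1.0, -0.5, 4.0]]))
# A point on the optical axis lands on the principal point (480, 360);
# the second point lands at (680, 260).
```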
The transform_matrix follows Nerfstudio's camera convention, with the OPENCV camera model.
Before writing, a coordinate flip diag(1, -1, -1) is applied to the internal toon3d rotation,
converting it from the SfM-internal convention to the Nerfstudio convention. In world
space, camera 0 looks in the +Z direction, with −Y as the image-up axis.
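The stated orientation of camera 0 can be checked numerically. This sketch makes two assumptions. First, camera 0's internal rotation is the identity, so its written rotation is just the flip. Second, the matrix uses Nerfstudio's documented camera axes: +X right, +Y up, and the camera looking along its local −Z. Under these assumptions the flip reproduces the +Z viewing direction and −Y image-up described above. The sketch also shows how inverting the matrix maps a world point into OpenCV-style camera coordinates.

```python
import numpy as np

# Coordinate flip applied to the internal toon3d rotation before writing.
FLIP = np.diag([1.0, -1.0, -1.0])

# Assumption: camera 0's internal rotation is the identity and it sits at the
# world origin, so its written rotation is just the flip (for the identity, the
# side the flip is multiplied on does not matter).
internal_rotation = np.eye(3)
c2w = np.eye(4)
c2w[:3, :3] = internal_rotation @ FLIP

# Assumption: Nerfstudio camera axes, i.e. the camera looks along its local -Z
# and local +Y is image-up.
rotation = c2w[:3, :3]
forward_world = rotation @ np.array([0.0, 0.0, -1.0])  # (0, 0, 1): looks along +Z
up_world = rotation @ np.array([0.0, 1.0, 0.0])        # (0, -1, 0): image-up is -Y

# Inverting camera-to-world gives world-to-camera.
w2c = np.linalg.inv(c2w)
point_world = np.array([0.0, 0.5, 2.0, 1.0])  # homogeneous world point
point_cam_ns = w2c @ point_world              # Nerfstudio camera axes
point_cam_cv = FLIP @ point_cam_ns[:3]        # OpenCV axes (x right, y down, z forward)
```

For camera 0, the OpenCV-style camera coordinates coincide with world coordinates. This is another way of stating that camera 0 looks along +Z with −Y up.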
scene_3d slice
Each group has exactly one scene_3d sample. Its filepath points to a .fo3d
scene file (written by fo3d.Scene.write(..., resolve_relative_paths=True)).
Fields:
| Field | Type | Description |
|---|---|---|
| group | | Group membership: same group id as the frame slices for this scene |
| | | Scene name |
| | | Number of per-frame PLY nodes in the scene |
.fo3d scene contents
Each .fo3d file contains:
- camera: a PerspectiveCamera with:
  - up = "-Y", consistent with the SfM coordinate convention, where image-up maps to world −Y
  - position: auto-computed as the point-cloud centroid offset by 1.5× the scene diagonal along −Z, placing the viewer behind the scene looking forward
  - look_at: the centroid of the combined all.ply point cloud
  - fov = 50.0°, near = 0.1, far = 2000.0
- N PlyMesh nodes (one per frame) with:
  - name = "frame_NNNNN" (zero-padded to 5 digits)
  - ply_path: absolute path to outputs/<scene>/run/<timestamp>/nerfstudio/plys/<NNNNN>.ply
  - is_point_cloud = False (set to True for point-cloud rendering)
  - center_geometry = False, because the PLYs are already in SfM world space and must not be re-centred
  - default_material: a MeshBasicMaterial with a distinct per-frame colour (cycles through 13 colours: #e63946, #457b9d, #2a9d8f, #e9c46a, #f4a261, #264653, #a8dadc, #bc4749, #6a994e, #7209b7, #3a0ca3, #f77f00, #4cc9f0)
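The viewpoint and per-node settings above can be sketched in plain NumPy. This is not the build script's code, and the helper names are hypothetical. It assumes that "centroid" means the mean point and "scene diagonal" means the length of the axis-aligned bounding-box diagonal. It also assumes that colours are picked by frame index.

```python
import numpy as np

# Palette listed above, in order (13 colours).
PALETTE = [
    "#e63946", "#457b9d", "#2a9d8f", "#e9c46a", "#f4a261", "#264653", "#a8dadc",
    "#bc4749", "#6a994e", "#7209b7", "#3a0ca3", "#f77f00", "#4cc9f0",
]


def default_viewpoint(points: np.ndarray, offset_factor: float = 1.5):
    """Return (position, look_at) for an Nx3 point cloud such as all.ply.

    Assumptions: the centroid is the mean point, and the "scene diagonal" is
    the length of the axis-aligned bounding-box diagonal.
    """
    centroid = points.mean(axis=0)
    diagonal = float(np.linalg.norm(points.max(axis=0) - points.min(axis=0)))
    # Step back along -Z so a viewer facing +Z sees the whole cloud.
    position = centroid - np.array([0.0, 0.0, offset_factor * diagonal])
    return position, centroid


def node_name(frame_idx: int) -> str:
    """PlyMesh node name, zero-padded to 5 digits."""
    return f"frame_{frame_idx:05d}"


def frame_colour(frame_idx: int) -> str:
    """Per-frame colour, cycling through the palette (assumed indexed by frame)."""
    return PALETTE[frame_idx % len(PALETTE)]


cloud = np.array([[0.0, 0.0, 0.0], [3.0, 4.0, 0.0]])
position, look_at = default_viewpoint(cloud)  # position (1.5, 2.0, -7.5), look_at (1.5, 2.0, 0.0)
print(node_name(7), frame_colour(7))          # frame_00007 #bc4749
```

No scene has more than 12 frames and the palette has 13 colours, so the palette never wraps within a scene. Every frame in a scene therefore gets its own colour.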
The PLY files are ASCII, with per-vertex fields x y z red green blue (float32 x/y/z, uint8 RGB). Points are in SfM world space, back-projected from Marigold depth maps through the recovered camera poses. All per-frame PLYs share the same coordinate system and can be composed directly.
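Because the files are plain ASCII with a fixed vertex layout, a small reader is enough to load them. The function below is a minimal, hypothetical sketch, not part of the toon3d tooling. It assumes that the vertex element comes first, with exactly the six properties listed above in that order. Libraries such as plyfile or Open3D are more robust alternatives. Since all frames share one world frame, composing them is a plain concatenation.

```python
import numpy as np


def read_ascii_ply(text: str):
    """Parse an ASCII PLY whose vertex properties are x y z red green blue.

    Minimal sketch: assumes the vertex element comes first, with exactly these
    six properties in this order, and ignores any other elements (e.g. faces).
    Returns (xyz as float32 Nx3, rgb as uint8 Nx3).
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "ply":
        raise ValueError("not a PLY file")
    num_vertices, header_end = None, None
    for i, line in enumerate(lines):
        tokens = line.split()
        if tokens[:1] == ["format"] and tokens[1:2] != ["ascii"]:
            raise ValueError("only ASCII PLY is supported")
        if tokens[:2] == ["element", "vertex"]:
            num_vertices = int(tokens[2])
        elif tokens == ["end_header"]:
            header_end = i
            break
    if num_vertices is None or header_end is None:
        raise ValueError("missing 'element vertex' or 'end_header'")
    rows = lines[header_end + 1 : header_end + 1 + num_vertices]
    data = np.array([row.split()[:6] for row in rows], dtype=np.float64).reshape(-1, 6)
    return data[:, :3].astype(np.float32), data[:, 3:].astype(np.uint8)


SAMPLE = """ply
format ascii 1.0
element vertex 2
property float x
property float y
property float z
property uchar red
property uchar green
property uchar blue
end_header
0.0 0.5 2.0 255 0 0
1.0 -0.5 3.0 0 128 255
"""

xyz, rgb = read_ascii_ply(SAMPLE)
# Per-frame PLYs share one world frame, so merging frames is a concatenation.
frames = [read_ascii_ply(SAMPLE), read_ascii_ply(SAMPLE)]
merged_xyz = np.concatenate([f_xyz for f_xyz, _ in frames])
merged_rgb = np.concatenate([f_rgb for _, f_rgb in frames])
```

For a real file, pass the file contents, e.g. read_ascii_ply(Path(ply_path).read_text()).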
How the Dataset Was Built
1. SfM outputs were generated with tnd-run (from the toon3d package) for all 12 scenes. Each run reads toon3d-dataset/<scene>/ (images, .pt depth tensors, points.json) and writes Nerfstudio-format outputs to outputs/<scene>/run/<timestamp>/nerfstudio/.
2. .fo3d scene files were written, one per scene, to outputs/<scene>/run/<timestamp>/nerfstudio/fo3d/scene.fo3d. The camera position is computed programmatically from the all.ply centroid and extent.
3. The FiftyOne grouped dataset was created with fo.Dataset.add_group_field("group", default="frame_00"). All 92 samples are added in a single dataset.add_samples(all_samples) call.
The build script is build_dataset.py at the root of this workspace.
Citation
```bibtex
@inproceedings{weber2024toon3d,
  title = {Toon3D: Seeing Cartoons from New Perspectives},
  author = {Ethan Weber and Riley Peterlinz and Rohan Mathur and
            Frederik Warburg and Alexei A. Efros and Angjoo Kanazawa},
  booktitle = {arXiv:2405.10320},
  year = {2024},
}
```