Note

This is a Hugging Face dataset. For large datasets, ensure huggingface_hub>=1.1.3 to avoid rate limits. Learn more in the Hugging Face integration docs.

Dataset Card for toon3d#

This is a FiftyOne dataset with 92 samples organized into 12 grouped scenes.

Installation#

If you haven’t already, install FiftyOne:

pip install -U fiftyone

Usage#


import fiftyone as fo
from huggingface_hub import snapshot_download


# Download the dataset snapshot to the current working directory

snapshot_download(
    repo_id="Voxel51/toon3d",
    local_dir=".",
    repo_type="dataset",
)



# Load dataset from current directory using FiftyOne's native format
dataset = fo.Dataset.from_dir(
    dataset_dir=".",  # Current directory contains the dataset files
    dataset_type=fo.types.FiftyOneDataset,  # Specify FiftyOne dataset format
    name="toon3d"  # Assign a name to the dataset for identification
)

Dataset Details#

Dataset Description#

Toon3D is a multi-view cartoon scene reconstruction dataset introduced in the paper “Toon3D: Seeing Cartoons from New Perspectives” (Weber et al., arXiv:2405.10320, 2024). It contains 12 scenes from popular hand-drawn cartoons and anime, each comprising 4–12 frames that depict the same environment from geometrically inconsistent viewpoints.

This FiftyOne dataset wraps the Toon3D scenes together with the outputs of the authors’ custom Structure-from-Motion (SfM) pipeline, which recovers camera poses and dense 3D point clouds from the geometrically inconsistent input images.

  • Curated by: Ethan Weber, Riley Peterlinz, Rohan Mathur, Frederik Warburg, Alexei A. Efros, Angjoo Kanazawa (UC Berkeley / Teton.ai)

  • Paper: arXiv:2405.10320

  • Project page: https://toon3d.studio/

  • Dataset repository: https://github.com/ethanweber/toon3d-dataset


FiftyOne Dataset Structure#

Overview#

| Property | Value |
|---|---|
| Dataset name | toon3d |
| FiftyOne media type | group |
| Number of groups | 12 |
| Number of samples | 92 |
| Default group slice | frame_00 |
| Total named slices | 13 |

Groups#

Each group corresponds to one cartoon scene. There are 12 groups, one per scene:

| Group (scene tag) | Frame slices | Total samples (frames + 3D) |
|---|---|---|
| avatar-house | frame_00–frame_07 | 9 |
| bobs-burgers | frame_00–frame_06 | 8 |
| bojak-room | frame_00–frame_11 | 13 |
| family-guy-dining | frame_00–frame_06 | 8 |
| family-guy-house | frame_00–frame_05 | 7 |
| krusty-krab | frame_00–frame_08 | 10 |
| magic-school-bus | frame_00–frame_04 | 6 |
| mystery-machine | frame_00–frame_05 | 7 |
| planet-express | frame_00–frame_04 | 6 |
| simpsons-house | frame_00–frame_04 | 6 |
| smith-residence | frame_00–frame_03 | 5 |
| spirited-away | frame_00–frame_05 | 7 |

Groups have a variable number of frame slices (4–12). Slice names are zero-padded to two digits (frame_00–frame_11) and are consistent across all groups; scenes with fewer frames simply have no sample in the higher-indexed slices.

Slices#

Frame slices — frame_00 through frame_11#

Each frame slice sample represents one original cartoon image from the scene.

filepath points to the original cartoon PNG (toon3d-dataset/<scene>/images/<NNNNN>.png). Images are resized to a maximum of 960 × 720 px during preprocessing.

Fields:

| Field | Type | Description |
|---|---|---|
| group | EmbeddedDocumentField(Group) | Group membership — id shared across all slices of the same scene, name is the slice identifier (e.g. frame_02) |
| tags | ListField(StringField) | Single-element list containing the scene name, e.g. ["bobs-burgers"] |
| depth | EmbeddedDocumentField(Heatmap) | Marigold monocular depth estimate visualised as a heatmap. map_path points to the depth colormap PNG at toon3d-dataset/<scene>/depth-images/<NNNNN>.png |
| keypoints | EmbeddedDocumentField(Keypoints) | Human-annotated 2D sparse correspondences from the Toon3D Labeler, stored as a single fo.Keypoint with label="correspondence". Coordinates are normalised to [0, 1]. Only valid (visible) points are included. None if no valid points exist for the frame. |
| scene | StringField | Scene name, e.g. "bobs-burgers" |
| frame_idx | IntField | Zero-based frame index within the scene (matches the SfM frame order after filtering invalid images) |
| n_frames | IntField | Total number of valid frames in this scene |
| camera | DictField | Camera parameters recovered by the Toon3D SfM pipeline (see below) |

camera dict schema:

| Key | Type | Description |
|---|---|---|
| fl_x | float | Learned focal length in pixels, x-axis |
| fl_y | float | Learned focal length in pixels, y-axis |
| cx | float | Principal point x (half image width) |
| cy | float | Principal point y (half image height) |
| w | int | Image width in pixels |
| h | int | Image height in pixels |
| transform_matrix | list[list[float]] | 4×4 camera-to-world matrix in OpenCV / Nerfstudio convention. Camera 0 is placed at the world origin; all other cameras are expressed relative to it. |
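To make the intrinsics concrete, here is a minimal pinhole-projection sketch. The `project` helper and the numeric values are illustrative, not taken from a real sample:

```python
def project(point_cam, cam):
    """Project a 3D point in OpenCV camera coordinates (+Z forward) to pixels."""
    x, y, z = point_cam
    u = cam["fl_x"] * x / z + cam["cx"]
    v = cam["fl_y"] * y / z + cam["cy"]
    return u, v

# Illustrative intrinsics shaped like the camera dict above (not real values)
camera = {"fl_x": 800.0, "fl_y": 800.0, "cx": 480.0, "cy": 360.0, "w": 960, "h": 720}

# A point straight ahead of the camera projects to the principal point
print(project((0.0, 0.0, 1.0), camera))  # → (480.0, 360.0)
```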

The transform_matrix coordinate convention matches Nerfstudio’s OPENCV camera model. A coordinate flip diag(1, -1, -1) is applied to the internal toon3d rotation before writing, converting from the SfM-internal convention to Nerfstudio convention. In world space, camera 0 looks in the +Z direction with −Y as the image-up axis.
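The convention flip can be sketched with NumPy. This is a sketch under assumptions: the card does not state which side the flip is applied on, so right-multiplication onto the rotation block is assumed here:

```python
import numpy as np

# The diag(1, -1, -1) convention flip described above. Right-multiplication
# is an assumption for illustration; the actual pipeline may differ.
FLIP = np.diag([1.0, -1.0, -1.0])

def to_nerfstudio(R_internal: np.ndarray) -> np.ndarray:
    """Convert a 3x3 SfM-internal rotation to the OpenCV / Nerfstudio convention."""
    return R_internal @ FLIP

# Flipping the identity rotation yields the flip matrix itself
print(to_nerfstudio(np.eye(3)))
```

Note that the flip has determinant +1, so a valid rotation stays a valid rotation after conversion.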


scene_3d slice#

Each group has exactly one scene_3d sample. Its filepath points to a .fo3d scene file (written by fo3d.Scene.write(..., resolve_relative_paths=True)).

Fields:

| Field | Type | Description |
|---|---|---|
| group | EmbeddedDocumentField(Group) | Group membership — same group id as the frame slices for this scene |
| tags | ListField(StringField) | [scene_name] |
| scene | StringField | Scene name |
| n_frames | IntField | Number of per-frame PLY nodes in the scene |

.fo3d scene contents:

Each .fo3d file contains:

  • camera — a PerspectiveCamera with:

    • up = "-Y" — consistent with the SfM coordinate convention where image-up maps to world −Y

    • position — auto-computed as the point cloud centroid offset by 1.5× the scene diagonal along −Z, placing the viewer behind the scene looking forward

    • look_at — centroid of the combined all.ply point cloud

    • fov = 50.0°, near = 0.1, far = 2000.0

  • N PlyMesh nodes (one per frame) with:

    • name = "frame_NNNNN" (zero-padded 5 digits)

    • ply_path — absolute path to outputs/<scene>/run/<timestamp>/nerfstudio/plys/<NNNNN>.ply

    • is_point_cloud = False (set to True for point cloud rendering)

    • center_geometry = False — the PLYs are already in SfM world space and must not be re-centred

    • default_material — a MeshBasicMaterial with a distinct per-frame colour (cycles through 13 colours: #e63946, #457b9d, #2a9d8f, #e9c46a, #f4a261, #264653, #a8dadc, #bc4749, #6a994e, #7209b7, #3a0ca3, #f77f00, #4cc9f0)

    The PLY format is ASCII with per-vertex fields x y z red green blue (float32 x/y/z, uint8 RGB). Points are in SfM world space — back-projected from Marigold depth maps through the recovered camera poses. All per-frame PLYs share the same coordinate system and can be composed directly.
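A minimal sketch of reading such an ASCII PLY by hand (for illustration only; in practice a library such as plyfile or Open3D would be used, and the sample text below is made up):

```python
def read_ascii_ply(text):
    """Parse an ASCII PLY with per-vertex x y z red green blue fields."""
    lines = text.strip().splitlines()
    assert lines[0] == "ply"
    n_verts, i = 0, 0
    for i, line in enumerate(lines):
        if line.startswith("element vertex"):
            n_verts = int(line.split()[-1])
        if line == "end_header":
            break
    verts = []
    for line in lines[i + 1 : i + 1 + n_verts]:
        x, y, z, r, g, b = line.split()
        verts.append(((float(x), float(y), float(z)), (int(r), int(g), int(b))))
    return verts

# Illustrative two-vertex PLY in the layout described above
sample = """ply
format ascii 1.0
element vertex 2
property float x
property float y
property float z
property uchar red
property uchar green
property uchar blue
end_header
0.0 0.0 1.0 255 0 0
1.0 2.0 3.0 0 255 0
"""
print(read_ascii_ply(sample))
```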


How the Dataset Was Built#

  1. SfM outputs generated with tnd-run (from the toon3d package) for all 12 scenes. Each run reads toon3d-dataset/<scene>/ (images, .pt depth tensors, points.json) and writes Nerfstudio-format outputs to outputs/<scene>/run/<timestamp>/nerfstudio/.

  2. .fo3d scene files written — one per scene — into outputs/<scene>/run/<timestamp>/nerfstudio/fo3d/scene.fo3d. The camera position is computed programmatically from the all.ply centroid and extent.

  3. FiftyOne grouped dataset created with fo.Dataset.add_group_field("group", default="frame_00"). All 92 samples are added in a single dataset.add_samples(all_samples) call.

The build script is build_dataset.py at the root of this workspace.
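The camera placement described in step 2 (centroid offset by 1.5× the scene diagonal along −Z) can be sketched as follows; this is a sketch of the stated rule, not the actual build_dataset.py code, and the point cloud is made up:

```python
import numpy as np

def place_camera(points, offset_scale=1.5):
    """Place the viewer at the centroid, offset along -Z by offset_scale x diagonal."""
    points = np.asarray(points, dtype=float)
    centroid = points.mean(axis=0)
    # Scene diagonal: length of the axis-aligned bounding-box diagonal
    diagonal = np.linalg.norm(points.max(axis=0) - points.min(axis=0))
    position = centroid + np.array([0.0, 0.0, -offset_scale * diagonal])
    return position, centroid  # camera position and look_at target

pts = [[0, 0, 0], [2, 2, 2]]
position, look_at = place_camera(pts)
print(position, look_at)  # camera sits behind the centroid along -Z
```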


Citation#

@misc{weber2024toon3d,
  title         = {Toon3D: Seeing Cartoons from New Perspectives},
  author        = {Ethan Weber and Riley Peterlinz and Rohan Mathur and
                   Frederik Warburg and Alexei A. Efros and Angjoo Kanazawa},
  year          = {2024},
  eprint        = {2405.10320},
  archivePrefix = {arXiv},
}