Note

This is a Hugging Face dataset. For large datasets, ensure huggingface_hub>=1.1.3 to avoid rate limits. Learn more in the Hugging Face integration docs.

Hugging Face

Dataset Card for Spatial LM#

image/png

This is a FiftyOne 3D dataset with 19,992 samples representing indoor room point clouds with structured 3D layout and object annotations from the SpatialLM benchmark.

Each sample is an .fo3d scene containing a coloured point cloud with overlaid 3D bounding box annotations for walls, doors, windows, and furniture/objects — all browsable and queryable in the FiftyOne App.

Installation#

If you haven’t already, install FiftyOne:

pip install -U fiftyone

Usage#

import fiftyone as fo
from huggingface_hub import snapshot_download


# Download the dataset snapshot to the current working directory

snapshot_download(
    repo_id="Voxel51/spatial_lm_dataset", 
    local_dir=".", 
    repo_type="dataset"
    )



# Load dataset from current directory using FiftyOne's native format
dataset = fo.Dataset.from_dir(
    dataset_dir=".",  # Current directory contains the dataset files
    dataset_type=fo.types.FiftyOneDataset,  # Specify FiftyOne dataset format
    name="spatial_lm_dataset"  # Assign a name to the dataset for identification
)

# Launch the App
session = fo.launch_app(dataset)

Viewer Tips#

  • The point cloud renders at full RGB colour by default.

  • Toggle walls / doors / windows / objects in the sidebar to overlay coloured 3D bounding boxes on the point cloud.

  • Filter by room_type, split, or sample_type for targeted exploration.

  • The viewer uses a Z-up coordinate convention (floor at Z = 0, height along +Z).

Dataset Details#

Dataset Description#

SpatialLM is a large-scale synthetic indoor scene dataset created by Manycore Tech Inc. and researchers at HKUST. It contains point clouds from 12,328 diverse indoor scenes comprising 54,778 rooms, each paired with ground-truth 3D annotations for architectural layout (walls, doors, windows) and oriented object bounding boxes across 59 semantic categories. The scenes are sourced from professional interior designs on the Kujiale online platform and rendered with a production-grade rendering engine.

This FiftyOne version represents 19,992 room-level samples as .fo3d 3D scenes. Each sample pairs a PLY point cloud (rendered as coloured points in world-space coordinates) with fo.Detections fields for walls, doors, windows, and objects — making all annotations browsable, filterable, and queryable directly in the FiftyOne App’s 3D viewer.

  • Curated by: Yongsen Mao, Junhao Zhong, Chuan Fang, Jia Zheng, Rui Tang, Hao Zhu, Ping Tan, Zihan Zhou — Manycore Tech Inc. and Hong Kong University of Science and Technology

  • FiftyOne integration by: Harpreet Sahota

  • Language(s): en (annotations are English category labels)

  • License: CC-BY-NC-4.0

Dataset Sources#

FiftyOne Dataset Structure#

Scene Format#

Each sample’s filepath points to an .fo3d scene file containing a single fo.PlyMesh node with is_point_cloud=True. The PLY files carry per-vertex RGB colours which FiftyOne renders automatically. center_geometry=False preserves the original world-space coordinates so annotations stay spatially aligned with the point cloud.

The scene uses a dark background (#1a1a2e) and a fo.PerspectiveCamera with up="Z".

Sample Fields#

Field

Type

Description

filepath

str

Path to the .fo3d scene file

sample_id

str

Unique identifier following {scene_id}_{room_id}_{sample} convention (e.g., scene_001523_00_2)

scene_id

str

Identifier for the multi-room apartment scene

room_type

str

Functional room type (e.g., bedroom, living_room, kitchen, bathroom, balcony)

sample_type

int

Point cloud sampling configuration: 0 = 8 panoramic views (most complete), 1 = 8 perspective views (most sparse), 2 = 16 perspective views, 3 = 24 perspective views

split

str

Dataset partition: train, val, test, or reserved

walls

fo.Detections

Wall annotations as 3D bounding boxes

doors

fo.Detections

Door annotations as 3D bounding boxes

windows

fo.Detections

Window annotations as 3D bounding boxes

objects

fo.Detections

Furniture/object annotations as oriented 3D bounding boxes

3D Detection Fields#

All annotation fields use fo.Detection with 3D attributes:

Attribute

Description

label

Semantic category (e.g., wall, door, window, sofa, bed)

location

[x, y, z] centre position in world-space coordinates

dimensions

[sx, sy, sz] bounding box extents in metres

rotation

[rx, ry, rz] Euler rotation; walls and openings use yaw (Z-rotation) derived from wall endpoints

Walls have tags containing the wall ID (e.g., wall_0). Doors and windows have tags containing both the opening ID and the parent wall ID they are attached to (e.g., ["door_0", "wall_16"]).

A minimum thickness of 0.05 m is enforced on walls and openings so they remain visible in the 3D viewer.

Coordinate System#

The dataset uses a Z-up convention: the floor sits at Z = 0, height extends along +Z. Coordinates are quantized into 1,280 bins at 2.5 cm resolution in the original SpatialLM format and stored as continuous float values in FiftyOne.

Dataset Splits#

Split

Scenes

Point Cloud Samples

Train

11,328

~199,286

Val

500

500

Test

500

500

Reserved

—

—

Multiple point cloud samples of the same room (varying observation completeness) exist in the training set. Val/test splits use a single random sample per room.

Object Categories#

The dataset annotates 59 object categories (excluding walls, doors, and windows) organised by function:

Super Category

Categories

Seating

sofa, chair, dining_chair, bar_chair, stool

Bedding

bed, pillow

Cabinetry

wardrobe, nightstand, tv_cabinet, wine_cabinet, bathroom_cabinet, shoe_cabinet, entrance_cabinet, decorative_cabinet, washing_cabinet, wall_cabinet, sideboard, cupboard

Tables

coffee_table, dining_table, side_table, dressing_table, desk

Kitchen

integrated_stove, gas_stove, range_hood, microwave_oven, sink, stove, refrigerator

Bathroom

hand_sink, shower, shower_room, toilet, tub

Lighting

illumination, chandelier, floor_standing_lamp

Decoration

wall_decoration, painting, curtain, carpet, plants, potted_bonsai

Electronics

tv, computer, air_conditioner, washing_machine

Other

clothes_rack, mirror, bookcase, cushion, bar, screen, combination_sofa, dining_table_combination, leisure_table_and_chair_combination, multifunctional_combination_bed

Small objects with all side lengths below 15 cm are filtered out. The dataset contains 412,932 annotated object instances from 35,426 unique CAD models.

Dataset Creation#

Curation Rationale#

SpatialLM was created to address the lack of large-scale, high-quality datasets that provide both architectural layout annotations (walls, doors, windows) and 3D object bounding boxes for indoor scenes. Existing real-world datasets like ScanNet and ScanCAD are limited in scale (hundreds to low thousands of scenes) and typically provide only object annotations. Existing synthetic alternatives either lack visual realism (ProcTHOR, ASE — procedurally generated) or are too small (Hypersim: 461 scenes, HSSD: 211 scenes). SpatialLM fills this gap by leveraging professionally designed interiors at scale.

Source Data#

Data Collection and Processing#

The 3D scenes originate from a large repository of interior designs on Kujiale, an online interior design platform. Most designs were created by professional designers for real-world production use. The authors parsed each 3D house model into individual rooms and applied filtering rules, yielding 12,328 distinct scenes with 54,778 rooms.

For each scene, photo-realistic RGBD images were rendered using a production-grade rendering engine. Camera trajectories traversing each room were simulated, with images captured at 0.5 m intervals. Point clouds were then reconstructed from these RGBD renders. Each room has up to 4 point cloud sampling configurations varying in observation completeness (8 panoramic views through 24 perspective views).

Who are the source data producers?#

Professional interior designers using the Kujiale platform. The scenes reflect real-world Chinese residential interior design conventions and standards.

Annotations#

Annotation process#

Annotations are derived directly from the source 3D scene geometry — no manual annotation was required. Walls, doors, and windows are extracted from the architectural model. Object bounding boxes (oriented 3D boxes with semantic categories) are computed from the placed 3D assets in the design. Objects are annotated across 59 categories, with small objects

Citation#

@inproceedings{SpatialLM,
  title     = {SpatialLM: Training Large Language Models for Structured Indoor Modeling},
  author    = {Mao, Yongsen and Zhong, Junhao and Fang, Chuan and Zheng, Jia and Tang, Rui and Zhu, Hao and Tan, Ping and Zhou, Zihan},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025}
}