Note

This is a community plugin, an external project maintained by its respective author. Community plugins are not part of FiftyOne core and may change independently. Please review each plugin’s documentation and license before use.

LeRobot Dataset Importer for FiftyOne#

A streamlined FiftyOne importer for LeRobot datasets that have been extracted into individual PNG images and JSON metadata files. Creates properly grouped datasets with temporal relationships preserved across multiple camera views.

Installation#

Install FiftyOne:
```
pip install fiftyone
```

Clone this repository:

git clone https://github.com/harpreetsahota204/fiftyone_lerobot_importer.git
cd fiftyone_lerobot_importer

Expected Dataset Structure#

Your LeRobot dataset should be in the extracted format:

dataset_root/
 extracted_data/
    episode_000000/
       episode_000000_000000_cam_low.png
       episode_000000_000000_cam_high.png
       episode_000000_000000_cam_right_wrist.png
       episode_000000_000000_cam_left_wrist.png
       episode_000000_000000.json
       ... (more frames)
    episode_000001/
 meta/
     info.json
     episodes.jsonl
     tasks.jsonl
     stats.json

Converting from Parquet Format#

If your LeRobot dataset is in the standard parquet format (with data/chunk-000/episode_*.parquet files), you can convert it to the extracted format using the provided extraction script:

# Convert parquet files to extracted PNG/JSON format
python extract_from_parquet_parallel.py \
    --input-dir <path_to_parquet_files> \
    --output-dir <path_to_output_directory> \
    --workers <select_number_of_workers>

# This will create the required directory structure:
# extracted_data/
#  episode_000000/
#     episode_000000_000000_cam_high.png
#     episode_000000_000000_cam_low.png
#     episode_000000_000000.json
#     ...
#  episode_000001/
#      ...

Extraction Script Options:

--input-dir: Directory containing episode_*.parquet files
--output-dir: Where to save extracted PNG/JSON files
--workers: Number of parallel workers (default: CPU count / 2)
--keep-parquet: Keep original parquet files after extraction
--test-one: Process only one episode for testing
--sequential: Disable multiprocessing for debugging

Note: The extraction process requires the meta/ directory to already exist with info.json, episodes.jsonl, tasks.jsonl, and stats.json files. These are typically created when you download or prepare the LeRobot dataset.

Usage#

import fiftyone as fo
from lerobot_importer import LeRobotDatasetImporter, LeRobotDataset

dataset_dir = "/path/to/your/dataset"
dataset = fo.Dataset.from_dir(
    dataset_dir=dataset_dir,
    dataset_type=LeRobotDataset,
    camera_views=["low", "high", "right_wrist", "left_wrist"],
    labels_path="./meta",
    name="my_lerobot_dataset",
    include_metadata=True,
    overwrite=True,
)

print(f"Created dataset with {len(dataset)} samples")
print(f"Camera views: {dataset.group_slices}")
print(f"Default view: {dataset.default_group_slice}")

Parameters#

Parameter	Type	Required	Description
`dataset_dir`	str	Yes	Root directory containing extracted_data/ and meta/
`dataset_type`	class	Yes	Must be `LeRobotDataset`
`camera_views`	List[str]	Yes	Camera view names (e.g., [“low”, “high”, “right_wrist”])
`labels_path`	str	No	Path to meta directory (default: “./meta”)
`name`	str	Yes	Name for the FiftyOne dataset
`include_metadata`	bool	No	Load trajectory data from JSON files (default: True)
`episode_ids`	List[int]	No	Specific episode IDs to load (None = all)
`task_ids`	List[int]	No	Specific task IDs to load (None = all)
`max_samples`	int	No	Maximum samples to load (None = all)
`shuffle`	bool	No	Shuffle loading order (default: False)
`seed`	int	No	Random seed for shuffling
`default_slice`	str	No	Default camera view (default: first camera)
`overwrite`	bool	No	Overwrite existing dataset (default: False)

Example Variations#

Load specific episodes only:#

dataset = fo.Dataset.from_dir(
    dataset_dir=dataset_dir,
    dataset_type=LeRobotDataset,
    camera_views=["low", "high"],
    episode_ids=[0, 1, 2, 3, 4],  # First 5 episodes only
    name="first_five_episodes",
    overwrite=True,
)

Load single camera view:#

dataset = fo.Dataset.from_dir(
    dataset_dir=dataset_dir,
    dataset_type=LeRobotDataset,
    camera_views=["high"],  # Only high camera
    name="high_camera_only",
    max_samples=100,
    overwrite=True,
)

Skip trajectory metadata for faster loading:#

dataset = fo.Dataset.from_dir(
    dataset_dir=dataset_dir,
    dataset_type=LeRobotDataset,
    camera_views=["low", "high", "right_wrist", "left_wrist"],
    include_metadata=False,  # Skip JSON loading
    name="fast_loading",
    overwrite=True,
)

Working with Grouped Datasets#

The importer automatically creates grouped datasets where each group represents one temporal frame with multiple camera views:

# Access different camera views
high_cam = dataset.select_group_slices("high")
wrist_cam = dataset.select_group_slices("right_wrist")

# Group by episode for navigation
episodes_view = dataset.group_by("episode_index", order_by="frame_index")
dataset.save_view("by_episodes", episodes_view)

Sample Data Structure#

Each sample contains trajectory and metadata fields:

sample = dataset.first()

# Basic fields
print(sample.episode_index)    # Episode number
print(sample.frame_index)      # Frame number within episode
print(sample.camera_view)      # Camera view name
print(sample.task)            # Task name

# Trajectory data (if include_metadata=True)
print(sample.timestamp)           # Frame timestamp
print(sample.observation_state)   # Robot joint states
print(sample.action)              # Robot actions
print(sample.observation_velocity) # Joint velocities
print(sample.observation_effort)   # Joint efforts

Visualization#

Launch FiftyOne App to visualize your dataset:

session = fo.launch_app(dataset)

Navigate between camera views using the group slices dropdown in the FiftyOne interface.

Requirements#

Extracted format only: Works exclusively with PNG/JSON files (no parquet support)
Camera views required: You must specify which camera views to load
JSON metadata: Uses episode/frame indices from JSON files, not filename parsing
Standard structure: Expects extracted_data/ and meta/ directories

Notes#

The importer creates one FiftyOne sample per camera view per frame
All samples from the same frame share a group ID for proper temporal grouping
JSON trajectory data is automatically sanitized for MongoDB compatibility
Large datasets may take time to load due to file I/O and metadata processing