Note

This is a community plugin, an external project maintained by its respective author. Community plugins are not part of FiftyOne core and may change independently. Please review each plugin’s documentation and license before use.

VGGT: Visual Geometry Grounded Transformer FiftyOne Remote Source Zoo Model Integration#

This repository provides a FiftyOne Zoo Model for VGGT (Visual Geometry Grounded Transformer), enabling 3D scene reconstruction from single images with integrated visualization in FiftyOne.

Overview#

VGGT takes a single RGB image as input and produces:

Dense depth maps with confidence scores
Camera pose estimation (extrinsic and intrinsic parameters)
Dense 3D point clouds from depth map unprojection
A FiftyOne 3D scene for visualizing the reconstruction (camera orientation is currently fixed to up="Z"; dynamic orientation is on the roadmap)
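
The depth map unprojection mentioned in the list above can be illustrated with a pinhole camera model. The following is a minimal NumPy sketch, not the plugin's actual code: the function name `unproject_depth` and the toy intrinsics are hypothetical, and the result is expressed in the camera frame only (no extrinsics are applied).

```python
import numpy as np


def unproject_depth(depth: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Unproject an (H, W) depth map to (H*W, 3) camera-frame points.

    Hypothetical helper for illustration; assumes a pinhole camera with
    intrinsic matrix K = [[fx, 0, cx], [0, fy, cy], [0, 0, 1]].
    """
    h, w = depth.shape
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]

    # Pixel coordinate grids: u runs along columns (x), v along rows (y)
    u, v = np.meshgrid(np.arange(w), np.arange(h))

    # Invert the pinhole projection u = fx * x / z + cx, v = fy * y / z + cy
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy

    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)


# Toy example (made-up values): a flat surface 2 units in front of the camera
K = np.array([[100.0, 0.0, 2.0], [0.0, 100.0, 1.5], [0.0, 0.0, 1.0]])
depth = np.full((3, 4), 2.0)
points = unproject_depth(depth, K)
```

Each row of `points` is one pixel's 3D position; a point cloud writer such as Open3D could then save these to a PCD file.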

Installation#

# Install FiftyOne
pip install fiftyone

You also need to install the following:

pip install vggt@git+https://github.com/facebookresearch/vggt.git
pip install open3d

Register the VGGT Zoo Model source#

import fiftyone.zoo as foz

# Register the remote repository as a zoo model source
foz.register_zoo_model_source(
    "https://github.com/harpreetsahota204/vggt",
    overwrite=True,
)

Quick Start#

1. Load the Model#

import fiftyone as fo
import fiftyone.zoo as foz

# Load VGGT model from the zoo
model = foz.load_zoo_model("facebook/VGGT-1B")

2. Apply to Your Dataset#

# Load your dataset
dataset = fo.load_dataset("your_dataset")

# Apply VGGT model to generate depth maps and 3D reconstructions
dataset.apply_model(model, "depth_map_path")

Output Files#

For each input image, VGGT generates:

image_depth.png: Colorized depth map for heatmap visualization
image.pcd: 3D point cloud in PCD format
image.fo3d: FiftyOne 3D scene referencing the point cloud (camera orientation is currently fixed to up="Z")
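
The naming scheme above can be sketched as follows. This assumes that `image` stands for the input file's stem and that outputs sit in the same directory as the input; both are assumptions, and `expected_outputs` is a hypothetical helper rather than part of the plugin.

```python
from pathlib import Path


def expected_outputs(image_path: str) -> dict:
    """Return the output paths assumed for one input image.

    Hypothetical helper: assumes outputs share the input's directory and stem.
    """
    p = Path(image_path)
    return {
        "depth": p.with_name(f"{p.stem}_depth.png"),  # colorized depth map
        "point_cloud": p.with_suffix(".pcd"),         # 3D point cloud
        "scene": p.with_suffix(".fo3d"),              # FiftyOne 3D scene
    }


# Example with a made-up input path
outputs = expected_outputs("/data/scenes/kitchen.jpg")
```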

Configuration Options#

Model Parameters#

Parameter	Type	Default	Description
`confidence_threshold`	float	51.0	Percentile threshold (0-100) for point filtering
`mode`	str	"pad"	Image preprocessing mode ("pad" or "crop")
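
Because `confidence_threshold` is described as a percentile, a value of 51.0 would discard roughly the least-confident 51% of points. The sketch below shows one reading of that behavior with NumPy; `filter_by_confidence` is a hypothetical name, the data is random, and whether the plugin keeps points exactly at the cutoff is an assumption.

```python
import numpy as np


def filter_by_confidence(points, confidence, threshold=51.0):
    """Keep points whose confidence is at or above the given percentile.

    Hypothetical helper; `threshold` is a percentile in the range 0-100.
    """
    cutoff = np.percentile(confidence, threshold)
    mask = confidence >= cutoff
    return points[mask]


# Random stand-in data: 1000 points with one confidence score each
rng = np.random.default_rng(0)
points = rng.normal(size=(1000, 3))
confidence = rng.uniform(1.0, 10.0, size=1000)

# A higher percentile keeps fewer, more confident points
kept = filter_by_confidence(points, confidence, threshold=75.0)
```

Raising the threshold trades point cloud density for cleaner geometry, which is why the example configuration further down describes 75.0 as more aggressive filtering.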

Preprocessing Modes#

The mode parameter controls how images are preprocessed for VGGT:

mode="crop": Resizes the image to a width of 518px while maintaining aspect ratio. The height is center-cropped if it exceeds 518px.
mode="pad": Resizes the image so that its largest dimension is 518px while maintaining aspect ratio. The smaller dimension is then padded to produce a square 518x518 image.
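
To make the two modes concrete, the sketch below computes the image size after resizing and after cropping or padding, following only the description above. `preprocessed_size` is a hypothetical helper, and the real preprocessing may apply additional rounding that this sketch ignores.

```python
TARGET = 518  # target size in pixels, taken from the description above


def preprocessed_size(width: int, height: int, mode: str = "pad"):
    """Return ((resized_w, resized_h), (final_w, final_h)) for one image.

    Hypothetical helper that mirrors the documented behavior of each mode.
    """
    if mode == "crop":
        # Width becomes 518; height scales with it and is cropped to at most 518
        scale = TARGET / width
        resized = (TARGET, round(height * scale))
        final = (TARGET, min(resized[1], TARGET))
    elif mode == "pad":
        # Largest side becomes 518; the other side is padded up to 518
        scale = TARGET / max(width, height)
        resized = (round(width * scale), round(height * scale))
        final = (TARGET, TARGET)
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return resized, final


landscape_pad = preprocessed_size(1920, 1080, mode="pad")
landscape_crop = preprocessed_size(1920, 1080, mode="crop")
portrait_crop = preprocessed_size(1080, 1920, mode="crop")
```

Under this reading, "pad" always yields a square input and preserves the whole image, while "crop" keeps landscape images intact but discards the top and bottom of tall portrait images.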

Example Configuration#

# Load model with custom configuration
model = foz.load_zoo_model(
    "facebook/VGGT-1B",
    confidence_threshold=75.0,  # More aggressive filtering
    mode="crop"                 # Use crop instead of pad
)

Camera Orientation#

The implementation currently sets the FiftyOne camera orientation statically to up="Z".

Roadmap#

Current Release#

Depth map generation and visualization
3D point cloud reconstruction
FiftyOne Zoo Model integration

Future Releases#

Camera parameter extraction and export
Camera positioning at the actual VGGT camera location with the correct viewing direction
Dynamic camera orientation
Multi-view reconstruction support
Camera animation for video sequences
Advanced scene analysis tools

License#

This integration wrapper follows the same license as the underlying VGGT model. Please refer to the original VGGT repository for licensing details.

Contributing#

Fork the repository
Create a feature branch
Add tests for new functionality
Submit a pull request with a clear description

Citation#

If you use this FiftyOne integration in your research, please cite both VGGT and FiftyOne:

@inproceedings{wang2025vggt,
  title={VGGT: Visual Geometry Grounded Transformer},
  author={Wang, Jianyuan and Chen, Minghao and Karaev, Nikita and Vedaldi, Andrea and Rupprecht, Christian and Novotny, David},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2025}
}

@misc{fiftyone,
  title={FiftyOne},
  author={Voxel51},
  year={2020},
  url={https://github.com/voxel51/fiftyone}
}