Note

This is a community plugin, an external project maintained by its respective author. Community plugins are not part of FiftyOne core and may change independently. Please review each plugin’s documentation and license before use.

GitHub Repo

SAM3 for FiftyOne#

Integration of Meta’s SAM3 (Segment Anything Model 3) into FiftyOne, with full batching support and visual embeddings.

Features#

Three Segmentation Operations

  • Concept Segmentation: Find ALL matching instances using text prompts

  • Visual Segmentation: Segment SPECIFIC instances using interactive prompts (boxes/points)

  • Automatic Segmentation: Generate all masks without prompts (with quality filtering & deduplication)

Visual Embeddings

  • Extract 1024-dim visual embeddings for similarity search

  • Three pooling strategies: mean, max, cls

  • Independent of text prompts

Installation#

Important: SAM3 is brand new and requires installing transformers from source, since SAM3 support has not yet shipped in a PyPI release:

# Install transformers from source
pip install git+https://github.com/huggingface/transformers.git#egg=transformers

# Install FiftyOne
pip install fiftyone

# Install other dependencies
pip install torch torchvision huggingface-hub
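
To confirm that the source build is active, you can check the installed version from Python; source installs of transformers typically carry a .dev suffix (a quick sanity check, not a strict requirement):

import transformers

# Source installs typically report a ".dev0"-style version string;
# a plain release version suggests a PyPI build without SAM3 support
print(transformers.__version__)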

Loading as Remote Zoo Model#

import fiftyone.zoo as foz

# Register the remote model source
foz.register_zoo_model_source(
    "https://github.com/harpreetsahota204/sam3_images"
)

# Load the model
model = foz.load_zoo_model("facebook/sam3")

Quick Start Notebook#

Try it instantly in Google Colab:

Open In Colab

The notebook covers:

  • Visual embeddings & similarity search

  • Concept segmentation (single & multi-concept)

  • Visual segmentation with existing detections

  • Automatic segmentation with quality filtering

Parameters#

Model Parameters#

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| operation | str | "concept_segmentation" | Operation type |
| prompt | str | None | Default text prompt |
| threshold | float | 0.5 | Confidence threshold |
| mask_threshold | float | 0.5 | Mask binarization threshold |
| points_mask_index | int | 0 | Which mask for point prompts (0 = best) |
| auto_kwargs | dict | {} | Automatic segmentation settings |
| auto_kwargs.points_per_side | int | 16 | Point grid density (16² = 256 points) |
| auto_kwargs.points_per_batch | int | 256 | Inference batch size |
| auto_kwargs.quality_threshold | float | 0.8 | Minimum quality score to keep (0-1) |
| auto_kwargs.iou_threshold | float | 0.85 | IoU threshold for deduplication (0-1) |
| auto_kwargs.max_masks | int | None | Maximum masks to return (None = unlimited) |
| pooling_strategy | str | "mean" | Embeddings pooling (mean/max/cls) |
| return_semantic_seg | bool | False | Include semantic segmentation mask |
| device | str | "auto" | Device (cuda/cpu/mps/auto) |

Operations#

concept_segmentation

  • Finds ALL matching instances

  • Supports text prompts only (single string or list)

  • Returns all objects matching the concept

visual_segmentation

  • Segments SPECIFIC instances

  • Supports box or point prompts

  • Returns one mask per prompt

automatic_segmentation

  • No prompts needed

  • Generates all possible masks

  • Memory intensive
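
All three operations are served by the same model object; you select one by setting its operation attribute before calling apply_model(), as in this minimal sketch (the complete example below walks through every workflow):

import fiftyone.zoo as foz

model = foz.load_zoo_model("facebook/sam3")

# Choose the operation before applying the model
model.operation = "concept_segmentation"      # text prompts, finds all matches
# model.operation = "visual_segmentation"     # box/point prompts
# model.operation = "automatic_segmentation"  # no prompts, segments everything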

Complete Example (All Operations)#

import fiftyone as fo
import fiftyone.zoo as foz
import fiftyone.brain as fob

# Load dataset
dataset = foz.load_zoo_dataset("quickstart")

# Register remote zoo model
foz.register_zoo_model_source(
    "https://github.com/harpreetsahota204/sam3_images"
)

# Load SAM3 model
model = foz.load_zoo_model("facebook/sam3")

# ============================================================
# 1. Compute Embeddings for Similarity Search
# ============================================================
model.pooling_strategy = "max"  # or "mean", "cls"

dataset.compute_embeddings(
    model,
    embeddings_field="sam_embeddings",
    batch_size=32
)

# Visualize with UMAP
fob.compute_visualization(
    dataset,
    method="umap",
    brain_key="sam_viz",
    embeddings="sam_embeddings",
    num_dims=2
)

# ============================================================
# 2. Automatic Segmentation (Segment Everything)
# ============================================================
model.operation = "automatic_segmentation"
model.threshold = 0.5
model.mask_threshold = 0.5

dataset.apply_model(
    model,
    label_field="automatic_segmentation",
    batch_size=4,
    num_workers=4
)

# ============================================================
# 3. Visual Segmentation (Refine Existing Detections)
# ============================================================
model.operation = "visual_segmentation"

dataset.apply_model(
    model,
    label_field="visual_segmentation",
    prompt_field="ground_truth",  # Use existing detections as prompts
    batch_size=64,
    num_workers=4
)

# ============================================================
# 4. Concept Segmentation (Find Multiple Object Types)
# ============================================================
model.operation = "concept_segmentation"
model.prompt = [
    "bird", "human", "land vehicle", "air vehicle",
    "aquatic vehicle", "animal", "food", "utensils", "furniture"
]
model.threshold = 0.5
model.mask_threshold = 0.5

dataset.apply_model(
    model,
    label_field="concept_segmentation",
    batch_size=8,
    num_workers=4
)

# Launch app
session = fo.launch_app(dataset)

Quick Start#

Concept Segmentation (Text Prompts)#

Find ALL instances matching text concepts:

import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart")

# Load model
model = foz.load_zoo_model("facebook/sam3")

# Single concept
model.operation = "concept_segmentation"
model.prompt = "person"
model.threshold = 0.5
model.mask_threshold = 0.5

dataset.apply_model(
    model,
    label_field="people",
    batch_size=16,
    num_workers=4
)

# Multiple concepts (finds all in each image)
# Note: Runs one inference pass per concept, so 4 concepts = 4x slower
model.prompt = ["person", "car", "dog", "bird"]
model.threshold = 0.5
model.mask_threshold = 0.5

dataset.apply_model(
    model,
    label_field="multiple_objects",
    batch_size=8,
    num_workers=4
)

session = fo.launch_app(dataset)

Visual Segmentation (Box or Point Prompts)#

Segment SPECIFIC instances using existing detections or keypoints as prompts:

# Load model
model = foz.load_zoo_model("facebook/sam3")

# Configure for visual segmentation
model.operation = "visual_segmentation"

# Option 1: Use boxes as prompts
dataset.apply_model(
    model,
    label_field="box_segmentations",
    prompt_field="ground_truth",  # Field with fo.Detections (boxes)
    batch_size=64,
    num_workers=4
)

# Option 2: Use keypoints as prompts
dataset.apply_model(
    model,
    label_field="point_segmentations",
    prompt_field="keypoints",  # Field with fo.Keypoints (points)
    batch_size=64,
    num_workers=4
)

Automatic Segmentation#

Generate all masks without prompts using point grid sampling with automatic filtering:

# Load model
model = foz.load_zoo_model("facebook/sam3")

# Configure for automatic segmentation
model.operation = "automatic_segmentation"
model.auto_kwargs = {
    "points_per_side": 16,       # Grid density (16x16 = 256 points)
    "points_per_batch": 256,     # Inference batch size
    "quality_threshold": 0.8,    # Keep masks with IoU score >= 0.8
    "iou_threshold": 0.85,       # Remove duplicates with IoU > 0.85
    "max_masks": 100             # Limit to top 100 masks by quality
}

dataset.apply_model(
    model,
    label_field="auto_masks",
    batch_size=4,
    num_workers=2
)

Quality of Life Features:

  • Quality filtering: Only keeps high-quality masks (configurable threshold)

  • Deduplication: Removes overlapping/duplicate masks using NMS (see the sketch after this list)

  • Limit results: Optionally cap at top N masks by quality score

  • Clear labeling: Masks labeled as object_0, object_1, etc. for easy visualization
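
For intuition, the deduplication step behaves like greedy non-maximum suppression over mask IoU. Below is a minimal conceptual sketch in NumPy, not the plugin's actual implementation; mask_iou and dedupe_masks are hypothetical helpers:

import numpy as np

def mask_iou(a, b):
    # IoU between two boolean masks of shape (H, W)
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union > 0 else 0.0

def dedupe_masks(masks, scores, iou_threshold=0.85):
    # Visit masks from highest to lowest quality score, keeping each
    # mask only if it doesn't overlap a kept mask above the threshold
    order = np.argsort(scores)[::-1]
    keep = []
    for i in order:
        if all(mask_iou(masks[i], masks[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep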

Visual Embeddings#

Extract embeddings for similarity search and visualization:

import fiftyone.brain as fob

# Load model
model = foz.load_zoo_model("facebook/sam3")

# Configure pooling strategy
model.pooling_strategy = "max"  # or "mean", "cls"

# Compute embeddings
dataset.compute_embeddings(
    model,
    embeddings_field="sam3_embeddings",
    batch_size=32
)

# Similarity search
fob.compute_similarity(
    dataset,
    embeddings="sam3_embeddings",
    brain_key="sam3_sim"
)
query_id = dataset.first().id  # query by sample ID
similar = dataset.sort_by_similarity(query_id, k=10, brain_key="sam3_sim")

# Visualize with UMAP
fob.compute_visualization(
    dataset,
    embeddings="sam3_embeddings",
    method="umap",
    num_dims=2
)

Semantic Segmentation#

SAM3 can also return a semantic segmentation mask alongside the instance masks, i.e. a single unified mask covering ALL matching instances:

# Load model
model = foz.load_zoo_model("facebook/sam3")

# Configure
model.operation = "concept_segmentation"
model.prompt = "person"
model.return_semantic_seg = True  # Enable semantic segmentation

dataset.apply_model(
    model,
    label_field="instance_masks",
    batch_size=16,
    num_workers=4
)

Per-Sample Text Prompts#

# Load model
model = foz.load_zoo_model("facebook/sam3")

# Configure
model.operation = "concept_segmentation"

# Assumes dataset has field "my_prompt" with text values
dataset.apply_model(
    model,
    label_field="results",
    prompt_field="my_prompt",  # Field with str or list values
    batch_size=16,
    num_workers=4
)

# Examples of valid prompts in dataset field:
# - "cat" (single string - finds cats in that image)
# - ["cat", "dog", "bird"] (list - finds all three in that image, 3x slower)

Batching Limitations#

Visual Segmentation: SAM3 Tracker cannot batch images with different numbers of prompts (boxes/points). The model automatically falls back to sequential processing when this occurs.

# This will batch efficiently (all images have 1 box)
dataset.apply_model(model, prompt_field="single_box", batch_size=16)

# This will fall back to sequential (images have varying box counts)
# Performance will be slower but results are still correct
dataset.apply_model(model, prompt_field="multi_box", batch_size=16)

Concept Segmentation: No batching limitations; variable prompts are handled natively.
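
To anticipate which path you'll hit, you can check whether every sample carries the same number of prompts before applying the model. A quick check, assuming a Detections field named ground_truth:

from fiftyone import ViewField as F

# Distinct prompt counts across the dataset; a single value means
# every image has the same number of boxes, so batching will apply
counts = dataset.values(F("ground_truth.detections").length())
print(set(counts))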

Model Details#

  • Model: facebook/sam3

  • Embeddings: 1024-dimensional visual features

  • Input: RGB images (any size, resized to 1008x1008)

  • Output: Instance masks with bounding boxes and scores
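
A quick way to verify the embedding dimensionality on your own data (assumes embeddings were computed into the sam3_embeddings field, as in the examples above):

# Inspect one stored embedding; its length should be 1024
emb = dataset.first()["sam3_embeddings"]
print(len(emb))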

Citation#

@article{sam3,
  title={SAM 3: Segment Anything Model 3},
  author={Meta AI Research},
  year={2025}
}