Note

This is a community plugin, an external project maintained by its respective author. Community plugins are not part of FiftyOne core and may change independently. Please review each plugin’s documentation and license before use.

Isaac-0.2 FiftyOne Model Zoo Integration#

A FiftyOne Model Zoo integration for Isaac-0.2 by Perceptron AI: hybrid-reasoning vision-language models designed for real-world visual understanding tasks.

Overview#

Isaac-0.2 extends the efficient frontier of perception — small models that outperform systems 10× larger on visual reasoning and perception tasks, all running on commodity GPUs or edge devices. From robotics to media search to industrial inspection, Isaac 0.2 delivers high-accuracy perception without the heavy compute footprint.

Read the full announcement.

Available Models:

  • Isaac-0.2-2B-Preview - 2B parameter hybrid-reasoning model for maximum accuracy

  • Isaac-0.2-1B - 1B parameter model for faster inference and edge deployment

What’s New in Isaac 0.2#

  • Reasoning via Thinking Traces: Short, structured reasoning traces improve multi-step decisions, small-object understanding, and ambiguous spatial tasks

  • Tool Calling + Focus (Zoom & Crop): Isaac 0.2 can trigger tool calls to focus (zoom + crop) and re-query on smaller regions — improving fine-grained perception

  • Structured Outputs: More reliable structured output generation for consistent JSON and predictable downstream integration

  • Complex OCR: Improved text recognition across cluttered, low-resolution, or distorted regions — enabling accurate extraction from documents, diagrams, labels, screens, and dense real-world scenes

  • Desktop Use: Better performance on everyday desktop and mobile workflows such as UI understanding and navigation, making Isaac faster and more capable for agentic use cases

Features#

  • Object Detection: Detect and localize objects with bounding boxes

  • Keypoint Detection: Identify key points in images with spatial awareness

  • Complex OCR: Extract and detect text from documents, diagrams, labels, screens, and cluttered scenes

  • Classification: Classify images into categories with reliable JSON output

  • Visual Question Answering: Answer questions about image content

  • Segmentation: Generate polygon masks for instance segmentation

  • Desktop/UI Understanding: Navigate and understand desktop and mobile interfaces

Installation#

Prerequisites#

pip install fiftyone
pip install perceptron
pip install transformers
pip install accelerate
pip install torch torchvision
pip install huggingface-hub

Register the Model Zoo#

import fiftyone.zoo as foz

# Register this model zoo source
foz.register_zoo_model_source(
    "https://github.com/perceptron-ai-inc/fiftyone-isaac-0_2",
    overwrite=True
)
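
To sanity-check the registration, you can list the available zoo models and look for the Isaac entries. list_zoo_models is standard FiftyOne; the expected names below follow this repository's model listing:

# Confirm the Isaac models are now available
isaac_models = [name for name in foz.list_zoo_models() if "Isaac" in name]
print(isaac_models)
# Should include "PerceptronAI/Isaac-0.2-2B-Preview" and "PerceptronAI/Isaac-0.2-1B"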

Usage#

Loading the Model#

import fiftyone.zoo as foz

# Load the Isaac-0.2 2B model
model = foz.load_zoo_model("PerceptronAI/Isaac-0.2-2B-Preview")

# Or load the 1B model for faster inference
model = foz.load_zoo_model("PerceptronAI/Isaac-0.2-1B")

Object Detection#

import fiftyone as fo
import fiftyone.zoo as foz

# Load a dataset
dataset = foz.load_zoo_dataset("quickstart", max_samples=10)

# Load model and set operation
model = foz.load_zoo_model("PerceptronAI/Isaac-0.2-2B-Preview")
model.operation = "detect"

# Set detection prompt
model.prompt = "Animals, Humans, Vehicles, Objects"

# Apply model to dataset
dataset.apply_model(model, label_field="detections")
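
To inspect the predicted boxes interactively, you can launch the FiftyOne App; this is standard FiftyOne usage, not specific to this plugin:

import fiftyone as fo

# Browse the "detections" field in the App
session = fo.launch_app(dataset)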

Visual Question Answering (VQA)#

model.operation = "vqa"
model.prompt = "Describe the spatial relationships between objects in this scene"

dataset.apply_model(model, label_field="vqa_response")

OCR Text Detection#

model.operation = "ocr_detection"
model.prompt = "Detect all text in this image"

dataset.apply_model(model, label_field="text_detections")

OCR Text Extraction#

model.operation = "ocr"
model.prompt = "Extract all text from this image"

dataset.apply_model(model, label_field="extracted_text")
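
Because the ocr operation stores a plain string on each sample, a standard FiftyOne view stage can isolate the samples where text was actually found (a quick sketch):

# View of samples where the model populated the field
with_text = dataset.exists("extracted_text")
print(with_text.count(), "samples contain extracted text")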

Keypoint Detection#

model.operation = "point"
model.prompt = "Identify key features: eyes, nose, corners"

dataset.apply_model(model, label_field="keypoints")

Classification#

model.operation = "classify"
model.prompt = "Classify the weather: sunny, rainy, snowy, cloudy, indoor"

dataset.apply_model(model, label_field="weather")

Segmentation (Polygons)#

model.operation = "segment"
model.prompt = "Draw polygons around the following objects: person, car, animal"

dataset.apply_model(model, label_field="polygons")

Advanced Usage#

Thinking Mode#

Enable structured reasoning traces for improved accuracy on complex scenes. Thinking traces improve multi-step decisions, small-object understanding, and ambiguous spatial tasks:

model.operation = "detect"
model.enable_thinking = True

dataset.apply_model(model, label_field="detections_with_reasoning")

# Disable when done
model.enable_thinking = False

Focus Tool Call#

Enable the Focus system (zoom + crop) for fine-grained perception. Isaac 0.2 can trigger tool calls to focus on smaller regions and re-query, improving detection of small objects and dense scenes. The Focus tool only works with box-producing operations (detect and ocr_detection):

model.operation = "detect"
model.enable_focus_tool_call = True

dataset.apply_model(model, label_field="focused_detections")

# Disable when done
model.enable_focus_tool_call = False

Combining Advanced Options#

You can combine both options for maximum precision:

model.operation = "detect"
model.enable_thinking = True
model.enable_focus_tool_call = True

dataset.apply_model(model, label_field="enhanced_detections")

Using Sample-Level Prompts#

You can use different prompts for each sample in your dataset:

# Apply model using the prompt field
model.operation = "detect"
dataset.apply_model(
    model,
    label_field="custom_detections",
    prompt_field="sample_prompt"
)
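
Note that the prompt field must already exist on your samples before calling apply_model. One way to populate it is with set_values (the field name sample_prompt matches the snippet above; the prompt string is illustrative):

# Give every sample its own detection prompt
dataset.set_values(
    "sample_prompt",
    ["Cars, pedestrians, traffic signs" for _ in range(len(dataset))],
)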

Custom System Prompts#

You can customize the system prompt for specific use cases:

model.system_prompt = """
You are a specialized assistant for medical image analysis.
Focus on identifying anatomical structures and abnormalities.
"""

Model Operations#

| Operation | Description | Output Type | Example Prompt |
| --- | --- | --- | --- |
| detect | Object detection with bounding boxes | fo.Detections | “Cars, pedestrians, traffic signs” |
| point | Keypoint detection | fo.Keypoints | “Eyes, nose, mouth corners” |
| classify | Image classification | fo.Classifications | “Indoor or outdoor scene” |
| ocr | Text extraction | String | “Extract all text from the image” |
| ocr_detection | Text detection with boxes | fo.Detections | “Detect text regions” |
| ocr_polygon | Text detection with polygons | fo.Polylines | “Detect text regions” |
| segment | Instance segmentation | fo.Polylines | “Segment all objects” |
| vqa | Visual question answering | String | “What is the main subject?” |
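
Since each operation writes to its own label field, one convenient pattern is to sweep several operations over the same dataset in a loop (a sketch; the prompts and field names are illustrative):

# Run several Isaac-0.2 operations over the same samples
tasks = {
    "detect": ("Animals, Humans, Vehicles", "isaac_detections"),
    "classify": ("Indoor or outdoor scene", "isaac_scene"),
    "vqa": ("What is the main subject?", "isaac_summary"),
}

for operation, (prompt, field) in tasks.items():
    model.operation = operation
    model.prompt = prompt
    dataset.apply_model(model, label_field=field)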

Example Notebook#

See isaac_0_2_demo.ipynb for a complete interactive notebook demonstrating all operations. You can run it directly in Google Colab or download it for local execution.

Model Details#

  • Parameters: 2B (Preview) / 1B

  • Architecture: Based on Qwen with custom vision encoder

  • Vision Resolution: Dynamic, up to 60 megapixels

  • Context Length: 16,384 tokens

  • Training: Perceptive-language pretraining on multimodal data

License#

  • Code: Apache 2.0 License

  • Model Weights: Creative Commons Attribution-NonCommercial 4.0 International License

Citation#

If you use Isaac-0.2 in your research or applications, please cite:

@software{isaac2025fiftyone,
  title = {Isaac-0.2 FiftyOne Model Zoo Integration},
  author = {Perceptron AI},
  year = {2025},
  url = {https://github.com/perceptron-ai-inc/fiftyone-isaac-0_2},
  note = {FiftyOne integration for Isaac-0.2 perceptive-language model}
}

@misc{perceptronai2025isaac,
  title = {Isaac-0.2: A Perceptive-Language Model},
  author = {{Perceptron AI}},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/PerceptronAI/Isaac-0.2-2B-Preview},
  note = {Open-source multimodal model for real-world visual understanding}
}

Contact#

  • Technical inquiries: support@perceptron.inc

  • Commercial inquiries: sales@perceptron.inc

  • Join the team: join-us@perceptron.inc

Acknowledgments#