Note
This is a community plugin, an external project maintained by its respective author. Community plugins are not part of FiftyOne core and may change independently. Please review each plugin’s documentation and license before use.
Synthetic GUI Samples Plugin for FiftyOne#
A comprehensive FiftyOne plugin for augmenting GUI screenshot datasets with computer vision and language model transformations. This plugin provides a collection of operators designed specifically for synthetic GUI data generation and augmentation.
Overview#
This plugin is designed for image datasets containing GUI screenshots in COCO4GUI format, with detections
(bounding boxes) and keypoints
annotations. It provides both visual transformations and text augmentation capabilities to create diverse synthetic training data for GUI understanding models.
Key Features#
Visual Augmentations: Grayscale conversion, color inversion, colorblind simulation, and resolution scaling
LLM-Powered Text Augmentation: Rephrase or translate task descriptions using local language models
Annotation Preservation: All transformations preserve bounding boxes, keypoints, and metadata
Flexible Resolution Support: Resize images to common screen resolutions for multi-device training
Provenance Tracking: Complete transformation history for reproducibility
Installation#
Prerequisites#
FiftyOne >= 1.3.0
Python >= 3.10
OpenCV (
cv2
)NumPy
Pillow (PIL)
For LLM Text Augmentation#
# Standard PyTorch installation
pip install torch transformers
Plugin Installation#
Download this repository to your FiftyOne plugins directory:
fiftyone plugins download https://github.com/harpreetsahota204/synthetic_gui_samples_plugins.git
Restart your FiftyOne App to load the plugin:
# If FiftyOne is running, restart it
fiftyone app launch
Operators#
This plugin provides five main operators accessible through the FiftyOne App’s operator browser:
1. Grayscale Augmentation#
Operator: grayscale_augment
Converts images to 3-channel grayscale while preserving all annotations.
Features:
Maintains 3-channel BGR format for compatibility
Preserves all bounding boxes and keypoints
Optional label field copying controls
Use Case: Create grayscale variants for robustness testing and data diversity.
2. Color Inversion Augmentation#
Operator: invert_colors_augment
Inverts image colors using bitwise NOT operation.
Features:
Complete color inversion (white becomes black, etc.)
Preserves spatial relationships
Maintains annotation accuracy
Use Case: Test model robustness to inverted color schemes (dark mode UIs, high contrast displays).
3. Colorblind Simulation#
Operator: colorblind_sim_augment
Simulates various types of color vision deficiency.
Supported Types:
Deuteranopia: Green-blind (complete)
Protanopia: Red-blind (complete)
Tritanopia: Blue-blind (complete)
Deuteranomaly: Green-weak (partial)
Protanomaly: Red-weak (partial)
Tritanomaly: Blue-weak (partial)
Use Case: Ensure GUI accessibility by testing how interfaces appear to users with color vision deficiencies.
4. Task Description Augmentation#
Operator: task_description_augment
Uses local language models to rephrase or translate task descriptions in annotations.
Features:
Multiple Models: Support for Qwen3-0.6B, Qwen3-1.7B
Two Modes:
Rephrase: Generate alternative phrasings in the same language
Translate: Convert to different languages
Provenance: Preserves original text and includes reasoning
Selective Processing: Choose which annotation types to process
Use Case: Create diverse language variations for multilingual GUI understanding or paraphrase augmentation.
5. Resolution Scaling#
Operator: resize_images
Resizes images to common screen resolutions while maintaining annotation accuracy.
Supported Resolutions:
Mobile/Tablet: 1024×768, 1280×800
Laptop/Desktop: 1366×768, 1920×1080, 1440×900, 1536×864
High-End: 2560×1440, 3840×2160 (4K), 5120×2880 (5K)
Ultrawide: 2560×1080, 3440×1440
Custom: User-defined dimensions
Features:
Automatic annotation scaling (relative coordinates preserved)
Multiple interpolation methods
Batch processing support
Use Case: Generate training data for different screen sizes and device types.
Usage#
Basic Workflow#
Load your dataset in FiftyOne App
Select samples (optional) - operators work on selection or entire view
Open operator browser ( icon in the App)
Choose an operator from the
@harpreetsahota/synthetic_gui_samples_plugins
sectionConfigure parameters in the operator form
Execute immediately or delegate for background processing
Example: Grayscale Augmentation#
import fiftyone as fo
# Load your GUI dataset
dataset = fo.load_dataset("my_gui_dataset")
# Launch FiftyOne App
session = fo.launch_app(dataset)
# In the App:
# 1. Select samples or use entire view
# 2. Open operator browser ( icon)
# 3. Find "Apply Grayscale Augmentation"
# 4. Choose which label fields to copy
# 5. Execute
Example: LLM Text Augmentation#
# For task description rephrasing:
# 1. Select samples with task_description annotations
# 2. Run "Rephrase Task Descriptions with LLM"
# 3. Choose model (Qwen3-0.6B for speed, Qwen3-1.7B for quality)
# 4. Select "Simple Rephrasing" mode
# 5. Choose annotation types to process
# 6. Execute
# For translation:
# 1. Same as above but select "Translate to Different Language"
# 2. Specify target language (e.g., "Spanish", "French", "Chinese")
# 3. Execute
Example: Resolution Scaling#
# Resize to common resolutions:
# 1. Select GUI screenshots
# 2. Run "Resize Images to Screen Resolutions"
# 3. Choose target resolution (e.g., "1920x1080")
# 4. Or enable custom resolution and specify dimensions
# 5. Select which annotations to preserve
# 6. Execute
Configuration#
Execution Modes#
All operators support two execution modes:
Immediate: Process immediately in the FiftyOne App (default)
Delegated: Queue for background processing (requires orchestrator setup)
Label Field Selection#
Most operators allow you to choose which annotation fields to copy:
Detections: Bounding box annotations
Keypoints: Point-based annotations
All Fields: Copy all label fields automatically
Output Location#
Transformed images are saved in the same directory as original images with unique hash suffixes to prevent conflicts.
Architecture#
Core Components#
transform_sample()
: Central utility for applying image transformationsTransform Functions: OpenCV-based image processing functions
LLM Integration: Hugging Face Transformers for text processing
Annotation Handling: Automatic copying and scaling of spatial annotations
Transform Record#
Each augmented sample includes a transform_record
in its metadata for full provenance tracking:
{
"transforms": [{"name": "grayscale", "params": {}}],
"source_sample_id": "original_id",
"timestamp": "2024-01-01T12:00:00",
"plugin": "synthetic_gui_samples_plugins"
}
Supported Annotation Types#
Detections (
fo.Detections
): Bounding boxes with labels and attributesKeypoints (
fo.Keypoints
): Point-based annotationsTask Descriptions: Text attributes on detection/keypoint objects
Custom Attributes: All custom fields and metadata are preserved
Use Cases#
GUI Model Training#
Multi-Resolution Training: Generate samples at different screen resolutions
Accessibility Testing: Create colorblind-simulated variants
Robustness Testing: Test with inverted colors and grayscale images
Multilingual Support: Generate translated task descriptions
Data Augmentation Pipeline#
# Example workflow combining multiple operators:
# 1. Start with base GUI screenshots
# 2. Apply grayscale augmentation for robustness
# 3. Resize to multiple resolutions for device compatibility
# 4. Use colorblind simulation for accessibility
# 5. Rephrase task descriptions for linguistic diversity
Research Applications#
Vision-Language Models: Train on diverse visual and textual variations
Accessibility Research: Study GUI perception across different visual conditions
Cross-Cultural UX: Generate multilingual interface descriptions
Advanced Features#
LLM Models#
The plugin supports multiple language models with different performance characteristics:
Model |
Size |
Speed |
Quality |
---|---|---|---|
Qwen3-0.6B |
Small |
Fastest |
Good |
Qwen3-1.7B |
Medium |
Fast |
Better |
Custom Resolutions#
Beyond predefined screen resolutions, you can specify custom dimensions for specialized use cases.
Batch Processing#
All operators support batch processing of selected samples or entire dataset views.
Development#
Plugin Structure#
synthetic_gui_samples_plugins/
fiftyone.yml # Plugin configuration
__init__.py # Plugin registration
utils.py # Core transformation utilities
grayscale.py # Grayscale augmentation operator
invert_colors.py # Color inversion operator
color_blind_sim.py # Colorblind simulation operator
task_description_augment.py # LLM text augmentation
resizer.py # Resolution scaling operator
README.md # This file
Requirements#
Python Dependencies#
fiftyone>=1.3.0
opencv-python
numpy
Pillow
torch
transformers
License#
This project is licensed under the Apache 2.0 License - see the LICENSE file for details.