Note

This is a community plugin, an external project maintained by its respective author. Community plugins are not part of FiftyOne core and may change independently. Please review each plugin’s documentation and license before use.


ScreenParser for FiftyOne

Inference results in FiftyOne using ScreenParser on a sample from the ScreenSpot-Pro dataset.

A FiftyOne remote model zoo source for ScreenParser, a YOLO11-L object detector fine-tuned by the docling-project on the ScreenParse v2 dataset (~1.45M screenshots) to localize 55 UI element classes (buttons, tables, navigation bars, text inputs, icons, etc.) in application and web screenshots.

ScreenParser is a standard Ultralytics YOLO model, so this integration uses FiftyOne’s built-in fiftyone.utils.ultralytics.FiftyOneYOLOModel wrapper. There is no custom inference code, only a manifest.json that describes where to download the weights and how to deploy them.

Requirements

pip install "fiftyone>=1.0" "ultralytics>=8.3.0"

Usage

Register this repository as a remote zoo model source, then load and apply the model like any other zoo model:

import fiftyone as fo
import fiftyone.zoo as foz

# 1. Register the remote source (one time)
foz.register_zoo_model_source("https://github.com/Burhan-Q/screenparser")

# 2. (Optional) Download the weights (153 MB) ahead of time;
#    load_zoo_model downloads them automatically if needed
foz.download_zoo_model(
    "https://github.com/Burhan-Q/screenparser",
    model_name="docling-project/ScreenParser",
)

# 3. Load the model
model = foz.load_zoo_model("docling-project/ScreenParser")

# 4. Apply to a dataset of screenshots
dataset = fo.Dataset.from_images_dir("/path/to/screenshots")
dataset.apply_model(model, label_field="ui_elements")

session = fo.launch_app(dataset)

Predictions are stored as fiftyone.core.labels.Detections in the ui_elements field.
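
FiftyOne stores each detection's bounding_box as [top-left-x, top-left-y, width, height] in relative coordinates in [0, 1]. The following is a minimal plain-Python sketch of converting such a box to pixel corners, for example before cropping a detected element for downstream processing. The helper name to_pixel_box and the sample values are hypothetical and not part of this integration.

```python
def to_pixel_box(rel_box, img_width, img_height):
    """Convert a FiftyOne-style relative box to integer pixel corners.

    rel_box is [x, y, w, h] with values in [0, 1], where (x, y) is the
    top-left corner. Returns (left, top, right, bottom) in pixels.
    """
    x, y, w, h = rel_box
    left = round(x * img_width)
    top = round(y * img_height)
    right = round((x + w) * img_width)
    bottom = round((y + h) * img_height)
    # Clamp to the image bounds in case a box is slightly out of range
    left, right = max(0, left), min(img_width, right)
    top, bottom = max(0, top), min(img_height, bottom)
    return left, top, right, bottom


# Hypothetical detection on a 1920x1080 screenshot
pixel_box = to_pixel_box([0.25, 0.5, 0.1, 0.05], 1920, 1080)
print(pixel_box)
```

In a real dataset, the image size would typically come from sample.metadata.width and sample.metadata.height after calling dataset.compute_metadata().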

Inference settings

The model was trained at 1280px; the manifest sets the recommended defaults of imgsz=1280, conf=0.10, iou=0.10. You can override the confidence threshold and other Ultralytics arguments at load time:

model = foz.load_zoo_model(
    "docling-project/ScreenParser",
    confidence_thresh=0.25,
    overrides={"iou": 0.10, "imgsz": 1280},
)
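
For intuition about the iou setting: in Ultralytics it is the IoU threshold used for non-maximum suppression (NMS), so a low value such as 0.10 discards a box that overlaps a higher-scoring box even modestly. The sketch below is a simplified, class-agnostic greedy NMS in plain Python. It is not the Ultralytics implementation, and the function names and sample boxes are illustrative assumptions.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_thresh):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if its overlap with every kept box is within the threshold
        if all(iou(boxes[i], boxes[k]) <= iou_thresh for k in keep):
            keep.append(i)
    return keep


# Two boxes overlapping with IoU of 1/3, plus one isolated box (made-up values)
boxes = [(0, 0, 100, 100), (50, 0, 150, 100), (300, 300, 400, 400)]
scores = [0.9, 0.8, 0.7]

kept_strict = greedy_nms(boxes, scores, iou_thresh=0.10)  # overlapping box dropped
kept_loose = greedy_nms(boxes, scores, iou_thresh=0.50)   # overlapping box kept
print(kept_strict, kept_loose)
```

With the low threshold only one of the two overlapping boxes survives, which fits the leaf-element annotation style of the training data, where heavily nested or duplicated boxes are undesirable.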

Training Data & Detected Classes

The current main checkpoint was trained on ScreenParse v2, which provides 1,447,100 high-quality training screenshots and 25,575,213 UI element annotations. The dataset uses filtered leaf-element annotations to reduce noisy nested boxes and includes multiple viewport resolutions.
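
As a quick sanity check on annotation density, the two figures above imply an average of roughly 17.7 annotated elements per screenshot. A short sketch of that arithmetic (the variable names are illustrative):

```python
screenshots = 1_447_100   # training screenshots in ScreenParse v2
annotations = 25_575_213  # UI element annotations in ScreenParse v2

avg_per_screenshot = annotations / screenshots
print(f"{avg_per_screenshot:.1f} annotations per screenshot on average")
```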

Limitations

  • Produces bounding boxes and element labels only; it does not produce text content for detected elements. Pair it with OCR or ScreenVLM when text extraction is needed.

  • The model is trained on rendered web screenshots, so performance may vary on native desktop, mobile, or application screenshots outside the training distribution.

Full list of the 55 detected classes:
  • Table

  • Column/Browser

  • Button

  • Utility Button

  • App Icon

  • Navigation Bar

  • Status Bar

  • Search Field

  • Toolbar

  • Tooltip

  • Video

  • Tab Bar

  • Side Bar

  • Slider

  • Picker

  • ContextMenu

  • DockMenu

  • EditMenu

  • Image

  • Scroll

  • Switch

  • File Icon

  • Chart

  • Window

  • Screen

  • List

  • List Item

  • PopUp Menu

  • Steppers

  • Toggles

  • Text Input

  • Rating Indicator

  • Checkbox

  • Radiobox

  • Select

  • Avatar

  • Badge

  • Alert

  • Progress bar

  • Bottom navigation

  • Breadcrumb

  • Page control

  • Link

  • Menu

  • Pagination

  • Tab

  • Search Bar

  • Date-Time picker

  • Calendar

  • Text

  • Heading

  • Code snippet

  • Carousel

  • Notification

  • Logo

License

The ScreenParser FiftyOne integration source is released under the Apache-2.0 license. The model weights are licensed separately by the docling-project; see the model card for details.