Note

This is a community plugin, an external project maintained by its respective author. Community plugins are not part of FiftyOne core and may change independently. Please review each plugin’s documentation and license before use.

Nanonets-OCR2 for FiftyOne

A FiftyOne Zoo Model integration for Nanonets-OCR2, a vision-language model that converts documents into structured markdown with intelligent content recognition and semantic tagging.

Features

Nanonets-OCR2 goes beyond traditional OCR by providing:

LaTeX Equation Recognition: Converts mathematical formulas to LaTeX syntax
Intelligent Image Description: Describes images within documents using <img> tags
Signature Detection: Isolates signatures with <signature> tags
Watermark Extraction: Detects watermarks with <watermark> tags
Smart Checkbox Handling: Converts checkboxes to Unicode symbols (☐, ☑, ☒)
Complex Table Extraction: Outputs tables in HTML format
Flow Charts & Org Charts: Extracts as Mermaid code
Handwritten Documents: Trained on handwritten text across multiple languages
Multilingual Support: English, Chinese, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Arabic, and more
Visual Question Answering: Provides answers directly from documents
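
The semantic tags named in the list above can be pulled out of the model's text with ordinary string processing. The following is a minimal sketch, not part of the plugin: `extract_tags` and the sample string are hypothetical, and it assumes the tags appear as simple, non-nested `<tag>...</tag>` pairs.

```python
import re

# Tag names taken from the feature list; extend as needed
TAGS = ("img", "signature", "watermark")

def extract_tags(text, tags=TAGS):
    """Map each tag name to the list of strings found between its open/close tags."""
    found = {}
    for tag in tags:
        # Non-greedy match so adjacent tags of the same kind stay separate
        pattern = re.compile(rf"<{tag}>(.*?)</{tag}>", re.DOTALL)
        found[tag] = [m.strip() for m in pattern.findall(text)]
    return found

# Hypothetical model output, for illustration only
sample = (
    "Quarterly report\n"
    "<img>Bar chart of sales by region</img>\n"
    "<watermark>CONFIDENTIAL</watermark>\n"
    "<signature>J. Doe</signature>\n"
)
print(extract_tags(sample))
```

A regular expression is enough here because the tags carry plain text. Nested or malformed markup would need a real parser.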

Installation

pip install fiftyone

Usage

import fiftyone as fo
import fiftyone.zoo as foz
from fiftyone.utils.huggingface import load_from_hub

# Load your dataset
dataset = load_from_hub("Voxel51/scanned_receipts", max_samples=200)

# Register the model source
foz.register_zoo_model_source(
    "https://github.com/harpreetsahota204/nanonets_ocr2",
    overwrite=True
)

# Load the model
model = foz.load_zoo_model("nanonets/Nanonets-OCR2-3B")

# Apply OCR to your dataset
dataset.apply_model(model, label_field="ocr_text")

# Launch the App to view results
session = fo.launch_app(dataset)
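
After the model has run, it can help to skim the results outside the App. The sketch below assumes the `ocr_text` field holds a plain string, which this document does not confirm. `preview_ocr` is a hypothetical helper, not part of FiftyOne or the plugin, and it only relies on samples supporting `sample[field]` lookups, so stand-in dictionaries are used to keep the example runnable without FiftyOne.

```python
def preview_ocr(samples, field="ocr_text", max_chars=80):
    """Return (filepath, snippet) pairs for samples whose OCR field is a non-empty string."""
    rows = []
    for sample in samples:
        text = sample[field]
        if not isinstance(text, str) or not text.strip():
            continue  # skip samples with no text, or with a non-string value
        # Collapse whitespace so multi-line output fits on one line
        snippet = " ".join(text.split())[:max_chars]
        rows.append((sample["filepath"], snippet))
    return rows

# Stand-in records (hypothetical paths and text) in place of real samples
demo = [
    {"filepath": "/tmp/receipt_001.jpg", "ocr_text": "TOTAL\n\n$12.50"},
    {"filepath": "/tmp/receipt_002.jpg", "ocr_text": None},
]
print(preview_ocr(demo))
```

With a real dataset, the same function could be called on a small view such as `dataset.take(5)`.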

Structured Output

The model returns text with semantic markup:

Regular text extracted naturally

<table>
  <tr><td>Column 1</td><td>Column 2</td></tr>
</table>

Inline equation: $E = mc^2$

<img>Description of chart showing sales data</img>

<watermark>CONFIDENTIAL</watermark>

<page_number>5</page_number>

Checkboxes: ☑ Complete ☐ Incomplete
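
Output in this shape can be post-processed with the Python standard library alone. The sketch below is one possible approach, not the plugin's API: `TableRows` and `parse_output` are hypothetical names, the checkbox symbols ☑ and ☐ are assumed to be the ones the model emits, and the sample string is an abbreviated copy of the example above.

```python
import re
from html.parser import HTMLParser

class TableRows(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            self.rows[-1].append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data

def parse_output(text):
    """Return table rows, the page number, and checkbox counts found in the text."""
    parser = TableRows()
    parser.feed(text)
    parser.close()
    page = re.search(r"<page_number>\s*(\d+)\s*</page_number>", text)
    return {
        "rows": [[cell.strip() for cell in row] for row in parser.rows],
        "page_number": int(page.group(1)) if page else None,
        "checked": text.count("☑"),    # assumed "checked" symbol
        "unchecked": text.count("☐"),  # assumed "unchecked" symbol
    }

sample = """Regular text extracted naturally

<table>
  <tr><td>Column 1</td><td>Column 2</td></tr>
</table>

<page_number>5</page_number>

Checkboxes: ☑ Complete ☐ Incomplete
"""
print(parse_output(sample))
```

Note that all rows are gathered into a single list, so a document with several tables would need the parser extended to start a new group at each `<table>` tag.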

Citation

@misc{Nanonets-OCR2,
  title={Nanonets-OCR2: A model for transforming documents into structured markdown with intelligent content recognition and semantic tagging},
  author={Souvik Mandal and Ashish Talewar and Siddhant Thakuria and Paras Ahuja and Prathamesh Juvatkar},
  year={2025},
}

Resources

License

See LICENSE for details.