Note

This is a community plugin, an external project maintained by its respective author. Community plugins are not part of FiftyOne core and may change independently. Please review each plugin’s documentation and license before use.

Visual Question Answering Plugin#

Updates#

  • 2024-05-07: Major updates:

    • Added support for Moondream2 model.

    • Added support for reading the question from a field on the sample.

    • Added support for storing the answer in a field on the sample.

    • Added support for applying to all samples in the current view (one at a time).

    • Added support for delegated execution.

    • Added support for Python operator execution.

  • 2024-05-03: @harpreetsahota204 added support for Idefics-8b model from Replicate.

  • 2023-10-24: Added support for Llava-13b and Fuyu-8b models from Replicate.

Plugin Overview#

This Python plugin lets you ask questions about the images in your dataset and get answers from a visual question answering (VQA) model.

Supported Models#

This version of the plugin supports several models, including Moondream2, Idefics-8b, Llava-13b, and Fuyu-8b (see the updates above).

Feel free to fork this plugin and add support for other models!

Installation#

Pre-requisites#

  1. If you plan to use a model that runs through the Hugging Face transformers library, install it:

pip install transformers

  2. If you plan to use a model hosted on Replicate, install the Replicate library:

pip install replicate

And add your Replicate API key to your environment:

export REPLICATE_API_TOKEN=<your-api-token>
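
Before running a Replicate-hosted model, it can help to confirm that the token is actually visible to your Python process. The following is a minimal sketch, not part of the plugin: the helper name `replicate_token_configured` is hypothetical, and it only checks that the `REPLICATE_API_TOKEN` environment variable is present and non-empty, not that the token is valid.

```python
import os


def replicate_token_configured(env=None):
    """Return True if REPLICATE_API_TOKEN is present and non-empty.

    `env` is any mapping of environment variables; it defaults to
    os.environ. This helper is illustrative and not part of the plugin.
    """
    env = os.environ if env is None else env
    token = env.get("REPLICATE_API_TOKEN", "")
    return bool(token.strip())


if __name__ == "__main__":
    if replicate_token_configured():
        print("REPLICATE_API_TOKEN is set")
    else:
        print("REPLICATE_API_TOKEN is missing; set it before using Replicate models")
```

If the check fails in a notebook or IDE, the variable was probably exported in a different shell session than the one that launched Python.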

Install the plugin#

fiftyone plugins download https://github.com/jacobmarks/vqa-plugin

Operators#

answer_visual_question#

  • Applies the selected visual question answering model to the selected sample in your dataset and outputs the answer.

Usage#

The recommended interactive way to use this plugin is in the FiftyOne App with exactly one sample selected.

Python Operator Execution#

To apply the operator to every sample in a dataset or view, use Python operator execution:

import fiftyone as fo
import fiftyone.operators as foo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart", max_samples=5)

# Access the operator via its URI (plugin name + operator name)
vqa = foo.get_operator("@jacobmarks/vqa/answer_visual_question")

# Apply the operator to the dataset
vqa(
    dataset,
    model_name="llava",
    question="Describe the image",
    answer_field="llava_answer",
)

# Print the answers
print(dataset.values("llava_answer"))
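
Once the answers are stored, it is often more useful to see them next to the image they belong to and to spot samples that received no answer. The sketch below is one way to do that under stated assumptions: `pair_answers` is a hypothetical helper, not part of the plugin, and the lists are stand-in data so the example runs without FiftyOne installed.

```python
def pair_answers(filepaths, answers):
    """Pair each filepath with its answer and collect unanswered samples.

    Returns (answered, missing), where `answered` is a list of
    (filepath, answer) tuples and `missing` is a list of filepaths whose
    answer is None or blank. Illustrative helper, not part of the plugin.
    """
    answered = []
    missing = []
    for path, answer in zip(filepaths, answers):
        if answer is None or not str(answer).strip():
            missing.append(path)
        else:
            answered.append((path, str(answer).strip()))
    return answered, missing


# Stand-in data; with a real dataset these lists would come from
# dataset.values("filepath") and dataset.values("llava_answer").
filepaths = ["/data/001.jpg", "/data/002.jpg", "/data/003.jpg"]
answers = ["A dog on a beach", None, "  Two people riding bikes  "]

answered, missing = pair_answers(filepaths, answers)
for path, answer in answered:
    print(f"{path}: {answer}")
print(f"{len(missing)} sample(s) without an answer")
```

The `None` check assumes that samples without a stored answer come back as `None` from `dataset.values(...)`; adjust it if your answer field is populated differently.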