Note
This is a community plugin, an external project maintained by its respective author. Community plugins are not part of FiftyOne core and may change independently. Please review each plugin’s documentation and license before use.
FiftyOne + Twelve Labs Plugin#
Bring multimodal video intelligence into your computer vision workflows with FiftyOne and Twelve Labs.
This plugin lets you generate rich video embeddings (visual, audio, OCR, conversation) using the Twelve Labs API and organize them into a clip-level FiftyOne dataset for analysis, search, and iteration.
Ideal for building your own retrieval pipelines, video QA systems, or semantic labeling tools on top of real clip-level understanding.
Key Features#
Generate multimodal embeddings from full videos
Automatically split videos into meaningful clips
Store results in a new FiftyOne dataset with clip-level granularity
Run semantic search over your indexed videos using prompts
Uses secure secrets (TL_API_KEY) for easy API access
Installation#
Install the plugin directly in FiftyOne:
fiftyone plugins download https://github.com/danielgural/semantic_video_search
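If you prefer Python, here is a minimal sketch of the same download step, assuming your FiftyOne version exposes fiftyone.plugins.download_plugin:

import fiftyone.plugins as fop

# Downloads the plugin from GitHub (equivalent to the CLI command above)
fop.download_plugin("https://github.com/danielgural/semantic_video_search")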
Plugin Operators#
create_twelve_labs_embeddings#
Generate embeddings for your videos via the Twelve Labs API. Videos are automatically split into clips, and the resulting dataset contains embeddings from selected modalities:
visual
audio
Each resulting sample contains a TemporalDetection corresponding to its embeddings. Convert the dataset into clips with to_clips to use them like normal embeddings (see the sketch below and the Clip Dataset Conversion example at the end of this page)!
We recommend running this as a delegated operator due to the processing time.
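For example, here is a minimal sketch of converting the embedded dataset into clips. The dataset and field names below are assumptions (the field name follows the Clip Dataset Conversion example at the end of this page); inspect your dataset's schema for the actual names:

import fiftyone as fo

# Hypothetical name for the dataset produced by create_twelve_labs_embeddings
embedded = fo.load_dataset("twelve_labs_embeddings")

# Inspect the schema to find the TemporalDetection field that was added
print(embedded.get_field_schema())

# One sample per clip, each carrying its TemporalDetection and embedding
clips = embedded.to_clips("Twelve Labs Marengo-retrieval-27")
print(clips)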
create_twelve_labs_index#
Creates a searchable Twelve Labs index from your embedded clips. Choose your index name and embedding types. You can build indexes from:
Entire dataset
Current view
Selected samples
Note that this builds the index within Twelve Labs!
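As a rough sketch, you could also trigger this from Python on a specific view. The operator URI and parameter names below are assumptions (the real inputs are defined by the operator's form in the App), so treat this as illustrative only:

import fiftyone as fo
import fiftyone.operators as foo

clips = fo.load_dataset("twelve_labs_clips")  # hypothetical clip dataset name

# Index only a filtered view rather than the entire dataset
view = clips.match_tags("eval")  # hypothetical tag

foo.execute_operator(
    "@danielgural/semantic_video_search/create_twelve_labs_index",  # assumed URI
    ctx={
        "view": view,
        # Illustrative parameter names; check the operator's form for the real ones
        "params": {"index_name": "my-tl-index", "embedding_types": ["visual", "audio"]},
    },
)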
twelve_labs_index_search#
Query your Twelve Labs index using a natural language prompt, and return results sorted by relevance. You can select one or more modalities to match (e.g., visual + audio + OCR).
Use this to semantically explore your video data while keeping data in Twelve Labs!
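Here is a sketch of running a prompt search from Python and reviewing the results in the App; again, the operator URI and parameter names are assumptions, and you may prefer to run the operator from the App's operator browser instead:

import fiftyone as fo
import fiftyone.operators as foo

clips = fo.load_dataset("twelve_labs_clips")  # hypothetical clip dataset name

foo.execute_operator(
    "@danielgural/semantic_video_search/twelve_labs_index_search",  # assumed URI
    ctx={
        "view": clips.view(),
        # Illustrative parameters; the operator's form defines the real inputs
        "params": {
            "prompt": "a goalkeeper making a diving save",
            "modalities": ["visual", "audio"],
        },
    },
)

# Review the most relevant clips in the App
session = fo.launch_app(clips)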
Environment Setup#
You’ll need a Twelve Labs API Key.
export TL_API_KEY=<YOUR_TWELVE_LABS_API_KEY>
You can also securely store it in the FiftyOne App as a plugin secret.
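If you are working in a script or notebook, you can also set the key there before the plugin's operators run; a minimal sketch:

import os

# Must be set before the operators call the Twelve Labs API
os.environ["TL_API_KEY"] = "<YOUR_TWELVE_LABS_API_KEY>"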
Example Workflow#
Generate clip-level embeddings: run create_twelve_labs_embeddings on a video dataset → creates a new dataset with embedded clips for more embedding awesomeness!
Index your clips: run create_twelve_labs_index on the clip dataset → builds a searchable index with selected modalities that stays in Twelve Labs
Search your videos: use twelve_labs_index_search with a prompt → view the most relevant clips inside FiftyOne!
Resources#
Clip Dataset Conversion#
import fiftyone as fo
import fiftyone.utils.video as fouv


def create_clip_dataset(
    dataset: fo.Dataset,
    clip_field: str,
    new_dataset_name: str = "clips",
    overwrite: bool = True,
    viz: bool = False,
    sim: bool = False,
) -> fo.Dataset:
    """Extracts each clip in ``clip_field`` into its own video file and
    returns a new dataset with one sample per clip."""
    clip_view = dataset.to_clips(clip_field)
    clip_dataset = fo.Dataset(name=new_dataset_name, overwrite=overwrite)

    i = 0
    last_file = ""
    samples = []
    for clip in clip_view:
        fpath = clip.filepath

        # Restart the per-video clip counter when we reach a new source video
        if fpath != last_file:
            i = 0
            last_file = fpath

        # Extract this clip into its own video file
        out_path = fpath.rsplit(".", 1)[0] + f"_{i}.mp4"
        fouv.extract_clip(fpath, output_path=out_path, support=clip.support)

        # Point the new sample at the extracted clip file
        clip.filepath = out_path
        samples.append(clip)
        i += 1

    clip_dataset.add_samples(samples)

    # Copy the clip-level embeddings onto the new samples
    clip_dataset.add_sample_field(
        "Twelve Labs Marengo-retrieval-27 Embeddings", fo.VectorField
    )
    clip_dataset.set_values(
        "Twelve Labs Marengo-retrieval-27 Embeddings",
        clip_view.values("Twelve Labs Marengo-retrieval-27.embedding"),
    )

    return clip_dataset
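For example, assuming your embedded dataset uses the field name referenced above, you might convert and explore it like this:

import fiftyone as fo

embedded = fo.load_dataset("twelve_labs_embeddings")  # hypothetical dataset name

clip_dataset = create_clip_dataset(
    embedded,
    clip_field="Twelve Labs Marengo-retrieval-27",
    new_dataset_name="twelve_labs_clips",
)
print(clip_dataset)

# Browse the extracted clips in the App
session = fo.launch_app(clip_dataset)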
License#
MIT