Note
This is a community plugin, an external project maintained by its respective author. Community plugins are not part of FiftyOne core and may change independently. Please review each plugin’s documentation and license before use.
FiftyOne + Twelve Labs Plugin#
Bring multimodal video intelligence into your computer vision workflows with FiftyOne and Twelve Labs.
This plugin lets you generate rich video embeddings (visual, audio, OCR, conversation) using the Twelve Labs API and organize them into a clip-level FiftyOne dataset for analysis, search, and iteration.
Ideal for building your own retrieval pipelines, video QA systems, or semantic labeling tools on top of real clip-level understanding.
Key Features#
Generate multimodal embeddings from full videos
Automatically split videos into meaningful clips
Store results in a new FiftyOne dataset with clip-level granularity
Run semantic search over your indexed videos using prompts
Uses secure secrets (TL_API_KEY) for easy API access
Installation#
Install the plugin directly in FiftyOne:
fiftyone plugins download https://github.com/danielgural/semantic_video_search
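If you prefer Python, here is a minimal sketch of the same download step, assuming your FiftyOne version exposes fiftyone.plugins.download_plugin:

import fiftyone.plugins as fop

# Downloads the plugin from GitHub (equivalent to the CLI command above)
fop.download_plugin("https://github.com/danielgural/semantic_video_search")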
Plugin Operators#
create_twelve_labs_embeddings#
Generate embeddings for your videos via the Twelve Labs API. Videos are automatically split into clips, and the resulting dataset contains embeddings from selected modalities:
visual
audio
Each resulting sample contains a TemporalDetection corresponding to its embeddings. Convert the dataset into clips with to_clips to use them like normal embeddings (see the sketch below and the Clip Dataset Conversion example at the end of this page)!
We recommend running this as a delegated operator due to the processing time.
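For example, here is a minimal sketch of converting the embedded dataset into clips. The dataset and field names below are assumptions (the field name follows the Clip Dataset Conversion example at the end of this page); inspect your dataset's schema for the actual names:

import fiftyone as fo

# Hypothetical name for the dataset produced by create_twelve_labs_embeddings
embedded = fo.load_dataset("twelve_labs_embeddings")

# Inspect the schema to find the TemporalDetection field that was added
print(embedded.get_field_schema())

# One sample per clip, each carrying its TemporalDetection and embedding
clips = embedded.to_clips("Twelve Labs Marengo-retrieval-27")
print(clips)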
create_twelve_labs_index#
Creates a searchable Twelve Labs index from your embedded clips. Choose your index name and embedding types. You can build indexes from:
Entire dataset
Current view
Selected samples
Note that this builds the index within Twelve Labs!
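As a rough sketch, you could also trigger this from Python on a specific view. The operator URI and parameter names below are assumptions (the real inputs are defined by the operator's form in the App), so treat this as illustrative only:

import fiftyone as fo
import fiftyone.operators as foo

clips = fo.load_dataset("twelve_labs_clips")  # hypothetical clip dataset name

# Index only a filtered view rather than the entire dataset
view = clips.match_tags("eval")  # hypothetical tag

foo.execute_operator(
    "@danielgural/semantic_video_search/create_twelve_labs_index",  # assumed URI
    ctx={
        "view": view,
        # Illustrative parameter names; check the operator's form for the real ones
        "params": {"index_name": "my-tl-index", "embedding_types": ["visual", "audio"]},
    },
)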
twelve_labs_index_search#
Query your Twelve Labs index using a natural language prompt, and return results sorted by relevance. You can select one or more modalities to match (e.g., visual + audio + OCR).
Use this to semantically explore your video data while keeping data in Twelve Labs!
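Here is a sketch of running a prompt search from Python and reviewing the results in the App; again, the operator URI and parameter names are assumptions, and you may prefer to run the operator from the App's operator browser instead:

import fiftyone as fo
import fiftyone.operators as foo

clips = fo.load_dataset("twelve_labs_clips")  # hypothetical clip dataset name

foo.execute_operator(
    "@danielgural/semantic_video_search/twelve_labs_index_search",  # assumed URI
    ctx={
        "view": clips.view(),
        # Illustrative parameters; the operator's form defines the real inputs
        "params": {
            "prompt": "a goalkeeper making a diving save",
            "modalities": ["visual", "audio"],
        },
    },
)

# Review the most relevant clips in the App
session = fo.launch_app(clips)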
Environment Setup#
You’ll need a Twelve Labs API Key.
export TL_API_KEY=<YOUR_TWELVE_LABS_API_KEY>
You can also securely store it in the FiftyOne App as a plugin secret.
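If you are working in a script or notebook, you can also set the key there before the plugin's operators run; a minimal sketch:

import os

# Must be set before the operators call the Twelve Labs API
os.environ["TL_API_KEY"] = "<YOUR_TWELVE_LABS_API_KEY>"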
Example Workflow#
Generate clip-level embeddings: run create_twelve_labs_embeddings on a video dataset → creates a new dataset with embedded clips for more embedding awesomeness!
Index your clips: run create_twelve_labs_index on the clip dataset → builds a searchable index with selected modalities that stays in Twelve Labs
Search your videos: use twelve_labs_index_search with a prompt → view the most relevant clips inside FiftyOne!
Resources#
Clip Dataset Conversion#
import fiftyone as fo
import fiftyone.utils.video as fouv


def create_clip_dataset(
    dataset: fo.Dataset,
    clip_field: str,
    new_dataset_name: str = "clips",
    overwrite: bool = True,
    viz: bool = False,
    sim: bool = False,
) -> fo.Dataset:
    """Extracts each clip in ``clip_field`` into its own video file and
    returns a new dataset with one sample per clip."""
    clip_view = dataset.to_clips(clip_field)
    clip_dataset = fo.Dataset(name=new_dataset_name, overwrite=overwrite)

    i = 0
    last_file = ""
    samples = []
    for clip in clip_view:
        fpath = clip.filepath

        # Restart the per-video clip counter when we reach a new source video
        if fpath != last_file:
            i = 0
            last_file = fpath

        # Extract this clip into its own video file
        out_path = fpath.rsplit(".", 1)[0] + f"_{i}.mp4"
        fouv.extract_clip(fpath, output_path=out_path, support=clip.support)

        # Point the new sample at the extracted clip file
        clip.filepath = out_path
        samples.append(clip)
        i += 1

    clip_dataset.add_samples(samples)

    # Copy the clip-level embeddings onto the new samples
    clip_dataset.add_sample_field(
        "Twelve Labs Marengo-retrieval-27 Embeddings", fo.VectorField
    )
    clip_dataset.set_values(
        "Twelve Labs Marengo-retrieval-27 Embeddings",
        clip_view.values("Twelve Labs Marengo-retrieval-27.embedding"),
    )

    return clip_dataset
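For example, assuming your embedded dataset uses the field name referenced above, you might convert and explore it like this:

import fiftyone as fo

embedded = fo.load_dataset("twelve_labs_embeddings")  # hypothetical dataset name

clip_dataset = create_clip_dataset(
    embedded,
    clip_field="Twelve Labs Marengo-retrieval-27",
    new_dataset_name="twelve_labs_clips",
)
print(clip_dataset)

# Browse the extracted clips in the App
session = fo.launch_app(clip_dataset)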
License#
MIT