Note
This is a community plugin, an external project maintained by its respective author. Community plugins are not part of FiftyOne core and may change independently. Please review each plugin’s documentation and license before use.
Clustering Plugin for FiftyOne#
This plugin provides a FiftyOne App that allows you to cluster your dataset using a variety of algorithms: K-means, BIRCH, and agglomerative clustering.
It also serves as a proof of concept for adding new “types” of runs to FiftyOne!
Installation#
fiftyone plugins download https://github.com/jacobmarks/clustering-plugin
You will also need to have scikit-learn installed:
pip install -U scikit-learn
Usage#
Clustering#
Once you have the plugin installed, you can generate clusters for your dataset using the compute_clusters operator.
The specific arguments depend on the method you choose: kmeans, birch, or agglomerative.
You can compute embeddings and generate clusters in the same run, or you can generate clusters from embeddings that already exist on your dataset.
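For intuition about what the three methods do with a set of existing embeddings, here is a minimal sketch that calls scikit-learn directly. It is not the plugin's implementation: the helper `cluster_embeddings`, the synthetic data, and the choice of `n_clusters=3` are assumptions made for illustration, and the arguments the operator exposes may differ.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, Birch, KMeans

# Stand-in for real embeddings: 60 two-dimensional points drawn around
# three well-separated centers (20 points per center).
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
embeddings = np.vstack(
    [center + rng.normal(scale=0.5, size=(20, 2)) for center in centers]
)


def cluster_embeddings(embeddings, method="kmeans", n_clusters=3):
    """Hypothetical helper: return one integer cluster label per row."""
    if method == "kmeans":
        model = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    elif method == "birch":
        model = Birch(n_clusters=n_clusters)
    elif method == "agglomerative":
        model = AgglomerativeClustering(n_clusters=n_clusters)
    else:
        raise ValueError(f"Unknown clustering method: {method!r}")
    return model.fit_predict(embeddings)


# Run all three methods on the same embeddings.
labels_by_method = {
    method: cluster_embeddings(embeddings, method)
    for method in ("kmeans", "birch", "agglomerative")
}
```

Each entry in `labels_by_method` is an array with one cluster index per embedding. The index values themselves are arbitrary, so compare groupings across methods rather than raw label numbers.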
You can generate clusters for:
Your entire dataset
A view of your dataset
Currently selected samples in the App
Additionally, you can run the operator in:
Real-time, or
In the background, as a delegated operation
Once you have generated clusters, you can view information about the clusters in the App with the get_clustering_run_info operator.
Visualizing Clusters#
It can be insightful to use clustering in conjunction with compute_visualization to visualize the clusters.
Labeling Clusters#
Once you have generated clusters, you can also use the magic of multimodal AI to automatically assign a short description, or label, to each cluster!
This is achieved by randomly selecting a few samples from each cluster, and prompting GPT-4V to generate a description for the cluster from the samples.
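The sampling step can be sketched in plain Python as follows. This is only an illustration of "randomly selecting a few samples from each cluster": the function name `sample_per_cluster`, the default of `k=5`, and the sample IDs are hypothetical, and the plugin's actual sample count and prompt are not shown here.

```python
import random
from collections import defaultdict


def sample_per_cluster(sample_ids, cluster_labels, k=5, seed=None):
    """Pick up to `k` random sample IDs from each cluster.

    `k=5` is an assumed value for "a few"; clusters with fewer than
    `k` members contribute all of their samples.
    """
    rng = random.Random(seed)

    # Group sample IDs by their cluster label.
    by_cluster = defaultdict(list)
    for sample_id, label in zip(sample_ids, cluster_labels):
        by_cluster[label].append(sample_id)

    # Draw without replacement inside each cluster.
    return {
        label: rng.sample(ids, min(k, len(ids)))
        for label, ids in by_cluster.items()
    }


# Ten made-up samples spread over three clusters.
ids = [f"sample_{i}" for i in range(10)]
labels = [0, 0, 0, 0, 1, 1, 1, 1, 1, 2]
picked = sample_per_cluster(ids, labels, k=3, seed=42)
```

The samples chosen for each cluster would then be sent to the vision model together with a prompt asking for a short description of what they have in common.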
To use this functionality, you must have an API key for OpenAI’s GPT-4V API, and you must set it in your environment as OPENAI_API_KEY:
export OPENAI_API_KEY=your-api-key
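If you start FiftyOne from a Python session, it can help to confirm that the key is visible to that process before running the labeling operator. The helper below is a hypothetical convenience, not part of the plugin; it only reads the environment variable named above.

```python
import os


def require_openai_key(env=os.environ):
    """Return the OpenAI API key from `env`, or raise if it is missing."""
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before labeling clusters"
        )
    return key


# Usage: call require_openai_key() once before launching the App.
```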
Then, you can label the clusters using the label_clusters_with_gpt4v operator.
This might take a minute or so, depending on the number of clusters, but it is worth it!
We recommend delegating the execution of this operation and then launching it via:
fiftyone delegated launch
Then you can view the labels in the App!