Annotating Datasets with CVAT¶
The tight integration between FiftyOne and CVAT allows you to curate and explore datasets in FiftyOne and then send off samples or existing labels for annotation in CVAT with just one line of code.
This walkthrough covers:
Selecting subsets and annotating unlabeled image datasets with CVAT
Improving datasets and fixing annotation mistakes with CVAT
Annotating videos with CVAT
So, what’s the takeaway?
FiftyOne makes it incredibly easy to explore datasets, understand them, and discover ways to improve them. This walkthrough covers the important next step: using CVAT to take action, both annotating new data and correcting the label deficiencies you've identified in your datasets.
If you haven’t already, install FiftyOne:
!pip install fiftyone
In order to use CVAT, you must create an account on a CVAT server.
Another option is to set up CVAT locally and then configure FiftyOne to use your self-hosted server. A primary benefit of setting up CVAT locally is avoiding the limits of app.cvat.ai, which restricts you to 10 tasks and 500MB of data.
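If you go the self-hosted route, you can point FiftyOne at your server via the `FIFTYONE_CVAT_URL` environment variable. A minimal sketch, assuming CVAT is running at its default local address (substitute the URL of your own deployment):

```python
import os

# Point FiftyOne at a self-hosted CVAT server instead of app.cvat.ai.
# The address below assumes CVAT's default local deployment; replace it
# with the URL of your own server.
os.environ["FIFTYONE_CVAT_URL"] = "http://localhost:8080"
```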
In any case, FiftyOne will need to connect to your CVAT account. The easiest way to configure your CVAT login credentials is to store them in environment variables:
!export FIFTYONE_CVAT_USERNAME=<YOUR_USERNAME>
!export FIFTYONE_CVAT_PASSWORD=<YOUR_PASSWORD>
There are also other ways to configure your login credentials if you prefer.
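For example, you can set the same environment variables from within Python before connecting to CVAT. A sketch using placeholder values (substitute your real account details):

```python
import os

# Set CVAT credentials from Python instead of the shell.
# "my_username" and "my_password" are placeholders for your real
# CVAT account details.
os.environ["FIFTYONE_CVAT_USERNAME"] = "my_username"
os.environ["FIFTYONE_CVAT_PASSWORD"] = "my_password"
```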
Finally, this walkthrough also optionally runs an object detection model that requires TensorFlow:
!pip install tensorflow
Unlabeled dataset annotation¶
For most machine learning projects, the first step is to collect a suitable dataset for the task. For computer vision projects specifically, this will generally result in thousands of images or videos that have been gathered from internet sources like Flickr or captured as new footage by a data acquisition team.
With collections containing thousands or millions of samples, the cost to annotate every single sample can be astronomical. It thus makes sense to ensure that only the most useful and relevant data is being sent to annotation. One metric for how “useful” data is in training a model is how unique the example is with respect to the rest of the dataset. Multiple similar examples will not provide the model with as much new information to learn as visually unique examples.
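To make the idea of "uniqueness" concrete, here is an illustrative pure-Python sketch, not FiftyOne's actual implementation: score each sample by the distance to its nearest neighbor in some embedding space, so near-duplicate samples receive low scores and visually distinct ones score high.

```python
import math

def nearest_neighbor_uniqueness(embeddings):
    """Score each embedding by the Euclidean distance to its nearest
    neighbor; near-duplicates score low, outliers score high.

    An illustrative O(n^2) sketch, not FiftyOne's implementation.
    """
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    scores = []
    for i, e in enumerate(embeddings):
        scores.append(min(dist(e, f) for j, f in enumerate(embeddings) if j != i))
    return scores

# Two near-duplicate points and one outlier: the outlier scores highest
embs = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)]
scores = nearest_neighbor_uniqueness(embs)
```

Selecting the top-scoring samples for annotation prioritizes examples that add new information to the training set.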
FiftyOne provides a visual similarity capability that we’ll use in this tutorial to select some unique images to annotate.
# Example: loading your own raw images from disk
import fiftyone as fo

dataset_dir = "/path/to/raw/data"
name = "my_dataset"

dataset = fo.Dataset.from_dir(
    dataset_dir=dataset_dir,
    dataset_type=fo.types.ImageDirectory,
    name=name,
)
100% |█████████████████████| 0/0 [3.5ms elapsed, ? remaining, ? samples/s]
import fiftyone as fo
import fiftyone.zoo as foz

# Load 200 unlabeled validation images (label_types=[] skips labels)
dataset = foz.load_zoo_dataset(
    "open-images-v6",
    split="validation",
    label_types=[],
    max_samples=200,
)
Downloading split 'validation' to '/home/voxel51/fiftyone/open-images-v6/validation' if necessary
Necessary images already downloaded
Existing download of split 'validation' is sufficient
Loading 'open-images-v6' split 'validation'
 100% |█████████████████| 200/200 [72.4ms elapsed, 0s remaining, 2.8K samples/s]
Dataset 'open-images-v6-validation-200' created
Now let’s make the dataset persistent so that we can access it in future Python sessions.
dataset.persistent = True
Now that the data is loaded, let’s visualize it in the FiftyOne App:
session = fo.launch_app(dataset)