fiftyone.core.odm.database#

Database utilities.

Copyright 2017-2025, Voxel51, Inc.

Classes:

DatabaseConfigDocument(conn[, version, type])

Backing document for the database config.

Functions:

get_db_config()

Retrieves the database config.

establish_db_conn(config)

Establishes the database connection.

aggregate(collection, pipelines[, hints, ...])

Executes one or more aggregations on a collection.

ensure_connection()

Ensures that a database connection exists.

get_db_client()

Returns a database client.

get_db_conn()

Returns a connection to the database.

get_async_db_client([use_global])

Returns an async database client.

get_async_db_conn([use_global])

Returns an async connection to the database.

drop_database()

Drops the database.

sync_database()

Syncs all pending database writes to disk.

list_collections()

Returns a list of all collection names in the database.

drop_collection(collection_name)

Drops the specified collection from the database.

drop_orphan_collections([dry_run])

Drops all orphan collections from the database.

drop_orphan_saved_views([dry_run])

Drops all orphan saved views from the database.

drop_orphan_runs([dry_run])

Drops all orphan runs from the database.

drop_orphan_stores([dry_run])

Drops all orphan execution stores from the database.

stream_collection(collection_name)

Streams the contents of the collection to stdout.

get_collection_stats(collection_name)

Gets stats about the collection.

count_documents(coll, pipeline)

export_document(doc, json_path)

Exports the document to disk in JSON format.

export_collection(docs, json_dir_or_path[, ...])

Exports the collection to disk in JSON format.

import_document(json_path)

Imports a document from JSON on disk.

import_collection(json_dir_or_path[, key])

Imports the collection from JSON on disk.

insert_documents(docs, coll[, ordered, ...])

Inserts documents into a collection.

bulk_write(ops, coll[, ordered, batcher, ...])

Performs a batch of write operations on a collection.

list_datasets()

Returns the list of available FiftyOne datasets.

patch_saved_views(dataset_name[, dry_run])

Ensures that the saved view documents in the views collection for the given dataset exactly match the IDs in its dataset document.

patch_workspaces(dataset_name[, dry_run])

Ensures that the workspace documents in the workspaces collection for the given dataset exactly match the IDs in its dataset document.

patch_annotation_runs(dataset_name[, dry_run])

Ensures that the annotation runs in the runs collection for the given dataset exactly match the values in its dataset document.

patch_brain_runs(dataset_name[, dry_run])

Ensures that the brain method runs in the runs collection for the given dataset exactly match the values in its dataset document.

patch_evaluations(dataset_name[, dry_run])

Ensures that the evaluation runs in the runs collection for the given dataset exactly match the values in its dataset document.

patch_runs(dataset_name[, dry_run])

Ensures that the runs in the runs collection for the given dataset exactly match the values in its dataset document.

delete_dataset(name[, dry_run])

Deletes the dataset with the given name.

delete_saved_view(dataset_name, view_name[, ...])

Deletes the saved view with the given name from the dataset with the given name.

delete_saved_views(dataset_name[, dry_run])

Deletes all saved views from the dataset with the given name.

delete_annotation_run(name, anno_key[, dry_run])

Deletes the annotation run with the given key from the dataset with the given name.

delete_annotation_runs(name[, dry_run])

Deletes all annotation runs from the dataset with the given name.

delete_brain_run(name, brain_key[, dry_run])

Deletes the brain method run with the given key from the dataset with the given name.

delete_brain_runs(name[, dry_run])

Deletes all brain method runs from the dataset with the given name.

delete_evaluation(name, eval_key[, dry_run])

Deletes the evaluation run with the given key from the dataset with the given name.

delete_evaluations(name[, dry_run])

Deletes all evaluations from the dataset with the given name.

delete_run(name, run_key[, dry_run])

Deletes the run with the given key from the dataset with the given name.

delete_runs(name[, dry_run])

Deletes all runs from the dataset with the given name.

get_indexed_values(collection, ...[, ...])

Returns the values of the field(s) for all samples in the given collection that are covered by the index.

class fiftyone.core.odm.database.DatabaseConfigDocument(conn, version=None, type=None, *args, **kwargs)#

Bases: object

Backing document for the database config.

Attributes:

version: str
type: str

Methods:

save()
fiftyone.core.odm.database.get_db_config()#

Retrieves the database config.

Returns:

a DatabaseConfigDocument

fiftyone.core.odm.database.establish_db_conn(config)#

Establishes the database connection.

If fiftyone.config.database_uri is defined, then we connect to that URI. Otherwise, a fiftyone.core.service.DatabaseService is created.

Parameters:

config – a fiftyone.core.config.FiftyOneConfig

Raises:
  • ConnectionError – if a connection to mongod could not be established

  • FiftyOneConfigError – if fiftyone.config.database_uri is not defined and mongod could not be found

  • ServiceExecutableNotFound – if fiftyone.core.service.DatabaseService startup was attempted, but mongod was not found in fiftyone.db.bin

  • RuntimeError – if the mongod found does not meet FiftyOne’s requirements, or validation could not occur

fiftyone.core.odm.database.aggregate(collection, pipelines, hints=None, maxTimeMS=None, _stream=False)#

Executes one or more aggregations on a collection.

Multiple aggregations are executed using multiple threads, and their results are returned as lists rather than cursors.

Parameters:
  • collection – a pymongo.collection.Collection or motor.motor_asyncio.AsyncIOMotorCollection

  • pipelines – a MongoDB aggregation pipeline or a list of pipelines

  • hints (None) – a corresponding index hint or list of index hints for each pipeline

  • maxTimeMS (None) – max timeout for the request(s)

Returns:

  • If a single pipeline is provided, a pymongo.command_cursor.CommandCursor or motor.motor_asyncio.AsyncIOMotorCommandCursor is returned

  • If multiple pipelines are provided, each cursor is extracted into a list and the list of lists is returned
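The two return shapes described above can be illustrated with a minimal sketch that submits two pipelines in a single call and unpacks the resulting list of lists. The helper names `build_pipelines` and `summarize` are hypothetical, and `odm_db` is assumed to be this module (for example via `import fiftyone.core.odm.database as odm_db`); it is passed in only so that the sketch can also be tried against a stand-in object.

```python
def build_pipelines(field):
    """Builds two independent pipelines: a total count and per-value counts."""
    return [
        [{"$count": "total"}],
        [{"$group": {"_id": "$" + field, "count": {"$sum": 1}}}],
    ]


def summarize(odm_db, collection_name, field):
    """Hypothetical helper that runs both pipelines in one aggregate() call."""
    coll = odm_db.get_db_conn()[collection_name]

    # Multiple pipelines are returned as a list of lists, one per pipeline
    totals, groups = odm_db.aggregate(coll, build_pipelines(field))

    # $count emits no document at all when the collection is empty
    total = totals[0]["total"] if totals else 0
    return total, {g["_id"]: g["count"] for g in groups}
```

Passing a single pipeline instead would yield a cursor, so the unpacking shown here only applies to the multi-pipeline case.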

fiftyone.core.odm.database.ensure_connection()#

Ensures that a database connection exists.

fiftyone.core.odm.database.get_db_client()#

Returns a database client.

Returns:

a pymongo.mongo_client.MongoClient

fiftyone.core.odm.database.get_db_conn()#

Returns a connection to the database.

Returns:

a pymongo.database.Database

fiftyone.core.odm.database.get_async_db_client(use_global=False)#

Returns an async database client.

Parameters:

use_global (False) – whether to use the global client singleton

Returns:

a motor.motor_asyncio.AsyncIOMotorClient

fiftyone.core.odm.database.get_async_db_conn(use_global=False)#

Returns an async connection to the database.

Parameters:

use_global (False) – whether to use the global client singleton

Returns:

a motor.motor_asyncio.AsyncIOMotorDatabase
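Because the async connection is a Motor database, its methods are coroutines. The following is a small sketch of how it might be used from `asyncio`; the helper name is hypothetical, `odm_db` is assumed to be this module, and `list_collection_names()` is the Motor database method of that name.

```python
import asyncio


async def list_collection_names_async(odm_db):
    """Hypothetical helper: lists collection names without blocking the loop."""
    conn = odm_db.get_async_db_conn()

    # AsyncIOMotorDatabase methods are coroutines and must be awaited
    names = await conn.list_collection_names()
    return sorted(names)
```

From synchronous code, this could be driven with `asyncio.run(list_collection_names_async(odm_db))`.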

fiftyone.core.odm.database.drop_database()#

Drops the database.

fiftyone.core.odm.database.sync_database()#

Syncs all pending database writes to disk.

fiftyone.core.odm.database.list_collections()#

Returns a list of all collection names in the database.

Returns:

a list of all collection names

fiftyone.core.odm.database.drop_collection(collection_name)#

Drops the specified collection from the database.

Parameters:

collection_name – the collection name

fiftyone.core.odm.database.drop_orphan_collections(dry_run=False)#

Drops all orphan collections from the database.

Orphan collections are collections that are not associated with any known dataset or other collections used by FiftyOne.

Parameters:

dry_run (False) – whether to log the actions that would be taken but not perform them

fiftyone.core.odm.database.drop_orphan_saved_views(dry_run=False)#

Drops all orphan saved views from the database.

Orphan saved views are saved view documents that are not associated with any known dataset or other collections used by FiftyOne.

Parameters:

dry_run (False) – whether to log the actions that would be taken but not perform them

fiftyone.core.odm.database.drop_orphan_runs(dry_run=False)#

Drops all orphan runs from the database.

Orphan runs are runs that are not associated with any known dataset or other collections used by FiftyOne.

Parameters:

dry_run (False) – whether to log the actions that would be taken but not perform them

fiftyone.core.odm.database.drop_orphan_stores(dry_run=False)#

Drops all orphan execution stores from the database.

Orphan stores are those that are associated with a dataset that no longer exists in the database.

Parameters:

dry_run (False) – whether to log the actions that would be taken but not perform them
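Since all four `drop_orphan_*` functions accept `dry_run`, one cautious workflow is to preview every cleanup first and only apply it after reviewing the log output. A minimal sketch, in which the helper name `clean_orphans` and its `apply` flag are hypothetical and `odm_db` is assumed to be this module:

```python
ORPHAN_CLEANERS = (
    "drop_orphan_collections",
    "drop_orphan_saved_views",
    "drop_orphan_runs",
    "drop_orphan_stores",
)


def clean_orphans(odm_db, apply=False):
    """Hypothetical helper: previews orphan cleanup unless ``apply`` is True."""
    for name in ORPHAN_CLEANERS:
        cleaner = getattr(odm_db, name)
        # dry_run=True only logs the actions that would be taken
        cleaner(dry_run=not apply)
```

Calling `clean_orphans(odm_db)` would only log; `clean_orphans(odm_db, apply=True)` would perform the drops.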

fiftyone.core.odm.database.stream_collection(collection_name)#

Streams the contents of the collection to stdout.

Parameters:

collection_name – the name of the collection

fiftyone.core.odm.database.get_collection_stats(collection_name)#

Gets stats about the collection.

Parameters:

collection_name – the name of the collection

Returns:

a stats dict
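As a rough sketch, the stats of every collection could be gathered by combining this function with `list_collections()`. The helper name is hypothetical, `odm_db` is assumed to be this module, and the `"size"` key is an assumption based on MongoDB's `collStats` output rather than something this page guarantees, which is why it is read with `.get()`.

```python
def collection_sizes(odm_db):
    """Hypothetical helper: maps each collection name to its reported size."""
    sizes = {}
    for name in odm_db.list_collections():
        stats = odm_db.get_collection_stats(name)
        # Assumption: the dict carries a collStats-style "size" entry
        sizes[name] = stats.get("size")
    return sizes
```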

fiftyone.core.odm.database.count_documents(coll, pipeline)#

fiftyone.core.odm.database.export_document(doc, json_path)#

Exports the document to disk in JSON format.

Parameters:
  • doc – a BSON document dict

  • json_path – the path to write the JSON file

fiftyone.core.odm.database.export_collection(docs, json_dir_or_path, key='documents', patt='{idx:06d}-{id}.json', num_docs=None, progress=None)#

Exports the collection to disk in JSON format.

Parameters:
  • docs – an iterable containing the documents to export

  • json_dir_or_path – the path to write a single JSON file containing the entire collection, or a directory in which to write per-document JSON files

  • key ("documents") – the field name under which to store the documents when json_dir_or_path is a single JSON file

  • patt ("{idx:06d}-{id}.json") – a filename pattern to use when json_dir_or_path is a directory. The pattern may contain idx to refer to the index of the document in docs or id to refer to the document’s ID

  • num_docs (None) – the total number of documents. If omitted, this must be computable via len(docs)

  • progress (None) – whether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead
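To make the default filename pattern from the signature concrete, the sketch below expands it for one document. It assumes that the pattern is filled in with Python's `str.format` using the names `idx` and `id`; the helper `filename_for` is hypothetical, and whether indexes start at 0 or 1 is not stated here.

```python
DEFAULT_PATT = "{idx:06d}-{id}.json"


def filename_for(idx, doc_id, patt=DEFAULT_PATT):
    """Expands a filename pattern for one document (assumes str.format)."""
    return patt.format(idx=idx, id=doc_id)


print(filename_for(3, "abc"))  # 000003-abc.json
```

A pattern that omits one of the two names, such as `"{id}.json"`, would also expand under this assumption.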

fiftyone.core.odm.database.import_document(json_path)#

Imports a document from JSON on disk.

Parameters:

json_path – the path to the document

Returns:

a BSON document dict

fiftyone.core.odm.database.import_collection(json_dir_or_path, key='documents')#

Imports the collection from JSON on disk.

Parameters:
  • json_dir_or_path – the path to a JSON file on disk, or a directory containing per-document JSON files

  • key ("documents") – the field name under which the documents are stored when json_dir_or_path is a single JSON file

Returns:

a tuple of

  • an iterable of BSON documents

  • the number of documents
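The tuple return value pairs naturally with `export_collection()`. Below is a minimal round-trip sketch; the helper name is hypothetical, `odm_db` is assumed to be this module, and `json_path` is assumed to name a single `.json` file rather than a directory.

```python
def round_trip(odm_db, docs, json_path):
    """Hypothetical helper: exports documents to one JSON file and reads them back."""
    docs = list(docs)  # materialize so that len() is available

    odm_db.export_collection(docs, json_path, progress=False)

    # import_collection() returns a tuple of (iterable, count)
    imported, num_docs = odm_db.import_collection(json_path)
    return list(imported), num_docs
```

The sketch makes no claim that every BSON type survives the JSON round trip unchanged.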

fiftyone.core.odm.database.insert_documents(docs, coll, ordered=False, batcher=None, progress=None, num_docs=None)#

Inserts documents into a collection.

The _id field of the input documents will be populated if it is not already set.

Parameters:
  • docs – an iterable of BSON document dicts

  • coll – a pymongo collection

  • ordered (False) – whether the documents must be inserted in order

  • batcher (None) – an optional fiftyone.core.utils.Batcher class to use to batch the documents, or False to strictly insert the documents in a single batch. By default, fiftyone.config.default_batcher is used

  • progress (None) – whether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead

  • num_docs (None) – the total number of documents. Only used when progress=True. If omitted, this will be computed via len(docs), if possible

Returns:

a list of IDs of the inserted documents
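As an illustration of the `ordered` and `batcher` options, the following sketch inserts everything in one ordered batch. The helper name is hypothetical and `odm_db` is assumed to be this module. The inputs are copied first because the `_id` field is written into the dicts that are passed in.

```python
def insert_single_batch(odm_db, coll, docs):
    """Hypothetical helper: inserts all documents in one ordered batch."""
    docs = [dict(d) for d in docs]  # copy, since _id is written into the inputs

    # batcher=False requests a single batch; ordered=True preserves order
    ids = odm_db.insert_documents(
        docs, coll, ordered=True, batcher=False, progress=False
    )
    return ids, docs
```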

fiftyone.core.odm.database.bulk_write(ops, coll, ordered=False, batcher=None, progress=False)#

Performs a batch of write operations on a collection.

Parameters:
  • ops – a list of pymongo operations

  • coll – a pymongo collection

  • ordered (False) – whether the operations must be performed in order

  • batcher (None) – an optional fiftyone.core.utils.Batcher class to use to batch the operations, or False to strictly perform the operations in a single batch. By default, fiftyone.config.default_batcher is used

  • progress (False) – whether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead

Returns:

a list of pymongo.results.BulkWriteResult objects
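Because a list of result objects is returned rather than a single one, per-call totals have to be summed by the caller. A minimal sketch, in which the helper name is hypothetical, `odm_db` is assumed to be this module, and `modified_count` is the attribute of that name on `pymongo.results.BulkWriteResult`:

```python
def count_modified(odm_db, coll, ops):
    """Hypothetical helper: applies the operations and totals modified documents."""
    # bulk_write() returns a list of BulkWriteResult objects
    results = odm_db.bulk_write(ops, coll, ordered=False)
    return sum(result.modified_count for result in results)
```

The `ops` list could be built from pymongo operations such as `pymongo.UpdateOne({"_id": doc_id}, {"$set": {"field": value}})`, where `doc_id`, `field` and `value` are placeholders.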

fiftyone.core.odm.database.list_datasets()#

Returns the list of available FiftyOne datasets.

This is a low-level implementation of dataset listing that does not call fiftyone.core.dataset.list_datasets(), which is helpful if a database may be corrupted.

Returns:

a list of dataset names

fiftyone.core.odm.database.patch_saved_views(dataset_name, dry_run=False)#

Ensures that the saved view documents in the views collection for the given dataset exactly match the IDs in its dataset document.

Parameters:
  • dataset_name – the name of the dataset

  • dry_run (False) – whether to log the actions that would be taken but not perform them

fiftyone.core.odm.database.patch_workspaces(dataset_name, dry_run=False)#

Ensures that the workspace documents in the workspaces collection for the given dataset exactly match the IDs in its dataset document.

Parameters:
  • dataset_name – the name of the dataset

  • dry_run (False) – whether to log the actions that would be taken but not perform them

fiftyone.core.odm.database.patch_annotation_runs(dataset_name, dry_run=False)#

Ensures that the annotation runs in the runs collection for the given dataset exactly match the values in its dataset document.

Parameters:
  • dataset_name – the name of the dataset

  • dry_run (False) – whether to log the actions that would be taken but not perform them

fiftyone.core.odm.database.patch_brain_runs(dataset_name, dry_run=False)#

Ensures that the brain method runs in the runs collection for the given dataset exactly match the values in its dataset document.

Parameters:
  • dataset_name – the name of the dataset

  • dry_run (False) – whether to log the actions that would be taken but not perform them

fiftyone.core.odm.database.patch_evaluations(dataset_name, dry_run=False)#

Ensures that the evaluation runs in the runs collection for the given dataset exactly match the values in its dataset document.

Parameters:
  • dataset_name – the name of the dataset

  • dry_run (False) – whether to log the actions that would be taken but not perform them

fiftyone.core.odm.database.patch_runs(dataset_name, dry_run=False)#

Ensures that the runs in the runs collection for the given dataset exactly match the values in its dataset document.

Parameters:
  • dataset_name – the name of the dataset

  • dry_run (False) – whether to log the actions that would be taken but not perform them
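The `patch_*` functions can be combined with `list_datasets()` to preview pending repairs across a whole database. The sketch below only logs, since it always passes `dry_run=True`; the helper name is hypothetical and `odm_db` is assumed to be this module.

```python
def preview_patches(odm_db):
    """Hypothetical helper: logs pending saved view and run patches per dataset."""
    names = odm_db.list_datasets()
    for name in names:
        # dry_run=True logs what would change without modifying the database
        odm_db.patch_saved_views(name, dry_run=True)
        odm_db.patch_runs(name, dry_run=True)
    return names
```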

fiftyone.core.odm.database.delete_dataset(name, dry_run=False)#

Deletes the dataset with the given name.

This is a low-level implementation of deletion that does not call fiftyone.core.dataset.load_dataset(), which is helpful if a dataset’s backing document or collections are corrupted and cannot be loaded via the normal pathways.

Parameters:
  • name – the name of the dataset

  • dry_run (False) – whether to log the actions that would be taken but not perform them
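A cautious way to use this low-level deletion is to check `list_datasets()` first and default to a dry run. This is a sketch under those assumptions; the helper name is hypothetical, `odm_db` is assumed to be this module, and the behavior of `delete_dataset()` for an unknown name is not described here, which is why the sketch guards against it.

```python
def delete_if_present(odm_db, name, dry_run=True):
    """Hypothetical helper: deletes a dataset only if it is currently listed."""
    if name not in odm_db.list_datasets():
        return False

    # With dry_run=True, the actions are logged but not performed
    odm_db.delete_dataset(name, dry_run=dry_run)
    return True
```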

fiftyone.core.odm.database.delete_saved_view(dataset_name, view_name, dry_run=False)#

Deletes the saved view with the given name from the dataset with the given name.

This is a low-level implementation of deletion that does not call fiftyone.core.dataset.load_dataset() or fiftyone.core.collections.SampleCollection.load_saved_view(), which is helpful if a dataset’s backing document or collections are corrupted and cannot be loaded via the normal pathways.

Parameters:
  • dataset_name – the name of the dataset

  • view_name – the name of the saved view

  • dry_run (False) – whether to log the actions that would be taken but not perform them

fiftyone.core.odm.database.delete_saved_views(dataset_name, dry_run=False)#

Deletes all saved views from the dataset with the given name.

This is a low-level implementation of deletion that does not call fiftyone.core.dataset.load_dataset() or fiftyone.core.collections.SampleCollection.load_saved_view(), which is helpful if a dataset’s backing document or collections are corrupted and cannot be loaded via the normal pathways.

Parameters:
  • dataset_name – the name of the dataset

  • dry_run (False) – whether to log the actions that would be taken but not perform them

fiftyone.core.odm.database.delete_annotation_run(name, anno_key, dry_run=False)#

Deletes the annotation run with the given key from the dataset with the given name.

This is a low-level implementation of deletion that does not call fiftyone.core.dataset.load_dataset() or fiftyone.core.collections.SampleCollection.delete_annotation_run(), which is helpful if a dataset’s backing document or collections are corrupted and cannot be loaded via the normal pathways.

Note that, as this method does not load fiftyone.core.runs.Run instances, it does not call fiftyone.core.runs.Run.cleanup().

Parameters:
  • name – the name of the dataset

  • anno_key – the annotation key

  • dry_run (False) – whether to log the actions that would be taken but not perform them

fiftyone.core.odm.database.delete_annotation_runs(name, dry_run=False)#

Deletes all annotation runs from the dataset with the given name.

This is a low-level implementation of deletion that does not call fiftyone.core.dataset.load_dataset() or fiftyone.core.collections.SampleCollection.delete_annotation_runs(), which is helpful if a dataset’s backing document or collections are corrupted and cannot be loaded via the normal pathways.

Note that, as this method does not load fiftyone.core.runs.Run instances, it does not call fiftyone.core.runs.Run.cleanup().

Parameters:
  • name – the name of the dataset

  • dry_run (False) – whether to log the actions that would be taken but not perform them

fiftyone.core.odm.database.delete_brain_run(name, brain_key, dry_run=False)#

Deletes the brain method run with the given key from the dataset with the given name.

This is a low-level implementation of deletion that does not call fiftyone.core.dataset.load_dataset() or fiftyone.core.collections.SampleCollection.delete_brain_run(), which is helpful if a dataset’s backing document or collections are corrupted and cannot be loaded via the normal pathways.

Note that, as this method does not load fiftyone.core.runs.Run instances, it does not call fiftyone.core.runs.Run.cleanup().

Parameters:
  • name – the name of the dataset

  • brain_key – the brain key

  • dry_run (False) – whether to log the actions that would be taken but not perform them

fiftyone.core.odm.database.delete_brain_runs(name, dry_run=False)#

Deletes all brain method runs from the dataset with the given name.

This is a low-level implementation of deletion that does not call fiftyone.core.dataset.load_dataset() or fiftyone.core.collections.SampleCollection.delete_brain_runs(), which is helpful if a dataset’s backing document or collections are corrupted and cannot be loaded via the normal pathways.

Note that, as this method does not load fiftyone.core.runs.Run instances, it does not call fiftyone.core.runs.Run.cleanup().

Parameters:
  • name – the name of the dataset

  • dry_run (False) – whether to log the actions that would be taken but not perform them

fiftyone.core.odm.database.delete_evaluation(name, eval_key, dry_run=False)#

Deletes the evaluation run with the given key from the dataset with the given name.

This is a low-level implementation of deletion that does not call fiftyone.core.dataset.load_dataset() or fiftyone.core.collections.SampleCollection.delete_evaluation(), which is helpful if a dataset’s backing document or collections are corrupted and cannot be loaded via the normal pathways.

Note that, as this method does not load fiftyone.core.runs.Run instances, it does not call fiftyone.core.runs.Run.cleanup().

Parameters:
  • name – the name of the dataset

  • eval_key – the evaluation key

  • dry_run (False) – whether to log the actions that would be taken but not perform them

fiftyone.core.odm.database.delete_evaluations(name, dry_run=False)#

Deletes all evaluations from the dataset with the given name.

This is a low-level implementation of deletion that does not call fiftyone.core.dataset.load_dataset() or fiftyone.core.collections.SampleCollection.delete_evaluations(), which is helpful if a dataset’s backing document or collections are corrupted and cannot be loaded via the normal pathways.

Note that, as this method does not load fiftyone.core.runs.Run instances, it does not call fiftyone.core.runs.Run.cleanup().

Parameters:
  • name – the name of the dataset

  • dry_run (False) – whether to log the actions that would be taken but not perform them

fiftyone.core.odm.database.delete_run(name, run_key, dry_run=False)#

Deletes the run with the given key from the dataset with the given name.

This is a low-level implementation of deletion that does not call fiftyone.core.dataset.load_dataset() or fiftyone.core.collections.SampleCollection.delete_run(), which is helpful if a dataset’s backing document or collections are corrupted and cannot be loaded via the normal pathways.

Note that, as this method does not load fiftyone.core.runs.Run instances, it does not call fiftyone.core.runs.Run.cleanup().

Parameters:
  • name – the name of the dataset

  • run_key – the run key

  • dry_run (False) – whether to log the actions that would be taken but not perform them

fiftyone.core.odm.database.delete_runs(name, dry_run=False)#

Deletes all runs from the dataset with the given name.

This is a low-level implementation of deletion that does not call fiftyone.core.dataset.load_dataset() or fiftyone.core.collections.SampleCollection.delete_runs(), which is helpful if a dataset’s backing document or collections are corrupted and cannot be loaded via the normal pathways.

Note that, as this method does not load fiftyone.core.runs.Run instances, it does not call fiftyone.core.runs.Run.cleanup().

Parameters:
  • name – the name of the dataset

  • dry_run (False) – whether to log the actions that would be taken but not perform them

fiftyone.core.odm.database.get_indexed_values(collection, field_or_fields, *, index_key=None, query=None, values_only=False, _stream=False)#

Returns the values of the field(s) for all samples in the given collection that are covered by the index. Raises an error if the field is not indexed.

Parameters:
  • collection – a pymongo.collection.Collection or motor.motor_asyncio.AsyncIOMotorCollection

  • field_or_fields – the field name or list of field names to retrieve.

  • index_key (None) – the name of the index to use. If None, the default index name will be constructed from the field name(s).

  • query (None) – a dict selection filter to apply when querying. For performance, this should only include fields that are in the specified index.

  • values_only (False) – whether to remove field names from the resulting list. If True, the field names are removed and only the values will be returned as a list for each sample. If False, the field names are preserved and the values will be returned as a dict for each sample.

Returns:

a list of values for the specified field or index keys for each sample, sorted in the same order as the index

Raises:

ValueError – if the field is not indexed
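Since a `ValueError` is raised for fields that are not indexed, callers that cannot be sure an index exists may want a fallback. A minimal sketch, with a hypothetical helper name and `odm_db` assumed to be this module:

```python
def indexed_values_or_none(odm_db, coll, field):
    """Hypothetical helper: returns indexed values, or None if no index covers the field."""
    try:
        # values_only=True drops the field names from each result
        return odm_db.get_indexed_values(coll, field, values_only=True)
    except ValueError:
        # Raised when the field is not indexed
        return None
```

A `None` result could then trigger a slower, non-indexed query path chosen by the caller.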