fiftyone.core.odm.database¶

Database utilities.

Copyright 2017-2025, Voxel51, Inc.
voxel51.com

Classes:

DatabaseConfigDocument(conn[, version, type])

Backing document for the database config.

Functions:

`get_db_config`()	Retrieves the database config.
`establish_db_conn`(config)	Establishes the database connection.
`aggregate`(collection, pipelines[, hints, …])	Executes one or more aggregations on a collection.
`ensure_connection`()	Ensures that a database connection exists.
`get_db_client`()	Returns a database client.
`get_db_conn`()	Returns a connection to the database.
`get_async_db_client`([use_global])	Returns an async database client.
`get_async_db_conn`([use_global])	Returns an async connection to the database.
`drop_database`()	Drops the database.
`sync_database`()	Syncs all pending database writes to disk.
`list_collections`()	Returns a list of all collection names in the database.
`drop_collection`(collection_name)	Drops the specified collection from the database.
`drop_orphan_collections`([dry_run])	Drops all orphan collections from the database.
`drop_orphan_saved_views`([dry_run])	Drops all orphan saved views from the database.
`drop_orphan_runs`([dry_run])	Drops all orphan runs from the database.
`drop_orphan_stores`([dry_run])	Drops all orphan execution stores from the database.
`stream_collection`(collection_name)	Streams the contents of the collection to stdout.
`get_collection_stats`(collection_name)	Gets stats about the collection.
`count_documents`(coll, pipeline)
`export_document`(doc, json_path)	Exports the document to disk in JSON format.
`export_collection`(docs, json_dir_or_path[, …])	Exports the collection to disk in JSON format.
`import_document`(json_path)	Imports a document from JSON on disk.
`import_collection`(json_dir_or_path[, key])	Imports the collection from JSON on disk.
`insert_documents`(docs, coll[, ordered, …])	Inserts documents into a collection.
`bulk_write`(ops, coll[, ordered, batcher, …])	Performs a batch of write operations on a collection.
`list_datasets`()	Returns the list of available FiftyOne datasets.
`patch_saved_views`(dataset_name[, dry_run])	Ensures that the saved view documents in the `views` collection for the given dataset exactly match the IDs in its dataset document.
`patch_workspaces`(dataset_name[, dry_run])	Ensures that the workspace documents in the `workspaces` collection for the given dataset exactly match the IDs in its dataset document.
`patch_annotation_runs`(dataset_name[, dry_run])	Ensures that the annotation runs in the `runs` collection for the given dataset exactly match the values in its dataset document.
`patch_brain_runs`(dataset_name[, dry_run])	Ensures that the brain method runs in the `runs` collection for the given dataset exactly match the values in its dataset document.
`patch_evaluations`(dataset_name[, dry_run])	Ensures that the evaluation runs in the `runs` collection for the given dataset exactly match the values in its dataset document.
`patch_runs`(dataset_name[, dry_run])	Ensures that the runs in the `runs` collection for the given dataset exactly match the values in its dataset document.
`delete_dataset`(name[, dry_run])	Deletes the dataset with the given name.
`delete_saved_view`(dataset_name, view_name[, …])	Deletes the saved view with the given name from the dataset with the given name.
`delete_saved_views`(dataset_name[, dry_run])	Deletes all saved views from the dataset with the given name.
`delete_annotation_run`(name, anno_key[, dry_run])	Deletes the annotation run with the given key from the dataset with the given name.
`delete_annotation_runs`(name[, dry_run])	Deletes all annotation runs from the dataset with the given name.
`delete_brain_run`(name, brain_key[, dry_run])	Deletes the brain method run with the given key from the dataset with the given name.
`delete_brain_runs`(name[, dry_run])	Deletes all brain method runs from the dataset with the given name.
`delete_evaluation`(name, eval_key[, dry_run])	Deletes the evaluation run with the given key from the dataset with the given name.
`delete_evaluations`(name[, dry_run])	Deletes all evaluations from the dataset with the given name.
`delete_run`(name, run_key[, dry_run])	Deletes the run with the given key from the dataset with the given name.
`delete_runs`(name[, dry_run])	Deletes all runs from the dataset with the given name.
`get_indexed_values`(collection, …[, …])	Returns the values of the field(s) for all samples in the given collection that are covered by the index.

class fiftyone.core.odm.database.DatabaseConfigDocument(conn, version=None, type=None, *args, **kwargs)¶

Bases: object

Backing document for the database config.

Attributes:

version

type

Methods:

save()

version: str¶

type: str¶

save()¶

fiftyone.core.odm.database.get_db_config()¶

Retrieves the database config.

Returns: a DatabaseConfigDocument

fiftyone.core.odm.database.establish_db_conn(config)¶

Establishes the database connection.

If fiftyone.config.database_uri is defined, then we connect to that URI. Otherwise, a fiftyone.core.service.DatabaseService is created.

Parameters

config – a fiftyone.core.config.FiftyOneConfig

Raises

ConnectionError – if a connection to mongod could not be established
FiftyOneConfigError – if fiftyone.config.database_uri is not defined and mongod could not be found
ServiceExecutableNotFound – if fiftyone.core.service.DatabaseService startup was attempted, but mongod was not found in fiftyone.db.bin
RuntimeError – if the mongod found does not meet FiftyOne’s requirements, or validation could not occur

fiftyone.core.odm.database.aggregate(collection, pipelines, hints=None, maxTimeMS=None, _stream=False)¶

Executes one or more aggregations on a collection.

Multiple aggregations are executed using multiple threads, and their results are returned as lists rather than cursors.

Parameters

collection – a pymongo.collection.Collection or motor.motor_asyncio.AsyncIOMotorCollection
pipelines – a MongoDB aggregation pipeline or a list of pipelines
hints (None) – a corresponding index hint or list of index hints for each pipeline
maxTimeMS (None) – the maximum time, in milliseconds, to allow for the request(s)

Returns

If a single pipeline is provided, a pymongo.command_cursor.CommandCursor or motor.motor_asyncio.AsyncIOMotorCommandCursor is returned
If multiple pipelines are provided, each cursor is extracted into a list and the list of lists is returned
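
As a minimal sketch of the multiple-pipeline calling convention described above, the hypothetical helpers below build two pipelines and pass them to aggregate() in one call. The helper names, the `food` alias, and the choice of stages are illustrative assumptions; only aggregate() and its list-of-lists return behavior come from this page.

```python
def build_pipelines(field):
    """Builds two illustrative MongoDB aggregation pipelines.

    This is a hypothetical helper, not part of FiftyOne.
    """
    # Counts every document in the collection
    count_pipeline = [{"$count": "count"}]

    # Collects the distinct values of `field`, in sorted order
    distinct_pipeline = [
        {"$group": {"_id": "$" + field}},
        {"$sort": {"_id": 1}},
    ]

    return [count_pipeline, distinct_pipeline]


def run_pipelines(collection, field):
    """Runs both pipelines against a pymongo collection in one call."""
    # Imported lazily so the pipeline builder can be used without a database
    import fiftyone.core.odm.database as food

    # Multiple pipelines were provided, so a list of lists is returned
    count_docs, distinct_docs = food.aggregate(
        collection, build_pipelines(field)
    )
    return count_docs, distinct_docs
```

Passing a single pipeline instead would return a cursor rather than a list, so callers that need both behaviors should branch on which form they used.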

fiftyone.core.odm.database.ensure_connection()¶: Ensures that a database connection exists.

fiftyone.core.odm.database.get_db_client()¶

Returns a database client.

Returns: a pymongo.mongo_client.MongoClient

fiftyone.core.odm.database.get_db_conn()¶

Returns a connection to the database.

Returns: a pymongo.database.Database

fiftyone.core.odm.database.get_async_db_client(use_global=False)¶

Returns an async database client.

Parameters: use_global (False) – whether to use the global client singleton
Returns: a motor.motor_asyncio.AsyncIOMotorClient

fiftyone.core.odm.database.get_async_db_conn(use_global=False)¶

Returns an async connection to the database.

Returns: a motor.motor_asyncio.AsyncIOMotorDatabase

fiftyone.core.odm.database.drop_database()¶: Drops the database.

fiftyone.core.odm.database.sync_database()¶: Syncs all pending database writes to disk.

fiftyone.core.odm.database.list_collections()¶

Returns a list of all collection names in the database.

Returns: a list of all collection names
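
Because list_collections() returns plain names, ordinary string filtering is enough to narrow the result. The sketch below is one possible way to do that; both helper names are hypothetical, and the "samples." prefix is an assumption about collection naming that this reference does not state.

```python
def collections_with_prefix(names, prefix):
    """Returns the sorted collection names that start with `prefix`."""
    return sorted(name for name in names if name.startswith(prefix))


def list_prefixed_collections(prefix="samples."):
    """Lists the database collections whose names start with `prefix`.

    The "samples." default is an assumed naming convention.
    """
    # Imported lazily so the pure helper above works without a database
    import fiftyone.core.odm.database as food

    return collections_with_prefix(food.list_collections(), prefix)
```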

fiftyone.core.odm.database.drop_collection(collection_name)¶

Drops the specified collection from the database.

Parameters: collection_name – the collection name

fiftyone.core.odm.database.drop_orphan_collections(dry_run=False)¶

Drops all orphan collections from the database.

Orphan collections are collections that are not associated with any known dataset or other collections used by FiftyOne.

Parameters: dry_run (False) – whether to log the actions that would be taken but not perform them

fiftyone.core.odm.database.drop_orphan_saved_views(dry_run=False)¶

Drops all orphan saved views from the database.

Orphan saved views are saved view documents that are not associated with any known dataset or other collections used by FiftyOne.

Parameters: dry_run (False) – whether to log the actions that would be taken but not perform them

fiftyone.core.odm.database.drop_orphan_runs(dry_run=False)¶

Drops all orphan runs from the database.

Orphan runs are runs that are not associated with any known dataset or other collections used by FiftyOne.

Parameters: dry_run (False) – whether to log the actions that would be taken but not perform them

fiftyone.core.odm.database.drop_orphan_stores(dry_run=False)¶

Drops all orphan execution stores from the database.

Orphan stores are those that are associated with a dataset that no longer exists in the database.

Parameters: dry_run (False) – whether to log the actions that would be taken but not perform them
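
Since all four drop_orphan_*() functions accept the same dry_run flag, one cautious pattern is to log the planned actions first and only then drop anything. The following is a sketch of that pattern, not an official workflow; clean_orphans() and its `db` argument (an injection point for testing) are hypothetical.

```python
ORPHAN_CLEANERS = (
    "drop_orphan_collections",
    "drop_orphan_saved_views",
    "drop_orphan_runs",
    "drop_orphan_stores",
)


def clean_orphans(dry_run=True, db=None):
    """Calls each orphan-cleanup function with the given `dry_run` flag.

    By default nothing is dropped: `dry_run=True` only logs the actions.
    """
    if db is None:
        # Imported lazily so that a stand-in module can be injected
        import fiftyone.core.odm.database as db

    for name in ORPHAN_CLEANERS:
        getattr(db, name)(dry_run=dry_run)

    return list(ORPHAN_CLEANERS)
```

Calling clean_orphans() with its default logs what would be dropped; calling clean_orphans(dry_run=False) afterwards performs the drops.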

fiftyone.core.odm.database.stream_collection(collection_name)¶

Streams the contents of the collection to stdout.

Parameters: collection_name – the name of the collection

fiftyone.core.odm.database.get_collection_stats(collection_name)¶

Gets stats about the collection.

Parameters: collection_name – the name of the collection
Returns: a stats dict

fiftyone.core.odm.database.count_documents(coll, pipeline)¶

fiftyone.core.odm.database.export_document(doc, json_path)¶

Exports the document to disk in JSON format.

Parameters

doc – a BSON document dict
json_path – the path to write the JSON file

fiftyone.core.odm.database.export_collection(docs, json_dir_or_path, key='documents', patt='{idx:06d}-{id}.json', num_docs=None, progress=None)¶

Exports the collection to disk in JSON format.

Parameters

docs – an iterable containing the documents to export
json_dir_or_path – the path to write a single JSON file containing the entire collection, or a directory in which to write per-document JSON files
key ("documents") – the field name under which to store the documents when json_dir_or_path is a single JSON file
patt ("{idx:06d}-{id}.json") – a filename pattern to use when json_dir_or_path is a directory. The pattern may contain idx to refer to the index of the document in docs or id to refer to the document’s ID
num_docs (None) – the total number of documents. If omitted, this must be computable via len(docs)
progress (None) – whether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead
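
To illustrate how a pattern such as the default patt expands, the sketch below renders filenames with Python's standard str.format(). It assumes the pattern is filled in using the keyword names idx and id; render_filename() is a hypothetical helper, and the index values are arbitrary examples rather than the values export_collection() would necessarily assign.

```python
DEFAULT_PATT = "{idx:06d}-{id}.json"


def render_filename(idx, doc_id, patt=DEFAULT_PATT):
    """Renders a per-document filename from a pattern (hypothetical helper)."""
    # `:06d` zero-pads the index to six digits
    return patt.format(idx=idx, id=str(doc_id))


print(render_filename(3, "abc"))  # 000003-abc.json
print(render_filename(12, "abc", patt="{id}.json"))  # abc.json
```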

fiftyone.core.odm.database.import_document(json_path)¶

Imports a document from JSON on disk.

Parameters: json_path – the path to the document
Returns: a BSON document dict

fiftyone.core.odm.database.import_collection(json_dir_or_path, key='documents')¶

Imports the collection from JSON on disk.

Parameters

json_dir_or_path – the path to a JSON file on disk, or a directory containing per-document JSON files
key ("documents") – the field name under which the documents are stored when json_dir_or_path is a single JSON file

Returns

a tuple of

an iterable of BSON documents
the number of documents

fiftyone.core.odm.database.insert_documents(docs, coll, ordered=False, batcher=None, progress=None, num_docs=None)¶

Inserts documents into a collection.

The _id field of the input documents will be populated if it is not already set.

Parameters

docs – an iterable of BSON document dicts
coll – a pymongo collection
ordered (False) – whether the documents must be inserted in order
batcher (None) – an optional fiftyone.core.utils.Batcher class to use to batch the documents, or False to strictly insert the documents in a single batch. By default, fiftyone.config.default_batcher is used
progress (None) – whether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead
num_docs (None) – the total number of documents. Only used when progress=True. If omitted, this will be computed via len(docs), if possible

Returns

a list of IDs of the inserted documents

fiftyone.core.odm.database.bulk_write(ops, coll, ordered=False, batcher=None, progress=False)¶

Performs a batch of write operations on a collection.

Parameters

ops – a list of pymongo operations
coll – a pymongo collection
ordered (False) – whether the operations must be performed in order
batcher (None) – an optional fiftyone.core.utils.Batcher class to use to batch the operations, or False to strictly perform the operations in a single batch. By default, fiftyone.config.default_batcher is used
progress (False) – whether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead

Returns

a list of pymongo.results.BulkWriteResult objects
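
The ops argument takes standard pymongo write operations. As a rough sketch, the example below builds pymongo.UpdateOne operations from plain (filter, update) pairs and hands them to bulk_write(). The helper names, the tags_by_id mapping, and the tags field are illustrative assumptions.

```python
def build_update_specs(tags_by_id):
    """Builds (filter, update) pairs that set a `tags` field per document.

    `tags_by_id` and the `tags` field are illustrative assumptions.
    """
    return [
        ({"_id": _id}, {"$set": {"tags": tags}})
        for _id, tags in tags_by_id.items()
    ]


def apply_updates(coll, tags_by_id):
    """Applies the updates to a pymongo collection via bulk_write()."""
    # Imported lazily so the spec builder can be used on its own
    from pymongo import UpdateOne
    import fiftyone.core.odm.database as food

    ops = [
        UpdateOne(flt, update)
        for flt, update in build_update_specs(tags_by_id)
    ]

    # The updates are independent, so ordering is not required
    return food.bulk_write(ops, coll, ordered=False)
```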

fiftyone.core.odm.database.list_datasets()¶

Returns the list of available FiftyOne datasets.

This is a low-level implementation of dataset listing that does not call fiftyone.core.dataset.list_datasets(), which is helpful if a database may be corrupted.

Returns: a list of dataset names

fiftyone.core.odm.database.patch_saved_views(dataset_name, dry_run=False)¶

Ensures that the saved view documents in the views collection for the given dataset exactly match the IDs in its dataset document.

Parameters

dataset_name – the name of the dataset
dry_run (False) – whether to log the actions that would be taken but not perform them

fiftyone.core.odm.database.patch_workspaces(dataset_name, dry_run=False)¶

Ensures that the workspace documents in the workspaces collection for the given dataset exactly match the IDs in its dataset document.

Parameters

dataset_name – the name of the dataset
dry_run (False) – whether to log the actions that would be taken but not perform them

fiftyone.core.odm.database.patch_annotation_runs(dataset_name, dry_run=False)¶

Ensures that the annotation runs in the runs collection for the given dataset exactly match the values in its dataset document.

Parameters

dataset_name – the name of the dataset
dry_run (False) – whether to log the actions that would be taken but not perform them

fiftyone.core.odm.database.patch_brain_runs(dataset_name, dry_run=False)¶

Ensures that the brain method runs in the runs collection for the given dataset exactly match the values in its dataset document.

Parameters

dataset_name – the name of the dataset
dry_run (False) – whether to log the actions that would be taken but not perform them

fiftyone.core.odm.database.patch_evaluations(dataset_name, dry_run=False)¶

Ensures that the evaluation runs in the runs collection for the given dataset exactly match the values in its dataset document.

Parameters

dataset_name – the name of the dataset
dry_run (False) – whether to log the actions that would be taken but not perform them

fiftyone.core.odm.database.patch_runs(dataset_name, dry_run=False)¶

Ensures that the runs in the runs collection for the given dataset exactly match the values in its dataset document.

Parameters

dataset_name – the name of the dataset
dry_run (False) – whether to log the actions that would be taken but not perform them
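
The six patch_*() functions share the signature (dataset_name, dry_run=False), so they can be applied to a dataset in a single pass. The sketch below shows one way to do so; patch_dataset() and its `db` argument are hypothetical, and calling all six functions is a conservative assumption, since this page does not say whether patch_runs() covers the more specific run types.

```python
PATCHERS = (
    "patch_saved_views",
    "patch_workspaces",
    "patch_annotation_runs",
    "patch_brain_runs",
    "patch_evaluations",
    "patch_runs",
)


def patch_dataset(dataset_name, dry_run=True, db=None):
    """Runs every patch function on the dataset with the given name.

    With the default `dry_run=True`, the actions are logged but not applied.
    """
    if db is None:
        # Imported lazily so that a stand-in module can be injected
        import fiftyone.core.odm.database as db

    for name in PATCHERS:
        getattr(db, name)(dataset_name, dry_run=dry_run)

    return list(PATCHERS)
```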

fiftyone.core.odm.database.delete_dataset(name, dry_run=False)¶

Deletes the dataset with the given name.

This is a low-level implementation of deletion that does not call fiftyone.core.dataset.load_dataset(), which is helpful if a dataset’s backing document or collections are corrupted and cannot be loaded via the normal pathways.

Parameters

name – the name of the dataset
dry_run (False) – whether to log the actions that would be taken but not perform them
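
Because both list_datasets() and delete_dataset() avoid the normal loading pathways, they can be combined for low-level cleanup. The following is a sketch under that assumption; delete_matching_datasets(), the prefix-matching rule, and the `db` injection argument are hypothetical.

```python
def delete_matching_datasets(prefix, dry_run=True, db=None):
    """Deletes every dataset whose name starts with `prefix`.

    With the default `dry_run=True`, deletions are logged but not performed.
    Returns the list of matching dataset names.
    """
    if db is None:
        # Imported lazily so that a stand-in module can be injected
        import fiftyone.core.odm.database as db

    matches = [name for name in db.list_datasets() if name.startswith(prefix)]

    for name in matches:
        db.delete_dataset(name, dry_run=dry_run)

    return matches
```

Reviewing the dry-run log before repeating the call with dry_run=False guards against an overly broad prefix.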

fiftyone.core.odm.database.delete_saved_view(dataset_name, view_name, dry_run=False)¶

Deletes the saved view with the given name from the dataset with the given name.

This is a low-level implementation of deletion that does not call fiftyone.core.dataset.load_dataset() or fiftyone.core.collections.SampleCollection.load_saved_view(), which is helpful if a dataset’s backing document or collections are corrupted and cannot be loaded via the normal pathways.

Parameters

dataset_name – the name of the dataset
view_name – the name of the saved view
dry_run (False) – whether to log the actions that would be taken but not perform them

fiftyone.core.odm.database.delete_saved_views(dataset_name, dry_run=False)¶

Deletes all saved views from the dataset with the given name.

This is a low-level implementation of deletion that does not call fiftyone.core.dataset.load_dataset() or fiftyone.core.collections.SampleCollection.load_saved_view(), which is helpful if a dataset’s backing document or collections are corrupted and cannot be loaded via the normal pathways.

Parameters

dataset_name – the name of the dataset
dry_run (False) – whether to log the actions that would be taken but not perform them

fiftyone.core.odm.database.delete_annotation_run(name, anno_key, dry_run=False)¶

Deletes the annotation run with the given key from the dataset with the given name.

This is a low-level implementation of deletion that does not call fiftyone.core.dataset.load_dataset() or fiftyone.core.collections.SampleCollection.delete_annotation_run(), which is helpful if a dataset’s backing document or collections are corrupted and cannot be loaded via the normal pathways.

Note that, as this method does not load fiftyone.core.runs.Run instances, it does not call fiftyone.core.runs.Run.cleanup().

Parameters

name – the name of the dataset
anno_key – the annotation key
dry_run (False) – whether to log the actions that would be taken but not perform them

fiftyone.core.odm.database.delete_annotation_runs(name, dry_run=False)¶

Deletes all annotation runs from the dataset with the given name.

This is a low-level implementation of deletion that does not call fiftyone.core.dataset.load_dataset() or fiftyone.core.collections.SampleCollection.delete_annotation_runs(), which is helpful if a dataset’s backing document or collections are corrupted and cannot be loaded via the normal pathways.

Note that, as this method does not load fiftyone.core.runs.Run instances, it does not call fiftyone.core.runs.Run.cleanup().

Parameters

name – the name of the dataset
dry_run (False) – whether to log the actions that would be taken but not perform them

fiftyone.core.odm.database.delete_brain_run(name, brain_key, dry_run=False)¶

Deletes the brain method run with the given key from the dataset with the given name.

This is a low-level implementation of deletion that does not call fiftyone.core.dataset.load_dataset() or fiftyone.core.collections.SampleCollection.delete_brain_run(), which is helpful if a dataset’s backing document or collections are corrupted and cannot be loaded via the normal pathways.

Note that, as this method does not load fiftyone.core.runs.Run instances, it does not call fiftyone.core.runs.Run.cleanup().

Parameters

name – the name of the dataset
brain_key – the brain key
dry_run (False) – whether to log the actions that would be taken but not perform them

fiftyone.core.odm.database.delete_brain_runs(name, dry_run=False)¶

Deletes all brain method runs from the dataset with the given name.

This is a low-level implementation of deletion that does not call fiftyone.core.dataset.load_dataset() or fiftyone.core.collections.SampleCollection.delete_brain_runs(), which is helpful if a dataset’s backing document or collections are corrupted and cannot be loaded via the normal pathways.

Note that, as this method does not load fiftyone.core.runs.Run instances, it does not call fiftyone.core.runs.Run.cleanup().

Parameters

name – the name of the dataset
dry_run (False) – whether to log the actions that would be taken but not perform them

fiftyone.core.odm.database.delete_evaluation(name, eval_key, dry_run=False)¶

Deletes the evaluation run with the given key from the dataset with the given name.

This is a low-level implementation of deletion that does not call fiftyone.core.dataset.load_dataset() or fiftyone.core.collections.SampleCollection.delete_evaluation(), which is helpful if a dataset’s backing document or collections are corrupted and cannot be loaded via the normal pathways.

Note that, as this method does not load fiftyone.core.runs.Run instances, it does not call fiftyone.core.runs.Run.cleanup().

Parameters

name – the name of the dataset
eval_key – the evaluation key
dry_run (False) – whether to log the actions that would be taken but not perform them

fiftyone.core.odm.database.delete_evaluations(name, dry_run=False)¶

Deletes all evaluations from the dataset with the given name.

This is a low-level implementation of deletion that does not call fiftyone.core.dataset.load_dataset() or fiftyone.core.collections.SampleCollection.delete_evaluations(), which is helpful if a dataset’s backing document or collections are corrupted and cannot be loaded via the normal pathways.

Note that, as this method does not load fiftyone.core.runs.Run instances, it does not call fiftyone.core.runs.Run.cleanup().

Parameters

name – the name of the dataset
dry_run (False) – whether to log the actions that would be taken but not perform them

fiftyone.core.odm.database.delete_run(name, run_key, dry_run=False)¶

Deletes the run with the given key from the dataset with the given name.

This is a low-level implementation of deletion that does not call fiftyone.core.dataset.load_dataset() or fiftyone.core.collections.SampleCollection.delete_run(), which is helpful if a dataset’s backing document or collections are corrupted and cannot be loaded via the normal pathways.

Note that, as this method does not load fiftyone.core.runs.Run instances, it does not call fiftyone.core.runs.Run.cleanup().

Parameters

name – the name of the dataset
run_key – the run key
dry_run (False) – whether to log the actions that would be taken but not perform them

fiftyone.core.odm.database.delete_runs(name, dry_run=False)¶

Deletes all runs from the dataset with the given name.

This is a low-level implementation of deletion that does not call fiftyone.core.dataset.load_dataset() or fiftyone.core.collections.SampleCollection.delete_runs(), which is helpful if a dataset’s backing document or collections are corrupted and cannot be loaded via the normal pathways.

Note that, as this method does not load fiftyone.core.runs.Run instances, it does not call fiftyone.core.runs.Run.cleanup().

Parameters

name – the name of the dataset
dry_run (False) – whether to log the actions that would be taken but not perform them

fiftyone.core.odm.database.get_indexed_values(collection, field_or_fields, *, index_key=None, query=None, values_only=False, _stream=False)¶

Returns the values of the field(s) for all samples in the given collection that are covered by the index. Raises an error if the field is not indexed.

Parameters

collection – a pymongo.collection.Collection or motor.motor_asyncio.AsyncIOMotorCollection
field_or_fields – the field name or list of field names to retrieve.
index_key (None) – the name of the index to use. If None, the default index name will be constructed from the field name(s).
query (None) – a dict selection filter to apply when querying. For performance, this should only include fields that are in the specified index.
values_only (False) – whether to remove field names from the resulting list. If True, the field names are removed and only the values will be returned as a list for each sample. If False, the field names are preserved and the values will be returned as a dict for each sample.

Returns

a list of values for the specified field or index keys for each sample sorted in the same order as the index

Raises

ValueError – if the field is not indexed
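
Since get_indexed_values() raises ValueError for unindexed fields, callers that cannot guarantee an index may want a fallback. The sketch below wraps the call accordingly; indexed_values_or_none() and its `db` injection argument are hypothetical, and returning None is just one possible fallback policy.

```python
def indexed_values_or_none(collection, field, db=None):
    """Returns the indexed values of `field`, or None if it is not indexed."""
    if db is None:
        # Imported lazily so that a stand-in module can be injected
        import fiftyone.core.odm.database as db

    try:
        # values_only=True drops the field names from the results
        return db.get_indexed_values(collection, field, values_only=True)
    except ValueError:
        # Raised when the field is not covered by an index
        return None
```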