# fiftyone.core.odm.database

Database utilities.

Classes:

 DatabaseConfigDocument(*args, **values) Backing document for the database config.

Functions:

 aggregate(collection, pipelines) Executes one or more aggregations on a collection.
 bulk_write(ops, coll[, ordered]) Performs a batch of write operations on a collection.
 cleanup_multiple_config_docs() Internal utility that ensures that there is only one DatabaseConfigDocument in the database.
 count_documents(coll, pipeline)
 delete_annotation_run(name, anno_key[, dry_run]) Deletes the annotation run with the given key from the dataset with the given name.
 delete_annotation_runs(name[, dry_run]) Deletes all annotation runs from the dataset with the given name.
 delete_brain_run(name, brain_key[, dry_run]) Deletes the brain method run with the given key from the dataset with the given name.
 delete_brain_runs(name[, dry_run]) Deletes all brain method runs from the dataset with the given name.
 delete_dataset(name[, dry_run]) Deletes the dataset with the given name.
 delete_evaluation(name, eval_key[, dry_run]) Deletes the evaluation run with the given key from the dataset with the given name.
 delete_evaluations(name[, dry_run]) Deletes all evaluations from the dataset with the given name.
 drop_collection(collection_name) Drops specified collection from the database.
 drop_database() Drops the database.
 drop_orphan_collections([dry_run]) Drops all orphan collections from the database.
 drop_orphan_runs([dry_run]) Drops all orphan runs from the database.
 drop_orphan_saved_views([dry_run]) Drops all orphan saved views from the database.
 establish_db_conn(config) Establishes the database connection.
 export_collection(docs, json_dir_or_path[, …]) Exports the collection to disk in JSON format.
 export_document(doc, json_path) Exports the document to disk in JSON format.
 get_async_db_client() Returns an async database client.
 get_async_db_conn() Returns an async connection to the database.
 get_collection_stats(collection_name) Gets stats about the collection.
 get_db_client() Returns a database client.
 get_db_config() Retrieves the database config.
 get_db_conn() Returns a connection to the database.
 import_collection(json_dir_or_path[, key]) Imports the collection from JSON on disk.
 import_document(json_path) Imports a document from JSON on disk.
 insert_documents(docs, coll[, ordered, …]) Inserts documents into a collection.
 list_collections() Returns a list of all collection names in the database.
 list_datasets() Returns the list of available FiftyOne datasets.
 stream_collection(collection_name) Streams the contents of the collection to stdout.
 sync_database() Syncs all pending database writes to disk.

class fiftyone.core.odm.database.DatabaseConfigDocument(*args, **values)

Backing document for the database config.

Miscellaneous:

Attributes:

 STRICT
 field_names An ordered tuple of the public fields of this document.
 id A field wrapper around MongoDB’s ObjectIds.
 in_db Whether the document has been inserted into the database.
 objects([q_obj])
 pk Get the primary key.
 type A unicode string field.
 version A unicode string field.

Methods:

 cascade_save(**kwargs) Recursively save any references and generic references on the document.
 clean() Hook for doing document level data cleaning (usually validation or assignment) before validation is run.
 clear_field(field_name) Clears the field from the document.
 compare_indexes() Compares the indexes defined in MongoEngine with the ones existing in the database.
 copy() Returns a deep copy of the document.
 create_index(keys[, background]) Creates the given indexes if required.
 delete([signal_kwargs]) Delete the Document from the database.
 drop_collection() Drops the entire collection associated with this Document type from the database.
 ensure_index(key_or_list[, background]) Ensure that the given indexes are in place.
 ensure_indexes() Checks the document meta data and ensures all the indexes exist.
 fancy_repr([class_name, select_fields, …]) Generates a customizable string representation of the document.
 field_to_mongo(field_name)
 field_to_python(field_name, value)
 from_dict(d[, extended]) Loads the document from a BSON/JSON dictionary.
 from_json(s) Loads the document from a JSON string.
 get_field(field_name) Gets the field of the document.
 get_text_score() Get text score from text query.
 has_field(field_name) Determines whether the document has a field of the given name.
 iter_fields() Returns an iterator over the (name, value) pairs of the public fields of the document.
 list_indexes() Lists all indexes that should be created for the Document collection.
 merge(doc[, merge_lists, merge_dicts, overwrite]) Merges the contents of the given document into this document.
 modify([query]) Perform an atomic update of the document in the database and reload the document object using the updated version.
 register_delete_rule(document_cls, …) This method registers the delete rules to apply when removing this object.
 reload(*fields, **kwargs) Reloads all attributes from the database.
 save([upsert, validate, clean, safe]) Saves the document to the database.
 select_related([max_depth]) Handles dereferencing of DBRef objects to a maximum depth in order to cut down the number of queries to MongoDB.
 set_field(field_name, value[, create, …]) Sets the value of a field of the document.
 switch_collection(collection_name[, …]) Temporarily switch the collection for a document instance.
 switch_db(db_alias[, keep_created]) Temporarily switch the database for a document instance.
 to_dbref() Returns an instance of DBRef useful in __raw__ queries.
 to_dict([extended]) Serializes this document to a BSON/JSON dictionary.
 to_json([pretty_print]) Serializes the document to a JSON string.
 to_mongo(*args, **kwargs) Return as SON data ready for use with MongoDB.
 update(**kwargs) Performs an update on the Document. A convenience wrapper to update().
 validate([clean]) Ensure that all fields’ values are valid and that required fields are present.

Classes:

 my_metaclass alias of mongoengine.base.metaclasses.TopLevelDocumentMetaclass
version

A unicode string field.

Parameters
• description (None) – an optional description

• info (None) – an optional info dict

type

A unicode string field.

Parameters
• description (None) – an optional description

• info (None) – an optional info dict

exception DoesNotExist

Bases: mongoengine.errors.DoesNotExist

args
with_traceback()

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception MultipleObjectsReturned

Bases: mongoengine.errors.MultipleObjectsReturned

args
with_traceback()

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

STRICT = False
cascade_save(**kwargs)

Recursively save any references and generic references on the document.

clean()

Hook for doing document level data cleaning (usually validation or assignment) before validation is run.

Any ValidationError raised by this method will not be associated with a particular field; it will have a special-case association with the field defined by NON_FIELD_ERRORS.

clear_field(field_name)

Clears the field from the document.

Parameters

field_name – the field name

Raises

ValueError – if the field does not exist

classmethod compare_indexes()

Compares the indexes defined in MongoEngine with the ones existing in the database. Returns any missing/extra indexes.

copy()

Returns a deep copy of the document.

Returns

a SerializableDocument

classmethod create_index(keys, background=False, **kwargs)

Creates the given indexes if required.

Parameters
• keys – a single index key or a list of index keys (to construct a multi-field index); keys may be prefixed with a + or a - to determine the index ordering

• background – Allows index creation in the background
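
To illustrate the + and - prefix convention described above, here is a minimal sketch of how a prefixed key could be translated into the (field, direction) pairs used in MongoDB index specifications. The helper names `parse_index_key` and `parse_index_keys` are hypothetical and are not part of MongoEngine or FiftyOne; the constants 1 and -1 match the values that pymongo exposes as ASCENDING and DESCENDING. The sketch assumes that a key without a prefix means ascending order.

```python
ASCENDING = 1    # same value as pymongo.ASCENDING
DESCENDING = -1  # same value as pymongo.DESCENDING


def parse_index_key(key):
    """Splits a key such as "-version" into a (field_name, direction) pair."""
    if key.startswith("-"):
        return key[1:], DESCENDING
    if key.startswith("+"):
        return key[1:], ASCENDING
    # Assumption: a key without a prefix is treated as ascending
    return key, ASCENDING


def parse_index_keys(keys):
    """Accepts a single key or a list of keys (a multi-field index)."""
    if isinstance(keys, str):
        keys = [keys]
    return [parse_index_key(k) for k in keys]


index_spec = parse_index_keys(["+type", "-version"])
print(index_spec)  # [('type', 1), ('version', -1)]
```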

delete(signal_kwargs=None, **write_concern)

Delete the Document from the database. This will only take effect if the document has been previously saved.

Parameters
• signal_kwargs – (optional) kwargs dictionary to be passed to the signal calls.

• write_concern – Extra keyword arguments are passed down which will be used as options for the resultant getLastError command. For example, save(..., w=2, fsync=True) will wait until at least two servers have recorded the write and will force an fsync on the primary server.

classmethod drop_collection()

Drops the entire collection associated with this Document type from the database.

Raises OperationError if the document has no collection set (e.g. if it is abstract)

classmethod ensure_index(key_or_list, background=False, **kwargs)

Ensure that the given indexes are in place. Deprecated in favour of create_index.

Parameters
• key_or_list – a single index key or a list of index keys (to construct a multi-field index); keys may be prefixed with a + or a - to determine the index ordering

• background – Allows index creation in the background

classmethod ensure_indexes()

Checks the document meta data and ensures all the indexes exist.

Global defaults can be set in the meta - see Defining documents

By default, this will get called automatically upon first interaction with the Document collection (query, save, etc) so unless you disabled auto_create_index, you shouldn’t have to call this manually.

Note

You can disable automatic index creation by setting auto_create_index to False in the document’s meta data.

fancy_repr(class_name=None, select_fields=None, exclude_fields=None, **kwargs)

Generates a customizable string representation of the document.

Parameters
• class_name (None) – optional class name to use

• select_fields (None) – iterable of field names to restrict to

• exclude_fields (None) – iterable of field names to exclude

• **kwargs – additional key-value pairs to include in the string representation

Returns

a string representation of the document

property field_names

An ordered tuple of the public fields of this document.

field_to_mongo(field_name)
field_to_python(field_name, value)
classmethod from_dict(d, extended=False)

Loads the document from a BSON/JSON dictionary.

Parameters
• d – a dictionary

• extended (False) – whether the input dictionary may contain serialized extended JSON constructs

Returns

a SerializableDocument

classmethod from_json(s)

Loads the document from a JSON string.

Returns

a SerializableDocument

get_field(field_name)

Gets the field of the document.

Parameters

field_name – the field name

Returns

the field value

Raises

AttributeError – if the field does not exist

get_text_score()

Get text score from text query

has_field(field_name)

Determines whether the document has a field of the given name.

Parameters

field_name – the field name

Returns

True/False

id

A field wrapper around MongoDB’s ObjectIds.

property in_db

Whether the document has been inserted into the database.

iter_fields()

Returns an iterator over the (name, value) pairs of the public fields of the document.

Returns

an iterator that emits (name, value) tuples

classmethod list_indexes()

Lists all indexes that should be created for the Document collection. It includes all the indexes from super- and sub-classes.

Note that it will only return the indexes’ fields, not the indexes’ options

merge(doc, merge_lists=True, merge_dicts=True, overwrite=True)

Merges the contents of the given document into this document.

Parameters
• doc – a SerializableDocument of same type as this document

• merge_lists (True) – whether to merge the elements of top-level list fields rather than treating the list as a single value

• merge_dicts (True) – whether to recursively merge the contents of top-level dict fields rather than treating the dict as a single value

• overwrite (True) – whether to overwrite (True) or skip (False) existing fields
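
The interaction of the three flags above can be sketched on plain dictionaries. This is an illustration only, not FiftyOne's implementation: the function name `merge_fields` is hypothetical, it operates on dicts rather than documents, and it assumes that merging lists means appending elements that are not already present and that the same flags apply at every nesting level.

```python
def merge_fields(target, source, merge_lists=True, merge_dicts=True, overwrite=True):
    """Merges the fields of `source` into `target` in-place and returns `target`."""
    for name, value in source.items():
        current = target.get(name)

        if merge_lists and isinstance(current, list) and isinstance(value, list):
            # Assumption: merging appends the elements that are not yet present
            for element in value:
                if element not in current:
                    current.append(element)
        elif merge_dicts and isinstance(current, dict) and isinstance(value, dict):
            # Recurse so that nested dicts are merged key by key
            merge_fields(current, value, merge_lists, merge_dicts, overwrite)
        elif overwrite or name not in target:
            # Either overwriting is allowed, or the field is new
            target[name] = value

    return target


target = {"type": "fiftyone", "tags": ["a", "b"], "info": {"x": 1}}
source = {"type": "other", "tags": ["b", "c"], "info": {"y": 2}, "version": "0.20.0"}

# With overwrite=False the existing "type" value is kept
merged = merge_fields(target, source, overwrite=False)
print(merged)
```

With `merge_lists=False`, the list in `source` would instead be treated as a single value and replace the one in `target` (subject to `overwrite`).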

modify(query=None, **update)

Perform an atomic update of the document in the database and reload the document object using the updated version.

Returns True if the document has been updated or False if the document in the database doesn’t match the query.

Note

All unsaved changes that have been made to the document are rejected if the method returns True.

Parameters
• query – the update will be performed only if the document in the database matches the query

• update – Django-style update keyword arguments

my_metaclass

alias of mongoengine.base.metaclasses.TopLevelDocumentMetaclass

Methods:

 get_auto_id_names(new_class) Find a name for the automatic ID field for the given new class.
 mro Return a type’s method resolution order.
objects(q_obj=None, **query) = [<DatabaseConfigDocument: {'id': '641b6ac3156a33c29af7eb8f', 'version': '0.20.0', 'type': 'fiftyone'}>]
property pk

Get the primary key.

classmethod register_delete_rule(document_cls, field_name, rule)

This method registers the delete rules to apply when removing this object.

reload(*fields, **kwargs)

Reloads all attributes from the database.

Parameters
• fields – (optional) args list of fields to reload

• max_depth – (optional) depth of dereferencing to follow

save(upsert=False, validate=True, clean=True, safe=False, **kwargs)

Saves the document to the database.

If the document already exists, it will be updated, otherwise it will be created.

Parameters
• upsert (False) – whether to insert the document if it has an id populated but no document with that ID exists in the database

• validate (True) – whether to validate the document

• clean (True) – whether to call the document’s clean() method. Only applicable when validate is True

• safe (False) – whether to reload() the document before raising any errors

Returns

self

select_related(max_depth=1)

Handles dereferencing of DBRef objects to a maximum depth in order to cut down the number of queries to MongoDB.

set_field(field_name, value, create=True, validate=True, dynamic=False)

Sets the value of a field of the document.

Parameters
• field_name – the field name

• value – the field value

• create (True) – whether to create the field if it does not exist

Raises

ValueError – if field_name is not an allowed field name or does not exist and create == False

switch_collection(collection_name, keep_created=True)

Temporarily switch the collection for a document instance.

Only really useful for archiving off data and calling save():

user = User.objects.get(id=user_id)
user.switch_collection('old-users')
user.save()

Parameters
• collection_name (str) – The collection name to use for saving the document

• keep_created (bool) – keep self._created value after switching collection, else is reset to True

Use switch_db if you need to read from another database

switch_db(db_alias, keep_created=True)

Temporarily switch the database for a document instance.

Only really useful for archiving off data and calling save():

user = User.objects.get(id=user_id)
user.switch_db('archive-db')
user.save()

Parameters
• db_alias (str) – The database alias to use for saving the document

• keep_created (bool) – keep self._created value after switching db, else is reset to True

Use switch_collection if you need to read from another collection

to_dbref()

Returns an instance of DBRef useful in __raw__ queries.

to_dict(extended=False)

Serializes this document to a BSON/JSON dictionary.

Parameters

extended (False) – whether to serialize extended JSON constructs such as ObjectIDs, Binary, etc. into JSON format

Returns

a dict

to_json(pretty_print=False)

Serializes the document to a JSON string.

Parameters

pretty_print (False) – whether to render the JSON in human readable format with newlines and indentations

Returns

a JSON string

to_mongo(*args, **kwargs)

Return as SON data ready for use with MongoDB.

update(**kwargs)

Performs an update on the Document. A convenience wrapper to update().

Raises OperationError if called on an object that has not yet been saved.

validate(clean=True)

Ensure that all fields’ values are valid and that required fields are present.

Raises ValidationError if any of the fields’ values are found to be invalid.

fiftyone.core.odm.database.get_db_config()

Retrieves the database config.

Returns

a DatabaseConfigDocument

fiftyone.core.odm.database.cleanup_multiple_config_docs()

Internal utility that ensures that there is only one DatabaseConfigDocument in the database.

fiftyone.core.odm.database.establish_db_conn(config)

Establishes the database connection.

If fiftyone.config.database_uri is defined, then we connect to that URI. Otherwise, a fiftyone.core.service.DatabaseService is created.

Parameters

config – a fiftyone.core.config.FiftyOneConfig

Raises
• ConnectionError – if a connection to mongod could not be established

• FiftyOneConfigError – if fiftyone.config.database_uri is not defined and mongod could not be found

• ServiceExecutableNotFound – if fiftyone.core.service.DatabaseService startup was attempted, but mongod was not found in fiftyone.db.bin

• RuntimeError – if the mongod found does not meet FiftyOne’s requirements, or validation could not occur

fiftyone.core.odm.database.aggregate(collection, pipelines)

Executes one or more aggregations on a collection.

Multiple aggregations are executed using multiple threads, and their results are returned as lists rather than cursors.

Parameters
• collection – a pymongo.collection.Collection or motor.motor_asyncio.AsyncIOMotorCollection

• pipelines – a MongoDB aggregation pipeline or a list of pipelines

Returns

• If a single pipeline is provided, a pymongo.command_cursor.CommandCursor or motor.motor_asyncio.AsyncIOMotorCommandCursor is returned

• If multiple pipelines are provided, each cursor is extracted into a list and the list of lists is returned
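
The two return shapes described above can be sketched with the standard library. This is a hedged illustration rather than FiftyOne's implementation: `StubCollection` is a hypothetical stand-in for a pymongo collection that only understands a `$limit` stage, `run_aggregations` is a hypothetical name, and the sketch assumes that a single pipeline is recognized by its first element being a stage dict.

```python
from concurrent.futures import ThreadPoolExecutor


class StubCollection:
    """Hypothetical stand-in for a pymongo collection, for illustration only."""

    def __init__(self, docs):
        self._docs = docs

    def aggregate(self, pipeline):
        # This stub only understands the $limit stage
        docs = self._docs
        for stage in pipeline:
            if "$limit" in stage:
                docs = docs[: stage["$limit"]]
        return iter(docs)  # consumed lazily, like a cursor


def run_aggregations(collection, pipelines):
    # Assumption: a single pipeline is a list of stage dicts, while multiple
    # pipelines are a list of such lists
    if not pipelines or isinstance(pipelines[0], dict):
        return collection.aggregate(pipelines)

    # One thread per pipeline; each cursor is drained into a list
    with ThreadPoolExecutor(max_workers=len(pipelines)) as pool:
        return list(pool.map(lambda p: list(collection.aggregate(p)), pipelines))


coll = StubCollection([{"_id": 1}, {"_id": 2}, {"_id": 3}])

cursor = run_aggregations(coll, [{"$limit": 2}])
results = run_aggregations(coll, [[{"$limit": 1}], [{"$limit": 3}]])

print(list(cursor))               # [{'_id': 1}, {'_id': 2}]
print([len(r) for r in results])  # [1, 3]
```

Draining each cursor inside its worker thread is what allows the multi-pipeline case to return fully materialized lists, at the cost of holding all results in memory.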

fiftyone.core.odm.database.get_db_client()

Returns a database client.

Returns

a pymongo.mongo_client.MongoClient

fiftyone.core.odm.database.get_db_conn()

Returns a connection to the database.

Returns

a pymongo.database.Database

fiftyone.core.odm.database.get_async_db_client()

Returns an async database client.

Returns

a motor.motor_asyncio.AsyncIOMotorClient

fiftyone.core.odm.database.get_async_db_conn()

Returns an async connection to the database.

Returns

a motor.motor_asyncio.AsyncIOMotorDatabase

fiftyone.core.odm.database.drop_database()

Drops the database.

fiftyone.core.odm.database.sync_database()

Syncs all pending database writes to disk.

fiftyone.core.odm.database.list_collections()

Returns a list of all collection names in the database.

Returns

a list of all collection names

fiftyone.core.odm.database.drop_collection(collection_name)

Drops specified collection from the database.

Parameters

collection_name – the collection name

fiftyone.core.odm.database.drop_orphan_collections(dry_run=False)

Drops all orphan collections from the database.

Orphan collections are collections that are not associated with any known dataset or other collections used by FiftyOne.

Parameters

dry_run (False) – whether to log the actions that would be taken but not perform them
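
The dry_run convention shared by the functions in this module can be sketched as follows. The function `drop_orphans` and the collection names are hypothetical, and the sketch works on in-memory lists instead of a database: it returns the names that remain so that the effect of the flag is visible.

```python
import logging

logger = logging.getLogger(__name__)


def drop_orphans(all_collections, known_collections, dry_run=False):
    """Returns the collection names that remain after orphans are handled."""
    remaining = []
    for name in all_collections:
        if name in known_collections:
            remaining.append(name)
        elif dry_run:
            # Log the action that would be taken, but do not perform it
            logger.info("(dry run) would drop collection '%s'", name)
            remaining.append(name)
        else:
            logger.info("Dropping collection '%s'", name)

    return remaining


collections = ["samples.abc", "samples.tmp"]  # hypothetical names
known = {"samples.abc"}

print(drop_orphans(collections, known, dry_run=True))  # nothing is dropped
print(drop_orphans(collections, known))                # the orphan is dropped
```

Running with `dry_run=True` first is a cheap way to review what a destructive call would do before committing to it.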

fiftyone.core.odm.database.drop_orphan_saved_views(dry_run=False)

Drops all orphan saved views from the database.

Orphan saved views are saved view documents that are not associated with any known dataset or other collections used by FiftyOne.

Parameters

dry_run (False) – whether to log the actions that would be taken but not perform them

fiftyone.core.odm.database.drop_orphan_runs(dry_run=False)

Drops all orphan runs from the database.

Orphan runs are runs that are not associated with any known dataset or other collections used by FiftyOne.

Parameters

dry_run (False) – whether to log the actions that would be taken but not perform them

fiftyone.core.odm.database.stream_collection(collection_name)

Streams the contents of the collection to stdout.

Parameters

collection_name – the name of the collection

fiftyone.core.odm.database.get_collection_stats(collection_name)

Gets stats about the collection.

Parameters

collection_name – the name of the collection

Returns

a stats dict

fiftyone.core.odm.database.count_documents(coll, pipeline)
fiftyone.core.odm.database.export_document(doc, json_path)

Exports the document to disk in JSON format.

Parameters
• doc – a BSON document dict

• json_path – the path to write the JSON file

fiftyone.core.odm.database.export_collection(docs, json_dir_or_path, key='documents', patt='{idx:06d}-{id}.json', num_docs=None)

Exports the collection to disk in JSON format.

Parameters
• docs – an iterable containing the documents to export

• json_dir_or_path – the path to write a single JSON file containing the entire collection, or a directory in which to write per-document JSON files

• key ("documents") – the field name under which to store the documents when json_dir_or_path is a single JSON file

• patt ("{idx:06d}-{id}.json") – a filename pattern to use when json_dir_or_path is a directory. The pattern may contain idx to refer to the index of the document in docs or id to refer to the document’s ID

• num_docs (None) – the total number of documents. If omitted, this must be computable via len(docs)
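
As a quick illustration of how the default filename pattern expands, the sketch below applies Python's `str.format` to two hypothetical documents. The document IDs are made up, and the starting index of 1 is an assumption, since the text above does not say where idx begins.

```python
patt = "{idx:06d}-{id}.json"  # the default pattern from the signature above

# Hypothetical documents; only the "_id" values matter here
docs = [
    {"_id": "641b6ac3156a33c29af7eb8f"},
    {"_id": "641b6ac3156a33c29af7eb90"},
]

# Assumption: indices start at 1
filenames = [
    patt.format(idx=idx, id=str(doc["_id"]))
    for idx, doc in enumerate(docs, 1)
]

for filename in filenames:
    print(filename)
```

The `06d` format spec zero-pads the index to six digits, so the per-document files sort in export order when listed alphabetically.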

fiftyone.core.odm.database.import_document(json_path)

Imports a document from JSON on disk.

Parameters

json_path – the path to the document

Returns

a BSON document dict

fiftyone.core.odm.database.import_collection(json_dir_or_path, key='documents')

Imports the collection from JSON on disk.

Parameters
• json_dir_or_path – the path to a JSON file on disk, or a directory containing per-document JSON files

• key ("documents") – the field name under which the documents are stored when json_dir_or_path is a single JSON file

Returns

a tuple of

• an iterable of BSON documents

• the number of documents
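
The single-file layout implied by the key parameter, and the (documents, count) return shape, can be sketched with the standard library. This is a simplified illustration: `write_collection` and `read_collection` are hypothetical helpers, and plain `json` is used here, whereas real BSON documents contain types such as ObjectIds that need BSON-aware serialization.

```python
import json
import os
import tempfile


def write_collection(docs, json_path, key="documents"):
    """Writes all documents to one JSON file under the given key."""
    with open(json_path, "w") as f:
        json.dump({key: list(docs)}, f)


def read_collection(json_path, key="documents"):
    """Returns an iterable of documents and the number of documents."""
    with open(json_path, "r") as f:
        docs = json.load(f)[key]

    return iter(docs), len(docs)


with tempfile.TemporaryDirectory() as tmp_dir:
    json_path = os.path.join(tmp_dir, "collection.json")
    write_collection([{"name": "a"}, {"name": "b"}], json_path)

    docs, num_docs = read_collection(json_path)
    loaded = list(docs)

print(num_docs, loaded)
```

Returning the count alongside the iterable lets a caller report progress without first materializing the documents.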

fiftyone.core.odm.database.insert_documents(docs, coll, ordered=False, progress=False, num_docs=None)

Inserts documents into a collection.

The _id field of the input documents will be populated if it is not already set.

Parameters
• docs – an iterable of BSON document dicts

• coll – a pymongo collection

• ordered (False) – whether the documents must be inserted in order

• progress (False) – whether to render a progress bar tracking the insertion

• num_docs (None) – the total number of documents. Only used when progress=True. If omitted, this will be computed via len(docs), if possible

Returns

a list of IDs of the inserted documents
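
The fallback described for num_docs ("computed via len(docs), if possible") can be sketched as below. The helper name `resolve_num_docs` is hypothetical, and returning None when no length is available is an assumption about how an unknown total would be represented.

```python
def resolve_num_docs(docs, num_docs=None):
    """Returns the total number of documents, or None if it is unknown."""
    if num_docs is not None:
        return num_docs

    try:
        return len(docs)
    except TypeError:
        # Generators and other one-shot iterables have no length
        return None


print(resolve_num_docs([{"a": 1}, {"a": 2}]))           # a list has a length
print(resolve_num_docs(d for d in [{"a": 1}]))          # a generator does not
print(resolve_num_docs(iter([{"a": 1}]), num_docs=1))   # explicit total wins
```

Passing num_docs explicitly is therefore only necessary when docs is a generator or another iterable without a length.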

fiftyone.core.odm.database.bulk_write(ops, coll, ordered=False)

Performs a batch of write operations on a collection.

Parameters
• ops – a list of pymongo operations

• coll – a pymongo collection

• ordered (False) – whether the operations must be performed in order

fiftyone.core.odm.database.list_datasets()

Returns the list of available FiftyOne datasets.

This is a low-level implementation of dataset listing that does not call fiftyone.core.dataset.list_datasets(), which is helpful if a database may be corrupted.

Returns

a list of Dataset names

fiftyone.core.odm.database.delete_dataset(name, dry_run=False)

Deletes the dataset with the given name.

This is a low-level implementation of deletion that does not call fiftyone.core.dataset.load_dataset(), which is helpful if a dataset’s backing document or collections are corrupted and cannot be loaded via the normal pathways.

Parameters
• name – the name of the dataset

• dry_run (False) – whether to log the actions that would be taken but not perform them

fiftyone.core.odm.database.delete_annotation_run(name, anno_key, dry_run=False)

Deletes the annotation run with the given key from the dataset with the given name.

This is a low-level implementation of deletion that does not call fiftyone.core.dataset.load_dataset() or fiftyone.core.collections.SampleCollection.delete_annotation_run(), which is helpful if a dataset’s backing document or collections are corrupted and cannot be loaded via the normal pathways.

Note that, as this method does not load fiftyone.core.runs.Run instances, it does not call fiftyone.core.runs.Run.cleanup().

Parameters
• name – the name of the dataset

• anno_key – the annotation key

• dry_run (False) – whether to log the actions that would be taken but not perform them

fiftyone.core.odm.database.delete_annotation_runs(name, dry_run=False)

Deletes all annotation runs from the dataset with the given name.

This is a low-level implementation of deletion that does not call fiftyone.core.dataset.load_dataset() or fiftyone.core.collections.SampleCollection.delete_annotation_runs(), which is helpful if a dataset’s backing document or collections are corrupted and cannot be loaded via the normal pathways.

Note that, as this method does not load fiftyone.core.runs.Run instances, it does not call fiftyone.core.runs.Run.cleanup().

Parameters
• name – the name of the dataset

• dry_run (False) – whether to log the actions that would be taken but not perform them

fiftyone.core.odm.database.delete_brain_run(name, brain_key, dry_run=False)

Deletes the brain method run with the given key from the dataset with the given name.

This is a low-level implementation of deletion that does not call fiftyone.core.dataset.load_dataset() or fiftyone.core.collections.SampleCollection.delete_brain_run(), which is helpful if a dataset’s backing document or collections are corrupted and cannot be loaded via the normal pathways.

Note that, as this method does not load fiftyone.core.runs.Run instances, it does not call fiftyone.core.runs.Run.cleanup().

Parameters
• name – the name of the dataset

• brain_key – the brain key

• dry_run (False) – whether to log the actions that would be taken but not perform them

fiftyone.core.odm.database.delete_brain_runs(name, dry_run=False)

Deletes all brain method runs from the dataset with the given name.

This is a low-level implementation of deletion that does not call fiftyone.core.dataset.load_dataset() or fiftyone.core.collections.SampleCollection.delete_brain_runs(), which is helpful if a dataset’s backing document or collections are corrupted and cannot be loaded via the normal pathways.

Note that, as this method does not load fiftyone.core.runs.Run instances, it does not call fiftyone.core.runs.Run.cleanup().

Parameters
• name – the name of the dataset

• dry_run (False) – whether to log the actions that would be taken but not perform them

fiftyone.core.odm.database.delete_evaluation(name, eval_key, dry_run=False)

Deletes the evaluation run with the given key from the dataset with the given name.

This is a low-level implementation of deletion that does not call fiftyone.core.dataset.load_dataset() or fiftyone.core.collections.SampleCollection.delete_evaluation(), which is helpful if a dataset’s backing document or collections are corrupted and cannot be loaded via the normal pathways.

Note that, as this method does not load fiftyone.core.runs.Run instances, it does not call fiftyone.core.runs.Run.cleanup().

Parameters
• name – the name of the dataset

• eval_key – the evaluation key

• dry_run (False) – whether to log the actions that would be taken but not perform them

fiftyone.core.odm.database.delete_evaluations(name, dry_run=False)

Deletes all evaluations from the dataset with the given name.

This is a low-level implementation of deletion that does not call fiftyone.core.dataset.load_dataset() or fiftyone.core.collections.SampleCollection.delete_evaluations(), which is helpful if a dataset’s backing document or collections are corrupted and cannot be loaded via the normal pathways.

Note that, as this method does not load fiftyone.core.runs.Run instances, it does not call fiftyone.core.runs.Run.cleanup().

Parameters
• name – the name of the dataset

• dry_run (False) – whether to log the actions that would be taken but not perform them