About the Dgraph

The Dgraph is the component of Big Data Discovery that runs search and analytical processing on the data sets. It handles the query requests that users make against those data sets.

The Dgraph uses data structures and algorithms to provide real-time responses to client requests for analytic processing and data summarization. When source data is loaded into Big Data Discovery, the Dgraph creates a separate Dgraph database for each of the data sets. When the Dgraph receives a client request through Studio, the Dgraph queries the appropriate database and returns the results.

An Oracle Big Data Discovery cluster has one or more Dgraph processes that handle end-user query requests against the Dgraph databases on shared storage. For each database, one Dgraph in the cluster is the leader and is therefore responsible for handling all write operations (updates, configuration changes) for that database, while the remaining Dgraphs may serve as read-only followers.

About Dgraph databases

When a data set is created (either from Studio or via the DP CLI), the Dgraph creates a database for it. (A Dgraph database is also known as an index.) The Dgraph database is named:

<dataset>_indexes

where dataset is the name of the data set, to which the suffix "_indexes" is appended. For example:

edp_cli_edp_256b0c6b-cacf-478c-80bf-b5332f4f37ae_indexes
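The naming rule above can be sketched in a few lines of Python. This is only an illustration of the convention; the helper names database_name and dataset_name are hypothetical and are not part of any BDD tool.

```python
SUFFIX = "_indexes"  # suffix the Dgraph appends to the data set name


def database_name(dataset: str) -> str:
    """Return the Dgraph database name for a given data set name."""
    return f"{dataset}{SUFFIX}"


def dataset_name(database: str) -> str:
    """Recover the data set name from a Dgraph database name."""
    if not database.endswith(SUFFIX):
        raise ValueError(f"not a Dgraph database name: {database!r}")
    return database[: -len(SUFFIX)]


if __name__ == "__main__":
    print(database_name("edp_cli_edp_256b0c6b-cacf-478c-80bf-b5332f4f37ae"))
```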

Each data set has its own Dgraph database, and there is only one data set per Dgraph database. The databases are stored in the directory you specify for the DGRAPH_INDEX_DIR property in the bdd.conf file. This directory is called the Dgraph databases directory.
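To find the Dgraph databases directory from a script, you could read the DGRAPH_INDEX_DIR property out of bdd.conf. The sketch below assumes that bdd.conf consists of simple KEY=value lines with # comments; that layout is an assumption here, and the read_property helper and the sample path are hypothetical.

```python
from typing import Optional


def read_property(conf_text: str, key: str) -> Optional[str]:
    """Return the value of `key` from KEY=value text, or None if absent.

    Assumes one property per line and '#' for comment lines.
    """
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if sep and name.strip() == key:
            # Drop optional surrounding quotes around the value.
            return value.strip().strip('"').strip("'")
    return None


if __name__ == "__main__":
    # Hypothetical bdd.conf fragment; the path is for illustration only.
    sample = "# Dgraph settings\nDGRAPH_INDEX_DIR=/example/path/to/dgraph_databases\n"
    print(read_property(sample, "DGRAPH_INDEX_DIR"))
```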

The Dgraph databases directory also contains three internal, system-created databases that are used by Studio:

system-bddProjectInventory_indexes
system-bddDatasetInventory_indexes
system-bddSemanticEntity_indexes

For example, if you create two data sets, Wine and Weather, in Studio, the Dgraph databases directory contains five databases (one for each of the two data sets and three internal databases). You may also see other databases in the Dgraph databases directory; they may be created as a result of committing a transformed data set.

[Diagram: the Dgraph databases directory for this example, which includes multiple databases (indexes), one for each of the data sets in BDD.]
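As a rough illustration of the Wine and Weather example, the following Python sketch separates the three internal databases from the data set databases in a list of names. The classify helper is hypothetical, and the names Wine_indexes and Weather_indexes are stand-ins: actual database names are derived from the data set name, as in the edp_cli_edp_... example earlier.

```python
# The three internal, system-created databases used by Studio.
SYSTEM_DATABASES = frozenset({
    "system-bddProjectInventory_indexes",
    "system-bddDatasetInventory_indexes",
    "system-bddSemanticEntity_indexes",
})


def classify(names):
    """Split database names into (system, data_set) lists, each sorted."""
    system = sorted(n for n in names if n in SYSTEM_DATABASES)
    data_set = sorted(
        n for n in names
        if n.endswith("_indexes") and n not in SYSTEM_DATABASES
    )
    return system, data_set


if __name__ == "__main__":
    # Stand-in names for the two data sets in the example.
    names = ["Wine_indexes", "Weather_indexes", *SYSTEM_DATABASES]
    system, data_set = classify(names)
    print(len(system) + len(data_set), "databases:", data_set, system)
```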

When a Dgraph database is created, it is automatically mounted by the Dgraph. Unmounted databases are also automatically mounted when the Dgraph receives a query that accesses the database's data. When a database is mounted, a log entry is made in the Dgraph out log, as in this example:

DGRAPH	NOTIFICATION  	{database}	[0]	Mounting database edp_cli_edp_256b0c6b-cacf-478c-80bf-b5332f4f37ae

Note that the entry is made by the Dgraph database log subsystem.
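If you need to find out which databases were mounted, you could scan the Dgraph out log for entries like the one above. The following Python sketch is based only on that single sample line; the exact field layout of the log is an assumption, and the mounted_database helper is hypothetical.

```python
import re
from typing import Optional

# Matches the subsystem field, the bracketed number, and the mount message.
# Whitespace between fields may be tabs or spaces.
MOUNT_RE = re.compile(r"\{database\}\s+\[\d+\]\s+Mounting database (\S+)")


def mounted_database(line: str) -> Optional[str]:
    """Return the database name from a mount log entry, or None."""
    match = MOUNT_RE.search(line)
    return match.group(1) if match else None


if __name__ == "__main__":
    entry = (
        "DGRAPH\tNOTIFICATION  \t{database}\t[0]\t"
        "Mounting database edp_cli_edp_256b0c6b-cacf-478c-80bf-b5332f4f37ae"
    )
    print(mounted_database(entry))
```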

The database name also appears in other BDD component messages. For example, the name of a DP workflow in a YARN log will contain the database name:

EDP: ProvisionDataSetFromHiveConfig{hiveDatabaseName=default, hiveTableName=warrantyclaims, 
newCollectionId=MdexCollectionIdentifier{databaseName=edp_cli_edp_256b0c6b-cacf-478c-80bf-b5332f4f37ae, 
collectionName=edp_cli_edp_256b0c6b-cacf-478c-80bf-b5332f4f37ae}}

You should also see database names in the logs for Studio, Dgraph HDFS Agent, and Transform Service.
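When correlating messages across components, it can help to pull the database name out of log text such as the DP workflow entry above. This Python sketch assumes the databaseName=<name> pattern seen in that one sample; other components may format the name differently, and the database_names helper is hypothetical.

```python
import re

# Captures the value after "databaseName=" up to a comma, brace, or whitespace.
# The match is case-sensitive, so "hiveDatabaseName=" is not picked up.
DB_NAME_RE = re.compile(r"\bdatabaseName=([^,}\s]+)")


def database_names(log_text: str) -> list:
    """Return every databaseName value found in the log text, in order."""
    return DB_NAME_RE.findall(log_text)


if __name__ == "__main__":
    sample = (
        "EDP: ProvisionDataSetFromHiveConfig{hiveDatabaseName=default, "
        "hiveTableName=warrantyclaims, \n"
        "newCollectionId=MdexCollectionIdentifier{"
        "databaseName=edp_cli_edp_256b0c6b-cacf-478c-80bf-b5332f4f37ae, \n"
        "collectionName=edp_cli_edp_256b0c6b-cacf-478c-80bf-b5332f4f37ae}}"
    )
    print(database_names(sample))
```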

Dgraph support for HDFS Data at Rest Encryption

The HDFS Data at Rest Encryption feature, when enabled, allows data to be stored in encrypted HDFS directories called encryption zones. All files within an encryption zone are transparently encrypted and decrypted on the client side. Decrypted data is therefore never stored in HDFS.

If you have enabled HDFS Data at Rest Encryption, you can store your Dgraph databases in an encryption zone in HDFS. For details on enabling HDFS Data at Rest Encryption, see the Installation Guide.

Dgraph Tracing Utility

The Dgraph Tracing Utility is a Dgraph diagnostic program used by Oracle Support. It stores the Dgraph trace data, which are useful in troubleshooting the Dgraph. It starts when the Dgraph starts, and keeps track of all Dgraph operations. It stops when the Dgraph shuts down. You can save and download trace data to share it with Oracle Support.

The Tracing Utility stores the Dgraph target trace data it collects in *.ebb files, which are saved in the $DGRAPH_HOME/bin directory. These files are useful in analyzing Dgraph crashes and are intended for use by Oracle Support. You can also manually generate and save the trace data with the bdd-admin script's get-blackbox command, as described in get-blackbox.
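Before contacting Oracle Support, you may want to check which trace files already exist. The Python sketch below lists the *.ebb files under the bin directory of a given Dgraph home, newest first. It assumes that the DGRAPH_HOME environment variable is set in your shell; the find_trace_files helper is hypothetical and is not part of the bdd-admin script.

```python
import os
from pathlib import Path


def find_trace_files(dgraph_home: str) -> list:
    """Return the *.ebb files in <dgraph_home>/bin, newest first."""
    bin_dir = Path(dgraph_home) / "bin"
    if not bin_dir.is_dir():
        return []
    return sorted(
        bin_dir.glob("*.ebb"),
        key=lambda p: p.stat().st_mtime,
        reverse=True,
    )


if __name__ == "__main__":
    home = os.environ.get("DGRAPH_HOME")
    if home is None:
        print("DGRAPH_HOME is not set")
    else:
        for path in find_trace_files(home):
            print(path)
```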