Big Data Discovery backup strategy

Oracle recommends that you back up your system to ensure the safety of your data. This topic lists the resources you should back up, as well as their locations.

Backups must be performed manually and cold, that is, while the relevant BDD components are stopped. A cold backup guarantees that your project data sets, Studio database, and sample files remain in sync. At a minimum, you must shut down the Dgraph and the HDFS Agent so that neither can perform an ingest during the backup.
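The ordering that makes a backup cold can be sketched as follows. This is a minimal illustration under stated assumptions, not an Oracle tool: `run_cold_backup`, `stop_component`, `start_component`, and the copy steps are hypothetical hooks you would supply yourself (for example, wrappers around your own stop scripts and copy commands). The sketch shows two properties: no copy begins until the Dgraph and the HDFS Agent have both stopped, and whatever was stopped is restarted even if a copy step fails. Restarting in reverse order is a design choice of this sketch, not a documented requirement.

```python
from typing import Callable, Iterable, List, Sequence

# At a minimum, the Dgraph and the HDFS Agent must be stopped so that neither
# can start an ingest while files are copied. These strings are labels passed
# to the hypothetical hooks below, not real BDD command names.
DEFAULT_COMPONENTS: Sequence[str] = ("dgraph", "hdfs_agent")


def run_cold_backup(
    stop_component: Callable[[str], None],
    start_component: Callable[[str], None],
    copy_steps: Iterable[Callable[[], None]],
    components: Sequence[str] = DEFAULT_COMPONENTS,
) -> None:
    """Run every copy step only while all components are stopped."""
    stopped: List[str] = []
    try:
        # Stop everything first; no copy begins until all stops succeed.
        for name in components:
            stop_component(name)
            stopped.append(name)
        # Copy each resource (Studio database, Dgraph index, HDFS files, ...).
        for step in copy_steps:
            step()
    finally:
        # Restart whatever was actually stopped, in reverse order, even if a
        # stop or copy step raised, so a failed backup does not leave BDD down.
        for name in reversed(stopped):
            start_component(name)


if __name__ == "__main__":
    # Dry run: the hooks only print what a real backup would do.
    run_cold_backup(
        stop_component=lambda name: print(f"stop {name}"),
        start_component=lambda name: print(f"start {name}"),
        copy_steps=[lambda: print("copy Dgraph index")],
    )
```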

Each resource below is listed with its location and any notes.

Resource: dateFormats.txt, edp_classpath.txt, logging.properties, sparkContext.properties, spark_worker_files.txt
Location: The location on HDFS defined by the hdfsEdpLibPath property in the data_processing_CLI file. By default, this is /user/bdd/edp/lib.
Notes: These files contain configuration settings specific to your system.

Resource: /dataSwamp
Location: The location on HDFS defined by the edpDataDir property in the data_processing_CLI file. By default, this is /user/bdd/edp/data.
Notes: This directory contains the Avro files for your sample data sets.

Resource: Data Processing log files (edpLog*.log)
Location: The location on each node defined by the edpJarDir property in the data_processing_CLI file. By default, this is /opt/bdd/edp/data.
Notes: The Data Processing log files are located on each node that has been involved in a Data Processing job. Such nodes include the client that started the job (which could be a node running the CLI, the Hive Table Detector, or Studio), an Oozie worker node, or a Spark worker node.

Resource: The ZooKeeper infrastructure on CDH nodes
Location: The /endeca-cluster znode
Notes: Refer to Cloudera's documentation for backup instructions.

Resource: HDFS and other Hadoop resources
Notes: Refer to Cloudera's documentation for backup instructions.

Resource: Studio's database
Notes: Refer to your database's documentation for backup instructions.

Resource: $MW_HOME
Location: The location defined by the ORACLE_HOME property in bdd.conf. By default, this is /localdisk/Oracle/Middleware.
Notes: Back up this location on the WebLogic Admin Server node and on each WebLogic Managed Server node in the BDD cluster.

Resource: $DOMAIN_HOME
Location: The root directory of Studio and your WebLogic domain. By default, this is $MW_HOME/user_projects/domains/bdd_domain.
Notes: Back up this location on the WebLogic Admin Server node and on each WebLogic Managed Server node in the BDD cluster.

Resource: The Dgraph index
Location: The location on NFS defined by the DGRAPH_INDEX_DIR and DGRAPH_INDEX_NAME properties in the bdd.conf file.
Notes: This location contains the indexes for all of your data sets.
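The locations for $MW_HOME, $DOMAIN_HOME, and the Dgraph index all derive from bdd.conf. The sketch below resolves them, assuming that bdd.conf consists of simple KEY=VALUE lines with # comments; the helper names `parse_key_value` and `bdd_conf_backup_locations` are hypothetical. $DOMAIN_HOME is computed from its default layout only, so if your domain lives elsewhere, substitute its real path. The Dgraph index properties have no documented default here, so the sketch returns them as found (or `None`) rather than guessing how they combine into a path.

```python
from pathlib import PurePosixPath
from typing import Dict, Optional


def parse_key_value(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and # comments (assumed format)."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes, e.g. KEY="value".
        props[key.strip()] = value.strip().strip('"').strip("'")
    return props


def bdd_conf_backup_locations(props: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Map bdd.conf properties to the locations that must be backed up."""
    # $MW_HOME comes from ORACLE_HOME, with the documented default as fallback.
    mw_home = props.get("ORACLE_HOME", "/localdisk/Oracle/Middleware")
    # $DOMAIN_HOME assumes the default layout under $MW_HOME.
    domain_home = str(
        PurePosixPath(mw_home) / "user_projects" / "domains" / "bdd_domain"
    )
    return {
        "MW_HOME": mw_home,
        "DOMAIN_HOME": domain_home,
        # Reported as-is; None means the property was not found.
        "DGRAPH_INDEX_DIR": props.get("DGRAPH_INDEX_DIR"),
        "DGRAPH_INDEX_NAME": props.get("DGRAPH_INDEX_NAME"),
    }
```

Usage: `bdd_conf_backup_locations(parse_key_value(open("bdd.conf").read()))`, run on a node that has your bdd.conf; remember that $MW_HOME and $DOMAIN_HOME must be copied on the Admin Server node and on every Managed Server node.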
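The Data Processing resources are located through three properties in the data_processing_CLI file: hdfsEdpLibPath, edpDataDir, and edpJarDir. The following sketch resolves them with their documented defaults, lists the HDFS paths of the five configuration files, and finds the edpLog*.log files on the local node. It assumes the properties appear as name=value lines; all function and constant names are hypothetical. Copying the HDFS paths off the cluster (for example with `hdfs dfs -get`) is left to your own tooling, and the log search must be repeated on every node that took part in a Data Processing job.

```python
import glob
import os
import posixpath
from typing import Dict, List

# Documented defaults for the data_processing_CLI properties.
DP_DEFAULTS: Dict[str, str] = {
    "hdfsEdpLibPath": "/user/bdd/edp/lib",
    "edpDataDir": "/user/bdd/edp/data",
    "edpJarDir": "/opt/bdd/edp/data",
}

# Configuration files stored on HDFS under hdfsEdpLibPath.
DP_CONFIG_FILES: List[str] = [
    "dateFormats.txt",
    "edp_classpath.txt",
    "logging.properties",
    "sparkContext.properties",
    "spark_worker_files.txt",
]


def parse_assignments(text: str) -> Dict[str, str]:
    """Parse name=value lines (assumed format), ignoring comments and blanks."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        props[name.strip()] = value.strip().strip('"').strip("'")
    return props


def data_processing_locations(cli_text: str) -> Dict[str, str]:
    """Resolve each Data Processing location, falling back to the defaults."""
    props = parse_assignments(cli_text)
    return {name: props.get(name, default) for name, default in DP_DEFAULTS.items()}


def hdfs_config_paths(lib_path: str) -> List[str]:
    """HDFS paths of the configuration files (HDFS always uses '/')."""
    return [posixpath.join(lib_path, name) for name in DP_CONFIG_FILES]


def local_edp_logs(edp_jar_dir: str) -> List[str]:
    """Sorted edpLog*.log files in edp_jar_dir on the current node."""
    return sorted(glob.glob(os.path.join(edp_jar_dir, "edpLog*.log")))
```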