backup

The backup command creates a backup of the cluster's data and metadata in a single TAR file, which can later be used to restore the cluster.

Note: backup can't be run if start, stop, restart, restore, publish-config, or reshape-nodes is currently running.
To back up the cluster, run the following from the Admin Server:
./bdd-admin.sh backup [option <arg>] <file>
Where <file> is the absolute path to the backup TAR file. This file must not already exist, and its parent directory must be writable.

backup supports the following options:

  • -o, --offline: Performs a cold backup. Use this option if your cluster is down. If this option isn't specified, the script performs a hot backup. For more information on hot and cold backups, see Hot vs. cold backups.
  • -r, --repeat <num>: The number of times to repeat the backup process if verification fails. This is only used for hot backups. If this option isn't specified, the script makes one attempt to back up the cluster. If it fails, the script must be rerun. For more information, see Verification.
  • -l, --local-tmp <path>: The absolute path to the temporary directory on the Admin Server used during the backup operation. If this option isn't specified, the location defined by BACKUP_LOCAL_TEMP_FOLDER_PATH in $BDD_HOME/BDD_manager/conf/bdd.conf is used.
  • -d, --hdfs-tmp <path>: The absolute path to the temporary directory in HDFS used during the backup operation. If this option isn't specified, the location defined by BACKUP_HDFS_TEMP_FOLDER_PATH in $BDD_HOME/BDD_manager/conf/bdd.conf is used.
  • -v, --verbose: Enables debugging messages.

If no options are specified, the script makes one attempt to perform a hot backup and doesn't output debugging messages.
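
For example, the following command performs a hot backup that uses a custom temporary directory on the Admin Server instead of the default from bdd.conf (the paths shown are placeholders):
./bdd-admin.sh backup -l /localdisk/bdd_tmp /tmp/bdd_backup.tar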

For detailed instructions on backing up the cluster, see Backing up BDD.

Prerequisites

Before running backup, verify the following:
  • You can provide the script with the usernames and passwords for all component databases. You can either enter this information at runtime or set the following environment variables beforehand (see the example after this list). Note that if you have HDP, you must also provide the username and password for Ambari.
    • BDD_STUDIO_JDBC_USERNAME: The username for the Studio database
    • BDD_STUDIO_JDBC_PASSWORD: The password for the Studio database
    • BDD_WORKFLOW_MANAGER_JDBC_USERNAME: The username for the Workflow Manager Service database
    • BDD_WORKFLOW_MANAGER_JDBC_PASSWORD: The password for the Workflow Manager Service database
    • BDD_HADOOP_UI_USERNAME: The username for Ambari (HDP only)
    • BDD_HADOOP_UI_PASSWORD: The password for Ambari (HDP only)
  • You have an Oracle or MySQL database. Hypersonic isn't supported.
  • The database client is installed on the Admin Server. For MySQL databases, this should be the MySQL client. For Oracle databases, this should be the Oracle Database Client installed with the Administrator installation type. The Instant Client isn't supported.
  • For Oracle databases, the ORACLE_HOME environment variable must be set to the directory one level above the bin directory that contains the sqlplus executable. For example, if the sqlplus executable is located in /u01/app/oracle/product/11.2.0/dbhome/bin, ORACLE_HOME should be set to /u01/app/oracle/product/11.2.0/dbhome.
  • The temporary directories used during the backup operation contain enough free space. For more information, see Space requirements below.
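
For example, you could export the credentials (and, for Oracle databases, ORACLE_HOME) in the shell that will run the script, so you don't have to enter them at runtime. The values below are placeholders; substitute your own:
export BDD_STUDIO_JDBC_USERNAME=studio_user
export BDD_STUDIO_JDBC_PASSWORD='studio_password'
export BDD_WORKFLOW_MANAGER_JDBC_USERNAME=wfm_user
export BDD_WORKFLOW_MANAGER_JDBC_PASSWORD='wfm_password'
export BDD_HADOOP_UI_USERNAME=ambari_admin       # HDP only
export BDD_HADOOP_UI_PASSWORD='ambari_password'  # HDP only
export ORACLE_HOME=/u01/app/oracle/product/11.2.0/dbhome  # Oracle databases only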

Backed-up data

The following data are included in the backup:
  • The Dgraph databases
  • The databases used by Studio and the Workflow Manager Service
  • The user sandbox data in the directory defined by SANDBOX_PATH in $BDD_HOME/BDD_manager/conf/bdd.conf
  • The HDFS sample data in $SANDBOX_PATH/edp/data/.swampData
  • $BDD_HOME/BDD_manager/conf/bdd.conf
  • The Hadoop server certificates (if TLS/SSL is enabled)
  • Studio configuration from portal-ext.properties and esconfig.properties
  • The DP CLI black- and white-lists (cli_blacklist.txt and cli_whitelist.txt)
  • The OPSS files cwallet.sso and system-jzn-data.xml

Note that transient data, such as state in Studio, isn't backed up. This information will be lost if the cluster is restored from the backup.

Space requirements

When the script runs, it verifies that the backup destination and the temporary directories it uses contain enough free space. These requirements only need to be met for the duration of the backup operation.
  • The destination of the backup TAR file must contain enough space to store the Dgraph databases, $HDFS_DP_USER_DIR, and the edpDataDir (defined in edp.properties) at the same time.
  • The local-tmp directory on the Admin Server also requires enough space to store all three items simultaneously.
  • The hdfs-tmp directory in HDFS must contain enough free space to accommodate the largest of these items, as it will only store them one at a time.

If these requirements aren't met, the script will fail.
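
As a rough pre-check, you can compare the combined size of the three items against the free space in each location. The sketch below assumes all three items are stored in HDFS and uses placeholder paths; substitute the actual Dgraph database location, $HDFS_DP_USER_DIR, and the edpDataDir from edp.properties, and use du/df instead for any item kept on a local or NFS filesystem:
# Placeholder paths -- adjust for your installation.
DGRAPH_DBS=/user/bdd/dgraph/databases
DP_USER_DIR=/user/bdd/dp
EDP_DATA_DIR=/user/bdd/edp/data
# The first column of each line is the item's size in bytes.
for dir in "$DGRAPH_DBS" "$DP_USER_DIR" "$EDP_DATA_DIR"; do
  hdfs dfs -du -s "$dir"
done
# Free space: the local-tmp directory and the TAR file's destination must
# each hold all three items at once; the hdfs-tmp directory only needs to
# hold the largest single item.
df -h /tmp/bdd_local_tmp /tmp
hdfs dfs -df -h /user/bdd/backup_tmp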

Hot vs. cold backups

backup can perform both hot and cold backups:
  • Hot backups are performed while the cluster is running. Specifically, they're performed on the first Managed Server (defined by MANAGED_SERVERS in $BDD_HOME/BDD_manager/conf/bdd.conf), and require that the components on that node are running. This is backup's default behavior.
  • Cold backups are performed while the cluster is down. You must include the -o option to perform a cold backup.

Verification

Because hot backups are performed while the cluster is running, it's possible for the backed-up copies of the Studio database, the Dgraph databases, and the sample files to become inconsistent with one another. For example, data could be added to a Dgraph database after that database was backed up, making its backup out of sync with the others.

To prevent this, backup verifies that the data in all three backups is consistent. If it isn't, the operation fails.

By default, backup only backs up and verifies the data once. However, it can be configured to repeat this process by including the -r <num> option, where <num> is the number of times to repeat the backup and verification steps. This increases the likelihood that the operation will succeed.

Note: It's unlikely that verification will fail the first time, so it's not necessary to repeat the process more than once or twice.

Examples

The following command performs a hot backup with debugging messages:
./bdd-admin.sh backup -v /tmp/bdd_backup1.tar
The following command performs a cold backup:
./bdd-admin.sh backup -o /tmp/bdd_backup2.tar
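The following command performs a hot backup and, if verification fails, repeats the backup and verification up to two more times:
./bdd-admin.sh backup -r 2 /tmp/bdd_backup3.tar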