The backup command backs up the cluster's data and metadata to a single TAR file that can later be used to restore the cluster.
Note: backup can't be run if start, stop, restart, restore, publish-config, or reshape-nodes is currently running.
To back up the cluster, run the following from the Admin Server:
./bdd-admin.sh backup [option <arg>] <file>
Where <file> is the absolute path to the backup TAR file. This file must not already exist, and its parent directory must be writable.
backup supports the following options.
- -o, --offline: Performs a cold backup. Use this option if your cluster is down. If this option isn't specified, the script performs a hot backup. For more information on hot and cold backups, see Hot vs. cold backups.
- -r, --repeat <num>: The number of times to repeat the backup process if verification fails. This is only used for hot backups. If this option isn't specified, the script makes one attempt to back up the cluster; if that attempt fails, the script must be rerun. For more information, see Verification.
- -l, --local-tmp <path>: The absolute path to the temporary directory on the Admin Server used during the backup operation. If this option isn't specified, the location defined by BACKUP_LOCAL_TEMP_FOLDER_PATH in $BDD_HOME/BDD_manager/conf/bdd.conf is used.
- -d, --hdfs-tmp <path>: The absolute path to the temporary directory in HDFS used during the backup operation. If this option isn't specified, the location defined by BACKUP_HDFS_TEMP_FOLDER_PATH in $BDD_HOME/BDD_manager/conf/bdd.conf is used.
- -v, --verbose: Enables debugging messages.
If no options are specified, the script makes one attempt to perform a
hot backup and doesn't output debugging messages.
For detailed instructions on backing up the cluster, see
Backing up BDD.
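The defaults for -l and -d come from the properties named above in $BDD_HOME/BDD_manager/conf/bdd.conf. For illustration only, the relevant fragment of that file might look like the following (the property names are from this document; the values are invented examples):

```shell
# Hypothetical fragment of $BDD_HOME/BDD_manager/conf/bdd.conf.
# Temporary directory on the Admin Server (default for -l, --local-tmp):
BACKUP_LOCAL_TEMP_FOLDER_PATH=/tmp/bdd_backup_local
# Temporary directory in HDFS (default for -d, --hdfs-tmp):
BACKUP_HDFS_TEMP_FOLDER_PATH=/user/bdd/backup_tmp
```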
Prerequisites
Before running
backup, verify the following:
- You can provide the script
with the usernames and passwords for all component databases. You can either
enter this information at runtime or set the following environment variables.
Note that if you have HDP, you must also provide the username and password for
Ambari.
- BDD_STUDIO_JDBC_USERNAME:
The username for the Studio database
- BDD_STUDIO_JDBC_PASSWORD:
The password for the Studio database
- BDD_WORKFLOW_MANAGER_JDBC_USERNAME:
The username for the Workflow Manager Service database
- BDD_WORKFLOW_MANAGER_JDBC_PASSWORD:
The password for the Workflow Manager Service database
- BDD_HADOOP_UI_USERNAME:
The username for Ambari (HDP only)
- BDD_HADOOP_UI_PASSWORD:
The password for Ambari (HDP only)
- You have an Oracle or
MySQL database. Hypersonic isn't supported.
- The database client is installed on the Admin Server. For MySQL databases, this is the MySQL client. For Oracle databases, this is the Oracle Database Client installed with a type of Administrator; the Instant Client isn't supported.
- For Oracle databases, the ORACLE_HOME environment variable must be set to the directory one level above the /bin directory that contains the sqlplus executable. For example, if sqlplus is located in /u01/app/oracle/product/11.2.0/dbhome/bin, ORACLE_HOME should be set to /u01/app/oracle/product/11.2.0/dbhome.
- The temporary directories
used during the backup operation contain enough free space. For more
information, see
Space requirements
below.
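The environment variables above can be exported before running the script. The sketch below is illustrative only: every credential value is a placeholder, and the ORACLE_HOME derivation simply strips everything from /bin/ onward off an example sqlplus path.

```shell
#!/bin/sh
# Illustrative prerequisite setup for bdd-admin.sh backup.
# All credential values are placeholders; substitute your own.
export BDD_STUDIO_JDBC_USERNAME="studio_db_user"
export BDD_STUDIO_JDBC_PASSWORD="studio_db_password"
export BDD_WORKFLOW_MANAGER_JDBC_USERNAME="wfm_db_user"
export BDD_WORKFLOW_MANAGER_JDBC_PASSWORD="wfm_db_password"
# HDP only: Ambari credentials.
export BDD_HADOOP_UI_USERNAME="ambari_user"
export BDD_HADOOP_UI_PASSWORD="ambari_password"

# ORACLE_HOME is the directory one level above the bin/ directory
# holding sqlplus, so strip the "/bin/..." suffix off its path.
sqlplus_path="/u01/app/oracle/product/11.2.0/dbhome/bin/sqlplus"
export ORACLE_HOME="${sqlplus_path%/bin/*}"
echo "$ORACLE_HOME"   # /u01/app/oracle/product/11.2.0/dbhome
```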
Backed-up data
The following data are included in the backup:
- The Dgraph databases
- The databases used by
Studio and the Workflow Manager Service
- The user sandbox data in
the directory defined by
SANDBOX_PATH in
$BDD_HOME/BDD_manager/conf/bdd.conf
- The HDFS sample data in
$SANDBOX_PATH/edp/data/.swampData
- $BDD_HOME/BDD_manager/conf/bdd.conf
- The Hadoop server
certificates (if TLS/SSL is enabled)
- Studio configuration from
portal-ext.properties and
esconfig.properties
- The DP CLI black- and
white-lists (cli_blacklist.txt and
cli_whitelist.txt)
- The OPSS files
cwallet.sso and
system-jzn-data.xml
Note that transient data, like state in Studio, is not backed up. This
information will be lost if the cluster is restored.
Space requirements
When the script runs, it verifies that the temporary directories it
uses contain enough free space. These requirements only need to be met for the
duration of the backup operation.
- The destination of the
backup TAR file must contain enough space to store the Dgraph databases,
$HDFS_DP_USER_DIR, and the
edpDataDir (defined in
edp.properties) at the same time.
- The
local-tmp directory on the Admin Server also
requires enough space to store all three items simultaneously.
- The
hdfs-tmp directory in HDFS must contain enough
free space to accommodate the largest of these items, as it will only store
them one at a time.
If these requirements aren't met, the script will fail.
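You can approximate these checks by hand before running the script. A minimal sketch, assuming POSIX df is available; the has_free_space helper and the paths passed to it are illustrative, not part of bdd-admin.sh:

```shell
#!/bin/sh
# Sketch: check that a directory's filesystem has at least the
# required free space, in kilobytes.
has_free_space() {
  dir="$1"
  required_kb="$2"
  # df -Pk prints available 1K blocks in column 4 of the second line.
  avail_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
  [ "$avail_kb" -ge "$required_kb" ]
}

# Example: is there at least ~1 GB free under /tmp?
if has_free_space /tmp 1048576; then
  echo "enough space"
else
  echo "not enough space"
fi
```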
Hot vs. cold backups
backup can perform both hot and cold backups:
- Hot backups are performed
while the cluster is running. Specifically, they're performed on the first
Managed Server (defined by
MANAGED_SERVERS in
$BDD_HOME/BDD_manager/conf/bdd.conf), and
require that the components on that node are running. This is
backup's default behavior.
- Cold backups are performed
while the cluster is down. You must include the
-o option to perform a cold backup.
Verification
Because hot backups are performed while the cluster is running, it's
possible for the data in the backups of the Studio and Dgraph databases and
sample files to become inconsistent. For example, something could be added to a
Dgraph database after the database was backed up, which would make the data in
those locations different.
To prevent this,
backup verifies that the data in all three backups is
consistent. If it isn't, the operation fails.
By default,
backup only backs up and verifies the data once.
However, it can be configured to repeat this process by including the
-r <num> option, where
<num> is the number of times to repeat the
backup and verification steps. This increases the likelihood that the operation
will succeed.
Note: It's unlikely that verification will fail the first time, so it's
not necessary to repeat the process more than once or twice.
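The effect of -r <num> can be pictured as a retry loop around the backup-and-verify step. A minimal sketch, where backup_and_verify is a hypothetical stand-in for the script's real backup-and-verification step:

```shell
#!/bin/sh
# Sketch of -r <num>: repeat the backup-and-verify step until one
# attempt passes or the allowed attempts are exhausted.
repeat_backup() {
  attempts="$1"
  step="$2"
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$step"; then
      echo "backup verified on attempt $i"
      return 0
    fi
    i=$((i + 1))
  done
  echo "backup failed after $attempts attempts" >&2
  return 1
}

# Demo: a stand-in step that fails once, then succeeds.
tries=0
backup_and_verify() {
  tries=$((tries + 1))
  [ "$tries" -ge 2 ]
}

repeat_backup 3 backup_and_verify   # prints "backup verified on attempt 2"
```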
Examples
The following command performs a hot backup with debugging messages:
./bdd-admin.sh backup -v /tmp/bdd_backup1.tar
The following command performs a cold backup:
./bdd-admin.sh backup -o /tmp/bdd_backup2.tar