Migrate from Big Data Cloud Compute Edition

Find out how to migrate from Oracle Big Data Cloud Compute Edition (BDCE or BDC) to Big Data Service

Migration is done in several steps. You can migrate your artifacts to OCI Big Data Service from BDC on Oracle Cloud Infrastructure Classic or from BDC on Oracle Cloud Infrastructure. At a high level, you do the following:

  • Export your existing cloud resources from BDC to Object Storage.

  • Import the exported cloud resources from Object Storage to Big Data Service.

Prerequisites

Before you begin, ensure the following:
  • You are a valid user with access to a compartment in Big Data Service
  • You have permissions to do the following:
    • Access the OCI console using your credentials
    • Create a bucket in Oracle Object Storage so that you can copy the HDFS data. For information about Oracle Object Storage, see Overview of Object Storage.
    • Inspect the OCI Object Store configuration

    For more information, see Getting Started with Big Data Service.

  • You have the following OCI parameter values with you:
    • Tenancy ID: The OCID of the tenancy. For example, ocid1.tenancy.oc1..aaaaaaaa5syd62crbj5xpfajpmopoqasxy7jwxk6ihopm5vk6bxkncyp56kc. For more information, see Where to Get the Tenancy's OCID and User's OCID.
    • User ID: The OCID of the user. For example, ocid1.user.oc1..aaaaaaaa3pnl7qz4c2x2mpq4v4g2mp3wktxoyahwysmjrapgzjoyd3edxltp. For more information, see Where to Get the Tenancy's OCID and User's OCID.
    • API signing key: Required for an application user. For example, 03:8c:ef:51:c8:fe:6b:22:0c:5d:3c:43:a8:ff:58:d9. For information about generating and uploading the API signing key, see the following topics:
    • Passphrase for the signing key: (Optional) Required if you have generated the key pair with a passphrase.
    • Fingerprint for the signing key: The fingerprint and passphrase of the signing key are created while generating and uploading the API signing key. For more information, see How to Get the Key's Fingerprint.
    • Bucket and tenancy name: For example, oci://myBucket@myTenancy/. For information about buckets, see Putting Data into Object Storage.
    • OCI Cloud Storage URL: The host name. For example, https://objectstorage.us-phoenix-1.oraclecloud.com.

    For more information, see Create a Cluster.
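
    Optionally, you can collect these values once as shell variables and reuse them when you build the hadoop and distcp commands shown later in this topic. The following is a minimal sketch; the variable names are arbitrary and every value is a placeholder that you replace with your own details.

    # Sketch only: placeholder values; replace with your own tenancy, user, key, and endpoint details.
    export OCI_TENANCY_OCID="ocid1.tenancy.oc1..<unique_id>"
    export OCI_USER_OCID="ocid1.user.oc1..<unique_id>"
    export OCI_KEY_FINGERPRINT="<fingerprint>"
    export OCI_PEM_FILE="/data/oci_api_key.pem"
    export OCI_OS_HOSTNAME="https://objectstorage.us-phoenix-1.oraclecloud.com"
    export OCI_BUCKET_URI="oci://myBucket@myTenancy/"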

Exporting Resources

The resources that you can export from Big Data Cloud Compute Edition (BDC), and what you do with them in OCI Big Data Service (BDS), are as follows:

  • Data in HDFS
    Exported artifacts: Copied into OCI Object Store at oci://<bucket>@<tenancy>/<exportedHdfsDir>. For example: oci://myStorageBucket@myTenancy/exportedHdfsDir
    In BDS: Copy the exported data from the OCI Object Store to the target BDS HDFS directories.

  • Data in OCI-Classic Object Store
    Note: This artifact doesn't apply to Oracle Big Data Cloud on Oracle Cloud Infrastructure.
    Exported artifacts: Copied into OCI Object Store at oci://<bucket>@<tenancy>/<exportedObjDir>. For example: oci://myStorageBucket@myTenancy/exportedObjDir

  • Hive Metadata
    Exported artifacts: Generate the Hive DDL statements on the BDC cluster.
    In BDS: Copy the Hive DDL statements from the BDC cluster into the BDS cluster, and execute them.

  • Zeppelin Notebooks
    Exported artifacts: Export the Zeppelin notebook definitions as a .tar.gz file from /user/zeppelin/notebook in HDFS, using a script provided by Oracle.
    In BDS: Importing Zeppelin notebooks is currently not supported in BDS.

  • HDFS, YARN, and Spark Configuration Files
    Exported artifacts: Export the configuration files as a .tar.gz file using a utility script provided by Oracle.
    In BDS: Because BDS has optimized configuration settings for HDFS, YARN, and Spark, you need not import the configuration files and versions from BDC.

  • Versions of various Open Source components
    Exported artifacts: Export the service version details using the Ambari REST API. You can also get version details from Ambari (Admin -> Stack and Versions).
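
For reference, a sketch of the Ambari REST API call for exporting version details might look like the following; the Ambari host, admin credentials, and cluster name are placeholders, and the exact response fields depend on your Ambari version.

  # Sketch only: placeholder host, credentials, and cluster name.
  curl -u bdcsce_admin:<password> -H "X-Requested-By: ambari" \
    "https://<ambari-host>:8080/api/v1/clusters/<cluster-name>/stack_versions"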

Migrating Resources Using WANdisco LiveData Migrator

Ensure that port 8020 is open at the destination.
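
For example, a quick reachability check from a source node might look like the following sketch; the destination host name is a placeholder, and it assumes the nc (netcat) utility is available.

  # Sketch only: check whether the destination NameNode port is reachable.
  nc -zv <bds-master-node-host> 8020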

For more information, see the WANdisco LiveData Migrator documentation.

To migrate resources using WANdisco LiveData Migrator, follow these steps:

  1. Install LiveData Migrator on an edge node of the source cluster by running the following commands:
    wget https://wandisco.com/downloads/livedata-migrator.sh
     
    chmod +x livedata-migrator.sh && ./livedata-migrator.sh
     
    service livedata-migrator status
    service hivemigrator status
    service livedata-ui status
  2. After the installation and setup of LiveData Migrator are complete, access the UI and create your user account. The URL of the UI is as follows:
    http://<LDM-Installation-Host.com>:8081
  3. Do the following to migrate data:
    1. Configure source filesystem.
      To add a source filesystem, on your LiveData Migrator dashboard, do the following:
      1. From the Products panel, select the relevant instance.
      2. In the Filesystem Configuration page, click Add source filesystem.
    2. Configure target filesystem.
      To add a target filesystem, on your LiveData Migrator dashboard, do the following:
      1. From the Products panel, select the relevant instance.
      2. In the Filesystem Configuration page, click Add target filesystem.
      3. Select Apache Hadoop as the target for the BDS cluster and provide the default filesystem path. Make sure that the source and target connect to the destination on port 8020.
    3. Create a path mapping.
      Path mapping enables migrated data to be stored at an equivalent default location on the target. To create path mappings using the UI, follow these steps:
      1. From the Products list on the dashboard, select the LiveData Migrator instance for which you want to create a path mapping.
      2. From the Migrations menu, select Path Mappings.
      3. At the top right of the interface, click the Add New Path button.
    4. Create a migration.
      Migrations transfer existing data from the defined source to a target. To create a new migration from the UI, follow these steps:
      1. Provide a name for the migration.
      2. From your filesystems, select a source and target.
      3. Select the path on your source filesystem that you want to migrate. Use the folder browser to select the path name, and select the grey folder next to a path name to view its subdirectories.
  4. Migrate the metadata.
    To migrate the metadata, follow these steps:
    1. Export Hive metadata from the source BDC cluster. For more information, see Exporting Hive Metadata.
    2. Import the metadata to the destination BDS ODH 1.0 cluster. For more information, see Importing Metadata.

Migrating Resources Using the Distcp Tool

You can also migrate data and metadata from Big Data Cloud Compute Edition to Big Data Service using the distcp tool. Distcp is an open-source tool for copying large data sets between distributed file systems, within and across clusters.

Preparing the BDC Cluster

Find out how to prepare the BDC cluster for export.

  1. Set the HDFS service in maintenance mode to block alerts and bulk operations. For more information, see Set Maintenance Mode for a Service.
  2. Connect to the BDC cluster through SSH as an opc user. For more information, see Connect to a Cluster Node Through Secure Shell (SSH).
  3. Run the following commands to set the HDFS service in safe mode:
    $ sudo su hdfs -c 'hdfs dfsadmin -safemode get'
    Safe mode is OFF
     
    $ sudo su hdfs -c 'hdfs dfsadmin -safemode enter'
    Safe mode is ON
  4. Do the following:
    1. As a root user, copy your OCI private key (for example: ~/.oci/oci_api_key.pem) to the /home/opc/.oci directory on the BDC cluster.
      sudo su -
      cp <private-key> /home/opc/.oci/oci_api_key.pem
    2. Change the private key file permissions to read-only.
      chmod 444 /home/opc/.oci/oci_api_key.pem
    3. Change the ownership of the private PEM key file to hdfs user.
      chown -R hdfs:hdfs /home/opc/.oci/oci_api_key.pem
    4. Create a directory named /data that the hdfs user can access.
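      For example, a minimal sketch (run as the root user; adjust the permissions to your environment):
      mkdir /data
      chmod 755 /data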
    5. Copy the OCI private key from the /home/opc/.oci directory to the /data directory.
      cp /home/opc/.oci/oci_api_key.pem /data
    6. As a root user, change the private key file in the /data directory to read-only.
      chmod 444 /data/oci_api_key.pem
    If the BDC cluster has multiple nodes, ensure that the OCI private key is available on all the nodes at the same path. You can use the otools user to copy files to all the nodes. However, ensure that the file is readable by otools. After you copy the files to all nodes, set the original permissions back.

    To install the OCI private key on all the hosts, as the otools user, execute the following commands in the given sequence for each host (a loop sketch for multiple hosts follows the commands):

    sudo su - otools
    ssh <hostname> sudo mkdir /data
    scp oci_api_key.pem <hostname>:/data
    ssh <hostname> sudo chown hdfs:hdfs /data/oci_api_key.pem
    ssh <hostname> sudo chmod 444 /data/oci_api_key.pem
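
    If you have many nodes, a loop over the host names can save repetition. The following is a sketch only; hosts.txt is an assumed file containing one host name per line, and it assumes oci_api_key.pem is in the current directory.

    # Sketch only: repeat the copy and permission commands for every host listed in hosts.txt (assumed file).
    sudo su - otools
    while read -r host; do
      ssh "$host" sudo mkdir -p /data
      scp oci_api_key.pem "$host:/data"
      ssh "$host" sudo chown hdfs:hdfs /data/oci_api_key.pem
      ssh "$host" sudo chmod 444 /data/oci_api_key.pem
    done < hosts.txt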
Exporting Data from HDFS

To export data from HDFS, follow these steps:

  1. Set up the distcp tool. The distcp tool requires an updated oci-hdfs driver. Therefore, you must download the driver to the host where the distcp tool needs to be run. To download the driver, use the following commands:
    1. Run the following commands to create a directory named hdfsjars under /data and to navigate to that directory.
      mkdir -p /data/hdfsjars
      cd /data/hdfsjars
    2. Download oci-hdfs.zip.
      wget https://github.com/oracle/oci-hdfs-connector/releases/download/v2.7.7.0/oci-hdfs.zip
    3. Extract the oci-hdfs.zip file.
      1. Unzip the file under the /data/hdfsjars/lib directory.
        unzip oci-hdfs.zip lib/oci-hdfs-full-2.7.7.0.jar
      2. Move lib/oci-hdfs-full-2.7.7.0.jar to the current working directory, that is, /data/hdfsjars.
        mv lib/oci-hdfs-full-2.7.7.0.jar .
      3. Remove the lib directory and the oci-hdfs.zip file.
        rm -rf lib/ oci-hdfs.zip
  2. Set the environment variables to point to the oci-hdfs-full-2.7.7.0.jar.
    export LIBJARS=/data/hdfsjars/oci-hdfs-full-2.7.7.0.jar
  3. (Optional) Configure the proxy by setting the following variable, which also sets sun.net.inetaddr.ttl:
    export DISTCP_PROXY_OPTS="-Dhttps.proxyHost=<proxy-host> -Dhttps.proxyPort=<proxy-port> -Dsun.net.inetaddr.ttl=60"
    
  4. Test the configuration by listing the contents in the OCI storage container.
    hadoop fs -Dfs.oci.client.auth.fingerprint=<fingerprint> \
    -Dfs.oci.client.auth.pemfilepath=<oci_pem_key> \
    -Dfs.oci.client.auth.passphrase=<passphrase> \
    -Dfs.oci.client.auth.tenantId=<OCID for Tenancy> \
    -Dfs.oci.client.auth.userId=<OCID for User> \
    -Dfs.oci.client.hostname=<HostName. Example: https://objectstorage.us-phoenix-1.oraclecloud.com/> \
    -Dfs.oci.client.proxy.uri=http://<proxyhost>:<proxyport> \
    -ls oci://<bucket>@<namespace>/
  5. To copy all the HDFS data under a specific directory in the OCI Object Store bucket, create a directory under the bucket.
    hadoop fs -Dfs.oci.client.auth.fingerprint=<fingerprint> \
          -Dfs.oci.client.auth.pemfilepath=<oci_pem_key> \
          -Dfs.oci.client.auth.passphrase=<passphrase> \
          -Dfs.oci.client.auth.tenantId=<OCID for Tenancy> \
          -Dfs.oci.client.auth.userId=<OCID for User> \
          -Dfs.oci.client.hostname=<HostName. Example: https://objectstorage.us-phoenix-1.oraclecloud.com/> \
          -Dfs.oci.client.proxy.uri=http://<proxyhost>:<proxyport>  \
          -mkdir oci://<bucket>@<namespace>/<Dir>
  6. Run the following commands to verify that the directory is created in Object Store.
    hadoop fs -Dfs.oci.client.auth.fingerprint=<fingerprint> \
          -Dfs.oci.client.auth.pemfilepath=<oci_pem_key> \
          -Dfs.oci.client.auth.passphrase=<passphrase> \
          -Dfs.oci.client.auth.tenantId=<OCID for Tenancy> \
          -Dfs.oci.client.auth.userId=<OCID for User> \
          -Dfs.oci.client.hostname=<HostName> \
          -ls oci://<bucket>@<namespace>/
  7. Do one of the following to export HDFS data:
    • Use hadoop fs -cp to export HDFS data from a single node cluster to Object Store as an HDFS user.
      sudo su - hdfs
      hadoop fs -Dfs.oci.client.auth.fingerprint=<fingerprint> \
            -Dfs.oci.client.auth.pemfilepath=<oci_pem_key> \
            -Dfs.oci.client.auth.passphrase=<passphrase> \
            -Dfs.oci.client.auth.tenantId=<OCID for Tenancy> \
            -Dfs.oci.client.auth.userId=<OCID for User> \
            -Dfs.oci.client.hostname=<HostName> \
            -Dfs.oci.client.proxy.uri=http://<proxyhost>:<proxyport>  \
            -cp <hdfs-dir-to-copy> oci://<bucket>@<namespace>/<dir>/
    • Use hadoop distcp to export HDFS data from multi-node clusters to Object Store. First validate it against a specific directory. For example, use /user/oozie/ for <hdfs-dir-to-copy> in the following command, and then run it for the entire HDFS (/).
      hadoop distcp -libjars ${LIBJARS} \
      -Dfs.client.socket-timeout=3000000 \
      -Dfs.oci.client.auth.fingerprint=<fingerprint> \
      -Dfs.oci.client.auth.pemfilepath=<oci_pem_key> \
      -Dfs.oci.client.auth.passphrase=<passphrase> \
      -Dfs.oci.client.auth.tenantId=<OCID for Tenancy> \
      -Dfs.oci.client.auth.userId=<OCID for User> \
      -Dfs.oci.client.hostname=<HostName. Example: https://objectstorage.us-phoenix-1.oraclecloud.com/> \
      -Dfs.oci.client.multipart.allowed=true \
      -Dfs.oci.client.proxy.uri=http://<proxyhost>:<proxyport> \
      -Dmapreduce.map.java.opts="$DISTCP_PROXY_OPTS" \
      -Dmapreduce.reduce.java.opts="$DISTCP_PROXY_OPTS" \
      -Dmapreduce.task.timeout=6000000 \
      -skipcrccheck -m 40 -bandwidth 500 \
      -update -strategy dynamic -i <hdfs-dir-to-copy> oci://<bucket>@<namespace>/<exportedHdfsDir>/
  8. Verify the contents of Object Store.
    hadoop fs -Dfs.oci.client.auth.fingerprint=<fingerprint> \
         -Dfs.oci.client.auth.pemfilepath=<oci_pem_key> \
         -Dfs.oci.client.auth.passphrase=<passphrase> \
         -Dfs.oci.client.auth.tenantId=<OCID for Tenancy> \
         -Dfs.oci.client.auth.userId=<OCID for User> \
         -Dfs.oci.client.hostname=<HostName> \
         -ls oci://<bucket>@<namespace>/<dir>
  9. Create metadata files and export them to Object Store.
    1. Download the following zip file:

      files/create_metadata.zip

    2. Unzip the file and place create_metadata.sh and create_permissions.py in one directory.
    3. Run the following shell script. This script creates multiple metadata files and places them under the /metadata directory in HDFS.
      bash create_metadata.sh <hdfs_directory>
    4. Export the metadata files to Object Store.
      hadoop distcp -Dfs.client.socket-timeout=3000000 \
          -Dfs.oci.client.auth.fingerprint=67:3a:86:d2:8b:61:b0:fd:c4:21:58:92:5a:d2:e8:83 \
          -Dfs.oci.client.auth.pemfilepath=<oci_pem_key> \
          -Dfs.oci.client.auth.tenantId=ocid1.tenancy.oc1..aaaaaaaajh6gxgbuudok7odjpwe5wyv2iq74ihhac5lyulxdgrhsssxc4hna \
          -Dfs.oci.client.auth.userId=ocid1.user.oc1..aaaaaaaaeslrldhdvniuue3dsi7p3xblpfpsluxtx2g6txg5iiukfhzg77ua \
          -Dfs.oci.client.hostname=https://objectstorage.us-ashburn-1.oraclecloud.com/ \
          -Dfs.oci.client.multipart.allowed=true -Dmapreduce.task.timeout=6000000  \
          -skipcrccheck -m 20 -bandwidth 100 -update -strategy dynamic -i /metadata/Metadata* oci://<bucket>@<namespace>/metadata
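      Optionally, confirm that the metadata files are in Object Store with a listing similar to the earlier verification steps; all values are placeholders.
      hadoop fs -Dfs.oci.client.auth.fingerprint=<fingerprint> \
            -Dfs.oci.client.auth.pemfilepath=<oci_pem_key> \
            -Dfs.oci.client.auth.tenantId=<OCID for Tenancy> \
            -Dfs.oci.client.auth.userId=<OCID for User> \
            -Dfs.oci.client.hostname=<HostName> \
            -ls oci://<bucket>@<namespace>/metadata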
      
Exporting Hive Metadata

To export Hive metadata, follow these steps:

  1. Get the target Big Data Service HDFS path to update hive_migrate.properties.
    1. To retrieve the TARGET_HDFS_PATH, log on to the target BDS cluster through the bastion host, and run the following command:
      cat /etc/hadoop/conf/core-site.xml | grep -A1 "fs.defaultFS"
    2. Note down the TARGET_HDFS_PATH.

  2. Update the TARGET_HDFS_PATH and TARGET_OBJECTSTORE_PATH in the hive_migrate.properties script to the location where the Hive table data is available after the migration. You need not escape the forward slashes in the path.
    #!/bin/bash
    #
    # Copyright (c) 2022, Oracle and/or its affiliates. All rights reserved.
    # Properties file to override
    #
    # This file will be sourced by the generate_target_ddl_from_source.sh
    # to customize/override properties
    # SRC_HDFS_PATH and SRC_OBJECTSTORE_PATH will be derived automatically from the cluster.
    # You will be customizing
    #     RUNDIR , TARGET_HDFS_PATH, TARGET_OBJECTSTORE_PATH
    #
    # ------------------------------------------------------------
    # Location to hold intermediate and final scripts to be generated.
    # You must have write privileges on this directory
    export RUNDIR=/tmp/hivemigrate
     
    # ------------------------------------------------------------
    # Specify current hive.metastore.warehouse.dir from Hive Config.
    #
    export SRC_WAREHOUSE_DIR=/apps/hive/warehouse
     
     
    # ------------------------------------------------------------
    # Modify expected target BDS hdfs path in your hive DDL script
    #
    export TARGET_HDFS_PATH=hdfs://testbdmn0.bmbdcsad1.bmbdcs.oraclevcn.com:8020/warehouse/tablespace/external/hive
     
     
    # ------------------------------------------------------------
    # Modify expected target BDS Object Store path in your hive DDL script
    #
    export TARGET_OBJECTSTORE_PATH=oci://bdc2odh@bdsdevcluster/warehouse
  3. Connect to the BDC cluster through ssh as an opc user. For more information, see Connect to a Cluster Node Through Secure Shell (SSH).
  4. Create a script named generate_target_ddl_from_source.sh in the ~opc home directory with the following code. This script generates the DDL statements that you can run on the target cluster to create the hive metadata.
    #!/bin/bash
    #
    # Copyright (c) 2022, Oracle and/or its affiliates. All rights reserved.
    #
    # Script to generate queries that can be run on a target cluster for hive metadata migration.
    #
    # Usage :
    #   sudo generate_target_ddl_from_source.sh
    # The generated scripts are available under /tmp/hivemigrate/generatedSQL_target_Hive
    #  - 1_CREATE_DB.hql
    #  - 2_CREATE_TABLE.hql
    #  - 3_ADD_PARTITION.hql
    #
    # These scripts can be run on a target cluster AS IS to create hive metadata that matches the source..
    #
     
     
    #------------------------------------------------------------
    # Set Environment
    detect_basedir()
    {
      DIRNAME=`dirname $0`
      BASENAME=`basename $0`
      BASEDIR=$(cd "$DIRNAME" ; pwd)
    }
    detect_basedir
     
     
    #------------------------------------------------------------
    # Initialize any default config properties that have been overridden in
    #     ${BASEDIR}/hive_migrate.properties
    # DO NOT MODIFY. PLEASE OVERRIDE ANY PARAMETER IN ${BASEDIR}/hive_migrate.properties
    export RUNDIR=/tmp/hivemigrate
    export TARGET_HDFS_PATH=hdfs://Some_cluster_name/tmp
    export TARGET_OBJECTSTORE_PATH=oci://mybucket@mytenancy/tmp/
    #
    #------------------------------------------------------------
    if [ -f ${BASEDIR}/hive_migrate.properties ]; then
        echo -e "*** Sourcing Overide proprties file ${BASEDIR}/hive_migrate.properties"
        source ${BASEDIR}/hive_migrate.properties
    else
        echo -e "*** Warning : Missing Overide proprties file ${BASEDIR}/hive_migrate.properties"
        echo -e "*** Continue with default settings\n"
    fi
    #
    usage()
    {
      echo -e "Usage : ${BASENAME} \n"
      echo -e "**  This script is to be run on your source cluster and can be run as many times as you wish."
      echo -e "**  It cleans up any older generated scripts located in ${RUNDIR} before starting \n"
      echo -e "**  The generated output hive query scripts will be available in RUNDIR: ${RUNDIR} ...\n"
      echo -e "*** Modify RUNDIR if you would like to override\n"
      echo -e "**  Currently TARGET_HDFS_PATH is set to hdfs://Some_cluster_name/"
      echo -e "*** Modify TARGET_HDFS_PATH to match your hdfs path in Hive DDL \n"
    }
     
     
    # Make sure only root can run our script
    if [ "$(id -u)" != "0" ]; then
       echo -e "ERROR: *** This script ${BASENAME} must be run as root" 1>&2
       usage
       exit 1
    fi
     
    if [ "$1" != "" ]; then
       usage
       exit 1
    fi
     
     
    #------------------------------------------------------------
    #
    # Figure out the HDFS Path in source cluster.
    # Depending on HA or non-HA cluster the hdfs path syntax is different
    if ! grep -q "dfs.ha.namenodes.mycluster" /etc/bdcsce/conf/blueprint/*.json
    then
        echo -e "INFO:**  Source: Non-HA cluster..."
        nn_fqdn=$(hostname -f)
        SRC_HDFS_PATH=hdfs://${nn_fqdn}:8020${SRC_WAREHOUSE_DIR}
    else
        echo -e 'INFO:**  Source: HA cluster...'
        SRC_HDFS_PATH=hdfs://mycluster${SRC_WAREHOUSE_DIR}
    fi
    echo -e "INFO:**  Assumed source hdfspath : $SRC_HDFS_PATH"
     
     
    #------------------------------------------------------------
    #
    # Figure out the source Object Store path
    source /etc/bdcsce/conf/datasources.properties
    export SRC_OBJECTSTORE_PATH=${bdcsce_default_objectstore_url}
     
     
     
     
    #------------------------------------------------------------
    # Step-1 : Setup/Clean source and target directories to hold intermediate and final scripts
     
     
    #Path to store intermediate hive queries
    outputSourcePath=${RUNDIR}/generatedSQL_source_Hive
    rm -rf $outputSourcePath
    mkdir -m 777 -p $outputSourcePath
     
     
    #Path to store final target hive queries to be used at target system
    outputTargetPath=${RUNDIR}/generatedSQL_target_Hive
    rm -rf $outputTargetPath
    mkdir -m 777 -p $outputTargetPath
     
     
    #------------------------------------------------------------
    # Step-2 : Extract Create DB statements from MetaStore to Run in Target Hive Installation.
    #
    mysql -Dhive --skip-column-names -e"
     
     
    -- Query for creating databases
    SELECT
        CONCAT('CREATE DATABASE IF NOT EXISTS ',
                D.NAME,
                '\;') AS CREATE_DB_STATEMENTS
    FROM
        DBS D;" > $outputTargetPath/1_CREATE_DB.hql
     
     
    ret=$?
    if [ $ret -ne 0 ];
    then
        echo -e "Error:** - MySQL Error code is $ret while trying to extract DB creation scripts"
        exit $ret
    else
        echo -e "**  DB Creation extraction script $outputTargetPath/1_CREATE_DB.hql completed successfully"
    fi
     
     
    #------------------------------------------------------------
    # Extract Show Create Table statements from MetaStore to Run in Source Hive Installation.
     
     
    mysql -Dhive --skip-column-names -e"
     
     
    -- Query for Creating Show CT Statements to be run in the Source Hive Installation
    SELECT
        CONCAT('USE ', D.name, '\; ','SHOW CREATE TABLE ',
                T.tbl_name,
                '\;') AS SHOW_CT_STATEMENTS
    FROM
        TBLS T
            INNER JOIN
        DBS D ON D.DB_ID = T.DB_ID
    WHERE
        T.TBL_TYPE != 'INDEX_TABLE';" > $outputSourcePath/1_SHOW_CREATE_TABLE.hql
     
     
     
     
    ret=$?
    if [ $ret -ne 0 ];
    then
        echo -e "Error:** - MySQL Error code is $ret while trying to extract SHOW CT scripts"
        exit $ret
    else
        echo -e "\n\n"
        echo -e "**  SHOW CREATE TABLE extraction script $outputSourcePath/1_SHOW_CREATE_TABLE.hql completed successfully"
    fi
     
     
     
     
    #------------------------------------------------------------
    # Run Show Create Table Command on Source Hive Installation to create target Hive CT scripts
    echo -e "**  Running show create table in source cluster"
    echo -e "**  This may take a while...Depends on hive metadata size\n\n"
     
     
     
     
    sudo -u hive hive -f $outputSourcePath/1_SHOW_CREATE_TABLE.hql > $outputTargetPath/2_CREATE_TABLE_temp.hql
    ret=$?
    if [ $ret -ne 0 ];
    then
        echo -e "Error - Hive Error code is $ret while trying to execute SHOW CT scripts"
        exit $ret
    else
        echo -e "\n\n"
        echo -e "**  SHOW CT execution script $outputTargetPath/2_CREATE_TABLE_temp.hql completed successfully"
    fi
     
     
    # Add semicolon suffix ; after CT statements
    echo -e "USE default;" > $outputTargetPath/2_CREATE_TABLE.hql
    cat $outputTargetPath/2_CREATE_TABLE_temp.hql | tr '\n' '\f' | sed -e 's/)\fCREATE/);\fCREATE/g'  | tr '\f' '\n' >> $outputTargetPath/2_CREATE_TABLE.hql
    echo -e ";">>$outputTargetPath/2_CREATE_TABLE.hql
    rm -rf $outputTargetPath/2_CREATE_TABLE_temp.hql
    echo -e "**  Create Table execution script $outputTargetPath/2_CREATE_TABLE.hql created successfully"
     
    #------------------------------------------------------------
    # Convert Managed Table to External table for HIVE 3
    #
    echo -e "INFO:** Convering Managed Table to External table for HIVE 3 in CT $outputTargetPath/2_CREATE_TABLE.hql scripts"
    sed -i "s/^CREATE TABLE/CREATE EXTERNAL TABLE/g" $outputTargetPath/2_CREATE_TABLE.hql
     
    #------------------------------------------------------------
    # Replace HDFS Path
    #
    echo -e "INFO:** Replacing $SRC_HDFS_PATH with $TARGET_HDFS_PATH in CT $outputTargetPath/2_CREATE_TABLE.hql scripts"
    sed -i "s,$SRC_HDFS_PATH,$TARGET_HDFS_PATH,g" $outputTargetPath/2_CREATE_TABLE.hql
     
    #------------------------------------------------------------
    # Replace Target Object Store Path
    #
    echo -e "INFO:** Replacing $SRC_OBJECTSTORE_PATH with $TARGET_OBJECTSTORE_PATH in CT $outputTargetPath/2_CREATE_TABLE.hql scripts"
    sed -i "s,$SRC_OBJECTSTORE_PATH,$TARGET_OBJECTSTORE_PATH,g" $outputTargetPath/2_CREATE_TABLE.hql
     
     
    echo -e "INFO:** Replaced $SRC_OBJECTSTORE_PATH with $TARGET_OBJECTSTORE_PATH in CT $outputTargetPath/2_CREATE_TABLE.hql scripts"
     
     
     
    #------------------------------------------------------------
    # Extract Add Partition statements if any from MetaStore to Run in Target Hive Installation.
     
     
    mysql -Dhive --skip-column-names -e"
    -- Query to create add partition scripts to run in target Hive Installation
     
     
     
     
    SET @oldNameNode = '$SRC_HDFS_PATH'; -- Old NameNode address
    SET @newNameNode = '$TARGET_HDFS_PATH'; -- New NameNode address
     
     
     
     
    SELECT
        REPLACE(add_partition_query,
            @oldNameNode,
            @newNameNode) AS add_partition_query
     
     
     
     
    FROM
            (SELECT
                    CONCAT('USE ', D.name, '\; ', ' ALTER TABLE ', T.TBL_NAME, ' ADD PARTITION (', GROUP_CONCAT(PK.PKEY_NAME, '=', '\'', PKV.PART_KEY_VAL, '\''
                        ORDER BY PK.INTEGER_IDX), ') LOCATION \'', S.location, '\'\; ') AS add_partition_query
        FROM
            TBLS T
        INNER JOIN DBS D ON T.DB_ID = D.DB_ID
        INNER JOIN PARTITION_KEYS PK ON T.TBL_ID = PK.TBL_ID
        INNER JOIN PARTITIONS P ON P.TBL_ID = T.TBL_ID
        INNER JOIN PARTITION_KEY_VALS PKV ON P.PART_ID = PKV.PART_ID
            AND PK.INTEGER_IDX = PKV.INTEGER_IDX
        INNER JOIN SDS S ON P.SD_ID = S.SD_ID
        GROUP BY P.PART_ID
        ORDER BY D.name) alias1;" > $outputTargetPath/3_ADD_PARTITION.hql
    ret=$?
    if [ $ret -ne 0 ];
    then
        echo -e "Error - MySQL Error code is $ret while trying to extract ADD PARTITION scripts"
        exit $ret
    else
        echo -e "\n\n"
        echo -e "**  ADD PARTITION extraction script $outputTargetPath/3_ADD_PARTITION.hql completed successfully"
    fi
     
     
     
     
    echo -e "\n\n"
    echo -e "INFO: ** Here are your Target installation scripts"
    ls -al $outputTargetPath/*
    echo -e "\n\n"
    echo -e "INFO: ** Run these hql scripts on your Target hive shell..."
     
     
    exit 0
  5. As root user, run the generate_target_ddl_from_source.sh script.
    sudo ./generate_target_ddl_from_source.sh
    The following scripts are generated under /tmp/hivemigrate/generatedSQL_target_Hive:
    • 1_CREATE_DB.hql
    • 2_CREATE_TABLE.hql
    • 3_ADD_PARTITION.hql

    During import, you copy these scripts from /tmp/hivemigrate/generatedSQL_target_Hive to a master node in the target BDS cluster with root and database access.
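
     For example, the copy step might look like the following sketch; the key file, user, and target host name are placeholders and depend on how you reach the target cluster (for example, through a bastion host).

     # Sketch only: placeholder key, user, and target master-node host.
     scp -i <private_key_file> /tmp/hivemigrate/generatedSQL_target_Hive/*.hql opc@<target-master-node>:/tmp/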

Exporting Zeppelin Notebooks, Service Configuration, and Versions

You can export service configurations from the source cluster and use them as a reference for the destination cluster for any custom configuration changes that are used in the source cluster.

To export Zeppelin notebooks, service configurations, and versions, follow these steps:

  1. Stop the Hive, Zeppelin, and Spark services.
    1. Access the Ambari console using the following Ambari IP:
      https://<hostname>:8080

      For more information, see Access Big Data Cloud Using Ambari.

    2. From Services, go to Hive, click Service Actions, and then click Stop.
    3. From Services, go to Zeppelin, click Service Actions, and then click Stop.
    4. From Services, go to Spark, click Service Actions, and then click Stop.
  2. Prepare to run the export script.
    1. Connect to the BDC cluster through ssh as an opc user. For more information, see Connect to a Cluster Node Through Secure Shell (SSH).
    2. Create a file named exportBDC.py using the code provided on GitHub.
    3. As the root user, run chmod 755 exportBDC.py so that the file has execute privileges.
    4. Create a file named config.json in the same directory, and specify the appropriate parameter values for the service instance being exported.
      {
        "export_dir" : "/data/exportbdc",
        "ambari_username" : "bdcsce_admin",
        "ambari_password" : "<example-password>"
      }
  3. Run the export script.
    1. As a root user, run the exportBDC.py script. To get the debug logs, run the script with the -v parameter (for example, exportBDC.py config.json -v).
      
      sudo su
      ./exportBDC.py config.json
    2. Review the contents of the exported .tar.gz file, which includes the following (an extraction sketch follows this list):
      • Configuration
        config/NGINX_MASTER/nginx-ambari-config.xml
        config/HCAT/hcat-env.sh
        config/FABRIC_SERVER/fsm-config.xml
        config/PIG/log4j.properties
        config/PIG/pig.properties
        config/PIG/pig-env.sh
        config/SPARK2_CLIENT/spark-env.sh
        config/SPARK2_CLIENT/spark-log4j.properties
        config/SPARK2_CLIENT/spark-metrics.properties
        config/SPARK2_CLIENT/spark-defaults.conf
        config/HDFS_CLIENT/hdfs-site.xml
        config/HDFS_CLIENT/log4j.properties
        config/HDFS_CLIENT/core-site.xml
        config/HDFS_CLIENT/hadoop-env.sh
        config/YARN_CLIENT/log4j.properties
        config/YARN_CLIENT/capacity-scheduler.xml
        config/YARN_CLIENT/core-site.xml
        config/YARN_CLIENT/yarn-env.sh
        config/YARN_CLIENT/yarn-site.xml
        config/SPOCUI_MASTER/spocui-ambari-config.xml
        config/ZOOKEEPER_CLIENT/log4j.properties
        config/ZOOKEEPER_CLIENT/zookeeper-env.sh
        config/MAPREDUCE2_CLIENT/mapred-site.xml
        config/MAPREDUCE2_CLIENT/mapred-env.sh
        config/MAPREDUCE2_CLIENT/core-site.xml
        config/ALLUXIO_WORKER/alluxio-log4j.xml
        config/ALLUXIO_WORKER/alluxio-env.xml
        config/ALLUXIO_WORKER/alluxio-site.xml
        config/OOZIE_CLIENT/oozie-site.xml
        config/OOZIE_CLIENT/oozie-env.sh
        config/OOZIE_CLIENT/oozie-log4j.properties
        config/TEZ_CLIENT/tez-site.xml
        config/TEZ_CLIENT/tez-env.sh
        config/HIVE_CLIENT/hive-env.sh
        config/HIVE_CLIENT/hive-log4j.properties
        config/HIVE_CLIENT/hive-site.xml
        config/HIVE_CLIENT/hive-exec-log4j.properties
        config/ALLUXIO_MASTER/alluxio-log4j.xml
        config/ALLUXIO_MASTER/alluxio-env.xml
        config/ALLUXIO_MASTER/alluxio-site.xml
        config/LOGSTASH/bdcsce-logstash-agent-ambari-config.xml
      • Zeppelin notebooks
        zeppelin/notebook/2ANTDG878/note.json
        zeppelin/notebook/2CGT3D6QX/note.json
      • Hive metadata
        hive_metadata/hive_metadata_dump.sql
      • Version data
        stack/StackVersions.txt
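      To inspect or unpack the exported archive, a sketch like the following can be used; the archive file name under the export directory (/data/exportbdc, from config.json) is a placeholder.
        # Sketch only: list the archive contents, then extract them to a working directory.
        tar -tzf /data/exportbdc/<export-archive>.tar.gz
        mkdir -p /tmp/bdc_export
        tar -xzf /data/exportbdc/<export-archive>.tar.gz -C /tmp/bdc_export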
  4. Start the Hive, Zeppelin, and Spark services.
    1. Access the Ambari console using the following Ambari IP:
      https://<hostname>:8080

      For more information, see Access Big Data Cloud Using Ambari.

    2. From Services, go to Hive, click Service Actions, and then click Start.
    3. From Services, go to Zeppelin, click Service Actions, and then click Start.
    4. From Services, go to Spark, click Service Actions, and then click Start.
Copying Scheduler Configurations
Do the following:
  1. In BDC, go to Ambari, click YARN, select Advanced Configs, and then select Scheduler.xml.
  2. Copy Scheduler.xml from BDC to the corresponding Scheduler.xml in Big Data Service (go to Ambari, click YARN, select Advanced Configs, and then select Scheduler.xml).
    Note

    This method retains the queue capacities as percentages, so YARN cores and memory remain proportional to the source cluster.
    Note

    For an advanced configuration review, use the exportBDC.py script and compare the YARN configuration dumps with those on the destination cluster.
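
    For instance, after extracting the exported archive (for example, to /tmp/bdc_export as in the earlier sketch), a plain diff against the corresponding file on the destination cluster highlights custom settings; both paths below are placeholders.

    # Sketch only: compare the exported YARN capacity scheduler settings with the destination cluster's copy.
    diff /tmp/bdc_export/config/YARN_CLIENT/capacity-scheduler.xml /etc/hadoop/conf/capacity-scheduler.xml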
Reviewing the Spark Job

You must review and update your code to use the latest Spark APIs. Spark and Hive use different catalogs in BDS. To access a Hive table from Spark, the catalog must be updated:

<property>
  <name>metastore.catalog.default</name>
  <value>hive</value>
</property>
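
If you prefer to set this for an individual job rather than in the cluster configuration, one possible approach is to pass the setting through Spark's Hadoop configuration prefix at submit time, as in the following sketch; the application class and JAR are placeholders, and you should verify the property handling for your Spark and Hive versions.

# Sketch only: placeholder class and JAR names.
spark-submit \
  --conf spark.hadoop.metastore.catalog.default=hive \
  --class <your.main.Class> <your-application>.jar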

In Big Data Service, Hive creates ACID tables by default. Spark does not work with ACID tables, so you must create external tables that both Hive and Spark can access.

Compare the configuration file created with exportBDC.py in BDC with the Spark configuration file in Big Data Service to identify any custom configuration changes.

Importing Resources to Big Data Service

You now import the exported data and metadata to Big Data Service.

Prerequisites
Before importing resources, do the following:
  1. Set up a fresh target environment on Big Data Service with the same BDC Hadoop version (Hadoop 2.7.x) as the source cluster.
    Note

    Note the following:
      • Define the Big Data Service cluster on OCI with the same size as the source BDC cluster. However, you must review your computing and storage needs before deciding the size of the target cluster.
      • For Oracle Cloud Infrastructure VM shapes, see Compute Shapes. Big Data Service does not support all shapes.
      • If any software other than the BDC stack is installed on the source system using the bootstrap script or some other method, you must install and maintain that software on the target system as well.
  2. Copy the PEM private key (oci_api_key.pem) file to all the nodes of the Big Data Service cluster, and set the appropriate permissions.
  3. Export the artifacts from the source BDC cluster. For more information, see Export Data and Metadata from Oracle Big Data Cloud.
Importing Data to HDFS

To import data to HDFS, follow these steps:

  1. Connect to the Big Data Service (BDS) cluster through SSH as an opc user.
    ssh -i <private_key_file> opc@<node_ip>
  2. Determine the HDFS location where the data needs to be stored.
    
    grep -A1 -i fs.defaultFS /etc/hadoop/conf/core-site.xml
  3. As the hdfs user, move the data from Object Store to the target HDFS.
    1. Use hadoop fs -cp to import HDFS data from Object Store to a destination BDS cluster as an HDFS user.
      sudo su - hdfs
       
      hadoop fs -Dfs.oci.client.auth.fingerprint=<fingerprint> \
      -Dfs.oci.client.auth.pemfilepath=<oci_pem_key> \
      -Dfs.oci.client.auth.passphrase=<passphrase> \
      -Dfs.oci.client.auth.tenantId=<OCID for Tenancy> \
      -Dfs.oci.client.auth.userId=<OCID for User> \
      -Dfs.oci.client.hostname=<HostName. Example: https://objectstorage.us-phoenix-1.oraclecloud.com/> \
      -Dfs.oci.client.proxy.uri=http://<proxyhost>:<proxyport> \
      -cp oci://<bucket>@<tenancy>/<dir>/ <hdfs-dir-to-copy>
    2. Use hadoop distcp to import HDFS data from Object Store to a multi-node BDS cluster. First validate it against a specific directory. For example, use /user/oozie for <hdfs-dir-to-copy> in the following command, and then run it for the entire HDFS (/).
      hadoop distcp -libjars ${LIBJARS} \
      -Dfs.client.socket-timeout=3000000 \
      -Dfs.oci.client.auth.fingerprint=<fingerprint> \
      -Dfs.oci.client.auth.pemfilepath=<oci_pem_key> \
      -Dfs.oci.client.auth.passphrase=<passphrase> \
      -Dfs.oci.client.auth.tenantId=<OCID for Tenancy> \
      -Dfs.oci.client.auth.userId=<OCID for User> \
      -Dfs.oci.client.hostname=<HostName. Example: https://objectstorage.us-phoenix-1.oraclecloud.com/> \
      -Dfs.oci.client.multipart.allowed=true \
      -Dfs.oci.client.proxy.uri=http://<proxyhost>:<proxyport> \
      -Dmapreduce.map.java.opts="$DISTCP_PROXY_OPTS" \
      -Dmapreduce.reduce.java.opts="$DISTCP_PROXY_OPTS" \
      -Dmapreduce.task.timeout=6000000 \
      -skipcrccheck -m 40 -bandwidth 500 \
      -update -strategy dynamic -i oci://<bucket>@<tenancy>/<exportedHdfsDir>/ <hdfs-dir-to-copy>
  4. List the contents of the target HDFS directory.
    hadoop fs -ls /<hdfs-dir-to-copy>
Importing the Metadata Files and Applying the Permissions
Do the following:
  1. Import metadata files from Object Store to /metadata in HDFS.
    
    hadoop distcp -libjars ${LIBJARS} \
    -Dfs.client.socket-timeout=3000000 \
    -Dfs.oci.client.auth.fingerprint=<fingerprint> \
    -Dfs.oci.client.auth.pemfilepath=<oci_pem_key> \
    -Dfs.oci.client.auth.passphrase=<passphrase> \
    -Dfs.oci.client.auth.tenantId=<OCID for Tenancy> \
    -Dfs.oci.client.auth.userId=<OCID for User> \
    -Dfs.oci.client.hostname=<HostName. Example: https://objectstorage.us-phoenix-1.oraclecloud.com/> \
    -Dfs.oci.client.multipart.allowed=true \
    -Dfs.oci.client.proxy.uri=http://<proxyhost>:<proxyport> \
    -Dmapreduce.map.java.opts="$DISTCP_PROXY_OPTS" \
    -Dmapreduce.reduce.java.opts="$DISTCP_PROXY_OPTS" \
    -Dmapreduce.task.timeout=6000000 \
    -skipcrccheck -m 40 -bandwidth 500 \
    -update -strategy dynamic -i oci://<bucket>@<tenancy>/metadata/ /metadata
  2. Move files to the local directory.
    hdfs dfs -get /metadata/Metadata*
  3. Run the files in parallel in the background or in multiple terminals.
    bash Metadataaa &
     
    bash Metadataab &
     
    bash Metadataac &...
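
    If there are many split files, a small loop can launch them all and wait for completion. This is a sketch only; it assumes the Metadata* files are in the current directory.

    # Sketch only: run every Metadata* script in the background and wait for all of them to finish.
    for f in Metadata*; do
      bash "$f" &
    done
    wait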
Importing Metadata

To import metadata, follow these steps:

  1. Connect to the BDS cluster through the bastion host as an opc user.
  2. Copy the following Hive DDL query scripts, created in the source BDC cluster, to the target BDS cluster:
    1. 1_CREATE_DB.hql
    2. 2_CREATE_TABLE.hql
    3. 3_ADD_PARTITION.hql
  3. Run the 1_CREATE_DB.hql script.
    hive -f 1_CREATE_DB.hql
  4. Run the 2_CREATE_TABLE.hql script.
    hive -f 2_CREATE_TABLE.hql
  5. Run the 3_ADD_PARTITION.hql script.
    hive -f 3_ADD_PARTITION.hql
Updating the Hive Metastore Warehouse Directory

Do the following:

  1. Find the value of hive.metastore.warehouse.dir. This is a Hive property in hive-site.xml, configured as /apps/hive/warehouse in the BDC cluster.
    <name>hive.metastore.warehouse.dir</name>
    <value>/apps/hive/warehouse</value>
  2. Re-configure hive.metastore.warehouse.dir to <exportedHdfsDir>/apps/hive/warehouse in hive-site.xml in the BDS cluster.
  3. For example, if you copied your HDFS data to /tmp in HDFS, configure the hive.metastore.warehouse.dir property as /tmp/apps/hive/warehouse.

Validating the Migration

After migrating the resources, verify that the same set of Hive tables is present in the target cluster as in the source cluster.
  1. Connect to the hive shell.
    hive
  2. Run the following command to list the tables:
    show tables;
  3. Run the following command to query a table (for example, the airports table):
    SELECT * FROM airports LIMIT 10;
  4. Run the following command to verify the HDFS and Object Store data.
    hadoop fs -du -s /tmp/hivemigrate
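    To cross-check against the exported copy in Object Store, you can run a similar summary using the OCI connector properties from the earlier steps; all values are placeholders, and the command assumes the OCI HDFS connector is available on the BDS cluster.
    hadoop fs -Dfs.oci.client.auth.fingerprint=<fingerprint> \
          -Dfs.oci.client.auth.pemfilepath=<oci_pem_key> \
          -Dfs.oci.client.auth.tenantId=<OCID for Tenancy> \
          -Dfs.oci.client.auth.userId=<OCID for User> \
          -Dfs.oci.client.hostname=<HostName> \
          -du -s oci://<bucket>@<namespace>/<exportedHdfsDir>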
  5. Check the cluster health by submitting relevant jobs and verifying that they produce the expected results. Pick a job that you ran in BDC and run it on the BDS cluster.
    Note

    Whether a job runs successfully depends not only on the location of the data but also on configuration settings such as HADOOP_CLASSPATH, the location of the client JARs, and so on.