Migrate from Big Data Cloud Compute Edition
Find out how to migrate from Oracle Big Data Cloud Compute Edition (BDCE or BDC) to Big Data Service
Migration is done in several steps. You can migrate your artifacts to OCI Big Data Service from BDC on Oracle Cloud Infrastructure Classic or from BDC on Oracle Cloud Infrastructure. At a high level, you do the following:
- Export your existing cloud resources from BDC to Object Storage.
- Import the exported cloud resources from Object Storage to Big Data Service.
Prerequisites
- You are a valid user of a compartment in Big Data Service.
- You have permissions to do the following:
  - Access the OCI Console using your credentials.
  - Create a bucket in Oracle Object Storage so that you can copy the HDFS data. For information about Oracle Object Storage, see Overview of Object Storage.
  - Inspect the OCI Object Store configuration.
For more information, see Getting Started with Big Data Service.
- You have the following OCI parameter values with you:

| Value | Details |
|---|---|
| Tenancy ID | The OCID of the tenancy. For example, ocid1.tenancy.oc1..aaaaaaaa5syd62crbj5xpfajpmopoqasxy7jwxk6ihopm5vk6bxkncyp56kc. For more information, see Where to Get the Tenancy's OCID and User's OCID. |
| User ID | The OCID of the user. For example, ocid1.user.oc1..aaaaaaaa3pnl7qz4c2x2mpq4v4g2mp3wktxoyahwysmjrapgzjoyd3edxltp. For more information, see Where to Get the Tenancy's OCID and User's OCID. |
| API signing key | Required for an application user. For example, 03:8c:ef:51:c8:fe:6b:22:0c:5d:3c:43:a8:ff:58:d9. For information, see the topics on generating and uploading API signing keys. |
| Passphrase for the signing key | (Optional) Required if you generated the key pair with a passphrase. |
| Fingerprint for the signing key | The fingerprint and passphrase of the signing key are created while generating and uploading the API signing key. For more information, see How to Get the Key's Fingerprint. |
| Bucket and tenancy name | For example, oci://myBucket@myTenancy/. For information about buckets, see Putting Data into Object Storage. |
| OCI Cloud Storage URL | The host name. For example, https://objectstorage.us-phoenix-1.oraclecloud.com. For more information, see Create a Cluster. |
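These values are used later in this topic as the fs.oci.client.* properties passed to DistCp. As a hedged convenience only, you might keep them in shell variables on the node where you run the commands; the variable names below are illustrative and not part of any Oracle tooling:

# Illustrative variable names and placeholder values only.
export OCI_TENANCY_OCID="<OCID for Tenancy>"
export OCI_USER_OCID="<OCID for User>"
export OCI_KEY_FINGERPRINT="<fingerprint>"
export OCI_PEM_FILE="<path to oci_api_key.pem>"
export OCI_PEM_PASSPHRASE="<passphrase, if any>"
export OCI_OS_ENDPOINT="https://objectstorage.us-phoenix-1.oraclecloud.com"
export OCI_BUCKET_URI="oci://<bucket>@<tenancy>"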
Exporting Resources
| Artifact in BDC | Exported Artifacts | Artifacts in OCI Big Data Service (BDS) |
|---|---|---|
| Data in HDFS | Copied into the OCI Object Store. | Copy the exported data from the OCI Object Store to the target BDS HDFS directories. |
| Data in OCI-Classic Object Store. Note: This artifact doesn't apply to Oracle Big Data Cloud on Oracle Cloud Infrastructure. | Copied into the OCI Object Store. | |
| Hive Metadata | Generate the Hive DDL statements on the BDC cluster. | Copy the Hive DDL statements from the BDC cluster into the BDS cluster, and execute them. |
| Zeppelin Notebooks | Export the Zeppelin notebook definitions as a .tar.gz file from /user/zeppelin/notebook in HDFS, using a script provided by Oracle. | Currently, importing Zeppelin notebooks is not supported in BDS. |
| HDFS, YARN, Spark Configuration Files | Export the configuration files as a .tar.gz file using a utility script provided by Oracle. | BDS has optimized configuration settings for HDFS, YARN, and Spark, so you need not import the configuration files and versions from BDC. |
| Versions of various open source components | Export the service version details using the Ambari REST API. You can also get version details from Ambari (Admin > Stack and Versions). | |
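As one way to capture the service version details over the Ambari REST API, the following hedged sketch queries the standard stack_versions resource; the Ambari host, credentials, cluster name, and output file name are placeholders:

# Placeholders: <ambari-host>, <admin-password>, <cluster-name>.
curl -s -u admin:<admin-password> -H 'X-Requested-By: ambari' \
  "http://<ambari-host>:8080/api/v1/clusters/<cluster-name>/stack_versions?fields=*" \
  > bdc_stack_versions.json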
Migrating Resources Using WANdisco LiveData Migrator
Ensure that port 8020 is open at the destination.
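For example, a quick reachability check from the source side might look like the following sketch (the destination host name is a placeholder):

# Verify that the destination NameNode port is reachable (placeholder host name).
nc -zv <destination-namenode-host> 8020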
For more information, see the WANdisco LiveData Migrator documentation.
To migrate resources using WANdisco LiveData Migrator, follow these steps:
Migrating Resources Using the Distcp Tool
You can also migrate data and metadata from Big Data Cloud Compute Edition and import them into Big Data Service using the DistCp tool. DistCp is an open source tool for copying large data sets between distributed file systems, within and across clusters.
Find out how to prepare the BDC cluster for export.
To export data from HDFS, follow these steps:
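The export itself is typically a DistCp copy from HDFS into the Object Storage bucket through the OCI HDFS connector, using the same connector properties as the import command shown later in this topic. A minimal sketch with placeholder values, assuming LIBJARS points to the OCI HDFS connector JARs:

# Sketch only: copy an HDFS directory to the OCI Object Storage bucket.
hadoop distcp -libjars ${LIBJARS} \
  -Dfs.oci.client.auth.fingerprint=<fingerprint> \
  -Dfs.oci.client.auth.pemfilepath=<oci_pem_key> \
  -Dfs.oci.client.auth.tenantId=<OCID for Tenancy> \
  -Dfs.oci.client.auth.userId=<OCID for User> \
  -Dfs.oci.client.hostname=<object storage endpoint> \
  -update -strategy dynamic \
  <source HDFS directory> oci://<bucket>@<tenancy>/<target prefix>/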
To export Hive metadata, follow these steps:
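One hedged way to generate the Hive DDL, if you are not using an Oracle-provided script, is to loop over the tables of a database and capture the SHOW CREATE TABLE output; the database name and output file below are placeholders:

# Sketch: dump CREATE TABLE statements for every table in <db-name> into hive_ddl.sql.
DB=<db-name>
for T in $(hive -S -e "USE $DB; SHOW TABLES;"); do
  echo "-- $DB.$T" >> hive_ddl.sql
  hive -S -e "USE $DB; SHOW CREATE TABLE $T;" >> hive_ddl.sql
  echo ";" >> hive_ddl.sql
done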
You can export service configurations from the source cluster and use them as a reference when applying any custom configuration changes from the source cluster to the destination cluster.
To export Zeppelin notebooks, service configurations, and versions, follow these steps:
- Stop the Hive, Zeppelin, and Spark services.
- Prepare to run the export script.
- Run the export script.
- Start the Hive, Zeppelin, and Spark services.
You must review and update your code to use the latest Spark APIs. Spark and Hive use different catalogs in BDS. For Spark to access tables created in Hive, the catalog must be updated:
<property>
<name>metastore.catalog.default</name>
<value>hive</value>
</property>
In Big Data Service, Hive creates ACID tables by default, and Spark does not work on ACID tables. You must create external tables so that the data can be accessed from both Hive and Spark, as in the following sketch.
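For instance, a table that both Hive and Spark need to read could be declared as an external table; the table name, columns, storage format, and location in this sketch are placeholders:

# Sketch: create an external table readable from both Hive and Spark (placeholder values).
hive -e "
  CREATE EXTERNAL TABLE airports_ext (code STRING, name STRING)
  STORED AS ORC
  LOCATION '/data/airports_ext';"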
Compare the configuration file created with exportBDC.py in BDC with the Spark configuration file in Big Data Service to identify any custom configuration changes that still need to be applied.
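A hedged way to spot custom settings is a simple diff of the two files once both are available locally; the file names below are placeholders for the exportBDC.py output and the BDS Spark configuration:

# Placeholder file names; adjust to the actual export output and BDS configuration paths.
diff <(sort bdc_spark_config.conf) <(sort bds_spark_config.conf)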
You now import the exported data and metadata to Big Data Service.
- Set up a fresh target environment on Big Data Service with the same BDC Hadoop version (Hadoop 2.7.x) as the source cluster. Note the following:
  - Define the Big Data Service cluster on OCI with the same size as the source BDC cluster. However, review your computing and storage needs before deciding the size of the target cluster.
  - For Oracle Cloud Infrastructure VM shapes, see Compute Shapes. Not all shapes are supported.
  - If any software other than the BDC stack is installed on the source system using the bootstrap script or some other method, you must install and maintain that software on the target system as well.
- Copy the PEM private key file (oci_api_key.pem) to all the nodes of the Big Data Service cluster, and set the appropriate permissions (see the sketch after this list).
- Export the artifacts from the source BDC cluster. For more information, see Export Data and Metadata from Oracle Big Data Cloud.
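A minimal sketch of the key copy referenced above, assuming SSH access as the opc user; the node names and target directory are placeholders:

# Placeholder node names; assumes SSH access as the opc user.
for NODE in node1 node2 node3; do
  ssh opc@${NODE} "mkdir -p /home/opc/.oci"
  scp oci_api_key.pem opc@${NODE}:/home/opc/.oci/
  ssh opc@${NODE} "chmod 600 /home/opc/.oci/oci_api_key.pem"
done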
To import data to HDFS, follow these steps:
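The data import mirrors the metadata import shown below: a DistCp copy from the Object Storage bucket into the target HDFS directory. A hedged, abbreviated sketch (the connector properties are the same -Dfs.oci.client.* options shown in the metadata command):

hadoop distcp -libjars ${LIBJARS} \
  <same -Dfs.oci.client.* properties as in the metadata import command below> \
  -update -strategy dynamic \
  oci://<bucket>@<tenancy>/<data prefix>/ <target HDFS directory>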
Import the metadata files and execute them
- Import the metadata files from the Object Store to /metadata in HDFS.
  hadoop distcp -libjars ${LIBJARS} \
    -Dfs.client.socket-timeout=3000000 \
    -Dfs.oci.client.auth.fingerprint=<fingerprint> \
    -Dfs.oci.client.auth.pemfilepath=<oci_pem_key> \
    -Dfs.oci.client.auth.passphrase=<passphrase> \
    -Dfs.oci.client.auth.tenantId=<OCID for Tenancy> \
    -Dfs.oci.client.auth.userId=<OCID for User> \
    -Dfs.oci.client.hostname=<HostName. Example: https://objectstorage.us-phoenix-1.oraclecloud.com/> \
    -Dfs.oci.client.multipart.allowed=true \
    -Dfs.oci.client.proxy.uri=<http://proxy-host>:port \
    -Dmapreduce.map.java.opts="$DISTCP_PROXY_OPTS" \
    -Dmapreduce.reduce.java.opts="$DISTCP_PROXY_OPTS" \
    -Dmapreduce.task.timeout=6000000 \
    -skipcrccheck -m 40 -bandwidth 500 \
    -update -strategy dynamic -i \
    oci://<bucket>@<tenancy>/metadata/ /metadata
- Move files to the local directory.
hdfs dfs -get /metadata/Metadata*
- Run the files in parallel, in the background or in multiple terminals (see the loop sketched below).
bash Metadataaa & bash Metadataab & bash Metadataac &...
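If there are many split files, a loop is a hedged equivalent of the command above:

# Run every split metadata script in the background, then wait for all of them to finish.
for F in Metadata*; do
  bash "$F" &
done
wait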
To import metadata, follow these steps:
Do the following:
Validating the Migration
- Connect to the Hive shell.
hive
- Run the following command to list the tables:
show tables;
- Run the following command to query a table. For example:
SELECT * FROM airports LIMIT 10;
- Run the following command to verify the HDFS data, and compare it with the corresponding Object Store data (a sketch comparing both locations follows this list).
hadoop fs -du -s /tmp/hivemigrate
- Check the cluster health by submitting all relevant jobs and verifying that they return the expected results. Pick a job that you ran in BDC and run it on the BDS cluster.
  Note: A successful job run depends not only on the location of the data but also on configuration settings such as HADOOP_CLASSPATH, the location of the client JARs, and so on.
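As a hedged way to perform the comparison referenced in the verification step above, run the same size check against both locations. The paths are placeholders, and the oci:// check assumes the OCI HDFS connector properties are available (for example, in core-site.xml or passed with -D options):

# Compare aggregate sizes of the HDFS copy and the Object Storage copy (placeholder paths).
hadoop fs -du -s /tmp/hivemigrate
hadoop fs -du -s oci://<bucket>@<tenancy>/tmp/hivemigrate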