Use ORAAH in Big Data Cloud

These instructions describe how to use Oracle R Advanced Analytics for Hadoop (ORAAH) in Big Data Cloud.

Get Started with ORAAH

In a new notebook in Big Data Cloud, start a paragraph with %r and then enter your R code.

Before you call any ORAAH function, load the ORAAH library in a paragraph that runs first. For example, the following code loads the library in your R session and lists the files in your HDFS home directory:

%r
# Load the ORAAH library:
library(ORCH)
# List the datasets available in the cluster under the user's home HDFS folder:
hdfs.ls()
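
Because every later paragraph depends on the library being loaded, it can help to make that dependency explicit. The following is a minimal sketch, not part of ORAAH itself: load_orch is a hypothetical helper name, and the sketch assumes that hdfs.pwd() is available alongside hdfs.ls() in your ORAAH version. Enter it in a %r paragraph.

```r
# Hypothetical helper, not part of ORAAH: attach ORCH only when it is installed,
# so a missing library produces a clear message instead of a later error.
load_orch <- function() {
  if (!requireNamespace("ORCH", quietly = TRUE)) {
    message("ORCH is not available in this R session.")
    return(FALSE)
  }
  library(ORCH)
  TRUE
}

if (load_orch()) {
  print(hdfs.pwd())  # current HDFS working directory (assumed available)
  print(hdfs.ls())   # contents of that directory
}
```

If the library is missing, the paragraph prints a message and skips the HDFS calls rather than failing partway through.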

Connect to Hive

When connecting to Hive from a notebook paragraph, use zeppelin as the user. This user has the HDFS read/write permissions that ORAAH needs to store temporary files. You can use localhost as the host.

%r
# Load the ORAAH library:
library(ORCH)
# After loading the libraries, connect to Hive with the following command:
ore.connect(user="zeppelin", host="localhost", port="10002",
            schema="default", type="HIVE", transportMode="http",
            httpPath="hs2service")
# List all tables available in Hive:
ore.ls()
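
If you connect to Hive from several notebooks, keeping the connection settings in one named list makes them easier to review and reuse. The sketch below is one possible arrangement, not a required pattern: hive_args is a hypothetical variable name, the values are the ones used in the ore.connect() call above, and the sketch assumes that ore.ls() returns a vector of table names. Enter it in a %r paragraph.

```r
# Connection settings taken from the ore.connect() call shown above.
hive_args <- list(
  user = "zeppelin", host = "localhost", port = "10002",
  schema = "default", type = "HIVE",
  transportMode = "http", httpPath = "hs2service"
)

# Connect only when ORCH is installed in this R session.
if (requireNamespace("ORCH", quietly = TRUE)) {
  library(ORCH)
  do.call(ore.connect, hive_args)
  tables <- ore.ls()
  if (length(tables) == 0) {
    message("No tables are visible in schema '", hive_args$schema, "'.")
  } else {
    print(tables)
  }
}
```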

Use the Spark-based Machine Learning Interfaces

To use the Spark-based machine learning interfaces to ORAAH's ML algorithms and the Spark MLlib algorithms, use ORAAH's spark.connect() command to start an exclusive Spark session. The algorithms can then be executed against data stored in HDFS or Hive.

For example, the following lines of code establish a connection to an exclusive Spark session and run a few of the built-in ML examples from ORAAH. Replace IP_address in the example with the address for your Big Data Cloud environment.

%r
# Load the ORAAH library:
library(ORCH)
# Ensure no other connection exists from ORAAH into Spark:
spark.disconnect()
# Connect to Spark via YARN, requesting 2 GB of RAM. Set dfs.namenode to the
# IP address of your environment, usually the public IP address of the service:
spark.connect(master='yarn-client', dfs.namenode='IP_address', memory='2G')

# Run the example of Logistic Regression by ORAAH:
example(orch.glm2)
# Run the example of the Multi-layer Neural Networks by ORAAH:
example(orch.neural2)
# Run the example of the Linear Regression by ORAAH:
example(orch.lm2)
# Run the example of the Spark MLlib Random Forest via ORAAH:
example(orch.ml.random.forest)
# Run the example of the Spark MLlib Decision Trees via ORAAH:
example(orch.ml.dt)
# Run the example of the Spark MLlib Support Vector Machines via ORAAH:
example(orch.ml.svm)
# Run the example of the Spark MLlib Logistic Regression via ORAAH:
example(orch.ml.logistic)
# Run the example of the Spark MLlib k-Means Clustering via ORAAH:
example(orch.ml.kmeans)
# Run the example of the Spark MLlib Gaussian Mixture Model Clustering via ORAAH:
example(orch.ml.gmm)
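
Because the Spark session is exclusive, you may want to release it even when an example fails, and to catch an unreplaced IP_address placeholder before connecting. The following is a minimal sketch under stated assumptions: spark_args is a hypothetical helper, and BDC_NAMENODE_IP is a hypothetical environment variable used here only to supply the address. The spark.connect() arguments are the ones shown above. Enter it in a %r paragraph.

```r
# Hypothetical helper: build the spark.connect() arguments and reject the
# literal placeholder so it cannot reach the cluster by mistake.
spark_args <- function(namenode, memory = "2G") {
  stopifnot(is.character(namenode), length(namenode) == 1, nzchar(namenode))
  if (identical(namenode, "IP_address")) {
    stop("Replace the IP_address placeholder with your service's address.")
  }
  list(master = "yarn-client", dfs.namenode = namenode, memory = memory)
}

# BDC_NAMENODE_IP is a hypothetical variable name, not defined by Big Data Cloud.
namenode <- Sys.getenv("BDC_NAMENODE_IP")
if (nzchar(namenode) && requireNamespace("ORCH", quietly = TRUE)) {
  library(ORCH)
  spark.disconnect()                            # drop any earlier session
  do.call(spark.connect, spark_args(namenode))  # open the exclusive session
  # Run one built-in example and always release the session afterwards.
  tryCatch(example(orch.lm2), finally = spark.disconnect())
}
```

When the environment variable is unset or ORCH is not installed, the paragraph defines the helper and does nothing else.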