Implement Oracle Modern Data Platform for Business Reporting and Forecasting

This section describes how to provision OCI services, implement the use case, visualize data, and create forecasts.

Provision OCI Services

Let's go through the main steps for provisioning resources for a simple demo of this use case.

  1. Create an OCI Object Storage Bucket.
    For more information, see Creating a Bucket.
  2. Provision Oracle Big Data Service and Autonomous Data Warehouse.

Implement the Oracle Modern Data Platform Use Case

Before going into the implementation details, let us look at the overall flow.

  1. Raw data will reside in OCI Object Storage.
  2. Spark running natively on Oracle Big Data Service will read the data from OCI Object Storage. It will cleanse, transform, and anonymize the data, and finally persist it into Autonomous Data Warehouse.
  3. Users will utilize Oracle Analytics Cloud to visualize data and make forecasts.


Figure: Data flow for the reporting and forecasting use case (diagram source: oci-modern-data-reporting-flow-oracle.zip)

Note:

The chosen data flow is a simplistic example; actual business use cases can be expected to be more involved.

The following steps describe how to upload sample data, configure OCI Object Storage and Oracle Big Data Service, and invoke a simple Spark Scala job to transform the data and move it to Autonomous Data Warehouse.

  1. Obtain sample data and upload it to the OCI Object Storage bucket you created. The data used for the demo implementation in this solution is time series sales data downloaded from Kaggle. Each record in this dataset has the following fields:
    1. Order Id
    2. Product name
    3. Quantity Ordered
    4. Sales Price
    5. Order Date
    6. Purchase Address
    For more information about the sample data, see the Kaggle: Sample Sales Data link in the Explore More section.
  2. Configure OCI Object Storage access: The Oracle Big Data Service cluster requires access to the Object Storage bucket in order to create external Hive tables.
    For implementation details, follow the steps in this document: Using Object Storage API keys.
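    Once access is configured, you can optionally sanity-check it by reading the raw files directly from spark-shell on the cluster. The snippet below is only a sketch; the bucket name, namespace, and path are placeholders for your environment.
    /* assumes the OCI Object Storage access configured above is in place */
    val rawSales = spark.read.option("header", "true").option("inferSchema", "true").csv("oci://bucket_name@namespace/sales_data/")
    rawSales.printSchema()
    rawSales.show(5)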
  3. Log in to the Oracle Big Data Service cluster and invoke Hive. Create an external table over the Object Storage bucket that contains the sales data for our example.
    create external table if not exists sales_obj_store
    (Order_ID bigint,
    Product string,
    Quantity_Ordered int,
    PriceEach double,
    Order_Date string,
    Purchase_Address string)
    row format delimited
    fields terminated by ','
    stored as TEXTFILE
    location 'oci://bucket_name@namespace/sales_data/';
  4. Confirm that you can query data using the external table above.
     select * from sales_obj_store limit 20;
  5. Configure Oracle Big Data Service access to Autonomous Data Warehouse: navigate to your Autonomous Data Warehouse instance in the OCI Console, click Database connection, and download the wallet file. Store the wallet in a directory on the Oracle Big Data Service cluster.
  6. Download the Oracle JDBC driver and related jar files to the Oracle Big Data Service cluster. The drivers can be downloaded from this link: Oracle Database JDBC driver and Companion Jars Downloads. Pass these jars in when invoking your Spark job; an optional connectivity check is sketched below.
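    As an optional check, you can verify connectivity to Autonomous Data Warehouse from the cluster before running the full job. The following is only a sketch: start spark-shell with the downloaded jars (for example, spark-shell --jars followed by a comma-separated list of the jar paths), and substitute your own TNS alias, wallet directory, and credentials.
    import java.sql.DriverManager

    /* the TNS alias (xxxx_high), wallet path, and credentials below are placeholders */
    val testUrl = "jdbc:oracle:thin:@xxxx_high?TNS_ADMIN=/home/xxx/wallet"
    val conn = DriverManager.getConnection(testUrl, "xxxx", "xxxxx")
    println(conn.getMetaData.getDatabaseProductVersion)   /* prints the database version if the connection succeeds */
    conn.close()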
  7. Invoke a simple Spark Scala job to transform the data and move it to Autonomous Data Warehouse.
    1. The job does a simple transformation to convert the order date from a string field to a timestamp field.
    2. The job also anonymizes the data by extracting the ZIP code from the full customer address and then dropping the address.
    /* spark.implicits._ (needed for the $ column syntax) is imported automatically in spark-shell;
       the functions import below provides to_timestamp, col, and substring_index */
    import org.apache.spark.sql.functions._

    import org.apache.spark.sql.Dataset
    import org.apache.spark.sql.Row
    import org.apache.spark.sql.SaveMode
    import org.apache.spark.sql.SparkSession
    import oracle.jdbc.OracleConnection

    /* JDBC URL for Autonomous Data Warehouse; TNS_ADMIN points to the wallet directory on the cluster */
    val jdbcUrl = s"jdbc:oracle:thin:@(description=(retry_count=20)(retry_delay=3)(address=(protocol=tcps)(port=1522)(host=adb.us-ashburn-1.oraclecloud.com))(connect_data=(service_name=xxxx_high.adb.oraclecloud.com))(security=(ssl_server_dn_match=yes)))?TNS_ADMIN=/home/xxx/wallet"

    /* connection options for the Spark JDBC data source */
    var options = collection.mutable.Map[String, String]()
    options += ("driver" -> "oracle.jdbc.driver.OracleDriver")
    options += ("url" -> jdbcUrl)
    options += (OracleConnection.CONNECTION_PROPERTY_USER_NAME -> "xxxx")
    options += (OracleConnection.CONNECTION_PROPERTY_PASSWORD -> "xxxxx")
    options += (OracleConnection.CONNECTION_PROPERTY_TNS_ADMIN -> "/home/xxx/wallet")
    options += ("dbtable" -> "schema_name.sales")

    /* read from the Hive external table created earlier */
    val dfsales = spark.sql("select * from sales_obj_store")

    /* transform: convert the order date from a string to a timestamp */
    val ts = to_timestamp($"order_date", "MM/dd/yy HH:mm")
    val dfsales_orcl = dfsales.withColumn("order_date", ts)

    /* anonymize: extract the zip code, then drop the full purchase address */
    val dfsales_orcl_zip = dfsales_orcl.select(col("*"), substring_index(col("purchase_address"), " ", -1).as("zip"))
    val dfsales_orcl_transformed = dfsales_orcl_zip.drop("purchase_address")

    /* write to Autonomous Data Warehouse over JDBC */
    dfsales_orcl_transformed.write.format("jdbc").options(options).mode("append").save()
Once the data transformation job has finished, confirm that you can see the data in Autonomous Data Warehouse.
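One way to do this from the same spark-shell session is to read the table back over JDBC, reusing the options defined in the job above; the schema and table names remain placeholders.

    /* read the target table back from Autonomous Data Warehouse */
    val dfcheck = spark.read.format("jdbc").options(options).load()
    println(dfcheck.count())   /* number of rows loaded */
    dfcheck.show(5)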