This chapter provides an overview of Oracle Big Data support for Oracle Spatial and Graph spatial and property graph features.
Installing Oracle Big Data Spatial and Graph on an Oracle Big Data Appliance
Installing and Configuring the Big Data Spatial Image Processing Framework
Installing and Configuring the Big Data Spatial Image Server
Installing Property Graph Support on a CDH Cluster or Other Hardware
Oracle Big Data Spatial and Graph delivers advanced spatial and graph analytic capabilities to supported Apache Hadoop and NoSQL Database Big Data platforms.
The spatial features include support for data enrichment of location information; spatial filtering, categorization, and location-based analysis based on distance; vector and raster processing of digital map, sensor, satellite, and aerial imagery; and APIs for map visualization.
The property graph features support Apache Hadoop HBase and Oracle NoSQL Database for graph operations, indexing, queries, search, and in-memory analytics.
Spatial location information is a common element of Big Data. Businesses can use spatial data as the basis for associating and linking disparate data sets. Location information can also be used to track and categorize entities based on proximity to another person, place, or object, or on their presence in a particular area. Location information can facilitate location-specific offers to customers entering a particular geography, something known as geo-fencing. Georeferenced imagery and sensory data can be analyzed for a variety of business benefits.
The spatial features of Oracle Big Data Spatial and Graph support those use cases with the following kinds of services.
Vector Services:
Ability to associate documents and data with names, such as cities or states, or longitude/latitude information in spatial object definitions for a default administrative hierarchy
Support for text-based 2D and 3D geospatial formats, including GeoJSON files, Shapefiles, GML, and WKT; you can also use the Geospatial Data Abstraction Library (GDAL) to convert popular geospatial encodings such as Oracle SDO_Geometry and ST_Geometry into supported formats
An HTML5-based map client API and a sample console to explore, categorize, and view data in a variety of formats and coordinate systems
Topological and distance operations: Anyinteract, Inside, Contains, Within Distance, Nearest Neighbor, and others
Spatial indexing for fast retrieval of data
Raster Services:
Support for hundreds of image file formats supported by GDAL and image files stored in HDFS
A sample console to view the set of images that are available
Raster operations, including subsetting, georeferencing, mosaics, and format conversion
Graphs manage networks of linked data as vertices, edges, and properties of the vertices and edges. Graphs are commonly used to model, store, and analyze relationships found in social networks, cyber security, utilities and telecommunications, life sciences and clinical data, and knowledge networks.
Typical graph analyses encompass graph traversal, recommendations, finding communities and influencers, and pattern matching. Industries including telecommunications, life sciences and healthcare, security, and media and publishing can benefit from graphs.
The property graph features of Oracle Big Data Spatial and Graph support those use cases with the following capabilities:
A scalable graph database on Apache HBase and Oracle NoSQL Database
Developer APIs based on TinkerPop Blueprints, Rexster REST APIs, and Java graph APIs
Text search and query through integration with Apache Lucene and SolrCloud
Scripting language support for Groovy and Python
A parallel, in-memory graph analytics engine
A fast, scalable suite of social network analysis functions, including ranking, centrality, recommendation, community detection, and path finding
Parallel bulk load and export of property graph data in an Oracle-defined flat file format
Manageability through a Groovy-based console to execute Java and TinkerPop Gremlin APIs
See also Property Graph Sizing Recommendations
The following are recommendations for property graph installation.
Table 1-1 Property Graph Sizing Recommendations
Graph Size | Recommended Physical Memory to be Dedicated | Recommended Number of CPU Processors
---|---|---
10 to 100M edges | Up to 14 GB RAM | 2 to 4 processors, and up to 16 processors for more compute-intensive workloads
100M to 1B edges | 14 GB to 100 GB RAM | 4 to 12 processors, and up to 16 to 32 processors for more compute-intensive workloads
Over 1B edges | Over 100 GB RAM | 12 to 32 processors, or more for especially compute-intensive workloads
The Mammoth command-line utility for installing and configuring the Oracle Big Data Appliance software also installs the Oracle Big Data Spatial and Graph option, including both the spatial and property graph capabilities. You can enable this option during an initial software installation, or afterward using the bdacli utility.
To use Oracle NoSQL Database as a graph repository, you must have an Oracle NoSQL Database cluster.
To use Apache HBase as a graph repository, you must have an Apache Hadoop cluster.
See Also:
Oracle Big Data Appliance Owner's Guide for software configuration instructions.
Installing and configuring the Image Processing Framework depends upon the distribution being used.
The Oracle Big Data Appliance cluster distribution comes with a pre-installed setup, but you must follow a few steps in Installing Image Processing Framework for Oracle Big Data Appliance Distribution to get it working.
For a commodity distribution, follow the instructions in Installing the Image Processing Framework for Other Distributions (Not Oracle Big Data Appliance).
After performing the installation, verify it (see Post-installation Verification of the Image Processing Framework).
The Oracle Big Data Appliance distribution comes with a pre-installed configuration. However, perform the following steps to ensure that it works.
Identify the HADOOP_LIB_PATH; for Oracle Big Data Appliance it is under /opt/cloudera/parcels/CDH/lib/hadoop/lib/. A sketch of capturing this location in a shell variable follows.
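For convenience in later commands, you can capture this location in a shell variable. This is a minimal sketch, assuming a bash shell; the variable name matches the HADOOP_LIB_PATH placeholder used throughout this chapter:

export HADOOP_LIB_PATH=/opt/cloudera/parcels/CDH/lib/hadoop/lib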
Make the imageserver folder under /opt/shareddir/spatial/demo/ writable by all users as follows:
chmod 777 /opt/shareddir/spatial/demo/imageserver/
Make the libproj.so (Proj.4) Cartographic Projections Library accessible to the users, and copy libproj.so to HADOOP_LIB_PATH/native on the Resource Manager node (and any backup Resource Manager nodes) as follows:
cp libproj.so HADOOP_LIB_PATH/native
Create a folder native
under /opt/shareddir/spatial/demo/imageserver/
and copy the libproj.so
and GDAL libraries to that folder as follows:
mkdir /opt/shareddir/spatial/demo/imageserver/native
cp libproj.so /opt/shareddir/spatial/demo/imageserver/native
cp -R /opt/oracle/oracle-spatial-graph/spatial/gdal/lib/. /opt/shareddir/spatial/demo/imageserver/native
For Big Data Spatial and Graph in environments other than the Big Data Appliance, follow the instructions in this section.
Ensure that HADOOP_LIB_PATH
is under /usr/lib/hadoop
.
Install NFS.
Have at least one folder, referred to in this document as SHARED_FOLDER, on the Resource Manager node accessible to every Node Manager node through NFS (see the NFS sketch after this list).
Provide write access to this SHARED_FOLDER for all the users involved in job execution and for the yarn user. This folder is used for running the test scripts and must be /opt/shareddir; if not, modify the test scripts accordingly to point to your folder.
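The following is a minimal sketch of an NFS setup for /opt/shareddir, assuming the Resource Manager node exports the folder and every Node Manager node mounts it; the host name rmnode.example.com and the export options are assumptions, so consult your system administrator for site-specific settings:

# On the Resource Manager node (as root): export the shared folder over NFS.
echo '/opt/shareddir *(rw,sync)' >> /etc/exports
exportfs -a

# On each Node Manager node (as root): mount the exported folder at the same path.
mkdir -p /opt/shareddir
mount -t nfs rmnode.example.com:/opt/shareddir /opt/shareddir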
Download oracle-spatial-graph-1.0-1.x86_64.rpm
from the Oracle e-delivery web site.
Execute oracle-spatial-graph-1.0-1.x86_64.rpm
using the rpm command.
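For example (run as root or with sudo; this mirrors the rpm invocation shown for property graph support later in this chapter):

rpm -i oracle-spatial-graph-1.0-1.x86_64.rpm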
After the rpm executes, verify that the directory structure created at /opt/oracle/oracle-spatial-graph/spatial contains three folders: jlib, gdal, and demo.
Copy the content under jlib (a set of jar files) to the HADOOP_LIB_PATH directory on every cluster node, as in the sketch below.
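One possible way to distribute the jar files, assuming passwordless ssh from the current node, a plain-text file nodes.txt listing every cluster host name (both are assumptions, not part of the product), and the HADOOP_LIB_PATH environment variable set to the path identified earlier:

for node in $(cat nodes.txt); do
  # Copy every spatial jar into the Hadoop library path on each node.
  scp /opt/oracle/oracle-spatial-graph/spatial/jlib/*.jar "$node":"$HADOOP_LIB_PATH"/
done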
In the Resource Manager Node (and any backup Resource Manager Nodes), copy all the content from /opt/oracle/oracle-spatial-graph/spatial/gdal/lib
to the HADOOP_LIB_PATH/native directory
as follows:
cp -R --preserve=links /opt/oracle/oracle-spatial-graph/spatial/gdal/lib/. HADOOP_LIB_PATH/native
In the Resource Manager Node (and any backup Resource Manager Nodes), copy the gdal
data folder under /opt/oracle/oracle-spatial-graph/spatial/gdal
and gdalplugins
under /opt/oracle/oracle-spatial-graph/spatial/gdal
into the SHARED_FOLDER as follows:
cp -R /opt/oracle/oracle-spatial-graph/spatial/gdal/data SHARED_FOLDER
cp -R /opt/oracle/oracle-spatial-graph/spatial/gdal/gdalplugins SHARED_FOLDER
Create a folder ALL_ACCESS_FOLDER with write access for all users under SHARED_FOLDER. This folder must be named spatial for the test scripts to be executed as they are; otherwise, modify the test scripts accordingly. Follow these steps:
Go to the shared folder.
cd /opt/shareddir
Create a spatial folder.
mkdir spatial
Provide write access.
chmod 777 spatial
Copy the demo folder under /opt/oracle/oracle-spatial-graph/spatial/demo
into ALL_ACCESS_FOLDER
.
cp -R /opt/oracle/oracle-spatial-graph/spatial/demo /opt/shareddir/spatial
Provide write access to the imageserver
folder under demo as follows:
chmod 777 /opt/shareddir/spatial/demo/imageserver/
Copy the user library libproj.so
into the HADOOP_LIB_PATH/native
as follows:
cp libproj.so HADOOP_LIB_PATH/native
Create a folder native
under /opt/shareddir/spatial/demo/imageserver/
and copy the libproj.so
and GDAL libraries to that folder as follows:
mkdir /opt/shareddir/spatial/demo/imageserver/native
cp libproj.so /opt/shareddir/spatial/demo/imageserver/native
cp -R /opt/oracle/oracle-spatial-graph/spatial/gdal/lib/. /opt/shareddir/spatial/demo/imageserver/native
Provide read and execute permissions for the libproj.so
library to all users as follows:
chmod 755 /opt/shareddir/spatial/demo/imageserver/native/libproj.so
Set the following GDAL environment variables in the Resource Manager Node (and any backup Resource Manager Nodes):
GDAL_DRIVER_PATH=HADOOP_LIB_PATH/native/gdalplugins
GDAL_DATA=SHARED_FOLDER/data
Create or update the shared library path to include the GDAL libraries location:
LD_LIBRARY_PATH=HADOOP_LIB_PATH/native
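With the placeholders expanded, the three settings might be exported together as follows. This is a minimal sketch assuming a bash shell, the HADOOP_LIB_PATH identified earlier, and SHARED_FOLDER=/opt/shareddir; add these lines to a profile or Hadoop environment script at your site if they must persist:

export GDAL_DRIVER_PATH=$HADOOP_LIB_PATH/native/gdalplugins
export GDAL_DATA=/opt/shareddir/data
export LD_LIBRARY_PATH=$HADOOP_LIB_PATH/native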
Two test scripts are provided, one to test the Image Loading functionality and another to test the Image Processing functionality. Execute these two scripts as mentioned in this section to verify a successful installation of Image Processing Framework.
This test script loads three Hawaii images into HDFS, and one block is created for each one of them. The test script can be found in the following path: /opt/shareddir/spatial/demo/imageserver/runimageloader.sh
.
The script can be executed using the following example.
Note:
All users must have write permission to the parent folder /opt/shareddir/spatial/demo/imageserver.
From the command line, type: sudo -u hdfs ./runimageloader.sh
Upon successful execution, the created images and thumbnails on HDFS are listed after the message Generated ohifs files are, followed by the message Thumbnails created are and the list of thumbnails.
If the three images and their corresponding thumbnails are listed, then the loading process was successful and the installation and configuration completed successfully.
If the installation and configuration were not successful, then the output is not generated, and the message Not all the images were uploaded correctly, check for Hadoop logs is displayed.
This test script creates a mosaic with the three pre-loaded Hawaii images. The mosaic created is 1600 x 1447 pixels. The test script can be found at /opt/shareddir/spatial/demo/imageserver/runimageprocessor.sh
.
The script can be executed using the following example.
Note:
All users must have write permission to the parent folder /opt/shareddir/spatial/demo/imageserver.
From the command line, type: sudo -u hdfs ./runimageprocessor.sh
Upon successful execution, the following message is displayed: Expected output file: /opt/shareddir/spatial/processtest/littlemap.tif
If the installation and configuration were successful, then the output is generated and the message Mosaic image generated is displayed.
If the installation and configuration were not successful, then the output is not generated, and the message Mosaic was not successfully created, check for Hadoop logs is displayed.
You can access the image processing framework through the Oracle Big Data Spatial Image Server, which provides a web interface for loading and processing images.
Installing and configuring the Spatial Image Server depends upon the distribution being used.
Installing and Configuring the Image Server for Oracle Big Data Appliance
Installing and Configuring the Image Server Web for Other Systems (Not Big Data Appliance)
After you perform the installation, verify it (see Post-installation Verification Example for the Image Server Console).
Follow the instructions in this topic.
Download the latest Jetty core component binary from the Jetty download page http://www.eclipse.org/jetty/downloads.php
onto the Oracle BDA Resource Manager node.
Unzip the imageserver.war
file into the jetty webapps
directory or any other directory of choice as follows:
unzip /opt/oracle/oracle-spatial-graph/spatial/jlib/imageserver.war -d $JETTY_HOME/webapps/imageserver
Note:
The directory or location under which you unzip the file is known as $JETTY_HOME in this procedure.
Copy Hadoop dependencies as follows:
cp /opt/cloudera/parcels/CDH/lib/hadoop/client/* $JETTY_HOME/webapps/imageserver/WEB-INF/lib/
Optionally, edit the $JETTY_HOME/start.ini file and change the property jsp-impl=apache to jsp-impl=glassfish. You can download these jars from http://mvnrepository.com/ or another Apache jar provider:
xalan-2.7.1.jar
xercesImpl-2.11.0.jar
xml-apis-1.4.01.jar
serializer-2.7.1.jar
Copy these jars to $JETTY_HOME/lib/apache-jsp
.
Check the version by running: java -jar $JETTY_HOME/start.jar --version
Copy the gdal.jar
file under /opt/oracle/oracle-spatial-graph/spatial/jlib/gdal.jar
to $JETTY_HOME/lib/ext
.
Copy the /opt/oracle/oracle-spatial-graph/spatial/conf/jetty-imgserver-realm.properties
file to the $JETTY_HOME/etc folder.
Edit the $JETTY_HOME/etc/jetty-imgserver-realm.properties
file to add a password and role:
Remove the <password> placeholder and type a new password.
Remove the <> characters from the <admin_role> text, keeping admin_role.
Start the Jetty server by running: java -jar $JETTY_HOME/start.jar
Type the http://thehost:8080/imageserver/console.jsp
address in your browser address bar to open the console.
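If the console does not load, a quick way to confirm that the Jetty server is responding (thehost is a placeholder for your server name):

curl -I http://thehost:8080/imageserver/console.jsp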
Log in to the console using the credentials you created in "Installing Image Server Web on an Oracle Big Data Appliance."
From the Configuration tab, in the Hadoop Configuration Parameters section, change these three properties depending on the cluster configuration (example values follow this list):
fs.defaultFS
: Type the active namenode
of your cluster in the format hdfs://<namenode>:8020
(Check with the administrator for this information).
yarn.resourcemanager.scheduler.address
: Active Resource Manager scheduler address of your cluster, in the format <schedulername>:8030.
yarn.resourcemanager.address
: Active Resource Manager address, in the format <resourcename>:8032.
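For example, with hypothetical host names, the three values might look like this:

fs.defaultFS: hdfs://mynamenode.example.com:8020
yarn.resourcemanager.scheduler.address: myresourcemanager.example.com:8030
yarn.resourcemanager.address: myresourcemanager.example.com:8032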
Note:
Keep the default values for the rest of the configuration. They are pre-loaded for your Oracle Big Data Appliance cluster environment.
Click Apply Changes to save the changes.
Tip:
You can review the missing configuration information under the Hadoop Loader tab of the console.
Follow the instructions in this topic.
Follow the instructions specified in "Prerequisites for Installing the Image Processing Framework for Other Distributions."
Follow the instructions specified in "Installing the Image Processing Framework for Other Distributions."
Follow the instructions specified in "Configuring the environment."
Follow the instructions specified in "Prerequisites for installing Image Server on Oracle Big Data Appliance."
Follow the instructions specified in "Installing Image Server Web on an Oracle Big Data Appliance."
Follow the instructions specified in "Configuring the Environment."
Type the http://thehost:8080/imageserver/console.jsp
address in your browser address bar to open the console.
Log in to the console using the credentials you created in "Installing Image Server Web on an Oracle Big Data Appliance."
From the Configuration tab, in the Hadoop Configuration Parameters section, change the following properties depending on the cluster configuration:
Specify a shared folder from which to start browsing the images. This folder must be shared between the cluster and an NFS mountpoint (SHARED_FOLDER).
Create a child folder named saveimages under the Start folder, with full write access. For example, if Start=/home, then saveimages=/home/saveimages.
If the cluster requires a mount point to access the SHARED_FOLDER, specify the mount point, for example, /net/home. Otherwise, leave it blank and proceed.
Type the folder path that contains the Hadoop native libraries and additional libraries (HADOOP_LIB_PATH
).
yarn.application.classpath: Type the classpath where Hadoop finds the required jars and dependencies. Usually this is under /usr/lib/hadoop, as shown in the example below.
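For example, for a default layout under /usr/lib, the classpath might look like the following. This is a sketch derived from the Big Data Appliance example later in this chapter with /opt/cloudera/parcels/CDH/lib/ replaced by /usr/lib/, so verify the paths on your system:

/etc/hadoop/conf,/usr/lib/hadoop/*,/usr/lib/hadoop/lib/*,/usr/lib/hadoop-hdfs/*,/usr/lib/hadoop-hdfs/lib/*,/usr/lib/hadoop-yarn/*,/usr/lib/hadoop-yarn/lib/*,/usr/lib/hadoop-mapreduce/*,/usr/lib/hadoop-mapreduce/lib/*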
Note:
Keep the default values for the rest of the configuration. They are pre-loaded for your Oracle Big Data Appliance cluster environment.
Click Apply Changes to save the changes.
Tip:
You can review the missing configuration information under the Hadoop Loader tab of the console.
In this example, you will:
Load the images from the local server to the HDFS Hadoop cluster.
Run a job to create a mosaic image file and a catalog with several images.
View the mosaic image.
Related subtopics:
Open the Image Server Console: http://<hostname>:8080/imageserver/console.jsp
Log in using the default user/password: admin/admin.
Go to the Hadoop Loader
tab.
Click Open
and browse to the demo
folder that contains a set of Hawaii images. They can be found at /opt/shareddir/spatial/demo/imageserver/images
.
Select the images
folder and click Load images
.
Wait for the message, 'Images loaded successfully'.
Note:
If no errors were shown, then you have successfully installed the Image Loader web interface.
Go to the Raster Image processing
tab.
From the Catalog menu select Catalog
> New Catalog
> HDFS Catalog
.
A new catalog is created.
From the Imagery menu select Imagery
> Add hdfs image
.
Browse the HDFS host and add images.
A new file tree gets created with all the images you just loaded from your host.
Browse the newdata
folder and verify the images.
Select the images listed in the pre-visualizer and click Add
.
The images are added to the bottom sub-panel.
Click Add images
.
The images are added to the main catalog.
Save the catalog.
From the Imagery menu select Imagery
> Mosaic
.
Click Load default configuration file
, browse to /opt/shareddir/spatial/demo/imageserver
and select testFS.xml
.
Note:
The default configuration file testFS.xml is included in the demo.
Click Create Mosaic.
Wait for the image to be created.
Optionally, to download and view the image click Download
.
Follow the instructions in this topic.
Installing Spatial Hadoop Vector Console on Oracle Big Data Appliance
Installing Spatial Hadoop Vector Console for Other Systems (Not Big Data Appliance)
Configuring Spatial Hadoop Vector Console on Oracle Big Data Appliance
Configuring Spatial Hadoop Vector Console for Other Systems (Not Big Data Appliance)
The following assumptions and prerequisites apply to installing and configuring the Spatial Hadoop Vector Console.
The API and jobs described here run on a CDH5 or similar Hadoop environment.
Java 6 or a newer version is present in your environment.
MVSuggest must be installed separately. Download it from the http://www.oracle.com/technetwork/index.html
site.
Download the latest Jetty core component binary from the Jetty download page http://www.eclipse.org/jetty/downloads.php
onto the Oracle BDA Resource Manager node.
Unzip the spatialviewer.war
file into the jetty webapps
directory or any other directory of choice as follows:
unzip /opt/oracle/oracle-spatial-graph/spatial/jlib/spatialviewer.war -d $JETTY_HOME/webapps/spatialviewer
Note:
The directory or location under which you unzip the file is known as $JETTY_HOME in this procedure.
Copy Hadoop dependencies as follows:
cp /opt/cloudera/parcels/CDH/lib/hadoop/client/* $JETTY_HOME/webapps/spatialviewer/WEB-INF/lib/
Complete the configuration steps mentioned in the "Configuring Spatial Hadoop Vector Console on Oracle Big Data Appliance."
Start the Jetty server: java -jar $JETTY_HOME/start.jar
Follow the steps mentioned in "Installing Spatial Hadoop Vector Console on Oracle Big Data Appliance." However, in step 3 replace the path /opt/cloudera/parcels/CDH/lib/ with /usr/lib/.
Edit the configuration file $JETTY_HOME/webapps/spatialviewer/conf/console-conf.xml
to specify your own mail-server data and to change the directory used by the console to build temporary hierarchical indexes. You can also change the properties of the Hadoop jobs. The configuration parameters are:
Edit the notification URL: This is the URL where the console server is running. It must be visible to the Hadoop cluster so that the end of the jobs can be notified. An example setting: <baseurl>http://hadoop.console.url:8080</baseurl>
Directory with temporary hierarchical indexes: An HDFS path that will contain temporary data on hierarchical relationships. An example setting: <hierarchydataindexpath>hdfs://hadoop.cluster.url:8020/user/myuser/hierarchyIndexPath</hierarchydataindexpath>
General Hadoop jobs configuration: The console uses two Hadoop jobs. The first creates a spatial index on existing files in HDFS, and the second generates the display results based on the index. One part of the configuration is common to both jobs and another is specific to each job. The common configuration can be found within the <hadoopjobs><configuration> elements. An example configuration is given here:
<hadoopjobs>
  <configuration>
    <property>
      <!-- Hadoop user. The user is a mandatory property. -->
      <name>hadoop.job.ugi</name>
      <value>hdfs</value>
    </property>
    <property>
      <!-- As defined in core-site.xml. If in core-site.xml the path
           fs.defaultFS is defined as the nameservice ID (High Availability
           configuration), then set the full address and IPC port of the
           currently active name node. The service is defined in the file
           hdfs-site.xml. -->
      <name>fs.defaultFS</name>
      <value>hdfs://hadoop.cluster.url:8020</value>
    </property>
    <property>
      <!-- As defined in mapred-site.xml -->
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
    </property>
    <property>
      <!-- As defined in yarn-site.xml -->
      <name>yarn.resourcemanager.scheduler.address</name>
      <value>hadoop.cluster.url:8030</value>
    </property>
    <property>
      <!-- As defined in yarn-site.xml -->
      <name>yarn.resourcemanager.address</name>
      <value>hadoop.cluster.url:8032</value>
    </property>
    <property>
      <!-- As defined in yarn-site.xml (full path) -->
      <name>yarn.application.classpath</name>
      <value>/etc/hadoop/conf/,/opt/cloudera/parcels/CDH/lib/hadoop/*,/opt/cloudera/parcels/CDH/lib/hadoop/lib/*,/opt/cloudera/parcels/CDH/lib/hadoop-hdfs/*,/opt/cloudera/parcels/CDH/lib/hadoop-hdfs/lib/*,/opt/cloudera/parcels/CDH/lib/hadoop-yarn/*,/opt/cloudera/parcels/CDH/lib/hadoop-yarn/lib/*,/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/*,/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/lib/*</value>
    </property>
  </configuration>
</hadoopjobs>
Index job-specific configuration: Additional Hadoop parameters can be specified for the job that creates the spatial indexes. An example additional configuration is given here:
<hadoopjobs>
  <configuration>
    ...
  </configuration>
  <indexjobadditionalconfiguration>
    <property>
      <!-- Increase mapred.max.split.size so that fewer mappers are allocated
           and the mapper initialization overhead is reduced. -->
      <name>mapred.max.split.size</name>
      <value>1342177280</value>
    </property>
  </indexjobadditionalconfiguration>
</hadoopjobs>
Hierarchical job-specific configuration: The same applies to the second job, which generates the results. The following is an example property setting:
<hadoopjobs>
  <configuration>
    ...
  </configuration>
  <indexjobadditionalconfiguration>
    ...
  </indexjobadditionalconfiguration>
  <hierarchicaljobadditionalconfiguration>
    <property>
      <!-- Increase mapred.max.split.size so that fewer mappers are allocated
           and the mapper initialization overhead is reduced. -->
      <name>mapred.max.split.size</name>
      <value>1342177280</value>
    </property>
  </hierarchicaljobadditionalconfiguration>
</hadoopjobs>
Specify the notification emails: Email notifications are sent to report the job completion status. This is defined within the <notificationmails> element. It is mandatory to specify a user (<user>), a password (<password>), and a sender email (<mailfrom>). In the <configuration> element, set the configuration properties needed by Java Mail. The following example shows a typical configuration for sending mail through an SMTP server using an SSL connection:
<notificationmails>
  <!-- Authentication parameters. The authentication parameters are mandatory. -->
  <user>user@mymail.com</user>
  <password>mypassword</password>
  <mailfrom>user@mymail.com</mailfrom>
  <!-- Parameters that will be set as system properties. Below are the
       parameters needed to send mail via an SMTP server using an SSL
       connection. -->
  <configuration>
    <property>
      <name>mail.smtp.host</name>
      <value>mail.host.com</value>
    </property>
    <property>
      <name>mail.smtp.socketFactory.port</name>
      <value>myport</value>
    </property>
    <property>
      <name>mail.smtp.socketFactory.class</name>
      <value>javax.net.ssl.SSLSocketFactory</value>
    </property>
    <property>
      <name>mail.smtp.auth</name>
      <value>true</value>
    </property>
  </configuration>
</notificationmails>
Follow the steps mentioned in "Configuring Spatial Hadoop Vector Console on Oracle Big Data Appliance." However, in step 1 C (General Hadoop Job Configuration
), in the Hadoop property yarn.application.classpath
replace the /opt/cloudera/parcels/CDH/lib/
with the actual library path, by default /usr/lib/
.
You can use property graphs on either Oracle Big Data Appliance or commodity hardware.
The following prerequisites apply to installing property graph support in HBase.
Linux operating system
Cloudera's Distribution including Apache Hadoop (CDH)
For the software download, see: http://www.cloudera.com/content/cloudera/en/products-and-services/cdh.html
Apache HBase
Java Development Kit
Details about supported versions of these products, including any interdependencies, will be provided in a My Oracle Support note.
To install property graph support, follow these steps.
Install the software package:
rpm -i oracle-spatial-graph-1.0-1.x86_64.rpm
By default, the software is installed in the following directory: /opt/oracle/
After the installation completes, the /opt/oracle/oracle-spatial-graph
directory exists and includes a property_graph
subdirectory.
Set the JAVA_HOME environment variable. For example:
setenv JAVA_HOME /usr/local/packages/jdk7
Set the PGX_HOME environment variable. For example:
setenv PGX_HOME /opt/oracle/oracle-spatial-graph/pgx
If HBase will be used, set the HBASE_HOME
environment variable in all HBase region servers in the Apache Hadoop cluster. (HBASE_HOME
specifies the location of the hbase
installation directory.) For example:
setenv HBASE_HOME /usr/lib/hbase
Note that on some installations of Big Data Appliance, Apache HBase is placed in a directory like the following: /opt/cloudera/parcels/CDH-5.3.3-1.cdh5.3.3.p0.5/lib/hbase/
If HBase will be used, copy the data access layer library into $HBASE_HOME/lib. For example:
cp /opt/oracle/oracle-spatial-graph/property_graph/lib/sdopgdal*.jar $HBASE_HOME/lib
Tune the HBase or Oracle NoSQL Database configuration, as described in other tuning topics.
Log in to Cloudera Manager as the admin
user, and restart the HBase service. Restarting enables the Region Servers to use the new configuration settings.
The installation directory for Oracle Big Data Spatial and Graph property graph features has the following structure:
$ tree -dFL 2 /opt/oracle/oracle-spatial-graph/property_graph/
/opt/oracle/oracle-spatial-graph/property_graph/
|-- dal
| |-- groovy
| |-- opg-solr-config
| `-- webapp
|-- data
|-- doc
| |-- dal
| `-- pgx
|-- examples
| |-- dal
| |-- pgx
| `-- pyopg
|-- lib
|-- librdf
`-- pgx
|-- bin
|-- conf
|-- groovy
|-- scripts
|-- webapp
`-- yarn
Follow this installation task if property graph support is installed on a client without Hadoop, and you want to read graph data stored in the Hadoop Distributed File System (HDFS) into in-memory analytics and write the results back to HDFS, and/or use Hadoop NextGen MapReduce (YARN) scheduling to start, monitor, and stop in-memory analytics.
To install and configure Hadoop, follow these steps.
Download the tarball for a supported version of the Cloudera CDH.
Unpack the tarball into a directory of your choice. For example:
tar xvf hadoop-2.5.0-cdh5.2.1.tar.gz -C /opt
Have the HADOOP_HOME
environment variable point to the installation directory. For example:
export HADOOP_HOME=/opt/hadoop-2.5.0-cdh5.2.1
Add $HADOOP_HOME/bin
to the PATH
environment variable. For example:
export PATH=$HADOOP_HOME/bin:$PATH
Configure $HADOOP_HOME/etc/hadoop/hdfs-site.xml
to point to the HDFS name node of your Hadoop cluster.
Configure $HADOOP_HOME/etc/hadoop/yarn-site.xml
to point to the resource manager node of your Hadoop cluster.
Configure the fs.defaultFS
field in $HADOOP_HOME/etc/hadoop/core-site.xml
to point to the HDFS name node of your Hadoop cluster.
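For example, the fs.defaultFS entry in core-site.xml might look like the following (namenode.example.com and the port are placeholders for your cluster's name node address):

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode.example.com:8020</value>
  </property>
</configuration>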
When running a Java application using in-memory analytics and HDFS, make sure that $HADOOP_HOME/etc/hadoop
is on the classpath, so that the configurations get picked up by the Hadoop client libraries. However, you do not need to do this when using the In-Memory Analytics Shell, because it adds $HADOOP_HOME/etc/hadoop
automatically to the classpath if HADOOP_HOME
is set.
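For example, a hypothetical application class MyAnalyticsApp packaged in myapp.jar (both names are placeholders) could be launched with the Hadoop configuration directory on its classpath as follows:

java -cp $HADOOP_HOME/etc/hadoop:myapp.jar MyAnalyticsApp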
You do not need to put any extra Cloudera Hadoop libraries (JAR files) on the classpath. The only time you need the YARN libraries is when starting In-Memory Analytics as a YARN service. This is done with the yarn
command, which automatically adds all necessary JAR files from your local installation to the classpath.
You are now ready to load data from HDFS or start In-Memory Analytics as a YARN service. For further information about Hadoop, refer to the CDH 5.2.x documentation.