Installing Oracle Big Data Spatial and Graph on an Oracle Big Data Appliance
Installing and Configuring the Big Data Spatial Image Processing Framework
Installing and Configuring the Big Data Spatial Image Server
Installing the Oracle Big Data Spatial Hadoop Vector Console
Installing Property Graph Support on a CDH Cluster or Other Hardware
Oracle Big Data Spatial and Graph delivers advanced spatial and graph analytic capabilities to supported Apache Hadoop and NoSQL Database Big Data platforms.
The spatial features include support for data enrichment of location information; spatial filtering and categorization based on distance and location-based analysis; spatial data processing for vector and raster processing of digital map, sensor, satellite, and aerial imagery; and APIs for map visualization.
The property graph features support Apache Hadoop HBase and Oracle NoSQL Database for graph operations, indexing, queries, search, and in-memory analytics.
The multimedia analytics features provide a framework for processing video and image data in Apache Hadoop, including built-in face recognition using OpenCV.
Spatial location information is a common element of Big Data. Businesses can use spatial data as the basis for associating and linking disparate data sets. Location information can also be used to track and categorize entities based on proximity to another person, place, or object, or on their presence in a particular area. Location information can facilitate location-specific offers to customers entering a particular geography, a technique known as geo-fencing. Georeferenced imagery and sensor data can be analyzed for a variety of business benefits.
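As an illustration of the geo-fencing idea described above, the following Python sketch (hypothetical, not part of the product APIs) checks whether a customer's location falls within a given radius of a store using the haversine great-circle distance:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometers."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def inside_geofence(point, center, radius_km):
    """True if point lies within radius_km of center; both are (lat, lon) tuples."""
    return haversine_km(point[0], point[1], center[0], center[1]) <= radius_km

# A customer near the store location triggers the offer; a distant one does not.
store = (37.7749, -122.4194)  # hypothetical store coordinates
print(inside_geofence((37.7790, -122.4140), store, 1.0))   # nearby point -> True
print(inside_geofence((34.0522, -118.2437), store, 1.0))   # far away    -> False
```

The store coordinates and radius are invented for the example; a production geo-fence would typically use polygon containment rather than a simple radius.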
The spatial features of Oracle Big Data Spatial and Graph support those use cases with the following kinds of services.
Vector Services:
Ability to associate documents and data with names, such as cities or states, or longitude/latitude information in spatial object definitions for a default administrative hierarchy
Support for text-based 2D and 3D geospatial formats, including GeoJSON files, Shapefiles, GML, and WKT; alternatively, you can use the Geospatial Data Abstraction Library (GDAL) to convert popular geospatial encodings such as Oracle SDO_Geometry, ST_Geometry, and other supported formats
An HTML5-based map client API and a sample console to explore, categorize, and view data in a variety of formats and coordinate systems
Topological and distance operations: Anyinteract, Inside, Contains, Within Distance, Nearest Neighbor, and others
Spatial indexing for fast retrieval of data
Raster Services:
Support for many image file formats supported by GDAL and image files stored in HDFS
A sample console to view the set of images that are available
Raster operations, including subsetting, georeferencing, mosaics, and format conversion
Graphs manage networks of linked data as vertices, edges, and properties of the vertices and edges. Graphs are commonly used to model, store, and analyze relationships found in social networks, cyber security, utilities and telecommunications, life sciences and clinical data, and knowledge networks.
Typical graph analyses encompass graph traversal, recommendations, finding communities and influencers, and pattern matching. Industries including telecommunications, life sciences and healthcare, security, and media and publishing can benefit from graphs.
The property graph features of Oracle Big Data Spatial and Graph support those use cases with the following capabilities:
A scalable graph database on Apache HBase and Oracle NoSQL Database
Developer APIs based on Tinkerpop Blueprints, and Java graph APIs
Text search and query through integration with Apache Lucene and SolrCloud
Scripting language support for Groovy and Python
A parallel, in-memory graph analytics engine
A fast, scalable suite of social network analysis functions, including ranking, centrality, recommendation, community detection, and path finding
Parallel bulk load and export of property graph data in Oracle-defined flat file format
Manageability through a Groovy-based console to execute Java and Tinkerpop Gremlin APIs
The following are recommendations for property graph installation.
Table 1-1 Property Graph Sizing Recommendations
Graph Size | Recommended Physical Memory to be Dedicated | Recommended Number of CPU Processors
---|---|---
10M to 100M edges | Up to 14 GB RAM | 2 to 4 processors, and up to 16 processors for more compute-intensive workloads
100M to 1B edges | 14 GB to 100 GB RAM | 4 to 12 processors, and up to 16 to 32 processors for more compute-intensive workloads
Over 1B edges | Over 100 GB RAM | 12 to 32 processors, or more for especially compute-intensive workloads
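As a rough illustration only, the guidelines in Table 1-1 can be expressed as a small lookup helper. The thresholds and wording come from the table; the function itself is hypothetical and not part of the product:

```python
def property_graph_sizing(edge_count):
    """Return (RAM guideline, CPU guideline) per the sizing table above.
    Thresholds follow Table 1-1; treat them as guidelines, not hard limits."""
    if edge_count <= 100_000_000:
        return ("up to 14 GB RAM",
                "2-4 processors (up to 16 for compute-intensive workloads)")
    elif edge_count <= 1_000_000_000:
        return ("14-100 GB RAM",
                "4-12 processors (up to 16-32 for compute-intensive workloads)")
    else:
        return ("over 100 GB RAM",
                "12-32 processors, or more")

ram, cpus = property_graph_sizing(500_000_000)
print(ram)   # 14-100 GB RAM
```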
The multimedia analytics feature of Oracle Big Data Spatial and Graph provides a framework for processing video and image data in Apache Hadoop. The framework enables distributed processing of video and image data.
A main use case is performing facial recognition in videos and images.
The Mammoth command-line utility for installing and configuring the Oracle Big Data Appliance software also installs the Oracle Big Data Spatial and Graph option, including the spatial, property graph, and multimedia capabilities. You can enable this option during an initial software installation, or afterward using the bdacli utility.
To use Oracle NoSQL Database as a graph repository, you must have an Oracle NoSQL Database cluster.
To use Apache HBase as a graph repository, you must have an Apache Hadoop cluster.
See Also:
Oracle Big Data Appliance Owner's Guide for software configuration instructions.
Installing and configuring the Image Processing Framework depends upon the distribution being used.
The Oracle Big Data Appliance cluster distribution comes with a pre-installed setup, but you must follow a few steps in Installing the Image Processing Framework for Oracle Big Data Appliance Distribution to get it working.
For a commodity distribution, follow the instructions in Installing the Image Processing Framework for Other Distributions (Not Oracle Big Data Appliance).
For both distributions:
You must download and compile PROJ libraries, as explained in Getting and Compiling the Cartographic Projections Library.
After performing the installation, verify it (see Post-installation Verification of the Image Processing Framework).
If the cluster has security enabled, make sure that the user executing the jobs is in the princs list and has an active Kerberos ticket.
Before installing the Image Processing Framework, you must download the Cartographic Projections Library and perform several related operations.
Download the PROJ.4 source code and datum shifting files:
$ wget http://download.osgeo.org/proj/proj-4.9.1.tar.gz
$ wget http://download.osgeo.org/proj/proj-datumgrid-1.5.tar.gz
Untar the source code, and extract the datum shifting files in the nad subdirectory:
$ tar xzf proj-4.9.1.tar.gz
$ cd proj-4.9.1/nad
$ tar xzf ../../proj-datumgrid-1.5.tar.gz
$ cd ..
Configure, make, and install PROJ.4:
$ ./configure
$ make
$ sudo make install
$ cd ..
libproj.so is now available at /usr/local/lib/libproj.so.
Create a link to the libproj.so file in the spatial installation directory:
sudo ln -s /usr/local/lib/libproj.so /opt/oracle/oracle-spatial-graph/spatial/raster/gdal/lib/libproj.so
Provide read and execute permissions for the libproj.so library for all users:
sudo chmod 755 /opt/oracle/oracle-spatial-graph/spatial/raster/gdal/lib/libproj.so
The Oracle Big Data Appliance distribution comes with a pre-installed configuration. However, be sure that the actions described in Getting and Compiling the Cartographic Projections Library have been performed, so that libproj.so (PROJ.4) is accessible to all users and is set up correctly.
For OBDA, ensure that the following directories exist:
SHARED_DIR (shared directory for all nodes in the cluster): /opt/shareddir
ALL_ACCESS_DIR (shared directory for all nodes in the cluster with Write access to the hadoop group): /opt/shareddir/spatial
For Big Data Spatial and Graph in environments other than the Big Data Appliance, follow the instructions in this section.
Ensure that HADOOP_LIB_PATH is under /usr/lib/hadoop. If it is not there, find the path and use it as your HADOOP_LIB_PATH.
Install NFS.
Have at least one folder, referred to in this document as SHARED_FOLDER, in the Resource Manager node accessible to every Node Manager node through NFS.
Provide write access to this SHARED_FOLDER for all the users involved in job execution, as well as the yarn user.
Download oracle-spatial-graph-<version>.x86_64.rpm from the Oracle e-delivery web site.
Execute oracle-spatial-graph-<version>.x86_64.rpm using the rpm command.
After rpm executes, verify that the directory structure created at /opt/oracle/oracle-spatial-graph/spatial/raster contains these folders: console, examples, jlib, gdal, and tests. Additionally, index.html describes the content, and javadoc.zip contains the Javadoc for the API.
Several test scripts are provided to:
Test the image loading functionality
Test the image processing functionality
Test a processing class for slope calculation in a DEM and a map algebra operation
Verify the image processing of a single raster with no mosaic process (it includes a user-provided function that calculates hill shade in the mapping phase).
Test processing of two rasters using a mask operation
Execute these scripts to verify a successful installation of the image processing framework.
If the cluster has security enabled, make sure the current user is in the princs list and has an active Kerberos ticket.
Make sure the user has write access to ALL_ACCESS_FOLDER and belongs to the owner group of this directory. It is recommended that jobs be executed in the Resource Manager node for Big Data Appliance. If jobs are executed in a different node, then the default is the hadoop group.
This script loads a set of six test rasters into the ohiftest folder in HDFS: 3 rasters of byte data type with 3 bands, 1 raster (DEM) of float32 data type with 1 band, and 2 rasters of int32 data type with 1 band. No parameters are required for OBDA environments, and a single parameter with the ALL_ACCESS_FOLDER value is required for non-OBDA environments.
Internally, the job creates a split for every raster to load. Split size depends on the block size configuration; for example, if a block size of 64 MB or more is configured, 4 mappers will run. As a result, the rasters are loaded into HDFS and a corresponding thumbnail is created for visualization. An external image editor is required to visualize the thumbnails, and an output path for these thumbnails is provided to the users upon successful completion of the job.
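The relationship between block size and mapper count can be sketched with a simplified model: each raster contributes roughly one input split per block, and one mapper runs per split. The raster sizes below are hypothetical, and the loader's actual split logic may differ:

```python
import math

def estimated_mappers(raster_sizes_mb, block_size_mb):
    """Each raster contributes ceil(size / block_size) input splits;
    one mapper runs per split (a simplified model of the loader job)."""
    return sum(math.ceil(size / block_size_mb) for size in raster_sizes_mb)

# Hypothetical sizes for six small test rasters, in MB.
rasters = [10, 12, 9, 55, 3, 4]
print(estimated_mappers(rasters, 64))   # rasters smaller than a block: one split each -> 6
print(estimated_mappers(rasters, 8))    # smaller blocks -> more splits -> 15
```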
The test script can be found here:
/opt/oracle/oracle-spatial-graph/spatial/raster/tests/runimageloader.sh
For OBDA environments, enter:
./runimageloader.sh
For non-OBDA environments, enter:
./runimageloader.sh ALL_ACCESS_FOLDER
Upon successful execution, the message GENERATED OHIF FILES ARE LOCATED IN HDFS UNDER is displayed, with the path in HDFS where the files are located (this path depends on the definition of ALL_ACCESS_FOLDER) and a list of the created images and thumbnails on HDFS. The output may include:
THUMBNAILS CREATED ARE:
----------------------------------------------------------------------
total 13532
drwxr-xr-x 2 yarn yarn    4096 Sep  9 13:54 .
drwxr-xr-x 3 yarn yarn    4096 Aug 27 11:29 ..
-rw-r--r-- 1 yarn yarn 3214053 Sep  9 13:54 hawaii.tif.ohif.tif
-rw-r--r-- 1 yarn yarn 3214053 Sep  9 13:54 inputimageint32.tif.ohif.tif
-rw-r--r-- 1 yarn yarn 3214053 Sep  9 13:54 inputimageint32_1.tif.ohif.tif
-rw-r--r-- 1 yarn yarn 3214053 Sep  9 13:54 kahoolawe.tif.ohif.tif
-rw-r--r-- 1 yarn yarn 3214053 Sep  9 13:54 maui.tif.ohif.tif
-rw-r--r-- 1 yarn yarn 4182040 Sep  9 13:54 NapaDEM.tif.ohif.tif
YOU MAY VISUALIZE THUMBNAILS OF THE UPLOADED IMAGES FOR REVIEW FROM THE FOLLOWING PATH:
If the installation and configuration were not successful, then the output is not generated and a message like the following is displayed:
NOT ALL THE IMAGES WERE UPLOADED CORRECTLY, CHECK FOR HADOOP LOGS
The amount of memory required to execute mappers and reducers depends on the configured HDFS block size. By default, 1 GB of memory is assigned for Java, but you can modify that and other properties in the imagejob.prop file that is included in this test directory.
This script executes the processor job by setting three source rasters of the Hawaii islands and some coordinates that include all three. The job will create a mosaic based on these coordinates, and the resulting raster should include the three rasters combined into a single one.
runimageloader.sh should be executed as a prerequisite, so that the source rasters exist in HDFS. These are 3-band rasters of byte data type.
No parameters are required for OBDA environments, and a single parameter -s with the ALL_ACCESS_FOLDER value is required for non-OBDA environments.
Additionally, if the output should be stored in HDFS, the -o parameter must be used to set the HDFS folder where the mosaic output will be stored.
Internally, the job filters the tiles using the coordinates specified in the configuration input XML; only the required tiles are processed in a mapper, and finally, in the reduce phase, all of them are put together into the resulting mosaic raster.
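The tile-filtering step can be illustrated with a minimal bounding-box intersection test. The tile names and coordinates below are invented for the example, and the framework's actual filtering is more involved:

```python
def intersects(tile_mbr, query_mbr):
    """Axis-aligned MBR intersection test; each MBR is (minx, miny, maxx, maxy)."""
    tminx, tminy, tmaxx, tmaxy = tile_mbr
    qminx, qminy, qmaxx, qmaxy = query_mbr
    return not (tmaxx < qminx or tminx > qmaxx or tmaxy < qminy or tminy > qmaxy)

# Hypothetical tiles; only those overlapping the query window are processed.
tiles = {
    "t0": (0, 0, 10, 10),
    "t1": (10, 0, 20, 10),
    "t2": (40, 40, 50, 50),
}
query = (5, 5, 15, 15)
selected = sorted(name for name, mbr in tiles.items() if intersects(mbr, query))
print(selected)   # ['t0', 't1']
```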
The test script can be found here:
/opt/oracle/oracle-spatial-graph/spatial/raster/tests/runimageprocessor.sh
For OBDA environments, enter:
./runimageprocessor.sh
For non-OBDA environments, enter:
./runimageprocessor.sh -s ALL_ACCESS_FOLDER
Upon successful execution, the message EXPECTED OUTPUT FILE IS: ALL_ACCESS_FOLDER/processtest/hawaiimosaic.tif is displayed, with the path to the output mosaic file. The output may include:
EXPECTED OUTPUT FILE IS: ALL_ACCESS_FOLDER/processtest/hawaiimosaic.tif
total 9452
drwxrwxrwx 2 hdfs    hdfs    4096 Sep 10 09:12 .
drwxrwxrwx 9 zherena dba     4096 Sep  9 13:50 ..
-rwxrwxrwx 1 yarn    yarn 4741101 Sep 10 09:12 hawaiimosaic.tif
MOSAIC IMAGE GENERATED
----------------------------------------------------------------------
YOU MAY VISUALIZE THE MOSAIC OUTPUT IMAGE FOR REVIEW IN THE FOLLOWING PATH: ALL_ACCESS_FOLDER/processtest/hawaiimosaic.tif
If the installation and configuration were not successful, then the output is not generated and a message like the following is displayed:
MOSAIC WAS NOT SUCCESSFULLY CREATED, CHECK HADOOP LOGS TO REVIEW THE PROBLEM
To test the output storage in HDFS, use the following commands.
For OBDA environments, enter:
./runimageprocessor.sh -o hdfstest
For non-OBDA environments, enter:
./runimageprocessor.sh -s ALL_ACCESS_FOLDER -o hdfstest
This script executes the processor job for a single raster, in this case a DEM source raster of North Napa Valley. The purpose of this job is to process the complete input by using the user processing classes configured for the mapping phase. This class calculates the hillshade of the DEM, which is written to the output file. No mosaic operation is performed here.
runimageloader.sh should be executed as a prerequisite, so that the source raster exists in HDFS. This is a 1-band DEM raster of float32 data type.
No parameters are required for OBDA environments, and a single parameter -s with the ALL_ACCESS_FOLDER value is required for non-OBDA environments.
The test script can be found here:
/opt/oracle/oracle-spatial-graph/spatial/raster/tests/runsingleimageprocessor.sh
For OBDA environments, enter:
./runsingleimageprocessor.sh
For non-OBDA environments, enter:
./runsingleimageprocessor.sh -s ALL_ACCESS_FOLDER
Upon successful execution, the message EXPECTED OUTPUT FILE: ALL_ACCESS_FOLDER/processtest/NapaDEM.tif is displayed, with the path to the output DEM file. The output may include:
EXPECTED OUTPUT FILE: ALL_ACCESS_FOLDER/processtest/NapaDEM.tif
total 4808
drwxrwxrwx 2 hdfs    hdfs    4096 Sep 10 09:42 .
drwxrwxrwx 9 zherena dba     4096 Sep  9 13:50 ..
-rwxrwxrwx 1 yarn    yarn 4901232 Sep 10 09:42 NapaDEM.tif
IMAGE GENERATED
----------------------------------------------------------------------
YOU MAY VISUALIZE THE OUTPUT IMAGE FOR REVIEW IN THE FOLLOWING PATH: ALL_ACCESS_FOLDER/processtest/NapaDEM.tif
If the installation and configuration were not successful, then the output is not generated and a message like the following is displayed:
IMAGE WAS NOT SUCCESSFULLY CREATED, CHECK HADOOP LOGS TO REVIEW THE PROBLEM
This script executes the processor job by setting a DEM source raster of North Napa Valley and some coordinates that surround it. The job will create a mosaic based on these coordinates and will also calculate the slope on it by setting a processing class in the mosaic configuration XML.
runimageloader.sh should be executed as a prerequisite, so that the source rasters exist in HDFS. This is a 1-band DEM raster of float32 data type.
No parameters are required for OBDA environments, and a single parameter -s with the ALL_ACCESS_FOLDER value is required for non-OBDA environments.
The test script can be found here:
/opt/oracle/oracle-spatial-graph/spatial/raster/tests/runimageprocessordem.sh
For OBDA environments, enter:
./runimageprocessordem.sh
For non-OBDA environments, enter:
./runimageprocessordem.sh -s ALL_ACCESS_FOLDER
Upon successful execution, the message EXPECTED OUTPUT FILE: ALL_ACCESS_FOLDER/processtest/NapaSlope.tif is displayed, with the path to the slope output file. The output may include:
EXPECTED OUTPUT FILE: ALL_ACCESS_FOLDER/processtest/NapaSlope.tif
total 4808
drwxrwxrwx 2 hdfs    hdfs    4096 Sep 10 09:42 .
drwxrwxrwx 9 zherena dba     4096 Sep  9 13:50 ..
-rwxrwxrwx 1 yarn    yarn 4901232 Sep 10 09:42 NapaSlope.tif
MOSAIC IMAGE GENERATED
----------------------------------------------------------------------
YOU MAY VISUALIZE THE MOSAIC OUTPUT IMAGE FOR REVIEW IN THE FOLLOWING PATH: ALL_ACCESS_FOLDER/processtest/NapaSlope.tif
If the installation and configuration were not successful, then the output is not generated and a message like the following is displayed:
MOSAIC WAS NOT SUCCESSFULLY CREATED, CHECK HADOOP LOGS TO REVIEW THE PROBLEM
You may also test the "if" algebra function, where every pixel in this raster with a value greater than 2500 will be replaced by the value you set in the command line using the -c flag. For example:
For OBDA environments, enter:
./runimageprocessordem.sh -c 8000
For non-OBDA environments, enter:
./runimageprocessordem.sh -s ALL_ACCESS_FOLDER -c 8000
You can visualize the output file and notice the difference between the simple slope calculation and this altered output, where the areas with pixel values greater than 2500 appear clearer.
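Conceptually, the "if" algebra operation behaves like the following per-pixel sketch (hypothetical; the real operation runs distributed over raster tiles):

```python
def if_replace(pixels, threshold, new_value):
    """Replace every pixel greater than threshold with new_value,
    mirroring the "if" algebra operation driven by the -c flag."""
    return [new_value if p > threshold else p for p in pixels]

# A hypothetical row of DEM pixel values; 2500 itself is not replaced.
row = [1200, 2600, 2500, 3100, 900]
print(if_replace(row, 2500, 8000))   # [1200, 8000, 2500, 8000, 900]
```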
This script executes the processor job for two rasters that cover a very small area of North Napa Valley in the US state of California.
These rasters have the same MBR, pixel size, SRID, and data type, all of which are required for complex multiple-raster operation processing. The purpose of this job is to process both rasters by using the mask operation, which checks every pixel in the second raster to determine whether its value is contained in the mask list. If it is, the output raster will have the pixel value of the first raster for this output cell; otherwise, the zero (0) value is set. No mosaic operation is performed here.
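The per-pixel mask logic described above can be sketched as follows; the rasters are represented as flat Python lists purely for illustration:

```python
def mask_rasters(first, second, mask_values):
    """For each cell, keep the first raster's value when the second raster's
    value is in mask_values; otherwise emit 0 (per the mask test description)."""
    allowed = set(mask_values)
    return [a if b in allowed else 0 for a, b in zip(first, second)]

# Hypothetical 1-band int32 cell values for two aligned rasters.
first  = [10, 20, 30, 40]
second = [ 1,  5,  1,  9]
print(mask_rasters(first, second, [1, 9]))   # [10, 0, 30, 40]
```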
runimageloader.sh should be executed as a prerequisite, so that the source rasters exist in HDFS. These are 1-band rasters of int32 data type.
No parameters are required for OBDA environments. For non-OBDA environments, a single parameter -s with the ALL_ACCESS_FOLDER value is required.
The test script can be found here:
/opt/oracle/oracle-spatial-graph/spatial/raster/tests/runimageprocessormultiple.sh
For OBDA environments, enter:
./runimageprocessormultiple.sh
For non-OBDA environments, enter:
./runimageprocessormultiple.sh -s ALL_ACCESS_FOLDER
Upon successful execution, the message EXPECTED OUTPUT FILE: ALL_ACCESS_FOLDER/processtest/MaskInt32Rasters.tif is displayed, with the path to the mask output file. The output may include:
EXPECTED OUTPUT FILE: ALL_ACCESS_FOLDER/processtest/MaskInt32Rasters.tif
total 4808
drwxrwxrwx 2 hdfs    hdfs    4096 Sep 10 09:42 .
drwxrwxrwx 9 zherena dba     4096 Sep  9 13:50 ..
-rwxrwxrwx 1 yarn    yarn 4901232 Sep 10 09:42 MaskInt32Rasters.tif
IMAGE GENERATED
----------------------------------------------------------------------
YOU MAY VISUALIZE THE OUTPUT IMAGE FOR REVIEW IN THE FOLLOWING PATH: ALL_ACCESS_FOLDER/processtest/MaskInt32Rasters.tif
If the installation and configuration were not successful, then the output is not generated and a message like the following is displayed:
IMAGE WAS NOT SUCCESSFULLY CREATED, CHECK HADOOP LOGS TO REVIEW THE PROBLEM
You can access the image processing framework through the Oracle Big Data Spatial Image Server, which provides a web interface for loading and processing images.
Installing and configuring the Spatial Image Server depends upon the distribution being used.
Installing and Configuring the Image Server for Oracle Big Data Appliance
Installing and Configuring the Image Server Web for Other Systems (Not Big Data Appliance)
After you perform the installation, verify it (see Post-Installation Verification Example for the Image Server Console).
To perform an automatic installation using the provided script, follow these steps:
Run the following script:
sudo /opt/oracle/oracle-spatial-graph/spatial/configure-server/install-bdsg-consoles.sh
If the active nodes have changed since the installation, update the configuration in the web console.
Start the server:
cd /opt/oracle/oracle-spatial-graph/spatial/web-server
sudo ./start-server.sh
If any errors occur, see the README file located in /opt/oracle/oracle-spatial-graph/spatial/configure-server.
The preceding instructions configure the entire server. If no further configuration is required, you can go directly to Post-Installation Verification Example for the Image Server Console.
If you need more information or need to perform other actions, see the following topics:
Ensure that you have the prerequisite software installed.
Copy the asm-3.1.jar file from /opt/oracle/oracle-spatial-graph/spatial/raster/jlib/asm-3.1.jar to WEB_SERVER_HOME/webapps/imageserver/WEB-INF/lib.
Note:
The jersey-core* jars will be duplicated at WEB_SERVER_HOME/webapps/imageserver/WEB-INF/lib. Make sure you remove the old ones and leave just jersey-core-1.17.1.jar in the folder, as in the next step.
Enter the following command:
ls -lat jersey-core*
Delete the listed libraries, except jersey-core-1.17.1.jar.
In the same directory (WEB_SERVER_HOME/webapps/imageserver/WEB-INF/lib), delete the xercesImpl and servlet jar files:
rm xercesImpl*
rm servlet*
Start the web server.
If you need to change the port, specify it. For example, in the case of the Jetty server, set jetty.http.port=8081.
Ignore any warnings, such as the following:
java.lang.UnsupportedOperationException: setXIncludeAware is not supported on this JAXP implementation or earlier: class oracle.xml.jaxp.JXDocumentBuilderFactory
Type the http://thehost:8045/imageserver address in your browser address bar to open the web console.
From the Administrator tab, go to the Configuration tab. In the Hadoop Configuration Parameters section, change the following three properties, depending on the cluster configuration:
fs.defaultFS: Type the active namenode of your cluster in the format hdfs://<namenode>:8020. (Check with the administrator for this information.)
yarn.resourcemanager.scheduler.address: The Scheduler address of the active Resource Manager of your cluster, in the format <schedulername>:8030.
yarn.resourcemanager.address: Active Resource Manager address, in the format <resourcename>:8032.
Note:
Keep the default values for the rest of the configuration. They are pre-loaded for your Oracle Big Data Appliance cluster environment.
Click Apply Changes to save the changes.
Tip:
You can review the missing configuration information under the Hadoop Loader tab of the console.
To install and configure the image server web for other systems (not Big Data Appliance), see these topics.
Before installing the image server on other systems, you must install the image processing framework as specified in Installing the Image Processing Framework for Other Distributions (Not Oracle Big Data Appliance).
The steps to install the image server web on other systems are the same as for installing it on BDA.
Follow the instructions specified in "Prerequisites for Performing a Manual Installation."
Follow the instructions specified in "Installing Dependencies on the Image Server Web on an Oracle Big Data Appliance."
Follow the instructions specified in "Configuring the Environment for Other Systems."
Configure the environment as described in Configuring the Environment for Big Data Appliance, and then continue with the following steps. From the Configuration tab, in the Global Init Parameters section, change these properties depending on the cluster configuration:
shared.gdal.data: Specify the GDAL shared data folder. Follow the instructions in Installing the Image Processing Framework for Other Distributions (Not Oracle Big Data Appliance).
gdal.lib: Location of the GDAL .so libraries.
start: Specify a shared folder in which to start browsing the images. This folder must be shared between the cluster and the NFS mount point (SHARED_FOLDER).
saveimages: Create a child folder named saveimages under start (SHARED_FOLDER) with full write access. For example, if start=/home, then saveimages=/home/saveimages.
nfs.mountpoint: If the cluster requires a mount point to access the SHARED_FOLDER, specify a mount point, for example, /net/home. Otherwise, leave it blank.
From the Configuration tab in the Hadoop Configuration Parameters section, update the following property:
yarn.application.classpath: The classpath for Hadoop to find the required jars and dependencies. Usually this is under /usr/lib/hadoop. For example:
/etc/hadoop/conf/,/usr/lib/hadoop/*,/usr/lib/hadoop/lib/*,/usr/lib/hadoop-hdfs/*,/usr/lib/hadoop-hdfs/lib/*,/usr/lib/hadoop-yarn/*,/usr/lib/hadoop-yarn/lib/*,/usr/lib/hadoop-mapreduce/*,/usr/lib/hadoop-mapreduce/lib/*
Note:
Keep the default values for the rest of the configuration.
Click Apply Changes to save the changes.
Tip:
You can review any missing configuration information under the Hadoop Loader tab of the console.
In this example, you will:
Load the images from the local server to the HDFS Hadoop cluster.
Run a job to create a mosaic image file and a catalog with several images.
View the mosaic image.
Note:
If no errors were shown, then you have successfully installed the Image Loader web interface.
The image server has two ready-to-use web services, one for the HDFS loader and the other for the HDFS mosaic processor.
These services can be called from a Java application. They are currently supported only for GET operations. The formats for calling them are:
Loader: http://host:port/imageserver/rest/hdfsloader?path=string&overlap=string
where:
path: The images to be processed; this can be the path of a single file, or of one or more whole folders. For more than one folder, use commas to separate folder names.
overlap (optional): The overlap between images (default = 10).
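A client can assemble the loader URL along these lines; the host, port, and folder names are placeholders, and keeping slashes and commas unescaped is a readability choice for this sketch:

```python
from urllib.parse import urlencode

def hdfsloader_url(host, port, paths, overlap=None):
    """Build the loader GET URL; multiple folders are comma-separated,
    and overlap is optional (the service defaults it to 10)."""
    params = {"path": ",".join(paths)}
    if overlap is not None:
        params["overlap"] = str(overlap)
    return f"http://{host}:{port}/imageserver/rest/hdfsloader?{urlencode(params, safe='/,')}"

# Hypothetical host and folders.
print(hdfsloader_url("system123.example.com", 7101,
                     ["/data/rasters/hawaii", "/data/rasters/maui"], overlap=2))
```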
Mosaic: http://host:port/imageserver/rest/mosaic?mosaic=string&config=string
where:
mosaic: The XML mosaic file that contains the images to be processed. If you are using the image server web application, the XML file is generated automatically. Example of a mosaic XML file:
<?xml version='1.0'?>
<catalog type='HDFS'>
  <image>
    <source>Hadoop File System</source>
    <type>HDFS</type>
    <raster>/hawaii.tif.ohif</raster>
    <bands datatype='1' config='1,2,3'>3</bands>
  </image>
  <image>
    <source>Hadoop File System</source>
    <type>HDFS</type>
    <raster>/kahoolawe.tif.ohif</raster>
    <bands datatype='1' config='1,2,3'>3</bands>
  </image>
</catalog>
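For illustration, a client could inspect such a catalog with a standard XML parser; this sketch simply lists each image's raster path and band count:

```python
import xml.etree.ElementTree as ET

# A catalog in the shape shown above (paths are illustrative).
catalog_xml = """<?xml version='1.0'?>
<catalog type='HDFS'>
  <image>
    <source>Hadoop File System</source>
    <type>HDFS</type>
    <raster>/hawaii.tif.ohif</raster>
    <bands datatype='1' config='1,2,3'>3</bands>
  </image>
  <image>
    <source>Hadoop File System</source>
    <type>HDFS</type>
    <raster>/kahoolawe.tif.ohif</raster>
    <bands datatype='1' config='1,2,3'>3</bands>
  </image>
</catalog>"""

root = ET.fromstring(catalog_xml)
images = [(img.findtext("raster").strip(), int(img.find("bands").text))
          for img in root.findall("image")]
print(images)   # [('/hawaii.tif.ohif', 3), ('/kahoolawe.tif.ohif', 3)]
```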
config: Configuration file; created the first time a mosaic is processed using the image server web application. Example of a configuration file:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<mosaic>
  <output>
    <SRID>26904</SRID>
    <directory type="FS">/net/system123/scratch/user3/installers</directory>
    <tempFsFolder>/net/system123/scratch/user3/installers</tempFsFolder>
    <filename>test</filename>
    <format>GTIFF</format>
    <width>1800</width>
    <height>1406</height>
    <algorithm order="0">1</algorithm>
    <bands layers="3"/>
    <nodata>#000000</nodata>
    <pixelType>1</pixelType>
  </output>
  <crop>
    <transform>294444.1905688362,114.06068372059636,0,2517696.9179752027,0,-114.06068372059636</transform>
  </crop>
  <process/>
  <operations>
    <localnot/>
  </operations>
</mosaic>
Java Example: Using the Loader
public class RestTest {
    public static void main(String args[]) {
        try {
            // Loader: http://localhost:7101/imageserver/rest/hdfsloader?path=string&overlap=string
            // Mosaic: http://localhost:7101/imageserver/rest/mosaic?mosaic=string&config=string
            String path = "/net/system123/scratch/user3/installers/hawaii/hawaii.tif";
            URL url = new URL(
                "http://system123.example.com:7101/imageserver/rest/hdfsloader?path="
                + path + "&overlap=2"); // overlap is optional
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("GET");
            if (conn.getResponseCode() != 200) {
                throw new RuntimeException("Failed : HTTP error code : "
                    + conn.getResponseCode());
            }
            BufferedReader br = new BufferedReader(new InputStreamReader(
                conn.getInputStream()));
            String output;
            System.out.println("Output from Server .... \n");
            while ((output = br.readLine()) != null) {
                System.out.println(output);
            }
            conn.disconnect();
        } catch (MalformedURLException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
Java Example: Using the Mosaic Processor
public class NetClientPost {
    public static void main(String[] args) {
        try {
            String mosaic = "<?xml version='1.0'?>\n"
                + "<catalog type='HDFS'>\n"
                + " <image>\n"
                + "  <source>Hadoop File System</source>\n"
                + "  <type>HDFS</type>\n"
                + "  <raster>/user/hdfs/newdata/net/system123/scratch/user3/installers/hawaii/hawaii.tif.ohif</raster>\n"
                + "  <url>http://system123.example.com:7101/imageserver/temp/862b5871973372aab7b62094c575884ae13c3a27_thumb.jpg</url>\n"
                + "  <bands datatype='1' config='1,2,3'>3</bands>\n"
                + " </image>\n"
                + "</catalog>";
            String config = "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n"
                + "<mosaic>\n"
                + "<output>\n"
                + "<SRID>26904</SRID>\n"
                + "<directory type=\"FS\">/net/system123/scratch/user3/installers</directory>\n"
                + "<tempFsFolder>/net/system123/scratch/user3/installers</tempFsFolder>\n"
                + "<filename>test</filename>\n"
                + "<format>GTIFF</format>\n"
                + "<width>1800</width>\n"
                + "<height>1269</height>\n"
                + "<algorithm order=\"0\">1</algorithm>\n"
                + "<bands layers=\"3\"/>\n"
                + "<nodata>#000000</nodata>\n"
                + "<pixelType>1</pixelType>\n"
                + "</output>\n"
                + "<crop>\n"
                + "<transform>739481.1311601736,130.5820811245199,0,2254053.5858749463,0,-130.5820811245199</transform>\n"
                + "</crop>\n"
                + "<process/>\n"
                + "</mosaic>";
            URL url = new URL("http://system123.example.com:7101/imageserver/rest/mosaic?"
                + "mosaic=" + URLEncoder.encode(mosaic, "UTF-8")
                + "&config=" + URLEncoder.encode(config, "UTF-8"));
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("GET");
            if (conn.getResponseCode() != 200) {
                throw new RuntimeException("Failed : HTTP error code : "
                    + conn.getResponseCode());
            }
            BufferedReader br = new BufferedReader(new InputStreamReader(
                conn.getInputStream()));
            String output;
            System.out.println("Output from Server .... \n");
            while ((output = br.readLine()) != null) {
                System.out.println(output);
            }
            conn.disconnect();
        } catch (MalformedURLException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
To install the Oracle Big Data Spatial Hadoop vector console, follow the instructions in this topic.
Installing the Spatial Hadoop Vector Console on Oracle Big Data Appliance
Installing the Spatial Hadoop Vector Console for Other Systems (Not Big Data Appliance)
Configuring the Spatial Hadoop Vector Console on Oracle Big Data Appliance
Configuring the Spatial Hadoop Vector Console for Other Systems (Not Big Data Appliance)
The following assumptions and prerequisites apply to installing and configuring the Spatial Hadoop Vector Console.
The API and jobs described here run on Cloudera CDH 5.7, Hortonworks HDP 2.4, or a similar Hadoop environment.
Java 8 or a newer version is present in your environment.
In addition to the Hadoop environment jars, the libraries listed here are required by the Vector Analysis API.
sdohadoop-vector.jar
sdoutil.jar
sdoapi.jar
ojdbc.jar
commons-fileupload-1.3.1.jar
commons-io-2.4.jar
jackson-annotations-2.1.4.jar
jackson-core-2.1.4.jar
jackson-core-asl-1.8.1.jar
jackson-databind-2.1.4.jar
javacsv.jar
lucene-analyzers-common-4.6.0.jar
lucene-core-4.6.0.jar
lucene-queries-4.6.0.jar
lucene-queryparser-4.6.0.jar
mvsuggest_core.jar
You can install the Spatial Hadoop Vector Console on Big Data Appliance either by using the provided script or by performing a manual configuration. To use the provided script:
Run the following script to install the console:
sudo /opt/oracle/oracle-spatial-graph/spatial/configure-server/install-bdsg-consoles.sh
If the active nodes have changed after the installation, then update the configuration file as described in Configuring the Spatial Hadoop Vector Console on Oracle Big Data Appliance.
Start the console:
cd /opt/oracle/oracle-spatial-graph/spatial/web-server sudo ./start-server.sh
If any errors occur, see the README file located in /opt/oracle/oracle-spatial-graph/spatial/configure-server.
To perform a manual configuration, follow these steps.
Optionally, upload sample data (used with examples in other topics) to HDFS:
sudo -u hdfs hadoop fs -mkdir /user/oracle/bdsg sudo -u hdfs hadoop fs -put /opt/oracle/oracle-spatial-graph/spatial/vector/examples/data/tweets.json /user/oracle/bdsg/
Follow the steps for manual configuration described in "Installing the Spatial Hadoop Vector Console on Oracle Big Data Appliance." However, in step 3 replace the path /opt/cloudera/parcels/CDH/lib/ with the actual library path, which by default is /usr/lib/.
Edit the configuration file WEB_SERVER_HOME/webapps/spatialviewer/conf/console-conf.xml (or /opt/oracle/oracle-spatial-graph/spatial/web-server/spatialviewer/conf/console-conf.xml if the installation was done using the provided script) to specify your own data for sending email and for other configuration parameters. Set the configuration parameters as follows.
Edit the notification URL: the URL where the console server is running. It must be reachable from the Hadoop cluster so that job completion notifications can be delivered. Example setting: <baseurl>http://hadoop.console.url:8080</baseurl>
Edit the directory with temporary hierarchical indexes: an HDFS path that will contain temporary data on hierarchical relationships. Example: <hierarchydataindexpath>hdfs://hadoop.cluster.url:8020/user/myuser/hierarchyIndexPath</hierarchydataindexpath>
Edit the HDFS directory that will contain the MVSuggest generated index. Example: <mvsuggestindex>hdfs://hadoop.cluster.url:8020/user/myuser/mvSuggestIndex</mvsuggestindex>
If necessary, edit the URL used to get the eLocation background maps. Example: <elocationmvbaseurl>http://elocation.oracle.com/mapviewer</elocationmvbaseurl>
Edit the HDFS directory that will contain the index metadata. Example: <indexmetadatapath>hdfs://hadoop.cluster.url:8020/user/myuser/indexMetadata</indexmetadatapath>
Edit the HDFS directory with temporary data used by the explore data processes. Example: <exploretempdatapath>hdfs://hadoop.cluster.url:8020/user/myuser/exploreTmp</exploretempdatapath>
Edit the HDFS directory that will contain information about the jobs run by the console. Example: <jobregistrypath>hdfs://hadoop.cluster.url:8020/user/myuser/spatialJobRegistry</jobregistrypath>
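Most of the entries above are absolute hdfs:// URIs, and a stray space (as can happen when copying values from this document) makes them invalid. The following standalone sketch, which is not part of the console, shows one way to sanity-check such a value before saving console-conf.xml:

```java
import java.net.URI;

public class HdfsPathCheck {
    // Returns true if the value looks like an absolute hdfs:// URI with no
    // embedded whitespace, as the console configuration entries require.
    static boolean isValidHdfsPath(String value) {
        if (value.contains(" ")) {
            return false;                      // e.g. "hdfs:// host/..." is invalid
        }
        URI uri = URI.create(value);
        return "hdfs".equals(uri.getScheme())
                && uri.getHost() != null
                && uri.getPath() != null
                && uri.getPath().startsWith("/");
    }

    public static void main(String[] args) {
        System.out.println(isValidHdfsPath("hdfs://hadoop.cluster.url:8020/user/myuser/indexMetadata"));  // true
        System.out.println(isValidHdfsPath("hdfs:// hadoop.cluster.url:8020/user/myuser/indexMetadata")); // false
    }
}
```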
If necessary, disable the display of the logs in the job details screen. Disable this display if the logs are not in the default format. The default format is: Date LogLevel LoggerName: LogMessage. The Date must have the default format yyyy-MM-dd HH:mm:ss,SSS (for example, 2012-11-02 14:34:02,781). To disable the log display, set <displaylogs> to false. Example: <displaylogs>false</displaylogs>
If the logs are not displayed and <displaylogs> is set to true, then ensure that yarn.log-aggregation-enable in yarn-site.xml is set to true. Also ensure that the Hadoop job configuration parameters yarn.nodemanager.remote-app-log-dir and yarn.nodemanager.remote-app-log-dir-suffix are set to the same values as in yarn-site.xml.
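The default Date format can be checked with standard Java. This standalone sketch, which is not part of the console, verifies whether a log timestamp matches the yyyy-MM-dd HH:mm:ss,SSS pattern that the job details screen expects:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;

public class LogDateCheck {
    // The console only parses log lines whose Date field matches this pattern.
    static final String PATTERN = "yyyy-MM-dd HH:mm:ss,SSS";

    static boolean matchesDefaultFormat(String date) {
        SimpleDateFormat fmt = new SimpleDateFormat(PATTERN);
        fmt.setLenient(false);                 // reject invalid calendar values such as month 13
        try {
            fmt.parse(date);
            return true;
        } catch (ParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(matchesDefaultFormat("2012-11-02 14:34:02,781"));  // true
        System.out.println(matchesDefaultFormat("02/11/2012 14:34:02"));      // false
    }
}
```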
Edit the general Hadoop jobs configuration: The console uses two Hadoop jobs. The first creates a spatial index on existing files in HDFS; the second generates display results based on that index. One part of the configuration is common to both jobs, and another part is specific to each job. The common configuration is found within the <hadoopjobs><configuration> elements. An example configuration follows:
<hadoopjobs>
  <configuration>
    <property>
      <!-- Hadoop user. The user is a mandatory property. -->
      <name>hadoop.job.ugi</name>
      <value>hdfs</value>
    </property>
    <property>
      <!-- As defined in core-site.xml. If fs.defaultFS in core-site.xml is
           defined as the nameservice ID (High Availability configuration),
           then set the full address and IPC port of the currently active
           name node. The nameservice is defined in hdfs-site.xml. -->
      <name>fs.defaultFS</name>
      <value>hdfs://hadoop.cluster.url:8020</value>
    </property>
    <property>
      <!-- As defined in mapred-site.xml -->
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
    </property>
    <property>
      <!-- As defined in yarn-site.xml -->
      <name>yarn.resourcemanager.scheduler.address</name>
      <value>hadoop.cluster.url:8030</value>
    </property>
    <property>
      <!-- As defined in yarn-site.xml -->
      <name>yarn.resourcemanager.address</name>
      <value>hadoop.cluster.url:8032</value>
    </property>
    <property>
      <!-- As defined in yarn-site.xml; by default /tmp/logs -->
      <name>yarn.nodemanager.remote-app-log-dir</name>
      <value>/tmp/logs</value>
    </property>
    <property>
      <!-- As defined in yarn-site.xml; by default logs -->
      <name>yarn.nodemanager.remote-app-log-dir-suffix</name>
      <value>logs</value>
    </property>
    <property>
      <!-- As defined in yarn-site.xml (full path) -->
      <name>yarn.application.classpath</name>
      <value>/etc/hadoop/conf/,/opt/cloudera/parcels/CDH/lib/hadoop/*,/opt/cloudera/parcels/CDH/lib/hadoop/lib/*,/opt/cloudera/parcels/CDH/lib/hadoop-hdfs/*,/opt/cloudera/parcels/CDH/lib/hadoop-hdfs/lib/*,/opt/cloudera/parcels/CDH/lib/hadoop-yarn/*,/opt/cloudera/parcels/CDH/lib/hadoop-yarn/lib/*,/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/*,/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/lib/*</value>
    </property>
  </configuration>
</hadoopjobs>
Create an index-job-specific configuration. Additional Hadoop parameters can be specified for the job that creates the spatial indexes. An example additional configuration:
<hadoopjobs>
  <configuration>
    ...
  </configuration>
  <indexjobadditionalconfiguration>
    <property>
      <!-- Increase mapred.max.split.size so that fewer mappers are
           allocated, reducing the mapper initialization overhead. -->
      <name>mapred.max.split.size</name>
      <value>1342177280</value>
    </property>
  </indexjobadditionalconfiguration>
</hadoopjobs>
Create a specific configuration for the job that generates the categorization results. The following is an example of property settings:
<hadoopjobs>
  <configuration>
    ...
  </configuration>
  <indexjobadditionalconfiguration>
    ...
  </indexjobadditionalconfiguration>
  <hierarchicaljobadditionalconfiguration>
    <property>
      <!-- Increase mapred.max.split.size so that fewer mappers are
           allocated, reducing the mapper initialization overhead. -->
      <name>mapred.max.split.size</name>
      <value>1342177280</value>
    </property>
  </hierarchicaljobadditionalconfiguration>
</hadoopjobs>
Specify the notification emails: Email notifications are sent to report job completion status. They are defined within the <notificationmails> element. You must specify a user (<user>), a password (<password>), and a sender email address (<mailfrom>). In the <configuration> element, set the configuration properties required by Java Mail. The following example shows a typical configuration for sending mail through an SMTP server using an SSL connection:
<notificationmails>
  <!-- Authentication parameters. The authentication parameters are mandatory. -->
  <user>user@mymail.com</user>
  <password>mypassword</password>
  <mailfrom>user@mymail.com</mailfrom>
  <!-- Parameters that will be set as system properties. Below are the
       parameters needed to send mail through an SMTP server using an SSL
       connection. -->
  <configuration>
    <property>
      <name>mail.smtp.host</name>
      <value>mail.host.com</value>
    </property>
    <property>
      <name>mail.smtp.socketFactory.port</name>
      <value>myport</value>
    </property>
    <property>
      <name>mail.smtp.socketFactory.class</name>
      <value>javax.net.ssl.SSLSocketFactory</value>
    </property>
    <property>
      <name>mail.smtp.auth</name>
      <value>true</value>
    </property>
  </configuration>
</notificationmails>
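The properties in the <configuration> element are applied as JVM system properties before the Java Mail session is created. This standalone sketch illustrates that wiring; the console performs the equivalent internally, and the port value 465 is an assumption substituted here for the myport placeholder:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MailConfigSketch {
    public static void main(String[] args) {
        // Properties as they appear in the <configuration> element above.
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("mail.smtp.host", "mail.host.com");
        conf.put("mail.smtp.socketFactory.port", "465"); // assumed value; "myport" in the example
        conf.put("mail.smtp.socketFactory.class", "javax.net.ssl.SSLSocketFactory");
        conf.put("mail.smtp.auth", "true");

        // Set each configuration property as a JVM system property, as the
        // console does before creating the Java Mail session.
        conf.forEach(System::setProperty);

        System.out.println(System.getProperty("mail.smtp.auth")); // prints "true"
    }
}
```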
Follow the steps mentioned in "Configuring the Spatial Hadoop Vector Console on Oracle Big Data Appliance." However, in the general Hadoop jobs configuration step, replace /opt/cloudera/parcels/CDH/lib/ in the Hadoop property yarn.application.classpath with the actual library path, which by default is /usr/lib/.
You can use property graphs on either Oracle Big Data Appliance or commodity hardware.
See Also:
The following prerequisites apply to installing property graph support in HBase.
Linux operating system
Cloudera's Distribution including Apache Hadoop (CDH)
For the software download, see: http://www.cloudera.com/content/cloudera/en/products-and-services/cdh.html
Apache HBase
Java Development Kit (JDK) (Java 8 or higher)
Details about supported versions of these products, including any interdependencies, will be provided in a My Oracle Support note.
The installation directory for Oracle Big Data Spatial and Graph property graph features has the following structure:
$ tree -dFL 2 /opt/oracle/oracle-spatial-graph/property_graph/
/opt/oracle/oracle-spatial-graph/property_graph/
|-- dal
| |-- groovy
| |-- opg-solr-config
| `-- webapp
|-- data
|-- doc
| |-- dal
| `-- pgx
|-- examples
| |-- dal
| |-- pgx
| `-- pyopg
|-- lib
|-- librdf
`-- pgx
|-- bin
|-- conf
|-- groovy
|-- scripts
|-- webapp
`-- yarn
Follow this installation task if property graph support is installed on a client without Hadoop and you want to read graph data stored in the Hadoop Distributed File System (HDFS) into the in-memory analyst and write the results back to HDFS, or use Hadoop NextGen MapReduce (YARN) scheduling to start, monitor, and stop the in-memory analyst.
When running a Java application using in-memory analytics and HDFS, make sure that $HADOOP_HOME/etc/hadoop is on the classpath, so that the configurations are picked up by the Hadoop client libraries. However, you do not need to do this when using the in-memory analyst shell, because the shell automatically adds $HADOOP_HOME/etc/hadoop to the classpath if HADOOP_HOME is set.
You do not need to put any extra Cloudera Hadoop libraries (JAR files) on the classpath. The only time you need the YARN libraries is when starting the in-memory analyst as a YARN service. This is done with the yarn
command, which automatically adds all necessary JAR files from your local installation to the classpath.
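The classpath convention described above can be sketched as follows. This is an illustration of the rule, not code shipped with the product:

```java
import java.io.File;

public class HadoopConfClasspath {
    // Returns the Hadoop configuration directory to append to the classpath,
    // or null when HADOOP_HOME is not set (mirroring the in-memory analyst
    // shell, which adds the directory only when the variable is present).
    static String confDir(String hadoopHome) {
        if (hadoopHome == null || hadoopHome.isEmpty()) {
            return null;
        }
        return hadoopHome + File.separator + "etc" + File.separator + "hadoop";
    }

    public static void main(String[] args) {
        // Prints the directory to add, or null if HADOOP_HOME is unset.
        System.out.println(confDir(System.getenv("HADOOP_HOME")));
    }
}
```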
You are now ready to load data from HDFS or start the in-memory analyst as a YARN service. For further information about Hadoop, see the CDH 5.x.x documentation.
To use the Multimedia analytics feature, the video analysis framework must be installed and configured.
If you have licensed Oracle Big Data Spatial and Graph with Oracle Big Data Appliance, the video analysis framework for Multimedia analytics is already installed and configured. However, you must set $MMA_HOME to point to /opt/oracle/oracle-spatial-graph/multimedia.
Otherwise, you can install the framework on Cloudera CDH 5 or a similar Hadoop environment, as follows:
Install the framework by using the following command on each node on the cluster:
rpm2cpio oracle-spatial-graph-<version>.x86_64.rpm | cpio -idmv
Set $MMA_HOME to point to /opt/oracle/oracle-spatial-graph/multimedia.
Identify the locations of the following libraries:
Hadoop jar files (available in $HADOOP_HOME/jars)
Video processing libraries (see Transcoding Software (Options))
OpenCV libraries (available with the product)
If necessary, install the desired video processing software to transcode video data (see Transcoding Software (Options)).
The following options are available for transcoding video data:
JCodec
FFmpeg
Third-party transcoding software
To use Multimedia analytics with JCodec (which is included with the product), when running the Hadoop job to recognize faces, set the oracle.ord.hadoop.ordframegrabber property to the following value: oracle.ord.hadoop.decoder.OrdJCodecFrameGrabber
To use Multimedia analytics with FFmpeg:
Download FFmpeg from: https://www.ffmpeg.org/.
Install FFmpeg on the Hadoop cluster.
Set the oracle.ord.hadoop.ordframegrabber
property to the following value: oracle.ord.hadoop.decoder.OrdFFMPEGFrameGrabber
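Because both decoders are selected through the same oracle.ord.hadoop.ordframegrabber property, a job driver only needs to set one string. The sketch below uses java.util.Properties as a stand-in for Hadoop's job configuration object; that substitution, and the helper method, are assumptions for illustration only:

```java
import java.util.Properties;

public class FrameGrabberConfig {
    static final String KEY = "oracle.ord.hadoop.ordframegrabber";

    // Chooses the decoder class name for the face-recognition job:
    // JCodec (bundled with the product) or FFmpeg (installed separately).
    static String grabberFor(boolean useFfmpeg) {
        return useFfmpeg
                ? "oracle.ord.hadoop.decoder.OrdFFMPEGFrameGrabber"
                : "oracle.ord.hadoop.decoder.OrdJCodecFrameGrabber";
    }

    public static void main(String[] args) {
        Properties jobConf = new Properties(); // stand-in for the Hadoop job configuration
        jobConf.setProperty(KEY, grabberFor(false));
        System.out.println(jobConf.getProperty(KEY)); // prints the JCodec decoder class name
    }
}
```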
To use Multimedia analytics with custom video decoding software, implement the abstract class oracle.ord.hadoop.decoder.OrdFrameGrabber. See the Javadoc for more details.