1 Big Data Spatial and Graph Overview

This chapter provides an overview of Oracle Big Data support for Oracle Spatial and Graph spatial, property graph, and multimedia analytics features.

1.1 About Big Data Spatial and Graph

Oracle Big Data Spatial and Graph delivers advanced spatial and graph analytic capabilities to supported Apache Hadoop and NoSQL Database Big Data platforms.

The spatial features include support for data enrichment with location information; spatial filtering, categorization, and analysis based on distance and location; vector and raster processing of digital map, sensor, satellite, and aerial imagery data; and APIs for map visualization.

The property graph features support Apache HBase and Oracle NoSQL Database for graph operations, indexing, queries, search, and in-memory analytics.

The multimedia analytics features provide a framework for processing video and image data in Apache Hadoop, including built-in face recognition using OpenCV.

1.2 Spatial Features

Spatial location information is a common element of Big Data.

Businesses can use spatial data as the basis for associating and linking disparate data sets. Location information can also be used to track and categorize entities based on proximity to another person, place, or object, or on their presence in a particular area. Location information can facilitate location-specific offers to customers entering a particular geography, something known as geo-fencing. Georeferenced imagery and sensory data can be analyzed for a variety of business benefits.

The spatial features of Oracle Big Data Spatial and Graph support those use cases with the following kinds of services.

Vector Services:

  • Ability to associate documents and data with names, such as cities or states, or longitude/latitude information in spatial object definitions for a default administrative hierarchy

  • Support for text-based 2D and 3D geospatial formats, including GeoJSON files, Shapefiles, GML, and WKT; you can also use the Geospatial Data Abstraction Library (GDAL) to convert popular geospatial encodings such as Oracle SDO_Geometry, ST_Geometry, and other supported formats

  • An HTML5-based map client API and a sample console to explore, categorize, and view data in a variety of formats and coordinate systems

  • Topological and distance operations: Anyinteract, Inside, Contains, Within Distance, Nearest Neighbor, and others

  • Spatial indexing for fast retrieval of data

Raster Services:

  • Support for many image file formats supported by GDAL and image files stored in HDFS

  • A sample console to view the set of images that are available

  • Raster operations, including subsetting, georeferencing, mosaics, and format conversion

1.3 Property Graph Features

Graphs manage networks of linked data as vertices, edges, and properties of the vertices and edges. Graphs are commonly used to model, store, and analyze relationships found in social networks, cyber security, utilities and telecommunications, life sciences and clinical data, and knowledge networks.

Typical graph analyses encompass graph traversal, recommendations, finding communities and influencers, and pattern matching. Industries including telecommunications, life sciences and healthcare, security, and media and publishing can benefit from graphs.

The property graph features of Oracle Big Data Spatial and Graph support those use cases with the following capabilities:

  • A scalable graph database on Apache HBase and Oracle NoSQL Database

  • Developer APIs based on Tinkerpop Blueprints, along with Java graph APIs

  • Text search and query through integration with Apache Lucene and SolrCloud

  • Scripting languages support for Groovy and Python

  • A parallel, in-memory graph analytics engine

  • A fast, scalable suite of social network analysis functions, including ranking, centrality, recommendation, community detection, and path finding

  • Parallel bulk load and export of property graph data in an Oracle-defined flat file format

  • Manageability through a Groovy-based console to execute Java and Tinkerpop Gremlin APIs

1.3.1 Property Graph Sizing Recommendations

The following are recommendations for property graph installation.

Table 1-1 Property Graph Sizing Recommendations

Graph Size          Recommended Dedicated Physical Memory   Recommended Number of CPU Processors

10M to 100M edges   Up to 14 GB RAM                         2 to 4 processors; up to 16 processors for more compute-intensive workloads

100M to 1B edges    14 GB to 100 GB RAM                     4 to 12 processors; up to 16 to 32 processors for more compute-intensive workloads

Over 1B edges       Over 100 GB RAM                         12 to 32 processors, or more for especially compute-intensive workloads

1.4 Multimedia Analytics Features

The multimedia analytics feature of Oracle Big Data Spatial and Graph provides a framework for distributed processing of video and image data in Apache Hadoop.

A main use case is performing facial recognition in videos and images.

1.5 Installing Oracle Big Data Spatial and Graph on an Oracle Big Data Appliance

The Mammoth command-line utility for installing and configuring the Oracle Big Data Appliance software also installs the Oracle Big Data Spatial and Graph option, including the spatial, property graph, and multimedia capabilities.

You can enable this option during an initial software installation, or afterward using the bdacli utility.

To use Oracle NoSQL Database as a graph repository, you must have an Oracle NoSQL Database cluster.

To use Apache HBase as a graph repository, you must have an Apache Hadoop cluster.

See Also:

Oracle Big Data Appliance Owner's Guide for software configuration instructions.

1.6 Installing and Configuring the Big Data Spatial Image Processing Framework

Installing and configuring the Image Processing Framework depends on the distribution being used.

For both distributions, you must first get and compile the Cartographic Projections Library, as described in the following topic.

1.6.1 Getting and Compiling the Cartographic Projections Library

Before installing the Image Processing Framework, you must download the Cartographic Projections Library and perform several related operations.

  1. Download the PROJ.4 source code and datum shifting files:

    $ wget http://download.osgeo.org/proj/proj-4.9.1.tar.gz
    $ wget http://download.osgeo.org/proj/proj-datumgrid-1.5.tar.gz
    
  2. Untar the source code, and extract the datum shifting files in the nad subdirectory:

    $ tar xzf proj-4.9.1.tar.gz
    $ cd proj-4.9.1/nad
    $ tar xzf ../../proj-datumgrid-1.5.tar.gz
    $ cd ..
    
  3. Configure, make, and install PROJ.4:

    $ ./configure
    $ make
    $ sudo make install
    $ cd ..
    

    libproj.so is now available at /usr/local/lib/libproj.so.

  4. Copy the libproj.so file into the spatial installation directory:

    $ sudo cp /usr/local/lib/libproj.so /opt/oracle/oracle-spatial-graph/spatial/raster/gdal/lib/libproj.so
  5. Provide read and execute permissions on the libproj.so library for all users:

    $ sudo chmod 755 /opt/oracle/oracle-spatial-graph/spatial/raster/gdal/lib/libproj.so
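
As an optional sanity check (a suggestion, not part of the required steps), you can confirm that the copied library is readable and that its dependencies resolve:

    $ ls -l /opt/oracle/oracle-spatial-graph/spatial/raster/gdal/lib/libproj.so
    $ ldd /opt/oracle/oracle-spatial-graph/spatial/raster/gdal/lib/libproj.so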

1.6.2 Installing the Image Processing Framework for Oracle Big Data Appliance Distribution

The Oracle Big Data Appliance distribution comes with a pre-installed configuration, though you must ensure that the image processing framework has been installed.

Be sure that the actions described in Getting and Compiling the Cartographic Projections Library have been performed, so that libproj.so (PROJ.4) is accessible to all users and is set up correctly.

For Oracle Big Data Appliance, ensure that the following directories exist:

  • SHARED_DIR (shared directory for all nodes in the cluster): /opt/shareddir

  • ALL_ACCESS_DIR (shared directory for all nodes in the cluster with Write access to the hadoop group): /opt/shareddir/spatial
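
If these directories do not already exist, a minimal way to create them and grant the hadoop group write access to ALL_ACCESS_DIR is sketched below (the group name is taken from the description above; adjust it if your cluster uses a different group):

    # Create the shared directories and open ALL_ACCESS_DIR to the hadoop group.
    sudo mkdir -p /opt/shareddir/spatial
    sudo chgrp hadoop /opt/shareddir/spatial
    sudo chmod 775 /opt/shareddir/spatial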

1.6.3 Installing the Image Processing Framework for Other Distributions (Not Oracle Big Data Appliance)

For Big Data Spatial and Graph in environments other than the Big Data Appliance, follow the instructions in this section.

1.6.3.1 Prerequisites for Installing the Image Processing Framework for Other Distributions
  • Ensure that HADOOP_LIB_PATH is under /usr/lib/hadoop. If it is not there, find the path and use it as your HADOOP_LIB_PATH.

  • Install NFS.

  • Have at least one folder, referred to in this document as SHARED_FOLDER, in the Resource Manager node accessible to every Node Manager node through NFS.

  • Provide write access on this SHARED_FOLDER to all users involved in job execution, including the yarn user.

  • Download oracle-spatial-graph-<version>.x86_64.rpm from the Oracle e-delivery web site.

  • Execute oracle-spatial-graph-<version>.x86_64.rpm using the rpm command.

  • After rpm executes, verify that a directory structure was created at /opt/oracle/oracle-spatial-graph/spatial/raster containing these folders: console, examples, jlib, gdal, and tests. Additionally, index.html describes the content, and javadoc.zip contains the Javadoc for the API.
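
The following is a condensed sketch of the install-and-verify sequence described above (substitute the actual rpm file name for your version):

    # Install the package, then list the expected directory contents.
    sudo rpm -i oracle-spatial-graph-<version>.x86_64.rpm
    ls /opt/oracle/oracle-spatial-graph/spatial/raster
    # Expected: console  examples  gdal  index.html  javadoc.zip  jlib  tests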

1.6.3.2 Installing the Image Processing Framework for Other Distributions
  1. Make the libproj.so (PROJ.4) Cartographic Projections Library accessible to the users, as explained in Getting and Compiling the Cartographic Projections Library.
  2. In the Resource Manager Node, copy the data folder under /opt/oracle/oracle-spatial-graph/spatial/raster/gdal into the SHARED_FOLDER as follows:

    cp -R /opt/oracle/oracle-spatial-graph/spatial/raster/gdal/data SHARED_FOLDER

  3. Create a directory ALL_ACCESS_FOLDER under SHARED_FOLDER with write access for all users involved in job execution. Also include the yarn user in the write access, because job results are written by this user. Group access may be used for this configuration.

    Go to the shared folder.

    cd SHARED_FOLDER

    Create a new directory.

    mkdir ALL_ACCESS_FOLDER

    Provide write access.

    chmod 777 ALL_ACCESS_FOLDER

  4. Copy the data folder under /opt/oracle/oracle-spatial-graph/spatial/raster/examples into ALL_ACCESS_FOLDER.

    cp -R /opt/oracle/oracle-spatial-graph/spatial/raster/examples/data ALL_ACCESS_FOLDER

  5. Provide write access to the data/xmls folder as follows (or just ensure that users executing the jobs, including tests and examples, have write access):

    chmod 777 ALL_ACCESS_FOLDER/data/xmls/

1.6.4 Post-installation Verification of the Image Processing Framework

Several test scripts are provided to perform the following verification operations.

  • Test the image loading functionality

  • Test the image processing functionality

  • Test a processing class for slope calculation in a DEM and a map algebra operation

  • Verify the image processing of a single raster with no mosaic process (it includes a user-provided function that calculates hill shade in the mapping phase).

  • Test processing of two rasters using a mask operation

Execute these scripts to verify a successful installation of the image processing framework.

If the cluster has security enabled, make sure the current user is in the princs list and has an active Kerberos ticket.

Make sure the user has write access to ALL_ACCESS_FOLDER and belongs to the owner group of this directory. For Big Data Appliance, it is recommended that jobs be executed on the Resource Manager node; if jobs are executed on a different node, the default owner group is hadoop.

For GDAL to work properly, the libraries must be available using $LD_LIBRARY_PATH. Make sure that the shared libraries path is set properly in your shell window before executing a job. For example:

export LD_LIBRARY_PATH=$ALLACCESSDIR/gdal/native
1.6.4.1 Image Loading Test Script

This script loads a set of six test rasters into the ohiftest folder in HDFS: three rasters of byte data type with three bands, one raster (DEM) of float32 data type with one band, and two rasters of int32 data type with one band. No parameters are required for Big Data Appliance environments, and a single parameter with the ALL_ACCESS_FOLDER value is required for other environments.

Internally, the job creates a split for every raster to load. Split size depends on the block size configuration; for example, if a block size of at least 64 MB is configured, four mappers will run. As a result, the rasters are loaded into HDFS and a corresponding thumbnail is created for visualization. An external image editor is required to view the thumbnails, and an output path for the thumbnails is provided to the user upon successful completion of the job.
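
To see which block size applies in your cluster, and therefore how many mappers to expect, you can query the client configuration; dfs.blocksize is the standard HDFS property name:

    hdfs getconf -confKey dfs.blocksize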

The test script can be found here:

/opt/oracle/oracle-spatial-graph/spatial/raster/tests/runimageloader.sh

For Big Data Appliance environments, enter:

./runimageloader.sh

For non-Big Data Appliance environments, enter:

./runimageloader.sh ALL_ACCESS_FOLDER

Upon successful execution, the message GENERATED OHIF FILES ARE LOCATED IN HDFS UNDER is displayed, with the path in HDFS where the files are located (this path depends on the definition of ALL_ACCESS_FOLDER) and a list of the created images and thumbnails on HDFS. The output may include:

THUMBNAILS CREATED ARE:
----------------------------------------------------------------------
total 13532
drwxr-xr-x 2 yarn yarn 4096 Sep 9 13:54 .
drwxr-xr-x 3 yarn yarn 4096 Aug 27 11:29 ..
-rw-r--r-- 1 yarn yarn 3214053 Sep 9 13:54 hawaii.tif.ohif.tif
-rw-r--r-- 1 yarn yarn 3214053 Sep 9 13:54 inputimageint32.tif.ohif.tif
-rw-r--r-- 1 yarn yarn 3214053 Sep 9 13:54 inputimageint32_1.tif.ohif.tif
-rw-r--r-- 1 yarn yarn 3214053 Sep 9 13:54 kahoolawe.tif.ohif.tif
-rw-r--r-- 1 yarn yarn 3214053 Sep 9 13:54 maui.tif.ohif.tif
-rw-r--r-- 1 yarn yarn 4182040 Sep 9 13:54 NapaDEM.tif.ohif.tif
YOU MAY VISUALIZE THUMBNAILS OF THE UPLOADED IMAGES FOR REVIEW FROM THE FOLLOWING PATH:

If the installation and configuration were not successful, then the output is not generated and a message like the following is displayed:

NOT ALL THE IMAGES WERE UPLOADED CORRECTLY, CHECK FOR HADOOP LOGS

The amount of memory required to execute mappers and reducers depends on the configured HDFS block size. By default, 1 GB of memory is assigned to Java, but you can modify that and other properties in the imagejob.prop file included in this test directory.
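
The exact property names in imagejob.prop are installation specific; for reference, mapper and reducer memory in Hadoop 2 clusters is generally governed by standard properties such as the following (the values shown are illustrative only):

    # Illustrative Hadoop 2 memory settings; whether imagejob.prop uses
    # these exact keys is an assumption to verify against your installation.
    mapreduce.map.memory.mb=2048
    mapreduce.map.java.opts=-Xmx1638m
    mapreduce.reduce.memory.mb=2048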

1.6.4.2 Image Processor Test Script (Mosaicking)

This script executes the processor job using three source rasters of the Hawaii islands and a set of coordinates that includes all three. The job creates a mosaic based on these coordinates, and the resulting raster contains the three rasters combined into a single one.

runimageloader.sh should be executed as a prerequisite, so that the source rasters exist in HDFS. These are three-band rasters of byte data type.

No parameters are required for Big Data Appliance environments, and a single parameter -s with the ALL_ACCESS_FOLDER value is required for other environments.

Additionally, if the output should be stored in HDFS, the -o parameter must be used to set the HDFS folder in which the mosaic output will be stored.

Internally, the job filters the tiles using the coordinates specified in the configuration input XML; only the required tiles are processed in the mappers, and in the reduce phase all of them are put together into the resulting mosaic raster.

The test script can be found here:

/opt/oracle/oracle-spatial-graph/spatial/raster/tests/runimageprocessor.sh

For Big Data Appliance environments, enter:

./runimageprocessor.sh

For non-Big Data Appliance environments, enter:

./runimageprocessor.sh -s ALL_ACCESS_FOLDER

Upon successful execution, the message EXPECTED OUTPUT FILE IS: ALL_ACCESS_FOLDER/processtest/hawaiimosaic.tif is displayed, with the path to the output mosaic file. The output may include:

EXPECTED OUTPUT FILE IS: ALL_ACCESS_FOLDER/processtest/hawaiimosaic.tif
total 9452
drwxrwxrwx 2 hdfs    hdfs    4096 Sep 10 09:12 .
drwxrwxrwx 9 zherena dba     4096 Sep  9 13:50 ..
-rwxrwxrwx 1 yarn    yarn 4741101 Sep 10 09:12 hawaiimosaic.tif

MOSAIC IMAGE GENERATED
----------------------------------------------------------------------
YOU MAY VISUALIZE THE MOSAIC OUTPUT IMAGE FOR REVIEW IN THE FOLLOWING PATH: ALL_ACCESS_FOLDER/processtest/hawaiimosaic.tif

If the installation and configuration were not successful, then the output is not generated and a message like the following is displayed:

MOSAIC WAS NOT SUCCESSFULLY CREATED, CHECK HADOOP LOGS TO REVIEW THE PROBLEM

To test output storage in HDFS, use the following commands.

For Big Data Appliance environments, enter:

./runimageprocessor.sh -o hdfstest

For non-Big Data Appliance environments, enter:

./runimageprocessor.sh -s ALL_ACCESS_FOLDER -o hdfstest
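
After the job completes, you can confirm that the mosaic was written to HDFS by listing the output folder with the standard HDFS shell (the hdfstest path is relative to the executing user's HDFS home directory):

    hadoop fs -ls hdfstest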
1.6.4.3 Single-Image Processor Test Script

This script executes the processor job for a single raster, in this case a DEM source raster of North Napa Valley. The purpose of this job is to process the complete input by using the user-defined processing class configured for the mapping phase. This class calculates the hillshade of the DEM, which is written to the output file. No mosaic operation is performed here.

runimageloader.sh should be executed as a prerequisite, so that the source raster exists in HDFS. This is a single-band DEM raster of float32 data type.

No parameters are required for Big Data Appliance environments, and a single parameter -s with the ALL_ACCESS_FOLDER value is required for other environments.

The test script can be found here:

/opt/oracle/oracle-spatial-graph/spatial/raster/tests/runsingleimageprocessor.sh

For Big Data Appliance environments, enter:

./runsingleimageprocessor.sh

For non-Big Data Appliance environments, enter:

./runsingleimageprocessor.sh -s ALL_ACCESS_FOLDER

Upon successful execution, the message EXPECTED OUTPUT FILE: ALL_ACCESS_FOLDER/processtest/NapaDEM.tif is displayed, with the path to the output DEM file. The output may include:

EXPECTED OUTPUT FILE: ALL_ACCESS_FOLDER/processtest/NapaDEM.tif
total 4808
drwxrwxrwx 2 hdfs    hdfs    4096 Sep 10 09:42 .
drwxrwxrwx 9 zherena dba     4096 Sep  9 13:50 ..
-rwxrwxrwx 1 yarn    yarn 4901232 Sep 10 09:42 NapaDEM.tif
IMAGE GENERATED
----------------------------------------------------------------------

YOU MAY VISUALIZE THE OUTPUT IMAGE FOR REVIEW IN THE FOLLOWING PATH: ALL_ACCESS_FOLDER/processtest/NapaDEM.tif

If the installation and configuration were not successful, then the output is not generated and a message like the following is displayed:

IMAGE WAS NOT SUCCESSFULLY CREATED, CHECK HADOOP LOGS TO REVIEW THE PROBLEM
1.6.4.4 Image Processor DEM Test Script

This script executes the processor job using a DEM source raster of North Napa Valley and a set of coordinates that surround it. The job creates a mosaic based on these coordinates and also calculates the slope on it by setting a processing class in the mosaic configuration XML.

runimageloader.sh should be executed as a prerequisite, so that the source raster exists in HDFS. This is a single-band DEM raster of float32 data type.

No parameters are required for Big Data Appliance environments, and a single parameter -s with the ALL_ACCESS_FOLDER value is required for other environments.

The test script can be found here:

/opt/oracle/oracle-spatial-graph/spatial/raster/tests/runimageprocessordem.sh

For Big Data Appliance environments, enter:

./runimageprocessordem.sh

For non-Big Data Appliance environments, enter:

./runimageprocessordem.sh -s ALL_ACCESS_FOLDER

Upon successful execution, the message EXPECTED OUTPUT FILE: ALL_ACCESS_FOLDER/processtest/NapaSlope.tif is displayed, with the path to the slope output file. The output may include:

EXPECTED OUTPUT FILE: ALL_ACCESS_FOLDER/processtest/NapaSlope.tif
total 4808
drwxrwxrwx 2 hdfs    hdfs    4096 Sep 10 09:42 .
drwxrwxrwx 9 zherena dba     4096 Sep  9 13:50 ..
-rwxrwxrwx 1 yarn    yarn 4901232 Sep 10 09:42 NapaSlope.tif
MOSAIC IMAGE GENERATED
----------------------------------------------------------------------

YOU MAY VISUALIZE THE MOSAIC OUTPUT IMAGE FOR REVIEW IN THE FOLLOWING PATH: ALL_ACCESS_FOLDER/processtest/NapaSlope.tif

If the installation and configuration were not successful, then the output is not generated and a message like the following is displayed:

MOSAIC WAS NOT SUCCESSFULLY CREATED, CHECK HADOOP LOGS TO REVIEW THE PROBLEM

You can also test the "if" algebra function, where every pixel in this raster with a value greater than 2500 is replaced by the value you set on the command line using the -c flag. For example:

For Big Data Appliance environments, enter:

./runimageprocessordem.sh -c 8000

For non-Big Data Appliance environments, enter:

./runimageprocessordem.sh -s ALL_ACCESS_FOLDER -c 8000

You can visualize the output file and notice the difference between the simple slope calculation and this altered output, where the areas with pixel values greater than 2500 look clearer.

1.6.4.5 Multiple Raster Operation Test Script

This script executes the processor job for two rasters that cover a very small area of North Napa Valley in the US state of California.

These rasters have the same MBR, pixel size, SRID, and data type, all of which are required for complex multiple raster operation processing. The purpose of this job is to process both rasters by using the mask operation, which checks every pixel in the second raster to determine whether its value is contained in the mask list. If it is, the output raster has the pixel value of the first raster for that output cell; otherwise, the zero (0) value is set. For example, if the mask list is {1, 2} and a pixel in the second raster has the value 2, the output cell takes the first raster's pixel value; if the pixel's value is 5, the output cell is set to 0. No mosaic operation is performed here.

runimageloader.sh should be executed as a prerequisite, so that the source rasters exist in HDFS. These are single-band rasters of int32 data type.

No parameters are required for Big Data Appliance environments. For other environments, a single parameter -s with the ALL_ACCESS_FOLDER value is required.

The test script can be found here:

/opt/oracle/oracle-spatial-graph/spatial/raster/tests/runimageprocessormultiple.sh

For Big Data Appliance environments, enter:

./runimageprocessormultiple.sh

For non-Big Data Appliance environments, enter:

./runimageprocessormultiple.sh -s ALL_ACCESS_FOLDER

Upon successful execution, the message EXPECTED OUTPUT FILE: ALL_ACCESS_FOLDER/processtest/MaskInt32Rasters.tif is displayed, with the path to the mask output file. The output may include:

EXPECTED OUTPUT FILE: ALL_ACCESS_FOLDER/processtest/MaskInt32Rasters.tif
total 4808
drwxrwxrwx 2 hdfs    hdfs    4096 Sep 10 09:42 .
drwxrwxrwx 9 zherena dba     4096 Sep  9 13:50 ..
-rwxrwxrwx 1 yarn    yarn 4901232 Sep 10 09:42 MaskInt32Rasters.tif
IMAGE GENERATED
----------------------------------------------------------------------

YOU MAY VISUALIZE THE OUTPUT IMAGE FOR REVIEW IN THE FOLLOWING PATH: ALL_ACCESS_FOLDER/processtest/MaskInt32Rasters.tif

If the installation and configuration were not successful, then the output is not generated and a message like the following is displayed:

IMAGE WAS NOT SUCCESSFULLY CREATED, CHECK HADOOP LOGS TO REVIEW THE PROBLEM

1.7 Installing the Oracle Big Data SpatialViewer Web Application

To install the Oracle Big Data SpatialViewer web application (SpatialViewer), follow the instructions in this topic.

1.7.1 Assumptions for SpatialViewer

The following assumptions apply for installing and configuring SpatialViewer.

1.7.2 Installing SpatialViewer on Oracle Big Data Appliance

You can install SpatialViewer on Oracle Big Data Appliance as follows.

  1. Run the following script:

    sudo /opt/oracle/oracle-spatial-graph/spatial/configure-server/install-bdsg-consoles.sh
  2. Start the web application by using one of the following commands (the second command enables you to view logs):

    sudo service bdsg start
    sudo /opt/oracle/oracle-spatial-graph/spatial/web-server/start-server.sh

    If any errors occur, see the README file located in /opt/oracle/oracle-spatial-graph/spatial/configure-server.

  3. Open: http://<oracle_big_data_spatial_vector_console>:8045/spatialviewer/

  4. If the active nodes have changed after the installation or if Kerberos is enabled, then update the configuration file as described in Configuring SpatialViewer on Oracle Big Data Appliance.

  5. Optionally, upload sample data (used with examples in other topics) to HDFS:

    sudo -u hdfs hadoop fs -mkdir /user/oracle/bdsg
    sudo -u hdfs hadoop fs -put /opt/oracle/oracle-spatial-graph/spatial/vector/examples/data/tweets.json /user/oracle/bdsg/
    

1.7.3 Installing SpatialViewer for Other Systems (Not Big Data Appliance)

Follow the steps for manual configuration described in Installing SpatialViewer on Oracle Big Data Appliance.

Then, change the configuration as described in Configuring SpatialViewer for Other Systems (Not Big Data Appliance).

1.7.4 Configuring SpatialViewer on Oracle Big Data Appliance

To configure SpatialViewer on Oracle Big Data Appliance, follow these steps.

  1. Open the console: http://<oracle_big_data_spatial_vector_console>:8045/spatialviewer/?root=swadmin

  2. Change the general configuration, as needed:

    • Local working directory: SpatialViewer local working directory. Absolute path. The default directory /usr/oracle/spatialviewer is created when installing SpatialViewer.

    • HDFS working directory: SpatialViewer HDFS working directory. The default directory /user/oracle/spatialviewer is created when installing SpatialViewer.

    • Hadoop configuration file: The Hadoop configuration directory. By default: /etc/hadoop/conf

      If you change this value, you must restart the server.

    • Spark configuration file: The Spark configuration directory. By default: /etc/spark/conf

      If you change this value, you must restart the server.

    • eLocation URL: URL used to get the eLocation background maps. By default: http://elocation.oracle.com

    • Kerberos keytab: If Kerberos is enabled, provide the full path to the keytab file.

    • Display logs: If necessary, disable the display of the jobs in the Spatial Jobs screen. Disable this display if the logs are not in the default format. The default format is: Date LogLevel LoggerName: LogMessage

      The Date must have the default format: yyyy-MM-dd HH:mm:ss,SSS. For example: 2012-11-02 14:34:02,781.

      If the logs are not displayed and the Display logs field is set to Yes, then ensure that yarn.log-aggregation-enable in yarn-site.xml is set to true. Also ensure that the Hadoop jobs configuration parameters yarn.nodemanager.remote-app-log-dir and yarn.nodemanager.remote-app-log-dir-suffix are set to the same value as in yarn-site.xml.
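
      A quick way to inspect this setting, assuming the default Hadoop configuration path /etc/hadoop/conf used elsewhere in this chapter, is:

        grep -A 1 "yarn.log-aggregation-enable" /etc/hadoop/conf/yarn-site.xml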

  3. Change the raster configuration, as needed:

    • Shared directory: Directory used to read and write from different nodes, which requires that it be shared and have the greatest permissions, or at least be in the Hadoop user group.

    • Network file system mount point: NFS mountpoint that allows the shared folders to be seen and accessed individually. Can be blank if you are using a non-distributed environment.

    • GDAL directory: Native GDAL installation directory. Must be accessible to all the cluster nodes.

      If you change this value, you must restart the server.

    • Shared GDAL data directory: GDAL shared data folder. Must be a shared directory. (See the instructions in Installing the Image Processing Framework for Other Distributions (Not Oracle Big Data Appliance).)

  4. Change the Hadoop configuration, as needed.

  5. Change the Spark configuration, as needed. The raster processor needs additional configuration details:

    • spark.driver.extraClassPath, spark.executor.extraClassPath: Specify your Hive library installation using these keys. Example: /usr/lib/hive/lib/*

    • spark.kryoserializer.buffer.max: Enter the memory for the data serialization. Example: 160m

  6. If Kerberos is enabled, then you may need to add the parameters:

    • spark.yarn.keytab: the full path to the file that contains the keytab for the principal.

    • spark.yarn.principal: the principal to be used to log in to Kerberos. The format of a typical Kerberos V5 principal is primary/instance@REALM.

  7. On Linux systems, you may need to change the secure container executor to LinuxContainerExecutor. For that, set the following parameters:

    • Set yarn.nodemanager.container-executor.class to org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.

    • Set yarn.nodemanager.linux-container-executor.group to hadoop.

  8. Ensure that the user can read the keytab file.

  9. Copy the keytab file to the same location on all the nodes of the cluster.
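
On Oracle Big Data Appliance, one way to copy the keytab file to every node is the dcli utility (also mentioned later in this chapter); the path below is a placeholder for your actual keytab location:

    # Push the keytab to the same path on all cluster nodes (path is a placeholder).
    dcli -C -f /home/oracle/user.keytab -d /home/oracle/user.keytab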

1.7.5 Configuring SpatialViewer for Other Systems (Not Big Data Appliance)

Before installing the SpatialViewer on other systems, you must install the image processing framework as specified in Installing the Image Processing Framework for Other Distributions (Not Oracle Big Data Appliance).

Then follow the steps mentioned in Configuring SpatialViewer on Oracle Big Data Appliance.

Additionally, change the Hadoop Configuration, replacing the Hadoop property yarn.application.classpath value /opt/cloudera/parcels/CDH/lib/ with the actual library path, which by default is /usr/lib/.

Additionally, change the Hadoop and Spark configuration, replacing the Hadoop conf. directory and Spark conf. directory values according to your Hadoop and Spark installations.

1.8 Installing Property Graph Support on a CDH Cluster or Other Hardware

You can use property graphs on either Oracle Big Data Appliance or commodity hardware.

1.8.1 Apache HBase Prerequisites

The following prerequisites apply to installing property graph support in HBase.

Details about supported versions of these products, including any interdependencies, will be provided in a My Oracle Support note.

1.8.2 Property Graph Installation Steps

To install property graph support, follow these steps.

  1. Install the software package:
    rpm -i oracle-spatial-graph-<version>.x86_64.rpm
    

    By default, the software is installed in the following directory: /opt/oracle/

    After the installation completes, the /opt/oracle/oracle-spatial-graph directory exists and includes a property_graph subdirectory.

  2. Set the JAVA_HOME environment variable. For example:
    setenv JAVA_HOME  /usr/local/packages/jdk8
    
  3. Set the PGX_HOME environment variable. For example:
    setenv PGX_HOME /opt/oracle/oracle-spatial-graph/pgx
    
  4. If HBase will be used, set the HBASE_HOME environment variable in all HBase region servers in the Apache Hadoop cluster. (HBASE_HOME specifies the location of the HBase installation directory.) For example:
    setenv HBASE_HOME /usr/lib/hbase
    

    Note that on some installations of Big Data Appliance, Apache HBase is placed in a directory like the following: /opt/cloudera/parcels/CDH-5.3.3-1.cdh5.3.3.p0.5/lib/hbase/

  5. If HBase will be used, copy the data access layer library into $HBASE_HOME/lib. For example:
    cp /opt/oracle/oracle-spatial-graph/property_graph/lib/sdopgdal*.jar $HBASE_HOME/lib
    
  6. Tune the HBase or Oracle NoSQL Database configuration, as described in other tuning topics.
  7. Log in to Cloudera Manager as the admin user, and restart the HBase service. Restarting enables the Region Servers to use the new configuration settings.

1.8.3 About the Property Graph Installation Directory

The installation directory for Oracle Big Data Spatial and Graph property graph features has the following structure:

$ tree -dFL 2 /opt/oracle/oracle-spatial-graph/property_graph/
/opt/oracle/oracle-spatial-graph/property_graph/
|-- dal
|   |-- groovy
|   |-- opg-solr-config
|   `-- webapp
|-- data
|-- doc
|   |-- dal
|   `-- pgx
|-- examples
|   |-- dal
|   |-- pgx
|   `-- pyopg
|-- lib
|-- librdf
`-- pgx
    |-- bin
    |-- conf
    |-- groovy
    |-- scripts
    |-- webapp
    `-- yarn

1.8.4 Optional Installation Task for In-Memory Analyst Use

Follow this installation task if property graph support is installed on a client without Hadoop, and you want to read graph data stored in the Hadoop Distributed File System (HDFS) into the in-memory analyst and write the results back to HDFS, and/or use Hadoop NextGen MapReduce (YARN) scheduling to start, monitor, and stop the in-memory analyst.

1.8.4.1 Installing and Configuring Hadoop

To install and configure Hadoop, follow these steps.

  1. Download the tarball for a supported version of the Cloudera CDH.
  2. Unpack the tarball into a directory of your choice. For example:
    tar xvf hadoop-2.5.0-cdh5.2.1.tar.gz -C /opt
    
  3. Have the HADOOP_HOME environment variable point to the installation directory. For example:
    export HADOOP_HOME=/opt/hadoop-2.5.0-cdh5.2.1
    
  4. Add $HADOOP_HOME/bin to the PATH environment variable. For example:
    export PATH=$HADOOP_HOME/bin:$PATH
    
  5. Configure $HADOOP_HOME/etc/hadoop/hdfs-site.xml to point to the HDFS name node of your Hadoop cluster.
  6. Configure $HADOOP_HOME/etc/hadoop/yarn-site.xml to point to the resource manager node of your Hadoop cluster.
  7. Configure the fs.defaultFS field in $HADOOP_HOME/etc/hadoop/core-site.xml to point to the HDFS name node of your Hadoop cluster.
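
As a quick sanity check of steps 5 through 7 (a suggestion, not part of the required procedure), these standard commands should report your HDFS name node URI and the cluster's node managers:

    hdfs getconf -confKey fs.defaultFS
    yarn node -list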
1.8.4.2 Running the In-Memory Analyst on Hadoop

When running a Java application using in-memory analytics and HDFS, make sure that $HADOOP_HOME/etc/hadoop is on the classpath, so that the configurations get picked up by the Hadoop client libraries. However, you do not need to do this when using the in-memory analyst shell, because it adds $HADOOP_HOME/etc/hadoop automatically to the classpath if HADOOP_HOME is set.
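
For example, a launch command might look like the following, where myapp.jar and com.example.MyGraphAnalysis are placeholders for your own application, and the property graph library directory is taken from the installation layout shown earlier:

    # Include the Hadoop conf directory and the product libraries on the classpath.
    java -cp "$HADOOP_HOME/etc/hadoop:/opt/oracle/oracle-spatial-graph/property_graph/lib/*:myapp.jar" \
      com.example.MyGraphAnalysis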

You do not need to put any extra Cloudera Hadoop libraries (JAR files) on the classpath. The only time you need the YARN libraries is when starting the in-memory analyst as a YARN service. This is done with the yarn command, which automatically adds all necessary JAR files from your local installation to the classpath.

You are now ready to load data from HDFS or start the in-memory analyst as a YARN service. For further information about Hadoop, see the CDH 5.x.x documentation.

1.9 Installing and Configuring Multimedia Analytics Support

To use the multimedia analytics feature, the video analysis framework must be installed and configured.

Note:

The multimedia analytics feature of Big Data Spatial and Graph is deprecated in Big Data Spatial and Graph Release 2.5 and may be desupported in a future release. There is no replacement for the multimedia analytics features.

1.9.1 Assumptions and Libraries for Multimedia Analytics

If you have licensed Oracle Big Data Spatial and Graph with Oracle Big Data Appliance, the video analysis framework for multimedia analytics is already installed and configured. However, you must set $MMA_HOME to point to /opt/oracle/oracle-spatial-graph/multimedia.

Otherwise, you can install the framework on Cloudera CDH 5 or similar Hadoop environment, as follows:

  1. Install the framework by using the following command on each node on the cluster:

    rpm2cpio oracle-spatial-graph-<version>.x86_64.rpm | cpio -idmv

    You can use the dcli utility (see Executing Commands Across a Cluster Using the dcli Utility).

  2. Set $MMA_HOME to point to /opt/oracle/oracle-spatial-graph/multimedia.

  3. Identify the locations of the following libraries:

    • Hadoop jar files (available in $HADOOP_HOME/jars)

    • Video processing libraries (see Transcoding Software (Options))

    • OpenCV libraries (available with the product)

  4. Copy all the lib* files from $MMA_HOME/opencv_3.1.0/lib to the native Hadoop library location.

    On Oracle Big Data Appliance, this location is /opt/cloudera/parcels/CDH/lib/hadoop/lib/native.

  5. If necessary, install the desired video processing software to transcode video data (see Transcoding Software (Options)).

1.9.2 Transcoding Software (Options)

The following options are available for transcoding video data:

  • JCodec

  • FFmpeg

  • Third-party transcoding software

To use multimedia analytics with JCodec (which is included with the product), set the oracle.ord.hadoop.ordframegrabber property to the following value when running the Hadoop job to recognize faces: oracle.ord.hadoop.decoder.OrdJCodecFrameGrabber
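
Assuming the job driver uses Hadoop's standard ToolRunner/GenericOptionsParser argument handling, the property can be passed on the command line with -D; the jar and driver class below are placeholders, not part of the product:

    # Pass the frame-grabber property to a hypothetical face recognition job.
    hadoop jar myfacejob.jar com.example.FaceRecognitionDriver \
      -D oracle.ord.hadoop.ordframegrabber=oracle.ord.hadoop.decoder.OrdJCodecFrameGrabber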

To use multimedia analytics with FFmpeg:

  1. Download FFmpeg from: https://www.ffmpeg.org/.

  2. Install FFmpeg on the Hadoop cluster.

  3. Set the oracle.ord.hadoop.ordframegrabber property to the following value: oracle.ord.hadoop.decoder.OrdFFMPEGFrameGrabber

To use multimedia analytics with custom video decoding software, implement the abstract class oracle.ord.hadoop.decoder.OrdFrameGrabber. See the Javadoc for more details.