1 Big Data Spatial and Graph Overview

This chapter provides an overview of Oracle Big Data support for Oracle Spatial and Graph spatial, property graph, and multimedia analytics features.

1.1 About Big Data Spatial and Graph

Oracle Big Data Spatial and Graph delivers advanced spatial and graph analytic capabilities to supported Apache Hadoop and NoSQL Database Big Data platforms.

The spatial features include support for data enrichment of location information, spatial filtering and categorization based on distance and location-based analysis, and spatial data processing for vector and raster processing of digital map, sensor, satellite and aerial imagery values, and APIs for map visualization.

The property graph features support Apache Hadoop HBase and Oracle NoSQL Database for graph operations, indexing, queries, search, and in-memory analytics.

The multimedia analytics features provide a framework for processing video and image data in Apache Hadoop, including built-in face recognition using OpenCV.

1.2 Spatial Features

Spatial location information is a common element of Big Data. Businesses can use spatial data as the basis for associating and linking disparate data sets. Location information can also be used to track and categorize entities based on proximity to another person, place, or object, or on their presence in a particular area. Location information can facilitate location-specific offers to customers entering a particular geography, something known as geo-fencing. Georeferenced imagery and sensor data can be analyzed for a variety of business benefits.

The Spatial features of Oracle Big Data Spatial and Graph support those use cases with the following kinds of services.

Vector Services:

  • Ability to associate documents and data with names, such as cities or states, or longitude/latitude information in spatial object definitions for a default administrative hierarchy

  • Support for text-based 2D and 3D geospatial formats, including GeoJSON files, Shapefiles, GML, and WKT, or you can use the Geospatial Data Abstraction Library (GDAL) to convert popular geospatial encodings such as Oracle SDO_Geometry, ST_Geometry, and other supported formats

  • An HTML5-based map client API and a sample console to explore, categorize, and view data in a variety of formats and coordinate systems

  • Topological and distance operations: Anyinteract, Inside, Contains, Within Distance, Nearest Neighbor, and others

  • Spatial indexing for fast retrieval of data

Raster Services:

  • Support for many image file formats supported by GDAL and image files stored in HDFS

  • A sample console to view the set of images that are available

  • Raster operations, including subsetting, georeferencing, mosaics, and format conversion

1.3 Property Graph Features

Graphs manage networks of linked data as vertices, edges, and properties of the vertices and edges. Graphs are commonly used to model, store, and analyze relationships found in social networks, cyber security, utilities and telecommunications, life sciences and clinical data, and knowledge networks.

Typical graph analyses encompass graph traversal, recommendations, finding communities and influencers, and pattern matching. Industries including telecommunications, life sciences and healthcare, security, and media and publishing can benefit from graphs.

The property graph features of Oracle Big Data Spatial and Graph support those use cases with the following capabilities:

  • A scalable graph database on Apache HBase and Oracle NoSQL Database

  • Developer APIs based on Tinkerpop Blueprints and Java graph APIs

  • Text search and query through integration with Apache Lucene and SolrCloud

  • Scripting language support for Groovy and Python

  • A parallel, in-memory graph analytics engine

  • A fast, scalable suite of social network analysis functions, including ranking, centrality, recommendation, community detection, and path finding

  • Parallel bulk load and export of property graph data in an Oracle-defined flat file format

  • Manageability through a Groovy-based console to execute Java and Tinkerpop Gremlin APIs

See also Property Graph Sizing Recommendations

1.3.1 Property Graph Sizing Recommendations

The following are recommendations for property graph installation.


Table 1-1 Property Graph Sizing Recommendations

Graph Size          Recommended Physical Memory to Be Dedicated    Recommended Number of CPU Processors

10 to 100M edges    Up to 14 GB RAM          2 to 4 processors, and up to 16 processors for more compute-intensive workloads

100M to 1B edges    14 GB to 100 GB RAM      4 to 12 processors, and up to 16 to 32 processors for more compute-intensive workloads

Over 1B edges       Over 100 GB RAM          12 to 32 processors, or more for especially compute-intensive workloads


1.4 Multimedia Analytics Features

The multimedia analytics feature of Oracle Big Data Spatial and Graph provides a framework for distributed processing of video and image data in Apache Hadoop.

A main use case is performing facial recognition in videos and images.

1.5 Installing Oracle Big Data Spatial and Graph on an Oracle Big Data Appliance

The Mammoth command-line utility for installing and configuring the Oracle Big Data Appliance software also installs the Oracle Big Data Spatial and Graph option, including the spatial, property graph, and multimedia capabilities. You can enable this option during an initial software installation, or afterward using the bdacli utility.

To use Oracle NoSQL Database as a graph repository, you must have an Oracle NoSQL Database cluster.

To use Apache HBase as a graph repository, you must have an Apache Hadoop cluster.

See Also:

Oracle Big Data Appliance Owner's Guide for software configuration instructions.

1.6 Installing and Configuring the Big Data Spatial Image Processing Framework

Installing and configuring the Image Processing Framework depends upon the distribution being used.

For both distributions, you must download and compile PROJ libraries, as explained in Getting and Compiling the Cartographic Projections Library.

After performing the installation, verify it (see Post-installation Verification of the Image Processing Framework).

1.6.1 Getting and Compiling the Cartographic Projections Library

Before installing the Image Processing Framework, you must download the Cartographic Projections Library and perform several related operations.

  1. Download the PROJ.4 source code and datum shifting files:

    $ wget http://download.osgeo.org/proj/proj-4.9.1.tar.gz
    $ wget http://download.osgeo.org/proj/proj-datumgrid-1.5.tar.gz
    
  2. Untar the source code, and extract the datum shifting files in the nad subdirectory:

    $ tar xzf proj-4.9.1.tar.gz
    $ cd proj-4.9.1/nad
    $ tar xzf ../../proj-datumgrid-1.5.tar.gz
    $ cd ..
    
  3. Configure, make, and install PROJ.4:

    $ ./configure
    $ make
    $ sudo make install
    $ cd ..
    

    libproj.so is now available at /usr/local/lib/libproj.so.

  4. Create a link to the libproj.so file in the spatial installation directory:

    sudo ln -s /usr/local/lib/libproj.so /opt/oracle/oracle-spatial-graph/spatial/raster/gdal/lib/libproj.so
    
  5. Provide read and execute permissions for the libproj.so library for all users

    sudo chmod 755 /opt/oracle/oracle-spatial-graph/spatial/raster/gdal/lib/libproj.so
    

1.6.2 Installing the Image Processing Framework for Oracle Big Data Appliance Distribution

The Oracle Big Data Appliance distribution comes with a pre-installed configuration. However, be sure that the actions described in Getting and Compiling the Cartographic Projections Library have been performed, so that libproj.so (PROJ.4) is accessible to all users and is set up correctly.
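
A quick way to confirm this on each node is to check the link and its permissions (the path is from Getting and Compiling the Cartographic Projections Library):

ls -l /opt/oracle/oracle-spatial-graph/spatial/raster/gdal/lib/libproj.so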

1.6.3 Installing the Image Processing Framework for Other Distributions (Not Oracle Big Data Appliance)

For Big Data Spatial and Graph in environments other than the Big Data Appliance, follow the instructions in this section.

1.6.3.1 Prerequisites for Installing the Image Processing Framework for Other Distributions

  • Ensure that HADOOP_LIB_PATH is under /usr/lib/hadoop. If it is not there, find the actual path and use it as your HADOOP_LIB_PATH.

  • Install NFS.

  • Have at least one folder, referred to in this document as SHARED_FOLDER, in the Resource Manager node accessible to every Node Manager node through NFS.

  • Provide write access to this SHARED_FOLDER for all users involved in job execution, including the yarn user.

  • Download oracle-spatial-graph-<version>.x86_64.rpm from the Oracle e-delivery web site.

  • Execute oracle-spatial-graph-<version>.x86_64.rpm using the rpm command, as sketched after this list.

  • After rpm executes, verify that a directory structure was created at /opt/oracle/oracle-spatial-graph/spatial containing these folders: console, examples, jlib, gdal, and tests. Additionally, index.html describes the content, and HadoopRasterProcessorAPI.zip contains the Javadoc for the API.
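
The following is a minimal sketch of the last two steps, assuming the rpm file is in the current directory and you have root privileges:

    sudo rpm -i oracle-spatial-graph-<version>.x86_64.rpm
    ls /opt/oracle/oracle-spatial-graph/spatial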

1.6.3.2 Installing the Image Processing Framework for Other Distributions

  1. Make the libproj.so (Proj.4) Cartographic Projections Library accessible to the users, as explained in Getting and Compiling the Cartographic Projections Library.
  2. In the Resource Manager node, copy the gdal data folder and the gdalplugins folder under /opt/oracle/oracle-spatial-graph/spatial/raster/gdal into the SHARED_FOLDER as follows (the gdalplugins path shown here is assumed to mirror the data folder layout):

    cp -R /opt/oracle/oracle-spatial-graph/spatial/raster/gdal/data SHARED_FOLDER
    cp -R /opt/oracle/oracle-spatial-graph/spatial/raster/gdal/gdalplugins SHARED_FOLDER

  3. Create a directory ALL_ACCESS_FOLDER under SHARED_FOLDER with write access for all users involved in job execution. Also include the yarn user in the write access, because job results are written by this user. Group access may be used to configure this.

    Go to the shared folder.

    cd SHARED_FOLDER

    Create a new directory.

    mkdir ALL_ACCESS_FOLDER

    Provide write access.

    chmod 777 ALL_ACCESS_FOLDER

  4. Copy the data folder under /opt/oracle/oracle-spatial-graph/spatial/raster/examples into ALL_ACCESS_FOLDER.

    cp -R /opt/oracle/oracle-spatial-graph/spatial/raster/examples/data ALL_ACCESS_FOLDER

  5. Provide write access to the data/xmls folder as follows (or just ensure that users executing the jobs, including tests and examples, have write access):

    chmod 777 ALL_ACCESS_FOLDER/data/xmls/

1.6.4 Post-installation Verification of the Image Processing Framework

Several test scripts are provided:

  • One to test the image loading functionality

  • One to test the image processing functionality

  • One to test a processing class for slope calculation in a DEM and a map algebra operation

  • One to verify the image processing of a single raster with no mosaic process (it includes a user-provided function that calculates hill shade in the mapping phase)

Execute these scripts to verify a successful installation of the image processing framework.

If the cluster has security enabled, make sure the current user is in the princs list and has an active Kerberos ticket.
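
For example, to check for an active ticket and obtain one if necessary (the principal shown is illustrative):

klist
kinit <user>@<YOUR.REALM>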

Make sure the user has write access to $ALL_ACCESS_FOLDER. For Oracle Big Data Spatial and Graph, this directory has default write access for the hadoop user group.

1.6.4.1 Image Loading Test Script

This script loads a set of four test rasters into the ohiftest folder in HDFS: three 3-band rasters of byte data type and one single-band raster (DEM) of float32 data type. No parameters are required for Oracle Big Data Appliance environments; for non-Big Data Appliance environments, a single parameter with the $ALL_ACCESS_FOLDER value is required.

Internally, the job creates a split for every raster to load. Split size depends on the block size configuration; for example, if a block size of at least 64 MB is configured, 4 mappers will run. As a result, the rasters are loaded into HDFS and a corresponding thumbnail is created for visualization. An external image editor is required to view the thumbnails, and the output path of these thumbnails is provided to the users upon successful completion of the job.
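
For example, to check the configured HDFS block size (in bytes) and, after the job finishes, to list the loaded OHIF files (assuming the ohiftest folder is under the executing user's HDFS home directory):

hdfs getconf -confKey dfs.blocksize
hadoop fs -ls ohiftest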

The test script can be found here:

/oracle/oracle-spatial-graph/raster/tests/runimageloader.sh

For Oracle Big Data Appliance environments, enter:

./runimageloader.sh

For non-Big Data Appliance environments, enter:

./runimageloader.sh ALL_ACCESS_FOLDER

Upon successful execution, the message GENERATED OHIF FILES ARE LOCATED IN HDFS UNDER is displayed, with the path in HDFS where the files are located (this path depends on the definition of ALL_ACCESS_FOLDER) and a list of the created images and thumbnails on HDFS. The output may include:

THUMBNAILS CREATED ARE:
----------------------------------------------------------------------
total 13532
drwxr-xr-x 2 yarn yarn    4096 Sep  9 13:54 .
drwxr-xr-x 3 yarn yarn    4096 Aug 27 11:29 ..
-rw-r--r-- 1 yarn yarn 3214053 Sep  9 13:54 hawaii.tif.ohif.tif
-rw-r--r-- 1 yarn yarn 3214053 Sep  9 13:54 kahoolawe.tif.ohif.tif
-rw-r--r-- 1 yarn yarn 3214053 Sep  9 13:54 maui.tif.ohif.tif
-rw-r--r-- 1 yarn yarn 4182040 Sep  9 13:54 NapaDEM.tif.ohif.tif
YOU MAY VISUALIZE THUMBNAILS OF THE UPLOADED IMAGES FOR REVIEW FROM THE FOLLOWING PATH:  

If the installation and configuration were not successful, then the output is not generated and a message like the following is displayed:

NOT ALL THE IMAGES WERE UPLOADED CORRECTLY, CHECK FOR HADOOP LOGS

The amount of memory required to execute mappers and reducers depends on the configured HDFS block size. By default, 1 GB of memory is assigned for Java, but you can modify that and other properties in the imagejob.prop file that is included in this test directory.

1.6.4.2 Image Processor Test Script (Mosaicking)

This script executes the processor job by setting three source rasters of the Hawaii islands and coordinates that include all three. The job creates a mosaic based on these coordinates, and the resulting raster includes the three source rasters combined into a single one.

runimageloader.sh should be executed as a prerequisite, so that the source rasters exist in HDFS. These are 3 band rasters of byte data type.

No parameters are required for Oracle Big Data Appliance environments; for non-Big Data Appliance environments, a single parameter, -s, with the $ALL_ACCESS_FOLDER value is required.

Additionally, if the output should be stored in HDFS, the -o parameter must be used to set the HDFS folder where the mosaic output will be stored.

Internally, the job filters the tiles using the coordinates specified in the configuration input XML; only the required tiles are processed in a mapper, and finally, in the reduce phase, all of them are put together into the resulting mosaic raster.

The test script can be found here:

/oracle/oracle-spatial-graph/raster/tests/runimageprocessor.sh

For Oracle Big Data Appliance environments, enter:

./runimageprocessor.sh

For non-Big Data Appliance environments, enter:

./runimageprocessor.sh -s ALL_ACCESS_FOLDER

Upon successful execution, the message EXPECTED OUTPUT FILE IS is displayed, with the path in HDFS where the files are located (this path depends on the definition of ALL_ACCESS_FOLDER) and a list of the created images and thumbnails on HDFS. The output may include:

ALL_ACCESS_FOLDER/processtest/hawaiimosaic.tif
total 9452
drwxrwxrwx 2 hdfs    hdfs    4096 Sep 10 09:12 .
drwxrwxrwx 9 zherena dba     4096 Sep  9 13:50 ..
-rwxrwxrwx 1 yarn    yarn 4741101 Sep 10 09:12 hawaiimosaic.tif

MOSAIC IMAGE GENERATED
----------------------------------------------------------------------
YOU MAY VISUALIZE THE MOSAIC OUTPUT IMAGE FOR REVIEW IN THE FOLLOWING PATH: ALL_ACCESS_FOLDER/processtest/hawaiimosaic.tif

If the installation and configuration were not successful, then the output is not generated and a message like the following is displayed:

MOSAIC WAS NOT SUCCESSFULLY CREATED, CHECK HADOOP LOGS TO REVIEW THE PROBLEM

To test the output storage in HDFS, use the following commands.

For Oracle Big Data Appliance environments, enter:

./runimageprocessor.sh -o hdfstest

For non-Big Data Appliance environments, enter:

./runimageprocessor.sh -s ALL_ACCESS_FOLDER -o hdfstest

1.6.4.3 Single-Image Processor Test Script

This script executes the processor job for a single raster, in this case a DEM source raster of North Napa Valley. The purpose of this job is to process the complete input by using the user processing classes configured for the mapping phase. This class calculates the hillshade of the DEM, which is set to the output file. No mosaic operation is performed.

runimageloader.sh should be executed as a prerequisite, so that the source raster exists in HDFS. This is a single-band DEM raster of float32 data type.

No parameters are required for Oracle Big Data Appliance environments; for non-Big Data Appliance environments, a single parameter, -s, with the $ALL_ACCESS_FOLDER value is required.

The test script can be found here:

/oracle/oracle-spatial-graph/raster/tests/runsingleimageprocessor.sh

For Oracle Big Data Appliance environments, enter:

./runsingleimageprocessor.sh

For non-Big Data Appliance environments, enter:

./runsingleimageprocessor.sh -s ALL_ACCESS_FOLDER

Upon successful execution, the message EXPECTED OUTPUT FILE: ALL_ACCESS_FOLDER/processtest/NapaDEM.tif is displayed, with the path in HDFS where the files are located (this path depends on the definition of ALL_ACCESS_FOLDER) and a list of the created images and thumbnails on HDFS. The output may include:

EXPECTED OUTPUT FILE: ALL_ACCESS_FOLDER/processtest/NapaDEM.tif
total 4808
drwxrwxrwx 2 hdfs    hdfs    4096 Sep 10 09:42 .
drwxrwxrwx 9 zherena dba     4096 Sep  9 13:50 ..
-rwxrwxrwx 1 yarn    yarn 4901232 Sep 10 09:42 NapaDEM.tif
IMAGE GENERATED
----------------------------------------------------------------------

YOU MAY VISUALIZE THE OUTPUT IMAGE FOR REVIEW IN THE FOLLOWING PATH: ALL_ACCESS_FOLDER/processtest/NapaDEM.tif

If the installation and configuration were not successful, then the output is not generated and a message like the following is displayed:

MOSAIC WAS NOT SUCCESSFULLY CREATED, CHECK HADOOP LOGS TO REVIEW THE PROBLEM

1.6.4.4 Image Processor DEM Test Script

This script executes the processor job by setting a DEM source raster of North Napa Valley and some coordinates that surround it. The job will create a mosaic based on these coordinates and will also calculate the slope on it by setting a processing class in the mosaic configuration XML.

runimageloader.sh should be executed as a prerequisite, so that the source raster exists in HDFS. This is a single-band DEM raster of float32 data type.

No parameters are required for Oracle Big Data Appliance environments; for non-Big Data Appliance environments, a single parameter, -s, with the $ALL_ACCESS_FOLDER value is required.

The test script can be found here:

/oracle/oracle-spatial-graph/raster/tests/runimageprocessordem.sh

For Oracle Big Data Appliance environments, enter:

./runimageprocessordem.sh

For non-Big Data Appliance environments, enter:

./runimageprocessordem.sh -s ALL_ACCESS_FOLDER

Upon successful execution, the message EXPECTED OUTPUT FILE: ALL_ACCESS_FOLDER/processtest/NapaSlope.tif is displayed, with the path in HDFS where the files are located (this path depends on the definition of ALL_ACCESS_FOLDER) and a list of the created images and thumbnails on HDFS. The output may include:

EXPECTED OUTPUT FILE: ALL_ACCESS_FOLDER/processtest/NapaSlope.tif
total 4808
drwxrwxrwx 2 hdfs    hdfs    4096 Sep 10 09:42 .
drwxrwxrwx 9 zherena dba     4096 Sep  9 13:50 ..
-rwxrwxrwx 1 yarn    yarn 4901232 Sep 10 09:42 NapaSlope.tif
MOSAIC IMAGE GENERATED
----------------------------------------------------------------------

YOU MAY VISUALIZE THE MOSAIC OUTPUT IMAGE FOR REVIEW IN THE FOLLOWING PATH: ALL_ACCESS_FOLDER/processtest/NapaSlope.tif

If the installation and configuration were not successful, then the output is not generated and a message like the following is displayed:

MOSAIC WAS NOT SUCCESSFULLY CREATED, CHECK HADOOP LOGS TO REVIEW THE PROBLEM

You may also test the “if” algebra function, where every pixel in this raster with a value greater than 2500 will be replaced by the value you set on the command line using the -c flag. For example:

For Oracle Big Data Appliance environments, enter:

./runimageprocessordem.sh -c 8000

For non-Big Data Appliance environments, enter:

./runimageprocessordem.sh -s ALL_ACCESS_FOLDER -c 8000

You can visualize the output file and notice the difference between the simple slope calculation and this altered output, where the areas with pixel values greater than 2500 appear clearer.

1.7 Installing and Configuring the Big Data Spatial Image Server

You can access the image processing framework through the Oracle Big Data Spatial Image Server, which provides a web interface for loading and processing images.

Installing and configuring the Spatial Image Server depends upon the distribution being used.

After you perform the installation, verify it (see Post-Installation Verification Example for the Image Server Console).

1.7.1 Installing and Configuring the Image Server for Oracle Big Data Appliance

To perform an automatic installation using the provided script, follow these steps:

  1. Run the following script:

    sudo /home/osg/configure-server/install-bdsg-consoles.sh
    

    If the active nodes have changed since the installation, update the configuration file.

  2. Start the server:

    cd /opt/oracle/oracle-spatial-graph/spatial/web-server
    ./start-server.sh
    

The preceding instructions configure the entire server. If no further configuration is required, you can go directly to Post-Installation Verification Example for the Image Server Console.

If you need more information or need to perform other actions, see the following topics:

1.7.1.1 Prerequisites for Performing a Manual Installation

Ensure that you have the prerequisite software installed.

  1. Download the latest Jetty core component binary from the Jetty download page http://www.eclipse.org/jetty/downloads.php onto the Oracle Big Data Appliance Resource Manager node.
  2. Unzip the imageserver.war file into the jetty webapps directory or any other directory of choice as follows:

    unzip /opt/oracle/oracle-spatial-graph/spatial/jlib/imageserver.war -d $JETTY_HOME/webapps/imageserver

    Note:

    The directory or location under which you unzip the file is known as $JETTY_HOME in this procedure.

  3. Copy Hadoop dependencies as follows:

    cp /opt/cloudera/parcels/CDH/lib/hadoop/client/* $JETTY_HOME/webapps/imageserver/WEB-INF/lib/

    Note:

    If the installation of Jetty is done on a non-Oracle Big Data Appliance cluster, then replace /opt/cloudera/parcels/CDH/lib/hadoop/ with the actual Hadoop library path, which by default is /usr/lib/hadoop.

1.7.1.2 Installing Dependencies on the Image Server Web on an Oracle Big Data Appliance

  1. Copy the gdal.jar file under /opt/oracle/oracle-spatial-graph/spatial/jlib/gdal.jar to $JETTY_HOME/lib/ext.

  2. Copy the asm-3.1.jar file under /opt/oracle/oracle-spatial-graph/spatial/raster/jlib/asm-3.1.jar to $JETTY_HOME/webapps/imageserver/WEB-INF/lib.

    Note:

    The jersey-core* jars will be duplicated at $JETTY_HOME/webapps/imageserver/WEB-INF/lib. Make sure you remove the old ones and leave just jersey-core-1.17.1.jar in the folder, as described in the next steps.

  3. Enter the following command:

    ls -lat jersey-core*
    
  4. Delete the listed libraries, except do not delete jersey-core-1.17.1.jar.

  5. In the same directory ($JETTY_HOME/webapps/imageserver/WEB-INF/lib), delete the xercesImpl jar files:

     rm xercesImpl*
    
  6. Start the Jetty server by running: java -Djetty.deploy.scanInterval=0 -jar start.jar

    If you need to change the port, specify it as an additional argument to start.jar. For example: jetty.http.port=8081

    Ignore any warnings, such as the following:

    java.lang.UnsupportedOperationException:  setXIncludeAware is not supported on this JAXP implementation or earlier: class oracle.xml.jaxp.JXDocumentBuilderFactory
    

1.7.1.3 Configuring the Environment for Big Data Appliance

  1. Enter http://<hostname>:8080/imageserver in your browser address bar to open the console.

  2. From the Administrator tab, go to the Configuration tab. In the Hadoop Configuration Parameters section, change the following three properties, depending on the cluster configuration:

    1. fs.defaultFS: Enter the active namenode of your cluster in the format hdfs://<namenode>:8020. (Check with the administrator for this information.)

    2. yarn.resourcemanager.scheduler.address: The scheduler address of the active Resource Manager of your cluster, in the format <schedulername>:8030.

    3. yarn.resourcemanager.address: The active Resource Manager address, in the format <resourcename>:8032.

    Note:

    Keep the default values for the rest of the configuration. They are pre-loaded for your Oracle Big Data Appliance cluster environment.

  3. Click Apply Changes to save the changes.

    Tip:

    You can review the missing configuration information under the Hadoop Loader tab of the console.

1.7.2 Installing and Configuring the Image Server Web for Other Systems (Not Big Data Appliance)

To install and configure the image server web for other systems (not Big Data Appliance), see these topics.

1.7.2.1 Prerequisites for Installing the Image Server on Other Systems

1.7.2.2 Installing the Image Server Web on Other Systems

1.7.2.3 Configuring the Environment for Other Systems

  1. Configure the environment as described in Configuring the Environment for Big Data Appliance, and then continue with the following steps.
  2. From the Configuration tab, in the Global Init Parameters section, change the following properties, depending on the cluster configuration:

    1. shared.gdal.data: Specify the gdal shared data folder. Follow the instructions in Installing the Image Processing Framework for Other Distributions (Not Oracle Big Data Appliance) .

    2. gdal.lib: Location of the gdal .so libraries.

    3. start: Specify a shared folder to start browsing the images. This folder must be shared between the cluster and NFS mountpoint (SHARED_FOLDER).

    4. saveimages: Create a child folder named saveimages under start (SHARED_FOLDER) with full write access. For example, if start=/home, then saveimages=/home/saveimages.

    5. nfs.mountpoint: If the cluster requires a mount point to access the SHARED_FOLDER, specify a mount point. For example, /net/home. Otherwise, leave it blank.

  3. From the Configuration tab in the Hadoop Configuration Parameters section, update the following property:

    1. yarn.application.classpath: The classpath for Hadoop to find the required JARs and dependencies. Usually this is under /usr/lib/hadoop. For example:

      /etc/hadoop/conf/,/usr/lib/hadoop/*,/usr/lib/hadoop/lib/*,/usr/lib/hadoop-hdfs/*,/usr/lib/hadoop-hdfs/lib/*,/usr/lib/hadoop-yarn/*,/usr/lib/hadoop-yarn/lib/*,/usr/lib/hadoop-mapreduce/*,/usr/lib/hadoop-mapreduce/lib/*
      

    Note:

    Keep the default values for the rest of the configuration.

  4. Click Apply Changes to save the changes.

    Tip:

    You can review any missing configuration information under the Hadoop Loader tab of the console.

1.7.3 Post-Installation Verification Example for the Image Server Console

In this example, you will:

  • Load the images from the local server to the Hadoop cluster HDFS.

  • Run a job to create a mosaic image file and a catalog with several images.

  • View the mosaic image.

Related subtopics:

1.7.3.1 Loading Images from the Local Server to the HDFS Hadoop Cluster

  1. Open the Image Server console: http://<hostname>:8080/imageserver.
  2. Go to the Hadoop Loader tab.
  3. Click Open and browse to the demo folder that contains a set of Hawaii images. They can be found at /opt/shareddir/spatial/data/rasters.
  4. Select the rasters folder and click Load images.

    Wait for the message, 'Images loaded successfully'.

Note:

If no errors were shown, then you have successfully installed the Image Loader web interface.

1.7.3.2 Creating a Mosaic Image and Catalog

  1. Go to the Raster Processing tab.
  2. From the Catalog menu select Catalog > New Catalog > HDFS Catalog.

    A new catalog is created.

  3. From the Imagery menu select Imagery > Add hdfs image.
  4. Browse the HDFS host and add images.

    A new file tree gets created with all the images you just loaded from your host.

  5. Browse the newdata folder and verify the images.
  6. Select the images listed in the pre-visualizer and click Add.

    The images are added to the bottom sub-panel.

  7. Click Add images.

    The images are added to the main catalog.

  8. Save the catalog.
  9. From the Imagery menu select Imagery > Mosaic.
  10. Copy the testFS.xml file from /opt/shareddir/spatial/data/xmls to your $HOME directory.
  11. Click Load default configuration file, browse to the default home directory, and select testFS.xml.

    Note:

    The default configuration file testFS.xml is included in the demo.

  12. Click Create Mosaic.

    Wait for the image to be created.

  13. Optionally, to download and view the image, click Download.

1.7.3.3 Creating a Mosaic Directly from the Globe

  1. Go to the Hadoop Raster Viewer tab.
  2. Click Refresh Footprint and wait until all footprints are displayed on the panel.
  3. Click Select Footprints, then select the desired area, zooming in or out as necessary.
  4. Remove or ignore rasters as necessary.

    If identical rasters are in the result, they are shown in yellow.

  5. Right-click on the map and select Generate Mosaic.
  6. Specify the output folder in which to place the mosaic, or load an existing configuration file.
  7. If you want to add an operation on every pixel in the mosaic, click Advanced Configuration.
  8. Click Create Mosaic, and wait for the result.
  9. If you need to remove the selection, click the red circle in the upper-left corner of the map.

    Note:

    If you requested the mosaic to be created on HDFS, you must wait until the image is loaded on HDFS.

  10. Optionally, to download and view the image, click Download.

1.7.3.4 Removing Identical Rasters

  1. Go to the Hadoop Loader tab.
  2. Click Refresh Footprint and wait until all footprints are displayed on the panel.

    If identical rasters are in the result, they are shown in yellow.

  3. For each pair of identical rasters, if you want to select one of them for removal, right-click on its yellow box.

    A new dialog box is displayed.

  4. To remove a raster, click the X button for it.
  5. To see the thumbnail, click in the image.

1.7.4 Using the Provided Image Server Web Services

The image server has two ready-to-use web services, one for the HDFS loader and the other for the HDFS mosaic processor.

These services can be called from a Java application. They are currently supported only for GET operations. A quick command-line check and Java examples follow the descriptions. The formats for calling them are:

  • Loader: http://host:port/imageserver/rest/hdfsloader?path=string&overlap=string where:

    path: The images to be processed; this can be the path of a single file, or of one or more whole folders. For more than one folder, use commas to separate folder names.

    overlap (optional): The overlap between images (default = 10).

  • Mosaic: http://host:port/imageserver/rest/mosaic?mosaic=string&config=string where:

    mosaic: The XML mosaic file that contains the images to be processed. If you are using the image server web application, the XML file is generated automatically. Example of a mosaic XML file:

    <?xml version='1.0'?>
    <catalog type='HDFS'>
        <image>
           <source>Hadoop File System</source>
           <type>HDFS</type>
           <raster>/hawaii.tif.ohif</raster>
            <bands datatype='1' config='1,2,3'>3</bands>
        </image>
        <image>
           <source>Hadoop File System</source>
           <type>HDFS</type>
           <raster>/kahoolawe.tif.ohif</raster>
            <bands datatype='1' config='1,2,3'>3</bands>
        </image>
    </catalog>
    

    config: Configuration file, created the first time a mosaic is processed using the image server web application. Example of a configuration file:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <mosaic>
        <output>
            <SRID>26904</SRID>
            <directory type = "FS">/net/system123/scratch/user3/installers</directory>
            <tempFsFolder>/net/system123/scratch/user3/installers</tempFsFolder>
            <filename>test</filename>
            <format>GTIFF</format>
            <width>1800</width>
            <height>1406</height>
            <algorithm order = "0">1</algorithm>
            <bands layers = "3"/>
            <nodata>#000000</nodata>
            <pixelType>1</pixelType>
        </output>
        <crop>
            <transform>294444.1905688362,114.06068372059636,0,2517696.9179752027,0,-114.06068372059636</transform>
        </crop>
        <process/>
        <operations>
            <localnot/>
        </operations>
    </mosaic> 
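
Before writing a Java client, you can issue a quick loader request from the command line with curl; the host, port, and raster path shown are placeholders:

curl "http://host:port/imageserver/rest/hdfsloader?path=/path/to/rasters/hawaii.tif&overlap=10"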
    

Java Example: Using the Loader

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;

public class RestTest {
    public static void main(String args[]) {

        try {
            // Loader http://localhost:7101/imageserver/rest/hdfsloader?path=string&overlap=string
            // Mosaic http://localhost:7101/imageserver/rest/mosaic?mosaic=string&config=string
            String path = "/net/system123/scratch/user3/installers/hawaii/hawaii.tif";

            URL url = new URL(
                    "http://system123.example.com:7101/imageserver/rest/hdfsloader?path=" +
                            path + "&overlap=2"); // overlap is optional
           
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("GET");
            //conn.setRequestProperty("Accept", "application/json");

            if (conn.getResponseCode() != 200) {
                throw new RuntimeException("Failed : HTTP error code : "
                        + conn.getResponseCode());
            }

            BufferedReader br = new BufferedReader(new InputStreamReader(
                    (conn.getInputStream())));

            String output;
            System.out.println("Output from Server .... \n");
            while ((output = br.readLine()) != null) {
                System.out.println(output);
            }

            conn.disconnect();

        } catch (MalformedURLException e) {

            e.printStackTrace();

        } catch (IOException e) {

            e.printStackTrace();

        }
    }
}

Java Example: Using the Mosaic Processor

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLEncoder;

public class NetClientPost {
    public static void main(String[] args) {

      try {
          String mosaic = "<?xml version='1.0'?>\n" +
                    "<catalog type='HDFS'>\n" +
                    "    <image>\n" +
                    "       <source>Hadoop File System</source>\n" +
                    "       <type>HDFS</type>\n" +
                    "       <raster>/user/hdfs/newdata/net/system123/scratch/user3/installers/hawaii/hawaii.tif.ohif</raster>\n" +
                    "       <url>http://system123.example.com:7101/imageserver/temp/862b5871973372aab7b62094c575884ae13c3a27_thumb.jpg</url>\n" +
                    "       <bands datatype='1' config='1,2,3'>3</bands>\n" +
                    "    </image>\n" +
                    "</catalog>";
          String config = "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n" +
                    "<mosaic>\n" +
                    "<output>\n" +
                    "<SRID>26904</SRID>\n" +
                    "<directory type=\"FS\">/net/system123/scratch/user3/installers</directory>\n" +
                    "<tempFsFolder>/net/system123/scratch/user3/installers</tempFsFolder>\n" +
                    "<filename>test</filename>\n" +
                    "<format>GTIFF</format>\n" +
                    "<width>1800</width>\n" +
                    "<height>1269</height>\n" +
                    "<algorithm order=\"0\">1</algorithm>\n" +
                    "<bands layers=\"3\"/>\n" +
                    "<nodata>#000000</nodata>\n" +
                    "<pixelType>1</pixelType>\n" +
                    "</output>\n" +
                    "<crop>\n" +
                    "<transform>739481.1311601736,130.5820811245199,0,2254053.5858749463,0,-130.5820811245199</transform>\n" +
                    "</crop>\n" +
                    "<process/>\n" +
                    "</mosaic>";

          // Call the mosaic processor REST service with the catalog and configuration
          URL url = new URL("http://system123.example.com:7101/imageserver/rest/mosaic?" +
                  "mosaic=" + URLEncoder.encode(mosaic, "UTF-8") +
                  "&config=" + URLEncoder.encode(config, "UTF-8"));
          HttpURLConnection conn = (HttpURLConnection) url.openConnection();
          conn.setRequestMethod("GET");

          if (conn.getResponseCode() != 200) {
              throw new RuntimeException("Failed : HTTP error code : "
                  + conn.getResponseCode());
          }
          BufferedReader br = new BufferedReader(new InputStreamReader(
                  conn.getInputStream()));
          String output;
          System.out.println("Output from Server .... \n");
          while ((output = br.readLine()) != null) {
              System.out.println(output);
          }
          conn.disconnect();

      } catch (MalformedURLException e) {
        e.printStackTrace();
      } catch (IOException e) {
        e.printStackTrace();
     }
    }
}

1.8 Installing the Oracle Big Data Spatial Hadoop Vector Console

To install the Oracle Big Data Spatial Hadoop vector console, follow the instructions in this topic.

1.8.1 Assumptions and Prerequisite Libraries

The following assumptions and prerequisites apply to installing and configuring the Spatial Hadoop Vector Console.

1.8.1.1 Assumptions

  • The API and jobs described here run on a Cloudera CDH5.7, Hortonworks HDP 2.4, or similar Hadoop environment.

  • Java 8 or newer versions are present in your environment.

1.8.1.2 Prerequisite Libraries

In addition to the Hadoop environment jars, the libraries listed here are required by the Vector Analysis API.

sdohadoop-vector.jar
sdoutil.jar
sdoapi.jar
ojdbc.jar
commons-fileupload-1.3.1.jar
commons-io-2.4.jar
jackson-annotations-2.1.4.jar
jackson-core-2.1.4.jar
jackson-core-asl-1.8.1.jar
jackson-databind-2.1.4.jar
javacsv.jar
lucene-analyzers-common-4.6.0.jar
lucene-core-4.6.0.jar
lucene-queries-4.6.0.jar
lucene-queryparser-4.6.0.jar
mvsuggest_core.jar

1.8.2 Installing the Spatial Hadoop Vector Console on Oracle Big Data Appliance

You can install the Spatial Hadoop vector console on Big Data Appliance either by using the provided script or by performing a manual configuration. To use the provided script:

  1. Run the following script to install the console:

    sudo /home/osg/configure-server/install-bdsg-consoles.sh
    

    If the active nodes have changed after the installation, then update the configuration file as described in Configuring the Spatial Hadoop Vector Console on Oracle Big Data Appliance.

  2. Start the console:

    cd /opt/oracle/oracle-spatial-graph/spatial/web-server 
    ./start-server.sh
    

To perform a manual configuration, follow these steps.

  1. Download the latest Jetty core component binary from the Jetty download page http://www.eclipse.org/jetty/downloads.php onto the Oracle Big Data Appliance Resource Manager node.
  2. Unzip the spatialviewer.war file into the jetty webapps directory as follows:

    unzip /opt/oracle/oracle-spatial-graph/spatial/vector/console/spatialviewer.war -d $JETTY_HOME/webapps/spatialviewer

    Note:

    The directory or location under which you unzip the file is known as $JETTY_HOME in this procedure.

  3. Copy Hadoop dependencies as follows:

    cp /opt/cloudera/parcels/CDH/lib/hadoop/client/* $JETTY_HOME/webapps/spatialviewer/WEB-INF/lib/

  4. Complete the configuration steps mentioned in the "Configuring the Spatial Hadoop Vector Console on Oracle Big Data Appliance."
  5. Start the Jetty server from $JETTY_HOME: java -Djetty.deploy.scanInterval=0 -jar start.jar

Optionally, upload sample data (used with examples in other topics) to HDFS:

sudo -u hdfs hadoop fs -mkdir /user/oracle/bdsg

sudo -u hdfs hadoop fs -put /opt/oracle/oracle-spatial-graph/spatial/vector/examples/data/tweets.json /user/oracle/bdsg/
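
You can then verify the upload (using the paths from the preceding commands):

sudo -u hdfs hadoop fs -ls /user/oracle/bdsg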

1.8.3 Installing the Spatial Hadoop Vector Console for Other Systems (Not Big Data Appliance)

Follow the steps for manual configuration described in "Installing the Spatial Hadoop Vector Console on Oracle Big Data Appliance." However, in step 3 replace the path /opt/cloudera/parcels/CDH/lib/ with the actual library path, which by default is /usr/lib/.

1.8.4 Configuring the Spatial Hadoop Vector Console on Oracle Big Data Appliance

  1. Edit the configuration file $JETTY_HOME/webapps/spatialviewer/conf/console-conf.xml, or /opt/oracle/oracle-spatial-graph/spatial/web-server/spatialviewer/conf/console-conf.xml if the installation was done using the provided script, to specify your own data for sending email and for other configuration parameters.

    Follow these steps with the configuration parameters:

    1. Edit the notification URL. This is the URL where the console server is running; it must be visible to the Hadoop cluster so that the console can be notified when jobs end. Example setting: <baseurl>http://hadoop.console.url:8080</baseurl>

    2. Edit the directory with temporary hierarchical indexes: an HDFS path that will contain temporary data on hierarchical relationships. Example: <hierarchydataindexpath>hdfs://hadoop.cluster.url:8020/user/myuser/hierarchyIndexPath</hierarchydataindexpath>

    3. Edit the HDFS directory that will contain the MVSuggest generated index. Example: <mvsuggestindex>hdfs://hadoop.cluster.url:8020/user/myuser/mvSuggestIndex</mvsuggestindex>

    4. If necessary, edit the URL used to get the eLocation background maps. Example: <elocationmvbaseurl>http://elocation.oracle.com/mapviewer</elocationmvbaseurl>

    5. Edit the HDFS directory that will contain the index metadata. Example: <indexmetadatapath>hdfs://hadoop.cluster.url:8020/user/myuser/indexMetadata</indexmetadatapath>

    6. Edit the HDFS directory with temporary data used by the explore data processes. Example: <exploretempdatapath>hdfs://hadoop.cluster.url:8020/user/myuser/exploreTmp</exploretempdatapath>

    7. Edit the HDFS directory that will contain information about the jobs run by the console. Example: <jobregistrypath>hdfs://hadoop.cluster.url:8020/user/myuser/spatialJobRegistry</jobregistrypath>

    8. If necessary, disable the display of the jobs in the job details screen. Disable this display if the logs are not in the default format. The default format is: Date LogLevel LoggerName: LogMessage

      The Date must have the default format: yyyy-MM-dd HH:mm:ss,SSS. For example: 2012-11-02 14:34:02,781. To disable the logs, set <displaylogs> to false. Example: <displaylogs>false</displaylogs>

      If the logs are not displayed and <displaylogs> is set to true, then ensure that yarn.log-aggregation-enable in yarn-site.xml is set to true. Also ensure that the Hadoop jobs configuration parameters yarn.nodemanager.remote-app-log-dir and yarn.nodemanager.remote-app-log-dir-suffix are set to the same values as in yarn-site.xml.

    9. Edit the general Hadoop jobs configuration: The console uses two Hadoop jobs. The first is used to create a spatial index on existing files in HDFS, and the second is used to generate results for display based on the index. One part of the configuration is common to both jobs, and another part is specific to each job. The common configuration can be found within the <hadoopjobs><configuration> elements. An example configuration is given here:

      <hadoopjobs>
         <configuration>
                     <property>
              <!--hadoop user. The user is a mandatory property.-->
                       <name>hadoop.job.ugi</name>
                       <value>hdfs</value>
                     </property>
               
                     <property>
              <!-- As defined in core-site.xml.
              If in core-site.xml the fs.defaultFS path is defined as the nameservice ID
              (High Availability configuration), then set the full address and IPC port
              of the currently active name node. The nameservice is defined in the file hdfs-site.xml.-->
                       <name>fs.defaultFS</name>
                       <value>hdfs://hadoop.cluster.url:8020</value>
                    </property>
               
                    <property>
              <!-- like defined in mapred-site.xml -->
                     <name>mapreduce.framework.name</name>
                     <value>yarn</value>
                   </property>
              
                   <property>
              <!-- like defined in yarn-site.xml -->
                     <name>yarn.resourcemanager.scheduler.address</name>
                     <value>hadoop.cluster.url:8030</value>
                  </property>
              
                  <property>
              <!-- like defined in yarn-site.xml -->
                      <name>yarn.resourcemanager.address</name>
                      <value>hadoop.cluster.url:8032</value>
                  </property>
      
                              <property>
                                      <!-- like defined in yarn-site.xml by default /tmp/logs -->
                                      <name>yarn.nodemanager.remote-app-log-dir</name>
                                      <value>/tmp/logs</value>
                              </property>
                              
                              <property>
                                      <!-- like defined in yarn-site.xml by default logs -->
                                      <name>yarn.nodemanager.remote-app-log-dir-suffix</name>
                                      <value>logs</value>
                              </property>        
      
                  <property>
              <!-- like defined in yarn-site.xml (full path) -->
                      <name>yarn.application.classpath</name>
                     <value>/etc/hadoop/conf/,/opt/cloudera/parcels/CDH/lib/hadoop/*,/opt/cloudera/parcels/CDH/lib/hadoop/lib/*,/opt/cloudera/parcels/CDH/lib/hadoop-hdfs/*,/opt/cloudera/parcels/CDH/lib/hadoop-hdfs/lib/*,/opt/cloudera/parcels/CDH/lib/hadoop-yarn/*,/opt/cloudera/parcels/CDH/lib/hadoop-yarn/lib/*,/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/*,/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/lib/*</value>
                   </property>             
                </configuration>
             </hadoopjobs>
      
  2. Create an index job specific configuration. Additional Hadoop parameters can be specified for the job that creates the spatial indexes. An example additional configuration is:

    <hadoopjobs>
       <configuration>
       ...
       </configuration>
          <indexjobadditionalconfiguration>
             <property>
             <!-- Increase mapred.max.split.size so that fewer mappers are allocated, which reduces the mapper initialization overhead. -->
                <name>mapred.max.split.size</name>
                <value>1342177280</value>
               </property>    
          </indexjobadditionalconfiguration>
    </hadoopjobs>
    
  3. Create a specific configuration for the job that generates the categorization results. The following is an example of property settings:

    <hadoopjobs>
      <configuration>
       ...
      </configuration>
        
         <indexjobadditionalconfiguration>
          ...
         </indexjobadditionalconfiguration>
     
         <hierarchicaljobadditionalconfiguration>
            <property>
            <!-- Increase mapred.max.split.size so that fewer mappers are allocated, which reduces the mapper initialization overhead. -->
               <name>mapred.max.split.size</name>
               <value>1342177280</value>
             </property>      
          </hierarchicaljobadditionalconfiguration>
    </hadoopjobs>
    
  4. Specify the notification emails: The email notifications are sent to report the job completion status. This is defined within the <notificationmails> element. It is mandatory to specify a user (<user>), password (<password>), and sender email (<mailfrom>). In the <configuration> element, the configuration properties needed for JavaMail must be set. The following example is a typical configuration to send mail via an SMTP server using an SSL connection:

    <notificationmails>
      <!--Authentication parameters. The Authentication parameters are mandatory.-->
        <user>user@mymail.com</user>
        <password>mypassword</password>
        <mailfrom>user@mymail.com</mailfrom>
    
        <!--Parameters that will be set as system properties. Below are the parameters needed to send mail via an SMTP server using an SSL connection.-->
        
        <configuration>
           <property>
             <name>mail.smtp.host</name>
             <value>mail.host.com</value>
           </property>
            
           <property>
             <name>mail.smtp.socketFactory.port</name>
             <value>myport</value>
           </property>
     
           <property>
             <name>mail.smtp.socketFactory.class</name>
             <value>javax.net.ssl.SSLSocketFactory</value>
           </property>
    
           <property>
             <name>mail.smtp.auth</name>
             <value>true</value>
           </property>
        </configuration>
    </notificationmails>
    

1.8.5 Configuring the Spatial Hadoop Vector Console for Other Systems (Not Big Data Appliance)

Follow the steps mentioned in "Configuring the Spatial Hadoop Vector Console on Oracle Big Data Appliance." However, in the general Hadoop jobs configuration step, in the Hadoop property yarn.application.classpath, replace /opt/cloudera/parcels/CDH/lib/ with the actual library path, which by default is /usr/lib/.

1.9 Installing Property Graph Support on a CDH Cluster or Other Hardware

You can use property graphs on either Oracle Big Data Appliance or commodity hardware.

1.9.1 Apache HBase Prerequisites

The following prerequisites apply to installing property graph support in HBase.

Details about supported versions of these products, including any interdependencies, will be provided in a My Oracle Support note.

1.9.2 Property Graph Installation Steps

To install property graph support, follow these steps.

  1. Install the software package:
    rpm -i oracle-spatial-graph-<version>.x86_64.rpm
    

    By default, the software is installed in the following directory: /opt/oracle/

    After the installation completes, the /opt/oracle/oracle-spatial-graph directory exists and includes a property_graph subdirectory.

  2. Set the JAVA_HOME environment variable. For example:
    setenv JAVA_HOME  /usr/local/packages/jdk7
    
  3. Set the PGX_HOME environment variable. For example:
    setenv PGX_HOME /opt/oracle/oracle-spatial-graph/pgx
    
  4. If HBase will be used, set the HBASE_HOME environment variable in all HBase region servers in the Apache Hadoop cluster. (HBASE_HOME specifies the location of the hbase installation directory.) For example:
    setenv HBASE_HOME /usr/lib/hbase
    

    Note that on some installations of Big Data Appliance, Apache HBase is placed in a directory like the following: /opt/cloudera/parcels/CDH-5.3.3-1.cdh5.3.3.p0.5/lib/hbase/

  5. If HBase will be used, copy the data access layer library into $HBASE_HOME/lib. For example:
    cp /opt/oracle/oracle-spatial-graph/property_graph/lib/sdopgdal*.jar $HBASE_HOME/lib
    
  6. Tune the HBase or Oracle NoSQL Database configuration, as described in other tuning topics.
  7. Log in to Cloudera Manager as the admin user, and restart the HBase service. Restarting enables the Region Servers to use the new configuration settings.

1.9.3 About the Property Graph Installation Directory

The installation directory for Oracle Big Data Spatial and Graph property graph features has the following structure:

$ tree -dFL 2 /opt/oracle/oracle-spatial-graph/property_graph/
/opt/oracle/oracle-spatial-graph/property_graph/
|-- dal
|   |-- groovy
|   |-- opg-solr-config
|   `-- webapp
|-- data
|-- doc
|   |-- dal
|   `-- pgx
|-- examples
|   |-- dal
|   |-- pgx
|   `-- pyopg
|-- lib
|-- librdf
`-- pgx
    |-- bin
    |-- conf
    |-- groovy
    |-- scripts
    |-- webapp
    `-- yarn

1.9.4 Optional Installation Task for In-Memory Analyst Use

Follow this installation task if property graph support is installed on a client without Hadoop, and you want to read graph data stored in the Hadoop Distributed File System (HDFS) into the in-memory analyst and write the results back to HDFS, or use Hadoop NextGen MapReduce (YARN) scheduling to start, monitor, and stop the in-memory analyst.

1.9.4.1 Installing and Configuring Hadoop

To install and configure Hadoop, follow these steps.

  1. Download the tarball for a supported version of the Cloudera CDH.
  2. Unpack the tarball into a directory of your choice. For example:
    tar xvf hadoop-2.5.0-cdh5.2.1.tar.gz -C /opt
    
  3. Have the HADOOP_HOME environment variable point to the installation directory. For example:
    export HADOOP_HOME=/opt/hadoop-2.5.0-cdh5.2.1
    
  4. Add $HADOOP_HOME/bin to the PATH environment variable. For example:
    export PATH=$HADOOP_HOME/bin:$PATH
    
  5. Configure $HADOOP_HOME/etc/hadoop/hdfs-site.xml to point to the HDFS name node of your Hadoop cluster.
  6. Configure $HADOOP_HOME/etc/hadoop/yarn-site.xml to point to the resource manager node of your Hadoop cluster.
  7. Configure the fs.defaultFS field in $HADOOP_HOME/etc/hadoop/core-site.xml to point to the HDFS name node of your Hadoop cluster.
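
After completing these steps, a quick way to confirm that the client can reach the configured name node (assuming HDFS is running and reachable from this machine) is:

    hadoop fs -ls /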

1.9.4.2 Running the In-Memory Analyst on Hadoop

When running a Java application using in-memory analytics and HDFS, make sure that $HADOOP_HOME/etc/hadoop is on the classpath, so that the configurations get picked up by the Hadoop client libraries. However, you do not need to do this when using the in-memory analyst shell, because it adds $HADOOP_HOME/etc/hadoop automatically to the classpath if HADOOP_HOME is set.
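
For example, an application might be launched as follows; the class name and application JAR are illustrative, and the property graph library path is the one shown in About the Property Graph Installation Directory:

java -cp "$HADOOP_HOME/etc/hadoop:/opt/oracle/oracle-spatial-graph/property_graph/lib/*:myapp.jar" MyGraphApp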

You do not need to put any extra Cloudera Hadoop libraries (JAR files) on the classpath. The only time you need the YARN libraries is when starting the in-memory analyst as a YARN service. This is done with the yarn command, which automatically adds all necessary JAR files from your local installation to the classpath.

You are now ready to load data from HDFS or start the in-memory analyst as a YARN service. For further information about Hadoop, see the CDH 5.x.x documentation.

1.10 Installing and Configuring Multimedia Analytics Support

To use the Multimedia analytics feature, the video analysis framework must be installed and configured.

1.10.1 Assumptions and Libraries for Multimedia Analytics

If you have licensed Oracle Big Data Spatial and Graph with Oracle Big Data Appliance, the video analysis framework for Multimedia analytics is already installed and configured. However, you must set $MMA_HOME to point to /opt/oracle/oracle-spatial-graph/multimedia.

Otherwise, you can install the framework on Cloudera CDH 5 or a similar Hadoop environment, as follows:

  1. Install the framework by using the following command on each node on the cluster:

    rpm2cpio oracle-spatial-graph-<version>.x86_64.rpm | cpio -idmv
    
  2. Set $MMA_HOME to point to /opt/oracle/oracle-spatial-graph/multimedia.

  3. Identify the locations of the following libraries:

    • Hadoop jar files (available in $HADOOP_HOME/jars)

    • Video processing libraries (see Transcoding Software (Options))

    • OpenCV libraries (available with the product)

  4. If necessary, install the desired video processing software to transcode video data (see Transcoding Software (Options)).

1.10.2 Transcoding Software (Options)

The following options are available for transcoding video data:

  • JCodec

  • FFmpeg

  • Third-party transcoding software

To use Multimedia analytics with JCodec (which is included with the product), when running the Hadoop job to recognize faces, set the oracle.ord.hadoop.ordframegrabber property to the following value: oracle.ord.hadoop.decoder.OrdJCodecFrameGrabber
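
For example, if your face recognition job driver accepts generic Hadoop options (an assumption about the driver; the JAR name, main class, and other arguments are placeholders), the property can be passed on the command line:

hadoop jar <face-recognition-job>.jar <main-class> -D oracle.ord.hadoop.ordframegrabber=oracle.ord.hadoop.decoder.OrdJCodecFrameGrabber <other arguments>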

To use Multimedia analytics with FFmpeg:

  1. Download FFmpeg from: https://www.ffmpeg.org/.

  2. Install FFmpeg on the Hadoop cluster.

  3. Set the oracle.ord.hadoop.ordframegrabber property to the following value: oracle.ord.hadoop.decoder.OrdFFMPEGFrameGrabber

To use Multimedia analytics with custom video decoding software, implement the abstract class oracle.ord.hadoop.decoder.OrdFrameGrabber. See the Javadoc for more details.