This chapter provides an overview of Oracle Big Data support for Oracle Spatial and Graph spatial and property graph features.
Installing Oracle Big Data Spatial and Graph on an Oracle Big Data Appliance
Installing and Configuring the Big Data Spatial Image Processing Framework
Installing and Configuring the Big Data Spatial Image Server
Installing Property Graph Support on a CDH Cluster or Other Hardware
Oracle Big Data Spatial and Graph delivers advanced spatial and graph analytic capabilities to supported Apache Hadoop and NoSQL Database Big Data platforms.
The spatial features include support for data enrichment of location information; spatial filtering, categorization, and location-based analysis based on distance; vector and raster processing of digital map, sensor, satellite, and aerial imagery; and APIs for map visualization.
The property graph features support Apache Hadoop HBase and Oracle NoSQL Database for graph operations, indexing, queries, search, and in-memory analytics.
Spatial location information is a common element of Big Data. Businesses can use spatial data as the basis for associating and linking disparate data sets. Location information can also be used to track and categorize entities based on proximity to another person, place, or object, or on their presence in a particular area. Location information can facilitate location-specific offers to customers entering a particular geography, something known as geo-fencing. Georeferenced imagery and sensory data can be analyzed for a variety of business benefits.
The spatial features of Oracle Big Data Spatial and Graph support those use cases with the following kinds of services.
Vector Services:
Ability to associate documents and data with names, such as cities or states, or longitude/latitude information in spatial object definitions for a default administrative hierarchy
Support for text-based 2D and 3D geospatial formats, including GeoJSON files, Shapefiles, GML, and WKT; you can also use the Geospatial Data Abstraction Library (GDAL) to convert popular geospatial encodings such as Oracle SDO_Geometry and ST_Geometry into supported formats
An HTML5-based map client API and a sample console to explore, categorize, and view data in a variety of formats and coordinate systems
Topological and distance operations: Anyinteract, Inside, Contains, Within Distance, Nearest Neighbor, and others
Spatial indexing for fast retrieval of data
Raster Services:
Support for hundreds of image file formats supported by GDAL and image files stored in HDFS
A sample console to view the set of images that are available
Raster operations, including subsetting, georeferencing, mosaics, and format conversion
Graphs manage networks of linked data as vertices, edges, and properties of the vertices and edges. Graphs are commonly used to model, store, and analyze relationships found in social networks, cyber security, utilities and telecommunications, life sciences and clinical data, and knowledge networks.
Typical graph analyses encompass graph traversal, recommendations, finding communities and influencers, and pattern matching. Industries including telecommunications, life sciences and healthcare, security, and media and publishing can benefit from graphs.
The property graph features of Oracle Big Data Spatial and Graph support those use cases with the following capabilities:
A scalable graph database on Apache HBase and Oracle NoSQL Database
Developer APIs based on TinkerPop Blueprints, Rexster REST APIs, and Java graph APIs
Text search and query through integration with Apache Lucene and SolrCloud
Scripting language support for Groovy and Python
A parallel, in-memory graph analytics engine
A fast, scalable suite of social network analysis functions, including ranking, centrality, recommendation, community detection, and path finding
Parallel bulk load and export of property graph data in an Oracle-defined flat file format
Manageability through a Groovy-based console to execute Java and TinkerPop Gremlin APIs
See also Property Graph Sizing Recommendations
The following are recommendations for property graph installation.
Table 1-1 Property Graph Sizing Recommendations
Graph Size | Recommended Physical Memory to be Dedicated | Recommended Number of CPU Processors
---|---|---
10 to 100M edges | Up to 14 GB RAM | 2 to 4 processors, and up to 16 processors for more compute-intensive workloads
100M to 1B edges | 14 GB to 100 GB RAM | 4 to 12 processors, and up to 16 to 32 processors for more compute-intensive workloads
Over 1B edges | Over 100 GB RAM | 12 to 32 processors, or more for especially compute-intensive workloads
The Mammoth command-line utility for installing and configuring the Oracle Big Data Appliance software also installs the Oracle Big Data Spatial and Graph option, including both the spatial and property graph capabilities. You can enable this option during an initial software installation, or afterward using the bdacli utility.
To use Oracle NoSQL Database as a graph repository, you must have an Oracle NoSQL Database cluster.
To use Apache HBase as a graph repository, you must have an Apache Hadoop cluster.
See Also:
Oracle Big Data Appliance Owner's Guide for software configuration instructions.
Installing and configuring the Image Processing Framework depends upon the distribution being used.
The Oracle Big Data Appliance cluster distribution comes with a pre-installed setup, but you must follow a few steps in Installing Image Processing Framework for Oracle Big Data Appliance Distribution to get it working.
For a commodity distribution, follow the instructions in Installing the Image Processing Framework for Other Distributions (Not Oracle Big Data Appliance).
After performing the installation, verify it (see Post-installation Verification of the Image Processing Framework).
The Oracle Big Data Appliance distribution comes with a pre-installed configuration. However, perform the following steps to ensure that it works.
Identify the HADOOP_LIB_PATH; for Oracle Big Data Appliance it is under /opt/cloudera/parcels/CDH/lib/hadoop/lib/. A sketch of capturing this location in a shell variable follows.
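For convenience in later commands, you can capture this location in a shell variable. This is a minimal sketch, assuming a bash shell; the variable name matches the HADOOP_LIB_PATH placeholder used throughout this chapter:

export HADOOP_LIB_PATH=/opt/cloudera/parcels/CDH/lib/hadoop/lib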
Make the imageserver folder under /opt/shareddir/spatial/demo/ writable by all users as follows:
chmod 777 /opt/shareddir/spatial/demo/imageserver/
Make the libproj.so (Proj.4) Cartographic Projections Library accessible to the users, and copy libproj.so to HADOOP_LIB_PATH/native on the Resource Manager node (and any backup Resource Manager nodes) as follows:
cp libproj.so HADOOP_LIB_PATH/native
Create a folder native
under /opt/shareddir/spatial/demo/imageserver/
and copy the libproj.so
and GDAL libraries to that folder as follows:
mkdir /opt/shareddir/spatial/demo/imageserver/native
cp libproj.so /opt/shareddir/spatial/demo/imageserver/native
cp -R /opt/oracle/oracle-spatial-graph/spatial/gdal/lib/. /opt/shareddir/spatial/demo/imageserver/native
For Big Data Spatial and Graph in environments other than the Big Data Appliance, follow the instructions in this section.
Ensure that HADOOP_LIB_PATH
is under /usr/lib/hadoop
.
Install NFS.
Have at least one folder, referred to in this document as SHARED_FOLDER, on the Resource Manager node accessible to every Node Manager node through NFS (see the NFS sketch after this list).
Provide write access to this SHARED_FOLDER for all the users involved in job execution and for the yarn user. This folder is used for running the test scripts and must be /opt/shareddir; if not, modify the test scripts accordingly to point to your folder.
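The following is a minimal sketch of an NFS setup for /opt/shareddir, assuming the Resource Manager node exports the folder and every Node Manager node mounts it; the host name rmnode.example.com and the export options are assumptions, so consult your system administrator for site-specific settings:

# On the Resource Manager node (as root): export the shared folder over NFS.
echo '/opt/shareddir *(rw,sync)' >> /etc/exports
exportfs -a

# On each Node Manager node (as root): mount the exported folder at the same path.
mkdir -p /opt/shareddir
mount -t nfs rmnode.example.com:/opt/shareddir /opt/shareddir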
Download oracle-spatial-graph-1.0-1.x86_64.rpm
from the Oracle e-delivery web site.
Execute oracle-spatial-graph-1.0-1.x86_64.rpm
using the rpm command.
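For example (run as root or with sudo; this mirrors the rpm invocation shown for property graph support later in this chapter):

rpm -i oracle-spatial-graph-1.0-1.x86_64.rpm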
After the rpm executes, verify that the directory structure created at /opt/oracle/oracle-spatial-graph/spatial contains three folders: jlib, gdal, and demo.
Copy the content under jlib (a set of jar files) to the HADOOP_LIB_PATH directory on every cluster node, as in the sketch below.
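One possible way to distribute the jar files, assuming passwordless ssh from the current node, a plain-text file nodes.txt listing every cluster host name (both are assumptions, not part of the product), and the HADOOP_LIB_PATH environment variable set to the path identified earlier:

for node in $(cat nodes.txt); do
  # Copy every spatial jar into the Hadoop library path on each node.
  scp /opt/oracle/oracle-spatial-graph/spatial/jlib/*.jar "$node":"$HADOOP_LIB_PATH"/
done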
In the Resource Manager Node (and any backup Resource Manager Nodes), copy all the content from /opt/oracle/oracle-spatial-graph/spatial/gdal/lib
to the HADOOP_LIB_PATH/native directory
as follows:
cp -R --preserve=links /opt/oracle/oracle-spatial-graph/spatial/gdal/lib/. HADOOP_LIB_PATH/native
In the Resource Manager Node (and any backup Resource Manager Nodes), copy the gdal
data folder under /opt/oracle/oracle-spatial-graph/spatial/gdal
and gdalplugins
under /opt/oracle/oracle-spatial-graph/spatial/gdal
into the SHARED_FOLDER as follows:
cp -R /opt/oracle/oracle-spatial-graph/spatial/gdal/data SHARED_FOLDER
cp -R /opt/oracle/oracle-spatial-graph/spatial/gdal/gdalplugins SHARED_FOLDER
Create a folder ALL_ACCESS_FOLDER with write access for all users under SHARED_FOLDER. This folder must be named spatial for the test scripts to be executed as they are; otherwise, modify the test scripts accordingly. Follow these steps:
Go to the shared folder.
cd /opt/shareddir
Create a spatial folder.
mkdir spatial
Provide write access.
chmod 777 spatial
Copy the demo folder under /opt/oracle/oracle-spatial-graph/spatial/demo
into ALL_ACCESS_FOLDER
.
cp -R /opt/oracle/oracle-spatial-graph/spatial/demo /opt/shareddir/spatial
Provide write access to the imageserver
folder under demo as follows:
chmod 777 /opt/shareddir/spatial/demo/imageserver/
Copy the user library libproj.so
into the HADOOP_LIB_PATH/native
as follows:
cp libproj.so HADOOP_LIB_PATH/native
Create a folder native
under /opt/shareddir/spatial/demo/imageserver/
and copy the libproj.so
and GDAL libraries to that folder as follows:
mkdir /opt/shareddir/spatial/demo/imageserver/native
cp libproj.so /opt/shareddir/spatial/demo/imageserver/native
cp -R /opt/oracle/oracle-spatial-graph/spatial/gdal/lib/. /opt/shareddir/spatial/demo/imageserver/native
Provide read and execute permissions for the libproj.so
library to all users as follows:
chmod 755 /opt/shareddir/spatial/demo/imageserver/native/libproj.so
Set the following GDAL environment variables in the Resource Manager Node (and any backup Resource Manager Nodes):
GDAL_DRIVER_PATH=HADOOP_LIB_PATH/native/gdalplugins
GDAL_DATA=SHARED_FOLDER/data
Create or update the shared library path to include the GDAL libraries location:
LD_LIBRARY_PATH=HADOOP_LIB_PATH/native
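With the placeholders expanded, the three settings might be exported together as follows. This is a minimal sketch assuming a bash shell, the HADOOP_LIB_PATH identified earlier, and SHARED_FOLDER=/opt/shareddir; add these lines to a profile or Hadoop environment script at your site if they must persist:

export GDAL_DRIVER_PATH=$HADOOP_LIB_PATH/native/gdalplugins
export GDAL_DATA=/opt/shareddir/data
export LD_LIBRARY_PATH=$HADOOP_LIB_PATH/native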
Two test scripts are provided, one to test the Image Loading functionality and another to test the Image Processing functionality. Execute these two scripts as mentioned in this section to verify a successful installation of Image Processing Framework.
This test script loads three Hawaii images into HDFS, and one block is created for each one of them. The test script can be found in the following path: /opt/shareddir/spatial/demo/imageserver/runimageloader.sh
.
The script can be executed using the following example.
Note:
All users must have write permission to the parent folder /opt/shareddir/spatial/demo/imageserver.
From the command line, type: sudo -u hdfs ./runimageloader.sh
Upon successful execution, the created images and thumbnails on HDFS are listed after the message Generated ohifs files are, followed by the message Thumbnails created are and the list of thumbnails.
If the three images and their corresponding thumbnails are listed, then the loading process was successful and the installation and configuration completed successfully.
If the installation and configuration were not successful, then the output is not generated, and the message Not all the images were uploaded correctly, check for Hadoop logs is displayed.
This test script creates a mosaic with the three pre-loaded Hawaii images. The mosaic created is 1600 x 1447 pixels. The test script can be found at /opt/shareddir/spatial/demo/imageserver/runimageprocessor.sh
.
The script can be executed using the following example.
Note:
All users must have write permission to the parent folder /opt/shareddir/spatial/demo/imageserver.
From the command line, type: sudo -u hdfs ./runimageprocessor.sh
Upon successful execution, the following message is displayed: Expected output file: /opt/shareddir/spatial/processtest/littlemap.tif
If the installation and configuration were successful, then the output is generated and the message Mosaic image generated is displayed.
If the installation and configuration were not successful, then the output is not generated, and the message Mosaic was not successfully created, check for Hadoop logs is displayed.
You can access the image processing framework through the Oracle Big Data Spatial Image Server, which provides a web interface for loading and processing images.
Installing and configuring the Spatial Image Server depends upon the distribution being used.
Installing and Configuring the Image Server for Oracle Big Data Appliance
Installing and Configuring the Image Server Web for Other Systems (Not Big Data Appliance)
After you perform the installation, verify it (see Post-installation Verification Example for the Image Server Console).
Follow the instructions in this topic.
Download the latest Jetty core component binary from the Jetty download page http://www.eclipse.org/jetty/downloads.php
onto the Oracle BDA Resource Manager node.
Unzip the imageserver.war
file into the jetty webapps
directory or any other directory of choice as follows:
unzip /opt/oracle/oracle-spatial-graph/spatial/jlib/imageserver.war -d $JETTY_HOME/webapps/imageserver
Note:
The directory or location under which you unzip the file is known as $JETTY_HOME in this procedure.
Copy Hadoop dependencies as follows:
cp /opt/cloudera/parcels/CDH/lib/hadoop/client/* $JETTY_HOME/webapps/imageserver/WEB-INF/lib/
Optionally, edit the $JETTY_HOME/start.ini file and change the property jsp-impl=apache to jsp-impl=glassfish. You can download these jars from http://mvnrepository.com/ or another Apache jar provider:
xalan-2.7.1.jar
xercesImpl-2.11.0.jar
xml-apis-1.4.01.jar
serializer-2.7.1.jar
Copy these jars to $JETTY_HOME/lib/apache-jsp
.
Check the version by running: java -jar $JETTY_HOME/start.jar --version
Copy the gdal.jar
file under /opt/oracle/oracle-spatial-graph/spatial/jlib/gdal.jar
to $JETTY_HOME/lib/ext
.
Copy the /opt/oracle/oracle-spatial-graph/spatial/conf/jetty-imgserver-realm.properties
file to the $JETTY_HOME/etc folder.
Edit the $JETTY_HOME/etc/jetty-imgserver-realm.properties
file to add a password and role:
Remove the <password> placeholder and type a new password.
Remove the <> characters from the <admin_role> text, keeping admin_role.
Start the Jetty server by running: java -jar $JETTY_HOME/start.jar
Type the http://thehost:8080/imageserver/console.jsp
address in your browser address bar to open the console.
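If the console does not load, a quick way to confirm that the Jetty server is responding (thehost is a placeholder for your server name):

curl -I http://thehost:8080/imageserver/console.jsp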
Log in to the console using the credentials you created in "Installing Image Server Web on an Oracle Big Data Appliance."
From the Configuration tab, in the Hadoop Configuration Parameters section, change these three properties depending on the cluster configuration (example values follow this list):
fs.defaultFS
: Type the active namenode
of your cluster in the format hdfs://<namenode>:8020
(Check with the administrator for this information).
yarn.resourcemanager.scheduler.address
: Active Resource Manager scheduler address of your cluster, in the format <schedulername>:8030.
yarn.resourcemanager.address
: Active Resource Manager address, in the format <resourcename>:8032.
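For example, with hypothetical host names, the three values might look like this:

fs.defaultFS: hdfs://mynamenode.example.com:8020
yarn.resourcemanager.scheduler.address: myresourcemanager.example.com:8030
yarn.resourcemanager.address: myresourcemanager.example.com:8032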
Note:
Keep the default values for the rest of the configuration. They are pre-loaded for your Oracle Big Data Appliance cluster environment.
Click Apply Changes to save the changes.
Tip:
You can review the missing configuration information under the Hadoop Loader tab of the console.
Follow the instructions in this topic.
Follow the instructions specified in "Prerequisites for Installing the Image Processing Framework for Other Distributions."
Follow the instructions specified in "Installing the Image Processing Framework for Other Distributions."
Follow the instructions specified in "Configuring the environment."
Follow the instructions specified in "Prerequisites for installing Image Server on Oracle Big Data Appliance."
Follow the instructions specified in "Installing Image Server Web on an Oracle Big Data Appliance."
Follow the instructions specified in "Configuring the Environment."
Type the http://thehost:8080/imageserver/console.jsp
address in your browser address bar to open the console.
Log in to the console using the credentials you created in "Installing Image Server Web on an Oracle Big Data Appliance."
From the Configuration tab, in the Hadoop Configuration Parameters section, change the following properties depending on the cluster configuration:
Specify a shared folder from which to start browsing the images. This folder must be shared between the cluster and an NFS mountpoint (SHARED_FOLDER).
Create a child folder named saveimages under the Start folder, with full write access. For example, if Start=/home, then saveimages=/home/saveimages.
If the cluster requires a mount point to access the SHARED_FOLDER, specify the mount point, for example, /net/home. Otherwise, leave it blank and proceed.
Type the folder path that contains the Hadoop native libraries and additional libraries (HADOOP_LIB_PATH
).
yarn.application.classpath: Type the classpath where Hadoop finds the required jars and dependencies. Usually this is under /usr/lib/hadoop, as shown in the example below.
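For example, for a default layout under /usr/lib, the classpath might look like the following. This is a sketch derived from the Big Data Appliance example later in this chapter with /opt/cloudera/parcels/CDH/lib/ replaced by /usr/lib/, so verify the paths on your system:

/etc/hadoop/conf,/usr/lib/hadoop/*,/usr/lib/hadoop/lib/*,/usr/lib/hadoop-hdfs/*,/usr/lib/hadoop-hdfs/lib/*,/usr/lib/hadoop-yarn/*,/usr/lib/hadoop-yarn/lib/*,/usr/lib/hadoop-mapreduce/*,/usr/lib/hadoop-mapreduce/lib/*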
Note:
Keep the default values for the rest of the configuration. They are pre-loaded for your Oracle Big Data Appliance cluster environment.
Click Apply Changes to save the changes.
Tip:
You can review the missing configuration information under the Hadoop Loader tab of the console.
In this example, you will:
Load the images from the local server to the HDFS Hadoop cluster.
Run a job to create a mosaic image file and a catalog with several images.
View the mosaic image.
Related subtopics:
Open the Image Server Console: http://<hostname>:8080/imageserver/console.jsp
Log in using the default user/password: admin/admin.
Go to the Hadoop Loader
tab.
Click Open
and browse to the demo
folder that contains a set of Hawaii images. They can be found at /opt/shareddir/spatial/demo/imageserver/images
.
Select the images
folder and click Load images
.
Wait for the message, 'Images loaded successfully'.
Note:
If no errors were shown, then you have successfully installed the Image Loader web interface.
Go to the Raster Image processing
tab.
From the Catalog menu select Catalog
> New Catalog
> HDFS Catalog
.
A new catalog is created.
From the Imagery menu select Imagery
> Add hdfs image
.
Browse the HDFS host and add images.
A new file tree gets created with all the images you just loaded from your host.
Browse the newdata
folder and verify the images.
Select the images listed in the pre-visualizer and click Add
.
The images are added to the bottom sub-panel.
Click Add images
.
The images are added to the main catalog.
Save the catalog.
From the Imagery menu select Imagery
> Mosaic
.
Click Load default configuration file
, browse to /opt/shareddir/spatial/demo/imageserver
and select testFS.xml
.
Note:
The default configuration file testFS.xml is included in the demo.
Click Create Mosaic.
Wait for the image to be created.
Optionally, to download and view the image click Download
.
Follow the instructions in this topic.
Installing Spatial Hadoop Vector Console on Oracle Big Data Appliance
Installing Spatial Hadoop Vector Console for Other Systems (Not Big Data Appliance)
Configuring Spatial Hadoop Vector Console on Oracle Big Data Appliance
Configuring Spatial Hadoop Vector Console for Other Systems (Not Big Data Appliance)
The following assumptions and prerequisites apply to installing and configuring the Spatial Hadoop Vector Console.
The API and jobs described here run on a CDH5 or similar Hadoop environment.
Java 6 or a newer version is present in your environment.
MVSuggest must be installed separately. Download it from the http://www.oracle.com/technetwork/index.html
site.
Download the latest Jetty core component binary from the Jetty download page http://www.eclipse.org/jetty/downloads.php
onto the Oracle BDA Resource Manager node.
Unzip the spatialviewer.war
file into the jetty webapps
directory or any other directory of choice as follows:
unzip /opt/oracle/oracle-spatial-graph/spatial/jlib/spatialviewer.war -d $JETTY_HOME/webapps/spatialviewer
Note:
The directory or location under which you unzip the file is known as $JETTY_HOME in this procedure.
Copy Hadoop dependencies as follows:
cp /opt/cloudera/parcels/CDH/lib/hadoop/client/* $JETTY_HOME/webapps/spatialviewer/WEB-INF/lib/
Complete the configuration steps mentioned in the "Configuring Spatial Hadoop Vector Console on Oracle Big Data Appliance."
Start the Jetty server: java -jar $JETTY_HOME/start.jar
Follow the steps mentioned in "Installing Spatial Hadoop Vector Console on Oracle Big Data Appliance." However, in step 3 replace the path /opt/cloudera/parcels/CDH/lib/ with /usr/lib/.
Edit the configuration file $JETTY_HOME/webapps/spatialviewer/conf/console-conf.xml
to specify your own mail-server data and to change the directory used by the console to build temporary hierarchical indexes. You can also change the properties of the Hadoop jobs. The configuration parameters are:
Edit the notification URL: This is the URL where the console server is running. It must be visible to the Hadoop cluster so that the end of the jobs can be notified. An example setting: <baseurl>http://hadoop.console.url:8080</baseurl>
Directory with temporary hierarchical indexes: An HDFS path that will contain temporary data on hierarchical relationships. An example setting: <hierarchydataindexpath>hdfs://hadoop.cluster.url:8020/user/myuser/hierarchyIndexPath</hierarchydataindexpath>
General Hadoop jobs configuration: The console uses two Hadoop jobs. The first creates a spatial index on existing files in HDFS, and the second generates the display results based on the index. One part of the configuration is common to both jobs and another is specific to each job. The common configuration can be found within the <hadoopjobs><configuration> elements. An example configuration is given here:
<hadoopjobs>
  <configuration>
    <property>
      <!-- Hadoop user. The user is a mandatory property. -->
      <name>hadoop.job.ugi</name>
      <value>hdfs</value>
    </property>
    <property>
      <!-- As defined in core-site.xml. If in core-site.xml the path
           fs.defaultFS is defined as the nameservice ID (High Availability
           configuration), then set the full address and IPC port of the
           currently active name node. The service is defined in the file
           hdfs-site.xml. -->
      <name>fs.defaultFS</name>
      <value>hdfs://hadoop.cluster.url:8020</value>
    </property>
    <property>
      <!-- As defined in mapred-site.xml -->
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
    </property>
    <property>
      <!-- As defined in yarn-site.xml -->
      <name>yarn.resourcemanager.scheduler.address</name>
      <value>hadoop.cluster.url:8030</value>
    </property>
    <property>
      <!-- As defined in yarn-site.xml -->
      <name>yarn.resourcemanager.address</name>
      <value>hadoop.cluster.url:8032</value>
    </property>
    <property>
      <!-- As defined in yarn-site.xml (full path) -->
      <name>yarn.application.classpath</name>
      <value>/etc/hadoop/conf/,/opt/cloudera/parcels/CDH/lib/hadoop/*,/opt/cloudera/parcels/CDH/lib/hadoop/lib/*,/opt/cloudera/parcels/CDH/lib/hadoop-hdfs/*,/opt/cloudera/parcels/CDH/lib/hadoop-hdfs/lib/*,/opt/cloudera/parcels/CDH/lib/hadoop-yarn/*,/opt/cloudera/parcels/CDH/lib/hadoop-yarn/lib/*,/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/*,/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/lib/*</value>
    </property>
  </configuration>
</hadoopjobs>
Index job-specific configuration: Additional Hadoop parameters can be specified for the job that creates the spatial indexes. An example additional configuration is given here:
<hadoopjobs>
  <configuration>
    ...
  </configuration>
  <indexjobadditionalconfiguration>
    <property>
      <!-- Increase mapred.max.split.size so that fewer mappers are allocated
           and the mapper initialization overhead is reduced. -->
      <name>mapred.max.split.size</name>
      <value>1342177280</value>
    </property>
  </indexjobadditionalconfiguration>
</hadoopjobs>
Hierarchical job-specific configuration: The same applies to the second job, which generates the results. The following is an example property setting:
<hadoopjobs>
  <configuration>
    ...
  </configuration>
  <indexjobadditionalconfiguration>
    ...
  </indexjobadditionalconfiguration>
  <hierarchicaljobadditionalconfiguration>
    <property>
      <!-- Increase mapred.max.split.size so that fewer mappers are allocated
           and the mapper initialization overhead is reduced. -->
      <name>mapred.max.split.size</name>
      <value>1342177280</value>
    </property>
  </hierarchicaljobadditionalconfiguration>
</hadoopjobs>
Specify the notification emails: Email notifications are sent to report the job completion status. This is defined within the <notificationmails> element. It is mandatory to specify a user (<user>), a password (<password>), and a sender email (<mailfrom>). In the <configuration> element, set the configuration properties needed by Java Mail. The following example shows a typical configuration for sending mail through an SMTP server using an SSL connection:
<notificationmails>
  <!-- Authentication parameters. The authentication parameters are mandatory. -->
  <user>user@mymail.com</user>
  <password>mypassword</password>
  <mailfrom>user@mymail.com</mailfrom>
  <!-- Parameters that will be set as system properties. Below are the
       parameters needed to send mail via an SMTP server using an SSL
       connection. -->
  <configuration>
    <property>
      <name>mail.smtp.host</name>
      <value>mail.host.com</value>
    </property>
    <property>
      <name>mail.smtp.socketFactory.port</name>
      <value>myport</value>
    </property>
    <property>
      <name>mail.smtp.socketFactory.class</name>
      <value>javax.net.ssl.SSLSocketFactory</value>
    </property>
    <property>
      <name>mail.smtp.auth</name>
      <value>true</value>
    </property>
  </configuration>
</notificationmails>
Follow the steps mentioned in "Configuring Spatial Hadoop Vector Console on Oracle Big Data Appliance." However, in step 1 C (General Hadoop Job Configuration
), in the Hadoop property yarn.application.classpath
replace the /opt/cloudera/parcels/CDH/lib/
with the actual library path, by default /usr/lib/
.
You can use property graphs on either Oracle Big Data Appliance or commodity hardware.
The following prerequisites apply to installing property graph support in HBase.
Linux operating system
Cloudera's Distribution including Apache Hadoop (CDH)
For the software download, see: http://www.cloudera.com/content/cloudera/en/products-and-services/cdh.html
Apache HBase
Java Development Kit
Details about supported versions of these products, including any interdependencies, will be provided in a My Oracle Support note.
To install property graph support, follow these steps.
Install the software package:
rpm -i oracle-spatial-graph-1.0-1.x86_64.rpm
By default, the software is installed in the following directory: /opt/oracle/
After the installation completes, the /opt/oracle/oracle-spatial-graph
directory exists and includes a property_graph
subdirectory.
Set the JAVA_HOME environment variable. For example:
setenv JAVA_HOME /usr/local/packages/jdk7
Set the PGX_HOME environment variable. For example:
setenv PGX_HOME /opt/oracle/oracle-spatial-graph/pgx
If HBase will be used, set the HBASE_HOME
environment variable in all HBase region servers in the Apache Hadoop cluster. (HBASE_HOME
specifies the location of the hbase
installation directory.) For example:
setenv HBASE_HOME /usr/lib/hbase
Note that on some installations of Big Data Appliance, Apache HBase is placed in a directory like the following: /opt/cloudera/parcels/CDH-5.3.3-1.cdh5.3.3.p0.5/lib/hbase/
If HBase will be used, copy the data access layer library into $HBASE_HOME/lib. For example:
cp /opt/oracle/oracle-spatial-graph/property_graph/lib/sdopgdal*.jar $HBASE_HOME/lib
Tune the HBase or Oracle NoSQL Database configuration, as described in other tuning topics.
Log in to Cloudera Manager as the admin
user, and restart the HBase service. Restarting enables the Region Servers to use the new configuration settings.
The installation directory for Oracle Big Data Spatial and Graph property graph features has the following structure:
$ tree -dFL 2 /opt/oracle/oracle-spatial-graph/property_graph/
/opt/oracle/oracle-spatial-graph/property_graph/
|-- dal
| |-- groovy
| |-- opg-solr-config
| `-- webapp
|-- data
|-- doc
| |-- dal
| `-- pgx
|-- examples
| |-- dal
| |-- pgx
| `-- pyopg
|-- lib
|-- librdf
`-- pgx
|-- bin
|-- conf
|-- groovy
|-- scripts
|-- webapp
`-- yarn
Follow this installation task if property graph support is installed on a client without Hadoop, and you want to read graph data stored in the Hadoop Distributed File System (HDFS) into in-memory analytics and write the results back to HDFS, and/or use Hadoop NextGen MapReduce (YARN) scheduling to start, monitor, and stop in-memory analytics.
To install and configure Hadoop, follow these steps.
Download the tarball for a supported version of the Cloudera CDH.
Unpack the tarball into a directory of your choice. For example:
tar xvf hadoop-2.5.0-cdh5.2.1.tar.gz -C /opt
Have the HADOOP_HOME
environment variable point to the installation directory. For example:
export HADOOP_HOME=/opt/hadoop-2.5.0-cdh5.2.1
Add $HADOOP_HOME/bin
to the PATH
environment variable. For example:
export PATH=$HADOOP_HOME/bin:$PATH
Configure $HADOOP_HOME/etc/hadoop/hdfs-site.xml
to point to the HDFS name node of your Hadoop cluster.
Configure $HADOOP_HOME/etc/hadoop/yarn-site.xml
to point to the resource manager node of your Hadoop cluster.
Configure the fs.defaultFS
field in $HADOOP_HOME/etc/hadoop/core-site.xml
to point to the HDFS name node of your Hadoop cluster.
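For example, the fs.defaultFS entry in core-site.xml might look like the following (namenode.example.com and the port are placeholders for your cluster's name node address):

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode.example.com:8020</value>
  </property>
</configuration>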
When running a Java application using in-memory analytics and HDFS, make sure that $HADOOP_HOME/etc/hadoop
is on the classpath, so that the configurations get picked up by the Hadoop client libraries. However, you do not need to do this when using the In-Memory Analytics Shell, because it adds $HADOOP_HOME/etc/hadoop
automatically to the classpath if HADOOP_HOME
is set.
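For example, a hypothetical application class MyAnalyticsApp packaged in myapp.jar (both names are placeholders) could be launched with the Hadoop configuration directory on its classpath as follows:

java -cp $HADOOP_HOME/etc/hadoop:myapp.jar MyAnalyticsApp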
You do not need to put any extra Cloudera Hadoop libraries (JAR files) on the classpath. The only time you need the YARN libraries is when starting In-Memory Analytics as a YARN service. This is done with the yarn
command, which automatically adds all necessary JAR files from your local installation to the classpath.
You are now ready to load data from HDFS or start In-Memory Analytics as a YARN service. For further information about Hadoop, refer to the CDH 5.2.x documentation.