1 Big Data Spatial and Graph Overview
- About Big Data Spatial and Graph
Oracle Big Data Spatial and Graph delivers advanced spatial and graph analytic capabilities on supported Apache Hadoop Big Data platforms.
- Spatial Features
Spatial location information is a common element of Big Data.
- Property Graph Features
Graphs manage networks of linked data as vertices, edges, and properties of the vertices and edges. Graphs are commonly used to model, store, and analyze relationships found in social networks, cyber security, utilities and telecommunications, life sciences and clinical data, and knowledge networks.
- Installing Oracle Big Data Spatial and Graph on an Oracle Big Data Appliance
The Mammoth command-line utility for installing and configuring the Oracle Big Data Appliance software also installs the Oracle Big Data Spatial and Graph option, including the spatial and property graph capabilities.
- Installing and Configuring the Big Data Spatial Image Processing Framework
Installing and configuring the Image Processing Framework depends upon the distribution being used.
- Installing the Oracle Big Data SpatialViewer Web Application
To install the Oracle Big Data SpatialViewer web application (SpatialViewer), follow the instructions in this topic.
- Installing Big Data Spatial and Graph in Non-BDA Environments
Some actions may be required if you install Big Data Spatial and Graph in an environment other than Oracle Big Data Appliance.
- Required Application Code Changes due to Upgrades
Application code changes may be required due to upgrades, such as to more recent versions of Apache HBase and SolrCloud.
1.1 About Big Data Spatial and Graph
Oracle Big Data Spatial and Graph delivers advanced spatial and graph analytic capabilities on supported Apache Hadoop Big Data platforms.
The spatial features include support for data enrichment of location information, spatial filtering and categorization based on distance and location-based analysis, spatial data processing for vector and raster processing of digital map, sensor, satellite, and aerial imagery values, and APIs for map visualization.
Parent topic: Big Data Spatial and Graph Overview
1.2 Spatial Features
Spatial location information is a common element of Big Data.
Businesses can use spatial data as the basis for associating and linking disparate data sets. Location information can also be used to track and categorize entities based on proximity to another person, place, or object, or on their presence in a particular area. Location information can facilitate location-specific offers to customers entering a particular geography, a practice known as geo-fencing. Georeferenced imagery and sensor data can be analyzed for a variety of business benefits.
The spatial features of Oracle Big Data Spatial and Graph support those use cases with the following kinds of services.
Vector Services:
-
Ability to associate documents and data with names, such as cities or states, or longitude/latitude information in spatial object definitions for a default administrative hierarchy
-
Support for text-based 2D and 3D geospatial formats, including GeoJSON files, Shapefiles, GML, and WKT, or you can use the Geospatial Data Abstraction Library (GDAL) to convert popular geospatial encodings such as Oracle SDO_Geometry, ST_Geometry, and other supported formats
-
An HTML5-based map client API and a sample console to explore, categorize, and view data in a variety of formats and coordinate systems
-
Topological and distance operations: Anyinteract, Inside, Contains, Within Distance, Nearest Neighbor, and others
-
Spatial indexing for fast retrieval of data
Raster Services:
-
Support for many image file formats supported by GDAL and image files stored in HDFS
-
A sample console to view the set of images that are available
-
Raster operations, including subsetting, georeferencing, mosaics, and format conversion
Parent topic: Big Data Spatial and Graph Overview
1.3 Property Graph Features
Graphs manage networks of linked data as vertices, edges, and properties of the vertices and edges. Graphs are commonly used to model, store, and analyze relationships found in social networks, cyber security, utilities and telecommunications, life sciences and clinical data, and knowledge networks.
Typical graph analyses encompass graph traversal, recommendations, finding communities and influencers, and pattern matching. Industries including telecommunications, life sciences and healthcare, security, and media and publishing can benefit from graphs. These use cases are supported by the property graph features of Oracle Big Data Spatial and Graph.
Property graph features on Big Data platforms are enabled by the Oracle Graph HDFS Connector, which is part of Oracle Graph Server and Client. The relevant features of Oracle Graph Server and Client are supported by accessing data in Apache HDFS through this connector. See Oracle Database Graph Developer's Guide for Property Graph for more information.
Parent topic: Big Data Spatial and Graph Overview
1.4 Installing Oracle Big Data Spatial and Graph on an Oracle Big Data Appliance
The Mammoth command-line utility for installing and configuring the Oracle Big Data Appliance software also installs the Oracle Big Data Spatial and Graph option, including the spatial and property graph capabilities.
You can enable this option during an initial software installation, or afterward using the bdacli utility.
To use Oracle NoSQL Database as a graph repository, you must have an Oracle NoSQL Database cluster.
To use Apache HBase as a graph repository, you must have an Apache Hadoop cluster.
See Also:
Oracle Big Data Appliance Owner's Guide for software configuration instructions.
Parent topic: Big Data Spatial and Graph Overview
1.5 Installing and Configuring the Big Data Spatial Image Processing Framework
Installing and configuring the Image Processing Framework depends upon the distribution being used.
-
The Oracle Big Data Appliance cluster distribution comes with a pre-installed setup, but you must follow a few steps in Installing the Image Processing Framework for Oracle Big Data Appliance Distribution to get it working.
-
For a commodity distribution, follow the instructions in Installing the Image Processing Framework for Other Distributions (Not Oracle Big Data Appliance).
For both distributions:
-
You must download and compile PROJ libraries, as explained in Getting and Compiling the Cartographic Projections Library.
-
After performing the installation, verify it (see Post-installation Verification of the Image Processing Framework).
-
If the cluster has security enabled, make sure that the user executing the jobs is in the princs list and has an active Kerberos ticket.
- Getting and Compiling the Cartographic Projections Library
- Installing the Image Processing Framework for Oracle Big Data Appliance Distribution
The Oracle Big Data Appliance distribution comes with a pre-installed configuration, though you must ensure that the image processing framework has been installed.
- Installing the Image Processing Framework for Other Distributions (Not Oracle Big Data Appliance)
For Big Data Spatial and Graph in environments other than the Big Data Appliance, follow the instructions in this section.
- Post-installation Verification of the Image Processing Framework
Several test scripts are provided to perform the following verification operations.
Parent topic: Big Data Spatial and Graph Overview
1.5.1 Getting and Compiling the Cartographic Projections Library
Before installing the Image Processing Framework, you must download the Cartographic Projections Library and perform several related operations.
-
Download the PROJ.4 source code and datum shifting files:
$ wget http://download.osgeo.org/proj/proj-4.9.1.tar.gz
$ wget http://download.osgeo.org/proj/proj-datumgrid-1.5.tar.gz
-
Untar the source code, and extract the datum shifting files in the nad subdirectory:
$ tar xzf proj-4.9.1.tar.gz
$ cd proj-4.9.1/nad
$ tar xzf ../../proj-datumgrid-1.5.tar.gz
$ cd ..
-
Configure, make, and install PROJ.4:
$ ./configure
$ make
$ sudo make install
$ cd ..
libproj.so is now available at /usr/local/lib/libproj.so.
-
Copy the libproj.so file to the spatial installation directory:
cp /usr/local/lib/libproj.so /opt/oracle/oracle-spatial-graph/spatial/raster/gdal/lib/libproj.so
-
Provide read and execute permissions for the libproj.so library for all users:
sudo chmod 755 /opt/oracle/oracle-spatial-graph/spatial/raster/gdal/lib/libproj.so
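The copy-and-chmod steps above can be sanity-checked with a small helper. This is a hedged sketch: the check_libproj name and the use of stat(1) are illustrative, not part of the product.

```shell
#!/bin/sh
# Hypothetical helper: confirm that a copied libproj.so exists and has
# mode 755 (readable and executable by all users), as the steps above require.
check_libproj() {
    lib="$1"
    [ -f "$lib" ] || { echo "missing: $lib"; return 1; }
    # Try GNU stat first; fall back to BSD stat syntax.
    mode=$(stat -c '%a' "$lib" 2>/dev/null || stat -f '%Lp' "$lib")
    [ "$mode" = "755" ] || { echo "wrong mode $mode on $lib"; return 1; }
    echo "ok: $lib"
}
```

For example, run `check_libproj /opt/oracle/oracle-spatial-graph/spatial/raster/gdal/lib/libproj.so` after the chmod step.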
1.5.2 Installing the Image Processing Framework for Oracle Big Data Appliance Distribution
The Oracle Big Data Appliance distribution comes with a pre-installed configuration, though you must ensure that the image processing framework has been installed.
Be sure that the actions described in Getting and Compiling the Cartographic Projections Library have been performed, so that libproj.so (PROJ.4) is accessible to all users and is set up correctly.
For Oracle Big Data Appliance, ensure that the following directories exist:
-
SHARED_DIR (shared directory for all nodes in the cluster):
/opt/shareddir
-
ALL_ACCESS_DIR (shared directory for all nodes in the cluster with Write access to the hadoop group):
/opt/shareddir/spatial
1.5.3 Installing the Image Processing Framework for Other Distributions (Not Oracle Big Data Appliance)
For Big Data Spatial and Graph in environments other than the Big Data Appliance, follow the instructions in this section.
1.5.3.1 Prerequisites for Installing the Image Processing Framework for Other Distributions
-
Ensure that HADOOP_LIB_PATH is under /usr/lib/hadoop. If it is not there, find the actual path and use it as your HADOOP_LIB_PATH.
-
Install NFS.
-
Have at least one folder, referred to in this document as SHARED_FOLDER, on the Resource Manager node accessible to every Node Manager node through NFS.
-
Provide write access to this SHARED_FOLDER for all the users involved in job execution and for the yarn user.
-
Download oracle-spatial-graph-<version>.x86_64.rpm from the Oracle e-delivery web site.
-
Execute oracle-spatial-graph-<version>.x86_64.rpm using the rpm command.
-
After rpm executes, verify that a directory structure created at /opt/oracle/oracle-spatial-graph/spatial/raster contains these folders: console, examples, jlib, gdal, and tests. Additionally, index.html describes the content, and javadoc.zip contains the Javadoc for the API.
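The directory check in the last step can be scripted. This is a minimal sketch: verify_layout is a hypothetical name, and the folder and file list comes from the text above.

```shell
#!/bin/sh
# Verify the directory layout the rpm is expected to create; pass the
# base path (normally /opt/oracle/oracle-spatial-graph/spatial/raster) as $1.
verify_layout() {
    base="$1"; rc=0
    for d in console examples jlib gdal tests; do
        [ -d "$base/$d" ] || { echo "missing directory: $base/$d"; rc=1; }
    done
    for f in index.html javadoc.zip; do
        [ -f "$base/$f" ] || { echo "missing file: $base/$f"; rc=1; }
    done
    return $rc
}
```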
1.5.4 Post-installation Verification of the Image Processing Framework
Several test scripts are provided to perform the following verification operations.
-
Test the image loading functionality
-
Test the image processing functionality
-
Test a processing class for slope calculation in a DEM and a map algebra operation
-
Verify the image processing of a single raster with no mosaic process (it includes a user-provided function that calculates hill shade in the mapping phase).
-
Test processing of two rasters using a mask operation
Execute these scripts to verify a successful installation of the image processing framework.
If the cluster has security enabled, make sure the current user is in the princs list and has an active Kerberos ticket.
Make sure the user has write access to ALL_ACCESS_FOLDER and belongs to the owner group of this directory. It is recommended that jobs be executed on the Resource Manager node for Big Data Appliance. If jobs are executed on a different node, then the default is the hadoop group.
For GDAL to work properly, the libraries must be available using $LD_LIBRARY_PATH. Make sure that the shared libraries path is set properly in your shell window before executing a job. For example:
export LD_LIBRARY_PATH=$ALLACCESSDIR/gdal/native
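A quick way to confirm that the export took effect is to test whether the GDAL native directory is actually a component of $LD_LIBRARY_PATH. This is a hedged sketch; on_ld_path is an illustrative name, not part of the product.

```shell
#!/bin/sh
# Return success if directory $1 appears as a component of LD_LIBRARY_PATH.
on_ld_path() {
    case ":${LD_LIBRARY_PATH}:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}
```

For example, `on_ld_path "$ALLACCESSDIR/gdal/native" || echo "GDAL libs not on LD_LIBRARY_PATH"` before submitting a job.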
1.5.4.1 Image Loading Test Script
This script loads a set of six test rasters into the ohiftest folder in HDFS: 3 rasters of byte data type with 3 bands, 1 raster (DEM) of float32 data type with 1 band, and 2 rasters of int32 data type with 1 band. No parameters are required for Big Data Appliance environments; for other environments, a single parameter with the ALL_ACCESS_FOLDER value is required.
Internally, the job creates a split for every raster to load. Split size depends on the block size configuration; for example, if a block size >= 64 MB is configured, 4 mappers will run. As a result, the rasters are loaded into HDFS and a corresponding thumbnail is created for visualization. An external image editor is required to visualize the thumbnails, and an output path for these thumbnails is provided to the users upon successful completion of the job.
The test script can be found here:
/opt/oracle/oracle-spatial-graph/spatial/raster/tests/runimageloader.sh
For Big Data Appliance environments, enter:
./runimageloader.sh
For non-BDA environments, enter:
./runimageloader.sh ALL_ACCESS_FOLDER
Upon successful execution, the message GENERATED OHIF FILES ARE LOCATED IN HDFS UNDER is displayed, with the path in HDFS where the files are located (this path depends on the definition of ALL_ACCESS_FOLDER) and a list of the created images and thumbnails on HDFS. The output may include:
THUMBNAILS CREATED ARE:
----------------------------------------------------------------------
total 13532
drwxr-xr-x 2 yarn yarn 4096 Sep 9 13:54 .
drwxr-xr-x 3 yarn yarn 4096 Aug 27 11:29 ..
-rw-r--r-- 1 yarn yarn 3214053 Sep 9 13:54 hawaii.tif.ohif.tif
-rw-r--r-- 1 yarn yarn 3214053 Sep 9 13:54 inputimageint32.tif.ohif.tif
-rw-r--r-- 1 yarn yarn 3214053 Sep 9 13:54 inputimageint32_1.tif.ohif.tif
-rw-r--r-- 1 yarn yarn 3214053 Sep 9 13:54 kahoolawe.tif.ohif.tif
-rw-r--r-- 1 yarn yarn 3214053 Sep 9 13:54 maui.tif.ohif.tif
-rw-r--r-- 1 yarn yarn 4182040 Sep 9 13:54 NapaDEM.tif.ohif.tif
YOU MAY VISUALIZE THUMBNAILS OF THE UPLOADED IMAGES FOR REVIEW FROM THE FOLLOWING PATH:
If the installation and configuration were not successful, then the output is not generated and a message like the following is displayed:
NOT ALL THE IMAGES WERE UPLOADED CORRECTLY, CHECK FOR HADOOP LOGS
The amount of memory required to execute mappers and reducers depends on the configured HDFS block size. By default, 1 GB of memory is assigned for Java, but you can modify that and other properties in the imagejob.prop file that is included in this test directory.
1.5.4.2 Image Processor Test Script (Mosaicking)
This script executes the processor job by setting three source rasters of the Hawaii islands and coordinates that include all three. The job creates a mosaic based on these coordinates, and the resulting raster includes the three source rasters combined into a single one.
runimageloader.sh should be executed as a prerequisite, so that the source rasters exist in HDFS. These are 3-band rasters of byte data type.
No parameters are required for Big Data Appliance environments; for other environments, a single parameter -s with the ALL_ACCESS_FOLDER value is required.
Additionally, if the output should be stored in HDFS, the -o parameter must be used to set the HDFS folder where the mosaic output will be stored.
Internally, the job filters the tiles using the coordinates specified in the input configuration XML; only the required tiles are processed in a mapper, and finally, in the reduce phase, all of them are put together into the resulting mosaic raster.
The test script can be found here:
/opt/oracle/oracle-spatial-graph/spatial/raster/tests/runimageprocessor.sh
For Big Data Appliance environments, enter:
./runimageprocessor.sh
For non-BDA environments, enter:
./runimageprocessor.sh -s ALL_ACCESS_FOLDER
Upon successful execution, the message EXPECTED OUTPUT FILE IS: ALL_ACCESS_FOLDER/processtest/hawaiimosaic.tif is displayed, with the path to the output mosaic file. The output may include:
EXPECTED OUTPUT FILE IS: ALL_ACCESS_FOLDER/processtest/hawaiimosaic.tif
total 9452
drwxrwxrwx 2 hdfs hdfs 4096 Sep 10 09:12 .
drwxrwxrwx 9 zherena dba 4096 Sep 9 13:50 ..
-rwxrwxrwx 1 yarn yarn 4741101 Sep 10 09:12 hawaiimosaic.tif
MOSAIC IMAGE GENERATED
----------------------------------------------------------------------
YOU MAY VISUALIZE THE MOSAIC OUTPUT IMAGE FOR REVIEW IN THE FOLLOWING PATH: ALL_ACCESS_FOLDER/processtest/hawaiimosaic.tif
If the installation and configuration were not successful, then the output is not generated and a message like the following is displayed:
MOSAIC WAS NOT SUCCESSFULLY CREATED, CHECK HADOOP LOGS TO REVIEW THE PROBLEM
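Since each test script prints an expected output path, a small post-run check can distinguish success from the failure case above. This is a hedged sketch; expect_output is an illustrative helper, not part of the shipped scripts.

```shell
#!/bin/sh
# Succeed only if the expected output raster exists and is non-empty.
expect_output() {
    if [ -s "$1" ]; then
        echo "generated: $1"
    else
        echo "missing or empty: $1"
        return 1
    fi
}
```

For example, run `expect_output "$ALL_ACCESS_FOLDER/processtest/hawaiimosaic.tif"` after runimageprocessor.sh completes.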
To test the output storage in HDFS, use the following commands.
For Big Data Appliance environments, enter:
./runimageprocessor.sh -o hdfstest
For non-BDA environments, enter:
./runimageprocessor.sh -s ALL_ACCESS_FOLDER -o hdfstest
1.5.4.3 Single-Image Processor Test Script
This script executes the processor job for a single raster, in this case a DEM source raster of North Napa Valley. The purpose of this job is to process the complete input by using the user processing classes configured for the mapping phase. This class calculates the hillshade of the DEM, and the result is set to the output file. No mosaic operation is performed here.
runimageloader.sh should be executed as a prerequisite, so that the source raster exists in HDFS. This is a 1-band DEM raster of float32 data type.
No parameters are required for Big Data Appliance environments; for other environments, a single parameter -s with the ALL_ACCESS_FOLDER value is required.
The test script can be found here:
/opt/oracle/oracle-spatial-graph/spatial/raster/tests/runsingleimageprocessor.sh
For Big Data Appliance environments, enter:
./runsingleimageprocessor.sh
For non-BDA environments, enter:
./runsingleimageprocessor.sh -s ALL_ACCESS_FOLDER
Upon successful execution, the message EXPECTED OUTPUT FILE: ALL_ACCESS_FOLDER/processtest/NapaDEM.tif is displayed, with the path to the output DEM file. The output may include:
EXPECTED OUTPUT FILE: ALL_ACCESS_FOLDER/processtest/NapaDEM.tif
total 4808
drwxrwxrwx 2 hdfs hdfs 4096 Sep 10 09:42 .
drwxrwxrwx 9 zherena dba 4096 Sep 9 13:50 ..
-rwxrwxrwx 1 yarn yarn 4901232 Sep 10 09:42 NapaDEM.tif
IMAGE GENERATED
----------------------------------------------------------------------
YOU MAY VISUALIZE THE OUTPUT IMAGE FOR REVIEW IN THE FOLLOWING PATH: ALL_ACCESS_FOLDER/processtest/NapaDEM.tif
If the installation and configuration were not successful, then the output is not generated and a message like the following is displayed:
IMAGE WAS NOT SUCCESSFULLY CREATED, CHECK HADOOP LOGS TO REVIEW THE PROBLEM
1.5.4.4 Image Processor DEM Test Script
This script executes the processor job by setting a DEM source raster of North Napa Valley and coordinates that surround it. The job creates a mosaic based on these coordinates and also calculates the slope on it by setting a processing class in the mosaic configuration XML.
runimageloader.sh should be executed as a prerequisite, so that the source raster exists in HDFS. This is a 1-band DEM raster of float32 data type.
No parameters are required for Big Data Appliance environments; for other environments, a single parameter -s with the ALL_ACCESS_FOLDER value is required.
The test script can be found here:
/opt/oracle/oracle-spatial-graph/spatial/raster/tests/runimageprocessordem.sh
For Big Data Appliance environments, enter:
./runimageprocessordem.sh
For non-BDA environments, enter:
./runimageprocessordem.sh -s ALL_ACCESS_FOLDER
Upon successful execution, the message EXPECTED OUTPUT FILE: ALL_ACCESS_FOLDER/processtest/NapaSlope.tif is displayed, with the path to the slope output file. The output may include:
EXPECTED OUTPUT FILE: ALL_ACCESS_FOLDER/processtest/NapaSlope.tif
total 4808
drwxrwxrwx 2 hdfs hdfs 4096 Sep 10 09:42 .
drwxrwxrwx 9 zherena dba 4096 Sep 9 13:50 ..
-rwxrwxrwx 1 yarn yarn 4901232 Sep 10 09:42 NapaSlope.tif
MOSAIC IMAGE GENERATED
----------------------------------------------------------------------
YOU MAY VISUALIZE THE MOSAIC OUTPUT IMAGE FOR REVIEW IN THE FOLLOWING PATH: ALL_ACCESS_FOLDER/processtest/NapaSlope.tif
If the installation and configuration were not successful, then the output is not generated and a message like the following is displayed:
MOSAIC WAS NOT SUCCESSFULLY CREATED, CHECK HADOOP LOGS TO REVIEW THE PROBLEM
You may also test the "if" algebra function, where every pixel in this raster with a value greater than 2500 will be replaced by the value you set on the command line using the -c flag. For example:
For Big Data Appliance environments, enter:
./runimageprocessordem.sh -c 8000
For non-BDA environments, enter:
./runimageprocessordem.sh -s ALL_ACCESS_FOLDER -c 8000
You can visualize the output file and notice the difference between the simple slope calculation and this altered output, where the areas with pixel values greater than 2500 look clearer.
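The per-pixel rule behind the -c flag can be illustrated outside the framework. This toy awk sketch (apply_if_rule is a hypothetical name) applies the same "if value is greater than 2500, substitute" logic to a stream of plain pixel values:

```shell
#!/bin/sh
# Toy model of the "if" algebra function: any input value greater than
# 2500 is replaced by the substitute value given as $1 (e.g., 8000).
apply_if_rule() {
    awk -v sub_val="$1" '{ v = ($1 > 2500) ? sub_val : $1; print v }'
}
```

For example, `printf '100\n3000\n2500\n' | apply_if_rule 8000` prints 100, 8000, and 2500 (only the 3000 exceeds the threshold).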
1.5.4.5 Multiple Raster Operation Test Script
This script executes the processor job for two rasters that cover a very small area of North Napa Valley in the US state of California.
These rasters have the same MBR, pixel size, SRID, and data type, all of which are required for complex multiple-raster operation processing. The purpose of this job is to process both rasters by using the mask operation, which checks every pixel in the second raster to determine whether its value is contained in the mask list. If it is, the output raster has the pixel value of the first raster for this output cell; otherwise, the zero (0) value is set. No mosaic operation is performed here.
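The mask rule just described can be modeled on plain numbers. This toy sketch (apply_mask is a hypothetical name) reads "first second" value pairs and applies the same cell-wise logic:

```shell
#!/bin/sh
# Toy model of the mask operation: for each input line "first second",
# emit the first raster's value when the second value appears in the
# mask list (a space-separated string passed as $1); otherwise emit 0.
apply_mask() {
    awk -v masks="$1" '
        BEGIN { n = split(masks, m, " "); for (i = 1; i <= n; i++) ok[m[i]] = 1 }
        { v = ($2 in ok) ? $1 : 0; print v }'
}
```

For example, `printf '7 1\n9 5\n' | apply_mask "1 2 3"` prints 7 and then 0, because only the first pair's second value (1) is in the mask list.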
runimageloader.sh should be executed as a prerequisite, so that the source rasters exist in HDFS. These are 1-band rasters of int32 data type.
No parameters are required for Big Data Appliance environments. For other environments, a single parameter -s with the ALL_ACCESS_FOLDER value is required.
The test script can be found here:
/opt/oracle/oracle-spatial-graph/spatial/raster/tests/runimageprocessormultiple.sh
For Big Data Appliance environments, enter:
./runimageprocessormultiple.sh
For non-BDA environments, enter:
./runimageprocessormultiple.sh -s ALL_ACCESS_FOLDER
Upon successful execution, the message EXPECTED OUTPUT FILE: ALL_ACCESS_FOLDER/processtest/MaskInt32Rasters.tif is displayed, with the path to the mask output file. The output may include:
EXPECTED OUTPUT FILE: ALL_ACCESS_FOLDER/processtest/MaskInt32Rasters.tif
total 4808
drwxrwxrwx 2 hdfs hdfs 4096 Sep 10 09:42 .
drwxrwxrwx 9 zherena dba 4096 Sep 9 13:50 ..
-rwxrwxrwx 1 yarn yarn 4901232 Sep 10 09:42 MaskInt32Rasters.tif
IMAGE GENERATED
----------------------------------------------------------------------
YOU MAY VISUALIZE THE OUTPUT IMAGE FOR REVIEW IN THE FOLLOWING PATH: ALL_ACCESS_FOLDER/processtest/MaskInt32Rasters.tif
If the installation and configuration were not successful, then the output is not generated and a message like the following is displayed:
IMAGE WAS NOT SUCCESSFULLY CREATED, CHECK HADOOP LOGS TO REVIEW THE PROBLEM
1.6 Installing the Oracle Big Data SpatialViewer Web Application
To install the Oracle Big Data SpatialViewer web application (SpatialViewer), follow the instructions in this topic.
- Assumptions for SpatialViewer
- Installing SpatialViewer on Oracle Big Data Appliance
- Installing SpatialViewer for Other Systems (Not Big Data Appliance)
- Configuring SpatialViewer on Oracle Big Data Appliance
- Configuring SpatialViewer for Other Systems (Not Big Data Appliance)
Parent topic: Big Data Spatial and Graph Overview
1.6.1 Assumptions for SpatialViewer
The following assumptions apply for installing and configuring SpatialViewer.
-
The API and jobs described here run on a Cloudera CDH6 or similar Hadoop environment.
-
Java 8 or a newer version is present in your environment.
-
The image processing framework has been installed as described in Installing and Configuring the Big Data Spatial Image Processing Framework.
1.6.2 Installing SpatialViewer on Oracle Big Data Appliance
You can install SpatialViewer on Oracle Big Data Appliance as follows.
-
Run the following script:
sudo /opt/oracle/oracle-spatial-graph/spatial/configure-server/install-bdsg-consoles.sh
-
Start the web application by using one of the following commands (the second command enables you to view logs):
sudo service bdsg start
sudo /opt/oracle/oracle-spatial-graph/spatial/web-server/start-server.sh
If any errors occur, see the README file located in /opt/oracle/oracle-spatial-graph/spatial/configure-server.
-
Open:
http://<oracle_big_data_spatial_vector_console>:8045/spatialviewer/
-
If the active nodes have changed after the installation or if Kerberos is enabled, then update the configuration file as described in Configuring SpatialViewer on Oracle Big Data Appliance.
-
Optionally, upload sample data (used with examples in other topics) to HDFS:
sudo -u hdfs hadoop fs -mkdir /user/oracle/bdsg
sudo -u hdfs hadoop fs -put /opt/oracle/oracle-spatial-graph/spatial/vector/examples/data/tweets.json /user/oracle/bdsg/
1.6.3 Installing SpatialViewer for Other Systems (Not Big Data Appliance)
Follow the steps for manual configuration described in Installing SpatialViewer on Oracle Big Data Appliance.
Then, change the configuration, as described in Configuring SpatialViewer for Other Systems (Not Big Data Appliance).
1.6.4 Configuring SpatialViewer on Oracle Big Data Appliance
To configure SpatialViewer on Oracle Big Data Appliance, follow these steps.
-
Open the console:
http://<oracle_big_data_spatial_vector_console>:8045/spatialviewer/?root=swadmin
-
Change the general configuration, as needed:
-
Local working directory: SpatialViewer local working directory (absolute path). The default directory /usr/oracle/spatialviewer is created when installing SpatialViewer.
-
HDFS working directory: SpatialViewer HDFS working directory. The default directory /user/oracle/spatialviewer is created when installing SpatialViewer.
-
Hadoop configuration file: The Hadoop configuration directory. By default: /etc/hadoop/conf
If you change this value, you must restart the server.
-
Spark configuration file: The Spark configuration directory. By default: /etc/spark/conf
If you change this value, you must restart the server.
-
eLocation URL: URL used to get the eLocation background maps. By default: http://elocation.oracle.com
-
Kerberos keytab: If Kerberos is enabled, provide the full path to the keytab file.
-
Display logs: If necessary, disable the display of the jobs in the Spatial Jobs screen. Disable this display if the logs are not in the default format. The default format is:
Date LogLevel LoggerName: LogMessage
The Date must have the default format yyyy-MM-dd HH:mm:ss,SSS. For example: 2012-11-02 14:34:02,781.
If the logs are not displayed and the Display logs field is set to Yes, then ensure that yarn.log-aggregation-enable in yarn-site.xml is set to true. Also ensure that the Hadoop jobs configuration parameters yarn.nodemanager.remote-app-log-dir and yarn.nodemanager.remote-app-log-dir-suffix are set to the same value as in yarn-site.xml.
-
-
Change the raster configuration, as needed:
-
Shared directory: Directory used to read and write from different nodes, which requires that it be shared and have the broadest permissions, or at least be in the Hadoop user group.
-
Network file system mount point: NFS mount point that allows the shared folders to be seen and accessed individually. Can be blank if you are using a non-distributed environment.
-
GDAL directory: Native GDAL installation directory. Must be accessible to all the cluster nodes.
If you change this value, you must restart the server.
-
Shared GDAL data directory: GDAL shared data folder. Must be a shared directory. (See the instructions in Installing the Image Processing Framework for Other Distributions (Not Oracle Big Data Appliance).)
-
-
Change the Hadoop configuration, as needed.
-
Change the Spark configuration, as needed. The raster processor needs additional configuration details:
-
spark.driver.extraClassPath, spark.executor.extraClassPath: Specify your Hive library installation using these keys. Example: /usr/lib/hive/lib/*
-
spark.kryoserializer.buffer.max: Enter the memory for data serialization. Example: 160m
-
-
If Kerberos is enabled, then you may need to add the parameters:
-
spark.yarn.keytab: the full path to the file that contains the keytab for the principal.
-
spark.yarn.principal: the principal to be used to log in to Kerberos. The format of a typical Kerberos V5 principal is primary/instance@REALM.
-
-
On Linux systems, you may need to change the secure container executor to LinuxContainerExecutor. For that, set the following parameters:
-
Set yarn.nodemanager.container-executor.class to org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.
-
Set yarn.nodemanager.linux-container-executor.group to hadoop.
-
-
Ensure that the user can read the keytab file.
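The Display logs setting described above depends on log lines matching the default "Date LogLevel LoggerName: LogMessage" layout. A hedged sketch of a format check follows; the function name and the regular expression are illustrative, not part of SpatialViewer.

```shell
#!/bin/sh
# Check one log line against the default format described above:
# "yyyy-MM-dd HH:mm:ss,SSS LEVEL LoggerName: message"
is_default_log_format() {
    echo "$1" | grep -Eq \
        '^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3} [A-Z]+ [^:]+: .*'
}
```

A line such as `2012-11-02 14:34:02,781 INFO JobTracker: job completed` passes the check; lines in any other layout do not, which is a hint to set Display logs to No.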
1.6.5 Configuring SpatialViewer for Other Systems (Not Big Data Appliance)
Before installing the SpatialViewer on other systems, you must install the image processing framework as specified in Installing the Image Processing Framework for Other Distributions (Not Oracle Big Data Appliance).
Then follow the steps mentioned in Configuring SpatialViewer on Oracle Big Data Appliance.
Additionally, change the Hadoop and Spark configuration, replacing the Hadoop conf directory and Spark conf directory values according to your Hadoop and Spark installations.
1.7 Installing Big Data Spatial and Graph in Non-BDA Environments
Some actions may be required if you install Big Data Spatial and Graph in an environment other than Oracle Big Data Appliance.
Starting with Big Data Spatial and Graph (BDSG) 2.5.3, third-party libraries provided by Cloudera required for interaction with Cloudera CDH are no longer distributed with the BDSG distribution. This topic describes the actions that may be needed to enable Cloudera CDH support with BDSG.
On Oracle Big Data Appliance (BDA), BDSG is preconfigured to work with Cloudera CDH "out of the box," as in previous BDSG releases, so no additional installation steps are required for a BDA environment.
- Automatic Installation of BDSG
- Manual Installation of BDSG
- Configuring the BDSG Environment
- Managing BDSG Text Indexing Using Apache Lucene 7.0
- Managing BDSG Text Indexing Using SolrCloud 7.0
Parent topic: Big Data Spatial and Graph Overview
1.7.1 Automatic Installation of BDSG
After installing the .rpm, you can attempt an automatic installation by running the following script as root:
/opt/oracle/oracle-spatial-graph/property_graph/configure-hadoop.sh
This script makes many assumptions about your Hadoop distribution and version. If any command in the script fails, perform a manual installation.
1.7.2 Manual Installation of BDSG
To perform a manual installation, use the subtopic relevant to your environment.
HDFS
Go into the BDSG property graph installation directory:
cd /opt/oracle/oracle-spatial-graph/property_graph
Set HADOOP_HOME to point to your Hadoop installation base path. For example:
HADOOP_HOME=/scratch/cloudera/parcels/CDH-6.0.1-1.cdh6.0.1.p0.590678
Copy the required HDFS libraries (and their dependencies) into the hadoop/hdfs directory. (The exact location and version of the above JAR files may vary depending on the Hadoop distribution and version, so you might have to change some of these input paths to match your cluster installation.)
cp $HADOOP_HOME/lib/hadoop/hadoop-auth-3.0.0-cdh6.0.1.jar hadoop/hdfs/
cp $HADOOP_HOME/lib/hadoop/hadoop-common-3.0.0-cdh6.0.1.jar hadoop/hdfs/
cp $HADOOP_HOME/lib/hadoop-hdfs/hadoop-hdfs-3.0.0-cdh6.0.1.jar hadoop/hdfs/
cp $HADOOP_HOME/lib/hadoop-hdfs/hadoop-hdfs-client-3.0.0-cdh6.0.1.jar hadoop/hdfs/
cp $HADOOP_HOME/lib/hadoop-hdfs/lib/commons-cli-1.2.jar hadoop/hdfs/
cp $HADOOP_HOME/lib/hadoop-hdfs/lib/commons-collections-3.2.2.jar hadoop/hdfs/
cp $HADOOP_HOME/lib/hadoop-hdfs/lib/commons-lang-2.6.jar hadoop/hdfs/
cp $HADOOP_HOME/lib/hadoop-hdfs/lib/commons-logging-1.1.3.jar hadoop/hdfs/
cp $HADOOP_HOME/lib/hadoop-hdfs/lib/stax2-api-3.1.4.jar hadoop/hdfs/
cp $HADOOP_HOME/lib/hadoop-hdfs/lib/woodstox-core-5.0.3.jar hadoop/hdfs/
cp $HADOOP_HOME/lib/hadoop-hdfs/lib/htrace-core4-4.1.0-incubating.jar hadoop/hdfs/
cp $HADOOP_HOME/lib/hadoop-hdfs/lib/protobuf-java-2.5.0.jar hadoop/hdfs/
To enable PGX server to access HDFS, you also need to copy the libraries into the .war file:
mkdir -p WEB-INF/lib
cp /opt/oracle/oracle-spatial-graph/property_graph/hadoop/hdfs/* WEB-INF/lib/
jar -uvf /opt/oracle/oracle-spatial-graph/property_graph/pgx/webapp/pgx-webapp-<version>.war WEB-INF/lib/
rm -r WEB-INF
Then start the server, either by running the ./pgx/bin/start-server script or by deploying the WAR file into an application server.
Yarn
Go into the BDSG property graph installation directory:
cd /opt/oracle/oracle-spatial-graph/property_graph
Locate the Zookeeper JAR file of your Hadoop distribution, for example $HADOOP_HOME/lib/zookeeper/zookeeper-3.4.5-cdh6.0.1.jar. Then run the following commands to configure your BDSG installation to work with Yarn:
TMP_DIR=$(mktemp -d)
cd "${TMP_DIR}"
jar xf "${HADOOP_HOME}/lib/zookeeper/zookeeper-3.4.5-cdh6.0.1.jar"
rm META-INF/MANIFEST.MF
jar -uf /opt/oracle/oracle-spatial-graph/property_graph/hadoop/yarn/pgx-yarn-<version>.jar .
rm -rf "${TMP_DIR}"
HBase
Go into the BDSG property graph installation directory:
cd /opt/oracle/oracle-spatial-graph/property_graph
Create a hadoop/hbase directory to hold all the HBase libraries (and their dependencies) required for execution:
mkdir -p hadoop/hbase
Set HADOOP_HOME to point to your Hadoop installation base path. For example:
HADOOP_HOME=/scratch/cloudera/parcels/CDH-6.0.1-1.cdh6.0.1.p0.590678
Copy the required HBase libraries (and their dependencies) into the hadoop/hbase directory. The exact location and version of these JAR files vary with the Hadoop distribution and version, so you may need to adjust the input paths to match your cluster installation.
cp $HADOOP_HOME/lib/hbase/hbase-client-2.0.0-cdh6.0.1.jar hadoop/hbase
cp $HADOOP_HOME/lib/hbase/hbase-common-2.0.0-cdh6.0.1.jar hadoop/hbase
cp $HADOOP_HOME/lib/hbase/hbase-protocol-2.0.0-cdh6.0.1.jar hadoop/hbase
cp $HADOOP_HOME/lib/hbase/hbase-shaded-protobuf-2.1.0.jar hadoop/hbase
cp $HADOOP_HOME/lib/hbase/lib/shaded-clients/hbase-shaded-client-2.0.0-cdh6.0.1.jar hadoop/hbase
cp $HADOOP_HOME/lib/hadoop/hadoop-common-3.0.0-cdh6.0.1.jar hadoop/hbase
cp $HADOOP_HOME/lib/hadoop-hdfs/hadoop-hdfs-3.0.0-cdh6.0.1.jar hadoop/hbase
cp $HADOOP_HOME/lib/zookeeper/zookeeper-3.4.5-cdh6.0.1.jar hadoop/hbase
cp $HADOOP_HOME/lib/hbase/lib/protobuf-java-2.5.0.jar hadoop/hbase
cp $HADOOP_HOME/lib/hbase/lib/metrics-core-3.2.1.jar hadoop/hbase
cp $HADOOP_HOME/lib/hbase/lib/jettison-1.3.8.jar hadoop/hbase
cp $HADOOP_HOME/lib/hbase/lib/stax2-api-3.1.4.jar hadoop/hbase
cp $HADOOP_HOME/lib/hbase/lib/woodstox-core-5.0.3.jar hadoop/hbase
cp $HADOOP_HOME/lib/hbase/lib/client-facing-thirdparty/htrace-core4-4.2.0-incubating.jar hadoop/hbase
cp $HADOOP_HOME/lib/hbase/lib/client-facing-thirdparty/audience-annotations-0.5.0.jar hadoop/hbase
To enable the PGX server to access HBase, you also need to copy the HBase libraries into the .war file:
mkdir -p WEB-INF/lib
cp /opt/oracle/oracle-spatial-graph/property_graph/hadoop/hbase/* WEB-INF/lib/
jar -uvf /opt/oracle/oracle-spatial-graph/property_graph/pgx/webapp/pgx-webapp-<version>.war WEB-INF/lib/
rm -r WEB-INF
Then start the server, either by running the ./pgx/bin/start-server script or by deploying the WAR file into an application server.
1.7.3 Configuring the BDSG Environment
To configure the environment, use the subtopic relevant to your environment.
HDFS
Set the HADOOP_CONF_DIR environment variable to point to the HDFS configuration directory of your cluster. For example:
export HADOOP_CONF_DIR=/etc/hadoop/conf
Set the BDSG_CLASSPATH environment variable to point to the libraries of the previous step before starting the shell. For example:
export BDSG_CLASSPATH=/opt/oracle/oracle-spatial-graph/property_graph/hadoop/hdfs/*
Then start the shell as usual and access data from HDFS using the hdfs path prefix:
cd /opt/oracle/oracle-spatial-graph/property_graph
./pgx/bin/pgx
[WARNING] BDSG_CLASSPATH environment will be prepended to PGX classpath. If this is not intended, do 'unset BDSG_CLASSPATH' and restart.
PGX Shell 3.1.3
type :help for available commands
12:01:30,824 INFO Ctrl$1 - >>> PGX engine 3.1.3 running.
variables instance, session and analyst ready to use
pgx> g = session.readGraphWithProperties('hdfs:/tmp/data/connections.edge_list.json')
==> PgxGraph[name=connections,N=78,E=164,created=1543176112779]
Yarn
Copy the required artifacts for Yarn deployments into a HDFS directory of your choice by running the following helper script:
./pgx/scripts/install-pgx-hdfs.sh <dest-dir>
where <dest-dir> could be hdfs:/binaries/pgx, for example.
Make sure the hdfs binary is on your PATH.
After the script finishes, update pgx/conf/yarn.conf to contain the paths to the installed binaries and the correct Zookeeper connection string of your cluster.
HBase
To access a property graph in Apache HBase using the data access layer (DAL) in a Java application:
- Set BDSG_HOME environment variable to the property graph installation directory. For example:
export BDSG_HOME=/opt/oracle/oracle-spatial-graph/property_graph
- Set BDSG_CLASSPATH environment variable to the hadoop/hbase directory. For example:
export BDSG_CLASSPATH=/opt/oracle/oracle-spatial-graph/property_graph/hadoop/hbase/*:$BDSG_CLASSPATH
- Compile the Java code, as follows:
javac -classpath $BDSG_HOME/lib/'*':$BDSG_CLASSPATH filename.java
- Run the Java application by executing the compiled code, as follows:
java -classpath ./:$BDSG_HOME/lib/'*':$BDSG_CLASSPATH filename args
To access a property graph in Apache HBase using the DAL in a Groovy console:
- Set the BDSG_CLASSPATH environment variable to the hadoop/hbase directory. For example:
export BDSG_CLASSPATH=/opt/oracle/oracle-spatial-graph/property_graph/hadoop/hbase/*:$BDSG_CLASSPATH
- Start the shell as usual and access data from Apache HBase storage using an OraclePropertyGraph instance. Note that as of Apache HBase 2.0, the HConnection interface has been deprecated, so use a Connection object to connect to the database.
cd /opt/oracle/oracle-spatial-graph/property_graph/dal/groovy
sh gremlin-opg-hbase.sh
--------------------------------
opg-hbase> conf = HBaseConfiguration.create();
==>hbase.rs.cacheblocksonwrite=false
==>...
opg-hbase> conf.set("hbase.zookeeper.quorum", "localhost");
==>null
opg-hbase> conf.set("hbase.zookeeper.property.clientPort","2181");
==>null
opg-hbase> conn = ConnectionFactory.createConnection(conf);
==>hconnection-0x720653c2
opg-hbase> opg = OraclePropertyGraph.getInstance(conf, conn, "connections");
==>oraclepropertygraph with name connections
1.7.4 Managing BDSG Text Indexing Using Apache Lucene 7.0
To manage text indexing over property graph data using Apache Lucene:
- Go into the BDSG property graph installation directory:
cd /opt/oracle/oracle-spatial-graph/property_graph
- Create a lucene directory to hold all the Apache Lucene 7.0 libraries (and their dependencies) required for execution:
mkdir lucene
- Set HADOOP_HOME to point to your Hadoop installation base path. For example:
HADOOP_HOME=/scratch/cloudera/parcels/CDH-6.0.1-1.c
- Copy the required Apache Lucene libraries into the lucene directory:
cp $HADOOP_HOME/lib/search/lucene-core.jar lucene
cp $HADOOP_HOME/lib/search/lucene-queryparser.jar lucene
cp $HADOOP_HOME/lib/search/lucene-analyzers-common.jar lucene
Managing Text Indexing in a Java Application
- Set BDSG_HOME to the property graph installation directory. For example:
export BDSG_HOME=/opt/oracle/oracle-spatial-graph/property_graph
- Set BDSG_CLASSPATH to the lucene directory. For example:
export BDSG_CLASSPATH=/opt/oracle/oracle-spatial-graph/property_graph/lucene/*:$BDSG_CLASSPATH
- Compile the Java code. For example:
javac -classpath $BDSG_HOME/lib/'*':$BDSG_CLASSPATH filename.java
- Run the Java application by executing the compiled code. For example:
java -classpath ./:$BDSG_HOME/lib/'*':$BDSG_CLASSPATH filename args
Managing Text Indexing Using a Groovy Console
- Set BDSG_CLASSPATH to the lucene directory. For example:
export BDSG_CLASSPATH=/opt/oracle/oracle-spatial-graph/property_graph/lucene/*:$BDSG_CLASSPATH
- Start the shell as usual to create a text index over a property graph stored in Apache HBase using an OraclePropertyGraph instance. For example:
cd /opt/oracle/oracle-spatial-graph/property_graph/dal/groovy
sh gremlin-opg-hbase.sh
--------------------------------
opg-hbase> conf = HBaseConfiguration.create();
==>hbase.rs.cacheblocksonwrite=false
==>...
opg-hbase> dop=2;
==>2
opg-hbase> conf.set("hbase.zookeeper.quorum", "localhost");
==>null
opg-hbase> conf.set("hbase.zookeeper.property.clientPort","2181");
==>null
opg-hbase> conn = ConnectionFactory.createConnection(conf);
==>hconnection-0x720653c2
opg-hbase> opg = OraclePropertyGraph.getInstance(conf, conn, "connections");
==>oraclepropertygraph with name connections
opg-hbase> indexParams = OracleIndexParameters.buildFS(dop /* number of directories */, dop /* number of connections used when indexing */, 10000 /* batch size before commit */, 500000 /* commit size before Lucene commit */, true /* enable datatypes */, "./lucene-index" /* index location */);
==>[parameter[search-engine,1], parameter[num-subdirectories,4], parameter[directory-type,FS_DIRECTORY], parameter[reindex-numConns,4], parameter[batch-size,10000], parameter[commit-batch-size,500000], parameter[values-as-strings,true], parameter[directory-location,[Ljava.lang.String;@5c1f6d57]]
opg-hbase> opg.setDefaultIndexParameters(indexParams);
==>null
opg-hbase> indexedKeys = new String[4];
indexedKeys[0] = "name";
indexedKeys[1] = "role";
indexedKeys[2] = "religion";
indexedKeys[3] = "country";
==>name
==>role
==>religion
==>country
opg-hbase> opg.createKeyIndex(indexedKeys, Vertex.class);
==>null
1.7.5 Managing BDSG Text Indexing Using SolrCloud 7.0
To manage text indexing over property graph data using SolrCloud:
- Go into the BDSG property graph installation directory:
cd /opt/oracle/oracle-spatial-graph/property_graph
- Create a solrcloud directory to hold all the SolrCloud 7.0 libraries (and their dependencies) required for execution:
mkdir solrcloud
- Set HADOOP_HOME to point to your Hadoop installation base path. For example:
HADOOP_HOME=/scratch/cloudera/parcels/CDH-6.0.1-1.cdh6.0.1.p0.590678
- Copy the required SolrCloud libraries into the solrcloud directory:
cp $HADOOP_HOME/lib/solr/solr-solrj-7.0.0-cdh6.0.1.jar solrcloud
cp $HADOOP_HOME/lib/solr/lib/noggit-0.8.jar solrcloud
cp $HADOOP_HOME/lib/solr/lib/httpmime-4.5.3.jar solrcloud
cp $HADOOP_HOME/lib/search/lucene-core.jar solrcloud
cp $HADOOP_HOME/lib/search/lucene-queryparser.jar solrcloud
cp $HADOOP_HOME/lib/search/lucene-analyzers-common.jar solrcloud
- Set BDSG_CLASSPATH to the solrcloud directory. For example:
export BDSG_CLASSPATH=/opt/oracle/oracle-spatial-graph/property_graph/solrcloud/*:$BDSG_CLASSPATH
Managing Text Indexing in a Java Application
- Set BDSG_HOME to the property graph installation directory. For example:
export BDSG_HOME=/opt/oracle/oracle-spatial-graph/property_graph
- Set BDSG_CLASSPATH to the solrcloud directory. For example:
export BDSG_CLASSPATH=/opt/oracle/oracle-spatial-graph/property_graph/solrcloud/*:$BDSG_CLASSPATH
- Compile the Java code. For example:
javac -classpath $BDSG_HOME/lib/'*':$BDSG_CLASSPATH filename.java
- Run the Java application by executing the compiled code. For example:
java -classpath ./:$BDSG_HOME/lib/'*':$BDSG_CLASSPATH filename args
Managing Text Indexing Using a Groovy Console
- Set BDSG_CLASSPATH to the solrcloud directory. For example:
export BDSG_CLASSPATH=/opt/oracle/oracle-spatial-graph/property_graph/solrcloud/*:$BDSG_CLASSPATH
- Start the shell as usual to create a text index over a property graph stored in Apache HBase using an OraclePropertyGraph instance. For example:
cd /opt/oracle/oracle-spatial-graph/property_graph/dal/groovy
sh gremlin-opg-hbase.sh
--------------------------------
opg-hbase> conf = HBaseConfiguration.create();
==>hbase.rs.cacheblocksonwrite=false
==>...
opg-hbase> dop=2;
==>2
opg-hbase> conf.set("hbase.zookeeper.quorum", "localhost");
==>null
opg-hbase> conf.set("hbase.zookeeper.property.clientPort","2181");
==>null
opg-hbase> conn = ConnectionFactory.createConnection(conf);
==>hconnection-0x720653c2
opg-hbase> opg = OraclePropertyGraph.getInstance(conf, conn, "connections");
==>oraclepropertygraph with name connections
opg-hbase> indexParams = OracleIndexParameters.buildSolr("opgconfig" /* solr config */, "localhost:2181/solr" /* solr server url */, "localhost:8983_solr" /* solr node set */, 15 /* zookeeper timeout in seconds */, 1 /* total number of shards */, 1 /* replication factor */, 1 /* maximum number of shards per node */, 4 /* dop used for scan */, 10000 /* batch size before commit */, 500000 /* commit size before SolrCloud commit */, 15 /* write timeout in seconds */);
==>[parameter[search-engine,0], parameter[config-name,opgconfig], parameter[solr-server-url,localhost:2181/solr], parameter[solr-admin-url,localhost:8983_solr], parameter[zk-timeout,15], parameter[replication-factor,1], parameter[num-shards,1], parameter[max-shards-per-node,1], parameter[reindex-numConns,4], parameter[batch-size,10000], parameter[commit-batch-size,500000], parameter[write-timeout,15]]
opg-hbase> opg.setDefaultIndexParameters(indexParams);
==>null
opg-hbase> indexedKeys = new String[4];
indexedKeys[0] = "name";
indexedKeys[1] = "role";
indexedKeys[2] = "religion";
indexedKeys[3] = "country";
==>name
==>role
==>religion
==>country
opg-hbase> opg.createKeyIndex(indexedKeys, Vertex.class);
==>null
1.8 Required Application Code Changes due to Upgrades
Application code changes may be required due to upgrades, such as to more recent versions of Apache HBase and SolrCloud.
- Changes Due to Upgrade from Apache HBase 1.x to Apache HBase 2.x
- Changes Due to Upgrade from SolrCloud 4.10.3 to SolrCloud 7.0.0
1.8.1 Changes Due to Upgrade from Apache HBase 1.x to Apache HBase 2.x
Big Data Spatial and Graph 2.5.3 supports Cloudera CDH6, which upgraded Apache HBase to a newer version.
Creating a Property Graph Instance
Effective with Apache HBase 2.0, the HConnection interface has been deprecated, so the data access layer requires using a Connection object to connect to the database. The following code snippet illustrates how to create an OraclePropertyGraph instance from an Apache HBase 2.0 Connection object.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.client.*;
...
Configuration conf = HBaseConfiguration.create();
conf.set("hbase.zookeeper.quorum", szQuorum);
conf.set("hbase.zookeeper.property.clientPort","2181");
Connection conn = ConnectionFactory.createConnection(conf);
OraclePropertyGraph opg = OraclePropertyGraph.getInstance(conf, conn, szGraphName);
Parallel Retrieval of Vertices/Edges
The following code snippet opens an array of connections to HBase (using the Connection
/ConnectionFactory
APIs from Apache HBase 2.x), and executes a parallel query to retrieve all vertices and edges using the opened connections. The number of calls to the getVerticesPartitioned
/getEdgesPartitioned
method is controlled by the total number of splits and the number of connections used.
int dop = 4;
Configuration conf = HBaseConfiguration.create();
conf.set("hbase.zookeeper.quorum", szQuorum);
conf.set("hbase.zookeeper.property.clientPort","2181");
Connection conn = ConnectionFactory.createConnection(conf);
OraclePropertyGraph opg = OraclePropertyGraph.getInstance(conn, "connections");
// Create connections used in parallel query
Connection[] conns = new Connection[dop];
for (int i = 0; i < dop; i++) {
Configuration conf_new = HBaseConfiguration.create(opg.getConfiguration());
conns[i] = ConnectionFactory.createConnection(conf_new);
}
long lCountV = 0;
// Iterate over all the vertices' splits to count all the vertices
for (int split = 0; split < opg.getVertexTableSplits(); split += dop) {
Iterable<Vertex>[] iterables = opg.getVerticesPartitioned(conns /* Connection array */,
true /* skip store to cache */,
split /* starting split */);
for (Iterable<Vertex> iterable : iterables) {
lCountV += OraclePropertyGraphUtils.size(iterable); /* consume iterables */
}
}
// Count all vertices
System.out.println("Vertices found using parallel query: " + lCountV);
long lCountE = 0;
// Iterate over all the edges' splits to count all the edges
for (int split = 0; split < opg.getEdgeTableSplits(); split += dop) {
Iterable<Edge>[] iterables = opg.getEdgesPartitioned(conns /* Connection array */,
true /* skip store to cache */,
split /* starting split */);
for (Iterable<Edge> iterable : iterables) {
lCountE += OraclePropertyGraphUtils.size(iterable); /* consume iterables */
}
}
// Count all edges
System.out.println("Edges found using parallel query: " + lCountE);
// Close the connections to the database after completed
for (int idx = 0; idx < conns.length; idx++) {
conns[idx].close();
}
Dropping an Existing Graph
For Apache HBase 2.x, the OraclePropertyGraphUtils.dropPropertyGraph
method uses the Hadoop nodes and the Apache HBase port number for the connection. The following code fragment deletes a graph named my_graph
from Apache HBase 2.x.
Configuration conf = HBaseConfiguration.create();
conf.set("hbase.zookeeper.quorum", szQuorum);
conf.set("hbase.zookeeper.property.clientPort","2181");
Connection conn = ConnectionFactory.createConnection(conf);
OraclePropertyGraphUtils.dropPropertyGraph(conn, "my_graph");
1.8.2 Changes Due to Upgrade from SolrCloud 4.10.3 to SolrCloud 7.0.0
The upgrade from SolrCloud 4.10.3 to SolrCloud 7.0.0 may require some application code changes.
Parallel Query on Text Indexes for Property Graph Data
With SolrCloud 7.0, the SolrCloudServer interface has been deprecated, so the data access layer requires using a CloudSolrClient object to connect to the SolrCloud text search engine. To execute parallel queries over a SolrCloud-based text index, you must specify a set of CloudSolrClient instances. To create a CloudSolrClient instance, use the SolrIndexUtils.getCloudSolrClient API, because the SolrIndexUtils.getCloudSolrServer operation is now deprecated.
The following code snippet generates an automatic text index using the SolrCloud Search engine and executes a parallel text query. The number of calls to the getPartitioned
method in the SolrIndex
class is controlled by the total number of shards in the index and the number of connections used.
OraclePropertyGraph opg = OraclePropertyGraph.getInstance(args, szGraphName);
String configName = "opgconfig";
String solrServerUrl = args[4];//"localhost:2181/solr"
String solrNodeSet = args[5]; //"localhost:8983_solr";
int zkTimeout = 15; // zookeeper timeout in seconds
int numShards = Integer.parseInt(args[6]); // number of shards in the index
int replicationFactor = 1; // replication factor
int maxShardsPerNode = 1; // maximum number of shards per node
// Create an automatic index using SolrCloud
OracleIndexParameters indexParams = OracleIndexParameters.buildSolr(configName,
    solrServerUrl,
    solrNodeSet,
    zkTimeout /* zookeeper timeout in seconds */,
    numShards /* total number of shards */,
    replicationFactor /* replication factor */,
    maxShardsPerNode /* maximum number of shards per node */,
    4 /* dop used for scan */,
    10000 /* batch size before commit */,
    500000 /* commit size before SolrCloud commit */,
    15 /* write timeout in seconds */);
opg.setDefaultIndexParameters(indexParams);
// Create auto indexing on name property for all vertices
System.out.println("Create automatic index on name for vertices");
opg.createKeyIndex("name", Vertex.class);
// Get the SolrIndex object
SolrIndex<Vertex> index = (SolrIndex<Vertex>) opg.getAutoIndex(Vertex.class);
// Open an array of connections to handle connections to SolrCloud needed for parallel text search
int dop = 4; // degree of parallelism: number of SolrCloud connections
CloudSolrClient[] conns = new CloudSolrClient[dop];
for (int idx = 0; idx < conns.length; idx++) {
conns[idx] = index.getCloudSolrClient(15 /* write timeout in secs */);
}
// Iterate to cover all the shards in the index
long lCount = 0;
for (int split = 0; split < index.getTotalShards(); split += conns.length) {
// Gets elements from split to split + conns.length
Iterable<Vertex>[] iterAr = index.getPartitioned(conns /* connections */, "name" /* key */, "*" /* value */, true /* wildcards */, split /* start split ID */);
for (Iterable<Vertex> iterable : iterAr) {
lCount += OraclePropertyGraphUtils.size(iterable); /* consume iterables */
}
}
// Close the connections to SolrCloud after completed
for (int idx = 0; idx < conns.length; idx++) {
conns[idx].close();
}
Using Native Query Results with SolrCloud
You can consume native query results from SolrCloud by calling the get(QueryResponse) method in SolrIndex. A QueryResponse object provides the set of documents matching a text search query over a specific SolrCloud collection. SolrIndex produces an Iterable object holding all the vertices (or edges) from the documents found in the QueryResponse object.
With SolrCloud 7.0, the SolrCloudServer
interface has been deprecated, so the data access layer requires use of a CloudSolrClient
object to process native query results over a text index in Oracle Property Graph. The following code fragment generates an automatic text index using the Apache SolrCloud Search engine, creates a SolrQuery
object, and executes it against a CloudSolrClient
object to get a QueryResponse object. Later, an Iterable object of vertices is created from the given result object.
OraclePropertyGraph opg = OraclePropertyGraph.getInstance(args, szGraphName);
String configName = "opgconfig";
String solrServerUrl = args[4];//"localhost:2181/solr"
String solrNodeSet = args[5]; //"localhost:8983_solr";
int zkTimeout = 15; // zookeeper timeout in seconds
int numShards = Integer.parseInt(args[6]); // number of shards in the index
int replicationFactor = 1; // replication factor
int maxShardsPerNode = 1; // maximum number of shards per node
// Create an automatic index using SolrCloud
OracleIndexParameters indexParams = OracleIndexParameters.buildSolr(configName,
    solrServerUrl,
    solrNodeSet,
    zkTimeout /* zookeeper timeout in seconds */,
    numShards /* total number of shards */,
    replicationFactor /* replication factor */,
    maxShardsPerNode /* maximum number of shards per node */,
    4 /* dop used for scan */,
    10000 /* batch size before commit */,
    500000 /* commit size before SolrCloud commit */,
    15 /* write timeout in seconds */);
opg.setDefaultIndexParameters(indexParams);
// Create auto indexing on name and country properties for all vertices
System.out.println("Create automatic index on name and country for vertices");
String[] indexedKeys = new String[2];
indexedKeys[0] = "name";
indexedKeys[1] = "country";
opg.createKeyIndex(indexedKeys, Vertex.class);
// Get the SolrIndex object
SolrIndex<Vertex> index = (SolrIndex<Vertex>) opg.getAutoIndex(Vertex.class);
// Search first for Key name with property value Beyon* using only string data types
String szQueryStrBey = index.buildSearchTerm("name", "Beyo*", String.class);
String key = index.appendDatatypesSuffixToKey("country", String.class);
String value = index.appendDatatypesSuffixToValue("United States", String.class);
String szQueryStrCountry = key + ":" + value;
SolrQuery query = new SolrQuery(szQueryStrBey + " AND " + szQueryStrCountry);
CloudSolrClient conn = index.getCloudSolrClient(15 /* write timeout in secs*/);
//Query using get operation
QueryResponse qr = conn.query(query, SolrRequest.METHOD.POST);
Iterable<Vertex> it = index.get(qr);
long lCount = 0;
for (Vertex v : it) {
System.out.println(v);
lCount++;
}
System.out.println("Vertices found: "+ lCount);