1 Big Data Spatial and Graph Overview

This chapter provides an overview of Oracle Big Data support for Oracle Spatial and Graph spatial and property graph features.

1.1 About Big Data Spatial and Graph

Oracle Big Data Spatial and Graph delivers advanced spatial and graph analytic capabilities on supported Apache Hadoop Big Data platforms.

The spatial features include support for data enrichment of location information; spatial filtering and categorization based on distance and location-based analysis; vector and raster processing of digital map, sensor, satellite, and aerial imagery values; and APIs for map visualization.

1.2 Spatial Features

Spatial location information is a common element of Big Data.

Businesses can use spatial data as the basis for associating and linking disparate data sets. Location information can also be used to track and categorize entities based on proximity to another person, place, or object, or on their presence in a particular area. Location information can facilitate location-specific offers to customers entering a particular geography, something known as geo-fencing. Georeferenced imagery and sensor data can be analyzed for a variety of business benefits.

The spatial features of Oracle Big Data Spatial and Graph support those use cases with the following kinds of services.

Vector Services:

  • Ability to associate documents and data with names, such as cities or states, or longitude/latitude information in spatial object definitions for a default administrative hierarchy

  • Support for text-based 2D and 3D geospatial formats, including GeoJSON files, Shapefiles, GML, and WKT; you can also use the Geospatial Data Abstraction Library (GDAL) to convert popular geospatial encodings such as Oracle SDO_Geometry, ST_Geometry, and other supported formats

  • An HTML5-based map client API and a sample console to explore, categorize, and view data in a variety of formats and coordinate systems

  • Topological and distance operations: Anyinteract, Inside, Contains, Within Distance, Nearest Neighbor, and others

  • Spatial indexing for fast retrieval of data

Raster Services:

  • Support for many image file formats supported by GDAL and image files stored in HDFS

  • A sample console to view the set of images that are available

  • Raster operations, including subsetting, georeferencing, mosaics, and format conversion

1.3 Property Graph Features

Graphs manage networks of linked data as vertices, edges, and properties of the vertices and edges. Graphs are commonly used to model, store, and analyze relationships found in social networks, cyber security, utilities and telecommunications, life sciences and clinical data, and knowledge networks.

Typical graph analyses encompass graph traversal, recommendations, finding communities and influencers, and pattern matching. Industries including telecommunications, life sciences and healthcare, security, media, and publishing can benefit from graphs. These use cases are supported by the property graph features of Oracle Big Data Spatial and Graph.

Property graph features on Big Data platforms are enabled by the Oracle Graph HDFS Connector, which is part of Oracle Graph Server and Client. The relevant features of Oracle Graph Server and Client are supported by accessing data in Apache HDFS through this connector. See Oracle Database Graph Developer's Guide for Property Graph for more information.

1.4 Installing Oracle Big Data Spatial and Graph on an Oracle Big Data Appliance

The Mammoth command-line utility for installing and configuring the Oracle Big Data Appliance software also installs the Oracle Big Data Spatial and Graph option, including the spatial and property graph capabilities.

You can enable this option during an initial software installation, or afterward using the bdacli utility.

To use Oracle NoSQL Database as a graph repository, you must have an Oracle NoSQL Database cluster.

To use Apache HBase as a graph repository, you must have an Apache Hadoop cluster.

See Also:

Oracle Big Data Appliance Owner's Guide for software configuration instructions.

1.5 Installing and Configuring the Big Data Spatial Image Processing Framework

Installing and configuring the Image Processing Framework depends on the distribution being used.

For both distributions, you must first get and compile the Cartographic Projections Library, as described in the following topic.

1.5.1 Getting and Compiling the Cartographic Projections Library

Before installing the Image Processing Framework, you must download the Cartographic Projections Library and perform several related operations.

  1. Download the PROJ.4 source code and datum shifting files:

    $ wget http://download.osgeo.org/proj/proj-4.9.1.tar.gz
    $ wget http://download.osgeo.org/proj/proj-datumgrid-1.5.tar.gz
    
  2. Untar the source code, and extract the datum shifting files in the nad subdirectory:

    $ tar xzf proj-4.9.1.tar.gz
    $ cd proj-4.9.1/nad
    $ tar xzf ../../proj-datumgrid-1.5.tar.gz
    $ cd ..
    
  3. Configure, make, and install PROJ.4:

    $ ./configure
    $ make
    $ sudo make install
    $ cd ..
    

    libproj.so is now available at /usr/local/lib/libproj.so.

  4. Copy the libproj.so file into the spatial installation directory:

    cp /usr/local/lib/libproj.so /opt/oracle/oracle-spatial-graph/spatial/raster/gdal/lib/libproj.so
  5. Provide read and execute permissions for the libproj.so library for all users:

    sudo chmod 755 /opt/oracle/oracle-spatial-graph/spatial/raster/gdal/lib/libproj.so

1.5.2 Installing the Image Processing Framework for Oracle Big Data Appliance Distribution

The Oracle Big Data Appliance distribution comes with a pre-installed configuration, though you must ensure that the image processing framework has been installed.

Be sure that the actions described in Getting and Compiling the Cartographic Projections Library have been performed, so that libproj.so (PROJ.4) is accessible to all users and is set up correctly.

For Oracle Big Data Appliance (OBDA), ensure that the following directories exist (a brief setup sketch follows this list):

  • SHARED_DIR (shared directory for all nodes in the cluster): /opt/shareddir

  • ALL_ACCESS_DIR (shared directory for all nodes in the cluster with Write access to the hadoop group): /opt/shareddir/spatial
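If these directories do not already exist, the following is a minimal setup sketch. The paths and the hadoop group are assumptions based on the defaults listed above; adjust them for your cluster. The final command simply confirms the libproj.so permissions set in Getting and Compiling the Cartographic Projections Library.

# Create the shared directories and grant write access to the hadoop group (assumed defaults)
sudo mkdir -p /opt/shareddir/spatial
sudo chgrp -R hadoop /opt/shareddir/spatial
sudo chmod -R 775 /opt/shareddir/spatial

# Confirm that libproj.so is readable and executable by all users
ls -l /opt/oracle/oracle-spatial-graph/spatial/raster/gdal/lib/libproj.so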

1.5.3 Installing the Image Processing Framework for Other Distributions (Not Oracle Big Data Appliance)

For Big Data Spatial and Graph in environments other than the Big Data Appliance, follow the instructions in this section.

1.5.3.1 Prerequisites for Installing the Image Processing Framework for Other Distributions
  • Ensure that HADOOP_LIB_PATH is under /usr/lib/hadoop. If it is not there, find the path and use it as your HADOOP_LIB_PATH.

  • Install NFS.

  • Have at least one folder, referred to in this document as SHARED_FOLDER, on the Resource Manager node that is accessible to every Node Manager node through NFS.

  • Provide write access to this SHARED_FOLDER for all users involved in job execution, as well as for the yarn user.

  • Download oracle-spatial-graph-<version>.x86_64.rpm from the Oracle e-delivery web site.

  • Execute oracle-spatial-graph-<version>.x86_64.rpm using the rpm command (an example invocation follows this list).

  • After rpm executes, verify that the directory structure created at /opt/oracle/oracle-spatial-graph/spatial/raster contains these folders: console, examples, jlib, gdal, and tests. Additionally, index.html describes the content, and javadoc.zip contains the Javadoc for the API.
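For example, a typical installation and verification might look like the following sketch; substitute the actual version string of the file you downloaded.

# Install the package and list the resulting raster directory
sudo rpm -ivh oracle-spatial-graph-<version>.x86_64.rpm
ls /opt/oracle/oracle-spatial-graph/spatial/raster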

1.5.3.2 Installing the Image Processing Framework for Other Distributions
  1. Make the libproj.so (Proj.4) Cartographic Projections Library accessible to the users, as explained in Getting and Compiling the Cartographic Projections Library.
  2. In the Resource Manager Node, copy the data folder under /opt/oracle/oracle-spatial-graph/spatial/raster/gdal into the SHARED_FOLDER as follows:

    cp -R /opt/oracle/oracle-spatial-graph/spatial/raster/gdal/data SHARED_FOLDER

  3. Create a directory ALL_ACCESS_FOLDER under SHARED_FOLDER with write access for all users involved in job execution. Also include the yarn user in the write access, because job results are written by this user. Group access may be used to configure this.

    Go to the shared folder.

    cd SHARED_FOLDER

    Create a new directory.

    mkdir ALL_ACCESS_FOLDER

    Provide write access.

    chmod 777 ALL_ACCESS_FOLDER

  4. Copy the data folder under /opt/oracle/oracle-spatial-graph/spatial/raster/examples into ALL_ACCESS_FOLDER.

    cp -R /opt/oracle/oracle-spatial-graph/spatial/raster/examples/data ALL_ACCESS_FOLDER

  5. Provide write access to the data/xmls folder as follows (or just ensure that users executing the jobs, including tests and examples, have write access):

    chmod 777 ALL_ACCESS_FOLDER/data/xmls/

1.5.4 Post-installation Verification of the Image Processing Framework

Several test scripts are provided to perform the following verification operations.

  • Test the image loading functionality

  • Test the image processing functionality

  • Test a processing class for slope calculation in a DEM and a map algebra operation

  • Verify the image processing of a single raster with no mosaic process (it includes a user-provided function that calculates hill shade in the mapping phase).

  • Test processing of two rasters using a mask operation

Execute these scripts to verify a successful installation of the image processing framework.

If the cluster has security enabled, make sure the current user is in the princs list and has an active Kerberos ticket.

Make sure the user has write access to ALL_ACCESS_FOLDER and belongs to the owner group of this directory. For Big Data Appliance, it is recommended that jobs be executed on the Resource Manager node. If jobs are executed on a different node, then the default owner group is the hadoop group.
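The following sanity checks can be run before launching the test scripts. The directory shown is the Oracle Big Data Appliance default ALL_ACCESS_DIR and is only an assumption; substitute your own ALL_ACCESS_FOLDER.

# Confirm group membership, directory permissions, and (on secured clusters) an active Kerberos ticket
id
ls -ld /opt/shareddir/spatial
klist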

For GDAL to work properly, the libraries must be available using $LD_LIBRARY_PATH. Make sure that the shared libraries path is set properly in your shell window before executing a job. For example:

export LD_LIBRARY_PATH=$ALLACCESSDIR/gdal/native
1.5.4.1 Image Loading Test Script

This script loads a set of six test rasters into the ohiftest folder in HDFS: 3 rasters of byte data type with 3 bands, 1 raster (DEM) of float32 data type with 1 band, and 2 rasters of int32 data type with 1 band. No parameters are required for OBDA environments, and a single parameter with the ALL_ACCESS_FOLDER value is required for non-OBDA environments.

Internally, the job creates a split for every raster to load. Split size depends on the block size configuration; for example, if a block size of at least 64 MB is configured, 4 mappers will run. As a result, the rasters are loaded into HDFS and a corresponding thumbnail is created for visualization. An external image editor is required to visualize the thumbnails, and an output path for these thumbnails is provided to the users upon successful completion of the job.
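After the job completes, you can also list the generated OHIF files directly. This is a hedged check: the exact location is printed by the script, and the relative ohiftest path resolves under the executing user's HDFS home directory.

# List the loaded OHIF rasters in HDFS
hadoop fs -ls ohiftest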

The test script can be found here:

/opt/oracle/oracle-spatial-graph/spatial/raster/tests/runimageloader.sh

For OBDA environments, enter:

./runimageloader.sh

For non-OBDA environments, enter:

./runimageloader.sh ALL_ACCESS_FOLDER

Upon successful execution, the message GENERATED OHIF FILES ARE LOCATED IN HDFS UNDER is displayed, with the path in HDFS where the files are located (this path depends on the definition of ALL_ACCESS_FOLDER) and a list of the created images and thumbnails on HDFS. The output may include:

THUMBNAILS CREATED ARE:
----------------------------------------------------------------------
total 13532
drwxr-xr-x 2 yarn yarn 4096 Sep 9 13:54 .
drwxr-xr-x 3 yarn yarn 4096 Aug 27 11:29 ..
-rw-r--r-- 1 yarn yarn 3214053 Sep 9 13:54 hawaii.tif.ohif.tif
-rw-r--r-- 1 yarn yarn 3214053 Sep 9 13:54 inputimageint32.tif.ohif.tif
-rw-r--r-- 1 yarn yarn 3214053 Sep 9 13:54 inputimageint32_1.tif.ohif.tif
-rw-r--r-- 1 yarn yarn 3214053 Sep 9 13:54 kahoolawe.tif.ohif.tif
-rw-r--r-- 1 yarn yarn 3214053 Sep 9 13:54 maui.tif.ohif.tif
-rw-r--r-- 1 yarn yarn 4182040 Sep 9 13:54 NapaDEM.tif.ohif.tif
YOU MAY VISUALIZE THUMBNAILS OF THE UPLOADED IMAGES FOR REVIEW FROM THE FOLLOWING PATH:

If the installation and configuration were not successful, then the output is not generated and a message like the following is displayed:

NOT ALL THE IMAGES WERE UPLOADED CORRECTLY, CHECK FOR HADOOP LOGS

The amount of memory required to execute mappers and reducers depends on the configured HDFS block size. By default, 1 GB of memory is assigned for Java, but you can modify that and other properties in the imagejob.prop file that is included in this test directory.

1.5.4.2 Image Processor Test Script (Mosaicking)

This script executes the processor job by setting three source rasters of the Hawaii islands and some coordinates that include all three. The job will create a mosaic based on these coordinates, and the resulting raster should include the three rasters combined into a single one.

runimageloader.sh should be executed as a prerequisite, so that the source rasters exist in HDFS. These are 3-band rasters of byte data type.

No parameters are required for OBDA environments, and a single parameter "-s" with the ALL_ACCESS_FOLDER value is required for non-OBDA environments.

Additionally, if the output should be stored in HDFS, the "-o" parameter must be used to set the HDFS folder where the mosaic output will be stored.

Internally, the job filters the tiles using the coordinates specified in the configuration input XML; only the required tiles are processed in a mapper, and finally, in the reduce phase, all of them are put together into the resulting mosaic raster.

The test script can be found here:

/opt/oracle/oracle-spatial-graph/spatial/raster/tests/runimageprocessor.sh

For OBDA environments, enter:

./runimageprocessor.sh

For non-OBDA environments, enter:

./runimageprocessor.sh -s ALL_ACCESS_FOLDER

Upon successful execution, the message EXPECTED OUTPUT FILE IS: ALL_ACCESS_FOLDER/processtest/hawaiimosaic.tif is displayed, with the path to the output mosaic file. The output may include:

EXPECTED OUTPUT FILE IS: ALL_ACCESS_FOLDER/processtest/hawaiimosaic.tif
total 9452
drwxrwxrwx 2 hdfs    hdfs    4096 Sep 10 09:12 .
drwxrwxrwx 9 zherena dba     4096 Sep  9 13:50 ..
-rwxrwxrwx 1 yarn    yarn 4741101 Sep 10 09:12 hawaiimosaic.tif

MOSAIC IMAGE GENERATED
----------------------------------------------------------------------
YOU MAY VISUALIZE THE MOSAIC OUTPUT IMAGE FOR REVIEW IN THE FOLLOWING PATH: ALL_ACCESS_FOLDER/processtest/hawaiimosaic.tif

If the installation and configuration were not successful, then the output is not generated and a message like the following is displayed:

MOSAIC WAS NOT SUCCESSFULLY CREATED, CHECK HADOOP LOGS TO REVIEW THE PROBLEM

To test the output storage in HDFS, use the following command.

For OBDA environments, enter:

./runimageprocessor.sh -o hdfstest

For non-OBDA environments, enter:

./runimageprocessor.sh -s ALL_ACCESS_FOLDER -o hdfstest
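To confirm that the mosaic was written to HDFS, you can list the target folder. As with the previous check, the relative hdfstest path resolves under the executing user's HDFS home directory.

# List the mosaic output stored in HDFS
hadoop fs -ls hdfstest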
1.5.4.3 Single-Image Processor Test Script

This script executes the processor job for a single raster, in this case a DEM source raster of North Napa Valley. The purpose of this job is to process the complete input by using the user processing classes configured for the mapping phase. This class calculates the hillshade of the DEM, which is written to the output file. No mosaic operation is performed here.

runimageloader.sh should be executed as a prerequisite, so that the source raster exists in HDFS. This is a 1-band DEM raster of float32 data type.

No parameters are required for OBDA environments, and a single parameter "-s" with the ALL_ACCESS_FOLDER value is required for non-OBDA environments.

The test script can be found here:

/opt/oracle/oracle-spatial-graph/spatial/raster/tests/runsingleimageprocessor.sh

For OBDA environments, enter:

./runsingleimageprocessor.sh

For non-OBDA environments, enter:

./runsingleimageprocessor.sh -s ALL_ACCESS_FOLDER

Upon successful execution, the message EXPECTED OUTPUT FILE: ALL_ACCESS_FOLDER/processtest/NapaDEM.tif is displayed, with the path to the output DEM file. The output may include:

EXPECTED OUTPUT FILE: ALL_ACCESS_FOLDER/processtest/NapaDEM.tif
total 4808
drwxrwxrwx 2 hdfs    hdfs    4096 Sep 10 09:42 .
drwxrwxrwx 9 zherena dba     4096 Sep  9 13:50 ..
-rwxrwxrwx 1 yarn    yarn 4901232 Sep 10 09:42 NapaDEM.tif
IMAGE GENERATED
----------------------------------------------------------------------

YOU MAY VISUALIZE THE OUTPUT IMAGE FOR REVIEW IN THE FOLLOWING PATH: ALL_ACCESS_FOLDER/processtest/NapaDEM.tif

If the installation and configuration were not successful, then the output is not generated and a message like the following is displayed:

IMAGE WAS NOT SUCCESSFULLY CREATED, CHECK HADOOP LOGS TO REVIEW THE PROBLEM
1.5.4.4 Image Processor DEM Test Script

This script executes the processor job by setting a DEM source raster of North Napa Valley and some coordinates that surround it. The job will create a mosaic based on these coordinates and will also calculate the slope on it by setting a processing class in the mosaic configuration XML.

runimageloader.sh should be executed as a prerequisite, so that the source raster exists in HDFS. This is a 1-band DEM raster of float32 data type.

No parameters are required for OBDA environments, and a single parameter "-s" with the ALL_ACCESS_FOLDER value is required for non-OBDA environments.

The test script can be found here:

/opt/oracle/oracle-spatial-graph/spatial/raster/tests/runimageprocessordem.sh

For OBDA environments, enter:

./runimageprocessordem.sh

For non-OBDA environments, enter:

./runimageprocessordem.sh -s ALL_ACCESS_FOLDER

Upon successful execution, the message EXPECTED OUTPUT FILE: ALL_ACCESS_FOLDER/processtest/NapaSlope.tif is displayed, with the path to the slope output file. The output may include:

EXPECTED OUTPUT FILE: ALL_ACCESS_FOLDER/processtest/NapaSlope.tif
total 4808
drwxrwxrwx 2 hdfs    hdfs    4096 Sep 10 09:42 .
drwxrwxrwx 9 zherena dba     4096 Sep  9 13:50 ..
-rwxrwxrwx 1 yarn    yarn 4901232 Sep 10 09:42 NapaSlope.tif
MOSAIC IMAGE GENERATED
----------------------------------------------------------------------

YOU MAY VISUALIZE THE MOSAIC OUTPUT IMAGE FOR REVIEW IN THE FOLLOWING PATH: ALL_ACCESS_FOLDER/processtest/NapaSlope.tif

If the installation and configuration were not successful, then the output is not generated and a message like the following is displayed:

MOSAIC WAS NOT SUCCESSFULLY CREATED, CHECK HADOOP LOGS TO REVIEW THE PROBLEM

You may also test the "if" algebra function, where every pixel in this raster with a value greater than 2500 will be replaced by the value you set in the command line using the "-c" flag. For example:

For OBDA environments, enter:

./runimageprocessordem.sh -c 8000

For non-OBDA environments, enter:

./runimageprocessordem.sh -s ALL_ACCESS_FOLDER -c 8000

You can visualize the output file and notice the difference between the simple slope calculation and this altered output, where the areas with pixel values greater than 2500 look clearer.

1.5.4.5 Multiple Raster Operation Test Script

This script executes the processor job for two rasters that cover a very small area of North Napa Valley in the US state of California.

These rasters have the same MBR, pixel size, SRID, and data type, all of which are required for complex multiple raster operation processing. The purpose of this job is to process both rasters by using the mask operation, which checks every pixel in the second raster to validate whether its value is contained in the mask list. If it is, the output raster will have the pixel value of the first raster for this output cell; otherwise, the zero (0) value is set. No mosaic operation is performed here.

runimageloader.sh should be executed as a prerequisite, so that the source rasters exist in HDFS. These are 1-band rasters of int32 data type.

No parameters are required for OBDA environments. For non-OBDA environments, a single parameter -s with the ALL_ACCESS_FOLDER value is required.

The test script can be found here:

/opt/oracle/oracle-spatial-graph/spatial/raster/tests/runimageprocessormultiple.sh

For OBDA environments, enter:

./runimageprocessormultiple.sh

For non-OBDA environments, enter:

./runimageprocessormultiple.sh -s ALL_ACCESS_FOLDER

Upon successful execution, the message EXPECTED OUTPUT FILE: ALL_ACCESS_FOLDER/processtest/MaskInt32Rasters.tif is displayed, with the path to the mask output file. The output may include:

EXPECTED OUTPUT FILE: ALL_ACCESS_FOLDER/processtest/MaskInt32Rasters.tif
total 4808
drwxrwxrwx 2 hdfs    hdfs    4096 Sep 10 09:42 .
drwxrwxrwx 9 zherena dba     4096 Sep  9 13:50 ..
-rwxrwxrwx 1 yarn    yarn 4901232 Sep 10 09:42 MaskInt32Rasters.tif
IMAGE GENERATED
----------------------------------------------------------------------

YOU MAY VISUALIZE THE OUTPUT IMAGE FOR REVIEW IN THE FOLLOWING PATH: ALL_ACCESS_FOLDER/processtest/MaskInt32Rasters.tif

If the installation and configuration were not successful, then the output is not generated and a message like the following is displayed:

IMAGE WAS NOT SUCCESSFULLY CREATED, CHECK HADOOP LOGS TO REVIEW THE PROBLEM

1.6 Installing the Oracle Big Data SpatialViewer Web Application

To install the Oracle Big Data SpatialViewer web application (SpatialViewer), follow the instructions in this topic.

1.6.1 Assumptions for SpatialViewer

The following assumptions apply for installing and configuring SpatialViewer.

1.6.2 Installing SpatialViewer on Oracle Big Data Appliance

You can install SpatialViewer on Oracle Big Data Appliance as follows.

  1. Run the following script:

    sudo /opt/oracle/oracle-spatial-graph/spatial/configure-server/install-bdsg-consoles.sh
  2. Start the web application by using one of the following commands (the second command enables you to view logs):

    sudo service bdsg start
    sudo /opt/oracle/oracle-spatial-graph/spatial/web-server/start-server.sh

    If any errors occur, see the README file located in /opt/oracle/oracle-spatial-graph/spatial/configure-server.

  3. Open: http://<oracle_big_data_spatial_vector_console>:8045/spatialviewer/

  4. If the active nodes have changed after the installation or if Kerberos is enabled, then update the configuration file as described in Configuring SpatialViewer on Oracle Big Data Appliance.

  5. Optionally, upload sample data (used with examples in other topics) to HDFS:

    sudo -u hdfs hadoop fs -mkdir /user/oracle/bdsg
    sudo -u hdfs hadoop fs -put /opt/oracle/oracle-spatial-graph/spatial/vector/examples/data/tweets.json /user/oracle/bdsg/
    

1.6.3 Installing SpatialViewer for Other Systems (Not Big Data Appliance)

Follow the steps for manual configuration described in Installing SpatialViewer on Oracle Big Data Appliance.

Then, change the configuration, as described in Configuring SpatialViewer for Other Systems (Not Big Data Appliance).

1.6.4 Configuring SpatialViewer on Oracle Big Data Appliance

To configure SpatialViewer on Oracle Big Data Appliance, follow these steps.

  1. Open the console: http://<oracle_big_data_spatial_vector_console>:8045/spatialviewer/?root=swadmin

  2. Change the general configuration, as needed:

    • Local working directory: SpatialViewer local working directory. Absolute path. The default directory /usr/oracle/spatialviewer is created when installing SpatialViewer.

    • HDFS working directory: SpatialViewer HDFS working directory. The default directory /user/oracle/spatialviewer is created when installing SpatialViewer.

    • Hadoop configuration file: The Hadoop configuration directory. By default: /etc/hadoop/conf

      If you change this value, you must restart the server.

    • Spark configuration file: The Spark configuration directory. By default: /etc/spark/conf

      If you change this value, you must restart the server.

    • eLocation URL: URL used to get the eLocation background maps. By default: http://elocation.oracle.com

    • Kerberos keytab: If Kerberos is enabled, provide the full path to the keytab file.

    • Display logs: If necessary, disable the display of the jobs in the Spatial Jobs screen. Disable this display if the logs are not in the default format. The default format is: Date LogLevel LoggerName: LogMessage

      The Date must have the default format: yyyy-MM-dd HH:mm:ss,SSS. For example: 2012-11-02 14:34:02,781.

      If the logs are not displayed and the Display logs field is set to Yes, then ensure that yarn.log-aggregation-enable in yarn-site.xml is set to true. Also ensure that the Hadoop jobs configuration parameters yarn.nodemanager.remote-app-log-dir and yarn.nodemanager.remote-app-log-dir-suffix are set to the same value as in yarn-site.xml. (A quick way to inspect these settings is sketched after these steps.)

  3. Change the raster configuration, as needed:

    • Shared directory: Directory used to read and write from different nodes, which requires that it be shared and have the broadest permissions, or at least be in the Hadoop user group.

    • Network file system mount point: NFS mountpoint that allows the shared folders to be seen and accessed individually. Can be blank if you are using a non-distributed environment.

    • GDAL directory: Native GDAL installation directory. Must be accessible to all the cluster nodes.

      If you change this value, you must restart the server.

    • Shared GDAL data directory: GDAL shared data folder. Must be a shared directory. (See the instructions in Installing the Image Processing Framework for Other Distributions (Not Oracle Big Data Appliance).)

  4. Change the Hadoop configuration, as needed.

  5. Change the Spark configuration, as needed. The raster processor needs additional configuration details:

    • spark.driver.extraClassPath, spark.executor.extraClassPath: Specify your hive library installation using these keys. Example: /usr/lib/hive/lib/*

    • spark.kryoserializer.buffer.max: Enter the memory for the data serialization. Example: 160m

  6. If Kerberos is enabled, then you may need to add the parameters:

    • spark.yarn.keytab: the full path to the file that contains the keytab for the principal.

    • spark.yarn.principal: the principal to be used to log in to Kerberos. The format of a typical Kerberos V5 principal is primary/instance@REALM.

  7. On Linux systems, you may need to change the secure container executor to LinuxContainerExecutor. For that, set the following parameters:

    • Set yarn.nodemanager.container-executor.class to org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.

    • Set yarn.nodemanager.linux-container-executor.group to hadoop.

  8. Ensure that the user can read the keytab file.
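If the Spatial Jobs screen does not display logs (see the Display logs setting in step 2), the following is one way to inspect the relevant YARN settings on a node. This assumes the default Hadoop configuration directory; on managed clusters these values are usually maintained by the cluster manager rather than edited by hand.

# Check log aggregation and the remote application log directory settings
grep -A1 "yarn.log-aggregation-enable" /etc/hadoop/conf/yarn-site.xml
grep -A1 "yarn.nodemanager.remote-app-log-dir" /etc/hadoop/conf/yarn-site.xml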

1.6.5 Configuring SpatialViewer for Other Systems (Not Big Data Appliance)

Before installing the SpatialViewer on other systems, you must install the image processing framework as specified in Installing the Image Processing Framework for Other Distributions (Not Oracle Big Data Appliance).

Then follow the steps mentioned in Configuring SpatialViewer on Oracle Big Data Appliance.

Additionally, change the Hadoop and Spark configuration, replacing the Hadoop conf. directory and Spark conf. directory values according to your Hadoop and Spark installations.

1.7 Installing Big Data Spatial and Graph in Non-BDA Environments

Some actions may be required if you install Big Data Spatial and Graph in an environment other than Oracle Big Data Appliance.

Starting with Big Data Spatial and Graph (BDSG) 2.5.3, third-party libraries provided by Cloudera required for interaction with Cloudera CDH are no longer distributed with the BDSG distribution. This topic describes the actions that may be needed to enable Cloudera CDH support with BDSG.

On Oracle Big Data Appliance (BDA), BDSG is preconfigured to work with Cloudera CDH "out of the box," as in previous BDSG releases, so no additional installation steps are required for a BDA environment.

1.7.1 Automatic Installation of BDSG

After installing the .rpm, you can attempt an automatic installation by running the following script as root:

/opt/oracle/oracle-spatial-graph/property_graph/configure-hadoop.sh

This script makes many assumptions about your Hadoop distribution and version. If any command in the script fails, perform a manual installation.

1.7.2 Manual Installation of BDSG

To perform a manual installation, use the subtopic relevant to your environment.

HDFS

Go into the BDSG property graph installation directory:

cd /opt/oracle/oracle-spatial-graph/property_graph

Set HADOOP_HOME to point to your Hadoop installation base path. For example:

HADOOP_HOME=/scratch/cloudera/parcels/CDH-6.0.1-1.cdh6.0.1.p0.590678

Copy the required HDFS libraries (and their dependencies) into the hadoop/hdfs directory. (The exact location and version of the above JAR files may vary depending on the Hadoop distribution and version, so you might have to change some of these input paths to match your cluster installation.)

cp $HADOOP_HOME/lib/hadoop/hadoop-auth-3.0.0-cdh6.0.1.jar hadoop/hdfs/
cp $HADOOP_HOME/lib/hadoop/hadoop-common-3.0.0-cdh6.0.1.jar hadoop/hdfs/
cp $HADOOP_HOME/lib/hadoop-hdfs/hadoop-hdfs-3.0.0-cdh6.0.1.jar hadoop/hdfs/
cp $HADOOP_HOME/lib/hadoop-hdfs/hadoop-hdfs-client-3.0.0-cdh6.0.1.jar hadoop/hdfs/
cp $HADOOP_HOME/lib/hadoop-hdfs/lib/commons-cli-1.2.jar hadoop/hdfs/
cp $HADOOP_HOME/lib/hadoop-hdfs/lib/commons-collections-3.2.2.jar hadoop/hdfs/
cp $HADOOP_HOME/lib/hadoop-hdfs/lib/commons-lang-2.6.jar hadoop/hdfs/
cp $HADOOP_HOME/lib/hadoop-hdfs/lib/commons-logging-1.1.3.jar hadoop/hdfs/
cp $HADOOP_HOME/lib/hadoop-hdfs/lib/stax2-api-3.1.4.jar hadoop/hdfs/
cp $HADOOP_HOME/lib/hadoop-hdfs/lib/woodstox-core-5.0.3.jar hadoop/hdfs/
cp $HADOOP_HOME/lib/hadoop-hdfs/lib/htrace-core4-4.1.0-incubating.jar hadoop/hdfs/
cp $HADOOP_HOME/lib/hadoop-hdfs/lib/protobuf-java-2.5.0.jar hadoop/hdfs/

To enable the PGX server to access HDFS, you also need to copy the libraries into the .war file:

mkdir -p WEB-INF/lib
cp /opt/oracle/oracle-spatial-graph/property_graph/hadoop/hdfs/* WEB-INF/lib/
jar -uvf /opt/oracle/oracle-spatial-graph/property_graph/pgx/webapp/pgx-webapp-<version>.war WEB-INF/lib/
rm -r WEB-INF

Then start the server by either running the ./pgx/bin/start-server script or by deploying the WAR file into an application server.
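For example, to use the bundled start script (assuming you are still in the property graph installation directory):

cd /opt/oracle/oracle-spatial-graph/property_graph
./pgx/bin/start-server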

Yarn

Go into the BDSG property graph installation directory:

cd /opt/oracle/oracle-spatial-graph/property_graph

Locate the path to the Zookeeper JAR file of your Hadoop distribution, for example $HADOOP_HOME/lib/zookeeper/zookeeper-3.4.5-cdh6.0.1.jar. Then run the following script to configure your BDSG installation to work with Yarn:

TMP_DIR=$(mktemp -d)
cd "${TMP_DIR}"
jar xf "${HADOOP_HOME}/lib/zookeeper/zookeeper-3.4.5-cdh6.0.1.jar"
rm META-INF/MANIFEST.MF
jar -uf /opt/oracle/oracle-spatial-graph/property_graph/hadoop/yarn/pgx-yarn-<version>.jar .
rm -rf "${TMP_DIR}"

HBase

Go into the BDSG property graph installation directory:

cd /opt/oracle/oracle-spatial-graph/property_graph

Create a hadoop/hbase directory to hold all the HBase libraries (and their dependencies) required for execution:

mkdir -p hadoop/hbase

Set HADOOP_HOME to point to your Hadoop installation base path. For example:

HADOOP_HOME=/scratch/cloudera/parcels/CDH-6.0.1-1.cdh6.0.1.p0.590678

Copy the required HBase libraries (and their dependencies) into the hadoop/hbase directory. (The exact location and version of the above JAR files may vary depending on the Hadoop distribution and version, so you might have to change some of these input paths to match your cluster installation.)

cp $HADOOP_HOME/lib/hbase/hbase-client-2.0.0-cdh6.0.1.jar hadoop/hbase
cp $HADOOP_HOME/lib/hbase/hbase-common-2.0.0-cdh6.0.1.jar hadoop/hbase
cp $HADOOP_HOME/lib/hbase/hbase-protocol-2.0.0-cdh6.0.1.jar hadoop/hbase
cp $HADOOP_HOME/lib/hbase/hbase-shaded-protobuf-2.1.0.jar hadoop/hbase
cp $HADOOP_HOME/lib/hbase/lib/shaded-clients/hbase-shaded-client-2.0.0-cdh6.0.1.jar hadoop/hbase
cp $HADOOP_HOME/lib/hadoop/hadoop-common-3.0.0-cdh6.0.1.jar hadoop/hbase
cp $HADOOP_HOME/lib/hadoop-hdfs/hadoop-hdfs-3.0.0-cdh6.0.1.jar hadoop/hbase
cp $HADOOP_HOME/lib/zookeeper/zookeeper-3.4.5-cdh6.0.1.jar hadoop/hbase
cp $HADOOP_HOME/lib/hbase/lib/protobuf-java-2.5.0.jar hadoop/hbase
cp $HADOOP_HOME/lib/hbase/lib/metrics-core-3.2.1.jar hadoop/hbase
cp $HADOOP_HOME/lib/hbase/lib/jettison-1.3.8.jar hadoop/hbase
cp $HADOOP_HOME/lib/hbase/lib/stax2-api-3.1.4.jar hadoop/hbase
cp $HADOOP_HOME/lib/hbase/lib/woodstox-core-5.0.3.jar hadoop/hbase
cp $HADOOP_HOME/lib/hbase/lib/client-facing-thirdparty/htrace-core4-4.2.0-incubating.jar hadoop/hbase
cp $HADOOP_HOME/lib/hbase/lib/client-facing-thirdparty/audience-annotations-0.5.0.jar hadoop/hbase

To enable the PGX server to access HBase, you also need to copy the HBase libraries into the .war file:

mkdir -p WEB-INF/lib
cp /opt/oracle/oracle-spatial-graph/property_graph/hadoop/hbase/* WEB-INF/lib/
jar -uvf /opt/oracle/oracle-spatial-graph/property_graph/pgx/webapp/pgx-webapp-<version>.war WEB-INF/lib/
rm -r WEB-INF

Then start the server by either running the ./pgx/bin/start-server script or by deploying the WAR file into an application server.

1.7.3 Configuring the BDSG Environment

To configure the environment, use the subtopic relevant to your environment.

HDFS

Set the HADOOP_CONF_DIR environment variable to point to the HDFS configuration directory of your cluster. For example:

export HADOOP_CONF_DIR=/etc/hadoop/conf

Set the BDSG_CLASSPATH environment variable to point to the libraries of the previous step before starting the shell. For example:

export BDSG_CLASSPATH=/opt/oracle/oracle-spatial-graph/property_graph/hadoop/hdfs/*

Then start the shell as usual and access data from HDFS using the hdfs path prefix:

cd /opt/oracle/oracle-spatial-graph/property_graph
./pgx/bin/pgx
[WARNING] BDSG_CLASSPATH environment will be prepended to PGX classpath. If this is not intended, do 'unset BDSG_CLASSPATH' and restart.
 
PGX Shell 3.1.3
type :help for available commands
12:01:30,824 INFO Ctrl$1 - >>> PGX engine 3.1.3 running.
variables instance, session and analyst ready to use
pgx> g = session.readGraphWithProperties('hdfs:/tmp/data/connections.edge_list.json')
==> PgxGraph[name=connections,N=78,E=164,created=1543176112779]

Yarn

Copy the required artifacts for Yarn deployments into an HDFS directory of your choice by running the following helper script:

./pgx/scripts/install-pgx-hdfs.sh <dest-dir>

where <dest-dir> could be hdfs:/binaries/pgx, for example.
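For example, using that destination directory (a sketch; run it from the property graph installation directory, and verify the upload afterward):

# Upload the PGX artifacts to HDFS and confirm they are there
cd /opt/oracle/oracle-spatial-graph/property_graph
./pgx/scripts/install-pgx-hdfs.sh hdfs:/binaries/pgx
hdfs dfs -ls /binaries/pgx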

Make sure the hdfs binary is on your PATH environment variable.

After the script finishes, make sure to update pgx/conf/yarn.conf to contain the paths to the installed binaries and the correct Zookeeper connection string of your cluster.

HBase

To access a property graph in Apache HBase using the data access layer (DAL) in a Java application:

  1. Set the BDSG_HOME environment variable to the property graph installation directory. For example:
    export BDSG_HOME=/opt/oracle/oracle-spatial-graph/property_graph
  2. Set the BDSG_CLASSPATH environment variable to the hadoop/hbase directory. For example:
    export BDSG_CLASSPATH=/opt/oracle/oracle-spatial-graph/property_graph/hadoop/hbase/*:$BDSG_CLASSPATH
  3. Compile the Java code. For example:
    javac -classpath $BDSG_HOME/lib/'*':$BDSG_CLASSPATH filename.java
  4. Run the Java application by executing the compiled code, as follows:
    java -classpath ./:$BDSG_HOME/lib/'*':$BDSG_CLASSPATH filename args

To access a property graph in Apache HBase using the DAL in a Groovy console:

  1. Set the BDSG_CLASSPATH environment variable to the hadoop/hbase directory. For example:
    export BDSG_CLASSPATH=/opt/oracle/oracle-spatial-graph/property_graph/hadoop/hbase/*:$BDSG_CLASSPATH
  2. Start the shell as usual and access data from an Apache HBase storage using an OraclePropertyGraph instance.

    Note that from Apache HBase 2.0, the HConnection interface has been deprecated, so use a Connection object to connect to the database.

    cd /opt/oracle/oracle-spatial-graph/property_graph/dal/groovy
    sh gremlin-opg-hbase.sh
     
    --------------------------------
     
    opg-hbase> conf = HBaseConfiguration.create();
    ==>hbase.rs.cacheblocksonwrite=false
    ==>...
    opg-hbase> conf.set("hbase.zookeeper.quorum", "localhost");
    ==>null
    opg-hbase> conf.set("hbase.zookeeper.property.clientPort","2181");
    ==>null
    opg-hbase> conn = ConnectionFactory.createConnection(conf);
    ==>hconnection-0x720653c2
    opg-hbase> opg=OraclePropertyGraph.getInstance(conf, conn, "connections");
    ==>oraclepropertygraph with name connections
    

1.7.4 Managing BDSG Text Indexing Using Apache Lucene 7.0

To manage text indexing over property graph data using Apache Lucene:

  1. Go into the BDSG property graph installation directory:
    cd /opt/oracle/oracle-spatial-graph/property_graph
  2. Create a lucene directory to hold all the Apache Lucene 7.0 libraries (and their dependencies) required for execution:
    mkdir lucene
  3. Set HADOOP_HOME to point to your Hadoop installation base path. For example:
    HADOOP_HOME=/scratch/cloudera/parcels/CDH-6.0.1-1.cdh6.0.1.p0.590678
  4. Copy the required Apache Lucene libraries into the lucene directory:
    cp $HADOOP_HOME/lib/search/lucene-core.jar lucene
    cp $HADOOP_HOME/lib/search/lucene-queryparser.jar lucene
    cp $HADOOP_HOME/lib/search/lucene-analyzers-common.jar lucene
    

Managing Text Indexing in a Java Application

  1. Set BDSG_HOME to the property graph installation directory. For example:
    export BDSG_HOME=/opt/oracle/oracle-spatial-graph/property_graph
  2. Set BDSG_CLASSPATH to the lucene directory. For example:
    export BDSG_CLASSPATH=/opt/oracle/oracle-spatial-graph/property_graph/lucene/*:$BDSG_CLASSPATH
  3. Compile the Java code. For example:
    javac -classpath $BDSG_HOME/lib/'*':$BDSG_CLASSPATH filename.java
  4. Run the Java application by executing the compiled code. For example:
    java -classpath ./:$BDSG_HOME/lib/'*':$BDSG_CLASSPATH filename args

Managing Text Indexing Using a Groovy Console

  1. Set BDSG_CLASSPATH to the lucene directory. For example:
    export BDSG_CLASSPATH=/opt/oracle/oracle-spatial-graph/property_graph/lucene/*:$BDSG_CLASSPATH
  2. Start the shell as usual to create a text index over a property graph stored in Apache HBase storage using an OraclePropertyGraph instance. For example:
    cd /opt/oracle/oracle-spatial-graph/property_graph/dal/groovy
    sh gremlin-opg-hbase.sh
     
    --------------------------------
     
    opg-hbase> conf = HBaseConfiguration.create();
    ==>hbase.rs.cacheblocksonwrite=false
    ==>...
    opg-hbase> dop=2;
    ==>2
    opg-hbase> conf.set("hbase.zookeeper.quorum", "localhost");
    ==>null
    opg-hbase> conf.set("hbase.zookeeper.property.clientPort","2181");
    ==>null
    opg-hbase> conn = ConnectionFactory.createConnection(conf);
    ==>hconnection-0x720653c2
    opg-hbase> opg=OraclePropertyGraph.getInstance(conf, conn, "connections");
    ==>oraclepropertygraph with name connections
    opg-hbase> indexParams = OracleIndexParameters.buildFS(dop /* number of directories */, dop /* number of connections used when indexing */, 10000 /* batch size before commit*/, 500000 /* commit size before Lucene commit*/, true /* enable datatypes */, "./lucene-index" /* index location */);
    ==>[parameter[search-engine,1], parameter[num-subdirectories,4], parameter[directory-type,FS_DIRECTORY], parameter[reindex-numConns,4], parameter[batch-size,10000], parameter[commit-batch-size,500000], parameter[values-as-strings,true], parameter[directory-location,[Ljava.lang.String;@5c1f6d57]]
    opg-hbase> opg.setDefaultIndexParameters(indexParams);
    ==>null
    opg-hbase> indexedKeys = new String[4]; indexedKeys[0] = "name"; indexedKeys[1] = "role"; indexedKeys[2] = "religion"; indexedKeys[3] = "country";
    ==>name
    ==>role
    ==>religion
    ==>country
    opg-hbase> opg.createKeyIndex(indexedKeys, Vertex.class);
    ==>null
    

1.7.5 Managing BDSG Text Indexing Using SolrCloud 7.0

To manage text indexing over property graph data using SolrCloud:

  1. Go into the BDSG property graph installation directory:
    cd /opt/oracle/oracle-spatial-graph/property_graph
  2. Create a solrcloud directory to hold all the SolrCloud 7.0 libraries (and their dependencies) required for execution:
    mkdir solrcloud
  3. Set HADOOP_HOME to point to your Hadoop installation base path. For example:
    HADOOP_HOME=/scratch/cloudera/parcels/CDH-6.0.1-1.cdh6.0.1.p0.590678
  4. Copy the required SolrCloud libraries into the solrcloud directory:
    cp $HADOOP_HOME/lib/solr/solr-solrj-7.0.0-cdh6.0.1.jar solrcloud
     
    cp $HADOOP_HOME/lib/solr/lib/noggit-0.8.jar solrcloud
    cp $HADOOP_HOME/lib/solr/lib/httpmime-4.5.3.jar solrcloud
    cp $HADOOP_HOME/lib/search/lucene-core.jar solrcloud
    cp $HADOOP_HOME/lib/search/lucene-queryparser.jar solrcloud
    cp $HADOOP_HOME/lib/search/lucene-analyzers-common.jar solrcloud
    
  5. Set BDSG_CLASSPATH to the solrcloud directory. For example:
    export BDSG_CLASSPATH=/opt/oracle/oracle-spatial-graph/property_graph/solrcloud/*:$BDSG_CLASSPATH

Managing Text Indexing in a Java Application

  1. Set BDSG_HOME to the property graph installation directory. For example:
    export BDSG_HOME=/opt/oracle/oracle-spatial-graph/property_graph
  2. Set BDSG_CLASSPATH to the solrcloud directory. For example:
    export BDSG_CLASSPATH=/opt/oracle/oracle-spatial-graph/property_graph/solrcloud/*:$BDSG_CLASSPATH
  3. Compile the Java code. For example:
    javac -classpath $BDSG_HOME/lib/'*':$BDSG_CLASSPATH filename.java
  4. Run the Java application by executing the compiled code. For example:
    java -classpath ./:$BDSG_HOME/lib/'*':$BDSG_CLASSPATH filename args

Managing Text Indexing Using a Groovy Console

  1. Set BDSG_CLASSPATH to the solrcloud directory. For example:
    export BDSG_CLASSPATH=/opt/oracle/oracle-spatial-graph/property_graph/solrcloud/*:$BDSG_CLASSPATH
  2. Start the shell as usual to create a text index over a property graph stored in Apache HBase storage using an OraclePropertyGraph instance. For example:
    cd /opt/oracle/oracle-spatial-graph/property_graph/dal/groovy
    sh gremlin-opg-hbase.sh
     
    --------------------------------
     
    opg-hbase> conf = HBaseConfiguration.create();
    ==>hbase.rs.cacheblocksonwrite=false
    ==>...
    opg-hbase> dop=2;
    ==>2
    opg-hbase> conf.set("hbase.zookeeper.quorum", "localhost");
    ==>null
    opg-hbase> conf.set("hbase.zookeeper.property.clientPort","2181");
    ==>null
    opg-hbase> conn = ConnectionFactory.createConnection(conf);
    ==>hconnection-0x720653c2
    opg-hbase> opg=OraclePropertyGraph.getInstance(conf, conn, "connections");
    ==>oraclepropertygraph with name connections
    opg-hbase> indexParams = OracleIndexParameters.buildSolr("opgconfig" /* solr config */, "localhost:2181/solr" /* solr server url */, "localhost:8983_solr" /* solr node set */, 15 /* zookeeper timeout in seconds */, 1 /* total number of shards */, 1 /* Replication factor */, 1 /* maximum number of shards per node */, 4 /* dop used for scan */, 10000 /* batch size before commit */, 500000 /* commit size before SolrCloud commit */, 15 /* write timeout in seconds */);
    ==>[parameter[search-engine,0], parameter[config-name,opgconfig], parameter[solr-server-url,localhost:2181/solr], parameter[solr-admin-url,localhost:8983_solr], parameter[zk-timeout,15], parameter[replication-factor,1], parameter[num-shards,1], parameter[max-shards-per-node,1], parameter[reindex-numConns,4], parameter[batch-size,10000], parameter[commit-batch-size,500000], parameter[write-timeout,15]]
    opg-hbase> 
    opg-hbase> opg.setDefaultIndexParameters(indexParams);
    ==>null
     
     
    opg-hbase> indexedKeys = new String[4]; indexedKeys[0] = "name"; indexedKeys[1] = "role"; indexedKeys[2] = "religion"; indexedKeys[3] = "country";
    ==>name
    ==>role
    ==>religion
    ==>country
    opg-hbase> opg.createKeyIndex(indexedKeys, Vertex.class);
    ==>null
    

1.8 Required Application Code Changes due to Upgrades

Application code changes may be required due to upgrades, such as to more recent versions of Apache HBase and SolrCloud.

1.8.1 Changes Due to Upgrade from Apache HBase 1.x to Apache HBase 2.x

Big Data Spatial and Graph 2.5.3 supports Cloudera CDH6, which upgraded Apache HBase to a newer version.

Creating a Property Graph Instance

Effective with Apache HBase 2.0, the HConnection interface has been deprecated, so the data access layer requires using a Connection object to connect to the database. The following code snippet illustrates how to create an OraclePropertyGraph instance from an Apache HBase 2.0 Connection object.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.client.*;
  
...
  
Configuration conf = HBaseConfiguration.create();
conf.set("hbase.zookeeper.quorum", szQuorum);
conf.set("hbase.zookeeper.property.clientPort","2181");
Connection conn = ConnectionFactory.createConnection(conf);
OraclePropertyGraph opg = OraclePropertyGraph.getInstance(conf, conn, szGraphName);

Parallel Retrieval of Vertices/Edges

The following code snippet opens an array of connections to HBase (using the Connection/ConnectionFactory APIs from Apache HBase 2.x), and executes a parallel query to retrieve all vertices and edges using the opened connections. The number of calls to the getVerticesPartitioned/getEdgesPartitioned method is controlled by the total number of splits and the number of connections used.

int dop = 4;
Configuration conf = HBaseConfiguration.create();
conf.set("hbase.zookeeper.quorum", szQuorum);
conf.set("hbase.zookeeper.property.clientPort","2181");
Connection conn = ConnectionFactory.createConnection(conf);
OraclePropertyGraph opg = OraclePropertyGraph.getInstance(conn, "connections");
 
// Create connections used in parallel query
Connection[] conns= new Connection[dop];
for (int i = 0; i < dop; i++) { 
  Configuration conf_new = HBaseConfiguration.create(opg.getConfiguration());
  conns[i] = ConnectionFactory.createConnection(conf_new); 
}
 
long lCountV = 0;
// Iterate over all the vertices' splits to count all the vertices
for (int split = 0; split < opg.getVertexTableSplits(); split += dop) { 
  Iterable<Vertex>[] iterables = opg.getVerticesPartitioned(conns /* Connection array */, 
                                                            true /* skip store to cache */, 
                                                            split /* starting split */); 
 
 
  for (Iterable<Vertex> iterable : iterables) {
    lCountV += OraclePropertyGraphUtils.size(iterable); /* consume iterables */
  }
}
 
// Count all vertices
System.out.println("Vertices found using parallel query: " + lCountV);
 
long lCountE = 0;
// Iterate over all the edges' splits to count all the edges
for (int split = 0; split < opg.getEdgeTableSplits();  split += dop) { 
  Iterable<Edge>[] iterables = opg.getEdgesPartitioned(conns /* Connection array */, 
                                                       true /* skip store to cache */, 
                                                       split /* starting split */); 
   
  for (Iterable<Edge> iterable : iterables) {
    lCountE += OraclePropertyGraphUtils.size(iterable); /* consume iterables */
  }
}
 
// Count all edges
System.out.println("Edges found using parallel query: " + lCountE);
 
// Close the connections to the database after completed
for (int idx = 0; idx < conns.length; idx++) { 
  conns[idx].close();
}

Dropping an Existing Graph

For Apache HBase 2.x, the OraclePropertyGraphUtils.dropPropertyGraph method uses the Hadoop nodes and the Apache HBase port number for the connection. The following code fragment deletes a graph named my_graph from Apache HBase 2.x.

int dop = 4;
Configuration conf = HBaseConfiguration.create();
conf.set("hbase.zookeeper.quorum", szQuorum);
conf.set("hbase.zookeeper.property.clientPort","2181");
Connection conn = ConnectionFactory.createConnection(conf);
OraclePropertyGraphUtils.dropPropertyGraph(conn, "my_graph");

1.8.2 Changes Due to Upgrade from SolrCloud 4.10.3 to SolrCloud 7.0.0

The upgrade from SolrCloud 4.10.3 to SolrCloud 7.0.0 may require some application code changes.

Parallel Query on Text Indexes for Property Graph Data

With SolrCloud 7.0, the CloudSolrServer interface has been deprecated, so the data access layer requires using a CloudSolrClient object to connect to the SolrCloud text search engine. To execute parallel queries over a SolrCloud-based text index, you must specify a set of CloudSolrClient instances. To create a CloudSolrClient instance, you can rely on the SolrIndexUtils.getCloudSolrClient API, because the operation SolrIndexUtils.getCloudSolrServer is now deprecated.

The following code snippet generates an automatic text index using the SolrCloud Search engine and executes a parallel text query. The number of calls to the getPartitioned method in the SolrIndex class is controlled by the total number of shards in the index and the number of connections used.

OraclePropertyGraph opg = OraclePropertyGraph.getInstance(args, szGraphName);
 
 
String configName = "opgconfig";
String solrServerUrl = args[4];//"localhost:2181/solr"
String solrNodeSet = args[5]; //"localhost:8983_solr";
  
int zkTimeout = 15; // zookeeper timeout in seconds
int numShards = Integer.parseInt(args[6]); // number of shards in the index
int replicationFactor = 1; // replication factor
int maxShardsPerNode = 1; // maximum number of shards per node
  
// Create an automatic index using SolrCloud
OracleIndexParameters indexParams =
    OracleIndexParameters.buildSolr(configName,
                                    solrServerUrl,
                                    solrNodeSet,
                                    zkTimeout /* zookeeper timeout in seconds */,
                                    numShards /* total number of shards */,
                                    replicationFactor /* replication factor */,
                                    maxShardsPerNode /* maximum number of shards per node */,
                                    4 /* dop used for scan */,
                                    10000 /* batch size before commit */,
                                    500000 /* commit size before SolrCloud commit */,
                                    15 /* write timeout in seconds */);
 
opg.setDefaultIndexParameters(indexParams);
 
// Create auto indexing on name property for all vertices
System.out.println("Create automatic index on name for vertices");
opg.createKeyIndex("name", Vertex.class);
 
// Get the SolrIndex object 
SolrIndex<Vertex> index = (SolrIndex<Vertex>) opg.getAutoIndex(Vertex.class);
 
// Open an array of connections to handle connections to SolrCloud needed for parallel text search
CloudSolrClient[] conns = new CloudSolrClient[dop];
 
 
for (int idx = 0; idx < conns.length; idx++) {
  conns[idx] = index.getCloudSolrClient(15 /* write timeout in secs*/);
}
 
// Iterate to cover all the shards in the index
long lCount = 0;
for (int split = 0; split < index.getTotalShards(); split += conns.length) {
  // Gets elements from split to split + conns.length
  Iterable<Vertex>[] iterAr = index.getPartitioned(conns /* connections */, "name"/* key */, "*" /* value */, true /* wildcards */, split /* start split ID */);
  for (Iterable<Vertex> iterable : iterAr) {
    lCount += OraclePropertyGraphUtils.size(iterable); /* consume iterables */
  }
}
 
// Close the connections to SolrCloud after completed
for (int idx = 0; idx < conns.length; idx++) { 
  conns[idx].close();
}

Using Native Query Results with SolrCloud

You can use native SolrCloud query results by calling the method get(QueryResponse) in SolrIndex. A QueryResponse object provides a set of Documents matching a text search query over a specific SolrCloud collection. SolrIndex will produce an Iterable object holding all the vertices (or edges) from the documents found in the QueryResponse object.

With SolrCloud 7.0, the CloudSolrServer interface has been deprecated, so the data access layer requires use of a CloudSolrClient object to process native query results over a text index in Oracle Property Graph. The following code fragment generates an automatic text index using the Apache SolrCloud Search engine, creates a SolrQuery object, and executes it against a CloudSolrClient object to get a QueryResponse object. Finally, an Iterable object of vertices is created from the given result object.

OraclePropertyGraph opg = OraclePropertyGraph.getInstance(args, szGraphName);
 
 String configName = "opgconfig";
String solrServerUrl = args[4];//"localhost:2181/solr"
String solrNodeSet = args[5]; //"localhost:8983_solr";
  
int zkTimeout = 15; // zookeeper timeout in seconds
int numShards = Integer.parseInt(args[6]); // number of shards in the index
int replicationFactor = 1; // replication factor
int maxShardsPerNode = 1; // maximum number of shards per node
  
// Create an automatic index using SolrCloud
OracleIndexParameters indexParams =
    OracleIndexParameters.buildSolr(configName,
                                    solrServerUrl,
                                    solrNodeSet,
                                    zkTimeout /* zookeeper timeout in seconds */,
                                    numShards /* total number of shards */,
                                    replicationFactor /* replication factor */,
                                    maxShardsPerNode /* maximum number of shards per node */,
                                    4 /* dop used for scan */,
                                    10000 /* batch size before commit */,
                                    500000 /* commit size before SolrCloud commit */,
                                    15 /* write timeout in seconds */);
 
opg.setDefaultIndexParameters(indexParams);
 
// Create automatic indexing on the name and country properties for all vertices
System.out.println("Create automatic index on name and country for vertices");

String[] indexedKeys = new String[2];
indexedKeys[0] = "name";
indexedKeys[1] = "country";
opg.createKeyIndex(indexedKeys, Vertex.class);
 
 // Get the SolrIndex object
SolrIndex<Vertex> index = (SolrIndex<Vertex>) opg.getAutoIndex(Vertex.class);
 
// Search first for key name with property value Beyo* using only string data types
String szQueryStrBey = index.buildSearchTerm("name", "Beyo*", String.class);
String key = index.appendDatatypesSuffixToKey("country", String.class);
String value = index.appendDatatypesSuffixToValue("United States", String.class);
String szQueryStrCountry = key + ":" + value;
SolrQuery query = new SolrQuery(szQueryStrBey + " AND " + szQueryStrCountry);
 
 CloudSolrClient conn = index.getCloudSolrClient(15 /* write timeout in secs*/);
  
//Query using get operation
QueryResponse qr = conn.query(query, SolrRequest.METHOD.POST);
 
Iterable<Vertex> it = index.get(qr);
long lCount = 0;
// Iterate over the vertices returned for the QueryResponse and count them
for (Vertex v : it) {
  System.out.println(v);
  lCount++;
}
 
System.out.println("Vertices found: "+ lCount);