Installing Oracle Big Data Spatial and Graph on an Oracle Big Data Appliance
Installing and Configuring the Big Data Spatial Image Processing Framework
Installing and Configuring the Big Data Spatial Image Server
Installing the Oracle Big Data Spatial Hadoop Vector Console
Installing Property Graph Support on a CDH Cluster or Other Hardware
Oracle Big Data Spatial and Graph delivers advanced spatial and graph analytic capabilities to supported Apache Hadoop and NoSQL Database Big Data platforms.
The spatial features include support for data enrichment of location information; spatial filtering and categorization based on distance and location-based analysis; spatial data processing for vector and raster processing of digital map, sensor, satellite, and aerial imagery; and APIs for map visualization.
The property graph features support Apache Hadoop HBase and Oracle NoSQL Database for graph operations, indexing, queries, search, and in-memory analytics.
The multimedia analytics features provide a framework for processing video and image data in Apache Hadoop, including built-in face recognition using OpenCV.
Spatial location information is a common element of Big Data. Businesses can use spatial data as the basis for associating and linking disparate data sets. Location information can also be used to track and categorize entities based on proximity to another person, place, or object, or on their presence in a particular area. Location information can facilitate location-specific offers to customers entering a particular geography, a technique known as geo-fencing. Georeferenced imagery and sensor data can be analyzed for a variety of business benefits.
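As an illustration of the geo-fencing idea described above, the following Python sketch (hypothetical, not part of the product APIs) checks whether a customer's location falls within a given radius of a store using the haversine great-circle distance:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometers."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def inside_geofence(point, center, radius_km):
    """True if point lies within radius_km of center; both are (lat, lon) tuples."""
    return haversine_km(point[0], point[1], center[0], center[1]) <= radius_km

# A customer near the store location triggers the offer; a distant one does not.
store = (37.7749, -122.4194)  # hypothetical store coordinates
print(inside_geofence((37.7790, -122.4140), store, 1.0))   # nearby point -> True
print(inside_geofence((34.0522, -118.2437), store, 1.0))   # far away    -> False
```

The store coordinates and radius are invented for the example; a production geo-fence would typically use polygon containment rather than a simple radius.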
The spatial features of Oracle Big Data Spatial and Graph support those use cases with the following kinds of services.
Vector Services:
Ability to associate documents and data with names, such as cities or states, or longitude/latitude information in spatial object definitions for a default administrative hierarchy
Support for text-based 2D and 3D geospatial formats, including GeoJSON files, Shapefiles, GML, and WKT; alternatively, you can use the Geospatial Data Abstraction Library (GDAL) to convert popular geospatial encodings such as Oracle SDO_Geometry, ST_Geometry, and other supported formats
An HTML5-based map client API and a sample console to explore, categorize, and view data in a variety of formats and coordinate systems
Topological and distance operations: Anyinteract, Inside, Contains, Within Distance, Nearest Neighbor, and others
Spatial indexing for fast retrieval of data
Raster Services:
Support for many image file formats supported by GDAL and image files stored in HDFS
A sample console to view the set of images that are available
Raster operations, including subsetting, georeferencing, mosaics, and format conversion
Graphs manage networks of linked data as vertices, edges, and properties of the vertices and edges. Graphs are commonly used to model, store, and analyze relationships found in social networks, cyber security, utilities and telecommunications, life sciences and clinical data, and knowledge networks.
Typical graph analyses encompass graph traversal, recommendations, finding communities and influencers, and pattern matching. Industries including telecommunications, life sciences and healthcare, security, and media and publishing can benefit from graphs.
The property graph features of Oracle Big Data Spatial and Graph support those use cases with the following capabilities:
A scalable graph database on Apache HBase and Oracle NoSQL Database
Developer APIs based on Tinkerpop Blueprints, and Java graph APIs
Text search and query through integration with Apache Lucene and SolrCloud
Scripting language support for Groovy and Python
A parallel, in-memory graph analytics engine
A fast, scalable suite of social network analysis functions, including ranking, centrality, recommendation, community detection, and path finding
Parallel bulk load and export of property graph data in Oracle-defined flat file format
Manageability through a Groovy-based console to execute Java and Tinkerpop Gremlin APIs
The following are recommendations for property graph installation.
Table 1-1 Property Graph Sizing Recommendations
Graph Size | Recommended Physical Memory to be Dedicated | Recommended Number of CPU Processors
---|---|---
10M to 100M edges | Up to 14 GB RAM | 2 to 4 processors, and up to 16 processors for more compute-intensive workloads
100M to 1B edges | 14 GB to 100 GB RAM | 4 to 12 processors, and up to 16 to 32 processors for more compute-intensive workloads
Over 1B edges | Over 100 GB RAM | 12 to 32 processors, or more for especially compute-intensive workloads
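As a rough illustration only, the guidelines in Table 1-1 can be expressed as a small lookup helper. The thresholds and wording come from the table; the function itself is hypothetical and not part of the product:

```python
def property_graph_sizing(edge_count):
    """Return (RAM guideline, CPU guideline) per the sizing table above.
    Thresholds follow Table 1-1; treat them as guidelines, not hard limits."""
    if edge_count <= 100_000_000:
        return ("up to 14 GB RAM",
                "2-4 processors (up to 16 for compute-intensive workloads)")
    elif edge_count <= 1_000_000_000:
        return ("14-100 GB RAM",
                "4-12 processors (up to 16-32 for compute-intensive workloads)")
    else:
        return ("over 100 GB RAM",
                "12-32 processors, or more")

ram, cpus = property_graph_sizing(500_000_000)
print(ram)   # 14-100 GB RAM
```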
The multimedia analytics feature of Oracle Big Data Spatial and Graph provides a framework for processing video and image data in Apache Hadoop. The framework enables distributed processing of video and image data.
A main use case is performing facial recognition in videos and images.
The Mammoth command-line utility for installing and configuring the Oracle Big Data Appliance software also installs the Oracle Big Data Spatial and Graph option, including the spatial, property graph, and multimedia capabilities. You can enable this option during an initial software installation, or afterward using the bdacli utility.
To use Oracle NoSQL Database as a graph repository, you must have an Oracle NoSQL Database cluster.
To use Apache HBase as a graph repository, you must have an Apache Hadoop cluster.
See Also:
Oracle Big Data Appliance Owner's Guide for software configuration instructions.
Installing and configuring the Image Processing Framework depends upon the distribution being used.
The Oracle Big Data Appliance cluster distribution comes with a pre-installed setup, but you must follow a few steps in Installing the Image Processing Framework for Oracle Big Data Appliance Distribution to get it working.
For a commodity distribution, follow the instructions in Installing the Image Processing Framework for Other Distributions (Not Oracle Big Data Appliance).
For both distributions:
You must download and compile PROJ libraries, as explained in Getting and Compiling the Cartographic Projections Library.
After performing the installation, verify it (see Post-installation Verification of the Image Processing Framework).
If the cluster has security enabled, make sure that the user executing the jobs is in the princs list and has an active Kerberos ticket.
Before installing the Image Processing Framework, you must download the Cartographic Projections Library and perform several related operations.
Download the PROJ.4 source code and datum shifting files:
$ wget http://download.osgeo.org/proj/proj-4.9.1.tar.gz
$ wget http://download.osgeo.org/proj/proj-datumgrid-1.5.tar.gz
Untar the source code, and extract the datum shifting files in the nad subdirectory:
$ tar xzf proj-4.9.1.tar.gz
$ cd proj-4.9.1/nad
$ tar xzf ../../proj-datumgrid-1.5.tar.gz
$ cd ..
Configure, make, and install PROJ.4:
$ ./configure
$ make
$ sudo make install
$ cd ..
libproj.so is now available at /usr/local/lib/libproj.so.
Create a link to the libproj.so file in the spatial installation directory:
sudo ln -s /usr/local/lib/libproj.so /opt/oracle/oracle-spatial-graph/spatial/raster/gdal/lib/libproj.so
Provide read and execute permissions for the libproj.so library for all users:
sudo chmod 755 /opt/oracle/oracle-spatial-graph/spatial/raster/gdal/lib/libproj.so
The Oracle Big Data Appliance distribution comes with a pre-installed configuration. However, be sure that the actions described in Getting and Compiling the Cartographic Projections Library have been performed, so that libproj.so (PROJ.4) is accessible to all users and is set up correctly.
For OBDA, ensure that the following directories exist:
SHARED_DIR (shared directory for all nodes in the cluster): /opt/shareddir
ALL_ACCESS_DIR (shared directory for all nodes in the cluster with Write access to the hadoop group): /opt/shareddir/spatial
For Big Data Spatial and Graph in environments other than the Big Data Appliance, follow the instructions in this section.
Ensure that HADOOP_LIB_PATH is under /usr/lib/hadoop. If it is not there, find the path and use it as your HADOOP_LIB_PATH.
Install NFS.
Have at least one folder, referred to in this document as SHARED_FOLDER, in the Resource Manager node accessible to every Node Manager node through NFS.
Provide write access to this SHARED_FOLDER for all the users involved in job execution, as well as the yarn user.
Download oracle-spatial-graph-<version>.x86_64.rpm from the Oracle e-delivery web site.
Execute oracle-spatial-graph-<version>.x86_64.rpm using the rpm command.
After rpm executes, verify that the directory structure created at /opt/oracle/oracle-spatial-graph/spatial/raster contains these folders: console, examples, jlib, gdal, and tests. Additionally, index.html describes the content, and javadoc.zip contains the Javadoc for the API.
Several test scripts are provided to:
Test the image loading functionality
Test the image processing functionality
Test a processing class for slope calculation in a DEM and a map algebra operation
Verify the image processing of a single raster with no mosaic process (it includes a user-provided function that calculates hill shade in the mapping phase).
Test processing of two rasters using a mask operation
Execute these scripts to verify a successful installation of the image processing framework.
If the cluster has security enabled, make sure the current user is in the princs list and has an active Kerberos ticket.
Make sure the user has write access to ALL_ACCESS_FOLDER and belongs to the owner group of this directory. It is recommended that jobs be executed in the Resource Manager node for Big Data Appliance. If jobs are executed in a different node, then the default is the hadoop group.
This script loads a set of six test rasters into the ohiftest folder in HDFS: 3 rasters of byte data type with 3 bands, 1 raster (DEM) of float32 data type with 1 band, and 2 rasters of int32 data type with 1 band. No parameters are required for OBDA environments, and a single parameter with the ALL_ACCESS_FOLDER value is required for non-OBDA environments.
Internally, the job creates a split for every raster to load. Split size depends on the block size configuration; for example, if a block size of 64 MB or more is configured, 4 mappers will run. As a result, the rasters are loaded into HDFS and a corresponding thumbnail is created for visualization. An external image editor is required to visualize the thumbnails, and an output path for these thumbnails is provided to the users upon successful completion of the job.
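The relationship between block size and mapper count can be sketched with a simplified model: each raster contributes roughly one input split per block, and one mapper runs per split. The raster sizes below are hypothetical, and the loader's actual split logic may differ:

```python
import math

def estimated_mappers(raster_sizes_mb, block_size_mb):
    """Each raster contributes ceil(size / block_size) input splits;
    one mapper runs per split (a simplified model of the loader job)."""
    return sum(math.ceil(size / block_size_mb) for size in raster_sizes_mb)

# Hypothetical sizes for six small test rasters, in MB.
rasters = [10, 12, 9, 55, 3, 4]
print(estimated_mappers(rasters, 64))   # rasters smaller than a block: one split each -> 6
print(estimated_mappers(rasters, 8))    # smaller blocks -> more splits -> 15
```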
The test script can be found here:
/opt/oracle/oracle-spatial-graph/spatial/raster/tests/runimageloader.sh
For OBDA environments, enter:
./runimageloader.sh
For non-OBDA environments, enter:
./runimageloader.sh ALL_ACCESS_FOLDER
Upon successful execution, the message GENERATED OHIF FILES ARE LOCATED IN HDFS UNDER is displayed, with the path in HDFS where the files are located (this path depends on the definition of ALL_ACCESS_FOLDER) and a list of the created images and thumbnails on HDFS. The output may include:
THUMBNAILS CREATED ARE:
----------------------------------------------------------------------
total 13532
drwxr-xr-x 2 yarn yarn    4096 Sep  9 13:54 .
drwxr-xr-x 3 yarn yarn    4096 Aug 27 11:29 ..
-rw-r--r-- 1 yarn yarn 3214053 Sep  9 13:54 hawaii.tif.ohif.tif
-rw-r--r-- 1 yarn yarn 3214053 Sep  9 13:54 inputimageint32.tif.ohif.tif
-rw-r--r-- 1 yarn yarn 3214053 Sep  9 13:54 inputimageint32_1.tif.ohif.tif
-rw-r--r-- 1 yarn yarn 3214053 Sep  9 13:54 kahoolawe.tif.ohif.tif
-rw-r--r-- 1 yarn yarn 3214053 Sep  9 13:54 maui.tif.ohif.tif
-rw-r--r-- 1 yarn yarn 4182040 Sep  9 13:54 NapaDEM.tif.ohif.tif
YOU MAY VISUALIZE THUMBNAILS OF THE UPLOADED IMAGES FOR REVIEW FROM THE FOLLOWING PATH:
If the installation and configuration were not successful, then the output is not generated and a message like the following is displayed:
NOT ALL THE IMAGES WERE UPLOADED CORRECTLY, CHECK FOR HADOOP LOGS
The amount of memory required to execute mappers and reducers depends on the configured HDFS block size. By default, 1 GB of memory is assigned for Java, but you can modify that and other properties in the imagejob.prop file that is included in this test directory.
This script executes the processor job by setting three source rasters of the Hawaii islands and some coordinates that include all three. The job will create a mosaic based on these coordinates, and the resulting raster should include the three rasters combined into a single one.
runimageloader.sh should be executed as a prerequisite, so that the source rasters exist in HDFS. These are 3-band rasters of byte data type.
No parameters are required for OBDA environments, and a single parameter -s with the ALL_ACCESS_FOLDER value is required for non-OBDA environments.
Additionally, if the output should be stored in HDFS, the -o parameter must be used to set the HDFS folder where the mosaic output will be stored.
Internally, the job filters the tiles using the coordinates specified in the configuration input XML; only the required tiles are processed in a mapper, and finally, in the reduce phase, all of them are put together into the resulting mosaic raster.
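The tile-filtering step can be illustrated with a minimal bounding-box intersection test. The tile names and coordinates below are invented for the example, and the framework's actual filtering is more involved:

```python
def intersects(tile_mbr, query_mbr):
    """Axis-aligned MBR intersection test; each MBR is (minx, miny, maxx, maxy)."""
    tminx, tminy, tmaxx, tmaxy = tile_mbr
    qminx, qminy, qmaxx, qmaxy = query_mbr
    return not (tmaxx < qminx or tminx > qmaxx or tmaxy < qminy or tminy > qmaxy)

# Hypothetical tiles; only those overlapping the query window are processed.
tiles = {
    "t0": (0, 0, 10, 10),
    "t1": (10, 0, 20, 10),
    "t2": (40, 40, 50, 50),
}
query = (5, 5, 15, 15)
selected = sorted(name for name, mbr in tiles.items() if intersects(mbr, query))
print(selected)   # ['t0', 't1']
```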
The test script can be found here:
/opt/oracle/oracle-spatial-graph/spatial/raster/tests/runimageprocessor.sh
For OBDA environments, enter:
./runimageprocessor.sh
For non-OBDA environments, enter:
./runimageprocessor.sh -s ALL_ACCESS_FOLDER
Upon successful execution, the message EXPECTED OUTPUT FILE IS: ALL_ACCESS_FOLDER/processtest/hawaiimosaic.tif is displayed, with the path to the output mosaic file. The output may include:
EXPECTED OUTPUT FILE IS: ALL_ACCESS_FOLDER/processtest/hawaiimosaic.tif
total 9452
drwxrwxrwx 2 hdfs    hdfs    4096 Sep 10 09:12 .
drwxrwxrwx 9 zherena dba     4096 Sep  9 13:50 ..
-rwxrwxrwx 1 yarn    yarn 4741101 Sep 10 09:12 hawaiimosaic.tif
MOSAIC IMAGE GENERATED
----------------------------------------------------------------------
YOU MAY VISUALIZE THE MOSAIC OUTPUT IMAGE FOR REVIEW IN THE FOLLOWING PATH: ALL_ACCESS_FOLDER/processtest/hawaiimosaic.tif
If the installation and configuration were not successful, then the output is not generated and a message like the following is displayed:
MOSAIC WAS NOT SUCCESSFULLY CREATED, CHECK HADOOP LOGS TO REVIEW THE PROBLEM
To test the output storage in HDFS, use the following commands.
For OBDA environments, enter:
./runimageprocessor.sh -o hdfstest
For non-OBDA environments, enter:
./runimageprocessor.sh -s ALL_ACCESS_FOLDER -o hdfstest
This script executes the processor job for a single raster, in this case a DEM source raster of North Napa Valley. The purpose of this job is to process the complete input by using the user processing classes configured for the mapping phase. This class calculates the hillshade of the DEM, which is written to the output file. No mosaic operation is performed here.
runimageloader.sh should be executed as a prerequisite, so that the source raster exists in HDFS. This is a 1-band DEM raster of float32 data type.
No parameters are required for OBDA environments, and a single parameter -s with the ALL_ACCESS_FOLDER value is required for non-OBDA environments.
The test script can be found here:
/opt/oracle/oracle-spatial-graph/spatial/raster/tests/runsingleimageprocessor.sh
For OBDA environments, enter:
./runsingleimageprocessor.sh
For non-OBDA environments, enter:
./runsingleimageprocessor.sh -s ALL_ACCESS_FOLDER
Upon successful execution, the message EXPECTED OUTPUT FILE: ALL_ACCESS_FOLDER/processtest/NapaDEM.tif is displayed, with the path to the output DEM file. The output may include:
EXPECTED OUTPUT FILE: ALL_ACCESS_FOLDER/processtest/NapaDEM.tif
total 4808
drwxrwxrwx 2 hdfs    hdfs    4096 Sep 10 09:42 .
drwxrwxrwx 9 zherena dba     4096 Sep  9 13:50 ..
-rwxrwxrwx 1 yarn    yarn 4901232 Sep 10 09:42 NapaDEM.tif
IMAGE GENERATED
----------------------------------------------------------------------
YOU MAY VISUALIZE THE OUTPUT IMAGE FOR REVIEW IN THE FOLLOWING PATH: ALL_ACCESS_FOLDER/processtest/NapaDEM.tif
If the installation and configuration were not successful, then the output is not generated and a message like the following is displayed:
IMAGE WAS NOT SUCCESSFULLY CREATED, CHECK HADOOP LOGS TO REVIEW THE PROBLEM
This script executes the processor job by setting a DEM source raster of North Napa Valley and some coordinates that surround it. The job will create a mosaic based on these coordinates and will also calculate the slope on it by setting a processing class in the mosaic configuration XML.
runimageloader.sh should be executed as a prerequisite, so that the source rasters exist in HDFS. This is a 1-band DEM raster of float32 data type.
No parameters are required for OBDA environments, and a single parameter -s with the ALL_ACCESS_FOLDER value is required for non-OBDA environments.
The test script can be found here:
/opt/oracle/oracle-spatial-graph/spatial/raster/tests/runimageprocessordem.sh
For OBDA environments, enter:
./runimageprocessordem.sh
For non-OBDA environments, enter:
./runimageprocessordem.sh -s ALL_ACCESS_FOLDER
Upon successful execution, the message EXPECTED OUTPUT FILE: ALL_ACCESS_FOLDER/processtest/NapaSlope.tif is displayed, with the path to the slope output file. The output may include:
EXPECTED OUTPUT FILE: ALL_ACCESS_FOLDER/processtest/NapaSlope.tif
total 4808
drwxrwxrwx 2 hdfs    hdfs    4096 Sep 10 09:42 .
drwxrwxrwx 9 zherena dba     4096 Sep  9 13:50 ..
-rwxrwxrwx 1 yarn    yarn 4901232 Sep 10 09:42 NapaSlope.tif
MOSAIC IMAGE GENERATED
----------------------------------------------------------------------
YOU MAY VISUALIZE THE MOSAIC OUTPUT IMAGE FOR REVIEW IN THE FOLLOWING PATH: ALL_ACCESS_FOLDER/processtest/NapaSlope.tif
If the installation and configuration were not successful, then the output is not generated and a message like the following is displayed:
MOSAIC WAS NOT SUCCESSFULLY CREATED, CHECK HADOOP LOGS TO REVIEW THE PROBLEM
You may also test the "if" algebra function, where every pixel in this raster with a value greater than 2500 will be replaced by the value you set in the command line using the -c flag. For example:
For OBDA environments, enter:
./runimageprocessordem.sh -c 8000
For non-OBDA environments, enter:
./runimageprocessordem.sh -s ALL_ACCESS_FOLDER -c 8000
You can visualize the output file and notice the difference between the simple slope calculation and this altered output, where the areas with pixel values greater than 2500 appear clearer.
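Conceptually, the "if" algebra operation behaves like the following per-pixel sketch (hypothetical; the real operation runs distributed over raster tiles):

```python
def if_replace(pixels, threshold, new_value):
    """Replace every pixel greater than threshold with new_value,
    mirroring the "if" algebra operation driven by the -c flag."""
    return [new_value if p > threshold else p for p in pixels]

# A hypothetical row of DEM pixel values; 2500 itself is not replaced.
row = [1200, 2600, 2500, 3100, 900]
print(if_replace(row, 2500, 8000))   # [1200, 8000, 2500, 8000, 900]
```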
This script executes the processor job for two rasters that cover a very small area of North Napa Valley in the US state of California.
These rasters have the same MBR, pixel size, SRID, and data type, all of which are required for complex multiple-raster operation processing. The purpose of this job is to process both rasters by using the mask operation, which checks every pixel in the second raster to determine whether its value is contained in the mask list. If it is, the output raster will have the pixel value of the first raster for this output cell; otherwise, the zero (0) value is set. No mosaic operation is performed here.
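The per-pixel mask logic described above can be sketched as follows; the rasters are represented as flat Python lists purely for illustration:

```python
def mask_rasters(first, second, mask_values):
    """For each cell, keep the first raster's value when the second raster's
    value is in mask_values; otherwise emit 0 (per the mask test description)."""
    allowed = set(mask_values)
    return [a if b in allowed else 0 for a, b in zip(first, second)]

# Hypothetical 1-band int32 cell values for two aligned rasters.
first  = [10, 20, 30, 40]
second = [ 1,  5,  1,  9]
print(mask_rasters(first, second, [1, 9]))   # [10, 0, 30, 40]
```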
runimageloader.sh should be executed as a prerequisite, so that the source rasters exist in HDFS. These are 1-band rasters of int32 data type.
No parameters are required for OBDA environments. For non-OBDA environments, a single parameter -s with the ALL_ACCESS_FOLDER value is required.
The test script can be found here:
/opt/oracle/oracle-spatial-graph/spatial/raster/tests/runimageprocessormultiple.sh
For OBDA environments, enter:
./runimageprocessormultiple.sh
For non-OBDA environments, enter:
./runimageprocessormultiple.sh -s ALL_ACCESS_FOLDER
Upon successful execution, the message EXPECTED OUTPUT FILE: ALL_ACCESS_FOLDER/processtest/MaskInt32Rasters.tif is displayed, with the path to the mask output file. The output may include:
EXPECTED OUTPUT FILE: ALL_ACCESS_FOLDER/processtest/MaskInt32Rasters.tif
total 4808
drwxrwxrwx 2 hdfs    hdfs    4096 Sep 10 09:42 .
drwxrwxrwx 9 zherena dba     4096 Sep  9 13:50 ..
-rwxrwxrwx 1 yarn    yarn 4901232 Sep 10 09:42 MaskInt32Rasters.tif
IMAGE GENERATED
----------------------------------------------------------------------
YOU MAY VISUALIZE THE OUTPUT IMAGE FOR REVIEW IN THE FOLLOWING PATH: ALL_ACCESS_FOLDER/processtest/MaskInt32Rasters.tif
If the installation and configuration were not successful, then the output is not generated and a message like the following is displayed:
IMAGE WAS NOT SUCCESSFULLY CREATED, CHECK HADOOP LOGS TO REVIEW THE PROBLEM
You can access the image processing framework through the Oracle Big Data Spatial Image Server, which provides a web interface for loading and processing images.
Installing and configuring the Spatial Image Server depends upon the distribution being used.
Installing and Configuring the Image Server for Oracle Big Data Appliance
Installing and Configuring the Image Server Web for Other Systems (Not Big Data Appliance)
After you perform the installation, verify it (see Post-Installation Verification Example for the Image Server Console).
To perform an automatic installation using the provided script, follow these steps:
Run the following script:
sudo /opt/oracle/oracle-spatial-graph/spatial/configure-server/install-bdsg-consoles.sh
If the active nodes have changed since the installation, update the configuration in the web console.
Start the server:
cd /opt/oracle/oracle-spatial-graph/spatial/web-server
sudo ./start-server.sh
If any errors occur, see the README file located in /opt/oracle/oracle-spatial-graph/spatial/configure-server.
The preceding instructions configure the entire server. If no further configuration is required, you can go directly to Post-Installation Verification Example for the Image Server Console.
If you need more information or need to perform other actions, see the following topics:
Ensure that you have the prerequisite software installed.
Copy the asm-3.1.jar file from /opt/oracle/oracle-spatial-graph/spatial/raster/jlib/asm-3.1.jar to WEB_SERVER_HOME/webapps/imageserver/WEB-INF/lib.
Note:
The jersey-core* jars will be duplicated at WEB_SERVER_HOME/webapps/imageserver/WEB-INF/lib. Make sure you remove the old ones and leave just jersey-core-1.17.1.jar in the folder, as in the next step.
Enter the following command:
ls -lat jersey-core*
Delete the listed libraries, except jersey-core-1.17.1.jar.
In the same directory (WEB_SERVER_HOME/webapps/imageserver/WEB-INF/lib), delete the xercesImpl and servlet jar files:
rm xercesImpl*
rm servlet*
Start the web server.
If you need to change the port, specify it. For example, in the case of the Jetty server, set jetty.http.port=8081.
Ignore any warnings, such as the following:
java.lang.UnsupportedOperationException: setXIncludeAware is not supported on this JAXP implementation or earlier: class oracle.xml.jaxp.JXDocumentBuilderFactory
Type the http://thehost:8045/imageserver address in your browser address bar to open the web console.
From the Administrator tab, go to the Configuration tab. In the Hadoop Configuration Parameters section, change the following three properties, depending on the cluster configuration:
fs.defaultFS: Type the active namenode of your cluster in the format hdfs://<namenode>:8020. (Check with the administrator for this information.)
yarn.resourcemanager.scheduler.address: The Scheduler address of the active Resource Manager of your cluster, in the format <schedulername>:8030.
yarn.resourcemanager.address: Active Resource Manager address, in the format <resourcename>:8032.
Note:
Keep the default values for the rest of the configuration. They are pre-loaded for your Oracle Big Data Appliance cluster environment.
Click Apply Changes to save the changes.
Tip:
You can review the missing configuration information under the Hadoop Loader tab of the console.
To install and configure the image server web for other systems (not Big Data Appliance), see these topics.
Before installing the image server on other systems, you must install the image processing framework as specified in Installing the Image Processing Framework for Other Distributions (Not Oracle Big Data Appliance).
The steps to install the image server web on other systems are the same as for installing it on BDA.
Follow the instructions specified in "Prerequisites for Performing a Manual Installation."
Follow the instructions specified in "Installing Dependencies on the Image Server Web on an Oracle Big Data Appliance."
Follow the instructions specified in "Configuring the Environment for Other Systems."
Configure the environment as described in Configuring the Environment for Big Data Appliance, and then continue with the following steps. From the Configuration tab, in the Global Init Parameters section, change these properties depending on the cluster configuration:
shared.gdal.data: Specify the GDAL shared data folder. Follow the instructions in Installing the Image Processing Framework for Other Distributions (Not Oracle Big Data Appliance).
gdal.lib: Location of the GDAL .so libraries.
start: Specify a shared folder in which to start browsing the images. This folder must be shared between the cluster and the NFS mount point (SHARED_FOLDER).
saveimages: Create a child folder named saveimages under start (SHARED_FOLDER) with full write access. For example, if start=/home, then saveimages=/home/saveimages.
nfs.mountpoint: If the cluster requires a mount point to access the SHARED_FOLDER, specify a mount point, for example, /net/home. Otherwise, leave it blank.
From the Configuration tab in the Hadoop Configuration Parameters section, update the following property:
yarn.application.classpath: The classpath for Hadoop to find the required jars and dependencies. Usually this is under /usr/lib/hadoop. For example:
/etc/hadoop/conf/,/usr/lib/hadoop/*,/usr/lib/hadoop/lib/*,/usr/lib/hadoop-hdfs/*,/usr/lib/hadoop-hdfs/lib/*,/usr/lib/hadoop-yarn/*,/usr/lib/hadoop-yarn/lib/*,/usr/lib/hadoop-mapreduce/*,/usr/lib/hadoop-mapreduce/lib/*
Note:
Keep the default values for the rest of the configuration.
Click Apply Changes to save the changes.
Tip:
You can review any missing configuration information under the Hadoop Loader tab of the console.
In this example, you will:
Load the images from the local server to the HDFS Hadoop cluster.
Run a job to create a mosaic image file and a catalog with several images.
View the mosaic image.
Note:
If no errors were shown, then you have successfully installed the Image Loader web interface.
The image server has two ready-to-use web services, one for the HDFS loader and the other for the HDFS mosaic processor.
These services can be called from a Java application. They are currently supported only for GET operations. The formats for calling them are:
Loader: http://host:port/imageserver/rest/hdfsloader?path=string&overlap=string
where:
path: The images to be processed; this can be the path of a single file, or of one or more whole folders. For more than one folder, use commas to separate folder names.
overlap (optional): The overlap between images (default = 10).
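A client can assemble the loader URL along these lines; the host, port, and folder names are placeholders, and keeping slashes and commas unescaped is a readability choice for this sketch:

```python
from urllib.parse import urlencode

def hdfsloader_url(host, port, paths, overlap=None):
    """Build the loader GET URL; multiple folders are comma-separated,
    and overlap is optional (the service defaults it to 10)."""
    params = {"path": ",".join(paths)}
    if overlap is not None:
        params["overlap"] = str(overlap)
    return f"http://{host}:{port}/imageserver/rest/hdfsloader?{urlencode(params, safe='/,')}"

# Hypothetical host and folders.
print(hdfsloader_url("system123.example.com", 7101,
                     ["/data/rasters/hawaii", "/data/rasters/maui"], overlap=2))
```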
Mosaic: http://host:port/imageserver/rest/mosaic?mosaic=string&config=string
where:
mosaic: The XML mosaic file that contains the images to be processed. If you are using the image server web application, the XML file is generated automatically. Example of a mosaic XML file:
<?xml version='1.0'?>
<catalog type='HDFS'>
  <image>
    <source>Hadoop File System</source>
    <type>HDFS</type>
    <raster>/hawaii.tif.ohif</raster>
    <bands datatype='1' config='1,2,3'>3</bands>
  </image>
  <image>
    <source>Hadoop File System</source>
    <type>HDFS</type>
    <raster>/kahoolawe.tif.ohif</raster>
    <bands datatype='1' config='1,2,3'>3</bands>
  </image>
</catalog>
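For illustration, a client could inspect such a catalog with a standard XML parser; this sketch simply lists each image's raster path and band count:

```python
import xml.etree.ElementTree as ET

# A catalog in the shape shown above (paths are illustrative).
catalog_xml = """<?xml version='1.0'?>
<catalog type='HDFS'>
  <image>
    <source>Hadoop File System</source>
    <type>HDFS</type>
    <raster>/hawaii.tif.ohif</raster>
    <bands datatype='1' config='1,2,3'>3</bands>
  </image>
  <image>
    <source>Hadoop File System</source>
    <type>HDFS</type>
    <raster>/kahoolawe.tif.ohif</raster>
    <bands datatype='1' config='1,2,3'>3</bands>
  </image>
</catalog>"""

root = ET.fromstring(catalog_xml)
images = [(img.findtext("raster").strip(), int(img.find("bands").text))
          for img in root.findall("image")]
print(images)   # [('/hawaii.tif.ohif', 3), ('/kahoolawe.tif.ohif', 3)]
```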
config: Configuration file; created the first time a mosaic is processed using the image server web application. Example of a configuration file:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<mosaic>
  <output>
    <SRID>26904</SRID>
    <directory type="FS">/net/system123/scratch/user3/installers</directory>
    <tempFsFolder>/net/system123/scratch/user3/installers</tempFsFolder>
    <filename>test</filename>
    <format>GTIFF</format>
    <width>1800</width>
    <height>1406</height>
    <algorithm order="0">1</algorithm>
    <bands layers="3"/>
    <nodata>#000000</nodata>
    <pixelType>1</pixelType>
  </output>
  <crop>
    <transform>294444.1905688362,114.06068372059636,0,2517696.9179752027,0,-114.06068372059636</transform>
  </crop>
  <process/>
  <operations>
    <localnot/>
  </operations>
</mosaic>
Java Example: Using the Loader
public class RestTest {
    public static void main(String args[]) {
        try {
            // Loader: http://localhost:7101/imageserver/rest/hdfsloader?path=string&overlap=string
            // Mosaic: http://localhost:7101/imageserver/rest/mosaic?mosaic=string&config=string
            String path = "/net/system123/scratch/user3/installers/hawaii/hawaii.tif";
            URL url = new URL(
                "http://system123.example.com:7101/imageserver/rest/hdfsloader?path="
                + path + "&overlap=2"); // overlap is optional
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("GET");
            if (conn.getResponseCode() != 200) {
                throw new RuntimeException("Failed : HTTP error code : "
                    + conn.getResponseCode());
            }
            BufferedReader br = new BufferedReader(new InputStreamReader(
                conn.getInputStream()));
            String output;
            System.out.println("Output from Server .... \n");
            while ((output = br.readLine()) != null) {
                System.out.println(output);
            }
            conn.disconnect();
        } catch (MalformedURLException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
Java Example: Using the Mosaic Processor
public class NetClientPost {
    public static void main(String[] args) {
        try {
            String mosaic = "<?xml version='1.0'?>\n"
                + "<catalog type='HDFS'>\n"
                + " <image>\n"
                + "  <source>Hadoop File System</source>\n"
                + "  <type>HDFS</type>\n"
                + "  <raster>/user/hdfs/newdata/net/system123/scratch/user3/installers/hawaii/hawaii.tif.ohif</raster>\n"
                + "  <url>http://system123.example.com:7101/imageserver/temp/862b5871973372aab7b62094c575884ae13c3a27_thumb.jpg</url>\n"
                + "  <bands datatype='1' config='1,2,3'>3</bands>\n"
                + " </image>\n"
                + "</catalog>";
            String config = "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n"
                + "<mosaic>\n"
                + "<output>\n"
                + "<SRID>26904</SRID>\n"
                + "<directory type=\"FS\">/net/system123/scratch/user3/installers</directory>\n"
                + "<tempFsFolder>/net/system123/scratch/user3/installers</tempFsFolder>\n"
                + "<filename>test</filename>\n"
                + "<format>GTIFF</format>\n"
                + "<width>1800</width>\n"
                + "<height>1269</height>\n"
                + "<algorithm order=\"0\">1</algorithm>\n"
                + "<bands layers=\"3\"/>\n"
                + "<nodata>#000000</nodata>\n"
                + "<pixelType>1</pixelType>\n"
                + "</output>\n"
                + "<crop>\n"
                + "<transform>739481.1311601736,130.5820811245199,0,2254053.5858749463,0,-130.5820811245199</transform>\n"
                + "</crop>\n"
                + "<process/>\n"
                + "</mosaic>";
            URL url = new URL("http://system123.example.com:7101/imageserver/rest/mosaic?"
                + "mosaic=" + URLEncoder.encode(mosaic, "UTF-8")
                + "&config=" + URLEncoder.encode(config, "UTF-8"));
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("GET");
            if (conn.getResponseCode() != 200) {
                throw new RuntimeException("Failed : HTTP error code : "
                    + conn.getResponseCode());
            }
            BufferedReader br = new BufferedReader(new InputStreamReader(
                conn.getInputStream()));
            String output;
            System.out.println("Output from Server .... \n");
            while ((output = br.readLine()) != null) {
                System.out.println(output);
            }
            conn.disconnect();
        } catch (MalformedURLException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
To install the Oracle Big Data Spatial Hadoop vector console, follow the instructions in this topic.
Installing the Spatial Hadoop Vector Console on Oracle Big Data Appliance
Installing the Spatial Hadoop Vector Console for Other Systems (Not Big Data Appliance)
Configuring the Spatial Hadoop Vector Console on Oracle Big Data Appliance
Configuring the Spatial Hadoop Vector Console for Other Systems (Not Big Data Appliance)
The following assumptions and prerequisites apply to installing and configuring the Spatial Hadoop Vector Console.
The API and jobs described here run on Cloudera CDH 5.7, Hortonworks HDP 2.4, or a similar Hadoop environment.
Java 8 or a newer version is present in your environment.
In addition to the Hadoop environment jars, the libraries listed here are required by the Vector Analysis API.
sdohadoop-vector.jar
sdoutil.jar
sdoapi.jar
ojdbc.jar
commons-fileupload-1.3.1.jar
commons-io-2.4.jar
jackson-annotations-2.1.4.jar
jackson-core-2.1.4.jar
jackson-core-asl-1.8.1.jar
jackson-databind-2.1.4.jar
javacsv.jar
lucene-analyzers-common-4.6.0.jar
lucene-core-4.6.0.jar
lucene-queries-4.6.0.jar
lucene-queryparser-4.6.0.jar
mvsuggest_core.jar
You can install the Spatial Hadoop Vector Console on Big Data Appliance either by using the provided script or by performing a manual configuration. To use the provided script:
Run the following script to install the console:
sudo /opt/oracle/oracle-spatial-graph/spatial/configure-server/install-bdsg-consoles.sh
If the active nodes have changed after the installation, then update the configuration file as described in Configuring the Spatial Hadoop Vector Console on Oracle Big Data Appliance.
Start the console:
cd /opt/oracle/oracle-spatial-graph/spatial/web-server sudo ./start-server.sh
If any errors occur, see the README file located in /opt/oracle/oracle-spatial-graph/spatial/configure-server.
To perform a manual configuration, follow these steps.
Optionally, upload sample data (used with examples in other topics) to HDFS:
sudo -u hdfs hadoop fs -mkdir /user/oracle/bdsg sudo -u hdfs hadoop fs -put /opt/oracle/oracle-spatial-graph/spatial/vector/examples/data/tweets.json /user/oracle/bdsg/
Follow the steps for manual configuration described in "Installing the Spatial Hadoop Vector Console on Oracle Big Data Appliance." However, in step 3 replace the path /opt/cloudera/parcels/CDH/lib/ with the actual library path, which by default is /usr/lib/.
Edit the configuration file WEB_SERVER_HOME/webapps/spatialviewer/conf/console-conf.xml (or /opt/oracle/oracle-spatial-graph/spatial/web-server/spatialviewer/conf/console-conf.xml if the installation was done using the provided script) to specify your own data for sending email and for other configuration parameters. Set the configuration parameters as follows.
Edit the notification URL: the URL where the console server is running. It must be reachable from the Hadoop cluster so that job completion notifications can be delivered. Example setting: <baseurl>http://hadoop.console.url:8080</baseurl>
Edit the directory with temporary hierarchical indexes: an HDFS path that will contain temporary data on hierarchical relationships. Example: <hierarchydataindexpath>hdfs://hadoop.cluster.url:8020/user/myuser/hierarchyIndexPath</hierarchydataindexpath>
Edit the HDFS directory that will contain the MVSuggest generated index. Example: <mvsuggestindex>hdfs://hadoop.cluster.url:8020/user/myuser/mvSuggestIndex</mvsuggestindex>
If necessary, edit the URL used to get the eLocation background maps. Example: <elocationmvbaseurl>http://elocation.oracle.com/mapviewer</elocationmvbaseurl>
Edit the HDFS directory that will contain the index metadata. Example: <indexmetadatapath>hdfs://hadoop.cluster.url:8020/user/myuser/indexMetadata</indexmetadatapath>
Edit the HDFS directory with temporary data used by the explore data processes. Example: <exploretempdatapath>hdfs://hadoop.cluster.url:8020/user/myuser/exploreTmp</exploretempdatapath>
Edit the HDFS directory that will contain information about the jobs run by the console. Example: <jobregistrypath>hdfs://hadoop.cluster.url:8020/user/myuser/spatialJobRegistry</jobregistrypath>
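Most of the entries above are absolute hdfs:// URIs, and a stray space (as can happen when copying values from this document) makes them invalid. The following standalone sketch, which is not part of the console, shows one way to sanity-check such a value before saving console-conf.xml:

```java
import java.net.URI;

public class HdfsPathCheck {
    // Returns true if the value looks like an absolute hdfs:// URI with no
    // embedded whitespace, as the console configuration entries require.
    static boolean isValidHdfsPath(String value) {
        if (value.contains(" ")) {
            return false;                      // e.g. "hdfs:// host/..." is invalid
        }
        URI uri = URI.create(value);
        return "hdfs".equals(uri.getScheme())
                && uri.getHost() != null
                && uri.getPath() != null
                && uri.getPath().startsWith("/");
    }

    public static void main(String[] args) {
        System.out.println(isValidHdfsPath("hdfs://hadoop.cluster.url:8020/user/myuser/indexMetadata"));  // true
        System.out.println(isValidHdfsPath("hdfs:// hadoop.cluster.url:8020/user/myuser/indexMetadata")); // false
    }
}
```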
If necessary, disable the display of the logs in the job details screen. Disable this display if the logs are not in the default format. The default format is: Date LogLevel LoggerName: LogMessage. The Date must have the default format yyyy-MM-dd HH:mm:ss,SSS (for example, 2012-11-02 14:34:02,781). To disable the log display, set <displaylogs> to false. Example: <displaylogs>false</displaylogs>
If the logs are not displayed and <displaylogs> is set to true, then ensure that yarn.log-aggregation-enable in yarn-site.xml is set to true. Also ensure that the Hadoop job configuration parameters yarn.nodemanager.remote-app-log-dir and yarn.nodemanager.remote-app-log-dir-suffix are set to the same values as in yarn-site.xml.
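The default Date format can be checked with standard Java. This standalone sketch, which is not part of the console, verifies whether a log timestamp matches the yyyy-MM-dd HH:mm:ss,SSS pattern that the job details screen expects:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;

public class LogDateCheck {
    // The console only parses log lines whose Date field matches this pattern.
    static final String PATTERN = "yyyy-MM-dd HH:mm:ss,SSS";

    static boolean matchesDefaultFormat(String date) {
        SimpleDateFormat fmt = new SimpleDateFormat(PATTERN);
        fmt.setLenient(false);                 // reject invalid calendar values such as month 13
        try {
            fmt.parse(date);
            return true;
        } catch (ParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(matchesDefaultFormat("2012-11-02 14:34:02,781"));  // true
        System.out.println(matchesDefaultFormat("02/11/2012 14:34:02"));      // false
    }
}
```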
Edit the general Hadoop jobs configuration: The console uses two Hadoop jobs. The first creates a spatial index on existing files in HDFS; the second generates display results based on that index. One part of the configuration is common to both jobs, and another part is specific to each job. The common configuration is found within the <hadoopjobs><configuration> elements. An example configuration follows:
<hadoopjobs>
  <configuration>
    <property>
      <!-- Hadoop user. The user is a mandatory property. -->
      <name>hadoop.job.ugi</name>
      <value>hdfs</value>
    </property>
    <property>
      <!-- As defined in core-site.xml. If fs.defaultFS in core-site.xml is
           defined as the nameservice ID (High Availability configuration),
           then set the full address and IPC port of the currently active
           name node. The nameservice is defined in hdfs-site.xml. -->
      <name>fs.defaultFS</name>
      <value>hdfs://hadoop.cluster.url:8020</value>
    </property>
    <property>
      <!-- As defined in mapred-site.xml -->
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
    </property>
    <property>
      <!-- As defined in yarn-site.xml -->
      <name>yarn.resourcemanager.scheduler.address</name>
      <value>hadoop.cluster.url:8030</value>
    </property>
    <property>
      <!-- As defined in yarn-site.xml -->
      <name>yarn.resourcemanager.address</name>
      <value>hadoop.cluster.url:8032</value>
    </property>
    <property>
      <!-- As defined in yarn-site.xml; by default /tmp/logs -->
      <name>yarn.nodemanager.remote-app-log-dir</name>
      <value>/tmp/logs</value>
    </property>
    <property>
      <!-- As defined in yarn-site.xml; by default logs -->
      <name>yarn.nodemanager.remote-app-log-dir-suffix</name>
      <value>logs</value>
    </property>
    <property>
      <!-- As defined in yarn-site.xml (full path) -->
      <name>yarn.application.classpath</name>
      <value>/etc/hadoop/conf/,/opt/cloudera/parcels/CDH/lib/hadoop/*,/opt/cloudera/parcels/CDH/lib/hadoop/lib/*,/opt/cloudera/parcels/CDH/lib/hadoop-hdfs/*,/opt/cloudera/parcels/CDH/lib/hadoop-hdfs/lib/*,/opt/cloudera/parcels/CDH/lib/hadoop-yarn/*,/opt/cloudera/parcels/CDH/lib/hadoop-yarn/lib/*,/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/*,/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/lib/*</value>
    </property>
  </configuration>
</hadoopjobs>
Create an index-job-specific configuration. Additional Hadoop parameters can be specified for the job that creates the spatial indexes. An example additional configuration:
<hadoopjobs>
  <configuration>
    ...
  </configuration>
  <indexjobadditionalconfiguration>
    <property>
      <!-- Increase mapred.max.split.size so that fewer mappers are
           allocated, reducing the mapper initialization overhead. -->
      <name>mapred.max.split.size</name>
      <value>1342177280</value>
    </property>
  </indexjobadditionalconfiguration>
</hadoopjobs>
Create a specific configuration for the job that generates the categorization results. The following is an example of property settings:
<hadoopjobs>
  <configuration>
    ...
  </configuration>
  <indexjobadditionalconfiguration>
    ...
  </indexjobadditionalconfiguration>
  <hierarchicaljobadditionalconfiguration>
    <property>
      <!-- Increase mapred.max.split.size so that fewer mappers are
           allocated, reducing the mapper initialization overhead. -->
      <name>mapred.max.split.size</name>
      <value>1342177280</value>
    </property>
  </hierarchicaljobadditionalconfiguration>
</hadoopjobs>
Specify the notification emails: Email notifications are sent to report job completion status. They are defined within the <notificationmails> element. You must specify a user (<user>), a password (<password>), and a sender email address (<mailfrom>). In the <configuration> element, set the configuration properties required by Java Mail. The following example shows a typical configuration for sending mail through an SMTP server using an SSL connection:
<notificationmails>
  <!-- Authentication parameters. The authentication parameters are mandatory. -->
  <user>user@mymail.com</user>
  <password>mypassword</password>
  <mailfrom>user@mymail.com</mailfrom>
  <!-- Parameters that will be set as system properties. Below are the
       parameters needed to send mail through an SMTP server using an SSL
       connection. -->
  <configuration>
    <property>
      <name>mail.smtp.host</name>
      <value>mail.host.com</value>
    </property>
    <property>
      <name>mail.smtp.socketFactory.port</name>
      <value>myport</value>
    </property>
    <property>
      <name>mail.smtp.socketFactory.class</name>
      <value>javax.net.ssl.SSLSocketFactory</value>
    </property>
    <property>
      <name>mail.smtp.auth</name>
      <value>true</value>
    </property>
  </configuration>
</notificationmails>
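The properties in the <configuration> element are applied as JVM system properties before the Java Mail session is created. This standalone sketch illustrates that wiring; the console performs the equivalent internally, and the port value 465 is an assumption substituted here for the myport placeholder:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MailConfigSketch {
    public static void main(String[] args) {
        // Properties as they appear in the <configuration> element above.
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("mail.smtp.host", "mail.host.com");
        conf.put("mail.smtp.socketFactory.port", "465"); // assumed value; "myport" in the example
        conf.put("mail.smtp.socketFactory.class", "javax.net.ssl.SSLSocketFactory");
        conf.put("mail.smtp.auth", "true");

        // Set each configuration property as a JVM system property, as the
        // console does before creating the Java Mail session.
        conf.forEach(System::setProperty);

        System.out.println(System.getProperty("mail.smtp.auth")); // prints "true"
    }
}
```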
Follow the steps mentioned in "Configuring the Spatial Hadoop Vector Console on Oracle Big Data Appliance." However, in the general Hadoop jobs configuration step, replace /opt/cloudera/parcels/CDH/lib/ in the Hadoop property yarn.application.classpath with the actual library path, which by default is /usr/lib/.
You can use property graphs on either Oracle Big Data Appliance or commodity hardware.
See Also:
The following prerequisites apply to installing property graph support in HBase.
Linux operating system
Cloudera's Distribution including Apache Hadoop (CDH)
For the software download, see: http://www.cloudera.com/content/cloudera/en/products-and-services/cdh.html
Apache HBase
Java Development Kit (JDK) (Java 8 or higher)
Details about supported versions of these products, including any interdependencies, will be provided in a My Oracle Support note.
The installation directory for Oracle Big Data Spatial and Graph property graph features has the following structure:
$ tree -dFL 2 /opt/oracle/oracle-spatial-graph/property_graph/
/opt/oracle/oracle-spatial-graph/property_graph/
|-- dal
| |-- groovy
| |-- opg-solr-config
| `-- webapp
|-- data
|-- doc
| |-- dal
| `-- pgx
|-- examples
| |-- dal
| |-- pgx
| `-- pyopg
|-- lib
|-- librdf
`-- pgx
|-- bin
|-- conf
|-- groovy
|-- scripts
|-- webapp
`-- yarn
Follow this installation task if property graph support is installed on a client without Hadoop and you want to read graph data stored in the Hadoop Distributed File System (HDFS) into the in-memory analyst and write the results back to HDFS, or use Hadoop NextGen MapReduce (YARN) scheduling to start, monitor, and stop the in-memory analyst.
When running a Java application using in-memory analytics and HDFS, make sure that $HADOOP_HOME/etc/hadoop is on the classpath, so that the configurations are picked up by the Hadoop client libraries. However, you do not need to do this when using the in-memory analyst shell, because the shell automatically adds $HADOOP_HOME/etc/hadoop to the classpath if HADOOP_HOME is set.
You do not need to put any extra Cloudera Hadoop libraries (JAR files) on the classpath. The only time you need the YARN libraries is when starting the in-memory analyst as a YARN service. This is done with the yarn
command, which automatically adds all necessary JAR files from your local installation to the classpath.
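The classpath convention described above can be sketched as follows. This is an illustration of the rule, not code shipped with the product:

```java
import java.io.File;

public class HadoopConfClasspath {
    // Returns the Hadoop configuration directory to append to the classpath,
    // or null when HADOOP_HOME is not set (mirroring the in-memory analyst
    // shell, which adds the directory only when the variable is present).
    static String confDir(String hadoopHome) {
        if (hadoopHome == null || hadoopHome.isEmpty()) {
            return null;
        }
        return hadoopHome + File.separator + "etc" + File.separator + "hadoop";
    }

    public static void main(String[] args) {
        // Prints the directory to add, or null if HADOOP_HOME is unset.
        System.out.println(confDir(System.getenv("HADOOP_HOME")));
    }
}
```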
You are now ready to load data from HDFS or start the in-memory analyst as a YARN service. For further information about Hadoop, see the CDH 5.x.x documentation.
To use the Multimedia analytics feature, the video analysis framework must be installed and configured.
If you have licensed Oracle Big Data Spatial and Graph with Oracle Big Data Appliance, the video analysis framework for Multimedia analytics is already installed and configured. However, you must set $MMA_HOME to point to /opt/oracle/oracle-spatial-graph/multimedia.
Otherwise, you can install the framework on Cloudera CDH 5 or a similar Hadoop environment, as follows:
Install the framework by using the following command on each node on the cluster:
rpm2cpio oracle-spatial-graph-<version>.x86_64.rpm | cpio -idmv
Set $MMA_HOME to point to /opt/oracle/oracle-spatial-graph/multimedia.
Identify the locations of the following libraries:
Hadoop jar files (available in $HADOOP_HOME/jars)
Video processing libraries (see Transcoding Software (Options))
OpenCV libraries (available with the product)
If necessary, install the desired video processing software to transcode video data (see Transcoding Software (Options)).
The following options are available for transcoding video data:
JCodec
FFmpeg
Third-party transcoding software
To use Multimedia analytics with JCodec (which is included with the product), when running the Hadoop job to recognize faces, set the oracle.ord.hadoop.ordframegrabber property to the following value: oracle.ord.hadoop.decoder.OrdJCodecFrameGrabber
To use Multimedia analytics with FFmpeg:
Download FFmpeg from: https://www.ffmpeg.org/.
Install FFmpeg on the Hadoop cluster.
Set the oracle.ord.hadoop.ordframegrabber
property to the following value: oracle.ord.hadoop.decoder.OrdFFMPEGFrameGrabber
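Because both decoders are selected through the same oracle.ord.hadoop.ordframegrabber property, a job driver only needs to set one string. The sketch below uses java.util.Properties as a stand-in for Hadoop's job configuration object; that substitution, and the helper method, are assumptions for illustration only:

```java
import java.util.Properties;

public class FrameGrabberConfig {
    static final String KEY = "oracle.ord.hadoop.ordframegrabber";

    // Chooses the decoder class name for the face-recognition job:
    // JCodec (bundled with the product) or FFmpeg (installed separately).
    static String grabberFor(boolean useFfmpeg) {
        return useFfmpeg
                ? "oracle.ord.hadoop.decoder.OrdFFMPEGFrameGrabber"
                : "oracle.ord.hadoop.decoder.OrdJCodecFrameGrabber";
    }

    public static void main(String[] args) {
        Properties jobConf = new Properties(); // stand-in for the Hadoop job configuration
        jobConf.setProperty(KEY, grabberFor(false));
        System.out.println(jobConf.getProperty(KEY)); // prints the JCodec decoder class name
    }
}
```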
To use Multimedia analytics with custom video decoding software, implement the abstract class oracle.ord.hadoop.decoder.OrdFrameGrabber. See the Javadoc for more details.