This chapter provides conceptual and usage information about creating, storing, and working with property graph data in a Big Data environment.
Property graphs allow an easy association of properties (key-value pairs) with graph vertices and edges, and they enable analytical operations based on relationships across a massive set of data.
A property graph consists of a set of objects or vertices, and a set of arrows or edges connecting the objects. Vertices and edges can have multiple properties, which are represented as key-value pairs.
Each vertex has a unique identifier and can have:
A set of outgoing edges
A set of incoming edges
A collection of properties
Each edge has a unique identifier and can have:
An outgoing vertex
An incoming vertex
A text label that describes the relationship between the two vertices
A collection of properties
Figure 4-1 illustrates a simple property graph with two vertices and one edge. The two vertices have identifiers 1 and 2. Both vertices have the properties name and age. The edge is from the outgoing vertex 1 to the incoming vertex 2. The edge has a text label knows and a property type identifying the type of relationship between vertices 1 and 2.
No standard exists for the Big Data Spatial and Graph property graph data model, but it is similar to the W3C standards-based Resource Description Framework (RDF) graph data model. The property graph data model is simpler and less precise than RDF. These differences make it a good candidate for use cases such as these:
Identifying influencers in a social network
Predicting trends and customer behavior
Discovering relationships based on pattern matching
Identifying clusters to customize campaigns
Property graphs are supported for Big Data in Hadoop and in Oracle NoSQL Database. This support consists of a data access layer and an analytics layer. A choice of databases in Hadoop provides scalable and persistent storage management.
Figure 4-2 provides an overview of the Oracle property graph architecture.
Figure 4-2 Oracle Property Graph Architecture
The in-memory analyst layer enables you to analyze property graphs using parallel in-memory execution. It provides over 35 analytic functions, including path calculation, ranking, community detection, and recommendations.
The data access layer provides a set of Java APIs that you can use to create and drop property graphs, add and remove vertices and edges, search for vertices and edges using key-value pairs, create text indexes, and perform other manipulations. The Java APIs include an implementation of the TinkerPop Blueprints graph interfaces for the property graph data model. The APIs also integrate with Apache Lucene and Apache SolrCloud, which are widely adopted open-source text indexing and search engines.
You can store your property graphs in either Oracle NoSQL Database or Apache HBase. Both databases are mature and scalable, and support efficient navigation, querying, and analytics. Both use tables to model the vertices and edges of property graphs.
The following graph formats are supported:
The GraphML file format uses XML to describe graphs. Example 4-1 shows a GraphML description of the property graph shown in Figure 4-1.
Example 4-1 GraphML Description of a Simple Property Graph
<?xml version="1.0" encoding="UTF-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
    <key id="name" for="node" attr.name="name" attr.type="string"/>
    <key id="age" for="node" attr.name="age" attr.type="int"/>
    <key id="type" for="edge" attr.name="type" attr.type="string"/>
    <graph id="PG" edgedefault="directed">
        <node id="1">
            <data key="name">Alice</data>
            <data key="age">31</data>
        </node>
        <node id="2">
            <data key="name">Bob</data>
            <data key="age">27</data>
        </node>
        <edge id="3" source="1" target="2" label="knows">
            <data key="type">friends</data>
        </edge>
    </graph>
</graphml>
The GraphSON file format is based on JavaScript Object Notation (JSON) for describing graphs. Example 4-2 shows a GraphSON description of the property graph shown in Figure 4-1.
See Also:
"GraphSON Reader and Writer Library" at
https://github.com/tinkerpop/blueprints/wiki/GraphSON-Reader-and-Writer-Library
Example 4-2 GraphSON Description of a Simple Property Graph
{ "graph": { "mode":"NORMAL", "vertices": [ { "name": "Alice", "age": 31, "_id": "1", "_type": "vertex" }, { "name": "Bob", "age": 27, "_id": "2", "_type": "vertex" } ], "edges": [ { "type": "friends", "_id": "3", "_type": "edge", "_outV": "1", "_inV": "2", "_label": "knows" } ] } }
The Graph Modeling Language (GML) file format uses ASCII to describe graphs. Example 4-3 shows a GML description of the property graph shown in Figure 4-1.
See Also:
"GML: A Portable Graph File Format" by Michael Himsolt at
Example 4-3 GML Description of a Simple Property Graph
graph [
   comment "Simple property graph"
   directed 1
   IsPlanar 1
   node [
      id 1
      label "1"
      name "Alice"
      age 31
   ]
   node [
      id 2
      label "2"
      name "Bob"
      age 27
   ]
   edge [
      source 1
      target 2
      label "knows"
      type "friends"
   ]
]
The Oracle flat file format is specific to property graphs. It is more concise and provides better data type support than the other file formats. The Oracle flat file format uses two files for a graph description, one for the vertices and one for the edges. Commas separate the fields of the records.
Example 4-4 shows the Oracle flat files that describe the property graph shown in Figure 4-1.
Example 4-4 Oracle Flat File Description of a Simple Property Graph
Vertex file:
1,name,1,Alice,,
1,age,2,,31,
2,name,1,Bob,,
2,age,2,,27,
Edge file:
1,1,2,knows,type,1,friends,,
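The examples above suggest a column layout for the records: a vertex record appears to contain the vertex ID, the property key, a numeric data type code (1 for string and 2 for integer in the rows shown), and one value column per supported type, while an edge record appears to begin with the edge ID, the source and destination vertex IDs, and the edge label before the property columns. The following minimal sketch, written under those assumptions (the file name and the type-code mapping are illustrative, not authoritative), prints the vertex properties from such a file:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class OpvDump {
  public static void main(String[] args) throws IOException {
    // Assumed layout per record: id, key, typeCode, stringValue, integerValue, ...
    // (inferred from Example 4-4; check the format reference before relying on it)
    try (BufferedReader br = new BufferedReader(new FileReader("connections.opv"))) {
      String line;
      while ((line = br.readLine()) != null) {
        String[] f = line.split(",", -1); // -1 keeps the empty trailing fields
        String value = "1".equals(f[2]) ? f[3] : f[4]; // 1 = string, 2 = integer (assumed)
        System.out.println("vertex " + f[0] + ": " + f[1] + " = " + value);
      }
    }
  }
}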
Creating a property graph involves using the Java APIs to create the property graph and objects in it.
The Java APIs that you can use for property graphs include:
Oracle Big Data Spatial and Graph property graph support provides database-specific APIs for Apache HBase and Oracle NoSQL Database. The data access layer API (oracle.pg.*) implements TinkerPop Blueprints APIs, text search, and indexing for property graphs stored in Oracle NoSQL Database and Apache HBase.
To use the Oracle Big Data Spatial and Graph API, import the classes into your Java program:
import oracle.pg.nosql.*; // or oracle.pg.hbase.*
import oracle.pgx.config.*;
import oracle.pgx.common.types.*;
Also include TinkerPop Blueprints Java APIs.
See Also:
Oracle Big Data Spatial and Graph Java API Reference
TinkerPop Blueprints supports the property graph data model. The API provides utilities for manipulating graphs, which you use primarily through the Big Data Spatial and Graph data access layer Java APIs.
To use the Blueprints APIs, import the classes into your Java program:
import com.tinkerpop.blueprints.Vertex;
import com.tinkerpop.blueprints.Edge;
See Also:
"Blueprints: A Property Graph Model Interface API" at
http://www.tinkerpop.com/docs/javadocs/blueprints/2.3.0/index.html
The Apache Hadoop Java APIs enable you to write your Java code as a MapReduce program that runs within the Hadoop distributed framework.
To use the Hadoop Java APIs, import the classes into your Java program. For example:
import org.apache.hadoop.conf.Configuration;
See Also:
"Apache Hadoop Main 2.5.0-cdh5.3.2 API" at
The Oracle NoSQL Database APIs enable you to create and populate a key-value (KV) store, and provide interfaces to Hadoop, Hive, and Oracle NoSQL Database.
To use Oracle NoSQL Database as the graph data store, import the classes into your Java program. For example:
import oracle.kv.*;
import oracle.kv.table.TableOperation;
See Also:
"Oracle NoSQL Database Java API Reference" at
The Apache HBase APIs enable you to create and manipulate key-value pairs.
To use HBase as the graph data store, import the classes into your Java program. For example:
import org.apache.hadoop.hbase.*;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.filter.*;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.conf.Configuration;
See Also:
"HBase 0.98.6-cdh5.3.2 API" at
http://archive.cloudera.com/cdh5/cdh/5/hbase/apidocs/index.html?overview-summary.html
A Java API is provided for performing parallel loading of graph data.
Given a set of vertex files (or input streams) and a set of edge files (or input streams), the files can be split into multiple chunks and loaded into the database in parallel. The number of chunks is determined by the degree of parallelism (DOP) specified by the user.
Parallelism is achieved with Splitter threads that split the vertex and edge flat files into multiple chunks, and Loader threads that load each chunk into the database using separate database connections. Java pipes connect the Splitter and Loader threads: each Splitter writes to a PipedOutputStream, and the corresponding Loader reads from a PipedInputStream.
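To make that pattern concrete, here is a small self-contained sketch of the same piped-stream technique using only standard Java classes; it illustrates the mechanism and is not the Oracle loader code itself. A producer ("splitter") thread writes records to a PipedOutputStream while a consumer ("loader") thread reads them from the connected PipedInputStream:

import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;

public class PipeSketch {
  public static void main(String[] args) throws Exception {
    final PipedOutputStream out = new PipedOutputStream();  // splitter side
    final PipedInputStream in = new PipedInputStream(out);  // loader side

    Thread splitter = new Thread(new Runnable() {
      public void run() {
        try {
          // Write a chunk of (fictional) vertex records, then close the pipe
          out.write("1,name,1,Alice,,\n2,name,1,Bob,,\n".getBytes("UTF-8"));
          out.close();
        } catch (IOException e) {
          e.printStackTrace();
        }
      }
    });

    Thread loader = new Thread(new Runnable() {
      public void run() {
        try {
          int b;
          while ((b = in.read()) != -1) { // blocks until the splitter writes
            System.out.print((char) b);   // a real loader would parse and insert
          }
          in.close();
        } catch (IOException e) {
          e.printStackTrace();
        }
      }
    });

    splitter.start();
    loader.start();
    splitter.join();
    loader.join();
  }
}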
The simplest use of the data loading API specifies a property graph instance, one vertex file, one edge file, and a DOP.
The following example loads graph data stored in a vertex file and an edge file in the optimized Oracle flat file format, and executes the load with a degree of parallelism of 48.
opgdl = OraclePropertyGraphDataLoader.getInstance();
vfile = "../../data/connections.opv";
efile = "../../data/connections.ope";
opgdl.loadData(opg, vfile, efile, 48);
The data loading API also supports loading the data into the database using multiple partitions. This API requires the property graph, the vertex file, the edge file, the DOP, the total number of partitions, and the partition offset (from 0 to the total number of partitions minus 1). For example, to load the data using two partitions, the partition offsets should be 0 and 1. That is, there should be two data loading API calls to fully load the graph, and the only difference between the two calls is the partition offset (0 and 1).
The following code fragment loads the graph data using 4 partitions. Each call to the data loader can be processed using a separate Java client, on a single system or from multiple systems.
OraclePropertyGraph opg = OraclePropertyGraph.getInstance(
    args, szGraphName);

int totalPartitions = 4;
int dop = 32; // degree of parallelism for each client.

String szOPVFile = "../../data/connections.opv";
String szOPEFile = "../../data/connections.ope";

SimpleLogBasedDataLoaderListenerImpl dll
    = SimpleLogBasedDataLoaderListenerImpl.getInstance(100 /* frequency */,
                                                       true /* continue on error */);

// Run the data loading using 4 partitions (each call can be run from a
// separate Java client)

// Partition 1
OraclePropertyGraphDataLoader opgdlP1 = OraclePropertyGraphDataLoader.getInstance();
opgdlP1.loadData(opg, szOPVFile, szOPEFile, dop,
                 4 /* total number of partitions, default 1 */,
                 0 /* partition to load (from 0 to totalPartitions - 1), default 0 */,
                 dll);

// Partition 2
OraclePropertyGraphDataLoader opgdlP2 = OraclePropertyGraphDataLoader.getInstance();
opgdlP2.loadData(opg, szOPVFile, szOPEFile, dop,
                 4 /* total number of partitions, default 1 */,
                 1 /* partition to load (from 0 to totalPartitions - 1), default 0 */,
                 dll);

// Partition 3
OraclePropertyGraphDataLoader opgdlP3 = OraclePropertyGraphDataLoader.getInstance();
opgdlP3.loadData(opg, szOPVFile, szOPEFile, dop,
                 4 /* total number of partitions, default 1 */,
                 2 /* partition to load (from 0 to totalPartitions - 1), default 0 */,
                 dll);

// Partition 4
OraclePropertyGraphDataLoader opgdlP4 = OraclePropertyGraphDataLoader.getInstance();
opgdlP4.loadData(opg, szOPVFile, szOPEFile, dop,
                 4 /* total number of partitions, default 1 */,
                 3 /* partition to load (from 0 to totalPartitions - 1), default 0 */,
                 dll);
The data loading API also supports fine-tuning which lines in the source vertex and edge files are loaded. You can specify an offset line number and a maximum line number for vertices (and, separately, for edges). Data is loaded from the offset line number up to the maximum line number. If the maximum line number is -1, the loading process scans the data until it reaches the end of the file.
The following code fragment loads the graph data using fine-tuning.
OraclePropertyGraph opg = OraclePropertyGraph.getInstance(
    args, szGraphName);

int dop = 32; // degree of parallelism for each client.

String szOPVFile = "../../data/connections.opv";
String szOPEFile = "../../data/connections.ope";

SimpleLogBasedDataLoaderListenerImpl dll
    = SimpleLogBasedDataLoaderListenerImpl.getInstance(100 /* frequency */,
                                                       true /* continue on error */);

// Run the data loading using fine-tuning
long lVertexOffsetLines = 0;
long lEdgeOffsetLines = 0;
long lVertexMaxLines = 100;
long lEdgeMaxLines = 100;
int totalPartitions = 1;
int idPartition = 0;

OraclePropertyGraphDataLoader opgdl = OraclePropertyGraphDataLoader.getInstance();
opgdl.loadData(opg, szOPVFile, szOPEFile,
               lVertexOffsetLines /* offset of lines to start loading
                                     from partition, default 0 */,
               lEdgeOffsetLines /* offset of lines to start loading
                                   from partition, default 0 */,
               lVertexMaxLines /* maximum number of lines to load
                                  from partition, default -1 (all lines in partition) */,
               lEdgeMaxLines /* maximum number of lines to load
                                from partition, default -1 (all lines in partition) */,
               dop,
               totalPartitions /* total number of partitions, default 1 */,
               idPartition /* partition to load (from 0 to totalPartitions - 1),
                              default 0 */,
               dll);
Oracle Big Data Spatial and Graph also supports loading multiple vertex files and multiple edge files into the database. The given vertex files will be split into DOP chunks and loaded into the database in parallel using DOP threads. Similarly, the edge files will be split and loaded in parallel.
The following code fragment loads multiple vertex and edge files using the parallel data loading APIs. In the example, two string arrays, szOPVFiles and szOPEFiles, hold the input files. Although only one vertex file and one edge file are used in this example, you can supply multiple vertex files and multiple edge files in these two arrays.
OraclePropertyGraph opg = OraclePropertyGraph.getInstance(
args, szGraphName);
String[] szOPVFiles = new String[] {"../../data/connections.opv"};
String[] szOPEFiles = new String[] {"../../data/connections.ope"};
// Clear existing vertices/edges in the property graph
opg.clearRepository();
opg.setQueueSize(100); // 100 elements
// This object will handle parallel data loading over the property graph
OraclePropertyGraphDataLoader opgdl = OraclePropertyGraphDataLoader.getInstance();
opgdl.loadData(opg, szOPVFiles, szOPEFiles, dop);
System.out.println("Total vertices: " + opg.countVertices());
System.out.println("Total edges: " + opg.countEdges());
The parallel property graph query provides a simple Java API for performing parallel scans on vertices (or edges). Parallel retrieval is an optimized solution that takes advantage of the distribution of the data among splits in the back-end database, so each split is queried using a separate database connection.
Parallel retrieval produces an array in which each element holds all the vertices (or edges) from a specific split. The subset of splits queried is determined by the given start split ID and the size of the connections array provided, so the subset covers the splits in the range [start, start - 1 + size of connections array]. For example, with a start split ID of 4 and a connections array of size 2, splits 4 and 5 are queried. Note that an integer ID (in the range [0, N - 1]) is assigned to each of the splits in a vertex table with N splits.
The following code loads a property graph stored in Apache HBase, opens an array of connections, and executes a parallel query to retrieve all vertices and edges using the opened connections. The number of calls to the getVerticesPartitioned (getEdgesPartitioned) method is controlled by the total number of splits and the number of connections used.
OraclePropertyGraph opg = OraclePropertyGraph.getInstance(
    args, szGraphName);

// Clear existing vertices/edges in the property graph
opg.clearRepository();

String szOPVFile = "../../data/connections.opv";
String szOPEFile = "../../data/connections.ope";

// This object will handle parallel data loading
OraclePropertyGraphDataLoader opgdl = OraclePropertyGraphDataLoader.getInstance();
opgdl.loadData(opg, szOPVFile, szOPEFile, dop);

// Create connections used in parallel query
HConnection[] hConns = new HConnection[dop];
for (int i = 0; i < dop; i++) {
  Configuration conf_new = HBaseConfiguration.create(opg.getConfiguration());
  hConns[i] = HConnectionManager.createConnection(conf_new);
}

long lCountV = 0;
// Iterate over all the vertices' splits to count all the vertices
for (int split = 0; split < opg.getVertexTableSplits(); split += dop) {
  Iterable<Vertex>[] iterables
      = opg.getVerticesPartitioned(hConns /* connection array */,
                                   true /* skip store to cache */,
                                   split /* starting split */);
  lCountV += consumeIterables(iterables); /* consume iterables using threads */
}

// Count all vertices
System.out.println("Vertices found using parallel query: " + lCountV);

long lCountE = 0;
// Iterate over all the edges' splits to count all the edges
for (int split = 0; split < opg.getEdgeTableSplits(); split += dop) {
  Iterable<Edge>[] iterables
      = opg.getEdgesPartitioned(hConns /* connection array */,
                                true /* skip store to cache */,
                                split /* starting split */);
  lCountE += consumeIterables(iterables); /* consume iterables using threads */
}

// Count all edges
System.out.println("Edges found using parallel query: " + lCountE);

// Close the connections to the database after completion
for (int idx = 0; idx < hConns.length; idx++) {
  hConns[idx].close();
}
Oracle Big Data Spatial and Graph provides support for easy subgraph extraction using user-defined element filter callbacks. An element filter callback defines a set of conditions that a vertex (or an edge) must meet in order to keep it in the subgraph. Users can define their own element filtering by implementing the VertexFilterCallback and EdgeFilterCallback API interfaces.
The following code fragment implements a VertexFilterCallback that keeps only vertices from the United States that do not have a political role.
/**
 * VertexFilterCallback to retrieve a vertex from the United States
 * that does not have a political role
 */
private static class NonPoliticianFilterCallback
  implements VertexFilterCallback
{
  @Override
  public boolean keepVertex(OracleVertexBase vertex)
  {
    String country = vertex.getProperty("country");
    String role = vertex.getProperty("role");

    if (country != null && country.equals("United States")) {
      if (role == null || !role.toLowerCase().contains("political")) {
        return true;
      }
    }

    return false;
  }

  public static NonPoliticianFilterCallback getInstance()
  {
    return new NonPoliticianFilterCallback();
  }
}
The following code fragment implements an EdgeFilterCallback that uses the VertexFilterCallback to keep only edges connected to the given input vertex whose other endpoints are non-politicians from the United States.
/**
 * EdgeFilterCallback to retrieve all edges connected to an input
 * vertex with "collaborates" label, and whose vertex is from the
 * United States with a role different than political
 */
private static class CollaboratorsFilterCallback
  implements EdgeFilterCallback
{
  private VertexFilterCallback m_vfc;
  private Vertex m_startV;

  public CollaboratorsFilterCallback(VertexFilterCallback vfc, Vertex v)
  {
    m_vfc = vfc;
    m_startV = v;
  }

  @Override
  public boolean keepEdge(OracleEdgeBase edge)
  {
    if ("collaborates".equals(edge.getLabel())) {
      if (edge.getVertex(Direction.IN).equals(m_startV)
          && m_vfc.keepVertex((OracleVertex) edge.getVertex(Direction.OUT))) {
        return true;
      }
      else if (edge.getVertex(Direction.OUT).equals(m_startV)
               && m_vfc.keepVertex((OracleVertex) edge.getVertex(Direction.IN))) {
        return true;
      }
    }

    return false;
  }

  public static CollaboratorsFilterCallback getInstance(VertexFilterCallback vfc,
                                                        Vertex v)
  {
    return new CollaboratorsFilterCallback(vfc, v);
  }
}
Using the filter callbacks defined previously, the following code fragment loads a property graph, creates instances of the filter callbacks, and then gets all of Barack Obama's collaborators who are not politicians and come from the United States.
OraclePropertyGraph opg = OraclePropertyGraph.getInstance(
    args, szGraphName);

// Clear existing vertices/edges in the property graph
opg.clearRepository();

String szOPVFile = "../../data/connections.opv";
String szOPEFile = "../../data/connections.ope";

// This object will handle parallel data loading
OraclePropertyGraphDataLoader opgdl = OraclePropertyGraphDataLoader.getInstance();
opgdl.loadData(opg, szOPVFile, szOPEFile, dop);

// VertexFilterCallback to retrieve all people from the United States
// who are not politicians
NonPoliticianFilterCallback npvfc = NonPoliticianFilterCallback.getInstance();

// Initial vertex: Barack Obama
Vertex v = opg.getVertices("name", "Barack Obama").iterator().next();

// EdgeFilterCallback to retrieve all collaborators of Barack Obama
// from the United States who are not politicians
CollaboratorsFilterCallback cefc = CollaboratorsFilterCallback.getInstance(npvfc, v);

Iterable<Edge> obamaCollabs
    = opg.getEdges((String[]) null /* match any of the properties */,
                   cefc /* match the EdgeFilterCallback */);
Iterator<Edge> iter = obamaCollabs.iterator();

System.out.println("\n\n--------Collaborators of Barack Obama from "
                   + " the US and non-politician\n\n");
long countV = 0;
while (iter.hasNext()) {
  Edge edge = iter.next(); // get the edge

  // check if Obama is the IN vertex
  if (edge.getVertex(Direction.IN).equals(v)) {
    System.out.println(edge.getVertex(Direction.OUT) + "(Edge ID: "
                       + edge.getId() + ")"); // get out vertex
  }
  else {
    System.out.println(edge.getVertex(Direction.IN) + "(Edge ID: "
                       + edge.getId() + ")"); // get in vertex
  }

  countV++;
}
By default, all reading operations such as get all vertices and get all edges (and their parallel variants) use the filter callbacks associated with the property graph through the methods opg.setVertexFilterCallback(vfc) and opg.setEdgeFilterCallback(efc). If no filter callback is set, then all the vertices (or edges) are retrieved.
The following code fragment uses the default edge filter callback set on the property graph to retrieve the edges.
// VertexFilterCallback to retrieve all people from the United States
// who are not politicians
NonPoliticianFilterCallback npvfc = NonPoliticianFilterCallback.getInstance();

// Initial vertex: Barack Obama
Vertex v = opg.getVertices("name", "Barack Obama").iterator().next();

// EdgeFilterCallback to retrieve all collaborators of Barack Obama
// from the United States who are not politicians
CollaboratorsFilterCallback cefc = CollaboratorsFilterCallback.getInstance(npvfc, v);

opg.setEdgeFilterCallback(cefc);

Iterable<Edge> obamaCollabs = opg.getEdges();
Iterator<Edge> iter = obamaCollabs.iterator();

System.out.println("\n\n--------Collaborators of Barack Obama from "
                   + " the US and non-politician\n\n");
long countV = 0;
while (iter.hasNext()) {
  Edge edge = iter.next(); // get the edge

  // check if Obama is the IN vertex
  if (edge.getVertex(Direction.IN).equals(v)) {
    System.out.println(edge.getVertex(Direction.OUT) + "(Edge ID: "
                       + edge.getId() + ")"); // get out vertex
  }
  else {
    System.out.println(edge.getVertex(Direction.IN) + "(Edge ID: "
                       + edge.getId() + ")"); // get in vertex
  }

  countV++;
}
Oracle Big Data Spatial and Graph provides support for optimization flags to improve graph iteration performance. Optimization flags allow vertices (or edges) to be processed as objects with no or minimal information, such as the ID, label, and incoming/outgoing vertices. This reduces the time required to process each vertex (or edge) during iteration.
The following table shows the optimization flags available when processing vertices (or edges) in a property graph.
Optimization Flag | Description
DO_NOT_CREATE_OBJECT | Use a predefined constant object when processing vertices or edges.
JUST_EDGE_ID | Construct edge objects with ID only when processing edges.
JUST_LABEL_EDGE_ID | Construct edge objects with ID and label only when processing edges.
JUST_LABEL_VERTEX_EDGE_ID | Construct edge objects with ID, label, and in/out vertex IDs only when processing edges.
JUST_VERTEX_EDGE_ID | Construct edge objects with ID and in/out vertex IDs only when processing edges.
JUST_VERTEX_ID | Construct vertex objects with ID only when processing vertices.
The following code fragment uses a set of optimization flags to retrieve only the IDs of the vertices and edges in the property graph. The objects retrieved by reading all vertices and edges include only the IDs, with no key/value properties or additional information.
import oracle.pg.common.OraclePropertyGraphBase.OptimizationFlag;

OraclePropertyGraph opg = OraclePropertyGraph.getInstance(
    args, szGraphName);

// Clear existing vertices/edges in the property graph
opg.clearRepository();

String szOPVFile = "../../data/connections.opv";
String szOPEFile = "../../data/connections.ope";

// This object will handle parallel data loading
OraclePropertyGraphDataLoader opgdl = OraclePropertyGraphDataLoader.getInstance();
opgdl.loadData(opg, szOPVFile, szOPEFile, dop);

// Optimization flag to retrieve only vertex IDs
OptimizationFlag optFlagVertex = OptimizationFlag.JUST_VERTEX_ID;

// Optimization flag to retrieve only edge IDs
OptimizationFlag optFlagEdge = OptimizationFlag.JUST_EDGE_ID;

// Print all vertices
Iterator<Vertex> vertices
    = opg.getVertices((String[]) null /* match any of the properties */,
                      null /* match the VertexFilterCallback */,
                      optFlagVertex /* optimization flag */).iterator();

System.out.println("----- Vertices IDs -----");
long vCount = 0;
while (vertices.hasNext()) {
  OracleVertex v = (OracleVertex) vertices.next();
  System.out.println((Long) v.getId());
  vCount++;
}
System.out.println("Vertices found: " + vCount);

// Print all edges
Iterator<Edge> edges
    = opg.getEdges((String[]) null /* match any of the properties */,
                   null /* match the EdgeFilterCallback */,
                   optFlagEdge /* optimization flag */).iterator();

System.out.println("----- Edges -----");
long eCount = 0;
while (edges.hasNext()) {
  Edge e = edges.next();
  System.out.println((Long) e.getId());
  eCount++;
}
System.out.println("Edges found: " + eCount);
By default, all reading operations such as get all vertices and get all edges (and their parallel variants) use the optimization flags associated with the property graph through the methods opg.setDefaultVertexOptFlag(optFlagVertex) and opg.setDefaultEdgeOptFlag(optFlagEdge). If the optimization flags for processing vertices and edges are not defined, then all the information about the vertices and edges is retrieved.
The following code fragment uses the default optimization flags set on the property graph to retrieve only the IDs of its vertices and edges.
import oracle.pg.common.OraclePropertyGraphBase.OptimizationFlag;

// Optimization flag to retrieve only vertex IDs
OptimizationFlag optFlagVertex = OptimizationFlag.JUST_VERTEX_ID;

// Optimization flag to retrieve only edge IDs
OptimizationFlag optFlagEdge = OptimizationFlag.JUST_EDGE_ID;

opg.setDefaultVertexOptFlag(optFlagVertex);
opg.setDefaultEdgeOptFlag(optFlagEdge);

// Print all vertices
Iterator<Vertex> vertices = opg.getVertices().iterator();
System.out.println("----- Vertices IDs -----");
long vCount = 0;
while (vertices.hasNext()) {
  OracleVertex v = (OracleVertex) vertices.next();
  System.out.println((Long) v.getId());
  vCount++;
}
System.out.println("Vertices found: " + vCount);

// Print all edges
Iterator<Edge> edges = opg.getEdges().iterator();
System.out.println("----- Edges -----");
long eCount = 0;
while (edges.hasNext()) {
  Edge e = edges.next();
  System.out.println((Long) e.getId());
  eCount++;
}
System.out.println("Edges found: " + eCount);
Oracle Big Data Spatial and Graph supports updating attributes (key/value pairs) on a subgraph of vertices and/or edges by using a user-customized operation callback. An operation callback defines a set of conditions that a vertex (or an edge) must meet in order to update it (either adding or removing the given attribute and value).
You can define your own attribute operations by implementing the VertexOpCallback and EdgeOpCallback API interfaces. You must override the needOp method, which defines the conditions to be satisfied by the vertices (or edges) to be included in the update operation, as well as the getAttributeKeyName and getAttributeKeyValue methods, which return the key name and value, respectively, to be used when updating the elements.
The following code fragment implements a VertexOpCallback that operates over the obamaCollaborator attribute, associated only with Barack Obama's collaborators. The value of this property is chosen based on the role of each collaborator.
private static class CollaboratorsVertexOpCallback
  implements VertexOpCallback
{
  private OracleVertexBase m_obama;
  private List<Vertex> m_obamaCollaborators;

  public CollaboratorsVertexOpCallback(OraclePropertyGraph opg)
  {
    // Get a list of Barack Obama's collaborators
    m_obama = (OracleVertexBase) opg.getVertices("name", "Barack Obama")
                                    .iterator().next();

    Iterable<Vertex> iter = m_obama.getVertices(Direction.BOTH, "collaborates");
    m_obamaCollaborators = OraclePropertyGraphUtils.listify(iter);
  }

  public static CollaboratorsVertexOpCallback getInstance(OraclePropertyGraph opg)
  {
    return new CollaboratorsVertexOpCallback(opg);
  }

  /**
   * Add attribute if and only if the vertex is a collaborator of Barack Obama
   */
  @Override
  public boolean needOp(OracleVertexBase v)
  {
    return m_obamaCollaborators != null && m_obamaCollaborators.contains(v);
  }

  @Override
  public String getAttributeKeyName(OracleVertexBase v)
  {
    return "obamaCollaborator";
  }

  /**
   * Define the property's value based on the vertex role
   */
  @Override
  public Object getAttributeKeyValue(OracleVertexBase v)
  {
    String role = v.getProperty("role");
    role = role.toLowerCase();

    if (role.contains("political")) {
      return "political";
    }
    else if (role.contains("actor") || role.contains("singer")
             || role.contains("actress") || role.contains("writer")
             || role.contains("producer") || role.contains("director")) {
      return "arts";
    }
    else if (role.contains("player")) {
      return "sports";
    }
    else if (role.contains("journalist")) {
      return "journalism";
    }
    else if (role.contains("business") || role.contains("economist")) {
      return "business";
    }
    else if (role.contains("philanthropist")) {
      return "philanthropy";
    }

    return " ";
  }
}
The following code fragment implements an EdgeOpCallback that operates over the obamaFeud attribute, associated only with Barack Obama's feuds. The value of this property is chosen based on the role of the other vertex in the feud.
private static class FeudsEdgeOpCallback
  implements EdgeOpCallback
{
  private OracleVertexBase m_obama;
  private List<Edge> m_obamaFeuds;

  public FeudsEdgeOpCallback(OraclePropertyGraph opg)
  {
    // Get a list of Barack Obama's feuds
    m_obama = (OracleVertexBase) opg.getVertices("name", "Barack Obama")
                                    .iterator().next();

    Iterable<Edge> iter = m_obama.getEdges(Direction.BOTH, "feuds");
    m_obamaFeuds = OraclePropertyGraphUtils.listify(iter);
  }

  public static FeudsEdgeOpCallback getInstance(OraclePropertyGraph opg)
  {
    return new FeudsEdgeOpCallback(opg);
  }

  /**
   * Add attribute if and only if the edge is in the list of Barack Obama's
   * feuds
   */
  @Override
  public boolean needOp(OracleEdgeBase e)
  {
    return m_obamaFeuds != null && m_obamaFeuds.contains(e);
  }

  @Override
  public String getAttributeKeyName(OracleEdgeBase e)
  {
    return "obamaFeud";
  }

  /**
   * Define the property's value based on the in/out vertex role
   */
  @Override
  public Object getAttributeKeyValue(OracleEdgeBase e)
  {
    OracleVertexBase v = (OracleVertexBase) e.getVertex(Direction.IN);
    if (m_obama.equals(v)) {
      v = (OracleVertexBase) e.getVertex(Direction.OUT);
    }

    String role = v.getProperty("role");
    role = role.toLowerCase();

    if (role.contains("political")) {
      return "political";
    }
    else if (role.contains("actor") || role.contains("singer")
             || role.contains("actress") || role.contains("writer")
             || role.contains("producer") || role.contains("director")) {
      return "arts";
    }
    else if (role.contains("journalist")) {
      return "journalism";
    }
    else if (role.contains("player")) {
      return "sports";
    }
    else if (role.contains("business") || role.contains("economist")) {
      return "business";
    }
    else if (role.contains("philanthropist")) {
      return "philanthropy";
    }

    return " ";
  }
}
Using the operation callbacks defined previously, the following code fragment loads a property graph, creates an instance of each operation callback, and then adds the attributes to the pertinent vertices and edges using the addAttributeToAllVertices and addAttributeToAllEdges methods in OraclePropertyGraph.
OraclePropertyGraph opg = OraclePropertyGraph.getInstance(
    args, szGraphName);

// Clear existing vertices/edges in the property graph
opg.clearRepository();

String szOPVFile = "../../data/connections.opv";
String szOPEFile = "../../data/connections.ope";

// This object will handle parallel data loading
OraclePropertyGraphDataLoader opgdl = OraclePropertyGraphDataLoader.getInstance();
opgdl.loadData(opg, szOPVFile, szOPEFile, dop);

// Create the vertex operation callback
CollaboratorsVertexOpCallback cvoc = CollaboratorsVertexOpCallback.getInstance(opg);

// Add attribute to all people collaborating with Obama based on their role
opg.addAttributeToAllVertices(cvoc, true /* skip store to cache */, dop);

// Look up all collaborators of Obama
Iterable<Vertex> collaborators = opg.getVertices("obamaCollaborator", "political");
System.out.println("Political collaborators of Barack Obama "
                   + getVerticesAsString(collaborators));

collaborators = opg.getVertices("obamaCollaborator", "business");
System.out.println("Business collaborators of Barack Obama "
                   + getVerticesAsString(collaborators));

// Add an attribute to all people having a feud with Barack Obama to set
// the type of relation they have
FeudsEdgeOpCallback feoc = FeudsEdgeOpCallback.getInstance(opg);
opg.addAttributeToAllEdges(feoc, true /* skip store to cache */, dop);

// Look up all feuds of Obama
Iterable<Edge> feuds = opg.getEdges("obamaFeud", "political");
System.out.println("\n\nPolitical feuds of Barack Obama " + getEdgesAsString(feuds));

feuds = opg.getEdges("obamaFeud", "business");
System.out.println("Business feuds of Barack Obama " + getEdgesAsString(feuds));
The following code fragment defines an implementation of VertexOpCallback that can be used to remove vertices having the value philanthropy for the attribute obamaCollaborator, and then calls the removeAttributeFromAllVertices API. It also defines an implementation of EdgeOpCallback that can be used to remove edges having the value business for the attribute obamaFeud, and then calls the removeAttributeFromAllEdges API.
System.out.println("\n\nRemove 'obamaCollaborator' property from all the" + "philanthropy collaborators"); PhilanthropyCollaboratorsVertexOpCallback pvoc = PhilanthropyCollaboratorsVertexOpCallback.getInstance(); opg.removeAttributeFromAllVertices(pvoc); System.out.println("\n\nRemove 'obamaFeud' property from all the" + "business feuds"); BusinessFeudsEdgeOpCallback beoc = BusinessFeudsEdgeOpCallback.getInstance(); opg.removeAttributeFromAllEdges(beoc); /** * Implementation of a EdgeOpCallback to remove the "obamaCollaborators" * property from all people collaborating with Barack Obama that have a * philanthropy role */ private static class PhilanthropyCollaboratorsVertexOpCallback implements VertexOpCallback { public static PhilanthropyCollaboratorsVertexOpCallback getInstance() { return new PhilanthropyCollaboratorsVertexOpCallback(); } /** * Remove attribute if and only if the property value for * obamaCollaborator is Philanthropy */ @Override public boolean needOp(OracleVertexBase v) { String type = v.getProperty("obamaCollaborator"); return type != null && type.equals("philanthropy"); } @Override public String getAttributeKeyName(OracleVertexBase v) { return "obamaCollaborator"; } /** * Define the property's value. In this case can be empty */ @Override public Object getAttributeKeyValue(OracleVertexBase v) { return " "; } } /** * Implementation of a EdgeOpCallback to remove the "obamaFeud" property * from all connections in a feud with Barack Obama that have a business role */ private static class BusinessFeudsEdgeOpCallback implements EdgeOpCallback { public static BusinessFeudsEdgeOpCallback getInstance() { return new BusinessFeudsEdgeOpCallback(); } /** * Remove attribute if and only if the property value for obamaFeud is * business */ @Override public boolean needOp(OracleEdgeBase e) { String type = e.getProperty("obamaFeud"); return type != null && type.equals("business"); } @Override public String getAttributeKeyName(OracleEdgeBase e) { return "obamaFeud"; } /** * Define the property's value. In this case can be empty */ @Override public Object getAttributeKeyValue(OracleEdgeBase e) { return " "; } }
You can get graph metadata and statistics, such as all graph names in the database and, for each graph, the minimum/maximum vertex ID, the minimum/maximum edge ID, the vertex property names, the edge property names, and the number of splits in the graph's vertex and edge tables (which support parallel table scans).
The following code fragment gets the metadata and statistics of the existing property graphs stored in the back-end database (either Oracle NoSQL Database or Apache HBase). The arguments required vary for each database.
// Get all graph names in the database
List<String> graphNames = OraclePropertyGraphUtils.getGraphNames(dbArgs);

for (String graphName : graphNames) {
  OraclePropertyGraph opg = OraclePropertyGraph.getInstance(args, graphName);

  System.err.println("\n Graph name: " + graphName);
  System.err.println(" Total vertices: " + opg.countVertices(dop));
  System.err.println(" Minimum Vertex ID: " + opg.getMinVertexID(dop));
  System.err.println(" Maximum Vertex ID: " + opg.getMaxVertexID(dop));

  Set<String> propertyNamesV = new HashSet<String>();
  opg.getVertexPropertyNames(dop, 0 /* timeout, 0 = no timeout */, propertyNamesV);

  System.err.println(" Vertices property names: "
                     + getPropertyNamesAsString(propertyNamesV));

  System.err.println("\n\n Total edges: " + opg.countEdges(dop));
  System.err.println(" Minimum Edge ID: " + opg.getMinEdgeID(dop));
  System.err.println(" Maximum Edge ID: " + opg.getMaxEdgeID(dop));

  Set<String> propertyNamesE = new HashSet<String>();
  opg.getEdgePropertyNames(dop, 0 /* timeout, 0 = no timeout */, propertyNamesE);

  System.err.println(" Edge property names: "
                     + getPropertyNamesAsString(propertyNamesE));

  System.err.println("\n\n Table Information: ");
  System.err.println("Vertex table number of splits: " + opg.getVertexTableSplits());
  System.err.println("Edge table number of splits: " + opg.getEdgeTableSplits());
}
When working with a property graph, use these Oracle property graph classes to open and close the property graph instance properly:
OraclePropertyGraph.getInstance: Opens an instance of an Oracle property graph. This method has two parameters, the connection information and the graph name. The format of the connection information depends on whether you use HBase or Oracle NoSQL Database as the backend database.
OraclePropertyGraph.clearRepository: Removes all vertices and edges from the property graph instance.
OraclePropertyGraph.shutdown: Closes the graph instance.
In addition, you must use the appropriate classes from the Oracle NoSQL Database or HBase APIs.
For Oracle NoSQL Database, the OraclePropertyGraph.getInstance method uses the KV store name, host computer name, and port number for the connection:
String kvHostPort = "cluster02:5000";
String kvStoreName = "kvstore";
String kvGraphName = "my_graph";

// Use NoSQL Java API
KVStoreConfig kvconfig = new KVStoreConfig(kvStoreName, kvHostPort);

OraclePropertyGraph opg = OraclePropertyGraph.getInstance(kvconfig, kvGraphName);
opg.clearRepository();
// .
// . Graph description
// .
// Close the graph instance
opg.shutdown();
If the in-memory analyst functions are required for your application, then it is recommended that you use GraphConfigBuilder to create a graph config for Oracle NoSQL Database, and then instantiate OraclePropertyGraph with the config as an argument.
As an example, the following code snippet constructs a graph config, gets an OraclePropertyGraph instance, loads some data into that graph, and gets an in-memory analyst.
import oracle.pgx.config.*;
import oracle.pgx.api.*;
import oracle.pgx.common.types.*;

...

String[] hhosts = new String[1];
hhosts[0] = "my_host_name:5000"; // need customization
String szStoreName = "kvstore";  // need customization
String szGraphName = "my_graph";
int dop = 8;

PgNosqlGraphConfig cfg = GraphConfigBuilder.forPropertyGraphNosql()
    .setName(szGraphName)
    .setHosts(Arrays.asList(hhosts))
    .setStoreName(szStoreName)
    .addEdgeProperty("lbl", PropertyType.STRING, "lbl")
    .addEdgeProperty("weight", PropertyType.DOUBLE, "1000000")
    .build();

OraclePropertyGraph opg = OraclePropertyGraph.getInstance(cfg);

String szOPVFile = "../../data/connections.opv";
String szOPEFile = "../../data/connections.ope";

// perform a parallel data load
OraclePropertyGraphDataLoader opgdl = OraclePropertyGraphDataLoader.getInstance();
opgdl.loadData(opg, szOPVFile, szOPEFile, dop);

...

PgxSession session = Pgx.createSession("session-id-1");
PgxGraph g = session.readGraphWithProperties(cfg);
Analyst analyst = session.createAnalyst();

...
For Apache HBase, the OraclePropertyGraph.getInstance method uses the Hadoop nodes and the Apache HBase port number for the connection:
String hbQuorum = "bda01node01.example.com, bda01node02.example.com, bda01node03.example.com";
String hbClientPort = "2181";
String hbGraphName = "my_graph";

// Use HBase Java APIs
Configuration conf = HBaseConfiguration.create();
conf.set("hbase.zookeeper.quorum", hbQuorum);
conf.set("hbase.zookeeper.property.clientPort", hbClientPort);
HConnection conn = HConnectionManager.createConnection(conf);

// Open the property graph
OraclePropertyGraph opg = OraclePropertyGraph.getInstance(conf, conn, hbGraphName);
opg.clearRepository();
// .
// . Graph description
// .
// Close the graph instance
opg.shutdown();

// Close the HBase connection
conn.close();
If the in-memory analyst functions are required for your application, then it is recommended that you use GraphConfigBuilder to create a graph config, and then instantiate OraclePropertyGraph with the config as an argument.
As an example, the following code snippet sets the configuration for in-memory analytics, constructs a graph config for Apache HBase, instantiates an OraclePropertyGraph instance, gets an in-memory analyst, and counts the number of triangles in the graph.
confPgx = new HashMap<PgxConfig.Field, Object>();
confPgx.put(PgxConfig.Field.ENABLE_GM_COMPILER, false);
confPgx.put(PgxConfig.Field.NUM_WORKERS_IO, dop + 2);
confPgx.put(PgxConfig.Field.NUM_WORKERS_ANALYSIS, 8); // <= # of physical cores
confPgx.put(PgxConfig.Field.NUM_WORKERS_FAST_TRACK_ANALYSIS, 2);
confPgx.put(PgxConfig.Field.SESSION_TASK_TIMEOUT_SECS, 0); // no timeout set
confPgx.put(PgxConfig.Field.SESSION_IDLE_TIMEOUT_SECS, 0); // no timeout set

ServerInstance instance = Pgx.getInstance();
instance.startEngine(confPgx);

int iClientPort = Integer.parseInt(hbClientPort);
int splitsPerRegion = 2;

PgHbaseGraphConfig cfg = GraphConfigBuilder.forPropertyGraphHbase()
    .setName(hbGraphName)
    .setZkQuorum(hbQuorum)
    .setZkClientPort(iClientPort)
    .setZkSessionTimeout(60000)
    .setMaxNumConnections(dop)
    .setSplitsPerRegion(splitsPerRegion)
    .addEdgeProperty("lbl", PropertyType.STRING, "lbl")
    .addEdgeProperty("weight", PropertyType.DOUBLE, "1000000")
    .build();

PgxSession session = Pgx.createSession("session-id-1");
PgxGraph g = session.readGraphWithProperties(cfg);
Analyst analyst = session.createAnalyst();

long triangles = analyst.countTriangles(g, false);
To create a vertex, use these Oracle Property Graph methods:
OraclePropertyGraph.addVertex: Adds a vertex instance to a graph.
OracleVertex.setProperty: Assigns a key-value property to a vertex.
OraclePropertyGraph.commit: Saves all changes to the property graph instance.
The following code fragment creates two vertices named V1 and V2, with properties for age, name, weight, height, and a female flag, in the opg property graph instance. The v1 properties set the data types explicitly.
// Create vertex v1 and assign it properties as key-value pairs
Vertex v1 = opg.addVertex(1l);
v1.setProperty("age", Integer.valueOf(31));
v1.setProperty("name", "Alice");
v1.setProperty("weight", Float.valueOf(135.0f));
v1.setProperty("height", Double.valueOf(64.5d));
v1.setProperty("female", Boolean.TRUE);

Vertex v2 = opg.addVertex(2l);
v2.setProperty("age", 27);
v2.setProperty("name", "Bob");
v2.setProperty("weight", Float.valueOf(156.0f));
v2.setProperty("height", Double.valueOf(69.5d));
v2.setProperty("female", Boolean.FALSE);
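The fragment above does not persist the changes on its own; as the method list earlier in this section notes, a final call to OraclePropertyGraph.commit saves them:

opg.commit(); // save the new vertices and their properties to the property graph instance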
To create an edge, use these Oracle Property Graph methods:
OraclePropertyGraph.addEdge: Adds an edge instance to a graph.
OracleEdge.setProperty: Assigns a key-value property to an edge.
The following code fragment creates two vertices (v1 and v2) and one edge (e1).
// Add vertices v1 and v2
Vertex v1 = opg.addVertex(1l);
v1.setProperty("name", "Alice");
v1.setProperty("age", 31);

Vertex v2 = opg.addVertex(2l);
v2.setProperty("name", "Bob");
v2.setProperty("age", 27);

// Add edge e1
Edge e1 = opg.addEdge(1l, v1, v2, "knows");
e1.setProperty("type", "friends");
You can remove vertex and edge instances individually, or all of them simultaneously. Use these methods:
OraclePropertyGraph.removeEdge: Removes the specified edge from the graph.
OraclePropertyGraph.removeVertex: Removes the specified vertex from the graph.
OraclePropertyGraph.clearRepository: Removes all vertices and edges from the property graph instance.
The following code fragment removes edge e1 and vertex v1 from the graph instance. When a vertex is removed, its adjacent edges are also deleted from the graph, because every edge must have a beginning and an ending vertex; after either endpoint is removed, the edge is no longer valid.
// Remove edge e1
opg.removeEdge(e1);

// Remove vertex v1
opg.removeVertex(v1);
The OraclePropertyGraph.clearRepository method can be used to remove all contents from an OraclePropertyGraph instance. However, use it with care, because this action cannot be reversed.
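For example, a cautious pattern, shown here as a minimal sketch that assumes only the methods described in this section (and the args and szGraphName conventions of the earlier fragments), is to guard clearRepository behind an explicit flag and close the instance in a finally block:

boolean wipeExistingData = false; // set to true only deliberately; clearRepository cannot be undone

OraclePropertyGraph opg = OraclePropertyGraph.getInstance(args, szGraphName);
try {
  if (wipeExistingData) {
    opg.clearRepository(); // removes ALL vertices and edges from the graph
  }
  // ... work with the graph ...
}
finally {
  opg.shutdown(); // always close the graph instance
}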
You can read a graph from Apache HBase or Oracle NoSQL Database into an in-memory analyst that is embedded in the same client Java application (a single JVM). For the following Apache HBase example:
A correct java.io.tmpdir setting is required.
dop + 2 is a workaround for a performance issue before Release 1.1.2. Effective with Release 1.1.2, you can instead specify a dop value directly in the configuration settings.
int dop = 8; // need customization
Map<PgxConfig.Field, Object> confPgx = new HashMap<PgxConfig.Field, Object>();
confPgx.put(PgxConfig.Field.ENABLE_GM_COMPILER, false);
confPgx.put(PgxConfig.Field.NUM_WORKERS_IO, dop + 2); // use dop directly with release 1.1.2 or newer
confPgx.put(PgxConfig.Field.NUM_WORKERS_ANALYSIS, dop); // <= # of physical cores
confPgx.put(PgxConfig.Field.NUM_WORKERS_FAST_TRACK_ANALYSIS, 2);
confPgx.put(PgxConfig.Field.SESSION_TASK_TIMEOUT_SECS, 0); // no timeout set
confPgx.put(PgxConfig.Field.SESSION_IDLE_TIMEOUT_SECS, 0); // no timeout set

PgHbaseGraphConfig cfg = GraphConfigBuilder.forPropertyGraphHbase()
    .setName("mygraph")
    .setZkQuorum("localhost") // quorum, need customization
    .setZkClientPort(2181)
    .addNodeProperty("name", PropertyType.STRING, "default_name")
    .build();

OraclePropertyGraph opg = OraclePropertyGraph.getInstance(cfg);

ServerInstance localInstance = Pgx.getInstance();
localInstance.startEngine(confPgx);
PgxSession session = localInstance.createSession("session-id-1"); // Put your session description here.
Analyst analyst = session.createAnalyst();

// The following call will trigger a read of graph data from the database
PgxGraph pgxGraph = session.readGraphWithProperties(opg.getConfig());

long triangles = analyst.countTriangles(pgxGraph, false);
System.out.println("triangles " + triangles);
In addition to reading graph data into memory, you can create an in-memory graph programmatically. This can simplify development when the size of the graph is small or when the content of the graph is highly dynamic. The key Java class is GraphBuilder, which can accumulate a set of vertices and edges added with the addVertex and addEdge APIs. After all changes are made, an in-memory graph instance (PgxGraph) can be created by the GraphBuilder.
The following Java code snippet illustrates a graph construction flow. Note that there are no explicit calls to addVertex, because any vertex that does not already exist is added dynamically as its adjacent edges are created.
import oracle.pgx.api.*;

PgxSession session = Pgx.createSession("example");
GraphBuilder<Integer> builder = session.newGraphBuilder();

builder.addEdge(0, 1, 2);
builder.addEdge(1, 2, 3);
builder.addEdge(2, 2, 4);
builder.addEdge(3, 3, 4);
builder.addEdge(4, 4, 2);

PgxGraph graph = builder.build();
To construct a graph with vertex properties, you can call setProperty on the vertex objects created.
PgxSession session = Pgx.createSession("example");
GraphBuilder<Integer> builder = session.newGraphBuilder();

builder.addVertex(1).setProperty("double-prop", 0.1);
builder.addVertex(2).setProperty("double-prop", 2.0);
builder.addVertex(3).setProperty("double-prop", 0.3);
builder.addVertex(4).setProperty("double-prop", 4.56789);

builder.addEdge(0, 1, 2);
builder.addEdge(1, 2, 3);
builder.addEdge(2, 2, 4);
builder.addEdge(3, 3, 4);
builder.addEdge(4, 4, 2);

PgxGraph graph = builder.build();
To use long integers as vertex and edge identifiers, specify IdType.LONG when getting a new instance of GraphBuilder. For example:
import oracle.pgx.common.types.IdType;

GraphBuilder<Long> builder = session.newGraphBuilder(IdType.LONG);
During edge construction, you can directly use vertex objects that were previously created in a call to addEdge.
v1 = builder.addVertex(1l).setProperty("double-prop", 0.5);
v2 = builder.addVertex(2l).setProperty("double-prop", 2.0);
builder.addEdge(0, v1, v2);
As with vertices, edges can have properties. The following example sets the edge label by using setLabel:
builder.addEdge(4, v4, v2).setProperty("edge-prop", "edge_prop_4_2").setLabel("label")
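Putting the pieces together, the following sketch shows a complete build-and-inspect flow. The VertexBuilder variable type and the getNumVertices/getNumEdges inspection calls at the end are assumptions made for illustration, not taken from the examples above:

import oracle.pgx.api.*;

PgxSession session = Pgx.createSession("example");
GraphBuilder<Integer> builder = session.newGraphBuilder();

// Create two vertices with a property, keeping the builder objects
VertexBuilder<Integer> v1 = builder.addVertex(1).setProperty("double-prop", 0.5);
VertexBuilder<Integer> v2 = builder.addVertex(2).setProperty("double-prop", 2.0);

// Connect them with a labeled, property-bearing edge
builder.addEdge(0, v1, v2).setProperty("edge-prop", "edge_prop_1_2").setLabel("label");

PgxGraph graph = builder.build();
System.out.println("vertices: " + graph.getNumVertices()); // assumed inspection call
System.out.println("edges: " + graph.getNumEdges());       // assumed inspection call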
To drop a property graph from the database, use the OraclePropertyGraphUtils.dropPropertyGraph method. This method has two parameters, the connection information and the graph name.
The format of the connection information depends on whether you use HBase or Oracle NoSQL Database as the backend database. It is the same as the connection information you provide to OraclePropertyGraph.getInstance.
For Oracle NoSQL Database, the OraclePropertyGraphUtils.dropPropertyGraph method uses the KV store name, host computer name, and port number for the connection. This code fragment deletes a graph named my_graph from Oracle NoSQL Database.
String kvHostPort = "cluster02:5000";
String kvStoreName = "kvstore";
String kvGraphName = "my_graph";

// Use NoSQL Java API
KVStoreConfig kvconfig = new KVStoreConfig(kvStoreName, kvHostPort);

// Drop the graph
OraclePropertyGraphUtils.dropPropertyGraph(kvconfig, kvGraphName);
For Apache HBase, the OraclePropertyGraphUtils.dropPropertyGraph method uses the Hadoop nodes and the Apache HBase port number for the connection. This code fragment deletes a graph named my_graph from Apache HBase.
String hbQuorum = "bda01node01.example.com, bda01node02.example.com, bda01node03.example.com";
String hbClientPort = "2181";
String hbGraphName = "my_graph";

// Use HBase Java APIs
Configuration conf = HBaseConfiguration.create();
conf.set("hbase.zookeeper.quorum", hbQuorum);
conf.set("hbase.zookeeper.property.clientPort", hbClientPort);

// Drop the graph
OraclePropertyGraphUtils.dropPropertyGraph(conf, hbGraphName);
Indexes in Oracle Big Data Spatial and Graph allow fast retrieval of elements by a particular key/value or key/text pair. These indexes are created based on an element type (vertices or edges), a set of keys (and values), and an index type.
Two types of indexing structures are supported by Oracle Big Data Spatial and Graph: manual and automatic.
Automatic text indexes provide automatic indexing of vertices or edges by a set of property keys. Their main purpose is to enhance query performance on vertices and edges based on particular key/value pairs.
Manual text indexes enable you to define multiple indexes over a designated set of vertices and edges of a property graph. You must specify what graph elements go into the index.
Oracle Big Data Spatial and Graph provides APIs to create manual and automatic text indexes over property graphs for Oracle NoSQL Database and Apache HBase. Indexes are managed using the available search engines, Apache Lucene and SolrCloud. The rest of this section focuses on how to create text indexes using the property graph capabilities of the Data Access Layer.
Using Automatic Indexes with the Apache Lucene Search Engine
Uploading a Collection's SolrCloud Configuration to Zookeeper
Updating Configuration Settings on Text Indexes for Property Graph Data
Using Parallel Query on Text Indexes for Property Graph Data
Using Native Query Objects on Text Indexes for Property Graph Data
The supplied examples ExampleNoSQL6 and ExampleHBase6 create a property graph from an input file, create an automatic text index on vertices, and execute some text search queries using Apache Lucene.
The following code fragment creates an automatic index over an existing property graph's vertices with these property keys: name, role, religion, and country. The automatic text index will be stored under four subdirectories of the /home/data/text-index directory. Apache Lucene data type handling is enabled. This example uses a DOP (parallelism) of 4 for re-indexing tasks.
OraclePropertyGraph opg = OraclePropertyGraph.getInstance(
    args, szGraphName);

String szOPVFile = "../../data/connections.opv";
String szOPEFile = "../../data/connections.ope";

// Do a parallel data loading
OraclePropertyGraphDataLoader opgdl = OraclePropertyGraphDataLoader.getInstance();
opgdl.loadData(opg, szOPVFile, szOPEFile, dop);

// Create an automatic index using the Apache Lucene engine.
// Specify Index Directory parameters (number of directories,
// number of connections to database, batch size, commit size,
// enable datatypes, location)
OracleIndexParameters indexParams
    = OracleIndexParameters.buildFS(4, 4, 10000, 50000, true, "/home/data/text-index");
opg.setDefaultIndexParameters(indexParams);

// specify indexed keys
String[] indexedKeys = new String[4];
indexedKeys[0] = "name";
indexedKeys[1] = "role";
indexedKeys[2] = "religion";
indexedKeys[3] = "country";

// Create auto indexing on the above properties for all vertices
opg.createKeyIndex(indexedKeys, Vertex.class);
By default, indexes are configured based on the OracleIndexParameters associated with the property graph using the method opg.setDefaultIndexParameters(indexParams).
Indexes can also be created by specifying a different set of parameters. This is shown in the following code snippet.
// Create an OracleIndexParameters object to get the index configuration
// (search engine, etc.)
OracleIndexParameters indexParams = OracleIndexParameters.buildFS(args);

// Create auto indexing on the "name" property for all vertices
opg.createKeyIndex("name", Vertex.class, indexParams.getParameters());
The code fragment in the next example executes a query over all vertices to find all matching vertices with the key/value pair name:Barack Obama. This operation executes a lookup into the text index.
Additionally, wildcard searches are supported by specifying the useWildCards parameter in the getVertices API call. Wildcard search is only supported when automatic indexes are enabled for the specified property key. For details on text search syntax using Apache Lucene, see https://lucene.apache.org/core/2_9_4/queryparsersyntax.html.
// Find all vertices with name Barack Obama.
Iterator<Vertex> vertices = opg.getVertices("name", "Barack Obama").iterator();
System.out.println("----- Vertices with name Barack Obama -----");
countV = 0;
while (vertices.hasNext()) {
  System.out.println(vertices.next());
  countV++;
}
System.out.println("Vertices found: " + countV);

// Find all vertices with a name including the keyword "Obama".
// Wildcard searching is supported.
boolean useWildcard = true;
vertices = opg.getVertices("name", "*Obama*", useWildcard).iterator();
System.out.println("----- Vertices with name *Obama* -----");
countV = 0;
while (vertices.hasNext()) {
  System.out.println(vertices.next());
  countV++;
}
System.out.println("Vertices found: " + countV);
The preceding code example produces output like the following:
----- Vertices with name Barack Obama -----
Vertex ID 1 {name:str:Barack Obama, role:str:political authority, occupation:str:44th president of United States of America, country:str:United States, political party:str:Democratic, religion:str:Christianity}
Vertices found: 1

----- Vertices with name *Obama* -----
Vertex ID 1 {name:str:Barack Obama, role:str:political authority, occupation:str:44th president of United States of America, country:str:United States, political party:str:Democratic, religion:str:Christianity}
Vertices found: 1
The supplied examples ExampleNoSQL7 and ExampleHBase7 create a property graph from an input file, create a manual text index on edges, put some data into the index, and execute some text search queries using Apache SolrCloud.
When using SolrCloud, you must first load a collection's configuration for the text indexes into Apache Zookeeper, as described in Uploading a Collection's SolrCloud Configuration to Zookeeper.
The following code fragment creates a manual text index over an existing property graph using four shards, one shard per node, and a replication factor of 1. The number of shards corresponds to the number of nodes in the SolrCloud cluster.
OraclePropertyGraph opg = OraclePropertyGraph.getInstance(args, szGraphName);

String szOPVFile = "../../data/connections.opv";
String szOPEFile = "../../data/connections.ope";

// Do a parallel data loading
OraclePropertyGraphDataLoader opgdl = OraclePropertyGraphDataLoader.getInstance();
opgdl.loadData(opg, szOPVFile, szOPEFile, dop);

// Create a manual text index using SolrCloud.
// Specify index parameters: configuration name, Solr Server URL, Solr Node set,
// replication factor, zookeeper timeout (secs),
// maximum number of shards per node,
// number of connections to database, batch size, commit size,
// write timeout (in secs)
String configName = "opgconfig";
String solrServerUrl = "nodea:2181/solr";
String solrNodeSet = "nodea:8983_solr,nodeb:8983_solr," +
                     "nodec:8983_solr,noded:8983_solr";

int zkTimeout = 15;
int numShards = 4;
int replicationFactor = 1;
int maxShardsPerNode = 1;

OracleIndexParameters indexParams =
    OracleIndexParameters.buildSolr(configName,
        solrServerUrl,
        solrNodeSet,
        zkTimeout,
        numShards,
        replicationFactor,
        maxShardsPerNode,
        4,
        10000,
        500000,
        15);

opg.setDefaultIndexParameters(indexParams);

// Create a manual index on edges
OracleIndex<Edge> index = ((OracleIndex<Edge>) opg.createIndex("myIdx", Edge.class));

Vertex v1 = opg.getVertices("name", "Barack Obama").iterator().next();

Iterator<Edge> edges = v1.getEdges(Direction.OUT, "collaborates").iterator();

while (edges.hasNext()) {
  Edge edge = edges.next();
  Vertex vIn = edge.getVertex(Direction.IN);
  index.put("collaboratesWith", vIn.getProperty("name"), edge);
}
The next code fragment executes a query over the manual index to get all edges with the key/value pair collaboratesWith:Beyonce. Additionally, wildcard searches are supported by specifying the parameter useWildCards in the get API call.
// Find all edges with collaboratesWith Beyonce.
// Wildcard searching is supported using the true parameter.
edges = index.get("collaboratesWith", "Beyonce").iterator();
System.out.println("----- Edges with name Beyonce -----");
countE = 0;
while (edges.hasNext()) {
  System.out.println(edges.next());
  countE++;
}
System.out.println("Edges found: " + countE);

// Find all edges with collaboratesWith including Bey*.
// Wildcard searching is supported using the true parameter.
edges = index.get("collaboratesWith", "*Bey*", true).iterator();
System.out.println("----- Edges with collaboratesWith Bey* -----");
countE = 0;
while (edges.hasNext()) {
  System.out.println(edges.next());
  countE++;
}
System.out.println("Edges found: " + countE);
The preceding code example produces output like the following:
----- Edges with name Beyonce -----
Edge ID 1000 from Vertex ID 1 {country:str:United States, name:str:Barack Obama, occupation:str:44th president of United States of America, political party:str:Democratic, religion:str:Christianity, role:str:political authority} =[collaborates]=> Vertex ID 2 {country:str:United States, music genre:str:pop soul , name:str:Beyonce, role:str:singer actress} edgeKV[{weight:flo:1.0}]
Edges found: 1

----- Edges with collaboratesWith Bey* -----
Edge ID 1000 from Vertex ID 1 {country:str:United States, name:str:Barack Obama, occupation:str:44th president of United States of America, political party:str:Democratic, religion:str:Christianity, role:str:political authority} =[collaborates]=> Vertex ID 2 {country:str:United States, music genre:str:pop soul , name:str:Beyonce, role:str:singer actress} edgeKV[{weight:flo:1.0}]
Edges found: 1
Oracle's property graph support indexes and stores an element's key/value pairs based on the value data type. The main purpose of handling data types is to provide extensive query support, such as numeric and date range queries.
By default, searches over a specific key/value pair are matched against a query expression based on the value's data type. For example, to find vertices with the key/value pair age:30, a query is executed over all age fields with an integer data type. If the value is a query expression, you can also specify the data type class of the value to find by calling the API get(String key, Object value, Class dtClass, Boolean useWildcards). If no data type is specified, the query expression is matched against all possible data types.
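For example, a type-restricted lookup might look like the following minimal sketch. It assumes an existing automatic text index over vertices named index, and that the get method returns an iterable of matching vertices, as described above:

// Hypothetical sketch: match the key/value pair age:30 against
// Integer-typed values only, with wildcard expansion disabled.
Iterator<Vertex> vertices =
    index.get("age", "30", Integer.class, false /* useWildcards */).iterator();

while (vertices.hasNext()) {
  System.out.println(vertices.next());
}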
When dealing with Boolean operators, each subsequent key/value pair must append the data type's prefix/suffix so the query can find proper matches. The following topics describe how to append this prefix/suffix for Apache Lucene and SolrCloud.
When Lucene's data type handling is enabled, you must append the proper data type identifier as a suffix to the key in the query expression. This can be done by executing a String.concat() operation on the key. If Lucene's data type handling is disabled, you must insert the data type identifier as a prefix in the value String. Table 4-1 shows the data type identifiers available for text indexing using Apache Lucene (see also the Javadoc for LuceneIndex).
Table 4-1 Apache Lucene Data Type Identifiers
Lucene Data Type Identifier | Description
---|---
TYPE_DT_STRING | String
TYPE_DT_BOOL | Boolean
TYPE_DT_DATE | Date
TYPE_DT_FLOAT | Float
TYPE_DT_DOUBLE | Double
TYPE_DT_INTEGER | Integer
TYPE_DT_SERIALIZABLE | Serializable
The following code fragment creates a manual index on edges using Lucene's data type handling, adds data, and later executes a query over the manual index to get all edges with the key/value pair collaboratesWith:Beyonce AND country1:United* using wildcards.
OraclePropertyGraph opg = OraclePropertyGraph.getInstance(args,
szGraphName);
String szOPVFile = "../../data/connections.opv";
String szOPEFile = "../../data/connections.ope";
// Do a parallel data loading
OraclePropertyGraphDataLoader opgdl =
OraclePropertyGraphDataLoader.getInstance();
opgdl.loadData(opg, szOPVFile, szOPEFile, dop);
// Specify Index Directory parameters (number of directories,
// number of connections to database, batch size, commit size,
// enable datatypes, location)
OracleIndexParameters indexParams =
OracleIndexParameters.buildFS(4, 4, 10000, 50000, true,
"/home/data/text-index");
opg.setDefaultIndexParameters(indexParams);
// Create manual indexing on above properties for all edges
OracleIndex<Edge> index = ((OracleIndex<Edge>) opg.createIndex("myIdx", Edge.class));
Vertex v1 = opg.getVertices("name", "Barack Obama").iterator().next();
Iterator<Edge> edges
= v1.getEdges(Direction.OUT, "collaborates").iterator();
while (edges.hasNext()) {
Edge edge = edges.next();
Vertex vIn = edge.getVertex(Direction.IN);
index.put("collaboratesWith", vIn.getProperty("name"), edge);
index.put("country", vIn.getProperty("country"), edge);
}
// Wildcard searching is supported using true parameter.
String key = "country";
key = key.concat(String.valueOf(oracle.pg.text.lucene.LuceneIndex.TYPE_DT_STRING));
String queryExpr = "Beyonce AND " + key + ":United*";
edges = index.get("collaboratesWith", queryExpr, true /*UseWildcard*/).iterator();
System.out.println("----- Edges with query: " + queryExpr + " -----");
countE = 0;
while (edges.hasNext()) {
System.out.println(edges.next());
countE++;
}
System.out.println("Edges found: "+ countE);
The preceding code example might produce output like the following:
----- Edges with name Beyonce AND country1:United* -----
Edge ID 1000 from Vertex ID 1 {country:str:United States, name:str:Barack Obama, occupation:str:44th president of United States of America, political party:str:Democratic, religion:str:Christianity, role:str:political authority} =[collaborates]=> Vertex ID 2 {country:str:United States, music genre:str:pop soul , name:str:Beyonce, role:str:singer actress} edgeKV[{weight:flo:1.0}]
Edges found: 1
The following code fragment creates an automatic index on vertices with Lucene's data type handling disabled, adds data, and later executes a query over the automatic index to get all vertices with the key/value pair country:United* AND role:1*political* using wildcards.
OraclePropertyGraph opg = OraclePropertyGraph.getInstance(args,
szGraphName);
String szOPVFile = "../../data/connections.opv";
String szOPEFile = "../../data/connections.ope";
// Do a parallel data loading
OraclePropertyGraphDataLoader opgdl =
OraclePropertyGraphDataLoader.getInstance();
opgdl.loadData(opg, szOPVFile, szOPEFile, dop);
// Create an automatic index using Apache Lucene engine.
// Specify Index Directory parameters (number of directories,
// number of connections to database, batch size, commit size,
// enable datatypes, location)
OracleIndexParameters indexParams =
OracleIndexParameters.buildFS(4, 4, 10000, 50000, false, "/home/data/text-index");
opg.setDefaultIndexParameters(indexParams);
// specify indexed keys
String[] indexedKeys = new String[4];
indexedKeys[0] = "name";
indexedKeys[1] = "role";
indexedKeys[2] = "religion";
indexedKeys[3] = "country";
// Create auto indexing on above properties for all vertices
opg.createKeyIndex(indexedKeys, Vertex.class);
// Wildcard searching is supported using true parameter.
String value = "*political*";
value = String.valueOf(LuceneIndex.TYPE_DT_STRING) + value;
String queryExpr = "United* AND role:" + value;
vertices = opg.getVertices("country", queryExpr, true /*useWildcard*/).iterator();
System.out.println("----- Vertices with query: " + queryExpr + " -----");
countV = 0;
while (vertices.hasNext()) {
System.out.println(vertices.next());
countV++;
}
System.out.println("Vertices found: " + countV);
The preceding code example might produce output like the following:
----- Vertices with query: United* and role:1*political* -----
Vertex ID 30 {name:str:Jerry Brown, role:str:political authority, occupation:str:34th and 39th governor of California, country:str:United States, political party:str:Democratic, religion:str:roman catholicism}
Vertex ID 24 {name:str:Edward Snowden, role:str:political authority, occupation:str:system administrator, country:str:United States, religion:str:buddhism}
Vertex ID 22 {name:str:John Kerry, role:str:political authority, country:str:United States, political party:str:Democratic, occupation:str:68th United States Secretary of State, religion:str:Catholicism}
Vertex ID 21 {name:str:Hillary Clinton, role:str:political authority, country:str:United States, political party:str:Democratic, occupation:str:67th United States Secretary of State, religion:str:Methodism}
Vertex ID 19 {name:str:Kirsten Gillibrand, role:str:political authority, country:str:United States, political party:str:Democratic, occupation:str:junior United States Senator from New York, religion:str:Methodism}
Vertex ID 13 {name:str:Ertharin Cousin, role:str:political authority, country:str:United States, political party:str:Democratic}
Vertex ID 11 {name:str:Eric Holder, role:str:political authority, country:str:United States, political party:str:Democratic, occupation:str:United States Deputy Attorney General}
Vertex ID 1 {name:str:Barack Obama, role:str:political authority, occupation:str:44th president of United States of America, country:str:United States, political party:str:Democratic, religion:str:Christianity}
Vertices found: 8
Additionally, Oracle Big Data Spatial and Graph provides a set of utilities to help users write their own Lucene text search queries using the query syntax and data type identifiers required by the automatic and manual text indexes. The method buildSearchTerm(key, value, dtClass) in LuceneIndex creates a query expression of the form field:query_expr by adding the data type identifier to the key (or value) and transforming the value into the required string representation based on the given data type and Apache Lucene's data type handling configuration.
The following code fragment uses the buildSearchTerm method to produce the query expression country1:United* (if Lucene's data type handling is enabled) or country:1United* (if Lucene's data type handling is disabled) used in the previous examples:
String szQueryStrCountry = index.buildSearchTerm("country", "United*", String.class);
To handle the key and value as individual objects when constructing a different Lucene query, such as a WildcardQuery, the methods appendDatatypesSuffixToKey(key, dtClass) and appendDatatypesSuffixToValue(value, dtClass) in LuceneIndex append the appropriate data type identifiers and transform the value into the required Lucene string representation based on the given data type.
The following code fragment uses the appendDatatypesSuffixToKey method to generate the field name required in a Lucene text query. If Lucene's data type handling is enabled, the string returned will append the String data type identifier as a suffix of the key (country1). In any other case, the retrieved string will be the original key (country).
String key = index.appendDatatypesSuffixToKey("country", String.class);
The next code fragment uses the appendDatatypesSuffixToValue method to generate the query body expression required in a Lucene text query. If Lucene's data type handling is disabled, the string returned will append the String data type identifier as a prefix of the value (1United*). In all other cases, the string returned will be the string representation of the value (United*).
String value = index.appendDatatypesSuffixToValue("United*", String.class);
LuceneIndex also supports generating a Term object using the method buildSearchTermObject(key, value, dtClass). Term objects are commonly used among different types of Lucene Query objects to constrain the fields and values of the documents to be retrieved. The following code fragment shows how to create a WildcardQuery object using the buildSearchTermObject method.
Term term = index.buildSearchTermObject("country", "United*", String.class);
Query query = new WildcardQuery(term);
For Boolean operations on SolrCloud text indexes, you must append the proper data type identifier as a suffix to the key in the query expression. This can be done by executing a String.concat() operation on the key. Table 4-2 shows the data type identifiers available for text indexing using SolrCloud (see the Javadoc for SolrIndex).
Table 4-2 SolrCloud Data Type Identifiers
Solr Data Type Identifier | Description
---|---
TYPE_DT_STRING | String
TYPE_DT_BOOL | Boolean
TYPE_DT_DATE | Date
TYPE_DT_FLOAT | Float
TYPE_DT_DOUBLE | Double
TYPE_DT_INTEGER | Integer
TYPE_DT_SERIALIZABLE | Serializable
The following code fragment creates a manual index on edges using SolrCloud, adds data, and later executes a query over the manual index to get all edges with the key/value pair collaboratesWith:Beyonce AND country1:United* using wildcards.
OraclePropertyGraph opg = OraclePropertyGraph.getInstance(args, szGraphName);

String szOPVFile = "../../data/connections.opv";
String szOPEFile = "../../data/connections.ope";

// Do a parallel data loading
OraclePropertyGraphDataLoader opgdl = OraclePropertyGraphDataLoader.getInstance();
opgdl.loadData(opg, szOPVFile, szOPEFile, dop);

// Create a manual text index using SolrCloud.
// Specify index parameters: configuration name, Solr Server URL, Solr Node set,
// replication factor, zookeeper timeout (secs),
// maximum number of shards per node,
// number of connections to database, batch size, commit size,
// write timeout (in secs)
String configName = "opgconfig";
String solrServerUrl = "nodea:2181/solr";
String solrNodeSet = "nodea:8983_solr,nodeb:8983_solr," +
                     "nodec:8983_solr,noded:8983_solr";

int zkTimeout = 15;
int numShards = 4;
int replicationFactor = 1;
int maxShardsPerNode = 1;

OracleIndexParameters indexParams =
    OracleIndexParameters.buildSolr(configName,
        solrServerUrl,
        solrNodeSet,
        zkTimeout,
        numShards,
        replicationFactor,
        maxShardsPerNode,
        4,
        10000,
        500000,
        15);

opg.setDefaultIndexParameters(indexParams);

// Create a manual index on edges
OracleIndex<Edge> index = ((OracleIndex<Edge>) opg.createIndex("myIdx", Edge.class));

Vertex v1 = opg.getVertices("name", "Barack Obama").iterator().next();

Iterator<Edge> edges = v1.getEdges(Direction.OUT, "collaborates").iterator();

while (edges.hasNext()) {
  Edge edge = edges.next();
  Vertex vIn = edge.getVertex(Direction.IN);
  index.put("collaboratesWith", vIn.getProperty("name"), edge);
  index.put("country", vIn.getProperty("country"), edge);
}

// Wildcard searching is supported using the true parameter.
String key = "country";
key = key.concat(oracle.pg.text.solr.SolrIndex.TYPE_DT_STRING);

String queryExpr = "Beyonce AND " + key + ":United*";
edges = index.get("collaboratesWith", queryExpr, true /* useWildcard */).iterator();
System.out.println("----- Edges with query: " + queryExpr + " -----");
countE = 0;
while (edges.hasNext()) {
  System.out.println(edges.next());
  countE++;
}
System.out.println("Edges found: " + countE);
The preceding code example might produce output like the following:
----- Edges with name Beyonce AND country_str:United* -----
Edge ID 1000 from Vertex ID 1 {country:str:United States, name:str:Barack Obama, occupation:str:44th president of United States of America, political party:str:Democratic, religion:str:Christianity, role:str:political authority} =[collaborates]=> Vertex ID 2 {country:str:United States, music genre:str:pop soul , name:str:Beyonce, role:str:singer actress} edgeKV[{weight:flo:1.0}]
Edges found: 1
Additionally, Oracle Big Data Spatial and Graph provides a set of utilities to help users write their own SolrCloud text search queries using the query syntax and data type identifiers required by the automatic and manual text indexes. The method buildSearchTerm(key, value, dtClass) in SolrIndex creates a query expression of the form field:query_expr by adding the data type identifier to the key (or value) and transforming the value into the required string representation using the data type formats required by the index.
The following code fragment uses the buildSearchTerm method in SolrIndex to produce the query expression country_str:United* used in the previous example:
String szQueryStrCountry = index.buildSearchTerm("country", "United*", String.class);
To handle the key and value as individual objects when constructing a different Solr query, the methods appendDatatypesSuffixToKey(key, dtClass) and appendDatatypesSuffixToValue(value, dtClass) in SolrIndex append the appropriate data type identifiers and transform the value into the required SolrCloud string representation based on the given data type.
The following code fragment uses the appendDatatypesSuffixToKey method to generate the field name required in a SolrCloud text query. The retrieved string will append the String data type identifier as a suffix of the key (country_str).
String key = index.appendDatatypesSuffixToKey("country", String.class);
The next code fragment uses the appendDatatypesSuffixToValue method to generate the query body expression required in a SolrCloud text query. The string returned will be the string representation of the value (United*).
String value = index.appendDatatypesSuffixToValue("United*", String.class);
Before using SolrCloud text indexes on Oracle Big Data Spatial and Graph property graphs, you must upload a collection's configuration to Zookeeper. This can be done using the ZkCli tool from one of the SolrCloud cluster nodes.
A predefined collection configuration directory can be found in dal/opg-solr-config under the installation home. The following example shows how to upload the PropertyGraph configuration directory.
Copy dal/opg-solr-config under the installation home into the /tmp directory on one of the Solr cluster nodes. For example:
scp -r dal/opg-solr-config user@solr-node:/tmp
Then execute the ZkCli tool on the same node with a command like the following:
$SOLR_HOME/bin/zkcli.sh -zkhost 127.0.0.1:2181/solr -cmd upconfig -confname opgconfig -confdir /tmp/opg-solr-config
Oracle's property graph support manages manual and automatic text indexes through integration with Apache Lucene and SolrCloud. At creation time, you must create an OracleIndexParameters object specifying the search engine and other configuration settings to be used by the text index. After a text index for a property graph is created, these configuration settings cannot be changed. For automatic indexes, all vertex index keys are managed by a single text index, and all edge index keys are managed by a different text index using the configuration specified when the first vertex or edge key is indexed.
If you need to change the configuration settings, you must first disable the current index and create it again using a new OracleIndexParameters object. The following code fragment creates two automatic Apache Lucene-based indexes (on vertices and edges) over an existing property graph, disables them, and re-creates them to use SolrCloud.
OraclePropertyGraph opg = OraclePropertyGraph.getInstance(args, szGraphName);

String szOPVFile = "../../data/connections.opv";
String szOPEFile = "../../data/connections.ope";

// Do parallel data loading
OraclePropertyGraphDataLoader opgdl = OraclePropertyGraphDataLoader.getInstance();
opgdl.loadData(opg, szOPVFile, szOPEFile, dop);

// Create an automatic index using Apache Lucene.
// Specify Index Directory parameters (number of directories,
// number of connections to database, batch size, commit size,
// enable datatypes, location)
OracleIndexParameters luceneIndexParams =
    OracleIndexParameters.buildFS(4, 4, 10000, 50000, true,
        "/home/data/text-index");

// Specify indexed keys
String[] indexedKeys = new String[4];
indexedKeys[0] = "name";
indexedKeys[1] = "role";
indexedKeys[2] = "religion";
indexedKeys[3] = "country";

// Create auto indexing on the above properties for all vertices
opg.createKeyIndex(indexedKeys, Vertex.class, luceneIndexParams.getParameters());

// Create auto indexing on weight for all edges
opg.createKeyIndex("weight", Edge.class, luceneIndexParams.getParameters());

// Disable auto indexes to change parameters
opg.getOracleIndexManager().disableVertexAutoIndexer();
opg.getOracleIndexManager().disableEdgeAutoIndexer();

// Recreate text indexes using SolrCloud.
// Specify index parameters: configuration name, Solr Server URL, Solr Node set,
// replication factor, zookeeper timeout (secs),
// maximum number of shards per node,
// number of connections to database, batch size, commit size,
// write timeout (in secs)
String configName = "opgconfig";
String solrServerUrl = "nodea:2181/solr";
String solrNodeSet = "nodea:8983_solr,nodeb:8983_solr," +
                     "nodec:8983_solr,noded:8983_solr";

int zkTimeout = 15;
int numShards = 4;
int replicationFactor = 1;
int maxShardsPerNode = 1;

OracleIndexParameters solrIndexParams =
    OracleIndexParameters.buildSolr(configName,
        solrServerUrl,
        solrNodeSet,
        zkTimeout,
        numShards,
        replicationFactor,
        maxShardsPerNode,
        4,
        10000,
        500000,
        15);

// Create auto indexing on the above properties for all vertices
opg.createKeyIndex(indexedKeys, Vertex.class, solrIndexParams.getParameters());

// Create auto indexing on weight for all edges
opg.createKeyIndex("weight", Edge.class, solrIndexParams.getParameters());
Text indexes in Oracle Big Data Spatial and Graph allow you to execute text queries over millions of vertices and edges by a particular key/value or key/text pair using parallel query execution.
Parallel text querying is an optimized solution that takes advantage of the distribution of the data in the index among shards in SolrCloud (or subdirectories in Apache Lucene), so that each shard is queried using a separate index connection. This involves multiple threads and connections to the SolrCloud (or Apache Lucene) search engine to increase performance on read operations and retrieve multiple elements from the index. Note that this approach does not rank the matching results based on their score.
A parallel text query produces an array where each element holds all the vertices (or edges) with an attribute matching the given key/value pair from a shard. The subset of shards queried is delimited by the given start subdirectory ID and the size of the connections array provided, so the subset covers shards in the range [start, start + connections.length - 1]. Note that each of the N shards in the index is assigned an integer ID in the range [0, N - 1].
Parallel Text Query Using Apache Lucene
You can run a parallel text query with Apache Lucene by calling the method getPartitioned in LuceneIndex, specifying an array of connections to a set of subdirectories (SearcherManager objects), the key/value pair to search, and the starting subdirectory ID. Each connection needs to be linked to the appropriate subdirectory, as each subdirectory is independent of the rest of the subdirectories in the index.
The following code fragment generates an automatic text index using the Apache Lucene search engine and executes a parallel text query. The number of calls to the getPartitioned method in the LuceneIndex class is controlled by the total number of subdirectories and the number of connections used.
OraclePropertyGraph opg = OraclePropertyGraph.getInstance(args, szGraphName);

// Clear existing vertices/edges in the property graph
opg.clearRepository();

String szOPVFile = "../../data/connections.opv";
String szOPEFile = "../../data/connections.ope";

// This object will handle parallel data loading
OraclePropertyGraphDataLoader opgdl = OraclePropertyGraphDataLoader.getInstance();
opgdl.loadData(opg, szOPVFile, szOPEFile, dop);

// Create an automatic index
OracleIndexParameters indexParams =
    OracleIndexParameters.buildFS(dop /* number of directories */,
        dop /* number of connections used when indexing */,
        10000 /* batch size before commit */,
        500000 /* commit size before Lucene commit */,
        true /* enable datatypes */,
        "./lucene-index" /* index location */);

opg.setDefaultIndexParameters(indexParams);

// Create auto indexing on name property for all vertices
System.out.println("Create automatic index on name for vertices");
opg.createKeyIndex("name", Vertex.class);

// Get the LuceneIndex object
LuceneIndex<Vertex> index = (LuceneIndex<Vertex>) opg.getAutoIndex(Vertex.class);

// Open an array of connections for the parallel text search
SearcherManager[] conns = new SearcherManager[dop];

long lCount = 0;
for (int split = 0; split < index.getTotalShards(); split += conns.length) {
  // Get a connection object for each subdirectory from split to
  // (split + conns.length)
  for (int idx = 0; idx < conns.length; idx++) {
    conns[idx] = index.getOracleSearcherManager(idx + split);
  }

  // Get elements from split to split + conns.length
  Iterable<Vertex>[] iterAr = index.getPartitioned(conns /* connections */,
      "name" /* key */,
      "*" /* value */,
      true /* wildcards */,
      split /* start split ID */);

  lCount = countFromIterables(iterAr); /* Consume iterables in parallel */

  // Close the connections to the subdirectories when completed
  for (int idx = 0; idx < conns.length; idx++) {
    conns[idx].close();
  }
}

// Count all vertices
System.out.println("Vertices found using parallel query: " + lCount);
Parallel Text Search Using SolrCloud
You can run a parallel text query with SolrCloud by calling the method getPartitioned in SolrIndex, specifying an array of connections to SolrCloud (CloudSolrServer objects), the key/value pair to search, and the starting shard ID.
The following code fragment generates an automatic text index using the SolrCloud search engine and executes a parallel text query. The number of calls to the getPartitioned method in the SolrIndex class is controlled by the total number of shards in the index and the number of connections used.
OraclePropertyGraph opg = OraclePropertyGraph.getInstance(args, szGraphName);

// Clear existing vertices/edges in the property graph
opg.clearRepository();

String szOPVFile = "../../data/connections.opv";
String szOPEFile = "../../data/connections.ope";

// This object will handle parallel data loading
OraclePropertyGraphDataLoader opgdl = OraclePropertyGraphDataLoader.getInstance();
opgdl.loadData(opg, szOPVFile, szOPEFile, dop);

String configName = "opgconfig";
String solrServerUrl = args[4]; // "localhost:2181/solr"
String solrNodeSet = args[5];   // "localhost:8983_solr"

int zkTimeout = 15;                        // zookeeper timeout in seconds
int numShards = Integer.parseInt(args[6]); // number of shards in the index
int replicationFactor = 1;                 // replication factor
int maxShardsPerNode = 1;                  // maximum number of shards per node

// Create an automatic index using SolrCloud
OracleIndexParameters indexParams =
    OracleIndexParameters.buildSolr(configName,
        solrServerUrl,
        solrNodeSet,
        zkTimeout /* zookeeper timeout in seconds */,
        numShards /* total number of shards */,
        replicationFactor /* replication factor */,
        maxShardsPerNode /* maximum number of shards per node */,
        4 /* dop used for scan */,
        10000 /* batch size before commit */,
        500000 /* commit size before SolrCloud commit */,
        15 /* write timeout in seconds */);

opg.setDefaultIndexParameters(indexParams);

// Create auto indexing on name property for all vertices
System.out.println("Create automatic index on name for vertices");
opg.createKeyIndex("name", Vertex.class);

// Get the SolrIndex object
SolrIndex<Vertex> index = (SolrIndex<Vertex>) opg.getAutoIndex(Vertex.class);

// Open an array of connections to handle connections to SolrCloud needed
// for parallel text search
CloudSolrServer[] conns = new CloudSolrServer[dop];

for (int idx = 0; idx < conns.length; idx++) {
  conns[idx] = index.getCloudSolrServer(15 /* write timeout in secs */);
}

// Iterate to cover all the shards in the index
long lCount = 0;
for (int split = 0; split < index.getTotalShards(); split += conns.length) {
  // Get elements from split to split + conns.length
  Iterable<Vertex>[] iterAr = index.getPartitioned(conns /* connections */,
      "name" /* key */,
      "*" /* value */,
      true /* wildcards */,
      split /* start split ID */);

  lCount = countFromIterables(iterAr); /* Consume iterables in parallel */
}

// Do not close the connections to the subdirectories after completion,
// because those connections are used by the index itself.

// Count results
System.out.println("Vertices found using parallel query: " + lCount);
Using Query objects directly is for advanced users, enabling them to take full advantage of the underlying query capabilities of the text search engine (Apache Lucene or SolrCloud). For example, you can add constraints to text searches, such as adding a boost to the matching scores and adding sorting clauses.
Using text searches with Query objects will produce an Iterable object holding all the vertices (or edges) with an attribute (or set of attributes) matching the text query while satisfying the constraints. This approach will automatically rank the results based on their matching score.
To build the clauses in the query body, you may need to consider the data type used by the key/value pair to be matched, as well as the configuration of the search engine used. For more information about building a search term, see Handling Data Types.
Using Native Query Objects with Apache Lucene
You can use native query objects with Apache Lucene by calling the method get(Query) in LuceneIndex. You can also use parallel text query with native query objects by calling the method getPartitioned(SearcherManager[], Query, int) in LuceneIndex, specifying an array of connections to a set of subdirectories (SearcherManager objects), the Lucene query object, and the starting subdirectory ID. Each connection must be linked to the appropriate subdirectory, because each subdirectory is independent of the rest of the subdirectories in the index.
The following code fragment generates an automatic text index using the Apache Lucene search engine, creates a Lucene Query, and executes a parallel text query. The number of calls to the getPartitioned method in the LuceneIndex class is controlled by the total number of subdirectories and the number of connections used.
import java.util.StringTokenizer;
import oracle.pg.text.lucene.LuceneIndex;
import org.apache.lucene.search.*;
import org.apache.lucene.index.*;

...

OraclePropertyGraph opg = OraclePropertyGraph.getInstance(args, szGraphName);

// Clear existing vertices/edges in the property graph
opg.clearRepository();

String szOPVFile = "../../data/connections.opv";
String szOPEFile = "../../data/connections.ope";

// This object will handle parallel data loading
OraclePropertyGraphDataLoader opgdl = OraclePropertyGraphDataLoader.getInstance();
opgdl.loadData(opg, szOPVFile, szOPEFile, dop);

// Create an automatic index
OracleIndexParameters indexParams =
    OracleIndexParameters.buildFS(dop /* number of directories */,
        dop /* number of connections used when indexing */,
        10000 /* batch size before commit */,
        500000 /* commit size before Lucene commit */,
        true /* enable datatypes */,
        "./lucene-index" /* index location */);

opg.setDefaultIndexParameters(indexParams);

// Create auto indexing on name and country properties for all vertices
System.out.println("Create automatic index on name and country for vertices");
String[] indexedKeys = new String[2];
indexedKeys[0] = "name";
indexedKeys[1] = "country";
opg.createKeyIndex(indexedKeys, Vertex.class);

// Get the LuceneIndex object
LuceneIndex<Vertex> index = (LuceneIndex<Vertex>) opg.getAutoIndex(Vertex.class);

// Search first for key name with property value Beyo* using only string
// data types
Term term = index.buildSearchTermObject("name", "Beyo*", String.class);
Query queryBey = new WildcardQuery(term);

// Add another condition to query all the vertices whose country is
// "United States"
String key = index.appendDatatypesSuffixToKey("country", String.class);
String value = index.appendDatatypesSuffixToValue("United States", String.class);

PhraseQuery queryCountry = new PhraseQuery();
StringTokenizer st = new StringTokenizer(value);
while (st.hasMoreTokens()) {
  queryCountry.add(new Term(key, st.nextToken()));
}

// Concatenate queries
BooleanQuery bQuery = new BooleanQuery();
bQuery.add(queryBey, BooleanClause.Occur.MUST);
bQuery.add(queryCountry, BooleanClause.Occur.MUST);

long lCount = 0;
SearcherManager[] conns = new SearcherManager[dop];
for (int split = 0; split < index.getTotalShards(); split += conns.length) {
  // Get a connection object for each subdirectory from split to
  // (split + conns.length). Skip the cache so we clone the connection and
  // avoid using the connection used by the index.
  for (int idx = 0; idx < conns.length; idx++) {
    conns[idx] = index.getOracleSearcherManager(idx + split,
        true /* skip looking in the cache */);
  }

  // Get elements from split to split + conns.length
  Iterable<Vertex>[] iterAr = index.getPartitioned(conns /* connections */,
      bQuery,
      split /* start split ID */);

  lCount = countFromIterables(iterAr); /* Consume iterables in parallel */

  // Do not close the connections to the subdirectories after completion,
  // because those connections are used by the index itself.
}

// Count all vertices
System.out.println("Vertices found using parallel query: " + lCount);
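Native query objects also allow ranking constraints such as boosts, mentioned earlier. The following minimal sketch assumes the same LuceneIndex<Vertex> object named index as above, that get(Query) returns an iterable of matching vertices, and the mutable query API of the Lucene versions used in these examples (Query.setBoost was removed in later Lucene releases):

// Build a wildcard query on the name field; buildSearchTermObject appends
// the data type identifier required by the index configuration.
Term term = index.buildSearchTermObject("name", "Beyo*", String.class);
Query queryName = new WildcardQuery(term);
queryName.setBoost(2.0f); // matches on name count twice as much in the ranking

// Combine with other clauses as needed; results from get(Query) are ranked
// by their matching score.
BooleanQuery bQuery = new BooleanQuery();
bQuery.add(queryName, BooleanClause.Occur.MUST);

Iterator<Vertex> vertices = index.get(bQuery).iterator();
while (vertices.hasNext()) {
  System.out.println(vertices.next());
}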
Using Native Query Objects with SolrCloud
You can directly use native query objects against SolrCloud by calling the method get(SolrQuery) in SolrIndex. You can also use parallel text query with native query objects by calling the method getPartitioned(CloudSolrServer[], SolrQuery, int) in SolrIndex, specifying an array of connections to SolrCloud (CloudSolrServer objects), the SolrQuery object, and the starting shard ID.
The following code fragment generates an automatic text index using the Apache SolrCloud search engine, creates a SolrQuery object, and executes a parallel text query. The number of calls to the getPartitioned method in the SolrIndex class is controlled by the total number of shards in the index and the number of connections used.
import oracle.pg.text.solr.*;
import org.apache.solr.client.solrj.*;

OraclePropertyGraph opg = OraclePropertyGraph.getInstance(args, szGraphName);

// Clear existing vertices/edges in the property graph
opg.clearRepository();

String szOPVFile = "../../data/connections.opv";
String szOPEFile = "../../data/connections.ope";

// This object will handle parallel data loading
OraclePropertyGraphDataLoader opgdl = OraclePropertyGraphDataLoader.getInstance();
opgdl.loadData(opg, szOPVFile, szOPEFile, dop);

String configName = "opgconfig";
String solrServerUrl = args[4]; // "localhost:2181/solr"
String solrNodeSet = args[5];   // "localhost:8983_solr"

int zkTimeout = 15;                        // zookeeper timeout in seconds
int numShards = Integer.parseInt(args[6]); // number of shards in the index
int replicationFactor = 1;                 // replication factor
int maxShardsPerNode = 1;                  // maximum number of shards per node

// Create an automatic index using SolrCloud
OracleIndexParameters indexParams =
    OracleIndexParameters.buildSolr(configName,
        solrServerUrl,
        solrNodeSet,
        zkTimeout /* zookeeper timeout in seconds */,
        numShards /* total number of shards */,
        replicationFactor /* replication factor */,
        maxShardsPerNode /* maximum number of shards per node */,
        4 /* dop used for scan */,
        10000 /* batch size before commit */,
        500000 /* commit size before SolrCloud commit */,
        15 /* write timeout in seconds */);

opg.setDefaultIndexParameters(indexParams);

// Create auto indexing on name and country properties for all vertices
System.out.println("Create automatic index on name and country for vertices");
String[] indexedKeys = new String[2];
indexedKeys[0] = "name";
indexedKeys[1] = "country";
opg.createKeyIndex(indexedKeys, Vertex.class);

// Get the SolrIndex object
SolrIndex<Vertex> index = (SolrIndex<Vertex>) opg.getAutoIndex(Vertex.class);

// Search first for key name with property value Beyo* using only string
// data types
String szQueryStrBey = index.buildSearchTerm("name", "Beyo*", String.class);
String key = index.appendDatatypesSuffixToKey("country", String.class);
String value = index.appendDatatypesSuffixToValue("United States", String.class);

String szQueryStrCountry = key + ":" + value;
SolrQuery query = new SolrQuery(szQueryStrBey + " AND " + szQueryStrCountry);

// Query using the get operation
index.get(query);

// Open an array of connections to handle connections to SolrCloud needed
// for parallel text search
CloudSolrServer[] conns = new CloudSolrServer[dop];

for (int idx = 0; idx < conns.length; idx++) {
  conns[idx] = index.getCloudSolrServer(15 /* write timeout in secs */);
}

// Iterate to cover all the shards in the index
long lCount = 0;
for (int split = 0; split < index.getTotalShards(); split += conns.length) {
  // Get elements from split to split + conns.length
  Iterable<Vertex>[] iterAr = index.getPartitioned(conns /* connections */,
      query,
      split /* start split ID */);

  lCount = countFromIterables(iterAr); /* Consume iterables in parallel */
}

// Close the connections to SolrCloud after completion
for (int idx = 0; idx < conns.length; idx++) {
  conns[idx].shutdown();
}

// Count results
System.out.println("Vertices found using parallel query: " + lCount);
Oracle Big Data Spatial and Graph supports a rich set of graph pattern matching capabilities. It provides a SQL-like declarative language that allows you to express a graph query pattern that consists of vertices and edges, and constraints on the properties of the vertices and edges.
An example property graph query is as follows. It defines a graph pattern inspired by the famous ancient proverb: the enemy of my enemy is my friend. In this example, variables x, y, and z are used for vertices, and variables e1 and e2 are used for edges. There is a constraint on the edge label, and the query returns (projects) the value of the name property of vertices x and z.
SELECT x.name, z.name
WHERE x -[e1 WITH label = 'feuds']-> y,
      y -[e2 WITH label = 'feuds']-> z
You can run the query either in a Groovy shell environment or from Java. For example, to run the preceding query from the Groovy shell for Apache HBase or Oracle NoSQL Database, you can first read the graph from the database into the in-memory analyst, get an in-memory graph, and invoke the queryPgql function.
// Read graph data from a backend database into memory
// Note that opg is an instance of the OraclePropertyGraph class
opg-hbase> G = session.readGraphWithProperties(opg.getConfig());

opg-hbase> resultSet = G.queryPgql("SELECT x.name, z.name WHERE x -[e1 WITH label = 'feuds']-> y, y -[e2 WITH label = 'feuds']-> z")
To get the type and variable name of the first projected variable in the result set, you can enter the following:
opg-hbase> resultElement = resultElements.get(0)
opg-hbase> type = resultElement.getElementType() // STRING
opg-hbase> varName = resultElement.getVarName() // x.name
You can also iterate over the result set. For example:
opg-hbase> resultSet.getResults().each { \
  // the variable 'it' is implicitly declared to reference each PgqlResult instance
}
Finally, you can display (print) results. For example, to display the first 10 rows:
opg-hbase> resultSet.print(10) // print the first 10 results
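The same flow can be expressed from Java. The following minimal sketch assumes an existing in-memory analyst session (session) and OraclePropertyGraph instance (opg), and uses the PgxGraph and PgqlResultSet type names of the in-memory analyst API as an assumption:

// Read graph data from the backend database into the in-memory analyst
PgxGraph g = session.readGraphWithProperties(opg.getConfig());

// Run the same "enemy of my enemy" pattern and print the first 10 rows
PgqlResultSet resultSet = g.queryPgql(
    "SELECT x.name, z.name " +
    "WHERE x -[e1 WITH label = 'feuds']-> y, " +
    "      y -[e2 WITH label = 'feuds']-> z");
resultSet.print(10);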
Oracle Big Data Spatial and Graph property graph support works with both secure and non-secure Oracle NoSQL Database installations. This topic provides information about how to use property graph functions with a secure Oracle NoSQL Database setup. It assumes that a secure Oracle NoSQL Database is already installed (a process explained in "Performing a Secure Oracle NoSQL Database Installation" in the Oracle NoSQL Database Security Guide at http://docs.oracle.com/cd/NOSQL/html/SecurityGuide/secure_installation.html).
You must have the correct credentials to access the secure database. Create a user such as the following:
kv-> plan create-user -name myusername -admin -wait
Grant this user the readwrite and dbadmin roles. For example:

kv-> plan grant -user myusername -role readwrite -wait
kv-> plan grant -user myusername -role dbadmin -wait
When generating the login_properties.txt from the file client.security, make sure the user name is correct. For example:
oracle.kv.auth.username=myusername
On the Oracle property graph client side, you must have the security-related files and libraries to interact with the secure Oracle NoSQL Database. First, copy these files (or directories) from KVROOT/security/ to the client side:
client.security
client.trust
login.wallet/
login_properties.txt
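For example, assuming a hypothetical NoSQL node host name, the files might be copied as follows:

scp -r user@nosql-node:KVROOT/security/ /path/to/client/security/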
If Oracle Wallet is used to hold passwords that are needed for accessing the secure database, copy these three libraries to the client side and set the class path correctly:
oraclepki.jar
osdt_cert.jar
osdt_core.jar
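For example, the class path might be extended as follows (the /path/to locations are hypothetical):

export CLASSPATH=$CLASSPATH:/path/to/oraclepki.jar:/path/to/osdt_cert.jar:/path/to/osdt_core.jar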
After configuring the database and the Oracle property graph client side correctly, you can connect to a graph stored in a secure Oracle NoSQL Database using either of the following two approaches.
Specify the login properties file, using a Java VM setting with the following format:
-Doracle.kv.security=/<your-path>/login_properties.txt
You can also set this Java VM property for applications deployed into a J2EE container (including in-memory analytics). For example, before starting WebLogic Server, you can set an environment variable in the following format to refer to the login properties configuration file:
setenv JAVA_OPTIONS "-Doracle.kv.security=/<your-path>/login_properties.txt"
Then you can call OraclePropertyGraph.getInstance(kconfig, szGraphName) as usual to create an OraclePropertyGraph instance.
Call OraclePropertyGraph.getInstance(kconfig, szGraphName, username, password, truStoreFile), where username and password are the correct credentials to access the secure Oracle NoSQL Database, and truStoreFile is the path to the client-side trust store file client.trust.
The following code fragment creates a property graph in a secure Oracle NoSQL Database, loads the data, and then counts the vertices and edges in the graph:
// This object will handle operations over the property graph
OraclePropertyGraph opg = OraclePropertyGraph.getInstance(kconfig, szGraphName,
    username, password, truStoreFile);

// Clear existing vertices/edges in the property graph
opg.clearRepository();
opg.setQueueSize(100); // 100 elements

String szOPVFile = "../../data/connections.opv";
String szOPEFile = "../../data/connections.ope";

// This object will handle parallel data loading over the property graph
System.out.println("Load data for graph " + szGraphName);
OraclePropertyGraphDataLoader opgdl = OraclePropertyGraphDataLoader.getInstance();
opgdl.loadData(opg, szOPVFile, szOPEFile, dop);

// Count all vertices
long countV = 0;
Iterator<Vertex> vertices = opg.getVertices().iterator();
while (vertices.hasNext()) {
  vertices.next();
  countV++;
}
System.out.println("Vertices found: " + countV);

// Count all edges
long countE = 0;
Iterator<Edge> edges = opg.getEdges().iterator();
while (edges.hasNext()) {
  edges.next();
  countE++;
}
System.out.println("Edges found: " + countE);
Kerberos authentication is recommended for Apache HBase to secure property graphs in Oracle Big Data Spatial and Graph.
Oracle's property graph support works with both secure and non-secure Cloudera Hadoop (CDH) cluster installations. This topic provides information about secure Apache HBase installations.
This topic assumes that a secure Apache HBase is already configured with Kerberos, that the client machine has the Kerberos libraries installed, and that you have the correct credentials. For detailed information, see "Configuring Kerberos Authentication for HBase" at http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cdh_sg_hbase_authentication.html. For information about how to set up your Kerberos cluster and clients, see the MIT Kerberos Documentation at http://web.mit.edu/kerberos/krb5-latest/doc/index.html.
On the client side, you must have a Kerberos credential to interact with the Kerberos-enabled HDFS daemons. Additionally, you need to modify the Kerberos configuration information (located in krb5.conf) to include the realm and the mappings of hostnames onto Kerberos realms used in the secure CDH cluster.
The following example shows the realm and hostname mappings used in a secure CDH cluster on BDA.COM:
[libdefaults]
 default_realm = EXAMPLE.COM
 dns_lookup_realm = false
 dns_lookup_kdc = false
 ticket_lifetime = 24h
 renew_lifetime = 7d
 forwardable = yes

[realms]
 EXAMPLE.COM = {
  kdc = hostname1.example.com:88
  kdc = hostname2.example.com:88
  admin_server = hostname1.example.com:749
  default_domain = example.com
 }
 BDA.COM = {
  kdc = hostname1.bda.com:88
  kdc = hostname2.bda.com:88
  admin_server = hostname1.bda.com:749
  default_domain = bda.com
 }

[domain_realm]
 .example.com = EXAMPLE.COM
 example.com = EXAMPLE.COM
 .bda.com = BDA.COM
 bda.com = BDA.COM
After modifying krb5.conf, you can connect to a graph stored in Apache HBase by using a Java Authentication and Authorization Service (JAAS) configuration file to provide your credentials to the application. This provides the same capabilities as the preceding example without requiring you to modify a single line of code if you already have an application that uses an insecure Apache HBase installation.
To use property graph support for HBase with a JAAS configuration, create a file with content in the following form, replacing the keytab and principal entries with your own information:
Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  useTicketCache=true
  keyTab="/path/to/your/keytab/user.keytab"
  principal="your-user/your.fully.qualified.domain.name@YOUR.REALM";
};
The following code fragment shows an example JAAS file with the realm used in a Secure CDH cluster on BDA.COM:
Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  useTicketCache=true
  keyTab="/path/to/keytab/user.keytab"
  principal="hbaseuser/hostname1@BDA.COM";
};
To run your secure HBase application, you must specify the JAAS configuration file you created by using the java.security.auth.login.config flag. You can run your application using a command in the following format:
java -Djava.security.auth.login.config=/path/to/your/jaas.conf/ -classpath ./classes/:../../lib/'*' YourJavaApplication
Then, you can call OraclePropertyGraph.getInstance(conf, hconn, szGraphName)
as usual to create an Oracle property graph.
Another option to use the Oracle Big Data Spatial and Graph property graph support on a secure Apache HBase installation is to use a secure HBase configuration. The following code fragment shows how to obtain a secure HBase configuration using prepareSecureConfig(). This API requires the security authentication setting used in Apache Hadoop and Apache HBase, as well as Kerberos credentials set to authenticate and obtain an authorized ticket.
The following code fragment creates a property graph in a secure Apache HBase installation, loads data into it, and commits the changes.
String szQuorum= "hostname1,hostname2,hostname3";
String szCliPort = "2181";
String szGraph = "SecureGraph";
String hbaseSecAuth="kerberos";
String hadoopSecAuth="kerberos";
String hmKerberosPrincipal="hbase/_HOST@BDA.COM";
String rsKerberosPrincipal="hbase/_HOST@BDA.COM";
String userPrincipal = "hbase/hostname1@BDA.COM";
String keytab= "/path/to/your/keytab/hbase.keytab";
int dop= 8;
Configuration conf = HBaseConfiguration.create();
conf.set("hbase.zookeeper.quorum", szQuorum);
conf.set("hbase.zookeeper.property.clientPort", szCliPort);
// Prepare the secure configuration providing the credentials in the keytab
conf = OraclePropertyGraph.prepareSecureConfig(conf,
hbaseSecAuth,
hadoopSecAuth,
hmKerberosPrincipal,
rsKerberosPrincipal,
userPrincipal,
keytab);
HConnection hconn = HConnectionManager.createConnection(conf);
OraclePropertyGraph opg=OraclePropertyGraph.getInstance(conf, hconn, szGraph);
opg.setInitialNumRegions(24);
opg.clearRepository();
String szOPVFile = "../../data/connections.opv";
String szOPEFile = "../../data/connections.ope";
// Do a parallel data loading
OraclePropertyGraphDataLoader opgdl = OraclePropertyGraphDataLoader.getInstance();
opgdl.loadData(opg, szOPVFile, szOPEFile, dop);
opg.commit();
The Oracle Big Data Spatial and Graph property graph support includes a built-in Groovy shell (based on the original Gremlin Groovy shell script). With this command-line shell interface, you can explore the Java APIs.
To start the Groovy shell, go to the dal/groovy
directory under the installation home (/opt/oracle/oracle-spatial-graph/property_graph
by default). For example:
cd /opt/oracle/oracle-spatial-graph/property_graph/dal/groovy/
Included are the scripts gremlin-opg-nosql.sh
and gremlin-opg-hbase.sh
, for connecting to an Oracle NoSQL Database and an Apache HBase, respectively.
Note:
To run some gremlin traversal examples, you must first do the following import operation:

import com.tinkerpop.pipes.util.structures.*;
The following example connects to an Oracle NoSQL Database, gets an instance of OraclePropertyGraph with the graph name myGraph, loads some example graph data, and gets the list of vertices and edges.
$ ./gremlin-opg-nosql.sh
opg-nosql>
opg-nosql> hhosts = new String[1];
==>null
opg-nosql> hhosts[0] = "bigdatalite:5000";
==>bigdatalite:5000
opg-nosql> cfg = GraphConfigBuilder.forPropertyGraphNosql().setName("myGraph").setHosts(Arrays.asList(hhosts)).setStoreName("mystore").addEdgeProperty("lbl", PropertyType.STRING, "lbl").addEdgeProperty("weight", PropertyType.DOUBLE, "1000000").build();
==>{"db_engine":"NOSQL","loading":{},"format":"pg","name":"myGraph","error_handling":{},"hosts":["bigdatalite:5000"],"node_props":[],"store_name":"mystore","edge_props":[{"type":"string","name":"lbl","default":"lbl"},{"type":"double","name":"weight","default":"1000000"}]}
opg-nosql> opg = OraclePropertyGraph.getInstance(cfg);
==>oraclepropertygraph with name myGraph
opg-nosql> opgdl = OraclePropertyGraphDataLoader.getInstance();
==>oracle.pg.nosql.OraclePropertyGraphDataLoader@576f1cad
opg-nosql> opgdl.loadData(opg, new FileInputStream("../../data/connections.opv"), new FileInputStream("../../data/connections.ope"), 1, 1, 0, null);
==>null
opg-nosql> opg.getVertices();
==>Vertex ID 5 {country:str:Italy, name:str:Pope Francis, occupation:str:pope, religion:str:Catholicism, role:str:Catholic religion authority}
[... other output lines omitted for brevity ...]
opg-nosql> opg.getEdges();
==>Edge ID 1139 from Vertex ID 64 {country:str:United States, name:str:Jeff Bezos, occupation:str:business man} =[leads]=> Vertex ID 37 {country:str:United States, name:str:Amazon, type:str:online retailing} edgeKV[{weight:flo:1.0}]
[... other output lines omitted for brevity ...]
The following example customizes several configuration parameters for in-memory analytics. It connects to an Apache HBase, gets an instance of OraclePropertyGraph with the graph name myGraph, loads some example graph data, gets the list of vertices and edges, gets an in-memory analyst, and executes one of the built-in analytics, triangle counting.
$ ./gremlin-opg-hbase.sh
opg-hbase>
opg-hbase> dop=2; // degree of parallelism
==>2
opg-hbase> confPgx = new HashMap<PgxConfig.Field, Object>();
opg-hbase> confPgx.put(PgxConfig.Field.ENABLE_GM_COMPILER, false);
==>null
opg-hbase> confPgx.put(PgxConfig.Field.NUM_WORKERS_IO, dop + 2);
==>null
opg-hbase> confPgx.put(PgxConfig.Field.NUM_WORKERS_ANALYSIS, 3);
==>null
opg-hbase> confPgx.put(PgxConfig.Field.NUM_WORKERS_FAST_TRACK_ANALYSIS, 2);
==>null
opg-hbase> confPgx.put(PgxConfig.Field.SESSION_TASK_TIMEOUT_SECS, 0);
==>null
opg-hbase> confPgx.put(PgxConfig.Field.SESSION_IDLE_TIMEOUT_SECS, 0);
==>null
opg-hbase> instance = Pgx.getInstance()
==>null
opg-hbase> instance.startEngine(confPgx)
==>null
opg-hbase> cfg = GraphConfigBuilder.forPropertyGraphHbase() .setName("myGraph") .setZkQuorum("bigdatalite") .setZkClientPort(iClientPort) .setZkSessionTimeout(60000) .setMaxNumConnections(dop) .setLoadEdgeLabel(true) .setSplitsPerRegion(1) .addEdgeProperty("lbl", PropertyType.STRING, "lbl") .addEdgeProperty("weight", PropertyType.DOUBLE, "1000000") .build();
==>{"splits_per_region":1,"max_num_connections":2,"node_props":[],"format":"pg","load_edge_label":true,"name":"myGraph","zk_client_port":2181,"zk_quorum":"bigdatalite","edge_props":[{"type":"string","default":"lbl","name":"lbl"},{"type":"double","default":"1000000","name":"weight"}],"loading":{},"error_handling":{},"zk_session_timeout":60000,"db_engine":"HBASE"}
opg-hbase> opg = OraclePropertyGraph.getInstance(cfg);
==>oraclepropertygraph with name myGraph
opg-hbase> opgdl = OraclePropertyGraphDataLoader.getInstance();
==>oracle.pg.hbase.OraclePropertyGraphDataLoader@3451289b
opg-hbase> opgdl.loadData(opg, "../../data/connections.opv", "../../data/connections.ope", 1, 1, 0, null);
==>null
opg-hbase> opg.getVertices();
==>Vertex ID 78 {country:str:United States, name:str:Hosain Rahman, occupation:str:CEO of Jawbone}
...
opg-hbase> opg.getEdges();
==>Edge ID 1139 from Vertex ID 64 {country:str:United States, name:str:Jeff Bezos, occupation:str:business man} =[leads]=> Vertex ID 37 {country:str:United States, name:str:Amazon, type:str:online retailing} edgeKV[{weight:flo:1.0}]
[... other output lines omitted for brevity ...]
opg-hbase> session = Pgx.createSession("session-id-1");
opg-hbase> g = session.readGraphWithProperties(cfg);
opg-hbase> analyst = session.createAnalyst();
opg-hbase> triangles = analyst.countTriangles(false).get();
==>22
For detailed information about the Java APIs, see the Javadoc reference information in doc/dal/ and doc/pgx/ under the installation home (/opt/oracle/oracle-spatial-graph/property_graph/ by default).
The software installation includes a directory of example programs, which you can use to learn about creating and manipulating property graphs.
The sample programs are distributed in an installation subdirectory named examples/dal. The examples are replicated for HBase and Oracle NoSQL Database, so that you can use the set of programs corresponding to your choice of backend database. Table 4-3 describes some of the programs.
Table 4-3 Property Graph Program Examples (Selected)
Program Name | Description
---|---
ExampleNoSQL1, ExampleHBase1 | Creates a minimal property graph consisting of one vertex, sets properties with various data types on the vertex, and queries the database for the saved graph description.
ExampleNoSQL2, ExampleHBase2 | Creates the same minimal property graph as Example1, and then deletes it.
ExampleNoSQL3, ExampleHBase3 | Creates a graph with multiple vertices and edges. Deletes some vertices and edges explicitly, and others implicitly by deleting other, required objects. This example queries the database repeatedly to show the current list of objects.
To compile and run the Java source files:
Change to the examples directory:
cd examples/dal
Use the Java compiler:
javac -classpath ../../lib/'*' filename.java
For example: javac -classpath ../../lib/'*' ExampleNoSQL1.java
Execute the compiled code:
java -classpath ../../lib/'*':./ filename args
The arguments depend on whether you are using Oracle NoSQL Database or Apache HBase to store the graph. The values are passed to OraclePropertyGraph.getInstance, as in the sketch after the argument lists that follow.
Apache HBase Argument Descriptions
Provide these arguments when using the HBase examples:
quorum: A comma-delimited list of names identifying the nodes where HBase runs, such as "node01.example.com, node02.example.com, node03.example.com".
client_port: The HBase client port number, such as "2181".
graph_name: The name of the graph, such as "customer_graph".
Oracle NoSQL Database Argument Descriptions
Provide these arguments when using the NoSQL examples:
host_name: The cluster name and port number for Oracle NoSQL Database registration, such as "cluster02:5000".
store_name: The name of the key-value store, such as "kvstore".
graph_name: The name of the graph, such as "customer_graph".
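For illustration, the following minimal sketch (the class name ConnectSketch is hypothetical) shows how a program might pass these arguments through; it mirrors the getInstance(args) call used by the example programs and in Example 4-5 later in this topic.

// Hypothetical sketch: passing the command-line arguments straight through
// to OraclePropertyGraph.getInstance, as the example programs do.
import oracle.pg.hbase.OraclePropertyGraph;  // use oracle.pg.nosql for Oracle NoSQL Database

public class ConnectSketch {
    public static void main(String[] args) throws Exception {
        // HBase args:  quorum client_port graph_name
        // NoSQL args:  host_name store_name graph_name
        OraclePropertyGraph opg = OraclePropertyGraph.getInstance(args);
        System.out.println("Connected to graph: " + args[args.length - 1]);
        opg.shutdown();
    }
}

Using the sample values above, an HBase invocation might look like: java -classpath ../../lib/'*':./ ConnectSketch "node01.example.com,node02.example.com,node03.example.com" "2181" "customer_graph".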
The example programs use System.out.println to display the property graph descriptions retrieved from the database where the graph is stored, either Oracle NoSQL Database or Apache HBase. The key name, data type, and value are delimited by colons. For example, weight:flo:30.0 indicates that the key name is weight, the data type is float, and the value is 30.0. (A short parsing sketch follows Table 4-4.)
Table 4-4 identifies the data type abbreviations used in the output.
Table 4-4 Property Graph Data Type Abbreviations
Abbreviation | Data Type |
---|---|
bol | Boolean |
dat | date |
dbl | double |
flo | float |
int | integer |
ser | serializable |
str | string |
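As a quick illustration (this fragment is not part of the product code), a printed key:type:value triple can be split on its colon delimiters:

// Splits one printed property, such as "weight:flo:30.0", into its parts.
public class PropertyTripleSketch {
    public static void main(String[] args) {
        String prop = "weight:flo:30.0";
        String[] parts = prop.split(":", 3); // key, type abbreviation, value
        System.out.println("key=" + parts[0] + ", type=" + parts[1] + ", value=" + parts[2]);
    }
}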
ExampleNoSQL1 and ExampleHBase1 create a minimal property graph consisting of one vertex. The code fragment in Example 4-5 creates a vertex named v1 and sets properties with various data types. It then queries the database for the saved graph description.
Example 4-5 Creating a Property Graph
// Create a property graph instance named opg
OraclePropertyGraph opg = OraclePropertyGraph.getInstance(args);
// Clear all vertices and edges from opg
opg.clearRepository();
// Create vertex v1 and assign it properties as key-value pairs
Vertex v1 = opg.addVertex(1L);
v1.setProperty("age", Integer.valueOf(18));
v1.setProperty("name", "Name");
v1.setProperty("weight", Float.valueOf(30.0f));
v1.setProperty("height", Double.valueOf(1.70d));
v1.setProperty("female", Boolean.TRUE);
// Save the graph in the database
opg.commit();
// Display the stored vertex description
System.out.println("Fetch 1 vertex: " + opg.getVertices().iterator().next());
// Close the graph instance
opg.shutdown();
The OraclePropertyGraph.getInstance arguments (args) depend on whether you are using Oracle NoSQL Database or Apache HBase to store the graph. See "Compiling and Running the Sample Programs".
System.out.println displays the following output:
Fetch 1 vertex: Vertex ID 1 {age:int:18, name:str:Name, weight:flo:30.0, height:dbl:1.7, female:bol:true}
See the property graph support Javadoc (/opt/oracle/oracle-spatial-graph/property_graph/doc/pgx by default) for the following:
OraclePropertyGraph.addVertex
OraclePropertyGraph.clearRepository
OraclePropertyGraph.getInstance
OraclePropertyGraph.getVertices
OraclePropertyGraph.shutdown
Vertex.setProperty
ExampleNoSQL2 and ExampleHBase2 create a graph like the one in "Example: Creating a Property Graph", and then drop it from the database.
The code fragment in Example 4-6 drops the graph. See "Compiling and Running the Sample Programs" for descriptions of the OraclePropertyGraphUtils.dropPropertyGraph arguments.
Example 4-6 Dropping a Property Graph
// Drop the property graph from the database
OraclePropertyGraphUtils.dropPropertyGraph(args);

// Display confirmation that the graph was dropped
System.out.println("Graph " + graph_name + " dropped.");
System.out.println displays the following output:
Graph graph_name dropped.
See the Javadoc for OraclePropertyGraphUtils.dropPropertyGraph.
ExampleNoSQL3 and ExampleHBase3 add and drop both vertices and edges.
Example 4-7 Creating the Vertices
The code fragment in Example 4-7 creates three vertices. It is a simple variation of Example 4-5.
// Create a property graph instance named opg
OraclePropertyGraph opg = OraclePropertyGraph.getInstance(args);
// Clear all vertices and edges from opg
opg.clearRepository();
// Add vertices a, b, and c
Vertex a = opg.addVertex(1L);
a.setProperty("name", "Alice");
a.setProperty("age", 31);
Vertex b = opg.addVertex(2L);
b.setProperty("name", "Bob");
b.setProperty("age", 27);
Vertex c = opg.addVertex(3L);
c.setProperty("name", "Chris");
c.setProperty("age", 33);
Example 4-8 Creating the Edges
The code fragment in Example 4-8 uses vertices a, b, and c to create the edges.
// Add edges e1, e2, and e3
Edge e1 = opg.addEdge(1L, a, b, "knows");
e1.setProperty("type", "partners");
Edge e2 = opg.addEdge(2L, a, c, "knows");
e2.setProperty("type", "friends");
Edge e3 = opg.addEdge(3L, b, c, "knows");
e3.setProperty("type", "colleagues");
Example 4-9 Deleting Edges and Vertices
The code fragment in Example 4-9 explicitly deletes edge e3 and vertex b. It implicitly deletes edge e1, which was connected to vertex b.
// Remove edge e3
opg.removeEdge(e3);

// Remove vertex b and all related edges
opg.removeVertex(b);
Example 4-10 Querying for Vertices and Edges
This example queries the database to show when objects are added and dropped. The code fragment in Example 4-10 shows the method used.
// Print all vertices
vertices = opg.getVertices().iterator();
System.out.println("----- Vertices ----");
vCount = 0;
while (vertices.hasNext()) {
  System.out.println(vertices.next());
  vCount++;
}
System.out.println("Vertices found: " + vCount);

// Print all edges
edges = opg.getEdges().iterator();
System.out.println("----- Edges ----");
eCount = 0;
while (edges.hasNext()) {
  System.out.println(edges.next());
  eCount++;
}
System.out.println("Edges found: " + eCount);
The examples in this topic may produce output like the following:
----- Vertices ----
Vertex ID 3 {name:str:Chris, age:int:33}
Vertex ID 1 {name:str:Alice, age:int:31}
Vertex ID 2 {name:str:Bob, age:int:27}
Vertices found: 3
----- Edges ----
Edge ID 2 from Vertex ID 1 {name:str:Alice, age:int:31} =[knows]=> Vertex ID 3 {name:str:Chris, age:int:33} edgeKV[{type:str:friends}]
Edge ID 3 from Vertex ID 2 {name:str:Bob, age:int:27} =[knows]=> Vertex ID 3 {name:str:Chris, age:int:33} edgeKV[{type:str:colleagues}]
Edge ID 1 from Vertex ID 1 {name:str:Alice, age:int:31} =[knows]=> Vertex ID 2 {name:str:Bob, age:int:27} edgeKV[{type:str:partners}]
Edges found: 3
Remove edge Edge ID 3 from Vertex ID 2 {name:str:Bob, age:int:27} =[knows]=> Vertex ID 3 {name:str:Chris, age:int:33} edgeKV[{type:str:colleagues}]
----- Vertices ----
Vertex ID 1 {name:str:Alice, age:int:31}
Vertex ID 2 {name:str:Bob, age:int:27}
Vertex ID 3 {name:str:Chris, age:int:33}
Vertices found: 3
----- Edges ----
Edge ID 2 from Vertex ID 1 {name:str:Alice, age:int:31} =[knows]=> Vertex ID 3 {name:str:Chris, age:int:33} edgeKV[{type:str:friends}]
Edge ID 1 from Vertex ID 1 {name:str:Alice, age:int:31} =[knows]=> Vertex ID 2 {name:str:Bob, age:int:27} edgeKV[{type:str:partners}]
Edges found: 2
Remove vertex Vertex ID 2 {name:str:Bob, age:int:27}
----- Vertices ----
Vertex ID 1 {name:str:Alice, age:int:31}
Vertex ID 3 {name:str:Chris, age:int:33}
Vertices found: 2
----- Edges ----
Edge ID 2 from Vertex ID 1 {name:str:Alice, age:int:31} =[knows]=> Vertex ID 3 {name:str:Chris, age:int:33} edgeKV[{type:str:friends}]
Edges found: 1
A property graph can be defined in two flat files: description files for the vertices and for the edges.
A pair of files describes a property graph:
Vertex file: Describes the vertices of the property graph. This file has an .opv file name extension.
Edge file: Describes the edges of the property graph. This file has an .ope file name extension.
It is recommended that the two files share the same base name. For example, simple.opv and simple.ope define a property graph.
Each line in a vertex file is a record that describes a vertex of the property graph. Each record describes one key-value property of the vertex, so a vertex with multiple properties is described by multiple records (lines).
A record contains six fields separated by commas. Each record must contain five commas to delimit all fields, whether or not they have values:
vertex_ID, key_name, value_type, value, value, value
Table 4-5 describes the fields composing a vertex file record.
Table 4-5 Vertex File Record Format
Field Number | Name | Description |
---|---|---|
1 | vertex_ID | An integer that uniquely identifies the vertex |
2 | key_name | The name of the key in the key-value pair. If the vertex has no properties, enter a space (%20). For example: 1,%20,,,, |
3 | value_type | An integer code that identifies the data type of the value in the key-value pair. In the simple.opv example later in this topic, 1 denotes a string value and 2 an integer value. |
4 | value | The encoded, nonnull value of key_name when it is neither numeric nor date |
5 | value | The encoded, nonnull value of key_name when it is numeric |
6 | value | The encoded, nonnull value of key_name when it is a date. Encode the date with java.text.SimpleDateFormat: SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSXXX"); encode(sdf.format((java.util.Date) value)); |
Required Grouping of Vertices: A vertex can have multiple properties, and the vertex file includes a record (represented by a single line of text in the flat file) for each combination of a vertex ID and a property for that vertex. In the vertex file, all records for a given vertex must be grouped together (that is, without any intervening records for other vertices). You can accomplish this any way you want, but a convenient way is to sort the vertex file records in ascending (or descending) order by vertex ID. (Note, however, that a vertex file is not required to have all records sorted by vertex ID; this is merely one way to achieve the grouping requirement.)
Each line in an edge file is a record that describes an edge of the property graph. Each record describes one key-value property of the edge, so an edge with multiple properties is described by multiple records.
A record contains nine fields separated by commas. Each record must contain eight commas to delimit all fields, whether or not they have values:
edge_ID, source_vertex_ID, destination_vertex_ID, edge_label, key_name, value_type, value, value, value
Table 4-6 describes the fields composing an edge file record.
Table 4-6 Edge File Record Format
Field Number | Name | Description |
---|---|---|
1 | edge_ID | An integer that uniquely identifies the edge |
2 | source_vertex_ID | The vertex_ID of the outgoing tail of the edge |
3 | destination_vertex_ID | The vertex_ID of the incoming head of the edge |
4 | edge_label | The encoded label of the edge, which describes the relationship between the two vertices |
5 | key_name | The encoded name of the key in a key-value pair. If the edge has no properties, enter a space (%20). For example: 100,1,2,likes,%20,,,, |
6 | value_type | An integer code that identifies the data type of the value in the key-value pair, using the same codes as the vertex file |
7 | value | The encoded, nonnull value of key_name when it is neither numeric nor date |
8 | value | The encoded, nonnull value of key_name when it is numeric |
9 | value | The encoded, nonnull value of key_name when it is a date. Encode the date with java.text.SimpleDateFormat: SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSXXX"); encode(sdf.format((java.util.Date) value)); |
Required Grouping of Edges: An edge can have multiple properties, and the edge file includes a record (represented by a single line of text in the flat file) for each combination of an edge ID and a property for that edge. In the edge file, all records for a given edge must be grouped together (that is, without any intervening records for other edges). You can accomplish this any way you want, but a convenient way is to sort the edge file records in ascending (or descending) order by edge ID. (Note, however, that an edge file is not required to have all records sorted by edge ID; this is merely one way to achieve the grouping requirement.)
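As a minimal illustration (this helper is not part of the distribution), the following sketch writes the simple.opv and simple.ope pair shown at the end of this topic:

// Hypothetical helper: writes the two-vertex, one-edge example graph
// (simple.opv and simple.ope) in Oracle flat file format.
import java.io.FileWriter;
import java.io.IOException;

public class WriteFlatFilesSketch {
    public static void main(String[] args) throws IOException {
        try (FileWriter opv = new FileWriter("simple.opv")) {
            // vertex_ID, key_name, value_type, value, value, value
            opv.write("1,age,2,,10,\n");       // vertex 1: integer age = 10
            opv.write("1,name,1,John,,\n");    // vertex 1: string name = John
            opv.write("2,name,1,Mary,,\n");    // vertex 2: string name = Mary
            opv.write("2,hobby,1,soccer,,\n"); // vertex 2: string hobby = soccer
        }
        try (FileWriter ope = new FileWriter("simple.ope")) {
            // edge_ID, source, destination, label, key_name, value_type, value, value, value
            ope.write("100,1,2,friendOf,%20,,,,\n"); // edge with no properties
        }
    }
}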
Vertex and edge files use UTF-8 encoding. Table 4-7 lists the special characters that must be encoded as strings when they appear in a vertex or edge property (key-value pair) or an edge label. No other characters require encoding.
Table 4-7 Special Character Codes in the Oracle Flat File Format
Special Character | String Encoding | Description |
---|---|---|
% | %25 | Percent |
(tab) | %09 | Tab |
(space) | %20 | Space |
(newline) | %0A | New line |
(return) | %0D | Return |
, | %2C | Comma |
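A sketch of the encode() helper referenced in Tables 4-5 and 4-6 might apply these substitutions (the method name and placement are assumptions; only the character codes come from Table 4-7):

// Hypothetical percent-encoder for the special characters in Table 4-7.
// Replace "%" first so the escape character itself is not re-escaped.
public class FlatFileEncoderSketch {
    static String encode(String s) {
        return s.replace("%", "%25")
                .replace("\t", "%09")
                .replace(" ", "%20")
                .replace("\n", "%0A")
                .replace("\r", "%0D")
                .replace(",", "%2C");
    }

    public static void main(String[] args) {
        System.out.println(encode("friend, colleague")); // prints friend%2C%20colleague
    }
}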
An example property graph in Oracle flat file format is as follows. In this example, there are two vertices (John and Mary), and a single edge denoting that John is a friend of Mary.
% cat simple.opv
1,age,2,,10,
1,name,1,John,,
2,name,1,Mary,,
2,hobby,1,soccer,,
% cat simple.ope
100,1,2,friendOf,%20,,,,
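Once the pair of files exists, it can be loaded through the data access layer. The fragment below is a sketch that mirrors the opgdl.loadData call from the HBase shell transcript earlier in this chapter; the connection arguments (args), as before, depend on your backend database.

// Sketch: loading simple.opv and simple.ope into a graph instance.
import oracle.pg.hbase.OraclePropertyGraph;
import oracle.pg.hbase.OraclePropertyGraphDataLoader;

public class LoadFlatFilesSketch {
    public static void main(String[] args) throws Exception {
        OraclePropertyGraph opg = OraclePropertyGraph.getInstance(args);
        OraclePropertyGraphDataLoader opgdl =
            OraclePropertyGraphDataLoader.getInstance();
        // (graph, vertex file, edge file, DOP, and three loader settings
        //  kept at the transcript's values)
        opgdl.loadData(opg, "simple.opv", "simple.ope", 1, 1, 0, null);
        opg.shutdown();
    }
}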
The Oracle Big Data Spatial and Graph support for property graphs includes an example Python user interface. It can invoke a set of example Python scripts and modules that perform a variety of property graph operations.
Instructions for installing the example Python user interface are in the /property_graph/examples/pyopg/README file under the installation home (/opt/oracle/oracle-spatial-graph by default).
The example Python scripts in /property_graph/examples/pyopg/ can be used with Oracle Spatial and Graph Property Graph, and you may want to change and enhance them (or copies of them) to suit your needs.
To invoke the user interface to run the examples, use the script pyopg.sh.
The examples include the following:
Example 1: Connect to an Oracle NoSQL Database and perform a simple check of the number of vertices and edges. To run it:
cd /opt/oracle/oracle-spatial-graph/property_graph/examples/pyopg
./pyopg.sh

connectONDB("mygraph", "kvstore", "localhost:5000")
print "vertices", countV()
print "edges", countE()
In the preceding example, mygraph is the name of the graph stored in the Oracle NoSQL Database, and kvstore and localhost:5000 are the connection information for accessing it. Customize these values for your environment.
Example 2: Connect to Apache HBase and perform a simple check of the number of vertices and edges. To run it:
cd /opt/oracle/oracle-spatial-graph/property_graph/examples/pyopg
./pyopg.sh

connectHBase("mygraph", "localhost", "2181")
print "vertices", countV()
print "edges", countE()
In the preceding example, mygraph is the name of the graph stored in Apache HBase, and localhost and 2181 are the connection information for accessing it. Customize these values for your environment.
Example 3: Connect to an Oracle NoSQL Database and run a few analytical functions. To run it:
cd /opt/oracle/oracle-spatial-graph/property_graph/examples/pyopg
./pyopg.sh

connectONDB("mygraph", "kvstore", "localhost:5000")
print "vertices", countV()
print "edges", countE()

import pprint
analyzer = analyst()
print "# triangles in the graph", analyzer.countTriangles()

graph_communities = [{"commid": i.getName(), "size": i.size()} for i in analyzer.communities().iterator()]

import pandas as pd
import numpy as np
community_frame = pd.DataFrame(graph_communities)
community_frame[:5]

import matplotlib as mpl
import matplotlib.pyplot as plt
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(16,12))
community_frame["size"].plot(kind="bar", title="Communities and Sizes")
ax.set_xticklabels(community_frame.index)
plt.show()
The preceding example connects to an Oracle NoSQL Database, prints basic information about the vertices and edges, gets an in-memory analyst, computes the number of triangles, performs community detection, and finally plots the communities and their sizes in a bar chart.
Example 4: Connect to Apache HBase and run a few analytical functions. To run it:
cd /opt/oracle/oracle-spatial-graph/property_graph/examples/pyopg
./pyopg.sh

connectHBase("mygraph", "localhost", "2181")
print "vertices", countV()
print "edges", countE()

import pprint
analyzer = analyst()
print "# triangles in the graph", analyzer.countTriangles()

graph_communities = [{"commid": i.getName(), "size": i.size()} for i in analyzer.communities().iterator()]

import pandas as pd
import numpy as np
community_frame = pd.DataFrame(graph_communities)
community_frame[:5]

import matplotlib as mpl
import matplotlib.pyplot as plt
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(16,12))
community_frame["size"].plot(kind="bar", title="Communities and Sizes")
ax.set_xticklabels(community_frame.index)
plt.show()
The preceding example connects to Apache HBase, prints basic information about the vertices and edges, gets an in-memory analyst, computes the number of triangles, performs community detection, and finally plots the communities and their sizes in a bar chart.
For detailed information about this example Python interface, see the following directory under the installation home:
property_graph/examples/pyopg/doc/