Merging New Data into an Existing Property Graph

5.4.7 Merging New Data into an Existing Property Graph

In addition to loading graph data into an empty property graph in Oracle Database, you can merge new graph data into an existing (empty or non-empty) graph. As with data loading, data merging splits the input vertices and edges into multiple chunks and merges them with the existing graph in database in parallel.

When doing the merging, the flows are different depends on whether there is an overlap between new graph data and existing graph data. Overlap here means that the same key of a graph element may have different values in the new and existing graph data. For example, key weight of the vertex with ID 1 may have value 0.8 in the new graph data and value 0.5 in the existing graph data. In this case, you must specify whether the new value or the existing value should be used for the key.

The following options are available for graph data merging: JDB-based, external table-based, and SQL loader-based merging.

JDBC-Based Graph Data Merging
External Table-Based Data Merging
SQL Loader-Based Data Merging

JDBC-Based Graph Data Merging

JDBC-based data merging uses Java Database Connectivity (JDBC) APIs to load the new graph data into Oracle Database and then merge the new graph data into an existing graph.

The following example merges the new graph data from vertex and edge files szOPVFile and szOPEFile in Oracle-defined Flat-file format with an existing graph named opg, using a JDBC-based data merging with a DOP (degree of parallelism) of 48, batch size of 1000, and specified data merging options.

String szOPVFile = "../../data/connectionsNew.opv"; 
String szOPEFile = "../../data/connectionsNew.ope"; 
OraclePropertyGraphDataLoader opgdl = OraclePropertyGraphDataLoader.getInstance(); 
opgdl.mergeData(opg, szOPVFile, szOPEFile, 
     48 /*DOP*/, 
     1000 /*Batch Size*/, 
     true /*Rebuild index*/,  
     "pdml=t, pddl=t, no_dup=t, use_new_val_for_dup_key=t" /*Merge options*/);

To optimize the performance of the data merging operations, a set of flags and hints can be specified in the merging options parameter when calling the JDBC-based data merging. These hints include:

DOP: The degree of parallelism to use when merging the data. This parameter determines the number of chunks to generate when splitting the file, as well as the number of loader threads to use when merging the data into the property graph VT$ and GE$ tables.
Batch Size: An integer specifying the batch size to use for Oracle JDBC statements in batching mode.
Rebuild index: If set to true, the data loader will disable all the indexes and constraints defined over the property graph into which the data will be loaded. After all the data is merged into the property graph, all the original indexes and constraints will be rebuilt and enabled.
Merge options: An option (or multiple options separated by commas) to optimize the data merging operations. These options include:
- PDML=T: enables parallel execution for DML operations for the database session used in the data loader. This hint is used to improve the performance of long-running batching jobs.
- PDDL=T: enables parallel execution for DDL operations for the database session used in the data loader. This hint is used to improve the performance of long-running batching jobs.
- NO_DUP=T: assumes the input new graph data does not have invalid duplicates. In a valid property graph, each vertex (or edge) can at most have one value for a given property key. In an invalid property graph, a vertex (or edge) may have two or more values for a particular key. As an example, a vertex, v, has two key/value pairs: name/"John" and name/"Johnny", and they share the same key.
- OVERLAP=F: assumes there is no overlap between new graph data and existing graph data. That is, there is no key with multiple distinct values in the new and existing graph data.
- USE_NEW_VAL_FOR_DUP_KEY=T: if there is overlap between new graph data and existing graph data, use the value in the new graph data; otherwise, use the value in the existing graph data.

External Table-Based Data Merging

External table-based data merging uses an external table to load new graph data into Oracle Database and then merge the new graph data into an existing graph.

External-table based data merging requires a directory object, where the files read by the external tables will be stored. This directory can be created using the following SQL*Plus statements:

create or replace directory tmp_dir as '/tmppath/';
grant read, write on directory tmp_dir to public;

The following example merges the new graph data from a vertex and edge files szOPVFile and szOPEFile in Oracle flat-file format with an existing graph opg using an external table-based data merging, a DOP (degree of parallelism) of 48, and specified merging options.

String szOPVFile = "../../data/connectionsNew.opv"; 
String szOPEFile = "../../data/connectionsNew.ope"; 
String szExtDir = "tmp_dir";
OraclePropertyGraphDataLoader opgdl = OraclePropertyGraphDataLoader.getInstance(); 
opgdl.mergeDataWithExtTab(opg, szOPVFile, szOPEFile, 
     48 /*DOP*/, 
     true /*Use Named Pipe for splitting*/, 
     szExtDir /*database directory object*/, 
     true /*Rebuild index*/,  
     "pdml=t, pddl=t, no_dup=t, use_new_val_for_dup_key=t" /*Merge options*/);

SQL Loader-Based Data Merging

SQL loader-based data merging uses Oracle SQL*Loader to load the new graph data into Oracle Database and then merge the new graph data into an existing graph.

The following example merges the new graph data from a vertex and edge files szOPVFile and szOPEFile in Oracle Flat-file format with an existing graph opg using an SQL loader -based data merging with a DOP (degree of parallelism) of 48 and the specified merging options. To use the APIs, the path to the SQL*Loader needs to be specified.

String szUser = "username";
String szPassword = "password";
String szDbId = "db18c"; /*service name of the database*/ 
String szOPVFile = "../../data/connectionsNew.opv"; 0
String szOPEFile = "../../data/connectionsNew.ope"; 
String szSQLLoaderPath = "<YOUR_ORACLE_HOME>/bin/sqlldr";    
OraclePropertyGraphDataLoader opgdl = OraclePropertyGraphDataLoader.getInstance(); 
opgdl.mergeDataWithSqlLdr(opg, szUser, szPassword, szDbId, szOPVFile, szOPEFile, 
     48 /*DOP*/, 
     true /*Use Named Pipe for splitting*/, 
     szSQLLoaderPath /* SQL*Loader path: the path to bin/sqlldr */, 
     true /*Rebuild index*/,  
     "pdml=t, pddl=t, no_dup=t, use_new_val_for_dup_key=t" /*Merge options*/);

Parent topic: Using Java APIs for Property Graph Data