Chapter 4. Load an RDF Graph

The RDF Graph feature supports loading RDF triples into the default graph or a named graph in Oracle NoSQL Database. RDF data can be loaded into a graph using two approaches:

Triples can be inserted incrementally using the graph.add(Triple.create()) API, as illustrated in Example1.java: Create a default graph and add/delete triples and Example1b.java: Create a named graph and add/delete triples.

Triples can be bulk loaded from an RDF file using the DatasetGraphNoSql.load() API, as illustrated in Example2.java: Load an RDF file and in Concurrent RDF data loading.

Parallel Loading Using the RDF Graph feature

To load RDF data files containing thousands to millions of records into Oracle NoSQL Database, you can use the concurrent loading support in the RDF Graph feature to speed up the task.

Concurrent (or parallel) loading is an optimized approach to data loading in the RDF Graph feature. Triples are organized into batches, and a batch is written only when it is full or when the loader has read all remaining triples from the RDF file. To increase write performance, each batch is stored using multiple threads and connections to Oracle NoSQL Database.
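The batching behavior described above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the RDF Graph feature's actual implementation: the class BatchedLoadSketch, the method loadInBatches, and the writer callback are hypothetical names, and a simple counter stands in for the write to Oracle NoSQL Database.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical sketch: names and structure are illustrative only.
public class BatchedLoadSketch {

    /**
     * Groups records into batches of batchSize and writes each batch using
     * a pool of dop threads. A batch is flushed only when it is full or
     * when the input is exhausted. Returns the number of batches flushed.
     */
    public static <T> int loadInBatches(Iterable<T> records, int batchSize,
                                        int dop, Consumer<T> writer)
            throws Exception {
        if (batchSize < 1 || dop < 1) {
            throw new IllegalArgumentException("batchSize and dop must be >= 1");
        }
        ExecutorService pool = Executors.newFixedThreadPool(dop);
        int flushed = 0;
        try {
            List<T> batch = new ArrayList<>(batchSize);
            for (T record : records) {
                batch.add(record);
                if (batch.size() == batchSize) {   // batch is full
                    flush(batch, pool, writer);
                    flushed++;
                    batch = new ArrayList<>(batchSize);
                }
            }
            if (!batch.isEmpty()) {                // last, partial batch
                flush(batch, pool, writer);
                flushed++;
            }
        } finally {
            pool.shutdown();
        }
        return flushed;
    }

    // Submits every record of the batch to the pool and waits for completion.
    private static <T> void flush(List<T> batch, ExecutorService pool,
                                  Consumer<T> writer) throws Exception {
        List<Future<?>> pending = new ArrayList<>();
        for (T record : batch) {
            pending.add(pool.submit(() -> writer.accept(record)));
        }
        for (Future<?> f : pending) {
            f.get();                               // propagates write failures
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> triples = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            triples.add("<u:s" + i + "> <u:p> <u:o" + i + "> .");
        }
        AtomicInteger written = new AtomicInteger();
        // The counter stands in for a real store write.
        int batches = loadInBatches(triples, 4, 2, t -> written.incrementAndGet());
        System.out.println(batches + " batches, " + written.get() + " triples");
    }
}
```

With 10 records, a batch size of 4, and 2 threads, the sketch flushes three batches (4, 4, and 2 records). A larger batch size means fewer flushes, while a higher thread count allows more writes within a batch to proceed at the same time.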

To use parallel loading, specify the degree of parallelism (the number of threads used in load operations) and the batch size when calling the load method of the DatasetGraphNoSql class.

The following example loads an RDF data file into Oracle NoSQL Database using parallel loading. The degree of parallelism and the batch size are controlled by the input parameters iDOP and iBatchSize, respectively.

On a balanced hardware setup with 4 or more CPU cores, setting the DOP to 8 (or 16) can significantly improve the speed of the load operation when many triples are processed.

// Package names below are assumed from the Jena 2.x-era examples shipped
// with the RDF Graph feature; adjust them if your distribution differs.
import com.hp.hpl.jena.query.*;
import com.hp.hpl.jena.sparql.core.DatasetImpl;
import org.openjena.riot.Lang;
import oracle.rdf.kv.client.jena.*;

// Placeholder class name: the original listing omits the class declaration.
public class ParallelLoadExample
{
  public static void main(String[] args) throws Exception
  {
    // Arguments: store name, host name, host port, batch size, DOP
    String szStoreName  = args[0];
    String szHostName   = args[1];
    String szHostPort   = args[2];
    int iBatchSize      = Integer.parseInt(args[3]);
    int iDOP            = Integer.parseInt(args[4]);

    // Create Oracle NoSQL connection
    OracleNoSqlConnection conn
      = OracleNoSqlConnection.createInstance(szStoreName,
                                             szHostName,
                                             szHostPort);

    // Create Oracle NoSQL datasetgraph
    OracleGraphNoSql graph = new OracleGraphNoSql(conn);
    DatasetGraphNoSql datasetGraph = DatasetGraphNoSql.createFrom(graph);

    // Close graph, as it is no longer needed
    graph.close();

    // Clear datasetgraph
    datasetGraph.clearRepository();

    // Load N-QUADS data from a file into the Oracle NoSQL Database
    DatasetGraphNoSql.load("example.nt",
                           Lang.NQUADS,         // data format
                           conn,
                           "http://example.org",
                           iBatchSize,          // batch size
                           iDOP);               // degree of parallelism

    // Create dataset from Oracle NoSQL datasetgraph to execute the query
    Dataset ds = DatasetImpl.wrap(datasetGraph);

    // Select every quad stored in a named graph
    String szQuery = "select * where { graph ?g { ?s ?p ?o }  }";
    System.out.println("Execute query " + szQuery);

    Query query = QueryFactory.create(szQuery);
    QueryExecution qexec = QueryExecutionFactory.create(query, ds);

    try {
      ResultSet results = qexec.execSelect();
      ResultSetFormatter.out(System.out, results, query);
    }
    finally {
      // Release query resources even if execution fails
      qexec.close();
    }

    ds.close();
    conn.dispose();
  }
}
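
The guidance on choosing a degree of parallelism can be turned into a small helper that derives a starting value from the number of available CPU cores. This is only a sketch: the class DopHint and the method suggestDop are hypothetical, the value 8 for machines with 4 or more cores follows the recommendation in this section, and the fallback for smaller machines (one thread per core) is an assumption rather than documented behavior.

```java
// Hypothetical helper: not part of the RDF Graph feature API.
public class DopHint {

    /**
     * Suggests a starting degree of parallelism for a load operation.
     * With 4 or more cores, returns 8, as recommended in this section.
     * With fewer cores, returns the core count (an assumption).
     */
    public static int suggestDop(int cores) {
        if (cores < 1) {
            throw new IllegalArgumentException("cores must be >= 1");
        }
        return cores >= 4 ? 8 : cores;
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("cores=" + cores
                + ", suggested DOP=" + suggestDop(cores));
    }
}
```

The returned value could be passed as the iDOP argument in the example above. Treat it as a starting point and measure load times with a few values (for example 8 and 16) on the target hardware.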