The RDF Graph feature supports loading RDF triples data into the default graph or a named graph in Oracle NoSQL Database. RDF data can be loaded into the graph using two approaches:
Triples can be inserted incrementally using the graph.add(Triple.create()) API, as illustrated in Example1.java: Create a default graph and add/delete triples and Example1b.java: Create a named graph and add/delete triples.
Triples can be bulk loaded from an RDF file using the DatasetGraphNoSql.load() API, as illustrated in Example2.java: Load an RDF file and Concurrent RDF data loading.
To load RDF data files containing thousands to millions of records into an Oracle NoSQL Database, you can use concurrent loading in the RDF Graph feature to speed up the task.
Concurrent (parallel) loading is an optimized approach to data loading in the RDF Graph feature. Triples are organized into batches, and a batch is written only when it is full or when all triples from the RDF file have been read. To improve write performance, each batch is then stored in Oracle NoSQL Database using multiple threads and connections.
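The batching behavior described above can be illustrated with a minimal, self-contained sketch. It is not the RDF Graph feature's implementation: the class BatchedLoaderSketch, its load and flush methods, and the Consumer used as a stand-in for a database write are all hypothetical names introduced for illustration, and triples are represented as plain strings. It shows only the general pattern, assuming a fixed thread pool sized to the degree of parallelism: fill a batch, write it with several threads when it is full, and write any remaining partial batch at the end of the input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical illustration only; not part of the Oracle NoSQL Database API.
public class BatchedLoaderSketch {

    // Sends every triple to `sink`, flushing when a batch is full and once
    // more at the end for a partial batch. Returns the number of flushes.
    static int load(Iterable<String> triples, int batchSize, int dop,
                    Consumer<String> sink) throws Exception {
        if (batchSize < 1 || dop < 1) {
            throw new IllegalArgumentException("batchSize and dop must be positive");
        }
        ExecutorService pool = Executors.newFixedThreadPool(dop);
        int flushes = 0;
        try {
            List<String> batch = new ArrayList<>(batchSize);
            for (String triple : triples) {
                batch.add(triple);
                if (batch.size() == batchSize) {      // batch is full
                    flush(batch, dop, pool, sink);
                    flushes++;
                    batch = new ArrayList<>(batchSize);
                }
            }
            if (!batch.isEmpty()) {                   // end of input, partial batch
                flush(batch, dop, pool, sink);
                flushes++;
            }
        } finally {
            pool.shutdown();
        }
        return flushes;
    }

    // Splits one batch into at most `dop` slices, writes the slices
    // concurrently, and blocks until all of them have completed.
    private static void flush(List<String> batch, int dop, ExecutorService pool,
                              Consumer<String> sink) throws Exception {
        int sliceSize = (batch.size() + dop - 1) / dop;   // ceiling division
        List<Future<?>> pending = new ArrayList<>();
        for (int from = 0; from < batch.size(); from += sliceSize) {
            List<String> slice =
                batch.subList(from, Math.min(from + sliceSize, batch.size()));
            pending.add(pool.submit(() -> slice.forEach(sink)));
        }
        for (Future<?> f : pending) {
            f.get();                                  // rethrows a failed write
        }
    }

    public static void main(String[] args) throws Exception {
        // A thread-safe set stands in for the database in this sketch.
        Set<String> stored = ConcurrentHashMap.newKeySet();
        List<String> input = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            input.add("triple-" + i);
        }
        int flushes = load(input, 4, 2, stored::add);
        System.out.println(flushes + " flushes, " + stored.size() + " triples stored");
    }
}
```

With 10 input triples, a batch size of 4, and a degree of parallelism of 2, the sketch flushes two full batches and one partial batch. Waiting for each batch to finish before starting the next keeps memory use bounded by the batch size, at the cost of some idle time between batches.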
You can use parallel loading by specifying the degree of parallelism (the number of threads used in load operations) and the batch size when calling the load method of the DatasetGraphNoSql class.
The following example loads an RDF data file in Oracle NoSQL Database using parallel loading. The degree of parallelism and batch size used are controlled by the input parameters iDOP and iBatchSize respectively.
On a balanced hardware setup with 4 or more CPU cores, setting the degree of parallelism (DOP) to 8 or 16 can significantly improve the speed of the load operation when many triples are to be processed.
public static void main(String[] args) throws Exception {

    String szStoreName = args[0];
    String szHostName  = args[1];
    String szHostPort  = args[2];
    int iBatchSize     = Integer.parseInt(args[3]);
    int iDOP           = Integer.parseInt(args[4]);

    // Create Oracle NoSQL connection
    OracleNoSqlConnection conn =
        OracleNoSqlConnection.createInstance(szStoreName, szHostName, szHostPort);

    // Create Oracle NoSQL datasetgraph
    OracleGraphNoSql graph = new OracleGraphNoSql(conn);
    DatasetGraphNoSql datasetGraph = DatasetGraphNoSql.createFrom(graph);

    // Close graph, as it is no longer needed
    graph.close();

    // Clear datasetgraph
    datasetGraph.clearRepository();

    // Load N-QUADS data from a file into the Oracle NoSQL Database
    DatasetGraphNoSql.load("example.nt",
                           Lang.NQUADS,          // data format
                           conn,
                           "http://example.org",
                           iBatchSize,           // batch size
                           iDOP);                // degree of parallelism

    // Create dataset from Oracle NoSQL datasetgraph to execute the query
    Dataset ds = DatasetImpl.wrap(datasetGraph);

    String szQuery = "select * where { graph ?g { ?s ?p ?o } }";
    System.out.println("Execute query " + szQuery);

    Query query = QueryFactory.create(szQuery);
    QueryExecution qexec = QueryExecutionFactory.create(query, ds);

    try {
        ResultSet results = qexec.execSelect();
        ResultSetFormatter.out(System.out, results, query);
    } finally {
        qexec.close();
    }

    ds.close();
    conn.dispose();
}