5.4.2.3 SQL*Loader-Based Data Loading
SQL*Loader-based data loading uses Oracle SQL*Loader to load the graph data into Oracle Database. SQL*Loader loads data from external files into Oracle Database tables. In this case, the vertices (or edges) in the given input stream will be spread among multiple chunks by the splitter thread. Each chunk will be processed by a different loader thread that inserts all the elements in the chunk into a temporary work table using SQL*Loader. The number of splitter and loader threads used is determined by the degree of parallelism (DOP) specified by the user.
After all the graph data is loaded into the temporary work table, the graph loader will load all the data stored in the temporary work tables into the property graph VT$ and GE$ tables.
The following code fragment loads the graph data from a vertex and edge files in Oracle flat-file format using a SQL-based parallel data loading with a degree of parallelism of 48. To use the APIs, the path to the SQL*Loader must be specified.
String szUser = "username";
String szPassword = "password";
String szDbId = "db18c"; /*service name of the database*/
String szOPVFile = "../../data/connections.opv";
String szOPEFile = "../../data/connections.ope";
String szSQLLoaderPath = "<YOUR_ORACLE_HOME>/bin/sqlldr";
OraclePropertyGraph opg = OraclePropertyGraph.getInstance( args, szGraphName);
opgdl = OraclePropertyGraphDataLoader.getInstance();
opgdl.loadDataWithSqlLdr(opg, szUser, szPassword, szDbId,
szOPVFile, szOPEFile,
48 /* DOP */,
true /*named pipe flag */,
szSQLLoaderPath /* SQL*Loader path: the path to
bin/sqlldr*/,
true /*rebuild index */,
"pddl=t,pdml=t" /* options */);
As with JDBC-based data loading, SQL*Loader-based data loading supports parallel data loading using a single file, multiple files, partitions, and fine-tuning.
Subtopics:
-
SQL*Loader-Based Data Loading with Multiple Files
-
SQL*Loader-Based Data Loading with Partitions
-
SQL*Loader-Based Parallel Data Loading Using Fine-Tuning
SQL*Loader-Based Data Loading with Multiple Files
SQL*Loader-based data loading supports loading vertices and edges from multiple files or input streams into the database. The following code fragment loads multiple vertex and edge files using the parallel data loading APIs. In the example, two string arrays szOPVFiles
and szOPEFiles
are used to hold the input files.
String szUser = "username";
String szPassword = "password";
String szDbId = "db18c"; /*service name of the database*/
String[] szOPVFiles = new String[] {"../../data/connections-p1.opv",
"../../data/connections-p2.opv"};
String[] szOPEFiles = new String[] {"../../data/connections-p1.ope",
"../../data/connections-p2.ope"};
String szSQLLoaderPath = "../../../dbhome_1/bin/sqlldr";
OraclePropertyGraph opg = OraclePropertyGraph.getInstance( args, szGraphName);
opgdl = OraclePropertyGraphDataLoader.getInstance();
opgdl. loadDataWithSqlLdr (opg, szUser, szPassword, szDbId,
szOPVFiles, szOPEFiles,
48 /* DOP */,
true /* named pipe flag */,
szSQLLoaderPath /* SQL*Loader path */,
true /* rebuild index flag */,
"pddl=t,pdml=t" /* options */);
SQL*Loader-Based Data Loading with Partitions
When dealing with a large property graph, the SQL*Loader-based data loading API allows loading the graph data in Oracle flat-file format into Oracle Database using logical partitioning. Each partition represents a subset of vertices (or edges) in the graph data file of size that is approximately the number of distinct element IDs in the file divided by the number of partitions. Each partition is identified by an integer ID in the range of [0, Number of partitions – 1].
To use parallel data loading with partitions, you must specify the total number of partitions to use and the partition offset, in addition to the base parameters used in the loadDataWithSqlLdr
API. To fully load a graph data file or input stream into the database, you must execute the data loading operation as many times as the defined number of partitions. For example, to load the graph data from a file using two partitions, there should be two data loading API calls using an offset of 0 and 1. Each call to the data loader can be processed using multiple threads or a separate Java client on a single system or multiple systems.
Note that this approach is intended to be used with a single vertex file (or input stream) and a single edge file (or input stream). Additionally, this option requires disabling the indexes and constraints on vertices and edges. These indexes and constraints must be rebuilt after all partitions have been loaded.
The example for JDBC-based data loading with partitions can be easily migrated to work as SQL*Loader- based loading with partitions. The only changes needed are to replace API loadData()
with loadDataWithSqlLdr()
, and supply some additional input parameters such as the location of SQL*Loader.
SQL*Loader-Based Parallel Data Loading Using Fine-Tuning
SQL Loader-based data loading supports fine-tuning the subset of data from a line to be loaded, as well as the ID offset to use when loading the elements into the property graph instance. You can specify the subset of data to load from a file by specifying the maximum number of lines to read from the file and the offset line number for both vertices and edges. This way, data will be loaded from the offset line number until the maximum number of lines has been read. If the maximum line number is -1, the loading process will scan the data until reaching the end of file.
Because graph data files may have some ID collisions, the SQL Loader-based data loading allows you to define a vertex and edge ID offset. This way, the ID of each loaded vertex will be obtained from the sum of the original vertex ID with the given vertex ID offset. Similarly, the ID of each loaded edge will be generated from the sum of the original edge ID with the given edge ID offset. Note that the vertices and edge files must be correlated, because the in/out vertex ID for the loaded edges will be modified with respect to the specified vertex ID offset. This operation is supported only in a data loading using a single partition.
The following code fragment loads the first 100 vertices and edges from the given graph data file. In this example, no ID offset is provided.
String szUser = "username";
String szPassword = "password";
String szDbId = "db18c"; /* service name of the database */
String szOPVFile = "../../data/connections.opv";
String szOPEFile = "../../data/connections.ope";
String szSQLLoaderPath = "../../../dbhome_1/bin/sqlldr";
// Run the data loading using fine tuning
long lVertexOffsetlines = 0;
long lEdgeOffsetlines = 0;
long lVertexMaxlines = 100;
long lEdgeMaxlines = 100;
long lVIDOffset = 0;
long lEIDOffset = 0;
OraclePropertyGraph opg = OraclePropertyGraph.getInstance( args, szGraphName);
OraclePropertyGraphDataLoader opgdl = OraclePropertyGraphDataLoader.getInstance();
opgdl.loadDataWithSqlLdr(opg, szUser, szPassword, szDbId,
szOPVFile, szOPEFile,
lVertexOffsetlines /* offset of lines to start loading
from partition, default 0*/,
lEdgeOffsetlines /* offset of lines to start loading from
partition, default 0*/,
lVertexMaxlines /* maximum number of lines to start
loading from partition, default -1
(all lines in partition)*/,
lEdgeMaxlines /* maximum number of lines to start loading
from partition, default -1 (all lines in
partition) */,
lVIDOffset /* vertex ID offset: the vertex ID will be
original vertex ID + offset, default 0 */,
lEIDOffset /* edge ID offset: the edge ID will be
original edge ID + offset, default 0 */,
48 /* DOP */,
1 /* Total number of partitions, default 1 */,
0 /* Partition to load (from 0 to totalPartitions - 1,
default 0) */,
OraclePropertyGraphDataLoader.NAMEDPIPE
/* splitter flag */,
"chunkPrefix" /* prefix */,
szSQLLoaderPath /* SQL*Loader path: the path to
bin/sqlldr*/,
true /* rebuild index */,
"pddl=t,pdml=t" /* options */);
Parent topic: Parallel Loading of Graph Data