PGX 20.1.1
Documentation

Oracle Two Tables RDBMS Format

When using a relational data model, graph data can be represented with two relational tables. One table is for vertices and their properties; the other table for edges and their properties. See Handling Property Graphs Using a Two-Tables Schema for more details.

PGX allows graphs to be loaded from such relational graph representation -- two RDBMS tables representing vertices and edges. All you need is to specify the following additional fields in the graph config.

General Configuration Fields

Field Type Description Default
array_compaction_thresholdnumber[only relevant if the graph is optimized for updates] threshold used to determined when to compact the delta-logs into a new array. If lower than the engine min_array_compaction_threshold value, min_array_compaction_threshold will be used instead0.2
attributesobjectadditional attributes needed to read/write the graph datanull
data_source_idstringthe data source id to use to connect to databasenull
datastoreenum[rdbms, spark, file]underlying datastorerdbms
edge_id_strategyenum[no_ids, keys_as_ids, unstable_generated_ids]Indicates what ID strategy should be used for the edges of this graph. If not specified (or set to null), the strategy will be determined during loading or using a default valuenull
edge_id_typeenum[long]type of the edge ID. For homogeneous graphs, if not specified (or set to null), it will default to long.null
edge_propsarray of objectspecification of edge properties associated with graph[]
edges_key_columnstringname of primary key column in edges tableeid
edges_label_columnstringcolumn name for edge labelel
edges_table_namestringname of edges tablenull
error_handlingobjecterror handling configurationnull
external_storesarray of objectSpecification of the external stores where external string properties reside.[]
formatenum[pgb, edge_list, adj_list, graphml, pg, rdf, two_tables]graph formatnull
from_nid_columnstringcolumn name for source nodesvid
insert_batch_sizeintegerbatch size of the rows to be inserted10000
jdbc_urlstringjdbc URL pointing to databasenull
keystore_aliasstringalias to the keystore to use when connecting to databasenull
loadingobjectloading-specific configurationnull
local_date_formatarray of stringarray of local_date formats to use when loading and storing local_date properties. Please see DateTimeFormatter for a documentation of the format string[]
max_prefetched_rowsintegermaximun number or rows prefetched during each round trip (resultset-the database)10000
nodes_key_columnstringname of primary key column in nodes tablevid
nodes_label_columnstringcolumn name for node labelvl
nodes_table_namestringname of nodes tablenull
num_connectionsintegernumber of connections to read/write data from/to two tables<no-of-cpus>
optimized_forenum[read, updates]Indicates if the graph should use data-structures optimized for read-intensive scenarios or for fast updatesread
partition_while_loadingenum[by_label, no]Indicates if the graph should be partitioned while loadingnull
passwordstringpassword to use when connecting to databasenull
point2dstringlongitude and latitude as floating point values separated by a space0.0 0.0
schemastringschema where the tables are going to be writtennull
tablespacestringtablespace where the tables are going to be writtenusers
time_formatarray of stringthe time format to use when loading and storing time properties. Please see DateTimeFormatter for a documentation of the format string[]
time_with_timezone_formatarray of stringthe time with timezone format to use when loading and storing time with timezone properties. Please see DateTimeFormatter for a documentation of the format string[]
timestamp_formatarray of stringthe timestamp format to use when loading and storing timestamp properties. Please see DateTimeFormatter for a documentation of the format string[]
timestamp_with_timezone_formatarray of stringthe timestamp with timezone format to use when loading and storing timestamp with timezone properties. Please see DateTimeFormatter for a documentation of the format string[]
to_nid_columnstringcolumn name for destination nodedvid
usernamestringusername to use when connecting to databasenull
vector_component_delimitercharacterdelimiter for the different components of vector properties;
vertex_id_strategyenum[no_ids, keys_as_ids, unstable_generated_ids]Indicates what ID strategy should be used for the vertices of this graph. If not specified (or set to null), the strategy will be automatically detectednull
vertex_id_typeenum[int, integer, long, string]type of the vertex ID. For homogeneous graphs, if not specified (or set to null), it will default to a specific value (depending on the origin of the data).null
vertex_propsarray of objectspecification of vertex properties associated with graph[]

Column Name Configuration Fields

Field Type Description Default
array_compaction_thresholdnumber[only relevant if the graph is optimized for updates] threshold used to determined when to compact the delta-logs into a new array. If lower than the engine min_array_compaction_threshold value, min_array_compaction_threshold will be used instead0.2
attributesobjectadditional attributes needed to read/write the graph datanull
edge_id_strategyenum[no_ids, keys_as_ids, unstable_generated_ids]Indicates what ID strategy should be used for the edges of this graph. If not specified (or set to null), the strategy will be determined during loading or using a default valuenull
edge_id_typeenum[long]type of the edge ID. For homogeneous graphs, if not specified (or set to null), it will default to long.null
edge_propsarray of objectspecification of edge properties associated with graph[]
edges_key_columnstringname of primary key column in edges tableeid
edges_label_columnstringcolumn name for edge labelel
error_handlingobjecterror handling configurationnull
external_storesarray of objectSpecification of the external stores where external string properties reside.[]
formatenum[pgb, edge_list, adj_list, graphml, pg, rdf, two_tables]graph formatnull
from_nid_columnstringcolumn name for source nodesvid
keystore_aliasstringalias to the keystore to use when connecting to databasenull
loadingobjectloading-specific configurationnull
local_date_formatarray of stringarray of local_date formats to use when loading and storing local_date properties. Please see DateTimeFormatter for a documentation of the format string[]
nodes_key_columnstringname of primary key column in nodes tablevid
nodes_label_columnstringcolumn name for node labelvl
optimized_forenum[read, updates]Indicates if the graph should use data-structures optimized for read-intensive scenarios or for fast updatesread
partition_while_loadingenum[by_label, no]Indicates if the graph should be partitioned while loadingnull
passwordstringpassword to use when connecting to databasenull
point2dstringlongitude and latitude as floating point values separated by a space0.0 0.0
time_formatarray of stringthe time format to use when loading and storing time properties. Please see DateTimeFormatter for a documentation of the format string[]
time_with_timezone_formatarray of stringthe time with timezone format to use when loading and storing time with timezone properties. Please see DateTimeFormatter for a documentation of the format string[]
timestamp_formatarray of stringthe timestamp format to use when loading and storing timestamp properties. Please see DateTimeFormatter for a documentation of the format string[]
timestamp_with_timezone_formatarray of stringthe timestamp with timezone format to use when loading and storing timestamp with timezone properties. Please see DateTimeFormatter for a documentation of the format string[]
to_nid_columnstringcolumn name for destination nodedvid
vector_component_delimitercharacterdelimiter for the different components of vector properties;
vertex_id_strategyenum[no_ids, keys_as_ids, unstable_generated_ids]Indicates what ID strategy should be used for the vertices of this graph. If not specified (or set to null), the strategy will be automatically detectednull
vertex_id_typeenum[int, integer, long, string]type of the vertex ID. For homogeneous graphs, if not specified (or set to null), it will default to a specific value (depending on the origin of the data).null
vertex_propsarray of objectspecification of vertex properties associated with graph[]

Warning: Case of Column Names

PGX assumes that the provided property names are an exact match for the column names. For instance, the loading will fail if the column is named VPROP in the RDBMS but vprop in the configuration. Note that since SQL is by default case-insensitive, if you create a column with a name in lowercase, the RDBMS will store it in uppercase, and you will need to specify the column name in uppercase in the PGX's graph configuration. To avoid this, when creating the column in your database, you can use quotes around the name for the RDBMS to store it in a case-sensitive manner.

Example

The following examples illustrate how to configure PGX to load a graph from two relational tables (a vertex table and an edge table).

Example Graph Configuration

{
  "jdbc_url": "jdbc:oracle:thin:@mydatabaseserver:1521:sid",
  "format": "two_tables",
  "datastore": "rdbms",
  "username": "dbUser",
  "password": "dbPassword",
  "nodes_table_name": "nodes",
  "edges_table_name": "edges",
  "node_props": [{
    "name": "VProp1",
    "type": "string"
  },{
    "name": "VProp2",
    "type": "timestamp"
  },{
    "name": "VProp3",
    "type": "double"
  }],
  "edge_props": [{
    "name": "EProp1",
    "type": "string"
  }],
  "loading_options": {
    "load_edge_label": true
  }
}

Example Vertex Table

vid VProp1 VProp2 VProp3
42 "string" 27/09/2018 42.42
37 "prop" 05/07/1987 42.37
... ... ... ...

Example Edge Table

eid svid dvid EProp1 el
0 42 37 "blah" "label0"
99 42 42 "bazz" "label1"
... ... ... ... ...

The example tables above use the default column names (e.g., vid, eid, ...), but custom column names can be defined using the appropriate configuration fields (e.g., nodes_key_column, edges_key_column; see the column name configuration fields).

For more details in how to specify the proper JDBC url check JDBC thin docs

Handling of NULL Values

In PGX 20.1.1, neither property nor label values cannot be NULL; therefore PGX converts NULL values as illustrated below.

Properties

Whenever PGX encounters a NULL property value in the vertex or edge tables, a default value will be assigned instead. If not specified in the config, the default value is 0 for numeric properties, false for boolean properties, "" (empty String) for String properties and epoch time for date properties. You can set a custom default value in the configuration for a property as in the following example.

{
  "name": "VProp3",
  "type": "double",
  "default": 1.0
}

Labels

If the graph is configured to load labels (for either vertices or edges) and the corresponding table column (by default, named vl and el for vertices and edges, respectively) contains NULL values, then the corresponding labels will not be set.