When using a relational data model, graph data can be represented with two relational tables. One table is for vertices and their properties; the other table for edges and their properties. See Handling Property Graphs Using a Two-Tables Schema for more details.
PGX allows graphs to be loaded from such relational graph representation -- two RDBMS tables representing vertices and edges. All you need is to specify the following additional fields in the graph config.
Field | Type | Description | Default |
---|---|---|---|
array_compaction_threshold | number | [only relevant if the graph is optimized for updates] threshold used to determined when to compact the delta-logs into a new array. If lower than the engine min_array_compaction_threshold value, min_array_compaction_threshold will be used instead | 0.2 |
attributes | object | additional attributes needed to read/write the graph data | null |
data_source_id | string | data source id to use to connect to an RDBMS instance | null |
datastore | enum[rdbms, spark, file] | underlying datastore | rdbms |
edge_id_strategy | enum[no_ids, keys_as_ids, unstable_generated_ids] | Indicates what ID strategy should be used for the edges of this graph. If not specified (or set to null), the strategy will be determined during loading or using a default value | null |
edge_id_type | enum[long] | type of the edge ID. For homogeneous graphs, if not specified (or set to null), it will default to long. | null |
edge_props | array of object | specification of edge properties associated with graph | [] |
edges_key_column | string | name of primary key column in edges table | eid |
edges_label_column | string | column name for edge label | el |
edges_table_name | string | name of edges table | null |
error_handling | object | error handling configuration | null |
external_stores | array of object | Specification of the external stores where external string properties reside. | [] |
format | enum[pgb, edge_list, adj_list, graphml, pg, rdf, two_tables] | graph format | null |
from_nid_column | string | column name for source node | svid |
insert_batch_size | integer | batch size of the rows to be inserted | 10000 |
jdbc_url | string | jdbc URL pointing to an RDBMS instance | null |
keystore_alias | string | alias to the keystore to use when connecting to database | null |
loading | object | loading-specific configuration | null |
local_date_format | array of string | array of local_date formats to use when loading and storing local_date properties. Please see DateTimeFormatter for a documentation of the format string | [] |
max_prefetched_rows | integer | maximun number of rows prefetched during each round trip resultset-database | 10000 |
nodes_key_column | string | name of primary key column in nodes table | vid |
nodes_label_column | string | column name for node label | vl |
nodes_table_name | string | name of nodes table | null |
num_connections | integer | number of connections to read/write data from/to the RDBMS table | <no-of-cpus> |
optimized_for | enum[read, updates] | Indicates if the graph should use data-structures optimized for read-intensive scenarios or for fast updates | read |
partition_while_loading | enum[by_label, no] | Indicates if the graph should be partitioned while loading | null |
password | string | password to use when connecting to database | null |
point2d | string | longitude and latitude as floating point values separated by a space | 0.0 0.0 |
schema | string | schema to use when reading/writing RDBMS objects | null |
tablespace | string | tablespace where the tables are going to be written | users |
time_format | array of string | the time format to use when loading and storing time properties. Please see DateTimeFormatter for a documentation of the format string | [] |
time_with_timezone_format | array of string | the time with timezone format to use when loading and storing time with timezone properties. Please see DateTimeFormatter for a documentation of the format string | [] |
timestamp_format | array of string | the timestamp format to use when loading and storing timestamp properties. Please see DateTimeFormatter for a documentation of the format string | [] |
timestamp_with_timezone_format | array of string | the timestamp with timezone format to use when loading and storing timestamp with timezone properties. Please see DateTimeFormatter for a documentation of the format string | [] |
to_nid_column | string | column name for destination node | dvid |
username | string | username to use when connecting to an RDBMS instance | null |
vector_component_delimiter | character | delimiter for the different components of vector properties | ; |
vertex_id_strategy | enum[no_ids, keys_as_ids, unstable_generated_ids] | Indicates what ID strategy should be used for the vertices of this graph. If not specified (or set to null), the strategy will be automatically detected | null |
vertex_id_type | enum[int, integer, long, string] | type of the vertex ID. For homogeneous graphs, if not specified (or set to null), it will default to a specific value (depending on the origin of the data). | null |
vertex_props | array of object | specification of vertex properties associated with graph | [] |
Field | Type | Description | Default |
---|---|---|---|
array_compaction_threshold | number | [only relevant if the graph is optimized for updates] threshold used to determined when to compact the delta-logs into a new array. If lower than the engine min_array_compaction_threshold value, min_array_compaction_threshold will be used instead | 0.2 |
attributes | object | additional attributes needed to read/write the graph data | null |
edge_id_strategy | enum[no_ids, keys_as_ids, unstable_generated_ids] | Indicates what ID strategy should be used for the edges of this graph. If not specified (or set to null), the strategy will be determined during loading or using a default value | null |
edge_id_type | enum[long] | type of the edge ID. For homogeneous graphs, if not specified (or set to null), it will default to long. | null |
edge_props | array of object | specification of edge properties associated with graph | [] |
edges_key_column | string | name of primary key column in edges table | eid |
edges_label_column | string | column name for edge label | el |
error_handling | object | error handling configuration | null |
external_stores | array of object | Specification of the external stores where external string properties reside. | [] |
format | enum[pgb, edge_list, adj_list, graphml, pg, rdf, two_tables] | graph format | null |
from_nid_column | string | column name for source node | svid |
keystore_alias | string | alias to the keystore to use when connecting to database | null |
loading | object | loading-specific configuration | null |
local_date_format | array of string | array of local_date formats to use when loading and storing local_date properties. Please see DateTimeFormatter for a documentation of the format string | [] |
nodes_key_column | string | name of primary key column in nodes table | vid |
nodes_label_column | string | column name for node label | vl |
optimized_for | enum[read, updates] | Indicates if the graph should use data-structures optimized for read-intensive scenarios or for fast updates | read |
partition_while_loading | enum[by_label, no] | Indicates if the graph should be partitioned while loading | null |
password | string | password to use when connecting to database | null |
point2d | string | longitude and latitude as floating point values separated by a space | 0.0 0.0 |
time_format | array of string | the time format to use when loading and storing time properties. Please see DateTimeFormatter for a documentation of the format string | [] |
time_with_timezone_format | array of string | the time with timezone format to use when loading and storing time with timezone properties. Please see DateTimeFormatter for a documentation of the format string | [] |
timestamp_format | array of string | the timestamp format to use when loading and storing timestamp properties. Please see DateTimeFormatter for a documentation of the format string | [] |
timestamp_with_timezone_format | array of string | the timestamp with timezone format to use when loading and storing timestamp with timezone properties. Please see DateTimeFormatter for a documentation of the format string | [] |
to_nid_column | string | column name for destination node | dvid |
vector_component_delimiter | character | delimiter for the different components of vector properties | ; |
vertex_id_strategy | enum[no_ids, keys_as_ids, unstable_generated_ids] | Indicates what ID strategy should be used for the vertices of this graph. If not specified (or set to null), the strategy will be automatically detected | null |
vertex_id_type | enum[int, integer, long, string] | type of the vertex ID. For homogeneous graphs, if not specified (or set to null), it will default to a specific value (depending on the origin of the data). | null |
vertex_props | array of object | specification of vertex properties associated with graph | [] |
Warning: Case of Column Names
PGX assumes that the provided property names are an exact match for the column names. For instance, the loading will
fail if the column is named VPROP
in the RDBMS but vprop
in the configuration. Note that since SQL is by default
case-insensitive, if you create a column with a name in lowercase, the RDBMS will store it in uppercase, and you
will need to specify the column name in uppercase in the PGX's graph configuration. To avoid this, when creating
the column in your database, you can use quotes around the name for the RDBMS to store it in a case-sensitive manner.
The following examples illustrate how to configure PGX to load a graph from two relational tables (a vertex table and an edge table).
{ "jdbc_url": "jdbc:oracle:thin:@mydatabaseserver:1521:sid", "format": "two_tables", "datastore": "rdbms", "username": "dbUser", "password": "dbPassword", "nodes_table_name": "nodes", "edges_table_name": "edges", "node_props": [{ "name": "VProp1", "type": "string" },{ "name": "VProp2", "type": "timestamp" },{ "name": "VProp3", "type": "double" }], "edge_props": [{ "name": "EProp1", "type": "string" }], "loading_options": { "load_edge_label": true } }
vid | VProp1 | VProp2 | VProp3 |
---|---|---|---|
42 | "string" | 27/09/2018 | 42.42 |
37 | "prop" | 05/07/1987 | 42.37 |
... | ... | ... | ... |
eid | svid | dvid | EProp1 | el |
---|---|---|---|---|
0 | 42 | 37 | "blah" | "label0" |
99 | 42 | 42 | "bazz" | "label1" |
... | ... | ... | ... | ... |
The example tables above use the default column names (e.g., vid, eid, ...), but custom column names can be defined using the appropriate configuration fields (e.g., nodes_key_column
, edges_key_column
; see the column name configuration fields).
For more details in how to specify the proper JDBC url check JDBC thin docs
In PGX 21.1.1, neither property nor label values cannot be NULL
; therefore PGX converts NULL
values as illustrated below.
Whenever PGX encounters a NULL
property value in the vertex or edge tables, a default value will be assigned instead.
If not specified in the config, the default value is 0
for numeric properties, false
for boolean properties, ""
(empty String) for String properties and epoch time for date properties.
You can set a custom default value in the configuration for a property as in the following example.
{ "name": "VProp3", "type": "double", "default": 1.0 }
If the graph is configured to load labels (for either vertices or edges) and the corresponding table column (by default, named vl
and el
for vertices and edges, respectively) contains NULL
values, then the corresponding labels will not be set.