PGX can load property graphs from databases that support the Oracle Property Graph schema. The Oracle Property Graph schema is implemented for the following database systems: the Oracle RDBMS, the Oracle NoSQL database, and Apache HBase.
To load Oracle Property Graphs from any of these databases, you need to specify the following additional fields in the graph config:
Field | Type | Description | Default |
---|---|---|---|
name | string | prefix of the table name (for edge and vertex tables) | required |
array_compaction_threshold | number | [only relevant if the graph is optimized for updates] threshold used to determine when to compact the delta-logs into a new array. If lower than the engine min_array_compaction_threshold value, min_array_compaction_threshold is used instead | 0.2 |
attributes | object | additional attributes needed to read/write the graph data | null |
db_engine | enum[rdbms, nosql, hbase] | underlying database engine | rdbms |
edge_id_strategy | enum[no_ids, keys_as_ids, unstable_generated_ids] | Indicates what ID strategy should be used for the edges of this graph. If not specified (or set to null), the strategy will be determined during loading or using a default value | null |
edge_id_type | enum[long] | type of the edge ID. For homogeneous graphs, if not specified (or set to null), it will default to long. | null |
edge_props | array of object | specification of edge properties associated with graph | [] |
error_handling | object | error handling configuration | null |
external_stores | array of object | Specification of the external stores where external string properties reside. | [] |
format | enum[pgb, edge_list, adj_list, graphml, pg, rdf, two_tables] | graph format | null |
keystore_alias | string | alias to the keystore to use when connecting to database | null |
loading | object | loading-specific configuration | null |
local_date_format | array of string | array of local_date formats to use when loading and storing local_date properties. See DateTimeFormatter for documentation of the format string | [] |
max_num_connections | integer | maximum number of database connections to use when reading the graph | 8 |
optimized_for | enum[read, updates] | Indicates if the graph should use data-structures optimized for read-intensive scenarios or for fast updates | read |
partition_while_loading | enum[by_label, no] | Indicates if the graph should be partitioned while loading | null |
password | string | password to use when connecting to database | null |
point2d | string | longitude and latitude as floating point values separated by a space | 0.0 0.0 |
time_format | array of string | array of time formats to use when loading and storing time properties. See DateTimeFormatter for documentation of the format string | [] |
time_with_timezone_format | array of string | array of time with timezone formats to use when loading and storing time with timezone properties. See DateTimeFormatter for documentation of the format string | [] |
timestamp_format | array of string | array of timestamp formats to use when loading and storing timestamp properties. See DateTimeFormatter for documentation of the format string | [] |
timestamp_with_timezone_format | array of string | array of timestamp with timezone formats to use when loading and storing timestamp with timezone properties. See DateTimeFormatter for documentation of the format string | [] |
vector_component_delimiter | character | delimiter for the different components of vector properties | ; |
vertex_id_strategy | enum[no_ids, keys_as_ids, unstable_generated_ids] | Indicates what ID strategy should be used for the vertices of this graph. If not specified (or set to null), the strategy will be automatically detected | null |
vertex_id_type | enum[int, integer, long, string] | type of the vertex ID. For homogeneous graphs, if not specified (or set to null), it will default to a specific value (depending on the origin of the data). | null |
vertex_props | array of object | specification of vertex properties associated with graph | [] |
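The defaulting rules in the table above can be sketched in Python as follows. This is a minimal illustration, not PGX code: `merge_with_defaults`, `effective_compaction_threshold`, and the `engine_min` argument (standing in for the engine's min_array_compaction_threshold) are hypothetical names, and only a subset of the documented defaults is included.

```python
# Hypothetical sketch (not the PGX API): apply documented defaults to a
# PG graph config and compute the effective array compaction threshold.
import copy
from typing import Any, Dict

# A subset of the defaults listed in the table.
PG_DEFAULTS: Dict[str, Any] = {
    "array_compaction_threshold": 0.2,
    "db_engine": "rdbms",
    "edge_props": [],
    "vertex_props": [],
    "max_num_connections": 8,
    "optimized_for": "read",
    "point2d": "0.0 0.0",
    "vector_component_delimiter": ";",
}


def merge_with_defaults(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new config in which missing fields take their documented default."""
    if "name" not in config:
        raise ValueError("'name' (the table-name prefix) is required")
    merged = copy.deepcopy(PG_DEFAULTS)  # avoid sharing the default lists
    merged.update(config)
    return merged


def effective_compaction_threshold(config: Dict[str, Any], engine_min: float) -> float:
    """If the configured threshold is below the engine minimum, the minimum wins."""
    return max(config["array_compaction_threshold"], engine_min)


cfg = merge_with_defaults({"format": "pg", "name": "my-graph", "optimized_for": "updates"})
print(cfg["max_num_connections"])                # 8
print(effective_compaction_threshold(cfg, 0.5))  # 0.5
```

The defaults are deep-copied before merging so that the list-valued defaults (edge_props, vertex_props) are never shared between configs.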
For details on the support for the Oracle Property Graph format in the Oracle RDBMS, see Property Graph Schema Objects for Oracle Database. To load Oracle Property Graphs from the Oracle RDBMS database, you need to specify the following additional fields in the graph config:
Field | Type | Description | Default |
---|---|---|---|
name | string | prefix of the table name (for edge and vertex tables) | required |
array_compaction_threshold | number | [only relevant if the graph is optimized for updates] threshold used to determine when to compact the delta-logs into a new array. If lower than the engine min_array_compaction_threshold value, min_array_compaction_threshold is used instead | 0.2 |
attributes | object | additional attributes needed to read/write the graph data | null |
data_source_id | string | data source id to use to connect to an RDBMS instance | null |
db_engine | enum[rdbms, nosql, hbase] | underlying database engine | rdbms |
edge_id_strategy | enum[no_ids, keys_as_ids, unstable_generated_ids] | Indicates what ID strategy should be used for the edges of this graph. If not specified (or set to null), the strategy will be determined during loading or using a default value | null |
edge_id_type | enum[long] | type of the edge ID. For homogeneous graphs, if not specified (or set to null), it will default to long. | null |
edge_props | array of object | specification of edge properties associated with graph | [] |
edges_view_name | string | the name of the view for edges | null |
error_handling | object | error handling configuration | null |
external_stores | array of object | Specification of the external stores where external string properties reside. | [] |
format | enum[pgb, edge_list, adj_list, graphml, pg, rdf, two_tables] | graph format | null |
jdbc_url | string | jdbc URL pointing to an RDBMS instance | null |
keystore_alias | string | alias to the keystore to use when connecting to database | null |
label | string | the label to use when reading the graph | null |
loading | object | loading-specific configuration | null |
local_date_format | array of string | array of local_date formats to use when loading and storing local_date properties. See DateTimeFormatter for documentation of the format string | [] |
max_num_connections | integer | maximum number of database connections to use when reading the graph | 8 |
max_prefetched_rows | integer | maximum number of rows prefetched in each round trip between the result set and the database | 10000 |
optimized_for | enum[read, updates] | Indicates if the graph should use data-structures optimized for read-intensive scenarios or for fast updates | read |
options | string | a parameter that is used by the data access layer (and the underlying database) to change default behaviors of graph instance creation or initialization. Please refer to the data access layer documentation for possible configuration options | null |
partition_while_loading | enum[by_label, no] | Indicates if the graph should be partitioned while loading | null |
password | string | password to use when connecting to database | null |
point2d | string | longitude and latitude as floating point values separated by a space | 0.0 0.0 |
row_label | string | the row label to use when reading the graph | null |
security_policy | string | the policy for the given label or row label | null |
time_format | array of string | array of time formats to use when loading and storing time properties. See DateTimeFormatter for documentation of the format string | [] |
time_with_timezone_format | array of string | array of time with timezone formats to use when loading and storing time with timezone properties. See DateTimeFormatter for documentation of the format string | [] |
timestamp_format | array of string | array of timestamp formats to use when loading and storing timestamp properties. See DateTimeFormatter for documentation of the format string | [] |
timestamp_with_timezone_format | array of string | array of timestamp with timezone formats to use when loading and storing timestamp with timezone properties. See DateTimeFormatter for documentation of the format string | [] |
username | string | username to use when connecting to an RDBMS instance | null |
vector_component_delimiter | character | delimiter for the different components of vector properties | ; |
vertex_id_strategy | enum[no_ids, keys_as_ids, unstable_generated_ids] | Indicates what ID strategy should be used for the vertices of this graph. If not specified (or set to null), the strategy will be automatically detected | null |
vertex_id_type | enum[int, integer, long, string] | type of the vertex ID. For homogeneous graphs, if not specified (or set to null), it will default to a specific value (depending on the origin of the data). | null |
vertex_props | array of object | specification of vertex properties associated with graph | [] |
vertices_view_name | string | the name of the view for vertices | null |
view_parallel_hint_degree | integer | if view names are given, the resulting query will be hinted to run in parallel with the given degree. If the value is negative, the parallel hint will be omitted. If the value is zero, a parallel hint without degree is generated. | 1 |
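The three cases documented for view_parallel_hint_degree can be sketched as below. The `/*+ PARALLEL(n) */` and `/*+ PARALLEL */` forms are standard Oracle optimizer hint syntax, but the exact query PGX generates from vertices_view_name and edges_view_name is not described here, so `build_view_query` and its `SELECT * FROM` shape are hypothetical, and the view names are placeholders.

```python
# Hypothetical sketch of the view_parallel_hint_degree rules; the query
# shape is an assumption, not the statement PGX actually generates.
from typing import Optional


def parallel_hint(degree: int) -> Optional[str]:
    """Map view_parallel_hint_degree to an Oracle optimizer hint, or None."""
    if degree < 0:
        return None                      # negative: omit the hint entirely
    if degree == 0:
        return "/*+ PARALLEL */"         # zero: hint without a degree
    return f"/*+ PARALLEL({degree}) */"  # positive: hint with the given degree


def build_view_query(view_name: str, degree: int = 1) -> str:
    """Hypothetical SELECT over a vertex or edge view (default degree is 1)."""
    hint = parallel_hint(degree)
    select = "SELECT" if hint is None else f"SELECT {hint}"
    return f"{select} * FROM {view_name}"


print(build_view_query("MY_VERTICES_VIEW"))   # SELECT /*+ PARALLEL(1) */ * FROM MY_VERTICES_VIEW
print(build_view_query("MY_EDGES_VIEW", -1))  # SELECT * FROM MY_EDGES_VIEW
```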
To load Oracle Property Graphs from the Oracle NoSQL database, you need to specify the following additional fields in the graph config:
Field | Type | Description | Default |
---|---|---|---|
hosts | array of string | list of NoSQL hosts | required |
name | string | prefix of the table name (for edge and vertex tables) | required |
store_name | string | NoSQL store name | required |
array_compaction_threshold | number | [only relevant if the graph is optimized for updates] threshold used to determine when to compact the delta-logs into a new array. If lower than the engine min_array_compaction_threshold value, min_array_compaction_threshold is used instead | 0.2 |
attributes | object | additional attributes needed to read/write the graph data | null |
db_engine | enum[rdbms, nosql, hbase] | underlying database engine | rdbms |
edge_id_strategy | enum[no_ids, keys_as_ids, unstable_generated_ids] | Indicates what ID strategy should be used for the edges of this graph. If not specified (or set to null), the strategy will be determined during loading or using a default value | null |
edge_id_type | enum[long] | type of the edge ID. For homogeneous graphs, if not specified (or set to null), it will default to long. | null |
edge_props | array of object | specification of edge properties associated with graph | [] |
error_handling | object | error handling configuration | null |
external_stores | array of object | Specification of the external stores where external string properties reside. | [] |
format | enum[pgb, edge_list, adj_list, graphml, pg, rdf, two_tables] | graph format | null |
keystore_alias | string | alias to the keystore to use when connecting to database | null |
loading | object | loading-specific configuration | null |
local_date_format | array of string | array of local_date formats to use when loading and storing local_date properties. See DateTimeFormatter for documentation of the format string | [] |
max_num_connections | integer | maximum number of database connections to use when reading the graph | 8 |
optimized_for | enum[read, updates] | Indicates if the graph should use data-structures optimized for read-intensive scenarios or for fast updates | read |
partition_while_loading | enum[by_label, no] | Indicates if the graph should be partitioned while loading | null |
password | string | password to use when connecting to database | null |
point2d | string | longitude and latitude as floating point values separated by a space | 0.0 0.0 |
request_timeout_ms | integer | NoSQL request timeout in milliseconds | 5000 |
time_format | array of string | array of time formats to use when loading and storing time properties. See DateTimeFormatter for documentation of the format string | [] |
time_with_timezone_format | array of string | array of time with timezone formats to use when loading and storing time with timezone properties. See DateTimeFormatter for documentation of the format string | [] |
timestamp_format | array of string | array of timestamp formats to use when loading and storing timestamp properties. See DateTimeFormatter for documentation of the format string | [] |
timestamp_with_timezone_format | array of string | array of timestamp with timezone formats to use when loading and storing timestamp with timezone properties. See DateTimeFormatter for documentation of the format string | [] |
username | string | name of a NoSQL user | null |
vector_component_delimiter | character | delimiter for the different components of vector properties | ; |
vertex_id_strategy | enum[no_ids, keys_as_ids, unstable_generated_ids] | Indicates what ID strategy should be used for the vertices of this graph. If not specified (or set to null), the strategy will be automatically detected | null |
vertex_id_type | enum[int, integer, long, string] | type of the vertex ID. For homogeneous graphs, if not specified (or set to null), it will default to a specific value (depending on the origin of the data). | null |
vertex_props | array of object | specification of vertex properties associated with graph | [] |
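A NoSQL config needs db_engine set explicitly, because the documented default is rdbms. The sketch below assembles such a config with Python's standard `json` module; `nosql_pg_config` is a hypothetical helper that emits only field names from the table above, and the host names and store name are placeholders.

```python
# Hypothetical helper (not the PGX API) that builds a NoSQL PG graph config.
import json
from typing import Any, Dict, List


def nosql_pg_config(name: str, store_name: str, hosts: List[str],
                    request_timeout_ms: int = 5000) -> Dict[str, Any]:
    # hosts, name and store_name are the required NoSQL fields.
    if not hosts:
        raise ValueError("'hosts' must list at least one NoSQL host")
    if not name or not store_name:
        raise ValueError("'name' and 'store_name' are required")
    return {
        "format": "pg",
        "db_engine": "nosql",  # must be explicit: the default is rdbms
        "hosts": hosts,
        "store_name": store_name,
        "name": name,
        "request_timeout_ms": request_timeout_ms,
    }


cfg = nosql_pg_config("my-graph", "my-store", ["my-host1:5000", "my-host2:5000"])
print(json.dumps(cfg, indent=2))
```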
To load Oracle Property Graphs from Apache HBase, you need to specify the following additional fields in the graph config:
Field | Type | Description | Default |
---|---|---|---|
name | string | prefix of the table name (for edge and vertex tables) | required |
zk_quorum | string | ZooKeeper Quorum value | required |
array_compaction_threshold | number | [only relevant if the graph is optimized for updates] threshold used to determine when to compact the delta-logs into a new array. If lower than the engine min_array_compaction_threshold value, min_array_compaction_threshold is used instead | 0.2 |
attributes | object | additional attributes needed to read/write the graph data | null |
block_cache_size | integer | size of the HBase block cache | 131072 |
compression | string | which HBase compression algorithm to use. Check HBase documentation for list of supported algorithms | snappy |
data_block_encoding | string | which datablock encoding algorithm to use. Supported values are 'none', 'prefix', 'diff', 'fast_diff' and 'prefix_tree'. See the DataBlockEncoding class in the org.apache.hadoop.hbase.io.encoding package for details. | none |
db_engine | enum[rdbms, nosql, hbase] | underlying database engine | rdbms |
edge_id_strategy | enum[no_ids, keys_as_ids, unstable_generated_ids] | Indicates what ID strategy should be used for the edges of this graph. If not specified (or set to null), the strategy will be determined during loading or using a default value | null |
edge_id_type | enum[long] | type of the edge ID. For homogeneous graphs, if not specified (or set to null), it will default to long. | null |
edge_props | array of object | specification of edge properties associated with graph | [] |
error_handling | object | error handling configuration | null |
external_stores | array of object | Specification of the external stores where external string properties reside. | [] |
format | enum[pgb, edge_list, adj_list, graphml, pg, rdf, two_tables] | graph format | null |
hadoop_sec_auth | string | Hadoop authentication string | null |
hbase_sec_auth | string | HBase authentication string | null |
hm_kerberos_principal | string | HM Kerberos principal | null |
initial_edge_num_regions | integer | number of initial edge regions defined for the HBase tables | 24 |
initial_vertex_num_regions | integer | number of initial vertex regions defined for the HBase tables | 24 |
keystore_alias | string | alias to the keystore to use when connecting to database | null |
keytab | string | path to keytab file | null |
loading | object | loading-specific configuration | null |
local_date_format | array of string | array of local_date formats to use when loading and storing local_date properties. See DateTimeFormatter for documentation of the format string | [] |
max_num_connections | integer | maximum number of database connections to use when reading the graph | 8 |
optimized_for | enum[read, updates] | Indicates if the graph should use data-structures optimized for read-intensive scenarios or for fast updates | read |
partition_while_loading | enum[by_label, no] | Indicates if the graph should be partitioned while loading | null |
password | string | password to use when connecting to database | null |
point2d | string | longitude and latitude as floating point values separated by a space | 0.0 0.0 |
rs_kerberos_principal | string | RS Kerberos principal | null |
splits_per_region | integer | how many splits per region to use when scanning vertices/edges | 1 |
time_format | array of string | array of time formats to use when loading and storing time properties. See DateTimeFormatter for documentation of the format string | [] |
time_with_timezone_format | array of string | array of time with timezone formats to use when loading and storing time with timezone properties. See DateTimeFormatter for documentation of the format string | [] |
timestamp_format | array of string | array of timestamp formats to use when loading and storing timestamp properties. See DateTimeFormatter for documentation of the format string | [] |
timestamp_with_timezone_format | array of string | array of timestamp with timezone formats to use when loading and storing timestamp with timezone properties. See DateTimeFormatter for documentation of the format string | [] |
user_principal | string | User principal | null |
vector_component_delimiter | character | delimiter for the different components of vector properties | ; |
vertex_id_strategy | enum[no_ids, keys_as_ids, unstable_generated_ids] | Indicates what ID strategy should be used for the vertices of this graph. If not specified (or set to null), the strategy will be automatically detected | null |
vertex_id_type | enum[int, integer, long, string] | type of the vertex ID. For homogeneous graphs, if not specified (or set to null), it will default to a specific value (depending on the origin of the data). | null |
vertex_props | array of object | specification of vertex properties associated with graph | [] |
zk_client_port | integer | ZooKeeper client port | 2181 |
zk_node_parent | string | ZooKeeper node parent | /hbase |
zk_session_timeout | integer | ZooKeeper session timeout (in milliseconds) | 60000 |
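Across the RDBMS, NoSQL, and HBase tables, only a few fields are marked as required. The following check encodes those requirements and falls back to rdbms when db_engine is absent, as the tables specify. `missing_required_fields` is a hypothetical helper for illustration; PGX performs its own validation when it parses a graph config.

```python
# Hypothetical pre-flight check of the required fields per db_engine.
from typing import Any, Dict, List

REQUIRED_FIELDS: Dict[str, List[str]] = {
    "rdbms": ["name"],
    "nosql": ["hosts", "name", "store_name"],
    "hbase": ["name", "zk_quorum"],
}


def missing_required_fields(config: Dict[str, Any]) -> List[str]:
    """Return the required fields that the config does not set."""
    engine = config.get("db_engine", "rdbms")  # rdbms is the documented default
    return [field for field in REQUIRED_FIELDS[engine] if field not in config]


hbase_cfg = {"format": "pg", "db_engine": "hbase", "name": "my-graph"}
print(missing_required_fields(hbase_cfg))  # ['zk_quorum']
```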
Since the PG format does not specify vertex labels, PGX can instead load vertex labels from a special vertex property. A string property is typically used for the labels, but this is not required.
To load a vertex property as vertex labels, specify the name of the property in the `use_vertex_property_value_as_label` field of the loading config.
The property does not need to appear in the `vertex_props` section. If the property specified in `use_vertex_property_value_as_label` also appears in `vertex_props`, it is loaded both as a property and as labels.
By default, a vertex that has no value for the specified property receives no labels.
The type of the property is assumed to be `string`.
For type checking and conversion, this mechanism relies on the same configuration values as normal properties, as specified in the `error_handling` configuration. This means that, by default, property values are converted into strings and a single warning is logged.
By default, the whole property value is loaded as the single label of the vertex. To split the value into multiple labels, specify an optional delimiter in the `property_value_delimiter` configuration parameter.
For example, with a comma as the delimiter, the property value `"label1, label2, label3"` is split into three distinct labels: `label1`, `label2`, and `label3`.
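The label rules described above can be summarized in a small sketch. `labels_from_property` is a hypothetical helper, not PGX's implementation. Two behaviors are assumptions made for illustration: whitespace around each split piece is stripped (so that the comma example yields exactly `label1`, `label2`, and `label3`), and empty pieces are dropped. Labels are returned as a set.

```python
# Hypothetical sketch of deriving vertex labels from a property value.
from typing import Any, Optional, Set


def labels_from_property(value: Any, delimiter: Optional[str] = None) -> Set[str]:
    if value is None:
        return set()   # no value: the vertex receives no labels
    text = str(value)  # non-string values are converted to strings
    if delimiter is None:
        return {text}  # the whole value becomes the single label
    # ASSUMPTION: strip whitespace around each piece and drop empty pieces.
    return {part.strip() for part in text.split(delimiter) if part.strip()}


print(sorted(labels_from_property("label1, label2, label3", ",")))
# ['label1', 'label2', 'label3']
```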
For example, to load the property named `label_property` as vertex labels from a NoSQL instance, the following graph config can be used:
{ "format": "pg", "db_engine": "nosql", "hosts": [ "my-host1:5000", "my-host2:5000", "my-host3:5000" ], "store_name": "my-store", "name": "my-graph", "vertex_props": [{ "name": "prop", "type": "float" }], "edge_props": [{ "name": "name", "type": "string" }, { "name": "cost", "type": "double" }], "loading": { "use_vertex_property_value_as_label": "label_property" } }
To load the property named `label_property` and split its value on commas, the following graph config can be used:
{ "format": "pg", "db_engine": "nosql", "hosts": [ "my-host1:5000", "my-host2:5000", "my-host3:5000" ], "store_name": "my-store", "name": "my-graph", "vertex_props": [{ "name": "prop", "type": "float" }], "edge_props": [{ "name": "name", "type": "string" }, { "name": "cost", "type": "double" }], "loading": { "use_vertex_property_value_as_label": "label_property", "property_value_delimiter": "," } }