PGX 20.1.1
Documentation

Oracle Property Graph Format

PGX can load property graphs from databases that support the Oracle Property Graph schema. The schema is implemented for the following database systems: Oracle RDBMS, Oracle NoSQL, and Apache HBase.

Graph Config for Oracle Property Graphs

To load Oracle Property Graphs from any database, you need to specify the following additional fields in the graph config:

Field | Type | Description | Default
name | string | prefix of the table name (for edge and vertex tables) | required
array_compaction_threshold | number | [only relevant if the graph is optimized for updates] threshold used to determine when to compact the delta-logs into a new array. If lower than the engine min_array_compaction_threshold value, min_array_compaction_threshold will be used instead | 0.2
attributes | object | additional attributes needed to read/write the graph data | null
db_engine | enum[rdbms, nosql, hbase] | underlying database engine | rdbms
edge_id_strategy | enum[no_ids, keys_as_ids, unstable_generated_ids] | Indicates what ID strategy should be used for the edges of this graph. If not specified (or set to null), the strategy will be determined during loading or using a default value | null
edge_id_type | enum[long] | type of the edge ID. For homogeneous graphs, if not specified (or set to null), it will default to long. | null
edge_props | array of object | specification of edge properties associated with graph | []
error_handling | object | error handling configuration | null
external_stores | array of object | Specification of the external stores where external string properties reside. | []
format | enum[pgb, edge_list, adj_list, graphml, pg, rdf, two_tables] | graph format | null
keystore_alias | string | alias to the keystore to use when connecting to database | null
loading | object | loading-specific configuration | null
local_date_format | array of string | array of local_date formats to use when loading and storing local_date properties. See DateTimeFormatter for documentation of the format string | []
max_num_connections | integer | maximum number of database connections to use when reading the graph | 6
optimized_for | enum[read, updates] | Indicates if the graph should use data-structures optimized for read-intensive scenarios or for fast updates | read
partition_while_loading | enum[by_label, no] | Indicates if the graph should be partitioned while loading | null
password | string | password to use when connecting to database | null
point2d | string | longitude and latitude as floating point values separated by a space | 0.0 0.0
time_format | array of string | the time format to use when loading and storing time properties. See DateTimeFormatter for documentation of the format string | []
time_with_timezone_format | array of string | the time with timezone format to use when loading and storing time with timezone properties. See DateTimeFormatter for documentation of the format string | []
timestamp_format | array of string | the timestamp format to use when loading and storing timestamp properties. See DateTimeFormatter for documentation of the format string | []
timestamp_with_timezone_format | array of string | the timestamp with timezone format to use when loading and storing timestamp with timezone properties. See DateTimeFormatter for documentation of the format string | []
vector_component_delimiter | character | delimiter for the different components of vector properties | ;
vertex_id_strategy | enum[no_ids, keys_as_ids, unstable_generated_ids] | Indicates what ID strategy should be used for the vertices of this graph. If not specified (or set to null), the strategy will be automatically detected | null
vertex_id_type | enum[int, integer, long, string] | type of the vertex ID. For homogeneous graphs, if not specified (or set to null), it will default to a specific value (depending on the origin of the data). | null
vertex_props | array of object | specification of vertex properties associated with graph | []
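
To illustrate how several of these common fields combine, the following fragment is a minimal sketch and not a complete config. The graph name, the property name and the numeric values are placeholder assumptions, and the engine-specific connection fields described in the sections below are omitted. The format patterns are assumed to follow the pattern syntax of Java's DateTimeFormatter.

```json
{
  "format": "pg",
  "db_engine": "rdbms",
  "name": "my_graph",
  "optimized_for": "updates",
  "array_compaction_threshold": 0.3,
  "max_num_connections": 8,
  "local_date_format": ["yyyy-MM-dd"],
  "timestamp_format": ["yyyy-MM-dd HH:mm:ss"],
  "vertex_props": [{
    "name": "created",
    "type": "timestamp"
  }]
}
```

Because array_compaction_threshold only matters for graphs optimized for updates, it is set here together with optimized_for.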

Oracle RDBMS

For details on the support for the Oracle Property Graph format in the Oracle RDBMS, see Property Graph Schema Objects for Oracle Database. To load Oracle Property Graphs from the Oracle RDBMS database, you need to specify the following additional fields in the graph config:

Field | Type | Description | Default
name | string | prefix of the table name (for edge and vertex tables) | required
array_compaction_threshold | number | [only relevant if the graph is optimized for updates] threshold used to determine when to compact the delta-logs into a new array. If lower than the engine min_array_compaction_threshold value, min_array_compaction_threshold will be used instead | 0.2
attributes | object | additional attributes needed to read/write the graph data | null
data_source_id | string | the data source id to use to connect to database | null
db_engine | enum[rdbms, nosql, hbase] | underlying database engine | rdbms
edge_id_strategy | enum[no_ids, keys_as_ids, unstable_generated_ids] | Indicates what ID strategy should be used for the edges of this graph. If not specified (or set to null), the strategy will be determined during loading or using a default value | null
edge_id_type | enum[long] | type of the edge ID. For homogeneous graphs, if not specified (or set to null), it will default to long. | null
edge_props | array of object | specification of edge properties associated with graph | []
edges_view_name | string | the name of view for edges | null
error_handling | object | error handling configuration | null
external_stores | array of object | Specification of the external stores where external string properties reside. | []
format | enum[pgb, edge_list, adj_list, graphml, pg, rdf, two_tables] | graph format | null
jdbc_url | string | jdbc URL pointing to database | null
keystore_alias | string | alias to the keystore to use when connecting to database | null
label | string | the label to use when reading the graph | null
loading | object | loading-specific configuration | null
local_date_format | array of string | array of local_date formats to use when loading and storing local_date properties. See DateTimeFormatter for documentation of the format string | []
max_num_connections | integer | maximum number of database connections to use when reading the graph | 6
optimized_for | enum[read, updates] | Indicates if the graph should use data-structures optimized for read-intensive scenarios or for fast updates | read
options | string | a parameter that is used by the data access layer (and the underlying database) to change default behaviors of graph instance creation or initialization. Please refer to the data access layer documentation for possible configuration options | null
partition_while_loading | enum[by_label, no] | Indicates if the graph should be partitioned while loading | null
password | string | password to use when connecting to database | null
point2d | string | longitude and latitude as floating point values separated by a space | 0.0 0.0
row_label | string | the row label to use when reading the graph | null
security_policy | string | the policy for the given label or row label | null
time_format | array of string | the time format to use when loading and storing time properties. See DateTimeFormatter for documentation of the format string | []
time_with_timezone_format | array of string | the time with timezone format to use when loading and storing time with timezone properties. See DateTimeFormatter for documentation of the format string | []
timestamp_format | array of string | the timestamp format to use when loading and storing timestamp properties. See DateTimeFormatter for documentation of the format string | []
timestamp_with_timezone_format | array of string | the timestamp with timezone format to use when loading and storing timestamp with timezone properties. See DateTimeFormatter for documentation of the format string | []
username | string | username to use when connecting to database | null
vector_component_delimiter | character | delimiter for the different components of vector properties | ;
vertex_id_strategy | enum[no_ids, keys_as_ids, unstable_generated_ids] | Indicates what ID strategy should be used for the vertices of this graph. If not specified (or set to null), the strategy will be automatically detected | null
vertex_id_type | enum[int, integer, long, string] | type of the vertex ID. For homogeneous graphs, if not specified (or set to null), it will default to a specific value (depending on the origin of the data). | null
vertex_props | array of object | specification of vertex properties associated with graph | []
vertices_view_name | string | the name of view for vertices | null
view_parallel_hint_degree | integer | if view names are given, the resulting query will be hinted to run in parallel with the given degree. If the value is negative, the parallel hint will be omitted. If the value is zero, a parallel hint without degree is generated. | 1
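
As a rough illustration of how the RDBMS fields fit together, a graph config might look like the sketch below. The host, port, service name, user, keystore alias, graph name and property names are all hypothetical placeholders, and the JDBC URL is assumed to use the Oracle thin driver syntax; adapt them to your environment.

```json
{
  "format": "pg",
  "db_engine": "rdbms",
  "jdbc_url": "jdbc:oracle:thin:@//my-db-host:1521/my-service",
  "username": "my_user",
  "keystore_alias": "my_db_password_alias",
  "name": "my_graph",
  "max_num_connections": 8,
  "vertex_props": [{
    "name": "prop",
    "type": "float"
  }],
  "edge_props": [{
    "name": "cost",
    "type": "double"
  }]
}
```

The sketch references the password through keystore_alias instead of the password field, so the secret does not appear in the config file.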

Oracle NoSQL

To load Oracle Property Graphs from the Oracle NoSQL database, you need to specify the following additional fields in the graph config:

Field | Type | Description | Default
hosts | array of string | list of NoSQL hosts | required
name | string | prefix of the table name (for edge and vertex tables) | required
store_name | string | NoSQL store name | required
array_compaction_threshold | number | [only relevant if the graph is optimized for updates] threshold used to determine when to compact the delta-logs into a new array. If lower than the engine min_array_compaction_threshold value, min_array_compaction_threshold will be used instead | 0.2
attributes | object | additional attributes needed to read/write the graph data | null
db_engine | enum[rdbms, nosql, hbase] | underlying database engine | rdbms
edge_id_strategy | enum[no_ids, keys_as_ids, unstable_generated_ids] | Indicates what ID strategy should be used for the edges of this graph. If not specified (or set to null), the strategy will be determined during loading or using a default value | null
edge_id_type | enum[long] | type of the edge ID. For homogeneous graphs, if not specified (or set to null), it will default to long. | null
edge_props | array of object | specification of edge properties associated with graph | []
error_handling | object | error handling configuration | null
external_stores | array of object | Specification of the external stores where external string properties reside. | []
format | enum[pgb, edge_list, adj_list, graphml, pg, rdf, two_tables] | graph format | null
keystore_alias | string | alias to the keystore to use when connecting to database | null
loading | object | loading-specific configuration | null
local_date_format | array of string | array of local_date formats to use when loading and storing local_date properties. See DateTimeFormatter for documentation of the format string | []
max_num_connections | integer | maximum number of database connections to use when reading the graph | 6
optimized_for | enum[read, updates] | Indicates if the graph should use data-structures optimized for read-intensive scenarios or for fast updates | read
partition_while_loading | enum[by_label, no] | Indicates if the graph should be partitioned while loading | null
password | string | password to use when connecting to database | null
point2d | string | longitude and latitude as floating point values separated by a space | 0.0 0.0
request_timeout_ms | integer | NoSQL request timeout in milliseconds | 5000
time_format | array of string | the time format to use when loading and storing time properties. See DateTimeFormatter for documentation of the format string | []
time_with_timezone_format | array of string | the time with timezone format to use when loading and storing time with timezone properties. See DateTimeFormatter for documentation of the format string | []
timestamp_format | array of string | the timestamp format to use when loading and storing timestamp properties. See DateTimeFormatter for documentation of the format string | []
timestamp_with_timezone_format | array of string | the timestamp with timezone format to use when loading and storing timestamp with timezone properties. See DateTimeFormatter for documentation of the format string | []
username | string | name of a NoSQL user | null
vector_component_delimiter | character | delimiter for the different components of vector properties | ;
vertex_id_strategy | enum[no_ids, keys_as_ids, unstable_generated_ids] | Indicates what ID strategy should be used for the vertices of this graph. If not specified (or set to null), the strategy will be automatically detected | null
vertex_id_type | enum[int, integer, long, string] | type of the vertex ID. For homogeneous graphs, if not specified (or set to null), it will default to a specific value (depending on the origin of the data). | null
vertex_props | array of object | specification of vertex properties associated with graph | []

Apache HBase

To load Oracle Property Graphs from Apache HBase, you need to specify the following additional fields in the graph config:

Field | Type | Description | Default
name | string | prefix of the table name (for edge and vertex tables) | required
zk_quorum | string | ZooKeeper Quorum value | required
array_compaction_threshold | number | [only relevant if the graph is optimized for updates] threshold used to determine when to compact the delta-logs into a new array. If lower than the engine min_array_compaction_threshold value, min_array_compaction_threshold will be used instead | 0.2
attributes | object | additional attributes needed to read/write the graph data | null
block_cache_size | integer | block_cache_size | 131072
compression | string | which HBase compression algorithm to use. Check HBase documentation for list of supported algorithms | snappy
data_block_encoding | string | which datablock encoding algorithm to use. Supported values are 'none', 'prefix', 'diff', 'fast_diff' and 'prefix_tree'. See the DataBlockEncoding class in the org.apache.hadoop.hbase.io.encoding package for details. | none
db_engine | enum[rdbms, nosql, hbase] | underlying database engine | rdbms
edge_id_strategy | enum[no_ids, keys_as_ids, unstable_generated_ids] | Indicates what ID strategy should be used for the edges of this graph. If not specified (or set to null), the strategy will be determined during loading or using a default value | null
edge_id_type | enum[long] | type of the edge ID. For homogeneous graphs, if not specified (or set to null), it will default to long. | null
edge_props | array of object | specification of edge properties associated with graph | []
error_handling | object | error handling configuration | null
external_stores | array of object | Specification of the external stores where external string properties reside. | []
format | enum[pgb, edge_list, adj_list, graphml, pg, rdf, two_tables] | graph format | null
hadoop_sec_auth | string | Hadoop authentication string | null
hbase_sec_auth | string | HBase authentication string | null
hm_kerberos_principal | string | HM Kerberos principal | null
initial_edge_num_regions | integer | how many initial edge regions defined for the HBase tables | 24
initial_vertex_num_regions | integer | how many initial vertex regions defined for the HBase tables | 24
keystore_alias | string | alias to the keystore to use when connecting to database | null
keytab | string | path to keytab file | null
loading | object | loading-specific configuration | null
local_date_format | array of string | array of local_date formats to use when loading and storing local_date properties. See DateTimeFormatter for documentation of the format string | []
max_num_connections | integer | maximum number of database connections to use when reading the graph | 6
optimized_for | enum[read, updates] | Indicates if the graph should use data-structures optimized for read-intensive scenarios or for fast updates | read
partition_while_loading | enum[by_label, no] | Indicates if the graph should be partitioned while loading | null
password | string | password to use when connecting to database | null
point2d | string | longitude and latitude as floating point values separated by a space | 0.0 0.0
rs_kerberos_principal | string | RS Kerberos principal | null
splits_per_region | integer | how many splits per region to use when scanning vertices/edges | 1
time_format | array of string | the time format to use when loading and storing time properties. See DateTimeFormatter for documentation of the format string | []
time_with_timezone_format | array of string | the time with timezone format to use when loading and storing time with timezone properties. See DateTimeFormatter for documentation of the format string | []
timestamp_format | array of string | the timestamp format to use when loading and storing timestamp properties. See DateTimeFormatter for documentation of the format string | []
timestamp_with_timezone_format | array of string | the timestamp with timezone format to use when loading and storing timestamp with timezone properties. See DateTimeFormatter for documentation of the format string | []
user_principal | string | User principal | null
vector_component_delimiter | character | delimiter for the different components of vector properties | ;
vertex_id_strategy | enum[no_ids, keys_as_ids, unstable_generated_ids] | Indicates what ID strategy should be used for the vertices of this graph. If not specified (or set to null), the strategy will be automatically detected | null
vertex_id_type | enum[int, integer, long, string] | type of the vertex ID. For homogeneous graphs, if not specified (or set to null), it will default to a specific value (depending on the origin of the data). | null
vertex_props | array of object | specification of vertex properties associated with graph | []
zk_client_port | integer | ZooKeeper client port | 2181
zk_node_parent | string | ZooKeeper node parent | /hbase
zk_session_timeout | integer | ZooKeeper session timeout (in milliseconds) | 60000
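
For orientation, an HBase graph config could be sketched as follows. The ZooKeeper host names, graph name and property names are hypothetical, and the comma-separated form of zk_quorum is an assumption based on the usual HBase quorum notation; the ZooKeeper port and node parent simply repeat the documented defaults.

```json
{
  "format": "pg",
  "db_engine": "hbase",
  "zk_quorum": "my-zk1,my-zk2,my-zk3",
  "zk_client_port": 2181,
  "zk_node_parent": "/hbase",
  "name": "my_graph",
  "splits_per_region": 1,
  "vertex_props": [{
    "name": "prop",
    "type": "float"
  }],
  "edge_props": [{
    "name": "cost",
    "type": "double"
  }]
}
```

Fields left out of the sketch, such as the Kerberos-related settings, keep the defaults listed in the table.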

Loading Vertex Properties as Labels

Since the PG format does not specify vertex labels, PGX allows loading vertex labels from a designated vertex property instead. Typically a string property is used for this purpose, but this is not required.

To load a vertex property as vertex labels, specify the name of the property in the use_vertex_property_value_as_label field of the loading config. The property does not have to appear in the vertex_props section. If the property specified in use_vertex_property_value_as_label also appears in vertex_props, it is loaded both as a property and as a label.

If a vertex does not have a value for the specified property, the vertex receives no labels by default.

The type of the property is assumed to be string. For type checking and conversion this mechanism relies on the same configuration values as for normal properties, as specified in the error_handling configuration. This means that by default the property values are converted into strings while logging a single warning.

By default, the whole property value is loaded as the single label of the vertex. An optional delimiter can be specified in the configuration parameter property_value_delimiter to split the value of the property into multiple labels. For example, with a comma as the delimiter, the property value "label1, label2, label3" will be split into three distinct labels label1, label2 and label3.

Examples

For example, to load the property named label_property as vertex labels from a NoSQL instance, the following graph config can be used:

{
  "format": "pg",
  "db_engine": "nosql",
  "hosts": [
    "my-host1:5000",
    "my-host2:5000",
    "my-host3:5000"
  ],
  "store_name": "my-store",
  "name": "my-graph",
  "vertex_props": [{
    "name": "prop",
    "type": "float"
  }],
  "edge_props": [{
    "name": "name",
    "type": "string"
  }, {
    "name": "cost",
    "type": "double"
  }],
  "loading": {
    "use_vertex_property_value_as_label": "label_property"
  }
}

Similarly, to load the property named label_property and split its value into multiple labels at each comma, the following graph config can be used:

{
  "format": "pg",
  "db_engine": "nosql",
  "hosts": [
    "my-host1:5000",
    "my-host2:5000",
    "my-host3:5000"
  ],
  "store_name": "my-store",
  "name": "my-graph",
  "vertex_props": [{
    "name": "prop",
    "type": "float"
  }],
  "edge_props": [{
    "name": "name",
    "type": "string"
  }, {
    "name": "cost",
    "type": "double"
  }],
  "loading": {
    "use_vertex_property_value_as_label": "label_property",
    "property_value_delimiter": ","
  }
}
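
As described earlier in this section, a property named in use_vertex_property_value_as_label may also be listed in vertex_props, in which case it is loaded both as a property and as a label. A minimal sketch of such a config, reusing the placeholder host and store names from the examples above and assuming the loading field and key name given in the prose:

```json
{
  "format": "pg",
  "db_engine": "nosql",
  "hosts": [
    "my-host1:5000"
  ],
  "store_name": "my-store",
  "name": "my-graph",
  "vertex_props": [{
    "name": "label_property",
    "type": "string"
  }],
  "loading": {
    "use_vertex_property_value_as_label": "label_property"
  }
}
```

With this config, label_property stays available as a regular vertex property while its value also serves as the vertex label.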