PGX 20.1.1
Documentation

Partitioned Graph Configuration

As with non-partitioned graphs, PGX requires a graph configuration for loading partitioned graphs. Partitioned graph data is split across multiple sources (files or database tables), called "entity providers". Vertex providers define the vertex data sources and edge providers define the edge data sources.

When loading a partitioned graph, the configuration must describe both the vertex providers and the edge providers.

Typically, a partitioned graph configuration is given as a JSON file. It can also be constructed programmatically. See the related document for details.

Unless stated otherwise, most restrictions and semantics of non-partitioned graph configurations also apply to partitioned graphs.

PGX 20.1.1 limitation

It is currently not possible to construct a graph configuration from properties.

Partitioned Graph Config JSON File

Partitioned graph configurations have the following JSON fields:

Field | Type | Description | Default
name | string | name of the graph | required
array_compaction_threshold | number | [only relevant if the graph is optimized for updates] threshold used to determine when to compact the delta-logs into a new array. If lower than the engine min_array_compaction_threshold value, min_array_compaction_threshold will be used instead | 0.2
attributes | object | additional attributes needed to read/write the graph data | null
data_source_id | string | default data source id to use to connect to database (for tables in RDBMS format only) | null
edge_id_strategy | enum[no_ids, keys_as_ids, unstable_generated_ids] | Indicates what ID strategy should be used for the edges of this graph. If not specified (or set to null), the strategy will be determined during loading or using a default value | null
edge_id_type | enum[long] | type of the edge ID. For partitioned graphs, setting it to long requires the IDs in the edge providers to be unique across the graphs; those IDs will be used as global IDs; setting it to null (or omitting it) will allow repeated IDs across different edge providers and PGX will automatically generate globally-unique 'partitioned' IDs for the edges | null
edge_providers | array of object | list of edge providers in this graph | []
error_handling | object | error handling configuration | null
jdbc_url | string | default jdbc URL pointing to database (for tables in RDBMS format only) | null
keystore_alias | string | alias to the keystore to use when connecting to database | null
loading | object | loading-specific configuration | null
local_date_format | array of string | array of local_date formats to use when loading and storing local_date properties. See DateTimeFormatter for documentation of the format string | []
max_prefetched_rows | integer | default maximum number of rows prefetched during each round trip resultset-database (for tables in RDBMS format only) | 10000
num_connections | integer | default number of connections to read/write data from/to the database table (for tables in RDBMS format only) | <no-of-cpus>
optimized_for | enum[read, updates] | Indicates if the graph should use data-structures optimized for read-intensive scenarios or for fast updates | read
password | string | password to use when connecting to database | null
point2d | string | longitude and latitude as floating point values separated by a space | 0.0 0.0
redaction_rules | array of object | array of redaction rules | []
rules_mapping | array of object | mapping for redaction rules to users/roles | []
schema | string | default schema where the database table is going to be written (for tables in RDBMS format only) | null
time_format | array of string | the time format to use when loading and storing time properties. See DateTimeFormatter for documentation of the format string | []
time_with_timezone_format | array of string | the time with timezone format to use when loading and storing time with timezone properties. See DateTimeFormatter for documentation of the format string | []
timestamp_format | array of string | the timestamp format to use when loading and storing timestamp properties. See DateTimeFormatter for documentation of the format string | []
timestamp_with_timezone_format | array of string | the timestamp with timezone format to use when loading and storing timestamp with timezone properties. See DateTimeFormatter for documentation of the format string | []
username | string | default username to use when connecting to database (for tables in RDBMS format only) | null
vector_component_delimiter | character | delimiter for the different components of vector properties | ;
vertex_id_strategy | enum[no_ids, keys_as_ids, unstable_generated_ids] | Indicates what ID strategy should be used for the vertices of this graph. If not specified (or set to null), the strategy will be determined during loading or using a default value | null
vertex_id_type | enum[int, integer, long, string] | type of the vertex ID. For partitioned graphs, setting it to a specific type requires the IDs in the vertex providers to be unique across the graphs; those IDs will be used as global IDs; setting it to null (or omitting it) will allow repeated IDs across different vertex providers and PGX will automatically generate globally-unique 'partitioned' IDs for the vertices | null
vertex_providers | array of object | list of vertex providers in this graph | []
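
To make the overall shape concrete, the following is a minimal sketch of a partitioned graph configuration that uses only fields from the table above and from the provider tables later on this page. The graph name, provider names and property names (bank_graph, Accounts, Persons, Transfers, balance, name, amount) are hypothetical. The fields that locate the actual data are specific to the data source format and are intentionally left out, so this fragment illustrates structure and is not a loadable configuration on its own.

```json
{
  "name": "bank_graph",
  "optimized_for": "read",
  "vertex_providers": [
    {
      "name": "Accounts",
      "format": "csv",
      "props": [
        { "name": "balance", "type": "double" }
      ]
    },
    {
      "name": "Persons",
      "format": "csv",
      "props": [
        { "name": "name", "type": "string" }
      ]
    }
  ],
  "edge_providers": [
    {
      "name": "Transfers",
      "format": "csv",
      "source_vertex_provider": "Accounts",
      "destination_vertex_provider": "Accounts",
      "props": [
        { "name": "amount", "type": "double" }
      ]
    }
  ]
}
```

Because vertex_id_type and edge_id_type are omitted here, the table above implies that IDs may repeat across providers and that PGX generates globally-unique 'partitioned' IDs.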

Provider Configuration

For partitioned graphs we specify the meta-information about each provider's data using provider configurations. Provider configurations include the following information about the provider data:

  • Location of the data: a file, multiple files or a database table.
  • Information about the properties: the name and type of each property.

Provider Configuration JSON File

All provider configurations have the following JSON fields in common:

Field | Type | Description | Default
format | enum[pgb, csv, rdbms] | provider format | required
name | string | entity provider name | required
attributes | object | additional attributes needed to read/write the graph data | null
destination_vertex_provider | string | name of the destination vertex provider to be used for this edge provider | null
error_handling | object | error handling configuration | null
has_keys | boolean | indicates if the provided entities data have keys | true
key_type | enum[int, integer, long, string] | type of the keys | long
keystore_alias | string | alias to the keystore to use when connecting to database | null
label | string | label for the entities loaded from this provider | null
loading | object | loading-specific configuration | null
local_date_format | array of string | array of local_date formats to use when loading and storing local_date properties. See DateTimeFormatter for documentation of the format string | []
password | string | password to use when connecting to database | null
point2d | string | longitude and latitude as floating point values separated by a space | 0.0 0.0
props | array of object | specification of the properties associated with this entity provider | []
source_vertex_provider | string | name of the source vertex provider to be used for this edge provider | null
time_format | array of string | the time format to use when loading and storing time properties. See DateTimeFormatter for documentation of the format string | []
time_with_timezone_format | array of string | the time with timezone format to use when loading and storing time with timezone properties. See DateTimeFormatter for documentation of the format string | []
timestamp_format | array of string | the timestamp format to use when loading and storing timestamp properties. See DateTimeFormatter for documentation of the format string | []
timestamp_with_timezone_format | array of string | the timestamp with timezone format to use when loading and storing timestamp with timezone properties. See DateTimeFormatter for documentation of the format string | []
vector_component_delimiter | character | delimiter for the different components of vector properties | ;
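
As an illustration of the common provider fields, here is a sketch of a single edge provider entry that connects two vertex providers. The names Owns, Persons, Accounts and since are hypothetical, and Persons and Accounts are assumed to be vertex providers defined elsewhere in the same graph configuration. Format-specific fields that point at the data are omitted.

```json
{
  "name": "Owns",
  "format": "csv",
  "source_vertex_provider": "Persons",
  "destination_vertex_provider": "Accounts",
  "has_keys": false,
  "timestamp_format": ["yyyy-MM-dd HH:mm:ss"],
  "props": [
    { "name": "since", "type": "timestamp" }
  ]
}
```

The timestamp_format entry is a DateTimeFormatter pattern. The source_vertex_provider and destination_vertex_provider values refer to vertex providers by their name field.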

Provider Labels

The label field in the provider configuration can be used to set a label for the entities loaded from the provider. If no label is specified, all entities from the provider are labeled with the name of the provider. It is only possible to set the same label for two different providers if they have exactly the same properties (same names and same types).
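
The label-sharing rule can be sketched as follows. This fragment assumes two hypothetical vertex providers, CheckingAccounts and SavingsAccounts, that both assign the label Account. According to the rule above this is only allowed because their props lists match in names and types.

```json
{
  "vertex_providers": [
    {
      "name": "CheckingAccounts",
      "format": "csv",
      "label": "Account",
      "props": [
        { "name": "balance", "type": "double" },
        { "name": "currency", "type": "string" }
      ]
    },
    {
      "name": "SavingsAccounts",
      "format": "csv",
      "label": "Account",
      "props": [
        { "name": "balance", "type": "double" },
        { "name": "currency", "type": "string" }
      ]
    }
  ]
}
```

If the label fields were removed, the entities would instead be labeled CheckingAccounts and SavingsAccounts, after their providers.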

Property Configuration

Each entry of the props array in the provider configuration is an object with the following JSON fields:

Field | Type | Description | Default
name | string | name of property | required
type | enum[boolean, integer, vertex, edge, float, long, double, string, date, local_date, time, timestamp, time_with_timezone, timestamp_with_timezone, point2d] | type of property (Note: date is deprecated, use one of local_date / time / timestamp / time_with_timezone / timestamp_with_timezone instead). vertex/edge are place-holders for the type specified in vertex_id_type/edge_id_type fields. | required
column | value | name or index (starting from 0) of the column holding the property data. If it is not specified, the loader will try to use the property name as column name (for CSV format only) | null
default | value | default value to be assigned to this property if datasource does not provide it. In case of date type: string is expected to be formatted with yyyy-MM-dd HH:mm:ss. If no default is present (null), non-existent properties will contain default Java types (primitives) or empty string (string) or 01.01.1970 00:00 (date). | null
dimension | integer | dimension of property | 0
format | array of string | array of formats of property | []
max_distinct_strings_per_pool | integer | [only relevant if string_pooling_strategy is indexed] amount of distinct strings per property after which to stop pooling. If the limit is reached an exception is thrown. If set to null, the default value from the global PGX configuration will be used. | null
stores | array of object | A list of storage identifiers that indicate where this property resides. | []
string_pooling_strategy | enum[indexed, on_heap, none] | [only relevant if use_string_pool is enabled] which string pooling strategy to use. If set to null, the default value from the global PGX configuration will be used. | null
use_string_pool | boolean | If true, PGX will store string properties in a pool in order to consume less memory on string properties | true
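
The following sketch shows how several of these property fields might be combined in one provider. The provider name, property names, column name opened_on and the limit of 1000 are hypothetical values chosen for illustration. The column field applies to the CSV format only, as stated in the table.

```json
{
  "name": "Accounts",
  "format": "csv",
  "props": [
    {
      "name": "owner_name",
      "type": "string",
      "column": 1,
      "use_string_pool": true,
      "string_pooling_strategy": "indexed",
      "max_distinct_strings_per_pool": 1000
    },
    {
      "name": "opened",
      "type": "local_date",
      "column": "opened_on",
      "format": ["yyyy-MM-dd"]
    },
    {
      "name": "risk_score",
      "type": "double",
      "default": 0.0
    }
  ]
}
```

Here owner_name is read from the column at index 1, opened is read from a column identified by name, and risk_score has no column entry, so the loader would try a column named risk_score and fall back to the given default when the data source does not provide a value.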

Loading Configuration

The loading entry is a JSON object with the following fields:

Field | Type | Description | Default
create_key_mapping | boolean | if true, a mapping between entity keys and internal IDs is prepared during loading. | true
load_labels | boolean | whether or not to load the entity label if it is available | false
strict_mode | boolean | if true, exceptions are thrown and logged with ERROR level whenever loader encounters problems with input file, such as invalid format, repeated keys, missing fields, mismatches and other potential errors. If false, loader may use less memory during loading phase, but behave unexpectedly with erratic input files | true
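
For illustration, a loading entry could look like the fragment below. The graph name is hypothetical and the providers are left empty to keep the focus on the loading object. The values shown differ from the defaults for load_labels and strict_mode.

```json
{
  "name": "bank_graph",
  "loading": {
    "create_key_mapping": true,
    "load_labels": true,
    "strict_mode": false
  },
  "vertex_providers": [],
  "edge_providers": []
}
```

Setting strict_mode to false trades validation for lower memory use during loading, so it is best reserved for input that is already known to be clean.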

Error Handling Configuration

The error_handling entry is a JSON object with the following fields:

Field | Type | Description | Default
on_missed_prop_key | enum[silent, log_warn, log_warn_once, error] | what to do when a missing property key is encountered | log_warn_once
on_missing_vertex | enum[ignore_edge, create_vertex, error] | what to do when a source or destination vertex of an edge is not found in a vertex data source. | error
on_prop_conversion | enum[silent, log_warn, log_warn_once, error] | what to do when a different property type is encountered than specified, but coercion is possible | log_warn_once
on_type_mismatch | enum[silent, log_warn, log_warn_once, error] | what to do when a different property type is encountered than specified, but coercion is not possible | error
on_vector_length_mismatch | enum[silent, log_warn, log_warn_once, error] | what to do when a vector property does not have the correct dimension | error

PGX 20.1.1 Limitation

For partitioned graphs, the only supported setting for the on_missing_vertex error handling configuration is ignore_edge.
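
Given this limitation, an error_handling entry for a partitioned graph might be written as in the sketch below. The graph name is hypothetical and the providers are left empty for brevity. Only on_missing_vertex is changed from its default; the other values repeat the defaults from the table to show the full object.

```json
{
  "name": "bank_graph",
  "error_handling": {
    "on_missed_prop_key": "log_warn_once",
    "on_missing_vertex": "ignore_edge",
    "on_prop_conversion": "log_warn_once",
    "on_type_mismatch": "error",
    "on_vector_length_mismatch": "error"
  },
  "vertex_providers": [],
  "edge_providers": []
}
```

The documented default for on_missing_vertex is error, so setting ignore_edge explicitly makes the configuration consistent with the only value supported for partitioned graphs in this release.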

Each provider may contain additional JSON fields that are specific to the type of the data source. See Loading from Files for details.

Further details: