PGX 20.1.1
Documentation

Graph Configuration

Partitioned graphs

The information on this page refers to graph configuration for loading "non-partitioned" graphs. Read the partitioned graph configuration reference for information on partitioned graph configurations.

For loading graph data, PGX requires Graph Configs, i.e. the meta-information about the graph data. A Graph Config includes the following information about the data:

  • Location of the data — a file, database tables, etc.
  • Information about the properties: name and type of the property.

For instance, the following shell snippet loads the graph that is specified in mygraph.json.

pgx> var G = session.readGraphWithProperties("/path/to/mygraph.json", "my-graph")

Note that a Graph Config is typically given as a JSON file. The Java Properties format can be used instead of JSON; check this document for an example.
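
As a rough illustration, a JSON Graph Config that carries the property information listed above might look like the following sketch. The property names (age, cost) are hypothetical, and the field that points at the data location is deliberately omitted because it depends on the data source and is not described on this page, so this fragment is not loadable as-is.

```json
{
  "format": "adj_list",
  "vertex_id_type": "long",
  "vertex_props": [
    { "name": "age", "type": "integer" }
  ],
  "edge_props": [
    { "name": "cost", "type": "double" }
  ]
}
```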

Loading when in Remote mode

When loading a graph from a file in Remote mode (Server-Client), the JSON file containing the graph configuration should be on the client side, and the file containing the actual graph data should be on the server side.

It is also possible to create a Graph Config programmatically. See the related document for details.

Some of the graph formats supported by PGX are partially or fully self-describing. For a subset of those formats, graph configurations can be automatically generated by PGX. For details, take a look at the configuration detection document.

Graph Config JSON File

All graph configs have the following JSON fields in common:

Field | Type | Description | Default
array_compaction_threshold | number | [only relevant if the graph is optimized for updates] threshold used to determine when to compact the delta-logs into a new array. If lower than the engine min_array_compaction_threshold value, min_array_compaction_threshold will be used instead | 0.2
attributes | object | additional attributes needed to read/write the graph data | null
edge_id_strategy | enum[no_ids, keys_as_ids, unstable_generated_ids] | Indicates what ID strategy should be used for the edges of this graph. If not specified (or set to null), the strategy will be determined during loading or using a default value | null
edge_id_type | enum[long] | type of the edge ID. For homogeneous graphs, if not specified (or set to null), it will default to long. | null
edge_props | array of object | specification of edge properties associated with graph | []
error_handling | object | error handling configuration | null
external_stores | array of object | Specification of the external stores where external string properties reside. | []
format | enum[pgb, edge_list, adj_list, graphml, pg, rdf, two_tables] | graph format | null
keystore_alias | string | alias to the keystore to use when connecting to database | null
loading | object | loading-specific configuration | null
local_date_format | array of string | array of local_date formats to use when loading and storing local_date properties. Please see DateTimeFormatter for documentation of the format string | []
optimized_for | enum[read, updates] | Indicates if the graph should use data-structures optimized for read-intensive scenarios or for fast updates | read
partition_while_loading | enum[by_label, no] | Indicates if the graph should be partitioned while loading | null
password | string | password to use when connecting to database | null
point2d | string | longitude and latitude as floating point values separated by a space | 0.0 0.0
time_format | array of string | the time format to use when loading and storing time properties. Please see DateTimeFormatter for documentation of the format string | []
time_with_timezone_format | array of string | the time with timezone format to use when loading and storing time with timezone properties. Please see DateTimeFormatter for documentation of the format string | []
timestamp_format | array of string | the timestamp format to use when loading and storing timestamp properties. Please see DateTimeFormatter for documentation of the format string | []
timestamp_with_timezone_format | array of string | the timestamp with timezone format to use when loading and storing timestamp with timezone properties. Please see DateTimeFormatter for documentation of the format string | []
vector_component_delimiter | character | delimiter for the different components of vector properties | ;
vertex_id_strategy | enum[no_ids, keys_as_ids, unstable_generated_ids] | Indicates what ID strategy should be used for the vertices of this graph. If not specified (or set to null), the strategy will be automatically detected | null
vertex_id_type | enum[int, integer, long, string] | type of the vertex ID. For homogeneous graphs, if not specified (or set to null), it will default to a specific value (depending on the origin of the data). | null
vertex_props | array of object | specification of vertex properties associated with graph | []
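
To make the common fields more concrete, the fragment below sketches how a few of them could be combined. The values are illustrative choices, not recommendations: the two format strings are ordinary DateTimeFormatter patterns picked for the example, and the data-source-specific fields are left out.

```json
{
  "format": "edge_list",
  "optimized_for": "updates",
  "array_compaction_threshold": 0.3,
  "vertex_id_type": "string",
  "local_date_format": ["yyyy-MM-dd"],
  "timestamp_format": ["yyyy-MM-dd HH:mm:ss"],
  "vector_component_delimiter": ";"
}
```

Here array_compaction_threshold is only set because optimized_for is updates; according to the table it has no effect on read-optimized graphs.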

Security warning

Vertex/edge IDs can be part of REST calls, and so they are visible to others. PGX highly recommends designing your graph model not to use any sensitive information as vertex/edge IDs.

Each entry of vertex_props and edge_props is an object with the following JSON fields:

Field | Type | Description | Default
name | string | name of property | required
type | enum[boolean, integer, vertex, edge, float, long, double, string, date, local_date, time, timestamp, time_with_timezone, timestamp_with_timezone, point2d] | type of property (Note: date is deprecated, use one of local_date / time / timestamp / time_with_timezone / timestamp_with_timezone instead). vertex/edge are place-holders for the type specified in vertex_id_type/edge_id_type fields. | required
column | value | name or index (starting from 0) of the column holding the property data. If it is not specified, the loader will try to use the property name as column name (for CSV format only) | null
default | value | default value to be assigned to this property if datasource does not provide it. In case of date type: string is expected to be formatted with yyyy-MM-dd HH:mm:ss. If no default is present (null), non-existent properties will contain default Java types (primitives) or empty string (string) or 01.01.1970 00:00 (date). | null
dimension | integer | dimension of property | 0
format | array of string | array of formats of property | []
max_distinct_strings_per_pool | integer | [only relevant if string_pooling_strategy is indexed] amount of distinct strings per property after which to stop pooling. If the limit is reached an exception is thrown. If set to null, the default value from the global PGX configuration will be used. | null
stores | array of object | A list of storage identifiers that indicate where this property resides. | []
string_pooling_strategy | enum[indexed, on_heap, none] | [only relevant if use_string_pool is enabled] which string pooling strategy to use. If set to null, the default value from the global PGX configuration will be used. | null
use_string_pool | boolean | If true, PGX will store string properties in a pool in order to consume less memory on string properties | true
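
The property fields above could be used as in the following sketch. All property names (city, since) and values are hypothetical; the pool limit of 10000 is an arbitrary number chosen for illustration.

```json
{
  "vertex_props": [
    {
      "name": "city",
      "type": "string",
      "default": "unknown",
      "use_string_pool": true,
      "string_pooling_strategy": "indexed",
      "max_distinct_strings_per_pool": 10000
    }
  ],
  "edge_props": [
    {
      "name": "since",
      "type": "local_date",
      "format": ["yyyy-MM-dd"]
    }
  ]
}
```

In this sketch max_distinct_strings_per_pool is only meaningful because string_pooling_strategy is indexed, which in turn is only consulted because use_string_pool is true.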

The loading field is a JSON object with the following JSON fields:

Field | Type | Description | Default
auto_refresh | boolean | if true the graph gets refreshed automatically in periodic intervals. Note: Depending on the global settings, only fixed (pre-loaded) graphs can be auto-refreshed | false
create_edge_id_index | boolean | if true, an index is prepared during loading which enables retrieval of edge paths | false
create_edge_id_mapping | boolean | if true, a mapping is prepared during loading which enables edge key arguments and filters containing edge keys | false
create_label_histogram | boolean | whether a label histogram needs to be generated when the graph is loaded | false
create_vertex_id_index | boolean | if true, an index is prepared during loading which enables retrieval of vertex paths | true
create_vertex_id_mapping | boolean | if true, a mapping is prepared during loading which enables vertex arguments and vertex filters | true
fetch_interval_sec | integer | (only relevant if the format supports delta updates) the interval in which the graph source is queried for changes | -1
filter | object | if not null, load subgraph specified by this filter | null
filter_strategy | enum[DB, STREAM, POST, AUTO] | the strategy to process the filter | auto
load_edge_label | boolean | whether or not to load the edge label if it is available | false
load_vertex_labels | boolean | whether or not to load the vertex label if it is available | false
partition_discard_default_values | boolean | [relevant for partition_while_loading] when partition_while_loading is specified, if set to by_label, the properties that contain only default values are removed from vertex and edge providers. | false
property_value_delimiter | string | if null read the whole string value as label. Otherwise, split the string using the specified delimiter and use all values as vertex labels | null
skip_edges | boolean | whether or not to load the edges | false
skip_vertices | boolean | whether or not to load the vertices | false
snapshots_source | enum[REFRESH, CHANGE_SET] | source of graph snapshots: if REFRESH, new snapshots can be created only by reading the graph again via this config (e.g., with `readGraphWithProperties`), or equivalently via auto-refresh if enabled; if CHANGE_SET, new snapshots can be added only via changesets by any session. Note: CHANGE_SET is not compatible with auto-refresh | refresh
strict_mode | boolean | if true, exceptions are thrown and logged with ERROR level whenever loader encounters problems with input file, such as invalid format, repeated keys, missing fields, mismatches and other potential errors. If false, loader may use less memory during loading phase, but behave unexpectedly with erratic input files | true
update_interval_sec | integer | the interval in which a new snapshot is created, either by reloading the entire graph or if the format supports delta-updates, out of the cached changes. (only relevant if the format supports delta updates) Set to -1 if you want to disable periodic snapshot creation. Note: one of update_interval_sec and update_threshold must be set | 60
update_properties_in_place | boolean | if true, non-structural updates get applied to the graph in-place, else non-structural updates also cause new snapshots of the graph to be created. | false
update_threshold | integer | (only relevant if the format supports delta updates) the maximum number of changes that are cached before a new snapshot is created. Set to -1 if you want to disable the threshold for snapshot creation. Note: one of update_interval_sec and update_threshold must be set | -1
use_vertex_property_value_as_label | string | load the given property as vertex label. Currently only available for loading from PG | null
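
A loading object built from the fields above might look like the following sketch. The combination is only an example: it loads labels, prepares the edge ID mapping, and restricts new snapshots to changesets. auto_refresh is left at false because the table notes that CHANGE_SET is not compatible with auto-refresh.

```json
{
  "loading": {
    "load_vertex_labels": true,
    "load_edge_label": true,
    "create_edge_id_mapping": true,
    "create_label_histogram": true,
    "strict_mode": true,
    "auto_refresh": false,
    "snapshots_source": "CHANGE_SET"
  }
}
```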

The error_handling field is a JSON object with the following JSON fields:

Field | Type | Description | Default
on_missed_prop_key | enum[silent, log_warn, log_warn_once, error] | what to do when a missing property key is encountered | log_warn_once
on_missing_vertex | enum[ignore_edge, create_vertex, error] | what to do when a source or destination vertex of an edge is not found in a vertex data source. | error
on_prop_conversion | enum[silent, log_warn, log_warn_once, error] | what to do when a different property type is encountered than specified, but coercion is possible | log_warn_once
on_type_mismatch | enum[silent, log_warn, log_warn_once, error] | what to do when a different property type is encountered than specified, but coercion is not possible | error
on_vector_length_mismatch | enum[silent, log_warn, log_warn_once, error] | what to do when a vector property does not have the correct dimension | error
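
For example, a more lenient error handling policy than the defaults could be sketched as follows. The chosen values are illustrative: edges that reference unknown vertices are dropped instead of failing the load, and every coercible type conversion is logged rather than only the first one.

```json
{
  "error_handling": {
    "on_missing_vertex": "ignore_edge",
    "on_missed_prop_key": "log_warn",
    "on_prop_conversion": "log_warn",
    "on_type_mismatch": "error",
    "on_vector_length_mismatch": "error"
  }
}
```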

However, each Graph Config may contain additional JSON fields that are specific to the type of the data source. See Loading from Files and Loading from DB for details.

PGX remote limitation

By default, PGX does not support loading graphs from the local file system in the remote use case. The allow_local_filesystem engine configuration option enables this feature at the expense of security. If it is enabled, the directories from which loading is allowed must be listed in the datasource_dir_whitelist engine configuration option, and permission must be granted to the user or role that needs to load graphs from that file location.
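
The two engine options named above might appear in the engine configuration roughly as shown below. This is a sketch under assumptions: it presumes the engine configuration is a JSON file and that datasource_dir_whitelist takes an array of directory paths, and the directory /scratch/graph-data is hypothetical. The permission grant for the user or role is a separate step that is not shown.

```json
{
  "allow_local_filesystem": true,
  "datasource_dir_whitelist": ["/scratch/graph-data"]
}
```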

Further details: