PGX 20.1.1
Documentation

Graph Loading Features in Distributed Execution

This page describes the supported and unsupported graph loading features of the distributed execution mode of PGX. Note that only non-partitioned graphs are currently supported in distributed mode.

Pre-loading Graphs

You can instruct the PGX server to pre-load a graph into memory with the preload_graphs configuration entry in pgx.conf (refer to this page for more information and an example).

Unsupported Graph Pre-loading Features

An invalid or unsupported graph configuration, or an unsupported pre-loading flag, causes the server to abort. To make the server skip the offending graphs instead, set the strict_mode flag to false in the configuration.

  • Pre-loaded unpublished graphs (with both publish and publish_with_snapshots flags set to false) are not supported.
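The restriction above can be checked before the server is started. The following is a minimal sketch in Python that builds a pgx.conf fragment and rejects unpublished entries. The preload_graphs, publish, publish_with_snapshots, and strict_mode names come from this page; the "path" and "name" keys, the example file path, and both helper functions are assumptions made for illustration and should be checked against the pre-loading documentation.

```python
import json


def preload_entry(path, name, publish=True, publish_with_snapshots=False):
    """Build one preload_graphs entry (hypothetical helper).

    The "path" and "name" keys are assumed here, not confirmed by this page.
    """
    # Distributed mode does not support pre-loaded graphs that have both
    # publish flags set to false, so reject that combination early.
    if not (publish or publish_with_snapshots):
        raise ValueError(
            f"graph {name!r}: publish or publish_with_snapshots must be true"
        )
    return {
        "path": path,
        "name": name,
        "publish": publish,
        "publish_with_snapshots": publish_with_snapshots,
    }


def build_pgx_conf_fragment(entries, strict_mode=True):
    """Combine entries into a dict that can be serialized as JSON."""
    return {"strict_mode": strict_mode, "preload_graphs": list(entries)}


if __name__ == "__main__":
    # The file path below is a made-up example.
    fragment = build_pgx_conf_fragment(
        [preload_entry("graph-configs/my-graph.adj.json", "my-graph")]
    )
    print(json.dumps(fragment, indent=2))
```

Validating the flags up front avoids relying on strict_mode to reveal the problem only when the server starts.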

Unsupported Property Types

The distributed mode does not support point2d as a vertex or edge property type. It also does not support loading vector properties from a data source, although vector properties can be created once the graph is loaded.

Distributed Format Support Matrix

The following table illustrates which formats the distributed execution mode supports, as well as potential additional limitations for supported formats.

Format              Supported  Vertex and edge labels supported  Other limitations
ADJ_LIST            Yes        No                                None
CSV                 Yes        Yes                               See below
EDGE_LIST           Yes        Yes                               None
FLAT_FILE           Yes        Yes                               None
GRAPHML             No         N/A                               N/A
MULTI_TABLES_DB     No         N/A                               N/A
PG (HBASE)          Yes        Yes                               Only available within certain supported products
PG (NOSQL)          Yes        Yes                               Only available within certain supported products
PG (RDBMS)          No         N/A                               N/A
PGB                 Yes        No                                None
RDF                 No         N/A                               N/A
TWO_TABLES (RDBMS)  Yes        Yes                               None
TWO_TABLES (TEXT)   No         N/A                               N/A
XML                 No         N/A                               N/A

File Distribution Across Machines

If the format is one of ADJ_LIST, CSV, EDGE_LIST, FLAT_FILE, or PGB, all graph source files should be accessible from every machine used in the distributed mode. The source files should therefore be located on a shared file system (e.g., NFS) or replicated on the local file system of each machine. If a source file is replicated on local file systems, its file path should be identical across all machines.
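One way to verify this requirement is to run the same check on every machine with the same list of paths. The sketch below is an illustration only: the function name missing_source_files is hypothetical, and it inspects just the machine it runs on, so it has to be executed once per machine (for example through whatever remote execution tooling is already in use).

```python
from pathlib import Path

# File-based formats whose sources should be visible on every machine.
FILE_BASED_FORMATS = {"ADJ_LIST", "CSV", "EDGE_LIST", "FLAT_FILE", "PGB"}


def missing_source_files(graph_format, paths):
    """Return the paths that are not readable as files on this machine.

    Formats outside FILE_BASED_FORMATS are not subject to the requirement,
    so an empty list is returned for them.
    """
    if graph_format.upper() not in FILE_BASED_FORMATS:
        return []
    return [str(p) for p in paths if not Path(p).is_file()]
```

An empty result on every machine, for identical path lists, indicates that the files are either shared or replicated under the same paths.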

Parallel Loading of CSV Files

The CSV loader in the distributed execution mode can parse and load data in parallel if the data is split into multiple files, but it does not support automatic partitioning. Users should split the source file into multiple files themselves if it is too big or if they want to maximize loading performance.
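Since the split has to be done by the user, a small script is usually enough. The following Python sketch distributes the rows of one CSV file across several part files in round-robin order. The function name split_csv, the part file naming scheme, and the has_header option are assumptions made for illustration; the sketch also assumes one record per line, so it is not suitable for files with line breaks inside quoted fields.

```python
from pathlib import Path


def split_csv(source, out_dir, num_parts, has_header=False):
    """Split `source` into `num_parts` files, assigning rows round-robin.

    If has_header is True, the first line is copied into every part.
    Returns the list of part file paths.
    """
    if num_parts < 1:
        raise ValueError("num_parts must be at least 1")
    source = Path(source)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    part_paths = [
        out_dir / f"{source.stem}_part{i}{source.suffix}"
        for i in range(num_parts)
    ]
    # newline="" keeps the original line endings untouched.
    outputs = [p.open("w", newline="") for p in part_paths]
    try:
        with source.open("r", newline="") as src:
            if has_header:
                header = src.readline()
                for out in outputs:
                    out.write(header)
            for index, line in enumerate(src):
                outputs[index % num_parts].write(line)
    finally:
        for out in outputs:
            out.close()
    return part_paths
```

The resulting part files can then be listed as the sources in the graph configuration. Round-robin assignment keeps the parts close to equal in size, which helps when the goal is to balance parallel loading.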