PGX 20.1.1
Documentation

Graph Loading and Storing

PGX is an in-memory graph analysis engine. To analyze graph data with PGX, the data must first be loaded into PGX. PGX adopts the property graph data model, so it expects the data to be modeled as a property graph before loading; if it is not, the user must first transform the data into a property graph. PGX supports various data sources and data formats for loading graph data, including file system and database formats.
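To make the transformation step more concrete, the following is a minimal sketch in plain Java of what modeling tabular data as a property graph can look like. It does not use the PGX API: the class PropertyGraphSketch, its nested Vertex and Edge types, the "transfers" table layout and the "Account" and "TRANSFER" labels are all hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the property graph model.
// None of these types belong to the PGX API.
final class PropertyGraphSketch {

    // A vertex: an ID, any number of labels, and key/value properties.
    static final class Vertex {
        final long id;
        final List<String> labels;
        final Map<String, Object> properties;

        Vertex(long id, List<String> labels, Map<String, Object> properties) {
            this.id = id;
            this.labels = List.copyOf(labels);
            this.properties = Map.copyOf(properties);
        }
    }

    // A directed edge: an ID, its two endpoints, one label, and properties.
    static final class Edge {
        final long id;
        final long sourceId;
        final long destinationId;
        final String label;
        final Map<String, Object> properties;

        Edge(long id, long sourceId, long destinationId, String label,
             Map<String, Object> properties) {
            this.id = id;
            this.sourceId = sourceId;
            this.destinationId = destinationId;
            this.label = label;
            this.properties = Map.copyOf(properties);
        }
    }

    final Map<Long, Vertex> vertices = new LinkedHashMap<>();
    final List<Edge> edges = new ArrayList<>();

    // Turns rows of a made-up "transfers" table {fromAccount, toAccount, amount}
    // into vertices labeled "Account" and edges labeled "TRANSFER".
    static PropertyGraphSketch fromTransferRows(long[][] rows) {
        PropertyGraphSketch graph = new PropertyGraphSketch();
        long nextEdgeId = 0;
        for (long[] row : rows) {
            long from = row[0];
            long to = row[1];
            long amount = row[2];
            // Each distinct account number becomes exactly one vertex.
            graph.vertices.computeIfAbsent(from, PropertyGraphSketch::accountVertex);
            graph.vertices.computeIfAbsent(to, PropertyGraphSketch::accountVertex);
            // Each row becomes one edge carrying the amount as a property.
            graph.edges.add(new Edge(nextEdgeId++, from, to, "TRANSFER",
                    Map.of("amount", amount)));
        }
        return graph;
    }

    private static Vertex accountVertex(Long id) {
        return new Vertex(id, List.of("Account"), Map.of());
    }

    public static void main(String[] args) {
        PropertyGraphSketch graph = fromTransferRows(new long[][] {
            {1, 2, 100}, {2, 3, 50}, {1, 3, 25}
        });
        System.out.println(graph.vertices.size() + " vertices, "
                + graph.edges.size() + " edges");
    }
}
```

In this sketch, entities (accounts) become vertices and relationships (transfers) become edges, with the remaining columns attached as properties. A real data set would then be written out in one of the formats PGX can load.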

Data Loading Security Best Practices

Some of the sources PGX can load from (e.g., the database) require user authentication. We recommend adhering to the following guidelines when configuring access to these kinds of data sources:

  1. The user or role used to access the data should be a read-only account that only has access to the required graph data;
  2. The graph data should be marked as read-only, for example, with non-updateable views in the case of the database.

Data Format Support Matrix

The following table shows how the data formats differ in the way IDs, labels and vector properties are handled. Note that the table describes limitations of the PGX implementation of each format, not necessarily limitations of the format itself.

Format          Vertex IDs         Edge IDs       Vertex Labels  Edge Labels    Vector Properties
PGB             int, long, string  long           multiple       single         supported
CSV             int, long, string  long           multiple       single         supported
ADJ_LIST        int, long, string  not supported  not supported  not supported  supported
EDGE_LIST       int, long, string  not supported  multiple       single         supported
GRAPHML         int, long, string  not supported  not supported  not supported  not supported
TWO_TABLES      int, long, string  long           multiple       single         only in text datastore
RDF             long, string       not supported  multiple       single         not supported
PG (FLAT_FILE)  int, long          long           not supported  single         not supported

Where vector properties are supported (including the TWO_TABLES text datastore), vectors can be of type integer, long, float or double.
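When choosing a format, it can help to treat the matrix above as data and filter it by the features a graph needs. The following is a small sketch in plain Java that only restates the table. The class FormatSupportSketch and its members are hypothetical and not part of PGX, and the TWO_TABLES entry records vector support even though the table limits it to the text datastore.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical helper that encodes the support matrix as data.
// It is not part of PGX; it only restates the table.
final class FormatSupportSketch {

    static final class Format {
        final String name;
        final boolean edgeIds;
        final boolean vertexLabels;
        final boolean edgeLabels;
        final boolean vectorProperties;

        Format(String name, boolean edgeIds, boolean vertexLabels,
               boolean edgeLabels, boolean vectorProperties) {
            this.name = name;
            this.edgeIds = edgeIds;
            this.vertexLabels = vertexLabels;
            this.edgeLabels = edgeLabels;
            this.vectorProperties = vectorProperties;
        }
    }

    // Columns: edge IDs, vertex labels, edge labels, vector properties.
    static final List<Format> FORMATS = List.of(
        new Format("PGB", true, true, true, true),
        new Format("CSV", true, true, true, true),
        new Format("ADJ_LIST", false, false, false, true),
        new Format("EDGE_LIST", false, true, true, true),
        new Format("GRAPHML", false, false, false, false),
        new Format("TWO_TABLES", true, true, true, true), // vectors: text datastore only
        new Format("RDF", false, true, true, false),
        new Format("PG (FLAT_FILE)", true, false, true, false));

    // Returns the names of all formats that satisfy the given requirement.
    static List<String> formatsSupporting(Predicate<Format> requirement) {
        return FORMATS.stream()
                .filter(requirement)
                .map(format -> format.name)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Which formats keep both edge IDs and vertex labels?
        System.out.println(formatsSupporting(f -> f.edgeIds && f.vertexLabels));
    }
}
```

Run as is, the query for edge IDs plus vertex labels prints [PGB, CSV, TWO_TABLES], which matches reading the two columns of the table side by side.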

API for Loading Graphs into Memory

Although PGX supports several usage modes, the data loading mechanism is fundamentally the same in all of them. The concrete APIs for reading graphs are described in our API Guide.

Note that the loading method always requires a graph configuration as input. The graph configuration is metadata that describes the graph to be loaded. See the related document for details.

Immutability of Loaded Graphs

Once a graph is loaded into PGX, the graph and its properties are automatically marked as immutable. The following reasons led to this design decision:

  • Typical graph analyses happen on a snapshot of a graph instance, and therefore they do not require mutations of the graph instance.
  • Immutability allows PGX to use an internal graph representation optimized for fast analysis.
  • In remote mode, the graph instance might be shared among multiple clients.

Nevertheless, PGX also provides methods to privatize and mutate graph instances for the sake of analysis. See the related document for details.
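The idea of an immutable shared instance combined with a private mutable copy can be sketched in plain Java as follows. This is not how PGX implements it and does not use the PGX API: SnapshotSketch, ImmutableGraph, MutableGraph and the method names privatize and freeze are hypothetical and only illustrate why mutations on a private copy cannot affect other clients.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of "immutable shared graph, private mutable copy".
// These classes are illustrative only and are not the PGX API.
final class SnapshotSketch {

    // Read-only graph: adjacency lists are copied once and never change.
    static final class ImmutableGraph {
        private final Map<Long, List<Long>> adjacency;

        ImmutableGraph(Map<Long, List<Long>> adjacency) {
            Map<Long, List<Long>> copy = new HashMap<>();
            adjacency.forEach((vertex, neighbors) -> copy.put(vertex, List.copyOf(neighbors)));
            this.adjacency = Collections.unmodifiableMap(copy);
        }

        List<Long> neighbors(long vertex) {
            return adjacency.getOrDefault(vertex, List.of());
        }

        // Returns a private, mutable copy; the shared instance is untouched.
        MutableGraph privatize() {
            return new MutableGraph(adjacency);
        }
    }

    // Private working copy that a single client may change freely.
    static final class MutableGraph {
        private final Map<Long, List<Long>> adjacency = new HashMap<>();

        MutableGraph(Map<Long, List<Long>> source) {
            source.forEach((vertex, neighbors) -> adjacency.put(vertex, new ArrayList<>(neighbors)));
        }

        void addEdge(long from, long to) {
            adjacency.computeIfAbsent(from, key -> new ArrayList<>()).add(to);
            adjacency.computeIfAbsent(to, key -> new ArrayList<>());
        }

        // Freezes the current state into a new immutable snapshot.
        ImmutableGraph freeze() {
            return new ImmutableGraph(adjacency);
        }
    }

    public static void main(String[] args) {
        Map<Long, List<Long>> edges = new HashMap<>();
        edges.put(1L, List.of(2L));
        edges.put(2L, List.of());
        ImmutableGraph shared = new ImmutableGraph(edges);

        MutableGraph mine = shared.privatize();
        mine.addEdge(1L, 3L);

        System.out.println("shared:  " + shared.neighbors(1L));
        System.out.println("private: " + mine.freeze().neighbors(1L));
    }
}
```

Because the shared instance never changes, any number of clients can read it concurrently without coordination, while each client that needs mutations pays for its own copy.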
