PGX 20.1.1
Documentation

Partitioned Graph Model

Partitioned graph (beta version)

Some PGX features and APIs may not be available for partitioned graphs, and partitioned graphs are not supported in distributed execution mode. Refer to the "Unsupported Features" section for the current limitations.

The PGX partitioned graph model is a representation of a property graph particularly suited for graphs having vertices and edges of different "types", where each type of vertex or edge has a different set of properties. For example, in a graph that represents people and locations, vertices of type "Person" would have properties such as "Name" or "Birthday", while vertices of type "Place" would have properties such as "Address". Similarly, edges from "Person" to "Person" may have properties like "Meeting date", while edges from "Person" to "Place" may have properties such as "Lives at floor".

Loading such graphs as partitioned graphs in PGX can yield large memory savings, because the memory layout is optimized for the different types of vertices and edges.
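To give an intuition for where such savings can come from, the following sketch counts property slots under two simplified layouts. It is a toy model, not PGX's actual storage: the class PropertySlotSketch, its methods, and the vertex counts are all hypothetical, and it assumes that a non-partitioned layout reserves a slot for every property on every vertex.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy model only: counts property slots, not actual PGX memory usage. */
final class PropertySlotSketch {

    /** Hypothetical vertex type: its property names and how many vertices have it. */
    static final class VertexType {
        final List<String> properties;
        final long vertexCount;

        VertexType(List<String> properties, long vertexCount) {
            this.properties = properties;
            this.vertexCount = vertexCount;
        }
    }

    /** Assumed single-table layout: every vertex has a slot for every property of any type. */
    static long singleTableSlots(List<VertexType> types) {
        Set<String> allProperties = new LinkedHashSet<>();
        long totalVertices = 0;
        for (VertexType type : types) {
            allProperties.addAll(type.properties);
            totalVertices += type.vertexCount;
        }
        return totalVertices * allProperties.size();
    }

    /** Per-type layout: each vertex has slots only for the properties of its own type. */
    static long perTypeSlots(List<VertexType> types) {
        long slots = 0;
        for (VertexType type : types) {
            slots += type.vertexCount * type.properties.size();
        }
        return slots;
    }

    public static void main(String[] args) {
        // Made-up counts: one million "Person" vertices, ten thousand "Place" vertices.
        List<VertexType> types = List.of(
            new VertexType(List.of("Name", "Birthday"), 1_000_000L),
            new VertexType(List.of("Address"), 10_000L));
        System.out.println("single table: " + singleTableSlots(types));
        System.out.println("per type:     " + perTypeSlots(types));
    }
}
```

With these made-up counts, the sketch reports 3030000 slots for the single-table layout and 2010000 for the per-type layout. The gap widens as the property sets of the types overlap less.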

Vertex and Edge IDs for Partitioned Graphs

By default, PGX requires a unique identifier for each vertex and edge in a graph, so that they can be retrieved with the PgxGraph.getVertex(ID id) and PgxGraph.getEdge(ID id) methods, or with PGQL queries using the built-in id() function. This remains the case for partitioned graphs.

The default strategy to generate the vertex IDs is to use the keys provided during loading of the graph. In that case each vertex should have a vertex key that is unique across all the types of vertices. For edges, by default no keys are required in the edge data, and edge IDs will be automatically generated by PGX. Please note that the generation of edge IDs is not guaranteed to be deterministic. If required, it is also possible to load edge keys as IDs.

However, as it may be cumbersome to define such identifiers for partitioned graphs, this requirement can be disabled for the vertices and/or edges by setting the vertex_id_strategy and edge_id_strategy graph configuration fields to the value no_ids. When vertex (resp. edge) IDs are disabled, PGX forbids calls to APIs that use vertex (resp. edge) IDs, including the ones mentioned above.

Please refer to the Java API documentation of the IdStrategy enumeration, the Graph Loading Guide, and Graph Configuration Guide to learn more about the possible ID strategies, and how to specify them in graph configurations.

Graph Configuration, Loading and Storing of Partitioned Graphs

In partitioned graphs, vertices and edges are typed, meaning that they have a defined set of properties. Vertices and edges of a partitioned graph are loaded from "providers", where each provider is a data source that provides vertices (or edges) of a specific type (i.e., with a specific set of properties).

See the partitioned graph loading documentation for more information on loading partitioned graphs.

Loading Non-partitioned Graph Data as Partitioned Graphs

In addition to loading partitioned graphs directly from a partitioned graph configuration, some non-partitioned graph formats (currently CSV, TWO_TABLES RDBMS, and the PG formats) let PGX detect the vertex and edge types while loading the non-partitioned graph data, and create a partitioned graph. To do that, PGX relies on the vertex and/or edge labels present in the non-partitioned graph data to find the vertex and edge types.

Loading partitioned graphs in this way presents the advantage of requiring few changes if a non-partitioned graph is already available in a supported format, while giving the memory improvements of partitioned graphs.

Loading a partitioned graph in this way is described further in the Auto-Heterogenization Guide.
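The idea of deriving types from labels can be sketched as a simple grouping step. The code below is only an illustration under stated assumptions: the class LabelGroupingSketch, its Vertex helper, and the rule that a type's property set is the union of the properties seen for its label are all hypothetical. The actual detection performed by PGX is the one described in the Auto-Heterogenization Guide.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy model only: groups labeled vertices into per-label types. */
final class LabelGroupingSketch {

    /** Hypothetical in-memory vertex: one label plus its property values. */
    static final class Vertex {
        final String label;
        final Map<String, ?> properties;

        Vertex(String label, Map<String, ?> properties) {
            this.label = label;
            this.properties = properties;
        }
    }

    /** Returns label -> sorted property names seen on vertices carrying that label. */
    static Map<String, Set<String>> detectVertexTypes(List<Vertex> vertices) {
        Map<String, Set<String>> types = new LinkedHashMap<>();
        for (Vertex vertex : vertices) {
            types.computeIfAbsent(vertex.label, label -> new TreeSet<>())
                 .addAll(vertex.properties.keySet());
        }
        return types;
    }

    public static void main(String[] args) {
        List<Vertex> vertices = List.of(
            new Vertex("Person", Map.of("Name", "Ada")),
            new Vertex("Place", Map.of("Address", "1 Main St")),
            new Vertex("Person", Map.of("Birthday", "1815-12-10")));
        // Each distinct label becomes one detected type with its own property set.
        System.out.println(detectVertexTypes(vertices));
    }
}
```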

Adding or Removing Vertex and Edge Providers After the Initial Graph Loading

It is possible to add or remove vertex and edge providers from a partitioned graph by applying a graph alteration mutation. To learn how to apply graph alterations on partitioned graphs, see the graph alteration reference documentation.

Memory Consumption

Because the types of the vertices and edges and their associated properties can be modeled precisely, the memory consumption of a partitioned graph can be very different from that of a non-partitioned graph. The memory consumption documentation page provides more information on the memory requirements for loading partitioned graphs.

Graph Changeset Application on Partitioned Graphs

Partitioned graphs can be modified by using changesets, subject to some constraints. Because partitioned graphs are made of vertices and edges of specific types, the changes in a graph changeset on a partitioned graph must conform to the types defined when the partitioned graph was initially created or loaded. To learn how to create and apply graph changesets on partitioned graphs, see the graph changeset reference documentation.

Querying a Partitioned Graph with the PGQL Language

All the features of the PGQL language available for non-partitioned graphs are supported for partitioned graphs.

Furthermore, since partitioned graphs associate vertices and edges with specific types, PGQL queries can execute faster by applying some specific optimizations. To benefit from all possible optimizations, we recommend enabling the creation of a label histogram when loading partitioned graphs. Please refer to the documentation of the create_label_histogram configuration field in the graph config reference documentation.

In partitioned graphs, not all the vertices or edges may have all properties. If a property access is attempted for a vertex or an edge that does not have this property, the PGQL query engine continues the query and gives a NULL value as the result of this access. If this NULL value is used in an expression in the rest of the query, the expression is evaluated with the same rules as SQL three-valued logic. Sorting NULL values with an ORDER BY clause is supported: NULL values are placed after all non-NULL values in ascending order, and before all non-NULL values in descending order.
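The NULL rules above can be mimicked in plain Java to make them concrete. This is a minimal sketch, not the PGQL engine: the class NullSemanticsSketch and its methods are hypothetical, a Java null stands in for a NULL property value, and only the AND operator of three-valued logic is shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Toy model only: Java null plays the role of a NULL property value. */
final class NullSemanticsSketch {

    /** Three-valued AND: FALSE wins over NULL, otherwise NULL makes the result NULL. */
    static Boolean and3(Boolean a, Boolean b) {
        if (Boolean.FALSE.equals(a) || Boolean.FALSE.equals(b)) {
            return Boolean.FALSE;
        }
        if (a == null || b == null) {
            return null;
        }
        return Boolean.TRUE;
    }

    /** Ascending order: NULLs come after every non-NULL value. */
    static List<Integer> sortAscending(List<Integer> values) {
        List<Integer> copy = new ArrayList<>(values);
        copy.sort(Comparator.nullsLast(Comparator.<Integer>naturalOrder()));
        return copy;
    }

    /** Descending order: NULLs come before every non-NULL value. */
    static List<Integer> sortDescending(List<Integer> values) {
        List<Integer> copy = new ArrayList<>(values);
        copy.sort(Comparator.nullsFirst(Comparator.<Integer>reverseOrder()));
        return copy;
    }

    public static void main(String[] args) {
        // Arrays.asList is used because List.of rejects null elements.
        List<Integer> floors = Arrays.asList(3, null, 1, 7, null);
        System.out.println(sortAscending(floors));   // [1, 3, 7, null, null]
        System.out.println(sortDescending(floors));  // [null, null, 7, 3, 1]
        System.out.println(and3(null, false));       // false
        System.out.println(and3(null, true));        // null
    }
}
```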

Current limitation when grouping by NULL values

The current PGQL engine may not function correctly for queries that execute a GROUP BY aggregation on keys that contain NULL values.

For more information about the PGQL language, please refer to the PGQL reference documentation.

Running INSERT/UPDATE/DELETE Queries on Partitioned Graphs

INSERT/UPDATE/DELETE queries are also supported for partitioned graphs. UPDATE and DELETE queries can be executed without limitations. For INSERT queries, the type of the inserted entity is determined by its label(s). For this reason, vertices inserted through PGQL must have their labels defined, and these labels must correspond to exactly one vertex type. For edge inserts, the label of the inserted edge must refer to an edge type of the graph that is defined between the type of the source vertex and the type of the destination vertex. Furthermore, the assigned properties must be defined for the type of the inserted entity.
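The INSERT constraints can be read as a small set of checks against the graph's types. The sketch below restates them in Java under simplifying assumptions: the class InsertValidationSketch, the edge labels "Met" and "LivesAt", and the rule that each type is identified by a single label are hypothetical and only serve the illustration. It does not reflect how PGX implements the validation.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model only: restates the INSERT rules as checks on a hypothetical schema. */
final class InsertValidationSketch {

    /** Hypothetical edge type: a label defined between two vertex types. */
    static final class EdgeType {
        final String label;
        final String source;
        final String destination;
        final Set<String> properties;

        EdgeType(String label, String source, String destination, Set<String> properties) {
            this.label = label;
            this.source = source;
            this.destination = destination;
            this.properties = properties;
        }
    }

    // Vertex type label -> properties defined for that type.
    static final Map<String, Set<String>> VERTEX_TYPES = Map.of(
        "Person", Set.of("Name", "Birthday"),
        "Place", Set.of("Address"));

    static final List<EdgeType> EDGE_TYPES = List.of(
        new EdgeType("Met", "Person", "Person", Set.of("Meeting date")),
        new EdgeType("LivesAt", "Person", "Place", Set.of("Lives at floor")));

    /** Labels must select exactly one vertex type, and properties must belong to it. */
    static boolean canInsertVertex(Set<String> labels, Set<String> assignedProperties) {
        if (labels.size() != 1) {
            return false;
        }
        Set<String> allowed = VERTEX_TYPES.get(labels.iterator().next());
        return allowed != null && allowed.containsAll(assignedProperties);
    }

    /** The edge label must name a type defined between the source and destination types. */
    static boolean canInsertEdge(String label, String sourceType, String destinationType,
                                 Set<String> assignedProperties) {
        for (EdgeType type : EDGE_TYPES) {
            if (type.label.equals(label) && type.source.equals(sourceType)
                    && type.destination.equals(destinationType)) {
                return type.properties.containsAll(assignedProperties);
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(canInsertVertex(Set.of("Person"), Set.of("Name")));      // true
        System.out.println(canInsertVertex(Set.of("Person"), Set.of("Address")));   // false
        System.out.println(canInsertEdge("LivesAt", "Place", "Person", Set.of()));  // false
    }
}
```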

More details on how to run INSERT/UPDATE/DELETE queries on a graph can be found here. For the exact syntax and semantics of INSERT/UPDATE/DELETE queries, please refer to the corresponding section of the PGQL specification.

Executing Graph Analytics Algorithms on Partitioned Graphs

The methods provided in the PGX Analyst API support partitioned graphs in the same way as non-partitioned graphs.

Current limitation when using the Analyst API

Most of the algorithms are supported on partitioned graphs. The algorithms that are currently not supported include community-label-propagation and infomap.

Partitioned graphs can also be used for custom algorithms written in Green-Marl or using PGX Algorithm.

Current limitation when using custom algorithms

Currently, custom algorithms cannot be executed on partitioned graphs if they use certain features, including local procedures and ordered iterations.

Unsupported Features

We list here the notable features that are currently not supported on partitioned graphs (potentially non-exhaustive list):

  • distributed runtime support (PGX.D) is not implemented
  • graph mutations (subgraph, undirect, sort by degree and others) except PgxGraph.clone() are not supported
  • the PgxMap APIs are not supported
  • the GraphBuilder APIs are not supported
  • Analyst algorithms and custom Green-Marl/PGX Algorithm programs that use local procedures or ordered iterations are not supported
  • delta-refresh of partitioned graphs is not supported: a full snapshot is created by reading the entire graph again