PGX 20.1.1
Documentation

Partitioned Graphs Operations

PGX supports an optimized way of loading "partitioned" graphs with different types of vertices or edges, as illustrated in the partitioned graph model.

In this guide, we will see how partitioned graph data can be represented, and how the graph model translates into a PGX graph configuration that can be used for loading the partitioned graph.

Partitioned Graph Data and Partitioned Graph Configuration

For example, for a graph representing people and locations, the vertices representing people could be stored in a table of an Oracle RDBMS while the different vertices representing locations could be stored in a CSV file. The edges representing friendship between people could be stored in another table in the RDBMS, while the "lives in" relationship between people and places can be stored in CSV files.

To specify an edge, it is necessary to specify the key of its source vertex and the key of its destination vertex. In our example, we will use integer keys for the vertices. PGX can use the keys specified when loading the vertices and edges of a graph as unique identifiers of those vertices and edges. It is also possible to disable such identifiers for vertices or edges. Unique identifiers for vertices and edges make it possible to use some PGX APIs. For more information about unique identifiers in partitioned graphs, please refer to the partitioned graph overview documentation. In our example, we will make the vertex keys unique across the entire graph, not just within each table, so that they can be used as vertex identifiers, but we will not load or declare unique identifiers for edges.

The rest of this guide expands on this example and shows how to use PGX to operate on such a graph.

The following SQL statements create a table for the person vertices and a table for the friendship edges. The person table contains columns for the vertex identifiers, names and birthdates of the persons. Each row in this table represents one person vertex. We also add a primary key constraint on the vertex identifier to make sure it is unique inside the table. The friendship edge table contains columns that indicate the source and destination person of the edge, and a column for the "meeting date" property.

CREATE TABLE persons
( VID NUMBER,
  NAME VARCHAR2(200),
  BIRTHDATE DATE,
  CONSTRAINT person_pk PRIMARY KEY (VID)
);

We can create a table for the edges in the same or another database with the following code:

CREATE TABLE friendships
( SVID NUMBER,
  DVID NUMBER,
  MEETING_DATE DATE,
  CONSTRAINT fk_SVID FOREIGN KEY (SVID) REFERENCES persons(VID),
  CONSTRAINT fk_DVID FOREIGN KEY (DVID) REFERENCES persons(VID)
);

We now insert some vertices and edges into the tables we just created:

INSERT INTO persons (VID,NAME,BIRTHDATE) VALUES (1, 'John', to_date('13/06/1963','DD/MM/YYYY'));
INSERT INTO persons (VID,NAME,BIRTHDATE) VALUES (2, 'Mary', to_date('25/09/1982','DD/MM/YYYY'));
INSERT INTO persons (VID,NAME,BIRTHDATE) VALUES (3, 'Bob', to_date('11/03/1966','DD/MM/YYYY'));
INSERT INTO persons (VID,NAME,BIRTHDATE) VALUES (4, 'Alice', to_date('01/02/1987','DD/MM/YYYY'));

INSERT INTO friendships (SVID,DVID,MEETING_DATE) VALUES (1, 3, to_date('01/09/1972','DD/MM/YYYY'));
INSERT INTO friendships (SVID,DVID,MEETING_DATE) VALUES (2, 4, to_date('19/09/1992','DD/MM/YYYY'));
INSERT INTO friendships (SVID,DVID,MEETING_DATE) VALUES (4, 2, to_date('19/09/1992','DD/MM/YYYY'));
INSERT INTO friendships (SVID,DVID,MEETING_DATE) VALUES (3, 2, to_date('10/07/2001','DD/MM/YYYY'));

We now have the data for the persons and the friendships.

We show next what the CSV files for places and the "lives in" relationship edges could look like.

Because we want unique vertex identifiers across the entire graph, no identifier used for a place may also be used for a person. The CSV data for places could look like this:

5,"San Francisco"
6,"Tokyo"
7,"Paris"
8,"London"
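
The uniqueness requirement above can be checked before loading. The following is a minimal sketch in plain Java that does not use the PGX API; the class name VertexKeyCheck and its method are hypothetical helpers introduced only for illustration, and the keys are typed in by hand rather than read from the database or the CSV file.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of the PGX API: finds vertex keys that
// appear more than once across all vertex providers of a graph.
final class VertexKeyCheck {

    // Returns the keys that occur more than once, within or across providers.
    static Set<Long> duplicateKeys(Map<String, List<Long>> keysByProvider) {
        Set<Long> seen = new HashSet<>();
        Set<Long> duplicates = new TreeSet<>();
        for (List<Long> keys : keysByProvider.values()) {
            for (Long key : keys) {
                // Set.add returns false when the key was already present.
                if (!seen.add(key)) {
                    duplicates.add(key);
                }
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        // Keys from the example: persons use 1 to 4, places use 5 to 8.
        Map<String, List<Long>> keys = new LinkedHashMap<>();
        keys.put("person", Arrays.asList(1L, 2L, 3L, 4L));
        keys.put("place", Arrays.asList(5L, 6L, 7L, 8L));
        System.out.println("Duplicate vertex keys: " + duplicateKeys(keys));
    }
}
```

With the example data the set of duplicates is empty. If a place reused a person key, that key would be reported.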

For the "lives in" relationship, the CSV data connects persons (vertex identifiers between 1 and 4) to places (vertex identifiers between 5 and 8):

1,8,"1985-10-18"
2,5,"1982-09-25"
4,7,"1998-11-13"
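
To make the column layout of this data concrete, here is a minimal sketch in plain Java that parses such rows into a source key, a destination key and a date. It is not how PGX reads CSV files; the class LivesInCsvCheck and its members are hypothetical names, and the key ranges 1 to 4 and 5 to 8 are taken from the example data above.

```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the PGX API: parses rows of the
// "lives in" CSV data and checks the key ranges used in this guide.
final class LivesInCsvCheck {

    // One parsed row: source person key, destination place key, date.
    static final class LivesIn {
        final long source;
        final long destination;
        final LocalDate since;

        LivesIn(long source, long destination, LocalDate since) {
            this.source = source;
            this.destination = destination;
            this.since = since;
        }
    }

    // Naive split on commas; enough for this data, but not for quoted
    // fields that themselves contain commas.
    static LivesIn parseLine(String line) {
        String[] fields = line.split(",");
        if (fields.length != 3) {
            throw new IllegalArgumentException("Expected 3 fields: " + line);
        }
        long source = Long.parseLong(fields[0].trim());
        long destination = Long.parseLong(fields[1].trim());
        // Strip the surrounding double quotes before parsing the ISO date.
        LocalDate since = LocalDate.parse(fields[2].trim().replace("\"", ""));
        return new LivesIn(source, destination, since);
    }

    // In the example, persons use keys 1 to 4 and places use keys 5 to 8.
    static boolean connectsPersonToPlace(LivesIn edge) {
        return edge.source >= 1 && edge.source <= 4
            && edge.destination >= 5 && edge.destination <= 8;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
            "1,8,\"1985-10-18\"", "2,5,\"1982-09-25\"", "4,7,\"1998-11-13\"");
        for (String row : rows) {
            LivesIn edge = parseLine(row);
            System.out.println(edge.source + " -> " + edge.destination
                + " since " + edge.since
                + ", person to place: " + connectsPersonToPlace(edge));
        }
    }
}
```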

To be able to load the different vertices and edges, we create a graph configuration in JSON format. This JSON can be stored in a file. The graph configuration declares how each vertex and edge type is loaded and how to access the data. Vertex loading is configured by declaring one or more vertex providers. Similarly, edge loading is configured by declaring one or more edge providers. When declaring an edge provider, its source and destination vertex providers have to be specified.

{
  "name": "PeoplePlacesGraph",
  "vertex_providers":[{
    "name": "person",
    "format": "rdbms",
    "jdbc_url": "jdbc:oracle:thin:@mydatabaseserver:1521/dbName",
    "username": "dbUser",
    "password": "dbPassword",
    "database_table_name": "persons",
    "key_column": "VID",
    "props": [{
        "name": "BIRTHDATE",
        "type": "local_date"
      },
      {
        "name": "NAME",
        "type": "string"
      }]
  },{
    "name": "place",
    "format": "csv",
    "uris": ["/path/to/places.csv"],
    "props": [{
      "name": "ADDRESS",
      "type": "string"
    }]
  }],
  "edge_providers":[{
    "name": "livesIn",
    "format": "csv",
    "uris": ["../lives_in.csv"],
    "source_vertex_provider": "person",
    "destination_vertex_provider": "place",
    "props": [{
      "name": "SINCE",
      "type": "local_date"
    }]
  },{
    "name": "friendOf",
    "format": "rdbms",
    "source_vertex_provider": "person",
    "destination_vertex_provider": "person",
    "jdbc_url": "jdbc:oracle:thin:@myotherdatabaseserver:1521/otherDbName",
    "username": "otherDbUser",
    "password": "otherDbPassword",
    "database_table_name": "friendships",
    "source_column" : "SVID",
    "destination_column" : "DVID",
    "props": [{
      "name": "MEETING_DATE",
      "type": "local_date"
      }]
  }]
}
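
Each edge provider in the configuration above names a source and a destination vertex provider, and those names must match declared vertex providers. The following minimal sketch in plain Java illustrates that rule; it does not parse the JSON or call the PGX API, and the class ProviderReferenceCheck and its method are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of the PGX API: checks that every edge
// provider names source and destination vertex providers that exist.
final class ProviderReferenceCheck {

    // edgeProviders maps an edge provider name to a two-element array:
    // {source_vertex_provider, destination_vertex_provider}.
    static List<String> edgeProvidersWithUnknownVertices(
            Set<String> vertexProviders, Map<String, String[]> edgeProviders) {
        List<String> invalid = new ArrayList<>();
        for (Map.Entry<String, String[]> entry : edgeProviders.entrySet()) {
            String source = entry.getValue()[0];
            String destination = entry.getValue()[1];
            if (!vertexProviders.contains(source)
                    || !vertexProviders.contains(destination)) {
                invalid.add(entry.getKey());
            }
        }
        return invalid;
    }

    public static void main(String[] args) {
        // Provider names copied by hand from the JSON configuration.
        Set<String> vertexProviders =
            new HashSet<>(Arrays.asList("person", "place"));
        Map<String, String[]> edgeProviders = new LinkedHashMap<>();
        edgeProviders.put("livesIn", new String[] {"person", "place"});
        edgeProviders.put("friendOf", new String[] {"person", "person"});
        System.out.println("Invalid edge providers: "
            + edgeProvidersWithUnknownVertices(vertexProviders, edgeProviders));
    }
}
```

For the example configuration the list is empty. A misspelled reference such as "places" would cause the corresponding edge provider to be listed.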

Alternatively, the partitioned graph configuration can be created using the Partitioned graph configuration builder.

We show in the next section how to load the graph with that graph configuration and data.

Loading a Partitioned Graph in PGX from a Graph Configuration

To load the partitioned graph data into memory, we can use the PgxSession.readGraphWithProperties() method, either by passing a path to a partitioned graph configuration in a JSON file:

// JShell
var G = session.readGraphWithProperties("path/to/partitioned-config.json")

// Java
PgxSession session = Pgx.createSession("my-session");
PgxGraph G = session.readGraphWithProperties("path/to/partitioned-config.json");

or the config object directly if you built the configuration programmatically:

// JShell
var G = session.readGraphWithProperties(config)

// Java
PgxSession session = Pgx.createSession("my-session");
PgxGraph G = session.readGraphWithProperties(config);

Storing a Partitioned Graph

To store a partitioned graph without a graph configuration, several store() methods are available. They accept different combinations of the following parameters:

  • ProviderFormat targetFormat to specify in which file format the graph will be stored
  • String targetBasePath to specify where the vertex and edge providers will be stored
  • boolean overwrite to specify if existing files can be overwritten
  • Collection<VertexProperty<?, ?>> vertexProps, Collection<EdgeProperty<?>> edgeProps to select a set of properties to be stored with the graph
  • int numPartitions to export every provider into multiple files
  • Map<String, FileGraphStoringConfig> vertexStoringConfigs, Map<String, FileGraphStoringConfig> edgeStoringConfigs to provide a FileGraphStoringConfig for every vertex and edge provider, with the provider names as keys

The client can use for example the method store(ProviderFormat targetFormat, String targetBasePath, boolean overwrite) to store the graph previously loaded in a variable G:

// JShell
PartitionedGraphConfig config = G.store(ProviderFormat.CSV, "targetPath/", true)

// Java
PartitionedGraphConfig config = G.store(ProviderFormat.CSV, "targetPath/", true);

Storing a partitioned graph with a graph configuration is possible by using the PgxGraph.store(GraphConfig targetConfig, boolean overwrite) method. The graph configuration passed as the first argument should then be a partitioned graph configuration. This graph configuration can be generated programmatically or read from a JSON file, as documented in the partitioned graph configuration reference documentation.

When storing a partitioned graph, the target configuration should include a configuration for each vertex and edge provider that was used to load or create the graph.
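
The coverage rule above amounts to a set difference between the providers of the loaded graph and the providers declared in the target configuration. The following minimal sketch in plain Java shows that comparison; the class StoringConfigCoverage and its method are hypothetical names, and the provider names are typed in by hand rather than read from a PgxGraph or a graph configuration.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of the PGX API: lists providers of the
// loaded graph that have no entry in the target storing configuration.
final class StoringConfigCoverage {

    static Set<String> missingProviders(Set<String> loaded, Set<String> target) {
        // Copy first so that the caller's set is left unchanged.
        Set<String> missing = new TreeSet<>(loaded);
        missing.removeAll(target);
        return missing;
    }

    public static void main(String[] args) {
        Set<String> loaded = new TreeSet<>(
            Arrays.asList("person", "place", "livesIn", "friendOf"));
        // A target configuration that forgot the "friendOf" edge provider.
        Set<String> target = new TreeSet<>(
            Arrays.asList("person", "place", "livesIn"));
        System.out.println("Missing providers: " + missingProviders(loaded, target));
    }
}
```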

The following example shows how to store the example partitioned graph to vertex and edge providers in the PGB format, which is a binary file format optimized for read performance and disk space.

We need first to define the partitioned graph configuration, with all the providers using the PGB format. In JSON form, the configuration is as follows:

{
  "name": "PeoplePlacesGraph",
  "vertex_providers":[{
    "name": "person",
    "format": "pgb",
    "uris" : ["path/where/to/store/person.pgb"],
    "props": [{
        "name": "BIRTHDATE",
        "type": "local_date"
      },
      {
        "name": "NAME",
        "type": "string"
      }]
  },{
    "name": "place",
    "format": "pgb",
    "uris": ["/path/to/store/places.pgb"],
    "props": [{
      "name": "ADDRESS",
      "type": "string"
    }]
  }],
  "edge_providers":[{
    "name": "livesIn",
    "format": "pgb",
    "uris": ["/path/to/store/lives_in.pgb"],
    "source_vertex_provider": "person",
    "destination_vertex_provider": "place",
    "props": [{
      "name": "SINCE",
      "type": "local_date"
    }]
  },{
    "name": "friendOf",
    "format": "pgb",
    "uris": ["/path/to/store/friend_of.pgb"],
    "source_vertex_provider": "person",
    "destination_vertex_provider": "person",
    "props": [{
      "name": "MEETING_DATE",
      "type": "local_date"
      }]
  }]
}

To store the partitioned graph previously loaded in a variable G, we can then run:

// JShell
var config = GraphConfigFactory.forAnyFormat().fromPath("/path/to/config.json")
G.store(config, true)

// Java
GraphConfig config = GraphConfigFactory.forAnyFormat().fromPath("/path/to/config.json");
G.store(config, true);

Recap

In this guide we saw the following:

  • what the partitioned graph model is and how to map an example graph with different vertex and edge types to it by decomposing the graph in vertex and edge providers, each providing a type of vertex or edge
  • how to create a graph configuration for partitioned graphs either by using a JSON configuration file or using a programmatic method
  • how to load a partitioned graph, either from a path to a JSON configuration or from a graph configuration object
  • how to store a partitioned graph, potentially using different formats for some or all of the vertex/edge providers by using the PgxGraph.store(GraphConfig targetConfig, boolean overwrite) method

Further Reading

For more details about partitioned graph configuration, please have a look at the dedicated reference documentation.

For more information about partitioned graphs in general, please refer to the partitioned graph overview documentation.