PGX 20.2.2
Documentation

Loading Partitioned Graphs from Non-partitioned Graph Data

In the partitioned graph model, vertices and edges are "typed", and each vertex or edge type can have its own set of properties. By taking those types into account, PGX can optimize memory consumption or query execution on such graphs.

In non-partitioned graphs, the information about the types of vertices and edges is usually represented by vertex and edge labels. PGX can automatically create a partitioned graph from non-partitioned graph data by using these labels to determine the vertex and edge types of the partitioned graph. While doing so, PGX can also detect which properties are associated with each vertex and edge type by discarding the properties that contain only default values (or, for formats such as FLAT_FILE, the properties whose values are missing).

Partitioning while loading is available for non-partitioned graph data in the CSV, PG, FLAT_FILE and TWO_TABLES RDBMS formats.

To automatically load non-partitioned graph data as a partitioned graph, only a few changes are required in the graph configuration, which we detail next.

Partitioning-while-loading Flag in Graph Config

To enable the partitioning of non-partitioned graph data while loading, set the partition_while_loading flag to by_label in the graph configuration, as shown in the following example:

{
  "format": "csv",
  "partition_while_loading": "by_label",
  "header": false,
  "vertex_uris": ["vertices.csv"],
  "edge_uris": ["edges.csv"],
  "vertex_props": [
    {"name": "integer_prop", "type": "integer"},
    {"name": "string_prop", "type": "string"}
  ],
  "edge_props": [
    {"name": "integer_prop", "type": "integer"},
    {"name": "string_prop", "type": "string"}
  ],
  "loading": {
    "load_vertex_labels": true,
    "load_edge_label": true
  }
}

Loading vertex or edge labels is not mandatory. However, when labels are not associated with vertices (resp. edges), the types of those vertices (resp. edges) cannot be determined. If neither vertex labels nor edge labels are loaded, partitioning while loading cannot differentiate any vertex or edge type at all.
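The role of labels described above can be sketched in a few lines of plain Java. This is not PGX code and does not use the PGX API: the class and method names are hypothetical, and the fallback name `$$unlabeled$$` is borrowed from the examples later in this document. The sketch only illustrates that elements sharing a label end up in the same provider, and that elements without a label cannot be told apart.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; hypothetical names, no PGX dependency.
class LabelPartitionSketch {
    // Provider name used here for elements without a label (assumption taken
    // from the $$unlabeled$$ provider shown in the examples of this document).
    static final String UNLABELED = "$$unlabeled$$";

    // labels.get(i) is the label of ids.get(i), or null when no label was loaded.
    static Map<String, List<Long>> partitionByLabel(List<Long> ids, List<String> labels) {
        Map<String, List<Long>> providers = new LinkedHashMap<>();
        for (int i = 0; i < ids.size(); i++) {
            String label = labels.get(i) == null ? UNLABELED : labels.get(i);
            // One provider per distinct label; IDs keep their input order.
            providers.computeIfAbsent(label, k -> new ArrayList<>()).add(ids.get(i));
        }
        return providers;
    }

    public static void main(String[] args) {
        List<Long> ids = Arrays.asList(0L, 1L, 100L);
        // With labels: two providers, "person" and "account".
        System.out.println(partitionByLabel(ids, Arrays.asList("person", "person", "account")));
        // Without labels: every vertex falls into the single fallback provider.
        System.out.println(partitionByLabel(ids, Arrays.asList(null, null, null)));
    }
}
```

When every label is null, the result contains a single provider holding all IDs, which is the situation the paragraph above warns about.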

Discarding Properties Containing Only Default/missing Values

For non-partitioned graph formats that support sparse properties (such as FLAT_FILE, PG formats, TWO_TABLES RDBMS), properties that do not have any value defined for any vertex or edge of a certain type will not be included in the corresponding vertex/edge provider after partitioning.

To discard properties during partitioning for the other formats that do not support sparse property definitions (CSV for example), it is possible to specify that the properties filled with the default value for a certain vertex/edge provider should be discarded. The following example illustrates how to use that option in a graph configuration:

{
  "format": "csv",
  "partition_while_loading": "by_label",
  "header": false,
  "vertex_uris": ["vertices.csv"],
  "edge_uris": ["edges.csv"],
  "vertex_props": [
    {"name": "integer_prop", "type": "integer"},
    {"name": "string_prop", "type": "string"}
  ],
  "edge_props": [
    {"name": "integer_prop", "type": "integer", "default" : "-100"},
    {"name": "string_prop", "type": "string"}
  ],
  "loading": {
    "load_vertex_labels": true,
    "load_edge_label": true,
    "partition_discard_default_values": true
  }
}

The previous example enables the discarding of properties that contain only default values by setting partition_discard_default_values to true. It also shows how to give a property a specific default value with the default attribute of the property definition.
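To make the discarding rule concrete, here is a minimal sketch in plain Java of the check it implies: a property is kept for a provider only if at least one element of that provider holds a value different from the property's default. The class name, method name, sample rows and the "N/A" default for string_prop are assumptions made for illustration; this is not how PGX implements the feature internally.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Illustrative sketch only; hypothetical names, no PGX dependency.
class DefaultDiscardSketch {
    // rows: one map (property name -> value) per vertex or edge of a single provider.
    // defaults: the default value declared for each property.
    // Returns the properties that have at least one non-default value.
    static Set<String> keptProperties(List<Map<String, Object>> rows, Map<String, Object> defaults) {
        Set<String> kept = new LinkedHashSet<>();
        for (Map<String, Object> row : rows) {
            for (Map.Entry<String, Object> entry : row.entrySet()) {
                if (!Objects.equals(entry.getValue(), defaults.get(entry.getKey()))) {
                    kept.add(entry.getKey());
                }
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, Object> defaults = new HashMap<>();
        defaults.put("integer_prop", -100);   // default from the configuration above
        defaults.put("string_prop", "N/A");   // hypothetical default for this sketch

        Map<String, Object> first = new HashMap<>();
        first.put("integer_prop", -100);
        first.put("string_prop", "a");
        Map<String, Object> second = new HashMap<>();
        second.put("integer_prop", -100);
        second.put("string_prop", "b");

        // integer_prop only ever holds its default, so only string_prop is kept.
        System.out.println(keptProperties(List.of(first, second), defaults));
    }
}
```

The check is applied per provider, so the same property can be discarded for one vertex or edge type and kept for another.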

Example with Labeled Vertices and Edges Graph Data

We illustrate here how the partitioning operates on a sample graph that has labeled vertices and edges. Some of the vertices represent persons, while others represent bank accounts. There are edges to indicate that a person owns a bank account, and edges to indicate a money transfer between accounts. Persons have an age property, while bank accounts have an IBAN property. All edges have keys that we will use as IDs, and money transfers have an amount property.

The vertices can be put in a CSV file labeled_vertices.csv with the following content:

0,person,23,"N/A"
1,person,44,"N/A"
2,person,32,"N/A"
100,account,-10,"CH1234"
101,account,-10,"FR1234"
102,account,-10,"CZ1234"
103,account,-10,"GR1234"
104,account,-10,"IE1234"
105,account,-10,"HU1234"

The edges can be put in a CSV file labeled_edges.csv with the following content:

0,0,100,owns,42
1,0,101,owns,42
2,1,104,owns,42
3,1,105,owns,42
4,2,102,owns,42
5,2,103,owns,42
100,100,101,transferred,102
101,102,104,transferred,1
102,100,105,transferred,546

To load this graph and partition it, we use the following graph configuration, which can be placed in the file partitioning_config.json:

{
  "format": "csv",
  "partition_while_loading" : "by_label",
  "vertex_uris": ["labeled_vertices.csv"],
  "vertex_props":[
    {"name":"age","default":-10,"type":"integer"},
    {"name":"iban","default":"N/A","type":"string"}
  ],
  "edge_uris" : ["labeled_edges.csv"],
  "edge_props":[
    {"name":"amount","default":42.0,"type":"double"}
  ],
  "loading":{
    "create_edge_id_index": true,
    "create_edge_id_mapping": true,
    "partition_discard_default_values": true,
    "load_vertex_labels": true,
    "load_edge_label": true
  }
}

We can load the graph with the following commands:

// PGX shell (JShell)
var graph = session.readGraphWithProperties("/path/to/partitioning_config.json")

// Java
PgxGraph graph = session.readGraphWithProperties("/path/to/partitioning_config.json");

Due to the vertex labels, two vertex types are detected during partitioning and the associated vertex providers are created: person, account. Due to the removal of properties that only contain default values from the providers, only the person provider has the age property, while the account provider is the only one to have the property IBAN.

The vertices in the person vertex provider correspond to the following CSV data (after removing the IBAN property):

0,person,23
1,person,44
2,person,32

The vertices in the account vertex provider correspond to the following CSV data (after removing the age property):

100,account,"CH1234"
101,account,"FR1234"
102,account,"CZ1234"
103,account,"GR1234"
104,account,"IE1234"
105,account,"HU1234"

From the edge labels, two edge types are detected and the associated edge providers are created: transferred_FROM_account_TO_account and owns_FROM_person_TO_account. The transferred_FROM_account_TO_account edge provider has the amount property, while the owns_FROM_person_TO_account edge provider does not have any edge property.

The edges in the transferred_FROM_account_TO_account edge provider correspond to the following CSV data:

100,100,101,transferred,102
101,102,104,transferred,1
102,100,105,transferred,546

The edges in the owns_FROM_person_TO_account edge provider correspond to the following CSV data (after removing the amount property):

0,0,100,owns
1,0,101,owns
2,1,104,owns
3,1,105,owns
4,2,102,owns
5,2,103,owns

Example with Labeled Vertices but Unlabeled Edges Graph Data

We illustrate here how the partitioning operates on a sample graph that has labeled vertices and unlabeled edges. Some of the vertices represent persons, while others represent bank accounts. There are edges to indicate that a person owns a bank account, and edges to indicate a money transfer between accounts. However, the edges do not have labels. Persons have an age property, while bank accounts have an IBAN property. All edges have keys that we will use as IDs, and money transfers have an amount property. PGX is able to distinguish the types of edges because it knows the types of the vertices they connect.

The vertices can be put in a CSV file labeled_vertices.csv with the following content:

0,person,23,"N/A"
1,person,44,"N/A"
2,person,32,"N/A"
100,account,-10,"CH1234"
101,account,-10,"FR1234"
102,account,-10,"CZ1234"
103,account,-10,"GR1234"
104,account,-10,"IE1234"
105,account,-10,"HU1234"

The edges can be put in a CSV file unlabeled_edges.csv with the following content:

0,0,100,42
1,0,101,42
2,1,104,42
3,1,105,42
4,2,102,42
5,2,103,42
100,100,101,102
101,102,104,1
102,100,105,546

To load this graph and partition it, we use the following graph configuration, which can be placed in the file unlabeled_edges_partitioning_config.json:

{
  "format": "csv",
  "partition_while_loading" : "by_label",
  "vertex_uris": ["labeled_vertices.csv"],
  "vertex_props":[
    {"name":"age","default":-10,"type":"integer"},
    {"name":"iban","default":"N/A","type":"string"}
  ],
  "edge_uris" : ["unlabeled_edges.csv"],
  "edge_props":[
    {"name":"amount","default":42.0,"type":"double"}
  ],
  "loading":{
    "create_edge_id_index": true,
    "create_edge_id_mapping": true,
    "partition_discard_default_values": true,
    "load_vertex_labels": true
  }
}

We can load the graph with the following commands:

// PGX shell (JShell)
var graph = session.readGraphWithProperties("/path/to/unlabeled_edges_partitioning_config.json")

// Java
PgxGraph graph = session.readGraphWithProperties("/path/to/unlabeled_edges_partitioning_config.json");

Due to the vertex labels, two vertex types are detected during partitioning and the associated vertex providers are created: person, account. Due to the removal of properties that only contain default values from the providers, only the person provider has the age property, while the account provider is the only one to have the property IBAN.

The vertices in the person vertex provider correspond to the following CSV data (after removing the IBAN property):

0,person,23
1,person,44
2,person,32

The vertices in the account vertex provider correspond to the following CSV data (after removing the age property):

100,account,"CH1234"
101,account,"FR1234"
102,account,"CZ1234"
103,account,"GR1234"
104,account,"IE1234"
105,account,"HU1234"

Two edge types are detected because of the types of the vertices. However, because no labels are present in the edge data, the edges are given a default $$unlabeled$$ label, and the associated edge providers are created: $$unlabeled$$_FROM_person_TO_account and $$unlabeled$$_FROM_account_TO_account. The $$unlabeled$$_FROM_account_TO_account edge provider (which represents money transfers) has the amount property, while the $$unlabeled$$_FROM_person_TO_account edge provider (which represents ownership of a bank account) does not have any edge property.

The edges in the $$unlabeled$$_FROM_account_TO_account edge provider correspond to the following CSV data:

100,100,101,$$unlabeled$$,102
101,102,104,$$unlabeled$$,1
102,100,105,$$unlabeled$$,546

The edges in the $$unlabeled$$_FROM_person_TO_account edge provider correspond to the following CSV data (after removing the amount property):

0,0,100,$$unlabeled$$
1,0,101,$$unlabeled$$
2,1,104,$$unlabeled$$
3,1,105,$$unlabeled$$
4,2,102,$$unlabeled$$
5,2,103,$$unlabeled$$

Example with Unlabeled Vertices and Labeled Edges Graph Data

We illustrate here how the partitioning operates on a sample graph that has unlabeled vertices and labeled edges. Some of the vertices represent persons, while others represent bank accounts; however, there are no vertex labels to distinguish them. There are labeled edges to indicate that a person owns a bank account, and labeled edges to indicate a money transfer between accounts. Persons have an age property, while bank accounts have an IBAN property. All edges have keys that we will use as IDs, and money transfers have an amount property.

The vertices can be put in a CSV file unlabeled_vertices.csv with the following content:

0,23,"N/A"
1,44,"N/A"
2,32,"N/A"
100,-10,"CH1234"
101,-10,"FR1234"
102,-10,"CZ1234"
103,-10,"GR1234"
104,-10,"IE1234"
105,-10,"HU1234"

The edges can be put in a CSV file labeled_edges.csv with the following content:

0,0,100,owns,42
1,0,101,owns,42
2,1,104,owns,42
3,1,105,owns,42
4,2,102,owns,42
5,2,103,owns,42
100,100,101,transferred,102
101,102,104,transferred,1
102,100,105,transferred,546

To load this graph and partition it, we use the following graph configuration, which can be placed in the file unlabeled_vertices_partitioning_config.json:

{
  "format": "csv",
  "partition_while_loading" : "by_label",
  "vertex_uris": ["unlabeled_vertices.csv"],
  "vertex_props":[
    {"name":"age","default":-10,"type":"integer"},
    {"name":"iban","default":"N/A","type":"string"}
  ],
  "edge_uris" : ["labeled_edges.csv"],
  "edge_props":[
    {"name":"amount","default":42.0,"type":"double"}
  ],
  "loading":{
    "create_edge_id_index": true,
    "create_edge_id_mapping": true,
    "partition_discard_default_values": true,
    "load_edge_label": true
  }
}

We can load the graph with the following commands:

// PGX shell (JShell)
var graph = session.readGraphWithProperties("/path/to/unlabeled_vertices_partitioning_config.json")

// Java
PgxGraph graph = session.readGraphWithProperties("/path/to/unlabeled_vertices_partitioning_config.json");

Due to the lack of vertex labels, the vertex types cannot be detected, and all vertices are put in the same vertex provider $$unlabeled$$. No vertex properties can be discarded.

All the vertices are put in the vertex provider $$unlabeled$$, corresponding to the following CSV data:

0,$$unlabeled$$,23,"N/A"
1,$$unlabeled$$,44,"N/A"
2,$$unlabeled$$,32,"N/A"
100,$$unlabeled$$,-10,"CH1234"
101,$$unlabeled$$,-10,"FR1234"
102,$$unlabeled$$,-10,"CZ1234"
103,$$unlabeled$$,-10,"GR1234"
104,$$unlabeled$$,-10,"IE1234"
105,$$unlabeled$$,-10,"HU1234"

From the edge labels, two edge types are detected and the associated edge providers are created: owns_FROM_$$unlabeled$$_TO_$$unlabeled$$ and transferred_FROM_$$unlabeled$$_TO_$$unlabeled$$. The transferred_FROM_$$unlabeled$$_TO_$$unlabeled$$ edge provider has the amount property, while the owns_FROM_$$unlabeled$$_TO_$$unlabeled$$ edge provider does not have any edge property.

The edges in the transferred_FROM_$$unlabeled$$_TO_$$unlabeled$$ edge provider correspond to the following CSV data:

100,100,101,transferred,102
101,102,104,transferred,1
102,100,105,transferred,546

The edges in the owns_FROM_$$unlabeled$$_TO_$$unlabeled$$ edge provider correspond to the following CSV data (after removing the amount property):

0,0,100,owns
1,0,101,owns
2,1,104,owns
3,1,105,owns
4,2,102,owns
5,2,103,owns
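
The edge provider names in the three examples above follow a common pattern: the edge label, then _FROM_ and the source vertex label, then _TO_ and the destination vertex label, with $$unlabeled$$ standing in for any label that was not loaded. The plain Java sketch below reproduces that pattern on a few edges from the sample data. The class and method names are hypothetical and the code does not use the PGX API; it only mirrors the names shown in this document.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative sketch only; hypothetical names, no PGX dependency.
class EdgeProviderNameSketch {
    static final String UNLABELED = "$$unlabeled$$";

    static String orUnlabeled(String label) {
        return label == null ? UNLABELED : label;
    }

    // Builds <edge label>_FROM_<source label>_TO_<destination label>.
    static String providerName(String edgeLabel, String srcLabel, String dstLabel) {
        return orUnlabeled(edgeLabel) + "_FROM_" + orUnlabeled(srcLabel) + "_TO_" + orUnlabeled(dstLabel);
    }

    // edges: each long[] is {source vertex ID, destination vertex ID}.
    // edgeLabels.get(i) is the label of edges.get(i), or null if not loaded.
    // vertexLabels maps a vertex ID to its label; a missing entry means "no label".
    static Set<String> providerNames(Map<Long, String> vertexLabels, List<long[]> edges, List<String> edgeLabels) {
        Set<String> names = new TreeSet<>();
        for (int i = 0; i < edges.size(); i++) {
            long[] edge = edges.get(i);
            names.add(providerName(edgeLabels.get(i), vertexLabels.get(edge[0]), vertexLabels.get(edge[1])));
        }
        return names;
    }

    public static void main(String[] args) {
        Map<Long, String> vertexLabels = new HashMap<>();
        vertexLabels.put(0L, "person");
        vertexLabels.put(100L, "account");
        vertexLabels.put(101L, "account");
        List<long[]> edges = Arrays.asList(new long[] {0, 100}, new long[] {100, 101});

        // Labeled vertices and labeled edges.
        System.out.println(providerNames(vertexLabels, edges, Arrays.asList("owns", "transferred")));
        // Labeled vertices, unlabeled edges.
        System.out.println(providerNames(vertexLabels, edges, Arrays.asList(null, null)));
        // Unlabeled vertices, labeled edges.
        System.out.println(providerNames(new HashMap<>(), edges, Arrays.asList("owns", "transferred")));
    }
}
```

With unlabeled edges the two providers are still distinct because the source vertex labels differ, whereas with unlabeled vertices only the edge label separates them.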