PGX 20.2.2
Documentation

Storing a Graph into Multiple Files

In this tutorial, you will learn how to store a graph into multiple files. As described in Exporting Graphs, we use the store() method on a PgxGraph object. Most parameters are the same as when storing to a single file; the main difference is that we also specify how to partition the data. There are two ways to do this: by specifying a FileGraphStoringConfig, or by specifying a base path and the number of partitions.

Export into Multiple Files Using FileGraphStoringConfig

Using a FileGraphStoringConfig, we can specify a more detailed way of creating the multiple partitions used to store the graph. We create a FileGraphStoringConfig object using a FileGraphStoringConfigBuilder.

The following code stores the graph into four partitions under the specified base path, using zero as the initial partition index. It also sets the file extensions to use for vertex files and edge files, and sets the comma as the delimiter for the stored graph data:

FileGraphStoringConfig storingConfig = new FileGraphStoringConfigBuilder(basePath) //
  .setNumPartitions(4) //
  .setInitialPartitionIndex(0) //
  .setVertexExtension(vertexExtension) //
  .setEdgeExtension(edgeExtension) //
  .setDelimiter(',') //
  .build();
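
To make the effect of the base path, the number of partitions, the initial partition index, and the extensions more concrete, here is a minimal, self-contained sketch that lists the file names such a configuration could produce. It does not call PGX. The naming scheme `<basePath>_<index>.<extension>` is an assumption made for illustration, and the class name, method name, base path, and extensions are hypothetical; check the FileGraphStoringConfig reference for the exact scheme PGX uses.

```java
import java.util.ArrayList;
import java.util.List;

public class PartitionFileNames {

    // Assumed naming scheme (not confirmed by this tutorial):
    // <basePath>_<index>.<extension>
    static List<String> fileNames(String basePath, int numPartitions,
                                  int initialIndex, String extension) {
        if (numPartitions < 1) {
            throw new IllegalArgumentException("numPartitions must be at least 1");
        }
        List<String> names = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            // Partition indices start at initialIndex and increase by one.
            names.add(basePath + "_" + (initialIndex + i) + "." + extension);
        }
        return names;
    }

    public static void main(String[] args) {
        // Hypothetical values mirroring the builder calls above.
        String basePath = "/tmp/myGraph";
        System.out.println(fileNames(basePath, 4, 0, "v")); // vertex files
        System.out.println(fileNames(basePath, 4, 0, "e")); // edge files
    }
}
```

Under this assumed scheme, four partitions with an initial index of zero would yield one vertex file and one edge file for each of the indices 0 through 3.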

Export into Multiple Files without FileGraphStoringConfig

When we only need to specify the number of partitions and the base name, it is simpler to call store() with just those parameters. PGX then uses default values for the other fields. For details on the default values, refer to the corresponding table of defaults.

Export into Multiple Files Using a Graph Configuration Object

There is another way to use this feature: create a FileGraphStoringConfig, put it into a Graph Configuration object using setStoringOptions in its builder, and then use the corresponding version of the store() method.

Storing Partitioned Graphs into Multiple Files

For a partitioned graph, we can also specify how many files the graph should be exported into.

When we want to partition all tables equally, we can use the numPartitions parameter. In that case, all tables are exported into the same number of files.

If we don't want to partition the tables equally, we can either create a PartitionedGraphConfig that contains a FileGraphStoringConfig for each provider, or use a version of store() that takes two maps of FileGraphStoringConfigs, one for the vertex tables and one for the edge tables.

For the first option, we create a FileGraphStoringConfig for each vertex and edge table and put it into a FileEntityProviderConfig using setStoringOptions in the builder of FileEntityProviderConfig. The providers are then added to the PartitionedGraphConfig as vertex and edge providers using addVertexProvider() and addEdgeProvider() in the builder of PartitionedGraphConfig. Afterwards, we can use the store() method that takes the PartitionedGraphConfig as a parameter.

The second option creates a storing configuration for every vertex and edge table, adds these configurations to a vertex provider map and an edge provider map, and calls the corresponding store() method with these maps as parameters:

FileGraphStoringConfig vertexStoringConfig1 = new FileGraphStoringConfigBuilder(basePath + "_vertexTable1") //
  .setNumPartitions(4) //
  .setInitialPartitionIndex(0) //
  .setVertexExtension(vertexExtension) //
  .setDelimiter(',') //
  .build();

FileGraphStoringConfig vertexStoringConfig2 = new FileGraphStoringConfigBuilder(basePath + "_vertexTable2") //
  .setNumPartitions(4) //
  .setInitialPartitionIndex(0) //
  .setVertexExtension(vertexExtension) //
  .setDelimiter(',') //
  .build();

FileGraphStoringConfig edgeStoringConfig1 = new FileGraphStoringConfigBuilder(basePath + "_edgeTable1") //
  .setNumPartitions(4) //
  .setInitialPartitionIndex(0) //
  .setEdgeExtension(edgeExtension) //
  .setDelimiter(',') //
  .build();

Map<String, FileGraphStoringConfig> vertexStoringConfigs = new HashMap<>();
vertexStoringConfigs.put("vertexTable1", vertexStoringConfig1);
vertexStoringConfigs.put("vertexTable2", vertexStoringConfig2);

Map<String, FileGraphStoringConfig> edgeStoringConfigs = new HashMap<>();
edgeStoringConfigs.put("edgeTable1", edgeStoringConfig1);

Special Notes

Exporting to multiple files is especially useful when used in conjunction with Parallel File Loading.
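
As a rough illustration of why several smaller files lend themselves to parallel processing, the following sketch reads a set of partition files concurrently using plain Java streams. It is not PGX's Parallel File Loading implementation and does not use the PGX API; the class name, file names, and sample contents are hypothetical and exist only to keep the example self-contained.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Stream;

public class ParallelPartitionReader {

    // Counts the lines of all partition files; the parallel stream lets
    // separate files be read by different threads at the same time.
    static long countLines(List<Path> partitionFiles) {
        return partitionFiles.parallelStream()
            .mapToLong(ParallelPartitionReader::countLinesInFile)
            .sum();
    }

    static long countLinesInFile(Path file) {
        try (Stream<String> lines = Files.lines(file)) {
            return lines.count();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes hypothetical partition files into a temporary directory.
    static List<Path> writeSamplePartitions(int numPartitions, int linesPerFile) {
        try {
            Path dir = Files.createTempDirectory("partitions");
            List<Path> files = new ArrayList<>();
            for (int i = 0; i < numPartitions; i++) {
                Path file = dir.resolve("sample_" + i + ".csv");
                Files.write(file, Collections.nCopies(linesPerFile, "1,2,0.5"));
                files.add(file);
            }
            return files;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<Path> files = writeSamplePartitions(4, 3);
        System.out.println(countLines(files)); // 4 files with 3 lines each
    }
}
```

With a single output file, a reader of this kind would have only one unit of work; splitting the export into several files gives each thread its own file to process.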