15.2 Loading Graph Data in Parallel from Multiple Files

You can load a graph in parallel using multiple files.

The following example demonstrates how to load graph data from multiple files. Consider a vertex file split into four partitions as shown:

vertex_file1

1,Color,1,red,,
2,Color,1,yellow,,

vertex_file2

3,Color,1,blue,,
4,Color,1,green,,

vertex_file3

5,Color,1,orange,,
6,Color,1,white,,

vertex_file4

7,Color,1,black,,

The edge file is split into two partitions as shown:

edge_file1

1,1,2,edge1,Weight,4,,1.0,
2,2,3,edge2,Weight,4,,2.0,
3,3,4,edge3,Weight,4,,3.0,

edge_file2

4,4,5,edge4,Weight,4,,4.0,
5,5,6,edge5,Weight,4,,5.0,
6,6,7,edge6,Weight,4,,6.0,
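If the data starts out as a single flat file, partitions like the ones above could be produced with a small utility. The following is a minimal sketch that uses only the Java standard library; the class name FlatFileSplitter and its split method are hypothetical helpers for illustration and are not part of the graph server (PGX) API. It assumes one record per line and that the file fits in memory.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the PGX API.
final class FlatFileSplitter {

    // Splits 'source' into consecutive chunks of at most 'linesPerPart' lines.
    // Part files are named <prefix>1, <prefix>2, ... inside 'targetDir'.
    // Assumes one record per line and a file small enough to read into memory.
    static List<Path> split(Path source, Path targetDir, String prefix, int linesPerPart)
            throws IOException {
        if (linesPerPart < 1) {
            throw new IllegalArgumentException("linesPerPart must be at least 1");
        }
        List<String> lines = Files.readAllLines(source, StandardCharsets.UTF_8);
        List<Path> parts = new ArrayList<>();
        for (int start = 0; start < lines.size(); start += linesPerPart) {
            int end = Math.min(start + linesPerPart, lines.size());
            Path part = targetDir.resolve(prefix + (parts.size() + 1));
            Files.write(part, lines.subList(start, end), StandardCharsets.UTF_8);
            parts.add(part);
        }
        return parts;
    }
}
```

Under these assumptions, splitting the seven vertex records with two lines per part yields four files, and splitting the six edge records with three lines per part yields two files, which matches the partitioning shown above.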

The following graph configuration loads the graph data from the four vertex files and two edge files into the same graph. Note that all the URIs are specified inside the JSON graph configuration.

{
  "format": "flat_file",
  "vertex_uris": ["vertex_file1", "vertex_file2", "vertex_file3", "vertex_file4"],
  "edge_uris": ["edge_file1", "edge_file2"],
  "separator": ",",
  "edge_props": [
    {
      "name": "Weight",
      "type": "double"
    }
  ],
  "vertex_props": [
    {
      "name": "Color",
      "type": "string"
    }
  ]
}

You can also create a graph configuration with multiple file partitions using Java as shown:

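import oracle.pgx.common.types.PropertyType;
import oracle.pgx.config.FileGraphConfig;
import oracle.pgx.config.Format;
import oracle.pgx.config.GraphConfigBuilder;

// One flat-file configuration that covers every vertex and edge partition.
FileGraphConfig config = GraphConfigBuilder
    .forFileFormat(Format.FLAT_FILE)
    .setSeparator(",")
    // Four vertex partitions
    .addVertexUri("vertex_file1")
    .addVertexUri("vertex_file2")
    .addVertexUri("vertex_file3")
    .addVertexUri("vertex_file4")
    // Two edge partitions
    .addEdgeUri("edge_file1")
    .addEdgeUri("edge_file2")
    // Properties shared by all partitions
    .addVertexProperty("Color", PropertyType.STRING)
    .addEdgeProperty("Weight", PropertyType.DOUBLE)
    .build();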

Note:

Both of the preceding graph configurations include one edge property of type double named "Weight" and one vertex property of type string named "Color".

You can now load the graph data from the files as explained in Creating a graph using graph builder API.

The graph server (PGX) automatically loads the graph in parallel, using one thread for each file. A graph can therefore be loaded with up to as many threads as there are files, subject to the parallelism configured for the graph server (PGX) instance.
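The relationship between the number of files and the configured parallelism can be illustrated with a plain Java thread pool. This is a conceptual sketch only and makes no claim about how the graph server (PGX) schedules its loader threads internally; the class ParallelFileReadSketch and its methods are hypothetical names. It assumes each file is handled as one task and that the pool size is capped by both the file count and the parallelism setting.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Conceptual illustration; not the PGX loader implementation.
final class ParallelFileReadSketch {

    // Upper bound on concurrently working threads: one per file,
    // but never more than the configured parallelism (and at least one).
    static int effectiveThreads(int fileCount, int parallelism) {
        return Math.max(1, Math.min(fileCount, parallelism));
    }

    // Reads each file as a separate task and returns the line counts in input order.
    static List<Integer> countLines(List<Path> files, int parallelism) throws Exception {
        ExecutorService pool =
            Executors.newFixedThreadPool(effectiveThreads(files.size(), parallelism));
        try {
            List<Future<Integer>> pending = new ArrayList<>();
            for (Path file : files) {
                Callable<Integer> task = () -> Files.readAllLines(file).size();
                pending.add(pool.submit(task));
            }
            List<Integer> counts = new ArrayList<>();
            for (Future<Integer> result : pending) {
                counts.add(result.get()); // waits for that file's task to finish
            }
            return counts;
        } finally {
            pool.shutdown();
        }
    }
}
```

In this model, two files with a parallelism of 16 use at most two threads, while six files with a parallelism of 4 use at most four, so splitting the data into more files helps only up to the configured parallelism.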

Note:

Because the same graph configuration is used for all of the specified files, every file must use the same format: the same separator, the same defined properties, and the same format specification.
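One way to catch inconsistent partitions before loading is a simple pre-check of the field count on every line. The sketch below is a hypothetical standalone helper (PartitionFormatCheck is not part of the PGX API) and assumes that values never contain the separator character. The expected field count is passed in by the caller; in the sample data above, each vertex line has six comma-separated fields and each edge line has nine.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical pre-check for illustration; not part of the PGX API.
final class PartitionFormatCheck {

    // Returns one message per non-empty line whose field count differs from 'expectedFields'.
    // Assumes values never contain the separator (no quoting or escaping is handled).
    static List<String> findMismatches(List<Path> files, String separator, int expectedFields)
            throws IOException {
        List<String> problems = new ArrayList<>();
        String regex = Pattern.quote(separator); // treat the separator literally
        for (Path file : files) {
            List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
            for (int i = 0; i < lines.size(); i++) {
                String line = lines.get(i);
                if (line.isEmpty()) {
                    continue;
                }
                // Limit -1 keeps trailing empty fields such as the final ",,".
                int fields = line.split(regex, -1).length;
                if (fields != expectedFields) {
                    problems.add(file.getFileName() + ":" + (i + 1)
                        + " has " + fields + " fields, expected " + expectedFields);
                }
            }
        }
        return problems;
    }
}
```

An empty result suggests that all partitions use the same separator and layout. A file written with a different separator shows up immediately, because each of its lines parses as a single field.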