PGX 1.2.0

Testing PGX Over Large Electric Network Models

PGX is a fast, parallel, in-memory graph analytics framework. This parallelization capability allows it to operate efficiently over large graphs without noticeable loss of performance.

To test the scalability of PGX we can scale up the original electrical network model and repeat the custom BFS traversal over this new graph. This scaling can be done by recreating the original structure of the electrical network graph constructing several unconnected components. For this example we are going to scale the model to a model 100 times larger. The figure below exposes the idea of the BFS algorithm running on the scaled model. Notice how the number of reached nodes does not increase since the BFS is confined to a single component.


Scaled model with 100 components

This whole procedure is done by the following script:

input_file_path  = "electric_graph.edge"
output_file_path = "electric_graph100x.edge"

offset  = 17179900000
size_factor = 100

file_writer = new FileWriter(new File(output_file_path), false)
buff_writer = new BufferedWriter(file_writer)

count = 0
new File(input_file_path).withReader { reader ->
    while ((line = reader.readLine())!=null) {
        line_elements = line.split(" ")
        if (line_elements[1] == "*") {
            for (i = 0; i < size_factor; i ++) {
                line_elements[0] = (line_elements[0].toLong() + offset * i).toString()
                line = line_elements.join(" ")
                buff_writer.write(line + "\n")
        else {
            for (i = 0; i < size_factor; i ++) {
                line_elements[0] = (line_elements[0].toLong() + offset * i).toString()
                line_elements[1] = (line_elements[1].toLong() + offset * i).toString()
                line = line_elements.join(" ")
                buff_writer.write(line + "\n")
        if (count == 1000)
        count ++


println "Graph multiplied by " + size_factor + " and saved to " + output_file_path

Basically the script reads electric_graph.edge and saves 100 nodes for each of the nodes in it. This is done by adding an offset value to each of the node identification numbers for every time the original node is multiplied. The process is repeated for each of the edges in the file as well. The final result is saved to electric_graph100x.edge.

Save the script as multiplyGraph100x.groovy and execute it with the PGX shell. Then, make a copy of the JSON file electric_graph.edge.json and name it electric_graph100x.edge.json. Open this newly created JSON file and edit the uri value to point to electric_graph100x.edge. The first lines of electric_graph100x.edge.json should look like this:

  "uri": "electric_graph100x.edge",
  "format": "edge_list",
  "separator": " ",
  "node_id_type": "long",

Now you have an electrical network graph 100 times larger than the original. This graph is in edge list format so now is time to transform it to a more convenient PGB format. Type the following on the PGX shell to achieve this:

g = session.readGraphWithProperties("electric_graph100x.edge.json"), "electric_graph100x.pgb",  VertexProperty.ALL, EdgeProperty.ALL, true)

Now we can test our getReachablePGQL.groovy script over the new graph. Just open it on a text editor and change the first line from graph_path = "electric_graph100x.edge.json" to graph_path = "electric_graph100x.pgb.json". Save the changes and run the script from the PGX shell. You should get a result similar to the one below.

==>     Graph loading time: 2.7 seconds
==>     BFS elapsed time: 62 millisenconds
==>     PGQL execution time: 264 milliseconds
==>     Reached nodes: 10911
==>     Report saving time: 1.021 seconds
==>     Total vertices: 1091600
==>     Total edges: 2186000

==>     Type exit to quit ...

Notice how the BFS execution time barely doubled despite the fact that the graph in memory was 100 times larger than the original. You can repeat what you learned on this section to perform the same experiment over graphs even larger but for your convenience we have performed it over graphs 12, 100, 150, 500, 1000, 2000 and 10000 times larger. The results can be seen in the next table.

Data Size Vertices Edges Reached nodes Loading time (secs) BFS time (ms) PGQL time (ms)
1x 10916 21860 10911 0.719 30 266
12x 130992 262320 10911 1.258 57 258
100x 1091600 2186000 10911 2.701 62 264
500x 5458000 10930000 10911 7.731 68 265
1000x 10916000 21860000 10911 21.974 59 220
2000x 21832000 43720000 10911 41.642 75 318
10000x * 109160000 218600000 10911 37.452 96 323

For the 10000 times larger graph we dropped all the node properties of the graph, except the switch_default property (since that is the one this analysis uses). The BFS column of the table generates the next plot.



  • PGX can be easily used to model complex networks such as those that represent electrical distribution networks.

  • Green-Marl allows the user to write complicated graph routines in a simple elegant way that will take full advantage of the parallel capabilities of the machine where the program is run.

  • PGX can perform operations over very large graphs with high performance.