PGX 1.2.0
Documentation

Retrieving Reachable Nodes from a Source With BFS

Once we have generated the graph for the electrical network model, it is time to perform some simulations on it to study its connectivity. As discussed in the first section of this use case, we will perform a modified version of Breadth First Search (BFS) on the graph starting from a source node to determine which nodes can be reached from such source node.

BFS is an algorithm for traversing a graphs. It starts at some arbitrary node sometimes referred to as source node and explores its neighbor nodes first, then it moves to the next level of neighbors and repeats the process. However, we are interested in modifying BFS to obey the additional restriction of not trespassing those nodes that represent open switches. To achieve this we wrote a custom Green-Marl program. Green-Marl is a domain specific language specially tailored to write graph related programs. It naturally takes advantage of the available parallel execution capabilities of PGX's architecture. The program below is the implementation of the modified BFS traversal.

/*
* Copyright (C) 2013 - 2015 Oracle and/or its affiliates. All rights reserved.
*/
/*
 * Performs a Breadth First Search (BFS) over a graph starting from a source node s,
 * marking each node with its respective BFS level or -1 if the node is not reachable.
 */

procedure bfs_reachable(G: graph, s: node, switch_default: nodeProp<bool>; bfs_level: nodeProp<int>, bfs_parent: nodeProp<node>)
{
    G.bfs_level = -1;

    inBFS(v: G.nodes from s)[v.switch_default==true]
    {
        v.bfs_level = currentBFSLevel();
        v.bfs_parent = v.BFSParentNode();
    }
}

The steps executed by the Green-Marl program above are are now explained:

  • The program receives as input arguments a graph G, a source node s, and boolean node property switch_default. It also receives an integer node property bfs_parent, and node typed node property bfs_parent which are interpreted as output reference arguments.

  • The program starts by setting the bfs_level property to -1 for each of the nodes in the graph G.

  • Then, the instruction inBFS(v: G.nodes from s)[v.switch_default==true] states that a BFS traversal will be performed from the source node s but only navigating through nodes v which property switch_default is set to true.

  • Finally, for each node v that is part of the traversal the properties bfs_level and bfs_parent are set by the functions currentBFSLevel() and BFSParentNode() respectively.

To do the exercise, copy the Green-Marl program to a text editor and save it with the name bfs_reachable.gm to the directory where the other files of this use case have been saved. This program can be compiled and used through the PGX shell, which will be explained next. The PGX shell script below performs the following steps:

  • Define the path to the network graph file as well as the intended path for the CSV report.

  • Load the electric network model graph and undirected it.

  • Create two properties, bfs_parent and bfs_level, that will serve as containers for the output of the custom BFS Green-Marl program defined before.

  • Select a source node s from the graph. In this example we are starting from the node numbered 21474836490.

  • Compile the program stored in bfs_reachable.gm to an object named gm_bfs. Then execute the traversal and save the execution summary to a dictionary named execution_summary_dic.

  • Once the traversal has been executed over the graph, those reached nodes should have their bfs_level property set to a number equal or greater than 0. To retrieve the traversed nodes the PGS script will perform the following PGQL query:

    • select n, n.bfs_level, n.bfs_parent from node n where n.bfs_level >= 0 orderby n.bfs_level
  • Iterate over the results of the PGQL query and save them to the CSV report.

  • Print the measured times for each of the stages.

// Input and output config
graph_path  = "electric_graph.pgb.json"
output_path = "reachable_subgraph.csv"

// Load electric network graph in to PGX and create useful properties
startTime = System.currentTimeMillis()
g = session.readGraphWithProperties(graph_path)
g = g.undirect()
loadingTime = System.currentTimeMillis() - startTime

g.createVertexProperty(PropertyType.VERTEX, "bfs_parent")
g.createVertexProperty(PropertyType.INTEGER, "bfs_level")

// Select source vertex
s = g.getVertex(21474836490)

// Compile and execute BFS Green-Marl program
gm_bfs = session.compileProgram("bfs_reachable.gm")
execution_summary_dic = gm_bfs.run(g, s, g.getVertexProperty("switch_default"), g.getVertexProperty("bfs_level"),  g.getVertexProperty("bfs_parent"))

// Query vertices using gmql
startTime = System.currentTimeMillis();
rs = g.queryGmql("SELECT n, n.bfs_level, n.bfs_parent WHERE (n WITH n.bfs_level >= 0) ORDER BY n.bfs_level")
gmqlTime = System.currentTimeMillis() - startTime;

// Build and save a report
startTime = System.currentTimeMillis();
file_writer = new FileWriter(new File(output_path), false)
buff_writer = new BufferedWriter(file_writer)

println "Dumping output results to " + output_path

line = ""
rs.getGmqlResultElements().each {
  line += it.getVarName() + ","
  System.out.println(it.getVarName() + " , " + it.getElementType().name())
}
line += "\n"
buff_writer.write(line)

rs.getResults().each {
    line = it.getVertex("n").getId() + "," + it.getInteger("n.bfs_level") + "," + it.getVertex("n.bfs_parent").getId() + "\n"
    buff_writer.write(line)
}

buff_writer.flush()
buff_writer.close()
reportTime = System.currentTimeMillis();

"\n\n"
"\tGraph loading time: " + loadingTime/1000 + " seconds"
"\tBFS elapsed time: " + execution_summary_dic["executionTimeMs"] + " millisenconds"
"\tPGQL execution time: " + gmqlTime + " milliseconds"
"\tReached nodes: " + rs.getNumResults()
"\tReport saving time: " + (reportTime - startTime)/1000 + " seconds"
"\tTotal vertices: " + g.getNumVertices()
"\tTotal edges: " + g.getNumEdges()
"\n"
"\tType exit to quit ... "
"\n\n"

Now copy the script above to a text editor and save it with the name getReachablePGQL.groovy to the directory where the other files of this use case have been saved. Then run pgx getReachablePGQL.groovy to execute the script. You should see an output similar to what is shown below.

==>     Graph loading time: 0.933 seconds
==>     BFS elapsed time: 30 millisenconds
==>     PGQL execution time: 253 milliseconds
==>     Reached nodes: 10911
==>     Report saving time: 0.992 seconds
==>     Total vertices: 10916
==>     Total edges: 21860
==>

==>     Type exit to quit ...

The execution of the script produced a file named reachable_subgraph.csv, it encodes the traversed route from the source to each of the reached nodes in a tabular format. This file can be opened with any spreadsheet program. The first 10 rows of the resulting table are shown below.

n n.bfs_level n.bfs_parent
21474836490 0 -1
6001 1 21474836490
38654705676 2 6001
38654705675 2 6001
38654705677 2 6001
6002 3 38654705676
30064771113 4 6002
6003 5 30064771113
8589934822 6 6003

As the results show, our modified BFS algorithm traversed 10911 nodes, we can modify the configuration of some switch nodes to change the result. Let s1 and s2 be the switch nodes corresponding to the nodes with id numbers 8589937331 and 8589937325 respectively. Both of this nodes have their switch_default property set to true, by changing it to false they will not allow the flow of energy through them in the simulation. Add the following lines getReachablePGQL.groovy just beneath the source selection instruction s = g.getVertex(21474836490) in line 15 and save the changes.

sw1 = g.getVertex(8589937331)
sw2 = g.getVertex(8589937325)

switch_default = g.getVertexProperty("switch_default")
switch_default.set(sw1, false)
switch_default.set(sw2, false)

To finish the use case, run again the getReachablePGQL.groovy script and look at the results below. As you may notice the restriction made to the switch nodes sw1 and sw2 prevented the BFS traversal to trespass them, thus reducing the number of nodes reachable from the source.

==>     Graph loading time: 0.921 seconds
==>     BFS elapsed time: 31 millisenconds
==>     PGQL execution time: 251 milliseconds
==>     Reached nodes: 10721
==>     Report saving time: 0.971 seconds
==>     Total vertices: 10916
==>     Total edges: 21860
==>

==>     Type exit to quit ...

Delete the changes made to getReachablePGQL.groovy and proceed to the next section.