PGX 1.2.0
Documentation

Building a graph from the Dataset

Before we use the PGX shell to process the datasets, we must define how the resulting graph is structured. For this purpose, PGX relies on a graph definition JSON file. This file describes the properties of the nodes and edges of the graph, and it also defines the format of the graph and the source of its data. For more information, read the graph configuration reference.

The JSON configuration file below defines the graph for the model explained earlier. Several things are worth noticing in it:

  • First, notice that every node declares the properties of all three kinds of nodes (devices, connections and switches), regardless of its own kind. This is because PGX requires all of the nodes in a graph to have the same properties.

  • Second, notice that the JSON file specifies edge_list as the file format. The edge list format is a plain text format used by PGX to represent graphs; read about plain text formats to better understand the inner details of edge lists.

  • Finally, notice the property named "node_class". This property distinguishes between devices, connections and switches by assigning them the values 0, 1 and 2, respectively.

{
  "uri": "electric_graph.edge",
  "format": "edge_list",
  "separator": " ",
  "node_id_type": "long",
  "node_props":[
    {"name":"node_nick",       "type":"string"},
    {"name":"node_lat",        "type":"double"},
    {"name":"node_lon",        "type":"double"},
    {"name":"node_parent",     "type":"long"},
    {"name":"node_volts",      "type":"double"},
    {"name":"node_curr",       "type":"double"},
    {"name":"node_power",      "type":"double"},
    {"name":"node_conf",       "type":"string"},
    {"name":"node_remote",     "type":"boolean"},
    {"name":"node_segment",    "type":"string"},
    {"name":"node_upstream",   "type":"string"},
    {"name":"node_downstream", "type":"string"},
    {"name":"node_phase",      "type":"string"},
    {"name":"node_nominal",    "type":"long"},
    {"name":"switch_default",  "type":"boolean"},
    {"name":"connection_id",   "type":"long"},
    {"name":"node_class",      "type":"integer"}
    ],
  "edge_props":[]
}

To follow the exercise, copy and paste the JSON code into a text editor and save it as electric_graph.edge.json.
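To make the edge_list layout more concrete, here is a minimal Java sketch of how one vertex line and one edge line can be composed, following the same pattern the script below uses: a vertex line is the node id, a `*` marker and the property values separated by spaces, while an edge line (this graph has no edge properties) is just a source id and a destination id. The class and method names (EdgeListLines, vertexLine, edgeLine, quote) are hypothetical, and the example vertex line carries only four illustrative values instead of the seventeen properties declared in electric_graph.edge.json.

```java
import java.util.List;

/**
 * Hypothetical helper mirroring how buildGraph.groovy composes edge_list lines:
 * vertex lines are "<id> * <prop1> ... <propN>", edge lines are "<source> <destination>".
 */
public class EdgeListLines {

    // Matches "separator": " " in electric_graph.edge.json
    static final String SEP = " ";

    // Wraps a string property in double quotes, as the script does for
    // node_nick, node_conf, node_segment and the other string fields.
    static String quote(String value) {
        return "\"" + value + "\"";
    }

    // Vertex line: id, the "*" marker, then the property values in the
    // same order as the "node_props" array of the JSON file.
    static String vertexLine(long id, List<String> props) {
        return id + SEP + "*" + SEP + String.join(SEP, props);
    }

    // Edge line without edge properties: only source and destination ids.
    static String edgeLine(long source, long destination) {
        return source + SEP + destination;
    }

    public static void main(String[] args) {
        // Illustrative values only (not from the real dataset, and not the full 17 properties).
        List<String> props = List.of(quote("n1"), "200", "200", "-1");
        System.out.println(vertexLine(42, props)); // 42 * "n1" 200 200 -1
        System.out.println(edgeLine(7, 42));       // 7 42
    }
}
```

Because every vertex line must list the values in the exact order of "node_props", keeping that order in one place (here, the caller's list) is what keeps the file and the JSON configuration consistent.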

Now let's proceed to interpret the data contained in the datasets. The script below reads and processes the CSV files of the datasets and saves the processed information to a file named electric_graph.edge.

/**
 * Copyright (C) 2013 - 2015 Oracle and/or its affiliates. All rights reserved.
 */

// This script generates a graph in EDGE_LIST format from the datasets in:
//    NodeData-8500.csv
//    ConnectData-8500.csv 
//    SwitchConfig-8500.csv
//
// The resulting graph will be stored in:
//    electric_graph.edge

// Get dataset paths
nodes_file_path         = "dataset/NodeData-8500.csv"
connections_file_path   = "dataset/ConnectData-8500.csv"
switch_config_file_path = "dataset/SwitchConfig-8500.csv"

// Configure path to save the generated graph
output_file_path = "electric_graph.edge"

// NodeData-8500.csv related property indexes
idx_node_id          = 0  // "Node ID"
idx_node_nick        = 1  // "Nickname"
idx_node_lat         = 2  // "Latitude"
idx_node_lon         = 3  // "Longitude"
idx_node_parent      = 4  // "Node Parent"
idx_node_volts       = 5  // "Base Volts" 
idx_node_curr        = 6  // "Base Current" 
idx_node_power       = 7  // "Base Power" 
idx_node_conf        = 8  // "Configuration"
idx_node_remote      = 9  // "RemoteControlAvailable" 
idx_node_segment     = 10 // "SegmentId" 
idx_node_upstream    = 11 // "Upstream Connection" 
idx_node_downstream  = 12 // "Downstream Connection" 
idx_node_phase       = 13 // "Phase" 
idx_node_nominal     = 14 // "Nominal Feeder"

// ConnectData-8500.csv related property indexes
idx_connection_id    = 0 // "Connect ID"

// SwitchConfig-8500.csv related property indexes
idx_switch_conf      = 0 // "Configuration"
idx_switch_default   = 1 // "Normal Position"

// Define a buffered file writer to save the graph to a file
file_writer = new FileWriter(new File(output_file_path), false)
buff_writer = new BufferedWriter(file_writer)

// Build a switch configuration HashMap called switch_map. This HashMap helps
// to set the correct value for every node's "switch_default" property 
// in accordance to the value of its "node_conf" property. So, if a node
// has a "node_conf" property set as a switch, switch_map will aid in
// providing its default configuration.
first_line = true
switch_map = new HashMap<String, Boolean> () // keeps track of default switch configurations
new File(switch_config_file_path).splitEachLine(',') { row ->
    if (first_line) {
        first_line = false
    }
    else {
        switch_conf     = row[idx_switch_conf]
        switch_default  = row[idx_switch_default]
        if (switch_default == '"open"')
        {
            switch_default = false // Deny pass
        }
        else
        {
            switch_default = true // Allow pass
        }
        switch_map.put(switch_conf, switch_default)
    }
}

// Add "device" nodes to the graph file
first_line = true
new File(nodes_file_path).splitEachLine('","') { row ->
    if (first_line) {
        first_line = false
    }
    else {
        // Extract node properties from each row
        node_id         = row[idx_node_id]
        node_nick       = row[idx_node_nick]
        node_lat        = row[idx_node_lat]
        node_lon        = row[idx_node_lon]
        node_parent     = row[idx_node_parent]
        node_volts      = row[idx_node_volts]
        node_curr       = row[idx_node_curr]
        node_power      = row[idx_node_power]
        node_conf       = row[idx_node_conf]
        node_remote     = row[idx_node_remote]
        node_segment    = row[idx_node_segment]
        node_upstream   = row[idx_node_upstream]
        node_downstream = row[idx_node_downstream]
        node_phase      = row[idx_node_phase]
        node_nominal    = row[idx_node_nominal]
        switch_default  = true // Assume nodes allow pass; this is changed
                               // later for switches that deny pass by default

        connection_id   = 0    // Since the node is not a connection node this
                               // property is set to 0

        node_class      = 0    // Node class: 0 device, 1 connection, 2 switch

        // Remove spurious quotes
        node_id      = node_id.replaceAll('"', '')
        node_nominal = node_nominal.replaceAll('"', '')

        // Make sure string properties are between quotes
        node_nick       = '"' + node_nick + '"'
        node_conf       = '"' + node_conf + '"'
        node_segment    = '"' + node_segment + '"'
        node_upstream   = '"' + node_upstream + '"'
        node_downstream = '"' + node_downstream + '"'
        node_phase      = '"' + node_phase + '"'

        // If no geographical information is defined then define default impossible coordinates
        if (node_lat == "" || node_lon == "") {
            node_lat =  "200"
            node_lon =  "200"
        }

        // If the "node_parent" property is not set, then set it to -1
        if (node_parent == "") {
            node_parent = "-1"
        }

        // If the "node_nominal" property for a node is not defined, then set it to -1
        if (node_nominal == "") {
            node_nominal = "-1"
        }

        // If the node's "node_conf" property corresponds to a switch, then it should
        // be defined in the switch_map HashMap. If the switch_map maps the value of 
        // "node_conf" to false, then set the "switch_default" property to false.
        if (switch_map.get(node_conf) == false) {
            switch_default  = false
        }

        // If the node is a switch, then set its "node_class" property to 2
        if (switch_map.get(node_conf) != null) {
            node_class = 2
        }

        // Compose the line that will be written to the graph file
        line = node_id          + " * " \
             + node_nick        + " " \
             + node_lat         + " " \
             + node_lon         + " " \
             + node_parent      + " " \
             + node_volts       + " " \
             + node_curr        + " " \
             + node_power       + " " \
             + node_conf        + " " \
             + node_remote      + " " \
             + node_segment     + " " \
             + node_upstream    + " " \
             + node_downstream  + " " \
             + node_phase       + " " \
             + node_nominal     + " " \
             + switch_default   + " " \
             + connection_id    + " " \
             + node_class       + "\n"

        // Write line to file
        buff_writer.write(line)
    }
}
// Flush the buffer of the buffered file writer
buff_writer.flush()


// Add "connection" nodes to the graph file
first_line = true
new File(connections_file_path).splitEachLine(','){ row ->
    if (first_line) {
        first_line = false
    }
    else {
        // Extract node properties from each row.
        // Connection nodes have all of their properties set to default
        // values except for "node_id", "connection_id" and "node_class".
        node_id         = row[idx_connection_id]
        node_nick       = '"null"'
        node_lat        = 200
        node_lon        = 200
        node_parent     = -1
        node_volts      = -1
        node_curr       = -1
        node_power      = -1
        node_conf       = '"null"'
        node_remote     = false
        node_segment    = '"null"'
        node_upstream   = '"null"'
        node_downstream = '"null"'
        node_phase      = '"null"'
        node_nominal    = -1
        switch_default  = true // True == allow pass, False == deny pass
        connection_id   = row[idx_connection_id]
        node_class      = 1  // Node class: 0 device, 1 connection, 2 switch

        // Remove spurious quotes
        node_id       = node_id.replaceAll('"', '')
        connection_id = connection_id.replaceAll('"', '')

        // Compose the line that will be written to the graph file
        line = node_id          + " * " \
             + node_nick        + " " \
             + node_lat         + " " \
             + node_lon         + " " \
             + node_parent      + " " \
             + node_volts       + " " \
             + node_curr        + " " \
             + node_power       + " " \
             + node_conf        + " " \
             + node_remote      + " " \
             + node_segment     + " " \
             + node_upstream    + " " \
             + node_downstream  + " " \
             + node_phase       + " " \
             + node_nominal     + " " \
             + switch_default   + " " \
             + connection_id    + " " \
             + node_class       + "\n"

        // Write line to file
        buff_writer.write(line)
    }
}
// Flush the buffer of the buffered file writer
buff_writer.flush()


// Add "connection" edges to the graph file
first_line = true
new File(connections_file_path).splitEachLine(',') { row ->
    if (first_line) {
        first_line = false
    }
    else {
        edge_source = row[idx_connection_id].replaceAll('"', '')

        // Iterate over nodes listed for each connection row
        // Remember that each connection node is linked with 
        // at most 8 devices.
        for (i = 1; i < 9; i ++) {
            if (row[i] != '""') {
                edge_sink = row[i].replaceAll('"', '')

                // Compose line to be written to file
                line = edge_source + " " \
                     + edge_sink   + "\n"

                // Write line to file
                buff_writer.write(line)
            }
        }
    }
}
// Flush the buffer of the buffered file writer and close it
buff_writer.flush()
buff_writer.close()

println "Type exit to quit"
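The switch_map logic in the script, together with the way it decides the "switch_default" and "node_class" values of device nodes, can be sketched in plain Java as follows. This is an illustrative translation, not part of PGX: the class and method names are hypothetical, the rows are assumed to be already split on commas with their surrounding double quotes kept (the script compares against '"open"', so the quotes are part of the values), and the sample switch names and the "closed" position are made up.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical Java rendering of the switch_map steps of buildGraph.groovy. */
public class SwitchClassifier {

    // node_class encoding used by the script
    static final int DEVICE = 0;
    static final int CONNECTION = 1;
    static final int SWITCH = 2;

    // rows: SwitchConfig-8500.csv lines split on ','; row 0 is the header.
    static Map<String, Boolean> buildSwitchMap(List<String[]> rows) {
        Map<String, Boolean> switchMap = new HashMap<>();
        for (int i = 1; i < rows.size(); i++) {            // skip header row
            String[] row = rows.get(i);
            String conf = row[0];                          // "Configuration"
            boolean allowPass = !row[1].equals("\"open\""); // "Normal Position"
            switchMap.put(conf, allowPass);
        }
        return switchMap;
    }

    // A device node is a switch if its quoted node_conf is a key of the map.
    static int nodeClass(Map<String, Boolean> switchMap, String quotedConf) {
        return switchMap.containsKey(quotedConf) ? SWITCH : DEVICE;
    }

    // Nodes allow pass by default; only switches mapped to false deny it.
    static boolean switchDefault(Map<String, Boolean> switchMap, String quotedConf) {
        return !Boolean.FALSE.equals(switchMap.get(quotedConf));
    }

    public static void main(String[] args) {
        // Made-up rows for illustration; not taken from the real dataset.
        List<String[]> rows = List.of(
            new String[] {"\"Configuration\"", "\"Normal Position\""},
            new String[] {"\"SW_A\"", "\"open\""},
            new String[] {"\"SW_B\"", "\"closed\""});
        Map<String, Boolean> map = buildSwitchMap(rows);
        System.out.println(nodeClass(map, "\"SW_A\""));     // 2
        System.out.println(switchDefault(map, "\"SW_A\"")); // false
        System.out.println(nodeClass(map, "\"LOAD\""));     // 0
        System.out.println(switchDefault(map, "\"LOAD\"")); // true
    }
}
```

A missing key and a key mapped to true both yield a "switch_default" of true, which matches the script: it only changes the default when switch_map returns false.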

To run the script, copy its code into a text editor and save it as buildGraph.groovy in the same location where electric_graph.edge.json is stored. Then, in the same folder, create a subdirectory named dataset (the name used by the paths in the script) and move the CSV files into it. Finally, run the command:

pgx buildGraph.groovy

and wait for it to finish. Then type exit to close the PGX shell. The directory should now contain a new file named electric_graph.edge, which holds the nodes and edges that make up the electric network graph.
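Before converting the file, a quick sanity check can confirm that both vertex lines and edge lines were written. The hypothetical Java sketch below relies only on the layout produced by buildGraph.groovy: on vertex lines the second space-separated token is `*`, while edge lines contain just a source id and a sink id. Since the script writes one vertex line per data row, the vertex count should match the number of data rows (headers excluded) in NodeData-8500.csv plus ConnectData-8500.csv.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

/** Hypothetical sanity check for the generated electric_graph.edge file. */
public class EdgeFileStats {

    // Returns {vertexLines, edgeLines}, ignoring blank lines.
    static long[] count(List<String> lines) {
        long vertices = 0, edges = 0;
        for (String line : lines) {
            if (line.isBlank()) continue;
            // Limit 3: id, marker-or-sink, rest (quoted strings may contain spaces).
            String[] tokens = line.split(" ", 3);
            if (tokens.length > 1 && tokens[1].equals("*")) {
                vertices++;
            } else {
                edges++;
            }
        }
        return new long[] {vertices, edges};
    }

    public static void main(String[] args) throws IOException {
        Path path = Paths.get(args.length > 0 ? args[0] : "electric_graph.edge");
        // ISO-8859-1 decodes any byte sequence, and only ASCII tokens are inspected.
        long[] c = count(Files.readAllLines(path, StandardCharsets.ISO_8859_1));
        System.out.println("vertex lines: " + c[0] + ", edge lines: " + c[1]);
    }
}
```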

Even though the edge_list format is good for many graph applications, loading a graph encoded in this format can be a little slow. Luckily, PGX supports the PGB binary format, which is faster to load and in many cases also produces smaller files. To transform the newly created network graph from edge_list to PGB, type the following in the interactive PGX shell:

g = session.readGraphWithProperties("electric_graph.edge.json")
g.store(Format.PGB, "electric_graph.pgb",  VertexProperty.ALL, EdgeProperty.ALL, true)

This code reads the original edge list graph defined in electric_graph.edge.json and saves it in the PGB format, creating two new files: electric_graph.pgb and electric_graph.pgb.json. At this point, the directory structure of the example should look like this:

.
|__ dataset
|   |__ ConnectData-8500.csv
|   |__ NodeData-8500.csv
|   |__ SwitchConfig-8500.csv
|__ buildGraph.groovy
|__ electric_graph.edge
|__ electric_graph.edge.json
|__ electric_graph.pgb
|__ electric_graph.pgb.json
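Because buildGraph.groovy and electric_graph.edge.json refer to their inputs through relative paths, running either step from a different working directory, or naming the subdirectory differently, leads to missing-file errors. The following hypothetical Java sketch checks the input side of the layout shown above; the class name and the selection of files to check are illustrative assumptions.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical pre-flight check for the example's directory layout. */
public class LayoutCheck {

    // Relative paths used by buildGraph.groovy and by the PGB conversion step.
    static final List<String> REQUIRED = List.of(
        "dataset/NodeData-8500.csv",
        "dataset/ConnectData-8500.csv",
        "dataset/SwitchConfig-8500.csv",
        "electric_graph.edge.json");

    // Returns the required files that are not present under baseDir.
    static List<String> missing(Path baseDir) {
        List<String> result = new ArrayList<>();
        for (String rel : REQUIRED) {
            if (!Files.isRegularFile(baseDir.resolve(rel))) {
                result.add(rel);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Path base = Paths.get(args.length > 0 ? args[0] : ".");
        List<String> gaps = missing(base);
        System.out.println(gaps.isEmpty() ? "layout OK" : "missing: " + gaps);
    }
}
```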