PGX 20.1.1
Documentation
NB: This is an enterprise feature and it is only available in PGX versions embedded in supported Oracle products.

Load and Store Data from and to HDFS

PGX supports the Hadoop Distributed File System (HDFS). This tutorial shows how to load graph data from HDFS and store it back into HDFS using the PGX APIs. PGX Hadoop support was designed to work with any Cloudera CDH 5.4.x-compatible Hadoop cluster.

Conceptually, you have to:

  • Install the Cloudera Hadoop client.
  • Make sure $HADOOP_CONF_DIR is on the classpath, so that your Hadoop configuration is found.
  • Use hdfs: as the path prefix in the uri field of the graph configuration when referring to files located in HDFS.
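The classpath requirement in the list above can be checked before starting an application. The following is a minimal sketch using only the Java standard library. The class HadoopClasspathCheck and its methods are hypothetical helpers for illustration and are not part of PGX or Hadoop.

```java
import java.io.File;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of PGX or Hadoop.
final class HadoopClasspathCheck {

  // Converts a classpath entry to a normalized absolute path,
  // or returns null if the entry is not a valid path on this platform.
  static Path normalize(String entry) {
    try {
      return Paths.get(entry).toAbsolutePath().normalize();
    } catch (InvalidPathException e) {
      return null;
    }
  }

  // Returns true if confDir is one of the entries of the given classpath string.
  static boolean isOnClasspath(String confDir, String classpath, String separator) {
    if (confDir == null || confDir.isEmpty() || classpath == null) {
      return false;
    }
    Path wanted = normalize(confDir);
    if (wanted == null) {
      return false;
    }
    for (String entry : classpath.split(Pattern.quote(separator))) {
      if (!entry.isEmpty() && wanted.equals(normalize(entry))) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    String confDir = System.getenv("HADOOP_CONF_DIR");
    String classpath = System.getProperty("java.class.path");
    if (isOnClasspath(confDir, classpath, File.pathSeparator)) {
      System.out.println("HADOOP_CONF_DIR is on the classpath: " + confDir);
    } else {
      System.out.println("HADOOP_CONF_DIR is not set or is not on the classpath");
    }
  }
}
```

This check only confirms that the directory is listed on the classpath. It does not validate the Hadoop configuration files inside it.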

Graph configuration files are parsed client-side

In PGX client/server mode, Hadoop must also be available on the client side (with HADOOP_CONF_DIR set) if the graph configuration files, and not only the graph data, are located in HDFS. This is because configuration files are parsed on the client side before they are sent to the server.
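The rule above can be expressed as a small decision helper. This sketch assumes that a plain check for the hdfs: prefix is enough to tell HDFS paths from local ones, which is a simplification and may not match how PGX resolves paths internally. The class HdfsPathCheck and its methods are hypothetical names and are not part of the PGX API.

```java
// Hypothetical helper for illustration; not part of the PGX API.
final class HdfsPathCheck {

  // The prefix this tutorial uses for files located in HDFS.
  static final String HDFS_PREFIX = "hdfs:";

  // Returns true if the path starts with the hdfs: prefix.
  static boolean isHdfsPath(String path) {
    return path != null && path.startsWith(HDFS_PREFIX);
  }

  // In client/server mode the client parses the graph configuration file itself,
  // so the client needs Hadoop whenever that file is located in HDFS.
  static boolean clientNeedsHadoop(String graphConfigPath) {
    return isHdfsPath(graphConfigPath);
  }

  public static void main(String[] args) {
    String[] configPaths = {
      "hdfs:/user/pgx/connections.edge_list.json",
      "examples/graphs/connections.edge_list.json"
    };
    for (String path : configPaths) {
      System.out.println(path + " -> client needs Hadoop: " + clientNeedsHadoop(path));
    }
  }
}
```

If only the graph data is in HDFS and the configuration file is local, the helper reports that the client does not need Hadoop, since the server reads the data.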

Load Data from HDFS

Assume that the connections.edge_list graph data and its configuration file from the "load a graph" tutorial should be stored in HDFS instead of the local file system. First, copy the graph data into HDFS:

cd $PGX_HOME
hadoop fs -mkdir -p /user/pgx
hadoop fs -copyFromLocal examples/graphs/connections.edge_list /user/pgx

Next, edit the uri field of the sample graph configuration file to point to the newly created HDFS resource:

{
  "uri": "hdfs:/user/pgx/connections.edge_list",
  "format": "adj_list",
  "vertex_props": [{
    "name": "prop",
    "type": "integer"
  }],
  "edge_props": [{
    "name": "cost",
    "type": "double"
  }],
  "separator": " "
}

Copy the configuration file into HDFS as well:

cd $PGX_HOME
hadoop fs -copyFromLocal examples/graphs/connections.edge_list.json /user/pgx

To load the sample graph from HDFS into PGX, run the following in the PGX shell:

var g = session.readGraphWithProperties("hdfs:/user/pgx/connections.edge_list.json")

Or, using the Java API:

import oracle.pgx.api.*;
...
PgxGraph g = session.readGraphWithProperties("hdfs:/user/pgx/connections.edge_list.json");

Store Data into HDFS

Let's store our loaded sample graph back into HDFS in PGB format.

In the PGX shell:

var config = g.store(Format.PGB, "hdfs:/user/pgx/connections.pgb")

Using the Java API:

import oracle.pgx.api.*;
import oracle.pgx.config.*;

GraphConfig pgbGraphConfig = g.store(Format.PGB, "hdfs:/user/pgx/connections.pgb");

Verify that the PGB file was created:

hadoop fs -ls /user/pgx
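The same verification can be done from Java by invoking the Hadoop command-line client. The sketch below assumes the hadoop executable is on the PATH and uses hadoop fs -test -e, which exits with status 0 when the path exists. The class HdfsExistsCheck and its methods are hypothetical helpers and are not part of PGX.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of PGX or Hadoop.
final class HdfsExistsCheck {

  // Builds the command line: hadoop fs -test -e <path>
  static List<String> existsCommand(String hdfsPath) {
    return Arrays.asList("hadoop", "fs", "-test", "-e", hdfsPath);
  }

  // Runs the command and returns true if it exits with status 0.
  static boolean exists(String hdfsPath) throws IOException, InterruptedException {
    Process process = new ProcessBuilder(existsCommand(hdfsPath)).inheritIO().start();
    return process.waitFor() == 0;
  }

  public static void main(String[] args) throws InterruptedException {
    String path = "/user/pgx/connections.pgb";
    try {
      System.out.println(path + " exists: " + exists(path));
    } catch (IOException e) {
      // Thrown when the hadoop executable cannot be started, for example if it is not installed.
      System.out.println("Could not run the hadoop command: " + e.getMessage());
    }
  }
}
```

Starting an external process keeps the sketch free of Hadoop library dependencies. An application that already has the Hadoop client libraries on its classpath may prefer to query the file system through them instead.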

Compile Green-Marl Code Stored in HDFS

PGX supports compilation of Green-Marl code stored in HDFS. Example:

In the PGX shell:

var p = session.compileProgram("hdfs:/user/pgx/max_degree.gm")

Using the Java API:

import oracle.pgx.api.*;

CompiledProgram p = session.compileProgram("hdfs:/user/pgx/max_degree.gm");

As with graph configuration files, the Green-Marl code is read from HDFS on the client side when running in client/server mode.

Compile and Run as Java Application

Here is the full Java class of the above examples:

import oracle.pgx.api.CompiledProgram;
import oracle.pgx.api.Pgx;
import oracle.pgx.api.PgxGraph;
import oracle.pgx.api.PgxSession;
import oracle.pgx.config.Format;
import oracle.pgx.config.GraphConfig;

public class HdfsExample {

  public static void main(String[] mainArgs) throws Exception {
    PgxSession session = Pgx.createSession("my-session");
    PgxGraph g1 = session.readGraphWithProperties("hdfs:/user/pgx/connections.edge_list.json");

    GraphConfig pgbConfig = g1.store(Format.PGB, "hdfs:/user/pgx/sample.pgb");
    PgxGraph g2 = session.readGraphWithProperties(pgbConfig);
    System.out.println("g1 N = " + g1.getNumVertices() + " E = " + g1.getNumEdges());
    System.out.println("g2 N = " + g2.getNumVertices() + " E = " + g2.getNumEdges());

    CompiledProgram p = session.compileProgram("hdfs:/user/pgx/max_degree.gm");
    System.out.println("compiled " + p.getName());
  }
}

To compile the above class, run:

cd $PGX_HOME
mkdir classes
javac -cp "lib/common/*:lib/embedded/*:third-party/*" examples/java/HdfsExample.java -d classes

To run it, use:

java -cp "lib/common/*:lib/embedded/*:third-party/*:classes:conf:$HADOOP_CONF_DIR" HdfsExample