PGX 20.1.1
Documentation

Built-In Algorithms

PGX includes a wide selection of optimized graph algorithms that can be invoked through the Analyst. The following table provides an overview of the available algorithms, grouped by category.

Category Algorithms
Classic graph algorithms Prim's Algorithm
Community detection Conductance Minimization (Soman and Narang Algorithm), Infomap, Label Propagation, Louvain
Connected components Strongly Connected Components, Weakly Connected Components (WCC)
Link predition WTF (Whom To Follow) Algorithm
Matrix factorization Matrix Factorization
Other Graph Traversal Algorithms
Path finding All Vertices and Edges on Filtered Path, Bellman-Ford Algorithms, Bidirectional Dijkstra Algorithms, Compute Distance Index, Compute High-Degree Vertices, Dijkstra Algorithms, Enumerate Simple Paths, Fast Path Finding, Fattest Path, Filtered Fast Path Finding, Hop Distance Algorithms
Ranking and walking Closeness Centrality Algorithms, Degree Centrality Algorithms, Eigenvector Centrality, Hyperlink-Induced Topic Search (HITS), PageRank Algorithms, Random Walk with Restart, Stochastic Approach for Link-Structure Analysis (SALSA) Algorithms, Vertex Betweenness Centrality Algorithms
Structure evaluation Adamic-Adar index, Bipartite Check, Conductance, Cycle Detection Algorithms, Degree Distribution Algorithms, Eccentricity Algorithms, K-Core, Local Clustering Coefficient (LCC), Modularity, Partition Conductance, Reachability Algorithms, Topological Ordering Algorithms, Triangle Counting Algorithms

The rest of this page illustrates how to execute these built-in algorithms.

First, we describe the design of the Analyst API and explain how a Green-Marl signature can be mapped to the corresponding method in the Analyst API. Second, we show how to run two of the built-in algorithms: Triangle Counting and PageRank.

Analyst API Design

All built-in algorithms are written in Green-Marl, a domain-specific language for graph algorithms. Users can write their own custom graph algorithms using the Java-based PGX Algorithm API or the Green-Marl graph DSL. The oracle.pgx.api.Analyst class provides convenience methods for invoking a set of builtin algorithms that we already include with PGX. For each builtin algorithm, we provide both the Java (i.e., PGX Algorithm) and the Green-Marl source code.

The difference between the signature of an algorighm written in Green-Marl (or its Java frontend) and the corresponding method signature in the Analyst class is that Green-Marl uses output arguments, whereas the Analyst method use the return value. For example, the following snippet shows the signature of the PageRank procedure in the Java-based PGX Algorithm API and in the Green-Marl DSL.

public class Pagerank {
  public void pagerank(PgxGraph g, double tol, double damp, int maxIter, boolean norm,
      @Out VertexProperty<Double> rank) {
         ...
  }
}
procedure pagerank(G: graph, e,d: double, max: int; rank: nodeProp<double>)

The next snippet shows the corresponding method found in the Analyst class (notice the return value).

VertexProperty<ID, Double> pagerank(PgxGraph graph, double e, double d, int max);

Asynchronous Methods

PGX also provides asynchronous versions of every built-in algorithm which immediately return a Future object. These can later be used to retrieve the algorithm result or to cancel the running request. The asynchronous version of PageRank looks like this:

PgxFuture<VertexProperty<ID, Double>> pagerankAsync(PgxGraph graph, double e, double d, int max);

All asynchronous PGX methods are suffixed Async by convention and return PgxFuture, a JDK7-compatible version of Java 8's CompletableFuture which allows you to chain various asynchronous requests together in a non-blocking fashion. Read more about this technique here.

Read more about PGX APIs here.

Read The Graph

First, create a session:

cd $PGX_HOME
./bin/pgx-jshell
// starting the shell will create an implicit session
import oracle.pgx.api.*;

...

PgxSession session = Pgx.createSession("my-session");

let p = pgx.connect(url, options);

Next, load a graph into memory:

pgx> G = session.readGraphWithProperties("examples/graphs/connections.edge_list.json")
PgxGraph G = session.readGraphWithProperties("examples/graphs/connections.edge_list.json");

Run Triangle Counting

Check out the Green-Marl code of triangle counting:

/*
 * Copyright (C) 2013 - 2020 Oracle and/or its affiliates. All rights reserved.
 */

procedure triangle_counting(graph G) : long {
  long t = 0;

  foreach (u: G.nodes) {
    foreach (v: u.inOutNbrs) (v > u) {
      foreach (w: u.inOutNbrs) (w > v) {
        if (v.hasEdgeTo(w)) {
          t++;
        }
      }
      foreach (w: u.inOutNbrs) (w > v) {
        if (v.hasEdgeFrom(w)) {
          t++;
        }
      }
    }
  }
  return t;
}

Triangle Counting is defined on undirected graphs only. In PGX however, graphs are always directed by default after loaded into memory. That is why the PGX Analyst#countTriangles() algorithm implicitly creates an undirected copy of the loaded graph first by calling PgxGraph#undirect(). The copy is destroyed automatically after the algorithm terminates. Besides the input graph, the Analyst#countTriangles() method takes a second boolean parameter you can use to trade memory for speed. If set to true, PGX will additionally sort the undirected copy by degree which makes the algorithm run faster but increases memory consumption. If set to false, the algorithm might run slower, but less peak memory is consumed during computation.

pgx> analyst.countTriangles(G, true)
===> 1
Analyst analyst = session.createAnalyst();
long triangles = analyst.countTriangles(G, true);

The algorithm found 1 triangle in our sample graph.

Tip: Increase log output

With the PGX Shell, you can easily increase the amount of log output that PGX prints during execution by changing the logging level with the :loglevel command. See details about the logging here.

Run PageRank

Let's take a look at the source code of the Green-Marl implementation of PageRank:

/*
 * Copyright (C) 2013 - 2020 Oracle and/or its affiliates. All rights reserved.
 */

procedure pagerank(graph G, double tol, double damp, int max_iter, bool norm; nodeProp<double> rank) {

  double diff;
  int cnt = 0;
  double N = G.numNodes();

  G.rank = 1 / N;
  do {
    diff = 0.0;
    double dangling_factor = 0;

    if (norm) {
      dangling_factor = damp / N * sum (v: G.nodes) (v.outDegree() == 0) {v.rank};
    }

    foreach (t: G.nodes) {
      double in_sum = sum (w: t.inNbrs) {w.rank / w.outDegree()};
      double val = (1 - damp) / N + damp * in_sum + dangling_factor;
      diff += |val - t.rank|;
      t.rank <= val;
    }
    cnt++;
  } while (diff > tol && cnt < max_iter);

}

PageRank computes a rank value between 0 and 1 for each vertex in the graph and stores them in a property named pg_rank. The algorithm takes on a node property of type double as out-argument.

Transient vs. Persistent properties

In PGX, there are two types of vertex and edge properties: transient and persistent. Properties loaded together with a graph from the data source are fixed in-memory copies of the data on disk, and are hence called persistent. Persistent properties are read-only, immutable and shared between sessions. Values can only be written to transient properties, which are session-private. Transient properties can be created by calling createVertexProperty() (or createEdgeProperty()) on PgxGraph objects or by copying existing properties using clone() on Property objects. Many algorithms provided by the Analyst create transient properties implicitly to hold the computation result. They will get destroyed automatically once the Analyst object gets destroyed. Refer to the chapter on Analyst resource management for details .

Let's call the built-in PageRank algorithm on our loaded graph g, using 0.001 as max error,0.85 as damping factor and a maximum of 100 iterations:

pgx> var rank = analyst.pagerank(G, 0.001, 0.85, 100)
VertexProperty<Integer, Double> rank = analyst.pagerank(G, 0.001, 0.85, 100);

To see the top 3 vertices with the highest PageRank values, do

pgx> rank.getTopKValues(3)
==> PgxVertex with ID 128=0.1402019732468347
==> PgxVertex with ID 333=0.12002296283541904
==> PgxVertex with ID 99=0.09708583862990475
for (Map.Entry<PgxVertex<Integer>, Double> entry : rank.getTopKValues(3)) {
  System.out.println(entry.getKey() + "=" + entry.getValue());
}