PGX 21.1.1

Documentation

Documentation

PGX includes a wide selection of optimized graph algorithms that can be invoked through the Analyst. The following table provides an overview of the available algorithms, grouped by category.

The rest of this page illustrates how to execute these built-in algorithms.

First, we describe the design of the Analyst API and explain how a Green-Marl signature can be mapped to the corresponding method in the Analyst API. Second, we show how to run two of the built-in algorithms: Triangle Counting and PageRank.

All built-in algorithms are written in Green-Marl, a domain-specific language for graph algorithms.
Users can write their own custom graph algorithms using the Java-based PGX Algorithm API or the Green-Marl graph DSL.
The `oracle.pgx.api.Analyst`

class provides convenience methods for invoking a set of builtin algorithms that we already include with PGX.
For each builtin algorithm, we provide both the `Java`

(i.e., PGX Algorithm) and the Green-Marl source code.

The difference between the signature of an algorighm written in Green-Marl (or its Java frontend) and the corresponding method signature in the Analyst class is that Green-Marl uses output arguments, whereas the Analyst method use the return value. For example, the following snippet shows the signature of the PageRank procedure in the Java-based PGX Algorithm API and in the Green-Marl DSL.

public class Pagerank { public void pagerank(PgxGraph g, double tol, double damp, int maxIter, boolean norm, @Out VertexProperty<Double> rank) { ... } }

procedure pagerank(G: graph, e,d: double, max: int; rank: nodeProp<double>)

The next snippet shows the corresponding method found in the Analyst class (notice the return value).

VertexProperty<ID, Double> pagerank(PgxGraph graph, double e, double d, int max);

pagerank(self, graph, tol=0.001, damping=0.85, max_iter=100, norm=False, rank="pagerank")

PGX also provides asynchronous versions of every built-in algorithm which immediately return a Future object. These can later be used to retrieve the algorithm result or to cancel the running request. The asynchronous version of PageRank looks like this:

PgxFuture<VertexProperty<ID, Double>> pagerankAsync(PgxGraph graph, double e, double d, int max);

All asynchronous PGX methods are suffixed *Async* by convention and return `PgxFuture`

, a JDK7-compatible version of Java 8's CompletableFuture which allows you to chain various asynchronous requests together in a non-blocking fashion. Read more about this technique here.

Read more about PGX APIs here.

First, create a session:

cd $PGX_HOME ./bin/pgx-jshell // starting the shell will create an implicit session

import oracle.pgx.api.*; ... PgxSession session = Pgx.createSession("my-session"); let p = pgx.connect(url, options);

session =get_session(session_name="my-session")

Next, load a graph into memory:

pgx> G = session.readGraphWithProperties("examples/graphs/connections.edge_list.json")

PgxGraph G = session.readGraphWithProperties("examples/graphs/connections.edge_list.json");

G = session.read_graph_with_properties("examples/graphs/connections.edge_list.json")

Check out the Green-Marl code of triangle counting:

/* * Copyright (C) 2013 - 2020 Oracle and/or its affiliates. All rights reserved. */ procedure triangle_counting(graph G) : long { long t = 0; foreach (u: G.nodes) { foreach (v: u.inOutNbrs) (v > u) { foreach (w: u.inOutNbrs) (w > v) { if (v.hasEdgeTo(w)) { t++; } } foreach (w: u.inOutNbrs) (w > v) { if (v.hasEdgeFrom(w)) { t++; } } } } return t; }

Triangle Counting is defined on *undirected* graphs only. In PGX however, graphs are always *directed* by default after loaded into memory. That is why the PGX `Analyst#countTriangles()`

algorithm implicitly creates an undirected copy of the loaded graph first by calling `PgxGraph#undirect()`

. The copy is destroyed automatically after the algorithm terminates. Besides the input graph, the `Analyst#countTriangles()`

method takes a second `boolean`

parameter you can use to trade memory for speed. If set to `true`

, PGX will additionally sort the undirected copy by degree which makes the algorithm run faster but increases memory consumption. If set to `false`

, the algorithm might run slower, but less peak memory is consumed during computation.

pgx> analyst.countTriangles(G, true) ===> 1

Analyst analyst = session.createAnalyst(); long triangles = analyst.countTriangles(G, true);

analyst = session.create_analyst() triangles = analyst.count_triangle(G, true)

The algorithm found 1 triangle in our sample graph.

Tip: Increase log output

With the PGX Shell, you can easily increase the amount of log output that PGX prints during execution by changing the logging level with the `:loglevel`

command. See details about the logging here.

Let's take a look at the source code of the Green-Marl implementation of PageRank:

/* * Copyright (C) 2013 - 2020 Oracle and/or its affiliates. All rights reserved. */ procedure pagerank(graph G, double tol, double damp, int max_iter, bool norm; vertexProp<double> rank) { double diff; int cnt = 0; double N = G.numNodes(); G.rank = 1 / N; do { diff = 0.0; double dangling_factor = 0; if (norm) { dangling_factor = damp / N * sum (v: G.nodes) (v.outDegree() == 0) {v.rank}; } foreach (t: G.nodes) { double in_sum = sum (w: t.inNbrs) {w.rank / w.outDegree()}; double val = (1 - damp) / N + damp * in_sum + dangling_factor; diff += |val - t.rank|; t.rank <= val; } cnt++; } while (diff > tol && cnt < max_iter); }

PageRank computes a rank value between `0`

and `1`

for each vertex in the graph and stores them in a property named `pg_rank`

. The algorithm takes on a *node property* of type `double`

as out-argument.

Transient vs. Persistent properties

In PGX, there are two types of vertex and edge properties: *transient* and *persistent*. Properties loaded together with a graph from the data source are *fixed* in-memory copies of the data on disk, and are hence called *persistent*. Persistent properties are *read-only*, *immutable* and *shared between sessions*. Values can only be written to *transient* properties, which are *session-private*. Transient properties can be created by calling `createVertexProperty()`

(or `createEdgeProperty()`

) on `PgxGraph`

objects or by copying existing properties using `clone()`

on `Property`

objects. Many algorithms provided by the `Analyst`

create transient properties implicitly to hold the computation result. They will get destroyed automatically once the `Analyst`

object gets destroyed. Refer to the chapter on Analyst resource management for details .

Let's call the built-in PageRank algorithm on our loaded graph `g`

, using `0.001`

as max error,`0.85`

as damping factor and a maximum of `100`

iterations:

pgx> var rank = analyst.pagerank(G, 0.001, 0.85, 100)

VertexProperty<Integer, Double> rank = analyst.pagerank(G, 0.001, 0.85, 100);

rank = analyst.pagerank(G, tol= 0.001, damping= 0.85, max_iter= 100)

To see the top 3 vertices with the highest PageRank values, do

pgx> rank.getTopKValues(3) ==> PgxVertex with ID 128=0.1402019732468347 ==> PgxVertex with ID 333=0.12002296283541904 ==> PgxVertex with ID 99=0.09708583862990475

for (Map.Entry<PgxVertex<Integer>, Double> entry : rank.getTopKValues(3)) { System.out.println(entry.getKey() + "=" + entry.getValue()); }

for entry in zip(rank.get_top_k_values(3)): print("{} = {}".format(*entry))