16.6 Executing Built-in Algorithms

The graph server (PGX) contains a set of built-in algorithms that are available as Java APIs.

The following table provides an overview of the available algorithms, grouped by category. Note that these algorithms can be invoked through the Analyst Class.

Note:

See the supported Built-In Algorithms on GitHub for more details.

Table 16-11 Overview of Built-In Algorithms

Category Algorithms
Classic graph algorithms Prim's Algorithm
Community detection Conductance Minimization (Soman and Narang Algorithm), Infomap, Label Propagation, Louvain, LouvainDirected, Speaker Listener Label Propagation
Connected components Strongly Connected Components, Weakly Connected Components (WCC)
Link predition WTF (Whom To Follow) Algorithm
Matrix factorization Matrix Factorization
Other Graph Traversal Algorithms
Path finding All Vertices and Edges on Filtered Path, Bellman-Ford Algorithms, Bidirectional Dijkstra Algorithms, Compute Distance Index, Compute High-Degree Vertices, Dijkstra Algorithms, Enumerate Simple Paths, Fast Path Finding, Fattest Path, Fattest Path (ignoring edge directions), Filtered Fast Path Finding, Hop Distance Algorithms
Ranking and walking ArticleRank Algorithms, Closeness Centrality Algorithms, Degree Centrality Algorithms, Eigenvector Centrality, Harmonic Centrality, Hyperlink-Induced Topic Search (HITS), PageRank Algorithms, Random Walk with Restart, Stochastic Approach for Link-Structure Analysis (SALSA) Algorithms, Vertex Betweenness Centrality Algorithms
Structure evaluation Adamic-Adar algorithms, Bipartite Check, Clustering Coefficient Algorithms, Conductance, Cycle Detection Algorithms, Degree Distribution Algorithms, Eccentricity Algorithms, K-Core, Modularity, Partition Conductance, Reachability Algorithms, Topological Ordering Algorithms, Triangle Counting Algorithms

This following topics describe the use of the graph server (PGX) using Triangle Counting and PageRank analytics as examples.

16.6.1 About Built-In Algorithms in the Graph Server (PGX)

The graph server (PGX) contains a set of built-in algorithms that are available as Java APIs. The details of the APIs are documented in the Javadoc that is included in the product documentation library. Specifically, see the BuiltinAlgorithms interface Method Summary for a list of the supported in-memory analyst methods.

For example, this is the PageRank procedure signature:

/**
   * Classic pagerank algorithm. Time complexity: O(E * K) with E = number of edges, K is a given constant (max
   * iterations)
   *
   * @param graph
   *          graph
   * @param e
   *          maximum error for terminating the iteration
   * @param d
   *          damping factor
   * @param max
   *          maximum number of iterations
   * @return Vertex Property holding the result as a double
   */
  public <ID extends Comparable<ID>> VertexProperty<ID, Double> pagerank(PgxGraph graph, double e, double d, int max);

16.6.2 Running the Triangle Counting Algorithm

For triangle counting, the sortByDegree boolean parameter of countTriangles() allows you to control whether the graph should first be sorted by degree (true) or not (false). If true, more memory will be used, but the algorithm will run faster; however, if your graph is very large, you might want to turn this optimization off to avoid running out of memory.

opg4j> analyst.countTriangles(graph, true)
==> 1
import oracle.pgx.api.*;
 
Analyst analyst = session.createAnalyst();
long triangles = analyst.countTriangles(graph, true);

The algorithm finds one triangle in the sample graph.

Tip:

When using the graph shell, you can increase the amount of log output during execution by changing the logging level. See information about the :loglevel command with :h :loglevel.

16.6.3 Running the PageRank Algorithm

PageRank computes a rank value between 0 and 1 for each vertex (node) in the graph and stores the values in a double property. The algorithm therefore creates a vertex property of type double for the output.

In the graph server (PGX), there are two types of vertex and edge properties:

  • Persistent Properties: Properties that are loaded with the graph from a data source are fixed, in-memory copies of the data on disk, and are therefore persistent. Persistent properties are read-only, immutable and shared between sessions.

  • Transient Properties: Values can only be written to transient properties, which are private to a session. You can create transient properties by calling createVertexProperty and createEdgeProperty on PgxGraph objects, or by copying existing properties using clone() on Property objects.

    Transient properties hold the results of computation by algorithms. For example, the PageRank algorithm computes a rank value between 0 and 1 for each vertex in the graph and stores these values in a transient property named pg_rank. Transient properties are destroyed when the Analyst object is destroyed.

This example obtains the top three vertices with the highest PageRank values. It uses a transient vertex property of type double to hold the computed PageRank values. The PageRank algorithm uses the following default values for the input parameters: error (tolerance = 0.001), damping factor = 0.85, and maximum number of iterations = 100.

opg4j> rank = analyst.pagerank(graph, 0.001, 0.85, 100);
==> ...
opg4j> rank.getTopKValues(3)
==> 128=0.1402019732468347
==> 333=0.12002296283541904
==> 99=0.09708583862990475
import java.util.Map.Entry;
import oracle.pgx.api.*;
 
Analyst analyst = session.createAnalyst();
VertexProperty<Integer, Double> rank = analyst.pagerank(graph, 0.001, 0.85, 100);
for (Entry<Integer, Double> entry : rank.getTopKValues(3)) {
 System.out.println(entry.getKey() + "=" + entry.getValue());
}

16.6.4 Getting the Progress of a Running Algorithm

The progress of a graph algorithm is based on the value of a monotonically increasing counter that gets incremented periodically during algorithm executions. You can track the progress of the algorithms using the AlgorithmProgress Java API and review the progress by comparing the counter value at various points in time to an estimate of the final value of the counter.

The AlgorithmProgress object represents the progress of an algorithm execution at a certain time. It contains the following two attributes:

  • numberOfStepsCompleted: This counter represents the current number of steps executed.
  • numberOfStepsEstimatedForCompletion: This value is an estimation of the total number of steps needed for completion.

    A default positive value is provided for numberOfStepsEstimatedForCompletion for the following list of built-in algorithms:

    • PageRank
    • Approximate PageRank
    • Personalized PageRank
    • Personalized PageRank (for a set of vertices)
    • Personalized Weighted PageRank
    • Personalized Weighted PageRank (for a set of vertices)
    • Weighted PageRank
    • Degree Centrality
    • In-Degree Centrality
    • Out-Degree Centrality

You cannot estimate the progress as a percentage for algorithms that do not provide a value for numberOfStepsEstimatedForCompletion. In such a case, you can only access the value of the counter (numberOfStepsCompleted).

However, you can set the numberOfStepsEstimatedForCompletion value for a custom PGX graph algorithm. See Tracking the Progress of a Running Custom PGX Graph Algorithm for more information.

The following example uses the in-built PageRank algorithm and describes the steps to get the progress of the running built-in algorithm using the AlgorithmProgress Java API:

opg4j> var graph = session.readGraphByName("BANK_TXN_GRAPH", GraphSource.PG_PGQL)
g ==> PgxGraph[name=BANK_TXN_GRAPH,N=1000,E=4993,created=1712307339271]
opg4j>  var future = analyst.pagerankAsync(graph)
future ==> oracle.pgx.api.PgxFuture@1dfe5dd1[Not completed]
opg4j> var futureProgress = future.getProgress()
futureProgress ==> oracle.pgx.api.DefaultFutureProgress@6d7bb5cc
opg4j> var algorithmProgress = futureProgress.asAlgorithmExecutionProgress()
PgxGraph graph = session.readGraphByName("BANK_TXN_GRAPH", GraphSource.PG_PGQL);
PgxFuture<?> future = analyst.pagerank.runAsync(graph);
FutureProgress futureProgress = future.getProgress();
Optional<AlgorithmProgress> algorithmProgress = futureProgress.asAlgorithmExecutionProgress();

The following code shows how you can estimate the progress as a percentage for the running algorithm:

opg4j> if (algorithmProgress.isPresent()) {
...>   var progress = algorithmProgress.get();
...>   var completedSteps = progress.getNumberOfStepsCompleted();
...>   var numberOfStepsEstimatedForCompletion = progress.getNumberOfStepsEstimatedForCompletion();
...>   var progressPercentage = completedSteps * 100 / numberOfStepsEstimatedForCompletion;
...>   System.out.println(completedSteps);
...>   System.out.println(numberOfStepsEstimatedForCompletion);
...>   System.out.println(progressPercentage);
...> }
if (algorithmProgress.isPresent()) {
  AlgorithmProgress progress = algorithmProgress.get();
  long completedSteps = progress.getNumberOfStepsCompleted();
  Long numberOfStepsEstimatedForCompletion = progress.getNumberOfStepsEstimatedForCompletion();
  long progressPercentage = completedSteps * 100 / numberOfStepsEstimatedForCompletion;
  System.out.println(completedSteps); 
  System.out.println(numberOfStepsEstimatedForCompletion); 
  System.out.println(progressPercentage); 
};

The preceding code shows the progress at that current moment. If you try to get the progress of the running algorithm after a while (for example, 1min), then you should get a larger value for progressPercentage.