7 Using the Machine Learning Library (PgxML) for Graphs

The in-memory graph server (PGX) provides a machine learning library oracle.pgx.api.mllib, which supports graph-empowered machine learning algorithms.

The following machine learning algorithms are currently supported:

Using the DeepWalk Algorithm
DeepWalk is a vertex representation learning algorithm that is widely used in industry.
Using the Supervised GraphWise Algorithm
Supervised GraphWise is an inductive vertex representation learning algorithm which is able to leverage vertex feature information. It can be applied to a wide variety of tasks, including vertex classification and link prediction.
Using the Pg2vec Algorithm
Pg2vec learns representations of graphlets (partitions inside a graph) by employing edges as the principal learning units and thereby packing more information in each learning unit (as compared to employing vertices as learning units) for the representation learning task.

7.1 Using the DeepWalk Algorithm

DeepWalk is a vertex representation learning algorithm that is widely used in industry.

It consists of two main steps:

First, the random walk generation step computes random walks for each vertex (with a pre-defined walk length and a pre-defined number of walks per vertex).
Second, these generated walks are fed to a Word2vec algorithm to generate the vector representation for each vertex (each vertex plays the role of a word in the input provided to the Word2vec algorithm). See the KDD paper for more details on the DeepWalk algorithm.
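
The first step, and the way walks become Word2vec-style training input, can be illustrated with a small pure-Python sketch. It is not the PGX implementation: the function names `generate_walks` and `context_pairs`, the toy adjacency list, and the assumption that the walk length counts vertices are all hypothetical choices made for illustration.

```python
import random

def generate_walks(adjacency, walks_per_vertex, walk_length, seed=0):
    """Return walks_per_vertex random walks of up to walk_length vertices per start vertex."""
    rng = random.Random(seed)
    walks = []
    for start in adjacency:
        for _ in range(walks_per_vertex):
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adjacency[walk[-1]]
                if not neighbors:  # dead end: stop this walk early
                    break
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks

def context_pairs(walks, window_size):
    """Return (center, context) vertex pairs that lie within window_size positions in a walk."""
    pairs = []
    for walk in walks:
        for i, center in enumerate(walk):
            lo = max(0, i - window_size)
            hi = min(len(walk), i + window_size + 1)
            for j in range(lo, hi):
                if j != i:
                    pairs.append((center, walk[j]))
    return pairs

# Toy undirected graph as an adjacency list (hypothetical data).
adjacency = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}

walks = generate_walks(adjacency, walks_per_vertex=6, walk_length=4)
pairs = context_pairs(walks, window_size=3)
print(len(walks))  # 24: 4 vertices * 6 walks each
print(walks[0])    # one walk starting at "a"
```

Each walk is treated like a sentence and each vertex like a word, so the (center, context) pairs are the kind of input a skip-gram Word2vec model trains on.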

DeepWalk creates vertex embeddings for a specific graph and cannot be updated to incorporate modifications on the graph. Instead, a new DeepWalk model should be trained on this modified graph. Lastly, it is important to note that the memory consumption of the DeepWalk model is O(2n*d) where n is the number of vertices in the graph and d is the embedding length.
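
As a rough, order-of-magnitude illustration of the O(2n*d) estimate, the sketch below counts the stored values for the DBpedia example graph used in this section (8,637,721 vertices). The embedding length of 100 (taken from the layer size in the customized model example) and the 4 bytes per value are assumptions for illustration only; the actual footprint of a PGX model may differ.

```python
def deepwalk_embedding_values(num_vertices, embedding_length):
    """Number of stored values implied by the O(2n*d) estimate."""
    return 2 * num_vertices * embedding_length

n = 8_637_721        # DBpedia vertex count quoted in this section
d = 100              # assumption: embedding length equals the layer size of 100
bytes_per_value = 4  # assumption: 32-bit floating-point values

values = deepwalk_embedding_values(n, d)
approx_gb = round(values * bytes_per_value / 1e9, 2)
print(values)     # 1727544200
print(approx_gb)  # 6.91
```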

The following describes the usage of the main functionalities of DeepWalk in the in-memory graph server (PGX), using the DBpedia graph, which has 8,637,721 vertices and 165,049,964 edges, as an example:

7.1.1 Loading a Graph

The following describes the steps for loading a graph:

Create a Session and an Analyst.

Creating a Session and an Analyst Using JShell

cd /opt/oracle/graph/
./bin/opg-jshell
// starting the shell will create an implicit session and analyst

Creating a Session and an Analyst Using Java

import oracle.pgx.api.*;
import oracle.pgx.api.mllib.DeepWalkModel;
import oracle.pgx.api.frames.*;
...
PgxSession session = Pgx.createSession("my-session");
Analyst analyst = session.createAnalyst();

Creating a Session and an Analyst Using Python

session = pypgx.get_session(session_name="my-session")
analyst = session.create_analyst()

Load the graph.

Note:
Though the DeepWalk algorithm implementation can be applied to directed or undirected graphs, currently only undirected random walks are considered.
Loading a graph using JShell
```
opg-jshell> var graph = session.readGraphWithProperties("<path>/<graph.json>")
```
Loading a graph using Java
```
PgxGraph graph = session.readGraphWithProperties("<path>/<graph.json>");
```
Loading a graph using Python
```
graph = session.read_graph_with_properties("<path>/<graph.json>")
```

7.1.2 Building a Minimal DeepWalk Model

You can build a DeepWalk model using the minimal configuration and default hyper-parameters as described in the following code:

Building a Minimal DeepWalk Model Using JShell

opg-jshell> var model = analyst.deepWalkModelBuilder()
                .setWindowSize(3)
                .setWalksPerVertex(6)
                .setWalkLength(4)
                .build()

Building a Minimal DeepWalk Model Using Java

DeepWalkModel model = analyst.deepWalkModelBuilder()
    .setWindowSize(3)
    .setWalksPerVertex(6)
    .setWalkLength(4)
    .build();

Building a Minimal DeepWalk Model Using Python

model = analyst.deepwalk_builder(window_size=3,walks_per_vertex=6,walk_length=4)

7.1.3 Building a Customized DeepWalk Model

You can build a DeepWalk model using customized hyper-parameters as described in the following code:

Building a Customized DeepWalk model Using JShell

opg-jshell> var model = analyst.deepWalkModelBuilder()
                .setMinWordFrequency(1)
                .setBatchSize(512)
                .setNumEpochs(1)
                .setLayerSize(100)
                .setLearningRate(0.05)
                .setMinLearningRate(0.0001)
                .setWindowSize(3)
                .setWalksPerVertex(6)
                .setWalkLength(4)
                .setSampleRate(0.00001)
                .setNegativeSample(2)
                .setValidationFraction(0.01)
                .build()

Building a Customized DeepWalk model Using Java

DeepWalkModel model = analyst.deepWalkModelBuilder()
    .setMinWordFrequency(1)
    .setBatchSize(512)
    .setNumEpochs(1)
    .setLayerSize(100)
    .setLearningRate(0.05)
    .setMinLearningRate(0.0001)
    .setWindowSize(3)
    .setWalksPerVertex(6)
    .setWalkLength(4)
    .setSampleRate(0.00001)
    .setNegativeSample(2)
    .setValidationFraction(0.01)
    .build();

Building a Customized DeepWalk model Using Python

model = analyst.deepwalk_builder(min_word_frequency=1,
                                batch_size=512,num_epochs=1,
                                layer_size=100,
                                learning_rate=0.05,
                                min_learning_rate=0.0001,
                                window_size=3,
                                walks_per_vertex=6,
                                walk_length=4,
                                sample_rate=0.00001,
                                negative_sample=2,
                                validation_fraction=0.01)

See DeepWalkModelBuilder in Javadoc for more explanation for each builder operation along with the default values.

7.1.4 Training a DeepWalk Model

You can train a DeepWalk model with the specified default or customized settings as described in the following code:

Training a DeepWalk model Using JShell

opg-jshell> model.fit(graph)

Training a DeepWalk model Using Java

model.fit(graph);

Training a DeepWalk model Using Python

model.fit(graph)

7.1.5 Getting the Loss Value For a DeepWalk Model

You can fetch the loss value computed on the fraction of the training data that is set in the builder using setValidationFraction, as described in the following code:

Getting the Loss Value Using JShell

opg-jshell> var loss = model.getLoss()

Getting the Loss Value Using Java

double loss = model.getLoss();

Getting the Loss Value Using Python

loss = model.loss

7.1.6 Computing Similar Vertices for a Given Vertex

You can fetch the k most similar vertices for a given vertex as described in the following code:

Computing Similar Vertices for Given Vertex Using JShell

opg-jshell> var similars = model.computeSimilars("Albert_Einstein", 10)
opg-jshell> similars.print()

Computing Similar Vertices for Given Vertex Using Java

PgxFrame similars = model.computeSimilars("Albert_Einstein", 10);
similars.print();

Computing Similar Vertices for Given Vertex Using Python

similars = model.compute_similars("Albert_Einstein",10)
similars.print()

Searching for similar vertices for Albert_Einstein using the trained model results in the following output:

+-----------------------------------------+
| dstVertex          | similarity         |
+-----------------------------------------+
| Albert_Einstein    | 1.0000001192092896 |
| Physics            | 0.8664291501045227 |
| Werner_Heisenberg  | 0.8625140190124512 |
| Richard_Feynman    | 0.8496938943862915 |
| List_of_physicists | 0.8415523767471313 |
| Physicist          | 0.8384397625923157 |
| Max_Planck         | 0.8370327353477478 |
| Niels_Bohr         | 0.8340970873832703 |
| Quantum_mechanics  | 0.8331197500228882 |
| Special_relativity | 0.8280861973762512 |
+-----------------------------------------+
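
The similarity measure is not stated in this section. As an illustration only, the sketch below ranks vertices by cosine similarity between embedding vectors, which is an assumption, using hypothetical three-dimensional embeddings and a hypothetical helper `most_similar`. It also shows that the self-similarity value 1.0000001192092896 in the output above is exactly the next 32-bit floating-point value after 1.0, which is consistent with scores computed in single precision.

```python
import math
import struct

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def most_similar(embeddings, query, k):
    """Return the k (vertex, similarity) pairs closest to query, best first."""
    scores = [(name, cosine(embeddings[query], vec)) for name, vec in embeddings.items()]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:k]

embeddings = {  # hypothetical 3-dimensional vectors, not real model output
    "Albert_Einstein": [0.9, 0.1, 0.3],
    "Physics": [0.8, 0.2, 0.4],
    "Machine_learning": [0.1, 0.9, 0.2],
}
top = most_similar(embeddings, "Albert_Einstein", k=2)
print([name for name, _ in top])  # ['Albert_Einstein', 'Physics']

# 1 + 2**-23 is the next 32-bit float after 1.0.
next_after_one = struct.unpack("f", struct.pack("f", 1.0 + 2.0 ** -23))[0]
print(next_after_one)  # 1.0000001192092896
```

As in the output above, the query vertex itself appears first in its own result list, so a caller interested only in other vertices would skip the first row.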

7.1.7 Computing Similar Vertices for a Vertex Batch

You can fetch the k most similar vertices for a list of input vertices as described in the following code:

Computing Similar Vertices for a Vertex Batch Using JShell

opg-jshell> var vertices = new ArrayList()
opg-jshell> vertices.add("Machine_learning")
opg-jshell> vertices.add("Albert_Einstein")
opg-jshell> var batchedSimilars = model.computeSimilars(vertices, 10)
opg-jshell> batchedSimilars.print()

Computing Similar Vertices for a Vertex Batch Using Java

List vertices = Arrays.asList("Machine_learning","Albert_Einstein");
PgxFrame batchedSimilars = model.computeSimilars(vertices,10);
batchedSimilars.print();

Computing Similar Vertices for a Vertex Batch Using Python

vertices = ["Machine_learning","Albert_Einstein"]
batched_similars = model.compute_similars(vertices,10)
batched_similars.print()

The following describes the output result:

+-------------------------------------------------------------------+
| srcVertex        | dstVertex                 | similarity         |
+-------------------------------------------------------------------+
| Machine_learning | Machine_learning          | 1.0000001192092896 |
| Machine_learning | Data_mining               | 0.9070799350738525 |
| Machine_learning | Computer_science          | 0.8963605165481567 |
| Machine_learning | Unsupervised_learning     | 0.8828719854354858 |
| Machine_learning | R_(programming_language)  | 0.8821185827255249 |
| Machine_learning | Algorithm                 | 0.8819515705108643 |
| Machine_learning | Artificial_neural_network | 0.8773092031478882 |
| Machine_learning | Data_analysis             | 0.8758628368377686 |
| Machine_learning | List_of_algorithms        | 0.8737979531288147 |
| Machine_learning | K-means_clustering        | 0.8715602159500122 |
| Albert_Einstein  | Albert_Einstein           | 1.0000001192092896 |
| Albert_Einstein  | Physics                   | 0.8664291501045227 |
| Albert_Einstein  | Werner_Heisenberg         | 0.8625140190124512 |
| Albert_Einstein  | Richard_Feynman           | 0.8496938943862915 |
| Albert_Einstein  | List_of_physicists        | 0.8415523767471313 |
| Albert_Einstein  | Physicist                 | 0.8384397625923157 |
| Albert_Einstein  | Max_Planck                | 0.8370327353477478 |
| Albert_Einstein  | Niels_Bohr                | 0.8340970873832703 |
| Albert_Einstein  | Quantum_mechanics         | 0.8331197500228882 |
| Albert_Einstein  | Special_relativity        | 0.8280861973762512 |
+-------------------------------------------------------------------+

7.1.8 Storing a Trained DeepWalk Model

You can store models in a database. Each model is stored as a row inside a model store table.

The following code shows how to store a trained DeepWalk model in a database in a specific model store table:

Storing a Trained DeepWalk Model Using JShell

opg-jshell> model.export().db() 
              .modelstore("modelstoretablename")  // name of the model store table
              .modelname("model")                 // model name (primary key of model store table)
              .description("a model description") // description to store alongside the model
              .store();

Storing a Trained DeepWalk Model Using Java

model.export().db()
    .modelstore("modelstoretablename")  // name of the model store table
    .modelname("model")                 // model name (primary key of model store table)
    .description("a model description") // description to store alongside the model
    .store();

Storing a Trained DeepWalk Model Using Python

model.export().db(model_store="modelstoretablename",
                  model_name="model")

Note:

All the above examples assume that you are storing the model in the database that you are currently logged in to. If you must store the model in a different database, then refer to the examples in Storing a Trained Model in Another Database.

7.1.8.1 Storing a Trained Model in Another Database

You can store models in a database other than the one used for login.

The following code shows how to store a trained model in a different database:

Storing a Trained Model Using JShell

opg-jshell> model.export().db() 
              .username("user")                   // DB user to use for storing the model
              .password("password")               // password of the DB user
              .jdbcUrl("jdbcUrl")                 // jdbc url to the DB
              .modelstore("modelstoretablename")  // name of the model store table
              .modelname("model")                 // model name (primary key of model store table)
              .description("a model description") // description to store alongside the model
              .store();

Storing a Trained Model Using Java

model.export().db()
    .username("user")                   // DB user to use for storing the model
    .password("password")               // password of the DB user
    .jdbcUrl("jdbcUrl")                 // jdbc url to the DB
    .modelstore("modelstoretablename")  // name of the model store table
    .modelname("model")                 // model name (primary key of model store table)
    .description("a model description") // description to store alongside the model
    .store();

Storing a Trained Model Using Python

model.export().db(username="user",
                  password="password",
                  model_store="modelstoretablename",
                  model_name="model",
                  jdbc_url="jdbc_url")

7.1.9 Loading a Pre-Trained DeepWalk Model

You can load models from a database.

You can load a pre-trained DeepWalk model from a model store table in database as described in the following code:

Loading a Pre-Trained DeepWalk Model Using JShell

opg-jshell> var model = analyst.loadDeepWalkModel().db()
                .modelstore("modeltablename") // name of the model store table
                .modelname("model")           // model name (primary key of model store table)
                .load();

Loading a Pre-Trained DeepWalk Model Using Java

DeepWalkModel model = analyst.loadDeepWalkModel().db()
     .modelstore("modeltablename") // name of the model store table
     .modelname("model")           // model name (primary key of model store table)
     .load();

Loading a Pre-Trained DeepWalk Model Using Python

model = analyst.get_deepwalk_model_loader().db(model_store="modelstoretablename",
                                               model_name="model")

Note:

All the above examples assume that you are loading the model from the database that you are currently logged in to. If you must load the model from a different database, then refer to the examples in Loading a Pre-Trained Model From Another Database.

7.1.9.1 Loading a Pre-Trained Model From Another Database

You can load models from a database other than the one used for login.

You can load a pre-trained model from a model store table in a different database as described in the following code:

Loading a Pre-Trained Model Using JShell

opg-jshell> var model = analyst.<modelLoader>.db()
                .username("user")             // DB user to use for loading the model
                .password("password")         // password of the DB user
                .jdbcUrl("jdbcUrl")           // jdbc url to the DB
                .modelstore("modeltablename") // name of the model store table
                .modelname("model")           // model name (primary key of model store table)
                .load();

where <modelLoader> applies as follows:

loadDeepWalkModel(): Loads a DeepWalk model
loadSupervisedGraphWiseModel(): Loads a GraphWise model
loadPg2vecModel(): Loads a Pg2vec model

Loading a Pre-Trained Model Using Java

DeepWalkModel model = analyst.<modelLoader>.db()
     .username("user")             // DB user to use for loading the model
     .password("password")         // password of the DB user
     .jdbcUrl("jdbcUrl")           // jdbc url to the DB
     .modelstore("modeltablename") // name of the model store table
     .modelname("model")           // model name (primary key of model store table)
     .load();

where <modelLoader> applies as follows:

loadDeepWalkModel(): Loads a DeepWalk model
loadSupervisedGraphWiseModel(): Loads a GraphWise model
loadPg2vecModel(): Loads a Pg2vec model

Loading a Pre-Trained Model Using Python

model = analyst.<modelLoader>.db(username="user",
                                 password="password",
                                 model_store="modelstoretablename",
                                 model_name="model",
                                 jdbc_url="jdbc_url")

where <modelLoader> applies as follows:

get_deepwalk_model_loader(): Loads a DeepWalk model
get_pg2vec_model_loader(): Loads a Pg2vec model

7.1.10 Destroying a DeepWalk Model

You can destroy a DeepWalk model as described in the following code:

Destroying a DeepWalk Model Using JShell

opg-jshell> model.destroy()

Destroying a DeepWalk Model Using Java

model.destroy();

Destroying a DeepWalk Model Using Python

model.destroy()

7.2 Using the Supervised GraphWise Algorithm

Supervised GraphWise is an inductive vertex representation learning algorithm which is able to leverage vertex feature information. It can be applied to a wide variety of tasks, including vertex classification and link prediction.

Supervised GraphWise is based on GraphSAGE by Hamilton et al.

Model Structure

A Supervised GraphWise model consists of two graph convolutional layers followed by several prediction layers.

The forward pass through a convolutional layer for a vertex proceeds as follows:

A set of neighbors of the vertex is sampled.
The previous layer representations of the neighbors are mean-aggregated, and the aggregated features are concatenated with the previous layer representation of the vertex.
This concatenated vector is multiplied with weights, and a bias vector is added.
The result is normalized such that the layer output has unit norm.
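
The four steps above can be sketched for a single vertex in plain Python. This is an illustration under stated assumptions, not the PGX implementation: the function `conv_layer_forward`, the toy graph, the weights, and the choice to sample neighbors with replacement are all hypothetical. The sketch follows only the listed steps; where the configurable activation function is applied is not described here, so it is left out.

```python
import math
import random

def conv_layer_forward(vertex, adjacency, prev, weights, bias, num_samples, rng):
    """One convolutional-layer forward pass for a single vertex."""
    # 1. Sample a set of neighbors (with replacement, an assumption of this sketch).
    neighbors = adjacency[vertex]
    sampled = [rng.choice(neighbors) for _ in range(num_samples)]
    # 2. Mean-aggregate the neighbor representations and concatenate the
    #    result with the vertex's own previous-layer representation.
    dim = len(prev[vertex])
    mean = [sum(prev[u][i] for u in sampled) / len(sampled) for i in range(dim)]
    concat = mean + prev[vertex]
    # 3. Multiply with the weight matrix and add the bias vector.
    out = [sum(w * x for w, x in zip(row, concat)) + b for row, b in zip(weights, bias)]
    # 4. Normalize so that the layer output has unit norm.
    norm = math.sqrt(sum(x * x for x in out))
    return [x / norm for x in out] if norm > 0 else out

rng = random.Random(0)
adjacency = {0: [1, 2], 1: [0], 2: [0]}               # hypothetical toy graph
prev = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}  # previous-layer representations
weights = [[0.5, -0.2, 0.1, 0.3],                     # 3 outputs x 4 concatenated inputs
           [0.0, 0.4, -0.1, 0.2],
           [0.3, 0.3, 0.3, 0.3]]
bias = [0.1, 0.0, -0.1]

h0 = conv_layer_forward(0, adjacency, prev, weights, bias, num_samples=2, rng=rng)
print(len(h0))  # 3
```

Because the neighbors are sampled rather than enumerated, the cost per vertex is bounded by the sample size, which corresponds to the setNumSampledNeighbors setting shown later in this section.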

The prediction layers are standard neural network layers.

The following describes the usage of the main functionalities of the implementation of GraphSAGE in PGX using the Cora graph as an example:

7.2.1 Loading a Graph

The following describes the steps for loading a graph:

Create a Session and an Analyst.

Creating a Session and an Analyst Using JShell

cd /opt/oracle/graph/
./bin/opg-jshell
// starting the shell will create an implicit session and analyst

Creating a Session and an Analyst Using Java

import oracle.pgx.api.*;
import oracle.pgx.api.mllib.SupervisedGraphWiseModel;
import oracle.pgx.api.frames.*;
import oracle.pgx.config.mllib.ActivationFunction;
import oracle.pgx.config.mllib.GraphWiseConvLayerConfig;
import oracle.pgx.config.mllib.GraphWisePredictionLayerConfig;
import oracle.pgx.config.mllib.SupervisedGraphWiseModelConfig;
import oracle.pgx.config.mllib.WeightInitScheme;
PgxSession session = Pgx.createSession("my-session");
Analyst analyst = session.createAnalyst();

Load the graph.

Loading a graph using JShell

opg-jshell> var fullGraph = session.readGraphWithProperties("<path>/<full_graph.json>")
opg-jshell> var trainGraph = session.readGraphWithProperties("<path>/<train_graph.json>")
opg-jshell> var testVertices = fullGraph.getVertices()
                .stream()
                .filter(v -> !trainGraph.hasVertex(v.getId()))
                .collect(Collectors.toList());

Loading a graph using Java

PgxGraph fullGraph = session.readGraphWithProperties("<path>/<full_graph.json>");
PgxGraph trainGraph = session.readGraphWithProperties("<path>/<train_graph.json>");
List<PgxVertex> testVertices = fullGraph.getVertices()
    .stream()
    .filter(v->!trainGraph.hasVertex(v.getId()))
    .collect(Collectors.toList());
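
The test set above consists of the vertices of the full graph that are absent from the training graph. A minimal, PGX-independent sketch of that set difference, using hypothetical vertex IDs in place of graph objects:

```python
full_ids = {1, 2, 3, 4, 5, 6}  # hypothetical IDs of all vertices in the full graph
train_ids = {1, 2, 4, 5}       # hypothetical IDs of the vertices in the training graph

# Keep every vertex of the full graph that the training graph does not contain.
test_ids = sorted(v for v in full_ids if v not in train_ids)
print(test_ids)  # [3, 6]
```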

7.2.2 Building a Minimal GraphWise Model

You can build a GraphWise model using the minimal configuration and default hyper-parameters as described in the following code:

Building a Minimal GraphWise Model Using JShell

opg-jshell> var model = analyst.supervisedGraphWiseModelBuilder()
                .setVertexInputPropertyNames("features")
                .setVertexTargetPropertyName("labels")
                .build()

Building a Minimal GraphWise Model Using Java

SupervisedGraphWiseModel model = analyst.supervisedGraphWiseModelBuilder()
    .setVertexInputPropertyNames("features")
    .setVertexTargetPropertyName("labels")
    .build();

Note:

Even though only one feature property is specified in the above example, you can specify any number of feature properties.

7.2.3 Advanced Hyperparameter Customization

You can build a GraphWise model using rich hyperparameter customization.

This is done through the following two sub-config classes:

GraphWiseConvLayerConfig
GraphWisePredictionLayerConfig

The following code describes the implementation of the configuration using the above classes in GraphWise model:

Building a Customized GraphWise Model Using JShell

opg-jshell> var weightProperty = analyst.pagerank(trainGraph).getName()
opg-jshell> var convLayerConfig = analyst.graphWiseConvLayerConfigBuilder()
                .setNumSampledNeighbors(25)
                .setActivationFunction(ActivationFunction.TANH)
                .setWeightInitScheme(WeightInitScheme.XAVIER)
                .setWeightedAggregationProperty(weightProperty)
                .build()
opg-jshell> var predictionLayerConfig = analyst.graphWisePredictionLayerConfigBuilder()
                .setHiddenDimension(32)
                .setActivationFunction(ActivationFunction.RELU)
                .setWeightInitScheme(WeightInitScheme.HE)
                .build()
opg-jshell> var model = analyst.supervisedGraphWiseModelBuilder()
                .setVertexInputPropertyNames("features")
                .setVertexTargetPropertyName("labels")
                .setConvLayerConfigs(convLayerConfig)
                .setPredictionLayerConfigs(predictionLayerConfig)
                .build()

Building a Customized GraphWise Model Using Java

String weightProperty = analyst.pagerank(trainGraph).getName();
GraphWiseConvLayerConfig convLayerConfig = analyst.graphWiseConvLayerConfigBuilder()
    .setNumSampledNeighbors(25)
    .setActivationFunction(ActivationFunction.TANH)
    .setWeightInitScheme(WeightInitScheme.XAVIER)
    .setWeightedAggregationProperty(weightProperty)
    .build();

GraphWisePredictionLayerConfig predictionLayerConfig = analyst.graphWisePredictionLayerConfigBuilder()
    .setHiddenDimension(32)
    .setActivationFunction(ActivationFunction.RELU)
    .setWeightInitScheme(WeightInitScheme.HE)
    .build();

SupervisedGraphWiseModel model = analyst.supervisedGraphWiseModelBuilder()
    .setVertexInputPropertyNames("features")
    .setVertexTargetPropertyName("labels")
    .setConvLayerConfigs(convLayerConfig)
    .setPredictionLayerConfigs(predictionLayerConfig)
    .build();

See SupervisedGraphWiseModelBuilder, GraphWiseConvLayerConfigBuilder and GraphWisePredictionLayerConfigBuilder in Javadoc for a full description of all available hyperparameters and their default values.

7.2.4 Training a Supervised GraphWise Model

You can train a Supervised GraphWise model on a graph as described in the following code:

Training a GraphWise Model Using JShell

opg-jshell> model.fit(trainGraph)

Training a GraphWise Model Using Java

model.fit(trainGraph);

7.2.5 Getting the Loss Value For a Supervised GraphWise Model

You can fetch the training loss value as described in the following code:

Getting the Loss Value Using JShell

opg-jshell> var loss = model.getTrainingLoss()

Getting the Loss Value Using Java

double loss = model.getTrainingLoss();

7.2.6 Inferring the Vertex Labels for a Supervised GraphWise Model

You can infer the labels for vertices on any graph (including vertices or graphs that were not seen during training) as described in the following code:

Inferring the Vertex Labels Using JShell

opg-jshell> var labels = model.inferLabels(fullGraph, testVertices)
opg-jshell> labels.head().print()

Inferring the Vertex Labels Using Java

PgxFrame labels = model.inferLabels(fullGraph,testVertices);
labels.head().print();

The output will be similar to the following example output:

+----------------------------------+
| vertexId | label                 |
+----------------------------------+
| 2        | Neural Networks       |
| 6        | Theory                |
| 7        | Case Based            |
| 22       | Rule Learning         |
| 30       | Theory                |
| 34       | Neural Networks       |
| 47       | Case Based            |
| 48       | Probabilistic Methods |
| 50       | Theory                |
| 52       | Theory                |
+----------------------------------+

Similarly, you can also get the model confidence for each class by inferring the prediction logits as described in the following code:

Getting the Model Confidence Using JShell

opg-jshell> var logits = model.inferLogits(fullGraph, testVertices)
opg-jshell> logits.head().print()

Getting the Model Confidence Using Java

PgxFrame logits = model.inferLogits(fullGraph,testVertices);
logits.head().print();
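
Logits are unnormalized scores, one per class. A common way to read them as per-class confidences is to apply a softmax; the sketch below assumes that interpretation, which this section does not state explicitly, and uses hypothetical logit values.

```python
import math

def softmax(logits):
    """Convert logits into probabilities that sum to 1 (numerically stable form)."""
    shift = max(logits)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 0.5, -1.0]  # hypothetical logits for three classes
probs = softmax(logits)
predicted = max(range(len(probs)), key=probs.__getitem__)
print(predicted)  # 0: the class with the largest logit
```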

7.2.7 Evaluating the Supervised GraphWise Model Performance

You can evaluate various classification metrics for the model using the evaluateLabels method as described in the following code:

Evaluating the Supervised GraphWise Model Performance Using JShell

opg-jshell> model.evaluateLabels(fullGraph, testVertices).print()

Evaluating the Supervised GraphWise Model Performance Using Java

model.evaluateLabels(fullGraph,testVertices).print();

The output will be similar to the following example output:

+------------------------------------------+
| Accuracy | Precision | Recall | F1-Score |
+------------------------------------------+
| 0.8488   | 0.8523    | 0.831  | 0.8367   |
+------------------------------------------+
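
For reference, the four metrics can be computed from true and predicted labels as sketched below. How evaluateLabels averages precision, recall, and F1 across classes is not stated in this section; the sketch assumes macro-averaging (an unweighted mean over classes), and the helper `classification_metrics` and the label lists are hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Return (accuracy, precision, recall, f1), macro-averaged over classes."""
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return accuracy, sum(precisions) / n, sum(recalls) / n, sum(f1s) / n

y_true = ["Theory", "Theory", "Case Based", "Neural Networks"]      # hypothetical
y_pred = ["Theory", "Case Based", "Case Based", "Neural Networks"]  # hypothetical
accuracy, precision, recall, f1 = classification_metrics(y_true, y_pred)
print(accuracy)  # 0.75
```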

7.2.8 Inferring Embeddings for a Supervised GraphWise Model

You can use a trained model to infer embeddings for unseen vertices and store them in the database as described in the following code:

Inferring Embeddings Using JShell

opg-jshell> var vertexVectors = model.inferEmbeddings(fullGraph, fullGraph.getVertices()).flattenAll()
opg-jshell> vertexVectors.write()
    .db()
    .username("user")            // DB user
    .password("password")        // password of the DB user
    .jdbcUrl("jdbcUrl")          // jdbc url to the DB
    .name("vertex vectors")
    .tablename("vertexVectors")  // indicate the name of the table in which the data should be stored
    .overwrite(true)             // indicate that if there is a table with the same name, it will be overwritten (truncated)
    .store()

Inferring Embeddings Using Java

PgxFrame vertexVectors = model.inferEmbeddings(fullGraph,fullGraph.getVertices()).flattenAll();
vertexVectors.write()
    .db()
    .username("user")           // DB user
    .password("password")       // password of the DB user
    .jdbcUrl("jdbcUrl")         // jdbc url to the DB
    .name("vertex vectors")
    .tablename("vertexVectors") // indicate the name of the table in which the data should be stored
    .overwrite(true)            // indicate that if there is a table with the same name, it will be overwritten (truncated)
    .store();

The schema for the vertexVectors will be as follows without flattening (flattenAll splits the vector column into separate double-valued columns):

+---------------------------------------------------------------+
| vertexId                                | embedding           |
+---------------------------------------------------------------+
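
To make the effect of flattening concrete, the sketch below splits a vector-valued column into one column per dimension for a list of rows. It is plain Python rather than the PgxFrame API, and the helper `flatten_embeddings` and the `embedding_<i>` column names are hypothetical; the column names that flattenAll actually produces may differ.

```python
def flatten_embeddings(rows, vector_column="embedding"):
    """Replace a vector-valued column with one scalar column per dimension."""
    flat = []
    for row in rows:
        out = {key: value for key, value in row.items() if key != vector_column}
        for i, value in enumerate(row[vector_column]):
            out[f"{vector_column}_{i}"] = value  # hypothetical column naming
        flat.append(out)
    return flat

rows = [{"vertexId": 2, "embedding": [0.1, 0.2, 0.3]}]  # hypothetical data
flat_rows = flatten_embeddings(rows)
print(flat_rows)
```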

7.2.9 Storing a Trained Supervised GraphWise Model

You can store models in a database. Each model is stored as a row inside a model store table.

The following code shows how to store a trained Supervised GraphWise model in a database in a specific model store table:

Storing a Trained Supervised GraphWise Model Using JShell

opg-jshell> model.export().db() 
              .modelstore("modelstoretablename")  // name of the model store table
              .modelname("model")                 // model name (primary key of model store table)
              .description("a model description") // description to store alongside the model
              .store();

Storing a Trained Supervised GraphWise Model Using Java

model.export().db()
    .modelstore("modelstoretablename")  // name of the model store table
    .modelname("model")                 // model name (primary key of model store table)
    .description("a model description") // description to store alongside the model
    .store();

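Note:

All the above examples assume that you are storing the model in the database that you are currently logged in to. If you must store the model in a different database, then refer to the examples in Storing a Trained Model in Another Database.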

7.2.10 Loading a Pre-Trained Supervised GraphWise Model

You can load models from a database.

You can load a pre-trained Supervised GraphWise model from a model store table in database as described in the following code:

Loading a Pre-Trained Supervised GraphWise Model Using JShell

opg-jshell> var model = analyst.loadSupervisedGraphWiseModel().db()
                .modelstore("modeltablename") // name of the model store table
                .modelname("model")           // model name (primary key of model store table)
                .load();

Loading a Pre-Trained Supervised GraphWise Model Using Java

SupervisedGraphWiseModel model = analyst.loadSupervisedGraphWiseModel().db()
     .modelstore("modeltablename") // name of the model store table
     .modelname("model")           // model name (primary key of model store table)
     .load();

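Note:

All the above examples assume that you are loading the model from the database that you are currently logged in to. If you must load the model from a different database, then refer to the examples in Loading a Pre-Trained Model From Another Database.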

7.2.11 Destroying a Supervised GraphWise Model

You can destroy a GraphWise model as described in the following code:

Destroying a GraphWise Model Using JShell

opg-jshell> model.destroy()

Destroying a GraphWise Model Using Java

model.destroy();

7.3 Using the Pg2vec Algorithm

Pg2vec learns representations of graphlets (partitions inside a graph) by employing edges as the principal learning units and thereby packing more information in each learning unit (as compared to employing vertices as learning units) for the representation learning task.

It consists of three main steps:

Random walks for each vertex (with a pre-defined length per walk and a pre-defined number of walks per vertex) are generated.
Each edge in this random walk is mapped as a property edge-word in the created document (with the document label as the graph-id) where the property edge-word is defined as the concatenation of the properties of the source and destination vertices.
The generated documents (with their attached document labels) are fed to a doc2vec algorithm which generates the vector representation for each document, which is a graph in this case.
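
The first two steps can be sketched in plain Python as follows. This is an illustration, not the PGX implementation: the function `build_documents`, the toy graphlets, the underscore used to join the two property values into an edge-word, and the assumption that the walk length counts vertices are all hypothetical. The third step (training doc2vec on the resulting documents) is not shown.

```python
import random
from collections import defaultdict

def build_documents(adjacency, category, graph_id, walks_per_vertex, walk_length, seed=0):
    """Map each graphlet ID to the list of edge-words seen on its random walks."""
    rng = random.Random(seed)
    documents = defaultdict(list)
    for start in adjacency:
        for _ in range(walks_per_vertex):
            current = start
            for _ in range(walk_length - 1):  # a walk of L vertices has L - 1 edges
                neighbors = adjacency[current]
                if not neighbors:
                    break
                nxt = rng.choice(neighbors)
                # Edge-word: source property concatenated with destination property.
                documents[graph_id[start]].append(f"{category[current]}_{category[nxt]}")
                current = nxt
    return dict(documents)

# Two disconnected graphlets (hypothetical data): a-b-c in graphlet 0, d-e in graphlet 1.
adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b"], "d": ["e"], "e": ["d"]}
category = {"a": "C", "b": "O", "c": "C", "d": "C", "e": "N"}
graph_id = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1}

docs = build_documents(adjacency, category, graph_id, walks_per_vertex=5, walk_length=3)
print(sorted(docs))  # [0, 1]: one document per graphlet
```

Because the graphlets are disconnected, a walk never leaves the graphlet it starts in, so each document contains only edge-words from its own graphlet.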

Pg2vec creates graphlet embeddings for a specific set of graphlets and cannot be updated to incorporate modifications on these graphlets. Instead, a new Pg2vec model should be trained on these modified graphlets.

The memory consumption of a Pg2vec model is:

O(2(n+m)*d)

where:

n: is the number of vertices in the graph
m: is the number of graphlets in the graph
d: is the embedding length
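
As a rough worked example of this bound, the following sketch counts the stored values implied by 2(n+m)*d and converts them to bytes. The function names are hypothetical, the 4 bytes per value assume 32-bit floats (the text does not state the storage type), and constant overheads are ignored. The vertex count used below is made up; 4127 is the number of graphs quoted for NCI109, and 200 is the layer size used in the customized builder example, assumed here to be the embedding length.

```python
def pg2vec_value_count(num_vertices, num_graphlets, embedding_length):
    """Number of stored values implied by O(2(n+m)*d), ignoring constant overheads."""
    return 2 * (num_vertices + num_graphlets) * embedding_length


def estimate_bytes(value_count, bytes_per_value=4):
    # bytes_per_value=4 assumes 32-bit floats; this is an assumption, not a documented fact
    return value_count * bytes_per_value


values = pg2vec_value_count(num_vertices=100_000, num_graphlets=4_127, embedding_length=200)
print(values)                  # 41650800
print(estimate_bytes(values))  # 166603200
```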

The following describes the usage of the main functionalities of the Pg2vec implementation in PGX, using the NCI109 dataset (4127 graphs) as an example:

7.3.1 Loading a Graph

The following describes the steps for loading a graph:

Create a Session and an Analyst.

Creating a Session and an Analyst Using JShell

cd /opt/oracle/graph/
./bin/opg-jshell
// starting the shell will create an implicit session and analyst

Creating a Session and an Analyst Using Java

import oracle.pgx.api.*;
import oracle.pgx.api.mllib.Pg2vecModel;
import oracle.pgx.api.frames.*;
...
PgxSession session = Pgx.createSession("my-session");
Analyst analyst = session.createAnalyst();

Creating a Session and an Analyst Using Python

session = pypgx.get_session(session_name="my-session")
analyst = session.create_analyst()

Load the graph.

Loading a Graph Using JShell

opg-jshell> var graph = session.readGraphWithProperties("<path>/<graph.json>")

Loading a Graph Using Java

PgxGraph graph = session.readGraphWithProperties("<path>/<graph.json>");

Loading a Graph Using Python

graph = session.read_graph_with_properties("<path>/<graph.json>")

7.3.2 Building a Minimal Pg2vec Model

You can build a Pg2vec model using the minimal configuration and default hyper-parameters as described in the following code:

Building a Minimal Pg2vec Model Using JShell

opg-jshell> var model = analyst.pg2vecModelBuilder()
                .setGraphLetIdPropertyName("graph_id")
                .setVertexPropertyNames(Arrays.asList("category"))
                .setWindowSize(4)
                .setWalksPerVertex(5)
                .setWalkLength(8)
                .build()

Building a Minimal Pg2vec Model Using Java

Pg2vecModel model = analyst.pg2vecModelBuilder()
    .setGraphLetIdPropertyName("graph_id")
    .setVertexPropertyNames(Arrays.asList("category"))
    .setWindowSize(4)
    .setWalksPerVertex(5)
    .setWalkLength(8)
    .build();

Building a Minimal Pg2vec Model Using Python

model = analyst.pg2vec_model_builder(
    graph_let_id_property_name="graph_id",
    vertex_property_names=["category"],
    window_size=4,
    walks_per_vertex=5,
    walk_length=8)

You can specify the property name that identifies each graphlet using the Pg2vecModelBuilder#setGraphLetIdPropertyName operation, and the vertex properties to employ in Pg2vec using the Pg2vecModelBuilder#setVertexPropertyNames operation.

You can also use the weakly connected component (WCC) functionality in PGX to determine the graphlets in a given graph.
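
To illustrate what a WCC-based graphlet assignment produces, the following plain-Python sketch labels every vertex with the smallest vertex ID in its weakly connected component, using a union-find structure. It does not use the PGX WCC functionality; the function name weakly_connected_components and the input layout are assumptions made for this example. The resulting label plays the role of the graphlet ID property (graph_id in the examples above).

```python
def weakly_connected_components(vertex_ids, edges):
    """Map each vertex to a component label, ignoring edge direction."""
    parent = {v: v for v in vertex_ids}

    def find(v):
        # follow parent pointers to the root, shortening the path on the way
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for src, dst in edges:
        root_src, root_dst = find(src), find(dst)
        if root_src != root_dst:
            # keep the smaller ID as representative so labels are deterministic
            parent[max(root_src, root_dst)] = min(root_src, root_dst)

    return {v: find(v) for v in vertex_ids}


# Vertices 1, 2, 3 are connected (direction ignored); 4 and 5 form a second graphlet
labels = weakly_connected_components([1, 2, 3, 4, 5], [(1, 2), (3, 2), (4, 5)])
print(labels)  # {1: 1, 2: 1, 3: 1, 4: 4, 5: 4}
```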

7.3.3 Building a Customized Pg2vec Model

You can build a Pg2vec model using customized hyper-parameters as described in the following code:

Building a Customized Pg2vec Model Using JShell

opg-jshell> var model = analyst.pg2vecModelBuilder()
                .setGraphLetIdPropertyName("graph_id")
                .setVertexPropertyNames(Arrays.asList("category"))
                .setMinWordFrequency(1)
                .setBatchSize(128)
                .setNumEpochs(5)
                .setLayerSize(200)
                .setLearningRate(0.04)
                .setMinLearningRate(0.0001)
                .setWindowSize(4)
                .setWalksPerVertex(5)
                .setWalkLength(8)
                .setUseGraphletSize(true)
                .setValidationFraction(0.05)
                .setGraphletSizePropertyName("<propertyName>")
                .build()

Building a Customized Pg2vec Model Using Java

Pg2vecModel model = analyst.pg2vecModelBuilder()
    .setGraphLetIdPropertyName("graph_id")
    .setVertexPropertyNames(Arrays.asList("category"))
    .setMinWordFrequency(1)
    .setBatchSize(128)
    .setNumEpochs(5)
    .setLayerSize(200)
    .setLearningRate(0.04)
    .setMinLearningRate(0.0001)
    .setWindowSize(4)
    .setWalksPerVertex(5)
    .setWalkLength(8)
    .setUseGraphletSize(true)
    .setValidationFraction(0.05)
    .setGraphletSizePropertyName("<propertyName>")
    .build();

Building a Customized Pg2vec Model Using Python

model = analyst.pg2vec_model_builder(
    graph_let_id_property_name = "graph_id",
    vertex_property_names = ["category"],
    min_word_frequency = 1,
    batch_size = 128,
    num_epochs = 5,
    layer_size = 200,
    learning_rate = 0.04,
    min_learning_rate = 0.0001,
    window_size = 4,
    walks_per_vertex = 5,
    walk_length = 8,
    use_graphlet_size = True,
    graphlet_size_property_name = "<property_name>",
    validation_fraction = 0.05)

See Pg2vecModelBuilder in the Javadoc for an explanation of each builder operation and its default value.

7.3.4 Training a Pg2vec Model

You can train a Pg2vec model with the specified default or customized settings as described in the following code:

Training a Pg2vec Model Using JShell

opg-jshell> model.fit(graph)

Training a Pg2vec Model Using Java

model.fit(graph);

Training a Pg2vec Model Using Python

model.fit(graph)

7.3.5 Getting the Loss Value For a Pg2vec Model

You can fetch the training loss value computed on a specified fraction of the training data (set in the builder using setValidationFraction) as described in the following code:

Getting the Loss Value Using JShell

opg-jshell> var loss = model.getLoss()

Getting the Loss Value Using Java

double loss = model.getLoss();

Getting the Loss Value Using Python

loss = model.loss

7.3.6 Computing Similar Graphlets for a Given Graphlet

You can fetch the k most similar graphlets for a given graphlet as described in the following code:

Computing Similar Graphlets for Given Graphlet Using JShell

opg-jshell> var similars = model.computeSimilars(52, 10)

Computing Similar Graphlets for Given Graphlet Using Java

PgxFrame similars = model.computeSimilars(52, 10);

Computing Similar Graphlets for Given Graphlet Using Python

similars = model.compute_similars(52, 10)

Searching for graphlets similar to the graphlet with ID = 52 using the trained model, and printing the result with similars.print(), results in the following output:

+----------------------------------+
| dstGraphlet | similarity         |
+----------------------------------+
| 52          | 1.0                |
| 10          | 0.8748674392700195 |
| 23          | 0.8551455140113831 |
| 26          | 0.8493421673774719 |
| 47          | 0.8411962985992432 |
| 25          | 0.8281504511833191 |
| 43          | 0.8202780485153198 |
| 24          | 0.8179885745048523 |
| 8           | 0.796689510345459  |
| 9           | 0.7947834134101868 |
+----------------------------------+
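
To make the similarity column easier to interpret, the following sketch ranks graphlets by cosine similarity between embedding vectors. The text does not state which similarity measure PGX uses; cosine similarity is an assumption here, chosen because it gives a score of 1.0 for the query graphlet against itself, as in the output above. The function names and the embedding values are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_similar(embeddings, query_id, k):
    """Return the k (graphlet_id, similarity) pairs closest to query_id, best first."""
    query = embeddings[query_id]
    scored = [(gid, cosine_similarity(query, vec)) for gid, vec in embeddings.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Made-up 2-dimensional embeddings for three graphlets
embeddings = {52: [1.0, 0.0], 10: [0.9, 0.1], 23: [0.0, 1.0]}
for graphlet_id, similarity in top_k_similar(embeddings, 52, 2):
    print(graphlet_id, similarity)
```

As in the table, the query graphlet itself appears first, so the remaining k - 1 rows are the actual neighbours.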

The following depicts the visualization of two similar graphlets (top: ID = 52 and bottom: ID = 10):

Figure 7-1 Pg2vec - Visualization of Two Similar Graphlets

7.3.7 Computing Similars for a Graphlet Batch

You can fetch the k most similar graphlets for a batch of input graphlets as described in the following code:

Computing Similar Graphlets for a Graphlet Batch Using JShell

opg-jshell> var graphlets = new ArrayList()
opg-jshell> graphlets.add(52)
opg-jshell> graphlets.add(41)
opg-jshell> var batchedSimilars = model.computeSimilars(graphlets, 10)

Computing Similar Graphlets for a Graphlet Batch Using Java

List graphlets = Arrays.asList(52,41);
PgxFrame batchedSimilars = model.computeSimilars(graphlets,10);

Computing Similar Graphlets for a Graphlet Batch Using Python

batched_similars = model.compute_similars([52,41],10)

Searching for graphlets similar to the graphlets with ID = 52 and ID = 41 using the trained model, and printing the result with batched_similars.print(), results in the following output:

+------------------------------------------------+
| srcGraphlet | dstGraphlet | similarity         |
+------------------------------------------------+
| 52          | 52          | 1.0                |
| 52          | 10          | 0.8748674392700195 |
| 52          | 23          | 0.8551455140113831 |
| 52          | 26          | 0.8493421673774719 |
| 52          | 47          | 0.8411962985992432 |
| 52          | 25          | 0.8281504511833191 |
| 52          | 43          | 0.8202780485153198 |
| 52          | 24          | 0.8179885745048523 |
| 52          | 8           | 0.796689510345459  |
| 52          | 9           | 0.7947834134101868 |
| 41          | 41          | 1.0                |
| 41          | 197         | 0.9653506875038147 |
| 41          | 84          | 0.9552277326583862 |
| 41          | 157         | 0.9465565085411072 |
| 41          | 65          | 0.9287481307983398 |
| 41          | 248         | 0.9177336096763611 |
| 41          | 315         | 0.9043129086494446 |
| 41          | 92          | 0.8998928070068359 |
| 41          | 297         | 0.8897411227226257 |
| 41          | 50          | 0.8810243010520935 |
+------------------------------------------------+

7.3.8 Inferring a Graphlet Vector

You can infer the vector representation for a given new graphlet as described in the following code:

Inferring a Graphlet Vector Using JShell

opg-jshell> var graphlet = session.readGraphWithProperties("<path>/<graphletConfig.json>")
opg-jshell> var inferredVector = model.inferGraphletVector(graphlet)
opg-jshell> inferredVector.print()

Inferring a Graphlet Vector Using Java

PgxGraph graphlet = session.readGraphWithProperties("<path>/<graphletConfig.json>");
PgxFrame inferredVector = model.inferGraphletVector(graphlet);
inferredVector.print();

Inferring a Graphlet Vector Using Python

graphlet = session.read_graph_with_properties("<path>/<graphletConfig.json>")
inferred_vector = model.infer_graphlet_vector(graphlet)
inferred_vector.print()

The schema for the inferredVector will be similar to the following output:

+---------------------------------------------------------------+
| graphlet                                | embedding           |
+---------------------------------------------------------------+

7.3.9 Inferring Vectors for a Graphlet Batch

You can infer the vector representations for multiple graphlets (specified with different graph-ids in a graph) as described in the following code:

Inferring Vectors for a Graphlet Batch Using JShell

opg-jshell> var graphlets = session.readGraphWithProperties("<path>/<graphletConfig.json>")
opg-jshell> var inferredVectorBatched = model.inferGraphletVectorBatched(graphlets)
opg-jshell> inferredVectorBatched.print()

Inferring Vectors for a Graphlet Batch Using Java

PgxGraph graphlets = session.readGraphWithProperties("<path>/<graphletConfig.json>");
PgxFrame inferredVectorBatched = model.inferGraphletVectorBatched(graphlets);
inferredVectorBatched.print();

Inferring Vectors for a Graphlet Batch Using Python

graphlets = session.read_graph_with_properties("<path>/<graphletConfig.json>")
inferred_vector_batched = model.infer_graphlet_vector_batched(graphlets)
inferred_vector_batched.print()

The schema is the same as for inferGraphletVector, but with more rows, one for each input graphlet.

7.3.10 Storing a Trained Pg2vec Model

You can store models in a database. Each model is stored as a row inside a model store table.

The following code shows how to store a trained Pg2vec model in a specific model store table in the database:

Storing a Trained Pg2vec Model Using JShell

opg-jshell> model.export().db() 
              .modelstore("modelstoretablename")  // name of the model store table
              .modelname("model")                 // model name (primary key of model store table)
              .description("a model description") // description to store alongside the model
              .store();

Storing a Trained Pg2vec Model Using Java

model.export().db()
    .modelstore("modelstoretablename")  // name of the model store table
    .modelname("model")                 // model name (primary key of model store table)
    .description("a model description") // description to store alongside the model
    .store();

Storing a Trained Pg2vec Model Using Python

model.export().db(model_store="modelstoretablename",
                  model_name="model")

7.3.11 Loading a Pre-Trained Pg2vec Model

You can load a pre-trained Pg2vec model from a model store table in a database as described in the following code:

Loading a Pre-Trained Pg2vec Model Using JShell

opg-jshell> var model = analyst.loadPg2vecModel().db()
                .modelstore("modeltablename") // name of the model store table
                .modelname("model")           // model name (primary key of model store table)
                .load();

Loading a Pre-Trained Pg2vec Model Using Java

Pg2vecModel model = analyst.loadPg2vecModel().db()
     .modelstore("modeltablename") // name of the model store table
     .modelname("model")           // model name (primary key of model store table)
     .load();

Loading a Pre-Trained Pg2vec Model Using Python

model = analyst.get_pg2vec_model_loader().db(model_store="modelstoretablename",
                                             model_name="model")

7.3.12 Destroying a Pg2vec Model

You can destroy a Pg2vec model as described in the following code:

Destroying a Pg2vec Model Using JShell

opg-jshell> model.destroy()

Destroying a Pg2vec Model Using Java

model.destroy();

Destroying a Pg2vec Model Using Python

model.destroy()