7 Using the Machine Learning Library (PgxML) for Graphs
The in-memory graph server (PGX) provides a machine learning library, oracle.pgx.api.mllib, which supports graph-empowered machine learning algorithms.
The following machine learning algorithms are currently supported:
- Using the DeepWalk Algorithm
DeepWalk is a widely employed vertex representation learning algorithm used in industry.
- Using the Supervised GraphWise Algorithm
Supervised GraphWise is an inductive vertex representation learning algorithm which is able to leverage vertex feature information. It can be applied to a wide variety of tasks, including vertex classification and link prediction.
- Using the Pg2vec Algorithm
Pg2vec learns representations of graphlets (partitions inside a graph) by employing edges as the principal learning units and thereby packing more information in each learning unit (as compared to employing vertices as learning units) for the representation learning task.
7.1 Using the DeepWalk Algorithm
DeepWalk is a widely employed vertex representation learning algorithm used in industry.
It consists of two main steps:
- First, the random walk generation step computes random walks for each vertex (with a pre-defined walk length and a pre-defined number of walks per vertex).
- Second, these generated walks are fed to a Word2vec algorithm to generate the vector representation for each vertex (which is the word in the input provided to the Word2vec algorithm).
See the KDD paper for more details on the DeepWalk algorithm.
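The first step can be sketched in plain Python. This is only an illustration of random walk generation on a toy graph, not the PGX implementation: the function name generate_walks, the adjacency-dictionary input, and the convention that walk_length counts the vertices in a walk are all assumptions made for the example.

```python
import random


def generate_walks(adjacency, walks_per_vertex, walk_length, seed=0):
    """Generate random walks over an undirected graph.

    adjacency maps each vertex to the list of its neighbors. Here
    walk_length counts the vertices in a walk (an assumption).
    """
    rng = random.Random(seed)
    walks = []
    for start in adjacency:
        for _ in range(walks_per_vertex):
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adjacency[walk[-1]]
                if not neighbors:
                    break  # isolated vertex: the walk ends early
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks


# Toy undirected graph (hypothetical data).
adjacency = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}
walks = generate_walks(adjacency, walks_per_vertex=6, walk_length=4)
print(len(walks))  # 4 vertices * 6 walks per vertex = 24
```

In the second step, each walk would be treated as a sentence and each vertex identifier as a word when training Word2vec.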
DeepWalk creates vertex embeddings for a specific graph and cannot be updated to incorporate modifications on the graph. Instead, a new DeepWalk model should be trained on this modified graph. Lastly, it is important to note that the memory consumption of the DeepWalk model is O(2n*d), where n is the number of vertices in the graph and d is the embedding length.
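To get a feel for what O(2n*d) means in practice, the following back-of-the-envelope sketch counts the stored values for the DBpedia example. The embedding length of 100 (taken from the layer size used in the customized builder example) and the 4 bytes per value are assumptions for illustration; the actual PGX memory layout is not described here.

```python
def deepwalk_value_count(num_vertices, embedding_length):
    """Number of stored values implied by the 2 * n * d term."""
    return 2 * num_vertices * embedding_length


n = 8_637_721        # vertices in the DBpedia example graph
d = 100              # assumed embedding length
bytes_per_value = 4  # assumed 32-bit floats

values = deepwalk_value_count(n, d)
gib = values * bytes_per_value / 2**30
print(values)         # 1727544200
print(round(gib, 1))  # 6.4
```

Under these assumptions the model needs roughly 6.4 GiB, and the estimate grows linearly with both the vertex count and the embedding length.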
The following describes the usage of the main functionalities of DeepWalk in in-memory PGX, using the DBpedia graph (8,637,721 vertices and 165,049,964 edges) as an example:
- Loading a Graph
- Building a Minimal DeepWalk Model
- Building a Customized DeepWalk Model
- Training a DeepWalk Model
- Getting the Loss Value For a DeepWalk Model
- Computing Similar Vertices for a Given Vertex
- Computing Similar Vertices for a Vertex Batch
- Storing a Trained DeepWalk Model
- Loading a Pre-Trained DeepWalk Model
- Destroying a DeepWalk Model
7.1.1 Loading a Graph
The following describes the steps for loading a graph:
- Create a Session and an Analyst.
Creating a Session and an Analyst Using JShell
cd /opt/oracle/graph/
./bin/opg-jshell
// starting the shell will create an implicit session and analyst
Creating a Session and an Analyst Using Java
import oracle.pgx.api.*;
import oracle.pgx.api.mllib.DeepWalkModel;
import oracle.pgx.api.frames.*;
...
PgxSession session = Pgx.createSession("my-session");
Analyst analyst = session.createAnalyst();
Creating a Session and an Analyst Using Python
session = pypgx.get_session(session_name="my-session")
analyst = session.create_analyst()
- Load the graph.
Note:
Though the DeepWalk algorithm implementation can be applied to directed or undirected graphs, currently only undirected random walks are considered.
Loading a graph using JShell
opg-jshell> var graph = session.readGraphWithProperties("<path>/<graph.json>")
Loading a graph using Java
PgxGraph graph = session.readGraphWithProperties("<path>/<graph.json>");
Loading a graph using Python
graph = session.read_graph_with_properties("<path>/<graph.json>")
7.1.2 Building a Minimal DeepWalk Model
You can build a DeepWalk model using the minimal configuration and default hyper-parameters as described in the following code:
opg-jshell> var model = analyst.deepWalkModelBuilder()
    .setWindowSize(3)
    .setWalksPerVertex(6)
    .setWalkLength(4)
    .build()
DeepWalkModel model = analyst.deepWalkModelBuilder()
    .setWindowSize(3)
    .setWalksPerVertex(6)
    .setWalkLength(4)
    .build();
model = analyst.deepwalk_builder(window_size=3, walks_per_vertex=6, walk_length=4)
7.1.3 Building a Customized DeepWalk Model
You can build a DeepWalk model using customized hyper-parameters as described in the following code:
opg-jshell> var model = analyst.deepWalkModelBuilder()
    .setMinWordFrequency(1)
    .setBatchSize(512)
    .setNumEpochs(1)
    .setLayerSize(100)
    .setLearningRate(0.05)
    .setMinLearningRate(0.0001)
    .setWindowSize(3)
    .setWalksPerVertex(6)
    .setWalkLength(4)
    .setSampleRate(0.00001)
    .setNegativeSample(2)
    .setValidationFraction(0.01)
    .build()
DeepWalkModel model = analyst.deepWalkModelBuilder()
    .setMinWordFrequency(1)
    .setBatchSize(512)
    .setNumEpochs(1)
    .setLayerSize(100)
    .setLearningRate(0.05)
    .setMinLearningRate(0.0001)
    .setWindowSize(3)
    .setWalksPerVertex(6)
    .setWalkLength(4)
    .setSampleRate(0.00001)
    .setNegativeSample(2)
    .setValidationFraction(0.01)
    .build();
model = analyst.deepwalk_builder(min_word_frequency=1,
    batch_size=512,
    num_epochs=1,
    layer_size=100,
    learning_rate=0.05,
    min_learning_rate=0.0001,
    window_size=3,
    walks_per_vertex=6,
    walk_length=4,
    sample_rate=0.00001,
    negative_sample=2,
    validation_fraction=0.01)
See DeepWalkModelBuilder in the Javadoc for an explanation of each builder operation and its default value.
7.1.4 Training a DeepWalk Model
You can train a DeepWalk model with the specified default or customized settings as described in the following code:
opg-jshell> model.fit(graph)
model.fit(graph);
model.fit(graph)
7.1.5 Getting the Loss Value For a DeepWalk Model
You can fetch the loss value on a specified fraction of the training data (set in the builder using setValidationFraction) as described in the following code:
opg-jshell> var loss = model.getLoss()
double loss = model.getLoss();
loss = model.loss
7.1.6 Computing Similar Vertices for a Given Vertex
You can fetch the k most similar vertices for a given vertex as described in the following code:
opg-jshell> var similars = model.computeSimilars("Albert_Einstein", 10)
opg-jshell> similars.print()
PgxFrame similars = model.computeSimilars("Albert_Einstein", 10);
similars.print();
similars = model.compute_similars("Albert_Einstein", 10)
similars.print()
The preceding code produces output similar to the following:
+-----------------------------------------+
| dstVertex          | similarity         |
+-----------------------------------------+
| Albert_Einstein    | 1.0000001192092896 |
| Physics            | 0.8664291501045227 |
| Werner_Heisenberg  | 0.8625140190124512 |
| Richard_Feynman    | 0.8496938943862915 |
| List_of_physicists | 0.8415523767471313 |
| Physicist          | 0.8384397625923157 |
| Max_Planck         | 0.8370327353477478 |
| Niels_Bohr         | 0.8340970873832703 |
| Quantum_mechanics  | 0.8331197500228882 |
| Special_relativity | 0.8280861973762512 |
+-----------------------------------------+
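Conceptually, a query like this ranks every vertex by how close its embedding is to the embedding of the query vertex. The following self-contained sketch shows one common way to do that, using cosine similarity over a small dictionary of made-up vectors. Whether PGX uses exactly this measure is an assumption, and the function names and embedding values are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def top_k_similar(embeddings, query, k):
    """Return the k (vertex, similarity) pairs closest to the query vertex."""
    query_vector = embeddings[query]
    scored = [(name, cosine(query_vector, vec)) for name, vec in embeddings.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical 3-dimensional embeddings, for illustration only.
embeddings = {
    "Albert_Einstein": [0.9, 0.1, 0.3],
    "Physics": [0.8, 0.2, 0.4],
    "Machine_learning": [0.1, 0.9, 0.2],
}
result = top_k_similar(embeddings, "Albert_Einstein", 2)
for name, similarity in result:
    print(name, similarity)
```

As in the PGX output, the query vertex itself comes first because it is maximally similar to itself.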
7.1.7 Computing Similar Vertices for a Vertex Batch
You can fetch the k most similar vertices for a list of input vertices as described in the following code:
opg-jshell> var vertices = new ArrayList()
opg-jshell> vertices.add("Machine_learning")
opg-jshell> vertices.add("Albert_Einstein")
opg-jshell> var batchedSimilars = model.computeSimilars(vertices, 10)
opg-jshell> batchedSimilars.print()
List vertices = Arrays.asList("Machine_learning", "Albert_Einstein");
PgxFrame batchedSimilars = model.computeSimilars(vertices, 10);
batchedSimilars.print();
vertices = ["Machine_learning", "Albert_Einstein"]
batched_similars = model.compute_similars(vertices, 10)
batched_similars.print()
The preceding code produces output similar to the following:
+-------------------------------------------------------------------+
| srcVertex        | dstVertex                 | similarity         |
+-------------------------------------------------------------------+
| Machine_learning | Machine_learning          | 1.0000001192092896 |
| Machine_learning | Data_mining               | 0.9070799350738525 |
| Machine_learning | Computer_science          | 0.8963605165481567 |
| Machine_learning | Unsupervised_learning     | 0.8828719854354858 |
| Machine_learning | R_(programming_language)  | 0.8821185827255249 |
| Machine_learning | Algorithm                 | 0.8819515705108643 |
| Machine_learning | Artificial_neural_network | 0.8773092031478882 |
| Machine_learning | Data_analysis             | 0.8758628368377686 |
| Machine_learning | List_of_algorithms        | 0.8737979531288147 |
| Machine_learning | K-means_clustering        | 0.8715602159500122 |
| Albert_Einstein  | Albert_Einstein           | 1.0000001192092896 |
| Albert_Einstein  | Physics                   | 0.8664291501045227 |
| Albert_Einstein  | Werner_Heisenberg         | 0.8625140190124512 |
| Albert_Einstein  | Richard_Feynman           | 0.8496938943862915 |
| Albert_Einstein  | List_of_physicists        | 0.8415523767471313 |
| Albert_Einstein  | Physicist                 | 0.8384397625923157 |
| Albert_Einstein  | Max_Planck                | 0.8370327353477478 |
| Albert_Einstein  | Niels_Bohr                | 0.8340970873832703 |
| Albert_Einstein  | Quantum_mechanics         | 0.8331197500228882 |
| Albert_Einstein  | Special_relativity        | 0.8280861973762512 |
+-------------------------------------------------------------------+
7.1.8 Storing a Trained DeepWalk Model
You can store models in a database. Each model is stored as a row in a model store table.
The following code shows how to store a trained DeepWalk model in a specific model store table in the database:
opg-jshell> model.export().db()
    .modelstore("modelstoretablename") // name of the model store table
    .modelname("model") // model name (primary key of model store table)
    .description("a model description") // description to store alongside the model
    .store();
model.export().db()
    .modelstore("modelstoretablename") // name of the model store table
    .modelname("model") // model name (primary key of model store table)
    .description("a model description") // description to store alongside the model
    .store();
model.export().db(model_store="modelstoretablename", model_name="model")
Note:
All the above examples assume that you are storing the model in the database that you are currently logged in to. To store the model in a different database, refer to the examples in Storing a Trained Model in Another Database.
7.1.8.1 Storing a Trained Model in Another Database
You can store models in a database other than the one used for login.
The following code shows how to store a trained model in a different database:
opg-jshell> model.export().db()
    .username("user") // DB user to use for storing the model
    .password("password") // password of the DB user
    .jdbcUrl("jdbcUrl") // jdbc url to the DB
    .modelstore("modelstoretablename") // name of the model store table
    .modelname("model") // model name (primary key of model store table)
    .description("a model description") // description to store alongside the model
    .store();
model.export().db()
    .username("user") // DB user to use for storing the model
    .password("password") // password of the DB user
    .jdbcUrl("jdbcUrl") // jdbc url to the DB
    .modelstore("modelstoretablename") // name of the model store table
    .modelname("model") // model name (primary key of model store table)
    .description("a model description") // description to store alongside the model
    .store();
model.export().db(username="user",
    password="password",
    model_store="modelstoretablename",
    model_name="model",
    jdbc_url="jdbc_url")
7.1.9 Loading a Pre-Trained DeepWalk Model
You can load models from a database.
You can load a pre-trained DeepWalk model from a model store table in the database as described in the following code:
opg-jshell> var model = analyst.loadDeepWalkModel().db()
    .modelstore("modeltablename") // name of the model store table
    .modelname("model") // model name (primary key of model store table)
    .load();
DeepWalkModel model = analyst.loadDeepWalkModel().db()
    .modelstore("modeltablename") // name of the model store table
    .modelname("model") // model name (primary key of model store table)
    .load();
model = analyst.get_deepwalk_model_loader().db(model_store="modelstoretablename", model_name="model")
Note:
All the above examples assume that you are loading the model from the database that you are currently logged in to. To load the model from a different database, refer to the examples in Loading a Pre-Trained Model From Another Database.
7.1.9.1 Loading a Pre-Trained Model From Another Database
You can load models from a database other than the one used for login.
You can load a pre-trained model from a model store table in a different database as described in the following code:
opg-jshell> var model = analyst.<modelLoader>.db()
    .username("user") // DB user to use for loading the model
    .password("password") // password of the DB user
    .jdbcUrl("jdbcUrl") // jdbc url to the DB
    .modelstore("modeltablename") // name of the model store table
    .modelname("model") // model name (primary key of model store table)
    .load();
DeepWalkModel model = analyst.<modelLoader>.db()
    .username("user") // DB user to use for loading the model
    .password("password") // password of the DB user
    .jdbcUrl("jdbcUrl") // jdbc url to the DB
    .modelstore("modeltablename") // name of the model store table
    .modelname("model") // model name (primary key of model store table)
    .load();
In the JShell and Java examples, replace <modelLoader> with one of the following:
- loadDeepWalkModel(): Loads a DeepWalk model
- loadSupervisedGraphWiseModel(): Loads a GraphWise model
- loadPg2vecModel(): Loads a Pg2vec model
analyst.<modelLoader>.db(username="user",
    password="password",
    model_store="modelstoretablename",
    model_name="model",
    jdbc_url="jdbc_url")
In the Python example, replace <modelLoader> with one of the following:
- get_deepwalk_model_loader(): Loads a DeepWalk model
- get_pg2vec_model_loader(): Loads a Pg2vec model
7.1.10 Destroying a DeepWalk Model
You can destroy a DeepWalk model as described in the following code:
opg-jshell> model.destroy()
model.destroy();
model.destroy()
7.2 Using the Supervised GraphWise Algorithm
Supervised GraphWise is an inductive vertex representation learning algorithm which is able to leverage vertex feature information. It can be applied to a wide variety of tasks, including vertex classification and link prediction.
Supervised GraphWise is based on GraphSAGE by Hamilton et al.
Model Structure
A Supervised GraphWise model consists of two graph convolutional layers followed by several prediction layers.
The forward pass through a convolutional layer for a vertex proceeds as follows:
- A set of neighbors of the vertex is sampled.
- The previous layer representations of the neighbors are mean-aggregated, and the aggregated features are concatenated with the previous layer representation of the vertex.
- This concatenated vector is multiplied with weights, and a bias vector is added.
- The result is normalized so that the layer output has unit norm.
The prediction layers are standard neural network layers.
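The four steps of the convolutional forward pass can be sketched with plain Python lists. This follows only the steps listed above and is not the PGX implementation: the function name, sampling with replacement, the concatenation order, and the toy weights are assumptions, and no activation function is applied because the step list does not mention one.

```python
import math
import random


def conv_layer_forward(vertex, prev, adjacency, weights, bias, num_samples, rng):
    """One convolutional layer output for a single vertex."""
    # 1. Sample a set of neighbors (with replacement, an assumption).
    sampled = [rng.choice(adjacency[vertex]) for _ in range(num_samples)]
    # 2. Mean-aggregate neighbor representations, then concatenate the
    #    aggregate with the vertex's own previous-layer representation.
    dim = len(prev[vertex])
    mean = [sum(prev[u][i] for u in sampled) / len(sampled) for i in range(dim)]
    concat = mean + prev[vertex]
    # 3. Multiply with the weights and add the bias vector.
    out = [sum(w * x for w, x in zip(row, concat)) + b
           for row, b in zip(weights, bias)]
    # 4. Normalize so that the output has unit norm.
    norm = math.sqrt(sum(x * x for x in out))
    return [x / norm for x in out] if norm > 0 else out


# Toy inputs (hypothetical): 2-dimensional features, 3-dimensional output.
prev = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
adjacency = {0: [1, 2], 1: [0], 2: [0]}
weights = [[0.1, 0.2, 0.3, 0.4],
           [0.5, -0.1, 0.2, 0.0],
           [0.0, 0.3, -0.2, 0.1]]
bias = [0.1, 0.0, -0.1]

out = conv_layer_forward(0, prev, adjacency, weights, bias,
                         num_samples=25, rng=random.Random(0))
print(out)
```

Stacking two such layers lets each vertex's representation draw on its two-hop neighborhood before it reaches the prediction layers.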
The following describes the usage of the main functionalities of the implementation of GraphSAGE in PGX using the Cora graph as an example:
- Loading a Graph
- Building a Minimal GraphWise Model
- Advanced Hyperparameter Customization
- Training a Supervised GraphWise Model
- Getting the Loss Value For a Supervised GraphWise Model
- Inferring the Vertex Labels for a Supervised GraphWise Model
- Evaluating the Supervised GraphWise Model Performance
- Inferring Embeddings for a Supervised GraphWise Model
- Storing a Trained Supervised GraphWise Model
- Loading a Pre-Trained Supervised GraphWise Model
- Destroying a Supervised GraphWise Model
7.2.1 Loading a Graph
The following describes the steps for loading a graph:
- Create a Session and an Analyst.
Creating a Session and an Analyst Using JShell
cd /opt/oracle/graph/
./bin/opg-jshell
// starting the shell will create an implicit session and analyst
Creating a Session and an Analyst Using Java
import java.util.List;
import java.util.stream.Collectors;
import oracle.pgx.api.*;
import oracle.pgx.api.mllib.SupervisedGraphWiseModel;
import oracle.pgx.api.frames.*;
import oracle.pgx.config.mllib.ActivationFunction;
import oracle.pgx.config.mllib.GraphWiseConvLayerConfig;
import oracle.pgx.config.mllib.GraphWisePredictionLayerConfig;
import oracle.pgx.config.mllib.SupervisedGraphWiseModelConfig;
import oracle.pgx.config.mllib.WeightInitScheme;
PgxSession session = Pgx.createSession("my-session");
Analyst analyst = session.createAnalyst();
- Load the graph.
Loading a graph using JShell
opg-jshell> var fullGraph = session.readGraphWithProperties("<path>/<full_graph.json>")
opg-jshell> var trainGraph = session.readGraphWithProperties("<path>/<train_graph.json>")
opg-jshell> var testVertices = fullGraph.getVertices()
    .stream()
    .filter(v -> !trainGraph.hasVertex(v.getId()))
    .collect(Collectors.toList());
Loading a graph using Java
PgxGraph fullGraph = session.readGraphWithProperties("<path>/<full_graph.json>");
PgxGraph trainGraph = session.readGraphWithProperties("<path>/<train_graph.json>");
List<PgxVertex> testVertices = fullGraph.getVertices()
    .stream()
    .filter(v -> !trainGraph.hasVertex(v.getId()))
    .collect(Collectors.toList());
7.2.2 Building a Minimal GraphWise Model
You can build a GraphWise model using the minimal configuration and default hyper-parameters as described in the following code:
opg-jshell> var model = analyst.supervisedGraphWiseModelBuilder()
    .setVertexInputPropertyNames("features")
    .setVertexTargetPropertyName("labels")
    .build()
SupervisedGraphWiseModel model = analyst.supervisedGraphWiseModelBuilder()
    .setVertexInputPropertyNames("features")
    .setVertexTargetPropertyName("labels")
    .build();
Note:
Even though only one feature property is specified in the above example, you can specify arbitrarily many.
7.2.3 Advanced Hyperparameter Customization
You can build a GraphWise model using rich hyperparameter customization.
This is done through the following two sub-config classes:
GraphWiseConvLayerConfig
GraphWisePredictionLayerConfig
The following code describes the implementation of the configuration using the above classes in GraphWise model:
opg-jshell> var weightProperty = analyst.pagerank(trainGraph).getName()
opg-jshell> var convLayerConfig = analyst.graphWiseConvLayerConfigBuilder()
    .setNumSampledNeighbors(25)
    .setActivationFunction(ActivationFunction.TANH)
    .setWeightInitScheme(WeightInitScheme.XAVIER)
    .setWeightedAggregationProperty(weightProperty)
    .build()
opg-jshell> var predictionLayerConfig = analyst.graphWisePredictionLayerConfigBuilder()
    .setHiddenDimension(32)
    .setActivationFunction(ActivationFunction.RELU)
    .setWeightInitScheme(WeightInitScheme.HE)
    .build()
opg-jshell> var model = analyst.supervisedGraphWiseModelBuilder()
    .setVertexInputPropertyNames("features")
    .setVertexTargetPropertyName("labels")
    .setConvLayerConfigs(convLayerConfig)
    .setPredictionLayerConfigs(predictionLayerConfig)
    .build()
String weightProperty = analyst.pagerank(trainGraph).getName();
GraphWiseConvLayerConfig convLayerConfig = analyst.graphWiseConvLayerConfigBuilder()
    .setNumSampledNeighbors(25)
    .setActivationFunction(ActivationFunction.TANH)
    .setWeightInitScheme(WeightInitScheme.XAVIER)
    .setWeightedAggregationProperty(weightProperty)
    .build();
GraphWisePredictionLayerConfig predictionLayerConfig = analyst.graphWisePredictionLayerConfigBuilder()
    .setHiddenDimension(32)
    .setActivationFunction(ActivationFunction.RELU)
    .setWeightInitScheme(WeightInitScheme.HE)
    .build();
SupervisedGraphWiseModel model = analyst.supervisedGraphWiseModelBuilder()
    .setVertexInputPropertyNames("features")
    .setVertexTargetPropertyName("labels")
    .setConvLayerConfigs(convLayerConfig)
    .setPredictionLayerConfigs(predictionLayerConfig)
    .build();
See SupervisedGraphWiseModelBuilder, GraphWiseConvLayerConfigBuilder and GraphWisePredictionLayerConfigBuilder in Javadoc for a full description of all available hyperparameters and their default values.
7.2.4 Training a Supervised GraphWise Model
You can train a Supervised GraphWise model on a graph as described in the following code:
opg-jshell> model.fit(trainGraph)
model.fit(trainGraph);
7.2.5 Getting the Loss Value For a Supervised GraphWise Model
You can fetch the training loss value as described in the following code:
opg-jshell> var loss = model.getTrainingLoss()
double loss = model.getTrainingLoss();
7.2.6 Inferring the Vertex Labels for a Supervised GraphWise Model
You can infer the labels for vertices on any graph (including vertices or graphs that were not seen during training) as described in the following code:
opg-jshell> var labels = model.inferLabels(fullGraph, testVertices)
opg-jshell> labels.head().print()
PgxFrame labels = model.inferLabels(fullGraph, testVertices);
labels.head().print();
+----------------------------------+
| vertexId | label                 |
+----------------------------------+
| 2        | Neural Networks       |
| 6        | Theory                |
| 7        | Case Based            |
| 22       | Rule Learning         |
| 30       | Theory                |
| 34       | Neural Networks       |
| 47       | Case Based            |
| 48       | Probabalistic Methods |
| 50       | Theory                |
| 52       | Theory                |
+----------------------------------+
Similarly, you can also get the model confidence for each class by inferring the prediction logits as described in the following code:
opg-jshell> var logits = model.inferLogits(fullGraph, testVertices)
opg-jshell> logits.head().print()
PgxFrame logits = model.inferLogits(fullGraph, testVertices);
logits.head().print();
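Logits are unnormalized scores, so a common way to read them as per-class confidences is to pass them through a softmax. The sketch below shows that conversion on made-up values. It is an assumption that this is how the GraphWise logits should be interpreted, and the softmax helper is written here for illustration rather than taken from the PGX API.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for one vertex over three classes.
probs = softmax([2.0, 0.5, -1.0])
print(probs)
```

The class with the largest logit also has the largest probability, so the predicted label is unchanged by the conversion.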
7.2.7 Evaluating the Supervised GraphWise Model Performance
You can evaluate various classification metrics for the model using the evaluateLabels method as described in the following code:
opg-jshell> model.evaluateLabels(fullGraph, testVertices).print()
model.evaluateLabels(fullGraph, testVertices).print();
+------------------------------------------+
| Accuracy | Precision | Recall | F1-Score |
+------------------------------------------+
| 0.8488   | 0.8523    | 0.831  | 0.8367   |
+------------------------------------------+
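For reference, the four reported metrics can be computed from true and predicted labels as in the sketch below. The macro averaging over classes is an assumption, since the averaging scheme used by evaluateLabels is not stated here, and the function name and sample labels are hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1."""
    pairs = list(zip(y_true, y_pred))
    classes = sorted(set(y_true) | set(y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    count = len(classes)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / count,
        "recall": sum(recalls) / count,
        "f1": sum(f1s) / count,
    }


# Hypothetical labels for four vertices.
y_true = ["Theory", "Theory", "Case Based", "Neural Networks"]
y_pred = ["Theory", "Case Based", "Case Based", "Neural Networks"]
metrics = classification_metrics(y_true, y_pred)
print(metrics["accuracy"])  # 0.75
```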
7.2.8 Inferring Embeddings for a Supervised GraphWise Model
You can use a trained model to infer embeddings for unseen nodes and store them in the database as described in the following code:
opg-jshell> var vertexVectors = model.inferEmbeddings(fullGraph, fullGraph.getVertices()).flattenAll()
opg-jshell> vertexVectors.write()
    .db()
    .username("user") // DB user
    .password("password") // password of the DB user
    .jdbcUrl("jdbcUrl") // jdbc url to the DB
    .name("vertex vectors")
    .tablename("vertexVectors") // indicate the name of the table in which the data should be stored
    .overwrite(true) // indicate that if there is a table with the same name, it will be overwritten (truncated)
    .store()
PgxFrame vertexVectors = model.inferEmbeddings(fullGraph, fullGraph.getVertices()).flattenAll();
vertexVectors.write()
    .db()
    .username("user") // DB user
    .password("password") // password of the DB user
    .jdbcUrl("jdbcUrl") // jdbc url to the DB
    .name("vertex vectors")
    .tablename("vertexVectors") // indicate the name of the table in which the data should be stored
    .overwrite(true) // indicate that if there is a table with the same name, it will be overwritten (truncated)
    .store();
vertexVectors will be as follows without flattening (flattenAll splits the vector column into separate double-valued columns):
+---------------------------------------------------------------+
| vertexId | embedding                                          |
+---------------------------------------------------------------+
7.2.9 Storing a Trained Supervised GraphWise Model
You can store models in a database. Each model is stored as a row in a model store table.
The following code shows how to store a trained Supervised GraphWise model in a specific model store table in the database:
opg-jshell> model.export().db()
    .modelstore("modelstoretablename") // name of the model store table
    .modelname("model") // model name (primary key of model store table)
    .description("a model description") // description to store alongside the model
    .store();
model.export().db()
    .modelstore("modelstoretablename") // name of the model store table
    .modelname("model") // model name (primary key of model store table)
    .description("a model description") // description to store alongside the model
    .store();
Note:
All the above examples assume that you are storing the model in the database that you are currently logged in to. To store the model in a different database, refer to the examples in Storing a Trained Model in Another Database.
7.2.10 Loading a Pre-Trained Supervised GraphWise Model
You can load models from a database.
You can load a pre-trained Supervised GraphWise model from a model store table in the database as described in the following code:
opg-jshell> var model = analyst.loadSupervisedGraphWiseModel().db()
    .modelstore("modeltablename") // name of the model store table
    .modelname("model") // model name (primary key of model store table)
    .load();
SupervisedGraphWiseModel model = analyst.loadSupervisedGraphWiseModel().db()
    .modelstore("modeltablename") // name of the model store table
    .modelname("model") // model name (primary key of model store table)
    .load();
Note:
All the above examples assume that you are loading the model from the database that you are currently logged in to. To load the model from a different database, refer to the examples in Loading a Pre-Trained Model From Another Database.
7.2.11 Destroying a Supervised GraphWise Model
You can destroy a GraphWise model as described in the following code:
opg-jshell> model.destroy()
model.destroy();
7.3 Using the Pg2vec Algorithm
Pg2vec learns representations of graphlets (partitions inside a graph) by employing edges as the principal learning units and thereby packing more information in each learning unit (as compared to employing vertices as learning units) for the representation learning task.
It consists of three main steps:
- Random walks are generated for each vertex (with a pre-defined length per walk and a pre-defined number of walks per vertex).
- Each edge in a random walk is mapped to a property edge-word in the created document (with the graph-id as the document label), where the property edge-word is defined as the concatenation of the properties of the source and destination vertices.
- The generated documents (with their attached document labels) are fed to a doc2vec algorithm, which generates the vector representation for each document, which is a graph in this case.
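The second step can be illustrated with a small sketch that turns one random walk into a list of edge-words. The function name, the underscore separator, and the toy property values are assumptions made for the example; the exact word format used by PGX is not specified here.

```python
def walk_to_edge_words(walk, vertex_property):
    """Map each edge of a walk to the concatenated properties of its endpoints."""
    return [
        f"{vertex_property[src]}_{vertex_property[dst]}"
        for src, dst in zip(walk, walk[1:])
    ]


# Hypothetical vertex property ("category") and one random walk in a graphlet.
category = {1: "C", 2: "N", 3: "O"}
walk = [1, 2, 3, 2]
graph_id = 52  # document label for the graphlet containing the walk

document = (graph_id, walk_to_edge_words(walk, category))
print(document)  # (52, ['C_N', 'N_O', 'O_N'])
```

A walk over k vertices yields k - 1 edge-words, and all walks from the same graphlet share one document label.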
Pg2vec creates graphlet embeddings for a specific set of graphlets and cannot be updated to incorporate modifications on these graphlets. Instead, a new Pg2vec model should be trained on these modified graphlets.
The memory consumption of the Pg2vec model is O(2(n+m)*d), where:
- n is the number of vertices in the graph
- m is the number of graphlets in the graph
- d is the embedding length
The following describes the usage of the main functionalities of the implementation of Pg2vec in PGX, using the NCI109 dataset (which contains 4127 graphs) as an example:
- Loading a Graph
- Building a Minimal Pg2vec Model
- Building a Customized Pg2vec Model
- Training a Pg2vec Model
- Getting the Loss Value For a Pg2vec Model
- Computing Similar Graphlets for a Given Graphlet
- Computing Similars for a Graphlet Batch
- Inferring a Graphlet Vector
- Inferring Vectors for a Graphlet Batch
- Storing a Trained Pg2vec Model
- Loading a Pre-Trained Pg2vec Model
- Destroying a Pg2vec Model
7.3.1 Loading a Graph
The following describes the steps for loading a graph:
- Create a Session and an Analyst.
Creating a Session and an Analyst Using JShell
cd /opt/oracle/graph/
./bin/opg-jshell
// starting the shell will create an implicit session and analyst
Creating a Session and an Analyst Using Java
import oracle.pgx.api.*;
import oracle.pgx.api.mllib.Pg2vecModel;
import oracle.pgx.api.frames.*;
...
PgxSession session = Pgx.createSession("my-session");
Analyst analyst = session.createAnalyst();
Creating a Session and an Analyst Using Python
session = pypgx.get_session(session_name="my-session")
analyst = session.create_analyst()
- Load the graph.
Loading a graph using JShell
opg-jshell> var graph = session.readGraphWithProperties("<path>/<graph.json>")
Loading a graph using Java
PgxGraph graph = session.readGraphWithProperties("<path>/<graph.json>");
Loading a graph using Python
graph = session.read_graph_with_properties("<path>/<graph.json>")
7.3.2 Building a Minimal Pg2vec Model
You can build a Pg2vec model using the minimal configuration and default hyper-parameters as described in the following code:
opg-jshell> var model = analyst.pg2vecModelBuilder()
    .setGraphLetIdPropertyName("graph_id")
    .setVertexPropertyNames(Arrays.asList("category"))
    .setWindowSize(4)
    .setWalksPerVertex(5)
    .setWalkLength(8)
    .build()
Pg2vecModel model = analyst.pg2vecModelBuilder()
    .setGraphLetIdPropertyName("graph_id")
    .setVertexPropertyNames(Arrays.asList("category"))
    .setWindowSize(4)
    .setWalksPerVertex(5)
    .setWalkLength(8)
    .build();
model = analyst.pg2vec_model_builder(
    graph_let_id_property_name="graph_id",
    vertex_property_names=["category"],
    window_size=4,
    walks_per_vertex=5,
    walk_length=8)
You can specify the property name that identifies each graphlet using the Pg2vecModelBuilder#setGraphLetIdPropertyName operation, and you can specify the vertex properties that Pg2vec employs using the Pg2vecModelBuilder#setVertexPropertyNames operation.
You can also use the weakly connected component (WCC) functionality in PGX to determine the graphlets in a given graph.
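To show what determining graphlets by weakly connected components amounts to, the following sketch assigns a component identifier to every vertex using a union-find structure and ignores edge direction. It is a plain Python illustration, not the PGX WCC API; the function name and the toy graph are hypothetical.

```python
def weakly_connected_components(vertices, edges):
    """Return a dict mapping each vertex to a representative of its component."""
    parent = {v: v for v in vertices}

    def find(v):
        # Follow parent links to the root, halving the path along the way.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for src, dst in edges:
        root_src, root_dst = find(src), find(dst)
        if root_src != root_dst:
            parent[root_dst] = root_src  # merge the two components

    return {v: find(v) for v in vertices}


# Toy graph with two components: {1, 2, 3} and {4, 5}.
vertices = [1, 2, 3, 4, 5]
edges = [(1, 2), (2, 3), (4, 5)]
graphlet_id = weakly_connected_components(vertices, edges)
print(graphlet_id)  # {1: 1, 2: 1, 3: 1, 4: 4, 5: 4}
```

The resulting component identifier could then play the role of the graphlet ID property passed to the builder.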
7.3.3 Building a Customized Pg2vec Model
You can build a Pg2vec model using customized hyper-parameters as described in the following code:
opg-jshell> var model = analyst.pg2vecModelBuilder()
    .setGraphLetIdPropertyName("graph_id")
    .setVertexPropertyNames(Arrays.asList("category"))
    .setMinWordFrequency(1)
    .setBatchSize(128)
    .setNumEpochs(5)
    .setLayerSize(200)
    .setLearningRate(0.04)
    .setMinLearningRate(0.0001)
    .setWindowSize(4)
    .setWalksPerVertex(5)
    .setWalkLength(8)
    .setUseGraphletSize(true)
    .setValidationFraction(0.05)
    .setGraphletSizePropertyName("<propertyName>")
    .build()
Pg2vecModel model = analyst.pg2vecModelBuilder()
    .setGraphLetIdPropertyName("graph_id")
    .setVertexPropertyNames(Arrays.asList("category"))
    .setMinWordFrequency(1)
    .setBatchSize(128)
    .setNumEpochs(5)
    .setLayerSize(200)
    .setLearningRate(0.04)
    .setMinLearningRate(0.0001)
    .setWindowSize(4)
    .setWalksPerVertex(5)
    .setWalkLength(8)
    .setUseGraphletSize(true)
    .setValidationFraction(0.05)
    .setGraphletSizePropertyName("<propertyName>")
    .build();
model = analyst.pg2vec_model_builder(
    graph_let_id_property_name="graph_id",
    vertex_property_names=["category"],
    min_word_frequency=1,
    batch_size=128,
    num_epochs=5,
    layer_size=200,
    learning_rate=0.04,
    min_learning_rate=0.0001,
    window_size=4,
    walks_per_vertex=5,
    walk_length=8,
    use_graphlet_size=True,
    graphlet_size_property_name="<property_name>",
    validation_fraction=0.05)
See Pg2vecModelBuilder in the Javadoc for an explanation of each builder operation and its default value.
7.3.4 Training a Pg2vec Model
You can train a Pg2vec model with the specified default or customized settings as described in the following code:
opg-jshell> model.fit(graph)
model.fit(graph);
model.fit(graph)
7.3.5 Getting the Loss Value For a Pg2vec Model
You can fetch the training loss value on a specified fraction of training data (set in builder using setValidationFraction) as described in the following code:
opg-jshell> var loss = model.getLoss()
double loss = model.getLoss();
loss = model.loss
7.3.6 Computing Similar Graphlets for a Given Graphlet
You can fetch the k most similar graphlets for a given graphlet as described in the following code:
opg-jshell> var similars = model.computeSimilars(52, 10)
PgxFrame similars = model.computeSimilars(52, 10);
similars = model.compute_similars(52, 10)
Searching for the graphlets most similar to the graphlet with ID = 52 using the trained model, and printing the result with similars.print(), produces the following output:
+----------------------------------+
| dstGraphlet | similarity         |
+----------------------------------+
| 52          | 1.0                |
| 10          | 0.8748674392700195 |
| 23          | 0.8551455140113831 |
| 26          | 0.8493421673774719 |
| 47          | 0.8411962985992432 |
| 25          | 0.8281504511833191 |
| 43          | 0.8202780485153198 |
| 24          | 0.8179885745048523 |
| 8           | 0.796689510345459  |
| 9           | 0.7947834134101868 |
+----------------------------------+
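As an illustration of how such a ranking can arise, the following plain-Python sketch ranks graphlets by cosine similarity between embedding vectors. This is an assumption made for illustration only: the text here does not state which similarity measure Pg2vec uses, and the helper names and the three-dimensional embeddings are made up. The sketch does show why a query graphlet ranks first against itself with a similarity of (approximately) 1.0.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def top_k_similar(embeddings, query_id, k):
    """Return the k (graphlet_id, similarity) pairs closest to query_id."""
    query = embeddings[query_id]
    scored = [(gid, cosine_similarity(query, vec))
              for gid, vec in embeddings.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical embeddings; real Pg2vec vectors have layer_size dimensions.
embeddings = {
    52: [1.0, 0.0, 1.0],
    10: [0.9, 0.1, 1.0],
    41: [-1.0, 0.5, 0.0],
}
ranked = top_k_similar(embeddings, 52, 2)
print(ranked)
```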
The following depicts the visualization of two similar graphlets (top: ID = 52 and bottom: ID = 10):
Figure 7-1 Pg2vec - Visualization of Two Similar Graphlets
7.3.7 Computing Similars for a Graphlet Batch
You can fetch the k most similar graphlets for a batch of input graphlets as described in the following code:
opg-jshell> var graphlets = new ArrayList()
opg-jshell> graphlets.add(52)
opg-jshell> graphlets.add(41)
opg-jshell> var batchedSimilars = model.computeSimilars(graphlets, 10)

List graphlets = Arrays.asList(52, 41);
PgxFrame batchedSimilars = model.computeSimilars(graphlets, 10);

batched_similars = model.compute_similars([52, 41], 10)
Searching for the graphlets most similar to the graphlets with ID = 52 and ID = 41 using the trained model, and printing the result with batched_similars.print(), produces the following output:
+------------------------------------------------+
| srcGraphlet | dstGraphlet | similarity         |
+------------------------------------------------+
| 52          | 52          | 1.0                |
| 52          | 10          | 0.8748674392700195 |
| 52          | 23          | 0.8551455140113831 |
| 52          | 26          | 0.8493421673774719 |
| 52          | 47          | 0.8411962985992432 |
| 52          | 25          | 0.8281504511833191 |
| 52          | 43          | 0.8202780485153198 |
| 52          | 24          | 0.8179885745048523 |
| 52          | 8           | 0.796689510345459  |
| 52          | 9           | 0.7947834134101868 |
| 41          | 41          | 1.0                |
| 41          | 197         | 0.9653506875038147 |
| 41          | 84          | 0.9552277326583862 |
| 41          | 157         | 0.9465565085411072 |
| 41          | 65          | 0.9287481307983398 |
| 41          | 248         | 0.9177336096763611 |
| 41          | 315         | 0.9043129086494446 |
| 41          | 92          | 0.8998928070068359 |
| 41          | 297         | 0.8897411227226257 |
| 41          | 50          | 0.8810243010520935 |
+------------------------------------------------+
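Because the batched result stacks the rows for every source graphlet in one table, it is often convenient to regroup them per source. The following is a small plain-Python sketch of that step. It works on rows copied by hand from the output shown here rather than on a PgxFrame, and the helper name `neighbours_by_source` is hypothetical.

```python
from collections import defaultdict

# A few (srcGraphlet, dstGraphlet, similarity) rows copied from the output.
rows = [
    (52, 52, 1.0),
    (52, 10, 0.8748674392700195),
    (52, 23, 0.8551455140113831),
    (41, 41, 1.0),
    (41, 197, 0.9653506875038147),
    (41, 84, 0.9552277326583862),
]


def neighbours_by_source(rows):
    """Group rows by source graphlet, dropping each graphlet's self-match."""
    grouped = defaultdict(list)
    for src, dst, similarity in rows:
        if src != dst:
            grouped[src].append((dst, similarity))
    return dict(grouped)


neighbours = neighbours_by_source(rows)
print(neighbours[41])
```

Dropping the self-match is a design choice for this sketch: the first row of each group always pairs a graphlet with itself at similarity 1.0, which carries no information for a nearest-neighbour lookup.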
7.3.8 Inferring a Graphlet Vector
You can infer the vector representation for a given new graphlet as described in the following code:
opg-jshell> var graphlet = session.readGraphWithProperties("<path>/<graphletConfig.json>")
opg-jshell> var inferredVector = model.inferGraphletVector(graphlet)
opg-jshell> inferredVector.print()

PgxGraph graphlet = session.readGraphWithProperties("<path>/<graphletConfig.json>");
PgxFrame inferredVector = model.inferGraphletVector(graphlet);
inferredVector.print();

graphlet = session.read_graph_with_properties("<path>/<graphletConfig.json>")
inferred_vector = model.infer_graphlet_vector(graphlet)
inferred_vector.print()
The schema of inferredVector will be similar to the following output:
+---------------------------------------------------------------+
| graphlet                      | embedding                     |
+---------------------------------------------------------------+
7.3.9 Inferring Vectors for a Graphlet Batch
You can infer the vector representations for multiple graphlets (specified with different graph-ids in a graph) as described in the following code:
opg-jshell> var graphlets = session.readGraphWithProperties("<path>/<graphletConfig.json>")
opg-jshell> var inferredVectorBatched = model.inferGraphletVectorBatched(graphlets)
opg-jshell> inferredVectorBatched.print()

PgxGraph graphlets = session.readGraphWithProperties("<path>/<graphletConfig.json>");
PgxFrame inferredVectorBatched = model.inferGraphletVectorBatched(graphlets);
inferredVectorBatched.print();

graphlets = session.read_graph_with_properties("<path>/<graphletConfig.json>")
inferred_vector_batched = model.infer_graphlet_vector_batched(graphlets)
inferred_vector_batched.print()
The schema is the same as for inferGraphletVector, but with one row for each input graphlet.
7.3.10 Storing a Trained Pg2vec Model
You can store models in the database. Each model is stored as a row in a model store table.
The following code shows how to store a trained Pg2vec model in the database in a specific model store table:
opg-jshell> model.export().db()
    .modelstore("modelstoretablename")   // name of the model store table
    .modelname("model")                  // model name (primary key of model store table)
    .description("a model description")  // description to store alongside the model
    .store();

model.export().db()
    .modelstore("modelstoretablename")   // name of the model store table
    .modelname("model")                  // model name (primary key of model store table)
    .description("a model description")  // description to store alongside the model
    .store();

model.export().db(model_store="modelstoretablename", model_name="model")
Note:
All the above examples assume that you are storing the model in the database you are currently logged in to. If you must store the model in a different database, refer to the examples in Storing a Trained Model in Another Database.
7.3.11 Loading a Pre-Trained Pg2vec Model
You can load models from a database.
You can load a pre-trained Pg2vec model from a model store table in the database as described in the following code:
opg-jshell> var model = analyst.loadPg2vecModel().db()
    .modelstore("modeltablename")  // name of the model store table
    .modelname("model")            // model name (primary key of model store table)
    .load();

Pg2vecModel model = analyst.loadPg2vecModel().db()
    .modelstore("modeltablename")  // name of the model store table
    .modelname("model")            // model name (primary key of model store table)
    .load();

model = analyst.get_pg2vec_model_loader().db(model_store="modelstoretablename", model_name="model")
Note:
All the above examples assume that you are loading the model from the database you are currently logged in to. If you must load the model from a different database, refer to the examples in Loading a Pre-Trained Model From Another Database.
7.3.12 Destroying a Pg2vec Model
You can destroy a Pg2vec model as described in the following code:
opg-jshell> model.destroy()
model.destroy();
model.destroy()