MLlib

Graph machine learning tools for use with PGX.

class pypgx.api.mllib.CorruptionFunction(java_corruption_function)

Bases: object

Abstract corruption function that generates the corrupted subgraph for DGI (Deep Graph Infomax).

Return type

None

class pypgx.api.mllib.DeepWalkModel(java_deepwalk_model)

Bases: pypgx.api._pgx_context_manager.PgxContextManager

DeepWalk model object.

Return type

None

close()

Call destroy

Return type

None

compute_similars(v, k)

Compute the top-k similar vertices for a given vertex.

Parameters
  • v (Union[int, str, List[int], List[str]]) – id of the vertex or list of vertex ids for which to compute the similar vertices

  • k (int) – number of similar vertices to return

Return type

pypgx.api.frames._pgx_frame.PgxFrame

destroy()

Destroy this model object.

Return type

None

export()

Return a ModelStorer object which can be used to save the model.

Returns

ModelStorer object

Return type

ModelStorer

fit(graph)

Fit the model on a graph.

Parameters

graph (pypgx.api._pgx_graph.PgxGraph) – Graph to fit on

Return type

None

store(path, key, overwrite=False)

Store the model in a file.

Parameters
  • path (str) – Path where to store the model

  • key (Optional[str]) – Encryption key

  • overwrite (bool) – Whether or not to overwrite pre-existing file

Return type

None

property trained_vectors: pypgx.api.frames._pgx_frame.PgxFrame

Get the trained vertex vectors for the current DeepWalk model.

Returns

PgxFrame object with the trained vertex vectors

Return type

PgxFrame
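
The methods above are typically called in sequence: fit the model, then query it. The following is a minimal sketch of that call order, not reference usage. The helper name train_and_find_similar is hypothetical, and a unittest.mock.MagicMock stands in for a real DeepWalkModel and PgxGraph so the snippet runs without a PGX server; with real objects, both returned values would be PgxFrame instances.

```python
from unittest.mock import MagicMock


def train_and_find_similar(model, graph, vertex_ids, k=5):
    """Fit `model` on `graph` and return (similar-vertices frame, vectors frame).

    Only calls documented above are used: fit(graph),
    compute_similars(v, k) and the trained_vectors property.
    """
    model.fit(graph)
    similars = model.compute_similars(vertex_ids, k)
    vectors = model.trained_vectors  # property, not a method call
    return similars, vectors


# Stand-in objects (assumptions for illustration only, not part of pypgx).
model = MagicMock(name="DeepWalkModel")
graph = MagicMock(name="PgxGraph")

similars, vectors = train_and_find_similar(model, graph, ["v1", "v2"], k=3)
```

Passing a list of vertex ids relies on the Union type documented for v; a single id can be passed in the same position.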

class pypgx.api.mllib.DevNetLoss(confidence_margin, anomaly_property_value)

Bases: pypgx.api.mllib._loss_function.LossFunction

Deviation loss for anomaly detection

Parameters
  • confidence_margin (float) – confidence margin of the loss function

  • anomaly_property_value (bool) – the anomaly property value

Return type

None

get_anomaly_property_value()

Get the anomaly property value.

Returns

the anomaly property value

Return type

Any

get_confidence_margin()

Get confidence margin of the loss function.

Returns

the confidence margin

Return type

float

class pypgx.api.mllib.GnnExplanation(java_gnn_explanation)

Bases: object

GnnExplanation object

Return type

None

get_embedding()

Get the inferred embedding of the specified vertex.

Returns

the embedding

Return type

List[float]

get_importance_graph()

Get the importance graph, that is, the computation graph with an additional vertex property indicating vertex importance. The additional importance property can be retrieved via get_vertex_importance_property.

Returns

the importance graph

Return type

PgxGraph

get_vertex_feature_importance()

Get the feature importances as a map from property to importance value.

Returns

the feature importances.

Return type

Dict[pypgx.api._property.VertexProperty, float]

get_vertex_importance_property()

Get the vertex property that contains the computed vertex importance.

Returns

the vertex importance property

Return type

pypgx.api._property.VertexProperty
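
Because get_vertex_feature_importance() returns a plain dictionary, ranking features needs only standard Python. The sketch below shows one way to do that; the helper name top_features is hypothetical, and a MagicMock with made-up importance values stands in for a real GnnExplanation. With a real explanation, the dictionary keys would be VertexProperty objects rather than strings.

```python
from unittest.mock import MagicMock


def top_features(explanation, n=3):
    """Return the n (property, importance) pairs with the highest importance."""
    importances = explanation.get_vertex_feature_importance()
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


# Stand-in explanation with invented values (illustration only).
explanation = MagicMock(name="GnnExplanation")
explanation.get_vertex_feature_importance.return_value = {
    "age": 0.1,
    "income": 0.7,
    "degree": 0.2,
}

ranked = top_features(explanation, n=2)
```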

class pypgx.api.mllib.GraphWiseConvLayerConfig(java_config, params)

Bases: object

GraphWise conv layer configuration.

Return type

None

class pypgx.api.mllib.GraphWiseDgiLayerConfig(java_config, params)

Bases: object

GraphWise DGI layer configuration.

Return type

None

get_corruption_function()

Return the corruption function

Return type

pypgx.api.mllib._corruption_function.PermutationCorruption

get_discriminator()

Return the discriminator

Return type

str

get_readout_function()

Return the readout function

Return type

str

set_corruption_function(corruption_function)

Set the corruption function

Parameters

corruption_function (CorruptionFunction) – the corruption function. Supported currently: PermutationCorruption

set_discriminator(discriminator)

Set the discriminator

Parameters

discriminator (str) – The discriminator function. Supported currently: ‘bilinear’

Return type

None

set_readout_function(readout_function)

Set the readout function

Parameters

readout_function (str) – The readout function. Supported currently: ‘mean’

Return type

None
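
The setters above each list a single currently supported value. A small wrapper can reject other names before they reach the configuration object. This is a sketch under that assumption: configure_dgi_layer and the two constant sets are hypothetical names, the local validation is a design choice of the sketch rather than library behavior, and a MagicMock stands in for a real GraphWiseDgiLayerConfig.

```python
from unittest.mock import MagicMock

# Values listed as "Supported currently" in the parameter descriptions above.
SUPPORTED_DISCRIMINATORS = {"bilinear"}
SUPPORTED_READOUTS = {"mean"}


def configure_dgi_layer(config, discriminator="bilinear", readout_function="mean"):
    """Validate the names locally, then apply them to the DGI layer config."""
    if discriminator not in SUPPORTED_DISCRIMINATORS:
        raise ValueError(f"unsupported discriminator: {discriminator!r}")
    if readout_function not in SUPPORTED_READOUTS:
        raise ValueError(f"unsupported readout function: {readout_function!r}")
    config.set_discriminator(discriminator)
    config.set_readout_function(readout_function)
    return config


# Stand-in for a real GraphWiseDgiLayerConfig (illustration only).
config = MagicMock(name="GraphWiseDgiLayerConfig")
configure_dgi_layer(config)

# An unsupported name is rejected before any setter is called.
try:
    configure_dgi_layer(config, discriminator="dot")
    rejected = False
except ValueError:
    rejected = True
```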

class pypgx.api.mllib.GraphWiseModelConfig(java_graphwise_model_config)

Bases: object

GraphWise model configuration class.

Return type

None

get_conv_layer_configs()

Return a list of conv layer configs

Return type

List[pypgx.api.mllib._graphwise_conv_layer_config.GraphWiseConvLayerConfig]

set_batch_size(batch_size)

Set the batch size

Parameters

batch_size (int) – batch size

Return type

None

set_edge_input_feature_dim(edge_input_feature_dim)

Set the edge input feature dimension

Parameters

edge_input_feature_dim (int) – edge input feature dimension

Return type

None

set_embedding_dim(embedding_dim)

Set the embedding dimension

Parameters

embedding_dim (int) – embedding dimension

Return type

None

set_fitted(fitted)

Set the fitted flag

Parameters

fitted (bool) – fitted flag

Return type

None

set_input_feature_dim(input_feature_dim)

Set the input feature dimension

Parameters

input_feature_dim (int) – input feature dimension

Return type

None

set_learning_rate(learning_rate)

Set the learning rate

Parameters

learning_rate (int) – initial learning rate

Return type

None

set_num_epochs(num_epochs)

Set the number of epochs

Parameters

num_epochs (int) – number of epochs

Return type

None

set_seed(seed)

Set the seed

Parameters

seed (int) – seed

Return type

None

set_shuffle(shuffle)

Set the shuffling flag

Parameters

shuffle (bool) – shuffling flag

Return type

None

set_standarize(standardize)

Set the standardize flag

Parameters

standardize (bool) – standardize flag

Return type

None

set_training_loss(training_loss)

Set the training loss

Parameters

training_loss (float) – training loss

Return type

None

set_weight_decay(weight_decay)

Set the weight decay

Parameters

weight_decay (float) – weight decay

Return type

None
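
The setters above can be grouped into one helper so that related training settings are applied together. The following is a sketch only: apply_training_settings is a hypothetical name, the default values are arbitrary placeholders, and a MagicMock stands in for a real GraphWiseModelConfig. Whether changing these values after a model has been built affects that model is not covered by this reference.

```python
from unittest.mock import MagicMock


def apply_training_settings(config, batch_size, num_epochs, seed,
                            weight_decay=0.0, shuffle=True, standardize=False):
    """Apply training settings through the setters documented above."""
    config.set_batch_size(batch_size)
    config.set_num_epochs(num_epochs)
    config.set_seed(seed)
    config.set_weight_decay(weight_decay)
    config.set_shuffle(shuffle)
    # The setter name is spelled `set_standarize` in the signature above.
    config.set_standarize(standardize)
    return config


# Stand-in for a real GraphWiseModelConfig (illustration only).
config = MagicMock(name="GraphWiseModelConfig")
apply_training_settings(config, batch_size=128, num_epochs=10, seed=42)
```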

class pypgx.api.mllib.GraphWisePredictionLayerConfig(java_config, params)

Bases: object

GraphWise prediction layer configuration.

Return type

None

class pypgx.api.mllib.ModelRepository(java_generic_model_repository)

Bases: object

ModelRepository object that exposes CRUD operations on model stores and on the models within these model stores.

create(model_store_name)

Create a new model store.

Parameters

model_store_name (str) – the name of the model store

Returns

None

Return type

None

delete_model(model_store_name, model_name)

Delete the model in the specified model store with the given model name.

Parameters
  • model_store_name (str) – the name of the model store

  • model_name (str) – the name under which the model was stored

Returns

None

Return type

None

delete_model_store(model_store_name)

Delete a model store.

Parameters

model_store_name (str) – the name of the model store

Returns

None

Return type

None

get_model_description(model_store_name, model_name)

Retrieve the description of the model in the specified model store, with the given model name.

Parameters
  • model_store_name (str) – the name of the model store

  • model_name (str) – the name under which the model was stored

Returns

A string containing the description that was stored with the model

Return type

str

list_model_stores_names()

List the names of all model stores in the model repository.

Returns

List of names.

Return type

List[str]

list_model_stores_names_matching(regex)

List the names of all model stores in the model repository that match the regex.

Parameters

regex (str) – a regex in form of a string.

Returns

List of matching names.

Return type

List[str]

list_models(model_store_name)

List the models present in the model store with the given name.

Parameters

model_store_name (str) – the name of the model store (non-prefixed)

Returns

List of model names.

Return type

List[str]
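
The listing methods above combine naturally into an inventory of a repository. The sketch below builds a dictionary from model store name to model names; the helper name inventory and the store contents are hypothetical, and a MagicMock stands in for a real ModelRepository.

```python
from unittest.mock import MagicMock


def inventory(repo, regex=None):
    """Return {model_store_name: [model names]} for all or matching stores."""
    if regex is None:
        store_names = repo.list_model_stores_names()
    else:
        store_names = repo.list_model_stores_names_matching(regex)
    return {name: repo.list_models(name) for name in store_names}


# Stand-in repository with made-up contents (illustration only).
_contents = {"fraud": ["gw_v1", "gw_v2"], "recsys": ["deepwalk_v1"]}
repo = MagicMock(name="ModelRepository")
repo.list_model_stores_names.return_value = list(_contents)
repo.list_models.side_effect = lambda store: _contents[store]

result = inventory(repo)
```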

class pypgx.api.mllib.ModelRepositoryBuilder(java_generic_model_repository_builder)

Bases: object

ModelRepositoryBuilder object that can be used to configure the connection to a model repository.

db(username=None, password=None, jdbc_url=None, keystore_alias=None, schema=None)

Connect to a model repository backed by a database.

Parameters
  • username (Optional[str]) – username in database

  • password (Optional[str]) – password of username in database

  • jdbc_url (Optional[str]) – jdbc url of database

  • keystore_alias (Optional[str]) – the keystore alias to get the password in the keystore

  • schema (Optional[str]) – the schema of the model store in database

Returns

A model repository configured to connect to a database.

Return type

pypgx.api.mllib._model_repo.ModelRepository

class pypgx.api.mllib.PermutationCorruption(java_permutation_corruption)

Bases: pypgx.api.mllib._corruption_function.CorruptionFunction

Permutation function that shuffles the nodes to generate the corrupted subgraph for DGI.

Return type

None

class pypgx.api.mllib.Pg2vecModel(java_pg2vec_model)

Bases: pypgx.api._pgx_context_manager.PgxContextManager

Pg2Vec model object.

Return type

None

close()

Call destroy

Return type

None

compute_similars(graphlet_id, k)

Compute the top-k similar graphlets for a list of input graphlets.

Parameters
  • graphlet_id (Union[Iterable[Union[int, str]], int, str]) – graphletIds or iterable of graphletIds

  • k (int) – number of similars to return

Return type

pypgx.api.frames._pgx_frame.PgxFrame

destroy()

Destroy this model object.

Return type

None

export()

Return a ModelStorer object which can be used to save the model.

Returns

ModelStorer object

Return type

pypgx.api.mllib._model_utils.ModelStorer

fit(graph)

Fit the model on a graph.

Parameters

graph (pypgx.api._pgx_graph.PgxGraph) – Graph to fit on

Return type

None

infer_graphlet_vector(graph)

Return the inferred vector of the input graphlet as a PgxFrame.

Parameters

graph (pypgx.api._pgx_graph.PgxGraph) – graphlet for which to infer a vector

Return type

pypgx.api.frames._pgx_frame.PgxFrame

infer_graphlet_vector_batched(graph)

Return the inferred vectors of the input graphlets as a PgxFrame.

Parameters

graph (pypgx.api._pgx_graph.PgxGraph) – graphlets (as a single graph but different graphlet-id) for which to infer vectors

Return type

pypgx.api.frames._pgx_frame.PgxFrame

store(path, key, overwrite=False)

Store the model in a file.

Parameters
  • path (str) – Path where to store the model

  • key (Optional[str]) – Encryption key

  • overwrite (bool) – Whether or not to overwrite pre-existing file

Return type

None

property trained_graphlet_vectors: pypgx.api.frames._pgx_frame.PgxFrame

Get the trained graphlet vectors for the current pg2vec model.

Returns

PgxFrame containing the trained graphlet vectors
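
A short sketch of how the methods above fit together: train on a graph of graphlets, look up similar graphlets, and infer a vector for a graphlet that was not part of training. The helper names are hypothetical, and MagicMock objects stand in for a real Pg2vecModel and PgxGraph so the snippet runs offline. The second helper is intended for a model that has already been fitted.

```python
from unittest.mock import MagicMock


def fit_and_compare(model, training_graph, graphlet_id, k=5):
    """Fit pg2vec, then return the frame of the k most similar graphlets."""
    model.fit(training_graph)
    return model.compute_similars(graphlet_id, k)


def embed_new_graphlet(model, graphlet_graph):
    """Infer a vector for a single graphlet given as its own graph."""
    return model.infer_graphlet_vector(graphlet_graph)


# Stand-in objects (assumptions for illustration only, not part of pypgx).
model = MagicMock(name="Pg2vecModel")
training_graph = MagicMock(name="PgxGraph")
new_graphlet = MagicMock(name="PgxGraph")

similar = fit_and_compare(model, training_graph, graphlet_id=7, k=3)
vector = embed_new_graphlet(model, new_graphlet)
```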

class pypgx.api.mllib.SigmoidCrossEntropyLoss

Bases: pypgx.api.mllib._loss_function.LossFunction

Sigmoid Cross Entropy loss for binary classification

class pypgx.api.mllib.SoftmaxCrossEntropyLoss

Bases: pypgx.api.mllib._loss_function.LossFunction

Softmax Cross Entropy loss for multi-class classification

Return type

None

class pypgx.api.mllib.SupervisedGnnExplanation(java_supervised_gnn_explanation, bool_label)

Bases: pypgx.api.mllib._gnn_explanation.GnnExplanation

SupervisedGnnExplanation object

Parameters

bool_label (bool) –

Return type

None

get_embedding()

Get the inferred embedding of the specified vertex.

Returns

the embedding

Return type

List[float]

get_importance_graph()

Get the importance graph, that is, the computation graph with an additional vertex property indicating vertex importance. The additional importance property can be retrieved via get_vertex_importance_property.

Returns

the importance graph

Return type

PgxGraph

get_label()

Get the inferred label of the specified vertex.

Returns

the label

Return type

Any

get_logits()

Get the inferred logits of the specified vertex.

Returns

the logits

Return type

List[float]

get_vertex_feature_importance()

Get the feature importances as a map from property to importance value.

Returns

the feature importances.

Return type

Dict[pypgx.api._property.VertexProperty, float]

get_vertex_importance_property()

Get the vertex property that contains the computed vertex importance.

Returns

the vertex importance property

Return type

pypgx.api._property.VertexProperty
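
The logits returned by get_logits() are raw scores. Assuming the model was trained with a softmax cross-entropy loss (an assumption of this sketch; for a sigmoid loss a per-logit sigmoid would be the analogous transform), they can be turned into class probabilities with a softmax. The helper name logits_to_probabilities is hypothetical, and a MagicMock with invented logits stands in for a real SupervisedGnnExplanation.

```python
import math
from unittest.mock import MagicMock


def logits_to_probabilities(explanation):
    """Apply a numerically stable softmax to the explanation's logits."""
    logits = explanation.get_logits()
    shift = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Stand-in explanation with invented logits (illustration only).
explanation = MagicMock(name="SupervisedGnnExplanation")
explanation.get_logits.return_value = [2.0, 1.0, 0.1]

probabilities = logits_to_probabilities(explanation)
```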

class pypgx.api.mllib.SupervisedGraphWiseModel(java_graphwise_model, params=None)

Bases: pypgx.api.mllib._graphwise_model.GraphWiseModel

SupervisedGraphWise model object.

Return type

None

check_is_fitted()

Make sure the model is fitted.

Returns

None

Raises

RuntimeError if the model is not fitted

Return type

None

close()

Call destroy().

Returns

None

Return type

None

destroy()

Destroy this model object.

Returns

None

Return type

None

evaluate_labels(graph, vertices)

Evaluate (macro averaged) classification performance statistics for the specified vertices.

Parameters
  • graph (PgxGraph) – the graph

  • vertices – the vertices to evaluate

Returns

PgxFrame containing the metrics

Return type

PgxFrame

export()

Return a ModelStorer object which can be used to save the model.

Returns

ModelStorer object

Return type

ModelStorer

fit(graph)

Fit the model on a graph.

Parameters

graph (PgxGraph) – Graph to fit on

Returns

None

Return type

None

get_batch_size()

Get the batch size

Returns

batch size

Return type

int

get_class_weights()

Get the class weights.

Returns

a dictionary mapping classes to their weights.

Return type

dict

get_config()

Return the GraphWiseModelConfig object

Returns

the config

Return type

GraphWiseModelConfig

get_conv_layer_config()

Get the configuration objects for the convolutional layers

Returns

configurations

Return type

GraphWiseConvLayerConfig

get_edge_input_feature_dim()

Get the edges input feature dimension, that is, the dimension of all the input edge properties when concatenated

Returns

edges input feature dimension

Return type

int

get_edge_input_property_names()

Get the edges input feature names

Returns

edges input feature names

Return type

List[str]

get_layer_size()

Get the dimension of the embeddings

Returns

embedding dimension

Return type

int

get_learning_rate()

Get the initial learning rate

Returns

initial learning rate

Return type

float

get_loss_function()

Get the loss function name.

Returns

loss function name. Can be one of softmax_cross_entropy, sigmoid_cross_entropy, devnet

Return type

str

get_loss_function_class()

Get the loss function.

Returns

loss function

Return type

LossFunction

get_num_epochs()

Get the number of epochs to train the model

Returns

number of epochs to train the model

Return type

int

get_prediction_layer_configs()

Get the configuration objects for the prediction layers.

Returns

configuration of the prediction layer

Return type

GraphWisePredictionLayerConfig

get_seed()

Get the random seed

Returns

random seed

Return type

int

get_target_vertex_labels()

Get the target vertex labels

Returns

target vertex labels

Return type

List[str]

get_training_loss()

Get the final training loss

Returns

training loss

Return type

float

get_vertex_input_feature_dim()

Get the input feature dimension, that is, the dimension of all the input vertex properties when concatenated

Returns

input feature dimension

Return type

int

get_vertex_input_property_names()

Get the vertices input feature names

Returns

vertices input feature names

Return type

List[str]

get_vertex_target_property_name()

Get the target property name

Returns

target property name

Return type

str

gnn_explainer(num_optimization_steps=200, learning_rate=0.05, marginalize=False)

Configure and return the GnnExplainer object of this model that can be used to request explanations of predictions.

Parameters
  • num_optimization_steps (int, optional) – optimization steps for the explainer, defaults to 200

  • learning_rate (float, optional) – learning rate for the explainer, defaults to 0.05

  • marginalize (bool, optional) – marginalize the loss over features, defaults to False

Returns

SupervisedGnnExplainer object of this model

Return type

SupervisedGnnExplainer

infer_and_get_explanation(graph, vertex, num_optimization_steps=200, learning_rate=0.05, marginalize=False)

Perform inference on the specified vertex and generate an explanation that contains scores of how important each property and each vertex in the computation graph is for the prediction.

Parameters
  • graph (PgxGraph) – the graph

  • vertex (PgxVertex or int) – the vertex or its ID

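  • num_optimization_steps (int) – optimization steps for the explainer, defaults to 200

  • learning_rate (float) – learning rate for the explainer, defaults to 0.05

  • marginalize (bool) – marginalize the loss over features, defaults to False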

Returns

explanation containing feature importance and vertex importance.

Return type

SupervisedGnnExplanation

infer_embeddings(graph, vertices)

Infer the embeddings for the specified vertices

Parameters
Returns

PgxFrame containing the embeddings for each vertex

Return type

PgxFrame

infer_labels(graph, vertices)

Infer the labels for the specified vertices

Parameters
Returns

PgxFrame containing the labels for each vertex

Return type

PgxFrame

infer_logits(graph, vertices)

Infer the prediction logits for the specified vertices

Parameters
Returns

PgxFrame containing the logits for each vertex

Return type

PgxFrame

is_fitted()

Check if the model is fitted

Returns

True if the model is fitted, False otherwise

Return type

bool

store(path, key, overwrite=False)

Store the model in a file.

Parameters
  • path (str) – Path where to store the model

  • key (str) – Encryption key

  • overwrite (bool) – Whether or not to overwrite pre-existing file

Returns

None

Return type

None

update_is_fitted()

Determine whether the model is fitted.

This updates the internal state.

Returns

None

Return type

None
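
Inference methods are only meaningful on a fitted model, and check_is_fitted() is documented to raise RuntimeError otherwise. The sketch below uses it as a guard before inferring and evaluating labels. The helper name predict_and_score is hypothetical, MagicMock objects stand in for a real model and graph, and passing vertices as a list of integer ids is an assumption of the sketch.

```python
from unittest.mock import MagicMock


def predict_and_score(model, graph, vertices):
    """Infer labels and compute metrics, failing fast on an unfitted model."""
    model.check_is_fitted()  # documented to raise RuntimeError if not fitted
    labels = model.infer_labels(graph, vertices)
    metrics = model.evaluate_labels(graph, vertices)
    return labels, metrics


# Stand-in objects (assumptions for illustration only, not part of pypgx).
graph = MagicMock(name="PgxGraph")
fitted = MagicMock(name="SupervisedGraphWiseModel")
labels, metrics = predict_and_score(fitted, graph, [1, 2, 3])

# Simulate an unfitted model: the guard raises before any inference call.
unfitted = MagicMock(name="SupervisedGraphWiseModel")
unfitted.check_is_fitted.side_effect = RuntimeError("model is not fitted")
try:
    predict_and_score(unfitted, graph, [1, 2, 3])
    raised = False
except RuntimeError:
    raised = True
```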

class pypgx.api.mllib.UnsupervisedGraphWiseModel(java_graphwise_model, params=None)

Bases: pypgx.api.mllib._graphwise_model.GraphWiseModel

UnsupervisedGraphWise model object.

Return type

None

check_is_fitted()

Make sure the model is fitted.

Returns

None

Raises

RuntimeError if the model is not fitted

Return type

None

close()

Call destroy().

Returns

None

Return type

None

destroy()

Destroy this model object.

Returns

None

Return type

None

export()

Return a ModelStorer object which can be used to save the model.

Returns

ModelStorer object

Return type

ModelStorer

fit(graph)

Fit the model on a graph.

Parameters

graph (PgxGraph) – Graph to fit on

Returns

None

Return type

None

get_batch_size()

Get the batch size

Returns

batch size

Return type

int

get_config()

Return the GraphWiseModelConfig object

Returns

the config

Return type

GraphWiseModelConfig

get_conv_layer_config()

Get the configuration objects for the convolutional layers

Returns

configurations

Return type

GraphWiseConvLayerConfig

get_dgi_layer_config()

Get the configuration object for the DGI layer.

Returns

configuration

Return type

GraphWiseDgiLayerConfig

get_edge_input_feature_dim()

Get the edges input feature dimension, that is, the dimension of all the input edge properties when concatenated

Returns

edges input feature dimension

Return type

int

get_edge_input_property_names()

Get the edges input feature names

Returns

edges input feature names

Return type

List[str]

get_layer_size()

Get the dimension of the embeddings

Returns

embedding dimension

Return type

int

get_learning_rate()

Get the initial learning rate

Returns

initial learning rate

Return type

float

get_loss_function()

Get the loss function name.

Returns

loss function name. Can only be sigmoid_cross_entropy

Return type

str

get_num_epochs()

Get the number of epochs to train the model

Returns

number of epochs to train the model

Return type

int

get_seed()

Get the random seed

Returns

random seed

Return type

int

get_training_loss()

Get the final training loss

Returns

training loss

Return type

float

get_vertex_input_feature_dim()

Get the input feature dimension, that is, the dimension of all the input vertex properties when concatenated

Returns

input feature dimension

Return type

int

get_vertex_input_property_names()

Get the vertices input feature names

Returns

vertices input feature names

Return type

List[str]

gnn_explainer(num_optimization_steps=200, learning_rate=0.05, marginalize=False, num_clusters=50, num_samples=10000)

Configure and return the GnnExplainer object of this model that can be used to request explanations of predictions.

Parameters
  • num_optimization_steps (int, optional) – optimization steps for the explainer, defaults to 200

  • learning_rate (float, optional) – learning rate for the explainer, defaults to 0.05

  • marginalize (bool, optional) – marginalize the loss over features, defaults to False

  • num_clusters (int, optional) – number of clusters to use, defaults to 50

  • num_samples (int, optional) – number of samples to use, defaults to 10000

Returns

UnsupervisedGnnExplainer object of this model

Return type

UnsupervisedGnnExplainer

infer_and_get_explanation(graph, vertex, num_clusters=50, num_samples=10000, num_optimization_steps=200, learning_rate=0.05, marginalize=False)

Perform inference on the specified vertex and generate an explanation that contains scores of how important each property and each vertex in the computation graph is for the embedding's position relative to the embeddings of other vertices in the graph.

Parameters
  • graph (pypgx.api._pgx_graph.PgxGraph) – the graph

  • vertex (Union[pypgx.api._pgx_entity.PgxVertex, int]) – the vertex

  • num_clusters (int) – the number of semantic vertex clusters expected in the graph, must be greater than 1

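  • num_samples (int) – number of samples to use, defaults to 10000

  • num_optimization_steps (int) – optimization steps for the explainer, defaults to 200

  • learning_rate (float) – learning rate for the explainer, defaults to 0.05

  • marginalize (bool) – marginalize the loss over features, defaults to False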

Returns

explanation containing feature importance and vertex importance.

Return type

pypgx.api.mllib._gnn_explanation.GnnExplanation

infer_embeddings(graph, vertices)

Infer the embeddings for the specified vertices.

Parameters
  • graph (PgxGraph) – the graph

  • vertices – the vertices for which to infer embeddings

Returns

PgxFrame containing the embeddings for each vertex.

Return type

PgxFrame

is_fitted()

Check if the model is fitted

Returns

True if the model is fitted, False otherwise

Return type

bool

store(path, key, overwrite=False)

Store the model in a file.

Parameters
  • path (str) – Path where to store the model

  • key (str) – Encryption key

  • overwrite (bool) – Whether or not to overwrite pre-existing file

Returns

None

Return type

None

update_is_fitted()

Determine whether the model is fitted.

This updates the internal state.

Returns

None

Return type

None
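
Where check_is_fitted() raises on an unfitted model, is_fitted() returns a boolean and can drive a "fit only if needed" pattern. The following sketch shows that pattern for embeddings. The helper name embed_vertices is hypothetical, MagicMock objects stand in for a real model and graph, and passing vertices as integer ids is an assumption of the sketch.

```python
from unittest.mock import MagicMock


def embed_vertices(model, graph, vertices):
    """Fit the model only if it is not fitted yet, then infer embeddings."""
    if not model.is_fitted():
        model.fit(graph)
    return model.infer_embeddings(graph, vertices)


# Stand-in objects (assumptions for illustration only, not part of pypgx).
graph = MagicMock(name="PgxGraph")
model = MagicMock(name="UnsupervisedGraphWiseModel")
model.is_fitted.return_value = False  # simulate a freshly built model

embeddings = embed_vertices(model, graph, [10, 11])
```

Fitting and inferring on the same graph keeps the sketch short; in practice the inference graph may differ from the training graph.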

class pypgx.api.mllib._model_utils.ModelStorer(model)

Bases: object

ModelStorer object.

Parameters

model (Union[SupervisedGraphWiseModel, Pg2vecModel, UnsupervisedGraphWiseModel, DeepWalkModel]) –

Return type

None

db(model_store, model_name, username=None, password=None, jdbc_url=None, model_description=None, overwrite=False, keystore_alias=None, schema=None)

Store a model to a database.

Parameters
  • username (Optional[str]) – username in database

  • password (Optional[str]) – password of username in database

  • model_store (str) – model store in database

  • model_name (str) – name of the model to store

  • jdbc_url (Optional[str]) – jdbc url of database

  • model_description (Optional[str]) – description of model

  • overwrite (bool) – boolean value for overwriting or not

  • keystore_alias (Optional[str]) – the keystore alias to get the password in the keystore

  • schema (Optional[str]) – the schema of the model store in database

Return type

None

file(path, key, overwrite=False)

Store an encrypted model to a file.

Parameters
  • path (str) – path to store model

  • key (str) – key used for encryption

  • overwrite (bool) – boolean value for overwriting or not

Return type

None
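
A ModelStorer is obtained from a model's export() method and then directed at a file or a database. The sketch below wraps both targets. The helper names, the file path, the key, the JDBC URL, and the credential values are all placeholders invented for illustration, and a MagicMock stands in for a real model so the snippet runs without a PGX server or database.

```python
from unittest.mock import MagicMock


def save_to_file(model, path, key, overwrite=False):
    """Export the model and write it, encrypted with `key`, to `path`."""
    model.export().file(path, key, overwrite=overwrite)


def save_to_db(model, model_store, model_name, jdbc_url, username,
               keystore_alias, description=None, overwrite=False):
    """Export the model and store it in a database-backed model store."""
    model.export().db(
        model_store,
        model_name,
        username=username,
        jdbc_url=jdbc_url,
        model_description=description,
        overwrite=overwrite,
        keystore_alias=keystore_alias,
    )


# Stand-in model; every literal below is a placeholder (illustration only).
model = MagicMock(name="DeepWalkModel")
save_to_file(model, "/tmp/deepwalk.model", key="placeholder-key")
save_to_db(model, "my_store", "deepwalk_v1",
           "jdbc:oracle:thin:@host:1521/service", "graph_user", "db_alias")
```

Using keystore_alias instead of password keeps the secret out of the calling code; which of the two a deployment requires is not specified here.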

class pypgx.api.mllib._model_utils.ModelLoader(analyst, java_model_loader, wrapper, java_class)

Bases: object

ModelLoader object.

Parameters
  • analyst (Analyst) –

  • java_model_loader (Any) –

  • wrapper (Callable) –

  • java_class (str) –

Return type

None

db(model_store, model_name, username=None, password=None, jdbc_url=None, keystore_alias=None, schema=None)

Return a model stored in a database.

Parameters
  • username (Optional[str]) – username in database

  • password (Optional[str]) – password of username in database

  • model_store (str) – model store in database

  • model_name (str) – name of the model to load

  • jdbc_url (Optional[str]) – jdbc url of database

  • keystore_alias (Optional[str]) – the keystore alias to get the password in the keystore

  • schema (Optional[str]) – the schema of the model store in database

Returns

model stored in database.

Return type

Union[SupervisedGraphWiseModel, Pg2vecModel, UnsupervisedGraphWiseModel, DeepWalkModel]

file(path, key)

Return an encrypted model stored in a file.

Parameters
  • path (str) – path of stored model

  • key (str) – key used for encryption

Returns

model stored in file.

Return type

Union[SupervisedGraphWiseModel, Pg2vecModel, UnsupervisedGraphWiseModel, DeepWalkModel]