MLlib¶

Graph machine learning tools for use with PGX.

class pypgx.api.mllib.CorruptionFunction(java_corruption_function)¶

Bases: object

Abstract corruption function that generates the corrupted subgraph for DGI (Deep Graph Infomax).

Return type: None

class pypgx.api.mllib.DeepWalkModel(java_deepwalk_model)¶

Bases: pypgx.api._pgx_context_manager.PgxContextManager

DeepWalk model object.

Return type: None

close()¶

Call destroy().

Return type: None

compute_similars(v, k)¶

Compute the top-k similar vertices for a given vertex.

Parameters

v (Union[int, str, List[int], List[str]]) – id of the vertex or list of vertex ids for which to compute the similar vertices
k (int) – number of similar vertices to return

Return type

pypgx.api.frames._pgx_frame.PgxFrame

destroy()¶

Destroy this model object.

Return type: None

export()¶

Return a ModelStorer object which can be used to save the model.

Returns: ModelStorer object
Return type: ModelStorer

fit(graph)¶

Fit the model on a graph.

Parameters: graph (pypgx.api._pgx_graph.PgxGraph) – Graph to fit on
Return type: None

store(path, key, overwrite=False)¶

Store the model in a file.

Parameters

path (str) – Path where to store the model
key (Optional[str]) – Encryption key
overwrite (bool) – Whether or not to overwrite pre-existing file

Return type

None

property trained_vectors: pypgx.api.frames._pgx_frame.PgxFrame¶

Get the trained vertex vectors for the current DeepWalk model.

Returns: PgxFrame object with the trained vertex vectors
Return type: PgxFrame
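
The methods above are commonly combined as fit, then query, then read the trained vectors. The following is a minimal sketch rather than a prescribed workflow: `deepwalk_similars` is a hypothetical helper, and `model` and `graph` are assumed to be an already constructed DeepWalkModel and a loaded PgxGraph (how they are obtained is not covered in this section).

```python
def deepwalk_similars(model, graph, vertex_id, k=10):
    """Fit `model` on `graph`, then return similar vertices and trained vectors.

    Hypothetical helper: `model` is assumed to expose the DeepWalkModel
    members documented above (fit, compute_similars, trained_vectors).
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    model.fit(graph)
    # compute_similars accepts a single vertex id or a list of vertex ids.
    similar = model.compute_similars(vertex_id, k)
    # trained_vectors is a property, so it is read without parentheses.
    vectors = model.trained_vectors
    return similar, vectors
```

The helper leaves the model open; call `close()` or `destroy()` once the returned frames are no longer needed.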

class pypgx.api.mllib.DevNetLoss(confidence_margin, anomaly_property_value)¶

Bases: pypgx.api.mllib._loss_function.LossFunction

Deviation loss for anomaly detection

Parameters

confidence_margin (float) – confidence margin of the loss function
anomaly_property_value (bool) – the anomaly property value

Return type

None

get_anomaly_property_value()¶

Get Anomaly Property Value.

Returns: the anomaly property value
Return type: Any

get_confidence_margin()¶

Get confidence margin of the loss function.

Returns: the confidence margin
Return type: float

class pypgx.api.mllib.GnnExplanation(java_gnn_explanation)¶

Bases: object

GnnExplanation object

Return type: None

get_embedding()¶

Get the inferred embedding of the specified vertex.

Returns: the embedding
Return type: List[float]

get_importance_graph()¶

Get the importance graph, that is, the computation graph with an additional vertex property indicating vertex importance. The additional importance property can be retrieved via get_vertex_importance_property().

Returns: the importance graph
Return type: PgxGraph

get_vertex_feature_importance()¶

Get the feature importances as a map from property to importance value.

Returns: the feature importances.
Return type: Dict[pypgx.api._property.VertexProperty, float]

get_vertex_importance_property()¶

Get the vertex property that contains the computed vertex importance.

Returns: the vertex importance property
Return type: pypgx.api._property.VertexProperty
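
Because get_vertex_feature_importance() returns a plain mapping, the most influential properties can be ranked with ordinary Python. This is a small sketch under assumptions: `rank_feature_importances` is a hypothetical helper, and `explanation` is any object exposing the method documented above.

```python
def rank_feature_importances(explanation, top_n=None):
    """Return (property, importance) pairs, most important first.

    Hypothetical helper built on get_vertex_feature_importance().
    """
    importances = explanation.get_vertex_feature_importance()
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    # top_n=None keeps every property; otherwise keep only the first top_n.
    return ranked if top_n is None else ranked[:top_n]
```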

class pypgx.api.mllib.GraphWiseConvLayerConfig(java_config, params)¶

Bases: object

GraphWise conv layer configuration.

Return type: None

class pypgx.api.mllib.GraphWiseDgiLayerConfig(java_config, params)¶

Bases: object

GraphWise DGI layer configuration.

Return type: None

get_corruption_function()¶

Return the corruption function

Return type: pypgx.api.mllib._corruption_function.PermutationCorruption

get_discriminator()¶

Return the discriminator

Return type: str

get_readout_function()¶

Return the readout function

Return type: str

set_corruption_function(corruption_function)¶

Set the corruption function

Parameters: corruption_function (CorruptionFunction) – the corruption function. Supported currently: PermutationCorruption

set_discriminator(discriminator)¶

Set the discriminator

Parameters: discriminator (str) – The discriminator function. Supported currently: ‘bilinear’
Return type: None

set_readout_function(readout_function)¶

Set the readout function

Parameters: readout_function (str) – The readout function. Supported currently: ‘mean’
Return type: None
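
The setters above accept only the listed values ('bilinear' and 'mean'), so it can help to validate input before calling them. The sketch below is illustrative only: `configure_dgi_layer` and the two constant sets are hypothetical names, and `config` is assumed to be an existing GraphWiseDgiLayerConfig.

```python
# Values listed as "Supported currently" in the setter documentation above.
SUPPORTED_DISCRIMINATORS = {"bilinear"}
SUPPORTED_READOUT_FUNCTIONS = {"mean"}


def configure_dgi_layer(config, discriminator="bilinear", readout_function="mean"):
    """Validate the arguments, apply them, and return what the getters report."""
    if discriminator not in SUPPORTED_DISCRIMINATORS:
        raise ValueError(f"unsupported discriminator: {discriminator!r}")
    if readout_function not in SUPPORTED_READOUT_FUNCTIONS:
        raise ValueError(f"unsupported readout function: {readout_function!r}")
    config.set_discriminator(discriminator)
    config.set_readout_function(readout_function)
    # The exact string form returned by the getters is not specified here,
    # so the values are passed through unchanged.
    return config.get_discriminator(), config.get_readout_function()
```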

class pypgx.api.mllib.GraphWiseModelConfig(java_graphwise_model_config)¶

Bases: object

GraphWise model configuration class.

Return type: None

get_conv_layer_configs()¶

Return a list of conv layer configs

Return type: List[pypgx.api.mllib._graphwise_conv_layer_config.GraphWiseConvLayerConfig]

set_batch_size(batch_size)¶

Set the batch size

Parameters: batch_size (int) – batch size
Return type: None

set_edge_input_feature_dim(edge_input_feature_dim)¶

Set the edge input feature dimension

Parameters: edge_input_feature_dim (int) – edge input feature dimension
Return type: None

set_embedding_dim(embedding_dim)¶

Set the embedding dimension

Parameters: embedding_dim (int) – embedding dimension
Return type: None

set_fitted(fitted)¶

Set the fitted flag

Parameters: fitted (bool) – fitted flag
Return type: None

set_input_feature_dim(input_feature_dim)¶

Set the input feature dimension

Parameters: input_feature_dim (int) – input feature dimension
Return type: None

set_learning_rate(learning_rate)¶

Set the learning rate

Parameters: learning_rate (float) – initial learning rate
Return type: None

set_num_epochs(num_epochs)¶

Set the number of epochs

Parameters: num_epochs (int) – number of epochs
Return type: None

set_seed(seed)¶

Set the seed

Parameters: seed (int) – seed
Return type: None

set_shuffle(shuffle)¶

Set the shuffling flag

Parameters: shuffle (bool) – shuffling flag
Return type: None

set_standarize(standardize)¶

Set the standardize flag

Parameters: standardize (bool) – standardize flag
Return type: None

set_training_loss(training_loss)¶

Set the training loss

Parameters: training_loss (float) – training loss
Return type: None

set_weight_decay(weight_decay)¶

Set the weight decay

Parameters: weight_decay (float) – weight decay
Return type: None

class pypgx.api.mllib.GraphWisePredictionLayerConfig(java_config, params)¶

Bases: object

GraphWise prediction layer configuration.

Return type: None

class pypgx.api.mllib.ModelRepository(java_generic_model_repository)¶

Bases: object

ModelRepository object that exposes CRUD operations on model stores and on the models within these model stores.

create(model_store_name)¶

Create a new model store.

Parameters: model_store_name (str) – the name of the model store
Returns: None
Return type: None

delete_model(model_store_name, model_name)¶

Delete the model in the specified model store with the given model name.

Parameters

model_store_name (str) – the name of the model store
model_name (str) – the name under which the model was stored

Returns

None

Return type

None

delete_model_store(model_store_name)¶

Delete a model store.

Parameters: model_store_name (str) – the name of the model store
Returns: None
Return type: None

get_model_description(model_store_name, model_name)¶

Retrieve the description of the model in the specified model store, with the given model name.

Parameters

model_store_name (str) – the name of the model store
model_name (str) – the name under which the model was stored

Returns

A string containing the description that was stored with the model

Return type

str

list_model_stores_names()¶

List the names of all model stores in the model repository.

Returns: List of names.
Return type: List[str]

list_model_stores_names_matching(regex)¶

List the names of all model stores in the model repository that match the regex.

Parameters: regex (str) – a regex in form of a string.
Returns: List of matching names.
Return type: List[str]

list_models(model_store_name)¶

List the models present in the model store with the given name.

Parameters: model_store_name (str) – the name of the model store (non-prefixed)
Returns: List of model names.
Return type: List[str]
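
The listing methods can be combined to locate a model across stores. The following is a sketch, not part of the API: `find_model_stores_containing` is a hypothetical helper, and it assumes that the store names returned by list_model_stores_names_matching() can be passed directly to list_models().

```python
def find_model_stores_containing(repository, model_name, store_regex=".*"):
    """Return the sorted names of matching model stores that hold `model_name`.

    Hypothetical helper: `repository` is assumed to be a ModelRepository.
    """
    store_names = repository.list_model_stores_names_matching(store_regex)
    return sorted(
        name for name in store_names
        if model_name in repository.list_models(name)
    )
```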

class pypgx.api.mllib.ModelRepositoryBuilder(java_generic_model_repository_builder)¶

Bases: object

ModelRepositoryBuilder object that can be used to configure the connection to a model repository.

db(username=None, password=None, jdbc_url=None, keystore_alias=None, schema=None)¶

Connect to a model repository backed by a database.

Parameters

username (Optional[str]) – username in database
password (Optional[str]) – password of username in database
jdbc_url (Optional[str]) – jdbc url of database
keystore_alias (Optional[str]) – the keystore alias to get the password in the keystore
schema (Optional[str]) – the schema of the model store in database

Returns

A model repository configured to connect to a database.

Return type

pypgx.api.mllib._model_repo.ModelRepository
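
Since every argument of db() is optional, connection settings are convenient to keep in a mapping and pass as keyword arguments. A minimal sketch, assuming `builder` is an existing ModelRepositoryBuilder; `open_db_repository` and `_DB_OPTIONS` are hypothetical names introduced for illustration.

```python
# Keyword arguments accepted by ModelRepositoryBuilder.db(), as documented above.
_DB_OPTIONS = ("username", "password", "jdbc_url", "keystore_alias", "schema")


def open_db_repository(builder, settings):
    """Call builder.db() with `settings`, rejecting keys db() does not accept."""
    unknown = sorted(set(settings) - set(_DB_OPTIONS))
    if unknown:
        raise ValueError(f"unsupported db() options: {unknown}")
    return builder.db(**settings)
```

Rejecting unknown keys up front turns a misspelled option such as `jdbcurl` into a clear error message rather than a generic `TypeError` from the call itself.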

class pypgx.api.mllib.PermutationCorruption(java_permutation_corruption)¶

Bases: pypgx.api.mllib._corruption_function.CorruptionFunction

Permutation function that shuffles the nodes to generate the corrupted subgraph for DGI.

Return type: None

class pypgx.api.mllib.Pg2vecModel(java_pg2vec_model)¶

Bases: pypgx.api._pgx_context_manager.PgxContextManager

Pg2Vec model object.

Return type: None

close()¶

Call destroy().

Return type: None

compute_similars(graphlet_id, k)¶

Compute the top-k similar graphlets for a list of input graphlets.

Parameters

graphlet_id (Union[Iterable[Union[int, str]], int, str]) – a graphlet ID or an iterable of graphlet IDs
k (int) – number of similars to return

Return type

pypgx.api.frames._pgx_frame.PgxFrame

destroy()¶

Destroy this model object.

Return type: None

export()¶

Return a ModelStorer object which can be used to save the model.

Returns: ModelStorer object
Return type: pypgx.api.mllib._model_utils.ModelStorer

fit(graph)¶

Fit the model on a graph.

Parameters: graph (pypgx.api._pgx_graph.PgxGraph) – Graph to fit on
Return type: None

infer_graphlet_vector(graph)¶

Return the inferred vector of the input graphlet as a PgxFrame.

Parameters: graph (pypgx.api._pgx_graph.PgxGraph) – graphlet for which to infer a vector
Return type: pypgx.api.frames._pgx_frame.PgxFrame

infer_graphlet_vector_batched(graph)¶

Return the inferred vectors of the input graphlets as a PgxFrame.

Parameters: graph (pypgx.api._pgx_graph.PgxGraph) – graphlets (stored as a single graph, distinguished by graphlet ID) for which to infer vectors
Return type: pypgx.api.frames._pgx_frame.PgxFrame

store(path, key, overwrite=False)¶

Store the model in a file.

Parameters

path (str) – Path where to store the model
key (Optional[str]) – Encryption key
overwrite (bool) – Whether or not to overwrite pre-existing file

Return type

None

property trained_graphlet_vectors: pypgx.api.frames._pgx_frame.PgxFrame¶

Get the trained graphlet vectors for the current pg2vec model.

Returns: PgxFrame containing the trained graphlet vectors
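
compute_similars() accepts either one graphlet ID or an iterable of them. The sketch below normalizes both forms to a list before querying. It is illustrative only: `similar_graphlets` is a hypothetical helper, and `model` and `graph` are assumed to be an existing Pg2vecModel and PgxGraph.

```python
def similar_graphlets(model, graph, graphlet_ids, k=5):
    """Fit `model` on `graph` and return the top-k similar graphlets."""
    # A single int or str ID is wrapped so the call always receives a list.
    if isinstance(graphlet_ids, (int, str)):
        graphlet_ids = [graphlet_ids]
    model.fit(graph)
    return model.compute_similars(list(graphlet_ids), k)
```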

class pypgx.api.mllib.SigmoidCrossEntropyLoss¶

Bases: pypgx.api.mllib._loss_function.LossFunction

Sigmoid Cross Entropy loss for binary classification

class pypgx.api.mllib.SoftmaxCrossEntropyLoss¶

Bases: pypgx.api.mllib._loss_function.LossFunction

Softmax Cross Entropy loss for multi-class classification

Return type: None

class pypgx.api.mllib.SupervisedGnnExplanation(java_supervised_gnn_explanation, bool_label)¶

Bases: pypgx.api.mllib._gnn_explanation.GnnExplanation

SupervisedGnnExplanation object

Parameters: bool_label (bool) –
Return type: None

get_embedding()¶

Get the inferred embedding of the specified vertex.

Returns: the embedding
Return type: List[float]

get_importance_graph()¶

Get the importance graph, that is, the computation graph with an additional vertex property indicating vertex importance. The additional importance property can be retrieved via get_vertex_importance_property().

Returns: the importance graph
Return type: PgxGraph

get_label()¶

Get the inferred label of the specified vertex.

Returns: the label
Return type: Any

get_logits()¶

Get the inferred logits of the specified vertex.

Returns: the logits
Return type: List[float]

get_vertex_feature_importance()¶

Get the feature importances as a map from property to importance value.

Returns: the feature importances.
Return type: Dict[pypgx.api._property.VertexProperty, float]

get_vertex_importance_property()¶

Get the vertex property that contains the computed vertex importance.

Returns: the vertex importance property
Return type: pypgx.api._property.VertexProperty
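
get_logits() returns raw scores rather than probabilities. Assuming the model was trained with a softmax cross-entropy loss, a softmax over the logits gives per-class probabilities. The helpers below are a sketch under that assumption; `logits_to_probabilities` and `summarize_explanation` are hypothetical names and not part of the API.

```python
import math


def logits_to_probabilities(logits):
    """Numerically stable softmax over a list of logits."""
    if not logits:
        raise ValueError("logits must not be empty")
    peak = max(logits)
    # Subtracting the maximum keeps exp() from overflowing on large logits.
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def summarize_explanation(explanation):
    """Collect the inferred label and softmax probabilities in one dict."""
    return {
        "label": explanation.get_label(),
        "probabilities": logits_to_probabilities(explanation.get_logits()),
    }
```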

class pypgx.api.mllib.SupervisedGraphWiseModel(java_graphwise_model, params=None)¶

Bases: pypgx.api.mllib._graphwise_model.GraphWiseModel

SupervisedGraphWise model object.

Return type: None

check_is_fitted()¶

Make sure the model is fitted.

Returns: None
Raises: RuntimeError if the model is not fitted
Return type: None

close()¶

Call destroy().

Returns: None
Return type: None

destroy()¶

Destroy this model object.

Returns: None
Return type: None

evaluate_labels(graph, vertices)¶

Evaluate (macro averaged) classification performance statistics for the specified vertices.

Parameters

graph (PgxGraph) – the graph
vertices (Union[Iterable[pypgx.api._pgx_entity.PgxVertex], Iterable[int]]) – the vertices to evaluate on. Can be a list of vertices or their IDs.

Returns

PgxFrame containing the metrics

Return type

PgxFrame

export()¶

Return a ModelStorer object which can be used to save the model.

Returns: ModelStorer object
Return type: ModelStorer

fit(graph)¶

Fit the model on a graph.

Parameters: graph (PgxGraph) – Graph to fit on
Returns: None
Return type: None

get_batch_size()¶

Get the batch size

Returns: batch size
Return type: int

get_class_weights()¶

Get the class weights.

Returns: a dictionary mapping classes to their weights.
Return type: dict

get_config()¶

Return the GraphWiseModelConfig object

Returns: the config
Return type: GraphWiseModelConfig

get_conv_layer_config()¶

Get the configuration objects for the convolutional layers

Returns: configurations
Return type: GraphWiseConvLayerConfig

get_edge_input_feature_dim()¶

Get the edges input feature dimension, that is, the dimension of all the input edge properties when concatenated

Returns: edges input feature dimension
Return type: int

get_edge_input_property_names()¶

Get the edges input feature names

Returns: edges input feature names
Return type: list(str)

get_layer_size()¶

Get the dimension of the embeddings

Returns: embedding dimension
Return type: int

get_learning_rate()¶

Get the initial learning rate

Returns: initial learning rate
Return type: float

get_loss_function()¶

Get the loss function name.

Returns: loss function name. Can be one of softmax_cross_entropy, sigmoid_cross_entropy, devnet
Return type: str

get_loss_function_class()¶

Get the loss function.

Returns: loss function
Return type: LossFunction

get_num_epochs()¶

Get the number of epochs to train the model

Returns: number of epochs to train the model
Return type: int

get_prediction_layer_configs()¶

Get the configuration objects for the prediction layers.

Returns: configuration of the prediction layer
Return type: GraphWisePredictionLayerConfig

get_seed()¶

Get the random seed

Returns: random seed
Return type: int

get_target_vertex_labels()¶

Get the target vertex labels

Returns: target vertex labels
Return type: List[str]

get_training_loss()¶

Get the final training loss

Returns: training loss
Return type: float

get_vertex_input_feature_dim()¶

Get the input feature dimension, that is, the dimension of all the input vertex properties when concatenated

Returns: input feature dimension
Return type: int

get_vertex_input_property_names()¶

Get the vertices input feature names

Returns: vertices input feature names
Return type: list(str)

get_vertex_target_property_name()¶

Get the target property name

Returns: target property name
Return type: str

gnn_explainer(num_optimization_steps=200, learning_rate=0.05, marginalize=False)¶

Configure and return the GnnExplainer object of this model that can be used to request explanations of predictions.

Parameters

num_optimization_steps (int, optional) – optimization steps for the explainer, defaults to 200
learning_rate (float, optional) – learning rate for the explainer, defaults to 0.05
marginalize (bool, optional) – marginalize the loss over features, defaults to False

Returns

SupervisedGnnExplainer object of this model

Return type

SupervisedGnnExplainer

infer_and_get_explanation(graph, vertex, num_optimization_steps=200, learning_rate=0.05, marginalize=False)¶

Perform inference on the specified vertex and generate an explanation that contains scores of how important each property and each vertex in the computation graph is for the prediction.

Parameters

graph (PgxGraph) – the graph
vertex (PgxVertex or int) – the vertex or its ID
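num_optimization_steps (int) – optimization steps for the explainer, defaults to 200
learning_rate (float) – learning rate for the explainer, defaults to 0.05
marginalize (bool) – marginalize the loss over features, defaults to False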

Returns

explanation containing feature importance and vertex importance.

Return type

SupervisedGnnExplanation

infer_embeddings(graph, vertices)¶

Infer the embeddings for the specified vertices

Parameters

graph (PgxGraph) – the graph
vertices (Union[Iterable[pypgx.api._pgx_entity.PgxVertex], Iterable[int]]) – the vertices to infer embeddings for. Can be a list of vertices or their IDs.

Returns

PgxFrame containing the embeddings for each vertex

Return type

PgxFrame

infer_labels(graph, vertices)¶

Infer the labels for the specified vertices

Parameters

graph (PgxGraph) – the graph
vertices (Union[Iterable[pypgx.api._pgx_entity.PgxVertex], Iterable[int]]) – the vertices to infer labels for. Can be a list of vertices or their IDs.

Returns

PgxFrame containing the labels for each vertex

Return type

PgxFrame

infer_logits(graph, vertices)¶

Infer the prediction logits for the specified vertices

Parameters

graph (PgxGraph) – the graph
vertices (Union[Iterable[pypgx.api._pgx_entity.PgxVertex], Iterable[int]]) – the vertices to infer logits for. Can be a list of vertices or their IDs.

Returns

PgxFrame containing the logits for each vertex

Return type

PgxFrame

is_fitted()¶

Check if the model is fitted

Returns: True if the model is fitted, False otherwise
Return type: bool

store(path, key, overwrite=False)¶

Store the model in a file.

Parameters

path (str) – Path where to store the model
key (str) – Encryption key
overwrite (bool) – Whether or not to overwrite pre-existing file

Returns

None

Return type

None

update_is_fitted()¶

Determine whether the model is fitted.

This updates the internal state.

Returns: None
Return type: None
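
A common sequence with a SupervisedGraphWiseModel is to fit it if needed, infer labels, and then evaluate them. The following is a minimal sketch, not the library's prescribed workflow: `fit_and_evaluate` is a hypothetical helper, and `model`, `train_graph`, `test_graph`, and `test_vertices` are assumed to exist already.

```python
def fit_and_evaluate(model, train_graph, test_graph, test_vertices):
    """Fit `model` if necessary, then infer and evaluate labels.

    Returns a (labels, metrics) tuple; both elements are whatever
    infer_labels() and evaluate_labels() return (PgxFrame per the docs above).
    """
    # Skip training when the model has already been fitted.
    if not model.is_fitted():
        model.fit(train_graph)
    labels = model.infer_labels(test_graph, test_vertices)
    metrics = model.evaluate_labels(test_graph, test_vertices)
    return labels, metrics
```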

class pypgx.api.mllib.UnsupervisedGraphWiseModel(java_graphwise_model, params=None)¶

Bases: pypgx.api.mllib._graphwise_model.GraphWiseModel

UnsupervisedGraphWise model object.

Return type: None

check_is_fitted()¶

Make sure the model is fitted.

Returns: None
Raises: RuntimeError if the model is not fitted
Return type: None

close()¶

Call destroy().

Returns: None
Return type: None

destroy()¶

Destroy this model object.

Returns: None
Return type: None

export()¶

Return a ModelStorer object which can be used to save the model.

Returns: ModelStorer object
Return type: ModelStorer

fit(graph)¶

Fit the model on a graph.

Parameters: graph (PgxGraph) – Graph to fit on
Returns: None
Return type: None

get_batch_size()¶

Get the batch size

Returns: batch size
Return type: int

get_config()¶

Return the GraphWiseModelConfig object

Returns: the config
Return type: GraphWiseModelConfig

get_conv_layer_config()¶

Get the configuration objects for the convolutional layers

Returns: configurations
Return type: GraphWiseConvLayerConfig

get_dgi_layer_config()¶

Get the configuration object for the DGI layer.

Returns: configuration
Return type: GraphWiseDgiLayerConfig

get_edge_input_feature_dim()¶

Get the edges input feature dimension, that is, the dimension of all the input edge properties when concatenated

Returns: edges input feature dimension
Return type: int

get_edge_input_property_names()¶

Get the edges input feature names

Returns: edges input feature names
Return type: list(str)

get_layer_size()¶

Get the dimension of the embeddings

Returns: embedding dimension
Return type: int

get_learning_rate()¶

Get the initial learning rate

Returns: initial learning rate
Return type: float

get_loss_function()¶

Get the loss function name.

Returns: loss function name. Can only be sigmoid_cross_entropy
Return type: str

get_num_epochs()¶

Get the number of epochs to train the model

Returns: number of epochs to train the model
Return type: int

get_seed()¶

Get the random seed

Returns: random seed
Return type: int

get_training_loss()¶

Get the final training loss

Returns: training loss
Return type: float

get_vertex_input_feature_dim()¶

Get the input feature dimension, that is, the dimension of all the input vertex properties when concatenated

Returns: input feature dimension
Return type: int

get_vertex_input_property_names()¶

Get the vertices input feature names

Returns: vertices input feature names
Return type: list(str)

gnn_explainer(num_optimization_steps=200, learning_rate=0.05, marginalize=False, num_clusters=50, num_samples=10000)¶

Configure and return the GnnExplainer object of this model that can be used to request explanations of predictions.

Parameters

num_optimization_steps (int, optional) – optimization steps for the explainer, defaults to 200
learning_rate (float, optional) – learning rate for the explainer, defaults to 0.05
marginalize (bool, optional) – marginalize the loss over features, defaults to False
num_clusters (int, optional) – number of clusters to use, defaults to 50
num_samples (int, optional) – number of samples to use, defaults to 10000

Returns

UnsupervisedGnnExplainer object of this model

Return type

UnsupervisedGnnExplainer

infer_and_get_explanation(graph, vertex, num_clusters=50, num_samples=10000, num_optimization_steps=200, learning_rate=0.05, marginalize=False)¶

Perform inference on the specified vertex and generate an explanation that contains scores of how important each property and each vertex in the computation graph is for the embedding's position relative to the embeddings of other vertices in the graph.

Parameters

graph (pypgx.api._pgx_graph.PgxGraph) – the graph
vertex (Union[pypgx.api._pgx_entity.PgxVertex, int]) – the vertex
num_clusters (int) – the number of semantic vertex clusters expected in the graph, must be greater than 1
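num_samples (int) – number of samples to use, defaults to 10000
num_optimization_steps (int) – optimization steps for the explainer, defaults to 200
learning_rate (float) – learning rate for the explainer, defaults to 0.05
marginalize (bool) – marginalize the loss over features, defaults to False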

Returns

explanation containing feature importance and vertex importance.

Return type

pypgx.api.mllib._gnn_explanation.GnnExplanation
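
Since num_clusters must be greater than 1 and the model must be fitted, both conditions can be checked before requesting an explanation. A sketch under assumptions: `explain_embedding` is a hypothetical helper, and `model`, `graph`, and `vertex` are assumed to be an existing UnsupervisedGraphWiseModel, PgxGraph, and vertex (or vertex ID).

```python
def explain_embedding(model, graph, vertex, num_clusters=50, num_samples=10000):
    """Return (embedding, feature_importances) for `vertex`."""
    if num_clusters <= 1:
        raise ValueError("num_clusters must be greater than 1")
    # check_is_fitted() raises RuntimeError when the model is not fitted.
    model.check_is_fitted()
    explanation = model.infer_and_get_explanation(
        graph, vertex, num_clusters=num_clusters, num_samples=num_samples
    )
    return explanation.get_embedding(), explanation.get_vertex_feature_importance()
```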

infer_embeddings(graph, vertices)¶

Infer the embeddings for the specified vertices.

Parameters

graph (pypgx.api._pgx_graph.PgxGraph) – the graph
vertices (Union[Iterable[pypgx.api._pgx_entity.PgxVertex], Iterable[int]]) – the vertices to infer embeddings for. Can be a list of vertices or their IDs.

Returns

PgxFrame containing the embeddings for each vertex.

Return type

PgxFrame

is_fitted()¶

Check if the model is fitted

Returns: True if the model is fitted, False otherwise
Return type: bool

store(path, key, overwrite=False)¶

Store the model in a file.

Parameters

path (str) – Path where to store the model
key (str) – Encryption key
overwrite (bool) – Whether or not to overwrite pre-existing file

Returns

None

Return type

None

update_is_fitted()¶

Determine whether the model is fitted.

This updates the internal state.

Returns: None
Return type: None

class pypgx.api.mllib._model_utils.ModelStorer(model)¶

Bases: object

ModelStorer object.

Parameters: model (Union[SupervisedGraphWiseModel, Pg2vecModel, UnsupervisedGraphWiseModel, DeepWalkModel]) –
Return type: None

db(model_store, model_name, username=None, password=None, jdbc_url=None, model_description=None, overwrite=False, keystore_alias=None, schema=None)¶

Store a model to a database.

Parameters

username (Optional[str]) – username in database
password (Optional[str]) – password of username in database
model_store (str) – model store in database
model_name (str) – name of the model to store
jdbc_url (Optional[str]) – jdbc url of database
model_description (Optional[str]) – description of model
overwrite (bool) – boolean value for overwriting or not
keystore_alias (Optional[str]) – the keystore alias to get the password in the keystore
schema (Optional[str]) – the schema of the model store in database

Return type

None

file(path, key, overwrite=False)¶

Store an encrypted model to a file.

Parameters

path (str) – path to store model
key (str) – key used for encryption
overwrite (bool) – boolean value for overwriting or not

Return type

None

class pypgx.api.mllib._model_utils.ModelLoader(analyst, java_model_loader, wrapper, java_class)¶

Bases: object

ModelLoader object.

Parameters

analyst (Analyst) –
java_model_loader (Any) –
wrapper (Callable) –
java_class (str) –

Return type

None

db(model_store, model_name, username=None, password=None, jdbc_url=None, keystore_alias=None, schema=None)¶

Return a model stored in a database.

Parameters

username (Optional[str]) – username in database
password (Optional[str]) – password of username in database
model_store (str) – model store in database
model_name (str) – name of the model to load
jdbc_url (Optional[str]) – jdbc url of database
keystore_alias (Optional[str]) – the keystore alias to get the password in the keystore
schema (Optional[str]) – the schema of the model store in database

Returns

model stored in database.

Return type

Union[SupervisedGraphWiseModel, Pg2vecModel, UnsupervisedGraphWiseModel, DeepWalkModel]

file(path, key)¶

Return an encrypted model stored in a file.

Parameters

path (str) – path of stored model
key (str) – key used for encryption

Returns

model stored in file.

Return type

Union[SupervisedGraphWiseModel, Pg2vecModel, UnsupervisedGraphWiseModel, DeepWalkModel]
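
ModelStorer.file() and ModelLoader.file() form a save and load pair that uses the same path and key. The following is a minimal sketch: `save_model_to_file` and `load_model_from_file` are hypothetical helpers, `model` is assumed to expose export() as documented for the model classes above, and `loader` is assumed to be an already obtained ModelLoader (how to obtain one is not covered in this section).

```python
def save_model_to_file(model, path, key, overwrite=False):
    """Store `model` as an encrypted file via its ModelStorer."""
    # export() returns a ModelStorer; file() writes the encrypted model.
    model.export().file(path, key, overwrite)


def load_model_from_file(loader, path, key):
    """Load an encrypted model from `path` using the key it was stored with."""
    return loader.file(path, key)
```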