MLlib¶
Graph machine learning tools for use with PGX.
- class pypgx.api.mllib.CorruptionFunction(java_corruption_function)¶
Bases:
object
Abstract corruption function that generates the corrupted subgraph for DGI.
- Return type
None
- class pypgx.api.mllib.DeepWalkModel(java_deepwalk_model)¶
Bases:
pypgx.api._pgx_context_manager.PgxContextManager
DeepWalk model object.
- Return type
None
- close()¶
Call destroy
- Return type
None
- compute_similars(v, k)¶
Compute the top-k similar vertices for a given vertex.
- Parameters
v (Union[int, str, List[int], List[str]]) – id of the vertex or list of vertex ids for which to compute the similar vertices
k (int) – number of similar vertices to return
- Return type
- destroy()¶
Destroy this model object.
- Return type
None
- export()¶
Return a ModelStorer object which can be used to save the model.
- Returns
ModelStorer object
- Return type
pypgx.api.mllib._model_utils.ModelStorer
- fit(graph)¶
Fit the model on a graph.
- Parameters
graph (pypgx.api._pgx_graph.PgxGraph) – Graph to fit on
- Return type
None
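A minimal sketch of how fit() and compute_similars() might be combined. The helper name fit_and_find_similar is hypothetical; model is assumed to be an already constructed DeepWalkModel and graph a PgxGraph obtained elsewhere.

```python
def fit_and_find_similar(model, graph, vertex_ids, k=10):
    """Fit `model` on `graph`, then return the top-k similar vertices.

    Hypothetical helper: it only calls fit() and compute_similars(),
    both documented for DeepWalkModel. `vertex_ids` may be a single
    vertex id or a list of ids, as compute_similars() accepts either.
    """
    model.fit(graph)
    return model.compute_similars(vertex_ids, k)
```

Because close() is documented to call destroy(), wrapping the model in contextlib.closing(model) is one way to release it once its results are no longer needed.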
- store(path, key, overwrite=False)¶
Store the model in a file.
- Parameters
path (str) – Path where to store the model
key (Optional[str]) – Encryption key
overwrite (bool) – Whether or not to overwrite pre-existing file
- Return type
None
- property trained_vectors: pypgx.api.frames._pgx_frame.PgxFrame¶
Get the trained vertex vectors for the current DeepWalk model.
- Returns
PgxFrame object with the trained vertex vectors
- Return type
pypgx.api.frames._pgx_frame.PgxFrame
- class pypgx.api.mllib.DevNetLoss(confidence_margin, anomaly_property_value)¶
Bases:
pypgx.api.mllib._loss_function.LossFunction
Deviation loss for anomaly detection
- Parameters
confidence_margin (float) –
anomaly_property_value (bool) –
- Return type
None
- get_anomaly_property_value()¶
Get Anomaly Property Value.
- Returns
the anomaly property value
- Return type
Any
- get_confidence_margin()¶
Get confidence margin of the loss function.
- Returns
the confidence margin
- Return type
float
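To illustrate what the confidence margin controls, here is a sketch of the deviation loss as it is commonly described in the DevNet literature. This is an assumption about the underlying formula, not PGX's internal code: the function name, the reference mean and standard deviation parameters, and the default margin are all hypothetical. The is_anomaly flag presumably corresponds to a vertex whose target property equals anomaly_property_value.

```python
def deviation_loss(score, is_anomaly, confidence_margin=5.0,
                   ref_mean=0.0, ref_std=1.0):
    """Deviation loss for a single sample (illustrative only).

    Assumption: follows the published DevNet formulation; the defaults
    here are hypothetical and not taken from PGX.
    """
    # Standardize the anomaly score against the reference distribution.
    deviation = (score - ref_mean) / ref_std
    if is_anomaly:
        # Anomalies are penalized until their deviation reaches the margin.
        return max(0.0, confidence_margin - deviation)
    # Normal samples are penalized for any deviation from the reference mean.
    return abs(deviation)
```

Under this formulation a larger confidence margin demands that anomalies score further from the reference mean before their loss reaches zero.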
- class pypgx.api.mllib.GnnExplanation(java_gnn_explanation)¶
Bases:
object
GnnExplanation object
- Return type
None
- get_embedding()¶
Get the inferred embedding of the specified vertex.
- Returns
the embedding
- Return type
List[float]
- get_importance_graph()¶
Get the importance graph, that is, the computation graph with an additional vertex property indicating vertex importance. The additional importance property can be retrieved via get_vertex_importance_property.
- Returns
the importance graph
- Return type
- get_vertex_feature_importance()¶
Get the feature importances as a map from property to importance value.
- Returns
the feature importances.
- Return type
Dict[pypgx.api._property.VertexProperty, float]
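Since get_vertex_feature_importance() returns a plain dictionary, the most influential properties can be ranked with ordinary Python. The helper name top_features is hypothetical, and explanation is assumed to be a GnnExplanation obtained elsewhere.

```python
def top_features(explanation, n=5):
    """Return the n (property, importance) pairs with the highest importance.

    Hypothetical helper built only on get_vertex_feature_importance().
    """
    importances = explanation.get_vertex_feature_importance()
    # Sort by importance value, largest first.
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]
```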
- get_vertex_importance_property()¶
Get the vertex property that contains the computed vertex importance.
- Returns
the vertex importance property
- Return type
- class pypgx.api.mllib.GraphWiseConvLayerConfig(java_config, params)¶
Bases:
object
GraphWise conv layer configuration.
- Return type
None
- class pypgx.api.mllib.GraphWiseDgiLayerConfig(java_config, params)¶
Bases:
object
GraphWise DGI layer configuration.
- Return type
None
- get_corruption_function()¶
Return the corruption function
- get_discriminator()¶
Return the discriminator
- Return type
str
- get_readout_function()¶
Return the readout function
- Return type
str
- set_corruption_function(corruption_function)¶
Set the corruption function
- Parameters
corruption_function (CorruptionFunction) – the corruption function. Supported currently: PermutationCorruption
- set_discriminator(discriminator)¶
Set the discriminator
- Parameters
discriminator (str) – The discriminator function. Supported currently: ‘bilinear’
- Return type
None
- set_readout_function(readout_function)¶
Set the readout function
- Parameters
readout_function (str) – The readout function. Supported currently: ‘mean’
- Return type
None
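The setters above each accept a single supported value at present, so a small wrapper can reject unsupported names before they reach the server. This is a sketch only: the helper name and the two constant sets are hypothetical and simply mirror the values listed above.

```python
# Values listed as "supported currently" in the setter documentation.
SUPPORTED_DISCRIMINATORS = {"bilinear"}
SUPPORTED_READOUT_FUNCTIONS = {"mean"}


def configure_dgi_layer(config, discriminator="bilinear", readout_function="mean"):
    """Validate and apply DGI layer settings (hypothetical helper)."""
    if discriminator not in SUPPORTED_DISCRIMINATORS:
        raise ValueError(f"unsupported discriminator: {discriminator!r}")
    if readout_function not in SUPPORTED_READOUT_FUNCTIONS:
        raise ValueError(f"unsupported readout function: {readout_function!r}")
    config.set_discriminator(discriminator)
    config.set_readout_function(readout_function)
    return config
```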
- class pypgx.api.mllib.GraphWiseModelConfig(java_graphwise_model_config)¶
Bases:
object
GraphWise model configuration class.
- Return type
None
- get_conv_layer_configs()¶
Return a list of conv layer configs
- Return type
List[pypgx.api.mllib._graphwise_conv_layer_config.GraphWiseConvLayerConfig]
- set_batch_size(batch_size)¶
Set the batch size
- Parameters
batch_size (int) – batch size
- Return type
None
- set_edge_input_feature_dim(edge_input_feature_dim)¶
Set the edge input feature dimension
- Parameters
edge_input_feature_dim (int) – edge input feature dimension
- Return type
None
- set_embedding_dim(embedding_dim)¶
Set the embedding dimension
- Parameters
embedding_dim (int) – embedding dimension
- Return type
None
- set_fitted(fitted)¶
Set the fitted flag
- Parameters
fitted (bool) – fitted flag
- Return type
None
- set_input_feature_dim(input_feature_dim)¶
Set the input feature dimension
- Parameters
input_feature_dim (int) – input feature dimension
- Return type
None
- set_learning_rate(learning_rate)¶
Set the learning rate
- Parameters
learning_rate (int) – initial learning rate
- Return type
None
- set_num_epochs(num_epochs)¶
Set the number of epochs
- Parameters
num_epochs (int) – number of epochs
- Return type
None
- set_seed(seed)¶
Set the seed
- Parameters
seed (int) – seed
- Return type
None
- set_shuffle(shuffle)¶
Set the shuffling flag
- Parameters
shuffle (bool) – shuffling flag
- Return type
None
- set_standarize(standardize)¶
Set the standardize flag
- Parameters
standardize (bool) – standardize flag
- Return type
None
- set_training_loss(training_loss)¶
Set the training loss
- Parameters
training_loss (float) – training loss
- Return type
None
- set_weight_decay(weight_decay)¶
Set the weight decay
- Parameters
weight_decay (float) – weight decay
- Return type
None
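The setters follow a uniform set_<name> pattern, with one exception: the setter for the standardize flag is spelled set_standarize. The sketch below applies several settings in one call and handles that spelling. The helper name apply_settings is hypothetical, and config is assumed to be a GraphWiseModelConfig obtained elsewhere.

```python
def apply_settings(config, **settings):
    """Call the matching set_<name> method for every keyword argument.

    Hypothetical helper; it relies only on the setters listed above.
    """
    # The documented setter for `standardize` is spelled set_standarize.
    special_names = {"standardize": "set_standarize"}
    for name, value in settings.items():
        setter_name = special_names.get(name, f"set_{name}")
        setter = getattr(config, setter_name, None)
        if setter is None:
            raise AttributeError(f"no setter found for {name!r}")
        setter(value)
    return config
```

For example, apply_settings(config, batch_size=64, num_epochs=10, standardize=True) would call set_batch_size, set_num_epochs and set_standarize in turn.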
- class pypgx.api.mllib.GraphWisePredictionLayerConfig(java_config, params)¶
Bases:
object
GraphWise prediction layer configuration.
- Return type
None
- class pypgx.api.mllib.ModelRepository(java_generic_model_repository)¶
Bases:
object
ModelRepository object that exposes CRUD operations on model stores and on the models within these model stores.
- create(model_store_name)¶
Create a new model store.
- Parameters
model_store_name (str) – the name of the model store
- Returns
None
- Return type
None
- delete_model(model_store_name, model_name)¶
Delete the model in the specified model store with the given model name.
- Parameters
model_store_name (str) – the name of the model store
model_name (str) – the name under which the model was stored
- Returns
None
- Return type
None
- delete_model_store(model_store_name)¶
Delete a model store.
- Parameters
model_store_name (str) – the name of the model store
- Returns
None
- Return type
None
- get_model_description(model_store_name, model_name)¶
Retrieve the description of the model in the specified model store, with the given model name.
- Parameters
model_store_name (str) – the name of the model store
model_name (str) – the name under which the model was stored
- Returns
A string containing the description that was stored with the model
- Return type
str
- list_model_stores_names()¶
List the names of all model stores in the model repository.
- Returns
List of names.
- Return type
List[str]
- list_model_stores_names_matching(regex)¶
List the names of all model stores in the model repository that match the regex.
- Parameters
regex (str) – a regular expression, given as a string.
- Returns
List of matching names.
- Return type
List[str]
- list_models(model_store_name)¶
List the models present in the model store with the given name.
- Parameters
model_store_name (str) – the name of the model store (non-prefixed)
- Returns
List of model names.
- Return type
List[str]
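The listing methods can be combined to build an overview of a repository. This is a sketch under the assumption that repository is a ModelRepository obtained elsewhere; the helper name list_models_by_store is hypothetical.

```python
def list_models_by_store(repository, regex=None):
    """Map each model store name to the list of model names it contains.

    Hypothetical helper. If `regex` is given, only stores whose names
    match it are included.
    """
    if regex is None:
        store_names = repository.list_model_stores_names()
    else:
        store_names = repository.list_model_stores_names_matching(regex)
    return {name: repository.list_models(name) for name in store_names}
```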
- class pypgx.api.mllib.ModelRepositoryBuilder(java_generic_model_repository_builder)¶
Bases:
object
ModelRepositoryBuilder object that can be used to configure the connection to a model repository.
- db(username=None, password=None, jdbc_url=None, keystore_alias=None, schema=None)¶
Connect to a model repository backed by a database.
- Parameters
username (Optional[str]) – username in database
password (Optional[str]) – password of the database user
jdbc_url (Optional[str]) – jdbc url of database
keystore_alias (Optional[str]) – the keystore alias to get the password in the keystore
schema (Optional[str]) – the schema of the model store in database
- Returns
A model repository configured to connect to a database.
- Return type
- class pypgx.api.mllib.PermutationCorruption(java_permutation_corruption)¶
Bases:
pypgx.api.mllib._corruption_function.CorruptionFunction
Permutation function that shuffles the nodes to generate the corrupted subgraph for DGI.
- Return type
None
- class pypgx.api.mllib.Pg2vecModel(java_pg2vec_model)¶
Bases:
pypgx.api._pgx_context_manager.PgxContextManager
Pg2Vec model object.
- Return type
None
- close()¶
Call destroy
- Return type
None
- compute_similars(graphlet_id, k)¶
Compute the top-k similar graphlets for a list of input graphlets.
- Parameters
graphlet_id (Union[Iterable[Union[int, str]], int, str]) – graphlet ID or iterable of graphlet IDs
k (int) – number of similar graphlets to return
- Return type
- destroy()¶
Destroy this model object.
- Return type
None
- export()¶
Return a ModelStorer object which can be used to save the model.
- Returns
ModelStorer object
- Return type
pypgx.api.mllib._model_utils.ModelStorer
- fit(graph)¶
Fit the model on a graph.
- Parameters
graph (pypgx.api._pgx_graph.PgxGraph) – Graph to fit on
- Return type
None
- infer_graphlet_vector(graph)¶
Return the inferred vector of the input graphlet as a PgxFrame.
- Parameters
graph (pypgx.api._pgx_graph.PgxGraph) – graphlet for which to infer a vector
- Return type
pypgx.api.frames._pgx_frame.PgxFrame
- infer_graphlet_vector_batched(graph)¶
Return the inferred vectors of the input graphlets as a PgxFrame.
- Parameters
graph (pypgx.api._pgx_graph.PgxGraph) – graphlets (given as a single graph, distinguished by graphlet ID) for which to infer vectors
- Return type
pypgx.api.frames._pgx_frame.PgxFrame
- store(path, key, overwrite=False)¶
Store the model in a file.
- Parameters
path (str) – Path where to store the model
key (Optional[str]) – Encryption key
overwrite (bool) – Whether or not to overwrite pre-existing file
- Return type
None
- property trained_graphlet_vectors: pypgx.api.frames._pgx_frame.PgxFrame¶
Get the trained graphlet vectors for the current pg2vec model.
- Returns
PgxFrame containing the trained graphlet vectors
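A minimal sketch of a Pg2vec workflow that trains on one graph and then infers vectors for unseen graphlets. The helper name embed_graphlets is hypothetical; model is assumed to be an already constructed Pg2vecModel, and both graph arguments are assumed to be PgxGraph objects.

```python
def embed_graphlets(model, training_graph, new_graphlets):
    """Fit `model`, then return (trained vectors, inferred vectors).

    Hypothetical helper using only members documented for Pg2vecModel.
    `new_graphlets` is a single graph whose graphlets are told apart by
    their graphlet id, as infer_graphlet_vector_batched() expects.
    """
    model.fit(training_graph)
    # trained_graphlet_vectors is a property, so it is read without parentheses.
    trained = model.trained_graphlet_vectors
    inferred = model.infer_graphlet_vector_batched(new_graphlets)
    return trained, inferred
```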
- class pypgx.api.mllib.SigmoidCrossEntropyLoss¶
Bases:
pypgx.api.mllib._loss_function.LossFunction
Sigmoid Cross Entropy loss for binary classification
- class pypgx.api.mllib.SoftmaxCrossEntropyLoss¶
Bases:
pypgx.api.mllib._loss_function.LossFunction
Softmax Cross Entropy loss for multi-class classification
- Return type
None
- class pypgx.api.mllib.SupervisedGnnExplanation(java_supervised_gnn_explanation, bool_label)¶
Bases:
pypgx.api.mllib._gnn_explanation.GnnExplanation
SupervisedGnnExplanation object
- Parameters
bool_label (bool) –
- Return type
None
- get_embedding()¶
Get the inferred embedding of the specified vertex.
- Returns
the embedding
- Return type
List[float]
- get_importance_graph()¶
Get the importance graph, that is, the computation graph with an additional vertex property indicating vertex importance. The additional importance property can be retrieved via get_vertex_importance_property.
- Returns
the importance graph
- Return type
- get_label()¶
Get the inferred label of the specified vertex.
- Returns
the label
- Return type
Any
- get_logits()¶
Get the inferred logits of the specified vertex.
- Returns
the logits
- Return type
List[float]
- get_vertex_feature_importance()¶
Get the feature importances as a map from property to importance value.
- Returns
the feature importances.
- Return type
Dict[pypgx.api._property.VertexProperty, float]
- get_vertex_importance_property()¶
Get the vertex property that contains the computed vertex importance.
- Returns
the vertex importance property
- Return type
- class pypgx.api.mllib.SupervisedGraphWiseModel(java_graphwise_model, params=None)¶
Bases:
pypgx.api.mllib._graphwise_model.GraphWiseModel
SupervisedGraphWise model object.
- Return type
None
- check_is_fitted()¶
Make sure the model is fitted.
- Returns
None
- Raise
RuntimeError if the model is not fitted
- Return type
None
- destroy()¶
Destroy this model object.
- Returns
None
- Return type
None
- evaluate_labels(graph, vertices)¶
Evaluate (macro averaged) classification performance statistics for the specified vertices.
- Parameters
graph (PgxGraph) – the graph
vertices (Union[Iterable[pypgx.api._pgx_entity.PgxVertex], Iterable[int]]) – the vertices to evaluate on. Can be a list of vertices or their IDs.
- Returns
PgxFrame containing the metrics
- Return type
pypgx.api.frames._pgx_frame.PgxFrame
- export()¶
Return a ModelStorer object which can be used to save the model.
- Returns
ModelStorer object
- Return type
pypgx.api.mllib._model_utils.ModelStorer
- fit(graph)¶
Fit the model on a graph.
- Parameters
graph (PgxGraph) – Graph to fit on
- Returns
None
- Return type
None
- get_batch_size()¶
Get the batch size
- Returns
batch size
- Return type
int
- get_class_weights()¶
Get the class weights.
- Returns
a dictionary mapping classes to their weights.
- Return type
dict
- get_config()¶
Return the GraphWiseModelConfig object
- Returns
the config
- Return type
GraphWiseModelConfig
- get_conv_layer_config()¶
Get the configuration objects for the convolutional layers
- Returns
configurations
- Return type
- get_edge_input_feature_dim()¶
Get the edge input feature dimension, that is, the dimension of all the input edge properties when concatenated
- Returns
edge input feature dimension
- Return type
int
- get_edge_input_property_names()¶
Get the edge input feature names
- Returns
edge input feature names
- Return type
List[str]
- get_layer_size()¶
Get the dimension of the embeddings
- Returns
embedding dimension
- Return type
int
- get_learning_rate()¶
Get the initial learning rate
- Returns
initial learning rate
- Return type
float
- get_loss_function()¶
Get the loss function name.
- Returns
loss function name. Can be one of softmax_cross_entropy, sigmoid_cross_entropy, devnet
- Return type
str
- get_loss_function_class()¶
Get the loss function.
- Returns
loss function
- Return type
LossFunction
- get_num_epochs()¶
Get the number of epochs to train the model
- Returns
number of epochs to train the model
- Return type
int
- get_prediction_layer_configs()¶
Get the configuration objects for the prediction layers.
- Returns
configuration of the prediction layer
- Return type
- get_seed()¶
Get the random seed
- Returns
random seed
- Return type
int
- get_target_vertex_labels()¶
Get the target vertex labels
- Returns
target vertex labels
- Return type
List[str]
- get_training_loss()¶
Get the final training loss
- Returns
training loss
- Return type
float
- get_vertex_input_feature_dim()¶
Get the input feature dimension, that is, the dimension of all the input vertex properties when concatenated
- Returns
input feature dimension
- Return type
int
- get_vertex_input_property_names()¶
Get the vertex input feature names
- Returns
vertex input feature names
- Return type
List[str]
- get_vertex_target_property_name()¶
Get the target property name
- Returns
target property name
- Return type
str
- gnn_explainer(num_optimization_steps=200, learning_rate=0.05, marginalize=False)¶
Configure and return the GnnExplainer object of this model that can be used to request explanations of predictions.
- Parameters
num_optimization_steps (int, optional) – optimization steps for the explainer, defaults to 200
learning_rate (float, optional) – learning rate for the explainer, defaults to 0.05
marginalize (bool, optional) – marginalize the loss over features, defaults to False
- Returns
SupervisedGnnExplainer object of this model
- Return type
SupervisedGnnExplainer
- infer_and_get_explanation(graph, vertex, num_optimization_steps=200, learning_rate=0.05, marginalize=False)¶
Perform inference on the specified vertex and generate an explanation that contains scores of how important each property and each vertex in the computation graph is for the prediction.
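- Parameters
graph (pypgx.api._pgx_graph.PgxGraph) – the graph
vertex (Union[pypgx.api._pgx_entity.PgxVertex, int]) – the vertex
num_optimization_steps (int) – optimization steps for the explainer, defaults to 200
learning_rate (float) – learning rate for the explainer, defaults to 0.05
marginalize (bool) – marginalize the loss over features, defaults to False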
- Returns
explanation containing feature importance and vertex importance.
- Return type
- infer_embeddings(graph, vertices)¶
Infer the embeddings for the specified vertices
- Parameters
graph (PgxGraph) – the graph
vertices (Union[Iterable[pypgx.api._pgx_entity.PgxVertex], Iterable[int]]) – the vertices to infer embeddings for. Can be a list of vertices or their IDs.
- Returns
PgxFrame containing the embeddings for each vertex
- Return type
pypgx.api.frames._pgx_frame.PgxFrame
- infer_labels(graph, vertices)¶
Infer the labels for the specified vertices
- Parameters
graph (PgxGraph) – the graph
vertices (Union[Iterable[pypgx.api._pgx_entity.PgxVertex], Iterable[int]]) – the vertices to infer labels for. Can be a list of vertices or their IDs.
- Returns
PgxFrame containing the labels for each vertex
- Return type
pypgx.api.frames._pgx_frame.PgxFrame
- infer_logits(graph, vertices)¶
Infer the prediction logits for the specified vertices
- Parameters
graph (PgxGraph) – the graph
vertices (Union[Iterable[pypgx.api._pgx_entity.PgxVertex], Iterable[int]]) – the vertices to infer logits for. Can be a list of vertices or their IDs.
- Returns
PgxFrame containing the logits for each vertex
- Return type
pypgx.api.frames._pgx_frame.PgxFrame
- is_fitted()¶
Check if the model is fitted
- Returns
True if the model is fitted, False otherwise
- Return type
bool
- store(path, key, overwrite=False)¶
Store the model in a file.
- Parameters
path (str) – Path where to store the model
key (str) – Encryption key
overwrite (bool) – Whether or not to overwrite pre-existing file
- Returns
None
- Return type
None
- update_is_fitted()¶
Determine whether the model is fitted.
This updates the internal state.
- Returns
None
- Return type
None
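A sketch of a typical supervised workflow: fit only when needed, then predict and evaluate on a set of vertices. The helper name fit_and_evaluate is hypothetical; model is assumed to be an already constructed SupervisedGraphWiseModel, and the graphs and vertices are assumed to come from elsewhere.

```python
def fit_and_evaluate(model, train_graph, test_graph, test_vertices):
    """Return (predicted labels, metrics) for `test_vertices`.

    Hypothetical helper using is_fitted(), fit(), infer_labels() and
    evaluate_labels() as documented for SupervisedGraphWiseModel.
    """
    # Skip training when the model has already been fitted.
    if not model.is_fitted():
        model.fit(train_graph)
    predictions = model.infer_labels(test_graph, test_vertices)
    metrics = model.evaluate_labels(test_graph, test_vertices)
    return predictions, metrics
```

Both return values are described above as PgxFrame objects: one holding the label for each vertex, the other the macro-averaged metrics.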
- class pypgx.api.mllib.UnsupervisedGraphWiseModel(java_graphwise_model, params=None)¶
Bases:
pypgx.api.mllib._graphwise_model.GraphWiseModel
UnsupervisedGraphWise model object.
- Return type
None
- check_is_fitted()¶
Make sure the model is fitted.
- Returns
None
- Raise
RuntimeError if the model is not fitted
- Return type
None
- destroy()¶
Destroy this model object.
- Returns
None
- Return type
None
- export()¶
Return a ModelStorer object which can be used to save the model.
- Returns
ModelStorer object
- Return type
pypgx.api.mllib._model_utils.ModelStorer
- fit(graph)¶
Fit the model on a graph.
- Parameters
graph (PgxGraph) – Graph to fit on
- Returns
None
- Return type
None
- get_batch_size()¶
Get the batch size
- Returns
batch size
- Return type
int
- get_config()¶
Return the GraphWiseModelConfig object
- Returns
the config
- Return type
GraphWiseModelConfig
- get_conv_layer_config()¶
Get the configuration objects for the convolutional layers
- Returns
configurations
- Return type
- get_dgi_layer_config()¶
Get the configuration object for the dgi layer.
- Returns
configuration
- Return type
- get_edge_input_feature_dim()¶
Get the edge input feature dimension, that is, the dimension of all the input edge properties when concatenated
- Returns
edge input feature dimension
- Return type
int
- get_edge_input_property_names()¶
Get the edge input feature names
- Returns
edge input feature names
- Return type
List[str]
- get_layer_size()¶
Get the dimension of the embeddings
- Returns
embedding dimension
- Return type
int
- get_learning_rate()¶
Get the initial learning rate
- Returns
initial learning rate
- Return type
float
- get_loss_function()¶
Get the loss function name.
- Returns
loss function name. Can only be sigmoid_cross_entropy
- Return type
str
- get_num_epochs()¶
Get the number of epochs to train the model
- Returns
number of epochs to train the model
- Return type
int
- get_seed()¶
Get the random seed
- Returns
random seed
- Return type
int
- get_training_loss()¶
Get the final training loss
- Returns
training loss
- Return type
float
- get_vertex_input_feature_dim()¶
Get the input feature dimension, that is, the dimension of all the input vertex properties when concatenated
- Returns
input feature dimension
- Return type
int
- get_vertex_input_property_names()¶
Get the vertex input feature names
- Returns
vertex input feature names
- Return type
List[str]
- gnn_explainer(num_optimization_steps=200, learning_rate=0.05, marginalize=False, num_clusters=50, num_samples=10000)¶
Configure and return the GnnExplainer object of this model that can be used to request explanations of predictions.
- Parameters
num_optimization_steps (int, optional) – optimization steps for the explainer, defaults to 200
learning_rate (float, optional) – learning rate for the explainer, defaults to 0.05
marginalize (bool, optional) – marginalize the loss over features, defaults to False
num_clusters (int, optional) – number of clusters to use, defaults to 50
num_samples (int, optional) – number of samples to use, defaults to 10000
- Returns
UnsupervisedGnnExplainer object of this model
- Return type
UnsupervisedGnnExplainer
- infer_and_get_explanation(graph, vertex, num_clusters=50, num_samples=10000, num_optimization_steps=200, learning_rate=0.05, marginalize=False)¶
Perform inference on the specified vertex and generate an explanation that contains scores of how important each property and each vertex in the computation graph is for the embedding's position relative to the embeddings of other vertices in the graph.
- Parameters
graph (pypgx.api._pgx_graph.PgxGraph) – the graph
vertex (Union[pypgx.api._pgx_entity.PgxVertex, int]) – the vertex
num_clusters (int) – the number of semantic vertex clusters expected in the graph, must be greater than 1
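num_samples (int) – number of samples to use, defaults to 10000
num_optimization_steps (int) – optimization steps for the explainer, defaults to 200
learning_rate (float) – learning rate for the explainer, defaults to 0.05
marginalize (bool) – marginalize the loss over features, defaults to False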
- Returns
explanation containing feature importance and vertex importance.
- Return type
- infer_embeddings(graph, vertices)¶
Infer the embeddings for the specified vertices.
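- Parameters
graph (pypgx.api._pgx_graph.PgxGraph) – the graph
vertices (Union[Iterable[pypgx.api._pgx_entity.PgxVertex], Iterable[int]]) – the vertices to infer embeddings for. Can be a list of vertices or their IDs.
- Returns
PgxFrame containing the embeddings for each vertex.
- Return type
pypgx.api.frames._pgx_frame.PgxFrame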
- is_fitted()¶
Check if the model is fitted
- Returns
True if the model is fitted, False otherwise
- Return type
bool
- store(path, key, overwrite=False)¶
Store the model in a file.
- Parameters
path (str) – Path where to store the model
key (str) – Encryption key
overwrite (bool) – Whether or not to overwrite pre-existing file
- Returns
None
- Return type
None
- update_is_fitted()¶
Determine whether the model is fitted.
This updates the internal state.
- Returns
None
- Return type
None
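A sketch of requesting an explanation for one vertex embedding and collecting its parts. It assumes that the object returned by infer_and_get_explanation() exposes the GnnExplanation methods documented earlier; the helper name explain_embedding and the dictionary keys are hypothetical.

```python
def explain_embedding(model, graph, vertex, num_clusters=50, num_samples=10000):
    """Collect the parts of an explanation for `vertex` in a dictionary.

    Hypothetical helper. Assumes the returned explanation offers the
    GnnExplanation getters used below.
    """
    # The documentation requires num_clusters to be greater than 1.
    if num_clusters <= 1:
        raise ValueError("num_clusters must be greater than 1")
    # Documented to raise RuntimeError if the model is not fitted.
    model.check_is_fitted()
    explanation = model.infer_and_get_explanation(
        graph, vertex, num_clusters=num_clusters, num_samples=num_samples
    )
    return {
        "embedding": explanation.get_embedding(),
        "feature_importance": explanation.get_vertex_feature_importance(),
        "importance_graph": explanation.get_importance_graph(),
        "importance_property": explanation.get_vertex_importance_property(),
    }
```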
- class pypgx.api.mllib._model_utils.ModelStorer(model)¶
Bases:
object
ModelStorer object.
- Parameters
model (Union[SupervisedGraphWiseModel, Pg2vecModel, UnsupervisedGraphWiseModel, DeepWalkModel]) –
- Return type
None
- db(model_store, model_name, username=None, password=None, jdbc_url=None, model_description=None, overwrite=False, keystore_alias=None, schema=None)¶
Store a model to a database.
- Parameters
username (Optional[str]) – username in database
password (Optional[str]) – password of the database user
model_store (str) – model store in database
model_name (str) – name of the model to store
jdbc_url (Optional[str]) – jdbc url of database
model_description (Optional[str]) – description of model
overwrite (bool) – boolean value for overwriting or not
keystore_alias (Optional[str]) – the keystore alias to get the password in the keystore
schema (Optional[str]) – the schema of the model store in database
- Return type
None
- file(path, key, overwrite=False)¶
Store an encrypted model to a file.
- Parameters
path (str) – path to store model
key (str) – key used for encryption
overwrite (bool) – boolean value for overwriting or not
- Return type
None
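Each model class above documents an export() method that returns a ModelStorer, so saving can be sketched as a two-step call. The helper names below are hypothetical; the keyword arguments passed through connection are the ones listed for db() above (username, password, jdbc_url, keystore_alias, schema).

```python
def save_model_to_file(model, path, key, overwrite=False):
    """Store an encrypted copy of `model` at `path` (hypothetical helper)."""
    model.export().file(path, key, overwrite=overwrite)


def save_model_to_db(model, model_store, model_name, description=None,
                     overwrite=False, **connection):
    """Store `model` in a database model store (hypothetical helper).

    `connection` is forwarded unchanged to ModelStorer.db().
    """
    model.export().db(
        model_store,
        model_name,
        model_description=description,
        overwrite=overwrite,
        **connection,
    )
```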
- class pypgx.api.mllib._model_utils.ModelLoader(analyst, java_model_loader, wrapper, java_class)¶
Bases:
object
ModelLoader object.
- Parameters
analyst (Analyst) –
java_model_loader (Any) –
wrapper (Callable) –
java_class (str) –
- Return type
None
- db(model_store, model_name, username=None, password=None, jdbc_url=None, keystore_alias=None, schema=None)¶
Return a model stored in a database.
- Parameters
username (Optional[str]) – username in database
password (Optional[str]) – password of the database user
model_store (str) – model store in database
model_name (str) – name of the model to load
jdbc_url (Optional[str]) – jdbc url of database
keystore_alias (Optional[str]) – the keystore alias to get the password in the keystore
schema (Optional[str]) – the schema of the model store in database
- Returns
model stored in database.
- Return type
Union[SupervisedGraphWiseModel, Pg2vecModel, UnsupervisedGraphWiseModel, DeepWalkModel]
- file(path, key)¶
Return an encrypted model stored in a file.
- Parameters
path (str) – path of stored model
key (str) – key used for encryption
- Returns
model stored in file.
- Return type
Union[SupervisedGraphWiseModel, Pg2vecModel, UnsupervisedGraphWiseModel, DeepWalkModel]
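A sketch of a loader wrapper that picks between file() and db() depending on which arguments are supplied. How a ModelLoader is obtained is not covered in this section, so loader is assumed to be provided by the caller; the helper name load_model is hypothetical.

```python
def load_model(loader, path=None, key=None, model_store=None,
               model_name=None, **connection):
    """Load a model from a file if `path` is given, otherwise from a database.

    Hypothetical helper. `connection` is forwarded unchanged to
    ModelLoader.db() (username, password, jdbc_url, keystore_alias, schema).
    """
    if path is not None:
        return loader.file(path, key)
    if model_store is None or model_name is None:
        raise ValueError("give either path, or both model_store and model_name")
    return loader.db(model_store, model_name, **connection)
```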