*********************
UnsupervisedGraphWise
*********************

Overview
--------

:class:`UnsupervisedGraphWise` is an inductive vertex representation learning algorithm that can leverage vertex feature
information. It can be applied to a wide variety of tasks, including unsupervised learning of vertex embeddings for
vertex classification.
:class:`UnsupervisedGraphWise` is based on `Deep Graph Infomax (DGI) by Velickovic et al. <https://arxiv.org/pdf/1809.10341.pdf>`_.

Model Structure
---------------

A :class:`UnsupervisedGraphWise` model consists of graph convolutional layers followed by an embedding layer which defaults to a DGI layer.
The forward pass through a convolutional layer for a vertex proceeds as follows:

1. A set of neighbors of the vertex is sampled.

2. The previous layer representations of the neighbors are mean-aggregated, and the aggregated features are concatenated with the previous layer representation of the vertex.

3. This concatenated vector is multiplied with weights, and a bias vector is added.

4. The result is normalized such that the layer output has unit norm.
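
The four steps above can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not the PGX implementation: the function name ``conv_layer_forward``, sampling with replacement, the toy graph and the weight values are all hypothetical, and the activation function is omitted to mirror the steps as listed.

.. code-block:: python
    :linenos:

    import math
    import random


    def conv_layer_forward(vertex, features, adjacency, weights, bias, num_samples, rng):
        """One mean-aggregating convolutional layer for a single vertex (illustrative)."""
        # 1. Sample a fixed-size set of neighbors (with replacement: an assumption).
        neighbors = adjacency[vertex]
        sampled = [rng.choice(neighbors) for _ in range(num_samples)]

        # 2. Mean-aggregate the neighbor representations and concatenate them
        #    with the previous-layer representation of the vertex itself.
        dim = len(features[vertex])
        mean = [sum(features[n][i] for n in sampled) / len(sampled) for i in range(dim)]
        concatenated = features[vertex] + mean

        # 3. Multiply with the weight matrix and add the bias vector.
        output = [
            sum(w * x for w, x in zip(row, concatenated)) + b
            for row, b in zip(weights, bias)
        ]

        # 4. Normalize so that the layer output has unit L2 norm.
        norm = math.sqrt(sum(v * v for v in output))
        return [v / norm for v in output] if norm > 0 else output


    rng = random.Random(0)
    features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}  # toy input features
    adjacency = {0: [1, 2], 1: [0], 2: [0]}
    weights = [[0.5, -0.2, 0.1, 0.3], [0.0, 0.4, -0.1, 0.2], [0.3, 0.3, 0.3, 0.3]]
    bias = [0.1, 0.0, -0.1]

    embedding = conv_layer_forward(0, features, adjacency, weights, bias, 5, rng)

The weight matrix has twice as many columns as the input dimension because the vertex representation and the aggregated neighbor representation are concatenated before the multiplication.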

The DGI layer consists of three parts that enable unsupervised learning using the embeddings produced by the convolutional layers.

1. Corruption function: shuffles the node features while preserving the graph structure to produce negative embedding samples using the convolutional layers.

2. Readout function: the sigmoid-activated mean of the embeddings, used as a summary of the graph.

3. Discriminator: measures the similarity of the positive (unshuffled) embeddings with the summary, as well as the similarity of the negative samples with the summary. The loss function is computed from these similarities.

Since none of these contains mutable hyperparameters, the default DGI layer is always used and cannot be adjusted.
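
The interplay of the three DGI parts can be illustrated with a small plain-Python sketch. It is not the PGX implementation: ``encode`` is a hypothetical stand-in for the convolutional layers, a plain dot product replaces the learned bilinear discriminator of the DGI paper, and the toy graph is made up. The loss is the binary cross-entropy used in the DGI paper.

.. code-block:: python
    :linenos:

    import math
    import random


    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))


    def encode(features, adjacency):
        """Stand-in encoder: mean over a vertex and its neighbors (assumption)."""
        embeddings = []
        for v, own in enumerate(features):
            rows = [own] + [features[n] for n in adjacency[v]]
            embeddings.append([sum(r[i] for r in rows) / len(rows) for i in range(len(own))])
        return embeddings


    def corrupt(features, rng):
        """Corruption: shuffle feature rows across vertices; edges stay untouched."""
        shuffled = list(features)
        rng.shuffle(shuffled)
        return shuffled


    def readout(embeddings):
        """Readout: sigmoid of the element-wise mean of all embeddings."""
        dim = len(embeddings[0])
        return [sigmoid(sum(e[i] for e in embeddings) / len(embeddings)) for i in range(dim)]


    def discriminate(embedding, summary):
        """Discriminator: score in (0, 1); dot product replaces the bilinear form."""
        return sigmoid(sum(h * s for h, s in zip(embedding, summary)))


    def dgi_loss(positive, negative, summary):
        """Binary cross-entropy: positives should score 1, negatives 0."""
        eps = 1e-12
        pos = sum(math.log(discriminate(h, summary) + eps) for h in positive)
        neg = sum(math.log(1.0 - discriminate(h, summary) + eps) for h in negative)
        return -(pos + neg) / (len(positive) + len(negative))


    rng = random.Random(0)
    features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]  # toy features
    adjacency = [[1], [0], [3], [2]]

    positive = encode(features, adjacency)
    negative = encode(corrupt(features, rng), adjacency)
    summary = readout(positive)
    loss = dgi_loss(positive, negative, summary)

Note that the summary is computed from the positive embeddings only, and that the negative samples reuse the same graph structure with shuffled features.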

Functionalities
---------------

We describe here the usage of the main functionalities of our implementation of ``DGI`` in PGX
using the `Cora <https://relational.fit.cvut.cz/dataset/CORA>`_ graph as an example.

Loading a graph
~~~~~~~~~~~~~~~

First, we create a session and an analyst:

.. code-block:: python
    :linenos:

    session = pypgx.get_session(create_partitioned_graphs_with_graph_builder=False)
    analyst = session.analyst


Since the model is trained unsupervised, we do not need a test graph or test vertices:

.. code-block:: python
    :linenos:
    
    graph = session.read_graph_with_properties(self.small_graph3)

Building an UnsupervisedGraphWise Model (minimal)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We build an :class:`UnsupervisedGraphWise` model using the minimal configuration and default hyper-parameters. Note that even though only
one feature property is specified in this example, you can specify arbitrarily many.

.. code-block:: python
    :linenos:

    model = analyst.unsupervised_graphwise_builder(
        vertex_input_property_names=["features"]
    )

Advanced hyperparameter customization
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The implementation allows for very rich hyperparameter customization.
Internally, for each node GraphWise aggregates the representations of its neighbors. This operation can be configured through a sub-config class:
either :class:`GraphWiseConvLayerConfig` or :class:`GraphWiseAttentionLayerConfig`.

* GraphWiseConvLayer is based on `Inductive Representation Learning on Large Graphs (GraphSage) by Hamilton et al. <https://arxiv.org/pdf/1706.02216.pdf>`_.

* GraphWiseAttentionLayer is based on `Graph Attention Networks (GAT) by Velickovic et al. <https://arxiv.org/pdf/1710.10903.pdf>`_, which makes the aggregation smarter but comes with a larger computation cost.

In the following, we build such a configuration and use it in a model.
We specify a weight decay of ``0.001`` and dropout with dropping probability ``0.5`` to counteract overfitting.
Also, we recommend disabling normalization of the embeddings when they are intended for use in downstream classification tasks.

To enable or disable the GPU, we can use the parameter ``enable_accelerator``. By default this feature is enabled; however, if there is no GPU device and the CUDA toolkit is not installed, the feature is disabled and the CPU is used for all mllib operations.


.. code-block:: python
    :linenos:

    weight_property = analyst.pagerank(graph).name
    conv_layer_config = dict(
        num_sampled_neighbors=25,
        activation_fn='tanh',
        weight_init_scheme='xavier',
        neighbor_weight_property_name=weight_property,
        dropout_rate=0.5
    )
    conv_layer = analyst.graphwise_conv_layer_config(**conv_layer_config)
    params = dict(
        conv_layer_config=[conv_layer],
        vertex_input_property_names=[
            "feat1", "feat2", "feat3", "bool_label"],
        edge_input_property_names=[
            "edge_feat1", "edge_feat2", "edge_feat3", "edge_bool_label"],
        weight_decay=0.001,
        normalize=False,  # recommended
    )

    model = analyst.unsupervised_graphwise_builder(**params)

The above code uses :class:`GraphWiseConvLayerConfig` for the convolutional layer configuration.
It can be replaced with :class:`GraphWiseAttentionLayerConfig` if a graph attention network model is desired.
If the number of sampled neighbors is set to ``-1`` (``num_sampled_neighbors=-1``), all neighboring nodes will be sampled.

.. code-block:: python
    :linenos:

    conv_layer_config = dict(
        num_sampled_neighbors=25,
        activation_fn='leaky_relu',
        weight_init_scheme='xavier_uniform',
        num_heads=4,
        dropout_rate=0.5
    )

    conv_layer = analyst.graphwise_attention_layer_config(**conv_layer_config)
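
The ``-1`` convention for the number of sampled neighbors can be pictured with the following plain-Python sketch. The helper ``sample_neighbors`` is hypothetical and only mimics the described behavior; in particular, sampling with replacement is an assumption.

.. code-block:: python
    :linenos:

    import random


    def sample_neighbors(neighbors, num_sampled_neighbors, rng):
        """Return the neighbors a layer would aggregate over (illustrative)."""
        if num_sampled_neighbors == -1:
            # -1 disables sampling: every neighbor is used.
            return list(neighbors)
        # Otherwise draw a fixed-size sample (with replacement: an assumption).
        return [rng.choice(neighbors) for _ in range(num_sampled_neighbors)]


    rng = random.Random(42)
    neighbors = [10, 11, 12, 13]
    all_neighbors = sample_neighbors(neighbors, -1, rng)
    sampled = sample_neighbors(neighbors, 25, rng)

A fixed sample size keeps the cost per vertex bounded on graphs with high-degree vertices, while ``-1`` trades that bound for an exact aggregation over the full neighborhood.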

For a full description of all available hyperparameters and their default values, see the
:class:`pypgx.api.mllib.UnsupervisedGraphWiseModel`,
:class:`pypgx.api.mllib.GraphWiseConvLayerConfig`,
:class:`pypgx.api.mllib.GraphWiseAttentionLayerConfig`,
:class:`pypgx.api.mllib.GraphWiseDgiLayerConfig` and
:class:`pypgx.api.mllib.GraphWiseDominantLayerConfig` docs.


Property types supported
~~~~~~~~~~~~~~~~~~~~~~~~

The model supports two types of properties for both vertices and edges:

* ``continuous`` properties (boolean, double, float, integer, long)

* ``categorical`` properties (string)

For ``categorical`` properties, two categorical configurations are possible:

* ``one-hot-encoding``: each category is mapped to a vector, that is concatenated to other features (default)

* ``embedding table``: each category is mapped to an embedding that is concatenated to other features and is trained along with the model

One-hot-encoding converts each category into an independent vector. Therefore, it is suitable if we want each category
to be interpreted as an equally independent group. For instance, if the categories range from A to E and the letters
carry no particular meaning, one-hot-encoding can be a good fit.

An embedding table is recommended if the semantics of the properties matter, and we want certain categories to be closer
to each other than to others. For example, let's assume there is a "day" property with values ranging from Monday to
Sunday and we want to preserve our intuition that "Tuesday" is closer to "Wednesday" than to "Saturday". Then, by choosing
the embedding table configuration, we can let the vectors that represent the categories be learned during training
so that the vector that is mapped to "Tuesday" becomes close to that of "Wednesday".

Although the embedding table approach has the advantage over one-hot-encoding that more suitable vectors can be learned to
represent each category, this also means that a good amount of data is required to train the embedding table properly.
The one-hot-encoding approach might be better for use cases with limited training data.

When using the embedding table, we let users set the out-of-vocabulary probability. With the given probability,
the embedding will be set to the out-of-vocabulary embedding randomly during training, in order to make the model more
robust to unseen categories during inference.

.. code-block:: python
    :linenos:

    vertex_input_property_configs = [
        analyst.one_hot_encoding_categorical_property_config(
            property_name="vertex_str_feature_1",
            max_vocabulary_size=100,
        ),
        analyst.learned_embedding_categorical_property_config(
            property_name="vertex_str_feature_2",
            embedding_dim=4,
            shared=False,  # whether to share the vocabulary when several types have a property with the same name
            oov_probability=0.001  # probability to set the word embedding to the out-of-vocabulary embedding
        ),
    ]

    model_params = dict(
        vertex_input_property_names=[
            "vertex_int_feature_1", # continuous feature
            "vertex_str_feature_1", # string feature using one-hot-encoding
            "vertex_str_feature_2", # string feature using embedding table
            "vertex_str_feature_3", # string feature using one-hot-encoding (default)
        ],
        vertex_input_property_configs=vertex_input_property_configs,
        enable_accelerator=True # Enable or Disable GPU
    )

    model = analyst.unsupervised_graphwise_builder(**model_params)
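
To make the difference between the two categorical configurations more concrete, the following plain-Python sketch contrasts a one-hot vector with an embedding table lookup that includes an out-of-vocabulary row. It is a conceptual illustration only: the ``one_hot`` function and the ``EmbeddingTable`` class are hypothetical and unrelated to the PGX internals, and no training is performed.

.. code-block:: python
    :linenos:

    import random


    def one_hot(category, vocabulary):
        """Map a category to an indicator vector over a fixed vocabulary."""
        return [1.0 if category == entry else 0.0 for entry in vocabulary]


    class EmbeddingTable:
        """Lookup table with one extra out-of-vocabulary (OOV) row (illustrative)."""

        def __init__(self, vocabulary, embedding_dim, oov_probability, rng):
            self.rng = rng
            self.oov_probability = oov_probability
            self.index = {category: i for i, category in enumerate(vocabulary)}
            # In a real model the rows are updated during training; here they
            # are only randomly initialized. The last row is the OOV embedding.
            self.rows = [
                [rng.uniform(-0.1, 0.1) for _ in range(embedding_dim)]
                for _ in range(len(vocabulary) + 1)
            ]

        def lookup(self, category, training=False):
            oov_row = self.rows[-1]
            if category not in self.index:
                return oov_row  # unseen category at inference time
            if training and self.rng.random() < self.oov_probability:
                return oov_row  # random replacement so the OOV row gets trained
            return self.rows[self.index[category]]


    days = ["Monday", "Tuesday", "Wednesday"]
    table = EmbeddingTable(days, embedding_dim=4, oov_probability=0.001, rng=random.Random(0))
    tuesday_one_hot = one_hot("Tuesday", days)
    tuesday_embedding = table.lookup("Tuesday")
    unseen_embedding = table.lookup("Friday")

The one-hot vector grows with the vocabulary size and carries no notion of similarity, whereas the embedding has a fixed dimension and its values can be adapted during training.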


Training the :class:`UnsupervisedGraphWiseModel`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We can train an ``UnsupervisedGraphWiseModel`` model on a graph:

.. code-block:: python
    :linenos:

    model.fit(graph)

Assuming we have separate graphs for training and testing, we can also add a validation step to the training.
When training a model, the optimal number of training epochs is not known in advance and
it is one of the key parameters that determine the model quality.
Being able to monitor the training and validation losses helps us to identify a good value
for the model parameters and gain visibility in the training process.
The evaluation frequency can be specified in terms of epoch or step.
To configure a validation step, create a ``GraphWiseValidationConfig`` and pass it to the model builder:

.. code-block:: python
    :linenos:

    val_config = analyst.graphwise_validation_config(
        evaluation_frequency=1,
        evaluation_frequency_scale="epoch",
    )

    params = dict(
        vertex_input_property_names=["features"],
        validation_config=val_config,
        seed=17
    )

    model = analyst.unsupervised_graphwise_builder(**params)

After configuring a validation step, pass a graph for validation to the fit method together with the graph for training:

.. code-block:: python
    :linenos:
    
    model.fit(train_graph, test_graph)

Getting Loss value
~~~~~~~~~~~~~~~~~~

We can fetch the training loss value:

.. code-block:: python
    :linenos:

    loss = model.get_training_loss()

Getting Training log
~~~~~~~~~~~~~~~~~~

If a validation step was configured, we can fetch the training log that has training and validation loss information:

.. code-block:: python
    :linenos:

    training_log = model.get_training_log()
    training_log.print()

The output frame will be similar to the following example output:

+-------+---------------------+---------------------+
| epoch | training_loss       | validation_loss     |
+=======+=====================+=====================+
| 1     | 1.1378216743469238  | 0.7227532917802985  |
+-------+---------------------+---------------------+
| 2     | 0.47905975580215454 | 0.36742845245383005 |
+-------+---------------------+---------------------+
| 3     | 0.28058260679244995 | 0.32146902856501663 |
+-------+---------------------+---------------------+

The first column will be named according to the evaluation frequency scale that was set in the validation configuration ("epoch" or "step").
Note that the validation loss is the average of the losses evaluated on all batches of the validation graph,
while the training loss is the loss value logged at that epoch or step (i.e., the loss evaluated on the last batch).
Also, please note that the training log will be overwritten if the fit method is called multiple times.
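
As a small illustration of how such a log can guide the choice of the number of epochs, the sketch below picks the epoch with the lowest validation loss. It works on a plain list of tuples holding the values of the example output above, not on the frame returned by ``get_training_log()``.

.. code-block:: python
    :linenos:

    # Values copied from the example training log above.
    training_log_rows = [
        # (epoch, training_loss, validation_loss)
        (1, 1.1378216743469238, 0.7227532917802985),
        (2, 0.47905975580215454, 0.36742845245383005),
        (3, 0.28058260679244995, 0.32146902856501663),
    ]

    # The epoch with the lowest validation loss is a reasonable stopping point.
    best_epoch, _, best_validation_loss = min(training_log_rows, key=lambda row: row[2])
    print(best_epoch, best_validation_loss)

In this example the validation loss is still decreasing at the last epoch, which suggests that training for more epochs could be worthwhile.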

Inferring embeddings
~~~~~~~~~~~~~~~~~~~~

We can use a trained model to infer embeddings for unseen nodes and store them in a CSV file:

.. code-block:: python
    :linenos:

    vertex_vectors = model.infer_embeddings(
        graph, graph.get_vertices()).flatten_all()
    vertex_vectors.store(tmp + "/vertex_vectors.csv",
                         file_format='csv', overwrite=True)

The schema of ``vertex_vectors`` would be as follows without flattening (:meth:`flatten_all` splits the vector column into separate double-valued columns):

+-----------------------------------------+---------------------+
| vertexId                                | embedding           |
+-----------------------------------------+---------------------+
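
The effect of flattening can be pictured with the following plain-Python sketch, which splits a vector-valued column into one scalar column per component. The helper ``flatten_vector_column``, the sample values and the exact names of the generated columns are hypothetical; the classification example in the next section only relies on the flattened column names starting with ``embedding``.

.. code-block:: python
    :linenos:

    def flatten_vector_column(rows, vector_column):
        """Replace one list-valued column by one scalar column per component."""
        flattened = []
        for row in rows:
            flat = {k: v for k, v in row.items() if k != vector_column}
            for i, value in enumerate(row[vector_column]):
                flat[f"{vector_column}_{i}"] = value  # column naming is hypothetical
            flattened.append(flat)
        return flattened


    rows = [
        {"vertexId": 0, "embedding": [0.1, -0.3, 0.7]},  # made-up values
        {"vertexId": 1, "embedding": [0.4, 0.2, -0.5]},
    ]
    flat_rows = flatten_vector_column(rows, "embedding")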

Classifying the vertices using the obtained embeddings
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We can use the obtained embeddings in downstream vertex classification tasks.
The following shows how we can train an MLP classifier which takes the embeddings as input.
We assume that the vertex label information is stored under the vertex property "bool_label".

.. code-block:: python
    :linenos:

    import pandas as pd
    from sklearn.metrics import accuracy_score, make_scorer
    from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
    from sklearn.neural_network import MLPClassifier
    from sklearn.preprocessing import StandardScaler


    # prepare input data
    vertex_vectors_df = vertex_vectors.to_pandas().astype({"vertexId": int})
    vertex_labels_df = pd.DataFrame([
        {"vertexId": v.id, "labels": properties}
        for v, properties in graph.get_vertex_property("bool_label").get_values()
    ]).astype(int)

    vertex_vectors_with_labels_df = vertex_vectors_df.merge(vertex_labels_df, on="vertexId")

    feature_columns = [c for c in vertex_vectors_df.columns if c.startswith("embedding")]
    x = vertex_vectors_with_labels_df[feature_columns].to_numpy()
    y = vertex_vectors_with_labels_df["labels"].to_numpy()

    scaler = StandardScaler()
    x = scaler.fit_transform(x)

    # define an MLP classifier
    classifier = MLPClassifier(
        hidden_layer_sizes=(6,),
        learning_rate_init=0.05,
        max_iter=2000,
        random_state=42,
    )

    # define a metric and evaluate with cross-validation
    cv = RepeatedStratifiedKFold(n_splits=3, n_repeats=3, random_state=42)
    scorer = make_scorer(accuracy_score, greater_is_better=True)
    scores = cross_val_score(classifier, x, y, scoring=scorer, cv=cv, n_jobs=-1)


Storing a trained model
~~~~~~~~~~~~~~~~~~~~~~~

Models can be stored either to the server file system, or to a database.

The following shows how to store a trained :class:`UnsupervisedGraphWise` model to a specified file path:

.. code-block:: python
    :linenos:

    model.export().file(path=tmp + "/model.model", key="test", overwrite=True)

When storing models in a database, they are stored as a row inside a model store table.
The following shows how to store a trained :class:`UnsupervisedGraphWise` model in a database in a specific model store table:

.. code-block:: python
    :linenos:

    model.export().db(
        "modeltablename",
        "model_name",
        username="user",
        password="password",
        jdbc_url="jdbcUrl"
    )


Loading a pre-trained model
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Similarly to storing, models can be loaded from a file in the server file system, or from a database.

We can load a pre-trained :class:`UnsupervisedGraphWise` model from a specified file path as follows:

.. code-block:: python
    :linenos:

    model = analyst.load_unsupervised_graphwise_model(
        tmp + "/model.model",
        key="test"
    )
    simple_graph = session.create_graph_builder() \
        .add_vertex(0) \
        .set_property("feat1", 0.5) \
        .set_property("const_feature", 0.5) \
        .set_property("label", True) \
        .add_vertex(1) \
        .set_property("feat2", -0.5) \
        .set_property("const_feature", 0.5) \
        .set_property("label", False) \
        .add_edge(0, 1) \
        .build()

We can load a pre-trained :class:`UnsupervisedGraphWise` model from a model store table in a database as follows:

.. code-block:: python
    :linenos:

    model = analyst.get_unsupervised_graphwise_model_loader().db(
        "modeltablename",
        "model_name",
        username="user",
        password="password",
        jdbc_url="jdbcUrl"
    )

Explaining a Prediction
~~~~~~~~~~~~~~~~~~~~~~~

In order to understand which features and vertices were important for a prediction of the model, 
we can generate an :class:`UnsupervisedGnnExplanation` using a technique similar to the `GNNExplainer by Ying et al. <https://papers.nips.cc/paper/2019/file/d80b7040b773199015de6d3b4293c8ff-Paper.pdf>`_.

The explanation holds information related to

* graph structure: an importance score for each vertex

* features: an importance score for each graph property

Note that the vertex being explained is always assigned importance 1. Further, the feature importances are scaled such
that the most important feature has importance 1.

Additionally, an :class:`UnsupervisedGnnExplanation` contains the inferred embedding.
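
The scaling of the feature importances can be illustrated as follows. Dividing by the largest score is one simple way to obtain a maximum of 1 and is an assumption here; the function name and the raw scores are made up for the example.

.. code-block:: python
    :linenos:

    def scale_importances(raw_importances):
        """Rescale scores so that the largest one becomes exactly 1."""
        largest = max(raw_importances.values())
        if largest == 0:
            return dict(raw_importances)
        return {name: score / largest for name, score in raw_importances.items()}


    raw = {"feat1": 0.2, "feat2": 0.8, "feat3": 0.4}  # made-up raw scores
    scaled = scale_importances(raw)

Because of this scaling, feature importances are best read relative to each other within one explanation rather than compared across explanations.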

To get explanations for a model's predictions, its :class:`UnsupervisedGnnExplainer` object can be obtained
using the :meth:`gnn_explainer` method. After obtaining the :class:`UnsupervisedGnnExplainer`, its
:meth:`infer_and_explain` method can be used to request an explanation for a vertex.

The parameters of the explainer can be configured while the explainer is being created or afterwards
using the relevant setter functions. The configurable parameters for the :class:`UnsupervisedGnnExplainer`
are:

* ``num_optimization_steps``: the number of optimization steps used by the explainer

* ``learning_rate``: the learning rate of the explainer

* ``marginalize``: whether the explainer loss is marginalized over features. This can help in cases where there are important features that take values close to zero. Without marginalization, the explainer can learn to mask such features out even if they are important. Marginalization solves this by instead learning a mask for the deviation from the estimated input distribution.

* ``num_clusters``: the number of clusters to use in the explainer loss. The unsupervised explainer uses k-means clustering to compute the explainer loss that is optimized. If the approximate number of components in the graph is known, it is a good idea to set the number of clusters to this number.

* ``num_samples``: the number of vertex samples to use to optimize the explainer. For the sake of performance, the explainer computes the loss on this number of randomly sampled vertices. Using more samples will be more accurate but will take longer and use more resources.

Note that, in order to achieve best results, the features should be centered around 0.
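
Centering can be done in a preprocessing step before the features are loaded into the graph. The following is a minimal plain-Python sketch that subtracts the per-feature mean; the function name and the feature values are hypothetical.

.. code-block:: python
    :linenos:

    def center_columns(rows):
        """Subtract each column's mean so that every feature is centered at 0."""
        num_rows = len(rows)
        num_cols = len(rows[0])
        means = [sum(row[j] for row in rows) / num_rows for j in range(num_cols)]
        return [[row[j] - means[j] for j in range(num_cols)] for row in rows]


    features = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]  # made-up feature values
    centered = center_columns(features)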

Let's assume we have a graph ``component_graph`` that contains ``k`` densely connected *components*.
That is, there are many edges between vertices of the same component and few edges between any two components.
By training an unsupervised GraphWise model on this graph, we obtain a model that we expect to produce similar embeddings for vertices in a densely connected component.

The example below shows how to generate an explanation for a vertex of ``component_graph`` at inference time.
We expect vertices from the same component to have a higher importance than vertices from a different component.
Note that the feature importances are not relevant in this example.

.. code-block:: python
    :linenos:

    # load 'component_graph' with vertex features 'feat1' and 'feat2'
    feat1_property = graph.get_vertex_property("feat1")
    feat2_property = graph.get_vertex_property("feat2")
    # build and train unsupervised GraphWise model as described above

    # obtain and configure the explainer
    # setting the num_clusters argument to the expected number of clusters may improve
    # explanation results as the explainer optimization will try to cluster samples into
    # this number of clusters
    explainer = model.gnn_explainer(num_clusters=50)
    # set the number of samples to compute the loss over during explainer optimization
    explainer.num_samples = 10000

    # explain prediction of vertex 0
    explanation = explainer.infer_and_explain(
        graph=graph,
        vertex=graph.get_vertex(0)
    )

    # retrieve computation graph with importances
    importance_graph = explanation.get_importance_graph()

    # retrieve importance of vertices
    # vertex 1 is in the same densely connected component as vertex 0
    # vertex 2 is in a different component
    importance_property = explanation.get_vertex_importance_property()
    importance_vertex_0 = importance_property[0]
    importance_vertex_1 = importance_property[1]
    importance_vertex_2 = importance_property[2]

    # retrieve feature importance (not relevant for this example)
    feature_importances = explanation.get_vertex_feature_importance()
    importance_feat1_prop = feature_importances[feat1_property]
    importance_feat2_prop = feature_importances[feat2_property]

Destroying a model
~~~~~~~~~~~~~~~~~~

We can destroy a model as follows:

.. code-block:: python
    :linenos:

    model.destroy()