*******************
SupervisedGraphWise
*******************

Overview
--------

:class:`SupervisedGraphWise` is an inductive vertex representation learning algorithm which is able to leverage vertex feature
information. It can be applied to a wide variety of tasks, including vertex classification and link prediction.

:class:`SupervisedGraphWise` is based on `GraphSAGE by Hamilton et al. <https://cs.stanford.edu/people/jure/pubs/graphsage-nips17.pdf>`_.

Model Structure
---------------

A :class:`SupervisedGraphWise` model consists of graph convolutional layers followed by several prediction layers.
The forward pass through a convolutional layer for a vertex proceeds as follows:


1. A set of neighbors of the vertex is sampled.

2. The previous layer representations of the neighbors are mean-aggregated, and the aggregated features are concatenated with the previous layer representation of the vertex.

3. This concatenated vector is multiplied with weights, and a bias vector is added.

4. The result is normalized such that the layer output has unit norm.

The prediction layers are standard neural network layers.
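
The four steps of a convolutional layer can be sketched in plain Python for a single vertex. This is an illustration of the computation described above, not the PGX implementation. The sampling strategy (here without replacement), the concatenation order, and the zero vector used for vertices without neighbors are assumptions. The activation function and dropout exposed by the layer configuration are omitted. The sketch mirrors the ``-1`` convention of ``num_sampled_neighbors`` described below, where ``-1`` keeps all neighbors.

.. code-block:: python
    :linenos:

    import math
    import random


    def sample_neighbors(neighbors, num_sampled, rng):
        """Step 1: sample neighbors; -1 (or a large value) keeps all of them."""
        if num_sampled == -1 or num_sampled >= len(neighbors):
            return list(neighbors)
        # Sampling without replacement is an assumption of this sketch
        return rng.sample(neighbors, num_sampled)


    def conv_layer_forward(vertex, adjacency, prev_repr, weights, bias,
                           num_sampled=-1, rng=None):
        rng = rng or random.Random(0)
        sampled = sample_neighbors(adjacency[vertex], num_sampled, rng)
        dim = len(prev_repr[vertex])
        # Step 2: mean-aggregate the neighbors' previous-layer representations
        # (zeros if the vertex has no neighbors) ...
        if sampled:
            agg = [sum(prev_repr[u][i] for u in sampled) / len(sampled)
                   for i in range(dim)]
        else:
            agg = [0.0] * dim
        # ... and concatenate with the vertex's own previous-layer representation
        concat = prev_repr[vertex] + agg
        # Step 3: multiply with the weights and add the bias vector
        out = [sum(w * x for w, x in zip(row, concat)) + b
               for row, b in zip(weights, bias)]
        # Step 4: normalize the output to unit (L2) norm
        norm = math.sqrt(sum(x * x for x in out))
        return [x / norm for x in out] if norm > 0 else out


    # Toy graph: vertex 0 is connected to vertices 1 and 2
    adjacency = {0: [1, 2], 1: [0], 2: [0]}
    prev_repr = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [0.0, 3.0]}
    # 2 output dimensions; 4 input dimensions (2 own + 2 aggregated)
    weights = [[1.0, 0.0, 0.0, 0.0],
               [0.0, 0.0, 0.0, 1.0]]
    bias = [0.0, 0.0]
    h = conv_layer_forward(0, adjacency, prev_repr, weights, bias)
    print(h)

Here the aggregated neighbor vector is ``[0.0, 2.0]``, the layer output before normalization is ``[1.0, 2.0]``, and the normalized output is that vector divided by its length.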

Functionalities
---------------

We describe here the usage of the main functionalities of our implementation of ``GraphSAGE`` in PGX using the `Cora <https://relational.fit.cvut.cz/dataset/CORA>`_ graph as an example.

Loading a graph
~~~~~~~~~~~~~~~

First, we create a session and an analyst:

.. code-block:: python
    :linenos:

    import pypgx

    session = pypgx.get_session()
    analyst = session.analyst

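Next, we load the Cora graph and split it into a training graph, which keeps every vertex whose ID is not a multiple of 4,
and a list of test vertices containing all remaining vertices. ``<path>/cora.json`` is a placeholder for the graph configuration file:

.. code-block:: python
    :linenos:

    from pypgx.api.filters import VertexFilter

    # Load the full Cora graph from its graph configuration
    full_graph = session.read_graph_with_properties(
        "<path>/cora.json", graph_name="cora")

    # Keep every vertex whose ID is not a multiple of 4 for training
    vertex_filter = VertexFilter.from_pgql_result_set(
        session.query_pgql(
            "SELECT v FROM MATCH (v) ON cora WHERE ID(v) % 4 > 0"), "v"
    )
    train_graph = full_graph.filter(vertex_filter)

    # All vertices that are not in the training graph are test vertices
    test_vertices = []
    for v in full_graph.get_vertices():
        if not train_graph.has_vertex(v.id):
            test_vertices.append(v)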

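For non-negative integer vertex IDs, the filter above keeps three out of every four vertices for training. The following plain-Python sketch (no PGX involved, and assuming integer IDs) applies the same rule to a range of IDs to make the split explicit:

.. code-block:: python
    :linenos:

    def split_vertex_ids(vertex_ids):
        """Mirror the PGQL condition ID(v) % 4 > 0 on a list of integer IDs."""
        train_ids = [vid for vid in vertex_ids if vid % 4 > 0]
        test_ids = [vid for vid in vertex_ids if vid % 4 == 0]
        return train_ids, test_ids


    train_ids, test_ids = split_vertex_ids(range(12))
    print(train_ids)  # [1, 2, 3, 5, 6, 7, 9, 10, 11]
    print(test_ids)   # [0, 4, 8]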

Building a GraphWise Model (minimal)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We build a ``GraphWise`` model using the minimal configuration and default hyperparameters. Note that even though only
one feature property is specified in this example, you can specify arbitrarily many.

.. code-block:: python
    :linenos:
  
    params = dict(
        vertex_target_property_name="label",
        vertex_input_property_names=["features"]
    )
    model = analyst.supervised_graphwise_builder(**params)


Advanced hyperparameter customization
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The implementation allows for very rich hyperparameter customization.
Internally, for each vertex, GraphWise aggregates the representations of the vertex's neighbors. This operation can be configured through a sub-config class:
either :class:`GraphWiseConvLayerConfig` or :class:`GraphWiseAttentionLayerConfig`.

* GraphWiseConvLayer is based on `Inductive Representation Learning on Large Graphs (GraphSAGE) by Hamilton et al. <https://arxiv.org/pdf/1706.02216.pdf>`_

* GraphWiseAttentionLayer is based on `Graph Attention Networks (GAT) by Velickovic et al. <https://arxiv.org/pdf/1710.10903.pdf>`_, which learns attention weights for the neighbors instead of averaging them uniformly; this makes the aggregation more expressive but comes with a larger computational cost.
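
To illustrate how attention differs from mean aggregation, the sketch below implements a single attention head following the formulation of the GAT paper: LeakyReLU scores over the concatenated projected representations, normalized with a softmax over the neighborhood, which includes the vertex itself. It is not the PGX implementation, and all parameter values are made up. With a zero attention vector, every neighbor receives the same weight, which reduces to uniform averaging.

.. code-block:: python
    :linenos:

    import math


    def matvec(matrix, vec):
        return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


    def leaky_relu(x, slope=0.2):
        return x if x > 0 else slope * x


    def gat_aggregate(vertex, neighbors, features, weight, attn):
        """Single-head GAT-style aggregation for one vertex (no output activation)."""
        # GAT attends over the neighborhood including the vertex itself
        nodes = [vertex] + [u for u in neighbors if u != vertex]
        projected = {u: matvec(weight, features[u]) for u in nodes}
        # Unnormalized scores e_ij = LeakyReLU(a . [W h_i || W h_j])
        scores = [leaky_relu(sum(a * x for a, x in
                                 zip(attn, projected[vertex] + projected[u])))
                  for u in nodes]
        # Softmax over the neighborhood (shifted by the max for stability)
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        alphas = [e / sum(exps) for e in exps]
        # Attention-weighted sum of the projected representations
        out = [sum(alpha * projected[u][k] for alpha, u in zip(alphas, nodes))
               for k in range(len(weight))]
        return out, alphas


    features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [0.0, 3.0]}
    identity = [[1.0, 0.0], [0.0, 1.0]]
    # A zero attention vector gives uniform weights, i.e. plain averaging
    out, alphas = gat_aggregate(0, [1, 2], features, identity, [0.0] * 4)
    print(alphas)
    print(out)

A trained attention vector instead lets the layer assign larger weights to more informative neighbors, at the cost of computing a score for every neighbor.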

The prediction layer configuration is implemented through the :class:`pypgx.api.mllib.GraphWisePredictionLayerConfig` class.
In the following, we build such configurations and use them in a model.
We specify a weight decay of ``0.001`` and dropout with dropping probability ``0.5`` to counteract overfitting.

To enable or disable the GPU, use the ``enable_accelerator`` parameter. It is enabled by default; however, if no GPU device is available or the CUDA toolkit is not installed, the feature is disabled and the CPU is used for all mllib operations.


.. code-block:: python
    :linenos:

    weight_property = analyst.pagerank(train_graph).name
    conv_layer_config = dict(
        num_sampled_neighbors=25,
        activation_fn='tanh',
        weight_init_scheme='xavier',
        neighbor_weight_property_name=weight_property,
        dropout_rate=0.5
    )

    conv_layer = analyst.graphwise_conv_layer_config(**conv_layer_config)
    pred_layer_config = dict(
        hidden_dim=32,
        activation_fn='relu',
        weight_init_scheme='he',
        dropout_rate=0.5
    )

    pred_layer = analyst.graphwise_pred_layer_config(**pred_layer_config)
    params = dict(
        vertex_target_property_name="labels",
        conv_layer_config=[conv_layer],
        pred_layer_config=[pred_layer],
        vertex_input_property_names=["vertex_features"],
        edge_input_property_names=["edge_features"],
        seed=17,
        weight_decay=0.001
    )

    model = analyst.supervised_graphwise_builder(**params)

The above code uses :class:`GraphWiseConvLayerConfig` for the convolutional layer configuration.
If ``num_sampled_neighbors`` is set to ``-1``, all neighboring vertices are sampled.
The convolutional layer configuration can be replaced with :class:`GraphWiseAttentionLayerConfig` if a graph attention network model is desired, for example:

.. code-block:: python
    :linenos:

    conv_layer_config = dict(
        num_sampled_neighbors=25,
        activation_fn='leaky_relu',
        weight_init_scheme='xavier_uniform',
        num_heads=4,
        dropout_rate=0.5
    )

    conv_layer = analyst.graphwise_attention_layer_config(**conv_layer_config)

For a full description of all available hyperparameters and their default values, see the
:class:`pypgx.api.mllib.SupervisedGraphWiseModelBuilder`,
:class:`pypgx.api.mllib.GraphWiseConvLayerConfig`,
:class:`pypgx.api.mllib.GraphWiseAttentionLayerConfig` and
:class:`pypgx.api.mllib.GraphWisePredictionLayerConfig` docs.

Property types supported
~~~~~~~~~~~~~~~~~~~~~~~~

The model supports two types of properties for both vertices and edges:

* ``continuous`` properties (boolean, double, float, integer, long)

* ``categorical`` properties (string)

For ``categorical`` properties, two categorical configurations are possible:

* ``one-hot-encoding``: each category is mapped to a one-hot vector that is concatenated with the other features (default)

* ``embedding table``: each category is mapped to an embedding that is concatenated with the other features and is trained along with the model

One-hot-encoding converts each category into an independent vector. It is therefore suitable if each category should
be interpreted as an equally independent group. For instance, if the categories are the letters A to E and the letters
carry no particular meaning, one-hot-encoding can be a good fit.
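
The following plain-Python sketch shows what one-hot-encoding does to a categorical feature and why it treats all categories as equally independent: every pair of distinct one-hot vectors is the same distance apart. The vocabulary construction and the concatenation order are illustrative assumptions, not the PGX internals.

.. code-block:: python
    :linenos:

    import itertools
    import math


    def build_vocabulary(values):
        """Assign an index to each distinct category, in order of first appearance."""
        vocab = {}
        for value in values:
            vocab.setdefault(value, len(vocab))
        return vocab


    def one_hot(value, vocab):
        vector = [0.0] * len(vocab)
        vector[vocab[value]] = 1.0
        return vector


    vocab = build_vocabulary(["A", "B", "C", "D", "E"])
    # One-hot vector for "C" concatenated after a hypothetical continuous feature 0.7
    features = [0.7] + one_hot("C", vocab)
    print(features)  # [0.7, 0.0, 0.0, 1.0, 0.0, 0.0]

    # Every pair of distinct categories is equally far apart (sqrt(2))
    distances = {
        math.dist(one_hot(a, vocab), one_hot(b, vocab))
        for a, b in itertools.combinations(vocab, 2)
    }
    print(distances)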

An embedding table is recommended if the semantics of the categories matter and we want certain categories to be closer
to each other than to others. For example, assume there is a "day" property with values ranging from Monday to
Sunday, and we want to preserve our intuition that "Tuesday" is closer to "Wednesday" than to "Saturday". By choosing
the embedding table configuration, we let the vectors that represent the categories be learned during training,
so that the vector mapped to "Tuesday" becomes close to that of "Wednesday".

Although the embedding table approach has the advantage over one-hot-encoding of learning more suitable vectors to
represent each category, it also requires a good amount of data to train the embedding table properly.
The one-hot-encoding approach might be better for use cases with limited training data.

When using the embedding table, users can set the out-of-vocabulary probability. During training, a category's embedding
is randomly replaced with the out-of-vocabulary embedding with this probability, which makes the model more
robust to categories that were not seen during training when they appear at inference time.
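
A minimal sketch of an embedding table with an out-of-vocabulary (OOV) row, mirroring the ``embedding_dim`` and ``oov_probability`` settings used in the example below. Reserving row 0 for the OOV embedding and the uniform random initialization are assumptions made for illustration, and the training of the rows is omitted.

.. code-block:: python
    :linenos:

    import random


    class EmbeddingTable:
        """Toy embedding table whose row 0 is reserved for out-of-vocabulary values."""

        def __init__(self, categories, embedding_dim, oov_probability, seed=0):
            self.rng = random.Random(seed)
            self.index = {c: i + 1 for i, c in enumerate(categories)}
            # Random initial rows; in a real model they are trained with the network
            self.rows = [[self.rng.uniform(-0.1, 0.1) for _ in range(embedding_dim)]
                         for _ in range(len(categories) + 1)]
            self.oov_probability = oov_probability

        def lookup(self, category, training=False):
            row = self.index.get(category, 0)  # unseen categories use the OOV row
            if training and self.rng.random() < self.oov_probability:
                row = 0  # randomly swap in the OOV embedding during training
            return self.rows[row]


    days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday",
            "Saturday", "Sunday"]
    table = EmbeddingTable(days, embedding_dim=4, oov_probability=0.001)
    print(table.lookup("Tuesday"))  # row for a known category
    print(table.lookup("Holiday"))  # unseen category -> OOV row

Because the OOV row is occasionally used for known categories during training, it receives training signal and is less arbitrary when an unseen category maps to it at inference time.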

.. code-block:: python
    :linenos:

    vertex_input_property_configs = [
        analyst.one_hot_encoding_categorical_property_config(
            property_name="vertex_str_feature_1",
            max_vocabulary_size=100,
        ),
        analyst.learned_embedding_categorical_property_config(
            property_name="vertex_str_feature_2",
            embedding_dim=4,
            shared=False,  # whether to share the vocabulary when several types have a property with the same name
            oov_probability=0.001  # probability of replacing the embedding with the out-of-vocabulary embedding
        ),
    ]

    model_params = dict(
        vertex_input_property_names=[
            "vertex_int_feature_1", # continuous feature
            "vertex_str_feature_1", # string feature using one-hot-encoding
            "vertex_str_feature_2", # string feature using embedding table
            "vertex_str_feature_3", # string feature using one-hot-encoding (default)
        ],
        vertex_input_property_configs=vertex_input_property_configs,
        vertex_target_property_name="labels",
        enable_accelerator=True # Enable or Disable GPU
    )

    model = analyst.supervised_graphwise_builder(**model_params)


Classification vs Regression models
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Regardless of the type of the property you are trying to predict, the model performs classification by default.
Even if this property is numeric, the model assigns one label to each distinct value found and classifies vertices into these labels.

In some cases, you may prefer to infer continuous values for your property when it is an integer or a float.
This is called the regression mode. To enable it, provide an :class:`MSELoss` loss function object (through the ``loss_fn`` parameter).
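
The difference between the two modes can be illustrated in plain Python. In classification mode, the values 1, 2 and 5 are just three unrelated labels, so predicting 2 for a vertex whose value is 5 counts as just as wrong as predicting 1. In regression mode, the mean squared error (the quantity an MSE loss minimizes) rewards predictions that are close to the true value. The helpers below are illustrative only.

.. code-block:: python
    :linenos:

    def class_labels(values):
        """Classification mode: every distinct value becomes a separate label."""
        return sorted(set(values))


    def mean_squared_error(predictions, targets):
        """Regression mode: penalize the squared distance to the true value."""
        return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


    targets = [1, 2, 2, 5]
    print(class_labels(targets))  # [1, 2, 5]
    print(mean_squared_error([1.5, 2.0, 2.5, 4.0], targets))  # 0.375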

Setting a custom Loss Function and Batch Generator (for Anomaly Detection)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

It is possible to select different loss functions for the supervised model by providing a loss function object,
and different batch generators by providing a batch generator type. This is useful for applications such as
Anomaly Detection, which can be cast into the standard supervised framework but require different loss functions
and batch generators.

:class:`SupervisedGraphWise` models can use the `DevNetLoss <https://arxiv.org/pdf/1911.08623.pdf>`_ and the **StratifiedOversamplingBatchGenerator**. The DevNetLoss takes two parameters: the confidence margin and the value that anomalies take in the target property.
In the following example, we assume that ``conv_layer`` has already been defined:

.. code-block:: python
    :linenos:

    from pypgx.api.mllib import DevNetLoss

    pred_layer_config = dict(
        hidden_dim=32,
        activation_fn='linear'
    )

    pred_layer = analyst.graphwise_pred_layer_config(**pred_layer_config)
    params = dict(
        vertex_target_property_name="labels",
        conv_layer_config=[conv_layer],
        pred_layer_config=[pred_layer],
        vertex_input_property_names=["vertex_features"],
        edge_input_property_names=["edge_features"],
        loss_fn=DevNetLoss(5.0, True),
        batch_gen="stratified_oversampling",
        seed=17,
        enable_accelerator=True # Enable or Disable GPU
    )

    model = analyst.supervised_graphwise_builder(**params)
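
For reference, the deviation loss defined in the DevNet paper can be sketched as follows. Anomaly scores are standardized against reference scores drawn from a standard normal prior; normal vertices are pulled toward the reference mean, while anomalies are pushed at least the confidence margin (the first ``DevNetLoss`` argument, ``5.0`` above) above it. The ``is_anomaly`` flag stands for "the target property equals the anomaly value" (the second argument). The number of reference scores is an assumption, and PGX's internal implementation may differ.

.. code-block:: python
    :linenos:

    import random
    import statistics


    def devnet_loss(score, is_anomaly, margin=5.0, num_reference=5000, seed=0):
        """Deviation loss from Pang et al. (2019) for a single anomaly score."""
        rng = random.Random(seed)
        # Reference scores drawn from a standard normal prior
        reference = [rng.gauss(0.0, 1.0) for _ in range(num_reference)]
        mu = statistics.mean(reference)
        sigma = statistics.stdev(reference)
        deviation = (score - mu) / sigma
        if is_anomaly:
            # Push anomalies at least `margin` standard deviations above the mean
            return max(0.0, margin - deviation)
        # Pull normal vertices toward the mean of the reference scores
        return abs(deviation)


    print(devnet_loss(0.0, is_anomaly=False))  # small: a typical normal score
    print(devnet_loss(0.0, is_anomaly=True))   # about 5: anomaly not yet separated
    print(devnet_loss(6.0, is_anomaly=True))   # 0.0: beyond the margin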

Training the SupervisedGraphWiseModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We can train a ``SupervisedGraphWiseModel`` on a graph:

.. code-block:: python
    :linenos:
    
    model.fit(train_graph)

Getting Loss value
~~~~~~~~~~~~~~~~~~

We can fetch the training loss value:

.. code-block:: python
    :linenos:

    loss = model.get_training_loss()

Inferring vertex labels
~~~~~~~~~~~~~~~~~~~~~~~

We can infer the labels for vertices on any graph (including vertices or graphs that were not seen during training):

.. code-block:: python
    :linenos:

    labels = model.infer(full_graph, test_vertices)
    labels.print()

If the model is a classification model, it is also possible to set the decision threshold applied to the logits by
passing it as an extra parameter (the default is 0):

.. code-block:: python
    :linenos:

    labels = model.infer(
        full_graph,
        full_graph.get_vertices(),
        6
    )
    labels.print()

The output will be similar to the following example output:

+----------+-----------------------+
| vertexId | label                 |
+==========+=======================+
| 2        | Neural Networks       |
+----------+-----------------------+
| 6        | Theory                |
+----------+-----------------------+
| 7        | Case Based            |
+----------+-----------------------+
| 22       | Rule Learning         |
+----------+-----------------------+
| 30       | Theory                |
+----------+-----------------------+
| 34       | Neural Networks       |
+----------+-----------------------+
| 47       | Case Based            |
+----------+-----------------------+
| 48       | Probabilistic Methods |
+----------+-----------------------+
| 50       | Theory                |
+----------+-----------------------+
| 52       | Theory                |
+----------+-----------------------+
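
Assuming a binary classifier that outputs one logit per vertex, the effect of the decision threshold can be sketched as below: raising it from the default 0 to 6 labels a vertex as positive only when its logit is well above zero. Whether PGX compares with ``>`` or ``>=``, and how multi-class outputs are handled, are assumptions here, so treat this as an illustration only.

.. code-block:: python
    :linenos:

    import math


    def apply_threshold(logits, threshold=0.0):
        """Predict the positive label when a logit exceeds the decision threshold."""
        return [logit > threshold for logit in logits]


    logits = [-1.2, 0.3, 5.5, 7.1]
    print(apply_threshold(logits))     # [False, True, True, True]
    print(apply_threshold(logits, 6))  # [False, False, False, True]

    # A logit of 0 corresponds to a sigmoid probability of 0.5
    print(1 / (1 + math.exp(-0.0)))    # 0.5

With the default threshold of 0, thresholding the logit is equivalent to thresholding the sigmoid probability at 0.5.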

In a similar fashion, if the model is a classification model, you can get the model confidence for each class by
inferring the prediction logits:

.. code-block:: python
    :linenos:
  
    logits = model.infer_logits(full_graph, test_vertices)
    logits.print()
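
Logits are unnormalized scores. For a multi-class model, a common way to turn them into per-class confidences is the softmax function, sketched below in plain Python. Whether ``infer_logits`` already applies such a normalization is not stated here, so the sketch assumes raw logits; the values are hypothetical.

.. code-block:: python
    :linenos:

    import math


    def softmax(logits):
        """Convert a vector of logits into probabilities that sum to 1."""
        top = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(x - top) for x in logits]
        total = sum(exps)
        return [e / total for e in exps]


    # Hypothetical logits of one vertex for three classes
    probabilities = softmax([2.0, 1.0, 0.1])
    print([round(p, 3) for p in probabilities])  # [0.659, 0.242, 0.099]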

If the model is a classification model, the ``infer_labels`` method is also available and equivalent to ``infer``.

Evaluating model performance
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

:meth:`evaluate` is a convenience method to evaluate various metrics for the model:

.. code-block:: python
    :linenos:

    model.evaluate(full_graph, test_vertices).print()

Similar to inferring labels, we can add the decision threshold as an extra parameter:

.. code-block:: python
    :linenos:

    model.evaluate(full_graph, test_vertices, 6).print()

The output will be similar to the following examples. For a classification model:

+----------+-----------+--------+----------+
| Accuracy | Precision | Recall | F1-Score |
+==========+===========+========+==========+
| 0.8488   | 0.8523    | 0.831  | 0.8367   |
+----------+-----------+--------+----------+

For a regression model:

+--------------------+
| MSE                |
+====================+
| 0.9573243436116953 |
+--------------------+


If the model is a classification model, the ``evaluate_labels`` method is also available and equivalent to ``evaluate``.
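
To make the reported classification metrics concrete, the sketch below computes accuracy and macro-averaged precision, recall and F1-score from true and predicted labels. Macro averaging (an unweighted mean over classes) is an assumption; ``evaluate`` may average differently. The labels are made up.

.. code-block:: python
    :linenos:

    def classification_metrics(y_true, y_pred):
        """Accuracy plus macro-averaged precision, recall and F1-score."""
        labels = sorted(set(y_true) | set(y_pred))
        pairs = list(zip(y_true, y_pred))
        accuracy = sum(t == p for t, p in pairs) / len(pairs)
        precisions, recalls, f1_scores = [], [], []
        for label in labels:
            tp = sum(t == label and p == label for t, p in pairs)
            fp = sum(t != label and p == label for t, p in pairs)
            fn = sum(t == label and p != label for t, p in pairs)
            precision = tp / (tp + fp) if tp + fp else 0.0
            recall = tp / (tp + fn) if tp + fn else 0.0
            f1 = (2 * precision * recall / (precision + recall)
                  if precision + recall else 0.0)
            precisions.append(precision)
            recalls.append(recall)
            f1_scores.append(f1)
        n = len(labels)
        return {
            "Accuracy": accuracy,
            "Precision": sum(precisions) / n,
            "Recall": sum(recalls) / n,
            "F1-Score": sum(f1_scores) / n,
        }


    y_true = ["Theory", "Theory", "Case Based", "Neural Networks"]
    y_pred = ["Theory", "Case Based", "Case Based", "Neural Networks"]
    print(classification_metrics(y_true, y_pred))

With macro averaging, the F1-score is the mean of the per-class F1-scores, so it need not equal the harmonic mean of the averaged precision and recall.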

Inferring embeddings
~~~~~~~~~~~~~~~~~~~~

We can use a trained model to infer embeddings for unseen vertices and store them in a ``CSV`` file:

.. code-block:: python
    :linenos:

    vertex_vectors = model.infer_embeddings(full_graph, test_vertices).flatten_all()
    vertex_vectors.store("<path>/vertex_vectors.csv", file_format="csv", overwrite=True)

Without flattening, the schema of ``vertex_vectors`` is as follows (``flatten_all`` splits the vector column into separate double-valued columns):

+-----------------------------------------+---------------------+
| vertexId                                | embedding           |
+-----------------------------------------+---------------------+
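
Conceptually, flattening turns each vector-valued ``embedding`` cell into one double-valued column per dimension. The plain-Python sketch below does this for two hypothetical rows and writes the result as CSV; the ``embedding_<i>`` column names are illustrative assumptions, not necessarily the names produced by ``flatten_all``.

.. code-block:: python
    :linenos:

    import csv
    import io


    def flatten_embeddings(rows):
        """Split (vertex_id, vector) rows into one double-valued column per dimension."""
        dim = len(rows[0][1])
        header = ["vertexId"] + [f"embedding_{i}" for i in range(dim)]  # hypothetical names
        flat = [[vertex_id] + [float(x) for x in vector] for vertex_id, vector in rows]
        return header, flat


    header, flat = flatten_embeddings([(2, [0.1, -0.3]), (6, [0.5, 0.2])])
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(header)
    writer.writerows(flat)
    print(buffer.getvalue())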

Storing a trained model
~~~~~~~~~~~~~~~~~~~~~~~

Models can be stored either in the server file system or in a database.

The following shows how to store a trained :class:`SupervisedGraphWise` model to a specified file path:

.. code-block:: python
    :linenos:

    # "key" is a placeholder for the key used to encrypt the stored model
    model.export().file("<path>/<model_name>", "key")

When storing models in a database, they are stored as a row inside a model store table.
The following shows how to store a trained :class:`SupervisedGraphWise` model in a specific model store table in a database:

.. code-block:: python
    :linenos:

    model.export().db(
        "modeltablename",
        "model_name",
        username="user",
        password="password",
        jdbc_url="jdbcUrl"
    )

Loading a pre-trained model
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Similarly to storing, models can be loaded from a file in the server file system, or from a database.
We can load a pre-trained :class:`SupervisedGraphWise` model from a specified file path as follows:

.. code-block:: python
    :linenos:

    model = analyst.load_supervised_graphwise_model("<path>/<model>", "key")

We can load a pre-trained :class:`SupervisedGraphWise` model from a model store table in a database as follows:

.. code-block:: python
    :linenos:

    model = analyst.get_supervised_graphwise_model_loader().db(
        "modeltablename",
        "model_name",
        username="user",
        password="password",
        jdbc_url="jdbcUrl"
    )

Explaining a Prediction
~~~~~~~~~~~~~~~~~~~~~~~

In order to understand which features and vertices were important for a prediction of the model, we can generate a ``SupervisedGnnExplanation`` using a technique similar to the `GNNExplainer by Ying et al. <https://papers.nips.cc/paper/2019/file/d80b7040b773199015de6d3b4293c8ff-Paper.pdf>`_.

The explanation holds information related to:

* graph structure: an importance score for each vertex

* features: an importance score for each graph property

Note that the vertex being explained is always assigned importance 1. Further, the feature importances are scaled such that the most important feature has importance 1.
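
The scaling of the feature importances can be pictured as dividing every raw score by the largest one, as in the sketch below. The raw scores are made up, and dividing by the maximum (which assumes positive raw scores) is an assumption about how the scaling is done.

.. code-block:: python
    :linenos:

    def scale_importances(raw_scores):
        """Rescale raw importance scores so that the largest one equals 1."""
        top = max(raw_scores.values())
        return {name: score / top for name, score in raw_scores.items()}


    # Hypothetical raw scores for the two features of the example below
    scaled = scale_importances({"label_feature": 2.0, "const_feature": 0.5})
    print(scaled)  # {'label_feature': 1.0, 'const_feature': 0.25}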

Additionally, a :class:`SupervisedGnnExplanation` contains the inferred embedding, logits, and label.

To get explanations for a model's predictions, its :class:`SupervisedGnnExplainer` object can be obtained
using the :meth:`gnn_explainer` method. After obtaining the :class:`SupervisedGnnExplainer`, its
:meth:`infer_and_explain` method can be used to request an explanation for a vertex.

The parameters of the explainer can be configured when the explainer is created or afterwards
using the relevant setters. The configurable parameters for the :class:`SupervisedGnnExplainer`
are:

* ``num_optimization_steps``: the number of optimization steps used by the explainer

* ``learning_rate``: the learning rate of the explainer

* ``marginalize``: whether the explainer loss is marginalized over features. This can help in cases
  where there are important features that take values close to zero. Without marginalization, the
  explainer can learn to mask such features out even if they are important. Marginalization solves
  this by instead learning a mask for the deviation from the estimated input distribution.

For best results, the features should be centered around 0.
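
The difference between plain masking and marginalization can be sketched in plain Python. Without marginalization, masking multiplies a feature by its mask value, so a feature whose value is already 0 looks the same whether it is kept or masked out, and the explainer gets no signal about it; one plausible reading of the centering advice above is that 0 then acts as a neutral value. With marginalization, following the Monte Carlo trick described in the GNNExplainer paper, a masked-out feature is replaced by a draw from its empirical distribution, and the mask scales the deviation from that draw. The helper names and sample values are illustrative, not part of the PGX API.

.. code-block:: python
    :linenos:

    import random


    def mask_plain(x, mask):
        """Without marginalization: a masked-out feature is pushed toward 0."""
        return [m * value for m, value in zip(mask, x)]


    def mask_marginalized(x, mask, feature_samples, rng):
        """With marginalization: a masked-out feature is replaced by a draw z from
        its empirical distribution, and the mask scales the deviation x - z."""
        masked = []
        for value, m, samples in zip(x, mask, feature_samples):
            z = rng.choice(samples)
            masked.append(z + m * (value - z))
        return masked


    x = [0.0, 2.0]  # the first feature is important but takes the value 0
    # Plain masking: dropping the first feature changes nothing
    print(mask_plain(x, [1.0, 1.0]), mask_plain(x, [0.0, 1.0]))

    # Marginalization: dropping it replaces 0 with a value observed elsewhere
    rng = random.Random(0)
    feature_samples = [[-1.0, 1.0], [1.5, 2.5]]  # hypothetical observed values
    print(mask_marginalized(x, [1.0, 1.0], feature_samples, rng))  # [0.0, 2.0]
    print(mask_marginalized(x, [0.0, 1.0], feature_samples, rng))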

We can generate an explanation on a simple graph as follows. The graph contains a feature that correlates with the label
and one that does not. We hence expect the importance of the features to differ significantly (with the feature
correlating with the label being more important), whereas structural importance does not play a big role here.

.. code-block:: python
    :linenos:

    simple_graph = session.create_graph_builder() \
        .add_vertex(0) \
        .set_property("label_feature", 0.5) \
        .set_property("const_feature", 0.5) \
        .set_property("label", True) \
        .add_vertex(1) \
        .set_property("label_feature", -0.5) \
        .set_property("const_feature", 0.5) \
        .set_property("label", False) \
        .add_edge(0, 1) \
        .build()
    # build and train model as described above
    params = dict(
        vertex_target_property_name="label",
        vertex_input_property_names=["label_feature", "const_feature"]
    )

    model = analyst.supervised_graphwise_builder(**params)
    model.fit(simple_graph)

    # obtain the explainer
    explainer = model.gnn_explainer(learning_rate=0.05)
    explainer.num_optimization_steps = 200

    # explain prediction of vertex 0
    explanation = explainer.infer_and_explain(
        simple_graph, simple_graph.get_vertex(0))
    # if we used the DevNet loss, we can add the decision threshold as an extra parameter:
    # explanation = explainer.infer_and_explain(simple_graph, simple_graph.get_vertex(0), 6)

    const_property = simple_graph.get_vertex_property("const_feature")
    label_property = simple_graph.get_vertex_property("label_feature")

    # retrieve feature importances
    feature_importances = explanation.get_vertex_feature_importance()
    # expected to be small (unimportant)
    importance_const_prop = feature_importances[const_property]
    # expected to be large (1, the most important feature)
    importance_label_prop = feature_importances[label_property]

    # retrieve computation graph with importances
    importance_graph = explanation.get_importance_graph()

    # retrieve importance of vertices
    importance_property = explanation.get_vertex_importance_property()
    # always 1, since vertex 0 is the vertex being explained
    importance_vertex_0 = importance_property[0]
    # available only if vertex 1 is part of the computation graph
    importance_vertex_1 = importance_property[1]

Destroying a model
~~~~~~~~~~~~~~~~~~

We can destroy a model as follows:

.. code-block:: python
    :linenos:

    model.destroy()