********
DeepWalk
********

Overview of the algorithm
-------------------------

:class:`DeepWalk` is a vertex representation learning algorithm that is widely used in industry (e.g., in `Taobao from Alibaba `_). It consists of two main steps:

- First, the random walk generation step computes random walks for each vertex (with a pre-defined walk length and a pre-defined number of walks per vertex).
- Second, the generated walks are fed to a word2vec algorithm to generate the vector representation for each vertex (each vertex corresponds to a word in the input provided to the word2vec algorithm).

Further details regarding the :class:`DeepWalk` algorithm are available in the KDD `paper `_.

:class:`DeepWalk` creates vertex embeddings for a specific graph and cannot be updated to incorporate modifications to the graph. Instead, a new :class:`DeepWalk` model should be trained on the modified graph. Lastly, it is important to note that the memory consumption of the :class:`DeepWalk` model is ``O(2n*d)``, where ``n`` is the number of vertices in the graph and ``d`` is the embedding length.

Functionalities
---------------

We describe here the usage of the main functionalities of our implementation of :class:`DeepWalk` in PGX, using the `DBpedia `_ graph (with 8,637,721 vertices and 165,049,964 edges) as an example.

Loading a graph
~~~~~~~~~~~~~~~

First, we create a session and an analyst:

.. code-block:: python
   :linenos:

   session = pypgx.get_session(session_name="my-session")
   analyst = session.create_analyst()

Our implementation of :class:`DeepWalk` can be applied to directed or undirected graphs (even though we only consider undirected random walks). To begin with, we can load a graph as follows:

.. code-block:: python
   :linenos:

   graph = session.read_graph_with_properties(self.small_graph)

Building a DeepWalk Model (minimal)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We build a :class:`DeepWalk` model using the minimal configuration and default hyper-parameters:

.. code-block:: python
   :linenos:

   model = analyst.deepwalk_builder(
       window_size=3,
       walks_per_vertex=6,
       walk_length=4
   )

Building a DeepWalk Model (customized)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We build a :class:`DeepWalk` model using customized hyper-parameters:

.. code-block:: python
   :linenos:

   model = analyst.deepwalk_builder(
       min_word_frequency=1,
       batch_size=512,
       num_epochs=1,
       layer_size=100,
       learning_rate=0.05,
       min_learning_rate=0.0001,
       window_size=3,
       walks_per_vertex=6,
       walk_length=4,
       sample_rate=1.0,
       negative_sample=2
   )

We provide a complete explanation of each builder operation (along with the default values) in the :meth:`pypgx.api.mllib.Analyst.deepwalk_builder` docs.

Training the DeepWalk model
~~~~~~~~~~~~~~~~~~~~~~~~~~~

We can train a :class:`DeepWalk` model with the specified (default or customized) settings:

.. code-block:: python
   :linenos:

   model.fit(graph)

Getting the loss value
~~~~~~~~~~~~~~~~~~~~~~

We can fetch the loss value, computed on a specified fraction of the training data:

.. code-block:: python
   :linenos:

   loss = model.loss

Computing the similar vertices
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We can fetch the ``k`` most similar vertices for a given vertex:

.. code-block:: python
   :linenos:

   similars = model.compute_similars(9, 2)
   similars.print()

The output results will be in the following format, for example, when searching for vertices similar to `Albert_Einstein `_ using the trained model:
+--------------------+--------------------+
| dstVertex          | similarity         |
+====================+====================+
| Albert_Einstein    | 1.0000001192092896 |
+--------------------+--------------------+
| Physics            | 0.8664291501045227 |
+--------------------+--------------------+
| Werner_Heisenberg  | 0.8625140190124512 |
+--------------------+--------------------+
| Richard_Feynman    | 0.8496938943862915 |
+--------------------+--------------------+
| List_of_physicists | 0.8415523767471313 |
+--------------------+--------------------+
| Physicist          | 0.8384397625923157 |
+--------------------+--------------------+
| Max_Planck         | 0.8370327353477478 |
+--------------------+--------------------+
| Niels_Bohr         | 0.8340970873832703 |
+--------------------+--------------------+
| Quantum_mechanics  | 0.8331197500228882 |
+--------------------+--------------------+
| Special_relativity | 0.8280861973762512 |
+--------------------+--------------------+

Computing the similars (for a vertex batch)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We can fetch the ``k`` most similar vertices for a list of input vertices:

.. code-block:: python
   :linenos:

   vertices = [5, 9]
   batched_similars = model.compute_similars(vertices, 10)
   batched_similars.print()

The output results will be in the following format:

+------------------+---------------------------+--------------------+
| srcVertex        | dstVertex                 | similarity         |
+==================+===========================+====================+
| Machine_learning | Machine_learning          | 1.0000001192092896 |
+------------------+---------------------------+--------------------+
| Machine_learning | Data_mining               | 0.9070799350738525 |
+------------------+---------------------------+--------------------+
| Machine_learning | Computer_science          | 0.8963605165481567 |
+------------------+---------------------------+--------------------+
| Machine_learning | Unsupervised_learning     | 0.8828719854354858 |
+------------------+---------------------------+--------------------+
| Machine_learning | R_(programming_language)  | 0.8821185827255249 |
+------------------+---------------------------+--------------------+
| Machine_learning | Algorithm                 | 0.8819515705108643 |
+------------------+---------------------------+--------------------+
| Machine_learning | Artificial_neural_network | 0.8773092031478882 |
+------------------+---------------------------+--------------------+
| Machine_learning | Data_analysis             | 0.8758628368377686 |
+------------------+---------------------------+--------------------+
| Machine_learning | List_of_algorithms        | 0.8737979531288147 |
+------------------+---------------------------+--------------------+
| Machine_learning | K-means_clustering        | 0.8715602159500122 |
+------------------+---------------------------+--------------------+
| Albert_Einstein  | Albert_Einstein           | 1.0000001192092896 |
+------------------+---------------------------+--------------------+
| Albert_Einstein  | Physics                   | 0.8664291501045227 |
+------------------+---------------------------+--------------------+
| Albert_Einstein  | Werner_Heisenberg         | 0.8625140190124512 |
+------------------+---------------------------+--------------------+
| Albert_Einstein  | Richard_Feynman           | 0.8496938943862915 |
+------------------+---------------------------+--------------------+
| Albert_Einstein  | List_of_physicists        | 0.8415523767471313 |
+------------------+---------------------------+--------------------+
| Albert_Einstein  | Physicist                 | 0.8384397625923157 |
+------------------+---------------------------+--------------------+
| Albert_Einstein  | Max_Planck                | 0.8370327353477478 |
+------------------+---------------------------+--------------------+
| Albert_Einstein  | Niels_Bohr                | 0.8340970873832703 |
+------------------+---------------------------+--------------------+
| Albert_Einstein  | Quantum_mechanics         | 0.8331197500228882 |
+------------------+---------------------------+--------------------+
| Albert_Einstein  | Special_relativity        | 0.8280861973762512 |
+------------------+---------------------------+--------------------+

Getting all trained vertex vectors
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We can retrieve the trained vertex vectors for the current :class:`DeepWalk` model and store them in a ``TSV`` file (``CSV`` with ``tab`` separator):

.. code-block:: python
   :linenos:

   vertex_vectors = model.trained_vectors.flatten_all()
   vertex_vectors.store(
       tmp + "/vertex_vectors.tsv",
       overwrite=True,
       file_format="csv"
   )

Without flattening, the schema of ``vertex_vectors`` would be as follows (:meth:`flatten_all` splits the vector column into separate double-valued columns):

+-----------------------------------------+---------------------+
| vertexId                                | embedding           |
+-----------------------------------------+---------------------+

Storing a trained model
~~~~~~~~~~~~~~~~~~~~~~~

Models can be stored either to the server file system or to a database.

The following shows how to store a trained :class:`DeepWalk` model to a specified file path:

.. code-block:: python
   :linenos:

   model.export().file(path=tmp + "/model.model", key="test", overwrite=True)

When stored in a database, models are saved as rows inside a model store table. The following shows how to store a trained :class:`DeepWalk` model in a specific model store table in a database:

.. code-block:: python
   :linenos:

   model.export().db(
       username="user",
       password="password",
       model_store="modelstoretablename",
       model_name="model",
       jdbc_url="jdbc_url"
   )

Loading a pre-trained model
~~~~~~~~~~~~~~~~~~~~~~~~~~~

As with storing, models can be loaded either from a file in the server file system or from a database.

We can load a pre-trained :class:`DeepWalk` model from a specified file path as follows:

.. code-block:: python
   :linenos:

   analyst.get_deepwalk_model_loader().file(path=tmp + "/model.model", key="test")

We can load a pre-trained :class:`DeepWalk` model from a model store table in a database as follows:

.. code-block:: python
   :linenos:

   analyst.get_deepwalk_model_loader().db(
       username="user",
       password="password",
       model_store="modelstoretablename",
       model_name="model",
       jdbc_url="jdbc_url"
   )

Destroying a model
~~~~~~~~~~~~~~~~~~

We can destroy a model as follows:

.. code-block:: python
   :linenos:

   model.destroy()
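As a closing illustration, the two main steps described in the overview and the ranking performed by a call like ``compute_similars`` can be sketched in plain Python. This is an illustrative toy, not the PGX implementation: the adjacency list, the fixed 2-d vectors, and all helper names below are invented for this example, and the word2vec training step is elided (fixed vectors stand in for learned embeddings).

.. code-block:: python

   import random

   def generate_walks(neighbors, walks_per_vertex, walk_length, seed=42):
       """Step 1: generate fixed-length random walks starting from every vertex."""
       rng = random.Random(seed)
       walks = []
       for start in neighbors:
           for _ in range(walks_per_vertex):
               walk = [start]
               while len(walk) < walk_length:
                   nbrs = neighbors[walk[-1]]
                   if not nbrs:
                       break  # dead end: stop the walk early
                   walk.append(rng.choice(nbrs))
               walks.append(walk)
       return walks

   def cosine_similarity(u, v):
       """Similarity measure used to rank vertices against a query embedding."""
       dot = sum(a * b for a, b in zip(u, v))
       norm_u = sum(a * a for a in u) ** 0.5
       norm_v = sum(b * b for b in v) ** 0.5
       return dot / (norm_u * norm_v)

   def top_k_similar(embeddings, query, k):
       """Rank all vertices by cosine similarity to the query vertex."""
       scores = {v: cosine_similarity(embeddings[query], e)
                 for v, e in embeddings.items()}
       return sorted(scores, key=scores.get, reverse=True)[:k]

   # Toy undirected graph as an adjacency list (invented for this sketch).
   neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
   walks = generate_walks(neighbors, walks_per_vertex=6, walk_length=4)
   # Step 2 (elided): in DeepWalk, `walks` would now be fed to word2vec.
   # Hand-picked 2-d vectors stand in for the learned embeddings here.
   embeddings = {0: [1.0, 0.1], 1: [0.9, 0.2], 2: [0.5, 0.5], 3: [0.1, 1.0]}
   print(top_k_similar(embeddings, query=0, k=2))  # prints [0, 1]

In the real algorithm, the ``walks`` list is consumed by word2vec with the configured ``window_size`` to learn the embeddings, and the query vertex itself always ranks first with similarity close to 1.0, as seen in the tables above.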