********
DeepWalk
********

Overview of the algorithm
-------------------------

:class:`DeepWalk` is a vertex representation learning algorithm that is widely used in industry (e.g., in `Taobao from Alibaba `_). It consists of two main steps:

- First, the random walk generation step computes random walks for each vertex (with a pre-defined walk length and a pre-defined number of walks per vertex).
- Second, the generated walks are fed to a word2vec algorithm to generate the vector representation for each vertex (each vertex corresponds to a word in the input provided to the word2vec algorithm).

Further details regarding the :class:`DeepWalk` algorithm are available in the KDD `paper `_.

:class:`DeepWalk` creates vertex embeddings for a specific graph and cannot be updated to incorporate modifications to the graph. Instead, a new :class:`DeepWalk` model should be trained on the modified graph. Lastly, it is important to note that the memory consumption of the :class:`DeepWalk` model is ``O(2n*d)``, where ``n`` is the number of vertices in the graph and ``d`` is the embedding length.

Functionalities
---------------

We describe here the usage of the main functionalities of our implementation of :class:`DeepWalk` in PGX, using the `DBpedia `_ graph (with 8,637,721 vertices and 165,049,964 edges) as an example.

Loading a graph
~~~~~~~~~~~~~~~

First, we create a session and an analyst:

.. code-block:: python
   :linenos:

   session = pypgx.get_session(session_name="my-session")
   analyst = session.create_analyst()

Our implementation of :class:`DeepWalk` can be applied to directed or undirected graphs (even though we only consider undirected random walks). To begin with, we can load a graph as follows:

.. code-block:: python
   :linenos:

   graph = session.read_graph_with_properties(self.small_graph)

Building a DeepWalk Model (minimal)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We build a :class:`DeepWalk` model using the minimal configuration and default hyper-parameters:

.. code-block:: python
   :linenos:

   model = analyst.deepwalk_builder(
       window_size=3,
       walks_per_vertex=6,
       walk_length=4
   )

Building a DeepWalk Model (customized)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We build a :class:`DeepWalk` model using customized hyper-parameters:

.. code-block:: python
   :linenos:

   model = analyst.deepwalk_builder(
       min_word_frequency=1,
       batch_size=512,
       num_epochs=1,
       layer_size=100,
       learning_rate=0.05,
       min_learning_rate=0.0001,
       window_size=3,
       walks_per_vertex=6,
       walk_length=4,
       sample_rate=1.0,
       negative_sample=2
   )

We provide a complete explanation of each builder operation (along with the default values) in the :meth:`pypgx.api.mllib.Analyst.deepwalk_builder` docs.

Training the DeepWalk model
~~~~~~~~~~~~~~~~~~~~~~~~~~~

We can train a :class:`DeepWalk` model with the specified (default or customized) settings:

.. code-block:: python
   :linenos:

   model.fit(graph)

Getting the loss value
~~~~~~~~~~~~~~~~~~~~~~

We can fetch the loss value, computed on a specified fraction of the training data:

.. code-block:: python
   :linenos:

   loss = model.loss

Computing the similar vertices
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We can fetch the ``k`` most similar vertices for a given vertex:

.. code-block:: python
   :linenos:

   similars = model.compute_similars(9, 2)
   similars.print()

The output results will be in the following format, for example, when searching for vertices similar to `Albert_Einstein `_ using the trained model:
+--------------------+--------------------+
| dstVertex          | similarity         |
+====================+====================+
| Albert_Einstein    | 1.0000001192092896 |
+--------------------+--------------------+
| Physics            | 0.8664291501045227 |
+--------------------+--------------------+
| Werner_Heisenberg  | 0.8625140190124512 |
+--------------------+--------------------+
| Richard_Feynman    | 0.8496938943862915 |
+--------------------+--------------------+
| List_of_physicists | 0.8415523767471313 |
+--------------------+--------------------+
| Physicist          | 0.8384397625923157 |
+--------------------+--------------------+
| Max_Planck         | 0.8370327353477478 |
+--------------------+--------------------+
| Niels_Bohr         | 0.8340970873832703 |
+--------------------+--------------------+
| Quantum_mechanics  | 0.8331197500228882 |
+--------------------+--------------------+
| Special_relativity | 0.8280861973762512 |
+--------------------+--------------------+

Computing the similars (for a vertex batch)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We can fetch the ``k`` most similar vertices for a list of input vertices:

.. code-block:: python
   :linenos:

   vertices = [5, 9]
   batched_similars = model.compute_similars(vertices, 10)
   batched_similars.print()

The output results will be in the following format:

+------------------+---------------------------+--------------------+
| srcVertex        | dstVertex                 | similarity         |
+==================+===========================+====================+
| Machine_learning | Machine_learning          | 1.0000001192092896 |
+------------------+---------------------------+--------------------+
| Machine_learning | Data_mining               | 0.9070799350738525 |
+------------------+---------------------------+--------------------+
| Machine_learning | Computer_science          | 0.8963605165481567 |
+------------------+---------------------------+--------------------+
| Machine_learning | Unsupervised_learning     | 0.8828719854354858 |
+------------------+---------------------------+--------------------+
| Machine_learning | R_(programming_language)  | 0.8821185827255249 |
+------------------+---------------------------+--------------------+
| Machine_learning | Algorithm                 | 0.8819515705108643 |
+------------------+---------------------------+--------------------+
| Machine_learning | Artificial_neural_network | 0.8773092031478882 |
+------------------+---------------------------+--------------------+
| Machine_learning | Data_analysis             | 0.8758628368377686 |
+------------------+---------------------------+--------------------+
| Machine_learning | List_of_algorithms        | 0.8737979531288147 |
+------------------+---------------------------+--------------------+
| Machine_learning | K-means_clustering        | 0.8715602159500122 |
+------------------+---------------------------+--------------------+
| Albert_Einstein  | Albert_Einstein           | 1.0000001192092896 |
+------------------+---------------------------+--------------------+
| Albert_Einstein  | Physics                   | 0.8664291501045227 |
+------------------+---------------------------+--------------------+
| Albert_Einstein  | Werner_Heisenberg         | 0.8625140190124512 |
+------------------+---------------------------+--------------------+
| Albert_Einstein  | Richard_Feynman           | 0.8496938943862915 |
+------------------+---------------------------+--------------------+
| Albert_Einstein  | List_of_physicists        | 0.8415523767471313 |
+------------------+---------------------------+--------------------+
| Albert_Einstein  | Physicist                 | 0.8384397625923157 |
+------------------+---------------------------+--------------------+
| Albert_Einstein  | Max_Planck                | 0.8370327353477478 |
+------------------+---------------------------+--------------------+
| Albert_Einstein  | Niels_Bohr                | 0.8340970873832703 |
+------------------+---------------------------+--------------------+
| Albert_Einstein  | Quantum_mechanics         | 0.8331197500228882 |
+------------------+---------------------------+--------------------+
| Albert_Einstein  | Special_relativity        | 0.8280861973762512 |
+------------------+---------------------------+--------------------+

Getting all trained vertex vectors
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We can retrieve the trained vertex vectors for the current :class:`DeepWalk` model and store them in a ``TSV`` file (``CSV`` with ``tab`` separator):

.. code-block:: python
   :linenos:

   vertex_vectors = model.trained_vectors.flatten_all()
   vertex_vectors.store(
       tmp + "/vertex_vectors.tsv",
       overwrite=True,
       file_format="csv"
   )

Without flattening, the schema of ``vertex_vectors`` would be as follows (:meth:`flatten_all` splits the vector column into separate double-valued columns):

+-----------------------------------------+---------------------+
| vertexId                                | embedding           |
+-----------------------------------------+---------------------+

Storing a trained model
~~~~~~~~~~~~~~~~~~~~~~~

Models can be stored either to the server file system or to a database.

The following shows how to store a trained :class:`DeepWalk` model to a specified file path:

.. code-block:: python
   :linenos:

   model.export().file(path=tmp + "/model.model", key="test", overwrite=True)

When stored in a database, models are saved as rows inside a model store table. The following shows how to store a trained :class:`DeepWalk` model in a specific model store table in a database:

.. code-block:: python
   :linenos:

   model.export().db(
       username="user",
       password="password",
       model_store="modelstoretablename",
       model_name="model",
       jdbc_url="jdbc_url"
   )

Loading a pre-trained model
~~~~~~~~~~~~~~~~~~~~~~~~~~~

As with storing, models can be loaded either from a file in the server file system or from a database.

We can load a pre-trained :class:`DeepWalk` model from a specified file path as follows:

.. code-block:: python
   :linenos:

   analyst.get_deepwalk_model_loader().file(path=tmp + "/model.model", key="test")

We can load a pre-trained :class:`DeepWalk` model from a model store table in a database as follows:

.. code-block:: python
   :linenos:

   analyst.get_deepwalk_model_loader().db(
       username="user",
       password="password",
       model_store="modelstoretablename",
       model_name="model",
       jdbc_url="jdbc_url"
   )

Destroying a model
~~~~~~~~~~~~~~~~~~

We can destroy a model as follows:

.. code-block:: python
   :linenos:

   model.destroy()
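As a closing illustration, the two main steps described in the overview and the ranking performed by a call like ``compute_similars`` can be sketched in plain Python. This is an illustrative toy, not the PGX implementation: the adjacency list, the fixed 2-d vectors, and all helper names below are invented for this example, and the word2vec training step is elided (fixed vectors stand in for learned embeddings).

.. code-block:: python

   import random

   def generate_walks(neighbors, walks_per_vertex, walk_length, seed=42):
       """Step 1: generate fixed-length random walks starting from every vertex."""
       rng = random.Random(seed)
       walks = []
       for start in neighbors:
           for _ in range(walks_per_vertex):
               walk = [start]
               while len(walk) < walk_length:
                   nbrs = neighbors[walk[-1]]
                   if not nbrs:
                       break  # dead end: stop the walk early
                   walk.append(rng.choice(nbrs))
               walks.append(walk)
       return walks

   def cosine_similarity(u, v):
       """Similarity measure used to rank vertices against a query embedding."""
       dot = sum(a * b for a, b in zip(u, v))
       norm_u = sum(a * a for a in u) ** 0.5
       norm_v = sum(b * b for b in v) ** 0.5
       return dot / (norm_u * norm_v)

   def top_k_similar(embeddings, query, k):
       """Rank all vertices by cosine similarity to the query vertex."""
       scores = {v: cosine_similarity(embeddings[query], e)
                 for v, e in embeddings.items()}
       return sorted(scores, key=scores.get, reverse=True)[:k]

   # Toy undirected graph as an adjacency list (invented for this sketch).
   neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
   walks = generate_walks(neighbors, walks_per_vertex=6, walk_length=4)
   # Step 2 (elided): in DeepWalk, `walks` would now be fed to word2vec.
   # Hand-picked 2-d vectors stand in for the learned embeddings here.
   embeddings = {0: [1.0, 0.1], 1: [0.9, 0.2], 2: [0.5, 0.5], 3: [0.1, 1.0]}
   print(top_k_similar(embeddings, query=0, k=2))  # prints [0, 1]

In the real algorithm, the ``walks`` list is consumed by word2vec with the configured ``window_size`` to learn the embeddings, and the query vertex itself always ranks first with similarity close to 1.0, as seen in the tables above.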