17.5.13 Example: Computing Edge Embeddings on the Movielens Dataset

This section describes the usage of UnsupervisedEdgeWise in PGX using the Movielens graph as an example.

This data set consists of 100,000 ratings (1-5) from 943 users on 1,682 movies, with simple demographic information for the users (age, gender, occupation) and descriptive information for the movies (year, average rating, genre). Users and movies are vertices, and each rating that a user gives a movie is an edge with a rating feature.
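To make the data model concrete, the following is a minimal plain-Python sketch (not PGX code) of how rating records map onto vertices and edges. The sample records and the helper name build_rating_graph are hypothetical; only the user_rating property name comes from this section.

```python
# Illustration only: plain Python, not the PGX API.
# The records below are made-up sample data, not taken from Movielens.

def build_rating_graph(ratings):
    """Map (user_id, movie_id, rating) records onto vertices and edges."""
    vertices = set()
    edges = []
    for user_id, movie_id, rating in ratings:
        # Prefix IDs so that user 1 and movie 1 stay distinct vertices.
        user_vertex = f"user_{user_id}"
        movie_vertex = f"movie_{movie_id}"
        vertices.update((user_vertex, movie_vertex))
        # Each rating becomes one edge carrying a user_rating property.
        edges.append({"src": user_vertex, "dst": movie_vertex, "user_rating": rating})
    return vertices, edges


sample_ratings = [(1, 10, 5), (1, 20, 3), (2, 10, 4)]
vertices, edges = build_rating_graph(sample_ratings)
print(len(vertices), "vertices,", len(edges), "edges")
```

Every edge runs from a user vertex to a movie vertex, so the graph is bipartite, and the rating lives on the edge rather than on either endpoint.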

The following example computes embeddings for the rating edges using the UnsupervisedEdgeWise model. First, build the model and fit it on trainGraph.

JShell

opg4j> var convLayer = analyst.graphWiseConvLayerConfigBuilder().
        setNumSampledNeighbors(10).
        build()

opg4j> var model = analyst.unsupervisedEdgeWiseModelBuilder().
        setVertexInputPropertyNames("movie_year", "avg_rating", "movie_genres", // Movies features
            "user_occupation_label", "user_gender", "raw_user_age"). // Users features
        setEdgeInputPropertyNames("user_rating").
        setConvLayerConfigs(convLayer).
        setNumEpochs(10).
        setEmbeddingDim(32).
        setLearningRate(0.003).
        setStandardize(true).
        setNormalize(true).
        setSeed(0).
        build()
opg4j> model.fit(trainGraph)

Java

GraphWiseConvLayerConfig convLayer = analyst.graphWiseConvLayerConfigBuilder()
        .setNumSampledNeighbors(10)
        .build();

UnsupervisedEdgeWiseModel model = analyst.unsupervisedEdgeWiseModelBuilder()
        .setVertexInputPropertyNames("movie_year", "avg_rating", "movie_genres", // Movies features
            "user_occupation_label", "user_gender", "raw_user_age") // Users features
        .setEdgeInputPropertyNames("user_rating")
        .setConvLayerConfigs(convLayer)
        .setNumEpochs(10)
        .setEmbeddingDim(32)
        .setLearningRate(0.003)
        .setStandardize(true)
        .setNormalize(true)
        .setSeed(0)
        .build();

model.fit(trainGraph);

Python

conv_layer_config = dict(num_sampled_neighbors=10)

conv_layer = analyst.graphwise_conv_layer_config(**conv_layer_config)

params = dict(conv_layer_config=[conv_layer],
              vertex_input_property_names=["movie_year", "avg_rating", "movie_genres",
                "user_occupation_label", "user_gender", "raw_user_age"],
              edge_input_property_names=["user_rating"],
              num_epochs=10,
              embedding_dim=32,
              learning_rate=0.003,
              normalize=True,
              seed=0)

model = analyst.unsupervised_edgewise_builder(**params)

model.fit(train_graph)
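The setStandardize and setNormalize options are easy to confuse. As a rough illustration only (plain Python, not the PGX implementation), the sketch below assumes that standardization rescales each input feature to zero mean and unit variance, and that normalization rescales each embedding vector to unit L2 norm. The helper names and sample values are hypothetical.

```python
import math

# Illustration only: assumed semantics, not the PGX implementation.

def standardize(values):
    """Rescale a feature column to zero mean and unit (population) variance."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(variance)
    if std == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def l2_normalize(vector):
    """Rescale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        return list(vector)
    return [x / norm for x in vector]


ages = [20.0, 30.0, 40.0]          # made-up raw_user_age values
standardized_ages = standardize(ages)
unit_embedding = l2_normalize([3.0, 4.0])  # a made-up 2-dimensional embedding
print(standardized_ages)
print(unit_embedding)
```

Under these assumptions, standardization acts on the inputs (so that features such as age and year are on comparable scales), while normalization acts on the outputs.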

Since EdgeWise is inductive, you can infer embeddings for edges that the model did not see during training:

JShell

opg4j> var embeddings = model.inferEmbeddings(fullGraph, testEdges)
opg4j> embeddings.head().print()

Java

PgxFrame embeddings = model.inferEmbeddings(fullGraph, testEdges);
embeddings.head().print();

Python

embeddings = model.infer_embeddings(full_graph, test_edges)
embeddings.print()
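This section does not show how trainGraph, fullGraph, and testEdges are produced. One plausible setup, sketched here in plain Python rather than with the PGX API, holds out a fraction of the rating edges: the training graph would contain only the remaining edges, while the full graph contains all of them. The helper name split_edges, the 20% hold-out fraction, and the sample edges are assumptions made for illustration.

```python
import random

# Illustration only: plain Python, not the PGX API.

def split_edges(edges, test_fraction=0.2, seed=0):
    """Shuffle edges reproducibly and hold out a fraction for testing."""
    shuffled = list(edges)
    random.Random(seed).shuffle(shuffled)
    num_test = int(len(shuffled) * test_fraction)
    test_edges = shuffled[:num_test]
    train_edges = shuffled[num_test:]
    return train_edges, test_edges


# Made-up (user, movie, rating) edges standing in for the full graph.
full_edges = [(u, m, (u + m) % 5 + 1) for u in range(5) for m in range(4)]
train_edges, test_edges = split_edges(full_edges)
print(len(train_edges), "training edges,", len(test_edges), "held-out edges")
```

Because the held-out edges never appear in the training edges, inferring embeddings for them exercises the inductive behavior described above.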