17.5.14 Example: Computing Edge Embeddings on the MovieLens Dataset
This section describes how to use UnsupervisedEdgeWise in PGX, using the MovieLens graph as an example. This data set consists of 100,000 ratings (1-5) from 943 users on 1682 movies, with simple demographic information for the users (age, gender, occupation) and basic attributes for the movies (year, average rating, genre). Users and movies are vertices, and each rating that a user gives a movie is an edge with a rating feature.
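To make the graph layout concrete, the following plain-Python sketch models a few users and movies as vertices and ratings as edges, then holds out some edges for testing. It does not use PGX. The sample records, the `split_edges` helper, and the 80/20 ratio are illustrative assumptions, not part of the MovieLens distribution or the PGX API.

```python
import random

# Hypothetical sample vertices, keyed by id. The property names mirror the
# ones used in this section; the values are made up for illustration.
users = {
    "u1": {"raw_user_age": 24, "user_gender": "M", "user_occupation_label": "technician"},
    "u2": {"raw_user_age": 53, "user_gender": "F", "user_occupation_label": "other"},
}
movies = {
    "m1": {"movie_year": 1995, "avg_rating": 3.9, "movie_genres": ["Comedy"]},
    "m2": {"movie_year": 1995, "avg_rating": 3.2, "movie_genres": ["Action"]},
    "m3": {"movie_year": 1994, "avg_rating": 3.0, "movie_genres": ["Drama"]},
}

# Hypothetical rating edges: (user id, movie id, user_rating).
ratings = [
    ("u1", "m1", 5),
    ("u1", "m2", 3),
    ("u1", "m3", 4),
    ("u2", "m1", 4),
    ("u2", "m3", 2),
]


def split_edges(edges, test_fraction=0.2, seed=0):
    """Shuffle edges reproducibly and split them into (train, test) lists."""
    shuffled = list(edges)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


train_edges, test_edges = split_edges(ratings)
print(len(train_edges), "training edges,", len(test_edges), "held-out edges")
```

In this sketch the training graph would contain only `train_edges`, while the full graph contains every edge, including the held-out ones.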
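The following example uses the UnsupervisedEdgeWise model to compute edge embeddings that can be used to predict the ratings. First, build the model and fit it on trainGraph. The same steps are shown for JShell, Java, and Python, in that order.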
opg4j> var convLayer = analyst.graphWiseConvLayerConfigBuilder().
         setNumSampledNeighbors(10).
         build()
opg4j> var model = analyst.unsupervisedEdgeWiseModelBuilder().
         setVertexInputPropertyNames("movie_year", "avg_rating", "movie_genres", // Movies features
             "user_occupation_label", "user_gender", "raw_user_age"). // Users features
         setEdgeInputPropertyNames("user_rating").
         setConvLayerConfigs(convLayer).
         setNumEpochs(10).
         setEmbeddingDim(32).
         setLearningRate(0.003).
         setStandardize(true).
         setNormalize(false). // recommended
         setSeed(0).
         build()
opg4j> model.fit(trainGraph)
GraphWiseConvLayerConfig convLayer = analyst.graphWiseConvLayerConfigBuilder()
    .setNumSampledNeighbors(10)
    .build();
UnsupervisedEdgeWiseModel model = analyst.unsupervisedEdgeWiseModelBuilder()
    .setVertexInputPropertyNames("movie_year", "avg_rating", "movie_genres", // Movies features
        "user_occupation_label", "user_gender", "raw_user_age") // Users features
    .setEdgeInputPropertyNames("user_rating")
    .setConvLayerConfigs(convLayer)
    .setNumEpochs(10)
    .setEmbeddingDim(32)
    .setLearningRate(0.003)
    .setStandardize(true)
    .setNormalize(false) // recommended
    .setSeed(0)
    .build();
model.fit(trainGraph);
conv_layer_config = dict(num_sampled_neighbors=10)
conv_layer = analyst.graphwise_conv_layer_config(**conv_layer_config)
params = dict(conv_layer_config=[conv_layer],
              vertex_input_property_names=["movie_year", "avg_rating", "movie_genres",
                                           "user_occupation_label", "user_gender", "raw_user_age"],
              edge_input_property_names=["user_rating"],
              num_epochs=10,
              embedding_dim=32,
              learning_rate=0.003,
              normalize=False,  # recommended
              seed=0)
model = analyst.unsupervised_edgewise_builder(**params)
model.fit(train_graph)
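The builder configuration above refers to both a standardize and a normalize setting, which are easy to confuse. The sketch below shows what these terms commonly mean in machine learning: standardization rescales an input feature to zero mean and unit variance, while normalization rescales a vector to unit L2 length. It is a plain-Python illustration under that assumption, not the PGX implementation, and the helper names `standardize` and `l2_normalize` are hypothetical.

```python
import math


def standardize(values):
    """Rescale a list of numbers to zero mean and unit (population) variance."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(variance) or 1.0  # fall back to 1.0 for a constant feature
    return [(v - mean) / std for v in values]


def l2_normalize(vector):
    """Rescale a vector so that its Euclidean length is 1."""
    norm = math.sqrt(sum(x * x for x in vector)) or 1.0  # leave a zero vector as is
    return [x / norm for x in vector]


ages = [24, 53, 23, 33]  # hypothetical raw_user_age values
scaled_ages = standardize(ages)
unit_vector = l2_normalize([3.0, 4.0])

print(scaled_ages)
print(unit_vector)
```

Standardizing matters most when input properties live on very different scales, such as a release year next to an average rating.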
Because EdgeWise is inductive, you can infer embeddings for edges that were not seen during training:
opg4j> var embeddings = model.inferEmbeddings(fullGraph, testEdges)
opg4j> embeddings.head().print()
PgxFrame embeddings = model.inferEmbeddings(fullGraph, testEdges);
embeddings.head().print();
embeddings = model.infer_embeddings(full_graph, test_edges)
embeddings.print()
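The inferred embeddings are plain numeric vectors, so a downstream step is needed to turn them into ratings. One simple option, sketched here in plain Python, is a nearest-neighbor lookup: predict the rating of a test edge as the average rating of the k training edges whose embeddings are most similar by cosine similarity. The 3-dimensional embeddings, the ratings, and the `predict_rating` helper are made up for illustration. In practice the vectors would come from the frame returned by inferEmbeddings, and this section does not prescribe this particular predictor.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def predict_rating(query, labeled, k=2):
    """Average the ratings of the k labeled embeddings most similar to query."""
    ranked = sorted(
        labeled,
        key=lambda item: cosine_similarity(query, item[0]),
        reverse=True,
    )
    top = ranked[:k]
    return sum(rating for _, rating in top) / len(top)


# Hypothetical (embedding, rating) pairs for training edges.
train = [
    ([0.9, 0.1, 0.0], 5),
    ([0.8, 0.2, 0.1], 4),
    ([0.0, 0.1, 0.9], 1),
    ([0.1, 0.0, 0.8], 2),
]

test_embedding = [0.85, 0.15, 0.05]  # hypothetical embedding of an unseen edge
predicted = predict_rating(test_embedding, train, k=2)
print(predicted)
```

Any regressor or classifier that accepts fixed-length vectors could take the place of the nearest-neighbor step.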