17.3.18 Example: Predicting Ratings on the Movielens Dataset
This section describes the usage of SupervisedEdgeWise in the graph server (PGX) using the Movielens graph as an example.
This data set consists of 100,000 ratings (1-5) from 943 users on 1,682 movies, with simple demographic information for the users (age, gender, occupation) and descriptive information for the movies (year, average rating, genre). Users and movies are vertices, and each rating from a user to a movie is an edge with a rating feature.
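As an illustration of this data model, the following minimal Python sketch represents a few users, movies, and ratings as plain dictionaries and tuples. The records are invented placeholders, not rows from the Movielens data, and the helper `ratings_by_user` is hypothetical. The property names mirror those used in the model configuration in this section.

```python
# Illustrative records only: values are invented, not taken from Movielens.
users = {
    "u1": {"raw_user_age": 24, "user_gender": "M", "user_occupation_label": "technician"},
    "u2": {"raw_user_age": 53, "user_gender": "F", "user_occupation_label": "other"},
}
movies = {
    "m1": {"movie_year": 1995, "avg_rating": 3.9, "movie_genres": ["Animation", "Comedy"]},
    "m2": {"movie_year": 1994, "avg_rating": 3.2, "movie_genres": ["Thriller"]},
}
# Each edge goes from a user vertex to a movie vertex and carries the rating.
ratings = [
    ("u1", "m1", {"user_rating": 5}),
    ("u1", "m2", {"user_rating": 3}),
    ("u2", "m1", {"user_rating": 4}),
]


def ratings_by_user(edges):
    """Group (movie, rating) pairs by the user vertex the edge starts from."""
    grouped = {}
    for user_id, movie_id, props in edges:
        grouped.setdefault(user_id, []).append((movie_id, props["user_rating"]))
    return grouped


by_user = ratings_by_user(ratings)
print(by_user)

# Fraction of all possible user-movie pairs that have a rating in the full data set.
density = 100_000 / (943 * 1682)
print(f"rating density: {density:.3f}")
```

With the counts stated above, only about 6% of all possible user-movie pairs carry a rating, which is why predicting the missing ones is a useful task.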
The following example predicts the ratings using the SupervisedEdgeWise model. The model is first built and then fit on the trainGraph.
opg4j> import oracle.pgx.config.mllib.loss.LossFunctions
opg4j> var convLayer = analyst.graphWiseConvLayerConfigBuilder().
setNumSampledNeighbors(10).
build()
opg4j> var predictionLayer = analyst.graphWisePredictionLayerConfigBuilder().
setHiddenDimension(16).
build()
opg4j> var model = analyst.supervisedEdgeWiseModelBuilder().
setVertexInputPropertyNames("movie_year", "avg_rating", "movie_genres", // Movies features
"user_occupation_label", "user_gender", "raw_user_age"). // Users features
setEdgeTargetPropertyName("user_rating").
setConvLayerConfigs(convLayer).
setPredictionLayerConfigs(predictionLayer).
setNumEpochs(10).
setEmbeddingDim(32).
setLearningRate(0.003).
setStandardize(true).
setNormalize(true).
setSeed(0).
setLossFunction(LossFunctions.MSE_LOSS).
build()
opg4j> model.fit(trainGraph)
import oracle.pgx.config.mllib.loss.LossFunctions;
GraphWiseConvLayerConfig convLayer = analyst.graphWiseConvLayerConfigBuilder()
.setNumSampledNeighbors(10)
.build();
GraphWisePredictionLayerConfig predictionLayer = analyst.graphWisePredictionLayerConfigBuilder()
.setHiddenDimension(16)
.build();
SupervisedEdgeWiseModel model = analyst.supervisedEdgeWiseModelBuilder()
.setVertexInputPropertyNames("movie_year", "avg_rating", "movie_genres", // Movies features
"user_occupation_label", "user_gender", "raw_user_age") // Users features
.setEdgeTargetPropertyName("user_rating")
.setConvLayerConfigs(convLayer)
.setPredictionLayerConfigs(predictionLayer)
.setNumEpochs(10)
.setEmbeddingDim(32)
.setLearningRate(0.003)
.setStandardize(true)
.setNormalize(true)
.setSeed(0)
.setLossFunction(LossFunctions.MSE_LOSS)
.build();
model.fit(trainGraph);
from pypgx.api.mllib import MSELoss
conv_layer_config = dict(num_sampled_neighbors=10)
conv_layer = analyst.graphwise_conv_layer_config(**conv_layer_config)
pred_layer_config = dict(hidden_dim=16)
pred_layer = analyst.graphwise_pred_layer_config(**pred_layer_config)
params = dict(edge_target_property_name="user_rating",  # the rating is the prediction target
conv_layer_config=[conv_layer],
pred_layer_config=[pred_layer],
vertex_input_property_names=["movie_year", "avg_rating", "movie_genres",  # Movies features
"user_occupation_label", "user_gender", "raw_user_age"],  # Users features
num_epochs=10,
layer_size=32,
learning_rate=0.003,
normalize=True,
loss_fn=MSELoss(),
seed=0)
model = analyst.supervised_edgewise_builder(**params)
model.fit(train_graph)
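The examples above assume that trainGraph, fullGraph, and testEdges already exist; how they are built is not shown in this section. As a rough illustration of the idea only, the sketch below splits a list of edge IDs into a training set and a held-out test set in plain Python. The function name `split_edge_ids` and the 20% test fraction are assumptions made for illustration, and none of this is PGX API.

```python
import random


def split_edge_ids(edge_ids, test_fraction=0.2, seed=0):
    """Shuffle edge IDs reproducibly and return (train_ids, test_ids)."""
    shuffled = list(edge_ids)
    random.Random(seed).shuffle(shuffled)  # fixed seed gives a repeatable split
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# 100,000 rating edges, matching the size of the data set described earlier.
train_ids, test_ids = split_edge_ids(range(100_000))
print(len(train_ids), len(test_ids))
```

In such a setup the training graph would contain only the training edges, while the full graph would also contain the held-out edges whose ratings are then inferred.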
Since EdgeWise is inductive, you can infer the ratings for unseen edges:
opg4j> var labels = model.infer(fullGraph, testEdges)
opg4j> labels.head().print()
PgxFrame labels = model.infer(fullGraph, testEdges);
labels.head().print();
labels = model.infer(full_graph, test_edges)
labels.print()
This returns the predicted rating for each of the given edges:
+-----------------------------+
| edgeId | value |
+-----------------------------+
| 68472 | 3.844510078430176 |
| 53436 | 3.5453758239746094 |
| 73364 | 3.688265085220337 |
| 12096 | 3.8873679637908936 |
| 78740 | 3.3845553398132324 |
| 27664 | 2.6601722240448 |
| 34844 | 4.108948230743408 |
| 74224 | 3.7714107036590576 |
| 33744 | 3.2331383228302 |
| 32812 | 3.8763082027435303 |
+-----------------------------+
You can also evaluate the performance of the model:
opg4j> model.evaluate(fullGraph, testEdges).print()
model.evaluate(fullGraph, testEdges).print();
model.evaluate(full_graph, test_edges).print()
This returns the following output:
+--------------------+
| MSE |
+--------------------+
| 0.9573243436116953 |
+--------------------+
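For reference, mean squared error (MSE) is the average of the squared differences between predicted and actual ratings, so its square root (RMSE) is expressed in rating units. The sketch below computes both in plain Python. The predictions are the first three values from the inference output shown earlier; the actual ratings are hypothetical placeholders, since the true test ratings are not listed in this section, and `mean_squared_error` is an illustrative helper rather than part of PGX.

```python
import math


def mean_squared_error(predicted, actual):
    """Average of squared differences between two equally long sequences."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(predicted)


predicted = [3.844510078430176, 3.5453758239746094, 3.688265085220337]
actual = [4, 3, 5]  # hypothetical true ratings, for illustration only

mse = mean_squared_error(predicted, actual)
rmse = math.sqrt(mse)
print(f"MSE={mse:.4f} RMSE={rmse:.4f}")

# Square root of the MSE reported by model.evaluate above, in rating units.
reported_rmse = math.sqrt(0.9573243436116953)
print(f"reported RMSE={reported_rmse:.3f}")
```

Read this way, the reported MSE corresponds to a typical error of just under one rating point on the 1-5 scale.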