17.4.9 Classifying the Vertices Using the Obtained Embeddings
You can use the obtained embeddings in downstream vertex classification tasks.
The following code shows how to train a multi-layer perceptron (MLP)
classifier that takes the embeddings as input. It assumes that the vertex
label information is stored in the vertex property labels.
import pandas as pd
from sklearn.metrics import accuracy_score, make_scorer
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
# prepare input data
vertex_vectors_df = vertex_vectors.to_pandas().astype({"vertexId": int})
vertex_labels_df = pd.DataFrame([
    {"vertexId": v.id, "labels": properties}
    for v, properties in graph.get_vertex_property("labels").get_values()
]).astype(int)
vertex_vectors_with_labels_df = vertex_vectors_df.merge(vertex_labels_df, on="vertexId")
feature_columns = [c for c in vertex_vectors_df.columns if c.startswith("embedding")]
x = vertex_vectors_with_labels_df[feature_columns].to_numpy()
y = vertex_vectors_with_labels_df["labels"].to_numpy()
scaler = StandardScaler()
x = scaler.fit_transform(x)
# define an MLP classifier
model = MLPClassifier(
    hidden_layer_sizes=(6,),
    learning_rate_init=0.05,
    max_iter=2000,
    random_state=42,
)
# define a metric and evaluate with cross-validation
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=42)
scorer = make_scorer(accuracy_score, greater_is_better=True)
scores = cross_val_score(model, x, y, scoring=scorer, cv=cv, n_jobs=-1)
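
The scores array returned by cross_val_score holds one accuracy value per fold and repeat, so 5 x 3 = 15 values with the settings above. The following is a minimal sketch of one way to summarize them. The helper name summarize_scores and the sample values are illustrative assumptions and are not part of the original example.

```python
from statistics import mean, stdev


def summarize_scores(scores):
    """Return (mean, sample standard deviation) of per-fold accuracies."""
    # Convert to plain floats so that NumPy arrays and lists both work.
    values = [float(s) for s in scores]
    return mean(values), stdev(values)


# Placeholder values for illustration only; in practice, pass the
# `scores` array returned by cross_val_score.
example_scores = [0.80, 0.90, 0.85, 0.75, 0.95]
avg, spread = summarize_scores(example_scores)
print(f"accuracy: {avg:.3f} +/- {spread:.3f}")
```

Reporting the spread alongside the mean indicates how stable the classifier is across folds. Note also that the listing fits the scaler on all rows before cross-validation. If that leakage matters for your evaluation, one option is to combine StandardScaler and MLPClassifier with sklearn.pipeline.make_pipeline and pass the resulting pipeline to cross_val_score, so that scaling is fitted on each training fold only.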