Graph Versioning

A graph can have multiple snapshots associated with it, each reflecting a different version of the graph. All snapshots of a graph share the same graph configuration.

This guide describes:

  1. How to configure the source of snapshots

  2. How snapshots are created

  3. How to show available snapshots of a loaded graph

  4. How to check out the latest snapshot of a loaded graph

  5. How to check out different snapshots of a loaded graph

  6. How to load a specific snapshot of a graph

Note

Starting from PGX version 19.4, snapshots can be published to other sessions.

Configuring the snapshots source

Starting from PGX 20.0.0, snapshots can be created from two sources: Refreshing and ChangeSet. Prior to version 20.0.0, only refreshing was available.

Refreshing is available for graphs that are read from a persistent data source, such as a file. When the data source has changed with respect to the version stored in PGX, it can be read again manually by calling the PgxSession.read_graph_with_properties() method. Similarly, if auto-refresh is set for the graph, the PGX server automatically reads the data source and creates a new snapshot whenever the data source has changed.

A ChangeSet, instead, is a set of changes to a graph that the user creates and populates via the PGX ChangeSet API. Once a ChangeSet is populated with the desired changes, the user calls GraphChangeSet.build_new_snapshot() to create a new snapshot of the graph. This way, PGX users can integrate changes coming from any source into the graph and build snapshots from them with full control.

Each graph admits only one source of snapshots, chosen during graph configuration via the snapshots_source option, which can be set to either REFRESH or CHANGE_SET (see Graph Configuration for the complete list of options). If the snapshots_source option is not explicitly set, the following defaults apply:

  • if the graph is from a persistent data source, the default value is REFRESH, so that snapshots can be created only by calling PgxSession.read_graph_with_properties() (or via auto-refresh, if configured)

  • if the graph is transient, i.e. built from a graph builder (see Building Graphs from Scratch for more information), the default value is CHANGE_SET, since the graph is not backed by a persistent data source to read changes from; for the same reason, CHANGE_SET is the only admissible value for transient graphs.

Additionally, the following restrictions apply:

  • if auto-refresh is enabled, then snapshots come from reading the backing data source and hence only REFRESH is admissible for the snapshots_source option

  • if the user attempts to create snapshots in a way that differs from the configuration (e.g. by calling GraphChangeSet.build_new_snapshot() when the graph’s snapshots_source is REFRESH), the operation is invalid and an exception is thrown.
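The defaults and restrictions above can be summarized as a small decision function. The sketch below is plain Python and is not part of PGX: resolve_snapshots_source, its parameters, and the string constants are hypothetical names used only to restate the rules, assuming a graph is either transient or backed by a persistent data source.

```python
from typing import Optional

# Hypothetical constants mirroring the two admissible snapshots_source values.
REFRESH = "REFRESH"
CHANGE_SET = "CHANGE_SET"


def resolve_snapshots_source(is_transient: bool, auto_refresh: bool,
                             requested: Optional[str] = None) -> str:
    """Return the effective snapshots_source, raising ValueError for invalid combinations.

    Hypothetical helper that restates the rules of this guide; it is not a PGX API.
    """
    if is_transient and auto_refresh:
        # Auto-refresh needs REFRESH, transient graphs need CHANGE_SET: nothing is admissible.
        raise ValueError("a transient graph cannot be auto-refreshed")
    if requested is None:
        # Defaults: persistent graphs refresh from their source, transient graphs use ChangeSets.
        return CHANGE_SET if is_transient else REFRESH
    if requested not in (REFRESH, CHANGE_SET):
        raise ValueError(f"unknown snapshots_source: {requested!r}")
    if is_transient and requested != CHANGE_SET:
        # A transient graph has no persistent data source to read changes from.
        raise ValueError("transient graphs only admit CHANGE_SET")
    if auto_refresh and requested != REFRESH:
        # With auto-refresh, snapshots always come from re-reading the data source.
        raise ValueError("auto-refresh only admits REFRESH")
    return requested
```

For instance, a graph read from a file with auto-refresh enabled resolves to REFRESH, while explicitly requesting CHANGE_SET for that same graph raises an error, matching the auto-refresh restriction.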

Snapshot creation

Here we show how to create a snapshot both via refreshing and via ChangeSet.

Snapshot creation via Refreshing

First, load a graph into memory (see the graph loading tutorial for a complete explanation). Briefly, call the PgxSession.read_graph_with_properties() method and pass it the graph configuration or the path to its configuration file.

import pypgx
session = pypgx.get_session(session_name="my-session")
# Hypothetical path to the graph's JSON configuration file; adjust to your setup.
g = session.read_graph_with_properties("path/to/sample.json")

Now you can check the available snapshots of the graph with PgxSession.get_available_snapshots(). Since you just loaded the graph, only one snapshot is available:

snapshots = session.get_available_snapshots(g)
for metadata in snapshots:
    print(metadata)

Next, edit the source files to add a vertex and an edge. For example, add the vertex “42” with vertex property “7” and an edge from “42” to “333” with edge property “10.0”: append the line 42,7 to examples/graphs/sample.vertices.csv and the line 42,333,10.0 to examples/graphs/sample.edges.csv. If you now load the updated graph in the same session in which you loaded the original graph, a new snapshot is created.
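If you prefer to script the file edit, the following plain-Python sketch appends the two records. It uses only the standard library; the helper names append_line and add_sample_changes are hypothetical, and the sketch assumes the CSV files hold one record per line with no quoting.

```python
from pathlib import Path


def append_line(path: Path, line: str) -> None:
    """Append one CSV record, first adding a newline if the file does not end with one."""
    text = path.read_text() if path.exists() else ""
    prefix = "" if text == "" or text.endswith("\n") else "\n"
    with path.open("a") as f:
        f.write(prefix + line + "\n")


def add_sample_changes(graph_dir: Path) -> None:
    # Vertex "42" with vertex property 7.
    append_line(graph_dir / "sample.vertices.csv", "42,7")
    # Edge from "42" to "333" with edge property 10.0.
    append_line(graph_dir / "sample.edges.csv", "42,333,10.0")
```

For example, add_sample_changes(Path("examples/graphs")) updates the sample files in place. Run it only once, since each call appends the records again.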

g = session.read_graph_with_properties(
    g.config, update_if_not_fresh=True)

If you call get_available_snapshots() again, it now returns two GraphMetaData objects: one with 4 vertices and 4 edges, and one with 5 vertices and 5 edges.

The variable g now points to the newly loaded snapshot with 5 vertices and 5 edges. You can check this with the num_vertices and num_edges properties:

vertices = g.num_vertices
edges = g.num_edges
print(vertices, edges)

Snapshot creation via ChangeSet

With ChangeSets, all operations are performed through the PGX API. If you want to create the graph from a persistent data source, you can again use PgxSession.read_graph_with_properties() as in the previous example, with the snapshots_source configuration option set to CHANGE_SET. Here, for the sake of example, we create the first snapshot of a transient graph with a graph builder, as in the graph builder example.

builder = session.create_graph_builder()
builder.add_edge(1, 2)
builder.add_edge(2, 3)
builder.add_edge(2, 4)
builder.add_edge(3, 4)
builder.add_edge(4, 2)

graph = builder.build()

Regardless of how the first snapshot was created, the next step is to create a ChangeSet from graph and populate it. Here, we add a new edge with the user-provided ID 6 between vertices 1 and 4.

change_set = graph.create_change_set(
    edge_id_generation_strategy='user_ids')
change_set.add_edge(1, 4, 6)

Finally, the second snapshot is created by invoking GraphChangeSet.build_new_snapshot(), which returns the reference to the second snapshot.

second_snapshot = change_set.build_new_snapshot()
print(len(session.get_available_snapshots(graph)))

The printed count shows that two snapshots now exist, referenced by the variables graph and second_snapshot.
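Conceptually, a ChangeSet leaves the snapshot it was created from untouched and produces a new one. The toy model below illustrates that behaviour in plain Python; Snapshot and ToyChangeSet are hypothetical classes, not the PGX implementation, and the edge IDs 0 to 4 given to the builder edges are arbitrary.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Tuple

# (source vertex, destination vertex, edge ID)
Edge = Tuple[int, int, int]


@dataclass(frozen=True)
class Snapshot:
    """Toy stand-in for one immutable graph snapshot (hypothetical, not a PGX class)."""
    edges: FrozenSet[Edge]


@dataclass
class ToyChangeSet:
    """Toy stand-in for a ChangeSet: records edge additions against a base snapshot."""
    base: Snapshot
    added: List[Edge] = field(default_factory=list)

    def add_edge(self, src: int, dst: int, edge_id: int) -> "ToyChangeSet":
        # Only the change set is modified; the base snapshot stays as it is.
        self.added.append((src, dst, edge_id))
        return self

    def build_new_snapshot(self) -> Snapshot:
        # The new snapshot is the base plus the recorded changes.
        return Snapshot(self.base.edges | frozenset(self.added))


# The five builder edges from the example, with arbitrary edge IDs 0 to 4.
first = Snapshot(frozenset({(1, 2, 0), (2, 3, 1), (2, 4, 2), (3, 4, 3), (4, 2, 4)}))
second = ToyChangeSet(first).add_edge(1, 4, 6).build_new_snapshot()
print(len(first.edges), len(second.edges))  # prints "5 6"
```

As with graph and second_snapshot in PGX, both versions remain available after the new snapshot is built.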

Checking out the latest snapshot of a graph

When multiple snapshots of a graph are available, regardless of their source, you can check out a specific snapshot using the PgxSession.set_snapshot() method. In particular, the LATEST_SNAPSHOT constant of PgxSession lets you check out the latest available snapshot, as in the following example.

from pypgx.api import PgxSession

session.set_snapshot(g, creation_timestamp=PgxSession.LATEST_SNAPSHOT)
s = session.get_available_snapshots(g)
print(s[0].get_creation_timestamp())

Note that the printed timestamp is that of the most recent snapshot, since get_available_snapshots() lists snapshots from the most recent to the oldest.

Checking out different snapshots of a graph

You can also check out a specific snapshot, again using the PgxSession.set_snapshot() method.

To do so, pass the creation_timestamp of the snapshot you want to check out to set_snapshot(). For example, if g points to the newest snapshot with 5 vertices and 5 edges but you want to analyze the older one, set the snapshot to the older snapshot's creation timestamp (1453315122685 in this example).

# get_available_snapshots() lists the newest snapshot first, so s[-1] is the oldest.
session.set_snapshot(
    g, creation_timestamp=s[-1].get_creation_timestamp())

After setting the snapshot, the number of vertices and edges of g changes from 5 to 4.

In this example, the creation timestamp is taken from a GraphMetaData object. In general, you can retrieve the creation timestamp of each snapshot from its associated GraphMetaData object via the GraphMetaData.get_creation_timestamp() method. The easiest way to get the GraphMetaData of all the snapshots is the PgxSession.get_available_snapshots() method, which returns a collection of GraphMetaData objects, one per snapshot, ordered by creation timestamp from the most recent to the oldest.
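Because of this ordering, the first element of the collection is the most recent snapshot and the last element is the oldest. The sketch below mirrors that selection logic on a hypothetical SnapshotInfo record standing in for the GraphMetaData fields used in this guide; it is plain Python and does not call PGX.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SnapshotInfo:
    """Hypothetical stand-in for the GraphMetaData fields used in this guide."""
    creation_timestamp: int
    num_vertices: int
    num_edges: int


def latest(snapshots: List[SnapshotInfo]) -> SnapshotInfo:
    # The list is ordered from the most recent snapshot to the oldest.
    return snapshots[0]


def oldest(snapshots: List[SnapshotInfo]) -> SnapshotInfo:
    return snapshots[-1]


def find_timestamp(snapshots: List[SnapshotInfo], num_vertices: int,
                   num_edges: int) -> Optional[int]:
    """Return the creation timestamp of the newest snapshot with the given size, or None."""
    for info in snapshots:  # newest first, so the first match is the newest one
        if info.num_vertices == num_vertices and info.num_edges == num_edges:
            return info.creation_timestamp
    return None
```

With the made-up list snaps = [SnapshotInfo(2000, 5, 5), SnapshotInfo(1000, 4, 4)], find_timestamp(snaps, 4, 4) returns 1000, which is the kind of value you would pass as creation_timestamp to set_snapshot().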

Directly loading a specific snapshot of a graph

You can also load a specific snapshot of a graph directly using the PgxSession.read_graph_as_of() method. This is a shortcut for loading a graph with read_graph_with_properties() followed by a call to set_snapshot().

Note that this works only for snapshots created by auto-refresh. Snapshots created using a ChangeSet are not accessible through PgxSession.read_graph_as_of().

Suppose two snapshots of a graph are already loaded into the PGX session and you want a reference to a specific snapshot. First, get a graph configuration for this graph:

from pypgx.api import GraphConfigFactory
config = GraphConfigFactory.for_file_formats().from_file_path("path/to/sample.json")  # hypothetical path

Then load the graph with this configuration and list its loaded snapshots using get_available_snapshots():

g = session.read_graph_with_properties(config)
s = session.get_available_snapshots(g)

Now suppose you want to check out the snapshot of the graph with 4 vertices and 4 edges, whose creation timestamp is 1453315122685 in this example.

# The oldest snapshot (4 vertices, 4 edges) is the last element of s.
g = session.read_graph_as_of(
    config, creation_timestamp=s[-1].get_creation_timestamp())