PGX 20.1.1
Documentation

Graph Versioning

A graph can have multiple snapshots associated with it, reflecting different versions of the graph. All snapshots of a graph have the same graph config associated.

This guide describes:

  1. How to configure the source of snapshots
  2. How snapshots are created
  3. How to show available snapshots of a loaded graph
  4. How to check out the latest snapshot of a loaded graph
  5. How to check out different snapshots of a loaded graph
  6. How to load a specific snapshot of a graph

Starting from PGX version 19.4, snapshots can be published to other sessions.

For more information, see publish a graph with snapshots.

Configuring the Snapshots Source

Starting from PGX 20.0.0, snapshots can be created from two sources: Refreshing and ChangeSet. Prior to version 20.0.0, only refreshing was available.

Refreshing is available for graphs that are read from a persistent data source, e.g. a file. When the data source has changed with respect to the version stored in PGX, it can be read again manually by calling the PgxSession.readGraphWithProperties() method; similarly, if auto-refresh is set for the graph, the PGX server automatically reads the data source and creates new snapshots when the data source has changed (see Auto-refreshing graphs).

A ChangeSet, by contrast, is a set of changes to a graph that the user creates and populates via the PGX ChangeSet API (see the dedicated page and the API docs for more information). Once a ChangeSet is created and populated with the desired changes, the user calls GraphChangeSet.buildNewSnapshot() to create a new snapshot of the graph. In this way, PGX users can integrate changes coming from any source into the graph and build snapshots from them with full control.

Each graph has exactly one source of snapshots, chosen during graph configuration via the snapshots_source option, which can be set to either REFRESH or CHANGE_SET (see Graph Configuration for the complete list of options). If the snapshots_source option is not explicitly set, the following defaults apply:

  • if the graph is from a persistent data source, the default value is REFRESH, so that snapshots can be created only by calling PgxSession.readGraphWithProperties() (or via auto-refresh, if configured)
  • if the graph is transient, i.e. built from a graph builder (see Building Graphs from Scratch for more information), the default value is CHANGE_SET, since the graph is not backed by a persistent data source to read changes from; for the same reason, CHANGE_SET is the only admissible value for transient graphs

Additionally, the following restrictions apply:

  • if auto-refresh is enabled, then snapshots come from reading the backing data source and hence only REFRESH is admissible for the snapshots_source option
  • if the user attempts to create snapshots in a way that is different from the configuration (e.g. by calling GraphChangeSet.buildNewSnapshot() when the graph's snapshots_source is REFRESH), the operation is invalid and an exception is thrown
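The defaults and restrictions listed above can be summarized as a small decision function. The following is only an illustrative sketch in plain Java: the class SnapshotsSourceRules, its enum and the resolve() method are hypothetical names invented for this example and are not part of the PGX API. It assumes that an invalid combination is rejected with an exception, as the text describes for mismatched snapshot creation.

```java
/**
 * Illustrative model of the snapshots_source rules described above.
 * All names here are hypothetical; this is not PGX code.
 */
public class SnapshotsSourceRules {

    public enum SnapshotsSource { REFRESH, CHANGE_SET }

    /**
     * Returns the effective snapshot source for a graph.
     *
     * @param persistent  true if the graph is read from a persistent data source
     * @param autoRefresh true if auto-refresh is enabled for the graph
     * @param requested   the explicitly configured value, or null if unset
     */
    public static SnapshotsSource resolve(boolean persistent, boolean autoRefresh,
                                          SnapshotsSource requested) {
        // Default: REFRESH for persistent graphs, CHANGE_SET for transient ones.
        SnapshotsSource source = requested != null
                ? requested
                : (persistent ? SnapshotsSource.REFRESH : SnapshotsSource.CHANGE_SET);

        // A transient graph has no backing data source to read changes from.
        if (!persistent && source != SnapshotsSource.CHANGE_SET) {
            throw new IllegalArgumentException("transient graphs only admit CHANGE_SET");
        }
        // Auto-refresh creates snapshots by re-reading the backing data source.
        if (autoRefresh && source != SnapshotsSource.REFRESH) {
            throw new IllegalArgumentException("auto-refresh only admits REFRESH");
        }
        return source;
    }

    public static void main(String[] args) {
        System.out.println(resolve(true, false, null));   // prints REFRESH
        System.out.println(resolve(false, false, null));  // prints CHANGE_SET
        System.out.println(resolve(true, false, SnapshotsSource.CHANGE_SET)); // prints CHANGE_SET
    }
}
```

In this model, a persistent graph with auto-refresh enabled and CHANGE_SET requested is rejected, which mirrors the first restriction above.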

Snapshot Creation

Here we show how to create a snapshot both via refreshing and via ChangeSet.

Snapshot Creation via Refreshing

First, load a graph into memory. The graph loading tutorial explains loading in full; in short, call the PgxSession.readGraphWithProperties() method and pass it the graph configuration.

pgx> var G = session.readGraphWithProperties("examples/graphs/sample.csv.json")
==> PGX Graph named 'sample' bound to PGX session 'a1744e86-65fb-4bd1-b2dc-5458b20954a9' registered at PGX Server Instance running in embedded mode

PgxSession session = Pgx.createSession("tutorial");
PgxGraph G = session.readGraphWithProperties("examples/graphs/sample.csv.json");

Now you can check the available snapshots of the graph with PgxSession.getAvailableSnapshots(). Since you just loaded the graph there is only one snapshot available:

pgx> session.getAvailableSnapshots(G)
==> GraphMetaData [getNumVertices()=4, getNumEdges()=4, memoryMb=0, dataSourceVersion=1453315103000, creationRequestTimestamp=1453315122669 (2016-01-20 10:38:42.669), creationTimestamp=1453315122685 (2016-01-20 10:38:42.685), vertexIdType=integer, edgeIdType=long]

Deque<GraphMetaData> snapshots = session.getAvailableSnapshots(G);

for( GraphMetaData metaData : snapshots ) {
  System.out.println( metaData );
}
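Judging from the date printed in parentheses next to it, the creationTimestamp value in the output above appears to be a count of milliseconds since the Unix epoch. This is an inference from the printed output, not something the guide states. Under that assumption, the following sketch uses only the standard java.time API to turn such a value into a readable date; the class SnapshotTimestamps and its format() method are hypothetical helpers, not part of PGX.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper, not part of PGX: formats an epoch-millisecond timestamp. */
public class SnapshotTimestamps {

    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS");

    /** Assumes the value is milliseconds since 1970-01-01T00:00:00Z. */
    public static String format(long epochMillis, ZoneId zone) {
        return Instant.ofEpochMilli(epochMillis).atZone(zone).format(FORMAT);
    }

    public static void main(String[] args) {
        // Value copied from the GraphMetaData output above.
        long creationTimestamp = 1453315122685L;

        // prints 2016-01-20 18:38:42.685
        System.out.println(format(creationTimestamp, ZoneOffset.UTC));

        // prints 2016-01-20 10:38:42.685, the date shown in the output above
        System.out.println(format(creationTimestamp, ZoneOffset.ofHours(-8)));
    }
}
```

The second call reproduces the date shown in the metadata, which suggests that the sample output was produced on a machine eight hours behind UTC.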

Now you can edit the source file to contain an additional vertex and an additional edge. For example, add the vertex "42" with vertex property "7" and an edge from "42" to "333" with the edge property "10.0". To do this add the line 42,7 at the end of examples/graphs/sample.vertices.csv, and the line 42,333,10.0 at the end of examples/graphs/sample.edges.csv. When you now load the updated graph within the same session as you loaded the original graph, a new snapshot is created.

pgx> var G = session.readGraphWithProperties( G.getConfig(), true )
==> PGX Graph named 'sample_2' bound to PGX session 'a1744e86-65fb-4bd1-b2dc-5458b20954a9' registered at PGX Server Instance running in embedded mode

pgx> session.getAvailableSnapshots(G)
==> GraphMetaData [getNumVertices()=4, getNumEdges()=4, memoryMb=0, dataSourceVersion=1453315103000, creationRequestTimestamp=1453315122669 (2016-01-20 10:38:42.669), creationTimestamp=1453315122685 (2016-01-20 10:38:42.685), vertexIdType=integer, edgeIdType=long]
==> GraphMetaData [getNumVertices()=5, getNumEdges()=5, memoryMb=3, dataSourceVersion=1452083654000, creationRequestTimestamp=1453314938744 (2016-01-20 10:35:38.744), creationTimestamp=1453314938833 (2016-01-20 10:35:38.833), vertexIdType=integer, edgeIdType=long]

G = session.readGraphWithProperties( G.getConfig(), true );

Deque<GraphMetaData> snapshots = session.getAvailableSnapshots( G );

Notice how there are two GraphMetaData objects in the call for available snapshots, one with 4 vertices and 4 edges and one with 5 vertices and 5 edges.

The variable G will point to the newest loaded graph with 5 vertices and 5 edges. You can check this with the getNumVertices() and getNumEdges() methods.

pgx> G.getNumVertices()
==> 5
pgx> G.getNumEdges()
==> 5

long vertices = G.getNumVertices();
long edges = G.getNumEdges();

Snapshot Creation via ChangeSet

With ChangeSets, all operations are done via the PGX Java API. If you want to create the graph from a persistent data source, you can again use PgxSession.readGraphWithProperties() as in the previous example, with the snapshots_source configuration option set to CHANGE_SET. For the sake of example, here we instead create the first snapshot of a transient graph via a graph builder, as in the graph builder example.

var builder = session.createGraphBuilder()

builder.addEdge(1, 2)
builder.addEdge(2, 3)
builder.addEdge(2, 4)
builder.addEdge(3, 4)
builder.addEdge(4, 2)

var graph = builder.build()

import oracle.pgx.api.*;

GraphBuilder<Integer> builder = session.createGraphBuilder();

builder.addEdge(1, 2);
builder.addEdge(2, 3);
builder.addEdge(2, 4);
builder.addEdge(3, 4);
builder.addEdge(4, 2);

PgxGraph graph = builder.build();

Regardless of how the first snapshot was created, the next step is to create a ChangeSet from graph and populate it. Here, we add a new edge between vertices 1 and 4.

var changeSet = graph.<Integer>createChangeSet()

changeSet.addEdge(6, 1, 4)

import oracle.pgx.api.*;

GraphChangeSet<Integer> changeSet = graph.createChangeSet();
changeSet.addEdge(6, 1, 4);

Finally, the second snapshot is created by invoking GraphChangeSet.buildNewSnapshot(), which returns the reference to the second snapshot.

var secondSnapshot = changeSet.buildNewSnapshot()

session.getAvailableSnapshots(secondSnapshot).size()
==> 2

PgxGraph secondSnapshot = changeSet.buildNewSnapshot();

System.out.println( session.getAvailableSnapshots(secondSnapshot).size() );

We finally see that two snapshots exist, referenced via the variables graph and secondSnapshot.

Checking out the Latest Snapshot of a Graph

With multiple snapshots of a graph being available and regardless of their source, you can check out a specific snapshot using the PgxSession.setSnapshot() method; in particular, you can use the LATEST_SNAPSHOT constant of PgxSession to easily check out the latest available snapshot, as in the following example.

pgx> session.setSnapshot( G, PgxSession.LATEST_SNAPSHOT )
==> null

pgx> session.getCreationTimestamp()
==> 1453315122685

session.setSnapshot( G, PgxSession.LATEST_SNAPSHOT );

System.out.println( session.getCreationTimestamp() );

Note the printed timestamp is that of the most recent snapshot.

Checking out Different Snapshots of a Graph

You can also check out a specific snapshot, again using the PgxSession.setSnapshot() method.

Following the refresh example from above, you have two snapshots of the sample graph loaded:

==> GraphMetaData [getNumVertices()=4, getNumEdges()=4, memoryMb=0, dataSourceVersion=1453315103000, creationRequestTimestamp=1453315122669 (2016-01-20 10:38:42.669), creationTimestamp=1453315122685 (2016-01-20 10:38:42.685), vertexIdType=integer, edgeIdType=long]
==> GraphMetaData [getNumVertices()=5, getNumEdges()=5, memoryMb=3, dataSourceVersion=1452083654000, creationRequestTimestamp=1453314938744 (2016-01-20 10:35:38.744), creationTimestamp=1453314938833 (2016-01-20 10:35:38.833), vertexIdType=integer, edgeIdType=long]

To check out a specific snapshot of the graph, you should pass the creationTimestamp of the snapshot you want to load to setSnapshot(). For example, if G is pointing to the newest graph with 5 vertices and 5 edges but you want to analyze the older graph, you need to set the snapshot to 1453315122685.

pgx> G.getNumVertices()
==> 5
pgx> G.getNumEdges()
==> 5

pgx> session.setSnapshot( G, 1453315122685L )
==> null

pgx> G.getNumVertices()
==> 4
pgx> G.getNumEdges()
==> 4

session.setSnapshot( G, 1453315122685L );

Notice how after setting the snapshot the number of vertices and edges changed from 5 to 4.

Here, we passed the creation timestamp to setSnapshot() by hand, for the sake of example. In general, you can retrieve the creation timestamp of each snapshot from its associated GraphMetaData object via the GraphMetaData.getCreationTimestamp() method. The easiest way to get the GraphMetaData of all snapshots is the PgxSession.getAvailableSnapshots() method, which returns a collection with one GraphMetaData object per snapshot, ordered by creation timestamp from the most recent to the oldest.
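The lookup described above amounts to scanning the snapshot metadata for the entry you want and reading its creation timestamp. The sketch below models that scan in plain Java so it can run without a PGX installation. SnapshotInfo is a hypothetical stand-in for the few GraphMetaData fields involved, and SnapshotLookup and its methods are invented for this example. With PGX, the collection would instead come from session.getAvailableSnapshots(G), and the timestamp from GraphMetaData.getCreationTimestamp().

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.OptionalLong;

/** Hypothetical example, not part of PGX: finds a snapshot's creation timestamp. */
public class SnapshotLookup {

    /** Stand-in for the GraphMetaData fields used in this example. */
    public static final class SnapshotInfo {
        final long numVertices;
        final long numEdges;
        final long creationTimestamp;

        public SnapshotInfo(long numVertices, long numEdges, long creationTimestamp) {
            this.numVertices = numVertices;
            this.numEdges = numEdges;
            this.creationTimestamp = creationTimestamp;
        }
    }

    /** Returns the creation timestamp of the first snapshot with the given size, if any. */
    public static OptionalLong creationTimestampFor(Deque<SnapshotInfo> snapshots,
                                                    long numVertices, long numEdges) {
        for (SnapshotInfo snapshot : snapshots) {
            if (snapshot.numVertices == numVertices && snapshot.numEdges == numEdges) {
                return OptionalLong.of(snapshot.creationTimestamp);
            }
        }
        return OptionalLong.empty();
    }

    /** The two snapshots from the sample output, in the order they are printed there. */
    public static Deque<SnapshotInfo> sampleSnapshots() {
        Deque<SnapshotInfo> snapshots = new ArrayDeque<>();
        snapshots.add(new SnapshotInfo(4, 4, 1453315122685L));
        snapshots.add(new SnapshotInfo(5, 5, 1453314938833L));
        return snapshots;
    }

    public static void main(String[] args) {
        OptionalLong timestamp = creationTimestampFor(sampleSnapshots(), 4, 4);
        // With PGX, this value would then be passed to session.setSnapshot(G, ...).
        System.out.println(timestamp.getAsLong()); // prints 1453315122685
    }
}
```

Matching on vertex and edge counts is only a convenience for this example; any criterion that identifies the snapshot you want works the same way.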

Directly Loading a Specific Snapshot of a Graph

You can also load a specific snapshot of a graph directly using the PgxSession.readGraphAsOf() method. This is a shortcut for loading a graph with readGraphWithProperties() followed by a setSnapshot().

Imagine two snapshots of a graph are already loaded into the PGX session, and you want to get a reference to a specific snapshot. First you need to get a graph configuration for this graph:

pgx> var config = GraphConfigFactory.forAnyFormat().fromPath("examples/graphs/sample.adj.json")
==> {"format":"adj_list", ... }

GraphConfig config = GraphConfigFactory.forAnyFormat().fromPath("examples/graphs/sample.csv.json");

Then you can check the loaded snapshots for this graph config using getAvailableSnapshots():

pgx> session.getAvailableSnapshots(G)
==> GraphMetaData [getNumVertices()=4, getNumEdges()=4, memoryMb=0, dataSourceVersion=1453315103000, creationRequestTimestamp=1453315122669 (2016-01-20 10:38:42.669), creationTimestamp=1453315122685 (2016-01-20 10:38:42.685), vertexIdType=integer, edgeIdType=long]
==> GraphMetaData [getNumVertices()=5, getNumEdges()=5, memoryMb=3, dataSourceVersion=1452083654000, creationRequestTimestamp=1453314938744 (2016-01-20 10:35:38.744), creationTimestamp=1453314938833 (2016-01-20 10:35:38.833), vertexIdType=integer, edgeIdType=long]

Deque<GraphMetaData> snapshots = session.getAvailableSnapshots(G);

Suppose you want to check out the snapshot with 4 vertices and 4 edges, whose creation timestamp is 1453315122685.

pgx> var G = session.readGraphAsOf( config, 1453315122685L )
==> PGX Graph named 'sample' bound to PGX session 'a1744e86-65fb-4bd1-b2dc-5458b20954a9' registered at PGX Server Instance running in embedded mode
pgx> G.getNumVertices()
==> 4
pgx> G.getNumEdges()
==> 4

PgxGraph G = session.readGraphAsOf( config, 1453315122685L );

You now know how to create snapshots, how to check out different snapshots of the same graph, and how to load a specific snapshot directly. Next, you can learn about the Auto-Refresh Mechanism, which creates snapshots of your loaded graph automatically at regular intervals.