****************
Graph Versioning
****************

A graph can have multiple snapshots associated with it, each reflecting a
different version of the graph. All snapshots of a graph share the same graph
configuration.

This guide describes:

1. How to configure the source of snapshots
2. How snapshots are created
3. How to show available snapshots of a loaded graph
4. How to check out the latest snapshot of a loaded graph
5. How to check out different snapshots of a loaded graph
6. How to load a specific snapshot of a graph

.. note::

   Starting from PGX version 19.4, snapshots can be published to other
   sessions.

Configuring the snapshots source
--------------------------------

Starting from PGX 20.0.0, snapshots can be created from two sources:
**Refreshing** and **ChangeSet**. Prior to version 20.0.0, only refreshing was
available.

Refreshing is available for graphs that are read from a persistent data source,
e.g. a file. When the data source has changed with respect to the version
stored in PGX, it can be read again manually by calling the
:meth:`PgxSession.read_graph_with_properties()` method; similarly, if
auto-refresh is set for the graph, the PGX server automatically reads the data
source and creates new snapshots when the data source has changed.

A ChangeSet, by contrast, is a set of changes to a graph that the user creates
and populates via the PGX ChangeSet API. Once a ChangeSet is created and
populated with the desired changes, the user can call
:meth:`GraphChangeSet.build_new_snapshot()` to create a new snapshot for the
graph. In this way, PGX users can integrate changes coming from any source into
the graph and build snapshots out of them with full control.

Only one source of snapshots is allowed per graph. It is chosen during graph
configuration via the ``snapshots_source`` option, which can be set to either
``REFRESH`` or ``CHANGE_SET`` (refer to Graph Configuration for the complete
list of options).

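
As a minimal sketch of setting the option described above, the snippet below
builds a graph configuration as a plain Python dictionary and serializes it to
JSON. Only the ``snapshots_source`` key and its two values are taken from this
guide; the other keys (``format``, ``vertex_uris``, ``edge_uris``) are
assumptions made for illustration and should be checked against Graph
Configuration.

.. code-block:: python

   import json

   # Hypothetical configuration for a CSV-backed graph.
   # Only "snapshots_source" comes from this guide; the other keys are assumed.
   graph_config = {
       "format": "csv",
       "vertex_uris": ["examples/graphs/sample.vertices.csv"],
       "edge_uris": ["examples/graphs/sample.edges.csv"],
       "snapshots_source": "CHANGE_SET",
   }

   # The guide names exactly two admissible values for the option.
   ALLOWED_SOURCES = {"REFRESH", "CHANGE_SET"}
   if graph_config["snapshots_source"] not in ALLOWED_SOURCES:
       raise ValueError("snapshots_source must be REFRESH or CHANGE_SET")

   # Serialize the configuration so it can be stored next to the data files.
   config_json = json.dumps(graph_config, indent=4)
   print(config_json)
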
If the ``snapshots_source`` option is not explicitly set, the following
defaults apply:

* if the graph is from a persistent data source, the default value is
  ``REFRESH``, so that snapshots can be created only by calling
  :meth:`PgxSession.read_graph_with_properties()` (or via auto-refresh, if
  configured)
* if the graph is transient, i.e. built from a graph builder (see Building
  Graphs from Scratch for more information), the default value is
  ``CHANGE_SET``, since the graph is not backed by a persistent data source to
  read changes from; for the same reason, ``CHANGE_SET`` is the only admissible
  value for transient graphs.

Additionally, the following restrictions apply:

* if auto-refresh is enabled, then snapshots come from reading the backing data
  source and hence only ``REFRESH`` is admissible for the ``snapshots_source``
  option
* if the user attempts to create snapshots in a way that is different from the
  configuration (e.g. by calling :meth:`GraphChangeSet.build_new_snapshot()`
  when the graph's ``snapshots_source`` is ``REFRESH``), the operation is
  invalid and an exception is thrown.

Snapshot creation
-----------------

This section shows how to create a snapshot both via refreshing and via
ChangeSet.

Snapshot creation via Refreshing
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

First, load a graph into memory. The graph loading tutorial explains graph
loading in full; briefly, call the
:meth:`PgxSession.read_graph_with_properties()` method and pass it the graph
configuration.

.. code-block:: python
   :linenos:

   import pypgx

   session = pypgx.get_session(session_name="my-session")
   # self.graph_path stands for the graph configuration to load
   g = session.read_graph_with_properties(self.graph_path)

Now you can check the available snapshots of the graph with
:meth:`PgxSession.get_available_snapshots()`. Since you just loaded the graph,
only one snapshot is available:

.. code-block:: python
   :linenos:

   snapshots = session.get_available_snapshots(g)
   for metadata in snapshots:
       print(metadata)

Now edit the source files to contain an additional vertex and an additional
edge. For example, add the vertex "42" with vertex property "7" and an edge
from "42" to "333" with the edge property "10.0". To do this, add the line
``42,7`` at the end of ``examples/graphs/sample.vertices.csv`` and the line
``42,333,10.0`` at the end of ``examples/graphs/sample.edges.csv``. When you
now load the updated graph within the same session in which you loaded the
original graph, a new snapshot is created.

.. code-block:: python
   :linenos:

   g = session.read_graph_with_properties(
       g.config, update_if_not_fresh=True)

A call to :meth:`PgxSession.get_available_snapshots()` now returns two
:class:`GraphMetaData` objects, one with 4 vertices and 4 edges and one with 5
vertices and 5 edges. The variable ``g`` points to the newest loaded graph,
with 5 vertices and 5 edges. You can check this with the ``num_vertices`` and
``num_edges`` attributes.

.. code-block:: python
   :linenos:

   vertices = g.num_vertices
   edges = g.num_edges

Snapshot creation via ChangeSet
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

With ChangeSets, all operations are done via the PGX API. If you want to
create the graph from a persistent data source, you can again use
:meth:`PgxSession.read_graph_with_properties()` as in the previous example,
with the ``snapshots_source`` configuration option set to ``CHANGE_SET``. For
the sake of example, here we create the first snapshot of a transient graph
via a graph builder, as in the graph builder example.

.. code-block:: python
   :linenos:

   builder = session.create_graph_builder()
   builder.add_edge(1, 2)
   builder.add_edge(2, 3)
   builder.add_edge(2, 4)
   builder.add_edge(3, 4)
   builder.add_edge(4, 2)
   graph = builder.build()

Regardless of how the first snapshot was created, the next step is to create a
ChangeSet from ``graph`` and populate it; here, we add a new edge between
vertices 1 and 4.

.. code-block:: python
   :linenos:

   change_set = graph.create_change_set(
       edge_id_generation_strategy='user_ids')
   # add an edge from vertex 1 to vertex 4; 6 is the user-provided edge ID
   change_set.add_edge(1, 4, 6)

Finally, the second snapshot is created by invoking
:meth:`GraphChangeSet.build_new_snapshot()`, which returns a reference to the
second snapshot.

.. code-block:: python
   :linenos:

   second_snapshot = change_set.build_new_snapshot()
   print(len(session.get_available_snapshots(graph)))

The output shows that two snapshots exist, referenced via the variables
``graph`` and ``second_snapshot``.

Checking out the latest snapshots of a graph
--------------------------------------------

With multiple snapshots of a graph available, and regardless of their source,
you can check out a specific snapshot using the
:meth:`PgxSession.set_snapshot()` method; in particular, you can use the
``LATEST_SNAPSHOT`` constant of :class:`PgxSession` to check out the latest
available snapshot, as in the following example.

.. code-block:: python
   :linenos:

   from pypgx.api import PgxSession

   session.set_snapshot(g, creation_timestamp=PgxSession.LATEST_SNAPSHOT)
   s = session.get_available_snapshots(g)
   print(s[0].get_creation_timestamp())

Note that the printed timestamp is that of the most recent snapshot.

Checking out different snapshots of a graph
-------------------------------------------

You can also check out a specific snapshot, again using the
:meth:`PgxSession.set_snapshot()` method. To do so, pass the
``creation_timestamp`` of the snapshot you want to load to
:meth:`set_snapshot()`.
For example, if ``g`` points to the newest graph with 5 vertices and 5 edges
but you want to analyze the older graph, you need to set the snapshot to the
older graph's creation timestamp, ``1453315122685`` in this example.

.. code-block:: python
   :linenos:

   # snapshots are listed from most recent to oldest, so the last one is the older graph
   older = s[-1]
   session.set_snapshot(g, creation_timestamp=older.get_creation_timestamp())

Notice how, after setting the snapshot, the number of vertices and edges
changed from 5 to 4.

Rather than typing the timestamp by hand, the example reads it from the
snapshot's :class:`GraphMetaData` object. In general, you can retrieve the
creation timestamp of each snapshot from its associated :class:`GraphMetaData`
object via the :meth:`GraphMetaData.get_creation_timestamp()` method. The
easiest way to get the :class:`GraphMetaData` information of all the snapshots
is to use the :meth:`PgxSession.get_available_snapshots()` method, which
returns a collection with the :class:`GraphMetaData` information of each
snapshot, ordered by creation timestamp from the most recent to the oldest.

Directly loading a specific snapshot of a graph
-----------------------------------------------

You can also load a specific snapshot of a graph directly using the
:meth:`PgxSession.read_graph_as_of()` method. This is a shortcut for loading a
graph with :meth:`read_graph_with_properties()` followed by a
:meth:`set_snapshot()`. Note that this works only for snapshots created by
auto-refresh; snapshots created using a ChangeSet are not accessible through
:meth:`PgxSession.read_graph_as_of()`.

Imagine two snapshots of a graph are already loaded into the PGX session, and
you want to get a reference to a specific snapshot. First you need to get a
graph configuration for this graph:

.. code-block:: python
   :linenos:

   from pypgx.api import GraphConfigFactory
   config = GraphConfigFactory.for_file_formats().from_file_path(self.cfg)

Then you can check the loaded snapshots for this graph config using
:meth:`get_available_snapshots()`:

.. code-block:: python
   :linenos:

   g = session.read_graph_with_properties(config)
   s = session.get_available_snapshots(g)

Now you want to check out the snapshot of the graph that has 4 vertices and 4
edges, whose timestamp is ``1453315122685``.

.. code-block:: python
   :linenos:

   # s is ordered from most recent to oldest, so the older snapshot is the last entry
   g = session.read_graph_as_of(
       config, creation_timestamp=s[-1].get_creation_timestamp())
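
Picking a snapshot by its position in the list depends on knowing how the list
is ordered. A more explicit approach is to search the metadata for the snapshot
with the expected size and use its timestamp. The following is a minimal sketch
of that idea, written so that it runs without a PGX server: ``SnapshotInfo``
and ``find_timestamp`` are hypothetical names that are not part of pypgx, and
the newer timestamp is made up for illustration. With real
:class:`GraphMetaData` objects you would read the same information through
their accessor methods instead.

.. code-block:: python

   from dataclasses import dataclass
   from typing import Iterable, Optional


   @dataclass
   class SnapshotInfo:
       """Hypothetical stand-in for the metadata of one snapshot."""
       num_vertices: int
       num_edges: int
       creation_timestamp: int


   def find_timestamp(snapshots: Iterable[SnapshotInfo],
                      num_vertices: int,
                      num_edges: int) -> Optional[int]:
       """Return the timestamp of the first snapshot with the given size."""
       for snap in snapshots:
           if snap.num_vertices == num_vertices and snap.num_edges == num_edges:
               return snap.creation_timestamp
       return None  # no snapshot of that size is available


   # Two illustrative snapshots, listed from most recent to oldest.
   available = [
       SnapshotInfo(5, 5, 1453315200000),  # made-up timestamp
       SnapshotInfo(4, 4, 1453315122685),  # timestamp used in this guide
   ]
   older_timestamp = find_timestamp(available, 4, 4)
   print(older_timestamp)

The value found this way could then be passed as ``creation_timestamp`` to
:meth:`PgxSession.read_graph_as_of()` or :meth:`PgxSession.set_snapshot()`.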