Graph Management in PGX

Graph Loading

In order to perform graph analysis with PGX, you must first read a graph into memory. The following blocking method in PgxSession loads a graph:

# Load the graph described by the config file at self.graph_path
graph = session.read_graph_with_properties(
    self.graph_path,
    max_age=9223372036854775807,
    max_age_time_unit='days',
    block_if_full=False,
    update_if_not_fresh=True,
    graph_name="my_graph"
)

The first argument (the path to a graph config file, or a parsed config object) supplies the meta-data of the graph to be read. The meta-data includes the following information:

Location of the graph data: file location and name, DB location and connection information, etc.

Format of the graph data: plain text formats, XML-based formats, binary formats, etc.

Types and names of the properties to be loaded

The update_if_not_fresh and max_age arguments control how old a returned snapshot may be. PGX returns an existing graph snapshot if the given graph specification was already loaded into memory by a different session, so max_age matters when reading from a database whose data may change frequently. If neither update_if_not_fresh nor max_age is specified, PGX favors cached data over reading new snapshots into memory.
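
The interplay of the two arguments can be pictured with a small, self-contained sketch. It does not call PGX: CachedSnapshot and reuse_cached_snapshot are hypothetical names, and the decision rule is an assumption drawn from the description above (a cached snapshot is reused only if it is no older than max_age and, when update_if_not_fresh is set, the data source has not changed since loading).

```python
from dataclasses import dataclass
from typing import Optional

# The max_age value in the call above is the largest signed 64-bit integer.
MAX_LONG = 2**63 - 1
assert MAX_LONG == 9223372036854775807


@dataclass
class CachedSnapshot:
    """Hypothetical stand-in for a snapshot already held in memory."""
    age: int              # time since loading, in the unit used for max_age
    source_changed: bool  # True if the data source changed after loading


def reuse_cached_snapshot(snapshot: Optional[CachedSnapshot],
                          max_age: int,
                          update_if_not_fresh: bool) -> bool:
    """Assumed rule: True means the cached snapshot would be returned."""
    if snapshot is None:
        return False  # nothing cached, so the graph must be read
    if snapshot.age > max_age:
        return False  # cached copy is older than the caller tolerates
    if update_if_not_fresh and snapshot.source_changed:
        return False  # caller asked for a refresh when the source has changed
    return True


stale = CachedSnapshot(age=3, source_changed=True)
print(reuse_cached_snapshot(stale, max_age=MAX_LONG, update_if_not_fresh=False))
print(reuse_cached_snapshot(stale, max_age=MAX_LONG, update_if_not_fresh=True))
```

Under this assumed rule, a very large max_age effectively disables the age check, leaving update_if_not_fresh as the only trigger for a reload.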

Graph Names

Graph names belong to a session-private namespace unless the graph is explicitly shared via the publish_with_snapshots() or publish() methods; at that point, the published graph name moves into the public namespace, which any session can see. Names are unique within a given namespace, and methods throw an exception on a name clash.
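
The namespace rules can be modeled in a few lines of plain Python. This is only an illustrative sketch, not PGX code: GraphNameRegistry and its methods are hypothetical, and the exception types are chosen for the example.

```python
class GraphNameRegistry:
    """Hypothetical model of private and public graph-name namespaces."""

    def __init__(self):
        self._public = set()
        self._private = {}  # session id -> set of private graph names

    def register_private(self, session_id, name):
        names = self._private.setdefault(session_id, set())
        if name in names:
            raise ValueError(f"{name!r} already exists in session {session_id!r}")
        names.add(name)

    def publish(self, session_id, name):
        if name not in self._private.get(session_id, set()):
            raise KeyError(f"{name!r} is not a private graph of {session_id!r}")
        if name in self._public:
            raise ValueError(f"{name!r} already exists in the public namespace")
        # The name leaves the private namespace and becomes public.
        self._private[session_id].remove(name)
        self._public.add(name)

    def public_names(self):
        return sorted(self._public)

    def private_names(self, session_id):
        return sorted(self._private.get(session_id, set()))


registry = GraphNameRegistry()
registry.register_private("session_a", "my_graph")
registry.register_private("session_b", "my_graph")  # allowed: separate namespaces
registry.publish("session_a", "my_graph")           # the name is now public
```

In this model, a later attempt by session_b to publish its own "my_graph" fails, because the public namespace already contains that name.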

Graph Publishing

The publish() method in PgxGraph publishes the currently selected snapshot of the graph. If you want to make all snapshots of the graph visible to other sessions, use the publish_with_snapshots() method instead.

graph.publish(vertex_properties=True, edge_properties=True)

You can publish specific properties using PgxProperty methods. A property can only be published after its graph has been published.

# Create an edge property and publish it (the graph must already be published)
height = graph.create_edge_property('integer', 'height')
height.publish()

Checking if Graphs or Properties are Published

Both PgxGraph and PgxProperty offer these methods:

# Check if graph or properties are published
graph.is_published

You can also check whether a graph has been published with its snapshots in a similar way:

# Check if graph has been published with its snapshots
graph.is_published_with_snapshots

Reading Loaded or Published Graphs

To list the graphs that are currently loaded or published in a session, use the following PgxSession method:

# List the names of the graphs in the public namespace
session.get_graphs(NAMESPACE_PUBLIC)

The returned list contains the graph names in the given namespace. You can also obtain a reference to a single loaded or published graph by name with PgxSession methods:

# Look up loaded or published graphs by name
session.get_graph('pgql_lang_test_graph_with_labels')
session.get_graph('sample_vertices.csv')

If you pass None for the namespace parameter, or call get_graph() without a namespace argument, PGX looks for a graph with the given name in both the private and the public namespace. If a graph with that name exists in both namespaces, the one in the private namespace is returned.
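
The lookup order can be sketched as a plain function over two dictionaries. The function resolve_graph and the string namespace labels are hypothetical, and returning None for a missing graph is a choice made for this sketch rather than documented PGX behavior.

```python
def resolve_graph(name, private_graphs, public_graphs, namespace=None):
    """Hypothetical lookup; namespace is "private", "public", or None."""
    if namespace == "private":
        return private_graphs.get(name)
    if namespace == "public":
        return public_graphs.get(name)
    if namespace is not None:
        raise ValueError(f"unknown namespace: {namespace!r}")
    # No namespace given: search both, and let the private graph win.
    if name in private_graphs:
        return private_graphs[name]
    return public_graphs.get(name)


private_graphs = {"my_graph": "private copy"}
public_graphs = {"my_graph": "public copy", "shared_graph": "public only"}

found_default = resolve_graph("my_graph", private_graphs, public_graphs)
found_public = resolve_graph("my_graph", private_graphs, public_graphs, "public")
found_shared = resolve_graph("shared_graph", private_graphs, public_graphs)
```

Here found_default is the private copy, while passing an explicit namespace is the only way to reach the public graph of the same name.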

If you invoke these methods multiple times with the same graph name, you will get multiple different PgxGraph objects, all pointing to the same graph; therefore, if you make any modification to the graph through any of those objects (e.g. you add a property), you will see it on all the objects pointing to the same graph:

graph1 = session.get_graph("pgql_lang_test_graph_with_labels")
# graph2 points to the same graph as graph1
graph2 = session.get_graph("pgql_lang_test_graph_with_labels")

graph1.create_vertex_property("boolean", "Bool_property")
# returns the property just created
graph2.get_vertex_property("Bool_property")

Graph Storing

The session can serialize a loaded graph instance to a file via the following method in the PgxGraph class:

graph_config = graph.store(
    format="pgb",
    path="/tmp/myGraph.pgb",
    num_partitions=None,
    vertex_properties=True,
    edge_properties=True,
    overwrite=True
)

The first two arguments, format and path, specify the format and the location on the local file system to which the graph is written. The overwrite argument determines whether an existing file is overwritten; it defaults to False if omitted.

The session can select which vertex or edge properties are stored with the graph. The optional arguments vertex_properties and edge_properties can be used to specify a list of vertex and edge properties. If these arguments are omitted, all the properties are stored by default.
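
As a rough sketch of selecting properties, the helper below stores a graph with only some of its vertex properties. The helper name store_with_selected_properties is hypothetical, and it assumes that the vertex property argument of store() accepts a list of property objects, as the paragraph above suggests; it uses only the store() and get_vertex_property() calls shown elsewhere in this section.

```python
def store_with_selected_properties(graph, path, vertex_property_names):
    """Store `graph` as PGB, keeping only the named vertex properties.

    `graph` is expected to be a PgxGraph (or any object offering the same
    get_vertex_property() and store() methods).
    """
    # Assumption: store() takes property objects, so resolve the names first.
    selected = [graph.get_vertex_property(name) for name in vertex_property_names]
    return graph.store(
        format="pgb",
        path=path,
        vertex_properties=selected,  # only these vertex properties are written
        edge_properties=True,        # keep all edge properties
        overwrite=True,
    )
```

A call such as store_with_selected_properties(graph, "/tmp/myGraph.pgb", ["Bool_property"]) would then return the config object of the stored graph.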

Finally, the store() method returns a GraphConfig object, which contains the meta-data of the stored graph instance. That object can be used to read the serialized graph into memory at a later point. GraphConfig objects can themselves be serialized easily, as shown in the following example:

graph_config = graph.store("pgb", "/tmp/myGraph.pgb", overwrite=True)
# str() returns a JSON representation of the config object
config_json = str(graph_config)
# write the JSON next to the stored graph
with open("/tmp/myGraph.pgb.json", "w") as f:
    f.write(config_json)
# parse the JSON file back into a config object
graph_config_2 = GraphConfigFactory.for_any_format().from_file_path(
    "/tmp/myGraph.pgb.json"
)
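
The same write-then-read round trip can be tried without a PGX server by using Python's json module on a stand-in config. The keys "format" and "uris" below are placeholders for illustration only; a real GraphConfig has its own fields.

```python
import json
import os
import tempfile

# Placeholder config content; not the actual GraphConfig schema.
original = {"format": "pgb", "uris": ["/tmp/myGraph.pgb"]}
config_json = json.dumps(original, indent=2)

# Write the JSON text to a file, as in the example above.
config_path = os.path.join(tempfile.mkdtemp(), "myGraph.pgb.json")
with open(config_path, "w") as f:
    f.write(config_json)

# Read it back and parse it into a dictionary again.
with open(config_path) as f:
    restored = json.load(f)
```

Because the config is plain JSON text, the restored dictionary equals the original one, which is what makes it practical to keep the config file alongside the stored graph.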

Graph Deletion

To reduce the memory usage of PGX, the session should drop unused PgxGraph objects that it obtained via PgxSession.get_graph() by invoking their destroy() method. This step destroys not only the specified graph but also all of its associated properties, including transient properties. All collections related to the graph instance (e.g. a VertexSet) are destroyed automatically as well. If a session holds multiple PgxGraph objects referencing the same graph, invoking destroy() on any of them invalidates all of them, so any subsequent operation on those objects fails:

graph1 = session.get_graph("sample_vertices.csv")

# graph2 references the same graph as graph1
graph2 = session.get_graph("sample_vertices.csv")

graph1.destroy()
# Both references are now invalid, so a call on either graph1 or graph2 throws an exception
self.assertRaises(Exception, graph1.get_vertex_properties)
self.assertRaises(Exception, graph2.get_vertex_properties)
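
Why several objects become invalid at once can be illustrated with a toy model in which every handle points to one shared graph state. The classes below are hypothetical and do not reflect how PGX is implemented; they only mimic the observable behavior described in this section.

```python
class _SharedGraph:
    """Hypothetical graph state shared by every handle that references it."""

    def __init__(self):
        self.vertex_properties = {}
        self.destroyed = False


class GraphHandle:
    """Hypothetical client-side handle, playing the role of PgxGraph."""

    def __init__(self, shared):
        self._shared = shared

    def _require_alive(self):
        if self._shared.destroyed:
            raise RuntimeError("graph has been destroyed")

    def create_vertex_property(self, type_name, name):
        self._require_alive()
        self._shared.vertex_properties[name] = type_name

    def get_vertex_properties(self):
        self._require_alive()
        return dict(self._shared.vertex_properties)

    def destroy(self):
        self._require_alive()
        self._shared.vertex_properties.clear()  # properties go with the graph
        self._shared.destroyed = True


shared = _SharedGraph()
graph1 = GraphHandle(shared)
graph2 = GraphHandle(shared)  # a second handle to the same graph

graph1.create_vertex_property("boolean", "Bool_property")
seen_by_graph2 = graph2.get_vertex_properties()  # the change is visible here too

graph1.destroy()
try:
    graph2.get_vertex_properties()
    graph2_still_valid = True
except RuntimeError:
    graph2_still_valid = False  # destroying via graph1 invalidated graph2
```

Since the handles hold no state of their own, a modification or a destroy() made through one of them is necessarily observed through all the others.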