Graph Management in PGX
Graph Loading
In order to perform graph analysis with PGX, the user must first read a graph into PGX. The following method in PgxSession, as well as its blocking variants, loads a graph into memory:
# Load a graph into memory from its graph config
graph = session.read_graph_with_properties(
    graph_path,  # path to a graph config file, or a parsed config object
    max_age=9223372036854775807,  # 2^63 - 1, i.e. effectively no age limit
    max_age_time_unit='days',
    block_if_full=False,
    update_if_not_fresh=True,
    graph_name="my_graph"
)
The first argument (a path to a graph config file or a parsed config object) holds the meta-data of the graph to be read. The meta-data includes the following information:
Location of the graph data: file location and name, DB location and connection information, etc.
Format of the graph data: plain text formats, XML-based formats, binary formats, etc.
Types and names of the properties to be loaded.
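To make these three kinds of meta-data concrete, here is a sketch of a graph config for a CSV graph. It is written to a JSON file whose path could then be passed as the first argument. The key names (format, vertex_uris, edge_uris, vertex_props, edge_props) follow PGX's file-based graph config as we understand it. The file names and properties are hypothetical, so verify the keys against the graph config reference for your PGX version.

```python
import json
import os
import tempfile

# Hypothetical graph config; key names are assumed, not confirmed here.
graph_config = {
    # Format of the graph data
    "format": "csv",
    # Location of the graph data (hypothetical file names)
    "vertex_uris": ["sample_vertices.csv"],
    "edge_uris": ["sample_edges.csv"],
    # Types and names of the properties to load (hypothetical)
    "vertex_props": [{"name": "age", "type": "integer"}],
    "edge_props": [{"name": "weight", "type": "double"}],
}


def write_graph_config(config, directory):
    """Write the config as JSON and return the path of the new file."""
    path = os.path.join(directory, "my_graph.json")
    with open(path, "w") as f:
        json.dump(config, f, indent=2)
    return path


config_path = write_graph_config(graph_config, tempfile.mkdtemp())
# config_path could now be passed as the first argument of
# session.read_graph_with_properties(...)
```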
The update_if_not_fresh and max_age arguments give fine-grained control over the age of the snapshot to be read. If the given graph specification was already loaded into memory by a different session, PGX returns the existing graph snapshot. The max_age argument is therefore most important when reading from a database whose data might change frequently. If neither update_if_not_fresh nor max_age is specified, PGX favors cached data over reading new snapshots into memory.
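The interplay between update_if_not_fresh and max_age can be summarized as a small decision function. This is a hypothetical model of the behavior described above, not PGX's implementation. It assumes that update_if_not_fresh=False skips the age check entirely and that ages are compared in seconds. The max_age value in the loading example, 9223372036854775807, equals 2^63 - 1 (Java's Long.MAX_VALUE), so in practice it disables the age limit.

```python
# Hypothetical model of the snapshot-reuse decision; not PGX's actual code.
_SECONDS_PER_UNIT = {
    "milliseconds": 0.001,
    "seconds": 1,
    "minutes": 60,
    "hours": 3600,
    "days": 86400,
}


def reuse_cached_snapshot(snapshot_age_seconds, max_age, max_age_time_unit,
                          update_if_not_fresh):
    """Return True if an already-loaded snapshot would be reused."""
    if not update_if_not_fresh:
        # No freshness check requested: cached data wins.
        return True
    # Python ints are unbounded, so even 2^63 - 1 days does not overflow.
    limit = max_age * _SECONDS_PER_UNIT[max_age_time_unit]
    return snapshot_age_seconds <= limit


# A one-hour-old snapshot is always fresh enough with the default max_age.
print(reuse_cached_snapshot(3600, 9223372036854775807, "days", True))
```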
Graph Names
Graph names are part of a session-private namespace unless explicitly shared via the publish_with_snapshots() or publish() methods. At that point, the published graph name moves into the public namespace, which any session can see.
Names are unique within a given namespace, and methods throw an exception in case of name clashes.
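The namespace rules can be pictured with a small model: one private dictionary per session and one public dictionary shared by all sessions. Everything here (SessionNamespaces, NameClashError, add) is hypothetical and illustrates only the uniqueness and publishing rules above, not the PGX API.

```python
class NameClashError(Exception):
    """Raised when a name already exists in the target namespace."""


class SessionNamespaces:
    """Hypothetical model of one session's view of graph names."""

    def __init__(self, public):
        self.private = {}     # visible only to this session
        self.public = public  # shared by every session

    def add(self, name, graph):
        # Names must be unique within the private namespace.
        if name in self.private:
            raise NameClashError(f"{name!r} already exists in the private namespace")
        self.private[name] = graph

    def publish(self, name):
        # Publishing moves the name from the private to the public namespace.
        if name in self.public:
            raise NameClashError(f"{name!r} already exists in the public namespace")
        self.public[name] = self.private.pop(name)


public = {}
alice = SessionNamespaces(public)
bob = SessionNamespaces(public)

alice.add("my_graph", object())
bob.add("my_graph", object())  # no clash: private namespaces are separate
alice.publish("my_graph")      # "my_graph" is now visible to bob as well
```

Publishing bob's "my_graph" afterwards would raise NameClashError, because the public namespace already holds that name.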
Graph Publishing
The publish() methods in PgxGraph publish the currently selected snapshot of the graph. To make all snapshots of the graph visible to other sessions, use the publish_with_snapshots() methods instead.
graph.publish(vertex_properties=True, edge_properties=True)
You can publish specific properties using PgxProperty methods. Publishing a property requires the corresponding graph to be published already.
# You can publish specific properties using PgxProperty methods.
height = graph.create_edge_property('integer', 'height')
height.publish()
Checking if Graphs or Properties are Published
Both PgxGraph and PgxProperty offer these methods:
# Check if graph or properties are published
graph.is_published
You can also check whether a graph has been published with its snapshots in a similar way:
# Check if graph has been published with its snapshots
graph.is_published_with_snapshots
Reading Loaded or Published Graphs
To check which graphs are currently loaded or published in a session, you can use the following API method from PgxSession:
# Check which graphs are currently loaded or published in a session
session.get_graphs()
The returned list contains the graph names in the given namespace.
It is also possible to reference one loaded or published graph with PgxSession methods:
# Reference a loaded or published graph by name
session.get_graph('pgql_lang_test_graph_with_labels')
session.get_graph('sample_vertices.csv')
Providing None for the namespace parameter, or calling a get_graph() method that has no namespace parameter, looks for a graph with the given name in both the private and the public namespaces. If a graph with the given name is found in both namespaces, the graph in the private namespace is returned.
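The lookup order can be modeled as a small function over two dictionaries. This is a hypothetical sketch of the rule above, not the PGX implementation. It assumes that None is returned when no graph with that name exists, and the strings "private" and "public" stand in for the real namespace values.

```python
def resolve_graph(name, private, public, namespace=None):
    """Hypothetical model of graph-name lookup.

    namespace=None searches the private namespace first, then the public
    one; "private" or "public" restricts the search to that namespace.
    Returns None when no graph is found (an assumption of this sketch).
    """
    if namespace == "private":
        return private.get(name)
    if namespace == "public":
        return public.get(name)
    if namespace is not None:
        raise ValueError(f"unknown namespace: {namespace!r}")
    # The private namespace takes precedence over the public one.
    if name in private:
        return private[name]
    return public.get(name)


private = {"my_graph": "private copy"}
public = {"my_graph": "public copy", "shared_graph": "public only"}
print(resolve_graph("my_graph", private, public))
```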
If you invoke these methods multiple times with the same graph name, you get multiple distinct PgxGraph objects, all pointing to the same graph. Any modification made to the graph through one of those objects (e.g., adding a property) is therefore visible through all of them:
graph1 = session.get_graph("pgql_lang_test_graph_with_labels")
# graph2 points to the same graph as graph1
graph2 = session.get_graph("pgql_lang_test_graph_with_labels")

graph1.create_vertex_property("boolean", "Bool_property")
# returns the property just created
graph2.get_vertex_property("Bool_property")
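Why changes show up through every object follows from treating each PgxGraph as a thin handle onto shared state. The classes below (ToySession, GraphHandle, _GraphState) are hypothetical stand-ins that reproduce this sharing in plain Python. They do not reflect how PGX is implemented internally.

```python
class _GraphState:
    """The single in-memory graph that every handle refers to."""

    def __init__(self, name):
        self.name = name
        self.vertex_properties = {}


class GraphHandle:
    """Hypothetical stand-in for a PgxGraph object: a reference to shared state."""

    def __init__(self, state):
        self._state = state

    def create_vertex_property(self, prop_type, name):
        # The property is stored on the shared state, not on the handle.
        self._state.vertex_properties[name] = prop_type

    def get_vertex_property(self, name):
        return self._state.vertex_properties.get(name)


class ToySession:
    """Hypothetical session that returns a new handle on every lookup."""

    def __init__(self):
        self._graphs = {}

    def load(self, name):
        self._graphs[name] = _GraphState(name)

    def get_graph(self, name):
        return GraphHandle(self._graphs[name])


toy_session = ToySession()
toy_session.load("my_graph")
graph1 = toy_session.get_graph("my_graph")
graph2 = toy_session.get_graph("my_graph")  # different object, same graph
graph1.create_vertex_property("boolean", "Bool_property")
print(graph2.get_vertex_property("Bool_property"))
```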
Graph Storing
The session can serialize a loaded graph instance to a file via the following method in the PgxGraph class:
graph_config = graph.store(
    format="pgb",
    path="/tmp/myGraph.pgb",
    num_partitions=None,
    vertex_properties=True,
    edge_properties=True,
    overwrite=True
)
The first two arguments, format and path, specify the format and the location on the local file system where the graph is written.
The overwrite argument determines whether an existing file is overwritten. It defaults to False if omitted.
The session can select which vertex or edge properties are stored with the graph. The optional vertex_properties and edge_properties arguments can be used to specify a list of vertex and edge properties, respectively. If these arguments are omitted, all properties are stored by default.
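The effect of the property-selection arguments can be sketched as a helper that turns a selection into the list of properties actually written. This is a hypothetical model. It uses property names instead of PgxProperty objects to stay self-contained. It also assumes that True means "all properties", False means "none", and a list means "exactly these". The False case is an assumption not stated above.

```python
def select_properties(all_properties, selection=True):
    """Hypothetical model of vertex_properties / edge_properties in store().

    True keeps every property, False keeps none (assumed), and a list keeps
    only the listed properties, in the graph's own order.
    """
    if selection is True:
        return list(all_properties)
    if selection is False:
        return []
    unknown = [p for p in selection if p not in all_properties]
    if unknown:
        raise ValueError(f"unknown properties: {unknown}")
    return [p for p in all_properties if p in selection]


vertex_props = ["age", "name", "height"]
print(select_properties(vertex_props, ["height", "age"]))
```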
Finally, the method returns a GraphConfig object that contains the meta-data of the stored graph instance. That object can be used to read the serialized graph back into memory later. GraphConfig objects can also be serialized easily, as shown in the following example:
graph_config = graph.store("pgb", "/tmp/myGraph.pgb", overwrite=True)
# str() returns a JSON representation of the config object
config_json = str(graph_config)
with open("/tmp/myGraph.pgb.json", "w") as f:
    f.write(config_json)
# Parse the saved config again, e.g. to pass it to read_graph_with_properties()
graph_config_2 = GraphConfigFactory.for_any_format().from_file_path(
    "/tmp/myGraph.pgb.json"
)
Graph Deletion
To reduce the memory usage of PGX, the session should drop the unused PgxGraph objects it created via PgxSession.get_graph() by invoking their destroy() method.
This destroys not only the specified graph but also all of its associated properties, including transient properties.
In addition, all collections related to the graph instance (e.g., a VertexSet) are destroyed automatically.
If a session holds multiple PgxGraph objects referencing the same graph, invoking destroy() on any of them invalidates all PgxGraph objects referencing that graph, so any subsequent operation on those objects fails:
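The cascade and the invalidation can be modeled with handles that check a shared "destroyed" flag. The classes below are hypothetical and only mirror the rules in this section. Destroying through one handle clears the graph's properties and collections and makes every handle fail. This does not describe how PGX tracks graphs internally.

```python
class GraphDestroyedError(Exception):
    """Raised when a handle is used after its graph was destroyed."""


class _GraphState:
    """Shared graph state, with example properties and collections."""

    def __init__(self):
        self.properties = {"Bool_property": "boolean"}
        self.collections = [{"v1", "v2"}]  # e.g. a vertex set
        self.destroyed = False


class GraphHandle:
    """Hypothetical stand-in for a PgxGraph object."""

    def __init__(self, state):
        self._state = state

    def _check_valid(self):
        if self._state.destroyed:
            raise GraphDestroyedError("graph has been destroyed")

    def get_vertex_properties(self):
        self._check_valid()
        return list(self._state.properties)

    def destroy(self):
        self._check_valid()
        # Destroying the graph also drops its properties and collections.
        self._state.properties.clear()
        self._state.collections.clear()
        self._state.destroyed = True


state = _GraphState()
graph1 = GraphHandle(state)
graph2 = GraphHandle(state)  # references the same graph as graph1
graph1.destroy()             # now both handles are invalid
```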
graph1 = session.get_graph("sample_vertices.csv")

# graph2 references the same graph as graph1
graph2 = session.get_graph("sample_vertices.csv")

graph1.destroy()

# Both references are now invalid, so any operation on either of them fails
for g in (graph1, graph2):
    try:
        g.get_vertex_properties()
    except Exception as e:
        print("operation failed as expected:", e)