PGX 20.2.2
Documentation

Graphs Management

This page presents the API used to load, publish, store and delete graphs. For examples of these operations and further details, see the child pages of this guide.

Graph Loading

In order to perform graph analysis with PGX, the user must first read a graph into PGX. The following methods in PgxSession can be used to load graphs into memory:

PgxFuture<PgxGraph> readGraphWithPropertiesAsync(String path)
PgxFuture<PgxGraph> readGraphWithPropertiesAsync(String path, String newGraphName)
PgxFuture<PgxGraph> readGraphWithPropertiesAsync(GraphConfig config)
PgxFuture<PgxGraph> readGraphWithPropertiesAsync(GraphConfig config, String newGraphName)
PgxFuture<PgxGraph> readGraphWithPropertiesAsync(GraphConfig config, boolean forceUpdateIfNotFresh)
PgxFuture<PgxGraph> readGraphWithPropertiesAsync(GraphConfig config, boolean forceUpdateIfNotFresh, String newGraphName)
PgxFuture<PgxGraph> readGraphWithPropertiesAsync(GraphConfig config, long maxAge, TimeUnit maxAgeTimeUnit)
PgxFuture<PgxGraph> readGraphWithPropertiesAsync(GraphConfig config, long maxAge, TimeUnit maxAgeTimeUnit, boolean blockIfFull, String newGraphName)

as well as their blocking variants:

PgxGraph readGraphWithProperties(String path)
PgxGraph readGraphWithProperties(String path, String newGraphName)
PgxGraph readGraphWithProperties(GraphConfig config)
PgxGraph readGraphWithProperties(GraphConfig config, String newGraphName)
PgxGraph readGraphWithProperties(GraphConfig config, boolean forceUpdateIfNotFresh)
PgxGraph readGraphWithProperties(GraphConfig config, boolean forceUpdateIfNotFresh, String newGraphName)
PgxGraph readGraphWithProperties(GraphConfig config, long maxAge, TimeUnit maxAgeTimeUnit)
PgxGraph readGraphWithProperties(GraphConfig config, long maxAge, TimeUnit maxAgeTimeUnit, boolean blockIfFull, String newGraphName)

read_graph_with_properties(self, config, max_age=9223372036854775807, max_age_time_unit='days',
                           block_if_full=False, update_if_not_fresh=True, graph_name=None)

The first argument (path to a graph config file or a parsed config object) is the meta-data of the graph to be read. The meta-data includes the following information:

  • Location of the graph data: file location and name, DB location and connection information, etc
  • Format of the graph data: plain text formats, XML-based formats, Binary formats, etc
  • Types and Names of the properties to be loaded

Refer to the Graph Loading Guide for detailed information about the different data formats PGX supports and their configurations.

The forceUpdateIfNotFresh and maxAge arguments can be used to fine-control the age of the snapshot to be read. PGX will return an existing graph snapshot if the given graph specification was already loaded into memory by a different session. So, the maxAge argument becomes important if reading from a database in which the data might change frequently. If no forceUpdateIfNotFresh or maxAge is specified, PGX will favor cached data over reading new snapshots into memory.
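The effect of maxAge can be pictured with a small, self-contained toy model. This is only a sketch of the decision described above, not the PGX implementation: the class SnapshotFreshness and its canReuse method are hypothetical names, and treating a snapshot whose age equals maxAge as still fresh is an assumption.

```java
import java.util.concurrent.TimeUnit;

// Toy model of the "reuse cached snapshot or reload" decision; NOT the PGX implementation.
final class SnapshotFreshness {
    private SnapshotFreshness() {}

    /**
     * Returns true if a cached snapshot loaded at loadedAtMillis may be reused
     * at nowMillis, given the maximum age the caller tolerates.
     */
    static boolean canReuse(long loadedAtMillis, long nowMillis, long maxAge, TimeUnit unit) {
        long ageMillis = nowMillis - loadedAtMillis;
        // TimeUnit.toMillis saturates at Long.MAX_VALUE instead of overflowing
        long maxAgeMillis = unit.toMillis(maxAge);
        // Assumption: an age exactly equal to maxAge still counts as fresh
        return ageMillis <= maxAgeMillis;
    }

    public static void main(String[] args) {
        long loadedAt = 0L;
        long now = TimeUnit.MINUTES.toMillis(10); // the cached snapshot is 10 minutes old

        // A 5-minute limit rejects the cached snapshot, so a reload would be needed
        System.out.println(canReuse(loadedAt, now, 5, TimeUnit.MINUTES));
        // An effectively unlimited maxAge always favors the cached snapshot
        System.out.println(canReuse(loadedAt, now, Long.MAX_VALUE, TimeUnit.DAYS));
    }
}
```

In this model, an effectively unlimited maxAge (the default shown in the Python signature above) always reuses the cached snapshot, which mirrors the statement that PGX favors cached data when no maxAge is given.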

For more details, check the javadoc and the guide about loading custom graph data.
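The relationship between the asynchronous methods and their blocking variants can be sketched with the JDK's CompletableFuture standing in for PgxFuture. This is an illustration under assumptions: loadGraphNameAsync is a hypothetical placeholder for a call such as readGraphWithPropertiesAsync, the path is made up, and the sketch assumes PgxFuture can be awaited like a standard Java future.

```java
import java.util.concurrent.CompletableFuture;

// Sketch only: CompletableFuture stands in for PgxFuture, and loadGraphNameAsync is a
// hypothetical placeholder for an asynchronous PGX call.
final class AsyncVersusBlocking {
    private AsyncVersusBlocking() {}

    // Asynchronous variant: returns immediately with a future for the result
    static CompletableFuture<String> loadGraphNameAsync(String path) {
        return CompletableFuture.supplyAsync(() -> "graph loaded from " + path);
    }

    // Blocking variant: starts the asynchronous call and waits for its result
    static String loadGraphName(String path) {
        return loadGraphNameAsync(path).join();
    }

    public static void main(String[] args) {
        // Asynchronous style: attach a continuation, then wait for it to finish
        loadGraphNameAsync("/tmp/example.json")
                .thenAccept(System.out::println)
                .join();

        // Blocking style: the call returns only once the result is available
        System.out.println(loadGraphName("/tmp/example.json"));
    }
}
```

The asynchronous style lets the caller continue with other work while loading is in progress; the blocking style is simpler when the result is needed right away.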

Graph Names

Graph names follow the rules described in the Namespaces and Sharing page. In brief, graph names belong to a session-private namespace unless the graph is explicitly shared via the publishWithSnapshots() or publish() methods; at that point, the published graph name moves into the public namespace, which any session can see. Names are unique within a given namespace, and methods throw an exception in case of a name clash.

PGQL

PGQL supports selecting a graph to query using the FROM-statement. The graph name mentioned in the FROM-statement is resolved with the same semantics as retrieving a graph without a namespace; see Retrieving Graphs by Name. In the PGQL query SELECT * FROM MATCH (v) ON myGraph the graph myGraph will be retrieved with the same semantics as session.getGraph("myGraph").

Examples

Code examples for getting graphs from different namespaces:

// look up graph "myGraph" in session private namespace:
PgxGraph g = session.getGraph(Namespace.PRIVATE, "myGraph");

// look up graph "myGraph" in public namespace:
PgxGraph g = session.getGraph(Namespace.PUBLIC, "myGraph");

// look up "myGraph" in both namespaces, where PRIVATE takes precedence over PUBLIC:
PgxGraph g = session.getGraph("myGraph");
PgxGraph g = session.getGraph(null, "myGraph");

g = session.get_graph("myGraph")

Code example for getting a list of graph names in the private namespace:

// getGraphs() returns a List, so the first name can be accessed by index
List<String> privateGraphs = session.getGraphs(Namespace.PRIVATE);
session.getGraph(Namespace.PRIVATE, privateGraphs.get(0));

Graph Publishing

The publish() methods in PgxGraph can be used to publish the currently selected snapshot of the graph. If you want to make all snapshots of the graph visible to other sessions, use the publishWithSnapshots() methods instead.

PgxFuture<Void> publishAsync()
void publish() // synchronous variant

PgxFuture<Void> publishAsync(Collection<VertexProperty<?, ?>> vertexProps, Collection<EdgeProperty<?>> edgeProps)
void publish(Collection<VertexProperty<?, ?>> vertexProps, Collection<EdgeProperty<?>> edgeProps) // synchronous variant

publish(self, vertex_properties=True, edge_properties=True)

You can publish specific properties using Property methods. Publishing properties requires the corresponding graph to be already published.

PgxFuture<Void> publishAsync()
void publish() // synchronous variant

publish(self)

If a private graph already has snapshots, publishWithSnapshots() will publish them all under the same name:

PgxFuture<Void> publishWithSnapshotsAsync()
void publishWithSnapshots() // synchronous variant

PgxFuture<Void> publishWithSnapshotsAsync(Collection<VertexProperty<?, ?>> vertexProps, Collection<EdgeProperty<?>> edgeProps)
void publishWithSnapshots(Collection<VertexProperty<?, ?>> vertexProps, Collection<EdgeProperty<?>> edgeProps) // synchronous variant

For more information, you can refer to the page dedicated to publishing a graph.

Checking If Graphs or Properties Are Published

Both PgxGraph and Property offer these methods:

PgxFuture<Boolean> isPublishedAsync()
boolean isPublished()

is_published(self)

You can also check whether a graph has been published with its snapshots in a similar way:

PgxFuture<Boolean> isPublishedWithSnapshotsAsync()
boolean isPublishedWithSnapshots()

Reading Loaded or Published Graphs

To check which graphs are currently loaded or published in a session you can use the following API method from PgxSession:

PgxFuture<List<String>> getGraphsAsync(Namespace namespace)
List<String> getGraphs(Namespace namespace)

get_graphs(self)

The returned list contains the graph names in the given namespace.

It is also possible to reference one loaded/published graph with PgxSession methods:

PgxFuture<PgxGraph> getGraphAsync(Namespace namespace, String name)
PgxFuture<PgxGraph> getGraphAsync(String name)
PgxGraph getGraph(Namespace namespace, String name)
PgxGraph getGraph(String name)

get_graph(self, graph_name)

Providing null for the namespace parameter or calling the getGraph() methods that don't have a Namespace parameter will look for a graph with the given name in both the private and public namespace. If a graph with the given name is found in both namespaces, the graph found in the private namespace is returned.
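The lookup rule can be sketched as a small, self-contained toy model. It is not the PGX implementation: the class GraphNameLookup, its add and resolve methods, and the choice of IllegalArgumentException for name clashes are all assumptions made for illustration.

```java
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;

// Toy model of graph-name resolution across two namespaces; NOT the PGX implementation.
final class GraphNameLookup {
    enum Namespace { PRIVATE, PUBLIC }

    private final Set<String> privateNames = new HashSet<>();
    private final Set<String> publicNames = new HashSet<>();

    /** Registers a name; names must be unique within a namespace. */
    void add(Namespace namespace, String name) {
        Set<String> names = namespace == Namespace.PRIVATE ? privateNames : publicNames;
        if (!names.add(name)) {
            throw new IllegalArgumentException("name clash in " + namespace + ": " + name);
        }
    }

    /** Resolves a name; a null namespace means "search PRIVATE first, then PUBLIC". */
    Optional<Namespace> resolve(Namespace namespace, String name) {
        if ((namespace == null || namespace == Namespace.PRIVATE) && privateNames.contains(name)) {
            return Optional.of(Namespace.PRIVATE);
        }
        if ((namespace == null || namespace == Namespace.PUBLIC) && publicNames.contains(name)) {
            return Optional.of(Namespace.PUBLIC);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        GraphNameLookup lookup = new GraphNameLookup();
        lookup.add(Namespace.PRIVATE, "myGraph");
        lookup.add(Namespace.PUBLIC, "myGraph");
        lookup.add(Namespace.PUBLIC, "sharedGraph");

        System.out.println(lookup.resolve(null, "myGraph"));                  // Optional[PRIVATE]
        System.out.println(lookup.resolve(null, "sharedGraph"));              // Optional[PUBLIC]
        System.out.println(lookup.resolve(Namespace.PRIVATE, "sharedGraph")); // Optional.empty
    }
}
```

The model returns the namespace that answered the lookup, which makes the precedence of the private namespace visible when the same name exists in both.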

If you invoke these methods multiple times with the same graph name, you will get multiple different PgxGraph objects, all pointing to the same graph; therefore, if you make any modification to the graph through any of those objects (e.g. you add a property), you will see it on all the objects pointing to the same graph:

PgxGraph graph1 = session.getGraph("myGraphName");
// graph2 points to the same graph as graph1
PgxGraph graph2 = session.getGraph("myGraphName");

graph1.createVertexProperty(PropertyType.BOOLEAN, "BoolProperty");
// returns the property just created
VertexProperty<Object, Boolean> property = graph2.getVertexProperty("BoolProperty");

graph1 = session.get_graph("myGraphName")
# graph2 points to the same graph as graph1
graph2 = session.get_graph("myGraphName")

graph1.create_vertex_property("boolean", "BoolProperty")
# returns the property just created
graph2.get_vertex_property("BoolProperty")

Note that the server keeps track of how many PgxGraph objects per session point to each snapshot. As a result, if a PgxGraph object is changed to point to a different snapshot via PgxSession.setSnapshot(), the other objects still point to the initial snapshot:

// get a snapshot of "myGraphName"
PgxGraph graph1 = session.getGraph("myGraphName");
// graph2 points to the same snapshot as graph1
PgxGraph graph2 = session.getGraph("myGraphName");

// we assume another snapshot is created ...

// create a property on the snapshot pointed to by graph1
graph1.createVertexProperty(PropertyType.BOOLEAN, "BoolProperty");
// make graph2 point to the latest snapshot available, which is different from graph1
session.setSnapshot(graph2, PgxSession.LATEST_SNAPSHOT);
// returns the property just created: graph1 is still a valid reference to the original snapshot we got
VertexProperty<Object, Boolean> property1 = graph1.getVertexProperty("BoolProperty");
// returns null, because graph2 points to the new snapshot, which does not have this property
VertexProperty<Object, Boolean> property2 = graph2.getVertexProperty("BoolProperty");

For a detailed explanation of graph versioning, you can refer to the Graph Versioning guide.

When you are done with your work on a graph snapshot, you should release it, as explained in the Graph Deletion section.

Graph Storing

The session can serialize a loaded graph instance to a file via one of the following methods in the PgxGraph class:

PgxFuture<FileGraphConfig> storeAsync(Format targetFormat, String targetPath)
PgxFuture<FileGraphConfig> storeAsync(Format targetFormat, String targetPath, boolean overwrite)
PgxFuture<FileGraphConfig> storeAsync(Format targetFormat, String targetPath, Collection<VertexProperty<?, ?>> vertexProps, Collection<EdgeProperty<?>> edgeProps, boolean overwrite)

or their blocking variants:

FileGraphConfig store(Format targetFormat, String targetPath)
FileGraphConfig store(Format targetFormat, String targetPath, boolean overwrite)
FileGraphConfig store(Format targetFormat, String targetPath, Collection<VertexProperty<?, ?>> vertexProps, Collection<EdgeProperty<?>> edgeProps, boolean overwrite)

store(self, format, path, num_partitions=None,
      vertex_properties=True, edge_properties=True, overwrite=False)

The first two arguments targetFormat and targetPath specify the format and the location on the local file system where the graph should be written to.

Embedded PGX instance only

Note: The above methods will throw an UnsupportedOperationException if connected to a remote PGX instance. Writing to a remote server's file system is not permitted for security reasons.

The overwrite argument determines whether or not an existing file should be overwritten. It defaults to false if omitted.

The session can select which vertex or edge properties should be stored with the graph. The optional arguments vertexProps and edgeProps can be used to specify a list of vertex and edge properties. If these arguments are omitted, all properties are stored by default. PGX provides the convenience constants VertexProperty.ALL and EdgeProperty.ALL to store all properties, and VertexProperty.NONE and EdgeProperty.NONE to store no properties.

Finally, the above methods return a FileGraphConfig object, which contains the meta-data of the stored graph instance. That object can be used to read the serialized graph into memory at a later point. Note that all GraphConfig objects can be serialized easily as well, as shown in the following example:

import ...
import org.apache.commons.io.FileUtils; // commons-io is packaged with PGX

PgxGraph myGraph = ...
GraphConfig graphConfig = myGraph.store(Format.PGB, "/tmp/myGraph.pgb");
File configFile = new File("/tmp/myGraph.pgb.json");
String json = graphConfig.toString(); // returns a JSON representation of the config object
FileUtils.write(configFile, json);

// read config back into memory
GraphConfig graphConfig2 = GraphConfigFactory.forAnyFormat().fromFile(configFile);
assert(graphConfig.equals(graphConfig2));

my_graph = ...
graph_config = my_graph.store("pgb", "/tmp/myGraph.pgb")
# returns a JSON representation of the config object
json = str(graph_config)
with open("/tmp/myGraph.pgb.json", "w") as f:
    f.write(json)
# read config back into memory
graph_config_2 = GraphConfigFactory.for_any_format().from_file_path("/tmp/myGraph.pgb.json")

Check the javadoc for details.

Graph Deletion

To reduce the memory usage of PGX, the session should drop unused PgxGraph objects that it obtained via PgxSession.getGraph() by invoking their destroyAsync() (or destroy()) method. This step destroys not only the specified graph but also all of its associated properties, including transient properties. In addition, all collections related to the graph instance (e.g. a VertexSet) are destroyed automatically. If a session holds multiple PgxGraph objects referencing the same graph, invoking destroyAsync() (or destroy()) on any of them invalidates all the PgxGraph objects referencing that graph, so any operation on those objects fails:

PgxGraph graph1 = session.getGraph("myGraphName");
// graph2 references the same graph as graph1
PgxGraph graph2 = session.getGraph("myGraphName");
// destroying graph2 invalidates graph1 as well
graph2.destroy();
// both calls throw an exception, as both references are not valid anymore
Set<VertexProperty<?, ?>> properties = graph1.getVertexProperties();
properties = graph2.getVertexProperties();

graph1 = session.get_graph("myGraphName")
# graph2 references the same graph as graph1
graph2 = session.get_graph("myGraphName")
# destroying graph2 invalidates graph1 as well
graph2.destroy()
# both calls throw an exception, as both references are not valid anymore
properties = graph1.get_vertex_properties()
properties = graph2.get_vertex_properties()

The same behavior occurs when multiple PgxGraph objects reference the same snapshot: since a snapshot is effectively a graph, destroying a PgxGraph object referencing a certain snapshot invalidates all PgxGraph objects referencing the same snapshot, but does not invalidate those referencing other snapshots:

// get a snapshot of "myGraphName"
PgxGraph graph1 = session.getGraph("myGraphName");
// graph2 and graph3 reference the same snapshot as graph1
PgxGraph graph2 = session.getGraph("myGraphName");
PgxGraph graph3 = session.getGraph("myGraphName");

// we assume another snapshot is created ...

// make graph3 reference the latest snapshot available
session.setSnapshot(graph3, PgxSession.LATEST_SNAPSHOT);
graph2.destroy();
// both calls throw an exception, as both references are not valid anymore
Set<VertexProperty<?, ?>> properties = graph1.getVertexProperties();
properties = graph2.getVertexProperties();

// graph3 is still valid, so the call succeeds
properties = graph3.getVertexProperties();
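The invalidation rule in the example above can be reproduced with a self-contained toy model. It is a sketch under assumptions, not the PGX implementation: ToySession and ToyHandle are hypothetical stand-ins for PgxSession and PgxGraph, snapshots are identified by plain integers, and IllegalStateException is an arbitrary choice for the failure.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model of handle invalidation per snapshot; NOT the PGX implementation.
final class HandleInvalidationDemo {
    /** Hypothetical session that remembers which snapshots it has released. */
    static final class ToySession {
        private final Set<Integer> released = new HashSet<>();

        ToyHandle getGraph(int snapshotId) {
            return new ToyHandle(this, snapshotId);
        }
    }

    /** Hypothetical handle, playing the role of a PgxGraph object. */
    static final class ToyHandle {
        private final ToySession session;
        private int snapshotId;

        ToyHandle(ToySession session, int snapshotId) {
            this.session = session;
            this.snapshotId = snapshotId;
        }

        void setSnapshot(int newSnapshotId) {
            snapshotId = newSnapshotId;
        }

        // Releasing the snapshot affects every handle of this session that points to it
        void destroy() {
            session.released.add(snapshotId);
        }

        boolean isValid() {
            return !session.released.contains(snapshotId);
        }

        String describe() {
            if (!isValid()) {
                throw new IllegalStateException("snapshot " + snapshotId + " was destroyed");
            }
            return "snapshot " + snapshotId;
        }
    }

    public static void main(String[] args) {
        ToySession session = new ToySession();
        ToyHandle graph1 = session.getGraph(1);
        ToyHandle graph2 = session.getGraph(1);
        ToyHandle graph3 = session.getGraph(1);

        graph3.setSnapshot(2); // graph3 moves to a newer snapshot
        graph2.destroy();      // releases snapshot 1 for this session

        System.out.println(graph1.isValid()); // false
        System.out.println(graph2.isValid()); // false
        System.out.println(graph3.isValid()); // true
    }
}
```

Validity is tracked per snapshot rather than per handle, which is why destroying graph2 also invalidates graph1 but leaves graph3 usable.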

Note that even if a graph is destroyed by a session, the graph data may still remain in the server memory if the graph is currently shared by other sessions. In such a case, the graph may still be visible among the available graphs via PgxSession.getGraphs().

As a safe alternative to manual destruction of each graph, the PGX API supports some implicit resource management features which allow developers to safely omit the destroy() call. You can refer to the dedicated section in the PGX API Design chapter.