PGX 19.3.1
Documentation

Graph Queries (PGQL)

This page illustrates extensions and limitations of the implementation of PGQL in PGX. The APIs for executing PGQL queries through PGX can be found here.

PGQL Specification

The complete PGQL 1.2 specification can be found at pgql-lang.org (see pgql-lang.org/spec/1.2/).

Limitations on PGQL 1.2 (shared-memory Execution Mode)

PGX 19.3.1 in shared-memory execution mode has the following limitations in its support for PGQL 1.2.

Cross-constraints in Reachability, Shortest Path and Cheapest Path

Constraints that access more than one variable (i.e. "cross-constraints") are not supported in the WHERE clause of a PATH pattern or in the WHERE clause of a SHORTEST or CHEAPEST path function.

Moreover, expressions that access more than one variable are not supported in the COST clause of a CHEAPEST path function.

For example, the following queries are not supported:

PATH p1 AS (n) -> (m) WHERE n.prop > m.prop ...
... SHORTEST ( (n) (-[e]-> (x) WHERE e.prop + x.prop > 10)* (m) ) ...
... CHEAPEST ( (n) (-[e]-> (x) COST e.prop + x.prop )* (m) ) ...
PATH p AS (a) -> (b) WHERE EXISTS (SELECT * MATCH (a) -> (c) -> (b))
SELECT ...

Note that this last query is not supported because the subquery accesses both variables a and b from the outer query. However, the following example is supported because the subquery only accesses a single variable (a) from the outer query:

PATH p AS (a) -> (b) WHERE EXISTS (SELECT * MATCH (a) -> (c) -> (v_new))
SELECT ...

Negative or Zero Edge Cost in Cheapest Path

The cost function in a (top k) cheapest path query is only allowed to evaluate to a positive number. If not, the query fails with a runtime exception.

This limitation exists because a negative or zero edge cost means that traversing certain edges reduces or preserves the path cost, which can lead to infinite loops.
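
The effect of non-positive costs can be illustrated with a small Python sketch. It is an illustration only and does not reflect how PGX is implemented; the function names check_cost and cost_after_loops and the cost values are made up for this example. The sketch accumulates the cost of walking around a cycle several times and rejects non-positive costs in the same spirit as the runtime exception described above.

```python
def check_cost(cost):
    """Reject non-positive costs, in the spirit of the runtime exception."""
    if cost <= 0:
        raise ValueError(f"cost must be positive, got {cost}")
    return cost


def cost_after_loops(cycle_costs, loops):
    """Accumulated cost of traversing a cycle `loops` times."""
    return sum(cycle_costs) * loops


# Hypothetical cycles with two edges each.
positive = [cost_after_loops([2, 3], k) for k in (1, 2, 3)]   # keeps growing
zero = [cost_after_loops([0, 0], k) for k in (1, 2, 3)]       # never grows
negative = [cost_after_loops([1, -2], k) for k in (1, 2, 3)]  # keeps shrinking
```

Only in the first case does every additional loop make the path more expensive, so only there is a cheapest path guaranteed to be found after finitely many steps.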

Quantifiers in Shortest and Cheapest Path

The only quantifier supported inside SHORTEST ( ... ) and CHEAPEST( ... ) is zero or more (*). Other quantifiers are only supported for reachability but not for shortest or cheapest path.

Querying Multiple Graphs at Once

Querying of multiple graphs through a single PGQL statement is not supported.

Subqueries must take the same graph as input as the outermost query. To express this, either repeat the graph name explicitly in the FROM clause of each subquery, or omit the FROM clause from the subquery, which implicitly specifies that the subquery uses the same input graph as the outer query.

Properties of Type vertex or edge

In PGX, built-in functions such as Dijkstra can optionally provide you with "parent vertices/edges", encoded as properties of type vertex and edge. However, PGQL does not support properties of type vertex or edge. Such properties would need to be manually converted into a set of edges, or, a set of properties of primitive type (e.g. string/long/integer), before they can be queried through PGQL.

Property Names

Property names in PGX may contain alphanumeric characters only. For example, underscore (_), dash (-) and space ( ) characters are not allowed. No such limitation exists for graph names or labels, which allow any character.

Undirected Graphs

It is not possible to use directed edge patterns to match undirected edges in the graph. Only any-directional edge patterns can be used.

GROUP BY on Null Values

Grouping on null values is not yet supported.

Limitations on PGQL 1.2 (distributed Execution Mode)

Please see the section on "Graph Pattern Matching" in the Distributed Execution Mode Notice.

Extensions to PGQL 1.2 (shared-memory Execution Mode)

PGX 19.3.1 in shared-memory execution mode provides the following extensions to PGQL 1.2.

Not Equals Operator

In addition to the not equals operator in PGQL 1.2 (i.e. <>), PGX 19.3.1 supports the syntactic alternative !=.

Aliases in SELECT May Be Referenced from WHERE and GROUP BY

As a way to avoid repeated specification of value expressions, it is not only possible to reference the aliases in SELECT from HAVING and ORDER BY, but also from WHERE and GROUP BY.

For example:

  SELECT (n.age + m.age) / 2 AS avg_age
       , COUNT(*)
    FROM g MATCH (n) -[e:friend_of]-> (m)
   WHERE avg_age > 20
GROUP BY avg_age
ORDER BY avg_age

Here, the variable avg_age defined in the SELECT clause is accessed in the WHERE and GROUP BY clauses.

Temporal Functions

The to_date function can be used to convert a string to a date, based on a specified date format.

SELECT to_date('1999 Feb 19', 'yyyy MMM dd') MATCH ...

The to_time function can be used to convert a string to a time or a time with timezone, based on a specified time format.

SELECT to_time('08:11 PM', 'hh:mm a') MATCH ...
SELECT to_time('08:11 PM -03:15', 'hh:mm a XXX') MATCH ...

The to_timestamp function can be used to convert a string to a timestamp or a timestamp with timezone, based on a specified timestamp format.

SELECT to_timestamp('2020/04/02 08:11 PM', 'yyyy/MM/dd hh:mm a') MATCH ...
SELECT to_timestamp('2020/04/02 08:11 PM -03:15', 'yyyy/MM/dd hh:mm a XXX') MATCH ...

The date, time, and timestamp format strings are based on Java's DateTimeFormatter.

Spatial Types

PGQL provides the following built-in functions relevant to the spatial type Point2D:

  • ST_X(n.pointProp): gets the X (longitude) from the point property (pointProp) of vertex n
  • ST_Y(n.pointProp): gets the Y (latitude) from the point property (pointProp) of vertex n
  • ST_PointFromText('POINT(11.22 12.88)'): gets the point object from a Well-known text (WKT) representation
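
To make the relationship between the three functions concrete, here is a minimal Python sketch that mimics them outside of PGX. The helper names point_from_text, st_x and st_y are hypothetical stand-ins written for this example; they are not part of any PGX API, and the parser only handles the simple POINT(x y) form shown above.

```python
import re

# Matches e.g. 'POINT(11.22 12.88)'; only the 2D point form is handled here.
_POINT = re.compile(r"\s*POINT\s*\(\s*([^\s()]+)\s+([^\s()]+)\s*\)\s*", re.IGNORECASE)


def point_from_text(wkt):
    """Parse a WKT point into an (x, y) tuple (stand-in for ST_PointFromText)."""
    match = _POINT.fullmatch(wkt)
    if match is None:
        raise ValueError(f"not a WKT point: {wkt!r}")
    return float(match.group(1)), float(match.group(2))


def st_x(point):
    """X (longitude) of the point, analogous to ST_X."""
    return point[0]


def st_y(point):
    """Y (latitude) of the point, analogous to ST_Y."""
    return point[1]


p = point_from_text("POINT(11.22 12.88)")
```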

Top K Cheapest Path Queries

PGQL offers a TOP k CHEAPEST clause, which returns the k paths that match a given pattern with the lowest cost, computed with a user-defined cost function. If the user-defined cost function returns a constant, the TOP k CHEAPEST clause is equivalent to TOP k SHORTEST.

The syntax of queries is extended in the following way:

CheapestPathPattern                 ::= 'CHEAPEST' '(' SourceVertexPattern QuantifiedShortestPathPrimary DestinationVertexPattern ')'

TopKCheapestPathPattern             ::= 'TOP' KValue CheapestPathPattern

SourceVertexPattern                 ::= VertexPattern

DestinationVertexPattern            ::= VertexPattern

QuantifiedShortestPathPrimary       ::= ShortestPathPrimary GraphPatternQuantifier?

ShortestPathPrimary                 ::=   EdgePattern
                                          | ParenthesizedPathPatternExpression

ParenthesizedPathPatternExpression  ::= '(' VertexPattern? EdgePattern VertexPattern? WhereClause? CostClause? ')'

CostClause                          ::= 'COST' ValueExpression

The cost function must evaluate to a number.

CHEAPEST queries represent paths in the same way as SHORTEST queries, so the same aggregations are defined over the paths they return.

For example, the following query returns the top 3 cheapest flights (flights with the smallest total price for flight tickets) between London and Amsterdam.

SELECT src, SUM(e.price), dst
MATCH TOP 3 CHEAPEST ((src) (-[e: flight]-> COST e.price)* (dst))
WHERE src.code = 'LON' AND dst.code = 'AMS'

Note that the cost function is not limited to edge properties; it can be an arbitrary expression (taking certain limitations into account). For example, the following query replaces null property values with 10:

SELECT src, SUM(e.weight), dst
MATCH TOP 3 CHEAPEST ((src) ((u: Airport)-[e: flight]->(v: Airport) COST CASE WHEN u.fee IS NULL THEN 10 ELSE u.fee END)* (dst))
WHERE src.code = 'LON' AND dst.code = 'AMS'
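
The idea behind a TOP k CHEAPEST query can be sketched in Python as a best-first search over partial paths. This is a simplified model under stated assumptions, not the algorithm PGX uses: the function top_k_cheapest, the flight graph and all prices are invented for illustration, paths are not extended beyond the destination, and a max_hops guard is added so the sketch always terminates.

```python
import heapq


def top_k_cheapest(edges, src, dst, k, max_hops=10):
    """Return up to k (cost, path) pairs from src to dst, cheapest first.

    edges maps a vertex to a list of (neighbor, cost) pairs.
    """
    heap = [(0, [src])]  # partial paths ordered by accumulated cost
    results = []
    while heap and len(results) < k:
        cost, path = heapq.heappop(heap)
        if path[-1] == dst:
            results.append((cost, path))
            continue  # simplification: do not extend paths past dst
        if len(path) > max_hops:
            continue  # guard against unbounded exploration
        for neighbor, edge_cost in edges.get(path[-1], []):
            if edge_cost <= 0:
                raise ValueError("cost must be positive")
            heapq.heappush(heap, (cost + edge_cost, path + [neighbor]))
    return results


# Hypothetical flight prices between airport codes.
flights = {
    "LON": [("AMS", 120), ("PAR", 50), ("BRU", 40)],
    "PAR": [("AMS", 60)],
    "BRU": [("AMS", 90), ("PAR", 25)],
}
cheapest = top_k_cheapest(flights, "LON", "AMS", 3)
```

Because every cost is positive, partial paths leave the queue in order of non-decreasing cost, so the first k paths that reach the destination are the k cheapest ones.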

Beta Extensions to PGQL 1.2 (shared-memory Exec. Mode)

The following features were added for shared-memory execution mode on an experimental basis. Please be aware that these features might not be supported in later versions, or their exact syntax and behavior might change.

In order to enable beta language features, the query needs to contain an additional /*beta*/ comment (see examples below).

MODIFY Queries

Users can now run MODIFY queries on graphs. PGQL defines three kinds of modification: insert, update and delete. Insertions and deletions can only run on session-local graphs, whereas update queries can also run on shared graphs as long as the update only modifies values of session-local properties.

Syntax

The syntax of graph queries is extended in the following way:

Query ::=
  <CommonPathExpressions>?
  (<SelectClause> | <ModifyClause>) 
  <FromClause>? <MatchClause>
  <WhereClause>?
  <GroupByClause>? <HavingClause>?
  <OrderByClause>?
  <LimitOffsetClauses>?

and

ModifyClause            ::= 'MODIFY/*beta*/' { GraphName }? '(' (<Modification>)+ ')'

Modification            ::= <UpdateClause> | <InsertClause> | <DeleteClause>

UpdateClause            ::= 'UPDATE ' (<UpdateTerm> ', ')+

InsertClause            ::= 'INSERT ' ((<VertexInsertTerm> | <EdgeInsertTerm>) ', ')+

DeleteClause           ::= 'DELETE ' (<VariableReference> ', ' )+

UpdateTerm              ::= <VariableReference> 'SET PROPERTIES ( ' { <PropertyUpdate> ',' }+ ')'

VertexInsertTerm        ::= 'VERTEX ' <VariableReference> (<LabelsExpression>)? (<PropertiesExpression>)?

EdgeInsertTerm          ::= 'EDGE ' <VariableReference> ' BETWEEN ' <VariableReference> ' AND ' <VariableReference> (<LabelsExpression>)? (<PropertiesExpression>)?

LabelsExpression        ::= 'LABELS (' (  <ValueExpression> ', ') + ')'

PropertiesExpression    ::= 'PROPERTIES ( ' { <PropertyUpdate> ',' }+ ')'

PropertyUpdate          ::= <PropertyAccess> '=' <ValueExpression> 

For example, the value of the property age for every person named "John" can be updated in the following way:

MODIFY/*beta*/ g
    ( UPDATE x SET PROPERTIES ( x.age = 42 ) ) 
FROM g
    MATCH (x:Person)
    WHERE x.name = 'John'

Currently, at most one modification of each type is allowed in a MODIFY query. Modifications follow snapshot isolation semantics, meaning that modifications under the same MODIFY clause do not see each other's effects.

Just as with SELECT queries, the FROM clause can be omitted under certain conditions. However, the modified graph must be the graph against which the graph pattern is matched.

Return Value

Modify queries return null.

Updating Entities

The beginning of this section already showed an example for updating a property of a vertex. In order to be able to update the age property, it must be private to the session. This means that the graph should either be session-local, or age should be a transient property defined from the session.

MODIFY/*beta*/ g
    ( UPDATE x SET PROPERTIES ( x.age = 42 ) ) 
FROM g
    MATCH (x:Person)
    WHERE x.name = 'John'

Edge properties can be updated in a similar way, with the same restrictions.

For example, the following query updates the cost of every edge to the sum of the costs of its source and destination vertices.

MODIFY/*beta*/ g
   ( UPDATE e SET PROPERTIES ( e.cost = x.cost + y.cost ) )
MATCH (x)-[e]->(y)

Properties can be updated via any variable matched with the pattern in the MATCH clause.

MODIFY/*beta*/ g
    ( UPDATE v SET PROPERTIES ( v.age = 42 ),
             u SET PROPERTIES ( u.weight = 3500 ) )
MATCH (v:Person) <-[:belongs_to]- (u:Car)
WHERE v.name = 'John'

Handling Read After Write Conflicts

During the update, the assigned values (right-hand-side of assignments) correspond to the graph property values before the beginning of the update. This aligns with the snapshot isolation semantics defined between modifications in the same MODIFY query.

For example consider the following update:

MODIFY/*beta*/ g
    ( UPDATE x SET PROPERTIES (x.a = y.b, x.b = 12) ) 
MATCH (x) -> (y)

It is possible that a vertex is matched by both (x) and (y), for example:

x y
V1 V2
V3 V1

Supposing that V1.b was 20 before executing the update, V1.b will be assigned 12 and V3.a will be assigned 20, no matter in which order the updates are executed.
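
The following Python sketch models this behavior. It is only a model of the semantics, not PGX code: the function apply_update is hypothetical, and the initial values of V2.b and V3.b are invented, since the text above only fixes V1.b = 20.

```python
def apply_update(props, matches):
    """Model `x.a = y.b, x.b = 12` over (x, y) matches with snapshot semantics."""
    snapshot = {v: dict(p) for v, p in props.items()}  # state before the update
    result = {v: dict(p) for v, p in props.items()}
    for x, y in matches:
        result[x]["a"] = snapshot[y]["b"]  # right-hand side reads the snapshot
        result[x]["b"] = 12
    return result


before = {
    "V1": {"a": 0, "b": 20},
    "V2": {"a": 0, "b": 7},   # assumed value
    "V3": {"a": 0, "b": 5},   # assumed value
}
matches = [("V1", "V2"), ("V3", "V1")]
forward = apply_update(before, matches)
backward = apply_update(before, list(reversed(matches)))
```

Since every right-hand side is evaluated against the snapshot, processing the matches in either order gives the same result.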

Handling Write After Write Conflicts

Multiple writes to the same property of the same entity are not allowed; in such cases the execution terminates with an error.

For example consider the following query:

MODIFY/*beta*/ 
    ( UPDATE x SET PROPERTIES ( x.a = y.a ) )
MATCH (x) -> (y)

If the following vertices are matched

x y
V1 V2
V1 V3

a runtime exception will be thrown, because the value assigned to V1.a could be ambiguous.

As an extension to these semantics, PGX implements a more relaxed check for conflicting writes. If the assigned value can be statically guaranteed to depend only on property values of the entity it is assigned to, then the update succeeds even in the case of multiple assignments, since the assigned value is always the same.

For example, in the following case, multiple writes to v.a are allowed, because in this case no matter how many times v.a is written, it is always assigned the same value (65 minus its age property).

MODIFY/*beta*/ g
   ( UPDATE v SET PROPERTIES ( v.a = 65 - v.age ) )
MATCH (v:Person) -> (u:Person)
WHERE v.name = 'John'

In the following case, however, multiple writes to v.a are not allowed, because the value of the property would be ambiguous: it is 65 minus the age property of the other vertex, which can differ between the matched u's.

MODIFY/*beta*/ g
    ( UPDATE v SET PROPERTIES ( v.a = 65 - u.age ) )
MATCH (v:Person) -> (u:Person)
WHERE v.name = 'John'
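
The strict and the relaxed conflict checks can be contrasted with a short Python sketch. It is a model under assumptions rather than PGX's implementation: the function check_writes and the self_only flag are hypothetical, with self_only standing in for the static guarantee that the right-hand side reads only properties of the entity being written. The ages used are invented.

```python
def check_writes(writes):
    """Model the write-after-write check.

    writes: list of (entity, property, value, self_only) tuples, one per match.
    Returns the final {(entity, property): value} map or raises on a conflict.
    """
    final = {}
    for entity, prop, value, self_only in writes:
        key = (entity, prop)
        if key in final and not self_only:
            raise RuntimeError(f"conflicting writes to {entity}.{prop}")
        final[key] = value
    return final


# v.a = 65 - v.age, John (age 30) matched twice: same value both times.
relaxed = check_writes([("John", "a", 35, True), ("John", "a", 35, True)])

# v.a = 65 - u.age, John matched with neighbors aged 25 and 35: ambiguous.
try:
    check_writes([("John", "a", 40, False), ("John", "a", 30, False)])
    strict_failed = False
except RuntimeError:
    strict_failed = True
```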

Deleting Entities

Entities can be deleted by enumerating them after the DELETE keyword. The order of enumeration does not affect the result of the execution.

For example, one can delete all edges from a graph using the following query:

MODIFY/*beta*/ g
    ( DELETE e )
MATCH ()-[e]->()

Multiple deletes to the same entity are not considered conflicting. For example consider the following query:

MODIFY/*beta*/
   ( DELETE x, y )
MATCH (x) -> (y)

In that case, even if a vertex is matched multiple times by (x) or (y), and deleted multiple times, the query will complete without an exception.

If a vertex is deleted, all its incoming and outgoing edges are deleted as well, thus there are no dangling edges left after a query. So the following query not only deletes the vertex with id 11 but also all edges for which it is source or destination.

MODIFY/*beta*/ g
    ( DELETE x )
MATCH (x)
WHERE id(x) = 11
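
The cascading behavior can be modeled with a few lines of Python. This is an illustrative sketch, not PGX code: the function delete_vertices and the three-vertex graph are made up for the example.

```python
def delete_vertices(vertices, edges, to_delete):
    """Remove vertices and every edge that starts or ends at one of them."""
    doomed = set(to_delete)  # duplicates collapse, so repeated deletes are harmless
    remaining_vertices = set(vertices) - doomed
    remaining_edges = [(s, d) for s, d in edges if s not in doomed and d not in doomed]
    return remaining_vertices, remaining_edges


vertices = {10, 11, 12}
edges = [(10, 11), (11, 12), (10, 12)]
# Vertex 11 is listed twice to mimic an entity that is matched more than once.
left_vertices, left_edges = delete_vertices(vertices, edges, [11, 11])
```

Only the edge between vertices 10 and 12 survives, because the other two edges have the deleted vertex as source or destination.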

Because of implicit deletion of edges, the following query can be used to delete all edges as well as all vertices from a graph:

MODIFY/*beta*/ g
    ( DELETE x )
MATCH (x)

Inserting Entities

PGQL supports the insertions of edges and vertices into a graph. In the same query, multiple vertices and edges can be inserted by enumerating them after the INSERT keyword. All inserted entities must be identified with a variable name that has to be unique for the whole modification query.

So the following query should fail, because the scope of the variable x is the whole query rather than a single vertex insertion term, and x is declared twice:

MODIFY/*beta*/ g
    ( INSERT VERTEX x, VERTEX x )

The id values for the inserted entities are automatically generated.

Inserting Vertices

Vertices can be inserted with or without a match.

If the match is missing, one unconnected vertex is inserted into the graph, as in the following query:

MODIFY/*beta*/ g
    ( INSERT VERTEX x LABELS ('Male') PROPERTIES (x.age = 22) )

In the presence of a match, as many vertices are inserted as rows are matched. So the following query inserts a new vertex for every vertex in the graph that is labelled Male.

MODIFY/*beta*/ g
    ( INSERT VERTEX x LABELS ('Male') PROPERTIES (x.age = y.age) )
MATCH (y: Male)

In the presence of a GROUP BY expression, as many vertices are inserted as groups are matched. For example, the following query inserts a new vertex for every profession in the graph.

MODIFY/*beta*/ g
    ( INSERT VERTEX x LABELS ('Profession') PROPERTIES ( x.name = y.profession ) )
MATCH (y: Person)
GROUP BY y.profession
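
The difference between inserting per matched row and per group can be modeled in Python as follows. The function inserted_names and the sample people are hypothetical and serve only to illustrate the counting rule.

```python
def inserted_names(people, group_by_profession):
    """Names of the vertices that would be inserted, one per row or one per group."""
    rows = [person["profession"] for person in people]
    if group_by_profession:
        # One inserted vertex per distinct group key, in first-seen order.
        return list(dict.fromkeys(rows))
    return rows


people = [
    {"name": "Ann", "profession": "doctor"},
    {"name": "Bob", "profession": "teacher"},
    {"name": "Cid", "profession": "doctor"},
]
per_row = inserted_names(people, group_by_profession=False)
per_group = inserted_names(people, group_by_profession=True)
```

Without grouping, three vertices would be inserted for the three matched rows; with grouping, only two, one for each distinct profession.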

Inserting Edges

Edges can be inserted by specifying the source and destination vertices. Only the insertion of directed edges is supported.

For example, the following query inserts an edge with source x and destination y:

MODIFY/*beta*/ g
    ( INSERT EDGE e BETWEEN x AND y )
MATCH (x), (y) 
WHERE id(x) = 1 AND id(y) = 2 

Labels

Labels for the inserted entities can be specified between parentheses after the LABELS keyword. Label expressions can be arbitrary expressions, but they must evaluate to a string. An entity can be assigned multiple labels; however, as a current limitation in PGX, edges can be assigned at most one label. Moreover, in partitioned graphs, newly inserted entities must have exactly one label that identifies the type of the inserted entity.

For example in order to insert vertices with the exact same label as the matched vertices, one could write the following query:

MODIFY/*beta*/ g
   ( INSERT VERTEX v LABELS ( label(u) ) )
MATCH (u)

Edge labels can be specified similarly:

MODIFY/*beta*/ g
    ( INSERT EDGE e BETWEEN x AND y LABELS ('knows'))
MATCH (x: Person), (y: Person) 
WHERE id(x) = 1 AND id(y) = 2 

Properties

Properties can be specified between parentheses after the PROPERTIES keyword. On the left-hand side of each assignment, the property name must be preceded by the variable name and a dot. The assigned values can be arbitrary expressions, with similar restrictions as property assignments in update queries. Property expressions cannot refer to other entities that are inserted at the same time.

For example, the following query inserts a new vertex with age = 22:

MODIFY/*beta*/ g
    ( INSERT VERTEX v PROPERTIES ( v.age = 22) )

Edge properties can be specified in the same manner:

MODIFY/*beta*/ g
    ( INSERT EDGE e BETWEEN x AND y LABELS ('knows') PROPERTIES (e.since = DATE '2017-09-21') )
MATCH (x: Person), (y: Person) 
WHERE id(x) = 1 AND id(y) = 2 

In the case of a partitioned schema, only those properties that are defined for the type of the entity can be assigned. Note that the entity type is determined by the label(s).

Multiple Inserts in the Same INSERT Clause

One insert clause can contain multiple inserts.

For example, the query below inserts two vertices into the graph:

MODIFY/*beta*/ g
    ( INSERT
        VERTEX v LABELS ('Male') PROPERTIES (v.age = 23, v.name = 'John'),
        VERTEX u LABELS ('Female') PROPERTIES (u.age = 24, u.name = 'Jane')
    )    

Multiple insertions under the same INSERT can be used to set a newly inserted vertex as source or destination for a newly inserted edge.

For example, the following query inserts a vertex and an edge that connects it to the matched vertex y:

MODIFY/*beta*/ g
    ( INSERT 
        VERTEX x LABELS ('Person') PROPERTIES (x.name = 'John'),
        EDGE e BETWEEN x AND y LABELS ('knows') PROPERTIES (e.since = DATE '2017-09-21'))
MATCH (y) WHERE y.name = 'Jane'

Note that the properties of x cannot be accessed in the property assignments of e; only the variable itself is visible, as the source of the edge. For this reason, setting e.since to x.graduation_date would cause the query to fail.

In the presence of a match, as many edges are inserted as (not necessarily unique) vertex pairs are matched. If a vertex pair is matched more than once, multiple edges will be inserted between the vertices.

For example consider the following query:

MODIFY/*beta*/ g
    ( INSERT EDGE e BETWEEN x AND y )
MATCH (x), (y) -> (z)
WHERE id(x) = 1

Figure 1: Sample graph

If the query is executed on the graph shown in Figure 1, the following vertices will be matched:

x y z
V1 V2 V4
V1 V3 V2
V1 V3 V4

In that case, three edges will be inserted: one connecting V1 and V2, and two different edges both connecting V1 and V3, as shown in Figure 2.

Figure 2: The sample graph after executing the query
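
The edge counts above can be reproduced with a small Python model of the match. The edge list used here is an assumption reconstructed from the match table (V2 -> V4, V3 -> V2, V3 -> V4), and the function match_rows is hypothetical; vertices are identified by name rather than by numeric id.

```python
from collections import Counter


def match_rows(vertices, edges, x_name):
    """Rows of MATCH (x), (y) -> (z) restricted to one x, as (x, y, z) tuples."""
    return [(x, y, z) for x in vertices if x == x_name for (y, z) in edges]


vertices = ["V1", "V2", "V3", "V4"]
edges = [("V2", "V4"), ("V3", "V2"), ("V3", "V4")]  # assumed from the match table

rows = match_rows(vertices, edges, "V1")
# One edge is inserted per matched row, from x to y.
inserted = Counter((x, y) for x, y, _ in rows)
```

The counter records one inserted edge for the pair (V1, V2) and two for the pair (V1, V3), because V3 appears as y in two matched rows.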

Stacking Multiple Modifications in the Same Query

Multiple modifications can be executed in the same MODIFY query. For example, to update a vertex and also insert an edge with the same vertex as destination, the following query can be used:

MODIFY/*beta*/ g
    ( INSERT EDGE e BETWEEN x AND y,
      UPDATE y SET PROPERTIES ( y.a = 12 ) )
MATCH (x), (y) 
WHERE id(x) = 1 AND id(y) = 2

Isolation Semantics of MODIFY Queries

Modify queries follow snapshot isolation, which means that all modifications see a consistent state of the graph, namely its state before the execution of the query. For this reason, property assignments can come from updated and deleted vertices, but they cannot refer to inserted vertices.

For example, the query below succeeds, because y.age is evaluated based on the graph's status before the query.

MODIFY/*beta*/ g
    ( INSERT VERTEX x PROPERTIES (x.age = y.age),
      DELETE y)
MATCH (y)

Please note that, for the same reason, properties of newly inserted vertices cannot be referenced in right-hand-side expressions. For example, the following query would fail, as x is not yet in the graph and x.age cannot be evaluated:

MODIFY/*beta*/ g
    ( INSERT VERTEX x PROPERTIES (x.age = 24),
             VERTEX y PROPERTIES (y.age = x.age) )

Handling Conflicting Modifications

Multiple modifications on the same entity are not allowed; in such cases the execution terminates with an error. This section only addresses conflicts between different modifications under the same MODIFY query. For conflicts within the same modification, please refer to the corresponding sections.

One example of such a conflict is the UPDATE-DELETE conflict: the same entity cannot be updated and deleted in the same query.

For example, let us consider the following query:

MODIFY/*beta*/ 
    ( UPDATE x SET PROPERTIES ( x.a = 11 ),
      DELETE x ) 
MATCH (x)

Here the conflict between the deleted and the updated vertex is trivial. However, conflicts are not always this obvious. For example, the following query can also fail due to a conflicting update and delete:

MODIFY/*beta*/ 
    ( UPDATE x SET PROPERTIES ( x.a = 11 ),
      DELETE y ) 
MATCH (x) -> (y)

If the vertices matched by x are distinct from the ones matched by y, the query should pass. However, if there is a vertex that is matched by both x and y, the query will fail with an exception. Note that the order of modifications does not matter; the query will fail in either case.

Similar behavior is expected upon INSERT-DELETE conflicts, where the inserted entity depends on an entity that is being deleted. Note that because of the snapshot semantics, this is only possible if an edge is inserted, and at the same time its source or destination vertex is deleted.

For example, consider the following non-trivial case:

MODIFY/*beta*/ 
    ( INSERT EDGE e BETWEEN x AND y,
      DELETE z ) 
MATCH (x) -> (y), (z) WHERE id(z) = 11

If any vertex is matched by z and by either x or y, then after executing the query the inserted edge would not have a source or destination. Thus, in that case the execution fails.
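
Both conflict kinds discussed in this section amount to checking whether the set of deleted entities overlaps with the entities that are updated or used as edge endpoints. The Python sketch below models that check; the function check_conflicts and the vertex ids are hypothetical, and the sketch says nothing about how PGX detects conflicts internally.

```python
def check_conflicts(updated, deleted, inserted_edges):
    """Raise if one MODIFY query would update or connect an entity it also deletes.

    updated, deleted: iterables of vertex ids.
    inserted_edges: iterable of (source, destination) pairs.
    """
    deleted = set(deleted)
    update_delete = set(updated) & deleted
    if update_delete:
        raise RuntimeError(f"UPDATE-DELETE conflict on {sorted(update_delete)}")
    endpoints = {vertex for edge in inserted_edges for vertex in edge}
    insert_delete = endpoints & deleted
    if insert_delete:
        raise RuntimeError(f"INSERT-DELETE conflict on {sorted(insert_delete)}")


def conflict_message(updated, deleted, inserted_edges):
    """Return the error message, or None when the modifications are compatible."""
    try:
        check_conflicts(updated, deleted, inserted_edges)
    except RuntimeError as error:
        return str(error)
    return None


no_conflict = conflict_message([1, 3], [2, 4], [])
update_clash = conflict_message([1, 2], [2, 3], [])
insert_clash = conflict_message([], [11], [(1, 11)])
```

The check does not depend on the order in which the modifications are listed, which matches the observation that reordering them does not avoid the failure.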