18.3 Supported PGQL Features and Limitations on the Graph Server (PGX)

Learn about the supported and unsupported PGQL functionalities in the graph server (PGX).

Table 18-2 Supported PGQL Functionalities and Limitations on the Graph Server (PGX)

Features PGQL on the Graph Server (PGX)
CREATE PROPERTY GRAPH Supported
Limitations:
  • No composite keys for vertices.
  • Properties need to be column references; arbitrary property expressions are not supported unless the graph is first created in the database and then loaded into the graph server (PGX).
DROP PROPERTY GRAPH Not Supported
Fixed-length pattern matching Supported
Variable-length pattern matching goals Supported:
  • Reachability
  • Path search prefixes:
    • ANY
    • ANY SHORTEST
    • SHORTEST k
    • ALL SHORTEST
    • ANY CHEAPEST
    • CHEAPEST k
    • ALL
  • Path modes:
    • WALK
    • TRAIL
    • SIMPLE
    • ACYCLIC
Variable-length pattern matching quantifiers Supported:
  • *
  • +
  • ?
  • { n }
  • { n, }
  • { n, m }
  • { , m }

Limitations:

  • ? is only supported for reachability
  • In case of ANY CHEAPEST and TOP k CHEAPEST, only * is supported
Variable-length path unnesting Supported:
  • ONE ROW PER VERTEX
  • ONE ROW PER STEP

Limitation:

  • * quantifier is not supported
GROUP BY Supported
HAVING Supported
Aggregations Supported:
  • COUNT
  • MIN, MAX, AVG, SUM
  • LISTAGG
  • ARRAY_AGG

Not Supported

  • JSON_ARRAYAGG
DISTINCT
  • SELECT DISTINCT
  • Aggregation with DISTINCT (such as, COUNT(DISTINCT e.prop))
Supported
SELECT v.* Supported
ORDER BY (+ASC/DESC), LIMIT, OFFSET Supported
Data Types Supported:
  • INTEGER (32-bit)
  • LONG (64-bit)
  • FLOAT (32-bit)
  • DOUBLE (64-bit)
  • STRING (no maximum length)
  • BOOLEAN
  • DATE
  • TIME
  • TIME WITH TIME ZONE
  • TIMESTAMP
  • TIMESTAMP WITH TIME ZONE
JSON No built-in JSON support. However, JSON values can be stored as STRING and manipulated or queried through user-defined functions (UDFs) written in Java or JavaScript.
Operators Supported:
  • Relational: +, -, *, /, %, - (unary minus)
  • Arithmetic: =, <>, <, >, <=, >=
  • Logical: AND, OR, NOT
  • String: || (concat)
Functions and predicates Supported:
  • IS NULL, IS NOT NULL
  • JAVA_REGEXP_LIKE (based on CONTAINS)
  • LOWER, UPPER
  • SUBSTRING
  • ABS, CEIL/CEILING, FLOOR, ROUND
  • EXTRACT
  • ID, VERTEX_ID, EDGE_ID
  • LABEL, LABELS, IS [NOT] LABELED
  • ALL_DIFFERENT
  • IN_DEGREE, OUT_DEGREE
  • CAST
  • CASE
  • IN and NOT IN
  • MATCHNUM
  • ELEMENT_NUMBER
  • IS [NOT] SOURCE [OF], IS [NOT] DESTINATION [OF]
  • VERTEX_EQUAL, EDGE_EQUAL
User-defined functions Supported:
  • Java UDFs
  • JavaScript UDFs
Subqueries:
  • Scalar subqueries
  • EXISTS and NOT EXISTS subqueries
  • LATERAL subquery
Supported
GRAPH_TABLE operator Supported
INSERT/UPDATE/DELETE Supported
INTERVAL literals and operations

Supported literals:

  • SECOND
  • MINUTE
  • HOUR
  • DAY
  • MONTH
  • YEAR

Supported operations:

  • Add INTERVAL to datetime (+)
  • Subtract INTERVAL from datetime (-)

Also, the following explains certain supported and unsupported PGQL features:

18.3.1 Support for Selecting All Properties

You can use SELECT v.* to select all properties of the vertices or edges that bind to the variable v. For example:

SELECT label(n), n.* FROM MATCH (n) ORDER BY "number", "name"

On execution, the query output is as shown:

+-----------------------------+
| label(n) | number | name    |
+-----------------------------+
| Account  | 1001   | <null>  |
| Account  | 2090   | <null>  |
| Account  | 8021   | <null>  |
| Account  | 10039  | <null>  |
| Person   | <null> | Camille |
| Person   | <null> | Liam    |
| Person   | <null> | Nikita  |
| Company  | <null> | Oracle  |
+-----------------------------+

You can use label expressions to select properties that belong to the specified vertex or edge labels. For example:

SELECT label(n), n.* FROM MATCH (n:Person) ORDER BY "name"

The preceding query retrieves all the properties for the specified Person label:

+--------------------+
| label(n) | name    |
+--------------------+
| Person   | Camille |
| Person   | Liam    |
| Person   | Nikita  |
+--------------------+

You can also specify a PREFIX to avoid duplicate column names in cases where you select all properties using multiple variables. For example:

SELECT n.* PREFIX 'n_', e.* PREFIX 'e_', m.* PREFIX 'm_'
FROM MATCH (n:Account) -[e:transaction]-> (m:Account)
ORDER BY "e_amount"

The query output is as shown:

+--------------------------------+
| n_number | e_amount | m_number |
+--------------------------------+
| 10039    | 1000.0   | 8021     |
| 8021     | 1500.3   | 1001     |
| 8021     | 3000.7   | 1001     |
| 2090     | 9900.0   | 10039    |
| 1001     | 9999.5   | 2090     |
+--------------------------------+

18.3.2 Unnesting of Variable-Length Path Queries

Unnesting of variable-length path queries (such as, SHORTEST or CHEAPEST paths) to obtain a separate row for each vertex or edge along a path is supported.

You can unnest a path aggregation using one of the following options:
  • ONE ROW PER MATCH (default option)
  • ONE ROW PER VERTEX(vertex_variable)
  • ONE ROW PER STEP(edge_source_variable,edge_variable,edge_destination_variable)

For example, the following PGQL query uses the ONE ROW PER STEP option:

SELECT  v1.ACCT_ID AS src_no, k.TXN_AMOUNT, v2.ACCT_ID AS dest_no 
FROM MATCH ALL SHORTEST (a:Accounts) -[e:transfers]->+ (b:Accounts)
ONE ROW PER STEP( v1,k,v2 )
WHERE a.ACCT_ID = 284 AND b.ACCT_ID = 616

It is important to note that the ONE ROW PER STEP option only supports paths with a minimal hop greater than 0 and hence * quantifier is not supported with this option.

On execution, the preceding query retrieves one row for every edge on the path that is bound by the corresponding source and destination vertices:

+-------------------------------+
| src_no | TXN_AMOUNT | dest_no |
+-------------------------------+
| 744    | 1000.0     | 616     |
| 772    | 1000.0     | 744     |
| 284    | 1000.0     | 772     |
| 744    | 1000.0     | 616     |
| 772    | 1500.0     | 744     |
| 284    | 1000.0     | 772     |
+-------------------------------+

You can also use the Graph Visualization tool to visualize edges using ONE ROW PER STEP along a path:

Figure 18-1 Visualizing Unnesting of Variable-Length Path Queries

Description of Figure 18-1 follows
Description of "Figure 18-1 Visualizing Unnesting of Variable-Length Path Queries"

An example for a query with the ONE ROW PER VERTEX option is as follows:

SELECT k.acct_id AS id, k.acct_name AS name
FROM MATCH ANY SHORTEST (a:Accounts) ((src:Accounts)-[e:transfers]->){1,3}(b:Accounts)
ONE ROW PER VERTEX(k)
WHERE a.acct_id=284 AND b.acct_id=616

On execution, the preceding query retrieves one row per vertex along a path:

+----------------+
| id  | name     |
+----------------+
| 616 | Account4 |
| 744 | Account3 |
| 772 | Account2 |
| 284 | Account1 |
+---------------+

Built-in Function Support for Recursive Path Unnesting Queries

PGQL supports the following two built-in functions, which can be used in combination with any of the path unnesting option (ONE ROW PER VERTEX, ONE ROW PER STEP or ONE ROW PER MATCH):

  • MATCH_NUMBER(k): Returns a unique per-path identifier for each unnested path (that is, if two rows come from the same path, they have the same MATCH_NUMBER(k)).
  • ELEMENT_NUMBER(k): Returns the element number of a vertex or an edge along a path. Vertices are numbered with odd numbers, the leftmost vertex is numbered 1, the second 3, then 5 and so on. Edges are assigned with even numbers, starting with 2 for the leftmost edge, 4 for the next one, and so on.

For example, the following PGQL query uses the MATCH_NUMBER(k) and ELEMENT_NUMBER(k) functions with ONE ROW PER VERTEX option:

SELECT k.*, match_number(k), element_number(k)
FROM MATCH ANY SHORTEST (a:Accounts) -[e:transfers]->* (b:Accounts) ONE ROW PER VERTEX ( k )
WHERE a.acct_id = 284 AND b.acct_id = 616

The preceding query produces the following output on execution. Note that the element_number(k) returned for the vertices are odd numbered values. Since the preceding query uses ANY path pattern, there is only one arbitrary path displayed in the output. Therefore match_number(k) is the same for all the rows in the path.

+-----------------------------------------------------------+
| ACCT_ID | ACCT_NAME | match_number(k) | element_number(k) |
+-----------------------------------------------------------+
| 616     | Account   | 0               | 7                 |
| 744     | Account   | 0               | 5                 |
| 772     | Account   | 0               | 3                 |
| 284     | Account   | 0               | 1                 |
+-----------------------------------------------------------+

The following example shows a PGQL query using MATCH_NUMBER(k) and ELEMENT_NUMBER(k) functions with ONE ROW PER STEP option:

SELECT v1.acct_id AS src_no,k.txn_amount,v2.acct_id AS dest_no, match_number(k), element_number(k)
FROM MATCH ALL SHORTEST (a:Accounts) -[e:transfers]->+ (b:Accounts)
ONE ROW PER STEP( v1,k,v2 )
WHERE a.acct_id = 284 AND b.acct_id = 616

The preceding query output is as shown. Note that there are two paths identified by match_number(k)and the edges are displayed with even numbered element_number(k) values.

+---------------------------------------------------------------------+
| src_no | txn_amount | dest_no | match_number(k) | element_number(k) |
+---------------------------------------------------------------------+
| 744    | 1000.0     | 616     | 0               | 6                 |
| 772    | 1000.0     | 744     | 0               | 4                 |
| 284    | 1000.0     | 772     | 0               | 2                 |
| 744    | 1000.0     | 616     | 1               | 6                 |
| 772    | 1500.0     | 744     | 1               | 4                 |
| 284    | 1000.0     | 772     | 1               | 2                 |
+---------------------------------------------------------------------+

18.3.3 Using INTERVAL Literals in PGQL Queries

You can use INTERVAL literals in PGQL queries to add or subtract intervals to or from PGQL temporal data types respectively.

See the PGQL 1.5 Specification for the supported temporal data types.

An INTERVAL type is a period of time, which consists of the keyword "INTERVAL" followed by a numeral and a temporal unit. For example, INTERVAL '1' DAY.

The following table shows the valid temporal units that are supported in INTERVAL values:

Table 18-3 Valid values for fields in INTERVAL values

Keyword Supported Valid Values
YEAR Unconstrained except by <interval leading field precision>
MONTH Months (within years) (0-11)
DAY Unconstrained except by <interval leading field precision>
HOUR Hours (within days) (0-23)
MINUTE Minutes (within hours) (0-59)
SECOND Seconds (within minutes) (0-59.999...)

The following INTERVAL operations are supported on a temporal data type:

  • TEMPORAL TYPE + INTERVAL
  • INTERVAL + TEMPORAL TYPE
  • TEMPORAL TYPE - INTERVAL

For example, the following PGQL query retrieves persons where n.birthdate + INTERVAL '20' YEAR > TIMESTAMP '2000-01-01 00:00:00':

opg4j> graph.queryPgql("SELECT n.name, n.birthdate FROM MATCH (n:Person) WHERE n.birthdate + INTERVAL '20' YEAR > TIMESTAMP '2000-01-01 00:00:00'").print()
graph.queryPgql("SELECT n.name, n.birthdate FROM MATCH (n:Person) WHERE n.birthdate + INTERVAL '20' YEAR > TIMESTAMP '2000-01-01 00:00:00'").print();
graph.query_pgql("SELECT n.name, n.birthdate FROM MATCH (n:Person) WHERE n.birthdate + INTERVAL '20' YEAR > TIMESTAMP '2000-01-01 00:00:00'").print()

On execution, the query output is as shown:

+--------------------------+
| name  | birthdate        |
+--------------------------+
| Mary  | 1982-09-25T00:00 |
| Alice | 1987-02-01T00:00 |
+--------------------------+

18.3.4 Using Path Modes with PGQL

The following path modes are available in combination with ANY, ALL, ANY SHORTEST, SHORTEST k, and ALL SHORTEST:

  • WALK (default path mode): A walk is traversing a graph through a sequence of vertices and edges. The vertices and edges visited in a walk can be repeated. Hence there is no filtering of paths in this default path mode.
  • TRAIL: A trail is traversing a graph without repeating the edges. Therefore, path bindings with repeated edges are not returned.
    SELECT CAST(a.number AS STRING) || ' -> ' || LISTAGG(x.number, ' -> ') AS accounts_along_path
    FROM MATCH ALL TRAIL PATHS (a IS account) (-[IS transaction]-> (x)){2,} (b IS Account)
    WHERE a.number = 8021 AND b.number = 1001
    
    +-----------------------------------------------+
    | accounts_along_path                           |
    +-----------------------------------------------+
    | 8021 -> 1001 -> 2090 -> 10039 -> 8021 -> 1001 |
    | 8021 -> 1001 -> 2090 -> 10039 -> 8021 -> 1001 |
    +-----------------------------------------------+
    In the preceding output, both the paths contain the vertices 8021 and 1001 twice but they are still valid trails as long as no edges are repeated.
  • ACYCLIC: If the starting and ending vertex in a graph traversal are different, then this implies that there are no cycles in the path. In this case, the path bindings with repeated vertices are not returned.
    SELECT CAST(a.number AS STRING) || ' -> ' || LISTAGG(x.number, ' -> ') AS accounts_along_path
    FROM MATCH SHORTEST 10 ACYCLIC PATHS (a IS account) (-[IS transaction]-> (x))+ (b)
    WHERE a.number = 10039 AND b.number = 1001
    
    +-----------------------+
    | accounts_along_path   |
    +-----------------------+
    | 10039 -> 8021 -> 1001 |
    | 10039 -> 8021 -> 1001 |
    +-----------------------+
    The preceding query requested 10 shortest paths. But only two are returned since all the other paths are cyclic.
  • SIMPLE: A simple walk is traversing a graph without repeating the vertices. Therefore, path bindings with repeated vertices are not returned. The only exception is when the repeated vertex is the first and the last in a path.
    SELECT CAST(a.number AS STRING) || ' -> ' || LISTAGG(x.number, ' -> ') AS accounts_along_path
    FROM MATCH ANY SIMPLE PATH (a IS account) (-[IS transaction]-> (x))+ (a)
    WHERE a.number = 10039
    
    +----------------------------------------+
    | accounts_along_path                    |
    +----------------------------------------+
    | 10039 -> 8021 -> 1001 -> 2090 -> 10039 |
    +----------------------------------------+
    The preceding query returns a cyclic path. This path is a valid simple path since it starts and ends in the same vertex and there is no other cycle in the path.

Note that the path modes are syntactically placed after ANY, ALL, ANY SHORTEST, SHORTEST k, ALL SHORTEST, CHEAPEST, and CHEAPEST k. The path mode is optionally followed by a PATH or PATHS keyword.

Note that using TRAIL, ACYCLIC, or SIMPLE matching path modes for all unbounded quantifiers guarantees that the result set of a graph pattern matching will be finite.

18.3.5 Support for PGQL Lateral Subqueries

You can use a LATERAL subquery to pass the output rows of one query into another.

For example, you can use the ORDER BY or GROUP BY clause on top of another ORDER BY or GROUP BY clause:

/* Find the top-5 largest transactions and return the account number
   that received the highest number of such large transactions */
SELECT recipient, COUNT(*) AS num_large_transactions
FROM LATERAL ( SELECT m.number AS recipient
               FROM MATCH (n:account) -[e:transaction]-> (m:account)
               ORDER BY e.amount DESC
               LIMIT 5 )
GROUP BY recipient
ORDER BY num_large_transactions DESC
LIMIT 1

Also, the LATERAL subquery in the FROM clause can be followed by one or more MATCH clauses. For example:

SELECT path_num, elem_num, owner.name
FROM LATERAL ( SELECT v, MATCHNUM(v) AS path_num, ELEMENT_NUMBER(v) AS elem_num
               FROM MATCH SHORTEST 2 PATHS (a1:account) -[e:transaction]->* (a2:account)
                      ONE ROW PER VERTEX ( v )
               WHERE a1.number = 10039 AND a2.number = 2090 )
    , MATCH (v) -[:owner]-> (owner:Person|Company)
ORDER BY path_num, elem_num 

Note that the FROM clause may contain any number of MATCH clauses and LATERAL subqueries.

Both, MATCH clause followed by a LATERAL subquery, or a LATERAL subquery followed by one or more LATERAL subqueries are supported.

Also, note the following:

  • Variables exported from previous table expressions can be used in subsequent table expressions.
  • Operators like ORDER BY and LIMIT can be used followed by additional pattern matching.
  • Variables that are not projected from a LATERAL subquery cannot be accessed in the outer query.

The following example query first retrieves a list of companies, Then, it finds the respective accounts of the companies. Finally, it finds the top 2 transactions made to the account of each company in the initial list.

SELECT c.name, a.number, t.amount FROM 
  LATERAL ( SELECT c FROM MATCH (c:Company) ORDER BY c.name LIMIT 1),
  MATCH (a:Account)->(c), 
  LATERAL (SELECT t FROM MATCH ()-[t:Transaction]->(a) ORDER BY t.amount DESC LIMIT 2)

In the following query, the LATERAL subquery is followed by two other LATERAL subqueries. Each subquery builds upon the output from the previous clauses:

SELECT f.number as fundsAccount FROM
LATERAL ( SELECT p FROM MATCH (p:Person) WHERE p.name = 'Nikita'),
LATERAL ( SELECT a FROM MATCH (a)->(p)),
LATERAL ( SELECT f FROM MATCH (f)->(a))

18.3.6 Support for PGQL GRAPH_TABLE Operator

The GRAPH_TABLE operator in PGQL increases the interoperability between graphs loaded into the graph server (PGX) and the graphs on the database.

However, in order to comply with the SQL standard, ensure that the PGQL query syntax is aligned as shown:
  • The label predicate in the graph pattern MATCH query must use the IS keyword.
  • To limit the number of output rows, use the FETCH [FIRST/NEXT] x [ROW/ROWS] clause instead of the LIMIT x clause.
  • To verify the orientation of the edge, use v IS [NOT] SOURCE [OF] e/v IS [NOT] DESTINATION [OF] e as the standard form instead of [NOT] is_source_of(e, v) / [NOT] is_destination_of(e, v).
  • To verify if the vertex or edge has the given label, use the x IS [NOT] LABELED <label_string> predicate as an alternative for has_label(x, <label_string>).
  • To match the k shortest paths, use MATCH (n) –[e]->* (m) KEEP SHORTEST k as the standard form of MATCH TOP k SHORTEST (n) –[e]->* (m).
  • ALL keyword optional in front of fixed-length path patterns.

    MATCH (n) –[e]->{1,4} (m) as an alternative for MATCH ALL (n) –[e]->{1,4} (m).

  • MATCH <path pattern> KEEP <path pattern prefix> <WHERE clause> as an alternative for MATCH <path pattern prefix> <path pattern> <WHERE clause>

The following shows a few query examples using the GRAPH_TABLE operator:

Example 18-2 Aggregation Query Using TRAIL path mode with ALL

SELECT *
FROM GRAPH_TABLE ( financial_transactions
       MATCH ALL TRAIL (a IS account) -[e IS transaction]->* (b IS account)
       /* optional ONE ROW PER VERTEX/STEP clause here */
       WHERE a.number = 8021 AND b.number = 1001
       COLUMNS ( LISTAGG(e.amount, ', ') AS amounts )
     )ORDER BY amounts

The preceding query produces the following output:

+----------------------------------------+
| amounts                                |
+----------------------------------------+
| 1500.3                                 |
| 1500.3, 9999.5, 9900.0, 1000.0, 3000.7 |
| 3000.7                                 |
| 3000.7, 9999.5, 9900.0, 1000.0, 1500.3 |
+----------------------------------------+

Example 18-3 Aggregation Query Using KEEP Clause

SELECT *
FROM GRAPH_TABLE ( financial_transactions
       MATCH (a IS Account) -[e IS transaction]->+ (a)
       KEEP SIMPLE PATHS
       WHERE a.number = 10039
       COLUMNS ( LISTAGG(e.amount, ', ') AS amounts_along_path,
                 SUM(e.amount) AS total_amount )
     )
ORDER BY total_amount DESC

The preceding query produces the following output:

+-----------------------------------------------+
| amounts_along_path             | total_amount |
+-----------------------------------------------+
| 1000.0, 3000.7, 9999.5, 9900.0 | 23900.2      |
| 1000.0, 1500.3, 9999.5, 9900.0 | 22399.8      |
+-----------------------------------------------+

18.3.7 Limitations on Quantifiers

Although all quantifiers such as *, +, and {1,4} are supported for reachability and shortest path patterns, the only quantifier that is supported for cheapest path patterns is * (zero or more).

18.3.8 Limitations on WHERE and COST Clauses in Quantified Patterns

The WHERE and COST clauses in quantified patterns, such as reachability patterns or shortest and cheapest path patterns, are limited to referencing a single variable only.

The following are examples of queries that are not supported because the WHERE or COST clauses reference two variables e and x instead of zero or one:

... PATH p AS (n) –[e]-> (m) WHERE e.prop > m.prop ...
... SHORTEST ( (n) (-[e]-> (x) WHERE e.prop + x.prop > 10)* (m) ) ...
... CHEAPEST ( (n) (-[e]-> (x) COST e.prop + x.prop )* (m) ) ...

The following query is supported because the subquery only references a single variable a from the outer scope, while the variable c does not count since it is newly introduced in the subquery:

... PATH p AS (a) -> (b)
      WHERE EXISTS ( SELECT * FROM MATCH (a) -> (c) ) ...