Working With Arrays

You can use slice or filter steps to select elements out of an array. We start with some examples using slice steps.

To select and display the second connection of each person, we use this query:

sql-> SELECT lastname, connections[1]
AS connection FROM Persons;
 +----------+------------+
 | lastname | connection |
 +----------+------------+
 | Scully   |          2 |
 | Smith    |          4 |
 | Morgan   |          2 |
 | Anderson |          2 |
 | Morrison |          2 |
 +----------+------------+

5 rows returned 

In the example, the slice step [1] is applied to the connections array. Since array elements start with 0, 1 selects the second connection value.

You can also use a slice step to select all array elements whose positions are within a range: [low:high], where low and high are expressions to specify the range boundaries. You can omit low and high expressions if you do not require a low or high boundary.

For example, the following query returns the lastname and the first 3 connections of person 5 as strongconnections:

sql-> SELECT lastname, [connections[0:2]]
AS strongconnections FROM Persons WHERE id = 5;
 +----------+-------------------+
 | lastname | strongconnections |
 +----------+-------------------+
 | Scully   | 2                 |
 |          | 4                 |
 |          | 1                 |
 +----------+-------------------+

1 row returned 

In the above query for Person 5, the path expression connections[0:2] returns the person's first 3 connections. Here, the range is [0:2], so 0 is the low expression and 2 is the high. The path expression returns its result as a list of 3 items. The list is converted to an array (a single item) by enclosing the path expression in an array-constructor expression ([]). The array constructor creates a new array containing the three connections. Notice that although the query shell displays the elements of this constructed array vertically, the number of rows returned by this query is 1.

Use of the array constructor in the select clause is optional. If no array constructor is used, an array will still be constructed, but only if the select-clause expression does indeed return more than one item. If exactly one item is returned, the result will contain just that one item. If the expression returns nothing (an empty result), NULL is used as the result. This behavior is illustrated in the next example, which we will run with and without an array constructor.

As mentioned above, you can omit the low or high expression when specifying the range for a slice step. For example the following query specifies a range of [3:] which returns all connections after the third one. Notice that for persons having only 3 connections or less, an empty array is constructed and returned due to the use of the array constructor.

To fully illustrate this behavior, we display this output in mode JSON because the COLUMN mode does not differentiate between a single item and an array containing a single item.

sql-> mode JSON
Query output mode is JSON
sql-> SELECT id, [connections[3:]] AS weakConnections FROM Persons;
{"id":3,"weakConnections":[]}
{"id":4,"weakConnections":[2]}
{"id":2,"weakConnections":[]}
{"id":5,"weakConnections":[3]}
{"id":1,"weakConnections":[]}

5 rows returned 

Now we run the same query, but without the array constructor. Notice how single items are not contained in an array, and for rows with no match, NULL is returned instead of an empty array.

sql-> SELECT id, connections[3:] AS weakConnections FROM Persons;
{"id":2,"weakConnections":null}
{"id":3,"weakConnections":null}
{"id":4,"weakConnections":2}
{"id":5,"weakConnections":3}
{"id":1,"weakConnections":null}

5 rows returned
sql-> mode COLUMN
Query output mode is COLUMN
sql-> 

As a last example of slice steps, the following query returns the last 3 connections of each person. In this query, the slice step is [size($)-3:]. In this expression, the $ is an implicitly declared variable that references the array that the slice step is applied to. In this example, $ references the connections array. The size() built-in function returns the size (number of elements) of the input array. So, in this example, size($) is the size of the current connections array. Finally, size($)-3 computes the third position from the end of the current connections array.

sql-> SELECT id, [connections[size($)-3:]]
AS weakConnections FROM Persons;
 +----+-------------------+
 | id | weakConnections   |
 +----+-------------------+
 |  5 | 4                 |
 |    | 1                 |
 |    | 3                 |
 +----+-------------------+
 |  4 | 5                 |
 |    | 1                 |
 |    | 2                 |
 +----+-------------------+
 |  3 | 1                 |
 |    | 4                 |
 |    | 2                 |
 +----+-------------------+
 |  2 | 1                 |
 |    | 3                 |
 +----+-------------------+
 |  1 | 2                 |
 |    | 3                 |
 +----+-------------------+

5 rows returned 

We now turn our attention to filter steps on arrays. Like slice steps, filter steps also use the square brackets ([]) syntax. However, what goes inside the [] is different. With filter steps there is either nothing inside the [] or a single expression that acts as a condition (returns a boolean result). In the former case, all the elements of the array are selected (the array is "unnested"). In the latter case, the condition is applied to each element in turn, and if the result is true, the element is selected, otherwise it is skipped. For example:

The following query returns the id and connections of persons who are connected to person 4:

sql-> SELECT id, connections
FROM Persons p WHERE p.connections[] =any 4;
 +----+-------------+
 | id | connections |
 +----+-------------+
 |  3 | 1           |
 |    | 4           |
 |    | 2           |
 +----+-------------+
 |  5 | 2           |
 |    | 4           |
 |    | 1           |
 |    | 3           |
 +----+-------------+

2 rows returned 

In the above query, the expression p.connections[] returns all the connections of a person. Then, the =any operator returns true if this sequence of connections contains the number 4.

The following query returns the id and connections of persons who are connected with any person having an id greater than 4:

sql-> SELECT id, connections FROM Persons p
WHERE p.connections[] >any 4;
 +----+-------------+
 | id | connections |
 +----+-------------+
 |  4 | 3           |
 |    | 5           |
 |    | 1           |
 |    | 2           |
 +----+-------------+

1 row returned 

The following query returns, for each person, the person's last name and the phone numbers with area code 339:

sql-> SELECT lastname,
[ p.address.phones[$element.areacode = 339].number ]
AS phoneNumbers FROM Persons p;
 +----------+--------------+
 | lastname | phoneNumbers |
 +----------+--------------+
 | Scully   | 3414578      |
 +----------+--------------+
 | Smith    | 4120211      |
 |          | 8694021      |
 |          | 1205678      |
 +----------+--------------+
 | Morgan   |              |
 +----------+--------------+
 | Anderson | 1684972      |
 +----------+--------------+
 | Morrison |              |
 +----------+--------------+

5 rows returned 

In the above query, the filter step [$element.areacode = 339] is applied to the phones array of each person. The filter step evaluates the condition $element.areacode = 339 on each element of the array. This condition expression uses the implicitly declared variable $element, which references the current element of the array. An empty array is returned for persons that do not have any phone number in the 339 area code. If we wanted to filter out such persons from the result, we would write the following query:

sql-> SELECT lastname,
[ p.address.phones[$element.areacode = 339].number ]
AS phoneNumbers FROM Persons p WHERE p.address.phones.areacode =any 339;
 +----------+--------------+
 | lastname | phoneNumbers |
 +----------+--------------+
 | Scully   | 3414578      |
 +----------+--------------+
 | Smith    | 4120211      |
 |          | 8694021      |
 |          | 1205678      |
 +----------+--------------+
 | Anderson | 1684972      |
 +----------+--------------+

3 rows returned 

The previous query contains the path expression p.address.phones.areacode. In that expression, the field step .areacode is applied to an array field (phones). In this case, the field step is applied to each element of the array in turn. In fact, the path expression is equivalent to p.address.phones[].areacode.

In addition to the implicitly-declared $ and $element variables, the condition inside a filter step can also use the $pos variable (also implicitly declared). $pos references the position within the array of the current element (the element on which the condition is applied). For example, the following query selects the "interesting" connections of each person, where a connection is considered interesting if it is among the 3 strongest connections and connects to a person with an id greater or equal to 4.

sql-> SELECT id, [p.connections[$element >= 4 and $pos < 3]]
AS interestingConnections FROM Persons p;
 +----+------------------------+
 | id | interestingConnections |
 +----+------------------------+
 |  5 | 4                      |
 +----+------------------------+
 |  4 | 5                      |
 +----+------------------------+
 |  3 | 4                      |
 +----+------------------------+
 |  2 |                        |
 +----+------------------------+
 |  1 |                        |
 +----+------------------------+

5 rows returned 

Finally, two arrays can be compared with each other using the usual comparison operators (=, !=, >, >=, >, and >=). For example the following query constructs the array [1,3] and selects persons whose connections array is equal to [1,3].

sql-> SELECT lastname FROM Persons p
WHERE p.connections = [1,3];
 +----------+
 | lastname |
 +----------+
 | Anderson |
 +----------+

1 row returned