Optimizing unnesting queries with the UNNEST clause

An UNNEST clause is recommended when there is an index on the array(s) or map(s) that are being unnested. The UNNEST clause places some restrictions on the kinds of expressions that it contains. These restrictions can help the query processor in choosing the appropriate index on the unnested arrays/maps, resulting in much better performance.

Note:

See Limitation for expression usage in the UNNEST clause for more details on the restrictions placed by the UNNEST clause. The SQL statements to create tables and load data for using unnest queries is available here.

Example 1: Fetch different shows aired in the US and the number of people watching them

  1. The query is specific to a country (US) and a specific show. So create an index on the country and showid fields of the stream_acct table.
    CREATE INDEX idx_country_showid ON stream_acct( acct_data.country AS string,
                                                    acct_data.contentStreamed[].showId AS integer)
    WITH UNIQUE KEYS PER ROW

    Note:

    The index must be created with the “unique keys per row property” in order for such indexes to be usable by queries that unnest the same array(s)/map(s) as the index.
  2. The query to fetch different shows aired in the US and number of people watching it with the UNNEST clause.
    SELECT $show.showId, count(*) as cnt
    FROM stream_acct $s, unnest($s.acct_data.contentStreamed[] as $show)
    WHERE $s.acct_data.country = "USA"
    GROUP BY $show.showId ORDER BY count(*) desc
    
    {"showId":15,"cnt":2}
    {"showId":16,"cnt":2}
    
    The above query will use the idx_country_showid index. The country condition will be pushed to the index, the group-by will be index-based, and the index is a covering one for this query. An index that contains all required information to resolve the query is known as a Covering Index – it completely covers the query. Covering Index includes all the columns, the query refers to in the SELECT, JOIN, and WHERE clauses. If the UNNEST clause is not used, the index will not be considered.

Example 2: For every show aired by the application, the total watch time for all users:

The following query returns, for each show, the total time users have spent watching the show.
SELECT $show.showId, sum($show.seriesInfo.episodes.minWatched) 
AS total_time FROM stream_acct $s,
unnest($s.acct_data.contentStreamed[] AS $show) 
GROUP BY $show.showId
ORDER BY sum($show.seriesInfo.episodes.minWatched)
Let us examine the effect of creating an index on acct_data.contentStreamed[].showId as the data is been grouped based on showId.

Despite the use of the UNNEST clause, this query cannot use the idx_showid index. This is because of the argument to the sum() function. The idx_showid index contains just the showId (and the primary key). So, the expression $show.seriesInfo.episodes.minWatched cannot be evaluated from the index. There are two ways to optimize this query.

Option 1: Create an additional index:

Create an additional composite index on the showId and minWatched fields as both are used in the query.
CREATE INDEX idx_showid_minWatched ON
      stream_acct(acct_data.contentStreamed[].showId AS integer,
      acct_data.contentStreamed[].seriesInfo[].episodes[].minWatched AS integer,
      acct_data.contentStreamed[].seriesInfo[].episodes[].episodeID as integer)
WITH UNIQUE KEYS PER ROW

Note:

The episodeID must be added in this index, as the last index path, in order for the "unique keys per row" constraint to be satisfied. This index will be used by the query, as a covering index.

Option 2: Avoid the cost of an additional index:

You can rewrite the query to use the idx_showid index.The rewritten query below uses the idx_showid index, but the index is not covering.
SELECT $show.showId, sum($s.acct_data.contentStreamed[$element.showId = $show.showId].
                        seriesInfo.episodes.minWatched) AS total_time
FROM stream_acct $s, unnest($s.acct_data.contentStreamed[] AS $show) GROUP BY $show.showId
ORDER BY sum($s.acct_data.contentStreamed[$element.showId = $show.showId].seriesInfo.episodes.minWatched)

{"showId":26,"total_time":225}
{"showId":16,"total_time":440}
{"showId":15,"total_time":642}

Example 3: The total watch time of users per show and season

SELECT $show.showId, $seriesInfo.seasonNum, 
sum($seriesInfo.episodes.minWatched) AS length
FROM stream_acct n,
unnest(n.acct_data.contentStreamed[] AS $show, $show.seriesInfo[] as $seriesInfo)
GROUP BY $show.showId, $seriesInfo.seasonNum
ORDER BY sum($seriesInfo.episodes.minWatched)

{"showId":26,"seasonNum":2,"length":80}
{"showId":26,"seasonNum":1,"length":145}
{"showId":16,"seasonNum":2,"length":190}
{"showId":16,"seasonNum":1,"length":250}
{"showId":15,"seasonNum":2,"length":295}
{"showId":15,"seasonNum":1,"length":347}
For best performance of the above query, create the following index. The index idx_showid_seasonNum_minWatched is a multi key index. The episodeID must be added in this index, as the last index path, in order for the "unique keys per row" constraint to be satisfied.
CREATE INDEX idx_showid_seasonNum_minWatched ON
        stream_acct(acct_data.contentStreamed[].showId as integer,
        acct_data.contentStreamed[].seriesInfo[].seasonNum as integer,
        acct_data.contentStreamed[].seriesInfo[].episodes[].minWatched as integer,
        acct_data.contentStreamed[].seriesInfo[].episodes[].episodeID as integer)
WITH UNIQUE KEYS PER ROW

Note:

An index is called a multikey index if for each row of data in the table, there are multiple entries created in the index. In a multikey index, there is at least one index path that uses .keys(), .values(), or [] steps. Any such index path will be called a multikey index path.

If you want your query to use an index on the array(s)/maps() that it is unnesting, each path expression in the UNNEST clause must match with the multikey_path_prefix of an index path in the index. As shown in the syntax for the CREATE INDEX Statement the multikey_path_prefix is the part of an index path up to and including the last multikey step. For example, the multikey_path_prefix of the first index path should match the first path expression in the UNNEST clause and so on. If this is not the case, the index will not be used.

In the above query the expression n.value.contentStreamed[] matches the multikey_path_prefix of the first index path in idx_showid_seasonNum_minWatched, and the expression $show.seriesInfo[] matches the multikey_path_prefix of the second index path, after the $show variable is replaced with its domain expression. So this index will be used in the query.