Examples: Using Indexes for Query Optimization

Example 9-1 Using Indexes for Query Optimization

SELECT * FROM Users2
WHERE 10 < income AND income < 20;

The query contains 2 index predicates. Indexes idx1, idx2, midx2, and midx3 are all applicable. For index idx1, 10 < income is a start predicate and income < 20 is a stop predicate. For the other indexes, both predicates are filtering predicates. If, say, idx2 were to be used, the subrange to scan is the whole index. Obviously, idx1 is better than the other indexes in this case. Notice however, that the number of table rows retrieved would be the same whether idx1 or idx2 were used. If midx2 or midx3 were used, the number of distinct rows retrieved would be the same as for idx1 and idx2, but a row would be retrieved as many times as the number of elements in the phones array of that row. Such duplicates are eliminated from the final query result set.

Notice that if index idx2 was created WITH NO NULLS, it would not be applicable to this query, because it does not have index predicates for fields address.state and address.city. For example, if Users2 contains a row where address.city is NULL and income is 15, the index would not contain any entry for this row, and as a result, if the index was used, the row would not appear in the result, even though it does qualify. The same is true for indexes midx2 and midx3. On the other hand, even if idx1 was created WITH NO NULLS, it would still be applicable, because it indexes a single field (income) and the query contains 2 start/stop predicates on that field.

Example 9-2 Using Indexes for Query Optimization

SELECT * FROM Users2
WHERE 20 < income OR income < 10;

The query contains 1 index predicate, which is the whole WHERE expression. Indexes idx1, idx2, midx2, midx3 are all applicable. For all of them, the predicate is a filtering predicate.

Example 9-3 Using Indexes for Query Optimization

SELECT * FROM Users2
WHERE 20 < income OR age > 70;

There is no index predicate in this case, because no index has information about user ages.

Example 9-4 Using Indexes for Query Optimization

SELECT * FROM Users2 u
WHERE u.address.state = "CA" 
    AND u.address.city = "San Jose";

Only idx2 is applicable. There are 2 index predicates, both of which serve as both start and stop predicates.

Example 9-5 Using Indexes for Query Optimization

SELECT id, 2*income FROM Users2 u
WHERE u.address.state = "CA" 
    AND u.address.city = "San Jose";

Only idx2 is applicable. There are 2 index predicates, both of which serve as both start and stop predicates. In this case, the id and income information needed in the SELECT clause is available in the index. As a result, the whole query can be answered from the index only, with no access to the table. We say that index idx2 is a covering index for the query in Example 5. The query processor will apply this optimization.

Example 9-6 Using Indexes for Query Optimization

SELECT * FROM Users2 u
WHERE u.address.state = "CA" 
    AND u.address.city = "San Jose" 
    AND u.income > 10;

idx1, idx2, midx2, and midx3 are applicable. For idx2, there are 3 index predicates: the state and city predicates serve as both start and stop predicates; the income predicate is a start predicate. For idx1 only the income predicate is applicable, as a start predicate. For midx2 and midx3, the income predicate is a filtering one.

Example 9-7 Using Indexes for Query Optimization

SELECT * FROM Users2 u
WHERE u.address.state = "CA" 
    AND u.income > 10;

idx1, idx2, midx2, and midx3 are applicable. For idx2, there are 2 index predicates: the state predicate serves as both start and stop predicate; the income predicate is a filtering predicate. The income predicate is a start predicate for idx1 and a filtering predicate for midx2 and midx3.

Example 9-8 Using Indexes for Query Optimization

DELCARE $city STRING;

SELECT * FROM Users2 u
WHERE u.address.state = "CA" 
    AND u.address.city = $city 
    AND (u.income > 50 
        OR (10 < u.income 
            AND u.income < 20));

idx1, idx2, midx2, and midx3 are applicable. For idx2, there are 3 index predicates. The state and city predicates serve as both start and stop predicates. The composite income predicate is a filtering predicate for all the applicable indexes (it's rooted at an OR node).

Example 9-9 Using Indexes for Query Optimization

SELECT u.address.city, SUM(u.expenses.values())
    FROM Users2 u
    WHERE u.address.state = "CA"
    GROUP BY u.address.city
    ORDER BY SUM(u.expenses.values());

In this example, for each city in California, the total amount of user expenditures in that city is returned. The query orders the results by the total amount. Only idx2 is applicable. The state predicate is both a stop and a start predicate. Furthermore, the index is a sorting index, because for any given state it sorts the table rows by the names of the cities in that state and the GROUP BY groups by the cities in CA. As a result, the grouping in this query will be index-based and the ORDER BY will be generic. Notice that if instead of idx2 there were 2 separate indexes, one on states and another on cities, both would be applicable: the first because of the state predicate, and the second because of the grouping. In this case, the query processor would choose the state index in order to reduce the number of rows accessed, at the expense of doing a generic GROUP BY.

Example 9-10 Using Indexes for Query Optimization

SELECT id FROM Users3 u
WHERE EXISTS u.info.income;

In this example we use table Users3, which stores all information about users as json data. The query looks for users who record their income. Index jidx1 is applicable. The EXISTS condition is actually converted to 2 index start/stop conditions: u.info.income < EMPTY and u.info.income > EMPTY. As a result, two range scans are performed on the index.

Example 9-11 Using Indexes for Query Optimization

SELECT * FROM users2 u
    WHERE (u.address.state, u.address.city) IN
        (("CA","San Jose"), ("MA","Boston"))

In this example, the idx2 index will be used. Two scans will be performed on the index: one for entries whose state and city fields are "CA" and "San Jose", respectively, and another for entries whose state and city fields are "MA" and "Boston", respectively.

Example 9-12 Using Indexes for Query Optimization

SELECT * FROM users2 u
    WHERE u.address.state in ("CA", "MA") AND
        u.address.city in ("San Jose","Boston")

In this example, the idx2 index will be used. Four scans will be performed on the index. The search keys for these scans are determined by the cartesian product of the keys in the right-hand-side of the two IN operators: ("CA", "San Jose"), ("CA", "Boston"), ("MA, "San Jose"), and ("MA", "Boston").

Example 9-13 Using Indexes for Query Optimization

SELECT * FROM users2 u
    WHERE (u.address.state, u.income) IN
        (("CA", 10000), ("MA", 20000))

In this example, the idx2 index will be used. Two scans will be performed on the index: one for entries whose state field is "CA", and another for entries whose state field is "MA". Furthermore, the whole IN condition will be used as a filtering predicate on the entries returned by the two scans.

As the above examples indicate, a predicate will be used as a start/stop predicate for an index IDX only if:
  • It is of the form <path expr> op <const expr>, or <const expr> op <path expr>, or (<path expr1>, … <path exprN>) IN (<const exprs>)
  • op is a comparison operator (EXISTS, NOT EXISTS, IS NULL and IS NOT NULL are converted to predicates of this form, as shown in Q9).
  • <const expr> is an expression built from literals and external variables only (does not reference any tables or internal variables)
  • <path expr> is a path expression that is "matches" an index path P appearing in the CREATE INDEX statement for IDX. So far we have seen examples of exact matches only. In the examples below we will see some non-exact matches as well.
  • If P is not IDX's 1st index path, there are equality start/stop predicates for each index path appearing before P in IDX's definition.
  • The comparison operator may be one of the "any" operators. Such operators are matched against the multi-key index paths of multi-key indexes. As shown in the examples below, additional restrictions apply for such predicates.

Example 9-14 Using Indexes for Query Optimization

SELECT * FROM users2 u
WHERE u.connections[] = any 10;

midx1 is applicable and the predicate is both a start and a stop predicate.

Example 9-15 Using Indexes for Query Optimization

SELECT * FROM users2 u
WHERE u.connections[0:4] = any 10;

midx1 is applicable. The predicate to push down to mdx1 is u.connections[] =any 10, in order to eliminate users who are not connected at all with user 10. However, the original predicate (u.connections[0:4] =any 10) must be retained in the query to eliminate users who do have a connection with user 10, but not among their 5 strongest connections. This is an example where the query path expression does not match exactly the corresponding index path.

Example 9-16 Using Indexes for Query Optimization

SELECT * FROM users2 u
WHERE u.connections[] > any 10;

midx1 is applicable and the predicate is a start predicate.

Example 9-17 Using Indexes for Query Optimization

SELECT id FROM users2 u
WHERE 10 < any u.connections[] 
    AND u.connections[] < any 100 ;

midx1 is applicable, but although each predicate by itself is an index predicate, only one of them can actually be used as such. To see why, first notice that the query asks for users that have a connection with id greater than 10 and another connection (which may or may not be the same as the 1st one) with id less than 100. Next, consider a Users2 table with only 2 users (say with ids 200 and 500) having the following connections arrays respectively: [ 1, 3, 110, 120 ] and [1, 50, 130]. Both of these arrays satisfy the predicates in the query, and both users should be returned as a result. Now, consider midx1; it contains the following 7 entries:

[1, 200], [1, 500], [3, 200], [50, 500], [110, 200], [120, 200], [130, 500]

By using only the 1st predicate as a start predicate to scan the index, and applying the 2nd predicate on the rows returned by the index scan, the result of the query is 500, 200, which is correct. If on the other hand both predicates were used for the index scan, only entry [50, 500] would qualify, and the query would return only user 500.

Example 9-18 Using Indexes for Query Optimization

To search for users who have a connection in the range between 10 and 100, the following query can be used:

SELECT id FROM users2 u
WHERE exist u.connections
    [10 < $element AND $element < 100];

Assuming the same 2 users as in Example 13, the result of this query is user 500 only and both predicates can be used as index predicates (start and stop), because both predicates apply to the same array element. The query processor will indeed push both predicates to mdx1.

Example 9-19 Using Indexes for Query Optimization

SELECT * FROM Users2 u
WHERE u.address.phones.area = any 650 
    AND u.address.phones.kind = any "work" 
    AND u.income > 10;

This query looks for users whose income is greater than 10, and have a phone number with area code 650, and also have a work phone number (whose area code may not be 650). Index midx3 is applicable, but the address.phones.kind predicate cannot be used as an index predicate (for the same reason as in Example 13). Only the area code predicate can be used as a start/stop predicate and the income predicate as a filtering one. Indexes idx1, idx2, and midx2 are also applicable in Example 15.

Example 9-20 Using Indexes for Query Optimization

SELECT * FROM Users2 u
WHERE u.expenses.housing = 10000;

idx4 is applicable and the predicate is both a start and a stop predicate. midx4 is also applicable. To use midx4, two predicates must be pushed to it, even though only one appears in the query. The 1st predicate is on the "keys" index field and the second on the "values" field. Specifically, the predicates key = "price" and value = 10000 are pushed as start/stop predicates. This is another example where the match between the query path expression and an index path is not exact: we match expenses.housing with the expenses.values() index path, and additionally, generate an index predicate for the properties.keys() index path.

Example 9-21 Using Indexes for Query Optimization

SELECT * FROM Users2 u
WHERE u.expenses.travel = 1000 
    AND u.expenses.clothes > 500;

midx4 is applicable. Each of the query predicates is by itself an index predicate and can be pushed to midx4 the same way as the expenses.housing predicate in the previous example. However, the query predicates cannot be both pushed (at least not in the current implementation). The query processor has to choose one of them to push and the other will remain in the query. Because the expenses.travel predicate is an equality one, it's more selective than the greater-than predicate and the query processor will use that.