Predicate Pushdown

To improve query performance, Apache Hive supports a mechanism referred to as predicate pushdown; in which the client side frontend processing of a given query decomposes the WHERE clause (the predicate) of the query into column information and corresponding comparison operators, passing (pushing) the resulting components to the database where search processing (filtering) can be performed on the database's server side backend. To achieve analogous performance improvements, the Oracle NoSQL Database table API Hive integration classes support similar predicate pushdown functionality when executing Hive or Big Data SQL queries against data in an Oracle NoSQL Database table. For example, consider the following query executed against the example vehicleTable whose schema was described previously in this document:

SELECT * FROM vehicleTable WHERE \
    type = 'auto' AND make = 'Chrysler' AND \
    model >= 'Imperial' AND model < 'Sebring';

This query will return all rows corresponding to automobiles (rather than trucks or SUVs) made by Chrysler; whose model is 'Imperial', 'Lebaron', or 'PTCruiser', but not 'Sebring'. If predicate pushdown is not employed when executing this query, then all rows from the vehicleTable will be retrieved from the store's backend database and returned to the frontend client, where the predicate information will be applied to search the returned rows for the desired matches. On the other hand, if predicate pushdown is employed, then the information in the WHERE clause is sent to the store and all filtering is performed in the database itself, so that only the rows of the table that match the predicate are returned to the client. The predicate pushdown, when it can be employed, can result in significant performance improvements.

As the examples presented in this document demonstrate, the variety of predicates that can be employed when querying a table can be virtually unlimited. So it is important to understand that the predicates that can actually be pushed to the backend Oracle NoSQL Database store are restricted to a finite subset of all possible predicates. This is because the predicates that can be supported by Oracle NoSQL Database are not only dependent on what the Hive predicate pushdown mechanism supports, but the semantics of the Oracle NoSQL Database table API as well. As a result, the operators that are supported by the predicate pushdown mechanism of the table API Hive integration classes are currently limited to:

=   <   <=   >   >=   AND   OR   IN

In addition to the above set of operators, the semantics of the table API can also affect how the table's fields (columns) will be handled during predicate pushdown. Specifically, for a given query's predicate, if a valid primary key, index key, and/or field range (as defined by the table API) cannot be formed from all or a subset of that predicate's fields, and no part of the predicate can be pushed to the server using the filter mechanism provided by SQL for Oracle NoSQL Database (see Filtering Results in the SQL Beginner's Guide), then the query's predicate will not be decomposed and sent to the database for backend filtering. Instead, the system will fallback to the default mechanism, and perform all filtering on the client side, applying the predicate to all the rows in the given table.

For example, consider the query presented above. For that query, each component of the predicate satisfies the necessary criteria for pushdown, and so the whole predicate will be pushed to the database for search processing. To understand this, first observe that the operators referenced in the query's predicate belong to the set described above; that is, '=', 'AND', '>=', '<'.

Next, based on the schema of the vehicleTable, the fields named type and make form a valid primary key for performing a table scan; and the predicate components referencing the field named model form a valid field range. Compare this with a query such as,

SELECT * FROM vehicleTable WHERE make = 'Chrysler' AND \   
    model >= 'Imperial' AND model < 'Sebring';

Assuming there is no index of the form (make, model), for this query, although the absence of the key's first field prevents the construction of a valid primary key as required by the semantics of the table API, the predicate can still be pushed to the backend store because it is considered valid for filtering by the SQL For Oracle NoSQL Database filtering mechanism. Finally, consider a query such as,

SELECT * FROM vehicleTable WHERE model LIKE "%Silverado%";

For this query, predicate pushdown will be bypassed and all filtering will be performed on the client side. This is because the predicate employs the LIKE operator, which is not currently eligible for predicate pushdown.

Note that the initial two example queries that were presented both result in the whole predicate being pushed and all filtering being performed on the backend. Whereas the third example query results in no predicate pushdown and all filtering being performed on the client side.

This does not mean that predicate pushdown will always be handled in such an all-or-nothing manner. On the contrary, for many queries, only part of the predicate will be pushed to the database to produce initial filtered results, which are then further filtered on the client side using the remaining - residual - part of the predicate.

For example, consider a query that wishes to find each '4WheelDrive' vehicle in the database that is 'blue', 'red', or 'yellow', and has a model name that begins with the letter 'E' (that is, Equinox, Expedition, Explorer, etc.). Such a query would look like the following:

SELECT * FROM vehicleTable WHERE \
    class = '4WheelDrive' AND \
    color IN ('blue','red','yellow') AND \
    model LIKE '%E%';

Based on the criteria presented in the next section, the only component of the query's predicate that cannot be pushed to the backend is the component that employs the LIKE operator (model LIKE '%E%'), whereas all other components in the query are eligible to be pushed. Thus, when executing the given query, the part of the predicate consisting of the components "class = '4WheelDrive' AND color IN ('blue', 'red', 'yellow')" will be pushed to the backend, producing rows referencing all four wheel drive vehicles that are blue, red, or yellow; after which the client will apply the residual predicate (model LIKE '%E%') to the results from the backend, to select and return only those rows with model name including an uppercase 'E'.