Content starts here Using Inverse Functions to Improve Query Performance
This page last changed on Jul 03, 2008.

edocs Home > BEA AquaLogic Data Services Platform 3.0/3.2/3.01 Documentation

Authors
Mike Carey,
Sachin Thatte,
John Jerney
Versions
ALDSP 3.0, 3.2, 3.01

Using Inverse Functions to Improve Query Performance

Prefer all related topics on a single page? Click here.

This topic introduces the concept of inverse functions, and explains how you can use reverse transformations to improve query performance.

Topic Map

Overview

When designing and implementing data services, one of the principal goals is to provide a set of abstractions that enable client applications to see and manipulate integrated enterprise data in a clean, unified, meaningful, canonical form. Doing so invariably requires transforming data, which can include restructuring and unifying the schemas and the instance-level data formats of the disparate data sources.

In such cases, names may be reformatted, addresses normalized, and differences in units reconciled, among other operations, to provide application developers (the consumers of data services) with a natural and easily manipulable view of the underlying data. Such transformations, while highly useful to the end consumers of the data, can lead to performance challenges when retrieving underlying data.

When the resulting data is queried, it is crucial for performance that much of the query processing (especially for selections and joins) be pushable to the underlying sources, particularly to relational data sources. This requires updates to the transformed view of the data to be translatable back into appropriate source updates. Unfortunately, if data transformations are written in a general-purpose programming language, such as Java, both of these requirements can be difficult. This is because, unlike user-written XQuery functions, such general-purpose functions are opaque to the ALDSP query and update processors.

The Inverse Function Solution 

To solve this issue, ALDSP enables data service developers to register inverse functions with the system, enabling you to define and use general user-defined data transformations without sacrificing query pushdown and updatability. Using this information, ALDSP is able to perform a reverse transformation of the data when analyzing query predicates or attempting to decompose updates into underlying data source updates.

This means that you can use inverse functions to retain the benefits of high-performance data processing for your logical data without giving up application-oriented convenience data functions. In addition, inverse functions enable automated updates without the need to create Java update overrides.

Using inverse functions effectively and correctly requires careful design. In particular, you must ensure that the functions are true inverses of one another, otherwise ALDSP may perform undesired operations on your data. While inverse functions are an intuitive and useful idea, be aware that the details require careful attention.

Understanding Invertible Functions

The thing to keep in mind when creating inverse functions is that the functions you create need to be truly invertible.

For example, in the following case date is converted to a string value:

public static String dateToString(Calendar cal) {
   SimpleDateFormat formatter;
   formatter = new SimpleDateFormat("MM/dd/yyyy hh:mm:ss a");
   return formatter.format(cal.getTime()) ;
}

However, notice that the millisecond value is not in the return string value. You get data back but you have lost an element of precision. By default, all values projected are used for optimistic lock checking, so a loss of precision can lead to a mismatch with the database's original value and thus an update failure.

Instead the above code should have retained millisecond values in its return string value, thus ensuring that the data you return is exactly the same as the original value.

How Inverse Functions Can Improve Performance

Here are some additional scenarios where inverse functions can improve performance, especially when large amounts of data are involved:

  • Type mismatches. A UK employees database stores date of hire as an integer number; the U.S. employees database stores hire dates in a datetime format. You can convert the integer values to datetime, but then searching on hire date would require fetching every record in the database and sorting at the middleware layer. In this situation, you could use inverse functions to take advantage of each database's hire date index.
  • Data Normalization. In order to avoid confusion of UK and U.S. employees, a data service function prepends a country code to the employee IDs of both groups. Again, sorting based on these values will be time consuming since the processing cannot be achieved on the backend without modifying the underlying data. Inverse functions can be used to remove the prepended country code when pushing the operation down to the underlying database.
  • Data Transformation. A logical data service has a fullname operation that concatenates firstname and lastname elements. Any attempt to sort by fullname would be penalized by the required retrieval of information on all customers, followed by local processing of the returned results.
  • Data Conversion. There are many cases where values need to be converted to their inverse based on established formulas. For example there could be a requirement that the application retrieve customers by date as an xs:dataTime rather than as a numeric. In this way users could supply date information in a variety of formats. Consider an example where the data architect creates the following XQuery function:
    declare function tns:getEmpWithFixedHireDate() as element(ns0:usemp)*{
       for $e in ns1:USEMPLOYEES()
          return
             <emp>
                <eid>{fn:data($e1/ID)}</eid>
                <name>{mkName($e1/LNAME, $e1/FNAME)}</name>
                <hiredate>{int2date($e1/HIRED)}</hiredate>
                <salary>)fn:data($e1/SAL)}</salary>
             </emp>
    }

    Given such a function, issuing a filter query on hiredate, on top of this function, results in inefficient execution since every record from the back-end must be retrieved and then processed in the middle tier.

You can use inverse functions in these and other situations to improve performance, especially when processing sizable amounts of data.

A Closer Look

Consider the case of a logical data service that has a fullname operation that concatenates firstname and lastname elements. It is clear that performance would be adversely affected when running the fullname operation against large data sets.

The ideal would be to have a function or functions which decomposed fullname into its indexed components, passes the components to the underlying database, gets the results and reconstitutes the returned results to match the requirements of the fullname operation. In fact, that is the basis of inverse functions.

Of course there are no XQuery functions to magically deconstruct a concatenated string. Instead you need to define, as part of your data service development process, custom functions that inverse engineer fullname.

In many cases complimentary inverse functions are needed. For example, fahrenheitToCentigrade() and centigradeToFahenheit() would be inverses of each other. Complimentary inverse functions are also needed to support fullname.

In addition to creating inverse functions, you also need to identify inverse functions when defining the data service.

Related Topics

Concepts
How Tos
Reference

InverseFunctions.zip (application/x-zip-compressed)
Document generated by Confluence on Jul 03, 2008 12:12