3 XQuery Engine and SQL

This chapter provides an overview of how Oracle Data Service Integrator works with relational data, and describes what happens when a relational data source is imported into Oracle Data Service Integrator.

The chapter also explains how SQL data types are mapped to XQuery data types and describes what happens during runtime after deploying a data-service-enabled application. The chapter further explains how queries are handled and describes the kind of performance you can expect.

This chapter covers the following topics:

Note that while the graphical-user interface tools handle many of the details, SQL developers and application-performance tuning experts should understand how Oracle Data Service Integrator works with relational data so that they can:

3.1 Introduction

At the core of Oracle Data Service Integrator is the data processing engine, often referred to as simply the XQuery engine—the robust, enterprise-class implementation of the XQuery language based on the standards listed in Section 1.3, "Supported XQuery Specifications," with additional enhancements as detailed in Chapter 2, "Oracle's XQuery Implementation."

In addition to compliance with XQuery and XML recommendations, Oracle Data Service Integrator XQuery engine also complies with the ANSI/ISO standard that bridges the SQL and XML worlds (the "SQL/XML (ISO-ANSI Working Draft) XML-Related Specifications" WD 9075-14 (SQL/XML), August, 2002).

As a Java application (J2EE server application), Oracle Data Service Integrator uses JDBC to generate SQL queries and submit them to the appropriate RDBMSs that comprise a data service, which means Oracle Data Service Integrator must accommodate differences in both SQL and JDBC, as follows:

  • SQL Language. The SQL standard has evolved over time, and vendor implementations (in their respective RDBMS products) may be at any number of stages of compliance with the standard (SQL-89, SQL-92, SQL:1999, and SQL:2003, for example). Furthermore, vendors implement various extensions to SQL in their respective RDBMS products. In short, Oracle Data Service Integrator's support for SQL is not a "one-size-fits-all" exercise: achieving optimal integration with relational data sources requires Oracle Data Service Integrator to generate vendor-specific SQL code at times.

  • JDBC API. Drivers are provided by RDBMS vendors as well as third-parties; various drivers for each RDBMS can have different levels of JDBC compatibility.

Given these factors, Oracle Data Service Integrator provides two different levels of SQL support for relational database management systems (RDBMS): base support and core support, as defined in the next section.

3.1.1 Base and Core RDBMS Support

Oracle Data Service Integrator provides two different levels of support for relational data sources:

  • Base support. Oracle Data Service Integrator generates standard SQL code that is minimally required to be supported by any SQL RDBMS. Some examples of base platforms would include Oracle 7, Informix, IDMS, and MySQL.

  • Core support. Oracle Data Service Integrator supports the native SQL dialect of specific versions of several leading commercial RDBMSs using the RDBMS-specific-JDBC of the vendor's JDBC driver or Oracle's JDBC driver (see Table 3-1).  

Table 3-1 Core Oracle Data Service Integrator RDBMS Support

RDBMS and Versions Vendor Driver Oracle WebLogic Driver

IBM DB2 9.7, 10.1

IBM DB2 JDBC driver

WebLogic JDBC Driver for DB2 Type 4

Microsoft SQL Server 2008 R2, 2012

Microsoft JDBC driver 4.0 for SQL Server

WebLogic JDBC Driver for Microsoft SQL Server Type 4

Oracle 11.x, 12.x

Oracle JDBC Thin driver, version 12.1

WebLogic JDBC Driver Type 4

Sybase Adaptive Server Enterprise 15.7

Sybase jConnect driver, version 6.05

WebLogic JDBC Driver Type 4

Teradata 13.x, 14.x

Teradata JDBC driver

N/A


3.1.2 How the XQuery Engine Supports SQL Data Sources

Oracle Data Service Integrator supports SQL (relational) data sources throughout the life-cycle of a data services project, from metadata import, through query plan optimization, through runtime execution of queries and delivery of data to an end-user (or other) application. Specifically, the XQuery engine provides:

  • Metadata Mapping. Importing metadata from relational data sources is the first step in creating a data service.

  • Data Type Mapping. Upon import of metadata, Oracle Data Service Integrator maps data types from the RDBMS data source into XQuery atomic data types, disregarding length and other constraints. If the data source tables or views include unsupported data types — an array, for example — the column is ignored (the GUI tool alerts the person performing the import if this issue arises, and enables the person to map the data type of the source table or view to a specific XQuery data type).

  • Query Optimization. The XQuery processing engine is fast and efficient, and uses several optimizing strategies, including:

  • SQL pushdown. As much as possible, processing is shifted from the XQuery engine to the native RDBMS so that smallest practical result set is actually processed by the XQuery engine.

  • Lazy evaluation. Queries are executed against the physical data sources only as far as necessary to obtain results.

  • Connection-sharing. Multiple active queries can run over a single connection (assuming the data source RDBMS allows; see Table 3-2).

3.1.2.1 Metadata and Data Type Mappings Get Stored in Annotated Files

For each of the tables and views whose metadata is imported into Oracle Data Service Integrator (using Import Source Metadata feature of the GUI), two files are generated:

  • Entity data service (.ds) file that defines the main access function (an external XQuery function with annotations that specify the RDBMS catalog or schema name and other properties) to access to the table or view data and return a sequence of elements corresponding to the rows of the underlying table. The .ds file includes numerous annotations to handle metadata about the data service, including:

  • Relational provider identifier.

  • Table structure information, including column names (field names), SQL data types and corresponding XQuery data types, primary key, and foreign key information.

  • Relationship functions that provide access to related tables or views.

  • Relationship annotations.

  • JNDI lookup information. The <relationalDB> annotation in the data service file provides the JNDI name that will be used at runtime to obtain a connection to the data source and execute queries.

  • XML Schema definition (.xsd) file that includes information about all the columns of the table (or view) and the data types for those columns, as mapped into the XQuery data types.

3.1.2.2 Runtime Connection Management—Connection Sharing

At runtime, the XQuery engine:

  • Obtains a connection to the RDBMS.

  • Prepares SQL statements, setting up parameters if necessary.

  • Executes the SQL statements and releases the connection.

  • Handles errors and exceptions.

  • Translates the result of the query to the XML model used by XQuery engine.

Database connections (connection pools) are registered in the JNDI (Java naming and directory interface) tree of the WebLogic Server (an administrator with privileges on the server can configure connection pool, data source, and JNDI name by which connection pools are accessible).

When sub-plan execution completes, connections are typically not released back to the WebLogic Server. The XQuery engine holds the connection for the duration of the entire XQuery — not just the duration of the SQL — enabling subsequent queries to the same relational data source to be executed using an already obtained connection (which also improves performance). Whether the XQuery engine can share connections or not depends on the underlying data source and JDBC driver (see Table 3-2).

If the data source RDBMS or JDBC driver does not support connection sharing, and if the Oracle Data Service Integrator has opened multiple connections to the same data source, the XQuery engine keeps the initial connection to a data source open during XQuery execution but releases any subsequent connections to the same data source once the SQL result is received in its entirety by the XQuery engine. The initial connection will be re-used subsequent SQL queries when the connection becomes available.

Table 3-2 Runtime Connection Management

RDBMS Support

Base RDBMS

No connection sharing.

IBM DB2 10.1, 9.7

Microsoft SQL Server 2008 R2, 2012

Oracle 11.x, 12.x

Sybase Adaptive Server Enterprise 15.7

Single shared connection for each JNDI data source; each connection supports multiple active SQL queries.

Teradata 14.x, 13.x

No connection sharing. Each access requires dedicated connection.


3.2 XQuery-SQL Data Type Mappings

XQuery-SQL data type mappings are specific to the RDBMS version and the JDBC driver, as discussed in Section 3.1.1, "Base and Core RDBMS Support." The specific data type mappings for each core RDBMS and the general mappings for any base RDBMS are detailed in Appendix A. However, XQuery and SQL differ in some respects that may affect XQuery-to-SQL translation; these differences apply to all RDBMSs:

3.2.1 Date and Time Data Type Differences: Timezones and Time Precision

The XQuery language defines richer data types than SQL for handling date and time information (temporal data). These data types provide more information (timezone data, for instance) or greater degree of precision (unlimited number of fractional seconds as part of a time or date, for example). The three built-in XQuery data types for data and time information are:

  • xs:dateTime

  • xs:date

  • xs:time

Minimally, every RDBMS has a single datatype that conveys both date and time data. This datatype maps to XQuery's xs:dateTime data type. Some RDBMSs offer additional SQL data types for storing date and time data separately (see Table 3-3)

(Of all the RDBMSs supported by Oracle Data Service Integrator, only Oracle 11.x (and higher) offers data types with timezone data (TIMESTAMP WITH TIMEZONE, TIMESTAMP WITH LOCAL TIMEZONE).

Table 3-3 Temporal Data Type Mappings

Database xs:date xs:dateTime xs:time

Base RDBMS

Reported by JDBC driver for the specific RDBMS

Reported by JDBC driver for the specific RDBMS

Reported by JDBC driver for the specific RDBMS

IBM DB2/NT 8

DATE

TIMESTAMP

TIME

Microsoft SQL Server 2000

N/A

DATETIMEFoot 1 , SMALLDATETIMEFoot 2 

N/A

Oracle 8.1.x

N/A

DATEFoot 3 

N/A

Oracle 9.x, 10.x

N/A

DATE, TIMESTAMP, TIMESTAMP WITH LOCAL TIMEZONE, TIMESTAMPWITH TIMEZONE

N/A

Sybase Adaptive Server Enterprise 12.5.2 (and higher)

DATE

DATETIME, SMALLDATETIME

TIME


Footnote 1 Supports fractional seconds up to 3 digits (miliseconds).

Footnote 2 Accuracy of 1 minute.

Footnote 3 Provides both date and time data, but supports neither fractional seconds nor timezone data (fractional-second data is truncated).

Oracle Data Service Integrator XQuery engine maps all SQL date and time data types to XQuery data types (for example, during metadata import of a new data source) without loss of data or precision.

However, the converse is not true: depending on the specific RDBMS (and JDBC driver) for a specific data source, the XQuery engine may need to perform additional processing to minimize data loss and to handle the timezone information when mapping XQuery temporal data types to SQL.

3.2.2 How Oracle Data Service Integrator Handles Timezone Information

When a query is being pushed down to an RDBMS that does not support timezone data, the Oracle Data Service Integrator XQuery engine converts date and time data into the local time of the underlying application server and removes the timezone information. The conversion occurs each time a date or time value that includes timezone data is sent to the data source, as follows:

  • During compile time, when SQL is generated for constant date or time expressions.

  • During query run time, when executing parameterized SQL with parameters bound to date/time values.

  • During update, when a date or time value must be stored in the RDBMS.

3.2.3 How Oracle Data Service Integrator Handles Fractional Seconds

The XQuery language supports unlimited precision for fractional seconds, while the Oracle Data Service Integrator XQuery engine supports up to 7 digits only (for fractional seconds). However, depending on the specific RDBMS, fractional second support may be far less than 7 digits. In translating from XQuery to SQL, Oracle Data Service Integrator truncates fractional seconds to the precision supported by that RDBMS.

For example, since Microsoft's DATETIME data type supports up to 3 digits (milliseconds) for fractional time precision, when Oracle Data Service Integrator sends a datetime value to Microsoft SQL Server 2008, the value is first converted into the local time zone and then any fractional seconds are converted to the 3-digit-milliseconds allowed.

If fractional-second-precision is required (but the data source does not support it appropriately), use the fn-bea:fence() function to disable pushdown of date and time data types and operations, so that the XQuery engine processes the time- and date-related queries. (See Section 3.4, "Preventing SQL Pushdown" for more information.)

See Appendix A for more information about time and date data types for core and base RDBMS.

3.2.4 Scope Differences for Expressions and Data Types

The XQuery language is less restrictive than the SQL language in terms of the scope of expressions and data types. For example, for most all RDBMSs, an SQL query that returns a boolean can only be used inside a WHERE clause. XQuery does not have such restrictions, and as a result, in some cases, valid XQuery expressions cannot be pushed down. Expressions and data types that cannot be pushed include:

  • Expressions returning boolean type can only be used in the WHERE clause (all RDBMSs)

  • Some data types, such as CLOB, can be returned in the project list but cannot be grouped on or sorted on (depending on the RDBMS' SQL dialect; see Appendix A for details).

  • Aggregate functions inside an ordering expression, such as in ORDER BY clauses, are not pushed down for any base RDBMS (but is supported by all other RDBMSs). See Appendix A for more information.

3.3 SQL Pushdown: Performance Optimization

Oracle Data Service Integrator achieves optimal performance for queries by performing SQL pushdown. Pushdown is an optimization technique that offloads processing from the XQuery engine by sending native SQL queries to the data source so that minimal result sets necessary to answer the query get processed by the XQuery engine.

SQL pushdown reduces the amount of data transported and processed by Oracle Data Service Integrator XQuery processing engine. This technique dramatically improves overall performance, especially when joining tables.

For example, a JOIN operation on two tables can be done by the underlying RDBMS, returning only the final result, rather than delivering all the data to the XQuery engine for processing the JOIN condition. Sorting criteria are also handled by the data source, eliminating the need to re-sort the data inside the XQuery engine.

For all core RDBMSs, the XQuery engine identifies the XQuery constructs and operations that can be translated into equivalent SQL operations. These include:

  • Basic language constructs, including constants, variables, path expressions, functions and operators, and cast operations.

  • Common query patterns, such as selections and projections (where clauses), joins (inner, outer, semi-join, anti-semi-join), ordering clauses, groupings and aggregations.

Not all queries can (or should) get pushed down. The XQuery engine does not pushdown:

  • Cross-joins. Any join without a condition (any join that results in a Cartesian product)

  • Expressions tagged with the fn-bea:fence() function.

The remainder of this section covers SQL pushdown in more detail, providing syntax samples based on the table structures shown in Figure 3-1. (For ease of reading, namespace references are not shown in the example queries.) In some cases, the query may not get pushed down as SQL, but the fragments of the query — names of columns, for example — may get pushed to the project list.

Figure 3-1 Table Structures for SQL Pushdown Examples

Table Structures for SQL Pushdown Examples
Description of "Figure 3-1 Table Structures for SQL Pushdown Examples"

3.3.1 Function and Operator Pushdown

XQuery functions and operators are translated into SQL only when:

  • all arguments can be pushed down directly (or as parameters)

  • at least one of the argument expressions uses a value from the relational data source

  • the XQuery function or operator has an equivalent SQL expression with equivalent semantics

  • data type of the result is supported

Table 3-4 shows an XQuery statement and its corresponding "pushdown" or SQL translation. (Oracle syntax is used.)

Table 3-4 Function Pushdown Example

XQuery Statement SQL Translation (Oracle Syntax)
for $c in CUSTOMER()
return lower-case($c/LAST_NAME)
SELECT LOWER(t1."LAST_NAME") AS c1
FROM "CUSTOMER" t1 

If some arguments to a function or operator are not directly pushable, but can be replaced with parameters, the XQuery engine will replace the arguments with parameters and pushdown the SQL. For example, since the XQuery's string-join() function has no explicit SQL equivalent, it is replaced with a parameter (see Table 3-5).

Table 3-5 External Variable Pushdown

XQuery Statement SQL Statement
declare variable $p as xs:string external;
...
for $c in CUSTOMER()
where starts-with($c/LAST_NAME, string-join( ("a", "b"), $p ))
return $c/FIRST_NAME
SELECT t1."FIRST_NAME" AS c1
FROM "CUSTOMER" t1 
WHERE t1."LAST_NAME" LIKE ?

Aggregate Functions

Oracle Data Service Integrator translates XQuery 1.0 and XPath 2.0 aggregate functions into corresponding SQL aggregate functions (Table 3-6).

Table 3-6 Aggregate Functions

XQuery Aggregate Function SQL Aggregate Function
fn:avg()
AVG()
fn:count()
COUNT()
fn:max()
MAX()
fn:min()
MIN()
fn:sum()
SUM()
fn:count(fn:distinct-values()
COUNT(DISTINCT …) 

Note that the distinct-values() XQuery aggregate function in conjunction with the fn:count() function is further translated into an SQL COUNT(DISTINCT...) operation, as shown in Table 3-7. See Section 3.3.8, "Grouping and Aggregation" for some examples of how aggregate functions in conjunction with other expressions affect the outcome of SQL pushdown.

3.3.2 Parameters in Generated SQL Statements

The Oracle Data Service Integrator XQuery engine generates parameters from variables, functions, operators, and cast operations as needed for use by the SQL engine. If all arguments to a function are parameters, the entire function gets pushed as a parameter.

The functions that can be pushed down depend on the database. See the Appendix A for details.

3.3.3 Cast Operation Pushdown

As with functions and operators, support for cast operation pushdown is RDBMS-specific, although cast pushdown is available only for core (not base) RDBMSs. The XQuery engine can pushdown cast operations if the data source RDBMS:

  • has equivalent SQL data types for both source and target of the cast XQuery data types (see Chapter 3, "XQuery Engine and SQL" for details).

  • has a semantically equivalent SQL operation to convert from source data type to target data type.

Table 3-7 shows an example of how a cast in XQuery would get pushed down to a Microsoft SQL Server 2000 data source.

Table 3-7 Cast Operation Pushdown

XQuery Statement SQL Statement (Microsoft SQL Server 2000 Syntax)
for $c in CUSTOMER() 
where xs:string($c/ZIP_CODE) eq "95131"
return $c/CUSTOMER_ID
SELECT t1."CUSTOMER_ID" AS c1 
FROM "CUSTOMER" t1
WHERE CAST(t1."ZIP_CODE" AS VARCHAR) = '95131'

3.3.4 Path Expressions Pushdown

The XQuery engine maps table columns to XML elements that are children of the corresponding row elements. Simple XQuery path expressions are recognized by the XQuery engine as column accessors. For example, $c/ZIP_CODE and $c/LAST_NAME (see Table 3-9) provide access to ZIP_CODE and LAST_NAME columns.

3.3.5 Constant Pushdown

The Oracle Data Service Integrator XQuery engine translates XQuery constants into SQL constants only if the data source has an equivalent SQL data type. Table 3-8 shows an example of a constant used in a FLWOR expression and how that constant gets translated in the SQL statement.

Table 3-8 SQL Pushdown for Constants

XQuery Statement SQL Statement
for $c in CUSTOMER() 
where $c/ZIP_CODE eq 95131
return $c/LAST_NAME
SELECT t1."LAST_NAME" AS c1
FROM "CUSTOMER" t1 
WHERE t1."ZIP_CODE" = 95131

3.3.6 Variable Pushdown

Both external and internal variables in XQuery expressions can be translated into SQL parameters (in generated SQL statements) when the variable's datatype is supported by the XQuery engine and:

  • is atomic (static data type).

  • can be translated into equivalent SQL type.

Table 3-9 shows an example of variable pushdown.

Table 3-9 Variable Pushdown

XQuery Statement SQL Statement
declare variable $extVar
as xs:string external;
for $c in CUSTOMER()
where $c/CUSTOMER_ID eq $extVar
return $c/LAST_NAME
SELECT t1."LAST_NAME" as c1
FROM "CUSTOMER" t1
WHERE t1."CUSTOMER_ID" = ?

3.3.7 Common Query Patterns

For each relational data source, the precise set of expressions pushed down depends on the capabilities of the underlying RDBMS; for details, see Chapter 3, "XQuery Engine and SQL."

3.3.7.1 Simple Projection Queries

Each of the example XQueries shown in Table 3-10 returns elements containing values of LAST_NAME columns from a CUSTOMER table. In all cases, the SQL statement generated by the XQuery engine is the same (see Table 3-10).

Table 3-10 Projection Query

XQuery Statements SQL Statement
for $c in CUSTOMER() return $c/LAST_NAME
CUSTOMER()/LAST_NAME
for $c in CUSTOMER() return data($c/LAST_NAME)
data(CUSTOMER()/LAST_NAME)
SELECT t1."LAST_NAME" AS 
c1 FROM "CUSTOMER" t1

The difference between the first two queries and the last two queries is that the fn:data() function is used in the query to limit the results to values only. Without the fn:data() function, the result is a list of <LAST_NAME> elements containing corresponding column values. If a column value is NULL, the element is skipped. With the fn:data() function, the result is the actual values.

3.3.7.2 Where Clause Pushdown

An XQuery where clause is usually translated into an SQL WHERE clause. An XQuery where clause gets pushed down as SQL when:

Table 3-11 shows an example of a where clause pushdown.

Table 3-11 Where Clause Pushdown

XQuery Statements SQL Statements
for $c in CUSTOMER() 
where $c/CUSTOMER_ID eq "CUSTOMER01"
return $c/LAST_NAME
SELECT t1."LAST_NAME" AS c1 
FROM "CUSTOMER" t1 
WHERE t1."CUSTOMER_ID" = 'CUSTOMER01'
for $c in CUSTOMER()
where year-from-dateTime($c/BIRTH_DAY) 
eq 
  year-from-date(current-date())
return 
  $c/LAST_NAME
(DB2 syntax)
SELECT t1."LAST_NAME" AS c1
FROM "CUSTOMER" t1
WHERE 
  YEAR(t1."BIRTH_DAY") = ?

However, note that if the WHERE clause follows a group by clause, the WHERE clause is translated into a HAVING clause. See Section 3.3.8.4, "Group-By with a Nested Where Clause Translates to SQL HAVING Clause").

3.3.7.3 Order By Clause Pushdown

An XQuery order by expression comprises:

  • ordering expression

  • direction property for each ordering expression; that is, ascending or descending

  • empty ordering property for each ordering expression; that is, empty least or empty greatest

The XQuery engine can pushdown SQL for ordering expressions, including properties, only when the ordering expression:

  • is pushable and uses data from the database.

  • is of the kind supported by the underlying data source (some RDBMSs can only support order by columns, not arbitrary expressions; some RDBMSs support non-column expressions in order by clause only if they do not contain aggregate functions.

  • when an empty expression can result in empty sequence, the RDBMS must support the same NULL order as the empty order specified by the XQuery. (Some RDBMSs have fixed NULL order, some allow NULL order to be specified—see Chapter 3, "XQuery Engine and SQL" for details).

Table 3-12 shows an example of an order by clause pushdown.

Table 3-12 Order By Pushdown

XQuery Statement SQL Statement
for $c in CUSTOMER() 
 order by $c/CUSTOMER_ID  descending
return $c/CUSTOMER_ID
SELECT t1."CUSTOMER_ID" AS c1 
FROM "CUSTOMER" t1 
ORDER BY t1."CUSTOMER_ID" DESC

Table 3-13 shows an example of the SQL pushdown that occurs when ordering by a NULLable column (ADDRESS2) in the XQuery clause when the RDBMS supports dynamic setting of NULL order.

Table 3-13 Order By Query When Setting NULL Order Dynamically

XQuery Statement SQL Statement (Oracle Syntax)
for $c in CUSTOMER()
order by $c/ADDRESS2 ascending
      empty greatest
return $c/CUSTOMER_ID, $c/ADDRESS2
SELECT t1."CUSTOMER_ID" AS c1,
       t1."ADDRESS2"    AS c2
FROM "CUSTOMER" t1 
ORDER BY t1."ADDRESS2" ASC NULLS LAST

If the data source RDBMS does not support the required empty (NULL) order, the order by will not be pushed down.

As another optimization, the Oracle Data Service Integrator XQuery engine can insert order by clauses into generated SQL statements—even when the original XQuery statement does not include them—to offload expensive sorting operations to the RDBMS. They are automatically inserted by the XQuery optimizer prior to execution. You can see these as well in the Query Plan View.

3.3.7.4 Inner Join Pushdown

Joining data from multiple sources is a very common data integration task. In SQL terms, an inner join relates each row in one table (or view) to one or more corresponding rows in another table or view. In XQuery, an inner join is expressed as a FLWR expression comprising several for clauses that iterate over the data sources, where clauses that specify the join predicates, and a return clause returning data values.

If two relational sources are located in the same database, the inner join can sometimes be pushed down as a single SQL statement using either SQL-92 or SQL-89 syntax, depending on the RDBMS of the data source.

An inner join can be pushed down when:

  • the condition itself is pushable.

  • both join branches belong to the same RDBMS and can be addressed from a single SQL statement (both branches are in the same JNDI data source).

  • join condition exists and uses values from both branches (cross joins are not pushed down).

Figure 3-2 XQuery Inner Join Pattern

XQuery Inner Join Pattern
Description of "Figure 3-2 XQuery Inner Join Pattern"

Although the example in Figure 3-2 shows a simple inner join between two branches, the XQuery engine also supports n-way joins, with each branch comprising a different for statement. See also Table 3-14.

Table 3-14 Rendering of XQuery Inner-Join as SQL-92 and SQL-89 Syntax

SQL-92 Syntax SQL-89 Syntax
SELECT t1."LAST_NAME" AS c1, t2."ORDER_ID" AS c2 
FROM "CUSTOMER" t1 JOIN "CUST_ORDER" t2 
ON t1."CUSTOMER_ID" = t2."CUSTOMER_ID"
SELECT t1."LAST_NAME" AS c1, t2."ORDER_ID" AS c2 
FROM "CUSTOMER" t1, "CUST_ORDER" t2 
WHERE t1."CUSTOMER_ID" = t2."CUSTOMER_ID"

3.3.7.5 Outer Join Pushdown

The XQuery engine interprets nested FLWR expressions (see Figure 3-3) as an outer join and can generate SQL for a data source when:

  • both join branches belong to the same database and are addressable from a single SQL statement (both branches must come from the same JNDI datasource), and

  • join condition is present and uses values from both branches, and

  • join condition is pushable, and

  • the underlying RDBMS supports outer join syntax using either SQL-92 or proprietary syntax in its SQL language

Figure 3-3 Outer Join Pattern

Outer Join Pattern
Description of "Figure 3-3 Outer Join Pattern"

The SQL code generated by the XQuery engine depends on the SQL dialect supported by the source database (see Appendix A for details). Table 3-15 shows example SQL-92 and proprietary syntax for the query shown in Figure 3-3.

Table 3-15 SQL-92 and Proprietary Outer Join Syntax Comparison

SQL-92 Syntax Oracle 11 Syntax
SELECT t1."LAST_NAME" AS c1, t2."ORDER_ID" AS c2 
FROM "CUSTOMER" t1 OUTER JOIN "CUST_ORDER" t2 
ON t1."CUSTOMER_ID" = t2."CUSTOMER_ID"
SELECT t1."LAST_NAME" AS c1, t2."ORDER_ID" AS c2 
FROM "CUSTOMER" t1, "CUST_ORDER" t2 
WHERE t1."CUSTOMER_ID" = t2."CUSTOMER_ID" (+)

Variations of the outer-join pattern are obtained from the original query by using equivalent XQuery expressions. Figure 3-4 is an example of a query equivalent to that shown in Figure 3-3 that will also result in a SQL statement with an outer join.

Figure 3-4 Outer Join Pattern

Outer Join Pattern
Description of "Figure 3-4 Outer Join Pattern"

3.3.7.6 Semi-Joins and Anti-Semi-Joins

A semi-join returns data from a single branch of the join condition, when the join condition is satisfied. An anti-semi-join returns data from a single branch when the join condition is false. Although the XQuery language does not have specific constructs for semi-joins and anti-semi-joins, the XQuery engine translates several specific FLWR patterns into SQL semi-join or anti-semi-join patterns, assuming that:

  • both sides (outer and inner) belong to the same database and are addressable from a single SQL statement (both branches must come from the same JNDI datasource).

  • the join condition exists.

  • the join condition is pushable.

  • the RDBMS supports the EXISTS function and subqueries (see Appendix A for details).

The XQuery interprets a FLWR query containing an inner existential quantified expression as a semi-join, translating the expression into an SQL query with the EXISTS check in the WHERE clause.

Universal quantified expressions are also supported, but their SQL generation is slightly more complicated. The XQuery engine translates FLWRs with exist() or empty() predicates in the where clause into semi-joins. Table 3-16 shows several examples of such patterns.

Table 3-16 Various XQuery Patterns that Can Generate Semi-Join and Anti-Semi-Join SQL

Pattern XQuery Statement SQL Statement

FLWR with existential ("some") quantifier [semi-join]

for $customer in CUSTOMER()
where 
    some $c_order in CUST_ORDER()
    satisfies ($customer/CUSTOMER_ID eq $c_order/ORDER_ID) 
and
($c_order/STATUS eq "OPEN")
return 
    $customer/CUSTOMER_ID
SELECT t1."CUSTOMER_ID" AS c1 
FROM "CUSTOMER" t1
WHERE EXISTS(
  SELECT 1
  FROM "CUST_ORDER" t2
  WHERE t1."CUSTOMER_ID" = t2."CUSTOMER_ID" AND t2."STATUS" = 'OPEN'
)

FLWR with negation of existential quantifier [anti-semi join]

for $customer in CUSTOMER()
where not(
    some $c_order in CUST_ORDER()
satisfies ($customer/CUSTOMER_ID eq $c_order/ORDER_ID) 
and
($c_order/STATUS eq "OPEN")
 )
 return 
    $customer/CUSTOMER_ID
SELECT t1."CUSTOMER_ID" AS c1 
FROM "CUSTOMER" t1
WHERE NOT EXISTS(
  SELECT 1
  FROM "CUST_ORDER" t2
  WHERE t1."CUSTOMER_ID" = t2."CUSTOMER_ID" AND t2."STATUS" = 'OPEN'
)

FLWR with universal ("every") quantified expression

for $customer in CUSTOMER() 
where 
    every $c_order in CUST_ORDER()
satisfies ($customer/CUSTOMER_ID eq $c_order/ORDER_ID) and
              ($c_order/STATUS eq "OPEN")
return 
    $customer/CUSTOMER_ID
SELECT t1."CUSTOMER_ID" AS c1 
FROM "CUSTOMER" t1
WHERE NOT EXISTS(
  SELECT 1
  FROM "CUST_ORDER" t2
  WHERE NOT(t1."CUSTOMER_ID" = t2."CUSTOMER_ID" AND t2."STATUS" = 'OPEN')
)

FLWR with exists() predicate

or $customer in CUSTOMER() 
where exists(
    for $c_order in CUST_ORDER() 
where ($customer/CUSTOMER_ID eq $c_order/ORDER_ID) and 
          ($c_order/STATUS eq "OPEN")
    return $c_order
 )
 return 
    $customer/CUSTOMER_ID
SELECT t1."CUSTOMER_ID" AS c1 
FROM "CUSTOMER" t1
WHERE EXISTS(
  SELECT 1
  FROM "CUST_ORDER" t2
  WHERE t1."CUSTOMER_ID" = t2."CUSTOMER_ID" AND t2."STATUS" = 'OPEN'
)

FLWR with empty() predicate

for $customer in CUSTOMER()
where empty(
for $c_order in CUST_ORDER()
where ($customer/CUSTOMER_ID eq $c_order/ORDER_ID) and ($c_order/STATUS eq "OPEN")
return $c_order
)
return 
$customer/CUSTOMER_ID
SELECT t1."CUSTOMER_ID" AS c1 
FROM "CUSTOMER" t1
WHERE NOT(EXISTS(
  SELECT 1
  FROM "CUST_ORDER" t2
  WHERE t1."CUSTOMER_ID" = t2."CUSTOMER_ID" AND t2."STATUS" = 'OPEN'
))

3.3.8 Grouping and Aggregation

The XQuery engine supports several patterns for group by pushdown and aggregate function pushdown.

3.3.8.1 Group By Pushdown

The Group By clause is a Oracle extension to the XQuery language (see Section 2.2.2.1, "Generalized FLWGOR (group by)" for more information). The XQuery engine implicitly adds a group by expression to some patterns to enable more efficient pushdown and query execution.

Figure 3-5 XQuery Containing a Group By

XQuery Containing a Group By
Description of "Figure 3-5 XQuery Containing a Group By"

The XQuery engine translates group-by clauses into equivalent SQL GROUP BY clauses if:

  • the expressions defining grouping variables are pushable

  • the partition variable is used by an aggregate function only

Since the query shown in Figure 3-5 meets these requirements, the following SQL statement is generated:

SELECT t1."CATEGORY" AS c1, COUNT(*) AS c2
FROM "PRODUCT" t1
GROUP BY t1."CATEGORY"

The group-by pushdown is closely related to the Distinct-by Pushdown. When a group-by clause does not include a partition variable, the XQuery engine generates SQL that includes the DISTINCT keyword, as described in the next section.

3.3.8.2 Distinct-by Pushdown

An XQuery containing a Group By clause (without a partition definition), can be generated into SQL query that uses SQL's DISTINCT keyword to eliminate duplicates in the result. For example, the XQuery statement in Table 3-17 uses a group-by clause but has no partition defined, and the SQL statement created by Oracle Data Service Integrator refines the result by using the DISTINCT keyword.

Table 3-17 Distinct By Pushdown

XQuery Statement SQL Statement
for $product in PRODUCT()
group by $product/CATEGORY_ID as $category
return $category
SELECT DISTINCT t1."CATEGORY_ID" AS c1 
FROM "PRODUCT" t1

3.3.8.3 Simple Aggregate Pattern

An aggregate function operating on a single column from a data source is one of the simplest aggregate patterns that the XQuery engine supports, although it does so in a slightly non-intuitive way. It uses a constant as a single grouping expression (...GROUP ...BY n). The XQuery engine can pushdown the SQL if the RDBMS supports either a GROUP BY operation on a constant or supports sub-queries in the sub-clause (see Table 3-18).

Table 3-18 Aggregate Pushdown

XQuery Statement SQL StatementFoot 1  SQL StatementFoot 2 
for $product in PRODUCT()
group $product 
as $price_group
by 1
return min($price_group/LIST_PRICE)
SELECT MIN(t1."LIST_PRICE") AS c1
FROM "PRODUCT" t1
GROUP BY 1
SELECT MIN(t2.c2) AS c3 
FROM (
  SELECT 1 AS c1, t1."LIST_PRICE" AS c2 
  FROM "PRODUCT" t1
) t2 
GROUP BY t2.c1

Footnote 1 RDBMS supports GROUP BY constant

Footnote 2 RDBMS does not support GROUP BY, but does support sub-queries in the FROM clause

3.3.8.4 Group-By with a Nested Where Clause Translates to SQL HAVING Clause

If a relational data source supports nested WHERE clauses, the XQuery engine can translate a where clause after a group-by clause into a SQL HAVING clause (see Table 3-19), provided that the where clause meets other requirements for XQuery-SQL translation.

Table 3-19 Nested WHERE Clauses

XQuery Statement SQL Statement
for $product in PRODUCT()
group $product/LIST_PRICE as $price_group
by $product/CATEGORY as $category
where max($price_group) gt 1000
return 
<t>
{ 
  $category,
  min($price_group)
}
</t>
SELECT t1."CATEGORY" AS c1, MIN(t1."LIST_PRICE") AS c2
FROM "PRODUCT" t1
GROUP BY t1."CATEGORY"
HAVING MAX(t1."LIST_PRICE") > 1000

3.3.8.5 Outer Join with Aggregate Pattern

Another common pattern supported by the XQuery engine is outer join with aggregation of the right branch, which is expressed in XQuery as nested FLWR expressions with aggregate functions in the inner level (Table 3-20).

Table 3-20 Outer Join with Aggregate

XQuery Statement SQL Statement
for $customer in CUSTOMER()
return 
<customer>
   <name>{ data($customer/LAST_NAME) }</name>
   <order-amount>
   {
     sum(
       for $c_order in CUST_ORDER()
       where $customer/CUSTOMER_ID eq $c_order/CUSTOMER_ID
       return $c_order/ORDER_AMOUNT
     )    
   }
   </order-amount>
</customer>
SELECT t1."LAST_NAME" AS c1, SUM(t2."ORDER_AMOUNT") AS c2
FROM "CUSTOMER" t1 
LEFT OUTER JOIN "CUST_ORDER" t2 
ON (t2."CUSTOMER_ID" = t1."CUSTOMER_ID") 
GROUP BY t1."CUSTOMER_ID"

With this type of query, in order to fully push as much of the query as possible to the data source RDBMS, the XQuery engine evaluates the outer join first and then performs the group-by on the left branch's primary key column, to compute the aggregate. The XQuery engine can perform this optimization only if the left branch of the query has a key column. As shown in Table 3-21, the CUSTOMER does, so the optimization will be performed.

The net effect is that only the XML creation is performed in the XQuery engine.

3.3.8.6 If-Then-Else Pattern

The CASE expression, introduced in SQL:1992, provides a way to use if-then-else logic in SQL statements without having to invoke procedures. The CASE expression correlates a list of values and alternatives.

An XQuery if-then-else pattern can be translated into an SQL CASE expression if:

  • the underlying data source (RDBMS) supports CASE expressions.

  • the XQuery data type result is not an xs:boolean.

  • the data types associated with the then and else expressions are the same (quantifiers are disregarded).

The then and else expressions can contain (or fully consist of) parameters. If the if-then-else expression does not depend on the data source, the entire expression is pushed as a parameter.

An example can be seen in Table 3-21.

Table 3-21 If-Then-Else Pushdown

XQuery Statement SQL Statement
for $i in CUST_ORDER()
return 
     if ($i/STATUS eq "SHIPPED") 
     then data($i/STATUS) 
     else data($i/CUSTOMER_ID)
SELECT 
     CASE WHEN (t1."STATUS" = 'SHIPPED') 
     THEN t1."STATUS" 
     ELSE t1."CUSTOMER_ID" END AS c1
FROM "CUST_ORDER" t1

3.3.8.7 Subsequence Pushdown

In the typical RDBMS application, it is quite common to paginate the results — output just 20 customer records per page, for example, for printing or other purposes. XQuery meets this need with its subsequence( ) function. XQuery provides two different subsequence functions, shown in Table 3-22 and in Table 3-23.

Table 3-22 Two- and Three-Argument Variants of XQuery Subsequence Function

Two-argument variant Three-argument variant
fn:subsequence(
 $sourceSeq as item()*,
 $startingLoc as xs:double
) as item()*
fn:subsequence(
 $sourceSeq as item()*,
 $startingLoc as xs:double,
 $length as xs:double
) as item()*

Table 3-23 Subsequence Pushdown

XQuery Statement SQL Statement (DB2)
let $s := 
  for $i in t2:PRODUCT() 
   order by $i/LIST_PRICE descending
   return $i
for $p in subsequence($s, 1, 10)
return <product>
  <name>
     { data($p/PRODUCT_NAME) } </name>
  <price> 
   { data($p/LIST_PRICE) } 
  </price>
</product>
};
SELECT t3.c1, t3.c2 FROM(
 SELECT ROW_NUMBER() OVER() 
     as c3, t2.c1, t2.c2
  FROM(
   SELECT t1."LIST_PRICE" as c1, 
    t1."PRODUCT_NAME" as c2
    FROM "RTLALL"."PRODUCT" t1
    ORDER BY t1."LIST_PRICE" DESC
   )t2
  )t3
WHERE(t3.c3 <11)

The two-argument variant returns the remaining items of an input sequence, starting from the $startingLoc. The three-argument variant returns $length items of the input sequence starting from the $startingLoc. Table 3-24 shows several different examples of the subsequence function in the context of specific queries.

Table 3-24 Examples of XQuery Expressions using Subsequence Function

Query statement XQuery Expression

Return the 10 most expensive products only.

let $s := 
  for $i in PRODUCT() 
  order by $i/LIST_PRICE descending
  return $i
for $p in subsequence($s, 1, 10)
return <product>
  <name> { data($p/PRODUCT_NAME) } </name>
  <price> { data($p/LIST_PRICE) } </price>
</product>

Return all service cases opened against each of the 10 most expensive products (outer join).

let $s := 
  for $i in PRODUCT() 
  order by $i/LIST_PRICE descending
  return $i
for $p in subsequence($s, 1, 10)
return <product>
<name> { data($p/PRODUCT_NAME) } </name>
{
  for $sc in SERVICE_CASE()
  where $p/PRODUCT_ID eq $sc/PRODUCT_ID and
    $sc/STATUS = 'Open'
  return <case>{ data($sc/CASE_ID) }</case> 
}
</product>

Return the total number of service cases opened against each of the 10 most expensive products (aggregation).

let $s := 
  for $i in PRODUCT() 
  order by $i/LIST_PRICE descending
  return $i
for $p in subsequence($s, 1, 10)
return
<product>
<name> { data($p/PRODUCT_NAME) } </name>
{
 let $scs := 
  for $sc in SERVICE_CASE()
  where $p/PRODUCT_ID eq $sc/PRODUCT_ID and $sc/STATUS = 'Open'
  return $sc
 return <case_count>{ count($scs) }</case_count>
}
</product>

An XQuery subsequence pattern can be translated into an SQL subsequence expression if:

  • the fn:subsequence( ) operates on a FLWR expression that returns items from the RDBMS

  • the return expression in the inner FLWR must always return a single item (it can be a row element or column element)

  • the underlying data source (RDBMS) supports subsequence

Oracle Data Service Integrator can pushdown the subsequence pattern to the underlying RDBMS, thereby enhancing performance, as long as the underlying RDBMS supports it.

  • IBM DB2 9.7 and 10.1 support both variants of the subsequence function.

  • Oracle 11.x and 12.x support both versions of the subsequence function, without restriction.

  • Microsoft SQL Server 2008 R2 and Microsoft SQL Server 2012 support the three-argument version only, and requires that $startingLoc must be 1 (a constant) and $length must be an xs:integer constant.

  • Teradata 14.x and 13.x support both versions of the subsequence function, without restriction.

    Note:

    Subsequence pushdown is not supported for Sybase, or any base RDBMS (see "XQuery-SQL Mapping Reference" for other core and base RDBMS information).

3.3.9 Direct SQL Data Services and Pushdown

Oracle Data Service Integrator lets you create data services not only from relational tables and views, but also from SQL queries. These direct SQL data services, as they are called, can also be composed by the XQuery engine, and pushed down as native SQL to the target RDBMS, if:

  • the RDBMS supports sub-queries in the FROM clause.

  • for outer join pushdown, key information must be specified in the Direct SQL data service configuration (see Appendix A).

If the RDBMS does not support sub-queries (the FROM clause), the pushdown will not occur.

For example, a user-defined SQL query, "recent_order" is configured as a relational source:

SELECT * from RECENT_ORDER

The XQuery that gets created in the data service and the resulting generated SQL that gets pushed down by the XQuery engine are shown in Table 3-25.

Table 3-25 Direct SQL Data Service Example

XQuery Statement SQL Statement
declare variable 
$external_variable as xs:string external;
for $recent_order in RECENT_ORDER()
where $recent_order/ORDER_ID eq $external_variable
return $recent_order/ORDER_AMOUNT
SELECT t1."ORDER_AMOUNT" AS c1 
FROM (
  SELECT * FROM RECENT_ORDER
) t1
WHERE t1."ORDER_ID" = ?

SQL pushdown on top of direct SQL is not limited to simple select-project queries. Any operation for which pushdown is supported for table and view sources is also supported for data services created for direct SQL queries. For example, Table 3-26 shows a join query and its generated result.

Table 3-26 Direct SQL Data Service with Join Condition

XQuery Statement SQL Statement
for $customer in CUSTOMER()
for $recent_order in RECENT_ORDER()
where $customer/CUSTOMER_ID eq $recent_order/CUSTOMER_ID
return
<t>{ $customer/CUSTOMER_ID, $recent_order/ORDER_ID }</t>
SELECT t1."CUSTOMER_ID" AS c1, t2."ORDER_ID" AS c2
FROM "CUSTOMER" t1 
JOIN (
  SELECT * FROM RECENT_ORDER
) t2
ON t1."CUSTOMER_ID" = t2."CUSTOMER_ID"

3.3.10 Distributed Query Pushdown

Oracle Data Service Integrator uses SQL pushdown to off-load query processing to the underlying data source RDBMS whenever possible. However, as mentioned in Section 3.1.2, "How the XQuery Engine Supports SQL Data Sources," SQL pushdown is not always possible, nor beneficial. For example, when two data sources are running on two different systems, or when a query combines relational data with non-relational data, SQL pushdown may not provide any performance benefit.

In cases such as these, Oracle Data Service Integrator uses special techniques to batch-process the outside portion of a query (the left branch) and send a cluster (or chunk) of data to the right branch as parameters (see Table 3-27).

The XQuery engine chooses this optimization technique (a "clustered parameter passing join," also known as PPK) for a distributed query when:

  • join pattern is recognized by the compiler, and

  • the join cannot be pushed down in its entirety for any reason, and

  • join condition is pushable to either branch when all expressions operating on another branch are treated as parameters in the generated SQL.

Table 3-27 Distributed Query Pushdown — a PPK Join Example

XQuery Statement SQL Statement
for $customer in CUSTOMER()
for $order in ORDER()
where
$customer/CUSTOMER_ID eq 
$recent_order/CUSTOMER_ID
return
<t>{ $customer/CUSTOMER_ID, $order/ORDER_ID }</t>
SELECT t1."CUSTOMER_ID" AS c1, 
t1."ORDER_ID" as c2
from "ORDER" t1
WHERE t1."CUSTOMER_ID" = ? OR
t1."CUSTOMER_ID" = ?
...
OR
t1."CUSTOMER_ID" =?

Unless all these conditions are met, the XQuery engine cannot use this optimization technique but will instead use the single parameter join instead (PP1 join).

3.4 Preventing SQL Pushdown

Developers can exercise control over SQL pushdown by using the fn-bea:fence() function (an Oracle extension to XQuery functions and operations) to demarcate sections of XQuery code that the XQuery engine should ignore when it is evaluating query fragments for SQL pushdown.

For the example shown in Table 3-28, even though the upper-case function could be pushed down to the RDBMS, its pushdown is blocked by the fence() function and the upper-case function will be executed by the XQuery engine. Only the fragment comprising the lower-case function is included in the query plan as SQL pushdown. The result of the SQL will be returned to the XQuery engine, which will use the XQuery upper-case function on the result.

Table 3-28 Using the fn-bea:fence() Function

XQuery Statement SQL Statement
for $c in CUSTOMER() 
return 
   upper-case( 
     fn-bea:fence( 
       lower-case( $c/LAST_NAME ) 
     ) 
   )
SELECT LOWER(t1."LAST_NAME") AS c1
FROM "CUSTOMER" t1 

Use the fence() function whenever you want SQL to be sent as is, to the RDBMS. For example, if you are accessing an Oracle 8.5.x RDBMS that uses hints and Oracle's rule-based optimizer, you should send the hinted SQL queries to the data source by wrapping them in the fence() function.

To circumvent SQL pushdown for specific clauses, extract those clauses into separate FLWOR expressions with the fence( ) function at the top of the clause, as shown here:

for $x in 
fn-bea:fence
   (
      for $c in CUSTOMER()
      return $c/LAST_NAME
   )
order by $x
return $x

As you develop data services that use relational data sources, use the Oracle Data Service Integrator Query Plan View to see the results of using the fence( ) function (Figure 3-6). In this example, the order by clause will be executed by the XQuery engine rather than pushed down as SQL.

Figure 3-6 Example of an XQuery Plan without (l) and with (r) the fn-bea:fence() Function

XQuery Plan without and with fn-bea:fence() Function
Description of "Figure 3-6 Example of an XQuery Plan without (l) and with (r) the fn-bea:fence() Function"

Note that the red triangles displayed in the SQL portions of Figure 3-6 are alerts calling attention to the fact that a where clause is missing from the XQuery statement.