5 Best Practices Using XQuery

This chapter offers a series of best practices for creating data services using XQuery. The chapter introduces a data service design model, and describes a conceptual model for layering data services to maximize management, maintainability, and reusability.

This chapter includes the following topics:

5.1 Introducing Data Service Design

When designing data services, you should strive to maximize the ability to maintain, manage, and reuse queries. One approach is to adopt a layered design model that partitions services into the following levels:

  • Application Services. Data services at the Application Services level are defined by client application requirements. Functions defined in this layer can additionally be used to constraint queries and to aggregate data, among other tasks.

  • Logical Services. The Logical Services contain functions that perform general purpose logical operations and transformations on data accessed through Canonical and Physical Services.

  • Canonical Services. Data services defined at the Canonical Services level normalize data obtained from the Physical Services level.

  • Physical Services. The Physical Services are defined by the system based on introspection of physical data sources. The system creates data service functions that retrieve all rows in a table, offering the greatest flexibility for data service functions defined in higher layers. The system also defines relationships between data services, as required.

Figure 5-1 illustrates the data service design model.

Figure 5-1 Data Service Design Model

Data Service Design Model
Description of "Figure 5-1 Data Service Design Model"

Using this design model, you can design and develop data services in the following manner:

  1. Develop the Physical Services based on introspection of physical data sources.

  2. Define the Application Services based on precise client application requirements.

  3. Design the Canonical Services to normalize and create relationships between data accessed using the Physical Services.

  4. Design the Logical Services to manipulate and transform data accessed through the Canonical and Physical Services, providing general purpose reusable services to the Application Services layer.

  5. Work through the layers from the top down, determining optimal functions for each level and factoring out reusable queries.

5.2 Understanding Data Service Design Principles

This section describes best practices for designing and developing services at each layer of the data service design model. Table 5-1 describes the data service design principles.

Table 5-1 Data Service Design Principles

Level Design Principle Description

Application Services

Base design on client needs

Design data services and queries at the Application Services level specifically tuned to client needs, using functions defined at the Logical and Canonical Service levels.

 

Nest or relate information, as required by the application

Use the XML practice of nesting related information in a single XML structure. Alternatively, use navigation functions to relate associated information, as required by the application.

 

Introduce constraints at the highest level

Oracle Data Service Integrator propagates constraints down function levels when generating queries. By keeping constraints, such as function parameters, at the highest level, you encourage reuse of lower level functions and permit the system to efficiently optimize the final generated query.

 

Aggregate data at the highest level

Aggregate data in functions at the highest level possible, preferably at the Application Services level.

Logical Services

Create common functions to serve multiple applications

Design functions that provide common services required by applications. Base function design at the Logical Services level on requirements already established at the Application Services level, based on client needs.

 

Refactor to reduce the number of functions

Refactor the functions, as necessary, to reduce the overall number of functions to as few as possible. This reduces complexity, simplifies documentation, and eases future maintenance.

Canonical Services

Use function defined in the Physical Services level

Create (public) read functions can then all be expressed in terms of the main "get all instances" function.

Canonical Services

Create navigation functions to represent relationships

Use separate data services with relationships (implemented through navigation functions) rather than nesting data. For example, create navigation functions to relate customers and orders or customers and addresses instead of nesting this information.

This keeps data services and their queries small, making them more manageable, maintainable, and reusable.

 

Define keys to improve performance

Defining keys enables the system to use this information when optimizing queries.

 

Establish relationships between unique identifiers and primary keys

Establish relationships between unique identifiers or primary keys that refer to the same data (such as Customer ID or SSN) but vary across multiple data sources. You can use either of the following methods:

  • Create navigation functions to create relationships between the data.

  • Create a new table in the database to relate the unique identifiers and primary keys.

Physical Services

Employ functions that get all records

Using protected functions that get all records at the Physical Services level provides the system with the most flexibility to optimize data access based on constraints specified in higher level functions.

 

Do not perform data type transformations

The system is unable to generate optimizations based on constraints specified at higher levels when data type transformations are performed at the Physical Services level.

 

Do not aggregate

Use aggregates at the highest level possible to enable the system to optimize data access.


5.3 Applying Data Service Implementation Guidelines

Table 5-2 describes implementation guidelines to apply when designing and developing data services.

Table 5-2 Data Service Implementation Guidelines

Level Design Principle Description

Application Services

Use the group clause to aggregate

When performing a simple aggregate operation (such as count, min, max, and so forth) over data stored in a relational source, use a group clause as illustrated by the following:

for $x in f1:CUSTOMER()
group $x as $g by 1
return count($g)

instead of:

count( f1:CUSTOMER() )

in order to enable pushdown of the aggregation operation to the underlying relational data source.

Note that the two formulations are semantically equivalent except for the case where the sequence returned by f1:CUSTOMER() is the empty sequence. Of course performance will be better for the pushed down statement.

Application Services

Use element(foo) instead of schema-element(foo)

Define function arguments and return types in data services as element(foo) instead of schema-element(foo). Using schema-element instead of element causes Oracle Data Service Integrator to perform validation, potentially blocking certain optimizations.

 

Use xs:string to cast data

Use xs:string when casting data instead of fn:string(). The two approaches are not equivalent when handling empty input, and the use of xs:string enables cast operations to be executed by the database.

 

Be aware of Oracle treating empty strings as NULL, and how this affects XQuery semantics

The Oracle RDBMS treats empty strings as NULL, without providing a method of distinguishing between the two. This can affect the semantics of certain XQuery functions and operations.

For example, the fn:lower-case() function is pushed down to the database as LOWER, though the two have different semantics when handling an empty string, as summarized by the following:

  • fn:lower-case() returns an empty string

  • LOWER in Oracle returns NULL

When using Oracle, consider using the fn-bea:fence() function and performing additional computation if precise XQuery semantics are required.

 

Use the function fn:exists() to check for NULL in a database field

Oracle Data Service Integrator does not create elements for fields with NULL value. For example, Oracle Data Service Integrator will not create a <PHONE> element if the corresponding value is NULL in the database.

You can therefore use the function fn:exists() to check for NULL in a database field, as this function is equivalent to the SQL clause PHONE IS NULL.

Application Services

Return plural for functions that contain FLWOR expressions

When a function body contains a FLWOR expression, or references to functions that contains FLWOR, the function should return plural.

For example, consider the following XQuery expression:

For $c in CUSTOMER()
Return 
  <CUSTOMER>
    <LAST_NAME>$c/LAST_NAME</LAST_NAME>
    <FIRST_NAME>$c/FIRST_NAME
      </FIRST_NAME>
    <ADDRESS>{
      For $a in ADDRESS()
        Where $a/CUSTOMER_ID =
          $c/CUSTOMER_ID
      Return
        $a
    }</ADDRESS>
  </CUSTOMER>

Defining a one-to-one relationship between a CUSTOMER and an ADDRESS, as in the following, can block optimizations.

<element name=CUSTOMER>
   <element name=LAST_NAME/>
   <element name=FIRST_NAME/>
   <element name=ADDRESS/>
</element>

Application Services

Return plural for functions that contain FLWOR expressions (continued)

This is because Oracle Data Service Integrator determines that there can be multiple addresses for one CUSTOMER. This leads the system to insert a TypeMatch operation to ensure that there is exactly one ADDRESS. The TypeMatch operation blocks optimizations, thus producing a less efficient query plan.

The Query Plan Viewer shows TypeMatch operations in red and should be avoided. Instead, the schema definition for ADDRESS should indicate that there could be zero or more ADDRESSes.

<element name=CUSTOMER>
   <element name=LAST_NAME/>
   <element name=FIRST_NAME/>
   <element name=ADDRESS minOccurs="0"
      maxOccurs="unbounded"/>
</element>
 

Avoid cross product situations

Avoid cross product (Cartesian Product) situations when including conditions. For example, the following XQuery sample results in poor performance due to a cross product situation:

define fn ($p string)
for $c in CUSTOMER()
for $o in ORDER()
where $c/id eq $p
and $o/id eq $p

Instead, use the following form to specify the same query:

define fn ($p string)
for $c in CUSTOMER()
for $o in ORDER()
where $c/id eq $o/id
and $c/id eq $p