XQuery and XQSE Developer’s Guide


Best Practices Using XQuery

This chapter offers a series of best practices for creating data services using XQuery. The chapter introduces a data service design model, and describes a conceptual model for layering data services to maximize manageability, maintainability, and reusability.

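This chapter includes the following topics:

  • Introducing Data Service Design
  • Understanding Data Service Design Principles
  • Applying Data Service Implementation Guidelines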


Introducing Data Service Design

When designing data services, you should strive to maximize the ability to maintain, manage, and reuse queries. One approach is to adopt a layered design model that partitions services into the following levels:

  • Application Services
  • Logical Services
  • Canonical Services
  • Physical Services

Figure 5-1 illustrates the data service design model.

Figure 5-1 Data Service Design Model


Using this design model, you can design and develop data services in the following manner:

  1. Develop the Physical Services based on introspection of physical data sources.
  2. Define the Application Services based on precise client application requirements.
  3. Design the Canonical Services to normalize and create relationships between data accessed using the Physical Services.
  4. Design the Logical Services to manipulate and transform data accessed through the Canonical and Physical Services, providing general purpose reusable services to the Application Services layer.
  5. Work through the layers from the top down, determining optimal functions for each level and factoring out reusable queries.

 


Understanding Data Service Design Principles

This section describes best practices for designing and developing services at each layer of the data service design model. Table 5-2 describes the data service design principles.

Table 5-2 Data Service Design Principles

| Level | Design Principle | Description |
|---|---|---|
| Application Services | Base design on client needs | Design data services and queries at the Application Services level specifically tuned to client needs, using functions defined at the Logical and Canonical Service levels. |
| Application Services | Nest or relate information, as required by the application | Use the XML practice of nesting related information in a single XML structure. Alternatively, use navigation functions to relate associated information, as required by the application. |
| Application Services | Introduce constraints at the highest level | AquaLogic Data Services Platform propagates constraints down function levels when generating queries. By keeping constraints, such as function parameters, at the highest level, you encourage reuse of lower level functions and permit the system to efficiently optimize the final generated query. |
| Application Services | Aggregate data at the highest level | Aggregate data in functions at the highest level possible, preferably at the Application Services level. |
| Logical Services | Create common functions to serve multiple applications | Design functions that provide common services required by applications. Base function design at the Logical Services level on requirements already established at the Application Services level, based on client needs. |
| Logical Services | Refactor to reduce the number of functions | Refactor the functions, as necessary, to reduce the overall number of functions to as few as possible. This reduces complexity, simplifies documentation, and eases future maintenance. |
| Canonical Services | Use functions defined at the Physical Services level | Create (public) read functions that are all expressed in terms of the main “get all instances” function. |
| Canonical Services | Create navigation functions to represent relationships | Use separate data services with relationships (implemented through navigation functions) rather than nesting data. For example, create navigation functions to relate customers and orders or customers and addresses instead of nesting this information. This keeps data services and their queries small, making them more manageable, maintainable, and reusable. |
| Canonical Services | Define keys to improve performance | Defining keys enables the system to use this information when optimizing queries. |
| Canonical Services | Establish relationships between unique identifiers and primary keys | Establish relationships between unique identifiers or primary keys that refer to the same data (such as Customer ID or SSN) but vary across multiple data sources. You can use either of the following methods: (1) create navigation functions to create relationships between the data; (2) create a new table in the database to relate the unique identifiers and primary keys. |
| Physical Services | Employ functions that get all records | Using protected functions that get all records at the Physical Services level provides the system with the most flexibility to optimize data access based on constraints specified in higher level functions. |
| Physical Services | Do not perform data type transformations | The system is unable to generate optimizations based on constraints specified at higher levels when data type transformations are performed at the Physical Services level. |
| Physical Services | Do not aggregate | Use aggregates at the highest level possible to enable the system to optimize data access. |
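
The layering principles in Table 5-2 can be sketched in standard XQuery 1.0. The following is a minimal illustration only, not platform-generated code. All function names (local:CUSTOMER, local:ORDER, local:getOrders, local:getOrdersByCustomerId) and the inline sample data are hypothetical, and the "get all" functions are stubbed with literal elements where a real Physical Service would read from a data source.

```xquery
xquery version "1.0";

(: Hypothetical stand-in for a Physical Services "get all records" function. :)
declare function local:CUSTOMER() as element(CUSTOMER)* {
  (
    <CUSTOMER><CUSTOMER_ID>C1</CUSTOMER_ID><LAST_NAME>Smith</LAST_NAME></CUSTOMER>,
    <CUSTOMER><CUSTOMER_ID>C2</CUSTOMER_ID><LAST_NAME>Jones</LAST_NAME></CUSTOMER>
  )
};

(: Hypothetical stand-in for a second Physical Services "get all records" function. :)
declare function local:ORDER() as element(ORDER)* {
  (
    <ORDER><ORDER_ID>O1</ORDER_ID><CUSTOMER_ID>C1</CUSTOMER_ID></ORDER>,
    <ORDER><ORDER_ID>O2</ORDER_ID><CUSTOMER_ID>C1</CUSTOMER_ID></ORDER>,
    <ORDER><ORDER_ID>O3</ORDER_ID><CUSTOMER_ID>C2</CUSTOMER_ID></ORDER>
  )
};

(: Canonical-level navigation function: relates a customer to its orders
   instead of nesting the orders inside the CUSTOMER element. :)
declare function local:getOrders($c as element(CUSTOMER)) as element(ORDER)* {
  for $o in local:ORDER()
  where $o/CUSTOMER_ID eq $c/CUSTOMER_ID
  return $o
};

(: Application-level function: the parameter constraint is introduced here,
   at the highest level, and nowhere below. :)
declare function local:getOrdersByCustomerId($id as xs:string) as element(ORDER)* {
  for $c in local:CUSTOMER()
  where $c/CUSTOMER_ID eq $id
  return local:getOrders($c)
};

local:getOrdersByCustomerId("C1")
```

Evaluating the final expression returns the two ORDER elements that belong to customer C1. Because the filter on $id appears only in the top-level function, the lower-level functions remain unconstrained and reusable by other callers.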

 


Applying Data Service Implementation Guidelines

Table 5-3 describes implementation guidelines to apply when designing and developing data services.

Table 5-3 Data Service Implementation Guidelines  
Level
Design Principle
Description
Application Services
Use the group clause to aggregate
When performing a simple aggregate operation (such as count, min, max, and so forth) over data stored in a relational source, use a group clause as illustrated by the following:
for $x in f1:CUSTOMER()
group $x as $g by 1
return count($g)
instead of:
count( f1:CUSTOMER() )
in order to enable pushdown of the aggregation operation to the underlying relational data source.
Note that the two formulations are semantically equivalent except when f1:CUSTOMER() returns the empty sequence: count() then returns 0, whereas the grouped form has no groups and returns the empty sequence. The pushed-down form generally performs better because the aggregation runs in the database.
Use element(foo) instead of schema-element(foo)
Define function arguments and return types in data services as element(foo) instead of schema-element(foo). Using schema-element instead of element causes AquaLogic Data Services Platform to perform validation, potentially blocking certain optimizations.
Use xs:string to cast data
Use xs:string when casting data instead of fn:string(). The two approaches are not equivalent when handling empty input, and the use of xs:string enables cast operations to be executed by the database.
Be aware of Oracle treating empty strings as NULL, and how this affects XQuery semantics
The Oracle RDBMS treats empty strings as NULL, without providing a method of distinguishing between the two. This can affect the semantics of certain XQuery functions and operations.
For example, the fn:lower-case() function is pushed down to the database as LOWER, though the two have different semantics when handling an empty string, as summarized by the following:
  • fn:lower-case() returns an empty string
  • LOWER in Oracle returns NULL
When using Oracle, consider using the fn-bea:fence() function and performing additional computation if precise XQuery semantics are required.
Return plural for functions that contain FLWOR expressions
When a function body contains a FLWOR expression, or references functions that contain FLWOR expressions, the function should return plural (zero or more items) rather than exactly one item.
For example, consider the following XQuery expression:
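(: Build one CUSTOMER element per customer, with matching addresses :)
for $c in CUSTOMER()
return
  <CUSTOMER>
    <LAST_NAME>{ fn:data($c/LAST_NAME) }</LAST_NAME>
    <FIRST_NAME>{ fn:data($c/FIRST_NAME) }</FIRST_NAME>
    <ADDRESS>{
       (: the inner FLWOR can return zero or more addresses :)
       for $a in ADDRESS()
       where $a/CUSTOMER_ID = $c/CUSTOMER_ID
       return $a
    }</ADDRESS>
  </CUSTOMER>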
Defining a one-to-one relationship between a CUSTOMER and an ADDRESS, as in the following, can block optimizations.
<element name="CUSTOMER">
   <element name="LAST_NAME"/>
   <element name="FIRST_NAME"/>
   <element name="ADDRESS"/>
</element>
This is because AquaLogic Data Services Platform determines that there can be multiple addresses for one CUSTOMER. This leads the system to insert a TypeMatch operation to ensure that there is exactly one ADDRESS. The TypeMatch operation blocks optimizations, thus producing a less efficient query plan.
TypeMatch operations, which the Query Plan Viewer shows in red, should be avoided. Instead, the schema definition for ADDRESS should indicate that there can be zero or more ADDRESS elements.
<element name="CUSTOMER">
   <element name="LAST_NAME"/>
   <element name="FIRST_NAME"/>
   <element name="ADDRESS" minOccurs="0"
      maxOccurs="unbounded"/>
</element>
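Avoid cross product situations
Avoid cross product (Cartesian product) situations when including conditions. For example, the following XQuery sample results in poor performance because neither condition relates $c to $o directly, which leads to a cross product situation:
declare function local:fn($p as xs:string) as element()* {
  for $c in CUSTOMER()
  for $o in ORDER()
  where $c/id eq $p
    and $o/id eq $p
  (: the return clause is illustrative; the original sample omits it :)
  return ($c, $o)
};
Instead, use the following form, which states the join condition between $c and $o explicitly, to specify the same query:
declare function local:fn($p as xs:string) as element()* {
  for $c in CUSTOMER()
  for $o in ORDER()
  where $c/id eq $o/id
    and $c/id eq $p
  (: the return clause is illustrative; the original sample omits it :)
  return ($c, $o)
};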
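
The guidelines in Table 5-3 on casting with xs:string and on empty-string handling both depend on how XQuery treats empty input. The following standard XQuery 1.0 sketch shows that behavior in isolation. The CUSTOMER element and its missing MIDDLE_NAME child are hypothetical sample data, and the result element names are chosen only for illustration.

```xquery
xquery version "1.0";

(: Hypothetical sample data: this CUSTOMER has no MIDDLE_NAME element. :)
let $c := <CUSTOMER><LAST_NAME>Smith</LAST_NAME></CUSTOMER>

(: Selecting the absent child yields the empty sequence. :)
let $missing := $c/MIDDLE_NAME

return
  <result>
    {
      (: xs:string() on empty input returns the empty sequence: count is 0 :)
      <cast-count>{ fn:count(xs:string($missing)) }</cast-count>,

      (: fn:string() on empty input returns a zero-length string: count is 1 :)
      <string-count>{ fn:count(fn:string($missing)) }</string-count>,

      (: fn:lower-case() on empty input also returns a zero-length string :)
      <lower-is-empty-string>{ fn:lower-case($missing) eq "" }</lower-is-empty-string>
    }
  </result>
```

The result contains 0 for cast-count, 1 for string-count, and true for lower-is-empty-string. The last value is the XQuery behavior that differs from LOWER in Oracle, which returns NULL for the same input.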

