5 Best Practices Using XQuery

This chapter offers a series of best practices for creating data services using XQuery. The chapter introduces a data service design model, and describes a conceptual model for layering data services to maximize management, maintainability, and reusability.

This chapter includes the following topics:

Section 5.1, "Introducing Data Service Design"
Section 5.2, "Understanding Data Service Design Principles"
Section 5.3, "Applying Data Service Implementation Guidelines"

5.1 Introducing Data Service Design

When designing data services, you should strive to maximize the ability to maintain, manage, and reuse queries. One approach is to adopt a layered design model that partitions services into the following levels:

Application Services. Data services at the Application Services level are defined by client application requirements. Functions defined in this layer can additionally be used to constraint queries and to aggregate data, among other tasks.
Logical Services. The Logical Services contain functions that perform general purpose logical operations and transformations on data accessed through Canonical and Physical Services.
Canonical Services. Data services defined at the Canonical Services level normalize data obtained from the Physical Services level.
Physical Services. The Physical Services are defined by the system based on introspection of physical data sources. The system creates data service functions that retrieve all rows in a table, offering the greatest flexibility for data service functions defined in higher layers. The system also defines relationships between data services, as required.

Figure 5-1 illustrates the data service design model.

Figure 5-1 Data Service Design Model

Description of "Figure 5-1 Data Service Design Model"

Using this design model, you can design and develop data services in the following manner:

Develop the Physical Services based on introspection of physical data sources.
Define the Application Services based on precise client application requirements.
Design the Canonical Services to normalize and create relationships between data accessed using the Physical Services.
Design the Logical Services to manipulate and transform data accessed through the Canonical and Physical Services, providing general purpose reusable services to the Application Services layer.
Work through the layers from the top down, determining optimal functions for each level and factoring out reusable queries.

5.2 Understanding Data Service Design Principles

This section describes best practices for designing and developing services at each layer of the data service design model. Table 5-1 describes the data service design principles.

Table 5-1 Data Service Design Principles

Level	Design Principle	Description
Application Services	Base design on client needs	Design data services and queries at the Application Services level specifically tuned to client needs, using functions defined at the Logical and Canonical Service levels.
	Nest or relate information, as required by the application	Use the XML practice of nesting related information in a single XML structure. Alternatively, use navigation functions to relate associated information, as required by the application.
	Introduce constraints at the highest level	Oracle Data Service Integrator propagates constraints down function levels when generating queries. By keeping constraints, such as function parameters, at the highest level, you encourage reuse of lower level functions and permit the system to efficiently optimize the final generated query.
	Aggregate data at the highest level	Aggregate data in functions at the highest level possible, preferably at the Application Services level.
Logical Services	Create common functions to serve multiple applications	Design functions that provide common services required by applications. Base function design at the Logical Services level on requirements already established at the Application Services level, based on client needs.
	Refactor to reduce the number of functions	Refactor the functions, as necessary, to reduce the overall number of functions to as few as possible. This reduces complexity, simplifies documentation, and eases future maintenance.
Canonical Services	Use function defined in the Physical Services level	Create (public) read functions can then all be expressed in terms of the main "get all instances" function.
Canonical Services	Create navigation functions to represent relationships	Use separate data services with relationships (implemented through navigation functions) rather than nesting data. For example, create navigation functions to relate customers and orders or customers and addresses instead of nesting this information. This keeps data services and their queries small, making them more manageable, maintainable, and reusable.
	Define keys to improve performance	Defining keys enables the system to use this information when optimizing queries.
	Establish relationships between unique identifiers and primary keys	Establish relationships between unique identifiers or primary keys that refer to the same data (such as Customer ID or SSN) but vary across multiple data sources. You can use either of the following methods: Create navigation functions to create relationships between the data. Create a new table in the database to relate the unique identifiers and primary keys.
Physical Services	Employ functions that get all records	Using protected functions that get all records at the Physical Services level provides the system with the most flexibility to optimize data access based on constraints specified in higher level functions.
	Do not perform data type transformations	The system is unable to generate optimizations based on constraints specified at higher levels when data type transformations are performed at the Physical Services level.
	Do not aggregate	Use aggregates at the highest level possible to enable the system to optimize data access.

5.3 Applying Data Service Implementation Guidelines

Table 5-2 describes implementation guidelines to apply when designing and developing data services.

Table 5-2 Data Service Implementation Guidelines

Level	Design Principle	Description
Application Services	Use the group clause to aggregate	When performing a simple aggregate operation (such as count, min, max, and so forth) over data stored in a relational source, use a group clause as illustrated by the following: for $x in f1:CUSTOMER() group $x as $g by 1 return count($g) instead of: count( f1:CUSTOMER() ) in order to enable pushdown of the aggregation operation to the underlying relational data source. Note that the two formulations are semantically equivalent except for the case where the sequence returned by `f1:CUSTOMER()` is the empty sequence. Of course performance will be better for the pushed down statement.
Application Services	Use `element(foo)` instead of `schema-element(foo)`	Define function arguments and return types in data services as `element(foo)` instead of `schema-element(foo)`. Using schema-element instead of element causes Oracle Data Service Integrator to perform validation, potentially blocking certain optimizations.
	Use `xs:string` to cast data	Use xs:string when casting data instead of fn:string(). The two approaches are not equivalent when handling empty input, and the use of xs:string enables cast operations to be executed by the database.
	Be aware of Oracle treating empty strings as NULL, and how this affects XQuery semantics	The Oracle RDBMS treats empty strings as NULL, without providing a method of distinguishing between the two. This can affect the semantics of certain XQuery functions and operations. For example, the `fn:lower-case()` function is pushed down to the database as LOWER, though the two have different semantics when handling an empty string, as summarized by the following: `fn:lower-case()` returns an empty string LOWER in Oracle returns NULL When using Oracle, consider using the `fn-bea:fence()` function and performing additional computation if precise XQuery semantics are required.
	Use the function `fn:exists()` to check for NULL in a database field	Oracle Data Service Integrator does not create elements for fields with NULL value. For example, Oracle Data Service Integrator will not create a `<PHONE>` element if the corresponding value is NULL in the database. You can therefore use the function `fn:exists()` to check for NULL in a database field, as this function is equivalent to the SQL clause `PHONE IS NULL`.
Application Services	Return plural for functions that contain FLWOR expressions	When a function body contains a FLWOR expression, or references to functions that contains FLWOR, the function should return plural. For example, consider the following XQuery expression: For $c in CUSTOMER() Return <CUSTOMER> <LAST_NAME>$c/LAST_NAME</LAST_NAME> <FIRST_NAME>$c/FIRST_NAME </FIRST_NAME> <ADDRESS>{ For $a in ADDRESS() Where $a/CUSTOMER_ID = $c/CUSTOMER_ID Return $a }</ADDRESS> </CUSTOMER> Defining a one-to-one relationship between a CUSTOMER and an ADDRESS, as in the following, can block optimizations. <element name=CUSTOMER> <element name=LAST_NAME/> <element name=FIRST_NAME/> <element name=ADDRESS/> </element>
Application Services	Return plural for functions that contain FLWOR expressions (continued)	This is because Oracle Data Service Integrator determines that there can be multiple addresses for one CUSTOMER. This leads the system to insert a `TypeMatch` operation to ensure that there is exactly one ADDRESS. The TypeMatch operation blocks optimizations, thus producing a less efficient query plan. The Query Plan Viewer shows `TypeMatch` operations in red and should be avoided. Instead, the schema definition for ADDRESS should indicate that there could be zero or more ADDRESSes. <element name=CUSTOMER> <element name=LAST_NAME/> <element name=FIRST_NAME/> <element name=ADDRESS minOccurs="0" maxOccurs="unbounded"/> </element>
	Avoid cross product situations	Avoid cross product (Cartesian Product) situations when including conditions. For example, the following XQuery sample results in poor performance due to a cross product situation: define fn ($p string) for $c in CUSTOMER() for $o in ORDER() where $c/id eq $p and $o/id eq $p Instead, use the following form to specify the same query: define fn ($p string) for $c in CUSTOMER() for $o in ORDER() where $c/id eq $o/id and $c/id eq $p