Building Queries and Data Views

Analyzing and Optimizing Queries

This chapter describes techniques for optimizing Liquid Data queries.

Note: Tuning Performance in the Deployment Guide contains a general discussion of factors related to tuning and performance of Liquid Data including query design, data sources, and platform considerations.

The following sections are included:

Query Analysis

Two tools are available to help you analyze how the query you created executes on the Liquid Data Server and to measure the performance of the query against its various data sources:

Query Plan
Performance Information

Viewing the Query Plan

The query plan is designed to help you:

Understand how a query is executed \\
Identify areas where query tuning may improve performance

Only the parts of the plan that may have significant performance impact on execution time are displayed when you select the Plan tab in the Results pane. The returned plan identifies the following query components:

Join
Outer join
Select
Sources
Custom function calls
Order-bys
Remove duplicates
Source access operator

The query plan also appears if you select the Compile Query menu option or, in Design mode, the Compile Query icon.

Figure 7-1 Query Plan (DB-XML Sample Project: e2e-order.qpr)

Query Plan (DB-XML Sample Project: e2e-order.qpr)

Getting Information on the Query

Information available on a query after it has been compiled or executed in the Data View Builder includes:

Compilation time
Execution time
Retrieval time and number of invocations for each data source
Query statement

Listing 7-1 Sample Information After Running a Query (DB-XML Sample Project: e2e-order.qpr)

Compilation time: 2.403 sec
Execution time: 1.052 sec

Source: PB-WL{1}
Data retrieval time: 0.02 sec
Invocations: 1
Statement: SELECT t1."CUSTOMER_ID", t1."ORDER_DATE", t1."ORDER_ID", t1."SHIP_METHOD", t1."TOTAL_ORDER_AMOUNT"

FROM "WIRELESS"."CUSTOMER_ORDER" t1 
WHERE (t1."CUSTOMER_ID") = ('CUSTOMER_1')

Source: PB-WL{0}
Data retrieval time: 0.111 sec
Invocations: 1
Statement: SELECT t1."TELEPHONE_NUMBER", t1."CUSTOMER_ID", t1."FIRST_NAME", t1."LAST_NAME", t1."STATE", t1."EMAIL_ADDRESS"

FROM "WIRELESS"."CUSTOMER" t1 WHERE (t1."CUSTOMER_ID") = ('CUSTOMER_1')

Source: XM-BB-CO{2}
Data retrieval time: 0.631 sec
Invocations: 0
Statement: parser

Factors in Query Performance

If you have a good understanding of your data sources and relationships, you will be in a good position to try to improve query performance. It help greatly if you:

Know your data sources from a size and schema point of view.
Evaluate the relationships between your data. Consider factors like expected query result size, memory requirements, and ability to leverage stored queries, as appropriate.

Taking such factors into account you are in a position to add effective optimization hints that may greatly improve query performance.

This section covers some key factors related to performance and memory that you should consider while designing and building queries with the Data View Builder. Examples and recommendations for some typical scenarios and use cases are provided.

Optimizing Queries

To access tools to improve query performance, click on the Optimize tab. (See Figure 7-2.)

Figure 7-2 Optimize Tab

Optimize Tab

Source Order Optimization

When a query uses data from several sources, the Liquid Data Server creates intermediate results into memory combining data from the different sources. The size of these intermediate results depends on the amount of data retrieved from each data source. If you specify more than two sources, the Liquid Data Server combines the first two sources, then continues to integrate each additional resource, one at a time, in the order that they appear in for clauses.

The size of a source is the number of tuples, or records, retrieved by the query from that source. The size of the intermediate result depends on the input size of the first source multiplied by the input size of the second sources and so on. A query is generally more efficient when it minimizes the size of intermediate results.

The order of the peer XQuery for clauses in the query matches the order of the data sources in the Source Order list. In general, you should order sources in ascending order, by size. That is, the smallest resource should appear first in the list and the largest resource should appear last.

You can re-order source schemas on the top frame on the Optimize tab to improve query performance. To move a schema up or down, select the schema and click the up or down arrow buttons to the right of the list of schemas.

Example of Source Order Optimization

Consider a query designed to find all managers and the departments they manage that contains a join across three sources: Employees, Employees2 (Employees opened a second time), and Departments.

Note: You will notice that join selectivity is not considered in the following discussion. This is for the sake of simplicity.

This query joins the Employees schema ID field and the Employees2 schema Manager_id field to return all managers. It joins on the Employees schema Dept_id and Departments schema Department_no to return the corresponding department information. The generated XQuery language looks like the following example:

for $EMP1 in document("Employees")/db/EMP
for $EMP2 in document("Employees")/db/EMP
for $DEPT in document("Department")/db/DEPT
where $EMP1/id eq $EMP2/manager_id and
	$EMP1/dept_id eq $DEPT/department_no
...

This creates a cross-product of Employees ID and Employees Manager_id, then a cross-product with Departments Department_no. If there are 100 employees, and five departments, the query would generate (100 * 100) + (10,000 * 5) intermediate results for a total of 150,000. More accurately, the query would generate a fraction of this number, depending on the join selectivity of the two sources.

A better plan would be to combine Employees with Departments first, then combine that result with Employees2. This is easily accomplished in the Source Order Optimization pane by clicking on the "Department" data source and then on the up-arrow (see Figure 7-2).

The effect is to generates (100 * 5) + (500 * 100) intermediate results for a total of 50,500 intermediate results, a considerable potential processing reduction.

The generated XQuery language looks like the following example.

for $EMP1 in document("Employees")/db/EMP
for $DEPT in document("Department")/db/DEPT
for $EMP2 in document("Employees")/db/EMP
where $EMP1/id eq $EMP2/manager_id and
	$EMP1/dept_id eq $DEPT/department_no
...

Optimization Hints

A critical factor in query performance is the way disparate data sources are joined by the Liquid Data Server. The Liquid Data Server offers three different join methods. The Liquid Data Server optimizer applies heuristics to determine the best method for each case. However, you can apply a join hint in cases where you wish to override the method chosen by the optimizer. In some cases query performance can be greatly improved by properly applying query hints. In some cases query hints can greatly improve performance.

The Optimize tab on the Data View Builder provides a drop-down list of data source pairs and a table that shows the joins that have been applied to each pair. For each join in the table you can provide a hint about how to join the data most efficiently. (See Figure 7-2.)

When used, query hints appear in the query as special-purpose strings enclosed within comment brackets: {--! hint !--}. They specify which join algorithm should be selected when the query runs. The Join Hints frame contains a drop-down list of data source pairs, and a table that shows all the joins for each pair. Only source pairs that have join conditions across them appear in the drop-down list. For each join condition in the table, you can provide a hint about how to join the data most efficiently.

You can easily experiment with different query hints to determine the optimal settings.

Determining When Hints Are Needed

By default no hints are specified, meaning that the Liquid Data built-in optimizer is used. To add a hint to a particular join, select the join and choose a hint from the drop-down Query Hints list. The available hints are shown in Table 7-3.

Table 7-3 Optimization Hints

Hint	Description	Syntax
None (optimizer)	The Liquid Data optimizer takes a best guess at optimizing the statement.	n/a
Left	Parameter Pass to the Left (ppleft)	`{--! ppleft !--}`
Right	Parameter Pass to the Right (ppright)	`{--! ppright !--}`
Merge	Merge	`{--! merge !--}`
Index	Index(es) will be used.	`{--! index !--}`

Apply these rules to determine the correct hint to choose.

Table 7-4 When to Use Which Hint

Hint	When To Use
None (optimizer)	Uses the built-in Liquid Data query optimizer. In many cases this will yield the best results.
Merge	Both relational database sources are large and cannot fit into memory.
Parameter Passing (Left or Right)	One of the sources has fewer objects than the other. See Using Parameter Passing Hints (ppleft or ppright).
Index	The size of the source identified on the right side of the hint is small enough to fit into memory. The left and right sources are generally equal in size. There is at least one non-relational source used in the join.

Notes:

Using optimization hints can help you improve performance on equijoin conditions, which contain only one equality. If the query performs complex join conditions such as (A eq B) OR (C eq D), the join conditions will not appear on the Optimizer tab as the engine does not need a hint for these join conditions. Only the join conditions that are equi-join such as (A eq B), can be given a hint in the Optimizer pane or by manually editing a query.
Choosing the wrong direction for a parameter passing hint can degrade performance.

The following sections provide more detail regarding each type of hint setting.

Using the Liquid Data Built-in Optimizer

When no hints are provided Liquid Data attempts to optimize the query based on an analysis of the query plan based on the frequently correct premise that the most selective kinds of conditions becomes the driving source that passes parameters to the rest of the query, as appropriate.

Using Parameter Passing Hints (ppleft or ppright)

Choose a Parameter Passing hint when one of the sources has a fairly small number of data objects. In order to use the parameter passing hints (ppleft and ppright) effectively, you need to know which data sources contain the larger data sets.

When you choose the direction for the Parameter Passing hint, always choose the data source to the left or right with the larger number of items as the receiver. For example, if there are more items on the right side of the equality, then pass the parameter to the right. The direction indicated in the hint identifies the side in the equation that receives the parameter. In other words, the hints are named for the receiver.

Consider the following example, which is described fully in "Example 1: Simple Joins" in Liquid Data by Example.

Listing 7-2 XQuery with ppright Hints

{--Generated by Data View Builder 8.1 	--}

<customers>

{for $PB-WL.CUSTOMER_1 in document("PB-WL")/db/CUSTOMER
where ($#first_name of type xs:string  eq  $PB-WL.CUSTOMER_1/FIRST_NAME)
return
	<customer id={$PB-WL.CUSTOMER_1/CUSTOMER_ID}>
	<first_name>{ xf:data($PB-WL.CUSTOMER_1/FIRST_NAME) }</first_name>
	<last_name>{ xf:data($PB-WL.CUSTOMER_1/LAST_NAME) }</last_name>


<orders>
{
	for $PB-BB.CUSTOMER_ORDER_3 in document("PB-BB")/db/CUSTOMER_ORDER
	where
		($PB-WL.CUSTOMER_1/CUSTOMER_ID  eq  {--! ppright !--} $PB-BB.CUSTOMER_2/CUSTOMER_ID)
	return
		<order id={$PB-BB.CUSTOMER_ORDER_3/ORDER_ID}
		date={$PB-BB.CUSTOMER_ORDER_3/ORDER_DATE}></order>
	}
</orders>

<customer>
}
</customers>

Note: The second join in the example; the join between PB-WL customer IDs and PB-BB customer IDs:

			where
($PB-WL.CUSTOMER_1/CUSTOMER_ID  eq  {--! ppright !--} $PB-BB.CUSTOMER_2/CUSTOMER_ID)

In the example above, the where clause indicates that the PB-WL data source CUSTOMER table will output only one customer ID. This assumes that the PB-BB data source has a larger amount of customer IDs. You can optimize the join by providing the hint shown above (ppright), which tells the server to retrieve the PB-WL customer information first and then pass the CUSTOMER ID as a parameter to the right to look for matches in the PB-BB data source. The engine will thus require much less memory and respond faster than if no hint was provided.

Using Merge Hints

Choose a merge hint when both relational database sources are large and cannot fit into memory.

The following example shows the XQuery for a merge hint.

Listing 7-3 XQuery with Merge Hint

<root>
	{
	for $Wireless.CUSTOMER_1 in document("Wireless")/db/CUSTOMER
	for $BroadBand.CUSTOMER_2 in document("BroadBand")/db/CUSTOMER
	where ($Wireless.CUSTOMER_1/CUSTOMER_ID  eq  {--!merge!--} $BroadBand.CUSTOMER_2/CUSTOMER_ID)
	return
	<row>
		<CUSTOMER_ID>{ xf:data($BroadBand.CUSTOMER_2/CUSTOMER_ID) }</CUSTOMER_ID>
	</row>
	}
</root>

A merge join requires a minimal amount of memory to operate; however, the input must be sorted on join attributes. A query using a merge join might have a slower response time than a query without a hint, but the memory footprint is typically much smaller with the merge join.

Note: A merge join in a character column might yield unexpected results because the collating sequence for each database may be vary. See Table 7-5 for an example of how incompatible ordering sequences for strings from two different vendors can affect query results.

Table 7-5 Collation Sequences for Some Data Types Vary by Database Vendor

Oracle	MS SQL
ORDER_ID_8009_4	ORDER_ID_8009_4
ORDER_ID_8010_0	ORDER_ID_801_0
ORDER_ID_8011_0	ORDER_ID_8010_0
ORDER_ID_8012_0	ORDER_ID_8011_0
ORDER_ID_801_0	ORDER_ID_8011_1
ORDER_ID_801_1	ORDER_ID_8012_0

To ensure predictable results you should use an index join when merging character (varchar, string, and so forth) columns from different data sources.