bea.com | products | dev2dev | support | askBEA
 Download Docs   Site Map   Glossary 
Search

Building Queries and Data Views

 Previous Next Contents Index View as PDF  

Understanding Query Design Patterns

Here we present some common query patterns generated by the Data View Builder and provide high-level guidelines for effective query design including target schema design and source replication.

Target Schema Design Guidelines and Query Examples

This section provides several examples of queries built with the Data View Builder. We describe the conditions and mappings for a query and the resulting XQuery. The purpose of this is to illustrate how we follow certain guidelines to design the various types of example queries.

For a detailed description of target schemas, see Understanding Target Schemas," on page 1-11.

Design Guidelines

Use these guidelines when working with target schemas in your queries:

  1. Make sure the target schema has proper cardinality. For example, if you intend to project customer orders in your result, the target schema should reflect the parent-child relationship between "customer" and "orders."

    All examples in Examples of Effective Query Design demonstrate this guideline.

  2. Understand how target schema conformity works and use it efficiently. In an XML schema:

    The following examples demonstrate this guideline:

  3. Project at least one element from each data source that is part of the query to the target schema. The following examples illustrates this guideline:

  4. If your eventual goal is to create a data view from your query, your target schema should only contain required elements that are utilized in the query. For example, if the Customer table contains first_name, last_name, email, and phone elements and each of those elements is required in the target schema, then you need to map each element of your query before saving it.

    Alternatively, you can modify your target schema to reflect only the elements your query is using or give your revised target schema a new name. One of the benefits of this approach is that when your data view is created, the only elements of your schema available for queries are those you specifically identify through the target schema.

    In the Data View Builder you can save a target schema using the menu commands:

    File -> Save Target Schema

    See Adding a Target Schema for details.

Examples of Effective Query Design

For the following examples, assume we have two schema sets (databases) with the following entities (tables).

Broadband Schema with the following tables:

Wireless Schema with the following tables:

Example 1: Find all Broadband customers who are also Wireless customers

In this situation you do not project anything from the Wireless customer table.

The generated query will iterate over all customers and in Broadband and check for the existence of a matching customer in Wireless. This query also ensures duplicate customers are not returned from Broadband in the event a Broadband customer matches more than one Wireless customer.

Instructions to create query using Data View Builder:

  1. Map the source PB-BB.CUSTOMER FIRST_NAME and LAST_NAME to the target CUSTOMER FIRST_NAME and LAST_NAME respectively.

  2. Create the condition PB_BB.CUSTOMER CUSTOMER_ID eq PB_WL.CUSTOMER.CUSTOMER_ID

The query looks like:

<db> 
{
for $PB_BB.CUSTOMER_1 in document("PB-BB")/db/CUSTOMER
let $CUSTOMER_2 :=
for $PB_WL.CUSTOMER_3 in document("PB-WL")/db/CUSTOMER
where ($PB_BB.CUSTOMER_1/CUSTOMER_ID eq
                                                         $PB_WL.CUSTOMER_3/CUSTOMER_ID)
return
xf:true()
where xf:not(xf:empty($CUSTOMER_2))
return
<CUSTOMER>
<FIRST_NAME>{ xf:data($PB_BB.CUSTOMER_1/FIRST_NAME) }</FIRST_NAME>
<LAST_NAME>{ xf:data($PB_BB.CUSTOMER_1/LAST_NAME) }</LAST_NAME>
</CUSTOMER>
}
</db>

If you do not care about duplicates or know there will not be duplicates, you can avoid xf:entity(...) checking by projecting an element from the Wireless customer table.

Instructions to create this alternative version of the query using Data View Builder:

  1. Map the source PB-BB.CUSTOMER FIRST_NAME and LAST_NAME to the target CUSTOMER FIRST_NAME and LAST_NAME respectively.

  2. Map the source PB-WL.CUSTOMER STATE to the target CUSTOMER STATE (Additional projection).

  3. Create the condition PB_BB.CUSTOMER CUSTOMER_ID eq PB_WL.CUSTOMER.CUSTOMER_ID

The query now looks like:

<db> 
{
for $PB_WL.CUSTOMER_1 in document("PB-WL")/db/CUSTOMER
for $PB_BB.CUSTOMER_2 in document("PB-BB")/db/CUSTOMER
where ($PB_BB.CUSTOMER_2/CUSTOMER_ID eq $PB_WL.CUSTOMER_1/CUSTOMER_ID)
return
<CUSTOMER>
<FIRST_NAME>{ xf:data($PB_BB.CUSTOMER_2/FIRST_NAME) }</FIRST_NAME>
<LAST_NAME>{ xf:data($PB_BB.CUSTOMER_2/LAST_NAME) }</LAST_NAME>
<STATE>{ xf:data($PB_WL.CUSTOMER_1/STATE) }</STATE>
</CUSTOMER>
}
</db>

Example 2: Find all Broadband customers and their Wireless line items

This query basically asks for all Broadband customers and Wireless line items for which there exists a Wireless order that joins with both the Broadband customer and Wireless line item.

Now for this situation user does not project anything from Wireless order table.

The generated query will iterate over all customers and in Broadband, then for each line item it will check for the existence of a matching order in Wireless that also matches a customer in Broadband.

Instructions to create query using Data View Builder:

  1. Map the source PB-BB.CUSTOMER FIRST_NAME and LAST_NAME to the target CUSTOMER FIRST_NAME and LAST_NAME respectively.

  2. Map the source PB-WL.CUSTOMER_ORDER_LINE_ITEM PRODUCT_NAME and EXPECTED_SHIP_DATE to the target CUSTOMER_ORDER_LINE_ITEM PRODUCTION and EXPECTED_SHIP-DATE respectively.

  3. Create the condition PB_BB.CUSTOMER CUSTOMER_ID eq PB_WL.CUSTOMER_ORDER.CUSTOMER_ID

  4. Create the condition PB_WL.CUSTOMER_ORDER.ORDER_ID eq PB_WL.CUSTOMER_ORDER_LINE_ITEM.ORDER _ID

The query looks like:

<ROWS> 
{
for $PB_BB.CUSTOMER_1 in document("PB-BB")/db/CUSTOMER
return
<CUSTOMER>
<FIRST_NAME>{ xf:data($PB_BB.CUSTOMER_1/FIRST_NAME) }</FIRST_NAME>
<LAST_NAME>{ xf:data($PB_BB.CUSTOMER_1/LAST_NAME) }</LAST_NAME>
<CUSTOMER_ORDER>
{
for $PB_WL.CUSTOMER_ORDER_LINE_ITEM_2 in
                                                 document("PB-WL")/db/CUSTOMER_ORDER_LINE_ITEM
let $CUSTOMER_ORDER_LINE_ITEM_3 :=
for $PB_WL.CUSTOMER_ORDER_4 in
                                                                               document("PB-WL")/db/CUSTOMER_ORDER
where ($PB_BB.CUSTOMER_1/CUSTOMER_ID eq
                                                                                     $PB_WL.CUSTOMER_ORDER_4/CUSTOMER_ID)
and ($PB_WL.CUSTOMER_ORDER_4/ORDER_ID eq
                                                                       $PB_WL.CUSTOMER_ORDER_LINE_ITEM_2/ORDER_ID)
return
xf:true()
where xf:not(xf:empty($CUSTOMER_ORDER_LINE_ITEM_3))
return
<CUSTOMER_ORDER_LINE_ITEM>
<PRODUCT_NAME>{
                                       xf:data($PB_WL.CUSTOMER_ORDER_LINE_ITEM_2/PRODUCT_NAME)
}</PRODUCT_NAME>
<EXPECTED_SHIP_DATE>{
                                   xf:data($PB_WL.CUSTOMER_ORDER_LINE_ITEM_2/EXPECTED_SHIP_DATE)
}</EXPECTED_SHIP_DATE>
</CUSTOMER_ORDER_LINE_ITEM>
}
</CUSTOMER_ORDER>
</CUSTOMER>
}
</ROWS>

For performance reasons, we recommend that you project the intermediate data, especially if you do not care about duplicates or know there will not be duplicates. In the example above, you can project an element from the Wireless order table.

Instructions to create query using Data View Builder:

  1. Map the source PB-BB.CUSTOMER FIRST_NAME and LAST_NAME to the target CUSTOMER FIRST_NAME and LAST_NAME respectively.

  2. Map the source PB-WL.CUSTOMER_ORDER ORDER_ID to the target CUSTOMER_ORDER ORDER_ID. (Additional projection)

  3. Map the source PB-WL.CUSTOMER_ORDER_LINE_ITEM PRODUCT_NAME and EXPECTED_SHIP_DATE to the target CUSTOMER_ORDER_LINE_ITEM PRODUCTION and EXPECTED_SHIP-DATE respectively.

  4. Create the condition PB_BB.CUSTOMER CUSTOMER_ID eq PB_WL.CUSTOMER_ORDER.CUSTOMER_ID

  5. Create the condition PB_WL.CUSTOMER_ORDER.ORDER_ID eq PB_WL.CUSTOMER_ORDER_LINE_ITEM.ORDER _ID

The query looks like:

<ROWS>
{
for $PB_BB.CUSTOMER_1 in document("PB-BB")/db/CUSTOMER
return
<CUSTOMER>
<FIRST_NAME>{ xf:data($PB_BB.CUSTOMER_1/FIRST_NAME) }</FIRST_NAME>
<LAST_NAME>{ xf:data($PB_BB.CUSTOMER_1/LAST_NAME) }</LAST_NAME>
{
for $PB_WL.CUSTOMER_ORDER_2 in document("PB-WL")/db/CUSTOMER_ORDER
where ($PB_BB.CUSTOMER_1/CUSTOMER_ID eq
                                     $PB_WL.CUSTOMER_ORDER_2/CUSTOMER_ID)
return
<CUSTOMER_ORDER>
<ORDER_ID>{ xf:data($PB_WL.CUSTOMER_ORDER_2/ORDER_ID)
                                       }</ORDER_ID>
{
for $PB_WL.CUSTOMER_ORDER_LINE_ITEM_3 in
                                                 document("PB-WL")/db/CUSTOMER_ORDER_LINE_ITEM
where ($PB_WL.CUSTOMER_ORDER_2/ORDER_ID eq
                                                       $PB_WL.CUSTOMER_ORDER_LINE_ITEM_3/ORDER_ID)
return
<CUSTOMER_ORDER_LINE_ITEM>
<PRODUCT_NAME>{
                                     xf:data($PB_WL.CUSTOMER_ORDER_LINE_ITEM_3/PRODUCT_NAME)
}</PRODUCT_NAME>
<EXPECTED_SHIP_DATE>{
                                 xf:data($PB_WL.CUSTOMER_ORDER_LINE_ITEM_3/EXPECTED_SHIP_DATE) }
</EXPECTED_SHIP_DATE>
</CUSTOMER_ORDER_LINE_ITEM>
}
</CUSTOMER_ORDER>
}
</CUSTOMER>
}
</ROWS>

Example 3: Find all Broadband customers (CUSTOMER is Repeatable and Optional)

The target schema is ROWS(CUSTOMER*). This query returns the root element and all Broadband customers. Since, CUSTOMER is optional, an empty <ROWS/> element could be returned as the result of the query since it would conform to the schema.

Instructions to create query using Data View Builder:

The following query will be generated. Notice that this query will indeed return an empty root element <ROWS/> if there are not any Broadband customers.

The query looks like:

<ROWS> 
{
for $PB_BB.CUSTOMER_1 in document("PB-BB")/db/CUSTOMER
return
<CUSTOMER>
<FIRST_NAME>{ xf:data($PB_BB.CUSTOMER_1/FIRST_NAME) }</FIRST_NAME>
<LAST_NAME>{ xf:data($PB_BB.CUSTOMER_1/LAST_NAME) }</LAST_NAME>
</CUSTOMER>
}
</ROWS>

Example 4: Find all Broadband customers (CUSTOMER is Repeatable and Required)

This time the target schema is ROWS(CUSTOMER+). Again, this query returns the root element and all Broadband customers. But, since, CUSTOMER is required, in case there are no Broadband customer, an empty <ROWS/> element cannot be returned as the result of the query since such a result would not conform to the given target schema.

Instructions to create query using Data View Builder:

Below is the query generated under this schema. This query will return the root element and all Broadband customers that exist. Observe that if there are not any Broadband customers then an empty result will be returned (not even the root element).

let $CUSTOMER_1 :=  
for $PB_BB.CUSTOMER_2 in document("PB-BB")/db/CUSTOMER
return
<CUSTOMER>
<FIRST_NAME>{ xf:data($PB_BB.CUSTOMER_2/FIRST_NAME)
                             }</FIRST_NAME>
<LAST_NAME>{ xf:data($PB_BB.CUSTOMER_2/LAST_NAME) }</LAST_NAME>
</CUSTOMER>
where xf:not(xf:empty($CUSTOMER_1))
return
<ROWS>
{ $CUSTOMER_1 }
</ROWS>

The pattern of this query is discussed in more detail in Example 6: Find the list of all Broadband customers that have at least one Wireless order and return their Wireless orders (ORDER is Repeatable and Required).

Example 5: Find all Broadband customers and return their Wireless orders if the customer has Wireless orders (ORDER is Required and Optional)

In this case, the target schema is ROWS(CUSTOMER*(ORDER*)). The target schema allows for customers with zero orders. This means that the query can (and should) return customers without orders. Practically, this makes the query is a left outer-join between customers and orders.

Instructions to create query using Data View Builder:

  1. Map the source PB-BB.CUSTOMER FIRST_NAME and LAST_NAME to the target CUSTOMER FIRST_NAME and LAST_NAME respectively.

  2. Map the source PB-WL.CUSTOMER_ORDER ORDER_DATE and SHIP_METHOD to the target CUSTOMER_ORDER ORDER_DATE and SHIP_METHOD respectively.

  3. Create the condition PB_BB.CUSTOMER CUSTOMER_ID eq PB_WL.CUSTOMER_ORDER.CUSTOMER_ID

The query looks like:

<ROWS> 
{
for $PB_BB.CUSTOMER_1 in document("PB-BB")/db/CUSTOMER
return
<CUSTOMER>
<FIRST_NAME>{ xf:data($PB_BB.CUSTOMER_1/FIRST_NAME) }</FIRST_NAME>
<LAST_NAME>{ xf:data($PB_BB.CUSTOMER_1/LAST_NAME) }</LAST_NAME>
{
for $PB_WL.CUSTOMER_ORDER_2 in document("PB-WL")/db/CUSTOMER_ORDER
where ($PB_BB.CUSTOMER_1/CUSTOMER_ID eq
                                       $PB_WL.CUSTOMER_ORDER_2/CUSTOMER_ID)
return
<CUSTOMER_ORDER>
<ORDER_DATE>{ xf:data($PB_WL.CUSTOMER_ORDER_2/ORDER_DATE) }
</ORDER_DATE>
<SHIP_METHOD>{ xf:data($PB_WL.CUSTOMER_ORDER_2/SHIP_METHOD) }
</SHIP_METHOD>
</CUSTOMER_ORDER>
}
</CUSTOMER>
}

Example 6: Find the list of all Broadband customers that have at least one Wireless order and return their Wireless orders (ORDER is Repeatable and Required)

In this case, the target schema is ROWS(CUSTOMER*(ORDER+)). Now, the target schema does not allow for customers with zero orders. This means that the query should not return customers without orders. Practically, this makes the query is a (natural) join between customers and orders.

Instructions to create query using Data View Builder:

  1. Map the source PB-BB.CUSTOMER FIRST_NAME and LAST_NAME to the target CUSTOMER FIRST_NAME and LAST_NAME respectively.

  2. Map the source PB-WL.CUSTOMER_ORDER ORDER_DATE and SHIP_METHOD to the target CUSTOMER_ORDER ORDER_DATE and SHIP_METHOD respectively.

  3. Create the condition PB_BB.CUSTOMER CUSTOMER_ID eq PB_WL.CUSTOMER_ORDER.CUSTOMER_ID

The query looks like:

<ROWS> 
{
for $PB_BB.CUSTOMER_1 in document("PB-BB")/db/CUSTOMER
let $CUSTOMER_ORDER_2 :=
for $PB_WL.CUSTOMER_ORDER_3 in
                                               document("PB-WL")/db/CUSTOMER_ORDER
where ($PB_BB.CUSTOMER_1/CUSTOMER_ID eq
                                                 $PB_WL.CUSTOMER_ORDER_3/CUSTOMER_ID)
return
<CUSTOMER_ORDER>
<ORDER_DATE>{ xf:data($PB_WL.CUSTOMER_ORDER_3/ORDER_DATE) }
                                     </ORDER_DATE>
<SHIP_METHOD>{ xf:data($PB_WL.CUSTOMER_ORDER_3/SHIP_METHOD) }
                                     </SHIP_METHOD>
</CUSTOMER_ORDER>
where xf:not(xf:empty($CUSTOMER_ORDER_2))
return
<CUSTOMER>
<FIRST_NAME>{ xf:data($PB_BB.CUSTOMER_1/FIRST_NAME) }</FIRST_NAME>
<LAST_NAME>{ xf:data($PB_BB.CUSTOMER_1/LAST_NAME) }</LAST_NAME>
{ $CUSTOMER_ORDER_2 }
</CUSTOMER>
}
</ROWS>

The general pattern for handling required repeatable elements in the target schema is as follows. The goal is to be able to check for existence of at least one element before we generate the parent. Generation of required repeatable elements is "promoted" the element to the nearest optional repeatable ancestor (or the root of the result if there is no such element). There the list of elements is computed inside a let clause. After that, and the result (list) of this let clause is checked whether it is empty or not, before producing the rest of the result.

In this case, the ORDER element is required so we need to check for existence of orders before we produce customer. This means that we need to generate the list of orders for each customer, and output the customer only if this list is not empty.

Source Replication

A source is said to be replicated if the source appears multiple times in a query. Specifically in XQuery, a source is replicated if document("source name") appears multiple times in the XQuery, usually appearing in two different for clauses. Similarly in SQL, a source is replicated if the source (table) appears twice in a FROM clause (or in two different FROM clauses). (See the next section for examples.)

Why is source replication necessary?

The simplest example of a necessary source replication would be a self-join in SQL. In the classic example of a self-join, the query wants to get all the pairs of employee names to manager names from a single employee table:

SELECT e.name, m.name
FROM employee e, employee m
WHERE e.manager_id = m.id

In XQuery, the query would look like:

<employee_managers>
{
for $e in document("employee")//employee
for $m in document("employee")//employee
where $e.manager_id eq $m.id
return
<employee_manager>
<employee> {$e.name} </employee>
<manager> {$m.name} </manager>
</employee_manager>
}
</employee_managers>

In both of these examples, given the sources, there is no way to write these queries without replicating the sources.

When is source replication necessary?

Source replication is necessary whenever you want to use a source for two different purposes that will require iterating over the source twice. Another way to state it is when two different tuples from a source will be required at the same time.

When should you manually replicate sources?

In ambiguous cases, both replicating and not replicating a source would lead to reasonable queries.

For example, at the beginning of this section, we presented a self-join to get employee-manager pairs. Without replicating the source, you might try the following:

  1. Map name to the target (get the employee name)

  2. Join manager_id with id (join to get the manager)

  3. Map name to the target (get the manager name)

Of course, the Data View Builder would interpret this query as: "give me all employees who are their own manager." This interpretation is no less valid than the desired one.

There is no way to go into Advanced mode to fix this query. You simply must replicate the source in this case.

 

Back to Top Previous Next