Key Concepts of Query Building

Download Docs

Site Map

Glossary

e-docs > Liquid Data for WebLogic > Building Queries and Data Views > Key Concepts of Query Building

Building Queries and Data Views

Key Concepts of Query Building

The following terms and concepts introduced here:

Query Plans

Stored Queries

Ad Hoc Queries

Different Kinds of Data Sources

Source and Target Schemas

Anatomy of a Query: Joins, Unions, Aggregates, and Functions

Query Plans

A query plan is a compiled query. Before a query is run, Liquid Data compiles the XQuery into an optimized query plan. At runtime, Liquid Data executes the query plan against physical data sources and returns the query results.

Stored Queries

A stored query is a query that has been saved to the Liquid Data repository in the stored_queries folder. Queries must be saved with a .xq extension to be recognized as stored queries in Liquid Data. There is a performance benefit to using a stored query because caching is available as follows:

The query plan for a stored query is always cached in memory. (For an ad-hoc query, the query plan is not cached.)

The query result for a stored query can be cached.

Caching of query results for stored queries is configurable through the Administration Console (see Configuring the Query Results Cache in the Liquid Data Administration Guide). Using this feature, you can specify whether or not to cache query results for stored queries.

Note: Queries can be stored in subdirectories of the stored_queries folder and accessed similarly to a path expression. For example, if a query is saved in a repository directory under:

stored_queries/uCustomer/custQuery.xq

it could be executed from a jsp with:

<lds:query name="uCustomer.custQuery">
</lds:query>

Ad Hoc Queries

An ad hoc query is a query that has not been stored in the Liquid Data repository as a stored query but rather is passed to the Liquid Data server on the fly. Liquid Data does not cache the query plan or the result for an ad hoc query the way it can for a stored query, so an ad hoc query cannot leverage the performance benefit of caching.

Different Kinds of Data Sources

Information can reside in various kinds of data sources in an enterprise or across business entities. The most obvious of these is the relational database, which we typically think of as a data storage and retrieval resource. The reality is that the development of global business and distributed systems has generated information in many other types of data sources as well. Information resides not only in various kinds of databases, but also in packaged enterprise information system (EIS) applications such as PeopleSoft or Siebel, and in emerging net-based technologies like Web services and XML documents. Liquid Data and Data View Builder give you the ability to query and get views into data that resides in all these kinds of information sources.

Relational Databases

All types of businesses and other organizations use an RDBMS (relational database management system) to store information. Relational refers to the way the database organizes information. All information in a relational database appears in logical tables with rows and columns. Instead of a series of static records with one or more data fields that can be redundant from one file to another, information is directly accessible using queries. You can create logical table records that contain just the data you need by constructing a query.

Some databases track information, such as reservations or overnight package delivery information. Other databases store information for perpetual access, such as the IRS or the Library of Congress. Others change dynamically, depending on frequent updates, additions, and deletions, such as a newspaper subscriber database. Databases can reside on large mainframes, web servers, or powerful desktop systems.

Imagine how many times your employee number can appear in static records that describe your company 401K investment, employment, and health insurance records. In an isolated case, this represents three separate files with much of the same information repeated in each instance where there is a record of information. A comprehensive RDBMS would store your employee number, name, address, and other information once with pointers to other related pieces of information about you. Well designed queries could extract only information related to a specific task.

Note: When Data View Builder inspects the metadata for a relational database, if the schema contains any columns that start with numeric values, the Data View Builder adds an underscore character (_) to the beginning of the element name that represents the column. For example, if you have a column in the database named 123_COLUMN, the element corresponding to that column in the Data View Builder is labeled _123_COLUMN.

Also, the following characters from any catalog, schema, table, or column names are replaced with an underscore character:

: < > \ / $ , tab, newline, and spaces

For example, a table named customer><$table can be referenced as customer___table (three underscore characters replace the three special characters).

Additionally, if you are hand editing queries, the element or attribute names that refer to column names that begin with a numeric value must begin with an underscore character (_) when used in XPath expressions.

Tuples

Tuples are another way to refer to data in a database. In a relational database, a tuple is a complete set of information, or a logical record. For example, a personnel schema might contain records that has four fields: an employee number, a name field, an address field, and a phone number field. This tuple, or record, might occur many thousands of times in a very large company with many employees.

XML Files

Extensible Markup Language (XML) files are proving to be a convenient and portable format for storing many different kinds of information for document processing and information exchange. Liquid Data and Data View Builder supports use of XML files as data sources.

Web Services

A web service is a self-contained, platform-independent morsel of business logic, located somewhere on the Internet, that is accessible through standards-based Internet protocols like HTTP or SMTP. Web services facilitate application-to-application communication over the Internet or within and across enterprises. A familiar example of an externalized Web service is a weather portlet or stock quotes that you can integrate into your Web browser. You can use Web services to encapsulate information and operations. With the standards and wide-spread use of Web services for enterprise information exchange evolving, Web services are becoming important resources of global business information. Liquid Data and Data View Builder support the use of Web services as data sources.

Application Views

Enterprise Information Systems (EIS) and custom applications store information that you might need to aggregate for a complete view of data. You can query and retrieve subsets of relevant information from applications such as SAP, Siebel, PeopleSoft, Oracle Financial and so on and treat these views as application view data sources in your data integration solution.

Data Views—Using the Result of a Query as a Data Source

A data view is a special type of data source in which the result of a query is used as a data source. The query result will change as your data changes. In this way, you can build on the queries you design to create "views on data views" for an up-to-date picture of continually changing information.

Stored Procedures

For relational databases that support stored procedures, you can create stored procedures in the RDBMS and expose them to Liquid Data. Liquid Data treats a stored procedure as a function which requires one or more inputs to produce zero or more outputs. A stored procedure allows database programmers to combine business logic with database queries, and they provide a powerful way to produce information from relational databases. Also, stored procedures allow the database administrators to tune the queries run by the stored procedures, thus ensuring good performance and minimizing impact on database performance.

Source and Target Schemas

XML schemas are used in Liquid Data to describe the hierarchical structure of the various data sets with which you are working. The Data View Builder uses XML schema representations as follows:

Source Schemas—XML schemas that describe the structure of the source data.

Target Schema—An XML schema that describes the structure of the target data; that is, the structure of the query result.

To define schema to Liquid Data, you configure Liquid Data data sources. These data sources must have an associated schema. For relational databases, you can specify a schema or use the default schema determined automatically based on the metadata available through the database JDBC driver. For XML files, Views, Complex Parameter Types, Stored Procedures, or Web Services, you specify the schema when you define the data source to Liquid Data. Once the data sources are defined, you can use them in queries.

Note: For information on which versions of XML schema are supported, see Supported XML Schema Versions In Liquid Data.

This section includes the following:

Understanding Source Schemas

Understanding Target Schemas

Understanding Source Schemas

A source schema is the XML schema representation of the structure of the data in a data source.

The Data View Builder provides graphical representation of source schemas in a tree structure format. Nodes contain elements, attributes, and sub-elements and attributes that you can expand or collapse. If you are building a query that queries more than one data source, you will use multiple source schemas (one for each data source).

Understanding Target Schemas

A target schema describes the structure of a query result that will be produced when the query runs. As with source schemas, the Data View Builder provides a graphical representation of target schemas in a tree structure format. Nodes represent elements, attributes, and sub-elements that you can expand or collapse. Only one target schema per query is allowed.

Target schemas have two main purposes:

Target schemas provide a template for mapping data from source schemas in order to generate a query.

Target schemas provide the schema definition when creating Liquid Data Web Service definitions and Data View definitions.

You can specify a target schema in the Data View Builder in the following ways:

You can set the target schema to simply use an existing schema for the target schema. For example, you might have a target schema that a data architect built to use in a given class of queries.

You can open an existing target schema and then use the Data View Builder to modify and save it.

You can use the Data View Builder to build a new target schema from scratch, using the tools available on the right-click menu in the target schema Data View Builder pane.

The way you use target schemas with the Data View Builder varies depending upon the situation. Sometimes you have a schema-driven scenario where you have an existing target schema and want to generate a query that maps data to it appropriately. Other times you have a query-driven scenario where you are trying to generate a particular query result set, and you can use the target schema as a means of shaping your query results.

A target schema can either describe just the portion of the hierarchy that you want to appear in the result set or it can be a superset of the data that actually is mapped in the query. For example, you could choose the same schema for both the source and target data structure, but then map only some of the source elements to the target schema. The query result will show only those data elements that are actually mapped to nodes in the target schema. When a target schema contains unmapped elements, as in this example, the unmapped elements must be specified as Optional in the target schema definition.

You can specify Repeatable and Optional attributes of each node for a target schema in the properties dialog of each node. The Data View Builder uses these target schema attributes to infer your intent during query construction, and these attributes therefore affect the queries that the Data View Builder generates. If a node is not specified as optional, then it must be mapped in the query in order for the query to conform to that target schema. If a node is repeatable, then it can be repeated in the result set of the generated query. For example, consider the following target schema:

Figure 1-1 Target schema with non-repeatable node

In this target schema, the firstname and lastname elements are defined as non-repeatable, and the custrecord node is defined as repeatable and required (the + next to the custrecord name indicates that the node is repeatable and that it must appear at least once—if it had an asterisk [*] instead of the +, that would indicate the node is repeatable and optional, and could therefore occur zero or more times). If you map data to the firstname and lastname nodes, this target schema will generate results similar to the following sample result set:

<customers>
  <custrecord>
    <firstname>John</firstname>
    <lastname>Parker</lastname>
  </custrecord>
  <custrecord>
    <firstname>John</firstname>
    <lastname>Warfin</lastname>
  </custrecord>
    .......
    ......
</customers>

If you modify the target schema so the firstname and lastname elements are also repeatable, the result set schema for the generated query will be different. For example, in the schema shown in Figure 1-1, you could open the properties dialog on the firstname and lastname elements and modify it to look as follows:

Figure 1-2 Properties dialog from target schema with repeatable elements

With the changed target schema, the Data View Builder will now generate a query with results similar to the following sample result set:

<customers>
  <custrecord>
    <firstname>John</firstname>
    <firstname>John</firstname>
        ........
        ........
        <lastname>Parker</lastname>
    <lastname>Warfin</lastname>
        ........
        ........
  </custrecord>
</customers>

In this case, it is likely that the query designer would want the result set to display the first and last names together for the same customer, and would therefore desire the non-repeatable nodes for the firstname and lastname elements. To understand how these attributes affect the query results, experiment with different Node Property settings, run the queries, and compare the results.

When you use the Liquid Data Console to create a Data View defiinition or a Web Service definition, you must specify both a target schema and a stored query. The queries in these definitions must conform to the specified target schema; that is, if the target schema contains elements that are required (Optional box not checked), those elements must be mapped in the query. Make sure your Data View and Web Service queries conform to the specified target schema, as Data Views and Web Services whose queries and target schemas do not conform might produce unexpected behavior.

Anatomy of a Query: Joins, Unions, Aggregates, and Functions

A query can be thought of as a way of filtering through large amounts of data or information to extract only the specifics relevant to a particular instance. A query is made up of one or more types of conditions that accomplish the filtering task or "ask the question." The most commonly used techniques for establishing the source conditions are:

Joins

Unions

Aggregates

Functions

Query Parameters

Joins

A join operation merges data from two sources based on a common field. A query with a join operation combines information in two data source schemas when there is a match on a common field. The common field is a link between the two schemas.

Using this common field, you can gather other information that is unique to each source into a single target schema or view of the data.

For example, you could specify first and last names of all customers in two data sources Broadband and Wireless, but limit the output (query result described by target schema) to the subset of those customers with matching customer IDs in both source schemas.

There are two types of join operations based on equality of matching fields (or columns):

Inner Joins—An inner join is the default join type. It combines data from two data sources only if values in the joined fields match. The matching fields must have compatible data types or contain similar data.

Outer Joins—An outer join behaves like an inner join, but it includes data that does not match the join condition. You can create two different types of outer joins by specifying which unmatched data elements to include in the query results

Left Outer Join—In a left outer join, all selected nodes from the leftmost schema in the join clause appear. Unmatched data nodes in the rightmost schema in the join clause do not appear in the result.

Right Outer Join—In a right outer join, all rows in the rightmost schema in the join clause appear. Unmatched data nodes in the leftmost schema in the join clause do not appear in the result.

Unions

Union operations enable you to combine data from multiple sources into a single set of results described by the structure of the target schema. Even though the content of the source schemas can be the same, or different, you can use a union query to combine selected data nodes in the source schemas into a complete view of the data. For example, we could construct a query that reports all customer orders from multiple sources into a single result.

Aggregates

You can create queries in Liquid Data that aggregate query results, providing summary information on a set of data. The typical aggregate functions are average (avg), count, maximum (max), minimum (min), and sum. You can use aggregate operations to perform various business calculations such as counting the number of customers, calculating the total purchases of a single employee, calculating the average salary of workers, and so on.

Functions

Liquid Data provides a functions library with built-in functions can by used by any Liquid Data client, and also supports configuration and use of user-defined custom functions related to specific business needs.

All source conditions are established using some type of function. The default function for a simple join is the standard "equals" function (abbreviated eq). If you drag and drop one data source element onto another of the same name you have created a simple join using the equals function with two parameters (the two data source elements) which gets expressed as value1 eq value2 in the Builder-generated XQuery.

You can also choose from the functions library to explicitly specify a function to use.

The W3C specification for XQuery supports a discrete list of functions—Liquid Data supports a subset of those functions. For more information on W3C standard functions, see the XQuery 1.0 and XPath 2.0 Functions and Operators specification.

Query Parameters

The parameters to a function can be elements in a source schema, or they can be query parameters that you define as generic placeholders for a variable value. You can specify the variable value at query run time. For example, a query parameter could be defined as lastname, which is a placeholder for a real last name, such as Smith, that you identify when the query runs. Each time you run the query you can change the value of lastname, which gives you a lot of flexibility without creating a separate query for every unique last name you might be interested in. By supplying a new value each time, you could run queries on customers named Smith, or Jones, or any other last name of interest without redesigning the query.

Note: Query parameters are case sensitive.