Best Practices for SQL Data Sets

Consider the following tips to help you create more efficient SQL data sets:

Only Return the Data You Need

Ensure that your query returns only the data you need for your reports. Returning excessive data risks OutOfMemory exceptions.

For example, never simply return all columns as in:

SELECT * FROM EMPLOYEES;

Always avoid the use of *.

Two best practices for restricting the data returned are:

  • Always select only the columns you need

    For example:

    SELECT DEPARTMENT_ID, DEPARTMENT_NAME FROM EMPLOYEES;
  • Use a WHERE clause and bind parameters whenever possible to restrict the returned data more precisely.

    This example selects only the columns needed and only those that match the value of the parameter:

    SELECT DEPARTMENT_ID, DEPARTMENT_NAME 
    FROM EMPLOYEES 
    WHERE DEPARTMENT_ID IN (:P_DEPT_ID)
    

Use Column Aliases to Shorten XML File Length

The shorter the column name, the smaller the resulting XML file; the smaller the XML file the faster the system parses it.

Shorten your column names using aliases to shorten I/O processing time and enhance report efficiency.

In this example, DEPARTMENT_ID is shortened to "id" and DEPARTMENT_NAME is shortened to "name":

SELECT DEPARTMENT_ID id, DEPARTMENT_NAME nameFROM EMPLOYEES 
WHERE DEPARTMENT_ID IN (:P_DEPT_ID)

Avoid Using Group Filters by Enhancing Your Query

Although the Data Model Group Filter feature enables you to remove records retrieved by your query, this process takes place in the middle tier, which is much less efficient than the database tier.

It is a better practice to remove unneeded records through your query using WHERE clause conditions instead.

Avoid PL/SQL Calls in WHERE Clauses

PL/SQL function calls in the WHERE clause of the query can result in multiple executions.

These function calls execute for each row found in the database that matches. Moreover, this construction requires PL/SQL to SQL context switching, which is inefficient.

As a best practice, avoid PL/SQL calls in the WHERE clause; instead, join the base tables and add filters.

Avoid Use of the System Dual Table

The use of the system DUAL table for returning the sysdate or other constants is inefficient. You should avoid using the system DUAL table when not required.

For example, instead of:

SELECT DEPARTMENT_ID ID, (SELECT SYSDATE FROM DUAL) TODAYS_DATE  FROM DEPARTMENTS WHERE DEPARTMENT_ID IN (:P_DEPT_ID)

Consider:

SELECT DEPARTMENT_ID ID, SYSDATE TODAYS_DATE FROM DEPARTMENTS WHERE DEPARTMENT_ID IN (:P_DEPT_ID)

In the first example, DUAL is not required. You can access SYSDATE directly.

Avoid PL/SQL Calls at the Element Level

Package function calls at the element, within the group or row level, are not allowed. You can include package function calls at the global element level because these functions are executed only once per data model execution request.

Example:

<dataStructure>
  <group name="G_order_short_text" dataType="xsd:string" source="Q_ORDER_ATTACH">
    <element name="order_attach_desc" dataType="xsd:string" value="ORDER_ATTACH_DESC"/>
    <element name="order_attach_pk" dataType="xsd:string" value="ORDER_ATTACH_PK"/>     

The following element is incorrect:

<element name="ORDER_TOTAL _FORMAT" dataType="xsd:string" value=" WSH_WSHRDPIK_XMLP_PKG.ORDER_TOTAL _FORMAT "/>
<!--  This is wrong should not be called within group.-->
</group>
 <element name="S_BATCH_COUNT" function="sum" dataType="xsd:double" value="G_mo_number.pick_slip_number"/>
</dataStructure>

Avoid Including Multiple Data Sets

It can seem desirable to create one data model with multiple data sets to serve multiple reports, but this practice results in very poor performance.

When a report runs, the data processor executes all data sets irrespective of whether the data is used in the final output.

For better report performance and memory efficiency, consider carefully before using a single data model to support multiple reports.

Avoid Nested Data Sets

The data model provides a mechanism to create parent-child hierarchy by linking elements from one data set to another.

At run time, the data processor executes the parent query and for each row in the parent executes the child query. When a data model has many nested parent-child relationships slow processing can result.

A better approach to avoid nested data sets is to combine multiple data set queries into a single query using the WITH clause.

Following are some general tips about when to combine multiple data sets into one data set:

  • When the parent and child have a 1-to-1 relationship; that is, each parent row has exactly one child row, then merge the parent and child data sets into a single query.

  • When the parent query has many more rows compared to the child query. For example, an invoice distribution table linked to an invoice table where the distribution table has millions of rows compared to the invoice table. Although the execution of each child query takes less than a second, for each distribution hitting the child query can result in STUCK threads.

Example of when to use a WITH clause:

Query Q1: 
SELECT DEPARTMENT_ID EDID,EMPLOYEE_ID EID,FIRST_NAME FNAME,LAST_NAME LNAME,SALARY SAL,COMMISSION_PCT COMMFROM EMPLOYEES

Query Q2: 
SELECT DEPARTMENT_ID DID,DEPARTMENT_NAME DNAME,LOCATION_ID LOCFROM DEPARTMENTS

Combine the these queries into one using WITH clause as follows:

WITH Q1 as (SELECT DEPARTMENT_ID DID,DEPARTMENT_NAME DNAME,LOCATION_ID LOC
FROM DEPARTMENTS),
Q2 as (SELECT DEPARTMENT_ID EDID,EMPLOYEE_ID EID,FIRST_NAME FNAME,LAST_NAME LNAME,SALARY SAL,COMMISSION_PCT COMM
FROM EMPLOYEES)
SELECT Q1.*, Q2.*
FROM Q1 LEFT JOIN Q2
ON Q1.DID=Q2.EDID

Avoid In-Line Queries as Summary Columns

In-line queries execute for each column for each row. For example, if a main query has 100 columns, and brings 1000 rows, then each column query executes 1000 times.

Altogether, it is 100 multiplied by 1000 times. This is not scalable and cannot perform well. Avoid using in-line sub queries whenever possible.

Avoid the following use of in-line queries. If this query returns only a few rows this approach may work satisfactorily. However, if the query returns 10000 rows, then each sub or in-line query executes 10000 times and the query would likely result in Stuck threads.

SELECT
NATIONAL_IDENTIFIERS,NATIONAL_IDENTIFIER,
PERSON_NUMBER,
PERSON_ID,
STATE_CODE
FROM
(select pprd.person_id,(select REPLACE(national_identifier_number,'-') from per_
national_identifiers pni where pni.person_id = pprd.person_id and rownum<2)
 national_identifiers,(select national_identifier_number from per_national
identifiers pni where pni.person_id = pprd.person_id and rownum<2) national_
identifier,(select person_number from per_all_people_f ppf
where ppf.person_id = pprd.person_id
and :p_effective_start_date between ppf.effective_start_date and ppf.effective_
end_date) PERSON_NUMBER
(Select hg.geography_code from hz_geographies hg
where hg.GEOGRAPHY_NAME = paddr.region_2
and hg.geography_type = 'STATE')  state_code

Avoid Excessive Parameter Bind Values

Oracle Database allows bind maximum of 1000 values per parameter.

Binding a large number of parameter values is inefficient. Avoid binding more than 100 values to a parameter.

When you create a Menu type parameter and your list of values contains many values, ensure that you enable both the Multiple Selection and Can Select All options, then also select NULL value passed to ensure that too many values are not passed.

Tips for Multi-value Parameters

Report consumers often must run reports that support the certain conditions.

  • If no parameter is selected (null), then return all.

  • Allow selection of multiple parameter values

In these cases the use of NVL() does not work, you should therefore use

  • COALESCE() for queries against Oracle Database

  • CASE / WHEN for Oracle BI EE (logical) queries

Example:

SELECT EMPLOYEE_ID ID, FIRST_NAME FNAME, LAST_NAME LNAME FROM EMPLOYEES
WHERE DEPARTMENT_ID = NVL(:P_DEPT_ID, DEPARTMENT_ID

The preceding query syntax is correct only when the value of P_DEPT_ID is a single value or null. This syntax does not work when you pass more than a single value.

To support multiple values, use the following syntax:

For Oracle Database:

SELECT EMPLOYEE_ID ID, FIRST_NAME FNAME, LAST_NAME LNAME FROM EMPLOYEES
WHERE (DEPARTMENT_ID IN (:P_DEPT_ID) OR COALESCE (:P_DEPT_ID, null) is NULL)

For Oracle BI EE data source:

(CASE WHEN ('null') in (:P_YEAR) THEN 1 END =1 OR "Time"."Per Name Year" in (:P_YEAR))

For Oracle BI EE the parameter data type must be string. Number and date data types are not supported.

Group Break and Sorting Data

The data model provides a feature to group breaks and sort data.

Sorting is supported for parent group break columns only. For example, if a data set of employees is grouped by department and manager, you can sort the XML data by department. If you know how the data should be sorted in the final report or template, you specify sorting at data generation time to optimize document generation. The column order specified in the SELECT clause must exactly match the element orders in the data structure. Otherwise group break and sort may not work. Due to complexity, multiple grouping with multiple sorts at different group levels is not allowed.

Example: In the example shown below, sort and group break are applied to the parent group only, that is, G_1. Notice the column order in the query, data set dialog, and data structure. The SQL column order must exactly match the data structure element field order; otherwise, it may result in data corruption.

Example:

SELECT  d.DEPARTMENT_ID DEPT_ID, d.DEPARTMENT_NAME  DNAME,
        E.FIRST_NAME FNAME,E.LAST_NAME LNAME,E.JOB_ID JOB,E.MANAGER_ID
FROM EMPLOYEES E,DEPARTMENTS D
     WHERE D.DEPARTMENT_ID = E.DEPARTMENT_ID 
     ORDER BY  d.DEPARTMENT_ID, d.DEPARTMENT_NAME

Once you define the query, you can use the data model designer to select data elements and create group breaks as shown below.

The Data Structure with breaks is:

<output rootName="DATA_DS" uniqueRowName="false">
<nodeList name="data-structure"> <dataStructure tagName="DATA_DS">
<group name="G_1" label="G_1" source="q1">
   <element name="DEPT_ID" value="DEPT_ID" label="DEPT_ID" fieldOrder="1"/>
   <element name="DNAME" value="DNAME" label="DNAME" fieldOrder="2"/>
   <group name="G_2" label="G_2" source="q1">
      <element name="FNAME" value="FNAME" label="FNAME" fieldOrder="3"/>
      <element name="LNAME" value="LNAME" label="LNAME" fieldOrder="4"/>
      <element name="JOB" value="JOB" label="JOB" fieldOrder="5"/>
      <element name="MANAGER_ID" value="MANAGER_ID" label="MANAGER_ID" fieldOrder="6"/>
   </group>
</group>
</dataStructure>
</nodeList>
</output>