SQL Programming Standards

This document describes the SQL programming standards to be used in any database query. These standards will ensure that all database queries across the system have been structured properly and thus have less chance to cause performance issues. All developers must adhere to these standards.

Contents

Composing SQL Statements

Testing SQL Statements

Composing SQL Statements

Contents

Prerequisite

Composing A SELECT Statement

Existence Checks

SQL statements to avoid

Prerequisite

This document assumes that you have a basic knowledge of SQL syntax and database functions.

Composing A SELECT Statement

Contents

General SELECT Statement Considerations

Selection List

Database-specific Features

FROM Clause

WHERE Clause

Sort Order

Grouping

General SELECT Statement Considerations

·         Before composing an SQL statement, you should have in front of you the ERD of the tables involved in that SQL. You should make sure you fully understand the relationships between the tables.

·         As you may know, an SQL may return a single record or a set of records as its result set. When a set is to be returned, it is managed by a cursor that loops through that set and issues a separate database call for each record in the set.

Therefore, when you design your SQL, think carefully if the task can be easily achieved in a single SQL or rather that the nature of task is such that a row-by-row processing would make more sense. Examples for the latter could be a list processing or simply because the calculation per row is too complicated to be handled by the database.

Selection List

·         If a list of fields is to be returned, specify them prefixed by their table’s alias name as specified in the From Clause.

·         Use the DISTINCT option when the result list of records may contain duplicate rows in respect to the specified list of fields AND only one copy of the duplicated rows is needed.

·         For top-level batch programs, always specify the WITH HOLD keyword on the main SQL of a cursor based processing. This is to keep the cursor open after a commit or rollback. Without this, main cursor will be closed and fetch of the next record or restart processing will fail (specific to DB2) with SQL error 501.

Database-specific Features
Oracle

·         Oracle7 or later provides new approach for optimization: cost-based optimization (CBO). CBO evaluates the cost to, or impact on, your system of the execution path for each specific query and then select the lowest-cost path. The CBO was designed to save you the trouble of fiddling with your queries. Occasionally, it is not giving you the results you want and you have exhausted all other possible problem areas, you can specify hints to direct the CBO as it evaluates a query and creates an execution plan. If you have used hints before, you know that a hint starts with /*+ and ends with */.  A hint applies only to the statement in which it resides; nested statements consider as separate statement and require their own hints. Furthermore, a hint currently has a 255-character limit. Since the use of hint is database-specific, we should make use of Database Functions to accomplish it.

·         The most effective hints for use with the CBO are:

·         FULL – tells the optimizer to perform a full table scan on the table specified in the hint

SELECT /*+FULL(table_name)*/ COLUMN1,COLUMN2…..

·         INDEX – tells the optimizer to use one or more indexes for executing a given query.
Note: If you just want to ensure the optimizer doesn’t perform a table scan, use INDEX hint without specifying an index name and the optimizer will use the most restrictive index. A specific index should not be used as the actual index name may differ on the client’s site.

SELECT /*+INDEX(table_name index_name1 indexname2…) */ 
              COLUMN1, COLUM2

·         ORDERED – tells the optimizer to access tables in particular order, based on the order in the query’s FROM clause (often referred to as the driving order for a query)

SELECT /*+ORDERED*/ COLUMN1, COLUMN2
FROM TABLE1, TABLE2

·         ALL_ROWS – tells the optimizer to choose the fastest path for retrieving all the rows of a query, at the cost of retrieving a single row more slowly.

SELECT /*+ALL_ROWS*/ COLUMN1, COLUMN2…

·         FIRST_ROWS – tells the optimizer to choose the approach that returns the first row as quickly as possible.
Note: the optimizer will ignore the first rows hint in DELETE and UPDATE statements and in SELECT statements that contain any of the following: set operators, group by clauses, for update clause, group functions, and the distinct operators.

SELECT /*+FIRST_ROWS*/ COLUMN1, COLUMN2…

·         USE_NL – tells the optimizer to use nested loops by using the tables listed in the hint as the inner (non-driving) table of the nested loop. Note: if you use an alias for a table in the statement, the alias name, not the table name, must appear in the hint, or the hint will be ignored.

SELECT /*+USE_NL(tableA table B) */ COLUMN1, COLUMN2…

·         Hints are an Oracle specific feature and are not supported by the DB2 SQL syntax.
If you need to add a hint to your SQL make sure that a different SQL version is used for DB2 where the hint is not used.
Base product developers should not duplicate their SQL in this case but rather use the special database functions file “dbregex.txt”. In this file you should add a new hint-code that in Oracle translates into the specific hint whereas in DB2 it translates into an empty string. 

FROM Clause

·         Any table that has least one of its fields specified in the Selection List and/or any table that is directly referred to in the Where Clause (excluding sub-selects if any) must be specified in this section.

·         Label each table with a meaningful short alias and use this alias to reference the table anywhere in the SQL.

WHERE Clause

Contents

General WHERE Clause Considerations

Use of Sub-Selects

Use of IN Function

Use Of Database Functions

Other

General WHERE Clause Considerations

·         All tables specified in the From Clause must participate in a join statement with another table. Table left not joined, would cause a Cartesian join to be applied for this table and any other table on the list, resulting in an incorrect result list let alone very poor performance.

·         Note that there is no such thing as “conditional” join where the only join statement for a table is under a condition. In cases where the condition is not met and thus the join is not performed, one would end up with the same problem described previously.

·         The final result set is built up by taking the full population of the tables involved and applying the restricting criteria to it one after another where the intermediate result population of one step is the input for the next step. Therefore, it is recommended to specify the most restrictive criteria first so that at the end of one step, lesser records are processed in the next step.

This is of course a very schematic and simplified way to describe the internal process. This is not necessarily how the database is actually processing the statements. However, setting up the criteria as described would direct the database to use the right path.

Use of Sub-Selects

·         When you need to further test each processed record in the Where clause for meeting an additional condition, AND that condition can NOT be checked directly on the Where clause level, you probably need a sub-select.

·         As it is performed once for each outer level record it is considered as quite an expensive tool. Therefore if the criteria checked in a sub-select can be moved to the outer where clause level, it is preferable. If you still need to use a sub-select, it is very important to restrict the outer where clause population to the very minimum possible so that lesser records would need to be further checked for the sub-select condition.

·         When no value needs to be returned from the sub-select query but rather simply use it to check if a certain condition is true or false, use the EXISTS function as follows:

Select …
From …
Where … AND EXISTS (<sub-select>)

·         A sub-select query may refer to any value of the outer level record as its input parameters. Notice that if your sub-select does NOT refer to any of the processed record fields, it means that the result set of the sub-select would be the same for ALL the processed records.

Note that this could, but not necessarily, be an indication that your sub-select is set up wrong. One case where it is definitely wrong is when the sub-select result is input to an EXISTS function.

Use of IN Function

·         Whenever a field needs to be tested against a list of valid values it is recommended to use the IN function and not compare the field against each and every value.

Wrong way:

Select …
From …
Where … (A = ‘10’ or A=’20’ or A=’30’)

Right way:

Select …
From …
Where … A IN (‘10’,’20’,’30’)

Use Of Database Functions

·         Not all database functions available for one database are valid for others. Make sure that when you do use a database function the SQL works properly on every database supported by the product.

·         Avoid using LIKE as this will cause table scans. To achieve the ‘LIKE’ function where the first part of the string is specified, e.g., "CM%", BETWEEN may be used with the input criteria padded with high and low values.

Other

·         Depending on the data distribution, search on optional index column will likely to cause time out. See example –

Select BSEG_ID
From CI_BSEG
Where MASTER_BSEG_ID = &IN.MASTER-BSEG-ID

For such cases, consider additional restrictions or re-create the index to become composite – MASTER_BSEG_ID + BSEG_ID.

Sort Order

·         When a result list should be displayed in a specific order, sorting should take place on the database level and NOT on the client. This is especially important in cases when the list cannot be returned in full but rather in batches of records. Sorting each batch of records separately would not guarantee the sort order between records of different batches.

·         Columns in the sort order list must be specified in the selection list.

·         Prefix each field used in this clause with its table’s alias name.

·         Explicitly specify whether sorting should be ascending or descending and do not rely on database defaults.

Grouping

·         When a set of records needs to be grouped together by a simple and straightforward condition, it is recommended to use the database Group By Clause. In this case only the final summarized records are to be returned to the client resulting in a lesser number of database calls as opposed to processing the full list let alone a simpler program without any special grouping logic.

Existence Checks

·         The common technique used to check whether a certain condition is met or not, obviously when no data needs to be returned, is simply COUNT how many records match that condition. A zero number indicates that no record has met that condition.

Notice that this is not very efficient as we are asking the database to scan the records for an accurate number that we don’t really care about. All we really want to know if there is at least one such record and NOT how many they are.

When the tables involved are of low volume there should be no problem using this technique. It is very simple and uses common SQL syntax to all databases.

·         However, when that condition is checked against a high volume table that many of its records meet that condition, scanning all the matching records to get a count we don’t need should be avoided.

In this case use the EXISTS function as follows:

Select ‘x’
From <The main table of the searched field, where it is defined as the PK of that table>
Where <search field> = <search value> and
           EXISTS
       (<sub-select with the desired condition. This is the high volume table>);

For example :
Select ‘x’
From CI_UOM
Where UOM_CD = input UOM_CD and
           EXISTS (select ‘x’
                          From CI_BSEG_CALC_LN
                          Where UOM_CD = input UOM_CD);
        
If this does not work for your special case, use the following option :

Select ‘x’
From CI_INSTALLATION
Where EXISTS
        (<sub-select with the desired condition>) ;
Remember : This type of existence check using the Installation Options record should only be used in rare cases and should be consulted with the DBA first before implementation.

Note that we use CI_INSTALLATION as this table has only one row.

SQL statements to avoid

Contents

Decimal Delimiter

Whenever Statement

Decimal Delimiter


In Europe the decimal delimiter is often set to be comma character. DB2 database configured this way will return SQL syntax error in the following cases:

·         select ....,1, 

·         insert ....values(...1,2,3...)

·         insert ....values(...1 ,2,...)

·         order by 1,2,3

·         order by 1 ,2

·         update...set abc=1,def='XX'

·         case (? as varchar(50),12

To avoid this problem, surround the comma with spaces.

Whenever Statement

Expand Cobol pre compiler function does not support WHENEVER statement.

Testing SQL Statements

Contents

Result Data

Performance Testing – Oracle Only

More Extensive Performance Testing

Result Data

Once your SQL is ready, it is essential to test that it actually returns the expected result.

Create sample data for each condition checked by your SQL.  Then execute the SQL and make sure it returns the expected result for each case.

Performance Testing – Oracle Only

Contents

Overview

What Is An Explain Plan?

Generate The SQL’s Explain Plan

Analyze Explain Plan

Overview

An SQL may perform reasonably well even if not efficiently written in cases where the volume of processed data is low, like in a development environment. However, the same SQL may perform very poorly when executed in a real high volume environment. Therefore, any SQL should be carefully checked to make sure it would provide reasonable performance at execution time.

Obviously there could be many reasons for an SQL to perform poorly and not all of them are easy to predict or track. 

In general, these could be subcategorized into two main groups:

·         Basic issues related to the SQL code. These may be missing JOIN statements, inefficient path to the desired data, inefficient use of database functions, etc.

·         More complicated issues having to do with lack of indexes, database tuning and handling of high volume of data, efficiency of I/O system etc. 

The latter group of issues may only be truly tested on a designated environment simulating a real production configuration.  These performance tests are typically conducted by a team of database and operating system experts as part of a thorough performance testing of a predefined set of process.

It is the first group of issues that can and should be tested by the programmer at this stage.  This is done by analysis of the SQL’s Explain Plan result.

What Is An Explain Plan?

An explain plan is a representation of the access path that is taken when an SQL is executed within Oracle.

The optimal access path for an SQL is determined by the database optimizer component. With the Rule Based Optimizer (RBO) it uses a set of heuristics to determine access path. With the Cost Based Optimizer (CBO) we use statistics to analyze the relative costs of accessing objects. 

Since the Cost Based optimizer relies on actual data volume statistics to determine the access path, to generate an accurate Explain Plan using the cost based optimizer requires a database set up with the proper statistics of a real high volume data environment.

Note.  A cost based optimizer Explain Plan generated on an inadequate database, would be totally inaccurate and misleading!

Obviously, our development database does not qualify as an optimal environment of cost based optimizations.  Since the Rule Based optimizer is not data dependant it would provide a more reliable Explain Plan for this database. 

Note.  An efficient rule based Explain Plan does not guarantee an efficient cost based one when the SQL is finally executed on the real target database.  However, a poor rule based Explain Plan would most probably remain such on a database with a higher volume of data.

Note.  When the SQL is complicated and mainly designed to process high volume tables it is recommended to also analyze its Explain Plan on an appropriate high volume database. 

Generate The SQL’s Explain Plan

·         Let’s assume this is the SQL to be checked

SQL To Check

·         Adjust SQL Statement:

·         Extract the tested SQL into Toad SQL editor.

·         Replace the COBOL name of each Host Variable with the equivalent database identifier :b<n> where n is a unique number identifying that host variable.  If the same variable appears more than one in the SQL use the same database host variable id in all occurrences.

·         Force the database to analyze the SQL in Rule Base mode by introducing the RULE database hint phrase.

Adjust SQL Statement

·         Generate the Explain Plan:

·         Position the cursor on the SQL area

·         Choose the “Explain Plan Current SQL” option on the SQL-Window menu.  Alternatively, use CTRL+E.

Get Explain Plan

·         The generated plan appears on the result section of the editor.

Explain Plan

Analyze Explain Plan

Contents

Access Methods

Common Issues To Be Aware Of

Access Methods

Logically Oracle finds the data to read by using the following methods:

·         Full Table Scan (FTS).  Using this method the whole table is read.

·         Index Lookup (unique & non-unique).  Using this method, data is accessed by looking up key values in an index and returning rowids where a rowid uniquely identifies an individual row in a particular data block.

·         Rowid.  This is the quickest access method available Oracle simply retrieves the block specified and extracts the rows it is interested in. Most frequently seen in explain plans as Table access by Rowid.

Common Issues To Be Aware Of

Contents

Cartesian Product

Full Table Scan

Join Order

Nested Loops

Sort

Cartesian Product

·         A Join is a predicate that attempts to combine 2 row sources.  Cartesian Product is created when there are no join conditions between 2 row sources and there is no alternative method of accessing the data.  Typically this is caused by a coding mistake where a join has been left out.   The CARTESIAN keyword in the Explain Plan indicates this situation.

Full Table Scan

·         A Full Table Scan, i.e. TABLE ACCESS FULL phrase, found in the Explain Plan usually indicates an inefficient access path.  This means that the only way the database found to get to the desired data is by reading every single row in the table.

Notice that if the logic indeed requires reading all data, then this database decision is indeed correct.  However, if you intended to get a small subset of rows from a large table and ended up reading all of it this is definitely not efficient and should be fixed.  If this is the case, try and find a better SQL structure that would avoid a full table access.  If you can’t find such, please consult a DBA as this SQL may require an additional Index to be created for the table. 

·         Sometimes there would be a proper index on a particular table but still a full table scan would be chosen for the access path of that table.  This may be as result of an inefficient Join Order.  Please see details below.

Join Order

A Join is a predicate that attempts to combine 2 row sources. We only ever join 2 row sources together. Join steps are always performed serially even though underlying row sources may have been accessed in parallel.  The join order makes a significant difference to the way in which the query is executed. By accessing particular row sources first, certain predicates may be satisfied that are not satisfied by with other join orders.  This may prevent certain access paths from being taken.

·         Make sure the join between 2 tables is done via indexed fields as much as possible. 

·         Also, if such an index exists, make sure you specify fields in the order they are defined by that index.

Nested Loops

This is a common type of processing a join between 2 row sources.  First we return all the rows from row source 1, then we probe row source 2 once for each row returned from row source 1.

Row source 1

Row 1 --------------       -- Probe -> Row source 2

Row 2 --------------       -- Probe -> Row source 2

Row 3 --------------       -- Probe -> Row source 2

Row source 1 is known as the outer table. Row source 2 is known as the inner table. Accessing row source 2 is known a probing the inner table. For nested loops to be efficient it is important that the first row source returns as few rows as possible as this directly controls the number of probes of the second row source. Also it helps if the access method for row source 2 is efficient as this operation is being repeated once for every row returned by row source 1.

Sort

Sorts are expensive operations especially on large tables where the rows do not fit in memory and spill to disk.

There are a number of different operations that promote sorts:

·         Order by clauses

·         Group by

·         Sort merge join

Note that if the row source is already appropriately sorted then no sorting is required. In other words, if the fields you sort by happen to be defined by an Index in that particular order then sort operation is avoided.  Therefore, whenever you see that an explicit sort operation has taken place, check if it can be avoided by using an index or sometimes just by making sure your are using an index’s fields in the right order.

If no such index exists and the number of rows to be sorted is of high volume, please consult a DBA as this may justify adding a new index. 

More Extensive Performance Testing

Special attention should be paid to background processes that are designed to process high volume tables.  A thorough performance testing exercise in a benchmark format may be called upon.