Using Indexes and PS/nVision Tuning

This section provides an overview of indexes and discusses how to use:

Optimizers.
Filter factors.
Index matching.
Ledger access paths.
Access path analysis.
Index column suggestions.

An index is a fast way to find data. At a simple level, an index works like the tabs on a large dictionary; you can go directly to all the words that begin with a particular letter. After that, you need to do some additional searching, taking advantage of the fact that the words are stored in alphabetical sequence. The range of words on a page is generally printed at the top, so you do not have to scan through individual words until you find the page you want. Many database systems include a type of index, often called a primary or clustered index, that has the same sequence as the data.

But suppose you are searching an atlas, where data is generally stored in geographical sequence. If you are looking up Majorca, you are likely to look it up first in the alphabetical index, and then search the page that has a map of the Mediterranean. Data is accessed through an attribute different from its storage attribute.

Suppose you wan to find all the words in the dictionary derived from Finnish words. Unless you had a dictionary with an etymological index, your scan of the data pages would be very time-consuming. This type of access should be avoided when accessing large database tables, because it is slow even on the fastest server.

Typically, the Where clause in a query contains a mixture of criteria resolvable through an index and criteria resolvable only through access to the data pages. To be efficient, use the index criteria to limit the number of data rows searched.

Multi-Column Indexes

Relational databases allow indexes over multiple columns, so that if you have Where criteria for two or more columns in the index, the database manager can use one index to satisfy criteria on multiple columns at a time. Having the pertinent criteria columns in an index, however, does not guarantee that index will be used or that it will be used effectively on all the columns that have criteria.

Most relational database systems include a cost-based optimizer, a complex program responsible for choosing an access path to satisfy a particular query, such as a Select statement issued by PS/nVision. Using statistics stored in the database, the optimizer tries to determine the index to use for each of the tables accessed in the query and the table access sequence that yields the data with minimal searching.

Some database optimizers have a choice between cost-based and rule-based optimizers. For PS/nVision, and for most PeopleSoft software, you should use the cost-based optimizer because:

PS/nVision creates dynamic SQL based on the report criteria you provide.
Rule-based optimizers are designed for static SQL that is written following its rules.
Cost-based optimizers adapt much better to changes in data and indexes because they use statistics in their optimum access path calculations.

This discussion of tuning PS/nVision's SQL performance assumes use of a cost-based optimizer.

Although it might be named differently, a filter factor applies to all optimizers. It estimates how effective a particular index will be in narrowing a search.

Assume we have a table in which financial results are stored by fiscal year, period, and account number. The table has two indexes, one on fiscal year and another on account. Assume that our query contains the following:

WHERE FISCAL_YEAR = 1994 AND ACCOUNT='500120'

If the table has data for 4 fiscal years and 800 accounts, then the filter factor for the fiscal year index is ¼ or 0.25; and the filter factor for the account index is 1/800 or 0.00125. Using the index for fiscal year narrows the search to about one fourth of the total table, which is not that great. But using the index for account narrows the data searched to about one eight hundredth of the total table and will be much more efficient. Of course, an index combining fiscal year and account would be even better.

The measure of the selectivity of an index, or a column within an index, is often called its cardinality. Cardinality is the number of discrete values in that column or the number of discrete combinations represented by a multicolumn index. Cardinality is one of the most important statistics used by optimizers to select indexes and access paths. Like most other statistics, cardinality is updated on request rather than constantly.

When data changes substantially, update the statistics so the optimizer has accurate information. Updating statistics requires different processes on different database platforms.

See the PeopleTools Installation Guide.

The effective filter factor for an index is the combined cardinalities for the index columns actually used in a particular access. For example, if an index is built over FISCAL_YEAR, LEDGER, and ACCOUNT, and the table contains 4 years, 5 ledgers, and 800 accounts, the potential filter factor is 1/(4*5*800), or 1/16000, or 0.0000625. However, if the ACCOUNT field in the index couldn’t be used because of the nature its criteria, the filter factor would be only 1/20, which isn’t very selective.

These general rules apply to matching index columns:

Database systems provide direct access to data very quickly if the criteria can be processed through an optimized look-up process (such as searching a tree structure) within the index.
Scanning index pages to satisfy criteria is much slower, although it is usually much faster than scanning the corresponding data.
Columns are matched from left to right in the order they were specified when the index was created.

If, for example, an index is created over DEPTID, BUSINESS_UNIT, and ACCOUNT, but no criteria were provided for BUSINESS_UNIT, only the DEPTID field in the index would be matched, even if criteria were specified for ACCOUNT.
To get index matching on multiple columns, the leftmost columns must have simple criteria, often equality (such as FISCAL_YEAR=1996).
More complex criteria, such as In (...), Between, or a Join to another table, generally either prevent a random-access match on the index column or prevent matching any of the columns to its right.

As a general rule, it is most efficient to access ledger data through trees by accessing the tree table first, then using the detail ranges (or values) for the selected nodes to select the desired rows from the ledger. If the Select statement joins multiple trees, the database engine should select the one that best fits the available indexes or the one with the highest cardinality (if multiple indexes are possible).

Different techniques exist for showing the access path for a given SQL statement.

DB2

First, create a PLAN_TABLE if your database doesn’t already have one. You can find a sample Create statement in the DB2 Performance Tuning guide.

Include the SQL statement in the following and execute it via a utility like SPUFI:

DELETE FROM PLAN_TABLE WHERE QUERYNO=nnn;
EXPLAIN PLAN SET QUERYNO=nnn FOR statement;

where nnn is a number you assign to this statement.

Retrieve the plan from the PLAN_TABLE with the following Select statement:

SELECT QBLOCKNO, PLANNO, TNAME, ACCESSNAME, METHOD, ACCESSTYPE, MATCHCOLS, INDEXONLY, PREFETCH, SORTC_GROUPBY
FROM PLAN_TABLE
WHERE QUERYNO=nnn
ORDER BY QBLOCKNO, PLANNO;

The table contains other plan information; these are generally the most pertinent columns for PS/nVision queries.

Oracle

First, create a PLAN_TABLE if your database does not already have one. Here is a sample Create statement:

CREATE TABLE PLAN_TABLE(STATEMENT_ID VARCHAR2(254),
TIMESTAMP DATE,
REMARKS VARCHAR2(80),
OPERATION VARCHAR2(30),
OPTIONS VARCHAR2(30),
OBJECT_NODE VARCHAR2(128),
OBJECT_OWNER VARCHAR2(30),
OBJECT_NAME VARCHAR2(30),
OBJECT_INSTANCE  NUMERIC,
OBJECT_TYPE VARCHAR2(30),
OPTIMIZER VARCHAR2(255),
SEARCH_COLUMNS NUMERIC,
ID NUMERIC,
PARENT_ID NUMERIC,
POSITION NUMERIC,
other long);

You can use SQL*Plus to evaluate access plans interactively. First, include the SQL statement in the following code and execute it.

DELETE FROM PLAN_TABLE WHERE QUERYNO=nnn;
EXPLAIN PLAN SET STATEMENT_ID = 'nnn' FOR
statement;

where nnn is an identifier you assign to this statement.

Retrieve the plan from the PLAN_TABLE with the following Select statement:

SELECT LPAD(' ',2*LEVEL)||OPERATION,OPTIONS,OBJECT_NAME,
  OBJECT_INSTANCE,SEARCH_COLUMNS
FROM PLAN_TABLE
WHERE STATEMENT_ID='nnn'
CONNECT BY PRIOR ID = PARENT_ID
  AND STATEMENT_ID='nnn'
START WITH ID = 1
  AND STATEMENT_ID='nnn'
ORDER BY ID;

The plan is retrieved in a hierarchical tree format, in which the steps are evaluated from inside out, and then top to bottom. The first step listed (not indented) is actually the final step in the plan, and it is preceded by the step on the following line. For example, a Join is presented first, followed by two indented lines showing the two tables joined and the indexes used to access them.

Designing indexes for PS/nVision can be difficult because different reports can have different criteria. The following table lists the index columns and suggestions.

Index Column	Suggestions
Fiscal_Year	When using a TimeSpan (required for ledger reporting but optional for queries), PS/nVision always generates an equality for fiscal year (for example, FISCAL_YEAR=1996). If the TimeSpan requires data from multiple fiscal years, then PS/nVision generates multiple Select statements, one for each fiscal year, which makes Fiscal_Year a good candidate for the first column in a multicolumn index.
Ledger	When accessing ledger data, PS/nVision retrieves data from only one ledger at a time, so this column is guaranteed to have an equality. Thus, Ledger is a good choice as the second column in a multicolumn index.
Business_Unit	If you use the report request option to retrieve data from the requesting business unit only, PS/nVision generates an equality (for example, BUSINESS_UNIT=‘M04’) for this column. If this is the most common way of requesting reports or if you use a scope to get instances by business unit, then use business unit as the second or third column in a multicolumn index, especially if you have many business units in the same ledger table. If most of your reporting accesses multiple business units in a single instance, position it in the index as you would any other field.
Accounting_Period	When using TimeSpans, accounting period is specified using either an equality (ACCOUNTING_PERIOD=12) or a range (ACCOUNTING_PERIOD BETWEEN 10 AND 12). If you do a lot of reporting for the current period or other single periods (such as current period a year ago), performance may be improved using an alternate index beginning with accounting period. A good optimizer uses this index only when accounting period has an equality.
Account	In many companies, Account is the field with the highest cardinality. It also has criteria in almost all ledger reports, in part because PS/nVision enforces this rule. Use it as the next index column following the columns that you expect to have equalities.

Using Indexes and PS/nVision Tuning

Understanding Indexes

Multi-Column Indexes

Optimizers

Filter Factors

Index Matching

Ledger Access Paths

Access Path Analysis

DB2

Oracle

Index Suggestions