12 Creating and Persisting Aggregates for Oracle BI Server Queries

Most data warehouse practitioners create aggregated data tables to dramatically improve the performance of highly summarized queries. These aggregate tables store precomputed results that are aggregated measures (typically summed) over a set of dimensional attributes. Using aggregate tables is a typical technique used to improve query response times in decision support systems.

If you write SQL queries or use a tool that only understands what physical tables exist and not their meaning, then using aggregate tables becomes more complex as the number of aggregate tables increases. The aggregate navigation capability of the Oracle BI Server allows queries to use the information stored in aggregate tables automatically. The Oracle BI Server lets you concentrate on asking the right business question, and then the server decides which tables provide the fastest answers.

Oracle Business Intelligence has an aggregate navigation feature to take advantage of those aggregates in source databases (for more information, see Chapter 10). However, it can be time consuming to create and maintain the data aggregation, as well as load database scripts and the corresponding metadata mappings. For that reason, Oracle Business Intelligence provides an aggregate persistence feature that automates the creation and loading of the aggregate tables and their corresponding Oracle Business Intelligence metadata mappings.

This chapter explains how to set up and use aggregate persistence in Oracle Business Intelligence.

This chapter contains the following topics:

About Aggregate Persistence in Oracle Business Intelligence
Identifying Query Candidates for Aggregation
Using the Aggregate Persistence Wizard to Generate the Aggregate Specification
Writing the Create Aggregates Specification Manually
Running the Aggregate Specification Against the Oracle BI Server
Troubleshooting Aggregate Persistence

About Aggregate Persistence in Oracle Business Intelligence

Use the Aggregate Persistence feature to create aggregates for Oracle BI Server queries. The Aggregate Persistence Wizard lets you automate the creation of the aggregate specification script. When you run this script against a live Oracle BI Server, aggregate tables are created and are mapped into the metadata for navigation. When aggregates are persisted, indexes and statistics are created on relational tables for greater performance.

The Aggregate Persistence Wizard creates a SQL script that you can run on a scheduled basis against the Oracle BI Server. In the Aggregate Persistence Wizard, you specify the measures, dimensionality, and other parameters of each star or cube based on your performance design. The script should run after each load of the base-level tables, so that the aggregates are always synchronized with the detail-level data when the load window completes and users begin to run queries.

Aggregate creation runs against the master server in a cluster. It takes some time for the metadata changes to propagate to the slaves. The cluster refresh time is a user-controlled option and results might be incorrect if a query hits a slave server before it is refreshed. It is the administrator's responsibility to set an appropriate cluster refresh interval.

Aggregate persistence requires a dedicated connection pool to create tables or cubes in the target database that will hold the aggregates. Because the Oracle BI repository enables federation, the aggregated target can be on the same database as the detailed source, or in a completely different database. This dedicated connection pool must be created before you run the Aggregate Persistence Wizard, so it can be selected during the appropriate step of the wizard.

The default prefix SA_ is automatically added to dimension (level) aggregates. You can change this default prefix by updating the AGGREGATE_PREFIX parameter in the AGGREGATE_PERSISTENCE section of the NQSConfig.INI file:

AGGREGATE_PREFIX = "prefix_name" ;

The target schema used to store aggregates must be appropriately secured and should not allow public access. The schema should have privileges to connect, create, and drop tables and indexes. By default, only users who belong to the BIAdministrators group can manage aggregates.

Do not use aggregate persistence against tables with active Virtual Private Database (VPD) security filters. There is a possibility that the aggregate information might be persisted without the VPD filter, posing a security risk.

Identifying Que ry Candidates for Aggregation

When creating aggregates, you must identify which queries would benefit substantially from aggregated data. You will achieve the best results by aggregating to the highest level possible. To identify slow-running queries, perform the following tasks:

Enable usage tracking in the Oracle BI Server. Usage tracking statistics can be used in a variety of ways, such as database optimization, aggregation strategies, and billing users or departments based on the resources they consume. The Oracle BI Server tracks usage at the detailed query level. When you enable usage tracking, statistics for every query are written to a usage tracking log file or inserted into a database table.

Note:
It is strongly recommended that you use the direct insertion into a database method for usage tracking. See "Managing Usage Tracking" in Oracle Fusion Middleware System Administrator's Guide for Oracle Business Intelligence Enterprise Edition for full information about usage tracking.
Analyze the query run times and identify the slowest running queries as candidates for aggregation. The run time for creating aggregates is dependent on the type of aggregates selected by the user. Creating aggregates from large fact tables is slower than from smaller tables. You should carefully select the aggregates to be created.

Using the Aggregate Persistence Wizard to Generate the Aggregate Specification

You can use the Aggregate Persistence Wizard to create the SQL file that will be used to create and load aggregate tables and map them into the metadata. The resulting SQL file must be executed against a running Oracle BI Server.

Note:

It is strongly recommended that you use the Aggregate Persistence Wizard because it automatically enforces many of the constraints necessary when generating the aggregate specification. However, you can manually write the aggregate Logical SQL as an alternative to using the wizard. Make sure to follow the guidelines described in "Writing the Create Aggregates Specification Manually" if you choose to write your own aggregates specification.

To use the Aggregate Persistence Wizard:

In the Administration Tool, select Tools > Utilities > Aggregate Persistence, and then click Execute.
On the Select File Location screen, specify the complete path and file name of the aggregate creation script. You can specify a new or an existing file name.

Typically, when you run the SQL script against the Oracle BI Server, it creates DDL and runs it against the target database schema to create the aggregate tables, then loads them from the source, and finally creates the Oracle BI Server metadata so the aggregate navigation feature can use the new tables.

Alternatively, you can select Generate target DDL in a separate file if you want the DDL to be stored in a separate file from the Oracle BI Server SQL script. When you select this option, two SQL scripts are generated:
- The create aggregates script (script_name)
- The prepare aggregates script (script_name_DDL)
Both files are stored in the following location:
```
ORACLE_INSTANCE\bifoundation\OracleBIServerComponent\coreapplication_obisn\
aggr
```
Selecting Generate target DDL in a separate file gives you the flexibility to alter the auto-generated DDL and run it independently of the Oracle BI Server. For example, you may want to alter the storage parameter or index settings. When you select this option, you first make manual updates to the DDL file, then you run the DDL file (prepare aggregates), then you run the create aggregates script.

Click Next after you have finished specifying options on the Select File Location screen.
In the Select Business Measures screen, select the measures on which you want to aggregate. To do this, select a business model in the upper pane, then select a single fact table or a set of measures in the lower pane. You cannot select measures that span multiple fact tables. Use Ctrl-click to select multiple measures, or use Shift-click to select a range of consecutive measures.

Note that the View Script button is not available during the creation of the first aggregate table block.

Figure 12-1 shows the Select Business Measures screen.

Figure 12-1 Aggregate Persistence Wizard: Select Business Measures Screen

Description of "Figure 12-1 Aggregate Persistence Wizard: Select Business Measures Screen"

Click Next after you have selected the appropriate measures.
In the Select Levels screen, specify the level of aggregation by selecting a logical level for one or more dimensions. You can specify a surrogate key to be used for the fact-dimension join.

The default join option between the aggregated fact and dimension tables is the primary key defined in the logical level you selected. If the primary key of the level is large and complex, the join to the fact table is expensive, so using a surrogate key is recommended in this case. A surrogate key is an artificially generated key, usually a number. For example, a surrogate key in the level aggregate table would simplify this join, removing unnecessary (level primary key) columns from the fact table and resulting in a leaner fact table.

Using a surrogate key only changes the query response time, not the logical results of the queries. However, generating the surrogate keys can have the side effect of increasing the aggregate table load time. Therefore, the recommended setting is as follows:
- If the primary key for the logical level you have selected is already a single, numeric column, you typically should not select the Use Surrogate Key option since it may add to load processing time without producing a performance benefit.
- If the primary key for the logical level you have selected is a text string, or consists of multiple logical columns, you typically should use a surrogate key to improve the performance of the queries that join to that aggregate dimension. However, keep in mind that generating the surrogate key can increase the load time for that aggregate dimension table.
See "Adding Surrogate Keys to Dimension Aggregate Tables" for additional information about surrogate keys.

Figure 12-2 shows the Select Levels screen.

Figure 12-2 Aggregate Persistence Wizard: Select Levels Screen

Description of "Figure 12-2 Aggregate Persistence Wizard: Select Levels Screen"

Click Next after you have selected the appropriate level of aggregation.
In the Select Connection Pool screen, select the appropriate items to specify a location for the aggregate table.

A default aggregate table name is provided, and a prefix is added to the table name. The default prefix for the generated fact table is ag. For tables created for dimension (level) aggregates, the default prefix is SA_ and can be changed by updating the AGGREGATE_PREFIX property in NQSConfig.INI. See Oracle Fusion Middleware System Administrator's Guide for Oracle Business Intelligence Enterprise Edition for more information about changing configuration settings.

Figure 12-3 shows the Select Connection Pool screen.

Figure 12-3 Aggregate Persistence Wizard: Select Connection Pool Screen

Description of "Figure 12-3 Aggregate Persistence Wizard: Select Connection Pool Screen"

Click Next after you have provided connection pool information.
In the Finish screen, the View Script button becomes available for use, and the Logical SQL script appears for your review. Choose whether to define another aggregate (default) or end the wizard, and then click Next.
In the Finish Script screen, the complete path and file name appears. Click Finish.

See "Running the Aggregate Specification Against the Oracle BI Server" for information about using the SQL file to create aggregate tables.

Writing the Create A ggregates Specification Manually

If you choose not to use the Aggregate Persistence Wizard to create the script file, you can write the file manually. It is recommended that you use the Aggregate Persistence Wizard.

If you do not want the Oracle BI Server to modify your databases during aggregate creation, then you can specify this in the Aggregate Persistence Wizard by selecting the option Generate target DDL in a separate file. The Aggregate Persistence Wizard will create a DDL file (the "prepare aggregates" script) that you can use to create the empty aggregate tables. After this, you need to run the "create aggregates" script to populate the aggregate tables. This option provides some flexibility in case the database access to create tables is restricted. Note that you must run the prepare aggregates script before you run the create aggregates script.

This section contains the following topics:

What Constraints Are Imposed During the Create Process?
How to Write the Create Aggregates Specification
Adding Surrogate Keys to Dimension Aggregate Tables

What Constraints Are Imposed During the Create Process?

The following constraints are imposed during the create process:

Valid measures. A valid measure must have a valid aggregation rule. The following constraints apply to level-based measures:
- If the level is grand total alias, then that dimension must not be present in the list of levels for that aggregate specification.
- Any other level defined for this measure must be present in the list of levels for that aggregate specification.
If the above constraints are not met, then the entire aggregate specification is discarded. In addition, a measure is ignored by the create process if any of the following conditions are true:
- Measure is mapped to a session or repository variable.
- Measure is a derived measure.
- Measure has a default aggregation rule of None.
Measures that are ignored do not necessarily affect the aggregate specification. The remaining measures are used to create the aggregate.
Valid levels. A valid level must have a valid primary key. If a level is invalid, the aggregate specification is be discarded. Also, attributes of a level or its primary key are ignored if any of the following conditions are true:
- Attribute is mapped to session or repository variables.
- Attributes are not from the same logical table.
Valid aggregate specification. A valid aggregate specification has the following properties:
- Name length is between 1 and 18 characters (inclusive).
- At least one valid level must be specified.
- At least one valid measure must be specified.
- Must have a valid connection pool.
- Must have a valid output container (database/catalog/schema).
- Connection pool and container must belong to the same database.
- Only one level per dimension can be specified.
- Measures can only be from the same fact table.
- All logical components of the specification must be from the same subject area.
An aggregate specification is ignored if the name already exists in the output container because level aggregates are scoped by the entire database. However, if different catalogs or schemas are specified for the same fact aggregate name, it is allowed to have multiple facts with the same name but different scope in the same database.

Note that the aggregate specification is discarded if any dimension is not joined to a fact.

How to Write the Create Aggregates Specification

All metadata names (except for logical fact columns) are fully qualified. There are two modes of operation: Create and Delete. It is strongly recommended that you place all aggregate specifications under a single Create Aggregates statement.

Follow these guidelines when writing the aggregate specification:

Begin the script file with a Delete statement. It is essential to delete system-generated aggregates before creating new ones. This ensures that data is consistent and removes invalid or incomplete aggregates before you run the Create operation. The following statement is the syntax for deleting aggregates:
```
Delete aggregates;
```

The next statement should be a Create statement. The following is the syntax for creating aggregates:

Create|Prepare aggregates 
aggr_name_1
for  logical_fact_table_1 [(logical_fact_column_1, logical_fact_column_2,…)]   
at levels (level_1, level_2, …)
using connection pool connection_pool_name_1
in schema_name_1
[ ,aggr_name_2
for logical_fact_table_3 [(logical_fact_column_5, logical_fact_column_2,…)]   
at levels (level_3, level_2, …)
using connection pool connection_pool_name_2
in schema_name_2] ;

To specify multiple aggregates in a single Create Aggregates statement, follow these guidelines:
- Ensure that each of the multiple aggregate specifications are separated by a comma, and the entire aggregate creation script is terminated with a semicolon.
- In this file, only one Delete Aggregates statement should be specified at the beginning. Make sure that only one delete is issued per ETL run (unless a reset is called for).
  
  Caution:
  Any aggregate scripts that are run after the first one should not have a Delete Aggregates statement, or all previously created aggregates are removed.

For information about creating aggregates with surrogate keys, see the following section.

Adding Surrogate Keys to Dimension Aggregate Tables

The join option default between fact and level aggregate tables uses primary keys from the level aggregate. If the primary key of the level is large and complex (composite of many columns), then the join to the fact table is expensive. A surrogate key is an artificially generated key, usually a number. A surrogate key, in the level aggregate table, simplifies this join and removes unnecessary columns (level primary key) from the fact table, resulting in a smaller-sized fact table. Adding surrogate keys to the dimension (level) aggregate tables can simplify joins to the fact tables and might improve query performance. Additionally, a surrogate key makes sure that each aggregate table has a unique identifier.

There might be cases in which a level is shared among multiple fact tables. One fact might use surrogate keys, and another might use primary keys from the dimension aggregate. The following are some options for resolving this issue:

Set a metadata property for levels that indicates whether to use surrogate keys or primary keys.
Always create a surrogate key for a level aggregate (relatively low cost operation). Then, decide later if the fact aggregate should join to it using a surrogate or primary key.

An alternative to specifying the join type for each dimension is to specify if surrogate keys should be used for the entire star. This would result in simpler syntax, but would also restrict the available user options and slow the aggregate creation process.

About the Create/Prepare Aggregates Syntax

The following syntax for create/prepare aggregates contains the change for [Using_Surrogate_Key]. The surrogate key option can be specified for each level. If unspecified, the fact and dimension tables are joined using the primary key from the level aggregate.

Create|Prepare aggregates 
aggr_name_1
[file output_file_name]
for  logical_fact_table_1 [(logical_fact_column_1, logical_fact_column_2,…)]   
at levels (level_1 [Using_Surrogate_Key], level_2, …)
using connection pool connection_pool_name_1
in schema_name_1
[ ,aggr_name_2
for logical_fact_table_3 [(logical_fact_column_5, logical_fact_column_2,…)]   
at levels (level_3, level_2, …)
using connection pool connection_pool_name_2
in schema_name_2] ;

About Surrogate Key Output from Create/Prepare Aggregates

The changes to the current process are restricted to the physical metadata layer in the repository and the database.

When you use the Using_Surrogate_Key join option, the following describes the results:

For a level aggregate, the following occurs:
- In the physical metadata, the following occurs:
  - The level aggregate table has a new column called levelName_upgradeIDSK (check for collisions). This is the surrogate key column for the dimension aggregate. Note that levelName is truncated if the total number of characters exceeds 18.
  - The type of this column is SMALLUINT.
- In the database, the following occurs:
  - The level aggregate table also has a corresponding column called levelName_upgradeIDSK. Again, levelName is truncated if the total number of characters exceeds 18.
  - It can be populated using RCOUNT().
For a fact aggregate, the following occurs:
- In the physical metadata, the following occurs:
  - The fact aggregate table no longer contains columns from the level's primary keys.
  - Instead, a new column that corresponds to the level aggregate's surrogate key is added to the table.
  - The type of this column is identical to the level's surrogate key.
  - The column has the same name as that in the level aggregate (check for collisions).
  - The fact table and the level table are joined using this surrogate key only.
- In the database, the following occurs:
  - The fact aggregate table also has the corresponding surrogate key.
  - It is populated using new capabilities to be available through Populate.

Running the Aggregate Specification Against the Oracle BI Server

Before you run the aggregate specification against the Oracle BI Server, you should set an appropriate logging level. Trace logs are be logged to nqquery.log if the logging level is at least 2. The logging events include the aggregate execution plan and the order in which the aggregates are created and deleted. Higher logging levels provide more details about the query and execution plans. Error logs are logged to nqquery.log if the logging level is at least 1, and to nqserver.log regardless of the logging level.

After generating the SQL script file and setting the logging levels, you can run the SQL script using nqcmd. The SQL file must be executed against a running Oracle BI Server. See "Testing and Refining the Repository" for more information about running nqcmd.

After executing the SQL script, aggregates are created and persisted in the Oracle BI Server metadata, as well as in the back-end databases.

Troubleshooting Aggregate Persistence

The following is a list of some reasons errors can occur:

Network failure.
No disk space on the database.
Bad aggregate request.

If there is an error in the creation of any aggregate, then the entire aggregate request is terminated and subsequent aggregates are not created. Aggregates that are already created and checked in remain checked in. If there are errors, you must remove them at the time of the error or at the next ETL run in one of the following ways:

Manually remove the aggregates from the metadata and the database.
Automatically remove all the aggregates using the Delete Aggregates specification.