Tuning, 4 of 5

Tuning Your Data Warehouse

Overview

The speed and efficiency with which Oracle can respond to a query strongly affects the response time experienced by end users. All data warehouses can benefit from the creation of the best possible indexes and materialized views. The wrong type of indexes or materialized views generated by the wrong type of SQL commands may degrade performance rather than improve it.

Extremely large data warehouses should be striped and partitioned. Before taking this step, however, you should confirm that the data at the lowest levels of aggregation are truly needed for the types of analysis being performed. Eliminating unnecessarily low-level data from your data warehouse is much easier than striping.

If your data warehouse is configured in a snowflake schema, you should look at the frequency that queries must perform joins on dimension tables. Denormalizing can improve performance significantly.

You might also want to consider storing some of the data in a persistent analytic workspace. This option is discussed in "Data Storage Alternatives".

Data types of key columns

In a dimension table, the key column should have a NUMBER data type for the best performance. A primary index is always created on the key column to ensure that each row has a unique value. A NUMBER data type reduces the amount of disk space needed to store the index values for the key, since the index values are also stored as numbers instead of text strings. The smaller the index, the faster the database can search it.

The larger the number of values in the dimension, the greater the improvement in performance of NUMBER keys over VARCHAR (or other text) keys. Since time dimensions are typically rather small, a NUMBER key will improve performance only slightly. Thus, time dimensions can have either NUMBER or VARCHAR keys with little difference in performance between the them. However, dimensions for products and geographical areas often have thousands of members, and the performance benefits of NUMBER keys can be significant.

Indexing

Indexing is a vital component of any data warehouse. It allows Oracle to select rows quickly that satisfy the conditions of a query, without having to scan every row in the table. B-tree indexes are the most common; however, bitmap indexes are often the most effective in a data warehouse.

A column identifying gender will have in each cell one of two possible values to indicate male or female. Because the number of distinct values is small, the column has low cardinality. In dimension tables, the parent level columns also have low cardinality because the parent dimension values are repeated for each of their children.

A column containing actual sales figures might have unique values in most cells. Because the number of distinct values is large, columns of this type have high cardinality. Most of the columns in fact tables have high cardinality. Dimension key columns have extremely high cardinality because each value must be unique.

In fine-tuning your data warehouse, you may discover factors other than cardinality that influence your choice of an indexing method. With that caveat understood, here are the basic guidelines:

Create bitmap indexes for columns with low to high cardinality.
Create B-tree indexes for columns with very high cardinality (that is, all or nearly all unique values).

Striping and partitioning

Striping and partitioning are techniques used on large databases that contain millions of rows and multiple gigabytes of data. Striping is a method of distributing the data over your computer resources (such as multiple processors or computers) to avoid congestion when fetching large amounts of data. Partitioning is a method of dividing a large database into manageable subsets. Using partitions, you can reduce administration down-time, eliminate unnecessary scans of tables and indexes, and optimize joins.

If other methods of optimizing your database have not been successful in bringing performance up to acceptable standards, then you should investigate these techniques.

Materialized views

Oracle will rewrite queries written against tables and views to use materialized views whenever possible. For the optimizer to rewrite a query, it must pass several tests to verify that it is a suitable candidate. If the query fails any of the tests, then the query is not rewritten, and the materialized views are not used. And when the aggregate data must be recalculated at runtime, performance degrades.

All materialized views for use by the OLAP API must be created from within the OLAP management tool of OEM. Materialized views created elsewhere in Oracle Enterprise Manager or directly in SQL are unlikely to match the SQL statements generated by the OLAP API, and thus will not be used by the optimizer. Refer to "Creating Materialized Views" for information about generating appropriate materialized views for OLAP applications.

Related information

Refer to the Oracle9i Data Warehousing Guide for detailed information on indexing, partitioning, and striping.

Your performance solution might also be in designing a hybrid OLAP system in which your most frequently accessed data is stored in a a persistent analytic workspace. Refer to "Data Storage Alternatives".