Performance Recommendations
This section provides recommendations for configuring the Oracle database to improve overall product performance, using methods such as partitioning, parallelism, and archiving.
Partitioning
Partitioning helps scale a data warehouse by dividing database objects into smaller pieces, enabling access to smaller, more manageable objects. Direct access to these smaller objects addresses the scalability requirements of data warehouses.
Scanning a small table takes less time than scanning a large one. Queries against partitioned tables may access one or more partitions that are small in contrast to the total size of the table. Similarly, queries can take advantage of partition elimination on indexes: it takes less time to read a smaller portion of an index from disk than to read the entire index. Index structures that share the partitioning strategy of the table, such as local partitioned indexes, can be accessed and maintained on a partition-by-partition basis.
The partitioning of facts is explained in the sections that follow.
Partitioning of the Snapshot Facts
Most of the out-of-the-box snapshot facts are delivered as monthly snapshots. The fact has rows for each month from the beginning of the implementation.
All snapshot facts should be partitioned by the snapshot date. The table below lists the snapshot facts and the column on which the partition should be created. When converting an existing fact into a partitioned fact, make sure that indexes that do not include the partition key columns are created as local indexes. Also create partitions for future periods so that future snapshot loads do not fail while adding data.
For monthly snapshots, a primary partition based on the year and a sub-partition based on the calendar month work well. For snapshots that are not monthly, a different partitioning scheme can be used.
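As an illustration, the year/month scheme described above could be expressed as a composite range-list partitioned table. The table and column names below (CF_MONTHLY_SNAP, SNAPSHOT_DT, CAL_MONTH_NBR, PREM_KEY) are placeholders, not actual product objects; adapt them to the specific snapshot fact being converted.

```sql
CREATE TABLE cf_monthly_snap (
  snapshot_dt    DATE    NOT NULL,   -- partition key: snapshot date
  cal_month_nbr  NUMBER  NOT NULL,   -- calendar month (1-12) for sub-partitioning
  prem_key       NUMBER  NOT NULL,   -- example dimension key
  measure_amt    NUMBER
)
PARTITION BY RANGE (snapshot_dt)
SUBPARTITION BY LIST (cal_month_nbr)
SUBPARTITION TEMPLATE (
  SUBPARTITION m01 VALUES (1),  SUBPARTITION m02 VALUES (2),
  SUBPARTITION m03 VALUES (3),  SUBPARTITION m04 VALUES (4),
  SUBPARTITION m05 VALUES (5),  SUBPARTITION m06 VALUES (6),
  SUBPARTITION m07 VALUES (7),  SUBPARTITION m08 VALUES (8),
  SUBPARTITION m09 VALUES (9),  SUBPARTITION m10 VALUES (10),
  SUBPARTITION m11 VALUES (11), SUBPARTITION m12 VALUES (12)
)
(
  PARTITION p2024 VALUES LESS THAN (DATE '2025-01-01'),
  PARTITION p2025 VALUES LESS THAN (DATE '2026-01-01'),
  -- future-period partition so upcoming snapshot loads do not fail:
  PARTITION p2026 VALUES LESS THAN (DATE '2027-01-01')
);

-- Index on a non-partition-key column, created LOCAL as recommended:
CREATE INDEX cf_monthly_snap_ix1 ON cf_monthly_snap (prem_key) LOCAL;
```

New yearly partitions would be added ahead of time with `ALTER TABLE ... ADD PARTITION` as part of routine maintenance.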
Partitioning of the Accumulation Facts
The accumulation facts can be partitioned based on a date key. However, because the data in an accumulation fact is spread across multiple dates, list-based partitioning will not work for these facts. Instead, a range-based partitioning scheme should be employed on the primary date_key column of the specific fact.
The following query gets the start and end range values from the CD_DATE dimension. Using this range, values for individual partitions can be defined.
select abs_month_nbr part_name_suffix,
       min(date_key) part_range_start,
       max(date_key) part_range_end
  from dwadm.cd_date
 group by abs_month_nbr
 order by 1;
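The ranges returned by this query map each month to one range partition on the fact's primary date key. The following sketch assumes numeric date keys in a YYYYMMDD-style form and uses a placeholder fact name (CF_ACCUM) and date key column (ARREARS_DATE_KEY); each partition's upper bound is the next month's part_range_start.

```sql
CREATE TABLE cf_accum (
  arrears_date_key  NUMBER NOT NULL,  -- primary date key of the fact
  acct_key          NUMBER NOT NULL,  -- example dimension key
  balance_amt       NUMBER
)
PARTITION BY RANGE (arrears_date_key)
(
  -- one partition per abs_month_nbr; part_name_suffix names the partition
  PARTITION p_202401 VALUES LESS THAN (20240201),
  PARTITION p_202402 VALUES LESS THAN (20240301),
  PARTITION p_202403 VALUES LESS THAN (20240401),
  -- catch-all so loads beyond the defined ranges do not fail:
  PARTITION p_max    VALUES LESS THAN (MAXVALUE)
);
```

Partitions beyond the defined range can later be carved out of p_max with `ALTER TABLE ... SPLIT PARTITION` as new months of data arrive.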
Parallelism
The database can take advantage of the distinct data sets in separate partitions if you use parallel execution to speed up queries, Data Manipulation Language (DML), and Data Definition Language (DDL) statements. Individual parallel execution servers can work on their own data sets, identified by the partition boundaries.
Parallel execution enables the application of multiple CPU and I/O resources to a single database operation. It dramatically reduces response time for data-intensive operations on the large databases typically associated with decision support systems (DSS) and data warehouses. You can also use parallel execution on an online transaction processing (OLTP) system for batch processing or schema maintenance operations, such as index creation. Parallel execution, also called parallelism, breaks down a task so that, instead of one process doing all of the work in a query, many processes do parts of the work at the same time.
For example, when four processes combine to calculate the total sales for a year, each process handles one quarter of the year instead of a single process handling all four quarters. The improvement in performance can be significant. Parallel execution improves processing for:
- Queries requiring large table scans, joins, or partitioned index scans
- Creation of large indexes
- Creation of large tables (including materialized views)
- Bulk insertions, updates, merges, and deletions
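For example, parallelism can be enabled at the statement level with a hint, at the object level as a default degree, or for DDL within a session. The table and index names below (DW_SALES, DW_SALES_IX1) are illustrative, not product objects.

```sql
-- Statement-level: request a degree of parallelism of 8 for one query
SELECT /*+ PARALLEL(s, 8) */ SUM(s.sales_amt)
  FROM dw_sales s
 WHERE s.sales_dt >= DATE '2024-01-01';

-- Object-level: set a default degree of parallelism on the table
ALTER TABLE dw_sales PARALLEL 8;

-- Parallel DDL: speed up creation of a large local index
ALTER SESSION ENABLE PARALLEL DDL;
CREATE INDEX dw_sales_ix1 ON dw_sales (sales_dt) LOCAL PARALLEL 8;
```

The appropriate degree of parallelism depends on available CPU and I/O capacity; oversubscribing parallel servers can degrade overall throughput.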
Note: For details on parallelism, partitioning, and other performance enhancement options, see the Oracle Database VLDB and Partitioning Guide 19c.
Optimizing the Top N Answers
The Top N charts have to go through millions of records to find the Top N objects that meet the criteria. The Top N materialized views rearrange and partition the data so that data reads are optimal. However, depending on the amount of data, additional configuration may at times be required to reduce the size of the data set being scanned to identify the Top N objects.
For example, a large utility with six million customers may have around six million service points in their Oracle Utilities Meter Data Management application. This means that the Consumption fact in the Oracle Utilities Meter Data Management star schema may have several million records for every snapshot month, and therefore several million records for each month-level partition on the Consumption Detail Level materialized view. The detail-level pages (Top N Analysis and Unreported Usage Details) on the Usage Summary dashboard under Oracle Utilities Meter Data Analytics try to access a huge volume of data, which can sometimes result in sluggish performance. The recommendation in such scenarios is to set a key prompt filter, such as 'City', to mandatory. This ensures the report looks at a smaller data set, which in turn improves report performance. You can choose to make any set of filters mandatory based on your data requirements.
Archiving the Historical Data
Assuming that the facts are partitioned as per the recommendations above, historical data can be archived by selecting specific partitions and moving them to a backup. Archival rules should be based on your data retention policies, and the process can be set up as a scheduled activity.
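A common pattern for archiving a partition is to swap it into a standalone table via a partition exchange, back that table up, and then drop the now-empty partition. The sketch below uses the placeholder range-partitioned fact CF_ACCUM and partition P_202401 from the earlier discussion; it assumes a range-only (non-composite) partitioned table, since exchanging a composite partition requires a matching partitioned standalone table.

```sql
-- 1. Create an empty standalone table with the same structure as the fact:
CREATE TABLE cf_accum_arch AS
SELECT * FROM cf_accum WHERE 1 = 0;

-- 2. Swap the historical partition's segment into the standalone table
--    (a metadata operation; no data is physically copied):
ALTER TABLE cf_accum
  EXCHANGE PARTITION p_202401 WITH TABLE cf_accum_arch;

-- 3. Back up cf_accum_arch (for example, with Data Pump export),
--    then remove the now-empty partition from the fact:
ALTER TABLE cf_accum DROP PARTITION p_202401;
```

If local indexes exist on the fact, the exchange may need the `INCLUDING INDEXES` clause and matching indexes on the standalone table to avoid index rebuilds.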