Compression and Partitioning

6 Compression and Partitioning

This chapter describes how Retail Insights implements compression and offers a discussion of Oracle partitioning.

Overview of Compression

Although data warehouses are often very large, the amount of detail generated in some Retail Insights tables is enormous even by usual standards. That is, a retailer with 500,000 items and 500 locations would generate 250,000,000 new rows each day. Storing this amount of uncompressed data is impractical from a disk storage perspective, in the cost to store the rows, the cost to perform backups, and other database maintenance operations.

One approach that Retail Insights uses to reduce the data volume is compression. This chapter describes:

What compression does
Mechanics of compression
Which tables are currently compressed
Oracle features that are related to compression
Strategies for implementing compressed tables

What Compression Does

Compression refers to storing physical data that only reflects changes to the underlying data source, and filling in the gaps between actual data records through the use of database views. This method is engaged primarily for subject areas that are perpetual, such as inventory. That is, when querying sales data, a valid sale record exists (a sale occurred) or a record does not exist (no sale occurred). However, when querying for on-hand inventory, even if no change occurred to the inventory on the date desired, a valid value is still required. One way to resolve this discrepancy is to store a record for every day and a valid item-location combination as mentioned above. Another method, compression, allows for the storage of only changes to the inventory position. The query is resolved by looking backward through time from the desired date (if no change record exists on that date) until an actual change record is found. This method returns the correct current data with the minimum requirements necessary for processing and storing data.

Retail Insights compression is different with Oracle DB table compression. Oracle DB table compression compress data by eliminating duplicate values within a data block. Any repetitive occurrence of a value in a block is replaced by a symbol entry in a "symbol table" within the data block. So for example DEPT_NUM=10 is repeated five times within a data block, it will be only stored once and for the other four times a symbol entry will be stored in symbol table. Oracle database table compression can also significantly reduce disk and buffer cache requirements for database tables while improving query performance. Oracle database compressed tables use fewer data blocks on disk, reducing disk space requirement.

Mechanics of Compression

The purpose of decompression views is to give the application the illusion that there is a record for each possible combination (that is, an item-location-day record for each permutation), when in fact there is not. Thus, the fact of whether a table is compressed or not should not be visible to the application that queries data from that table. A compressed table is made up of two distinct parts: a 'seed' that consists of all existing combinations at a point in time (typically the first day or week of the table or partition) and the changed data since that time. Retail Insights compressed tables use FROM_DT_WID and TO_DT_WID columns to indicate the time range in which records are valid. When resolving a query for a particular record, the decompression view provides the latest record for the requested item and location with the maximum day that is less than or equal to the requested day. A decompression view needs to encompass both the seed and all of the changed data since that seed. A decompression view compares FROM_DT_WID and TO_DT_WID of records with FROM_VALUE and TO_VALUE on partition mapping table W_RTL_PARTITION_MAP_G to make sure that a right partition is used by the decompression view.

To illustrate how the decompression views actually work, assume the following:

The user is interested in the inventory position of item 10 at location 10 on 1/23/02.
The seed was done on 1/1/02. Changes were posted on 1/4/02, 1/15/02, and 1/30/02.
The row that is presented to the application by the decompression view is the row on 1/15/02, because it is the latest date that is less than or equal to the requested date.

As a second example, assume that the inventory position of item 10, location 10, day 1/3/02 was desired. Because there was no change record less than or equal to the desired date, the seed record from 1/1/02 will be presented to the application.

Compression's performance is excellent when the user is querying for a single day (as in the example above). When querying over a group of days, however (that is, all of the inventory positions at a given location on a given day), the performance can be unacceptable. Even though the user is requesting a group of information back, and in most cases the database can process groups of information efficiently, each individual row must be evaluated individually by the decompression view and cannot be processed as a group. To counteract the slow performance of these summary operations, you may take advantage of compressed table partition seeding (see "Overview of Partitioning Strategies").This partition seeding utilizes the latest position status tables (also known as 'current' tables). An example is the W_RTL_INV_IT_LC_G table, which holds the current decompressed position for every item and location on the W_RTL_INV_IT_LC_DY_F table. This position can be used as a partition seed. This position is also utilized by base Retail Insights code during major change fact seeding.

Compressed Tables and 'CURRENT' Tables

The table below illustrates the compressed tables within Retail Insights, along with their corresponding 'CURRENT' tables.

Table 6-1 Compressed Tables and CURRENT Tables

Compressed Tables	Current Tables
W_RTL_PRICE_IT_LC_DY_F	W_RTL_PRICE_IT_LC_G
W_RTL_BCOST_IT_LC_DY_F	W_RTL_BCOST_IT_LC_G
W_RTL_NCOST_IT_LC_DY_F	W_RTL_NCOST_IT_LC_G
W_RTL_CO_LINE_STATUS_F	W_RTL_CO_LINE_STATUS_G
W_RTL_COMP_PRICE_IT_LC_DY_F	W_RTL_COMP_PRICE_IT_LC_G
W_RTL_INVU_IT_LC_DY_F	W_RTL_INVU_IT_LC_G
W_RTL_INV_IT_LC_DY_F	W_RTL_INV_IT_LC_G
W_RTL_PO_ONALC_IT_LC_DY_F	W_RTL_PO_ONALC_IT_LC_G
W_RTL_PO_ONORD_IT_LC_DY_F	W_RTL_PO_ONORD_IT_LC_G

Coping with Slowly Changing Dimension Type 2

Fact Close Program (factcloseplp.ksh)

On a compressed fact table, a record is only posted to the table when there is a change in one of the fact attributes. If there is no activity, no record is posted. Decompression views then fill in the gaps between physically posted records to ensure that a fact record appears for each item-location-day combination in the user interface. However, when an item, location, or department is closed or major-changed, any fact record with those dimensions becomes inactive. The decompression views need to be informed to stop filling in the gap after the last record was posted. To accomplish this instruction, scenario PLP_RetailFactCloseFact (called by factcloseplp.ksh) first queries the W_RTL_PROD_RECLASS_TMP and W_RTL_ORG_RECLASS_TMP tables to determine the compressed item-location facts that need to be closed today. The PLP_RetailFactCloseFact scenario then updates TO_DT_WID to the current date WID to stop the record. The decompression view fills in records up to the day that is in the range between FROM_DT_WID and TO_DT_WID.

Fact Open Program (factopenplp.ksh)

Retail Insights Data Compression tables require seeding when a major change in the product and organization dimension causes new surrogate keys to be created for items or locations. Seeding the compressed tables is required because the new key represents a new hierarchy relationship. If the new key is not represented on the compressed table, the compression view does not pick up any data from the day the old dimensions were closed to the day a record with the new dimensions is posted to the compressed fact tables. This missed data causes inaccuracy in query results and incorrect data aggregation.

To accomplish this seeding scenario, PLP_RetailFactOpenFact (called by factopenplp.ksh) first queries the W_RTL_PROD_RECLASS_TMP and W_RTL_ORG_RECLASS_TMP tables to determine what compressed item-location facts need to be closed today. The PLP_RetailFactOpenFact scenario then inserts seeded (closed) records for tomorrow's FROM_DT_WID, indicating that the closed fact records are no longer valid beginning tomorrow, when the newly seeded records (from PLP_RetailFactOpenFact) become active. In the case of the compressed week table, W_RTL_INV_IT_LC_WK_A, PLP_RetailFactOpenFact inserts seeded records with next week's warehouse ID.

Oracle Table Compression

Oracle table compression not only helps customers save disk space, it also helps to increase cache efficiency since more blocks can fit in the memory. Advanced Compression is available for Oracle 11g Enterprise Edition and Hybrid Columnar Compression is available for Exadata only.

Since compression could cause contention when tables get updated, it is suggested users only compress non-current partitions and leave the current partition uncompressed. This partial compression approach has proven to be a valuable implementation option.

Overview of Partitioning Strategies

This section describes partitioning strategies for Retail Insights data marts. Although optional, partitioning provides powerful performance benefits, and therefore is highly recommended. Tables in the RA_partitioned_tables.xls spreadsheet (see the Oracle Retail Insights Installation Guide) are highly recommended to be partitioned. If a report runs slowly and a fact table in the query is not partitioned, that fact table may be a good candidate for partitioning. For large tables, such as the inventory, pricing, cost and sales tables, splitting them into table partitions can provide the following benefits:

Partitions are smaller and therefore easier to manage.
Management operations on multiple partitions can occur in parallel.
Partition maintenance operations (such as index rebuilds) are faster than full table operations.
Partition availability is higher than table availability (that is, when recovering a particular partition, users may access all other partitions of the table at the same time).
The optimizer can prune queries to access data in only the partition of interest, not the entire table (that is, if you are interested only in February's data, you do not need to look at any of the table's data outside of the February partition).
Partitions are separate database objects, and can be managed accordingly (that is, if December sales are frequently accessed throughout the year whereas other months are not, the December sales partition could be located in a special tablespace that allows for faster disk access).
In some situations, the Oracle database can create parallel operations on partitions that it cannot on tables; an example is joining between two different tables if they are partitioned on the same key (this feature is called a 'parallel partition-wise join').

Indexes, as well as tables, can be partitioned. Index partitions can be global (one index over the table, regardless of whether the table is partitioned or not) or local (there is a one-to-one correspondence between index partitions and table partitions). In general, when tables are partitioned, local indexes should be preferred to global indexes for the following reasons:

Maintenance operations involve only one index partition instead of the entire index (that is, if the oldest table partition is aged out, a local index partition can be dropped along with its corresponding table partition, whereas an entire global index will need to be rebuilt after it becomes unusable when a table partition is dropped).
The optimizer can generate better query access plans that use only an individual partition.
When multiple index partitions are accessed, the optimizer may choose to use multiple parallel processes rather than just one.

Implementing Retail Insights Partitioning

For retailers who choose to partition a fact table, the figure on the following page illustrate some of the possibilities for table and index layout.

In general, option 2 is the preferred solution for large regular or compressed tables (for example, the W_RTL_INV_IT_LC_DY_F and W_RTL_PRICE_IT_LC_DY_F tables). It uses table partitions and local indexes, thus minimizing the impact of index maintenance and the deletion of old table partitions. Global indexes on partitioned tables are not recommended.

Option 1 can be used for smaller compressed tables. The disadvantage is that, functionally, there is no way to delete historical data and the table continues to grow.

Figure 6-1 Retail Insights Partitioning Options

Setup and Maintenance for Partitioning Retail Insights Compressed Inventory Table

The following procedure describes how to setup and maintain Retail Insights partitioning of the compressed inventory table (W_RTL_INV_IT_LC_DY_F) using Retail Insights partitioning. Note that these steps are informational and are managed by Oracle in cloud environments.

Make the following determinations, among others (see the Oracle Retail Insights Installation Guide for details):
- Your partitioning strategy.
- The time period your partitions will use.
- The 'values less than' boundaries according to your multi business calendar WID values.
- How many partitions are to be used.
- The partition naming standard.
On the database, create the partitions and indexes for the tables you want to partition.
Verify you have populated the Time Calendar Dimension. See the Oracle Retail Insights Installation Guide for details.
Perform step numbers 2 and 3 whenever any of the following events occur:
- Records are added to or deleted from the Time Calendar tables W_MCAL_DAY_D (extending time calendar for a new time period).
- Partitions are added to the Inventory Position table W_RTL_INV_IT_LC_DY_F.

Other maintenance activities include archiving and removing of partitions.

Implementing Partitioning for Compressed Inventory Table

Once the tables (including partitions) and indexes have been created, the data must be loaded. For tables that have a corresponding current status table (such as W_RTL_INV_IT_LC_DY_F and W_RTL_INV_IT_LC_G), the following steps are recommended:

Note:

All these steps can be performed automatically by the Retail Insights seeding program PLP_RetailPartSeed.ksh. See the Oracle Retail Insights Operations Guide for detail about how to execute this script.

In the partition mapping table W_RTL_PARTITION_MAP_G, update column TO_VALUE with the current business date WID for the latest partition on the target table W_RTL_INV_IT_LC_DY_F.
Insert a new record to partition mapping table W_RTL_PARTITION_MAP_G with next business date WID or week WID as FROM_VALUE and dummy value '999999999999999' as TO_VALUE. Column TABLE_NAME must be populated with target table W_RTL_INV_IT_LC_DY_F and column PARTITION_NAME must be populated with 'P_XX'.

Note:

XX is the number part of current partition name on the same target table plus 1. This partition name 'P_XX' can be different from the real partition name used in the database.
Copy data in W_RTL_INV_IT_LC_G table as the seed to the first partition or a new partition that is going to be used on the next day.

At this point, only the changed records are added to the W_RTL_INV_IT_LC_DY_F table. Whereas the W_RTL_INV_IT_LC_G table is a full and uncompressed version that holds the current inventory position as of the last time period.
When a partition boundary is crossed, the W_RTL_INV_IT_LC_G table is copied as the seed to the new partition, via the PLP_RetailPartSeed.ksh program.

If you have questions about how to implement partitioning with compression or require assistance implementing partitioning, contact Oracle Customer Support or Oracle Retail Services.

Partitioning Automation Implementation

For newly created aggregate and planning tables, we run run_partition_util.ksh once before the batch cycle to create partitions as an initialization step during implementation.

Three new tables were created to manage this process, as shown below.

C_MODULE_ARCHIVE_TBSP

This is for future. Currently, archiving of tablespaces are disabled.
C_MODULE_ARTIFACT

This tables lists the modules that are supported by RI and those that are partitioned already.
C_MODULE_EXACT

This tables holds the explicit search criteria for the tables that belongs to specific modules. By default: RI uses "Sales Promotion" (SLSPR), Supplier (SUPP) and Customer Order (CO) using defined search logic.

In the C_MODULE_ARTIFACT table, enable modules as applicable. By default, all modules will be set to 'N'. Eg: if sales and inventory are available, then set the ACTIVE_FLG = 'Y' for these modules.
1. Set SCHEDULE_EOM_DAYS (Default Set to 0)
  
  Valid values are 0-6 with 0 = Saturday, 1 = Sunday,….6=Friday.
2. Set SCHEDULE_EOW_DAYS (Default set to (null)).
3. Enable ACTIVE_FLG for the modules where it is applicable. For example, if sales and inventory are available, then set the ACTIVE_FLG = 'Y' for these modules. By default all modules are set to 'N'.
Set the RI_PART_DDL_CNT_LIMIT parameter. This parameter is set in C_ODI_PARAM. By default it is set to 10000. This limits the number of statements that can be executed. This is configurable.
Perform the initial run to extend partitions for the newly created tables before the batch cycle.

run_partition_util.ksh – This is the shell script that runs the partition program. It needs to be run to ensure that the required partitions are created before the batch cycle.
Set the RI_PART_COMPRESSION_PERIOD parameter. This parameter is set in C_ODI_PARAM. By default it is set to 7M. This compresses records which are older than 7 Months (Periods). This is configurable.
Set the RI_PART_RETENTION_PERIOD parameter. This parameter is set in C_ODI_PARAM. By default it is set to 37M. This purges records older than 37 Months(Periods). This is configurable.

Partitioning Automation Operation

run_partition_util.ksh – This is the shell script that runs the partition program. This programs is scheduled to run after the batch cycle email notification is sent out.

How Oracle Implements Partitions

This section highlights how partitions are implemented in an Oracle data warehouse.

For details on partitioning concepts, refer to the chapter Partitioning in Data Warehouses in the Oracle Database Data Warehousing Guide 11g Release 1 (11.1).

Range partitions in the Oracle data warehouse/database are split by a range of values on the partition key. Examples include partitions by month, partitions by department number, and partitions by item range. Partitioning options also include hash partitions (spreading the rows across a fixed number of partitions by applying a hash function to the partition key), and composite partitioning (a combination of range partitioning and hash partitioning). It is recommended that you partition the tables using range partitioning. Oracle Retail also recommends that the partition key be the date field in the primary key to allow partitions to be aged out when no longer needed.

As a general guideline, partitioning must be considered for tables listed in the RA_partitioned_tables.xls spreadsheet (see the Oracle Retail Insights Installation Guide) and any fact tables in a slow-running query. There is an administrative trade-off between having more partitions to manage and obtaining the benefits of partitioning.

The actual physical layout of partitions varies from site to site. A general approach is to put each partition into its own tablespace and map each tablespace to a separate mount point. This has several advantages:

Maintenance operations, as well as tablespace recovery, can occur on a partition while other partitions are unaffected.
If manual performance tuning of the data files is being done, tablespaces and their files can be moved around to achieve optimal performance.
If partitions are no longer being updated, their tablespaces can be changed to READ ONLY, which significantly reduces backup requirements.
Separate mount points pointing to a separate set of physical drives significantly reduces I/O time.

Partitions are ordered from low values to high values. The partition key value for a partition is a non-inclusive upper bound (high value) for that partition. That is, if the W_RTL_SLS_IT_LC_DY_A table is partitioned by month, the high value for January, 2010, partition is 01-Feb-2010. A low value can always be inserted into the lowest partition. However, you may not be able to insert a high value depending on the high value of the highest partition. For instance, if the highest partition has a high value of 01-Feb-2000, and you attempt to insert a record with a date of 01-Feb-2010, the row will not be inserted into the table (the high value of 01-Feb-2010 is a non-inclusive upper bound). For this reason, a special high value partition with a key of MAXVALUE is available in the Oracle database. It is recommended that all partitioned tables include a dummy partition with a MAXVALUE high value.

There are special considerations for the partitioning of Retail Insights compressed tables. The following is a brief description of the different partition maintenance commands. Refer to the current Oracle database documentation set for more details:

ADD PARTITION: Adds a new partition to the high end of a partitioned table. Because it is recommended to have a MAXVALUE partition, and this is the highest partition, the ADD PARTITION functionality can be achieved by performing a SPLIT of the MAXVALUE partition instead.
DROP PARTITION: Drops the partition. This is the typical method to delete the oldest partitions (those with the lowest values) as they age to maintain a rolling window of data.
EXCHANGE PARTITION: Converts a non-partitioned table into a partitioned table or converts a partitioned table into a non-partitioned table.
MERGE PARTITION: Merges two adjacent partitions into one.
MOVE PARTITION: Moves a partition to another segment; this is used to defragment a partition or to change its storage characteristics.
SPLIT PARTITION: Splits an existing partition by adding a new partition at its low end.
TRUNCATE PARTITION: Removes all rows from the partition.

Oracle database automatically maintains local index partitions in a 1-to-1 correspondence with their underlying table partitions. Any table partition operations, such as ADD PARTITION, also affect the relevant index partitions.

Summary

Partitions are useful for breaking up large tables into smaller, more manageable pieces. Take note of the following partitioning recommendations when using Retail Insights in an on-premise implementation:

Consider partitioning tables that are in the RA_partitioned_tables.xls spreadsheet (see the Oracle Retail Insights Installation Guide) and fact tables that are in a slow- running query.
Use the date as the partition key for range partitioning.
When tables are partitioned, make their indexes local.
Consider putting each partition in its own tablespace and each tablespace on its own mount point.
After updates on a partition cease, consider changing its tablespace to READ ONLY to reduce backup requirements.
If partitioning compressed tables, be sure to address any special requirements for seeding.