|Oracle9i OLAP Developer's Guide to the OLAP DML
Release 2 (9.2)
Part Number A95298-01
Aggregating Data, 11 of 12
Maintenance of an analytic workspace must usually be done within a restrictive batch window. For this reason, many DBAs perform partial aggregations rather than full aggregations each time they refresh the data. When all of the data is pre-aggregated, this does not present a problem. However, when partial aggregations are performed on data that uses both pre-aggregation and runtime aggregation, then steps must be taken to ensure that the results are correct. Errors in the data occur when the status list generated by the
PRECOMPUTE keyword is outdated.
The PRECOMPUTE clause produces a status list that:
* Tells the AGGREGATE command which data should be pre-calculated, and
* Tells the AGGREGATE function what the AGGREGATE command has done.
If you never use the AGGREGATE command in combination with the AGGREGATE function, you do not need this information.
You should read this information if either of the following circumstances applies to you:
* You change a PRECOMPUTE clause in a RELATION command in an aggregation map.
* Data-dependent PRECOMPUTE clauses: When you use the PRECOMPUTE keyword in an aggregation map, that PRECOMPUTE clause can be data-dependent instead of simply identifying dimension values or levels.
Incremental data loading refers to the process of loading new input data into an existing analytic workspace and then aggregating that data. This usually happens on a regular schedule: monthly, weekly, or even daily.
For example, suppose you design a new analytic workspace that contains two variables: sales and units. Suppose that when you build the analytic workspace for the first time, you have one year of input data for both variables. Because sales and units have exactly the same dimensions in exactly the same order in their definitions, you define one aggregation map that is shared by both variables. You load the input data into the analytic workspace, then use the AGGREGATE command to roll up that input data.
You know that you will be getting new input data for sales and units on the first day of every month. For example, suppose it is March 1. On this day, you expect to receive the sales data and units data for the previous month of February. Your responsibility is to load the February data into the existing analytic workspace and aggregate that input data. This is an incremental data load. The next incremental data load will take place on April 1, and so on.
Typically, when you aggregate this new data, you use a LIMIT command to ensure that only the new input data is aggregated. For example, to aggregate only the new input data that you have loaded for February, you might use the following commands:
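The commands referred to here might look like the following sketch. The month value 'FEB02' and the aggregation map name sales.agg are assumptions for illustration; substitute the names used in your own workspace.

```
" Limit the time dimension to the newly loaded month, then aggregate
" only that slice of the input data.
LIMIT time TO 'FEB02'
AGGREGATE sales units USING sales.agg
```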
This is acceptable as long as you do not change any of the
PRECOMPUTE clauses in the aggregation map. If you do, then you must pre-aggregate all of the data.
If you change a PRECOMPUTE clause, then the status list changes. Although the data produced by the AGGREGATE command after the change is correct, Oracle OLAP may not be able to return the data that a user requests with the AGGREGATE function: the status list might indicate that a value has already been calculated when in fact it has not.
If you make any changes to any
PRECOMPUTE clause in one or more
RELATION commands in an aggregation map, then you must pre-aggregate all of the data. Otherwise, the
AGGREGATE function will use a
PRECOMPUTE status list that is out of synchronization with the data, and thus may not generate all of the required values.
Use the following procedure to be sure the data will be aggregated correctly:
1. Make the changes to the PRECOMPUTE clauses in your aggregation map.
2. Set the status of every dimension in the aggregation map to ALL. (You can use one ALLSTAT command, or a LIMIT TO ALL command for every dimension in the aggregation map.)
3. If the data will also be retrieved with the AGGREGATE function, plan to specify the FUNCDATA keyword when you execute the AGGREGATE command in Step 4.
4. Execute the AGGREGATE command.
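Sketched in OLAP DML, and assuming the shared aggregation map is named sales.agg, the procedure might run as follows after you have edited the PRECOMPUTE clauses in the map:

```
" Step 2: set the status of every dimension to ALL
ALLSTAT
" Step 4: re-aggregate everything; FUNCDATA (Step 3) prepares the
" remaining data for run-time calculation by the AGGREGATE function
AGGREGATE sales units USING sales.agg FUNCDATA
```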
The clause that follows the
PRECOMPUTE keyword is like a
LIMIT command. You have the flexibility to specify the limit expression using the values of the data. For example, you can specify the five areas with the lowest sales figures in a time period. The
RELATION command might look like this:
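A sketch of such a RELATION command follows. The relation name geography.parentrel is an assumption; the PRECOMPUTE clause carries the same BOTTOM ... BASED ON phrase that you would use in a LIMIT command:

```
" Pre-calculate only the five areas with the lowest sales
RELATION geography.parentrel PRECOMPUTE (BOTTOM 5 BASED ON sales)
```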
Data-dependent limit expressions can vary in their results. In other words, the "bottom five" areas in the analytic workspace that you build in February will not necessarily be the same "bottom five" areas after performing an incremental data load in March. Furthermore, the "bottom five" areas in March will not necessarily be the same "bottom five" areas after the April incremental data load.
In this situation, the
PRECOMPUTE status list is out of synchronization, and the
AGGREGATE function may not calculate a needed value because the status list indicates that it was precomputed.
Instead of using a data-dependent PRECOMPUTE clause, you can either:
* List specific dimension values in the PRECOMPUTE clause, or
* Create a valueset and specify it in the PRECOMPUTE clause.
When you use one of these methods, the status list that is generated by the PRECOMPUTE keyword remains constant as you load and aggregate incremental data over the course of time. However, the five stores in the limit expression or valueset also remain the same, regardless of whether they still represent the stores with the lowest sales figures.
To keep the limit phrase current, take the following steps:
1. Refer to "Incremental Data Loading" for the general guidelines you should follow.
2. If you have changed the input data or your hierarchies, then replace any data that has been aggregated with NA values. For example, you might limit the dimensions to the input data, create a new variable, copy the data from the original variable to the new variable, delete the original variable, and rename the new variable to the name of the original variable.
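The copy-and-rename sequence described above might be sketched as follows, assuming that sales is dimensioned by time, channel, product, and geography, and that the dimensions have already been limited to the input-level data:

```
" Create a new variable with the same definition as the original
DEFINE sales.new VARIABLE DECIMAL <time channel product geography>
" Copy only the input-level cells; every other cell remains NA
sales.new = sales
" Replace the original variable with the new one
DELETE sales
RENAME sales.new sales
```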
Instead of using data-dependent
PRECOMPUTE clauses, use specific dimension values in the
PRECOMPUTE clause. After loading the data, issue a data-dependent
LIMIT command to identify the dimension values. Then list those values in the
PRECOMPUTE clause. For example,
LIMIT time TO '2001'
LIMIT channel TO 'TOTALCHANNEL'
LIMIT product TO 'TOTALPROD'
LIMIT geography TO BOTTOM 5 BASED ON sales
STATUS geography
The current status of GEOGRAPHY is:
BOGOTA, BORDEAUX, EDINBURGH, KYOTO, BRUSSELS
You would then change the
PRECOMPUTE clause to list these areas:
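For example, the RELATION command might then read as follows (the relation name geography.parentrel is an assumption):

```
RELATION geography.parentrel PRECOMPUTE ('BOGOTA' 'BORDEAUX' 'EDINBURGH' 'KYOTO' 'BRUSSELS')
```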
If you want to use data-dependent PRECOMPUTE clauses, create a valueset and use it in the PRECOMPUTE clause.
A valueset can be used to store a list of values. For example, the following commands create a valueset for the geography dimension. After performing an incremental update, you would need to update the valueset, but you would not need to edit the aggregation map.
The following commands create a valueset for the geography dimension:
DEFINE lowsales.geog VALUESET geography
LIMIT time TO '2001'
LIMIT channel TO 'TOTALCHANNEL'
LIMIT product TO 'TOTALPROD'
LIMIT lowsales.geog TO BOTTOM 5 BASED ON sales
The VALUES function returns the list of values in the valueset:
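For example, a command such as the following displays the contents of the valueset (the output depends on your data):

```
SHOW VALUES(lowsales.geog)
```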
The following RELATION command uses the valueset:
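A sketch, again assuming the relation name geography.parentrel. Because the valueset, not the data, is named in the map, the aggregation map does not need to be edited after each incremental load:

```
RELATION geography.parentrel PRECOMPUTE (lowsales.geog)
```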
Once you have defined a hierarchy and aggregated data, moving one or more dimension values to a different parent in the hierarchy changes the hierarchy.
For example, suppose your
geography hierarchy has input data for stores. The store data rolls up into cities. The cities roll up into regions, and so on.
You define your dimensions and variables. You define the hierarchies for your dimensions. You load data and roll it up. Several months later, after you have loaded and rolled up incremental data, one of the stores changes location. For example,
STORE22 closes its location in Hull, Massachusetts and then reopens at a new location in Waltham, Massachusetts. Therefore,
STORE22 is now part of the WEST BOSTON region instead of the SOUTH BOSTON region.
Therefore, you must move the
STORE22 dimension value so that its data will roll up to different dimension values in the higher levels of the hierarchy. For example, you must move
STORE22 from the
HULL path to the WALTHAM path.
When you move one or more dimension values so that their data rolls up in a different path in the hierarchy, you have changed the hierarchy.
Suppose that you receive the most recent month's worth of data for
STORE22. You load that data and aggregate it.
Today you find out that last month the store moved to a new city, as well as a new region. This means that you have already aggregated the
STORE22 data into
HULL, when the
STORE22 data should now be aggregated into WALTHAM.
The problem is that you not only need to change the hierarchy, but you need to correct the data so that the
STORE22 data aggregates into
WALTHAM instead of HULL.
When you change a hierarchy, you can re-aggregate the data in the analytic workspace (after you have changed the hierarchy) in one of two ways:
* Perform a full aggregation of the data.
* Perform a partial aggregation of only the data affected by the change.
The advantage of a partial aggregation is that it completes in less time than a full aggregation; the advantage of a full rollup is that you know the results will be correct. Therefore, if you have moved only one or two dimension values in your hierarchy and you have a small window of time in which to roll up the analytic workspace, you can perform a partial aggregation; otherwise, perform a full aggregation.
Follow these steps to aggregate the data for the former parents and the current parents of the dimension value that moved in the hierarchy. In this example, STORE22 is the dimension value whose data now aggregates to WALTHAM. STORE22 was previously grouped with other stores, such as STORE24, any of which qualifies as a previous sibling of STORE22.
By identifying the dimension value that has moved, you can recalculate its new ancestors (such as WALTHAM). By identifying a previous sibling of the dimension value that has moved, you can recalculate its previous ancestors (such as HULL).
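The steps above might be sketched in OLAP DML as follows, assuming the aggregation map sales.agg and using STORE24 as the previous sibling. Limiting geography to both the moved value and a previous sibling causes the AGGREGATE command to recalculate both the new and the former ancestors:

```
" Limit to the moved store and one of its previous siblings
LIMIT geography TO 'STORE22' 'STORE24'
" Roll up; the ancestors of both stores (the WALTHAM and HULL paths)
" are recalculated
AGGREGATE sales units USING sales.agg
```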