Loading Data into Aggregate Storage Cubes

Essbase aggregate storage (ASO) cubes facilitate analysis of very large dimensions, containing a million or more members. To help you load data into such large outlines, you can load incrementally, manage the data load buffers, and merge or replace slices of data.

To efficiently support loading data values into such large cubes, Essbase:

  • Allows processing multiple sources of data through temporary data load buffers

  • Allows you to control the percentage of resources a data load buffer uses

  • Allows an aggregate storage cube to contain multiple slices of data (a query to the database accesses each slice, collecting all of the data cells)

  • Provides an incremental data load process that completes in a length of time that is proportional to the size of the incremental data

To load values to aggregate storage cubes, you can use the Jobs page in the Essbase web interface, or you can use the alter database and import data statements in MaxL. Examples in this document are based on using MaxL.

Note:

If values have been calculated and stored through an aggregation, Essbase automatically updates higher-level stored values when data values are changed. No additional calculation step is necessary. The existence and size of an aggregation can affect the time it takes to perform a data load.

You cannot export data while loading data into a cube.

When copying an ASO application, to retain all of the data in the cube, you must merge all incremental data slices into the main database slice before you copy the application. Data in unmerged incremental data slices is not copied.

Incrementally Loading Data Using a Data Load Buffer

Incremental data load can improve performance when loading data values to an Essbase aggregate storage (ASO) cube. Essbase loads the values into a temporary data load buffer first, performing a final write to storage after all data sources have been read.

Using the import data MaxL statement to load data values from a single data source does not involve the aggregate storage data load buffer.

If you use multiple import database data MaxL statements to load data values to aggregate storage cubes, Essbase can utilize a temporary data load buffer, completing the final write to storage after all data sources have been read. Using the aggregate storage data load buffer can significantly improve overall data load performance.

In the aggregate storage data load buffer, Essbase sorts and commits the values after all data sources have been read. If multiple (or duplicate) records are encountered for any specific data cell, the values are accumulated. Essbase then stores the accumulated values—replacing, adding to, or subtracting from existing data values in the cube.

Note:

When using the aggregate storage data load buffer, the choice for replacing, adding, or subtracting values is specified for the entire set of data sources when loading the data buffer contents to the cube.

While the data load buffer exists in memory, you cannot build aggregations or merge slices, because these operations are resource-intensive. You can, however, load data to other data load buffers, and perform queries and other operations on the cube. There might be a brief wait for queries, until the full data set is committed and aggregations are created.

The data load buffer exists in memory until the buffer contents are committed to the cube or the application is restarted, at which time the buffer is destroyed. Even if the commit operation fails, the buffer is destroyed and the data is not loaded into the cube. You can manually destroy a data load buffer by using the alter database MaxL statement.
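For example, a statement of the following form destroys the data load buffer with an ID of 1 (sketched from the alter database grammar; confirm the exact syntax in your MaxL reference):

alter database ASOsamp.Basic
   destroy load_buffer with buffer_id 1;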

Note:

Stopping the application before committing the buffer contents destroys the buffer. In this situation, after restarting the application, you must initialize a new buffer and load the data to it.

To use the data load buffer for aggregate storage cubes:

  1. Prepare the data load buffer, in which data values are sorted and accumulated, by using the alter database MaxL statement to initialize an aggregate storage data load buffer. For example:
    alter database ASOsamp.Basic 
       initialize load_buffer with buffer_id 1;
  2. Load data from your data sources into the data load buffer using the import database MaxL statement. Use multiple statements to load data from multiple data sources. You can include any combination of data sources. Specify a rules file if the data source requires one.

    The following example loads two data sources, one of which uses a rules file, into the same data load buffer:

    import database ASOsamp.Basic data 
       from server data_file 'file_1.txt' 
       to load_buffer with buffer_id 1
       on error abort; 
    import database ASOsamp.Basic data
       from server data_file 'file_2' 
       using server rules_file 'rule' 
       to load_buffer with buffer_id 1
       on error abort;
    

    To load data into multiple load buffers simultaneously, see Performing Multiple Data Loads in Parallel.

  3. Use the import data MaxL statement to commit the data load buffer contents to the cube. For example:
    import database ASOsamp.Basic data 
       from load_buffer with buffer_id 1;

    To commit the contents of multiple data load buffers into the cube with one MaxL statement, see Performing Multiple Data Loads in Parallel.

The following incremental data load example provides optimal performance when new data values do not intersect with existing values:

  1. Create a single data load buffer using the ignore_missing_values and ignore_zero_values properties. For example:

    alter database ASOsamp.Basic 
       initialize load_buffer with buffer_id 1
       property ignore_missing_values, ignore_zero_values;

    If the cube must be available for send data requests while being updated, initialize the data load buffer with the resource_usage grammar set to 0.8 (80%), leaving resources for the load buffers that send operations create internally. For example:

    alter database ASOsamp.Basic 
       initialize load_buffer with buffer_id 1
       resource_usage 0.8 property
       ignore_missing_values, ignore_zero_values;
  2. Load the data into the buffer. For example:

    import database ASOsamp.Basic data 
       from server data_file 'file_1.txt' 
       to load_buffer with buffer_id 1
       on error abort; 
    import database ASOsamp.Basic data
       from server data_file 'file_2'
       to load_buffer with buffer_id 1
       on error abort;
    
  3. Commit the contents of the data load buffer to the cube by creating a slice and adding values. For example:

    import database ASOsamp.Basic data 
       from load_buffer with buffer_id 1
       add values create slice;

Controlling Data Load Buffer Resource and Disk Space Usage

When you use incremental data load to load data values to an Essbase aggregate storage (ASO) cube, you can put constraints on the allowed resource usage and wait time for the temporary data load buffer. You can reduce disk space usage by managing the tablespace.

Controlling Data Load Buffer Resource Usage

When performing an incremental data load, Essbase uses the aggregate storage cache for sorting data. You can control the amount of the cache a data load buffer can use by specifying a percentage. The percentage is a number between 0.01 and 1.0, inclusive; only two digits after the decimal point are significant (for example, 0.029 is interpreted as 0.02). By default, the resource usage of a data load buffer is set to 1.0, and the total resource usage of all data load buffers created on a database cannot exceed 1.0. For example, if a buffer of size 0.9 exists, you cannot create another buffer of a size greater than 0.1.

Note:

Send operations internally create load buffers of size 0.2; therefore, a load buffer of the default size of 1.0 will cause send operations to fail because of insufficient data load buffer resources.

To set the amount of resources the buffer is allowed to use, specify the percentage when you initiate the data load in the Essbase web interface. If using MaxL, use the alter database MaxL statement with the resource_usage grammar.

For example, to set the resource_usage to 50% of the total cache, use this statement:

alter database ASOsamp.Basic
   initialize load_buffer with buffer_id 1
   resource_usage .5;

If you plan to run concurrent send operations, use the ASOLOADBUFFERWAIT configuration setting and the alter database MaxL statement with the wait_for_resources grammar. ASOLOADBUFFERWAIT applies to the creation of aggregate storage data load buffers with the wait_for_resources option, and to allocations, custom calculations, and data update operations.
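For example, a statement of the following form (a sketch; the placement of the wait_for_resources grammar should be confirmed in your MaxL reference) initializes a buffer that waits for load buffer resources to become available instead of failing immediately:

alter database ASOsamp.Basic
   initialize load_buffer with buffer_id 1
   resource_usage 0.5
   wait_for_resources;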

Managing Disk Space for Incremental Data Loads

Incremental data loads on aggregate storage cubes may use disk space up to two times the size of your current data files. For example, assume that the size of a cube's data is 1 GB, and the size of the incremental data load is 200 MB, for a total size of 1.2 GB. During the incremental data load process, Essbase might use up to 2.4 GB of disk space.

In cases where databases are larger than 2 GB, you can reduce disk space utilization by setting the maximum file size of the default tablespace to no more than 2 GB.

To set the maximum file size of the default tablespace, you can use the alter tablespace MaxL statement.
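For example, a statement of the following form (sketched; the size grammar and units may vary by release) sets the maximum file size of the default tablespace to 2 GB, specified in bytes:

alter tablespace ASOsamp.'default'
   set max_file_size 2147483648;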

Setting Data Load Buffer Properties

When incrementally loading data values into an aggregate storage (ASO) load buffer, you can tell Essbase to ignore missing and zero values in the source data, and you can resolve cell conflicts (eliminating invalid aggregations by combining duplicate cells).

The data load buffer properties you can set are:

  • ignore_missing_values: Ignores #MI values in the incoming data stream

  • ignore_zero_values: Ignores zeros in the incoming data stream

  • aggregate_use_last: Combines duplicate cells by using the value of the cell that was loaded last into the load buffer

    Note:

    When loading text and date values into an aggregate storage database, use the aggregate_use_last property to help eliminate invalid aggregations. For other guidelines, see Loading, Clearing, and Exporting Text and Date Measures.

If you specify multiple properties in the command and any of them conflict, the last property listed takes precedence.

Handling Missing and Zeroes in the Data Stream

When loading data incrementally, you can specify how missing and zero values in the source data are treated when loading the data into the data load buffer.

To set data load buffer properties, use the alter database MaxL statement with the property grammar.

For example:

alter database ASOsamp.Basic
   initialize load_buffer with buffer_id 1
   property ignore_missing_values, ignore_zero_values;

Resolving Cell Conflicts

To resolve conflicts between duplicate cells, you can specify whether to use the value of the cell that was loaded last into the load buffer.

By default, when cells with identical keys are loaded into the same data load buffer, Essbase resolves the cell conflict by adding the values together.

To create a data load buffer that combines duplicate cells by accepting the value of the cell that was loaded last into the load buffer, use the alter database MaxL statement with the aggregate_use_last grammar.

For example:

alter database ASOsamp.Basic
   initialize load_buffer with buffer_id 1
   property aggregate_use_last;

Note:

When using data load buffers with the aggregate_use_last grammar, data loads are significantly slower, even if there are no duplicate keys.

Performing Multiple Data Loads in Parallel

Multiple data load buffers can exist on an Essbase aggregate storage (ASO) cube. Although only one commit operation can be active at a time, you can commit multiple data load buffers in the same commit, which is faster than committing buffers individually.

To load data into multiple data load buffers simultaneously, use separate MaxL Shell sessions. For example, in one MaxL Shell session, load data into a buffer with an ID of 1:

alter database ASOsamp.Basic
   initialize load_buffer with buffer_id 1 resource_usage 0.5;
import database ASOsamp.Basic data
   from data_file "dataload1.txt"
   to load_buffer with buffer_id 1
   on error abort;

Simultaneously, in another MaxL Shell session, load data into a buffer with an ID of 2:

alter database ASOsamp.Basic
   initialize load_buffer with buffer_id 2 resource_usage 0.5;
import database ASOsamp.Basic data
   from data_file "dataload2.txt"
   to load_buffer with buffer_id 2
   on error abort;

When the data is fully loaded into the data load buffers, use one MaxL statement to commit the contents of both buffers into the database by specifying a comma-separated list of buffer IDs.

For example, this statement commits the contents of buffers 1 and 2:

import database ASOsamp.Basic data 
   from load_buffer with buffer_id 1, 2;

Note:

When loading SQL data into aggregate storage cubes, you can use up to eight rules files to load data in parallel. This functionality is different from the process described above. When performing multiple SQL data loads in parallel, you can use one import database MaxL statement with the using multiple rules_file grammar. Essbase initializes multiple temporary aggregate storage data load buffers (one for each rules file) and commits the contents of all buffers into the cube in one operation.
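As a sketch of that grammar (the user name, password, rules file names, and starting buffer ID are placeholders; confirm the buffer-block syntax in your MaxL reference), a parallel SQL load through two rules files might look like:

import database ASOsamp.Basic data
   connect as 'usrname' identified by 'password'
   using multiple rules_file 'rule1', 'rule2'
   to load_buffer_block starting with buffer_id 100
   on error abort;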

Listing Data Load Buffers for an Aggregate Storage Cube

Multiple data load buffers can exist on an Essbase aggregate storage (ASO) cube. For a list and description of the data load buffers that exist on the cube, use the query database MaxL statement with the list load_buffers grammar.

The syntax of the MaxL statement to list ASO data load buffers is:

query database appname.dbname list load_buffers;

This statement returns the following information about each existing data load buffer:

Table 38-2 Data Load Buffer Information

Field Description

buffer_id

ID of a data load buffer (a number between 1 and 4,294,967,296).

internal

A Boolean that specifies whether the data load buffer was created internally by Essbase (TRUE) or by a user (FALSE).

active

A Boolean that specifies whether the data load buffer is currently in use by a data load operation.

resource_usage

The percentage (a number between .01 and 1.0 inclusive) of the aggregate storage cache that the data load buffer is allowed to use.

aggregation method

One of the methods used to combine multiple values for the same cell within the buffer:

  • AGGREGATE_SUM: Add values when the buffer contains multiple values for the same cell.

  • AGGREGATE_USE_LAST: Combine duplicate cells by using the value of the cell that was loaded last into the load buffer.

ignore_missings

A Boolean that specifies whether to ignore #MI values in the incoming data stream.

ignore_zeros

A Boolean that specifies whether to ignore zeros in the incoming data stream.

See: Query Database (Aggregate Storage)

Creating a Data Slice

You can incrementally commit the data load buffer to an Essbase aggregate storage (ASO) cube to create a slice. After loading the new slice into the cube, Essbase creates all necessary views on the slice (such as aggregate views) before the new data is visible to queries.

Creating a data slice is useful because it improves the performance of incremental data loads. The amount of time an incremental data load takes is proportional to the amount of new data; the size of the cube is not a factor.

To create a data slice, use the import database MaxL statement with the create slice grammar.

For example, to create a slice by overriding values (the default), use this statement:

import database ASOsamp.Basic data
   from load_buffer with buffer_id 1
   override values create slice;

Note:

If you use override values when creating a slice, #MISSING values are replaced with zeros. Using this option is significantly slower than using the add values or subtract values options.

See: Import Data (Aggregate Storage)

Merging Incremental Data Slices

When loading data incrementally to an aggregate storage (ASO) cube, you can manually merge the incremental data slices to the main cube slice, or you can use AUTOMERGE to configure Essbase to automatically merge slices during the data load.

Automatically Merging Incremental Data Slices During a Data Load to an Aggregate Storage Cube

Using the AUTOMERGE and AUTOMERGEMAXSLICENUMBER configuration settings, you can specify whether Essbase automatically merges incremental data slices during a data load to an aggregate storage cube.

AUTOMERGE configuration setting options:

  • ALWAYS—Specifies to automatically merge incremental data slices during a data load to an aggregate storage cube. By default, merges are executed once for every four consecutive incremental data slices. If, however, the AUTOMERGEMAXSLICENUMBER configuration setting is used, the auto-merge process is activated when the AUTOMERGEMAXSLICENUMBER value is exceeded. The size of the incremental data slices is not a factor in selecting which ones are merged.

    The default value is ALWAYS.

  • NEVER—Specifies to never automatically merge incremental data slices during a data load to an aggregate storage cube. To manually merge incremental data slices, use the alter database MaxL statement with the merge grammar.

  • SELECTIVE—Specifies to activate the incremental data slice auto-merge process when the number of incremental data slices specified in the AUTOMERGEMAXSLICENUMBER configuration setting is exceeded. If the number of incremental data slices in the data load does not exceed the value of AUTOMERGEMAXSLICENUMBER, the auto-merge process is not activated.

Manually Merging Incremental Data Slices

You can merge all incremental data slices into the main slice or merge all incremental data slices into a single data slice while leaving the main slice unchanged. To merge slices, you must have the same privileges as for loading data (Database Update Permission or higher).

After the new input view is written to the cube, Essbase creates the aggregate views for the slice. The views created for the new slice are a subset of the views that exist on the main slice.

Note:

You cannot export data when performing a merge.

If you cleared data from a region using the logical clear region operation, which results in a value of zero for the cells you cleared, you can elect to remove zero value cells during the merge operation.

To perform merging operations, use the alter database MaxL statement with the merge grammar.

For example, to merge all incremental data slices into the main slice, use this statement:

alter database ASOsamp.Basic
   merge all data;

To merge all incremental data slices into the main slice and remove zero-value cells, use this statement:

alter database ASOsamp.Basic
   merge all data remove_zero_cells;

To merge all incremental data slices into a single data slice, use this statement:

alter database ASOsamp.Basic
   merge incremental data;

Note:

Before you copy an aggregate storage application, you must merge all incremental data slices into the main slice. Data in unmerged incremental data slices is not copied.

Related Links

AUTOMERGE

AUTOMERGEMAXSLICENUMBER

Alter Database (Aggregate Storage)

Replacing Data Using Incremental Data Slice Contents

For aggregate storage (ASO) data sets that are small enough to reload completely while maintaining low data latency, Essbase can remove the current contents of an aggregate storage cube and replace the cube with the contents of a specified data load buffer.

The atomic replacement functionality transitions queries from the old contents of the cube to the new contents without interrupting service. The newly loaded data set is aggregated to create the same set of views that existed for the replaced data set.

Essbase also allows for atomically replacing the contents of all incremental data slices in a cube. Consider a situation in which data can be separated into a relatively large, static data set that is never updated and a relatively small, volatile data set for which the individual updates are difficult to identify but are confined to the volatile data set. For example, the large, static data set consists of historical transaction data for the last three years; however, for the transaction data for the past two months, users can change a characteristic of a transaction in the source database. Tracking these changes can be prohibitively complex. You can load the static data set as the main slice in a cube and the volatile data set as one or more incremental slices.

When you replace the incremental data slices, Essbase removes the current contents of all incremental data slices and creates a new slice (using the add values grammar in the buffer commit specification of the import database MaxL statement) with the contents of a specified data load buffer. The newly loaded data set is augmented with aggregated views based on the set of views that exist on the main slice.

Note:

To use the override grammar, create a data load buffer with the ignore_missing_values property for optimal performance. Additionally, you must ensure that there are no conflicts between the static and volatile data sets (for example, there should not be a value in both data sets for the same cell).

To replace the contents of a database or the incremental data slices in a cube, use the import database MaxL statement with the override grammar.

For example, to replace the contents of a cube, use this statement:

import database ASOsamp.Basic data
   from load_buffer with buffer_id 1
   override all data;

To replace the contents of all incremental data slices with a new slice, use this statement:

import database ASOsamp.Basic data
   from load_buffer with buffer_id 1
   override incremental data;

Note:

If the override replacement fails, Essbase continues to serve the old data set.

In Smart View, the submit command is equivalent to using the incremental data load functionality with the override grammar.

While performing a send operation, new requests for lock, unlock, and retrieve and lock will wait until the send operation is completed.

See: Import Data (Aggregate Storage)

Viewing Incremental Data Slices Statistics

Essbase provides statistics on the size and number of incremental aggregate storage (ASO) data slices, and the cost of querying the incremental data slices.

The time it takes for a query to access all of the incremental data slices is expressed as a percentage (between .01 and 1.0 inclusive). If a cube has a main slice and multiple incremental data slices, a query statistic of 0.66 means that two-thirds of the query time was spent querying the incremental data slices and one-third was spent querying the main data slice. If the cost of querying the incremental data slices is too high, you can merge the slices.

To view information about slices, use the list aggregate_storage slice_info grammar in the query database MaxL statement. For example:

query database ASOsamp.Basic list aggregate_storage slice_info;

See: Query Database (Aggregate Storage)

Renegade Members in Aggregate Storage Data Loads

Renegade members enable continuation of an Essbase aggregate storage (ASO) data load even if a specified member combination has missing or invalid members.

When a data load encounters a missing or invalid member, the data load continues, with the data value of the missing or invalid member stored under the member that is tagged as the renegade member in the dimension. If a renegade member is not set in the dimension, the record is rejected. If data already exists for the renegade member, the behavior depends on whether you selected to add values or to overwrite values when creating the data load rules file.

Each dimension can have only one member assigned as a renegade member, and the renegade member must be a level-0 member.

The following data load file includes a member named SC:

Product  Measures   *Data*
NY,      Sales       100
SA,      Sales       200
SC,      Sales       300    

In the following outline, no member is named SC; however, the member named SA is set as the renegade member in the Products dimension:

Products (+)
   NY (+)
   SA (+)
Measures (+)
   Sales (+)
   COGS (+)

During the data load, the data value for the member combination SC and Sales, which is 300, is loaded into renegade member SA and Sales.

In the following data load file, two records exist for SC and Sales, each with different values:

Product    Measures    *Data*
NY,        Sales        100
SA,        Sales        200
SC,        Sales        250
SC,        Sales        300 

Both values for SC and Sales (250 and 300) are loaded into SA and Sales. If you selected to add values, the value in the cell is 550 (250 + 300). If you selected to overwrite values, then the value in the cell is the last one loaded; in this case, 300.

The following examples illustrate the behavior of renegade members using the following data load file:

Months    Transaction Type    Customer    Product    Price
Jan,      Sale,               Discard1,   Product1   300
Jan,      Sale,               Discard1,   Discard2   300
Jan,      Sale,               Customer1,  Discard2   300

Discard1 and Discard2 do not exist in the outline.

  • Example 1:

    If the Customer dimension has the Customer1 member tagged as renegade, and the other dimensions do not have renegade members, only the first record is loaded into the following intersection:

    Jan    Sale    Customer1(Ren)    Product1    300

    The other two records are rejected because the Product dimension does not have a renegade member. The rejected records are logged in the renegade member log file.

  • Example 2:

    If the Product dimension has the Product1 member tagged as renegade, and the other dimensions do not have renegade members, only the last record is loaded into the following intersection:

    Jan    Sale    Customer1    Product1(Ren)    300

    The other two records are rejected, because the Customer dimension does not have a renegade member. The rejected records are logged in the renegade member log file.

  • Example 3:

    If the Customer and Product dimensions both have renegade members (Customer1 and Product1), all records are loaded into the following intersection:

    Jan    Sale    Customer1(Ren)    Product1(Ren)    900 (or 300 if overwrite is enabled)

  • Example 4:

    If the Customer dimension has RenMember1 tagged as renegade, and the Product dimension has RenMember2 tagged as renegade, all records in the following data load file are loaded, because both the Customer and Product dimensions have renegade members. Customer1 and Product1 are valid members, not renegade members. Discard1 and Discard2 do not exist in the outline.

    Data load file:

    Months  Transaction Type  Customer    Product    Price
    Jan,    Sale,             Discard1,   Product1   300
    Jan,    Sale,             Discard1,   Discard2   300
    Jan,    Sale,             Customer1,  Discard2   300

    The values specified in the data load file for the discard members are instead loaded into the designated renegade members:

    Loaded data:

    Months  Transaction Type  Customer          Product           Price
    Jan     Sale              RenMember1(ren)   Product1          300
    Jan     Sale              RenMember1(ren)   RenMember2(ren)   300
    Jan     Sale              Customer1         RenMember2(ren)   300

Logging for renegade members is not enabled by default. To enable logging, use the RENEGADELOG configuration setting, which, when set to TRUE, enables logging of members loaded into a renegade member intersection.
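For example, adding a line of the following form to the server configuration file (essbase.cfg; whether an application name argument is also accepted should be confirmed in the configuration reference) enables renegade member logging:

RENEGADELOG TRUE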

Note:

Renegade members can be referenced in calculation and report scripts. Renegade members are not supported in tabular data loads or spreadsheet update operations.

Source Data Differences for Aggregate Storage Data Loads

While processing records in the source data in preparation for loading values into aggregate storage (ASO) cubes, Essbase processes records only for the level 0 dimension intersections where the member does not have a formula.

The following example shows a source of data that has records for only level 0 intersections. The last field contains data values, and the other fields are level 0 members of their respective dimensions.

Jan, Curr Year, Digital Cameras, CO, Original Price, 10784
Jan, Prev Year, Camcorders, CO, Original Price, 13573

Essbase ignores records that specify upper-level members and, at the end of the data load, displays the number of skipped records.

For example, the following record would be skipped because member Mid West is a level 1 member:

Jan, Curr Year, Digital Cameras, Mid West, Original Price, 121301

Presorting the data is unnecessary, because Essbase reads and sorts records internally before committing values to the cube.