In This Section:
Some information in this chapter applies only to block storage databases and is not relevant to aggregate storage databases. Also see Comparison of Aggregate and Block Storage.
Loading a large data source into an Essbase database can take hours. You can shorten the data loading process by minimizing the time spent on these actions:
Reading and parsing the data source
Reading and writing to the database
To optimize data load performance, think in terms of database structure. Essbase loads data block by block. For each unique combination of sparse dimension members, one data block contains the data for all the dense dimension combinations, assuming that at least one cell contains data. For faster access to block locations, Essbase uses an index. Each entry in the index corresponds to one data block. See Sparse and Dense Dimensions, Selection of Dense and Sparse Dimensions, and Dense and Sparse Selection Scenarios.
Write: Essbase puts the data into blocks in memory and then writes the blocks to disk, finding the correct block on the disk by using the index, which is composed of pointers based on sparse intersections.
This process is repeated until all data is loaded. By using one or more processing threads in each stage, Essbase can perform some processes in parallel. See Managing Parallel Data Load Processing.
Examples in this chapter assume that you are familiar with the following topic: Data Sources.
The most effective strategy to improve performance is to minimize the number of disk I/Os that Essbase must perform while reading or writing to the database. Because Essbase loads data block by block, organizing the source data to correspond to the physical block organization reduces the number of physical disk I/Os that Essbase must perform.
The examples in this chapter illustrate ways that you can organize the data following this strategy. These examples use a subset of the Sample.Basic database, as shown in Table 184:
Table 184. Dimensions and Values for Examples
Because you do not load data into attribute dimensions, they are not relevant to this discussion although they are sparse.
Consider the following data source. Because it is not grouped by sparse-dimension member combinations, this data has not been sorted for optimization. As Essbase reads each record, it must deal with different members of the sparse dimensions.
Jan Actual Cola Ohio Sales 25 Budget "Root Beer" Florida Sales 28 Actual "Root Beer" Ohio Sales 18 Budget Cola Florida Sales 30
An optimally organized data source for the same Sample.Basic database shows different records sorted by a unique combination of sparse-dimension members: Actual -> Cola -> Ohio. Essbase accesses only one block to load these records.
Actual Cola Ohio Jan Sales 25 Actual Cola Ohio Jan Margin 18 Actual Cola Ohio Jan COGS 20 Actual Cola Ohio Jan Profit 5
You can use a data source that loads many cells per record. Ensure that records are grouped together by unique sparse-dimension member combinations. Then order the records so that the dimension in the record for which you provide multiple values is a dense dimension.
The next data source example uses a header record to identify the members of the Measures dimension, which is dense. The data is sorted first by members of the dense dimension Year and grouped hierarchically by members of the other dimensions. Multiple values for the Measures dimension are provided on each record.
Sales Margin COG Profit Jan Actual Cola Ohio 25 18 20 5 Jan Actual Cola Florida 30 19 20 10 Jan Actual "Root Beer" Ohio 18 12 10 8 Jan Actual "Root Beer" Florida 28 18 20 8
For information about arranging data in source files before loading, see Data Sources that Do Not Need a Rules File.
The following example data source is not organized in ranges. It includes unneeded repetition of fields. All values are Profit values. Profit must be included only at the beginning of the group of data applicable to it. This example contains 33 fields that Essbase must read to load the data values properly.
Profit Jan "New York" Cola 4 Jan "New York" "Diet Cola" 3 Jan Ohio Cola 8 Jan Ohio "Diet Cola" 7 Feb "New York" Cola 6 Feb "New York" "Diet Cola" 8 Feb Ohio Cola 7 Feb Ohio "Diet Cola" 9
The next example provides the same data optimized by grouping members in ranges. By eliminating redundancy, this example contains only 23 fields that Essbase must read in order to load the data values properly.
Profit Jan "New York" Cola 4 "Diet Cola" 3 Ohio Cola 8 "Diet Cola" 7 Feb "New York" Cola 6 "Diet Cola" 8 Ohio Cola 7 "Diet Cola" 9
Although sorted efficiently, the data source sorted and grouped by dense dimensions shows a lot of repetition that can slow down the load process. You can further optimize this data by grouping the data into ranges. The optimized data source below eliminates the redundant fields, reducing processing time.
Sales Margin COG Profit Jan Actual Cola Ohio 25 18 20 5 Florida 30 19 20 10 "Root Beer" Ohio 18 12 10 8 Florida 28 18 20 8
The index is organized in the same order as the sparse dimensions in the outline. To further optimize the data source, with the sparse data combinations in the data source grouped together, arrange the data so that sparse dimensions are in the same order as the outline.
Essbase pages portions of the index in and out of memory as requested by the data load or other operations. Arranging the source data to match the order of entries in the index speeds the data load because it requires less paging of the index. Less paging results in fewer I/O operations.
The methods described earlier in this chapter give you the most substantial data load performance enhancements. If you have not done so, carefully evaluate your processor speed and memory requirements and upgrade your computers to meet them.
Another way to speed data loads is to work with the Essbase parallel data load feature to optimize processor resources. The parallel data load feature recognizes opportunities to process data load tasks simultaneously. Although some opportunities present themselves on single-processor computers, many more opportunities are available on multiple-processor computers.
One form of parallel processing occurs when one thread takes advantage of processor resources that are left idle during the wait time of another thread. For example, while a thread performs I/O processing, it must wait for the slower hardware to perform its task. While this thread waits, another thread can use the idle processor resource. Processing staged tasks in parallel can improve processor efficiency by minimizing idle time.
When computers have multiple processors, Essbase can perform an additional form of parallel processing. When a data load stage completes its work on a portion of data, it can pass the work to the next stage and start work immediately on another portion of data. Processing threads perform their tasks simultaneously on the different processors, providing even faster throughput.
Although Essbase uses parallel processing to optimize processor resources across the data load stages, processor resources are idle at times. To take advantage of these times, Essbase can further divide record processing in the preparation and write stages. To tailor parallel processing to your situation, you can use the DLTHREADSPREPARE and DLTHREADSWRITE essbase.cfg settings to tell Essbase to use additional threads during these stages.
As shown in Table 185, Essbase provides three essbase.cfg settings that enable you to manage parallel data load processing.
Table 185. Parallel Data Load essbase.cfg Settings
Specifies how many threads Essbase may use during the data load stage that writes data to the disk. High values may require allocation of additional cache. See Implications in Sizing the Data Cache.
Only when the DLSINGLETHREADPERSTAGE setting is set to FALSE for the specific application and database being loaded does the data load process use the thread values specified in the DLTHREADSPREPARE and DLTHREADSWRITE settings.
For block storage databases, Essbase Server allocates the data cache memory area to hold uncompressed data blocks. Each thread specified by the DLTHREADSWRITE setting uses an area in the data cache equal to the size of an expanded block.
Depending on the size of the block, the number of threads, and how much data cache is used by other concurrent operations during a data load, it may be possible to need more data cache than is available. In such circumstances, decrease the number of threads or increase the size of the data cache.
While processing data loads, you can view processor activity. Different operating systems provide different tools for viewing processor activity. For example, the Task Manager in Windows enables you to view processor and memory usage and processes. Among the tools available on UNIX are top and vmstat. You can also use third-party tools to view and analyze system use.
To assess system use during data load processing;
Alter the essbase.cfg settings described in Setting Parallel Data Load Settings.