Sizing Your Database/DataStage Servers

EPM uses hash files extensively. They are stored in a directory that you specify on the server, so keep them in mind when you determine sizing requirements: the amount of data in the hash files increases over time. The server directory should also hold the flat files and XML file inputs that the ETL process requires. Generally, every staging table and every dimension table has a corresponding hash file, so the total size of the hash files is a function of the size of the data stored in the staging and dimension tables. However, it is also important to remember that only the relevant columns of a table are loaded into a hash file.

To size the space requirement for hash files, we suggest that you take a few sample hash files and compare them with the underlying tables. Also compare the structure of each table with the number of its columns that are actually loaded into the hash file. It is very important to keep a sufficient buffer for future incremental data, because hash files grow as the data grows over time. Another option is an unsupported tool provided on the IBM WebSphere DataStage CD: HFC.exe, which is short for Hash File Calculator.
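
The sampling approach above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of EPM or DataStage: the function names, the sample figures, and the 50 percent growth buffer are all hypothetical, and you would substitute sizes measured in your own environment.

```python
import math


def hash_to_table_ratio(samples):
    """Return the overall hash-file-to-table size ratio.

    samples: list of (table_bytes, hash_file_bytes) pairs measured on a
    few sample tables and their hash files (hypothetical input format).
    """
    total_table = sum(table for table, _ in samples)
    total_hash = sum(hash_size for _, hash_size in samples)
    if total_table <= 0:
        raise ValueError("sample tables must have a positive total size")
    return total_hash / total_table


def projected_hash_bytes(table_bytes, ratio, growth_buffer=0.5):
    """Estimate hash file space for a table, padded for future growth.

    growth_buffer is an assumed fraction (0.5 = 50 percent headroom);
    choose a value that matches your expected incremental data volume.
    """
    return math.ceil(table_bytes * ratio * (1 + growth_buffer))


# Made-up sample measurements: two tables and their hash files, in bytes.
samples = [(1000, 500), (3000, 1500)]
ratio = hash_to_table_ratio(samples)
estimate = projected_hash_bytes(10000, ratio)
```

Because only the relevant columns are loaded into a hash file, the ratio can differ considerably from one table to another, so it is safer to sample tables whose structure resembles the ones you are sizing.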

Perform the following server sizing tasks before you begin implementing ETL jobs:

  1. Refer to all relevant database sizing documents delivered with EPM, and thoroughly familiarize yourself with them before implementation.

  2. Perform database sizing, considering all the tables that are populated by the ETL process as well as those used for reporting.

  3. Run the delivered script for inserting a Not Available row into all relevant tables.

    This script inserts one Not Available row into each of these tables, which is a prerequisite for the ETL application.

    Note: You can find the script on the installation CD in the following location: <PSHOME>\SRC\ETL.

  4. To size the DataStage server, determine the number of hash files that will be created for the subset of the ETL application that you are going to implement.

    You can use the list of jobs you have created in previous steps and the list of hash files that are supplied along with EPM.

  5. Calculate the space required for storing all of these hash files.

    To size the hash files, you must consider hash file properties and structure, as well as the amount of data associated with each hash file.

    Note: A buffer should be allocated for future incremental data (growth in the size of the hash file).

  6. Decide where you will physically store hash files, and set that location in the environment parameter.

    Space is also required for DataStage server log files.

  7. Allocate space for all the other input data files such as XML files, parameter files, and *.dat files.
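
Steps 4 through 7 amount to a simple tally, which the following Python sketch illustrates. Everything in it is an assumption made for the example: the job and hash file names are hypothetical, the sizes are made up, and the 25 percent growth buffer is arbitrary. In practice the job-to-hash-file mapping would come from the list of hash files supplied with EPM.

```python
import math


def hash_files_for_jobs(selected_jobs, job_to_hash_files):
    """Collect the distinct hash files used by the jobs being implemented."""
    needed = set()
    for job in selected_jobs:
        needed.update(job_to_hash_files.get(job, ()))
    return needed


def total_server_bytes(needed_hash_files, hash_file_bytes,
                       log_bytes, input_file_bytes, growth_buffer=0.25):
    """Add up hash file space (with headroom), log files, and input files.

    growth_buffer is an assumed fraction of extra space reserved for
    future incremental data in the hash files.
    """
    hash_total = sum(hash_file_bytes[name] for name in needed_hash_files)
    padded_hash_total = math.ceil(hash_total * (1 + growth_buffer))
    return padded_hash_total + log_bytes + input_file_bytes


# Hypothetical mapping of ETL jobs to the hash files they use.
job_to_hash_files = {
    "JOB_A": ["HASH_X", "HASH_Y"],
    "JOB_B": ["HASH_Y", "HASH_Z"],
    "JOB_C": ["HASH_W"],
}
# Made-up estimated size of each hash file, in bytes.
hash_file_bytes = {"HASH_X": 100, "HASH_Y": 200, "HASH_Z": 100, "HASH_W": 999}

needed = hash_files_for_jobs(["JOB_A", "JOB_B"], job_to_hash_files)
total = total_server_bytes(needed, hash_file_bytes,
                           log_bytes=50, input_file_bytes=25)
```

A hash file shared by several jobs is counted once, which is why the sketch collects names into a set before adding up sizes.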

DataStage Server Requirements

Please see the IBM Information Server: Planning Installation and Configuration Guide for the minimum requirements to install the DataStage Server on a specific platform.

DataStage Client Requirements

Please see the IBM Information Server: Planning Installation and Configuration Guide for the minimum requirements to install the DataStage Client.