Appendix: ETL Installation and Implementation Prerequisites and Considerations

This appendix provides initial ETL preparation tasks and discusses how to:

Prepare to install and implement IBM WebSphere DataStage.

Size your database and DataStage servers.

Determine server configuration and install IBM WebSphere DataStage.

Review IBM WebSphere DataStage implementation considerations.

Define a job execution strategy.

See Also

Preparing to Load Source Data Into EPM

Setting Up DataStage for EPM

Preparing to Install and Implement IBM WebSphere DataStage

Perform the following preparatory tasks before you begin implementing ETL jobs:

  1. Create a detailed list of all the EPM products that have been purchased and the related license codes.

    Identify and enumerate the products you are going to implement and in what order.

  2. Create a detailed implementation schedule that accounts for the EPM data marts and business units you are going to implement.

  3. Review the list of ETL application software components (such as .dsx, parameter, and DSParams files) and identify which are necessary for your requirements based on your implementation schedule.

    See ETL Reference Documents.

  4. Identify the list of database tables that will be populated and the list of corresponding jobs that must be executed to populate these tables.

    Note. Apart from the jobs that directly populate the relevant target tables, you must also identify all dependent jobs, such as hash file load jobs; see the dependency-ordering sketch after this list.

  5. Perform all non-ETL implementation tasks.
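
Because dependent jobs such as hash file loads must run before the jobs that consume their output, it can help to derive an explicit run order from your job list. The following Python sketch shows one way to do that with a topological sort; the job names and the dependency map are hypothetical examples for illustration, not delivered EPM metadata.

```python
from graphlib import TopologicalSorter

# Hypothetical job dependencies: each target-table load job depends on
# the hash file load jobs (and other jobs) that must run first. Replace
# these names with the job lists delivered with your EPM release.
dependencies = {
    "J_Dim_Account": {"J_Hash_PS_GL_ACCOUNT_TBL"},
    "J_Fact_Ledger": {"J_Dim_Account", "J_Hash_PS_LED_DEFN_TBL"},
    "J_Hash_PS_GL_ACCOUNT_TBL": set(),
    "J_Hash_PS_LED_DEFN_TBL": set(),
}

# static_order() yields jobs so that every dependent job (such as a hash
# file load) appears before the jobs that read from it.
for job in TopologicalSorter(dependencies).static_order():
    print(job)
```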

Sizing Your Database/DataStage Servers

EPM uses hash files extensively; they are stored in a directory that you specify on the server. Keep this in mind when you determine sizing requirements, because the amount of data in the hash files increases over time. The same server directory must also hold the flat files and XML file inputs that the ETL process requires. Generally, every staging table and every dimension table has a corresponding hash file, so the total size of the hash files is a function of the size of the data stored in the staging and dimension tables. Remember, however, that only the relevant columns in a table are loaded into its hash file.

To size the space requirement for hash files, take a few sample hash files and compare them with their underlying tables, noting each table's structure and the number of its columns that are actually loaded into the hash file. Keep a sufficient buffer for future incremental data, because hash files grow as the data size increases over time. Alternatively, you can use HFC.exe (Hash File Calculator), an unsupported tool provided on the IBM WebSphere DataStage CD.
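
As a rough illustration of this sampling approach, the following Python sketch extrapolates hash file space from one measured sample. It is a minimal sketch, not a delivered utility: the path, row counts, and growth buffer are hypothetical values that you would replace with your own measurements.

```python
import os

def disk_usage(path):
    """Total bytes under path (a hash file may be a directory on disk)."""
    if os.path.isfile(path):
        return os.path.getsize(path)
    return sum(
        os.path.getsize(os.path.join(root, name))
        for root, _, files in os.walk(path)
        for name in files
    )

# Hypothetical sample values; substitute your own measurements.
sample_hash_file = "/opt/dstage/hash/HASH_PS_GL_ACCOUNT_TBL"
sample_row_count = 250_000        # rows currently loaded into the sample
projected_row_count = 400_000     # rows expected after future loads
growth_buffer = 1.25              # headroom for incremental data

# Extrapolate: space scales with the rows actually loaded into the file.
bytes_per_row = disk_usage(sample_hash_file) / sample_row_count
estimate = projected_row_count * bytes_per_row * growth_buffer
print(f"Estimated space: {estimate / 2**20:.1f} MiB")
```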

Perform the following server sizing tasks before you begin implementing ETL jobs:

  1. Refer to all relevant database sizing documents delivered with EPM, and thoroughly familiarize yourself with them before implementation.

  2. Perform database sizing, considering all the tables that are populated by the ETL process as well as those used for reporting.

  3. Run the delivered script that inserts a Not Available row into all relevant tables.

    This script inserts one Not Available row into each relevant table; these rows are a prerequisite for the ETL application.

    Note. You can find the script on the installation CD in the following location: <PSHOME>\SRC\ETL.

  4. To size the DataStage server, determine the number of hash files that will be created for the subset of the ETL application that you are going to implement.

    You can use the list of jobs you have created in previous steps and the list of hash files that are supplied along with EPM.

  5. Calculate the space required to store all of these hash files.

    To size the hash files, you must consider hash file properties and structure, as well as the amount of data associated with each hash file; see the space-reporting sketch after this list.

    Note. A buffer should be allocated for future incremental data (growth in the size of the hash files).

  6. Decide where you will physically store hash files by setting the value of the appropriate environment parameter.

    Space is also required for DataStage server log files.

  7. Allocate space for all the other input data files such as XML files, parameter files, and *.dat files.
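
To support steps 4 through 6, a small script can report how much space the hash file directory currently consumes and suggest a budget with growth headroom. This is a minimal sketch under stated assumptions: the directory path and growth factor below are hypothetical placeholders.

```python
import os

hash_dir = "/opt/dstage/hash"   # hypothetical value of the hash file
                                # directory environment parameter
growth_factor = 1.5             # assumed headroom for incremental data

# Walk the directory and sum the size of every file it contains.
total = sum(
    os.path.getsize(os.path.join(root, name))
    for root, _, files in os.walk(hash_dir)
    for name in files
)

print(f"Current usage:    {total / 2**30:.2f} GiB")
print(f"Suggested budget: {total * growth_factor / 2**30:.2f} GiB")
```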

DataStage Server Requirements

See the IBM Information Server: Planning, Installation, and Configuration Guide for the minimum requirements to install the DataStage server on a specific platform.

DataStage Client Requirements

See the IBM Information Server: Planning, Installation, and Configuration Guide for the minimum requirements to install the DataStage client.

Determining Server Configuration and Installing IBM WebSphere DataStage

Perform the following server configuration and installation tasks before you begin implementing ETL jobs:

  1. Determine a suitable server configuration for your development, QA, and production environments.

  2. Install the DataStage servers.

    Create separate servers for development, QA, and production environments.

  3. Perform all required steps to configure the database, depending on your source and target databases.

  4. Install the DataStage client.

  5. Apply the latest patches for DataStage server and client.

IBM WebSphere DataStage Implementation Considerations

Note the following considerations before you begin your DataStage implementation:

  1. Perform a detailed analysis of your project creation strategy.

    Decide whether you want a single project for the whole EPM application or a separate project for each data mart.

  2. Create separate DataStage projects for development, QA, and production.

    PeopleSoft recommends that the production project reside on a separate DataStage server.

  3. Classify your jobs as high, medium, and low volume.

    Provide project defaults for array size, transaction size, IPC buffer size, and other performance parameters; see the sketch after this list. Handle any exceptions and special cases by changing the values at the job level.

  4. Open a sample job from each category and familiarize yourself with the filter conditions in the source, the update strategy, the job design, the job parameters, and the other transformations.

  5. Review the master run utility and create appropriate sequential file inputs.

    Analyze this feature and decide on the different categories that you want to run using this utility.

  6. Review the master sequencers and familiarize yourself with them.

  7. Open one of the business processes and identify all the jobs that are required to run it.

    Run it as an example to learn how the jobs are ordered, their interdependencies, their hash file usage, and so forth.
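
As an illustration of consideration 3, volume-based project defaults can be kept in a simple mapping with job-level overrides. The three categories come from the text above, but every parameter value and job name below is a hypothetical placeholder, not a recommended setting.

```python
# Hypothetical project defaults per volume category. The values are
# placeholders for illustration; tune them for your own environment.
PERFORMANCE_DEFAULTS = {
    "high":   {"array_size": 5000, "transaction_size": 10000, "ipc_buffer_kb": 1024},
    "medium": {"array_size": 1000, "transaction_size": 2000,  "ipc_buffer_kb": 512},
    "low":    {"array_size": 100,  "transaction_size": 500,   "ipc_buffer_kb": 128},
}

# Job-level exceptions override the project defaults, as the text advises.
job_overrides = {"J_Fact_Ledger": {"array_size": 10000}}

def params_for(job, volume):
    """Return the effective parameters: project defaults plus overrides."""
    settings = dict(PERFORMANCE_DEFAULTS[volume])
    settings.update(job_overrides.get(job, {}))
    return settings

print(params_for("J_Fact_Ledger", "high"))
```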

Defining a Job Execution Strategy

Consider the following job execution strategies before you begin running jobs:

  1. Plan a job scheduling strategy using the DataStage Director scheduler or a third-party tool.

    Do a sample run using the scheduling tool to verify that it meets all your requirements for scheduling the application.

  2. Familiarize yourself with all the job execution utilities that are provided with DataStage; see the sketch after this list.

  3. Define the error validation strategy that you want to use in your jobs.
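
For example, DataStage includes the dsjob command-line utility for running and monitoring jobs. The following Python sketch wraps a dsjob invocation for scripted execution; the project and job names are hypothetical, and dsjob options can vary by release, so confirm them against your DataStage documentation.

```python
import subprocess

def run_job(project, job):
    """Run a DataStage job via dsjob and wait for its exit status.

    With -jobstatus, dsjob waits for the job to finish and reflects the
    job's final status in its exit code. Names below are examples only.
    """
    result = subprocess.run(
        ["dsjob", "-run", "-jobstatus", project, job],
        capture_output=True, text=True,
    )
    print(result.stdout)
    if result.returncode != 0:
        raise RuntimeError(f"{job} failed with status {result.returncode}")

run_job("EPM_PROJECT", "J_Fact_Ledger")  # hypothetical project and job
```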