Appendix: ETL Prerequisites and Considerations

This appendix provides initial ETL preparation tasks and discusses how to:

  • Prepare to install and implement Ascential DataStage.

  • Size your database/DataStage servers.

  • Determine server configuration and install Ascential DataStage.

  • Review Ascential DataStage implementation considerations.

  • Define a job execution strategy.

See Also

Preparing to Load Data Into EPM

Preparing to Install and Implement Ascential DataStage

Perform the following preparatory tasks before you begin implementing ETL jobs:

  1. Create a detailed list of all the EPM products that have been purchased and the related license codes.

    Identify the products you are going to implement and the order in which you will implement them.

  2. Create a detailed implementation schedule that accounts for the EPM data marts and business units you are going to implement.

  3. Review the list of ETL application software components (such as *.dsx, parameter, and DSParams files) and identify which are necessary for your requirements based on your implementation schedule.

    See ETL Reference Documents.

  4. Identify the database tables that will be populated and the corresponding jobs that must be executed to populate them (see the inventory sketch after this list).

    Note. In addition to the jobs that directly populate the relevant target tables, you must also identify all dependent jobs, such as hash file load jobs.

    See ETL Lineage Reports.xls in Customer Connection for more details.

  5. Perform all non-ETL implementation tasks.
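
For step 4, it can help to maintain a simple inventory that maps each target table to the jobs that load it, including the dependent hash file load jobs. The sketch below is a minimal illustration, assuming you have exported the relevant lineage information to a CSV file; the file name and column names (TARGET_TABLE, JOB_NAME, JOB_TYPE) are hypothetical and should be adjusted to your actual export.

```python
import csv
from collections import defaultdict

def load_job_inventory(path):
    """Group ETL jobs (including dependent hash file load jobs) by target table.

    Assumes a hypothetical CSV export of the lineage report with columns
    TARGET_TABLE, JOB_NAME, and JOB_TYPE; adjust to match your actual export.
    """
    inventory = defaultdict(list)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            inventory[row["TARGET_TABLE"]].append((row["JOB_NAME"], row["JOB_TYPE"]))
    return inventory

if __name__ == "__main__":
    inventory = load_job_inventory("etl_lineage_export.csv")  # hypothetical file name
    for table, jobs in sorted(inventory.items()):
        print(table)
        for job_name, job_type in jobs:
            print(f"  {job_name} ({job_type})")
```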

Sizing Your Database/DataStage Servers

Perform the following server sizing tasks before you begin implementing ETL jobs:

  1. Refer to all relevant database sizing documents delivered with EPM, and thoroughly familiarize yourself with them before implementation.

  2. Perform database sizing, considering all the tables that are populated by the ETL process as well as those used for reporting.

  3. Run the delivered script for inserting a Not Available row into all relevant tables.

    This script inserts one Not Available row into each relevant table, which is a prerequisite for the ETL application; an illustration of such a row appears after this list.

    Note. You can find the script on the installation CD in the following location: <PSHOME>\SRC\ETL.

  4. To size the DataStage server, determine the number of hash files that will be created for the subset of the ETL application that you are going to implement.

    You can use the list of jobs you have created in previous steps and the list of hash files that are supplied along with EPM.

  5. Calculate the space required to store all of these hash files.

    To size hash files, consider hash file properties and structure, as well as the amount of data associated with each hash file. A rough sizing sketch appears after this list.

    Note. Allocate a buffer for future incremental data (growth in hash file size).

  6. Decide where you will physically store hash files (within the DataStage server directory or elsewhere).

    Space is also required for DataStage server log files.

  7. Allocate space for all the other input data files such as XML files, parameter files, and *.dat files.
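
To illustrate the prerequisite described in step 3, the following sketch inserts a single Not Available row into one table. It is an illustration only, using a hypothetical table, columns, and key value against an in-memory SQLite database; for the actual implementation, run the delivered script from <PSHOME>\SRC\ETL.

```python
import sqlite3

# Illustration only: the delivered script covers every relevant table and runs
# against your actual database platform. Table and column names here are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SAMPLE_DIM (DIM_KEY INTEGER PRIMARY KEY, DESCR TEXT)")

# One "Not Available" row per table is the prerequisite described in step 3.
conn.execute("INSERT INTO SAMPLE_DIM (DIM_KEY, DESCR) VALUES (?, ?)", (0, "Not Available"))
conn.commit()

print(conn.execute("SELECT * FROM SAMPLE_DIM").fetchall())  # [(0, 'Not Available')]
```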
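For steps 4 and 5, a rough storage estimate can be derived from expected row counts and average row sizes, plus a buffer for growth. The overhead factor and growth percentage below are illustrative assumptions, not DataStage-documented values; refine them against the hash file properties and data volumes of your own jobs.

```python
def estimate_hash_file_bytes(row_count, avg_row_bytes, overhead_factor=1.5):
    """Rough size estimate for one hash file; the overhead factor is an assumption."""
    return int(row_count * avg_row_bytes * overhead_factor)

def total_storage_with_growth(hash_files, growth_pct=30):
    """Sum estimated sizes and add a buffer for future incremental data (step 5 note)."""
    base = sum(estimate_hash_file_bytes(rows, size) for rows, size in hash_files)
    return int(base * (1 + growth_pct / 100))

# Hypothetical inventory: (expected rows, average row size in bytes) per hash file.
hash_files = [(500_000, 120), (2_000_000, 80), (50_000, 300)]
print(f"Estimated storage: {total_storage_with_growth(hash_files) / 1024**2:.1f} MB")
```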

Determining Server Configuration and Installing Ascential DataStage

Perform the following server configuration and installation tasks before you begin implementing ETL jobs:

  1. Determine a suitable server configuration for your development, QA, and production environments.

  2. Install the DataStage servers.

    Create separate servers for development, QA, and production environments.

  3. Perform all required steps to configure the database, depending on your source and target databases.

  4. Install the DataStage client.

  5. Apply the latest patches for DataStage server and client.

Ascential DataStage Implementation Considerations

Note the following considerations before you begin your DataStage implementation:

  1. Perform a detailed analysis of your project creation strategy.

    Decide whether you want a single project for the whole EPM application or separate projects for each data mart.

  2. Create separate DataStage projects for development, QA, and production.

    PeopleSoft recommends that the production project reside on a separate DataStage server.

  3. Classify your jobs as high, medium, and low volume.

    Provide project defaults for array size, transaction size, IPC buffer size, and other performance parameters; a sketch of such defaults appears after this list. Handle exceptions and special cases by changing the value at the job level.

  4. Open a sample job from each category and familiarize yourself with the filter conditions in the source, the update strategy, the job design, the job parameters, and other transformations.

  5. Review the master run utility and create appropriate sequential file inputs.

    Analyze this feature and decide on the different categories that you want to run using this utility.

  6. Review the master sequencers and familiarize yourself with them.

  7. Open one of the business processes and identify all the jobs that are required to run it.

    Run it as an example to learn how the jobs are ordered, their interdependencies, the hash file usage, and so forth.
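
One way to manage the classification in step 3 is to record project defaults per volume category and apply job-level overrides only for the exceptions. The parameter names, values, and job name below are illustrative assumptions, not delivered settings.

```python
# Illustrative project defaults per volume category (step 3); values are assumptions,
# not delivered settings. Tune array size, transaction size, and IPC buffer size
# against your own benchmarks.
PROJECT_DEFAULTS = {
    "high":   {"array_size": 5000, "transaction_size": 10000, "ipc_buffer_kb": 1024},
    "medium": {"array_size": 1000, "transaction_size": 5000,  "ipc_buffer_kb": 512},
    "low":    {"array_size": 100,  "transaction_size": 1000,  "ipc_buffer_kb": 128},
}

# Job-level overrides handle the exceptions and special cases.
JOB_OVERRIDES = {"J_Fact_Ledger_Load": {"transaction_size": 20000}}  # hypothetical job

def job_parameters(job_name, volume_category):
    """Merge project defaults with any job-level overrides."""
    params = dict(PROJECT_DEFAULTS[volume_category])
    params.update(JOB_OVERRIDES.get(job_name, {}))
    return params

print(job_parameters("J_Fact_Ledger_Load", "high"))
```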

Defining a Job Execution Strategy

Note the following job execution strategies before you begin running jobs:

  1. Plan a job scheduling strategy and use the DataStage Director scheduler or a third-party tool.

    Do a sample run using the scheduling tool to test whether the tool meets all your requirements for scheduling the application.

  2. Familiarize yourself with all the job execution utilities that are provided with DataStage.

  3. Define the error validation strategy you wish to use in your jobs; a simple wrapper sketch that starts jobs and checks their status appears after this list.
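
Whichever scheduling tool you choose, it ultimately needs to start each job and validate the outcome. The following sketch assumes the DataStage dsjob command-line utility is available on the path and that a nonzero exit status indicates a problem; the project name and job names are hypothetical, and you should confirm the exact dsjob options and status codes for your DataStage release.

```python
import subprocess
import sys

# Hypothetical project and job list; replace with your own.
PROJECT = "EPM_DEV"
JOBS = ["J_Hash_Dim_Load", "J_Fact_Load"]

def run_job(project, job):
    """Start a job with dsjob and wait for its status.

    Assumes dsjob is on the path; -run and -jobstatus are standard options,
    but verify them against your DataStage release's documentation.
    """
    result = subprocess.run(["dsjob", "-run", "-jobstatus", project, job])
    return result.returncode

if __name__ == "__main__":
    for job in JOBS:
        status = run_job(PROJECT, job)
        if status != 0:
            # Simple error validation strategy: stop the run and report the failure.
            sys.exit(f"Job {job} failed with status {status}")
    print("All jobs completed successfully")
```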