
ETL Jobs

This table provides answers to general questions about ETL jobs.

What are the different categories of jobs in EPM?

The following is a list of EPM job categories:

  • Source to OWS (Staging area)

  • OWS to OWE

  • OWS to MDW

  • OWE to MDW

  • OWS to MDW (for CRM online marketing data mart)

  • OWE to source (for GL and Position Budgeting in HCM)

What types of staging loads does EPM support?

EPM supports incremental staging loads along with a small number of destructive staging loads.

See Understanding ETL Load Strategies in EPM.

What is the load strategy for the MDW and OWE?

Most MDW and OWE loads are delivered with incremental logic that uses the LASTUPD_EW_DTTM field, which is part of the LOAD_OWS_SBR subrecord.

See Understanding ETL Load Strategies in EPM.
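
As a rough illustration of this pattern only (not the delivered DataStage job design), the following Python sketch builds an incremental source filter on LASTUPD_EW_DTTM; the table name, the helper functions, and the Oracle-style TO_DATE syntax are assumptions for illustration.

    # Illustrative sketch of an incremental source filter on LASTUPD_EW_DTTM.
    # The table name and the Oracle-style TO_DATE syntax are assumptions; in
    # EPM the filter is applied inside the DRS source stage of the job.
    def build_incremental_filter(last_run_dttm):
        # Select only rows touched since the previous successful run.
        cond = "LASTUPD_EW_DTTM > TO_DATE('{0}', 'YYYY-MM-DD HH24:MI:SS')"
        return cond.format(last_run_dttm)

    def build_source_query(last_run_dttm):
        # PS_SOURCE_OWS is a hypothetical staging table name.
        return "SELECT * FROM PS_SOURCE_OWS WHERE " + build_incremental_filter(last_run_dttm)

    print(build_source_query("2024-01-01 00:00:00"))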

How does EPM manage incremental loading if the source fields have Null values for the Datetime stamp?

If the datetime column is a nullable field on the source database, the source filter includes a condition to bring in rows with a null datetime stamp along with the incremental data.

See Understanding ETL Load Strategies in EPM.
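
As a minimal sketch of that condition (the column name is an assumption, not the delivered filter), the incremental filter effectively gains an IS NULL branch:

    # Sketch of an incremental filter extended to also pick up rows whose
    # source datetime stamp is NULL. The column name is illustrative only.
    def build_null_safe_filter(last_run_dttm):
        cond = ("(LASTUPD_DTTM > TO_DATE('{0}', 'YYYY-MM-DD HH24:MI:SS') "
                "OR LASTUPD_DTTM IS NULL)")
        return cond.format(last_run_dttm)

    print(build_null_safe_filter("2024-01-01 00:00:00"))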

Are there any special loading requirements between the first run and subsequent runs?

No, there are no special loading requirements between the first run and subsequent runs, provided the prepackaged jobs are used without any customization.

How does EPM manage deletes in the source tables?

For the Campus Solutions Warehouse, EPM provides staging jobs that can identify source records that have been physically deleted from your PeopleSoft transaction system and flag those records for physical deletion from the Campus Solutions Warehouse schema.

For more information see the topic Managing Source System Deletes and Archiving in the PeopleSoft Campus Solutions Warehouse.

Does EPM have the functionality to accommodate late arriving dimensions?

No, EPM does not have the functionality to accommodate late arriving dimensions.

What types of data sources does EPM accommodate?

EPM can accommodate XML Files, Flat Files, and RDBMS tables.

Does EPM use any control tables in its ETL design?

EPM no longer uses control tables; those used in past EPM releases (such as ETL_JOB_CTL, ETL_RUN_HST, and ETL_TSE_ERR) have been deprecated.

Does EPM use surrogate keys?

If so, how are Surrogate IDs generated and managed?

DataStage manages the UniVerse file SDKSequences, which holds the surrogate key sequence for a particular key.

See Surrogate Keys.
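
The mechanics can be pictured with the minimal Python sketch below, which mimics only the general pattern of a persisted sequence store keyed by sequence name, similar in spirit to what SDKSequences holds; the file name and sequence name are assumptions, and this is not the DataStage implementation.

    # Minimal sketch of a surrogate-key sequence store keyed by sequence
    # name, similar in spirit to the SDKSequences file. File and sequence
    # names are assumptions; this is not the DataStage implementation.
    import json, os

    SEQ_FILE = "sdk_sequences.json"

    def next_surrogate_id(sequence_name):
        store = {}
        if os.path.exists(SEQ_FILE):
            with open(SEQ_FILE) as f:
                store = json.load(f)
        next_id = store.get(sequence_name, 0) + 1   # last value for this key + 1
        store[sequence_name] = next_id
        with open(SEQ_FILE, "w") as f:
            json.dump(store, f)
        return next_id

    print(next_surrogate_id("D_DEPT"))   # hypothetical sequence name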

Does EPM have a strategy for handling slowly changing dimensions?

In EPM, the D_EMPL_JOB dimension from the HCM warehouse is designed as a Type 2 slowly changing dimension; all other dimension loads are Type 1. However, the lookup operation supports Type 2 slowly changing dimensions (for example, whenever there is a lookup on another dimension, it uses effective-dated logic).

For more information on configuring Type 2 slowly changing dimensions, see the Configuring Slowly Changing Dimensions topic in your warehouse specific documentation (for example, the PeopleSoft Campus Solutions Warehouse).
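
As a hedged illustration of the effective-dated lookup described above, the sketch below selects the dimension row whose effective date range covers a transaction date; the column names (EMPL_JOB_SID, EMPLID, EFF_START_DT, EFF_END_DT) are assumptions for illustration, not necessarily the delivered EPM columns.

    # Hedged sketch of a Type 2 (effective-dated) dimension lookup: pick the
    # dimension row whose date range covers the transaction date. Column
    # names are assumptions, not necessarily the delivered EPM columns.
    def type2_lookup_sql(emplid, txn_date):
        template = ("SELECT EMPL_JOB_SID FROM D_EMPL_JOB "
                    "WHERE EMPLID = '{0}' "
                    "AND '{1}' BETWEEN EFF_START_DT AND EFF_END_DT")
        return template.format(emplid, txn_date)

    print(type2_lookup_sql("KU0001", "2024-06-30"))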

How do I change my dimension load from a Type 1 design to a Type 2 design?

For more information on configuring Type 2 slowly changing dimensions, see the Configuring Slowly Changing Dimensions topic in your warehouse specific documentation (for example, the PeopleSoft Campus Solutions Warehouse).

Does EPM support ETL rollbacks?

If so, how is this handled?

Rollback is possible through the Transaction Size parameter. If the transaction size is set to zero and the job aborts in the middle, the job rolls back its transactions, since it follows the principle of two-way commit. If the transaction size is anything other than zero and the job fails in the middle, the job commits the rows that were processed up to the point of the error.
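
The effect of the parameter can be pictured with this Python sketch, in which a transaction size of zero means a single commit at the end (so a mid-run failure rolls back everything), while a non-zero size commits every N rows; the connection object, paramstyle, and target table are generic DB-API style assumptions, not the DRS stage itself.

    # Sketch of how the Transaction Size parameter governs rollback behavior.
    # conn is any DB-API style connection; the table name and paramstyle are
    # illustrative, and this is not the DRS stage code.
    def load_rows(conn, rows, transaction_size):
        cur = conn.cursor()
        try:
            for i, row in enumerate(rows, start=1):
                cur.execute("INSERT INTO TARGET_TBL VALUES (?, ?)", row)
                if transaction_size and i % transaction_size == 0:
                    conn.commit()   # rows committed so far survive a later failure
        except Exception:
            conn.rollback()         # with transaction size 0, everything rolls back
            raise
        else:
            conn.commit()           # final commit for any remaining rows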

What aggregation strategy does EPM ETL employ?

The Aggregator stage is generally not used in job design because the database can perform aggregation functions more efficiently than DataStage, so aggregation is better left to the database.

Whenever aggregation must be performed on the source data, it is achieved within the DRS source stage itself. For generated SQL queries, aggregate functions are specified against the corresponding derivation columns and the GROUP BY clause is given in the 'Other clauses' text area. Wherever the User Defined SQL option is selected, the query is written with the appropriate aggregate functions.

In specific instances where an aggregation function must be performed on data that is transformed rather than read directly from the database, or where the number of records is large, a temporary table is created; the data is written to it temporarily and then read back so that the aggregation functions can be performed.
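
To make the generated-SQL case concrete, the sketch below shows the kind of statement that results when an aggregate function is placed in a column derivation and the GROUP BY is supplied in the 'Other clauses' text area; the table and column names are assumptions for illustration.

    # Sketch of the SQL effectively generated when an aggregate function is
    # given in a column derivation and the GROUP BY is supplied in the
    # 'Other clauses' text area. Table and column names are assumptions.
    derivations = {                     # target column -> derivation expression
        "DEPTID": "DEPTID",
        "TOTAL_AMT": "SUM(MONETARY_AMOUNT)",
    }
    other_clauses = "GROUP BY DEPTID"

    generated_sql = "SELECT {0} FROM PS_LEDGER_OWS {1}".format(
        ", ".join(derivations.values()), other_clauses)
    print(generated_sql)
    # SELECT DEPTID, SUM(MONETARY_AMOUNT) FROM PS_LEDGER_OWS GROUP BY DEPTID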

What indexing strategy does EPM ETL employ?

EPM data models are delivered with indexes. To improve ETL performance, drop the indexes before loading the target tables and rebuild them after the load.
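
A minimal sketch of that drop-and-rebuild pattern around a load, assuming a generic DB-API connection and hypothetical index and table names:

    # Sketch of the drop-indexes / load / rebuild-indexes pattern.
    # Index and table names are hypothetical; conn is a generic DB-API
    # connection, and DROP INDEX syntax varies by database platform.
    INDEXES = {
        "F_LEDGER_IDX1": "CREATE INDEX F_LEDGER_IDX1 ON F_LEDGER (DEPT_SID)",
    }

    def load_with_index_maintenance(conn, run_load):
        cur = conn.cursor()
        for name in INDEXES:
            cur.execute("DROP INDEX " + name)   # drop before the load
        run_load(conn)                          # bulk load the target table
        for ddl in INDEXES.values():
            cur.execute(ddl)                    # rebuild after the load
        conn.commit()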

How are lookups used in the EPM ETL design?

Lookups are usually performed with a Hashed File stage, except for relational joins, which are performed in the DRS stage instead.

See ETL Hashed Files, Understanding Data Validation and Error Handling in the ETL Process.
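
As a rough analogy only (not DataStage code), a hashed-file lookup behaves like keyed reference data loaded once and probed per input row, whereas a relational join is pushed down to the database through the DRS stage; the field values below are illustrative.

    # Rough analogy of a hashed-file lookup: reference data is keyed once,
    # then probed for each input row. Values are illustrative only.
    reference_rows = [("USA", "United States"), ("CAN", "Canada")]
    lookup = dict(reference_rows)            # stands in for the hashed file

    for country_cd in ("USA", "CAN", "MEX"):
        descr = lookup.get(country_cd)       # None signals a failed lookup
        print(country_cd, descr)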

What types of job parameters does EPM use to increase run time flexibility?

Parameterization helps you enter run-time parameters without having to change jobs.

Run-time information, such as the database type, the database connection parameters, and parameter file directories, should be set as environment variables, which are then used in individual jobs.

Parameter files are used for jobs that read user-supplied variable values, or lists of values, that may change from run to run. The variables and their respective values are given in the parameter files.

See Parameter and Source Data Files Information.
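
The pattern (not the DataStage mechanism itself) can be sketched as follows; the environment variable names and the name=value parameter file layout are assumptions for illustration, not the delivered EPM conventions.

    # Sketch of reading run-time values from environment variables and a
    # simple name=value parameter file. The variable names and file layout
    # are assumptions, not the delivered EPM conventions.
    import os

    db_type  = os.environ.get("DB_TYPE", "ORACLE")
    db_conn  = os.environ.get("DB_CONNECT_STRING", "")
    parm_dir = os.environ.get("PARM_FILE_DIR", "/tmp")

    def read_parameter_file(path):
        params = {}
        with open(path) as f:
            for line in f:
                line = line.strip()
                if line and "=" in line:
                    name, value = line.split("=", 1)
                    params[name] = value
        return params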

How is DataStage code re-used?

PeopleSoft packages reusable DataStage code with Shared Containers, routines, and some server jobs (found in the Reusable Jobs folder).

Are there any customizations required to handle Unicode data?

To support Unicode databases, the DataStage Server must be installed with NLS enabled. Also, the proper character set should be selected in the DataStage Administrator, based on the user's requirements.