Appendix: ETL Frequently Asked Questions

This appendix provides answers to frequently asked ETL questions for EPM, organized by topic.

IBM WebSphere DataStage ETL Tools

This section provides answers to general questions about IBM WebSphere DataStage ETL tools.

What versions of IBM WebSphere DataStage are supported in EPM?

EPM supports only IBM WebSphere DataStage version 8.1, server edition.

What if I am using an earlier version of DataStage either as part of a non-PeopleSoft installation or as part of EPM?

You must upgrade to IBM WebSphere DataStage version 8.1, server edition.

What are the differences between ETL content for EPM release 9.1 and the previous EPM releases?

In release 9.1, each OWS staging table contains all the fields of its related source table.

What are the IBM WebSphere software components that are delivered with EPM?

EPM delivers IBM InfoSphere Information Server 8.1.

What is QualityStage?

IBM WebSphere QualityStage provides a powerful framework for developing and deploying data investigation, standardization, enrichment, probabilistic matching, and survivorship operations. For use in transactional, operational, or analytical applications, in batch and real time, the same services are seamlessly deployed to facilitate data validation, cleansing, or master data entity consolidation for your customers, locations, and products.

For more information, please use the following link to the IBM WebSphere website: http://www-01.ibm.com/software/data/infosphere/qualitystage/

What is InfoSphere Information Analyzer?

IBM InfoSphere Information Analyzer profiles and analyzes source data, helping you evaluate its structure, content, and quality before designing ETL processes.

For more information, please use the following link to the IBM WebSphere website: http://www-01.ibm.com/software/data/infosphere/information-analyzer/

What is IBM Metadata Workbench?

IBM Metadata Workbench provides Web-based exploration of information assets that are generated and used by IBM Information Server applications. IBM Metadata Workbench gives you out-of-the-box reporting on data movement, data lineage, and the impact of changes and dependencies. With the workbench, you can trace the data lineage of business intelligence reports to provide a basis for compliance with regulations such as Sarbanes-Oxley and Basel II. IBM Information Server components generate design-time and runtime metadata, automatically storing that metadata in the IBM Information Server metadata repository.

What is Parallel Extender?

DataStage Parallel Extender (DS-PX) is a highly scalable parallel processing infrastructure package for the development and execution of data integration, data warehousing, business intelligence and analytical applications.

How does DataStage ensure application security?

Password- and role-based security can be implemented in DataStage at the project level using the DataStage Administrator.

How does DataStage ensure data security?

EPM accesses data in databases using the DRS stage. The user ID and password for accessing the databases are parameterized as environment variables, and the password parameter can be set as an encrypted field in the DataStage Administrator. This ensures data security by restricting access to the database passwords.

DataStage Server Installation

This section provides answers to DataStage Server installation questions.

What are the prerequisites to the DataStage server installation?

For information on prerequisites to DataStage server installation, please refer to the PeopleSoft Enterprise Performance Management Installation Guide.

See IBM Information Server: Planning Installation and Configuration Guide

What steps must I perform to install IBM WebSphere DataStage Server?

For information on installing the IBM WebSphere DataStage Server, please refer to the PeopleSoft Enterprise Performance Management Installation Guide.

See IBM Information Server: Planning Installation and Configuration Guide

How do I verify if I have successfully installed the IBM WebSphere DataStage Server?

After you install and configure IBM InfoSphere DataStage and QualityStage, test the installation by logging in to the Administrator and Designer clients.

See PeopleSoft Enterprise Performance Management Installation Guide.

DataStage Client Installation

This section provides answers to DataStage Client installation questions.

How do I verify whether I have successfully installed DataStage client?

Please refer to the PeopleSoft Enterprise Performance Management Installation Guide.

DataStage Patches

This section provides answers to questions about DataStage patches.

Are there any other relevant patches that I have to apply other than DataStage patches?

There are two DRS patches, one for SQL Server database users and one for DB2 database users.

You can find these patches in the following resolutions:

  • 829720 - DRS patch for SQL server database users

  • 829719 - DRS patch for DB2 database users

How do I verify that the patch has been installed correctly?

Verify that the files specified in the patch installation guide were updated properly. Also, run some jobs in the project and ensure that no abnormal terminations or other errors occur after applying the patch.

ETL Content

This section provides answers to ETL content packaging questions.

How is the prepackaged ETL content delivered to me?

The DataStage application is delivered in the form of *.dsx files, which are text files that can contain an export of a whole DataStage project, or individual DataStage design components and executables, such as server jobs, sequencers, routines, and containers.

Where can I find the list of delivered DSX files?

See DSX File Import Descriptions.

How are the delivered dsx files packaged in the CD (and future bundles)?

There are 38 DSX files delivered with EPM: DSX files for the five warehouses, plus one Common set containing all the DSX files that are shared across the warehouses.

What are the non-DSX file deliverables with the CD (and future bundles)?

Other than DSX files, which contain DataStage job designs, the deliverables are flat file inputs, XML file inputs, and parameter files.

See Parameter and Source Data Files Information.

What are the various EPM bundle files and how will they be packaged in the bundle1/bundle2/ICE Resolution?

Refer to the bundle posting information on My Oracle Support.

What are the various source application release versions that have been used for EPM 9.1 release?

  • FSCM Source Release – FSCM 9.1

  • CRM Source Release - CRM 9.1

  • HRMS Source Release - HCM 9.1

  • Campus Solutions Source Release - CS 9.0

Environmental Variables

This section provides answers to questions about environmental variables.

What are environmental variables and how are they used?

Environment variables enable you to set project-level defaults, which are then used in jobs.

What is the DSParams file?

For any DataStage project, all project-level, user-defined environment variables are stored in a file called DSParams, located in <dshome>\DataStage\Projects\<projname>.

How do I copy the DSParams file the first time as opposed to copying from one project to another project?

The methodology described below provides a workaround for moving or sharing the global parameters without having to re-type them in the Administrator. The workaround consists of replacing and/or editing this file to add the parameters. Be sure to back up the original file before any other activity occurs.

For a new project that has not yet defined any global parameters, just copy the existing DSParams file to the new project, being sure to rename (back up) the new project's original DSParams file first. Ensure all DataStage clients (Designer and so forth) are logged off, and stop and start the DataStage services to activate the change. Then open the DataStage Administrator; all the parameters should be visible in the user-defined section of the environment screen. At this point, edit the default values for each parameter.

For an existing project that has already defined some global parameters, the DSParams file must be edited to add the desired parameters. The process below describes how to do this. The user-defined parameters are in two sections of the DSParams file: one section defines the parameters [EnvVarDefns], and the second section contains the default values [EnvVarValues]. The approach is to copy the correct lines from the original source project file into the target project DSParams file (a scripted sketch follows the steps):

  1. Rename/Backup the DSParams file in the target project directory and backup the source project DSParams file as well.

  2. Edit the source project DSParams file. Go to the end of the [EnvVarDefns] section and find the user-defined parameters, which are at the end of the section. Select the lines up to, but not including, the line that contains "[PROJECT]".

  3. Copy these lines and paste them into the target project DSParams file before the "[PROJECT]" section.

  4. Go back and edit the source project DSParams file. Find the section starting with the line "[EnvVarValues]". This is usually at the end of the file. Copy all of the lines of that section, or select all the lines for the specific parameters to be moved.

  5. Locate the end of the DSParams file in the target project directory. See if it has a section called "[EnvVarValues]". If it does not, add it. If it does, then go to the next step.

  6. Paste the lines into the target project DSParams file at the end of the "[EnvVarValues]" section and before the end of file.

  7. Save the DSParams file in the target project directory.

  8. Ensure all DataStage clients (Designer and so forth) are logged off, and stop and start the DataStage services to activate the change. Then open the DataStage Administrator; all the parameters should be visible in the user-defined section of the environment screen. At this point, change the default values for each parameter.
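
The following is a minimal Python sketch of steps 2 through 4, assuming a simple section layout; the source path is illustrative, and you should still back up both files and follow the placement rules in the steps above.

def read_section(path, name):
    # Collect the lines of one [section] of a DSParams file.
    lines, inside = [], False
    with open(path) as f:
        for line in f:
            stripped = line.strip()
            if stripped.startswith("[") and stripped.endswith("]"):
                inside = (stripped == "[" + name + "]")
                continue
            if inside:
                lines.append(line)
    return lines

src = "/u1/dsadm/Projects/SOURCE_PROJ/DSParams"  # illustrative path
for section in ("EnvVarDefns", "EnvVarValues"):
    # Print the lines to paste into the target project's DSParams file.
    print("--- [" + section + "] ---")
    print("".join(read_section(src, section)), end="")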

What are 'Array' and 'Transaction' sizes?

  • Array Size: The array size parameter specifies the number of rows written to a database at a time. In other words, it is the number of rows transferred in one call between DataStage and the database before they are written.

    Generally, increasing the array size increases performance, since client memory is used to cache records, resulting in fewer server hits. The maximum array size is 32767, but increasing the array size too much strains client memory, so an optimal value must be chosen with the client memory in mind.

    For flexibility, this has been parameterized as an environment variable. Separate environment variables are available for each source as well as for OWS, OWE, and MDW. For direct DRS lookups, the recommended array size is 1, since DataStage lookups generally expect one row of data to be returned.

  • Transaction Size: Transaction size refers to the number of rows that are written to the database before the data is committed. A transaction size of zero ensures that no commit happens until all the records are written; the default value is 0. If the transaction size is set to 100, then database table commits are performed every 100 rows. Here again, an optimal value must be chosen considering the strain on the database server and the number of records. (A sketch of these commit semantics follows this list.)

    For flexibility, this has also been parameterized as an environment variable. Separate environment variables are available for each source as well as for the OWS, OWE, and MDW.
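
The following illustrative Python sketch mimics the transaction-size semantics described above; it is not DataStage code, and the table and data are hypothetical.

import sqlite3

def load_rows(conn, rows, transaction_size):
    # Commit every transaction_size rows; with transaction_size 0, commit
    # only once at the end, so an abort mid-run rolls everything back.
    written = 0
    for row in rows:
        conn.execute("INSERT INTO target VALUES (?, ?)", row)
        written += 1
        if transaction_size and written % transaction_size == 0:
            conn.commit()  # rows committed so far survive a later failure
    conn.commit()  # final (or only) commit

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (k INTEGER, v TEXT)")
load_rows(conn, [(i, str(i)) for i in range(250)], transaction_size=100)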

How is the DATA_ORIGIN environment variable configured?

The value for DATA_ORIGIN is 'S' for the following loads:

  • OWS (data flow from PeopleSoft source system)

  • OWE (data flow from OWS)

  • MDW (data flow from OWS)

The DATA_ORIGIN must be toggled to Enterprise only when running those sets of jobs whose flow is from OWE to MDW.

Project Creation, Import, and Export

This section provides answers to project creation, import, and export questions.

What are the different methods of maintaining projects if I have jobs that source from more than one warehouse?

The EPM ETL design enables jobs to be distributed across multiple projects or consolidated in a single project. The following are some options for creating projects based on your needs.

  • Option 1: One project per EPM warehouse

    If you have more than one EPM warehouse, you can create one project per warehouse and one project for all the jobs related to Common jobs and Global Dimensions. For example, if you have the HCM and FMS warehouses, you create three projects: HCM, FMS, and Common. The job J_Dim_PS_D_PERSON is part of Global Dimensions and must be triggered in the Common project, whereas the corresponding hash load jobs must be triggered in the HCM and FMS projects as well.

  • Option 2: All EPM warehouses in a single project

    You can have one single project for all the jobs for all the EPM warehouses, Common jobs, and Global Dimensions. If you maintain only one project for all the warehouses (HCM, FMS, and Common jobs), you need to run the Global Dimensions, Local Dimensions (HCM, FMS), and SKU-based dimensions, and there is no need to run the hash load jobs after the first run.

    For this option, you may reach the maximum number of files allowed in a directory on some OS platforms. Please verify with your system administrators before proceeding with this option.

What configurations must I perform after creating the projects?

Projects must be created from the DataStage Administrator.

See Setting Up DataStage Projects.

How do I import a project? How do I verify a successful import?

For more information about importing a project, see section 2-5 of the Configuring IBM WebSphere DataStage document.

See Setting Up DataStage Projects.

What are the DataStage categories (folders) and sub-categories that I will see after project import?

For more information about the DataStage categories available, see the topic 'Understanding the Project Structure' in this document.

Setting Up Development and Production Environments

This section provides answers to questions about development and production environment setup.

Where can I find configuration information for different server platforms?

See IBM Information Server: Planning Installation and Configuration Guide

What are the supported platforms for DataStage?

The following platforms support the IBM InfoSphere Information Server:

  • HP-UX 11i v2, v3 on PA-RISC

  • HP-UX 11i v2, v3 on Intel Itanium

  • IBM AIX 5.3 and 6.1

  • Red Hat Enterprise Linux Advanced Server 4 on AMD or Intel processors

  • Red Hat Enterprise Linux 5 Advanced Platform on AMD or Intel processors

  • Red Hat Enterprise Linux 5 Advanced Platform on IBM System z

  • SUSE Linux Enterprise Server 10 on AMD or Intel processors

  • SUSE Linux Enterprise Server 10 on IBM System z

  • Sun Solaris 9 and Sun Solaris 10

  • Microsoft Windows XP Service Pack 2, Microsoft Windows Vista, Microsoft Windows Server 2003 Service Pack 2 (Client only)

  • Microsoft Windows 2003 Service Pack 2 (Server only)

Data sources supported by the EPM 9.1 application:

  • Oracle

  • MSSQL Server

  • IBM DB2 on OS/390

  • IBM DB2 on Unix/Windows

  • Flat files (Sequential Files)

  • XML files

Refer to the PeopleSoft Enterprise Performance Management Hardware and Software Requirements Guide for more details.

Are there database-specific configuration steps?

For MSSQL Server, you must enable the following options to support the functional index (MSCONCATCOL) in the database (a scripted sketch follows the list):

  • ANSI_NULLS

  • QUOTED_IDENTIFIER

  • CONCAT_NULL_YIELDS_NULL

  • ANSI_WARNINGS

  • ANSI_PADDING
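
As an illustration, these session options can be enabled from a script before running DDL against the database. The sketch below assumes the pyodbc driver and a hypothetical DSN and credentials; adjust the connection details for your environment.

import pyodbc

# Hypothetical connection; replace the DSN and credentials with your own.
conn = pyodbc.connect("DSN=EPMDW;UID=etl_user;PWD=secret", autocommit=True)
cur = conn.cursor()
for option in ("ANSI_NULLS", "QUOTED_IDENTIFIER", "CONCAT_NULL_YIELDS_NULL",
               "ANSI_WARNINGS", "ANSI_PADDING"):
    # Session settings required for functional (computed-column) indexes.
    cur.execute("SET " + option + " ON")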

ETL Jobs

This section provides answers to general questions about ETL jobs.

What are the different categories of jobs in EPM?

The following is a list of EPM job categories:

  • Source to OWS (Staging area)

  • OWS to OWE

  • OWS to MDW

  • OWE to MDW

  • OWS to MDW (for CRM online marketing data mart)

  • OWE to source (for GL and Position Budgeting in HCM)

What types of staging loads does EPM support?

EPM supports incremental staging loads along with a small number of destructive staging loads.

See Understanding ETL Load Strategies in EPM.

What is the load strategy for the MDW and OWE?

Most of the MDW and OWE loads are provided with incremental logic using the LASTUPD_EW_DTTM field, which is part of the subrecord LOAD_OWS_SBR.

See Understanding ETL Load Strategies in EPM.

How does EPM manage incremental loading if the source fields have Null values for the Datetime stamp?

If the datetime column is a nullable field in the source database, the source filter includes a condition to bring in that data along with the incremental data, as illustrated below.

See Understanding ETL Load Strategies in EPM.
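
A minimal sketch of such a filter, assuming the LASTUPD_EW_DTTM field named above; the table name and last-run timestamp are illustrative only.

# Build the incremental WHERE clause: rows updated since the last run,
# plus rows whose datetime stamp is null.
last_run = "2011-01-01 00:00:00"  # hypothetical last-run timestamp
query = (
    "SELECT * FROM PS_SOURCE_TABLE "  # illustrative table name
    "WHERE LASTUPD_EW_DTTM > '" + last_run + "' "
    "OR LASTUPD_EW_DTTM IS NULL"
)
print(query)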

Are there any special loading requirements between the first run and subsequent runs?

No, there are no special loading requirements between the first run and subsequent runs, provided the prepackaged jobs are used without any customization.

How does EPM manage deletes in the source tables?

For the Campus Solutions Warehouse, EPM provides staging jobs that can identify source records that have been physically deleted from your PeopleSoft transaction system and flag those records for physical deletion from the Campus Solutions Warehouse schema.

For more information see the chapter Managing Source System Deletes and Archiving in the PeopleSoft Campus Solutions Warehouse PeopleBook.

Does EPM have the functionality to accommodate late arriving dimensions?

No, EPM does not have the functionality to accommodate late arriving dimensions.

What types of data sources does EPM accommodate?

EPM can accommodate XML Files, Flat Files, and RDBMS tables.

Does EPM use any control tables in its ETL design?

EPM no longer uses control tables and those that were used in past EPM releases (such as ETL_JOB_CTL, ETL_RUN_HST, ETL_TSE_ERR) have been deprecated.

Does EPM use surrogate keys?

If so, how are Surrogate IDs generated and managed?

DataStage manages the UniVerse file SDKSequences, which holds the surrogate key sequence for each key.

See Surrogate Keys.

Does EPM have a strategy for handling slowly changing dimensions?

In EPM, the dimension D_EMPL_JOB from the HCM warehouse is designed as a Type 2 slowly changing dimension; all the other dimension loads are Type 1. However, the lookup operation supports Type 2 slowly changing dimensions (for example, whenever there is a lookup on another dimension, it uses effective-dated logic).

For more information on configuring Type 2 slowly changing dimensions, see the chapter Configuring Slowly Changing Dimensions in your warehouse specific PeopleBook (for example, the PeopleSoft Campus Solutions Warehouse PeopleBook).

How do I change my dimension load from a Type 1 design to a Type 2 design?

For more information on configuring Type 2 slowly changing dimensions, see the chapter Configuring Slowly Changing Dimensions in your warehouse specific PeopleBook (for example, the PeopleSoft Campus Solutions Warehouse PeopleBook).

Does EPM support ETL rollbacks?

If so, how is this handled?

Rollback is possible through the transaction size parameter. If the transaction size is set to zero and the job aborts midway, the job rolls back the transactions, because nothing is committed until all the rows are written. If the transaction size is anything other than zero and the job fails midway, the rows committed before the point of failure remain committed.

What aggregation strategy does EPM ETL employ?

The Aggregator stage is generally not used in the job design, because aggregation is better left to the database, which can perform aggregation functions more efficiently than DataStage.

Whenever aggregation must be performed on the source data, it is achieved within the DRS source stage itself. For generated SQL queries, aggregate functions are entered against columns in the corresponding derivation fields, and the GROUP BY clause is entered in the 'Other clauses' text area. Wherever the user-defined SQL option is selected, the query is written with the appropriate aggregate functions.

In specific instances where an aggregation function must be performed on data that is transformed, rather than read directly from the database, and where the number of records is large, a temporary table is created; the data is temporarily written there and then read back out so that the aggregation functions can be performed.
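
A minimal sketch of pushing aggregation to the database, in the spirit of the user-defined SQL approach above. It uses sqlite3 as a stand-in for the DRS database connection; the table and columns are illustrative.

import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the DRS connection
conn.execute("CREATE TABLE ledger (business_unit TEXT, account TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO ledger VALUES (?, ?, ?)",
    [("US001", "4000", 100.0), ("US001", "4000", 250.0), ("US001", "5000", 75.0)],
)

# The aggregate function and GROUP BY run in the database, so the job
# receives pre-aggregated rows and no Aggregator stage is needed.
query = (
    "SELECT business_unit, account, SUM(amount) AS total_amount "
    "FROM ledger GROUP BY business_unit, account"
)
for row in conn.execute(query):
    print(row)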

What indexing strategy does EPM ETL employ?

EPM data models are delivered with indexes. Before loading the target tables, drop the indexes, and then rebuild them after the load; this improves ETL performance. A sketch of the pattern follows.
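
The drop-load-rebuild pattern, sketched below with sqlite3 and illustrative table and index names; a production load would issue the equivalent DDL against the warehouse database.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE d_cust (cust_sid INTEGER, cust_id TEXT, name TEXT)")
conn.execute("CREATE INDEX d_cust_idx ON d_cust (cust_id)")

conn.execute("DROP INDEX d_cust_idx")  # drop before the bulk load
conn.executemany(
    "INSERT INTO d_cust VALUES (?, ?, ?)",
    [(i, "C%05d" % i, "Customer %d" % i) for i in range(1000)],
)
conn.execute("CREATE INDEX d_cust_idx ON d_cust (cust_id)")  # rebuild after
conn.commit()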

How are lookups used in the EPM ETL design?

Lookups are usually implemented with the Hashed File stage, except for relational joins, which are implemented in the DRS stage instead.

See Hashed Files, Understanding Data Validation and Error Handling in the ETL Process.

What types of job parameters does EPM use to increase run time flexibility?

Parameterization helps you enter run-time parameters without resorting to changing jobs.

Run-time information, such as the database type, the database connection parameters, and parameter file directories, should be set as environment variables, which are used in individual jobs.

Parameter files are used for jobs that read user-supplied variable values, or lists of values, that may change from run to run. The variables and their respective values are given in the parameter files.

See Parameter and Source Data Files Information.

How is DataStage code re-used?

PeopleSoft packages reusable DataStage code with Shared Containers, routines, and some server jobs (found in the Reusable Jobs folder).

Are there any customizations required to handle Unicode data?

To support Unicode databases, the DataStage Server must be installed with NLS enabled. Also, select the proper character set in the DataStage Administrator, based on your requirements.

Survey Jobs

This section provides answers to questions about survey jobs.

Where are the Survey-related jobs in the dsx files?

After import, where can I find the Survey jobs in the DataStage Project?

Survey jobs are present in the OWE and MDW modules of the HCM warehouse.

In the OWE module, there are some D00 jobs that read flat file data as the source and load the R00 tables. These jobs are located in OWE.dsx; after import, they appear under the OWE_E\HCM\D00\Base\Load_Tables\Server category.

In the MDW module, the R00 tables are used as the source to load the survey dimension tables. These jobs are located in the WHR_WORKFORCE_PROFILE_MART_E.dsx file; after import, they appear under the \HCM_E\WORKFORCE_PROFILE_MART\Survey\OWE_To_MDW\Dimensions\Base\Load_Tables\Server category.

What are the required steps in a Survey load?

  1. Run Survey Setup ETL: These jobs read the source flat files or the temp tables and load the R00/D00 tables. They are located in the OWE_E.dsx file and, after import, appear under the \OWE_E\HCM\D00\Base\Load_Tables\Server category.

  2. Run Survey Load ETL Batch: These jobs read the data loaded in the previous step and load the F00 tables. They are located in the OWE.dsx file and, after import, appear under the \OWE_E\HCM\F00\Base\Load_Tables\Server category.

  3. Load Employee-Level Competencies: These jobs load the competency details of employees from the OWS tables. They are located in the OWE.dsx file and, after import, appear under the \OWE_E\HCM\F00\Base\Load_Tables\Server category.

What are the prerequisites for loading the Survey module?

OWE survey jobs are present under the HCM warehouse and use flat files as sources. These source flat files should be present in the project home directory, and the environment parameter $SOURCE_FILE_DIR should contain the directory path of these survey flat files.

If the DataStage server is on Windows, modify the survey jobs by opening the Sequential File stage and changing the line termination to DOS style (CR LF). Then save, compile, and run the job.

ETL Hashed Files

This section provides answers to questions about ETL hashed files.

How are hash files used and for what purpose?

Hash files are used to enhance the performance of ETL jobs; they are typically used for lookups.

In EPM, there are jobs to initialize hash files. These jobs create the hash files before the jobs that require them for lookups are executed. The hash files are also updated once the target table is loaded in the ETL job. This method enables multiple jobs to utilize the same hash file, as long as the required structures are the same.

Another method is to load the hash file within the same job that uses it as a lookup. This method requires the hash file to be reloaded every time the job executes.

See Hashed Files, Understanding Data Validation and Error Handling in the ETL Process.
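
Conceptually, a hashed-file lookup is a key-to-row map built once and probed in constant time. The sketch below illustrates the idea only; the data and column names are hypothetical, not a delivered structure.

# Build the "hashed file" once: a key -> surrogate ID map.
dim_rows = [("US001", 101), ("CA001", 102)]      # (business_unit, sid)
hash_lookup = {bu: sid for bu, sid in dim_rows}

# Probe it while transforming incoming rows.
incoming = [{"business_unit": "US001", "amount": 10.0},
            {"business_unit": "XX999", "amount": 5.0}]
for row in incoming:
    row["bu_sid"] = hash_lookup.get(row["business_unit"], -1)  # -1: not found
    print(row)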

What should I keep in mind when managing my hash files?

By default, hashed files are project specific and cannot be shared across projects. The validity of a hashed file depends on the base table it is generated from. The base table should be updated only by the ETL jobs provided in EPM; otherwise, the hashed file and the table will be out of sync, which may result in faulty data when the hashed file is used in an ETL job.

There are several Hashed File utilities provided in EPM. These are located in the Utilities\Hash_Utils category.

Can I customize the storage location for hash files?

Yes. You can set the storage path of the hash files by specifying the directory path in the environment parameter #$HASHED_FILE_DIRECTORY#; this parameter is used across all the hash files.

How do I recover data from corrupted hash files?

Generally, a corrupted hash file must be reloaded from the base table. EPM provides utilities to back up and recover DateTime and SurrogateKey hashed files.

ETL Routines

This section provides answers to questions about ETL routines.

How are routines used?

Routines make DataStage job code reusable and are used in various parts of the job design. Ninety-five routines are delivered as part of EPM ETL; they are found in a category called 'EPM_Routines'.

Where can one find the details for all the EPM Routines?

See Routine Descriptions.

ETL Job Process Flow

This section provides answers to questions about the ETL job process flow.

How does EPM manage the process flow for ETL jobs?

How are job interdependencies managed?

A DataStage sequence job allows you to run several jobs in a controlled manner, and it can specify different courses of action depending on whether a job in the sequence succeeds or fails. Every ETL load has a sequence job, and each business process within a data mart is provided with a master sequence that triggers all the jobs belonging to it.

See Using the Master Sequencer Utility to Create Master Sequencer Jobs.

What are the common triggers used in the process flow?

Triggers are used to control the flow of a sequence job in triggering other sequence or server child jobs. The most commonly used triggers are:

  • Failed – Conditional

  • Warning – Conditional

  • OK – Conditional

  • Unconditional

ETL Utilities

This section provides answers to questions about delivered ETL utilities.

What is the Language Swap utility?

If the source database base language is different from the EPM database base language, you must ensure that the EPM base tables have descriptions in the EPM base language and that the related language tables have descriptions in the EPM installed foreign languages. The Language Swap ETL utility provides this functionality.

For more information on the language swap utility, see the chapter Setting Up Multilanguage Processing and Running the Language Swap Utility in your warehouse specific PeopleBook (for example, the PeopleSoft Campus Solutions Warehouse PeopleBook).

What are related language tables and how are they packaged?

In EPM, every table that requires language translation has a corresponding related language table. ETL jobs to populate these language tables are delivered and packaged along with the base table jobs. Running these jobs is optional, since not all implementations require multi-language functionality.

What is the Currency Conversion utility?

This utility populates the reporting amount and reporting currency code columns in fact tables in the Multidimensional Warehouse; this population is considered an ETL post-process. Before running the ETL, complete the setup of MDW currency conversion definitions in the PIA pages.

For more information on the MDW currency conversion utility, see the chapter Implementing Currency Conversion for Multiple Currencies in your warehouse specific PeopleBook (for example, the PeopleSoft Campus Solutions Warehouse PeopleBook).

What are tree-processing jobs?

If a warehouse or data mart has tree or recursive hierarchy data, the ETL utility that processes this data must be triggered. The utility flattens and denormalizes the set of hierarchies (a conceptual sketch follows the reference below). These hierarchy definitions need to be defined in the PIA pages before running the ETL jobs.

For more information on the MDW tree flattening process, see the chapter Processing Trees and Recursive Hierarchies in your warehouse specific PeopleBook (for example, the PeopleSoft Campus Solutions Warehouse PeopleBook).
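
A conceptual sketch of flattening a recursive hierarchy into denormalized ancestor-descendant rows; the node names and output shape are illustrative only, not the delivered utility's schema.

from collections import defaultdict

# Parent-child pairs of a small illustrative hierarchy.
children = defaultdict(list)
for parent, child in [("ALL", "N_AMERICA"), ("N_AMERICA", "US001"),
                      ("N_AMERICA", "CA001"), ("ALL", "EUROPE")]:
    children[parent].append(child)

def flatten(node, ancestors, out):
    # Emit one (ancestor, descendant, distance) row per ancestor of node.
    for depth, ancestor in enumerate(ancestors):
        out.append((ancestor, node, len(ancestors) - depth))
    for child in children.get(node, []):
        flatten(child, ancestors + [node], out)

rows = []
flatten("ALL", [], rows)
for row in rows:
    print(row)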

What are the Dimension Mapper jobs?

Dimension mapper jobs refer to a set of jobs that enable users to set up Warehouse Business Units and SetIDs. This mapper tool utilizes data from several other tables such as PF_SRC_SETCNTRL, PF_SRC_BU_NAMES, and PF_SRC_BU_ROLES, which are loaded by the dimension mapper jobs.

The output tables of the Dimension Mapper tool are PF_SETID_LOOKUP, PF_BUS_UNIT_MAP, BUS_UNIT_TBL_PF, BUS_UNIT_TBL_FS, SETID_TBL, SET_SNTRL_TBL, SET_CNTRL_GROUP, and SET_CNTRL_REC. These tables are used as lookups in ETL job design.

See Importing Source Business Units into EPM to Create Warehouse Business Units.

Running ETL Jobs

This section provides answers to questions about running ETL jobs.

How do I identify ETL jobs that are needed based on the license code or my implementation plans?

See Using the PeopleSoft EPM Lineage Spreadsheets.

What do the OWS to OWE (D00) jobs do?

The D00 job from OWS to OWE loads data from staging tables to _D00 tables in the OWE database.

What do the OWS to OWE (F00) jobs do?

The F00 job from OWS to OWE loads data from staging tables to _F00 tables in the OWE database.

What do the OWS to MDW (Global Dimensions) jobs do?

The Global Dimension jobs load data from staging tables to dimension tables. Global Dimensions are dimension tables that are shared across warehouses.

What do the OWS to MDW (Local Dimensions) jobs do?

The Local Dimension jobs load data from staging tables to dimension tables. Local Dimensions are dimension tables that are shared across different marts in a warehouse.

What do the OWS to MDW (data mart dimensions and facts) jobs do?

The dimension and fact jobs load data from staging tables to dimension or fact tables. A dimension contains a key (SID) value and attributes used for slicing and dicing measures located in a fact table.

What do the OWE to MDW (Dimensions and Facts) jobs do?

The dimension and fact jobs load data from OWE tables (D00 or F00) to dimension or fact tables. A dimension contains a key (SID) value and attributes used for slicing and dicing measures located in a fact table.

How do uncompiled jobs affect EPM?

All server jobs and sequencers must be compiled before they can run. Uncompiled jobs will not run; compile them using the Designer first.

In what order must I run different categories of jobs?

See Using the PeopleSoft EPM Lineage Spreadsheets.

How are Master Sequences used to load data marts?

  • Loading Sequence for OWS: To run the OWS jobs, you can use the Master_Run_Utility, which reads the list of OWS jobs from a flat file and executes them in the order specified in the input flat file.

    Using the Master_Run_Utility, you can run Hash Load jobs (the first time, to create the hash files), Base - Sequence jobs, and Language Sequence jobs.

  • Loading Sequence for OWE: To run the OWE jobs, you can use the Master_Run_Utility, which reads the list of OWE jobs from a flat file and executes them in the order specified in the input flat file.

    Using the Master_Run_Utility, you can run Hash Load jobs (the first time, to create the hash files), D00, F00, and Language D00 jobs.

  • Loading Sequence for MDW: To run the MDW dimensions and facts, you can run the Master_Sequence jobs found under the Master_Sequence category, which is present under all subject areas of the data mart node. There are four master sequence jobs for each subject area: Hash Load Jobs, Base - Dimensions, Facts, and Language Dimensions.

    For Global Dimensions and Local Dimensions, there are three master sequences: Hash Load Jobs, Base - Dimensions, and Language Dimensions. Alternatively, you can use the Master_Run_Utility to run the MDW dimensions and facts, provided a loading-sequence flat file is present on the DataStage Server.

Can I use the PeopleTools Process Scheduler to trigger jobs?

No, you cannot use the Process Scheduler to trigger jobs. To schedule jobs, you can use the DataStage Scheduler or any other third party scheduling tool.

When do I use the Master Run utility in the utilities folder?

The Master_Run_Utility can be used to run the set of jobs listed in a flat file on the DataStage Server. The utility reads the list of jobs in the file and triggers them in serial mode, honoring the dependency logic specified in the input flat file. The Master_Run_Utility can be used to run any jobs in a dependent or independent mode. (A conceptual sketch of this pattern follows.)
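
A conceptual Python sketch of this kind of serial driver, using the dsjob command-line interface. The file format, project name, and error handling are simplified assumptions, not the delivered utility.

import subprocess
import sys

PROJECT = "EPM_HCM"  # hypothetical project name

# One job name per line; '#' lines are comments (assumed format).
with open("job_list.txt") as f:
    jobs = [line.strip() for line in f
            if line.strip() and not line.startswith("#")]

for job in jobs:
    print("Running " + job + " ...")
    # -jobstatus waits for completion and reflects job status in the exit code.
    result = subprocess.run(["dsjob", "-run", "-jobstatus", PROJECT, job])
    if result.returncode != 0:
        sys.exit(job + " failed (exit code %d); stopping." % result.returncode)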

Error Handling with ETL Jobs

This section provides answers to questions about error handling with ETL jobs.

What is the Error validation mechanism built into EPM ETL design?

See Understanding Data Validation and Error Handling in the ETL Process.

How is Error Validation performed using Error tables?

See Understanding Data Validation and Error Handling in the ETL Process.

ETL Job Validation

This section provides answers to questions about ETL job validation.

How do I confirm whether a job has run successfully?

Log on to DataStage Director and check the status of the job. If the status is "Finished," the job completed successfully with no errors or warnings; you can then query your target database and check whether the number of rows populated in the target table is correct.

If the status is "Finished (see log)," there are warnings in the job; check the log file to see what the warnings are about, fix the issues, and rerun until the job completes with the status "Finished." Finally, if the status is "Aborted," the job has fatal errors; fix those errors and run the job again until it reaches the "Finished" status.

How can I test OWS, OWE, and MDW jobs?

  • OWS: Run the sequencer and check that its status is Finished. Then compare data between the source and the target table and make sure the data exactly matches your expected result set.

  • OWE: Ensure the OWS data is populated first, because the source for OWE jobs is OWS. Run the OWE job and make sure it is in Finished status. Compare the source and target databases to make sure the data populated in the target tables matches your expected result set.

  • MDW: Ensure the OWS data is populated first, because the source for MDW jobs is OWS. In some cases the source can be OWE; in those cases, the OWE jobs must be executed before the MDW jobs. Run the MDW job and make sure it is in Finished status. Compare the source and target databases and check that the data populated in the target database matches your expected results.

How do I validate a job execution by verifying the data?

Perform a database compare, using either handwritten scripts or third-party database-compare tools. (A minimal handwritten example follows.)
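
A minimal sketch of a handwritten compare: check row counts and a simple checksum per table on both sides. It uses sqlite3 as a stand-in for your source and target connections, and the table and columns are illustrative.

import sqlite3

def table_summary(conn, table, cols):
    # Row count plus a crude checksum over the concatenated columns.
    count = conn.execute("SELECT COUNT(*) FROM " + table).fetchone()[0]
    checksum = conn.execute(
        "SELECT TOTAL(LENGTH(" + " || ".join(cols) + ")) FROM " + table
    ).fetchone()[0]
    return count, checksum

src = sqlite3.connect(":memory:")  # stand-ins for source / target databases
tgt = sqlite3.connect(":memory:")
for db in (src, tgt):
    db.execute("CREATE TABLE cust (cust_id TEXT, name TEXT)")
    db.executemany("INSERT INTO cust VALUES (?, ?)", [("C1", "A"), ("C2", "B")])

cols = ["cust_id", "name"]
if table_summary(src, "cust", cols) == table_summary(tgt, "cust", cols):
    print("cust: source and target match")
else:
    print("cust: MISMATCH - investigate row-level differences")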

Common ETL Issues

This section provides answers to questions about common ETL issues.

How do you debug a job that aborted or completed with warnings?

See Managing Aborted and Failed Jobs.

What if I drop and recreate an index?

Dropping and recreating an index increases ETL performance. The downside arises when integrity constraints are defined via indexes at the database level but are not enforced in the ETL application; in such cases, the loaded data might not be clean, which can produce errors when the index is recreated.

How do I resolve the following issue I receive when running any job in DataStage:

I receive the error message "Could not load drsoci.so" when pointing to an Oracle database and the DataStage server is on UNIX.

Verify the dsenv file, which is a centralized file for storing environmental variables in the DataStage Server. It resides in $DSHOME, where $DSHOME identifies the DataStage main directory (for example /u1/dsadm/IBM WebSphere/DataStage/DSEngine).

The dsenv file is a series of Bourne shell arguments, which are referenced during DataStage server startup and can be referenced by interactive users or other programs or scripts. For a connection using a non-wire protocol driver, you generally need to specify the following in the dsenv file:

  • Environment variables required by the database client software

  • Database home location

  • Database library directory

Certain plug-ins require shared libraries to be loaded, and you need to include the library path in an environment variable. The names of the library path environment variables are platform dependent:

  • Solaris Platform = LD_LIBRARY_PATH

  • HP-UX Platform = SHLIB_PATH

  • AIX Platform = LIBPATH

  • Compaq Tru64 Platform = LD_LIBRARY_PATH

  • LINUX Platform = LD_LIBRARY_PATH

The following provides typical entries for commonly used databases:

# Oracle 8i
ORACLE_HOME=/space/oracle8i
ORAHOME=/space/oracle8i
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$ORACLE_HOME/lib:$ORACLE_HOME/rdbms/lib; export LD_LIBRARY_PATH
ORACLE_SID=WSMK5
ORASID=WSMK5
export ORACLE_HOME ORAHOME ORACLE_SID ORASID

See IBM Information Server: Planning Installation and Configuration Guide

How do I resolve the following issue I receive when running any job in DataStage:

I receive the error message "Could not load drsdb2.so" when pointing to a DB2 database and the DataStage server is on UNIX.

Verify the dsenv file, which is a centralized file for storing environmental variables in the DataStage Server. It resides in $DSHOME, where $DSHOME identifies the DataStage main directory (for example /u1/dsadm/IBM WebSphere/DataStage/DSEngine).

The dsenv file is a series of Bourne shell arguments, which are referenced during DataStage server startup and can be referenced by interactive users or other programs or scripts. For a connection using a non-wire protocol driver, you generally need to specify the following in the dsenv file:

  • Environment variables required by the database client software

  • Database home location

  • Database library directory

Certain plug-ins require shared libraries to be loaded, and you need to include the library path in an environment variable. The names of the library path environment variables are platform dependent:

  • Solaris Platform = LD_LIBRARY_PATH

  • HP-UX Platform = SHLIB_PATH

  • AIX Platform = LIBPATH

  • Compaq Tru64 Platform = LD_LIBRARY_PATH

  • LINUX Platform = LD_LIBRARY_PATH

The following provides typical entries for commonly used databases:

#DB2 6.1
DB2DIR=/opt/IBMDB2/V6.1; export DB2DIR
DB2INSTANCE=DB2inst1; export DB2INSTANCE
INSTHOME=/export/home/DB2inst1; export INSTHOME
PATH=$PATH:$INSTHOME/sqllib/bin:$INSTHOME/sqllib/adm:$INSTHOME/sqllib/misc; export PATH
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$INSTHOME/sqllib/lib; export LD_LIBRARY_PATH
THREADS_FLAG=native; export THREADS_FLAG

How do I switch to a new project when the warehouse tables have already been loaded with data?

There might be a need to switch to a new project when the warehouse tables have already been loaded for some time. In such cases, there is some project-specific control data that must be restored onto the new project. For this purpose, it is a good idea to back up this control data at regular intervals, after a significant chunk of ETL loading is completed.

PeopleSoft delivers utilities for this backup and recovery process.

Refer to the section 'Running DataStage Project Utilities' and its subsections in the PeopleSoft EPM Red Paper: ETL Implementation Steps (found on My Oracle Support) for the procedure to run these utilities:

  • Backup_SurrogateKey_HashFile – Utility

  • Backup_DateTime_HashFiles – Utility

  • Recovery_DateTime_HashFiles – Utility

  • Recovery_SurrogateKey_HashFile – Utility

What happens if a job aborts after half of one million rows have been written to the target tables?

If the transaction size is set to zero and the job aborts in the middle, the job rolls back the transactions, because nothing is committed until all the rows are written. If the transaction size is anything other than zero and the job fails in the middle, the rows committed before the point of failure remain committed.

How do I report an issue with a job log for a job that completed with warnings or errors?

Report the issue to Global Customer Support, including the job log of the last run. In DataStage Director, view the log for the job in detailed view mode. Select Project, Print from the menu. In the Print dialog box, select the All entries, Full details, and Print to file options. Click OK, name the file, and send the log along with your issue description and other pertinent information.

Configuring Delivered ETL Content

This section provides answers to questions about configuring and enhancing delivered ETL content.

What are some of the configurations I may want to perform?

  • Convert an incremental load job to a destructive load job.

  • Parameterize the hash file directory for user defined directories that store hashed files (rather than the default project directory).

  • Remove lookups that do not pertain to your business.

  • Configure jobs for initial load.

How do I convert an incremental load job to a destructive load job?

See Converting Incremental Load Jobs to Destructive Load Jobs.

How do I run ETL jobs in destructive mode after some have already run as incremental loads?

See Converting Incremental Load Jobs to Destructive Load Jobs.

How do I add a new environmental variable?

See Adding New Environmental Variables.

What if I make a change that requires a new attribute in a dimension table?

If there is a change in the data model that adds a new attribute to the EPM database, you must update the corresponding dimension job to incorporate the new attribute; otherwise, the job will fail. If there is no source for the new attribute, the dimension job can assign it a default value by using a delivered routine.

What if I make a change that requires a new dimension in a fact table?

If a new dimension key is added to a fact table in the database, this is a change to the data model. Because the database has an additional dimension key for the fact table, this results in changes to the ETL job. If it is a new dimension, a new job has to be developed for it. The fact job must be updated accordingly, with the correct dimension key and corresponding SID population in the fact table.

What if I make a change that requires a new measure in a fact table?

If a new measure is added to a fact table in the database, this is a change to the data model. Because the database has an additional measure for the fact table, this results in changes to the ETL job. The fact job must be updated accordingly, so that the new measure is assigned a value that either comes directly from the source or is derived by whatever logic your requirements dictate.

What if I make a change that requires a new dimension table?

A new ETL job has to be developed for this new dimension table as per the requirements.

What if I make a change that requires a new fact table?

A new ETL job has to be developed for this new fact table as per the requirements.

What are some techniques I can use to enhance ETL performance?

  • InterProcess (IPC) Stage: The IPC stage is used to implement pipeline parallelism. It can explicitly de-link two passive stage activities so they run in separate processes. As a good practice, an IPC stage can be inserted before a database write stage.

  • Link Partitioner/Link Collector Stages: The Link Partitioner and Link Collector can be used in conjunction to implement partition parallelism, wherever a specific requirement exists to further enhance job performance. This is most effective for running time-consuming processing in parallel. The functionality to run in parallel is typically identical and can be made into a local container; this way, any changes to the common parallel part of the job need to be made in only one place.

How can I enhance the parallel processing capabilities of ETL server jobs?

  • InterProcess (IPC) Stage: The IPC stage is used to implement pipeline parallelism. It can explicitly de-link two passive stage activities so they run in separate processes. As a good practice, an IPC stage can be inserted before a database write stage.

  • Link Partitioner/Link Collector Stages: The Link Partitioner and Link Collector can be used in conjunction to implement partition parallelism, wherever a specific requirement exists to further enhance job performance. This is most effective for running time-consuming processing in parallel. The functionality to run in parallel is typically identical and can be made into a local container; this way, any changes to the common parallel part of the job need to be made in only one place.

    See WebSphere DataStage Development: Designer Client Guide