ETL Configurations

This chapter discusses how to perform the following optional ETL configurations: creating master sequencer jobs with the master sequencer utility, running ETL jobs automatically with the Master Run Utility, supporting Unicode data, converting incremental load jobs to destructive load jobs, and adding new environmental variables.

Using the Master Sequencer Utility to Create Master Sequencer Jobs

The master sequencer utility (Create_MasterSequence) enables you to create master sequencer jobs using delivered sequencer jobs as input. Recall that sequencer jobs invoke and run other jobs. You can create a master sequencer job that invokes and runs all the sequencers required to populate a specific fact table, all fact tables for a given data mart, or all fact tables for an entire EPM warehouse.

For example, you can create a master sequencer job to automatically invoke and run 20 sequencer jobs from the Receivables mart. To do so, simply use the master sequencer utility to specify the 20 jobs you want to invoke, enter the applicable job parameters, and import the generated *.dsx file into your project.

The master sequencer utility is available in the following DataStage folder: Utilities\Job_Utils\MasterSequence\Create_MasterSequence.

Creating a Master Sequencer Job

To create a master sequencer job using the master sequencer utility:

  1. Identify the individual sequencer jobs you want to include in the master sequencer job.

    The ETL Lineage appendix can help you determine which ETL sequencer jobs are required based on your needs.

    See Using the PeopleSoft EPM Lineage Spreadsheets.

    Note. The master sequencer utility accepts only sequencer jobs as input; do not use server jobs. Also, the sequencer jobs you select should be compiled and ready for use.

  2. Copy the list of sequencer jobs and paste the names into a text file.

    Ensure that each job is entered on a separate line. For example,

    SEQ_J_Dim_PS_D_DT_PATTERN_OWE
    SEQ_J_Dim_PS_D_BUS_UNIT_PF_OWE
    SEQ_J_Dim_PS_D_BUSINESS_UNIT_EX

    Note. The master sequencer job uses the text file as an input and reads the list of sequencer jobs from the file.

  3. Save the text file to the location you will specify as the input file path (SRC_JOBPATH).

    The file path you specify should be local to the server; do not specify a path to a client machine.

    Note the file path of the text file for later use.

  4. In DataStage Director, navigate to the Master Sequencer Utility using the following navigation: Utilities\Job_Utils\MasterSequence\Create_MasterSequence.

  5. Select Create_MasterSequence and run.

    The Create_MasterSequence - Job Run Options window appears, displaying the job input parameters.

  6. In the Parameters tab, use the Enter Source text file Name field to enter the file path for the text file you created in step three.

  7. Use the Enter Target File Path field to enter a location where you want the new master sequencer job (the output *.dsx file) stored.

    The file path you specify should be local to the server; do not specify a path to a client machine.

    The file path must end with the .dsx extension, for example, C:\MSEQ\Mseq01.dsx.

  8. Use the Enter the Jobname field to enter the name of the new master sequencer job.

  9. Use the Choose the Master Sequence Type field to specify the mode in which you want the sequencer jobs to be run.

    Valid values are Parallel or Sequential.

    Note. Ensure you select the proper mode for the sequencer jobs included in the master sequencer. Only staging sequencer jobs can be run in parallel. Dimension and fact jobs may have dependencies; unless those dependencies are identified and managed accordingly, do not run them in parallel.

  10. Click the Run button to generate the master sequencer job.

    This process generates the master sequencer job and stores it in the target file path you specified in step seven.

  11. Check the log file of the Create_MasterSequence job for warning messages.

    Hard-coded values in the sequencer job parameters are listed as warning messages in the log file. If you encounter a warning message, correct the master sequencer job parameters accordingly, and re-run the Create_MasterSequence job.

  12. Import the generated master sequencer job into your DataStage project.

    If a master sequencer job with the same name already exists in the project, it is overwritten and placed in the same category.

    If the master sequencer job does not exist in the project, it is placed in the following default location: Utilities\Job_Utils\MasterSequence.

    Once the master sequencer job is imported into your project, you can move it to another category.

    The utility does not provide annotations for the master sequencer job, but you can provide annotations if you wish.

Note. You must compile the master sequencer job before you can use it.

See Compiling ETL Jobs.

Using the Master Run Utility to Automatically Run Your ETL Jobs

This section provides an overview of the Master Run Utility and discusses how to run the utility.

Understanding the Master Run Utility

PeopleSoft provides the Master Run Utility, which enables you to automatically run a set of jobs listed in a flat file on the DataStage Server. The Master Run Utility reads the list of jobs from the specified flat file and triggers the jobs in serial mode, using the dependency logic specified in the input flat file.

The Master Run Utility incorporates the features described in the following topics.

Location of the Master Run Utility

The Master Run Utility is available in the Common_Utilities.dsx file. After you import this .dsx file, you can locate the utility using the following navigation: Jobs, Utilities, Job_Utils, Master_Run_Utility.

Input File for Master Run Utility

The input flat file contains the list of ETL jobs available on the DataStage Server and indicates whether each job is independent or dependent; the Master Run Utility uses this dependency information to determine job execution. If a job labeled independent fails to run properly, the Master Run Utility logs the failure and proceeds to the next job. However, if a job labeled dependent fails to run properly, the utility itself aborts. Each job listed in the flat file carries the suffix N (independent job) or D (dependent job) to indicate its dependency.

The input flat file also contains comments that describe each set of ETL jobs. Comments are prefixed with an asterisk (*) to clearly indicate they are comments.
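
The following is a minimal sketch of what an input flat file might look like. The job names and the comma delimiter between job name and dependency flag are illustrative assumptions; follow the exact format of the input files delivered with your installation.

    *Load HCM data into OWS target tables
    SEQ_J_Stage_PS_JOB,N
    SEQ_J_Stage_PS_PERSONAL_DATA,N
    SEQ_J_Dim_PS_D_EMPL_JOB,D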

The sample file above contains the list of HCM jobs used to load data into OWS target tables.

Restart Recovery Feature

When a dependent job fails to run properly, the Master Run Utility automatically aborts the job and all subsequent jobs. After you fix the problem that caused the failure, you must rerun the Master Run Utility to run the jobs that were aborted. Instead of rerunning all the jobs in the input file, the restart recovery feature enables you to rerun only those jobs that were aborted.

For example, assume the file SAMPLE_HCM_E_GLOBAL_DIMENSIONS_Base_Tables.txt contains Global Dimension jobs that are all dependent on each other. If the SEQ_J_Dim_PS_D_POS job aborts, the entire utility aborts. After you fix the issue in the SEQ_J_Dim_PS_D_POS job, you can run the Master_Run_Utility with the Restart Recovery option set to Yes so that it runs the jobs from SEQ_J_Dim_PS_D_POS onward and skips the jobs that previously completed successfully.

Running the Master Run Utility

Perform the following steps to run the Master Run Utility:

  1. In DataStage Director, navigate to the Master Run Utility using the following navigation: Jobs, Utilities, Job_Utils.

  2. Select Master_Run_Utility and click Run.

    The Master_Run_Utility - Job Run Options window appears.

  3. Enter the path to the flat file and specify whether you want to use the restart recovery feature.

    Click Run.

  4. From the DataStage Director menu, select View, Log.

    The Master Run Utility generates a log report indicating which jobs completed successfully and which jobs aborted.

  5. If you double-click log entries that contain the phrase COMPLETED SUCCESSFULLY, the Event Details window appears and displays all the jobs that have successfully completed.

    Likewise, if you double-click log entries that contain the keyword ABORTED, the Event Details window appears and displays all the jobs that have aborted.

Supporting Unicode Data

To support Unicode data, you must first have installed DataStage Server with the Install NLS for DataStage Server check box selected. You must then configure NLS options in DataStage Administrator.

To configure NLS options:

  1. In DataStage Administrator, select the Projects tab and then select the project for which you want to configure NLS options.

  2. Click the NLS button to access the NLS options.

    These options are available only if DataStage Server was installed with the NLS option.

  3. In the Server Maps tab, select the correct NLS map for the project default.

    This value is used in all the jobs to map Unicode data.

  4. If you do not see the correct NLS map in the list, you can click the Install>> button to view all available maps and load the particular map you require.

  5. Select the NLS tab to ensure that your selected map displays as the Project Default for individual jobs in the DRS stage.

    Note. This value can be overridden at the job level by changing it in the DRS stage.

  6. Click OK to return to the Projects tab.

Converting Incremental Load Jobs to Destructive Load Jobs

As part of the ETL configuration process, you can convert incremental load jobs to destructive load jobs. However, converting a server job that uses CRC logic would require modifying at least 80 percent of its design, so for those jobs it is better to leave the existing job unaltered and create a new destructive load job from scratch.

The section below discusses how to convert incremental load jobs that use the DateTime stamp.

Converting Jobs that Use the DateTime Stamp

The changes required to convert an incremental load job (that uses the DateTime stamp) to a destructive load job can be demonstrated using the J_Fact_PS_F_COMM_PERSON job as an example.

To convert an incremental load job (that uses the DateTime stamp) to a destructive load job:

  1. In DataStage Designer, open the server job you want to convert.

  2. Open the source DRS stage and select the Output tab.

  3. In the Selection sub-tab, locate the WHERE clause and delete the last update date/time condition (the portion that references the LastModifiedDateTime job parameter), as in the sketch below.
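
    For example, a hypothetical selection for an incremental load might resemble the following; the column names are illustrative assumptions, not the actual J_Fact_PS_F_COMM_PERSON SQL:

        EFF_STATUS = 'A' AND LASTUPDDTTM > '#LastModifiedDateTime#'

    After deleting the date/time condition, only the remaining filters are left:

        EFF_STATUS = 'A'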

  4. Click OK to save your changes.

  5. Open the insert (*_INS) target DRS stage and select the Input tab.

  6. In the General sub-tab, select Truncate table then insert rows for the Update Action field.

  7. Click OK to save your changes.

  8. Delete the StoreMaxLastUpdDttm container and link.

  9. Delete the delete (*_DEL) target DRS stage and link.

  10. Delete the update (*_UPD) target DRS stage and link.

  11. Delete the hash target table lookup (the hash lookup that is performed against target table data) and link.

    Because this hash lookup is used to identify updated or new records, and you are converting the job to destructive load, it is no longer needed.

  12. Open the last transformation stage in the job (it should immediately precede the insert target DRS stage).

    This stage identifies new rows; in an incremental load, this is done to retain the CREATED_EW_DTTM value of existing rows (see the illustration below).

    In the example job above, the last stage is called Build_F_COMM_PERSON_final.
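
    For illustration, in a typical incremental load job the stage variable and the derivation follow a pattern like the one below; the link and column names are assumptions, not the exact Build_F_COMM_PERSON_final design:

        InsertFlag: If Lkp_Target.NOTFOUND Then 'Y' Else 'N'
        CREATED_EW_DTTM: If InsertFlag = 'Y' Then DSJobStartTimestamp Else Lkp_Target.CREATED_EW_DTTM

    In a destructive load, every row is treated as new, so the flag and the target lookup are no longer needed.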

  13. Delete the InsertFlag stage variable and click OK to save and exit the window.

  14. Select Edit, Job Properties from the menu and change the target column value for CREATED_EW_DTTM to DSJobStartTimestamp, which is a DataStage macro (the same value used for the LASTUPD_EW_DTTM field).

  15. Delete the LastModifiedDateTime job parameter and click OK to save and exit the window.

  16. Open the corresponding sequence job that calls the server job and delete the GetLastUpDatetime job activity stage (which calls the routine of the same name).

  17. Select Edit, Job Properties from the menu and delete the LastUpdDateTime job parameter if it is present.

    This parameter is not present in every sequence job; it is not needed in the sequencer in any case.

  18. Change the job annotations and descriptions to reflect the change.

  19. Save changes and exit.

  20. Save and recompile the job.

Adding New Environmental Variables

Environmental variables are project-level parameters that are typically shared across the jobs in a project. The advantage of environmental variables is that they are set once at the project level, and all associated jobs in that project automatically use the value.

Creating a New Environmental Variable

To add a new environmental variable:

  1. In DataStage Administrator, select the Projects tab.

  2. Click the Properties button and select the General tab.

  3. Click the Environment button and select User Defined from the Categories pane.

  4. Enter your new environmental variable parameters for the following fields (a hypothetical example follows these steps):

    1. Name

    2. Type

    3. Prompt

    4. Value

  5. Click OK to save your changes.
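
For example, a hypothetical user-defined variable for a shared staging directory might be defined as follows:

    Name: STAGING_FILE_DIR
    Type: String
    Prompt: Staging file directory
    Value: /apps/datastage/staging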

Adding an Environmental Variable to a Server Job

To add the new environmental variable to a server job:

  1. In DataStage Designer, select Jobs from the project tree.

  2. Select the job to which you want to add the environmental variable.

  3. Select Edit, Job Properties from the menu and select the Parameters tab.

  4. Click the Add Environment Variable... button.

    The Choose Environment Variable window appears.

  5. Select the new environmental variable from the Choose Environment Variable window.

  6. Click OK to save your changes.

    The new environmental parameter is now a part of the job parameters.

    You can change the Default value to $PROJDEF in the job parameters to signify that the value is taken from the project default unless it is overridden.

  7. Save and recompile the job.

Updating the Related Sequencer Job

If there is a sequencer job that calls the server job you modified, you must update the sequencer job to add the parameter value to the job activity stage where the value is passed to the called job.

To modify the related sequencer job:

  1. Select Edit, Job Properties from the menu and select the Job tab.

  2. Select the parameter from the list, and click Insert Parameter.

  3. Select your parameter, then click OK to save changes and exit.

  4. Save and recompile the job.