Working with Materialized View Logs in Delivered ETL Jobs

Materialized View Log ETL jobs can be found in the following DataStage folder: Jobs, FMS_E, OWS, MVs, Load_Tables, Sequence (and) Server.

Image: Materialized View Logs in the DataStage project tree

This example illustrates the Materialized View Logs in the DataStage project tree.

Materialized View Log ETL jobs are designed to source from Materialized View Logs to dramatically reduce the processing time for OWS jobs that use the CRC process. To facilitate this, the jobs are designed to run the existing CRC job for the first time after you implement the Materialized View Logs in the Oracle database, then run the Materialized-View-specific jobs every time thereafter. This method ensures that there is no data loss on the first job run. For example, if before you implement the Materialized View you insert or modify some records in the source table, they will not be captured in the MLOG table or the CRC hashed file. Thus, in order not to lose any data the very first run after the MLOG implementation will be the CRC job run.

Additionally, Materialized View Log ETL jobs are designed to identify deleted source records so those records can later be deleted in EPM fact tables.

Describing Materialized View Log ETL Jobs

This section provides a detailed overview of Materialized View Log ETL design, including the processing logic for identifying source record deletes. This section uses the sequencer job SEQ_J_Stage_PS_ACAD_PLAN_CS9_EPM91 to describe the Materialized View Log ETL design:

Image: SEQ_J_Stage_PS_ACAD_PLAN_CS9_EPM91 materialized view log job

This example illustrates the Materialized View Log job SEQ_J_Stage_PS_ACAD_PLAN_CS9_EPM91.

UV1 (User Variable 1)

User Variable 1 executes the following routines:

RTNCHKMLOGSC: This routine checks the job name value in the HASH_MLOG_END_SEQUENCE hashed file and returns an appropriate value for the CHK_CRC nested condition.

If a job name value exists, the routine returns the value ‘MLOG,’ and the CHK_CRC nested condition calls the J_Get_MLOG_Sequence job.

If a job name value does not exist, the routine returns the value ‘CRC,’ and the CHK_CRC nested condition calls the existing CRC job (CRC_JOB_SEQ_J_Stage_PS_ACAD_PLAN in this example).
RTNGETMAXMLOGSEQUENCE: This routine fetches the concatenated value of the start sequence and the end sequence number from the MLOG hashed files HASH_MLOG_START_SEQUENCE and HASH_MLOG_END_SEQUENCE.

Note: For the first time run the return value will be -1|-1 as the job name will have no entries in either of the two hash files.

CHK_CRC (Nested Condition)

The CHK_CRC nested condition checks the return value from the RTNCHKMLOGSC routine and executes one of two ETL jobs, based on the return value.

If the routine returns the value ‘MLOG,’ (job name value exists), the CHK_CRC nested condition calls the J_Get_MLOG_Sequence job. This job fetches the Maximum Sequence Number from the MLOG table and writes it to the hashed file HASH_MLOG_END_SEQUENCE.

If the routine returns the value ‘CRC,’ (job name value does not exist), the CHK_CRC nested condition calls the existing CRC job (CRC_JOB_SEQ_J_Stage_PS_ACAD_PLAN in this example).

Path 1: CRC_JOB_SEQ_J_Stage_PS_ACAD_PLAN Job Activity

Image: CRC_JOB_SEQ_J_Stage_PS_ACAD_PLAN Job Activity

This example illustrates the CRC_JOB_SEQ_J_Stage_PS_ACAD_PLAN Job Activity.

MLOG_SEQ_J_Stage_PS_ACAD_PLAN_CS9_EPM91 Job Activity Path

When the CHK_CRC nested condition calls the existing CRC job (CRC_JOB_SEQ_J_Stage_PS_ACAD_PLAN in this example), it can either run successfully or fail.

If the CRC job fails or aborts, the ClearChkMlogSC routine deletes the job name from the HASH_MLOG_END_SEQUENCE hashed file. Thus, when the job runs a second time, it executes the existing CRC job.

If the CRC job runs successfully, the job CRC_ORCL_J_Handle_Staging_Success is called. This job moves data from the delta hashed file to the CRC_HASHED_FILE and DELETE_HASHED_FILE:

Image: CRC_ORCL_J_Handle_Staging_Success job

This example illustrates the CRC_ORCL_J_Handle_Staging_Success job.

This process keeps the existing CRC hashed files and delete hashed files synchronized.

Also, the Delete_MLOG output link deletes data from the MLOG table up to the Maximum Sequence Number using the after-SQL in the DRS stage:

Image: MLOG source table purge

MLOG source table purge.

The Update_SC output link stores the job name and one value higher than the maximum sequence number ( maximum_seq_number+1) in the HASH_MLOG_START_SEQUENCE hash file present in the shared container. This value will be used as the START_SEQUENCE value for the next run.

Path 2: J_Get_MLOG_Sequence Job Activity

Image: Path 2: J_Get_MLOG_Sequence Job Activity

This example illustrates the Path 2: J_Get_MLOG_Sequence Job Activity.

MLOG_SEQ_J_Stage_PS_ACAD_PLAN_J_Get_MLOG_Sequence Job Activity

When the CHK_CRC nested condition calls the J_Get_MLOG_Sequence job, the job fetches the Maximum Sequence Number from the MLOG table and writes it to the HASH_MLOG_END_SEQUENCE hashed file:

Image: J_Get_MLOG_Sequence job

This example illustrates the J_Get_MLOG_Sequence job.

The UV_MLOG user variable calls the following routines:

MAX_MLOG_SEQ (RTNGETMAXMLOGSEQUENCE): Fetches the concatenated value of the start sequence and the end sequence numbers from the HASH_MLOG_START_SEQUENCE and HASH_MLOG_END_SEQUENCE hashed files.
BATCH_SID (GETNEXTBATCHNUMBERPARALLEL): Generates BATCH_SIDs.
HANDLE_DELETES (READPARAMETERFILE): Fetches the ParameterName value from the CS_HANDLEDELETES_SETUP parameter file.

Note: For FMS Warehouse jobs, it fetches the ParameterName value from the FMS_HANDLEDELETES_SETUP parameter file.
DELETE_FLAG (RTNGETDELFLAG): Fetches the Delete Flag for the corresponding server job (J_Stage_PS_ACAD_PLAN_CS9_EPM91 in this example).

Next, the ORCL_J_Stage_PS_ACAD_PLAN_CS91_EPM91 job activity calls the job ORCL_J_Stage_PS_ACAD_PLAN_CS91_EPM91:

Image: ORCL_J_Stage_PS_ACAD_PLAN_CS91_EPM91 job

ORCL_J_Stage_PS_ACAD_PLAN_CS91_EPM91 job

MLOG_ORCL_J_Stage_PS_ACAD_PLAN_CS91_EPM91

This job fetches data from the base source table and the MLOG table and loads it to the target OWS table. A right outer join on the MLOG table ensures all newly inserted, modified and deleted rows are extracted. The existing CRC hashed file identifies inserted and updated records.

The source DRS identifies deleted records from the master table using the BASE_KEY, which can be any of the key columns of the source table. When a record is deleted from the master table, the MLOG table captures that record but the record will no longer exist in master. As such, a RIGHT OUTER JOIN on the MLOG table fetches the deleted rows. For deleted records the BASE_KEY column will be NULL, indicating that the record no longer exists in the source table.

The UPDATE_HASH_DELTA hashed file is also maintained in this job.