Transportation Intelligence

2. TI: ETL Server

This chapter details the technical solution of the Oracle TI ETL in Oracle Data Integrator.

The Oracle TI ETL solution is delivered under a single Oracle Data Integrator project called FTI_ETL. The ETL process works as follows:

  • When the ETL process runs for the first time, initial load is executed for a ‘first time load’ when no data exists in the HD tables. Typically, the execution plan in this mode involves only inserts to the HD tables.
  • Any other time the ETL process runs, incremental load is executed for regular operational data load to the HD tables. Typically, the execution plan in this mode involves inserts, updates, and soft deletes to HD tables.

Data Loading for ETL

Oracle TI ETL only takes into account data available at the start of the ETL. Hence, any order, shipment, etc. that is modified in OTM after the ETL has started, will not be considered as part of the current ETL. Any such modified data will be picked up in the next ETL.

For every fact and dimension table, the ETL compares the update date on each object to load against the following dates:

  1. Previous Load time: Last successful load time of the table in HDOWNER (based on E_LOAD_LOG)
  2. ETL Start Time: Time when ETL started/kicked off.

Any data modified after this ETL Start Time will not be considered for the current ETL.

Data Rejection

Following are some of the possible reasons for data to be rejected in the ETL.

Reason 1

The TI/GTI ETL logic is so designed as to pick only those records which have been inserted/updated before the start of the ETL. Any record modified during the ETL run will be rejected, and picked up only in the subsequent ETL run. For a record to be successfully loaded, it must satisfy the following criteria sequentially:

  1. The date and time of the record should be between the last successful load time in HDOWNER and the ETL start time.
  2. The Shipment HD Load Status should be 'Ready to Load' when ETL starts.
  3. The Shipment Status Insert/Update Date, or Shipment Update Date, or Shipment Stop Update Date, or Shipment Cost Update Date value should be before the ETL start time.

For example:

  • Shipment ID: 1234
  • ETL Start Time: 9/13/2016 12:00 PM
  • Shipment HD Load Status: Ready to Load
  • Shipment Update Date: 9/13/2016 12:10 PM (changed after the shipment is modified in OTM)
  • D_Shipment Loaded: 9/13/2016 12:30:00
  • F_SHIPMENT_STOP_D loaded: 9/13/2016 13:00:00

In this case, even though the initial Shipment HD Load Status is 'Ready to Load', the shipment 1234 will not be considered for current ETL because the Shipment Insert Date has changed to 12:10 PM, which is after the ETL Start Time 12:00 PM. So both D_SHIPMENT and all shipment facts will be rejected for the current ETL. They will be picked up for the next ETL.

Reason 2

The Dimension D_SHIPMENT table might have some erroneous data. Then, when loading the fact table this data is rejected due to a missing foreign key.

Reason 3

Cloning of only OLTP can also result in data rejection. In a case where OLTP is cloned from a production environment to a test environment, there is chance of data being rejected. This issue might not occur in the production environment assuming copying to production database is unlikely. For example:

  1. There is a test environment with OTM and TI, along with some test data in OTM.
  2. ETL is run and data is loaded successfully into TI.
  3. Now the OTM data (OLTP) in the test environment is cloned from a production environment which contains a different data set.
  4. New shipments are created and ETL is run.
  5. ETL does not pick the newly created shipments and load them into E_LOAD_REJECTS due to a foreign key violation. This happens because dimensions like Services Providers, Locations, etc. are not available in TI. These dimensions are not loaded into TI in this ETL because they were created a long time back in the production environment. So when cloned for the test environment, the dimensions are not picked in the latest ETL because they have old Insert Date/Update Date.
  6. You can use one of the following options to resolve this issue:
  • Clear the TI database as well when the source database/OLTP is cloned. This keeps both OTM and TI in sync.
  • Truncate all data in TI and run the ETL which will run the initial (full) ETL. However, this is not advisable in case of huge volumes of data.

Path and Path Edge Loading

W_PATH_D and W_PATH_EDGE_D will not be loaded by default by the ETL due to performance issues. Set the following property to enable in the ETL.

  1. Login to OTM as DBA.ADMIN.
  2. Go to Business Process Automation > Power Data > Document Generation > Report Common Properties.
  3. Set LOAD_PATH_AND_EDGE to true.
  4. Run the ETL.

Related Topics