Oracle® Fusion Middleware User's Guide for Oracle Business Intelligence Data Warehouse Administration Console 11g Release 1 (11.1.1) Part Number E14849-06
This chapter provides information about building and running execution plans.
This chapter contains the following topics:
Setting Up Extract Delays, Event Delays and Data Source Notifications
How the DAC Server Handles Requests to Start and Stop Execution Plans
An execution plan is the unit of work you use to organize, schedule, execute, and monitor ETL processes. To generate an execution plan, you specify one or more subject areas, connectivity parameters, and, optionally, pre- and post-ETL tasks. DAC then generates an ordered set of tasks, with dependencies, that will carry out the ETL process.
When you run an execution plan, data is extracted from one or more tables in the source system database, loaded into staging tables, and then transformed and loaded into tables in the data warehouse. The ETL process is carried out in either full or incremental mode, depending on the refresh dates for the tables associated with the execution plan.
If the source or target table refresh dates for a task are null, then DAC invokes a full load workflow command. If both the source and target tables have refresh dates, then DAC invokes the incremental workflow command. For a detailed description of refresh dates, see "About Refresh Dates and DAC's Incremental Load Strategy".
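The refresh-date rule can be sketched in a few lines. The function below is illustrative only (DAC's internal API is not exposed), but it captures the decision just described:

```python
def choose_workflow_command(source_refresh_date, target_refresh_date):
    """Illustrative sketch of the rule above: a null refresh date on either
    the source or target table forces the full load workflow command."""
    if source_refresh_date is None or target_refresh_date is None:
        return "full"
    return "incremental"

# First-ever run: no refresh dates recorded yet, so a full load is invoked.
first_run = choose_workflow_command(None, None)
# Routine nightly run: both tables have refresh dates, so the load is incremental.
nightly_run = choose_workflow_command("2011-06-01", "2011-06-01")
```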
DAC supports the following extract and load combinations:
Full extract and full load
This extract and load combination is used for the very first extract and load. All data is extracted from one or more source systems and loaded into the data warehouse in the following manner: Target tables are truncated; the bulk insert mode is used; and indexes are dropped and created.
Full extract and incremental load
This extract and load combination supports modular execution plan deployments, enabling you to deploy a new source system or new subject area that will write to data warehouse tables that already contain data. When you deploy a new source system or subject area, you should use the same staging tables for the conforming data that are in use for the original source. These staging tables will already have refresh dates, because they will have been populated with data from the original source. However, when a new source is added, the source tables will not have refresh dates. This will cause DAC to perform a full extract and incremental load. On the original source, DAC will perform an incremental extract and an incremental load. The Mode attribute in the Tasks subtab of the Current Runs tab indicates the source and load type.
The task level properties Truncate Always and Truncate for Full Load determine whether tables are truncated during the ETL process. For any given task, if the Truncate Always property is set to True, then both full and incremental loads will truncate the target tables. If the Truncate for Full load property is set to True, then only full loads will truncate the target tables.
When data is loaded into a staging table that has the Truncate Always property set to True, the bulk loader option can be used for both full and incremental commands. When data is written directly to a dimension or fact table that has the Truncate for Full Load property set to True, DAC uses the command for full load to extract all data from the source system but does not truncate the target table and, therefore, does not drop indexes. The command must be able to write to a table that already contains data and indexes, which means such a full command cannot use bulk loaders and may need an update strategy. Typically, such logic for bulk loaders and update strategies is implemented in Informatica (or another external ETL tool).
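The interaction of the two truncate properties with the load mode can be summarized in a small sketch. The function and names are hypothetical; DAC evaluates these properties internally:

```python
def table_is_truncated(truncate_always, truncate_for_full_load, load_mode):
    """Sketch of the rule above: Truncate Always truncates on every load;
    Truncate for Full Load truncates only when the load mode is 'full'."""
    return truncate_always or (truncate_for_full_load and load_mode == "full")

# A staging table with Truncate Always set is truncated even on incremental runs:
staging_incremental = table_is_truncated(True, False, "incremental")   # True
# A fact table with only Truncate for Full Load keeps its data (and indexes)
# during incremental runs:
fact_incremental = table_is_truncated(False, True, "incremental")      # False
```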
The incremental load process requires additional logic to determine whether a record should be inserted or updated. Therefore, if you add a new source connection to populate an existing data warehouse, you should expect the incremental load to be slower than when running a full load.
Incremental extract and incremental load
This extract and load combination is used for regular nightly or weekly ETL processes. New or changed records are extracted from one or more source systems and loaded into the data warehouse.
A single-source execution plan extracts data from a single instance of a single, source system container. As illustrated in Figure 6-1, in this scenario data is extracted from a single source, loaded into staging tables, and then loaded into the data warehouse.
Truncate Table Behavior in a Single Extract Scenario
When DAC truncates a table, it also drops and recreates indexes (if the table has indexes) and analyzes the table after loading the data. In a single-extract scenario, the extract task's truncation process for a table is as follows:
Truncate table.
Drop indexes.
Run the Informatica mapping.
Create indexes.
Analyze table.
For instructions on building a single-source execution plan, see "Building and Running Execution Plans".
When you have a multi-source environment, you can use a single execution plan to extract data for a given subject area for all the sources. When you build the execution plan, DAC creates as many instances of the extract tasks as there are sources, without the need to duplicate the workflows or DAC metadata.
There are two types of multi-source execution plans:
Homogeneous. This type of execution plan extracts data from multiple instances of the same source system. For example, a business might have an instance of Oracle Fusion Applications in one location and time zone and another instance of Oracle Fusion Applications in another location and time zone. In such cases, the timing of data extraction from the different instances can be staggered to meet your business requirements.
Heterogeneous. This type of execution plan extracts data from multiple instances of dissimilar source systems. For example, a business might have an instance of Siebel 8.0 in one location and an instance of Oracle EBS 11i in another location. You can also stagger the timing of data extraction when you use this type of execution plan, by setting the Delay property in the Connectivity Parameters subtab of the Execution Plans tab.
For instructions on building a multi-source execution plan, see "Building and Running Execution Plans".
This section describes properties and behaviors you should consider when setting up multi-source execution plans.
Source Priority
The order in which DAC loads data from the different sources that are participating in the ETL process is determined by the priority of the physical data source connection, as set in the Source Priority field in the Physical Data Sources tab of the Setup view. This property ensures that tasks attempting to write to the same target table will not be in conflict. If two sources are given the same priority, DAC will randomly stagger the tasks.
Truncate Table Behavior
If a target table is shared across different sources, it will be truncated only once. The priority of the data source (specified by the Source Priority property) determines which of the extracts truncates the tables. For more information about how tables are truncated, see "Execution Plan Build Process Rules".
Note that truncate properties should be the same across all source system containers associated with a multi-source execution plan.
In a multiple extract scenario, the truncation process for a table is as follows:
Truncate table (first extract task).
Drop indexes (first extract task).
Run the Informatica mapping (first extract task).
Run the Informatica mapping (second extract task).
Create indexes (second extract task).
Analyze table.
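The multiple-extract sequence above can be sketched as follows. The task list and step strings are illustrative, not DAC output:

```python
def multi_extract_sequence(extract_tasks):
    """Sketch of the truncation process above: only the first extract task
    truncates the table and drops indexes; only the last one recreates
    indexes; the table is analyzed once at the end."""
    steps = []
    for i, task in enumerate(extract_tasks):
        if i == 0:
            steps.append(f"truncate table ({task})")
            steps.append(f"drop indexes ({task})")
        steps.append(f"run Informatica mapping ({task})")
    steps.append(f"create indexes ({extract_tasks[-1]})")
    steps.append("analyze table")
    return steps

steps = multi_extract_sequence(["first extract task", "second extract task"])
```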
Enabling Failure Restarts When Extracting From Multiple Sources
Typically, extract tasks truncate the target tables, and, therefore, will not have update strategies. To enable failure restarts when extracting from multiple sources, you can create an action that will predelete all the records that are loaded from the source. See "Using a Task Action to Enable Failure Restarts When Extracting From Multiple Sources" for instructions.
Task Groups
Tasks should belong to the same task groups across all source system containers associated with the multi-source execution plan. For example, if a task belongs to a task group in one container but is not associated with a task group in a different container, then an exception will be thrown. Also, if a task with the same name belongs to two different task groups, an exception will be thrown.
Full and Incremental Workflow Commands
Full and incremental workflow commands need to match for tasks across all source system containers.
Configuring Extracts and Notifications
You can delay extracts for a particular source system by using the Delay property in the Connectivity Parameters subtab of the Execution Plans tab. You can also use the Event Delay feature to configure the extracts for different source systems to occur independently.
The Notifications feature enables you to initiate email notifications and the execution of custom SQL based on the first or last read or write operations on a particular data source. See "Setting Up Extract Delays, Event Delays and Data Source Notifications" for more information.
Task Physical Folder Instance Priority
DAC deduplicates tasks from multiple source containers or multiple instances of the same container based on the task name, the primary physical source, and the primary physical target. DAC executes only a single instance of the task. If multiple containers contain the same task, with the same source and target databases but different task physical folders, then DAC will pick one folder. If the folders have different Instance Priority values, the folder with the lowest value will be picked. If the folders have the same priority, DAC randomly selects a folder.
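The deduplication rule can be sketched as below. The dictionary-based task records are illustrative, and the sketch assumes a lower priority value wins, consistent with how priority values work elsewhere in DAC:

```python
def dedupe_tasks(candidates):
    """Keep one instance per (task name, primary source, primary target),
    preferring the physical folder with the lowest Instance Priority value.
    (Ties are broken randomly by DAC; not modeled here.)"""
    chosen = {}
    for task in candidates:
        key = (task["name"], task["source"], task["target"])
        current = chosen.get(key)
        if current is None or task["folder_priority"] < current["folder_priority"]:
            chosen[key] = task
    return list(chosen.values())

# The same task appears in a base container and a vertical container
# (folder names are made up for illustration):
candidates = [
    {"name": "SDE_Example", "source": "OLTP", "target": "DW",
     "folder": "SDE_Base", "folder_priority": 1},
    {"name": "SDE_Example", "source": "OLTP", "target": "DW",
     "folder": "SDE_Vertical", "folder_priority": 2},
]
picked = dedupe_tasks(candidates)   # one instance, from the SDE_Base folder
```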
In Oracle BI Applications, the Instance Priority for task physical folders is predefined. In certain situations, such as if you are using a vertical application, you may want to execute a task from a different folder. You can specify the preference of task physical folders by modifying the Instance Priority. To do so, on the Tools menu, select Seed Data, and then select Task Physical Folders.
DATASOURCE_NUM_ID
All tables should contain the column DATASOURCE_NUM_ID, which describes the source from which the data is coming.
Unique indexes should always include the column DATASOURCE_NUM_ID.
All the extract mappings should populate the DATASOURCE_NUM_ID column from the parameter file produced by DAC.
All the load mappings should extract the value of the DATASOURCE_NUM_ID column from the staging tables.
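A short sketch shows why DATASOURCE_NUM_ID must be part of every unique index: two sources can legitimately supply rows with the same natural key. The column name INTEGRATION_ID is illustrative:

```python
rows = [
    {"DATASOURCE_NUM_ID": 1, "INTEGRATION_ID": "100"},  # from source system 1
    {"DATASOURCE_NUM_ID": 2, "INTEGRATION_ID": "100"},  # from source system 2
]

# On the natural key alone, the two rows collide:
natural_keys = {r["INTEGRATION_ID"] for r in rows}
# Including DATASOURCE_NUM_ID keeps them distinct, so a unique index holds:
full_keys = {(r["DATASOURCE_NUM_ID"], r["INTEGRATION_ID"]) for r in rows}
```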
Multi-source execution plans support two load scenarios:
Multiple extract and single load. In this scenario, as illustrated in Figure 6-2, data is extracted from multiple sources and loaded into central staging tables. After all of the extract tasks have completed, the data is loaded into the data warehouse in a single load process.
Multiple extract and multiple loads. In this scenario, as illustrated in Figure 6-3, data is extracted from multiple sources and loaded into non-centralized staging tables. After all of the extract tasks have completed, the data is loaded into multiple data warehouses using multiple load processes.
Micro ETL execution plans are ETL processes that you schedule at very frequent intervals, such as hourly or half-hourly. They usually handle small subject areas or subsets of larger subject areas.
For instructions on building a micro ETL execution plan, see "Building and Running Execution Plans".
This section contains the following topics:
A regular execution plan includes one or more fact tables and all content associated with the fact tables, such as dimension tables and related tables, and so on. The more associated content that is added to an execution plan, the longer the ETL process runs. If a subject area is small, for example, with one fact table and two dimension tables, there is no need to create a micro ETL execution plan. Instead, you can schedule an execution plan with a single, small subject area to run at frequent intervals.
Micro ETL execution plans, on the other hand, accommodate a business need for specific users to access updated subsets of data at frequent intervals. For example, some users may find their reports are stale if data is only refreshed once nightly. In such cases, a micro ETL execution plan can refresh the results of a small subject area or a subset of a star schema hourly or half-hourly. If you have a fact table with 50 dimension tables as well as other multiple related stars, an execution plan containing this fact table would run for a long period of time and could not be scheduled to run at frequent intervals. You could create a micro ETL execution plan by removing the unnecessary tables and tasks, leaving only the fact and dimension tables required for a specific report. See "Designing a Micro ETL Execution Plan" for instructions.
You should be aware that inconsistencies can occur when running micro ETL execution plans, such as dimension keys pointing to "Unspecified" records and discrepancies between facts and aggregates. However, the subset of data extracted for a micro ETL execution plan will be updated and patched during a regularly scheduled nightly ETL process. See "Important Considerations When Using Micro ETL Execution Plans" for other points to consider.
Note: All tasks that need to participate in the micro ETL execution plan should have update strategies that can update the data even though it has already been loaded into the data warehouse.
DAC tracks refresh dates for tables in micro ETL execution plans separately from other execution plans and uses these refresh dates in the change capture process. After a micro ETL execution plan runs, DAC populates refresh date values in the Micro ETL Refresh Dates subtab of the Execution Plans tab. If a subject area is used in a regular execution plan (an execution plan with the Micro ETL option in the Execution Plans tab not selected) as well as a micro ETL execution plan, DAC maintains refresh dates for the tables in the regular execution plan in the Refresh Dates subtab of the Physical Data Sources tab (Setup view).
In cases where a subject area is used in both a regular and micro ETL execution plan and the micro ETL execution plan is suspended for a few days but the regular execution plan runs nightly, DAC automatically detects the last refresh date for the tables common to both execution plans and intelligently extracts only the most recent records for the micro ETL execution plan. Micro ETL refresh dates do not affect the refresh dates of regular execution plans.
As an example, suppose a micro ETL execution plan runs every hour and the complete (regular) execution plan runs once a day at 1:00 PM. When the micro ETL execution plan runs at 10:00 PM, it will get the refresh timestamp as 9:00 PM. When the complete execution plan runs at 1:00 PM, it will get the refresh timestamp as the previous day at 1:00 PM, even though the micro ETL execution plan ran at 12:00 PM. This will cause all the records to be re-extracted that were already loaded into the data warehouse using the micro ETL execution plan. As long as all tasks have proper update strategies, DAC will re-update the records that were loaded during the day.
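The timestamps in this example can be worked through directly. The sketch below uses illustrative dates and simply mirrors the separate refresh-date bookkeeping described above:

```python
from datetime import datetime

# Micro ETL refresh dates are tracked separately from regular ones.
# The hourly micro ETL plan last ran at 9:00 PM:
micro_refresh = datetime(2011, 6, 1, 21, 0)
# The regular plan last ran at 1:00 PM the previous day:
regular_refresh = datetime(2011, 5, 31, 13, 0)

# The 10:00 PM micro ETL run extracts changes made since 9:00 PM:
micro_extract_since = micro_refresh
# The next day's 1:00 PM regular run ignores the micro ETL refresh dates and
# extracts everything since its own last run, re-extracting rows the micro
# ETL runs already loaded (proper update strategies then re-update them):
regular_extract_since = regular_refresh
```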
Note:
The image sync for Siebel sources does not occur during micro ETL execution plans. This means that the records from the I image tables do not get moved into the R image tables. Therefore, the records are re-extracted during the complete (regular) execution plan.
Micro ETL processes can cause issues with data inconsistencies, data availability, and additional load on the transactional database. Therefore, you should consider the following factors before implementing a micro ETL process:
For related star schemas, if one schema is omitted from a micro ETL execution plan, the cross-star reports may be inaccurate. For example, if the Person fact table is refreshed more frequently than the Revenue fact table, a report that spans the Person and Revenue dimensional schemas may produce inconsistent results.
If you omit dimension tables from a micro ETL execution plan, the foreign keys for the fact tables will point to Unspecified rows for the new dimension records. The foreign key references will be resolved when the Complete ETL execution plan is run, but users of the reports should be aware of such inconsistencies.
If you do not include aggregate tables in micro ETL execution plans, the reports that use data from these tables will be inconsistent with the reports that use data from the detailed fact tables. However, if aggregate tables are included in the micro ETL execution plan, the aggregate calculations are performed for each ETL process, which will take a constant amount of time and may be inefficient to perform at such frequent intervals.
Ensure any tasks or tables for soft-deletes are included in the micro ETL to identify deleted records from the source system.
Ensure all downstream post load processes are included as well as aggregates. This can be actual business dependencies, such as the GL Reconciliation process in Financial Analytics. For example, suppose a customer wants to refresh just the AP (Payables) fact three times a day. This will require the whole GL Reconciliation process and tasks to run to match the GL with AP. Otherwise, there will be a mismatch between the balance and transaction reports.
Hierarchy tables are rebuilt during every ETL execution plan by querying the base dimension tables. This operation takes a constant amount of time. If the base tables are big, this operation may take a long time and may be inefficient if the micro ETL execution plan runs several times a day. However, if you avoid populating the hierarchy tables during micro ETL processes, data inconsistencies will occur.
With micro ETL execution plans, BI Server caching will occur more frequently, which may have performance implications.
Micro ETL execution plans will put more load on the transactional database because of the frequent extracts.
Ensure the business is aware of the frequency of micro ETL execution plans and how dashboard results and critical reports are impacted throughout the day.
Follow this procedure to design a micro ETL execution plan.
To design a micro ETL execution plan:
In the Design view, select the appropriate source system container from the drop-down list in the toolbar.
Go to the Subject Areas tab, and click New.
Enter a name for the subject area, and click Save.
In the Tables subtab, click Add/Remove.
The Choose Tables dialog is displayed. The left-hand window lists all the tables held in the selected container.
Query for one or more fact tables.
Select the fact table (use Shift+click to select more than one table), and click Add.
Click OK to close the Choose Tables dialog.
Click Assemble in the Subject Areas tab toolbar.
In the Assembling... dialog, select Selected record only.
The tree view on the left side of the Subject Area Assembly dialog displays the fact tables that belong to the subject area. You can expand the fact table node to view its related tables.
Deselect the unnecessary fact and dimension tables.
Click Calculate Task List to assemble the tasks needed to load the tables displayed in the tree view.
A list of tasks is displayed in the Task List tab on the right side of the window. Also, the Table List tab displays the tables included in the subject area.
Click Accept to complete the subject area assembly process.
Inactivate any unnecessary tasks:
With the appropriate subject area selected in the Subject Areas tab, click the Tasks subtab.
Select the Inactive check box for any unnecessary task.
Reassemble the subject area by clicking Assemble in the top pane toolbar.
Note:
You can also create a micro ETL execution plan by creating a new subject area and then using the Tasks subtab to associate specific tasks with it.
DAC builds an execution plan (generates a task dependency graph) based on the following rules for the metadata in the Design and Setup views.
Task source and target tables. The dependency algorithm first looks at a task's source and target tables. For example, suppose table A is populated by task T1 by reading from table B, and table B is populated by task T2 by reading from table C. The algorithm would determine task T2 should be executed before T1.
A task with a target table that is not a source table in any other task will be a leaf node.
Task phase priority. An ETL process typically goes through a number of phases. The task phase priority determines the order in which the phase will occur. For Oracle BI Applications, the task phase priorities are predefined. You can view or modify the priorities by selecting Tools, Seed Data, Task Phases.
An example of a typical order in which phases are executed is as follows:
Extract Dimension
Extract Fact
Load Dimension
Load Fact and Load Hierarchy (executed in parallel)
Load Aggregate tables
Update Dimensions
Truncate properties. DAC truncates a target table only once during the life span of an ETL execution. If multiple tasks write to the same table, the following rules apply:
The first task truncates the target table. This task has the highest priority. Even if there are multiple tasks that could potentially truncate a table—either from the source container or across multiple source containers—only the first task will truncate the table.
The task reading from the data source with the highest priority truncates the table and drops its indexes. The last task writing to the table from the data source with the highest priority creates the indexes, and the last task writing to the table creates the query indexes.
DAC does not truncate the target tables during the execution of the subsequent tasks even if the Truncate Always or Truncate for Full Load properties are selected.
If the task belongs to a task group and the task group's Truncate Always or Truncate for Full Load properties are different from the individual task truncate properties, the task's truncate properties are ignored. Note: You can use a task action if you want to override this behavior. See "Using Actions to Optimize Indexes and Collect Statistics on Tables" for information about the actions feature.
Priority of the source connection. When there is more than one source system, you need to specify the priority of each source connection in the Source Priority field in the Physical Data Sources tab of Setup view.
Task groups. The first task in the group determines the position of the group in the dependency graph.
DAC randomly organizes tasks that have the same property values, because it does not allow parallel reads and writes on any table. If you want to force tasks to be executed in a particular order, or in parallel, you can create a task group that allows you to specify an execution order. For instructions on creating a task group, see "Creating a Task Group".
Task phase dependency. The task phase dependency is used to force dependencies between tasks that normally would not have a dependency, for example, between extract facts and extract dimensions. For instructions on setting a task phase dependency, see "Setting a Task Phase Dependency".
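The first rule, ordering by source and target tables, amounts to a topological sort of the task dependency graph. Below is a minimal sketch using Python's standard graphlib module, with the T1/T2 example from above; it is illustrative only and ignores the other rules (phases, priorities, task groups):

```python
from graphlib import TopologicalSorter

# task name -> (source tables, target tables), per the example above:
# T1 populates A by reading B; T2 populates B by reading C.
tasks = {"T1": ({"B"}, {"A"}), "T2": ({"C"}, {"B"})}

# A task depends on every task that populates one of its source tables.
deps = {
    name: {other for other, (_, targets) in tasks.items()
           if tasks[name][0] & targets}
    for name in tasks
}

order = list(TopologicalSorter(deps).static_order())  # T2 before T1
```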
Follow this procedure to run a single-source, multi-source, or micro ETL execution plan.
Before you attempt to run an execution plan, make sure you have completed the following:
In the Physical Data Sources tab, set database connections to the transactional and data warehouse databases.
In the Informatica Servers tab, registered the Informatica Repository Service and Integration Service.
Created a custom container from which you will select the subject areas for the execution plan.
Before you attempt to run a multi-source execution plan, you must first define the priority for each source. The priority specifies the order in which DAC will load the data from the different sources.
To define a source priority for multi-source execution plans:
In the Setup view, select the Physical Data Sources tab.
For each of the physical data sources that will participate in the multi-source execution plan, enter a numerical value in the Source Priority field.
The lower the numerical value, the higher the priority. For example, if you enter a value of 1, data from this source will be loaded first.
To build and run an execution plan:
In the Execute view, select the Execution Plans tab.
Create a new execution plan.
In the top pane toolbar, click New.
In the top pane window or in the Edit subtab, enter the following information.
Field | Description
---|---
Name | Enter a name for the execution plan.
Full Load Always | (Optional) Select to indicate the execution plan will always execute a full load. Note: DAC does not store refresh dates when this option is selected.
Micro ETL | (Optional) Select to indicate the execution plan is a micro ETL execution plan. See "About Micro ETL Execution Plans" for more information.
Analyze | (Optional) Select to indicate the tables associated with this execution plan will be analyzed.
Analyze Truncated Tables Only | (Optional) Select to indicate only truncated tables will be analyzed.
Drop/Create Indices | (Optional) Select to indicate indexes on the tables associated with this execution plan will be dropped and created.
Inactive | (Optional) Select to inactivate the execution plan.
Click Save.
For a micro ETL execution plan, create a subject area in the Subject Areas tab of the Design view by doing one of the following:
Create a new subject area by adding the appropriate tasks to it.
Create a new subject area by adding the appropriate tables and then deleting the unnecessary tasks.
Modify an existing subject area by adding or removing tasks.
Assemble the subject area by clicking Assemble in the top pane toolbar. See "Creating a Subject Area" for more information.
Associate one or more subject areas with the execution plan.
Click the Subject Areas subtab.
Click Add/Remove in the bottom pane toolbar.
In the Choose Subject Areas dialog, select the appropriate source system container from the drop-down list.
Query for the subject area you want to associate with the execution plan. For micro ETL execution plans, query for the subject area you created in step 3.
Select the subject area and click Add.
You can repeat this process to associate multiple subject areas from any available source system container with an execution plan.
Click OK to close the window.
Generate the runtime connectivity parameters.
Click the Connectivity Parameters subtab, and then click Generate in the bottom pane toolbar.
The Generating Parameters... dialog lists the containers that are involved in this execution plan.
Enter the number of copies of each container that are needed, and then click OK.
DAC automatically generates the parameters required for each copy of the source system container. Note that not all copies require all of the possible parameters.
In the Value column, for each folder or database connection, select the appropriate physical folder name or database connection.
Note:
For the data source type of FlatFileConnection, make sure you have copied all files into the directory specified in the DAC system property InformaticaParameterFileLocation.
(Optional) If you are extracting data from more than one source system and want to stagger the data extracts, in the Delay field for the appropriate data source, enter a value for the number of minutes you want to delay this extract after the first extract process has started.
(Optional) Set the Prune Time property. This setting subtracts the Prune Time value from the LAST_REFRESH_DATE and supplies this value as the value for the $$LAST_EXTRACT_DATE parameter. See "Prune Time Column" for more information.
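The Prune Time arithmetic is straightforward; this sketch uses illustrative values:

```python
from datetime import datetime, timedelta

# Illustrative values: the table was last refreshed at 1:00 PM and the
# Prune Time property is set to 60 minutes.
last_refresh_date = datetime(2011, 6, 1, 13, 0)
prune_time = timedelta(minutes=60)

# DAC supplies LAST_REFRESH_DATE minus Prune Time as $$LAST_EXTRACT_DATE,
# so the extract reaches back an extra hour to catch late-arriving changes:
last_extract_date = last_refresh_date - prune_time   # 12:00 PM
```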
(Optional) Add one or more tasks that will run before the ordered tasks generated by DAC.
Click the Preceding Tasks subtab, and then click Add/Remove.
Select the appropriate container from the drop-down list.
Query for and select the task you want to add, and then click Add.
(Optional) Add one or more tasks that will run after the ordered tasks generated by DAC.
Click the Following Tasks subtab, and then click Add/Remove.
Select the appropriate container from the drop-down list.
Query for and select the task you want to add, and then click Add.
In the top pane of the Execution Plans tab, make sure the new execution plan is highlighted, and click Build.
In the Building... dialog, select the option Selected Record Only, to build only the selected execution plan.
Click the Ordered Tasks subtab, and verify the following:
Click Details in the toolbar, and review each task's predecessor and successor tasks to confirm tasks common to multiple sources are ordered in a manner consistent with the priority of the source connection.
Confirm that load tasks appear only once even if there are multiple extracts for tables common to multiple sources.
For tasks common to multiple sources, click Unit Test in the toolbar, and confirm that the first common task truncates the common target table and the following tasks do not. For instructions on unit testing a task, see "Unit Testing Execution Plan Tasks".
Make sure the DAC Server is running. For instructions on starting the DAC Server, see "Managing the DAC Server".
Start the execution plan:
Click Run Now.
In the Starting ETL dialog, click Yes.
Note: The Honor Time Delays property determines whether the Delay property in the Connectivity Parameters subtab will be active.
Once the execution plan starts running you can monitor its progress in the Current Runs tab. For instructions, see "Monitoring Execution Plan Processes".
For information about how refresh dates are tracked, see "About Refresh Dates and DAC's Incremental Load Strategy".
To schedule an execution plan, see "Scheduling an Execution Plan".
This section contains the following topics:
Enabling Informatica Workflows to Be Shared by Multiple Execution Plans
Explicitly Defining Execution Plans as Independent or Dependent
You can configure DAC to run multiple execution plans concurrently if the execution plans are not dependent on one another. To avoid data inconsistency, execution plans that have dependencies with other execution plans cannot run concurrently (unless you override the default behavior).
Execution plans are considered dependent if 1) they load data into tables that have the same name on the same physical data source; or 2) one execution plan writes to a table that another execution plan reads from on the same physical data source. Execution plans that have dependencies with other execution plans are not eligible to run concurrently if any of the following scenarios are true:
The execution plans share one or more target tables on the same physical data source.
An execution plan has a target table that is a source for another execution plan. For example, execution plan A writes to table 1 on the physical data source DW and execution plan B reads from table 1 on DW.
An execution plan has a source table that is a target table for another execution plan. For example, execution plan C reads from table 2 on the physical data source DW and execution plan D writes to table 2 on DW.
DAC allows execution plans to run concurrently as long as they write to unique tables. Tables are considered to be unique in relation to other tables if the table names are unique or if tables have the same name but reside in different physical data sources.
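The three conflict scenarios collapse into one overlap check. The plan records below are illustrative, with tables identified per physical data source as described above:

```python
def can_run_concurrently(plan_a, plan_b):
    """Sketch of the eligibility rules above. Each plan is a dict with
    'sources' and 'targets': sets of (table_name, physical_data_source)."""
    if plan_a["targets"] & plan_b["targets"]:   # shared target tables
        return False
    if plan_a["targets"] & plan_b["sources"]:   # A writes what B reads
        return False
    if plan_b["targets"] & plan_a["sources"]:   # B writes what A reads
        return False
    return True

# Plan A writes table 1 on DW; plan B reads table 1 on DW -> dependent:
plan_a = {"sources": {("S_ORDER", "OLTP")}, "targets": {("TABLE_1", "DW")}}
plan_b = {"sources": {("TABLE_1", "DW")}, "targets": {("TABLE_2", "DW")}}
# The same table names on a different physical data source are different
# tables, so this plan can run concurrently with plan A:
plan_c = {"sources": {("TABLE_1", "DW2")}, "targets": {("TABLE_2", "DW2")}}
```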
DAC does not distinguish whether the same Informatica workflows are used by multiple execution plans. Therefore, as long as execution plans write to unique tables, if they share the same Informatica workflows, they can still run concurrently.
For example, consider a scenario in which you have development and QA environments that each have a different OLTP and data warehouse but share the same DAC and Informatica repositories, and, therefore, use the same Informatica workflows. Execution plans for these two environments that share Informatica workflows can run concurrently.
DAC provides two options for enabling Informatica workflows to be shared by multiple execution plans:
Defining an execution plan prefix
You can define a prefix for each execution plan using the DAC_EP_PREFIX execution plan level parameter. This option enables you to use single instances of the DAC Server, Informatica Integration Service, and DAC and Informatica repositories to run execution plans from multiple sources. That is, you can run multiple execution plans at the same time using the same metadata.
This solution is useful if you have multiple sources with execution plans that have the same subject area. It is also useful when the sources are in the same time zone and the execution plans need to run simultaneously, or when the sources are in different time zones but have overlapping execution plan schedules.
The workflows that are common to different execution plans must be enabled in Informatica Workflow Manager to run concurrently. DAC will invoke the workflows with an instance name that is in the format <prefix>_workflowname. DAC captures the Informatica pmcmd and pmrep outputs and stores them as log files for each ETL process run. For more information about log file naming conventions, see "DAC Session Log File Naming Conventions" and "Informatica Log File Naming Conventions".
For instructions on defining an execution plan prefix, see "Defining an Execution Plan Prefix".
Copying the Informatica workflow folders and creating new folder mappings
You can make copies of the Informatica workflow folders so that there is one extract and load folder combination for each execution plan, and then create new mappings in DAC for the physical to logical folders. For instructions, see "Copying Informatica Workflow Folders and Creating New Folder Mappings".
When you run execution plans concurrently, the following resource configurations apply to each ETL process. You should consider these properties when deciding how many execution plans you will configure to run concurrently.
Num Parallel Workflows per EP
This property is located on the Informatica Servers tab in the Setup view. It specifies the maximum number of workflows that can be executed in parallel on the Informatica Server for each ETL run. For example, if this value is set to 10 and three execution plans are running concurrently, 30 Informatica workflows may be running at any given time.
Num Connections per EP
This property is located on the Physical Data Sources tab in the Setup view. It specifies the number of database connections that each ETL will open to the data source.
Generic Task Concurrency Limit per EP
This system property is located on the DAC System Properties tab in the Setup view. It specifies for each execution plan the number of tasks with an Execution Type other than Informatica that can run concurrently.
Note:
When execution plans run concurrently, they do not share resources. Keep in mind that the values you set for these properties apply to each ETL process.
For example, if you configure DAC to run two execution plans concurrently, your environment must have the resources for twice the values you set for the Num Parallel Workflows per EP, Num Connections per EP, and Generic Task Concurrency Limit per EP properties.
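The sizing arithmetic is simple multiplication, sketched below with hypothetical values (these are not product defaults): each concurrently running execution plan consumes its own copy of each per-EP resource.

```python
# Back-of-the-envelope sizing for concurrent execution plans. The values
# below are illustrative, not DAC defaults.
num_parallel_workflows_per_ep = 10  # Informatica Servers tab
num_connections_per_ep = 8          # Physical Data Sources tab
concurrency_level = 3               # DAC System Properties tab

# Each running ETL process gets its own allocation, so peak demand is the
# per-EP value multiplied by the number of concurrent execution plans.
peak_workflows = num_parallel_workflows_per_ep * concurrency_level
peak_connections = num_connections_per_ep * concurrency_level
print(peak_workflows, peak_connections)  # 30 24
```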
Make sure you have reviewed the system requirements and certification documentation for information about hardware and software requirements, platforms, and databases available on Oracle Technology Network as well as the Oracle BI Applications configuration information in Oracle Fusion Middleware Configuration Guide for Oracle Business Intelligence Applications.
In considering whether to run execution plans concurrently, you should have an understanding of the concurrent dependencies (common target tables) between the execution plans.
To view execution plan concurrent dependencies:
In the Execute view, select the Execution Plans tab.
Right-click one of the execution plans you want to compare, and then select Concurrent Dependency.
From the drop-down list in the Concurrent Dependency dialog, select the execution plan you want to compare with the execution plan you selected in step 2.
The common target tables between the two execution plans are displayed as well as the tables that one execution plan writes to and the tables that the other execution plan reads from.
To configure DAC to run execution plans concurrently, you need to do the following:
Set the Concurrency Level system property, which controls the number of independent execution plans that DAC will allow to run in parallel. See the instructions below.
When the value of the Concurrency Level system property is more than one, each execution plan starts in a separate Java process. An identifier for each process is stored in the OS Process Name column on the Current Runs tab in the Execute view.
Add the directory path to the JAVA_HOME directory to the PATH environment variable.
To set the Concurrency Level system property:
In the DAC Setup view, select the DAC System Properties tab.
In the list of system properties, find Concurrency Level.
Click in the Value column, and enter a value that specifies the number of execution plans you want to run in parallel.
Click Save.
Restart the DAC Server by selecting Tools, DAC Server Management, Restart DAC Server.
For information about the requirements for running execution plans concurrently, see "Execution Plans That Are Eligible to Run Concurrently".
Follow the procedure below to define an execution plan prefix, which enables multiple execution plans to use the same Informatica workflows.
To define an execution plan prefix:
For each workflow associated with an execution plan that will run concurrently, enable the Informatica workflow property Configure Concurrent Execution:
In Informatica Workflow Manager, open the appropriate workflow in the Workflow Designer.
On the menu bar, select Workflows, and then select Edit.
In the Edit Workflow dialog, select the Enabled check box for the Configure Concurrent Execution property.
Click the Configure Concurrent Execution button.
In the Configure Concurrent Execution dialog, select Allow concurrent run with same instance name.
Click OK.
In the DAC Execute view, go to the Execution Plans tab.
Query for and select the execution plan for which you want to define a prefix.
Click the Execution Parameters subtab.
Click New in the subtab toolbar.
In the Name field, enter DAC_EP_PREFIX.
Note: This value is case sensitive. You must enter the value exactly as it appears, using upper-case characters.
Click Save in the subtab toolbar.
Leave the default values as Text for the Data Type and Both as the Load Type.
Click in the Value field to open the Enter Parameter Value dialog.
Make sure Static is selected.
In the text box, enter a text string of one to four characters that defines the value of the DAC_EP_PREFIX parameter.
This value will be appended as a prefix to the execution plan name. For example, if you define the parameter value as AMER and the execution plan name is Extract_Forecast, then when the execution plan runs, its name will appear in the Current Runs tab and the log files as AMER_Extract_Forecast.
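The naming convention is a straightforward concatenation, sketched below (illustrative only; the prefix and name are the example values from the text):

```python
# DAC invokes workflows with an instance name of the form
# <prefix>_<name>, where <prefix> is the DAC_EP_PREFIX parameter value.
def instance_name(prefix, name):
    return f"{prefix}_{name}"

print(instance_name("AMER", "Extract_Forecast"))  # AMER_Extract_Forecast
```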
Follow this procedure to copy Informatica workflow folders and create new folder mappings in order to configure Informatica workflows so that they can be shared by multiple execution plans.
To make copies of the Informatica workflow folders and create new folder mappings:
In Informatica Workflow Manager, make copies of the SDE and SIL folders. You will need one copy of each folder for each execution plan.
Define the new physical folders in DAC:
In the DAC menu bar, select Tools, Seed Data, Task Physical Folders.
In the Task Physical Folders dialog, click New.
Enter a name and instance priority for the physical folder.
For information about the instance priority property, see "Task Physical Folder Instance Priority".
Repeat this step for each copy of the physical folder you made in Informatica Workflow Manager.
Click OK to close the Task Physical Folders dialog.
Define the new logical folders in DAC:
In the DAC menu bar, select Tools, Seed Data, Task Logical Folders.
In the Task Logical Folders dialog, click New.
Enter a name for the logical folder, and click OK.
Repeat this step for each copy of the logical folder you made in Informatica Workflow Manager.
Map the logical and physical folders:
In the DAC Design view, go to the Source System Folders tab.
In the toolbar, click New.
Click in the Logical Folder column and select the appropriate folder from the drop-down list.
Click in the Physical Folder column and select the appropriate folder from the drop-down list.
Click Save.
Repeat this step for each new combination of physical and logical folders.
You can override the DAC execution engine to define dependent execution plans as independent and independent execution plans as dependent.
Defining Execution Plans as Independent
You should exercise caution when overriding the DAC execution engine to define two dependent execution plans as independent of each other. Perform this procedure only if you have an advanced understanding of the DAC ETL process, and either 1) you know the load mappings handle mutually exclusive sets of data across execution plans and the execution plans do not truncate tables, or 2) the execution plans use time-based dimensions that are loaded only during full loads and not during incremental loads.
Caution:
If your incremental ETL process involves dropping and recreating indexes, you should not declare dependent execution plans as independent.
Defining Execution Plans as Dependent
You may want to define independent execution plans as dependent if you have resource issues. For information about potential resource issues when running execution plans concurrently, see "About Resource Usage".
To define execution plans as independent or dependent:
In the Execute view, select the Execution Plans tab.
Select the execution plan for which you want to define the dependency attribute of another execution plan.
For example, if you want to define the execution plan Oracle 11.5.10 Enterprise Sales as independent from the execution plan Procurement and Spend: Oracle 11.5.10, select Procurement and Spend: Oracle 11.5.10 in this step.
Select the Concurrent Dependency subtab.
Locate the execution plan whose dependency attribute you want to set:
In the bottom pane toolbar, click Add/Remove.
Select the execution plan, and click Add.
Click OK to return to the Concurrent Dependency subtab.
In the Concurrent Dependency list, select the appropriate execution plan.
Click in the Type column, and select one of the following:
Independent
Defines the execution plan selected in the Concurrent Dependency subtab as independent of the execution plan selected in the top window.
Dependent
Defines the execution plan selected in the Concurrent Dependency subtab as dependent on the execution plan selected in the top window.
Click Save in the bottom pane toolbar.
You can create multiple instances (or copies) of an execution plan and then run the parent execution plan and the additional child instances concurrently.
The following examples illustrate when you might find it useful to run multiple instances of an execution plan:
You have an execution plan that is designed to extract data from a particular source and load a corresponding data warehouse. You also have three additional countries which require similar execution plans. You can create three child instances of the execution plan and reconfigure the connectivity details. The only difference between the parent execution plan and the child instances will be the connectivity details.
You have development and QA environments that each have a different OLTP and data warehouse but share the same DAC and Informatica repositories, and, therefore, use the same Informatica workflows. You can create child instances of an execution plan to run concurrently.
Note: You must enable the Informatica workflow property Configure Concurrent Execution in Informatica Workflow Manager for each workflow. For instructions, see step one in the procedure "Defining an Execution Plan Prefix".
When you create a new instance of an execution plan, you have the option to override the following properties:
Physical data source.
Physical folder.
Context connectivity parameter for external executors. For more information about external executors, see Chapter 14, "Integrating DAC With Other ETL Tools."
Note:
The behavior of execution plans and child instances is the same, including behavior related to running execution plans concurrently. The information about running execution plans concurrently in the section "Running Execution Plans Concurrently" also applies to child instances.
Important Points About Execution Plan Child Instances
After you have created one or more child instances of a parent execution plan, if you make changes to the parent execution plan, these changes will be propagated automatically to all child instances when you rebuild the execution plan.
All of the properties of a parent execution plan, except for execution plan level parameters, are propagated to the child instances, including connectivity parameters, dependencies, and so on. Therefore, you do not have to rebuild the child instances.
Even though execution plan level parameters are not copied from the parent execution plan to the child instance, they are used by the instance during runtime. However, you can override one or more of the execution plan level parameters defined in the parent execution plan by creating a new execution plan level parameter in the instance that uses the same name as the parent's parameter but has a different value.
For example, suppose you have an execution plan that has two execution plan level parameters defined: one for currency, with the value USD; and one for region, with the value California. Then, you create a child instance of the execution plan, in which you want the currency parameter to remain as USD, but you want the value of region to become Nevada. The value of the currency parameter will be propagated to the child instance, so you do not need to modify this parameter. For the region parameter, you would need to define an execution plan level parameter for the child instance. To do so, you would use the same name for the parameter as that of the parent execution plan (in this case, "region"), and you would give the region parameter the value Nevada.
You can also create execution plan level parameters specifically for a child instance.
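The parameter resolution described above can be sketched as a simple merge, where same-name instance parameters take precedence over inherited parent parameters. This is an illustrative model using the currency/region example from the text, not DAC internals:

```python
# Illustrative sketch of how execution plan level parameters resolve for a
# child instance: the parent's parameters apply at runtime unless the
# instance defines a parameter with the same name.

def effective_parameters(parent_params, instance_params):
    resolved = dict(parent_params)    # inherited from the parent
    resolved.update(instance_params)  # same-name instance parameters win
    return resolved

parent = {"currency": "USD", "region": "California"}
child_overrides = {"region": "Nevada"}  # same name as the parent's parameter
print(effective_parameters(parent, child_overrides))
# {'currency': 'USD', 'region': 'Nevada'}
```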
To create, configure and run multiple instances of an execution plan:
In the Execute view, go to the Execution Plans tab, right-click the execution plan you want to copy, and select Create Execution Instance.
The Execution Instance Configuration dialog opens.
Enter a unique name for the instance.
Enter a unique instance code as an identifier. The value must be alphanumeric and from one to five characters.
This code is used as the prefix for the Informatica run instance name and enables DAC and Informatica to distinguish between workflow run instances that are running concurrently.
Click Next.
The physical data source for the parent execution plan is displayed. You can optionally override this property with a different physical data source by doing the following:
Click the drop-down list in the Override With field.
Select the appropriate physical data source for this execution plan instance.
Click Next to proceed.
The physical folder for the parent execution plan is displayed. You can optionally override this property with a different physical folder by following the substeps in step 5.
If a context connectivity parameter for external executors is defined for the parent execution plan, it is displayed. You can optionally override this property with a different context by following the substeps in step 5.
Click Finish.
A message dialog indicates whether the process was successful.
(Optional) Go to the Execution Instances tab, select the appropriate instance, and configure additional properties:
Configure the properties that appear in the top pane window or in the Edit subtab. For a description of these properties, see "Building and Running Execution Plans".
In the Connectivity Parameters subtab, modify the parameters as needed.
In the Execution Parameters subtab, do the following as needed:
- Override one or more of the execution plan level parameters defined in the parent execution plan by creating new execution plan level parameters in the instance and using the same name for the parameter as that of the parent and giving it a different value.
- Create new execution plan level parameters.
(Optional) To run the execution plan instance, do one of the following:
Click Run Now in the top pane toolbar.
Schedule the execution plan instance to run by following the instructions in the procedure "Scheduling an Execution Plan".
You can monitor execution plan instances the same way you monitor execution plans, using the Current Run and Run History tabs of the Execute view. See "Monitoring Execution Plan Processes" for more information.
To reconfigure the override settings of an instance, go to the Execution Instances tab, and click Configure.
This section contains the following topics. These topics apply to multi-source execution plans.
Use the Extract Delay feature to stagger the timing of data extraction from different source systems.
Use the Event Delay feature to configure the extracts for different source systems to occur independently.
Setting Up Data Source Usage Notifications
Use the Data Source Notification feature to initiate email notifications about data source usage and to define custom SQL to be executed based on the usage.
You can delay the extract of a source system in a multi-source environment by setting the Delay property in the Connectivity Parameters subtab of the Execution Plans tab. See "Building and Running Execution Plans" for instructions on setting the Delay property.
The Event Delay feature applies to multi-source environments. It enables you to configure the extracts for the different data sources to occur independently.
The following are examples of how you might use the Event Delay feature:
In a multi-source environment, when one data source is not available at the beginning of the ETL process.
For example, suppose an ETL process extracts CRM data from one source system and financial data from another source system. Nightly maintenance of the CRM system is scheduled for 11:00 PM, and the system is expected to be back up at 11:30 PM. The ETL process is scheduled to begin at 12:00 AM. If the CRM system is still down at the ETL start time, the ETL process will fail immediately because the connection pool to the CRM source will not be created.
In such situations, you can define an event delay that would allow the tasks that read and write to the financial data source to be executed while the tasks for the CRM source wait until a connection pool can be created. You define the event delay using the Lazy Initialization property, which prevents a connection pool from being created until the connection itself is available. The properties you need to define along with the Lazy Initialization property are described in Table 6-1.
In this example, the Lazy Initialization property is enabled on the CRM source. The Polling Frequency property is set at one minute, and the Timeout period is set to 120 minutes. Therefore, DAC will poll the CRM source every minute for 120 minutes.
If the connection becomes available within 120 minutes, DAC will create the connection pool and the ETL will proceed. If the Timeout period elapses, and Fail is selected for the Upon Timeout property, the tasks that read and write to the unavailable source will fail. If the Timeout period elapses, and Skip is selected, the tasks will be skipped, and the extract from the financial source will proceed.
Specifying a condition (or event) using custom SQL.
You can write custom SQL to define a condition that must occur before the ETL process starts. This condition would indicate whether the data source is ready to participate in the ETL process.
For example, suppose an automated process inserts and updates contact information into a CRM source. If the ETL process starts before the insert and update process has completed, only some of the data will be available for extraction, which will result in incomplete data loading into the data warehouse, and, consequently, inaccurate reports. In such cases, you could write custom SQL to define the condition that the insert and update process must complete successfully before the ETL process starts.
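The polling behavior in both examples can be sketched as a loop. This is an illustrative model, not DAC source code: `probe` is a hypothetical stand-in for either the connection attempt (Lazy Initialization) or the Condition SQL check, where a result of 0 means the data source is not ready.

```python
# Illustrative sketch of the Lazy Initialization / Condition SQL polling
# behavior. DAC polls at the configured frequency until the timeout period
# elapses, then applies the Upon Timeout setting.

def wait_for_source(probe, timeout_min=120, polling_frequency_min=1,
                    upon_timeout="Fail"):
    elapsed = 0
    while elapsed < timeout_min:
        if probe():  # connection available, or Condition SQL returned non-zero
            return "Proceed"
        elapsed += polling_frequency_min  # wait one polling interval and retry
    # Timeout elapsed: either fail the tasks that use this data source, or
    # skip them and let the rest of the ETL proceed.
    return "Fail tasks" if upon_timeout == "Fail" else "Skip tasks"

# A source that becomes available on the third poll:
attempts = iter([False, False, True])
print(wait_for_source(lambda: next(attempts)))              # Proceed
print(wait_for_source(lambda: False, upon_timeout="Skip"))  # Skip tasks
```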
The properties for defining an event delay are described in Table 6-1.
Table 6-1 Event Delay Properties
Property | Description |
---|---|
Lazy Initialization |
When this check box is selected, DAC prevents a connection pool from being created until the connection itself is available. When you select Lazy Initialization, you then need to set the Timeout, Upon Timeout, and Polling Frequency properties. |
Condition SQL |
Click in the field to open a text box where you can enter a SQL statement that defines a condition that DAC will execute against the data source to verify if the data source is ready. This SQL should return a value in the form of a number. A result of 0 indicates the data source is not ready. A non-zero result means the event has happened. |
Polling Frequency (min) |
Specifies in minutes how often DAC will try to create the connection or poll for the readiness of the data source. |
Maximum Polling (min) |
The period of time (specified in minutes) indicating how long DAC will continue to poll to see if the connection is available or the data source is ready. |
Upon Timeout |
The action DAC will take if the Maximum Polling period elapses without a connection becoming available or the data source being ready. The options are: Fail, which causes the tasks that read and write to the unavailable data source to fail; and Skip, which causes those tasks to be skipped while the rest of the ETL process proceeds. |
Cleanup SQL |
A SQL statement that DAC executes upon the successful occurrence of the event defined in the Condition SQL. This property applies only if a Condition SQL is defined. |
To set up an event delay:
Go to the Physical Data Sources tab in the Setup view.
In the top pane, select the physical data source for which you want to define an event delay.
Click the Extended Properties subtab.
Click New in the bottom pane toolbar.
Click in the Name field to expose the drop-down list, and select EventBasedDelay.
Click Save in the toolbar.
Double-click in the Value field to display the Property Value dialog.
Complete the fields in the Property Value dialog. See Table 6-1 for a description of the properties.
Click OK to close the Property Value dialog.
Click Save in the toolbar.
The Data Source Usage Notification feature enables you to initiate email notifications about data source usage and to define custom SQL to be executed based on the usage.
The data source usage categories about which DAC will send notifications are the following:
Notification Before First Read
Notification Before First Write
Notification Before First Read or Write
Notification After Last Read
Notification After Last Write
Notification After Last Read and Write
The email notification contains the execution plan name, the ETL run name and timestamp, the data source name, and the data source usage category. The recipients of the email are the email accounts set up in the procedure "Setting Up Email Notifications in the DAC Client and Server".
When you define the data source usage notification properties, you can also define custom SQL that will be executed upon the occurrence of the specified data source usage category. In the custom SQL, you can use the DAC variable @DAC_NOTIFICATION_RECORD in a parameter.
To set up a notification:
Go to the Physical Data Sources tab in the Setup view.
In the top pane, select the physical data source for which you want to define a data source usage notification.
Click the Extended Properties subtab.
Click New in the bottom pane toolbar.
Click in the Name field to expose the drop-down list, and select the appropriate data source usage category.
Click Save in the toolbar.
Double-click in the Value field to display the Property Value dialog.
Complete the fields in the Property Value dialog using the following information:
Field | Description |
---|---|
Send Email |
Select this check box to indicate an email should be sent when the selected operation occurs. Email recipients with a Notification Level of 5 or higher will receive the email. The DAC email administrator account and email recipients must be set up before DAC can send data source usage notifications. For instructions, see "Setting Up Email Notifications in the DAC Client and Server". |
Execute SQL |
Click in the field to open a text box where you can enter a SQL statement that DAC will execute against the data source when the selected operation occurs. |
Is Stored Procedure |
Select this check box if the custom SQL is a stored procedure. |
Click OK to close the Property Value dialog.
Click Save in the toolbar.
Starting an Execution Plan
When the DAC Server receives a request to start an execution plan, it performs a series of checks to verify that the execution plan can be started. It first checks that an execution plan with the requested name exists and that the execution plan is active.
Next, it checks the status of the execution plan that last ran. If an execution plan is still running and the DAC Server receives a request to start the same execution plan, the request is rejected. If an execution plan failed, a request to run the same execution plan again restarts the failed run.
However, a request to run a different execution plan will be put on hold if the execution plan has dependencies on the failed execution plan (dependencies such as having the same source and target tables). If the execution plans are mutually exclusive, the new request will result in a new execution plan run.
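The decision logic described above can be sketched as follows. This is a simplified, illustrative model of the DAC Server's request handling, not its actual implementation:

```python
# Illustrative sketch of how the DAC Server handles a start request given
# the status of the execution plan that last ran.

def handle_start_request(requested, last_run_name, last_run_status, dependent):
    """last_run_status: 'Running', 'Failed', or None.
    dependent: True if the plans share source/target tables."""
    if last_run_status == "Running" and requested == last_run_name:
        return "Reject"       # same execution plan is already running
    if last_run_status == "Failed":
        if requested == last_run_name:
            return "Restart"  # restart the failed execution plan
        if dependent:
            return "Hold"     # wait because of dependencies on the failed run
    return "Start new run"    # mutually exclusive plans get a new run

print(handle_start_request("EP_A", "EP_A", "Running", dependent=False))  # Reject
print(handle_start_request("EP_B", "EP_A", "Failed", dependent=True))    # Hold
```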
When the DAC Server receives a request to start an execution plan, it will issue a warning if any of the following conditions are true. (A warning is for informational purposes and does not mean the execution plan will not start.)
The Generic Task Concurrency Limit value in the DAC System Properties tab is not a positive number.
There is no active Informatica Integration Service or Repository Service registered in the Informatica Servers tab.
One or more Informatica Integration Services do not have the passwords defined in the Informatica Servers tab.
One or more Informatica Integration Services do not have a Num Parallel Workflows per EP value properly defined in the Informatica Servers tab.
One or more data sources do not have the Table Owner or Table Owner Password values properly defined in the Physical Data Sources tab.
One or more data sources do not have a maximum number of connections (Num Connections per EP) value properly defined in the Physical Data Sources tab.
One or more data sources do not have a Data Source Number defined in the Physical Data Sources tab.
Stopping an Execution Plan
When the DAC Server receives a request to stop the operation of a running execution plan, the request will fail in the following cases:
The name of the execution plan that is running is different from the name in the request.
There is no execution plan currently running.
The volume of data in the source system that loads certain fact table extracts, such as payroll or GL, can be huge. Loading all of the source system data into the data warehouse can be very time consuming and can delay the move to production. Using a deferred ETL approach can help to alleviate the problems associated with loading large fact tables at one time. Deferred ETL processes enable you to load a subset of data initially and then incrementally add historical data. A deferred ETL uses the Initial Extract Date parameter (present in all preconfigured mappings) to limit the data that comes into the data warehouse by setting the parameter for a short period of time. For example, you could use the Initial Extract Date parameter to limit the initial extract to the most recent five years' worth of data, and then continue to extract several months' worth of data in increments.
Because the deferred ETL time window is based on a parameter, deferred ETLs are easy to maintain. And, you can stop the deferred extraction after you have reached the end of the extraction period, when all of the required data has been loaded into the data warehouse.
This section contains the following topics:
Follow this procedure to set up a deferred ETL process.
Note: The SQL used in this procedure appears in the Oracle format and needs to be modified for other platforms.
To set up a deferred ETL process:
Identify the fact tables that have huge amounts of data for which you want to do a deferred load.
In Informatica PowerCenter Workflow Manager, make a copy of the full extract mapping and workflow, and give it a name to identify it as a deferred extract. For example, if the original mapping is named SDE_ORA_GLBalanceFact, you could rename it to SDDE_ORA_GLBalanceFact, where SDDE stands for source dependent deferred extract.
In DAC, create runtime parameters for the start time and the end time of the extract. This procedure uses DEFERRED_EXTRACT_START and DEFERRED_EXTRACT_END as parameter names.
See "Defining Parameters" for instructions on creating a runtime parameter.
In Informatica PowerCenter Designer, edit the SQL override to query the main table for records that were modified between the start and end times of the extract.
Create a table in the data warehouse schema to hold the parameters for the start time and the end time of the extract:
Go to the Tables tab in the Design view.
Create a new table. This procedure uses DEFERRED_ETL_REF_DT as the table name.
Go to the Columns subtab, and create the following two columns:
NAME, with data type as varchar and length 50
VALUE, with data type as timestamp
Create this table in the data warehouse by going to the Tables tab, right-clicking the table and selecting Generate DW Table Scripts.
In DAC, go to the Source System Parameters tab in the Design view, and create the following two source system parameters:
Property | Value |
---|---|
Name |
DEFERRED_EXTRACT_START |
Type |
Timestamp |
SQL Value |
|
Function |
SQL Syntax Note: This value will be used in the select clause of the SQL override in Informatica. |
Connection Type |
@DAC_SOURCEDB_TYPE |
Property | Value |
---|---|
Name |
DEFERRED_EXTRACT_END |
Type |
Timestamp |
SQL Value |
|
Function |
SQL Syntax Note: This value will be used in the select clause of the SQL override in Informatica. |
Connection Type |
@DAC_SOURCEDB_TYPE |
In DAC, go to the Tasks tab in the Design view, and locate the extract tasks.
For each extract task, make a copy by right-clicking and selecting Copy Record.
For each copied record, in the Edit subtab, modify the following properties to create deferred extract tasks:
Property | Value |
---|---|
Name |
Enter a name so as to identify it as a deferred task. |
Command for Incremental Load |
Enter a name so as to identify it as a deferred workflow. |
Command for Full Load |
Leave this field blank. |
For each deferred extract task, override the $$INITIAL_EXTRACT_DATE parameter value to specify the initial extract date for the extract tasks associated with the deferred load.
Note: In this step you are setting the $$INITIAL_EXTRACT_DATE parameter at the task level for extract tasks associated with the deferred load. It is important to set this parameter at the task level so that it affects only the extract tasks associated with the deferred load.
Create a source system level parameter called $$DEFERRED_ETL_MONTHS and set the value to the number of months' worth of data you want to extract.
For example, if you set the value to 3, each run of the deferred ETL will extract an additional three months' worth of data.
Note: Because the number of months of data to be extracted is a parameter, you can adjust it based on the performance of previous runs. For example, if you set the value to three months and the data volume is too large, you can reduce it. Conversely, if three months' worth of data does not impact performance, you can increase the number of months.
In the Container Specific SQLs tab in the Design view, create a task action to create the first window for the deferred ETL. This action will run for the full load. This procedure uses the name "Create First Window for Deferred ETL" for the task action name.
For detailed instructions on creating task actions, see "Using Actions to Manage Indexes, Tables and Tasks".
When defining the task action, enter the following two SQL statements.
To set the deferred ETL start date:
INSERT INTO DEFERRED_ETL_REF_DT (NAME, VALUE) VALUES ('DEFERRED_EXTRACT_START', TO_DATE('@DAC_$$INITIAL_EXTRACT_DATE', 'MM/DD/YYYY'))
To set the deferred ETL end date:
INSERT INTO DEFERRED_ETL_REF_DT (NAME, VALUE) VALUES ('DEFERRED_EXTRACT_END', ADD_MONTHS(TO_DATE('@DAC_$$INITIAL_EXTRACT_DATE', 'MM/DD/YYYY'), -@DAC_$$DEFERRED_ETL_MONTHS))
In the Container Specific SQLs tab in the Design view, create a task action to move the deferred ETL window backwards during subsequent incremental runs. This action will run for the incremental load. This procedure uses the name "Update Window for Deferred ETL" for the task action name.
When defining the task action, enter the following SQL statement:
UPDATE DEFERRED_ETL_REF_DT SET VALUE = ADD_MONTHS(VALUE, -@DAC_$$DEFERRED_ETL_MONTHS)
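Taken together, the two task actions maintain a sliding extraction window. For example, assuming $$INITIAL_EXTRACT_DATE is 01/01/2004 and $$DEFERRED_ETL_MONTHS is 3 (values are illustrative only), the rows in DEFERRED_ETL_REF_DT evolve as follows:

```sql
-- After the full load ("Create First Window for Deferred ETL"):
--   DEFERRED_EXTRACT_START = 01/01/2004
--   DEFERRED_EXTRACT_END   = 10/01/2003
--
-- After the first incremental run ("Update Window for Deferred ETL"):
--   DEFERRED_EXTRACT_START = 10/01/2003
--   DEFERRED_EXTRACT_END   = 07/01/2003
--
-- Each subsequent incremental run slides both boundaries back another
-- three months. To inspect the current window:
SELECT NAME, VALUE FROM DEFERRED_ETL_REF_DT ORDER BY NAME;
```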
In the Tasks tab of the Design view, create a new task to handle the deferred ETL window, and define the following properties:
| Property | Value |
|---|---|
| Task Phase | Post ETL Process |
| Execution Type | Container Specific Task Action |
| Command for Full Load | Assign the task action "Create First Window for Deferred ETL." |
| Command for Incremental Load | Assign the task action "Update Window for Deferred ETL." |
| Primary Logical Source | DBConnection_OLAP |
| Primary Logical Target | DBConnection_OLAP |
Go to the Source Tables subtab, and add DEFERRED_ETL_REF_DT as a primary source table.
Go to the Target Tables subtab, add DEFERRED_ETL_REF_DT as the target table, and select Truncate for Full Load.
By selecting Truncate for Full Load, you ensure that the deferred ETL process will also execute during the next full load.
Next, proceed to the section "Running Deferred ETL Processes".
There are two options for running a deferred ETL process:
Run the deferred extract tasks along with the nightly execution plan. You can use this option if only a small amount of data will be extracted, such as one month's worth.
See "Running Deferred Extract Tasks With a Nightly Execution Plan" for instructions.
Create an execution plan specifically for the deferred extract tasks and run this execution plan separately.
See "Creating and Running a Separate Deferred Execution Plan" for instructions.
Note: The SQL used in this procedure appears in the Oracle format and needs to be modified for other platforms.
Follow this procedure to run deferred extract tasks along with a nightly execution plan.
Reassemble the subject area that contains the fact tables for which you added the deferred extract tasks (in the procedure "Setting Up Deferred ETL Processes"):
In the Design view, select the appropriate source system container from the drop-down list in the toolbar.
Click the Subject Areas tab, and then select the appropriate subject area.
In the top pane toolbar, click Assemble.
This step adds the SDDE tasks to the collection of tasks required to populate the schema.
Add the new task you created to handle the deferred ETL window (in the procedure "Setting Up Deferred ETL Processes") as a following task in the appropriate execution plan:
Go to the Execution Plans tab in the Execute view, and query for the appropriate execution plan.
Click the Following Tasks subtab, and then click Add/Remove.
Select the appropriate container from the drop-down list.
Query for and select the task you want to add, and then click Add.
Note: Only tasks that have the task phase Post ETL Process will appear in this list.
In the top pane toolbar, click Build to rebuild the execution plan.
When the execution plan is rebuilt, the deferred extract tasks will be ordered among the other tasks alongside their regular counterparts. For every regular-deferred task pair, there is no guarantee which one will be executed first. However, DAC ensures that the task that runs first truncates the table and drops indexes (if any are present) and the last task creates indexes and analyzes the table. If a second extract task fails and is restarted, the truncate command will not be issued. You should design the extract processes to load data without update strategies, but, preferably, with the bulk loader enabled.
You can also create a "Preceding action" and "Upon Failure Restart action" to perform clean up before the tasks run. For more information, see "Using a Task Action to Enable Failure Restarts When Extracting From Multiple Sources".
If you choose to run the deferred extract tasks in parallel, you can optionally create a task group for the tasks and enable the group level Truncate Always property. This will truncate the staging table before either of the deferred extract tasks starts.
Use a heuristics rule to stop DAC from running the deferred extract tasks after a specified period of time has elapsed. For detailed information about DAC heuristics, see "Using Heuristics to Manage Tasks, Tables and Indexes".
Note: The SQL used in this procedure appears in the Oracle format and needs to be modified for other platforms.
From the Tools menu, select Seed Data, and then select Heuristics.
Create a heuristics rule with the following values:
Name - Deferred ETL Heuristic
Heuristics - Count
Metric - Sum
Source Tables - All Types - With Overrides
This heuristics rule tells DAC to run the task only if the result of a heuristic SQL associated with the source tables yields a result of greater than zero.
Create a task level parameter called $$DEFERRED_ETL_END_TIME, and select the Timestamp data type.
In the Enter Parameter Value dialog, do the following:
- Select Static.
- Select SQL Syntax as the Function.
- Select @DAC_SOURCE_DBTYPE as the Connection Type.
- Enter the appropriate date and time to specify the period after which DAC should stop running the deferred extract tasks.
For example, if you enter Jan 1, 1960 12:00:00 AM, DAC will stop running the deferred extract task after historical data with the timestamp Jan 1, 1960 12:00:00 AM has been loaded.
Write a heuristic SQL for the deferred refresh date validation, and name it "Deferred ETL Validation Heuristic." For instructions on writing a heuristic SQL, see "Writing Custom SQL for a Heuristics Rule".
Enter the following SQL statement:
SELECT CASE WHEN VALUE < @DAC_$$DEFERRED_ETL_END_TIME THEN 0 ELSE 1 END FROM DEFERRED_ETL_REF_DT WHERE NAME = 'DEFERRED_EXTRACT_START'
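To illustrate how this heuristic SQL behaves (values are examples only):

```sql
-- Example (illustrative values):
--   DEFERRED_EXTRACT_START = 01/01/1965, $$DEFERRED_ETL_END_TIME = 01/01/1960
--   => VALUE < end time is false, the CASE returns 1, and DAC runs the task.
--
--   DEFERRED_EXTRACT_START = 10/01/1959, $$DEFERRED_ETL_END_TIME = 01/01/1960
--   => VALUE < end time is true, the CASE returns 0, and DAC skips the task.
```

Once the sliding window moves the start date earlier than the configured end time, the heuristic returns zero and DAC stops running the deferred extract tasks.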
Assign the "Deferred ETL Heuristic" heuristic rule to the deferred extract tasks in the Extended Properties subtab of the Task tab. For instructions, see "Associating a Heuristics Rule With a Task".
Assign the "Deferred ETL Validation Heuristic" heuristic SQL to the primary source table:
In the Design view, click the Tasks tab, and select the deferred extract task.
Click the Source Tables subtab.
Select or query for the appropriate source table.
Click in the Task Count Override field.
The Choose SQL Override dialog is displayed.
Select or query for the "Deferred ETL Validation Heuristic" heuristic SQL, and then click OK.
Follow this procedure to create an execution plan specifically for the deferred extract tasks and run this execution plan separately.
Create a subject area, and manually add the tasks required to run the deferred load. The required deferred extract tasks are the SDDE task and the task with the Post ETL Process task phase, which you created in the procedure "Setting Up Deferred ETL Processes".
You can also include the following:
Load task (SIL)
Any PLP tasks relevant to the fact table
Any aggregate tables that depend on the fact table
For instructions on creating a subject area, see "Creating a Subject Area".
Create and build an execution plan for the deferred ETL, and select the Micro ETL property.
You must enable the Micro ETL property so that refresh dates are tracked separately for the deferred ETL execution plan. Because the deferred ETL execution plan shares the same source and target tables as the regular execution plan, the normal refresh dates would be affected if the Micro ETL property were not enabled.
For instructions on creating and building a micro ETL execution plan, see "Building and Running Execution Plans".
Schedule the deferred ETL execution plan. For instructions, see "Scheduling an Execution Plan".