Oracle® Business Intelligence Data Warehouse Administration Console User's Guide Version 10.1.3.4 Part Number E12652-02
This chapter provides information about building, running, and monitoring execution plans.
This chapter contains the following topics:
How DAC Determines the Order of Task Execution within an Execution Plan
Building and Running Single-Source and Multi-Source Execution Plans
An execution plan is a unit of work that enables you to organize, schedule, and execute ETL processes. An execution plan comprises the following objects: subject areas, ordered tasks, indexes, tags, parameters, source system folders, and phases.
DAC supports single-source and multi-source execution plans, which are described in the following sections:
A single-source execution plan extracts data from a single instance of a single source system container, such as Siebel 7.8 or Oracle EBS 11. For information about data extraction and loads for single-source execution plans, see "Common Extract and Load Scenarios".
There are two types of multi-source execution plans:
Homogeneous
This type of execution plan extracts data from multiple instances of the same source system. For example, a business might have an instance of Oracle EBS 11i in one location and time zone and another instance of Oracle EBS 11i in another location and time zone. In such cases, the timing of data extraction from the different instances can be staggered to meet your business requirements.
Heterogeneous
This type of execution plan extracts data from one or more instances of dissimilar source systems. For example, a business might have an instance of Siebel 7.8 in one location, an instance of Oracle EBS 11i in another location, and a second instance of Oracle EBS 11i in yet a third location. You can also stagger the timing of data extraction when you use this type of execution plan.
For information about data extraction and loads for multi-source execution plans, see "Common Extract and Load Scenarios".
The order in which DAC loads data from the different sources that are participating in the ETL process is determined by the priority of the physical data source connection (set in the Physical Data Sources tab of the Setup view). The Priority property ensures that tasks attempting to write to the same target table will not be in conflict.
As shown in Figure 9-1, the Delay property is located in the Parameters subtab in the Execution Plans tab of the Execute view. You set this property to specify how many minutes the extract of a data source is delayed after the first extract of a multi-source extract process has started.
As shown in Figure 9-2, you set the folder-level priority in the Task Physical Folders dialog box, which you access by selecting Tools, then Seed Data, and then Task Physical Folders.
The most common extract and load scenarios are as follows.
The multiple extract scenarios apply to both homogeneous and heterogeneous multi-source execution plan types.
In the single extract and single load scenario, as shown in Figure 9-3, data is extracted from a single source, loaded into staging tables, and then loaded into the data warehouse.
When DAC truncates a table, it also drops and recreates the table's indexes (if it has any) and analyzes the table after the truncation.
In a single extract scenario, the truncation process for a table without indexes is as follows:
Truncate table.
Run Informatica mapping.
Analyze table.
For a table with indexes, the truncation process is as follows:
Truncate table.
Drop indexes.
Run Informatica mapping.
Recreate indexes.
Analyze table.
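The two sequences above can be summarized in a short sketch that branches on whether the target table has indexes. The function and its boolean argument are purely illustrative; DAC does not expose such an API.

```python
def truncation_steps(has_indexes: bool) -> list[str]:
    """Return the ordered steps DAC performs when truncating and loading a
    target table in a single-extract scenario (illustrative sketch only)."""
    steps = ["Truncate table"]
    if has_indexes:
        steps.append("Drop indexes")
    steps.append("Run Informatica mapping")
    if has_indexes:
        steps.append("Recreate indexes")
    steps.append("Analyze table")
    return steps
```

For a table without indexes, the index steps simply drop out, leaving the three-step sequence shown above.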
In the multiple extract and single load scenario, as shown in Figure 9-4, data is extracted from multiple sources and loaded into central staging tables. You can stagger the extracts to accommodate different time zones by setting the Delay property found in the Parameters subtab of the Execution Plans tab in the Execute view.
After all of the extract tasks have completed, the data is loaded into the data warehouse in a single load process.
When DAC truncates a table, it also drops and recreates the table's indexes (if it has any) and analyzes the table after the truncation.
If a target table is shared across different sources, it will be truncated only once. The priority of the data source determines which of the extracts truncates the tables. The task reading from the data source with the highest priority truncates the tables and drops the indexes. The last task writing to the table from the data source with the highest priority creates the indexes.
In a multiple extract scenario, the truncation process for a table without indexes is as follows:
Truncate table (first extract task).
Run Informatica mapping (first extract task).
Run Informatica mapping (second extract task).
Analyze table.
For a table with indexes, the truncation process is as follows:
Truncate table (first extract task).
Drop indexes (first extract task).
Run Informatica mapping (first extract task).
Run Informatica mapping (second extract task).
Recreate indexes (second extract task).
Analyze table.
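The division of truncate and index duties described above can be sketched as follows. The task names and the (task, priority) list structure are hypothetical, used here only to illustrate the rule that the first task reading from the highest-priority source truncates the table and drops its indexes, and the last task from that source recreates them. Lower Priority values rank higher.

```python
def assign_index_duties(tasks):
    """For a target table shared across sources, decide which extract task
    truncates the table (and drops its indexes) and which recreates the
    indexes, per the data source Priority rule (illustrative sketch).

    tasks: list of (task_name, source_priority) in execution order.
    """
    top = min(priority for _, priority in tasks)  # highest-priority source
    top_tasks = [name for name, priority in tasks if priority == top]
    return {"truncates": top_tasks[0], "recreates_indexes": top_tasks[-1]}
```

With two extracts from a priority-1 source and one from a priority-2 source, only the first priority-1 task truncates and only the last priority-1 task recreates indexes; the priority-2 task does neither.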
Figure 9-5 shows a multiple extract and multiple load scenario.
The following rules apply to homogeneous and heterogeneous multi-source execution plans.
Phase Dependency Blocking Behavior
DAC's Task Phase Dependency feature enables you to change the order in which tasks are executed. Task phase dependencies have a Scope property, which specifies how the Block action behaves in relation to multi-source execution plans. For more information about phase dependencies, see "Tasks Tab: Phase Dependency Subtab".
You can set the Scope property to one of the following values:
Both
Indicates the blocking action is active for tasks that have the same source and target physical data source connections.
Source
Indicates the blocking action is active for tasks that have the same source physical data source connection.
Target
Indicates the blocking action is active for tasks that have the same target physical data source connection.
None
Indicates the blocking action is active for all tasks regardless of the source and target physical data source connections.
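The four Scope values can be expressed as a small predicate. The function and the task dictionaries are illustrative only, not part of DAC; they simply encode the definitions above.

```python
def block_applies(scope, blocking_task, blocked_task):
    """Decide whether a phase dependency's Block action applies to a pair
    of tasks, given the Scope property (illustrative sketch; tasks are
    dicts with 'source' and 'target' physical data source connections)."""
    same_source = blocking_task["source"] == blocked_task["source"]
    same_target = blocking_task["target"] == blocked_task["target"]
    if scope == "Both":
        return same_source and same_target
    if scope == "Source":
        return same_source
    if scope == "Target":
        return same_target
    if scope == "None":
        return True  # blocks all tasks regardless of connections
    raise ValueError(f"Unknown scope: {scope}")
```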
The Delay property is located in the Parameters subtab in the Execution Plans tab of the Execute view. You set this property to specify how many minutes the extract of a data source is delayed after the first extract of a multi-source extract process has started.
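The effect of the Delay property amounts to simple start-time arithmetic, sketched below. The source names and the function are examples, not DAC objects; each source's extract begins its configured number of minutes after the first extract starts.

```python
from datetime import datetime, timedelta

def extract_start_times(first_extract_start, delays_minutes):
    """Compute when each source's extract begins, given the per-source
    Delay value in minutes (illustrative sketch; source names are examples)."""
    return {
        source: first_extract_start + timedelta(minutes=delay)
        for source, delay in delays_minutes.items()
    }
```

For example, a Delay of 180 on one source staggers its extract three hours behind a source with no delay, which is how extracts are aligned across time zones.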
For multi-source execution plans, you must assign each physical data source a priority (in the Physical Data Sources tab of the Setup view). The priority ranks the data sources and specifies the order in which DAC will load the data from the different sources. This property ensures that tasks attempting to write to the same target table will not be in conflict.
If a target table is shared across different sources, it will be truncated only once. The priority of the data source determines which of the extracts truncates the tables. The task reading from the data source with the highest priority truncates the tables and drops the indexes. The last task writing to the table from the data source with the highest priority creates the indexes.
Truncate options should be the same across the source system containers.
Tasks should belong to the same task groups across the source system containers.
Customized Tasks for Vertical Applications
For vertical applications, where a task may have been customized in a vertical workflow folder, the task will pick up the vertical task folder, based on the task folder priority, when the source and target connections are the same.
All tables should contain the column DATASOURCE_NUM_ID.
Unique indexes should always include the column DATASOURCE_NUM_ID.
All the extract mappings populate the DATASOURCE_NUM_ID column from the parameter file produced by DAC.
All the load mappings extract the value of the DATASOURCE_NUM_ID column from the staging tables.
Level 1 aggregate tables should always contain the column DATASOURCE_NUM_ID.
You should use the same load strategy for loading a table across the source system containers; that is, you should always use full loads or always use incremental loads. Avoid using a combination of load types.
The Upon Failure Restart option of the Task Actions feature can be useful for restarting failed tasks. For information about using task actions, see "About Index, Table and Task Actions".
An execution plan is a collection of subject areas and a unique collection of tasks. A task can have prerequisite tasks that need to be executed before its own execution. DAC determines the order of tasks based on the following considerations:
A task's source and target table
The dependency algorithm first looks at a task's source and target table. For example, suppose table A is populated by task T1 by reading from table B, and table B is populated by task T2 by reading from table C. The algorithm would determine task T2 should be executed before T1.
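This source/target rule amounts to a topological ordering of tasks: any task that populates a table must run before tasks that read from it. A minimal sketch follows; the task-mapping shape is hypothetical, and DAC's actual algorithm also weighs the additional factors described next.

```python
from graphlib import TopologicalSorter

def order_tasks(tasks):
    """Order tasks so that a task populating a table runs before any task
    reading from that table (simplified model of DAC's dependency rule).

    tasks: dict of task name -> {"source": table_read, "target": table_written}.
    """
    writers = {spec["target"]: name for name, spec in tasks.items()}
    dependencies = {
        name: ({writers[spec["source"]]} if spec["source"] in writers else set())
        for name, spec in tasks.items()
    }
    return list(TopologicalSorter(dependencies).static_order())
```

Applied to the example in the text (T1 reads B to populate A; T2 reads C to populate B), the ordering places T2 before T1.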
The dependency algorithm next considers the following:
Task phase
An ETL process typically goes through several phases. An example of a typical order in which phases are executed is as follows:
Extract Dimension
Extract Fact
Load Dimension
Load Fact and Load Hierarchy (executed in parallel)
Load Aggregate tables
Update Dimensions
A table's Truncate Always properties
The order of execution based on Truncate Always properties is as follows:
Insert
Upsert
Physical data source
Priority
DAC randomly organizes tasks that have the same property values. If some tasks need to be executed in a particular order, you can create a task group that allows you to specify an execution order.
Before you attempt to run an execution plan, make sure you have completed the following:
Set database connections to the transactional and data warehouse databases (in the Physical Data Sources tab).
Registered the Informatica PowerCenter Services and Integration Service (in the Informatica Servers tab).
Before you attempt to run a multi-source execution plan, you must first define the priority for each source. The priority specifies the order in which DAC will load the data from the different sources. For more information about the source priority property, see "Physical Data Sources Tab".
To define a source priority
Navigate to the Setup view, and then select the Physical Data Sources tab.
For each of the physical data sources that will participate in the multi-source execution plan, enter a numerical value in the Priority field.
The lower the numerical value, the higher the priority. For example, if you enter a value of 1, data from this source will be loaded first.
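The priority ordering can be illustrated with a one-line sketch; the source names and priority values are examples only.

```python
def load_order(sources):
    """Return data sources in the order DAC will load them: the lower the
    Priority value, the earlier the load (illustrative sketch)."""
    return [name for name, priority in sorted(sources.items(), key=lambda kv: kv[1])]
```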
To build and run a single-source or multi-source execution plan
Navigate to the Execute view, then select the Execution Plans tab.
Create a new execution plan.
In the top pane toolbar, click New.
In the top pane window or in the Edit subtab, enter a name for the execution plan and other appropriate information.
For a description of the fields in this tab, see "Execution Plans Tab".
Click Save.
Associate one or more subject areas with the execution plan.
Click the Subject Areas child tab.
Click Add/Remove in the bottom pane toolbar.
In the Choose Subject Areas dialog, select the appropriate source system container from the drop-down list.
Query for the subject area you want to associate with the execution plan.
Select the subject area and click Add.
You can repeat this process to associate multiple subject areas from any available source system container with an execution plan.
Click OK to close the window.
Generate the runtime execution plan parameters.
Click the Parameters subtab, and then click Generate in the bottom pane toolbar.
In the Generating Parameters dialog box, enter the number of copies for each source system container, and then click OK.
Click OK in the informational message.
DAC automatically generates the parameters required for each copy. Not all copies require all of the possible parameters.
On the Parameters subtab, edit the parameters for each copy of the source system container as follows:
For each data source type, select the appropriate value from the Value drop-down list.
Note:
For the data source type of FlatFileConnection, make sure you have copied all files into the directory specified in the DAC system property InformaticaParameterFileLocation.
For each Informatica SIL and SDE folder, select the appropriate value in the Value drop-down list.
For each data source type, enter the appropriate name in the Value field.
(Optional) If you are extracting data from more than one source system container and want to stagger the data extracts, in the Delay field for the appropriate data source, enter a value for the number of minutes you want to delay the extract.
(Optional) Set the Prune Days property. This setting subtracts the Prune Days value from the LAST_REFRESH_DATE and supplies this value as the value for the $$LAST_EXTRACT_DATE parameter. See "Prune Days" for more information.
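The Prune Days arithmetic can be sketched as follows; the function name is illustrative, and DAC performs this subtraction internally when it generates the parameter file.

```python
from datetime import datetime, timedelta

def last_extract_date(last_refresh_date, prune_days):
    """Compute the value DAC supplies for the $$LAST_EXTRACT_DATE parameter:
    the Prune Days value subtracted from LAST_REFRESH_DATE (sketch only)."""
    return last_refresh_date - timedelta(days=prune_days)
```

For example, with a LAST_REFRESH_DATE of May 10 and Prune Days set to 2, the extract window opens at May 8, so records committed late in the source are not missed.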
Click the Ordered Tasks child tab and verify the following:
Click Details in the toolbar, and review each task's predecessor and successor tasks to confirm tasks common to multiple sources are ordered in a manner consistent with the priority of the source connection.
Confirm that load tasks appear only once even if there are multiple extracts for tables common to multiple sources.
For tasks common to multiple sources, click Preview Run Details in the toolbar, and confirm the following:
The first common task truncates the common target table and the following tasks do not.
For instructions on unit testing a task, see "Unit Testing Execution Plan Tasks".
In the top pane of the Execution Plans tab, make sure the new execution plan is highlighted, and click Build.
In the Building... dialog box, select the Selected Record Only option to build only the selected execution plan.
To run the execution plan, select the execution plan in the top pane, and click Run Now.
Once the ETL process starts running you can monitor its progress in the Current Run tab.
For information about how refresh dates are tracked, see "About Refresh Dates".
To schedule an execution plan, see "Scheduling an Execution Plan".
You can test how DAC will execute an individual task (and its details) without running a fully functional execution plan. You can test tasks that belong to single-source or multi-source execution plans. All DAC tasks in an execution plan can be individually tested, regardless of their position in the dependency graph. DAC generates a parameter file named exactly as it is named in the Informatica session.
Note:
You cannot unit test the workflow directly from Informatica. DAC generates parameter files and log files with variable names, and, therefore, you cannot re-run workflows directly from Informatica.
To unit test an execution plan task
Navigate to the Execute view, then select the Execution Plans tab.
Select the execution plan to which the task you want to test belongs.
Click the Ordered Tasks child tab.
Select a task from the list.
On the toolbar, select Preview Run Details.
In the Preview Run Details dialog, click Execute.
DAC executes the task and displays the results in the Preview Run Details dialog.
Click OK to close the Preview Run Details dialog.
Micro ETL execution plans are ETL processes that you schedule at very frequent intervals, such as hourly or half-hourly. They usually handle small subject areas or subsets of larger subject areas. DAC tracks refresh dates for tables in micro ETL execution plans separately from other execution plans and uses these refresh dates in the change capture process.
After a micro ETL execution plan runs, DAC populates refresh date values in the Refresh Dates child tab of the Execution Plans tab. If a subject area is used in a regular execution plan (an execution plan with the Keep Separate Refresh Dates option not selected) as well as a micro ETL execution plan, DAC maintains refresh dates for the tables in the regular execution plan in the Refresh Dates child tab of the Physical Data Sources tab (Setup view).
If a subject area is used in both a regular and a micro ETL execution plan, and the micro ETL execution plan is suspended for a few days while the regular execution plan runs nightly, DAC automatically detects the last refresh date for the tables common to both execution plans and intelligently extracts only the most recent records for the micro ETL execution plan.
Caution:
Micro ETL processes can cause issues with data inconsistencies, data availability, and additional load on the transactional database. Therefore, you should consider the following factors before implementing a micro ETL process:
For related star schemas, if one schema is omitted from a micro ETL execution plan, the cross-star reports may be inaccurate. For example, if the Person fact table is refreshed more frequently than the Revenue fact table, a report that spans the Person and Revenue dimensional schemas may produce inconsistent results.
If you omit dimension tables from a micro ETL execution plan, the foreign keys for the fact tables will point to Unspecified rows for the new dimension records. The foreign key references will be resolved when the Complete ETL execution plan is run, but users of the reports should be aware of such inconsistencies.
If you do not include aggregate tables in micro ETL execution plans, the reports that use data from these tables will be inconsistent with the reports that use data from the detailed fact tables. However, if aggregate tables are included in the micro ETL execution plan, the aggregate calculations are performed for each ETL process, which will take a constant amount of time and may be inefficient to perform at such frequent intervals.
Hierarchy tables are rebuilt during every ETL execution plan by querying the base dimension tables. This operation takes a constant amount of time. If the base tables are big, this operation may take a long time and may be inefficient if the micro ETL execution plan runs several times a day. However, if you avoid populating the hierarchy tables during micro ETL processes, data inconsistencies will occur.
With micro ETL execution plans, caching will occur more frequently, which may have performance implications.
Micro ETL execution plans will put more load on the transactional database because of the frequent extracts.
To build and run a micro ETL execution plan
In the DAC toolbar, select the appropriate source system container from the drop-down list.
From the Menu bar, select Views, then select Design, then select Subject Areas.
In the Subject Areas tab, assemble a small subject area.
In the Tasks child tab, inactivate all tasks that are not required for the execution plan.
Create a new execution plan.
Navigate to the Execute view, then select the Execution Plans tab.
Enter a name for the execution plan.
Select the Keep Separate Refresh Dates check box.
Click Save.
Associate one or more subject areas with the execution plan. The subject areas can belong to one or more source systems.
Click the Subject Areas child tab.
Click Add/Remove in the bottom pane toolbar.
In the Choose Subject Areas dialog, select the appropriate source system container.
Query for the subject area you want to associate with the execution plan.
Select the subject area and click Add.
You can repeat this process to associate multiple subject areas from any available source system container with the execution plan.
Click OK to close the window.
Generate the runtime execution plan parameters.
Click the Parameters subtab, and then click Generate in the bottom pane toolbar.
In the Generating Parameters dialog box, enter the number of copies for each source system container, and then click OK.
Click OK in the informational message.
DAC automatically generates the parameters required for each copy. Not all copies require all of the possible parameters.
On the Parameters subtab, edit the parameters for each copy of the source system container as follows:
For each data source type, select the appropriate value from the Value drop-down list.
Note:
For the data source type of FlatFileConnection, make sure you have copied all files into the directory specified in the DAC system property InformaticaParameterFileLocation.
For each Informatica SIL and SDE folder, select the appropriate value in the Value drop-down list.
For each data source type, enter the appropriate name in the Value field.
(Optional) If you are extracting data from more than one source system container and want to stagger the data extracts, in the Delay field for the appropriate data source, enter a value for the number of minutes you want to delay the extract.
In the top pane of the Execution Plans tab, make sure the new execution plan is highlighted, and click Build.
DAC builds the execution plan.
Click the Ordered Tasks child tab and verify the following:
Click Details in the toolbar, and review each task's predecessor and successor tasks to confirm tasks common to multiple sources are ordered in a manner consistent with the priority of the source connection.
Confirm that load tasks appear only once even if there are multiple extracts for tables common to multiple sources.
For tasks common to multiple sources, click Preview Run Details in the toolbar, and confirm the following:
The first common task truncates the common target table and the following tasks do not.
The execution plan is now ready to run as a micro ETL execution plan.
Create a schedule for the micro ETL execution plan. For instructions, see "Scheduling an Execution Plan".
Follow this procedure to schedule an execution plan.
To schedule an execution plan
In DAC, navigate to the Scheduler tab.
The current list of schedules appears in the top pane.
Click New in the top pane toolbar.
The Edit tab in the bottom pane becomes active.
Enter a name for the schedule.
Select an execution plan.
If you want the schedule to run once, select the Run Only Once check box, and then select a start and end date and time.
To create a periodic schedule, select a recurrence pattern, and enter the appropriate date and time parameters.
Click Save.
Refresh dates refer to the date of the last ETL process (the last time data was extracted from tables in a given database or loaded into tables in a given database). DAC uses the refresh dates to determine whether to run the incremental load commands or to run full load commands and whether to truncate the target tables.
Refresh dates are tracked only for tables that are either a primary source or a primary target on tasks in a completed run of an execution plan. DAC runs the full load command for tasks on which a table is a primary source or target if the refresh date against the table is null. When there are multiple primary sources, the earliest of the refresh dates will trigger a full load or an incremental load. If any one of the primary source tables has no refresh date, then DAC will run the full load command.
Table 9-1 shows the possible scenarios regarding refresh dates.
Table 9-1 Refresh Date Scenarios
| Scenario | Table Type (in Tasks child tabs) | Refresh Date | Command DAC Will Use | Truncate Target Table? |
|---|---|---|---|---|
| 1 | Primary Source | Null | Full Load | Yes |
| 1 | Primary Target | Null | Not applicable | Not applicable |
| 2 (See note below) | Primary Source | Null | Full Load | No |
| 2 | Primary Target | Not Null | Not applicable | Not applicable |
| 3 (See note below) | Primary Source | Not Null | Full Load | Yes |
| 3 | Primary Target | Null | Not applicable | Not applicable |
| 4 | Primary Source | Not Null | Incremental Load | No |
| 4 | Primary Target | Not Null | Not applicable | Not applicable |
Scenario 2. When two or more source tables load into the same target table as separate tasks, the source table in the second task may have a null refresh date while the target has a refresh date.
Scenario 3. When a source loads into more than one target table in separate tasks, the refresh date may be null for the second target table while the source table has a refresh date.
The Current Run tab in the Execute view provides predefined reports that enable you to monitor execution plan processes in order to isolate bottlenecks and enhance performance.
To monitor an execution plan process
In DAC, navigate to the Current Run tab.
Right-click and select Get Run Information.
The following options are available:
Get log file
Analyze run
Get chart
Get phase chart
Get graph