Oracle® Fusion Middleware User's Guide for Oracle Business Intelligence Data Warehouse Administration Console
11g Release 1 (11.1.1)

Part Number E14849-06

11 Performance Tuning With DAC

This chapter provides information about various performance tuning capabilities in DAC related to indexes, tables, tasks, and workflows.

This chapter contains the following topics:

  • Managing Indexes

  • Using Heuristics to Manage Tasks, Tables and Indexes

  • Looping of Workflows

  • Customizing customsql.xml to Drop and Create Indexes and Analyze Tables

  • Performance Tuning the Siebel Change Capture Process

  • Performance Tuning the ETL Process Using Tasks by Depth Command

Managing Indexes

In general, during the ETL process, when a table is truncated, all of the indexes defined on it in the DAC repository are dropped before the data is loaded and are recreated automatically after the data is loaded. Dropping indexes before the data load and recreating them afterward improves ETL performance.

If you create an index on the database and it is not registered in the DAC repository, the index will not be dropped and the load will fail.

For Teradata databases, only secondary indexes should be registered in DAC. You should not register primary indexes or the more complex indexes, such as single- and multi-table join indexes, because they cannot be dropped and recreated.

In the DAC Design view, the Indices tab lists all the indexes associated with the selected source system container. It is recommended that you have a clear understanding of when and where an index will be used at the time you register it in DAC. It is also recommended that you do not register in DAC any indexes for source tables.

This section contains the following topics:

  • Index Behavior

  • Specifying Index Spaces for Indexes by Table Type

  • Specifying How Many Indexes Can Be Created in Parallel

  • Defining a During ETL Index

  • Defining Join Indexes on a Teradata Database

  • Using Actions to Optimize Indexes and Collect Statistics on Tables

Index Behavior

The general sequence of actions related to indexes when a task runs is as follows:

  1. Tables are truncated.

  2. Indexes are dropped.

  3. Data is loaded into target tables.

  4. ETL indexes and unique indexes are recreated.

  5. Successor tasks are notified that ETL indexes were created.

  6. Query indexes and additional unique indexes are recreated.

  7. Tables are analyzed.

Note: Successor tasks can start before query-related and unique indexes are created.

Specifying Index Spaces for Indexes by Table Type

You can specify index spaces for indexes by table type. If you do not specify an index space by table type, then DAC will create indexes in the index space specified by the Default Index Space property in the Physical Data Sources tab of the Setup view. If you do not specify a value for the Default Index Space property, then DAC will create indexes in the default tablespace for the table owner of the database.
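
On an Oracle database, the index space you specify typically corresponds to a tablespace. A minimal sketch of creating a dedicated index tablespace is shown below; the tablespace name, data file, and sizing are illustrative only:

    -- Hypothetical example: a dedicated tablespace to hold warehouse indexes.
    CREATE TABLESPACE BIA_INDEX_TS
      DATAFILE 'bia_index_ts01.dbf' SIZE 500M
      AUTOEXTEND ON NEXT 100M;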

Note:

You must create the index space in the database before you can specify index spaces for indexes in DAC.

To specify an index space for indexes by table type:

  1. In the Physical Data Sources tab of the Setup view, click the Index Spaces subtab.

  2. In the subtab toolbar, click Generate.

  3. When asked if you want to generate database properties based on table type, click Yes.

    A list of available table types is displayed in the Table Type list.

  4. Click OK to close the Generating Properties dialog.

  5. In the Index Space column, enter an index space for each table type, and click Save.

Specifying How Many Indexes Can Be Created in Parallel

You can specify how many indexes can be created in parallel for a specific table or how many indexes can be created in parallel for all tables associated with a specific physical data source connection.

To specify how many indexes can be created in parallel for a specific table:

  1. In the Physical Data Sources tab of the Setup view, select the appropriate physical data source in the top pane list.

  2. Click the Parallel Indexes subtab.

  3. Click New in the Parallel Indexes toolbar.

  4. In the Name field, query for and enter the names of the tables for which you want to specify how many indexes can be created in parallel.

  5. For each table, in the Number of Parallel Indexes field, enter the number of indexes you want to create in parallel.

  6. Click Save.

To specify how many indexes can be created in parallel for all tables associated with a physical data source:

  1. In the Physical Data Sources tab of the Setup view, select the appropriate physical data source in the top pane list.

  2. Click the Edit subtab.

  3. In the Num Parallel Indexes Per Table field, enter a numeric value to specify how many indexes can be created in parallel for a table on the specified physical data source.

  4. Click Save.

Defining a During ETL Index

You can override the point at which an index is created during an ETL process by defining an index with the During ETL usage type. You assign the During ETL index to the task that populates the table for which the index was defined. The During ETL index will then be created after the task to which you assign it runs. In this manner, using During ETL indexes enables you to control when the index is created during the ETL process.

Note: If you define a During ETL index, you need to make sure that the index is assigned to a task, or else the index will not be created. You should only assign a During ETL index to one task per execution plan. Only advanced DAC users with a thorough understanding of their data warehouse and ETL processes should define During ETL indexes.

To define a During ETL Index:

  1. Create a new index in DAC:

    1. In the Design view, go to the Indices tab, and click New in the top pane toolbar.

    2. In the Edit subtab, define the index with the appropriate properties. For a description of the properties, see "Indices Tab".

      Make sure you select During ETL as the Index Usage type.

    3. Click Save.

  2. Assign the index you created in step 1 to a task:

    1. Go to the Tasks tab, and query for the task to which you want to assign the During ETL index.

    2. Click the During ETL subtab.

    3. In the bottom pane toolbar, click Add/Remove.

    4. In the Name field, enter the name of the During ETL index, and click Go.

    5. Make sure the index is selected, and then click Add.

      The index appears in the list on the right side of the window.

    6. Click Refresh to populate the Table Name and Owner fields.

    7. Click OK to close the dialog.

    The During ETL index will be created on the table associated with the task after the task runs.

    Note:

    You can also modify an existing index to change the usage type to During ETL.

Defining Join Indexes on a Teradata Database

This section provides instructions for defining indexes on a Teradata database.
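
The drop and create task actions referenced in the procedure below typically contain Teradata join index DDL. A minimal, hypothetical sketch follows; the index, table, and column names are illustrative only:

    /* Hypothetical create action: a single-table join index on a fact table. */
    CREATE JOIN INDEX W_ORDER_F_JI AS
      SELECT ORDER_WID, CUSTOMER_WID, ORDER_AMT
      FROM W_ORDER_F;

    /* Hypothetical drop action. */
    DROP JOIN INDEX W_ORDER_F_JI;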

To define a join index on a Teradata database:

  1. Define a task action to drop indexes and a second task action to create indexes.

    For instructions on defining a task action, see "Defining a SQL Script for an Action".

  2. Create a task for dropping indexes. Select the following attributes:

    • SQL File as the Execution Type.

    • Pre ETL Process as the Task Phase.

    For instructions on creating tasks in DAC, see "Creating Tasks in DAC for New or Modified Informatica Workflows".

  3. Create a task for creating indexes. Select the following attributes:

    • SQL File as the Execution Type.

    • Post ETL Process as the Task Phase.

  4. Assign the task for dropping indexes as a preceding task on the appropriate execution plan.

    1. In the Execution Plans tab, select the appropriate execution plan.

    2. Click the Preceding Tasks subtab.

    3. In the subtab toolbar, click Add/Remove.

    4. In the Choose Preceding Tasks dialog, query for the task you created in step 2.

    5. Click Add.

    6. Click OK to close the Choose Preceding Tasks dialog.

  5. Assign the task for creating indexes as a following task on the appropriate execution plan.

    1. In the Execution Plans tab, select the appropriate execution plan.

    2. Click the Following Tasks subtab.

    3. In the subtab toolbar, click Add/Remove.

    4. In the Choose Following Tasks dialog, query for the task you created in step 3.

    5. Click Add.

    6. Click OK to close the Choose Following Tasks dialog.

  6. Rebuild the execution plan.

Using Actions to Optimize Indexes and Collect Statistics on Tables

This section applies to incremental loads only.

When an index is marked in DAC as Drop & Create Always or Drop & Create Bitmap Always (for Oracle databases), DAC drops the indexes before the data load and creates them after the data load. Performance can be improved by customizing and tuning how indexes are dropped and created during incremental loads.

When large fact tables are involved in an incremental load, dropping and creating indexes on the whole table can impact performance. Table partitioning can be a performance enhancing alternative to dropping and creating indexes. For example, if the data being loaded is mostly based on updates to recent data, you can partition the table based on time. If the incremental data affects the current year, it may be more beneficial to disable the current year partition than to drop the index on the whole table; and, similarly, it may be more beneficial to rebuild the indexes for the partition rather than create the indexes on the whole table. This can be achieved by using different syntaxes for full and incremental loads, as described in the following example.

This example illustrates, for incremental runs, how to create a parameter that can be used to override the drop and create index syntaxes. It also describes how to override the analyze table behavior so that only the current year partition is analyzed rather than the whole table.

This example assumes the table in the underlying schema has already been partitioned by year, with partition names such as PART_2001, PART_2002, and so on.
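
For context, such partitioning might have been created outside of DAC with DDL along the following lines; the table, columns, and partition bounds are illustrative only:

    CREATE TABLE W_REVN_F (
      ROW_WID       NUMBER(10),
      REVN_AMT      NUMBER(22,7),
      POSTED_ON_DT  DATE
    )
    PARTITION BY RANGE (POSTED_ON_DT) (
      PARTITION PART_2001 VALUES LESS THAN (TO_DATE('2002-01-01','YYYY-MM-DD')),
      PARTITION PART_2002 VALUES LESS THAN (TO_DATE('2003-01-01','YYYY-MM-DD')),
      PARTITION PART_2003 VALUES LESS THAN (TO_DATE('2004-01-01','YYYY-MM-DD'))
    );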

  1. Create a source system parameter called CURRENT_YEAR, using the Timestamp data type and the runtime variable @DAC_ETL_START_TIME_FOR_TARGET.

    For instructions on creating a source system parameter, see "Defining a Timestamp Type Parameter".

  2. Create a custom index action to override the drop index behavior.

    1. Create an index action called Drop Partitioned Index.

    2. Enter the following SQL:

      alter index getIndexName()
      modify partition PART_@DAC_$$CURRENT_YEAR
      unusable
      

      For instructions on defining an index action, see "Defining a SQL Script for an Action".

  3. Assign the Drop Partitioned Index action to indexes.

    1. In the Indices tab, query for the table name and indexes that need to be dropped and created during the incremental runs.

    2. For each record in the list of query results, right-click and select Add Actions.

    3. Select Drop Index as the Action Type, Incremental as the Load Type, and Drop Partitioned Index as the Action.

  4. Create a custom index action to override the create index behavior.

    1. Create an index action called Enable Partitioned Index.

    2. Enter the following SQL:

      alter index getIndexName()
      rebuild partition PART_@DAC_$$CURRENT_YEAR
      nologging
      
  5. Assign the Enable Partitioned Index action to indexes.

    1. In the Indices tab, query for the table name and indexes that need to be dropped and created during the incremental runs.

    2. For each record in the list of query results, right-click and select Add Actions.

    3. Select Create Index as the Action Type, Incremental as the Load Type, and Enable Partitioned Index as the Action.

  6. Create a custom index action to override the analyze table behavior.

    1. Create an index action called Analyze Current Partition.

    2. Enter the following SQL:

      DBMS_STATS.GATHER_TABLE_STATS(
      NULL,
      TABNAME => 'getTableName()',
      CASCADE => FALSE,
      PARTNAME => 'PART_@DAC_CURRENT_YEAR',
      ESTIMATE_PERCENT => DBMS_STATS.AUTO_SAMPLE_SIZE,
      METHOD_OPT => 'FOR ALL INDEXED COLUMNS SIZE AUTO',
      DEGREE => DBMS_STATS.DEFAULT_DEGREE)
      
    3. Select Stored Procedure as the Type.

  7. Assign the Analyze Current Partition action to tables.

    1. In the Tables tab, query for the table names that need to be analyzed during the incremental runs.

    2. For each record in the list of query results, right-click and select Add Actions.

    3. Select Analyze Table as the Action Type, Incremental as the Load Type, and Analyze Current Partition as the Action.
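
To illustrate the net effect of these actions: during an incremental run in which CURRENT_YEAR resolves to 2013, DAC would issue statements along the following lines for an index named W_REVN_F_U1 (the index name and year are illustrative only):

      alter index W_REVN_F_U1 modify partition PART_2013 unusable

      alter index W_REVN_F_U1 rebuild partition PART_2013 nologging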

For a more detailed example of using partitions, see "Oracle Business Intelligence Applications Version 7.9.6.x Performance Recommendations [ID 870314.1]" on My Oracle Support.

Using Heuristics to Manage Tasks, Tables and Indexes

The DAC heuristics functionality enables you to gather intelligence about the amount of incremental data that will be processed during an ETL execution. To optimize performance, you can use this intelligence to decide whether a task will run, whether tables are analyzed, and whether indexes are dropped and created.

This section contains the following topics:

  • About DAC Heuristics

  • Using Heuristics to Determine Whether a Task Should Run

  • Using Heuristics to Determine Whether Indexes Are Dropped and Created

  • Using Heuristics to Determine Whether Tables Are Analyzed

  • DAC Heuristics and Task Groups

  • About Heuristics Rules and the Heuristics Dialog

  • Creating a Heuristics Rule

  • Associating a Heuristics Rule With a Task

  • Writing Custom SQL for a Heuristics Rule

Note:

DAC heuristics applies only to incremental data loads and not full loads.

About DAC Heuristics

Typically, during an incremental ETL process, DAC drops indexes on target tables before a data load, recreates the indexes after the data load, and then analyzes the tables. These processes can impact overall ETL process times. DAC heuristics enables you to optimize performance by deciding whether tasks will run, whether tables are analyzed, and whether indexes are dropped and created.

There are three parts to defining heuristics:

  1. Defining the heuristics rule.

  2. Associating the heuristics rule with a task.

  3. Providing the means for computing the source and target counts to be used in the computation of heuristics.

Using Heuristics to Determine Whether a Task Should Run

You can determine whether or not you want a task to run as part of an ETL process. DAC heuristics looks at the primary source tables (or other tables you specify by writing custom SQL) before a data load to see if there are any incremental data records. The decision about whether a task will run is based on a threshold value you set. If the incremental record count is equal to or less than the threshold value, DAC will not execute the task.

For example, assume you use DAC heuristics to count the records in a primary staging table for a dimension load task, and you set a threshold value of zero. If DAC heuristics finds no records, DAC will not execute this task when the ETL process runs. If DAC heuristics finds one or more records, the task will be executed.

Using Heuristics to Determine Whether Indexes Are Dropped and Created

You can determine whether indexes are dropped and created during an ETL process. For this action, you configure a heuristics rule that either counts the number of incremental records in the primary staging table (or other tables you specify by writing custom SQL) or that provides the ratio of incremental records in the primary source table to the total records in the target table. This value is then compared to the threshold value you set. If the count or ratio is equal to or less than the threshold value, DAC will not drop and create indexes.

To determine an appropriate threshold value, you need to have an understanding of the impact that dropping and recreating indexes has on your database.

  • Example Using the Count Option: Assume you configure DAC heuristics to count the incremental records in the primary source table. You have determined performance is only impacted during an incremental load with indexes in place if there are more than ten records to be loaded, so you set the threshold value at 10. If DAC heuristics finds between zero and nine incremental records in the primary source table, DAC will not drop and recreate indexes when the data is loaded into the target tables. The data will be loaded with indexes in place. If DAC heuristics finds ten or more incremental records in the primary staging table, DAC will drop and recreate indexes during the load process.

  • Example Using the Ratio Option: Assume you configure DAC heuristics to provide the ratio of the number of incremental records in the primary source table to the total number of records in the target table. You have determined that if incoming records are more than 5% of the total target table record count, then performance can be impacted with indexes in place. Therefore, you set the threshold value at 5. If DAC heuristics finds that the incremental records in the primary source table are more than 5% of the total records in the target table, then DAC will drop and recreate indexes during the load process. If the percentage of incremental records is equal to or less than 5%, DAC will load the new records with the indexes in place.

Using Heuristics to Determine Whether Tables Are Analyzed

You can determine whether or not you want tables to be analyzed after the ETL process runs. For this action, you configure a heuristics rule that either counts the number of incremental records in the primary staging table (or other tables you specify by writing custom SQL) or that provides the ratio of incremental records in the primary source table to the total records in the target table. This value is then compared to the threshold value you set. If the count or ratio is equal to or less than the threshold value, DAC will not analyze tables.

To determine an appropriate threshold value, you need to have an understanding of the impact that analyzing tables has on your database.

The count and ratio options work the same way for the action of analyzing tables as for dropping and creating indexes. For examples of these two options as they relate to dropping and creating indexes, see "Using Heuristics to Determine Whether Indexes Are Dropped and Created".

Note: This DAC heuristics setting overrides the Analyze Frequency (in Days) system property.

DAC Heuristics and Task Groups

If a child task of a task group has heuristics defined, the task can optionally be specified as a heuristics driver. Any number of child tasks within a task group can be specified as a heuristics driver. Before a task group is executed, DAC computes the task heuristics of each of the child tasks. If the computation of any of the heuristics driver tasks falls under the threshold value, none of the child tasks are executed. The status of these tasks in the Current Runs tab of the Execute view is "Not Executed." The Status Description is updated with the cumulative summary of all the driver children as to why the task was not executed.

About Heuristics Rules and the Heuristics Dialog

A heuristics rule defines how the DAC heuristics functionality will manage the behavior of tasks, tables and indexes. Heuristics rules can apply to any DAC heuristics action; that is, you can use the same rule to indicate whether a task should run, whether tables should be analyzed, or whether indexes should be dropped and recreated. Heuristics rules apply to all containers. However, if you write custom SQL (see "Writing Custom SQL for a Heuristics Rule") to specify tables other than the primary tables, the SQL is specific to the container in which you write it.

You use the Heuristics dialog to define heuristics rules. The Heuristics dialog contains the following fields:

  • Heuristic

    The possible values are Count and Ratio. These options provide flexibility in how you gather intelligence. The Count option provides a count of the number of incremental records in the primary staging table that will be included in the ETL process. The Ratio option provides the ratio of the number of incremental records in the primary source table to the total number of records in the target table; when you use this option, the Threshold value is expressed as a percentage.

    The Count option can be useful if you are deciding whether or not to run a task. For example, if you want a task to run only if there are one or more records in a staging table, you would select the Count option and set the Threshold value at 1. DAC heuristics counts the number of records in the staging table. If it finds one or more records, the task will run. If it finds no records, the task will not run.

    The Ratio option can be useful if you are deciding whether to drop and create indexes or analyze tables. For examples of using the count and ratio options, see "Using Heuristics to Determine Whether Indexes Are Dropped and Created".

  • Metric

    Provides options for how multiple primary source tables are handled (that is, when more than one primary source table is associated with the same task). If you are defining a heuristics rule that involves a single primary source table, the Metric field does not apply.

    The possible values are the following:

    • Sum

      Aggregates the total of new or changed records in all of the primary source tables.

    • Average

      Returns an average of the new or changed records in all of the primary source tables. That is, it will divide the sum of all the new or changed records in all of the primary tables by the number of primary tables.

    • Max

      Returns the largest count of new or changed records of all of the primary source tables.

    • Min

      Returns the smallest count of new or changed records of all of the primary source tables.

  • Source Table

    The possible values are the following:

    • Primary

      Specifies that intelligence will be gathered on one or more primary tables associated with the task that this heuristics rule is assigned to.

    • With SQL Overrides

      Specifies that you will write custom SQL to include source tables other than the default primary source tables.

  • Threshold

    The numerical value above which DAC heuristics will allow a particular action to take place.

    For example, if you are doing a count of a primary staging table to determine whether a task should run, and you want the task to run if one or more records appear in the table, you would set the Threshold value at 1.

    If you use the Ratio option, the threshold value is a percentage. For example, suppose you were defining a heuristics rule for determining whether tables should be analyzed after an ETL process, and you knew you wanted to analyze tables if the incoming rows were more than 5% of the total target rows. In this case, you would set the threshold value at 5.

Creating a Heuristics Rule

Follow this procedure to create a heuristics rule. After you create the rule, you need to assign it to a task. For instructions, see "Associating a Heuristics Rule With a Task".

For an overview of DAC heuristics, see "Using Heuristics to Manage Tasks, Tables and Indexes". For detailed information about the Heuristics dialog, see "About Heuristics Rules and the Heuristics Dialog".

To create a heuristics rule:

  1. In the DAC Client, on the Tools menu, select Seed Data, Heuristics.

    The Heuristics dialog is displayed.

  2. In the toolbar, click New.

  3. In the Name field, enter a descriptive name for the heuristics rule.

  4. In the Heuristics field, select one of the following:

    • Count

      Provides a count of the number of records in the primary source table that will be included in the ETL process (based on the task to which you assign the heuristics rule).

    • Ratio

      Provides the ratio of the number of incremental records in the primary source table to the total number of records in the target table. If you select the Ratio option, the value of the Threshold field is a percentage.

  5. If this heuristics rule applies to a task with multiple primary tables, select one of the following:

    • Sum. Aggregates the total of new or changed records in all of the primary source tables.

    • Average. Returns an average of the new or changed records in all of the primary source tables.

    • Max. Returns the largest count of new or changed records of all of the primary source tables.

    • Min. Returns the smallest count of new or changed records of all of the primary source tables.

  6. In the Source Tables field, select one of the following:

    • Primary. Specifies that intelligence will be gathered on one or more primary tables associated with the task that this heuristics rule is assigned to.

    • With SQL Overrides. Specifies that you will write custom SQL to include tables other than the primary tables.

      For instructions on writing custom SQL to create a heuristics rule, see "Writing Custom SQL for a Heuristics Rule".

  7. In the Threshold field, enter a numerical value above which DAC heuristics will allow a particular action to take place.

    For examples, see "About Heuristics Rules and the Heuristics Dialog".

  8. (Optional) If you want to create this heuristics rule but leave it inactive, select Inactive.

  9. Click Save.

After you create the heuristics rule, you need to assign the rule to a task. For instructions, see "Associating a Heuristics Rule With a Task".

Associating a Heuristics Rule With a Task

Before you perform this procedure, you need to have created a heuristics rule by following the instructions in "Creating a Heuristics Rule".

For an overview of DAC heuristics, see "Using Heuristics to Manage Tasks, Tables and Indexes". For detailed information about the Heuristics dialog, see "About Heuristics Rules and the Heuristics Dialog".

In this procedure, you will associate a heuristics rule with a task.

To assign a heuristics rule to a task:

  1. In the Design view, select the Tasks tab.

  2. In the Tasks tab, select the task to which you want to assign the heuristics rule.

  3. Select the Extended Properties subtab.

  4. In the toolbar, click New.

    A new, empty record is created.

  5. Click in the Name field to expose the drop-down list, and then select Heuristics.

  6. Double-click in the Value field to open the Property Value dialog.

    The Property Value dialog enables you to select a heuristics rule that you created in the procedure "Creating a Heuristics Rule".

  7. Assign the heuristics rule to a task by doing one of the following:

    • For a rule that determines whether a task should run, click in the Task Heuristic field, and then select the appropriate heuristics rule.

    • For a rule that determines whether tables should be analyzed, click in the Table Heuristic field, and then select the appropriate heuristics rule.

    • For a rule that determines whether indexes should be dropped and recreated, click in the Index Heuristic field, and then select the appropriate heuristics rule.

  8. Click OK in the Select Value dialog, and then click OK in the Property Value dialog.

The heuristics rule is now assigned to a task and will be processed when the task to which it is assigned is called by an execution plan.

Heuristics rules are reflected in the Status Description field in the Task Details tab of the Details dialog, which is accessible by going to the Tasks subtab in the Current Runs or Run History tabs and clicking the Details button on the subtab toolbar. The Status Description will indicate whether a step was skipped because of a heuristics rule, and the task status will be Not Executed.

Writing Custom SQL for a Heuristics Rule

When you create a heuristics rule, you have the option of gathering intelligence on primary source tables, which is the default option, or writing custom SQL to gather intelligence on tables other than primary source tables.
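
For illustration, the custom SQL you enter in the procedure below (step 10) is typically a query that returns a single count of the incremental rows of interest. The following is a minimal sketch only; the table name, date column, and the LAST_EXTRACT_DATE source system parameter are assumptions, not part of the standard repository:

    SELECT COUNT(*)
    FROM   S_ORDER_ITEM
    WHERE  LAST_UPD > @DAC_$$LAST_EXTRACT_DATE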

Note: When you write custom SQL for a heuristics rule, it applies only to the source system container you select before you write the SQL.

For an overview of DAC heuristics, see "Using Heuristics to Manage Tasks, Tables and Indexes". For detailed information about the Heuristics dialog, see "About Heuristics Rules and the Heuristics Dialog".

Note:

Before you perform this procedure, you must have already created a heuristics rule, by following the procedure "Creating a Heuristics Rule", and selected the option With SQL Overrides in the Source Tables field.

To write custom SQL to create a heuristics rule:

  1. In the Design view, select the appropriate source system container from the drop-down list.

  2. Select the Container Specific Actions tab.

  3. Select Heuristic from the rightmost drop-down list on the toolbar.

  4. In the toolbar, click New.

    A new, empty record is created.

  5. In the Name field, enter a descriptive name for the SQL, and then click Save.

  6. Click in the Value field to open the Value dialog.

  7. Select a format for the tree view on the left side of the window.

    • Flat view displays the SQL entries in a list format in their order of execution.

    • Category view displays the entries by the categories Custom SQL and Stored Procedure.

      You can reorder the entries in the tree by dragging and dropping them.

  8. Click Add in the right-hand toolbar.

    A new, empty record is created in the top pane.

  9. Enter or select the appropriate information.

    Name. Enter a logical name for the SQL block.

    Type. Select one of the following:

    • Select SQL. Indicates the SQL you enter will be a SELECT statement.

    • Stored Procedure. Indicates the SQL you enter will be a stored procedure.

    Continue on Fail. Specifies whether an execution should proceed if a given SQL block fails.

    Retries. Specifies how many retries are allowed. If the number is not positive, a default number of one (1) will be used.

    Valid Database Platform. Specifies the valid database platforms against which the SQL will run. If this field is left empty, the SQL can be run against any database.


  10. In the lower-right side text box, enter a SQL statement.

    The SQL Statement tab to the left of the text box lists the supported SQL functions and DAC source system parameters that you can use in constructing custom SQL statements. Double-click a function or source system parameter to move it into the text box.

    The source systems parameters list contains the applicable source system parameters defined in the DAC repository, with the prefix @DAC_. During runtime, the DAC Server resolves the source system parameter and replaces its name with the runtime value.

    See Table 11-1 for a list of supported SQL functions.

    Table 11-1 Functions for Heuristics SQL

    getAnalyzeTableStatement(). Returns the default DAC analyze table statement.

    getDBType(). Returns the physical data source connection type (Oracle OCI8, Oracle Thin, DB2, DB2-390, MSSQL, Teradata, or BI Server).

    getImageSuffix(). Returns the table image suffix if one exists. Otherwise, the string is empty.

    getNamedSource(). Returns the physical connection name in DAC.

    getTableName(). Returns the table name.

    getTableOwner(). Returns the table owner.

    getTableSpace(). Returns the table space name if one exists. Otherwise, the string is empty.

    getTruncateTableStatement(). Returns the default DAC truncate table statement.

    toString(). Returns a string value.


  11. (Optional) Enter a comment about the SQL in the Comment tab.

  12. Click OK.

  13. If the SQL statement contains source tables, assign the heuristics SQL to the source tables.

    1. In the Design view, click the Tasks tab.

    2. Select the task whose source table is the one to which you want to assign the heuristics SQL.

    3. Click the Source Tables subtab.

    4. Select or query for the appropriate source table.

    5. Click in the Task Count Override field.

      The Choose SQL Override dialog is displayed.

    6. Select or query for the SQL heuristics rule you created in the preceding steps, then click OK.

  14. Assign the heuristics SQL to a target table.

    1. In the Design view, click the Tasks tab.

    2. Select the task whose target table is the one to which you want to assign the heuristics SQL.

    3. Click the Target Tables subtab.

    4. Select or query for the appropriate target table.

    5. Click in the Count Override field.

      The Choose SQL Override dialog is displayed.

    6. Select or query for the SQL heuristics rule you created in the preceding steps, then click OK.

      The heuristics SQL is now assigned to a task and will be processed when the task to which it is assigned is called by an execution plan.

Looping of Workflows

You can configure the full and incremental load commands for tasks to repeat (or loop) multiple times during the execution of an ETL process. This feature can be used to enhance performance, for example by parallelizing the load process on a fact table or by creating logical partitions on load tasks, as described later in this section.

The process for setting up workflows for looping involves two steps: First you define the looping properties, and then you create parameters to access the looping properties you defined.

This section includes the following topics:

  • Defining a Looping Property

  • Accessing the Loops Properties Using Parameters

  • Parallelizing the Load Process on a Fact Table

  • Creating Logical Partitions on Load Tasks to Increase Performance

Defining a Looping Property

Follow this procedure to define the properties for looping a workflow.

To define a workflow for looping:

  1. Enable the Informatica workflow property Configure Concurrent Execution:

    1. In Informatica Workflow Manager, open the appropriate workflow in the Workflow Designer.

    2. On the menu bar, select Workflows, and then select Edit.

    3. In the Edit Workflow dialog, select the Enabled check box for the Configure Concurrent Execution property.

    4. Click the Configure Concurrent Execution button.

    5. In the Configure Concurrent Execution dialog, select Allow concurrent run with same instance name.

    6. Click OK.

  2. In the DAC Design view, select the appropriate container from the drop-down list.

  3. In the Tasks tab, query for the task for which you want to configure looping.

  4. With the appropriate task selected in the top pane, click the Extended Properties subtab.

  5. In the bottom pane toolbar, click New.

    A new record is created with the name Loops. You cannot change the name of this record.

  6. Click in the Value field of the new record.

    The Property Value dialog is displayed.

  7. In the Loops field, enter a numeric value to specify how many times you want the workflow to repeat.

    The number of loops that you specify is applicable to all task execution types, that is, Informatica, SQL File, Stored Procedure, and External Program.

  8. (Optional) Select the Parallel check box to have the run instances of the same task execute in parallel. If you do not select the Parallel check box, the run instances of the same task will be executed serially.

    Note:

    • Run instances will be executed in parallel only if resources are available. For example, if your environment is set up to run 10 workflows in parallel, and a task's Number of Loops property is set to 20, only the first 10 instances will be executed. These 10 instances will have a Current Runs status of Running. The remaining 10 instances will have a status of Runnable. For a description of all Current Runs statuses, see "Current Runs Tab".

    • When the instances are executed in serial, the first failure encountered stops the rest of the executions.

    • When the instances are executed in parallel, any failure in the running batch stops the rest of the executions.

  9. (Optional) Select the Restart All check box to do the following:

    • Reissue the truncate command (if it exists as a task detail).

    • Mark all task details pertaining to the run instances as Queued. The Queued status enables them to be executed again.

      If any run instance of the workflow fails, the task itself is marked as Failed. If you do not select the Restart All check box, the execution of the task will resume from the point of failure.

  10. Click OK in the Property Value dialog.

  11. Click Save in the subtab toolbar.

When the task executes, DAC assigns an instance number to each instance of the workflow. You can view the instance number in the Task Details subtab of the Current Runs tab.

The instance number is accessible as a predefined DAC variable named @DAC_TASK_RUN_INSTANCE_NUMBER. You can use this DAC variable as a runtime value for parameters that you create. The DAC variable @DAC_TASK_RUN_INSTANCE_NUMBER is available for use with parameters at the source system and task level.

Accessing the Loops Properties Using Parameters

To access the information defined in the looping properties, you need to create parameters in DAC that will be consumed by Informatica at runtime. You can create static or runtime parameters at either the task or source system level. The parameter values can be any of the following:

  • Total number of loops

  • Current instance number

  • Current instance number in reverse

  • Conditional statement

    For example, if the loop number is one, provide value x, or else provide the value y.

For more information about how parameters are used in DAC, see "Defining and Managing Parameters".

To create a parameter with a static value:

  1. In the DAC Design view, select the appropriate container from the drop-down list.

  2. Do one of the following:

    • To set up a parameter at the task level, go to the Tasks tab, and click the Parameters subtab.

    • To set up a parameter at the source system level, go to the Source System Parameters tab.

  3. Click New in the toolbar.

  4. In the Name field, enter a name for the parameter.

  5. In the Data Type field, select Text.

  6. In the Load Type field, select one of the following.

    • Full. To specify a full load.

    • Incremental. To specify an incremental load.

    • Both. To specify both full and incremental loads.

  7. Click Save to save the record.

    You must save the record before proceeding.

  8. Click in the Value field to open the Enter Parameter Value dialog.

  9. Select Static.

  10. Enter the appropriate value.

  11. Click Save in the toolbar.

To create a parameter with a runtime value:

  1. In the Design view, select the appropriate container from the drop-down list.

  2. Do one of the following:

    • To set up a parameter at the task level, go to the Tasks tab, and click the Parameters subtab.

    • To set up a parameter at the source system level, go to the Source System Parameters tab.

  3. Click New in the toolbar.

  4. In the Name field, enter a name for the parameter.

  5. In the Data Type field, select Text.

  6. In the Load Type field, select one of the following.

    • Full. To specify a full load.

    • Incremental. To specify an incremental load.

    • Both. To specify both full and incremental loads.

  7. Click Save to save the record.

    You must save the record before proceeding.

  8. Click in the Value field to open the Enter Parameter Value dialog.

  9. Select Runtime.

  10. Select the appropriate runtime variable.

  11. Click OK.

  12. Click Save in the toolbar.

Parallelizing the Load Process on a Fact Table

Typically, a long-running task is constrained by single reader and writer threads on Informatica. The looping of workflows parallelizes the reads and writes so that mutually exclusive sets of records can be read and written without overlap.

For example, if a fact table load takes a long time to run because the volume of incremental data is high, then you can parallelize the load process by doing the following:

  1. Modify the schema for the staging table and introduce a numerical column called LOOP_NUMBER.

  2. In DAC, create a task level parameter for the extract process called NUMBER_OF_LOOPS, and set the value to 5.

  3. Modify the extract mapping for the Informatica task so that each record loaded into the staging table gets a LOOP_NUMBER value from 1 to 5. (A sample expression is shown after this procedure.)

  4. Create a task level parameter in DAC for the load process called NUMBER_OF_LOOPS, and set it to the same value as the extract mapping.

  5. Create a new parameter at the task level called CURRENT_LOOP_NUMBER, and set the runtime variable as @DAC_TASK_RUN_INSTANCE_NUMBER. This value will be populated with a different number, from 1 to 5, for each invocation of the task.

  6. In the extended properties for the task, create a new record for "Loops" and set the value as @DAC_NUMBER_OF_LOOPS.

  7. Modify the Informatica mapping to include a where clause for the load logic:

    WHERE LOOP_NUMBER = $$CURRENT_LOOP_NUMBER
    

    When DAC runs the load process, it will spawn off five processes in parallel and will send a specific value for the CURRENT_LOOP_NUMBER parameter to Informatica. Because the reads and writes occur in parallel, the data load time can be reduced.
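
One way to populate the LOOP_NUMBER column in step 3, assuming the extract mapping uses a SQL override against an Oracle source, is an expression such as the following; the source table and columns are illustrative only:

    SELECT src.ROW_ID,
           src.LAST_UPD,
           MOD(ROWNUM, 5) + 1 AS LOOP_NUMBER   -- distributes rows into buckets 1 through 5
    FROM   S_ORDER_ITEM src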

Creating Logical Partitions on Load Tasks to Increase Performance

Slow performance can be caused by a small number of tasks holding up many other tasks, which is often due to single reader and writer threads on Informatica. You can improve the performance of long-running tasks that hold up other tasks by creating logical partitions on the task itself. The procedure below provides a typical way to achieve logical partitioning of a task.
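
For reference, steps 1 and 2 of the procedure below amount to DDL similar to the following on an Oracle database; the staging table and index names are illustrative only:

    ALTER TABLE W_ORDER_FS ADD (log_part_num NUMBER(2));

    CREATE BITMAP INDEX W_ORDER_FS_B1 ON W_ORDER_FS (log_part_num);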

To create logical partitions on load tasks:

  1. On the staging table, create a column to store the logical partition number. In this example, the column is named log_part_num.

  2. Create a bitmap index on the log_part_num column.

  3. In the Informatica SDE mapping, make the extract-into-staging task populate the logical partition column. This divides the data into 'n' number of buckets.

    For example, mod(rownum, 4) +1 creates four buckets with the numbers 1, 2, 3, and 4.

    Now, the load mapping will run once in parallel for each logical partition. This can help relieve slow performance caused by single reader and writer threads and thus reduce the time taken by a mapping that processes a large volume of incremental data.

  4. In the Informatica SIL mapping, introduce a new mapping parameter called $$LOGICAL_PARTITION_NUM. Use this parameter in the SQL override.

    For example:

    select * from staging_table
    where ... and log_part_num = $$LOGICAL_PARTITION_NUM
    
  5. In the Informatica SIL workflow, edit the workflow properties, and select the "Enable Concurrent Execution" property.

  6. Define a looping property:

    1. In DAC, select the SIL task.

    2. Define a looping property, and assign 4 as the value for the number of loops.

      See "Defining a Looping Property" for instructions. Note that enabling the Informatica Workflow property "Configure Concurrent Execution" is a step in the procedure Defining a Looping Property, and you must complete this step.

  7. In DAC, define a parameter at the task level called $$LOGICAL_PARTITION_NUM. Use the Text parameter type, and assign it the runtime value @DAC_TASK_RUN_INSTANCE_NUMBER.

    When DAC runs this task, it will run four instances of it in parallel. Each instance invocation will be called with the following convention:

    • The $$LOGICAL_PARTITION_NUM will get distinct values, one per instance.

    • The workflows will be executed with the instance name suffixed by the instance number.

Customizing customsql.xml to Drop and Create Indexes and Analyze Tables

The customsql.xml file, located in the <DAC_Config_Location>\CustomSQLs directory, contains the default syntax DAC uses for dropping and creating indexes and analyzing tables. You can edit the customsql.xml file to change the behavior of these operations.

Note: Modifying the customsql.xml file affects the syntax on a global level, for all execution plans. If you want to do customizations at the table or index level, you should consider using the Action functionality. See "Using Actions to Optimize Indexes and Collect Statistics on Tables" for more information.

To edit the Analyze Table syntax:

  1. Open the customsql.xml file located in the <DAC_Config_Location>\CustomSQLs directory.

  2. Locate the Analyze Table syntax for the appropriate database type.

    For example, the syntax for an Oracle database is as follows:

    <SqlQuery name = "ORACLE_ANALYZE_TABLE" STORED_PROCEDURE = "TRUE"> DBMS_STATS.GATHER_TABLE_STATS(ownname => '@TABLEOWNER', tabname => '%1', estimate_percent => 30, method_opt => 'FOR ALL COLUMNS SIZE AUTO',cascade => true ) </SqlQuery>
    
  3. Edit the syntax.

    For example, to gather statistics for only the indexed columns, edit the syntax as follows:

    <SqlQuery name = "ORACLE_ANALYZE_TABLE" STORED_PROCEDURE = "TRUE"> DBMS_STATS.GATHER_TABLE_STATS(ownname => '@TABLEOWNER', tabname => '%1', estimate_percent => 30, method_opt => 'FOR ALL INDEXED COLUMNS',cascade => true ) </SqlQuery>
    

    Note:

    The variables @TABLEOWNER, %1, %2, and so on, will be substituted appropriately by the DAC when the statement is executed.

To edit the Create Index syntax:

  1. Open the customsql.xml file located in the <DAC_Config_Location>\CustomSQLs directory.

  2. Locate the Create Index syntax for the appropriate database type, and edit the syntax.

Performance Tuning the Siebel Change Capture Process

DAC performs the change capture process for Siebel source systems. This process has two components:

See "SQL for Change Capture and Change Capture Sync Processes" for the SQL blocks used for both of these processes.

Supporting Source Tables

The source tables that support the change capture process are as follows:

  • S_ETL_I_IMG tables, which hold the incremental (inserted, updated, or deleted) records captured for the current ETL run.

  • S_ETL_D_IMG tables, which hold the records that have been deleted from the transactional tables.

  • S_ETL_R_IMG tables, which hold the records created or modified within the prune time period.

Full and Incremental Change Capture Processes

The full change capture process (for a first full load) does the following:

  1. Inserts into the S_ETL_R_IMG table the records that have been created or modified within the prune time period.

  2. Creates a view on the base table. For example, a view V_CONTACT would be created for the base table S_CONTACT.

The incremental change capture process (for subsequent incremental loads) does the following:

  1. Queries for all records that have changed in the transactional tables since the last ETL date, filters them against the records in the S_ETL_R_IMG table, and inserts them into the S_ETL_I_IMG table.

  2. Queries the S_ETL_D_IMG table for all records that have been deleted from the transactional tables and inserts them into the S_ETL_I_IMG table.

  3. Removes the duplicates in the S_ETL_I_IMG table. This is essential for all the databases where "dirty reads" (queries returning uncommitted data from all transactions) are allowed.

  4. Creates a view that joins the base table with the corresponding S_ETL_I_IMG table.
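
For illustration, the incremental view created in the last step is conceptually similar to the following; the actual view definition and image table suffix are generated by DAC, so the names shown here are illustrative only:

    CREATE VIEW V_CONTACT AS
    SELECT t.*
    FROM   S_CONTACT t,
           S_ETL_I_IMG_1 i
    WHERE  t.ROW_ID = i.ROW_ID;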

Performance Tips for Siebel Sources

This section contains the following performance tips:

  • Performance Tip: Reduce Prune Time Period

  • Performance Tip: Eliminate S_ETL_R_IMG From the Change Capture Process

  • Performance Tip: Omit the Process to Eliminate Duplicate Records

  • Performance Tip: Manage Change Capture Views

  • Performance Tip: Determine Whether Informatica Filters on Additional Attributes

Performance Tip: Reduce Prune Time Period

Reducing the prune time period (in the Connectivity Parameters subtab of the Execution Plans tab) can improve performance, because with a lower prune time period the S_ETL_R_IMG table will contain fewer rows. The default prune time period is 2 days. You can reduce it to a minimum of 1 day.

Note: If your organization has mobile users, when setting the prune time period, you must consider the lag time that may exist between the timestamp of the transactional system and the mobile users' local timestamp. You should interview your business users to determine the potential lag time, and then set the prune time period accordingly.

Performance Tip: Eliminate S_ETL_R_IMG From the Change Capture Process

If your Siebel implementation does not have any mobile users (which can cause inaccuracies in the values of the "LAST_UPD" attribute), you can simplify the change capture process by doing the following:

  • Removing the S_ETL_R_IMG table.

  • Using the LAST_REFRESH_DATE rather than PRUNED_LAST_REFRESH_DATE.

To override the default DAC behavior, add the following SQL to the customsql.xml file before the last line in the file, which reads as </SQL_Queries>. The customsql.xml file is located in the dac\CustomSQLs directory.

<SqlQuery name = "CHANGE_CAPTURE_FULL_LOAD">
TRUNCATE TABLE S_ETL_I_IMG_%SUFFIX
;
TRUNCATE TABLE S_ETL_D_IMG_%SUFFIX
;
</SqlQuery>

<SqlQuery name = "CHANGE_CAPTURE_INCREMENTAL_LOAD">
TRUNCATE TABLE S_ETL_I_IMG_%SUFFIX
;
INSERT %APPEND INTO S_ETL_I_IMG_%SUFFIX
     (ROW_ID, MODIFICATION_NUM, OPERATION, LAST_UPD)
     SELECT
          ROW_ID
          ,MODIFICATION_NUM
          ,'I'
          ,LAST_UPD
     FROM
          %SRC_TABLE
     WHERE
          %SRC_TABLE.LAST_UPD > %LAST_REFRESH_TIME
          %FILTER
;
INSERT %APPEND INTO S_ETL_I_IMG_%SUFFIX
     (ROW_ID, MODIFICATION_NUM, OPERATION, LAST_UPD)
     SELECT
          ROW_ID
          ,MODIFICATION_NUM
          ,'D'
          ,LAST_UPD
     FROM
          S_ETL_D_IMG_%SUFFIX
     WHERE NOT EXISTS
     (
          SELECT
                 'X'
          FROM
                 S_ETL_I_IMG_%SUFFIX
          WHERE
                 S_ETL_I_IMG_%SUFFIX.ROW_ID = S_ETL_D_IMG_%SUFFIX.ROW_ID
          )
;
</SqlQuery>

<SqlQuery name = "CHANGE_SYNC_INCREMENTAL_LOAD">
DELETE
FROM S_ETL_D_IMG_%SUFFIX
WHERE
     EXISTS
     (SELECT
                 'X'
     FROM
                 S_ETL_I_IMG_%SUFFIX
     WHERE
                 S_ETL_D_IMG_%SUFFIX.ROW_ID = S_ETL_I_IMG_%SUFFIX.ROW_ID
                 AND S_ETL_I_IMG_%SUFFIX.OPERATION = 'D'
     )
;
</SqlQuery>

Performance Tip: Omit the Process to Eliminate Duplicate Records

When the Siebel change capture process runs on live transactional systems, it can run into deadlock issues when DAC queries for the records that changed since the last ETL process. To alleviate this problem, you need to enable "dirty reads" on the machine where the ETL is run. If the transactional system is on a database that requires "dirty reads" for change capture, such as MSSQL, DB2, or DB2-390, it is possible that the record identifier column (ROW_ID) values inserted in the S_ETL_I_IMG table may have duplicates. Before starting the ETL process, DAC eliminates such duplicate records so that only the record with the smallest MODIFICATION_NUM is kept. The SQL used by DAC is as follows:

<SqlQuery name = "FIND_DUPLICATE_I_IMG_ROWS">
SELECT
     ROW_ID, LAST_UPD, MODIFICATION_NUM
FROM
     S_ETL_I_IMG_%SUFFIX A
WHERE EXISTS
     (
     SELECT B.ROW_ID, COUNT(*)  FROM S_ETL_I_IMG_%SUFFIX B
               WHERE B.ROW_ID = A.ROW_ID
                 AND B.OPERATION = 'I'
                 AND A.OPERATION = 'I'
     GROUP BY
         B.ROW_ID
     HAVING COUNT(*) > 1
     )
AND A.OPERATION = 'I'
ORDER BY 1,2
</SqlQuery>

However, for situations where deadlocks and "dirty reads" are not an issue, you can omit the process that detects the duplicate records by using the following SQL block. Copy the SQL block into the customsql.xml file before the last line in the file, which reads as </SQL_Queries>. The customsql.xml file is located in the dac\CustomSQLs directory.

<SqlQuery name = "FIND_DUPLICATE_I_IMG_ROWS">
SELECT
     ROW_ID, LAST_UPD, MODIFICATION_NUM
FROM
     S_ETL_I_IMG_%SUFFIX A
WHERE 1=2
</SqlQuery>

Performance Tip: Manage Change Capture Views

DAC drops and creates the incremental views for every ETL process. This is done because DAC anticipates that the transactional system may add new columns on tables to track new attributes in the data warehouse. If you do not anticipate such changes in the production environment, you can set the DAC system property "Drop and Create Change Capture Views Always" to "false" so that DAC will not drop and create incremental views. On DB2 and DB2-390 databases, dropping and creating views can cause deadlock issues on the system catalog tables. Therefore, if your transactional database type is DB2 or DB2-390, you may want to consider setting the DAC system property "Drop and Create Change Capture Views Always" to "false." For other database types, this action may not enhance performance.

Note: If new columns are added to the transactional system and the ETL process is modified to extract data from those columns, and if views are not dropped and created, you will not see the new column definitions in the view, and the ETL process will fail.

Performance Tip: Determine Whether Informatica Filters on Additional Attributes

DAC populates the S_ETL_I_IMG tables by querying only for data that changed since the last ETL process. This may cause all of the records that were created or updated since the last refresh time to be extracted. However, the extract processes in Informatica may be filtering on additional attributes. Therefore, for long-running change capture tasks, you should inspect the Informatica mapping to see if it has additional WHERE clauses not present in the DAC change capture process. You can modify the DAC change capture process by adding a filter clause for a <named source>.<table name> combination in the ChangeCaptureFilter.xml file, which is located in the dac\CustomSQLs directory.

SQL for Change Capture and Change Capture Sync Processes

The SQL blocks used for the change capture and change capture sync processes are as follows:

<SqlQuery name = "CHANGE_CAPTURE_FULL_LOAD">
TRUNCATE TABLE S_ETL_I_IMG_%SUFFIX
;
TRUNCATE TABLE S_ETL_R_IMG_%SUFFIX
;
TRUNCATE TABLE S_ETL_D_IMG_%SUFFIX
;
INSERT %APPEND INTO S_ETL_R_IMG_%SUFFIX
     (ROW_ID, MODIFICATION_NUM, LAST_UPD)
     SELECT
          ROW_ID
          ,MODIFICATION_NUM
          ,LAST_UPD
     FROM
          %SRC_TABLE
     WHERE
          LAST_UPD > %PRUNED_ETL_START_TIME
          %FILTER
;
</SqlQuery>

<SqlQuery name = "CHANGE_CAPTURE_INCREMENTAL_LOAD">
TRUNCATE TABLE S_ETL_I_IMG_%SUFFIX
;
INSERT %APPEND INTO S_ETL_I_IMG_%SUFFIX
     (ROW_ID, MODIFICATION_NUM, OPERATION, LAST_UPD)
     SELECT
          ROW_ID
          ,MODIFICATION_NUM
          ,'I'
          ,LAST_UPD
     FROM
          %SRC_TABLE
     WHERE
          %SRC_TABLE.LAST_UPD > %PRUNED_LAST_REFRESH_TIME
          %FILTER
          AND NOT EXISTS
          (
          SELECT
                  ROW_ID
                  ,MODIFICATION_NUM
                  ,'I'
                  ,LAST_UPD
          FROM
                  S_ETL_R_IMG_%SUFFIX
          WHERE
                  S_ETL_R_IMG_%SUFFIX.ROW_ID = %SRC_TABLE.ROW_ID
                  AND S_ETL_R_IMG_%SUFFIX.MODIFICATION_NUM = %SRC_TABLE.MODIFICATION_NUM
                  AND S_ETL_R_IMG_%SUFFIX.LAST_UPD = %SRC_TABLE.LAST_UPD
                )
;
INSERT %APPEND INTO S_ETL_I_IMG_%SUFFIX 
          (ROW_ID, MODIFICATION_NUM, OPERATION, LAST_UPD)
          SELECT
                  ROW_ID
                  ,MODIFICATION_NUM
                  ,'D'
                  ,LAST_UPD
          FROM
                  S_ETL_D_IMG_%SUFFIX
          WHERE NOT EXISTS
          (
                  SELECT
                           'X'
                  FROM
                           S_ETL_I_IMG_%SUFFIX
                  WHERE
                            S_ETL_I_IMG_%SUFFIX.ROW_ID = S_ETL_D_IMG_%SUFFIX.ROW_ID
          )
;
</SqlQuery>

<SqlQuery name = "CHANGE_SYNC_INCREMENTAL_LOAD">
DELETE
FROM S_ETL_D_IMG_%SUFFIX
WHERE
          EXISTS
          (
          SELECT
                    'X'
          FROM
                    S_ETL_I_IMG_%SUFFIX
          WHERE
                    S_ETL_D_IMG_%SUFFIX.ROW_ID = S_ETL_I_IMG_%SUFFIX.ROW_ID
                    AND S_ETL_I_IMG_%SUFFIX.OPERATION = 'D'
          )
;
DELETE
FROM S_ETL_I_IMG_%SUFFIX
WHERE LAST_UPD &lt; %PRUNED_ETL_START_TIME
;
DELETE
FROM S_ETL_I_IMG_%SUFFIX
WHERE LAST_UPD &gt; %ETL_START_TIME
;
DELETE
FROM S_ETL_R_IMG_%SUFFIX
WHERE
          EXISTS
          (
           SELECT
                    'X'
          FROM
                    S_ETL_I_IMG_%SUFFIX
          WHERE
                    S_ETL_R_IMG_%SUFFIX.ROW_ID = S_ETL_I_IMG_%SUFFIX.ROW_ID
          )
;
INSERT %APPEND INTO S_ETL_R_IMG_%SUFFIX 
          (ROW_ID, MODIFICATION_NUM, LAST_UPD)
          SELECT
                  ROW_ID
                  ,MODIFICATION_NUM
                  ,LAST_UPD
          FROM
                  S_ETL_I_IMG_%SUFFIX
;
DELETE FROM S_ETL_R_IMG_%SUFFIX WHERE LAST_UPD &lt; %PRUNED_ETL_START_TIME
;
</SqlQuery>

Performance Tuning the ETL Process Using Tasks by Depth Command

The Tasks by Depth right-click command is available in the Execution Plans tab of the Execute view. This command provides a data representation of the execution graph by showing how many tasks in an execution plan are at each depth level. The depth specifies the level of dependency.

For example, a task at depth 4 depends on one or more tasks at the level of depth 3, and a task at depth 3 depends on one or more tasks at the level of depth 2, and so on. When an execution plan runs, DAC moves any task whose predecessors have completed into the execution queue. Therefore, tasks at a higher depth can execute at the same time as tasks with a lower depth.

Knowing how many tasks are at each depth level can help with performance tuning. A smaller number of tasks at any given depth can signify slow performance. For example, in Figure 11-1 task D is the only task at depth 1. This means that all of the tasks in depth 2 depend on task D. Situations such as this in which there is a small number of tasks at a depth level can cause slow performance and elongate the ETL process. You may want to tune the tasks at depths that have a small number of tasks.

Figure 11-1 Tasks by Depth

This graphic is described in the surrounding text.