Designing Data Integrator Projects

Execution Strategies

Sun Data Integrator automatically employs the most optimum execution strategy for collaborations. The strategy that is employed depends on the specific nature of the collaboration. If you do not want Data Integrator to determine the best execution strategy, you can configure a collaboration for either the staging or the pipeline execution strategy depending on what your Collaboration is set up to do. For example, if your collaboration business logic contains Java operators, you can only use the pipeline strategy. The following describes the criteria that Data Integrator uses to determine the best execution strategy, and these are also the criteria to use when you force an execution strategy.

Execution Strategy Selection

The execution strategy configuration for collaborations is set to the Best Fit option by default. When Data Integrator determines what execution strategy to use for a collaboration, it evaluates the collaboration for specific attributes. For example, it takes tables and columns into consideration. In addition, the selected execution strategy depends on whether a collaboration contains Java operators, which are operators that are not available across all supported databases. Examples of Java operators are date transformation operators and operators that parse business names and addresses and that normalize names. Projects with Java operators must be executed with a pipeline strategy.

You can force the execution strategy for a collaboration by changing its setting from Best Fit to Staging or Pipeline, as described in "Forcing Execution Strategies for Collaborations” .

Data Integrator uses the following execution methods depending on a collaboration's attributes:

Direct/Simple Execution Strategy

With the direct/simple execution strategy, all extraction, transformation, and loading happens in a single database. When the Best Fit option is enabled, Data Integrator uses this strategy under the following conditions:

One Pass Execution Strategy

With one pass execution, extraction and transformation occur in the source database. When the Best Fit option is enabled, Data Integrator uses this strategy under the following conditions:

Staging Execution Strategy

With the staging execution strategy, all source tables are extracted from source databases and staged in temporary tables in the target database. Join and Transformation happens in the target database. This setting is used automatically when the Best Fit option is enabled and the conditions below occur. You can also select this option manually to force its use, in which case this execution strategy is recommended under the following conditions:

Pipeline Execution Strategy

With the Pipeline execution strategy, transformation and loading (indirectly to the target database table) occurs in the internal database engine. This setting is used automatically when the Best Fit option is enabled and the conditions below occur. You can also select this option manually to force its use, in which case this execution strategy is recommended under the following conditions:

Whitespace Considerations

Sun Data Integrator handles whitespace differently depending on the execution strategy. When joining a flat file table and an Oracle table where the comparison column in the Oracle table contains whitespace, refer to the table below.

Table 1–1 Execution Strategies for Flat File and Oracle Table Joins

Strategy Specified 

Description 

Best Fit 

Uses the staging execution strategy since the source tables are from different databases. The results will be the same as if staging was selected. 

Staging 

Data Integrator extracts source tables from source databases and stages the data in temporary tables in the target database. By default, whitespace is trimmed. 

Pipeline  

Data Integrator uses an internal database engine instead of temporary tables, accessing data directly from the source tables rather than extracting it to temporary tables. To avoid whitespace causing failure in the join condition, add LTRIM/RTRIM operators to the Oracle table column. The result will be the same as Staging/Best Fit. 


Note –

In future this feature will be made obsolete