Designing Data Integrator Projects

Fine-Tuning the ETL Process

ETL collaborations can extract data unfiltered, or they can filter it using runtime inputs. You can also configure the batch size and configure the collaboration to use the same source table multiple times. Perform any of the following procedures to configure the data extraction.

Filtering Source Data Using Runtime Inputs

Sun Data Integrator allows you to pass values, known as runtime inputs, to ETL collaborations at runtime. You can use these values in extraction conditions. However, the use of such dynamic values is not limited to extraction; you can also pass values from BPEL business processes.

The following procedure describes how to add input runtime arguments to a Collaboration.

Procedure: To Filter Source Data Using Runtime Inputs

  1. Open the collaboration you want to edit.

  2. Right-click the ETL Collaboration Editor window and select Runtime Inputs.

    The Add Input Runtime Arguments dialog box appears.

    Figure shows the Add Input Runtime Arguments window.

  3. Click Add.

    An empty row appears.

  4. Double-click the empty row under Argument Name and enter the name for the source record to be filtered.

  5. Press Tab and enter the content that the record must contain to be selected.

  6. Press Tab and select the SQL type for the record.

  7. Press Tab and enter a number indicating the maximum length of the record.

  8. Press Tab and enter a number indicating the scale for the record.

  9. Click OK.
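
The effect of a runtime input in an extraction condition can be pictured as a parameterized query whose value is supplied only when the collaboration runs. The following is a minimal sketch of that idea using Python's built-in sqlite3 module, not Sun Data Integrator itself. The table, the column names, the sample rows, and the function name extract_with_runtime_input are assumptions made for illustration.

```python
import sqlite3


def extract_with_runtime_input(conn, dept_code):
    """Return the rows whose DEPT_CODE matches the value passed at run time."""
    # The ? placeholder plays the role of the runtime argument: the query
    # text is fixed at design time, and the value is bound at execution.
    cur = conn.execute(
        "SELECT NAME, DEPT_CODE FROM EMP_TBL WHERE DEPT_CODE = ? ORDER BY NAME",
        (dept_code,),
    )
    return cur.fetchall()


# Hypothetical source table, created in memory for the illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMP_TBL (NAME TEXT, DEPT_CODE TEXT)")
conn.executemany(
    "INSERT INTO EMP_TBL VALUES (?, ?)",
    [("Dave", "D1"), ("Judy", "D2")],
)

rows_d1 = extract_with_runtime_input(conn, "D1")
print(rows_d1)
```

Running the same function with a different value selects a different subset of rows without changing the query, which is the point of passing the filter value at runtime.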

Setting the Batch Size for Joined Tables

To increase performance during collaboration execution, you can configure the batch size for the temporary tables created for joined source tables. By tuning the batch size you can load data more efficiently into source tables.

By default, 5000 rows are populated at the same time into a source table. The batch size has no fixed upper limit; the practical limit is determined by the amount of internal memory available on the machine running the collaboration. Generally, the lower the number the better, but adjust the value to determine the optimum performance.


Note –

The source table batch size only affects temporary source tables. To limit the number of rows fetched at a time, specify the batch size in the Properties panel for the target table.


Procedure: To Set the Batch Size for Joined Tables

  1. Open the collaboration you want to edit.

  2. Right-click the source table to set the batch size for, and then select Properties.

    The Properties panel appears.

    Figure shows the Source Table – Properties window.

  3. In the Batch Size property (under the Expert heading), enter the number of rows to populate at the same time into the temporary source table.

  4. Click OK.
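
To make the batch size setting concrete, the sketch below loads rows into a temporary table a fixed number at a time. It is written in Python with the built-in sqlite3 module and only illustrates the general batching technique; it does not reproduce how Sun Data Integrator populates its temporary tables. The table TEMP_SRC, the generated rows, and the function load_in_batches are hypothetical, and the default of 5000 simply mirrors the default batch size mentioned above.

```python
import sqlite3
from itertools import islice


def load_in_batches(conn, rows, batch_size=5000):
    """Insert rows into TEMP_SRC batch_size at a time; return the batch count."""
    row_iter = iter(rows)
    batches = 0
    while True:
        # Only one batch is held in memory at a time, which is why the
        # usable batch size is bounded by available memory.
        batch = list(islice(row_iter, batch_size))
        if not batch:
            break
        conn.executemany("INSERT INTO TEMP_SRC VALUES (?, ?)", batch)
        conn.commit()
        batches += 1
    return batches


# Hypothetical temporary source table and generated sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TEMP_SRC (ID INTEGER, VALUE TEXT)")
rows = ((i, f"row-{i}") for i in range(12000))

batch_count = load_in_batches(conn, rows)
row_count = conn.execute("SELECT COUNT(*) FROM TEMP_SRC").fetchone()[0]
print(batch_count, row_count)
```

With 12000 rows and a batch size of 5000, the load runs in three batches. Timing the same load with different batch_size values is one way to look for the best-performing setting in a comparable situation.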

Using Table Aliases with Multiple Source Table Views

Sun Data Integrator only allows you to map a column in a source table to one column in a target table. If you need to map one source column to multiple target columns, you can use multiple instances of the same source table with different aliases. This topic gives a scenario and example for doing this.

The project has the following source tables: EMP_TBL and CODES_TBL. You can create a join view with these tables and you can drag another view of the CODES_TBL to the ETL Collaboration Editor canvas to create a third join. The third join is used in a code lookup.

The following table displays the sample data for the EMP_TBL source table:

Table 1–3 Employee Table

NAME    ID    JOB CODE    DEPT CODE
Dave                      D1
Judy                      D2

The following table displays the sample data for the CODES_TBL source table:

Table 1–4 Company Codes

CODE    VALUE
D1      Human Resource
D2      Marketing
        Permanent
        Contractor

The following figure shows the Collaboration and mapping with the correct data from a test run. The lookup loads the description for both jobs and departments from the CODES_TBL table. In this example, the table CODES_TBL is used twice in the join, with aliases S2 and S3. In the join conditions, S2.CODE is joined with S1.JOB_CODE and S3.CODE is joined with S1.DEPT_CODE.

Figure shows the contents of a table in an ETL collaboration.

As you can see in the following figure, the first join view shows the condition S1.JOB_CODE = S2.CODE. This will load the job descriptions from the CODES_TBL to the target table column JOB.

Figure shows the Edit Join Condition window.

The following figure shows the second join view with the condition S1.DEPT_CODE = S3.CODE. This loads the department descriptions from the CODES_TBL to the target table column DEPT.

Figure shows the Edit Join Conditions window.
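
The two join conditions correspond to an ordinary SQL query that joins CODES_TBL twice under different aliases. The following sketch, in Python with the built-in sqlite3 module, shows that query shape under stated assumptions; it is not the SQL that Sun Data Integrator generates. The job codes P1 and C1 are hypothetical values introduced for the illustration, because the sample tables above do not show the job codes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE EMP_TBL (NAME TEXT, JOB_CODE TEXT, DEPT_CODE TEXT);
    CREATE TABLE CODES_TBL (CODE TEXT, VALUE TEXT);
    """
)

# JOB_CODE values P1 and C1 are hypothetical; only the department codes
# D1 and D2 appear in the sample data.
conn.executemany(
    "INSERT INTO EMP_TBL VALUES (?, ?, ?)",
    [("Dave", "P1", "D1"), ("Judy", "C1", "D2")],
)
conn.executemany(
    "INSERT INTO CODES_TBL VALUES (?, ?)",
    [
        ("D1", "Human Resource"),
        ("D2", "Marketing"),
        ("P1", "Permanent"),
        ("C1", "Contractor"),
    ],
)

# S2 looks up the job description and S3 looks up the department
# description, both from the same CODES_TBL table.
QUERY = """
SELECT S1.NAME, S2.VALUE AS JOB, S3.VALUE AS DEPT
FROM EMP_TBL S1
JOIN CODES_TBL S2 ON S1.JOB_CODE = S2.CODE
JOIN CODES_TBL S3 ON S1.DEPT_CODE = S3.CODE
ORDER BY S1.NAME
"""

result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
```

Each alias acts as an independent view of CODES_TBL, so one row of EMP_TBL can pick up two different rows of the lookup table, one for the JOB column and one for the DEPT column.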