ETL collaborations can extract data without filtering, or with filtering using runtime inputs. You can also configure the batch size, and configure a collaboration to use the same source table multiple times. Perform any of the following procedures to configure data extraction.
Sun Data Integrator allows you to pass values, known as runtime inputs, to ETL collaborations at runtime. You can use these values in extraction conditions. However, the use of such dynamic values is not limited to extraction conditions; you can also pass values from BPEL business processes.
The following procedure describes how to add input runtime arguments to a Collaboration.
Open the collaboration you want to edit.
Right-click the ETL Collaboration Editor window and select Runtime Inputs.
The Add Input Runtime Arguments dialog box appears.
Click Add.
An empty row appears.
Double-click the empty row under Argument Name, and enter the name of the source record to be filtered.
Press Tab and enter the content that the record must contain to be selected.
Press Tab and select the SQL type for the record.
Press Tab and enter a number indicating the maximum length of the record.
Press Tab and enter a number indicating the scale for the record.
Click OK.
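A runtime input behaves much like a bind parameter in a parameterized query: the extraction condition is fixed in the collaboration, and the filter value arrives when the collaboration runs. The following sketch illustrates that idea with Python's sqlite3 module; the in-memory EMP_TBL stand-in and the job_filter value are illustrative assumptions, not part of the Sun Data Integrator API.

```python
import sqlite3

# Build a small in-memory stand-in for the EMP_TBL source table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EMP_TBL (NAME TEXT, ID INTEGER, JOB_CODE TEXT, DEPT_CODE TEXT)"
)
conn.executemany(
    "INSERT INTO EMP_TBL VALUES (?, ?, ?, ?)",
    [("Dave", 1, "p", "D1"), ("Judy", 2, "c", "D2")],
)

# The runtime input plays the role of the bind parameter below: the
# extraction condition is fixed, but its value is supplied at run time.
job_filter = "p"  # hypothetical value passed in when the collaboration runs
rows = conn.execute(
    "SELECT NAME, ID FROM EMP_TBL WHERE JOB_CODE = ?", (job_filter,)
).fetchall()
print(rows)  # [('Dave', 1)]
```

Only the records whose JOB_CODE matches the supplied value are extracted; changing the runtime value changes the result without editing the collaboration.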
To increase performance during collaboration execution, you can configure the batch size for the temporary tables created for joined source tables. By tuning the batch size you can load data more efficiently into source tables.
By default, 5000 rows are populated at the same time into a source table. There is no fixed upper limit to the batch size; the limit is determined by the amount of internal memory available on the machine running the collaboration. Generally, the higher the number the better, but adjust the value to determine the optimum performance.
The source table batch size only affects temporary source tables. To limit the number of rows fetched at a time, specify the batch size in the Properties panel for the target table.
Open the collaboration you want to edit.
Right-click the source table for which to set the batch size, and then select Properties.
The Properties panel appears.
In the Batch Size property (under the Expert heading), enter the number of rows to populate at the same time into the temporary source table.
Click OK.
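Conceptually, batch loading means inserting rows into the temporary source table in fixed-size chunks rather than one at a time. The sketch below, again using sqlite3 as a stand-in, shows the pattern with the default batch size of 5000; the SRC_TMP table and load_in_batches helper are hypothetical names for illustration.

```python
import sqlite3

BATCH_SIZE = 5000  # default number of rows populated per batch


def load_in_batches(conn, rows, batch_size=BATCH_SIZE):
    """Insert rows into the temporary source table in fixed-size batches."""
    total = 0
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        conn.executemany("INSERT INTO SRC_TMP VALUES (?, ?)", batch)
        total += len(batch)
    conn.commit()
    return total


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SRC_TMP (ID INTEGER, NAME TEXT)")
rows = [(i, "name%d" % i) for i in range(12000)]
loaded = load_in_batches(conn, rows)
print(loaded)  # 12000 rows, loaded in batches of 5000, 5000, and 2000
```

Larger batches mean fewer round trips to the database, but each batch must fit in memory, which is why available memory caps the practical batch size.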
Sun Data Integrator allows you to map a column in a source table to only one column in a target table. If you need to map one source column to multiple target columns, you can use multiple instances of the same source table with different aliases. This topic provides a scenario and an example of doing this.
The project has two source tables: EMP_TBL and CODES_TBL. You can create a join view with these tables, and then drag another view of CODES_TBL to the ETL Collaboration Editor canvas to create a third join. The third join is used in a code lookup.
The following table displays the sample data for the EMP_TBL source table:
Table 1–3 Employee Table

NAME | ID | JOB CODE | DEPT CODE
Dave | 1  | p        | D1
Judy | 2  | c        | D2
The following table displays the sample data for the CODES_TBL source table:
Table 1–4 Company Codes

CODE | VALUE
D1   | Human Resource
D2   | Marketing
P    | Permanent
C    | Contractor
The following figure shows the Collaboration and mapping with the correct data from a test run. The lookup loads the descriptions for both jobs and departments from the CODES_TBL table. In this example, the table CODES_TBL is used twice in the join condition, with the aliases S2 and S3: S2.CODE is joined with S1.JOB_CODE, and S3.CODE is joined with S1.DEPT_CODE.
As you can see in the following figure, the first join view shows the condition S1.JOB_CODE = S2.CODE. This loads the job descriptions from CODES_TBL into the target table column JOB.
The following figure shows the second join view with the condition S1.DEPT_CODE = S3.CODE. This loads the department descriptions from the CODES_TBL to the target table column DEPT.
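The two join views above amount to joining CODES_TBL twice under different aliases. The following sketch reproduces the scenario with sqlite3 and the sample data from Tables 1–3 and 1–4; the COLLATE NOCASE declarations are an assumption standing in for a case-insensitive database collation, since the sample job codes are lowercase in EMP_TBL but uppercase in CODES_TBL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# COLLATE NOCASE mimics a case-insensitive collation so that the lowercase
# job codes in EMP_TBL match the uppercase codes in CODES_TBL.
conn.execute(
    "CREATE TABLE EMP_TBL (NAME TEXT, ID INTEGER, "
    "JOB_CODE TEXT COLLATE NOCASE, DEPT_CODE TEXT)"
)
conn.execute("CREATE TABLE CODES_TBL (CODE TEXT COLLATE NOCASE, VALUE TEXT)")
conn.executemany(
    "INSERT INTO EMP_TBL VALUES (?, ?, ?, ?)",
    [("Dave", 1, "p", "D1"), ("Judy", 2, "c", "D2")],
)
conn.executemany(
    "INSERT INTO CODES_TBL VALUES (?, ?)",
    [("D1", "Human Resource"), ("D2", "Marketing"),
     ("P", "Permanent"), ("C", "Contractor")],
)

# CODES_TBL appears twice, under the aliases S2 and S3, so the one lookup
# table can feed both the JOB and DEPT target columns.
rows = conn.execute(
    """
    SELECT S1.NAME, S2.VALUE AS JOB, S3.VALUE AS DEPT
    FROM EMP_TBL S1
    JOIN CODES_TBL S2 ON S1.JOB_CODE = S2.CODE
    JOIN CODES_TBL S3 ON S1.DEPT_CODE = S3.CODE
    ORDER BY S1.ID
    """
).fetchall()
print(rows)
# [('Dave', 'Permanent', 'Human Resource'), ('Judy', 'Contractor', 'Marketing')]
```

Each alias acts as an independent instance of the source table, which is exactly what dragging a second view of CODES_TBL onto the canvas accomplishes in the ETL Collaboration Editor.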