
Configuring Delivered ETL Content

This table provides answers to questions about configuring and enhancing delivered ETL content.

Question

Answer

What are some of the configurations I may want to perform?

  • Convert an incremental load job to a destructive load job.

  • Parameterize the hash file directory so that hashed files are stored in user-defined directories rather than the default project directory.

  • Remove lookups that do not pertain to your business.

  • Configure jobs for initial load.

How do I convert an incremental load job to a destructive load job?

See Converting Incremental Load Jobs to Destructive Load Jobs.

How do I run ETL jobs in destructive mode after some have already run as incremental loads?

See Converting Incremental Load Jobs to Destructive Load Jobs.

How do I add a new environmental variable?

See Adding New Environmental Variables.

What if I make a change that requires a new attribute in a dimension table?

If the data model changes because a new attribute is added to the EPM database, you must update the corresponding dimension job to incorporate the new attribute; otherwise the job will fail. If there is no source for the new attribute, you can assign a default value to it in the dimension job by using a delivered routine.
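For illustration only, the following Python sketch shows the general pattern of defaulting an attribute that has no source; the column names and the default value are hypothetical, and in a delivered dimension job the default is supplied by the delivered routine rather than by code like this:

```python
# Illustrative only: assign a default to a new dimension attribute that has no
# source column. Column names and the default value are hypothetical; in a
# delivered dimension job this assignment is performed by a delivered routine.
DEFAULT_NEW_ATTR = "-"

def transform_dimension_row(source_row: dict) -> dict:
    """Map one source row to the target dimension row, adding the new attribute."""
    return {
        "CUSTOMER_ID": source_row["CUSTOMER_ID"],
        "CUSTOMER_NAME": source_row["CUSTOMER_NAME"],
        # New attribute in the data model with no corresponding source column:
        # populate it with the default value.
        "NEW_ATTR": source_row.get("NEW_ATTR", DEFAULT_NEW_ATTR),
    }
```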

What if I make a change that requires a new dimension in a fact table?

If a new dimension key is added to a fact table in the database, this is a change to the data model, and the ETL job must change as well. If the key belongs to a new dimension, a new job must be developed for that dimension. The fact job must then be updated so that the correct dimension key is looked up and the corresponding SID (surrogate ID) is populated in the fact table.
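As a rough illustration of the SID population step, the Python sketch below resolves a natural key from the source to the surrogate ID stored on the fact row; the table and column names (REGION_CODE, REGION_SID, SALES_AMT) are hypothetical and are not part of the delivered content:

```python
# Illustrative only: look up the surrogate ID (SID) for a newly added dimension
# key before writing the fact row. All names used here are hypothetical.
UNKNOWN_SID = -1  # fallback when no matching dimension row has been loaded

def resolve_fact_row(fact_source: dict, region_sid_by_code: dict) -> dict:
    """Attach the surrogate key for the new REGION dimension to the fact row."""
    region_sid = region_sid_by_code.get(fact_source["REGION_CODE"], UNKNOWN_SID)
    return {
        "SALES_AMT": fact_source["SALES_AMT"],
        "REGION_SID": region_sid,  # new dimension key column on the fact table
    }
```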

What if I make a change that requires a new measure in a fact table?

If a new measure is added to a fact table in the database, this is a change to the data model, and the ETL job must change as well. The fact job must be updated so that the new measure is assigned correctly, either mapped directly from the source or derived with whatever logic your requirements dictate.
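For illustration, the sketch below populates one measure directly from the source and derives another with simple logic; the column names and the derivation rule are hypothetical:

```python
# Illustrative only: populate a new fact measure either directly from the source
# or by applying additional logic. Column names and the rule are hypothetical.
def add_measures(fact_source: dict) -> dict:
    gross_amt = fact_source["GROSS_AMT"]          # mapped straight from the source
    discount_pct = fact_source.get("DISCOUNT_PCT", 0.0)
    return {
        "GROSS_AMT": gross_amt,
        # New measure with no direct source column: derive it per business rules.
        "NET_AMT": gross_amt * (1.0 - discount_pct / 100.0),
    }
```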

What if I make a change that requires a new dimension table?

A new ETL job must be developed for the new dimension table, based on your requirements.

What if I make a change that requires a new fact table?

A new ETL job must be developed for the new fact table, based on your requirements.

What are some techniques I can use to enhance ETL performance?

  • InterProcess Stage: The IPC stage implements pipeline parallelism. It explicitly de-links two passive stages so that they run in separate processes. As a good practice, insert an IPC stage before a database write stage. (A conceptual sketch follows this list.)

  • Link Partitioner/Link Collector Stages: The Link Partitioner and Link Collector stages are used together to implement partition parallelism. Use them when you have a specific requirement to further improve job performance; they are most effective when long-running processing can be run in parallel. Because the logic that runs in parallel is typically identical, it can be placed in a local container, so any change to the shared parallel portion of the job is made in a single place.
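The following Python sketch is purely conceptual and is not DataStage code: it shows pipeline parallelism, where a transform step and a write step run in separate processes connected by a queue, which is roughly the role an IPC stage plays before a database write stage:

```python
# Conceptual sketch of pipeline parallelism: an upstream "transform" process and
# a downstream "write" process run concurrently, linked by a queue (analogous to
# placing an InterProcess stage between two passive stages in a server job).
from multiprocessing import Process, Queue

def transform(out_q: Queue) -> None:
    for n in range(5):
        out_q.put({"row": n})   # rows stream downstream as soon as they are ready
    out_q.put(None)             # end-of-data marker

def write(in_q: Queue) -> None:
    while (row := in_q.get()) is not None:
        print("writing", row)   # stands in for the database write stage

if __name__ == "__main__":
    q = Queue()
    producer = Process(target=transform, args=(q,))
    consumer = Process(target=write, args=(q,))
    producer.start(); consumer.start()
    producer.join(); consumer.join()
```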

How can I enhance the parallel processing capabilities of ETL server jobs?

The InterProcess stage and the Link Partitioner/Link Collector stages, described in the previous answer, are the primary means of adding pipeline and partition parallelism to ETL server jobs.

See WebSphere DataStage Development: Designer Client Guide