Skip to Main Content
Return to Navigation

Common ETL Issues

This table provides answers to questions about common ETL issues.

Question

Answer

How do you debug a job that aborted or completed with warnings?

See Managing Aborted and Failed Jobs.

What if I drop and recreate an index?

Dropping and recreating an index would increase the ETL performance. However the downside of it is when there are any integrity constraints defined via indexes at the DB level and they not being handled in the ETL application. In such cases, data that gets loaded might not be cleaner and this might produce errors during recreating the index.

How do I resolve the following issue I receive when running any job in DataStage:

I receive the error message - Could not load drsoci.so when pointed to an Oracle database and the DataStage server is UNIX.

Verify the dsenv file, which is a centralized file for storing environmental variables in the DataStage Server. It resides in $DSHOME, where $DSHOME identifies the DataStage main directory (for example /u1/dsadm/IBM WebSphere/DataStage/DSEngine).

The dsenv file is a series of Bourne shell arguments, which are referenced during DataStage server startup and can be referenced by interactive users or other programs or scripts. For a connection using a non-wire protocol driver, you generally need to specify the following in the dsenv file:

  • Environment variables required by the database client software

  • Database home location

  • Database library directory

Certain Plug-ins require shared libraries to be loaded and you need to include the library path in an environment variable. The names of the library path environment variables is platform dependent:

  • Solaris Platform = LD_LIBRARY_PATH

  • HP-UX Platform = SHLIB_PATH

  • AIX Platform = LIBPATH

  • Compaq Tru64 Platform = LD_LIBRARY_PATH

  • LINUX Platform = LD_LIBRARY_PATH

The following provides typical entries for commonly used databases:

# Oracle 8i

ORACLE_HOME=/space/oracle8i

ORAHOME=/space/oracle8i

LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$ORACLE_HOME/lib:$ORACLE_HOME/rdbms/lib;export

LD_LIBRARY_PATH

ORACLE_SID=WSMK5

ORASID=WSMK5

export ORACLE_HOME ORAHOME ORACLE_SID ORASID

See IBM Information Server: Planning Installation and Configuration Guide

How do I resolve the following issue I receive when running any job in DataStage:

I receive the error message - Could not load drsdb2.so when pointed to a DB2 database and the DataStage server is UNIX.

Verify the dsenv file, which is a centralised file for storing environmental variables in the DataStage Server. It resides in $DSHOME, where $DSHOME identifies the DataStage main directory (for example /u1/dsadm/IBM WebSphere/DataStage/DSEngine).

The dsenv file is a series of Bourne shell arguments, which are referenced during DataStage server startup and can be referenced by interactive users or other programs or scripts. For a connection using a non-wire protocol driver, you generally need to specify the following in the dsenv file:

  • Environment variables required by the database client software

  • Database home location

  • Database library directory

Certain Plug-ins require shared libraries to be loaded and you need to include the library path in an environment variable. The names of the library path environment variables is platform dependent:

  • Solaris Platform = LD_LIBRARY_PATH

  • HP-UX Platform = SHLIB_PATH

  • AIX Platform = LIBPATH

  • Compaq Tru64 Platform = LD_LIBRARY_PATH

  • LINUX Platform = LD_LIBRARY_PATH

The following provides typical entries for commonly used databases:

#DB2 6.1

DB2DIR=/opt/IBMDB2/V6.1;export DB2DIR

DB2INSTANCE=DB2inst1; export DB2INSTANCE

INSTHOME=/export/home/DB2inst1;export INSTHOME

PATH=$PATH:$INSTHOME/sqllib/bin:$INSTHOME/sqllib/adm:$INSTHOME/sqllib/misc export PATH

LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$INSTHOME/sqllib/lib;export LD_LIBRARY_PATH THREADS_FLAG=native;export THREADS_FLAG

How do I switch to a new project when the warehouse tables have already been loaded with data?

There might be a need to switch to new project when the warehouse tables have already been loaded for some time. In such cases, there is some project specific control data that must be restored onto the new project. For this purpose, it is always a good idea to backup this control data at some regular intervals of time after significant chunk of ETL loading gets completed.

PeopleSoft delivers utilities that the backup/recovery process.

Refer the section 'Running Datastage Project Utilities' and the following subsections in the PeopleSoft EPM Red Paper: ETL Implementation Steps (found in My Oracle Support) procedure to run these utilities:

  • Backup_SurrogateKey_HashFile – Utility

  • Backup_DateTime_HashFiles – Utility

  • Recovery_DateTime_HashFiles – Utility

  • Recovery_SurrogateKey_HashFile – Utility

If a job aborts after half of the one million rows are written to the tables, what will happen?

If the Transaction size is selected as zero and if the job aborts in the middle, then the job will rollback the transactions since it follows the principle of Two-way commit. If the Transaction size is anything other than zero and if the job fails in the middle, then the job will perform commits for the number of rows that processed till the error message.

How do I report an issue with a job log for a job that completed with warnings or errors?

Report an issue to Global Customer Support. To report an issue, you must include the job log of the last run. In DataStage Designer, view the log for the job in detailed view mode. Select Project, Print from the menu. In the Print dialog box, select the All entries, Full details, and Print to file options. Click OK, name the file, and send the log along with your issue description and other pertinent information.