6 Loading Strategies

This chapter explains the loading strategies used to load data into the staging area. These strategies are implemented in the Loading Knowledge Modules (LKMs).

This chapter includes the following sections:

  • Section 6.1, "Loading Process"

  • Section 6.2, "Case Studies"

6.1 Loading Process

A loading process is required when source data needs to be loaded into the staging area. This loading is needed when some transformations take place in the staging area and the source schema is not located on the same data server as the staging area. The staging area is the target of the loading phase.

6.1.1 Loading Process Overview

A typical loading process works in the following way:

  1. A temporary loading table is dropped (if it exists) and then created in the staging area.
  2. Data is loaded from the source into this loading table using a loading method.

    Actions 1 and 2 are repeated for all the source data that needs to be moved to the staging area.

    The data is used in the integration phase to load the integration table.

  3. After the integration phase, before the mapping completes, the temporary loading table is dropped.

6.1.2 Loading Table Structure

The loading process creates a loading table in the staging area. This loading table is typically prefixed with C$.

A loading table represents an execution unit, not a source datastore. There is no direct mapping between the source datastore and the loading table. Execution units appear in the physical diagram of the mapping editor.

The following cases illustrate the notion of execution unit; a sketch of a loading table for the joined case is shown after the list:

  • If a source CUSTOMER table has only two attributes, CUST_ID and CUST_NAME, used in mappings and joins on the staging area, then the loading table contains only an image of these two attributes. Attributes that are not needed for the rest of the integration flow do not appear in the loading table.

  • If a CUSTOMER table is filtered on CUST_AGE on the source, and CUST_AGE is not used afterwards, then the loading table does not include CUST_AGE. The loading process executes the filter in the source data server, and the loading table contains only the filtered records.

  • If two tables CUSTOMER and SALES_REPS are combined using a join on the source and the resulting execution unit is used in transformations taking place in the staging area, the loading table will contain the combined attributes from CUSTOMER and SALES_REPS.

  • If all the attributes of a source datastore are mapped and this datastore is not joined on the source, then the execution unit is the whole source datastore. In that case, the loading table is the exact image of the source datastore. This is the case for source technologies with no transformation capabilities such as File.
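
For illustration, in the joined CUSTOMER/SALES_REPS case above, the loading table created in the staging area might resemble the following. This is only a sketch: the C$ table name, schema, attribute names, and datatypes are hypothetical and are generated by the LKM at run time.

create table ODI_STAGING.C$_0CUSTOMER
(
    CUST_ID     NUMBER(10),
    CUST_NAME   VARCHAR2(100),
    REP_NAME    VARCHAR2(100)
)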

6.1.3 Loading Method

The loading method is the key to optimal performance when loading data from a source to the staging area. There are several loading methods, which can be grouped into the following categories:

6.1.3.1 Loading Using the Agent

The run-time agent is able to read a result set using JDBC on a source server and write this result set using JDBC to the loading table in the staging area. To use this method, the knowledge module needs to include a command with a SELECT statement on the source and a corresponding INSERT statement on the target.

This method may not be suited for large volumes: every row passes through the agent, read from the source using the array fetch feature and written to the staging area using the batch update feature.
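
For example, an agent-based LKM typically pairs a command on the source with a command on the target, along the lines of the following simplified sketch (the table and attribute names are hypothetical; the actual template syntax is shown in Section 6.2.1):

Command on Source

select CUST_ID, CUST_NAME from CUSTOMER where CUST_AGE > 21

Command on Target

insert into C$_0CUSTOMER (CUST_ID, CUST_NAME) values (:CUST_ID, :CUST_NAME)

The agent fetches the rows returned by the source SELECT and binds them to the target INSERT.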

6.1.3.2 Loading File Using Loaders

When the mapping contains a flat file as a source, you may want to use a strategy that leverages the most efficient loading utility available for the staging area technology, rather than the standard LKM File to SQL that uses the ODI built-in driver for files. Most RDBMSs have fast loading utilities to load flat files into tables, such as Oracle's SQL*Loader, Microsoft SQL Server bcp, Teradata FastLoad or MultiLoad.

Such an LKM loads the source file into the staging area, and all transformations then take place in the staging area.

A typical LKM using a loading utility includes the following sequence of steps; a sketch of the generated loader script and command is shown after the list:

  1. Drop and create the loading table in the staging area.

  2. Generate the script required by the loading utility to load the file to the loading table.

  3. Execute the appropriate operating system command to start the load and check its return code.

  4. Possibly analyze any log files produced by the utility for error handling.

  5. Drop the loading table once the integration phase has completed.
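
For example, with an Oracle staging area, step 2 might generate a SQL*Loader control file and step 3 might invoke the sqlldr utility, along the following lines. This is a sketch only; the file names, credentials, and loading table name are hypothetical.

Generated control file (customer.ctl):

LOAD DATA
INFILE 'CUSTOMER.dat'
INTO TABLE ODI_STAGING.C$_0CUSTOMER
FIELDS TERMINATED BY ';'
(CUST_ID, CUST_NAME, CUST_AGE)

Operating system command:

sqlldr userid=odi_staging/<password> control=customer.ctl log=customer.log

The LKM then checks the return code of the command and, if needed, analyzes customer.log for rejected records.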

6.1.3.3 Loading Using Unload/Load

When the source result set is on a remote database server, an alternate solution to using the agent to transfer the data is to unload it to a file and then load that file into the staging area.

This is usually the most efficient method when dealing with large volumes across heterogeneous technologies. For example, you can unload data from a Microsoft SQL Server source using bcp and load this data into an Oracle staging area using SQL*Loader.

The steps of LKMs that follow this strategy are often as follows:

  1. Drop and create the loading table in the staging area.

  2. Unload the data from the source to a temporary flat file using either a source database unload utility (such as Microsoft SQL Server bcp or DB2 unload) or the built-in OdiSqlUnload tool.

  3. Generate the script required by the loading utility to load the temporary file to the loading table.

  4. Execute the appropriate operating system command to start the load and check its return code.

  5. Possibly analyze any log files produced by the utility for error handling.

  6. Drop the loading table once the integration phase has completed, and delete the temporary file.

When using an unload/load strategy, data is staged twice: once in the temporary file and a second time in the loading table, resulting in extra disk space usage and potential efficiency issues. A more efficient alternative is to use pipelines between the "unload" and the "load" utilities. Unfortunately, not all operating systems support file-based pipelines (FIFOs).
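
For example, steps 2 to 4 of an LKM moving data from a Microsoft SQL Server source to an Oracle staging area might execute commands similar to the following sketch (the server names, credentials, and file names are hypothetical):

Unload the source data with bcp:

bcp SourceDB.dbo.CUSTOMER out /tmp/customer.dat -c -t ";" -S SRCSERVER -U src_user -P <password>

Load the file into the loading table with SQL*Loader:

sqlldr userid=odi_staging/<password> control=customer.ctl log=customer.log

On operating systems that support them, a named pipe (FIFO) can be used in place of /tmp/customer.dat so that the data is streamed from the unload utility to the load utility without being staged on disk.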

6.1.3.4 Loading Using RDBMS-Specific Strategies

Certain RDBMSs have a mechanism for transferring data across servers. For example:

  • Oracle: database links

  • Microsoft SQL Server: linked servers

  • IBM DB2 400: DRDA file transfer

Other databases implement specific mechanisms for loading files into a table, such as Oracle's External Table feature.

These loading strategies are implemented in specific KMs that create the appropriate objects (views, database links, and so on) and implement the appropriate commands for using these features.
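
For example, an LKM based on Oracle database links can populate the loading table directly over a link to the source server with a single statement similar to the following sketch (the database link and object names are hypothetical):

create table ODI_STAGING.C$_0CUSTOMER as
select CUST_ID, CUST_NAME
from   CUSTOMER@SRC_DBLINK
where  CUST_AGE > 21

In this case no data flows through the agent; the transfer is performed entirely by the two database servers.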

6.2 Case Studies

This section provides examples of loading strategies.

6.2.1 LKM SQL to SQL

The LKM SQL to SQL is a typical example of the loading phase using the agent.

The commands below are extracted from this KM and are provided as examples. You can review the code of this knowledge module by editing it in Oracle Data Integrator Studio.

6.2.1.1 Drop Work Table

This task drops the loading table. This command is always executed and has the Ignore Errors flag activated, so it does not stop the LKM if the loading table does not exist.

Command on Target

drop table <%=snpRef.getTable("L", "COLL_NAME", "A")%>

6.2.1.2 Create Work Table

This task creates the loading table. This command is always executed.

Note the use of the COLL_NAME property of the getTable method, which returns the name of the loading table.

Command on Target

create table <%=snpRef.getTable("L", "COLL_NAME", "A")%>
(
    <%=snpRef.getColList("", "[CX_COL_NAME]\t[DEST_WRI_DT] " + snpRef.getInfo("DEST_DDL_NULL"), ",\n\t", "","")%>
)

6.2.1.3 Load Data

This task reads data on the source connection and loads it into the loading table. This command is always executed.

Note:

The loading phase always uses auto commit, as ODI temporary tables do not contain unrecoverable data.

Command on Source

Note the use of the getFilter, getJoin, getFrom, and other methods. These methods are shortcuts that return contextual expressions. For example, getFilter returns all the filter expressions executed on the source. Note also the use of the EXPRESSION property of getColList, which returns the source attributes and the expressions executed on the source. These expressions and source attributes are aliased with CX_COL_NAME, the name of their corresponding attribute in the loading table.

This SELECT statement causes the appropriate transformations (mappings, joins, filters, and so on) to be executed by the source engine.

select <%=snpRef.getPop("DISTINCT_ROWS")%>
       <%=snpRef.getColList("", "[EXPRESSION]\t[ALIAS_SEP] [CX_COL_NAME]", ",\n\t", "", "")%>
from   <%=snpRef.getFrom()%>
where  (1=1)
<%=snpRef.getFilter()%>
<%=snpRef.getJrnFilter()%>
<%=snpRef.getJoin()%>
<%=snpRef.getGrpBy()%>
<%=snpRef.getHaving()%>
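
For illustration, with the CUSTOMER example from Section 6.1.2 (two attributes used on the staging area and a filter on CUST_AGE executed on the source), this template might generate a statement similar to the following. The object names and the UPPER() expression are hypothetical.

select CUS.CUST_ID          CUST_ID,
       UPPER(CUS.CUST_NAME) CUST_NAME
from   SOURCE_APP.CUSTOMER  CUS
where  (1=1)
and    CUS.CUST_AGE > 21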

Command on Target

Note here the use of binding with :[CX_COL_NAME]. The bound CX_COL_NAME value matches the alias of the corresponding source attribute.

insert into <%=snpRef.getTable("L", "COLL_NAME", "A")%>
(
    <%=snpRef.getColList("", "[CX_COL_NAME]", ",\n\t", "","")%>
)
values
(
    <%=snpRef.getColList("", ":[CX_COL_NAME]", ",\n\t", "","")%>
)
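
Continuing the same illustration, the INSERT command on the target might expand to the following (the schema and C$ table names are hypothetical):

insert into ODI_STAGING.C$_0CUSTOMER
(
    CUST_ID,
    CUST_NAME
)
values
(
    :CUST_ID,
    :CUST_NAME
)

For each row fetched by the source SELECT, the agent sets the :CUST_ID and :CUST_NAME bind variables and executes the INSERT as part of a batch.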

6.2.1.4 Drop Work Table

This task drops the loading table. This command is executed only if the DELETE_TEMPORARY_OBJECTS knowledge module option is selected. Deselecting this option preserves the loading table for debugging purposes.

Command on Target

drop table <%=snpRef.getTable("L", "COLL_NAME", "A")%>