Oracle® Fusion Middleware User's Guide for Oracle Business Intelligence Data Warehouse Administration Console
11g Release 1 (11.1.1)
Part Number E14849-07

16 Moving Data Into a TimesTen Database

DAC supports Oracle TimesTen In-Memory Database (TimesTen) as a data warehouse database type. This chapter provides information about how to move data into a TimesTen database.

This chapter contains the following topics:

  • Before You Move Data Into a TimesTen Database

  • Overview of Moving Data From the Current Warehouse Into a TimesTen Database

Before You Move Data Into a TimesTen Database

This section provides information you should be familiar with before you move data into a TimesTen database.

About TimesTen as a Database Type

Oracle TimesTen In-Memory Database (TimesTen) operates on databases that fit entirely in physical memory (RAM). Because memory is limited, you should make sure a data warehouse that resides in a TimesTen database holds only the necessary tables and columns and that the columns are of an optimal size. For example, you could store a full set of data warehouse tables in an Oracle database, and then move a subset of the tables to a TimesTen database.

The In-Memory check box in the Tables tab and Columns subtab of the Design view is a flag that controls whether DAC designates the tables and columns as In-Memory. Only In-Memory tables and columns will be created and upgraded in the TimesTen database. If a table is In-Memory, by default all of its columns are also In-Memory. You can individually deselect a column's In-Memory check box if you do not want the column to be In-Memory.

In-Memory tables and columns are created, updated, and dropped on a TimesTen database in the same way they are for Oracle, SQL Server, and DB2 databases. For instructions on managing data warehouse schemas, including a schema on a TimesTen database, see Chapter 10, "Managing Data Warehouse Schemas."

Setting Up an ODBC Connection for a TimesTen Database

DAC communicates with TimesTen databases using an ODBC connection. Note the following points:

  • The TimesTen client needs to be installed on the machine where DAC is running.

  • The network port of the TimesTen Server is required for the ODBC connection. To determine the network port, open a command prompt on the machine where the TimesTen Server is installed, go to the <TimesTen>\bin directory, and run the ttstatus command.

  • Make a note of the Client DSN. This value is required in the ODBC Data Source field of the DAC Physical Data Sources tab when you register the TimesTen database as a data source.

About DAC Support for Indexes in TimesTen Databases

Consider the following points about DAC support for indexes in TimesTen databases:

  • You can specify which database types an index is applicable to by going to the Indices tab and double-clicking in the Databases column. The Supported Database Types dialog enables you to select the appropriate database types for the index.

    If an index is not required for a table in a TimesTen database, use the Supported Database Types dialog to deselect TimesTen as a supported type for that particular index. Caution: If you inactivate the index by selecting the Inactive check box, you will inactivate the index for all database types.

  • If you want to create a special type of index that DAC does not natively support, you can use the Actions framework to define a template. For instructions, see "Using Actions to Optimize Indexes and Collect Statistics on Tables".

  • If an index is not necessary for a table in the TimesTen database, you can inactivate it by going to the Indices tab and selecting Inactive.

Customizing How Tables Are Analyzed in TimesTen Databases

The customsql.xml file, located in the <DAC_Config_Location>\CustomSQLs directory, contains the default syntax that DAC uses for analyzing tables on all database types, including a TimesTen database. You can edit the customsql.xml file to change the default behavior. For instructions, see "Customizing customsql.xml to Drop and Create Indexes and Analyze Tables".
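Note that TimesTen gathers optimizer statistics through built-in procedures rather than an ANALYZE TABLE statement. As a sketch only (this is not necessarily the default syntax shipped in customsql.xml, and MY_TABLE is a hypothetical table name), updating statistics for a single TimesTen table looks like the following:

-- Illustrative only: update optimizer statistics for one table;
-- the second argument (1) invalidates existing command plans so they are reprepared
call ttOptUpdateStats('MY_TABLE', 1);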

You can customize the analyze command for individual tables using the Actions framework. For more information about the Actions framework, see "Using Actions to Manage Indexes, Tables and Tasks".

TimesTen Data Types

Table 16-1 shows the data types for DAC table column conversion to a TimesTen table column.

Table 16-1 Conversion of Data Types from DAC to TimesTen

DAC Data Type   TimesTen Data Type
CHAR            CHAR
VARCHAR         VARCHAR2
NUMBER          NUMBER
DATE            TIMESTAMP


Overview of Moving Data From the Current Warehouse Into a TimesTen Database

You can move data into a TimesTen database from an existing warehouse that resides on Oracle, MS SQL, Teradata, DB2 and DB2-390 databases. In addition to moving data, DAC also facilitates implementation of the schema.

The data copy can consist of the following:

  • Replication of detailed data into TimesTen

  • Incorporation of aggregations into TimesTen

Tasks for Replicating Detailed Data into TimesTen

The tasks for moving detailed data into a TimesTen database are described in Step 1 through Step 8 of this section.

Tasks for Incorporating Aggregations into TimesTen

The tasks for incorporating aggregations into a TimesTen database are described in "Enabling Aggregate Population".

Step 1: Identify Tables for Replication Into a TimesTen Database

Follow this procedure to identify the tables for replication into a TimesTen database.

To identify tables for replication into a TimesTen database:

  1. Create a new subject area for the set of star schemas that you want to be deployed on the TimesTen database. For instructions on creating a subject area, see "Creating a Subject Area."

  2. In the Subject Areas tab of the Design view, select the Configuration Tag Tasks Only option. This ensures the normal ETL tasks do not appear when you assemble the subject area.

  3. Click Assemble.

    The Subject Area Assembly dialog appears.

  4. In the Subject Area Assembly dialog, remove the associated dimension, fact, and aggregate tables that are not required for reporting by doing the following:

    1. Deselect the dimension, fact, and aggregate tables that you do not need for reporting.

    2. Click Calculate Task List.

      Note: The Task List tab will state it found 0 tasks.

    3. Click Accept.

      This action persists the table list, which can be viewed in the Extended Tables (RO) subtab. The Included check box is deselected for the tables that you deselected in the first substep.

  5. In the Subject Areas tab, right-click the new subject area, and select In-Memory, and then select Mark as In-Memory Tables.

    This action marks as In-Memory only the tables that have the Included check box selected in the Extended Tables (RO) subtab.

    You can also mark tables as In-Memory by doing one of the following:

    • In the Extended Tables (RO) subtab, right-click and select Mark as In-Memory.

    • Perform a mass update on the tables to select the In-Memory check box by using flat views querying. For instructions, see "Using Flat Views Querying".

Step 2: Run Schema Profiling to Identify Data Types Based on Data Size and Shape

The DAC schema profiling feature enables you to introspect one or more tables, and, based on the data shape and size, suggest TimesTen-specific property overrides. This feature is primarily used for In-Memory overrides, but you can also use it to understand, in general, the data in a table.

To perform schema profiling on one or more tables:

  1. In the Tables tab of the Design view, right-click on the table you want to introspect, or query for tables, and select In-Memory, and then select Profile and Discover IM Properties.

  2. In the Profile and Discover IM Properties dialog, specify the following properties:

    Property Description

    Data Source

    The data source for the table.

    Padding (% of max existing length):

    For string and varchar type columns, DAC suggests a column length based on the maximum length of the data in the column, adding this percentage to determine the suggested data length.

    When computing the compression maximum values, this number is used to accommodate extra distinct values.

    For numeric columns, the maximum and minimum values are extended by this percentage to determine the appropriate data type.

    Compression threshold (distinct values as %):

    When a column has relatively few distinct values, you can use TimesTen's capability to store an internal dictionary and thereby reduce the storage requirements for that column. Compression is suggested when the number of distinct values is less than one-third of the NOT NULL row count, but you can specify the percentage value that should be used as an upper threshold for compressing the columns.

    Unicode

    Indicates whether the schema should accommodate Unicode (multi-byte) data. You can define some columns as Unicode and others as non-Unicode (containing only ASCII characters). When this property is selected, the sizing computation uses each column's Unicode property. Otherwise, it treats all of the columns as non-Unicode.

    Override manually modified columns

    If you manually modified some of the suggested data types, lengths, or compression buckets, the subsequent profiling will not update them. However, if you want the profiling to update the manually updated columns, select this check box.

    The In-Memory Auto (RO) flag in the In-Memory Overrides subtab of the Tables tab indicates that the properties were automatically generated. When you modify any of the In-Memory properties, this flag becomes deselected.


  3. Upon completion of the profiling, you will be asked if you want to persist the In-Memory schema overrides in the DAC repository.

    When the check box "Mark columns with no values as not In-Memory" is selected, the In-Memory flag is not selected for the columns that have no values. These columns do not appear in the schema creation for TimesTen. Note: Make sure these columns are not referenced in dashboard reports.


    If you select "Mark columns with no values as not In-Memory," then you also need to inactivate (in the TimesTen database only) the indexes that refer to these columns. When you select "Mark columns with no values as not In-Memory" and click Yes, the next dialog enables you to inactivate the columns in the TimesTen database.


Profiling Metrics

When you perform data profiling on a table, DAC inspects the table by querying all the columns and collecting metrics, as described in this section. The profiling runs queries to the database in parallel. The number of connections defined to the data source determines the parallelism for each of the columns and across the tables. Thus, by increasing the number of connections to the database, you can increase the parallelism, and increase the performance of the process. Note that the process can be long-running, especially when you perform profiling on more than one table at a time.

The SQLs used by the profiling process are in the schema_profiler.xml file, which is stored in the CustomSQLs directory. For calculating the size, DAC considers the total number of rows and the metrics collected for the columns. You can restrict profiling to a subset of rows by adding a filter query to this file. For example, if the table name is MY_TABLE and you want to restrict profiling to one year of records based on a column called CREATED_DT, you would use the filter shown below. Do not include an "AND" keyword in the filter.

<SqlQuery name = "PROFILE_FILTER_FOR_MY_TABLE">
CREATED_DT > SYSDATE - 365
</SqlQuery>

DAC collects the following metrics, depending on the column data type:

Metric                      Varchar/Char   Numeric   Date   Other Types
NOT NULL row count          X              X         X      X
Number of distinct values   X              X         X      X
Max data length             X              -         -      -
Max value                   -              X         -      -
Min value                   -              X         -      -

Based on the metrics in the table above, DAC provides suggestions. For string and varchar type columns, depending on the maximum length of data used and the padding percentage, DAC suggests the In-Memory data length. DAC also computes the number of bytes. This size should not be construed as the exact size requirement for TimesTen; instead, it should be viewed as an indication of the overall size, and the actual size may be somewhat larger or smaller.

The summary is published in a file called SchemaProfile_Timestamp.txt. Detailed information is provided in the CSV files TableStatistics_Timestamp.csv and ColumnStatistics_Timestamp.csv. All of these files are stored in the DAC_HOME\log\SchemaProfile directory.

The overall summary contains the following information:

  • Number of tables processed

  • Actual size (bytes) of all tables

  • Suggested size (bytes) of all tables including unused columns

  • Suggested size of all tables excluding unused columns

Table Statistics

The table statistics will contain the following information:

Attribute Description

Table Name

Name of the table.

# Columns

Total number of columns.

# Unused columns

Number of columns that do not have any values.

# Rows

Total number of rows.

Actual Table Size in Bytes

Total size of the data without any TimesTen-specific data types or reduced column lengths.

Suggested Table Size including unused columns

Total size of the data with TimesTen-specific data types and reduced column lengths for all columns.

% Suggested

The percentage of the suggested data size versus actual data size requirement.

Suggested Table Size excluding unused columns

Total size of the data with TimesTen-specific data types and reduced column lengths excluding the unused columns.

% Suggested Size excluding unused columns

The percentage of the suggested data size for only the columns with values versus actual data size requirement.


Column Statistics

The column statistics will contain the following information:

Attribute Description

Table name

Name of the table.

Column name

Name of the column.

Datatype

Data type of the column - as returned by the JDBC layer.

Unicode

Indicates whether the column is marked as Unicode in the DAC repository. When the sizing is done without the Unicode flag checked, it will be false for all columns.

# Not Nulls

Number of NOT NULL rows in the table.

# Distinct values

Number of distinct values.

% Distincts

The percentage of distinct values to the NOT NULL row count.

Min Value (for number type)

Minimum value - only for numeric data types.

Max Value (for number type)

Maximum value - only for numeric data types.

Original size

The data length of the column in the database where profiling is done.

Suggested data length (for string type)

Suggested data length based on the maximum length used and the padding percentage. This applies to string/varchar type columns.

Compressed?

If the percentage of distinct values to the NOT NULL row count is less than the compression threshold specified during profiling, this value is true; otherwise, it is false.

Suggested datatype (for number type)

Suggested TimesTen numeric data type based on the minimum and maximum values of the column.

Actual column size

Computed based on the table row count and the actual data length.

Suggested column size for all rows

Computed based on the table row count and the suggested data length and data type.

Should ignore?

True if the column is unused, meaning there is no data in the column; otherwise, false.


String (CHAR or VARCHAR type) Columns

As an example of profiling, assume a column is defined as VARCHAR(100), has 100 NOT NULL values, and the maximum length of data in the table is 10 characters. With 50% padding, the suggested data length is 15 characters.
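The metric that drives this suggestion is the maximum data length found in the column. As a sketch only (the table and column names are hypothetical, and the actual profiling SQL is defined in schema_profiler.xml), the maximum length and the padded suggestion correspond to a query such as the following:

-- Illustrative only: maximum data length and the suggested length with 50% padding
SELECT MAX(LENGTH(PROD_NAME))             AS max_data_length,
       CEIL(MAX(LENGTH(PROD_NAME)) * 1.5) AS suggested_length_50pct_padding
FROM   W_PRODUCT_D;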

The following table shows an example of size calculation.


Metric                 Calculation                                 Size in Bytes (Non-Unicode)   Size in Bytes (Unicode)
Actual size            -                                           100                           100 * 4 = 400
Suggested size         Max length used in column * padding %       10 + 5 = 15                   (10 + 5) * 4 = 60
Total actual size      Number of rows * actual length of column    (100 * 100) = 10000           (100 * 400) = 40000
Total suggested size   Number of rows * suggested size             (100 * 15) = 1500             (100 * 60) = 6000


The ratio of the number of distinct values to the total NOT NULL row count helps determine whether a column should be compressed. For example, if a table has 100 rows and a column has 10 distinct values, the distinct values are ten percent of the row count, which is below the "Compression %" threshold given in the Profile and Discover IM Properties dialog, so DAC suggests that the column be compressed. Because 10 distinct values fall into the smallest bucket of dictionary values (up to 255), the compressed value takes one byte to store.
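As an illustration only (the table and column names here are hypothetical, and the actual profiling SQL is defined in schema_profiler.xml), the ratio that drives the compression suggestion can be computed as follows:

-- Illustrative only: the distinct-value ratio used to decide whether a column is a compression candidate
SELECT COUNT(DISTINCT PROD_NAME)                            AS distinct_values,
       COUNT(PROD_NAME)                                     AS not_null_rows,
       COUNT(DISTINCT PROD_NAME) * 100.0 / COUNT(PROD_NAME) AS pct_distinct
FROM   W_PRODUCT_D;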

In addition, the following apply when DAC checks to see if a column needs compression:

  • The table is large enough that the total count of NOT NULL column values in the table is greater than 1024.

  • The column size (for example, VARCHAR2(10) has size of 10) is smaller than the pointer size, where the pointer is the In-Memory structure used to map the compressed table's compressed column entry to its real value. The size of a table column's pointer is a function of the number of distinct values in that column.

The Max Compression value is based on the following:

Number of Bytes to Store the Compressed Value   Number of Distinct Values <=
1                                               255
2                                               2^16 - 1
4                                               2^32 - 1


Below is an example of size calculation for a compressed column:


Metric                 Calculation                                                                               Size in Bytes
Total actual size      Number of rows * actual length of column                                                  (100 * 100) = 10000
Total suggested size   (Number of rows * compressed size) + (number of distinct values * max length of column)   (100 * 1) + (10 * 10) = 200


For Unicode implementations, if a column is defined as a Unicode column, each character will require up to four bytes for storage.

Numeric Columns

Numerical columns can be either whole numbers or decimals.

The minimum and maximum values of data present in a column determine what the suggestions are going to be. In TimesTen, there are several numerical data types with different storage requirements, as shown below.

Data Type         Min/Max Values                       Size in Bytes
TT_TINYINT        0 to 2^8 - 1 (no negative values)    1
TT_SMALLINT       -2^15 to 2^15 - 1                    2
TT_INTEGER        -2^31 to 2^31 - 1                    4
TT_BIGINT         -2^63 to 2^63 - 1                    8
TT_BINARYFLOAT    Float.maxValue                       4
TT_BINARYDOUBLE   Depending on scale and precision     8
Generic NUMBER    -                                    Up to 22


As an example, assume a column is defined as NUMBER(10) and has 100 values, with a maximum value of 200 and a minimum value of -10. After applying the padding percentage (50%), the maximum becomes 300 and the minimum -15. TT_TINYINT is not a good choice because the range includes negative numbers, so the next bucket, TT_SMALLINT, is suggested.
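The minimum and maximum metrics behind this suggestion correspond to a query of the following shape (a sketch only; MY_TABLE and AMOUNT are hypothetical names):

-- Illustrative only: min/max values used to pick the smallest TimesTen numeric type that fits
SELECT MIN(AMOUNT) AS min_value,
       MAX(AMOUNT) AS max_value
FROM   MY_TABLE;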

Below is an example of size calculation for SMALLINT.


Metric                 Calculation                                 Size in Bytes
Actual size            -                                           22
Suggested data type    TT_SMALLINT                                 2
Total actual size      Number of rows * actual length of column    22 * 100 = 2200
Total suggested size   Number of rows * suggested size             2 * 100 = 200


Date Type Columns

The data type conversion to TT_TIMESTAMP happens automatically. The statistics collection involves getting the NOT NULL row counts. The size of this data type is eight bytes. The total size would be the number of rows in the table multiplied by eight.
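As a rough sketch (MY_TABLE is a hypothetical name), the estimate for a date column is simply the row count multiplied by eight bytes:

-- Illustrative only: estimated storage for a TIMESTAMP column at 8 bytes per row
SELECT COUNT(*) * 8 AS estimated_bytes
FROM   MY_TABLE;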

Other Data Type Columns

DAC does not handle any other data types, such as CLOB, BLOB, and so on. For these kinds of columns, the statistics collection involves getting the NOT NULL row counts. The size of the data type is from the JDBC driver implementation.

Unused Columns

There may be columns in the underlying data warehouse that do not have any values. DAC can identify such columns and mark them in the DAC repository with an In-Memory flag set to false. By doing so, these columns do not participate in the schema creation process.

For columns with no values, the suggested size used in the overall size computation is the same as the actual column size.

Marking a column as unused means that the column will not be created in the target schema. Make sure such columns are not accessed from the reporting layer.

Step 3: Identify Columns for Replicating Into a TimesTen Database

Follow this procedure to identify columns for replicating into a TimesTen database.

To identify columns for replicating into a TimesTen database:

  1. Identify the table columns you want to replicate and make sure the In-Memory check box is selected in the Columns subtab of the Tables tab in the Design view. Deselect the In-Memory check box for columns you do not want to replicate.

    Note: If a table is designated as an In-Memory table, by default all the table columns are also designated as In-Memory.

  2. Run the schema profile for all the tables under the Subject Areas tab by right-clicking and selecting In-Memory, and then selecting Profile and Discover IM Properties.

    Alternatively, you can access the Profile and Discover IM Properties dialog by right-clicking in the Tables tab on one or more tables.

    The schema profile action will set the TimesTen-specific data types, lengths, and compression definitions. This functionality also identifies the columns that are not used (all null values), and helps you eliminate them. If there are indexes that refer to the eliminated columns, the indexes are marked as not applicable for TimesTen.

    The Profiling feature queries and understands the data shape in your implementation, and helps to adjust the size of the column. For example, if a column is defined as VARCHAR(500), but the values never exceed 50 characters, then it overrides the column length for the TimesTen schema (In-Memory Length). For numeric columns, appropriate TimesTen-specific data types are suggested. For additional information about how DAC does profiling, see "Step 2: Run Schema Profiling to Identify Data Types Based on Data Size and Shape".

  3. Alternatively, you can modify the In-Memory properties manually.

    The manually modified records are not updated in the subsequent profiling unless the "Override manually modified columns" property is selected in the Profile and Discover IM Properties dialog.


    The Profiling functionality also identifies columns that are not used (contain all null values). When the profiling process completes, you can choose to mark the unused columns as not In-Memory columns. You may want to consider this option because even empty columns (when defined as Inline) take up physical space in memory. The profiling functionality does not optimize the data types for columns with all null values, because the shape of the data is unknown. For example, if a column of type VARCHAR(50) has all null values, the suggested size would still be 50.

    When you deselect the In-Memory check box for a column, the column will not be created in the target schema. Make sure such columns are not referenced in dashboard reports.

Step 4: Create or Upgrade the Warehouse Schema

Follow this procedure to create or upgrade the warehouse schema.

To create or upgrade the warehouse schema:

  1. Copy the following TimesTen JDBC libraries into the DAC\lib directory:

    • ttjdbc6.jar

    • orai18n.jar

    • timestenjmsxla.jar

    • jms.jar

    • javax.jms.jar

  2. Define the ODBC data source and an equivalent connection in DAC in the Physical Data Sources tab in the Setup view.

  3. Right-click on the subject area or areas, and select In-Memory, and then select Generate DW Scripts.

  4. In the Generate DW Table Scripts dialog, select the following properties:

    Property Description

    Type of script generation

    • Create New. Select this option to generate a SQL script to create new data warehouse tables.

    • Create Missing. Select this option to generate a SQL script to create tables in the data warehouse that were added to the DAC repository but were not yet added to the data warehouse.

    • Upgrade Existing. Select this option to generate a SQL script to upgrade existing data warehouse tables.

    Physical Data Source Type

    Select TimesTen.

    Change default parameter file

    (Optional) Browse to select a parameter file different from the default.

    Unicode

    Select this option if the warehouse is Unicode.

    Execute

    Select this option if you want DAC to execute the SQL script automatically after it is generated.

    Drop Tables

    Select this option to generate a SQL script to drop tables from the data warehouse schema.

    Compress Columns

    Select this option if you want columns to be compressed.


Step 5: Create Data Copy Tasks and Include Them in a Subject Area

DAC will perform a full data copy of tables for the first load of the TimesTen database. Subsequent data copy processes can be carried out in either full or incremental mode. When the base tables in the data warehouse are incrementally refreshed, you might want to perform the data copy in incremental mode also. If the base tables are always rebuilt, which may be the case for some aggregate tables, then you might want to enable full data copy on an on-going basis.

To create data copy tasks:

  1. Create a task phase named "Data Copy to TimesTen" and set its priority higher than that of the existing phases, for example, 100. This ensures that the data copy tasks run after the rest of the phases. To create a task phase, select Tools, Seed Data, Task Phases.

  2. Create a logical connection called DBConnection_TT for the In-Memory database. Select Tools, Seed Data, Logical Data Sources.

  3. Create a Configuration Tag to isolate the data copy tasks. For example, create a tag called "DataCopy to Timesten." For instructions on creating configuration tags, see "Working with Configuration Tags".

  4. In the Subject Areas tab, select the appropriate subject area, right-click, and select In-Memory, Generate Data Copy Tasks.

    Alternatively, you can do the following:

    1. In the Tables tab, select the tables that need to be incrementally refreshed.

    2. Right-click, and then select Generate Data Copy Tasks.

    The Generate Data Copy Tasks dialog is displayed.

  5. Specify the following properties:

    1. For the Primary Logical Source, select DBConnection_OLAP.

    2. For the Primary Logical Target, select DBConnection_TT.

    3. For the Task Phase, select Data Copy to TimesTen.

    4. Select Enable Incremental Data Copy if you want to allow incremental loads. Otherwise, leave it unchecked.

    5. Specify the configuration tag you created in step 3.

      Note that it can be beneficial to isolate the data copy tasks to a special category so that they do not appear in the normal subject area assembly until they are specifically included.

  6. For incremental data copy, you must specify the unique key and the filter condition to identify the incremental data.

    1. Specify the primary/unique key(s): This is a column or a set of columns that uniquely identifies a record. When the naming of the unique key is consistent across the tables, it can be defined as a source system level parameter. Create a source system level parameter called "DataCopyUniqueKey" and provide the column name that uniquely identifies the records. For example, for Oracle BI Applications, the ROW_WID generally identifies the record as unique. For instructions on creating a parameter, see "Defining Parameters".

      For those tables in which the pattern is not the same, the information can be overridden as a task level parameter. For example, for Oracle BI Applications data warehouse implementations, ROW_WID is the name of the primary key for all the facts, dimensions, and aggregates. However, for a certain table that tracks ZIP codes, the unique key is ZIPCODE, which can be specified as a task parameter.

      Alternatively, you can identify the primary keys in the table columns with the attribute PK Order. This is particularly useful when the unique key naming convention is not consistent across all tables. For example, for the Products table, it is PRODUCT_ID, and for the orders table, it is ORDER_ID.

      Note: In general, the parameters override the primary key definition at the column level if all the columns defined as a parameter exist in the target table; otherwise, DAC looks into the target table's primary key ordering.

    2. Specify the filter criteria to identify the subset of data to be extracted incrementally by defining a parameter at the source system level called DataCopyIncrementalFilter. Typically, the rows touched during an ETL process (for either inserts or updates) are updated with either a timestamp or, as in Oracle BI Applications implementations, a column such as ETL_PROC_WID; the parameter should make use of that information.

      For example, DAC provides ETL_START_TIME or ETL_PROC_WID for the current run, and LAST_REFRESH_DATE or LAST_ETL_PROC_WID for the previous run.

      If the data copy occurs as part of the ETL process, then the filter criteria could be ETL_PROC_WID > @DAC_ETL_PROC_WID. If the data copy occurs as a separate scheduled process, then the filter criteria could be ETL_PROC_WID > @DAC_LAST_ETL_PROC_WID. For more information, see "Defining Incremental Filter Parameters".

  7. DAC performs a SELECT statement on the columns common to the source and target tables for full loads, and appends the filter clause for incremental loads. If you want to specify extra filters to restrict the number of rows, do one of the following:

    • Specify another parameter called DataCopyExtraReadFilter.

    • Create a SQL override with the parameter name DataCopySQLOverride at the task level. This can have a different value for full and incremental modes. When you write the SQL, make sure the aliases of the result columns match the target table's column names. For example, to load into a table T1 with columns A and B, your SELECT statement should be as follows:

      SELECT COL1 AS A, COL2 AS B FROM T2

    All the data copy tasks are now generated. You can view them in the Tasks tab. They can now be included in the existing subject areas or grouped under a new subject area.

  8. Go to the subject area that you created originally for data copying purposes, and associate the configuration tag, and assemble the subject area again. Now, all the data copy tasks are associated with the subject area. Alternatively, you can manually include or exclude tasks in the subject area.

Defining Incremental Filter Parameters

This section provides a summary of how to define incremental filter parameters for copying data into a TimesTen database. For detailed instructions on creating parameters, see Chapter 8, "Defining and Managing Parameters."

Two parameters are required for data copy. The first is supplementary and is used to hold the value of the required DAC variable. The second will actually define the filter clause.

If you use ETL_PROC_WID based filters, the parameters should both have the Data Type specified as Text, and Load Type as Incremental. The first parameter will be a runtime parameter and will hold the value of @DAC_LAST_ETL_PROC_WID or @DAC_CURRENT_ETL_PROC_WID (DAC Variable field in the Enter Parameter Value dialog).

The second parameter is the actual filter and should be called DataCopyIncrementalFilter. This parameter is static, and the filter clause (entered in the Enter Parameter Value dialog) is: ETL_PROC_WID > @DAC_LAST_ETL_PROC_WID.

Notice that it uses the exact name of the first parameter with the @DAC_ prefix. So, if you use the @DAC_CURRENT_ETL_PROC_WID variable instead, and call the auxiliary parameter CURRENT_ETL_PROC_WID, then the filter clause changes to:

ETL_PROC_WID > @DAC_CURRENT_ETL_PROC_WID

If you are going to use a time-based filter (@DAC_ETL_START_TIME_FOR_SOURCE or @DAC_TARGET_REFRESH_TIMESTAMP), the supplementary parameter should still have the Load Type specified as Incremental, but the Data Type should be Timestamp. Note: The Connection Type should be based on the source, which is where the filter will be applied. The filter clause for the second parameter would change accordingly, for example, ETL_PROC_WID > @DAC_SOURCE_START_TIME.
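To illustrate how the filter is applied (a sketch only; the generated SQL is internal to DAC, and the table, columns, and literal value here are hypothetical), a full data copy selects all common columns, while an incremental copy appends the resolved filter clause:

-- Full load: all columns common to the source and target tables are copied
SELECT ROW_WID, PROD_NAME, ETL_PROC_WID
FROM   W_PRODUCT_D;

-- Incremental load: the DataCopyIncrementalFilter clause is appended, with
-- @DAC_LAST_ETL_PROC_WID resolved to its runtime value (for example, 22668776)
SELECT ROW_WID, PROD_NAME, ETL_PROC_WID
FROM   W_PRODUCT_D
WHERE  ETL_PROC_WID > 22668776;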

Handling Deletes

When you mark a record as deleted in Oracle Business Analytics Warehouse, the delete flag is set to 'Y' (by an upsert operation), but the record is not physically deleted from the table. This type of delete is often referred to as a "soft delete."

For reporting purposes, you can filter out such deleted records. If you are using a TimesTen database and you have a significant number of deleted records, you may want to prune the deleted records so they are not moved into the TimesTen database. To do so, you can create an "On Success" task action using the following SQL:

delete from my_target where soft_deleted_flg = 'Y'

Enabling Aggregate Population

DAC can execute aggregate table populations. To populate subdimensions, you only need to specify the SQL for subsetting the dimensions. However, to populate aggregate tables, you might need to resolve the foreign keys on the target side based on natural keys. DAC creates and populates a temporary staging table in the image of the result set to hold the data from the SQL override, which extracts the natural keys to the target side. On this table, an additional write SQL can be run to query for the foreign keys to write into the aggregate table. This approach has two advantages: 1) There are no high-cost lookups that require expensive caching of the dimensional keys; 2) The ELT process happens in small batches, thus reducing the footprint of data required on the target database.

For example, assume you are working with the Agreement Item fact, with Product as one of the dimensions. The intention is to create a product subdimension, and an aggregate table that refers to the subdimension.

The base fact and dimension tables are on the base data warehouse, and the subdimension and aggregate tables are on the TimesTen database. Figure 16-1 illustrates this setup.

Figure 16-1 Enable Aggregate Population


First, you need to register the new tables and their column information in the DAC repository.

W_PROD_SUB_D has two columns:

  • ROW_WID NUMBER(10) - surrogate key

  • PROD_NAME VARCHAR(100) - name of the product

W_AI_PROD_A has three columns:

  • ROW_WID NUMBER(10) - surrogate key

  • NUM_AGREE_ITEMS NUMBER(10) - a measure

  • PROD_SUB_WID NUMBER(10) - Foreign key to product sub dimension on the TimesTen warehouse

Steps for Populating the Subdimension

For populating the subdimension, you need to create a data copy task for W_PRODUCT_D (or copy an existing data copy task if one already exists).

Ensure that the commands for full and incremental loads are set to INSERT_ONLY_DATA_COPY, and that the 'Truncate Always' property for the target table is selected if you plan to extract the dimension in full for each run.

If you can identify the changed records from the source, you can use INSERT_UPDATE_DATA_COPY for the incremental command. The incremental filter 'DataCopyIncrementalFilter' should be defined in the parameters, and the primary key should be specified at the table-column level or with the parameter called 'DataCopyUniqueKey'.

Select W_PROD_SUB_D as the target table, and set the appropriate truncate properties. In the Parameters subtab of the Tasks tab, right-click and choose "New Data Copy Parameter." In the New Data Copy Parameter dialog, select DataCopySQLOverride from the drop-down list. For the Value, use the following SQL:

SELECT
     rownum as ROW_WID,
     PROD_NAME
FROM
     (
          SELECT
               distinct PROD_NAME
           FROM
                W_PRODUCT_D
     ) DISTINCT_PRODUCTS

When the data copy runs, it uses the SQL override provided to fetch the data, matches the target columns with the aliases of the resultset columns, and moves the data. In the example above, rownum (in Oracle syntax) is used to generate the surrogate keys.

Steps for Populating the Aggregate-Dimension

For populating the aggregate table, the foreign key value needs to be derived from the target. This happens in two stages:

1. Extract the aggregated data from the source (using the DataCopySQLOverride parameter).

2. Resolve the foreign keys before loading (using the DataCopyAggregateInsertSQL parameter).

To do this, set both the full and incremental command names to ELT_DATA_COPY.

During this process, the temp table is populated one batch at a time. Next, the insert SQL is used to transform and load the batch into the aggregate table. Then, the temp table is truncated and loaded with data from the next batch; then, DataCopyAggregateInsertSQL transforms the data. This process repeats until there is no more data left. Upon successful completion, the temp tables get dropped. Figure 16-2 depicts this process.

Figure 16-2 Populating the Aggregate Dimension


In this example, there are two SQLs, as follows:

  • DataCopySQLOverride gets the data into the staging table:

    SELECT
         ROWNUM AS ROW_WID
         ,NUM_AGREE_ITEMS
         ,PROD_NAME 
    FROM
         (SELECT 
              COUNT(*) NUM_AGREE_ITEMS
              ,D.PROD_NAME AS PROD_NAME
         FROM
              W_AGREEITEM_F F
              ,W_PRODUCT_D D
         WHERE
              F.PRODUCT_WID = D.ROW_WID
         GROUP BY 
              D.PROD_NAME
         ) SUMMARY_SQL
    
  • DataCopyAggregateInsertSQL resolves the foreign keys and writes to the target table:

    INSERT INTO W_AI_PROD_A
    (ROW_WID, PROD_SUB_WID, NUM_AGREE_ITEMS)
         SELECT
              F.ROW_WID AS ROW_WID
              ,D.ROW_WID AS PROD_SUB_WID
              ,F.NUM_AGREE_ITEMS AS NUM_AGREE_ITEMS
         FROM
              W_PROD_SUB_D D
              ,%tmp_stage_table_name% F
         WHERE
              D.PROD_NAME = F.PROD_NAME
    

    Note that the temporary staging table name is parameterized in the INSERT SQL. This is because there can be multiple writers specified, and each writer uses its own temporary staging table. DAC dynamically generates table names based on the aggregate table name and the writer number. Make sure to use the token %tmp_stage_table_name%, as in the example.

    If you are using Summary Advisor or Aggregate Wizard to create your summary tables, you should be able to get the physical SQL from the nqquery.log and use that to define the DataCopySQLOverride for both aggregate-dimension(s) and aggregate(s). The DataCopyAggregateInsertSQL, however, still needs to be specified, because it is required to resolve the foreign keys.

    A sample query generated by the Oracle BI Server might look similar to the following:

    select sum(T3100.Quantity) as c1,
         T3106.EmployeeID as c2,
         T3075.CategoryID as c3,
         TO_NUMBER(TO_CHAR(T3106.OrderDate, 'yyyy'), '9999') as c4
    from
         Categories T3075,
         Products T3122,
         OrderDetails T3100,
         Orders T3106
    where  ( T3075.CategoryID = T3122.CategoryID and T3100.OrderID = T3106.OrderID and T3100.ProductID = T3122.ProductID )
    group by T3075.CategoryID, T3106.EmployeeID, TO_NUMBER(TO_CHAR(T3106.OrderDate, 'yyyy'), '9999')
    order by c2
    

Incremental Aggregations (Partial Rebuild of Aggregates)

For the incremental load of aggregates, you can predelete some logical partitions and reload just those partitions. Typically, you need a time dimension or attribute to do this.

For example, if you wanted to predelete the data for the past one year and compute aggregates only for that year, you could perform the predeletion by overriding the truncate behavior for the incremental load of that particular table. You can specify an additional filter for the subset computation of aggregates by using the parameter DataCopyExtraReadFilter in incremental mode. You can use the On Success action for the task to continue to perform the rolling deletes.
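For example (a sketch only, using the hypothetical names MY_AGGREGATE and PERIOD_START_DT), the predelete could be defined as a truncate override or an On Success action against the TimesTen target:

-- Illustrative only: remove the past year of aggregate rows so that only this
-- logical partition is recomputed by the incremental load
DELETE FROM MY_AGGREGATE WHERE PERIOD_START_DT >= SYSDATE - 365;

The corresponding DataCopyExtraReadFilter value for the incremental extract could then be PERIOD_START_DT >= SYSDATE - 365 (specified without an AND keyword), so that only the same one-year window is read from the source.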

Step 6: Include the Data Copy Tasks as Part of an ETL

There are three scenarios for including data copy tasks in an ETL process.

Step 7: Configure the Data Copy Executor

Data copy processes are executed by the DAC Server. You need to create a data copy executor to define the properties for the behavior of the data copy process. You only need one instance of the executor.

To create a data copy executor:

  1. In the DAC Setup view, go to the External Executors tab.

  2. Click New in the top pane toolbar.

  3. Enter a name for the external executor.

  4. Select Data Copy as the Type.

  5. Click Save.

  6. Click Generate to generate the required properties.

  7. Click the Properties subtab to view and edit the properties:

    • Num Parallel Jobs per EP. Specifies how many data copy processes can run in a given execution plan in parallel. Note: One process copies data for one table. The default value is 10.

    • Num Writers per Job. Specifies how many writers a data copy process has. You can figure out how many writers you need by comparing read and write throughput published by the data copy tasks. The write stage of the data copy process can cause the entire process to slow down. This problem can be addressed by using more writers. The default value is 2.

    • Batch Size. Specifies how many records are written to the target at one time using array inserts and updates. The data is read in batches and pushed to a queue for the writers to consume. For TimesTen, it is generally better to use a smaller batch size that is a multiple of 256. The default value is 1024.

Example of Setting Data Copy Executor Properties

For TimesTen targets, determining how many tasks can run in parallel and how many writers are required largely depends on the number of cores the Exalytics machine supports.

For example, if C is the total number of CPU cores available (40 on Exalytics), T is the number of concurrent load tasks, and W is the number of writers in each task, then:

T * W < C

Note that you will want to leave one to two CPU cores free to handle other work, such as checkpointing, O/S activity, and so on. In addition, the number of writers for any one table should be <= 8. In this example, 10 tasks with three writers each (30 writer threads) can safely run within the roughly 38 cores that remain available.

You can also override the number of writers per job and the batch size at the task level. For example, you could have two writers per task in general, but increase the number of writers to four for a task that has a long processing time. For this situation, you could define task parameters called DataCopyNumWriters and DataCopyBatchSize.

Step 8: Debug the Data Copy Processes

The log files for each data copy process are stored in a directory called DAC_HOME\log\[EPNAME].[process id]. This information is available in the Description subtab of the Execution Plan tab. The log file name is also published in the task status description upon completion. For example, the Description subtab shows information similar to the following:

ETL Process Id : 22668776
ETL Name : AGGR EP
Run Name : AGGR EP: ETL Run - 2013-02-05 20:50:56.859
DAC Server : localhost(abc-pc)
DAC Port : 3141
Status: Completed
Log File Directory: c:\dac\log\AGGR_EP.22668776
Log File Name: AGGR_EP.22668776.log
Database Connection(s) Used :
     Timesten_Demo:50
     EBSDWH:10
Start Time: 2013-02-05 20:52:53.906
Message: Finished
Actual Start Time: 2013-02-05 20:50:56.973
End Time: 2013-02-05 20:54:17.765
Total Time Taken: 3 Minutes

Start Time For This Run: 2013-02-05 20:52:53.906
Total Time Taken For This Run: 1 Minutes

The naming convention for the data copy log files is DataCopy<task name>.log. You can obtain the file name from the Status Description in the task detail.