9 Integrating Data

This chapter describes how you can integrate just the data you need into your target using parameters, clauses, column mapping, and functions.


9.1 Selecting Records

You can select specific records to extract or replicate using the FILTER and WHERE clauses of the TABLE or MAP parameters. FILTER is the more powerful tool, letting you select records on a variety of criteria, and you can specify multiple filters in one FILE, TABLE, or MAP statement. WHERE is a quick, simple way to select records that match a single criterion, and you can have only one WHERE clause per statement.

9.1.1 Selecting Records with FILTER

Use the FILTER clause of FILE, TABLE, or MAP to select specific records within a file or table for Extract or Replicat. FILTER uses the Oracle GoldenGate field conversion functions to evaluate whether to process a record. For example, the following statement extracts records in which the price multiplied by the amount exceeds 10000:

TABLE $DATA.MASTER.CUSTOMER, FILTER ((PRODUCT_PRICE*PRODUCT_AMOUNT)>10000);

In another example, the following statement extracts records in which the NAME column contains the string "JOE":

TABLE $DATA.MASTER.CUSTOMER, FILTER (@STRFIND(NAME, "JOE")>0);

9.1.2 Selecting Records with WHERE

Use the WHERE clause in TABLE or MAP to select specific records within a table to be extracted or replicated.

The WHERE clause consists of the following elements and must be enclosed in parentheses.

Element                                       Example

Columns from the row                          PRODUCT_AMT
Numeric values                                -123, 5500.123
Literal strings enclosed in quotation marks   "AUTO", "Ca"
Column tests                                  @NULL, @PRESENT, @ABSENT (column is null, present, or absent in the record)
Comparison operators                          =, <>, >, <, >=, <=
Conjunctive operators                         AND, OR
Grouping parentheses                          open and close parentheses () for logical grouping


Arithmetic operators and floating-point data types are not supported. For more complex selection conditions, use FILTER.
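For example, the following statement combines several of these elements: a literal string comparison and a numeric comparison joined by a conjunctive operator (the STATE and PRODUCT_AMT column names are illustrative):

TABLE $DATA.MASTER.CUSTOMER, WHERE (STATE = "CA" AND PRODUCT_AMT > 5000);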

9.1.2.1 Comparing Fields

Ensure that the variable and value you specify in a comparison match appropriately. Compare:

  • Character fields with literal strings

  • Numeric fields with numeric values, which can include a sign and decimal point

  • SQL datetime types with literal strings, using the format in which the field is retrieved by a program
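As a sketch of matched comparisons (assuming a character column STATE, a numeric column AMOUNT, and a datetime column ORDER_DATE; the names are illustrative), each value matches the type of its column:

WHERE (STATE = "CA")
WHERE (AMOUNT >= -123.45)
WHERE (ORDER_DATE > "2009-01-01:08:00:00")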

9.1.2.2 Compressed Update Considerations

When a compressed update record is encountered for a table, only part of the record image is available for the condition evaluation. By default, when a column required by the condition evaluation is missing, the record is ignored and output to the discard file, and a warning is issued.

  • Use only columns that appear in the primary key of the record, since key fields are always present in compressed records.

  • Test for a column's presence first, then for the column's value.

To test for a column's presence, use the following syntax:

field [= | <>] [@PRESENT | @ABSENT]

The following example selects the record when AMOUNT is present and greater than 10000, and does not cause a record to be discarded when AMOUNT is absent.

WHERE (AMOUNT = @PRESENT AND AMOUNT > 10000)

9.1.2.3 Testing for NULL Values

Evaluate SQL columns for NULL values with the @NULL clause.

The following test returns TRUE if the column is NULL, and FALSE for all other cases (including a column missing from the record).

WHERE (AMOUNT = @NULL)

The following test returns TRUE only if the column is present in the record and not NULL.

WHERE (AMOUNT = @PRESENT AND AMOUNT <> @NULL)

9.2 Column Mapping

Oracle GoldenGate provides the capability to transform data between two dissimilarly structured database tables or files. These features are implemented with the COLMAP clause in the TABLE or MAP parameters described in this chapter.

9.2.1 Mapping Between Different Database Structures

Using Oracle GoldenGate, you can transform data to accommodate differences in source and target database structures.

For example:

  • The source is a NonStop Enscribe file (ACCTFL), while the target is a SQL table (ACCTTAB).

  • 75 fields exist in ACCTFL, while ACCTTAB contains only nine columns.

  • Five columns in ACCTTAB have corresponding field names in ACCTFL (ADDRESS, CITY, STATE, ZIPCODE, SOCIAL_SECURITY_NO).

  • A ten-digit phone number field in ACCTFL corresponds to separate area code, prefix, and phone number columns in ACCTTAB.

  • A date column in ACCTTAB is computed from year, month and day fields in ACCTFL.

In this scenario, you can design a column map in a Replicat parameter file MAP statement on NonStop. For example:

MAP $DATA.MASTER.ACCTFL, DEF ACCOUNT-REC,
TARGET $DATA.MASTER.ACCTTAB,
COLMAP (
   USEDEFAULTS,
   NAME = CUST-NAME,
   TRANSACTION_DATE = @DATE ("YYYY-MM-DD",
                      "YY", TRDATE.YEAR,
                      "MM", TRDATE.MONTH,
                      "DD", TRDATE.DAY),
   AREA_CODE = @STREXT (PHONE, 1, 3),
   PHONE_PREFIX = @STREXT (PHONE, 4, 6),
   PHONE_NUMBER = @STREXT (PHONE, 7, 10)
);

This statement is composed of the following elements: 

  1. The source file (ACCTFL) and corresponding DDL definition for ACCOUNT-REC.

  2. The target table name (ACCTTAB). No definition is required for the SQL table since it is retrieved automatically from a catalog.

  3. The COLMAP clause.

  4. USEDEFAULTS, which directs Replicat to move all fields in ACCTFL that have matching columns in ACCTTAB into the ACCTTAB table. Data translation between different data types is automatic.

  5. An explicit assignment of the CUST-NAME field to the NAME column. This is required because the names are different.

  6. A date calculation for TRANSACTION_DATE based on three fields in ACCTFL.

  7. Extracting parts of PHONE into AREA_CODE, PHONE_PREFIX, and PHONE_NUMBER.

9.2.1.1 Data Type Conversions

Numeric fields are converted from one type and scale to match the type and scale of the target. If the scale of the source is larger than that of the target, the number is truncated on the right. If the target scale is larger than the source, the number is padded with zeros.
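To illustrate the scale conversion with a hypothetical source value of 123.456 (scale 3):

Target scale 2:  123.45    (truncated on the right)
Target scale 4:  123.4560  (padded with zeros)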

Varchar and character columns can accept other character, varchar, group, and datetime columns, or string literals enclosed in quotation marks. If the target character column is smaller than the source, the data is truncated on the right.

Datetime columns can accept datetime and character columns, as well as string literals. If you map a character column into a datetime column, make sure the data conforms to the Oracle GoldenGate external SQL format (YYYY-MM-DD:HH:MI:SS.FFFFFF). Required precision varies according to data type and target platform. Datetime columns are truncated on the right as necessary. If the source column is not as long as the target, the column is extended on the right with the values for the current date and time.
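For example, assuming a source character field TRANS-TS whose data already conforms to the external SQL format (the field name is illustrative), the field can be assigned directly to a datetime column in a COLMAP clause:

MAP $DATA.MASTER.ACCTFL, DEF ACCOUNT-REC,
TARGET $DATA.MASTER.ACCTTAB,
COLMAP (
   USEDEFAULTS,
   TRANSACTION_DATE = TRANS-TS
);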

9.2.1.2 Oracle GoldenGate User Tokens

Oracle GoldenGate user tokens let you capture data and values for use in data integration. User tokens are composed of alphanumeric data from your source system, database, transactions, or records. They can also carry values returned by queries, procedures, or other called functions.

Note:

The user token area is limited to 2000 bytes of information. Token names, data length, and the data itself are all used to calculate the user token area size.

User tokens are stored in each record's trail header, and retrieved by the appropriate Oracle GoldenGate component.

The following tables outline types of data that appear in user tokens.

Table 9-1 Sample Environmental Data for User Tokens

Environmental Detail    Description

GROUPNAME               Extract or Replicat group name.
HOSTNAME                Name of the host running Extract or Replicat.
OSUSERNAME              The user name that started Extract or Replicat.

Table 9-2 Sample Header Details and Their Description

Header Detail           Description

BEFOREAFTERINDICATOR    Before/after indicator
COMMITTIMESTAMP         Commit timestamp
LOGPOSITION             Log position
LOGRBA                  Log RBA
TABLENAME               Table name
OPTYPE                  Operation type
RECORDLENGTH            Record length
TRANSACTIONINDICATOR    Transaction indicator


9.2.1.3 Populating User Tokens in the Trail Header

To populate user tokens in the trail header, you must include a TOKENS clause on the FILE or TABLE parameter in the Extract parameter file. To do so, complete the following procedure:

  1. Edit the Extract parameter file.

    GGSCI> TEDIT PARAMS EXTDEMO
    
  2. Specify a table name.

    TABLE $DATA.MASTER.PRODUCT,
    
  3. Enter the desired tokens. The @GETENV function, quotation marks, and comma delimiters are required.

    TOKENS
      (
       TKN-GROUP-NAME   =@GETENV ("GGENVIRONMENT", "GROUPNAME"),
       TKN-HOST-NAME    =@GETENV ("GGENVIRONMENT", "HOSTNAME"),
       TKN-OS-USER      =@GETENV ("GGENVIRONMENT", "OSUSERNAME"),
       TKN-BA           =@GETENV ("GGHEADER", "BEFOREAFTERINDICATOR"),
       TKN-COMMIT-TS    =@GETENV ("GGHEADER", "COMMITTIMESTAMP"),
       TKN-LOG-POSITION =@GETENV ("GGHEADER", "LOGPOSITION"),
       TKN-LOG-RBA      =@GETENV ("GGHEADER", "LOGRBA"),
       TKN-TABLE        =@GETENV ("GGHEADER", "TABLENAME"),
       TKN-OP-TYPE      =@GETENV ("GGHEADER", "OPTYPE"),
       TKN-REC-LEN      =@GETENV ("GGHEADER", "RECORDLENGTH"),
       TKN-TRNS-IND     =@GETENV ("GGHEADER", "TRANSACTIONINDICATOR")
      );
    
  4. Exit the parameter file.

9.2.1.4 Retrieving Values

To retrieve values, you must include a MAP parameter and a COLMAP clause in the Replicat parameter file, then use the @TOKEN function to specify the values to retrieve.

MAP $DATA.MASTER.PRODUCT, TARGET $DATA.MASTER.PRODUCT_CHANGES,
  COLMAP (USEDEFAULTS,
          SOURCE_GROUP      =@TOKEN ("TKN-GROUP-NAME"),
          SOURCE_HOST       =@TOKEN ("TKN-HOST-NAME"),
          SOURCE_USER       =@TOKEN ("TKN-OS-USER"),
          BEFORE_AFTER_IND  =@TOKEN ("TKN-BA"),
          TIMESTAMP         =@TOKEN ("TKN-COMMIT-TS"),
          SOURCE_TABLE      =@TOKEN ("TKN-TABLE"),
          IO_TYPE           =@TOKEN ("TKN-OP-TYPE"));

The @TOKEN function requires quotation marks.

9.2.1.5 Default Mapping

When you specify COLMAP USEDEFAULTS, Extract maps columns in the source table to columns in the target with the same name. At startup, Extract outputs column names that match and will map to each other.

The USEDEFAULTS parameter allows matching columns to be mapped automatically while you map additional columns explicitly in the same COLMAP clause. This can be useful when the source and target definitions are similar but not identical.

If you set up global column mapping rules with COLMATCH parameters, you can map columns with different names to each other using default mapping. See the Extract and Replicat COLMATCH parameter for more details.
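For example, global COLMATCH rules such as the following (the column names are illustrative; see the COLMATCH parameter reference for the exact syntax) let default mapping pair columns whose names differ or that carry a prefix:

COLMATCH NAMES ACCOUNT_NO = ACCT-NUMBER
COLMATCH PREFIX S_

With these rules in effect, USEDEFAULTS maps the source ACCT-NUMBER field to the target ACCOUNT_NO column, and a source column such as S_PHONE to a target column PHONE.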

When a target column is not explicitly mapped and no match is found in a default map, it takes one of the following values:

Column                               Value

Numeric                              Zero
Character or varchar                 Spaces
Datetime                             Current date and time
Columns that can take a NULL value   NULL


If the target table contains names corresponding to the transactional columns described above, the special column values are mapped to the target record format.

9.2.1.6 Mapping Examples

The following is the source Enscribe DDL for the examples in this section.

RECORD PRODUCT-REC.
FILE IS PRODDAT KEY-SEQUENCED AUDIT.
05 PROD-KEY.
   10 CODE1 PIC X(2).
   10 CODE2 PIC 9(2).
05 PROD-INDEX1.
   10 PRICE PIC 9(7)V9(2) COMP.
   10 CODE1 PIC X(2).
   10 CODE2 PIC 9(2).
05 PROD-INDEX2.
   10 INVENTORY PIC 9(5).
   10 CODE1 PIC X(2).
   10 CODE2 PIC 9(2).
05 DESC     PIC X(40).
KEY IS PROD-KEY.
END.

The following is the target SQL DDL for the examples in this section.

Target SQL DDL
CREATE TABLE PRODTAB
(
CODE CHAR(4) NOT NULL
, PRICE NUMERIC (8,2) NOT NULL
, INVENTORY DECIMAL (6)
, MANAGER CHAR (20) NOT NULL
, DESC VARCHAR (30)
, UPDATE_TIME DATETIME YEAR TO SECOND NOT NULL
, PRIMARY KEY (CODE)
);

9.2.1.7 Legal Column Mapping

Note that you can move a group-level item (PROD-KEY) to a character column. This is feasible because CODE2 is a DISPLAY field, not a COMP. You do not have to qualify PRICE, INVENTORY, or DESC, because each is unique in the source definition. UPDATE_TIME defaults to the time at which Extract processes the record. PRICE may be truncated because it has one more significant digit in the source field than in the target.

FILE $DAT11.OLDAPP.PRODFL,
DEF PRODUCT-REC,
TARGET $DATA6.NEWAPP.PRODTAB,
COLMAP
  (CODE      = PROD-KEY,
   PRICE     = PROD-INDEX1.PRICE,
   INVENTORY = INVENTORY,
   MANAGER   = "Jack Smith",
   DESC      = DESC);

9.2.1.8 Dangerous Mapping if AUDITCOMPRESS Used on Source File

This mapping takes the target primary key value (CODE) from a source field that is not part of the source primary key. Because compressed update records contain only the changed fields plus the primary key, the mapping fails whenever a source record is updated without updating that field. By contrast, a mapping that takes CODE from the source primary key can be delivered even when AUDITCOMPRESS is used, because the primary key is always present.

FILE $DAT11.OLDAPP.PRODFL,
DEF PRODUCT-REC,
TARGET $DATA6.NEWAPP.PRODTAB,
COLMAP
  (CODE      = PROD-INDEX1.CODE1,
   PRICE     = PROD-INDEX1.PRICE,
   INVENTORY = INVENTORY,
   MANAGER   = "Unknown",
   DESC      = DESC);

9.2.1.9 Using Constants, Taking Default Values

This mapping sets PRICE to zero and MANAGER to spaces, because those columns cannot accept NULL values, and sets INVENTORY and DESC to NULL, because those columns can.

TABLE $DAT11.OLDAPP.PRODFL,
DEF PRODUCT-REC,
TARGET $DATA6.NEWAPP.PRODTAB,
COLMAP
  (CODE = PROD-KEY,
   UPDATE_TIME = "2009-01-01:08:00:00");

9.3 Field Conversion Functions

Using field conversion functions, you can convert numbers, strings, and source column or field values into the appropriate format for target columns.

See Reference for Oracle GoldenGate on HP NonStop Guardian for more information about column conversion functions.

9.3.1 Function Arguments

Column conversion functions can take one or more of the following parameters.

Parameter                                         Example

A numeric constant                                123
A string constant                                 "ABCD"
A column or field from the source table or file   PHONE-NO.AREA-CODE or COLUMN_3
An arithmetic expression                          COL2 * 100
A comparison expression                           COL3 > 100 AND COL4 > 0
A field conversion function                       a function invoked with its own parameters


Note:

Argument checking at run-time is not always strict and errors in argument passing are sometimes not detected until records are processed.
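For example, a field conversion function can take another function as one of its parameters. The following sketch (reusing the PHONE field from the earlier mapping example; the target column name is illustrative) nests @STREXT calls inside @STRCAT to build a hyphenated string:

AREA_AND_PREFIX = @STRCAT (@STREXT (PHONE, 1, 3), "-", @STREXT (PHONE, 4, 6))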

9.3.2 Arithmetic Expressions

Arithmetic expressions can be combinations of the following elements.

  • Numbers

  • Columns that contain numbers

  • Functions that return numbers

  • Arithmetic operators: + (plus), - (minus), * (multiply), / (divide), \ (remainder)

  • Comparison operators: > (greater than), >= (greater than or equal), < (less than), <= (less than or equal), = (equal), <> (not equal)

  • Parentheses (for grouping results in the expression)

  • Conjunction operators: AND, OR

To return the result of an arithmetic expression to a column, use the COMPUTE function.

The COMPUTE function is not required when an expression is passed as an argument, as in @STRNUM (AMOUNT1 + AMOUNT2, RIGHT).

@STRNUM (@COMPUTE(AMOUNT1 + AMOUNT2), RIGHT) would return the same result.

Arithmetic results derived from comparisons are zero (indicating FALSE) or non-zero (indicating TRUE).

When conjunction operators are involved in an expression, only the necessary part of the expression is evaluated. Once the outcome is determined (for example, the first operand of an AND is FALSE), the rest of the expression is ignored. This can be valuable when evaluating fields that may be missing or null.

For example, assume the value of COL1 is 25 and the value of COL2 is 10:

@COMPUTE (COL1 > 0 AND COL2 < 3) returns 0
@COMPUTE (COL1 < 0 AND COL2 < 3) returns 0 (and COL2 < 3 is never evaluated)
@COMPUTE ((COL1 + COL2)/5) returns 7

See Reference for Oracle GoldenGate on HP NonStop Guardian for details about the functions.

9.3.3 Null, Invalid, and Missing Columns and Fields

One problem encountered when calculating column values is that some data may be missing from the expression.

Table 9-3 summarizes the status conditions that the source columns or fields may assume.

Table 9-3 Explanation of Null, Invalid, and Missing Columns and Fields

Column Status   Description

Missing         Frequently, data is missing in compressed update records. Compressed update records contain only those source columns that changed, plus the key of the source file or table.
Null            A source column may contain a null value, which makes a calculation difficult.
Invalid         The source data is invalid.


When one of these conditions occurs, by default the condition is returned as the result of the function.

For example, if BALANCE is 1000, but AMOUNT is NULL, the following expression returns NULL.

NEW_BALANCE = @COMPUTE (BALANCE + AMOUNT)

As another example, the AMOUNT field is defined as PIC 9(5)V99 in an Enscribe record definition, but contains spaces. In that case, the above expression returns INVALID, and the record is discarded.

If AMOUNT, but not BALANCE, is present in the update record, the result is missing and the target column is not mapped.

9.3.3.1 Overriding Exceptional Conditions

The @IF, @COLSTAT, and @COLTEST functions recognize null, invalid, or missing columns and can compute alternative values.

For example:

NEW_BALANCE = @IF (@COLTEST (BALANCE, NULL, INVALID) OR
                   @COLTEST (AMOUNT, NULL, INVALID),
                   @COLSTAT (NULL),
                   BALANCE + AMOUNT)

This returns one of the following:

  • NULL when BALANCE or AMOUNT is NULL or INVALID

  • MISSING when either column is missing

  • The sum of the columns