3 Oracle Identity Analytics ETL Process

This chapter contains the following sections:

  • Section 3.1, "Overview"

  • Section 3.2, "Introduction"

  • Section 3.3, "Oracle Identity Analytics ETL Reference"

  • Section 3.4, "Transformation Examples"

  • Section 3.5, "Load and Unload Data From the Database"

  • Section 3.6, "CloverETL DataRecord Reference"

3.1 Overview

ETL stands for Extract, Transform, and Load. Oracle Identity Analytics uses CloverETL, which is a Java™-based data integration framework, to extract, transform, and load data to applications, databases, or warehouses.

3.2 Introduction

Oracle Identity Analytics provides the ability to import users, accounts, roles, and policies through CSV and XML files. It also supports a wide range of data transformations during the import process. Oracle Identity Analytics processes the CSV and XML files that are placed in a drop location and creates or updates objects in the Oracle Identity Analytics database. Oracle Identity Analytics uses different schema files (templates) to parse different data feeds (for example, users, accounts, roles, and policies). After Oracle Identity Analytics successfully processes a data feed, it moves the feed to a Completed location.

In addition to the Oracle Identity Analytics import functionality, Oracle Identity Analytics also provides the functionality to transform data feeds before they are put into the drop location. For example, Oracle Identity Analytics can read Excel and raw data files using the transformation graphs. Transformation graphs are XML files that contain machine-style processing instructions. For details, see Section 3.2.2, "Transformation Graphs."

3.2.1 Transformation Process

Oracle Identity Analytics transforms data files dropped into the ETL drop location using the transformation graphs. Oracle Identity Analytics uses CloverETL to perform all the transformation processing. At the end of transformation, ETL Manager writes the files to a specified drop location, which is usually configured as input for Oracle Identity Analytics.

3.2.2 Transformation Graphs

Transformation graphs are XML files that contain machine-style processing instructions. The basic elements in graphs are as follows:

  • Parameters

  • Nodes

  • Edges

  • Metadata

  • Phases

For example:

<Graph name="testing" rbacxRegxLookupFiles="tss_\w*_accounts[\.\w]*">

<Global>

<Metadata id="InMetadata" fileURL="${graphsLocation}/metadata/TSSAccount.fmt"/>

</Global>

<Phase number="0">

<Node id="INPUT" type="com... ...DelimitedDataReader" fileURL="${inputFile}"/>

<Node id="TRANSFORM" type="REFORMAT" transformClass="com... ...ReformatAccount"/>

<Node id="OUTPUT" type="com... ...DelimitedDataWriter" fileURL="${outputFile}"/>

<Edge id="INEDGE" fromNode="INPUT:0" toNode="TRANSFORM:0" metadata="InMetadata"/>

<Edge id="OUTEDGE" fromNode="TRANSFORM:0" toNode="OUTPUT:0" metadata="InMetadata"/>

</Phase>

</Graph>

  

In the previous example, the Oracle Identity Analytics ETL processor transforms all files dropped in the ETL location whose names match the pattern tss_\w*_accounts[\.\w]*, for example:

tss_endpoint01_accounts.csv

tss_endpoint02_accounts.csv
  

Thus, a different transformation can be applied to each Resource type and to each resource within a Resource type.
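As a sketch of the file matching described above: the rbacxRegxLookupFiles pattern behaves like a java.util.regex regular expression. The class below is an illustration only, not part of Oracle Identity Analytics:

```java
import java.util.regex.Pattern;

// Illustrative only: shows how file names in the drop location are
// matched against the rbacxRegxLookupFiles regular expression.
public class LookupFilesDemo {

    static final Pattern LOOKUP = Pattern.compile("tss_\\w*_accounts[\\.\\w]*");

    static boolean matches(String fileName) {
        return LOOKUP.matcher(fileName).matches();
    }
}
```

A file such as tss_endpoint01_accounts.csv matches, while a file for another resource type (for example, hr_endpoint01_accounts.csv) does not, which is what allows a different graph per resource type.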

3.2.2.1 Metadata Element

Metadata defines the structure of the records that pass from node to node. In the previous example, the metadata is defined in a file called TSSAccount.fmt.

A record must be defined as either delimited or fixed. When a record is delimited, each field requires a delimiter attribute. When a record is fixed, each field requires a size attribute.

The following example shows the contents of the TSSAccount.fmt file:

<?xml version="1.0" encoding="UTF-8"?>

<Record name="TestInput" type="delimited">

<Field name="name" type="string" delimiter=","/>

<Field name="comments" type="string" delimiter=","/>

<Field name="endPoint" type="string" delimiter=","/>

<Field name="domain" type="string" delimiter=","/>

<Field name="suspended" type="string" delimiter=","/>

<Field name="locked" type="string" delimiter=","/>

<Field name="AcidAll" type="string" delimiter=","/>

<Field name="AcidXAuth" type="string" delimiter=","/>

<Field name="FullName" type="string" delimiter=","/>

<Field name="GroupMemberOf" type="string" delimiter=","/>

<Field name="InstallationData" type="string" delimiter=","/>

<Field name="ListDataResource" type="string" delimiter=","/>

<Field name="ListDataSource" type="string" delimiter=","/>

<Field name="M8All" type="string" delimiter="\r\n"/>

</Record>
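To illustrate the record layout above: a record described by TSSAccount.fmt is one line of 14 comma-separated fields terminated by \r\n. The parsing sketch below is an illustration only, not CloverETL code:

```java
// Illustrative only: splits one delimited record as described by
// TSSAccount.fmt (each field delimited by ",", the last by "\r\n").
public class DelimitedRecordDemo {

    static String[] parse(String line) {
        // strip the record terminator, then split on the field delimiter;
        // the limit of -1 keeps trailing empty fields
        return line.replaceAll("\\r\\n$", "").split(",", -1);
    }
}
```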
  

3.2.2.2 Node

A node is an element that performs a specific task. In the following example, the node INPUT reads from a CSV file, the node TRANSFORM transforms the data, and the last node, OUTPUT, writes the resulting records to a CSV file.

<Node id="INPUT" type="com... ...DelimitedDataReader" fileURL="${inputFile}"/>

<Node id="TRANSFORM" type="REFORMAT" transformClass="com... ...ReformatAccount"/>

<Node id="OUTPUT" type="com... ...DelimitedDataWriter" fileURL="${outputFile}"/>

The element's type attribute refers to a CloverETL or Oracle Identity Analytics class. You can specify a complete class name or a short class name.

Oracle Identity Analytics provides the following nodes to read and write CSV files:

  • com.vaau.rbacx.etl.clover.components.DelimitedDataReader

  • com.vaau.rbacx.etl.clover.domain.DelimitedDataWriter

Oracle Identity Analytics also provides the com.vaau.rbacx.etl.clover.components.ExcelDataReader node to read Excel files.

3.2.2.3 Edge

The Edge element connects nodes. Nodes can have more than one input or output. To indicate the port to connect to, append a colon and the port number to the node ID.

<Edge id="INEDGE" fromNode="INPUT1:0" toNode="COPY:0" metadata="InMetadata"/>

In this example, the output port 0 of the node INPUT1 connects to the input port 0 of the node COPY, and the records are described in the XML element InMetadata.

3.2.2.4 Phase

Transformation tasks are performed in phases. When the first phase is finished, the second starts, and so on.
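For example, a two-phase graph fragment might look as follows. This is a hypothetical sketch using the node types shown earlier: the reader and copy run in phase 0, and the writer runs only after phase 0 completes.

```xml
<Phase number="0">
  <Node id="INPUT" type="com.vaau.rbacx.etl.clover.components.DelimitedDataReader" fileURL="${inputFile}"/>
  <Node id="COPY" type="SIMPLE_COPY"/>
  <Edge id="INEDGE" fromNode="INPUT:0" toNode="COPY:0" metadata="InMetadata"/>
</Phase>

<Phase number="1">
  <Node id="OUTPUT" type="com.vaau.rbacx.etl.clover.domain.DelimitedDataWriter" fileURL="${outputFile}"/>
  <Edge id="OUTEDGE" fromNode="COPY:0" toNode="OUTPUT:0" metadata="InMetadata"/>
</Phase>
```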

3.2.3 Oracle Identity Analytics CloverETL Extensions

The attributes rbacxRegxLookupFiles and rbacxExecuteAlways are not part of the CloverETL graph definition. They are processed by the Oracle Identity Analytics ETL Manager.

The attribute rbacxRegxLookupFiles is a regular expression for file names.

ETL Manager scans the drop location with this regular expression. When ETL Manager finds a file that matches this pattern, ETL Manager runs the graph with the following parameters:

inputFile : Absolute path of the file found in the Drop Location

graphsLocation : Graph Location

outputLocation : Output Location

dropLocation : Drop Location

outputFile : Absolute path for the output File
  

If the attribute rbacxExecuteAlways equals true, but no file is found (for example, if the graph reads from a database), ETL Manager runs the graph without defining the parameters inputFile and outputFile.

3.2.3.1 Transformation Configuration

ETL properties are configured in RBACX_HOME/conf/iam.properties.

Table 3-1 ETL Configuration Properties

Property Name Variable Description

ETL Graphs Location

eTLManager.graphsLocation=$RBACX_HOME/imports/etl/graphs

Directory in which to place the CloverETL graph files.

ETL Drop Location

eTLManager.dropLocation=$RBACX_HOME/imports/etl/drop

Directory in which to place data files that need transformation.

ETL Complete Location

eTLManager.completeLocation=$RBACX_HOME/imports/etl/complete

All processed files are moved to this directory after the ETL Manager completes the processing of the file.

ETL Output Location

eTLManager.outputLocation=$RBACX_HOME/imports/drop

This property specifies the directory in which to place the output of the transformation. To allow Oracle Identity Analytics to import the ETL output, this location should point to the Oracle Identity Analytics File Imports Drop Location.


3.3 Oracle Identity Analytics ETL Reference

This section includes reference information on the DelimitedDataReader, the DelimitedDataWriter, and the ExcelDataReader.

3.3.1 DelimitedDataReader and DelimitedDataWriter

CloverETL already has a CSV reader, but using the Oracle Identity Analytics version is recommended. If delimiters other than commas are in use, however, use the CloverETL version.

Provide the file URL for the DelimitedDataReader.

<Node id="INPUT" type="com.vaau.rbacx.etl.clover.components.DelimitedDataReader"
fileURL="${inputFile}"/>
  

Provide the file URL for the DelimitedDataWriter.

<Node id="OUTPUT" type="com.vaau.rbacx.etl.clover.domain.DelimitedDataWriter"
fileURL="${outputFile}"/>
  

3.3.2 ExcelDataReader

This Oracle Identity Analytics node reads Excel files.

Attributes:

fileURL - Mandatory.

Row_From - Number of the first row to read. (Optional; default value = 1)

Row_To - Number of the last row to read. (Optional; default value = -1, meaning all rows)

Col_From - Number of the first column to read. (Optional; default value = 1)

There is no Col_To because the reader uses the metadata to know how many columns it has to read.

<Node id="INPUT1" type="com.vaau.rbacx.etl.clover.components.ExcelDataReader"
fileURL="${inputFile}" Row_From="1" />

3.4 Transformation Examples

3.4.1 Merge

When the ETL Manager finds a file matching the pattern tss_\w*_accounts[\.\w]* in the drop location, the following graph is executed. The ETL Manager reads the file_01.dat, file_02.dat, and file_03.dat CSV files using the com.vaau.rbacx.etl.clover.components.DelimitedDataReader node, and then merges the data with the MERGE node. The output file preserves the sort order defined by mergeKey="ShipName;ShipVia".

The file with the pattern tss_\w*_accounts[\.\w]* is moved to the completed location. The files file_01.dat, file_02.dat, and file_03.dat stay in the c:\tss folder. The output file will have the same name as the input file.

<Graph name="TestingMerge" rbacxRegxLookupFiles="tss_\w*_accounts[\.\w]*">

<!--
This graph illustrates usage of MERGE component. It merges data based on the
specified key.
-->

<Global>

<Metadata id="InMetadata" fileURL="${graphsLocation}/metadata/tss_accounts.fmt"/>

</Global>

<Phase number="0">

<Node id="INPUT1" type="com.vaau.rbacx.etl.clover.components.DelimitedDataReader"
fileURL="c:\tss\file_01.dat"/>

<Node id="INPUT2" type="com.vaau.rbacx.etl.clover.components.DelimitedDataReader"
fileURL="c:\tss\file_02.dat"/>

<Node id="INPUT3" type="com.vaau.rbacx.etl.clover.components.DelimitedDataReader"
fileURL="c:\tss\file_03.dat"/>

<Node id="MERGE" type="MERGE" mergeKey="ShipName;ShipVia"/>

<Node id="OUTPUT" type="com.vaau.rbacx.etl.clover.domain.DelimitedDataWriter"
fileURL="${outputFile}"/>

<Edge id="INEDGE1" fromNode="INPUT1:0" toNode="MERGE:0" metadata="InMetadata"/>

<Edge id="INEDGE2" fromNode="INPUT2:0" toNode="MERGE:1" metadata="InMetadata"/>

<Edge id="INEDGE3" fromNode="INPUT3:0" toNode="MERGE:2" metadata="InMetadata"/>

<Edge id="OUTEDGE" fromNode="MERGE:0" toNode="OUTPUT:0" metadata="InMetadata"/>

</Phase>

</Graph>
  

3.4.2 Filter

The following graph demonstrates the functionality of the Extended Filter component.

It can filter on text, date, integer, and numeric fields with comparison operators: ( >, <, ==, <=, >=, != ).

Text fields can also be compared to a Java regular expression using the ~= operator.

A filter can be made up of several comparisons joined by the logical operators AND and OR. Parentheses for grouping individual comparisons are also supported. For example, $Age>10 and ($Age<20 or $HireDate<"2003-01-01").

A filter works on a single input record, where individual fields of the record are referenced using a dollar sign and the field's name. For example, $Age,$Name.

The date format for date constants is yyyy-MM-dd or yyyy-MM-dd hh:mm:ss.

The following graph produces one output file containing only the records whose comments field matches the pattern "DELTSO[0-9]*0".

<Graph name="Testing Filter" rbacxRegxLookupFiles="tss_\w*_accounts[\.\w]*">

<Global>

<Metadata id="InMetadata" fileURL="${graphsLocation}/metadata/InAccounts.fmt"/>

</Global>

<Phase number="0">

<Node id="INPUT1" type="com.vaau.rbacx.etl.clover.components.DelimitedDataReader" fileURL="${inputFile}"/>

<Node id="FILTEREMPL2" type="EXT_FILTER">

$comments~="DELTSO[0-9]*0"

</Node>

<Node id="OUTPUT1" 
type="com.vaau.rbacx.etl.clover.components.DelimitedDataWriter" 
fileURL="${outputFile}"/>

<Edge id="INEDGE1" fromNode="INPUT1:0" toNode="FILTEREMPL2:0" 
metadata="InMetadata"/>

<Edge id="INNEREDGE3" fromNode="FILTEREMPL2:0" toNode="OUTPUT1:0" 
metadata="InMetadata"/>

</Phase>

</Graph>

  

3.4.3 Fixed Length Data Reader

The following graph transforms a Fixed Length Data file into a CSV file.

<Graph name="Testing Filter" rbacxRegxLookupFiles="tss_\w*_accounts[\.\w]*">

<Global>

<Metadata id="OutMetadata" fileURL="${graphsLocation}/metadata/InAccounts.fmt"/>

<Metadata id="InMetadata" 
fileURL="${graphsLocation}/metadata/InAccountsFixedWith.fmt"/>

</Global>

<Phase number="0">

<Node id="INPUT1" type="FIXLEN_DATA_READER_NIO" OneRecordPerLine="true" 
SkipLeadingBlanks="true" LineSeparatorSize="2" fileURL="${inputFile}"/>

<Node id="COPY" type="SIMPLE_COPY"/>

<Node id="OUTPUT1" 
type="com.vaau.rbacx.etl.clover.components.DelimitedDataWriter" 
fileURL="${outputFile}"/>

<Edge id="INEDGE1" fromNode="INPUT1:0" toNode="COPY:0" metadata="InMetadata"/>

<Edge id="OUTEDGE1" fromNode="COPY:0" toNode="OUTPUT1:0" metadata="OutMetadata"/>

</Phase>

</Graph>

  

The following example shows the contents of the InAccountsFixedWith.fmt file:

<?xml version="1.0" encoding="UTF-8"?>

<Record name="TestInput" type="fixed">

<Field name="name" type="string" size="16"/>

<Field name="comments" type="string" size="16"/>

<Field name="endPoint" type="string" size="16"/>

<Field name="domain" type="string" size="5"/>

<Field name="suspended" type="string" size="10"/>

<Field name="locked" type="string" size="10"/>

<Field name="AcidAll" type="string" size="10"/>

<Field name="AcidXAuth" type="string" size="10"/>

<Field name="FullName" type="string" size="40"/>

<Field name="GroupMemberOf" type="string" size="60"/>

<Field name="InstallationData" type="string" size="60"/>

<Field name="ListDataResource" type="string" size="10"/>

<Field name="ListDataSource" type="string" size="10"/>

<Field name="M8All" type="string" size="10"/>

</Record>
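Conceptually, a fixed-length record is sliced by the size attributes in the metadata. The generic slicer below is an illustration only (not CloverETL code); for the InAccountsFixedWith.fmt example the sizes would be 16, 16, 16, 5, and so on:

```java
// Illustrative only: cuts a fixed-length line into trimmed fields
// using the size attributes from the metadata.
public class FixedWidthDemo {

    static String[] slice(String line, int... sizes) {
        String[] fields = new String[sizes.length];
        int pos = 0;
        for (int i = 0; i < sizes.length; i++) {
            fields[i] = line.substring(pos, pos + sizes[i]).trim();
            pos += sizes[i];
        }
        return fields;
    }
}
```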

3.4.4 Database Input

The DB_INPUT_TABLE node imports data from databases. In the following example, the ETL Manager executes the graph for each file that matches the pattern in rbacxRegxLookupFiles.

<Graph name="Testing Filter" rbacxRegxLookupFiles="tss_\w*_accounts[\.\w]*">

<Global>

<Metadata id="InMetadata" 
fileURL="${graphsLocation}/metadata/InAccountsFromDB.fmt"/>

<Metadata id="OutMetadata" 
fileURL="${graphsLocation}/metadata/OutAccounts.fmt"/>

<DBConnection id="InterbaseDB" dbConfig="${graphsLocation}/dbConfig/Rbacx.cfg"/>

</Global>

<Phase number="0">

<Node id="INPUT1" type="DB_INPUT_TABLE"

dbConnection="InterbaseDB">


<SQLCode>

select * from tss_01_accounts

</SQLCode>

</Node>

<Node id="COPY" type="REFORMAT" >

import org.jetel.component.DataRecordTransform;

import org.jetel.data.DataRecord;

import org.jetel.data.SetVal;

import org.jetel.data.GetVal;



public class ReformatAccount extends DataRecordTransform{

int counter=0;

DataRecord source;

DataRecord target;

public boolean transform(DataRecord _source[], DataRecord[] _target) {

StringBuffer strBuf = new StringBuffer(80);

source=_source[0];

target=_target[0];

try {

SetVal.setString(target,"name",GetVal.getString(source,"name"));

SetVal.setString(target,"comments",GetVal.getString(source,"comments"));

SetVal.setString(target,"endPoint",GetVal.getString(source,"endPoint"));

SetVal.setString(target,"domain",GetVal.getString(source,"domain"));

SetVal.setString(target,"suspended",
getBooleanString(GetVal.getInt(source,"suspended")));

SetVal.setString(target,"locked",
getBooleanString(GetVal.getInt(source,"locked")));

SetVal.setString(target,"AcidAll",GetVal.getString(source,"AcidAll"));

SetVal.setString(target,"AcidXAuth",GetVal.getString(source,"AcidXAuth"));

SetVal.setString(target,"FullName",GetVal.getString(source,"FullName"));

SetVal.setString(target,"GroupMemberOf",
GetVal.getString(source,"GroupMemberOf"));

SetVal.setString(target,"InstallationData",
GetVal.getString(source,"InstallationData"));

SetVal.setString(target,"ListDataResource",
GetVal.getString(source,"ListDataResource"));

SetVal.setString(target,"ListDataSource",
GetVal.getString(source,"ListDataSource"));

SetVal.setString(target,"M8All",GetVal.getString(source,"M8All"));

}

catch (Exception ex) {

errorMessage = ex.getMessage() + " -> occurred with record: " + counter;

return false;

}

counter++;

return true;

}



private String getBooleanString(int value){

if(value==0)

return "FALSE";

else

return "TRUE";

}

}

</Node>

<Node id="OUTPUT1" 
type="com.vaau.rbacx.etl.clover.components.DelimitedDataWriter" 
fileURL="${outputFile}"/>



<Edge id="INEDGE1" fromNode="INPUT1:0" toNode="COPY:0" 
metadata="InMetadata"/>

<Edge id="OUTEDGE1" fromNode="COPY:0" toNode="OUTPUT1:0" 
metadata="OutMetadata"/>

</Phase>

</Graph>

  

To execute this graph without placing a file in the drop location, add the attribute rbacxExecuteAlways="true".

<Graph name="Testing Filter" rbacxExecuteAlways="true" >

<Global>

<Metadata id="InMetadata" 
fileURL="${graphsLocation}/metadata/InAccountsFromDB.fmt"/>

<Metadata id="OutMetadata"
fileURL="${graphsLocation}/metadata/OutAccounts.fmt"/>

<DBConnection id="InterbaseDB" dbConfig="${graphsLocation}/dbConfig/Rbacx.cfg"/>

</Global>

<Phase number="0">

<Node id="INPUT1" type="DB_INPUT_TABLE"

dbConnection="InterbaseDB">

<SQLCode>

select * from tss_01_accounts

</SQLCode>

</Node>

<Node id="COPY" type="REFORMAT" >

import org.jetel.component.DataRecordTransform;

import org.jetel.data.DataRecord;

import org.jetel.data.SetVal;

import org.jetel.data.GetVal;



public class ReformatAccount extends DataRecordTransform{

int counter=0;

DataRecord source;

DataRecord target;

public boolean transform(DataRecord _source[], DataRecord[] _target) {

StringBuffer strBuf = new StringBuffer(80);

source=_source[0];

target=_target[0];

try {

SetVal.setString(target,"name",GetVal.getString(source,"name"));

SetVal.setString(target,"comments",GetVal.getString(source,"comments"));

SetVal.setString(target,"endPoint",GetVal.getString(source,"endPoint"));

SetVal.setString(target,"domain",GetVal.getString(source,"domain"));

SetVal.setString(target,"suspended",
getBooleanString(GetVal.getInt(source,"suspended")));

SetVal.setString(target,"locked",
getBooleanString(GetVal.getInt(source,"locked")));

SetVal.setString(target,"AcidAll",GetVal.getString(source,"AcidAll"));

SetVal.setString(target,"AcidXAuth",GetVal.getString(source,"AcidXAuth"));

SetVal.setString(target,"FullName",GetVal.getString(source,"FullName"));

SetVal.setString(target,
"GroupMemberOf",GetVal.getString(source,"GroupMemberOf"));

SetVal.setString(target,
"InstallationData",GetVal.getString(source,"InstallationData"));

SetVal.setString(target,
"ListDataResource",GetVal.getString(source,"ListDataResource"));

SetVal.setString(target,
"ListDataSource",GetVal.getString(source,"ListDataSource"));

SetVal.setString(target,"M8All",GetVal.getString(source,"M8All"));

}

catch (Exception ex) {

errorMessage = ex.getMessage() + " -> occurred with record: " + counter;

return false;

}

counter++;

return true;

}



private String getBooleanString(int value){

if(value==0)

return "FALSE";

else

return "TRUE";

}

}

</Node>

<Node id="OUTPUT1" 
type="com.vaau.rbacx.etl.clover.components.DelimitedDataWriter"
fileURL="${outputLocation}/tss_01_accounts.dat"/>


<Edge id="INEDGE1" fromNode="INPUT1:0" toNode="COPY:0" 
metadata="InMetadata"/>

<Edge id="OUTEDGE1" fromNode="COPY:0" toNode="OUTPUT1:0" 
metadata="OutMetadata"/>

</Phase>

</Graph>

3.5 Load and Unload Data From the Database

This section discusses how to move data to and from the database using CloverETL.

3.5.1 How CloverETL Works With Databases

CloverETL uses the JDBC™ API to communicate with databases. If your database has a driver supporting the JDBC API, CloverETL can unload data stored within a database table, or it can populate a database table with internal data.

3.5.2 DBConnection

Before you can connect to a database, you must define the DBConnection. This property is defined within a graph.

<DBConnection id="InterbaseDB" dbConfig="Interbase.cfg"/>

This specifies that CloverETL should set up a database connection called InterbaseDB. All required parameters (JDBC driver name, DB connect string, user name, and password) can be found in the configuration file Interbase.cfg.

The dbConfig file is a standard Java properties file. It contains names of parameters along with their values. The following table lists the possible parameters.

Table 3-2 DBConnection Parameters

Parameter Name Description of Parameter Example of Parameter's Value

dbDriver

Specifies the name of the class containing the JDBC driver for your database. This class must be visible to Java (be part of CLASSPATH).

org.postgresql.Driver

dbURL

URL for connecting to the database, including the name of JDBC driver to use, the IP address where the server listens, the name of the database instance, and the port.

jdbc:postgresql://192.168.1.100/mydb

user

The user name under which to connect to the database.

Admin

password

The password to be used.

free

driverLibrary

Optional. The location of the JDBC driver class.

c:\Oracle\product\10.1.0\Client_1\jdbc\lib\ojdbc14.jar

JDBC driver-specific parameters

Optional. Specify as needed.

Oracle example: defaultRowPrefetch=10


The following example lists the possible contents of a Postgres.cfg file that defines the connection to a PostgreSQL database:

Postgres.cfg

dbDriver=org.postgresql.Driver

dbURL=jdbc:postgresql://192.168.1.100/mydb

user=david

password=unknown
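Because the dbConfig file is a standard Java properties file, it can be read with java.util.Properties. The class below is only an illustration of that file format, not Oracle Identity Analytics code:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Illustrative only: parses dbConfig-style text with the standard
// java.util.Properties loader used for such files.
public class DbConfigDemo {

    static Properties parse(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException(e); // cannot happen for a StringReader
        }
        return p;
    }
}
```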

All parameters can also be directly specified when defining the connection:

Defining Connection

<DBConnection id="InterbaseDB" dbDriver="org.postgresql.Driver" 
dbURL="jdbc:postgresql://192.168.1.100/mydb" user="david" password="unknown"/>

If parameters are specified both directly on the DBConnection element and through the dbConfig parameter, the values from the dbConfig properties file take precedence.

3.5.2.1 Mapping JDBC Data Types to Clover Types

When working with the database through JDBC drivers, CloverETL needs to map its internal data types onto JDBC data types. The variety of DB (JDBC) field types is large, but most of them (with the exception of BLOBs) can be mapped to Clover internal types without losing any information.

3.5.2.2 JDBC to CloverETL

The following table lists JDBC data types and corresponding CloverETL data types. The conversion is done automatically by CloverETL when analyzing DB tables using the org.jetel.database.AnalyzeDB utility. This conversion can also be made manually.

Table 3-3 JDBC Data Types and Corresponding CloverETL Data Types

JDBC (DB) Data Type -> CloverETL Data Type

INTEGER, SMALLINT, TINYINT -> INTEGER

BIGINT -> LONG

DECIMAL, DOUBLE, FLOAT, NUMERIC, REAL -> NUMERIC

CHAR, LONGVARCHAR, VARCHAR, OTHER -> STRING

DATE, TIME, TIMESTAMP -> DATE

BOOLEAN, BIT -> STRING (a true value is coded as T; a false value as F)


The following example illustrates the conversion. First, the DDL (Oracle DB) definition of the database table is presented, and then Clover's version of the same thing using its internal data types.

DDL (Oracle DB)

create table MYEMPLOYEE

(

  EMP_NO      NUMBER not null,

  FIRST_NAME  VARCHAR2(15) not null,

  LAST_NAME   VARCHAR2(20) not null,

  PHONE_EXT   VARCHAR2(4),

  HIRE_DATE   DATE not null,

  DEPT_NO     CHAR(3) not null,

  JOB_CODE    VARCHAR2(5) not null,

  JOB_GRADE   NUMBER(4,2) not null,

  JOB_COUNTRY VARCHAR2(15) not null,

  SALARY      NUMBER(15,2) not null,

  FULL_NAME   VARCHAR2(35)

);

Clover's Version

<?xml version="1.0" encoding="UTF-8"?>

<!-- Automatically generated from database null -->

<Record name="EMPLOYEE" type="delimited">

    <Field name="EMP_NO" type="numeric" delimiter="," format="#"/>

    <Field name="FIRST_NAME" type="string" delimiter="," />

    <Field name="LAST_NAME" type="string" delimiter="," />

    <Field name="PHONE_EXT" type="string" nullable="yes" delimiter="," />

    <Field name="HIRE_DATE" type="date" delimiter="," format="dd/MM/yyyy" />

    <Field name="DEPT_NO" type="string" delimiter="," />

    <Field name="JOB_CODE" type="string" delimiter="," />

    <Field name="JOB_GRADE" type="numeric" delimiter="," />

    <Field name="JOB_COUNTRY" type="string" delimiter="," />

    <Field name="SALARY" type="numeric" delimiter="," />

    <Field name="FULL_NAME" type="string" nullable="yes" delimiter="\n" />

</Record>
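The mapping in Table 3-3 can be sketched in terms of the java.sql.Types constants. This is an illustration of the table only, not the actual CloverETL implementation:

```java
import java.sql.Types;

// Illustrative only: restates Table 3-3 as a lookup from JDBC type
// constants to CloverETL type names.
public class JdbcToCloverMapping {

    static String cloverType(int jdbcType) {
        switch (jdbcType) {
            case Types.INTEGER: case Types.SMALLINT: case Types.TINYINT:
                return "INTEGER";
            case Types.BIGINT:
                return "LONG";
            case Types.DECIMAL: case Types.DOUBLE: case Types.FLOAT:
            case Types.NUMERIC: case Types.REAL:
                return "NUMERIC";
            case Types.CHAR: case Types.LONGVARCHAR: case Types.VARCHAR:
            case Types.OTHER:
                return "STRING";
            case Types.DATE: case Types.TIME: case Types.TIMESTAMP:
                return "DATE";
            case Types.BOOLEAN: case Types.BIT:
                return "STRING"; // true coded as "T", false as "F"
            default:
                throw new IllegalArgumentException("unmapped JDBC type: " + jdbcType);
        }
    }
}
```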

3.5.2.3 CloverETL to JDBC

The reverse conversion, from a CloverETL data type to a JDBC data type (usually done when populating a target DB table), is also driven by JDBC data types. There are some exceptions, caused by the absence of certain field types on the CloverETL side; these exceptions are handled automatically by CloverETL. Internally, this is done by calling methods other than the standard JDBC methods when populating database fields with values. Refer to the source code (org.jetel.database.CopySQLData) for detailed information.

Table 3-4 Conversions Performed When Converting From CloverETL to JDBC

CloverETL Type -> JDBC Type : Conversion Performed

Date -> Timestamp : The Date is converted to a Timestamp, and the target is set using the setTimestamp() method.

String -> Boolean, Bit : If the string contains T or t, the target is set to true; otherwise to false, using the setBoolean() method.

Integer -> Decimal, Double, Numeric, Real : The Integer is converted to a Decimal, and the target is set using the setDouble() method.

String -> Other (includes NVARCHAR and NCHAR) : The target is set using the setString() method.


3.5.3 Using the AnalyzeDB Utility

The CloverETL package contains a simple utility that can analyze a source or target database table and produce Clover's metadata description file. This metadata can be used by any DB-related component.

The following table lists the parameters that can be specified with the AnalyzeDB command. The command must specify which database to connect to and which database table to analyze. You can use the same DBConnection file described previously in Section 3.5.2, "DBConnection."

To specify which table to analyze, supply an SQL query to execute against the database. The returned result set is examined for field types. As a result, you can extract and analyze only a portion of a table.

The following table lists the options and parameters:

Table 3-5 Parameters for use With the AnalyzeDB Utility

Parameter : Description

-dbDriver : JDBC driver to use

-dbURL : Database name (URL)

-config : Config or property file containing parameters

-user : User name

-password : User's password

-d : Delimiter to use (default is a comma)

-o : Output file to use (default is stdout)

-f : Read the SQL query from the named file

-q : SQL query specified on the command line

-info : Displays the list of the driver's properties


The following example examines all data fields of the employees DB table:

java -cp cloverETL.rel-1-x.zip org.jetel.database.AnalyzeDB -config postgres.sql -q "select * from employees where 1=0"

The following example extracts specific fields, as stated in the SQL query:

java -cp cloverETL.rel-1-x.zip org.jetel.database.AnalyzeDB -config postgres.sql -q "select emp_no,full_name from employees where 1=0"

3.5.4 DBInputTable Component

To unload data from a database table, use the DBInputTable component. It requires a dbConnection parameter and an SQL command (the sqlQuery parameter), which is executed against the database specified by dbConnection.

Individual fields fetched from the database are mapped to Clover data records/fields. (See Section 3.5.2.2, "JDBC to CloverETL."). The structure of the Clover record is determined by specified Clover metadata. (Metadata is assigned to an Edge, which connects DBInputTable with other components connected to DBInputTable.)

The following example transformation graph uses the DBInputTable component:

Transformation Graph

<?xml version="1.0" encoding="UTF-8"?>

<Graph name="TestingDB">

<Global>

<Metadata id="InMetadata" fileURL="metadata/employee.fmt"/>

<DBConnection id="PosgressDB" dbConfig="Posgress.cfg"/>

</Global>

<Phase number="0">

<Node id="INPUT" type="DB_INPUT_TABLE" dbConnection="PosgressDB"

sqlQuery="select * from employee"/>

<Node id="OUTPUT" type="DELIMITED_DATA_WRITER_NIO" append="false"

fileURL="employees2.list.out"/>

<Edge id="INEDGE" fromNode="INPUT:0" toNode="OUTPUT:0"

metadata="InMetadata"/>

</Phase>

</Graph>

The SQL command (sqlQuery) can be more complicated than the previous example suggests. You can use any valid SQL construct, but make sure that the metadata corresponds to the number and types of returned data fields.

3.5.5 DBOutputTable Component

To populate a database table with data coming from a CloverETL transformation graph, use the DBOutputTable component. It is complementary to DBInputTable: it maps CloverETL data records and individual fields to target database table fields, and it can perform simple data conversions to map CloverETL basic data types onto their target database variants. See Section 3.5.2.3, "CloverETL to JDBC."

The following example illustrates the usage of DBOutputTable:

<?xml version="1.0" encoding="UTF-8"?>

<Graph name="TestingDB2">

<Global>

<Metadata id="InMetadata" fileURL="metadata/myemployee.fmt"/>

<DBConnection id="PosgressDB" dbConfig="posgress.cfg"/>

</Global>

<Phase number="0">

<Node id="INPUT" type="DELIMITED_DATA_READER_NIO"

fileURL="employees.list.dat" />

<Node id="OUTPUT" type="DB_OUTPUT_TABLE" dbConnection="PosgressDB"

dbTable="myemployee" />

<Edge id="INEDGE" fromNode="INPUT:0" toNode="OUTPUT:0"

metadata="InMetadata"/>

</Phase>

</Graph>

  

If you need to populate only certain fields of the target DB table (when, for instance, one field is automatically populated from a DB sequence), the dbFields parameter of DBOutputTable can be used:

<Node id="OUTPUT2" type="DB_OUTPUT_TABLE" dbConnection="PosgressDB" 
dbTable="myemployee" dbFields="FIRST_NAME;LAST_NAME" />

  

The DBOutputTable cloverFields parameter can be used to precisely specify the mapping from CloverETL data records to database table records. It allows you to specify which source field (from Clover) is mapped to which target database table field.

Coupled with dbFields, it specifies a 1:1 mapping. Individual fields are mapped according to the order in which they appear in dbFields and cloverFields, respectively. The parameter that determines how many fields will be populated is always dbFields. When there is no dbFields parameter present, CloverETL assumes that all target fields should be populated in the order in which they appear in the target database table.

The following examples illustrate how to pick certain fields from the source data record (a CloverETL record), regardless of their order, and map them to target database table fields (again, regardless of their order).

<?xml version="1.0" encoding="UTF-8"?>

<Graph name="TestingDB3">

<Global>

<Metadata id="InMetadata" fileURL="metadata/myemployee.fmt"/>

<DBConnection id="PosgressDB" dbConfig="posgress.cfg"/>

</Global>

<Phase number="1">

<Node id="INPUT" type="DELIMITED_DATA_READER_NIO"

fileURL="employees2.list.tmp" />

<Node id="OUTPUT" type="DB_OUTPUT_TABLE" dbConnection="PosgressDB"

dbTable="myemployee"

    dbFields="FIRST_NAME;LAST_NAME"

    cloverFields="LAST_NAME;FIRST_NAME" />

<Edge id="INEDGE" fromNode="INPUT:0" toNode="OUTPUT:0"

metadata="InMetadata"/>

</Phase>

</Graph>

  

The resulting mapping between fields specified in the previous example is:

Source Field (CloverETL) -> Target Field (DB Table)

LAST_NAME -> FIRST_NAME

FIRST_NAME -> LAST_NAME
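The positional pairing of cloverFields to dbFields can be sketched as follows. This is an illustration only, not CloverETL code:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: pairs source (Clover) fields with target (DB) fields
// by position; dbFields determines how many fields are populated.
public class FieldMappingDemo {

    static Map<String, String> pair(String cloverFields, String dbFields) {
        String[] src = cloverFields.split(";");
        String[] dst = dbFields.split(";");
        Map<String, String> mapping = new LinkedHashMap<>();
        for (int i = 0; i < dst.length; i++) {
            mapping.put(src[i], dst[i]);
        }
        return mapping;
    }
}
```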


3.5.6 Executing SQL/DML/DDL Statements against DB

Sometimes you need to execute one or more database commands that do not require any input. Examples include creating a new table, adding a data partition, and dropping an index. For this purpose, CloverETL offers the DBExecute component.

3.5.6.1 DBExecute Component

The DBExecute component takes specified commands and executes them one by one against the database. You can define whether all commands form one transaction, or whether they should be committed to the database after each command.

The following is a simple example of DBExecute:

<?xml version="1.0" encoding="UTF-8"?>
<Graph name="TestingExecute">
<Global>
<DBConnection id="InterbaseDB" dbConfig="interbase.cfg"/>
</Global>
<Phase number="0">
<Node id="DBEXEC" type="DB_EXECUTE" dbConnection="InterbaseDB"
      inTransaction="N">
<SQLCode>
create table EMPLOYEE
(
  EMP_NO      NUMBER not null,
  FIRST_NAME  VARCHAR2(15) not null,
  LAST_NAME   VARCHAR2(20) not null,
  PHONE_EXT   VARCHAR2(4),
  HIRE_DATE   DATE not null,
  DEPT_NO     CHAR(3) not null,
  JOB_CODE    VARCHAR2(5) not null,
  JOB_GRADE   NUMBER(4,2) not null,
  JOB_COUNTRY VARCHAR2(15) not null,
  SALARY      NUMBER(15,2) not null,
  FULL_NAME   VARCHAR2(35)
);
insert into employee values(2,'Robert','Nelson','250','28/12/1988','600','VP',2.0,'USA',105900.0,'Nelson, Robert');
insert into employee values(4,'Bruce','Young','233','28/12/1988','621','Eng',2.0,'USA',97500.0,'Young, Bruce');
insert into employee values(5,'Kim','Lambert','22','06/02/1989','130','Eng',2.0,'USA',102750.0,'Lambert, Kim');
insert into employee values(8,'Leslie','Johnson','410','05/04/1989','180','Mktg',3.0,'USA',64635.0,'Johnson, Leslie');
insert into employee values(9,'Phil','Forest','229','17/04/1989','622','Mngr',3.0,'USA',75060.0,'Forest, Phil');
</SQLCode>
</Node>
</Phase>
</Graph>

3.6 CloverETL DataRecord Reference

This section provides additional information about the CloverETL DataRecord.

3.6.1 How Data is Represented Within CloverETL

CloverETL works with data in terms of data records, and data fields within records. Internally, all records are represented as variable-length data. This means that every data field consumes only as much memory as needed for storing a field's value. If you have a field of type STRING specified to be 50 characters in length and this field is populated with a string of 20 characters, only 20 characters are allocated in memory.

Moreover, CloverETL does not require that a length be specified. There is an internal maximum length for any field, but it should be enough to accommodate even very long strings. For types other than strings, the field has a fixed size, regardless of the actual value.

There are some cases when it matters whether you specify the size of each field. This is discussed in the next section.

3.6.2 Supported Data Field Types

The following table lists all supported types of data, along with ranges of values for each type.

Table 3-6 Supported Data Types and Value Ranges in CloverETL

Data Type Name  Based on           Size                           Range of Values
string          java.lang.String   Depends on actual data length  --
date            java.util.Date     64 bit - sizeof(long)          Starts: January 1, 1970, 00:00:00 GMT; increment: 1 ms
integer         java.lang.Integer  32 bit - sizeof(int)           Min: -2^31; Max: 2^31 - 1
numeric         java.lang.Double   64 bit - sizeof(double)        Min: 2^-1074; Max: (2 - 2^-52) * 2^1023
long            java.lang.Long     64 bit - sizeof(long)          Min: -2^63; Max: 2^63 - 1
decimal         NA                 NA                             Not yet implemented
byte            java.lang.Byte     Depends on actual data length  Min: 0; Max: 255


3.6.3 Specification of Record Format

One way of putting together a description of a record format is to write Java code that calls CloverETL classes and methods directly.

The easier way is to create an XML description of a record format that can be read by CloverETL and automatically materialized in memory.

It is customary to use the .fmt extension for an XML file that contains metadata describing the format of a data record. The following example shows simple metadata that describes a record containing three data fields:

<?xml version="1.0" encoding="UTF-8"?>
<Record name="TestInput" type="delimited">
    <Field name="Name" type="string" delimiter=";"/>
    <Field name="Age" type="numeric" delimiter="|"/>
    <Field name="City" type="string" delimiter="\n"/>
</Record>

  

This simple example shows the definition of a data record specified as delimited. The record has three fields:

  • Name (of type string)

  • Age (of type numeric)

  • City (of type string)

3.6.3.1 Naming

There are no strict rules for naming fields (and records). However, it is good practice to follow the same rules as for naming Java variables: use only letters [a-zA-Z], numbers [0-9] (not in the first position), and underscores [_].

The encoding specified for the XML file is UTF-8.

Note:

When creating a file, you must save it using the encoding specified in the encoding attribute. Otherwise, the XML parser used by CloverETL will not be able to interpret the file correctly.

3.6.4 Delimiters

Each field in the previous example has a specified delimiter character. This information is used by the data parser when parsing data records (of this structure) from external text files. The same delimiters are used when CloverETL outputs internal data records (of this structure) to output text files.

Delimiters can be up to 32 characters long, and each field can have a different one. Basic control characters such as \t (tabulator), \n (line feed), and \r (carriage return) are supported.
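To see how such per-field delimiters carve up one line of text, consider the following plain-Java sketch (a hypothetical helper, not CloverETL API) using the delimiters from the TestInput metadata above: ';' ends Name, '|' ends Age, and the line break ends City.

```java
public class DelimiterDemo {
    // Split one record using the per-field delimiters from the TestInput
    // metadata: Name ends at ';', Age at '|', City at end of line.
    static String[] parseRecord(String line) {
        int semi = line.indexOf(';');
        int pipe = line.indexOf('|', semi + 1);
        return new String[] {
            line.substring(0, semi),          // Name
            line.substring(semi + 1, pipe),   // Age
            line.substring(pipe + 1)          // City
        };
    }

    public static void main(String[] args) {
        String[] fields = parseRecord("John;30|New York");
        System.out.println(fields[0] + ", " + fields[1] + ", " + fields[2]);
        // John, 30, New York
    }
}
```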

3.6.5 Field Formats and Other Features

The following example shows additional features:

<?xml version="1.0" encoding="UTF-8"?>
<!-- Automatically generated from database null -->
<Record name="EMPLOYEE" type="delimited">
    <Field name="EMP_NO" type="integer" delimiter="," format="#"/>
    <Field name="FIRST_NAME" type="string" delimiter="," />
    <Field name="LAST_NAME" type="string" delimiter="," />
    <Field name="PHONE_EXT" type="string" nullable="yes" delimiter="," />
    <Field name="HIRE_DATE" type="date" delimiter="," format="dd/MM/yyyy" />
    <Field name="BIRTH_DATE" type="date" delimiter="," locale="en"/>
    <Field name="DEPT_NO" type="string" delimiter="," />
    <Field name="JOB_CODE" type="string" delimiter="," />
    <Field name="JOB_GRADE" type="numeric" delimiter="," format="#" />
    <Field name="JOB_COUNTRY" type="string" delimiter="," />
    <Field name="SALARY" type="numeric" delimiter="," />
    <Field name="FULL_NAME" type="string" nullable="yes" delimiter="\n" />
</Record>

  

3.6.5.1 Nullable

Some fields, such as PHONE_EXT, have the nullable attribute set to yes, which means that the field is allowed to contain a null value. The default is yes (true); that is, the field can contain a null value. The exact behavior depends on the concrete data parser or data formatter, but simply put, when a field is not specified as nullable and an application tries to put a null value in it, the operation fails. This can stop the whole transformation process.

3.6.5.2 Format

Use the format attribute to specify the expected format of data when it is parsed into, or printed out of, CloverETL. In this case, the HIRE_DATE field is of type date, and the format specifies that date values in the external textual data look like this: 19/12/1999.

For all possible format specifiers (control characters), see the documentation for java.text.SimpleDateFormat.
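For instance, the HIRE_DATE pattern from the example metadata behaves as follows when used directly with java.text.SimpleDateFormat (plain Java, outside of CloverETL; the helper names are invented for this sketch):

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

public class HireDateDemo {
    // Parse a date string using the same pattern as the HIRE_DATE field.
    static Date parseHireDate(String text) {
        try {
            return new SimpleDateFormat("dd/MM/yyyy").parse(text);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Bad date: " + text, e);
        }
    }

    // Format a date back to the external textual representation.
    static String formatHireDate(Date date) {
        return new SimpleDateFormat("dd/MM/yyyy").format(date);
    }

    public static void main(String[] args) {
        Date d = parseHireDate("19/12/1999");
        System.out.println(formatHireDate(d)); // prints 19/12/1999
    }
}
```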

Similar to HIRE_DATE is the JOB_GRADE field, which is of type numeric. Here the format specifies that data is expected to be integer numbers only (no decimal point allowed).

See the following tables for date and number format specifiers.

3.6.5.3 Date and Time Specifiers

Table 3-7 Date and Time Specifiers

Letter  Date or Time Component   Presentation        Examples
G       Era designator           Text                AD
y       Year                     Year                1996; 96
M       Month in year            Month               July; Jul; 07
w       Week in year             Number              27
W       Week in month            Number              2
D       Day in year              Number              189
d       Day in month             Number              10
F       Day of week in month     Number              2
E       Day in week              Text                Tuesday; Tue
a       Am/pm marker             Text                PM
H       Hour in day (0-23)       Number              0
k       Hour in day (1-24)       Number              24
K       Hour in am/pm (0-11)     Number              0
h       Hour in am/pm (1-12)     Number              12
m       Minute in hour           Number              30
s       Second in minute         Number              55
S       Millisecond              Number              978
z       Time zone                General time zone   Pacific Standard Time; PST; GMT-08:00
Z       Time zone                RFC 822 time zone   -0800


3.6.5.4 Date and Time Format Examples

Table 3-8 Date and Time Format Examples

Date and Time Pattern              Result
"yyyy.MM.dd G 'at' HH:mm:ss z"     2001.07.04 AD at 12:08:56 PDT
"EEE, MMM d, ''yy"                 Wed, Jul 4, '01
"h:mm a"                           12:08 PM
"hh 'o''clock' a, zzzz"            12 o'clock PM, Pacific Daylight Time
"K:mm a, z"                        0:08 PM, PDT
"yyyyy.MMMMM.dd GGG hh:mm aaa"     02001.July.04 AD 12:08 PM
"EEE, d MMM yyyy HH:mm:ss Z"       Wed, 4 Jul 2001 12:08:56 -0700
"yyMMddHHmmssZ"                    010704120856-0700


3.6.5.5 Number Specifiers

Table 3-9 Number Specifiers

Symbol   Location             Localized       Meaning
0        Number               Localized       Digit
#        Number               Localized       Digit; zero shows as absent
.        Number               Localized       Decimal separator or monetary decimal separator
-        Number               Localized       Minus sign
,        Number               Localized       Grouping separator
E        Number               Localized       Separates mantissa and exponent in scientific notation; need not be quoted in prefix or suffix
;        Subpattern boundary  Localized       Separates positive and negative subpatterns
%        Prefix or suffix     Localized       Multiply by 100 and show as percentage
\u2030   Prefix or suffix     Localized       Multiply by 1000 and show as per mille
\u00A4   Prefix or suffix     Not localized   Currency sign, replaced by currency symbol; if doubled, replaced by international currency symbol; if present in a pattern, the monetary decimal separator is used instead of the decimal separator
'        Prefix or suffix     Not localized   Used to quote special characters in a prefix or suffix; for example, "'#'#" formats 123 to "#123". To create a single quote itself, use two in a row: "# o''clock"


3.6.5.6 Number Format

When specifying the format for numbers, Clover (Java) uses the default system locale setting, unless another locale is specified through the locale option.

This is important in cases when you are parsing data where decimal numbers use a , (comma) as a decimal separator, whereas the system default (global) says it is . (period).

In such a case, use the locale option together with the format option to change the expected decimal delimiter. For example:

<Field name="Freight" type="numeric" delimiter="|" format="#.#" locale="en.US" />

  

3.6.5.7 Locale

Instead of specifying a format parameter, you can specify a locale parameter, which states the geographical, political, or cultural region for formatting data. Thus, instead of specifying the format for the date field, you could specify the locale for Germany (locale="de"), for example. Clover automatically chooses the proper date format for Germany.

There are cases when both format and locale parameters make sense, for example when formatting decimal numbers. Define the format pattern with a decimal separator, and the locale specifies whether the separator is a comma or a dot.
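Because CloverETL delegates number parsing and formatting to Java, the interaction of format and locale can be reproduced directly with java.text.DecimalFormat. The following is an illustrative sketch only; parseNumber is a hypothetical helper, not CloverETL API:

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.ParseException;
import java.util.Locale;

public class NumberLocaleDemo {
    // Parse "#.#"-style numbers whose decimal separator depends on locale.
    static double parseNumber(String text, Locale locale) {
        DecimalFormat fmt =
                new DecimalFormat("#.#", DecimalFormatSymbols.getInstance(locale));
        try {
            return fmt.parse(text).doubleValue();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Bad number: " + text, e);
        }
    }

    public static void main(String[] args) {
        // With a German locale the comma is the decimal separator...
        System.out.println(parseNumber("3,14", Locale.GERMANY)); // 3.14
        // ...while with US English it is the period.
        System.out.println(parseNumber("3.14", Locale.US));      // 3.14
    }
}
```

Note that with Locale.US the comma is a grouping separator, so "3,14" would parse as 314 rather than fail.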

3.6.6 Specifying Default Values for Fields

CloverETL allows you to specify a default value for each field. This value is used (in certain cases) when a field is assigned to be null, but a null value is not allowed for the field.

The following example shows fields with specified default values:

<?xml version="1.0" encoding="UTF-8"?>
<Record name="Orders" type="delimited">
    <Field name="OrderID" type="numeric" delimiter="|" format="#" />
    <Field name="OrderDate" type="date" delimiter="|" format="dd.MM.yyyy"
           default="01.01.1900" nullable="no" />
    <Field name="Amount" type="numeric" delimiter="\n" default="0.0"
           nullable="no" />
</Record>

  

In this example, OrderDate defaults to 01.01.1900 if the value is not present in the text data from which the record is parsed. In general, when this field is assigned a null value, the specified default value is assigned instead. The same is true for the Amount field, except that its default is 0.0.

Note - This default-value substitution is not performed everywhere; it concerns only data parsers. If your own code assigns a null value to a non-nullable field, a BadDataFormatException error occurs.

If you use any of the Clover data parsers, you can specify a DataPolicy, which states what should happen if a parsed value cannot be assigned to a data field (as in the case when the value is null and the field cannot accept null values).

There are three different data policies defined:

  • Strict - Any problem causes BadDataFormatException. This is the default behavior.

  • Controlled - Similar to Strict, but the problematic value is also logged.

  • Lenient - If a default value exists, CloverETL attempts to assign that default value.
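The Lenient policy described above amounts to the following fallback logic (an illustrative sketch, not the actual CloverETL implementation; the helper name assignLenient is invented for this example):

```java
public class DataPolicyDemo {
    // Illustrative only: how a parser might treat a null value arriving
    // in a field under the Lenient policy described above.
    static String assignLenient(String parsed, boolean nullable, String defaultValue) {
        if (parsed != null) return parsed;             // normal case
        if (nullable) return null;                     // null is allowed
        if (defaultValue != null) return defaultValue; // Lenient: use the default
        // No default to fall back on: same failure as Strict
        throw new IllegalStateException("null in non-nullable field with no default");
    }

    public static void main(String[] args) {
        // Mirrors the OrderDate example: null input, non-nullable, with a default
        System.out.println(assignLenient(null, false, "01.01.1900")); // prints 01.01.1900
    }
}
```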