ETL stands for Extract, Transform, and Load. Oracle Identity Analytics uses CloverETL, which is a Java™-based data integration framework, to extract, transform, and load data to applications, databases, or warehouses.
Oracle Identity Analytics provides the ability to import users, accounts, roles, and policies through CSV and XML files. It also supports a wide range of data transformations during the import process. Oracle Identity Analytics processes the CSV and XML files that are placed in a drop location and creates or updates objects in the Oracle Identity Analytics database. Oracle Identity Analytics uses different schema files (templates) to parse different data feeds (for example, users, accounts, roles, and policies). After Oracle Identity Analytics successfully processes a data feed, it moves the feed to a Completed location.
In addition to the import functionality, Oracle Identity Analytics also provides the ability to transform data feeds before they are placed in the drop location. For example, Oracle Identity Analytics can read Excel and raw data files using transformation graphs. Transformation graphs are XML files that contain machine-style processing instructions. For details, see Section 3.2.2, "Transformation Graphs."
Oracle Identity Analytics transforms data files dropped into the ETL drop location using the transformation graphs. Oracle Identity Analytics uses CloverETL to perform all the transformation processing. At the end of transformation, ETL Manager writes the files to a specified drop location, which is usually configured as input for Oracle Identity Analytics.
Transformation graphs are XML files that contain machine-style processing instructions. The basic elements in graphs are as follows:
Parameters
Nodes
Edges
Metadata
Phases
For example:
<Graph name="testing" rbacxRegxLookupFiles="tss_\w*_accounts[\.\w]*">
  <Global>
    <Metadata id="InMetadata" fileURL="${graphsLocation}/metadata/TSSAccount.fmt"/>
  </Global>
  <Phase number="0">
    <Node id="INPUT" type="com... ...DelimitedDataReader" fileURL="${inputFile}"/>
    <Node id="TRANSFORM" type="REFORMAT" transformClass="com... ...ReformatAccount"/>
    <Node id="OUTPUT" type="com... ...DelimitedDataWriter" fileURL="${outputFile}"/>
    <Edge id="INEDGE" fromNode="INPUT:0" toNode="TRANSFORM:0" metadata="InMetadata"/>
    <Edge id="OUTEDGE" fromNode="TRANSFORM:0" toNode="OUTPUT:0" metadata="InMetadata"/>
  </Phase>
</Graph>
In the previous example, the Oracle Identity Analytics ETL processor transforms every file dropped in the ETL drop location whose name matches the pattern tss_\w*_accounts[\.\w]*, for example:

tss_endpoint01_accounts.csv
tss_endpoint02_accounts.csv
Thus, a different transformation can be applied to each Resource type and to each resource within a Resource type.
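A sketch of this per-resource pattern follows: one graph per resource type, each with its own rbacxRegxLookupFiles pattern. The first pattern is taken from this chapter; the second pattern and both graph names are hypothetical illustrations, not shipped configuration.

```xml
<!-- Matches Top Secret account feeds (pattern used throughout this chapter) -->
<Graph name="tssAccounts" rbacxRegxLookupFiles="tss_\w*_accounts[\.\w]*">
  <!-- ... transformation for Top Secret account files ... -->
</Graph>

<!-- Hypothetical second graph for a different resource type -->
<Graph name="ldapAccounts" rbacxRegxLookupFiles="ldap_\w*_accounts[\.\w]*">
  <!-- ... transformation for LDAP account files ... -->
</Graph>
```

Because ETL Manager selects a graph by matching file names against each graph's pattern, dropping differently named feeds triggers different transformations.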
Metadata defines the records that flow between nodes. In the previous example, the metadata is defined in a file called TSSAccount.fmt.

A record must be defined as delimited or fixed. When the record is defined as delimited, the delimiter attribute is required. When the record is defined as fixed, a size attribute is required.

The following example shows the contents of the TSSAccount.fmt file:
<?xml version="1.0" encoding="UTF-8"?>
<Record name="TestInput" type="delimited">
  <Field name="name" type="string" delimiter=","/>
  <Field name="comments" type="string" delimiter=","/>
  <Field name="endPoint" type="string" delimiter=","/>
  <Field name="domain" type="string" delimiter=","/>
  <Field name="suspended" type="string" delimiter=","/>
  <Field name="locked" type="string" delimiter=","/>
  <Field name="AcidAll" type="string" delimiter=","/>
  <Field name="AcidXAuth" type="string" delimiter=","/>
  <Field name="FullName" type="string" delimiter=","/>
  <Field name="GroupMemberOf" type="string" delimiter=","/>
  <Field name="InstallationData" type="string" delimiter=","/>
  <Field name="ListDataResource" type="string" delimiter=","/>
  <Field name="ListDataSource" type="string" delimiter=","/>
  <Field name="M8All" type="string" delimiter="\r\n"/>
</Record>
A node is an element that performs a specific task. In the following example, the node INPUT reads from a CSV file, the node TRANSFORM transforms the data, and the last node, OUTPUT, writes the resulting records to a CSV file.
<Node id="INPUT" type="com... ...DelimitedDataReader" fileURL="${inputFile}"/>
<Node id="TRANSFORM" type="REFORMAT" transformClass="com... ...ReformatAccount"/>
<Node id="OUTPUT" type="com... ...DelimitedDataWriter" fileURL="${outputFile}"/>
The element's type attribute refers to a CloverETL or Oracle Identity Analytics class. You can specify a complete class name or a short class name.
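Both forms appear in the examples in this chapter. As a quick sketch, the first declaration below uses a short class name (a built-in CloverETL component type) and the second uses a fully qualified class name; the transformClass value here is a hypothetical placeholder.

```xml
<!-- Short class name: a built-in CloverETL component type -->
<Node id="TRANSFORM" type="REFORMAT" transformClass="com.example.MyReformat"/>

<!-- Fully qualified class name: an Oracle Identity Analytics component -->
<Node id="INPUT" type="com.vaau.rbacx.etl.clover.components.DelimitedDataReader"
      fileURL="${inputFile}"/>
```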
Oracle Identity Analytics provides the following nodes to read and write CSV files:

com.vaau.rbacx.etl.clover.components.DelimitedDataReader
com.vaau.rbacx.etl.clover.domain.DelimitedDataWriter

Oracle Identity Analytics also provides the com.vaau.rbacx.etl.clover.components.ExcelDataReader node to read Excel files.
The Edge element connects nodes. Nodes can have more than one input or output. To indicate a port to connect to, add a colon and the port number to the node name.
<Edge id="INEDGE" fromNode="INPUT1:0" toNode="COPY:0" metadata="InMetadata"/>
In this example, the output port 0 of the node INPUT1 connects to the input port 0 of the node COPY, and the records are described in the XML element InMetadata.
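As a sketch of how multiple ports are wired, a SIMPLE_COPY node can feed two writers: each Edge names the port it attaches to on each side. The node ids below are illustrative, not from a shipped graph.

```xml
<Node id="COPY" type="SIMPLE_COPY"/>
<Edge id="E1" fromNode="READER:0" toNode="COPY:0"     metadata="InMetadata"/>
<Edge id="E2" fromNode="COPY:0"   toNode="WRITER_A:0" metadata="InMetadata"/>
<Edge id="E3" fromNode="COPY:1"   toNode="WRITER_B:0" metadata="InMetadata"/>
```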
The attributes rbacxRegxLookupFiles and rbacxExecuteAlways are not part of the CloverETL graph definition. They are processed by the Oracle Identity Analytics ETL Manager.

The attribute rbacxRegxLookupFiles is a regular expression for file names.
ETL Manager scans the drop location with this regular expression. When ETL Manager finds a file that matches the pattern, it runs the graph with the following parameters:

inputFile: Absolute path of the file found in the drop location
graphsLocation: Graphs location
outputLocation: Output location
dropLocation: Drop location
outputFile: Absolute path for the output file
If the attribute rbacxExecuteAlways equals true but no file is found (for example, if reading from a database), ETL Manager runs the graph without defining the parameters inputFile and outputFile.
ETL properties are configured in RBACX_HOME/conf/iam.properties.
Table 3-1 ETL Configuration Properties
Property Name | Variable | Description
---|---|---
ETL Graphs Location | | Directory in which to place the CloverETL graph files.
ETL Drop Location | | Directory in which to place data files that need transformation.
ETL Complete Location | | All processed files are moved to this directory after the ETL Manager completes processing of the file.
ETL Output Location | | Directory in which to place the output of the transformation. To allow Oracle Identity Analytics to import the ETL output, this location should point to the Oracle Identity Analytics File Imports Drop Location.
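Entries in iam.properties take the standard Java key=value form. The sketch below is an illustration only: the key names and paths are hypothetical placeholders, not the actual property names, so check the iam.properties shipped with your installation for the real keys.

```properties
# Hypothetical key names and paths -- consult RBACX_HOME/conf/iam.properties
# for the property names your installation actually uses.
etl.graphs.location=/opt/rbacx/etl/graphs
etl.drop.location=/opt/rbacx/etl/drop
etl.complete.location=/opt/rbacx/etl/complete
etl.output.location=/opt/rbacx/importDropLocation
```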
This section includes reference information on the DelimitedDataReader, the DelimitedDataWriter, and the ExcelDataReader.
CloverETL already has a .csv reader, but using the Oracle Identity Analytics version is recommended. If different delimiters are in use, however, use the CloverETL version.
Provide the file URL for the DelimitedDataReader.
<Node id="INPUT" type="com.vaau.rbacx.etl.clover.components.DelimitedDataReader" fileURL="${inputFile}"/>
Provide the file URL for the DelimitedDataWriter.

<Node id="OUTPUT" type="com.vaau.rbacx.etl.clover.domain.DelimitedDataWriter" fileURL="${outputFile}"/>
This Oracle Identity Analytics node reads Excel files.
Attributes:

fileURL - Mandatory.
Row_From - Number of the initial row. (Optional; default value = 1)
Row_To - Number of the final row. (Optional; default value = -1, meaning all rows)
Col_From - Number of the initial column. (Optional; default value = 1)

There is no Col_To attribute because the reader uses the metadata to determine how many columns it has to read.
<Node id="INPUT1" type="com.vaau.rbacx.etl.clover.components.ExcelDataReader" fileURL="${inputFile}" Row_From="1" />
When a file that matches the pattern tss_\w*_accounts[\.\w]* is found in the drop location by the ETL Manager, the following graph is executed. The ETL Manager reads the file_01.dat, file_02.dat, and file_03.dat CSV files using the com.vaau.rbacx.etl.clover.components.DelimitedDataReader node and then merges the data with the MERGE node. The output file keeps the sort order stated in mergeKey="ShipName;ShipVia".
The file that matches the pattern tss_\w*_accounts[\.\w]* is moved to the completed location. The files file_01.dat, file_02.dat, and file_03.dat stay in the c:\tss folder. The output file has the same name as the input file.
<Graph name="TestingMerge" rbacxRegxLookupFiles="tss_\w*_accounts[\.\w]*">
  <!-- This graph illustrates usage of the MERGE component.
       It merges data based on the specified key. -->
  <Global>
    <Metadata id="InMetadata" fileURL="${graphsLocation}/metadata/tss_accunts.fmt"/>
  </Global>
  <Phase number="0">
    <Node id="INPUT1" type="com.vaau.rbacx.etl.clover.components.DelimitedDataReader" fileURL="c:\tss\file_01.dat"/>
    <Node id="INPUT2" type="com.vaau.rbacx.etl.clover.components.DelimitedDataReader" fileURL="c:\tss\file_02.dat"/>
    <Node id="INPUT3" type="com.vaau.rbacx.etl.clover.components.DelimitedDataReader" fileURL="c:\tss\file_03.dat"/>
    <Node id="MERGE" type="MERGE" mergeKey="ShipName;ShipVia"/>
    <Node id="OUTPUT" type="com.vaau.rbacx.etl.clover.domain.DelimitedDataWriter" fileURL="${outputFile}"/>
    <Edge id="INEDGE1" fromNode="INPUT1:0" toNode="MERGE:0" metadata="InMetadata"/>
    <Edge id="INEDGE2" fromNode="INPUT2:0" toNode="MERGE:1" metadata="InMetadata"/>
    <Edge id="INEDGE3" fromNode="INPUT3:0" toNode="MERGE:2" metadata="InMetadata"/>
    <Edge id="OUTEDGE" fromNode="MERGE:0" toNode="OUTPUT:0" metadata="InMetadata"/>
  </Phase>
</Graph>
The following graph demonstrates the functionality of the Extended Filter component. It can filter on text, date, integer, and numeric fields with the comparison operators >, <, ==, <=, >=, and !=. Text fields can also be compared to a Java regular expression using the ~= operator.

A filter can be made of different parts separated by the logical operator AND or OR. Parentheses for grouping individual comparisons are also supported. For example: $Age>10 and ($Age<20 or $HireDate<"2003-01-01").

A filter works on a single input record, where individual fields of the record are referenced using a dollar sign and the field's name, for example, $Age and $Name.

The date format for date constants is yyyy-MM-dd or yyyy-MM-dd hh:mm:ss.
The following graph produces one output file in which all employees have the pattern "DELTSO[0-9]*0" in the comments field.
<Graph name="Testing Filter" rbacxRegxLookupFiles="tss_\w*_accounts[\.\w]*">
  <Global>
    <Metadata id="InMetadata" fileURL="${graphsLocation}/metadata/InAccounts.fmt"/>
  </Global>
  <Phase number="0">
    <Node id="INPUT1" type="com.vaau.rbacx.etl.clover.components.DelimitedDataReader" fileURL="${inputFile}"/>
    <Node id="FILTEREMPL2" type="EXT_FILTER">
      $comments~="DELTSO[0-9]*0"
    </Node>
    <Node id="OUTPUT1" type="com.vaau.rbacx.etl.clover.components.DelimitedDataWriter" fileURL="${outputFile}"/>
    <Edge id="INEDGE1" fromNode="INPUT1:0" toNode="FILTEREMPL2:0" metadata="InMetadata"/>
    <Edge id="INNEREDGE3" fromNode="FILTEREMPL2:0" toNode="OUTPUT1:0" metadata="InMetadata"/>
  </Phase>
</Graph>
The following graph transforms a fixed-length data file into a CSV file.
<Graph name="Testing Filter" rbacxRegxLookupFiles="tss_\w*_accounts[\.\w]*">
  <Global>
    <Metadata id="OutMetadata" fileURL="${graphsLocation}/metadata/InAccounts.fmt"/>
    <Metadata id="InMetadata" fileURL="${graphsLocation}/metadata/InAccountsFixedWith.fmt"/>
  </Global>
  <Phase number="0">
    <Node id="INPUT1" type="FIXLEN_DATA_READER_NIO" OneRecordPerLine="true" SkipLeadingBlanks="true" LineSeparatorSize="2" fileURL="${inputFile}"/>
    <Node id="COPY" type="SIMPLE_COPY"/>
    <Node id="OUTPUT1" type="com.vaau.rbacx.etl.clover.components.DelimitedDataWriter" fileURL="${outputFile}"/>
    <Edge id="INEDGE1" fromNode="INPUT1:0" toNode="COPY:0" metadata="InMetadata"/>
    <Edge id="OUTEDGE1" fromNode="COPY:0" toNode="OUTPUT1:0" metadata="OutMetadata"/>
  </Phase>
</Graph>
The following is the contents of the file InAccountsFixedWith.fmt:
<?xml version="1.0" encoding="UTF-8"?>
<Record name="TestInput" type="fixed">
  <Field name="name" type="string" size="16"/>
  <Field name="comments" type="string" size="16"/>
  <Field name="endPoint" type="string" size="16"/>
  <Field name="domain" type="string" size="5"/>
  <Field name="suspended" type="string" size="10"/>
  <Field name="locked" type="string" size="10"/>
  <Field name="AcidAll" type="string" size="10"/>
  <Field name="AcidXAuth" type="string" size="10"/>
  <Field name="FullName" type="string" size="40"/>
  <Field name="GroupMemberOf" type="string" size="60"/>
  <Field name="InstallationData" type="string" size="60"/>
  <Field name="ListDataResource" type="string" size="10"/>
  <Field name="ListDataSource" type="string" size="10"/>
  <Field name="M8All" type="string" size="10"/>
</Record>
This node imports data from databases. In the following example, the ETL Manager executes the graph for each file that matches the pattern in rbacxRegxLookupFiles.
<Graph name="Testing Filter" rbacxRegxLookupFiles="tss_\w*_accounts[\.\w]*">
  <Global>
    <Metadata id="InMetadata" fileURL="${graphsLocation}/metadata/InAccountsFromDB.fmt"/>
    <Metadata id="OutMetadata" fileURL="${graphsLocation}/metadata/OutAccounts.fmt"/>
    <DBConnection id="InterbaseDB" dbConfig="${graphsLocation}/dbConfig/Rbacx.cfg"/>
  </Global>
  <Phase number="0">
    <Node id="INPUT1" type="DB_INPUT_TABLE" dbConnection="InterbaseDB">
      <SQLCode>
        select * from tss_01_accounts
      </SQLCode>
    </Node>
    <Node id="COPY" type="REFORMAT">
      import org.jetel.component.DataRecordTransform;
      import org.jetel.data.DataRecord;
      import org.jetel.data.SetVal;
      import org.jetel.data.GetVal;

      public class reformatAccount extends DataRecordTransform {
          int counter = 0;
          DataRecord source;
          DataRecord target;

          public boolean transform(DataRecord _source[], DataRecord[] _target) {
              source = _source[0];
              target = _target[0];
              try {
                  SetVal.setString(target, "name", GetVal.getString(source, "name"));
                  SetVal.setString(target, "comments", GetVal.getString(source, "comments"));
                  SetVal.setString(target, "endPoint", GetVal.getString(source, "endPoint"));
                  SetVal.setString(target, "domain", GetVal.getString(source, "domain"));
                  SetVal.setString(target, "suspended",
                          getBooleanString(GetVal.getInt(source, "suspended")));
                  SetVal.setString(target, "locked",
                          getBooleanString(GetVal.getInt(source, "locked")));
                  SetVal.setString(target, "AcidAll", GetVal.getString(source, "AcidAll"));
                  SetVal.setString(target, "AcidXAuth", GetVal.getString(source, "AcidXAuth"));
                  SetVal.setString(target, "FullName", GetVal.getString(source, "FullName"));
                  SetVal.setString(target, "GroupMemberOf",
                          GetVal.getString(source, "GroupMemberOf"));
                  SetVal.setString(target, "InstallationData",
                          GetVal.getString(source, "InstallationData"));
                  SetVal.setString(target, "ListDataResource",
                          GetVal.getString(source, "ListDataResource"));
                  SetVal.setString(target, "ListDataSource",
                          GetVal.getString(source, "ListDataSource"));
                  SetVal.setString(target, "M8All", GetVal.getString(source, "M8All"));
              } catch (Exception ex) {
                  errorMessage = ex.getMessage() + " -> occurred with record: " + counter;
                  return false;
              }
              counter++;
              return true;
          }

          private String getBooleanString(int value) {
              if (value == 0)
                  return "FALSE";
              else
                  return "TRUE";
          }
      }
    </Node>
    <Node id="OUTPUT1" type="com.vaau.rbacx.etl.clover.components.DelimitedDataWriter" fileURL="${outputFile}"/>
    <Edge id="INEDGE1" fromNode="INPUT1:0" toNode="COPY:0" metadata="InMetadata"/>
    <Edge id="OUTEDGE1" fromNode="COPY:0" toNode="OUTPUT1:0" metadata="OutMetadata"/>
  </Phase>
</Graph>
To execute this graph without placing a file in the drop location, add the attribute rbacxExecuteAlways="true".
<Graph name="Testing Filter" rbacxExecuteAlways="true">
  <Global>
    <Metadata id="InMetadata" fileURL="${graphsLocation}/metadata/InAccountsFromDB.fmt"/>
    <Metadata id="OutMetadata" fileURL="${graphsLocation}/metadata/OutAccounts.fmt"/>
    <DBConnection id="InterbaseDB" dbConfig="${graphsLocation}/dbConfig/Rbacx.cfg"/>
  </Global>
  <Phase number="0">
    <Node id="INPUT1" type="DB_INPUT_TABLE" dbConnection="InterbaseDB">
      <SQLCode>
        select * from tss_01_accounts
      </SQLCode>
    </Node>
    <Node id="COPY" type="REFORMAT">
      import org.jetel.component.DataRecordTransform;
      import org.jetel.data.DataRecord;
      import org.jetel.data.SetVal;
      import org.jetel.data.GetVal;

      public class reformatAccount extends DataRecordTransform {
          int counter = 0;
          DataRecord source;
          DataRecord target;

          public boolean transform(DataRecord _source[], DataRecord[] _target) {
              source = _source[0];
              target = _target[0];
              try {
                  SetVal.setString(target, "name", GetVal.getString(source, "name"));
                  SetVal.setString(target, "comments", GetVal.getString(source, "comments"));
                  SetVal.setString(target, "endPoint", GetVal.getString(source, "endPoint"));
                  SetVal.setString(target, "domain", GetVal.getString(source, "domain"));
                  SetVal.setString(target, "suspended",
                          getBooleanString(GetVal.getInt(source, "suspended")));
                  SetVal.setString(target, "locked",
                          getBooleanString(GetVal.getInt(source, "locked")));
                  SetVal.setString(target, "AcidAll", GetVal.getString(source, "AcidAll"));
                  SetVal.setString(target, "AcidXAuth", GetVal.getString(source, "AcidXAuth"));
                  SetVal.setString(target, "FullName", GetVal.getString(source, "FullName"));
                  SetVal.setString(target, "GroupMemberOf",
                          GetVal.getString(source, "GroupMemberOf"));
                  SetVal.setString(target, "InstallationData",
                          GetVal.getString(source, "InstallationData"));
                  SetVal.setString(target, "ListDataResource",
                          GetVal.getString(source, "ListDataResource"));
                  SetVal.setString(target, "ListDataSource",
                          GetVal.getString(source, "ListDataSource"));
                  SetVal.setString(target, "M8All", GetVal.getString(source, "M8All"));
              } catch (Exception ex) {
                  errorMessage = ex.getMessage() + " -> occurred with record: " + counter;
                  return false;
              }
              counter++;
              return true;
          }

          private String getBooleanString(int value) {
              if (value == 0)
                  return "FALSE";
              else
                  return "TRUE";
          }
      }
    </Node>
    <Node id="OUTPUT1" type="com.vaau.rbacx.etl.clover.components.DelimitedDataWriter" fileURL="${outputLocation}/tss_01_accounts.dat"/>
    <Edge id="INEDGE1" fromNode="INPUT1:0" toNode="COPY:0" metadata="InMetadata"/>
    <Edge id="OUTEDGE1" fromNode="COPY:0" toNode="OUTPUT1:0" metadata="OutMetadata"/>
  </Phase>
</Graph>
This section discusses how to move data to and from the database using CloverETL.
CloverETL uses the JDBC™ API to communicate with databases. If your database has a driver supporting the JDBC API, CloverETL can be used to unload data stored within a database table, or to populate a database table with internal data.
Before you can connect to a database, you must define the DBConnection. This property is defined within a graph.
<DBConnection id="InterbaseDB" dbConfig="Interbase.cfg"/>
This specifies that CloverETL should set up a database connection called InterbaseDB. All required parameters (JDBC driver name, DB connect string, user name, and password) can be found in the configuration file Interbase.cfg.
The dbConfig file is a standard Java properties file. It contains names of parameters along with their values. The following table lists the possible parameters.
Table 3-2 DBConnection Parameters
Parameter Name | Description of Parameter | Example of Parameter's Value
---|---|---
dbDriver | Specifies the name of the class containing the JDBC driver for your database. This class must be visible to Java (be part of the CLASSPATH). | org.postgresql.Driver
dbURL | URL for connecting to the database, including the name of the JDBC driver to use, the IP address where the server listens, the name of the database instance, and the port. | jdbc:postgresql://192.168.1.100/mydb
user | The user name under which to connect to the database. | david
password | The password to be used. | unknown
 | Optional. The location of the JDBC driver class. |
JDBC driver-specific parameters | Optional. Specify as needed. | Oracle example: ...
The following example lists the possible contents of a Postgres.cfg file that defines the connection to a PostgreSQL database:
Postgres.cfg
dbDriver=org.postgresql.Driver
dbURL=jdbc:postgresql://192.168.1.100/mydb
user=david
password=unknown
All parameters can also be directly specified when defining the connection:
Defining Connection
<DBConnection id="InterbaseDB" dbDriver="org.postgresql.Driver" dbURL="jdbc:postgresql://192.168.1.100/mydb" user="david" password="unknown"/>
Values specified directly when defining the connection take precedence over parameters specified in the dbConfig properties file.
When working with the database through JDBC drivers, CloverETL needs to map its internal data types onto JDBC data types. The variety of DB (JDBC) field types is large, but most of them (with the exception of BLOBs) can be mapped to Clover internal types without losing any information.
The following table lists JDBC data types and corresponding CloverETL data types. The conversion is done automatically by CloverETL when analyzing DB tables using the org.jetel.database.AnalyzeDB utility. This conversion can also be made manually.
Table 3-3 JDBC Data Types and Corresponding CloverETL Data Types
JDBC (DB) Data Type | CloverETL Data Type |
---|---|
|
|
|
|
|
|
|
|
|
|
|
(True value coded as |
The following example illustrates the conversion. First, the DDL (Oracle DB) definition of the database table is presented, and then Clover's version of the same thing using its internal data types.
DDL (Oracle DB)
create table MYEMPLOYEE (
  EMP_NO      NUMBER not null,
  FIRST_NAME  VARCHAR2(15) not null,
  LAST_NAME   VARCHAR2(20) not null,
  PHONE_EXT   VARCHAR2(4),
  HIRE_DATE   DATE not null,
  DEPT_NO     CHAR(3) not null,
  JOB_CODE    VARCHAR2(5) not null,
  JOB_GRADE   NUMBER(4,2) not null,
  JOB_COUNTRY VARCHAR2(15) not null,
  SALARY      NUMBER(15,2) not null,
  FULL_NAME   VARCHAR2(35)
);
Clover's Version
<?xml version="1.0" encoding="UTF-8"?>
<!-- Automatically generated from database null -->
<Record name="EMPLOYEE" type="delimited">
  <Field name="EMP_NO" type="numeric" delimiter="," format="#"/>
  <Field name="FIRST_NAME" type="string" delimiter=","/>
  <Field name="LAST_NAME" type="string" delimiter=","/>
  <Field name="PHONE_EXT" type="string" nullable="yes" delimiter=","/>
  <Field name="HIRE_DATE" type="date" delimiter="," format="dd/MM/yyyy"/>
  <Field name="DEPT_NO" type="string" delimiter=","/>
  <Field name="JOB_CODE" type="string" delimiter=","/>
  <Field name="JOB_GRADE" type="numeric" delimiter=","/>
  <Field name="JOB_COUNTRY" type="string" delimiter=","/>
  <Field name="SALARY" type="numeric" delimiter=","/>
  <Field name="FULL_NAME" type="string" nullable="yes" delimiter="\n"/>
</Record>
The reverse conversion from a CloverETL to a JDBC data type (usually done when populating a target DB table) is also driven by JDBC data types. There are some exceptions, caused by the non-existence of certain field types on the CloverETL side. These exceptions are handled automatically by CloverETL; internally, this is done by calling methods other than the standard JDBC methods for populating database fields with values. Refer to the source code (org.jetel.database.CopySQLData) for detailed information.
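As an illustration of one such conversion, a CloverETL date field is backed by java.util.Date, while a JDBC TIMESTAMP column expects java.sql.Timestamp. The sketch below shows the widening step only; the class and method names are illustrative, not CloverETL's actual CopySQLData code.

```java
import java.sql.Timestamp;
import java.util.Date;

// Illustrative sketch only -- not CloverETL's actual implementation.
public class DateToTimestamp {
    // A java.sql.Timestamp carries the same epoch milliseconds as the
    // source java.util.Date, so the conversion is lossless.
    public static Timestamp toTimestamp(Date cloverDate) {
        return new Timestamp(cloverDate.getTime());
    }
}
```

The resulting Timestamp can then be bound to a TIMESTAMP column through the usual JDBC setter.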
Table 3-4 Conversions Performed When Converting From CloverETL to JDBC
JDBC Type | CloverETL Type | Conversion Performed
---|---|---
Timestamp | Date | Date is converted to Timestamp, and the target is set using the ...
Boolean, Bit | String | If the string contains ...
Decimal, Double, Numeric, Real | Integer | Conversion from Integer to Decimal is made. The target is set using the ...
Other (includes ...) | String | The target is set using the ...
The CloverETL package contains a simple utility that can analyze a source or target database table and produce Clover's metadata description file. This metadata can be used by any DB-related component.
The following table lists the parameters that can be specified with the AnalyzeDB command. The command must specify which database to connect to and which database table to analyze. You can use the same DBConnection file described previously in Section 3.5.2, "DBConnection."

To specify which table to analyze, supply an SQL query to execute against the database. The returned result set is examined for field types. As a result, you can extract and analyze a portion of a table.
The following table lists the options and parameters:
Table 3-5 Parameters for use With the AnalyzeDB Utility
Parameter | Description
---|---
 | JDBC driver to use
 | Database name (URL)
-config | Config or property file containing parameters
 | User name
 | User's password
 | Delimiter to use (a comma ...)
 | Output file to use (...)
 | Read SQL query from file name
-q | SQL query on command line
 | Displays list of driver's properties
The following example examines all data fields of the employees DB table:
java -cp cloverETL.rel-1-x.zip org.jetel.database.AnalyzeDB -config postgres.sql -q "select * from employees where 1=0"
The following example extracts specific fields, as stated in the SQL query:
java -cp cloverETL.rel-1-x.zip org.jetel.database.AnalyzeDB -config postgres.sql -q "select emp_no,full_name from employees where 1=0"
To unload data from a database table, use the DBInputTable component. It requires that the dbConnection parameter be specified, along with an SQL command (the sqlQuery parameter), which is executed against the database specified by dbConnection.
Individual fields fetched from the database are mapped to Clover data records and fields. (See Section 3.5.2.2, "JDBC to CloverETL.") The structure of the Clover record is determined by the specified Clover metadata. (Metadata is assigned to an Edge, which connects DBInputTable with other components.)
The following example transformation graph uses the DBInputTable component:
Transformation Graph
<?xml version="1.0" encoding="UTF-8"?>
<Graph name="TestingDB">
  <Global>
    <Metadata id="InMetadata" fileURL="metadata/employee.fmt"/>
    <DBConnection id="PosgressDB" dbConfig="Posgress.cfg"/>
  </Global>
  <Phase number="0">
    <Node id="INPUT" type="DB_INPUT_TABLE" dbConnection="PosgressDB" sqlQuery="select * from employee"/>
    <Node id="OUTPUT" type="DELIMITED_DATA_WRITER_NIO" append="false" fileURL="employees2.list.out"/>
    <Edge id="INEDGE" fromNode="INPUT:0" toNode="OUTPUT:0" metadata="InMetadata"/>
  </Phase>
</Graph>
The SQL command (sqlQuery) can be more complicated than the previous example suggests. You can use any valid SQL construct, but make sure that the metadata corresponds to the number and types of the returned data fields.
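For instance, a sketch of a node with a more selective query follows. The column names and the WHERE condition are hypothetical; the metadata on the outgoing edge would have to describe exactly the two returned fields.

```xml
<!-- Hypothetical query: only two columns, restricted by a condition -->
<Node id="INPUT" type="DB_INPUT_TABLE" dbConnection="PosgressDB"
      sqlQuery="select emp_no, full_name from employee where hire_date >= '2003-01-01'"/>
```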
When you need to populate a database table with data coming from a CloverETL transformation graph, use the DBOutputTable component. It is complementary to DBInputTable: it maps CloverETL data records and individual fields to target database table fields, and it can perform simple data conversions to map CloverETL basic data types onto their target database variants. See the previous Section 3.5.2.3, "CloverETL to JDBC."
The following example illustrates the usage of DBOutputTable:
<?xml version="1.0" encoding="UTF-8"?>
<Graph name="TestingDB2">
  <Global>
    <Metadata id="InMetadata" fileURL="metadata/myemployee.fmt"/>
    <DBConnection id="PosgressDB" dbConfig="posgress.cfg"/>
  </Global>
  <Phase number="0">
    <Node id="INPUT" type="DELIMITED_DATA_READER_NIO" fileURL="employees.list.dat"/>
    <Node id="OUTPUT" type="DB_OUTPUT_TABLE" dbConnection="PosgressDB" dbTable="myemployee"/>
    <Edge id="INEDGE" fromNode="INPUT:0" toNode="OUTPUT:0" metadata="InMetadata"/>
  </Phase>
</Graph>
If you need to populate only certain fields of the target DB table (when, for instance, one field is automatically populated from a DB sequence), the dbFields parameter of DBOutputTable can be used:
<Node id="OUTPUT2" type="DB_OUTPUT_TABLE" dbConnection="PosgressDB" dbTable="myemployee" dbFields="FIRST_NAME;LAST_NAME" />
The DBOutputTable cloverFields parameter can be used to precisely specify the mapping from CloverETL data records to database table records. It allows you to specify which source field (from Clover) is mapped to which target database table field.

Coupled with dbFields, it specifies a 1:1 mapping. Individual fields are mapped according to the order in which they appear in dbFields and cloverFields, respectively. The parameter that determines how many fields will be populated is always dbFields. When no dbFields parameter is present, CloverETL assumes that all target fields should be populated in the order in which they appear in the target database table.
The following examples illustrate how to pick certain fields from the source data record (a CloverETL record), regardless of their order, and map them to target database table fields (again, regardless of their order).
<?xml version="1.0" encoding="UTF-8"?>
<Graph name="TestingDB3">
  <Global>
    <Metadata id="InMetadata" fileURL="metadata/myemployee.fmt"/>
    <DBConnection id="PosgressDB" dbConfig="posgress.cfg"/>
  </Global>
  <Phase number="1">
    <Node id="INPUT" type="DELIMITED_DATA_READER_NIO" fileURL="employees2.list.tmp"/>
    <Node id="OUTPUT" type="DB_OUTPUT_TABLE" dbConnection="PosgressDB" dbTable="myemployee"
          dbFields="FIRST_NAME;LAST_NAME" cloverFields="LAST_NAME;FIRST_NAME"/>
    <Edge id="INEDGE" fromNode="INPUT:0" toNode="OUTPUT:0" metadata="InMetadata"/>
  </Phase>
</Graph>
The resulting mapping between fields specified in the previous example is:

Source Field (CloverETL) | Target Field (DB Table)
---|---
LAST_NAME | FIRST_NAME
FIRST_NAME | LAST_NAME
Sometimes you need to execute one or more database commands that do not require any input. Examples include creating a new table, adding a data partition, and dropping an index. For this purpose, CloverETL offers the DBExecute component.
The DBExecute component takes specified commands and executes them one by one against the database. You can define whether all commands form one transaction, or whether they should be committed to the database after each command.
The following is a simple example of DBExecute:
<?xml version="1.0" encoding="UTF-8"?>
<Graph name="TestingExecute">
  <Global>
    <DBConnection id="InterbaseDB" dbConfig="interbase.cfg"/>
  </Global>
  <Phase number="0">
    <Node id="DBEXEC" type="DB_EXECUTE" dbConnection="InterbaseDB" inTransaction="N">
      <SQLCode>
        create table EMPLOYEE (
          EMP_NO      NUMBER not null,
          FIRST_NAME  VARCHAR2(15) not null,
          LAST_NAME   VARCHAR2(20) not null,
          PHONE_EXT   VARCHAR2(4),
          HIRE_DATE   DATE not null,
          DEPT_NO     CHAR(3) not null,
          JOB_CODE    VARCHAR2(5) not null,
          JOB_GRADE   NUMBER(4,2) not null,
          JOB_COUNTRY VARCHAR2(15) not null,
          SALARY      NUMBER(15,2) not null,
          FULL_NAME   VARCHAR2(35)
        );
        insert into employee values(2,'Robert','Nelson','250','28/12/1988','600','VP',2.0,'USA',105900.0,'Nelson, Robert');
        insert into employee values(4,'Bruce','Young','233','28/12/1988','621','Eng',2.0,'USA',97500.0,'Young, Bruce');
        insert into employee values(5,'Kim','Lambert','22','06/02/1989','130','Eng',2.0,'USA',102750.0,'Lambert, Kim');
        insert into employee values(8,'Leslie','Johnson','410','05/04/1989','180','Mktg',3.0,'USA',64635.0,'Johnson, Leslie');
        insert into employee values(9,'Phil','Forest','229','17/04/1989','622','Mngr',3.0,'USA',75060.0,'Forest, Phil');
      </SQLCode>
    </Node>
  </Phase>
</Graph>
This section provides additional information about the CloverETL DataRecord.
CloverETL works with data in terms of data records, and data fields within records. Internally, all records are represented as variable-length data. This means that every data field consumes only as much memory as needed for storing a field's value. If you have a field of type STRING specified to be 50 characters in length and this field is populated with a string of 20 characters, only 20 characters are allocated in memory.
Moreover, CloverETL does not require that a length be specified. There is an internal maximum length for any field, but it should be enough to accommodate even very long strings. For types other than strings, there is a fixed field size, regardless of the actual value.
There are some cases when it matters whether you specify the size of each field. This is discussed in the next section.
The following table lists all supported types of data, along with ranges of values for each type.
Table 3-6 Supported Data Types and Value Ranges in CloverETL
Data Type Name | Based on | Size | Range of Values
---|---|---|---
string | java.lang.String | Depends on actual data length | -
date | java.util.Date | 64 bit - sizeof(long) | Starts: January 1, 1970, 00:00:00 GMT; increment: 1 ms
integer | int | 32 bit - sizeof(int) | Min: -2^31; Max: 2^31 - 1
numeric | double | 64 bit - sizeof(double) | Min: 2^-1074; Max: (2 - 2^-52) * 2^1023
long | long | 64 bit - sizeof(long) | Min: -2^63; Max: 2^63 - 1
decimal | NA | NA | Not yet implemented
byte | byte[] | Depends on actual data length | Min: 0; Max: 255
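The integer, long, and numeric ranges in the table are those of the underlying Java primitive types, which can be confirmed in plain Java:

```java
public class TypeRanges {
    public static void main(String[] args) {
        // integer: 32-bit signed int
        System.out.println(Integer.MIN_VALUE); // -2147483648 (-2^31)
        System.out.println(Integer.MAX_VALUE); //  2147483647 (2^31 - 1)
        // long: 64-bit signed long
        System.out.println(Long.MIN_VALUE);    // -9223372036854775808 (-2^63)
        System.out.println(Long.MAX_VALUE);    //  9223372036854775807 (2^63 - 1)
        // numeric: 64-bit IEEE 754 double
        System.out.println(Double.MIN_VALUE);  // 4.9E-324 (2^-1074)
        System.out.println(Double.MAX_VALUE);  // 1.7976931348623157E308 ((2 - 2^-52) * 2^1023)
    }
}
```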
One way of putting together a description of a record format is to create some Java code and use CloverETL classes/methods calls.
The easier way is to create an XML description of a record format that can be read by CloverETL and automatically materialized in memory.
It is customary to use the .fmt extension for an XML file that contains metadata describing the format of a data record. The following example shows simple metadata that describes a record containing three data fields:
<?xml version="1.0" encoding="UTF-8"?>
<Record name="TestInput" type="delimited">
  <Field name="Name" type="string" delimiter=";"/>
  <Field name="Age" type="numeric" delimiter="|"/>
  <Field name="City" type="string" delimiter="\n"/>
</Record>
This simple example shows the definition of a data record specified as delimited. The record has three fields:
Name (of type string)
Age (of type numeric)
City (of type string)
There are no strict rules for naming fields (and records). However, it is good practice to follow the same rules as for naming Java variables: use only letters [a-zA-Z], numbers [0-9] (not in the first position), and underscores [_].
The encoding specified for the XML file is UTF-8.
Note:
When creating a file, you must save the file using the encoding specified in the encoding tag. Otherwise, the XML parser used by CloverETL won't be able to correctly interpret the file.
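For example, in standard Java, writing the metadata file through an explicitly UTF-8 writer keeps the bytes on disk consistent with the encoding declared in the XML header (the file name used here is hypothetical):

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class WriteMetadata {
    public static void main(String[] args) throws IOException {
        String metadata = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<Record name=\"TestInput\" type=\"delimited\">\n"
                + "  <Field name=\"Name\" type=\"string\" delimiter=\";\"/>\n"
                + "</Record>\n";
        // Write with the same encoding the XML header declares,
        // so the parser interprets the file correctly
        try (Writer w = new OutputStreamWriter(
                new FileOutputStream("TestInput.fmt"), StandardCharsets.UTF_8)) {
            w.write(metadata);
        }
    }
}
```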
Each field in the previous example has a specified delimiter character. This information is used by the data parser when parsing data records (of this structure) from external text files. The same delimiters are used when CloverETL outputs internal data records (of this structure) to output text files.
Delimiters can be up to 32 characters long, and each field can have a different one. Basic control characters such as \t (tab), \n (line feed), and \r (carriage return) are supported.
The following example shows additional features:
<?xml version="1.0" encoding="UTF-8"?>
<!-- Automatically generated from database null -->
<Record name="EMPLOYEE" type="delimited">
  <Field name="EMP_NO" type="integer" delimiter="," format="#"/>
  <Field name="FIRST_NAME" type="string" delimiter=","/>
  <Field name="LAST_NAME" type="string" delimiter=","/>
  <Field name="PHONE_EXT" type="string" nullable="yes" delimiter=","/>
  <Field name="HIRE_DATE" type="date" delimiter="," format="dd/MM/yyyy"/>
  <Field name="BIRTH_DATE" type="date" delimiter="," locale="en"/>
  <Field name="DEPT_NO" type="string" delimiter=","/>
  <Field name="JOB_CODE" type="string" delimiter=","/>
  <Field name="JOB_GRADE" type="numeric" delimiter="," format="#"/>
  <Field name="JOB_COUNTRY" type="string" delimiter=","/>
  <Field name="SALARY" type="numeric" delimiter=","/>
  <Field name="FULL_NAME" type="string" nullable="yes" delimiter="\n"/>
</Record>
Some fields, such as PHONE_EXT, have the nullable attribute set to yes, which means the field is allowed to contain a null value. The default is yes (true); that is, the field can contain a null value. The exact behavior depends on the concrete data parser or data formatter, but simply put, when a field is not specified to be nullable and an application tries to put a null value in it, the operation fails. This can stop the whole transformation process.
Use the format attribute to specify the expected format of data when parsing into, or printing out of, CloverETL. In this case, the HIRE_DATE field is of type date, and date values in external textual data are expected to look like this: 19/12/1999. For all possible format specifiers (control characters), see the documentation for java.text.SimpleDateFormat.
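Because the pattern is plain java.text.SimpleDateFormat syntax, its behavior can be checked with a few lines of standard Java (the field and class names here are illustrative only):

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;

public class DateFormatDemo {
    public static void main(String[] args) throws ParseException {
        // The same pattern as the HIRE_DATE field: day/month/year
        SimpleDateFormat fmt = new SimpleDateFormat("dd/MM/yyyy");
        Date hireDate = fmt.parse("19/12/1999");

        Calendar cal = Calendar.getInstance();
        cal.setTime(hireDate);
        System.out.println(cal.get(Calendar.YEAR));          // 1999
        System.out.println(cal.get(Calendar.MONTH) + 1);     // 12 (Calendar months are 0-based)
        System.out.println(cal.get(Calendar.DAY_OF_MONTH));  // 19

        // Formatting goes the other way: internal Date -> external text
        System.out.println(fmt.format(hireDate));            // 19/12/1999
    }
}
```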
Similar to HIRE_DATE is the JOB_GRADE field, which is of type numeric. Here the format "#" specifies that data is expected to be integer numbers only (no decimal point allowed).
See the following tables for date and number format specifiers.
Table 3-7 Date and Time Specifiers
Letter | Date or Time Component | Presentation | Examples
---|---|---|---
G | Era designator | Text | AD
y | Year | Year | 1996; 96
M | Month in year | Month | July; Jul; 07
w | Week in year | Number | 27
W | Week in month | Number | 2
D | Day in year | Number | 189
d | Day in month | Number | 10
F | Day of week in month | Number | 2
E | Day in week | Text | Tuesday; Tue
a | Am/pm marker | Text | PM
H | Hour in day (0-23) | Number | 0
k | Hour in day (1-24) | Number | 24
K | Hour in am/pm (0-11) | Number | 0
h | Hour in am/pm (1-12) | Number | 12
m | Minute in hour | Number | 30
s | Second in minute | Number | 55
S | Millisecond | Number | 978
z | Time zone | General time zone | Pacific Standard Time; PST; GMT-08:00
Z | Time zone | RFC 822 time zone | -0800
Table 3-8 Date and Time Format Examples
Date and Time Pattern | Result
---|---
"yyyy.MM.dd G 'at' HH:mm:ss z" | 2001.07.04 AD at 12:08:56 PDT
"EEE, MMM d, ''yy" | Wed, Jul 4, '01
"h:mm a" | 12:08 PM
"hh 'o''clock' a, zzzz" | 12 o'clock PM, Pacific Daylight Time
"K:mm a, z" | 0:08 PM, PDT
"yyyyy.MMMMM.dd GGG hh:mm aaa" | 02001.July.04 AD 12:08 PM
"EEE, d MMM yyyy HH:mm:ss Z" | Wed, 4 Jul 2001 12:08:56 -0700
"yyMMddHHmmssZ" | 010704120856-0700
Table 3-9 Number Format Pattern Symbols
Symbol | Location | Localized | Meaning
---|---|---|---
0 | Number | Localized | Digit
# | Number | Localized | Digit, zero shows as absent
. | Number | Localized | Decimal separator or monetary decimal separator
- | Number | Localized | Minus sign
, | Number | Localized | Grouping separator
E | Number | Localized | Separates mantissa and exponent in scientific notation. Need not be quoted in prefix or suffix.
; | Subpattern boundary | Localized | Separates positive and negative subpatterns
% | Prefix or suffix | Localized | Multiply by 100 and show as percentage
‰ | Prefix or suffix | Localized | Multiply by 1000 and show as per mille
¤ | Prefix or suffix | Not localized | Currency sign, replaced by currency symbol. If doubled, replaced by international currency symbol. If present in a pattern, the monetary decimal separator is used instead of the decimal separator.
' | Prefix or suffix | Not localized | Used to quote special characters in a prefix or suffix, for example, "'#'#" formats 123 to "#123"
When specifying the format for numbers, Clover (Java) uses the default system locale setting, unless another locale is specified through the locale option.
This is important in cases when you are parsing data where decimal numbers use a , (comma) as the decimal separator, whereas the system default (global) says it is a . (period).
In such a case, use the locale option together with the format option to change the expected decimal delimiter. For example:
<Field name="Freight" type="numeric" delimiter="|" format="#.#" locale="en.US" />
Instead of specifying a format parameter, you can specify a locale parameter, which states the geographical, political, or cultural region for formatting data. Thus, instead of specifying the format for the date field, you could specify the locale for Germany (locale="de"), for example. Clover automatically chooses the proper date format for Germany.
There are cases when both format and locale parameters make sense, for example when formatting decimal numbers. Define the format pattern with a decimal separator, and the locale specifies whether the separator is a comma or a dot.
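How format and locale interact can be sketched in standard Java, since the patterns described here follow java.text.DecimalFormat. The same pattern parses comma-separated or period-separated decimals depending on the locale (the values here are illustrative):

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.ParseException;
import java.util.Locale;

public class LocaleNumberDemo {
    public static void main(String[] args) throws ParseException {
        // Pattern "#.#" with a German locale: the decimal separator is a comma
        DecimalFormat german = new DecimalFormat("#.#",
                DecimalFormatSymbols.getInstance(Locale.GERMANY));
        System.out.println(german.parse("1234,5").doubleValue()); // 1234.5

        // The same pattern with a US locale expects a period instead
        DecimalFormat us = new DecimalFormat("#.#",
                DecimalFormatSymbols.getInstance(Locale.US));
        System.out.println(us.parse("1234.5").doubleValue());     // 1234.5
    }
}
```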
CloverETL allows you to specify a default value for each field. This value is used (in certain cases) when a field is assigned to be null, but a null value is not allowed for the field.
The following example shows fields with specified default values:
<?xml version="1.0" encoding="UTF-8"?>
<Record name="Orders" type="delimited">
  <Field name="OrderID" type="numeric" delimiter="|" format="#"/>
  <Field name="OrderDate" type="date" delimiter="|" format="dd.MM.yyyy" default="01.01.1900" nullable="no"/>
  <Field name="Amount" type="number" delimiter="\n" default="0.0" nullable="no"/>
</Record>
In this example, OrderDate defaults to 01.01.1900 in case it is not present in the text data this record is parsed from. In general, when this field is assigned a null value, the specified default value is assigned instead. The same is true for the Amount field, except that its default is specified to be 0.0.
Note - This behavior is not the default and concerns only data parsers. If your code assigns a null value to a non-nullable field, a BadDataFormatException error occurs.
If you use any of the Clover data parsers, you can specify a DataPolicy, which states what should happen if a parsed value cannot be assigned to a data field (as in the case when the value is null and the field cannot accept null values).
There are three different data policies defined:
Strict - Any problem causes a BadDataFormatException. This is the default behavior.
Controlled - Similar to strict, but also logs the problematic value.
Lenient - If a default value exists, CloverETL attempts to assign that default value.