CloverETL DataRecord Reference

This section provides additional information about the CloverETL DataRecord.

How Data is Represented Within CloverETL

CloverETL works with data in terms of data records, and data fields within records. Internally, all records are represented as variable-length data. This means that every data field consumes only as much memory as needed for storing a field's value. If you have a field of type STRING specified to be 50 characters in length and this field is populated with a string of 20 characters, only 20 characters are allocated in memory.

Moreover, CloverETL does not require that a length be specified. There is an internal maximum length for any field, but it should be enough to accommodate even very long strings. For types other than strings, there is fixed size of the field, regardless of the actual value.

There are some cases when it matters whether you specify the size of each field. This is discussed in the next section.

Supported Data Field Types

The following table lists all supported types of data, along with ranges of values for each type.

Data Type Name	Based On	Size	Range of Values
string	`java.lang.String`	Depends on actual data length
date	`java.util.Date`	64bit - sizeof(long)	Starts: January 1, 1970, 00:00:00 GMT increment: 1ms
integer	`java.lang.Integer`	32bit - sizeof(int)	Min: -2 ³¹ Max: 2 ³¹ -1.
numeric	`java.lang.Double`	64bit - sizeof(double)	Min:2 ^-1074 Max: (2-2 ^-52 ) 2 ¹⁰²³
long	`java.lang.Long`	64bit - size of (long)	Min:2 ⁶³ -1 Max: -2 ⁶³
decimal	NA	NA	Not yet implemented
byte	`java.lang.Byte`	Depends on actual data length	Min: 0 Max: 255

Specification of Record Format

One way of putting together a description of a record format is to create some Java code and use CloverETL classes/methods calls.

The easier way is to create an XML description of a record format that can be read by CloverETL and automatically materialized in memory.

It is customary to use the .fmt extension for an XML file that contains metadata describing the format of a data record. The following example shows simple metadata that describes a record containing three data fields:

<?xml version="1.0" encoding="UTF-8"?>

<Record name="TestInput" type="delimited">

<Field name="Name" type="string" delimiter=";"/>

<Field name="Age" type="numeric" delimiter="|"/>

<Field name="City" type="string" delimiter="\n"/>

</Record>

This simple example shows the definition of a data record specified as delimited. The record has three fields:

Name (of type string)
Age (of type numeric)
City (of type string)

Naming

The are no strict rules for naming fields (and records). However, you use the same rules as for naming Java variables. For example, use only letters [a-zA-Z], numbers [0-9] (not in the first position), and underscores [ _ ].

The encoding specified for the XML file is UTF-8.

Note - When creating a file, you must save the file using the encoding specified in the encoding tag. Otherwise, the XML parser used by CloverETL won't be able to correctly interpret the file.

Delimiters

Each field in the previous example has a specified delimiter character. This information is used by the data parser when parsing data records (of this structure) from external text files. The same delimiters are used when CloverETL outputs internal data records (of this structure) to output text files.

Delimiters can be up to 32 characters long, and each field can have a different one. Basic control characters such as \t (tabulator), \n (line feed), and \r (carriage return) are supported.

Field Formats and Other Features

The following example shows additional features:

<?xml version="1.0" encoding="UTF-8"?>

<!-- Automatically generated from database null -->

<Record name="EMPLOYEE" type="delimited">

    <Field name="EMP_NO" type="integer" delimiter="," format="#"/>

    <Field name="FIRST_NAME" type="string" delimiter="," />

    <Field name="LAST_NAME" type="string" delimiter="," />

    <Field name="PHONE_EXT" type="string" nullable="yes" delimiter="," />

    <Field name="HIRE_DATE" type="date" delimiter="," format="dd/MM/yyyy" />

    <Field name="BIRTH_DATE" type="date" delimiter="," locale="en"/>

    <Field name="DEPT_NO" type="string" delimiter="," />

    <Field name="JOB_CODE" type="string" delimiter="," />

    <Field name="JOB_GRADE" type="numeric" delimiter="," format="#" />

    <Field name="JOB_COUNTRY" type="string" delimiter="," />

    <Field name="SALARY" type="numeric" delimiter="," />

    <Field name="FULL_NAME" type="string" nullable="yes" delimiter="\n" />

</Record>

Nullable

Some fields, such as PHONE_EXT, have the nullable attribute set to yes, which means that the field is allowed to contain a null value. The default is yes or true (that is, the field can contain a null value). The exact behavior is influenced by a concrete data parser or data formatter, but simply put, when a field is not specified to be nullable and an application tries to put a null value in it, this operation fails. This can stop the whole transformation process.

Format

Use the Format attribute to specify the expected format of data when parsing in, or printing out of, CloverETL. In this case, the HIRE_DATE field is of type date and it is specified that date values in external textual data will look like this: 19/12/1999

For all possible format specifiers (control characters), see the documentation for java.text.SimpleDateFormat.

Similar to HIRE_DATE is the JOB_GRADE field, which is of type numeric. Here the format specifies that data is expected to be integer numbers only (no decimal point allowed).

See the following tables for date and number format specifiers.

Date

Letter	Date or Time Component	Presentation	Examples
G	Era designator	Text	AD
y	Year	Year	1996; 96
M	Month in year	Month	July; Jul; 07
w	Week in year	Number	27
W	Week in month	Number	2
D	Day in year	Number	189
d	Day in month	Number	10
F	Day of week in month	Number	2
E	Day in week	Text	Tuesday; Tue
a	Am/pm marker	Text	PM
H	Hour in day (0-23)	Number	0
k	Hour in day (1-24)	Number	24
K	Hour in am/pm (0-11)	Number	0
h	Hour in am/pm (1-12)	Number	12
m	Minute in hour	Number	30
s	Second in minute	Number	55
S	Millisecond	Number	978
z	Time zone	General time zone	Pacific Standard Time; PST; GMT-08:00
Z	Time zone	RFC 822 time zone	-0800

Examples:

Date and Time Pattern	Result
"yyyy.MM.dd G 'at' HH:mm:ss z"	2001.07.04 AD at 12:08:56 PDT
"EEE, MMM d, ''yy"	Wed, Jul 4, '01
"h:mm a"	12:08 PM
"hh 'o''clock' a, zzzz"	12 o'clock PM, Pacific Daylight Time
"K:mm a, z"	0:08 PM, PDT
"yyyyy.MMMMM.dd GGG hh:mm aaa"	02001.July.04 AD 12:08 PM
"EEE, d MMM yyyy HH:mm:ss Z"	Wed, 4 Jul 2001 12:08:56 -0700
"yyMMddHHmmssZ"	010704120856-0700

Number

Symbol	Location	Localized	Meaning
0	Number	Localized	Digit
#	Number	Localized	Digit, zero shows as absent
.	Number	Localized	Decimal separator or monetary decimal separator
-	Number	Localized	Minus sign
,	Number	Localized	Grouping separator
E	Number	Localized	Separates mantissa and exponent in scientific notation. Need not be quoted in prefix or suffix.
;	Subpattern boundary	Localized	Separates positive and negative subpatterns
%	Prefix or suffix	Localized	Multiply by 100 and show as percentage
\u2030	Prefix or suffix	Localized	Multiply by 1000 and show as per mille
(\u00A4)	Prefix or suffix	Not localized	Currency sign, replaced by currency symbol. If doubled, replaced by international currency symbol. If present in a pattern, the monetary decimal separator is used instead of the decimal separator.
'	Prefix or suffix	Not localized	Used to quote special characters in a prefix or suffix, for example, "'#'#" formats 123 to "#123". To create a single quote itself, use two in a row: "# o''clock".

Number Format

When specifying the format for numbers, Clover (Java) uses the default system locale setting, unless another locale is specified through the locale option.

This is important in cases when you are parsing data where decimal numbers use a , (comma) as a decimal separator, whereas the system default (global) says it is . (period).

In such a case, use the locale option together with the format option to change the expected decimal delimiter. For example:

<Field name="Freight" type="numeric" delimiter="|" format="#.#" locale="en.US" />

Locale

Instead of specifying a format parameter, you can specify a locale parameter, which states the geographical, political, or cultural region for formatting data. Thus, instead of specifying the format for the date field, you could specify the locale for Germany (locale="de"), for example. Clover automatically chooses the proper date format for Germany.

There are cases when both format and locale parameters make sense, for example when formatting decimal numbers. Define the format pattern with a decimal separator, and the locale specifies whether the separator is a comma or a dot.

Specifying Default Values for Fields

CloverETL allows you to specify a default value for each field. This value is used (in certain cases) when a field is assigned to be null, but a null value is not allowed for the field.

The following example shows fields with specified default values:

<?xml version="1.0" encoding="UTF-8"?>

<Record name="Orders" type="delimited">

    <Field name="OrderID" type="numeric" delimiter="|" format="#" />

    <Field name="OrderDate" type="date" delimiter="|" format="dd.MM.yyyy"

default="01.01.1900" nullable="no" />

    <Field name="Amount" type="number" delimiter="\n" default="0.0" 
nullable="no" />

</Record>

In this example, OrderDate is defaulted to 1.1.1900, in case it is not present in the text data this record is parsed from. In general, when this field is assigned a null value, the specified default value is assigned instead. The same is true for the Amount field, except the default is specified to be 0.

Note - This behavior is not the default and concerns only data parsers. If your code assigns a null value into a non-nullable field, a BadDataFormatException error will occur.

If you use any of the Clover data parsers, you can specify a DataPolicy, which states what should happen if a parsed value cannot be assigned to a data field (as in the case when the value is null and the field cannot accept null values).

There are three different data policies defined:

Strict - Any problem causes BadDataFormatException. This is the default behavior.
Controlled - Similar to strict, but also logs the problematic value.
Lenient - If a default value exists, CloverETL attempts to assign that default value.

Skip Navigation Links
Exit Print View
	Oracle Identity Analytics Business Administrator's Guide 11g Release 1