|Skip Navigation Links|
|Exit Print View|
|Oracle Identity Analytics Business Administrator's Guide 11g Release 1|
This section provides additional information about the CloverETL DataRecord.
CloverETL works with data in terms of data records, and data fields within records. Internally, all records are represented as variable-length data. This means that every data field consumes only as much memory as needed for storing a field's value. If you have a field of type STRING specified to be 50 characters in length and this field is populated with a string of 20 characters, only 20 characters are allocated in memory.
Moreover, CloverETL does not require that a length be specified. There is an internal maximum length for any field, but it should be enough to accommodate even very long strings. For types other than strings, there is fixed size of the field, regardless of the actual value.
There are some cases when it matters whether you specify the size of each field. This is discussed in the next section.
The following table lists all supported types of data, along with ranges of values for each type.
One way of putting together a description of a record format is to create some Java code and use CloverETL classes/methods calls.
The easier way is to create an XML description of a record format that can be read by CloverETL and automatically materialized in memory.
It is customary to use the .fmt extension for an XML file that contains metadata describing the format of a data record. The following example shows simple metadata that describes a record containing three data fields:
<?xml version="1.0" encoding="UTF-8"?> <Record name="TestInput" type="delimited"> <Field name="Name" type="string" delimiter=";"/> <Field name="Age" type="numeric" delimiter="|"/> <Field name="City" type="string" delimiter="\n"/> </Record>
This simple example shows the definition of a data record specified as delimited. The record has three fields:
Name (of type string)
Age (of type numeric)
City (of type string)
The are no strict rules for naming fields (and records). However, you use the same rules as for naming Java variables. For example, use only letters [a-zA-Z], numbers [0-9] (not in the first position), and underscores [ _ ].
The encoding specified for the XML file is UTF-8.
Note - When creating a file, you must save the file using the encoding specified in the encoding tag. Otherwise, the XML parser used by CloverETL won't be able to correctly interpret the file.
Each field in the previous example has a specified delimiter character. This information is used by the data parser when parsing data records (of this structure) from external text files. The same delimiters are used when CloverETL outputs internal data records (of this structure) to output text files.
Delimiters can be up to 32 characters long, and each field can have a different one. Basic control characters such as \t (tabulator), \n (line feed), and \r (carriage return) are supported.
The following example shows additional features:
<?xml version="1.0" encoding="UTF-8"?> <!-- Automatically generated from database null --> <Record name="EMPLOYEE" type="delimited"> <Field name="EMP_NO" type="integer" delimiter="," format="#"/> <Field name="FIRST_NAME" type="string" delimiter="," /> <Field name="LAST_NAME" type="string" delimiter="," /> <Field name="PHONE_EXT" type="string" nullable="yes" delimiter="," /> <Field name="HIRE_DATE" type="date" delimiter="," format="dd/MM/yyyy" /> <Field name="BIRTH_DATE" type="date" delimiter="," locale="en"/> <Field name="DEPT_NO" type="string" delimiter="," /> <Field name="JOB_CODE" type="string" delimiter="," /> <Field name="JOB_GRADE" type="numeric" delimiter="," format="#" /> <Field name="JOB_COUNTRY" type="string" delimiter="," /> <Field name="SALARY" type="numeric" delimiter="," /> <Field name="FULL_NAME" type="string" nullable="yes" delimiter="\n" /> </Record>
Some fields, such as PHONE_EXT, have the nullable attribute set to yes, which means that the field is allowed to contain a null value. The default is yes or true (that is, the field can contain a null value). The exact behavior is influenced by a concrete data parser or data formatter, but simply put, when a field is not specified to be nullable and an application tries to put a null value in it, this operation fails. This can stop the whole transformation process.
Use the Format attribute to specify the expected format of data when parsing in, or printing out of, CloverETL. In this case, the HIRE_DATE field is of type date and it is specified that date values in external textual data will look like this: 19/12/1999
For all possible format specifiers (control characters), see the documentation for java.text.SimpleDateFormat.
Similar to HIRE_DATE is the JOB_GRADE field, which is of type numeric. Here the format specifies that data is expected to be integer numbers only (no decimal point allowed).
See the following tables for date and number format specifiers.
When specifying the format for numbers, Clover (Java) uses the default system locale setting, unless another locale is specified through the locale option.
This is important in cases when you are parsing data where decimal numbers use a , (comma) as a decimal separator, whereas the system default (global) says it is . (period).
In such a case, use the locale option together with the format option to change the expected decimal delimiter. For example:
<Field name="Freight" type="numeric" delimiter="|" format="#.#" locale="en.US" />
Instead of specifying a format parameter, you can specify a locale parameter, which states the geographical, political, or cultural region for formatting data. Thus, instead of specifying the format for the date field, you could specify the locale for Germany (locale="de"), for example. Clover automatically chooses the proper date format for Germany.
There are cases when both format and locale parameters make sense, for example when formatting decimal numbers. Define the format pattern with a decimal separator, and the locale specifies whether the separator is a comma or a dot.
CloverETL allows you to specify a default value for each field. This value is used (in certain cases) when a field is assigned to be null, but a null value is not allowed for the field.
The following example shows fields with specified default values:
<?xml version="1.0" encoding="UTF-8"?> <Record name="Orders" type="delimited"> <Field name="OrderID" type="numeric" delimiter="|" format="#" /> <Field name="OrderDate" type="date" delimiter="|" format="dd.MM.yyyy" default="01.01.1900" nullable="no" /> <Field name="Amount" type="number" delimiter="\n" default="0.0" nullable="no" /> </Record>
In this example, OrderDate is defaulted to 1.1.1900, in case it is not present in the text data this record is parsed from. In general, when this field is assigned a null value, the specified default value is assigned instead. The same is true for the Amount field, except the default is specified to be 0.
Note - This behavior is not the default and concerns only data parsers. If your code assigns a null value into a non-nullable field, a BadDataFormatException error will occur.
If you use any of the Clover data parsers, you can specify a DataPolicy, which states what should happen if a parsed value cannot be assigned to a data field (as in the case when the value is null and the field cannot accept null values).
There are three different data policies defined:
Strict - Any problem causes BadDataFormatException. This is the default behavior.
Controlled - Similar to strict, but also logs the problematic value.
Lenient - If a default value exists, CloverETL attempts to assign that default value.