13 Data Loading Format (DLF) Specification

This chapter describes version 1.0 of the Data Loading Format (DLF), which is the standard format to describe translated messages and seed data loaded into the database by the TransX utility. It contains the following topics:

Introduction to DLF

DLF defines a standard format for loading data with the TransX utility. It is intended to supersede loading data with SQL scripts. DLF provides the following advantages:

  • Format validation. Validation results in fewer errors during the translation and loading processes.

  • Ease of use. The user does not have to maintain the character encoding of each data file to correspond with the language used in the data file.

DLF is based on the XML 1.0 specification.

Naming Conventions for DLF

This section describes the naming conventions used in this document.

Elements and Attributes

The following naming conventions for elements and attributes are used throughout this specification:

  • Standard English letters

  • Lowercase letters only

  • Hyphen (-) may be used for concatenation

  • Attribute names must be consistently defined throughout

  • Industry-standard terminology should be followed wherever possible

Values

Values are case-sensitive except for some attribute values used for column names. All predefined attribute values are lowercase. No element values are defined by this specification.

File Extensions

No file extension is recommended by this specification. A future version of this specification may recommend that documents use an extension that conforms to an 8.3 standard.

General Structure of DLF

Data Loading Format is XML, so it begins with an XML declaration. After the XML declaration comes the DLF document itself, enclosed within the <table> element. A DLF document is composed of the following required sections:

  • The <lookup-key> element contains a list of column names that determine whether existing rows in the database are duplicates of the rows in the dataset definition included in the <dataset> element.

  • The <columns> element contains metadata about the <dataset> such as the names, datatypes, and attributes of columns.

  • The <dataset> element contains a <row> element for each row, which in turn contains a <col> element that corresponds to a piece of data that is loaded in a database column. In this way a DLF document looks similar to the familiar tabular format in printing data in the database and allows easy editing.

DLF provides one optional section, which is enclosed within a <translation> element. This section may precede the required sections.

In addition, DLF provides information about TransX utility processing. Such information includes but is not limited to the following:

  • The <query> element is used to retrieve the value to be loaded to the column from a SQL query.

  • The sequence attribute is used to retrieve the value to be loaded to the column from a sequence object in the database.

  • The constant attribute is used to specify a constant value to the column.

  • The language attribute is used to specify the language identifier to be loaded to the column.

Tree Structure of DLF

This section shows the possible structure of a DLF document as a tree. Each element is represented as <element_name>, where element_name is the name of an element. Attributes have no markup. Each element and attribute is followed by notation indicating its possible occurrence. Table 13-1 describes the occurrence notation.

Table 13-1 Notation for Occurrence of Attributes and Elements

Symbol Meaning

1

one

+

one or more

?

zero or one

*

zero or more

(a|b|c)

exactly one of a, b, and c


Example 13-1 shows the tree structure of a DLF document. The elements are described in "Elements in DLF". The attributes are described in "Attributes in DLF".

Example 13-1 DLF Tree Structure

<table>1
 |
 +---- lang?
 |
 +---- space?
 |
 +---- normalize-langtag?
 |
 +---- <translation>?
 |     |
 |     +---- <target>+
 |     |     |
 |     |     +---- <language ID>
 |     |
 |     +---- <restype>+
 |           |
 |           +---- name1
 |           |
 |           +---- expansion?
 |
 +---- <lookup-key>1
 |     |
 |     +---- <column>*
 |           |
 |           +---- name1
 |
 +---- <columns>1
 |     |
 |     +---- <column>+
 |           |
 |           +---- name1
 |           |
 |           +---- type1
 |           |
 |           +---- translate?
 |           |
 |           +---- translation-note?
 |           |
 |           +---- constant?
 |           |
 |           +---- language?
 |           |
 |           +---- sequence?
 |           |
 |           +---- virtual?
 |           |
 |           +---- useforupdate?
 |           |
 |           +---- maxsize?
 |           |
 |           +---- size-unit?
 |           |
 |           +---- restype?
 |           |
 |           +---- space?
 |           |
 |           +---- (<query>|<sql>)?
 |                 |
 |                 +---- text1
 |                 |
 |                 +---- <parameter>*
 |                       |
 |                       +---- id1
 |                       |
 |                       +---- (col|constant)1
 |                       |
 |                       +---- translate?
 |                       |
 |                       +---- trans-key?
 |                       |
 |                       +---- translation-note?
 |
 |
 +---- <dataset>1
       |
       +---- <row>+
             |
             +---- space?
             |
             +---- <col>*
                   |
                   +---- space?
                   |
                   +---- name1
                   |
                   +---- trans-key?
                   |
                   +---- translation-note?
                   |
                   +---- <the text element for the data>

DLF Specifications

This section contains the following topics:

XML Declaration in DLF

The XML declaration starts an XML entity. It indicates the XML version and can be used to declare the encoding of the file, as in the following example:

<?xml version="1.0" encoding="iso-8859-1" ?>

As in all XML files, the default encoding for a DLF file is assumed to be either UTF-8, which is a superset of the 7-bit ASCII character set, or UTF-16, which is conceptually UCS-2 with surrogate pairs for code points above 65,535. Thus, for these character sets, the encoding declaration is not necessary. Furthermore, all XML parsers support these character sets. If the encoding is UTF-16, then the first character of the file must be the Unicode Byte-Order-Mark, #xFEFF, which indicates the endianness of the file.

Other character sets supported by Oracle XML parsers include all Oracle character sets and commonly used IANA character set and Java encodings. The names of these character sets can be found in the parser documentation. You must declare these with encoding declarations if the document does not have an external source of encoding information such as from the execution environment or the network protocol. Therefore, it is recommended that you use a Unicode character encoding so that you can dispense with the encoding declaration. The recommended practice is to encode the document in UTF-8 and use the following declaration:

<?xml version="1.0" ?>

Entity References in DLF

There are five entity references predefined by XML. These entity references are listed in Table 13-2. The &lt; and &amp; entity references must always be used in place of the character they reference.

Table 13-2 Entity References

Entity Reference Meaning

&lt;

Less than sign (<)

&gt;

Greater than sign (>)

&amp;

Ampersand (&)

&apos;

Apostrophe or single quote (')

&quot;

Straight, double quotation mark (")


Elements in DLF

The DLF elements shown in Example 13-1 are divided into the categories described in Table 13-3. Attributes are shared among them. The attributes are described in "Attributes in DLF".

Table 13-3 DLF Elements

Type of Element Tag

Top Level Table Element

<table>

Translation Elements

<target>, <restype>

Lookup Key Elements

<lookup-key>, <column>

Metadata Elements

<columns>, <column>, <query>, <sql>, <parameter>

Data Elements

<dataset>, <row>, <col>


Top Level Table Element

Table 13-10 describes the top level table element.

Table 13-4 Top Level Table Element

Tag Description Required Attributes Optional Attributes Contents

<table>

Corresponds to a single table. It encloses all the other elements of the document.

name

lang, space, normalize-langtag

The order of the elements within <table> is as follows:

  1. <translation> (optional)

  2. <lookup-key>

  3. <columns>

  4. <dataset>


Translation Elements

Table 13-5 describes the translation elements.

Table 13-5 Translation Elements

Tag Description Required Attributes Optional Attributes Contents

<translation>

Contains generic information pertinent to translation.

None

None

Zero or more <target> elements and zero or more <restype> elements

<target>

Specifies a language to which this document is translated.

None

None

A language identifier as defined by [IETFRFC1766]

<restype>

Declares a type of resource.

name

expansion

Empty element


Lookup Key Elements

Table 13-6 describes the lookup key elements.

Table 13-6 Lookup Key Elements

Tag Description Required Attributes Optional Attributes Contents

<lookup-key>

Contains the <column> element(s) based on which the TransX recognizes the rows as identical or duplicate.

name

None

Zero or more <column> elements

<column>

A <column> element under <lookup-key> element indicates a column to be used to recognize the rows as identical or duplicate. Columns with the same values in specified column(s) are considered duplicate, regardless of the values in the other column(s). A lookup key <column> must have corresponding <col>umns in the <dataset> portion or be declared as a <column> with a constant expression or a <query> in the <columns> section.

name

None

Empty element


Metadata Elements

Table 13-7 describes the metadata elements.

Table 13-7 Metadata Elements

Tag Description Required Attributes Optional Attributes Contents

<columns>

Contains data about the data to be loaded.

None

None

One or more <column> elements

<column>

Specifies a column that corresponds to <col> elements under the <dataset> element. Once a <column> is defined the corresponding <col> element must appear in every <row> unless the column has the sequence, constant or query attribute.

name, type in either order.

The recommended sequence is name, type, then optional attributes.

translate, constant, sequence, virtual, useforupdate, maxsize, size-unit, restype in any order

Zero or one <query> or <sql> element

<query>

Specifies a SQL query whose result is used to fill in the column to which this element belongs.

text

None

Zero or more <parameter> elements

<sql>

Specifies a SQL statement whose result, if any, is used to fill in the column to which this element belongs.

text

None

Zero or more <parameter> elements

<parameter>

Specifies a parameter of a <query> element.

id and either col or constant.

If col is specified, the referenced column cannot have the query, constant, or sequence attributes.

translate, trans-key

Empty


Data Elements

Table 13-8 describes the data elements.

Table 13-8 Data Elements

Tag Description Required Attributes Optional Attributes Contents

<dataset>

Contains data to be loaded into the database.

None

None

One or more <row> elements

<row>

Contains data about the data to be loaded <dataset> element.

None

None

Zero or more <col> elements

<col>

Specifies an instance of a piece of data to be loaded to a database column, or in the case of a virtual column, a piece of data to be used to obtain an actual data to be loaded to a database column.

name

trans-key

Data for use by applications


Attributes in DLF

This section lists the various attributes used in the DLF elements. An attribute is never specified more than once for each element. Along with some of the attributes are the Recommended Attribute Values. Values for these attributes are case sensitive.

Table 13-9 Attributes

Type of Attribute Attributes

DLF Attributes

name, type, translate, constant, sequence, virtual, useforupdate, maxsize, size-unit, restype, text, id, col, trans-key, translation-note, normalize-langtag

XML Namespace Attributes

xml:space


DLF Attributes

Table 13-10 describes the DLF attributes.

Table 13-10 DLF Attributes

Attribute Description Value Description Default Value Used by Elements

lang

Specifies the language of the document.

This is equivalent to the xml:lang attribute.

The values of the attribute are language identifiers as defined by [IETFRFC4646].

This attribute does not affect data loading operation in any way.

None; if absent, "en" is assumed

<table>

normalize-langtag

Specifies how to normalize the case of language tag.

"none", "standard", "uppercase" or "lowercase".

The meanings are as follows:

none - no normalization. the values in the language column on DLF are used as they are

standard - the style as recommended by the standards

* lowercase for the 2 letter language code

* uppercase for the 2 letter country code

* titlecase for the 4 letter script code

* lowercase for others

uppercase - uppercase everything

lowercase - lowercase everything

none

<table>

space

Specifies how white spaces (ASCII spaces, tabs and line-breaks) should be treated.

"default" or "preserve"

The value "default" signals that applications' default white-space processing modes are acceptable; the value "preserve" indicates the intent that applications preserve all whitespace. If this intent is declared at the root table element, it is considered to apply to all string data elements in the whole document. If declared at column level, it is considered to apply to the specified column of every row. If this attribute is declared in the dataset section, the intent applies to the immediate text data only. Declaration at the document or column level may be overridden with another instance of the space attribute.

"default"

<table>, <column>, <col>

name

Specifies the name of an object such as table, column, restype, and so forth.

String: This is a database table name for the <table> element, and a column name for the <column> or <col> element.

n/a

<table>, <column>, <col>

type

The datatype of a column in the dataset. This attribute specifies the kind of text contained in the <col> element in the dataset. Depending on this type, TransX may apply different processes to the data.

Because implicit datatype conversion is provided by XSU and JDBC, TransX does not do its own parsing based on this type information. It uses this attribute to choose appropriate intermediate data types in Java for columns of date or dateTime type, in which case the standard date formats are accepted.

String: possible values are "number", "string", "date", "dateTime" or "binary".

The lexical representation of a value of number type should be supplied in the SQL language syntax, no matter what the current locale is. The SQL syntax uses no digit grouping separator (usually comma), but uses a dot as the decimal separator (usually dot).For the binary data type, the data value specified in a text field between the col tags indicates the name of a file that contains the actual binary data. Raw data cannot be embedded in the text field.For the other data types (string, date, and dateTime) the representation is constrained by the corresponding types in the XML Schema specification.For the sake of simplicity, DLF only accepts standard date formats of XML Schema in the form "CCYY-MM-DDThh:mm:ss" (dateTime) or "CCYY-MM-DD" (date). No other date format is recognized.

TransX uses this attribute for the following:

  • Bind virtual columns to parameters of a query

  • Bind the result of a query to a corresponding column

n/a

<column>

translate

Indicates whether or not the text of this column or parameter should be translated.

Either "yes" or "no"

"no"

<column>, <parameter>

constant

Specifies a constant value for this column or parameter.

The value of this column for every row

n/a

<column>, <parameter>

language

Specifies language identifier for this column

Language identifier or a placeholder. "%x" gets the value from the xml:lang attribute of the root table element.

n/a

<column>

sequence

Specifies a sequence in the database used to fill in the value for this column.

String: The name of a sequence in the database

n/a

<column>

virtual

Indicates that this column provides data used to construct another piece of data, which in turn is loaded into the database. A virtual column does not exist in the database. It is typically used to provide a value of a parameter in a query. A virtual column cannot be a lookup-key column. A virtual column with a query throws the result away.

Either "yes" or "no"

"no"

<column>

useforupdate

Indicates whether the value of this column is used for the update when uploading seed data. This attribute has no effect unless TransX is in the mode to update duplicate rows. A virtual column cannot have this attribute set to yes.

Either "yes" or "no". If this attribute is set to "no", then the value of the column remains unchanged on the update operation.

"yes"

<column>

maxsize

Specifies the maximum size for the data for this column.

Numeric value in the unit specified by the size-unit attribute. If this attribute is set and the size-unit is not set, the size is assumed to be in characters.

None

<column>

size-unit

Specifies the unit of size specified in the maxsize attribute.

Units. Recommended values are "char" for characters, "byte" for bytes.

For supplemental characters, they take two "char" units.

"char"

<column>

restype

Indicates the type of data contained in this column.

A resource type. The value must match with the name of a <restype> element.

None

<column>

expansion

Indicates the maximum size up to which translated strings are allowed to become longer for this type of resource.

A numeric value in percentage of increase.

0%

<restype>

text

Specifies a SQL query statement to obtain a value to put in the column to which the query belongs.

A SQL statement. Zero or more parameters can be specified with an identifier preceded by a colon. The statement should return a single row of a single value. Any excessive result is discarded.

n/a

<query>

id

Specifies a placeholder used in a SQL query statement with parameters. The value of the column specified by the sibling col attribute is associated as a parameter to the query.

String: an identifier that appears in the text attribute of the parent query element.

Empty string

<parameter>

col

Specifies a column to be associated with a placeholder in the query specified by the sibling id attribute.

String: a column name. The column must be other than the column this attribute is a part of.

n/a

<parameter>

trans-key

Specifies a key for translation.

String: a translation key. The value has to be unique in a translation domain.

n/a

<col>, <parameter>

translation-note

Specifies notes for translation.

String: Translation notes.

n/a

<col>, <column>, <parameter>


XML Namespace Attributes

Table 13-11 describes the XML namespace attributes.

Table 13-11 XML Namespace Attributes

Attribute Description Value Description Default Value Used by Elements

xml:space

Specifies how whitespace (ASCII spaces, tabs and line-breaks) should be treated.

"default" or "preserve"

The value "default" signals that applications' default whitespace processing modes are acceptable for this element; the value "preserve" indicates the intent that applications preserve all the whitespace. This declared intent is considered to apply to all elements within the content of the element where it is specified, unless overridden with another instance of the xml:space attribute.

"default"

None

xml:lang

Specifies the language of the content.

A language tag defined by RFC 4646.

n/a

table


DLF Examples

This section contains the following topics:

Minimal DLF Document

Example 13-2 shows a minimal DLF document.

Example 13-2 Minimal DLF Document

<?xml version="1.0" ?>
<table name="dual">
 <lookup-key/>
 <columns>
  <column name="DUMMY" type="string">
 </columns>
 <dataset>
  <row>
   <col name="DUMMY">X</col>
  </row>
 </dataset>
</table>

Typical DLF Document

Example 13-3 shows a sample DLF document that contains seed data for the CLK_STATUS_L table.

Example 13-3 Sample DLF Document

<!--
  - $Header: $
  -
  - Copyright (c) 2001 Oracle Corporation. All Rights Reserved.
  -
  - NAME
  -   status.xml - Seed file for the CLK_STATUS_L table
  -
  - DESCRIPTION
  -   This file contains seed data for the Status level table.
  -
  - NOTES
  -
  - MODIFIED   (MM/DD/YY)
  -   dchiba    06/11/01 - Adaption to enhancements of data loading tool
  -   dchiba    05/23/01 - Adaption to generic data loading tool
  -   rbolsius  05/07/01 - Created
  -->
 
<table name="clk_status_l">
  <lookup-key>
    <!--column name="status_id" /-->
    <column name="status_code" />
  </lookup-key>
 
  <columns>
    <column name="status_id"          type="number" sequence="clk_status_seq" useforupdate="no"/>
    <column name="status_code"        type="number" />
    <column name="status_name"        type="string" translate="yes" />
    <column name="status_description" type="string" translate="yes" />
    <column name="version_created"    type="number" constant="0" />
    <column name="version_updated"    type="number" constant="0" />
    <column name="status_type_code"   type="string" virtual="yes" />
    <column name="status_type_id"     type="number" >
      <query text="select status_type_id from clk_status_type_l where status_type_code = :1" >
        <parameter id="1" col="status_type_code" />
      </query>
    </column>
  </columns>
 
  <dataset>
 
    <row>
      <col name="status_code" >100</col>
      <col name="status_name" trans-key="stts-name-1" >Continue</col>
      <col name="status_description" trans-key="stts-desc-1" >
        The client should continue with its request.</col>
      <col name="status_type_code" >INFO</col>
    </row>
 
    <row>
      <col name="status_code" >101</col>
      <col name="status_name" trans-key="stts-name-2" >Switching Protocols</col>
      <col name="status_description" trans-key="stts-desc-2" >
        The server understands and is willing to comply with the client''s
        request (via the Upgrade message header field) for a change in the
        application protocol being used on this connection.</col>
      <col name="status_type_code" >INFO</col>
    </row>
 
    <row>
      <col name="status_code" >200</col>
      <col name="status_name" trans-key="stts-name-3" >OK</col>
      <col name="status_description" trans-key="stts-desc-3" >
        The request has succeeded.</col>
      <col name="status_type_code" >SUCCESS</col>
    </row>
 
    <row>
      <col name="status_code" >201</col>
      <col name="status_name" trans-key="stts-name-4" >Created</col>
      <col name="status_description" trans-key="stts-desc-4" >
        The request has been fulfilled and resulted in a new resource being
        created.</col>
      <col name="status_type_code" >SUCCESS</col>
    </row>
 
    <row>
      <col name="status_code" >202</col>
      <col name="status_name" trans-key="stts-name-5" >Accepted</col>
      <col name="status_description" trans-key="stts-desc-5" >
        The request has been accepted for processing, but the processing has
        not been completed.</col>
      <col name="status_type_code" >SUCCESS</col>
    </row>
 
    <row>
      <col name="status_code" >203</col>
      <col name="status_name" trans-key="stts-name-6" >Non-Authoritative Information</col>
      <col name="status_description" trans-key="stts-desc-6" >
        The returned metainformation in the entity-header is not the
        definitive set as available from the origin server, but is gathered
        from a local or a third-party copy.</col>
      <col name="status_type_code" >SUCCESS</col>
    </row>
 
    <row>
      <col name="status_code" >204</col>
      <col name="status_name" trans-key="stts-name-7" >No Content</col>
      <col name="status_description" trans-key="stts-desc-7" >
        The server has fulfilled the request but does not need to return an
        entity-body, and might want to return updated metainformation.</col>
      <col name="status_type_code" >SUCCESS</col>
    </row>
 
    <!-- ... -->
 
  </dataset>
</table> 

Localized DLF Document

Example 13-4 shows an example of elements and attributes for localization.

Example 13-4 DLF with Localization

<?xml version="1.0"?>
<table name="table_name">
 
<translation>
<target>ar</target>
<target>bs</target>
<target>es</target>
<restype name="alt" expansion="50%"/>
<restype name="foo" expansion="50%"/>
<restype name="bar" expansion="30%"/>
</translation>
 
<lookup-key><column name="resid" /></lookup-key>
 
<columns>
<column name="resid" type="number" sequence="seq_foo" useforupdate="no"/>
<column name="image" type="binary"/>
<column name="alt_text" type="string" translate="yes" maxsize="30" 
        size-unit="byte" restype="alt"/>
 
</columns>
 
<dataset>
<col name="image">foo1.gif</col>
<col name="alt_text">Hello world</col>
</dataset>
 
</table>

DLF References

See the following references for further information:

[IETFRFC4646] Tags for the Identification of Languages
http://www.ietf.org/rfc/rfc4646