6 XML Files

It is important to understand how to work with XML files in Oracle Data Integrator.

This chapter includes the following sections:

6.1 Introduction

Oracle Data Integrator supports XML files integration through the Oracle Data Integrator Driver for XML.

6.1.1 Concepts

The XML concepts map the Oracle Data Integrator concepts as follows: An XML file corresponds to a data server in Oracle Data Integrator. Within this data server, a single schema maps the content of the XML file.The Oracle Data Integrator Driver for XML (XML driver) loads the hierarchical structure of the XML file into a relational schema. This relational schema is a set of tables located in the schema that can be queried or modified using SQL. The XML driver is also able to unload the relational schema back in the XML file.The relational schema is reverse-engineered as a data model in ODI, with tables, columns, and constraints. This model is used like a normal relational data model in ODI. If the modified data within the relational schema needs to be written back to the XML file, the XML driver provides the capability to synchronize the relational schema into the file.

See Oracle Data Integrator Driver for XML Reference for more information on this driver.

6.1.2 Pre/Post Processing Support for XML Driver

You can now customize the way data is fed to the XML driver. You can set up intermediate processing stages to process the data that is retrieved from an external endpoint using Oracle Data Integrator, or to write the data out to an external endpoint.

For detailed information about configuring and implement the pre and post processing stages for XML driver, see Pre/Post Processing Support for XML and Complex File Drivers.

6.1.3 Knowledge Modules

Oracle Data Integrator provides the IKM XML Control Append for handling XML data. This Knowledge Module is a specific XML Knowledge Module. It has a specific option to synchronize the data from the relational schema to the file.

In addition to this KM, you can also use an XML data server as any SQL data server. XML data servers support both the technology-specific KMs sourcing or targeting SQL data servers, as well as the generic KMs. See Generic SQL or the technology chapters for more information on these KMs.

6.2 Installation and Configuration

Make sure you have read the information in this section before you start using the XML Knowledge Module:

6.2.1 System Requirements

Before performing any installation you should read the system requirements and certification documentation to ensure that your environment meets the minimum installation requirements for the products you are installing.

The list of supported platforms and versions is available on Oracle Technical Network (OTN):

http://www.oracle.com/technology/products/oracle-data-integrator/index.html.

6.2.2 Technologic Specific Requirements

There are no technology-specific requirements for using XML Files in Oracle Data Integrator.

6.2.3 Connectivity Requirements

This section lists the requirements for connecting to XML database.

Oracle Data Integrator Driver for XML

XML files are accessed through the Oracle Data Integrator Driver for XML. This JDBC driver is installed with Oracle Data Integrator and requires no other installed component or configuration.

You must ask the system administrator for the following connection information:

  • The location of the DTD or XSD file associated with your XML file

  • The location of the XML file

6.3 Setting up the Topology

Setting up the topology consists in:

  1. Creating an XML Data Server

  2. Creating a Physical Schema for XML

6.3.1 Creating an XML Data Server

An XML data server corresponds to one XML file that is accessible to Oracle Data Integrator.

6.3.1.1 Creation of the Data Server

Create a data server for the XML technology using the standard procedure, as described in Creating a Data Server of Administering Oracle Data Integrator. This section details only the fields required or specific for defining a File data server:

  1. In the Definition tab:
    • Name: Name of the data server that will appear in Oracle Data Integrator.

    • User/Password: These fields are not used for XML data servers.

  2. In the JDBC tab, enter the values according to the driver used:
    • JDBC Driver: com.sunopsis.jdbc.driver.xml.SnpsXmlDriver

    • JDBC URL: jdbc:snps:xml?[property=value&property=value...]

    Table 6-1 lists the key properties of the Oracle Data Integrator Driver for XML. These properties can be specified in the JDBC URL.

    See Oracle Data Integrator Driver for XML Reference for a detailed description of these properties and for a comprehensive list of all properties.

    Table 6-1 JDBC Driver Properties

    Property Value Notes

    f

    <XML File location>

    XML file name. Use slash "/" in the path name instead of back slash "\". It is possible to use an HTTP, FTP or File URL to locate the file. Files located by URL are read-only.

    d

    <DTD/XSD File location>

    Description file: This file may be a DTD or XSD file. It is possible to use an HTTP, FTP or File URL to locate the file. Files located by URL are read-only.

    Note that when no DTD or XSD file is present, the relational schema is built using only the XML file content. It is not recommended to reverse-engineer the data model from such a structure as one XML file instance may not contain all the possible elements described in the DTD or XSD, and data model may be incomplete.

    re

    <Root element>

    Name of the element to take as the root table of the schema. This value is case sensitive. This property can be used for reverse-engineering for example a specific message definition from a WSDL file, or when several possible root elements exist in a XSD file.

    ro

    true | false

    If true, the XML file is opened in read only mode.

    s

    <schema name>

    Name of the relational schema where the XML file will be loaded. If this property is missing, a schema named after the five first letters of the XML file name will automatically be created.

    cs

    true | false

    Load the XML file in case sensitive or insensitive mode. For case insensitive mode, all element names in the DTD file should be distinct (For example: Abc and abc in the same file will result in name collisions).

    The following examples illustrate these properties:

    Connects to the PROD20100125_001.xml file described by products.xsd in the PRODUCTS schema.

    jdbc:snps:xml?f=/xml/PROD20100125_001.xml&d=/xml/products.xsd&s=PRODUCTS
    

    Connects in read-only mode to the staff_internal.xml file described by staff_internal.dtd in read-only mode. The schema name will be staff.

    jdbc:snps:xml?f=/demo/xml/staff_internal.xml&d=/demo/xml/staff_internal.dtd&ro=true&s=staff
    

6.3.2 Creating a Physical Schema for XML

Create an XML physical schema using the standard procedure, as described in Creating a Physical Schema in Administering Oracle Data Integrator.

The schema name that you have set on the URL will be preset. Select this schema for both the Data Schema and Work Schema.

Create for this physical schema a logical schema using the standard procedure, as described in Creating a Logical Schema in Administering Oracle Data Integrator and associate it in a given context.

6.4 Setting Up an Integration Project

Setting up a Project using the XML database follows the standard procedure. See Creating an Integration Project of the Developing Integration Projects with Oracle Data Integrator.

The recommended knowledge modules to import into your project for getting started with XML are the following:

  • LKM SQL to SQL

  • LKM File to SQL

  • IKM XML Control Append

6.5 Creating and Reverse-Engineering a XML File

This section contains the following topics:

6.5.1 Create an XML Model

An XML file model groups a set of datastores. Each datastore typically represents an element in the XML file.

Create an XML Model using the standard procedure, as described in Creating a Model of Developing Integration Projects with Oracle Data Integrator. Select the XML technology and the XML logical schema created when configuring the topology.

6.5.2 Reverse-Engineering an XML Model

XML supports standard reverse-engineering, which uses only the abilities of the XML driver.

It is recommended to reference a DTD or XSD file in the dtd or d parameters of the URL to reverse-engineer the structure from a generic description of the XML file structure. Reverse-engineering can use an XML instance file if no XSD or DTD is available. In this case, the relational schema structure will be inferred from the data contained in the XML file.

Standard Reverse-Engineering

To perform a Standard Reverse- Engineering on XML use the usual procedure, as described in Reverse-engineering a Model of Developing Integration Projects with Oracle Data Integrator.

The standard reverse-engineering process will automatically reverse-engineer the table from the relational schema generated by the XML driver. Note that these tables automatically include:

  • Primary keys (PK columns) to preserve parent-child elements relationships

  • Foreign keys (FK columns) to preserve parent-child elements relationships

  • Order identifier (ORDER columns) to preserve the order of elements in the XML file

These extra columns enable the mapping of the hierarchical XML structure into the relational schema. See XML to SQL Mapping in the Oracle Data Integrator Driver for XML Reference for more information.

6.6 Designing a Mapping

You can use XML as a source or a target of a mapping.

The KM choice for a mapping or a check determines the abilities and performances of this mapping or check. The recommendations in this section help in the selection of the KM for different situations concerning an XML data server.

6.6.1 Notes about XML Mappings

Read carefully these notes before working with XML in mappings.

6.6.1.1 Targeting an XML Structure

When using a datastore of an XML model as a target of a mapping, you must make sure to load the driver-generated columns that are used for preserving the parent-child relationships and the order in the XML hierarchy. For example, if filling records for the region element into an XML structure as shown in Example 6-1, that correspond to a REGION table in the relational schema, you should load the columns REGION_ID and REGION_NAME of the REGION table. These two columns correspond to XML attributes.

<country COUNTRY_ID="6" COUNTRY_NAME="Australia">
    <region REGION_ID="72" REGION_NAME="Queensland">
</country>

In Example 6-1 you must also load the following additional columns that are automatically created by the XML Driver in the REGION table:

  • REGIONPK: This column enables you to identify each <region> element.

  • REGIONORDER: This column enables you to order the <region> elements in the XML file (records are not ordered in a relational schema, whereas XML elements are ordered).

  • COUNTRYFK: This columns enables you to put the <region> element in relation with the <country> parent element. This value is equal to the COUNTRY.COUNTRYPK value for the Australia record in the COUNTRY table.

Example 6-1 XML Structure

6.6.1.2 Synchronizing XML File and Schema

To ensure a perfect synchronization of the data in an XML file and the data in the XML schema, the following commands have to be called:

  • Before using the tables of an XML model, either to read or update data, it is recommended that you use the SYNCHRONIZE FROM FILE command on the XML logical schema. This operation reloads the XML hierarchical data in the relational XML schema. The schema is loaded in the built-in or external database storage when first accessed. Subsequent changes made to the file are not automatically synchronized into the schema unless you issue this command.

  • After performing changes in the relational schema, you must unload this schema into the XML hierarchical data by calling the SYNCHRONIZE ALL or SYNCHRONIZE FROM DATABASE commands on the XML Logical Schema. The IKM XML Control Append implements this synchronize command.

These commands must be executed in procedures in the packages before (and after) the mappings and procedures manipulating the XML schema.

See Oracle Data Integrator Driver for XML Reference for more information on these commands.

6.6.1.3 Handling Large XML Files

Large XML files can be handled with high performance with Oracle Data Integrator.

The default driver configuration stores the relational schema in a built-in engine in memory. It is recommended to consider the use of external database storage for handling large XML files.

See Schema Storage for more information on these commands.

6.6.2 Loading Data from and to XML

An XML file can be used as a mapping's source or target. The LKM choice in the Loading Knowledge Module tab that is used to load data between XML files and other types of data servers is essential for the performance of the mapping.

6.6.2.1 Loading Data from an XML Schema

Use the Generic SQL KMs or the KMs specific to the other technology involved to load data from an XML database to a target or staging area database.

Table 6-2 lists some examples of KMs that you can use to load from an XML source to a staging area:

Table 6-2 KMs to Load from XML to a Staging Area

Staging Area KM Notes

Microsoft SQL Server

LKM SQL to MSSQL (BULK)

Uses SQL Server's bulk loader.

Oracle

LKM SQL to Oracle

Faster than the Generic LKM (Uses Statistics)

All

LKM SQL to SQL

Generic KM to load data between an ANSI SQL-92 source and an ANSI SQL-92 staging area.

6.6.2.2 Loading Data to an XML Schema

It is not advised to use an XML schema as a staging area, except if XML is the target of the mapping and you wish to use the target as a staging area. In this case, it might be required to load data to an XML schema.

Use the Generic SQL KMs or the KMs specific to the other technology involved to load data from a source or staging area into an XML schema.

Table 6-3 lists some examples of KMs that you can use to load from a source to an XML staging area.

Table 6-3 KMs to Load to an XML Schema

Source KM Notes

File

LKM File to SQL

Generic KM to load a file in a ANSI SQL-92 staging area.

All

LKM SQL to SQL

Generic KM to load data between an ANSI SQL-92 source and an ANSI SQL-92 staging area.

6.6.3 Integrating Data in XML

XML can be used as a target of a mapping. The data integration strategies in XML concern loading from the staging area to XML. The IKM choice in the Integration Knowledge Module tab determines the performances and possibilities for integrating.

The IKM XML Control Append integrates data into the XML schema and has an option to synchronize the data to the file. In addition to this KM, you can also use the Generic SQL KMs or the KMs specific to the other technology involved. Note that if using generic or technology-specific KMs, you must manually perform the synchronize operation to write the changes made in the schema to the XML file.

Table 6-4 lists some examples of KMs that you can use to integrate data:

  • From a staging area to an XML target

  • From an XML staging area to an XML target. Note that in this case the staging area is on the XML target.

Table 6-4 KMs to Integrate Data in an XML File

Mode Staging Area KM Notes

Update

XML

IKM SQL Incremental Update

Generic KM

Append

XML

IKM SQL Control Append

Generic KM

Append

All RDBMS

IKM SQL to SQL Append

Generic KM

6.7 Troubleshooting

This section provides information on how to troubleshoot problems that you might encounter when using XML in Oracle Data Integrator. It contains the following topics:

6.7.1 Detect the Errors Coming from XML

Errors appear often in Oracle Data Integrator in the following way:

java.sql.SQLException: No suitable driver
at ... 
at ... 
...

the java.sql.SQLExceptioncode simply indicates that a query was made through the JDBC driver, which has returned an error. This error is frequently a database or driver error, and must be interpreted in this direction.

Only the part of text in bold must first be taken in account. It must be searched in the XML driver documentation. If it contains a specific error code, like here, the error can be immediately identified.

If such an error is identified in the execution log, it is necessary to analyze the SQL code send to the database to find the source of the error. The code is displayed in the description tab of the task in error.

6.7.2 Common Errors

This section describes the most common errors with XML along with the principal causes. It contains the following topics:

  • No suitable driver

    The JDBC URL is incorrect. Check that the URL syntax is valid.

  • File <XML file> is already locked by another instance of the XML driver.

    The XML file is locked by another user/application. Close all application that might be using the XML file. If such an application has crashed, then remove the .lck file remaining in the XML file's directory.

  • The DTD file "xxxxxxx.dtd" doesn't exist

    This exception may occur when trying to load an XML file by the command LOAD FILE. The error message can have two causes:

    • The path of the DTD file is incorrect.

    • The corresponding XML file was already opened by another schema (during connection for instance).

  • Table not found: S0002 Table not found: <table name> in statement [<SQL statement>]

    The table you are trying to access does not exist in the schema.

  • Column not found: S0022 Column not found: <column name> in statement [<SQL statement>]

    The column you are trying to access does not exist in the tables specified in the statement.