3 Files

This chapter describes how to work with Files in Oracle Data Integrator.

This chapter includes the following sections:

Section 3.1, "Introduction"
Section 3.2, "Installation and Configuration"
Section 3.3, "Setting up the Topology"
Section 3.4, "Setting Up an Integration Project"
Section 3.5, "Creating and Reverse-Engineering a File Model"
Section 3.6, "Designing an Interface"

3.1 Introduction

Oracle Data Integrator supports fixed or delimited files containing ASCII or EBCDIC data.

3.1.1 Concepts

The File technology concepts map the Oracle Data Integrator concepts as follows: A File server corresponds to an Oracle Data Integrator data server. In this File server, a directory containing files corresponds to a physical schema. A group of flat files within a directory corresponds to an Oracle Data Integrator model, in which each file corresponds to a datastore. The fields in the files correspond to the datastore columns.

Oracle Data Integrator provides a built-in driver for Files and knowledge modules for integrating Files using this driver, using the metadata declared in the File data model and in the topology.

Most technologies also have specific features for interacting with flat files, such as database loaders, utilities, and external tables. Oracle Data Integrator can also benefit from these features by using technology-specific Knowledge Modules. In terms of performance, it is most of the time recommended to use database utilities when handling flat files.

Note that the File technology concerns flat files (fixed and delimited). XML files are covered in Chapter 5, "XML Files".

3.1.2 Knowledge Modules

Oracle Data Integrator provides the knowledge modules (KM) listed in this section for handling File data using the File driver.

Note that the KMs listed in Table 3-1 are generic and can be used with any technology. Technology-specific KMs, using features such as loaders or external tables, are listed in the corresponding technology chapter.

Table 3-1 Knowledge Modules to read from a File

Knowledge Module	Description
LKM File to SQL	Loads data from an ASCII or EBCDIC File to any ANSI SQL-92 compliant database used as a staging area.
IKM SQL to File Append	Integrates data in a target file from any ANSI SQL-92 compliant staging area in replace mode.
RKM File (FROM EXCEL)	Retrieves file metadata from a Microsoft Excel spreadsheet. Consider using this KM if you plan to maintain the definition of your files structure in a dedicated Excel spreadsheet.

3.2 Installation and Configuration

Make sure you have read the information in this section before you start working with the File technology:

System Requirements and Certifications
Technology Specific Requirements
Connectivity Requirements

3.2.1 System Requirements and Certifications

Before performing any installation you should read the system requirements and certification documentation to ensure that your environment meets the minimum installation requirements for the products you are installing.

The list of supported platforms and versions is available on Oracle Technical Network (OTN):

http://www.oracle.com/technology/products/oracle-data-integrator/index.html.

3.2.2 Technology Specific Requirements

Some of the knowledge modules for File data use specific features of the database. This section lists the requirements related to these features.

Database Utilities

Most database technologies have their own utilities for interacting with flat files. All require that the database client software is accessible from the Agent that runs the interface that is using the utility. Some examples are:

Oracle: SQL*Loader
Microsoft SQL Server: bcp
Teradata: FastLoad, MultiLoad, TPump, FastExport

You can benefit from these utilities in Oracle Data Integrator by using the technology-specific knowledge modules. See the technology-specific chapter in this guide for more information about the knowledge modules and the requirements for using the database utilities.

3.2.3 Connectivity Requirements

This section lists the requirements for connecting to flat files.

JDBC Driver

Oracle Data Integrator includes a built-in driver for flat files. This driver is installed with Oracle Data Integrator and does not require additional configuration.

3.3 Setting up the Topology

Setting up the topology consists in:

Creating a File Data Server
Creating a File Physical Schema

3.3.1 Creating a File Data Server

A File data server is a container for a set of file folders (each file folder corresponding to a physical schema).

Oracle Data Integrator provides the default FILE_GENERIC data server. This data server suits most of the needs. In most cases, it is not required to create a File data server, and you only need to create a physical schema under the FILE_GENERIC data server.

3.3.1.1 Creation of the Data Server

Create a data server for the File technology using the standard procedure, as described in "Creating a Data Server" of the Oracle Fusion Middleware Developer's Guide for Oracle Data Integrator. This section details only the fields required or specific for defining a File data server:

In the Definition tab:
- Name: Name of the data server that will appear in Oracle Data Integrator.
- User/Password: These fields are not used for File data servers.

In the JDBC tab, enter the following values:

JDBC Driver: com.sunopsis.jdbc.driver.file.FileDriver

JDBC URL: jdbc:snps:dbfile?<property=value>&<property=value>&...

You can use in the URL the properties listed in Table 3-2.

Table 3-2 JDBC File Driver Properties

Property	Value	Description
ENCODING	`<encoding_code>`	File encoding. The list of supported encoding is available at `http://java.sun.com/j2se/1.4.2/docs/guide/intl/encoding.doc.html`. The default encoding value is `ISO8859_1`.
TRUNC_FIXED_STRINGS	`TRUE\|FALSE`	Truncates strings to the field size for fixed files. Default value is `FALSE`.
TRUNC_DEL_STRINGS	`TRUE\|FALSE`	Truncates strings to the field size for delimited files. Default value is `FALSE`.
OPT	`TRUE\|FALSE`	Optimizes file access on multiprocessor machines for better performance. Using this option on single processor machines may actually decrease performance. Default value is `FALSE`.

JDBC URL example:

jdbc:snps:dbfile?ENCODING=ISO8859_1&TRUNC_FIXED_STRINGS=FALSE&OPT=TRUE

3.3.2 Creating a File Physical Schema

Create a File physical schema using the standard procedure, as described in "Creating a physical schema" of the Oracle Fusion Middleware Developer's Guide for Oracle Data Integrator.

In your physical schema, you must set a pair of directories:

The Directory (Schema), where Oracle Data Integrator will look for the source and target files and create error files for invalid records detected in the source files.
A Directory (Work Schema), where Oracle Data Integrator may create temporary files associated to the sources and targets contained in the Data Schema.

Notes:

Data and Work schemas each correspond to a directory. This directory must be accessible to the component that will access the files. The directory can be an absolute path (m:/public/data/files) or relative to the runtime agent or Studio startup directory (../demo/files). It is strongly advised to use a path that is independent from the execution location.
In UNIX in particular, the agent must have read/write permission on both these directories.
Keep in mind that file paths are different in Windows than they are in UNIX. Take the platform used by the agent into account when setting up this information.

Create for this physical schema a logical schema using the standard procedure, as described in "Creating a Logical Schema" of the Oracle Fusion Middleware Developer's Guide for Oracle Data Integrator and associate it in a given context.

3.4 Setting Up an Integration Project

Setting up a project using the File database follows the standard procedure. See "Creating an Integration Project" of the Oracle Fusion Middleware Developer's Guide for Oracle Data Integrator.

It is recommended to import the following knowledge modules into your project for getting started:

LKM File to SQL
IKM SQL to File Append
RKM File (FROM EXCEL)

In addition to these knowledge modules, you can also import file knowledge modules specific to the other technologies involved in your product.

3.5 Creating and Reverse-Engineering a File Model

This section contains the following topics:

Create a File Model
Reverse-engineer a File Model

3.5.1 Create a File Model

An File model is a set of datastores, corresponding to files stored in a directory. A model is always based on a logical schema. In a given context, the logical schema corresponds to one physical schema. The data schema of this physical schema is the directory containing all the files (eventually in sub-directories) described in the model.

Create a File model using the standard procedure, as described in "Creating a Model" of the Oracle Fusion Middleware Developer's Guide for Oracle Data Integrator.

3.5.2 Reverse-engineer a File Model

Oracle Data Integrator provides specific methods for reverse-engineering files. File database supports four types of reverse-engineering:

Delimited Files Reverse-Engineering is performed per file datastore.
Fixed Files Reverse-engineering using the Wizard is performed per file datastore.
COBOL Copybook reverse-engineering, which is available for fixed files, if a copybook describing the file is provided. It is performed per file datastore.
Customized Reverse-Engineering, which uses a RKM (Reverse Knowledge Module) to get the structure of all of the files of the model from a Microsoft Excel spreadsheet. Note that if you use the RKM, you do not need to define manually the datastores by typing in each column definition because the RKM automatically defines the column definitions in the spreadsheet.

Note:

The built-in file driver uses metadata from the Oracle Data Integrator models (field data type or length, number of header rows, etc.). Driver-specific tags are generated by Oracle Data Integrator and passed to the driver along with regular SQL commands. These tags control how the driver reads or writes the file.

Similarly, when Oracle Data Integrator uses database loaders and utilities, it uses the model metadata to control these loaders and utilities.

It is important to pay close attention to the file definition after a reverse-engineering process, as discrepancy between the file definition and file content is a source of issues at run-time.

3.5.2.1 Delimited Files Reverse-Engineering

To perform a delimited file reverse-engineering:

In the Models accordion, right click your File Model and select New Datastore. The Datastore Editor opens.
In the Definition Tab, enter the following fields:
- Name: Name of this datastore
- Resource Name: Sub-directory (if needed) and name of the file. You can browse for the file using the (...) button
Go to the Files tab to describe the type of file. Set the fields as follows:
- File Format: Delimited
- Header (Number of Lines): Enter the number of lines of the header. Note that if there is a header, the first line of the header will be used by Oracle Data Integrator to name the columns in the file.
- Select a Record Separator.
- Select or enter the character used as a Field Separator.
- Enter a Text Delimiter if your file uses one.
- Enter a Decimal Separator if your file contains decimals.
From the File main menu, select Save.
In the Datastore Editor, go to the Columns tab.
In the editor toolbar, click Reverse-Engineer.
Verify the datatype and length for the reverse engineered columns. Oracle Data Integrator infers the fields datatypes and lengths from the file content, but may set default values (for example 50 for the strings field length) or incorrect data types in this process.
From the File main menu, select Save.

3.5.2.2 Fixed Files Reverse-engineering using the Wizard

Oracle Data Integrator provides a wizard to graphically define the columns of a fixed file.

To reverse-engineer a fixed file using the wizard:

In the Models accordion, right click your File Model and select New Datastore. The Datastore Editor opens.
In the Definition Tab, enter the following fields:
- Name: Name of this datastore
- Resource Name: Sub-directory (if needed) and name of the file. You can browse for the file using the (...) button
Go to the Files tab to describe the type of file. Set the fields as follows:
- File Format: Fixed
- Header (Number of Lines): Enter the number of lines of the header.
- Select a Record Separator.
From the File main menu, select Save.
In the Datastore Editor, go to the Columns tab.
In the editor toolbar, click Reverse-Engineer.The Columns Setup Wizard is launched. The Columns Setup Wizard displays the first records of your file.
Click on the ruler (above the file contents) to create markers delimiting the columns. You can right-click within the ruler to delete a marker.
Columns are created with pre-generated names (C1, C2, and so on). You can edit the column name by clicking in the column header line (below the ruler).
In the properties panel (on the right), you can edit all the parameters of the selected column. You should set at least the Column Name, Datatype, and Length for each column.
Click OK when the columns definition is complete.
From the File main menu, select Save.

3.5.2.3 COBOL Copybook reverse-engineering

COBOL Copybook reverse-engineering allows you to retrieve a legacy file structure from its description contained in a COBOL Copybook file.

To reverse-engineer a fixed file using a COBOL Copybook:

In the Models accordion, right click your File Model and select New Datastore. The Datastore Editor opens.
In the Definition Tab, enter the following fields:
- Name: Name of this datastore
- Resource Name: Sub-directory (if needed) and name of the file. You can browse for the file using the (...) button
Go to the Files tab to describe the type of file. Set the fields as follows:
- File Format: Fixed
- Header (Number of Lines): Enter the number of lines of the header.
- Select a Record Separator.
From the File main menu, select Save.
In the Datastore Editor, go to the Columns tab.
Create or open a File datastore that has a fixed format.
In the Datastore Editor, go to the Columns tab.
In the toolbar menu, click Reverse Engineer COBOL CopyBook.
In the Reverse Engineer Cobol CopyBook Dialog, enter the following fields:
- File: Location of the Copybook file
- Character set: Copybook file charset.
- Description format (EBCDIC | ASCII): Copybook file format
- Data format (EBCDIC | ASCII): Data file format
Click OK.

The columns described in the Copybook are reverse-engineered and appear in the column list.

Note:

If a field has a data type declared in the Copybook with no corresponding datatype in Oracle Data Integrator File technology, then this column will appear with no data type.

3.5.2.4 Customized Reverse-Engineering

In this reverse-engineering method, Oracle Data Integrator uses a Microsoft Excel spreadsheet that contains the description of the group of files. This file has a specific format. A sample file (file_repository.xls) is provided in the Oracle Data Integrator demo in the /demo/excel sub-directory.

The following steps assume that you have modified this file with the description of the structure of your flat files.

To perform a customized reverse-engineering, perform the following steps:

Create an ODBC Datasource for the Excel Spreadsheet corresponding to the Excel Spreadsheet containing the files description.
Define the Data Server, Physical and Logical Schema for the Microsoft Excel Spreadsheet
Run the customized reverse-engineering using the RKM File from Excel RKM.

Create an ODBC Datasource for the Excel Spreadsheet

Launch the Microsoft ODBC Administrator.
Add a System Datasource.
Select the Microsoft Excel Driver (*.xls) driver.
Name the data source: ODI_EXCEL_FILE_REPO and select the file /demo/excel/file_repository.xls.

Define the Data Server, Physical and Logical Schema for the Microsoft Excel Spreadsheet

In Topology Navigator, add a Microsoft Excel data server with the following parameters:
- Name: EXCEL_FILE_REPOSITORY
- JDBC Driver: sun.jdbc.odbc.JdbcOdbcDriver
- JDBC URL: jdbc:odbc:ODI_EXCEL_FILE_REPO
From the File main menu, select Save.
Add a physical schema to this data server. Leave the default values in the Definition tab.

In the Context tab of the physical schema, click Add.
In the new line, select the context that will be used for reverse engineering and enter in the logical schema column EXCEL_FILE_REPOSITORY. This name is mandatory.
From the File main menu, select Save.

Run the customized reverse-engineering

In Designer Navigator, import the RKM File From Excel Knowledge Module into your project.
In the Models accordion, double-click the File Model. The Model Editor opens.
In the Reverse-Engineer Tab, set the following parameters:
- Select Customized
- Context: Reverse Context
- Knowledge Module: RKM File from Excel
In the toolbar menu, click Reverse-Engineer.
You can follow the reverse-engineering process in the execution log

Note:

A Microsoft Excel logical schema must be defined. It must be named EXCEL_FILE_REPOSITORY and point to the file file_repository.xls or another file with a similar structure.
The Microsoft Excel file file_repository.xls should be closed before running the reverse engineering.

3.6 Designing an Interface

You can use a file as a source or a target of an integration interface, but NOT as a staging area.

The KM choice for an interface or a check determines the abilities and performances of this interface or check. The recommendations below help in the selection of the KM for different situations concerning a File data server.

3.6.1 Loading Data From Files

Files can be used as a source of an interface. The LKM choice in the Interface Flow tab to load a File to the staging area is essential for the interface performance.

The LKM File to SQL uses the built-in file driver for loading data from a File database to a staging area. In addition to this KM, you can also use KMs that are specific to the technology of the staging area or target. Such KMs support technology-specific optimizations and use methods such as loaders or external tables.

This knowledge module, as well as other KMs relying on the built-in driver, support the following two features attached to the driver:

Erroneous Records Handling
Multi-Record Files Support

Erroneous Records Handling

Oracle Data Integrator built-in driver provides error handling at column level for the File technology. When loading a File, Oracle Data Integrator performs several controls. One of them verifies if the data in the file is consistent with the datastore definition. If one value from the row is inconsistent with the column description, the On Error option - on the Control tab of the Column Editor - defines the action to perform. The On Error option can take the following values:

Reject Error: The row containing the error is moved to a .BAD file, and a reason of the error is written to a .ERROR file.

The .BAD and .ERROR files are located in the same directory as the file being read and are named after this file, with a .BAD and .ERROR extension.
Null if error (inactive trace): The row is kept in the flow and the erroneous value is replaced by null.
Null if error (active trace): The row is kept in the flow, the erroneous value is replaced by null, and an reason of the error is written to the .ERROR file.

Multi-Record Files Support

Oracle Data Integrator is able to handle files that contain multiple record formats. For example, a file may contain records representing orders (these records have 5 columns) and other records representing order lines (these records having 8 columns with different datatypes).

The approach in Oracle Data Integrator consists in considering each specific record format as a different datastore.

Example 3-1 Multi Record File

This example uses the multi record file orders.txt. It contains two different record types: orders and order lines.

Order records have the following format:

REC_CODE,ORDER_ID,CUSTOMER_ID,ORDER_DATE

Order lines records have the following format

REC_CODE,ORDER_ID,LINE_ID,PRODUCT_ID,QTY

Order records are identified by REC_CODE=ORD

Order lines are identified by REC_CODE=LIN

To handle multi record files as a source interface of an integration interface:

Create a File Model using a logical schema that points to the directory containing the source file.
Identify the different record formats and structures of the flat file. In Example 3-1 two record formats can be identified: one for the orders and one for the order lines.
For each record format identified, do the following:
1. Create a datastore in the File Model for each type of record.
  
  For Example 3-1 create two datastores.
2. In the Definition tab of the Datastore Editor, enter a unique name in the Name field and enter the flat file name in the Resource Name field. Note that the resource name is identical for all datastores of this model.
  
  For Example 3-1 you can use ORDERS and ORDER_LINES as the name of your datastores. Enter orders.txt in the Resource Name field for both datastores.
3. In the Files tab, select, depending on the format of your flat file, Fixed or Delimited from the File Format list and specify the record and field separators.
4. In the Columns tab, enter the column definitions for this record type.
5. One or more columns can be used to identify the record type. The record code is the field value content that is used as distinguishing element to be found in the file. The record code must be unique and allows files with several record patterns to be processed. In the Record Codes field, you can specify several values separated by the semicolon (;) character.
  
  In the Column Editor, assign a record code for each record type in the Record Codes field.
  
  In Example 3-1, enter ORD in the Record Codes field of the CODE_REC column of the ORDERS datastore and enter LIN in the Record Codes field of the CODE_REC column of the ORDER_LINES datastore.

With such definition, when reading data from the ORDERS datastore, the file driver will filter only those of the records where the first column contains the value ORD. The same applies to the ORDER_LINES datastore (only the records with the first column containing the value LIN will be returned).

3.6.2 Integrating Data in Files

Files can be used as a target of an interface. The data integration strategies in Files concern loading from the staging area to Files. The IKM choice in the Interface Flow tab determines the performances and possibilities for integrating.

The IKM SQL to File Append uses the file driver for integrating data into a Files target from a staging area in truncate-insert mode.

This KM has the following options:

INSERT automatically attempts to insert data into the target datastore of the interface.
CREATE_TARG_TABLE creates the target table.
TRUNCATE deletes the content of the target file and creates it if it does not exist.
GENERATE_HEADER creates the header row for a delimited file.

In addition to this KM, you can also use IKMs that are specific to the technology of the staging area. Such KMs support technology-specific optimizations and use methods such as loaders or external tables.