Skip Headers
Oracle® Warehouse Builder User's Guide
11g Release 1 (11.1)

B31278-06
Go to Documentation Home
Home
Go to Book List
Book List
Go to Table of Contents
Contents
Go to Index
Index
Go to Master Index
Master Index
Go to Feedback page
Contact Us

Go to previous page
Previous
Go to next page
Next
PDF · Mobi · ePub

8 Flat Files as Sources or Targets

You can use data from ASCII and binary flat files as sources and targets. For flat file sources, you have the additional option of creating a SQL representation of the data by using an external table.

This chapter includes the following topics:

About Flat Files

You can use flat files as either sources or targets in mappings.

Flat Files as Sources

To use a flat file as a source, first introduce the metadata into the workspace. The steps you take depend upon if the source metadata is in ASCII or binary format. For ASCII format, see "Importing ASCII Files into the Workspace". For binary format, see "Adding Existing Binary Files to the Workspace".

After you import the file, you can use it as a source in a mapping. Mappings require files to have a permanent name. However, if your source file is a generated file, that is, given a unique name each time it is generated on the main frame for example, you can provide the new name at runtime.

Importing ASCII Files into the Workspace

To import flat file definitions, complete the following steps:

  1. Establish network connectivity to the files you wish to import.

    If you are accessing the data files directly, and if the client and the data files reside on different types of operating systems, contact your system administrator to establish the required connectivity through NFS or other network protocol.

    If the client and data files reside on Windows operating systems, store the data files on any drive the client computer can access.

  2. Creating Flat File Modules

    Create a module for each folder in your file system from which you want to sample existing files.

  3. Importing Definitions from Flat Files

    Use the Import Metadata Wizard to select the flat files you wish to import. On the Connection Information page, select the option to start the Flat File Sample Wizard.

  4. Using the Flat File Sample Wizard

    The Flat File Sample Wizard enables you to view a sample of the flat file and define record organization and file properties. The wizard enables you to sample and define common flat file formats such as string and ascii.

    For files with complex record structures, the Flat File Sample Wizard may not be suitable for sampling the data. In such cases, see "Adding Existing Binary Files to the Workspace".

  5. Use the flat file as either a source or a target in a mapping.

    If you intend to use the file as a source, consider the advantages of creating an external table based on the file. For more information, see "External Table Operators versus Flat File Operators" and"Creating a New External Table Definition".

  6. Updating a File Definition

    If changes are made to the structure of the flat file, update the file definition.

Adding Existing Binary Files to the Workspace

To add existing binary flat files, complete the following steps:

  1. Creating Flat File Modules

    Create a module for each folder in your file system to which you want to create new files.

  2. Using the Create Flat File Wizard

    Follow the prompts in the wizard, taking notice of the special instructions for naming the file in "Describing a New Flat File".

  3. Use the newly created flat file as a target in a mapping.

About External Tables

An external table is a read-only table that is associated with a single record type in external data such as a flat file. External tables represent data from non-relational source in a relational table format. When you use an external table in a mapping, column properties are based on the SQL properties you defined when importing the flat file. For more information about SQL properties for flat files, see "SQL Properties" .

When you use an external table as a source in a mapping, you can use it as a regular source table. Warehouse Builder generates SQL code to select rows from the external table. You can also get parallel access to the file through the table.

Note:

Use external tables for source tables only.

You can either import an existing external table from another database or define a new external table as described in "Creating a New External Table Definition".

External Table Operators versus Flat File Operators

You can introduce data from a flat file into a mapping either through an external table or a flat file operator. To decide between the two options, consider how the data must be transformed.

When you use an external table operator, the mapping generates SQL code. If the data is to be joined with other tables or requires complex transformations, use an external table.

When you use a flat file operator, SQL*Loader code is generated. In cases where large volumes of data are to be extracted and little transformation is required, you can use the flat file operator. From the flat file operator, you could load the data to a staging table, add indexes, and perform transformations as necessary. The transformations you can perform on data introduced by a flat file operator are limited to SQL*Loader transformations only.

You can use an external table to combine the loading and transformation within a single set-based SQL DML statement. You do not have to stage the data temporarily before inserting it into the target table.

For more information about differences between external tables and SQL*Loader (flat file operators), see Oracle Database Utilities.

Flat Files as Targets

You define a new flat file to create a new target.

Creating a New Flat File as a Target

To design a new flat file, complete the following steps:

  1. Creating Flat File Modules

    Create a module for each folder in your file system to which you want to create new files.

  2. Using the Create Flat File Wizard

  3. Use the newly created flat file as a target in a mapping.

Creating Flat File Modules

You can either import or define flat files. Create a flat file modules to store definitions of flat files.

To create a flat file module:

  1. Right-click the Files node in the Project Explorer and select New.

    Warehouse Builder displays the welcome page for the Create Module Wizard.

  2. Define the module in the following steps:

    Describing the Flat File Module

    Defining Locations for Flat File Modules

  3. The Finish page summarizes the information you provided on each of the wizard pages. When you click Finish, the wizard creates the flat file module and inserts its name in the Project Explorer.

    If you checked Import after finish earlier in the wizard on the Connection Information Page, the Import Metadata Wizard described in "Importing Definitions from Flat Files" is displayed.

    Or, to define a new flat file, see "Using the Create Flat File Wizard".

Describing the Flat File Module

Type a name and an optional description for the flat file module.

Recall that you create a module for each folder in your file system from which you want to sample existing files or to which you want to create new files. Therefore, consider naming the module based on the folder name. Each file module corresponds to one folder in the file system.

For example, to import flat files from c:\folder1 and a subfolder such as c:\folder1\subfolder,create two file modules such as C_FOLDER1 and C_FODLER1_SUBFOLDER.

Defining Locations for Flat File Modules

Locations for flat file modules are the folders in the file system from which you sample existing files or to which you create new files. You can define a new location or select an existing location.

Note that a flat file location identifies a folder in the file system and does not include subfolders.

Connection Information Page

The Connection page displays with a default location named based on the module name you typed.

If you intend to import files into this module, click Edit to assign values for the location. This location becomes the folder from which you sample existing files or create new files.

Edit File System Location Dialog Box

On the Edit File System Location dialog box, enter the fully qualified directory, including the drive letter, if appropriate.

Using the Create Flat File Wizard

Use the Create Flat File Wizard to design a new flat file. The primary use of this wizard is to create a flat file for use as a target in a mapping. After you complete the wizard, you can follow the steps described in "Creating a Source or Target Based on an Existing Flat File".

Also consider using this wizard as a secondary method for introducing metadata from an existing flat file. This could be the case when the existing file has complex record formats and using the Flat File Sample wizard is not a viable solution.

The Create Flat File Wizard guides you in completing the following steps:

Describing a New Flat File

Use the Name and Description page to describe the new flat file to create. After you complete the file set up information, click Next to continue with the wizard.

  • Name: This name uniquely identifies the file in the workspace. Type a name that does not include a space or any punctuation. You can include an underscore. You can use upper and lower case letters. Do not start the name with a digit. Do not use reserved words. For a list of reserved words, see "Reserved Words" in the Warehouse Builder Online Help.

  • Default Physical File Name: If you are creating a new file, you can leave this name blank. If you are defining an existing binary file, type the name of the file. Do not include the file path as you specify this later in the wizard.

  • Character set: Character sets determine what languages can be represented in database objects and files. Select a character set or accept the default which is the character set defined for the computer hosting Warehouse Builder. For complete information about NLS character sets, see Oracle Database Globalization Support Guide.

  • Description: You can type in an optional description for the file.

Defining File Properties for a New Flat File

Use the File Properties page to specify Record Organization, Logical Record Definition, Number of Rows to Skip, and the Field Format for the new flat file.

Record Organization

Indicate how to organize the records in the file. Select between the two options to indicate how the length of each record in the file is determined:

  • Records delimited by: Select this option to designate the end of each record by a delimiter. Then specify that record delimiter. You can accept the default record delimiter, new line (\n), or you can type in a new value.

  • Record length (in characters): Select this option to create a file with all records having the same length. Then specify the number of characters in each record. For files with multibyte characters, count a multibyte character as one character.

Logical Record Definition

By default, the wizard creates a file in which each physical record corresponds to one logical record. You can override the default to create a file composed of logical records that correspond to multiple physical records.

  • Number of physical records for each logical record: The data file contains a fixed number of physical records for each logical record.

    PHYSICAL_RECORD1
    PHYSICAL_RECORD2
    PHYSICAL_RECORD3
    PHYSICAL_RECORD4
    

    In the preceding example, if the number of physical records for each logical record is 2, then PHYSICAL_RECORD1 and PHYSICAL_RECORD2 form one logical record and PHYSICAL_RECORD3 and PHYSICAL_RECORD4 form a second logical record.

  • End character of the current physical record: The data file contains a variable number of physical records with a continuation character at the end that signifies that the record is associated with the next physical record.

    In the following example, the continuation character is a percentage sign (%) at the end of the record.

    PHYSICAL_RECORD1%
    PHYSICAL_RECORD2       end log rec 1
    PHYSICAL_RECORD3%
    PHYSICAL_RECORD4       end log rec 2
    
  • Start character of the next physical record: The data file contains a variable number of physical records with a continuation character at the beginning of each physical record that signifies that the record is associated with the previous physical record.

    The following example shows two logical records with a continuation character at beginning of the record.

    PHYSICAL_RECORD1
    %PHYSICAL_RECORD2      end log rec1
    PHYSICAL_RECORD3
    %PHYSICAL_RECORD4      end log rec 2
    

    More than two records can be joined with this technique. The following example shows four physical records for each logical record using continuation at beginning.

    PHYSICAL_RECORD1
    %PHYSICAL_RECORD2
    %PHYSICAL_RECORD25
    %PHYSICAL_RECORD26   (end log record 1)
    PHYSICAL_RECORD3
    %PHYSICAL_RECORD4
    %PHYSICAL_RECORD45
    %PHYSICAL_RECORD46 (end log record 2)
    

Number of Rows to Skip

When creating a new file, you can leave this value blank.

When defining an existing file, indicate the number of records to skip at execution time in Skip rows. This is useful for skipping over unwanted header information. If one of the records includes field names, skip the preceding header records so that the record containing field names is first in the file.

Field Format

Select between Fixed Length and Delimited formats for the file.

To create a delimited file, specify the following properties:

  • Field delimiter: Field delimiters designate where one field ends and another begins. You can type in a field delimiter or select one from the list. The list displays common field delimiters. However, you may type in any character as a delimiter except the ones used for enclosures. The default is the comma (,).

  • Enclosures (Left and Right): Some delimited files contain enclosures that denote text strings within a field. If the file contains enclosures, enter an enclosure character in the text box or select one from the list. The list displays common enclosures. However, you may type in any character. The default for both the left and right enclosure is the double quotation mark (").

Defining the Record Type for a New Flat File

Indicate whether the file you create is to contain a single record type or multiple record types. If the file should contain only one type of record, select Single Record.

If the file should contain more than one record type, select Multi Record. Use multiple record types to create a flat file with more than one record type. For each record type you want to create, specify values under Record Type Location and then its type value and record name.

Valid entries for Record Type Location depend the field format you selected on the File Properties page, fixed length or delimited fields. For example, if you specified the fields are delimited, indicate the start position and length for the record type. For fixed-length files, indicate the field position.

Defining Field Properties for a New Flat File

Use the Field Properties page to define properties for each field.

Since you can use a flat file in a mapping either directly as a flat file source or indirectly via an external table, the Field Properties page shows both SQL*Loader Properties and SQL Properties. Use the scroll bar to scroll to the right and view all the properties.

The wizard deactivates properties that do not apply to a given data type. For example, you can edit the Length for a CHAR, but precision and scale are not available. Deactivated properties are grayed out.

SQL*Loader Properties

The first set of properties the wizard displays are for the SQL*Loader utility. When you use the flat file directly as a source in a mapping, SQL*Loader and the properties you set here are used. SQL*Loader properties include Type, Start, End, Length, Precision, Scale, Mask, NULLIF, and DEFAULTIF.

By default, the wizard displays these properties as they appear in the source file. Edit these properties or accept the defaults to specify how the fields should be handled by the SQL*Loader.

Type

Describes the data type of the field for the SQL*Loader. You can use the wizard to import many data types such as CHAR, DATE, DECIMAL EXTERNAL, FLOAT EXTERNAL, INTEGER EXTERNAL, ZONED, and ZONED EXTERNAL. For complete information about SQL*Loader field and data types, see Oracle Database Utilities. Currently, only portable data types are supported.

Start

In fixed length files, indicates the field start position.

End

In fixed length files, indicates the field end position.

Length

For delimited files, specifies the maximum field length to be used by SQL* Loader.

Precision

Defines precision for DECIMAL and ZONED data types.

Scale

Defines the scale for DECIMAL and ZONED data types.

Mask

The SQL*Loader uses dd-mon-yy as its default date mask. You can override this default by entering a valid date mask when you describe the file. For example, if the input data has the format DD-Mon-YYYY rather than the SQL*Loader default, you can enter the true format as a mask.

NULLIF

You can override the default action of the SQL*Loader by placing a NULLIF condition on a field. For example, when a character field contains all blanks, you can direct SQL*Loader to mark the field as null rather than storing the blanks. Valid syntax for this field includes =BLANKS, ='quoted string', =X'ff' to indicate hexadecimal values, and != for 'not equal to' logic.

DEFAULTIF

You can override the default action of the SQL*Loader by placing a DEFAULTIF condition on a field. For example, when a numeric or DATE field contains all blanks, SQL*Loader rejects the entire record. To override this action, type =BLANKS in the DEFAULTIF property. When SQL*Loader evaluates this condition, it sets the numeric field to zeros and loads the record. Valid syntax for this field includes =BLANKS, ='quoted string', =X'ff' to indicate hexadecimal values, and != for 'not equal to' logic.

SQL Properties

The second set of properties are the SQL properties that include SQL Type, SQL Length, SQL Precision, and SQL Scale. These properties specify how the fields in a flat file translate to the columns in a relational table for example.

The SQL properties you set here have the following implications for mapping design, validation, and generation:

  • External table: If you create an external table based on a single flat file record type, the columns properties are based on the SQL properties you defined for the flat file. For more information about external tables, see "Using External Tables".

  • Populating an Empty Mapping Table. In a mapping, if you populate an empty relational table with the metadata, the table inherits the SQL properties you defined for the flat file source.

  • Flat file target: If you use the flat file as a target in a mapping, the target does not inherit the SQL properties. Instead, all fields inherit the default SQL*Loader data type. For more information about using a flat file operator as a target, see "Flat File Operator".

SQL Type

Warehouse Builder supports many SQL data types such as CHAR, DATE, FLOAT, and BLOB.

The wizard assigns a default value for the SQL type based on the SQL*Loader properties you set. If you accept the default SQL type, the type is updated if you later change the SQL*Loader properties. However, if you override the SQL type by selecting a new SQL type from the list, it then becomes independent of the flat file SQL*Loader data type.

SQL Length

This property defines the length for the SQL column.

SQL Precision

When the SQL type is for example NUMBER and FLOAT, this property defines the precision for the SQL column.

SQL Scale

When the SQL type is for example NUMBER and FLOAT, this property defines the scale for the SQL column.

Importing Definitions from Flat Files

If you have existing flat files to use as sources, you can import and then sample the metadata from these flat files. Use the Import Metadata Wizard to import metadata from flat files. This metadata must be imported into a file module.

To import flat file definitions:

  1. Establish network connectivity to the files you wish to import.

    If you are accessing the data files directly, and if the client and the data files reside on different types of operating systems, contact your system administrator to establish the required connectivity through NFS or other network protocol.

    If the client and data files reside on Windows operating systems, store the data files on any drive the client computer can access.

  2. Create a flat file module that will contain the imported flat file definitions.

    Create a module for each folder in your file system from which you want to import files.

    When you create a flat file module, the location corresponding to this module is a folder in the file system from which metadata is being imported. Use the Connection Information Page of the Create Module Wizard to specify the folder that contains the source metadata.

    Note that a flat file location does not include subfolders of the specified folder.

  3. Right-click the flat file module and select Import.

    The Import Metadata Wizard is displayed.

  4. On the Filter Information page, filter file names by selecting one of the following options:

    All Data Files: This option returns all the data files available for the directory you specified for the flat file module.

    Data files matching this pattern: Use this option to select only data files that match the pattern you type. For example, if you select this option and enter (*.dat), only files with .dat file extensions will be displayed on the next wizard page. If you type % as part of a filter string, it is interpreted as a wild card match for multiple characters. If you type '_' as part of a filter string, it is interpreted as a wild card match for a single character.

  5. On the Object Selection page, move the names of the files to be imported from Available Objects on the left to the Selected Objects section on the right.

    Because inbound synchronization for flat files is not permitted, the available objects will never appear in bold like other objects when they are reimported. When you reimport flat files, you always need to sample the flat file objects again.

  6. On the Summary and Import page, ensure that metadata for the selected flat files is available in the workspace. You cannot complete the import if the metadata is not present.

    If the Status field contains a red x, metadata is not available in the workspace. For all such files, either select a file with a matching format in the workspace or sample the file.

    Use the Same As field to select a file with a matching format.

    To sample a file, select the file and click Sample. The Flat File Sample Wizard is launches. The Flat File Sample Wizard enables you to view a sample of the flat file and define record organization and file properties. You can sample and define common flat file formats such as string and ascii.

    For files with complex record structures, the Flat File Sample Wizard may not be suitable for sampling the data. In such cases, see "Adding Existing Binary Files to the Workspace".

  7. Once you provide metadata information for all files you want to import, click Finish.

    The wizard creates definitions for the files, stores the definitions in the flat file module, and inserts the file names under the flat file module in the Project Explorer.

Using the Flat File Sample Wizard

Each time you use the Import Metadata Wizard to sample data from existing flat files, you have the option to start the Flat File Sample Wizard.

Use the Flat File Sample Wizard as an aid in defining metadata from flat files.

Note that although this wizard samples delimited and fixed length files, the Flat File Sample Wizard does not sample multi-byte character file with a fixed record format. For these and other complex record formats such as binary files, consider "Using the Create Flat File Wizard".

After you complete the Flat File Sample Wizard, it stores the metadata in the workspace and you can use the flat files as sources or targets in a mapping as described in "Creating a Source or Target Based on an Existing Flat File".

Describing the Flat File

Use the Name page to describe the flat file you are sampling.

  • Name: This name uniquely identifies the file in the workspace. By default, the wizard creates a name based on the name of the source file by replacing invalid characters with an underscore. For example, if the file name is myfile.dat, the wizard assign the workspace name myfile_dat.

    If you rename the file, do not include a space or any punctuation in the name. You can include an underscore. You can use upper and lower case letters. Do not start the name with a digit. Do not use reserved words. For a list of reserved words, see "Reserved Words" in the Warehouse Builder Online Help.

  • Description: You can type in an optional description for the file.

  • Character set: Character sets determine what languages can be represented in database objects and files. The default Globalization Support or National Language Support (NLS) character set matches the character set defined for the computer hosting Warehouse Builder. If the character set differs from that of the source file, the data sample might appear unintelligible. You can display the data sample in the character set native to the source by selecting it from the list. For complete information about NLS character sets, see Oracle Database Globalization Support Guide.

  • Number of characters to sample: You can indicate the number of characters for the wizard to sample from the data file. By default, the wizard samples the first 10,000 characters. To determine an optimum value for this field, see "Example: Flat File with Multiple Record Types".

After you complete the file set up information, click Next to continue with the wizard.

Selecting the Record Organization

Use the Record Organization page to indicate how records are organized in the file you are sampling.

Figure 8-1 Record Organization Page for the Flat File Sample Wizard

This illustration is described in the surrounding text.
Description of "Figure 8-1 Record Organization Page for the Flat File Sample Wizard"

Select between the two options to indicate how the length of each record in the file is determined:

  • Records delimited by: Select this option if the end of each record is designated by a delimiter. Then specify that record delimiter. You can accept the default record delimiter, carriage return (<CR>), or you can type in a new value.

  • Record length (in characters): Select this option if each record in the file is the same length. Then specify the number of characters in each record. For files with multibyte characters, count a multibyte character as one character.

Specifying Logical Records

The Flat File Sample Wizard enables you to sample files composed of logical records that correspond to multiple physical records. If the file contains logical records, click File contains logical records. Then select one of the options to describe the file.

The wizard updates the display of the logical record in the lower panel to reflect your selection. The default selection is one physical record for each logical record.

After you complete the logical record information, click Next to continue with the wizard.

  • Number of physical records for each logical record: The data file contains a fixed number of physical records for each logical record.

    PHYSICAL_RECORD1
    PHYSICAL_RECORD2
    PHYSICAL_RECORD3
    PHYSICAL_RECORD4
    

    In the preceding example, if the number of physical records for each logical record is 2, then PHYSICAL_RECORD1 and PHYSICAL_RECORD2 form one logical record and PHYSICAL_RECORD3 and PHYSICAL_RECORD4 form a second logical record.

  • End character of the current physical record: The data file contains a variable number of physical records with a continuation character at the end that signifies that the record is associated with the next physical record.

    In the following example, the continuation character is a percentage sign (%) at the end of the record.

    PHYSICAL_RECORD1%
    PHYSICAL_RECORD2       end log rec 1
    PHYSICAL_RECORD3%
    PHYSICAL_RECORD4       end log rec 2
    
  • Start character of the next physical record: The data file contains a variable number of physical records with a continuation character at the beginning of each physical record that signifies that the record is associated with the previous physical record.

    The following example shows two logical records with a continuation character at beginning of the record.

    PHYSICAL_RECORD1
    %PHYSICAL_RECORD2      end log rec1
    PHYSICAL_RECORD3
    %PHYSICAL_RECORD4      end log rec 2
    

    More than two records can be joined with this technique. The following example shows four physical records for each logical record using continuation at beginning.

    PHYSICAL_RECORD1
    %PHYSICAL_RECORD2
    %PHYSICAL_RECORD25
    %PHYSICAL_RECORD26   (end log record 1)
    PHYSICAL_RECORD3
    %PHYSICAL_RECORD4
    %PHYSICAL_RECORD45
    %PHYSICAL_RECORD46 (end log record 2)
    

Selecting the File Format

Use the File Format page to select between Fixed Length and Delimited formats for the file. The Flat File Sample Wizard does not sample multi-byte character file with a fixed record format. Instead, consider "Using the Create Flat File Wizard".

Figure 8-2 File Format Page for the Flat File Sample Wizard

This illustration is described in the surrounding text.
Description of "Figure 8-2 File Format Page for the Flat File Sample Wizard"

When you select a file format, the wizard updates the sample displayed at the bottom of the wizard page. You can use the scroll bars to navigate the sample data.

When your file is delimited, specify the following properties:

  • Field delimiter: Field delimiters designate where one field ends and another begins. You can type in a field delimiter or select one from the list. The list displays common field delimiters. However, you may type in any character as a delimiter except the ones used for enclosures. The default is the comma (,).

  • Enclosures (Left and Right): Some delimited files contain enclosures that denote text strings within a field. If the file contains enclosures, enter an enclosure character in the text box or select one from the list. The list displays common enclosures. However, you may type in any character. The default for both the left and right enclosure is the double quotation mark (").

Selecting the File Layout

Use the File Layout page to specify the number of rows to skip and to select between a single record type versus multiple record types.

Figure 8-3 File Layout Page for the Flat File Sample Wizard

This illustration is described in the surrounding text.
Description of "Figure 8-3 File Layout Page for the Flat File Sample Wizard"

Indicate the number of records to skip in Skip rows. This is useful for skipping over unwanted header information. If one of the records includes field names, skip the preceding header records so that the record containing field names is first in the file. Later in the wizard, on the Field Properties page, you can instruct the wizard to use that record for field names if you are defining a single record file type.

Indicate whether the file contains a single record type or multiple record types.

  • If the file contains only one type of record, select Single.

  • If the file contains more than one record type, select Multiple. Later in the wizard you can instruct the wizard to scan the file for the record types. For more information about multiple record types, see "Selecting Record Types (Multiple Record Type Files Only)".

Selecting Record Types (Multiple Record Type Files Only)

Use the Record Types wizard page to scan the flat file for record types, add or delete record types, and assign type values to the record types.

Note:

This step in not necessary for files with a single record type. If the data file has a single record type and fixed length file format, proceed to "Specifying Field Lengths (Fixed-Length Files Only)" . If the data file has a single record type and delimited file format, proceed to "Specifying Field Properties".

Example: Flat File with Multiple Record Types

In files with multiple record types, one of the fields distinguishes one record type from the next. When you use the Flat File Sample Wizard, you instruct the wizard to scan a specified field of every record for the record type values.

Figure 8-4 shows an example of a comma delimited file with two record types, "E" and "P". In this case, instruct the wizard to scan the first field. The wizard returns "E" and "P" as the type values.

Figure 8-4 Example of a File with Multiple Record Types

This illustration is described in the surrounding text.

When you use the wizard to sample flat files with multiple record types, ensure that the sample size you specified on the Name page is large enough to include each record type at least once. The default is 10,0000 characters.

Because sampling cannot be canceled after it has been started, make sure you pick a "reasonable" number of characters for optimum performance. If all record types do not appear within a reasonable number of characters, you can mock up a sample file with rows selected from different parts of the master file to provide a representative set of data. If you do not know your data well, you may choose sample the entire file. If you know your data well, you can scan a representative sample and then manually add new record types.

Defining Multiple Record Organization in a Delimited File

When a delimited flat file contains several different types of records, you can use the scanning feature within the Flat File Sample Wizard to search and label record types.

Figure 8-5 Record Types Page for Delimited Files

This illustration is described in the surrounding text.
Description of "Figure 8-5 Record Types Page for Delimited Files"

To complete the Records Type page for a delimited file:

  1. Select the one field that identifies the record types in the file.

    The wizard displays all the fields in a sample in the lower panel of the page. Select the field from the sample box. Or, in Field position, you can type in the position as it appears in the sample. Unless you specify otherwise, the wizard defaults to the first field in the file.

    The wizard scans the file for the field and displays the type values. The wizard assigns default record names (RECORD1, RECORD2...) to each type value.

  2. You can edit the record names.

    Click a record name to rename it or select a different record name from the list. You can associate a record name with multiple record type values. You can also add or delete type values using the New and Delete buttons.

  3. Click Next to continue with the wizard.

Defining Multiple Record Organization in a Fixed-Length File

When a fixed-length flat file contains several different types of records, you can use the scanning feature within the Flat File Sample Wizard to search for record types and assign a type value to each record type.

Figure 8-6 Record Types Page for Fixed Length Files

This illustration is described in the surrounding text.
Description of "Figure 8-6 Record Types Page for Fixed Length Files"

To complete the Records Type page for a fixed-length file:

  1. Specify the one field that identifies the record types in the file. Type in values for the Start position and End position. If you want to scan for records based on the first field, enter 0 for Start Position.

    The wizard indicates the selected field with a red tick mark in the ruler in the file sample in the lower panel of the page.

  2. Click Scan.

    The wizard scans the file field and displays the type values. The wizard assigns default record names (RECORD1, RECORD2...) to each type value.

  3. You can edit the record names.

    Click a record name to rename it or select a different record name from the list. You can associate a record name with multiple record type values. You can also add or delete type values using the New and Delete buttons.

  4. Click Next to continue with the wizard.

Specifying Field Lengths (Fixed-Length Files Only)

When you use the Flat File Sample Wizard to define a fixed-length flat file, you also need to define the length of each field in the file.

Note:

This step is not necessary for delimited files. Proceed to "Specifying Field Properties".

You can define field lengths by typing in the field lengths or by using the ruler.

Figure 8-7 Field Lengths Page for the Flat File Sample Wizard

This illustration is described in the surrounding text.
Description of "Figure 8-7 Field Lengths Page for the Flat File Sample Wizard"

If you know the length of each field, type in the field length in Field Lengths. Separate each length by commas. The wizard displays the changes to the sample at the bottom of the wizard page.

To use the ruler, click any number or hash mark on the ruler. The wizard displays a red tick mark on top of the ruler and marks the boundary with a red line. If you make a mistake, double-click the marker to delete it or move the marker to another position. Use the ruler to create markers for each field in the file.

Note that when you specify a field length using the ruler, your tick markers indicate the starting and ending borders for each field. From this information, the positions occupied by each field is determined. For example, a three-character field occupying positions 6, 7, and 8 is internally identified with the beginning and ending values of '5,8'.

Specifying Field Lengths for Multiple Record Files

You can select the record type by name from Record Name. Or, you can select Next Record Type from the lower right corner of the wizard page. The number of records with unspecified field lengths is indicated on the lower left corner of the wizard page.

If the flat file contains multiple record types, the wizard prompts you to specify field lengths for each record type before continuing.

Figure 8-8 Field Lengths for Multiple Record Files page

This illustration is described in the surrounding text.
Description of "Figure 8-8 Field Lengths for Multiple Record Files page"

Specifying Field Properties

Use the Field Properties page in the Flat File Sample Wizard to define properties for each field. The wizard assigns a name to each field. It assigns 'C1'to the first field, 'C2' to the second, and so on. To rename fields, click a field and type a new name.

For single record file types, you can instruct the wizard to use the first record in the file to name the fields. Indicate this by checking the box entitled Use the first record as the field names. If you select this option, all the field data type attributes default to CHAR.

Since you can use a flat file in a mapping either directly as a flat file source or indirectly via an external table, the Field Properties page shows both SQL*Loader Properties and SQL Properties. Use the scroll bar to scroll to the right and view all the properties.

The wizard deactivates properties that do not apply to a given data type. For example, you can edit the length for a CHAR, but precision and scale are not available. Deactivated properties are grayed out.

Note:

Once you complete the Field Properties page, verify your selections on the Summary page and select Finish. The Flat File Sample Wizard returns you to the Import Metadata Wizard described in "Importing Definitions from Flat Files". You can select more files to sample or select Finish to begin the import.

Figure 8-9 Field Properties Page for the Flat File Sample Wizard

This illustration is described in the surrounding text.
Description of "Figure 8-9 Field Properties Page for the Flat File Sample Wizard"

SQL*Loader Properties

The first set of properties the wizard displays are for the SQL*Loader utility. When you use the flat file directly as a source in a mapping, SQL*Loader and the properties you set here are used. SQL*Loader properties include Type, Start, End, Length, Precision, Scale, Mask, NULLIF, and DEFAULTIF.

By default, the wizard displays these properties as they appear in the source file. Edit these properties or accept the defaults to specify how the fields should be handled by the SQL*Loader.

Type

Describes the data type of the field for the SQL*Loader. You can use the wizard to import many data types such as CHAR, DATE, DECIMAL EXTERNAL, FLOAT EXTERNAL, INTEGER EXTERNAL, ZONED, and ZONED EXTERNAL. For complete information about SQL*Loader field and data types, see Oracle Database Utilities. Currently, only portable data types are supported.

Start

In fixed length files, indicates the field start position.

End

In fixed length files, indicates the field end position.

Length

Specifies the length of the field to be used by SQL* Loader. For delimited files, the field length is not populated, but you can manually edit it if you know the maximum length of the field.

Precision

Defines precision for DECIMAL and ZONED data types.

Scale

Defines the scale for DECIMAL and ZONED data types.

Mask

The SQL*Loader uses dd-mon-yy as its default date mask. You can override this default by entering a valid date mask when you describe the file. For example, if the input data has the format DD-Mon-YYYY rather than the SQL*Loader default, you can enter the true format as a mask.

NULLIF

You can override the default action of the SQL*Loader by placing a NULLIF condition on a field. For example, when a character field contains all blanks, you can direct SQL*Loader to mark the field as null rather than storing the blanks. Valid syntax for this field includes =BLANKS, ='quoted string', =X'ff' to indicate hexadecimal values, and != for 'not equal to' logic.

DEFAULTIF

You can override the default action of the SQL*Loader by placing a DEFAULTIF condition on a field. For example, when a numeric or DATE field contains all blanks, SQL*Loader rejects the entire record. To override this action, type =BLANKS in the DEFAULTIF property. When SQL*Loader evaluates this condition, it sets the numeric field to zeros and loads the record. Valid syntax for this field includes =BLANKS, ='quoted string', =X'ff' to indicate hexadecimal values, and != for 'not equal to' logic.

SQL Properties

The second set of properties are the SQL properties that include SQL Type, SQL Length, SQL Precision, and SQL Scale. These properties specify how the fields in a flat file translate to the columns in a relational table for example.

The SQL properties you set here have the following implications for mapping design, validation, and generation:

  • External table: If you create an external table based on a single flat file record type, the columns properties are based on the SQL properties you defined for the flat file. For more information about external tables, see "Using External Tables".

  • Populating an Empty Mapping Table. In a mapping, if you populate an empty relational table with the metadata, the table inherits the SQL properties you defined for the flat file source.

  • Flat file target: If you use the flat file as a target in a mapping, the target does not inherit the SQL properties. Instead, all fields inherit the default SQL*Loader data type. For more information about using a flat file operator as a target, see "Flat File Operator".

SQL Type

Warehouse Builder supports many SQL data types such as CHAR, DATE, FLOAT, and BLOB.

The wizard assigns a default value for the SQL type based on the SQL*Loader properties you set. If you accept the default SQL type, the default is updated in the event you later change the SQL*Loader properties. However, if you override the SQL type by selecting a new SQL type from the list, it then becomes independent of the flat file SQL*Loader data type.

SQL Length

This property defines the length for the SQL column.

SQL Precision

When the SQL type is for example NUMBER and FLOAT, this property defines the precision for the SQL column.

SQL Scale

When the SQL type is for example NUMBER and FLOAT, this property defines the scale for the SQL column.

Updating a File Definition

You can update the definition of the file format by editing its property sheet.

To update a file definition:

  1. Select the file definition in the Project Explorer.

  2. Right-click the file name and select Properties.

    Warehouse Builder displays the Flat File property sheet with the following tabs:

    Name Tab: Use this tab to edit the name and descriptive for the file.

    General Tab: Use this tab change the global properties, such as the physical record size, the number of physical records for each logical record, and the delimiter and enclosure characters.

    Record Tab: This tab is available only for flat files with multiple record types. Use this tab to redefine fields or add, delete, or edit record types.

    Structure Tab: Use this tab to edit field level attributes, SQL Loader and SQL Properties.

Name Tab

Use this tab to edit the name, default physical file name, description, and character set for the file.

General Tab

Use this tab to change the global properties, such as the physical record size, the number of physical records for each logical record, and the delimiter and enclosure characters.

For delimited records, the General tab contains the following fields:

  • Records delimited by: Select this option if the end of each record is designated by a delimiter. Then specify that record delimiter. You can accept the default record delimiter, carriage return (<CR>), or you can type in a new value.

  • Record length (in characters): Select this option if each record in the file is the same length. Then specify the number of characters in each record. For files with multibyte characters, count a multibyte character as one character.

If the file contains logical records, click File contains logical records. Then select one of the following options to describe the file:

  • Number of physical records for each logical record: The data file contains a fixed number of physical records for each logical record.

  • End character of the current physical record: The data file contains a variable number of physical records with a continuation character at the end that signifies that the record is associated with the next physical record.

  • Start character of the next physical record: The data file contains a variable number of physical records with a continuation character at the beginning of each physical record that signifies that the record is associated with the previous physical record.

Record Tab

If the file contains multiple record types, you can select each record type from the Record Name field. Warehouse Builder displays the Record sheet and you can edit the record type information.

Record type is in column number: This field displays the column that contains the record type indicator. You can change this value. For example, if you have a flat file with two record types that are distinguished from each other by the first character in the third column as shown in the following list, then the value in this field is 3:

  • Record Type 1: 2002 0115 E 4564564

  • Record Type 2: 2003 1231 D 659871 Q HKLIH

Record type values: This table displays each record type, the value that distinguishes it from the other record types, and the name you have given to the record type. Table 8-1 shows an example of what the record type values for the two sample records earlier might be:

Table 8-1 Example of Record Type Values

Type Value Record Name

E

Employee

D

Department


  • To add new record types, click New and enter a Type Value and a Record Name describing the record type.

  • To delete record types, select the check box to the left of each record type you want to remove and click Delete.

After you have identified and defined the sources for our target system, you are ready to model your target schema.

Structure Tab

Use the Structure tab to edit a field name, data type, mask, SQL*Loader Properties and SQL Properties. You can add or delete a field. You can also add a field mask, NULLIF condition, or DEFAULTIF condition.

If the file contains multiple record types, you can select each record type from the Record Name field. Warehouse Builder displays the Record sheet and you can edit the record type information.

The wizard deactivates properties that do not apply to a given data type. For example, you can edit the length for a CHAR, but precision and scale are not available.

Using External Tables

External tables are database objects in the Oracle Database, versions 9i and higher. You cannot use external tables with any other database type or any Oracle Database previous to the 9i release.

External tables are tables that represent data from flat files in a relational format. They are read-only tables that act like regular source tables. When you create and define an external table, the metadata for the external table is saved in the workspace. You can use these external table definitions in mappings to design how you want to move and transform data from flat file sources to your targets.

The following sections provide information about external tables:

Creating a New External Table Definition

Before you begin

Each external table you create corresponds to a single record type in an existing flat file. Before you begin, first define the file within the workspace by one of two methods described in "Importing ASCII Files into the Workspace" and "Adding Existing Binary Files to the Workspace".

To create a new external table definition:

  1. From the Project Explorer expand the Databases node and then the Oracle node.

  2. Expand the target module where you want to create the external table.

  3. Right-click the External Tables node and select New.

    Warehouse Builder displays the Welcome page of the Create External Table wizard. Use the wizard to complete the following pages:

Name Page

Use the Name page to define a name and an optional description for the external table. Enter the name in the Name field. In physical mode, you must type a name between 1 and 30 valid characters. Spaces are not allowed. In logical mode, you can type a unique name up to 4000 characters in length. The external table name must be unique within the table. Spaces are allowed.

Use the Description field to enter an optional description for the external table.

File Selection Page

The wizard displays the File Selection page. The wizard lists all the flat files available in the workspace. Select a file upon which to base the external table. To search through long lists of files, type the first few letters of the file name and click Find.

If you cannot find a file, make sure you either imported or defined the metadata for the file as described in "About Flat Files".

If you select a file that contains multiple record types, you must also select the record type name at the bottom of the File Selection page. An external table can represent only one record type.

Locations Page

You can select a location from the list which lists the locations associated with flat files. Alternatively, you can leave the location unspecified. If you do not specify a location in the wizard, you can later specify a location on the external table properties sheet.

Tip:

You must create and deploy a connector between the locations for the flat file and the Oracle module before you can deploy the external table.

Synchronizing an External Table Definition with a Record in a File

Warehouse Builder enables you to update the external table definition with the metadata changes made to the file associated with the external table. You do this by synchronizing the external table with the source file.

To synchronize an external table definition with a record in a file:

  1. In the Project Explorer, right-click the external table that you want to synchronize and select Synchronize.

    Warehouse Builder displays the Reconcile dialog box.

  2. Use the Select the Object to synchronize list to specify the flat file with which the external table is to be synchronized.

    By default, the flat file that was used to create the external table is displayed in this list. Expand the list to see a list of flat file modules and the flat files they contain.

  3. Click Advanced.

    Warehouse Builder displays the Advanced dialog box.

  4. Use the Match Strategy list to specify how the search is performed for matches and the external table with the information from the flat file is updated. The options for match strategy are:

    MATCH_BY_OBJECT_ID: This strategy compares the field IDs of that the external table columns references with the field IDs in the flat file.

    MATCH_BY_OBJECT_NAME: This strategy compares the physical names in the external table with the physical names in the flat file.

    MATCH_BY_OBJECT_POSITION: This strategy matches by position, regardless of physical names and IDs. The first external table attribute is reconciled with the first record in the file, the second with the second, and so on. Use this strategy when you want to reconcile the external table with a new record.

  5. Use the Synchronize Strategy list to indicate how differences in metadata between the existing external table definition and the record you specified are handled.

    Merge: The metadata from the existing external table definition and the record you specified is combined.

    Replace: The metadata is deleted from the external table definition if it does not match the metadata from the record you specified. The resulting reconciled external table contains metadata that matches the file record metadata.

  6. Click OK.

    The Advanced dialog box is closed and you return to the Reconcile dialog box.

  7. Click OK to complete synchronizing the external table definition.

Editing External Table Definitions

Use the External Table editor to edit an external table definition. To open the editor, right-click the name of the external table from the Project Explorer and select Open Editor. The Edit External Table dialog box is displayed. The tabs and properties that you can edit depend on how you defined the external table in the workspace.

The External Table Properties window displays with the following tabs:

Name Tab

Use the Name tab to rename the external table. The same rules for renaming tables apply to external tables. For more information, see "Naming Conventions for Data Objects".

Columns Tab

Use the Columns tab to add or edit columns. The same rules for adding columns to tables apply to external tables. For more information, see "Editing Table Definitions".

File Tab

Use the File tab to view the name of the flat file that provides the metadata for the external table. If the source flat file has multiple record types, the File tab also displays the record name associated with the external table. You can update this relationship or change it to a different file and record by reconciling the external table. For more information, see "Synchronizing an External Table Definition with a Record in a File".

The File tab displays under the following conditions:

  • You used the New External Table Wizard to create the external table and you specified a file name.

  • You did not specify a file name in the New External Table Wizard, but you reconciled the external table definition with a file and record.

Locations Tab

Use the Location tab to view or change the flat file location. The Location list displays the available locations. Select a location from this list.

Data Rules Tab

Use the Data Rules tab to define data rules for the external table. For more information about using data rules, see "Using Data Rules".

Access Parameters Tab

Access parameters define how to read from the flat file. In some cases, the External Table editor displays the Access Parameters tab instead of the File tab.

The tab for the access parameters displays under the following conditions:

  • You imported an external table from another workspace. In this case, you can view and edit the access parameters.

  • You created an external table in an Oracle Database and imported its definition. In this case, you can view and edit the access parameters.

  • You used the Create External Table Wizard to create an external table and did not specify a reference file. The access parameters will be empty. Before generating the external table, you must reconcile the external table definition with a flat file record or manually enter the access parameters into the properties sheet.

The access parameters describe how fields in the source data file are represented in the external table as columns. For example, if the data file contained a field emp_id with a data type of INTEGER(2), the access parameters could indicate that field be converted to a character string column in the external table.

Although you can make changes to the access parameters that affect how the external table is generated and deployed, it is not recommended. Warehouse Builder does not validate the changes. For more information about the access parameters clause, see Oracle Database Utilities Manual.

Configuring External Tables

Configure the following properties for an external table:

To configure the physical properties for an external table:

  1. Select an external table from the Project Explorer.

  2. From the Edit menu, select Configure. You can also click the Configure icon from the toolbar.

    The Configuration Property window is displayed.

  3. To configure a property, click the white space and make a selection from the list.

Access Specification

If you imported the external table into the workspace or created the external table without specifying a source file, do not configure these properties. Access specification properties are overruled by settings on the Access Parameters tab in the External Table Properties window.

Under Access Specification, you can indicate the following file names and locations Warehouse Builder uses to load the external table through SQL*Loader.

  • Bad File: If you specify a name and location for a bad file, the Oracle Database is directed to write to that file all records that were not loaded due to errors. For example, records written to the bad file include those not loaded due to a data type error in converting a field into a column in the external table. If you specify a bad file that already exists, the existing file is overwritten.

  • Discard File: If you specify a name and location for a discard file, the Oracle Database is directed to write to that file all records that were not loaded based on a SQL *Loader load condition placed on the file. If you specify a discard file that already exists, the existing file is overwritten.

  • Log File: If you specify a name and location for a log file, the Oracle Database is directed to log messages related to the external table to that file. If you specify a log file that already exists, new messages are appended.

For each of these files, you can either specify a file name and location, select Do not use, or select Use default location.

Reject

Under Reject, you can indicate how many rejected rows to allow. By default, the number of rejected rows allowed is unlimited. If you set Rejects are unlimited to false, enter a number in Number of rejects allowed.

Parallel

Parallel: Enables parallel processing. If you are using a single system, set the value to NONPARALLEL to improve performance. If you are using multiple systems, accept the default PARALLEL.The access driver attempts to divide data files into chunks that can be processed separately. The following file, record, and data characteristics make it impossible for a file to be processed in parallel:

  • Sequential data sources (such as a tape drive or pipe).

  • Data in any multibyte character set whose character boundaries cannot be determined starting at an arbitrary byte in the middle of a string. This restriction does not apply to any data file with a fixed number of bytes for each record.

  • Records with the VAR format

Data Characteristics

If you imported the external table into the workspace or created the external table without specifying a source file, do not configure these properties. Data characteristics properties are overruled by settings on the Access Parameters tab in the External Table Properties window.

Under Data Characteristics you can set the following properties:

  • Endian: The default for the Endian property is Platform. This indicates that it is assumed that the endian of the flat file matches the endian of the platform on which it resides. If the file resides on a Windows platform, the data is handled as little-endian data. If the file resides on Sun Solaris or IBM MVS, the data is handled as big-endian. If you know the endian value for the flat file, you can select big or little-endian. If the file is UTF16 and contains a mark at the beginning of the file indicating the endian, that endian is used.

  • String Sizes in: This property indicates how data with multibyte character sets, such as UTF16, is handled. By default, the lengths for character strings in the data file are assumed to be in bytes. You can change the selection to indicate that strings sizes are specified in characters.

Field Editing

If you imported the external table into the workspace or created the external table without specifying a source file, do not configure these properties. Field editing properties are overruled by settings on the Access Parameters tab in the External Table Properties window.

Under Field Editing, you can indicate the type of whitespace trimming to be performed on character fields in the data file. The default setting is to perform no trim. All other trim options can reduce performance. You can also set the trim option to trim blanks to the left, right, or both sides of a character field.

Another option is set the trim to perform according to the SQL*Loader trim function. If you select SQL*Loader trim, fixed-length files are right trimmed and delimited files specified to have enclosures are left trimmed only when a field is missing an enclosure.

You can indicate how to handle missing fields in a record. If you set the option Trim Missing Values Null to true, fields with missing values are set to NULL. If you set the property to false, fields with missing values are rejected and sent to specified bad file.

Identification

See "Identification" for details.

Data Files

You must add at least one data file to an external table to associate the external table with more than one flat file.

To add a data file:

  1. Right-click the Data Files node and select Create.

    Type a name for the data file such as DATAFILE1. Your entry displays as a new node in the right panel of the Configuration Properties dialog box.

  2. Type in the following values for each data file you define:

    Data File Location: Location for the flat file.

    Data File Name: The name of the flat file including its extension. For example, type myflatfile.dat.