5 Creating and Reverse-Engineering a Model

This chapter describes how to create a model, how to reverse-engineer this model to populate it with datastores and how to create manually datastores of a model. This chapter also explains how to use partitioning and check the quality of the data in a model.

This chapter includes the following sections:

Section 5.1, "Introduction to Models"
Section 5.2, "Creating and Reverse-Engineering a Model"
Section 5.3, "Creating and Reverse-Engineering a Datastore"
Section 5.4, "Editing and Viewing a Datastore's Data"
Section 5.5, "Using Partitioning"
Section 5.6, "Checking Data Quality in a Model"

5.1 Introduction to Models

A Model is the description of a set of datastores. It corresponds to a group of tabular data structures stored in a data server. A model is based on a Logical Schema defined in the topology. In a given Context, this Logical Schema is mapped to a Physical Schema. The Data Schema of this Physical Schema contains physical data structure: tables, files, JMS messages, elements from an XML file, that are represented as datastores.

Models as well as all their components are based on the relational paradigm (table, columns, keys, etc.). Models in Data Integrator only contain Metadata, that is the description of the data structures. They do not contain a copy of the actual data.

Note:

Frequently used technologies have their reverse and model creation methods detailed in the Oracle Fusion Middleware Connectivity and Knowledge Modules Guide for Oracle Data Integrator.

Models can be organized into model folders and the datastores of a model can be organized into sub-models. Section 18.2, "Organizing Models with Folders" describes how to create and organize model folders and sub-models.

5.1.1 Datastores

A datastore represents a data structure. It can be a table, a flat file, a message queue or any other data structure accessible by Oracle Data Integrator.

A datastore describes data in a tabular structure. Datastores are composed of columns.

As datastores are based on the relational paradigm, it is also possible to associate the following elements to a datastore:

Keys

A Key is a set of columns with a specific role in the relational paradigm. Primary and Alternate Keys identify each record uniquely. Non-Unique Indexes enable optimized record access.
References

A Reference is a functional link between two datastores. It corresponds to a Foreign Key in a relational model. For example: The INVOICE datastore references the CUSTOMER datastore through the customer number.
Conditions and Filters

Conditions and Filters are a WHERE-type SQL expressions attached to a datastore. They are used to validate or filter the data in this datastore.

5.1.2 Data Integrity

A model contains constraints such as Keys, References or Conditions, but also non-null flags on columns. Oracle Data Integrator includes a data integrity framework for ensuring the quality of a data model.

This framework allows to perform:

Static Checks to verify the integrity of the data contained in a data model. This operation is performed to assess the quality of the data in a model when constraints do not physically exist in the data server but are defined in Data Integrator only.
Flow Check to verify the integrity of a data flow before it is integrated into a given datastore. The data flow is checked against the constraints defined in Oracle Data Integrator for the datastore that is the target of the data flow.

5.1.3 Reverse-engineering

A new model is created with no datastores. Reverse-engineering is the process that populates the model in Oracle Data Integrator by retrieving metadata from the data server containing the data structures. There are two different types of reverse-engineering:

Standard reverse-engineering uses standard JDBC driver features to retrieve the metadata. Note that unique keys are not reverse-engineered when using a standard reverse-engineering.
Customized reverse-engineering uses a technology-specific Reverse Knowledge Module (RKM) to retrieve the metadata, using a method specific to the given technology. This method is recommended if a technology specific RKM exists because it usually retrieves more information than the Standard reverse-engineering method. See the Oracle Fusion Middleware Connectivity and Knowledge Modules Guide for Oracle Data Integrator for a list of available RKMs.

Other methods for reverse-engineering exist for flat file datastores. They are detailed in Section 5.3.2, "Reverse-Engineering File Datastores".

Oracle Data Integrator is able to reverse-engineer models containing datastore shortcuts. For more information, see Chapter 17, "Working with Shortcuts".

5.1.4 Changed Data Capture

Change Data Capture (CDC), also referred to as Journalizing, allows to trap changes occurring on the data. CDC is used in Oracle Data Integrator to eliminate the transfer of unchanged data. This feature can be used for example for data synchronization and replication.

Journalizing can be applied to models, sub-models or datastores based on certain type of technologies.

For information about setting up Changed Data Capture, see Chapter 6, "Working with Changed Data Capture".

5.2 Creating and Reverse-Engineering a Model

Now that the key components of an ODI model have been described, an overview is provided on how to create and reverse-engineer a model.

5.2.1 Creating a Model

A Model is a set of datastores corresponding to data structures contained in a Physical Schema.

To create a Model:

In Designer Navigator expand the Models panel.
Right-click then select New Model.
Fill in the following fields in the Definition tab:
- Name: Name of the model used in the user interface.
- Technology: Select the model's technology.
- Logical Schema: Select the Logical Schema on which your model will be based.
Go to the Reverse tab, and select a Context which will be used for the model's reverse-engineering.

Note that if there is only one context that maps the logical schema, this context will be set automatically.
Select Save from the File main menu.

The model is created, but contains no datastore yet.

5.2.2 Reverse-engineering a Model

To automatically populate datastores into the model you need to perform a reverse-engineering for this model.

Standard Reverse-Engineering

A Standard Reverse-Engineering uses the capacities of the JDBC driver used to connect the data server to retrieve the model metadata.

To perform a Standard Reverse- Engineering:

In the Reverse tab of your Model:
- Select Standard.
- Select the Context used for the reverse-engineering
- Select the Types of objects to reverse-engineer. Only object of these types will be taken into account by the reverse-engineering process.
- Enter in the Mask field the mask of tables to reverse engineer. The mask selects the objects to reverse. This mask uses the SQL LIKE syntax. The percent (%) symbol means zero or more characters, and the underscore (_) symbol means one character.
- Optionally, you can specify the characters to remove for the table alias. These are the characters to delete in order to derive the alias. Note that if the datastores already exist, the characters specified here will not be removed from the table alias. Updating a datastore is not applied to the table alias.
In the Selective Reverse tab select Selective Reverse, New Datastores, Existing Datastores and Objects to Reverse.
A list of datastores to be reverse-engineered appears. Leave those you wish to reverse-engineer checked.
Select Save from the File main menu.
Click Reverse Engineer in the Model toolbar menu.
Oracle Data Integrator launches a reverse-engineering process for the selected datastores. A progress bar indicates the progress of the reverse-engineering process.

The reverse-engineered datastores appear under the model node in the Models panel.

Customized Reverse-Engineering

A Customized Reverse-Engineering uses a Reverse-engineering Knowledge Module (RKM), to retrieve metadata for a specific type of technology and create the corresponding datastore definition in the data model.

For example, for the Oracle technology, the RKM Oracle accesses the database dictionary tables to retrieve the definition of tables, columns, keys, etc, that are created in the model.

Note:

The RKM must be available as a global RKM or imported into the project. Refer to Chapter 9, "Creating an Integration Project" for more information on KM import.

To perform a Customized Reverse-Engineering using a RKM:

In the Reverse tab of your Model:
- Select Customized.
- Select the Context used for the reverse-engineering
- Select the Types of objects to reverse-engineer. Only object of these types will be taken into account by the reverse-engineering process.
- Enter in the Mask the mask of tables to reverse engineer.
- Select the KM that you want to use for performing the reverse-engineering process. This KM is typically called RKM <technology>.<name of the project>.
- Optionally, you can specify the characters to remove for the table alias. These are the characters to delete in order to derive the alias. Note that if the datastores already exist, the characters specified here will not be removed from the table alias. Updating a datastore is not applied to the table alias.
Click Reverse Engineer in the Model toolbar menu, then Yes to validate the changes.
Click OK.
The Session Started Window appears.
Click OK.

You can review the reverse-engineering tasks in the Operator Navigator. If the reverse-engineering process completes correctly, reverse-engineered datastores appear under the model node in the Models panel.

5.3 Creating and Reverse-Engineering a Datastore

Although the recommended method for creating datastores in a model is reverse-engineering, it is possible to manually define datastores in a blank model. It is the recommended path for creating flat file datastores.

5.3.1 Creating a Datastore

To create a datastore:

From the Models tree in Designer Navigator, select a Model or a Sub-Model.
Right-click and select New Datastore.
In the Definition tab, fill in the following fields:
- Name of the Datastore: This is the name that appears in the trees and that is used to reference the datastore from a project.
  
  Note:
  
  Do not use ODI reserved names like, for example, JRN_FLAG, JRN_SUBSCRIBER, and JRN_DATE for the datastore name. These names would cause Duplicate Column name SQL errors in ODI intermediate tables such as error tables.
- Resource Name: Name of the object in the form recognized by the data server which stores it. This may be a table name, a file name, the name of a JMS Queue, etc.
- Alias: This is a default alias used to prefix this datastore's columns names in expressions.
If the datastore represents a flat file (delimited or fixed), in the File tab, fill in the following fields:
- File Format: Select the type of your flat file, fixed or delimited.
- Header: Number of header lines for the flat file.
- Record Separator and Field Separator define the characters used to separate records (lines) in the file, and fields within one record.
  
  Record Separator: One or several characters separating lines (or records) in the file:
  - MS-DOS: DOS carriage return
  - Unix: UNIX carriage return
  - Other: Free text you can input as characters or hexadecimal codes
  Field Separator: One ore several characters separating the fields in a record.
  - Tabulation
  - Space
  - Other: Free text you can input as characters or hexadecimal code
Select Save from the File main menu.

The datastore is created. If this is a File datastore, refer to the Reverse-Engineering File Datastores section for creating columns for this datastore. It is also possible to manually edit columns for all datastores. See Adding and Deleting Datastore Columns for more information.

5.3.2 Reverse-Engineering File Datastores

Oracle Data Integrator provides specific methods for reverse-engineering flat files. The methods for reversing flat files are described below.

5.3.2.1 Reverse-Engineering Fixed Files

Fixed files can be reversed engineered using a wizard into which the boundaries of the fixed columns and their parameters can be defined.

Go to the Columns tab the file datastore that has a fixed format.
Click the Reverse button. A window opens displaying the first records of your file.
Click on the ruler (above the file contents) to create markers delimiting the columns. Right-click in the ruler to delete a marker.
Columns are created with pre-generated names (C1, C2, and so on). You can edit the column name by clicking in the column header line (below the ruler).
In the properties panel (on the right), you can edit the parameters of the selected column.
You must set at least the Column Name, Datatype and Length for each column. Note that column names of File datastores cannot contain spaces.
Click OK when the columns definition is complete to close the wizard.
Select Save from the File main menu.

5.3.2.2 Reverse-Engineering Delimited Files

Delimited files can be reversed engineered using a a built-in JDBC which analyzes the file to detect the columns and reads the column names from the file header.

Go to the Columns tab the file datastore that has a delimited format.
Click the Reverse button.
Oracle Data Integrator creates the list of columns according to your file content. The column type and length are set to default values. Column names are pre-generated names (C1, C2, and so on) or names taken from the first Header line declared for this file.
Review and if needed modify the Column Name, Datatype and Length for each column. Note that column names of File datastores cannot contain spaces.
Select Save from the File main menu.

5.3.2.3 Reverse-Engineering COBOL Files

Fixed COBOL files structures are frequently described in Copybook files. Oracle Data Integrator can reverse-engineer the Copybook file structure into a datastore structure.

Go to the Columns tab the file datastore that has a fixed format.
Click the Reverse COBOL Copybook button.
Fill in the following fields:
- File: Location of the Copybook file.
- Character Set: Copybook file character set.
- Description format (EBCDIC or ASCII): Copybook file format
- Data format (EBCDIC or ASCII): Data file format.
Click OK. The columns described in the Copybook are reverse-engineered and appear in the column list.
Select Save from the File main menu.

5.3.3 Adding and Deleting Datastore Columns

To add columns to a datastore:

In the Columns tab of the datastore, click Add Column in the toolbar menu.
An empty line appears. Fill in the information about the new column. You should at least fill in the Name, Datatype and Length fields.
Repeat steps 1 and 2 for each column you want to add to the datastore.
Select Save from the File main menu.

To delete columns from a datastore:

In the Columns tab of the datastore, select the column to delete.
Click the Delete Column button. The column disappears from the list.

5.3.4 Adding and Deleting Constraints and Filters

Oracle Data Integrator manages constraints on data model including Keys, References, Conditions and Mandatory Columns. It includes a data integrity framework for ensuring the quality of a data model based on these constraints.

Filters are not constraints but are defined similarly to Conditions. A Filter is not used to enforce a data quality rule on a datastore, but is used to automatically filter this datastore when using it as a source.

5.3.4.1 Keys

To create a key for a datastore:

In the Designer Navigator, expand in the Model tree the model and then the datastore into which you want to add the key.
Select the Constraints node, right-click and select New Key.
Enter the Name for the constraint, and then select the Key or Index Type. Primary Keys and Alternate Keys can be checked and can act as an update key in an interface. Non-Unique Index are used mainly for performance reasons.
In the Columns tab, select the list of columns that belong to this key.
In the Control tab, select whether this constraint should be checked by default in a Static or Flow check.
By clicking the Check button, you can retrieve the number of records that do not respect this constraint.
Select Save from the File main menu.

5.3.4.2 References

To create a reference between two datastores:

In the Designer Navigator, expand in the Model tree the model and then one of the datastores into which you want to add the reference.
Select the Constraints node, right-click and select New Reference.
Enter the Name for the constraint, and then select the Type for the reference. In a User Reference the two datastores are linked based on column equality. In a Complex User Reference any expression can be used to link the two datastores. A Database Reference is a reference based on column equality that has been reverse-engineered from a database engine.
If you want to reference a datastore that exists in a model, select the Model and the Table that you want to link to the current datastore.
If you want to link a table that does not exist in a model, leave the Model and Table fields undefined, and set the Catalog, Schema and Table names to identify your datastore.
If you are defining a User or Database reference, in the Columns tab, define the matching columns from the two linked datastores.
If you are defining a Complex User reference, enter in the Expression tab the expression that relates columns from the two linked datastores.
In the Control tab, select whether this constraint should be checked by default in a Static or Flow check.
By clicking the Check button, you can retrieve the number of records that respect or do not respect this constraint.
Select Save from the File main menu.

5.3.4.3 Conditions

To create a condition for a datastore:

In the Designer Navigator, expand in the Model tree the model and then one of the datastores into which you want to add the condition.
Select the Constraints node, right-click and select New Condition.
Enter the Name for the constraint, and then select the Type for the condition. An Oracle Data Integrator Condition is a condition that exists only in the model and does not exist in the database. A Database Condition is a condition that is defined in the database and has been reverse-engineered.
In the Where field enter the expression that implements the condition. This expression is a SQL WHERE expression that valid records should respect.
Type in the Message field the error message for this constraint.
In the Control tab, select whether this constraint should be checked by default in a Static or Flow check.
By clicking the Check button, you can retrieve the number of records that do not respect this constraint.
Select Save from the File main menu.

5.3.4.4 Mandatory Columns

To define mandatory columns for a datastore:

In the Designer Navigator, expand in the Model tree the model containing the datastores.
Double-click the datastore containing the column that must be set as mandatory. The Datastore Editor appears.
In the Columns tab, check the Not Null field for each column that is mandatory.
Select Save from the File main menu.

5.3.4.5 Filter

To add a filter to a datastore:

In the Designer Navigator, expand in the Model tree the model and then one of the datastores into which you want to add the filter.
Select the Filter node, right-click and select New Condition.
Enter the Name for the filter.
In the Where field enter the expression that implements the filter. This expression is a SQL WHERE expression used to filter source records.
In the Control tab, check Filter Active for Static Control if you want data from this table to be filtered prior to checking it a static control.
Select Save from the File main menu.

5.4 Editing and Viewing a Datastore's Data

To view a datastore's data:

Select the datastore from the model in the Designer Navigator.
Right-click and select View Data.

The data appear in a non editable grid.

To edit a datastore's data:

Select the datastore from the model in the Designer Navigator.
Right-click and select Data...

The data appear in an editable grid in the Data Editor. The Refresh button enables you to edit and run again the query returning the datastore data. You can filter the data and perform free queries on the datastore using this method.

It is possible to edit a datastore's data if the connectivity used and the data server user's privileges allow it, and if the datastore structure enables to identify each row of the datastore (PK, etc.).

Note:

The data displayed is the data stored in the physical schema corresponding to the model's logical schema, in the current working context.

5.5 Using Partitioning

Oracle Data Integrator is able to use database-defined partitions when processing data in partitioned tables used as source or targets of integration interfaces. These partitions are created in the datastore corresponding to the table, either through the reverse-engineering process or manually. For example with the Oracle technology, partitions are reverse-engineered using the RKM Oracle.

The partitioning methods supported depend on the technology of the datastore. For example, for the Oracle technology the following partitioning methods are supported: Range, Hash, List.

Once defined on a datastore, partitions can be selected when this datastore is used as a source or a target of an interface. Refer to Chapter 11, "Working with Integration Interfaces" for information.

If using the common format designer, you can also create partitions when performing the Generate DDL operation.

5.5.1 Defining Manually Partitions and Sub-Partitions of Model Datastores

Partition information can be reverse-engineered along with the datastore structures or defined manually.

Note:

Standard reverse-engineering does not support the revers-engineering of partitions. To reverse-engineer partitions and sub-partitions, you have to use customized reverse-engineering.

To define manually partitions and sub-partitions for a datastore:

In the Models accordion, double-click the datastore for which you want to define the partition or sub-partition. The Datastore Editor opens.
In the Partitions tab, enter the following details to define the partition and sub-partition:
- Partition by
  
  Select the partitioning method. This list displays the partitioning methods supported by the technology on which the model relies.
- Sub-Partition by
  
  If you want to define sub-partitions in addition to partitions, select the sub-partitioning method. This list displays the partitioning methods supported by the technology on which the model relies.
Click Add Partition.
In the Name field, enter a name for the partition, for example: FY08.
In the Description field, enter a description for the partition, for example: Operations for Fiscal Year 08.
If you want to add:
- additional partitions, repeat steps 3 through 5.
- a sub-partition of a selected partition, click Add Sub-Partition and repeat steps 4 and 5.
From the File menu, select Save.

5.6 Checking Data Quality in a Model

Data Quality control is essential in ensuring the overall consistency of the data in your information system's applications.

Application data is not always valid for the constraints and declarative rules imposed by the information system. You may, for instance, find orders with no customer, or order lines with no product, etc. In addition, such incorrect data may propagate via integration flows.

5.6.1 Introduction to Data Integrity

Oracle Data Integrator provides a working environment to detect these constraint violation and store them for recycling or reporting purposes.

There are two different main types of controls: Static Control and Flow Control. We will examine the differences between the two.

Static Control

Static Control implies the existence of rules that are used to verify the integrity of your application data. Some of these rules (referred to as constraints) may already be implemented in your data servers (using primary keys, reference constraints, etc.)

With Oracle Data Integrator, you can refine the validation of your data by defining additional constraints, without implementing them directly in your servers. This procedure is called Static Control since it allows you to perform checks directly on existing - or static - data. Note that the active database constraints (these are those that have Defined in the Database and Active selected on the Controls tab) need no additional control from Oracle Data Integrator since they are already controlled by the database.

Flow Control

The information systems targeted by transformation and integration processes often implement their own declarative rules. The Flow Control function is used to verify an application's incoming data according to these constraints before loading the data into these targets. Setting up flow control is detailed in to Chapter 11, "Working with Integration Interfaces".

5.6.2 Checking a Constraint

While creating a constraint in Oracle Data Integrator, it is possible to retrieve the number of lines violating this constraint. This action, referred as Synchronous Control is performed from the Control tab of the given constraint Editor by clicking the Check button.

The result of a synchronous control is not persistent. This type of control is used to quickly evaluate the validity of a constraint definition.

5.6.3 Perform a Static Check on a Model, Sub-Model or Datastore

To perform a Static Check on a Model, Sub-Model or Datastore:

In the Models tree in the Designer Navigator, select the model that you want to check.
Double-click this model to edit it.
In the Control tab of the model Editor, select the Check Knowledge Module (CKM) used in the static check.
From the File menu, select Save All.
Right-click the model, sub-model or datastore that you want to check in the Model tree in the Designer Navigator and select Control > Check.
In the Execution window, select the execution parameters:
1. Select the Context into which the data must be checked.
2. Select the Logical Agent that will run the check tasks.
3. Check the Delete Errors from the Checked Tables option if you want rows detected as erroneous to be removed from the checked tables.
See Table 21-1 for more information about the execution parameters.
Click OK.
The Session Started Window appears.
Click OK.

You can review the check tasks in the Operator Navigator. If the control process completes correctly, you can review the erroneous records for each datastore that has been checked.

5.6.4 Reviewing Erroneous Records

To view a datastore's errors:

Select the datastore from the model in the Designer Navigator.
Right-click and select Control > Errors....

The erroneous rows detected for this datastore appear in a grid.