Creating and Using Data Models and Datastores

12 Creating and Using Data Models and Datastores

This chapter describes how to create a model, how to reverse-engineer this model to populate it with datastores and how to create manually datastores of a model. This chapter also explains how to use partitioning and check the quality of the data in a model.

This chapter includes the following sections:

Introduction to Models

A Model is the description of a set of datastores. It corresponds to a group of tabular data structures stored in a data server. A model is based on a Logical Schema defined in the topology. In a given Context, this Logical Schema is mapped to a Physical Schema. The Data Schema of this Physical Schema contains physical data structure: tables, files, JMS messages, elements from an XML file, that are represented as datastores.

Models as well as all their components are based on the relational paradigm (table, attributes, keys, etc.). Models in Data Integrator only contain Metadata, that is the description of the data structures. They do not contain a copy of the actual data.

Note:

Frequently used technologies have their reverse and model creation methods detailed in the Connectivity and Knowledge Modules Guide for Oracle Data Integrator.

Models can be organized into model folders and the datastores of a model can be organized into sub-models. The section Organizing Models with Folders in Developing Integration Projects with Oracle Data Integrator describes how to create and organize model folders and sub-models.

Datastores

A datastore represents a data structure. It can be a table, a flat file, a message queue or any other data structure accessible by Oracle Data Integrator.

A datastore describes data in a tabular structure. Datastores are composed of attributes.

As datastores are based on the relational paradigm, it is also possible to associate the following elements to a datastore:

Keys

A Key is a set of attributes with a specific role in the relational paradigm. Primary and Alternate Keys identify each record uniquely. Non-Unique Indexes enable optimized record access.
References

A Reference is a functional link between two datastores. It corresponds to a Foreign Key in a relational model. For example: The INVOICE datastore references the CUSTOMER datastore through the customer number.
Conditions and Filters

Conditions and Filters are a WHERE-type SQL expressions attached to a datastore. They are used to validate or filter the data in this datastore.

Data Integrity

A model contains constraints such as Keys, References or Conditions, but also non-null flags on attributes. Oracle Data Integrator includes a data integrity framework for ensuring the quality of a data model.

This framework allows to perform:

Static Checks to verify the integrity of the data contained in a data model. This operation is performed to assess the quality of the data in a model when constraints do not physically exist in the data server but are defined in Data Integrator only.
Flow Check to verify the integrity of a data flow before it is integrated into a given datastore. The data flow is checked against the constraints defined in Oracle Data Integrator for the datastore that is the target of the data flow.

Reverse-engineering

A new model is created with no datastores. Reverse-engineering is the process that populates the model in Oracle Data Integrator by retrieving metadata from the data server containing the data structures. There are two different types of reverse-engineering:

Standard reverse-engineering uses standard JDBC driver features to retrieve the metadata. Note that unique keys are not reverse-engineered when using a standard reverse-engineering.
Customized reverse-engineering uses a technology-specific Reverse Knowledge Module (RKM) to retrieve the metadata, using a method specific to the given technology. This method is recommended if a technology specific RKM exists because it usually retrieves more information than the Standard reverse-engineering method. See the Connectivity and Knowledge Modules Guide for Oracle Data Integrator for a list of available RKMs.

Other methods for reverse-engineering exist for flat file datastores. They are detailed in Reverse-Engineering File Datastores.

Oracle Data Integrator is able to reverse-engineer models containing datastore shortcuts. For more information, see the Using Shortcuts chapter in Developing Integration Projects with Oracle Data Integrator.

Changed Data Capture

Change Data Capture (CDC), also referred to as Journalizing, allows to trap changes occurring on the data. CDC is used in Oracle Data Integrator to eliminate the transfer of unchanged data. This feature can be used for example for data synchronization and replication.

Journalizing can be applied to models, sub-models or datastores based on certain type of technologies.

For information about setting up Changed Data Capture, see the Using Journalizing chapter in Developing Integration Projects with Oracle Data Integrator.

Creating and Reverse-Engineering a Model

This section explains the following topics:

Creating a Model

A Model is a set of datastores corresponding to data structures contained in a Physical Schema.

Tip:

To create a new model and new topology objects at the same time, use the procedure described in Creating a Model and Topology Objects.

To create a Model:

In Designer Navigator expand the Models panel.
Right-click then select New Model.
Fill in the following fields in the Definition tab:
- Name: Name of the model used in the user interface.
- Technology: Select the model's technology.
- Logical Schema: Select the Logical Schema on which your model will be based.
On the Reverse Engineer tab, select a Context which will be used for the model's reverse-engineering.

Note that if there is only one context that maps the logical schema, this context will be set automatically.
Select Save from the File main menu.

The model is created, but contains no datastore yet.

Creating a Model and Topology Objects

You can create a Model along with other topology objects, including new data servers, contexts and schemas, at the same time:

In ODI Studio, select File and click New....
In the New Gallery dialog, select Create a New Model and Topology Objects and click OK.

The Create New Model and Topology Objects wizard appears.
In the Model panel of the wizard, fill in the following fields.
- Name: Name of the model used in the user interface.
- Technology: Select the model's technology.
- Logical Schema: Select the Logical Schema on which your model will be based.
- Context: Select the context that will be used for the model's reverse-engineering.
Click Next.
In the Data Server panel of the wizard, fill in the following data server fields.
- Name: Name of the data server, as it appears in the user interface.
  
  Note: or naming data servers, it is recommended to use the following naming standard: <TECHNOLOGY_NAME>_<SERVER_NAME>.
- Technology: Technology linked to the data server.
  
  Note: Appears only if the Technology selected for the Model is of the type Generic SQL.
- User: User name used for connecting to the data server.
- Password: Password linked with the user name.
  
  Note: This password is stored encrypted in the repository.
- Driver List: Provides a list of available drivers available to be used with the data server.
- Driver: Name of the driver used for connecting to the data server.
- URL: Provides the connection details.
- Properties: Lists the properties that you can set for the selected driver.
Click Next.
In the Physical Schema panel of the wizard, fill in the physical schema fields.
- Name: Name of the physical schema. It is calculated automatically and is read-only.
- Datasource (Catalog): Name of the catalog in the data server.
  
  Note: Appears only if the Technology selected supports Catalogs.
- Schema (Schema): Name of the schema in the data server. Schema, owner, or library where the required data is stored.
  
  Note: Oracle Data Integrator lists all the schemas present in the data server. Sometimes, Oracle Data Integrator cannot draw up this list. In this case, you should enter the schema name, respecting the case.
- Datasource (Work Catalog): Indicate the catalog in which you want to create these objects. For some data validation or transformation operations, Oracle Data Integrator may require work objects to be created.
  
  Note: Appears only if the Technology selected supports Catalogs.
- Schema (Work Schema): Indicates the schema in which you want to create these objects. For some data validation or transformation operations, Oracle Data Integrator may require work objects to be created. If you do not set a Work Schema, it defaults to the Schema during execution
  
  It is recommended that you create a specific schema dedicated to any work tables. By creating a schema named SAS or ODI in all your data servers, you ensure that all Oracle Data Integrator activity remains totally independent from your applications.
  
  Note: Oracle Data Integrator lists all the schemas present in the data server. Sometimes, Oracle Data Integrator cannot draw up this list. In this case, you should enter the schema name, respecting the case.
- Driver: Name of the driver used for connecting to the data server.
- URL: Provides the connection details.
- Properties: Specifies the properties for the selected driver.
Click Next.
Click Finish. The model and topology objects are created, but the model contains no datastore yet.

Reverse-engineering a Model

To automatically populate datastores into the model you need to perform a reverse-engineering for this model.

Standard Reverse-Engineering

A Standard Reverse-Engineering uses the capacities of the JDBC driver used to connect the data server to retrieve the model metadata.

To perform a Standard Reverse- Engineering:

In the Reverse Engineer tab of your Model:
- Select Standard.
- Select the Context used for the reverse-engineering
- Select the Types of objects to reverse-engineer. Only object of these types will be taken into account by the reverse-engineering process.
- Enter in the Mask field the mask of tables to reverse engineer. The mask selects the objects to reverse. This mask uses the SQL LIKE syntax. The percent (%) symbol means zero or more characters, and the underscore (_) symbol means one character.
- Optionally, you can specify the characters to remove for the table alias. These are the characters to delete in order to derive the alias. Note that if the datastores already exist, the characters specified here will not be removed from the table alias. Updating a datastore is not applied to the table alias.
In the Selective Reverse-Engineering tab select Selective Reverse-Engineering, New Datastores, Existing Datastores and Objects to Reverse Engineer.
A list of datastores to be reverse-engineered appears. Leave those you wish to reverse-engineer checked.
Select Save from the File main menu.
Click Reverse Engineer in the Model toolbar menu.
Oracle Data Integrator launches a reverse-engineering process for the selected datastores. A progress bar indicates the progress of the reverse-engineering process.

The reverse-engineered datastores appear under the model node in the Models panel.

Customized Reverse-Engineering

A Customized Reverse-Engineering uses a Reverse-engineering Knowledge Module (RKM), to retrieve metadata for a specific type of technology and create the corresponding datastore definition in the data model.

For example, for the Oracle technology, the RKM Oracle accesses the database dictionary tables to retrieve the definition of tables, attributes, keys, etc., that are created in the model.

Note:

The RKM must be available as a global RKM or imported into the project. For more information on KM import, see the Creating an Integration Project chapter in Developing Integration Projects with Oracle Data Integrator.

To perform a Customized Reverse-Engineering using a RKM:

In the Reverse Engineer tab of your Model:
- Select Customized.
- Select the Context used for the reverse-engineering
- Select the Types of objects to reverse-engineer. Only object of these types will be taken into account by the reverse-engineering process.
- Enter in the Mask the mask of tables to reverse engineer.
- Select the KM that you want to use for performing the reverse-engineering process. This KM is typically called RKM <technology>.<name of the project>.
- Optionally, you can specify the characters to remove for the table alias. These are the characters to delete in order to derive the alias. Note that if the datastores already exist, the characters specified here will not be removed from the table alias. Updating a datastore is not applied to the table alias.
Click Reverse Engineer in the Model toolbar menu, then Yes to validate the changes.
Click OK.
The Session Started Window appears.
Click OK.

You can review the reverse-engineering tasks in the Operator Navigator. If the reverse-engineering process completes correctly, reverse-engineered datastores appear under the model node in the Models panel.

Creating and Reverse-Engineering a Datastore

Although the recommended method for creating datastores in a model is reverse-engineering, it is possible to manually define datastores in a blank model. It is the recommended path for creating flat file datastores.

This section explains how to create a datastore.

Creating a Datastore

To create a datastore:

From the Models tree in Designer Navigator, select a Model or a Sub-Model.
Right-click and select New Datastore.
In the Definition tab, fill in the following fields:
- Name of the Datastore: This is the name that appears in the trees and that is used to reference the datastore from a project.
  
  Note:
  
  Do not use ODI reserved names like, for example, JRN_FLAG, JRN_SUBSCRIBER, and JRN_DATE for the datastore name. These names would cause Duplicate Attribute name SQL errors in ODI intermediate tables such as error tables.
- Resource Name: Name of the object in the form recognized by the data server which stores it. This may be a table name, a file name, the name of a JMS Queue, etc.
  
  Note:
  
  If the Resource is a database table, it must respect the Database rules for object and attributes identifiers. There is a limit in the object identifier for most of the Technologies (in Oracle, typically 30 characters). To avoid these errors, ensure in the topology for a specific technology that maximum lengths for the object names (tables and columns) correspond to your database configuration.
- Alias: This is a default alias used to prefix this datastore's attributes names in expressions.
If the datastore represents a flat file (delimited or fixed), in the File tab, fill in the following fields:
- File Format: Select the type of your flat file, fixed or delimited.
- Header: Number of header lines for the flat file.
- Record Separator and Field Separator define the characters used to separate records (lines) in the file, and fields within one record.
  
  Record Separator: One or several characters separating lines (or records) in the file:
  - MS-DOS: DOS carriage return
  - Unix: UNIX carriage return
  - Other: Free text you can input as characters or hexadecimal codes
  Field Separator: One ore several characters separating the fields in a record.
  - Tabulation
  - Space
  - Other: Free text you can input as characters or hexadecimal code
Select Save from the File main menu.

The datastore is created. If this is a File datastore, refer to Reverse-Engineering File Datastores for creating attributes for this datastore. It is also possible to manually edit attributes for all datastores. See Adding and Deleting Datastore Attributes for more information.

Reverse-Engineering File Datastores

Oracle Data Integrator provides specific methods for reverse-engineering flat files. The methods for reversing flat files are described below.

Adding and Deleting Datastore Attributes

To add attributes to a datastore:

In the Attributes tab of the datastore, click Add Attribute in the tool bar menu.
An empty line appears. Fill in the information about the new attribute. You should at least fill in the Name, Datatype and Length fields.
Repeat steps 1 and 2 for each attribute you want to add to the datastore.
Select Save from the File main menu.

To delete attributes from a datastore:

In the Attributes tab of the datastore, select the attribute to delete.
Click the Delete Attribute button. The attribute disappears from the list.

Adding and Deleting Constraints and Filters

Oracle Data Integrator manages constraints on data model including Keys, References, Conditions and Mandatory Attributes. It includes a data integrity framework for ensuring the quality of a data model based on these constraints.

Filters are not constraints but are defined similarly to Conditions. A Filter is not used to enforce a data quality rule on a datastore, but is used to automatically filter this datastore when using it as a source.

Editing and Viewing a Datastore's Data

To view a datastore's data:

Select the datastore from the model in the Designer Navigator.
Right-click and select View Data.

The data appear in a non editable grid.

To edit a datastore's data:

Select the datastore from the model in the Designer Navigator.
Right-click and select Data...

The data appear in an editable grid in the Data Editor. The Refresh button enables you to edit and run again the query returning the datastore data. You can filter the data and perform free queries on the datastore using this method.

It is possible to edit a datastore's data if the connectivity used and the data server user's privileges allow it, and if the datastore structure enables to identify each row of the datastore (PK, etc.).

Note:

The data displayed is the data stored in the physical schema corresponding to the model's logical schema, in the current working context.

Using Partitioning

Oracle Data Integrator is able to use database-defined partitions when processing data in partitioned tables used as source or targets of mappings. These partitions are created in the datastore corresponding to the table, either through the reverse-engineering process or manually. For example with the Oracle technology, partitions are reverse-engineered using the RKM Oracle.

The partitioning methods supported depend on the technology of the datastore. For example, for the Oracle technology the following partitioning methods are supported: Range, Hash, List.

Once defined on a datastore, partitions can be selected when this datastore is used as a source or a target of a mapping.

Refer to Creating and Using Mappings, for information.

If using the common format designer, you can also create partitions when performing the Generate DDL operation.

Manually Defining Partitions and Sub-Partitions of Model Datastores

Partition information can be reverse-engineered along with the datastore structures or defined manually.

Note:

Standard reverse-engineering does not support the revers-engineering of partitions. To reverse-engineer partitions and sub-partitions, you have to use customized reverse-engineering.

To manually define partitions and sub-partitions for a datastore:

In the Models accordion, double-click the datastore for which you want to define the partition or sub-partition. The Datastore Editor opens.
In the Partitions tab, enter the following details to define the partition and sub-partition:
- Partition by
  
  Select the partitioning method. This list displays the partitioning methods supported by the technology on which the model relies.
- Sub-Partition by
  
  If you want to define sub-partitions in addition to partitions, select the sub-partitioning method. This list displays the partitioning methods supported by the technology on which the model relies.
Click Add Partition.
In the Name field, enter a name for the partition, for example: FY08.
In the Description field, enter a description for the partition, for example: Operations for Fiscal Year 08.
If you want to add:
- additional partitions, repeat steps 3 through 5.
- a sub-partition of a selected partition, click Add Sub-Partition and repeat steps 4 and 5.
From the File menu, select Save.

Checking Data Quality in a Model

Data Quality control is essential in ensuring the overall consistency of the data in your information system's applications.

Application data is not always valid for the constraints and declarative rules imposed by the information system. You may, for instance, find orders with no customer, or order lines with no product, etc. In addition, such incorrect data may propagate via integration flows.

Introduction to Data Integrity

Oracle Data Integrator provides a working environment to detect these constraint violation and store them for recycling or reporting purposes.

There are two different main types of controls: Static Control and Flow Control. We will examine the differences between the two.

Static Control

Static Control implies the existence of rules that are used to verify the integrity of your application data. Some of these rules (referred to as constraints) may already be implemented in your data servers (using primary keys, reference constraints, etc.)

With Oracle Data Integrator, you can refine the validation of your data by defining additional constraints, without implementing them directly in your servers. This procedure is called Static Control since it allows you to perform checks directly on existing - or static - data. Note that the active database constraints (these are those that have Defined in the Database and Active selected on the Controls tab) need no additional control from Oracle Data Integrator since they are already controlled by the database.

Flow Control

The information systems targeted by transformation and integration processes often implement their own declarative rules. The Flow Control function is used to verify an application's incoming data according to these constraints before loading the data into these targets.

Setting up flow control is detailed in Creating and Using Mappings.

Checking a Constraint

While creating a constraint in Oracle Data Integrator, it is possible to retrieve the number of lines violating this constraint. This action, referred as Synchronous Control is performed from the Control tab of the given constraint Editor by clicking the Check button.

The result of a synchronous control is not persistent. This type of control is used to quickly evaluate the validity of a constraint definition.

Perform a Static Check on a Model, Sub-Model or Datastore

To perform a Static Check on a Model, Sub-Model or Datastore:

In the Models tree in the Designer Navigator, select the model that you want to check.
Double-click this model to edit it.
In the Control tab of the model Editor, select the Check Knowledge Module (CKM) used in the static check.
From the File menu, select Save All.
Right-click the model, sub-model or datastore that you want to check in the Model tree in the Designer Navigator and select Control > Check. Or, in the model editor menu bar, click the Check Model button.
In the Run dialog, select the execution parameters:
- Select the Context into which the step must be executed.
- Select the Logical Agent that will run the step.
- Select a Log Level.
- Check the Delete Errors from the Checked Tables option if you want rows detected as erroneous to be removed from the checked tables.
- Select Recurse Sub-Models to check sub-models of this models
- Optionally, select Simulation. This option performs a simulation of the run operation and generates a run report, without actually affecting data.
See the Execution Parameters table in Administering Oracle Data Integrator for more information about the execution parameters.
Click OK.
The Session Started Window (or, if running a simulation, the Simulation window) appears.
Click OK.

You can review the check tasks in the Operator Navigator. If the control process completes correctly, you can review the erroneous records for each datastore that has been checked.

Reviewing Erroneous Records

To view a datastore's errors:

Select the datastore from the model in the Designer Navigator.
Right-click and select Control > Errors....

The erroneous rows detected for this datastore appear in a grid.