12 Creating and Using Data Models and Datastores
This chapter includes the following sections:
Introduction to Models
Models as well as all their components are based on the relational paradigm (table, attributes, keys, etc.). Models in Data Integrator only contain Metadata, that is the description of the data structures. They do not contain a copy of the actual data.
Note:
Frequently used technologies have their reverse and model creation methods detailed in the Connectivity and Knowledge Modules Guide for Oracle Data Integrator.
Models can be organized into model folders and the datastores of a model can be organized into sub-models. The section Organizing Models with Folders in Developing Integration Projects with Oracle Data Integrator describes how to create and organize model folders and sub-models.
Datastores
A datastore represents a data structure. It can be a table, a flat file, a message queue or any other data structure accessible by Oracle Data Integrator.
A datastore describes data in a tabular structure. Datastores are composed of attributes.
As datastores are based on the relational paradigm, it is also possible to associate the following elements to a datastore:
-
Keys
A Key is a set of attributes with a specific role in the relational paradigm. Primary and Alternate Keys identify each record uniquely. Non-Unique Indexes enable optimized record access.
-
References
A Reference is a functional link between two datastores. It corresponds to a Foreign Key in a relational model. For example: The INVOICE datastore references the CUSTOMER datastore through the customer number.
-
Conditions and Filters
Conditions and Filters are a WHERE-type SQL expressions attached to a datastore. They are used to validate or filter the data in this datastore.
Data Integrity
A model contains constraints such as Keys, References or Conditions, but also non-null flags on attributes. Oracle Data Integrator includes a data integrity framework for ensuring the quality of a data model.
This framework allows to perform:
-
Static Checks to verify the integrity of the data contained in a data model. This operation is performed to assess the quality of the data in a model when constraints do not physically exist in the data server but are defined in Data Integrator only.
-
Flow Check to verify the integrity of a data flow before it is integrated into a given datastore. The data flow is checked against the constraints defined in Oracle Data Integrator for the datastore that is the target of the data flow.
Reverse-engineering
A new model is created with no datastores. Reverse-engineering is the process that populates the model in Oracle Data Integrator by retrieving metadata from the data server containing the data structures. There are two different types of reverse-engineering:
-
Standard reverse-engineering uses standard JDBC driver features to retrieve the metadata. Note that unique keys are not reverse-engineered when using a standard reverse-engineering.
-
Customized reverse-engineering uses a technology-specific Reverse Knowledge Module (RKM) to retrieve the metadata, using a method specific to the given technology. This method is recommended if a technology specific RKM exists because it usually retrieves more information than the Standard reverse-engineering method. See the Connectivity and Knowledge Modules Guide for Oracle Data Integrator for a list of available RKMs.
Other methods for reverse-engineering exist for flat file datastores. They are detailed in Reverse-Engineering File Datastores.
Oracle Data Integrator is able to reverse-engineer models containing datastore shortcuts. For more information, see the Using Shortcuts chapter in Developing Integration Projects with Oracle Data Integrator.
Changed Data Capture
Change Data Capture (CDC), also referred to as Journalizing, allows to trap changes occurring on the data. CDC is used in Oracle Data Integrator to eliminate the transfer of unchanged data. This feature can be used for example for data synchronization and replication.
Journalizing can be applied to models, sub-models or datastores based on certain type of technologies.
For information about setting up Changed Data Capture, see the Using Journalizing chapter in Developing Integration Projects with Oracle Data Integrator.
Creating and Reverse-Engineering a Model
This section explains the following topics:
Creating a Model
A Model is a set of datastores corresponding to data structures contained in a Physical Schema.
Tip:
To create a new model and new topology objects at the same time, use the procedure described in Creating a Model and Topology Objects.
To create a Model:
The model is created, but contains no datastore yet.
Creating a Model and Topology Objects
You can create a Model along with other topology objects, including new data servers, contexts and schemas, at the same time:
Reverse-engineering a Model
To automatically populate datastores into the model you need to perform a reverse-engineering for this model.
Standard Reverse-Engineering
A Standard Reverse-Engineering uses the capacities of the JDBC driver used to connect the data server to retrieve the model metadata.
To perform a Standard Reverse- Engineering:
-
In the Reverse Engineer tab of your Model:
-
Select Standard.
-
Select the Context used for the reverse-engineering
-
Select the Types of objects to reverse-engineer. Only object of these types will be taken into account by the reverse-engineering process.
-
Enter in the Mask field the mask of tables to reverse engineer. The mask selects the objects to reverse. This mask uses the SQL LIKE syntax. The percent (
%
) symbol means zero or more characters, and the underscore (_
) symbol means one character. -
Optionally, you can specify the characters to remove for the table alias. These are the characters to delete in order to derive the alias. Note that if the datastores already exist, the characters specified here will not be removed from the table alias. Updating a datastore is not applied to the table alias.
-
-
In the Selective Reverse-Engineering tab select Selective Reverse-Engineering, New Datastores, Existing Datastores and Objects to Reverse Engineer.
-
A list of datastores to be reverse-engineered appears. Leave those you wish to reverse-engineer checked.
-
Select Save from the File main menu.
-
Click Reverse Engineer in the Model toolbar menu.
-
Oracle Data Integrator launches a reverse-engineering process for the selected datastores. A progress bar indicates the progress of the reverse-engineering process.
The reverse-engineered datastores appear under the model node in the Models panel.
Customized Reverse-Engineering
A Customized Reverse-Engineering uses a Reverse-engineering Knowledge Module (RKM), to retrieve metadata for a specific type of technology and create the corresponding datastore definition in the data model.
For example, for the Oracle technology, the RKM Oracle accesses the database dictionary tables to retrieve the definition of tables, attributes, keys, etc., that are created in the model.
Note:
The RKM must be available as a global RKM or imported into the project. For more information on KM import, see the Creating an Integration Project chapter in Developing Integration Projects with Oracle Data Integrator.
To perform a Customized Reverse-Engineering using a RKM:
You can review the reverse-engineering tasks in the Operator Navigator. If the reverse-engineering process completes correctly, reverse-engineered datastores appear under the model node in the Models panel.
Creating and Reverse-Engineering a Datastore
This section explains how to create a datastore.
Creating a Datastore
To create a datastore:
The datastore is created. If this is a File datastore, refer to Reverse-Engineering File Datastores for creating attributes for this datastore. It is also possible to manually edit attributes for all datastores. See Adding and Deleting Datastore Attributes for more information.
Reverse-Engineering File Datastores
Oracle Data Integrator provides specific methods for reverse-engineering flat files. The methods for reversing flat files are described below.
Adding and Deleting Datastore Attributes
To add attributes to a datastore:
-
In the Attributes tab of the datastore, click Add Attribute in the tool bar menu.
-
An empty line appears. Fill in the information about the new attribute. You should at least fill in the Name, Datatype and Length fields.
-
Repeat steps 1 and 2 for each attribute you want to add to the datastore.
-
Select Save from the File main menu.
To delete attributes from a datastore:
- In the Attributes tab of the datastore, select the attribute to delete.
- Click the Delete Attribute button. The attribute disappears from the list.
Adding and Deleting Constraints and Filters
Oracle Data Integrator manages constraints on data model including Keys, References, Conditions and Mandatory Attributes. It includes a data integrity framework for ensuring the quality of a data model based on these constraints.
Filters are not constraints but are defined similarly to Conditions. A Filter is not used to enforce a data quality rule on a datastore, but is used to automatically filter this datastore when using it as a source.
Editing and Viewing a Datastore's Data
To view a datastore's data:
-
Select the datastore from the model in the Designer Navigator.
-
Right-click and select View Data.
The data appear in a non editable grid.
To edit a datastore's data:
- Select the datastore from the model in the Designer Navigator.
- Right-click and select Data...
The data appear in an editable grid in the Data Editor. The Refresh button enables you to edit and run again the query returning the datastore data. You can filter the data and perform free queries on the datastore using this method.
It is possible to edit a datastore's data if the connectivity used and the data server user's privileges allow it, and if the datastore structure enables to identify each row of the datastore (PK, etc.).
Note:
The data displayed is the data stored in the physical schema corresponding to the model's logical schema, in the current working context.
Using Partitioning
The partitioning methods supported depend on the technology of the datastore. For example, for the Oracle technology the following partitioning methods are supported: Range, Hash, List.
Once defined on a datastore, partitions can be selected when this datastore is used as a source or a target of a mapping.
Refer to Creating and Using Mappings, for information.
If using the common format designer, you can also create partitions when performing the Generate DDL operation.
Manually Defining Partitions and Sub-Partitions of Model Datastores
Partition information can be reverse-engineered along with the datastore structures or defined manually.
Note:
Standard reverse-engineering does not support the revers-engineering of partitions. To reverse-engineer partitions and sub-partitions, you have to use customized reverse-engineering.
To manually define partitions and sub-partitions for a datastore:
Checking Data Quality in a Model
Data Quality control is essential in ensuring the overall consistency of the data in your information system's applications.
Application data is not always valid for the constraints and declarative rules imposed by the information system. You may, for instance, find orders with no customer, or order lines with no product, etc. In addition, such incorrect data may propagate via integration flows.
Introduction to Data Integrity
Oracle Data Integrator provides a working environment to detect these constraint violation and store them for recycling or reporting purposes.
There are two different main types of controls: Static Control and Flow Control. We will examine the differences between the two.
Static Control
Static Control implies the existence of rules that are used to verify the integrity of your application data. Some of these rules (referred to as constraints) may already be implemented in your data servers (using primary keys, reference constraints, etc.)
With Oracle Data Integrator, you can refine the validation of your data by defining additional constraints, without implementing them directly in your servers. This procedure is called Static Control since it allows you to perform checks directly on existing - or static - data. Note that the active database constraints (these are those that have Defined in the Database and Active selected on the Controls tab) need no additional control from Oracle Data Integrator since they are already controlled by the database.
Flow Control
The information systems targeted by transformation and integration processes often implement their own declarative rules. The Flow Control function is used to verify an application's incoming data according to these constraints before loading the data into these targets.
Setting up flow control is detailed in Creating and Using Mappings.
Checking a Constraint
While creating a constraint in Oracle Data Integrator, it is possible to retrieve the number of lines violating this constraint. This action, referred as Synchronous Control is performed from the Control tab of the given constraint Editor by clicking the Check button.
The result of a synchronous control is not persistent. This type of control is used to quickly evaluate the validity of a constraint definition.
Perform a Static Check on a Model, Sub-Model or Datastore
To perform a Static Check on a Model, Sub-Model or Datastore:
You can review the check tasks in the Operator Navigator. If the control process completes correctly, you can review the erroneous records for each datastore that has been checked.