Oracle® Fusion Middleware Getting Started with Oracle Data Integrator Release 11g (11.1.1) E12641-01 |
|
Previous |
Next |
This chapter describes how to implement data quality control. An introduction to data integrity control is provided.
This chapter includes the following sections:
Data integrity control is essential in ensuring the overall consistency of the data in your information system's applications.
Application data is not always valid for the constraints and declarative rules imposed by the information system. You may, for instance, find orders with no customer, or order lines with no product, and so forth.
Oracle Data Integrator provides a working environment to detect these constraint violations and to store them for recycling or reporting purposes.
There are two different types of controls: Static Control and Flow Control. We will examine the differences between the two.
Static Control
Static Control implies the existence of rules that are used to verify the integrity of your application data. Some of these rules (referred to as constraints) may already be implemented in your data servers (using primary keys, reference constraints, etc.)
With Oracle Data Integrator, you can enhance the quality of your data by defining and checking additional constraints, without declaring them directly in your servers. This procedure is called Static Control since it allows you to perform checks directly on existing - or static - data.
Flow Control
The information systems targeted by transformation and integration processes often implement their own declarative rules. The Flow Control function is used to verify an application's incoming data according to these constraints before loading the data into these targets. The flow control procedure is detailed in the "Interfaces" chapter.
Benefits
The main advantages of performing data integrity checks are the following:
Increased productivity by using the target database for its entire life cycle. Business rule violations in the data slow down application programming throughout the target database's life-cycle. Cleaning the transferred data can therefore reduce application programming time.
Validation of the target database's model. The rule violations detected do not always imply insufficient source data integrity. They may reveal a degree of incompleteness in the target model. Migrating the data before an application is rewritten makes it possible to validate a new data model while providing a test database in line with reality.
Improved quality of service for the end-users.
Ensuring data integrity is not always a simple task. Indeed, it requires that any data violating declarative rules must be isolated and recycled. This implies the development of complex programming, in particular when the target database incorporates a mechanism for verifying integrity constraints. In terms of operational constraints, it is most efficient to implement a method for correcting erroneous data (on the source, target, or recycled flows) and then to reuse this method throughout the enterprise.
This example guides you through the data integrity audit process (Static Control).
The Orders Application - HSQL application contains data that do not satisfy business rule constraints on a number of different levels. The objective is to determine which data in this application does not satisfy the constraints imposed by the information system.
This section includes the following topics:
Some data in our source may be inconsistent. There may be constraints in the target table that are not implemented in the source table or there may be supplementary rules that you wish to add. In our case we have two constraints that we want to enforce on the SRC_CUSTOMER table:
Customers must be over 21 years of age. However there could be some records corresponding to younger customers in the input table.
The CITY_ID column must refer to an entry in the SRC_CITY table. However there could be some values that do not exist in the city table.
We want to determine which rows do not satisfy these two constraints and automatically copy the corresponding invalid records into an error table for analysis.
Enforcing these types of rules requires the use of a check constraint (also referred to as a condition), as well as a reference constraint between the SRC_CITY and SRC_CUSTOMER tables.
This section describes how to create the following constraints:
Creating an age constraints consists in adding a data validity condition on a column.
To create the age constraint:
In the Models accordion in Designer Navigator, expand the Orders Application - HSQL model.
Expand the SRC_CUSTOMER datastore.
Right-click the Constraints node and select New Condition as shown in Figure 5-1.
In the Definition tab of the Condition Editor:
In the Name field, enter the name of your condition. For example: AGE > 21
.
From the Type list, select Oracle Data Integrator Condition.
In the Where clause field, enter the following SQL code:
SRC_CUSTOMER.AGE > 21
Notes:
|
In the Message field, specify the error message as it will appear in your error table:
Customer age is not over 21!
Figure 5-2 shows the Condition Editor.
From the File main menu, select Save to save the condition.
This section describes how to create a reference constraint based on the CITY_ID column between the SRC_CUSTOMER table and the SRC_CITY table.
This constraint allows checking that customers are located in a city that exists in the SRC_CITY table.
To create the reference constraint:
In the Models accordion in Designer Navigator, expand the Orders Application - HSQL model.
Expand the SRC_CUSTOMER datastore.
Right-click the Constraints node and select New Reference as shown in Figure 5-3.
In the Definition tab of the Reference Editor:
From the Type list, select User Reference.
From the Model list in the Parent Model/Table section, select Orders Application - HSQL. This is the data model containing the table you want to link to.
From the Table list, select SRC_CITY. This is the table you want to link to.
Figure 5-4 shows the Reference Editor.
In the Reference Editor, go to the Columns tab.
On the Columns tab, click Add as shown in Figure 5-5.
A new row is inserted in the columns table.
In this step you define the matching columns:
Click on the row that appears. This will bring up a drop-down list containing all of the columns in the appropriate table.
From the Columns (Foreign Table) list, select CITY_ID.
From the Columns (Primary Table) list, select CITY_ID.
Figure 5-6 shows the Columns tab of the Reference Editor with the selected matching columns.
Note that in this example the Foreign Table is SRC_CUSTOMER and the Primary Table is SRC_CITY. Note also that it is not required for foreign keys that the column names of the Foreign Table and the Primary Table match. It just happens that they do in this example.
Select File > Save to save this reference.
Tip: You can alternately use the [CTRL - S] shortcut to save the current Editor. |
Running the static control verifies the constraints defined on a datastore. You can now verify the data in the SRC_CUSTOMER datastore against the constraints defined in Section 5.2.3, "Creating Constraints".
To run the static control:
In the Models accordion in Designer Navigator, right-click the SRC_CUSTOMER datastore.
Select Control > Check.
The Execution dialog is displayed as shown in Figure 5-7.
Click OK in the Execution dialog.
The Information Dialog is displayed as shown in Figure 5-8.
Click OK in the Information Dialog.
Oracle Data Integrator automatically generates all of the code required to check your data and start an execution session.
Through Operator Navigator, you can view your execution results and manage your development executions in the sessions.
To view the execution results of your control:
In the Session List accordion in Operator Navigator, expand the All Executions node.
The Session List displays all sessions organized per date, physical agent, status, keywords, and so forth.
Refresh the displayed information clicking Refresh in the Operator Navigator toolbar.
The log for one execution session appears as shown in Figure 5-9.
The log comprises 3 levels:
The session (corresponds to an execution of a scenario, an interface, a package or a procedure undertaken by an execution agent)
The step (corresponds to a checked datastore, an interface, a procedure or a step in a package or in a scenario)
The task (corresponds to an elementary task of the interface, process or check)
This section describes how to determine the invalid records. These are the records that do not satisfy the constraints and has been rejected by the static control.
This section includes the following topics:
To determine the number of invalid records:
In the Session List accordion in Operator Navigator, expand the All Executions node and the SRC_CUSTOMER session.
Double-click the SRC_CUSTOMER step to open the Session Step Editor.
The Record Statistics section details the changes performed during the static control. These changes include the number of inserts, updates, deletes, errors, and the total number of rows handled during this step.
Figure 5-10 shows the Session Step Editor of the SRC_CUSTOMER step.
The number of invalid records is listed in the No. of Errors field. Note that the static control of the SRC_CUSTOMER table has revealed 9 invalid records. These records have been isolated in an error table. See Section 5.2.6.2, "Reviewing the Invalid Records" for more information.
You can access the invalid records by right-clicking on the table in your model and selecting Control > Errors...
To review the error table of the static control on the SRC_CUSTOMER table:
In Designer Navigator, expand the Orders Application - HSQL model.
Right-click the SRC_CUSTOMER datastore.
Select Control > Errors...
The Error Table Editor is displayed as shown in Figure 5-11.
The records that were rejected by the check process are the following:
8 records in violation of the AGE > 21 constraint (the actual age of the customer is 21 or younger, see the AGE column for details).
1 record in violation of the FK_CITY_CUSTOMER constraint (The CITY_ID value does not exist in the SRC_CITY table).
You can view the entire record in this Editor. This means that you can instantly see which values are incorrect, for example the invalid CITY_ID value in the top record.
Note that the error message that is displayed is the one that you have defined when setting up the AGE > 21 constraint in Section 5.2.3.1, "Age Constraint".
Now that the static controls have been run on the source data, you are ready to move on to the implementation of integration interfaces.