|Oracle® Warehouse Builder User's Guide
11g Release 1 (11.1)
|PDF · Mobi · ePub|
Oracle Warehouse Builder is a flexible tool that enables you to design and deploy various types of data integration strategies. Projects commonly implemented using Warehouse Builder involve mission critical operational systems, migration scenarios, integration of disparate operational systems, and traditional data warehousing. This chapter provides an introduction to using Warehouse Builder. It provides a starting point for using Warehouse Builder for the first time user and serves as a road map to the documentation.
If you have already read the Oracle Database 2 Day + Data Warehousing Guide, you may recognize some of the same content repeated here in an expanded format and with additional information for long-term planning and maintenance of not only data warehouses but data integration solutions in general.
This chapter includes the following topics:
Oracle Warehouse Builder is comprised of a set of graphical user interfaces to assist you in implementing solutions for integrating data. In the process of designing solutions, you create various objects that are stored as metadata in a centralized repository, known as a workspace.
The workspace is hosted on an Oracle Database. As a general user, you do not have full access to the workspace. Instead, you can access those workspaces to which you have been granted access.
You log in to a workspace by starting the Design Center, which is the primary graphical user interface. Use the Design Center to import source objects, design ETL processes such as mappings, and ultimately define the integration solution.
A mapping is an object in which you define the flow of data from sources to targets. Based on a mapping design, Warehouse Builder generates the code required to implement the ETL logic. In a data warehousing project, for example, the integration solution is a target warehouse. In that case, the mappings you create in the Design Center ultimately define a target warehouse.
After you complete the design of a mapping and prompt Warehouse Builder to generate the code, the next step is to deploy the mapping. Deployment is the process of copying the relevant metadata and code you generated in the Design Center to a target schema. The target schema is generically defined as the Oracle Database which will execute the ETL logic you designed in the Design Center. Specifically, in a traditional data warehousing implementation, the data warehouse is the target schema and the two terms are interchangeable.
Figure 2-1 illustrates the Warehouse Builder components.
As previously noted, the Design Center is the primary user interface. It is also a centralized interface in that you can start from it all the client based tools, including the Control Center Manager. A secondary user interface is the web-based Repository Browser. In addition to browsing design metadata and auditing execution data, you can view and create reports.
For the purposes of this illustration, the target schema and the repository exist on the same Oracle Database; however, in practice, target schemas often exist on separate databases. To deploy design objects and subsequently execute the generated code, use the Control Center Manager, which is the client interface that interacts with the target schema through the control center service.
Use Warehouse Builder to create a data warehouse in the following recommended order:
Before you can use any of the Warehouse Builder client components, first ensure you have access to a Warehouse Builder workspace.
To begin using Warehouse Builder, take the following steps:
Install the Warehouse Builder software and create the necessary workspaces as described in the Oracle Warehouse Builder Installation and Administration Guide.
If an administrator has previously completed the installation, contact that person for the required connection information.
Start the Design Center.
On a Windows platform, from the Start menu, select Programs. Select the Oracle home in which Warehouse Builder is installed, then Warehouse Builder, and then Design Center.
On a Linux platform, run
owbclient.sh located in the owb/bin/unix directory in the Oracle home for Warehouse Builder.
Figure 2-2 shows the Design Center with the top level folders in each of its three explorers expanded.
Use the Project Explorer to manage design objects for a given workspace. The design objects are organized into projects which provide a means for structuring the objects for security and reusability. Each project contains nodes for each type of design object that you can create or import.
Use the Connection Explorer to establish connections between the Warehouse Builder workspace and databases, data files, and applications.
Use the Global Explorer to manage objects that are common to all projects in a workspace and to administer security. Note that the Security node is visible to users who have an administrator role as discussed in the Oracle Warehouse Builder Installation and Administration Guide.
To prepare the Design Center, complete the following steps:
In the Project Explorer, identify the project to be used.
If you are satisfied with the single default project, MY_PROJECT, continue with the next step.
Alternatively, you can rename MY_PROJECT or define more projects. Each project you define is organized in the same fashion with nodes for databases, files, applications, and so on. For a different organization, consider creating optional collections as described in "Defining Collections".
Connect to source and target data objects.
In the Connection Explorer, establish these connections by defining locations. Expand the Location node and the nodes within it to gain a general understanding of the types of source and targets you can access from Warehouse Builder.
To create a location, right-click the appropriate node and select New. Fill in the requested connection information and select Test Connection. In this step, you merely establish connections to sources and targets. You do not move data or metadata until subsequent steps.
For more information about locations see "About Locations".
Identify the target schema.
Although you can use a flat file as a target, the most common and recommended scenario is to use the Oracle Database as the target schema.
To define the target schema, begin by creating a module. Modules are grouping mechanisms in the Project Explorer that correspond to locations in the Connection Explorer. The Oracle target module is the first of several modules you create in Warehouse Builder.
In the Project Explorer, expand the Databases node. Right-click Oracle and select New. The Create Module wizard displays. Set the module type to Warehouse Target and specify whether the module will be used in development, quality assurance, or production. This module status is purely descriptive and has no bearing on subsequent steps you take.
When you complete the wizard, the target module displays with nodes for mappings, transformations, tables, cubes and the various other types of objects you utilize to design the target warehouse.
Create a separate Oracle module for the data sources. (Optional)
At your discretion, you can either create another Oracle module to contain Oracle source data or proceed to the next step.
Identify the execution environment.
Under the Connection Explorer, notice the Control Centers node. A control center is an Oracle Database schema that manages the execution of the ETL jobs you design in the Design Center in subsequent steps.
During installation, Warehouse Builder creates the
DEFAULT_CONTROL_CENTER schema on the same database as the workspace.
If you choose to utilize the default execution environment, continue to the next step. Alternatively, you can define new control centers at any time. For more information and instructions, see "Deploying to Target Schemas and Executing ETL Logic".
Prepare development, test, and production environments. (Optional)
Thus far, these instructions describe the creation of a single project corresponding to a single execution environment. You can, however, reuse the logical design of this project in different physical environments such as testing or production environments.
Deploy a single data system to several different host systems or to various environments, by creating additional configurations.
See Also:Oracle Warehouse Builder Installation and Administration Guide for more information about creating additional configurations.
Adjust the client preference settings as desired or accept the default preference settings and proceed to the next step.
From the main menu in the Design Center, select Tools and then Preferences.
As a new user, you may be interested in setting the Environment Preferences, the locale under Appearance Preferences, and the naming mode under Naming Preferences. For information on all the preferences, see "Setting Preferences".
Create modules for each type of design object you intend to import metadata.
In the Project Explorer, select a node such as Files. For that node, determine the locations from which you intend to ultimately extract data. Now create a module for each relevant location by right-clicking on the node and select New.
Import metadata from the various data sources.
Right-click the module and select Import to extract metadata from the associated location. Warehouse Builder displays a wizard to guide you through the process of importing data.
For an example and additional information on importing data objects, see "Identifying Data Sources and Importing Metadata".
For the metadata you imported, profile its corresponding data. (Optional)
Before continuing to the next step, consider using the data profiling option to ensure data quality as described in "Understanding Data Quality Management".
Data can only be transformed into actionable information when you are confident of its reliability. Before you load data into your target system, you must first understand the structure and the meaning of your data, and then assess the quality.
Consider using the data profiling option to better understand the quality of your source data. Next, correct the source data and establish a means to detect and correct errors that may arise in future loads. For more information, on data profiling and data quality, see "Understanding Data Quality Management".
Create and design the data objects for the Oracle target module.
In previous steps, you may have already imported existing target objects. For new target objects, design any of the dimensional or relational objects listed in Table 13-1.
To create data objects, you can either start the appropriate wizard or use the Data Object Editor. To use a wizard, right-click the node for the desired object and select New. After using a wizard, you may want to modify the object in the editor. In that case, right-click the object and select Open Editor.
For additional information, see "Designing the Target Schema".
As you design objects, be sure to frequently validate the design objects.
You can validate objects as you create them, or validate a group of objects together. In the Project Explorer, select one or more objects or modules, then click the Validate icon.
Examine the messages in the Validation Results window. Correct any errors and try validating again.
To redisplay the most recent validation results at a later time, select Validation Messages from the View menu.
For additional information, see "Validating Data Objects".
Configure the data objects.
Configuring data objects sets the physical properties of the object. You must not generate and deploy data objects without specifying the physical property values.
When you create data objects, Warehouse Builder assigns default configuration property values based on the type of object. In most cases, these default values are appropriate. You can edit and modify the configuration property values of objects according to your requirement. For example, you configure a table to specify the name of the tablespace in which it is created.
To configure a data object, select the data object in the Project Explorer and click the Configure icon. Or right-click the data object in the Project Explorer and select Configure.
When satisfied with the design of the target objects, generate the code.
Generation produces a DDL or PL/SQL script to be used in subsequent steps to create the data objects in the target schema. For more information about generation, see "Generating Data Objects".
In the Data Object Editor, you can generate code for a single object by clicking the Generate icon.
In the Project Explorer, select one or more objects or modules, then click the Generate icon. Examine the messages in the Generation Results window. To redisplay the most recent generation results at a later time, select Generated Scripts from the View menu.
You can save the generated script as a file and optionally deploy it outside Warehouse Builder.
Design mappings that define the flow of data from a source to target objects.
In the Project Explorer, expand the Oracle target module, right-click the Mappings node and select New.
The Mapping Editor enables you to define the flow of data visually. You can drag-and-drop operators onto the canvas, and draw lines that connect the operators. Operators represent both data objects and functions such as filtering, aggregating, and so on.
Follow the Instructions for Defining Mappings, concluding with generating the code for the mapping.
To manage dependencies between mappings, see "Designing Process Flows".
Recall that deployment is the process of copying the relevant metadata and code you generated in the Design Center to a target schema. This step is necessary to enable the target schema to execute ETL logic such as mappings.
To deploy and execute, complete the following steps:
Deploy objects from either the Design Center or Control Center Manager.
In this step, you define the objects in the target schema. You need do this only once.
The simplest approach is to deploy directly from the Design Center by selecting an object and clicking the Deploy icon. In this case, Warehouse Builder deploys the objects with the default deployment settings.
Alternatively, if you want more control and feedback on how Warehouse Builder deploys objects, from the Design Center menu select Tools, then Control Center Manager.
Whether you deploy objects from the Design Center or the Control Center Manager, be sure to deploy all associated objects. For example, when deploying a mapping, also deploy the target data objects such as tables that you defined and any associated process flows or other mappings.
For more information, see "Deploying to Target Schemas and Executing ETL Logic".
Execute the ETL logic to populate the target warehouse.
In this step, you move data for the first time. Repeat this step each time you want to refresh the target with new data.
You have two options for executing the ETL logic in mappings and process flows. You can create and deploy a schedule as described in "Process for Defining and Using Schedules". Or you can execute jobs manually as described in "Starting ETL Jobs".
It is essential to ensure the quality of data entering your data warehouse over time. Data auditors enable you to monitor the quality of incoming data by validating incoming data against a set of data rules and determining if the data confirms to the business rules defined for your data warehouse. For more information about data auditors and data rules, see "Understanding Data Quality Management".
Although the Control Center Manager displays histories for both deployment and execution, the Repository Browser is the preferred interface for monitoring and reporting on Warehouse Builder operations. For more information, see "Auditing Deployments and Executions"