3 Identifying Data Sources and Importing Metadata

In general, the source systems for a data warehouse are typically transaction processing applications. A sales analysis data warehouse, for instance, extracts data from an order entry system that records current order activities.Designing the extraction process can be problematic. If the source system is complex and poorly documented, then determining which data to extract can be difficult. Moreover, it is usually not possible to modify the source system, nor adjust its performance or availability. To address these problems, first import the metadata.

Metadata is the data that describes the contents of a given object in a data set. The metadata for a table, for instance, indicates the data type for each column.

For Oracle Database customers, the recommended tool of choice for importing metadata is Oracle Warehouse Builder. After you import the metadata into Warehouse Builder, you can annotate the metadata and design an extraction strategy independently from the transaction processing application.

This section describes how to use Warehouse Builder to import metadata. It includes the following topics:

General Steps for Importing Metadata from Sources

To import metadata, complete the following steps:

  1. Complete the instructions for "Accessing Oracle Warehouse Builder".

  2. Download and install the Oracle Warehouse Builder demonstration as described in "Installing the Oracle Warehouse Builder Demonstration".

  3. Identify the Warehouse Builder project.

    Refer to "About Workspaces, Projects, and other Devices in Warehouse Builder".

  4. Follow along with the "Example: Importing Metadata from Flat Files".

    The example explains how to specify where the source files are located and how to launch the Metadata Import Wizard. The process for importing data objects such as files, tables, and views is the same. Therefore, once you complete this example, you gain a general understanding of how to import all data objects into Warehouse Builder.

About Workspaces, Projects, and other Devices in Warehouse Builder

After you install the Warehouse Builder demonstration and launch the Design Center, you log into a workspace. Notice that the user name and workspace name display along the top of the Design Center.

Recall that a workspace includes a set of users working on related projects. Security is an important consideration for determining how many workspaces to create. A common model is to create separate workspaces for development, testing, and production. Using this model, you can allow users such as your developers access to the development and testing workspaces but restrict them from the production workspace.

You can optionally divide a workspace into projects. In practise, however, workspaces typically contain only one active project. This is because a project is simply a container and not suitable for implementing security or establishing subject-oriented groupings. Recall that security is implemented via workspaces. Establishing subject-oriented groupings can be accomplished via modules, as discussed later.

The purpose of a project is to contain the sets of metadata related to an initiative. For data warehousing, therefore, include all the metadata for sources and targets in the same project. Also include all the functions, procedures, transformations, mappings, and other objects required to implement your initiative. The project contain nodes for each type of object that you can either create or import into Warehouse Builder. Expand the different nodes to gain a general understanding of the types of objects you can create or import.

Notice that in the demonstration, the Project Explorer displays on the left side and includes two projects. MY_PROJECT is a default, pre-seeded project. You can use MY_PROJECT as your single active project in the workspace. For the purposes of the demonstration, we work in the OWB_DEMO project.

Example: Importing Metadata from Flat Files

This example illustrates how to import metadata from flat files. Specifically, our objective is to import the metadata into the OWB_DEMO project such that the two files, export.csv and expense_categories.csv, display in the Project Explorer under the Files node.

To import metadata from flat files:

  1. Indicate where the flat files are located as described in "Specifying Locations for the Flat Files".

  2. Organize OWB_DEMO to receive the incoming flat file metadata as described in "Creating Modules in the Project".

  3. Indicate which files to import as described in "Launching the Import Metadata Wizard".

  4. Specify the metadata structure as described in "Using the Flat File Sample Wizard".

  5. Import the metadata for both flat files as described in "Importing the Flat File Data".

Specifying Locations for the Flat Files

Indicate where the flat files are located.

Notice in the Design Center, on the right side is an explorer called Connection Explorer and it includes a node called Locations. Use the locations node to indicate where your source data resides.

Expand the Location node and the nodes within it to gain a general understanding of the types of source and targets you can access from Warehouse Builder

For this example, we right-click in the Files node and select New to define a location for the flat files.

Follow the prompts in the Create File System Location dialog box. Each location you define corresponds to a specific directory on your computer file system. Therefore, consider naming the location based on the drive and directory. For the purposes of this demonstration, name the location C_NEWOWBDEMO_SOURCEFILES.

Creating Modules in the Project

In the Project Explorer, organize OWB_DEMO to receive the incoming flat file metadata.

In a data warehousing implementation, you are likely to have numerous source and target objects. As a means of organizing these various objects, Warehouse Builder requires you to create modules. Each module corresponds to a location that you create in the Connection Explorer.

In this example, since you have only one location for the two flat files, you create one module in the Project Explorer. Right-click on the Files node under OWB_DEMO and select New. Name the new module SALES_EXPENSES. For its location, specify the location you defined in the previous step, C_NEWOWBDEMO_SOURCEFILES.

Launching the Import Metadata Wizard

Launch the Import Metadata Wizard.

Right-click on the module SALES_EXPENSES, select New, and follow the prompts in the Import Metadata Wizard. The prompts in the wizard vary according to the type of module you selected and therefore the type of data object you are importing.

In this example, you selected to import two flat files. On the summary page of the Import Metadata Wizard, select one of the files and then select Sample to launch the Flat File Sample Wizard.

In the next steps, you sample each file in turn and then select Finish on this page to import the metadata.

Figure 3-1 Import Metadata Wizard

Description of Figure 3-1 follows
Description of "Figure 3-1 Import Metadata Wizard"

Using the Flat File Sample Wizard

Follow the prompts in the Flat File Sample Wizard to specify the metadata structure.

Based on the number of characters you specify to be sampled, the wizard reads the flat file data and provides you with suggestions as to the structure of the metadata. If the sample size is too small, the wizard may misinterpret the data and make invalid suggestions. Accordingly, you can modify and refine the settings in the wizard.

For the purposes of this example, the wizard correctly determines that the file is delimited, includes a single record type, and the character set is WE8MSWIN1252. Accept all the default settings presented in the Flat File Wizard.

To become familiar with the various types of files the wizard can sample, notice the options on the wizard pages and also select Help for additional insights.

After sampling the first flat file, return to the Summary and Import page of Metadata Import Wizard to sample the second file.

Accept the default setting in the Flat File Wizard as you did for the previous file.

Importing the Flat File Data

Import the metadata for both flat files.

Return again to the Summary and Import page and select Finish.

When you select Finish, the wizard imports the data based on the selections you made when sampling the data. The two comma separated files now display under the SALES_EXPENSES module which is under the Files node in OWB_DEMO project.

Figure 3-2 The SOURCE Flat File Module

Description of Figure 3-2 follows
Description of "Figure 3-2 The SOURCE Flat File Module"