|Oracle8i Data Warehousing Guide
Release 2 (8.1.6)
Part Number A76994-01
This chapter discusses extracting, transporting, and transforming in a data warehousing environment:
You need to load your data warehouse regularly so that it can serve its purpose of allowing business analysis. To do this, however, data from one or more operational systems needs to be extracted and copied into the warehouse. This process of reading and preparing the data is relatively difficult, and needs to be performed on a regular basis.
The process of extracting data from source systems and bringing it into the data warehouse is commonly called ETT, which stands for Extraction, Transformation, and Transportation. Indeed, the acronym ETT is perhaps too simplistic, since it omits one important phase, the loading of the data warehouse, and implies that each of other phases of the process is distinct. Rather than introduce new terminology, we will instead refer to the entire process as ETT. You should understand that ETT refers to a broad process, and not three well-defined steps.
The emphasis in many of the examples in this section is scalability. While many longtime users of Oracle are experts in programming complex data-transformation logic using PL/SQL, these chapters seek to suggest alternatives for many such data-manipulation operations, with a particular emphasis on implementations that leverage Oracle's existing parallel-query infrastructure.
Building and maintaining the ETT process is often considered one of the most difficult and resource-intensive portions of a data-warehouse project. Many data-warehousing projects use ETT tools to manage this process. Oracle Warehouse Builder, for example, provides ETT capabilities. Other data warehouse builders choose to create their own ETT tools and processes.
Oracle8i is not an ETT tool, and does not provide a complete solution for ETT. However, Oracle8i does provide a rich set of capabilities which can be leveraged by both ETT tools and home-grown ETT solutions. Oracle8i offers techniques for transporting data between Oracle databases, for transforming large volumes of data, and for quickly loading new data into a data warehouse.
Many examples in the ETT section of this guide (Chapter 10 through Chapter 15) use the same, simple star schema. This schema consists of a single fact table (called sales) partitioned by month and four dimension tables. The definitions of these tables follow:
CREATE TABLE product ( product_id VARCHAR2(6) NOT NULL, product_oe_id VARCHAR2(6), product_name VARCHAR2(60), product_language VARCHAR2(30), product_media VARCHAR2(8), product_category VARCHAR2(30) ) CREATE TABLE time ( time_id DATE NOT NULL, time_month VARCHAR2(5) NOT NULL, time_quarter VARCHAR2(4) NOT NULL, time_year NUMBER NOT NULL, time_dayno NUMBER NOT NULL, time_weekno NUMBER NOT NULL, time_day_of_week VARCHAR2(9) NOT NULL ) CREATE TABLE customer ( customer_id VARCHAR2(6) NOT NULL, customer_name VARCHAR2(25), customer_address VARCHAR2(40), customer_city VARCHAR2(30), customer_subregion VARCHAR2(30), customer_region VARCHAR2(15), customer_postalcode NUMBER(9), customer_age NUMBER(2), customer_gender VARCHAR2(1) ) CREATE TABLE channel ( channel_id VARCHAR2(2) NOT NULL, channel_description VARCHAR2(10) ) CREATE TABLE sales ( sales_transaction_id VARCHAR2(8) NOT NULL, sales_product_id VARCHAR2(4) NOT NULL, sales_customer_id VARCHAR2(6) NOT NULL, sales_time_id DATE NOT NULL, sales_channel_id VARCHAR2(4) NOT NULL, sales_quantity_sold NUMBER NOT NULL, sales_dollar_amount NUMBER NOT NULL) )