|Oracle8i Data Warehousing Guide
Release 2 (8.1.6)
Part Number A76994-01
This chapter provides an overview of the Oracle implementation of data warehousing. Its sections include:
Note that this book is meant as a supplement to standard texts covering data warehousing, and is not meant to reproduce in detail material of a general nature. This book, therefore, focuses on Oracle-specific material. Two standard texts of a general nature are:
A data warehouse is a relational database that is designed for query and analysis rather than transaction processing. It usually contains historical data that is derived from transaction data, but it can include data from other sources. It separates analysis workload from transaction workload and enables an organization to consolidate data from several sources.
In addition to a relational database, a data warehouse environment often consists of an Extraction, Transportation, and Transformation (ETT) solution, an online analytical processing (OLAP) engine, client analysis tools, and other applications that manage the process of gathering data and delivering it to business users. See Chapter 10, "ETT Overview", for further information regarding the ETT process.
A common way of introducing data warehousing is to refer to Inmon's characteristics of a data warehouse, who says that they are:
Data warehouses are designed to help you analyze your data. For example, you might want to learn more about your company's sales data. To do this, you could build a warehouse concentrating on sales. In this warehouse, you could answer questions like "Who was our best customer for this item last year?" This kind of focus on a topic, sales in this case, is what is meant by subject oriented.
Integration is closely related to subject orientation. Data warehouses need to have the data from disparate sources put into a consistent format. This means that naming conflicts have to be resolved and problems like data being in different units of measure must be resolved.
Nonvolatile means that the data should not change once entered into the warehouse. This is logical because the purpose of a warehouse is to analyze what has occurred.
Most business analysis requires analyzing trends. Because of this, analysts tend to need large amounts of data. This is very much in contrast to OLTP systems, where performance requirements demand that historical data be moved to an archive.
Figure 1-1 illustrates some of the key differences between a data warehouse's model and an OLTP system's.
One major difference between the types of system is that data warehouses are not usually in third-normal form.
Data warehouses and OLTP systems have vastly different requirements. Here are some examples of the notable differences between typical data warehouses and OLTP systems:
Data warehouses are designed to accommodate ad hoc queries. The workload of a data warehouse may not be completely understood in advance, and the data warehouse is optimized to perform well for a wide variety of possible query operations.
OLTP systems support only predefined operations. The application may be specifically tuned or designed to support only these operations.
The data in a data warehouse is updated on a regular basis by the ETT process (often, every night or every week) using bulk data-modification techniques. The end users of a data warehouse do not directly update the data warehouse.
In an OLTP system, end users routinely issue individual data-modification statements in the database. The OLTP database is always up-to-date, and reflects the current state of each business transaction.
Data warehouses often use denormalized or partially denormalized schemas (such as a star schema) to optimize query performance.
OLTP systems often use fully normalized schemas to optimize update/insert/delete performance, and guarantee data consistency.
A typical data warehouse query may scan thousands or millions of rows. For example, "Find the total sales for all customers last month."
A typical OLTP operation may access only a handful of records. For example, "Retrieve the current order for a given customer."
Data warehouses usually store many months or years of historical data. This is to support historical analysis of business data.
OLTP systems usually store only a few weeks' or months' worth of data. The OLTP system only stores as much historical data as is necessary to successfully meet the current transactional requirements.
As you might expect, data warehouses and their architectures can vary depending upon the specifics of each organization's situation. Figure 1-2 shows the most basic architecture for a data warehouse. In it, a data warehouse is fed from one or more source systems, and end users directly access the data warehouse.
Figure 1-3 illustrates a more complex data warehouse environment. In addition to a central database, there is a staging system used to cleanse and integrate data, as well as multiple data marts, which are systems designed for a particular line of business.