Oracle8i Data Warehousing Guide
Release 2 (8.1.6)

Part Number A76994-01

Library

Product

Contents

Index

Go to previous page Go to next page

9
Dimensions

The following sections will help you create and manage a data warehouse:

What is a Dimension?

A dimension is a structure that categorizes data in order to enable end users to answer business questions. Commonly used dimensions are Customer, Product, and Time. For example, each store of a video chain might gather and store data regarding sales and rentals of video tapes at the check-out counter. The video chain management can build a data warehouse to analyze the sales of its products across all stores over time and help answer questions such as:

The data in the video chain's data warehouse system has two important components: dimensions and facts. The dimensions are products, locations (stores), promotions, and time. One approach for identifying your dimensions is to review your reference tables, such as a product table that contains everything about a product, or a store table containing all information about a store. The facts are sales (units sold or rented) and profits. A data warehouse contains facts about the sales of each product at each store on a daily basis.

A typical dimensional cube is shown in Figure 9-1:

Figure 9-1 Sample Dimensional Cube


Dimensions do not have to be defined, but spending time creating them can yield significant benefits because they help query rewrite perform more complex types of rewrite. They are mandatory if you use the Summary Advisor to recommend which materialized views to create, drop, or retain. See Chapter 19, "Query Rewrite", for further details regarding query rewrite. See Chapter 15, "Summary Advisor", for further details regarding the Summary Advisor.

Dimension values are usually organized into hierarchies. Going up a level in the hierarchy is called rolling up the data and going down a level in the hierarchy is called drilling down the data. In the video chain example:

Figure 9-2 Geography Dimension


Data analysis typically starts at higher levels in the dimensional hierarchy and gradually drills down if the situation warrants such analysis.

You can visualize the dimensions of a business process as an n-dimensional data cube. In the video chain example, the business dimensions product, location, and time can be represented along the three axes of the cube. Each unit along the product axis represents a different product, each unit along the location axis represents a store, and each unit along the time axis represents a month. At the intersection of these values is a cell that contains factual information, such as units sold and profits made. Higher-level analysis consists of selecting and aggregating the factual information within a subcube, such as rentals of comedy videos in California stores during the second quarter of 1998.

Therefore, the first step towards creating a dimension is to identify the dimensions within your data warehouse and then draw the hierarchies as shown in Figure 9-2. For example, city is a child of state (because you can aggregate city-level data up to state), and state. Using this approach, you should find it easier to translate this into an actual dimension.

In the case of normalized or partially normalized dimensions (a dimension that is stored in more than one table), identify how these tables are joined. Note whether the joins between the dimension tables can guarantee that each child-side row joins with one and only one parent-side row. In the case of denormalized dimensions, determine whether the child-side columns uniquely determine the parent-side (or attribute) columns. These constraints can be enabled with the NOVALIDATE and RELY options if the relationships represented by the constraints are guaranteed by other means. Note that if the joins between fact and dimension tables do not support this relationship, you still gain significant performance advantages from defining the dimension with the CREATE DIMENSION statement. Another alternative, subject to certain restrictions, is to use outer joins in the materialized view definition (that is, in the CREATE MATERIALIZED VIEW statement).

You must not create dimensions in any schema that does not satisfy these relationships, incorrect results can be returned from queries otherwise.

Drilling Across

Drilling across is when you change what you are viewing to another hierarchy at the same level. An example of drilling across is in Figure 9-3:

Figure 9-3 Drilling Across


Going from Group to Region is drilling down, but Region to Stores is drilling across. Stores of > 100 is a non-hierarchical attribute.

The caveat with drilling across is that you will not necessarily enter another hierarchy at the same level, which means the totals may be different.

Creating a Dimension

Before you can create a dimension, tables must exist in the database which contain this dimension data. For example, if you create a dimension called LOCATION, one or more tables must exist which contains the city, state, and country information. In a star-schema data warehouse, these dimension tables already exist. It is therefore a simple task to identify which ones will be used.

You create a dimension using either the CREATE DIMENSION statement or the Dimension wizard in Oracle Enterprise Manager. Within the CREATE DIMENSION statement, use the LEVEL...IS clause to identify the names of the dimension levels.

The location dimension contains a single hierarchy, with arrows drawn from the child level to the parent level. At the top of this dimension graph is the special level ALL, that represents aggregation over all rows. Each arrow in this graph indicates that for any child there is one and only one parent. For example, each city must be contained in exactly one state and each state must be contained in exactly one country. States that belong to more than one country, or that belong to no country, violate hierarchical integrity. Hierarchical integrity is necessary for the correct operation of management functions for materialized views that include aggregates.

For example, you can declare a dimension LOCATION which contains levels CITY, STATE, and COUNTRY:

CREATE DIMENSION location_dim

LEVEL city      IS location.city
LEVEL state     IS location.state
LEVEL country   IS location.country

Each level in the dimension must correspond to one or more columns in a table in the database. Thus, level city is identified by the column city in the table called location and level country is identified by a column called country in the same table.

In this example, the database tables are denormalized and all the columns exist in the same table. However, this is not a prerequisite for creating dimensions. "Using Normalized Dimension Tables" shows how to create a dimension that has a normalized schema design using the JOIN KEY clause.

The next step is to declare the relationship between the levels with the HIERARCHY statement and give that hierarchy a name. A hierarchical relationship is a functional dependency from one level of a hierarchy to the next level in the hierarchy. Using the level names defined previously, the CHILD OF relationship denotes that each child's level value is associated with one and only one parent level value. Again, using the entities in Figure 9-4, the following statements declare a hierarchy LOC_ROLLUP and define the relationship between CITY, STATE, and COUNTRY.

HIERARCHY loc_rollup    (

city     CHILD OF
state    CHILD OF
country     )

In addition to the 1:n hierarchical relationships, dimensions also include 1:1 attribute relationships between the hierarchy levels and their dependent dimension attributes. For example, if there are columns governor and mayor, then the ATTRIBUTE...DETERMINES statement would be state to governor and city to mayor.

In our example, suppose a query were issued that queried by mayor instead of city. Since this 1:1 relationship exists between the attribute and the level, city can be used to identify the data.

ATTRIBUTE city DETERMINES mayor

The complete dimension definition, including the creation of the location table, follows:

CREATE TABLE location  (

city     VARCHAR2(30),
state    VARCHAR2(30),
country  VARCHAR2(30),
mayor    VARCHAR2(30),
governor  VARCHAR2(30)  );

CREATE DIMENSION location_dim
LEVEL city      IS location.city
LEVEL state     IS location.state
LEVEL country   IS location.country
HIERARCHY loc_rollup (
city     CHILD OF
state    CHILD OF
country )
ATTRIBUTE city DETERMINES location.mayor ATTRIBUTE state DETERMINES location.governor;

The design, creation, and maintenance of dimensions is part of the design, creation, and maintenance of your data warehouse schema. Once the dimension has been created, check that it meets these requirements:

Multiple Hierarchies

A single dimension definition can contain multiple hierarchies as illustrated below. Suppose a department store wants to track the sales of certain items over time. The first step is to define the time dimension over which sales will be tracked. Figure 9-4 illustrates a dimension "Time_dim" with three time hierarchies.

Figure 9-4 Time_dim Dimension with Three Time Hierarchies


From the illustration, you can construct the following denormalized Time dimension statement. The associated CREATE TABLE statement is also shown.

CREATE TABLE time (

curDate     DATE,
month       INTEGER,
quarter     INTEGER,
year        INTEGER,
season      INTEGER,
week_num    INTEGER,
dayofweek   VARCHAR2(30),
month_name  VARCHAR2(30)  );

CREATE DIMENSION Time_dim
LEVEL curDate    IS time.curDate 
LEVEL month      IS time.month 
LEVEL quarter    IS time.quarter 
LEVEL year       IS time.year
LEVEL season     IS time.season
LEVEL week_num   IS time.week_num

HIERARCHY calendar_rollup (
curDate       CHILD OF
month         CHILD OF
quarter       CHILD OF
year                    )
HIERARCHY weekly_rollup (
     curDate          CHILD OF
     week_num           )
HIERARCHY seasonal_rollup (
     curDate          CHILD OF
     season             ) 
ATTRIBUTE curDate DETERMINES time.dayofweek ATTRIBUTE month DETERMINES time.month_name;

Using Normalized Dimension Tables

The tables used to define a dimension may be normalized or denormalized and the individual hierarchies can be normalized or denormalized. If the levels of a hierarchy come from the same table, it is called a fully denormalized hierarchy. For example, CALENDAR_ROLLUP in the Time dimension is a denormalized hierarchy. If levels of a hierarchy come from different tables, such a hierarchy is either a fully or partially normalized hierarchy. This section shows how to define a normalized hierarchy.

Suppose the tracking of products is done by product, brand, and department. This data is stored in the tables PRODUCT, BRAND, and DEPARTMENT. The product dimension is normalized because the data entities ITEM_NAME, BRAND_ID, and DEPT_ID are taken from different tables. The clause JOIN KEY within the dimension definition specifies how to join together the levels in the hierarchy. The dimension statement and the associated CREATE TABLE statements for the PRODUCT, BRAND, and DEPARTMENT tables are shown below.

CREATE TABLE product  (

item_name    VARCHAR2(30),
brand_id     INTEGER  );

CREATE TABLE brand (
brand_id     INTEGER,
brand_name   VARCHAR2(30),
dept_id      INTEGER);

CREATE TABLE department (
dept_id       INTEGER,
dept_name     VARCHAR2(30),
dept_type     INTEGER);

CREATE DIMENSION product_dim
LEVEL item      IS product.item_name 
LEVEL brand_id  IS brand.brand_id 
LEVEL dept_id   IS department.dept_id 
 
HIERARCHY merchandise_rollup (
item                   CHILD OF  
brand_id               CHILD OF  
dept_id 

JOIN KEY  product.brand_id REFERENCES brand_id 
JOIN KEY  brand.dept_id    REFERENCES dept_id
) ATTRIBUTE brand_id DETERMINES product.brand_name ATTRIBUTE dept_id DETERMINES (product.dept_name, product.dept_type);

Dimension Wizard

The dimension wizard is automatically invoked whenever a request is made to create a dimension object in Oracle Enterprise Manager. The user is then guided step by step through the information required for a dimension.

A dimension created via the wizard may contain any of the attributes described in "Creating a Dimension", such as join keys, multiple hierarchies and attributes. Some users may prefer to use the wizard because it will graphically display the hierarchical relationships as they are being constructed. When it is time to describe the hierarchy, the dimension wizard will automatically display a default hierarchy based on the column values, which the user can subsequently amend. See the Oracle Enterprise Manager documentation set for further details.

Viewing Dimensions

Dimensions can be viewed through one of two methods:

Using The DEMO_DIM Package

Two procedures are available which allow you to display the dimensions that have been defined. First, the file smdim.sql must be executed to provide the DEMO_DIM package, which includes:

The DEMO_DIM.PRINT_DIM procedure has only one parameter, the name of the dimension to display. The example below shows how to display the dimension TIME_PD.

DEMO_DIM.PRINT_DIM  ('TIME_PD');

To display all of the dimensions that have been defined, call the procedure DEMO_DIM.PRINT_ALLDIMS without any parameters as shown below.

DEMO_DIM.PRINT_ALLDIMS ();

Irrespective of which procedure is called, the output format is identical. A sample display is shown below.

DIMENSION GROCERY.TIME_PD 
LEVEL FISCAL_QTR IS GROCERY.WEEK.FISCAL_QTR  
LEVEL MONTH IS GROCERY.MONTH.MONTH  
LEVEL QUARTER IS GROCERY.QUARTER.QUARTER  
LEVEL TIME_KEY IS GROCERY.TIME.TIME_KEY  
LEVEL WEEK IS GROCERY.WEEK.WEEK  
LEVEL YEAR IS GROCERY.YEAR.YEAR  
HIERARCHY WEEKLY_ROLLUP (  
      TIME_KEY 
       CHILD OF WEEK 
   JOIN KEY GROCERY.TIME.WEEK REFERENCES WEEK 
  ) 
    HIERARCHY FISCAL_ROLLUP (  
             TIME_KEY 
             CHILD OF WEEK 
             CHILD OF FISCAL_QTR 
    JOIN KEY GROCERY.TIME.WEEK REFERENCES WEEK 
    ) 
   HIERARCHY CALENDAR_ROLLUP (  
             TIME_KEY 
             CHILD OF MONTH 
             CHILD OF QUARTER 
             CHILD OF YEAR 
      JOIN KEY GROCERY.TIME.MONTH REFERENCES MONTH 
      JOIN KEY GROCERY.MONTH.QUARTER REFERENCES QUARTER 
      JOIN KEY GROCERY.QUARTER.YEAR REFERENCES YEAR 
    ) 
  
    ATTRIBUTE TIME_KEY DETERMINES GROCERY.TIME.DAY_NUMBER_IN_MONTH 
    ATTRIBUTE TIME_KEY DETERMINES GROCERY.TIME.DAY_NUMBER_IN_YEAR 
    ATTRIBUTE WEEK DETERMINES GROCERY.WEEK.WEEK_NUMBER_OF_YEAR 
    ATTRIBUTE MONTH DETERMINES GROCERY.MONTH.FULL_MONTH_NAME

Using Oracle Enterprise Manager

All of the dimensions which exist in the data warehouse can be easily viewed using Oracle Enterprise Manager. Selecting the Dimension object from within the Schema icon, will display all of the dimensions. Selecting a specific dimension will graphically display its hierarchy, levels and any attributes which may have been defined. See the Oracle Enterprise Manager documentation set for further details.

Dimensions and Constraints

Constraints play an important role with dimensions. In most cases, full referential integrity is enforced on the operational databases, and operational procedures can be used to ensure that data flowing into the data warehouse (after data cleansing) never violates referential integrity; so, in practice, referential integrity constraints may or may not be enabled in the data warehouse.

It is recommended that constraints be enabled and, if validation time is a concern, then the NOVALIDATE clause should be used as follows:

ENABLE NOVALIDATE CONSTRAINT pk_time;

Primary and foreign keys should be implemented as described. Referential integrity constraints and NOT NULL constraints on the fact tables provide information that query rewrite can use to extend the usefulness of materialized views.

In addition, the RELY clause should be used to advise query rewrite that it can rely upon the constraints being correct as follows:

ALTER TABLE time MODIFY CONSTRAINT pk_time RELY;

Validating a Dimension

If the relationships described by the dimensions are incorrect, wrong results could occur. Therefore, you should verify the relationships specified by CREATE DIMENSION using the DBMS_OLAP.VALIDATE_DIMENSION procedure periodically.

This procedure is easy to use and only has four parameters:

The following example validates the dimension time_fn in the Grocery schema

DBMS_OLAP.VALIDATE_DIMENSION ('TIME_FN', 'GROCERY', FALSE, TRUE);

All exceptions encountered by the VALIDATE_DIMENSION procedure are placed in the table MVIEW$_EXCEPTIONS, which is created in the user's schema. Querying this table will identify the exceptions that were found. For example:

OWNER    TABLE_NAME  DIMENSION_NAME RELATIONSHIP BAD_ROWID          
-------- ----------- -------------- ------------ ------------------ 
GROCERY  MONTH       TIME_FN        FOREIGN KEY  AAAAuwAAJAAAARwAAA

However, rather than query this table, it may be better to query as follows where the rowid of the invalid row is used to retrieve the actual row that has violated the constraint. In this example, the dimension TIME_FD is checking a table called month. It has found a row that violates the constraints and using the rowid, we can see exactly which row in the month table is causing the problem.

SELECT * FROM month
WHERE rowid IN (SELECT bad_rowid FROM mview$_exceptions);

MONTH      QUARTER    FISCAL_QTR YEAR       FULL_MONTH_NAME      MONTH_NUMB
---------- ---------- ---------- ---------- -------------------- ----------
199903     19981      19981      1998       March                3

Altering a Dimension

Some modification can be made to the dimension using the ALTER DIMENSION statement. You can add or drop a level, hierarchy, or attribute from the dimension using this command.

Referring to the time dimension in Figure 9-4, you could remove the attribute month, drop the hierarchy weekly_rollup, and remove the level week. In addition, you could add a new level called qtr1.

ALTER DIMENSION time_dim DROP ATTRIBUTE month;
ALTER DIMENSION time_dim DROP HIERARCHY weekly_rollup;
ALTER DIMENSION time_dim DROP LEVEL week;
ALTER DIMENSION time_dim ADD LEVEL qtr1 IS time.fiscal_qtr;

A dimension becomes invalid if you change any schema object which the dimension is referencing. For example, if the table on which the dimension is defined is altered, the dimension becomes invalid.

To check the status of a dimension, view the contents of the column invalid in the table ALL_DIMENSIONS.

To revalidate the dimension, use the COMPILE option as follows:

ALTER DIMENSION time_dim COMPILE;

Dimensions can also be modified using Oracle Enterprise Manager.

Deleting a Dimension

A dimension is removed using the DROP DIMENSION command. For example:

DROP DIMENSION time_dim;

Dimensions can also be deleted using Oracle Enterprise Manager.


Go to previous page Go to next page
Oracle
Copyright © 1996-2000, Oracle Corporation.

All Rights Reserved.

Library

Product

Contents

Index