|Oracle9i Data Warehousing Guide
Release 1 (9.0.1)
Part Number A90237-01
The following sections will help you create and manage a data warehouse:
A dimension is a structure that categorizes data in order to enable users to answer business questions. Commonly used dimensions are
time. For example, each sales channel of a clothing retailer might gather and store data regarding sales and reclamations of their Cloth assortment. The retail chain management can build a data warehouse to analyze the sales of its products across all stores over time and help answer questions such as:
The data in the retailer's data warehouse system has two important components: dimensions and facts. The dimensions are products, customers, promotions, channels, and time. One approach for identifying your dimensions is to review your reference tables, such as a product table that contains everything about a product, or a promotion table containing all information about promotions. The facts are sales (units sold) and profits. A data warehouse contains facts about the sales of each product at on a daily basis.
A typical relational implementation for such a data warehouse is a Star Schema. The fact information is stored in the so-called fact table, whereas the dimensional information is stored in the so-called dimension tables. In our example, each sales transaction record is uniquely defined as per customer, per product, per sales channel, per promotion, and per day (time).
Chapter 17, "Schema Modeling Techniques" for further details
In Oracle9i, the dimensional information itself is stored in a dimension table. In addition, the database object dimension helps to organize and group dimensional information into hierarchies. This represents natural
1:n relationships between columns or column groups (the levels of a hierarchy) that cannot be represented with constraint conditions. Going up a level in the hierarchy is called rolling up the data and going down a level in the hierarchy is called drilling down the data. In the retailer example:
timedimension, months roll up to quarters, quarters roll up to years, and years roll up to all years.
productdimension, products roll up to subcategories, subcategories roll up to categories, and categories roll up to all products.
customerdimension, customers roll up to
city. Then cities rolls up to
state. Then states roll up to
country. Then countries roll up to
subregion. Finally, subregions roll up to
region, as shown in Figure 9-1.
Data analysis typically starts at higher levels in the dimensional hierarchy and gradually drills down if the situation warrants such analysis.
Dimensions do not have to be defined, but spending time creating them can yield significant benefits, because they help query rewrite perform more complex types of rewrite. They are mandatory if you use the Summary Advisor (a GUI tool for materialized view management) to recommend which materialized views to create, drop, or retain.
You must not create dimensions in any schema that does not satisfy these relationships. Incorrect results can be returned from queries otherwise.
Before you can create a dimension object, the dimension tables must exist in the database, containing the dimension data. For example, if you create a customer dimension, one or more tables must exist that contain the city, state, and country information. In a star schema data warehouse, these dimension tables already exist. It is therefore a simple task to identify which ones will be used.
Now you can draw the hierarchies of a dimension as shown in Figure 9-1. For example,
city is a child of
state (because you can aggregate city-level data up to state), and
country. This hierarchical information will be stored in the database object dimension.
In the case of normalized or partially normalized dimension representation (a dimension that is stored in more than one table), identify how these tables are joined. Note whether the joins between the dimension tables can guarantee that each child-side row joins with one and only one parent-side row. In the case of denormalized dimensions, determine whether the child-side columns uniquely determine the parent-side (or attribute) columns. These constraints can be enabled with the
RELY clauses if the relationships represented by the constraints are guaranteed by other means.
You create a dimension using either the
DIMENSION statement or the Dimension Wizard in Oracle Enterprise Manager. Within the
DIMENSION statement, use the
LEVEL clause to identify the names of the dimension levels.
Oracle9i SQL Reference for a complete description of the
This customer dimension contains a single hierarchy with a geograph rollup, with arrows drawn from the child level to the parent level, as shown in Figure 9-1.
Each arrow in this graph indicates that for any child there is one and only one parent. For example, each city must be contained in exactly one state and each state must be contained in exactly one country. States that belong to more than one country, or that belong to no country, violate hierarchical integrity. Hierarchical integrity is necessary for the correct operation of management functions for materialized views that include aggregates.
For example, you can declare a dimension
products_dim, which contains levels
CREATE DIMENSION products_dim LEVEL product IS (products.prod_id) LEVEL subcategory IS (products.prod_subcategory) LEVEL category IS (products.prod_category) ...
Each level in the dimension must correspond to one or more columns in a table in the database. Thus, level
product is identified by the column
prod_id in the products table and level
subcategory is identified by a column called
prod_subcategory in the same table.
In this example, the database tables are denormalized and all the columns exist in the same table. However, this is not a prerequisite for creating dimensions. "Using Normalized Dimension Tables" shows how to create a dimension
customers_dim that has a normalized schema design using the
The next step is to declare the relationship between the levels with the
HIERARCHY statement and give that hierarchy a name. A hierarchical relationship is a functional dependency from one level of a hierarchy to the next level in the hierarchy. Using the level names defined previously, the
OF relationship denotes that each child's level value is associated with one and only one parent level value. The following statements declare a hierarchy
prod_rollup and define the relationship between
In addition to the
1:n hierarchical relationships, dimensions also include
1:1 attribute relationships between the hierarchy levels and their dependent, determined dimension attributes. For example the dimension
times_dim, as defined in Appendix B, has columns
days_in_fiscal_month. Their relationship is defined as follows:
LEVEL fis_month IS TIMES.FISCAL_MONTH_DESC ... ATTRIBUTE fis_month DETERMINES (fiscal_month_desc, fiscal_month_number, fiscal_month_name, days_in_fis_month, end_of_fis_month)
DETERMINES clause relates fis_month to
fiscal_month_name and days_in_fiscal_month. Note that this is a unidirectional determination. It is only guaranteed, that for a specific
fiscal_month, for example,
1999-11, you will find exactly one matching values for
fiscal_month_name, for example,
days_in_fiscal_month, for example, 28. You cannot determine a specific
fiscal_month_desc based on the
fiscal_month_name, which is
November for every fiscal year.
In this example, suppose a query were issued that queried by
fiscal_month_name instead of
fiscal_month_desc. Because this
1:1 relationship exists between the attribute and the level, an already aggregated materialized view containing
fiscal_month_desc can be joined back to the dimension information and used to identify the data.
Chapter 22, "Query Rewrite" for further details of using dimensional information
An exemplary dimension definition follows:
CREATE DIMENSION products_dim LEVEL product IS (products.prod_id) LEVEL subcategory IS (products.prod_subcategory) LEVEL category IS (products.prod_category) HIERARCHY prod_rollup ( product CHILD OF subcategory CHILD OF category ) ATTRIBUTE product DETERMINES (products.prod_name, products.prod_desc, prod_weight_class, prod_unit_of_measure, prod_pack_size,prod_status, prod_list_price, prod_min_price) ATTRIBUTE subcategory DETERMINES (prod_subcategory, prod_subcat_desc) ATTRIBUTE category DETERMINES (prod_category, prod_cat_desc);
The design, creation, and maintenance of dimensions is part of the design, creation, and maintenance of your data warehouse schema. Once the dimension has been created, check that it meets these requirements:
fiscal_month_desc, then a possible attribute relationship would be
Note: The information stored with a dimension objects is only declarative. The above discussed relationships are not enforced with the creation of a dimension object. It is highly recommended to validate any dimension definition with the
DBMS_MVIEW.VALIDATE_DIMENSION procedure, as discussed on "Validating Dimensions".
A single dimension definition can contain multiple hierarchies as illustrated below. Suppose our retailer wants to track the sales of certain items over time. The first step is to define the time dimension over which sales will be tracked. Figure 9-2 illustrates a dimension
times_dim with two time hierarchies.
From the illustration, you can construct the hierarchy of the denormalized
DIMENSION statement as follows. The complete
DIMENSION statement as well as the
TABLE statement are shown in Appendix B, "Sample Data Warehousing Schema".
CREATE DIMENSION times_dim LEVEL day IS TIMES.TIME_ID LEVEL month IS TIMES.CALENDAR_MONTH_DESC LEVEL quarter IS TIMES.CALENDAR_QUARTER_DESC LEVEL year IS TIMES.CALENDAR_YEAR LEVEL fis_week IS TIMES.WEEK_ENDING_DAY LEVEL fis_month IS TIMES.FISCAL_MONTH_DESC LEVEL fis_quarter IS TIMES.FISCAL_QUARTER_DESC LEVEL fis_year IS TIMES.FISCAL_YEAR HIERARCHY cal_rollup ( day CHILD OF month CHILD OF quarter CHILD OF year ) HIERARCHY fis_rollup ( day CHILD OF fis_week CHILD OF fis_month CHILD OF fis_quarter CHILD OF fis_year ) <attribute determination clauses>...
The tables used to define a dimension may be normalized or denormalized and the individual hierarchies can be normalized or denormalized. If the levels of a hierarchy come from the same table, it is called a fully denormalized hierarchy. For example,
cal_rollup in the
times_dim dimension is a denormalized hierarchy. If levels of a hierarchy come from different tables, such a hierarchy is either a fully or partially normalized hierarchy. This section shows how to define a normalized hierarchy.
Suppose the tracking of a customer's location is done by city, state, and country. This data is stored in the tables
customers_dim is partially normalized because the data entities
country_id are taken from different tables. The clause
KEY within the dimension definition specifies how to join together the levels in the hierarchy. The dimension statement is partially shown below. The complete
DIMENSION statement as well as the
TABLE statement are shown in Appendix B, "Sample Data Warehousing Schema".
CREATE DIMENSION customers_dim LEVEL customer IS (customers.cust_id) LEVEL city IS (customers.cust_city) LEVEL state IS (customers.cust_state_province) LEVEL country IS (countries.country_id) LEVEL subregion IS (countries.country_subregion) LEVEL region IS (countries.country_region) HIERARCHY geog_rollup ( customer CHILD OF city CHILD OF state CHILD OF country CHILD OF subregion CHILD OF region JOIN KEY (customers.country_id) REFERENCES country ) ...<attribute determination clause>;
The Dimension Wizard is automatically invoked whenever a request is made to create a dimension object in Oracle Enterprise Manager. You are then guided step by step through the information required for a dimension.
A dimension created using the Wizard can contain any of the attributes described in "Creating Dimensions", such as join keys, multiple hierarchies, and attributes. You might prefer to use the Wizard because it graphically displays the hierarchical relationships as they are being constructed. When it is time to describe the hierarchy, the Wizard automatically displays a default hierarchy based on the column values, which you can subsequently amend.
Dimensions can be viewed through one of two methods:
Two procedures allow you to display the dimensions that have been defined. First, the file
smdim.sql, located under
$ORACLE_HOME/rdbms/demo, must be executed to provide the
DEMO_DIM package, which includes:
PRINT_DIM procedure has only one parameter: the name of the dimension to display. The example below shows how to display the dimension
To display all of the dimensions that have been defined, call the procedure
DEMO_DIM.PRINT_ALLDIMS without any parameters as shown below.
Regardless of which procedure is called, the output format is identical. A sample display is shown here.
DIMENSION SH.PROMO_DIM LEVEL CATEGORY IS SH.PROMOTIONS.PROMO_CATEGORY LEVEL PROMO IS SH.PROMOTIONS.PROMO_ID LEVEL SUBCATEGORY IS SH.PROMOTIONS.PROMO_SUBCATEGORY HIERARCHY PROMO_ROLLUP ( PROMO CHILD OF SUBCATEGORY CHILD OF CATEGORY ) ATTRIBUTE CATEGORY DETERMINES SH.PROMOTIONS.PROMO_CATEGORY ATTRIBUTE PROMO DETERMINES SH.PROMOTIONS.PROMO_BEGIN_DATE ATTRIBUTE PROMO DETERMINES SH.PROMOTIONS.PROMO_COST ATTRIBUTE PROMO DETERMINES SH.PROMOTIONS.PROMO_END_DATE ATTRIBUTE PROMO DETERMINES SH.PROMOTIONS.PROMO_NAME ATTRIBUTE SUBCATEGORY DETERMINES SH.PROMOTIONS.PROMO_SUBCATEGORY
All of the dimensions that exist in the data warehouse can be viewed using Oracle Enterprise Manager. Select the
Dimension object from within the
Schema icon to display all of the dimensions. Select a specific dimension to graphically display its hierarchy, levels, and any attributes that have been defined.
Constraints play an important role with dimensions. Full referential integrity is sometimes enabled in data warehouses, but not always. This is because operational databases normally have full referential integrity and you can ensure that the data flowing into your warehouse never violates the already established integrity rules.
Oracle recommends that constraints be enabled and, if validation time is a concern, then the
NOVALIDATE clause should be used as follows:
Primary and foreign keys should be implemented also. Referential integrity constraints and
NULL constraints on the fact tables provide information that query rewrite can use to extend the usefulness of materialized views.
In addition, you should use the
RELY clause to inform query rewrite that it can rely upon the constraints being correct as follows:
This information is also used for query rewrite.
Chapter 22, "Query Rewrite" for further details
The information of a dimension object is declarative only and not enforced by the database. If the relationships described by the dimensions are incorrect, incorrect results could occur. Therefore, you should verify the relationships specified by
DIMENSION using the
DBMS_OLAP.VALIDATE_DIMENSION procedure periodically.
This procedure is easy to use and has only five parameters:
TRUEto check only the new rows for tables of this dimension
TRUEto verify that all columns are not null
DBMS_OLAP.CREATE_IDprocedure. The ID is used to identify the result of each run
The following example validates the dimension
TIME_FN in the grocery schema
VARIABLE RID NUMBER; EXECUTE DBMS_OLAP.CREATE_ID(:RID); EXECUTE DBMS_OLAP.VALIDATE_DIMENSION ('TIME_FN', 'GROCERY', \ FALSE, TRUE, :RID);
VALIDATE_DIMENSION procedure encounters any errors, they are placed in a system table. The table can be accessed from the view
SYSTEM.MVIEW_EXCEPTIONS. Querying this view will identify the exceptions that were found. For example:
SELECT * FROM SYSTEM.MVIEW_EXCEPTIONS WHERE RUNID = :RID; RUNID OWNER TABLE_NAME DIMENSION_NAME RELATIONSHIP BAD_ROWID ----- -------- ----------- -------------- ------------ --------- 678 GROCERY MONTH TIME_FN FOREIGN KEY AAAAuwAAJAAAARwAAA
However, rather than query this view, it may be better to query the rowid of the invalid row to retrieve the actual row that has violated the constraint. In this example, the dimension
TIME_FN is checking a table called
month. It has found a row that violates the constraints. Using the rowid, you can see exactly which row in the month table is causing the problem.
SELECT * FROM month WHERE rowid IN (SELECT bad_rowid FROM SYSTEM.MVIEW_EXCEPTIONS WHERE RUNID = :RID); MONTH QUARTER FISCAL_QTR YEAR FULL_MONTH_NAME MONTH_NUMB -------- ------- ---------- ---- --------------- ---------- 199903 19981 19981 1998 March 3
Finally, to remove results from the system table for the current run:
You can modify the dimension using the
DIMENSION statement. You can add or drop a level, hierarchy, or attribute from the dimension using this command.
Referring to the
time dimension in Figure 9-2, you can remove the attribute
fis_year, drop the hierarchy
fis_rollup, or remove the level
fiscal_year. In addition, you can add a new level called
f_year as shown below.
ALTER DIMENSION times_dim DROP ATTRIBUTE fis_year; ALTER DIMENSION times_dim DROP HIERARCHY fis_rollup; ALTER DIMENSION times_dim DROP LEVEL fis_year; ALTER DIMENSION times_dim ADD LEVEL f_year IS times.fiscal_year;
If you try to remove anything with further dependencies inside the dimension, Oracle rejects the altering of the dimension. A dimension becomes invalid if you change any schema object that the dimension is referencing. For example, if the table on which the dimension is defined is altered, the dimension becomes invalid.
To check the status of a dimension, view the contents of the column
invalid in the
ALL_DIMENSIONS data dictionary view.
To revalidate the dimension, use the
COMPILE option as follows:
Dimensions can also be modified using Oracle Enterprise Manager.
A dimension is removed using the
DIMENSION statement. For example:
Dimensions can also be deleted using Oracle Enterprise Manager.