|Oracle8i Data Warehousing Guide
Release 2 (8.1.6)
Part Number A76994-01
This chapter describes constraints, and discusses:
Constraints provide a mechanism for ensuring that data conforms to guidelines specified by the database administrator. The most common types of constraints include unique constraints (ensuring that a given column is unique), not-null constraints, and foreign-key constraints (which ensure that two keys share a primary key-foreign key relationship).
Constraints can be used for the following basic purposes in a data warehouse:
Unlike many relational-database environments, data in a data warehouse is typically added and/or modified under very controlled circumstances during the ETT process. Multiple users typically do not update the data warehouse directly; this is considerably different from the usage of a typical operational system.
Thus, the specific usage of constraints in a data warehouse may vary considerably from the usage of constraints in operational systems. Oracle8i provides a wide breadth of constraint functionality to address both the needs of data warehouses as well as the needs to operational systems.
Many significant constraint features for data warehousing were introduced in Oracle8i, so that readers familiar with Oracle's constraint functionality in Oracle7 and Oracle8 should take special note of the new functionality described in this chapter. In fact, many Oracle7 and Oracle8-based data warehouses lacked constraints because of concerns about constraint performance. The new Oracle8i functionality for constraints is designed to address these concerns for data warehouses.
In order to understand how to best utilize constraints in a data warehouse, it is important to first understand the basic purposes of constraints:
Note that validation is independent of enforcement. Although the typical constraint in an operational system is both enabled and validated, any constraint could be validated but not enforced or vice versa (enforced but not validated). These latter two cases can be very useful for data warehouses, and are discussed in the examples below.
Note that the RELY state is only meaningful for constraints that have not been validated.
This section assumes that most readers are familiar with the typical usage of constraints; that is, constraints that are both enabled and validated. For data warehousing, many users have discovered the such constraints may not be effective, because it may be prohibitively costly to build and maintain such constraints.
A unique constraint is typically enforced using a unique index. However, in a data warehouse, in which tables can be extremely large, creating a unique index may be very costly both in terms of processing time and disk space.
Suppose that a data warehouse contained a table SALES, with includes a column SALES_ID. SALES_ID uniquely identifies a single sales transaction, and the data-warehouse wants to ensure that this column is unique within the data warehouse.
One way to create the constraint is:
By default, this constraint is both enabled and validated. Oracle will implicitly create a unique index on sales_id to support this constraint. However, this index may be problematic in a data warehouse for three reasons:
Why is an index required for unique constraints? The index is used for enforcement. That is, the unique index is necessary to ensure that each individual row that is modified in the SALES table will satisfy the unique constraint.
For data warehousing tables, an alternative mechanism for unique constraints is:
This statement will create a unique constraint, but since the constraint is disabled, there is no need for a unique index. This approach can be advantageous for many data warehousing environments, because the constraint now ensures uniqueness without the shortcomings of a unique index.
However, there are trade-offs for the data warehouse administrator to consider with DISABLE VALIDATE constraints. Since this constraint is disabled, no DML statements which may modify the unique column are permitted against the SALES table. Thus, there are two strategies for modifying this table in the presence of a constraint:
In a star schema data warehouse, foreign-key constraints are typically created in order to validate the relationship between the fact table and the dimension tables. An example constraint might be:
ALTER TABLE sales ADD CONSTRAINT fk_sales_time FOREIGN KEY (sales_time_id) REFERENCES time (time_id) ENABLE VALIDATE;
However, there are several different requirements in which a data warehouse administrator may choose to use different states for the foreign-key constraints. In particular, the ENABLE NOVALIDATE state can be useful for data warehousing. A data warehouse administrator may choose to use an ENABLE NOVALIDATE constraint when either:
Suppose that the data warehouse loaded new data into the fact table(s) every day, but only refreshed the dimension tables on the weekend. Then, during the week, it is possible that the dimension tables and fact tables may in fact disobey the foreign-key constraints. Nevertheless, the data warehouse administrator may wish to maintain the enforcement of this constraint to prevent any changes that might affect the foreign key constraint outside of the ETT process. Thus, the foreign key constraints could be created every night, following the ETT process, as:
ALTER TABLE sales ADD CONSTRAINT fk_sales_time FOREIGN KEY (sales_time_id) REFERENCES time (time_id) ENABLE NOVALIDATE;
Another usage of ENABLE NOVALIDATE is for quickly creating an enforced constraint, even when the constraint is believed to be true. Suppose that the ETT process verifies that a foreign-key constraint is true. Rather than have the database re-verify this foreign-key constraint (which would require time and database resources), the data-warehouse administrator could instead create a foreign-key constraint using ENABLE NOVALIDATE.
It is not uncommon for the ETT process to verify that certain constraints are true. For example, the ETT process may validate all of the foreign keys in the incoming data for the fact table. In these environments, the data warehouse administrator may choose to trust the ETT process to provide clean data, instead of implementing constraints in the data warehouse.
For example, during the ETT process, the data warehouse administrator may have verified that a foreign-key constraint is true. Rather than have the database re-verify this foreign-key constraint (which would require time and database resources), the data warehouse administrator could instead create a foreign-key constraint with the RELY state:
ALTER TABLE sales add CONSTRAINT fk_sales_time FOREIGN KEY (sales_time_id) REFERENCES time (time_id) DISABLE NOVALIDATE rely;
The existence of RELY constraints, even though they are not used for data-validation, may be important for several purposes, including:
Creating a RELY constraint is very inexpensive. Because the constraint is not being validated, there is no data processing necessary to create such a constraint.
All constraints can be validated in parallel. When validating constraints on very large tables, parallelism is often necessary to meet performance goals. The degree of parallelism for a given constraint operation is determined by the default degree of parallelism of the underlying table.
Many aspects of creating and maintaining constraints can be executed on a per-partition basis. Later chapters will discuss the significance of partitioning for data warehousing, and partitioning can provide benefits to constraint management (just as partitioning provide benefits to managing of many other operations). For example, Chapter 14, "Loading and Refreshing", provides a scenario example of creating unique and foreign-key constraints on a separate staging table, and these constraints are maintained during the EXCHANGE PARTITION statement.