Describes the two common query techniques, "as is" and "as was" queries:
Characteristics of an As Is Query
An As Is query has the following characteristics:
The resulting report shows the data as it happened.
The snowflake dimension tables are also joined using the surrogate key columns (that is the primary key and foreign key columns).
The fact table is joined with the dimension tables (at leaf level) using the surrogate key column.
Slowly-changing data in the dimensions are joined with their corresponding fact records and are presented individually.
It is possible to add up the components if the different versions share similar characteristics.
Characteristics of an As Was Query
An As Was query (also known as point-in-time analysis) has the following characteristics:
The resulting report shows the data that would result from freezing the dimensions and dimension hierarchy at a specific point in time.
Each snowflake table is initially filtered by applying a point-in-time date filter which selects the records or versions which are valid as of the analysis date. This structure is called the point-in-time version of the snowflake.
The filtered snowflake is joined with an unfiltered version of itself by using the natural key. All of the snowflake attributes are taken from the point-in-time version alias. The resulting structure is called the composite snowflake.
A composite dimension is formed by joining the individual snowflakes on the surrogate key.
The fact table is joined with the composite dimension table at the leaf level using the surrogate key column.
The point-in-time version is super-imposed on all other possible SCD versions of the same business entity -- both backward and forward in time. Joining in this fashion gives the impression that the dimension is composed of only the specific point-in-time records.
All of the fact components for various versions add up correctly due to the super-imposition of point-in-time attributes within the dimensions.
Provides examples that illustrate the characteristics of As Is and As Was queries.
The following examples illustrate the characteristics of As Is and As Was queries:
Data used for the examples
Assume that your data warehouse has a Customer table, a County, and a TaxPaid fact table. As of January 1, 2012, these tables include the values shown below.
Customer Table
Table 5-2 Customer Table
| Customer ID | Customer Code | Customer Name | Gender | Marital Status | County ID | County Code | County Name | ... | Eff Frm | Eff To |
|---|---|---|---|---|---|---|---|---|---|---|
|
101 |
JoD |
John Doe |
Male |
Single |
5001 |
SV |
Sunnyvale |
... |
1-Jan-12 |
31-Dec-99 |
|
102 |
JaD |
Jane Doe |
Female |
Single |
5001 |
SV |
Sunnyvale |
... |
1-Jan-12 |
31-Dec-99 |
|
103 |
JiD |
Jim Doe |
Male |
Married |
5002 |
CU |
Cupertino |
... |
1-Jan-12 |
31-Dec-99 |
Table 5-3 County Table
| County ID | County Cde | County Name | Population | ... | Eff Frm | Eff To |
|---|---|---|---|---|---|---|
|
5001 |
SV |
Sunnyvale |
Very High |
... |
1-Jan-12 |
31-Dec-99 |
|
5002 |
CU |
Cupertino |
High |
... |
1-Jan-12 |
31-Dec-99 |
Table 5-4 Tax Paid Table
| Customer ID | Day | Tax Type | Tax |
|---|---|---|---|
|
101 |
1-Jan-12 |
Professional Tax |
100 |
|
102 |
1-Jan-12 |
Professional Tax |
100 |
|
103 |
1-Jan-12 |
Professional Tax |
100 |
Assume that the following events occurred in January 2012:
On January 20, 2012, Jane Doe marries.
On Jan 29, 2012, John Doe moves from Sunnyvale to Cupertino.
Consequently, as shown below, on February 1, 2012, the Customer and TaxPaid tables have new data while the values in the County table stay the same.
Table 5-5 Customer table
| Customer ID | Customer Code | Customer Name | Gender | Marital Status | County Id | County Code | County Name | ... | Eff Frm | Eff To |
|---|---|---|---|---|---|---|---|---|---|---|
|
101 |
JoD |
John Doe |
Male |
Single |
5001 |
SV |
Sunnyvale |
... |
1-Jan-12 |
29-Jan-12 |
|
102 |
JaD |
Jane Doe |
Female |
Single |
5001 |
SV |
Sunnyvale |
... |
1-Jan-12 |
20-Jan-12 |
|
103 |
JiD |
Jim Doe |
Male |
Married |
5002 |
CU |
Cupertino |
... |
1-Jan-12 |
31-Dec-99 |
|
104 |
JaD |
Jane Doe |
Female |
Married |
5001 |
SV |
Sunnyvale |
... |
21-Jan-12 |
31-Dec-99 |
|
105 |
JoD |
John Doe |
Male |
Single |
5002 |
CD |
Cupertino |
... |
30-Jan-12 |
31-Dec-99 |
Table 5-6 County table
| County Id | County CD | County Name | Population | ... | Eff Frm | Eff To |
|---|---|---|---|---|---|---|
|
5001 |
SV |
Sunnyvale |
Very High |
... |
1-Jan-12 |
31-Dec-99 |
|
5002 |
CU |
Cupertino |
High |
... |
1-Jan-12 |
31-Dec-99 |
Table 5-7 Tax Paid Table
| Customer ID | Day | Tax Type | Tax |
|---|---|---|---|
|
101 |
1-Jan-12 |
Professional Tax |
100 |
|
102 |
1-Jan-12 |
Professional Tax |
100 |
|
103 |
1-Jan-12 |
Professional Tax |
100 |
|
105 |
1-Feb-12 |
Professional Tax |
100 |
|
104 |
1-Feb-12 |
Professional Tax |
100 |
|
103 |
1-Feb-12 |
Professional Tax |
100 |
As Is Query for Tax Collection Split by Marital Status
Assuming the Data used for the examples, to show the tax collection data split by martial status, the following SQL statement that joins the TaxPaid fact table and the Customer dimension table on the cust_id surrogate key and the Customer and County snowflakes on the cnty_id surrogate key:
SELECT cust.cust_nm, cust.m_status, SUM(fct.tx) FROM taxpaid fct, customer cust, county cnty WHERE fct.cust_id = cust.cust_id AND cust.cnty_id = cnt.cnt_id GROUP BY cust.cust_nm, cust.m_status ORDER BY 1,2,3;
Note that there are two rows for Jane Doe; one row for a marital status of Married and another for a marital status of Single.
Table 5-8 County Table
| Customer Name | Marital Status | Tax |
|---|---|---|
|
Jane Doe |
Married |
100 |
|
Jane Doe |
Single |
100 |
|
Jim Doe |
Married |
200 |
|
John Doe |
Single |
200 |
Example 5-6 As Was Queries for Tax Collection Split by Marital Status
Assuming the Data used for the examples, issue the following SQL statement to show the tax collection data split by marital status using an analysis date of January 15, 2012:
select
cust.cust_nm, cust.m_status, sum(fct.tax)
from
TaxPaid fct,
(
select
cust_act.cust_id, cust_pit.cust_cd, cust_pit.cust_nm,
cust_pit.m_status, cust_pit.gender,
cust_pit.cnty_id, cust_pit.cnty_cd, cust_pit.cnty_nm
from Customer cust_act
inner join (
select
cust_id, cust_cd, cust_nm,
m_status, gender,
cnty_id, cnty_cd, cnty_nm
from Customer cust_all
where to_date('15-JAN-2012', 'DD-MON-YYYY') between eff_from and eff_to
) cust_pit
on (cust_act.cust_cd = cust_pit.cust_cd)
) cust,
(
select
cnty_act.cnty_id, cnty_pit.cnty_cd, cnty_pit.cnty_nm
from County cnty_act
inner join (
select
cnty_id, cnty_cd, cnty_nm
from County cnty_all
where to_date('15-JAN-2012', 'DD-MON-YYYY') between eff_from and eff_to
) cnty_pit
on (cnty_act.cnty_cd = cnty_pit.cnty_cd)
) cnty
where fct.cust_id = cust.cust_id
and cust.cnty_id = cnty.cnty_id
GROUP BY cust.cust_nm, cust.m_status
order by 1,2,3;
The results of this query are shown below. Since Jane Doe was single on January 15, 2012 (the analysis date), all tax for Jane Doe is accounted under her Single status.
Example 5-7 As Was Queries for Tax Collection Split by Marital Status
As Was Queries for Tax Collection Split by Marital Status
Table 5-9 County Table
| Customer Name | Marital Status | Tax |
|---|---|---|
|
Jane Doe |
Single |
200 |
|
Jim Doe |
Married |
200 |
|
John Doe |
Single |
200 |
Assume instead that you issued the exact same query except that for the to_date phrase you specify 09-FEB-2012 rather than 15-JAN-2012. Since Jane Doe was married on February 9, 2012, then, as shown below all tax for Jane Doe would be accounted under her Married status.
Table 5-10 County Table
| Customer Name | Marital Status | Tax |
|---|---|---|
|
Jane Doe |
Married |
200 |
|
Jim Doe |
Married |
200 |
|
John Doe |
Single |
200 |
Example 5-8 As Is Query for Tax Collection Data Split by County
As Is Query for Tax Collection Data Split by County
Assuming the Data used for the examples, issue the following SQL statement to show the tax collection data split by county:
SELECT cust.cust_nm, cnty.cnty_nm, SUM(fct.tax) FROM TaxPaid fct, customer cust, county cnty WHERE fct.cust_id = cust.cust_id AND cust.cnty_id = cnty.cnty_ID GROUP BY cut.cust_nm, cnty.cnty_nm ORDER BY 1,2,3;
The results of this query are shown below. Note that since John Doe lived in two different counties, there are two rows of data for John Doe.
Table 5-11 County Table
| Customer Name | County Name | Tax |
|---|---|---|
|
Jane Doe |
Sunnyvale |
200 |
|
Jim Doe |
Cupertino |
200 |
|
John Doe |
Cupertino |
100 |
|
John Doe |
Sunnyvale |
100 |
Example 5-9 As Was Queries for Tax Collection Data Split by County
As Was Queries for Tax Collection Data Split by County
Assuming the Data used for the examples, issue the following SQL statement to show the tax collection data split by county using an analysis date of January 15, 2012:
select
cust.cust_nm, cnty.cnty_nm, sum(fct.tax)
from
TaxPaid fct,
(
select
cust_act.cust_id, cust_pit.cust_cd, cust_pit.cust_nm,
cust_pit.m_status, cust_pit.gender,
cust_pit.cnty_id, cust_pit.cnty_cd, cust_pit.cnty_nm
from Customer cust_act
inner join (
select
cust_id, cust_cd, cust_nm,
m_status, gender,
cnty_id, cnty_cd, cnty_nm
from Customer cust_all
where to_date('15-JAN-2012', 'DD-MON-YYYY') between eff_from and eff_to
) cust_pit
on (cust_act.cust_cd = cust_pit.cust_cd
) cust,
(
select
cnty_act.cnty_id, cnty_pit.cnty_cd, cnty_pit.cnty_nm
from County cnty_act
inner join (
select
cnty_id, cnty_cd, cnty_nm
from County cnty_all
where to_date('15-JAN-2012', 'DD-MON-YYYY') between eff_from and eff_to
) cnty_pit
on (cnty_act.cnty_cd = cnty_pit.cnty_cd)
) cnty
where fct.cust_id = cust.cust_id
and cust.cnty_id = cnty.cnty_id
GROUP BY cust.cust_nm, cnty.cnty_nm
order by 1,2,3;
The results of this query are shown below. Note that since John Doe was in Sunnyvale as of the analysis date of January 15, 2012, all tax for John Doe is accounted for under the Sunnyvale county.
Table 5-12 County Table
| Customer Name | County Name | Tax |
|---|---|---|
|
Jane Doe |
Sunnyvale |
200 |
|
Jim Doe |
Cupertino |
200 |
|
John Doe |
Sunnyvale |
200 |
Assume instead that you issued the exact same query except that for the to_date phrase you specify 09-FEB-2012 rather than 15-JAN-2012. Since John Doe lived in Cupertino on February 9, 2012, then, as shown below all tax for John Doe would be accounted under Cupertino.
Table 5-13 County Table
| Customer Name | County Name | Tax |
|---|---|---|
|
Jane Doe |
Sunnyvale |
200 |
|
Jim Doe |
Cupertino |
200 |
|
John Doe |
Cupertino |
200 |